Sample records for cluster detection methods

  1. A spatial scan statistic for multiple clusters.

    PubMed

    Li, Xiao-Zhou; Wang, Jin-Feng; Yang, Wei-Zhong; Li, Zhong-Jie; Lai, Sheng-Jie

    2011-10-01

    Spatial scan statistics are commonly used for geographical disease surveillance and cluster detection. When multiple clusters coexist in the study area, they become difficult to detect because of the clusters' shadowing effect on each other. The recently proposed sequential method showed better power for detecting the second, weaker cluster, but did not improve the ability to detect the first, stronger cluster, which is more important than the second one. We propose a new extension of the spatial scan statistic that can be used to detect multiple clusters. By constructing two or more clusters in the alternative hypothesis, our proposed method accounts for other coexisting clusters in the detection and evaluation process. The performance of the proposed method is compared to the sequential method through an intensive simulation study, in which our proposed method shows better power in terms of both rejecting the null hypothesis and accurately detecting the coexisting clusters. In a real study of hand-foot-mouth disease data in Pingdu City, our proposed method successfully detects a true cluster town that the standard method cannot identify as statistically significant because of another cluster's shadowing effect. Copyright © 2011 Elsevier Inc. All rights reserved.
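    The multi-cluster extension above builds on the standard Kulldorff-style scan, which scores each candidate zone with a Poisson likelihood ratio and then places two or more zones in the alternative hypothesis. As background only, here is a minimal sketch of the single-zone Poisson log-likelihood ratio (function and variable names are illustrative, not from the paper); in a full scan this score is maximized over all candidate zones and its significance assessed by Monte Carlo replication.

    ```python
    import numpy as np

    def poisson_scan_llr(cases_in_zone, expected_in_zone, total_cases):
        """Log-likelihood ratio of one candidate zone under the Kulldorff
        Poisson model (expected counts standardized to sum to the total
        case count); positive only for zones with excess risk."""
        c, e, C = cases_in_zone, expected_in_zone, total_cases
        if c <= e or e <= 0.0 or c >= C:
            return 0.0
        return c * np.log(c / e) + (C - c) * np.log((C - c) / (C - e))

    # toy example: 30 cases observed in a zone where 12 were expected, 200 in total
    print(poisson_scan_llr(30, 12, 200))
    ```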

  2. The detection methods of dynamic objects

    NASA Astrophysics Data System (ADS)

    Knyazev, N. L.; Denisova, L. A.

    2018-01-01

    The article deals with the application of cluster analysis methods to the task of aircraft detection, based on partitioning a sample of navigation parameters into groups (clusters). A modified cluster analysis method is suggested for searching for and detecting objects, followed by iterative merging into clusters and counting of their number to increase the accuracy of aircraft detection. The operation of the method and the features of its implementation are considered. In conclusion, the efficiency of the proposed method for accurate cluster analysis in target detection is demonstrated.

  3. Spatial cluster detection for repeatedly measured outcomes while accounting for residential history.

    PubMed

    Cook, Andrea J; Gold, Diane R; Li, Yi

    2009-10-01

    Spatial cluster detection has become an important methodology in quantifying the effect of hazardous exposures. Previous methods have focused on cross-sectional outcomes that are binary or continuous. Virtually no spatial cluster detection methods have been proposed for longitudinal outcomes. This paper proposes a new spatial cluster detection method for repeated outcomes using cumulative geographic residuals. A major advantage of this method is its ability to readily incorporate information on study participants' relocation, which most cluster detection statistics cannot. Application of these methods is illustrated with the Home Allergens and Asthma prospective cohort study, analyzing the relationship between environmental exposures and a repeatedly measured outcome, the occurrence of wheeze in the past 6 months, while taking into account participants' changing residential locations.

  4. A spatial scan statistic for nonisotropic two-level risk cluster.

    PubMed

    Li, Xiao-Zhou; Wang, Jin-Feng; Yang, Wei-Zhong; Li, Zhong-Jie; Lai, Sheng-Jie

    2012-01-30

    Spatial scan statistic methods are commonly used for geographical disease surveillance and cluster detection. The standard spatial scan statistic does not model any variability in the underlying risks of subregions belonging to a detected cluster. For a multilevel risk cluster, the isotonic spatial scan statistic can model a centralized high-risk kernel in the cluster. Because variations in disease risk are anisotropic owing to different social, economic, or transport factors, the real high-risk kernel does not necessarily occupy the center of the whole cluster area. We propose a spatial scan statistic for a nonisotropic two-level risk cluster, which can be used to detect a whole cluster and a noncentralized high-risk kernel within the cluster simultaneously. The performance of the three methods was evaluated through an intensive simulation study. Our proposed nonisotropic two-level method showed better power and geographical precision under two-level risk cluster scenarios, especially for a noncentralized high-risk kernel. Our proposed method is illustrated using the hand-foot-mouth disease data from Pingdu City, Shandong, China, in May 2009, and compared with the two other methods. In this practical study, the nonisotropic two-level method is the only one that precisely detects a high-risk area within a detected whole cluster. Copyright © 2011 John Wiley & Sons, Ltd.

  5. K2: A NEW METHOD FOR THE DETECTION OF GALAXY CLUSTERS BASED ON CANADA-FRANCE-HAWAII TELESCOPE LEGACY SURVEY MULTICOLOR IMAGES

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Thanjavur, Karun; Willis, Jon; Crampton, David, E-mail: karun@uvic.c

    2009-11-20

    We have developed a new method, K2, optimized for the detection of galaxy clusters in multicolor images. Based on the Red Sequence approach, K2 detects clusters using simultaneous enhancements in both colors and position. The detection significance is robustly determined through extensive Monte Carlo simulations and through comparison with available cluster catalogs based on two different optical methods, and also on X-ray data. K2 also provides quantitative estimates of the candidate clusters' richness and photometric redshifts. Initially, K2 was applied to the two color (gri) 161 deg² images of the Canada-France-Hawaii Telescope Legacy Survey Wide (CFHTLS-W) data. Our simulations show that the false detection rate for these data, at our selected threshold, is only approximately 1%, and that the cluster catalogs are approximately 80% complete up to a redshift of z = 0.6 for Fornax-like and richer clusters and to z ≈ 0.3 for poorer clusters. Based on the g-, r-, and i-band photometric catalogs of the Terapix T05 release, 35 clusters/deg² are detected, with 1-2 Fornax-like or richer clusters every 2 deg². Catalogs containing data for 6144 galaxy clusters have been prepared, of which 239 are rich clusters. These clusters, especially the latter, are being searched for gravitational lenses, one of our chief motivations for cluster detection in CFHTLS. The K2 method can be easily extended to use additional color information and thus improve overall cluster detection to higher redshifts. The complete set of K2 cluster catalogs, along with the supplementary catalogs for the member galaxies, are available on request from the authors.

  6. A matched filter approach for blind joint detection of galaxy clusters in X-ray and SZ surveys

    NASA Astrophysics Data System (ADS)

    Tarrío, P.; Melin, J.-B.; Arnaud, M.

    2018-06-01

    The combination of X-ray and Sunyaev-Zeldovich (SZ) observations can potentially improve the cluster detection efficiency, when compared to using only one of these probes, since both probe the same medium, the hot ionized gas of the intra-cluster medium. We present a method based on matched multifrequency filters (MMF) for detecting galaxy clusters from SZ and X-ray surveys. This method builds on a previously proposed joint X-ray-SZ extraction method and allows the blind detection of clusters, that is, finding new clusters without knowing their position, size, or redshift, by searching on SZ and X-ray maps simultaneously. The proposed method is tested using data from the ROSAT all-sky survey and from the Planck survey. The evaluation is done by comparison with existing cluster catalogues in the area of the sky covered by the deep SPT survey. Thanks to the addition of the X-ray information, the joint detection method is able to achieve simultaneously better purity, better detection efficiency, and better position accuracy than its predecessor Planck MMF, which is based on SZ maps alone. For a purity of 85%, the X-ray-SZ method detects 141 confirmed clusters in the SPT region; to detect the same number of confirmed clusters with Planck MMF, we would need to decrease its purity to 70%. We provide a catalogue of 225 sources selected by the proposed method in the SPT footprint, with masses ranging between 0.7 and 14.5 × 10^14 M⊙ and redshifts between 0.01 and 1.2.

  7. A Hybrid Approach for CpG Island Detection in the Human Genome.

    PubMed

    Yang, Cheng-Hong; Lin, Yu-Da; Chiang, Yi-Cheng; Chuang, Li-Yeh

    2016-01-01

    CpG islands have been demonstrated to influence local chromatin structures and simplify the regulation of gene activity. However, the accurate and rapid determination of CpG islands for whole DNA sequences remains experimentally and computationally challenging. A novel procedure is proposed to detect CpG islands by combining clustering technology with a PSO-based sliding-window method. Clustering technology is used to detect the locations of all possible CpG islands and process the data, effectively obviating the need for extensive and unnecessary processing of DNA fragments and thereby improving the efficiency of the sliding-window based particle swarm optimization (PSO) search. This proposed approach, named ClusterPSO, provides versatile and highly sensitive detection of CpG islands in the human genome. In addition, the detection efficiency of ClusterPSO is compared with eight CpG island detection methods in the human genome. In a comparison of detection efficiency for CpG islands in the human genome, including sensitivity, specificity, accuracy, performance coefficient (PC), and correlation coefficient (CC), ClusterPSO showed superior detection ability among all of the tested methods. Moreover, the combination of clustering technology and the PSO method can successfully overcome their respective drawbacks while maintaining their advantages. Thus, clustering technology can be hybridized with an optimization algorithm to optimize CpG island detection. The prediction accuracy of ClusterPSO was quite high, indicating that the combination of CpGcluster and PSO has several advantages over CpGcluster and PSO alone. In addition, ClusterPSO significantly reduced implementation time.

  8. Comparison of Bayesian clustering and edge detection methods for inferring boundaries in landscape genetics

    USGS Publications Warehouse

    Safner, T.; Miller, M.P.; McRae, B.H.; Fortin, M.-J.; Manel, S.

    2011-01-01

    Recently, techniques available for identifying clusters of individuals or boundaries between clusters using genetic data from natural populations have expanded rapidly. Consequently, there is a need to evaluate these different techniques. We used spatially-explicit simulation models to compare three spatial Bayesian clustering programs and two edge detection methods. Spatially-structured populations were simulated in which a continuous population was subdivided by barriers. We evaluated the ability of each method to correctly identify boundary locations while varying: (i) time after divergence, (ii) strength of isolation by distance, (iii) level of genetic diversity, and (iv) amount of gene flow across barriers. To further evaluate the methods' effectiveness in detecting genetic clusters in natural populations, we used previously published data on North American pumas and a European shrub. Our results show that with simulated and empirical data, the Bayesian spatial clustering algorithms outperformed direct edge detection methods. All methods incorrectly detected boundaries in the presence of strong patterns of isolation by distance. Based on this finding, we support the application of Bayesian spatial clustering algorithms for boundary detection in empirical datasets, with necessary tests for the influence of isolation by distance. © 2011 by the authors; licensee MDPI, Basel, Switzerland.

  9. Clustering Categorical Data Using Community Detection Techniques

    PubMed Central

    2017-01-01

    With the advent of the k-modes algorithm, the toolbox for clustering categorical data gained an efficient tool that scales linearly in the number of data items. However, random initialization of cluster centers in k-modes makes it hard to reach a good clustering without resorting to many trials. Recently proposed methods for better initialization are deterministic and reduce the clustering cost considerably. These initialization methods differ in how the heuristic chooses the set of initial centers. In this paper, we address the clustering problem for categorical data from the perspective of community detection. Instead of initializing k modes and running several iterations, our scheme, CD-Clustering, builds an unweighted graph and detects highly cohesive groups of nodes using a fast community detection technique. The top k detected communities by size define the k modes. Evaluation on ten real categorical datasets shows that our method outperforms the existing initialization methods for k-modes in terms of accuracy, precision, and recall in most of the cases. PMID:29430249
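    To make the scheme above concrete, here is a minimal sketch of a CD-Clustering-style initialization. The graph construction (linking records that share at least `min_shared` attribute values) and the use of NetworkX's greedy modularity communities are assumptions for illustration; the paper uses its own graph construction and a specific fast community detection technique, and the returned modes would then seed a standard k-modes run.

    ```python
    from collections import Counter
    from itertools import combinations
    import networkx as nx
    from networkx.algorithms.community import greedy_modularity_communities

    def cd_modes(rows, k, min_shared=2):
        """Link records sharing at least `min_shared` attribute values,
        find communities, and return the attribute-wise mode of the k
        largest communities as initial centers for k-modes."""
        G = nx.Graph()
        G.add_nodes_from(range(len(rows)))
        for i, j in combinations(range(len(rows)), 2):
            if sum(a == b for a, b in zip(rows[i], rows[j])) >= min_shared:
                G.add_edge(i, j)
        communities = sorted(greedy_modularity_communities(G), key=len, reverse=True)[:k]
        modes = []
        for community in communities:
            columns = zip(*(rows[i] for i in community))
            modes.append(tuple(Counter(col).most_common(1)[0][0] for col in columns))
        return modes

    rows = [("red", "small", "round"), ("red", "small", "oval"),
            ("blue", "large", "round"), ("blue", "large", "square"),
            ("red", "small", "round"), ("blue", "large", "round")]
    print(cd_modes(rows, k=2))
    ```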

  10. Stepwise and stagewise approaches for spatial cluster detection

    PubMed Central

    Xu, Jiale

    2016-01-01

    Spatial cluster detection is an important tool in many areas such as sociology, botany and public health. Previous work has mostly taken either a hypothesis testing framework or a Bayesian framework. In this paper, we propose a few approaches under a frequentist variable selection framework for spatial cluster detection. The forward stepwise methods search for multiple clusters by iteratively adding the currently most likely cluster while adjusting for the effects of previously identified clusters. The stagewise methods also consist of a series of steps, but with a tiny step size in each iteration. We study the features and performance of our proposed methods using simulations on idealized grids or real geographic areas. From the simulations, we compare the performance of the proposed methods in terms of estimation accuracy and power of detection. These methods are applied to the well-known New York leukemia data as well as Indiana poverty data. PMID:27246273

  11. Application of Scan Statistics to Detect Suicide Clusters in Australia

    PubMed Central

    Cheung, Yee Tak Derek; Spittal, Matthew J.; Williamson, Michelle Kate; Tung, Sui Jay; Pirkis, Jane

    2013-01-01

    Background: Suicide clustering occurs when multiple suicide incidents take place in a small area and/or within a short period of time. In spite of multi-national research attention and particular efforts in preparing guidelines for tackling suicide clusters, the broader picture of the epidemiology of suicide clustering remains unclear. This study aimed to develop techniques for using scan statistics to detect clusters, with the detection of suicide clusters in Australia as an example. Methods and Findings: Scan statistics were applied to detect clusters among suicides occurring between 2004 and 2008. Parameter settings and the scanned area were varied to remedy shortcomings in existing methods. In total, 243 suicides out of 10,176 (2.4%) were identified as belonging to 15 suicide clusters. These clusters were mainly located in the Northern Territory, the northern part of Western Australia, and the northern part of Queensland. Among the 15 clusters, 4 (26.7%) were detected by both the national and state cluster detections, 8 (53.3%) were detected only by the state cluster detection, and 3 (20%) were detected only by the national cluster detection. Conclusions: These findings illustrate that the majority of spatial-temporal clusters of suicide were located in inland northern areas, with socio-economic deprivation and higher proportions of indigenous people. Discrepancies between national and state/territory cluster detection by scan statistics were due to the contrast of the underlying suicide rates across states/territories. Performing both small-area and large-area analyses, and applying multiple parameter settings, may yield the maximum benefits for exploring clusters. PMID:23342098

  12. Stepwise and stagewise approaches for spatial cluster detection.

    PubMed

    Xu, Jiale; Gangnon, Ronald E

    2016-05-01

    Spatial cluster detection is an important tool in many areas such as sociology, botany and public health. Previous work has mostly taken either a hypothesis testing framework or a Bayesian framework. In this paper, we propose a few approaches under a frequentist variable selection framework for spatial cluster detection. The forward stepwise methods search for multiple clusters by iteratively adding the currently most likely cluster while adjusting for the effects of previously identified clusters. The stagewise methods also consist of a series of steps, but with a tiny step size in each iteration. We study the features and performances of our proposed methods using simulations on idealized grids or real geographic areas. From the simulations, we compare the performance of the proposed methods in terms of estimation accuracy and power. These methods are applied to the well-known New York leukemia data as well as Indiana poverty data. Copyright © 2016 Elsevier Ltd. All rights reserved.

  13. Detecting cancer clusters in a regional population with local cluster tests and Bayesian smoothing methods: a simulation study

    PubMed Central

    2013-01-01

    Background: There is a rising public and political demand for prospective cancer cluster monitoring. But there is little empirical evidence on the performance of established cluster detection tests under conditions of small and heterogeneous sample sizes and varying spatial scales, such as are the case for most existing population-based cancer registries. Therefore this simulation study aims to evaluate different cluster detection methods, implemented in the open source environment R, in their ability to identify clusters of lung cancer using real-life data from an epidemiological cancer registry in Germany. Methods: Risk surfaces were constructed with two different spatial cluster types, representing a relative risk of RR = 2.0 or of RR = 4.0, in relation to the overall background incidence of lung cancer, separately for men and women. Lung cancer cases were sampled from this risk surface as geocodes using an inhomogeneous Poisson process. The realisations of the cancer cases were analysed within small spatial scales (census tracts, N = 1983) and within aggregated large spatial scales (communities, N = 78). Subsequently, they were submitted to the cluster detection methods. The test accuracy for cluster location was determined in terms of detection rates (DR), false-positive (FP) rates and positive predictive values. The Bayesian smoothing models were evaluated using ROC curves. Results: With a moderate risk increase (RR = 2.0), local cluster tests showed better DR (for both spatial aggregation scales > 0.90) and lower FP rates (both < 0.05) than the Bayesian smoothing methods. When the cluster RR was raised four-fold, the local cluster tests showed better DR with lower FPs only for the small spatial scale. At the large spatial scale, the Bayesian smoothing methods, especially those implementing a spatial neighbourhood, showed a substantially lower FP rate than the cluster tests. However, the risk increases at this scale were mostly diluted by data aggregation. Conclusion: High-resolution spatial scales seem more appropriate as a basis for cancer cluster testing and monitoring than the commonly used aggregated scales. We suggest the development of a two-stage approach that combines methods with high detection rates as a first-line screening with methods of higher predictive ability at the second stage. PMID:24314148

  14. Segmentation of the Clustered Cells with Optimized Boundary Detection in Negative Phase Contrast Images

    PubMed Central

    Wang, Yuliang; Zhang, Zaicheng; Wang, Huimin; Bi, Shusheng

    2015-01-01

    Cell image segmentation plays a central role in numerous biology studies and clinical applications. As a result, the development of cell image segmentation algorithms with high robustness and accuracy is attracting more and more attention. In this study, an automated cell image segmentation algorithm is developed to improve cell boundary detection and segmentation of clustered cells for all cells in the field of view in negative phase contrast images. A new method that combines thresholding with an edge-based active contour method is proposed to optimize cell boundary detection. In order to segment clustered cells, the geographic peaks of cell light intensity are utilized to detect the numbers and locations of the clustered cells. In this paper, the working principles of the algorithms are described. The influence of parameters in cell boundary detection and the selection of the threshold value on the final segmentation results are investigated. Finally, the proposed algorithm is applied to negative phase contrast images from different experiments, and the performance of the proposed method is evaluated. Results show that the proposed method can achieve optimized cell boundary detection and highly accurate segmentation for clustered cells. PMID:26066315

  15. Spatial cluster detection using dynamic programming

    PubMed Central

    2012-01-01

    Background: The task of spatial cluster detection involves finding spatial regions where some property deviates from the norm or the expected value. In a probabilistic setting this task can be expressed as finding a region where some event is significantly more likely than usual. Spatial cluster detection is of interest in fields such as biosurveillance, mining of astronomical data, military surveillance, and analysis of fMRI images. In almost all such applications we are interested both in the question of whether a cluster exists in the data, and if it exists, we are interested in finding the most accurate characterization of the cluster. Methods: We present a general dynamic programming algorithm for grid-based spatial cluster detection. The algorithm can be used for both Bayesian maximum a-posteriori (MAP) estimation of the most likely spatial distribution of clusters and Bayesian model averaging over a large space of spatial cluster distributions to compute the posterior probability of an unusual spatial clustering. The algorithm is explained and evaluated in the context of a biosurveillance application, specifically the detection and identification of Influenza outbreaks based on emergency department visits. A relatively simple underlying model is constructed for the purpose of evaluating the algorithm, and the algorithm is evaluated using the model and semi-synthetic test data. Results: When compared to baseline methods, tests indicate that the new algorithm can improve MAP estimates under certain conditions: the greedy algorithm we compared our method to was found to be more sensitive to smaller outbreaks, while as the size of the outbreaks increases, in terms of area affected and proportion of individuals affected, our method overtakes the greedy algorithm in spatial precision and recall. The new algorithm performs on-par with baseline methods in the task of Bayesian model averaging. Conclusions: We conclude that the dynamic programming algorithm performs on-par with other available methods for spatial cluster detection and point to its low computational cost and extendability as advantages in favor of further research and use of the algorithm. PMID:22443103

  16. A spatial scan statistic for compound Poisson data.

    PubMed

    Rosychuk, Rhonda J; Chang, Hsing-Ming

    2013-12-20

    The topic of spatial cluster detection gained attention in statistics during the late 1980s and early 1990s. Effort has been devoted to the development of methods for detecting spatial clustering of cases and events in the biological sciences, astronomy and epidemiology. More recently, research has examined detecting clusters of correlated count data associated with health conditions of individuals. Such a method allows researchers to examine spatial relationships of disease-related events rather than just incident or prevalent cases. We introduce a spatial scan test that identifies clusters of events in a study region. Because an individual case may have multiple (repeated) events, we base the test on a compound Poisson model. We illustrate our method for cluster detection on emergency department visits, where individuals may make multiple disease-related visits. Copyright © 2013 John Wiley & Sons, Ltd.

  17. Applying spatial analysis tools in public health: an example using SaTScan to detect geographic targets for colorectal cancer screening interventions.

    PubMed

    Sherman, Recinda L; Henry, Kevin A; Tannenbaum, Stacey L; Feaster, Daniel J; Kobetz, Erin; Lee, David J

    2014-03-20

    Epidemiologists are gradually incorporating spatial analysis into health-related research as geocoded cases of disease become widely available and health-focused geospatial computer applications are developed. One health-focused application of spatial analysis is cluster detection. Using cluster detection to identify geographic areas with high-risk populations and then screening those populations for disease can improve cancer control. SaTScan is a free cluster-detection software application used by epidemiologists around the world to describe spatial clusters of infectious and chronic disease, as well as disease vectors and risk factors. The objectives of this article are to describe how spatial analysis can be used in cancer control to detect geographic areas in need of colorectal cancer screening intervention, identify issues commonly encountered by SaTScan users, detail how to select the appropriate methods for using SaTScan, and explain how method selection can affect results. As an example, we used various methods to detect areas in Florida where the population is at high risk for late-stage diagnosis of colorectal cancer. We found that much of our analysis was underpowered and that no single method detected all clusters of statistical or public health significance. However, all methods detected 1 area as high risk; this area is potentially a priority area for a screening intervention. Cluster detection can be incorporated into routine public health operations, but the challenge is to identify areas in which the burden of disease can be alleviated through public health intervention. Reliance on SaTScan's default settings does not always produce pertinent results.

  18. Motif-independent prediction of a secondary metabolism gene cluster using comparative genomics: application to sequenced genomes of Aspergillus and ten other filamentous fungal species.

    PubMed

    Takeda, Itaru; Umemura, Myco; Koike, Hideaki; Asai, Kiyoshi; Machida, Masayuki

    2014-08-01

    Despite their biological importance, a significant number of genes for secondary metabolite biosynthesis (SMB) remain undetected due largely to the fact that they are highly diverse and are not expressed under a variety of cultivation conditions. Several software tools including SMURF and antiSMASH have been developed to predict fungal SMB gene clusters by finding core genes encoding polyketide synthase, nonribosomal peptide synthetase and dimethylallyltryptophan synthase as well as several others typically present in the cluster. In this work, we have devised a novel comparative genomics method to identify SMB gene clusters that is independent of motif information of the known SMB genes. The method detects SMB gene clusters by searching for a similar order of genes and their presence in nonsyntenic blocks. With this method, we were able to identify many known SMB gene clusters with the core genes in the genomic sequences of 10 filamentous fungi. Furthermore, we have also detected SMB gene clusters without core genes, including the kojic acid biosynthesis gene cluster of Aspergillus oryzae. By varying the detection parameters of the method, a significant difference in the sequence characteristics was detected between the genes residing inside the clusters and those outside the clusters. © The Author 2014. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.

  19. Galaxy clusters in the cosmic web

    NASA Astrophysics Data System (ADS)

    Acebrón, A.; Durret, F.; Martinet, N.; Adami, C.; Guennou, L.

    2014-12-01

    Simulations of large scale structure formation in the universe predict that matter is essentially distributed along filaments at the intersection of which lie galaxy clusters. We have analysed 9 clusters in the redshift range 0.4

  20. A scan statistic for binary outcome based on hypergeometric probability model, with an application to detecting spatial clusters of Japanese encephalitis.

    PubMed

    Zhao, Xing; Zhou, Xiao-Hua; Feng, Zijian; Guo, Pengfei; He, Hongyan; Zhang, Tao; Duan, Lei; Li, Xiaosong

    2013-01-01

    As a useful tool for geographical cluster detection of events, the spatial scan statistic is widely applied in many fields and plays an increasingly important role. The classic version of the spatial scan statistic for a binary outcome was developed by Kulldorff, based on the Bernoulli or the Poisson probability model. In this paper, we apply the hypergeometric probability model to construct the likelihood function under the null hypothesis. Compared with existing methods, constructing the likelihood function under the null hypothesis is an alternative, indirect way to identify the potential cluster, and the test statistic is the extreme value of the likelihood function. As in Kulldorff's methods, we adopt a Monte Carlo test for significance. Both methods are applied to detecting spatial clusters of Japanese encephalitis in Sichuan province, China, in 2009, and the detected clusters are identical. A simulation with independent benchmark data indicates that the test statistic based on the hypergeometric model outperforms Kulldorff's statistics for clusters of high population density or large size; otherwise, Kulldorff's statistics are superior.
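    As an illustration of the idea above, the sketch below scores each candidate zone by how improbable its case count is under a hypergeometric null in which cases are allocated at random across the population; the most extreme zone is flagged. The zone definitions and counts are toy values, and the paper's exact likelihood construction and Monte Carlo significance test are not reproduced here.

    ```python
    from scipy.stats import hypergeom

    def zone_hypergeom_logpmf(cases_in_zone, pop_in_zone, total_cases, total_pop):
        """Log-probability of the observed case count in a zone when the
        total cases are distributed at random over the whole population."""
        return hypergeom.logpmf(cases_in_zone, total_pop, total_cases, pop_in_zone)

    # toy scan over three candidate zones given as (cases, population) pairs
    zones = [(18, 500), (9, 800), (30, 400)]
    total_cases, total_pop = 120, 10_000
    surprise = {z: -zone_hypergeom_logpmf(z[0], z[1], total_cases, total_pop)
                for z in zones}
    print(max(surprise, key=surprise.get))  # zone least likely under the null
    ```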

  1. Detecting space-time disease clusters with arbitrary shapes and sizes using a co-clustering approach.

    PubMed

    Ullah, Sami; Daud, Hanita; Dass, Sarat C; Khan, Habib Nawaz; Khalil, Alamgir

    2017-11-06

    Ability to detect potential space-time clusters in spatio-temporal data on disease occurrences is necessary for conducting surveillance and implementing disease prevention policies. Most existing techniques use geometrically shaped (circular, elliptical or square) scanning windows to discover disease clusters. In certain situations, where the disease occurrences tend to cluster in very irregularly shaped areas, these algorithms are not feasible in practice for the detection of space-time clusters. To address this problem, a new algorithm is proposed, which uses a co-clustering strategy to detect prospective and retrospective space-time disease clusters with no restriction on shape and size. The proposed method detects space-time disease clusters by tracking the changes in the space-time occurrence structure instead of an in-depth search over space. This method was utilised to detect potential clusters in the annual and monthly malaria data in Khyber Pakhtunkhwa Province, Pakistan from 2012 to 2016, visualising the results on a heat map. The results of the annual data analysis showed that the most likely hotspot emerged in three sub-regions in the years 2013-2014. The most likely hotspots in the monthly data appeared in the months of July to October of each year and showed a strong periodic trend.

  2. Detection of protein complex from protein-protein interaction network using Markov clustering

    NASA Astrophysics Data System (ADS)

    Ochieng, P. J.; Kusuma, W. A.; Haryanto, T.

    2017-05-01

    Detection of complexes, or groups of functionally related proteins, is an important challenge in analysing biological networks. However, existing algorithms to identify protein complexes are insufficient when applied to dense networks of experimentally derived interaction data. Therefore, we introduce a graph clustering method based on the Markov clustering algorithm to identify protein complexes within highly interconnected protein-protein interaction networks. A protein-protein interaction network was first constructed to develop a geometrical network, which was then partitioned using Markov clustering to detect protein complexes. The interest of the proposed method is illustrated by its application to human proteins associated with type II diabetes mellitus. Flow simulation of the MCL algorithm was initially performed and topological properties of the resultant network were analysed for detection of the protein complexes. The results indicate that the proposed method successfully detected a total of 34 complexes, with 11 complexes consisting of overlapping modules and 20 consisting of non-overlapping modules. The major complex consisted of 102 proteins and 521 interactions, with cluster modularity and density of 0.745 and 0.101, respectively. The comparative analysis revealed that MCL outperformed the AP, MCODE and SCPS algorithms, with a high clustering coefficient (0.751), network density and modularity index (0.630). This demonstrates that MCL is the most reliable and efficient graph clustering algorithm for detection of protein complexes from PPI networks.
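    The core Markov clustering (MCL) loop used above alternates expansion (simulating random-walk flow) with inflation (strengthening strong flows and pruning weak ones) on a column-stochastic matrix. The sketch below is a minimal, illustrative implementation on a toy adjacency matrix, not the tuned pipeline used in the study; the parameter values and the tiny example graph are assumptions.

    ```python
    import numpy as np

    def mcl(adjacency, expansion=2, inflation=2.0, iters=100, tol=1e-6):
        """Minimal Markov clustering on a dense adjacency matrix."""
        M = adjacency.astype(float) + np.eye(len(adjacency))  # add self-loops
        M /= M.sum(axis=0, keepdims=True)                     # column-stochastic
        for _ in range(iters):
            previous = M.copy()
            M = np.linalg.matrix_power(M, expansion)          # expansion step
            M = M ** inflation                                # inflation step
            M /= M.sum(axis=0, keepdims=True)
            if np.abs(M - previous).max() < tol:
                break
        # attractor rows (non-zero diagonal) index the clusters
        clusters = {frozenset(np.flatnonzero(M[i] > 1e-8))
                    for i in range(len(M)) if M[i, i] > 1e-8}
        return [sorted(c) for c in clusters]

    # toy protein-protein interaction graph with two dense groups
    adjacency = np.array([[0, 1, 1, 0, 0, 0],
                          [1, 0, 1, 0, 0, 0],
                          [1, 1, 0, 1, 0, 0],
                          [0, 0, 1, 0, 1, 1],
                          [0, 0, 0, 1, 0, 1],
                          [0, 0, 0, 1, 1, 0]])
    print(mcl(adjacency))
    ```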

  3. Spatial cluster detection using dynamic programming.

    PubMed

    Sverchkov, Yuriy; Jiang, Xia; Cooper, Gregory F

    2012-03-25

    The task of spatial cluster detection involves finding spatial regions where some property deviates from the norm or the expected value. In a probabilistic setting this task can be expressed as finding a region where some event is significantly more likely than usual. Spatial cluster detection is of interest in fields such as biosurveillance, mining of astronomical data, military surveillance, and analysis of fMRI images. In almost all such applications we are interested both in the question of whether a cluster exists in the data, and if it exists, we are interested in finding the most accurate characterization of the cluster. We present a general dynamic programming algorithm for grid-based spatial cluster detection. The algorithm can be used for both Bayesian maximum a-posteriori (MAP) estimation of the most likely spatial distribution of clusters and Bayesian model averaging over a large space of spatial cluster distributions to compute the posterior probability of an unusual spatial clustering. The algorithm is explained and evaluated in the context of a biosurveillance application, specifically the detection and identification of Influenza outbreaks based on emergency department visits. A relatively simple underlying model is constructed for the purpose of evaluating the algorithm, and the algorithm is evaluated using the model and semi-synthetic test data. When compared to baseline methods, tests indicate that the new algorithm can improve MAP estimates under certain conditions: the greedy algorithm we compared our method to was found to be more sensitive to smaller outbreaks, while as the size of the outbreaks increases, in terms of area affected and proportion of individuals affected, our method overtakes the greedy algorithm in spatial precision and recall. The new algorithm performs on-par with baseline methods in the task of Bayesian model averaging. We conclude that the dynamic programming algorithm performs on-par with other available methods for spatial cluster detection and point to its low computational cost and extendability as advantages in favor of further research and use of the algorithm.

  4. Method and apparatus for detecting and/or imaging clusters of small scattering centers in the body

    DOEpatents

    Perez-Mendez, V.; Sommer, F.G.

    1982-07-13

    An ultrasonic method and apparatus are provided for detecting and imaging clusters of small scattering centers in the breast wherein periodic pulses are applied to an ultrasound emitting transducer and projected into the body, thereafter being received by at least one receiving transducer positioned to receive scattering from the scattering center clusters. The signals are processed to provide an image showing cluster extent and location. 6 figs.

  5. Method and apparatus for detecting and/or imaging clusters of small scattering centers in the body

    DOEpatents

    Perez-Mendez, Victor; Sommer, Frank G.

    1982-01-01

    An ultrasonic method and apparatus are provided for detecting and imaging clusters of small scattering centers in the breast wherein periodic pulses are applied to an ultrasound emitting transducer and projected into the body, thereafter being received by at least one receiving transducer positioned to receive scattering from the scattering center clusters. The signals are processed to provide an image showing cluster extent and location.

  6. Applying a 2D based CAD scheme for detecting micro-calcification clusters using digital breast tomosynthesis images: an assessment

    NASA Astrophysics Data System (ADS)

    Park, Sang Cheol; Zheng, Bin; Wang, Xiao-Hui; Gur, David

    2008-03-01

    Digital breast tomosynthesis (DBT) has emerged as a promising imaging modality for screening mammography. However, visually detecting micro-calcification clusters depicted on DBT images is a difficult task. Computer-aided detection (CAD) schemes for detecting micro-calcification clusters depicted on mammograms can achieve high performance, and the use of CAD results can assist radiologists in detecting subtle micro-calcification clusters. In this study, we compared the performance of an available 2D based CAD scheme with one that includes a new grouping and scoring method when applied to both projection and reconstructed DBT images. We selected a dataset involving 96 DBT examinations acquired on 45 women. Each DBT image set included 11 low dose projection images and a varying number of reconstructed image slices ranging from 18 to 87. In this dataset, 20 true-positive micro-calcification clusters were visually detected on the projection images and 40 on the reconstructed images. We first applied the CAD scheme previously developed in our laboratory to the DBT dataset. We then tested a new grouping method that defines an independent cluster by grouping the same cluster detected on different projection or reconstructed images. We then compared four scoring methods to assess the CAD performance. The maximum sensitivity levels observed for the different grouping and scoring methods were 70% and 88% for the projection and reconstructed images, with maximum false-positive rates of 4.0 and 15.9 per examination, respectively. This preliminary study demonstrates that (1) among the maximum, the minimum or the average CAD-generated scores, using the maximum score of the grouped cluster regions achieved the highest performance level, (2) the histogram based scoring method is reasonably effective in reducing false-positive detections on the projection images but the overall CAD sensitivity is lower due to the lower signal-to-noise ratio, and (3) CAD achieved higher sensitivity and a higher false-positive rate (per examination) on the reconstructed images. We concluded that without changing the detection threshold or performing pre-filtering to possibly increase detection sensitivity, current CAD schemes developed and optimized for 2D mammograms perform relatively poorly and need to be re-optimized using DBT datasets, and new grouping and scoring methods need to be incorporated into the schemes if these are to be used on DBT examinations.
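    As a toy illustration of the grouping-and-scoring step described above, the sketch below merges detections from different slices that fall within a small spatial tolerance and keeps the maximum CAD score per group, the scoring variant the study reports as performing best. The matching rule, the `xy_tol` value and the data layout are assumptions made only for illustration.

    ```python
    def group_detections(detections, xy_tol=10.0):
        """Merge per-slice detections (x, y, score) lying within `xy_tol`
        pixels of an existing group; keep the maximum score per group.
        Each group is anchored at the coordinates of its first member."""
        groups = []  # each group: [x_ref, y_ref, best_score, n_members]
        for x, y, score in detections:
            for g in groups:
                if abs(x - g[0]) <= xy_tol and abs(y - g[1]) <= xy_tol:
                    g[2] = max(g[2], score)
                    g[3] += 1
                    break
            else:
                groups.append([x, y, score, 1])
        return [(gx, gy, s) for gx, gy, s, _ in groups]

    # the same cluster seen on three reconstructed slices, plus one isolated hit
    detections = [(120.0, 88.0, 0.61), (122.5, 86.0, 0.74),
                  (119.0, 90.0, 0.58), (410.0, 300.0, 0.35)]
    print(group_detections(detections))
    ```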

  7. Mapping Health Data: Improved Privacy Protection With Donut Method Geomasking

    PubMed Central

    Hampton, Kristen H.; Fitch, Molly K.; Allshouse, William B.; Doherty, Irene A.; Gesink, Dionne C.; Leone, Peter A.; Serre, Marc L.; Miller, William C.

    2010-01-01

    A major challenge in mapping health data is protecting patient privacy while maintaining the spatial resolution necessary for spatial surveillance and outbreak identification. A new adaptive geomasking technique, referred to as the donut method, extends current methods of random displacement by ensuring a user-defined minimum level of geoprivacy. In donut method geomasking, each geocoded address is relocated in a random direction by at least a minimum distance, but less than a maximum distance. The authors compared the donut method with current methods of random perturbation and aggregation regarding measures of privacy protection and cluster detection performance by masking multiple disease field simulations under a range of parameters. Both the donut method and random perturbation performed better than aggregation in cluster detection measures. The performance of the donut method in geoprivacy measures was at least 42.7% higher and in cluster detection measures was less than 4.8% lower than that of random perturbation. Results show that the donut method provides a consistently higher level of privacy protection with a minimal decrease in cluster detection performance, especially in areas where the risk to individual geoprivacy is greatest. PMID:20817785
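    A minimal sketch of the displacement step described above: each point is moved in a random direction by at least r_min and less than r_max (uniform over the resulting annulus). In the published donut method the radii are adapted to the underlying population density to guarantee the user-defined level of geoprivacy; the fixed radii and coordinates used here are assumptions for illustration.

    ```python
    import math
    import random

    def donut_mask(x, y, r_min, r_max, rng=random):
        """Displace a point uniformly over an annulus: at least r_min and
        less than r_max away from its true location, in a random direction."""
        theta = rng.uniform(0.0, 2.0 * math.pi)
        # sqrt of a uniform draw on [r_min^2, r_max^2] gives an area-uniform radius
        r = math.sqrt(rng.uniform(r_min ** 2, r_max ** 2))
        return x + r * math.cos(theta), y + r * math.sin(theta)

    random.seed(1)
    print(donut_mask(538200.0, 4812400.0, r_min=50.0, r_max=250.0))
    ```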

  8. Mapping health data: improved privacy protection with donut method geomasking.

    PubMed

    Hampton, Kristen H; Fitch, Molly K; Allshouse, William B; Doherty, Irene A; Gesink, Dionne C; Leone, Peter A; Serre, Marc L; Miller, William C

    2010-11-01

    A major challenge in mapping health data is protecting patient privacy while maintaining the spatial resolution necessary for spatial surveillance and outbreak identification. A new adaptive geomasking technique, referred to as the donut method, extends current methods of random displacement by ensuring a user-defined minimum level of geoprivacy. In donut method geomasking, each geocoded address is relocated in a random direction by at least a minimum distance, but less than a maximum distance. The authors compared the donut method with current methods of random perturbation and aggregation regarding measures of privacy protection and cluster detection performance by masking multiple disease field simulations under a range of parameters. Both the donut method and random perturbation performed better than aggregation in cluster detection measures. The performance of the donut method in geoprivacy measures was at least 42.7% higher and in cluster detection measures was less than 4.8% lower than that of random perturbation. Results show that the donut method provides a consistently higher level of privacy protection with a minimal decrease in cluster detection performance, especially in areas where the risk to individual geoprivacy is greatest.

  9. Automatic detection of multiple UXO-like targets using magnetic anomaly inversion and self-adaptive fuzzy c-means clustering

    NASA Astrophysics Data System (ADS)

    Yin, Gang; Zhang, Yingtang; Fan, Hongbo; Ren, Guoquan; Li, Zhining

    2017-12-01

    We have developed a method for automatically detecting UXO-like targets based on magnetic anomaly inversion and self-adaptive fuzzy c-means clustering. Magnetic anomaly inversion methods are used to estimate the initial locations of multiple UXO-like sources. Although these initial locations have some errors with respect to the real positions, they form dense clouds around the actual positions of the magnetic sources. We then use the self-adaptive fuzzy c-means clustering algorithm to cluster these initial locations. The estimated number of cluster centroids represents the number of targets, and the cluster centroids are regarded as the locations of the magnetic targets. The effectiveness of the method has been demonstrated using synthetic datasets. Computational results show that the proposed method can be applied to the case of several UXO-like targets randomly scattered within a confined, shallow subsurface volume. A field test was carried out to check the validity of the proposed method, and the experimental results show that the prearranged magnets can be detected unambiguously and located precisely.
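    The sketch below is a basic fuzzy c-means routine of the kind applied to the cloud of estimated source locations. The self-adaptive selection of the number of clusters described in the record is not reproduced (c is fixed by the caller), and the synthetic location data are assumptions for illustration.

    ```python
    import numpy as np

    def fuzzy_c_means(points, c, m=2.0, iters=100, tol=1e-5, seed=0):
        """Basic fuzzy c-means: alternate weighted-centroid and membership
        updates until the memberships stop changing."""
        rng = np.random.default_rng(seed)
        u = rng.random((len(points), c))
        u /= u.sum(axis=1, keepdims=True)                      # fuzzy memberships
        for _ in range(iters):
            um = u ** m
            centers = um.T @ points / um.sum(axis=0)[:, None]  # weighted centroids
            d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
            d = np.maximum(d, 1e-12)
            ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))
            new_u = 1.0 / ratio.sum(axis=2)                    # membership update
            converged = np.abs(new_u - u).max() < tol
            u = new_u
            if converged:
                break
        return centers, u

    # noisy 3-D location estimates scattered around two buried targets
    rng = np.random.default_rng(42)
    points = np.vstack([rng.normal([2.0, 3.0, -1.0], 0.2, (50, 3)),
                        rng.normal([8.0, 1.0, -1.5], 0.2, (50, 3))])
    centers, memberships = fuzzy_c_means(points, c=2)
    print(np.round(centers, 2))
    ```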

  10. WordCluster: detecting clusters of DNA words and genomic elements

    PubMed Central

    2011-01-01

    Background: Many k-mers (or DNA words) and genomic elements are known to be spatially clustered in the genome. Well established examples are genes, TFBSs, CpG dinucleotides, microRNA genes and ultra-conserved non-coding regions. Currently, no algorithm exists to find these clusters in a statistically comprehensible way. The detection of clustering often relies on densities and sliding-window approaches or arbitrarily chosen distance thresholds. Results: We introduce here an algorithm to detect clusters of DNA words (k-mers), or any other genomic element, based on the distance between consecutive copies and an assigned statistical significance. We implemented the method into a web server connected to a MySQL backend, which also determines the co-localization with gene annotations. We demonstrate the usefulness of this approach by detecting clusters of CAG/CTG (cytosine contexts that can be methylated in undifferentiated cells), showing that the degree of methylation varies drastically between the inside and outside of the clusters. As another example, we used WordCluster to search for statistically significant clusters of olfactory receptor (OR) genes in the human genome. Conclusions: WordCluster seems to predict biologically meaningful clusters of DNA words (k-mers) and genomic entities. The implementation of the method into a web server is available at http://bioinfo2.ugr.es/wordCluster/wordCluster.php including additional features like the detection of co-localization with gene regions or the annotation enrichment tool for functional analysis of overlapped genes. PMID:21261981
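    The following is a much-simplified sketch of the underlying idea, grouping consecutive occurrences of a DNA word whenever the gap between neighbouring copies stays below a threshold. WordCluster itself assigns a statistical significance to the inter-copy distances rather than using the fixed `max_gap` cut-off assumed here; the toy sequence and parameters are for illustration only.

    ```python
    import re

    def word_clusters(sequence, word, max_gap=200, min_copies=5):
        """Group consecutive (possibly overlapping) occurrences of `word`
        into clusters whenever neighbouring copies are at most `max_gap`
        apart; report (start, end, copy_count) for each cluster."""
        starts = [m.start() for m in re.finditer(f"(?={re.escape(word)})", sequence)]
        clusters, current = [], []
        for pos in starts:
            if current and pos - current[-1] > max_gap:
                if len(current) >= min_copies:
                    clusters.append((current[0], current[-1] + len(word), len(current)))
                current = []
            current.append(pos)
        if len(current) >= min_copies:
            clusters.append((current[0], current[-1] + len(word), len(current)))
        return clusters

    toy_sequence = "ATG" + "CAG" * 12 + "ATTTTTGCCA" * 30 + "CTG" * 15 + "AT"
    print(word_clusters(toy_sequence, "CAG", max_gap=50, min_copies=5))
    ```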

  11. A clustering algorithm for determining community structure in complex networks

    NASA Astrophysics Data System (ADS)

    Jin, Hong; Yu, Wei; Li, ShiJun

    2018-02-01

    Clustering algorithms are attractive for the task of community detection in complex networks. DENCLUE is a representative density-based clustering algorithm which has a firm mathematical basis and good clustering properties allowing for arbitrarily shaped clusters in high dimensional datasets. However, this method cannot be directly applied to community discovery due to its inability to deal with network data. Moreover, it requires a careful selection of the density parameter and the noise threshold. To solve these issues, a new community detection method is proposed in this paper. First, we use a spectral analysis technique to map the network data into a low dimensional Euclidean space which preserves node structural characteristics. Then, DENCLUE is applied to detect the communities in the network. A mathematical method, the Sheather-Jones plug-in, is chosen to select the density parameter, which can describe the intrinsic clustering structure accurately. Moreover, every node in the network is meaningful, so there are no noise nodes and the noise threshold can be ignored. We test our algorithm on both benchmark and real-life networks, and the results demonstrate the effectiveness of our algorithm over other popular density-based clustering algorithms adapted to community detection.
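    The first step described above, mapping the network into a low-dimensional Euclidean space, is sketched below using eigenvectors of the normalized graph Laplacian (one common spectral embedding; the paper's exact mapping may differ). A density-based clusterer such as DENCLUE would then be run on the returned coordinates; that step and the Sheather-Jones bandwidth selection are not reproduced here.

    ```python
    import numpy as np
    import networkx as nx

    def spectral_embedding(G, dims=2):
        """Map each node to `dims` coordinates taken from the eigenvectors
        of the normalized Laplacian, skipping the trivial first one."""
        L = nx.normalized_laplacian_matrix(G).toarray()
        _, eigenvectors = np.linalg.eigh(L)  # columns sorted by eigenvalue
        return {node: eigenvectors[i, 1:dims + 1]
                for i, node in enumerate(G.nodes())}

    G = nx.karate_club_graph()
    coordinates = spectral_embedding(G, dims=2)
    print(coordinates[0])
    ```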

  12. Spatial scan statistics for detection of multiple clusters with arbitrary shapes.

    PubMed

    Lin, Pei-Sheng; Kung, Yi-Hung; Clayton, Murray

    2016-12-01

    In applying scan statistics for public health research, it would be valuable to develop a detection method for multiple clusters that accommodates spatial correlation and covariate effects in an integrated model. In this article, we connect the concepts of the likelihood ratio (LR) scan statistic and the quasi-likelihood (QL) scan statistic to provide a series of detection procedures sufficiently flexible to apply to clusters of arbitrary shape. First, we use an independent scan model for detection of clusters and then a variogram tool to examine the existence of spatial correlation and regional variation based on residuals of the independent scan model. When the estimate of regional variation is significantly different from zero, a mixed QL estimating equation is developed to estimate coefficients of geographic clusters and covariates. We use the Benjamini-Hochberg procedure (1995) to find a threshold for p-values to address the multiple testing problem. A quasi-deviance criterion is used to regroup the estimated clusters to find geographic clusters with arbitrary shapes. We conduct simulations to compare the performance of the proposed method with other scan statistics. For illustration, the method is applied to enterovirus data from Taiwan. © 2016, The International Biometric Society.

  13. Dengue Fever Occurrence and Vector Detection by Larval Survey, Ovitrap and MosquiTRAP: A Space-Time Clusters Analysis

    PubMed Central

    de Melo, Diogo Portella Ornelas; Scherrer, Luciano Rios; Eiras, Álvaro Eduardo

    2012-01-01

    The use of vector surveillance tools for preventing dengue disease requires fine assessment of risk, in order to improve vector control activities. Nevertheless, the thresholds between vector detection and dengue fever occurrence are currently not well established. In Belo Horizonte (Minas Gerais, Brazil), dengue has been endemic for several years. From January 2007 to June 2008, the dengue vector Aedes (Stegomyia) aegypti was monitored by ovitrap, the sticky-trap MosquiTRAP™ and larval surveys in a study area in Belo Horizonte. Using a space-time scan for cluster detection implemented in the SaTScan software, the vector presence recorded by the different monitoring methods was evaluated. Clusters of vectors and dengue fever were detected. It was verified that the ovitrap and MosquiTRAP vector detection methods predicted dengue occurrence better than the larval survey, both spatially and temporally. MosquiTRAP and ovitrap presented similar numbers of space-time intersections with dengue fever clusters. Nevertheless, ovitrap clusters presented longer duration periods than MosquiTRAP ones, less accurately signalling the dengue risk areas, since the detection of vector clusters during most of the study period was not necessarily correlated with dengue fever occurrence. It was verified that ovitrap clusters occurred more than 200 days (values ranged from 97.0±35.35 to 283.0±168.4 days) before dengue fever clusters, whereas MosquiTRAP clusters preceded dengue fever clusters by approximately 80 days (values ranged from 65.5±58.7 to 94.0±14.3 days), the latter proving more temporally precise. Thus, in the present cluster analysis study MosquiTRAP presented superior results for signalling dengue transmission risks both geographically and temporally. Since early detection is crucial for planning and deploying effective preventions, MosquiTRAP proved to be a reliable tool, and this method provides groundwork for the development of even more precise tools. PMID:22848729

  14. Self-similarity Clustering Event Detection Based on Triggers Guidance

    NASA Astrophysics Data System (ADS)

    Zhang, Xianfei; Li, Bicheng; Tian, Yuxuan

    Traditional methods of Event Detection and Characterization (EDC) regard the event detection task as a classification problem. They use words as samples to train a classifier, which can lead to an imbalance between the classifier's positive and negative samples. Moreover, this approach suffers from data sparseness when the corpus is small. Rather than classifying events using words as samples, this paper clusters events when judging event types. It adopts self-similarity, guided by event triggers, to converge on the value of K in the K-means algorithm, and optimizes the clustering algorithm. Then, by combining named entities with their relative position information, the new method further pins down the exact event type. The new method avoids the dependence on event templates found in traditional methods, and its event detection results can be used in automatic text summarization, text retrieval, and topic detection and tracking.

  15. An improved method to detect correct protein folds using partial clustering.

    PubMed

    Zhou, Jianjun; Wishart, David S

    2013-01-16

    Structure-based clustering is commonly used to identify correct protein folds among candidate folds (also called decoys) generated by protein structure prediction programs. However, traditional clustering methods exhibit a poor runtime performance on large decoy sets. We hypothesized that a more efficient "partial" clustering approach in combination with an improved scoring scheme could significantly improve both the speed and performance of existing candidate selection methods. We propose a new scheme that performs rapid but incomplete clustering on protein decoys. Our method detects structurally similar decoys (measured using either C(α) RMSD or GDT-TS score) and extracts representatives from them without assigning every decoy to a cluster. We integrated our new clustering strategy with several different scoring functions to assess both the performance and speed in identifying correct or near-correct folds. Experimental results on 35 Rosetta decoy sets and 40 I-TASSER decoy sets show that our method can improve the correct fold detection rate as assessed by two different quality criteria. This improvement is significantly better than two recently published clustering methods, Durandal and Calibur-lite. Speed and efficiency testing shows that our method can handle much larger decoy sets and is up to 22 times faster than Durandal and Calibur-lite. The new method, named HS-Forest, avoids the computationally expensive task of clustering every decoy, yet still allows superior correct-fold selection. Its improved speed, efficiency and decoy-selection performance should enable structure prediction researchers to work with larger decoy sets and significantly improve their ab initio structure prediction performance.
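    The sketch below is a much-simplified stand-in for the partial-clustering idea, not the published HS-Forest algorithm: a random subset of pivot decoys is scored by the size of its RMSD neighbourhood, so no full pairwise clustering of every decoy is required. The `rmsd` callable, the cutoff and the toy one-dimensional "structures" are assumptions for illustration.

    ```python
    import random

    def partial_cluster(decoys, rmsd, cutoff=3.5, n_pivots=30, seed=0):
        """Score a random subset of pivot decoys by how many decoys fall
        within `cutoff` of them and return the densest neighbourhood;
        cost is O(n_pivots * n) rather than O(n^2) full clustering."""
        rng = random.Random(seed)
        pivots = rng.sample(range(len(decoys)), min(n_pivots, len(decoys)))
        neighbourhoods = {
            p: [i for i in range(len(decoys)) if rmsd(decoys[p], decoys[i]) <= cutoff]
            for p in pivots
        }
        best = max(neighbourhoods, key=lambda p: len(neighbourhoods[p]))
        return best, neighbourhoods[best]

    # toy "decoys": numbers standing in for structures, |a - b| standing in for RMSD
    random.seed(7)
    decoys = [random.gauss(10, 1) for _ in range(200)] + \
             [random.gauss(25, 4) for _ in range(100)]
    index, members = partial_cluster(decoys, rmsd=lambda a, b: abs(a - b), cutoff=1.5)
    print(index, len(members))
    ```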

  16. An improved method to detect correct protein folds using partial clustering

    PubMed Central

    2013-01-01

    Background: Structure-based clustering is commonly used to identify correct protein folds among candidate folds (also called decoys) generated by protein structure prediction programs. However, traditional clustering methods exhibit a poor runtime performance on large decoy sets. We hypothesized that a more efficient “partial” clustering approach in combination with an improved scoring scheme could significantly improve both the speed and performance of existing candidate selection methods. Results: We propose a new scheme that performs rapid but incomplete clustering on protein decoys. Our method detects structurally similar decoys (measured using either Cα RMSD or GDT-TS score) and extracts representatives from them without assigning every decoy to a cluster. We integrated our new clustering strategy with several different scoring functions to assess both the performance and speed in identifying correct or near-correct folds. Experimental results on 35 Rosetta decoy sets and 40 I-TASSER decoy sets show that our method can improve the correct fold detection rate as assessed by two different quality criteria. This improvement is significantly better than two recently published clustering methods, Durandal and Calibur-lite. Speed and efficiency testing shows that our method can handle much larger decoy sets and is up to 22 times faster than Durandal and Calibur-lite. Conclusions: The new method, named HS-Forest, avoids the computationally expensive task of clustering every decoy, yet still allows superior correct-fold selection. Its improved speed, efficiency and decoy-selection performance should enable structure prediction researchers to work with larger decoy sets and significantly improve their ab initio structure prediction performance. PMID:23323835

  17. Searching Remote Homology with Spectral Clustering with Symmetry in Neighborhood Cluster Kernels

    PubMed Central

    Maulik, Ujjwal; Sarkar, Anasua

    2013-01-01

    Remote homology detection among proteins utilizing only the unlabelled sequences is a central problem in comparative genomics. The existing cluster kernel methods based on neighborhoods and profiles and the Markov clustering algorithms are currently the most popular methods for protein family recognition. The deviation from random walks with inflation or dependency on a hard threshold in the similarity measure in those methods requires an enhancement for homology detection among multi-domain proteins. We propose to combine spectral clustering with neighborhood kernels in Markov similarity for enhancing sensitivity in detecting homology independent of “recent” paralogs. The spectral clustering approach with new combined local alignment kernels more effectively exploits the unsupervised protein sequences globally, reducing inter-cluster walks. When combined with the corrections based on a modified symmetry based proximity norm deemphasizing outliers, the technique proposed in this article outperforms other state-of-the-art cluster kernels among all twelve implemented kernels. The comparison with the state-of-the-art string and mismatch kernels also shows the superior performance scores provided by the proposed kernels. A similar performance improvement is also found on an existing large dataset. Therefore the proposed spectral clustering framework over combined local alignment kernels with modified symmetry based correction achieves superior performance for unsupervised remote homolog detection even in multi-domain and promiscuous domain proteins from Genolevures database families with better biological relevance. Source code available upon request. Contact: sarkar@labri.fr. PMID:23457439

  18. Searching remote homology with spectral clustering with symmetry in neighborhood cluster kernels.

    PubMed

    Maulik, Ujjwal; Sarkar, Anasua

    2013-01-01

    Remote homology detection among proteins utilizing only the unlabelled sequences is a central problem in comparative genomics. The existing cluster kernel methods based on neighborhoods and profiles and the Markov clustering algorithms are currently the most popular methods for protein family recognition. The deviation from random walks with inflation or dependency on hard threshold in similarity measure in those methods requires an enhancement for homology detection among multi-domain proteins. We propose to combine spectral clustering with neighborhood kernels in Markov similarity for enhancing sensitivity in detecting homology independent of "recent" paralogs. The spectral clustering approach with new combined local alignment kernels more effectively exploits the unsupervised protein sequences globally reducing inter-cluster walks. When combined with the corrections based on modified symmetry based proximity norm deemphasizing outliers, the technique proposed in this article outperforms other state-of-the-art cluster kernels among all twelve implemented kernels. The comparison with the state-of-the-art string and mismatch kernels also show the superior performance scores provided by the proposed kernels. Similar performance improvement also is found over an existing large dataset. Therefore the proposed spectral clustering framework over combined local alignment kernels with modified symmetry based correction achieves superior performance for unsupervised remote homolog detection even in multi-domain and promiscuous domain proteins from Genolevures database families with better biological relevance. Source code available upon request. sarkar@labri.fr.

  19. Weighted community detection and data clustering using message passing

    NASA Astrophysics Data System (ADS)

    Shi, Cheng; Liu, Yanchen; Zhang, Pan

    2018-03-01

    Grouping objects into clusters based on the similarities or weights between them is one of the most important problems in science and engineering. In this work, by extending message-passing algorithms and spectral algorithms proposed for an unweighted community detection problem, we develop a non-parametric method based on statistical physics, by mapping the problem to the Potts model at the critical temperature of spin-glass transition and applying belief propagation to solve the marginals corresponding to the Boltzmann distribution. Our algorithm is robust to over-fitting and gives a principled way to determine whether there are significant clusters in the data and how many clusters there are. We apply our method to different clustering tasks. In the community detection problem in weighted and directed networks, we show that our algorithm significantly outperforms existing algorithms. In the clustering problem, where the data were generated by mixture models in the sparse regime, we show that our method works all the way down to the theoretical limit of detectability and gives accuracy very close to that of the optimal Bayesian inference. In the semi-supervised clustering problem, our method only needs several labels to work perfectly in classic datasets. Finally, we further develop Thouless-Anderson-Palmer equations which heavily reduce the computation complexity in dense networks but give almost the same performance as belief propagation.

  20. Detection and Characterization of Galaxy Systems at Intermediate Redshift.

    NASA Astrophysics Data System (ADS)

    Barrena, Rafael

    2004-11-01

    This thesis is divided into two closely related parts. In the first part we implement and apply a galaxy cluster detection method based on multiband observations in the visible. For this purpose, we use a new algorithm, the Voronoi Galaxy Cluster Finder, which identifies overdensities over a Poissonian field of objects. By applying this algorithm over four photometric bands (B, V, R and I) we reduce the possibility of detecting galaxy projection effects and spurious detections instead of real galaxy clusters. The B, V, R and I photometry allows a good characterization of galaxy systems. Therefore, we analyze the colour and early-type sequences in the colour-magnitude diagrams of the detected clusters. This analysis helps us to confirm the selected candidates as actual galaxy systems. In addition, by comparing observational early-type sequences with a semiempirical model we can estimate a photometric redshift for the detected clusters. We apply this detection method to four 0.5x0.5 square degree areas that partially overlap the Postman Distant Cluster Survey (PDCS). The observations were performed as part of the International Time Programme 1999-B using the Wide Field Camera mounted at the Isaac Newton Telescope (Roque de los Muchachos Observatory, La Palma island, Spain). The B and R data obtained were completed with V and I photometry performed by Marc Postman. The comparison of our cluster catalogue with that of the PDCS reveals that our work is a clear improvement in cluster detection techniques. Our method efficiently selects galaxy clusters, in particular low mass galaxy systems, even at relatively high redshift, and estimates a precise photometric redshift. The validation of our method comes from spectroscopic observations of several selected candidates. By comparing photometric and spectroscopic redshifts we conclude: 1) our photometric estimation method gives a precision lower than 0.1; 2) our detection technique is even able to detect galaxy systems at z~0.7 using visible photometric bands. In the second part of this thesis we analyze in detail the dynamical state of 1E0657-56 (z=0.296), a hot galaxy cluster with strong X-ray and radio emissions. Using spectroscopic and photometric observations in the visible (obtained with the New Technology Telescope and the Very Large Telescope, both located at La Silla Observatory, Chile) we analyze the velocity field, morphology, colour and star formation in the galaxy population of this cluster. 1E0657-56 is involved in a collision event. We identify the substructure involved in this collision and we propose a dynamical model that allows us to investigate the origins of the X-ray and radio emissions and the relation between them. The analysis of 1E0657-56 presented in this thesis constitutes a good example of what kind of properties could be studied in some of the clusters catalogued in the first part of this thesis. In addition, the detailed analysis of this cluster represents an improvement in the study of the origin of X-ray and radio emissions and merging processes in galaxy clusters.

  1. Unsupervised Learning —A Novel Clustering Method for Rolling Bearing Faults Identification

    NASA Astrophysics Data System (ADS)

    Kai, Li; Bo, Luo; Tao, Ma; Xuefeng, Yang; Guangming, Wang

    2017-12-01

    To promptly process massive fault data and automatically provide accurate diagnosis results, numerous studies have been conducted on intelligent fault diagnosis of rolling bearings. Among these studies, supervised learning methods such as artificial neural networks, support vector machines and decision trees are commonly used. These methods can detect rolling bearing failures effectively, but achieving good detection results often requires a large number of training samples. Motivated by this, a novel clustering method is proposed in this paper. The method is able to find the correct number of clusters automatically, and its effectiveness is validated using datasets from rolling element bearings. The diagnosis results show that the proposed method can accurately detect the fault types from small samples, while maintaining relatively high accuracy even for massive samples.

  2. Clustering approaches to feature change detection

    NASA Astrophysics Data System (ADS)

    G-Michael, Tesfaye; Gunzburger, Max; Peterson, Janet

    2018-05-01

    The automated detection of changes occurring between multi-temporal images is of significant importance in a wide range of medical, environmental, safety, as well as many other settings. The usage of k-means clustering is explored as a means for detecting objects added to a scene. The silhouette score for the clustering is used to define the optimal number of clusters that should be used. For simple images having a limited number of colors, new objects can be detected by examining the change between the optimal number of clusters for the original and modified images. For more complex images, new objects may need to be identified by examining the relative areas covered by corresponding clusters in the original and modified images. Which method is preferable depends on the composition and range of colors present in the images. In addition to describing the clustering and change detection methodology of our proposed approach, we provide some simple illustrations of its application.
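
    As a rough illustration of the procedure described above, the sketch below uses scikit-learn's KMeans and the silhouette score to pick the number of colour clusters for an image; comparing the optimal counts of the original and modified images corresponds to the simple-image case. The function name, the candidate range of k and the subsampling size are illustrative choices, not taken from the paper.

      import numpy as np
      from sklearn.cluster import KMeans
      from sklearn.metrics import silhouette_score

      def optimal_k(pixels, k_values=range(2, 9), sample=5000, seed=0):
          """Return the number of clusters that maximises the silhouette score
          for an (n_pixels, 3) array of RGB values."""
          rng = np.random.default_rng(seed)
          idx = rng.choice(len(pixels), size=min(sample, len(pixels)), replace=False)
          scores = {}
          for k in k_values:
              labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(pixels)
              scores[k] = silhouette_score(pixels[idx], labels[idx])  # subsample for speed
          return max(scores, key=scores.get)

      # For simple scenes, a new object is suggested when the optimal cluster count changes:
      # changed = optimal_k(original.reshape(-1, 3)) != optimal_k(modified.reshape(-1, 3))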

  3. The effect of different distance measures in detecting outliers using clustering-based algorithm for circular regression model

    NASA Astrophysics Data System (ADS)

    Di, Nur Faraidah Muhammad; Satari, Siti Zanariah

    2017-05-01

    Outlier detection in linear data sets has been studied extensively, but only a small amount of work has been done on outlier detection in circular data. In this study, we propose multiple-outlier detection for circular regression models based on a clustering algorithm. Clustering techniques use a distance measure to define the distance between data points. Here, we introduce a similarity distance based on the Euclidean distance for the circular model and obtain a cluster tree using the single linkage clustering algorithm. Then, a stopping rule for the cluster tree based on the mean direction and circular standard deviation of the tree height is proposed. We classify cluster groups that exceed the stopping rule as potential outliers. Our aim is to demonstrate the effectiveness of the proposed algorithms with the similarity distances in detecting the outliers. It is found that the proposed methods perform well and are applicable to the circular regression model.
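
    A minimal sketch of the clustering step described above, assuming angles in radians: it uses a chord-length ("Euclidean-type") distance between angles, single-linkage clustering from SciPy, and a simple mean-plus-c-standard-deviations cut on the merge heights as a stand-in for the paper's mean-direction/circular-standard-deviation stopping rule. Function and parameter names are illustrative.

      import numpy as np
      from scipy.cluster.hierarchy import linkage, fcluster
      from scipy.spatial.distance import squareform

      def circular_outlier_candidates(theta, c=3.0):
          """Flag observations that fall outside the main single-linkage cluster."""
          diff = theta[:, None] - theta[None, :]
          d = np.sqrt(2.0 * (1.0 - np.cos(diff)))        # chord distance between angles
          Z = linkage(squareform(d, checks=False), method="single")
          heights = Z[:, 2]
          cutoff = heights.mean() + c * heights.std()    # stand-in for the stopping rule
          labels = fcluster(Z, t=cutoff, criterion="distance")
          main_cluster = np.bincount(labels).argmax()
          return np.where(labels != main_cluster)[0]     # indices of potential outliers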

  4. Evaluation of the Gini Coefficient in Spatial Scan Statistics for Detecting Irregularly Shaped Clusters

    PubMed Central

    Kim, Jiyu; Jung, Inkyung

    2017-01-01

    Spatial scan statistics with circular or elliptic scanning windows are commonly used for cluster detection in various applications, such as the identification of geographical disease clusters from epidemiological data. It has been pointed out that the method may have difficulty in correctly identifying non-compact, arbitrarily shaped clusters. In this paper, we evaluated the Gini coefficient for detecting irregularly shaped clusters through a simulation study. The Gini coefficient, the use of which in spatial scan statistics was recently proposed, is a criterion measure for optimizing the maximum reported cluster size. Our simulation study results showed that using the Gini coefficient works better than the original spatial scan statistic for identifying irregularly shaped clusters, by reporting an optimized and refined collection of clusters rather than a single larger cluster. We have provided a real data example that seems to support the simulation results. We think that using the Gini coefficient in spatial scan statistics can be helpful for the detection of irregularly shaped clusters. PMID:28129368
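
    For readers unfamiliar with the criterion, the sketch below shows one common way to compute a Gini coefficient from the case and population counts of the clusters reported at a given maximum-cluster-size setting; the setting that maximises this coefficient would then be preferred. This is a generic illustration, not the authors' code, and the function name is hypothetical.

      import numpy as np

      def gini_coefficient(cases, population):
          """Gini coefficient of the Lorenz curve of cases versus population across
          the reported clusters (larger values indicate a more refined collection)."""
          cases = np.asarray(cases, dtype=float)
          population = np.asarray(population, dtype=float)
          order = np.argsort(cases / population)                      # sort clusters by risk
          c = np.concatenate(([0.0], np.cumsum(cases[order]) / cases.sum()))
          p = np.concatenate(([0.0], np.cumsum(population[order]) / population.sum()))
          return 1.0 - np.sum((p[1:] - p[:-1]) * (c[1:] + c[:-1]))    # 1 - 2 * area under the curve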

  5. A new method to search for high-redshift clusters using photometric redshifts

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Castignani, G.; Celotti, A.; Chiaberge, M.

    2014-09-10

    We describe a new method (Poisson probability method, PPM) to search for high-redshift galaxy clusters and groups by using photometric redshift information and galaxy number counts. The method relies on Poisson statistics and is primarily introduced to search for megaparsec-scale environments around a specific beacon. The PPM is tailored to both the properties of the FR I radio galaxies in the Chiaberge et al. sample, which are selected within the COSMOS survey, and to the specific data set used. We test the efficiency of our method of searching for cluster candidates against simulations. Two different approaches are adopted. (1) We use two z ∼ 1 X-ray detected cluster candidates found in the COSMOS survey and we shift them to higher redshift up to z = 2. We find that the PPM detects the cluster candidates up to z = 1.5, and it correctly estimates both the redshift and size of the two clusters. (2) We simulate spherically symmetric clusters of different size and richness, and we locate them at different redshifts (i.e., z = 1.0, 1.5, and 2.0) in the COSMOS field. We find that the PPM detects the simulated clusters within the considered redshift range with a statistical 1σ redshift accuracy of ∼0.05. The PPM is an efficient alternative method for high-redshift cluster searches that may also be applied to both present and future wide field surveys such as SDSS Stripe 82, LSST, and Euclid. Accurate photometric redshifts and a survey depth similar to or better than that of COSMOS (e.g., I < 25) are required.
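
    The core of a Poisson-statistics overdensity test can be sketched in a few lines: given a mean field surface density, the tail probability of the observed galaxy count in a cell flags candidate megaparsec-scale overdensities. The function below is only a schematic of that idea, not the tailored PPM implementation; its names and the 1e-4 threshold in the comment are illustrative.

      from scipy.stats import poisson

      def overdensity_pvalue(n_obs, cell_area, field_density):
          """Probability of counting at least n_obs galaxies in a cell of the given
          area if only the mean field density were present."""
          mu = field_density * cell_area
          return poisson.sf(n_obs - 1, mu)        # P(N >= n_obs) under Poisson(mu)

      # Example: flag a cell as a candidate overdensity when this probability is tiny.
      # is_candidate = overdensity_pvalue(42, cell_area, field_density) < 1e-4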

  6. Comparison of Molecular Typing Methods Useful for Detecting Clusters of Campylobacter jejuni and C. coli Isolates through Routine Surveillance

    PubMed Central

    Taboada, Eduardo; Grant, Christopher C. R.; Blakeston, Connie; Pollari, Frank; Marshall, Barbara; Rahn, Kris; MacKinnon, Joanne; Daignault, Danielle; Pillai, Dylan; Ng, Lai-King

    2012-01-01

    Campylobacter spp. may be responsible for unreported outbreaks of food-borne disease. The detection of these outbreaks is made more difficult by the fact that appropriate methods for detecting clusters of Campylobacter have not been well defined. We have compared the characteristics of five molecular typing methods on Campylobacter jejuni and C. coli isolates obtained from human and nonhuman sources during sentinel site surveillance during a 3-year period. Comparative genomic fingerprinting (CGF) appears to be one of the optimal methods for the detection of clusters of cases, and it could be supplemented by the sequencing of the flaA gene short variable region (flaA SVR sequence typing), with or without subsequent multilocus sequence typing (MLST). Different methods may be optimal for uncovering different aspects of source attribution. Finally, the use of several different molecular typing or analysis methods for comparing individuals within a population reveals much more about that population than a single method. Similarly, comparing several different typing methods reveals a great deal about differences in how the methods group individuals within the population. PMID:22162562

  7. Segmentation of clustered cells in negative phase contrast images with integrated light intensity and cell shape information.

    PubMed

    Wang, Y; Wang, C; Zhang, Z

    2018-05-01

    Automated cell segmentation plays a key role in the characterisation of cell behaviours in both biological research and clinical practice. Currently, the segmentation of clustered cells remains a challenge and is the main cause of false segmentation. In this study, the emphasis was put on the segmentation of clustered cells in negative phase contrast images. A new method was proposed to combine both light intensity and cell shape information through the construction of a grey-weighted distance transform (GWDT) within preliminarily segmented areas. With the constructed GWDT, the clustered cells can be detected and then separated with a modified region skeleton-based method. Moreover, a contour expansion operation was applied to obtain optimised detection of cell boundaries. In this paper, the working principle and detailed procedure of the proposed method are described, followed by the evaluation of the method on clustered cell segmentation. Results show that the proposed method achieves improved performance in clustered cell segmentation compared with other methods, with accuracy rates of 85.8% and 97.16% for clustered cells and all cells, respectively. © 2017 The Authors Journal of Microscopy © 2017 Royal Microscopical Society.
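
    The paper's grey-weighted distance transform is not a standard library call; as a loose approximation of the same splitting step, the sketch below weights a Euclidean distance transform by the inverted intensity and separates touching cells with a marker-based watershed (SciPy/scikit-image). The parameter names and the weighting choice are assumptions, and the result will differ from the published GWDT/region-skeleton method.

      import numpy as np
      from scipy import ndimage as ndi
      from skimage.feature import peak_local_max
      from skimage.segmentation import watershed

      def split_clustered_cells(binary_mask, intensity, min_distance=10):
          """Split touching cells inside a preliminary binary segmentation."""
          edt = ndi.distance_transform_edt(binary_mask)
          weighted = edt * (intensity.max() - intensity)          # crude intensity-weighted surface
          coords = peak_local_max(weighted, min_distance=min_distance,
                                  labels=binary_mask.astype(int))
          markers = np.zeros(binary_mask.shape, dtype=int)
          markers[tuple(coords.T)] = np.arange(1, len(coords) + 1)
          return watershed(-weighted, markers, mask=binary_mask)  # one label per separated cell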

  8. Multiple Imputation based Clustering Validation (MIV) for Big Longitudinal Trial Data with Missing Values in eHealth.

    PubMed

    Zhang, Zhaoyang; Fang, Hua; Wang, Honggang

    2016-06-01

    Web-delivered trials are an important component in eHealth services. These trials, mostly behavior-based, generate big heterogeneous data that are longitudinal, high dimensional, with missing values. Unsupervised learning methods have been widely applied in this area; however, validating the optimal number of clusters has been challenging. Built upon our multiple imputation (MI) based fuzzy clustering, MIfuzzy, we proposed a new multiple imputation based validation (MIV) framework and corresponding MIV algorithms for clustering big longitudinal eHealth data with missing values, more generally for fuzzy-logic based clustering methods. Specifically, we detect the optimal number of clusters by auto-searching and -synthesizing a suite of MI-based validation methods and indices, including conventional (bootstrap or cross-validation based) and emerging (modularity-based) validation indices for general clustering methods as well as the specific one (Xie and Beni) for fuzzy clustering. The MIV performance was demonstrated on a big longitudinal dataset from a real web-delivered trial and using simulation. The results indicate that the MI-based Xie and Beni index for fuzzy clustering is more appropriate for detecting the optimal number of clusters for such complex data. The MIV concept and algorithms could be easily adapted to different types of clustering that could process big incomplete longitudinal trial data in eHealth services.
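
    Since the abstract singles out the Xie and Beni index, a bare-bones version of that index is sketched below; in an MI-based validation one would average it over the imputed datasets and pick the number of clusters that minimises the mean. The membership matrix U and fuzzifier m follow the usual fuzzy c-means conventions; this is not the MIV code itself.

      import numpy as np

      def xie_beni_index(X, centers, U, m=2.0):
          """Xie-Beni index: fuzzy within-cluster compactness divided by
          (n * minimum squared separation between cluster centres); lower is better."""
          d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)   # (n, c) squared distances
          compactness = np.sum((U ** m) * d2)
          sep = ((centers[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
          sep[np.eye(len(centers), dtype=bool)] = np.inf                  # ignore self-separation
          return compactness / (len(X) * sep.min())

      # MI-style use (sketch): average xie_beni_index over the imputed datasets for each
      # candidate number of clusters and keep the value of c with the smallest mean index.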

  9. Multiple Imputation based Clustering Validation (MIV) for Big Longitudinal Trial Data with Missing Values in eHealth

    PubMed Central

    Zhang, Zhaoyang; Wang, Honggang

    2016-01-01

    Web-delivered trials are an important component in eHealth services. These trials, mostly behavior-based, generate big heterogeneous data that are longitudinal, high dimensional with missing values. Unsupervised learning methods have been widely applied in this area, however, validating the optimal number of clusters has been challenging. Built upon our multiple imputation (MI) based fuzzy clustering, MIfuzzy, we proposed a new multiple imputation based validation (MIV) framework and corresponding MIV algorithms for clustering big longitudinal eHealth data with missing values, more generally for fuzzy-logic based clustering methods. Specifically, we detect the optimal number of clusters by auto-searching and -synthesizing a suite of MI-based validation methods and indices, including conventional (bootstrap or cross-validation based) and emerging (modularity-based) validation indices for general clustering methods as well as the specific one (Xie and Beni) for fuzzy clustering. The MIV performance was demonstrated on a big longitudinal dataset from a real web-delivered trial and using simulation. The results indicate MI-based Xie and Beni index for fuzzy-clustering is more appropriate for detecting the optimal number of clusters for such complex data. The MIV concept and algorithms could be easily adapted to different types of clustering that could process big incomplete longitudinal trial data in eHealth services. PMID:27126063

  10. Automatic video shot boundary detection using k-means clustering and improved adaptive dual threshold comparison

    NASA Astrophysics Data System (ADS)

    Sa, Qila; Wang, Zhihui

    2018-03-01

    At present, content-based video retrieval (CBVR) is the mainstream video retrieval approach, using a video's own features to perform automatic identification and retrieval. This approach involves a key technology, namely shot segmentation. In this paper, a method for automatic video shot boundary detection with K-means clustering and improved adaptive dual threshold comparison is proposed. First, the visual features of every frame are extracted and divided into two categories using the K-means clustering algorithm, namely frames with significant change and frames with no significant change. Then, the improved adaptive dual threshold comparison method is applied to the classification results to determine both abrupt and gradual shot boundaries. Finally, an automatic video shot boundary detection system is obtained.
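
    A compressed sketch of the pipeline described above, assuming greyscale frames: histogram differences between consecutive frames are split into "significant" and "insignificant" groups with 2-means, and a single threshold inside the significant group stands in for the paper's adaptive dual thresholds (which also separate gradual from abrupt transitions). All names and simplifications are our own.

      import numpy as np
      from sklearn.cluster import KMeans

      def candidate_shot_boundaries(frames, bins=16):
          """Return frame indices that likely start a new shot, for 2-D greyscale frames."""
          hists = np.stack([np.histogram(f, bins=bins, range=(0, 255))[0] for f in frames]).astype(float)
          hists /= hists.sum(axis=1, keepdims=True)
          diffs = np.abs(np.diff(hists, axis=0)).sum(axis=1)              # frame-to-frame change
          labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(diffs.reshape(-1, 1))
          significant = labels == labels[np.argmax(diffs)]                # cluster holding the largest jump
          threshold = diffs[significant].mean()                           # stand-in for the dual thresholds
          return [i + 1 for i, d in enumerate(diffs) if significant[i] and d >= threshold]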

  11. Comparison of tests for spatial heterogeneity on data with global clustering patterns and outliers

    PubMed Central

    Jackson, Monica C; Huang, Lan; Luo, Jun; Hachey, Mark; Feuer, Eric

    2009-01-01

    Background The ability to evaluate geographic heterogeneity of cancer incidence and mortality is important in cancer surveillance. Many statistical methods for evaluating global clustering and local cluster patterns are developed and have been examined by many simulation studies. However, the performance of these methods on two extreme cases (global clustering evaluation and local anomaly (outlier) detection) has not been thoroughly investigated. Methods We compare methods for global clustering evaluation including Tango's Index, Moran's I, and Oden's I*pop; and cluster detection methods such as local Moran's I and SaTScan elliptic version on simulated count data that mimic global clustering patterns and outliers for cancer cases in the continental United States. We examine the power and precision of the selected methods in the purely spatial analysis. We illustrate Tango's MEET and SaTScan elliptic version on a 1987-2004 HIV and a 1950-1969 lung cancer mortality data in the United States. Results For simulated data with outlier patterns, Tango's MEET, Moran's I and I*pop had powers less than 0.2, and SaTScan had powers around 0.97. For simulated data with global clustering patterns, Tango's MEET and I*pop (with 50% of total population as the maximum search window) had powers close to 1. SaTScan had powers around 0.7-0.8 and Moran's I has powers around 0.2-0.3. In the real data example, Tango's MEET indicated the existence of global clustering patterns in both the HIV and lung cancer mortality data. SaTScan found a large cluster for HIV mortality rates, which is consistent with the finding from Tango's MEET. SaTScan also found clusters and outliers in the lung cancer mortality data. Conclusion SaTScan elliptic version is more efficient for outlier detection compared with the other methods evaluated in this article. Tango's MEET and Oden's I*pop perform best in global clustering scenarios among the selected methods. The use of SaTScan for data with global clustering patterns should be used with caution since SatScan may reveal an incorrect spatial pattern even though it has enough power to reject a null hypothesis of homogeneous relative risk. Tango's method should be used for global clustering evaluation instead of SaTScan. PMID:19822013
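
    Of the statistics compared above, global Moran's I is the simplest to state; a minimal implementation is sketched below for a vector of regional rates and a spatial weight matrix. This is only meant to make the quantity concrete and is not the simulation code used in the study.

      import numpy as np

      def morans_i(x, W):
          """Global Moran's I for regional values x and spatial weight matrix W
          (binary adjacency or row-standardised weights)."""
          x = np.asarray(x, dtype=float)
          z = x - x.mean()
          return len(x) / W.sum() * (z @ W @ z) / (z @ z)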

  12. Performance map of a cluster detection test using extended power

    PubMed Central

    2013-01-01

    Background Conventional power studies possess limited ability to assess the performance of cluster detection tests. In particular, they cannot evaluate the accuracy of the cluster location, which is essential in such assessments. Furthermore, they usually estimate power for one or a few particular alternative hypotheses and thus cannot assess performance over an entire region. Takahashi and Tango developed the concept of extended power that indicates both the rate of null hypothesis rejection and the accuracy of the cluster location. We propose a systematic assessment method, using here extended power, to produce a map showing the performance of cluster detection tests over an entire region. Methods To explore the behavior of a cluster detection test on identical cluster types at any possible location, we successively applied four different spatial and epidemiological parameters. These parameters determined four cluster collections, each covering the entire study region. We simulated 1,000 datasets for each cluster and analyzed them with Kulldorff’s spatial scan statistic. From the area under the extended power curve, we constructed a map for each parameter set showing the performance of the test across the entire region. Results Consistent with previous studies, the performance of the spatial scan statistic increased with the baseline incidence of disease, the size of the at-risk population and the strength of the cluster (i.e., the relative risk). Performance was heterogeneous, however, even for very similar clusters (i.e., similar with respect to the aforementioned factors), suggesting the influence of other factors. Conclusions The area under the extended power curve is a single measure of performance and, although needing further exploration, it is suitable to conduct a systematic spatial evaluation of performance. The performance map we propose enables epidemiologists to assess cluster detection tests across an entire study region. PMID:24156765

  13. Evaluating and implementing temporal, spatial, and spatio-temporal methods for outbreak detection in a local syndromic surveillance system.

    PubMed

    Mathes, Robert W; Lall, Ramona; Levin-Rector, Alison; Sell, Jessica; Paladini, Marc; Konty, Kevin J; Olson, Don; Weiss, Don

    2017-01-01

    The New York City Department of Health and Mental Hygiene has operated an emergency department syndromic surveillance system since 2001, using temporal and spatial scan statistics run on a daily basis for cluster detection. Since the system was originally implemented, a number of new methods have been proposed for use in cluster detection. We evaluated six temporal and four spatial/spatio-temporal detection methods using syndromic surveillance data spiked with simulated injections. The algorithms were compared on several metrics, including sensitivity, specificity, positive predictive value, coherence, and timeliness. We also evaluated each method's implementation, programming time, run time, and the ease of use. Among the temporal methods, at a set specificity of 95%, a Holt-Winters exponential smoother performed the best, detecting 19% of the simulated injects across all shapes and sizes, followed by an autoregressive moving average model (16%), a generalized linear model (15%), a modified version of the Early Aberration Reporting System's C2 algorithm (13%), a temporal scan statistic (11%), and a cumulative sum control chart (<2%). Of the spatial/spatio-temporal methods we tested, a spatial scan statistic detected 3% of all injects, a Bayes regression found 2%, and a generalized linear mixed model and a space-time permutation scan statistic detected none at a specificity of 95%. Positive predictive value was low (<7%) for all methods. Overall, the detection methods we tested did not perform well in identifying the temporal and spatial clusters of cases in the inject dataset. The spatial scan statistic, our current method for spatial cluster detection, performed slightly better than the other tested methods across different inject magnitudes and types. Furthermore, we found the scan statistics, as applied in the SaTScan software package, to be the easiest to program and implement for daily data analysis.
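
    As an illustration of the best-performing temporal method, the sketch below fits a Holt-Winters exponential smoother (statsmodels) to a daily syndromic count series and flags days whose residuals exceed a z-score threshold. This retrospective, whole-series fit is a simplification of the prospective day-by-day evaluation in the study, and the threshold and weekly seasonality are assumptions.

      import numpy as np
      import pandas as pd
      from statsmodels.tsa.holtwinters import ExponentialSmoothing

      def holt_winters_alarms(counts, z_threshold=3.0, seasonal_periods=7):
          """Indices of days whose observed count exceeds the fitted value by more
          than z_threshold standard deviations of the residuals."""
          y = pd.Series(counts, dtype=float)
          fit = ExponentialSmoothing(y, trend="add", seasonal="add",
                                     seasonal_periods=seasonal_periods).fit()
          resid = y - fit.fittedvalues
          z = resid / resid.std(ddof=1)
          return np.where(z > z_threshold)[0]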

  14. Three-Dimensional Computer-Aided Detection of Microcalcification Clusters in Digital Breast Tomosynthesis.

    PubMed

    Jeong, Ji-Wook; Chae, Seung-Hoon; Chae, Eun Young; Kim, Hak Hee; Choi, Young-Wook; Lee, Sooyeul

    2016-01-01

    We propose a computer-aided detection (CADe) algorithm for microcalcification (MC) clusters in reconstructed digital breast tomosynthesis (DBT) images. The algorithm consists of prescreening, MC detection, clustering, and false-positive (FP) reduction steps. The DBT images containing the MC-like objects were enhanced by a multiscale Hessian-based three-dimensional (3D) objectness response function, and a connected-component segmentation method was applied to extract the cluster seed objects as potential clustering centers of MCs. In addition, a signal-to-noise ratio (SNR) enhanced image was generated to detect the individual MC candidates and prescreen the MC-like objects. Each cluster seed candidate was prescreened by counting the individual MC candidates near the seed object according to several microcalcification clustering criteria. We then introduced bounding boxes for the accepted seed candidates, merged the overlapping cubes, and examined them. After the FP reduction step, the average number of FPs was estimated to be 2.47 per DBT volume at a sensitivity of 83.3%.

  15. Cluster signal-to-noise analysis for evaluation of the information content in an image.

    PubMed

    Weerawanich, Warangkana; Shimizu, Mayumi; Takeshita, Yohei; Okamura, Kazutoshi; Yoshida, Shoko; Yoshiura, Kazunori

    2018-01-01

    (1) To develop an observer-free method of analysing image quality related to the observer performance in the detection task and (2) to analyse observer behaviour patterns in the detection of small mass changes in cone-beam CT images. 13 observers detected holes in a Teflon phantom in cone-beam CT images. Using the same images, we developed a new method, cluster signal-to-noise analysis, to detect the holes by applying various cut-off values using ImageJ and reconstructing cluster signal-to-noise curves. We then evaluated the correlation between cluster signal-to-noise analysis and the observer performance test. We measured the background noise in each image to evaluate the relationship with false positive rates (FPRs) of the observers. Correlations between mean FPRs and intra- and interobserver variations were also evaluated. Moreover, we calculated true positive rates (TPRs) and accuracies from background noise and evaluated their correlations with TPRs from observers. Cluster signal-to-noise curves were derived in cluster signal-to-noise analysis. They yield the detection of signals (true holes) related to noise (false holes). This method correlated highly with the observer performance test (R² = 0.9296). In noisy images, increasing background noise resulted in higher FPRs and larger intra- and interobserver variations. TPRs and accuracies calculated from background noise had high correlation with actual TPRs from observers; R² was 0.9244 and 0.9338, respectively. Cluster signal-to-noise analysis can simulate the detection performance of observers and thus replace the observer performance test in the evaluation of image quality. Erroneous decision-making increased with increasing background noise.

  16. Detection of the power lines in UAV remote sensed images using spectral-spatial methods.

    PubMed

    Bhola, Rishav; Krishna, Nandigam Hari; Ramesh, K N; Senthilnath, J; Anand, Gautham

    2018-01-15

    In this paper, detection of power lines in images acquired by Unmanned Aerial Vehicle (UAV) based remote sensing is carried out using spectral-spatial methods. Spectral clustering was performed using the K-means and Expectation Maximization (EM) algorithms to classify the pixels into power-line and non-power-line classes. The spectral clustering methods used in this study are parametric in nature; to automate the choice of the number of clusters, the Davies-Bouldin index (DBI) is used. The UAV remote sensed image is clustered into the number of clusters determined by the DBI, and the k-clustered image is then merged into 2 clusters (power lines and non-power lines). Further, spatial segmentation was performed using morphological and geometric operations to eliminate the non-power-line regions. In this study, UAV images acquired at different altitudes and angles were analyzed to validate the robustness of the proposed method. It was observed that EM with spatial segmentation (EM-Seg) performed better than K-means with spatial segmentation (Kmeans-Seg) on most of the UAV images. Copyright © 2017 Elsevier Ltd. All rights reserved.
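
    A condensed sketch of the spectral-clustering stage described above: k-means is run for a range of k, the Davies-Bouldin index from scikit-learn picks the number of clusters, and the k clusters are then collapsed into two groups. Collapsing by cluster-centre brightness is our own illustrative choice; the paper's merge rule and the EM variant are not reproduced here.

      import numpy as np
      from sklearn.cluster import KMeans
      from sklearn.metrics import davies_bouldin_score

      def segment_pixels(pixels, k_max=8, seed=0):
          """Cluster pixel spectra with k chosen by the Davies-Bouldin index (lower is
          better), then merge into a binary mask of the brightest cluster."""
          best = None
          for k in range(2, k_max + 1):
              model = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(pixels)
              dbi = davies_bouldin_score(pixels, model.labels_)
              if best is None or dbi < best[0]:
                  best = (dbi, model)
          model = best[1]
          brightest = model.cluster_centers_.mean(axis=1).argmax()   # illustrative merge rule
          return model.labels_ == brightest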

  17. A Novel Automatic Detection System for ECG Arrhythmias Using Maximum Margin Clustering with Immune Evolutionary Algorithm

    PubMed Central

    Zhu, Bohui; Ding, Yongsheng; Hao, Kuangrong

    2013-01-01

    This paper presents a novel maximum margin clustering method with immune evolution (IEMMC) for automatic diagnosis of electrocardiogram (ECG) arrhythmias. This diagnostic system consists of signal processing, feature extraction, and the IEMMC algorithm for clustering of ECG arrhythmias. First, raw ECG signal is processed by an adaptive ECG filter based on wavelet transforms, and waveform of the ECG signal is detected; then, features are extracted from ECG signal to cluster different types of arrhythmias by the IEMMC algorithm. Three types of performance evaluation indicators are used to assess the effect of the IEMMC method for ECG arrhythmias, such as sensitivity, specificity, and accuracy. Compared with K-means and iterSVR algorithms, the IEMMC algorithm reflects better performance not only in clustering result but also in terms of global search ability and convergence ability, which proves its effectiveness for the detection of ECG arrhythmias. PMID:23690875

  18. Evaluating and implementing temporal, spatial, and spatio-temporal methods for outbreak detection in a local syndromic surveillance system

    PubMed Central

    Lall, Ramona; Levin-Rector, Alison; Sell, Jessica; Paladini, Marc; Konty, Kevin J.; Olson, Don; Weiss, Don

    2017-01-01

    The New York City Department of Health and Mental Hygiene has operated an emergency department syndromic surveillance system since 2001, using temporal and spatial scan statistics run on a daily basis for cluster detection. Since the system was originally implemented, a number of new methods have been proposed for use in cluster detection. We evaluated six temporal and four spatial/spatio-temporal detection methods using syndromic surveillance data spiked with simulated injections. The algorithms were compared on several metrics, including sensitivity, specificity, positive predictive value, coherence, and timeliness. We also evaluated each method’s implementation, programming time, run time, and the ease of use. Among the temporal methods, at a set specificity of 95%, a Holt-Winters exponential smoother performed the best, detecting 19% of the simulated injects across all shapes and sizes, followed by an autoregressive moving average model (16%), a generalized linear model (15%), a modified version of the Early Aberration Reporting System’s C2 algorithm (13%), a temporal scan statistic (11%), and a cumulative sum control chart (<2%). Of the spatial/spatio-temporal methods we tested, a spatial scan statistic detected 3% of all injects, a Bayes regression found 2%, and a generalized linear mixed model and a space-time permutation scan statistic detected none at a specificity of 95%. Positive predictive value was low (<7%) for all methods. Overall, the detection methods we tested did not perform well in identifying the temporal and spatial clusters of cases in the inject dataset. The spatial scan statistic, our current method for spatial cluster detection, performed slightly better than the other tested methods across different inject magnitudes and types. Furthermore, we found the scan statistics, as applied in the SaTScan software package, to be the easiest to program and implement for daily data analysis. PMID:28886112

  19. Fuzzy Kernel k-Medoids algorithm for anomaly detection problems

    NASA Astrophysics Data System (ADS)

    Rustam, Z.; Talita, A. S.

    2017-07-01

    An Intrusion Detection System (IDS) is an essential part of security systems, strengthening the security of information systems. An IDS can be used to detect abuse by intruders who try to get into the network in order to access and exploit the available data sources in the system. There are two approaches to IDS: Misuse Detection and Anomaly Detection (behavior-based intrusion detection). Fuzzy clustering-based methods have been widely used to solve Anomaly Detection problems. Besides using the fuzzy membership concept to assign objects to clusters, other approaches such as combining fuzzy and possibilistic memberships or feature-weighted methods are also used. We propose Fuzzy Kernel k-Medoids, which combines fuzzy and possibilistic memberships, as a powerful method for the anomaly detection problem, since in numerical experiments it is able to classify IDS benchmark data into five different classes simultaneously. We classify the KDDCup'99 IDS benchmark data set into five classes simultaneously; the best performance, a clustering accuracy of 90.28 percent, was achieved using 30% of the data for training.
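
    The abstract combines kernel k-medoids with fuzzy and possibilistic memberships; the sketch below implements only the crisp kernel k-medoids core, using the kernel-induced distance K(x,x) - 2K(x,m) + K(m,m), so that the role of the kernel is clear. The fuzzy/possibilistic weighting of the proposed method is not reproduced, and all names are illustrative.

      import numpy as np

      def kernel_kmedoids(K, n_clusters, n_iter=50, seed=0):
          """Crisp k-medoids on a precomputed kernel matrix K (n x n)."""
          rng = np.random.default_rng(seed)
          diag = K.diagonal()
          medoids = rng.choice(len(K), size=n_clusters, replace=False)
          for _ in range(n_iter):
              d = diag[:, None] - 2 * K[:, medoids] + diag[medoids][None, :]   # kernel-induced distances
              labels = d.argmin(axis=1)
              new_medoids = medoids.copy()
              for j in range(n_clusters):
                  members = np.where(labels == j)[0]
                  if len(members) == 0:
                      continue
                  within = (diag[members][:, None] - 2 * K[np.ix_(members, members)]
                            + diag[members][None, :])
                  new_medoids[j] = members[within.sum(axis=1).argmin()]        # most central member
              if np.array_equal(new_medoids, medoids):
                  break
              medoids = new_medoids
          return labels, medoids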

  20. Comparison of tests for spatial heterogeneity on data with global clustering patterns and outliers.

    PubMed

    Jackson, Monica C; Huang, Lan; Luo, Jun; Hachey, Mark; Feuer, Eric

    2009-10-12

    The ability to evaluate geographic heterogeneity of cancer incidence and mortality is important in cancer surveillance. Many statistical methods for evaluating global clustering and local cluster patterns are developed and have been examined by many simulation studies. However, the performance of these methods on two extreme cases (global clustering evaluation and local anomaly (outlier) detection) has not been thoroughly investigated. We compare methods for global clustering evaluation including Tango's Index, Moran's I, and Oden's I*(pop); and cluster detection methods such as local Moran's I and SaTScan elliptic version on simulated count data that mimic global clustering patterns and outliers for cancer cases in the continental United States. We examine the power and precision of the selected methods in the purely spatial analysis. We illustrate Tango's MEET and SaTScan elliptic version on a 1987-2004 HIV and a 1950-1969 lung cancer mortality data in the United States. For simulated data with outlier patterns, Tango's MEET, Moran's I and I*(pop) had powers less than 0.2, and SaTScan had powers around 0.97. For simulated data with global clustering patterns, Tango's MEET and I*(pop) (with 50% of total population as the maximum search window) had powers close to 1. SaTScan had powers around 0.7-0.8 and Moran's I has powers around 0.2-0.3. In the real data example, Tango's MEET indicated the existence of global clustering patterns in both the HIV and lung cancer mortality data. SaTScan found a large cluster for HIV mortality rates, which is consistent with the finding from Tango's MEET. SaTScan also found clusters and outliers in the lung cancer mortality data. SaTScan elliptic version is more efficient for outlier detection compared with the other methods evaluated in this article. Tango's MEET and Oden's I*(pop) perform best in global clustering scenarios among the selected methods. The use of SaTScan for data with global clustering patterns should be used with caution since SatScan may reveal an incorrect spatial pattern even though it has enough power to reject a null hypothesis of homogeneous relative risk. Tango's method should be used for global clustering evaluation instead of SaTScan.

  1. GDPC: Gravitation-based Density Peaks Clustering algorithm

    NASA Astrophysics Data System (ADS)

    Jiang, Jianhua; Hao, Dehao; Chen, Yujun; Parmar, Milan; Li, Keqin

    2018-07-01

    The Density Peaks Clustering algorithm, which we refer to as DPC, is a novel and efficient density-based clustering approach that was published in Science in 2014. DPC has the advantage of discovering clusters of varying sizes and densities, but has limitations in detecting the number of clusters and identifying anomalies. We develop an enhanced algorithm with an alternative decision graph based on gravitation theory and nearby distance to identify centroids and anomalies accurately. We apply our method to several UCI and synthetic data sets and report comparative clustering performance using the F-Measure and 2-dimensional visualisation. We also compare our method to other clustering algorithms, such as K-Means, Affinity Propagation (AP) and DPC, presenting F-Measure scores and clustering accuracies of our GDPC algorithm on different data sets. We show that GDPC has superior performance in its capability of: (1) clearly detecting the number of clusters; (2) efficiently aggregating clusters of varying sizes and densities; (3) accurately identifying anomalies.
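
    For context, the plain density-peaks computation that GDPC modifies can be sketched as follows: a cutoff-kernel local density rho, the distance delta to the nearest denser point, centres chosen by the largest rho * delta, and assignment by following each point to its nearest denser neighbour. The gravitation-based decision graph of GDPC itself is not reproduced, and dc and n_clusters are user-supplied here.

      import numpy as np
      from scipy.spatial.distance import pdist, squareform

      def density_peaks(X, dc, n_clusters):
          """Plain density-peaks clustering (decision graph via rho and delta)."""
          D = squareform(pdist(X))
          rho = (D < dc).sum(axis=1) - 1                      # cutoff-kernel local density
          order = np.argsort(-rho)                            # indices by decreasing density
          delta = np.full(len(X), D.max())
          nearest_denser = np.full(len(X), -1)
          for rank, i in enumerate(order[1:], start=1):
              denser = order[:rank]
              j = denser[D[i, denser].argmin()]
              delta[i], nearest_denser[i] = D[i, j], j
          centers = np.argsort(-(rho * delta))[:n_clusters]   # the densest point always qualifies
          labels = np.full(len(X), -1)
          labels[centers] = np.arange(n_clusters)
          for i in order:                                     # assign in order of decreasing density
              if labels[i] < 0:
                  labels[i] = labels[nearest_denser[i]]
          return labels, centers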

  2. Galaxy clusters in the SDSS Stripe 82 based on photometric redshifts

    DOE PAGES

    Durret, F.; Adami, C.; Bertin, E.; ...

    2015-06-10

    Based on a recent photometric redshift galaxy catalogue, we have searched for galaxy clusters in the Stripe 82 region of the Sloan Digital Sky Survey by applying the Adami & MAzure Cluster FInder (AMACFI). Extensive tests were made to fine-tune the AMACFI parameters and make the cluster detection as reliable as possible. The same method was applied to the Millennium simulation to estimate our detection efficiency and the approximate masses of the detected clusters. Considering all the cluster galaxies (i.e. within a 1 Mpc radius of the cluster to which they belong and with a photoz differing by less than 0.05 from that of the cluster), we stacked clusters in various redshift bins to derive colour-magnitude diagrams and galaxy luminosity functions (GLFs). For each galaxy with absolute magnitude brighter than -19.0 in the r band, we computed the disk and spheroid components by applying SExtractor, and by stacking clusters we determined how the disk-to-spheroid flux ratio varies with cluster redshift and mass. We also detected 3663 clusters in the redshift range 0.1513 and a few 10^14 solar masses. Furthermore, by stacking the cluster galaxies in various redshift bins, we find a clear red sequence in the (g'-r') versus r' colour-magnitude diagrams, and the GLFs are typical of clusters, though with a possible contamination from field galaxies. The morphological analysis of the cluster galaxies shows that the fraction of late-type to early-type galaxies shows an increase with redshift (particularly in high mass clusters) and a decrease with detection level, i.e. cluster mass. From the properties of the cluster galaxies, the majority of the candidate clusters detected here seem to be real clusters with typical cluster properties.

  3. Galaxy clusters in the SDSS Stripe 82 based on photometric redshifts

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Durret, F.; Adami, C.; Bertin, E.

    Based on a recent photometric redshift galaxy catalogue, we have searched for galaxy clusters in the Stripe 82 region of the Sloan Digital Sky Survey by applying the Adami & MAzure Cluster FInder (AMACFI). Extensive tests were made to fine-tune the AMACFI parameters and make the cluster detection as reliable as possible. The same method was applied to the Millennium simulation to estimate our detection efficiency and the approximate masses of the detected clusters. Considering all the cluster galaxies (i.e. within a 1 Mpc radius of the cluster to which they belong and with a photoz differing by less than 0.05 from that of the cluster), we stacked clusters in various redshift bins to derive colour-magnitude diagrams and galaxy luminosity functions (GLFs). For each galaxy with absolute magnitude brighter than -19.0 in the r band, we computed the disk and spheroid components by applying SExtractor, and by stacking clusters we determined how the disk-to-spheroid flux ratio varies with cluster redshift and mass. We also detected 3663 clusters in the redshift range 0.1513 and a few 10^14 solar masses. Furthermore, by stacking the cluster galaxies in various redshift bins, we find a clear red sequence in the (g'-r') versus r' colour-magnitude diagrams, and the GLFs are typical of clusters, though with a possible contamination from field galaxies. The morphological analysis of the cluster galaxies shows that the fraction of late-type to early-type galaxies shows an increase with redshift (particularly in high mass clusters) and a decrease with detection level, i.e. cluster mass. From the properties of the cluster galaxies, the majority of the candidate clusters detected here seem to be real clusters with typical cluster properties.

  4. Detection of calcification clusters in digital breast tomosynthesis slices at different dose levels utilizing a SRSAR reconstruction and JAFROC

    NASA Astrophysics Data System (ADS)

    Timberg, P.; Dustler, M.; Petersson, H.; Tingberg, A.; Zackrisson, S.

    2015-03-01

    Purpose: To investigate detection performance for calcification clusters in reconstructed digital breast tomosynthesis (DBT) slices at different dose levels using a Super Resolution and Statistical Artifact Reduction (SRSAR) reconstruction method. Method: Simulated calcifications with an irregular profile (0.2 mm diameter) were combined to form clusters that were added to projection images (1-3 per abnormal image) acquired on a DBT system (Mammomat Inspiration, Siemens). The projection images were dose reduced by software to form 35 abnormal cases and 25 normal cases as if acquired at the 100%, 75% and 50% dose levels (AGD of approximately 1.6 mGy for a 53 mm standard breast, measured according to EUREF v0.15). A standard FBP and a SRSAR reconstruction method (utilizing IRIS (iterative reconstruction filters), and outlier detection using Maximum-Intensity Projections and Average-Intensity Projections) were used to reconstruct single central slices to be used in a free-response task (60 images per observer and dose level). Six observers participated and their task was to detect the clusters and assign a confidence rating in randomly presented images from the whole image set (balanced by dose level). Trials were separated by one week to reduce possible memory bias. The outcome was analyzed for statistical differences using Jackknifed Alternative Free-response Receiver Operating Characteristics. Results: The results indicate that it is possible to reduce the dose by 50% with SRSAR without jeopardizing cluster detection. Conclusions: The detection performance for clusters can be maintained at a lower dose level by using SRSAR reconstruction.

  5. A Noninvasive and Real-Time Method for Circulating Tumor Cell Detection by In Vivo Flow Cytometry.

    PubMed

    Wei, Xunbin; Zhou, Jian; Zhu, Xi; Yang, Xinrong; Yang, Ping; Wang, Qiyan

    2017-01-01

    The quantification of circulating tumor cells (CTCs) has been considered a potentially powerful tool in cancer diagnosis and prognosis, as CTCs have been shown to appear very early in cancer development. Great efforts have been made to develop methods that are less invasive and more sensitive, in order to detect CTCs earlier. There is growing evidence that CTC clusters have greater metastatic potential than single CTCs. Therefore, the detection of CTC clusters is also important. This chapter aims to introduce a noninvasive technique for CTC detection, in vivo flow cytometry (IVFC), which has been demonstrated to be capable of continuously monitoring CTC dynamics. Furthermore, IVFC could be helpful for CTC cluster enumeration.

  6. Substructures in Clusters of Galaxies

    NASA Astrophysics Data System (ADS)

    Lehodey, Brigitte Tome

    2000-01-01

    This dissertation presents two methods for the detection of substructures in clusters of galaxies and the results of their application to a group of four clusters. In chapters 2 and 3, we review the main properties of clusters of galaxies and give the definition of substructures. We also try to show why the study of substructures in clusters of galaxies is so important for Cosmology. Chapters 4 and 5 describe the two methods: the first, the adaptive kernel, is applied to the study of the spatial and kinematical distribution of the cluster galaxies; the second, the MVM (Multiscale Vision Model), is applied to analyse the cluster diffuse X-ray emission, i.e., the intracluster gas distribution. At the end of these two chapters, we also present the results of the application of these methods to our sample of clusters. In chapter 6, we draw conclusions from the comparison of the results obtained with each method. In the last chapter, we present the main conclusions of this work and point out possible developments. We close with two appendices in which we detail some questions raised in this work that are not directly linked to the problem of substructure detection.

  7. A Novel Clustering Methodology Based on Modularity Optimisation for Detecting Authorship Affinities in Shakespearean Era Plays

    PubMed Central

    Craig, Hugh; Berretta, Regina; Moscato, Pablo

    2016-01-01

    In this study we propose a novel, unsupervised clustering methodology for analyzing large datasets. This new, efficient methodology converts the general clustering problem into a community detection problem on a graph by using the Jensen-Shannon distance, a dissimilarity measure originating in Information Theory. Moreover, we use graph theoretic concepts for the generation and analysis of proximity graphs. Our methodology is based on a newly proposed memetic algorithm (iMA-Net) for discovering clusters of data elements by maximizing the modularity function in proximity graphs of literary works. To test the effectiveness of this general methodology, we apply it to a text corpus dataset, which contains frequencies of approximately 55,114 unique words across all 168 plays written in the Shakespearean era (16th and 17th centuries), to analyze and detect clusters of similar plays. Experimental results and comparison with state-of-the-art clustering methods demonstrate the remarkable performance of our new method for identifying high quality clusters which reflect the commonalities in the literary style of the plays. PMID:27571416
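
    A small sketch of the same pipeline using off-the-shelf tools: Jensen-Shannon distances between word-frequency profiles build a weighted proximity graph, and NetworkX's greedy modularity maximisation groups the texts. The memetic iMA-Net optimiser and the exact proximity-graph construction of the paper are not reproduced, and the distance threshold is an assumption.

      import networkx as nx
      from networkx.algorithms.community import greedy_modularity_communities
      from scipy.spatial.distance import jensenshannon

      def cluster_texts(word_freqs, names, distance_threshold=0.5):
          """Group texts by modularity maximisation on a Jensen-Shannon proximity graph.
          `word_freqs` is a list of equal-length word-frequency vectors, `names` their labels."""
          G = nx.Graph()
          G.add_nodes_from(names)
          for i in range(len(names)):
              for j in range(i + 1, len(names)):
                  d = jensenshannon(word_freqs[i], word_freqs[j])
                  if d < distance_threshold:                    # keep only sufficiently similar pairs
                      G.add_edge(names[i], names[j], weight=1.0 - d)
          return [set(c) for c in greedy_modularity_communities(G, weight="weight")]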

  8. A Cyber-Attack Detection Model Based on Multivariate Analyses

    NASA Astrophysics Data System (ADS)

    Sakai, Yuto; Rinsaka, Koichiro; Dohi, Tadashi

    In the present paper, we propose a novel cyber-attack detection model based on applying two multivariate-analysis methods to the audit data observed on a host machine. The statistical techniques used here are the well-known Hayashi's quantification method IV and cluster analysis. We quantify the observed qualitative audit event sequence via the quantification method IV, and collect similar audit event sequences into the same groups based on the cluster analysis. It is shown in simulation experiments that our model can improve the cyber-attack detection accuracy in some realistic cases where both normal and attack activities are intermingled.

  9. Simplified computer-aided detection scheme of microcalcification clusters in digital breast tomosynthesis images.

    PubMed

    Ji-Wook Jeong; Seung-Hoon Chae; Eun Young Chae; Hak Hee Kim; Young Wook Choi; Sooyeul Lee

    2016-08-01

    A computer-aided detection (CADe) algorithm for clustered microcalcifications (MCs) in reconstructed digital breast tomosynthesis (DBT) images is suggested. The MC-like objects were enhanced by a Hessian-based 3D calcification response function, and a signal-to-noise ratio (SNR) enhanced image was also generated to screen the MC clustering seed objects. A connected component segmentation method was used to detect the cluster seed objects, which were considered as potential clustering centers of MCs. Bounding cubes for the accepted clustering seed candidate were generated and the overlapping cubes were combined and examined. After the MC clustering and false-positive (FP) reduction step, the average number of FPs was estimated to be 0.87 per DBT volume with a sensitivity of 90.5%.

  10. Exploring supervised and unsupervised methods to detect topics in biomedical text

    PubMed Central

    Lee, Minsuk; Wang, Weiqing; Yu, Hong

    2006-01-01

    Background Topic detection is a task that automatically identifies topics (e.g., "biochemistry" and "protein structure") in scientific articles based on information content. Topic detection will benefit many other natural language processing tasks including information retrieval, text summarization and question answering; and is a necessary step towards the building of an information system that provides an efficient way for biologists to seek information from an ocean of literature. Results We have explored the methods of Topic Spotting, a task of text categorization that applies the supervised machine-learning technique naïve Bayes to assign automatically a document into one or more predefined topics; and Topic Clustering, which apply unsupervised hierarchical clustering algorithms to aggregate documents into clusters such that each cluster represents a topic. We have applied our methods to detect topics of more than fifteen thousand of articles that represent over sixteen thousand entries in the Online Mendelian Inheritance in Man (OMIM) database. We have explored bag of words as the features. Additionally, we have explored semantic features; namely, the Medical Subject Headings (MeSH) that are assigned to the MEDLINE records, and the Unified Medical Language System (UMLS) semantic types that correspond to the MeSH terms, in addition to bag of words, to facilitate the tasks of topic detection. Our results indicate that incorporating the MeSH terms and the UMLS semantic types as additional features enhances the performance of topic detection and the naïve Bayes has the highest accuracy, 66.4%, for predicting the topic of an OMIM article as one of the total twenty-five topics. Conclusion Our results indicate that the supervised topic spotting methods outperformed the unsupervised topic clustering; on the other hand, the unsupervised topic clustering methods have the advantages of being robust and applicable in real world settings. PMID:16539745
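
    The topic-spotting arm of the study amounts to supervised text categorisation; with modern tooling, a bag-of-words naïve Bayes baseline can be expressed in a few lines, as sketched below. The document and label variables are placeholders, and adding MeSH or UMLS semantic-type features, as the authors do, would simply extend the feature matrix.

      from sklearn.feature_extraction.text import CountVectorizer
      from sklearn.naive_bayes import MultinomialNB
      from sklearn.pipeline import make_pipeline

      # Bag-of-words topic spotting with multinomial naive Bayes.
      topic_spotter = make_pipeline(CountVectorizer(stop_words="english"), MultinomialNB())
      # topic_spotter.fit(train_texts, train_topics)          # train_texts/train_topics are placeholders
      # predicted_topics = topic_spotter.predict(test_texts)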

  11. Structure and substructure analysis of DAFT/FADA galaxy clusters in the [0.4-0.9] redshift range

    NASA Astrophysics Data System (ADS)

    Guennou, L.; Adami, C.; Durret, F.; Lima Neto, G. B.; Ulmer, M. P.; Clowe, D.; LeBrun, V.; Martinet, N.; Allam, S.; Annis, J.; Basa, S.; Benoist, C.; Biviano, A.; Cappi, A.; Cypriano, E. S.; Gavazzi, R.; Halliday, C.; Ilbert, O.; Jullo, E.; Just, D.; Limousin, M.; Márquez, I.; Mazure, A.; Murphy, K. J.; Plana, H.; Rostagni, F.; Russeil, D.; Schirmer, M.; Slezak, E.; Tucker, D.; Zaritsky, D.; Ziegler, B.

    2014-01-01

    Context. The DAFT/FADA survey is based on the study of ~90 rich (masses found in the literature >2 × 1014 M⊙) and moderately distant clusters (redshifts 0.4 < z < 0.9), all with HST imaging data available. This survey has two main objectives: to constrain dark energy (DE) using weak lensing tomography on galaxy clusters and to build a database (deep multi-band imaging allowing photometric redshift estimates, spectroscopic data, X-ray data) of rich distant clusters to study their properties. Aims: We analyse the structures of all the clusters in the DAFT/FADA survey for which XMM-Newton and/or a sufficient number of galaxy redshifts in the cluster range are available, with the aim of detecting substructures and evidence for merging events. These properties are discussed in the framework of standard cold dark matter (ΛCDM) cosmology. Methods: In X-rays, we analysed the XMM-Newton data available, fit a β-model, and subtracted it to identify residuals. We used Chandra data, when available, to identify point sources. In the optical, we applied a Serna & Gerbal (SG) analysis to clusters with at least 15 spectroscopic galaxy redshifts available in the cluster range. We discuss the substructure detection efficiencies of both methods. Results: XMM-Newton data were available for 32 clusters, for which we derive the X-ray luminosity and a global X-ray temperature for 25 of them. For 23 clusters we were able to fit the X-ray emissivity with a β-model and subtract it to detect substructures in the X-ray gas. A dynamical analysis based on the SG method was applied to the clusters having at least 15 spectroscopic galaxy redshifts in the cluster range: 18 X-ray clusters and 11 clusters with no X-ray data. The choice of a minimum number of 15 redshifts implies that only major substructures will be detected. Ten substructures were detected both in X-rays and by the SG method. Most of the substructures detected both in X-rays and with the SG method are probably at their first cluster pericentre approach and are relatively recent infalls. We also find hints of a decreasing X-ray gas density profile core radius with redshift. Conclusions: The percentage of mass included in substructures was found to be roughly constant with redshift values of 5-15%, in agreement both with the general CDM framework and with the results of numerical simulations. Galaxies in substructures show the same general behaviour as regular cluster galaxies; however, in substructures, there is a deficiency of both late type and old stellar population galaxies. Late type galaxies with recent bursts of star formation seem to be missing in the substructures close to the bottom of the host cluster potential well. However, our sample would need to be increased to allow a more robust analysis. Tables 1, 2, 4 and Appendices A-C are available in electronic form at http://www.aanda.org

  12. Vertebra identification using template matching model and K-means clustering.

    PubMed

    Larhmam, Mohamed Amine; Benjelloun, Mohammed; Mahmoudi, Saïd

    2014-03-01

    Accurate vertebra detection and segmentation are essential steps for automating the diagnosis of spinal disorders. This study is dedicated to vertebra alignment measurement, the first step in a computer-aided diagnosis tool for cervical spine trauma. Automated vertebral segment alignment determination is a challenging task due to low contrast imaging and noise. A software tool for segmenting vertebrae and detecting subluxations has clinical significance. A robust method was developed and tested for cervical vertebra identification and segmentation that extracts parameters used for vertebra alignment measurement. Our contribution involves a novel combination of a template matching method and an unsupervised clustering algorithm. In this method, we build a geometric vertebra mean model. To achieve vertebra detection, manual selection of the region of interest is performed initially on the input image. Subsequent preprocessing is done to enhance image contrast and detect edges. Candidate vertebra localization is then carried out by using a modified generalized Hough transform (GHT). Next, an adapted cost function is used to compute local voted centers and filter boundary data. Thereafter, a K-means clustering algorithm is applied to obtain a cluster distribution corresponding to the targeted vertebrae. These clusters are combined with the vote parameters to detect vertebra centers. Rigid segmentation is then carried out by using GHT parameters. Finally, cervical spine curves are extracted to measure vertebra alignment. The proposed approach was successfully applied to a set of 66 high-resolution X-ray images. Robust detection was achieved in 97.5% of the 330 tested cervical vertebrae. An automated vertebral identification method was developed and demonstrated to be robust to noise and occlusion. This work presents a first step toward an automated computer-aided diagnosis system for cervical spine trauma detection.
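
    The clustering stage of such a pipeline is easy to picture in isolation. The sketch below is not the authors' GHT implementation; it only shows, on synthetic vote points, how K-means cluster centres can serve as candidate vertebra centres:

```python
# Minimal sketch (synthetic data): cluster Hough-style vote points with K-means
# so that each cluster centre approximates one vertebra centre.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
n_vertebrae = 5
# Hypothetical "true" vertebra centres stacked roughly along a vertical axis.
true_centres = np.column_stack([50 + 5 * rng.standard_normal(n_vertebrae),
                                40 * np.arange(n_vertebrae) + 100])
# Simulated GHT votes: a noisy cloud of points around each true centre.
votes = np.vstack([c + 4 * rng.standard_normal((60, 2)) for c in true_centres])

kmeans = KMeans(n_clusters=n_vertebrae, n_init=10, random_state=0).fit(votes)
detected = kmeans.cluster_centers_[np.argsort(kmeans.cluster_centers_[:, 1])]
print(detected)   # candidate vertebra centres, sorted along the spine axis
```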

  13. Space-time clusters for early detection of grizzly bear predation.

    PubMed

    Kermish-Wells, Joseph; Massolo, Alessandro; Stenhouse, Gordon B; Larsen, Terrence A; Musiani, Marco

    2018-01-01

    Accurate detection and classification of predation events is important to determine predation and consumption rates by predators. However, obtaining this information for large predators is constrained by the speed at which carcasses disappear and the cost of field data collection. To accurately detect predation events, researchers have used GPS collar technology combined with targeted site visits. However, kill sites are often investigated well after the predation event due to limited data retrieval options on GPS collars (VHF or UHF downloading) and to ensure crew safety when working with large predators. This can lead to missing information from small-prey (including young ungulates) kill sites due to scavenging and general site deterioration (e.g., vegetation growth). We used a space-time permutation scan statistic (STPSS) clustering method (SaTScan) to detect predation events of grizzly bears ( Ursus arctos ) fitted with satellite transmitting GPS collars. We used generalized linear mixed models to verify predation events and the size of carcasses using spatiotemporal characteristics as predictors. STPSS uses a probability model to compare expected cluster size (space and time) with the observed size. We applied this method retrospectively to data from 2006 to 2007 to compare our method to random GPS site selection. In 2013-2014, we applied our detection method to visit sites one week after their occupation. Both datasets were collected in the same study area. Our approach detected 23 of 27 predation sites verified by visiting 464 random grizzly bear locations in 2006-2007, 187 of which were within space-time clusters and 277 outside. Predation site detection increased by 2.75 times (54 predation events of 335 visited clusters) using 2013-2014 data. Our GLMMs showed that cluster size and duration predicted predation events and carcass size with high sensitivity (0.72 and 0.94, respectively). Coupling GPS satellite technology with clusters using a program based on space-time probability models allows for prompt visits to predation sites. This enables accurate identification of the carcass size and increases fieldwork efficiency in predation studies.
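
    SaTScan's space-time permutation scan statistic is a likelihood-based test and is not reproduced here; as a much simpler stand-in, the sketch below groups synthetic GPS fixes into space-time clusters with DBSCAN and flags long-duration, compact clusters as candidate carcass sites (the scaling constants and thresholds are arbitrary choices for the toy data):

```python
# Simplified stand-in (not SaTScan's space-time permutation scan statistic):
# group GPS fixes into space-time clusters and flag long-duration compact
# clusters as candidate predation (carcass) sites to visit.
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(1)
# Synthetic fixes: (x_km, y_km, t_hours). The animal lingers ~30 h at one spot.
travel = np.column_stack([np.cumsum(rng.normal(0.5, 0.2, 80)),
                          np.cumsum(rng.normal(0.3, 0.2, 80)),
                          np.arange(80, dtype=float)])
carcass = np.column_stack([20 + rng.normal(0, 0.05, 30),
                           12 + rng.normal(0, 0.05, 30),
                           80 + np.arange(30, dtype=float)])
fixes = np.vstack([travel, carcass])

# Scale space (km) and time (h) so that ~200 m and ~2 h become comparable.
scaled = fixes / np.array([0.2, 0.2, 2.0])
labels = DBSCAN(eps=1.5, min_samples=5).fit_predict(scaled)

for lab in set(labels) - {-1}:
    member = fixes[labels == lab]
    duration = member[:, 2].max() - member[:, 2].min()
    if duration >= 12:                       # long occupation, likely a carcass
        print(f"cluster {lab}: {len(member)} fixes, {duration:.0f} h at "
              f"({member[:, 0].mean():.1f}, {member[:, 1].mean():.1f}) km")
```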

  14. Detecting multiple outliers in linear functional relationship model for circular variables using clustering technique

    NASA Astrophysics Data System (ADS)

    Mokhtar, Nurkhairany Amyra; Zubairi, Yong Zulina; Hussin, Abdul Ghapor

    2017-05-01

    Outlier detection has been used extensively in data analysis to detect anomalous observations and has important applications in fraud detection and robust analysis. In this paper, we propose a method for detecting multiple outliers for circular variables in the linear functional relationship model. Using the residual values of the Caires and Wyatt model, we apply a hierarchical clustering procedure. With the use of a tree diagram (dendrogram), we illustrate the graphical approach to outlier detection. A simulation study is carried out to verify the accuracy of the proposed method, and an illustration with a real data set shows its practical applicability.
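
    A generic version of this idea (not the Caires and Wyatt functional model itself) can be sketched as follows: single-linkage hierarchical clustering is applied to model residuals, the dendrogram is cut at a fixed height, and observations that end up in very small clusters are flagged as outliers:

```python
# Generic sketch: detect multiple outliers by single-linkage hierarchical
# clustering of model residuals; members of tiny clusters after cutting the
# dendrogram are flagged. Residuals are synthetic, the cut height is arbitrary.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(42)
residuals = rng.normal(0.0, 0.1, 100)          # well-behaved residuals (radians)
residuals[[10, 47, 83]] += rng.choice([-1, 1], 3) * 1.5   # planted outliers

Z = linkage(residuals.reshape(-1, 1), method="single")
labels = fcluster(Z, t=0.5, criterion="distance")  # cut the tree at height 0.5

sizes = np.bincount(labels)
outliers = np.where(sizes[labels] <= 3)[0]          # members of tiny clusters
print("flagged observations:", outliers)
```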

  15. Unsupervised Anomaly Detection Based on Clustering and Multiple One-Class SVM

    NASA Astrophysics Data System (ADS)

    Song, Jungsuk; Takakura, Hiroki; Okabe, Yasuo; Kwon, Yongjin

    Intrusion detection systems (IDS) have played an important role as devices to defend our networks from cyber attacks. However, since they are unable to detect unknown attacks, i.e., 0-day attacks, the ultimate challenge in the intrusion detection field is how to identify such attacks exactly and in an automated manner. Over the past few years, several studies addressing these problems have been made on anomaly detection using unsupervised learning techniques such as clustering and one-class support vector machines (SVM). Although these techniques enable one to construct intrusion detection models with little cost and effort, and have the capability to detect unforeseen attacks, they still suffer from two main problems in intrusion detection: a low detection rate and a high false positive rate. In this paper, we propose a new anomaly detection method based on clustering and multiple one-class SVMs in order to improve the detection rate while maintaining a low false positive rate. We evaluated our method using the KDD Cup 1999 data set. Evaluation results show that our approach outperforms the existing algorithms reported in the literature, especially in the detection of unknown attacks.
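
    A minimal sketch of the general strategy (not the authors' exact training scheme, and on synthetic two-dimensional features rather than KDD Cup 1999 records) partitions normal traffic with K-means and fits one one-class SVM per partition; a new sample is declared anomalous when the SVM of its nearest cluster rejects it:

```python
# Minimal sketch: partition "normal" training traffic with K-means and fit one
# one-class SVM per partition; a sample is anomalous when the SVM of its
# nearest cluster rejects it. Features and parameters are illustrative only.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
# Two synthetic modes of normal traffic in a 2-D feature space.
normal = np.vstack([rng.normal([0, 0], 0.3, (200, 2)),
                    rng.normal([4, 4], 0.3, (200, 2))])

k = 2
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(normal)
svms = [OneClassSVM(kernel="rbf", gamma=2.0, nu=0.05).fit(normal[km.labels_ == j])
        for j in range(k)]

def is_anomaly(x):
    """Route the sample to its nearest cluster's one-class SVM."""
    j = int(km.predict(x.reshape(1, -1))[0])
    return svms[j].predict(x.reshape(1, -1))[0] == -1

print(is_anomaly(np.array([0.1, -0.2])))   # expected: False (normal-looking)
print(is_anomaly(np.array([2.0, 2.0])))    # expected: True  (between the modes)
```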

  16. Detecting synchronization clusters in multivariate time series via coarse-graining of Markov chains.

    PubMed

    Allefeld, Carsten; Bialonski, Stephan

    2007-12-01

    Synchronization cluster analysis is an approach to the detection of underlying structures in data sets of multivariate time series, starting from a matrix R of bivariate synchronization indices. A previous method utilized the eigenvectors of R for cluster identification, analogous to several recent attempts at group identification using eigenvectors of the correlation matrix. All of these approaches assumed a one-to-one correspondence of dominant eigenvectors and clusters, which has however been shown to be wrong in important cases. We clarify the usefulness of eigenvalue decomposition for synchronization cluster analysis by translating the problem into the language of stochastic processes, and derive an enhanced clustering method harnessing recent insights from the coarse-graining of finite-state Markov processes. We illustrate the operation of our method using a simulated system of coupled Lorenz oscillators, and we demonstrate its superior performance over the previous approach. Finally we investigate the question of robustness of the algorithm against small sample size, which is important with regard to field applications.
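
    The paper's Markov-chain coarse-graining is not reproduced here, but the eigendecomposition starting point can be illustrated compactly: build a matrix R of bivariate synchronization indices (mean phase coherence on synthetic signals) and cluster the channels spectrally, treating R as a precomputed affinity:

```python
# Illustrative sketch only (not the enhanced coarse-graining algorithm): build
# a matrix R of pairwise synchronization indices and group channels by spectral
# clustering using R as a precomputed affinity.
import numpy as np
from sklearn.cluster import SpectralClustering

rng = np.random.default_rng(3)
n, t = 6, 2000
phases = np.empty((n, t))
drive_a = np.cumsum(rng.normal(0.10, 0.02, t))      # common driver, group A
drive_b = np.cumsum(rng.normal(0.15, 0.02, t))      # common driver, group B
for i in range(n):
    drive = drive_a if i < 3 else drive_b
    phases[i] = drive + rng.normal(0, 0.3, t)       # noisy copies of the driver

# Mean phase coherence as the bivariate synchronization index R_ij in [0, 1].
R = np.abs(np.exp(1j * (phases[:, None, :] - phases[None, :, :])).mean(axis=2))

labels = SpectralClustering(n_clusters=2, affinity="precomputed",
                            random_state=0).fit_predict(R)
print(labels)   # channels 0-2 and 3-5 should fall into two synchronization clusters
```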

  17. [A cloud detection algorithm for MODIS images combining Kmeans clustering and multi-spectral threshold method].

    PubMed

    Wang, Wei; Song, Wei-Guo; Liu, Shi-Xing; Zhang, Yong-Ming; Zheng, Hong-Yang; Tian, Wei

    2011-04-01

    An improved cloud detection method combining K-means clustering and a multi-spectral threshold approach is described. On the basis of landmark spectrum analysis, MODIS data are first categorized into two major classes by the K-means method. The first class includes clouds, smoke and snow, and the second class includes vegetation, water and land. A multi-spectral threshold detection is then applied to the first class to eliminate interference such as smoke and snow. The method is tested with MODIS data acquired at different times over different underlying surfaces. Visual assessment of the algorithm's performance shows that it can effectively detect small areas of cloud pixels and exclude interference from the underlying surface, which provides a good foundation for a subsequent fire detection step.
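
    The two-step logic can be sketched on synthetic pixels (the band values and thresholds below are illustrative, not calibrated MODIS quantities): K-means first separates a bright cloud/snow class from the surface class, and a simple thermal threshold then removes the snow-like pixels:

```python
# Schematic sketch with synthetic pixels and invented thresholds (not actual
# MODIS bands): step 1, K-means splits pixels into a bright class (cloud+snow)
# and a surface class; step 2, a thermal threshold removes warm snow-like pixels.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(7)
n = 3000
# Per-pixel features: visible reflectance and thermal brightness temperature (K).
cloud = np.column_stack([rng.normal(0.70, 0.05, n), rng.normal(240, 5, n)])
snow = np.column_stack([rng.normal(0.75, 0.05, n), rng.normal(268, 3, n)])
land = np.column_stack([rng.normal(0.20, 0.05, n), rng.normal(290, 5, n)])
pixels = np.vstack([cloud, snow, land])

# Step 1: K-means on standardized features; the cluster with the higher mean
# reflectance is the bright (cloud + snow) candidate class.
km = KMeans(n_clusters=2, n_init=10,
            random_state=0).fit(StandardScaler().fit_transform(pixels))
bright = np.argmax([pixels[km.labels_ == j, 0].mean() for j in range(2)])
candidate = km.labels_ == bright

# Step 2: within the bright class, keep only cold pixels as cloud
# (snow is bright but comparatively warm in this toy setup).
cloud_mask = candidate & (pixels[:, 1] < 255.0)
print("cloud fraction:", round(cloud_mask.mean(), 3))
```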

  18. Quick detection of QRS complexes and R-waves using a wavelet transform and K-means clustering.

    PubMed

    Xia, Yong; Han, Junze; Wang, Kuanquan

    2015-01-01

    Based on the idea of telemedicine, 24-hour uninterrupted ECG monitoring has started to be implemented. To create an intelligent ECG monitoring system, an efficient and quick detection algorithm for the characteristic waveforms is needed. This paper presents a quick and effective method for detecting QRS complexes and R-waves in ECGs. Real ECG signals from the MIT-BIH Arrhythmia Database are used for the performance evaluation. The proposed method combines a wavelet transform and the K-means clustering algorithm. The wavelet transform is adopted for data analysis and preprocessing. Then, based on the slope information of the filtered data, a segmented K-means clustering method is used to detect the QRS regions. Detection of the R-peak is based on comparing the local amplitudes in each QRS region, which differs from other approaches and reduces the time cost of R-wave detection. On 8 tested records (18201 beats in total) from the MIT-BIH Arrhythmia Database, an average R-peak detection sensitivity of 99.72% and a positive predictive value of 99.80% are obtained; the average time consumed to process a 30-min signal is 5.78 s, which is competitive with other methods.
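
    A stripped-down version of the slope-based clustering step (the wavelet preprocessing is omitted, and a synthetic signal stands in for the MIT-BIH records) looks as follows: window the signal, cluster the windows by maximum slope with K-means, merge consecutive QRS windows, and take each region's amplitude maximum as the R-peak:

```python
# Simplified sketch on a synthetic signal (wavelet denoising omitted): K-means
# on per-window slope features marks QRS windows; the amplitude maximum of each
# merged QRS region is taken as the R-peak.
import numpy as np
from sklearn.cluster import KMeans

fs = 250                                    # sampling rate (Hz)
t = np.arange(0, 10, 1 / fs)
ecg = 0.1 * np.sin(2 * np.pi * 1.0 * t)     # slow baseline wander
r_true = np.arange(int(0.5 * fs), len(t), fs)          # one beat per second
for r in r_true:                            # narrow triangular QRS-like spikes
    w = int(0.04 * fs)
    ecg[r - w:r + w] += 1.0 - np.abs(np.arange(-w, w)) / w

# Per-window slope feature, then K-means (k = 2) separates QRS windows.
slope = np.abs(np.diff(ecg, prepend=ecg[0]))
win = int(0.08 * fs)                        # 80 ms analysis windows
n_win = len(ecg) // win
feats = slope[:n_win * win].reshape(n_win, win).max(axis=1).reshape(-1, 1)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(feats)
qrs_win = km.labels_ == np.argmax(km.cluster_centers_.ravel())

# Merge consecutive QRS windows into regions and locate each R-peak.
r_peaks, i = [], 0
while i < n_win:
    if qrs_win[i]:
        start = i
        while i < n_win and qrs_win[i]:
            i += 1
        seg = ecg[start * win:i * win]
        r_peaks.append(start * win + int(np.argmax(seg)))
    else:
        i += 1
print("detected R-peak times (s):", np.round(np.array(r_peaks) / fs, 2))
```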

  19. Ant colony clustering with fitness perception and pheromone diffusion for community detection in complex networks

    NASA Astrophysics Data System (ADS)

    Ji, Junzhong; Song, Xiangjing; Liu, Chunnian; Zhang, Xiuzhen

    2013-08-01

    Community structure detection in complex networks has been intensively investigated in recent years. In this paper, we propose an adaptive approach based on ant colony clustering to discover communities in a complex network. The focus of the method is the clustering process of an ant colony in a virtual grid, where each ant represents a node in the complex network. During the ant colony search, the method uses a new fitness function to perceive the local environment and employs a pheromone diffusion model as a global information feedback mechanism to realize information exchange among ants. A significant advantage of our method is that the locations in the grid environment and the connections of the complex network structure are simultaneously taken into account as the ants move. Experimental results on computer-generated and real-world networks show the capability of our method to successfully detect community structures.

  20. Privacy Protection Versus Cluster Detection in Spatial Epidemiology

    PubMed Central

    Olson, Karen L.; Grannis, Shaun J.; Mandl, Kenneth D.

    2006-01-01

    Objectives. Patient data that includes precise locations can reveal patients’ identities, whereas data aggregated into administrative regions may preserve privacy and confidentiality. We investigated the effect of varying degrees of address precision (exact latitude and longitude vs the center points of zip code or census tracts) on detection of spatial clusters of cases. Methods. We simulated disease outbreaks by adding supplementary spatially clustered emergency department visits to authentic hospital emergency department syndromic surveillance data. We identified clusters with a spatial scan statistic and evaluated detection rate and accuracy. Results. More clusters were identified, and clusters were more accurately detected, when exact locations were used. That is, these clusters contained at least half of the simulated points and involved few additional emergency department visits. These results were especially apparent when the synthetic clustered points crossed administrative boundaries and fell into multiple zip code or census tracts. Conclusions. The spatial cluster detection algorithm performed better when addresses were analyzed as exact locations than when they were analyzed as center points of zip code or census tracts, particularly when the clustered points crossed administrative boundaries. Use of precise addresses offers improved performance, but this practice must be weighed against privacy concerns in the establishment of public health data exchange policies. PMID:17018828

  1. A possibilistic approach to clustering

    NASA Technical Reports Server (NTRS)

    Krishnapuram, Raghu; Keller, James M.

    1993-01-01

    Fuzzy clustering has been shown to be advantageous over crisp (or traditional) clustering methods in that total commitment of a vector to a given class is not required at each image pattern recognition iteration. Recently fuzzy clustering methods have shown spectacular ability to detect not only hypervolume clusters, but also clusters which are actually 'thin shells', i.e., curves and surfaces. Most analytic fuzzy clustering approaches are derived from the 'Fuzzy C-Means' (FCM) algorithm. The FCM uses the probabilistic constraint that the memberships of a data point across classes sum to one. This constraint was used to generate the membership update equations for an iterative algorithm. Recently, we cast the clustering problem into the framework of possibility theory using an approach in which the resulting partition of the data can be interpreted as a possibilistic partition, and the membership values may be interpreted as degrees of possibility of the points belonging to the classes. We show the ability of this approach to detect linear and quartic curves in the presence of considerable noise.

  2. A statistical method (cross-validation) for bone loss region detection after spaceflight

    PubMed Central

    Zhao, Qian; Li, Wenjun; Li, Caixia; Chu, Philip W.; Kornak, John; Lang, Thomas F.

    2010-01-01

    Astronauts experience bone loss after long spaceflight missions. Identifying specific regions that undergo the greatest losses (e.g. the proximal femur) could reveal information about the processes of bone loss in disuse and disease. How to detect such regions, however, remains an open problem. This paper focuses on statistical methods for detecting such regions. We perform statistical parametric mapping to obtain t-maps of changes in images, and propose a new cross-validation method to select an optimum suprathreshold for forming clusters of pixels. Once these candidate clusters are formed, we use permutation testing of longitudinal labels to identify significant changes. PMID:20632144

  3. A speeded-up saliency region-based contrast detection method for small targets

    NASA Astrophysics Data System (ADS)

    Li, Zhengjie; Zhang, Haiying; Bai, Jiaojiao; Zhou, Zhongjun; Zheng, Huihuang

    2018-04-01

    To keep pace with the rapid development of real applications for infrared small targets, researchers have pursued ever more robust detection methods. At present, contrast measure-based methods have become a promising research branch. Following this framework, in this paper a speeded-up contrast measure scheme is proposed based on saliency detection and density clustering. First, the salient region is segmented by a saliency detection method, and the multi-scale contrast calculation is then carried out on it instead of traversing the whole image. Second, the spatial "integrity" property of the target is exploited to distinguish it from isolated noise by density clustering. Finally, the targets are detected by a self-adaptive threshold. Compared with the time-consuming MPCM (Multiscale Patch Contrast Map), the time cost of the speeded-up version is within a few seconds. Additionally, owing to the clustering-based segmentation, the false alarms caused by heavy noise can be kept at a lower level. The experiments show that our method achieves a satisfactory false alarm suppression ratio (FASR) and real-time performance compared with state-of-the-art algorithms, in both cloudy-sky and sea-sky backgrounds.

  4. Cluster-level statistical inference in fMRI datasets: The unexpected behavior of random fields in high dimensions.

    PubMed

    Bansal, Ravi; Peterson, Bradley S

    2018-06-01

    Identifying regional effects of interest in MRI datasets usually entails testing a priori hypotheses across many thousands of brain voxels, requiring control for false positive findings in these multiple hypotheses testing. Recent studies have suggested that parametric statistical methods may have incorrectly modeled functional MRI data, thereby leading to higher false positive rates than their nominal rates. Nonparametric methods for statistical inference when conducting multiple statistical tests, in contrast, are thought to produce false positives at the nominal rate, which has thus led to the suggestion that previously reported studies should reanalyze their fMRI data using nonparametric tools. To understand better why parametric methods may yield excessive false positives, we assessed their performance when applied both to simulated datasets of 1D, 2D, and 3D Gaussian Random Fields (GRFs) and to 710 real-world, resting-state fMRI datasets. We showed that both the simulated 2D and 3D GRFs and the real-world data contain a small percentage (<6%) of very large clusters (on average 60 times larger than the average cluster size), which were not present in 1D GRFs. These unexpectedly large clusters were deemed statistically significant using parametric methods, leading to empirical familywise error rates (FWERs) as high as 65%: the high empirical FWERs were not a consequence of parametric methods failing to model spatial smoothness accurately, but rather of these very large clusters that are inherently present in smooth, high-dimensional random fields. In fact, when discounting these very large clusters, the empirical FWER for parametric methods was 3.24%. Furthermore, even an empirical FWER of 65% would yield on average less than one of those very large clusters in each brain-wide analysis. Nonparametric methods, in contrast, estimated distributions from those large clusters, and therefore, by construct rejected the large clusters as false positives at the nominal FWERs. Those rejected clusters were outlying values in the distribution of cluster size but cannot be distinguished from true positive findings without further analyses, including assessing whether fMRI signal in those regions correlates with other clinical, behavioral, or cognitive measures. Rejecting the large clusters, however, significantly reduced the statistical power of nonparametric methods in detecting true findings compared with parametric methods, which would have detected most true findings that are essential for making valid biological inferences in MRI data. Parametric analyses, in contrast, detected most true findings while generating relatively few false positives: on average, less than one of those very large clusters would be deemed a true finding in each brain-wide analysis. We therefore recommend the continued use of parametric methods that model nonstationary smoothness for cluster-level, familywise control of false positives, particularly when using a Cluster Defining Threshold of 2.5 or higher, and subsequently assessing rigorously the biological plausibility of the findings, even for large clusters. Finally, because nonparametric methods yielded a large reduction in statistical power to detect true positive findings, we conclude that the modest reduction in false positive findings that nonparametric analyses afford does not warrant a re-analysis of previously published fMRI studies using nonparametric techniques. Copyright © 2018 Elsevier Inc. All rights reserved.

  5. Fast EEG spike detection via eigenvalue analysis and clustering of spatial amplitude distribution

    NASA Astrophysics Data System (ADS)

    Fukami, Tadanori; Shimada, Takamasa; Ishikawa, Bunnoshin

    2018-06-01

    Objective. In the current study, we tested a proposed method for fast spike detection in electroencephalography (EEG). Approach. We performed eigenvalue analysis in two-dimensional space spanned by gradients calculated from two neighboring samples to detect high-amplitude negative peaks. We extracted the spike candidates by imposing restrictions on parameters regarding spike shape and eigenvalues reflecting detection characteristics of individual medical doctors. We subsequently performed clustering, classifying detected peaks by considering the amplitude distribution at 19 scalp electrodes. Clusters with a small number of candidates were excluded. We then defined a score for eliminating spike candidates for which the pattern of detected electrodes differed from the overall pattern in a cluster. Spikes were detected by setting the score threshold. Main results. Based on visual inspection by a psychiatrist experienced in EEG, we evaluated the proposed method using two statistical measures of precision and recall with respect to detection performance. We found that precision and recall exhibited a trade-off relationship. The average recall value was 0.708 in eight subjects with the score threshold that maximized the F-measure, with 58.6  ±  36.2 spikes per subject. Under this condition, the average precision was 0.390, corresponding to a false positive rate 2.09 times higher than the true positive rate. Analysis of the required processing time revealed that, using a general-purpose computer, our method could be used to perform spike detection in 12.1% of the recording time. The process of narrowing down spike candidates based on shape occupied most of the processing time. Significance. Although the average recall value was comparable with that of other studies, the proposed method significantly shortened the processing time.

  6. Object-Oriented Image Clustering Method Using UAS Photogrammetric Imagery

    NASA Astrophysics Data System (ADS)

    Lin, Y.; Larson, A.; Schultz-Fellenz, E. S.; Sussman, A. J.; Swanson, E.; Coppersmith, R.

    2016-12-01

    Unmanned Aerial Systems (UAS) have been used widely as an imaging modality to obtain remotely sensed multi-band surface imagery, and are growing in popularity due to their efficiency, ease of use, and affordability. Los Alamos National Laboratory (LANL) has employed UAS for geologic site characterization and change detection studies at a variety of field sites. The deployed UAS was equipped with a standard visible-band camera to collect imagery datasets. Based on the imagery collected, we use deep sparse algorithmic processing to detect and discriminate subtle topographic features created or impacted by subsurface activities. In this work, we develop an object-oriented remote sensing imagery clustering method for land cover classification. To improve the clustering and segmentation accuracy, instead of using conventional pixel-based clustering methods, we integrate the spatial information from neighboring regions to create super-pixels, avoiding salt-and-pepper noise and subsequent over-segmentation. To further improve the robustness of our clustering method, we also incorporate a custom digital elevation model (DEM) dataset, generated using a structure-from-motion (SfM) algorithm, together with the red, green, and blue (RGB) band data for clustering. In particular, we first employ agglomerative clustering to create an initial segmentation map, in which every object is treated as a single (new) pixel. Based on the new pixels obtained, we generate new features to implement another level of clustering. We apply our clustering method to the RGB+DEM datasets collected at the field site. Through binary clustering and multi-object clustering tests, we verify that our method can accurately separate vegetation from non-vegetation regions, and is also able to differentiate object features on the surface.

  7. Comprehensive cluster analysis with Transitivity Clustering.

    PubMed

    Wittkop, Tobias; Emig, Dorothea; Truss, Anke; Albrecht, Mario; Böcker, Sebastian; Baumbach, Jan

    2011-03-01

    Transitivity Clustering is a method for the partitioning of biological data into groups of similar objects, such as genes, for instance. It provides integrated access to various functions addressing each step of a typical cluster analysis. To facilitate this, Transitivity Clustering is accessible online and offers three user-friendly interfaces: a powerful stand-alone version, a web interface, and a collection of Cytoscape plug-ins. In this paper, we describe three major workflows: (i) protein (super)family detection with Cytoscape, (ii) protein homology detection with incomplete gold standards and (iii) clustering of gene expression data. This protocol guides the user through the most important features of Transitivity Clustering and takes ∼1 h to complete.

  8. Minimal spanning tree algorithm for γ-ray source detection in sparse photon images: cluster parameters and selection strategies

    DOE PAGES

    Campana, R.; Bernieri, E.; Massaro, E.; ...

    2013-05-22

    The minimal spanning tree (MST) algorithm is a graph-theoretical cluster-finding method. We previously applied it to two-dimensional γ-ray images, showing that it is quite sensitive in finding faint sources. Possible sources are associated with the regions where the photon arrival directions clusterize. MST selects clusters starting from a particular "tree" connecting all the points of the image, performing a cut based on the angular distance between photons and keeping clusters with a number of events higher than a given threshold. In this paper, we show how a further filtering, based on parameters linked to the cluster properties, can be applied to reduce spurious detections. We find that the most efficient parameter for this secondary selection is the magnitude M of a cluster, defined as the product of its number of events and its clustering degree. We test the sensitivity of the method by means of simulated and real Fermi Large Area Telescope (LAT) fields. Our results show that √M is strongly correlated with other statistical significance parameters, derived from a wavelet-based algorithm and maximum likelihood (ML) analysis, and that it can be used as a good estimator of the statistical significance of MST detections. Finally, we apply the method to a 2-year LAT image at energies higher than 3 GeV, and we show the presence of new clusters, likely associated with BL Lac objects.
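
    A simplified planar sketch of the MST photon-clustering idea (not the authors' Fermi-LAT pipeline) is shown below; the clustering degree g is taken here as the mean edge length of the full MST divided by the cluster's mean internal edge length, which is our reading of the paper's definition, and the cut length and thresholds are arbitrary:

```python
# Simplified planar sketch of MST-based photon clustering. MST edges longer
# than a cut length are removed; surviving connected components with enough
# photons are cluster candidates, ranked by sqrt(M) with M = n * g.
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree, connected_components
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(5)
background = rng.uniform(0, 10, (300, 2))                 # diffuse photons
source = rng.normal([7, 3], 0.08, (25, 2))                # a faint point source
photons = np.vstack([background, source])

dist = squareform(pdist(photons))
mst = minimum_spanning_tree(dist).toarray()
mean_edge = mst[mst > 0].mean()

cut = 0.6 * mean_edge                                     # separation length
pruned = np.where((mst > 0) & (mst <= cut), mst, 0.0)
n_comp, labels = connected_components(pruned, directed=False)

for lab in range(n_comp):
    idx = np.where(labels == lab)[0]
    if len(idx) < 6:                                      # photon-number threshold
        continue
    internal = pruned[np.ix_(idx, idx)]
    g = mean_edge / internal[internal > 0].mean()         # clustering degree
    print(f"cluster of {len(idx)} photons, sqrt(M) = {np.sqrt(len(idx) * g):.1f}")
```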

  9. Advances in Significance Testing for Cluster Detection

    NASA Astrophysics Data System (ADS)

    Coleman, Deidra Andrea

    Over the past two decades, much attention has been given to data driven project goals such as the Human Genome Project and the development of syndromic surveillance systems. A major component of these types of projects is analyzing the abundance of data. Detecting clusters within the data can be beneficial as it can lead to the identification of specified sequences of DNA nucleotides that are related to important biological functions or the locations of epidemics such as disease outbreaks or bioterrorism attacks. Cluster detection techniques require efficient and accurate hypothesis testing procedures. In this dissertation, we improve upon the hypothesis testing procedures for cluster detection by enhancing distributional theory and providing an alternative method for spatial cluster detection using syndromic surveillance data. In Chapter 2, we provide an efficient method to compute the exact distribution of the number and coverage of h-clumps of a collection of words. This method involves defining a Markov chain using a minimal deterministic automaton to reduce the number of states needed for computation. We allow words of the collection to contain other words of the collection, making the method more general. We use our method to compute the distributions of the number and coverage of h-clumps in the Chi motif of H. influenzae. In Chapter 3, we provide an efficient algorithm to compute the exact distribution of multiple window discrete scan statistics for higher-order, multi-state Markovian sequences. This algorithm involves defining a Markov chain to efficiently keep track of probabilities needed to compute p-values of the statistic. We use our algorithm to identify cases where the available approximation does not perform well. We also use our algorithm to detect unusual clusters of made free throw shots by National Basketball Association players during the 2009-2010 regular season. In Chapter 4, we give a procedure to detect outbreaks using syndromic surveillance data while controlling the Bayesian False Discovery Rate (BFDR). The procedure entails choosing an appropriate Bayesian model that captures the spatial dependency inherent in epidemiological data and considers all days of interest, selecting a test statistic based on a chosen measure that provides the magnitude of the maximal spatial cluster for each day, and identifying a cutoff value that controls the BFDR for rejecting the collective null hypothesis of no outbreak over a collection of days for a specified region. We use our procedure to analyze botulism-like syndrome data collected by the North Carolina Disease Event Tracking and Epidemiologic Collection Tool (NC DETECT).

  10. A comparison of three clustering methods for finding subgroups in MRI, SMS or clinical data: SPSS TwoStep Cluster analysis, Latent Gold and SNOB.

    PubMed

    Kent, Peter; Jensen, Rikke K; Kongsted, Alice

    2014-10-02

    There are various methodological approaches to identifying clinically important subgroups and one method is to identify clusters of characteristics that differentiate people in cross-sectional and/or longitudinal data using Cluster Analysis (CA) or Latent Class Analysis (LCA). There is a scarcity of head-to-head comparisons that can inform the choice of which clustering method might be suitable for particular clinical datasets and research questions. Therefore, the aim of this study was to perform a head-to-head comparison of three commonly available methods (SPSS TwoStep CA, Latent Gold LCA and SNOB LCA). The performance of these three methods was compared: (i) quantitatively using the number of subgroups detected, the classification probability of individuals into subgroups, the reproducibility of results, and (ii) qualitatively using subjective judgments about each program's ease of use and interpretability of the presentation of results.We analysed five real datasets of varying complexity in a secondary analysis of data from other research projects. Three datasets contained only MRI findings (n = 2,060 to 20,810 vertebral disc levels), one dataset contained only pain intensity data collected for 52 weeks by text (SMS) messaging (n = 1,121 people), and the last dataset contained a range of clinical variables measured in low back pain patients (n = 543 people). Four artificial datasets (n = 1,000 each) containing subgroups of varying complexity were also analysed testing the ability of these clustering methods to detect subgroups and correctly classify individuals when subgroup membership was known. The results from the real clinical datasets indicated that the number of subgroups detected varied, the certainty of classifying individuals into those subgroups varied, the findings had perfect reproducibility, some programs were easier to use and the interpretability of the presentation of their findings also varied. The results from the artificial datasets indicated that all three clustering methods showed a near-perfect ability to detect known subgroups and correctly classify individuals into those subgroups. Our subjective judgement was that Latent Gold offered the best balance of sensitivity to subgroups, ease of use and presentation of results with these datasets but we recognise that different clustering methods may suit other types of data and clinical research questions.

  11. A spatial hazard model for cluster detection on continuous indicators of disease: application to somatic cell score.

    PubMed

    Gay, Emilie; Senoussi, Rachid; Barnouin, Jacques

    2007-01-01

    Methods for spatial cluster detection dealing with diseases quantified by continuous variables are few, whereas several diseases are better approached by continuous indicators. For example, subclinical mastitis of the dairy cow is evaluated using a continuous marker of udder inflammation, the somatic cell score (SCS). Consequently, this study proposed to analyze spatialized risk and cluster components of herd SCS through a new method based on a spatial hazard model. The dataset included annual SCS for 34 142 French dairy herds for the year 2000, and important SCS risk factors: mean parity, percentage of winter and spring calvings, and herd size. The model allowed the simultaneous estimation of the effects of known risk factors and of potential spatial clusters on SCS, and the mapping of the estimated clusters and their range. Mean parity and winter and spring calvings were significantly associated with subclinical mastitis risk. The model with the presence of 3 clusters was highly significant, and the 3 clusters were attractive, i.e. closeness to cluster center increased the occurrence of high SCS. The three localizations were the following: close to the city of Troyes in the northeast of France; around the city of Limoges in the center-west; and in the southwest close to the city of Tarbes. The semi-parametric method based on spatial hazard modeling applies to continuous variables, and takes account of both risk factors and potential heterogeneity of the background population. This tool allows a quantitative detection but assumes a spatially specified form for clusters.

  12. Substructures in DAFT/FADA survey clusters based on XMM and optical data

    NASA Astrophysics Data System (ADS)

    Durret, F.; DAFT/FADA Team

    2014-07-01

    The DAFT/FADA survey was initiated to perform weak lensing tomography on a sample of 90 massive clusters in the redshift range [0.4,0.9] with HST imaging available. The complementary deep multiband imaging constitutes a high quality imaging data base for these clusters. In X-rays, we have analysed the XMM-Newton and/or Chandra data available for 32 clusters, and for 23 clusters we fit the X-ray emissivity with a beta-model and subtract it to search for substructures in the X-ray gas. This study was coupled with a dynamical analysis for the 18 clusters with at least 15 spectroscopic galaxy redshifts in the cluster range, based on a Serna & Gerbal (SG) analysis. We detected ten substructures in eight clusters by both methods (X-rays and SG). The percentage of mass included in substructures is found to be roughly constant with redshift, with values of 5-15%. Most of the substructures detected both in X-rays and with the SG method are found to be relatively recent infalls, probably at their first cluster pericenter approach.

  13. A multimodal detection model of dolphins to estimate abundance validated by field experiments.

    PubMed

    Akamatsu, Tomonari; Ura, Tamaki; Sugimatsu, Harumi; Bahl, Rajendar; Behera, Sandeep; Panda, Sudarsan; Khan, Muntaz; Kar, S K; Kar, C S; Kimura, Satoko; Sasaki-Yamamoto, Yukiko

    2013-09-01

    Abundance estimation of marine mammals requires matching of detection of an animal or a group of animal by two independent means. A multimodal detection model using visual and acoustic cues (surfacing and phonation) that enables abundance estimation of dolphins is proposed. The method does not require a specific time window to match the cues of both means for applying mark-recapture method. The proposed model was evaluated using data obtained in field observations of Ganges River dolphins and Irrawaddy dolphins, as examples of dispersed and condensed distributions of animals, respectively. The acoustic detection probability was approximately 80%, 20% higher than that of visual detection for both species, regardless of the distribution of the animals in present study sites. The abundance estimates of Ganges River dolphins and Irrawaddy dolphins fairly agreed with the numbers reported in previous monitoring studies. The single animal detection probability was smaller than that of larger cluster size, as predicted by the model and confirmed by field data. However, dense groups of Irrawaddy dolphins showed difference in cluster sizes observed by visual and acoustic methods. Lower detection probability of single clusters of this species seemed to be caused by the clumped distribution of this species.

  14. Clustering-based Filtering to Detect Isolated and Intermittent Pulses in Radio Astronomy Data

    NASA Astrophysics Data System (ADS)

    Wagstaff, Kiri; Tang, B.; Lazio, T. J.; Spolaor, S.

    2013-01-01

    Radio-emitting neutron stars (pulsars) produce a series of periodic pulses at radio frequencies. Dispersion, caused by propagation through the interstellar medium, delays signals at lower frequencies more than higher frequencies. This well understood effect can be reversed though de-dispersion at the appropriate dispersion measure (DM). The periodic nature of a pulsar provides multiple samples of signals at the same DM, increasing the reliability of any candidate detection. However, existing methods for pulsar detection are ineffective for many pulse-emitting phenomena now being discovered. Sources exhibit a wide range of pulse repetition rates, from highly regular canonical pulsars to intermittent and nulling pulsars to rotating radio transients (RRATs) that may emit only a few pulses per hour. Other source types may emit only a few pulses, or even only a single pulse. We seek to broaden the scope of radio signal analysis to enable the detection of isolated and intermittent pulses. Without a requirement that detected sources be periodic, we find that a typical de-dispersion search yields results that are often dominated by spurious detections from radio frequency interference (RFI). These occur across the DM range, so filtering out DM-0 signals is insufficient. We employ DBSCAN data clustering to identify groups within the de-dispersion results, using information for each candidate about time, DM, SNR, and pulse width. DBSCAN is a density-based clustering algorithm that offers two advantages over other clustering methods: 1) the number of clusters need not to be specified, and 2) there is no model of expected cluster shape (such as the Gaussian assumption behind EM clustering). Each data cluster can be selectively masked or investigated to facilitate the process of sifting through hundreds of thousands of detections to focus on those of true interest. Using data obtained by the Byrd Green Bank Telescope (GBT), we show how this approach can help separate RFI from difficult to find single and intermittent pulses.
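
    The candidate-grouping step can be illustrated with synthetic single-pulse candidates (this is not the GBT pipeline; the feature values and DBSCAN parameters are invented for the toy example): after scaling time, DM, S/N and width, DBSCAN separates a compact group of repeated pulses at a common DM from the diffuse RFI candidates:

```python
# Illustrative sketch with synthetic candidates: scale the per-candidate
# features (time, DM, S/N, width) and let DBSCAN group them, so RFI-like
# groups can be masked and the astrophysical group inspected.
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(11)
# Columns: time (s), dispersion measure (pc cm^-3), S/N, pulse width (ms).
rfi = np.column_stack([rng.uniform(0, 600, 400),
                       rng.uniform(0, 20, 400),     # RFI piles up near low DM
                       rng.uniform(6, 9, 400),
                       rng.uniform(1, 30, 400)])
pulses = np.column_stack([rng.normal(310, 0.5, 12),  # repeated pulses from one
                          rng.normal(56, 0.8, 12),   # source at DM ~ 56
                          rng.normal(12, 0.6, 12),
                          rng.normal(4, 0.5, 12)])
cands = np.vstack([rfi, pulses])

scaled = StandardScaler().fit_transform(cands)
labels = DBSCAN(eps=1.0, min_samples=4).fit_predict(scaled)
for lab in sorted(set(labels)):
    sel = cands[labels == lab]
    tag = "unclustered" if lab == -1 else f"group {lab}"
    print(f"{tag}: {len(sel)} candidates, median DM = {np.median(sel[:, 1]):.1f}")
```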

  15. Saliency detection algorithm based on LSC-RC

    NASA Astrophysics Data System (ADS)

    Wu, Wei; Tian, Weiye; Wang, Ding; Luo, Xin; Wu, Yingfei; Zhang, Yu

    2018-02-01

    The salient region is the most important region in an image, the one that attracts human visual attention and response. Preferentially allocating computing resources to the salient region for image analysis and synthesis is of great significance for improving image region detection. As a preprocessing step for other tasks in the image processing field, saliency detection has wide applications in image retrieval and image segmentation. Among these applications, the super-pixel segmentation saliency detection algorithm based on linear spectral clustering (LSC) has achieved good results. The saliency detection algorithm proposed in this paper improves on the region contrast (RC) method by replacing its region formation step with linear spectral clustering of the image into super-pixel blocks. After combining it with the latest deep learning methods, the accuracy of salient region detection improves considerably. Finally, the superiority and feasibility of the super-pixel segmentation detection algorithm based on linear spectral clustering are demonstrated by comparative tests.

  16. The Atacama Cosmology Telescope: Sunyaev-Zel'dovich-Selected Galaxy Clusters AT 148 GHz in the 2008 Survey

    NASA Technical Reports Server (NTRS)

    Marriage, Tobias A.; Acquaviva, Viviana; Ade, Peter A. R.; Aguirre, Paula; Amiri, Mandana; Appel, John William; Barrientos, L. Felipe; Battistelli, Elia S.; Bond, J. Richard; Brown, Ben; et al.

    2011-01-01

    We report on 23 clusters detected blindly as Sunyaev-Zel'dovich (SZ) decrements in a 148 GHz, 455 deg^2 map of the southern sky made with data from the Atacama Cosmology Telescope 2008 observing season. All SZ detections announced in this work have confirmed optical counterparts. Ten of the clusters are new discoveries. One newly discovered cluster, ACT-CL J0102-4915, with a redshift of 0.75 (photometric), has an SZ decrement comparable to the most massive systems at lower redshifts. Simulations of the cluster recovery method reproduce the sample purity measured by optical follow-up. In particular, for clusters detected with a signal-to-noise ratio greater than six, simulations are consistent with optical follow-up that demonstrated this subsample is 100% pure. The simulations further imply that the total sample is 80% complete for clusters with mass in excess of 6 x 10^14 solar masses, referenced to the cluster volume characterized by 500 times the critical density. The Compton y-X-ray luminosity mass comparison for the 11 best-detected clusters visually agrees with both self-similar and non-adiabatic, simulation-derived scaling laws.

  17. Automated detection of case clusters of waterborne acute gastroenteritis from health insurance data - pilot study in three French districts.

    PubMed

    Rambaud, Loïc; Galey, Catherine; Beaudeau, Pascal

    2016-04-01

    This pilot study was conducted to assess the utility of using a health insurance database for the automated detection of waterborne outbreaks of acute gastroenteritis (AGE). The weekly number of AGE cases for which the patient consulted a doctor (cAGE) was derived from this database for 1,543 towns in three French districts during the 2009-2012 period. The method we used is based on a spatial comparison of incidence rates and of their time trends between the target town and the district. Each municipality was tested, week by week, for the entire study period. Overall, 193 clusters were identified, 10% of the municipalities were involved in at least one cluster and less than 2% in several. We can infer that nationwide more than 1,000 clusters involving 30,000 cases of cAGE each year may be linked to tap water. The clusters discovered with this automated detection system will be reported to local operators for investigation of the situations at highest risk. This method will be compared with others before automated detection is implemented on a national level.

  18. Detecting communities in large networks

    NASA Astrophysics Data System (ADS)

    Capocci, A.; Servedio, V. D. P.; Caldarelli, G.; Colaiori, F.

    2005-07-01

    We develop an algorithm to detect community structure in complex networks. The algorithm is based on spectral methods and takes into account edge weights and link orientation. Since the method efficiently detects clustered nodes in large networks even when these are not sharply partitioned, it turns out to be especially suitable for the analysis of social and information networks. We test the algorithm on a large-scale data set from a psychological experiment on word association. In this case, it proves successful both in clustering words and in uncovering mental association patterns.

  19. Fast clustering using adaptive density peak detection.

    PubMed

    Wang, Xiao-Feng; Xu, Yifan

    2017-12-01

    Common limitations of clustering methods include slow algorithm convergence, the instability of pre-specifying a number of intrinsic parameters, and the lack of robustness to outliers. A recent clustering approach proposed a fast search algorithm for cluster centers based on their local densities. However, the selection of the key intrinsic parameters in that algorithm was not systematically investigated. It is relatively difficult to estimate the "optimal" parameters since the original definition of the local density in the algorithm is based on a truncated counting measure. In this paper, we propose a clustering procedure with adaptive density peak detection, where the local density is estimated through nonparametric multivariate kernel estimation. The model parameter can then be calculated from the equations with statistical theoretical justification. We also develop an automatic cluster centroid selection method by maximizing an average silhouette index. The advantage and flexibility of the proposed method are demonstrated through simulation studies and the analysis of a few benchmark gene expression data sets. The method runs in a single step without any iteration, and is therefore fast and has great potential for big data analysis. A user-friendly R package, ADPclust, is developed for public use.
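
    The density-peak idea behind this approach can be sketched compactly (in the spirit of ADPclust but not the package itself; the kernel bandwidth and the number of centres are hand-picked here rather than chosen adaptively):

```python
# Compact sketch of density-peak clustering with a Gaussian-kernel density:
# peaks combine high local density with a large distance to any denser point;
# remaining points follow their nearest higher-density neighbour.
import numpy as np
from scipy.spatial.distance import cdist

rng = np.random.default_rng(2)
X = np.vstack([rng.normal([0, 0], 0.4, (150, 2)),
               rng.normal([4, 0], 0.4, (150, 2)),
               rng.normal([2, 3], 0.4, (150, 2))])

d = cdist(X, X)
h = 0.5                                          # kernel bandwidth (hand-picked)
density = np.exp(-(d / h) ** 2).sum(axis=1)      # kernel-based local density

# delta: distance to the nearest point of higher density.
delta = np.empty(len(X))
for i in range(len(X)):
    higher = density > density[i]
    delta[i] = d[i, higher].min() if higher.any() else d[i].max()

k = 3
centers = np.argsort(density * delta)[-k:]       # peaks: high density AND high delta

# Assign points to the cluster of their nearest higher-density neighbour,
# sweeping in order of decreasing density.
labels = np.full(len(X), -1)
labels[centers] = np.arange(k)
for i in np.argsort(-density):
    if labels[i] == -1:
        higher = np.where(density > density[i])[0]
        labels[i] = labels[higher[np.argmin(d[i, higher])]]
print(np.bincount(labels))                       # roughly 150 points per cluster
```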

  20. Hybrid Clustering And Boundary Value Refinement for Tumor Segmentation using Brain MRI

    NASA Astrophysics Data System (ADS)

    Gupta, Anjali; Pahuja, Gunjan

    2017-08-01

    The method of brain tumor segmentation is the separation of the tumor area from brain Magnetic Resonance (MR) images. A number of methods already exist for efficient segmentation of brain tumors; however, it is a tedious task to identify the tumor from MR images. The segmentation process extracts different tumor tissues, such as active tumor, necrosis, and edema, from the normal brain tissues such as gray matter (GM), white matter (WM), and cerebrospinal fluid (CSF). According to the survey, brain tumors are usually detected easily from brain MR images using region-based approaches, but the required level of accuracy and the classification of abnormalities are not predictable. The segmentation of a brain tumor consists of many stages. Manually segmenting the tumor from brain MR images is very time consuming, and hence manual segmentation poses many challenges. In this research paper, our main goal is to present a hybrid clustering approach that combines Fuzzy C-Means clustering (for accurate tumor detection) with the level set method (for handling complex shapes) to detect the exact shape of the tumor in minimal computational time. Using this approach, we observe that for a certain set of images only 0.9412 s is needed to detect the tumor, which is much less than a recent existing algorithm, i.e., hybrid clustering (Fuzzy C-Means and K-means clustering).

  1. SAR image change detection using watershed and spectral clustering

    NASA Astrophysics Data System (ADS)

    Niu, Ruican; Jiao, L. C.; Wang, Guiting; Feng, Jie

    2011-12-01

    A new method of change detection in SAR images based on spectral clustering is presented in this paper. Spectral clustering is employed to extract change information from a pair of images acquired over the same geographical area at different times. A watershed transform is first applied to segment the large image into non-overlapping local regions, which reduces the complexity. Experimental results and system analysis confirm the effectiveness of the proposed algorithm.

  2. Thermal wake/vessel detection technique

    DOEpatents

    Roskovensky, John K [Albuquerque, NM; Nandy, Prabal [Albuquerque, NM; Post, Brian N [Albuquerque, NM

    2012-01-10

    A computer-automated method for detecting a vessel in water based on an image of a portion of Earth includes generating a thermal anomaly mask. The thermal anomaly mask flags each pixel of the image initially deemed to be a wake pixel based on a comparison of a thermal value of each pixel against other thermal values of other pixels localized about each pixel. Contiguous pixels flagged by the thermal anomaly mask are grouped into pixel clusters. A shape of each of the pixel clusters is analyzed to determine whether each of the pixel clusters represents a possible vessel detection event. The possible vessel detection events are represented visually within the image.
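
    The general idea of the patent can be pictured with a synthetic example (this is not the patented implementation; the temperature field, anomaly threshold and shape test are invented for the sketch): flag locally warm pixels, group contiguous flagged pixels into clusters, and keep only elongated clusters as wake-like detections:

```python
# Synthetic sketch: build a thermal anomaly mask, label contiguous flagged
# pixels into clusters, and keep elongated clusters as wake-like detections.
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(4)
sst = 290.0 + 0.2 * rng.standard_normal((200, 200))       # sea-surface temperature (K)
rr, cc = np.mgrid[0:200, 0:200]
wake = (np.abs(rr - 100) < 2) & (cc > 40) & (cc < 160)     # a thin, warm, linear wake
sst[wake] += 1.5

# Thermal anomaly mask: pixels much warmer than their local neighbourhood.
local = ndimage.median_filter(sst, size=15)
mask = (sst - local) > 0.8

# Group contiguous flagged pixels and keep only elongated clusters
# (bounding-box aspect ratio is used here as a crude shape test).
labels, n = ndimage.label(mask)
for lab, sl in enumerate(ndimage.find_objects(labels), start=1):
    h, w = sl[0].stop - sl[0].start, sl[1].stop - sl[1].start
    npix = int((labels[sl] == lab).sum())
    if npix > 30 and max(h, w) / min(h, w) > 3:
        print(f"wake-like cluster: {npix} pixels, bounding box {h} x {w}")
```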

  3. A scan statistic to extract causal gene clusters from case-control genome-wide rare CNV data.

    PubMed

    Nishiyama, Takeshi; Takahashi, Kunihiko; Tango, Toshiro; Pinto, Dalila; Scherer, Stephen W; Takami, Satoshi; Kishino, Hirohisa

    2011-05-26

    Several statistical tests have been developed for analyzing genome-wide association data by incorporating gene pathway information in terms of gene sets. Using these methods, hundreds of gene sets are typically tested, and the tested gene sets often overlap. This overlapping greatly increases the probability of generating false positives, and the results obtained are difficult to interpret, particularly when many gene sets show statistical significance. We propose a flexible statistical framework to circumvent these problems. Inspired by spatial scan statistics for detecting clustering of disease occurrence in the field of epidemiology, we developed a scan statistic to extract disease-associated gene clusters from a whole gene pathway. Extracting one or a few significant gene clusters from a global pathway limits the overall false positive probability, which results in increased statistical power, and facilitates the interpretation of test results. In the present study, we applied our method to genome-wide association data for rare copy-number variations, which have been strongly implicated in common diseases. Application of our method to a simulated dataset demonstrated the high accuracy of this method in detecting disease-associated gene clusters in a whole gene pathway. The scan statistic approach proposed here shows a high level of accuracy in detecting gene clusters in a whole gene pathway. This study has provided a sound statistical framework for analyzing genome-wide rare CNV data by incorporating topological information on the gene pathway.

  4. Water cluster fragmentation probed by pickup experiments

    NASA Astrophysics Data System (ADS)

    Huang, Chuanfu; Kresin, Vitaly V.; Pysanenko, Andriy; Fárník, Michal

    2016-09-01

    Electron ionization is a common tool for the mass spectrometry of atomic and molecular clusters. Any cluster can be ionized efficiently by sufficiently energetic electrons, but concomitant fragmentation can seriously obstruct the goal of size-resolved detection. We present a new general method to assess the original neutral population of the cluster beam. Clusters undergo a sticking collision with a molecule from a crossed beam, and the velocities of neat and doped cluster ion peaks are measured and compared. By making use of longitudinal momentum conservation, one can reconstruct the sizes of the neutral precursors. Here this method is applied to H2O and D2O clusters in the detected ion size range of 3-10. It is found that water clusters do fragment significantly upon electron impact: the deduced neutral precursor size is ˜3-5 times larger than the observed cluster ions. This conclusion agrees with beam size characterization by another experimental technique: photoionization after Na-doping. Abundant post-ionization fragmentation of water clusters must therefore be an important factor in the interpretation of experimental data; interestingly, there is at present no detailed microscopic understanding of the underlying fragmentation dynamics.

  5. Detecting dominant motion patterns in crowds of pedestrians

    NASA Astrophysics Data System (ADS)

    Saqib, Muhammad; Khan, Sultan Daud; Blumenstein, Michael

    2017-02-01

    As the population of the world increases, urbanization generates crowding situations which pose challenges to public safety and security. Manual analysis of crowded situations is a tedious job and is usually prone to errors. In this paper, we propose a novel crowd analysis technique whose aim is to detect the different dominant motion patterns in real-time videos. A motion field is generated by computing the dense optical flow. The motion field is then divided into blocks. For each block, we adopt an intra-clustering algorithm for detecting the different flows within the block. We then employ inter-clustering to cluster the flow vectors among different blocks. We evaluate the performance of our approach on different real-time videos. The experimental results show that our proposed method is capable of detecting distinct motion patterns in crowded videos. Moreover, our algorithm outperforms state-of-the-art methods.

  6. Dynamic Trajectory Extraction from Stereo Vision Using Fuzzy Clustering

    NASA Astrophysics Data System (ADS)

    Onishi, Masaki; Yoda, Ikushi

    In recent years, many human tracking studies have been carried out in order to analyze human dynamic trajectories. These studies provide general technology applicable to various fields, such as customer purchase analysis in a shopping environment and safety control at a (railroad) crossing. In this paper, we present a new approach for tracking human positions from stereo images. We use a two-step clustering framework with the k-means method and fuzzy clustering to detect human regions. In the initial clustering, the k-means method quickly forms intermediate clusters from object features extracted by stereo vision. In the final clustering, the fuzzy c-means method groups these intermediate clusters into human regions based on their attributes. Our proposed method clusters correctly, by expressing ambiguity through fuzzy clustering, even when many people are close to each other. The validity of our technique was evaluated in an experiment extracting the trajectories of doctors and nurses in a hospital emergency room.

  7. The Atacama Cosmology Telescope (ACT): Beam Profiles and First SZ Cluster Maps

    NASA Technical Reports Server (NTRS)

    Hincks, A. D.; Acquaviva, V.; Ade, P. A.; Aguirre, P.; Amiri, M.; Appel, J. W.; Barrientos, L. F.; Battistelli, E. S.; Bond, J. R.; Brown, B.; et al.

    2010-01-01

    The Atacama Cosmology Telescope (ACT) is currently observing the cosmic microwave background with arcminute resolution at 148 GHz, 218 GHz, and 277 GHz. In this paper, we present ACT's first results. Data have been analyzed using a maximum-likelihood map-making method which uses B-splines to model and remove the atmospheric signal. It has been used to make high-precision beam maps from which we determine the experiment's window functions. This beam information directly impacts all subsequent analyses of the data. We also used the method to map a sample of galaxy clusters via the Sunyaev-Zel'dovich (SZ) effect, and show five clusters previously detected with X-ray or SZ observations. We provide integrated Compton-y measurements for each cluster. Of particular interest is our detection of the z = 0.44 component of A3128 and our current non-detection of the low-redshift part, providing strong evidence that the further cluster is more massive, as suggested by X-ray measurements. This is a compelling example of the redshift-independent mass selection of the SZ effect.

  8. a Probabilistic Embedding Clustering Method for Urban Structure Detection

    NASA Astrophysics Data System (ADS)

    Lin, X.; Li, H.; Zhang, Y.; Gao, L.; Zhao, L.; Deng, M.

    2017-09-01

    Urban structure detection is a basic task in urban geography. Clustering is a core technique for detecting patterns of urban spatial structure, urban functional regions, and so on. In the big data era, diverse urban sensing datasets recording information such as human behaviour and social activity suffer from high dimensionality and high noise. Unfortunately, state-of-the-art clustering methods do not handle high dimensionality and high noise concurrently. In this paper, a probabilistic embedding clustering method is proposed. First, we propose a Probabilistic Embedding Model (PEM) that learns latent features from high-dimensional urban sensing data via a probabilistic model. The latent features capture the essential patterns hidden in the high-dimensional data, while the probabilistic model reduces the uncertainty caused by high noise. Second, by tuning its parameters, our model can discover two kinds of urban structure, homophily and structural equivalence, that is, communities with intensive interaction and communities playing the same roles in the urban structure. We evaluated the performance of our model in experiments on real-world data from Shanghai (China), which confirmed that our method discovers both kinds of structure.

  9. An Energy-Efficient Cluster-Based Vehicle Detection on Road Network Using Intention Numeration Method

    PubMed Central

    Devasenapathy, Deepa; Kannan, Kathiravan

    2015-01-01

    Traffic on the road network is steadily increasing. Good knowledge of network traffic can minimize congestion using information about the road network obtained from communal callers, pavement detectors, and so on. These methods, however, generate only low-level information about the users of the road network. Although existing schemes obtain urban traffic information, they fail to calculate the energy drain rate of nodes and to balance routing-protocol overhead against quality, which poses a great challenge. Thus, an energy-efficient cluster-based vehicle detection in road networks using the intention numeration method (CVDRN-IN) is developed. Initially, sensor nodes that detect a vehicle are grouped into separate clusters. Next, the node drain rate for a cluster is approximated using a polynomial regression function, and the total node energy is estimated by integrating over the area. Finally, enhanced data aggregation is performed to reduce the amount of data transmission using a digital signature tree. The experimental performance is evaluated with the Dodgers loop sensor data set from the UCI repository, and the evaluation outperforms existing work on energy consumption, clustering efficiency, and node drain rate. PMID:25793221
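
    Two of the numerical steps named in the abstract, fitting a cluster's node drain rate with polynomial regression and integrating to obtain total energy, are easy to illustrate. The NumPy sketch below uses assumed residual-energy readings and an assumed quadratic degree; it only illustrates the arithmetic, not the CVDRN-IN implementation.

      import numpy as np

      # Assumed measurements: residual-energy samples (J) of one cluster's nodes over time (s).
      t = np.linspace(0, 600, 13)
      energy = 100.0 - 0.08 * t - 2e-5 * t**2 + np.random.default_rng(0).normal(0, 0.3, t.size)

      # Fit the drain curve with a quadratic polynomial (the degree is an assumption).
      coeffs = np.polyfit(t, energy, deg=2)
      drain_rate = np.polyder(coeffs)          # dE/dt, in J/s

      # Total energy drained over the window = difference of the fitted curve at the endpoints,
      # i.e. the integral of the (negative) drain rate.
      E_poly = np.poly1d(coeffs)
      total_drained = E_poly(t[0]) - E_poly(t[-1])
      print("fitted drain rate at t=300 s: %.4f J/s" % np.polyval(drain_rate, 300.0))
      print("estimated total energy drained: %.2f J" % total_drained)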

  10. An energy-efficient cluster-based vehicle detection on road network using intention numeration method.

    PubMed

    Devasenapathy, Deepa; Kannan, Kathiravan

    2015-01-01

    Traffic on the road network is steadily increasing. Good knowledge of network traffic can minimize congestion using information about the road network obtained from communal callers, pavement detectors, and so on. These methods, however, generate only low-level information about the users of the road network. Although existing schemes obtain urban traffic information, they fail to calculate the energy drain rate of nodes and to balance routing-protocol overhead against quality, which poses a great challenge. Thus, an energy-efficient cluster-based vehicle detection in road networks using the intention numeration method (CVDRN-IN) is developed. Initially, sensor nodes that detect a vehicle are grouped into separate clusters. Next, the node drain rate for a cluster is approximated using a polynomial regression function, and the total node energy is estimated by integrating over the area. Finally, enhanced data aggregation is performed to reduce the amount of data transmission using a digital signature tree. The experimental performance is evaluated with the Dodgers loop sensor data set from the UCI repository, and the evaluation outperforms existing work on energy consumption, clustering efficiency, and node drain rate.

  11. A Context-sensitive Approach to Anonymizing Spatial Surveillance Data: Impact on Outbreak Detection

    PubMed Central

    Cassa, Christopher A.; Grannis, Shaun J.; Overhage, J. Marc; Mandl, Kenneth D.

    2006-01-01

    Objective: The use of spatially based methods and algorithms in epidemiology and surveillance presents privacy challenges for researchers and public health agencies. We describe a novel method for anonymizing individuals in public health data sets by transposing their spatial locations through a process informed by the underlying population density. Further, we assess the impact of this skew on the detection of spatial clustering as measured by a spatial scan statistic. Design: Cases were emergency department (ED) visits for respiratory illness. Baseline ED visit data were injected with artificially created clusters ranging in magnitude, shape, and location. The geocoded locations were then transformed using a de-identification algorithm that accounts for the local underlying population density. Measurements: A total of 12,600 separate weeks of case data with artificially created clusters were combined with control data and the impact on detection of spatial clustering identified by a spatial scan statistic was measured. Results: The anonymization algorithm produced an expected skew of cases that resulted in high values of data set k-anonymity. De-identification that moves points an average distance of 0.25 km lowers the spatial cluster detection sensitivity by less than 4% and lowers the detection specificity by less than 1%. Conclusion: A population-density-based Gaussian spatial blurring markedly decreases the ability to identify individuals in a data set while only slightly decreasing the performance of a standard outbreak detection tool. These findings suggest new approaches to anonymizing data for spatial epidemiology and surveillance. PMID:16357353
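
    The core of the de-identification step, Gaussian displacement whose magnitude is informed by the underlying population density, can be sketched as follows. The scaling rule (a radius that would contain roughly k people at the local density) and all numbers are assumptions for illustration; the published algorithm's exact skew model is not reproduced here.

      import numpy as np

      def density_informed_blur(xy, population_density, k=50.0, seed=0):
          """Displace each point with Gaussian noise whose sigma falls with local density.

          xy                 : (n, 2) coordinates in km (assumed projected, not lat/lon)
          population_density : (n,) persons per km^2 around each point
          k                  : k-anonymity-like constant controlling the blur (assumed)
          """
          rng = np.random.default_rng(seed)
          # Radius of a disc that would contain roughly k people at the local density.
          sigma = np.sqrt(k / (np.pi * np.asarray(population_density)))
          return xy + rng.normal(0.0, sigma[:, None], size=np.asarray(xy).shape)

      # Assumed toy data: one dense-urban case and one sparse-rural case.
      cases = np.array([[0.0, 0.0], [10.0, 10.0]])
      density = np.array([5000.0, 50.0])            # persons / km^2
      print(density_informed_blur(cases, density))  # the rural point is displaced much farther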

  12. Network module detection: Affinity search technique with the multi-node topological overlap measure

    PubMed Central

    Li, Ai; Horvath, Steve

    2009-01-01

    Background Many clustering procedures only allow the user to input a pairwise dissimilarity or distance measure between objects. We propose a clustering method that can input a multi-point dissimilarity measure d(i1, i2, ..., iP) where the number of points P can be larger than 2. The work is motivated by gene network analysis where clusters correspond to modules of highly interconnected nodes. Here, we define modules as clusters of network nodes with high multi-node topological overlap. The topological overlap measure is a robust measure of interconnectedness which is based on shared network neighbors. In previous work, we have shown that the multi-node topological overlap measure yields biologically meaningful results when used as input of network neighborhood analysis. Findings We adapt network neighborhood analysis for use in module detection. We propose the Module Affinity Search Technique (MAST), which is a generalized version of the Cluster Affinity Search Technique (CAST). MAST can accommodate a multi-node dissimilarity measure. Clusters grow around user-defined or automatically chosen seeds (e.g. hub nodes). We propose both local and global cluster growth stopping rules. We use several simulations and a gene co-expression network application to argue that the MAST approach leads to biologically meaningful results. We compare MAST with hierarchical clustering and partitioning around medoid clustering. Conclusion Our flexible module detection method is implemented in the MTOM software which can be downloaded from the following webpage: http://www.genetics.ucla.edu/labs/horvath/MTOM/ PMID:19619323

  13. Network module detection: Affinity search technique with the multi-node topological overlap measure.

    PubMed

    Li, Ai; Horvath, Steve

    2009-07-20

    Many clustering procedures only allow the user to input a pairwise dissimilarity or distance measure between objects. We propose a clustering method that can input a multi-point dissimilarity measure d(i1, i2, ..., iP) where the number of points P can be larger than 2. The work is motivated by gene network analysis where clusters correspond to modules of highly interconnected nodes. Here, we define modules as clusters of network nodes with high multi-node topological overlap. The topological overlap measure is a robust measure of interconnectedness which is based on shared network neighbors. In previous work, we have shown that the multi-node topological overlap measure yields biologically meaningful results when used as input of network neighborhood analysis. We adapt network neighborhood analysis for the use of module detection. We propose the Module Affinity Search Technique (MAST), which is a generalized version of the Cluster Affinity Search Technique (CAST). MAST can accommodate a multi-node dissimilarity measure. Clusters grow around user-defined or automatically chosen seeds (e.g. hub nodes). We propose both local and global cluster growth stopping rules. We use several simulations and a gene co-expression network application to argue that the MAST approach leads to biologically meaningful results. We compare MAST with hierarchical clustering and partitioning around medoid clustering. Our flexible module detection method is implemented in the MTOM software which can be downloaded from the following webpage: http://www.genetics.ucla.edu/labs/horvath/MTOM/
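
    For the pairwise case (P = 2), the topological overlap between two nodes counts their shared neighbours relative to the smaller of their connectivities; MAST's multi-node measure generalizes this. Below is a minimal NumPy sketch of the pairwise matrix on a toy network; the multi-node extension and the MTOM software itself are not reproduced.

      import numpy as np

      def topological_overlap(A):
          """Pairwise topological overlap matrix for an unweighted, undirected adjacency A."""
          A = np.asarray(A, dtype=float)
          np.fill_diagonal(A, 0.0)
          k = A.sum(axis=1)                      # node connectivities
          shared = A @ A                         # number of shared neighbours l_ij
          denom = np.minimum.outer(k, k) + 1.0 - A
          tom = (shared + A) / denom
          np.fill_diagonal(tom, 1.0)
          return tom

      # Toy 5-node network: a triangle attached to a 2-node tail (assumed example).
      A = np.array([[0, 1, 1, 0, 0],
                    [1, 0, 1, 0, 0],
                    [1, 1, 0, 1, 0],
                    [0, 0, 1, 0, 1],
                    [0, 0, 0, 1, 0]])
      dissim = 1.0 - topological_overlap(A)      # a dissimilarity usable by any clustering routine
      print(np.round(dissim, 2))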

  14. Detection of wood failure by image processing method: influence of algorithm, adhesive and wood species

    Treesearch

    Lanying Lin; Sheng He; Feng Fu; Xiping Wang

    2015-01-01

    Wood failure percentage (WFP) is an important index for evaluating the bond strength of plywood. Currently, the method used for detecting WFP is visual inspection, which lacks efficiency. In order to improve it, image processing methods are applied to wood failure detection. The present study used thresholding and K-means clustering algorithms in wood failure detection...

  15. Use of joint two-view information for computerized lesion detection on mammograms: improvement of microcalcification detection accuracy

    NASA Astrophysics Data System (ADS)

    Sahiner, Berkman; Gurcan, Metin N.; Chan, Heang-Ping; Hadjiiski, Lubomir M.; Petrick, Nicholas; Helvie, Mark A.

    2002-05-01

    We are developing new techniques to improve the accuracy of computerized microcalcification detection by using the joint two-view information on craniocaudal (CC) and mediolateral-oblique (MLO) views. After cluster candidates were detected using a single-view detection technique, candidates on CC and MLO views were paired using their radial distances from the nipple. Object pairs were classified with a joint two-view classifier that used the similarity of objects in a pair. Each cluster candidate was also classified as a true microcalcification cluster or a false-positive (FP) using its single-view features. The outputs of these two classifiers were fused. A data set of 38 pairs of mammograms from our database was used to train the new detection technique. The independent test set consisted of 77 pairs of mammograms from the University of South Florida public database. At a per-film sensitivity of 70%, the FP rates were 0.17 and 0.27 with the fusion and single-view detection methods, respectively. Our results indicate that correspondence of cluster candidates on two different views provides valuable additional information for distinguishing false from true microcalcification clusters.

  16. Voronoi distance based prospective space-time scans for point data sets: a dengue fever cluster analysis in a southeast Brazilian town

    PubMed Central

    2011-01-01

    Background The Prospective Space-Time scan statistic (PST) is widely used for the evaluation of space-time clusters of point event data. Usually a window of cylindrical shape is employed, with a circular or elliptical base in the space domain. Recently, the concept of Minimum Spanning Tree (MST) was applied to specify the set of potential clusters, through the Density-Equalizing Euclidean MST (DEEMST) method, for the detection of arbitrarily shaped clusters. The original map is cartogram transformed, such that the control points are spread uniformly. That method is quite effective, but the cartogram construction is computationally expensive and complicated. Results A fast method for the detection and inference of point data set space-time disease clusters is presented, the Voronoi Based Scan (VBScan). A Voronoi diagram is built for points representing population individuals (cases and controls). The number of Voronoi cell boundaries intercepted by the line segment joining two case points defines the Voronoi distance between those points. That distance is used to approximate the density of the heterogeneous population and build the Voronoi distance MST linking the cases. The successive removal of edges from the Voronoi distance MST generates sub-trees which are the potential space-time clusters. Finally, those clusters are evaluated through the scan statistic. Monte Carlo replications of the original data are used to evaluate the significance of the clusters. An application for dengue fever in a small Brazilian city is presented. Conclusions The ability to promptly detect space-time clusters of disease outbreaks, when the number of individuals is large, was shown to be feasible, due to the reduced computational load of VBScan. Instead of changing the map, VBScan modifies the metric used to define the distance between cases, without requiring the cartogram construction. Numerical simulations showed that VBScan has higher power of detection, sensitivity and positive predictive value than the Elliptic PST. Furthermore, as VBScan also incorporates topological information from the point neighborhood structure, in addition to the usual geometric information, it is more robust than purely geometric methods such as the elliptic scan. Those advantages were illustrated in a real setting for dengue fever space-time clusters. PMID:21513556
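
    The Voronoi distance between two cases is the number of Voronoi cell boundaries crossed by the segment joining them. A simple way to approximate it, shown in the hedged sketch below, is to sample the segment densely and count how often the nearest population point changes; scipy's cKDTree handles the nearest-neighbour queries. This only illustrates the metric, it is not the VBScan code, and the toy population is assumed.

      import numpy as np
      from scipy.spatial import cKDTree

      def voronoi_distance(p, q, population_xy, n_samples=500):
          """Approximate number of Voronoi cell boundaries crossed by the segment p->q.

          population_xy are the points (cases + controls) generating the Voronoi diagram.
          Dense sampling of the segment stands in for an exact cell-crossing computation.
          """
          tree = cKDTree(population_xy)
          ts = np.linspace(0.0, 1.0, n_samples)[:, None]
          samples = (1.0 - ts) * np.asarray(p) + ts * np.asarray(q)
          _, owners = tree.query(samples)
          return int(np.count_nonzero(np.diff(owners)))

      # Assumed toy population: a dense block near the origin, sparse points elsewhere.
      rng = np.random.default_rng(0)
      population = np.vstack([rng.normal(0, 0.5, (200, 2)), rng.uniform(5, 10, (20, 2))])
      print(voronoi_distance([0, 0], [1, 1], population))   # many cells crossed (dense area)
      print(voronoi_distance([6, 6], [9, 9], population))   # few cells crossed (sparse area)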

  17. Cluster Detection Tests in Spatial Epidemiology: A Global Indicator for Performance Assessment

    PubMed Central

    Guttmann, Aline; Li, Xinran; Feschet, Fabien; Gaudart, Jean; Demongeot, Jacques; Boire, Jean-Yves; Ouchchane, Lemlih

    2015-01-01

    In cluster detection of disease, the use of local cluster detection tests (CDTs) is common. These methods aim both at locating likely clusters and at testing for their statistical significance. New or improved CDTs are regularly proposed to epidemiologists and must be subjected to performance assessment. Because location accuracy has to be considered, performance assessment goes beyond the raw estimation of type I or II errors. As no consensus exists for performance evaluations, heterogeneous methods are used, and therefore studies are rarely comparable. A global indicator of performance, which assesses both spatial accuracy and usual power, would facilitate the exploration of CDTs' behaviour and help comparisons between studies. The Tanimoto coefficient (TC) is a well-known measure of similarity that can assess location accuracy but only for one detected cluster. In a simulation study, performance is measured over many tests. From the TC, we propose two statistics, the averaged TC and the cumulated TC, as indicators able to provide a global overview of CDT performance for both usual power and location accuracy. We demonstrate the properties of these two indicators and the superiority of the cumulated TC for assessing performance. We used these indicators to conduct a systematic spatial assessment displayed through performance maps. PMID:26086911
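
    The Tanimoto coefficient between a detected cluster and the true cluster, both taken as sets of area identifiers, is the size of their intersection divided by the size of their union; the averaged and cumulated variants proposed here aggregate that quantity over simulation runs. A small Python sketch with assumed tract identifiers:

      def tanimoto(detected, true_cluster):
          """Tanimoto coefficient between two clusters given as sets of area identifiers."""
          detected, true_cluster = set(detected), set(true_cluster)
          union = detected | true_cluster
          return len(detected & true_cluster) / len(union) if union else 1.0

      # Assumed example: true cluster of 4 census tracts, detection catches 3 plus 2 extras.
      true_c = {"t1", "t2", "t3", "t4"}
      found  = {"t2", "t3", "t4", "t8", "t9"}
      print(tanimoto(found, true_c))        # 3 / 6 = 0.5

      # Averaged TC over a batch of simulated datasets (a cumulated TC would sum instead).
      runs = [({"t1", "t2"}, true_c), (set(), true_c), (found, true_c)]
      print(sum(tanimoto(d, t) for d, t in runs) / len(runs))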

  18. Combinatorial clustering and Its Application to 3D Polygonal Traffic Sign Reconstruction From Multiple Images

    NASA Astrophysics Data System (ADS)

    Vallet, B.; Soheilian, B.; Brédif, M.

    2014-08-01

    The 3D reconstruction of similar 3D objects detected in 2D faces a major issue when it comes to grouping the 2D detections into clusters to be used to reconstruct the individual 3D objects. Simple clustering heuristics fail as soon as similar objects are close. This paper formulates a framework to use the geometric quality of the reconstruction as a hint to do a proper clustering. We present a methodology to solve the resulting combinatorial optimization problem with some simplifications and approximations in order to make it tractable. The proposed method is applied to the reconstruction of 3D traffic signs from their 2D detections to demonstrate its capacity to solve ambiguities.

  19. A flexible spatial scan statistic with a restricted likelihood ratio for detecting disease clusters.

    PubMed

    Tango, Toshiro; Takahashi, Kunihiko

    2012-12-30

    Spatial scan statistics are widely used tools for the detection of disease clusters. In particular, the circular spatial scan statistic proposed by Kulldorff (1997) has been utilized in a wide variety of epidemiological studies and disease surveillance. However, as it cannot detect noncircular, irregularly shaped clusters, many authors have proposed different spatial scan statistics, including the elliptic version of Kulldorff's scan statistic. The flexible spatial scan statistic proposed by Tango and Takahashi (2005) has also been used for detecting irregularly shaped clusters. However, because of its heavy computational load, this method restricts the search for candidate clusters to a maximum of 30 nearest neighbors. In this paper, we present a flexible spatial scan statistic implemented with the restricted likelihood ratio proposed by Tango (2008) that (1) eliminates the limitation of 30 nearest neighbors and (2) requires much less computational time than the original flexible spatial scan statistic. As a side effect, Monte Carlo simulation shows that it detects clusters of any shape reasonably well as the relative risk of the cluster becomes large. We illustrate the proposed spatial scan statistic with data on mortality from cerebrovascular disease in the Tokyo Metropolitan area, Japan. Copyright © 2012 John Wiley & Sons, Ltd.
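
    The quantity maximized over candidate clusters is the usual Kulldorff-type Poisson log-likelihood ratio; the restriction of Tango (2008) additionally screens candidates by their estimated relative risk, which is not reproduced here. A hedged sketch of the log-likelihood ratio for a single candidate cluster with assumed counts:

      import numpy as np

      def poisson_llr(obs_in, exp_in, obs_total, exp_total):
          """Kulldorff-style Poisson log-likelihood ratio for one candidate cluster.

          obs_in / exp_in       : observed and expected cases inside the candidate cluster
          obs_total / exp_total : observed and expected cases in the whole study region
          Returns 0 when the cluster is not elevated (one-sided, high clusters only).
          """
          obs_out, exp_out = obs_total - obs_in, exp_total - exp_in
          if obs_in / exp_in <= obs_out / exp_out:
              return 0.0
          llr = obs_in * np.log(obs_in / exp_in)
          if obs_out > 0:
              llr += obs_out * np.log(obs_out / exp_out)
          return float(llr)

      # Assumed toy numbers: 30 cases observed where 15 were expected, 200 cases overall.
      print(poisson_llr(obs_in=30, exp_in=15.0, obs_total=200, exp_total=200.0))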

  20. Disparity: scalable anomaly detection for clusters.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Desai, N.; Bradshaw, R.; Lusk, E.

    2008-01-01

    In this paper, we describe disparity, a tool that does parallel, scalable anomaly detection for clusters. Disparity uses basic statistical methods and scalable reduction operations to perform data reduction on client nodes and uses these results to locate node anomalies. We discuss the implementation of disparity and present results of its use on a SiCortex SC5832 system.
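
    The abstract gives no algorithmic detail beyond "basic statistical methods and scalable reduction operations", so the sketch below only illustrates the flavour of per-node anomaly flagging: reduce a per-node metric to a mean and standard deviation and flag nodes that deviate strongly. The metric, threshold and node names are assumptions; in a real cluster the reduction would be computed in parallel across nodes.

      import numpy as np

      def flag_anomalous_nodes(metric_by_node, z_threshold=3.0):
          """Flag nodes whose metric deviates strongly from the cluster-wide mean.

          metric_by_node : dict {hostname: value}, e.g. a per-node temperature or load reading.
          """
          names = list(metric_by_node)
          values = np.array([metric_by_node[n] for n in names], dtype=float)
          z = (values - values.mean()) / (values.std() + 1e-12)
          return [n for n, zi in zip(names, z) if abs(zi) > z_threshold]

      # Assumed readings: node042 is an outlier.
      readings = {f"node{i:03d}": 55.0 + np.random.default_rng(i).normal(0, 1) for i in range(64)}
      readings["node042"] = 92.0
      print(flag_anomalous_nodes(readings))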

  1. Application of a XMM-Newton EPIC Monte Carlo to Analysis And Interpretation of Data for Abell 1689, RXJ0658-55 And the Centaurus Clusters of Galaxies

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Andersson, Karl E.; /Stockholm U. /SLAC; Peterson, J.R.

    2007-04-17

    We propose a new Monte Carlo method to study extended X-ray sources with the European Photon Imaging Camera (EPIC) aboard XMM-Newton. The Smoothed Particle Inference (SPI) technique, described in a companion paper, is applied here to the EPIC data for the clusters of galaxies Abell 1689, Centaurus and RXJ 0658-55 (the "bullet cluster"). We aim to show the advantages of this method of simultaneous spectral-spatial modeling over traditional X-ray spectral analysis. In Abell 1689 we confirm our earlier findings about structure in the temperature distribution and produce a high resolution temperature map. We also confirm our findings about velocity structure within the gas. In the bullet cluster, RXJ 0658-55, we produce the highest resolution temperature map ever to be published of this cluster, allowing us to trace what looks like the motion of the bullet in the cluster. We even detect a south to north temperature gradient within the bullet itself. In the Centaurus cluster we detect, by dividing up the luminosity of the cluster in bands of gas temperatures, a striking feature to the north-east of the cluster core. We hypothesize that this feature is caused by a subcluster left over from a substantial merger that slightly displaced the core. We conclude that our method is very powerful in determining the spatial distributions of plasma temperatures and very useful for systematic studies of cluster structure.

  2. Possibilistic clustering for shape recognition

    NASA Technical Reports Server (NTRS)

    Keller, James M.; Krishnapuram, Raghu

    1993-01-01

    Clustering methods have been used extensively in computer vision and pattern recognition. Fuzzy clustering has been shown to be advantageous over crisp (or traditional) clustering in that total commitment of a vector to a given class is not required at each iteration. Recently fuzzy clustering methods have shown spectacular ability to detect not only hypervolume clusters, but also clusters which are actually 'thin shells', i.e., curves and surfaces. Most analytic fuzzy clustering approaches are derived from Bezdek's Fuzzy C-Means (FCM) algorithm. The FCM uses the probabilistic constraint that the memberships of a data point across classes sum to one. This constraint was used to generate the membership update equations for an iterative algorithm. Unfortunately, the memberships resulting from FCM and its derivatives do not correspond to the intuitive concept of degree of belonging, and moreover, the algorithms have considerable trouble in noisy environments. Recently, the clustering problem was cast into the framework of possibility theory. Our approach was radically different from the existing clustering methods in that the resulting partition of the data can be interpreted as a possibilistic partition, and the membership values may be interpreted as degrees of possibility of the points belonging to the classes. An appropriate objective function whose minimum will characterize a good possibilistic partition of the data was constructed, and the membership and prototype update equations from necessary conditions for minimization of our criterion function were derived. The ability of this approach to detect linear and quartic curves in the presence of considerable noise is shown.

  3. Possibilistic clustering for shape recognition

    NASA Technical Reports Server (NTRS)

    Keller, James M.; Krishnapuram, Raghu

    1992-01-01

    Clustering methods have been used extensively in computer vision and pattern recognition. Fuzzy clustering has been shown to be advantageous over crisp (or traditional) clustering in that total commitment of a vector to a given class is not required at each iteration. Recently fuzzy clustering methods have shown spectacular ability to detect not only hypervolume clusters, but also clusters which are actually 'thin shells', i.e., curves and surfaces. Most analytic fuzzy clustering approaches are derived from Bezdek's Fuzzy C-Means (FCM) algorithm. The FCM uses the probabilistic constraint that the memberships of a data point across classes sum to one. This constraint was used to generate the membership update equations for an iterative algorithm. Unfortunately, the memberships resulting from FCM and its derivatives do not correspond to the intuitive concept of degree of belonging, and moreover, the algorithms have considerable trouble in noisy environments. Recently, we cast the clustering problem into the framework of possibility theory. Our approach was radically different from the existing clustering methods in that the resulting partition of the data can be interpreted as a possibilistic partition, and the membership values may be interpreted as degrees of possibility of the points belonging to the classes. We constructed an appropriate objective function whose minimum will characterize a good possibilistic partition of the data, and we derived the membership and prototype update equations from necessary conditions for minimization of our criterion function. In this paper, we show the ability of this approach to detect linear and quartic curves in the presence of considerable noise.

  4. Detection and quantification of solute clusters in a nanostructured ferritic alloy

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Miller, Michael K.; Larson, David J.; Reinhard, D. A.

    2014-12-26

    A series of simulated atom probe datasets were examined with a friends-of-friends method to establish the detection efficiency required to resolve solute clusters in the ferrite phase of a 14YWT nanostructured ferritic alloy. The size and number densities of solute clusters in the ferrite of the as-milled mechanically-alloyed condition and the stir zone of a friction stir weld were estimated with a prototype high-detection-efficiency (~80%) local electrode atom probe. High number densities, 1.8 × 10^24 m^-3 and 1.2 × 10^24 m^-3, respectively, of solute clusters containing between 2 and 9 solute atoms of Ti, Y and O were detected for these two conditions. Furthermore, these results support first-principles calculations that predicted that vacancies stabilize these Ti–Y–O clusters, which retard diffusion and contribute to the excellent high temperature stability of the microstructure and radiation tolerance of nanostructured ferritic alloys.
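
    Friends-of-friends (maximum separation) clustering links any two solute atoms closer than a chosen linking distance and takes the connected components as clusters; reduced detection efficiency can be emulated by randomly discarding atoms before linking. The sketch below does this with scipy's KD-tree on synthetic coordinates; the linking distance, minimum cluster size and atom positions are all assumptions, not the study's parameters.

      import numpy as np
      from scipy.spatial import cKDTree
      from scipy.sparse import csr_matrix
      from scipy.sparse.csgraph import connected_components

      def friends_of_friends(xyz, d_link, min_size=2):
          """Group points whose pairwise distance <= d_link into clusters (friends-of-friends)."""
          tree = cKDTree(xyz)
          pairs = tree.query_pairs(r=d_link, output_type="ndarray")
          n = len(xyz)
          adj = csr_matrix((np.ones(len(pairs)), (pairs[:, 0], pairs[:, 1])), shape=(n, n))
          _, labels = connected_components(adj, directed=False)
          sizes = np.bincount(labels)
          keep = np.isin(labels, np.where(sizes >= min_size)[0])
          return labels, keep

      # Assumed synthetic solute positions (nm): two tight clusters plus background atoms.
      rng = np.random.default_rng(0)
      cluster1 = rng.normal([2, 2, 2], 0.2, (8, 3))
      cluster2 = rng.normal([8, 8, 8], 0.2, (5, 3))
      background = rng.uniform(0, 10, (60, 3))
      atoms = np.vstack([cluster1, cluster2, background])

      # Emulate ~80% detection efficiency by discarding 20% of the atoms at random.
      detected = atoms[rng.random(len(atoms)) < 0.8]
      labels, keep = friends_of_friends(detected, d_link=0.6, min_size=4)
      print("clusters found:", len(set(labels[keep])))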

  5. The spatial clustering of obesity: does the built environment matter?

    PubMed

    Huang, R; Moudon, A V; Cook, A J; Drewnowski, A

    2015-12-01

    Obesity rates in the USA show distinct geographical patterns. The present study used spatial cluster detection methods and individual-level data to locate obesity clusters and to analyse them in relation to the neighbourhood built environment. The 2008-2009 Seattle Obesity Study provided data on the self-reported height, weight, and sociodemographic characteristics of 1602 King County adults. Home addresses were geocoded. Clusters of high or low body mass index were identified using Anselin's Local Moran's I and a spatial scan statistic with regression models that searched for unmeasured neighbourhood-level factors from residuals, adjusting for measured individual-level covariates. Spatially continuous values of objectively measured features of the local neighbourhood built environment (SmartMaps) were constructed for seven variables obtained from tax rolls and commercial databases. Both the Local Moran's I and a spatial scan statistic identified similar spatial concentrations of obesity. High and low obesity clusters were attenuated after adjusting for age, gender, race, education and income, and they disappeared once neighbourhood residential property values and residential density were included in the model. Using individual-level data to detect obesity clusters with two cluster detection methods, the present study showed that the spatial concentration of obesity was wholly explained by neighbourhood composition and socioeconomic characteristics. These characteristics may serve to more precisely locate obesity prevention and intervention programmes. © 2014 The British Dietetic Association Ltd.

  6. Comparison of Poisson and Bernoulli spatial cluster analyses of pediatric injuries in a fire district

    PubMed Central

    Warden, Craig R

    2008-01-01

    Background With limited resources available, injury prevention efforts need to be targeted both geographically and to specific populations. As part of a pediatric injury prevention project, data was obtained on all pediatric medical and injury incidents in a fire district to evaluate geographical clustering of pediatric injuries. This will be the first step in attempting to prevent these injuries with specific interventions depending on locations and mechanisms. Results There were a total of 4803 incidents involving patients less than 15 years of age that the fire district responded to during 2001–2005 of which 1997 were categorized as injuries and 2806 as medical calls. The two cohorts (injured versus medical) differed in age distribution (7.7 ± 4.4 years versus 5.4 ± 4.8 years, p < 0.001) and location type of incident (school or church 12% versus 15%, multifamily residence 22% versus 13%, single family residence 51% versus 28%, sport, park or recreational facility 3% versus 8%, public building 8% versus 7%, and street or road 3% versus 30%, respectively, p < 0.001). Using the medical incident locations as controls, there was no significant clustering for environmental or assault injuries using the Bernoulli method while there were four significant clusters for all injury mechanisms combined, 13 clusters for motor vehicle collisions, one for falls, and two for pedestrian or bicycle injuries. Using the Poisson cluster method on incidence rates by census tract identified four clusters for all injuries, three for motor vehicle collisions, four for fall injuries, and one each for environmental and assault injuries. The two detection methods shared a minority of overlapping geographical clusters. Conclusion Significant clustering occurs overall for all injury mechanisms combined and for each mechanism depending on the cluster detection method used. There was some overlap in geographic clusters identified by both methods. The Bernoulli method allows more focused cluster mapping and evaluation since it directly uses location data. Once clusters are found, interventions can be targeted to specific geographic locations, location types, ages of victims, and mechanisms of injury. PMID:18808720

  7. Comparison of Poisson and Bernoulli spatial cluster analyses of pediatric injuries in a fire district.

    PubMed

    Warden, Craig R

    2008-09-22

    With limited resources available, injury prevention efforts need to be targeted both geographically and to specific populations. As part of a pediatric injury prevention project, data was obtained on all pediatric medical and injury incidents in a fire district to evaluate geographical clustering of pediatric injuries. This will be the first step in attempting to prevent these injuries with specific interventions depending on locations and mechanisms. There were a total of 4803 incidents involving patients less than 15 years of age that the fire district responded to during 2001-2005 of which 1997 were categorized as injuries and 2806 as medical calls. The two cohorts (injured versus medical) differed in age distribution (7.7 +/- 4.4 years versus 5.4 +/- 4.8 years, p < 0.001) and location type of incident (school or church 12% versus 15%, multifamily residence 22% versus 13%, single family residence 51% versus 28%, sport, park or recreational facility 3% versus 8%, public building 8% versus 7%, and street or road 3% versus 30%, respectively, p < 0.001). Using the medical incident locations as controls, there was no significant clustering for environmental or assault injuries using the Bernoulli method while there were four significant clusters for all injury mechanisms combined, 13 clusters for motor vehicle collisions, one for falls, and two for pedestrian or bicycle injuries. Using the Poisson cluster method on incidence rates by census tract identified four clusters for all injuries, three for motor vehicle collisions, four for fall injuries, and one each for environmental and assault injuries. The two detection methods shared a minority of overlapping geographical clusters. Significant clustering occurs overall for all injury mechanisms combined and for each mechanism depending on the cluster detection method used. There was some overlap in geographic clusters identified by both methods. The Bernoulli method allows more focused cluster mapping and evaluation since it directly uses location data. Once clusters are found, interventions can be targeted to specific geographic locations, location types, ages of victims, and mechanisms of injury.

  8. A source-attractor approach to network detection of radiation sources

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wu, Qishi; Barry, M. L..; Grieme, M.

    Radiation source detection using a network of detectors is an active field of research for homeland security and defense applications. We propose the Source-attractor Radiation Detection (SRD) method to aggregate measurements from a network of detectors for radiation source detection. The SRD method models a potential radiation source as a magnet-like attractor that pulls in pre-computed virtual points from the detector locations. A detection decision is made if a sufficient level of attraction, quantified by the increase in the clustering of the shifted virtual points, is observed. Compared with traditional methods, SRD has the following advantages: i) it does not require an accurate estimate of the source location from limited and noise-corrupted sensor readings, unlike the localization-based methods, and ii) its virtual point shifting and clustering calculation involve simple arithmetic operations based on the number of detectors, avoiding the high computational complexity of grid-based likelihood estimation methods. We evaluate its detection performance using canonical datasets from the Domestic Nuclear Detection Office's (DNDO) Intelligence Radiation Sensors Systems (IRSS) tests. SRD achieves both a lower false alarm rate and a lower false negative rate compared to three existing algorithms for network source detection.

  9. Small target detection based on difference accumulation and Gaussian curvature under complex conditions

    NASA Astrophysics Data System (ADS)

    Zhang, He; Niu, Yanxiong; Zhang, Hao

    2017-12-01

    Small target detection is a significant subject in infrared search and track and other photoelectric imaging systems. Small targets are imaged under complex conditions that include clouds, the horizon, and bright regions. In this paper, a novel small target detection method is proposed based on difference accumulation, clustering and Gaussian curvature. Because difference accumulation varies across regions, clustering is applied to the accumulated differences to decide whether each pixel belongs to a heterogeneous region and to eliminate heterogeneous regions. Gaussian curvature is then used to separate the target from the homogeneous region. Experiments are conducted for verification, along with comparisons to several other methods. The experimental results demonstrate that our method improves SCRG and BSF by 1-2 orders of magnitude over the other methods. At a false alarm rate of 1, the detection probability is approximately 0.9 with the proposed method.
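
    The Gaussian curvature used in the final separation step can be computed by treating the image as an intensity surface z = f(x, y) and combining its first and second derivatives, K = (f_xx f_yy - f_xy^2) / (1 + f_x^2 + f_y^2)^2. The NumPy sketch below applies this to an assumed toy frame containing one bright blob; the difference-accumulation and clustering stages of the paper are not reproduced.

      import numpy as np

      def gaussian_curvature(img):
          """Gaussian curvature of the intensity surface z = f(x, y) of a 2-D image."""
          img = np.asarray(img, dtype=float)
          fy, fx = np.gradient(img)
          fyy, fyx = np.gradient(fy)
          fxy, fxx = np.gradient(fx)
          return (fxx * fyy - fxy ** 2) / (1.0 + fx ** 2 + fy ** 2) ** 2

      # Assumed toy frame: smooth background gradient plus one small bright blob (the "target").
      y, x = np.mgrid[0:64, 0:64]
      frame = 0.02 * x + np.exp(-((x - 40) ** 2 + (y - 20) ** 2) / 4.0)
      K = gaussian_curvature(frame)
      print("peak curvature location:", np.unravel_index(np.abs(K).argmax(), K.shape))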

  10. Optimization of fecal cytology in the dog: comparison of three sampling methods.

    PubMed

    Frezoulis, Petros S; Angelidou, Elisavet; Diakou, Anastasia; Rallis, Timoleon S; Mylonakis, Mathios E

    2017-09-01

    Dry-mount fecal cytology (FC) is a component of the diagnostic evaluation of gastrointestinal diseases. There is limited information on the possible effect of the sampling method on the cytologic findings of healthy dogs or dogs admitted with diarrhea. We aimed to: (1) establish sampling method-specific expected values of selected cytologic parameters (isolated or clustered epithelial cells, neutrophils, lymphocytes, macrophages, spore-forming rods) in clinically healthy dogs; (2) investigate if the detection of cytologic abnormalities differs among methods in dogs admitted with diarrhea; and (3) investigate if there is any association between FC abnormalities and the anatomic origin (small- or large-bowel diarrhea) or the chronicity of diarrhea. Sampling with digital examination (DE), rectal scraping (RS), and rectal lavage (RL) was prospectively assessed in 37 healthy and 34 diarrheic dogs. The median numbers of isolated (p = 0.000) or clustered (p = 0.002) epithelial cells, and of lymphocytes (p = 0.000), differed among the 3 methods in healthy dogs. In the diarrheic dogs, the RL method was the least sensitive in detecting neutrophils and isolated or clustered epithelial cells. Cytologic abnormalities were not associated with the origin or the chronicity of diarrhea. Sampling methods differed in their sensitivity to detect abnormalities in FC; DE or RS may be of higher sensitivity compared to RL. Anatomic origin or chronicity of diarrhea do not seem to affect the detection of cytologic abnormalities.

  11. A proximity-based graph clustering method for the identification and application of transcription factor clusters.

    PubMed

    Spadafore, Maxwell; Najarian, Kayvan; Boyle, Alan P

    2017-11-29

    Transcription factors (TFs) form a complex regulatory network within the cell that is crucial to cell functioning and human health. While methods to establish where a TF binds to DNA are well established, these methods provide no information describing how TFs interact with one another when they do bind. TFs tend to bind the genome in clusters, and current methods to identify these clusters are either limited in scope, unable to detect relationships beyond motif similarity, or not applied to TF-TF interactions. Here, we present a proximity-based graph clustering approach to identify TF clusters using either ChIP-seq or motif search data. We use TF co-occurrence to construct a filtered, normalized adjacency matrix and use the Markov Clustering Algorithm to partition the graph while maintaining TF-cluster and cluster-cluster interactions. We then apply our graph structure beyond clustering, using it to increase the accuracy of motif-based TFBS searching for an example TF. We show that our method produces small, manageable clusters that encapsulate many known, experimentally validated transcription factor interactions and that our method is capable of capturing interactions that motif similarity methods might miss. Our graph structure is able to significantly increase the accuracy of motif TFBS searching, demonstrating that the TF-TF connections within the graph correlate with biological TF-TF interactions. The interactions identified by our method correspond to biological reality and allow for fast exploration of TF clustering and regulatory dynamics.
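
    The Markov Clustering Algorithm at the heart of this approach alternates an "expand" step (squaring the column-stochastic flow matrix) with an "inflate" step (element-wise powering followed by column re-normalization) until the flow converges and its support reveals the clusters. The sketch below is a bare-bones version on a toy adjacency matrix; the co-occurrence weighting, filtering and normalization described in the abstract are not reproduced.

      import numpy as np

      def markov_clustering(A, inflation=2.0, n_iter=50, self_loops=1.0):
          """Basic Markov Clustering on an adjacency matrix; returns clusters as index tuples."""
          A = np.asarray(A, dtype=float) + self_loops * np.eye(len(A))
          M = A / A.sum(axis=0)                          # column-stochastic flow matrix
          for _ in range(n_iter):
              M = M @ M                                  # expand
              M = M ** inflation                         # inflate
              M /= M.sum(axis=0)                         # re-normalize columns
          # Rows with remaining mass belong to attractors; their support defines a cluster.
          clusters = {tuple(np.nonzero(row > 1e-6)[0]) for row in M if row.sum() > 1e-6}
          return sorted(clusters)

      # Toy graph: two triangles joined by a single weak edge (assumed example).
      A = np.array([[0, 1, 1, 0, 0, 0],
                    [1, 0, 1, 0, 0, 0],
                    [1, 1, 0, 1, 0, 0],
                    [0, 0, 1, 0, 1, 1],
                    [0, 0, 0, 1, 0, 1],
                    [0, 0, 0, 1, 1, 0]])
      print(markov_clustering(A))      # typically yields [(0, 1, 2), (3, 4, 5)]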

  12. Community Detection in Complex Networks via Clique Conductance.

    PubMed

    Lu, Zhenqi; Wahlström, Johan; Nehorai, Arye

    2018-04-13

    Network science plays a central role in understanding and modeling complex systems in many areas including physics, sociology, biology, computer science, economics, politics, and neuroscience. One of the most important features of networks is community structure, i.e., clustering of nodes that are locally densely interconnected. Communities reveal the hierarchical organization of nodes, and detecting communities is of great importance in the study of complex systems. Most existing community-detection methods consider low-order connection patterns at the level of individual links. But high-order connection patterns, at the level of small subnetworks, are generally not considered. In this paper, we develop a novel community-detection method based on cliques, i.e., local complete subnetworks. The proposed method overcomes the deficiencies of previous similar community-detection methods by considering the mathematical properties of cliques. We apply the proposed method to computer-generated graphs and real-world network datasets. When applied to networks with known community structure, the proposed method detects the structure with high fidelity and sensitivity. When applied to networks with no a priori information regarding community structure, the proposed method yields insightful results revealing the organization of these complex networks. We also show that the proposed method is guaranteed to detect near-optimal clusters in the bipartition case.

  13. Performance Analysis of a Pole and Tree Trunk Detection Method for Mobile Laser Scanning Data

    NASA Astrophysics Data System (ADS)

    Lehtomäki, M.; Jaakkola, A.; Hyyppä, J.; Kukko, A.; Kaartinen, H.

    2011-09-01

    Dense point clouds can be collected efficiently from large areas using mobile laser scanning (MLS) technology. Accurate MLS data can be used for detailed 3D modelling of the road surface and objects around it. The 3D models can be utilised, for example, in street planning and maintenance and noise modelling. Utility poles, traffic signs, and lamp posts can be considered an important part of road infrastructure. Poles and trees stand out from the environment and should be included in realistic 3D models. Detection of narrow vertical objects, such as poles and tree trunks, from MLS data was studied. MLS produces huge amounts of data and, therefore, processing methods should be as automatic as possible and for the methods to be practical, the algorithms should run in an acceptable time. The automatic pole detection method tested in this study is based on first finding point clusters that are good candidates for poles and then separating poles and tree trunks from other clusters using features calculated from the clusters and by applying a mask that acts as a model of a pole. The method achieved detection rates of 77.7% and 69.7% in the field tests while 81.0% and 86.5% of the detected targets were correct. Pole-like targets that were surrounded by other objects, such as tree trunks that were inside branches, were the most difficult to detect. Most of the false detections came from wall structures, which could be corrected in further processing.

  14. A FRAMEWORK FOR ATTRIBUTE-BASED COMMUNITY DETECTION WITH APPLICATIONS TO INTEGRATED FUNCTIONAL GENOMICS.

    PubMed

    Yu, Han; Hageman Blair, Rachael

    2016-01-01

    Understanding community structure in networks has received considerable attention in recent years. Detecting and leveraging community structure holds promise for understanding and potentially intervening with the spread of influence. Network features of this type have important implications in a number of research areas, including marketing, social networks, and biology. However, an overwhelming majority of traditional approaches to community detection cannot readily incorporate information on node attributes. Integrating structural and attribute information is a major challenge. We propose a flexible iterative method, inverse regularized Markov Clustering (irMCL), for network clustering via manipulation of the transition probability matrix (aka stochastic flow) corresponding to a graph. Similar to traditional Markov Clustering, irMCL iterates between "expand" and "inflate" operations, which aim to strengthen the intra-cluster flow while weakening the inter-cluster flow. Attribute information is directly incorporated into the iterative method through a sigmoid (logistic function) that naturally dampens attribute influence that is contradictory to the stochastic flow through the network. We demonstrate the advantages and flexibility of our approach using simulations and real data. We highlight an application that integrates a breast cancer gene expression data set with a functional network defined via KEGG pathways, revealing significant modules for survival.

  15. Local multiplicity adjustment for the spatial scan statistic using the Gumbel distribution.

    PubMed

    Gangnon, Ronald E

    2012-03-01

    The spatial scan statistic is an important and widely used tool for cluster detection. It is based on the simultaneous evaluation of the statistical significance of the maximum likelihood ratio test statistic over a large collection of potential clusters. In most cluster detection problems, there is variation in the extent of local multiplicity across the study region. For example, using a fixed maximum geographic radius for clusters, urban areas typically have many overlapping potential clusters, whereas rural areas have relatively few. The spatial scan statistic does not account for local multiplicity variation. We describe a previously proposed local multiplicity adjustment based on a nested Bonferroni correction and propose a novel adjustment based on a Gumbel distribution approximation to the distribution of a local scan statistic. We compare the performance of all three statistics in terms of power and a novel unbiased cluster detection criterion. These methods are then applied to the well-known New York leukemia dataset and a Wisconsin breast cancer incidence dataset. © 2011, The International Biometric Society.
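
    The Gumbel adjustment exploits the fact that Monte Carlo replicates of a maximum (local) scan statistic under the null are approximately Gumbel distributed, so a fitted Gumbel yields smooth tail p-values beyond the resolution of the replicates themselves. A hedged sketch with scipy follows; the replicate values and observed statistic are placeholders, not data from the paper.

      import numpy as np
      from scipy.stats import gumbel_r

      # Placeholder Monte Carlo replicates of the maximum log-likelihood-ratio statistic
      # under the null (in practice these come from re-randomizing the case data).
      rng = np.random.default_rng(0)
      null_max_llr = gumbel_r.rvs(loc=7.0, scale=1.2, size=999, random_state=rng)

      observed_llr = 12.4                      # assumed value for the most likely cluster

      # Standard Monte Carlo p-value: rank of the observed statistic among the replicates.
      p_mc = (1 + np.sum(null_max_llr >= observed_llr)) / (len(null_max_llr) + 1)

      # Gumbel-based p-value: fit location/scale to the replicates, then use the tail.
      loc, scale = gumbel_r.fit(null_max_llr)
      p_gumbel = gumbel_r.sf(observed_llr, loc=loc, scale=scale)
      print(f"Monte Carlo p = {p_mc:.4f}, Gumbel-approximated p = {p_gumbel:.5f}")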

  16. Local multiplicity adjustment for the spatial scan statistic using the Gumbel distribution

    PubMed Central

    Gangnon, Ronald E.

    2011-01-01

    Summary The spatial scan statistic is an important and widely used tool for cluster detection. It is based on the simultaneous evaluation of the statistical significance of the maximum likelihood ratio test statistic over a large collection of potential clusters. In most cluster detection problems, there is variation in the extent of local multiplicity across the study region. For example, using a fixed maximum geographic radius for clusters, urban areas typically have many overlapping potential clusters, while rural areas have relatively few. The spatial scan statistic does not account for local multiplicity variation. We describe a previously proposed local multiplicity adjustment based on a nested Bonferroni correction and propose a novel adjustment based on a Gumbel distribution approximation to the distribution of a local scan statistic. We compare the performance of all three statistics in terms of power and a novel unbiased cluster detection criterion. These methods are then applied to the well-known New York leukemia dataset and a Wisconsin breast cancer incidence dataset. PMID:21762118

  17. Comparison of cluster-based and source-attribution methods for estimating transmission risk using large HIV sequence databases.

    PubMed

    Le Vu, Stéphane; Ratmann, Oliver; Delpech, Valerie; Brown, Alison E; Gill, O Noel; Tostevin, Anna; Fraser, Christophe; Volz, Erik M

    2018-06-01

    Phylogenetic clustering of HIV sequences from a random sample of patients can reveal epidemiological transmission patterns, but interpretation is hampered by limited theoretical support, and the statistical properties of clustering analysis remain poorly understood. Alternatively, source attribution methods allow fitting of HIV transmission models and thereby quantify aspects of disease transmission. A simulation study was conducted to assess the error rates of clustering methods for detecting transmission risk factors. We modeled HIV epidemics among men who have sex with men and generated phylogenies comparable to those that can be obtained from HIV surveillance data in the UK. Clustering and source attribution approaches were applied to evaluate their ability to identify patient attributes as transmission risk factors. We find that commonly used methods show a misleading association between cluster size or odds of clustering and covariates that are correlated with time since infection, regardless of their influence on transmission. Clustering methods usually have higher error rates and lower sensitivity than the source attribution method for identifying transmission risk factors, but neither approach provides robust estimates of transmission risk ratios. Source attribution can alleviate the drawbacks of phylogenetic clustering, but formal population genetic modeling may be required to estimate quantitative transmission risk factors. Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.

  18. Semi-supervised spectral algorithms for community detection in complex networks based on equivalence of clustering methods

    NASA Astrophysics Data System (ADS)

    Ma, Xiaoke; Wang, Bingbo; Yu, Liang

    2018-01-01

    Community detection is fundamental for revealing the structure-functionality relationship in complex networks, and it involves two issues: the quantitative function for community quality and the algorithms to discover communities. Despite significant research on each of these, few attempts have been made to establish the connection between them. To attack this problem, a generalized quantification function is proposed for communities in weighted networks, providing a framework that unifies several well-known measures. Then, we prove that the trace optimization of the proposed measure is equivalent to the objective functions of algorithms such as nonnegative matrix factorization, kernel K-means and spectral clustering. This serves as the theoretical foundation for designing algorithms for community detection. On the second issue, a semi-supervised spectral clustering algorithm is developed by exploiting this equivalence relation, combining nonnegative matrix factorization and spectral clustering. Different from traditional semi-supervised algorithms, the partial supervision is integrated into the objective of the spectral algorithm. Finally, through extensive experiments on both artificial and real-world networks, we demonstrate that the proposed method improves the accuracy of traditional spectral algorithms in community detection.

  19. A mathematical programming approach for sequential clustering of dynamic networks

    NASA Astrophysics Data System (ADS)

    Silva, Jonathan C.; Bennett, Laura; Papageorgiou, Lazaros G.; Tsoka, Sophia

    2016-02-01

    A common analysis performed on dynamic networks is community structure detection, a challenging problem that aims to track the temporal evolution of network modules. An emerging area in this field is evolutionary clustering, where the community structure of a network snapshot is identified by taking into account both its current state as well as previous time points. Based on this concept, we have developed a mixed integer non-linear programming (MINLP) model, SeqMod, that sequentially clusters each snapshot of a dynamic network. The modularity metric is used to determine the quality of community structure of the current snapshot and the historical cost is accounted for by optimising the number of node pairs co-clustered at the previous time point that remain so in the current snapshot partition. Our method is tested on social networks of interactions among high school students, college students and members of the Brazilian Congress. We show that, for an adequate parameter setting, our algorithm detects the classes to which these students belong more accurately than partitioning each time step individually or partitioning the aggregated snapshots. Our method also detects drastic discontinuities in interaction patterns across network snapshots. Finally, we present comparative results with similar community detection methods for time-dependent networks from the literature. Overall, we illustrate the applicability of mathematical programming as a flexible, adaptable and systematic approach for these community detection problems. Contribution to the Topical Issue "Temporal Network Theory and Applications", edited by Petter Holme.

  20. Change detection and classification of land cover in multispectral satellite imagery using clustering of sparse approximations (CoSA) over learned feature dictionaries

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Moody, Daniela I.; Brumby, Steven P.; Rowland, Joel C.

    Neuromimetic machine vision and pattern recognition algorithms are of great interest for landscape characterization and change detection in satellite imagery in support of global climate change science and modeling. We present results from an ongoing effort to extend machine vision methods to the environmental sciences, using adaptive sparse signal processing combined with machine learning. A Hebbian learning rule is used to build multispectral, multiresolution dictionaries from regional satellite normalized band difference index data. Land cover labels are automatically generated via our CoSA algorithm: Clustering of Sparse Approximations, using a clustering distance metric that combines spectral and spatial textural characteristics to help separate geologic, vegetative, and hydrologic features. We demonstrate our method on example Worldview-2 satellite images of an Arctic region, and use CoSA labels to detect seasonal surface changes. In conclusion, our results suggest that neuroscience-based models are a promising approach to practical pattern recognition and change detection problems in remote sensing.

  1. Change detection and classification of land cover in multispectral satellite imagery using clustering of sparse approximations (CoSA) over learned feature dictionaries

    DOE PAGES

    Moody, Daniela I.; Brumby, Steven P.; Rowland, Joel C.; ...

    2014-10-01

    Neuromimetic machine vision and pattern recognition algorithms are of great interest for landscape characterization and change detection in satellite imagery in support of global climate change science and modeling. We present results from an ongoing effort to extend machine vision methods to the environmental sciences, using adaptive sparse signal processing combined with machine learning. A Hebbian learning rule is used to build multispectral, multiresolution dictionaries from regional satellite normalized band difference index data. Land cover labels are automatically generated via our CoSA algorithm: Clustering of Sparse Approximations, using a clustering distance metric that combines spectral and spatial textural characteristics to help separate geologic, vegetative, and hydrologic features. We demonstrate our method on example Worldview-2 satellite images of an Arctic region, and use CoSA labels to detect seasonal surface changes. In conclusion, our results suggest that neuroscience-based models are a promising approach to practical pattern recognition and change detection problems in remote sensing.

  2. Method for assaying clustered DNA damages

    DOEpatents

    Sutherland, Betsy M.

    2004-09-07

    Disclosed is a method for detecting and quantifying clustered damages in DNA. In this method, a first aliquot of the DNA to be tested for clustered damages is treated with one or more lesion-specific cleaving reagents under conditions appropriate for cleavage of the DNA, producing single-strand nicks in the DNA at the sites of damage lesions. The number average molecular length (Ln) of double-stranded DNA is then quantitatively determined for the treated DNA. The number average molecular length (Ln) of double-stranded DNA is also quantitatively determined for a second, untreated aliquot of the DNA. The frequency of clustered damages (Φc) in the DNA is then calculated.
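
    A common way to turn number average molecular lengths into a damage frequency, and plausibly the arithmetic behind the final step here, is to take the difference of the reciprocal lengths of treated and untreated DNA, i.e. the extra cleavage sites per unit length. The sketch below illustrates that calculation with assumed lengths; it should not be read as the patented protocol itself.

      def clustered_damage_frequency(ln_treated_kb, ln_untreated_kb):
          """Cluster frequency per kb from number average molecular lengths.

          Uses the common relation: frequency = 1/Ln(treated) - 1/Ln(untreated),
          i.e. extra nick-induced breaks per unit length (illustrative only).
          """
          return 1.0 / ln_treated_kb - 1.0 / ln_untreated_kb

      # Assumed example: enzyme treatment shortens Ln from 48 kb to 12 kb.
      phi_per_kb = clustered_damage_frequency(12.0, 48.0)
      print(f"{phi_per_kb:.4f} clusters per kb  (~{phi_per_kb * 1e3:.1f} per Mb)")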

  3. Refining historical limits method to improve disease cluster detection, New York City, New York, USA.

    PubMed

    Levin-Rector, Alison; Wilson, Elisha L; Fine, Annie D; Greene, Sharon K

    2015-02-01

    Since the early 2000s, the Bureau of Communicable Disease of the New York City Department of Health and Mental Hygiene has analyzed reportable infectious disease data weekly by using the historical limits method to detect unusual clusters that could represent outbreaks. This method typically produced too many signals for each to be investigated with available resources while possibly failing to signal during true disease outbreaks. We made method refinements that improved the consistency of case inclusion criteria and accounted for data lags and trends and aberrations in historical data. During a 12-week period in 2013, we prospectively assessed these refinements using actual surveillance data. The refined method yielded 74 signals, a 45% decrease from what the original method would have produced. Fewer and less biased signals included a true citywide increase in legionellosis and a localized campylobacteriosis cluster subsequently linked to live-poultry markets. Future evaluations using simulated data could complement this descriptive assessment.
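
    The unrefined historical limits comparison flags the current count when it exceeds the mean of comparable historical periods by more than about two standard deviations. The sketch below shows only that basic test with assumed counts; the Bureau's refinements (case-inclusion rules, reporting-lag and trend adjustments) are exactly what it omits.

      import numpy as np

      def historical_limits_signal(current_count, historical_counts, z=2.0):
          """Flag the current period when it exceeds the historical mean by more than z SDs.

          historical_counts would typically hold counts from the comparable period (plus
          adjacent periods) over several previous years.
          """
          hist = np.asarray(historical_counts, dtype=float)
          threshold = hist.mean() + z * hist.std(ddof=1)
          return current_count > threshold, threshold

      # Assumed counts: 15 historical 4-week totals and one current total.
      history = [12, 9, 14, 11, 10, 13, 8, 12, 15, 9, 11, 10, 13, 12, 11]
      signal, cutoff = historical_limits_signal(26, history)
      print(f"signal={signal}, threshold={cutoff:.1f} cases")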

  4. Assessment of cluster yield components by image analysis.

    PubMed

    Diago, Maria P; Tardaguila, Javier; Aleixos, Nuria; Millan, Borja; Prats-Montalban, Jose M; Cubero, Sergio; Blasco, Jose

    2015-04-01

    Berry weight, berry number and cluster weight are key parameters for yield estimation in the wine and table-grape industries. Current yield prediction methods are destructive, labour-demanding and time-consuming. In this work, a new methodology based on image analysis was developed to determine cluster yield components in a fast and inexpensive way. Clusters of seven different red varieties of grapevine (Vitis vinifera L.) were photographed under laboratory conditions and their cluster yield components manually determined after image acquisition. Two algorithms, based on the Canny and the logarithmic image processing approaches, were tested to find the contours of the berries in the images prior to berry detection performed by means of the Hough Transform. Results were obtained in two ways: by analysing either a single image of the cluster or four images per cluster from different orientations. The best results (R² between 69% and 95% in berry detection and between 65% and 97% in cluster weight estimation) were achieved using four images and the Canny algorithm. The capability of the image analysis-based model to predict berry weight was 84%. The new and low-cost methodology presented here enabled the assessment of cluster yield components, saving time and providing inexpensive information in comparison with current manual methods. © 2014 Society of Chemical Industry.
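
    The berry-detection step (edge extraction followed by a circular Hough transform) can be sketched with OpenCV, which applies a Canny edge detector internally when the gradient-based Hough method is used. The image below is a synthetic stand-in for a cluster photograph, and the thresholds and radius range are assumptions; this is not the authors' pipeline.

      import cv2
      import numpy as np

      # Synthetic stand-in for a cluster photograph: dark background with four "berries".
      img = np.zeros((240, 240), dtype=np.uint8)
      for cx, cy in [(70, 80), (120, 90), (95, 140), (160, 150)]:
          cv2.circle(img, (cx, cy), 25, 180, thickness=-1)
      img = cv2.GaussianBlur(img, (7, 7), 2)

      # Circular Hough transform; param1 is the upper threshold of the internal Canny
      # edge detector, param2 the accumulator threshold (both assumed values).
      circles = cv2.HoughCircles(img, cv2.HOUGH_GRADIENT, dp=1.2, minDist=30,
                                 param1=150, param2=20, minRadius=15, maxRadius=40)

      n_berries = 0 if circles is None else circles.shape[1]
      print("berries detected:", n_berries)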

  5. Clustering Scientific Publications Based on Citation Relations: A Systematic Comparison of Different Methods.

    PubMed

    Šubelj, Lovro; van Eck, Nees Jan; Waltman, Ludo

    2016-01-01

    Clustering methods are applied regularly in the bibliometric literature to identify research areas or scientific fields. These methods are for instance used to group publications into clusters based on their relations in a citation network. In the network science literature, many clustering methods, often referred to as graph partitioning or community detection techniques, have been developed. Focusing on the problem of clustering the publications in a citation network, we present a systematic comparison of the performance of a large number of these clustering methods. Using a number of different citation networks, some of them relatively small and others very large, we extensively study the statistical properties of the results provided by different methods. In addition, we also carry out an expert-based assessment of the results produced by different methods. The expert-based assessment focuses on publications in the field of scientometrics. Our findings seem to indicate that there is a trade-off between different properties that may be considered desirable for a good clustering of publications. Overall, map equation methods appear to perform best in our analysis, suggesting that these methods deserve more attention from the bibliometric community.
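
    The map equation methods favoured in this comparison are implemented in the separate Infomap software; as a rough, hedged illustration of the general workflow (grouping publications by their citation links), the sketch below instead uses networkx with greedy modularity optimisation, which optimises a different objective.

        import networkx as nx
        from networkx.algorithms.community import greedy_modularity_communities

        # Toy citation network: nodes are publications, edges are citation relations
        # (treated as undirected for clustering). The edge list is purely illustrative.
        citations = [("p1", "p2"), ("p1", "p3"), ("p2", "p3"),
                     ("p4", "p5"), ("p5", "p6"), ("p4", "p6"), ("p3", "p4")]
        G = nx.Graph(citations)

        # Greedy modularity maximisation; the map-equation methods discussed above
        # would require the separate Infomap package instead.
        communities = greedy_modularity_communities(G)
        for i, community in enumerate(communities):
            print(f"cluster {i}: {sorted(community)}")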

  6. Clustering Scientific Publications Based on Citation Relations: A Systematic Comparison of Different Methods

    PubMed Central

    Šubelj, Lovro; van Eck, Nees Jan; Waltman, Ludo

    2016-01-01

    Clustering methods are applied regularly in the bibliometric literature to identify research areas or scientific fields. These methods are for instance used to group publications into clusters based on their relations in a citation network. In the network science literature, many clustering methods, often referred to as graph partitioning or community detection techniques, have been developed. Focusing on the problem of clustering the publications in a citation network, we present a systematic comparison of the performance of a large number of these clustering methods. Using a number of different citation networks, some of them relatively small and others very large, we extensively study the statistical properties of the results provided by different methods. In addition, we also carry out an expert-based assessment of the results produced by different methods. The expert-based assessment focuses on publications in the field of scientometrics. Our findings seem to indicate that there is a trade-off between different properties that may be considered desirable for a good clustering of publications. Overall, map equation methods appear to perform best in our analysis, suggesting that these methods deserve more attention from the bibliometric community. PMID:27124610

  7. The Effects of Including Observed Means or Latent Means as Covariates in Multilevel Models for Cluster Randomized Trials

    ERIC Educational Resources Information Center

    Aydin, Burak; Leite, Walter L.; Algina, James

    2016-01-01

    We investigated methods of including covariates in two-level models for cluster randomized trials to increase power to detect the treatment effect. We compared multilevel models that included either an observed cluster mean or a latent cluster mean as a covariate, as well as the effect of including Level 1 deviation scores in the model. A Monte…

  8. Spatio-temporal cluster detection of chickenpox in Valencia, Spain in the period 2008-2012.

    PubMed

    Iftimi, Adina; Martínez-Ruiz, Francisco; Míguez Santiyán, Ana; Montes, Francisco

    2015-05-18

    Chickenpox is a highly contagious airborne disease caused by Varicella zoster, which affects nearly all non-immune children worldwide, with an annual incidence estimated at 80-90 million cases. To analyze the spatio-temporal pattern of chickenpox incidence in the city of Valencia, Spain, two complementary statistical approaches were used. First, we evaluated the existence of clusters and spatio-temporal interaction; second, we used this information to find the locations of the spatio-temporal clusters via the space-time permutation model. The first method detects aggregation in the data but does not provide spatial or temporal information; the second gives the locations, areas and time frames of the spatio-temporal clusters. An overall decreasing time trend, a pronounced 12-monthly periodicity and two complementary periods were observed. Several areas with high incidence surrounding the center of the city were identified. The existence of aggregation in time and space was observed, and a number of spatio-temporal clusters were located.
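
    As a hedged illustration of the second approach, the sketch below computes the expected counts used by the space-time permutation model, obtained by conditioning on the observed marginals of an area-by-period case table; the scan over space-time cylinders and the Monte Carlo inference are omitted.

        import numpy as np

        def expected_counts(counts):
            """Expected case counts under the space-time permutation null hypothesis:
            conditioning on the marginals, the expectation for area a and period t is
            (area total * period total) / grand total. `counts` is an
            (areas x periods) array of observed cases. The scan over space-time
            cylinders and the Monte Carlo p-values are omitted from this sketch."""
            counts = np.asarray(counts, dtype=float)
            area_totals = counts.sum(axis=1, keepdims=True)
            period_totals = counts.sum(axis=0, keepdims=True)
            return area_totals * period_totals / counts.sum()

        # Hypothetical table of 3 areas x 4 periods of chickenpox cases.
        obs = [[5, 2, 1, 0], [1, 8, 2, 1], [0, 1, 1, 6]]
        print(expected_counts(obs).round(2))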

  9. MRI-alone radiation therapy planning for prostate cancer: Automatic fiducial marker detection

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ghose, Soumya, E-mail: soumya.ghose@case.edu; Mitra, Jhimli; Rivest-Hénault, David

    Purpose: The feasibility of radiation therapy treatment planning using substitute computed tomography (sCT) generated from magnetic resonance images (MRIs) has been demonstrated by a number of research groups. One challenge with an MRI-alone workflow is the accurate identification of intraprostatic gold fiducial markers, which are frequently used for prostate localization prior to each dose delivery fraction. This paper investigates a template-matching approach for the detection of these seeds in MRI. Methods: Two different gradient echo T1 and T2* weighted MRI sequences were acquired from fifteen prostate cancer patients and evaluated for seed detection. For training, seed templates from manual contours were selected in a spectral clustering manifold learning framework. This aids in clustering “similar” gold fiducial markers together. The marker with the minimum distance to a cluster centroid was selected as the representative template of that cluster during training. During testing, Gaussian mixture modeling followed by a Markovian model was used in automatic detection of the probable candidates. The probable candidates were rigidly registered to the templates identified from spectral clustering, and a similarity metric was computed for ranking and detection. Results: A fiducial detection accuracy of 95% was obtained compared to manual observations. Expert radiation therapist observers were able to correctly identify all three implanted seeds on 11 of the 15 scans (the proposed method correctly identified all seeds on 10 of the 15). Conclusions: A novel automatic framework for gold fiducial marker detection in MRI is proposed and evaluated with detection accuracies comparable to manual detection. When radiation therapists are unable to determine the seed location in MRI, they refer back to the planning CT (only available in the existing clinical framework); similarly, an automatic quality control is built into the automatic software to ensure that all gold seeds are either correctly detected or a warning is raised for further manual intervention.

  10. Computer object segmentation by nonlinear image enhancement, multidimensional clustering, and geometrically constrained contour optimization

    NASA Astrophysics Data System (ADS)

    Bruynooghe, Michel M.

    1998-04-01

    In this paper, we present a robust method for automatic object detection and delineation in noisy complex images. The proposed procedure is a three-stage process that integrates image segmentation by multidimensional pixel clustering and geometrically constrained optimization of deformable contours. The first step is to enhance the original image by nonlinear unsharp masking. The second step is to segment the enhanced image by multidimensional pixel clustering, using our reducible neighborhoods clustering algorithm, which has a favourable theoretical worst-case complexity. Then, candidate objects are extracted and initially delineated by an optimized region merging algorithm that is based on ascendant hierarchical clustering with contiguity constraints and on the maximization of average contour gradients. The third step is to optimize the delineation of previously extracted and initially delineated objects. Deformable object contours have been modeled by cubic splines. An affine invariant has been used to control the undesired formation of cusps and loops. Nonlinear constrained optimization has been used to maximize the external energy. This avoids the difficult and non-reproducible choice of regularization parameters that are required by classical snake models. The proposed method has been applied successfully to the detection of fine and subtle microcalcifications in X-ray mammographic images, to defect detection by moire image analysis, and to the analysis of microrugosities of thin metallic films. A later implementation of the proposed method on a digital signal processor coupled with a vector coprocessor would allow the design of a real-time object detection and delineation system for applications in medical imaging and in industrial computer vision.
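
    The first stage, unsharp masking, amounts to adding back a scaled high-pass component of the image; a minimal linear sketch is shown below (the nonlinear variant used in the paper would modify the high-pass term and is not reproduced here).

        import numpy as np
        from scipy.ndimage import gaussian_filter

        def unsharp_mask(image, sigma=2.0, amount=1.5):
            """Enhance fine detail by adding a scaled high-pass residual:
            enhanced = image + amount * (image - lowpass(image)).
            A plain linear version; the nonlinear weighting used in the paper is
            not reproduced here."""
            image = image.astype(float)
            lowpass = gaussian_filter(image, sigma=sigma)
            enhanced = image + amount * (image - lowpass)
            return np.clip(enhanced, 0, 255)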

  11. A coloured oil level indicator detection method based on simple linear iterative clustering

    NASA Astrophysics Data System (ADS)

    Liu, Tianli; Li, Dongsong; Jiao, Zhiming; Liang, Tao; Zhou, Hao; Yang, Guoqing

    2017-12-01

    A detection method for coloured oil level indicators is put forward. The method is applied to substation inspection robots, enabling automatic inspection and recognition of oil level indicators. First, an image of the oil level indicator is acquired, then clustered and segmented to obtain the label matrix of the image. Second, the image is processed by a colour space transformation, and the feature matrix of the image is obtained. Finally, the label matrix and feature matrix are used to locate and segment the indicator region, and the upper edge of the recognized region is extracted. If this upper edge exceeds the preset oil level threshold, an alarm alerts the station staff. Through this image processing, the inspection robot can recognize the oil level of the indicator independently, replacing manual inspection and supporting automated, unattended operation.
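
    As a hedged sketch of the first two steps, the example below uses scikit-image's SLIC superpixels as an off-the-shelf stand-in for the clustering that produces the label matrix, and an HSV transform for the feature matrix; the paper's own parameters and robot integration are not reproduced.

        from skimage import io, color
        from skimage.segmentation import slic

        def label_and_features(image_path, n_segments=200):
            """Produce a 'label matrix' (SLIC superpixel labels) and a simple
            'feature matrix' (per-pixel HSV values), analogous to the two inputs the
            method combines. scikit-image's SLIC is an off-the-shelf stand-in; the
            paper's own parameters are not known here."""
            rgb = io.imread(image_path)
            labels = slic(rgb, n_segments=n_segments, compactness=10, start_label=1)
            features = color.rgb2hsv(rgb)      # colour-space transform of the image
            return labels, features

        # Locating the coloured float is then a matter of selecting superpixels whose
        # mean hue/saturation fall in the indicator's colour range and taking the
        # topmost row of that region as the oil level line.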

  12. Multi-fault clustering and diagnosis of gear system mined by spectrum entropy clustering based on higher order cumulants

    NASA Astrophysics Data System (ADS)

    Shao, Renping; Li, Jing; Hu, Wentao; Dong, Feifei

    2013-02-01

    Higher order cumulants (HOC) analysis is a relatively new theory and technique of modern signal analysis. Spectrum entropy clustering (SEC) is a statistical data-mining method for extracting useful characteristics from large amounts of nonlinear and non-stationary data. Following a discussion of the characteristics of HOC theory and the SEC method, this paper introduces the relevant signal-processing techniques and the particular merits of nonlinear coupling characteristic analysis for processing random and non-stationary signals. A new clustering analysis and diagnosis method is then proposed for detecting multiple damages in a gear system by introducing the combination of HOC and SEC into damage detection and diagnosis. Noise is suppressed by HOC, coupling features are extracted, and the characteristic signals are separated at different speeds and frequency bands; under these circumstances, the weak signal characteristics in the system are emphasized and the multi-fault characteristics are extracted. The SEC data-mining method is then used to analyse and diagnose six signal classes (no fault, short crack in the tooth root, long crack in the tooth root, short crack at the pitch circle, long crack at the pitch circle, and tooth wear) at running speeds of 300 r/min, 900 r/min, 1200 r/min, and 1500 r/min. The research shows that this combined detection and diagnosis method can also identify the degree of damage for some faults. On this basis, a virtual instrument for detecting damage and diagnosing faults in the gear system is developed by combining the advantages of MATLAB and VC++, employing Component Object Model technology, adopting mixed programming methods, and calling programs converted from *.m files within VC++. The software system provides functions for acquiring and importing gear vibration signals, signal analysis and processing, feature extraction, graphical visualization, fault detection and diagnosis, and condition monitoring. Finally, testing and verification show that the developed system can effectively detect and diagnose faults in an actual operating gear transmission system.
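
    The clustering itself (SEC) and the higher-order-cumulant preprocessing are specific to the paper; as a hedged sketch of the underlying feature, the example below computes the spectrum (spectral) entropy of a vibration signal, i.e. the Shannon entropy of its normalised power spectrum.

        import numpy as np

        def spectral_entropy(signal):
            """Shannon entropy of the normalised power spectrum of a vibration
            signal. Only the basic 'spectrum entropy' feature; the clustering of
            such features (SEC) and the higher-order-cumulant preprocessing are not
            reproduced here."""
            spectrum = np.abs(np.fft.rfft(signal)) ** 2
            p = spectrum / spectrum.sum()            # treat the spectrum as a pmf
            p = p[p > 0]
            return -np.sum(p * np.log2(p))

        # Hypothetical example: entropy of a noisy 50 Hz tone sampled at 1 kHz.
        fs = 1000
        t = np.arange(0, 1, 1 / fs)
        x = np.sin(2 * np.pi * 50 * t) + 0.1 * np.random.randn(t.size)
        print(round(spectral_entropy(x), 2))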

  13. Multi-fault clustering and diagnosis of gear system mined by spectrum entropy clustering based on higher order cumulants.

    PubMed

    Shao, Renping; Li, Jing; Hu, Wentao; Dong, Feifei

    2013-02-01

    Higher order cumulants (HOC) analysis is a relatively new theory and technique of modern signal analysis. Spectrum entropy clustering (SEC) is a statistical data-mining method for extracting useful characteristics from large amounts of nonlinear and non-stationary data. Following a discussion of the characteristics of HOC theory and the SEC method, this paper introduces the relevant signal-processing techniques and the particular merits of nonlinear coupling characteristic analysis for processing random and non-stationary signals. A new clustering analysis and diagnosis method is then proposed for detecting multiple damages in a gear system by introducing the combination of HOC and SEC into damage detection and diagnosis. Noise is suppressed by HOC, coupling features are extracted, and the characteristic signals are separated at different speeds and frequency bands; under these circumstances, the weak signal characteristics in the system are emphasized and the multi-fault characteristics are extracted. The SEC data-mining method is then used to analyse and diagnose six signal classes (no fault, short crack in the tooth root, long crack in the tooth root, short crack at the pitch circle, long crack at the pitch circle, and tooth wear) at running speeds of 300 r/min, 900 r/min, 1200 r/min, and 1500 r/min. The research shows that this combined detection and diagnosis method can also identify the degree of damage for some faults. On this basis, a virtual instrument for detecting damage and diagnosing faults in the gear system is developed by combining the advantages of MATLAB and VC++, employing Component Object Model technology, adopting mixed programming methods, and calling programs converted from *.m files within VC++. The software system provides functions for acquiring and importing gear vibration signals, signal analysis and processing, feature extraction, graphical visualization, fault detection and diagnosis, and condition monitoring. Finally, testing and verification show that the developed system can effectively detect and diagnose faults in an actual operating gear transmission system.

  14. Distribution-based fuzzy clustering of electrical resistivity tomography images for interface detection

    NASA Astrophysics Data System (ADS)

    Ward, W. O. C.; Wilkinson, P. B.; Chambers, J. E.; Oxby, L. S.; Bai, L.

    2014-04-01

    A novel method for the effective identification of bedrock subsurface elevation from electrical resistivity tomography images is described. Identifying subsurface boundaries in the tomographic data can be difficult due to smoothness constraints used in inversion, so a statistical population-based approach is used that extends previous work in calculating isoresistivity surfaces. The analysis framework involves a procedure for guiding a clustering approach based on the fuzzy c-means algorithm. An approximation of resistivity distributions, found using kernel density estimation, was utilized as a means of guiding the cluster centroids used to classify data. A fuzzy method was chosen over hard clustering due to uncertainty in hard edges in the tomography data, and a measure of clustering uncertainty was identified based on the reciprocal of cluster membership. The algorithm was validated using a direct comparison of known observed bedrock depths at two 3-D survey sites, using real-time GPS information of exposed bedrock by quarrying on one site, and borehole logs at the other. Results show similarly accurate detection as a leading isosurface estimation method, and the proposed algorithm requires significantly less user input and prior site knowledge. Furthermore, the method is effectively dimension-independent and will scale to data of increased spatial dimensions without a significant effect on the runtime. A discussion on the results by automated versus supervised analysis is also presented.
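
    A hedged sketch of the core idea follows: place the fuzzy c-means cluster centres at peaks of a kernel density estimate of the (log-)resistivity values, then iterate the standard fuzzy memberships. This is not the authors' code, and the full ERT workflow (inversion, isosurface comparison, uncertainty mapping) is omitted.

        import numpy as np
        from scipy.stats import gaussian_kde

        def kde_guided_fcm(values, n_clusters=2, m=2.0, n_iter=100):
            """1-D fuzzy c-means on (log-)resistivity values, with cluster centres
            initialised at peaks of a kernel density estimate, in the spirit of the
            guided clustering described above (a sketch, not the authors' code)."""
            x = np.asarray(values, dtype=float)
            grid = np.linspace(x.min(), x.max(), 512)
            density = gaussian_kde(x)(grid)
            peaks = [i for i in range(1, len(grid) - 1)
                     if density[i] > density[i - 1] and density[i] > density[i + 1]]
            peaks = sorted(peaks, key=lambda i: density[i], reverse=True)[:n_clusters]
            if len(peaks) < n_clusters:                 # fall back to quantiles
                centres = np.quantile(x, np.linspace(0.1, 0.9, n_clusters))
            else:
                centres = np.sort(grid[peaks])
            for _ in range(n_iter):
                d = np.abs(x[:, None] - centres[None, :]) + 1e-12
                u = 1.0 / np.sum((d[:, :, None] / d[:, None, :]) ** (2 / (m - 1)), axis=2)
                centres = (u ** m).T @ x / (u ** m).sum(axis=0)
            return centres, u   # u[i, k] is the membership of value i in cluster k

        # Uncertainty at the bedrock interface can be flagged where the maximum
        # membership is low (cf. the reciprocal-of-membership measure noted above).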

  15. A novel complex networks clustering algorithm based on the core influence of nodes.

    PubMed

    Tong, Chao; Niu, Jianwei; Dai, Bin; Xie, Zhongyu

    2014-01-01

    In complex networks, cluster structure, identified by the heterogeneity of nodes, has become a common and important topological property. Network clustering methods are thus significant for the study of complex networks. Currently, many typical clustering algorithms have weaknesses such as inaccuracy and slow convergence. In this paper, we propose a clustering algorithm based on calculating the core influence of nodes. The clustering process is a simulation of the process of cluster formation in sociology. The algorithm detects the nodes with core influence through their betweenness centrality, and builds the cluster's core structure by discriminant functions. Next, the algorithm obtains the final cluster structure after clustering the rest of the nodes in the network by an optimization method. Experiments on different datasets show that the clustering accuracy of this algorithm is superior to that of the classical clustering algorithm (the Fast-Newman algorithm). It also clusters faster and helps reveal the real cluster structure of complex networks precisely.
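
    Only the first step of the algorithm, ranking candidate core nodes by betweenness centrality, is easy to sketch generically; the example below does so with networkx, while the discriminant functions and the optimisation-based assignment of the remaining nodes are not reproduced.

        import networkx as nx

        def core_influence_nodes(G, top_k=3):
            """Rank nodes by betweenness centrality and return the top_k as
            candidate cluster cores. This reproduces only the first step of the
            algorithm described above; the discriminant functions and the
            optimisation-based clustering of the remaining nodes are not sketched."""
            centrality = nx.betweenness_centrality(G)
            return sorted(centrality, key=centrality.get, reverse=True)[:top_k]

        # Example on a small benchmark graph.
        G = nx.karate_club_graph()
        print(core_influence_nodes(G, top_k=3))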

  16. Overlapping Community Detection based on Network Decomposition

    NASA Astrophysics Data System (ADS)

    Ding, Zhuanlian; Zhang, Xingyi; Sun, Dengdi; Luo, Bin

    2016-04-01

    Community detection in complex networks has become a vital step to understand the structure and dynamics of networks in various fields. However, traditional node clustering and the relatively new link clustering methods have inherent drawbacks in discovering overlapping communities. Node clustering is inadequate to capture the pervasive overlaps, while link clustering is often criticized for its high computational cost and ambiguous definition of communities. Overlapping community detection thus remains a formidable challenge. In this work, we propose a new overlapping community detection algorithm based on network decomposition, called NDOCD. Specifically, NDOCD iteratively splits the network by removing all links in derived link communities, which are identified by utilizing a node clustering technique. The network decomposition contributes to reducing the computation time, and noise link elimination contributes to improving the quality of the obtained communities. Besides, we employ a node clustering technique rather than a link similarity measure to discover link communities, thus NDOCD avoids an ambiguous definition of community and becomes less time-consuming. We test our approach on both synthetic and real-world networks. Results demonstrate the superior performance of our approach both in computation time and in accuracy compared to state-of-the-art algorithms.

  17. Intrinsic alignment in redMaPPer clusters – II. Radial alignment of satellites towards cluster centres

    DOE PAGES

    Huang, Hung-Jin; Mandelbaum, Rachel; Freeman, Peter E.; ...

    2017-11-23

    We study the orientations of satellite galaxies in redMaPPer clusters constructed from the Sloan Digital Sky Survey at 0.1 < z < 0.35 to determine whether there is any preferential tendency for satellites to point radially towards cluster centres. Here, we analyse the satellite alignment (SA) signal based on three shape measurement methods (re-Gaussianization, de Vaucouleurs, and isophotal shapes), which trace galaxy light profiles at different radii. The measured SA signal depends on these shape measurement methods. We detect the strongest SA signal in isophotal shapes, followed by de Vaucouleurs shapes. While no net SA signal is detected using re-Gaussianization shapes across the entire sample, the observed SA signal reaches a statistically significant level when limiting to a subsample of higher luminosity satellites. We further investigate the impact of noise, systematics, and real physical isophotal twisting effects in the comparison between the SA signal detected via different shape measurement methods. Unlike previous studies, which only consider the dependence of SA on a few parameters, here we explore a total of 17 galaxy and cluster properties, using a statistical model averaging technique to naturally account for parameter correlations and identify significant SA predictors. We find that the measured SA signal is strongest for satellites with the following characteristics: higher luminosity, smaller distance to the cluster centre, rounder in shape, higher bulge fraction, and distributed preferentially along the major axis directions of their centrals. Finally, we provide physical explanations for the identified dependences and discuss the connection to theories of SA.

  18. Intrinsic alignment in redMaPPer clusters – II. Radial alignment of satellites towards cluster centres

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Huang, Hung-Jin; Mandelbaum, Rachel; Freeman, Peter E.

    We study the orientations of satellite galaxies in redMaPPer clusters constructed from the Sloan Digital Sky Survey at 0.1 < z < 0.35 to determine whether there is any preferential tendency for satellites to point radially towards cluster centres. Here, we analyse the satellite alignment (SA) signal based on three shape measurement methods (re-Gaussianization, de Vaucouleurs, and isophotal shapes), which trace galaxy light profiles at different radii. The measured SA signal depends on these shape measurement methods. We detect the strongest SA signal in isophotal shapes, followed by de Vaucouleurs shapes. While no net SA signal is detected using re-Gaussianization shapes across the entire sample, the observed SA signal reaches a statistically significant level when limiting to a subsample of higher luminosity satellites. We further investigate the impact of noise, systematics, and real physical isophotal twisting effects in the comparison between the SA signal detected via different shape measurement methods. Unlike previous studies, which only consider the dependence of SA on a few parameters, here we explore a total of 17 galaxy and cluster properties, using a statistical model averaging technique to naturally account for parameter correlations and identify significant SA predictors. We find that the measured SA signal is strongest for satellites with the following characteristics: higher luminosity, smaller distance to the cluster centre, rounder in shape, higher bulge fraction, and distributed preferentially along the major axis directions of their centrals. Finally, we provide physical explanations for the identified dependences and discuss the connection to theories of SA.

  19. Alignment and integration of complex networks by hypergraph-based spectral clustering

    NASA Astrophysics Data System (ADS)

    Michoel, Tom; Nachtergaele, Bruno

    2012-11-01

    Complex networks possess a rich, multiscale structure reflecting the dynamical and functional organization of the systems they model. Often there is a need to analyze multiple networks simultaneously, to model a system by more than one type of interaction, or to go beyond simple pairwise interactions, but currently there is a lack of theoretical and computational methods to address these problems. Here we introduce a framework for clustering and community detection in such systems using hypergraph representations. Our main result is a generalization of the Perron-Frobenius theorem from which we derive spectral clustering algorithms for directed and undirected hypergraphs. We illustrate our approach with applications for local and global alignment of protein-protein interaction networks between multiple species, for tripartite community detection in folksonomies, and for detecting clusters of overlapping regulatory pathways in directed networks.

  20. Alignment and integration of complex networks by hypergraph-based spectral clustering.

    PubMed

    Michoel, Tom; Nachtergaele, Bruno

    2012-11-01

    Complex networks possess a rich, multiscale structure reflecting the dynamical and functional organization of the systems they model. Often there is a need to analyze multiple networks simultaneously, to model a system by more than one type of interaction, or to go beyond simple pairwise interactions, but currently there is a lack of theoretical and computational methods to address these problems. Here we introduce a framework for clustering and community detection in such systems using hypergraph representations. Our main result is a generalization of the Perron-Frobenius theorem from which we derive spectral clustering algorithms for directed and undirected hypergraphs. We illustrate our approach with applications for local and global alignment of protein-protein interaction networks between multiple species, for tripartite community detection in folksonomies, and for detecting clusters of overlapping regulatory pathways in directed networks.

  1. Weak lensing magnification of SpARCS galaxy clusters

    NASA Astrophysics Data System (ADS)

    Tudorica, A.; Hildebrandt, H.; Tewes, M.; Hoekstra, H.; Morrison, C. B.; Muzzin, A.; Wilson, G.; Yee, H. K. C.; Lidman, C.; Hicks, A.; Nantais, J.; Erben, T.; van der Burg, R. F. J.; Demarco, R.

    2017-12-01

    Context. Measuring and calibrating relations between cluster observables is critical for resource-limited studies. The mass-richness relation of clusters offers an observationally inexpensive way of estimating masses. Its calibration is essential for cluster and cosmological studies, especially for high-redshift clusters. Weak gravitational lensing magnification is a promising and complementary method to shear studies that can be applied at higher redshifts. Aims: We aim to employ the weak lensing magnification method to calibrate the mass-richness relation up to a redshift of 1.4. We used the Spitzer Adaptation of the Red-Sequence Cluster Survey (SpARCS) galaxy cluster candidates (0.2 < z < 1.4) and optical data from the Canada France Hawaii Telescope (CFHT) to test whether magnification can be effectively used to constrain the mass of high-redshift clusters. Methods: Lyman-break galaxies (LBGs) selected using the u-band dropout technique and their colours were used as a background sample of sources. LBG positions were cross-correlated with the centres of the sample of SpARCS clusters to estimate the magnification signal, which was optimally weighted using an externally-calibrated LBG luminosity function. The signal was measured for cluster sub-samples, binned in both redshift and richness. Results: We measured the cross-correlation between the positions of galaxy cluster candidates and LBGs and detected a weak lensing magnification signal for all bins at a detection significance of 2.6-5.5σ. In particular, the significance of the measurement for clusters with z > 1.0 is 4.1σ; for the entire cluster sample we obtained an average M200 of 1.28^{+0.23}_{-0.21} × 10^{14} M⊙. Conclusions: Our measurements demonstrated the feasibility of using weak lensing magnification as a viable tool for determining the average halo masses for samples of high redshift galaxy clusters. The results also established the success of using galaxy over-densities to select massive clusters at z > 1. Additional studies are necessary for further modelling of the various systematic effects we discussed.

  2. A correlation analysis-based detection and delineation of ECG characteristic events using template waveforms extracted by ensemble averaging of clustered heart cycles.

    PubMed

    Homaeinezhad, M R; Erfanianmoshiri-Nejad, M; Naseri, H

    2014-01-01

    The goal of this study is to introduce a simple, standard and safe procedure to detect and delineate P and T waves of the electrocardiogram (ECG) signal in real conditions. The proposed method consists of four major steps: (1) a secure QRS detection and delineation algorithm, (2) a pattern recognition algorithm designed for distinguishing various ECG clusters which take place between consecutive R-waves, (3) extracting a template of the dominant events of each cluster waveform and (4) application of correlation analysis in order to delineate automatically the P- and T-waves in noisy conditions. The performance characteristics of the proposed P and T detection-delineation algorithm are evaluated against various ECG signals whose quality is degraded from the best to the worst case based on random-walk noise theory. Also, the method is applied to the MIT-BIH Arrhythmia and the QT databases for comparing some parts of its performance characteristics with a number of P and T detection-delineation algorithms. The conducted evaluations indicate that in a signal with a low quality value of about 0.6, the proposed method detects the P and T events with a sensitivity of Se=85% and a positive predictive value of P+=89%, respectively. In addition, at the same quality, the average delineation errors associated with those ECG events are 45 and 63 ms, respectively. Stable delineation error, high detection accuracy and high noise tolerance were the most important aspects considered during development of the proposed method. © 2013 Elsevier Ltd. All rights reserved.
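
    As a hedged sketch of the correlation step alone, the example below slides an ensemble-averaged template over an ECG segment between consecutive R waves and returns the offset with the highest normalised cross-correlation; QRS detection, clustering and template extraction are assumed to have been done already.

        import numpy as np

        def best_template_match(segment, template):
            """Slide a template (e.g. an ensemble-averaged P-wave) over an ECG
            segment between consecutive R peaks and return the offset of the highest
            normalised cross-correlation, plus that correlation value. Only the
            correlation step of the method above is shown."""
            t = (template - template.mean()) / (template.std() + 1e-12)
            n = len(template)
            best_offset, best_r = 0, -np.inf
            for k in range(len(segment) - n + 1):
                w = segment[k:k + n]
                w = (w - w.mean()) / (w.std() + 1e-12)
                r = float(np.dot(w, t) / n)
                if r > best_r:
                    best_offset, best_r = k, r
            return best_offset, best_r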

  3. Mixture-Tuned, Clutter Matched Filter for Remote Detection of Subpixel Spectral Signals

    NASA Technical Reports Server (NTRS)

    Thompson, David R.; Mandrake, Lukas; Green, Robert O.

    2013-01-01

    Mapping localized spectral features in large images demands sensitive and robust detection algorithms. Two aspects of large images that can harm matched-filter detection performance are addressed simultaneously. First, multimodal backgrounds may thwart the typical Gaussian model. Second, outlier features can trigger false detections from large projections onto the target vector. Two state-of-the-art approaches are combined that independently address outlier false positives and multimodal backgrounds. The background clustering models multimodal backgrounds, and the mixture tuned matched filter (MT-MF) addresses outliers. Combining the two methods captures significant additional performance benefits. The resulting mixture tuned clutter matched filter (MT-CMF) shows effective performance on simulated and airborne datasets. The classical MNF transform was applied, followed by k-means clustering. Then, each cluster's mean, covariance, and the corresponding eigenvalues were estimated. This yields a cluster-specific matched filter estimate as well as a cluster-specific feasibility score to flag outlier false positives. The technology described is a proof of concept that may be employed in future target detection and mapping applications for remote imaging spectrometers. It is of most direct relevance to JPL proposals for airborne and orbital hyperspectral instruments. Applications include subpixel target detection in hyperspectral scenes for military surveillance. Earth science applications include mineralogical mapping, species discrimination for ecosystem health monitoring, and land use classification.
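
    The sketch below shows the classical per-cluster matched filter score on which MT-CMF builds (pixels scored against one background cluster's mean and covariance); the mixture-tuned feasibility score used to reject outliers is not reproduced.

        import numpy as np

        def matched_filter_scores(pixels, target, background):
            """Classical matched filter applied per background cluster: score each
            pixel spectrum by its projection onto the target direction, whitened by
            the cluster's covariance. The mixture-tuned feasibility score used to
            flag outlier false positives is not reproduced in this sketch."""
            mu = background.mean(axis=0)
            cov_inv = np.linalg.pinv(np.cov(background, rowvar=False))
            d = target - mu
            w = cov_inv @ d / (d @ cov_inv @ d)      # filter vector, unit response
            return (pixels - mu) @ w                 # ~1 for full-pixel targets, ~0 for background

        # For MT-CMF, pixels would first be assigned to k-means clusters of the MNF
        # data and scored with that cluster's mean and covariance.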

  4. Deep Learning Nuclei Detection in Digitized Histology Images by Superpixels

    PubMed Central

    Sornapudi, Sudhir; Stanley, Ronald Joe; Stoecker, William V.; Almubarak, Haidar; Long, Rodney; Antani, Sameer; Thoma, George; Zuna, Rosemary; Frazier, Shelliane R.

    2018-01-01

    Background: Advances in image analysis and computational techniques have facilitated automatic detection of critical features in histopathology images. Detection of nuclei is critical for squamous epithelium cervical intraepithelial neoplasia (CIN) classification into normal, CIN1, CIN2, and CIN3 grades. Methods: In this study, a deep learning (DL)-based nuclei segmentation approach is investigated based on gathering localized information through the generation of superpixels using a simple linear iterative clustering algorithm and training with a convolutional neural network. Results: The proposed approach was evaluated on a dataset of 133 digitized histology images and achieved an overall nuclei detection (object-based) accuracy of 95.97%, with demonstrated improvement over imaging-based and clustering-based benchmark techniques. Conclusions: The proposed DL-based nuclei segmentation method with superpixel analysis has shown improved segmentation results in comparison to state-of-the-art methods. PMID:29619277

  5. Nucleus and cytoplasm segmentation in microscopic images using K-means clustering and region growing

    PubMed Central

    Sarrafzadeh, Omid; Dehnavi, Alireza Mehri

    2015-01-01

    Background: Segmentation of leukocytes acts as the foundation for all automated image-based hematological disease recognition systems. Most of the time, hematologists are interested in evaluation of white blood cells only. Digital image processing techniques can help them in their analysis and diagnosis. Materials and Methods: The main objective of this paper is to detect leukocytes from a blood smear microscopic image and segment them into their two dominant elements, nucleus and cytoplasm. The segmentation is conducted using two stages of applying K-means clustering. First, the nuclei are segmented using K-means clustering. Then, a proposed method based on region growing is applied to separate the connected nuclei. Next, the nuclei are subtracted from the original image. Finally, the cytoplasm is segmented using the second stage of K-means clustering. Results: The results indicate that the proposed method is able to extract the nucleus and cytoplasm regions accurately and works well even when there is no significant contrast between the components in the image. Conclusions: In this paper, a method based on K-means clustering and region growing is proposed in order to detect leukocytes from a blood smear microscopic image and segment its components, the nucleus and the cytoplasm. As the region growing step of the algorithm relies on edge information, it will not be able to separate connected nuclei accurately where edges are poor, and it requires at least a weak edge to exist between the nuclei. The nucleus and cytoplasm segments of a leukocyte can be used for feature extraction and classification, which leads to automated leukemia detection. PMID:26605213
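
    A hedged sketch of the first stage follows: K-means on pixel colours with the darkest cluster taken as the nucleus mask. The colour space and cluster count here are illustrative assumptions, and the region-growing separation of touching nuclei and the second K-means pass for cytoplasm are not reproduced.

        import numpy as np
        from skimage import io, color
        from sklearn.cluster import KMeans

        def segment_nuclei(image_path, n_clusters=3):
            """First stage of the scheme above: cluster pixels by colour with
            K-means and keep the darkest cluster as the nucleus mask. The
            region-growing step that separates touching nuclei and the second
            K-means pass for cytoplasm are not reproduced in this sketch."""
            rgb = io.imread(image_path)
            lab = color.rgb2lab(rgb).reshape(-1, 3)
            labels = KMeans(n_clusters=n_clusters, n_init=10,
                            random_state=0).fit_predict(lab)
            # Nuclei are typically the cluster with the lowest mean lightness (L*).
            mean_lightness = [lab[labels == k, 0].mean() for k in range(n_clusters)]
            nucleus_cluster = int(np.argmin(mean_lightness))
            return (labels == nucleus_cluster).reshape(rgb.shape[:2])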

  6. Using Fuzzy Clustering for Real-time Space Flight Safety

    NASA Technical Reports Server (NTRS)

    Lee, Charles; Haskell, Richard E.; Hanna, Darrin; Alena, Richard L.

    2004-01-01

    To ensure space flight safety, it is necessary to monitor myriad sensor readings on the ground and in flight. Since a space shuttle has many sensors, monitoring data and drawing conclusions from information contained within the data in real time is challenging. The nature of the information can be critical to the success of the mission and the safety of the crew, and therefore must be processed with minimal data-processing time. Data analysis algorithms could be used to synthesize sensor readings and compare data associated with normal operation with data that contain fault patterns to draw conclusions. Detecting abnormal operation during early stages in the transition from safe to unsafe operation requires a large amount of historical data that can be categorized into different classes (non-risk, risk). Even though 40 years of the shuttle flight program have accumulated volumes of historical data, these data do not comprehensively represent all possible fault patterns since fault patterns are usually unknown before the fault occurs. This paper presents a method that uses a similarity measure between fuzzy clusters to detect possible faults in real time. A clustering technique based on a fuzzy equivalence relation is used to characterize temporal data. Data collected during an initial time period are separated into clusters. These clusters are characterized by their centroids. Clusters formed during subsequent time periods are either merged with an existing cluster or added to the cluster list. The resulting list of cluster centroids, called a cluster group, characterizes the behavior of a particular set of temporal data. The degree to which new clusters formed in a subsequent time period are similar to the cluster group is characterized by a similarity measure, q. This method is applied to downlink data from Columbia flights. The results show that this technique can detect an unexpected fault that has not been present in the training data set.

  7. Topic detection using paragraph vectors to support active learning in systematic reviews.

    PubMed

    Hashimoto, Kazuma; Kontonatsios, Georgios; Miwa, Makoto; Ananiadou, Sophia

    2016-08-01

    Systematic reviews require expert reviewers to manually screen thousands of citations in order to identify all articles relevant to the review. Active learning text classification is a supervised machine learning approach that has been shown to significantly reduce the manual annotation workload by semi-automating the citation screening process of systematic reviews. In this paper, we present a new topic detection method that induces an informative representation of studies, to improve the performance of the underlying active learner. Our proposed topic detection method uses a neural network-based vector space model to capture semantic similarities between documents. We first represent documents within the vector space and cluster them into a predefined number of clusters. The centroids of the clusters are treated as latent topics. We then represent each document as a mixture of latent topics. For evaluation purposes, we employ the active learning strategy using both our novel topic detection method and a baseline topic model (i.e., Latent Dirichlet Allocation). The results obtained demonstrate that our method achieves a high sensitivity in identifying eligible studies and a significantly reduced manual annotation cost compared to the baseline method. This observation is consistent across two clinical and three public health reviews. The tool introduced in this work is available from https://nactem.ac.uk/pvtopic/. Copyright © 2016 The Authors. Published by Elsevier Inc. All rights reserved.

  8. Spatial event cluster detection using an approximate normal distribution.

    PubMed

    Torabi, Mahmoud; Rosychuk, Rhonda J

    2008-12-12

    In geographic surveillance of disease, areas with large numbers of disease cases are to be identified so that investigations of the causes of high disease rates can be pursued. Areas with high rates are called disease clusters, and statistical cluster detection tests are used to identify geographic areas with higher disease rates than expected by chance alone. Typically, cluster detection tests are applied to incident or prevalent cases of disease, but surveillance of disease-related events, where an individual may have multiple events, may also be of interest. Previously, a compound Poisson approach that detects clusters of events by testing individual areas that may be combined with their neighbours has been proposed. However, the relevant probabilities from the compound Poisson distribution are obtained from a recursion relation that can be cumbersome if the number of events is large or analyses by strata are performed. We propose a simpler approach that uses an approximate normal distribution. This method is very easy to implement and is applicable to situations where the population sizes are large and the population distribution by important strata may differ by area. We demonstrate the approach on pediatric self-inflicted injury presentations to emergency departments and compare the results for probabilities based on the recursion and the normal approach. We also implement a Monte Carlo simulation to study the performance of the proposed approach. In a self-inflicted injury data example, the normal approach identifies twelve out of thirteen of the same clusters as the compound Poisson approach, noting that the compound Poisson method detects twelve significant clusters in total. Through simulation studies, the normal approach well approximates the compound Poisson approach for a variety of different population sizes and case and event thresholds. A drawback of the compound Poisson approach is that the relevant probabilities must be determined through a recursion relation, and such calculations can be computationally intensive if the cluster size is relatively large or if analyses are conducted with strata variables. On the other hand, the normal approach is very flexible, easily implemented, and hence, more appealing for users. Moreover, the concepts may be more easily conveyed to non-statisticians interested in understanding the methodology associated with cluster detection test results.
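
    Setting the compound Poisson details aside, the normal approach reduces to a z-type comparison of the observed event count in a candidate area with its expectation; the sketch below shows that comparison with hypothetical mean and variance inputs, which in practice would be derived from the area's population and strata.

        from math import sqrt
        from scipy.stats import norm

        def normal_approx_pvalue(observed_events, expected_events, variance):
            """One-sided p-value for an excess of events in a candidate area using a
            normal approximation, in place of the recursive compound Poisson
            probabilities. The expectation and variance would come from the area's
            population, strata and per-person event distribution; they are treated
            as given (hypothetical) inputs here."""
            z = (observed_events - expected_events) / sqrt(variance)
            return norm.sf(z)          # upper-tail probability

        # Hypothetical area: 38 self-inflicted-injury presentations observed where
        # about 22 were expected with variance 30.
        print(round(normal_approx_pvalue(38, 22, 30), 4))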

  9. Cloud computing for detecting high-order genome-wide epistatic interaction via dynamic clustering.

    PubMed

    Guo, Xuan; Meng, Yu; Yu, Ning; Pan, Yi

    2014-04-10

    Taking advantage of high-throughput single nucleotide polymorphism (SNP) genotyping technology, large genome-wide association studies (GWASs) have been considered to hold promise for unravelling complex relationships between genotype and phenotype. At present, traditional single-locus-based methods are insufficient to detect multi-locus interactions, which exist broadly in complex traits. In addition, statistical tests for high-order epistatic interactions with more than 2 SNPs pose computational and analytical challenges because the computation increases exponentially as the cardinality of SNP combinations grows. In this paper, we provide a simple, fast and powerful method using dynamic clustering and cloud computing to detect genome-wide multi-locus epistatic interactions. We have constructed systematic experiments to compare power performance against some recently proposed algorithms, including TEAM, SNPRuler, EDCF and BOOST. Furthermore, we have applied our method to two real GWAS datasets, the Age-related macular degeneration (AMD) and Rheumatoid arthritis (RA) datasets, where we find some novel potential disease-related genetic factors that do not show up in detections of two-locus epistatic interactions. Experimental results on simulated data demonstrate that our method is more powerful than some recently proposed methods on both two- and three-locus disease models. Our method has discovered many novel high-order associations that are significantly enriched in cases from two real GWAS datasets. Moreover, the running times of the cloud implementation of our method on the AMD and RA datasets are roughly 2 hours and 50 hours, respectively, on a cluster with forty small virtual machines when detecting two-locus interactions. Therefore, we believe that our method is suitable and effective for the full-scale analysis of multi-locus epistatic interactions in GWAS.

  10. Cloud computing for detecting high-order genome-wide epistatic interaction via dynamic clustering

    PubMed Central

    2014-01-01

    Background Taking advantage of high-throughput single nucleotide polymorphism (SNP) genotyping technology, large genome-wide association studies (GWASs) have been considered to hold promise for unravelling complex relationships between genotype and phenotype. At present, traditional single-locus-based methods are insufficient to detect multi-locus interactions, which exist broadly in complex traits. In addition, statistical tests for high-order epistatic interactions with more than 2 SNPs pose computational and analytical challenges because the computation increases exponentially as the cardinality of SNP combinations grows. Results In this paper, we provide a simple, fast and powerful method using dynamic clustering and cloud computing to detect genome-wide multi-locus epistatic interactions. We have constructed systematic experiments to compare power performance against some recently proposed algorithms, including TEAM, SNPRuler, EDCF and BOOST. Furthermore, we have applied our method to two real GWAS datasets, the Age-related macular degeneration (AMD) and Rheumatoid arthritis (RA) datasets, where we find some novel potential disease-related genetic factors that do not show up in detections of two-locus epistatic interactions. Conclusions Experimental results on simulated data demonstrate that our method is more powerful than some recently proposed methods on both two- and three-locus disease models. Our method has discovered many novel high-order associations that are significantly enriched in cases from two real GWAS datasets. Moreover, the running times of the cloud implementation of our method on the AMD and RA datasets are roughly 2 hours and 50 hours, respectively, on a cluster with forty small virtual machines when detecting two-locus interactions. Therefore, we believe that our method is suitable and effective for the full-scale analysis of multi-locus epistatic interactions in GWAS. PMID:24717145

  11. Eb&D: A new clustering approach for signed social networks based on both edge-betweenness centrality and density of subgraphs

    NASA Astrophysics Data System (ADS)

    Qi, Xingqin; Song, Huimin; Wu, Jianliang; Fuller, Edgar; Luo, Rong; Zhang, Cun-Quan

    2017-09-01

    Clustering algorithms for unsigned social networks, which have only positive edges, have been studied intensively. However, when a network has like/dislike, love/hate, respect/disrespect, or trust/distrust relationships, unsigned social networks with only positive edges are inadequate. Thus we model such networks as signed networks, which can have both negative and positive edges. Detecting the cluster structures of signed networks is much harder than for unsigned networks, because it requires not only that there be as many positive edges as possible within clusters, but also that there be as many negative edges as possible between clusters. Currently, we have few clustering algorithms for signed networks, and most of them require the number of final clusters as an input, which is actually hard to predict beforehand. In this paper, we propose a novel clustering algorithm called Eb&D for signed networks, where both the betweenness of edges and the density of subgraphs are used to detect cluster structures. A hierarchically nested system is constructed to illustrate the inclusion relationships of clusters. To show the validity and efficiency of Eb&D, we test it on several classical social networks and also hundreds of synthetic data sets, obtaining better results compared with other methods in all cases. The biggest advantage of Eb&D compared with other methods is that the number of clusters does not need to be known a priori.
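
    Eb&D itself is not publicly packaged; as a rough point of reference, the sketch below runs the classic edge-betweenness (Girvan-Newman) splitting that it builds on, using networkx on the positive-edge subgraph of a toy signed network. The negative edges, which Eb&D exploits to decide where clusters should separate, are ignored in this simplified illustration.

        import networkx as nx
        from networkx.algorithms.community import girvan_newman

        # Toy signed network: two tight positive triangles joined by a weak positive
        # bridge, with antagonistic (negative) ties across the groups.
        signed_edges = [("a", "b", +1), ("b", "c", +1), ("a", "c", +1),
                        ("d", "e", +1), ("e", "f", +1), ("d", "f", +1),
                        ("c", "d", +1),                      # weak positive bridge
                        ("a", "e", -1), ("b", "f", -1)]      # antagonism across groups
        G_pos = nx.Graph([(u, v) for u, v, s in signed_edges if s > 0])

        # First split: remove the highest-betweenness edges until the graph separates.
        first_split = next(girvan_newman(G_pos))
        print([sorted(c) for c in first_split])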

  12. Recognizing different tissues in human fetal femur cartilage by label-free Raman microspectroscopy

    NASA Astrophysics Data System (ADS)

    Kunstar, Aliz; Leijten, Jeroen; van Leuveren, Stefan; Hilderink, Janneke; Otto, Cees; van Blitterswijk, Clemens A.; Karperien, Marcel; van Apeldoorn, Aart A.

    2012-11-01

    Traditionally, the composition of bone and cartilage is determined by standard histological methods. We used Raman microscopy, which provides a molecular "fingerprint" of the investigated sample, to detect differences between the zones in human fetal femur cartilage without the need for additional staining or labeling. Raman area scans were made from the (pre)articular cartilage, resting, proliferative, and hypertrophic zones of growth plate and endochondral bone within human fetal femora. Multivariate data analysis was performed on Raman spectral datasets to construct cluster images with corresponding cluster averages. Cluster analysis resulted in detection of individual chondrocyte spectra that could be separated from cartilage extracellular matrix (ECM) spectra and was verified by comparing cluster images with intensity-based Raman images for the deoxyribonucleic acid/ribonucleic acid (DNA/RNA) band. Specific dendrograms were created using Ward's clustering method, and principal component analysis (PCA) was performed with the separated and averaged Raman spectra of cells and ECM of all measured zones. Overall (dis)similarities between measured zones were effectively visualized on the dendrograms and main spectral differences were revealed by PCA allowing for label-free detection of individual cartilaginous zones and for label-free evaluation of proper cartilaginous matrix formation for future tissue engineering and clinical purposes.

  13. An incremental community detection method for social tagging systems using locality-sensitive hashing.

    PubMed

    Wu, Zhenyu; Zou, Ming

    2014-10-01

    An increasing number of users interact, collaborate, and share information through social networks. Unprecedented growth in social networks is generating a significant amount of unstructured social data. From such data, distilling communities where users have common interests and tracking variations of users' interests over time are important research tracks in fields such as opinion mining, trend prediction, and personalized services. However, these tasks are extremely difficult considering the highly dynamic characteristics of the data. Existing community detection methods are time consuming, making it difficult to process data in real time. In this paper, dynamic unstructured data is modeled as a stream. Tag assignments stream clustering (TASC), an incremental scalable community detection method, is proposed based on locality-sensitive hashing. Both tags and latent interactions among users are incorporated in the method. In our experiments, the social dynamic behaviors of users are first analyzed. The proposed TASC method is then compared with state-of-the-art clustering methods such as StreamKmeans and incremental k-clique; results indicate that TASC can detect communities more efficiently and effectively. Copyright © 2014 Elsevier Ltd. All rights reserved.

  14. The MUSE-Wide survey: detection of a clustering signal from Lyman α emitters in the range 3 < z < 6

    NASA Astrophysics Data System (ADS)

    Diener, C.; Wisotzki, L.; Schmidt, K. B.; Herenz, E. C.; Urrutia, T.; Garel, T.; Kerutt, J.; Saust, R. L.; Bacon, R.; Cantalupo, S.; Contini, T.; Guiderdoni, B.; Marino, R. A.; Richard, J.; Schaye, J.; Soucail, G.; Weilbacher, P. M.

    2017-11-01

    We present a clustering analysis of a sample of 238 Ly α emitters at redshift 3 ≲ z ≲ 6 from the MUSE-Wide survey. This survey mosaics extragalactic legacy fields with 1h MUSE pointings to detect statistically relevant samples of emission line galaxies. We analysed the first year observations from MUSE-Wide making use of the clustering signal in the line-of-sight direction. This method relies on comparing pair-counts at close redshifts for a fixed transverse distance and thus exploits the full potential of the redshift range covered by our sample. A clear clustering signal with a correlation length of r0=2.9^{+1.0}_{-1.1} Mpc (comoving) is detected. Whilst this result is based on only about a quarter of the full survey size, it already shows the immense potential of MUSE for efficiently observing and studying the clustering of Ly α emitters.

  15. Automatic categorization of anatomical landmark-local appearances based on diffeomorphic demons and spectral clustering for constructing detector ensembles.

    PubMed

    Hanaoka, Shouhei; Masutani, Yoshitaka; Nemoto, Mitsutaka; Nomura, Yukihiro; Yoshikawa, Takeharu; Hayashi, Naoto; Ohtomo, Kuni

    2012-01-01

    A method for categorizing landmark-local appearances extracted from computed tomography (CT) datasets is presented. Anatomical landmarks in the human body inevitably have inter-individual variations that cause difficulty in automatic landmark detection processes. The goal of this study is to categorize subjects (i.e., training datasets) according to local shape variations of such a landmark so that each subgroup has less shape variation and thus the machine learning of each landmark detector is much easier. The similarity between each subject pair is measured based on the non-rigid registration result between them. These similarities are used by the spectral clustering process. After the clustering, all training datasets in each cluster, as well as synthesized intermediate images calculated from all subject-pairs in the cluster, are used to train the corresponding subgroup detector. All of these trained detectors compose a detector ensemble to detect the target landmark. Evaluation with clinical CT datasets showed great improvement in the detection performance.

  16. Border preserving skin lesion segmentation

    NASA Astrophysics Data System (ADS)

    Kamali, Mostafa; Samei, Golnoosh

    2008-03-01

    Melanoma is a fatal cancer with a growing incidence rate. However, it can be cured if diagnosed at an early stage. The first step in detecting melanoma is the separation of the skin lesion from healthy skin. There are particular features associated with a malignant lesion whose successful detection relies upon accurately extracted borders. We propose a two-step approach. First, we apply a K-means clustering method (in 3D RGB space) that extracts relatively accurate borders. In the second step we perform an extra refining step for detecting the fading area around some lesions as accurately as possible. Our method has a number of novelties. First, as the clustering method is applied directly to the 3D color space, we do not overlook the dependencies between the different color channels. In addition, it is capable of extracting fine lesion borders up to the pixel level in spite of the difficulties associated with fading areas around the lesion. Performing clustering in different color spaces reveals that the 3D RGB color space is preferred. The application of the proposed algorithm to an extensive database of skin lesions shows that its performance is superior to that of existing methods both in terms of accuracy and computational complexity.

  17. Prediction of CpG-island function: CpG clustering vs. sliding-window methods

    PubMed Central

    2010-01-01

    Background Unmethylated stretches of CpG dinucleotides (CpG islands) are an outstanding property of mammal genomes. Conventionally, these regions are detected by sliding window approaches using %G + C, CpG observed/expected ratio and length thresholds as main parameters. Recently, clustering methods directly detect clusters of CpG dinucleotides as a statistical property of the genome sequence. Results We compare sliding-window to clustering (i.e. CpGcluster) predictions by applying new ways to detect putative functionality of CpG islands. Analyzing the co-localization with several genomic regions as a function of window size vs. statistical significance (p-value), CpGcluster shows a higher overlap with promoter regions and highly conserved elements, at the same time showing less overlap with Alu retrotransposons. The major difference in the prediction was found for short islands (CpG islets), often exclusively predicted by CpGcluster. Many of these islets seem to be functional, as they are unmethylated, highly conserved and/or located within the promoter region. Finally, we show that window-based islands can spuriously overlap several, differentially regulated promoters as well as different methylation domains, which might indicate a wrong merge of several CpG islands into a single, very long island. The shorter CpGcluster islands seem to be much more specific when concerning the overlap with alternative transcription start sites or the detection of homogenous methylation domains. Conclusions The main difference between sliding-window approaches and clustering methods is the length of the predicted islands. Short islands, often differentially methylated, are almost exclusively predicted by CpGcluster. This suggests that CpGcluster may be the algorithm of choice to explore the function of these short, but putatively functional CpG islands. PMID:20500903
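
    For concreteness, the hedged sketch below flags fixed-size windows that satisfy the conventional sliding-window criteria mentioned above (%G + C and CpG observed/expected thresholds); the merging of overlapping windows into full islands, and CpGcluster's distance-based alternative, are not reproduced.

        def cpg_window_flags(seq, window=200, gc_min=0.5, oe_min=0.6):
            """Flag fixed-size windows that satisfy the conventional sliding-window
            CpG-island criteria: %G+C >= gc_min and CpG observed/expected >= oe_min.
            Merging of overlapping windows into full islands (and CpGcluster's
            distance-based alternative) is not reproduced here."""
            seq = seq.upper()
            flags = []
            for start in range(0, len(seq) - window + 1):
                w = seq[start:start + window]
                g, c = w.count("G"), w.count("C")
                cpg = w.count("CG")
                gc_frac = (g + c) / window
                expected = (g * c) / window if g and c else 0.0
                oe = cpg / expected if expected else 0.0
                flags.append(gc_frac >= gc_min and oe >= oe_min)
            return flags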

  18. Assessment of Overlap of Phylogenetic Transmission Clusters and Communities in Simple Sexual Contact Networks: Applications to HIV-1

    PubMed Central

    Villandre, Luc; Günthard, Huldrych F.; Kouyos, Roger; Stadler, Tanja

    2016-01-01

    Background Transmission patterns of sexually-transmitted infections (STIs) could relate to the structure of the underlying sexual contact network, whose features are therefore of interest to clinicians. Conventionally, we represent sexual contacts in a population with a graph, which can reveal the existence of communities. Phylogenetic methods help infer the history of an epidemic and, incidentally, may help detect communities. In particular, phylogenetic analyses of HIV-1 epidemics among men who have sex with men (MSM) have revealed the existence of large transmission clusters, possibly resulting from within-community transmissions. Past studies have explored the association between contact networks and phylogenies, including transmission clusters, producing conflicting conclusions about whether network features significantly affect observed transmission history. As far as we know, however, none of them thoroughly investigated the role of communities, defined with respect to the network graph, in the observation of clusters. Methods The present study investigates, through simulations, community detection from phylogenies. We simulate a large number of epidemics over both unweighted and weighted, undirected random interconnected-islands networks, with islands corresponding to communities. We use weighting to modulate distance between islands. We translate each epidemic into a phylogeny, which lets us partition our samples of infected subjects into transmission clusters, based on several common definitions from the literature. We measure similarity between subjects’ island membership indices and transmission cluster membership indices with the adjusted Rand index. Results and Conclusion Analyses reveal modest mean correspondence between communities in graphs and phylogenetic transmission clusters. We conclude that common methods often have limited success in detecting contact network communities from phylogenies. The rarely-fulfilled requirement that network communities correspond to clades in the phylogeny is their main drawback. Understanding the link between transmission clusters and communities in sexual contact networks could help inform policymaking to curb HIV incidence among MSM. PMID:26863322

  19. Modified vegetation indices for Ganoderma disease detection in oil palm from field spectroradiometer data

    NASA Astrophysics Data System (ADS)

    Shafri, Helmi Z. M.; Anuar, M. Izzuddin; Saripan, M. Iqbal

    2009-10-01

    High resolution field spectroradiometers are important for spectral analysis and mobile inspection of vegetation disease. The biggest challenges in using this technology for automated vegetation disease detection lie in spectral signature pre-processing, band selection and the generation of reflectance indices that improve the ability of hyperspectral data to detect disease early. In this paper, new indices for oil palm Ganoderma disease detection were generated using band ratio and different band combination techniques. An unsupervised clustering method was used to cluster the values of each class resulting from each index. The suitability of band combinations was assessed using the Optimum Index Factor (OIF), while cluster validation was performed using the Average Silhouette Width (ASW). Eleven modified reflectance indices were generated in this study and ranked according to their ASW values. These modified indices were also compared to several existing and new indices. The results showed that the combination of spectral values at 610.5 nm and 738 nm was the most effective for clustering the three infection-level classes, making it the best spectral index for early detection of Ganoderma disease.
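
    The index-ranking workflow above can be illustrated with a small sketch: compute a normalized band-ratio index from two reflectance bands, cluster the samples with k-means, and score the partition with the average silhouette width (ASW). The reflectance values, wavelength variable names and cluster count are assumptions for illustration, not the study's measurements.

    ```python
    # Hedged sketch: band-ratio index, k-means clustering, and ASW scoring.
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.metrics import silhouette_score

    rng = np.random.default_rng(1)
    # Hypothetical reflectance at two wavelengths for 90 palms (3 infection levels).
    r610 = np.concatenate([rng.normal(m, 0.01, 30) for m in (0.10, 0.14, 0.18)])
    r738 = np.concatenate([rng.normal(m, 0.02, 30) for m in (0.55, 0.48, 0.40)])

    index = (r738 - r610) / (r738 + r610)            # normalized-difference style ratio
    X = index.reshape(-1, 1)
    labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
    print("ASW for this index:", round(silhouette_score(X, labels), 3))
    ```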

  20. A segmentation/clustering model for the analysis of array CGH data.

    PubMed

    Picard, F; Robin, S; Lebarbier, E; Daudin, J-J

    2007-09-01

    Microarray-CGH (comparative genomic hybridization) experiments are used to detect and map chromosomal imbalances. A CGH profile can be viewed as a succession of segments that represent homogeneous regions in the genome whose representative sequences share the same relative copy number on average. Segmentation methods constitute a natural framework for the analysis, but they do not provide a biological status for the detected segments. We propose a new model for this segmentation/clustering problem, combining a segmentation model with a mixture model. We present a new hybrid algorithm called dynamic programming-expectation maximization (DP-EM) to estimate the parameters of the model by maximum likelihood. This algorithm combines DP and the EM algorithm. We also propose a model selection heuristic to select the number of clusters and the number of segments. An example of our procedure is presented, based on publicly available data sets. We compare our method to segmentation methods and to hidden Markov models, and we show that the new segmentation/clustering model is a promising alternative that can be applied in the more general context of signal processing.

  1. A ground truth based comparative study on clustering of gene expression data.

    PubMed

    Zhu, Yitan; Wang, Zuyi; Miller, David J; Clarke, Robert; Xuan, Jianhua; Hoffman, Eric P; Wang, Yue

    2008-05-01

    Given the variety of available clustering methods for gene expression data analysis, it is important to develop an appropriate and rigorous validation scheme to assess the performance and limitations of the most widely used clustering algorithms. In this paper, we present a ground truth based comparative study on the functionality, accuracy, and stability of five data clustering methods, namely hierarchical clustering, K-means clustering, self-organizing maps, standard finite normal mixture fitting, and a caBIG toolkit (VIsual Statistical Data Analyzer--VISDA), tested on sample clustering of seven published microarray gene expression datasets and one synthetic dataset. We examined the performance of these algorithms in both data-sufficient and data-insufficient cases using quantitative performance measures, including cluster number detection accuracy and mean and standard deviation of partition accuracy. The experimental results showed that VISDA, an interactive coarse-to-fine maximum likelihood fitting algorithm, is a solid performer on most of the datasets, while K-means clustering and self-organizing maps optimized by the mean squared compactness criterion generally produce more stable solutions than the other methods.

  2. Experimental methods of molecular matter-wave optics.

    PubMed

    Juffmann, Thomas; Ulbricht, Hendrik; Arndt, Markus

    2013-08-01

    We describe the state of the art in preparing, manipulating and detecting coherent molecular matter. We focus on experimental methods for handling the quantum motion of compound systems from diatomic molecules to clusters or biomolecules. Molecular quantum optics offers many challenges and innovative prospects: already the combination of two atoms into one molecule takes several well-established methods from atomic physics, such as laser cooling, to their limits. The enormous internal complexity that arises when hundreds or thousands of atoms are bound in a single organic molecule, cluster or nanocrystal provides a richness that can only be tackled by combining methods from atomic physics, chemistry, cluster physics, nanotechnology and the life sciences. We review various molecular beam sources and their suitability for matter-wave experiments. We discuss numerous molecular detection schemes and give an overview of diffraction and interference experiments that have already been performed with molecules or clusters. Applications of de Broglie studies with composite systems range from fundamental tests of physics up to quantum-enhanced metrology in physical chemistry, biophysics and the surface sciences. Nanoparticle quantum optics is a growing field, which will intrigue researchers for many years to come. This review can, therefore, only be a snapshot of a very dynamical process.

  3. Identification and handling of artifactual gene expression profiles emerging in microarray hybridization experiments

    PubMed Central

    Brodsky, Leonid; Leontovich, Andrei; Shtutman, Michael; Feinstein, Elena

    2004-01-01

    Mathematical methods of analysis of microarray hybridizations deal with gene expression profiles as elementary units. However, some of these profiles do not reflect a biologically relevant transcriptional response, but rather stem from technical artifacts. Here, we describe two technically independent but rationally interconnected methods for identification of such artifactual profiles. Our diagnostics are based on detection of deviations from uniformity, which is assumed as the main underlying principle of microarray design. Method 1 is based on detection of non-uniformity of microarray distribution of printed genes that are clustered based on the similarity of their expression profiles. Method 2 is based on evaluation of the presence of gene-specific microarray spots within the slides’ areas characterized by an abnormal concentration of low/high differential expression values, which we define as ‘patterns of differentials’. Applying two novel algorithms, for nested clustering (method 1) and for pattern detection (method 2), we can make a dual estimation of the profile’s quality for almost every printed gene. Genes with artifactual profiles detected by method 1 may then be removed from further analysis. Suspicious differential expression values detected by method 2 may be either removed or weighted according to the probabilities of patterns that cover them, thus diminishing their input in any further data analysis. PMID:14999086

  4. Population clustering based on copy number variations detected from next generation sequencing data.

    PubMed

    Duan, Junbo; Zhang, Ji-Gang; Wan, Mingxi; Deng, Hong-Wen; Wang, Yu-Ping

    2014-08-01

    Copy number variations (CNVs) can be used as significant biomarkers, and next generation sequencing (NGS) provides high resolution detection of these CNVs. But how to extract features from CNVs and apply them to genomic studies such as population clustering has become a major challenge. In this paper, we propose a novel method for population clustering based on CNVs from NGS. First, CNVs are extracted from each sample to form a feature matrix. Then, this feature matrix is decomposed into a source matrix and a weight matrix with non-negative matrix factorization (NMF). The source matrix consists of common CNVs that are shared by all the samples from the same group, and the weight matrix indicates the corresponding level of CNVs in each sample. Therefore, using NMF of CNVs one can differentiate samples from different ethnic groups, i.e. perform population clustering. To validate the approach, we applied it to the analysis of both simulated data and two real data sets from the 1000 Genomes Project. The results on simulated data demonstrate that the proposed method can recover the true common CNVs with high quality. The results of the first real data analysis show that the proposed method can cluster two family trios with different ancestries into two ethnic groups, and the results of the second real data analysis show that the proposed method can be applied to whole-genome data with a large sample size consisting of multiple groups. Both results demonstrate the potential of the proposed method for population clustering.
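
    A hedged sketch of the decomposition step with scikit-learn's NMF: factor a non-negative CNV feature matrix into common CNV "sources" and per-sample weights, then assign each sample to its dominant source. The simulated matrix and the two-component choice are assumptions; the real pipeline first extracts CNV features from NGS read depth.

    ```python
    # Minimal sketch: NMF of a simulated non-negative CNV feature matrix.
    import numpy as np
    from sklearn.decomposition import NMF

    rng = np.random.default_rng(2)
    # 20 samples x 50 CNV features, two groups sharing different common CNVs.
    group1 = rng.poisson(3.0, (10, 50)) * (rng.random((1, 50)) < 0.3)
    group2 = rng.poisson(3.0, (10, 50)) * (rng.random((1, 50)) < 0.3)
    V = np.vstack([group1, group2]).astype(float)

    model = NMF(n_components=2, init="nndsvda", random_state=0, max_iter=500)
    W = model.fit_transform(V)          # per-sample weights
    H = model.components_               # common CNV "sources"
    print("cluster assignment:", W.argmax(axis=1))
    ```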

  5. A similarity based agglomerative clustering algorithm in networks

    NASA Astrophysics Data System (ADS)

    Liu, Zhiyuan; Wang, Xiujuan; Ma, Yinghong

    2018-04-01

    The detection of clusters is beneficial for understanding the organization and function of networks. Clusters, or communities, are usually groups of nodes that are densely interconnected but sparsely linked to other clusters. To identify communities, an efficient and effective agglomerative algorithm based on node similarity is proposed. The proposed method initially calculates the similarity between each pair of nodes and forms pre-partitions according to the principle that each node belongs to the same community as its most similar neighbor. Each pre-partition is then checked against a community criterion; those that do not satisfy it are merged with the partition exerting the greatest attraction, until no further changes occur. To measure the attraction ability of a partition, we propose an attraction index based on the importance of the linked nodes in the network. Our proposed method can therefore better exploit the nodes' properties and the network's structure. To test the performance of our algorithm, both synthetic and empirical networks of different scales are used. Simulation results show that the proposed algorithm obtains superior clustering results compared with six other widely used community detection algorithms.
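
    The pre-partition principle (each node joins the community of its most similar neighbor) can be sketched as below, using Jaccard similarity of neighborhoods as a stand-in similarity measure; the paper's own similarity and attraction indices, and the subsequent merging step, are not reproduced here.

    ```python
    # Hedged sketch: most-similar-neighbor pre-partitioning with union-find.
    import networkx as nx

    def pre_partition(G: nx.Graph):
        parent = {v: v for v in G}

        def find(v):
            while parent[v] != v:
                parent[v] = parent[parent[v]]   # path compression
                v = parent[v]
            return v

        for v in G:
            nbrs = list(G.neighbors(v))
            if not nbrs:
                continue
            def jaccard(u):
                a, b = set(G[v]) | {v}, set(G[u]) | {u}
                return len(a & b) / len(a | b)
            best = max(nbrs, key=jaccard)
            parent[find(v)] = find(best)        # union v with its most similar neighbor

        groups = {}
        for v in G:
            groups.setdefault(find(v), set()).add(v)
        return list(groups.values())

    print(pre_partition(nx.karate_club_graph()))
    ```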

  6. A scoping review of spatial cluster analysis techniques for point-event data.

    PubMed

    Fritz, Charles E; Schuurman, Nadine; Robertson, Colin; Lear, Scott

    2013-05-01

    Spatial cluster analysis is a uniquely interdisciplinary endeavour, and so it is important to communicate and disseminate ideas, innovations, best practices and challenges across practitioners, applied epidemiology researchers and spatial statisticians. In this research we conducted a scoping review to systematically search peer-reviewed journal databases for research that has employed spatial cluster analysis methods on individual-level, address location, or x and y coordinate derived data. To illustrate the thematic issues raised by our results, methods were tested using a dataset where known clusters existed. Point pattern methods, spatial clustering and cluster detection tests, and a locally weighted spatial regression model were most commonly used for individual-level, address location data (n = 29). The spatial scan statistic was the most popular method for address location data (n = 19). Six themes were identified relating to the application of spatial cluster analysis methods and subsequent analyses, which we recommend researchers consider: exploratory analysis, visualization, spatial resolution, aetiology, scale and spatial weights. It is our intention that researchers seeking direction for using spatial cluster analysis methods consider the caveats and strengths of each approach, but also explore the numerous other methods available for this type of analysis. Applied spatial epidemiology researchers and practitioners should give special consideration to applying multiple tests to a dataset. Future research should focus on developing frameworks for selecting appropriate methods and the corresponding spatial weighting schemes.

  7. Automatic Selection of Multiple Images in the Frontier Field Clusters

    NASA Astrophysics Data System (ADS)

    Mahler, Guillaume; Richard, Johan; Patricio, Vera; Clément, Benjamin; Lagattuta, David

    2015-08-01

    Probing the central mass distribution of massive galaxy clusters is an important step towards mapping the overall distribution of their dark matter content. Thanks to gravitational lensing and the appearance of multiple images, we can constrain the inner region of galaxy clusters with high precision. The Frontier Fields (FF) provide us with the deepest HST data ever obtained in such clusters. Currently, most multiple-image systems are found by eye, yet in the FF we expect hundreds to exist. Thus, in order to deal with such huge amounts of data, we need to develop an automated detection method. I present a new tool to perform this task, MISE (Multiple Images SEarcher), a program which identifies multiple images by combining their specific properties. In particular, multiple images must: a) have similar colors, b) have similar surface brightnesses, and c) appear in locations predicted by a specific lensing configuration. I will describe the tuning and performance of MISE on both the FF clusters and the simulated clusters HERA and ARES. MISE allows us not only to confirm multiple images identified visually, but also to detect new multiple-image candidates in MACS0416 and A2744, giving us additional constraints on the mass distribution in these clusters. A spectroscopic follow-up of these candidates is currently underway with MUSE.

  8. A new hierarchical method to find community structure in networks

    NASA Astrophysics Data System (ADS)

    Saoud, Bilal; Moussaoui, Abdelouahab

    2018-04-01

    Community structure is very important for understanding the network that represents a given context. Many community detection methods, such as hierarchical methods, have been proposed. In our study, we propose a new hierarchical method for community detection in networks based on a genetic algorithm. In this method we use the genetic algorithm to split a network into two subnetworks so as to maximize the modularity. Each new subnetwork represents a cluster (community). We then repeat the splitting process until each cluster contains a single node. We use the modularity function to measure the strength of the community structure found by our method, which gives us an objective metric for choosing the number of communities into which a network should be divided. We demonstrate that our method is highly effective at discovering community structure in both computer-generated and real-world network data.
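
    A hedged sketch of modularity-guided divisive community detection: the paper splits with a genetic algorithm, but here Kernighan-Lin bisection from NetworkX stands in as the splitter, and splitting stops once it no longer increases the modularity of the overall partition (rather than continuing down to single nodes and then selecting the best level).

    ```python
    # Minimal sketch: recursive bisection accepted only when modularity improves.
    import networkx as nx
    from networkx.algorithms.community import kernighan_lin_bisection, modularity

    def divisive_communities(G: nx.Graph):
        parts = [set(G.nodes)]
        improved = True
        while improved:
            improved = False
            for i, part in enumerate(parts):
                if len(part) < 4:
                    continue
                a, b = kernighan_lin_bisection(G.subgraph(part), seed=0)
                candidate = parts[:i] + [set(a), set(b)] + parts[i + 1:]
                if modularity(G, candidate) > modularity(G, parts):
                    parts = candidate
                    improved = True
                    break
        return parts

    G = nx.karate_club_graph()
    communities = divisive_communities(G)
    print(len(communities), "communities, modularity =",
          round(modularity(G, communities), 3))
    ```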

  9. Impact of socioeconomic inequalities on geographic disparities in cancer incidence: comparison of methods for spatial disease mapping.

    PubMed

    Goungounga, Juste Aristide; Gaudart, Jean; Colonna, Marc; Giorgi, Roch

    2016-10-12

    The reliability of spatial statistics is often put into question because real spatial variations may not be found, especially in heterogeneous areas. Our objective was to compare empirically different cluster detection methods. We assessed their ability to find spatial clusters of cancer cases and evaluated the impact of the socioeconomic status (e.g., the Townsend index) on cancer incidence. Moran's I, the empirical Bayes index (EBI), and the Potthoff-Whittinghill test were used to investigate the general clustering. The local cluster detection methods were: i) the spatial oblique decision tree (SpODT); ii) the spatial scan statistic of Kulldorff (SaTScan); and, iii) the hierarchical Bayesian spatial modeling (HBSM) in a univariate and multivariate setting. These methods were used with and without introducing the Townsend index of socioeconomic deprivation known to be related to the distribution of cancer incidence. Incidence data stemmed from the Cancer Registry of Isère and were limited to prostate, lung, colon-rectum, and bladder cancers diagnosed between 1999 and 2007 in men only. The study found spatial heterogeneity (p < 0.01) and an autocorrelation for prostate (EBI = 0.02; p = 0.001), lung (EBI = 0.01; p = 0.019) and bladder (EBI = 0.007; p = 0.05) cancers. After introduction of the Townsend index, SaTScan failed to find cancer clusters. This introduction changed the results obtained with the other methods. SpODT identified five spatial classes (p < 0.05): four in the Western and one in the Northern parts of the study area (standardized incidence ratios: 1.68, 1.39, 1.14, 1.12, and 1.16, respectively). In the univariate setting, the Bayesian smoothing method found the same clusters as the two other methods (RR > 1.2). The multivariate HBSM found a spatial correlation between lung and bladder cancers (r = 0.6). In spatial analysis of cancer incidence, SpODT and HBSM may be used not only for cluster detection but also for searching for confounding or etiological factors in small areas. Moreover, the multivariate HBSM offers a flexible and meaningful modeling of spatial variations; it shows plausible previously unknown associations between various cancers.
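
    As an illustration of the global clustering indices mentioned above, a minimal Moran's I computation is sketched below for simulated areal rates and a hypothetical binary adjacency matrix; real analyses would use the registry's standardized incidence ratios and the actual contiguity structure of the study area.

    ```python
    # Hedged sketch: global Moran's I for areal rates with a binary weight matrix.
    import numpy as np

    def morans_i(x: np.ndarray, W: np.ndarray) -> float:
        z = x - x.mean()
        num = (W * np.outer(z, z)).sum()
        den = (z ** 2).sum()
        return len(x) / W.sum() * num / den

    rng = np.random.default_rng(3)
    n = 25
    rates = rng.normal(1.0, 0.2, n)                 # simulated area-level rates
    W = (rng.random((n, n)) < 0.15).astype(float)   # hypothetical adjacency
    W = np.triu(W, 1); W = W + W.T                  # symmetric, zero diagonal
    print("Moran's I:", round(morans_i(rates, W), 3))
    ```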

  10. Unsupervised Structure Detection in Biomedical Data.

    PubMed

    Vogt, Julia E

    2015-01-01

    A major challenge in computational biology is to find simple representations of high-dimensional data that best reveal the underlying structure. In this work, we present an intuitive and easy-to-implement method based on ranked neighborhood comparisons that detects structure in unsupervised data. The method is based on ordering objects in terms of similarity and on the mutual overlap of nearest neighbors. This basic framework was originally introduced in the field of social network analysis to detect actor communities. We demonstrate that the same ideas can successfully be applied to biomedical data sets in order to reveal complex underlying structure. The algorithm is very efficient and works on distance data directly without requiring a vectorial embedding of data. Comprehensive experiments demonstrate the validity of this approach. Comparisons with state-of-the-art clustering methods show that the presented method outperforms hierarchical methods as well as density based clustering methods and model-based clustering. A further advantage of the method is that it simultaneously provides a visualization of the data. Especially in biomedical applications, the visualization of data can be used as a first pre-processing step when analyzing real world data sets to get an intuition of the underlying data structure. We apply this model to synthetic data as well as to various biomedical data sets which demonstrate the high quality and usefulness of the inferred structure.

  11. AMICO: optimized detection of galaxy clusters in photometric surveys

    NASA Astrophysics Data System (ADS)

    Bellagamba, Fabio; Roncarelli, Mauro; Maturi, Matteo; Moscardini, Lauro

    2018-02-01

    We present Adaptive Matched Identifier of Clustered Objects (AMICO), a new algorithm for the detection of galaxy clusters in photometric surveys. AMICO is based on the Optimal Filtering technique, which maximizes the signal-to-noise ratio (S/N) of the clusters. In this work, we focus on the new iterative approach to the extraction of cluster candidates from the map produced by the filter. In particular, we provide a definition of membership probability for the galaxies close to any cluster candidate, which allows us to remove its imprint from the map and thus enables the detection of smaller structures. As demonstrated in our tests, this method deblends close-by and aligned structures in more than 50 per cent of the cases for objects at a radial distance of 0.5 × R200 or a redshift separation of 2 × σz, where σz is the typical uncertainty of the photometric redshifts. Running AMICO on mocks derived from N-body simulations and semi-analytical modelling of galaxy evolution, we obtain a consistent mass-amplitude relation over the redshift range 0.3 < z < 1, with a logarithmic slope of ∼0.55 and a logarithmic scatter of ∼0.14. The fraction of false detections decreases steeply with S/N and is negligible at S/N > 5.

  12. Segmentation and clustering as complementary sources of information

    NASA Astrophysics Data System (ADS)

    Dale, Michael B.; Allison, Lloyd; Dale, Patricia E. R.

    2007-03-01

    This paper examines the effects of using a segmentation method to identify change-points or edges in vegetation. It identifies coherence (spatial or temporal) in place of unconstrained clustering. The segmentation method involves change-point detection along a sequence of observations so that each cluster formed is composed of adjacent samples; this is a form of constrained clustering. The protocol identifies one or more models, one for each section identified, and the quality of each is assessed using a minimum message length criterion, which provides a rational basis for selecting an appropriate model. Although the segmentation is less efficient than clustering, it does provide other information because it incorporates textural similarity as well as homogeneity. In addition it can be useful in determining various scales of variation that may apply to the data, providing a general method of small-scale pattern analysis.

  13. Extraction of the number of peroxisomes in yeast cells by automated image analysis.

    PubMed

    Niemistö, Antti; Selinummi, Jyrki; Saleem, Ramsey; Shmulevich, Ilya; Aitchison, John; Yli-Harja, Olli

    2006-01-01

    An automated image analysis method for extracting the number of peroxisomes in yeast cells is presented. Two images of the cell population are required for the method: a bright field microscope image from which the yeast cells are detected and the respective fluorescent image from which the number of peroxisomes in each cell is found. The segmentation of the cells is based on clustering the local mean-variance space. The watershed transformation is thereafter employed to separate cells that are clustered together. The peroxisomes are detected by thresholding the fluorescent image. The method is tested with several images of a budding yeast Saccharomyces cerevisiae population, and the results are compared with manually obtained results.
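
    The final counting step can be sketched as below: threshold the fluorescent channel and count connected spots inside an (assumed, already segmented) cell mask. The bright-field cell segmentation via local mean-variance clustering and the watershed splitting are not reproduced here.

    ```python
    # Hedged sketch: count bright spots (peroxisomes) inside a given cell mask.
    import numpy as np
    from scipy import ndimage as ndi

    rng = np.random.default_rng(4)
    fluor = rng.normal(10, 2, (128, 128))            # simulated fluorescent channel
    for y, x in [(30, 40), (35, 44), (80, 90)]:      # three synthetic peroxisomes
        fluor[y - 1:y + 2, x - 1:x + 2] += 60

    cell_mask = np.zeros_like(fluor, dtype=bool)     # hypothetical single-cell mask
    cell_mask[20:100, 30:110] = True

    spots = (fluor > fluor.mean() + 5 * fluor.std()) & cell_mask
    labels, n_spots = ndi.label(spots)               # connected components = spots
    print("peroxisomes in cell:", n_spots)
    ```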

  14. Accounting for Limited Detection Efficiency and Localization Precision in Cluster Analysis in Single Molecule Localization Microscopy

    PubMed Central

    Shivanandan, Arun; Unnikrishnan, Jayakrishnan; Radenovic, Aleksandra

    2015-01-01

    Single Molecule Localization Microscopy techniques like PhotoActivated Localization Microscopy, with their sub-diffraction limit spatial resolution, have been popularly used to characterize the spatial organization of membrane proteins, by means of quantitative cluster analysis. However, such quantitative studies remain challenged by the techniques’ inherent sources of errors such as a limited detection efficiency of less than 60%, due to incomplete photo-conversion, and a limited localization precision in the range of 10–30 nm, varying across the detected molecules, mainly depending on the number of photons collected from each. We provide analytical methods to estimate the effect of these errors in cluster analysis and to correct for them. These methods, based on the Ripley’s L(r) – r or Pair Correlation Function popularly used by the community, can facilitate potentially breakthrough results in quantitative biology by providing a more accurate and precise quantification of protein spatial organization. PMID:25794150
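
    A minimal sketch of the uncorrected Ripley's K and L(r) − r statistics for a 2D point pattern in a unit square appears below; it applies no edge correction and none of the detection-efficiency or localization-precision corrections that the paper derives analytically.

    ```python
    # Hedged sketch: naive Ripley's K and L(r) - r for a 2-D point pattern.
    import numpy as np

    def l_minus_r(points: np.ndarray, radii: np.ndarray, area: float = 1.0):
        n = len(points)
        d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
        np.fill_diagonal(d, np.inf)                      # exclude self-pairs
        k = np.array([(d < r).sum() for r in radii]) * area / (n * (n - 1))
        return np.sqrt(k / np.pi) - radii

    rng = np.random.default_rng(5)
    pts = rng.random((500, 2))                           # complete spatial randomness
    radii = np.linspace(0.01, 0.1, 10)
    print(np.round(l_minus_r(pts, radii), 4))            # values near 0: no clustering
    ```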

  15. Performance of cancer cluster Q-statistics for case-control residential histories

    PubMed Central

    Sloan, Chantel D.; Jacquez, Geoffrey M.; Gallagher, Carolyn M.; Ward, Mary H.; Raaschou-Nielsen, Ole; Nordsborg, Rikke Baastrup; Meliker, Jaymie R.

    2012-01-01

    Few investigations of health event clustering have evaluated residential mobility, though causative exposures for chronic diseases such as cancer often occur long before diagnosis. Recently developed Q-statistics incorporate human mobility into disease cluster investigations by quantifying space- and time-dependent nearest neighbor relationships. Using residential histories from two cancer case-control studies, we created simulated clusters to examine Q-statistic performance. Results suggest that the intersection of cases with significant clustering over their life course, Qi, with cases who are constituents of significant local clusters at given times, Qit, yielded the best performance, which improved with increasing cluster size. Upon comparison, a larger proportion of true positives were detected with Kulldorff’s spatial scan method if the time of clustering was provided. We recommend using Q-statistics to identify when and where clustering may have occurred, followed by the scan method to localize the candidate clusters. Future work should investigate the generalizability of these findings. PMID:23149326

  16. Hierarchical modeling of cluster size in wildlife surveys

    USGS Publications Warehouse

    Royle, J. Andrew

    2008-01-01

    Clusters or groups of individuals are the fundamental unit of observation in many wildlife sampling problems, including aerial surveys of waterfowl, marine mammals, and ungulates. Explicit accounting of cluster size in models for estimating abundance is necessary because detection of individuals within clusters is not independent and detectability of clusters is likely to increase with cluster size. This induces a cluster size bias in which the average cluster size in the sample is larger than in the population at large. Thus, failure to account for the relationship between detectability and cluster size will tend to yield a positive bias in estimates of abundance or density. I describe a hierarchical modeling framework for accounting for cluster-size bias in animal sampling. The hierarchical model consists of models for the observation process conditional on the cluster size distribution and the cluster size distribution conditional on the total number of clusters. Optionally, a spatial model can be specified that describes variation in the total number of clusters per sample unit. Parameter estimation, model selection, and criticism may be carried out using conventional likelihood-based methods. An extension of the model is described for the situation where measurable covariates at the level of the sample unit are available. Several candidate models within the proposed class are evaluated for aerial survey data on mallard ducks (Anas platyrhynchos).
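
    The size-biased sampling problem described above can be demonstrated in a few lines: when detection probability increases with cluster size, the mean cluster size in the detected sample exceeds the population mean. The detection model used here, p(detect) = 1 − exp(−0.5 · size), is purely illustrative.

    ```python
    # Hedged sketch: cluster-size bias induced by size-dependent detectability.
    import numpy as np

    rng = np.random.default_rng(6)
    sizes = rng.poisson(2, 10_000) + 1                 # population of cluster sizes
    p_detect = 1 - np.exp(-0.5 * sizes)                # larger clusters seen more often
    detected = sizes[rng.random(sizes.size) < p_detect]

    print("population mean size:", round(sizes.mean(), 2))
    print("sample mean size:    ", round(detected.mean(), 2))   # positively biased
    ```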

  17. Searching for filaments and large-scale structure around DAFT/FADA clusters

    NASA Astrophysics Data System (ADS)

    Durret, F.; Márquez, I.; Acebrón, A.; Adami, C.; Cabrera-Lavers, A.; Capelato, H.; Martinet, N.; Sarron, F.; Ulmer, M. P.

    2016-04-01

    Context. Clusters of galaxies are located at the intersection of cosmic filaments and are still accreting galaxies and groups along these preferential directions. However, because of their relatively low contrast on the sky, filaments are difficult to detect (unless a large amount of spectroscopic data are available), and unambiguous detections have been limited until now to relatively low redshifts (z< ~ 0.3). Aims: This project is aimed at searching for extensions and filaments around clusters, traced by galaxies selected to be at the cluster redshift based on the red sequence. In the 0.4

  18. Deep Learning Nuclei Detection in Digitized Histology Images by Superpixels.

    PubMed

    Sornapudi, Sudhir; Stanley, Ronald Joe; Stoecker, William V; Almubarak, Haidar; Long, Rodney; Antani, Sameer; Thoma, George; Zuna, Rosemary; Frazier, Shelliane R

    2018-01-01

    Advances in image analysis and computational techniques have facilitated automatic detection of critical features in histopathology images. Detection of nuclei is critical for squamous epithelium cervical intraepithelial neoplasia (CIN) classification into normal, CIN1, CIN2, and CIN3 grades. In this study, a deep learning (DL)-based nuclei segmentation approach is investigated that gathers localized information through the generation of superpixels with the simple linear iterative clustering (SLIC) algorithm and training with a convolutional neural network. The proposed approach was evaluated on a dataset of 133 digitized histology images and achieved an overall nuclei detection (object-based) accuracy of 95.97%, with demonstrated improvement over imaging-based and clustering-based benchmark techniques. The proposed DL-based nuclei segmentation method with superpixel analysis has shown improved segmentation results in comparison to state-of-the-art methods.
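
    The superpixel step alone can be sketched with scikit-image's SLIC implementation, as below; training the convolutional network on per-superpixel patches is outside the scope of this illustration, and the random image is only a stand-in for a histology tile.

    ```python
    # Hedged sketch: SLIC superpixel generation with scikit-image.
    import numpy as np
    from skimage.segmentation import slic

    rng = np.random.default_rng(9)
    image = rng.random((256, 256, 3))                 # stand-in for an RGB histology tile
    segments = slic(image, n_segments=300, compactness=10, start_label=1)
    print("superpixels generated:", segments.max())
    ```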

  19. Comparison of K-means and fuzzy c-means algorithm performance for automated determination of the arterial input function.

    PubMed

    Yin, Jiandong; Sun, Hongzan; Yang, Jiawen; Guo, Qiyong

    2014-01-01

    The arterial input function (AIF) plays a crucial role in the quantification of cerebral perfusion parameters. The traditional method for AIF detection is based on manual operation, which is time-consuming and subjective. Two automatic methods have been reported that are based on two frequently used clustering algorithms: fuzzy c-means (FCM) and K-means. However, it is still not clear which is better for AIF detection. Hence, we compared the performance of these two clustering methods using both simulated and clinical data. The results demonstrate that K-means analysis can yield more accurate and robust AIF results, although it takes longer to execute than the FCM method. We consider that this longer execution time is trivial relative to the total time required for image manipulation in a PACS setting, and is acceptable if an ideal AIF is obtained. Therefore, the K-means method is preferable to FCM in AIF detection.

  20. Comparison of K-Means and Fuzzy c-Means Algorithm Performance for Automated Determination of the Arterial Input Function

    PubMed Central

    Yin, Jiandong; Sun, Hongzan; Yang, Jiawen; Guo, Qiyong

    2014-01-01

    The arterial input function (AIF) plays a crucial role in the quantification of cerebral perfusion parameters. The traditional method for AIF detection is based on manual operation, which is time-consuming and subjective. Two automatic methods have been reported that are based on two frequently used clustering algorithms: fuzzy c-means (FCM) and K-means. However, it is still not clear which is better for AIF detection. Hence, we compared the performance of these two clustering methods using both simulated and clinical data. The results demonstrate that K-means analysis can yield more accurate and robust AIF results, although it takes longer to execute than the FCM method. We consider that this longer execution time is trivial relative to the total time required for image manipulation in a PACS setting, and is acceptable if an ideal AIF is obtained. Therefore, the K-means method is preferable to FCM in AIF detection. PMID:24503700
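
    A hedged sketch of a K-means-based AIF search on simulated concentration-time curves: cluster the curves and take, as the arterial candidate, the cluster whose mean curve has the highest peak. The gamma-variate curve shapes and the selection rule are illustrative assumptions, not the protocol of the studies above.

    ```python
    # Minimal sketch: K-means clustering of simulated concentration-time curves.
    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(7)
    t = np.linspace(0, 60, 120)

    def gamma_variate(t0, amp, alpha=3.0, beta=1.5):
        tt = np.clip(t - t0, 0, None)
        return amp * tt ** alpha * np.exp(-tt / beta)

    # Hypothetical data: a few arterial voxels (early, tall peak) among many tissue voxels.
    arterial = np.array([gamma_variate(8, 1.0) + rng.normal(0, 0.3, t.size) for _ in range(30)])
    tissue = np.array([gamma_variate(14, 0.2) + rng.normal(0, 0.3, t.size) for _ in range(300)])
    curves = np.vstack([arterial, tissue])

    km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(curves)
    centers = km.cluster_centers_
    aif_cluster = int(np.argmax(centers.max(axis=1)))          # tallest mean peak
    print("candidate AIF peaks at t =", round(t[centers[aif_cluster].argmax()], 1), "s")
    ```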

  1. Anomaly clustering in hyperspectral images

    NASA Astrophysics Data System (ADS)

    Doster, Timothy J.; Ross, David S.; Messinger, David W.; Basener, William F.

    2009-05-01

    The topological anomaly detection algorithm (TAD) differs from other anomaly detection algorithms in that it uses a topological/graph-theoretic model for the image background instead of modeling the image with a Gaussian normal distribution. In the construction of the model, TAD produces a hard threshold separating anomalous pixels from background in the image. We build on this feature of TAD by extending the algorithm so that it gives a measure of the number of anomalous objects, rather than the number of anomalous pixels, in a hyperspectral image. This is done by identifying, and integrating, clusters of anomalous pixels via a graph theoretical method combining spatial and spectral information. The method is applied to a cluttered HyMap image and combines small groups of pixels containing like materials, such as those corresponding to rooftops and cars, into individual clusters. This improves visualization and interpretation of objects.

  2. A Fast Density-Based Clustering Algorithm for Real-Time Internet of Things Stream

    PubMed Central

    Ying Wah, Teh

    2014-01-01

    Data streams are continuously generated over time from Internet of Things (IoT) devices. The faster all of this data is analyzed, its hidden trends and patterns discovered, and new strategies created, the faster action can be taken, creating greater value for organizations. Density-based methods are a prominent class of data stream clustering algorithms. They can detect arbitrarily shaped clusters, handle outliers, and do not need the number of clusters in advance. Therefore, density-based clustering is a suitable choice for clustering IoT streams. Recently, several density-based algorithms have been proposed for clustering data streams. However, density-based clustering within a limited time budget is still a challenging issue. In this paper, we propose a density-based clustering algorithm for IoT streams. The method has a fast processing time, making it applicable to real-time IoT applications. Experimental results show that the proposed approach obtains high-quality results with low computation time on real and synthetic datasets. PMID:25110753

  3. A fast density-based clustering algorithm for real-time Internet of Things stream.

    PubMed

    Amini, Amineh; Saboohi, Hadi; Wah, Teh Ying; Herawan, Tutut

    2014-01-01

    Data streams are continuously generated over time from Internet of Things (IoT) devices. The faster all of this data is analyzed, its hidden trends and patterns discovered, and new strategies created, the faster action can be taken, creating greater value for organizations. Density-based methods are a prominent class of data stream clustering algorithms. They can detect arbitrarily shaped clusters, handle outliers, and do not need the number of clusters in advance. Therefore, density-based clustering is a suitable choice for clustering IoT streams. Recently, several density-based algorithms have been proposed for clustering data streams. However, density-based clustering within a limited time budget is still a challenging issue. In this paper, we propose a density-based clustering algorithm for IoT streams. The method has a fast processing time, making it applicable to real-time IoT applications. Experimental results show that the proposed approach obtains high-quality results with low computation time on real and synthetic datasets.
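
    As an illustration of the density-based idea, the sketch below runs DBSCAN on one window of a simulated IoT stream: it finds arbitrarily shaped clusters and flags outliers (label −1) without needing the number of clusters in advance. A true streaming algorithm such as the one proposed would maintain micro-clusters incrementally rather than re-clustering each window from scratch.

    ```python
    # Hedged sketch: DBSCAN on one window of simulated sensor readings.
    import numpy as np
    from sklearn.cluster import DBSCAN

    rng = np.random.default_rng(8)
    window = np.vstack([
        rng.normal([0, 0], 0.3, (200, 2)),     # dense cluster of sensor readings
        rng.normal([4, 4], 0.3, (200, 2)),     # second cluster
        rng.uniform(-2, 6, (20, 2)),           # sparse noise / outliers
    ])
    labels = DBSCAN(eps=0.5, min_samples=10).fit_predict(window)
    print("clusters:", len(set(labels) - {-1}), "outliers:", int((labels == -1).sum()))
    ```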

  4. Nucleus and cytoplasm segmentation in microscopic images using K-means clustering and region growing.

    PubMed

    Sarrafzadeh, Omid; Dehnavi, Alireza Mehri

    2015-01-01

    Segmentation of leukocytes acts as the foundation for all automated image-based hematological disease recognition systems. Most of the time, hematologists are interested in evaluating white blood cells only. Digital image processing techniques can help them in their analysis and diagnosis. The main objective of this paper is to detect leukocytes in a blood smear microscopic image and segment them into their two dominant elements, the nucleus and the cytoplasm. The segmentation is conducted using two stages of K-means clustering. First, the nuclei are segmented using K-means clustering. Then, a proposed method based on region growing is applied to separate connected nuclei. Next, the nuclei are subtracted from the original image. Finally, the cytoplasm is segmented using the second stage of K-means clustering. The results indicate that the proposed method is able to extract the nucleus and cytoplasm regions accurately and works well even when there is no significant contrast between the components in the image. In this paper, a method based on K-means clustering and region growing is proposed in order to detect leukocytes in a blood smear microscopic image and segment their components, the nucleus and the cytoplasm. As the region-growing step of the algorithm relies on edge information, it will not be able to separate connected nuclei accurately when the edges are poor, and it requires at least a weak edge to exist between the nuclei. The nucleus and cytoplasm segments of a leukocyte can be used for feature extraction and classification, which leads to automated leukemia detection.

  5. Quantifying clutter: A comparison of four methods and their relationship to bat detection

    Treesearch

    Joy M. O’Keefe; Susan C. Loeb; Hoke S. Hill Jr.; J. Drew Lanham

    2014-01-01

    The degree of spatial complexity in the environment, or clutter, affects the quality of foraging habitats for bats and their detection with acoustic systems. Clutter has been assessed in a variety of ways but there are no standardized methods for measuring clutter. We compared four methods (Visual Clutter, Cluster, Single Variable, and Clutter Index) and related these...

  6. Epidemiological analysis of Salmonella clusters identified by whole genome sequencing, England and Wales 2014.

    PubMed

    Waldram, Alison; Dolan, Gayle; Ashton, Philip M; Jenkins, Claire; Dallman, Timothy J

    2018-05-01

    The unprecedented level of bacterial strain discrimination provided by whole genome sequencing (WGS) presents new challenges with respect to the utility and interpretation of the data. Whole genome sequences from 1445 isolates of Salmonella belonging to the most commonly identified serotypes in England and Wales isolated between April and August 2014 were analysed. Single-linkage single nucleotide polymorphism (SNP) clustering at thresholds of 10, 5 and 0 SNPs was explored for evidence of epidemiological links between clustered cases. Analysis of the WGS data organised 566 of the 1445 isolates into 32 clusters of five or more isolates. A statistically significant epidemiological link was identified for 17 clusters. The clusters were associated with foreign travel (n = 8), consumption of Chinese takeaways (n = 4), chicken eaten at home (n = 2), and one each of the following: eating out, contact with another case in the home and contact with reptiles. In the same time frame, one cluster was detected using traditional outbreak detection methods. WGS can be used for the highly specific and highly sensitive detection of biologically related isolates when epidemiological links are obscured. Improvements in the collection of detailed, standardised exposure information would enhance cluster investigations. Copyright © 2017 Elsevier Ltd. All rights reserved.
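
    The clustering step can be sketched with SciPy's single-linkage hierarchy cut at fixed SNP-distance thresholds (10, 5 and 0), analogous to the nested cluster definitions described above; the pairwise SNP distance matrix below is entirely made up.

    ```python
    # Hedged sketch: single-linkage clustering of isolates at fixed SNP thresholds.
    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster
    from scipy.spatial.distance import squareform

    # Hypothetical symmetric pairwise SNP distance matrix for six isolates.
    D = np.array([[0, 2, 3, 40, 41, 60],
                  [2, 0, 1, 39, 40, 61],
                  [3, 1, 0, 38, 41, 59],
                  [40, 39, 38, 0, 4, 55],
                  [41, 40, 41, 4, 0, 56],
                  [60, 61, 59, 55, 56, 0]], dtype=float)

    Z = linkage(squareform(D), method="single")
    for t in (10, 5, 0):
        print(f"{t}-SNP clusters:", fcluster(Z, t=t, criterion="distance"))
    ```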

  7. Covariance descriptor fusion for target detection

    NASA Astrophysics Data System (ADS)

    Cukur, Huseyin; Binol, Hamidullah; Bal, Abdullah; Yavuz, Fatih

    2016-05-01

    Target detection is one of the most important topics for military and civilian applications. To address such detection tasks, hyperspectral imaging sensors provide useful image data containing both spatial and spectral information. Target detection in hyperspectral images involves various challenging scenarios. To overcome these challenges, the covariance descriptor offers many advantages. The detection capability of the conventional covariance descriptor technique can be improved by fusion methods. In this paper, hyperspectral bands are clustered according to inter-band correlation. Target detection is then realized by fusing the covariance descriptor results based on the band clusters. The proposed combination technique is denoted Covariance Descriptor Fusion (CDF). The efficiency of the CDF is evaluated by applying it to hyperspectral imagery to detect man-made objects. The results show that the CDF performs better than the conventional covariance descriptor.

  8. Lightweight biometric detection system for human classification using pyroelectric infrared detectors.

    PubMed

    Burchett, John; Shankar, Mohan; Hamza, A Ben; Guenther, Bob D; Pitsianis, Nikos; Brady, David J

    2006-05-01

    We use pyroelectric detectors that are differential in nature to detect motion in humans by their heat emissions. Coded Fresnel lens arrays create boundaries that help to localize humans in space as well as to classify the nature of their motion. We design and implement a low-cost biometric tracking system by using off-the-shelf components. We demonstrate two classification methods by using data gathered from sensor clusters of dual-element pyroelectric detectors with coded Fresnel lens arrays. We propose two algorithms for person identification, a more generalized spectral clustering method and a more rigorous example that uses principal component regression to perform a blind classification.

  9. Comparison and Evaluation of Clustering Algorithms for Tandem Mass Spectra.

    PubMed

    Rieder, Vera; Schork, Karin U; Kerschke, Laura; Blank-Landeshammer, Bernhard; Sickmann, Albert; Rahnenführer, Jörg

    2017-11-03

    In proteomics, liquid chromatography-tandem mass spectrometry (LC-MS/MS) is established for identifying peptides and proteins. Duplicated spectra, that is, multiple spectra of the same peptide, occur both in single MS/MS runs and in large spectral libraries. Clustering tandem mass spectra is used to find consensus spectra, with manifold applications. First, it speeds up database searches, as performed for instance by Mascot. Second, it helps to identify novel peptides across species. Third, it is used for quality control to detect wrongly annotated spectra. We compare different clustering algorithms based on the cosine distance between spectra. CAST, MS-Cluster, and PRIDE Cluster are popular algorithms to cluster tandem mass spectra. We add well-known algorithms for large data sets, hierarchical clustering, DBSCAN, and connected components of a graph, as well as the new method N-Cluster. All algorithms are evaluated on real data with varied parameter settings. Cluster results are compared with each other and with peptide annotations based on validation measures such as purity. Quality control, regarding the detection of wrongly (un)annotated spectra, is discussed for exemplary resulting clusters. N-Cluster proves to be highly competitive. All clustering results benefit from the so-called DISMS2 filter that integrates additional information, for example, on precursor mass.
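
    One of the simpler strategies compared above, connected components of a similarity graph, can be sketched as follows: link spectra whose pairwise cosine distance falls below a threshold and take the connected components as clusters. The dense intensity vectors and the 0.05 threshold are simplifying assumptions; real MS/MS spectra are sparse peak lists that would first be binned.

    ```python
    # Hedged sketch: connected-components clustering of spectra by cosine distance.
    import numpy as np
    import networkx as nx
    from scipy.spatial.distance import pdist, squareform

    rng = np.random.default_rng(10)
    base = rng.random((3, 50))                               # three "true" peptides
    spectra = np.vstack([b + rng.normal(0, 0.05, 50) for b in base for _ in range(5)])

    dist = squareform(pdist(spectra, metric="cosine"))
    G = nx.Graph()
    G.add_nodes_from(range(len(spectra)))
    G.add_edges_from(zip(*np.where(np.triu(dist < 0.05, 1))))  # edge = similar spectra
    clusters = list(nx.connected_components(G))
    print("clusters:", [sorted(c) for c in clusters])
    ```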

  10. Scalable Static and Dynamic Community Detection Using Grappolo

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Halappanavar, Mahantesh; Lu, Hao; Kalyanaraman, Anantharaman

    Graph clustering, popularly known as community detection, is a fundamental kernel for several applications of relevance to the Defense Advanced Research Projects Agency’s (DARPA) Hierarchical Identify Verify Exploit (HIVE) Program. Clusters or communities represent natural divisions within a network that are densely connected within a cluster and sparsely connected to the rest of the network. The need to compute clustering on large scale data necessitates the development of efficient algorithms that can exploit modern architectures that are fundamentally parallel in nature. However, due to their irregular and inherently sequential nature, many of the current algorithms for community detection are challenging to parallelize. In response to the HIVE Graph Challenge, we present several parallelization heuristics for fast community detection using the Louvain method as the serial template. We implement all the heuristics in a software library called Grappolo. Using the inputs from the HIVE Challenge, we demonstrate superior performance and high quality solutions based on four parallelization heuristics. We use Grappolo on static graphs as the first step towards community detection on streaming graphs.
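
    For reference, the serial template that Grappolo parallelizes (the Louvain method) is available in recent NetworkX releases; a minimal sketch on a small benchmark graph is shown below. The parallel heuristics of Grappolo itself are not reproduced here.

    ```python
    # Hedged sketch: Louvain community detection (requires a recent NetworkX).
    import networkx as nx
    from networkx.algorithms.community import louvain_communities, modularity

    G = nx.karate_club_graph()
    communities = louvain_communities(G, seed=0)
    print("communities found:", len(communities))
    print("modularity:", round(modularity(G, communities), 3))
    ```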

  11. Detection and clustering of features in aerial images by neuron network-based algorithm

    NASA Astrophysics Data System (ADS)

    Vozenilek, Vit

    2015-12-01

    The paper presents an algorithm for the detection and clustering of features in aerial photographs based on artificial neural networks. The presented approach is not focused on the detection of specific topographic features, but on the combination of general feature analysis with its use for clustering and the backward projection of clusters onto the aerial image. The basis of the algorithm is the calculation of the total error of the network and the adjustment of the network weights to minimize that error. A classic bipolar sigmoid was used as the activation function of the neurons and basic backpropagation was used for learning. To verify that a set of features is able to represent the image content from the user's perspective, a web application was built (ASP.NET on the Microsoft .NET platform). The main achievements include the finding that man-made objects in aerial images can be successfully identified by detecting shapes and anomalies. It was also found that an appropriate combination of comprehensive features describing the colors and selected shapes of individual areas can be useful for image analysis.

  12. Onto-clust--a methodology for combining clustering analysis and ontological methods for identifying groups of comorbidities for developmental disorders.

    PubMed

    Peleg, Mor; Asbeh, Nuaman; Kuflik, Tsvi; Schertz, Mitchell

    2009-02-01

    Children with developmental disorders usually exhibit multiple developmental problems (comorbidities). Hence, diagnosis needs to revolve around developmental disorder groups. Our objective is to systematically identify developmental disorder groups and represent them in an ontology. We developed a methodology that combines two methods: (1) a literature-based ontology that we created, which represents developmental disorders and potential developmental disorder groups, and (2) clustering for detecting comorbid developmental disorders in patient data. The ontology is used to interpret and improve clustering results, and the clustering results are used to validate the ontology and suggest directions for its development. We evaluated our methodology by applying it to data of 1175 patients from a child development clinic. We demonstrated that the ontology improves clustering results, bringing them closer to an expert-generated gold standard. We have shown that our methodology successfully combines an ontology with a clustering method to support systematic identification and representation of developmental disorder groups.

  13. Clusters, Groups, and Filaments in the Chandra Deep Field-South up to Redshift 1

    NASA Astrophysics Data System (ADS)

    Dehghan, S.; Johnston-Hollitt, M.

    2014-03-01

    We present a comprehensive structure detection analysis of the 0.3 deg² area of the MUSYC-ACES field, which covers the Chandra Deep Field-South (CDFS). Using a density-based clustering algorithm on the MUSYC and ACES photometric and spectroscopic catalogs, we find 62 overdense regions up to redshifts of 1, including clusters, groups, and filaments. We also present the detection of a relatively small void of ~10 Mpc² at z ~ 0.53. All structures are confirmed using the DBSCAN method, including the detection of nine structures previously reported in the literature. We present a catalog of all structures present, including their central position, mean redshift, velocity dispersions, and classification based on their morphological and spectroscopic distributions. In particular, we find 13 galaxy clusters and 6 large groups/small clusters. Comparison of these massive structures with published XMM-Newton imaging (where available) shows that 80% of these structures are associated with diffuse, soft-band (0.4-1 keV) X-ray emission, including 90% of all objects classified as clusters. The presence of soft-band X-ray emission in these massive structures (M200 ≥ 4.9 × 10¹³ M⊙) provides a strong independent confirmation of our methodology and classification scheme. In the closest two clusters identified (z < 0.13) high-quality optical imaging from the Deep2c field of the Garching-Bonn Deep Survey reveals the cD galaxies and demonstrates that they sit at the center of the detected X-ray emission. Nearly 60% of the clusters, groups, and filaments are detected in the known enhanced density regions of the CDFS at z ≈ 0.13, 0.52, 0.68, and 0.73. Additionally, all of the clusters, bar the most distant, are found in these overdense redshift regions. Many of the clusters and groups exhibit signs of ongoing formation seen in their velocity distributions, position within the detected cosmic web, and in one case through the presence of tidally disrupted central galaxies exhibiting trails of stars. These results all provide strong support for hierarchical structure formation up to redshifts of 1.

  14. Atomistic cluster alignment method for local order mining in liquids and glasses

    NASA Astrophysics Data System (ADS)

    Fang, X. W.; Wang, C. Z.; Yao, Y. X.; Ding, Z. J.; Ho, K. M.

    2010-11-01

    An atomistic cluster alignment method is developed to identify and characterize the local atomic structural order in liquids and glasses. With the “order mining” idea for structurally disordered systems, the method can detect the presence of any type of local order in the system and can quantify the structural similarity between a given set of templates and the aligned clusters in a systematic and unbiased manner. Moreover, population analysis can also be carried out for various types of clusters in the system. The advantages of the method in comparison with other previously developed analysis methods are illustrated by performing the structural analysis for four prototype systems (i.e., pure Al, pure Zr, Zr35Cu65, and Zr36Ni64). The results show that the cluster alignment method can identify various types of short-range orders (SROs) in these systems correctly while some of these SROs are difficult to capture by most of the currently available analysis methods (e.g., Voronoi tessellation method). Such a full three-dimensional atomistic analysis method is generic and can be applied to describe the magnitude and nature of noncrystalline ordering in many disordered systems.

  15. Cluster richness-mass calibration with cosmic microwave background lensing

    NASA Astrophysics Data System (ADS)

    Geach, James E.; Peacock, John A.

    2017-11-01

    Identifying galaxy clusters through overdensities of galaxies in photometric surveys is the oldest [1,2] and arguably the most economical and mass-sensitive detection method [3,4], compared with X-ray [5-7] and Sunyaev-Zel'dovich effect [8] surveys that detect the hot intracluster medium. However, a perennial problem has been the mapping of optical `richness' measurements onto total cluster mass [3,9-12]. Emitted at a conformal distance of 14 gigaparsecs, the cosmic microwave background acts as a backlight to all intervening mass in the Universe, and therefore has been gravitationally lensed [13-15]. Experiments such as the Atacama Cosmology Telescope [16], South Pole Telescope [17-19] and the Planck [20] satellite have now detected gravitational lensing of the cosmic microwave background and produced large-area maps of the foreground deflecting structures. Here we present a calibration of cluster optical richness at the 10% level by measuring the average cosmic microwave background lensing measured by Planck towards the positions of large numbers of optically selected clusters, detecting the deflection of photons by structures of total mass of order 10¹⁴ M⊙. Although mainly aimed at the study of larger-scale structures, the Planck estimate of the cosmic microwave background lensing field can be used to recover a nearly unbiased lensing signal for stacked clusters on arcminute scales [15,21]. This approach offers a clean measure of total cluster masses over most of cosmic history, largely independent of baryon physics.

  16. Geovisual analytics to enhance spatial scan statistic interpretation: an analysis of U.S. cervical cancer mortality

    PubMed Central

    Chen, Jin; Roth, Robert E; Naito, Adam T; Lengerich, Eugene J; MacEachren, Alan M

    2008-01-01

    Background Kulldorff's spatial scan statistic and its software implementation – SaTScan – are widely used for detecting and evaluating geographic clusters. However, two issues make using the method and interpreting its results non-trivial: (1) the method lacks cartographic support for understanding the clusters in geographic context and (2) results from the method are sensitive to parameter choices related to cluster scaling (abbreviated as scaling parameters), but the system provides no direct support for making these choices. We employ both established and novel geovisual analytics methods to address these issues and to enhance the interpretation of SaTScan results. We demonstrate our geovisual analytics approach in a case study analysis of cervical cancer mortality in the U.S. Results We address the first issue by providing an interactive visual interface to support the interpretation of SaTScan results. Our research to address the second issue prompted a broader discussion about the sensitivity of SaTScan results to parameter choices. Sensitivity has two components: (1) the method can identify clusters that, while being statistically significant, have heterogeneous contents comprised of both high-risk and low-risk locations and (2) the method can identify clusters that are unstable in location and size as the spatial scan scaling parameter is varied. To investigate cluster result stability, we conducted multiple SaTScan runs with systematically selected parameters. The results, when scanning a large spatial dataset (e.g., U.S. data aggregated by county), demonstrate that no single spatial scan scaling value is known to be optimal to identify clusters that exist at different scales; instead, multiple scans that vary the parameters are necessary. We introduce a novel method of measuring and visualizing reliability that facilitates identification of homogeneous clusters that are stable across analysis scales. Finally, we propose a logical approach to proceed through the analysis of SaTScan results. Conclusion The geovisual analytics approach described in this manuscript facilitates the interpretation of spatial cluster detection methods by providing cartographic representation of SaTScan results and by providing visualization methods and tools that support selection of SaTScan parameters. Our methods distinguish between heterogeneous and homogeneous clusters and assess the stability of clusters across analytic scales. Method We analyzed the cervical cancer mortality data for the United States aggregated by county between 2000 and 2004. We ran SaTScan on the dataset fifty times with different parameter choices. Our geovisual analytics approach couples SaTScan with our visual analytic platform, allowing users to interactively explore and compare SaTScan results produced by different parameter choices. The Standardized Mortality Ratio and reliability scores are visualized for all the counties to identify stable, homogeneous clusters. We evaluated our analysis result by comparing it to that produced by other independent techniques including the Empirical Bayes Smoothing and Kafadar spatial smoother methods. The geovisual analytics approach introduced here is developed and implemented in our Java-based Visual Inquiry Toolkit. PMID:18992163

  17. A Survey of Variable Extragalactic Sources with XTE's All Sky Monitor (ASM)

    NASA Technical Reports Server (NTRS)

    Jernigan, Garrett

    1998-01-01

    The original goal of the project was the near real-time detection of AGN utilizing the SSC 3 of the ASM on XTE which does a deep integration on one 100 square degree region of the sky. While the SSC never performed sufficiently well to allow the success of this goal, the work on the project has led to the development of a new analysis method for coded aperture systems which has now been applied to ASM data for mapping regions near clusters of galaxies such as the Perseus Cluster and the Coma Cluster. Publications are in preparation that describe both the new method and the results from mapping clusters of galaxies.

  18. A multitask clustering approach for single-cell RNA-seq analysis in Recessive Dystrophic Epidermolysis Bullosa

    PubMed Central

    Petegrosso, Raphael; Tolar, Jakub

    2018-01-01

    Single-cell RNA sequencing (scRNA-seq) has been widely applied to discover new cell types by detecting sub-populations in a heterogeneous group of cells. Since scRNA-seq experiments have lower read coverage/tag counts and introduce more technical biases compared to bulk RNA-seq experiments, the limited number of sampled cells combined with the experimental biases and other dataset specific variations presents a challenge to cross-dataset analysis and discovery of relevant biological variations across multiple cell populations. In this paper, we introduce a method of variance-driven multitask clustering of single-cell RNA-seq data (scVDMC) that utilizes multiple single-cell populations from biological replicates or different samples. scVDMC clusters single cells in multiple scRNA-seq experiments of similar cell types and markers but varying expression patterns such that the scRNA-seq data are better integrated than typical pooled analyses which only increase the sample size. By controlling the variance among the cell clusters within each dataset and across all the datasets, scVDMC detects cell sub-populations in each individual experiment with shared cell-type markers but varying cluster centers among all the experiments. Applied to two real scRNA-seq datasets with several replicates and one large-scale droplet-based dataset on three patient samples, scVDMC more accurately detected cell populations and known cell markers than pooled clustering and other recently proposed scRNA-seq clustering methods. In the case study applied to in-house Recessive Dystrophic Epidermolysis Bullosa (RDEB) scRNA-seq data, scVDMC revealed several new cell types and unknown markers validated by flow cytometry. MATLAB/Octave code available at https://github.com/kuanglab/scVDMC. PMID:29630593

  19. Clustering methods applied in the detection of Ki67 hot-spots in whole tumor slide images: an efficient way to characterize heterogeneous tissue-based biomarkers.

    PubMed

    Lopez, Xavier Moles; Debeir, Olivier; Maris, Calliope; Rorive, Sandrine; Roland, Isabelle; Saerens, Marco; Salmon, Isabelle; Decaestecker, Christine

    2012-09-01

    Whole-slide scanners allow the digitization of an entire histological slide at very high resolution. This new acquisition technique opens a wide range of possibilities for addressing challenging image analysis problems, including the identification of tissue-based biomarkers. In this study, we use whole-slide scanner technology for imaging the proliferating activity patterns in tumor slides based on Ki67 immunohistochemistry. Faced with large images, pathologists require tools that can help them identify tumor regions that exhibit high proliferating activity, called "hot-spots" (HSs). Pathologists need tools that can quantitatively characterize these HS patterns. To respond to this clinical need, the present study investigates various clustering methods with the aim of identifying Ki67 HSs in whole tumor slide images. This task requires a method capable of identifying an unknown number of clusters, which may be highly variable in terms of shape, size, and density. We developed a hybrid clustering method, referred to as Seedlink. Compared to manual HS selections by three pathologists, we show that Seedlink provides an efficient way of detecting Ki67 HSs and improves the agreement among pathologists when identifying HSs. Copyright © 2012 International Society for Advancement of Cytometry.

  20. Cluster Analysis of Velocity Field Derived from Dense GNSS Network of Japan

    NASA Astrophysics Data System (ADS)

    Takahashi, A.; Hashimoto, M.

    2015-12-01

    Dense GNSS networks have been widely used to observe crustal deformation. Simpson et al. (2012) and Savage and Simpson (2013) conducted cluster analyses of the GNSS velocity fields in the San Francisco Bay Area and the Mojave Desert, respectively, successfully identifying velocity discontinuities and demonstrating the value of cluster analysis for classifying GNSS velocity fields. Because strike-slip faulting dominates in the western United States, the tectonic geometry there is relatively simple. The Japanese Islands, by contrast, are tectonically complicated owing to the subduction of oceanic plates and exhibit many types of crustal deformation, such as slow slip events and large postseismic deformation. We propose a modified clustering method for the GNSS velocity field in Japan that separates time-variant from static crustal deformation. Our modification is to perform the cluster analysis every several months or years and then to evaluate the similarity of cluster membership: if a GNSS station moves differently from its neighboring stations, it will not fall in the cluster that contains the surrounding stations. With this approach, time-variant phenomena can be distinguished. We applied the method to GNSS data for Japan from 1996 to 2015 and derived the following conclusions. First, the cluster boundaries are consistent with known active faults; for example, the Arima-Takatsuki-Hanaore fault system and the Shimane-Tottori segment proposed by Nishimura (2015) are recognized without using prior information. Second, the method improves the detectability of time-variable phenomena, such as a slow slip event in the northern Hokkaido region detected by Ohzono et al. (2015). Third, it classifies postseismic deformation caused by large earthquakes: the results suggest velocity discontinuities in the postseismic deformation of the Tohoku-oki earthquake, implying that postseismic deformation does not decay continuously with distance from the epicenter.

  1. An Optimal Method for Detecting Internal and External Intrusion in MANET

    NASA Astrophysics Data System (ADS)

    Rafsanjani, Marjan Kuchaki; Aliahmadipour, Laya; Javidi, Mohammad M.

    Mobile Ad hoc Network (MANET) is formed by a set of mobile hosts which communicate among themselves through radio waves. The hosts establish infrastructure and cooperate to forward data in a multi-hop fashion without a central administration. Due to their communication type and resource constraints, MANETs are vulnerable to diverse types of attacks and intrusions. In this paper, we propose a method that prevents internal intrusion and detects external intrusion in mobile ad hoc networks by using game theory. One way to reduce the resource consumption of external-intrusion detection is to elect a leader for each cluster that provides the intrusion detection service to the other nodes in its cluster; we call this the moderate mode. The moderate mode is suitable only when the probability of attack is low; once the probability of attack is high, victim nodes should launch their own IDS to detect and thwart intrusions, which we call the robust mode. The leader should be neither a malicious nor a selfish node and must detect external intrusion in its cluster at minimum cost. Our proposed method has three steps: the first builds trust relationships between nodes and estimates a trust value for each node to prevent internal intrusion; the second proposes an optimal leader-election method based on the trust values; and the third finds the threshold value for notifying the victim node to launch its IDS once the probability of attack exceeds that value. Bayesian game theory is applied in the first and third steps. By using game theory, trust values, and an honest leader, our method can effectively improve network security and performance and reduce resource consumption.

  2. An Unlikely Radio Halo in the Low X-Ray Luminosity Galaxy Cluster RXCJ1514.9-1523

    NASA Technical Reports Server (NTRS)

    Marketvitch, M.; ZuHone, J. A.; Lee, D.; Giacintucci, S.; Dallacasa, D.; Venturi, T.; Brunetti, G.; Cassano, R.; Markevitch, M.; Athreya, R. M.

    2011-01-01

    Aims: We report the discovery of a giant radio halo in the galaxy cluster RXCJ1514.9-1523 at z=0.22 with a relatively low X-ray luminosity, L(sub X) (0.1-2.4 keV) approx. 7 x 10(exp 44) ergs/s. Methods: This faint, diffuse radio source is detected with the Giant Metrewave Radio Telescope at 327 MHz. The source is barely detected at 1.4 GHz in an NVSS pointing that we have reanalyzed. Results: The integrated radio spectrum of the halo is quite steep, with a slope alpha = 1.6 between 327 MHz and 1.4 GHz. While giant radio halos are common in more X-ray luminous cluster mergers, there is a less than 10% probability of detecting a halo in systems with L(sub X) < 8 x 10(exp 44) ergs/s. The detection of a new giant halo in this borderline luminosity regime can be particularly useful for discriminating between the competing theories for the origin of ultrarelativistic electrons in clusters. Furthermore, if our steep radio spectral index is confirmed by future deeper radio observations, this cluster would provide another example of the very rare, new class of ultra-steep spectrum radio halos, predicted by the model in which the cluster cosmic ray electrons are produced by turbulent reacceleration.

  3. Molecular Subtyping to Detect Human Listeriosis Clusters

    PubMed Central

    Sauders, Brian D.; Fortes, Esther D.; Morse, Dale L.; Dumas, Nellie; Kiehlbauch, Julia A.; Schukken, Ynte; Hibbs, Jonathan R.

    2003-01-01

    We analyzed the diversity (Simpson’s Index, D) and distribution of Listeria monocytogenes in human listeriosis cases in New York State (excluding New York City) from November 1996 to June 2000 by using automated ribotyping and pulsed-field gel electrophoresis (PFGE). We applied a scan statistic (p<0.05) to detect listeriosis clusters caused by a specific Listeria monocytogenes subtype. Of 131 human isolates, 34 (D=0.923) ribotypes and 74 (D=0.975) PFGE types were found. Nine (31% of cases) clusters were identified by ribotype or PFGE; five (18% of cases) clusters were identified by using both methods. Two of the nine clusters (13% of cases) identified corresponded with investigated multistate listeriosis outbreaks. While most human listeriosis cases are considered sporadic, highly discriminatory molecular subtyping approaches thus indicated that 13% to 31% of cases reported in New York State may represent single-source clusters. Listeriosis control and reduction efforts should include broad-based subtyping of human isolates and consider that a large number of cases may represent outbreaks. PMID:12781006

  4. A cross-species bi-clustering approach to identifying conserved co-regulated genes.

    PubMed

    Sun, Jiangwen; Jiang, Zongliang; Tian, Xiuchun; Bi, Jinbo

    2016-06-15

    A growing number of studies have explored the process of pre-implantation embryonic development of multiple mammalian species. However, the conservation and variation among different species in their developmental programming are poorly defined due to the lack of effective computational methods for detecting co-regulated genes that are conserved across species. The most sophisticated method to date for identifying conserved co-regulated genes is a two-step approach. This approach first identifies gene clusters for each species by a cluster analysis of gene expression data, and subsequently computes the overlaps of clusters identified from different species to reveal common subgroups. This approach is ineffective at dealing with the noise in the expression data introduced by the complicated procedures in quantifying gene expression. Furthermore, due to the sequential nature of the approach, the gene clusters identified in the first step may have little overlap among different species in the second step, making it difficult to detect conserved co-regulated genes. We propose a cross-species bi-clustering approach which first denoises the gene expression data of each species into a data matrix. The rows of the data matrices of different species represent the same set of genes that are characterized by their expression patterns over the developmental stages of each species as columns. A novel bi-clustering method is then developed to cluster genes into subgroups by a joint sparse rank-one factorization of all the data matrices. This method decomposes a data matrix into a product of a column vector and a row vector, where the column vector is a consistent indicator across the matrices (species) to identify the same gene cluster and the row vector specifies for each species the developmental stages that the clustered genes co-regulate. An efficient optimization algorithm has been developed with convergence analysis. This approach was first validated on synthetic data and compared to the two-step method and several recent joint clustering methods. We then applied this approach to two real-world datasets of gene expression during the pre-implantation embryonic development of the human and mouse. Co-regulated genes consistent between the human and mouse were identified, offering insights into conserved functions, as well as similarities and differences in genome activation timing between the human and mouse embryos. The R package containing the implementation of the proposed method in C++ is available at: https://github.com/JavonSun/mvbc.git and also at the R platform https://www.r-project.org/. Contact: jinbo@engr.uconn.edu. © The Author 2016. Published by Oxford University Press.

  5. CLUSTERING OF INTERICTAL SPIKES BY DYNAMIC TIME WARPING AND AFFINITY PROPAGATION

    PubMed Central

    Thomas, John; Jin, Jing; Dauwels, Justin; Cash, Sydney S.; Westover, M. Brandon

    2018-01-01

    Epilepsy is often associated with the presence of spikes in electroencephalograms (EEGs). The spike waveforms vary vastly among epilepsy patients, and also for the same patient across time. In order to develop semi-automated and automated methods for detecting spikes, it is crucial to obtain a better understanding of the various spike shapes. In this paper, we develop several approaches to extract exemplars of spikes. We generate spike exemplars by applying clustering algorithms to a database of spikes from 12 patients. As similarity measures for clustering, we consider the Euclidean distance and Dynamic Time Warping (DTW). We assess two clustering algorithms, namely, K-means clustering and affinity propagation. The clustering methods are compared based on the mean squared error, and the similarity measures are assessed based on the number of generated spike clusters. Affinity propagation with DTW is shown to be the best combination for clustering epileptic spikes, since it generates fewer spike templates and does not require the number of spike templates to be pre-specified. PMID:29527130
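
    A minimal sketch of the preferred combination reported above, DTW similarity plus affinity propagation, is given below. It uses synthetic waveforms rather than the authors' 12-patient spike database, and a plain dynamic-programming DTW rather than any optimized implementation.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation

def dtw_distance(x, y):
    """Classical dynamic-time-warping distance between two 1-D sequences."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# toy "spike" waveforms: two shape families with timing jitter (hypothetical data)
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 64)
spikes = [np.exp(-((t - 0.4 - 0.05 * rng.random()) ** 2) / 0.002) for _ in range(10)]
spikes += [-0.8 * np.exp(-((t - 0.6 - 0.05 * rng.random()) ** 2) / 0.004) for _ in range(10)]

n = len(spikes)
dist = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        dist[i, j] = dist[j, i] = dtw_distance(spikes[i], spikes[j])

# affinity propagation takes similarities; use negative DTW distances
ap = AffinityPropagation(affinity="precomputed", random_state=0)
labels = ap.fit_predict(-dist)
print(labels)                        # cluster label per spike
print(ap.cluster_centers_indices_)   # exemplar (template) spikes
```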

  6. Digital image modification detection using color information and its histograms.

    PubMed

    Zhou, Haoyu; Shen, Yue; Zhu, Xinghui; Liu, Bo; Fu, Zigang; Fan, Na

    2016-09-01

    The rapid development of open source and commercial image editing software makes the authenticity of digital images questionable. Copy-move forgery is one of the most widely used tampering techniques to create desirable objects or conceal undesirable objects in a scene. Existing techniques reported in the literature to detect such tampering aim to improve the robustness of these methods against the use of JPEG compression, blurring, noise, or other types of post processing operations. These post processing operations are frequently used with the intention of concealing tampering and reducing tampering clues. A robust method based on color moments and five other image descriptors is proposed in this paper. The method divides the image into fixed-size overlapping blocks. A clustering operation divides the entire search space into smaller pieces with similar color distributions. Blocks from the tampered regions will reside within the same cluster since both copied and moved regions have similar color distributions. Five image descriptors are used to extract block features, which makes the method more robust to post processing operations. An ensemble of deep compositional pattern-producing neural networks is trained with these extracted features. Similarity among feature vectors in clusters indicates possible forged regions. Experimental results show that the proposed method can detect copy-move forgery even if an image was distorted by gamma correction, additive white Gaussian noise, JPEG compression, or blurring. Copyright © 2016. Published by Elsevier Ireland Ltd.
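
    The block/feature/clustering stage described above can be sketched as follows. This is only an illustration of the general idea, overlapping blocks characterized by color moments and grouped by a clustering step, not the published method, which uses five additional descriptors and a neural-network ensemble; the image, thresholds, and helper name block_color_moments are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans

def block_color_moments(img, block=16, step=8):
    """Color moments (mean, std, skewness per channel) of overlapping blocks."""
    h, w, _ = img.shape
    feats, coords = [], []
    for r in range(0, h - block + 1, step):
        for c in range(0, w - block + 1, step):
            patch = img[r:r + block, c:c + block].reshape(-1, 3).astype(float)
            mu = patch.mean(axis=0)
            sd = patch.std(axis=0) + 1e-9
            skew = ((patch - mu) ** 3).mean(axis=0) / sd ** 3
            feats.append(np.concatenate([mu, sd, skew]))
            coords.append((r, c))
    return np.array(feats), np.array(coords)

# hypothetical image with a copied-and-moved patch
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(128, 128, 3))
img[80:112, 80:112] = img[16:48, 16:48]          # simulated copy-move

feats, coords = block_color_moments(img)
labels = KMeans(n_clusters=8, n_init=10, random_state=0).fit_predict(feats)

# blocks from the copied and pasted regions share a cluster; candidate pairs are
# near-identical feature vectors within a cluster that are far apart spatially
for k in range(8):
    idx = np.flatnonzero(labels == k)
    for a in idx:
        for b in idx[idx > a]:
            if (np.linalg.norm(feats[a] - feats[b]) < 1e-6 and
                    np.linalg.norm(coords[a] - coords[b]) > 32):
                print("possible duplicated blocks at", coords[a], coords[b])
```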

  7. Geovisual analytics to enhance spatial scan statistic interpretation: an analysis of U.S. cervical cancer mortality.

    PubMed

    Chen, Jin; Roth, Robert E; Naito, Adam T; Lengerich, Eugene J; Maceachren, Alan M

    2008-11-07

    Kulldorff's spatial scan statistic and its software implementation - SaTScan - are widely used for detecting and evaluating geographic clusters. However, two issues make using the method and interpreting its results non-trivial: (1) the method lacks cartographic support for understanding the clusters in geographic context and (2) results from the method are sensitive to parameter choices related to cluster scaling (abbreviated as scaling parameters), but the system provides no direct support for making these choices. We employ both established and novel geovisual analytics methods to address these issues and to enhance the interpretation of SaTScan results. We demonstrate our geovisual analytics approach in a case study analysis of cervical cancer mortality in the U.S. We address the first issue by providing an interactive visual interface to support the interpretation of SaTScan results. Our research to address the second issue prompted a broader discussion about the sensitivity of SaTScan results to parameter choices. Sensitivity has two components: (1) the method can identify clusters that, while being statistically significant, have heterogeneous contents comprised of both high-risk and low-risk locations and (2) the method can identify clusters that are unstable in location and size as the spatial scan scaling parameter is varied. To investigate cluster result stability, we conducted multiple SaTScan runs with systematically selected parameters. The results, when scanning a large spatial dataset (e.g., U.S. data aggregated by county), demonstrate that no single spatial scan scaling value is known to be optimal to identify clusters that exist at different scales; instead, multiple scans that vary the parameters are necessary. We introduce a novel method of measuring and visualizing reliability that facilitates identification of homogeneous clusters that are stable across analysis scales. Finally, we propose a logical approach to proceed through the analysis of SaTScan results. The geovisual analytics approach described in this manuscript facilitates the interpretation of spatial cluster detection methods by providing cartographic representation of SaTScan results and by providing visualization methods and tools that support selection of SaTScan parameters. Our methods distinguish between heterogeneous and homogeneous clusters and assess the stability of clusters across analytic scales. We analyzed the cervical cancer mortality data for the United States aggregated by county between 2000 and 2004. We ran SaTScan on the dataset fifty times with different parameter choices. Our geovisual analytics approach couples SaTScan with our visual analytic platform, allowing users to interactively explore and compare SaTScan results produced by different parameter choices. The Standardized Mortality Ratio and reliability scores are visualized for all the counties to identify stable, homogeneous clusters. We evaluated our analysis result by comparing it to that produced by other independent techniques including the Empirical Bayes Smoothing and Kafadar spatial smoother methods. The geovisual analytics approach introduced here is developed and implemented in our Java-based Visual Inquiry Toolkit.

  8. Towards Linking Anonymous Authorship in Casual Sexual Encounter Ads

    PubMed Central

    Fries, Jason A.; Segre, Alberto M.; Polgreen, Philip M.

    2013-01-01

    Objective This paper constructs an authorship-linked collection or corpus of anonymous, sex-seeking ads found on the classifieds website Craigslist. This corpus is then used to validate an authorship attribution approach based on identifying near duplicate text in ad clusters, providing insight into how often anonymous individuals post sex-seeking ads and where they meet for encounters. Introduction The increasing use of the Internet to arrange sexual encounters presents challenges to public health agencies formulating STD interventions, particularly in the context of anonymous encounters. These encounters complicate or break traditional interventions. In previous work [1], we examined a corpus of anonymous personal ads seeking sexual encounters from the classifieds website Craigslist and presented a way of linking multiple ads posted across time to a single author. The key observation of our approach is that some ads are simply reposts of older ads, often updated with only minor textual changes. Under the presumption that these ads, when not spam, originate from the same author, we can use efficient near-duplicate detection techniques to cluster ads within some threshold similarity. Linking ads in this way allows us to preserve the anonymity of authors while still extracting useful information on the frequency with which authors post ads, as well as the geographic regions in which they seek encounters. While this process detects many clusters, the lack of a true corpus of authorship-linked ads makes it difficult to validate and tune the parameters of our system. Fortunately, many ad authors provide an obfuscated telephone number in ad text (e.g., 867–5309 becomes 8sixseven5three oh nine) to bypass Craigslist filters, which prohibit including phone numbers in personal ads. By matching phone numbers of this type across all ads, we can create a corpus of ad clusters known to be written by a single author. This authorship corpus can then be used to evaluate and tune our existing near-duplicate detection system, and in the future identify features for more robust authorship attribution techniques. Methods From 7-1-2009 until 7-1-2011, RSS feeds were collected daily for 8 personal ad categories from 414 sites across the United States, for a total of 67 million ads. To create an anonymous, author-linked corpus, we used a regular expression to identify obfuscated phone numbers in ad text. We measure the ability of near-duplicate detection to link clusters in two ways: 1) detecting all ads in a cluster; and 2) correctly detecting a subset of ads within a single cluster. Ads incorrectly assigned to more than 1 cluster are considered false positives. All results are reported in terms of precision, recall, and F-scores (common information retrieval metrics) across cluster size, expressed as number of ads. Results 652,014 ads contained phone numbers, producing a total of 46,079 authorship-linked ad clusters. For detecting all ads within a cluster, precision ranged from 0.05 to 0.0 and recall from 0.02 to 0.0 for all cluster sizes. For detecting partial clusters, see Figure 1. Conclusions We find that near-duplicate detection alone is insufficient to detect all ads within a cluster. However, we do find that the process can, with high precision and low recall, detect a subset of ads associated with a single author. This follows the intuition that an author’s total set of ads is itself comprised of multiple self-similar subsets. 
    While a near-duplicate detection approach can correctly identify subsets of ads linked to a single author, this process alone cannot attribute multiple clusters to a single author. Future work will explore leveraging additional linguistic features to improve author attribution. [Figure 1: (top) evaluations for partial cluster detection using the near-duplicate identification approach to linking anonymous authorship in Craigslist ads; (bottom) the distribution of ad cluster sizes.]

  9. Human recognition based on head-shoulder contour extraction and BP neural network

    NASA Astrophysics Data System (ADS)

    Kong, Xiao-fang; Wang, Xiu-qin; Gu, Guohua; Chen, Qian; Qian, Wei-xian

    2014-11-01

    In practical application scenarios such as video surveillance and human-computer interaction, human body movements are uncertain because the human body is a non-rigid object. Because the head-shoulder part of the body is less affected by movement and is seldom obscured by other objects, a head-shoulder model with stable characteristics can serve as a detection feature to describe the human body in detection and recognition tasks. To extract the head-shoulder contour accurately, this paper proposes a method for establishing a head-shoulder model that combines edge detection with mean-shift image clustering. First, an adaptive mixture-of-Gaussians background-update method is used to extract targets from the video sequence. Second, edge detection extracts the contours of moving objects, and the mean-shift algorithm clusters parts of the target's contour. Third, the head-shoulder model is established from the width-to-height ratio of the human head and shoulders combined with the projection histogram of the binary image, and the eigenvectors of the head-shoulder contour are acquired. Finally, the relationship between the head-shoulder contour eigenvectors and the moving objects is learned by training a back-propagation (BP) neural network classifier, and the head-shoulder model is clustered for human detection and recognition. Experiments show that the proposed method, combining edge detection and the mean-shift algorithm, can extract the complete head-shoulder contour with low computational complexity and high efficiency.

  10. Evaluation of sliding baseline methods for spatial estimation for cluster detection in the biosurveillance system

    PubMed Central

    Xing, Jian; Burkom, Howard; Moniz, Linda; Edgerton, James; Leuze, Michael; Tokars, Jerome

    2009-01-01

    Background The Centers for Disease Control and Prevention's (CDC's) BioSense system provides near-real time situational awareness for public health monitoring through analysis of electronic health data. Determination of anomalous spatial and temporal disease clusters is a crucial part of the daily disease monitoring task. Our study focused on finding useful anomalies at manageable alert rates according to available BioSense data history. Methods The study dataset included more than 3 years of daily counts of military outpatient clinic visits for respiratory and rash syndrome groupings. We applied four spatial estimation methods in implementations of space-time scan statistics cross-checked in Matlab and C. We compared the utility of these methods according to the resultant background cluster rate (a false alarm surrogate) and sensitivity to injected cluster signals. The comparison runs used a spatial resolution based on the facility zip code in the patient record and a finer resolution based on the residence zip code. Results Simple estimation methods that account for day-of-week (DOW) data patterns yielded a clear advantage both in background cluster rate and in signal sensitivity. A 28-day baseline gave the most robust results for this estimation; the preferred baseline is long enough to remove daily fluctuations but short enough to reflect recent disease trends and data representation. Background cluster rates were lower for the rash syndrome counts than for the respiratory counts, likely because of seasonality and the large scale of the respiratory counts. Conclusion The spatial estimation method should be chosen according to characteristics of the selected data streams. In this dataset with strong day-of-week effects, the overall best detection performance was achieved using subregion averages over a 28-day baseline stratified by weekday or weekend/holiday behavior. Changing the estimation method for particular scenarios involving different spatial resolution or other syndromes can yield further improvement. PMID:19615075
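
    The day-of-week-stratified sliding baseline favored in this evaluation can be sketched as follows, on hypothetical daily counts rather than the BioSense data: the expected count for a day is the mean over the preceding 28 days restricted to days of the same stratum (weekday versus weekend; holidays are ignored here for simplicity).

```python
import numpy as np
import pandas as pd

# hypothetical daily syndrome counts for one facility
rng = np.random.default_rng(0)
dates = pd.date_range("2004-01-01", periods=3 * 365, freq="D")
weekday = dates.dayofweek < 5
counts = rng.poisson(np.where(weekday, 40, 15))      # strong day-of-week effect
series = pd.Series(counts, index=dates)

def sliding_baseline(series, day, baseline_days=28):
    """Expected count for `day`: mean over the preceding 28 days, restricted to
    days in the same stratum (weekday vs weekend) as `day`."""
    window = series.loc[day - pd.Timedelta(days=baseline_days): day - pd.Timedelta(days=1)]
    same_stratum = (window.index.dayofweek < 5) == (day.dayofweek < 5)
    return window[same_stratum].mean()

day = dates[400]
expected = sliding_baseline(series, day)
observed = series.loc[day]
print(day.date(), "observed", observed, "expected", round(expected, 1))
```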

  11. Detection of Clostridium difficile infection clusters, using the temporal scan statistic, in a community hospital in southern Ontario, Canada, 2006-2011.

    PubMed

    Faires, Meredith C; Pearl, David L; Ciccotelli, William A; Berke, Olaf; Reid-Smith, Richard J; Weese, J Scott

    2014-05-12

    In hospitals, Clostridium difficile infection (CDI) surveillance relies on unvalidated guidelines or threshold criteria to identify outbreaks. This can result in false-positive and -negative cluster alarms. The application of statistical methods to identify and understand CDI clusters may be a useful alternative or complement to standard surveillance techniques. The objectives of this study were to investigate the utility of the temporal scan statistic for detecting CDI clusters and determine if there are significant differences in the rate of CDI cases by month, season, and year in a community hospital. Bacteriology reports of patients identified with a CDI from August 2006 to February 2011 were collected. For patients detected with CDI from March 2010 to February 2011, stool specimens were obtained. Clostridium difficile isolates were characterized by ribotyping and investigated for the presence of toxin genes by PCR. CDI clusters were investigated using a retrospective temporal scan test statistic. Statistically significant clusters were compared to known CDI outbreaks within the hospital. A negative binomial regression model was used to identify associations between year, season, month and the rate of CDI cases. Overall, 86 CDI cases were identified. Eighteen specimens were analyzed and nine ribotypes were classified with ribotype 027 (n = 6) the most prevalent. The temporal scan statistic identified significant CDI clusters at the hospital (n = 5), service (n = 6), and ward (n = 4) levels (P ≤ 0.05). Three clusters were concordant with the one C. difficile outbreak identified by hospital personnel. Two clusters were identified as potential outbreaks. The negative binomial model indicated years 2007-2010 (P ≤ 0.05) had decreased CDI rates compared to 2006 and spring had an increased CDI rate compared to the fall (P = 0.023). Application of the temporal scan statistic identified several clusters, including potential outbreaks not detected by hospital personnel. The identification of time periods with decreased or increased CDI rates may have been a result of specific hospital events. Understanding the clustering of CDIs can aid in the interpretation of surveillance data and lead to the development of better early detection systems.
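
    In the spirit of the retrospective temporal scan test applied in this study (though not the SaTScan implementation the authors used), the following sketch finds the time window with the largest Poisson likelihood ratio and estimates a p-value by Monte Carlo simulation on hypothetical daily counts.

```python
import numpy as np

def temporal_scan(counts, max_window=60, n_sim=999, rng=None):
    """Retrospective Poisson temporal scan: find the time window with the
    largest likelihood ratio and assess it by Monte Carlo simulation that
    redistributes the same total count uniformly over the study period."""
    if rng is None:
        rng = np.random.default_rng(0)
    counts = np.asarray(counts, dtype=float)
    n_days, total = len(counts), counts.sum()

    def best_llr(c):
        cum = np.concatenate([[0.0], np.cumsum(c)])
        best, win = 0.0, (0, 0)
        for length in range(1, max_window + 1):
            inside = cum[length:] - cum[:-length]        # counts in each window
            mu = total * length / n_days                 # expected under H0
            with np.errstate(divide="ignore", invalid="ignore"):
                llr = (inside * np.log(inside / mu) +
                       (total - inside) * np.log((total - inside) / (total - mu)))
            llr = np.where(inside > mu, np.nan_to_num(llr), 0.0)
            i = int(np.argmax(llr))
            if llr[i] > best:
                best, win = llr[i], (i, i + length)      # [start, end) day indices
        return best, win

    observed, window = best_llr(counts)
    exceed = sum(best_llr(rng.multinomial(int(total), np.ones(n_days) / n_days))[0]
                 >= observed for _ in range(n_sim))
    return window, observed, (exceed + 1) / (n_sim + 1)

# hypothetical daily CDI case counts with a 15-day cluster injected around day 200
rng = np.random.default_rng(1)
cases = rng.poisson(0.5, size=365)
cases[200:215] += rng.poisson(2.0, size=15)
print(temporal_scan(cases, n_sim=199, rng=rng))
```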

  12. Mixed Pattern Matching-Based Traffic Abnormal Behavior Recognition

    PubMed Central

    Cui, Zhiming; Zhao, Pengpeng

    2014-01-01

    A motion trajectory is an intuitive representation in the time-space domain of the micromotion behavior of a moving target, and trajectory analysis is an important approach to recognizing abnormal behaviors of moving targets. To address the complexity of vehicle trajectories, this paper first proposes a trajectory pattern learning method based on dynamic time warping (DTW) and spectral clustering. It introduces the DTW distance to measure the distances between vehicle trajectories and determines the number of clusters automatically by a spectral clustering algorithm based on the distance matrix; it then groups the sample data points into different clusters. After the spatial and direction patterns are learned from the clusters, a recognition method based on mixed pattern matching is proposed for detecting abnormal vehicle behaviors. The experimental results show that the proposed technical scheme can effectively recognize the main types of abnormal traffic behaviors and has good robustness. A real-world application verified its feasibility and validity. PMID:24605045

  13. Local Higher-Order Graph Clustering

    PubMed Central

    Yin, Hao; Benson, Austin R.; Leskovec, Jure; Gleich, David F.

    2018-01-01

    Local graph clustering methods aim to find a cluster of nodes by exploring a small region of the graph. These methods are attractive because they enable targeted clustering around a given seed node and are faster than traditional global graph clustering methods because their runtime does not depend on the size of the input graph. However, current local graph partitioning methods are not designed to account for the higher-order structures crucial to the network, nor can they effectively handle directed networks. Here we introduce a new class of local graph clustering methods that address these issues by incorporating higher-order network information captured by small subgraphs, also called network motifs. We develop the Motif-based Approximate Personalized PageRank (MAPPR) algorithm that finds clusters containing a seed node with minimal motif conductance, a generalization of the conductance metric for network motifs. We generalize existing theory to prove the fast running time (independent of the size of the graph) and obtain theoretical guarantees on the cluster quality (in terms of motif conductance). We also develop a theory of node neighborhoods for finding sets that have small motif conductance, and apply these results to the case of finding good seed nodes to use as input to the MAPPR algorithm. Experimental validation on community detection tasks in both synthetic and real-world networks shows that our new framework MAPPR outperforms the current edge-based personalized PageRank methodology. PMID:29770258

  14. Detecting space-time cancer clusters using residential histories

    NASA Astrophysics Data System (ADS)

    Jacquez, Geoffrey M.; Meliker, Jaymie R.

    2007-04-01

    Methods for analyzing geographic clusters of disease typically ignore the space-time variability inherent in epidemiologic datasets, do not adequately account for known risk factors (e.g., smoking and education) or covariates (e.g., age, gender, and race), and do not permit investigation of the latency window between exposure and disease. Our research group recently developed Q-statistics for evaluating space-time clustering in cancer case-control studies with residential histories. This technique relies on time-dependent nearest neighbor relationships to examine clustering at any moment in the life-course of the residential histories of cases relative to that of controls. In addition, in place of the widely used null hypothesis of spatial randomness, each individual's probability of being a case is instead based on his/her risk factors and covariates. Case-control clusters will be presented using residential histories of 220 bladder cancer cases and 440 controls in Michigan. In preliminary analyses of this dataset, smoking, age, gender, race and education were sufficient to explain the majority of the clustering of residential histories of the cases. Clusters of unexplained risk, however, were identified surrounding the business address histories of 10 industries that emit known or suspected bladder cancer carcinogens. The clustering of 5 of these industries began in the 1970's and persisted through the 1990's. This systematic approach for evaluating space-time clustering has the potential to generate novel hypotheses about environmental risk factors. These methods may be extended to detect differences in space-time patterns of any two groups of people, making them valuable for security intelligence and surveillance operations.

  15. A Web-Based Multidrug-Resistant Organisms Surveillance and Outbreak Detection System with Rule-Based Classification and Clustering

    PubMed Central

    Tseng, Yi-Ju; Wu, Jung-Hsuan; Ping, Xiao-Ou; Lin, Hui-Chi; Chen, Ying-Yu; Shang, Rung-Ji; Chen, Ming-Yuan; Lai, Feipei

    2012-01-01

    Background The emergence and spread of multidrug-resistant organisms (MDROs) are causing a global crisis. Combating antimicrobial resistance requires prevention of transmission of resistant organisms and improved use of antimicrobials. Objectives To develop a Web-based information system for automatic integration, analysis, and interpretation of the antimicrobial susceptibility of all clinical isolates that incorporates rule-based classification and cluster analysis of MDROs and implements control chart analysis to facilitate outbreak detection. Methods Electronic microbiological data from a 2200-bed teaching hospital in Taiwan were classified according to predefined criteria of MDROs. The numbers of organisms, patients, and incident patients in each MDRO pattern were presented graphically to describe spatial and time information in a Web-based user interface. Hierarchical clustering with 7 upper control limits (UCL) was used to detect suspicious outbreaks. The system’s performance in outbreak detection was evaluated based on vancomycin-resistant enterococcal outbreaks determined by a hospital-wide prospective active surveillance database compiled by infection control personnel. Results The optimal UCL for MDRO outbreak detection was the upper 90% confidence interval (CI) using germ criterion with clustering (area under ROC curve (AUC) 0.93, 95% CI 0.91 to 0.95), upper 85% CI using patient criterion (AUC 0.87, 95% CI 0.80 to 0.93), and one standard deviation using incident patient criterion (AUC 0.84, 95% CI 0.75 to 0.92). The performance indicators of each UCL were statistically significantly higher with clustering than those without clustering in germ criterion (P < .001), patient criterion (P = .04), and incident patient criterion (P < .001). Conclusion This system automatically identifies MDROs and accurately detects suspicious outbreaks of MDROs based on the antimicrobial susceptibility of all clinical isolates. PMID:23195868

  16. Form gene clustering method about pan-ethnic-group products based on emotional semantic

    NASA Astrophysics Data System (ADS)

    Chen, Dengkai; Ding, Jingjing; Gao, Minzhuo; Ma, Danping; Liu, Donghui

    2016-09-01

    The use of form knowledge for pan-ethnic-group products primarily depends on a designer's subjective experience, without user participation. The majority of studies focus primarily on detecting the perceptual demands of consumers for the target product category. Here, a form gene clustering method for pan-ethnic-group products based on emotional semantics is constructed. Consumers' perceptual images of the pan-ethnic-group products are obtained by means of product form gene extraction and coding and computer-aided product form clustering technology. A case study of form gene clustering for typical pan-ethnic-group products indicates that the method is feasible. This paper opens up a new direction for the future development of product form design, improving the agility of the product design process in the era of Industry 4.0.

  17. Detecting false positive sequence homology: a machine learning approach.

    PubMed

    Fujimoto, M Stanley; Suvorov, Anton; Jensen, Nicholas O; Clement, Mark J; Bybee, Seth M

    2016-02-24

    Accurate detection of homologous relationships of biological sequences (DNA or amino acid) amongst organisms is an important and often difficult task that is essential to various evolutionary studies, ranging from building phylogenies to predicting functional gene annotations. There are many existing heuristic tools, most commonly based on bidirectional BLAST searches, that are used to identify homologous genes and classify them into two fundamentally distinct classes: orthologs and paralogs. Due to only using heuristic filtering based on significance score cutoffs and having no cluster post-processing tools available, these methods can often produce multiple clusters comprising unrelated (non-homologous) sequences. Therefore, sequence data extracted from incomplete genome/transcriptome assemblies originating from low-coverage sequencing, or produced by de novo processes without a reference genome, are susceptible to high false-positive rates in homology detection. In this paper we develop biologically informative features that can be extracted from multiple sequence alignments of putative homologous genes (orthologs and paralogs) and further utilized in the context of guided experimentation to verify false positive outcomes. We demonstrate that our machine learning method, trained on both known homology clusters obtained from OrthoDB and randomly generated sequence alignments (non-homologs), successfully determines apparent false positives inferred by heuristic algorithms, especially among proteomes recovered from low-coverage RNA-seq data. Approximately 42% and 25% of the putative homologies predicted by InParanoid and HaMStR, respectively, were classified as false positives on the experimental data set. Our process increases the quality of output from other clustering algorithms by providing a novel post-processing method that is both fast and efficient at removing low-quality clusters of putative homologous genes recovered by heuristic-based approaches.

  18. Implementation of spectral clustering on microarray data of carcinoma using k-means algorithm

    NASA Astrophysics Data System (ADS)

    Frisca, Bustamam, Alhadi; Siswantining, Titin

    2017-03-01

    Clustering is a data analysis method that aims to group data with similar characteristics. Spectral clustering is one of the most popular modern clustering algorithms; as an effective clustering technique, it emerged from the concepts of spectral graph theory. Spectral clustering requires a partitioning algorithm, and several are available, including PAM, SOM, fuzzy c-means, and k-means. According to research by Capital and Choudhury in 2013, the k-means algorithm with Euclidean distance provides better accuracy than the PAM algorithm, so in this paper we use k-means as our partitioning algorithm. A major advantage of spectral clustering is dimensionality reduction, which in this case reduces the dimension of a large microarray dataset. A microarray is a small chip made of a glass plate containing thousands or even tens of thousands of kinds of genes in DNA fragments derived from doubling cDNA. Microarray data are widely used to detect cancers, for example carcinoma, in which cancer cells express abnormalities in their genes. The purpose of this research is to group data with high mutual similarity together and separate data with low similarity into other groups. The carcinoma microarray data used in this research comprise 7457 genes. Partitioning with the k-means algorithm yields two clusters.
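
    The pipeline described above, an affinity matrix, the graph Laplacian's leading eigenvectors, and k-means as the partitioning step, can be sketched as below. The data are a synthetic stand-in for expression profiles, not the 7457-gene carcinoma dataset, and the kernel width sigma is an arbitrary illustrative choice.

```python
import numpy as np
from sklearn.cluster import KMeans

def spectral_kmeans(X, n_clusters=2, sigma=1.0):
    """Spectral clustering with k-means as the partitioning step:
    Gaussian affinity -> normalized Laplacian -> leading eigenvectors -> k-means."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    deg = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(deg + 1e-12))
    L_sym = np.eye(len(X)) - D_inv_sqrt @ W @ D_inv_sqrt
    eigvals, eigvecs = np.linalg.eigh(L_sym)
    U = eigvecs[:, :n_clusters]                      # smallest eigenvectors
    U = U / (np.linalg.norm(U, axis=1, keepdims=True) + 1e-12)
    return KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(U)

# synthetic stand-in for expression profiles of two sample groups (not real
# carcinoma data): 40 samples x 500 genes with two shifted sub-populations
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (20, 500)),
               rng.normal(0.6, 1.0, (20, 500))])
labels = spectral_kmeans(X, n_clusters=2, sigma=np.sqrt(500))
print(labels)
```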

  19. A Routing Path Construction Method for Key Dissemination Messages in Sensor Networks

    PubMed Central

    Moon, Soo Young; Cho, Tae Ho

    2014-01-01

    Authentication is an important security mechanism for detecting forged messages in a sensor network. Each cluster head (CH) in dynamic key distribution schemes forwards a key dissemination message that contains encrypted authentication keys within its cluster to next-hop nodes for the purpose of authentication. The forwarding path of the key dissemination message strongly affects the number of nodes to which the authentication keys in the message are actually distributed. We propose a routing method for the key dissemination messages to increase the number of nodes that obtain the authentication keys. In the proposed method, each node selects next-hop nodes to which the key dissemination message will be forwarded based on secret key indexes, the distance to the sink node, and the energy consumption of its neighbor nodes. The experimental results show that the proposed method can increase by 50–70% the number of nodes to which authentication keys in each cluster are distributed compared to geographic and energy-aware routing (GEAR). In addition, the proposed method can detect false reports earlier by using the distributed authentication keys, and it consumes less energy than GEAR when the false traffic ratio (FTR) is ≥10%. PMID:25136649

  20. The statistical average of optical properties for alumina particle cluster in aircraft plume

    NASA Astrophysics Data System (ADS)

    Li, Jingying; Bai, Lu; Wu, Zhensen; Guo, Lixin

    2018-04-01

    We establish a model in which the monomer radius and monomer number of alumina particle clusters in the plume follow lognormal distributions. Based on Multi-Sphere T-Matrix (MSTM) theory, we provide a method for finding the statistical average of the optical properties of alumina particle clusters in the plume, analyze the effect of different distributions and different detection wavelengths on this statistical average, and compare the statistically averaged optical properties under the alumina particle cluster model established in this study with those under three simplified alumina particle models. The calculation results show that the monomer number of an alumina particle cluster and its size distribution have a considerable effect on its statistically averaged optical properties. The statistical averages of the optical properties of alumina particle clusters at common detection wavelengths exhibit obvious differences, which strongly affect the modeling of the IR and UV radiation properties of the plume. Compared with the three simplified models, the alumina particle cluster model presented here features both higher extinction and scattering efficiencies. Therefore, an accurate description of the scattering properties of alumina particles in an aircraft plume is of great significance for the study of plume radiation properties.
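
    The averaging step can be sketched independently of the MSTM computation itself: assuming the single-size optical efficiencies have already been computed by an external code (here replaced by a non-physical placeholder function), the statistical average over a lognormal monomer-radius distribution is a weighted quadrature.

```python
import numpy as np

def lognormal_pdf(r, median, sigma_g):
    """Lognormal size distribution with geometric median and geometric std."""
    s = np.log(sigma_g)
    return np.exp(-(np.log(r / median)) ** 2 / (2 * s ** 2)) / (r * s * np.sqrt(2 * np.pi))

def placeholder_extinction_efficiency(r, wavelength):
    """Stand-in for a cluster optical property computed by an external code
    (e.g., an MSTM run); NOT a physical model, values are illustrative only."""
    x = 2 * np.pi * r / wavelength
    return 2.0 * (1.0 - np.exp(-x / 3.0))

def size_averaged(property_fn, wavelength, median=0.5, sigma_g=1.6, n=2000):
    """Average an optical property over a lognormal monomer-radius distribution
    by simple numerical quadrature on a uniform radius grid (micrometres)."""
    r = np.linspace(0.01, 5.0, n)
    w = lognormal_pdf(r, median, sigma_g)
    q = property_fn(r, wavelength)
    return float(np.sum(q * w) / np.sum(w))

for wl in (0.3, 1.0, 4.0):   # illustrative UV, near-IR, mid-IR wavelengths (um)
    print(wl, round(size_averaged(placeholder_extinction_efficiency, wl), 3))
```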

  1. Locating Structural Centers: A Density-Based Clustering Method for Community Detection

    PubMed Central

    Liu, Gongshen; Li, Jianhua; Nees, Jan P.

    2017-01-01

    Uncovering underlying community structures in complex networks has received considerable attention because of its importance in understanding structural attributes and group characteristics of networks. The algorithmic identification of such structures is a significant challenge. Local expansion methods have proven efficient and effective in community detection, but most are sensitive to initial seeds and built-in parameters. In this paper, we present a local expansion method based on density-based clustering, which aims to uncover the intrinsic network communities by locating the structural centers of communities using a proposed structural centrality. The structural centrality takes into account the local density of nodes and the relative distance between nodes. The proposed algorithm expands a community from the structural center to the border with a single local search procedure. The local expansion procedure follows a heuristic strategy that allows it to find complete community structures. Moreover, it can identify different node roles (cores and outliers) in communities by defining a border region. The experiments involve both real-world and artificial networks and provide a comparative evaluation of the proposed method. The results show that the proposed method is more efficient than current state-of-the-art methods while achieving comparable clustering performance. PMID:28046030
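
    A toy sketch of the structural-centrality idea, local density combined with the distance to the nearest denser node, in the spirit of density-peak clustering, is shown below on a small synthetic graph. It is not the authors' algorithm (there is no border-region or outlier handling here), and all names and parameters are illustrative.

```python
from collections import deque

def bfs_distances(adj, source):
    """Hop distances from `source` in an undirected graph given as adjacency sets."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def density_peak_communities(adj, n_centers=2, d_c=1):
    """Toy 'structural centrality' clustering: centrality = local density x
    distance to the nearest denser node; expand labels from the top centers."""
    nodes = sorted(adj)
    dist = {u: bfs_distances(adj, u) for u in nodes}
    rho = {u: sum(1 for v in nodes
                  if v != u and dist[u].get(v, float("inf")) <= d_c) for u in nodes}
    delta, parent = {}, {}
    for u in nodes:
        denser = [v for v in nodes if rho[v] > rho[u]]
        if denser:
            parent[u] = min(denser, key=lambda v: dist[u].get(v, float("inf")))
            delta[u] = dist[u].get(parent[u], float("inf"))
        else:
            parent[u], delta[u] = u, max(dist[u].values())
    centrality = {u: rho[u] * delta[u] for u in nodes}
    centers = sorted(nodes, key=lambda u: -centrality[u])[:n_centers]
    label = {c: i for i, c in enumerate(centers)}

    def resolve(u):                      # follow parent pointers up to a center
        if u not in label:
            label[u] = resolve(parent[u]) if parent[u] != u else len(label)
        return label[u]

    return {u: resolve(u) for u in nodes}

# toy graph: two 4-cliques joined by a single bridge edge (hypothetical data)
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3),
         (4, 5), (4, 6), (4, 7), (5, 6), (5, 7), (6, 7), (3, 4)]
adj = {u: set() for u in range(8)}
for a, b in edges:
    adj[a].add(b)
    adj[b].add(a)
print(density_peak_communities(adj))
```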

  2. EVIDENCE FOR THE UNIVERSALITY OF PROPERTIES OF RED-SEQUENCE GALAXIES IN X-RAY- AND RED-SEQUENCE-SELECTED CLUSTERS AT z ∼ 1

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Foltz, R.; Wilson, G.; DeGroot, A.

    We study the slope, intercept, and scatter of the color–magnitude and color–mass relations for a sample of 10 infrared red-sequence-selected clusters at z ∼ 1. The quiescent galaxies in these clusters formed the bulk of their stars above z ≳ 3 with an age spread Δt ≳ 1 Gyr. We compare UVJ color–color and spectroscopic-based galaxy selection techniques, and find a 15% difference in the galaxy populations classified as quiescent by these methods. We compare the color–magnitude relations from our red-sequence selected sample with X-ray- and photometric-redshift-selected cluster samples of similar mass and redshift. Within uncertainties, we are unable to detect any difference in the ages and star formation histories of quiescent cluster members in clusters selected by different methods, suggesting that the dominant quenching mechanism is insensitive to cluster baryon partitioning at z ∼ 1.

  3. Efficient evaluation of sampling quality of molecular dynamics simulations by clustering of dihedral torsion angles and Sammon mapping.

    PubMed

    Frickenhaus, Stephan; Kannan, Srinivasaraghavan; Zacharias, Martin

    2009-02-01

    A direct conformational clustering and mapping approach for peptide conformations based on backbone dihedral angles has been developed and applied to compare conformational sampling of Met-enkephalin using two molecular dynamics (MD) methods. Efficient clustering in dihedrals has been achieved by evaluating all combinations resulting from independent clustering of each dihedral angle distribution, thus resolving all conformational substates. In contrast, Cartesian clustering was unable to accurately distinguish between all substates. Projection of clusters on dihedral principal component (PCA) subspaces did not result in efficient separation of highly populated clusters. However, representation in a nonlinear metric by Sammon mapping separated the 48 most highly populated clusters well in just two dimensions. In addition, this approach also allowed us to visualize the transition frequencies between clusters efficiently. Significantly higher transition frequencies between more distinct conformational substates were found for a recently developed biasing-potential replica-exchange MD simulation method, which allows faster sampling of possible substates compared with conventional MD simulations. Although the number of theoretically possible clusters grows exponentially with peptide length, in practice the number of clusters is limited only by the sampling size (typically much smaller), and therefore the method is also well suited for large systems. The approach could be useful to rapidly and accurately evaluate conformational sampling during MD simulations, to compare different sampling strategies, and eventually to detect kinetic bottlenecks in folding pathways.
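
    The combinatorial idea described above, clustering each dihedral independently and then treating every observed combination of per-angle labels as a conformational substate, can be sketched on synthetic angle data as follows; the circular nature of the angles is handled by clustering on (sin, cos) pairs, which is an illustrative choice rather than the authors' procedure.

```python
import numpy as np
from collections import Counter
from sklearn.cluster import KMeans

def per_angle_labels(angles_deg, n_states=3):
    """Cluster one dihedral-angle time series on the unit circle (sin/cos)."""
    xy = np.column_stack([np.sin(np.radians(angles_deg)),
                          np.cos(np.radians(angles_deg))])
    return KMeans(n_clusters=n_states, n_init=10, random_state=0).fit_predict(xy)

# synthetic trajectory: 4 backbone dihedrals, 2000 frames, each angle hopping
# between a few preferred rotamer-like values (hypothetical data)
rng = np.random.default_rng(0)
n_frames, n_dihedrals = 2000, 4
preferred = [-60.0, 60.0, 180.0]
angles = np.array([[rng.choice(preferred) + rng.normal(0, 10)
                    for _ in range(n_dihedrals)] for _ in range(n_frames)])

# independent clustering of each dihedral, then combine the label tuples:
# every distinct tuple is one conformational substate
labels = np.column_stack([per_angle_labels(angles[:, d]) for d in range(n_dihedrals)])
substates = Counter(map(tuple, labels))
print(len(substates), "populated substates out of", 3 ** n_dihedrals, "possible")
print(substates.most_common(3))
```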

  4. Daily Reportable Disease Spatiotemporal Cluster Detection, New York City, New York, USA, 2014-2015.

    PubMed

    Greene, Sharon K; Peterson, Eric R; Kapell, Deborah; Fine, Annie D; Kulldorff, Martin

    2016-10-01

    Each day, the New York City Department of Health and Mental Hygiene uses the free SaTScan software to apply prospective space-time permutation scan statistics to strengthen early outbreak detection for 35 reportable diseases. This method prompted early detection of outbreaks of community-acquired legionellosis and shigellosis.

  5. Machine-Learning Inspired Seismic Phase Detection for Aftershocks of the 2008 MW7.9 Wenchuan Earthquake

    NASA Astrophysics Data System (ADS)

    Zhu, L.; Li, Z.; Li, C.; Wang, B.; Chen, Z.; McClellan, J. H.; Peng, Z.

    2017-12-01

    The spatio-temporal evolution of aftershocks is important for illuminating earthquake physics and for rapid response to devastating earthquakes. To improve aftershock catalogs of the 2008 MW7.9 Wenchuan earthquake in Sichuan, China, Alibaba Cloud and the China Earthquake Administration jointly launched a seismological contest in May 2017 [Fang et al., 2017]. This abstract describes how we handled this problem in the competition. We first used the Short-Term Average/Long-Term Average (STA/LTA) and kurtosis functions to obtain over 55000 candidate phase picks (P or S). Based on the signal-to-noise ratio (SNR), about 40000 phases (P or S) were selected. So far, these 40000 phases have a hit rate of 40% among the manual picks. The causes include (1) false picks (neither P nor S) and (2) mislabeled P and S arrivals. To improve our results, we correlate the 40000 phases over continuous waveforms to obtain the phases missed during the first pass. This results in 120,000 events. After constructing an affinity matrix based on the cross-correlation of the newly detected phases, subspace clustering methods [Vidal 2011] are applied to group those phases into separate subspaces. Initial results show good agreement between empirical and clustered labels for P phases. Half of the empirical S phases are clustered into the P phase cluster. This may be a combined effect of (1) mislabeling isolated P phases as S phases and (2) clustering errors due to a small, incomplete sample pool. Phases that were falsely detected in the initial results can also be teased out. To better characterize P and S phases, our next step is to apply subspace clustering methods directly to the waveforms instead of using the cross-correlation coefficients of detected phases. After that, supervised learning, e.g., a convolutional neural network, can be employed to improve the pick accuracy. Updated results will be presented at the meeting.
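
    The STA/LTA stage of the picking workflow can be sketched as below on a synthetic trace; the window lengths and trigger threshold are illustrative defaults, not the values used in the contest pipeline.

```python
import numpy as np

def sta_lta(trace, fs, sta_win=0.5, lta_win=10.0):
    """Classic STA/LTA characteristic function on a 1-D seismic trace.

    trace : waveform samples, fs : sampling rate (Hz),
    sta_win/lta_win : short- and long-term window lengths in seconds.
    """
    energy = trace.astype(float) ** 2
    nsta, nlta = int(sta_win * fs), int(lta_win * fs)
    csum = np.concatenate([[0.0], np.cumsum(energy)])
    sta = (csum[nsta:] - csum[:-nsta]) / nsta
    lta = (csum[nlta:] - csum[:-nlta]) / nlta
    # align: STA at sample i uses the nsta samples ending at i, likewise LTA
    n = min(len(sta), len(lta))
    ratio = np.zeros(len(trace))
    ratio[nlta - 1:nlta - 1 + n] = sta[-n:] / (lta[:n] + 1e-12)
    return ratio

# synthetic trace: noise with an impulsive arrival at 60 s (hypothetical data)
fs = 100.0
rng = np.random.default_rng(0)
trace = rng.normal(0, 1.0, int(120 * fs))
onset = int(60 * fs)
trace[onset:onset + 500] += 6.0 * rng.normal(0, 1.0, 500) * np.exp(-np.arange(500) / 200)

ratio = sta_lta(trace, fs)
picks = np.flatnonzero(ratio > 4.0)
print("first trigger at %.2f s" % (picks[0] / fs) if len(picks) else "no trigger")
```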

  6. Quantitative detection of the colloidal gold immunochromatographic strip in HSV color space

    NASA Astrophysics Data System (ADS)

    Wu, Yuanshu; Gao, Yueming; Du, Min

    2014-09-01

    In this paper, a fast, reliable, and accurate quantitative detection method for the colloidal gold immunochromatographic strip (GICA) is presented. An image acquisition device, mainly composed of an annular LED source, a zoom lens, and a 10-bit CMOS image sensor with 54.5 dB SNR, is designed for the detection. First, the test line is extracted from the strip window by using the peak points of the H component in HSV space as the cluster centers for the fuzzy c-means (FCM) clustering method. Then, a two-dimensional feature composed of the hue (H) and saturation (S) of HSV space is proposed to improve the accuracy of the quantitative detection. Finally, an experiment with human chorionic gonadotropin (HCG) over the concentration range 0-500 mIU/mL is carried out. The results show that the linear correlation coefficient between this method and the optical density (OD) values measured by a fiber-optic sensor reaches 96.74%. Meanwhile, the linearity of the fitted concentration curve is greater than 95.00%.

  7. Unsupervised Learning and Pattern Recognition of Biological Data Structures with Density Functional Theory and Machine Learning.

    PubMed

    Chen, Chien-Chang; Juan, Hung-Hui; Tsai, Meng-Yuan; Lu, Henry Horng-Shing

    2018-01-11

    By introducing methods of machine learning into density functional theory, we made a detour for the construction of the most probable density function, which can be estimated by learning relevant features from the system of interest. Using the properties of the universal functional, the vital core of density functional theory, the most probable cluster number and the corresponding cluster boundaries in a system under study can be determined simultaneously and automatically, with the plausibility resting on the Hohenberg-Kohn theorems. For method validation and pragmatic applications, interdisciplinary problems from physical to biological systems were enumerated. The amalgamation of uncharged atomic clusters validated the unsupervised search for the cluster numbers, and the corresponding cluster boundaries were likewise exhibited. Highly accurate clustering results on Fisher's iris dataset showed the feasibility and flexibility of the proposed scheme. Brain tumor detection from low-dimensional magnetic resonance imaging datasets and segmentation of high-dimensional neural network imagery in the Brainbow system were also used to inspect the practicality of the method. The experimental results exhibit a successful connection between the physical theory and machine learning methods and will benefit clinical diagnosis.

  8. Three-dimensional reconstruction of clustered microcalcifications from two digitized mammograms

    NASA Astrophysics Data System (ADS)

    Stotzka, Rainer; Mueller, Tim O.; Epper, Wolfgang; Gemmeke, Hartmut

    1998-06-01

    X-ray mammography is one of the most significant diagnostic methods for early detection of breast cancer. Usually two X-ray images from different angles are taken of each mamma to make even overlapping structures visible. X-ray mammography has a very high spatial resolution and can show microcalcifications of 50-200 microns in size. Clusters of microcalcifications are one of the most important, and often the only, indicators of malignant tumors. These calcifications are in some cases extremely difficult to detect. Computer-assisted diagnosis of digitized mammograms may improve the detection and interpretation of microcalcifications and yield more reliable diagnostic findings. We built a low-cost mammography workstation to detect and classify clusters of microcalcifications and tissue densities automatically. New in this approach is the estimation of the 3D formation of segmented microcalcifications and its visualization, which puts additional diagnostic information at the radiologist's disposal. The main problem in using only two or three projections for reconstruction is the large loss of volume information. Therefore the arrangement of a cluster is estimated using only the positions of segmented microcalcifications. The arrangement of microcalcifications is visualized to the physician by rotating the estimated 3D arrangement.

  9. Cluster randomised crossover trials with binary data and unbalanced cluster sizes: application to studies of near-universal interventions in intensive care.

    PubMed

    Forbes, Andrew B; Akram, Muhammad; Pilcher, David; Cooper, Jamie; Bellomo, Rinaldo

    2015-02-01

    Cluster randomised crossover trials have been utilised in recent years in the health and social sciences. Methods for analysis have been proposed; however, for binary outcomes, these have received little assessment of their appropriateness. In addition, methods for determination of sample size are currently limited to balanced cluster sizes both between clusters and between periods within clusters. This article aims to extend this work to unbalanced situations and to evaluate the properties of a variety of methods for analysis of binary data, with a particular focus on the setting of potential trials of near-universal interventions in intensive care to reduce in-hospital mortality. We derive a formula for sample size estimation for unbalanced cluster sizes, and apply it to the intensive care setting to demonstrate the utility of the cluster crossover design. We conduct a numerical simulation of the design in the intensive care setting and for more general configurations, and we assess the performance of three cluster summary estimators and an individual-data estimator based on binomial-identity-link regression. For settings similar to the intensive care scenario involving large cluster sizes and small intra-cluster correlations, the sample size formulae developed and analysis methods investigated are found to be appropriate, with the unweighted cluster summary method performing well relative to the more optimal but more complex inverse-variance weighted method. More generally, we find that the unweighted and cluster-size-weighted summary methods perform well, with the relative efficiency of each largely determined systematically from the study design parameters. Performance of individual-data regression is adequate with small cluster sizes but becomes inefficient for large, unbalanced cluster sizes. When outcome prevalences are 6% or less and the within-cluster-within-period correlation is 0.05 or larger, all methods display sub-nominal confidence interval coverage, with the less prevalent the outcome the worse the coverage. As with all simulation studies, conclusions are limited to the configurations studied. We confined attention to detecting intervention effects on an absolute risk scale using marginal models and did not explore properties of binary random effects models. Cluster crossover designs with binary outcomes can be analysed using simple cluster summary methods, and sample size in unbalanced cluster size settings can be determined using relatively straightforward formulae. However, caution needs to be applied in situations with low prevalence outcomes and moderate to high intra-cluster correlations. © The Author(s) 2014.

  10. An automated method for finding molecular complexes in large protein interaction networks

    PubMed Central

    Bader, Gary D; Hogue, Christopher WV

    2003-01-01

    Background Recent advances in proteomics technologies such as two-hybrid, phage display and mass spectrometry have enabled us to create a detailed map of biomolecular interaction networks. Initial mapping efforts have already produced a wealth of data. As the size of the interaction set increases, databases and computational methods will be required to store, visualize and analyze the information in order to effectively aid in knowledge discovery. Results This paper describes a novel graph theoretic clustering algorithm, "Molecular Complex Detection" (MCODE), that detects densely connected regions in large protein-protein interaction networks that may represent molecular complexes. The method is based on vertex weighting by local neighborhood density and outward traversal from a locally dense seed protein to isolate the dense regions according to given parameters. The algorithm has the advantage over other graph clustering methods of having a directed mode that allows fine-tuning of clusters of interest without considering the rest of the network and allows examination of cluster interconnectivity, which is relevant for protein networks. Protein interaction and complex information from the yeast Saccharomyces cerevisiae was used for evaluation. Conclusion Dense regions of protein interaction networks can be found, based solely on connectivity data, many of which correspond to known protein complexes. The algorithm is not affected by a known high rate of false positives in data from high-throughput interaction techniques. The program is available from . PMID:12525261
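
    A much-simplified sketch of the general idea (weight vertices by local neighborhood density, then grow a cluster outward from a dense seed) is given below using networkx. It is not the published MCODE algorithm; the density weighting, the 0.5 cutoff, and the karate-club stand-in graph are assumptions for illustration only.

      import networkx as nx

      def neighborhood_density(G, v):
          """Density of the subgraph induced by v and its neighbors."""
          nodes = set(G[v]) | {v}
          sub = G.subgraph(nodes)
          n = sub.number_of_nodes()
          return 0.0 if n < 2 else 2.0 * sub.number_of_edges() / (n * (n - 1))

      def dense_region_from_seed(G, weights, seed, cutoff=0.5):
          """Grow a cluster outward from `seed`, keeping nodes whose weight is
          at least `cutoff` times the seed weight (a simplified MCODE-style rule)."""
          cluster, frontier = {seed}, set(G[seed])
          while frontier:
              v = frontier.pop()
              if v not in cluster and weights[v] >= cutoff * weights[seed]:
                  cluster.add(v)
                  frontier |= set(G[v]) - cluster
          return cluster

      G = nx.karate_club_graph()                      # stand-in for a PPI network
      w = {v: neighborhood_density(G, v) for v in G}
      seed = max(w, key=w.get)                        # most locally dense node as seed
      print("seed:", seed, "cluster:", sorted(dense_region_from_seed(G, w, seed)))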

  11. The Principle of the Micro-Electronic Neural Bridge and a Prototype System Design.

    PubMed

    Huang, Zong-Hao; Wang, Zhi-Gong; Lu, Xiao-Ying; Li, Wen-Yuan; Zhou, Yu-Xuan; Shen, Xiao-Yan; Zhao, Xin-Tai

    2016-01-01

    The micro-electronic neural bridge (MENB) aims to rebuild the lost motor function of paralyzed humans by routing movement-related signals from the brain, around the damaged part of the spinal cord, to external effectors. This study focused on the prototype system design of the MENB, including the principle of the MENB, the design of the neural signal detecting circuit and the functional electrical stimulation (FES) circuit, and the spike detecting and sorting algorithm. In this study, we developed a novel improved amplitude-threshold spike detecting method based on a variable forward-difference threshold for both the training and bridging phases. The discrete wavelet transform (DWT), a new level-wise feature coefficient selection method based on the Lilliefors test, and k-means clustering based on the Mahalanobis distance were used for spike sorting. A real-time online spike detecting and sorting algorithm based on the DWT and Euclidean distance was also implemented for the bridging phase. Tested on the data sets available from Caltech, in the training phase the average sensitivity, specificity, and clustering accuracy are 99.43%, 97.83%, and 95.45%, respectively. Validated by three-fold cross-validation, the average sensitivity, specificity, and classification accuracy are 99.43%, 97.70%, and 96.46%, respectively.
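
    The offline part of such a pipeline can be approximated with the sketch below: an adaptive forward-difference threshold flags spike candidates, DWT coefficients of each snippet serve as features, and k-means groups the snippets. It is a sketch under assumptions (threshold factor k = 5, db4 wavelet at level 2, Euclidean k-means instead of the Mahalanobis-distance variant and Lilliefors-based coefficient selection described above), not the authors' implementation.

      import numpy as np
      import pywt
      from sklearn.cluster import KMeans

      def detect_spikes(sig, k=5.0, win=32):
          """Flag samples whose forward difference exceeds an adaptive threshold
          (k times a robust noise estimate), then cut a window after each event."""
          diff = np.diff(sig)
          noise = np.median(np.abs(diff)) / 0.6745          # robust sigma estimate
          idx = np.flatnonzero(np.abs(diff) > k * noise)
          if idx.size == 0:
              return []
          idx = idx[np.insert(np.diff(idx) > win, 0, True)]  # first crossing per burst
          return [sig[i:i + win] for i in idx if i + win <= len(sig)]

      def wavelet_features(spikes, wavelet="db4", level=2):
          """Concatenate DWT coefficients of each spike snippet as its feature vector."""
          return np.array([np.concatenate(pywt.wavedec(s, wavelet, level=level))
                           for s in spikes])

      rng = np.random.default_rng(2)
      sig = rng.normal(0.0, 1.0, 20000)
      for t in rng.integers(100, 19900, 40):                 # inject synthetic spikes
          sig[t:t + 5] += np.array([0.0, 8.0, 12.0, 6.0, 2.0])

      feats = wavelet_features(detect_spikes(sig))
      if len(feats) >= 2:
          labels = KMeans(n_clusters=2, n_init=10).fit_predict(feats)
          print("spikes per cluster:", np.bincount(labels))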

  12. Searching for galaxy clusters in the Kilo-Degree Survey

    NASA Astrophysics Data System (ADS)

    Radovich, M.; Puddu, E.; Bellagamba, F.; Roncarelli, M.; Moscardini, L.; Bardelli, S.; Grado, A.; Getman, F.; Maturi, M.; Huang, Z.; Napolitano, N.; McFarland, J.; Valentijn, E.; Bilicki, M.

    2017-02-01

    Aims: In this paper, we present the tools used to search for galaxy clusters in the Kilo Degree Survey (KiDS), and our first results. Methods: The cluster detection is based on an implementation of the optimal filtering technique that enables us to identify clusters as over-densities in the distribution of galaxies using their positions on the sky, magnitudes, and photometric redshifts. The contamination and completeness of the cluster catalog are derived using mock catalogs based on the data themselves. The optimal signal to noise threshold for the cluster detection is obtained by randomizing the galaxy positions and selecting the value that produces a contamination of less than 20%. Starting from a subset of clusters detected with high significance at low redshifts, we shift them to higher redshifts to estimate the completeness as a function of redshift: the average completeness is 85%. An estimate of the mass of the clusters is derived using the richness as a proxy. Results: We obtained 1858 candidate clusters with redshift 0

  13. Source selection for cluster weak lensing measurements in the Hyper Suprime-Cam survey

    NASA Astrophysics Data System (ADS)

    Medezinski, Elinor; Oguri, Masamune; Nishizawa, Atsushi J.; Speagle, Joshua S.; Miyatake, Hironao; Umetsu, Keiichi; Leauthaud, Alexie; Murata, Ryoma; Mandelbaum, Rachel; Sifón, Cristóbal; Strauss, Michael A.; Huang, Song; Simet, Melanie; Okabe, Nobuhiro; Tanaka, Masayuki; Komiyama, Yutaka

    2018-03-01

    We present optimized source galaxy selection schemes for measuring cluster weak lensing (WL) mass profiles unaffected by cluster member dilution from the Subaru Hyper Suprime-Cam Strategic Survey Program (HSC-SSP). The ongoing HSC-SSP survey will uncover thousands of galaxy clusters to z ≲ 1.5. In deriving cluster masses via WL, a critical source of systematics is contamination and dilution of the lensing signal by cluster members, and by foreground galaxies whose photometric redshifts are biased. Using the first-year CAMIRA catalog of ˜900 clusters with richness larger than 20 found in ˜140 deg² of HSC-SSP data, we devise and compare several source selection methods, including selection in color-color space (CC-cut), and selection of robust photometric redshifts by applying constraints on their cumulative probability distribution function (P-cut). We examine the dependence of the contamination on the chosen limits adopted for each method. Using the proper limits, these methods give mass profiles with minimal dilution in agreement with one another. We find that not adopting either the CC-cut or P-cut methods results in an underestimation of the total cluster mass (13% ± 4%) and the concentration of the profile (24% ± 11%). The level of cluster contamination can reach as high as ˜10% at R ≈ 0.24 Mpc/h for low-z clusters without cuts, while employing either the P-cut or CC-cut results in cluster contamination consistent with zero to within the 0.5% uncertainties. Our robust methods yield a ˜60 σ detection of the stacked CAMIRA surface mass density profile, with a mean mass of M200c = [1.67 ± 0.05(stat)] × 10¹⁴ M⊙/h.

  14. Community detection for fluorescent lifetime microscopy image segmentation

    NASA Astrophysics Data System (ADS)

    Hu, Dandan; Sarder, Pinaki; Ronhovde, Peter; Achilefu, Samuel; Nussinov, Zohar

    2014-03-01

    A multiresolution community detection (CD) method was suggested in a recent work as an efficient approach for performing unsupervised segmentation of fluorescence lifetime (FLT) images of live cells containing fluorescent molecular probes [1]. In the current paper, we further explore this method on FLT images of ex vivo tissue slices. The image processing problem is framed as identifying clusters with respective average FLTs against a background or "solvent" in FLT imaging microscopy (FLIM) images derived using NIR fluorescent dyes. We have identified significant multiresolution structures using replica correlations in these images, where such correlations are manifested by information-theoretic overlaps of the independent solutions ("replicas") attained using the multiresolution CD method from different starting points. In this paper, our method is found to be more efficient than a current state-of-the-art image segmentation method based on a mixture of Gaussian distributions. It offers more than 1.25 times the diversity, based on the Shannon index, of the latter method in selecting clusters with distinct average FLTs in NIR FLIM images.

  15. Network Modeling and Energy-Efficiency Optimization for Advanced Machine-to-Machine Sensor Networks

    PubMed Central

    Jung, Sungmo; Kim, Jong Hyun; Kim, Seoksoo

    2012-01-01

    Wireless machine-to-machine sensor networks with multiple radio interfaces are expected to have several advantages, including high spatial scalability, low event detection latency, and low energy consumption. Here, we propose a network model design method involving network approximation and an optimized multi-tiered clustering algorithm that maximizes node lifespan by minimizing energy consumption in a non-uniformly distributed network. Simulation results show that the cluster scales and network parameters determined with the proposed method facilitate a more efficient performance compared to existing methods. PMID:23202190

  16. Community detection in complex networks using proximate support vector clustering

    NASA Astrophysics Data System (ADS)

    Wang, Feifan; Zhang, Baihai; Chai, Senchun; Xia, Yuanqing

    2018-03-01

    Community structure, one of the most attention-attracting properties of complex networks, has been a cornerstone in the advances of various scientific branches. A number of tools have been involved in recent studies concentrating on community detection algorithms. In this paper, we propose a support vector clustering method based on a proximity graph, owing to which the introduced algorithm surpasses the traditional support vector approach in both accuracy and complexity. Results of extensive experiments undertaken on computer-generated networks and real-world data sets illustrate competitive performance in comparison with other counterparts.

  17. Network clustering and community detection using modulus of families of loops.

    PubMed

    Shakeri, Heman; Poggi-Corradini, Pietro; Albin, Nathan; Scoglio, Caterina

    2017-01-01

    We study the structure of loops in networks using the notion of modulus of loop families. We introduce an alternate measure of network clustering by quantifying the richness of families of (simple) loops. Modulus tries to minimize the expected overlap among loops by spreading the expected link usage optimally. We propose weighting networks using these expected link usages to improve classical community detection algorithms. We show that the proposed method enhances the performance of certain algorithms, such as spectral partitioning and modularity maximization heuristics, on standard benchmarks.

  18. An approach to online network monitoring using clustered patterns

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kim, Jinoh; Sim, Alex; Suh, Sang C.

    Network traffic monitoring is a core element in network operations and management for various purposes such as anomaly detection, change detection, and fault/failure detection. In this study, we introduce a new approach to online monitoring using a pattern-based representation of the network traffic. Unlike the past online techniques limited to a single variable to summarize (e.g., sketch), the focus of this study is on capturing the network state from the multivariate attributes under consideration. To this end, we employ clustering with its benefit of the aggregation of multidimensional variables. The clustered result represents the state of the network with regard to the monitored variables, which can also be compared with the previously observed patterns visually and quantitatively. Finally, we demonstrate the proposed method with two popular use cases, one for estimating state changes and the other for identifying anomalous states, to confirm its feasibility.

  20. A detailed comparison of analysis processes for MCC-IMS data in disease classification—Automated methods can replace manual peak annotations

    PubMed Central

    Horsch, Salome; Kopczynski, Dominik; Kuthe, Elias; Baumbach, Jörg Ingo; Rahmann, Sven

    2017-01-01

    Motivation Disease classification from molecular measurements typically requires an analysis pipeline from raw noisy measurements to final classification results. Multi-capillary column ion mobility spectrometry (MCC-IMS) is a promising technology for the detection of volatile organic compounds in the air of exhaled breath. From raw measurements, the peak regions representing the compounds have to be identified, quantified, and clustered across different experiments. Currently, several steps of this analysis process require manual intervention by human experts. Our goal is to identify a fully automatic pipeline that yields competitive disease classification results compared to an established but subjective and tedious semi-manual process. Method We combine a large number of modern methods for peak detection, peak clustering, and multivariate classification into analysis pipelines for raw MCC-IMS data. We evaluate all combinations on three different real datasets in an unbiased cross-validation setting. We determine which specific algorithmic combinations lead to high AUC values in disease classification across the different medical application scenarios. Results The best fully automated analysis process achieves even better classification results than the established manual process. The best algorithms for the three analysis steps are (i) SGLTR (Savitzky-Golay Laplace-operator filter thresholding regions) and LM (Local Maxima) for automated peak identification, (ii) EM clustering (Expectation Maximization) and DBSCAN (Density-Based Spatial Clustering of Applications with Noise) for the clustering step and (iii) RF (Random Forest) for multivariate classification. Thus, automated methods can replace the manual steps in the analysis process to enable an unbiased, high-throughput use of the technology. PMID:28910313
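
    A toy version of the clustering-plus-classification tail of such a pipeline is sketched below with scikit-learn: EM clustering (GaussianMixture) groups peak coordinates across measurements, and a Random Forest is cross-validated on per-measurement cluster intensities. The peak coordinates, intensities, and labels are synthetic stand-ins, and the raw peak detection step (SGLTR/LM) is omitted.

      import numpy as np
      from sklearn.mixture import GaussianMixture
      from sklearn.ensemble import RandomForestClassifier
      from sklearn.model_selection import cross_val_score

      rng = np.random.default_rng(3)

      # hypothetical peak list: (retention time, drift time) coordinates pooled over
      # all measurements, to be grouped into peak clusters shared across experiments
      peaks = np.vstack([rng.normal(loc, 0.02, size=(60, 2))
                         for loc in [(0.2, 0.5), (0.4, 0.7), (0.8, 0.3)]])
      gmm = GaussianMixture(n_components=3, random_state=0).fit(peaks)
      print("peaks per cluster:", np.bincount(gmm.predict(peaks)))

      # hypothetical per-measurement feature matrix: intensity of each peak cluster
      X = rng.normal(size=(40, 3))
      X[:20, 0] += 1.5                      # pretend cluster 0 is disease-associated
      y = np.array([1] * 20 + [0] * 20)     # 1 = patient, 0 = control

      clf = RandomForestClassifier(n_estimators=200, random_state=0)
      print("cross-validated AUC:",
            cross_val_score(clf, X, y, cv=5, scoring="roc_auc").mean())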

  1. Bearing performance degradation assessment based on a combination of empirical mode decomposition and k-medoids clustering

    NASA Astrophysics Data System (ADS)

    Rai, Akhand; Upadhyay, S. H.

    2017-09-01

    Bearings are the most critical components in rotating machinery since they are highly susceptible to failure. Monitoring the degradation of bearings is therefore of great concern for averting sudden machinery breakdown. In this study, a novel method for bearing performance degradation assessment (PDA) based on an amalgamation of empirical mode decomposition (EMD) and k-medoids clustering is proposed. The fault features are extracted from the bearing signals using the EMD process. The extracted features are then subjected to k-medoids-based clustering to obtain the normal-state and failure-state cluster centres. A confidence value (CV) curve based on the dissimilarity of the test data object to the normal state is obtained and employed as the degradation indicator for assessing the health of bearings. The proposed approach is applied to vibration signals collected in run-to-failure tests of bearings to assess its effectiveness for bearing PDA. To validate the superiority of the suggested approach, it is compared with the commonly used time-domain features RMS and kurtosis, the well-known fault diagnosis method of envelope analysis (EA), and existing PDA classifiers, i.e. self-organizing maps (SOM) and fuzzy c-means (FCM). The results demonstrate that the recommended method outperforms the time-domain features and the SOM- and FCM-based PDA in detecting early-stage degradation more precisely. Moreover, EA can be used as an accompanying method to confirm the early-stage defect detected by the proposed bearing PDA approach. The study shows the potential of k-medoids clustering as an effective tool for the PDA of bearings.
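
    The degradation-indicator idea can be illustrated with a small sketch: compute the medoid of normal-state feature vectors and turn the distance of each test vector to that medoid into a confidence value. The feature vectors are random stand-ins for EMD-derived features, and the exponential CV mapping and its scale are assumptions, not the formula used in the paper.

      import numpy as np

      def medoid(X):
          """Return the medoid of the rows of X: the row minimizing the total
          Euclidean distance to all other rows (the centre used by k-medoids)."""
          d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
          return X[np.argmin(d.sum(axis=1))]

      # hypothetical feature vectors per inspection window (e.g. energies of the
      # first few IMFs from EMD); random numbers stand in for real features
      rng = np.random.default_rng(4)
      normal = rng.normal(0.0, 1.0, size=(100, 4))           # healthy baseline
      test = rng.normal(0.0, 1.0, size=(300, 4))
      test[200:] += np.linspace(0, 4, 100)[:, None]           # simulated degradation

      m = medoid(normal)
      scale = np.median(np.linalg.norm(normal - m, axis=1))
      cv = np.exp(-np.linalg.norm(test - m, axis=1) / scale)  # 1 = healthy, -> 0 = degraded
      print("CV at start:", cv[:3].round(2), "CV at end:", cv[-3:].round(2))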

  2. A segmentation method for lung nodule image sequences based on superpixels and density-based spatial clustering of applications with noise

    PubMed Central

    Zhang, Wei; Zhang, Xiaolong; Qiang, Yan; Tian, Qi; Tang, Xiaoxian

    2017-01-01

    The fast and accurate segmentation of lung nodule image sequences is the basis of subsequent processing and diagnostic analyses. However, previously investigated nodule segmentation algorithms cannot entirely segment cavitary nodules, and their segmentation of juxta-vascular nodules is inaccurate and inefficient. To solve these problems, we propose a new method for the segmentation of lung nodule image sequences based on superpixels and density-based spatial clustering of applications with noise (DBSCAN). First, our method uses three-dimensional computed tomography image features of the average intensity projection combined with multi-scale dot enhancement for preprocessing. Hexagonal clustering and morphologically optimized sequential linear iterative clustering (HMSLIC) for sequence image oversegmentation is then proposed to obtain superpixel blocks. An adaptive weight coefficient is then constructed to calculate the distance required between superpixels to achieve precise lung nodule positioning and to obtain the subsequent clustering starting block. Moreover, by fitting the distance and detecting the change in slope, an accurate clustering threshold is obtained. Thereafter, a fast DBSCAN superpixel sequence clustering algorithm, optimized by the strategy of clustering only the lung nodules and an adaptive threshold, is used to obtain lung nodule mask sequences. Finally, the lung nodule image sequences are obtained. The experimental results show that our method rapidly, completely and accurately segments various types of lung nodule image sequences. PMID:28880916
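
    A minimal sketch of the final clustering step is given below: superpixel descriptors (position plus mean intensity) are weighted and fed to scikit-learn's DBSCAN so that a compact bright region separates from scattered background blocks. The descriptors, the weighting of the intensity term, and the eps/min_samples values are assumptions; the HMSLIC oversegmentation and adaptive thresholding described above are not reproduced.

      import numpy as np
      from sklearn.cluster import DBSCAN

      # hypothetical superpixel descriptors: (row, col, mean_intensity) per block
      rng = np.random.default_rng(5)
      background = np.column_stack([rng.uniform(0, 512, (400, 2)),
                                    rng.normal(0.2, 0.05, 400)])
      nodule = np.column_stack([rng.normal(256, 4, (25, 2)),
                                rng.normal(0.8, 0.02, 25)])
      feats = np.vstack([background, nodule])

      # weight the intensity term so that both position and brightness contribute
      # to the distance (the weight of 200 is an assumption, not the paper's value)
      feats_scaled = feats * np.array([1.0, 1.0, 200.0])
      labels = DBSCAN(eps=15.0, min_samples=5).fit_predict(feats_scaled)
      print("clusters found:", sorted(set(labels) - {-1}),
            "noise points:", int((labels == -1).sum()))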

  3. Semantic distance-based creation of clusters of pharmacovigilance terms and their evaluation.

    PubMed

    Dupuch, Marie; Grabar, Natalia

    2015-04-01

    Pharmacovigilance is the activity related to the collection, analysis and prevention of adverse drug reactions (ADRs) induced by drugs or biologics. The detection of adverse drug reactions is performed using statistical algorithms and groupings of ADR terms from the MedDRA (Medical Dictionary for Regulatory Activities) terminology. Standardized MedDRA Queries (SMQs) are groupings which have become a standard for assisting the retrieval and evaluation of MedDRA-coded ADR reports worldwide. Currently 84 SMQs have been created, while several important safety topics are not yet covered. Creation of SMQs is a long and tedious process performed by experts. It relies on manual analysis of MedDRA in order to find all the relevant terms to be included in an SMQ. Our objective is to propose an automatic method for assisting the creation of SMQs using the clustering of terms which are semantically similar. The experimental method relies on a specific semantic resource, and also on semantic distance algorithms and clustering approaches. We perform several experiments in order to define the optimal parameters. Our results show that the proposed method can assist the creation of SMQs and make this process faster and more systematic. The average performance of the method is 59% precision and 26% recall. The correlation of the obtained results is 0.72 against the medical doctors' judgments and 0.78 against the medical coders' judgments. These results and additional evaluation indicate that the generated clusters can be used efficiently for the detection of pharmacovigilance signals, as they provide better signal detection than the existing SMQs. Copyright © 2014. Published by Elsevier Inc.
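
    The clustering step can be sketched with SciPy's agglomerative tools: given a term-by-term semantic distance matrix, average-linkage clustering is built and the dendrogram is cut at a fixed height to produce candidate groupings. The terms, the toy distance values, and the 0.5 cut height are assumptions; the paper's semantic resource and distance algorithms are not reproduced.

      import numpy as np
      from scipy.cluster.hierarchy import linkage, fcluster
      from scipy.spatial.distance import squareform

      # hypothetical ADR terms and a symmetric semantic distance matrix (toy values)
      terms = ["Hepatitis", "Hepatic failure", "Jaundice", "Rash", "Pruritus"]
      D = np.array([
          [0.0, 0.2, 0.3, 0.9, 0.9],
          [0.2, 0.0, 0.4, 0.9, 0.9],
          [0.3, 0.4, 0.0, 0.8, 0.9],
          [0.9, 0.9, 0.8, 0.0, 0.1],
          [0.9, 0.9, 0.9, 0.1, 0.0],
      ])
      Z = linkage(squareform(D), method="average")       # agglomerative clustering
      labels = fcluster(Z, t=0.5, criterion="distance")  # cut the dendrogram at 0.5
      for cluster_id in sorted(set(labels)):
          print(cluster_id, [t for t, l in zip(terms, labels) if l == cluster_id])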

  4. Pattern Activity Clustering and Evaluation (PACE)

    NASA Astrophysics Data System (ADS)

    Blasch, Erik; Banas, Christopher; Paul, Michael; Bussjager, Becky; Seetharaman, Guna

    2012-06-01

    With the vast amount of network information available on the activities of people (i.e. motions, transportation routes, and site visits), there is a need to explore the salient properties of data that detect and discriminate the behavior of individuals. Recent machine learning approaches include methods of data mining, statistical analysis, clustering, and estimation that support activity-based intelligence. We seek to explore contemporary methods in activity analysis using machine learning techniques that discover and characterize behaviors, enabling grouping, anomaly detection, and adversarial intent prediction. To evaluate these methods, we describe the mathematics and potential information-theoretic metrics for characterizing behavior. A scenario is presented to demonstrate the concept and metrics that could be useful for layered-sensing behavior pattern learning and analysis. We leverage work on group tracking, learning, and clustering approaches, as well as information-theoretic metrics for classification, behavioral and event pattern recognition, and activity and entity analysis. The performance evaluation of activity analysis supports high-level information fusion of user alerts, data queries and sensor management for data extraction, relation discovery, and situation analysis of existing data.

  5. Whole-Genome Sequencing of Recent Listeria monocytogenes Isolates from Germany Reveals Population Structure and Disease Clusters.

    PubMed

    Halbedel, Sven; Prager, Rita; Fuchs, Stephan; Trost, Eva; Werner, Guido; Flieger, Antje

    2018-06-01

    Listeria monocytogenes causes foodborne outbreaks with high mortality. For improvement of outbreak cluster detection, the German consiliary laboratory for listeriosis implemented whole-genome sequencing (WGS) in 2015. A total of 424 human L. monocytogenes isolates collected in 2007 to 2017 were subjected to WGS and core-genome multilocus sequence typing (cgMLST). cgMLST grouped the isolates into 38 complexes, reflecting 4 known and 34 unknown disease clusters. Most of these complexes were confirmed by single nucleotide polymorphism (SNP) calling, but some were further differentiated. Interestingly, several cgMLST cluster types were further subtyped by pulsed-field gel electrophoresis, partly due to phage insertions in the accessory genome. Our results highlight the usefulness of cgMLST for routine cluster detection but also show that cgMLST complexes require validation by methods providing higher typing resolution. Twelve cgMLST clusters included recent cases, suggesting activity of the source. Therefore, the cgMLST nomenclature data presented here may support future public health actions. Copyright © 2018 American Society for Microbiology.

  6. Identifying sighting clusters of endangered taxa with historical records.

    PubMed

    Duffy, Karl J

    2011-04-01

    The probability and time of extinction of taxa is often inferred from statistical analyses of historical records. Many of these analyses require the exclusion of multiple records within a unit of time (i.e., a month or a year). Nevertheless, spatially explicit, temporally aggregated data may be useful for identifying clusters of sightings (i.e., sighting clusters) in space and time. Identification of sighting clusters highlights changes in the historical recording of endangered taxa. I used two methods to identify sighting clusters in historical records: the Ederer-Myers-Mantel (EMM) test and the space-time permutation scan (STPS). I applied these methods to the spatially explicit sighting records of three species of orchids that are listed as endangered in the Republic of Ireland under the Wildlife Act (1976): Cephalanthera longifolia, Hammarbya paludosa, and Pseudorchis albida. Results with the EMM test were strongly affected by the choice of the time interval, and thus the number of temporal samples, used to examine the records. For example, sightings of P. albida clustered when the records were partitioned into 20-year temporal samples, but not when they were partitioned into 22-year temporal samples. Because the statistical power of EMM was low, it will not be useful when data are sparse. Nevertheless, the STPS identified regions that contained sighting clusters because it uses a flexible scanning window (defined by cylinders of varying size that move over the study area and evaluate the likelihood of clustering) to detect them, and it identified regions with high and regions with low rates of orchid sightings. The STPS analyses can be used to detect sighting clusters of endangered species that may be related to regions of extirpation and may assist in the categorization of threat status. ©2010 Society for Conservation Biology.

  7. MRI-alone radiation therapy planning for prostate cancer: Automatic fiducial marker detection.

    PubMed

    Ghose, Soumya; Mitra, Jhimli; Rivest-Hénault, David; Fazlollahi, Amir; Stanwell, Peter; Pichler, Peter; Sun, Jidi; Fripp, Jurgen; Greer, Peter B; Dowling, Jason A

    2016-05-01

    The feasibility of radiation therapy treatment planning using substitute computed tomography (sCT) generated from magnetic resonance images (MRIs) has been demonstrated by a number of research groups. One challenge with an MRI-alone workflow is the accurate identification of intraprostatic gold fiducial markers, which are frequently used for prostate localization prior to each dose delivery fraction. This paper investigates a template-matching approach for the detection of these seeds in MRI. Two different gradient echo T1 and T2* weighted MRI sequences were acquired from fifteen prostate cancer patients and evaluated for seed detection. For training, seed templates from manual contours were selected in a spectral clustering manifold learning framework. This aids in clustering "similar" gold fiducial markers together. The marker with the minimum distance to a cluster centroid was selected as the representative template of that cluster during training. During testing, Gaussian mixture modeling followed by a Markovian model was used for the automatic detection of probable candidates. The probable candidates were rigidly registered to the templates identified from spectral clustering, and a similarity metric was computed for ranking and detection. A fiducial detection accuracy of 95% was obtained compared to manual observations. Expert radiation therapist observers were able to correctly identify all three implanted seeds on 11 of the 15 scans (the proposed method correctly identified all seeds on 10 of the 15). A novel automatic framework for gold fiducial marker detection in MRI is proposed and evaluated, with detection accuracies comparable to manual detection. When radiation therapists are unable to determine the seed location in MRI, they refer back to the planning CT (only available in the existing clinical framework); similarly, an automatic quality control is built into the software to ensure that all gold seeds are either correctly detected or a warning is raised for further manual intervention.

  8. BP network identification technology of infrared polarization based on fuzzy c-means clustering

    NASA Astrophysics Data System (ADS)

    Zeng, Haifang; Gu, Guohua; He, Weiji; Chen, Qian; Yang, Wei

    2011-08-01

    Infrared detection systems are frequently employed in surveillance and reconnaissance missions to detect particular targets of interest in both civilian and military communities. By incorporating the polarization of light as supplementary information, target discrimination performance can be enhanced. This paper therefore proposes an infrared target identification method based on fuzzy theory and a neural network using the polarization properties of targets. The method uses polarization degree and light intensity to improve the unsupervised KFCM (kernel fuzzy C-means) clustering method and establishes a database of the polarization properties of different materials. Given any input polarization degree, such as 10°, 15°, 20°, 25°, or 30°, the trained network outputs the corresponding probability distribution over material types. KFCM, which has stronger robustness and accuracy than FCM, introduces the kernel idea and assigns noise points and invalid values different but intuitively reasonable weights. Because of differences in the characterization of material properties, there will be some conflicts in the classification results, and Dempster-Shafer (D-S) evidence theory is used to combine the polarization and intensity information. The results show that the clustering precision and operation rate of KFCM are higher than those of the FCM clustering method. The artificial neural network method realizes material identification, which reasonably addresses the complexity of infrared polarization environmental information and the inadequacy of background knowledge and inference rules. This polarization identification method is fast, self-adaptive and high in resolution.

  9. Genetic diversity of K-antigen gene clusters of Escherichia coli and their molecular typing using a suspension array.

    PubMed

    Yang, Shuang; Xi, Daoyi; Jing, Fuyi; Kong, Deju; Wu, Junli; Feng, Lu; Cao, Boyang; Wang, Lei

    2018-04-01

    Capsular polysaccharides (CPSs), or K-antigens, are the major surface antigens of Escherichia coli. More than 80 serologically unique K-antigens are classified into 4 groups (Groups 1-4) of capsules. Groups 1 and 4 contain the Wzy-dependent polymerization pathway and the gene clusters are in the order galF to gnd; Groups 2 and 3 contain the ABC-transporter-dependent pathway and the gene clusters consist of 3 regions, regions 1, 2 and 3. Little is known about the variations among the gene clusters. In this study, 9 serotypes of K-antigen gene clusters (K2ab, K11, K20, K24, K38, K84, K92, K96, and K102) were sequenced and correlated with their CPS chemical structures. On the basis of sequence data, a K-antigen-specific suspension array that detects 10 distinct CPSs, including the above 9 CPSs plus K30, was developed. This is the first report to catalog the genetic features of E. coli K-antigen variations and to develop a suspension array for their molecular typing. The method has a number of advantages over traditional bacteriophage and serum agglutination methods and lays the foundation for straightforward identification and detection of additional K-antigens in the future.

  10. Direct Standard-Free Quantitation of Tamiflu® and Other Pharmaceutical Tablets using Clustering Agents with Electrospray Ionization Mass Spectrometry

    PubMed Central

    Flick, Tawnya G.; Leib, Ryan D.; Williams, Evan R.

    2010-01-01

    Accurate and rapid quantitation is advantageous to identify counterfeit and substandard pharmaceutical drugs. A standard-free electrospray ionization mass spectrometry method is used to directly determine the dosage in the prescription and over-the-counter drugs, Tamiflu®, Sudafed®, and Dramamine®. A tablet of each drug was dissolved in aqueous solution, filtered, and introduced into solutions containing a known concentration of either L-tryptophan, L-phenylalanine or prednisone as clustering agents. The active ingredient(s) incorporates statistically into large clusters of the clustering agent where effects of differential ionization/detection are substantially reduced. From the abundances of large clusters, the dosages of the active ingredients in each of the tablets were determined to typically better than 20% accuracy even when the ionization/detection efficiency of the individual components differed by over 100×. Although this unorthodox method for quantitation is not as accurate as using conventional standards, it has the advantages that it is fast, it can be applied to mixtures where the identities of the analytes are unknown, and it can be used when suitable standards may not be readily available, such as schedule I or II controlled substances or new designer drugs that have not previously been identified. PMID:20092258

  11. Comparative analysis on the selection of number of clusters in community detection

    NASA Astrophysics Data System (ADS)

    Kawamoto, Tatsuro; Kabashima, Yoshiyuki

    2018-02-01

    We conduct a comparative analysis of various estimates of the number of clusters in community detection. An exhaustive comparison requires testing all possible combinations of frameworks, algorithms, and assessment criteria. In this paper we focus on the framework based on the stochastic block model, and investigate the performance of greedy algorithms, statistical inference, and spectral methods. For the assessment criteria, we consider modularity, the map equation, the Bethe free energy, prediction errors, and isolated eigenvalues. From the analysis, the tendency of the assessment criteria and algorithms to overfit or underfit becomes apparent. In addition, we propose that the alluvial diagram is a suitable tool to visualize statistical inference results and can be useful for determining the number of clusters.

  12. A novel unsupervised spike sorting algorithm for intracranial EEG.

    PubMed

    Yadav, R; Shah, A K; Loeb, J A; Swamy, M N S; Agarwal, R

    2011-01-01

    This paper presents a novel, unsupervised spike classification algorithm for intracranial EEG. The method combines template matching and principal component analysis (PCA) for building a dynamic patient-specific codebook without a priori knowledge of the spike waveforms. The problem of misclassification due to overlapping classes is resolved by identifying similar classes in the codebook using hierarchical clustering. Cluster quality is visually assessed by projecting inter- and intra-clusters onto a 3D plot. Intracranial EEG from 5 patients was utilized to optimize the algorithm. The resulting codebook retains 82.1% of the detected spikes in non-overlapping and disjoint clusters. Initial results suggest a definite role of this method for both rapid review and quantitation of interictal spikes that could enhance both clinical treatment and research studies on epileptic patients.

  13. Clustering Multiple Sclerosis Subgroups with Multifractal Methods and Self-Organizing Map Algorithm

    NASA Astrophysics Data System (ADS)

    Karaca, Yeliz; Cattani, Carlo

    Magnetic resonance imaging (MRI) is the most sensitive method for detecting chronic nervous system diseases such as multiple sclerosis (MS). In this paper, multifractal methods based on 2D Brownian motion Hölder regularity functions (polynomial, periodic (sine), and exponential) were applied to MR brain images, aiming to easily identify distressed regions in MS patients. With these regions, we propose an MS classification based on the multifractal method using the Self-Organizing Map (SOM) algorithm. Thus, we obtained a cluster analysis by identifying pixels from distressed regions in MR images through multifractal methods and by diagnosing subgroups of MS patients through artificial neural networks.
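
    A minimal self-contained SOM, sketched below in NumPy, shows how feature vectors (for example, multifractal descriptors per region) can be mapped onto a small grid of nodes and grouped by best-matching unit. The grid size, learning schedule, and synthetic two-group data are assumptions for illustration, not the configuration used in the paper.

      import numpy as np

      def train_som(X, grid=(4, 4), n_iter=2000, lr0=0.5, sigma0=1.5, seed=0):
          """Train a tiny Self-Organizing Map; returns weights of shape grid + (d,)."""
          rng = np.random.default_rng(seed)
          d = X.shape[1]
          W = rng.normal(size=grid + (d,))
          gy, gx = np.meshgrid(np.arange(grid[0]), np.arange(grid[1]), indexing="ij")
          for t in range(n_iter):
              x = X[rng.integers(len(X))]
              dist = np.linalg.norm(W - x, axis=-1)
              by, bx = np.unravel_index(np.argmin(dist), grid)   # best matching unit
              lr = lr0 * np.exp(-t / n_iter)                     # decaying learning rate
              sigma = sigma0 * np.exp(-t / n_iter)               # shrinking neighborhood
              h = np.exp(-((gy - by) ** 2 + (gx - bx) ** 2) / (2 * sigma ** 2))
              W += lr * h[..., None] * (x - W)                   # pull neighborhood toward x
          return W

      def bmu_labels(X, W):
          """Map each sample to the flat index of its best matching unit."""
          d = np.linalg.norm(W[None, ...] - X[:, None, None, :], axis=-1)
          return d.reshape(len(X), -1).argmin(axis=1)

      # hypothetical multifractal feature vectors from two kinds of regions
      rng = np.random.default_rng(6)
      X = np.vstack([rng.normal(0.0, 0.3, (50, 3)), rng.normal(2.0, 0.3, (50, 3))])
      W = train_som(X)
      print("samples per SOM node:", np.bincount(bmu_labels(X, W), minlength=16))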

  14. New data clustering for RBF classifier of agriculture products from x-ray images

    NASA Astrophysics Data System (ADS)

    Casasent, David P.; Chen, Xuewen

    1999-08-01

    Classification of real-time X-ray images of randomly oriented, touching pistachio nuts is discussed. The ultimate objective is the development of a subsystem for automated, non-invasive detection of defective product items on a conveyor belt. We discuss the use of clustering and how it is vital to achieving useful classification. New clustering methods using class identity and new cluster classes are advanced and shown to be of use for this application. Radial basis function neural net classifiers are emphasized. We expect our results to be of use for other classifiers and applications.

  15. Temporal Data-Driven Sleep Scheduling and Spatial Data-Driven Anomaly Detection for Clustered Wireless Sensor Networks

    PubMed Central

    Li, Gang; He, Bin; Huang, Hongwei; Tang, Limin

    2016-01-01

    The spatial–temporal correlation is an important feature of sensor data in wireless sensor networks (WSNs). Most of the existing works based on the spatial–temporal correlation can be divided into two parts: redundancy reduction and anomaly detection. These two parts are pursued separately in existing works. In this work, the combination of temporal data-driven sleep scheduling (TDSS) and spatial data-driven anomaly detection is proposed, where TDSS can reduce data redundancy. The TDSS model is inspired by transmission control protocol (TCP) congestion control. Based on long and linear cluster structure in the tunnel monitoring system, cooperative TDSS and spatial data-driven anomaly detection are then proposed. To realize synchronous acquisition in the same ring for analyzing the situation of every ring, TDSS is implemented in a cooperative way in the cluster. To keep the precision of sensor data, spatial data-driven anomaly detection based on the spatial correlation and Kriging method is realized to generate an anomaly indicator. The experiment results show that cooperative TDSS can realize non-uniform sensing effectively to reduce the energy consumption. In addition, spatial data-driven anomaly detection is quite significant for maintaining and improving the precision of sensor data. PMID:27690035

  16. Least squares regression methods for clustered ROC data with discrete covariates.

    PubMed

    Tang, Liansheng Larry; Zhang, Wei; Li, Qizhai; Ye, Xuan; Chan, Leighton

    2016-07-01

    The receiver operating characteristic (ROC) curve is a popular tool to evaluate and compare the accuracy of diagnostic tests to distinguish the diseased group from the nondiseased group when test results from tests are continuous or ordinal. A complicated data setting occurs when multiple tests are measured on abnormal and normal locations from the same subject and the measurements are clustered within the subject. Although least squares regression methods can be used for the estimation of ROC curve from correlated data, how to develop the least squares methods to estimate the ROC curve from the clustered data has not been studied. Also, the statistical properties of the least squares methods under the clustering setting are unknown. In this article, we develop the least squares ROC methods to allow the baseline and link functions to differ, and more importantly, to accommodate clustered data with discrete covariates. The methods can generate smooth ROC curves that satisfy the inherent continuous property of the true underlying curve. The least squares methods are shown to be more efficient than the existing nonparametric ROC methods under appropriate model assumptions in simulation studies. We apply the methods to a real example in the detection of glaucomatous deterioration. We also derive the asymptotic properties of the proposed methods. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  17. Exploring the galaxy cluster-group transition regime at high redshifts. Physical properties of two newly detected z > 1 systems

    NASA Astrophysics Data System (ADS)

    Šuhada, R.; Fassbender, R.; Nastasi, A.; Böhringer, H.; de Hoon, A.; Pierini, D.; Santos, J. S.; Rosati, P.; Mühlegger, M.; Quintana, H.; Schwope, A. D.; Lamer, G.; Kohnert, J.; Pratt, G. W.

    2011-06-01

    Context. Multi-wavelength surveys for clusters of galaxies are opening a window on the elusive high-redshift (z > 1) cluster population. Well-controlled statistical samples of distant clusters will enable us to answer questions about their cosmological context, early assembly phases and the thermodynamical evolution of the intracluster medium. Aims: We report on the detection of two z > 1 systems, XMMU J0302.2-0001 and XMMU J1532.2-0836, as part of the XMM-Newton Distant Cluster Project (XDCP) sample. We investigate the nature of the sources, measure their spectroscopic redshifts and determine their basic physical parameters. Methods: The results of the present paper are based on the analysis of XMM-Newton archival data, optical/near-infrared imaging and deep optical follow-up spectroscopy of the clusters. Results: We confirm the X-ray source XMMU J0302.2-0001 as a gravitationally bound, bona fide cluster of galaxies at spectroscopic redshift z = 1.185. We estimate its M500 mass to be (1.6 ± 0.3) × 10¹⁴ M⊙ from its measured X-ray luminosity. This ranks the cluster among intermediate-mass systems. In the case of XMMU J1532.2-0836 we find the X-ray detection to be coincident with a dynamically bound system of galaxies at z = 1.358. Optical spectroscopy reveals the presence of a central active galactic nucleus, which can be a dominant source of the detected X-ray emission from this system. We provide upper limits on the X-ray parameters of the system and discuss cluster identification challenges in the high-redshift low-mass cluster regime. A third, intermediate-redshift (z = 0.647) cluster, XMMU J0302.1-0000, is serendipitously detected in the same field as XMMU J0302.2-0001, and we provide its analysis as well. Based on observations obtained with ESO Telescopes at the Paranal Observatory under program ID 080.A-0659 and 081.A-0312, and observations collected at the Centro Astronómico Hispano Alemán (CAHA) at Calar Alto, Spain, operated jointly by the Max-Planck-Institut für Astronomie and the Instituto de Astrofísica de Andalucía (CSIC). X-ray observations were obtained by XMM-Newton.

  18. Surface Enhanced Raman Spectroscopy (SERS) for the Multiplex Detection of Braf, Kras, and Pik3ca Mutations in Plasma of Colorectal Cancer Patients

    PubMed Central

    Li, Xiaozhou; Yang, Tianyue; Li, Caesar Siqi; Song, Youtao; Lou, Hong; Guan, Dagang; Jin, Lili

    2018-01-01

    In this paper, we discuss the use of a procedure based on the polymerase chain reaction (PCR) and surface enhanced Raman spectroscopy (SERS) (PCR-SERS) to detect DNA mutations. Methods: The method was implemented by first amplifying DNA containing the target mutations, then annealing probes, and finally applying SERS detection. The obtained SERS spectra came from a mixture of fluorescent tags labeled on sequences complementary to the mutant DNA. The SERS spectra of multiple tags were then decomposed into component tag spectra by multiple linear regression (MLR). Results: The detection limit was 10⁻¹¹ M with a coefficient of determination (R²) of 0.88. To demonstrate the applicability of this process to real samples, the PCR-SERS method was applied to blood plasma taken from 49 colorectal cancer patients to detect six mutations located in the BRAF, KRAS, and PIK3CA genes. The mutation rates obtained by the PCR-SERS method were in concordance with previous research. Fisher's exact test showed that only two detected mutations, at BRAF (V600E) and PIK3CA (E542K), were significantly positively correlated with right-sided colon cancer. No other clinical feature such as gender, age, cancer stage, or differentiation was correlated with mutation (V600E at BRAF; G12C, G12D, G12V, G13D at KRAS; and E542K at PIK3CA). Visually, a dendrogram drawn through hierarchical clustering analysis (HCA) supported the results of Fisher's exact test. The clusters drawn by all six mutations did not conform to the distributions of cancer stages, differentiation or cancer positions. However, the cluster drawn by the two mutations V600E and E542K showed that all samples with those mutations belonged to the right-sided colon cancer group. Conclusion: The suggested PCR-SERS method is multiplexed, flexible in probe design, easy to incorporate into existing PCR conditions, and sensitive enough to detect mutations in blood plasma. PMID:29556349
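
    The MLR decomposition step can be sketched in a few lines: if reference spectra of the individual tags are available, the measured mixture spectrum is decomposed by ordinary least squares. The Gaussian "spectra", their peak positions, and the mixing weights below are synthetic assumptions used only to show the mechanics.

      import numpy as np

      # hypothetical reference spectra of three tags (rows) over 200 spectral points
      rng = np.random.default_rng(9)
      x = np.linspace(0, 1, 200)
      tags = np.vstack([np.exp(-((x - c) ** 2) / 0.002) for c in (0.3, 0.5, 0.7)])

      # measured mixture = weighted sum of tag spectra + noise; the weights stand
      # for the relative amounts of the mutant sequences present in the sample
      true_w = np.array([0.2, 0.0, 0.8])
      mixture = true_w @ tags + rng.normal(0, 0.01, 200)

      # multiple linear regression: solve mixture ≈ w @ tags in the least-squares sense
      w, *_ = np.linalg.lstsq(tags.T, mixture, rcond=None)
      print("estimated tag contributions:", w.round(2))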

  19. ClusterTAD: an unsupervised machine learning approach to detecting topologically associated domains of chromosomes from Hi-C data.

    PubMed

    Oluwadare, Oluwatosin; Cheng, Jianlin

    2017-11-14

    With the development of chromosomal conformation capturing techniques, particularly the Hi-C technique, the study of the spatial conformation of a genome is becoming an important topic in bioinformatics and computational biology. The Hi-C technique can generate genome-wide chromosomal interaction (contact) data, which can be used to investigate the higher-level organization of chromosomes, such as Topologically Associated Domains (TADs), i.e., locally packed chromosome regions bound together by intra-chromosomal contacts. The identification of the TADs of a genome is useful for studying gene regulation, genomic interaction, and genome function. Here, we formulate the TAD identification problem as an unsupervised machine learning (clustering) problem, and develop a new TAD identification method called ClusterTAD. We introduce a novel method to represent chromosomal contacts as features to be used by the clustering algorithm. Our results show that ClusterTAD can accurately predict the TADs on simulated Hi-C data. Our method is also largely complementary to and consistent with existing methods on the real Hi-C datasets of two mouse cells. The validation with the chromatin immunoprecipitation (ChIP) sequencing (ChIP-Seq) data shows that the domain boundaries identified by ClusterTAD have a high enrichment of CTCF binding sites, promoter-related marks, and enhancer-related histone modifications. As ClusterTAD is based on a proven clustering approach, it opens a new avenue to apply a large array of clustering methods developed in the machine learning field to the TAD identification problem. The source code, the results, and the TADs generated for the simulated and real Hi-C datasets are available here: https://github.com/BDM-Lab/ClusterTAD .
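
    The core idea (represent each genomic bin by its row of the contact matrix, cluster the bins, and read candidate domains off runs of identical labels) can be sketched as follows. This is not the ClusterTAD implementation; the block-diagonal toy matrix, k = 2, and the boundary rule are assumptions for illustration.

      import numpy as np
      from sklearn.cluster import KMeans

      # hypothetical block-diagonal Hi-C contact matrix: two "domains" of 20 bins each
      rng = np.random.default_rng(7)
      n = 40
      M = rng.poisson(2, size=(n, n)).astype(float)
      M[:20, :20] += rng.poisson(20, size=(20, 20))      # dense intra-domain contacts
      M[20:, 20:] += rng.poisson(20, size=(20, 20))
      M = (M + M.T) / 2                                   # symmetrize

      # each bin is represented by its row of contacts, then the bins are clustered;
      # changes between consecutive labels mark candidate domain boundaries
      labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(M)
      boundaries = np.flatnonzero(np.diff(labels)) + 1
      print("candidate TAD boundaries at bins:", boundaries.tolist())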

  20. A Review of Subsequence Time Series Clustering

    PubMed Central

    Teh, Ying Wah

    2014-01-01

    Clustering of subsequence time series remains an open issue in time series clustering. Subsequence time series clustering is used in different fields, such as e-commerce, outlier detection, speech recognition, biological systems, DNA recognition, and text mining. One of the useful fields in the domain of subsequence time series clustering is pattern recognition. To improve this field, a sequence of time series data is used. This paper reviews some definitions and backgrounds related to subsequence time series clustering. The categorization of the literature reviews is divided into three groups: preproof, interproof, and postproof period. Moreover, various state-of-the-art approaches in performing subsequence time series clustering are discussed under each of the following categories. The strengths and weaknesses of the employed methods are evaluated as potential issues for future studies. PMID:25140332

  1. A review of subsequence time series clustering.

    PubMed

    Zolhavarieh, Seyedjamal; Aghabozorgi, Saeed; Teh, Ying Wah

    2014-01-01

    Clustering of subsequence time series remains an open issue in time series clustering. Subsequence time series clustering is used in different fields, such as e-commerce, outlier detection, speech recognition, biological systems, DNA recognition, and text mining. One of the useful fields in the domain of subsequence time series clustering is pattern recognition. To improve this field, a sequence of time series data is used. This paper reviews some definitions and backgrounds related to subsequence time series clustering. The categorization of the literature reviews is divided into three groups: preproof, interproof, and postproof period. Moreover, various state-of-the-art approaches in performing subsequence time series clustering are discussed under each of the following categories. The strengths and weaknesses of the employed methods are evaluated as potential issues for future studies.

  2. Spot detection and image segmentation in DNA microarray data.

    PubMed

    Qin, Li; Rueda, Luis; Ali, Adnan; Ngom, Alioune

    2005-01-01

    Following the invention of microarrays in 1994, the development and applications of this technology have grown exponentially. The numerous applications of microarray technology include clinical diagnosis and treatment, drug design and discovery, tumour detection, and environmental health research. One of the key issues in the experimental approaches utilising microarrays is to extract quantitative information from the spots, which represent genes in a given experiment. For this process, the initial stages are important and they influence future steps in the analysis. Identifying the spots and separating the background from the foreground is a fundamental problem in DNA microarray data analysis. In this review, we present an overview of state-of-the-art methods for microarray image segmentation. We discuss the foundations of the circle-shaped approach, adaptive shape segmentation, histogram-based methods and the recently introduced clustering-based techniques. We analytically show that clustering-based techniques are equivalent to the one-dimensional, standard k-means clustering algorithm that utilises the Euclidean distance.
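
    The equivalence mentioned above is easy to demonstrate: one-dimensional k-means with k = 2 applied to the pixel intensities of a spot window separates foreground from background. The synthetic spot image and the choice of k = 2 are assumptions used only to illustrate the clustering-based segmentation idea.

      import numpy as np
      from sklearn.cluster import KMeans

      # hypothetical 32x32 spot window: bright circular foreground on a dark background
      rng = np.random.default_rng(8)
      yy, xx = np.mgrid[0:32, 0:32]
      spot = (yy - 16) ** 2 + (xx - 16) ** 2 < 8 ** 2
      img = rng.normal(200.0, 20.0, (32, 32))
      img[spot] += 600.0

      # one-dimensional k-means on pixel intensities, k = 2 (foreground vs background)
      labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(img.reshape(-1, 1))
      fg = labels == labels[img.reshape(-1).argmax()]   # the cluster holding the brightest pixel
      print("foreground pixels:", int(fg.sum()), "true spot pixels:", int(spot.sum()))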

  3. Evaluation of null-point detection methods on simulation data

    NASA Astrophysics Data System (ADS)

    Olshevsky, Vyacheslav; Fu, Huishan; Vaivads, Andris; Khotyaintsev, Yuri; Lapenta, Giovanni; Markidis, Stefano

    2014-05-01

    We model the measurements of artificial spacecraft that resemble the configuration of CLUSTER propagating in a particle-in-cell simulation of turbulent magnetic reconnection. The simulation domain contains multiple isolated X-type null-points, but the majority are O-type null-points. Simulations show that current pinches surrounded by twisted fields, analogous to laboratory pinches, are formed along the sequences of O-type nulls. In the simulation, the magnetic reconnection is mainly driven by the kinking of the pinches, at spatial scales of several ion inertial lengths. We compute the locations of magnetic null-points and detect their type. When the satellites are separated by fractions of an ion inertial length, as is the case for CLUSTER, they are able to locate both the isolated null-points and the pinches. We apply the method to real CLUSTER data and speculate on how common pinches are in the magnetosphere, and whether they play a dominant role in the dissipation of magnetic energy.

  4. Weakly supervised image semantic segmentation based on clustering superpixels

    NASA Astrophysics Data System (ADS)

    Yan, Xiong; Liu, Xiaohua

    2018-04-01

    In this paper, we propose an image semantic segmentation model that is trained from image-level labeled images. The proposed model starts with superpixel segmentation, and features of the superpixels are extracted by a trained CNN. We introduce a superpixel-based graph and apply a graph partition method to group correlated superpixels into clusters. To acquire the inter-label correlations between the image-level labels in the dataset, we utilize not only label co-occurrence statistics but also visual contextual cues. Finally, we formulate the task of mapping appropriate image-level labels to the detected clusters as a convex minimization problem. Experimental results on the MSRC-21 dataset and the LabelMe dataset show that the proposed method performs better than most weakly supervised methods and is even comparable to fully supervised methods.

  5. Cancer detection based on Raman spectra super-paramagnetic clustering

    NASA Astrophysics Data System (ADS)

    González-Solís, José Luis; Guizar-Ruiz, Juan Ignacio; Martínez-Espinosa, Juan Carlos; Martínez-Zerega, Brenda Esmeralda; Juárez-López, Héctor Alfonso; Vargas-Rodríguez, Héctor; Gallegos-Infante, Luis Armando; González-Silva, Ricardo Armando; Espinoza-Padilla, Pedro Basilio; Palomares-Anda, Pascual

    2016-08-01

    The clustering of Raman spectra of serum samples is analyzed using the super-paramagnetic clustering technique based on the Potts spin model. We investigated the clustering of biochemical networks by using Raman data that define edge lengths in the network, and where the interactions are functions of the individual band intensities of the Raman spectra. For this study, we used two control groups of 58 and 102 Raman spectra and the intensities of 160, 150 and 42 Raman spectra of serum samples from breast cancer, cervical cancer and leukemia patients, respectively. The spectra were collected from patients from different hospitals in Mexico. By using the super-paramagnetic clustering technique, we identified the most natural and compact clusters, allowing us to discriminate between control and cancer patients. Of special interest was the leukemia case, where the nearly hierarchical structure observed allowed identification of the patients' leukemia type. The goal of this study is to apply a model from statistical physics, such as the super-paramagnetic model, to find the natural clusters that allow us to design a cancer detection method. To the best of our knowledge, this is the first report of preliminary results evaluating the usefulness of super-paramagnetic clustering in the discipline of spectroscopy, where it is used for classification of spectra.

  6. Method for Continuous Monitoring of Electrospray Ion Formation

    NASA Astrophysics Data System (ADS)

    Metzler, Guille; Crathern, Susan; Bachmann, Lorin; Fernández-Metzler, Carmen; King, Richard

    2017-10-01

    A method for continuously monitoring the performance of electrospray ionization without the addition of hardware or chemistry to the system is demonstrated. In the method, which we refer to as SprayDx, cluster ions with solvent vapor natively formed by electrospray are followed throughout the collection of liquid chromatography-selected reaction monitoring data. The cluster ion extracted ion chromatograms report on the consistency of the ion formation and detection system. The data collected by the SprayDx method resemble the data collected for postcolumn infusion of analyte. The response of the cluster ions monitored reports on changes in the physical parameters of the ion source such as voltage and gas flow. SprayDx is also observed to report on ion suppression in a fashion very similar to a postcolumn infusion of analyte. We anticipate the method finding utility as a continuous readout on the performance of electrospray and other atmospheric pressure ionization processes.

  7. Predicting overlapping protein complexes from weighted protein interaction graphs by gradually expanding dense neighborhoods.

    PubMed

    Dimitrakopoulos, Christos; Theofilatos, Konstantinos; Pegkas, Andreas; Likothanassis, Spiros; Mavroudi, Seferina

    2016-07-01

    Proteins are vital biological molecules driving many fundamental cellular processes. They rarely act alone, but form interacting groups called protein complexes. The study of protein complexes is a key goal in systems biology. Recently, large protein-protein interaction (PPI) datasets have been published and a plethora of computational methods that provide new ideas for the prediction of protein complexes have been implemented. However, most of the methods suffer from two major limitations: First, they do not account for proteins participating in multiple functions and second, they are unable to handle weighted PPI graphs. Moreover, the problem remains open as existing algorithms and tools are insufficient in terms of predictive metrics. In the present paper, we propose gradually expanding neighborhoods with adjustment (GENA), a new algorithm that gradually expands neighborhoods in a graph starting from highly informative "seed" nodes. GENA considers proteins as multifunctional molecules allowing them to participate in more than one protein complex. In addition, GENA accepts weighted PPI graphs by using a weighted evaluation function for each cluster. In experiments with datasets from Saccharomyces cerevisiae and human, GENA outperformed Markov clustering, restricted neighborhood search and clustering with overlapping neighborhood expansion, three state-of-the-art methods for computationally predicting protein complexes. Seven PPI networks and seven evaluation datasets were used in total. GENA outperformed existing methods in 16 out of 18 experiments achieving an average improvement of 5.5% when the maximum matching ratio metric was used. Our method was able to discover functionally homogeneous protein clusters and uncover important network modules in a Parkinson expression dataset. When used on the human networks, around 47% of the detected clusters were enriched in gene ontology (GO) terms with depth higher than five in the GO hierarchy. In the present manuscript, we introduce a new method for the computational prediction of protein complexes by making the realistic assumption that proteins participate in multiple protein complexes and cellular functions. Our method can detect accurate and functionally homogeneous clusters. Copyright © 2016 Elsevier B.V. All rights reserved.
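
    A hedged sketch of the general idea of gradually expanding a neighbourhood from a seed in a weighted PPI graph is given below; the cohesiveness score and the seed choice by weighted degree are common illustrative choices, not GENA's exact evaluation function or adjustment step.

    ```python
    # Hedged sketch of greedy neighbourhood expansion on a weighted PPI graph.
    # The cohesiveness score (internal weight / (internal + boundary weight)) and
    # the seed choice by weighted degree are illustrative, not GENA's exact ones.
    import networkx as nx

    def cohesiveness(g, cluster):
        internal = sum(d["weight"] for u, v, d in g.edges(cluster, data=True)
                       if u in cluster and v in cluster)
        boundary = sum(d["weight"] for u, v, d in g.edges(cluster, data=True)
                       if (u in cluster) != (v in cluster))
        return internal / (internal + boundary) if internal + boundary > 0 else 0.0

    def expand_from_seed(g, seed):
        cluster, score = {seed}, 0.0
        while True:
            neighbours = {n for c in cluster for n in g.neighbors(c)} - cluster
            best, best_score = None, score
            for n in neighbours:
                s = cohesiveness(g, cluster | {n})
                if s > best_score:
                    best, best_score = n, s
            if best is None:
                return cluster          # no neighbour improves the cluster score
            cluster.add(best)
            score = best_score

    # Toy weighted graph; the seed is the node with the largest weighted degree.
    g = nx.Graph()
    g.add_weighted_edges_from([("a", "b", 0.9), ("b", "c", 0.8), ("a", "c", 0.7),
                               ("c", "d", 0.1), ("d", "e", 0.9)])
    seed = max(g.nodes, key=lambda n: g.degree(n, weight="weight"))
    print(expand_from_seed(g, seed))    # -> the dense triangle {'a', 'b', 'c'}
    ```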

  8. Penalized likelihood and multi-objective spatial scans for the detection and inference of irregular clusters

    PubMed Central

    2010-01-01

    Background Irregularly shaped spatial clusters are difficult to delineate. A cluster found by an algorithm often spreads through large portions of the map, impacting its geographical meaning. Penalized likelihood methods for Kulldorff's spatial scan statistics have been used to control the excessive freedom of the shape of clusters. Penalty functions based on cluster geometry and non-connectivity have been proposed recently. Another approach involves the use of a multi-objective algorithm to maximize two objectives: the spatial scan statistic and the geometric penalty function. Results & Discussion We present a novel scan statistic algorithm employing a function based on the graph topology to penalize the presence of under-populated disconnection nodes in candidate clusters, the disconnection nodes cohesion function. A disconnection node is defined as a region within a cluster such that its removal disconnects the cluster. By applying this function, the most geographically meaningful clusters are sifted through the immense set of possible irregularly shaped candidate cluster solutions. To evaluate the statistical significance of solutions for multi-objective scans, a statistical approach based on the concept of the attainment function is used. In this paper we compare different penalized likelihoods employing the geometric and non-connectivity regularity functions and the novel disconnection nodes cohesion function. We also build multi-objective scans using those three functions and compare them with the previous penalized likelihood scans. An application is presented using comprehensive state-wide data for Chagas' disease in puerperal women in Minas Gerais state, Brazil. Conclusions We show that, compared to the other single-objective algorithms, multi-objective scans present better performance regarding power, sensitivity and positive predictive value. The multi-objective non-connectivity scan is faster and better suited for the detection of moderately irregularly shaped clusters. The multi-objective cohesion scan is most effective for the detection of highly irregularly shaped clusters. PMID:21034451
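
    The disconnection-node idea lends itself to a short sketch: in graph terms, a disconnection node is an articulation point of the candidate cluster's adjacency subgraph, so one can count the under-populated articulation points and penalize them. The penalty form and population threshold below are illustrative assumptions, not the cohesion function defined in the paper.

    ```python
    # Hedged sketch: count "disconnection nodes" (articulation points) of a
    # candidate cluster's adjacency subgraph and penalise the under-populated ones.
    # The population threshold and the penalty form are illustrative assumptions.
    import networkx as nx

    def disconnection_penalty(adjacency, cluster, population, pop_threshold):
        """adjacency: region adjacency graph; cluster: iterable of region ids."""
        sub = adjacency.subgraph(cluster)
        cut_nodes = set(nx.articulation_points(sub))
        weak_cuts = [r for r in cut_nodes if population[r] < pop_threshold]
        # Simple multiplicative penalty in (0, 1]: more weak cut nodes -> smaller value.
        return 1.0 / (1.0 + len(weak_cuts))

    # Toy example: regions 1-5 form a chain, so 2, 3 and 4 are articulation points.
    adj = nx.path_graph([1, 2, 3, 4, 5])
    pop = {1: 500, 2: 40, 3: 800, 4: 35, 5: 600}
    print(disconnection_penalty(adj, [1, 2, 3, 4, 5], pop, pop_threshold=100))
    ```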

  9. The Monitor project: searching for occultations in young open clusters

    NASA Astrophysics Data System (ADS)

    Aigrain, S.; Hodgkin, S.; Irwin, J.; Hebb, L.; Irwin, M.; Favata, F.; Moraux, E.; Pont, F.

    2007-02-01

    The Monitor project is a photometric monitoring survey of nine young (1-200Myr) clusters in the solar neighbourhood to search for eclipses by very low mass stars and brown dwarfs and for planetary transits in the light curves of cluster members. It began in the autumn of 2004 and uses several 2- to 4-m telescopes worldwide. We aim to calibrate the relation between age, mass, radius and where possible luminosity, from the K dwarf to the planet regime, in an age range where constraints on evolutionary models are currently very scarce. Any detection of an exoplanet in one of our youngest targets (<~10Myr) would also provide important constraints on planet formation and migration time-scales and their relation to protoplanetary disc lifetimes. Finally, we will use the light curves of cluster members to study rotation and flaring in low-mass pre-main-sequence stars. The present paper details the motivation, science goals and observing strategy of the survey. We present a method to estimate the sensitivity and number of detections expected in each cluster, using a simple semi-analytic approach which takes into account the characteristics of the cluster and photometric observations, using (tunable) best-guess assumptions for the incidence and parameter distribution of putative companions, and we incorporate the limits imposed by radial velocity follow-up from medium and large telescopes. We use these calculations to show that the survey as a whole can be expected to detect over 100 young low and very low mass eclipsing binaries, and ~3 transiting planets with radial velocity signatures detectable with currently available facilities.

  10. Spatial heterogeneity of type I error for local cluster detection tests

    PubMed Central

    2014-01-01

    Background Just as power, the type I error of cluster detection tests (CDTs) should be spatially assessed. Indeed, both the type I error and the power of CDTs have a spatial component, as CDTs both detect and locate clusters. In the case of type I error, the spatial distribution of wrongly detected clusters (WDCs) can be particularly affected by edge effect. This simulation study aims to describe the spatial distribution of WDCs and to confirm and quantify the presence of edge effect. Methods A simulation of 40 000 datasets was performed under the null hypothesis of risk homogeneity. The simulation design used realistic parameters from survey data on birth defects and, in particular, two baseline risks. The simulated datasets were analyzed using Kulldorff's spatial scan as a commonly used test whose behavior is otherwise well known. To describe the spatial distribution of type I error, we defined the participation rate for each spatial unit of the region. We used this indicator in a new statistical test proposed to confirm, as well as quantify, the edge effect. Results The predefined type I error of 5% was respected for both baseline risks. Results showed a strong edge effect in participation rates, with a descending gradient from center to edge, and WDCs more often centrally situated. Conclusions In routine analysis of real data, clusters on the edge of the region should be carefully considered, as they rarely occur when there is no cluster. Further work is needed to combine results from power studies with this work in order to optimize CDT performance. PMID:24885343
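
    The participation-rate bookkeeping described in the Methods can be sketched as follows, assuming the significant clusters detected in each null-hypothesis simulation have already been produced by a scan statistic; the data structures are illustrative.

    ```python
    # Hedged sketch: participation rate per spatial unit, i.e. the fraction of
    # null-hypothesis simulations in which that unit falls inside a significant
    # detected cluster. Inputs are assumed to come from a prior scan-statistic run.
    from collections import Counter

    def participation_rates(detected_clusters, all_units):
        """detected_clusters: one set of unit ids per simulation (empty set if no
        significant cluster was found); all_units: iterable of every unit id."""
        counts = Counter()
        for cluster in detected_clusters:
            counts.update(cluster)
        n_sim = len(detected_clusters)
        return {u: counts[u] / n_sim for u in all_units}

    # Toy example with 4 simulations over 6 spatial units.
    sims = [set(), {"u1", "u2"}, set(), {"u2", "u3"}]
    units = ["u1", "u2", "u3", "u4", "u5", "u6"]
    print(participation_rates(sims, units))  # u2 appears in 2/4 simulations -> 0.5
    ```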

  11. Intracluster light in clusters of galaxies at redshifts 0.4 < z < 0.8

    NASA Astrophysics Data System (ADS)

    Guennou, L.; Adami, C.; Da Rocha, C.; Durret, F.; Ulmer, M. P.; Allam, S.; Basa, S.; Benoist, C.; Biviano, A.; Clowe, D.; Gavazzi, R.; Halliday, C.; Ilbert, O.; Johnston, D.; Just, D.; Kron, R.; Kubo, J. M.; Le Brun, V.; Marshall, P.; Mazure, A.; Murphy, K. J.; Pereira, D. N. E.; Rabaça, C. R.; Rostagni, F.; Rudnick, G.; Russeil, D.; Schrabback, T.; Slezak, E.; Tucker, D.; Zaritsky, D.

    2012-01-01

    Context. The study of intracluster light (ICL) can help us to understand the mechanisms taking place in galaxy clusters, and to place constraints on the cluster formation history and physical properties. However, owing to the intrinsic faintness of ICL emission, most searches and detailed studies of ICL have been limited to redshifts z < 0.4. Aims: To help us extend our knowledge of ICL properties to higher redshifts and study the evolution of ICL with redshift, we search for ICL in a subsample of ten clusters detected by the ESO Distant Cluster Survey (EDisCS), at redshifts 0.4 < z < 0.8, that are also part of our DAFT/FADA Survey. Methods: We analyze the ICL by applying the OV WAV package, a wavelet-based technique, to deep HST ACS images in the F814W filter and to V-band VLT/FORS2 images of three clusters. Detection levels are assessed as a function of the diffuse light source surface brightness using simulations. Results: In the F814W filter images, we detect diffuse light sources in all the clusters, with typical sizes of a few tens of kpc (assuming that they are at the cluster redshifts). The ICL detected by stacking the ten F814W images shows an 8σ detection in the source center extending over a ~50 × 50 kpc² area, with a total absolute magnitude of -21.6 in the F814W filter, equivalent to about two L∗ galaxies per cluster. We find a weak correlation between the total F814W absolute magnitude of the ICL and the cluster velocity dispersion and mass. There is no apparent correlation between the cluster mass-to-light ratio (M/L) and the amount of ICL, and no evidence of any preferential orientation in the ICL source distribution. We find no strong variation in the amount of ICL between z = 0 and z = 0.8. In addition, we find wavelet-detected compact objects (WDCOs) in the three clusters for which data in two bands are available; these objects are probably very faint compact galaxies that in some cases are members of the respective clusters and comparable to the faint dwarf galaxies of the Local Group. Conclusions: We show that the ICL is prevalent in clusters at least up to redshift z = 0.8. In the future, we propose to detect the ICL at even higher redshifts, to determine whether there is a particular stage of cluster evolution where it was stripped from galaxies and spread into the intracluster medium. Based on observations made at ESO Telescopes at the Paranal Observatory under programme ID 082.A-0374. Also based on the use of the NASA/IPAC Extragalactic Database (NED) which is operated by the Jet Propulsion Laboratory, California Institute of Technology, under contract with the National Aeronautics and Space Administration. Based on observations made with the NASA/ESA Hubble Space Telescope, obtained from the data archives at the Space Telescope European Coordinating Facility and the Space Telescope Science Institute, which is operated by the Association of Universities for Research in Astronomy, Inc., under NASA contract NAS 5-26555.

  12. SAR image segmentation using skeleton-based fuzzy clustering

    NASA Astrophysics Data System (ADS)

    Cao, Yun Yi; Chen, Yan Qiu

    2003-06-01

    SAR image segmentation can be converted to a clustering problem in which pixels or small patches are grouped together based on local feature information. In this paper, we present a novel framework for segmentation. The segmentation goal is achieved by unsupervised clustering of characteristic descriptors extracted from local patches. A mixture model of the characteristic descriptor, which combines intensity and texture features, is investigated. The unsupervised algorithm is derived from the recently proposed Skeleton-Based Data Labeling method. Skeletons are constructed as prototypes of clusters to represent arbitrary latent structures in image data. Segmentation using Skeleton-Based Fuzzy Clustering is able to detect the types of surfaces appearing in SAR images automatically, without any user input.

  13. Detecting Corresponding Vertex Pairs between Planar Tessellation Datasets with Agglomerative Hierarchical Cell-Set Matching.

    PubMed

    Huh, Yong; Yu, Kiyun; Park, Woojin

    2016-01-01

    This paper proposes a method to detect corresponding vertex pairs between planar tessellation datasets. Applying agglomerative hierarchical co-clustering, the method finds geometrically corresponding cell-set pairs, from which corresponding vertex pairs are detected. Then, the map transformation is performed with the vertex pairs. Since these pairs are detected independently for each corresponding cell-set pair, the method presents improved matching performance regardless of locally uneven positional discrepancies between datasets. The proposed method was applied to complicated synthetic cell datasets assumed to represent a cadastral map and a topographical map, and showed an improved result with an F-measure of 0.84, compared to a previous matching method with an F-measure of 0.48.

  14. A fully automatic microcalcification detection approach based on deep convolution neural network

    NASA Astrophysics Data System (ADS)

    Cai, Guanxiong; Guo, Yanhui; Zhang, Yaqin; Qin, Genggeng; Zhou, Yuanpin; Lu, Yao

    2018-02-01

    Breast cancer is one of the most common cancers and has high morbidity and mortality worldwide, posing a serious threat to the health of human beings. The emergence of microcalcifications (MCs) is an important signal of early breast cancer. However, it is still challenging and time consuming for radiologists to identify some tiny and subtle individual MCs in mammograms. This study proposed a novel computer-aided MC detection algorithm for full field digital mammograms (FFDMs) using a deep convolution neural network (DCNN). Firstly, an MC candidate detection system was used to obtain potential MC candidates. Then a DCNN was trained using a novel adaptive learning strategy, the neutrosophic reinforcement sample learning (NRSL) strategy, to speed up the learning process. The trained DCNN served to recognize true MCs. After classification by the DCNN, a density-based regional clustering method was applied to form MC clusters. The accuracy of the DCNN with our proposed NRSL strategy converged faster and reached higher values than that of the traditional DCNN at the same epochs, obtaining an accuracy of 99.87% on the training set, 95.12% on the validation set, and 93.68% on the testing set at epoch 40. For cluster-based MC detection evaluation, a sensitivity of 90% was achieved at 0.13 false positives (FPs) per image. The obtained results demonstrate that the designed DCNN plays a significant role in MC detection after prior training.
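
    The density-based regional clustering step is not specified in detail above; one common realisation is DBSCAN on the image coordinates of the DCNN-confirmed detections, sketched below with illustrative eps and min_samples values that are not the paper's settings.

    ```python
    # Hedged sketch: group DCNN-confirmed microcalcification detections into MC
    # clusters with DBSCAN on their image coordinates. The eps radius (in pixels)
    # and min_samples values are illustrative assumptions, not the paper's settings.
    import numpy as np
    from sklearn.cluster import DBSCAN

    def group_mc_clusters(mc_xy, eps_pixels=100, min_mcs=3):
        """mc_xy: (N, 2) array of confirmed MC centroids in pixel coordinates."""
        labels = DBSCAN(eps=eps_pixels, min_samples=min_mcs).fit_predict(mc_xy)
        return [mc_xy[labels == k] for k in set(labels) if k != -1]   # -1 = noise

    # Toy example: two tight groups of detections plus one isolated detection.
    rng = np.random.default_rng(0)
    group_a = rng.normal([500, 600], 30, size=(5, 2))
    group_b = rng.normal([1800, 900], 30, size=(4, 2))
    isolated = np.array([[3000.0, 100.0]])
    clusters = group_mc_clusters(np.vstack([group_a, group_b, isolated]))
    print(len(clusters), [len(c) for c in clusters])   # typically 2 clusters: 5 and 4 MCs
    ```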

  15. Degree-based statistic and center persistency for brain connectivity analysis.

    PubMed

    Yoo, Kwangsun; Lee, Peter; Chung, Moo K; Sohn, William S; Chung, Sun Ju; Na, Duk L; Ju, Daheen; Jeong, Yong

    2017-01-01

    Brain connectivity analyses have been widely performed to investigate the organization and functioning of the brain, or to observe changes in neurological or psychiatric conditions. However, connectivity analysis inevitably introduces the problem of mass-univariate hypothesis testing. Although several cluster-wise correction methods have been suggested to address this problem and shown to provide high sensitivity, these approaches fundamentally have two drawbacks: the lack of spatial specificity (localization power) and the arbitrariness of an initial cluster-forming threshold. In this study, we propose a novel method, the degree-based statistic (DBS), performing cluster-wise inference. DBS is designed to overcome the above-mentioned two shortcomings. From a network perspective, a few brain regions are of critical importance and considered to play pivotal roles in network integration. Regarding this notion, DBS defines a cluster as a set of edges of which one ending node is shared. This definition enables the efficient detection of clusters and their center nodes. Furthermore, a new measure of a cluster, center persistency (CP), was introduced. The efficiency of DBS was demonstrated with a known "ground truth" simulation. DBS was then applied to two experimental datasets, where it successfully detected the persistent clusters. In conclusion, by adopting the graph theoretical concept of degree and borrowing the concept of persistence from algebraic topology, DBS can sensitively identify clusters with centric nodes that play pivotal roles in an effect of interest. DBS is potentially applicable to a wide range of cognitive or clinical situations and allows us to obtain statistically reliable and easily interpretable results. Hum Brain Mapp 38:165-181, 2017. © 2016 Wiley Periodicals, Inc.
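
    The core DBS construction can be sketched briefly: threshold the edge-wise statistics and define, for each node, a cluster made of the suprathreshold edges attached to it, so that the cluster size is the node's degree. The permutation-based inference and the center persistency measure are omitted here, and the threshold is arbitrary.

    ```python
    # Hedged sketch of the core DBS idea: threshold edge-wise statistics, then
    # define one cluster per node as the suprathreshold edges attached to it, so a
    # cluster's size is the node's suprathreshold degree. Permutation inference and
    # the center persistency measure used for the actual statistic are omitted.
    import numpy as np

    def dbs_clusters(edge_stats, threshold):
        """edge_stats: symmetric (n, n) matrix of edge-wise test statistics."""
        supra = (edge_stats > threshold) & ~np.eye(edge_stats.shape[0], dtype=bool)
        degrees = supra.sum(axis=1)                 # suprathreshold degree per node
        clusters = {}
        for node in np.flatnonzero(degrees):
            neighbours = np.flatnonzero(supra[node])
            clusters[node] = [(node, n) for n in neighbours]   # edges sharing this node
        return degrees, clusters

    # Toy example: node 0 is a hub with three strong edges.
    stats = np.zeros((5, 5))
    for i, j, t in [(0, 1, 4.1), (0, 2, 3.8), (0, 3, 3.5), (2, 4, 1.0)]:
        stats[i, j] = stats[j, i] = t
    degrees, clusters = dbs_clusters(stats, threshold=3.0)
    print(degrees)        # node 0 has degree 3: the centre of a detected cluster
    print(clusters[0])
    ```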

  16. Overlapping communities from dense disjoint and high total degree clusters

    NASA Astrophysics Data System (ADS)

    Zhang, Hongli; Gao, Yang; Zhang, Yue

    2018-04-01

    Community plays an important role in the fields of sociology and biology, and especially in domains of computer science where systems are often represented as networks; community detection is therefore of great importance in these domains. A community is a dense subgraph of the whole graph, with more links between its members than from its members to outside nodes, and nodes in the same community probably share common properties or play similar roles in the graph. Communities overlap when nodes in a graph belong to multiple communities. A vast variety of overlapping community detection methods have been proposed in the literature, and the local expansion method is one of the most successful techniques for dealing with large networks. This paper presents a density-based seeding method, in which dense disjoint local clusters are searched for and selected as seeds. The proposed method selects a seed by the total degree and density of local clusters, utilizing merely local structures of the network. Furthermore, this paper proposes a novel community refining phase that minimizes the conductance of each community, through which the quality of identified communities is largely improved in linear time. Experimental results on synthetic networks show that the proposed seeding method outperforms other state-of-the-art seeding methods and that the proposed refining method largely enhances the quality of the identified communities. Experimental results on real graphs with ground-truth communities show that the proposed approach outperforms other state-of-the-art overlapping community detection algorithms; in particular, it is more than two orders of magnitude faster than existing global algorithms while achieving higher quality, and it obtains a much more accurate community structure than current local algorithms without any prior information.

  17. Globular clusters and environmental effects in galaxy clusters

    NASA Astrophysics Data System (ADS)

    Sales, Laura

    2016-10-01

    Globular clusters are old compact stellar systems orbiting around galaxies of all types. Tens of thousands of them can also be found populating the intra-cluster regions of nearby galaxy clusters like Virgo and Coma. Thanks to the HST Frontier Fields program, GCs are now starting to be detected in intermediate-redshift clusters as well. Yet, despite their ubiquity, a theoretical model for the formation and evolution of GCs is still missing, especially within the cosmological context. Here we propose to use cosmological hydrodynamical simulations of 18 galaxy clusters coupled to a post-processing GC formation model to explore the assembly of galaxies in clusters together with their expected GC population. The method, which has already been implemented and tested, will allow us to characterize for the first time the number, radial distribution and kinematics of GCs in clusters, with products directly comparable to observational maps. We will explore cluster-to-cluster variations and also characterize the build-up of the intra-cluster component of GCs with time. As the method relies on a detailed study of the star-formation history of galaxies, we will jointly constrain the predicted quenching time-scales for satellites and the occurrence of starburst events associated with infall and orbital pericenters of galaxies in massive clusters. This will inform further studies on the distribution, velocity and properties of post-starburst galaxies in past, ongoing and future HST programs.

  18. Low-dimensional clustering reveals new influenza strains before they become dominant

    NASA Astrophysics Data System (ADS)

    He, Jiankui; Deem, Michael

    2010-03-01

    Influenza A virus has been circulating in the human population and has caused three pandemics in the last century (1918 H1N1, 1957 H2N2, 1968 H3N2). The newly appeared 2009 A(H1N1) has been classified by the World Health Organization (WHO) as the fourth pandemic virus strain. We here consider an approach for early detection of new dominant strains. We first construct a network model and apply it to the evolution of the 2009 A(H1N1) virus. By clustering the sequence data, we found two main clusters. We then define a metric to detect the emergence of dominant strains. We show on historical H3N2 data that this method is able to find a cluster around an incipient dominant strain before it becomes dominant. For example, for H3N2 as of 30 March 2009, we see the cluster for the new A/BritishColumbia/RV1222/2009 strain. Turning to H1N1 and the 2009 A(H1N1), we do not see evidence for antigenically novel 2009 A(H1N1) strains as of August 2009. This strain detection tool combines a projection operator with a density estimation.

  19. Weak-lensing detection of intracluster filaments with ground-based data

    NASA Astrophysics Data System (ADS)

    Maturi, Matteo; Merten, Julian

    2013-11-01

    According to the current standard model of cosmology, matter in the Universe arranges itself along a network of filamentary structure. These filaments connect the main nodes of this so-called "cosmic web", which are clusters of galaxies. Although its large-scale distribution is clearly characterized by numerical simulations, constraining the dark-matter content of the cosmic web in practice turns out to be difficult. The natural method of choice is gravitational lensing. However, the direct detection and mapping of the elusive filament signal is challenging, and in this work we present two methods that are specifically tailored to achieve this task. A linear matched filter aims at detecting the smooth mass component of filaments and is optimized to perform a shear decomposition that follows the anisotropic component of the lensing signal. Filaments clearly inherit this property due to their morphology. At the same time, the contamination arising from the central massive cluster is controlled in a natural way. The 1σ filament detection limit is about κ ~ 0.005-0.01, depending on the filter's template width and length, enabling the detection of structures beyond the reach of other approaches. The second, complementary method seeks to detect the clumpy component of filaments. The detection is determined by the number density of subclump identifications in an area enclosing the potential filament, as found within the observed field with the filter approach. We tested both methods against mock observations based on realistic N-body simulations of filamentary structure and demonstrated the feasibility of detecting filaments with ground-based data.

  20. The Distribution and Ages of Star Clusters in the Small Magellanic Cloud: Constraints on the Interaction History of the Magellanic Clouds

    NASA Astrophysics Data System (ADS)

    Bitsakis, Theodoros; González-Lópezlira, R. A.; Bonfini, P.; Bruzual, G.; Maravelias, G.; Zaritsky, D.; Charlot, S.; Ramírez-Siordia, V. H.

    2018-02-01

    We present a new study of the spatial distribution and ages of the star clusters in the Small Magellanic Cloud (SMC). To detect and estimate the ages of the star clusters we rely on the new fully automated method developed by Bitsakis et al. Our code detects 1319 star clusters in the central 18 deg2 of the SMC we surveyed (1108 of which have never been reported before). The age distribution of those clusters suggests enhanced cluster formation around 240 Myr ago. It also implies significant differences in the cluster distribution of the bar with respect to the rest of the galaxy, with the younger clusters being predominantly located in the bar. Having used the same setup, and data from the same surveys as for our previous study of the LMC, we are able to robustly compare the cluster properties between the two galaxies. Our results suggest that the bulk of the clusters in both galaxies were formed approximately 300 Myr ago, probably during a direct collision between the two galaxies. On the other hand, the locations of the young (≤50 Myr) clusters in both Magellanic Clouds, found where their bars join the H I arms, suggest that cluster formation in those regions is a result of internal dynamical processes. Finally, we discuss the potential causes of the apparent outside-in quenching of cluster formation that we observe in the SMC. Our findings are consistent with an evolutionary scheme where the interactions between the Magellanic Clouds constitute the major mechanism driving their overall evolution.

  1. Temporal Methods to Detect Content-Based Anomalies in Social Media

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Skryzalin, Jacek; Field, Jr., Richard; Fisher, Andrew N.

    Here, we develop a method for time-dependent topic tracking and meme trending in social media. Our objective is to identify time periods whose content differs significantly from normal, and we utilize two techniques to do so. The first is an information-theoretic analysis of the distributions of terms emitted during different periods of time. In the second, we cluster documents from each time period and analyze the tightness of each clustering. We also discuss a method of combining the scores created by each technique, and we provide ample empirical analysis of our methodology on various Twitter datasets.
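
    A minimal sketch of the information-theoretic part is given below, under the assumption that each period's content is summarised by a smoothed term-frequency distribution compared with a baseline via the Jensen-Shannon distance; the tokenisation and smoothing constant are illustrative choices.

    ```python
    # Hedged sketch of the information-theoretic step: score a time period by the
    # Jensen-Shannon distance between its term distribution and a baseline
    # distribution. Tokenisation and the smoothing constant alpha are assumptions.
    from collections import Counter
    import numpy as np
    from scipy.spatial.distance import jensenshannon

    def term_distribution(docs, vocab, alpha=0.5):
        counts = Counter(w for doc in docs for w in doc.lower().split())
        freqs = np.array([counts[w] + alpha for w in vocab], dtype=float)
        return freqs / freqs.sum()

    baseline_docs = ["the game was great", "watching the game tonight"]
    period_docs = ["breaking outage reported", "major outage hits the site"]

    vocab = sorted({w for d in baseline_docs + period_docs for w in d.lower().split()})
    p = term_distribution(baseline_docs, vocab)
    q = term_distribution(period_docs, vocab)
    score = jensenshannon(p, q)     # 0 = identical content, higher = more anomalous
    print(round(float(score), 3))
    ```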

  2. Finding approximate gene clusters with Gecko 3.

    PubMed

    Winter, Sascha; Jahn, Katharina; Wehner, Stefanie; Kuchenbecker, Leon; Marz, Manja; Stoye, Jens; Böcker, Sebastian

    2016-11-16

    Gene-order-based comparison of multiple genomes provides signals for functional analysis of genes and the evolutionary process of genome organization. Gene clusters are regions of co-localized genes on genomes of different species. The rapid increase in sequenced genomes necessitates bioinformatics tools for finding gene clusters in hundreds of genomes. Existing tools are often restricted to a few (in many cases, only two) genomes, and often make restrictive assumptions such as short perfect conservation, conserved gene order or monophyletic gene clusters. We present Gecko 3, an open-source software for finding gene clusters in hundreds of bacterial genomes that comes with an easy-to-use graphical user interface. The underlying gene cluster model is intuitive, can cope with low degrees of conservation as well as misannotations, and is complemented by a sound statistical evaluation. To evaluate the biological benefit of Gecko 3 and to exemplify our method, we search for gene clusters in a dataset of 678 bacterial genomes using Synechocystis sp. PCC 6803 as a reference. We confirm the detected gene clusters by reviewing the literature and comparing them to a database of operons; we detect two novel clusters, which were confirmed by publicly available experimental RNA-Seq data. The computational analysis is carried out on a laptop computer in <40 min. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  3. A system for learning statistical motion patterns.

    PubMed

    Hu, Weiming; Xiao, Xuejuan; Fu, Zhouyu; Xie, Dan; Tan, Tieniu; Maybank, Steve

    2006-09-01

    Analysis of motion patterns is an effective approach for anomaly detection and behavior prediction. Current approaches for the analysis of motion patterns depend on known scenes, where objects move in predefined ways. It is highly desirable to automatically construct object motion patterns which reflect the knowledge of the scene. In this paper, we present a system for automatically learning motion patterns for anomaly detection and behavior prediction based on a proposed algorithm for robustly tracking multiple objects. In the tracking algorithm, foreground pixels are clustered using a fast accurate fuzzy K-means algorithm. Growing and prediction of the cluster centroids of foreground pixels ensure that each cluster centroid is associated with a moving object in the scene. In the algorithm for learning motion patterns, trajectories are clustered hierarchically using spatial and temporal information and then each motion pattern is represented with a chain of Gaussian distributions. Based on the learned statistical motion patterns, statistical methods are used to detect anomalies and predict behaviors. Our system is tested using image sequences acquired, respectively, from a crowded real traffic scene and a model traffic scene. Experimental results show the robustness of the tracking algorithm, the efficiency of the algorithm for learning motion patterns, and the encouraging performance of algorithms for anomaly detection and behavior prediction.

  4. Next-generation sequencing for typing and detection of resistance genes: performance of a new commercial method during an outbreak of extended-spectrum-beta-lactamase-producing Escherichia coli.

    PubMed

    Veenemans, J; Overdevest, I T; Snelders, E; Willemsen, I; Hendriks, Y; Adesokan, A; Doran, G; Bruso, S; Rolfe, A; Pettersson, A; Kluytmans, J A J W

    2014-07-01

    Next-generation sequencing (NGS) has the potential to provide typing results and detect resistance genes in a single assay, thus guiding timely treatment decisions and allowing rapid tracking of transmission of resistant clones. We evaluated the performance of a new NGS assay (Hospital Acquired Infection BioDetection System; Pathogenica) during an outbreak of sequence type 131 (ST131) Escherichia coli infections in a nursing home in The Netherlands. The assay was performed on 56 extended-spectrum-beta-lactamase (ESBL) E. coli isolates collected during 2 prevalence surveys (March and May 2013). Typing results were compared to those of amplified fragment length polymorphism (AFLP), whereby we visually assessed the agreement of the BioDetection phylogenetic tree with clusters defined by AFLP. A microarray was considered the gold standard for detection of resistance genes. AFLP identified a large cluster of 31 indistinguishable isolates on adjacent departments, indicating clonal spread. The BioDetection phylogenetic tree showed that all isolates of this outbreak cluster were strongly related, while the further arrangement of the tree also largely agreed with other clusters defined by AFLP. The BioDetection assay detected ESBL genes in all but 1 isolate (sensitivity, 98%) but was unable to discriminate between ESBL and non-ESBL TEM and SHV beta-lactamases or to specify CTX-M genes by group. The performance of the hospital-acquired infection (HAI) BioDetection System for typing of E. coli isolates compared well with the results of AFLP. Its performance with larger collections from different locations, and for typing of other species, was not evaluated and needs further study. Copyright © 2014, American Society for Microbiology. All Rights Reserved.

  5. Detecting Outliers in Factor Analysis Using the Forward Search Algorithm

    ERIC Educational Resources Information Center

    Mavridis, Dimitris; Moustaki, Irini

    2008-01-01

    In this article we extend and implement the forward search algorithm for identifying atypical subjects/observations in factor analysis models. The forward search has been mainly developed for detecting aberrant observations in regression models (Atkinson, 1994) and in multivariate methods such as cluster and discriminant analysis (Atkinson, Riani,…

  6. Hyperspectral remote sensing for advanced detection of early blight (Alternaria solani) disease in potato (Solanum tuberosum) plants

    NASA Astrophysics Data System (ADS)

    Atherton, Daniel

    Early detection of disease and insect infestation within crops and precise application of pesticides can help reduce potential production losses, reduce environmental risk, and reduce the cost of farming. The goal of this study was the advanced detection of early blight (Alternaria solani) in potato (Solanum tuberosum) plants using hyperspectral remote sensing data captured with a handheld spectroradiometer. Hyperspectral reflectance spectra were captured 10 times over five weeks from plants grown to the vegetative and tuber bulking growth stages. The spectra were analyzed using principal component analysis (PCA), spectral change (ratio) analysis, partial least squares (PLS), cluster analysis, and vegetative indices. PCA successfully distinguished more heavily diseased plants from healthy and minimally diseased plants using two principal components. Spectral change (ratio) analysis provided wavelengths (490-510, 640, 665-670, 690, 740-750, and 935 nm) most sensitive to early blight infection followed by ANOVA results indicating a highly significant difference (p < 0.0001) between disease rating group means. In the majority of the experiments, comparisons of diseased plants with healthy plants using Fisher's LSD revealed more heavily diseased plants were significantly different from healthy plants. PLS analysis demonstrated the feasibility of detecting early blight infected plants, finding four optimal factors for raw spectra with the predictor variation explained ranging from 93.4% to 94.6% and the response variation explained ranging from 42.7% to 64.7%. Cluster analysis successfully distinguished healthy plants from all diseased plants except for the most mildly diseased plants, showing clustering analysis was an effective method for detection of early blight. Analysis of the reflectance spectra using the simple ratio (SR) and the normalized difference vegetative index (NDVI) was effective at differentiating all diseased plants from healthy plants, except for the most mildly diseased plants. Of the analysis methods attempted, cluster analysis and vegetative indices were the most promising. The results show the potential of hyperspectral remote sensing for the detection of early blight in potato plants.
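
    The two vegetation indices used in the study have standard definitions, SR = NIR/Red and NDVI = (NIR - Red)/(NIR + Red); the sketch below computes them from a reflectance spectrum, with the red and NIR band centres (670 nm and 800 nm) chosen here only for illustration.

    ```python
    # Sketch of the two vegetation indices used above, computed from a reflectance
    # spectrum. SR = NIR/Red and NDVI = (NIR - Red)/(NIR + Red) are their standard
    # definitions; the band centres (670 nm red, 800 nm NIR) are assumptions.
    import numpy as np

    def band_reflectance(wavelengths_nm, reflectance, centre_nm, half_width_nm=5):
        mask = np.abs(wavelengths_nm - centre_nm) <= half_width_nm
        return reflectance[mask].mean()

    def vegetation_indices(wavelengths_nm, reflectance):
        red = band_reflectance(wavelengths_nm, reflectance, 670)
        nir = band_reflectance(wavelengths_nm, reflectance, 800)
        sr = nir / red
        ndvi = (nir - red) / (nir + red)
        return sr, ndvi

    # Toy spectrum: low red reflectance, high NIR reflectance (healthy vegetation).
    wl = np.arange(400, 1001, 1.0)
    refl = np.where(wl < 700, 0.05, 0.45) + 0.01 * np.random.rand(wl.size)
    print(vegetation_indices(wl, refl))
    ```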

  7. Community detection in sequence similarity networks based on attribute clustering

    DOE PAGES

    Chowdhary, Janamejaya; Loeffler, Frank E.; Smith, Jeremy C.

    2017-07-24

    Networks are powerful tools for the presentation and analysis of interactions in multi-component systems. A commonly studied mesoscopic feature of networks is their community structure, which arises from grouping together similar nodes into one community and dissimilar nodes into separate communities. Here in this paper, the community structure of protein sequence similarity networks is determined with a new method: Attribute Clustering Dependent Communities (ACDC). Sequence similarity has hitherto typically been quantified by the alignment score or its expectation value. However, pair alignments with the same score or expectation value cannot thus be differentiated. To overcome this deficiency, the method constructs, for pair alignments, an extended alignment metric, the link attribute vector, which includes the score and other alignment characteristics. Rescaling components of the attribute vectors qualitatively identifies a systematic variation of sequence similarity within protein superfamilies. The problem of community detection is then mapped to clustering the link attribute vectors, selection of an optimal subset of links and community structure refinement based on the partition density of the network. ACDC-predicted communities are found to be in good agreement with gold standard sequence databases for which the "ground truth" community structures (or families) are known. ACDC is therefore a community detection method for sequence similarity networks based entirely on pair similarity information. A serial implementation of ACDC is available from https://cmb.ornl.gov/resources/developments

  9. Defining syndromes using cattle meat inspection data for syndromic surveillance purposes: a statistical approach with the 2005-2010 data from ten French slaughterhouses.

    PubMed

    Dupuy, Céline; Morignat, Eric; Maugey, Xavier; Vinard, Jean-Luc; Hendrikx, Pascal; Ducrot, Christian; Calavas, Didier; Gay, Emilie

    2013-04-30

    The slaughterhouse is a central processing point for food animals and thus a source of both demographic data (age, breed, sex) and health-related data (reason for condemnation and condemned portions) that are not available through other sources. Using these data for syndromic surveillance is therefore tempting. However, many possible reasons for condemnation and condemned portions exist, making the definition of relevant syndromes challenging. The objective of this study was to determine a typology of cattle with at least one portion of the carcass condemned in order to define syndromes. Multiple factor analysis (MFA) in combination with clustering methods was performed using both health-related data and demographic data. Analyses were performed on 381,186 cattle with at least one portion of the carcass condemned among the 1,937,917 cattle slaughtered in ten French abattoirs. Results of the MFA and clustering methods led to 12 clusters considered stable with respect to year of slaughter and slaughterhouse. One cluster was specific to a disease of public health importance (cysticercosis). Two clusters were linked to the slaughtering process (fecal contamination of heart or lungs and deterioration lesions). Two clusters, respectively characterized by chronic liver lesions and chronic peritonitis, could be linked to diseases of economic importance to farmers. Three clusters could be linked respectively to reticulo-pericarditis, fatty liver syndrome and farmer's lung syndrome, which are related to both diseases of economic importance to farmers and herd management issues. Three clusters, respectively characterized by arthritis, myopathy and Dark Firm Dry (DFD) meat, could notably be linked to animal welfare issues. Finally, one cluster, characterized by bronchopneumonia, could be linked to both animal health and herd management issues. The statistical approach of combining multiple factor analysis with cluster analysis showed its relevance for the detection of syndromes using available large and complex slaughterhouse data. The advantages of this statistical approach are that it i) defines groups of reasons for condemnation based on meat inspection data, ii) helps group reasons for condemnation from a long list of possible reasons for which a consensus among experts could be difficult to reach, and iii) assigns each animal to a single syndrome, which allows the detection of changes in syndrome trends in order to identify unusual patterns in known diseases and the emergence of new diseases.

  10. Distributions of Gas and Galaxies from Galaxy Clusters to Larger Scales

    NASA Astrophysics Data System (ADS)

    Patej, Anna

    2017-01-01

    We address the distributions of gas and galaxies on three scales: the outskirts of galaxy clusters, the clustering of galaxies on large scales, and the extremes of the galaxy distribution. In the outskirts of galaxy clusters, long-standing analytical models of structure formation and recent simulations predict the existence of density jumps in the gas and dark matter profiles. We use these features to derive models for the gas density profile, obtaining a simple fiducial model that is in agreement with both observations of cluster interiors and simulations of the outskirts. We next consider the galaxy density profiles of clusters; under the assumption that the galaxies in cluster outskirts follow similar collisionless dynamics as the dark matter, their distribution should show a steep jump as well. We examine the profiles of a low-redshift sample of clusters and groups, finding evidence for the jump in some of these clusters. Moving to larger scales where massive galaxies of different types are expected to trace the same large-scale structure, we present a test of this prediction by measuring the clustering of red and blue galaxies at z ~ 0.6, finding low stochasticity between the two populations. These results address a key source of systematic uncertainty - understanding how target populations of galaxies trace large-scale structure - in galaxy redshift surveys. Such surveys use baryon acoustic oscillations (BAO) as a cosmological probe, but are limited by the expense of obtaining sufficiently dense spectroscopy. With the intention of leveraging upcoming deep imaging data, we develop a new method of detecting the BAO in sparse spectroscopic samples via cross-correlation with a dense photometric catalog. This method will permit the extension of BAO measurements to higher redshifts than possible with the existing spectroscopy alone. Lastly, we connect galaxies near and far: the Local Group dwarfs and the high redshift galaxies observed by Hubble and Spitzer. We examine how the local dwarfs may have appeared in the past and compare their properties to the detection limits of the upcoming James Webb Space Telescope (JWST), finding that JWST should be able to detect galaxies similar to the progenitors of a few of the brightest of the local galaxies, revealing a hitherto unobserved population of galaxies at high redshifts.

  11. Clustering analysis of water distribution systems: identifying critical components and community impacts.

    PubMed

    Diao, K; Farmani, R; Fu, G; Astaraie-Imani, M; Ward, S; Butler, D

    2014-01-01

    Large water distribution systems (WDSs) are networks with both topological and behavioural complexity. It is therefore usually difficult to identify the key features of the system's properties, and subsequently all the critical components within the system, for a given design or control purpose. One way forward, however, is to visualize the network structure and the interactions between components more explicitly by dividing a WDS into a number of clusters (subsystems). Accordingly, this paper introduces a clustering strategy that decomposes WDSs into clusters with stronger internal connections than external connections. The detected cluster layout is very similar to the community structure of the served urban area. As WDSs may expand along with urban development in a community-by-community manner, the correspondingly formed distribution clusters may reveal some crucial configurations of WDSs. For verification, the method is applied to identify all the critical links during firefighting for the vulnerability analysis of a real-world WDS. Moreover, both the most critical pipes and clusters are addressed, given the consequences of pipe failure. Compared with the enumeration method, the method used in this study identifies the same group of the most critical components, and provides similar criticality prioritizations of them in a more computationally efficient manner.
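
    A hedged sketch of the decomposition idea is given below, using modularity-based community detection on a pipe-network graph as a stand-in for the paper's own clustering strategy: clusters have stronger internal than external connections, and pipes joining different clusters are natural candidates for critical links.

    ```python
    # Hedged sketch: decompose a water distribution network into clusters with
    # stronger internal than external connections, here via modularity-based
    # community detection (a stand-in for the paper's own clustering strategy).
    import networkx as nx
    from networkx.algorithms.community import greedy_modularity_communities

    # Toy network: junction nodes connected by pipes (edge weights ~ pipe capacity).
    g = nx.Graph()
    g.add_weighted_edges_from([
        ("j1", "j2", 3.0), ("j2", "j3", 2.5), ("j1", "j3", 2.0),   # district A
        ("j4", "j5", 3.0), ("j5", "j6", 2.5), ("j4", "j6", 2.0),   # district B
        ("j3", "j4", 0.5),                                         # weak inter-district pipe
    ])

    clusters = greedy_modularity_communities(g, weight="weight")
    for i, c in enumerate(clusters):
        print(f"cluster {i}: {sorted(c)}")

    # Pipes whose endpoints fall in different clusters are candidate critical links.
    membership = {n: i for i, c in enumerate(clusters) for n in c}
    critical = [(u, v) for u, v in g.edges if membership[u] != membership[v]]
    print("critical links:", critical)
    ```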

  12. Linear-array-based photoacoustic tomography for label-free high-throughput detection and quantification of circulating melanoma tumor cell clusters

    NASA Astrophysics Data System (ADS)

    Hai, Pengfei; Zhou, Yong; Zhang, Ruiying; Ma, Jun; Li, Yang; Wang, Lihong V.

    2017-03-01

    Circulating tumor cell (CTC) clusters arise from multicellular grouping in the primary tumor and elevate the metastatic potential by 23- to 50-fold compared to single CTCs. High-throughput detection and quantification of CTC clusters is critical for understanding the tumor metastasis process and improving cancer therapy. In this work, we report a linear-array-based photoacoustic tomography (LA-PAT) system capable of label-free high-throughput CTC cluster detection and quantification in vivo. LA-PAT detects CTC clusters and quantifies the number of cells in them based on the contrast-to-noise ratios (CNRs) of the photoacoustic signals. The feasibility of LA-PAT was first demonstrated by imaging CTC clusters ex vivo. LA-PAT detected CTC clusters in blood-filled microtubes and computed the number of cells in the clusters. The size distribution of the CTC clusters measured by LA-PAT agreed well with that obtained by optical microscopy. We demonstrated the ability of LA-PAT to detect and quantify CTC clusters in vivo by imaging injected CTC clusters in rat tail veins. LA-PAT detected CTC clusters immediately after injection as well as while they were circulating in the rat bloodstream. Similarly, the numbers of cells in the clusters were computed based on the CNRs of the photoacoustic signals. The data showed that larger CTC clusters disappear faster than the smaller ones. The results demonstrate the potential of LA-PAT as a promising tool for both preclinical tumor metastasis studies and clinical cancer therapy evaluation.

  13. The cosmological analysis of X-ray cluster surveys. III. 4D X-ray observable diagrams

    NASA Astrophysics Data System (ADS)

    Pierre, M.; Valotti, A.; Faccioli, L.; Clerc, N.; Gastaud, R.; Koulouridis, E.; Pacaud, F.

    2017-11-01

    Context. Despite compelling theoretical arguments, the use of clusters as cosmological probes is, in practice, frequently questioned because of the many uncertainties surrounding cluster-mass estimates. Aims: Our aim is to develop a fully self-consistent cosmological approach to X-ray cluster surveys, exclusively based on observable quantities rather than masses. This procedure is justified by the possibility of directly deriving the cluster properties via ab initio modelling, either analytically or by using hydrodynamical simulations. In this third paper, we evaluate the method on cluster toy-catalogues. Methods: We model the population of detected clusters in the count-rate - hardness-ratio - angular size - redshift space and compare the corresponding four-dimensional diagram with theoretical predictions. The best cosmology+physics parameter configuration is determined using a simple minimisation procedure; errors on the parameters are estimated by averaging the results from ten independent survey realisations. The method allows a simultaneous fit of the cosmological parameters, the cluster evolutionary physics and the selection effects. Results: When using information from the X-ray survey alone plus redshifts, this approach is shown to be as accurate as the modelling of the mass function for the cosmological parameters and to perform better for the cluster physics, for a similar level of assumptions on the scaling relations. It enables the identification of degenerate combinations of parameter values. Conclusions: Given the considerably shorter computer times involved in running the minimisation procedure in the observed parameter space, this method appears to clearly outperform traditional mass-based approaches when X-ray survey data alone are available.

  14. Functional magnetic resonance imaging activation detection: fuzzy cluster analysis in wavelet and multiwavelet domains.

    PubMed

    Jahanian, Hesamoddin; Soltanian-Zadeh, Hamid; Hossein-Zadeh, Gholam-Ali

    2005-09-01

    To present novel feature spaces, based on multiscale decompositions obtained by scalar wavelet and multiwavelet transforms, to remedy problems associated with high dimension of functional magnetic resonance imaging (fMRI) time series (when they are used directly in clustering algorithms) and their poor signal-to-noise ratio (SNR) that limits accurate classification of fMRI time series according to their activation contents. Using randomization, the proposed method finds wavelet/multiwavelet coefficients that represent the activation content of fMRI time series and combines them to define new feature spaces. Using simulated and experimental fMRI data sets, the proposed feature spaces are compared to the cross-correlation (CC) feature space and their performances are evaluated. In these studies, the false positive detection rate is controlled using randomization. To compare different methods, several points of the receiver operating characteristics (ROC) curves, using simulated data, are estimated and compared. The proposed features suppress the effects of confounding signals and improve activation detection sensitivity. Experimental results show improved sensitivity and robustness of the proposed method compared to the conventional CC analysis. More accurate and sensitive activation detection can be achieved using the proposed feature spaces compared to CC feature space. Multiwavelet features show superior detection sensitivity compared to the scalar wavelet features. (c) 2005 Wiley-Liss, Inc.
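
    A minimal sketch of building a multiscale feature vector for a single fMRI time series with PyWavelets is shown below; the randomisation-based selection of activation-related coefficients described above is omitted, and the wavelet, decomposition level and retained levels are assumptions.

    ```python
    # Hedged sketch: build a multiscale feature vector for one fMRI time series by
    # a discrete wavelet decomposition (PyWavelets). The paper's randomisation step
    # for selecting activation-related coefficients is omitted; the wavelet ('db4'),
    # the decomposition level and keeping only the coarsest levels are assumptions.
    import numpy as np
    import pywt

    def wavelet_features(time_series, wavelet="db4", level=3):
        coeffs = pywt.wavedec(time_series, wavelet, level=level)
        # coeffs = [approximation, detail_level, ..., detail_1]; keep the two
        # coarsest bands as a reduced multiscale feature vector.
        return np.concatenate(coeffs[:2])

    # Toy voxel time series: a boxcar "activation" plus noise, 128 time points.
    rng = np.random.default_rng(0)
    t = np.arange(128)
    signal = 0.8 * ((t // 16) % 2) + rng.normal(0, 0.5, t.size)
    feats = wavelet_features(signal)
    print(feats.shape)   # fewer features than the 128 original time points
    ```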

  15. Deducing the Milky Way's Massive Cluster Population

    NASA Astrophysics Data System (ADS)

    Hanson, M. M.; Popescu, B.; Larsen, S. S.; Ivanov, V. D.

    2010-11-01

    Recent near-infrared surveys of the galactic plane have been used to identify new massive cluster candidates. Follow up study indicates about half are not true, gravitationally-bound clusters. These false positives are created by high density fields of unassociated stars, often due to a sight-line of reduced extinction. What is not so easy to estimate is the number of false negatives, clusters which exist but are not currently being detected by our surveys. In order to derive critical characteristics of the Milky Way's massive cluster population, such as cluster mass function and cluster lifetimes, one must be able to estimate the characteristics of these false negatives. Our group has taken on the daunting task of attempting such an estimate by first creating the stellar cluster imaging simulation program, MASSCLEAN. I will present our preliminary models and methods for deriving the biases of current searches.

  16. Clustering analysis of line indices for LAMOST spectra with AstroStat

    NASA Astrophysics Data System (ADS)

    Chen, Shu-Xin; Sun, Wei-Min; Yan, Qi

    2018-06-01

    The application of data mining in astronomical surveys, such as the Large Sky Area Multi-Object Fiber Spectroscopic Telescope (LAMOST) survey, provides an effective approach to automatically analyzing a large amount of complex survey data. Unsupervised clustering can help astronomers find associations and outliers in a big data set. In this paper, we employ the k-means method to perform clustering of the line indices of LAMOST spectra with the powerful software AstroStat. Implementing the line index approach for analyzing astronomical spectra is an effective way to extract spectral features from low-resolution spectra, which can represent the main spectral characteristics of stars. A total of 144 340 line indices for A-type stars is analyzed by calculating the intra- and inter-class distances between pairs of stars. For the intra-class distance, we use the Mahalanobis distance to explore the degree of clustering of each class, while for outlier detection we define a local outlier factor for each spectrum. AstroStat furnishes a set of visualization tools for illustrating the analysis results. Checking the spectra detected as outliers, we find that most of them are problematic data and only a few correspond to rare astronomical objects. We show two examples of these outliers: a spectrum with an abnormal continuum and a spectrum with emission lines. Our work demonstrates that line index clustering is a good method for examining data quality and identifying rare objects.
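
    A rough equivalent of this workflow can be sketched with scikit-learn, assuming synthetic line-index vectors in place of the LAMOST measurements and of AstroStat's own implementation: k-means for the clustering and a local outlier factor for flagging anomalous spectra.

    ```python
    # Hedged sketch of the same workflow with scikit-learn: k-means clustering of
    # line-index vectors plus a local outlier factor per spectrum. The synthetic
    # line indices stand in for the LAMOST measurements analysed with AstroStat.
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.neighbors import LocalOutlierFactor

    rng = np.random.default_rng(1)
    # Two nominal groups of A-type spectra in a 5-dimensional line-index space,
    # plus a few aberrant spectra (e.g. bad continuum or emission-line objects).
    normal = np.vstack([rng.normal(0.0, 0.3, (500, 5)),
                        rng.normal(2.0, 0.3, (500, 5))])
    outliers = rng.normal(6.0, 0.5, (5, 5))
    indices = np.vstack([normal, outliers])

    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(indices)
    lof = LocalOutlierFactor(n_neighbors=20).fit_predict(indices)   # -1 flags outliers

    print("cluster sizes:", np.bincount(labels))
    print("flagged outliers:", np.flatnonzero(lof == -1))
    ```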

  17. Cancer Detection in Microarray Data Using a Modified Cat Swarm Optimization Clustering Approach

    PubMed

    M, Pandi; R, Balamurugan; N, Sadhasivam

    2017-12-29

    Objective: A better understanding of functional genomics can be obtained by extracting patterns hidden in gene expression data. This could have paramount implications for cancer diagnosis, gene treatments and other domains. Clustering may reveal natural structures and identify interesting patterns in underlying data. The main objective of this research was to derive a heuristic approach to detection of highly co-expressed genes related to cancer from gene expression data with minimum Mean Squared Error (MSE). Methods: A modified CSO algorithm using Harmony Search (MCSO-HS) for clustering cancer gene expression data was applied. Experimental results were analyzed using two cancer gene expression benchmark datasets, namely for leukaemia and for breast cancer. Result: MCSO-HS improved on HS and CSO by 13% and 9%, respectively, with the leukaemia dataset, and by 22% and 17%, respectively, with the breast cancer dataset, in terms of MSE. Conclusion: The results showed MCSO-HS to outperform HS and CSO with both benchmark datasets. To validate the clustering results, this work was tested with internal and external cluster validation indices. The work also includes biological validation of the clusters with gene ontology in terms of function, process and component. Creative Commons Attribution License

  18. Using an innovative multiple regression procedure in a cancer population (Part 1): detecting and probing relationships of common interacting symptoms (pain, fatigue/weakness, sleep problems) as a strategy to discover influential symptom pairs and clusters

    PubMed Central

    Francoeur, Richard B

    2015-01-01

    Background The majority of patients with advanced cancer experience symptom pairs or clusters among pain, fatigue, and insomnia. Improved methods are needed to detect and interpret interactions among symptoms or disease markers to reveal influential pairs or clusters. In prior work, I developed and validated sequential residual centering (SRC), a method that improves the sensitivity of multiple regression to detect interactions among predictors, by conditioning for multicollinearity (shared variation) among interactions and component predictors. Materials and methods Using a hypothetical three-way interaction among pain, fatigue, and sleep to predict depressive affect, I derive and explain SRC multiple regression. Subsequently, I estimate raw and SRC multiple regressions using real data for these symptoms from 268 palliative radiation outpatients. Results Unlike raw regression, SRC reveals that the three-way interaction (pain × fatigue/weakness × sleep problems) is statistically significant. In follow-up analyses, the relationship between pain and depressive affect is aggravated (magnified) within two partial ranges: 1) complete-to-some control over fatigue/weakness when there is complete control over sleep problems (ie, a subset of the pain–fatigue/weakness symptom pair), and 2) no control over fatigue/weakness when there is some-to-no control over sleep problems (ie, a subset of the pain–fatigue/weakness–sleep problems symptom cluster). Otherwise, the relationship weakens (buffering) as control over fatigue/weakness or sleep problems diminishes. Conclusion By reducing the standard error, SRC unmasks a three-way interaction comprising a symptom pair and cluster. Low-to-moderate levels of the moderator variable for fatigue/weakness magnify the relationship between pain and depressive affect. However, when the comoderator variable for sleep problems accompanies fatigue/weakness, only frequent or unrelenting levels of both symptoms magnify the relationship. These findings suggest that a countervailing mechanism involving depressive affect could account for the effectiveness of a cognitive behavioral intervention to reduce the severity of a pain, fatigue, and sleep disturbance cluster in a previous randomized trial. PMID:25565865
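
    For orientation, the following is a minimal sketch of plain residual centering, the building block that the sequential method described above extends; it is not the author's full SRC procedure. The simulated symptom variables, effect sizes, and sample size are illustrative assumptions.

```python
# Sketch: residual-center a three-way interaction so it is orthogonal to lower-order terms.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 268
pain, fatigue, sleep = rng.normal(size=(3, n))
depress = (0.4 * pain + 0.3 * fatigue + 0.2 * sleep
           + 0.3 * pain * fatigue * sleep + rng.normal(size=n))

two_way = np.column_stack([pain * fatigue, pain * sleep, fatigue * sleep])
lower = np.column_stack([pain, fatigue, sleep, two_way])        # all lower-order terms
raw_three_way = pain * fatigue * sleep

# Regress the raw product on the lower-order terms and keep the residual, which is
# uncorrelated with them and therefore less affected by multicollinearity.
resid_three_way = sm.OLS(raw_three_way, sm.add_constant(lower)).fit().resid

X = sm.add_constant(np.column_stack([lower, resid_three_way]))
fit = sm.OLS(depress, X).fit()
print("centered three-way coefficient:", fit.params[-1], "p =", fit.pvalues[-1])
```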

  19. Micro-heterogeneity versus clustering in binary mixtures of ethanol with water or alkanes.

    PubMed

    Požar, Martina; Lovrinčević, Bernarda; Zoranić, Larisa; Primorać, Tomislav; Sokolić, Franjo; Perera, Aurélien

    2016-08-24

    Ethanol is a hydrogen bonding liquid. When mixed in small concentrations with water or alkanes, it forms aggregate structures reminiscent of, respectively, the direct and inverse micellar aggregates found in emulsions, albeit at much smaller sizes. At higher concentrations, micro-heterogeneous mixing with segregated domains is found. We examine how different statistical methods, namely correlation function analysis, structure factor analysis and cluster distribution analysis, can efficiently describe these morphological changes. In particular, we explain how the neat alcohol pre-peak of the structure factor evolves into the domain pre-peak under mixing conditions, and how this evolution differs depending on whether the co-solvent is water or an alkane. This study clearly establishes the heuristic superiority of the correlation function/structure factor analysis for studying micro-heterogeneity, since cluster distribution analysis is insensitive to domain segregation. Correlation functions detect the domains, with a clear structure factor pre-peak signature, while the cluster techniques detect the cluster hierarchy within domains. The main conclusion is that, in micro-segregated mixtures, the domain structure is a more fundamental statistical entity than the underlying cluster structures. These findings could help in comparing radiation scattering experiments, which are sensitive to domains, with spectroscopy/NMR experiments, which are sensitive to clusters.

  20. Improving cluster-based missing value estimation of DNA microarray data.

    PubMed

    Brás, Lígia P; Menezes, José C

    2007-06-01

    We present a modification of the weighted K-nearest neighbours imputation method (KNNimpute) for missing values (MVs) estimation in microarray data based on the reuse of estimated data. The method was called iterative KNN imputation (IKNNimpute) as the estimation is performed iteratively using the recently estimated values. The estimation efficiency of IKNNimpute was assessed under different conditions (data type, fraction and structure of missing data) by the normalized root mean squared error (NRMSE) and the correlation coefficients between estimated and true values, and compared with that of other cluster-based estimation methods (KNNimpute and sequential KNN). We further investigated the influence of imputation on the detection of differentially expressed genes using SAM by examining the differentially expressed genes that are lost after MV estimation. The performance measures give consistent results, indicating that the iterative procedure of IKNNimpute can enhance the prediction ability of cluster-based methods in the presence of high missing rates, in non-time series experiments and in data sets comprising both time series and non-time series data, because the information of the genes having MVs is used more efficiently and the iterative procedure allows refining the MV estimates. More importantly, IKNN has a smaller detrimental effect on the detection of differentially expressed genes.
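
    The snippet below is a minimal, illustrative re-implementation of the iterative idea behind IKNNimpute, not the authors' code: after an initial KNN fill, each originally-missing entry is re-estimated from neighbours found in the current completed matrix, and the loop repeats until the estimates stabilise. The data dimensions, missing rate, neighbour count, and convergence tolerance are assumptions; NRMSE is used to monitor accuracy as in the record above.

```python
# Sketch: iterative KNN imputation of a microarray-like matrix.
import numpy as np
from sklearn.impute import KNNImputer

rng = np.random.default_rng(3)
true = rng.normal(size=(300, 20))                      # stand-in for expression data
mask = rng.random(true.shape) < 0.10                   # 10% missing values
observed = np.where(mask, np.nan, true)

completed = KNNImputer(n_neighbors=10).fit_transform(observed)   # initial estimate

for _ in range(5):
    previous = completed.copy()
    for j in range(completed.shape[1]):
        col_missing = mask[:, j]
        if not col_missing.any():
            continue
        # Neighbours are found in the completed matrix, so earlier estimates are reused.
        others = np.delete(completed, j, axis=1)
        dists = np.linalg.norm(others[col_missing, None, :] - others[None, ~col_missing, :], axis=2)
        nn = np.argsort(dists, axis=1)[:, :10]
        completed[col_missing, j] = completed[~col_missing, j][nn].mean(axis=1)
    if np.abs(completed - previous).max() < 1e-4:      # stop once estimates stabilise
        break

nrmse = np.sqrt(np.mean((completed[mask] - true[mask]) ** 2)) / true[mask].std()
print(f"NRMSE after iterative KNN imputation: {nrmse:.3f}")
```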

  1. A Saliency Guided Semi-Supervised Building Change Detection Method for High Resolution Remote Sensing Images

    PubMed Central

    Hou, Bin; Wang, Yunhong; Liu, Qingjie

    2016-01-01

    Characterization of up-to-date information on the Earth’s surface is an important application, providing insights for urban planning, resources monitoring and environmental studies. A large number of change detection (CD) methods have been developed to address this task by utilizing remote sensing (RS) images. The advent of high resolution (HR) remote sensing images poses further challenges to traditional CD methods and opportunities for object-based CD methods. While several kinds of geospatial objects can be recognized, this manuscript mainly focuses on buildings. Specifically, we propose a novel automatic approach combining pixel-based strategies with object-based ones for detecting building changes with HR remote sensing images. A multiresolution contextual morphological transformation called extended morphological attribute profiles (EMAPs) allows the extraction of geometrical features related to the structures within the scene at different scales. Pixel-based post-classification is executed on EMAPs using hierarchical fuzzy clustering. Subsequently, hierarchical fuzzy frequency vector histograms are formed based on the image-objects acquired by simple linear iterative clustering (SLIC) segmentation. Then, saliency and the morphological building index (MBI) extracted on difference images are used to generate a pseudo training set. Ultimately, object-based semi-supervised classification is implemented on this training set by applying random forest (RF). Most of the important changes are detected by the proposed method in our experiments. This study was checked for effectiveness using visual evaluation and numerical evaluation. PMID:27618903
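
    The sketch below illustrates only the final stage of the pipeline described above: SLIC superpixels computed on a difference image and a random forest trained on a pseudo training set. The saliency, EMAP, fuzzy-clustering and MBI steps are omitted, and the pseudo labels here are synthetic placeholders rather than the paper's saliency/MBI-derived ones.

```python
# Sketch: superpixel (SLIC) segmentation + random forest classification of superpixels.
import numpy as np
from skimage.segmentation import slic
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(4)
diff_image = rng.random((256, 256, 3))                      # stand-in for a change-magnitude image

segments = slic(diff_image, n_segments=400, compactness=10, start_label=0)

# One feature vector per superpixel: the mean colour of its pixels.
features = np.array([diff_image[segments == s].mean(axis=0) for s in np.unique(segments)])

# Pseudo training set: in the paper it comes from saliency + MBI; here it is faked.
pseudo_labels = (features.mean(axis=1) > np.median(features.mean(axis=1))).astype(int)
train_idx = rng.choice(len(features), size=len(features) // 3, replace=False)

rf = RandomForestClassifier(n_estimators=200, random_state=0)
rf.fit(features[train_idx], pseudo_labels[train_idx])

superpixel_pred = rf.predict(features)
change_map = superpixel_pred[segments]                      # per-pixel change map via superpixels
print("change map shape:", change_map.shape)
print("superpixels flagged as changed:", int((superpixel_pred == 1).sum()))
```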

  2. A Saliency Guided Semi-Supervised Building Change Detection Method for High Resolution Remote Sensing Images.

    PubMed

    Hou, Bin; Wang, Yunhong; Liu, Qingjie

    2016-08-27

    Characterization of up-to-date information on the Earth's surface is an important application, providing insights for urban planning, resources monitoring and environmental studies. A large number of change detection (CD) methods have been developed to address this task by utilizing remote sensing (RS) images. The advent of high resolution (HR) remote sensing images poses further challenges to traditional CD methods and opportunities for object-based CD methods. While several kinds of geospatial objects can be recognized, this manuscript mainly focuses on buildings. Specifically, we propose a novel automatic approach combining pixel-based strategies with object-based ones for detecting building changes with HR remote sensing images. A multiresolution contextual morphological transformation called extended morphological attribute profiles (EMAPs) allows the extraction of geometrical features related to the structures within the scene at different scales. Pixel-based post-classification is executed on EMAPs using hierarchical fuzzy clustering. Subsequently, hierarchical fuzzy frequency vector histograms are formed based on the image-objects acquired by simple linear iterative clustering (SLIC) segmentation. Then, saliency and the morphological building index (MBI) extracted on difference images are used to generate a pseudo training set. Ultimately, object-based semi-supervised classification is implemented on this training set by applying random forest (RF). Most of the important changes are detected by the proposed method in our experiments. This study was checked for effectiveness using visual evaluation and numerical evaluation.

  3. Intersection Detection Based on Qualitative Spatial Reasoning on Stopping Point Clusters

    NASA Astrophysics Data System (ADS)

    Zourlidou, S.; Sester, M.

    2016-06-01

    The purpose of this research is to propose and test a method for detecting intersections by analysing collectively acquired trajectories of moving vehicles. Instead of solely relying on the geometric features of the trajectories, such as heading changes, which may indicate turning points and consequently intersections, we extract semantic features of the trajectories in form of sequences of stops and moves. Under this spatiotemporal prism, the extracted semantic information which indicates where vehicles stop can reveal important locations, such as junctions. The advantage of the proposed approach in comparison with existing turning-points oriented approaches is that it can detect intersections even when not all the crossing road segments are sampled and therefore no turning points are observed in the trajectories. The challenge with this approach is, first, that not all vehicles stop at the same location, so the stop location is blurred along the direction of the road; this, in turn, means that nearby junctions can induce similar stop locations. As a first step, a density-based clustering is applied on the layer of stop observations and clusters of stop events are found. Representative points of the clusters are determined (one per cluster) and in a final step the existence of an intersection is clarified based on spatial relational cluster reasoning, with which less informative geospatial clusters, in terms of whether a junction exists and where its centre lies, are transformed into more informative ones. Relational reasoning criteria, based on the relative orientation of the clusters with their adjacent ones, are discussed for making sense of the relation that connects them, and finally for forming groups of stop events that belong to the same junction.
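
    A minimal sketch of the first step described above (density-based clustering of stop events and their representative points); the spatial relational reasoning stage is not reproduced here. The coordinates, noise model, and DBSCAN parameters are illustrative assumptions.

```python
# Sketch: DBSCAN clustering of vehicle stop locations around candidate junctions.
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(5)
junctions = np.array([[0.0, 0.0], [200.0, 0.0], [0.0, 150.0]])       # metres, local frame
stops = np.vstack([j + rng.normal(scale=8.0, size=(60, 2)) for j in junctions])
stops = np.vstack([stops, rng.uniform(-50, 250, size=(30, 2))])      # sporadic mid-block stops

db = DBSCAN(eps=15.0, min_samples=10).fit(stops)                     # eps in metres

for label in sorted(set(db.labels_) - {-1}):
    members = stops[db.labels_ == label]
    centre = members.mean(axis=0)                                    # representative point
    print(f"stop cluster {label}: {len(members)} stop events, centre {centre.round(1)}")
print("noise stop events:", int((db.labels_ == -1).sum()))
```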

  4. SCPS: a fast implementation of a spectral method for detecting protein families on a genome-wide scale.

    PubMed

    Nepusz, Tamás; Sasidharan, Rajkumar; Paccanaro, Alberto

    2010-03-09

    An important problem in genomics is the automatic inference of groups of homologous proteins from pairwise sequence similarities. Several approaches have been proposed for this task which are "local" in the sense that they assign a protein to a cluster based only on the distances between that protein and the other proteins in the set. It was shown recently that global methods such as spectral clustering have better performance on a wide variety of datasets. However, currently available implementations of spectral clustering methods mostly consist of a few loosely coupled Matlab scripts that assume a fair amount of familiarity with Matlab programming and hence they are inaccessible for large parts of the research community. SCPS (Spectral Clustering of Protein Sequences) is an efficient and user-friendly implementation of a spectral method for inferring protein families. The method uses only pairwise sequence similarities, and is therefore practical when only sequence information is available. SCPS was tested on difficult sets of proteins whose relationships were extracted from the SCOP database, and its results were extensively compared with those obtained using other popular protein clustering algorithms such as TribeMCL, hierarchical clustering and connected component analysis. We show that SCPS is able to identify many of the family/superfamily relationships correctly and that the quality of the obtained clusters as indicated by their F-scores is consistently better than all the other methods we compared it with. We also demonstrate the scalability of SCPS by clustering the entire SCOP database (14,183 sequences) and the complete genome of the yeast Saccharomyces cerevisiae (6,690 sequences). Besides the spectral method, SCPS also implements connected component analysis and hierarchical clustering, it integrates TribeMCL, it provides different cluster quality tools, it can extract human-readable protein descriptions using GI numbers from NCBI, it interfaces with external tools such as BLAST and Cytoscape, and it can produce publication-quality graphical representations of the clusters obtained, thus constituting a comprehensive and effective tool for practical research in computational biology. Source code and precompiled executables for Windows, Linux and Mac OS X are freely available at http://www.paccanarolab.org/software/scps.
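
    As a minimal sketch of the core technique (spectral clustering on a precomputed pairwise similarity matrix), not the SCPS implementation itself, the snippet below builds a toy block-structured similarity matrix standing in for sequence similarities and recovers the "families". The family sizes and similarity ranges are illustrative assumptions.

```python
# Sketch: spectral clustering of a precomputed protein-protein similarity matrix.
import numpy as np
from sklearn.cluster import SpectralClustering

rng = np.random.default_rng(6)
sizes = [40, 30, 30]                                   # three "protein families"
n = sum(sizes)
similarity = rng.uniform(0.0, 0.2, size=(n, n))        # weak background similarity
start = 0
for s in sizes:                                        # strong within-family similarity
    similarity[start:start + s, start:start + s] = rng.uniform(0.6, 1.0, size=(s, s))
    start += s
similarity = (similarity + similarity.T) / 2           # symmetrise
np.fill_diagonal(similarity, 1.0)

labels = SpectralClustering(n_clusters=3, affinity="precomputed",
                            random_state=0).fit_predict(similarity)
print("family sizes recovered:", np.bincount(labels))
```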

  5. Performance and evaluation of real-time multicomputer control systems

    NASA Technical Reports Server (NTRS)

    Shin, K. G.

    1983-01-01

    New performance measures, detailed examples, modeling of error detection process, performance evaluation of rollback recovery methods, experiments on FTMP, and optimal size of an NMR cluster are discussed.

  6. Image reconstruction of muon tomographic data using a density-based clustering method

    NASA Astrophysics Data System (ADS)

    Perry, Kimberly B.

    Muons are subatomic particles capable of reaching the Earth's surface before decaying. When these particles collide with an object that has a high atomic number (Z), their path of travel changes substantially. Tracking muon movement through shielded containers can indicate what types of materials lie inside. This thesis proposes using a density-based clustering algorithm called OPTICS to perform image reconstructions using muon tomographic data. The results show that this method is capable of detecting high-Z materials quickly, and can also produce detailed reconstructions with large amounts of data.
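
    A minimal sketch of the clustering step named above: OPTICS applied to simulated 3D points where large scattering events concentrate around a high-Z object. The geometry, point counts, and OPTICS parameters are illustrative assumptions, not taken from the thesis.

```python
# Sketch: OPTICS density-based clustering of simulated muon scattering points.
import numpy as np
from sklearn.cluster import OPTICS

rng = np.random.default_rng(7)
high_z_block = rng.normal(loc=[0.0, 0.0, 0.0], scale=2.0, size=(300, 3))   # dense scattering region
background = rng.uniform(-50, 50, size=(500, 3))                           # low-Z / empty volume
points = np.vstack([high_z_block, background])

optics = OPTICS(min_samples=20, xi=0.05).fit(points)
labels = optics.labels_
for label in sorted(set(labels) - {-1}):
    members = points[labels == label]
    print(f"candidate high-Z region {label}: {len(members)} points, "
          f"centroid {members.mean(axis=0).round(1)}")
```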

  7. Brimstone chemistry under laser light assists mass spectrometric detection and imaging the distribution of arsenic in minerals.

    PubMed

    Lal, Swapnil; Zheng, Zhaoyu; Pavlov, Julius; Attygalle, Athula B

    2018-05-23

    Singly charged As2n+1 ion clusters (n = 2-11) were generated from elemental arsenic by negative-ion laser-ablation mass spectrometry. The overall abundance of the gaseous As ions generated upon laser irradiation was enhanced nearly a hundred times when As-bearing samples were admixed with sulfur. However, sulfur does not act purely as an inert matrix: irradiating arsenic-sulfur mixtures revealed a novel pathway to generate and detect a series of [AsSn]- clusters (n = 2-6). Intriguingly, the spectra recorded from As2O3, NaAsO2, Na3AsO4, cacodylic acid and 3-amino-4-hydroxyphenylarsonic acid together with sulfur as the matrix were remarkably similar to that acquired from an elemental arsenic and sulfur mixture. This result indicated that arsenic sulfide cluster-ions are generated directly from arsenic compounds by a hitherto unknown pathway. The mechanism of elemental sulfur extracting chemically bound arsenic from compounds and forming [AsSn]- clusters is enigmatic; however, this discovery has a practical value as a general detection method for arsenic compounds. For example, the method was employed for the detection of As in its minerals, and for the imaging of arsenic distribution in minerals such as domeykite. LDI-MS data recorded from a latent image imprinted on a piece of paper from a flat mineral surface, and wetting the paper with a solution of sulfur, enabled the localization of arsenic in the mineral. The distribution of As was visualized as false-color images by extracting from acquired data the relative intensities of m/z 139 (AsS2-) and m/z 171 (AsS3-) ions.

  8. Hierarchical Kohonen net for anomaly detection in network security.

    PubMed

    Sarasamma, Suseela T; Zhu, Qiuming A; Huff, Julie

    2005-04-01

    A novel multilevel hierarchical Kohonen Net (K-Map) for an intrusion detection system is presented. Each level of the hierarchical map is modeled as a simple winner-take-all K-Map. One significant advantage of this multilevel hierarchical K-Map is its computational efficiency. Unlike other statistical anomaly detection methods such as nearest neighbor approach, K-means clustering or probabilistic analysis that employ distance computation in the feature space to identify the outliers, our approach does not involve costly point-to-point computation in organizing the data into clusters. Another advantage is the reduced network size. We use the classification capability of the K-Map on selected dimensions of data set in detecting anomalies. Randomly selected subsets that contain both attacks and normal records from the KDD Cup 1999 benchmark data are used to train the hierarchical net. We use a confidence measure to label the clusters. Then we use the test set from the same KDD Cup 1999 benchmark to test the hierarchical net. We show that a hierarchical K-Map in which each layer operates on a small subset of the feature space is superior to a single-layer K-Map operating on the whole feature space in detecting a variety of attacks in terms of detection rate as well as false positive rate.

  9. A novel peak detection approach with chemical noise removal using short-time FFT for prOTOF MS data.

    PubMed

    Zhang, Shuqin; Wang, Honghui; Zhou, Xiaobo; Hoehn, Gerard T; DeGraba, Thomas J; Gonzales, Denise A; Suffredini, Anthony F; Ching, Wai-Ki; Ng, Michael K; Wong, Stephen T C

    2009-08-01

    Peak detection is a pivotal first step in biomarker discovery from MS data and can significantly influence the results of downstream data analysis steps. We developed a novel automatic peak detection method for prOTOF MS data, which does not require a priori knowledge of protein masses. Random noise is removed by an undecimated wavelet transform and chemical noise is attenuated by an adaptive short-time discrete Fourier transform. Isotopic peaks corresponding to a single protein are combined by extracting an envelope over them. Depending on the S/N, the desired peaks in each individual spectrum are detected and those with the highest intensity among their peak clusters are recorded. The common peaks among all the spectra are identified by choosing an appropriate cut-off threshold in the complete linkage hierarchical clustering. To remove the 1 Da shifting of the peaks, the peak corresponding to the same protein is taken to be the detected peak occurring most frequently within its neighborhood. We validated this method using a data set of serial peptide and protein calibration standards. Compared with the MoverZ program, our new method detects more peaks and significantly enhances S/N of the peak after the chemical noise removal. We then successfully applied this method to a data set from prOTOF MS spectra of albumin and albumin-bound proteins from serum samples of 59 patients with carotid artery disease compared to vascular disease-free patients to detect peaks with S/N ≥ 2. Our method is easily implemented and is highly effective to define peaks that will be used for disease classification or to highlight potential biomarkers.
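
    The snippet below sketches the overall flow only (wavelet denoising, per-spectrum peak detection, then grouping of peaks across spectra by complete-linkage hierarchical clustering); the adaptive short-time FFT and isotope-envelope steps are omitted. All signals, thresholds, and tolerances are synthetic assumptions.

```python
# Sketch: denoise each spectrum, detect peaks, then cluster peak positions across spectra.
import numpy as np
import pywt
from scipy.signal import find_peaks
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(8)
mz = np.linspace(1000, 1020, 4000)
true_peaks = [1004.0, 1010.0, 1016.0]

def spectrum():
    s = sum(np.exp(-0.5 * ((mz - p) / 0.05) ** 2) for p in true_peaks)
    return s + rng.normal(scale=0.05, size=mz.size)

def denoise(signal, wavelet="sym8", level=4):
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745              # robust noise estimate
    thr = sigma * np.sqrt(2 * np.log(signal.size))
    coeffs = [coeffs[0]] + [pywt.threshold(c, thr, mode="soft") for c in coeffs[1:]]
    return pywt.waverec(coeffs, wavelet)[:signal.size]

all_peak_mz = []
for _ in range(10):                                             # ten spectra
    clean = denoise(spectrum())
    idx, _ = find_peaks(clean, height=0.3)
    all_peak_mz.extend(mz[idx])

# Group peak positions across spectra; clusters seen in most spectra are "common peaks".
Z = linkage(np.array(all_peak_mz).reshape(-1, 1), method="complete")
groups = fcluster(Z, t=0.2, criterion="distance")               # 0.2 m/z grouping tolerance
for g in np.unique(groups):
    members = np.array(all_peak_mz)[groups == g]
    if len(members) >= 8:                                       # seen in at least 8 of 10 spectra
        print(f"common peak at m/z {members.mean():.2f} ({len(members)} detections)")
```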

  10. Distant Cluster Hunting. II; A Comparison of X-Ray and Optical Cluster Detection Techniques and Catalogs from the ROSAT Optical X-Ray Survey

    NASA Technical Reports Server (NTRS)

    Donahue, Megan; Scharf, Caleb A.; Mack, Jennifer; Lee, Y. Paul; Postman, Marc; Rosati, Piero; Dickinson, Mark; Voit, G. Mark; Stocke, John T.

    2002-01-01

    We present and analyze the optical and X-ray catalogs of moderate-redshift cluster candidates from the ROSAT Optical X-Ray Survey, or ROXS. The survey covers the sky area contained in the fields of view of 23 deep archival ROSAT PSPC pointings, 4.8 square degrees. The cross-correlated cluster catalogs were constructed by comparing two independent catalogs extracted from the optical and X-ray bandpasses, using a matched-filter technique for the optical data and a wavelet technique for the X-ray data. We cross-identified cluster candidates in each catalog. As reported in Paper I, the matched-filter technique found optical counterparts for at least 60% (26 out of 43) of the X-ray cluster candidates; the estimated redshifts from the matched-filter algorithm agree with at least 7 of 11 spectroscopic confirmations (Δz ≤ 0.10). The matched-filter technique, with an imaging sensitivity of mI ~ 23, identified approximately 3 times the number of candidates (155 candidates, 142 with a detection confidence >3 σ) found in the X-ray survey of nearly the same area. There are 57 X-ray candidates, 43 of which are unobscured by scattered light or bright stars in the optical images. Twenty-six of these have fairly secure optical counterparts. We find that the matched-filter algorithm, when applied to images with galaxy flux sensitivities of mI ~ 23, is fairly well matched to discovering z ≤ 1 clusters detected by wavelets in ROSAT PSPC exposures of 8000-60,000 s. The difference in the spurious fractions between the optical and X-ray catalogs (30% and 10%, respectively) cannot account for the difference in source number. In Paper I, we compared the optical and X-ray cluster luminosity functions and we found that the luminosity functions are consistent if the relationship between X-ray and optical luminosities is steep (LX scaling as a steep power of Lopt). Here, in Paper II, we present the cluster catalogs and a numerical simulation of the ROXS. We also present color-magnitude plots for several of the cluster candidates, and examine the prominence of the red sequence in each. We find that the X-ray clusters in our survey do not all have a prominent red sequence. We conclude that while the red sequence may be a distinct feature in the color-magnitude plots for virialized massive clusters, it may be less distinct in lower-mass clusters of galaxies at even moderate redshifts. Multiple, complementary methods of selecting and defining clusters may be essential, particularly at high redshift where all methods start to run into completeness limits, incomplete understanding of physical evolution, and projection effects.

  11. Multi-Target State Extraction for the SMC-PHD Filter

    PubMed Central

    Si, Weijian; Wang, Liwei; Qu, Zhiyu

    2016-01-01

    The sequential Monte Carlo probability hypothesis density (SMC-PHD) filter has been demonstrated to be a favorable method for multi-target tracking. However, the time-varying target states need to be extracted from the particle approximation of the posterior PHD, which is difficult to implement due to the unknown relations between the large amount of particles and the PHD peaks representing potential target locations. To address this problem, a novel multi-target state extraction algorithm is proposed in this paper. By exploiting the information of measurements and particle likelihoods in the filtering stage, we propose a validation mechanism which aims at selecting effective measurements and particles corresponding to detected targets. Subsequently, state estimation for the detected and undetected targets is performed separately: the former estimates are obtained from the particle clusters directed by effective measurements, while the latter are obtained from the particles corresponding to undetected targets via a clustering method. Simulation results demonstrate that the proposed method yields better estimation accuracy and reliability compared to existing methods. PMID:27322274

  12. Could the clinical interpretability of subgroups detected using clustering methods be improved by using a novel two-stage approach?

    PubMed

    Kent, Peter; Stochkendahl, Mette Jensen; Christensen, Henrik Wulff; Kongsted, Alice

    2015-01-01

    Recognition of homogeneous subgroups of patients can usefully improve prediction of their outcomes and the targeting of treatment. There are a number of research approaches that have been used to recognise homogeneity in such subgroups and to test their implications. One approach is to use statistical clustering techniques, such as Cluster Analysis or Latent Class Analysis, to detect latent relationships between patient characteristics. Influential patient characteristics can come from diverse domains of health, such as pain, activity limitation, physical impairment, social role participation, psychological factors, biomarkers and imaging. However, such 'whole person' research may result in data-driven subgroups that are complex, difficult to interpret and challenging to recognise clinically. This paper describes a novel approach to applying statistical clustering techniques that may improve the clinical interpretability of derived subgroups and reduce sample size requirements. This approach involves clustering in two sequential stages. The first stage involves clustering within health domains and therefore requires creating as many clustering models as there are health domains in the available data. This first stage produces scoring patterns within each domain. The second stage involves clustering using the scoring patterns from each health domain (from the first stage) to identify subgroups across all domains. We illustrate this using chest pain data from the baseline presentation of 580 patients. The new two-stage clustering resulted in two subgroups that approximated the classic textbook descriptions of musculoskeletal chest pain and atypical angina chest pain. The traditional single-stage clustering resulted in five clusters that were also clinically recognisable but displayed less distinct differences. In this paper, a new approach to using clustering techniques to identify clinically useful subgroups of patients is suggested. Research designs, statistical methods and outcome metrics suitable for such testing are also described. This approach has potential benefits but requires broad testing, in multiple patient samples, to determine its clinical value. The usefulness of the approach is likely to be context-specific, depending on the characteristics of the available data and the research question being asked of it.
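
    The snippet below is a minimal sketch of the two-stage idea only: cluster within each health domain first, then cluster patients on their per-domain assignments. The domains, variable counts, cluster numbers and use of k-means (in place of Latent Class Analysis) are illustrative assumptions.

```python
# Sketch: stage 1 clusters each health domain separately; stage 2 clusters the patients
# on their (one-hot encoded) domain-level assignments.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(9)
n_patients = 580
domains = {                                          # variables grouped by health domain
    "pain": rng.normal(size=(n_patients, 4)),
    "activity": rng.normal(size=(n_patients, 3)),
    "psychological": rng.normal(size=(n_patients, 5)),
}

# Stage 1: one clustering model per domain -> a domain-level "scoring pattern" per patient.
stage1 = np.column_stack([
    KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
    for X in domains.values()
])

# Stage 2: cluster the patients on their domain assignments (manually one-hot encoded).
stage1_encoded = np.concatenate([np.eye(3)[col] for col in stage1.T], axis=1)
subgroups = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(stage1_encoded)
print("subgroup sizes:", np.bincount(subgroups))
```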

  13. Assessment of repeatability of composition of perfumed waters by high-performance liquid chromatography combined with numerical data analysis based on cluster analysis (HPLC UV/VIS - CA).

    PubMed

    Ruzik, L; Obarski, N; Papierz, A; Mojski, M

    2015-06-01

    High-performance liquid chromatography (HPLC) with UV/VIS spectrophotometric detection combined with the chemometric method of cluster analysis (CA) was used for the assessment of repeatability of composition of nine types of perfumed waters. In addition, the chromatographic method of separating components of the perfumed waters under analysis was subjected to an optimization procedure. The chromatograms thus obtained were used as sources of data for the chemometric method of cluster analysis (CA). The result was a classification of a set comprising 39 perfumed water samples with a similar composition at a specified level of probability (level of agglomeration). A comparison of the classification with the manufacturer's declarations reveals a good degree of consistency and demonstrates similarity between samples in different classes. A combination of the chromatographic method with cluster analysis (HPLC UV/VIS - CA) makes it possible to quickly assess the repeatability of composition of perfumed waters at selected levels of probability. © 2014 Society of Cosmetic Scientists and the Société Française de Cosmétologie.

  14. Computer-aided detection of clustered microcalcifications in multiscale bilateral filtering regularized reconstructed digital breast tomosynthesis volume

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Samala, Ravi K., E-mail: rsamala@umich.edu; Chan, Heang-Ping; Lu, Yao

    Purpose: Develop a computer-aided detection (CADe) system for clustered microcalcifications in digital breast tomosynthesis (DBT) volume enhanced with multiscale bilateral filtering (MSBF) regularization. Methods: With Institutional Review Board approval and written informed consent, two-view DBT of 154 breasts, of which 116 had biopsy-proven microcalcification (MC) clusters and 38 were free of MCs, was imaged with a General Electric GEN2 prototype DBT system. The DBT volumes were reconstructed with MSBF-regularized simultaneous algebraic reconstruction technique (SART) that was designed to enhance MCs and reduce background noise while preserving the quality of other tissue structures. The contrast-to-noise ratio (CNR) of MCs was further improved with enhancement-modulated calcification response (EMCR) preprocessing, which combined multiscale Hessian response to enhance MCs by shape and bandpass filtering to remove the low-frequency structured background. MC candidates were then located in the EMCR volume using iterative thresholding and segmented by adaptive region growing. Two sets of potential MC objects, cluster centroid objects and MC seed objects, were generated and the CNR of each object was calculated. The number of candidates in each set was controlled based on the breast volume. Dynamic clustering around the centroid objects grouped the MC candidates to form clusters. Adaptive criteria were designed to reduce false positive (FP) clusters based on the size, CNR values and the number of MCs in the cluster, cluster shape, and cluster based maximum intensity projection. Free-response receiver operating characteristic (FROC) and jackknife alternative FROC (JAFROC) analyses were used to assess the performance and compare with that of a previous study. Results: Unpaired two-tailed t-test showed a significant increase (p < 0.0001) in the ratio of CNRs for MCs with and without MSBF regularization compared to similar ratios for FPs. For view-based detection, a sensitivity of 85% was achieved at an FP rate of 2.16 per DBT volume. For case-based detection, a sensitivity of 85% was achieved at an FP rate of 0.85 per DBT volume. JAFROC analysis showed a significant improvement in the performance of the current CADe system compared to that of our previous system (p = 0.003). Conclusions: MSBF-regularized SART reconstruction enhances MCs. The enhancement in the signals, in combination with properly designed adaptive threshold criteria, effective MC feature analysis, and false positive reduction techniques, leads to a significant improvement in the detection of clustered MCs in DBT.

  15. A Novel Algorithm for Detecting Protein Complexes with the Breadth First Search

    PubMed Central

    Tang, Xiwei; Wang, Jianxin; Li, Min; He, Yiming; Pan, Yi

    2014-01-01

    Most biological processes are carried out by protein complexes. A substantial number of false positives of the protein-protein interaction (PPI) data can compromise the utility of the datasets for complex reconstruction. In order to reduce the impact of such discrepancies, a number of data integration and affinity scoring schemes have been devised. The methods encode the reliabilities (confidence) of physical interactions between pairs of proteins. The challenge now is to identify novel and meaningful protein complexes from the weighted PPI network. To address this problem, a novel protein complex mining algorithm ClusterBFS (Cluster with Breadth-First Search) is proposed. Based on the weighted density, ClusterBFS detects protein complexes of the weighted network by the breadth first search algorithm, which starts from a given seed protein used as the starting point. The experimental results show that ClusterBFS performs significantly better than the other computational approaches in terms of the identification of protein complexes. PMID:24818139
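
    The snippet below is an illustrative re-implementation in the spirit of the BFS seed-expansion idea described above, not the authors' algorithm: neighbours reached breadth-first from a seed protein are kept only while the weighted density of the growing cluster stays above a threshold. The toy network, weights and density threshold are assumptions.

```python
# Sketch: BFS expansion of a seed protein constrained by weighted cluster density.
from collections import deque
import itertools
import networkx as nx

def weighted_density(g, nodes):
    nodes = list(nodes)
    if len(nodes) < 2:
        return 1.0
    total = sum(g[u][v]["weight"]
                for u, v in itertools.combinations(nodes, 2) if g.has_edge(u, v))
    return 2.0 * total / (len(nodes) * (len(nodes) - 1))

def cluster_bfs(g, seed, min_density=0.5):
    cluster, queue, seen = {seed}, deque([seed]), {seed}
    while queue:
        u = queue.popleft()
        for v in g.neighbors(u):
            if v in seen:
                continue
            seen.add(v)
            if weighted_density(g, cluster | {v}) >= min_density:   # accept only dense growth
                cluster.add(v)
                queue.append(v)
    return cluster

# Toy weighted PPI network: a dense triangle plus a weakly attached protein.
g = nx.Graph()
g.add_weighted_edges_from([("A", "B", 0.9), ("B", "C", 0.8), ("A", "C", 0.85), ("C", "D", 0.1)])
print(cluster_bfs(g, "A"))    # expected: {'A', 'B', 'C'}
```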

  16. ATCA observations of the MACS-Planck Radio Halo Cluster Project. II. Radio observations of an intermediate redshift cluster sample

    NASA Astrophysics Data System (ADS)

    Martinez Aviles, G.; Johnston-Hollitt, M.; Ferrari, C.; Venturi, T.; Democles, J.; Dallacasa, D.; Cassano, R.; Brunetti, G.; Giacintucci, S.; Pratt, G. W.; Arnaud, M.; Aghanim, N.; Brown, S.; Douspis, M.; Hurier, J.; Intema, H. T.; Langer, M.; Macario, G.; Pointecouteau, E.

    2018-04-01

    Aim. A fraction of galaxy clusters host diffuse radio sources whose origins are investigated through multi-wavelength studies of cluster samples. We investigate the presence of diffuse radio emission in a sample of seven galaxy clusters in the largely unexplored intermediate redshift range (0.3 < z < 0.44). Methods: In search of diffuse emission, deep radio imaging of the clusters is presented from wide band (1.1-3.1 GHz), full resolution (~5 arcsec) observations with the Australia Telescope Compact Array (ATCA). The visibilities were also imaged at lower resolution after point source modelling and subtraction and after a taper was applied to achieve better sensitivity to low surface brightness diffuse radio emission. In cases of non-detection of diffuse sources, we set upper limits for the radio power of injected diffuse radio sources in the field of our observations. Furthermore, we discuss the dynamical state of the observed clusters based on an X-ray morphological analysis with XMM-Newton. Results: We detect a giant radio halo in PSZ2 G284.97-23.69 (z = 0.39) and a possible diffuse source in the nearly relaxed cluster PSZ2 G262.73-40.92 (z = 0.421). Our sample contains three highly disturbed massive clusters without clear traces of diffuse emission at the observed frequencies. We were able to inject modelled radio haloes with low values of total flux density to set upper detection limits; however, with our high-frequency observations we cannot exclude the presence of radio haloes in these systems because of the sensitivity of our observations in combination with the high z of the observed clusters. The reduced images are only available at the CDS via anonymous ftp to http://cdsarc.u-strasbg.fr (http://130.79.128.5) or via http://cdsarc.u-strasbg.fr/viz-bin/qcat?J/A+A/611/A94

  17. A comparison of visual search strategies of elite and non-elite tennis players through cluster analysis.

    PubMed

    Murray, Nicholas P; Hunfalvay, Melissa

    2017-02-01

    Considerable research has documented that successful performance in interceptive tasks (such as return of serve in tennis) is based on the performers' capability to capture appropriate anticipatory information prior to the flight path of the approaching object. Athletes of higher skill tend to fixate on different locations in the playing environment prior to initiation of a skill than their lesser skilled counterparts. The purpose of this study was to examine visual search behaviour strategies of elite (world ranked) tennis players and non-ranked competitive tennis players (n = 43) utilising cluster analysis. The results of hierarchical (Ward's method) and nonhierarchical (k-means) cluster analyses revealed three different clusters. The clustering method distinguished visual behaviour of high-, middle- and low-ranked players. Specifically, high-ranked players demonstrated longer mean fixation duration and lower variation of visual search than middle- and low-ranked players. In conclusion, the results demonstrated that cluster analysis is a useful tool for detecting and analysing the areas of interest for use in experimental analysis of expertise and to distinguish visual search variables among participants.

  18. Conversion events in gene clusters

    PubMed Central

    2011-01-01

    Background Gene clusters containing multiple similar genomic regions in close proximity are of great interest for biomedical studies because of their associations with inherited diseases. However, such regions are difficult to analyze due to their structural complexity and their complicated evolutionary histories, reflecting a variety of large-scale mutational events. In particular, conversion events can mislead inferences about the relationships among these regions, as traced by traditional methods such as construction of phylogenetic trees or multi-species alignments. Results To correct the distorted information generated by such methods, we have developed an automated pipeline called CHAP (Cluster History Analysis Package) for detecting conversion events. We used this pipeline to analyze the conversion events that affected two well-studied gene clusters (α-globin and β-globin) and three gene clusters for which comparative sequence data were generated from seven primate species: CCL (chemokine ligand), IFN (interferon), and CYP2abf (part of cytochrome P450 family 2). CHAP is freely available at http://www.bx.psu.edu/miller_lab. Conclusions These studies reveal the value of characterizing conversion events in the context of studying gene clusters in complex genomes. PMID:21798034

  19. Sector Identification in a Set of Stock Return Time Series Traded at the London Stock Exchange

    NASA Astrophysics Data System (ADS)

    Coronnello, C.; Tumminello, M.; Lillo, F.; Micciche, S.; Mantegna, R. N.

    2005-09-01

    We compare some methods recently used in the literature to detect the existence of a certain degree of common behavior of stock returns belonging to the same economic sector. Specifically, we discuss methods based on random matrix theory and hierarchical clustering techniques. We apply these methods to a portfolio of stocks traded at the London Stock Exchange. The investigated time series are recorded both at a daily time horizon and at a 5-minute time horizon. The correlation coefficient matrix is very different at different time horizons confirming that more structured correlation coefficient matrices are observed for long time horizons. All the considered methods are able to detect economic information and the presence of clusters characterized by the economic sector of stocks. However, different methods present a different degree of sensitivity with respect to different sectors. Our comparative analysis suggests that the application of just a single method may not be able to extract all the economic information present in the correlation coefficient matrix of a stock portfolio.
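
    A minimal sketch of one of the techniques compared above: hierarchical clustering of stocks from their return-correlation matrix, using the standard distance d = sqrt(2(1 - rho)). The simulated sector-factor returns and linkage choice are illustrative assumptions.

```python
# Sketch: correlation-based hierarchical clustering of simulated stock returns.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

rng = np.random.default_rng(10)
n_days, stocks_per_sector, n_sectors = 500, 10, 3
sector_factors = rng.normal(size=(n_days, n_sectors))
returns = np.hstack([
    0.7 * sector_factors[:, [s]] + 0.7 * rng.normal(size=(n_days, stocks_per_sector))
    for s in range(n_sectors)
])

rho = np.corrcoef(returns, rowvar=False)
distance = np.sqrt(2.0 * (1.0 - rho))          # standard correlation-based metric
np.fill_diagonal(distance, 0.0)

Z = linkage(squareform(distance, checks=False), method="average")
labels = fcluster(Z, t=n_sectors, criterion="maxclust")
print("recovered sector sizes:", np.bincount(labels)[1:])
```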

  20. Identifying Clusters of Active Transportation Using Spatial Scan Statistics

    PubMed Central

    Huang, Lan; Stinchcomb, David G.; Pickle, Linda W.; Dill, Jennifer; Berrigan, David

    2009-01-01

    Background There is an intense interest in the possibility that neighborhood characteristics influence active transportation such as walking or biking. The purpose of this paper is to illustrate how a spatial cluster identification method can evaluate the geographic variation of active transportation and identify neighborhoods with unusually high/low levels of active transportation. Methods Self-reported walking/biking prevalence, demographic characteristics, street connectivity variables, and neighborhood socioeconomic data were collected from respondents to the 2001 California Health Interview Survey (CHIS; N=10,688) in Los Angeles County (LAC) and San Diego County (SDC). Spatial scan statistics were used to identify clusters of high or low prevalence (with and without age-adjustment) and the quantity of time spent walking and biking. The data, a subset from the 2001 CHIS, were analyzed in 2007–2008. Results Geographic clusters of significantly high or low prevalence of walking and biking were detected in LAC and SDC. Structural variables such as street connectivity and shorter block lengths are consistently associated with higher levels of active transportation, but associations between active transportation and socioeconomic variables at the individual and neighborhood levels are mixed. Only one cluster with less time spent walking and biking among walkers/bikers was detected in LAC, and this was of borderline significance. Age-adjustment affects the clustering pattern of walking/biking prevalence in LAC, but not in SDC. Conclusions The use of spatial scan statistics to identify significant clustering of health behaviors such as active transportation adds to the more traditional regression analysis that examines associations between behavior and environmental factors by identifying specific geographic areas with unusual levels of the behavior independent of predefined administrative units. PMID:19589451

  1. Cluster size statistic and cluster mass statistic: two novel methods for identifying changes in functional connectivity between groups or conditions.

    PubMed

    Ing, Alex; Schwarzbauer, Christian

    2014-01-01

    Functional connectivity has become an increasingly important area of research in recent years. At a typical spatial resolution, approximately 300 million connections link each voxel in the brain with every other. This pattern of connectivity is known as the functional connectome. Connectivity is often compared between experimental groups and conditions. Standard methods used to control the type 1 error rate are likely to be insensitive when comparisons are carried out across the whole connectome, due to the huge number of statistical tests involved. To address this problem, two new cluster based methods--the cluster size statistic (CSS) and cluster mass statistic (CMS)--are introduced to control the family wise error rate across all connectivity values. These methods operate within a statistical framework similar to the cluster based methods used in conventional task based fMRI. Both methods are data driven, permutation based and require minimal statistical assumptions. Here, the performance of each procedure is evaluated in a receiver operator characteristic (ROC) analysis, utilising a simulated dataset. The relative sensitivity of each method is also tested on real data: BOLD (blood oxygen level dependent) fMRI scans were carried out on twelve subjects under normal conditions and during the hypercapnic state (induced through the inhalation of 6% CO2 in 21% O2 and 73%N2). Both CSS and CMS detected significant changes in connectivity between normal and hypercapnic states. A family wise error correction carried out at the individual connection level exhibited no significant changes in connectivity.
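
    The snippet below is a minimal sketch of a permutation-based cluster mass statistic on a small simulated "connectome": connections are t-tested between two groups, suprathreshold connections are grouped into connected components of a graph, and the maximal component mass is compared with a permutation null distribution. The dimensions, threshold, and graph-based cluster definition are illustrative assumptions and not necessarily the authors' exact formulation.

```python
# Sketch: family-wise inference on connectivity differences via a max cluster-mass permutation test.
import numpy as np
import networkx as nx
from scipy.stats import ttest_ind

rng = np.random.default_rng(11)
n_nodes, n_per_group = 20, 12
iu = np.triu_indices(n_nodes, k=1)                      # unique connections
n_edges = len(iu[0])

def group_data(shift_edges=None):
    data = rng.normal(size=(n_per_group, n_edges))
    if shift_edges is not None:
        data[:, shift_edges] += 1.0                     # connectivity change in one clique
    return data

changed = [i for i, (a, b) in enumerate(zip(*iu)) if a < 5 and b < 5]
group_a, group_b = group_data(), group_data(changed)

def max_cluster_mass(a, b, t_thresh=2.0):
    t, _ = ttest_ind(a, b, axis=0)
    g = nx.Graph()
    for i in np.flatnonzero(np.abs(t) > t_thresh):      # keep suprathreshold edges only
        g.add_edge(iu[0][i], iu[1][i], mass=abs(t[i]))
    if g.number_of_edges() == 0:
        return 0.0
    return max(sum(d["mass"] for _, _, d in g.subgraph(c).edges(data=True))
               for c in nx.connected_components(g))

observed = max_cluster_mass(group_a, group_b)
pooled = np.vstack([group_a, group_b])
null = []
for _ in range(500):                                    # permute group labels
    perm = rng.permutation(len(pooled))
    null.append(max_cluster_mass(pooled[perm[:n_per_group]], pooled[perm[n_per_group:]]))
p_value = (1 + sum(m >= observed for m in null)) / (1 + len(null))
print(f"observed cluster mass {observed:.1f}, family-wise p = {p_value:.3f}")
```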

  2. Cluster Size Statistic and Cluster Mass Statistic: Two Novel Methods for Identifying Changes in Functional Connectivity Between Groups or Conditions

    PubMed Central

    Ing, Alex; Schwarzbauer, Christian

    2014-01-01

    Functional connectivity has become an increasingly important area of research in recent years. At a typical spatial resolution, approximately 300 million connections link each voxel in the brain with every other. This pattern of connectivity is known as the functional connectome. Connectivity is often compared between experimental groups and conditions. Standard methods used to control the type 1 error rate are likely to be insensitive when comparisons are carried out across the whole connectome, due to the huge number of statistical tests involved. To address this problem, two new cluster based methods – the cluster size statistic (CSS) and cluster mass statistic (CMS) – are introduced to control the family wise error rate across all connectivity values. These methods operate within a statistical framework similar to the cluster based methods used in conventional task based fMRI. Both methods are data driven, permutation based and require minimal statistical assumptions. Here, the performance of each procedure is evaluated in a receiver operator characteristic (ROC) analysis, utilising a simulated dataset. The relative sensitivity of each method is also tested on real data: BOLD (blood oxygen level dependent) fMRI scans were carried out on twelve subjects under normal conditions and during the hypercapnic state (induced through the inhalation of 6% CO2 in 21% O2 and 73%N2). Both CSS and CMS detected significant changes in connectivity between normal and hypercapnic states. A family wise error correction carried out at the individual connection level exhibited no significant changes in connectivity. PMID:24906136

  3. Galaxy cluster center detection methods with weak lensing

    NASA Astrophysics Data System (ADS)

    Simet, Melanie

    The precise location of galaxy cluster centers is a persistent problem in weak lensing mass estimates and in interpretations of clusters in a cosmological context. In this work, we test methods of centroid determination from weak lensing data and examine the effects of such self-calibration on the measured masses. Drawing on lensing data from the Sloan Digital Sky Survey Stripe 82, a 275 square degree region of coadded data in the Southern Galactic Cap, together with a catalog of MaxBCG clusters, we show that halo substructure as well as shape noise and stochasticity in galaxy positions limit the precision of such a self-calibration (in the context of Stripe 82, to ~500 h^-1 kpc or larger) and bias the mass estimates around these points to a level that is likely unacceptable for the purposes of making cosmological measurements. We also project the usefulness of this technique in future surveys.

  4. X-ray morphological study of galaxy cluster catalogues

    NASA Astrophysics Data System (ADS)

    Democles, Jessica; Pierre, Marguerite; Arnaud, Monique

    2016-07-01

    Context: The intra-cluster medium distribution, as probed by X-ray morphology-based analysis, gives a good indication of the system's dynamical state. In the race for the determination of precise scaling relations and the understanding of their scatter, the dynamical state offers valuable information. Method: We develop the analysis of the centroid shift so that it can be applied to characterize galaxy cluster surveys such as the XXL survey or high-redshift cluster samples. We use it together with the surface brightness concentration parameter and the offset between the X-ray peak and the brightest cluster galaxy in the context of the XXL bright cluster sample (Pacaud et al. 2015) and a set of high-redshift massive clusters detected by Planck and SPT and observed by both the XMM-Newton and Chandra observatories. Results: Using the wide redshift coverage of the XXL sample, we see no trend of the dynamical state of the systems with redshift.

  5. MOCCA-SURVEY Database I: Is NGC 6535 a dark star cluster harbouring an IMBH?

    NASA Astrophysics Data System (ADS)

    Askar, Abbas; Bianchini, Paolo; de Vita, Ruggero; Giersz, Mirek; Hypki, Arkadiusz; Kamann, Sebastian

    2017-01-01

    We describe the dynamical evolution of a unique type of dark star cluster model in which the majority of the cluster mass at Hubble time is dominated by an intermediate-mass black hole (IMBH). We analysed results from about 2000 star cluster models (Survey Database I) simulated using the Monte Carlo code MOnte Carlo Cluster simulAtor and identified these dark star cluster models. Taking one of these models, we apply the method of simulating realistic `mock observations' by utilizing the Cluster simulatiOn Comparison with ObservAtions (COCOA) and Simulating Stellar Cluster Observation (SISCO) codes to obtain the photometric and kinematic observational properties of the dark star cluster model at 12 Gyr. We find that the perplexing Galactic globular cluster NGC 6535 closely matches the observational photometric and kinematic properties of the dark star cluster model presented in this paper. Based on our analysis and currently observed properties of NGC 6535, we suggest that this globular cluster could potentially harbour an IMBH. If it exists, the presence of this IMBH can be detected robustly with proposed kinematic observations of NGC 6535.

  6. Robust statistical methods for hit selection in RNA interference high-throughput screening experiments.

    PubMed

    Zhang, Xiaohua Douglas; Yang, Xiting Cindy; Chung, Namjin; Gates, Adam; Stec, Erica; Kunapuli, Priya; Holder, Dan J; Ferrer, Marc; Espeseth, Amy S

    2006-04-01

    RNA interference (RNAi) high-throughput screening (HTS) experiments carried out using large (>5000 short interfering [si]RNA) libraries generate a huge amount of data. In order to use these data to identify the most effective siRNAs tested, it is critical to adopt and develop appropriate statistical methods. To address the questions in hit selection of RNAi HTS, we proposed a quartile-based method which is robust to outliers, true hits and nonsymmetrical data. We compared it with the more traditional tests, mean +/- k standard deviation (SD) and median +/- 3 median absolute deviation (MAD). The results suggested that the quartile-based method selected more hits than mean +/- k SD under the same preset error rate. The number of hits selected by median +/- k MAD was close to that by the quartile-based method. Further analysis suggested that the quartile-based method had the greatest power in detecting true hits, especially weak or moderate true hits. Our investigation also suggested that platewise analysis (determining effective siRNAs on a plate-by-plate basis) can adjust for systematic errors in different plates, while an experimentwise analysis, in which effective siRNAs are identified in an analysis of the entire experiment, cannot. However, experimentwise analysis may detect a cluster of true positive hits placed together in one or several plates, while platewise analysis may not. To display hit selection results, we designed a specific figure called a plate-well series plot. We thus suggest the following strategy for hit selection in RNAi HTS experiments. First, choose the quartile-based method, or median +/- k MAD, for identifying effective siRNAs. Second, perform the chosen method experimentwise on transformed/normalized data, such as percentage inhibition, to check the possibility of hit clusters. If a cluster of selected hits are observed, repeat the analysis based on untransformed data to determine whether the cluster is due to an artifact in the data. If no clusters of hits are observed, select hits by performing platewise analysis on transformed data. Third, adopt the plate-well series plot to visualize both the data and the hit selection results, as well as to check for artifacts.
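
    A minimal sketch of robust platewise hit selection in the spirit of the record above: siRNAs whose normalized readings exceed median + k * MAD are flagged, with a simple quartile-based cut shown alongside. The plate size, hit effect, and cutoff constants are illustrative assumptions, not the paper's exact quartile rule.

```python
# Sketch: robust hit selection on one simulated 384-well plate of percent-inhibition values.
import numpy as np

rng = np.random.default_rng(12)
percent_inhibition = rng.normal(loc=0.0, scale=10.0, size=384)   # one plate
percent_inhibition[:8] += 60.0                                   # eight true hits

median = np.median(percent_inhibition)
mad = 1.4826 * np.median(np.abs(percent_inhibition - median))    # scaled to ~SD under normality
mad_hits = np.flatnonzero(percent_inhibition > median + 3 * mad)

q1, q3 = np.percentile(percent_inhibition, [25, 75])
iqr_hits = np.flatnonzero(percent_inhibition > q3 + 1.5 * (q3 - q1))

print("hits by median + 3*MAD :", mad_hits)
print("hits by Q3 + 1.5*IQR   :", iqr_hits)
```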

  7. Intelligent screening of electrofusion-polyethylene joints based on a thermal NDT method

    NASA Astrophysics Data System (ADS)

    Doaei, Marjan; Tavallali, M. Sadegh

    2018-05-01

    The combination of infrared thermal imaging and artificial intelligence methods has opened new avenues for pushing the boundaries of available testing methods. Hence, in the current study, a novel thermal non-destructive testing method for polyethylene electrofusion joints was combined with the k-means clustering algorithm as an intelligent screening tool. The experiments focused on ovality of pipes in the coupler, as well as misalignment of pipes and couplers in 25 mm diameter joints. The temperature responses of each joint to an internal heat pulse were recorded by an IR thermal camera and further processed to identify the faulty joints. The results showed a clustering accuracy of 92%, as well as an abnormality detection capability of more than 90%.

  8. The cluster-cluster correlation function. [of galaxies

    NASA Technical Reports Server (NTRS)

    Postman, M.; Geller, M. J.; Huchra, J. P.

    1986-01-01

    The clustering properties of the Abell and Zwicky cluster catalogs are studied using the two-point angular and spatial correlation functions. The catalogs are divided into eight subsamples to determine the dependence of the correlation function on distance, richness, and the method of cluster identification. It is found that the Corona Borealis supercluster contributes significant power to the spatial correlation function of the Abell cluster sample with distance class of four or less. The distance-limited catalog of 152 Abell clusters, which is not greatly affected by a single system, has a spatial correlation function consistent with the power law ξ(r) = 300 r^(-1.8). In both the distance class four or less and the distance-limited samples, the signal in the spatial correlation function is a power law detectable out to 60/h Mpc. The amplitude of ξ(r) for clusters of richness class two is about three times that for richness class one clusters. The two-point spatial correlation function is sensitive to the use of estimated redshifts.

  9. Community detection using Kernel Spectral Clustering with memory

    NASA Astrophysics Data System (ADS)

    Langone, Rocco; Suykens, Johan A. K.

    2013-02-01

    This work is related to the problem of community detection in dynamic scenarios, which for instance arises in the segmentation of moving objects, the clustering of telephone traffic data, time-series micro-array data, etc. A desirable feature of a clustering model which has to capture the evolution of communities over time is the temporal smoothness between clusters in successive time-steps. In this way the model is able to track the long-term trend and at the same time it smooths out short-term variation due to noise. We use the Kernel Spectral Clustering with Memory effect (MKSC), which allows the prediction of cluster memberships of new nodes via out-of-sample extension and has a proper model selection scheme. It is based on a constrained optimization formulation typical of Least Squares Support Vector Machines (LS-SVM), where the objective function is designed to explicitly incorporate temporal smoothness as valid prior knowledge. The latter, in fact, allows the model to cluster the current data well and to be consistent with the recent history. Here we propose a generalization of the MKSC model with an arbitrary memory, not only one time-step in the past. The experiments conducted on toy problems confirm our expectations: the more memory we add to the model, the smoother over time are the clustering results. We also compare with the Evolutionary Spectral Clustering (ESC) algorithm, which is a state-of-the-art method, and we obtain comparable or better results.

  10. Effect of image quality on calcification detection in digital mammography

    PubMed Central

    Warren, Lucy M.; Mackenzie, Alistair; Cooke, Julie; Given-Wilson, Rosalind M.; Wallis, Matthew G.; Chakraborty, Dev P.; Dance, David R.; Bosmans, Hilde; Young, Kenneth C.

    2012-01-01

    Purpose: This study aims to investigate if microcalcification detection varies significantly when mammographic images are acquired using different image qualities, including: different detectors, dose levels, and different image processing algorithms. An additional aim was to determine how the standard European method of measuring image quality using threshold gold thickness measured with a CDMAM phantom and the associated limits in current EU guidelines relate to calcification detection. Methods: One hundred and sixty two normal breast images were acquired on an amorphous selenium direct digital (DR) system. Microcalcification clusters extracted from magnified images of slices of mastectomies were electronically inserted into half of the images. The calcification clusters had a subtle appearance. All images were adjusted using a validated mathematical method to simulate the appearance of images from a computed radiography (CR) imaging system at the same dose, from both systems at half this dose, and from the DR system at quarter this dose. The original 162 images were processed with both Hologic and Agfa (Musica-2) image processing. All other image qualities were processed with Agfa (Musica-2) image processing only. Seven experienced observers marked and rated any identified suspicious regions. Free response operating characteristic (FROC) and ROC analyses were performed on the data. The lesion sensitivity at a nonlesion localization fraction (NLF) of 0.1 was also calculated. Images of the CDMAM mammographic test phantom were acquired using the automatic setting on the DR system. These images were modified to the additional image qualities used in the observer study. The images were analyzed using automated software. In order to assess the relationship between threshold gold thickness and calcification detection a power law was fitted to the data. Results: There was a significant reduction in calcification detection using CR compared with DR: the alternative FROC (AFROC) area decreased from 0.84 to 0.63 and the ROC area decreased from 0.91 to 0.79 (p < 0.0001). This corresponded to a 30% drop in lesion sensitivity at a NLF equal to 0.1. Detection was also sensitive to the dose used. There was no significant difference in detection between the two image processing algorithms used (p > 0.05). It was additionally found that lower threshold gold thickness from CDMAM analysis implied better cluster detection. The measured threshold gold thickness passed the acceptable limit set in the EU standards for all image qualities except half dose CR. However, calcification detection varied significantly between image qualities. This suggests that the current EU guidelines may need revising. Conclusions: Microcalcification detection was found to be sensitive to detector and dose used. Standard measurements of image quality were a good predictor of microcalcification cluster detection. PMID:22755704
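
    The power-law fit mentioned in the Methods can be sketched as below; the (threshold gold thickness, detection score) pairs are made up for illustration and do not reproduce the study's measurements.

        # Sketch: power-law relation between CDMAM threshold gold thickness (um)
        # and a detection metric (e.g. AFROC area). Numbers are hypothetical.
        import numpy as np
        from scipy.optimize import curve_fit

        def power_law(t, a, b):
            return a * np.power(t, b)

        thickness = np.array([0.08, 0.10, 0.13, 0.17, 0.22])
        detection = np.array([0.86, 0.84, 0.78, 0.70, 0.62])

        params, cov = curve_fit(power_law, thickness, detection, p0=(0.5, -0.3))
        a, b = params
        print(f"fit: detection ~ {a:.2f} * thickness^{b:.2f}")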

  11. Hierarchical clustering method for improved prostate cancer imaging in diffuse optical tomography

    NASA Astrophysics Data System (ADS)

    Kavuri, Venkaiah C.; Liu, Hanli

    2013-03-01

    We investigate the feasibility of trans-rectal near-infrared (NIR) diffuse optical tomography (DOT) for early detection of prostate cancer using a transrectal ultrasound (TRUS) compatible imaging probe. For this purpose, we designed a TRUS-compatible, NIR-based imaging system (780 nm), in which the photodiodes were placed on the trans-rectal probe. DC signals were recorded and used for estimating the absorption coefficient. We validated the system using laboratory phantoms. For further improvement, we also developed a hierarchical clustering method (HCM) to improve the accuracy of image reconstruction with limited prior information. We demonstrated the method using computer simulations and laboratory phantom experiments.

  12. An X-ray method for detecting substructure in galaxy clusters - Application to Perseus, A2256, Centaurus, Coma, and Sersic 40/6

    NASA Technical Reports Server (NTRS)

    Mohr, Joseph J.; Fabricant, Daniel G.; Geller, Margaret J.

    1993-01-01

    We use the moments of the X-ray surface brightness distribution to constrain the dynamical state of a galaxy cluster. Using X-ray observations from the Einstein Observatory IPC, we measure the first moment FM, the ellipsoidal orientation angle, and the axial ratio at a sequence of radii in the cluster. We argue that a significant variation in the image centroid FM as a function of radius is evidence for a nonequilibrium feature in the intracluster medium (ICM) density distribution. In simple terms, centroid shifts indicate that the center of mass of the ICM varies with radius. This variation is a tracer of continuing dynamical evolution. For each cluster, we evaluate the significance of variations in the centroid of the IPC image by computing the same statistics on an ensemble of simulated cluster images. In producing these simulated images we include X-ray point source emission, telescope vignetting, Poisson noise, and characteristics of the IPC. Application of this new method to five Abell clusters reveals that the core of each one has significant substructure. In addition, we find significant variations in the orientation angle and the axial ratio for several of the clusters.
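
    A minimal sketch of the centroid-shift idea is given below: the emission-weighted centroid is computed inside apertures of increasing radius and its wander is reported. The significance test against simulated IPC images described in the record is omitted, and the toy image is synthetic.

        # Sketch: centroid-shift measurement on an X-ray surface brightness image.
        import numpy as np

        def centroid_in_aperture(image, center, radius):
            y, x = np.indices(image.shape)
            mask = (x - center[0]) ** 2 + (y - center[1]) ** 2 <= radius ** 2
            w = image * mask
            total = w.sum()
            return np.array([(w * x).sum() / total, (w * y).sum() / total])

        def centroid_shifts(image, radii):
            """Centroid at each radius, measured around the global peak."""
            peak = np.unravel_index(np.argmax(image), image.shape)[::-1]  # (x, y)
            cents = np.array([centroid_in_aperture(image, peak, r) for r in radii])
            return np.linalg.norm(cents - cents[-1], axis=1)  # offset vs outermost

        # toy image: bright core plus an offset substructure clump
        yy, xx = np.mgrid[0:128, 0:128]
        img = np.exp(-((xx - 64) ** 2 + (yy - 64) ** 2) / 200.0)
        img += 0.3 * np.exp(-((xx - 90) ** 2 + (yy - 70) ** 2) / 400.0)
        print(centroid_shifts(img, radii=[10, 20, 30, 40, 50]))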

  13. Knowledge Discovery from Databases: An Introductory Review.

    ERIC Educational Resources Information Center

    Vickery, Brian

    1997-01-01

    Introduces new procedures being used to extract knowledge from databases and discusses rationales for developing knowledge discovery methods. Methods are described for such techniques as classification, clustering, and the detection of deviations from pre-established norms. Examines potential uses of knowledge discovery in the information field.…

  14. The effect of image processing on the detection of cancers in digital mammography.

    PubMed

    Warren, Lucy M; Given-Wilson, Rosalind M; Wallis, Matthew G; Cooke, Julie; Halling-Brown, Mark D; Mackenzie, Alistair; Chakraborty, Dev P; Bosmans, Hilde; Dance, David R; Young, Kenneth C

    2014-08-01

    OBJECTIVE. The objective of our study was to investigate the effect of image processing on the detection of cancers in digital mammography images. MATERIALS AND METHODS. Two hundred seventy pairs of breast images (both breasts, one view) were collected from eight systems using Hologic amorphous selenium detectors: 80 image pairs showed breasts containing subtle malignant masses; 30 image pairs, biopsy-proven benign lesions; 80 image pairs, simulated calcification clusters; and 80 image pairs, no cancer (normal). The 270 image pairs were processed with three types of image processing: standard (full enhancement), low contrast (intermediate enhancement), and pseudo-film-screen (no enhancement). Seven experienced observers inspected the images, locating and rating regions they suspected to be cancer for likelihood of malignancy. The results were analyzed using a jackknife-alternative free-response receiver operating characteristic (JAFROC) analysis. RESULTS. The detection of calcification clusters was significantly affected by the type of image processing: The JAFROC figure of merit (FOM) decreased from 0.65 with standard image processing to 0.63 with low-contrast image processing (p = 0.04) and from 0.65 with standard image processing to 0.61 with film-screen image processing (p = 0.0005). The detection of noncalcification cancers was not significantly different among the image-processing types investigated (p > 0.40). CONCLUSION. These results suggest that image processing has a significant impact on the detection of calcification clusters in digital mammography. For the three image-processing versions and the system investigated, standard image processing was optimal for the detection of calcification clusters. The effect on cancer detection should be considered when selecting the type of image processing in the future.

  15. Evaluating the utility of syndromic surveillance algorithms for screening to detect potentially clonal hospital infection outbreaks

    PubMed Central

    Talbot, Thomas R; Schaffner, William; Bloch, Karen C; Daniels, Titus L; Miller, Randolph A

    2011-01-01

    Objective The authors evaluated algorithms commonly used in syndromic surveillance for use as screening tools to detect potentially clonal outbreaks for review by infection control practitioners. Design Study phase 1 applied four aberrancy detection algorithms (CUSUM, EWMA, space-time scan statistic, and WSARE) to retrospective microbiologic culture data, producing a list of past candidate outbreak clusters. In phase 2, four infectious disease physicians categorized the phase 1 algorithm-identified clusters to ascertain algorithm performance. In phase 3, project members combined the algorithms to create a unified screening system and conducted a retrospective pilot evaluation. Measurements The study calculated recall and precision for each algorithm, and created precision-recall curves for various methods of combining the algorithms into a unified screening tool. Results Individual algorithm recall and precision ranged from 0.21 to 0.31 and from 0.053 to 0.29, respectively. Few candidate outbreak clusters were identified by more than one algorithm. The best method of combining the algorithms yielded an area under the precision-recall curve of 0.553. The phase 3 combined system detected all infection control-confirmed outbreaks during the retrospective evaluation period. Limitations Lack of phase 2 reviewers' agreement indicates that subjective expert review was an imperfect gold standard. Less conservative filtering of culture results and alternate parameter selection for each algorithm might have improved algorithm performance. Conclusion Hospital outbreak detection presents different challenges than traditional syndromic surveillance. Nevertheless, algorithms developed for syndromic surveillance have potential to form the basis of a combined system that might perform clinically useful hospital outbreak screening. PMID:21606134
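
    As an illustration of one of the aberrancy-detection algorithms named above, the sketch below applies a one-sided CUSUM to daily organism counts. The parameters, baseline window, and simulated counts are assumptions, not those used in the study.

        # Sketch: one-sided CUSUM on daily counts with a moving baseline.
        import numpy as np

        def cusum_alarms(counts, k=0.5, h=4.0, baseline_window=28):
            """Return indices of days whose standardised CUSUM exceeds threshold h."""
            counts = np.asarray(counts, dtype=float)
            alarms, s = [], 0.0
            for t in range(baseline_window, len(counts)):
                hist = counts[t - baseline_window:t]
                mu, sd = hist.mean(), hist.std(ddof=1) or 1.0
                z = (counts[t] - mu) / sd
                s = max(0.0, s + z - k)
                if s > h:
                    alarms.append(t)
                    s = 0.0          # reset after an alarm
            return alarms

        rng = np.random.default_rng(2)
        daily = rng.poisson(3, 120).astype(float)
        daily[90:97] += 6            # simulated cluster of extra isolates
        print(cusum_alarms(daily))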

  16. New detections of embedded clusters in the Galactic halo

    NASA Astrophysics Data System (ADS)

    Camargo, D.; Bica, E.; Bonatto, C.

    2016-09-01

    Context: Until recently it was thought that high Galactic latitude clouds were a non-star-forming ensemble. However, in a previous study we reported the discovery of two embedded clusters (ECs) far away from the Galactic plane (~5 kpc). In our recent star cluster catalogue we provided additional high and intermediate latitude cluster candidates. Aims: This work aims to clarify whether our previous detection of star clusters far away from the disc represents just an episodic event or whether star cluster formation is currently a systematic phenomenon in the Galactic halo. We analyse the nature of four clusters found in our recent catalogue and report the discovery of three new ECs, each with an unusually high latitude and distance from the Galactic disc midplane. Methods: The analysis is based on 2MASS and WISE colour-magnitude diagrams (CMDs), and stellar radial density profiles (RDPs). The CMDs are built by applying a field-star decontamination procedure, which uncovers the cluster's intrinsic CMD morphology. Results: All of these clusters are younger than 5 Myr. The high-latitude ECs C 932, C 934, and C 939 appear to be related to a cloud complex about 5 kpc below the Galactic disc, under the Local arm. The other clusters are above the disc, C 1074 and C 1100 with a vertical distance of ~3 kpc, C 1099 with ~2 kpc, and C 1101 with ~1.8 kpc. Conclusions: According to the derived parameters, ECs occur both below and above the disc, which gives evidence of widespread star cluster formation throughout the Galactic halo. This study therefore represents a paradigm shift, demonstrating that a halo previously considered sterile must now be understood as a host for ongoing star formation. The origin and fate of these ECs remain open; there are two possibilities for their origin, Galactic fountains or infall. The discovery of ECs far from the disc suggests that the Galactic halo is more actively forming stars than previously thought. Furthermore, since most ECs do not survive infant mortality, stars may be raining from the halo onto the disc, and/or the halo may be harbouring generations of stars formed in clusters like those detected in our survey.

  17. Photoinduced nucleation: a novel tool for detecting molecules in air at ultra-low concentrations

    DOEpatents

    Katz, Joseph L.; Lihavainen, Heikki; Rudek, Markus M.; Salter, Brian C.

    2002-01-01

    A method and apparatus for determining the presence of molecules in a gas at concentrations of less than about 100 ppb. Light with wavelengths in the range from about 200 nm to about 350 nm is used to illuminate a flowing sample of the gas, causing the molecules, if present, to form clusters. A mixture of the illuminated gas and a vapor is cooled until the vapor is supersaturated, so that there is a small rate of homogeneous nucleation. The supersaturated vapor condenses on the clusters, causing them to grow to a size sufficient to be counted by light scattering, and the clusters are then counted.

  18. Planck 2015 results. XXIII. The thermal Sunyaev-Zeldovich effect-cosmic infrared background correlation

    NASA Astrophysics Data System (ADS)

    Planck Collaboration; Ade, P. A. R.; Aghanim, N.; Arnaud, M.; Aumont, J.; Baccigalupi, C.; Banday, A. J.; Barreiro, R. B.; Bartlett, J. G.; Bartolo, N.; Battaner, E.; Benabed, K.; Benoit-Lévy, A.; Bernard, J.-P.; Bersanelli, M.; Bielewicz, P.; Bock, J. J.; Bonaldi, A.; Bonavera, L.; Bond, J. R.; Borrill, J.; Bouchet, F. R.; Burigana, C.; Butler, R. C.; Calabrese, E.; Catalano, A.; Chamballu, A.; Chiang, H. C.; Christensen, P. R.; Churazov, E.; Clements, D. L.; Colombo, L. P. L.; Combet, C.; Comis, B.; Couchot, F.; Coulais, A.; Crill, B. P.; Curto, A.; Cuttaia, F.; Danese, L.; Davies, R. D.; Davis, R. J.; de Bernardis, P.; de Rosa, A.; de Zotti, G.; Delabrouille, J.; Dickinson, C.; Diego, J. M.; Dole, H.; Donzelli, S.; Doré, O.; Douspis, M.; Ducout, A.; Dupac, X.; Efstathiou, G.; Elsner, F.; Enßlin, T. A.; Eriksen, H. K.; Finelli, F.; Flores-Cacho, I.; Forni, O.; Frailis, M.; Fraisse, A. A.; Franceschi, E.; Galeotta, S.; Galli, S.; Ganga, K.; Génova-Santos, R. T.; Giard, M.; Giraud-Héraud, Y.; Gjerløw, E.; González-Nuevo, J.; Górski, K. M.; Gregorio, A.; Gruppuso, A.; Gudmundsson, J. E.; Hansen, F. K.; Harrison, D. L.; Helou, G.; Hernández-Monteagudo, C.; Herranz, D.; Hildebrandt, S. R.; Hivon, E.; Hobson, M.; Hornstrup, A.; Hovest, W.; Huffenberger, K. M.; Hurier, G.; Jaffe, A. H.; Jaffe, T. R.; Jones, W. C.; Keihänen, E.; Keskitalo, R.; Kisner, T. S.; Kneissl, R.; Knoche, J.; Kunz, M.; Kurki-Suonio, H.; Lagache, G.; Lamarre, J.-M.; Langer, M.; Lasenby, A.; Lattanzi, M.; Lawrence, C. R.; Leonardi, R.; Levrier, F.; Lilje, P. B.; Linden-Vørnle, M.; López-Caniego, M.; Lubin, P. M.; Macías-Pérez, J. F.; Maffei, B.; Maggio, G.; Maino, D.; Mak, D. S. Y.; Mandolesi, N.; Mangilli, A.; Maris, M.; Martin, P. G.; Martínez-González, E.; Masi, S.; Matarrese, S.; Melchiorri, A.; Mennella, A.; Migliaccio, M.; Mitra, S.; Miville-Deschênes, M.-A.; Moneti, A.; Montier, L.; Morgante, G.; Mortlock, D.; Munshi, D.; Murphy, J. A.; Nati, F.; Natoli, P.; Noviello, F.; Novikov, D.; Novikov, I.; Oxborrow, C. A.; Paci, F.; Pagano, L.; Pajot, F.; Paoletti, D.; Partridge, B.; Pasian, F.; Pearson, T. J.; Perdereau, O.; Perotto, L.; Pettorino, V.; Piacentini, F.; Piat, M.; Pierpaoli, E.; Plaszczynski, S.; Pointecouteau, E.; Polenta, G.; Ponthieu, N.; Pratt, G. W.; Prunet, S.; Puget, J.-L.; Rachen, J. P.; Reinecke, M.; Remazeilles, M.; Renault, C.; Renzi, A.; Ristorcelli, I.; Rocha, G.; Rosset, C.; Rossetti, M.; Roudier, G.; Rubiño-Martín, J. A.; Rusholme, B.; Sandri, M.; Santos, D.; Savelainen, M.; Savini, G.; Scott, D.; Spencer, L. D.; Stolyarov, V.; Stompor, R.; Sunyaev, R.; Sutton, D.; Suur-Uski, A.-S.; Sygnet, J.-F.; Tauber, J. A.; Terenzi, L.; Toffolatti, L.; Tomasi, M.; Tristram, M.; Tucci, M.; Umana, G.; Valenziano, L.; Valiviita, J.; Van Tent, B.; Vielva, P.; Villa, F.; Wade, L. A.; Wandelt, B. D.; Wehus, I. K.; Welikala, N.; Yvon, D.; Zacchei, A.; Zonca, A.

    2016-09-01

    We use Planck data to detect the cross-correlation between the thermal Sunyaev-Zeldovich (tSZ) effect and the infrared emission from the galaxies that make up the cosmic infrared background (CIB). We first perform a stacking analysis towards Planck-confirmed galaxy clusters. We detect infrared emission produced by dusty galaxies inside these clusters and demonstrate that the infrared emission is about 50% more extended than the tSZ effect. Modelling the emission with a Navarro-Frenk-White profile, we find that the radial profile concentration parameter is c_500 = 1.00 (+0.18/-0.15). This indicates that infrared galaxies in the outskirts of clusters have higher infrared flux than cluster-core galaxies. We also study the cross-correlation between tSZ and CIB anisotropies, following three alternative approaches based on power spectrum analyses: (I) using a catalogue of confirmed clusters detected in Planck data; (II) using an all-sky tSZ map built from Planck frequency maps; and (III) using cross-spectra between Planck frequency maps. With the three different methods, we detect the tSZ-CIB cross-power spectrum at significance levels of (I) 6σ; (II) 3σ; and (III) 4σ. We model the tSZ-CIB cross-correlation signature and compare predictions with the measurements. The amplitude of the cross-correlation relative to the fiducial model is A_tSZ-CIB = 1.2 ± 0.3. This result is consistent with predictions for the tSZ-CIB cross-correlation assuming the best-fit cosmological model from Planck 2015 results along with the tSZ and CIB scaling relations.

  19. Matched Filter Stochastic Background Characterization for Hyperspectral Target Detection

    DTIC Science & Technology

    2005-09-30

    [Table of contents and figure list excerpt] Pre-Clustering MVN Test; Pre-Clustering Detection Results; Pre-Clustering Target Influence; Statistical Distance Exclusion and Low Contrast; Figure 2.7, ROC Curve Comparison of RX, K-Means, and Bayesian Pre-Clustering Applied to Anomaly Detection [Ashton, 1998]; Figure 2.8, ROC (truncated).

  20. The cosmological analysis of X-ray cluster surveys. IV. Testing ASpiX with template-based cosmological simulations

    NASA Astrophysics Data System (ADS)

    Valotti, A.; Pierre, M.; Farahi, A.; Evrard, A.; Faccioli, L.; Sauvageot, J.-L.; Clerc, N.; Pacaud, F.

    2018-06-01

    Context: This paper is the fourth of a series evaluating the ASpiX cosmological method, based on X-ray diagrams, which are constructed from simple cluster observable quantities, namely: count rate (CR), hardness ratio (HR), core radius (rc), and redshift. Aims: Following extensive tests on analytical toy catalogues (Paper III), we present the results of a more realistic study over 711 deg2 of template-based maps derived from a cosmological simulation. Methods: Dark matter haloes from the Aardvark simulation have been ascribed luminosities, temperatures, and core radii, using local scaling relations and assuming self-similar evolution. The predicted X-ray sky-maps were converted into XMM event lists, using a detailed instrumental simulator. The XXL pipeline was run on the resulting sky images and produced an observed cluster catalogue over which the tests were performed. This allowed us to investigate the relative power of various combinations of the CR, HR, rc, and redshift information. Two fitting methods were used: a traditional Markov chain Monte Carlo (MCMC) approach and a simple minimisation procedure (Amoeba), whose mean uncertainties are evaluated a posteriori by means of synthetic catalogues. The results were analysed and compared to the predictions from the Fisher analysis (FA). Results: For this particular catalogue realisation, assuming that the scaling relations are perfectly known, the CR-HR combination gives σ8 and Ωm at the 10% level, while CR-HR-rc-z improves this to ≤3%. Adding a second HR improves the results from the CR-HR1-rc combination, but to a lesser extent than when adding the redshift information. When all coefficients of the mass-temperature relation (M-T, including scatter) are also fitted, the cosmological parameters are constrained to within 5-10%, with larger uncertainties for the M-T coefficients (up to a factor of two for the scatter). The errors returned by the MCMC, those returned by Amoeba, and the FA predictions are in most cases in excellent agreement and always within a factor of two. We also study the impact of the scatter of the mass-size relation (M-Rc) on the number of detected clusters: for the cluster typical sizes usually assumed, the larger the scatter, the lower the number of detected objects. Conclusions: The present study confirms and extends the trends outlined in our previous analyses, namely the power of X-ray observable diagrams to successfully and easily fit, at the same time, the cosmological parameters, the cluster physics, and the survey selection, by involving all detected clusters. The accuracy levels quoted should not be considered as definitive. A number of simplifying hypotheses were made for testing purposes, but this should affect any method in the same way. The next publication will consider in greater detail the impact of cluster shapes (selection and measurements) and of cluster physics on the final error budget by means of hydrodynamical simulations.
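
    The "Amoeba" fitting step can be sketched as a Nelder-Mead minimisation of a chi-square between an observed diagram and a model prediction, as below. The toy model and data stand in for the actual ASpiX observable diagrams and likelihood.

        # Sketch: Nelder-Mead ("Amoeba") fit of a toy count-rate distribution.
        import numpy as np
        from scipy.optimize import minimize

        def model_counts(params, bins):
            amp, slope = params
            return amp * bins ** slope          # hypothetical observable model

        def chi2(params, bins, observed):
            m = model_counts(params, bins)
            return np.sum((observed - m) ** 2 / np.maximum(m, 1.0))

        bins = np.linspace(1.0, 10.0, 20)
        rng = np.random.default_rng(3)
        observed = rng.poisson(model_counts((200.0, -1.5), bins)).astype(float)

        fit = minimize(chi2, x0=(100.0, -1.0), args=(bins, observed),
                       method="Nelder-Mead")
        print(fit.x)   # recovered (amp, slope); uncertainties would come from
                       # synthetic catalogues, as described in the record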

  1. Entanglement detection in optical lattices of bosonic atoms with collective measurements

    NASA Astrophysics Data System (ADS)

    Tóth, Géza

    2004-05-01

    The minimum requirements for entanglement detection are discussed for a spin chain in which the spins cannot be individually accessed. The methods presented detect entangled states close to a cluster state and a many-body singlet state, and seem to be viable for experimental realization in optical lattices of two-state bosonic atoms. The entanglement criteria are based on entanglement witnesses and on the uncertainty of collective observables.

  2. Privacy protection versus cluster detection in spatial epidemiology.

    PubMed

    Olson, Karen L; Grannis, Shaun J; Mandl, Kenneth D

    2006-11-01

    Patient data that includes precise locations can reveal patients' identities, whereas data aggregated into administrative regions may preserve privacy and confidentiality. We investigated the effect of varying degrees of address precision (exact latitude and longitude vs. the center points of zip codes or census tracts) on detection of spatial clusters of cases. We simulated disease outbreaks by adding supplementary spatially clustered emergency department visits to authentic hospital emergency department syndromic surveillance data. We identified clusters with a spatial scan statistic and evaluated detection rate and accuracy. More clusters were identified, and clusters were more accurately detected, when exact locations were used. That is, these clusters contained at least half of the simulated points and involved few additional emergency department visits. These results were especially apparent when the synthetic clustered points crossed administrative boundaries and fell into multiple zip codes or census tracts. The spatial cluster detection algorithm performed better when addresses were analyzed as exact locations than when they were analyzed as center points of zip codes or census tracts, particularly when the clustered points crossed administrative boundaries. Use of precise addresses offers improved performance, but this practice must be weighed against privacy concerns in the establishment of public health data exchange policies.

  3. Characteristics of Clusters of Salmonella and Escherichia coli O157 Detected by Pulsed-Field Gel Electrophoresis that Predict Identification of Outbreaks.

    PubMed

    Jones, Timothy F; Sashti, Nupur; Ingram, Amanda; Phan, Quyen; Booth, Hillary; Rounds, Joshua; Nicholson, Cyndy S; Cosgrove, Shaun; Crocker, Kia; Gould, L Hannah

    2016-12-01

    Molecular subtyping of pathogens is critical for foodborne disease outbreak detection and investigation. Many clusters initially identified by pulsed-field gel electrophoresis (PFGE) are not confirmed as point-source outbreaks. We evaluated characteristics of clusters that can help prioritize investigations to maximize effective use of limited resources. A multiagency collaboration (FoodNet) collected data on Salmonella and Escherichia coli O157 clusters for 3 years. Cluster size, timing, extent, and nature of epidemiologic investigations were analyzed to determine associations with whether the cluster was identified as a confirmed outbreak. During the 3-year study period, 948 PFGE clusters were identified; 849 (90%) were Salmonella and 99 (10%) were E. coli O157. Of those, 192 (20%) were ultimately identified as outbreaks (154 [18%] of Salmonella and 38 [38%] of E. coli O157 clusters). Successful investigation was significantly associated with larger cluster size, more rapid submission of isolates (e.g., for Salmonella, 6 days for outbreaks vs. 8 days for nonoutbreaks) and PFGE result reporting to investigators (16 days vs. 29 days, respectively), and performance of analytic studies (completed in 33% of Salmonella outbreaks vs. 1% of nonoutbreaks) and environmental investigations (40% and 1%, respectively). Intervals between first and second cases in a cluster did not differ significantly between outbreaks and nonoutbreaks. Molecular subtyping of pathogens is a rapidly advancing technology, and successfully identifying outbreaks will vary by pathogen and methods used. Understanding criteria for successfully investigating outbreaks is critical for efficiently using limited resources.

  4. Geospatial Distribution and Clustering of Chlamydia trachomatis in Communities Undergoing Mass Azithromycin Treatment

    PubMed Central

    Yohannan, Jithin; He, Bing; Wang, Jiangxia; Greene, Gregory; Schein, Yvette; Mkocha, Harran; Munoz, Beatriz; Quinn, Thomas C.; Gaydos, Charlotte; West, Sheila K.

    2014-01-01

    Purpose. We assessed, over time, the spatial clustering of households with Chlamydia trachomatis infection (CI) and active trachoma (AT) in villages undergoing mass treatment with azithromycin (MDA). Methods. We obtained global positioning system (GPS) coordinates for all households in four villages in Kongwa District, Tanzania. Every 6 months for a period of 42 months, our team examined all children under 10 for AT and tested for CI with ocular swabbing and Amplicor. Villages underwent four rounds of annual MDA. We classified households as having ≥1 child with CI (or AT) or having 0 children with CI (or AT). We calculated the difference in the K function between households with and without CI or AT to detect clustering at each time point. Results. Between 918 and 991 households were included over the 42 months of this analysis. At baseline, 306 households (32.59%) had ≥1 child with CI, which declined to 73 households (7.50%) at 42 months. We observed borderline clustering of households with CI at 12 months after one round of MDA and statistically significant clustering with growing cluster sizes between 18 and 24 months after two rounds of MDA. Clusters diminished in size at 30 months, after three rounds of MDA. Active trachoma did not cluster at any time point. Conclusions. This study demonstrates that CI clusters after multiple rounds of MDA. Clusters of infection may increase in size if the annual antibiotic pressure is removed. The absence of growth after the three rounds suggests the start of control of transmission. PMID:24906862
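
    The K-function comparison can be sketched as below: a naive (uncorrected) Ripley's K is estimated for case and non-case household locations and differenced across distance bins. Edge correction and the permutation-based significance assessment used in such analyses are omitted, and the coordinates are simulated.

        # Sketch: difference in Ripley's K between case and control households.
        import numpy as np
        from scipy.spatial.distance import pdist

        def ripley_k(points, radii, area):
            """Naive (uncorrected) Ripley's K for 2-D points in a region of given area."""
            n = len(points)
            d = pdist(points)
            lam = n / area
            return np.array([(d < r).sum() * 2.0 / (n * lam) for r in radii])

        def k_difference(case_xy, control_xy, radii, area):
            return ripley_k(case_xy, radii, area) - ripley_k(control_xy, radii, area)

        rng = np.random.default_rng(4)
        area = 1.0
        controls = rng.uniform(0, 1, (300, 2))                # scattered households
        cases = np.vstack([rng.normal(0.5, 0.05, (40, 2)),    # clustered cases
                           rng.uniform(0, 1, (20, 2))])
        radii = np.linspace(0.02, 0.2, 10)
        print(k_difference(cases, controls, radii, area))     # > 0 where cases cluster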

  5. Visual verification and analysis of cluster detection for molecular dynamics.

    PubMed

    Grottel, Sebastian; Reina, Guido; Vrabec, Jadran; Ertl, Thomas

    2007-01-01

    A current research topic in molecular thermodynamics is the condensation of vapor to liquid and the investigation of this process at the molecular level. Condensation is found in many physical phenomena, e.g. the formation of atmospheric clouds or the processes inside steam turbines, where a detailed knowledge of the dynamics of condensation processes will help to optimize energy efficiency and avoid problems with droplets of macroscopic size. The key properties of these processes are the nucleation rate and the critical cluster size. For the calculation of these properties it is essential to make use of a meaningful definition of molecular clusters, which is currently not a completely resolved issue. In this paper a framework capable of interactively visualizing molecular datasets of such nucleation simulations is presented, with an emphasis on the detected molecular clusters. To check the quality of the results of the cluster detection, our framework introduces the concept of flow groups to highlight potential cluster evolution over time that is not detected by the employed algorithm. To confirm the findings of the visual analysis, we coupled the rendering view with a schematic view of the clusters' evolution. This makes it possible to rapidly assess the quality of the molecular cluster detection algorithm and to identify locations in the simulation data, in space as well as in time, where the cluster detection fails. Thus, thermodynamics researchers can eliminate weaknesses in their cluster detection algorithms. Several examples of the effective and efficient use of our tool are presented.

  6. Global detection approach for clustered microcalcifications in mammograms using a deep learning network.

    PubMed

    Wang, Juan; Nishikawa, Robert M; Yang, Yongyi

    2017-04-01

    In computerized detection of clustered microcalcifications (MCs) from mammograms, the traditional approach is to apply a pattern detector to locate the presence of individual MCs, which are subsequently grouped into clusters. Such an approach is often susceptible to the occurrence of false positives (FPs) caused by local image patterns that resemble MCs. We investigate the feasibility of a direct detection approach to determining whether an image region contains clustered MCs or not. Toward this goal, we develop a deep convolutional neural network (CNN) as the classifier model, whose input is a large image window ([Formula: see text] in size). The multiple layers in the CNN classifier are trained to automatically extract image features relevant to MCs at different spatial scales. In the experiments, we demonstrated this approach on a dataset consisting of both screen-film mammograms and full-field digital mammograms. We evaluated the detection performance both on classifying image regions of clustered MCs using a receiver operating characteristic (ROC) analysis and on detecting clustered MCs from full mammograms by a free-response receiver operating characteristic analysis. For comparison, we also considered a recently developed MC detector with FP suppression. In classifying image regions of clustered MCs, the CNN classifier achieved 0.971 in the area under the ROC curve, compared to 0.944 for the MC detector. In detecting clustered MCs from full mammograms, at 90% sensitivity, the CNN classifier obtained an FP rate of 0.69 clusters/image, compared to 1.17 clusters/image by the MC detector. These results indicate that using global image features can be more effective in discriminating clustered MCs from FPs caused by various sources, such as linear structures, thereby providing a more accurate detection of clustered MCs on mammograms.
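
    A minimal sketch of a window-level CNN classifier of this kind is given below (in PyTorch). The window size, layer configuration, and training step are illustrative assumptions, not the architecture reported in the record.

        # Sketch: CNN that labels an image window as "cluster present" vs "absent".
        import torch
        import torch.nn as nn

        class ClusterCNN(nn.Module):
            def __init__(self, window=96):
                super().__init__()
                self.features = nn.Sequential(
                    nn.Conv2d(1, 16, 5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
                    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                )
                self.classifier = nn.Sequential(
                    nn.Flatten(),
                    nn.Linear(64 * (window // 8) ** 2, 128), nn.ReLU(),
                    nn.Linear(128, 1),          # logit: cluster present vs absent
                )

            def forward(self, x):
                return self.classifier(self.features(x))

        model = ClusterCNN(window=96)
        dummy = torch.randn(8, 1, 96, 96)       # batch of 8 grey-level windows
        loss = nn.BCEWithLogitsLoss()(model(dummy).squeeze(1), torch.ones(8))
        loss.backward()                          # one illustrative training step
        print(float(loss))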

  7. Edge Principal Components and Squash Clustering: Using the Special Structure of Phylogenetic Placement Data for Sample Comparison

    PubMed Central

    Matsen IV, Frederick A.; Evans, Steven N.

    2013-01-01

    Principal components analysis (PCA) and hierarchical clustering are two of the most heavily used techniques for analyzing the differences between nucleic acid sequence samples taken from a given environment. They have led to many insights regarding the structure of microbial communities. We have developed two new complementary methods that leverage how this microbial community data sits on a phylogenetic tree. Edge principal components analysis enables the detection of important differences between samples that contain closely related taxa. Each principal component axis is a collection of signed weights on the edges of the phylogenetic tree, and these weights are easily visualized by a suitable thickening and coloring of the edges. Squash clustering outputs a (rooted) clustering tree in which each internal node corresponds to an appropriate “average” of the original samples at the leaves below the node. Moreover, the length of an edge is a suitably defined distance between the averaged samples associated with the two incident nodes, rather than the less interpretable average of distances produced by UPGMA, the most widely used hierarchical clustering method in this context. We present these methods and illustrate their use with data from the human microbiome. PMID:23505415

  8. Insights into quasar UV spectra using unsupervised clustering analysis

    NASA Astrophysics Data System (ADS)

    Tammour, A.; Gallagher, S. C.; Daley, M.; Richards, G. T.

    2016-06-01

    Machine learning techniques can provide powerful tools to detect patterns in multidimensional parameter space. We use K-means - a simple yet powerful unsupervised clustering algorithm which picks out structure in unlabelled data - to study a sample of quasar UV spectra from the Quasar Catalog of the 10th Data Release of the Sloan Digital Sky Survey (SDSS-DR10) of Paris et al. Detecting patterns in large data sets helps us gain insights into the physical conditions and processes giving rise to the observed properties of quasars. We use K-means to find clusters in the parameter space of the equivalent width (EW), the blue- and red-half-width at half-maximum (HWHM) of the Mg II 2800 Å line, the C IV 1549 Å line, and the C III] 1908 Å blend in samples of broad absorption line (BAL) and non-BAL quasars at redshift 1.6-2.1. Using this method, we successfully recover correlations well-known in the UV regime such as the anti-correlation between the EW and blueshift of the C IV emission line and the shape of the ionizing spectral energy distribution (SED) probed by the strength of He II and the Si III]/C III] ratio. We find this to be particularly evident when the properties of C III] are used to find the clusters, while those of Mg II proved to be less strongly correlated with the properties of the other lines in the spectra such as the width of C IV or the Si III]/C III] ratio. We conclude that unsupervised clustering methods (such as K-means) are powerful tools for finding `natural' binning boundaries in multidimensional data sets and discuss caveats and future work.
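
    The clustering step can be sketched as below: a table of emission-line measurements is standardised and passed to K-means. The column meanings and toy data are placeholders for the EW and HWHM measurements described above.

        # Sketch: K-means on standardised emission-line parameters.
        import numpy as np
        from sklearn.preprocessing import StandardScaler
        from sklearn.cluster import KMeans

        def cluster_spectra(line_params, k=4, random_state=0):
            """line_params: (n_quasars, n_features) array of line measurements."""
            X = StandardScaler().fit_transform(line_params)
            km = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit(X)
            return km.labels_, km.cluster_centers_

        # toy data standing in for e.g. [EW(CIII]), HWHM_blue, HWHM_red, EW(HeII)]
        rng = np.random.default_rng(5)
        params = np.vstack([rng.normal([20, 2000, 2500, 3], [5, 300, 300, 1], (200, 4)),
                            rng.normal([40, 1200, 1500, 8], [8, 200, 200, 2], (200, 4))])
        labels, centers = cluster_spectra(params, k=2)
        print(np.bincount(labels), centers.round(2))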

  9. Label-free high-throughput detection and quantification of circulating melanoma tumor cell clusters by linear-array-based photoacoustic tomography

    NASA Astrophysics Data System (ADS)

    Hai, Pengfei; Zhou, Yong; Zhang, Ruiying; Ma, Jun; Li, Yang; Shao, Jin-Yu; Wang, Lihong V.

    2017-04-01

    Circulating tumor cell (CTC) clusters, arising from multicellular groupings in a primary tumor, greatly elevate the metastatic potential of cancer compared with single CTCs. High-throughput detection and quantification of CTC clusters are important for understanding the tumor metastatic process and improving cancer therapy. Here, we applied a linear-array-based photoacoustic tomography (LA-PAT) system and improved the image reconstruction for label-free high-throughput CTC cluster detection and quantification in vivo. The feasibility was first demonstrated by imaging CTC clusters ex vivo. The relationship between the contrast-to-noise ratios (CNRs) and the number of cells in melanoma tumor cell clusters was investigated and verified. Melanoma CTC clusters with a minimum of four cells could be detected, and the number of cells could be computed from the CNR. Finally, we demonstrated imaging of injected melanoma CTC clusters in rats in vivo. Similarly, the number of cells in the melanoma CTC clusters could be quantified. The data showed that larger CTC clusters had faster clearance rates in the bloodstream, which agreed with the literature. The results demonstrated the capability of LA-PAT to detect and quantify melanoma CTC clusters in vivo and showed its potential for tumor metastasis study and cancer therapy.

  10. Assessment of Differential Item Functioning in Health-Related Outcomes: A Simulation and Empirical Analysis with Hierarchical Polytomous Data

    PubMed Central

    Sharafi, Zahra

    2017-01-01

    Background The purpose of this study was to evaluate the effectiveness of two methods of detecting differential item functioning (DIF) in the presence of multilevel data and polytomously scored items. The assessment of DIF with multilevel data (e.g., patients nested within hospitals, hospitals nested within districts) from large-scale assessment programs has received considerable attention, but very few studies have evaluated the effect of the hierarchical structure of the data on DIF detection for polytomously scored items. Methods Ordinal logistic regression (OLR) and hierarchical ordinal logistic regression (HOLR) were utilized to assess DIF in simulated and real multilevel polytomous data. Six factors (DIF magnitude, grouping variable, intraclass correlation coefficient, number of clusters, number of participants per cluster, and item discrimination parameter) with a fully crossed design were considered in the simulation study. Furthermore, Pediatric Quality of Life Inventory™ (PedsQL™) 4.0 data collected from 576 healthy school children were analyzed. Results Overall, results indicate that both methods performed equivalently in terms of Type I error control and detection power. Conclusions The current study showed a negligible difference between OLR and HOLR in detecting DIF with polytomously scored items in a hierarchical structure. Implications and considerations for analyzing real data are also discussed. PMID:29312463

  11. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Warren, Lucy M.; Mackenzie, Alistair; Cooke, Julie

    Purpose: This study aims to investigate if microcalcification detection varies significantly when mammographic images are acquired using different image qualities, including: different detectors, dose levels, and different image processing algorithms. An additional aim was to determine how the standard European method of measuring image quality using threshold gold thickness measured with a CDMAM phantom and the associated limits in current EU guidelines relate to calcification detection. Methods: One hundred and sixty two normal breast images were acquired on an amorphous selenium direct digital (DR) system. Microcalcification clusters extracted from magnified images of slices of mastectomies were electronically inserted into half of the images. The calcification clusters had a subtle appearance. All images were adjusted using a validated mathematical method to simulate the appearance of images from a computed radiography (CR) imaging system at the same dose, from both systems at half this dose, and from the DR system at quarter this dose. The original 162 images were processed with both Hologic and Agfa (Musica-2) image processing. All other image qualities were processed with Agfa (Musica-2) image processing only. Seven experienced observers marked and rated any identified suspicious regions. Free response operating characteristic (FROC) and ROC analyses were performed on the data. The lesion sensitivity at a nonlesion localization fraction (NLF) of 0.1 was also calculated. Images of the CDMAM mammographic test phantom were acquired using the automatic setting on the DR system. These images were modified to the additional image qualities used in the observer study. The images were analyzed using automated software. In order to assess the relationship between threshold gold thickness and calcification detection a power law was fitted to the data. Results: There was a significant reduction in calcification detection using CR compared with DR: the alternative FROC (AFROC) area decreased from 0.84 to 0.63 and the ROC area decreased from 0.91 to 0.79 (p < 0.0001). This corresponded to a 30% drop in lesion sensitivity at a NLF equal to 0.1. Detection was also sensitive to the dose used. There was no significant difference in detection between the two image processing algorithms used (p > 0.05). It was additionally found that lower threshold gold thickness from CDMAM analysis implied better cluster detection. The measured threshold gold thickness passed the acceptable limit set in the EU standards for all image qualities except half dose CR. However, calcification detection varied significantly between image qualities. This suggests that the current EU guidelines may need revising. Conclusions: Microcalcification detection was found to be sensitive to detector and dose used. Standard measurements of image quality were a good predictor of microcalcification cluster detection.

  12. Mixture model-based clustering and logistic regression for automatic detection of microaneurysms in retinal images

    NASA Astrophysics Data System (ADS)

    Sánchez, Clara I.; Hornero, Roberto; Mayo, Agustín; García, María

    2009-02-01

    Diabetic retinopathy is one of the leading causes of blindness and vision defects in developed countries. Early detection and diagnosis are crucial to avoid visual complications. Microaneurysms are the first ocular signs of this disease, and their detection is of paramount importance for the development of a computer-aided diagnosis technique that permits a prompt diagnosis. However, the detection of microaneurysms in retinal images is a difficult task due to the wide variability that these images usually present in screening programs. We propose a statistical approach based on mixture model-based clustering and logistic regression that is robust to changes in the appearance of retinal fundus images. The method is evaluated on the public database of the Retinopathy Online Challenge in order to obtain an objective performance measure and to allow a comparative study with other proposed algorithms.
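
    The two-stage idea can be sketched as below: a Gaussian mixture model segments candidate dark regions from the green channel, and a logistic regression scores candidate features. The feature set, component count, and toy data are assumptions, not the published method's exact design.

        # Sketch: mixture-model candidate segmentation + logistic regression scoring.
        import numpy as np
        from sklearn.mixture import GaussianMixture
        from sklearn.linear_model import LogisticRegression

        def candidate_mask(green_channel, n_components=3):
            """Cluster pixel intensities; return a mask of the darkest component."""
            x = green_channel.reshape(-1, 1).astype(float)
            gmm = GaussianMixture(n_components=n_components, random_state=0).fit(x)
            darkest = np.argmin(gmm.means_.ravel())
            return (gmm.predict(x) == darkest).reshape(green_channel.shape)

        def train_candidate_classifier(features, labels):
            """features: per-candidate descriptors (area, circularity, contrast, ...)."""
            return LogisticRegression(max_iter=1000).fit(features, labels)

        # toy demonstration
        rng = np.random.default_rng(6)
        img = rng.normal(120, 10, (64, 64))
        img[30:33, 30:33] -= 40                     # small dark blob as a candidate
        print(candidate_mask(img).sum(), "candidate pixels")
        clf = train_candidate_classifier(rng.normal(size=(100, 3)),
                                         rng.integers(0, 2, 100))
        print(clf.predict_proba(rng.normal(size=(5, 3)))[:, 1])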

  13. Improving Glaucoma Detection Using Spatially Correspondent Clusters of Damage and by Combining Standard Automated Perimetry and Optical Coherence Tomography

    PubMed Central

    Raza, Ali S.; Zhang, Xian; De Moraes, Carlos G. V.; Reisman, Charles A.; Liebmann, Jeffrey M.; Ritch, Robert; Hood, Donald C.

    2014-01-01

    Purpose. To improve the detection of glaucoma, techniques for assessing local patterns of damage and for combining structure and function were developed. Methods. Standard automated perimetry (SAP) and frequency-domain optical coherence tomography (fdOCT) data, consisting of macular retinal ganglion cell plus inner plexiform layer (mRGCPL) as well as macular and optic disc retinal nerve fiber layer (mRNFL and dRNFL) thicknesses, were collected from 52 eyes of 52 healthy controls and 156 eyes of 96 glaucoma suspects and patients. In addition to generating simple global metrics, SAP and fdOCT data were searched for contiguous clusters of abnormal points and converted to a continuous metric (pcc). The pcc metric, along with simpler methods, was used to combine the information from the SAP and fdOCT. The performance of different methods was assessed using the area under receiver operator characteristic curves (AROC scores). Results. The pcc metric performed better than simple global measures for both the fdOCT and SAP. The best combined structure-function metric (mRGCPL&SAP pcc, AROC = 0.868 ± 0.032) was better (statistically significant) than the best metrics for independent measures of structure and function. When SAP was used as part of the inclusion and exclusion criteria, AROC scores increased for all metrics, including the best combined structure-function metric (AROC = 0.975 ± 0.014). Conclusions. A combined structure-function metric improved the detection of glaucomatous eyes. Overall, the primary sources of value-added for glaucoma detection stem from the continuous cluster search (the pcc), the mRGCPL data, and the combination of structure and function. PMID:24408977

  14. The molecular epidemiology of HIV-1 in the Comunidad Valenciana (Spain): analysis of transmission clusters.

    PubMed

    Patiño-Galindo, Juan Ángel; Torres-Puente, Manoli; Bracho, María Alma; Alastrué, Ignacio; Juan, Amparo; Navarro, David; Galindo, María José; Ocete, Dolores; Ortega, Enrique; Gimeno, Concepción; Belda, Josefina; Domínguez, Victoria; Moreno, Rosario; González-Candelas, Fernando

    2017-09-14

    HIV infections are still a very serious concern for public health worldwide. We have applied molecular evolution methods to study the HIV-1 epidemic in the Comunidad Valenciana (CV, Spain) from a public health surveillance perspective. For this, we analysed 1804 HIV-1 sequences comprising the protease and reverse transcriptase (PR/RT) coding regions, sampled between 2004 and 2014. These sequences were subtyped and subjected to phylogenetic analyses in order to detect transmission clusters. In addition, univariate and multinomial comparisons were performed to detect epidemiological differences between HIV-1 subtypes and risk groups. The HIV epidemic in the CV is dominated by subtype B infections among local men who have sex with men (MSM). In total, 270 transmission clusters were identified (>57% of the dataset), 12 of which included ≥10 patients: 11 of subtype B (9 affecting MSM) and one (n = 21) of CRF14, affecting predominantly intravenous drug users (IDUs). Dated phylogenies revealed these large clusters to have originated from the mid-1980s to the early 2000s. Subtype B was more likely to form transmission clusters than non-B variants, and MSM were more likely to cluster than other risk groups. Multinomial analyses revealed an association between non-B variants, which are not yet established in the local population, and different foreign groups.

  15. Statistical detection of geographic clusters of resistant Escherichia coli in a regional network with WHONET and SaTScan

    PubMed Central

    Park, Rachel; O'Brien, Thomas F.; Huang, Susan S.; Baker, Meghan A.; Yokoe, Deborah S.; Kulldorff, Martin; Barrett, Craig; Swift, Jamie; Stelling, John

    2016-01-01

    Objectives While antimicrobial resistance threatens the prevention, treatment, and control of infectious diseases, systematic analysis of routine microbiology laboratory test results worldwide can provide alerts to new threats and promote a timely response. This study explores statistical algorithms for recognizing geographic clustering of multi-resistant microbes within a healthcare network and monitoring the dissemination of new strains over time. Methods Escherichia coli antimicrobial susceptibility data from a three-year period stored in WHONET were analyzed across ten facilities in a healthcare network utilizing SaTScan's spatial multinomial model with two models for defining geographic proximity. We explored geographic clustering of multi-resistance phenotypes within the network and changes in clustering over time. Results Geographic clusters identified with the latitude/longitude and the non-parametric facility-grouping models were similar, while the latter offers greater flexibility and generalizability. Iterative application of the clustering algorithms suggested recognition of the initial appearance of invasive E. coli ST131 in the clinical database of a single hospital and its subsequent dissemination to others. Conclusion Systematic analysis of routine antimicrobial resistance susceptibility test results supports the recognition of geographic clustering of microbial phenotypic subpopulations with WHONET and SaTScan, and iterative application of these algorithms can detect the initial appearance in, and dissemination across, a region, prompting early investigation, response, and containment measures. PMID:27530311
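
    The sketch below is a toy stand-in for the spatial multinomial scan: for each candidate facility grouping, the distribution of resistance phenotypes inside versus outside is compared with a multinomial log-likelihood ratio, and the best grouping is assessed by simulation under a pooled null. SaTScan's actual implementation and the WHONET data model are not reproduced here.

        # Toy multinomial scan over facility groupings (illustrative only).
        import numpy as np

        def multinomial_loglik(counts):
            """Maximised multinomial log-likelihood (up to a constant) of a count vector."""
            n = counts.sum()
            if n == 0:
                return 0.0
            nz = counts > 0
            return float(np.sum(counts[nz] * np.log(counts[nz] / n)))

        def llr(inside, outside):
            """Separate inside/outside phenotype distributions vs a common one."""
            return (multinomial_loglik(inside) + multinomial_loglik(outside)
                    - multinomial_loglik(inside + outside))

        def scan(counts_by_facility, groupings, n_sim=199, seed=0):
            """counts_by_facility: (n_facilities, n_phenotypes); groupings: candidate clusters."""
            rng = np.random.default_rng(seed)
            cbf = np.asarray(counts_by_facility, dtype=float)
            totals = cbf.sum(axis=1).astype(int)
            pooled_p = cbf.sum(axis=0) / cbf.sum()

            def best_llr(matrix):
                grand = matrix.sum(axis=0)
                return max(llr(matrix[g].sum(axis=0), grand - matrix[g].sum(axis=0))
                           for g in groupings)

            observed = best_llr(cbf)
            null = [best_llr(np.array([rng.multinomial(t, pooled_p) for t in totals],
                                      dtype=float))
                    for _ in range(n_sim)]
            p_value = (1 + sum(s >= observed for s in null)) / (n_sim + 1)
            return observed, p_value

        # toy data: 5 facilities x 3 phenotypes; facility 0 has an excess of the third
        counts = [[10, 5, 30], [12, 6, 4], [9, 7, 5], [11, 5, 6], [10, 6, 5]]
        groupings = [[0], [1], [2], [3], [4], [0, 1], [2, 3, 4]]
        print(scan(counts, groupings))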

  16. Defining syndromes using cattle meat inspection data for syndromic surveillance purposes: a statistical approach with the 2005–2010 data from ten French slaughterhouses

    PubMed Central

    2013-01-01

    Background The slaughterhouse is a central processing point for food animals and thus a source of both demographic data (age, breed, sex) and health-related data (reason for condemnation and condemned portions) that are not available through other sources. Using these data for syndromic surveillance is therefore tempting. However, many possible reasons for condemnation and condemned portions exist, making the definition of relevant syndromes challenging. The objective of this study was to determine a typology of cattle with at least one portion of the carcass condemned in order to define syndromes. Multiple factor analysis (MFA) in combination with clustering methods was performed using both health-related data and demographic data. Results Analyses were performed on 381,186 cattle with at least one portion of the carcass condemned among the 1,937,917 cattle slaughtered in ten French abattoirs. The MFA and clustering methods led to 12 clusters considered stable across year of slaughter and slaughterhouse. One cluster was specific to a disease of public health importance (cysticercosis). Two clusters were linked to the slaughtering process (fecal contamination of heart or lungs and deterioration lesions). Two clusters, respectively characterized by chronic liver lesions and chronic peritonitis, could be linked to diseases of economic importance to farmers. Three clusters could be linked respectively to reticulo-pericarditis, fatty liver syndrome and farmer's lung syndrome, which are related to both diseases of economic importance to farmers and herd management issues. Three clusters, respectively characterized by arthritis, myopathy and Dark Firm Dry (DFD) meat, could notably be linked to animal welfare issues. Finally, one cluster, characterized by bronchopneumonia, could be linked to both animal health and herd management issues. Conclusion The statistical approach of combining multiple factor analysis with cluster analysis showed its relevance for the detection of syndromes using available large and complex slaughterhouse data. The advantages of this statistical approach are to i) define groups of reasons for condemnation based on meat inspection data, ii) help group reasons for condemnation among a list of various possible reasons for which a consensus among experts could be difficult to reach, and iii) assign each animal to a single syndrome, which allows changes in syndrome trends to be detected, revealing unusual patterns in known diseases and the emergence of new diseases. PMID:23628140
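
    As a simple stand-in for the "dimension reduction then clustering" strategy described above (true multiple factor analysis handles grouped mixed variables, which is not reproduced here), the sketch below one-hot encodes categorical condemnation records, reduces them with PCA, and applies hierarchical clustering. Variable names and data are hypothetical.

        # Simplified substitute for MFA + clustering on categorical inspection data.
        import numpy as np
        import pandas as pd
        from sklearn.decomposition import PCA
        from sklearn.cluster import AgglomerativeClustering

        def cluster_condemnations(df, n_components=10, n_clusters=12):
            X = pd.get_dummies(df).to_numpy(dtype=float)      # binary indicators
            Z = PCA(n_components=min(n_components, X.shape[1])).fit_transform(X)
            return AgglomerativeClustering(n_clusters=n_clusters).fit_predict(Z)

        # hypothetical records: reason for condemnation, condemned portion, sex, age class
        rng = np.random.default_rng(7)
        df = pd.DataFrame({
            "reason": rng.choice(["peritonitis", "pneumonia", "arthritis"], 500),
            "portion": rng.choice(["liver", "lung", "carcass"], 500),
            "sex": rng.choice(["M", "F"], 500),
            "age_class": rng.choice(["calf", "young", "adult"], 500),
        })
        labels = cluster_condemnations(df, n_clusters=4)
        print(pd.Series(labels).value_counts())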

  17. A PVC/polypyrrole sensor designed for beef taste detection using electrochemical methods and sensory evaluation.

    PubMed

    Zhu, Lingtao; Wang, Xiaodan; Han, Yunxiu; Cai, Yingming; Jin, Jiahui; Wang, Hongmei; Xu, Liping; Wu, Ruijia

    2018-03-01

    An electrochemical sensor for the detection of beef taste was designed in this study. The sensor was based on a polyvinyl chloride/polypyrrole (PVC/PPy) structure, which was polymerized onto the surface of a platinum (Pt) electrode to form a Pt-PPy-PVC film. The sensor was characterized by electrochemical methods, namely electrochemical impedance spectroscopy (EIS) and cyclic voltammetry (CV). The sensor was applied to 10 rib-eye beef samples, and its accuracy was validated by sensory evaluation and ion sensor detection. Several cluster analysis methods were used in the study to distinguish the beef samples. According to the obtained results, the designed sensor showed a high degree of association between electrochemical detection and sensory evaluation, which demonstrates that it is a fast and precise sensor for beef taste detection. Copyright © 2017 Elsevier Ltd. All rights reserved.

  18. Stochastic competitive learning in complex networks.

    PubMed

    Silva, Thiago Christiano; Zhao, Liang

    2012-03-01

    Competitive learning is an important machine learning approach that is widely employed in artificial neural networks. In this paper, we present a rigorous definition of a new type of competitive learning scheme realized on large-scale networks. The model consists of several particles walking within the network and competing with each other to occupy as many nodes as possible, while attempting to reject intruder particles. Each particle's walking rule is a stochastic combination of random and preferential movements. The model has been applied to solve community detection and data clustering problems. Computer simulations reveal that the proposed technique achieves high precision in community and cluster detection, as well as low computational complexity. Moreover, we have developed an efficient method for estimating the most likely number of clusters by using an evaluator index that monitors the information generated by the competition process itself. We hope this paper will provide an alternative approach to the study of competitive learning.
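
    A toy sketch of the particle-competition idea is given below: particles walk the graph with a mix of random and preferential moves and reinforce their domination of visited nodes. The update rules and parameters are simplified assumptions, not the authors' exact scheme.

        # Toy particle-competition community detection (simplified update rules).
        import numpy as np

        def particle_competition(adj, n_particles, n_steps=20000, p_pref=0.6,
                                 delta=0.1, seed=0):
            rng = np.random.default_rng(seed)
            n = adj.shape[0]
            dom = np.full((n, n_particles), 1.0 / n_particles)   # domination levels
            pos = rng.choice(n, n_particles, replace=False)      # particle positions
            for _ in range(n_steps):
                for k in range(n_particles):
                    nbrs = np.flatnonzero(adj[pos[k]])
                    if nbrs.size == 0:
                        continue
                    if rng.random() < p_pref:                    # preferential move
                        w = dom[nbrs, k]
                        pos[k] = rng.choice(nbrs, p=w / w.sum())
                    else:                                        # random move
                        pos[k] = rng.choice(nbrs)
                    dom[pos[k]] *= (1 - delta)                   # weaken rivals...
                    dom[pos[k], k] += delta                      # ...reinforce owner
            return dom.argmax(axis=1)                            # node labels

        # two dense blocks weakly connected
        rng = np.random.default_rng(8)
        A = (rng.random((30, 30)) < 0.05).astype(float)
        A[:15, :15] = (rng.random((15, 15)) < 0.6)
        A[15:, 15:] = (rng.random((15, 15)) < 0.6)
        A = np.triu(A, 1)
        A = A + A.T
        print(particle_competition(A, n_particles=2))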

  19. Multifocal visual evoked potentials for early glaucoma detection.

    PubMed

    Weizer, Jennifer S; Musch, David C; Niziol, Leslie M; Khan, Naheed W

    2012-07-01

    To compare multifocal visual evoked potentials (mfVEP) with other detection methods in early open-angle glaucoma. Ten patients with suspected glaucoma and 5 with early open-angle glaucoma underwent mfVEP, standard automated perimetry (SAP), short-wave automated perimetry, frequency-doubling technology perimetry, and nerve fiber layer optical coherence tomography. Nineteen healthy control subjects underwent mfVEP and SAP for comparison. Comparisons between groups involving continuous variables were made using independent t tests; for categorical variables, Fisher's exact test was used. Monocular mfVEP cluster defects were associated with an increased SAP pattern standard deviation (P = .0195). Visual fields that showed interocular mfVEP cluster defects were more likely to also show superior quadrant nerve fiber layer thinning by OCT (P = .0152). Multifocal visual evoked potential cluster defects are associated with a functional and an anatomic measure that both relate to glaucomatous optic neuropathy. Copyright 2012, SLACK Incorporated.

  20. Statistical Significance for Hierarchical Clustering

    PubMed Central

    Kimes, Patrick K.; Liu, Yufeng; Hayes, D. Neil; Marron, J. S.

    2017-01-01

    Summary Cluster analysis has proved to be an invaluable tool for the exploratory and unsupervised analysis of high dimensional datasets. Among methods for clustering, hierarchical approaches have enjoyed substantial popularity in genomics and other fields for their ability to simultaneously uncover multiple layers of clustering structure. A critical and challenging question in cluster analysis is whether the identified clusters represent important underlying structure or are artifacts of natural sampling variation. Few approaches have been proposed for addressing this problem in the context of hierarchical clustering, for which the problem is further complicated by the natural tree structure of the partition, and the multiplicity of tests required to parse the layers of nested clusters. In this paper, we propose a Monte Carlo based approach for testing statistical significance in hierarchical clustering which addresses these issues. The approach is implemented as a sequential testing procedure guaranteeing control of the family-wise error rate. Theoretical justification is provided for our approach, and its power to detect true clustering structure is illustrated through several simulation studies and applications to two cancer gene expression datasets. PMID:28099990
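
    The Monte Carlo idea can be sketched as below: a cluster-strength index for the two-way split found by hierarchical clustering is compared with the same index computed on Gaussian null datasets with matching covariance. This mirrors the general approach rather than the paper's sequential, family-wise-error-controlling procedure.

        # Sketch: Monte Carlo significance of a hierarchical two-way split.
        import numpy as np
        from scipy.cluster.hierarchy import linkage, fcluster

        def cluster_index(X, labels):
            """Within-cluster sum of squares over total sum of squares (lower = stronger)."""
            tot = ((X - X.mean(axis=0)) ** 2).sum()
            within = sum(((X[labels == c] - X[labels == c].mean(axis=0)) ** 2).sum()
                         for c in np.unique(labels))
            return within / tot

        def two_way_split(X):
            return fcluster(linkage(X, method="ward"), t=2, criterion="maxclust")

        def significance(X, n_sim=200, seed=0):
            rng = np.random.default_rng(seed)
            obs = cluster_index(X, two_way_split(X))
            mean, cov = X.mean(axis=0), np.cov(X, rowvar=False)
            null = []
            for _ in range(n_sim):
                Z = rng.multivariate_normal(mean, cov, size=len(X))
                null.append(cluster_index(Z, two_way_split(Z)))
            return (1 + sum(s <= obs for s in null)) / (n_sim + 1)   # Monte Carlo p-value

        rng = np.random.default_rng(9)
        X = np.vstack([rng.normal(0, 1, (40, 5)), rng.normal(3, 1, (40, 5))])
        print(significance(X))   # small p-value: the two-group structure is real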

  1. A Test for Cluster Bias: Detecting Violations of Measurement Invariance across Clusters in Multilevel Data

    ERIC Educational Resources Information Center

    Jak, Suzanne; Oort, Frans J.; Dolan, Conor V.

    2013-01-01

    We present a test for cluster bias, which can be used to detect violations of measurement invariance across clusters in 2-level data. We show how measurement invariance assumptions across clusters imply measurement invariance across levels in a 2-level factor model. Cluster bias is investigated by testing whether the within-level factor loadings…

  2. Functional video-based analysis of 3D cardiac structures generated from human embryonic stem cells.

    PubMed

    Nitsch, Scarlett; Braun, Florian; Ritter, Sylvia; Scholz, Michael; Schroeder, Insa S

    2018-05-01

    Human embryonic stem cells (hESCs) differentiated into cardiomyocytes (CM) often develop into complex 3D structures that are composed of various cardiac cell types. Conventional methods to study the electrophysiology of cardiac cells are patch clamp and microelectrode array (MEA) analyses. However, these methods are not suitable to investigate the contractile features of 3D cardiac clusters that detach from the surface of the culture dishes during differentiation. To overcome this problem, we developed video-based motion detection software, called cBRA (cardiac beat rate analyzer), that relies on Farnebäck optical flow. The beating characteristics of the differentiated cardiac clusters were calculated based on the local displacement between two subsequent images. Two differentiation protocols, which profoundly differ in the morphology of cardiac clusters generated and in the expression of cardiac markers, were used and the resulting CM were characterized. Despite these differences, beat rates and beating variabilities could be reliably determined using cBRA. Likewise, stimulation of β-adrenoreceptors by isoproterenol could easily be identified in the hESC-derived CM. Since even subtle changes in the beating features are detectable, this method is suitable for high throughput cardiotoxicity screenings. Copyright © 2018 The Authors. Published by Elsevier B.V. All rights reserved.

  3. Charge exchange in galaxy clusters

    NASA Astrophysics Data System (ADS)

    Gu, Liyi; Mao, Junjie; de Plaa, Jelle; Raassen, A. J. J.; Shah, Chintan; Kaastra, Jelle S.

    2018-03-01

    Context. Though theoretically expected, the charge exchange emission from galaxy clusters has never been confidently detected. Accumulating hints were reported recently, including a rather marginal detection with the Hitomi data of the Perseus cluster. As previously suggested, a detection of charge exchange line emission from galaxy clusters would not only impact the interpretation of the newly discovered 3.5 keV line, but also open up a new research topic on the interaction between hot and cold matter in clusters. Aims. We aim to perform the most systematic search for the O VIII charge exchange line in cluster spectra using the RGS on board XMM-Newton. Methods: We introduce a sample of 21 clusters observed with the RGS. In order to search for O VIII charge exchange, the sample selection criterion is a >35σ detection of the O VIII Lyα line in the archival RGS spectra. The dominating thermal plasma emission is modeled and subtracted with a two-temperature thermal component, and the residuals are stacked for the line search. The systematic uncertainties in the fits are quantified by refitting the spectra with a varying continuum and line broadening. Results: By the residual stacking, we do find a hint of a line-like feature at 14.82 Å, the characteristic wavelength expected for oxygen charge exchange. This feature has a marginal significance of 2.8σ, and the average equivalent width is 2.5 × 10^-4 keV. We further demonstrate that the putative feature is only marginally affected by the systematic errors from continuum modeling and instrumental effects, or the atomic uncertainties of the neighboring thermal lines. Conclusions: Assuming a realistic temperature and abundance pattern, the physical model implied by the possible oxygen line agrees well with the theoretical model proposed previously to explain the reported 3.5 keV line. If the charge exchange source indeed exists, we expect that the oxygen abundance could have been overestimated by 8-22% in previous X-ray measurements that assumed pure thermal lines. These new RGS results bring us one step closer to understanding the charge exchange phenomenon in galaxy clusters.

  4. The structure of clusters of galaxies

    NASA Astrophysics Data System (ADS)

    Fox, David Charles

    When infalling gas is accreted onto a cluster of galaxies, its kinetic energy is converted to thermal energy in a shock, heating the ions. Using a self-similar spherical model, we calculate the collisional heating of the electrons by the ions, and predict the electron and ion temperature profiles. While there are significant differences between the two, they occur at radii larger than currently observable, and too large to explain observed X-ray temperature declines in clusters. Numerical simulations by Navarro, Frenk, & White (1996) predict a universal dark matter density profile. We calculate the expected number of multiply-imaged background galaxies in the Hubble Deep Field due to foreground groups and clusters with this profile. Such groups are up to 1000 times less efficient at lensing than the standard singular isothermal spheres. However, with either profile, the expected number of galaxies lensed by groups in the Hubble Deep Field is at most one, consistent with the lack of clearly identified group lenses. X-ray and Sunyaev-Zel'dovich (SZ) effect observations can be combined to determine the distance to clusters of galaxies, provided the clusters are spherical. When applied to an aspherical cluster, this method gives an incorrect distance. We demonstrate a method for inferring the three-dimensional shape of a cluster and its correct distance from X-ray, SZ effect, and weak gravitational lensing observations, under the assumption of hydrostatic equilibrium. We apply this method to simple, analytic models of clusters, and to a numerically simulated cluster. Using artificial observations based on current X-ray and SZ effect instruments, we recover the true distance without detectable bias and with uncertainties of 4 percent.

  5. Image-Subtraction Photometry of Variable Stars in the Globular Clusters NGC 6388 and NGC 6441

    NASA Technical Reports Server (NTRS)

    Corwin, Michael T.; Sumerel, Andrew N.; Pritzl, Barton J.; Smith, Horace A.; Catelan, M.; Sweigart, Allen V.; Stetson, Peter B.

    2006-01-01

    We have applied Alard's image subtraction method (ISIS v2.1) to the observations of the globular clusters NGC 6388 and NGC 6441 previously analyzed using standard photometric techniques (DAOPHOT, ALLFRAME). In this reanalysis of observations obtained at CTIO, besides recovering the variables previously detected on the basis of our ground-based images, we have also been able to recover most of the RR Lyrae variables previously detected only in the analysis of Hubble Space Telescope WFPC2 observations of the inner region of NGC 6441. In addition, we report five possible new variables not found in the analysis of the HST observations of NGC 6441. This dramatically illustrates the capabilities of image subtraction techniques applied to ground-based data to recover variables in extremely crowded fields. We have also detected twelve new variables and six possible variables in NGC 6388 not found in our previous ground-based studies. Revised mean periods for RRab stars in NGC 6388 and NGC 6441 are 0.676 day and 0.756 day, respectively. These values are among the largest known for any Galactic globular cluster. Additional probable type II Cepheids were identified in NGC 6388, confirming its status as a metal-rich globular cluster rich in Cepheids.

  6. PRISM 3: expanded prediction of natural product chemical structures from microbial genomes

    PubMed Central

    Skinnider, Michael A.; Merwin, Nishanth J.; Johnston, Chad W.

    2017-01-01

    Microbial natural products represent a rich resource of pharmaceutically and industrially important compounds. Genome sequencing has revealed that the majority of natural products remain undiscovered, and computational methods to connect biosynthetic gene clusters to their corresponding natural products therefore have the potential to revitalize natural product discovery. Previously, we described PRediction Informatics for Secondary Metabolomes (PRISM), a combinatorial approach to chemical structure prediction for genetically encoded nonribosomal peptides and type I and II polyketides. Here, we present a ground-up rewrite of the PRISM structure prediction algorithm to derive prediction of natural products arising from non-modular biosynthetic paradigms. Within this new version, PRISM 3, natural product scaffolds are modeled as chemical graphs, permitting structure prediction for aminocoumarins, antimetabolites, bisindoles and phosphonate natural products, and building upon the addition of ribosomally synthesized and post-translationally modified peptides. Further, with the addition of cluster detection for 11 new cluster types, PRISM 3 expands to detect 22 distinct natural product cluster types. Other major modifications to PRISM include improved sequence input and ORF detection, user-friendliness and output. Distribution of PRISM 3 over a 300-core server grid improves the speed and capacity of the web application. PRISM 3 is available at http://magarveylab.ca/prism/. PMID:28460067

  7. Medical Imaging Lesion Detection Based on Unified Gravitational Fuzzy Clustering

    PubMed Central

    Vianney Kinani, Jean Marie; Gallegos Funes, Francisco; Mújica Vargas, Dante; Ramos Díaz, Eduardo; Arellano, Alfonso

    2017-01-01

    We develop a swift, robust, and practical tool for detecting brain lesions with minimal user intervention to assist clinicians and researchers in the diagnosis process, radiosurgery planning, and assessment of the patient's response to the therapy. We propose a unified gravitational fuzzy clustering-based segmentation algorithm, which integrates the Newtonian concept of gravity into fuzzy clustering. We first perform fuzzy rule-based image enhancement on our database which is comprised of T1/T2 weighted magnetic resonance (MR) and fluid-attenuated inversion recovery (FLAIR) images to facilitate a smoother segmentation. The scalar output obtained is fed into a gravitational fuzzy clustering algorithm, which separates healthy structures from the unhealthy. Finally, the lesion contour is automatically outlined through the initialization-free level set evolution method. An advantage of this lesion detection algorithm is its precision and its simultaneous use of features computed from the intensity properties of the MR scan in a cascading pattern, which makes the computation fast, robust, and self-contained. Furthermore, we validate our algorithm with large-scale experiments using clinical and synthetic brain lesion datasets. As a result, an 84%–93% overlap performance is obtained, with an emphasis on robustness with respect to different and heterogeneous types of lesion and a swift computation time. PMID:29158887

  8. Tensor Spectral Clustering for Partitioning Higher-order Network Structures.

    PubMed

    Benson, Austin R; Gleich, David F; Leskovec, Jure

    2015-01-01

    Spectral graph theory-based methods represent an important class of tools for studying the structure of networks. Spectral methods are based on a first-order Markov chain derived from a random walk on the graph and thus they cannot take advantage of important higher-order network substructures such as triangles, cycles, and feed-forward loops. Here we propose a Tensor Spectral Clustering (TSC) algorithm that allows for modeling higher-order network structures in a graph partitioning framework. Our TSC algorithm allows the user to specify which higher-order network structures (cycles, feed-forward loops, etc.) should be preserved by the network clustering. Higher-order network structures of interest are represented using a tensor, which we then partition by developing a multilinear spectral method. Our framework can be applied to discovering layered flows in networks as well as graph anomaly detection, which we illustrate on synthetic networks. In directed networks, a higher-order structure of particular interest is the directed 3-cycle, which captures feedback loops in networks. We demonstrate that our TSC algorithm produces large partitions that cut fewer directed 3-cycles than standard spectral clustering algorithms.

  9. Tensor Spectral Clustering for Partitioning Higher-order Network Structures

    PubMed Central

    Benson, Austin R.; Gleich, David F.; Leskovec, Jure

    2016-01-01

    Spectral graph theory-based methods represent an important class of tools for studying the structure of networks. Spectral methods are based on a first-order Markov chain derived from a random walk on the graph and thus they cannot take advantage of important higher-order network substructures such as triangles, cycles, and feed-forward loops. Here we propose a Tensor Spectral Clustering (TSC) algorithm that allows for modeling higher-order network structures in a graph partitioning framework. Our TSC algorithm allows the user to specify which higher-order network structures (cycles, feed-forward loops, etc.) should be preserved by the network clustering. Higher-order network structures of interest are represented using a tensor, which we then partition by developing a multilinear spectral method. Our framework can be applied to discovering layered flows in networks as well as graph anomaly detection, which we illustrate on synthetic networks. In directed networks, a higher-order structure of particular interest is the directed 3-cycle, which captures feedback loops in networks. We demonstrate that our TSC algorithm produces large partitions that cut fewer directed 3-cycles than standard spectral clustering algorithms. PMID:27812399
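
    As a rough illustration of the idea of preserving higher-order structures during partitioning (not the authors' tensor-based algorithm), one can weight each edge by the number of triangles it participates in and feed the resulting motif adjacency matrix to ordinary spectral clustering. The sketch below assumes Python with networkx and scikit-learn, and uses the karate-club graph purely as a stand-in dataset; the directed 3-cycle case discussed in the abstract would require directed motif counts and the multilinear machinery instead of this matrix shortcut.

    ```python
    import numpy as np
    import networkx as nx
    from sklearn.cluster import SpectralClustering

    # Toy undirected graph standing in for a real network.
    G = nx.karate_club_graph()
    n = G.number_of_nodes()

    # Motif-weighted adjacency: W[u, v] = number of triangles through edge (u, v),
    # so a partition that cuts few heavy edges also cuts few triangles.
    W = np.zeros((n, n))
    for u, v in G.edges():
        common = len(set(G[u]) & set(G[v]))
        W[u, v] = W[v, u] = common

    # Keep the affinity graph connected by adding the plain adjacency with a
    # small weight (edges belonging to no triangle would otherwise vanish).
    W += 1e-3 * nx.to_numpy_array(G)

    labels = SpectralClustering(n_clusters=2, affinity="precomputed",
                                random_state=0).fit_predict(W)
    ```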

  10. Fast detection of vascular plaque in optical coherence tomography images using a reduced feature set

    NASA Astrophysics Data System (ADS)

    Prakash, Ammu; Ocana Macias, Mariano; Hewko, Mark; Sowa, Michael; Sherif, Sherif

    2018-03-01

    Vascular plaque can be detected in optical coherence tomography (OCT) images by using the full set of 26 Haralick textural features and a standard K-means clustering algorithm. However, the use of the full set of 26 textural features is computationally expensive and may not be feasible for real time implementation. In this work, we identified a reduced set of 3 textural features that characterizes vascular plaque and used a generalized Fuzzy C-means clustering algorithm. Our work involves three steps: 1) the reduction of the full set of 26 textural features to a reduced set of 3 features using a genetic algorithm (GA) optimization method, 2) the implementation of an unsupervised generalized clustering algorithm (Fuzzy C-means) on the reduced feature space, and 3) the validation of our results using histology and actual photographic images of vascular plaque. Our results show an excellent match with histology and actual photographic images of vascular tissue. Therefore, our results could provide an efficient pre-clinical tool for the detection of vascular plaque in real time OCT imaging.
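
    For readers who want to experiment with the clustering step, the following is a minimal sketch of standard fuzzy C-means in NumPy applied to a synthetic 3-feature matrix; it is not the authors' generalized variant, and the feature values are hypothetical stand-ins for the GA-selected Haralick textures.

    ```python
    import numpy as np

    def fuzzy_c_means(X, n_clusters=2, m=2.0, n_iter=100, tol=1e-5, seed=0):
        """Minimal standard fuzzy C-means on an (n_samples, n_features) array."""
        rng = np.random.default_rng(seed)
        n = X.shape[0]
        U = rng.random((n, n_clusters))          # random membership matrix
        U /= U.sum(axis=1, keepdims=True)        # rows sum to 1
        for _ in range(n_iter):
            Um = U ** m
            # Cluster centers as membership-weighted means.
            centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
            # Squared distances of every sample to every center.
            d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2) + 1e-12
            # Membership update: u_ik proportional to d_ik^(-2/(m-1)).
            inv = d2 ** (-1.0 / (m - 1))
            U_new = inv / inv.sum(axis=1, keepdims=True)
            if np.abs(U_new - U).max() < tol:
                U = U_new
                break
            U = U_new
        return centers, U

    # Hypothetical reduced 3-feature vectors (e.g. three texture values per pixel).
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(0, 1, (200, 3)), rng.normal(4, 1, (200, 3))])
    centers, U = fuzzy_c_means(X, n_clusters=2)
    labels = U.argmax(axis=1)   # hard assignment, e.g. plaque vs. non-plaque
    ```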

  11. Detection and tracking of gas plumes in LWIR hyperspectral video sequence data

    NASA Astrophysics Data System (ADS)

    Gerhart, Torin; Sunu, Justin; Lieu, Lauren; Merkurjev, Ekaterina; Chang, Jen-Mei; Gilles, Jérôme; Bertozzi, Andrea L.

    2013-05-01

    Automated detection of chemical plumes presents a segmentation challenge. The segmentation problem for gas plumes is difficult due to the diffusive nature of the cloud. The advantage of considering hyperspectral images in the gas plume detection problem over the conventional RGB imagery is the presence of non-visual data, allowing for a richer representation of information. In this paper we present an effective method of visualizing hyperspectral video sequences containing chemical plumes and investigate the effectiveness of segmentation techniques on these post-processed videos. Our approach uses a combination of dimension reduction and histogram equalization to prepare the hyperspectral videos for segmentation. First, Principal Components Analysis (PCA) is used to reduce the dimension of the entire video sequence. This is done by projecting each pixel onto the first few Principal Components resulting in a type of spectral filter. Next, a Midway method for histogram equalization is used. These methods redistribute the intensity values in order to reduce flicker between frames. This properly prepares these high-dimensional video sequences for more traditional segmentation techniques. We compare the ability of various clustering techniques to properly segment the chemical plume. These include K-means, spectral clustering, and the Ginzburg-Landau functional.
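
    A minimal sketch of the dimension-reduction-plus-clustering stage, assuming scikit-learn and a synthetic hyperspectral cube in place of real LWIR frames; the component count and cluster count are illustrative choices, not the paper's settings.

    ```python
    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.cluster import KMeans

    # Hypothetical hyperspectral frame: height x width x spectral bands.
    H, W, B = 64, 64, 120
    cube = np.random.default_rng(0).random((H, W, B))

    # Flatten pixels and project onto the first few principal components,
    # acting as the "spectral filter" described in the abstract.
    pixels = cube.reshape(-1, B)
    scores = PCA(n_components=5).fit_transform(pixels)

    # Segment the reduced representation; one cluster should follow the plume.
    labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(scores)
    segmentation = labels.reshape(H, W)
    ```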

  12. The Impact of Multilocus Variable-Number Tandem-Repeat Analysis on PulseNet Canada Escherichia coli O157:H7 Laboratory Surveillance and Outbreak Support, 2008-2012.

    PubMed

    Rumore, Jillian Leigh; Tschetter, Lorelee; Nadon, Celine

    2016-05-01

    The lack of pattern diversity among pulsed-field gel electrophoresis (PFGE) profiles for Escherichia coli O157:H7 in Canada does not consistently provide optimal discrimination, and therefore, differentiating temporally and/or geographically associated sporadic cases from potential outbreak cases can at times impede investigations. To address this limitation, DNA sequence-based methods such as multilocus variable-number tandem-repeat analysis (MLVA) have been explored. To assess the performance of MLVA as a supplemental method to PFGE from the Canadian perspective, a retrospective analysis of all E. coli O157:H7 isolated in Canada from January 2008 to December 2012 (inclusive) was conducted. A total of 2285 E. coli O157:H7 isolates and 63 clusters of cases (by PFGE) were selected for the study. Based on the qualitative analysis, the addition of MLVA improved the categorization of cases for 60% of clusters and no change was observed for ∼40% of clusters investigated. In such situations, MLVA serves to confirm PFGE results, but may not add further information per se. The findings of this study demonstrate that MLVA data, when used in combination with PFGE-based analyses, provide additional resolution to the detection of clusters lacking PFGE diversity as well as demonstrate good epidemiological concordance. In addition, MLVA is able to identify cluster-associated isolates with variant PFGE pattern combinations that may have been previously missed by PFGE alone. Optimal laboratory surveillance in Canada is achieved with the application of PFGE and MLVA in tandem for routine surveillance, cluster detection, and outbreak response.

  13. Analysis of the effects of the global financial crisis on the Turkish economy, using hierarchical methods

    NASA Astrophysics Data System (ADS)

    Kantar, Ersin; Keskin, Mustafa; Deviren, Bayram

    2012-04-01

    We have analyzed the topology of 50 important Turkish companies for the period 2006-2010 using the concept of hierarchical methods (the minimal spanning tree (MST) and hierarchical tree (HT)). We investigated the statistical reliability of links between companies in the MST by using the bootstrap technique. We also used the average linkage cluster analysis (ALCA) technique to observe the cluster structures much better. The MST and HT are known as useful tools to perceive and detect global structure, taxonomy, and hierarchy in financial data. We obtained four clusters of companies according to their proximity. We also observed that the Banks and Holdings cluster always forms in the centre of the MSTs for the periods 2006-2007, 2008, and 2009-2010. The clusters match nicely with their common production activities or their strong interrelationship. The effects of the Automobile sector increased after the global financial crisis due to the temporary incentives provided by the Turkish government. We find that Turkish companies were not very affected by the global financial crisis.

  14. Fast "swarm of detectors" and their application in cosmic rays

    NASA Astrophysics Data System (ADS)

    Shoziyoev, G. P.; Shoziyoev, Sh. P.

    2017-06-01

    New opportunities in science have appeared with the latest technology of the 21st century. This paper proposes a new architecture for detection systems with different characteristics in astrophysics and geophysics, using the latest technologies related to multicopter cluster systems, alternative energy sources, cluster technologies, cloud computing, and big data. The idea of a quickly deployable, scalable, dynamic system of controlled drones carrying a small set of different detectors for measuring various components of extensive air showers in cosmic rays, and for geophysics, is very attractive. Development of this type of system would also provide a multiplier effect for various sciences and for research methods used to observe natural phenomena.

  15. Deep Galex Observations of the Coma Cluster: Source Catalog and Galaxy Counts

    NASA Technical Reports Server (NTRS)

    Hammer, D.; Hornschemeier, A. E.; Mobasher, B.; Miller, N.; Smith, R.; Arnouts, S.; Milliard, B.; Jenkins, L.

    2010-01-01

    We present a source catalog from deep 26 ks GALEX observations of the Coma cluster in the far-UV (FUV; 1530 Angstroms) and near-UV (NUV; 2310 Angstroms) wavebands. The observed field is centered 0.9 deg. (1.6 Mpc) south-west of the Coma core, and has full optical photometric coverage by SDSS and spectroscopic coverage to r ~ 21. The catalog consists of 9700 galaxies with GALEX and SDSS photometry, including 242 spectroscopically-confirmed Coma member galaxies that range from giant spirals and elliptical galaxies to dwarf irregular and early-type galaxies. The full multi-wavelength catalog (cluster plus background galaxies) is 80% complete to NUV=23 and FUV=23.5, and has a limiting depth at NUV=24.5 and FUV=25.0 which corresponds to a star formation rate of 10^-3 solar masses per year at the distance of Coma. The GALEX images presented here are very deep and include detections of many resolved cluster members superposed on a dense field of unresolved background galaxies. This required a two-fold approach to generating a source catalog: we used a Bayesian deblending algorithm to measure faint and compact sources (using SDSS coordinates as a position prior), and used the GALEX pipeline catalog for bright and/or extended objects. We performed simulations to assess the importance of systematic effects (e.g. object blends, source confusion, Eddington Bias) that influence source detection and photometry when using both methods. The Bayesian deblending method roughly doubles the number of source detections and provides reliable photometry to a few magnitudes deeper than the GALEX pipeline catalog. This method is also free from source confusion over the UV magnitude range studied here: conversely, we estimate that the GALEX pipeline catalogs are confusion limited at NUV approximately 23 and FUV approximately 24. We have measured the total UV galaxy counts using our catalog and report a 50% excess of counts across FUV=22-23.5 and NUV=21.5-23 relative to previous GALEX measurements, which is not attributed to cluster member galaxies. Our galaxy counts are a better match to deeper UV counts measured with HST.

  16. Regional Patterns and Spatial Clusters of Nonstationarities in Annual Peak Instantaneous Streamflow

    NASA Astrophysics Data System (ADS)

    White, K. D.; Baker, B.; Mueller, C.; Villarini, G.; Foley, P.; Friedman, D.

    2017-12-01

    Information about hydrologic changes resulting from changes in climate, land use, and land cover is a necessity for the planning and design of water resources infrastructure. The United States Army Corps of Engineers (USACE) evaluated and selected 12 methods to detect abrupt and slowly varying nonstationarities in records of maximum peak annual flows. They deployed a publicly available tool [1] in 2016 and a guidance document in 2017 to support identification of nonstationarities in a reproducible manner using a robust statistical framework. This statistical framework has now been applied to streamflow records across the continental United States to explore the presence of regional patterns and spatial clusters of nonstationarities in peak annual flow. Incorporating this geographic dimension into the detection of nonstationarities provides valuable insight for the process of attribution of these significant changes. This poster summarizes the methods used and provides the results of the regional analysis. [1] Available here - http://www.corpsclimate.us/ptcih.cfm

  17. Detection of CO emission in Hydra 1 cluster galaxies

    NASA Technical Reports Server (NTRS)

    Huchtmeier, W. K.

    1990-01-01

    A survey of bright Hydra cluster spiral galaxies for the CO(1-0) transition at 115 GHz was performed with the 15m Swedish-ESO submillimeter telescope (SEST). Five out of 15 galaxies observed have been detected in the CO(1-0) line. The largest spiral galaxy in the cluster, NGC 3312, has more CO than any spiral of the Virgo cluster. This Sa-type galaxy is optically largely distorted and disrupted on one side. It is a good candidate for ram pressure stripping while passing through the cluster's central region. A comparison with global CO properties of Virgo cluster spirals shows a relatively good agreement with the detected Hydra cluster galaxies.

  18. Accurate detection of hierarchical communities in complex networks based on nonlinear dynamical evolution

    NASA Astrophysics Data System (ADS)

    Zhuo, Zhao; Cai, Shi-Min; Tang, Ming; Lai, Ying-Cheng

    2018-04-01

    One of the most challenging problems in network science is to accurately detect communities at distinct hierarchical scales. Most existing methods are based on structural analysis and manipulation, which are NP-hard. We articulate an alternative, dynamical evolution-based approach to the problem. The basic principle is to computationally implement a nonlinear dynamical process on all nodes in the network with a general coupling scheme, creating a networked dynamical system. Under a proper system setting and with an adjustable control parameter, the community structure of the network would "come out" or emerge naturally from the dynamical evolution of the system. As the control parameter is systematically varied, the community hierarchies at different scales can be revealed. As a concrete example of this general principle, we exploit clustered synchronization as a dynamical mechanism through which the hierarchical community structure can be uncovered. In particular, for quite arbitrary choices of the nonlinear nodal dynamics and coupling scheme, decreasing the coupling parameter from the global synchronization regime, in which the dynamical states of all nodes are perfectly synchronized, can lead to a weaker type of synchronization organized as clusters. We demonstrate the existence of optimal choices of the coupling parameter for which the synchronization clusters encode accurate information about the hierarchical community structure of the network. We test and validate our method using a standard class of benchmark modular networks with two distinct hierarchies of communities and a number of empirical networks arising from the real world. Our method is computationally extremely efficient, eliminating completely the NP-hard difficulty associated with previous methods. The basic principle of exploiting dynamical evolution to uncover hidden community organizations at different scales represents a "game-change" type of approach to addressing the problem of community detection in complex networks.

  19. Automatic building detection based on Purposive FastICA (PFICA) algorithm using monocular high resolution Google Earth images

    NASA Astrophysics Data System (ADS)

    Ghaffarian, Saman; Ghaffarian, Salar

    2014-11-01

    This paper proposes an improved FastICA model named Purposive FastICA (PFICA), initialized by a simple color space transformation and a novel masking approach, to automatically detect buildings from high resolution Google Earth imagery. ICA and FastICA algorithms are defined as Blind Source Separation (BSS) techniques for unmixing source signals using the reference data sets. In order to overcome the limitations of the ICA and FastICA algorithms and make them purposeful, we developed a novel method involving three main steps: 1) improving the FastICA algorithm using the Moore-Penrose pseudo-inverse matrix model, 2) automated seeding of the PFICA algorithm based on the LUV color space and simple proposed rules to split the image into three regions (shadow + vegetation, bare soil + roads, and buildings, respectively), and 3) masking out the final building detection results from the PFICA outputs using the K-means clustering algorithm with two clusters and conducting simple morphological operations to remove noise. Evaluation of the results illustrates that buildings detected from dense and suburban districts with diverse characteristics and color combinations using our proposed method have 88.6% and 85.5% overall pixel-based and object-based precision performances, respectively.

  20. Motion estimation accuracy for visible-light/gamma-ray imaging fusion for portable portal monitoring

    NASA Astrophysics Data System (ADS)

    Karnowski, Thomas P.; Cunningham, Mark F.; Goddard, James S.; Cheriyadat, Anil M.; Hornback, Donald E.; Fabris, Lorenzo; Kerekes, Ryan A.; Ziock, Klaus-Peter; Gee, Timothy F.

    2010-01-01

    The use of radiation sensors as portal monitors is increasing due to heightened concerns over the smuggling of fissile material. Portable systems that can detect significant quantities of fissile material that might be present in vehicular traffic are of particular interest. We have constructed a prototype rapid-deployment gamma-ray imaging portal monitor that uses machine vision and gamma-ray imaging to monitor multiple lanes of traffic. Vehicles are detected and tracked by using point detection and optical flow methods as implemented in the OpenCV software library. Points are clustered together, but imperfections in the detected points and tracks cause errors in the accuracy of the vehicle position estimates. The resulting errors cause a "blurring" effect in the gamma image of the vehicle. To minimize these errors, we have compared a variety of motion estimation techniques including an estimate using the median of the clustered points, a "best-track" filtering algorithm, and a constant velocity motion estimation model. The accuracy of these methods is quantified by computing the root-mean-square differences between the times at which vehicles cross the gamma-ray image pixel boundaries and a manually verified ground-truth measurement.

  1. Combination of automated high throughput platforms, flow cytometry, and hierarchical clustering to detect cell state.

    PubMed

    Kitsos, Christine M; Bhamidipati, Phani; Melnikova, Irena; Cash, Ethan P; McNulty, Chris; Furman, Julia; Cima, Michael J; Levinson, Douglas

    2007-01-01

    This study examined whether hierarchical clustering could be used to detect cell states induced by treatment combinations that were generated through automation and high-throughput (HT) technology. Data-mining techniques were used to analyze the large experimental data sets to determine whether nonlinear, non-obvious responses could be extracted from the data. Unary, binary, and ternary combinations of pharmacological factors (examples of stimuli) were used to induce differentiation of HL-60 cells using a HT automated approach. Cell profiles were analyzed by incorporating hierarchical clustering methods on data collected by flow cytometry. Data-mining techniques were used to explore the combinatorial space for nonlinear, unexpected events. Additional small-scale, follow-up experiments were performed on cellular profiles of interest. Multiple, distinct cellular profiles were detected using hierarchical clustering of expressed cell-surface antigens. Data-mining of this large, complex data set retrieved cases of both factor dominance and cooperativity, as well as atypical cellular profiles. Follow-up experiments found that treatment combinations producing "atypical cell types" made those cells more susceptible to apoptosis. In conclusion, hierarchical clustering and other data-mining techniques were applied to analyze large data sets from HT flow cytometry. From each sample, the data set was filtered and used to define discrete, usable states that were then related back to their original formulations. Analysis of resultant cell populations induced by a multitude of treatments identified unexpected phenotypes and nonlinear response profiles.
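
    A small sketch of the clustering step, assuming SciPy and synthetic per-treatment marker profiles in place of the flow cytometry data; the Ward linkage and the cut into two states are illustrative choices, not necessarily those used in the study.

    ```python
    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster

    # Hypothetical per-treatment median surface-marker intensities
    # (rows = treatment wells, columns = markers).
    rng = np.random.default_rng(0)
    profiles = np.vstack([rng.normal(0, 1, (30, 6)),
                          rng.normal(3, 1, (30, 6))])

    # Ward linkage on the treatment profiles, then cut the tree into k groups.
    Z = linkage(profiles, method="ward")
    states = fcluster(Z, t=2, criterion="maxclust")   # discrete cell-state labels

    # Each label can then be related back to its originating treatment formulation.
    ```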

  2. Detecting Genomic Clustering of Risk Variants from Sequence Data: Cases vs. Controls

    PubMed Central

    Schaid, Daniel J.; Sinnwell, Jason P.; McDonnell, Shannon K.; Thibodeau, Stephen N.

    2013-01-01

    As the ability to measure dense genetic markers approaches the limit of the DNA sequence itself, taking advantage of possible clustering of genetic variants in, and around, a gene would benefit genetic association analyses, and likely provide biological insights. The greatest benefit might be realized when multiple rare variants cluster in a functional region. Several statistical tests have been developed, one of which is based on the popular Kulldorff scan statistic for spatial clustering of disease. We extended another popular spatial clustering method – Tango's statistic – to genomic sequence data. An advantage of Tango's method is that it is rapid to compute, and when a single test statistic is computed, its distribution is well approximated by a scaled chi-square distribution, making computation of p-values very rapid. We compared the Type-I error rates and power of several clustering statistics, as well as the omnibus sequence kernel association test (SKAT). Although our version of Tango's statistic, which we call the "Kernel Distance" statistic, took approximately half the time to compute compared to the Kulldorff scan statistic, it had slightly less power than the scan statistic. Our results showed that the Ionita-Laza version of Kulldorff's scan statistic had the greatest power over a range of clustering scenarios. PMID:23842950
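
    The scaled chi-square approximation mentioned here can be sketched by matching the first two moments of a quadratic-form statistic to a*chi-square with df degrees of freedom (a Satterthwaite-style approximation; Tango's original version also adjusts for skewness). The numeric values below are hypothetical, and the null mean and variance would in practice come from the kernel matrix under the null model.

    ```python
    from scipy.stats import chi2

    def scaled_chi2_pvalue(stat, null_mean, null_var):
        """Match E[Q] and Var[Q] of a quadratic-form statistic Q to a*chi2_df
        and return the upper-tail p-value P(a*chi2_df > stat)."""
        a = null_var / (2.0 * null_mean)
        df = 2.0 * null_mean ** 2 / null_var
        return chi2.sf(stat / a, df)

    # Hypothetical observed statistic and its null moments.
    p = scaled_chi2_pvalue(stat=12.4, null_mean=5.0, null_var=18.0)
    print(p)
    ```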

  3. Testing for X-Ray–SZ Differences and Redshift Evolution in the X-Ray Morphology of Galaxy Clusters

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Nurgaliev, D.; McDonald, M.; Benson, B. A.

    We present a quantitative study of the X-ray morphology of galaxy clusters, as a function of their detection method and redshift. We analyze two separate samples of galaxy clusters: a sample of 36 clusters at $0.35 < z < 0.9$ selected in the X-ray with the ROSAT PSPC 400 deg^2 survey, and a sample of 90 clusters at $0.25 < z < 1.2$ selected via the Sunyaev–Zel'dovich (SZ) effect with the South Pole Telescope. Clusters from both samples have similar-quality Chandra observations, which allow us to quantify their X-ray morphologies via two distinct methods: centroid shifts (w) and photon asymmetry ($A_{\mathrm{phot}}$). The latter technique provides nearly unbiased morphology estimates for clusters spanning a broad range of redshift and data quality. We further compare the X-ray morphologies of X-ray- and SZ-selected clusters with those of simulated clusters. We do not find a statistically significant difference in the measured X-ray morphology of X-ray and SZ-selected clusters over the redshift range probed by these samples, suggesting that the two are probing similar populations of clusters. We find that the X-ray morphologies of simulated clusters are statistically indistinguishable from those of X-ray- or SZ-selected clusters, implying that the most important physics for dictating the large-scale gas morphology (outside of the core) is well-approximated in these simulations. Finally, we find no statistically significant redshift evolution in the X-ray morphology (both for observed and simulated clusters), over the range of $z \sim 0.3$ to $z \sim 1$, seemingly in contradiction with the redshift-dependent halo merger rate predicted by simulations.

  4. Testing for X-Ray–SZ Differences and Redshift Evolution in the X-Ray Morphology of Galaxy Clusters

    DOE PAGES

    Nurgaliev, D.; McDonald, M.; Benson, B. A.; ...

    2017-05-16

    We present a quantitative study of the X-ray morphology of galaxy clusters, as a function of their detection method and redshift. We analyze two separate samples of galaxy clusters: a sample of 36 clusters at $0.35 < z < 0.9$ selected in the X-ray with the ROSAT PSPC 400 deg^2 survey, and a sample of 90 clusters at $0.25 < z < 1.2$ selected via the Sunyaev–Zel'dovich (SZ) effect with the South Pole Telescope. Clusters from both samples have similar-quality Chandra observations, which allow us to quantify their X-ray morphologies via two distinct methods: centroid shifts (w) and photon asymmetry ($A_{\mathrm{phot}}$). The latter technique provides nearly unbiased morphology estimates for clusters spanning a broad range of redshift and data quality. We further compare the X-ray morphologies of X-ray- and SZ-selected clusters with those of simulated clusters. We do not find a statistically significant difference in the measured X-ray morphology of X-ray and SZ-selected clusters over the redshift range probed by these samples, suggesting that the two are probing similar populations of clusters. We find that the X-ray morphologies of simulated clusters are statistically indistinguishable from those of X-ray- or SZ-selected clusters, implying that the most important physics for dictating the large-scale gas morphology (outside of the core) is well-approximated in these simulations. Finally, we find no statistically significant redshift evolution in the X-ray morphology (both for observed and simulated clusters), over the range of $z \sim 0.3$ to $z \sim 1$, seemingly in contradiction with the redshift-dependent halo merger rate predicted by simulations.

  5. Clustering Of Left Ventricular Wall Motion Patterns

    NASA Astrophysics Data System (ADS)

    Bjelogrlic, Z.; Jakopin, J.; Gyergyek, L.

    1982-11-01

    A method for detection of wall regions with similar motion was presented. A model based on local direction information was used to measure the left ventricular wall motion from a cineangiographic sequence. Three time functions were used to define segmental motion patterns: the distance of a ventricular contour segment from the mean contour, the velocity of a segment, and its acceleration. Motion patterns were clustered by the UPGMA algorithm and by an algorithm based on the K-nearest neighbor classification rule.

  6. Planck 2015 results: XXIII. The thermal Sunyaev-Zeldovich effect-cosmic infrared background correlation

    DOE PAGES

    Ade, P. A. R.; Aghanim, N.; Arnaud, M.; ...

    2016-09-20

    In this paper, we use Planck data to detect the cross-correlation between the thermal Sunyaev-Zeldovich (tSZ) effect and the infrared emission from the galaxies that make up the cosmic infrared background (CIB). We first perform a stacking analysis towards Planck-confirmed galaxy clusters. We detect infrared emission produced by dusty galaxies inside these clusters and demonstrate that the infrared emission is about 50% more extended than the tSZ effect. Modelling the emission with a Navarro-Frenk-White profile, we find that the radial profile concentration parameter is c_500 = 1.00 (+0.18, -0.15). This indicates that infrared galaxies in the outskirts of clusters have higher infrared flux than cluster-core galaxies. We also study the cross-correlation between tSZ and CIB anisotropies, following three alternative approaches based on power spectrum analyses: (i) using a catalogue of confirmed clusters detected in Planck data; (ii) using an all-sky tSZ map built from Planck frequency maps; and (iii) using cross-spectra between Planck frequency maps. With the three different methods, we detect the tSZ-CIB cross-power spectrum at significance levels of (i) 6σ; (ii) 3σ; and (iii) 4σ. We model the tSZ-CIB cross-correlation signature and compare predictions with the measurements. The amplitude of the cross-correlation relative to the fiducial model is A_tSZ-CIB = 1.2 ± 0.3. Finally, this result is consistent with predictions for the tSZ-CIB cross-correlation assuming the best-fit cosmological model from Planck 2015 results along with the tSZ and CIB scaling relations.

  7. Planck 2015 results: XXIII. The thermal Sunyaev-Zeldovich effect-cosmic infrared background correlation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ade, P. A. R.; Aghanim, N.; Arnaud, M.

    In this paper, we use Planck data to detect the cross-correlation between the thermal Sunyaev-Zeldovich (tSZ) effect and the infrared emission from the galaxies that make up the cosmic infrared background (CIB). We first perform a stacking analysis towards Planck-confirmed galaxy clusters. We detect infrared emission produced by dusty galaxies inside these clusters and demonstrate that the infrared emission is about 50% more extended than the tSZ effect. Modelling the emission with a Navarro-Frenk-White profile, we find that the radial profile concentration parameter is c_500 = 1.00 (+0.18, -0.15). This indicates that infrared galaxies in the outskirts of clusters have higher infrared flux than cluster-core galaxies. We also study the cross-correlation between tSZ and CIB anisotropies, following three alternative approaches based on power spectrum analyses: (i) using a catalogue of confirmed clusters detected in Planck data; (ii) using an all-sky tSZ map built from Planck frequency maps; and (iii) using cross-spectra between Planck frequency maps. With the three different methods, we detect the tSZ-CIB cross-power spectrum at significance levels of (i) 6σ; (ii) 3σ; and (iii) 4σ. We model the tSZ-CIB cross-correlation signature and compare predictions with the measurements. The amplitude of the cross-correlation relative to the fiducial model is A_tSZ-CIB = 1.2 ± 0.3. Finally, this result is consistent with predictions for the tSZ-CIB cross-correlation assuming the best-fit cosmological model from Planck 2015 results along with the tSZ and CIB scaling relations.

  8. Hemodynamic Response to Interictal Epileptiform Discharges Addressed by Personalized EEG-fNIRS Recordings

    PubMed Central

    Pellegrino, Giovanni; Machado, Alexis; von Ellenrieder, Nicolas; Watanabe, Satsuki; Hall, Jeffery A.; Lina, Jean-Marc; Kobayashi, Eliane; Grova, Christophe

    2016-01-01

    Objective: We aimed at studying the hemodynamic response (HR) to Interictal Epileptic Discharges (IEDs) using patient-specific and prolonged simultaneous ElectroEncephaloGraphy (EEG) and functional Near InfraRed Spectroscopy (fNIRS) recordings. Methods: The epileptic generator was localized using Magnetoencephalography source imaging. The fNIRS montage was tailored for each patient, using an algorithm to optimize the sensitivity to the epileptic generator. Optodes were glued using collodion to achieve prolonged acquisition with high quality signal. fNIRS data analysis was handled with no a priori constraint on HR time course, averaging fNIRS signals to similar IEDs. Cluster-permutation analysis was performed on 3D reconstructed fNIRS data to identify significant spatio-temporal HR clusters. Standard (GLM with fixed HRF) and cluster-permutation EEG-fMRI analyses were performed for comparison purposes. Results: fNIRS detected HR to IEDs for 8/9 patients. It mainly consisted of oxy-hemoglobin increases (seven patients), followed by oxy-hemoglobin decreases (six patients). HR was lateralized in six patients and lasted from 8.5 to 30 s. Standard EEG-fMRI analysis detected an HR in 4/9 patients (4/9 without enough IEDs, 1/9 unreliable result). The cluster-permutation EEG-fMRI analysis restricted to the region investigated by fNIRS showed additional strong and non-canonical BOLD responses starting earlier than the IEDs and lasting up to 30 s. Conclusions: (i) EEG-fNIRS is suitable to detect the HR to IEDs and can outperform EEG-fMRI because of prolonged recordings and a greater chance to detect IEDs; (ii) cluster-permutation analysis unveils additional HR features underestimated when imposing a canonical HR function; and (iii) the HR is often bilateral and lasts up to 30 s. PMID:27047325

  9. The Cluster AgeS Experiment (CASE). Detecting Aperiodic Photometric Variability with the Friends of Friends Algorithm

    NASA Astrophysics Data System (ADS)

    Rozyczka, M.; Narloch, W.; Pietrukowicz, P.; Thompson, I. B.; Pych, W.; Poleski, R.

    2018-03-01

    We adapt the friends of friends algorithm to the analysis of light curves, and show that it can be successfully applied to searches for transient phenomena in large photometric databases. As a test case we search OGLE-III light curves for known dwarf novae. A single combination of control parameters allows us to narrow the search to 1% of the data while reaching a ≈90% detection efficiency. A search involving ≈2% of the data and three combinations of control parameters can be significantly more effective; in our case a 100% efficiency is reached. The method can also quite efficiently detect semi-regular variability. In particular, 28 new semi-regular variables have been found in the field of the globular cluster M22, which was examined earlier with the help of periodicity-searching algorithms.
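
    A toy sketch of friends-of-friends linking applied to a single light curve, assuming NumPy; the deviation threshold, linking length, and minimum group size are hypothetical stand-ins for the paper's control parameters.

    ```python
    import numpy as np

    def fof_transient_candidates(t, mag, link=2.0, nsig=3.0, min_members=4):
        """Flag brightenings that deviate from the median by more than nsig
        robust sigmas, then link flagged epochs closer than `link` days and
        keep groups with at least `min_members` points."""
        med = np.median(mag)
        sig = 1.4826 * np.median(np.abs(mag - med))      # robust (MAD) scatter
        flagged = np.where(mag < med - nsig * sig)[0]    # brighter = smaller magnitude
        groups, current = [], []
        for i in flagged[np.argsort(t[flagged])]:
            if current and t[i] - t[current[-1]] > link:
                groups.append(current)
                current = []
            current.append(i)
        if current:
            groups.append(current)
        return [g for g in groups if len(g) >= min_members]

    # Hypothetical light curve with an outburst around day 50.
    rng = np.random.default_rng(0)
    t = np.sort(rng.uniform(0, 100, 300))
    mag = 18.0 + rng.normal(0, 0.05, t.size)
    mag[(t > 50) & (t < 55)] -= 2.0
    print(fof_transient_candidates(t, mag))
    ```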

  10. Analysis of capture-recapture models with individual covariates using data augmentation

    USGS Publications Warehouse

    Royle, J. Andrew

    2009-01-01

    I consider the analysis of capture-recapture models with individual covariates that influence detection probability. Bayesian analysis of the joint likelihood is carried out using a flexible data augmentation scheme that facilitates analysis by Markov chain Monte Carlo methods, and a simple and straightforward implementation in freely available software. This approach is applied to a study of meadow voles (Microtus pennsylvanicus) in which auxiliary data on a continuous covariate (body mass) are recorded, and it is thought that detection probability is related to body mass. In a second example, the model is applied to an aerial waterfowl survey in which a double-observer protocol is used. The fundamental unit of observation is the cluster of individual birds, and the size of the cluster (a discrete covariate) is used as a covariate on detection probability.

  11. Space-Time Analysis of Testicular Cancer Clusters Using Residential Histories: A Case-Control Study in Denmark

    PubMed Central

    Sloan, Chantel D.; Nordsborg, Rikke B.; Jacquez, Geoffrey M.; Raaschou-Nielsen, Ole; Meliker, Jaymie R.

    2015-01-01

    Though the etiology is largely unknown, testicular cancer incidence has seen recent significant increases in northern Europe and throughout many Western regions. Testicular cancer is the most common cancer in males under age 40, and age-period-cohort models have posited exposures in the in utero environment or in early childhood as possible causes of increased risk. Some of these factors may be tied to geography through being associated with behavioral, cultural, sociodemographic or built environment characteristics. If so, this could result in detectable geographic clusters of cases that could lead to hypotheses regarding environmental targets for intervention. Given a latency period between exposure to an environmental carcinogen and testicular cancer diagnosis, mobility histories are beneficial for spatial cluster analyses. Nearest-neighbor based Q-statistics allow for the incorporation of changes in residency in spatial disease cluster detection. Using these methods, a space-time cluster analysis was conducted on a population-wide case-control population selected from the Danish Cancer Registry with mobility histories since 1971 extracted from the Danish Civil Registration System. Cases (N = 3297) were diagnosed between 1991 and 2003, and two sets of controls (N = 3297 for each set) matched on sex and date of birth were included in the study. We also examined spatial patterns in maternal residential history for those cases and controls born in 1971 or later (N = 589 case-control pairs). Several small clusters were detected when aligning individuals by year prior to diagnosis, age at diagnosis and calendar year of diagnosis. However, the largest of these clusters contained only 2 statistically significant individuals at its center, and the clusters were not replicated in SaTScan spatial-only analyses, which are less susceptible to multiple testing bias. We found little evidence of local clusters in residential histories of testicular cancer cases in this Danish population. PMID:25756204

  12. Space-time analysis of testicular cancer clusters using residential histories: a case-control study in Denmark.

    PubMed

    Sloan, Chantel D; Nordsborg, Rikke B; Jacquez, Geoffrey M; Raaschou-Nielsen, Ole; Meliker, Jaymie R

    2015-01-01

    Though the etiology is largely unknown, testicular cancer incidence has seen recent significant increases in northern Europe and throughout many Western regions. Testicular cancer is the most common cancer in males under age 40, and age-period-cohort models have posited exposures in the in utero environment or in early childhood as possible causes of increased risk. Some of these factors may be tied to geography through being associated with behavioral, cultural, sociodemographic or built environment characteristics. If so, this could result in detectable geographic clusters of cases that could lead to hypotheses regarding environmental targets for intervention. Given a latency period between exposure to an environmental carcinogen and testicular cancer diagnosis, mobility histories are beneficial for spatial cluster analyses. Nearest-neighbor based Q-statistics allow for the incorporation of changes in residency in spatial disease cluster detection. Using these methods, a space-time cluster analysis was conducted on a population-wide case-control population selected from the Danish Cancer Registry with mobility histories since 1971 extracted from the Danish Civil Registration System. Cases (N = 3297) were diagnosed between 1991 and 2003, and two sets of controls (N = 3297 for each set) matched on sex and date of birth were included in the study. We also examined spatial patterns in maternal residential history for those cases and controls born in 1971 or later (N = 589 case-control pairs). Several small clusters were detected when aligning individuals by year prior to diagnosis, age at diagnosis and calendar year of diagnosis. However, the largest of these clusters contained only 2 statistically significant individuals at its center, and the clusters were not replicated in SaTScan spatial-only analyses, which are less susceptible to multiple testing bias. We found little evidence of local clusters in residential histories of testicular cancer cases in this Danish population.

  13. Prediction, Detection, and Validation of Isotope Clusters in Mass Spectrometry Data

    PubMed Central

    Treutler, Hendrik; Neumann, Steffen

    2016-01-01

    Mass spectrometry is a key analytical platform for metabolomics. The precise quantification and identification of small molecules is a prerequisite for elucidating metabolism, and the detection, validation, and evaluation of isotope clusters in LC-MS data are important for this task. Here, we present an approach for the improved detection of isotope clusters using chemical prior knowledge and the validation of detected isotope clusters depending on the substance mass using database statistics. We find remarkable improvements regarding the number of detected isotope clusters and are able to predict the correct molecular formula in the top three ranks in 92% of the cases. We make our methodology freely available as part of the Bioconductor packages xcms version 1.50.0 and CAMERA version 1.30.0. PMID:27775610
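
    The actual implementation is in the R/Bioconductor packages xcms and CAMERA cited above; as a language-agnostic illustration of the core idea only, the Python sketch below greedily groups centroided peaks spaced by roughly one neutron mass divided by the charge, without the chemical prior knowledge or database statistics used in the paper. The peak list and tolerances are hypothetical.

    ```python
    import numpy as np

    ISO_SPACING = 1.00336   # approximate mass gap between isotopologue peaks (Da)

    def group_isotope_peaks(mz, charge=1, tol=0.01, max_peaks=4):
        """Greedily collect peaks spaced by ISO_SPACING / charge (within tol)
        into candidate isotope clusters; returns lists of peak indices
        referring to the sorted m/z array."""
        mz = np.sort(np.asarray(mz, dtype=float))
        used = np.zeros(len(mz), dtype=bool)
        clusters = []
        for i in range(len(mz)):
            if used[i]:
                continue
            cluster = [i]
            for k in range(1, max_peaks):
                target = mz[i] + k * ISO_SPACING / charge
                j = int(np.argmin(np.abs(mz - target)))
                if not used[j] and abs(mz[j] - target) <= tol:
                    cluster.append(j)
                else:
                    break
            if len(cluster) > 1:
                used[cluster] = True
                clusters.append(cluster)
        return clusters

    # Hypothetical centroided peak list (m/z values in Da).
    peaks = [180.063, 181.066, 182.070, 255.232, 256.236]
    print(group_isotope_peaks(peaks))
    ```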

  14. Identification of Reliable Components in Multivariate Curve Resolution-Alternating Least Squares (MCR-ALS): a Data-Driven Approach across Metabolic Processes.

    PubMed

    Motegi, Hiromi; Tsuboi, Yuuri; Saga, Ayako; Kagami, Tomoko; Inoue, Maki; Toki, Hideaki; Minowa, Osamu; Noda, Tetsuo; Kikuchi, Jun

    2015-11-04

    There is an increasing need to use multivariate statistical methods for understanding biological functions, identifying the mechanisms of diseases, and exploring biomarkers. In addition to classical analyses such as hierarchical cluster analysis, principal component analysis, and partial least squares discriminant analysis, various multivariate strategies, including independent component analysis, non-negative matrix factorization, and multivariate curve resolution, have recently been proposed. However, determining the number of components is problematic. Despite the proposal of several different methods, no satisfactory approach has yet been reported. To resolve this problem, we implemented a new idea: classifying a component as "reliable" or "unreliable" based on the reproducibility of its appearance, regardless of the number of components in the calculation. Using the clustering method for classification, we applied this idea to multivariate curve resolution-alternating least squares (MCR-ALS). Comparisons between conventional and modified methods applied to proton nuclear magnetic resonance (1H-NMR) spectral datasets derived from known standard mixtures and biological mixtures (urine and feces of mice) revealed that more plausible results are obtained by the modified method. In particular, clusters containing little information were detected with reliability. This strategy, named "cluster-aided MCR-ALS," will facilitate the attainment of more reliable results in metabolomics datasets.

  15. DEWS (DEep White matter hyperintensity Segmentation framework): A fully automated pipeline for detecting small deep white matter hyperintensities in migraineurs.

    PubMed

    Park, Bo-Yong; Lee, Mi Ji; Lee, Seung-Hak; Cha, Jihoon; Chung, Chin-Sang; Kim, Sung Tae; Park, Hyunjin

    2018-01-01

    Migraineurs show an increased load of white matter hyperintensities (WMHs) and more rapid deep WMH progression. Previous methods for WMH segmentation have limited efficacy to detect small deep WMHs. We developed a new fully automated detection pipeline, DEWS (DEep White matter hyperintensity Segmentation framework), for small and superficially-located deep WMHs. A total of 148 non-elderly subjects with migraine were included in this study. The pipeline consists of three components: 1) white matter (WM) extraction, 2) WMH detection, and 3) false positive reduction. In WM extraction, we adjusted the WM mask to re-assign misclassified WMHs back to WM using many sequential low-level image processing steps. In WMH detection, the potential WMH clusters were detected using an intensity based threshold and region growing approach. For false positive reduction, the detected WMH clusters were classified into final WMHs and non-WMHs using the random forest (RF) classifier. Size, texture, and multi-scale deep features were used to train the RF classifier. DEWS successfully detected small deep WMHs with a high positive predictive value (PPV) of 0.98 and true positive rate (TPR) of 0.70 in the training and test sets. Similar performance of PPV (0.96) and TPR (0.68) was attained in the validation set. DEWS showed a superior performance in comparison with other methods. Our proposed pipeline is freely available online to help the research community in quantifying deep WMHs in non-elderly adults.
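
    A minimal sketch of the false-positive-reduction step, assuming scikit-learn; the feature vectors, labels, and decision threshold are synthetic placeholders for the size, texture, and deep features described above, not the authors' trained model.

    ```python
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    # Hypothetical per-candidate features (size, texture statistics, deep-feature
    # summaries) with labels 1 = true WMH, 0 = false positive.
    rng = np.random.default_rng(0)
    X_train = rng.normal(size=(500, 10))
    y_train = rng.integers(0, 2, 500)

    clf = RandomForestClassifier(n_estimators=300, random_state=0)
    clf.fit(X_train, y_train)

    # Keep only candidate clusters the forest scores above a chosen threshold.
    X_candidates = rng.normal(size=(40, 10))
    keep = clf.predict_proba(X_candidates)[:, 1] > 0.5
    final_wmh = np.flatnonzero(keep)
    ```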

  16. Cluster ensemble based on Random Forests for genetic data.

    PubMed

    Alhusain, Luluah; Hafez, Alaaeldin M

    2017-01-01

    Clustering plays a crucial role in several application domains, such as bioinformatics. In bioinformatics, clustering has been extensively used as an approach for detecting interesting patterns in genetic data. One application is population structure analysis, which aims to group individuals into subpopulations based on shared genetic variations, such as single nucleotide polymorphisms. Advances in DNA sequencing technology have made it possible to obtain genetic datasets of exceptional size. Genetic data usually contain hundreds of thousands of genetic markers genotyped for thousands of individuals, making an efficient means for handling such data desirable. Random Forests (RFs) has emerged as an efficient algorithm capable of handling high-dimensional data. RFs provides a proximity measure that can capture different levels of co-occurring relationships between variables. RFs has been widely considered a supervised learning method, although it can be converted into an unsupervised learning method. Therefore, an RF-derived proximity measure combined with a clustering technique may be well suited for determining the underlying structure of unlabeled data. This paper proposes RFcluE, a cluster ensemble approach for determining the underlying structure of genetic data based on RFs. The approach comprises a cluster ensemble framework to combine multiple runs of RF clustering. Experiments were conducted on a high-dimensional, real genetic dataset to evaluate the proposed approach. The experiments included an examination of the impact of parameter changes, comparing RFcluE performance against other clustering methods, and an assessment of the relationship between the diversity and quality of the ensemble and its effect on RFcluE performance. This paper proposes RFcluE, a cluster ensemble approach based on RF clustering, to address the problem of population structure analysis and demonstrates the effectiveness of the approach. The paper also illustrates that applying a cluster ensemble approach, combining multiple RF clusterings, produces more robust and higher-quality results as a consequence of feeding the ensemble with diverse views of high-dimensional genetic data obtained through bagging and random subspace, the two key features of the RF algorithm.
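
    A single unsupervised-RF clustering run of the kind RFcluE would ensemble can be sketched as follows, assuming scikit-learn and SciPy: contrast the real rows against column-permuted synthetic rows, derive a proximity matrix from shared leaves, and cluster on the resulting distance. The toy genotype matrix and the choice of three clusters are hypothetical, and RFcluE itself combines many such runs rather than one.

    ```python
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from scipy.cluster.hierarchy import linkage, fcluster
    from scipy.spatial.distance import squareform

    rng = np.random.default_rng(0)
    X = rng.integers(0, 3, size=(200, 50)).astype(float)   # toy 0/1/2 genotype matrix

    # Unsupervised RF trick: real rows vs. column-permuted synthetic rows.
    X_synth = np.column_stack([rng.permutation(col) for col in X.T])
    X_all = np.vstack([X, X_synth])
    y_all = np.r_[np.ones(len(X)), np.zeros(len(X_synth))]
    rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_all, y_all)

    # Proximity: fraction of trees in which two real samples share a leaf.
    leaves = rf.apply(X)                                   # (n_samples, n_trees)
    prox = (leaves[:, None, :] == leaves[None, :, :]).mean(axis=2)

    # One clustering run on the RF-derived distance matrix.
    Z = linkage(squareform(1.0 - prox, checks=False), method="average")
    labels = fcluster(Z, t=3, criterion="maxclust")
    ```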

  17. Locally adaptive decision in detection of clustered microcalcifications in mammograms.

    PubMed

    Sainz de Cea, María V; Nishikawa, Robert M; Yang, Yongyi

    2018-02-15

    In computer-aided detection or diagnosis of clustered microcalcifications (MCs) in mammograms, the performance often suffers from not only the presence of false positives (FPs) among the detected individual MCs but also large variability in detection accuracy among different cases. To address this issue, we investigate a locally adaptive decision scheme in MC detection by exploiting the noise characteristics in a lesion area. Instead of developing a new MC detector, we propose a decision scheme on how to best decide whether a detected object is an MC or not in the detector output. We formulate the individual MCs as statistical outliers compared to the many noisy detections in a lesion area so as to account for the local image characteristics. To identify the MCs, we first consider a parametric method for outlier detection, the Mahalanobis distance detector, which is based on a multi-dimensional Gaussian distribution on the noisy detections. We also consider a non-parametric method which is based on a stochastic neighbor graph model of the detected objects. We demonstrated the proposed decision approach with two existing MC detectors on a set of 188 full-field digital mammograms (95 cases). The results, evaluated using free response operating characteristic (FROC) analysis, showed a significant improvement in detection accuracy by the proposed outlier decision approach over traditional thresholding (the partial area under the FROC curve increased from 3.95 to 4.25, p-value < 10^-4). There was also a reduction in case-to-case variability in detected FPs at a given sensitivity level. The proposed adaptive decision approach could not only reduce the number of FPs in detected MCs but also improve case-to-case consistency in detection.
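
    A minimal sketch of the parametric (Mahalanobis distance) outlier idea described above: detections in a lesion area are modelled as multivariate Gaussian noise, and objects whose squared Mahalanobis distance exceeds a chi-square cutoff are flagged as candidate MCs. The features, cutoff and data are illustrative assumptions, not the authors' detector.

    ```python
    import numpy as np
    from scipy.stats import chi2

    def mahalanobis_outliers(features, alpha=0.01):
        # Flag rows whose squared Mahalanobis distance from the sample mean
        # exceeds the chi-square (1 - alpha) quantile, assuming the bulk of the
        # detections behaves like multivariate Gaussian noise.
        mu = features.mean(axis=0)
        cov = np.cov(features, rowvar=False)
        inv_cov = np.linalg.pinv(cov)                 # pseudo-inverse for stability
        diff = features - mu
        d2 = np.einsum('ij,jk,ik->i', diff, inv_cov, diff)
        cutoff = chi2.ppf(1 - alpha, df=features.shape[1])
        return d2 > cutoff, d2

    # Toy usage: 200 "noise" detections plus a few strong outliers (candidate MCs).
    rng = np.random.default_rng(1)
    feats = np.vstack([rng.normal(0, 1, (200, 3)), rng.normal(5, 1, (5, 3))])
    is_outlier, _ = mahalanobis_outliers(feats)
    print(is_outlier.sum(), "objects flagged as potential MCs")
    ```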

  18. Locally adaptive decision in detection of clustered microcalcifications in mammograms

    NASA Astrophysics Data System (ADS)

    Sainz de Cea, María V.; Nishikawa, Robert M.; Yang, Yongyi

    2018-02-01

    In computer-aided detection or diagnosis of clustered microcalcifications (MCs) in mammograms, the performance often suffers from not only the presence of false positives (FPs) among the detected individual MCs but also large variability in detection accuracy among different cases. To address this issue, we investigate a locally adaptive decision scheme in MC detection by exploiting the noise characteristics in a lesion area. Instead of developing a new MC detector, we propose a decision scheme on how to best decide whether a detected object is an MC or not in the detector output. We formulate the individual MCs as statistical outliers compared to the many noisy detections in a lesion area so as to account for the local image characteristics. To identify the MCs, we first consider a parametric method for outlier detection, the Mahalanobis distance detector, which is based on a multi-dimensional Gaussian distribution on the noisy detections. We also consider a non-parametric method which is based on a stochastic neighbor graph model of the detected objects. We demonstrated the proposed decision approach with two existing MC detectors on a set of 188 full-field digital mammograms (95 cases). The results, evaluated using free response operating characteristic (FROC) analysis, showed a significant improvement in detection accuracy by the proposed outlier decision approach over traditional thresholding (the partial area under the FROC curve increased from 3.95 to 4.25, p-value < 10^-4). There was also a reduction in case-to-case variability in detected FPs at a given sensitivity level. The proposed adaptive decision approach could not only reduce the number of FPs in detected MCs but also improve case-to-case consistency in detection.

  19. Unsupervised spike sorting based on discriminative subspace learning.

    PubMed

    Keshtkaran, Mohammad Reza; Yang, Zhi

    2014-01-01

    Spike sorting is a fundamental preprocessing step for many neuroscience studies which rely on the analysis of spike trains. In this paper, we present two unsupervised spike sorting algorithms based on discriminative subspace learning. The first algorithm simultaneously learns the discriminative feature subspace and performs clustering. It uses histogram of features in the most discriminative projection to detect the number of neurons. The second algorithm performs hierarchical divisive clustering that learns a discriminative 1-dimensional subspace for clustering in each level of the hierarchy until achieving almost unimodal distribution in the subspace. The algorithms are tested on synthetic and in-vivo data, and are compared against two widely used spike sorting methods. The comparative results demonstrate that our spike sorting methods can achieve substantially higher accuracy in lower dimensional feature space, and they are highly robust to noise. Moreover, they provide significantly better cluster separability in the learned subspace than in the subspace obtained by principal component analysis or wavelet transform.

  20. Parallel and Scalable Clustering and Classification for Big Data in Geosciences

    NASA Astrophysics Data System (ADS)

    Riedel, M.

    2015-12-01

    Machine learning, data mining, and statistical computing are common techniques to perform analysis in earth sciences. This contribution will focus on two concrete and widely used data analytics methods suitable to analyse 'big data' in the context of geoscience use cases: clustering and classification. From the broad class of available clustering methods we focus on the density-based spatial clustering of applications with noise (DBSCAN) algorithm, which enables the identification of outliers or interesting anomalies. A new open source parallel and scalable DBSCAN implementation will be discussed in the light of a scientific use case that detects water mixing events in the Koljoefjords. The second technique we cover is classification, with a focus on the support vector machine (SVM) algorithm, one of the best out-of-the-box classification algorithms. A parallel and scalable SVM implementation will be discussed in the light of a scientific use case in the field of remote sensing with 52 different classes of land cover types.
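
    For readers unfamiliar with DBSCAN, the short sketch below shows the algorithm isolating a small dense anomaly (and flagging isolated noise points) in a synthetic two-dimensional feature space; the eps and min_samples values are illustrative and are unrelated to the Koljoefjord data.

    ```python
    import numpy as np
    from sklearn.cluster import DBSCAN

    # Toy sensor readings in a 2-D feature space (e.g. temperature/salinity pairs);
    # a small, tight "event" regime sits apart from the diffuse background.
    rng = np.random.default_rng(0)
    background = rng.normal(0, 0.3, (500, 2))
    event = rng.normal([2.5, -1.5], 0.1, (25, 2))
    X = np.vstack([background, event])

    labels = DBSCAN(eps=0.25, min_samples=10).fit_predict(X)
    print("clusters found:", len(set(labels)) - (1 if -1 in labels else 0))
    print("points labelled as noise/outliers:", int(np.sum(labels == -1)))
    ```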

  1. An Integrated Intrusion Detection Model of Cluster-Based Wireless Sensor Network

    PubMed Central

    Sun, Xuemei; Yan, Bo; Zhang, Xinzhong; Rong, Chuitian

    2015-01-01

    Considering wireless sensor network characteristics, this paper combines anomaly and misuse detection and proposes an integrated detection model for cluster-based wireless sensor networks, aiming to enhance the detection rate and reduce the false alarm rate. An Adaboost algorithm with hierarchical structures is used for anomaly detection of sensor nodes, cluster-head nodes and Sink nodes. Cultural-Algorithm and Artificial-Fish-Swarm-Algorithm optimized Back Propagation is applied to misuse detection at the Sink node. Extensive simulations demonstrate that this integrated model has strong intrusion detection performance. PMID:26447696

  2. An Integrated Intrusion Detection Model of Cluster-Based Wireless Sensor Network.

    PubMed

    Sun, Xuemei; Yan, Bo; Zhang, Xinzhong; Rong, Chuitian

    2015-01-01

    Considering wireless sensor network characteristics, this paper combines anomaly and misuse detection and proposes an integrated detection model for cluster-based wireless sensor networks, aiming to enhance the detection rate and reduce the false alarm rate. An Adaboost algorithm with hierarchical structures is used for anomaly detection of sensor nodes, cluster-head nodes and Sink nodes. Cultural-Algorithm and Artificial-Fish-Swarm-Algorithm optimized Back Propagation is applied to misuse detection at the Sink node. Extensive simulations demonstrate that this integrated model has strong intrusion detection performance.

  3. Extraction of multi-scale landslide morphological features based on local Gi* using airborne LiDAR-derived DEM

    NASA Astrophysics Data System (ADS)

    Shi, Wenzhong; Deng, Susu; Xu, Wenbing

    2018-02-01

    For automatic landslide detection, landslide morphological features should be quantitatively expressed and extracted. High-resolution Digital Elevation Models (DEMs) derived from airborne Light Detection and Ranging (LiDAR) data allow fine-scale morphological features to be extracted, but noise in DEMs influences morphological feature extraction, and the multi-scale nature of landslide features should be considered. This paper proposes a method to extract landslide morphological features characterized by homogeneous spatial patterns. Both profile and tangential curvature are utilized to quantify land surface morphology, and a local Gi* statistic is calculated for each cell to identify significant patterns of clustering of similar morphometric values. The method was tested on both synthetic surfaces simulating natural terrain and airborne LiDAR data acquired over an area dominated by shallow debris slides and flows. The test results of the synthetic data indicate that the concave and convex morphologies of the simulated terrain features at different scales and levels of distinctness could be recognized using the proposed method, even when random noise was added to the synthetic data. In the test area, cells with large local Gi* values were extracted at a specified significance level from the profile and the tangential curvature image generated from the LiDAR-derived 1-m DEM. The morphologies of landslide main scarps, source areas and trails were clearly indicated, and the morphological features were represented by clusters of extracted cells. A comparison with the morphological feature extraction method based on curvature thresholds proved the proposed method's robustness to DEM noise. When verified against a landslide inventory, the morphological features of almost all recent (< 5 years) landslides and approximately 35% of historical (> 10 years) landslides were extracted. This finding indicates that the proposed method can facilitate landslide detection, although the cell clusters extracted from curvature images should be filtered using supplementary information provided by expert knowledge or other data sources.
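
    A minimal sketch of the Getis-Ord local Gi* computation on a raster, using a square moving window with unit weights (edge handling by reflection is an implementation choice here, not necessarily the authors'); the synthetic grid stands in for a curvature image.

    ```python
    import numpy as np
    from scipy.ndimage import uniform_filter

    def local_gi_star(values, window=5):
        # Getis-Ord Gi* per cell with binary weights over a (window x window)
        # neighbourhood that includes the cell itself.
        x = np.asarray(values, dtype=float)
        n = x.size
        xbar, s = x.mean(), x.std()
        w_i = float(window * window)                        # sum of weights per cell
        local_sum = uniform_filter(x, size=window) * w_i    # neighbourhood sum
        num = local_sum - xbar * w_i
        den = s * np.sqrt((n * w_i - w_i ** 2) / (n - 1))
        return num / den

    rng = np.random.default_rng(0)
    grid = rng.normal(0, 1, (200, 200))
    grid[80:100, 80:100] += 2.5                 # a synthetic patch of high curvature
    z = local_gi_star(grid, window=5)
    print("cells with |Gi*| > 2.58 (p < 0.01, two-sided):", int((np.abs(z) > 2.58).sum()))
    ```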

  4. Propensity score to detect baseline imbalance in cluster randomized trials: the role of the c-statistic.

    PubMed

    Leyrat, Clémence; Caille, Agnès; Foucher, Yohann; Giraudeau, Bruno

    2016-01-22

    Despite randomization, baseline imbalance and confounding bias may occur in cluster randomized trials (CRTs). Covariate imbalance may jeopardize the validity of statistical inferences if it occurs on prognostic factors. Thus, the diagnosis of such an imbalance is essential to adjust the statistical analysis if required. We developed a tool based on the c-statistic of the propensity score (PS) model to detect global baseline covariate imbalance in CRTs and assess the risk of confounding bias. We performed a simulation study to assess the performance of the proposed tool and applied this method to analyze the data from 2 published CRTs. The proposed method had good performance for large sample sizes (n = 500 per arm) and when the number of unbalanced covariates was not too small as compared with the total number of baseline covariates (≥40% of unbalanced covariates). We also provide a strategy for pre-selection of the covariates to be included in the PS model to enhance imbalance detection. The proposed tool could be useful in deciding whether covariate adjustment is required before performing statistical analyses of CRTs.
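
    The diagnostic itself is simple to reproduce: fit a logistic propensity score model predicting trial arm from baseline covariates and report its c-statistic (ROC AUC). The sketch below does this on simulated cluster-level data; values near 0.5 indicate balance, while larger values suggest imbalance. The simulation settings are illustrative, not those of the paper.

    ```python
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score

    # Simulated cluster-randomised trial: cluster-level covariates and arm labels.
    rng = np.random.default_rng(0)
    n_clusters = 60
    X = rng.normal(0, 1, (n_clusters, 5))           # baseline covariates
    arm = rng.integers(0, 2, n_clusters)
    X[arm == 1, 0] += 0.8                           # induce imbalance on one covariate

    # Propensity score model: predict arm from baseline covariates;
    # the model's c-statistic (AUC) is the global imbalance diagnostic.
    ps_model = LogisticRegression(max_iter=1000).fit(X, arm)
    c_statistic = roc_auc_score(arm, ps_model.predict_proba(X)[:, 1])
    print(f"c-statistic = {c_statistic:.2f} (0.5 ~ balanced; larger values suggest imbalance)")
    ```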

  5. Spectral-clustering approach to Lagrangian vortex detection.

    PubMed

    Hadjighasem, Alireza; Karrasch, Daniel; Teramoto, Hiroshi; Haller, George

    2016-06-01

    One of the ubiquitous features of real-life turbulent flows is the existence and persistence of coherent vortices. Here we show that such coherent vortices can be extracted as clusters of Lagrangian trajectories. We carry out the clustering on a weighted graph, with the weights measuring pairwise distances of fluid trajectories in the extended phase space of positions and time. We then extract coherent vortices from the graph using tools from spectral graph theory. Our method locates all coherent vortices in the flow simultaneously, thereby showing high potential for automated vortex tracking. We illustrate the performance of this technique by identifying coherent Lagrangian vortices in several two- and three-dimensional flows.
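
    A minimal sketch of the trajectory-clustering idea: pairwise distances between trajectories (here, mean separation over time) define a weighted graph, a Gaussian kernel turns distances into affinities, and spectral clustering extracts the coherent groups. The synthetic "vortices", kernel bandwidth and cluster count are illustrative choices, not the authors' exact construction.

    ```python
    import numpy as np
    from sklearn.cluster import SpectralClustering

    # Toy Lagrangian trajectories: two coherent "vortices" plus background particles.
    rng = np.random.default_rng(0)
    t = np.linspace(0, 2 * np.pi, 50)

    def vortex(center, n):
        angles = rng.uniform(0, 2 * np.pi, n)[:, None] + t[None, :]
        radii = rng.uniform(0.2, 0.5, n)[:, None]
        return np.stack([center[0] + radii * np.cos(angles),
                         center[1] + radii * np.sin(angles)], axis=-1)   # (n, T, 2)

    traj = np.vstack([vortex((0.0, 0.0), 30), vortex((4.0, 0.0), 30),
                      rng.normal(2.0, 3.0, (30, len(t), 2))])

    # Pairwise distance = mean particle separation over time, turned into a
    # Gaussian-kernel affinity; spectral clustering then extracts coherent groups.
    diff = traj[:, None, :, :] - traj[None, :, :, :]
    dist = np.linalg.norm(diff, axis=-1).mean(axis=-1)
    affinity = np.exp(-(dist ** 2) / (2 * np.median(dist) ** 2))
    labels = SpectralClustering(n_clusters=3, affinity='precomputed',
                                random_state=0).fit_predict(affinity)
    print(np.bincount(labels))
    ```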

  6. Detecting Statistically Significant Communities of Triangle Motifs in Undirected Networks

    DTIC Science & Technology

    2016-04-26

    Final report (15 Oct 2014 to 14 Jan 2015). This work extends Perry et al. [6] by developing a statistical framework that supports the detection of triangle motif-based clusters in complex networks. The reported contributions include establishing, a priori, the need for triangle motif-based clustering, and developing an algorithm for clustering undirected networks based on the triangle configuration.

  7. Innovative Strategies to Develop Chemical Categories Using a Combination of Structural and Toxicological Properties.

    PubMed

    Batke, Monika; Gütlein, Martin; Partosch, Falko; Gundert-Remy, Ursula; Helma, Christoph; Kramer, Stefan; Maunz, Andreas; Seeland, Madeleine; Bitsch, Annette

    2016-01-01

    Interest is increasing in the development of non-animal methods for toxicological evaluations. These methods are, however, particularly challenging for complex toxicological endpoints such as repeated dose toxicity. European Legislation, e.g., the European Union's Cosmetic Directive and REACH, demands the use of alternative methods. Frameworks, such as the Read-across Assessment Framework or the Adverse Outcome Pathway Knowledge Base, support the development of these methods. The aim of the project presented in this publication was to develop substance categories for a read-across with complex endpoints of toxicity based on existing databases. The basic conceptual approach was to combine structural similarity with shared mechanisms of action. Substances with similar chemical structure and toxicological profile form candidate categories suitable for read-across. We combined two databases on repeated dose toxicity, the RepDose database and the ELINCS database, to form a common database for the identification of categories. The resulting database contained physicochemical, structural, and toxicological data, which were refined and curated for cluster analyses. We applied the Predictive Clustering Tree (PCT) approach for clustering chemicals based on structural and on toxicological information to detect groups of chemicals with similar toxic profiles and pathways/mechanisms of toxicity. As many of the experimental toxicity values were not available, these data were imputed by predicting them with a multi-label classification method prior to clustering. The clustering results were evaluated by assessing chemical and toxicological similarities with the aim of identifying clusters with a concordance between structural information and toxicity profiles/mechanisms. From these chosen clusters, seven were selected for a quantitative read-across, based on a small ratio (< 5) between the highest and the lowest NOAEL among the cluster members. We discuss the limitations of the approach. Based on this analysis we propose improvements for a follow-up approach, such as incorporation of metabolic information and more detailed mechanistic information. The software enables the user to allocate a substance to a cluster and to use this information for a possible read-across. The clustering tool is provided as a free web service, accessible at http://mlc-reach.informatik.uni-mainz.de.

  8. Synthesis of colloidal silver nanoparticle clusters and their application in ascorbic acid detection by SERS.

    PubMed

    Cholula-Díaz, Jorge L; Lomelí-Marroquín, Diana; Pramanick, Bidhan; Nieto-Argüello, Alfonso; Cantú-Castillo, Luis A; Hwang, Hyundoo

    2018-03-01

    Ascorbic acid (vitamin C) has an essential role in the human body mainly due to its antioxidant function. In this work, metallic silver nanoparticle (AgNP) colloids were used in SERS experiments to detect ascorbic acid in aqueous solution. The AgNPs were synthesized by a green method using potato starch as reducing and stabilizing agent, and water as the solvent. The optical properties of the yellowish as-synthesized silver colloids were characterized by UV-vis spectroscopy, in which besides a typical band at 410 nm related to the localized surface plasmon resonance of the silver nanoparticles, a shoulder band around 500 nm, due to silver nanoparticle cluster formation, is present when relatively higher concentrations of starch are used in the synthesis. These starch-capped silver nanoparticles show an intrinsic Raman peak at 1386 cm^-1 assigned to deformation modes of the starch structure. The increase of the intensity of the SERS peak at 1386 cm^-1 with an increase in the concentration of the ascorbic acid is related to a decrease of the gap between dimers and trimers of the silver nanoparticle clusters produced by the presence of ascorbic acid in the colloid. The limit of detection of this technique for ascorbic acid is 0.02 mM with a measurement concentration range of 0.02-10 mM, which is relevant for the application of this method for detecting ascorbic acid in biological specimens. Copyright © 2017 Elsevier B.V. All rights reserved.

  9. Varying face occlusion detection and iterative recovery for face recognition

    NASA Astrophysics Data System (ADS)

    Wang, Meng; Hu, Zhengping; Sun, Zhe; Zhao, Shuhuan; Sun, Mei

    2017-05-01

    In most sparse representation methods for face recognition (FR), occlusion problems are usually solved by removing the occluded part of both query samples and training samples before performing the recognition process. This practice ignores the global features of the facial image and may lead to unsatisfactory results due to the limitation of local features. Considering the aforementioned drawback, we propose a method called varying occlusion detection and iterative recovery for FR. The main contributions of our method are as follows: (1) to detect an accurate occlusion area of facial images, an image processing and intersection-based clustering combination method is used for occlusion FR; (2) according to an accurate occlusion map, the new integrated facial images are recovered iteratively and put into a recognition process; and (3) the effectiveness of our method on recognition accuracy is verified by comparing it with three typical occlusion map detection methods. Experiments show that the proposed method has a highly accurate detection and recovery performance and that it outperforms several similar state-of-the-art methods against partial contiguous occlusion.

  10. A novel infrared small moving target detection method based on tracking interest points under complicated background

    NASA Astrophysics Data System (ADS)

    Dong, Xiabin; Huang, Xinsheng; Zheng, Yongbin; Bai, Shengjian; Xu, Wanying

    2014-07-01

    Infrared moving target detection is an important part of infrared technology. We introduce a novel infrared small moving target detection method based on tracking interest points under complicated backgrounds. First, Difference of Gaussians (DOG) filters are used to detect a group of interest points (including the moving targets). Second, a small-target tracking method inspired by the Human Visual System (HVS) is used to track these interest points over several frames, yielding the correlations between interest points in the first frame and the last frame. Finally, a new clustering method named R-means is proposed to divide these interest points into two groups according to the correlations: target points and background points. In the experimental results, the target-to-clutter ratio (TCR) and the receiver operating characteristic (ROC) curves are computed to compare the performance of the proposed method with five other sophisticated methods. The results show that the proposed method discriminates targets from clutter better and has a lower false alarm rate than the existing moving target detection methods.
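
    The interest-point stage can be illustrated in a few lines of Python: a Difference-of-Gaussians response followed by a local-maximum test yields candidate points (the HVS-inspired tracking and R-means clustering steps are not shown). The sigmas, threshold and synthetic frame are illustrative assumptions, not the paper's tuned values.

    ```python
    import numpy as np
    from scipy.ndimage import gaussian_filter, maximum_filter

    def dog_interest_points(frame, sigma1=1.0, sigma2=2.0, thresh=0.15):
        # Difference-of-Gaussians response plus a local-maximum test.
        dog = gaussian_filter(frame, sigma1) - gaussian_filter(frame, sigma2)
        local_max = (dog == maximum_filter(dog, size=5)) & (dog > thresh)
        return np.argwhere(local_max)                # (row, col) of candidate points

    # Toy IR frame: low-contrast background with two small bright "targets".
    rng = np.random.default_rng(0)
    frame = rng.normal(0, 0.05, (128, 128))
    frame[40, 60] += 2.0
    frame[90, 30] += 2.0
    print(dog_interest_points(frame))
    ```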

  11. Taxonomy and clustering in collaborative systems: The case of the on-line encyclopedia Wikipedia

    NASA Astrophysics Data System (ADS)

    Capocci, A.; Rao, F.; Caldarelli, G.

    2008-01-01

    In this paper we investigate the nature and structure of the relation between imposed classifications and real clustering in a particular case of a scale-free network given by the on-line encyclopedia Wikipedia. We find a statistical similarity in the distributions of community sizes both by using the top-down approach of the categories division present in the archive and in the bottom-up procedure of community detection given by an algorithm based on the spectral properties of the graph. Regardless of the statistically similar behaviour, the two methods provide a rather different division of the articles, thereby signaling that the nature and presence of power laws is a general feature for these systems and cannot be used as a benchmark to evaluate the suitability of a clustering method.

  12. Colour segmentation of multi variants tuberculosis sputum images using self organizing map

    NASA Astrophysics Data System (ADS)

    Rulaningtyas, Riries; Suksmono, Andriyan B.; Mengko, Tati L. R.; Saptawati, Putri

    2017-05-01

    Lung tuberculosis is still diagnosed from Ziehl-Neelsen sputum smear images in low- and middle-income countries. Clinicians decide the grade of the disease by manually counting the tuberculosis bacilli, which is very tedious given the large number of patients and the lack of standardization in sputum staining. Because the staining is not standardized, the sputum images vary widely in colour and are difficult to identify. To help clinicians, this research examined the Self-Organizing Map method for colour segmentation of sputum images based on colour clustering. This method showed better performance than k-means clustering, which was also tried in this research. The Self-Organizing Map segmented the sputum images with good results and clustered the colours adaptively.

  13. Clustering analysis of moving target signatures

    NASA Astrophysics Data System (ADS)

    Martone, Anthony; Ranney, Kenneth; Innocenti, Roberto

    2010-04-01

    Previously, we developed a moving target indication (MTI) processing approach to detect and track slow-moving targets inside buildings, which successfully detected moving targets (MTs) from data collected by a low-frequency, ultra-wideband radar. Our MTI algorithms include change detection, automatic target detection (ATD), clustering, and tracking. The MTI algorithms can be implemented in a real-time or near-real-time system; however, a person-in-the-loop is needed to select input parameters for the clustering algorithm. Specifically, the number of clusters to input into the cluster algorithm is unknown and requires manual selection. A critical need exists to automate all aspects of the MTI processing formulation. In this paper, we investigate two techniques that automatically determine the number of clusters: the adaptive knee-point (KP) algorithm and the recursive pixel finding (RPF) algorithm. The KP algorithm is based on a well-known heuristic approach for determining the number of clusters. The RPF algorithm is analogous to the pixel-labeling procedure used in image processing. Both algorithms are used to analyze the false alarm and detection rates of three operational scenarios of personnel walking inside wood and cinderblock buildings.
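
    The knee-point heuristic for choosing the number of clusters is easy to sketch: compute a clustering cost (here k-means inertia) over a range of k and pick the point farthest from the chord joining the curve's endpoints. This is a generic version of the idea, not the paper's adaptive KP or RPF algorithms; the data are synthetic.

    ```python
    import numpy as np
    from sklearn.cluster import KMeans

    def knee_point(values):
        # Index of the point with the largest perpendicular distance from the
        # chord joining the first and last points of a decreasing cost curve.
        values = np.asarray(values, dtype=float)
        n = len(values)
        x = np.arange(n, dtype=float)
        x1, y1, x2, y2 = 0.0, values[0], float(n - 1), values[-1]
        d = np.abs((y2 - y1) * x - (x2 - x1) * values + x2 * y1 - y2 * x1)
        d /= np.hypot(y2 - y1, x2 - x1)
        return int(np.argmax(d))

    # Toy data with three well-separated groups; the cost curve is k-means inertia.
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(c, 0.3, (60, 2)) for c in ((0, 0), (3, 0), (0, 3))])
    ks = list(range(1, 9))
    inertia = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_ for k in ks]
    print("estimated number of clusters:", ks[knee_point(inertia)])
    ```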

  14. The X-CLASS-redMaPPer galaxy cluster comparison. I. Identification procedures

    NASA Astrophysics Data System (ADS)

    Sadibekova, T.; Pierre, M.; Clerc, N.; Faccioli, L.; Gastaud, R.; Le Fevre, J.-P.; Rozo, E.; Rykoff, E.

    2014-11-01

    Context. This paper is the first in a series undertaking a comprehensive correlation analysis between optically selected and X-ray-selected cluster catalogues. The rationale of the project is to develop a holistic picture of galaxy clusters utilising optical and X-ray-cluster-selected catalogues with well-understood selection functions. Aims: Unlike most of the X-ray/optical cluster correlations to date, the present paper focuses on the non-matching objects in either waveband. We investigate how the differences observed between the optical and X-ray catalogues may stem from (1) a shortcoming of the detection algorithms; (2) dispersion in the X-ray/optical scaling relations; or (3) substantial intrinsic differences between the cluster populations probed in the X-ray and optical bands. The aim is to inventory and elucidate these effects in order to account for selection biases in the further determination of X-ray/optical cluster scaling relations. Methods: We correlated the X-CLASS serendipitous cluster catalogue extracted from the XMM archive with the redMaPPer optical cluster catalogue derived from the Sloan Digital Sky Survey (DR8). We performed a detailed and, in large part, interactive analysis of the matching output from the correlation. The overlap between the two catalogues has been accurately determined and possible cluster positional errors were manually recovered. The final samples comprise 270 and 355 redMaPPer and X-CLASS clusters, respectively. X-ray cluster matching rates were analysed as a function of optical richness. In the second step, the redMaPPer clusters were correlated with the entire X-ray catalogue, containing point and uncharacterised sources (down to a few 10^-15 erg s^-1 cm^-2 in the [0.5-2] keV band). A stacking analysis was performed for the remaining undetected optical clusters. Results: We find that all rich (λ ≥ 80) clusters are detected in X-rays out to z = 0.6. Below this redshift, the richness threshold for X-ray detection steadily decreases with redshift. Likewise, all X-ray bright clusters are detected by redMaPPer. After correcting for obvious pipeline shortcomings (about 10% of the cases both in optical and X-ray), ~50% of the redMaPPer (down to a richness of 20) are found to coincide with an X-CLASS cluster; when considering X-ray sources of any type, this fraction increases to ~80%; for the remaining objects, the stacking analysis finds a weak signal within 0.5 Mpc around the cluster optical centres. The fraction of clusters totally dominated by AGN-type emission appears to be a few percent. Conversely, ~40% of the X-CLASS clusters are identified with a redMaPPer (down to a richness of 20) - part of the non-matches being due to the X-CLASS sample extending further out than redMaPPer (z < 1.5 vs. z < 0.6), but extending the correlation down to a richness of 5 raises the matching rate to ~65%. Conclusions: This state-of-the-art study involving two well-validated cluster catalogues has shown itself to be complex, and it points to a number of issues inherent to blind cross-matching, owing both to pipeline shortcomings and cluster peculiar properties. These can only be accounted for after a manual check. The combined X-ray and optical scaling relations will be presented in a subsequent article.

  15. Using an innovative multiple regression procedure in a cancer population (Part 1): detecting and probing relationships of common interacting symptoms (pain, fatigue/weakness, sleep problems) as a strategy to discover influential symptom pairs and clusters.

    PubMed

    Francoeur, Richard B

    2015-01-01

    The majority of patients with advanced cancer experience symptom pairs or clusters among pain, fatigue, and insomnia. Improved methods are needed to detect and interpret interactions among symptoms or disease markers to reveal influential pairs or clusters. In prior work, I developed and validated sequential residual centering (SRC), a method that improves the sensitivity of multiple regression to detect interactions among predictors, by conditioning for multicollinearity (shared variation) among interactions and component predictors. Using a hypothetical three-way interaction among pain, fatigue, and sleep to predict depressive affect, I derive and explain SRC multiple regression. Subsequently, I estimate raw and SRC multiple regressions using real data for these symptoms from 268 palliative radiation outpatients. Unlike raw regression, SRC reveals that the three-way interaction (pain × fatigue/weakness × sleep problems) is statistically significant. In follow-up analyses, the relationship between pain and depressive affect is aggravated (magnified) within two partial ranges: 1) complete-to-some control over fatigue/weakness when there is complete control over sleep problems (ie, a subset of the pain-fatigue/weakness symptom pair), and 2) no control over fatigue/weakness when there is some-to-no control over sleep problems (ie, a subset of the pain-fatigue/weakness-sleep problems symptom cluster). Otherwise, the relationship weakens (buffering) as control over fatigue/weakness or sleep problems diminishes. By reducing the standard error, SRC unmasks a three-way interaction comprising a symptom pair and cluster. Low-to-moderate levels of the moderator variable for fatigue/weakness magnify the relationship between pain and depressive affect. However, when the comoderator variable for sleep problems accompanies fatigue/weakness, only frequent or unrelenting levels of both symptoms magnify the relationship. These findings suggest that a countervailing mechanism involving depressive affect could account for the effectiveness of a cognitive behavioral intervention to reduce the severity of a pain, fatigue, and sleep disturbance cluster in a previous randomized trial.

  16. Characterization of computer network events through simultaneous feature selection and clustering of intrusion alerts

    NASA Astrophysics Data System (ADS)

    Chen, Siyue; Leung, Henry; Dondo, Maxwell

    2014-05-01

    As computer network security threats increase, many organizations implement multiple Network Intrusion Detection Systems (NIDS) to maximize the likelihood of intrusion detection and provide a comprehensive understanding of intrusion activities. However, NIDS trigger a massive number of alerts on a daily basis. This can be overwhelming for computer network security analysts since it is a slow and tedious process to manually analyse each alert produced. Thus, automated and intelligent clustering of alerts is important to reveal the structural correlation of events by grouping alerts with common features. As the nature of computer network attacks, and therefore alerts, is not known in advance, unsupervised alert clustering is a promising approach to achieve this goal. We propose a joint optimization technique for feature selection and clustering to aggregate similar alerts and to reduce the number of alerts that analysts have to handle individually. More precisely, each identified feature is assigned a binary value, which reflects the feature's saliency. This value is treated as a hidden variable and incorporated into a likelihood function for clustering. Since computing the optimal solution of the likelihood function directly is analytically intractable, we use the Expectation-Maximisation (EM) algorithm to iteratively update the hidden variable and use it to maximize the expected likelihood. Our empirical results, using a labelled Defense Advanced Research Projects Agency (DARPA) 2000 reference dataset, show that the proposed method gives better results than the EM clustering without feature selection in terms of the clustering accuracy.

  17. Photometric Determination of Binary Mass Ratios in the WIYN Open Cluster Study (WOCS) Using Theoretical Isochrones

    NASA Astrophysics Data System (ADS)

    Cai, K.; Durisen, R. H.; Deliyannis, C. P.

    2003-05-01

    Binary stars in Galactic open clusters are difficult to detect without spectroscopic observations. However, from theoretical isochrones, we find that binary stars with different primary masses M1 and mass ratios q = M2/M1 have measurably different behaviors in various UBVRI color-magnitude and color-color diagrams. By using appropriate Yonsei-Yale Isochrones, in the best cases we can evaluate M1 and q to within about +/- 0.1 Msun and +/- 0.1, respectively, for individual proper-motion members that have multiple WOCS UBVRI measurements of high quality. The cluster metallicity, reddening, and distance modulus and best-fit isochrones are determined self-consistently from the same WOCS data. This technique allows us to detect binaries and determine their mass ratios in open clusters without time-consuming spectroscopy, which is only sensitive to a limited range of binary separations. We will report results from this photometric technique for WOCS cluster M35 for M1 in the range of 1 to 4 Msun. For the lower main sequence, we used the empirical colors to reduce the error introduced by the problematic color transformations of Y2 Isochrones. In addition to other sources of uncertainty, we have considered effects of rapid rotation and pulsational instability. We plan to apply our method to other WOCS clusters in the future and explore differences in binary fractions and/or mass ratio distributions as a function of cluster age, metallicity, and other parameters.

  18. Unsupervised consensus cluster analysis of [18F]-fluoroethyl-L-tyrosine positron emission tomography identified textural features for the diagnosis of pseudoprogression in high-grade glioma

    PubMed Central

    Kebir, Sied; Khurshid, Zain; Gaertner, Florian C.; Essler, Markus; Hattingen, Elke; Fimmers, Rolf; Scheffler, Björn; Herrlinger, Ulrich; Bundschuh, Ralph A.; Glas, Martin

    2017-01-01

    Rationale: Timely detection of pseudoprogression (PSP) is crucial for the management of patients with high-grade glioma (HGG) but remains difficult. Textural features of O-(2-[18F]fluoroethyl)-L-tyrosine positron emission tomography (FET-PET) mirror tumor uptake heterogeneity; some of them may be associated with tumor progression. Methods: Fourteen patients with HGG and suspected of PSP underwent FET-PET imaging. A set of 19 conventional and textural FET-PET features were evaluated and subjected to unsupervised consensus clustering. The final diagnosis of true progression vs. PSP was based on follow-up MRI using RANO criteria. Results: Three robust clusters have been identified based on 10 predominantly textural FET-PET features. None of the patients with PSP fell into cluster 2, which was associated with high values for textural FET-PET markers of uptake heterogeneity. Three out of 4 patients with PSP were assigned to cluster 3 that was largely associated with low values of textural FET-PET features. By comparison, tumor-to-normal brain ratio (TNRmax) at the optimal cutoff 2.1 was less predictive of PSP (negative predictive value 57% for detecting true progression, p=0.07 vs. 75% with cluster 3, p=0.04). Principal Conclusions: Clustering based on textural O-(2-[18F]fluoroethyl)-L-tyrosine PET features may provide valuable information in assessing the elusive phenomenon of pseudoprogression. PMID:28030820
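
    A generic consensus-clustering sketch in the spirit of this record: k-means is repeated on random subsamples of a feature matrix, co-assignment frequencies form a consensus matrix, and the consensus matrix is cut hierarchically into final clusters. The feature matrix, k and subsampling settings are synthetic and illustrative, not the study's data or pipeline.

    ```python
    import numpy as np
    from sklearn.cluster import KMeans
    from scipy.cluster.hierarchy import linkage, fcluster
    from scipy.spatial.distance import squareform

    def consensus_clusters(X, k=3, n_runs=100, subsample=0.8, seed=0):
        # Repeat k-means on random subsamples, accumulate how often each pair of
        # samples is co-assigned, then cut the hierarchical tree of the
        # consensus (co-assignment) matrix into k final clusters.
        rng = np.random.default_rng(seed)
        n = len(X)
        together = np.zeros((n, n))
        counted = np.zeros((n, n))
        for _ in range(n_runs):
            idx = rng.choice(n, size=int(subsample * n), replace=False)
            labels = KMeans(n_clusters=k, n_init=5,
                            random_state=int(rng.integers(1 << 31))).fit_predict(X[idx])
            same = labels[:, None] == labels[None, :]
            together[np.ix_(idx, idx)] += same
            counted[np.ix_(idx, idx)] += 1
        consensus = np.divide(together, counted, out=np.zeros_like(together), where=counted > 0)
        np.fill_diagonal(consensus, 1.0)
        Z = linkage(squareform(1.0 - consensus, checks=False), method='average')
        return fcluster(Z, t=k, criterion='maxclust')

    # Toy feature matrix standing in for 14 patients x 19 FET-PET features.
    rng = np.random.default_rng(1)
    features = np.vstack([rng.normal(m, 1.0, (5, 19)) for m in (0, 3, 6)])[:14]
    print(consensus_clusters(features, k=3))
    ```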

  19. Weak lensing study of 16 DAFT/FADA clusters: Substructures and filaments

    NASA Astrophysics Data System (ADS)

    Martinet, Nicolas; Clowe, Douglas; Durret, Florence; Adami, Christophe; Acebrón, Ana; Hernandez-García, Lorena; Márquez, Isabel; Guennou, Loic; Sarron, Florian; Ulmer, Mel

    2016-05-01

    While our current cosmological model places galaxy clusters at the nodes of a filament network (the cosmic web), we still struggle to detect these filaments at high redshifts. We perform a weak lensing study for a sample of 16 massive, medium-high redshift (0.4

  20. Detection Method of TOXOPLASMA GONDII Tachyzoites

    NASA Astrophysics Data System (ADS)

    Eassa, Souzan; Bose, Chhanda; Alusta, Pierre; Tarasenko, Olga

    2011-06-01

    Tachyzoites are considered to be the most important stage of Toxoplasma gondii, which causes toxoplasmosis. T. gondii is an obligate intracellular parasite that infects a wide range of cells. The present study was designed to develop a method for early detection of T. gondii tachyzoites. The method comprised a binding assay that was analyzed using principal component and cluster analysis. Our data showed that glycoconjugates GC1, GC2, GC3 and GC10 exhibit a significantly higher binding affinity for T. gondii tachyzoites as compared to controls (T. gondii only, PAA only, and GC 1, 2, 3, and 10 only).

  1. Simulation-Based Evaluation of the Performances of an Algorithm for Detecting Abnormal Disease-Related Features in Cattle Mortality Records.

    PubMed

    Perrin, Jean-Baptiste; Durand, Benoît; Gay, Emilie; Ducrot, Christian; Hendrikx, Pascal; Calavas, Didier; Hénaux, Viviane

    2015-01-01

    We performed a simulation study to evaluate the performance of an anomaly detection algorithm considered within the framework of an automated surveillance system of cattle mortality. The method consisted of a combination of temporal regression and spatial cluster detection, which identifies, for a given week, clusters of spatial units showing an excess of deaths in comparison with their own historical fluctuations. First, we simulated 1,000 outbreaks of a disease causing extra deaths in the French cattle population (about 200,000 herds and 20 million cattle) according to a model mimicking the spreading patterns of an infectious disease, and injected these disease-related extra deaths into an authentic mortality dataset spanning from January 2005 to January 2010. Second, we applied our algorithm to each of the 1,000 semi-synthetic datasets to identify clusters of spatial units showing an excess of deaths considering their own historical fluctuations. Third, we verified whether the clusters identified by the algorithm contained simulated extra deaths, in order to evaluate the ability of the algorithm to identify unusual mortality clusters caused by an outbreak. Among the 1,000 simulations, the median duration of simulated outbreaks was 8 weeks, with a median of 5,627 simulated deaths and 441 infected herds. Within the 12-week trial period, 73% of the simulated outbreaks were detected, with a median timeliness of 1 week and a mean of 1.4 weeks. The proportion of outbreak weeks flagged by an alarm was 61% (i.e. sensitivity), whereas one in three alarms was a true alarm (i.e. positive predictive value). The performance of the detection algorithm was also evaluated for alternative combinations of epidemiologic parameters. The results of our study confirmed that in certain conditions automated algorithms could help identify abnormal cattle mortality increases possibly related to unidentified health events.
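
    The temporal component can be approximated very simply: flag, for the current week, any spatial unit whose death count exceeds a high Poisson quantile of its own historical mean. The sketch below does only this exceedance step (the spatial scan that merges flagged units into clusters is not shown); rates, thresholds and counts are illustrative.

    ```python
    import numpy as np
    from scipy.stats import poisson

    def weekly_exceedance_flags(history, current, alpha=0.01):
        # Flag spatial units whose current weekly death count exceeds the
        # (1 - alpha) Poisson quantile of their own historical mean; a simplified
        # stand-in for the temporal-regression step of such systems.
        baseline = history.mean(axis=1)                 # per-unit expected weekly count
        threshold = poisson.ppf(1 - alpha, mu=baseline)
        return current > threshold

    rng = np.random.default_rng(0)
    history = rng.poisson(lam=4.0, size=(200, 260))     # 200 units x 5 years of weeks
    current = rng.poisson(lam=4.0, size=200)
    current[:10] += 8                                   # inject an outbreak-like excess
    flags = weekly_exceedance_flags(history, current)
    print("units flagged this week:", np.flatnonzero(flags))
    ```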

  2. Simulation-Based Evaluation of the Performances of an Algorithm for Detecting Abnormal Disease-Related Features in Cattle Mortality Records

    PubMed Central

    Perrin, Jean-Baptiste; Durand, Benoît; Gay, Emilie; Ducrot, Christian; Hendrikx, Pascal; Calavas, Didier; Hénaux, Viviane

    2015-01-01

    We performed a simulation study to evaluate the performance of an anomaly detection algorithm considered within the framework of an automated surveillance system of cattle mortality. The method consisted of a combination of temporal regression and spatial cluster detection, which identifies, for a given week, clusters of spatial units showing an excess of deaths in comparison with their own historical fluctuations. First, we simulated 1,000 outbreaks of a disease causing extra deaths in the French cattle population (about 200,000 herds and 20 million cattle) according to a model mimicking the spreading patterns of an infectious disease, and injected these disease-related extra deaths into an authentic mortality dataset spanning from January 2005 to January 2010. Second, we applied our algorithm to each of the 1,000 semi-synthetic datasets to identify clusters of spatial units showing an excess of deaths considering their own historical fluctuations. Third, we verified whether the clusters identified by the algorithm contained simulated extra deaths, in order to evaluate the ability of the algorithm to identify unusual mortality clusters caused by an outbreak. Among the 1,000 simulations, the median duration of simulated outbreaks was 8 weeks, with a median of 5,627 simulated deaths and 441 infected herds. Within the 12-week trial period, 73% of the simulated outbreaks were detected, with a median timeliness of 1 week and a mean of 1.4 weeks. The proportion of outbreak weeks flagged by an alarm was 61% (i.e. sensitivity), whereas one in three alarms was a true alarm (i.e. positive predictive value). The performance of the detection algorithm was also evaluated for alternative combinations of epidemiologic parameters. The results of our study confirmed that in certain conditions automated algorithms could help identify abnormal cattle mortality increases possibly related to unidentified health events. PMID:26536596

  3. Identification of temporal variations in mental workload using locally-linear-embedding-based EEG feature reduction and support-vector-machine-based clustering and classification techniques.

    PubMed

    Yin, Zhong; Zhang, Jianhua

    2014-07-01

    Identifying the abnormal changes of mental workload (MWL) over time is quite crucial for preventing the accidents due to cognitive overload and inattention of human operators in safety-critical human-machine systems. It is known that various neuroimaging technologies can be used to identify the MWL variations. In order to classify MWL into a few discrete levels using representative MWL indicators and small-sized training samples, a novel EEG-based approach by combining locally linear embedding (LLE), support vector clustering (SVC) and support vector data description (SVDD) techniques is proposed and evaluated by using the experimentally measured data. The MWL indicators from different cortical regions are first elicited by using the LLE technique. Then, the SVC approach is used to find the clusters of these MWL indicators and thereby to detect MWL variations. It is shown that the clusters can be interpreted as the binary class MWL. Furthermore, a trained binary SVDD classifier is shown to be capable of detecting slight variations of those indicators. By combining the two schemes, a SVC-SVDD framework is proposed, where the clear-cut (smaller) cluster is detected by SVC first and then a subsequent SVDD model is utilized to divide the overlapped (larger) cluster into two classes. Finally, three-class MWL levels (low, normal and high) can be identified automatically. The experimental data analysis results are compared with those of several existing methods. It has been demonstrated that the proposed framework can lead to acceptable computational accuracy and has the advantages of both unsupervised and supervised training strategies. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.

  4. Association rule mining in the US Vaccine Adverse Event Reporting System (VAERS).

    PubMed

    Wei, Lai; Scott, John

    2015-09-01

    Spontaneous adverse event reporting systems are critical tools for monitoring the safety of licensed medical products. Commonly used signal detection algorithms identify disproportionate product-adverse event pairs and may not be sensitive to more complex potential signals. We sought to develop a computationally tractable multivariate data-mining approach to identify product-multiple adverse event associations. We describe an application of stepwise association rule mining (Step-ARM) to detect potential vaccine-symptom group associations in the US Vaccine Adverse Event Reporting System. Step-ARM identifies strong associations between one vaccine and one or more adverse events. To reduce the number of redundant association rules found by Step-ARM, we also propose a clustering method for the post-processing of association rules. In sample applications to a trivalent intradermal inactivated influenza virus vaccine and to measles, mumps, rubella, and varicella (MMRV) vaccine and in simulation studies, we find that Step-ARM can detect a variety of medically coherent potential vaccine-symptom group signals efficiently. In the MMRV example, Step-ARM appears to outperform univariate methods in detecting a known safety signal. Our approach is sensitive to potentially complex signals, which may be particularly important when monitoring novel medical countermeasure products such as pandemic influenza vaccines. The post-processing clustering algorithm improves the applicability of the approach as a screening method to identify patterns that may merit further investigation. Copyright © 2015 John Wiley & Sons, Ltd.
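
    In the spirit of Step-ARM (but without its stepwise pruning or post-processing clustering), the toy miner below enumerates rules of the form "vaccine -> symptom group", keeping those with sufficient support and confidence. The vaccine and symptom names are invented purely for illustration.

    ```python
    from collections import Counter
    from itertools import combinations

    # Toy spontaneous reports: (vaccine, set of reported symptoms). Names are invented.
    reports = [
        ("VaccineA", {"fever", "rash"}),
        ("VaccineA", {"fever", "rash", "headache"}),
        ("VaccineA", {"fever"}),
        ("VaccineB", {"headache"}),
        ("VaccineB", {"fever", "nausea"}),
        ("VaccineA", {"rash"}),
    ]

    def vaccine_symptom_rules(reports, max_len=2, min_support=2, min_confidence=0.5):
        # Enumerate rules "vaccine -> symptom group" (one vaccine, one or more symptoms).
        vaccine_counts = Counter(v for v, _ in reports)
        joint_counts = Counter()
        for vaccine, symptoms in reports:
            for r in range(1, max_len + 1):
                for combo in combinations(sorted(symptoms), r):
                    joint_counts[(vaccine, combo)] += 1
        rules = []
        for (vaccine, combo), support in joint_counts.items():
            confidence = support / vaccine_counts[vaccine]
            if support >= min_support and confidence >= min_confidence:
                rules.append((vaccine, combo, support, round(confidence, 2)))
        return sorted(rules, key=lambda r: -r[3])

    for rule in vaccine_symptom_rules(reports):
        print(rule)
    ```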

  5. A hybrid protection approaches for denial of service (DoS) attacks in wireless sensor networks

    NASA Astrophysics Data System (ADS)

    Gunasekaran, Mahalakshmi; Periakaruppan, Subathra

    2017-06-01

    A wireless sensor network (WSN) consists of distributed autonomous devices that sense physical and environmental conditions. During clustering, high energy consumption drains battery power and shortens the network lifetime. Hence, WSN devices initially operate in a low-power sleep mode to maximise lifetime, but incoming attacks that disrupt this low-power operation constitute denial of service (DoS) attacks. Conventional intrusion detection (ID) approaches, such as rule-based and anomaly-based methods, detect DoS attacks effectively, but their energy consumption and false detection rates are high. The absence of attack information, and of a broadcast of its impact to the other cluster heads (CHs), makes DoS attacks easier to mount. This article combines isolation and routing tables to detect an attack within a specific cluster and broadcasts the information to the other CHs; the intercommunication between CHs effectively prevents DoS attacks. In addition, a swarm-based defence approach is proposed to migrate from the faulty channel to a normal operating channel through frequency hopping. A comparative analysis of the proposed table-based intrusion detection system (IDS) and swarm-based defence approach against a traditional IDS, in terms of transmission overhead/efficiency, energy consumption, and false positive/negative rates, demonstrates their capability for DoS prediction and prevention in WSNs.

  6. Screening versus routine practice in detection of atrial fibrillation in patients aged 65 or over: cluster randomised controlled trial

    PubMed Central

    Fitzmaurice, David A; Jowett, Sue; Mant, Jonathon; Murray, Ellen T; Holder, Roger; Raftery, J P; Bryan, S; Davies, Michael; Lip, Gregory Y H; Allan, T F

    2007-01-01

    Objectives: To assess whether screening improves the detection of atrial fibrillation (cluster randomisation) and to compare systematic and opportunistic screening. Design: Multicentred cluster randomised controlled trial, with subsidiary trial embedded within the intervention arm. Setting: 50 primary care centres in England, with further individual randomisation of patients in the intervention practices. Participants: 14 802 patients aged 65 or over in 25 intervention and 25 control practices. Interventions: Patients in intervention practices were randomly allocated to systematic screening (invitation for electrocardiography) or opportunistic screening (pulse taking and invitation for electrocardiography if the pulse was irregular). Screening took place over 12 months in each practice from October 2001 to February 2003. No active screening took place in control practices. Main outcome measure: Newly identified atrial fibrillation. Results: The detection rate of new cases of atrial fibrillation was 1.63% a year in the intervention practices and 1.04% in control practices (difference 0.59%, 95% confidence interval 0.20% to 0.98%). Systematic and opportunistic screening detected similar numbers of new cases (1.62% v 1.64%, difference 0.02%, −0.5% to 0.5%). Conclusion: Active screening for atrial fibrillation detects additional cases over current practice. The preferred method of screening in patients aged 65 or over in primary care is opportunistic pulse taking with follow-up electrocardiography. Trial registration: Current Controlled Trials ISRCTN19633732. PMID:17673732

  7. Detection method of visible and invisible nipples on digital breast tomosynthesis

    NASA Astrophysics Data System (ADS)

    Chae, Seung-Hoon; Jeong, Ji-Wook; Lee, Sooyeul; Chae, Eun Young; Kim, Hak Hee; Choi, Young-Wook

    2015-03-01

    Digital Breast Tomosynthesis (DBT), which provides a 3D breast image, can improve the detection sensitivity for breast cancer in dense breasts compared with 2D mammography. Nipple location information is needed to analyze DBT: it is invaluable for registration and as a reference point for classifying masses or micro-calcification clusters. Since the nipple may be either visible or invisible in a 2D mammogram or DBT, a nipple detection method must handle both cases. Detecting a visible nipple from its shape information is simple and highly efficient; however, an invisible nipple is difficult to detect because it lacks a prominent shape. Anatomically, the mammary glands in the breast connect to the nipple, so the nipple location can be inferred by analyzing the location of the mammary glands. In this paper, therefore, we propose a method that detects visible and invisible nipples using changes of the breast area and of the mammary glands, respectively. The results show that our proposed method has an average error of 2.54 +/- 1.47 mm.

  8. [Use of nested PCR in detection of the plague pathogen].

    PubMed

    Glukhov, A I; Gordeev, S A; Al'tshuler, M L; Zykova, I E; Severin, S E

    2003-07-01

    The causative agent of plague, the bacterium Yersinia pestis, must be isolated from the subcutaneous tissues of rodents and from their cutaneous parasites to enable plague prevention. The comparatively new method of polymerase chain reaction (PCR) makes it possible to detect Y. pestis within several hours and without any cultivation. The article describes a PCR method that distinguishes a culture of Y. pestis from cultures of other microorganisms, including other Yersinia species. The method is nested, i.e. it consists of sequential PCR reactions in which the substrate for the second reaction is the product of the first one. This nested design provides higher sensitivity and specificity than ordinary PCR.

  9. Network-constrained spatio-temporal clustering analysis of traffic collisions in Jianghan District of Wuhan, China

    PubMed Central

    Fan, Yaxin; Zhu, Xinyan; Guo, Wei; Guo, Tao

    2018-01-01

    The analysis of traffic collisions is essential for urban safety and the sustainable development of the urban environment. Reducing the road traffic injuries and the financial losses caused by collisions is the most important goal of traffic management. In addition, traffic collisions are a major cause of traffic congestion, which is a serious issue that affects everyone in the society. Therefore, traffic collision analysis is essential for all parties, including drivers, pedestrians, and traffic officers, to understand the road risks at a finer spatio-temporal scale. However, traffic collisions in the urban context are dynamic and complex. Thus, it is important to detect how the collision hotspots evolve over time through spatio-temporal clustering analysis. In addition, traffic collisions are not isolated events in space. The characteristics of the traffic collisions and their surrounding locations also present an influence of the clusters. This work tries to explore the spatio-temporal clustering patterns of traffic collisions by combining a set of network-constrained methods. These methods were tested using the traffic collision data in Jianghan District of Wuhan, China. The results demonstrated that these methods offer different perspectives of the spatio-temporal clustering patterns. The weighted network kernel density estimation provides an intuitive way to incorporate attribute information. The network cross K-function shows that there are varying clustering tendencies between traffic collisions and different types of POIs. The proposed network differential Local Moran’s I and network local indicators of mobility association provide straightforward and quantitative measures of the hotspot changes. This case study shows that these methods could help researchers, practitioners, and policy-makers to better understand the spatio-temporal clustering patterns of traffic collisions. PMID:29672551
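
    One of the building blocks, the local Moran's I statistic, is easy to compute once a spatial or network weights matrix is available: I_i = z_i Σ_j W_ij z_j, with z the standardised values. The sketch below uses a toy chain of road segments with a planar weights matrix; it does not reproduce the paper's network-constrained estimators.

    ```python
    import numpy as np

    def local_morans_i(x, W):
        # Local Moran's I for values x and a row-standardised weights matrix W.
        x = np.asarray(x, dtype=float)
        z = (x - x.mean()) / x.std()
        return z * (W @ z)

    # Toy example: 6 road segments in a chain, collision counts on each segment.
    counts = np.array([2, 3, 25, 30, 4, 1], dtype=float)
    adj = np.zeros((6, 6))
    for i in range(5):
        adj[i, i + 1] = adj[i + 1, i] = 1              # neighbouring segments
    W = adj / adj.sum(axis=1, keepdims=True)           # row-standardise
    print(np.round(local_morans_i(counts, W), 2))      # large positive values = hotspot cores
    ```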

  10. Grouped fuzzy SVM with EM-based partition of sample space for clustered microcalcification detection.

    PubMed

    Wang, Huiya; Feng, Jun; Wang, Hongyu

    2017-07-20

    Detection of clustered microcalcifications (MCs) from mammograms plays an essential role in computer-aided diagnosis for early-stage breast cancer. To tackle problems associated with the diversity of data structures of MC lesions and the variability of normal breast tissues, multi-pattern sample space learning is required. In this paper, a novel grouped fuzzy Support Vector Machine (SVM) algorithm with sample space partition based on Expectation-Maximization (EM), called G-FSVM, is proposed for clustered MC detection. The diversified pattern of training data is partitioned into several groups based on the EM algorithm. Then a series of fuzzy SVMs are integrated for classification, each trained on one group of samples from the MC lesions and normal breast tissues. From the DDSM database, a total of 1,064 suspicious regions are selected from 239 mammograms, and the measured Accuracy, True Positive Rate (TPR), False Positive Rate (FPR) and EVL = TPR × (1 - FPR) are 0.82, 0.78, 0.14 and 0.72, respectively. The proposed method incorporates the merits of fuzzy SVM and multi-pattern sample space learning, decomposing the MC detection problem into a series of simple two-class classification problems. Experimental results from synthetic data and the DDSM database demonstrate that our integrated classification framework reduces the false positive rate significantly while maintaining the true positive rate.
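
    A simplified stand-in for the G-FSVM idea: a Gaussian mixture fitted by EM partitions the sample space, one SVM is trained per group, and predictions are softly routed by the mixture responsibilities. Plain SVCs replace the fuzzy SVMs of the paper, and all data are synthetic.

    ```python
    import numpy as np
    from sklearn.mixture import GaussianMixture
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    n = 400
    group_id = rng.integers(0, 2, n)              # latent tissue pattern (unknown to the model)
    y = rng.integers(0, 2, n)                     # 0 = normal tissue, 1 = MC lesion
    X = rng.normal(0, 1, (n, 4))
    X[:, 0] += 4.0 * group_id                     # the two patterns occupy different regions
    X[:, 1] += 1.5 * y * (1.0 + group_id)         # the class signal differs between patterns

    gmm = GaussianMixture(n_components=2, random_state=0).fit(X)     # EM partition
    groups = gmm.predict(X)
    experts = {g: SVC(probability=True, random_state=0).fit(X[groups == g], y[groups == g])
               for g in np.unique(groups)}

    def predict(x_new):
        # Soft routing: weight each group's SVM by the GMM responsibilities.
        resp = gmm.predict_proba(x_new)
        score = sum(resp[:, g] * clf.predict_proba(x_new)[:, 1] for g, clf in experts.items())
        return (score > 0.5).astype(int)

    print("training accuracy:", round(float((predict(X) == y).mean()), 3))
    ```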

  11. Infrared Multiple Photon Dissociation Spectroscopy Of Metal Cluster-Adducts

    NASA Astrophysics Data System (ADS)

    Cox, D. M.; Kaldor, A.; Zakin, M. R.

    1987-01-01

    Recent development of the laser vaporization technique combined with mass-selective detection has made possible new studies of the fundamental chemical and physical properties of unsupported transition metal clusters as a function of the number of constituent atoms. A variety of experimental techniques have been developed in our laboratory to measure ionization threshold energies, magnetic moments, and gas phase reactivity of clusters. However, studies have so far been unable to determine the cluster structure or the chemical state of chemisorbed species on gas phase clusters. The application of infrared multiple photon dissociation (IRMPD) to obtain the IR absorption properties of metal cluster-adsorbate species in a molecular beam is described here. Specifically, using a high-power, pulsed CO2 laser as the infrared source, the IRMPD spectrum for methanol chemisorbed on small iron clusters is measured as a function of the number of both iron atoms and methanols in the complex for different methanol isotopes. Both the feasibility and potential utility of IRMPD for characterizing metal cluster-adsorbate interactions are demonstrated. The method is generally applicable to any cluster or cluster-adsorbate system dependent only upon the availability of appropriate high power infrared sources.

  12. Threshold selection for classification of MR brain images by clustering method

    NASA Astrophysics Data System (ADS)

    Moldovanu, Simona; Obreja, Cristian; Moraru, Luminita

    2015-12-01

    Given a grey-intensity image, our method detects the optimal threshold for a suitable binarization of MR brain images. In MR brain image processing, the grey levels of pixels belonging to the object are not substantially different from the grey levels belonging to the background. Threshold optimization is an effective tool to separate objects from the background and, further, is useful in classification applications. This paper gives a detailed investigation of the selection of thresholds. Our method does not use the well-known method for binarization. Instead, we perform a simple threshold optimization which, in turn, allows the best classification of the analyzed images into healthy and multiple sclerosis classes. The dissimilarity (or the distance between classes) has been established using the clustering method based on dendrograms. We tested our method using two classes of images: 20 T2-weighted and 20 proton density (PD)-weighted scans acquired from two healthy subjects and two patients with multiple sclerosis. For each image and for each threshold, the number of white pixels (or the area of white objects in the binary image) has been determined. These pixel numbers represent the objects in the clustering operation. The following optimum threshold values are obtained: T = 80 for PD images and T = 30 for T2w images. Each threshold clearly separates the clusters belonging to the studied groups, healthy subjects and patients with multiple sclerosis.
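
    A hedged sketch of the threshold-selection idea: for each candidate threshold, every image is summarised by its white-pixel count, the images are split into two groups by hierarchical clustering on that single feature, and the threshold giving the widest separation between group means is kept. The separation score is a simplification of the paper's dendrogram-based dissimilarity, and the images are synthetic.

    ```python
    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster

    def best_threshold(images, thresholds):
        # Score each threshold by how cleanly the white-pixel counts of the
        # images split into two hierarchical clusters.
        best = None
        for thr in thresholds:
            counts = np.array([(img > thr).sum() for img in images], dtype=float)[:, None]
            groups = fcluster(linkage(counts, method='average'), t=2, criterion='maxclust')
            if len(np.unique(groups)) < 2:
                continue
            gap = abs(counts[groups == 1].mean() - counts[groups == 2].mean())
            score = gap / (counts.std() + 1e-9)
            if best is None or score > best[1]:
                best = (thr, score, groups)
        return best

    # Synthetic stand-in for T2w / PD scans: "disease" images have more bright pixels.
    rng = np.random.default_rng(0)
    images = ([rng.normal(40, 10, (64, 64)) for _ in range(10)]
              + [rng.normal(55, 10, (64, 64)) for _ in range(10)])
    thr_opt, score, groups = best_threshold(images, thresholds=range(10, 91, 10))
    print("selected threshold:", thr_opt, " group sizes:", np.bincount(groups)[1:])
    ```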

  13. Visualizing statistical significance of disease clusters using cartograms.

    PubMed

    Kronenfeld, Barry J; Wong, David W S

    2017-05-15

    Health officials and epidemiological researchers often use maps of disease rates to identify potential disease clusters. Because these maps exaggerate the prominence of low-density districts and hide potential clusters in urban (high-density) areas, many researchers have used density-equalizing maps (cartograms) as a basis for epidemiological mapping. However, there are no existing guidelines for visual assessment of statistical uncertainty. To address this shortcoming, we develop techniques for visual determination of statistical significance of clusters spanning one or more districts on a cartogram. We developed the techniques within a geovisual analytics framework that does not rely on automated significance testing, and can therefore facilitate visual analysis to detect clusters that automated techniques might miss. On a cartogram of the at-risk population, the statistical significance of a disease cluster is determined by the rate, area and shape of the cluster under standard hypothesis testing scenarios. We develop formulae to determine, for a given rate, the area required for statistical significance of a priori and a posteriori designated regions under certain test assumptions. Uniquely, our approach enables dynamic inference of aggregate regions formed by combining individual districts. The method is implemented in interactive tools that provide choropleth mapping, automated legend construction and dynamic search tools to facilitate cluster detection and assessment of the validity of tested assumptions. A case study of leukemia incidence analysis in California demonstrates the ability to visually distinguish between statistically significant and insignificant regions. The proposed geovisual analytics approach enables intuitive visual assessment of statistical significance of arbitrarily defined regions on a cartogram. Our research prompts a broader discussion of the role of geovisual exploratory analyses in disease mapping and the appropriate framework for visually assessing the statistical significance of spatial clusters.
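
    For the a priori case, a one-sided Poisson test is one of the standard hypothesis testing scenarios the abstract refers to; on a density-equalizing cartogram the expected count of a region is proportional to its area. The sketch below shows only that elementary test (assuming SciPy), not the paper's cartogram-specific area formulae.

    # Sketch: one-sided Poisson test for an a priori designated region.
    # Under the null, the region's expected case count is its share of the
    # at-risk population times the overall rate; on a density-equalizing
    # cartogram this share is proportional to the region's area.
    from scipy.stats import poisson

    def a_priori_region_pvalue(observed_cases, region_population, overall_rate):
        """P(X >= observed) for X ~ Poisson(expected) under the null hypothesis."""
        expected = region_population * overall_rate
        return poisson.sf(observed_cases - 1, expected)

    # Example: 30 cases in a district of 50,000 people, overall rate 4 per 10,000.
    # print(a_priori_region_pvalue(30, 50_000, 4 / 10_000))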

  14. Dense module enumeration in biological networks

    NASA Astrophysics Data System (ADS)

    Tsuda, Koji; Georgii, Elisabeth

    2009-12-01

    Analysis of large networks is a central topic in various research fields including biology, sociology, and web mining. Detection of dense modules (a.k.a. clusters) is an important step in analyzing such networks. Though numerous methods have been proposed for this purpose, they often lack mathematical rigor. Namely, there is no guarantee that all dense modules are detected. Here, we present a novel reverse-search-based method for enumerating all dense modules. Furthermore, constraints from additional data sources such as gene expression profiles or customer profiles can be integrated, so that we can systematically detect dense modules with interesting profiles. We report successful applications in human protein interaction network analyses.
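
    The sketch below is not the authors' reverse-search enumeration; it is a simple greedy peeling heuristic that returns a single dense module under the average-degree density measure, included only to make the density objective concrete. It assumes networkx.

    # Sketch: greedy peeling for one dense module (edges / nodes density).
    # This heuristic finds a single dense subgraph; it does not enumerate
    # all dense modules as the reverse-search method in the abstract does.
    import networkx as nx

    def densest_subgraph_peeling(G):
        """Repeatedly remove the minimum-degree node; return the densest subgraph seen."""
        H = G.copy()
        best_nodes, best_density = list(H.nodes), 0.0
        while H.number_of_nodes() > 1:
            density = H.number_of_edges() / H.number_of_nodes()  # half the average degree
            if density > best_density:
                best_density, best_nodes = density, list(H.nodes)
            v = min(H.degree, key=lambda kv: kv[1])[0]
            H.remove_node(v)
        return best_nodes, best_density

    # Example on a toy graph with a planted dense module:
    # G = nx.barbell_graph(6, 4); print(densest_subgraph_peeling(G))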

  15. Aerial detection of Ailanthus altissima: a cost-effective method to map an invasive tree in forested landscapes

    Treesearch

    Joanne Rebbeck; Aaron Kloss; Michael Bowden; Cheryl Coon; Todd F. Hutchinson; Louis Iverson; Greg Guess

    2015-01-01

    We present an aerial mapping method to efficiently and effectively identify seed clusters of the invasive tree, Ailanthus altissima (Mill.) Swingle across deciduous forest landscapes in the eastern United States. We found that the ideal time to conduct aerial digital surveys is early to middle winter, when Ailanthus seed...

  16. ModuleMiner - improved computational detection of cis-regulatory modules: are there different modes of gene regulation in embryonic development and adult tissues?

    PubMed Central

    Van Loo, Peter; Aerts, Stein; Thienpont, Bernard; De Moor, Bart; Moreau, Yves; Marynen, Peter

    2008-01-01

    We present ModuleMiner, a novel algorithm for computationally detecting cis-regulatory modules (CRMs) in a set of co-expressed genes. ModuleMiner outperforms other methods for CRM detection on benchmark data, and successfully detects CRMs in tissue-specific microarray clusters and in embryonic development gene sets. Interestingly, CRM predictions for differentiated tissues exhibit strong enrichment close to the transcription start site, whereas CRM predictions for embryonic development gene sets are depleted in this region. PMID:18394174

  17. Atlas-based segmentation of brainstem regions in neuromelanin-sensitive magnetic resonance images

    NASA Astrophysics Data System (ADS)

    Puigvert, Marc; Castellanos, Gabriel; Uranga, Javier; Abad, Ricardo; Fernández-Seara, María. A.; Pastor, Pau; Pastor, María. A.; Muñoz-Barrutia, Arrate; Ortiz de Solórzano, Carlos

    2015-03-01

    We present a method for the automatic delineation of two neuromelanin-rich brainstem structures, the substantia nigra pars compacta (SN) and the locus coeruleus (LC), in neuromelanin-sensitive magnetic resonance images of the brain. The segmentation method uses a dynamic multi-image reference atlas and a pre-registration atlas selection strategy. To create the atlas, a pool of 35 images of healthy subjects was pairwise pre-registered and clustered into groups using an affinity propagation approach. Each group of the atlas is represented by a single exemplar image. Each new target image to be segmented is registered to the exemplars of each cluster. All the images of the highest performing clusters are then enrolled into the final atlas, and the results of the registration with the target image are propagated using a majority voting approach. All registration processes combined a two-stage affine algorithm and an elastic B-spline algorithm to account for global positioning, region selection and local anatomic differences. In this paper, we present the algorithm, with emphasis on the atlas selection method and the registration scheme. We evaluate the performance of the atlas selection strategy using 35 healthy subjects and 5 Parkinson's disease patients. We then quantify the volume and contrast ratio of the neuromelanin signal of these structures in 47 normal subjects and 40 Parkinson's disease patients to confirm that this method can detect the loss of neuromelanin-containing neurons in Parkinson's disease patients and could eventually be used for the early detection of SN and LC damage.
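
    The two clustering-related steps, grouping atlas images by pairwise similarity with affinity propagation and fusing the propagated labels by majority voting, could be sketched as below (assuming scikit-learn and NumPy). The registration itself (two-stage affine plus elastic B-spline) and the label propagation are treated as done elsewhere.

    # Sketch: cluster atlas images by pairwise similarity with affinity
    # propagation, then fuse propagated binary labels with majority voting.
    import numpy as np
    from sklearn.cluster import AffinityPropagation

    def cluster_atlas(similarity_matrix):
        """Group atlas images; returns cluster labels and exemplar indices."""
        ap = AffinityPropagation(affinity="precomputed", random_state=0)
        labels = ap.fit_predict(similarity_matrix)
        return labels, ap.cluster_centers_indices_

    def majority_vote(propagated_masks):
        """Fuse binary segmentations propagated from the selected atlas images."""
        stack = np.stack(propagated_masks, axis=0)      # (n_atlases, *image_shape)
        return (stack.mean(axis=0) >= 0.5).astype(np.uint8)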

  18. Digital Breast Tomosynthesis: Observer Performance of Clustered Microcalcification Detection on Breast Phantom Images Acquired with an Experimental System Using Variable Scan Angles, Angular Increments, and Number of Projection Views

    PubMed Central

    Goodsitt, Mitchell M.; Helvie, Mark A.; Zelakiewicz, Scott; Schmitz, Andrea; Noroozian, Mitra; Paramagul, Chintana; Roubidoux, Marilyn A.; Nees, Alexis V.; Neal, Colleen H.; Carson, Paul; Lu, Yao; Hadjiiski, Lubomir; Wei, Jun

    2014-01-01

    Purpose To investigate the dependence of microcalcification cluster detectability on tomographic scan angle, angular increment, and number of projection views acquired at digital breast tomosynthesis (DBT). Materials and Methods A prototype DBT system operated in step-and-shoot mode was used to image breast phantoms. Four 5-cm-thick phantoms embedded with 81 simulated microcalcification clusters of three speck sizes (subtle, medium, and obvious) were imaged by using a rhodium target and rhodium filter with 29 kV, 50 mAs, and seven acquisition protocols. Fixed angular increments were used in four protocols (denoted as scan angle, angular increment, and number of projection views, respectively: 16°, 1°, and 17; 24°, 3°, and nine; 30°, 3°, and 11; and 60°, 3°, and 21), and variable increments were used in three (40°, variable, and 13; 40°, variable, and 15; and 60°, variable, and 21). The reconstructed DBT images were interpreted by six radiologists who located the microcalcification clusters and rated their conspicuity. Results The mean sensitivity for detection of subtle clusters ranged from 80% (22.5 of 28) to 96% (26.8 of 28) for the seven DBT protocols; the highest sensitivity was achieved with the 16°, 1°, and 17 protocol (96%), but the difference was significant only for the 60°, 3°, and 21 protocol (80%, P < .002) and did not reach significance for the other five protocols (P = .01–.15). The mean sensitivity for detection of medium and obvious clusters ranged from 97% (28.2 of 29) to 100% (24 of 24), but the differences fell short of significance (P = .08 to >.99). The conspicuity of subtle and medium clusters with the 16°, 1°, and 17 protocol was rated higher than those with other protocols; the differences were significant for subtle clusters with the 24°, 3°, and nine protocol and for medium clusters with 24°, 3°, and nine; 30°, 3°, and 11; 60°, 3° and 21; and 60°, variable, and 21 protocols (P < .002). Conclusion With imaging that did not include x-ray source motion or patient motion during acquisition of the projection views, narrow-angle DBT provided higher sensitivity and conspicuity than wide-angle DBT for subtle microcalcification clusters. © RSNA, 2014 PMID:25007048

  19. Primordial binary populations in low-density star clusters as seen by Chandra: globular clusters versus old open clusters

    NASA Astrophysics Data System (ADS)

    van den Berg, Maureen C.

    2015-08-01

    The binaries in the core of a star cluster are the energy source that prevents the cluster from experiencing core collapse. To model the dynamical evolution of a cluster, it is important to have constraints on the primordial binary content. X-ray observations of old star clusters are very efficient in detecting the close interacting binaries among the cluster members. The X-ray sources in star clusters are a mix of binaries that were dynamically formed and primordial binaries. In massive, dense star clusters, dynamical encounters play an important role in shaping the properties and numbers of the binaries. In contrast, in the low-density clusters the impact of dynamical encounters is presumed to be very small, and the close binaries detected in X-rays represent a primordial population. The lowest density globular clusters have current masses and central densities similar to those of the oldest open clusters in our Milky Way. I will discuss the results of studies with the Chandra X-ray Observatory that have nevertheless revealed a clear dichotomy: far fewer (if any at all) X-ray sources are detected in the central regions of the low-density globular clusters compared to the number of secure cluster members that have been detected in old open clusters (above a limiting X-ray luminosity of typically 4e30 erg/s). The low stellar encounter rates imply that dynamical destruction of binaries can be ignored at present, therefore an explanation must be sought elsewhere. I will discuss several factors that can shed light on the implied differences between the primordial close binary populations in the two types of star clusters.

  20. Adaboost multi-view face detection based on YCgCr skin color model

    NASA Astrophysics Data System (ADS)

    Lan, Qi; Xu, Zhiyong

    2016-09-01

    The traditional Adaboost face detection algorithm trains face classifiers on Haar-like features and achieves a low detection error rate in face regions. Against complex backgrounds, however, the classifiers easily produce false detections in background regions whose gray-level distribution resembles that of faces, which leads to a high error rate for the traditional Adaboost algorithm. Skin, one of the most important features of a face, clusters well in the YCgCr color space, so a skin color model can quickly exclude non-face areas. Combining the advantages of the Adaboost algorithm and skin color detection, this paper proposes an Adaboost face detection method based on a YCgCr skin color model. Experiments show that, compared with the traditional algorithm, the proposed method improves detection accuracy and reduces false detections significantly.
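
    A hedged sketch of the gating idea: build a skin mask and keep only cascade detections that are sufficiently covered by skin pixels. OpenCV's built-in YCrCb conversion is used here as a stand-in for the paper's YCgCr space, and the channel ranges and coverage ratio are illustrative assumptions, not the paper's trained model.

    # Sketch: gate Haar-cascade face detection with a skin-color mask.
    # YCrCb is used as a stand-in for the paper's YCgCr space; the channel
    # ranges and the 0.3 coverage ratio are assumed, illustrative values.
    import cv2
    import numpy as np

    def detect_faces_with_skin_gate(bgr_image,
                                    cascade_path=cv2.data.haarcascades + "haarcascade_frontalface_default.xml"):
        ycrcb = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2YCrCb)
        skin_mask = cv2.inRange(ycrcb, (0, 133, 77), (255, 173, 127))   # assumed skin range
        gray = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2GRAY)
        faces = cv2.CascadeClassifier(cascade_path).detectMultiScale(gray, 1.1, 5)
        kept = []
        for (x, y, w, h) in faces:
            region = skin_mask[y:y + h, x:x + w]
            if region.size and np.count_nonzero(region) / region.size > 0.3:
                kept.append((x, y, w, h))
        return kept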

  1. PRISM 3: expanded prediction of natural product chemical structures from microbial genomes.

    PubMed

    Skinnider, Michael A; Merwin, Nishanth J; Johnston, Chad W; Magarvey, Nathan A

    2017-07-03

    Microbial natural products represent a rich resource of pharmaceutically and industrially important compounds. Genome sequencing has revealed that the majority of natural products remain undiscovered, and computational methods to connect biosynthetic gene clusters to their corresponding natural products therefore have the potential to revitalize natural product discovery. Previously, we described PRediction Informatics for Secondary Metabolomes (PRISM), a combinatorial approach to chemical structure prediction for genetically encoded nonribosomal peptides and type I and II polyketides. Here, we present a ground-up rewrite of the PRISM structure prediction algorithm to derive prediction of natural products arising from non-modular biosynthetic paradigms. Within this new version, PRISM 3, natural product scaffolds are modeled as chemical graphs, permitting structure prediction for aminocoumarins, antimetabolites, bisindoles and phosphonate natural products, and building upon the addition of ribosomally synthesized and post-translationally modified peptides. Further, with the addition of cluster detection for 11 new cluster types, PRISM 3 expands to detect 22 distinct natural product cluster types. Other major modifications to PRISM include improved sequence input and ORF detection, user-friendliness and output. Distribution of PRISM 3 over a 300-core server grid improves the speed and capacity of the web application. PRISM 3 is available at http://magarveylab.ca/prism/. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  2. Automatic three-dimensional measurement of large-scale structure based on vision metrology.

    PubMed

    Zhu, Zhaokun; Guan, Banglei; Zhang, Xiaohu; Li, Daokui; Yu, Qifeng

    2014-01-01

    All relevant key techniques involved in photogrammetric vision metrology for fully automatic 3D measurement of large-scale structure are studied. A new kind of coded target consisting of circular retroreflective discs is designed, and corresponding detection and recognition algorithms based on blob detection and clustering are presented. Then a three-stage strategy starting with view clustering is proposed to achieve automatic network orientation. As for the matching of noncoded targets, the concept of a matching path is proposed, and matches for each noncoded target are found by determining the optimal matching path among all possible ones, based on a novel voting strategy. Experiments on a fixed keel of an airship have been conducted to verify the effectiveness and measuring accuracy of the proposed methods.

  3. Tree crown mapping in managed woodlands (parklands) of semi-arid West Africa using WorldView-2 imagery and geographic object based image analysis.

    PubMed

    Karlson, Martin; Reese, Heather; Ostwald, Madelene

    2014-11-28

    Detailed information on tree cover structure is critical for research and monitoring programs targeting African woodlands, including agroforestry parklands. High spatial resolution satellite imagery represents a potentially effective alternative to field-based surveys, but requires the development of accurate methods to automate information extraction. This study presents a method for tree crown mapping based on Geographic Object Based Image Analysis (GEOBIA) that uses spectral and geometric information to detect and delineate individual tree crowns and crown clusters. The method was implemented on a WorldView-2 image acquired over the parklands of Saponé, Burkina Faso, and rigorously evaluated against field reference data. The overall detection rate was 85.4% for individual tree crowns and crown clusters, with lower accuracies in areas with high tree density and dense understory vegetation. The overall delineation error (expressed as the difference between area of delineated object and crown area measured in the field) was 45.6% for individual tree crowns and 61.5% for crown clusters. Delineation accuracies were higher for medium (35-100 m(2)) and large (≥100 m(2)) trees compared to small (<35 m(2)) trees. The results indicate the potential of GEOBIA and WorldView-2 imagery for tree crown mapping in parkland landscapes and similar woodland areas.

  4. Tree Crown Mapping in Managed Woodlands (Parklands) of Semi-Arid West Africa Using WorldView-2 Imagery and Geographic Object Based Image Analysis

    PubMed Central

    Karlson, Martin; Reese, Heather; Ostwald, Madelene

    2014-01-01

    Detailed information on tree cover structure is critical for research and monitoring programs targeting African woodlands, including agroforestry parklands. High spatial resolution satellite imagery represents a potentially effective alternative to field-based surveys, but requires the development of accurate methods to automate information extraction. This study presents a method for tree crown mapping based on Geographic Object Based Image Analysis (GEOBIA) that uses spectral and geometric information to detect and delineate individual tree crowns and crown clusters. The method was implemented on a WorldView-2 image acquired over the parklands of Saponé, Burkina Faso, and rigorously evaluated against field reference data. The overall detection rate was 85.4% for individual tree crowns and crown clusters, with lower accuracies in areas with high tree density and dense understory vegetation. The overall delineation error (expressed as the difference between area of delineated object and crown area measured in the field) was 45.6% for individual tree crowns and 61.5% for crown clusters. Delineation accuracies were higher for medium (35–100 m2) and large (≥100 m2) trees compared to small (<35 m2) trees. The results indicate the potential of GEOBIA and WorldView-2 imagery for tree crown mapping in parkland landscapes and similar woodland areas. PMID:25460815

  5. Detecting brain tumor in computed tomography images using Markov random fields and fuzzy C-means clustering techniques

    NASA Astrophysics Data System (ADS)

    Abdulbaqi, Hayder Saad; Jafri, Mohd Zubir Mat; Omar, Ahmad Fairuz; Mustafa, Iskandar Shahrim Bin; Abood, Loay Kadom

    2015-04-01

    Brain tumors are an abnormal growth of tissues in the brain. They may arise in people of any age. They must be detected early, diagnosed accurately, monitored carefully, and treated effectively in order to optimize patient outcomes regarding both survival and quality of life. Manual segmentation of brain tumors from CT scan images is a challenging and time-consuming task. Accurate detection of brain tumor size and location plays a vital role in the successful diagnosis and treatment of tumors, and tumor detection is considered a challenging task in medical image processing. The aim of this paper is to introduce a scheme for tumor detection in CT scan images using two different techniques, Hidden Markov Random Fields (HMRF) and Fuzzy C-means (FCM). The proposed method constructs a hybrid of HMRF and thresholding. These methods have been applied to 4 different patient data sets. The comparison among these methods shows that the proposed method gives good results for brain tissue detection, and is more robust and effective compared with FCM.
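
    A minimal NumPy sketch of plain fuzzy c-means on grey-level intensities is given below for orientation; it omits the HMRF spatial term and the thresholding that the paper's hybrid method adds.

    # Sketch: plain fuzzy c-means on pixel intensities (no HMRF spatial term).
    import numpy as np

    def fuzzy_c_means(values, n_clusters=3, m=2.0, n_iter=100, eps=1e-5, seed=0):
        """values: 1-D array of pixel intensities. Returns (centers, memberships)."""
        rng = np.random.default_rng(seed)
        u = rng.random((len(values), n_clusters))
        u /= u.sum(axis=1, keepdims=True)               # fuzzy memberships
        for _ in range(n_iter):
            um = u ** m
            centers = um.T @ values / um.sum(axis=0)    # weighted cluster centers
            dist = np.abs(values[:, None] - centers[None, :]) + 1e-12
            inv = dist ** (-2.0 / (m - 1.0))
            new_u = inv / inv.sum(axis=1, keepdims=True)
            if np.abs(new_u - u).max() < eps:
                u = new_u
                break
            u = new_u
        return centers, u

    # Example: segment a CT slice into 3 intensity classes.
    # centers, u = fuzzy_c_means(ct_slice.ravel().astype(float))
    # labels = u.argmax(axis=1).reshape(ct_slice.shape)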

  6. Deep CCD Photometry of the Rich Galaxy Cluster Abell 1656: Characteristics of the Dwarf Elliptical Galaxy Population in the Cluster Core

    NASA Astrophysics Data System (ADS)

    Secker, Jeffrey Alan

    1995-01-01

    We have developed a statistically rigorous and automated method to implement the detection, photometry and classification of faint objects on digital images. We use these methods to analyze deep R- and B-band CCD images of the central ~700 arcmin^2 of the Coma cluster core, and an associated control field. We have detected and measured total R magnitudes and (B-R) colors for a sample of 3741 objects on the galaxy cluster fields, and 1164 objects on a remote control field, complete to a limiting magnitude of R = 22.5 mag. The typical uncertainties are +/- 0.06 and +/- 0.12 mag in total magnitude and color respectively. The dwarf elliptical (dE) galaxies are confined to a well-defined sequence in the color range 0.7 <= (B-R) <= 1.9 mag: within this interval there are 2535 dE candidates on our fields in the cluster core, and 694 objects on the control field. With an image scale of 0.53 arcsec/pixel and seeing near 1.2 arcsec, a large fraction of the dE galaxy candidates are resolved. We find a significant metallicity gradient in the radial distribution of the dwarf elliptical galaxies, which goes as Z ~ R^(-0.32) outwards from the cluster center at NGC 4874. As well, there is a strong color-luminosity correlation, in the sense that more luminous dE galaxies are redder in the mean. These effects give rise to a radial variation in the cluster luminosity function. The spatial distribution of the faint dE galaxies is well fit by a standard King model with a central surface density of Sigma_0 = 1.44 dEs arcmin^(-2), a core radius R_c = 18.7 arcmin (≈ 0.44 Mpc), and a tidal radius of 1.44 deg (≈ 2.05 Mpc). This core is significantly larger than R_c = 12.3 arcmin (≈ 0.29 Mpc) found for the bright cluster galaxies. The composite luminosity function for Coma galaxies is modeled as the sum of a log-normal distribution for the giant galaxies and a Schechter function for the dwarf elliptical galaxies, with a faint-end slope of alpha = -1.41, consistent with known faint-end slopes for the Virgo and Fornax clusters. The early-type dwarf-to-giant ratio for the Coma cluster core is consistent with that of the Virgo cluster, and thus with the rich Coma cluster being formed as the merger of multiple less-rich galaxy clusters.

  7. A Cloud Boundary Detection Scheme Combined with ASLIC and CNN Using ZY-3, GF-1/2 Satellite Imagery

    NASA Astrophysics Data System (ADS)

    Guo, Z.; Li, C.; Wang, Z.; Kwok, E.; Wei, X.

    2018-04-01

    Cloud detection in optical remote sensing images is one of the most important problems in remote sensing data processing. To address the information loss caused by cloud cover, a cloud detection method based on a convolutional neural network (CNN) is presented in this paper. First, a deep CNN is used to learn a multi-level feature generation model of clouds from the training samples. Second, the adaptive simple linear iterative clustering (ASLIC) method is used to divide the images into superpixels. Finally, the probability of each superpixel belonging to a cloud region is predicted by the trained network model, thereby generating a cloud probability map. Typical regions of GF-1/2 and ZY-3 imagery were selected for cloud detection tests and compared with the traditional SLIC method. The experimental results show that the average accuracy of cloud detection is increased by more than 5%, and that thin and thick clouds, as well as whole cloud boundaries, can be detected well on different imaging platforms.

  8. GEsture: an online hand-drawing tool for gene expression pattern search.

    PubMed

    Wang, Chunyan; Xu, Yiqing; Wang, Xuelin; Zhang, Li; Wei, Suyun; Ye, Qiaolin; Zhu, Youxiang; Yin, Hengfu; Nainwal, Manoj; Tanon-Reyes, Luis; Cheng, Feng; Yin, Tongming; Ye, Ning

    2018-01-01

    Gene expression profiling data provide useful information for the investigation of biological function and processes. However, identifying a specific expression pattern from extensive time series gene expression data is not an easy task. Clustering, a popular method, is often used to group genes with similar expression; however, genes with a 'desirable' or 'user-defined' pattern cannot be efficiently detected by clustering methods. To address these limitations, we developed an online tool called GEsture. Users can draw or graph a curve using a mouse instead of inputting abstract parameters of clustering methods. Taking the drawn gene expression curve as input, GEsture explores genes showing similar, opposite and time-delayed expression patterns in time series datasets. We present three examples that illustrate the capacity of GEsture for gene hunting while following users' requirements. GEsture also provides visualization tools (such as expression pattern figures, heat maps and correlation networks) to display the search results. The outputs may provide useful information for researchers to understand the targets, function and biological processes of the genes involved.
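
    A simple correlation-based search captures the three query types (similar, opposite, time-delayed); the sketch below is an illustration of the idea, not GEsture's implementation, and assumes NumPy.

    # Sketch: correlation-based search for genes matching a user-drawn curve.
    # Similar patterns have high positive correlation, opposite patterns high
    # negative correlation, and time-delayed patterns correlate after a lag.
    import numpy as np

    def zscore(x):
        return (x - x.mean()) / (x.std() + 1e-12)

    def match_patterns(drawn_curve, expression, max_lag=2, top_n=10):
        """expression: (n_genes, n_timepoints). Returns indices of matching genes."""
        q = zscore(np.asarray(drawn_curve, dtype=float))
        X = np.apply_along_axis(zscore, 1, expression.astype(float))
        corr = X @ q / len(q)                            # Pearson correlation per gene
        similar = np.argsort(-corr)[:top_n]
        opposite = np.argsort(corr)[:top_n]
        delayed = {}
        for lag in range(1, max_lag + 1):
            lag_corr = X[:, lag:] @ q[:-lag] / (len(q) - lag)   # gene delayed vs. curve
            delayed[lag] = np.argsort(-lag_corr)[:top_n]
        return similar, opposite, delayed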

  9. Faster Detection of Poliomyelitis Outbreaks to Support Polio Eradication

    PubMed Central

    Chenoweth, Paul; Okayasu, Hiro; Donnelly, Christl A.; Aylward, R. Bruce; Grassly, Nicholas C.

    2016-01-01

    As the global eradication of poliomyelitis approaches the final stages, prompt detection of new outbreaks is critical to enable a fast and effective outbreak response. Surveillance relies on reporting of acute flaccid paralysis (AFP) cases and laboratory confirmation through isolation of poliovirus from stool. However, delayed sample collection and testing can delay outbreak detection. We investigated whether weekly testing for clusters of AFP by location and time, using the Kulldorff scan statistic, could provide an early warning for outbreaks in 20 countries. A mixed-effects regression model was used to predict background rates of nonpolio AFP at the district level. In Tajikistan and Congo, testing for AFP clusters would have resulted in an outbreak warning 39 and 11 days, respectively, before official confirmation of large outbreaks. This method has relatively high specificity and could be integrated into the current polio information system to support rapid outbreak response activities. PMID:26890053

  10. Faster Detection of Poliomyelitis Outbreaks to Support Polio Eradication.

    PubMed

    Blake, Isobel M; Chenoweth, Paul; Okayasu, Hiro; Donnelly, Christl A; Aylward, R Bruce; Grassly, Nicholas C

    2016-03-01

    As the global eradication of poliomyelitis approaches the final stages, prompt detection of new outbreaks is critical to enable a fast and effective outbreak response. Surveillance relies on reporting of acute flaccid paralysis (AFP) cases and laboratory confirmation through isolation of poliovirus from stool. However, delayed sample collection and testing can delay outbreak detection. We investigated whether weekly testing for clusters of AFP by location and time, using the Kulldorff scan statistic, could provide an early warning for outbreaks in 20 countries. A mixed-effects regression model was used to predict background rates of nonpolio AFP at the district level. In Tajikistan and Congo, testing for AFP clusters would have resulted in an outbreak warning 39 and 11 days, respectively, before official confirmation of large outbreaks. This method has relatively high specificity and could be integrated into the current polio information system to support rapid outbreak response activities.

  11. THE PREVALENCE AND IMPACT OF WOLF–RAYET STARS IN EMERGING MASSIVE STAR CLUSTERS

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sokal, Kimberly R.; Johnson, Kelsey E.; Indebetouw, Rémy

    We investigate Wolf–Rayet (WR) stars as a source of feedback contributing to the removal of natal material in the early evolution of massive star clusters. Despite previous work suggesting that massive star clusters clear out their natal material before the massive stars evolve into the WR phase, WR stars have been detected in several emerging massive star clusters. These detections suggest that the timescale for clusters to emerge can be at least as long as the time required to produce WR stars (a few million years), and could also indicate that WR stars may be providing the tipping point in the combined feedback processes that drive a massive star cluster to emerge. We explore the potential overlap between the emerging phase and the WR phase with an observational survey to search for WR stars in emerging massive star clusters. We select candidate emerging massive star clusters from known radio continuum sources with thermal emission and obtain optical spectra with the 4 m Mayall Telescope at Kitt Peak National Observatory and the 6.5 m MMT. We identify 21 sources with significantly detected WR signatures, which we term “emerging WR clusters.” WR features are detected in ∼50% of the radio-selected sample, and thus we find that WR stars are commonly present in currently emerging massive star clusters. The observed extinctions and ages suggest that clusters without WR detections remain embedded for longer periods of time, and may indicate that WR stars can aid, and therefore accelerate, the emergence process.

  12. A flexibly shaped space-time scan statistic for disease outbreak detection and monitoring.

    PubMed

    Takahashi, Kunihiko; Kulldorff, Martin; Tango, Toshiro; Yih, Katherine

    2008-04-11

    Early detection of disease outbreaks enables public health officials to implement disease control and prevention measures at the earliest possible time. A time periodic geographical disease surveillance system based on a cylindrical space-time scan statistic has been used extensively for disease surveillance along with the SaTScan software. In the purely spatial setting, many different methods have been proposed to detect spatial disease clusters. In particular, some spatial scan statistics are aimed at detecting irregularly shaped clusters which may not be detected by the circular spatial scan statistic. Based on the flexible purely spatial scan statistic, we propose a flexibly shaped space-time scan statistic for early detection of disease outbreaks. The performance of the proposed space-time scan statistic is compared with that of the cylindrical scan statistic using benchmark data. In order to compare their performances, we have developed a space-time power distribution by extending the purely spatial bivariate power distribution. Daily syndromic surveillance data in Massachusetts, USA, are used to illustrate the proposed test statistic. The flexible space-time scan statistic is well suited for detecting and monitoring disease outbreaks in irregularly shaped areas.
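
    At the core of both the cylindrical and the flexibly shaped scan statistics is a Poisson log-likelihood ratio evaluated for each candidate scanning window. A minimal sketch of that quantity is shown below (assuming NumPy); the construction of the scanning windows and the Monte Carlo significance testing are omitted.

    # Sketch: Poisson log-likelihood ratio for one candidate scanning window,
    # the quantity maximized over windows by space-time scan statistics.
    import numpy as np

    def poisson_llr(cases_in, expected_in, total_cases, total_expected):
        """Log-likelihood ratio for a window, zero unless risk inside is elevated."""
        c, e = cases_in, expected_in
        c_out, e_out = total_cases - cases_in, total_expected - expected_in
        if e <= 0 or e_out <= 0 or c / e <= c_out / e_out:
            return 0.0
        term_in = c * np.log(c / e) if c > 0 else 0.0
        term_out = c_out * np.log(c_out / e_out) if c_out > 0 else 0.0
        return term_in + term_out

    # Example: 12 cases where 4.0 are expected, out of 200 cases / 200 expected overall.
    # print(poisson_llr(12, 4.0, 200, 200.0))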

  13. Comparative Analysis of Automatic Exudate Detection between Machine Learning and Traditional Approaches

    NASA Astrophysics Data System (ADS)

    Sopharak, Akara; Uyyanonvara, Bunyarit; Barman, Sarah; Williamson, Thomas

    To prevent blindness from diabetic retinopathy, periodic screening and early diagnosis are necessary. Due to the lack of expert ophthalmologists in rural areas, automated detection of early exudates (one of the visible signs of diabetic retinopathy) could help to reduce the incidence of blindness in diabetic patients. Traditional automatic exudate detection methods are based on specific parameter configurations, while machine learning approaches, which seem more flexible, may be computationally costly. A comparative analysis of traditional and machine learning methods for exudate detection, namely mathematical morphology, fuzzy c-means clustering, the naive Bayesian classifier, the Support Vector Machine and the Nearest Neighbor classifier, is presented. Detected exudates are validated against expert ophthalmologists' hand-drawn ground truths. The sensitivity, specificity, precision, accuracy and time complexity of each method are also compared.

  14. Peak tree: a new tool for multiscale hierarchical representation and peak detection of mass spectrometry data.

    PubMed

    Zhang, Peng; Li, Houqiang; Wang, Honghui; Wong, Stephen T C; Zhou, Xiaobo

    2011-01-01

    Peak detection is one of the most important steps in mass spectrometry (MS) analysis. However, the detection result is greatly affected by severe spectrum variations. Unfortunately, most current peak detection methods are neither flexible enough to revise false detection results nor robust enough to resist spectrum variations. To improve flexibility, we introduce the peak tree to represent the peak information in MS spectra. Each tree node is a peak judgment on a range of scales, and each tree decomposition, as a set of nodes, is a candidate peak detection result. To improve robustness, we combine peak detection and common peak alignment into a closed-loop framework, which finds the optimal decomposition via both peak intensity and common peak information. The common peak information is derived and iteratively refined from the density clustering of the latest peak detection result. Finally, we present an improved ant colony optimization biomarker selection method to build a whole MS analysis system. Experiments show that our peak detection method can better resist spectrum variations and provide higher sensitivity and lower false detection rates than conventional methods. The benefits from our peak-tree-based system for MS disease analysis are also demonstrated on real SELDI data.
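
    The sketch below illustrates only the multiscale intuition, keeping peaks that persist across several Gaussian smoothing scales; it is not the paper's peak-tree representation or its closed-loop alignment. It assumes SciPy.

    # Sketch: multiscale peak detection -- keep peaks that persist across
    # several Gaussian smoothing scales of the spectrum.
    import numpy as np
    from scipy.ndimage import gaussian_filter1d
    from scipy.signal import find_peaks

    def persistent_peaks(spectrum, scales=(1, 2, 4, 8), tolerance=3, min_scales=3):
        """Return indices of peaks detected at >= min_scales smoothing scales."""
        detections = []
        for s in scales:
            smoothed = gaussian_filter1d(np.asarray(spectrum, dtype=float), sigma=s)
            peaks, _ = find_peaks(smoothed, prominence=smoothed.std())
            detections.append(peaks)
        persistent = []
        for p in detections[0]:                         # candidates at the finest scale
            hits = sum(np.any(np.abs(d - p) <= tolerance) for d in detections)
            if hits >= min_scales:
                persistent.append(int(p))
        return persistent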

  15. Analysis of genetic association using hierarchical clustering and cluster validation indices.

    PubMed

    Pagnuco, Inti A; Pastore, Juan I; Abras, Guillermo; Brun, Marcel; Ballarin, Virginia L

    2017-10-01

    It is usually assumed that co-expressed genes suggest co-regulation in the underlying regulatory network. Determining sets of co-expressed genes is an important task, based on some criteria of similarity. This task is usually performed by clustering algorithms, where the genes are clustered into meaningful groups based on their expression values in a set of experiments. In this work, we propose a method to find sets of co-expressed genes, based on cluster validation indices as a measure of similarity for individual gene groups, and a combination of variants of hierarchical clustering to generate the candidate groups. We evaluated its ability to retrieve significant sets on simulated correlated and real genomics data, where performance is measured by the ability to detect co-regulated sets relative to a full search. Additionally, we analyzed the quality of the best ranked groups using an online bioinformatics tool that provides network information for the selected genes. Copyright © 2017 Elsevier Inc. All rights reserved.
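
    A simplified version of this pipeline, using a single hierarchical-clustering variant and the silhouette index as the validation measure (the paper combines several variants and indices), could look like the following, assuming SciPy and scikit-learn.

    # Sketch: generate candidate gene groups by hierarchical clustering and
    # rank them with a cluster validation index (silhouette).
    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster
    from sklearn.metrics import silhouette_samples

    def ranked_gene_groups(expression, n_clusters=10):
        """expression: (n_genes, n_experiments). Groups sorted by mean silhouette."""
        Z = linkage(expression, method="average", metric="correlation")
        labels = fcluster(Z, t=n_clusters, criterion="maxclust")
        sil = silhouette_samples(expression, labels, metric="correlation")
        groups = []
        for k in np.unique(labels):
            members = np.where(labels == k)[0]
            groups.append((float(sil[members].mean()), members))
        return sorted(groups, key=lambda g: g[0], reverse=True)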

  16. Detection of moving clusters by a method of kinematic pairs

    NASA Astrophysics Data System (ADS)

    Khodjachikh, M. F.; Romanovsky, E. A.

    2000-01-01

    An algorithm for identifying pairs of stars with common motion is proposed and implemented. The primary source is the HIPPARCOS catalogue. Based on concentrations of kinematic pairs, three previously unknown moving clusters are revealed in the constellations 1) Phe, 2) Cae and 3) Hor, together with the well-known cluster 4) in UMa. Using an original technique, the cluster members, 87 stars in total, are identified. The coordinates of the clusters' convergent points α, δ (in degrees), space velocities (in km/s) and ages (in 10^6 yr) from isochrone fitting are: 1) 51, -29, 19.0, 500, 5/6; 2) 104, -32, 23.7, 300, 9/12; 3) 119, -27, 22.3, 100, 9/22; 4) 303, -31, 16.7, 500, 16/8, respectively. The numerator of each fraction is the number of stars identified as cluster members; the denominator is the number of probable members (with unknown radial velocities). A preliminary qualitative analysis of the clusters' spatial structure is carried out in view of their dynamical evolution.

  17. Probing high-redshift clusters with HST/ACS gravitational weak-lensing and Chandra x-ray observations

    NASA Astrophysics Data System (ADS)

    Jee, Myungkook James

    2006-06-01

    Clusters of galaxies, the largest gravitationally bound objects in the Universe, are useful tracers of cosmic evolution, and particularly detailed studies of still-forming clusters at high redshift can considerably enhance our understanding of structure formation. We use two powerful methods that have recently become available for the study of these distant clusters: space-based gravitational weak lensing and high-resolution X-ray observations. Detailed analyses of five high-redshift (0.8 < z < 1.3) clusters are presented based on the deep Advanced Camera for Surveys (ACS) and Chandra X-ray images. We show that, when the instrumental characteristics are properly understood, the newly installed ACS on the Hubble Space Telescope (HST) can detect subtle shape distortions of background galaxies down to the limiting magnitudes of the observations, which enables the mapping of the cluster dark matter at unprecedentedly high resolution. The cluster masses derived from this HST/ACS weak-lensing study have been compared with those from the re-analyses of the archival Chandra X-ray data. We find that there are interesting offsets between the cluster galaxy, intracluster medium (ICM), and dark matter centroids, and possible scenarios are discussed. If the offset is confirmed to be ubiquitous in other clusters, the explanation may necessitate major refinements in our current understanding of the nature of dark matter, as well as the cluster galaxy dynamics. CL0848+4452, the highest-redshift (z = 1.27) cluster yet detected in weak lensing, has a significant discrepancy between the weak-lensing and X-ray masses. If this trend is found to be severe and common also for other X-ray weak clusters at redshifts beyond unity, the conventional X-ray determination of cluster mass functions, often inferred from their immediate X-ray properties such as the X-ray luminosity and temperature via the so-called mass-luminosity (M-L) and mass-temperature (M-T) relations, will become highly unstable in this redshift regime. Therefore, the relatively unbiased weak-lensing measurements of the cluster mass properties can be used to adequately calibrate the scaling relations in future high-redshift cluster investigations.

  18. Wavelet-based clustering of resting state MRI data in the rat.

    PubMed

    Medda, Alessio; Hoffmann, Lukas; Magnuson, Matthew; Thompson, Garth; Pan, Wen-Ju; Keilholz, Shella

    2016-01-01

    While functional connectivity has typically been calculated over the entire length of the scan (5-10 min), interest has been growing in dynamic analysis methods that can detect changes in connectivity on the order of cognitive processes (seconds). Previous work with sliding window correlation has shown that changes in functional connectivity can be observed on these time scales in the awake human and in anesthetized animals. This exciting advance creates a need for improved approaches to characterize dynamic functional networks in the brain. Previous studies were performed using sliding window analysis on regions of interest defined based on anatomy or obtained from traditional steady-state analysis methods. The parcellation of the brain may therefore be suboptimal, and the characteristics of the time-varying connectivity between regions are dependent upon the length of the sliding window chosen. This manuscript describes an algorithm based on wavelet decomposition that allows data-driven clustering of voxels into functional regions based on temporal and spectral properties. Previous work has shown that different networks have characteristic frequency fingerprints, and the use of wavelets ensures that both the frequency and the timing of the BOLD fluctuations are considered during the clustering process. The method was applied to resting state data acquired from anesthetized rats, and the resulting clusters agreed well with known anatomical areas. Clusters were highly reproducible across subjects. Wavelet cross-correlation values between clusters from a single scan were significantly higher than the values from randomly matched clusters that shared no temporal information, indicating that wavelet-based analysis is sensitive to the relationship between areas. Copyright © 2015 Elsevier Inc. All rights reserved.
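
    A simplified stand-in for the wavelet-based clustering, assuming PyWavelets and scikit-learn: describe each voxel time course by the energy in its wavelet sub-bands and cluster those feature vectors, so that both frequency content and timing influence the grouping.

    # Sketch: cluster voxel time courses on wavelet-derived features.
    import numpy as np
    import pywt
    from sklearn.cluster import KMeans

    def wavelet_features(timecourse, wavelet="db4", level=4):
        """Energy of each wavelet sub-band of one voxel time course."""
        coeffs = pywt.wavedec(np.asarray(timecourse, dtype=float), wavelet, level=level)
        return np.array([np.sum(c ** 2) for c in coeffs])

    def cluster_voxels(timecourses, n_clusters=8, seed=0):
        """timecourses: (n_voxels, n_timepoints). Returns a cluster label per voxel."""
        feats = np.array([wavelet_features(tc) for tc in timecourses])
        feats /= feats.sum(axis=1, keepdims=True) + 1e-12   # relative sub-band energy
        return KMeans(n_clusters=n_clusters, random_state=seed, n_init=10).fit_predict(feats)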

  19. A guide to enterotypes across the human body: meta-analysis of microbial community structures in human microbiome datasets.

    PubMed

    Koren, Omry; Knights, Dan; Gonzalez, Antonio; Waldron, Levi; Segata, Nicola; Knight, Rob; Huttenhower, Curtis; Ley, Ruth E

    2013-01-01

    Recent analyses of human-associated bacterial diversity have categorized individuals into 'enterotypes' or clusters based on the abundances of key bacterial genera in the gut microbiota. There is a lack of consensus, however, on the analytical basis for enterotypes and on the interpretation of these results. We tested how the following factors influenced the detection of enterotypes: clustering methodology, distance metrics, OTU-picking approaches, sequencing depth, data type (whole genome shotgun (WGS) vs.16S rRNA gene sequence data), and 16S rRNA region. We included 16S rRNA gene sequences from the Human Microbiome Project (HMP) and from 16 additional studies and WGS sequences from the HMP and MetaHIT. In most body sites, we observed smooth abundance gradients of key genera without discrete clustering of samples. Some body habitats displayed bimodal (e.g., gut) or multimodal (e.g., vagina) distributions of sample abundances, but not all clustering methods and workflows accurately highlight such clusters. Because identifying enterotypes in datasets depends not only on the structure of the data but is also sensitive to the methods applied to identifying clustering strength, we recommend that multiple approaches be used and compared when testing for enterotypes.

  20. A Guide to Enterotypes across the Human Body: Meta-Analysis of Microbial Community Structures in Human Microbiome Datasets

    PubMed Central

    Waldron, Levi; Segata, Nicola; Knight, Rob; Huttenhower, Curtis; Ley, Ruth E.

    2013-01-01

    Recent analyses of human-associated bacterial diversity have categorized individuals into ‘enterotypes’ or clusters based on the abundances of key bacterial genera in the gut microbiota. There is a lack of consensus, however, on the analytical basis for enterotypes and on the interpretation of these results. We tested how the following factors influenced the detection of enterotypes: clustering methodology, distance metrics, OTU-picking approaches, sequencing depth, data type (whole genome shotgun (WGS) vs.16S rRNA gene sequence data), and 16S rRNA region. We included 16S rRNA gene sequences from the Human Microbiome Project (HMP) and from 16 additional studies and WGS sequences from the HMP and MetaHIT. In most body sites, we observed smooth abundance gradients of key genera without discrete clustering of samples. Some body habitats displayed bimodal (e.g., gut) or multimodal (e.g., vagina) distributions of sample abundances, but not all clustering methods and workflows accurately highlight such clusters. Because identifying enterotypes in datasets depends not only on the structure of the data but is also sensitive to the methods applied to identifying clustering strength, we recommend that multiple approaches be used and compared when testing for enterotypes. PMID:23326225

  1. Effect of image quality on calcification detection in digital mammography.

    PubMed

    Warren, Lucy M; Mackenzie, Alistair; Cooke, Julie; Given-Wilson, Rosalind M; Wallis, Matthew G; Chakraborty, Dev P; Dance, David R; Bosmans, Hilde; Young, Kenneth C

    2012-06-01

    This study aims to investigate if microcalcification detection varies significantly when mammographic images are acquired using different image qualities, including: different detectors, dose levels, and different image processing algorithms. An additional aim was to determine how the standard European method of measuring image quality using threshold gold thickness measured with a CDMAM phantom and the associated limits in current EU guidelines relate to calcification detection. One hundred and sixty two normal breast images were acquired on an amorphous selenium direct digital (DR) system. Microcalcification clusters extracted from magnified images of slices of mastectomies were electronically inserted into half of the images. The calcification clusters had a subtle appearance. All images were adjusted using a validated mathematical method to simulate the appearance of images from a computed radiography (CR) imaging system at the same dose, from both systems at half this dose, and from the DR system at quarter this dose. The original 162 images were processed with both Hologic and Agfa (Musica-2) image processing. All other image qualities were processed with Agfa (Musica-2) image processing only. Seven experienced observers marked and rated any identified suspicious regions. Free response operating characteristic (FROC) and ROC analyses were performed on the data. The lesion sensitivity at a nonlesion localization fraction (NLF) of 0.1 was also calculated. Images of the CDMAM mammographic test phantom were acquired using the automatic setting on the DR system. These images were modified to the additional image qualities used in the observer study. The images were analyzed using automated software. In order to assess the relationship between threshold gold thickness and calcification detection a power law was fitted to the data. There was a significant reduction in calcification detection using CR compared with DR: the alternative FROC (AFROC) area decreased from 0.84 to 0.63 and the ROC area decreased from 0.91 to 0.79 (p < 0.0001). This corresponded to a 30% drop in lesion sensitivity at a NLF equal to 0.1. Detection was also sensitive to the dose used. There was no significant difference in detection between the two image processing algorithms used (p > 0.05). It was additionally found that lower threshold gold thickness from CDMAM analysis implied better cluster detection. The measured threshold gold thickness passed the acceptable limit set in the EU standards for all image qualities except half dose CR. However, calcification detection varied significantly between image qualities. This suggests that the current EU guidelines may need revising. Microcalcification detection was found to be sensitive to detector and dose used. Standard measurements of image quality were a good predictor of microcalcification cluster detection. © 2012 American Association of Physicists in Medicine.

  2. Correlation Functions Quantify Super-Resolution Images and Estimate Apparent Clustering Due to Over-Counting

    PubMed Central

    Veatch, Sarah L.; Machta, Benjamin B.; Shelby, Sarah A.; Chiang, Ethan N.; Holowka, David A.; Baird, Barbara A.

    2012-01-01

    We present an analytical method using correlation functions to quantify clustering in super-resolution fluorescence localization images and electron microscopy images of static surfaces in two dimensions. We use this method to quantify how over-counting of labeled molecules contributes to apparent self-clustering and to calculate the effective lateral resolution of an image. This treatment applies to distributions of proteins and lipids in cell membranes, where there is significant interest in using electron microscopy and super-resolution fluorescence localization techniques to probe membrane heterogeneity. When images are quantified using pair auto-correlation functions, the magnitude of apparent clustering arising from over-counting varies inversely with the surface density of labeled molecules and does not depend on the number of times an average molecule is counted. In contrast, we demonstrate that over-counting does not give rise to apparent co-clustering in double label experiments when pair cross-correlation functions are measured. We apply our analytical method to quantify the distribution of the IgE receptor (FcεRI) on the plasma membranes of chemically fixed RBL-2H3 mast cells from images acquired using stochastic optical reconstruction microscopy (STORM/dSTORM) and scanning electron microscopy (SEM). We find that apparent clustering of FcεRI-bound IgE is dominated by over-counting labels on individual complexes when IgE is directly conjugated to organic fluorophores. We verify this observation by measuring pair cross-correlation functions between two distinguishably labeled pools of IgE-FcεRI on the cell surface using both imaging methods. After correcting for over-counting, we observe weak but significant self-clustering of IgE-FcεRI in fluorescence localization measurements, and no residual self-clustering as detected with SEM. We also apply this method to quantify IgE-FcεRI redistribution after deliberate clustering by crosslinking with two distinct trivalent ligands of defined architectures, and we evaluate contributions from both over-counting of labels and redistribution of proteins. PMID:22384026
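
    A minimal estimator of the radially averaged pair auto-correlation function for 2-D localizations is sketched below (assuming NumPy and SciPy); edge corrections, which a full analysis would include, are omitted.

    # Sketch: radially averaged pair auto-correlation g(r) for 2-D localizations.
    # g(r) -> 1 for a random distribution; over-counting of labels inflates g(r)
    # at short r. Edge corrections are omitted for brevity.
    import numpy as np
    from scipy.spatial.distance import pdist

    def pair_autocorrelation(points, area, r_edges):
        """points: (n, 2) coordinates; area: observation area; r_edges: bin edges."""
        n = len(points)
        density = n / area
        counts, _ = np.histogram(pdist(points), bins=r_edges)   # unordered pairs
        r_lo, r_hi = r_edges[:-1], r_edges[1:]
        ring_area = np.pi * (r_hi ** 2 - r_lo ** 2)
        expected = 0.5 * n * density * ring_area                # random-pattern pairs per bin
        return counts / expected

    # Example with radii in nm for a STORM-like localization dataset:
    # g = pair_autocorrelation(xy, area=2000 * 2000, r_edges=np.arange(0, 500, 10))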

  3. Slope angle estimation method based on sparse subspace clustering for probe safe landing

    NASA Astrophysics Data System (ADS)

    Li, Haibo; Cao, Yunfeng; Ding, Meng; Zhuang, Likui

    2018-06-01

    To avoid planetary probes landing on steep slopes where they may slip or tip over, a new method of slope angle estimation based on sparse subspace clustering is proposed to improve accuracy. First, a coordinate system is defined and established to describe the measured data of light detection and ranging (LIDAR). Second, this data is processed and expressed with a sparse representation. Third, on this basis, the data is clustered to determine which subspace each point belongs to. Fourth, after eliminating outliers in each subspace, the remaining data points are used to fit planes. Finally, the vectors normal to the planes are obtained from the plane model, and the angle between the normal vectors is calculated. Based on the geometric relationship, this angle is equal in value to the slope angle. The proposed method was tested in a series of experiments. The experimental results show that this method can effectively estimate the slope angle, can overcome the influence of noise, and obtains an accurate slope angle. Compared with other methods, this method can minimize the measurement errors and further improve the estimation accuracy of the slope angle.
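
    The final two steps, fitting a plane to each clustered point subset and computing the angle between the plane normals, can be sketched with NumPy as follows; the sparse subspace clustering that separates the LIDAR points is not shown.

    # Sketch: fit planes to two point subsets by SVD and compute the slope angle
    # as the angle between their normal vectors.
    import numpy as np

    def fit_plane_normal(points):
        """points: (n, 3) LIDAR returns. Returns the unit normal of the best-fit plane."""
        centered = points - points.mean(axis=0)
        _, _, vt = np.linalg.svd(centered, full_matrices=False)
        return vt[-1]                                   # direction of least variance

    def slope_angle_deg(points_a, points_b):
        """Angle between the best-fit planes of two point subsets, in degrees."""
        n1, n2 = fit_plane_normal(points_a), fit_plane_normal(points_b)
        cos = np.clip(abs(np.dot(n1, n2)), 0.0, 1.0)
        return float(np.degrees(np.arccos(cos)))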

  4. Applying Machine Learning to Star Cluster Classification

    NASA Astrophysics Data System (ADS)

    Fedorenko, Kristina; Grasha, Kathryn; Calzetti, Daniela; Mahadevan, Sridhar

    2016-01-01

    Catalogs describing populations of star clusters are essential in investigating a range of important issues, from star formation to galaxy evolution. Star cluster catalogs are typically created in a two-step process: in the first step, a catalog of sources is automatically produced; in the second step, each of the extracted sources is visually inspected by 3-to-5 human classifiers and assigned a category. Classification by humans is labor-intensive and time-consuming; it thus creates a bottleneck and substantially slows down progress in star cluster research. We seek to automate the process of labeling star clusters (the second step) by applying supervised machine learning techniques. This will provide a fast, objective, and reproducible classification. Our data are HST (WFC3 and ACS) images of galaxies in the distance range of 3.5-12 Mpc, with a few thousand star clusters already classified by humans as a part of the LEGUS (Legacy ExtraGalactic UV Survey) project. The classification is based on 4 labels (Class 1 - symmetric, compact cluster; Class 2 - concentrated object with some degree of asymmetry; Class 3 - multiple peak system, diffuse; and Class 4 - spurious detection). We start by looking at basic machine learning methods such as decision trees. We then proceed to evaluate the performance of more advanced techniques, focusing on convolutional neural networks and other Deep Learning methods. We analyze the results, and suggest several directions for further improvement.

  5. Change detection for synthetic aperture radar images based on pattern and intensity distinctiveness analysis

    NASA Astrophysics Data System (ADS)

    Wang, Xiao; Gao, Feng; Dong, Junyu; Qi, Qiang

    2018-04-01

    Synthetic aperture radar (SAR) imagery is independent of atmospheric conditions, making it an ideal image source for change detection. Existing methods directly analyze all regions of the speckle-noise-contaminated difference image, so their performance is easily affected by small noisy regions. In this paper, we propose a saliency-guided change detection framework based on pattern and intensity distinctiveness analysis. The saliency analysis step removes small noisy regions and therefore makes the proposed method more robust to speckle noise. In the proposed method, the log-ratio operator is first utilized to obtain a difference image (DI). Then, saliency detection based on pattern and intensity distinctiveness analysis is used to obtain the changed-region candidates. Finally, principal component analysis and k-means clustering are employed to analyze pixels in the changed-region candidates. The final change map is obtained by classifying these pixels into the changed or unchanged class. Experimental results on two real SAR image datasets demonstrate the effectiveness of the proposed method.
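
    The classification stage (log-ratio difference image, PCA on local patches, k-means into changed and unchanged classes) could be sketched as below, assuming NumPy and scikit-learn; the saliency analysis that restricts attention to candidate regions is omitted.

    # Sketch: classify pixels of a log-ratio difference image into changed and
    # unchanged classes using PCA features of local patches and k-means.
    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.cluster import KMeans

    def log_ratio(img1, img2):
        return np.abs(np.log((img1 + 1.0) / (img2 + 1.0)))

    def change_map(img1, img2, patch=5, n_components=3, seed=0):
        di = log_ratio(img1.astype(float), img2.astype(float))
        pad = patch // 2
        padded = np.pad(di, pad, mode="reflect")
        h, w = di.shape
        # feature vector per pixel: its local patch of the difference image
        feats = np.array([padded[i:i + patch, j:j + patch].ravel()
                          for i in range(h) for j in range(w)])
        feats = PCA(n_components=n_components).fit_transform(feats)
        labels = KMeans(n_clusters=2, random_state=seed, n_init=10).fit_predict(feats)
        labels = labels.reshape(h, w)
        # call the cluster with the larger mean log-ratio the "changed" class
        changed = int(di[labels == 1].mean() > di[labels == 0].mean())
        return (labels == changed).astype(np.uint8)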

  6. Statistical study of ionospheric ion beams observed by CLUSTER above the polar caps

    NASA Astrophysics Data System (ADS)

    Maggiolo, R.; Echim, M.; Fontaine, D.; Teste, A. F.; Jacquey, C.

    2009-12-01

    Above the polar caps and during prolonged periods of northward IMF, the Cluster spacecraft detect accelerated ion beams with energies up to a few keV. They are associated with downward precipitating electrons and converging electric field structures, indicating that the acceleration is caused by a quasi-static field-aligned electric field that can extend to altitudes of up to 5 RE (Maggiolo et al. 2006, Teste et al. 2007). Using the AMDA science analysis service provided by the Centre de Données de la Physique des Plasmas (CDPP, http://cdpp.cesr.fr), we have been able to extract from the Cluster ion detector dataset the time periods when Cluster encounters polar cap ion beams. Six years of data have been mined with this tool, and almost 200 events have been found, giving new insight into these structures. After a description of the method used for the automatic detection of the beams, we will discuss their statistical properties. We analyze their relation to the solar wind and IMF. In particular, we estimate the delay between a northward/southward turning of the IMF and the appearance/disappearance of these beams. The characteristics of the particles detected inside these structures, as well as their size, orientation and location, are also presented. We show that these ion beams are located on magnetic field lines mapping close to the high-latitude magnetopause and in the central part of the lobes, and that 40% of them are detected together with hot isotropic ions. These results will be discussed in terms of the magnetotail configuration during prolonged periods of northward IMF.

  7. DNA damage induced by the direct effect of radiation

    NASA Astrophysics Data System (ADS)

    Yokoya, A.; Shikazono, N.; Fujii, K.; Urushibara, A.; Akamatsu, K.; Watanabe, R.

    2008-10-01

    We have studied the nature of DNA damage induced by the direct effect of radiation. The yields of single-strand breaks (SSB), double-strand breaks (DSB), base lesions and clustered damage were measured using the agarose gel electrophoresis method after exposing a simple model DNA molecule, fully hydrated closed-circular plasmid DNA (pUC18), to various kinds of radiation. The yield of SSB does not show a significant dependence on linear energy transfer (LET) values. On the other hand, the yields of base lesions revealed by enzymatic probes, endonuclease III (Nth) and formamidopyrimidine DNA glycosylase (Fpg), which excise base lesions and leave a nick at the damage site, strongly depend on LET values. Soft X-ray photon (150 kVp) irradiation gives a maximum yield of the base lesions detected by the enzymatic probes as SSBs and clustered damage, the latter composed of one base lesion together with other proximate base lesions or SSBs. The clustered damage is visualized as an enzymatically induced DSB. The yields of these enzymatically revealed additional damages decrease strikingly with increasing LET. These results suggest that in higher-LET regions the repair enzymes used as probes are compromised by the dense damage clustering. Studies using simple plasmid DNA as an irradiation sample, however, face a technical difficulty in detecting multiple SSBs within a single plasmid. To detect additional SSBs induced in the strand opposite the first SSB, we have also developed a novel DNA-denaturation assay, which allows us to detect multiply induced SSBs in both strands of DNA that do not form a DSB.

  8. Measurement Error Correction Formula for Cluster-Level Group Differences in Cluster Randomized and Observational Studies

    ERIC Educational Resources Information Center

    Cho, Sun-Joo; Preacher, Kristopher J.

    2016-01-01

    Multilevel modeling (MLM) is frequently used to detect cluster-level group differences in cluster randomized trials and observational studies. Group differences on the outcomes (posttest scores) are detected by controlling for the covariate (pretest scores) as a proxy variable for unobserved factors that predict future attributes. The pretest and…

  9. Cluster detection methods applied to the Upper Cape Cod cancer data.

    PubMed

    Ozonoff, Al; Webster, Thomas; Vieira, Veronica; Weinberg, Janice; Ozonoff, David; Aschengrau, Ann

    2005-09-15

    A variety of statistical methods have been suggested to assess the degree and/or the location of spatial clustering of disease cases. However, there is relatively little in the literature devoted to comparison and critique of different methods. Most of the available comparative studies rely on simulated data rather than real data sets. We have chosen three methods currently used for examining spatial disease patterns: the M-statistic of Bonetti and Pagano; the Generalized Additive Model (GAM) method as applied by Webster; and Kulldorff's spatial scan statistic. We apply these statistics to analyze breast cancer data from the Upper Cape Cancer Incidence Study using three different latency assumptions, each of which produced a different spatial pattern of cases and controls. For the 20-year latency assumption, all three methods generally concur. However, for the 15-year and no-latency assumptions, the methods produce different results when testing for global clustering. Comparative analyses of real data sets by different statistical methods provide insight into directions for further research. We suggest a research program designed around examining real data sets to guide focused investigation of relevant features using simulated data, for the purpose of understanding how to interpret statistical methods applied to epidemiological data with a spatial component.
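
    As a rough illustration of the scan-statistic machinery named above, the sketch below computes the Poisson log-likelihood ratio that Kulldorff's method maximizes over circular windows. The Poisson form is shown for simplicity (the Upper Cape study analyzed case-control data, for which a Bernoulli form is typically used), and the brute-force search, function names and toy data are illustrative assumptions, not the study's implementation.

    ```python
    # Minimal sketch of the Poisson log-likelihood ratio maximized by Kulldorff's
    # spatial scan statistic; the brute-force circular search and the toy data
    # are illustrative assumptions, not the study's implementation.
    import numpy as np

    def poisson_llr(c_in, n_in, C, N):
        """LLR for a window with c_in cases / n_in population, given totals C, N."""
        if c_in == 0 or c_in == C:
            return 0.0
        expected_in = C * n_in / N
        if c_in <= expected_in:          # only elevated-risk windows are of interest
            return 0.0
        c_out, n_out = C - c_in, N - n_in
        return (c_in * np.log(c_in / expected_in)
                + c_out * np.log(c_out / (C * n_out / N)))

    def scan_circles(xy, cases, pop):
        """Grow a circle around each location, adding neighbours by distance;
        return the window with the largest LLR."""
        C, N = cases.sum(), pop.sum()
        best_llr, best_window = 0.0, None
        for i, centre in enumerate(xy):
            order = np.argsort(np.linalg.norm(xy - centre, axis=1))
            c_in = n_in = 0.0
            for j in order:
                c_in += cases[j]
                n_in += pop[j]
                llr = poisson_llr(c_in, n_in, C, N)
                if llr > best_llr:
                    best_llr, best_window = llr, (i, j)
        return best_llr, best_window     # significance via Monte Carlo replication in practice

    # toy usage: 30 locations with elevated risk near the first one
    rng = np.random.default_rng(0)
    xy = rng.uniform(size=(30, 2))
    pop = rng.integers(500, 1500, size=30).astype(float)
    risk = np.where(np.linalg.norm(xy - xy[0], axis=1) < 0.2, 0.010, 0.002)
    cases = rng.poisson(risk * pop).astype(float)
    print(scan_circles(xy, cases, pop))
    ```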

  10. Modeling the Movement of Homicide by Type to Inform Public Health Prevention Efforts

    PubMed Central

    Grady, Sue; Pizarro, Jesenia M.; Melde, Chris

    2015-01-01

    Objectives. We modeled the spatiotemporal movement of hotspot clusters of homicide by motive in Newark, New Jersey, to investigate whether different homicide types have different patterns of clustering and movement. Methods. We obtained homicide data from the Newark Police Department Homicide Unit’s investigative files from 1997 through 2007 (n = 560). We geocoded the address at which each homicide victim was found and recorded the date of and the motive for the homicide. We used cluster detection software to model the spatiotemporal movement of statistically significant homicide clusters by motive, using census tract and month of occurrence as the spatial and temporal units of analysis. Results. Gang-motivated homicides showed evidence of clustering and diffusion through Newark. Additionally, gang-motivated homicide clusters overlapped to a degree with revenge and drug-motivated homicide clusters. Escalating dispute and nonintimate familial homicides clustered; however, there was no evidence of diffusion. Intimate partner and robbery homicides did not cluster. Conclusions. By tracking how homicide types diffuse through communities and determining which places have ongoing or emerging homicide problems by type, we can better inform the deployment of prevention and intervention efforts. PMID:26270315

  11. Multilocus sequence typing and pulsed-field gel electrophoresis analysis of Oenococcus oeni from different wine-producing regions of China.

    PubMed

    Wang, Tao; Li, Hua; Wang, Hua; Su, Jing

    2015-04-16

    The present study established a typing approach combining NotI-based pulsed-field gel electrophoresis (PFGE) and a stress-response-gene-based multilocus sequence typing (MLST) scheme for 55 Oenococcus oeni strains isolated from six regions in China, together with two model strains, PSU-1 (CP000411) and ATCC BAA-1163 (AAUV00000000). Seven stress response genes, cfa, clpL, clpP, ctsR, mleA, mleP and omrA, were selected for MLST, and positive selective pressure was detected for these genes. Both methods separated the strains into two clusters. The PFGE clusters correlated with region of origin, and the sequence types (STs) obtained by MLST confirmed the two clusters identified by PFGE. In addition, the population structure reflected a mixture of evolutionary pathways, and the strains exhibited both clonal and panmictic characteristics. Copyright © 2015 Elsevier B.V. All rights reserved.

  12. Why so GLUMM? Detecting depression clusters through graphing lifestyle-environs using machine-learning methods (GLUMM).

    PubMed

    Dipnall, J F; Pasco, J A; Berk, M; Williams, L J; Dodd, S; Jacka, F N; Meyer, D

    2017-01-01

    Key lifestyle-environ risk factors are operative for depression, but it is unclear how these risk factors cluster. Machine-learning (ML) algorithms exist that learn, extract, identify and map underlying patterns to identify groupings of depressed individuals without constraints. The aim of this research was to use a large epidemiological study to identify and characterise depression clusters through "Graphing lifestyle-environs using machine-learning methods" (GLUMM). Two ML algorithms were implemented: an unsupervised self-organizing map (SOM) to create GLUMM clusters and a supervised boosted regression algorithm to describe them. Ninety-six "lifestyle-environ" variables were used from the National Health and Nutrition Examination Survey (2009-2010). Multivariate logistic regression validated the clusters and controlled for possible sociodemographic confounders. The SOM identified two GLUMM cluster solutions, each containing one dominant depressed cluster (GLUMM5-1, GLUMM7-1). Equal proportions of members in each cluster rated as highly depressed (17%). Alcohol consumption and demographics validated the clusters. Boosted regression identified GLUMM5-1 as more informative than GLUMM7-1. Its members were more likely to: have problems sleeping; eat unhealthily; have spent ≤2 years in their home; live in an old home; perceive themselves as underweight; be exposed to work fumes; have experienced sex at ≤14 years; and not perform moderate recreational activities. A positive relationship between GLUMM5-1 (OR: 7.50, P<0.001) and GLUMM7-1 (OR: 7.88, P<0.001) and depression was found, with significant interactions for those married/living with a partner (P=0.001). Using ML-based GLUMM to form ordered depressive clusters from multitudinous lifestyle-environ variables enabled a deeper exploration of the heterogeneous data and a better understanding of the relationships among these complex mental health factors. Copyright © 2016 Elsevier Masson SAS. All rights reserved.
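
    A minimal sketch of the unsupervised SOM step, assuming the third-party MiniSom package; the 3x2 grid, simulated standardized data and training settings are illustrative and not the configuration used in the GLUMM analysis.

    ```python
    # Hedged sketch of a SOM clustering step; grid size, data and settings are
    # illustrative assumptions, not the GLUMM configuration.
    import numpy as np
    from minisom import MiniSom

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 96))            # stand-in for 96 lifestyle-environ variables
    X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize

    som = MiniSom(x=3, y=2, input_len=X.shape[1],
                  sigma=1.0, learning_rate=0.5, random_seed=42)
    som.random_weights_init(X)
    som.train_random(X, num_iteration=5000)

    # assign each participant to a best-matching unit; the BMUs act as clusters
    labels = np.array([som.winner(row) for row in X])
    nodes, counts = np.unique(labels, axis=0, return_counts=True)
    for node, count in zip(nodes, counts):
        print(f"node ({node[0]}, {node[1]}): {count} participants")
    ```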

  13. Dynamic clustering detection through multi-valued descriptors of dermoscopic images.

    PubMed

    Cozza, Valentina; Guarracino, Maria Rosario; Maddalena, Lucia; Baroni, Adone

    2011-09-10

    This paper introduces a dynamic clustering methodology based on multi-valued descriptors of dermoscopic images. The main idea is to support medical diagnosis in deciding whether pigmented skin lesions belonging to an uncertain set are nearer to malignant melanoma or to benign nevi. Melanoma is the most deadly skin cancer, and early diagnosis is a current challenge for clinicians. Most data analysis algorithms for skin lesion discrimination focus on segmentation and the extraction of features of categorical or numerical type. As an alternative approach, this paper introduces two new concepts: first, it considers multi-valued data, in which lesions are described not only by scalar variables but also by interval or histogram variables; second, it introduces a dynamic clustering method based on the Wasserstein distance to compare multi-valued data. The overall strategy of analysis can be summarized in the following steps: first, a segmentation of the dermoscopic images allows a set of multi-valued descriptors to be identified; second, a discriminant analysis is performed on a set of images with an a priori classification, so that the features that discriminate benign from malignant lesions can be detected; and third, the proposed dynamic clustering method is applied to the uncertain cases, which need to be associated with one of the two previously mentioned groups. Results based on clinical data show that the grading of specific descriptors associated with dermoscopic characteristics provides a novel way to characterize uncertain lesions that can help the dermatologist's diagnosis. Copyright © 2011 John Wiley & Sons, Ltd.
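
    To make the histogram-descriptor comparison concrete, here is a simplified sketch that measures the distance between histogram-valued descriptors with the 1-D Wasserstein distance (via scipy) and assigns an uncertain lesion to the nearer class prototype; the bin grid, weights and prototypes are invented, and this is not the paper's full dynamic clustering algorithm.

    ```python
    # Simplified sketch: histogram-valued descriptors compared by Wasserstein
    # distance; all numbers are invented illustrations.
    import numpy as np
    from scipy.stats import wasserstein_distance

    bins = np.linspace(0.0, 1.0, 11)[:-1]      # shared bin centres

    def hist_distance(w1, w2):
        """Wasserstein distance between two histograms on the same support."""
        return wasserstein_distance(bins, bins, u_weights=w1, v_weights=w2)

    # prototype histograms for one descriptor (e.g. lesion colour intensity)
    benign_proto    = np.array([0.02, 0.05, 0.10, 0.20, 0.25, 0.18, 0.10, 0.06, 0.03, 0.01])
    malignant_proto = np.array([0.10, 0.18, 0.22, 0.18, 0.12, 0.08, 0.05, 0.04, 0.02, 0.01])

    uncertain = np.array([0.05, 0.12, 0.18, 0.22, 0.18, 0.12, 0.07, 0.03, 0.02, 0.01])

    d_benign = hist_distance(uncertain, benign_proto)
    d_malig  = hist_distance(uncertain, malignant_proto)
    print("closer to", "benign" if d_benign < d_malig else "malignant",
          f"(d_benign={d_benign:.4f}, d_malignant={d_malig:.4f})")
    ```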

  14. Assessment of Differential Item Functioning in Health-Related Outcomes: A Simulation and Empirical Analysis with Hierarchical Polytomous Data.

    PubMed

    Sharafi, Zahra; Mousavi, Amin; Ayatollahi, Seyyed Mohammad Taghi; Jafari, Peyman

    2017-01-01

    The purpose of this study was to evaluate the effectiveness of two methods of detecting differential item functioning (DIF) in the presence of multilevel data and polytomously scored items. The assessment of DIF with multilevel data (e.g., patients nested within hospitals, hospitals nested within districts) from large-scale assessment programs has received considerable attention, but very few studies have evaluated the effect of the hierarchical structure of the data on DIF detection for polytomously scored items. Ordinal logistic regression (OLR) and hierarchical ordinal logistic regression (HOLR) were utilized to assess DIF in simulated and real multilevel polytomous data. Six factors (DIF magnitude, grouping variable, intraclass correlation coefficient, number of clusters, number of participants per cluster, and item discrimination parameter) were considered in a fully crossed simulation design. Furthermore, data from the Pediatric Quality of Life Inventory™ (PedsQL™) 4.0 collected from 576 healthy school children were analyzed. Overall, the results indicate that both methods performed equivalently in terms of controlling Type I error and detection power rates. The current study showed a negligible difference between OLR and HOLR in detecting DIF with polytomously scored items in a hierarchical structure. Implications and considerations for analyzing real data are also discussed.
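
    A hedged sketch of the single-level OLR approach to uniform DIF, using statsmodels' OrderedModel on simulated data; the likelihood-ratio comparison shown is one common way to flag DIF and is not necessarily the exact procedure or software used in the study.

    ```python
    # Uniform-DIF check with single-level ordinal logistic regression; the
    # simulated data and LR test are illustrative, not the study's procedure.
    import numpy as np
    import pandas as pd
    from scipy.stats import chi2
    from statsmodels.miscmodels.ordinal_model import OrderedModel

    rng = np.random.default_rng(1)
    n = 600
    theta = rng.normal(size=n)                  # matching / ability proxy
    group = rng.integers(0, 2, size=n)          # 0 = reference, 1 = focal
    latent = 1.2 * theta + 0.5 * group + rng.logistic(size=n)
    item = pd.Series(pd.cut(latent, bins=[-np.inf, -1, 0, 1, np.inf],
                            labels=[0, 1, 2, 3]))            # ordered polytomous score
    exog = pd.DataFrame({"theta": theta, "group": group})

    base = OrderedModel(item, exog[["theta"]], distr="logit").fit(method="bfgs", disp=False)
    full = OrderedModel(item, exog[["theta", "group"]], distr="logit").fit(method="bfgs", disp=False)

    lr = 2 * (full.llf - base.llf)              # 1 df for the group main effect
    print(f"LR = {lr:.2f}, p = {chi2.sf(lr, df=1):.4f}")
    ```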

  15. Sorting Five Human Tumor Types Reveals Specific Biomarkers and Background Classification Genes.

    PubMed

    Roche, Kimberly E; Weinstein, Marvin; Dunwoodie, Leland J; Poehlman, William L; Feltus, Frank A

    2018-05-25

    We applied two state-of-the-art, knowledge-independent data-mining methods, Dynamic Quantum Clustering (DQC) and t-Distributed Stochastic Neighbor Embedding (t-SNE), to data from The Cancer Genome Atlas (TCGA). We showed that the RNA expression patterns for a mixture of 2,016 samples from five tumor types can sort the tumors into groups enriched for relevant annotations including tumor type, gender, tumor stage, and ethnicity. DQC feature selection analysis discovered 48 core biomarker transcripts that clustered tumors by tumor type. When these transcripts were removed, the geometry of tumor relationships changed, but it was still possible to classify the tumors using the RNA expression profiles of the remaining transcripts. We continued to remove the top biomarkers for several iterations and performed cluster analysis after each one. Even though the most informative transcripts were removed, the sorting ability of the remaining transcripts stayed strong after each iteration. Further, in some iterations we detected a repeating pattern of biological function that was not detectable with the core biomarker transcripts present. This suggests the existence of a "background classification" potential in which the pattern of gene expression after continued removal of "biomarker" transcripts could still classify tumors in agreement with tumor type.
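
    Of the two methods, only t-SNE has a widely available open-source implementation to show here; the sketch below embeds a simulated expression matrix with scikit-learn's TSNE as a stand-in for the TCGA analysis (DQC is not reproduced).

    ```python
    # Hedged sketch of the t-SNE step only; the expression matrix and labels
    # are simulated stand-ins for TCGA RNA profiles.
    import numpy as np
    from sklearn.manifold import TSNE
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(7)
    n_per_type, n_genes, n_types = 60, 500, 5
    # five synthetic "tumor types", each shifted along its own random direction
    X = np.vstack([rng.normal(loc=rng.normal(scale=2.0, size=n_genes),
                              size=(n_per_type, n_genes))
                   for _ in range(n_types)])
    labels = np.repeat(np.arange(n_types), n_per_type)

    emb = TSNE(n_components=2, perplexity=30, init="pca",
               random_state=0).fit_transform(StandardScaler().fit_transform(X))
    for t in range(n_types):
        centroid = emb[labels == t].mean(axis=0)
        print(f"type {t}: 2-D centroid {np.round(centroid, 2)}")
    ```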

  16. Penalized unsupervised learning with outliers

    PubMed Central

    Witten, Daniela M.

    2013-01-01

    We consider the problem of performing unsupervised learning in the presence of outliers – that is, observations that do not come from the same distribution as the rest of the data. It is known that in this setting, standard approaches for unsupervised learning can yield unsatisfactory results. For instance, in the presence of severe outliers, K-means clustering will often assign each outlier to its own cluster, or alternatively may yield distorted clusters in order to accommodate the outliers. In this paper, we take a new approach to extending existing unsupervised learning techniques to accommodate outliers. Our approach is an extension of a recent proposal for outlier detection in the regression setting. We allow each observation to take on an “error” term, and we penalize the errors using a group lasso penalty in order to encourage most of the observations’ errors to exactly equal zero. We show that this approach can be used in order to develop extensions of K-means clustering and principal components analysis that result in accurate outlier detection, as well as improved performance in the presence of outliers. These methods are illustrated in a simulation study and on two gene expression data sets, and connections with M-estimation are explored. PMID:23875057
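
    A minimal sketch of the idea described above: each observation receives an error vector penalized by a group lasso, so that most error vectors are exactly zero and the non-zero ones flag outliers. This is a simplified reading of the proposal, not the authors' code, and the tuning value lam and the random initialization are illustrative.

    ```python
    # Penalized K-means sketch: alternate cluster assignment, centroid update,
    # and group-soft-thresholding of residuals to estimate per-observation errors.
    import numpy as np

    def outlier_kmeans(X, k, lam, n_iter=50, seed=0):
        rng = np.random.default_rng(seed)
        n, p = X.shape
        E = np.zeros((n, p))                               # per-observation error terms
        mu = X[rng.choice(n, size=k, replace=False)]       # random init (k-means++ would be safer)
        for _ in range(n_iter):
            # assign each "cleaned" observation to its nearest centroid
            d = np.linalg.norm((X - E)[:, None, :] - mu[None, :, :], axis=2)
            z = d.argmin(axis=1)
            # update centroids from the cleaned observations
            for j in range(k):
                if np.any(z == j):
                    mu[j] = (X - E)[z == j].mean(axis=0)
            # group-soft-threshold the residuals to update the error terms
            R = X - mu[z]
            norms = np.linalg.norm(R, axis=1, keepdims=True)
            shrink = np.clip(1.0 - lam / (2.0 * np.maximum(norms, 1e-12)), 0.0, None)
            E = shrink * R
        return z, mu, E

    # toy data: two clusters plus three gross outliers
    rng = np.random.default_rng(3)
    X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(6, 1, (50, 2)),
                   rng.normal(0, 1, (3, 2)) + 25])
    z, mu, E = outlier_kmeans(X, k=2, lam=8.0)
    print("flagged outliers:", np.where(np.linalg.norm(E, axis=1) > 0)[0])
    ```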

  17. A method to detect progression of glaucoma using the multifocal visual evoked potential technique

    PubMed Central

    Wangsupadilok, Boonchai; Kanadani, Fabio N.; Grippo, Tomas M.; Liebmann, Jeffrey M.; Ritch, Robert; Hood, Donald C.

    2010-01-01

    Purpose: To describe a method for monitoring progression of glaucoma using the multifocal visual evoked potential (mfVEP) technique. Methods: Eighty-seven patients diagnosed with open-angle glaucoma were divided into two groups. Group I comprised 43 patients who had a repeat mfVEP test within 50 days (mean 0.9 ± 0.5 months), and group II comprised 44 patients who had a repeat test after at least 6 months (mean 20.7 ± 9.7 months). Monocular mfVEPs were obtained using a 60-sector pattern-reversal dartboard display. Monocular and interocular analyses were performed, and data from the two visits were compared. The total number of abnormal test points with P < 5% within the visual field (total score) and the number of abnormal test points within a cluster (cluster size) were calculated. Data for group I provided a measure of test–retest variability independent of disease progression; data for group II provided a possible measure of progression. Results: The difference in total score for group II between visit 1 and visit 2 was significant for both the interocular and monocular comparisons (P < 0.05), as was the difference in cluster size for the interocular comparison (P < 0.05). Group I did not show a significant change in either total score or cluster size. Conclusion: The change in total score and cluster size over time provides a possible method for assessing progression of glaucoma with the mfVEP technique. PMID:18830654
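
    As a toy illustration of the total-score and cluster-size summaries, the sketch below counts abnormal sectors (P < 5%) and finds the largest group of adjacent abnormal sectors; the 5-ring by 12-wedge layout and its adjacency rule are simplifying assumptions, not the actual 60-sector dartboard geometry.

    ```python
    # Toy total-score / cluster-size computation on an assumed ring-and-wedge layout.
    import numpy as np
    from collections import deque

    RINGS, WEDGES = 5, 12                       # 60 sectors (assumed layout)

    def summarize(p_values, alpha=0.05):
        abnormal = p_values < alpha             # shape (RINGS, WEDGES)
        total_score = int(abnormal.sum())
        seen = np.zeros_like(abnormal, dtype=bool)
        largest = 0
        for r in range(RINGS):
            for w in range(WEDGES):
                if abnormal[r, w] and not seen[r, w]:
                    size, queue = 0, deque([(r, w)])
                    seen[r, w] = True
                    while queue:                # BFS over adjacent abnormal sectors
                        cr, cw = queue.popleft()
                        size += 1
                        neighbours = [(cr - 1, cw), (cr + 1, cw),
                                      (cr, (cw - 1) % WEDGES), (cr, (cw + 1) % WEDGES)]
                        for nr, nw in neighbours:
                            if 0 <= nr < RINGS and abnormal[nr, nw] and not seen[nr, nw]:
                                seen[nr, nw] = True
                                queue.append((nr, nw))
                    largest = max(largest, size)
        return total_score, largest

    rng = np.random.default_rng(5)
    p = rng.uniform(size=(RINGS, WEDGES))       # stand-in per-sector P values
    print(summarize(p))
    ```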

  18. Detection of Single Standing Dead Trees from Aerial Color Infrared Imagery by Segmentation with Shape and Intensity Priors

    NASA Astrophysics Data System (ADS)

    Polewski, P.; Yao, W.; Heurich, M.; Krzystek, P.; Stilla, U.

    2015-03-01

    Standing dead trees, known as snags, are an essential factor in maintaining biodiversity in forest ecosystems. Combined with their role as carbon sinks, this makes for a compelling reason to study their spatial distribution. This paper presents an integrated method to detect and delineate individual dead tree crowns from color infrared aerial imagery. Our approach consists of two steps which incorporate statistical information about prior distributions of both the image intensities and the shapes of the target objects. In the first step, we perform a Gaussian Mixture Model clustering in the pixel color space with priors on the cluster means, obtaining up to 3 components corresponding to dead trees, living trees, and shadows. We then refine the dead tree regions using a level set segmentation method enriched with a generative model of the dead trees' shape distribution as well as a discriminative model of their pixel intensity distribution. The iterative application of the statistical shape template yields the set of delineated dead crowns. The prior information enforces the consistency of the template's shape variation with the shape manifold defined by manually labeled training examples, which makes it possible to separate crowns located in close proximity and prevents the formation of large crown clusters. Also, the statistical information built into the segmentation gives rise to an implicit detection scheme, because the shape template evolves towards an empty contour if not enough evidence for the object is present in the image. We test our method on 3 sample plots from the Bavarian Forest National Park with reference data obtained by manually marking individual dead tree polygons in the images. Our results are scenario-dependent and range from a correctness/completeness of 0.71/0.81 up to 0.77/1, with an average center-of-gravity displacement of 3-5 pixels between the detected and reference polygons.
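
    A rough sketch of the first step only, clustering synthetic color-infrared pixel values into dead-tree, living-vegetation and shadow components. Scikit-learn's GaussianMixture with means_init is used as a stand-in, which is only an initialization rather than the Bayesian prior on cluster means used in the paper, and all color values are invented.

    ```python
    # Pixel-color clustering sketch; means_init approximates (but is not) a prior.
    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(11)
    # synthetic CIR pixels (NIR, red, green) for three classes
    living = rng.normal([0.8, 0.3, 0.4], 0.05, size=(3000, 3))   # high NIR
    dead   = rng.normal([0.4, 0.5, 0.5], 0.05, size=(800, 3))    # low NIR, greyish
    shadow = rng.normal([0.1, 0.1, 0.1], 0.03, size=(1200, 3))   # dark
    pixels = np.vstack([living, dead, shadow])

    guessed_means = np.array([[0.8, 0.3, 0.4],    # living
                              [0.4, 0.5, 0.5],    # dead
                              [0.1, 0.1, 0.1]])   # shadow
    gmm = GaussianMixture(n_components=3, covariance_type="full",
                          means_init=guessed_means, random_state=0).fit(pixels)
    labels = gmm.predict(pixels)
    dead_mask = labels == 1          # candidate dead-tree pixels for later refinement
    print("candidate dead-tree pixels:", int(dead_mask.sum()))
    ```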

  19. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Campbell, W; Miften, M; Jones, B

    Purpose: Pancreatic SBRT relies on extremely accurate delivery of ablative radiation doses to the target, and intra-fractional tracking of fiducial markers can facilitate improvements in dose delivery. However, this requires algorithms that are able to find fiducial markers with high speed and accuracy. The purpose of this study was to develop a novel marker tracking algorithm that is robust against many of the common errors seen with traditional template matching techniques. Methods: Using CBCT projection images, a method was developed to create detailed template images of fiducial marker clusters without prior knowledge of the number of markers, their positions, or their orientations. Briefly, the method (i) enhances markers in projection images, (ii) stabilizes the cluster’s position, (iii) reconstructs the cluster in 3D, and (iv) precomputes a set of static template images dependent on gantry angle. Furthermore, breathing data were used to produce 4D reconstructions of clusters, yielding dynamic template images dependent on gantry angle and breathing amplitude. To test these two approaches, static and dynamic templates were used to track the motion of marker clusters in more than 66,000 projection images from 75 CBCT scans of 15 pancreatic SBRT patients. Results: For both static and dynamic templates, the new technique was able to locate marker clusters present in projection images 100% of the time. The algorithm was also able to correctly locate markers in several instances where only some of the markers were visible due to insufficient field-of-view. In cases where clusters exhibited deformation and/or rotation during breathing, dynamic templates resulted in cross-correlation scores up to 70% higher than static templates. Conclusion: Patient-specific templates provided complete tracking of fiducial marker clusters in CBCT scans, and dynamic templates helped to provide higher cross-correlation scores for deforming/rotating clusters. This novel algorithm provides an extremely accurate method to detect fiducial markers during treatment. Research funding provided by Varian Medical Systems to Miften and Jones.
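
    A hedged sketch of the core matching operation only: normalized cross-correlation of a marker template against a single projection image via skimage.feature.match_template. The synthetic image and template are stand-ins; the paper's 3D/4D template generation and tracking pipeline are not reproduced.

    ```python
    # Template matching sketch with normalized cross-correlation; synthetic data only.
    import numpy as np
    from skimage.feature import match_template

    rng = np.random.default_rng(2)
    image = rng.normal(0.0, 0.05, size=(256, 256))      # noisy "projection image"
    template = np.zeros((21, 21))
    template[8:13, 8:13] = 1.0                           # bright blob = marker
    image[100:121, 140:161] += template                  # embed the marker at (110, 150)

    ncc = match_template(image, template, pad_input=True)
    row, col = np.unravel_index(np.argmax(ncc), ncc.shape)
    print(f"best match at (row={row}, col={col}), score={ncc.max():.3f}")
    # with pad_input=True the peak indexes the template centre in image coordinates
    ```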

  20. Fast, shape-directed, landmark-based deep gray matter segmentation for quantification of iron deposition

    NASA Astrophysics Data System (ADS)

    Ekin, Ahmet; Jasinschi, Radu; van der Grond, Jeroen; van Buchem, Mark A.; van Muiswinkel, Arianne

    2006-03-01

    This paper introduces image processing methods to automatically detect the 3D volume-of-interest (VOI) and 2D region-of-interest (ROI) for deep gray matter organs (thalamus, globus pallidus, putamen, and caudate nucleus) of patients with suspected iron deposition from MR dual-echo images. Prior to the VOI and ROI detection, the cerebrospinal fluid (CSF) region is segmented by a clustering algorithm. For the segmentation, we automatically determine the cluster centers with the mean shift algorithm, which can quickly identify the modes of a distribution. After the identification of the modes, we employ the K-Harmonic means clustering algorithm to segment the volumetric MR data into CSF and non-CSF. Having the CSF mask, and observing that the frontal lobe of the lateral ventricle has a more consistent shape across age and pathological abnormalities, we propose a shape-directed landmark detection algorithm to detect the VOI in a speedy manner. The proposed landmark detection algorithm utilizes a novel shape model of the frontal lobe of the lateral ventricle for the slices where the thalamus, globus pallidus, putamen, and caudate nucleus are expected to appear. After this step, for each slice in the VOI, we use horizontal and vertical projections of the CSF map to detect the approximate locations of the relevant organs and define the ROI. We demonstrate the robustness of the proposed VOI and ROI localization algorithms to pathologies, including severe amounts of iron accumulation as well as white matter lesions, and to anatomical variations. The proposed algorithms achieved very high detection accuracy (100% in VOI detection) over a large and challenging MR dataset.
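
    A small sketch of the intensity-clustering idea described above, assuming scikit-learn: mean shift locates the modes of a simulated dual-echo intensity distribution, and voxels are then assigned to the nearest mode as a stand-in for the K-Harmonic means step; the bandwidth and intensity values are illustrative assumptions.

    ```python
    # Mode detection with mean shift, then nearest-mode labelling; simulated data.
    import numpy as np
    from sklearn.cluster import MeanShift

    rng = np.random.default_rng(4)
    # simulated voxel intensities: a bright CSF-like mode and a darker tissue mode
    intensities = np.concatenate([rng.normal(0.85, 0.05, 1000),    # CSF-like
                                  rng.normal(0.35, 0.08, 3000)])   # non-CSF
    X = intensities.reshape(-1, 1)

    modes = MeanShift(bandwidth=0.1, bin_seeding=True).fit(X).cluster_centers_.ravel()
    print("detected modes:", np.round(np.sort(modes), 3))

    # nearest-mode assignment; the brightest mode is taken as CSF
    labels = np.abs(X - modes[None, :]).argmin(axis=1)
    csf_mask = labels == int(np.argmax(modes))
    print("CSF fraction:", round(float(csf_mask.mean()), 3))
    ```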
