Science.gov

Sample records for agglomerative cluster analysis

  1. Evaluation of hierarchical agglomerative cluster analysis methods for discrimination of primary biological aerosol

    NASA Astrophysics Data System (ADS)

    Crawford, I.; Ruske, S.; Topping, D. O.; Gallagher, M. W.

    2015-11-01

    In this paper we present improved methods for discriminating and quantifying primary biological aerosol particles (PBAPs) by applying hierarchical agglomerative cluster analysis to multi-parameter ultraviolet-light-induced fluorescence (UV-LIF) spectrometer data. The methods employed in this study can be applied to data sets in excess of 1 × 10^6 points on a desktop computer, allowing for each fluorescent particle in a data set to be explicitly clustered. This reduces the potential for misattribution found in subsampling and comparative attribution methods used in previous approaches, improving our capacity to discriminate and quantify PBAP meta-classes. We evaluate the performance of several hierarchical agglomerative cluster analysis linkages and data normalisation methods using laboratory samples of known particle types and an ambient data set. Fluorescent and non-fluorescent polystyrene latex spheres were sampled with a Wideband Integrated Bioaerosol Spectrometer (WIBS-4) where the optical size, asymmetry factor and fluorescent measurements were used as inputs to the analysis package. It was found that the Ward linkage with z-score or range normalisation performed best, correctly attributing 98 and 98.1 % of the data points respectively. The best-performing methods were applied to the BEACHON-RoMBAS (Bio-hydro-atmosphere interactions of Energy, Aerosols, Carbon, H2O, Organics and Nitrogen-Rocky Mountain Biogenic Aerosol Study) ambient data set, where it was found that the z-score and range normalisation methods yield similar results, with each method producing clusters representative of fungal spores and bacterial aerosol, consistent with previous results. The z-score result was compared to clusters generated with previous approaches (WIBS AnalysiS Program, WASP) where we observe that the subsampling and comparative attribution method employed by WASP results in the overestimation of the fungal spore concentration by a factor of 1.5 and the underestimation of

  2. Evaluation of hierarchical agglomerative cluster analysis methods for discrimination of primary biological aerosol

    NASA Astrophysics Data System (ADS)

    Crawford, I.; Ruske, S.; Topping, D. O.; Gallagher, M. W.

    2015-07-01

    In this paper we present improved methods for discriminating and quantifying Primary Biological Aerosol Particles (PBAP) by applying hierarchical agglomerative cluster analysis to multi-parameter ultraviolet-light-induced fluorescence (UV-LIF) spectrometer data. The methods employed in this study can be applied to data sets in excess of 1×10^6 points on a desktop computer, allowing for each fluorescent particle in a dataset to be explicitly clustered. This reduces the potential for misattribution found in subsampling and comparative attribution methods used in previous approaches, improving our capacity to discriminate and quantify PBAP meta-classes. We evaluate the performance of several hierarchical agglomerative cluster analysis linkages and data normalisation methods using laboratory samples of known particle types and an ambient dataset. Fluorescent and non-fluorescent polystyrene latex spheres were sampled with a Wideband Integrated Bioaerosol Spectrometer (WIBS-4) where the optical size, asymmetry factor and fluorescent measurements were used as inputs to the analysis package. It was found that the Ward linkage with z-score or range normalisation performed best, correctly attributing 98 and 98.1 % of the data points respectively. The best performing methods were applied to the BEACHON-RoMBAS ambient dataset where it was found that the z-score and range normalisation methods yield similar results with each method producing clusters representative of fungal spores and bacterial aerosol, consistent with previous results. The z-score result was compared to clusters generated with previous approaches (WIBS AnalysiS Program, WASP) where we observe that the subsampling and comparative attribution method employed by WASP results in the overestimation of the fungal spore concentration by a factor of 1.5 and the underestimation of bacterial aerosol concentration by a factor of 5. We suggest that this is likely due to errors arising from misattribution due to poor
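
    The combination reported as best performing above (z-score normalisation followed by Ward linkage) can be illustrated with a minimal sketch. This is not the authors' analysis package: the function names and the toy (size, fluorescence) values are hypothetical, and the naive pairwise search shown here would not scale to the data set sizes mentioned in the abstract.

```python
import math

def zscore(rows):
    """Standardise each column to zero mean and unit (population) standard deviation."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # A constant column has zero spread; fall back to 1.0 to avoid dividing by zero.
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

def ward_clusters(rows, k):
    """Naive Ward agglomeration: repeatedly merge the pair of clusters whose
    merge gives the smallest increase in within-cluster sum of squares."""
    clusters = [([i], list(r)) for i, r in enumerate(rows)]  # (member indices, centroid)
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                (ma, ca), (mb, cb) = clusters[a], clusters[b]
                d2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
                # Ward merge cost: n_a * n_b / (n_a + n_b) * squared centroid distance.
                cost = len(ma) * len(mb) / (len(ma) + len(mb)) * d2
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        (ma, ca), (mb, cb) = clusters[a], clusters[b]
        n = len(ma) + len(mb)
        merged = (ma + mb, [(len(ma) * x + len(mb) * y) / n for x, y in zip(ca, cb)])
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)] + [merged]
    return sorted(sorted(m) for m, _ in clusters)

# Hypothetical toy data: (size, fluorescence) for two well-separated particle groups.
data = [[1.0, 10.0], [1.1, 12.0], [0.9, 11.0], [3.0, 200.0], [3.2, 210.0], [2.9, 190.0]]
labels = ward_clusters(zscore(data), 2)
```

    Without the normalisation step the fluorescence column, whose raw values are far larger, would dominate the squared distances; standardising first lets both inputs contribute on a comparable scale.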

  3. Value-balanced agglomerative connectivity clustering

    NASA Astrophysics Data System (ADS)

    Gupta, Gunjan K.; Ghosh, Joydeep

    2001-03-01

    In this paper we propose a new clustering framework for transactional data-sets involving large numbers of customers and products. Such transactional data pose particular issues, such as very high dimensionality (greater than 10,000) and sparse categorical entries, that have been dealt with more effectively using a graph-based approach to clustering such as ROCK. But large transactional data raise certain other issues that need to be addressed, such as how to compare diverse products (e.g. milk vs. cars), cluster balancing and outlier removal. We first propose a new similarity measure that takes the value of the goods purchased into account, and form a value-based graph representation based on this similarity measure. A novel value-based balancing criterion that allows the user to control the balancing of clusters is then defined. This balancing criterion is integrated with a value-based goodness measure for merging two clusters in an agglomerative clustering routine. Since graph-based clustering algorithms are very sensitive to outliers, we also propose a fast, effective and simple outlier detection and removal method based on under-clustering or over-partitioning. The performance of the proposed clustering framework is compared with leading graph-theoretic approaches such as ROCK and METIS.

  4. Clustering Acoustic Segments Using Multi-Stage Agglomerative Hierarchical Clustering

    PubMed Central

    2015-01-01

    Agglomerative hierarchical clustering becomes infeasible when applied to large datasets due to its O(N^2) storage requirements. We present a multi-stage agglomerative hierarchical clustering (MAHC) approach aimed at large datasets of speech segments. The algorithm is based on an iterative divide-and-conquer strategy. The data is first split into independent subsets, each of which is clustered separately. This reduces the storage required for sequential implementations, and allows concurrent computation on parallel computing hardware. The resultant clusters are merged and subsequently re-divided into subsets, which are passed to the following iteration. We show that MAHC can match and even surpass the performance of the exact implementation when applied to datasets of speech segments. PMID:26517376
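
    The storage argument above can be made concrete with a short arithmetic sketch. It only counts pairwise-distance entries and does not implement MAHC itself; the function names, the example sizes and the assumption that N divides evenly into equal subsets are all illustrative.

```python
def pairwise_entries(n):
    """Number of distinct pairwise distances among n items (condensed distance matrix)."""
    return n * (n - 1) // 2

def storage_comparison(n, subsets):
    """Compare distance-matrix entries for one full run against equal-sized subsets.

    Assumes n is divisible by subsets (an illustrative simplification).
    """
    per_subset = pairwise_entries(n // subsets)
    return {
        "full": pairwise_entries(n),
        "per_subset": per_subset,             # peak storage if subsets are processed one at a time
        "all_subsets": per_subset * subsets,  # total if all subsets are held concurrently
    }

result = storage_comparison(100_000, 10)
```

    For 100,000 segments split into 10 subsets, the full run needs 4,999,950,000 distances, while each subset needs 49,995,000, a hundredfold reduction in peak storage for a sequential implementation.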

  5. An agglomerative hierarchical clustering approach to visualisation in Bayesian clustering problems

    PubMed Central

    Dawson, Kevin J.; Belkhir, Khalid

    2009-01-01

    Clustering problems (including the clustering of individuals into outcrossing populations, hybrid generations, full-sib families and selfing lines) have recently received much attention in population genetics. In these clustering problems, the parameter of interest is a partition of the set of sampled individuals: the sample partition. In a fully Bayesian approach to clustering problems of this type, our knowledge about the sample partition is represented by a probability distribution on the space of possible sample partitions. Since the number of possible partitions grows very rapidly with the sample size, we cannot visualise this probability distribution in its entirety, unless the sample is very small. As a solution to this visualisation problem, we recommend using an agglomerative hierarchical clustering algorithm, which we call the exact linkage algorithm. This algorithm is a special case of the maximin clustering algorithm that we introduced previously. The exact linkage algorithm is now implemented in our software package Partition View. The exact linkage algorithm takes the posterior co-assignment probabilities as input, and yields as output a rooted binary tree, or more generally, a forest of such trees. Each node of this forest defines a set of individuals, and the node height is the posterior co-assignment probability of this set. This provides a useful visual representation of the uncertainty associated with the assignment of individuals to categories. It is also a useful starting point for a more detailed exploration of the posterior distribution in terms of the co-assignment probabilities. PMID:19337306

  6. Biodiversity Assessment Using Hierarchical Agglomerative Clustering and Spectral Unmixing over Hyperspectral Images

    PubMed Central

    Medina, Ollantay; Manian, Vidya; Chinea, J. Danilo

    2013-01-01

    Hyperspectral images represent an important source of information to assess ecosystem biodiversity. In particular, plant species richness is a primary indicator of biodiversity. This paper uses spectral variance to predict vegetation richness, known as Spectral Variation Hypothesis. Hierarchical agglomerative clustering is our primary tool to retrieve clusters whose Shannon entropy should reflect species richness on a given zone. However, in a high spectral mixing scenario, an additional unmixing step, just before entropy computation, is required; cluster centroids are enough for the unmixing process. Entropies computed using the proposed method correlate well with the ones calculated directly from synthetic and field data. PMID:24132230
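
    The entropy step described above can be sketched as follows, assuming each pixel has already been assigned a cluster label. The function name and the choice of the natural logarithm (entropy in nats) are assumptions; the abstract does not state the logarithm base, and the unmixing step is not shown.

```python
import math
from collections import Counter

def shannon_entropy(labels):
    """Shannon entropy H = -sum(p_i * ln p_i) over the cluster proportions p_i."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log(c / total) for c in counts.values())

# Hypothetical cluster labels for the pixels of two zones.
uniform_zone = ["a", "a", "a", "a"]  # one cluster only: no diversity
mixed_zone = ["a", "a", "b", "b"]    # two equally common clusters
h_uniform = shannon_entropy(uniform_zone)
h_mixed = shannon_entropy(mixed_zone)
```

    A zone dominated by one cluster gives an entropy of zero, while two equally populated clusters give ln 2, so higher values indicate a more even spread of pixels across clusters.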

  7. Hierarchic Agglomerative Clustering Methods for Automatic Document Classification.

    ERIC Educational Resources Information Center

    Griffiths, Alan; And Others

    1984-01-01

    Considers classifications produced by application of single linkage, complete linkage, group average, and word clustering methods to Keen and Cranfield document test collections, and studies structure of hierarchies produced, extent to which methods distort input similarity matrices during classification generation, and retrieval effectiveness…

  8. Agglomerative clustering-based approach for two-dimensional phase unwrapping.

    PubMed

    Herráez, Miguel Arevalillo; Boticario, Jesús G; Lalor, Michael J; Burton, David R

    2005-03-01

    We describe a novel algorithm for two-dimensional phase unwrapping. The technique combines the principles of agglomerative clustering and use of heuristics to construct a discontinuous quality-guided path. Unlike other quality-guided algorithms, which establish the path at the start of the unwrapping process, our technique constructs the path as the unwrapping process evolves. This makes the technique less prone to error propagation, although it presents higher execution times than other existing algorithms. The algorithm reacts satisfactorily to random noise and breaks in the phase distribution. A variation of the algorithm is also presented that considerably reduces the execution time without affecting the results significantly. PMID:15765690

  9. Evaluating the Feasibility of an Agglomerative Hierarchy Clustering Algorithm for the Automatic Detection of the Arterial Input Function Using DSC-MRI

    PubMed Central

    Yin, Jiandong; Yang, Jiawen; Guo, Qiyong

    2014-01-01

    During dynamic susceptibility contrast-magnetic resonance imaging (DSC-MRI), it has been demonstrated that the arterial input function (AIF) can be obtained using fuzzy c-means (FCM) and k-means clustering methods. However, due to the dependence on the initial centers of clusters, both clustering methods have poor reproducibility between the calculation and recalculation steps. To address this problem, the present study developed an alternative clustering technique based on the agglomerative hierarchy (AH) method for AIF determination. The performance of AH method was evaluated using simulated data and clinical data based on comparisons with the two previously demonstrated clustering-based methods in terms of the detection accuracy, calculation reproducibility, and computational complexity. The statistical analysis demonstrated that, at the cost of a significantly longer execution time, AH method obtained AIFs more in line with the expected AIF, and it was perfectly reproducible at different time points. In our opinion, the disadvantage of AH method in terms of the execution time can be alleviated by introducing a professional high-performance workstation. The findings of this study support the feasibility of using AH clustering method for detecting the AIF automatically. PMID:24932638

  10. Combining Analytical Hierarchy Process and Agglomerative Hierarchical Clustering in Search of Expert Consensus in Green Corridors Development Management

    NASA Astrophysics Data System (ADS)

    Shapira, Aviad; Shoshany, Maxim; Nir-Goldenberg, Sigal

    2013-07-01

    Environmental management and planning are instrumental in resolving conflicts arising between societal needs for economic development on the one hand and for open green landscapes on the other hand. Allocating green corridors between fragmented core green areas may provide a partial solution to these conflicts. Decisions regarding green corridor development require the assessment of alternative allocations based on multiple criteria evaluations. Analytical Hierarchy Process provides a methodology for both a structured and consistent extraction of such evaluations and for the search for consensus among experts regarding weights assigned to the different criteria. Implementing this methodology using 15 Israeli experts (landscape architects, regional planners, and geographers) revealed inherent differences in expert opinions in this field beyond professional divisions. The use of Agglomerative Hierarchical Clustering made it possible to identify clusters representing common decisions regarding criterion weights. Aggregating the evaluations of these clusters revealed an important dichotomy between a pragmatist approach that emphasizes the weight of statutory criteria and an ecological approach that emphasizes the role of the natural conditions in allocating green landscape corridors.

  11. Hierarchical agglomerative sub-clustering technique for particles management in PIC simulations

    NASA Astrophysics Data System (ADS)

    Grasso, Giacomo; Frignani, Michele; Rocchi, Federico; Sumini, Marco

    2010-08-01

    The effectiveness of Particle-In-Cell (PIC) codes lies mainly in the robustness of the methods implemented, under the fundamental assumption that a sufficient number of pseudo-particles is used for a correct representation of the system. The consequent drawback is the large increase in the computational time required to run a simulation, both for assigning particle charge to the grid and for moving the particles through it. Moreover, coupling such methods with Monte-Carlo-Collisional (MCC) modules adds a further computational cost for simulating multiple particle collisions with the background gas and the domain boundaries. Particle management techniques are therefore often introduced in PIC-MCC codes in order to improve the distribution of pseudo-particles in the simulation domain: as a matter of fact, managing the number of samples according to the importance of the region considered is a central issue for codes simulating a local phenomenon in a larger domain or a strongly collisional system (e.g. an ionizing plasma, where the number of particles increases exponentially). A clustering procedure based on sampling the distribution function in the 5D phase space (2D in space, 3D in velocity) is proposed here as the leading criterion for particle merging and splitting procedures that guarantee conservation of the second-order charge moments. Applied to the study of electrical breakdown in the early discharge phase of a Plasma Focus device, this technique is shown to increase the performance of both the PIC kernel and the MCC module, preserving the solution of the electric field and increasing the representativeness of samples in stochastic calculations (with respect to more traditional merging and splitting procedures).

  12. Instability of Hierarchical Cluster Analysis Due to Input Order of the Data: The PermuCLUSTER Solution

    ERIC Educational Resources Information Center

    van der Kloot, Willem A.; Spaans, Alexander M. J.; Heiser, Willem J.

    2005-01-01

    Hierarchical agglomerative cluster analysis (HACA) may yield different solutions under permutations of the input order of the data. This instability is caused by ties, either in the initial proximity matrix or arising during agglomeration. The authors recommend repeating the analysis on a large number of random permutations of the rows and columns…
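
    The effect of ties on input order can be reproduced with a minimal sketch. This is not PermuCLUSTER: it is a naive complete-linkage routine (a hypothetical helper) that breaks ties in favour of the first pair encountered, run over every ordering of three equally spaced points.

```python
from itertools import permutations

def complete_linkage(points, k):
    """Naive complete-linkage agglomeration; ties go to the first pair encountered."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = max(abs(x - y) for x in clusters[a] for y in clusters[b])
                if best is None or d < best[0]:  # strict '<' keeps the earliest tied pair
                    best = (d, a, b)
        _, a, b = best
        merged = clusters[a] + clusters[b]
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)] + [merged]
    return frozenset(frozenset(c) for c in clusters)

# The distances 0-1 and 1-2 are tied, so the first merge depends on input order.
points = [0, 1, 2]
solutions = {complete_linkage(list(order), 2) for order in permutations(points)}
```

    The six orderings yield two distinct two-cluster solutions, {0, 1} with {2} and {1, 2} with {0}, which is the kind of instability that repeating the analysis over many permutations is meant to expose.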

  13. Relation chain based clustering analysis

    NASA Astrophysics Data System (ADS)

    Zhang, Cheng-ning; Zhao, Ming-yang; Luo, Hai-bo

    2011-08-01

    Clustering analysis is currently one of the well-developed branches of data mining technology; it aims to find hidden structures in the multidimensional space called the feature or pattern space. A datum in this space usually takes the form of a vector whose elements represent several specifically selected features. These features are often chosen to suit the problem at hand. Generally, clustering analysis falls into two divisions: one is based on the agglomerative clustering method, and the other is based on the divisive clustering method. The former refers to a bottom-up process which regards each datum as a singleton cluster, while the latter refers to a top-down process which regards the entire data set as a single cluster. In the collected literature, it is noted that divisive clustering currently dominates both in application and in research. Although some famous divisive clustering methods have been designed and well developed, clustering problems are still far from being solved. The k-means algorithm is the original divisive clustering method; it requires some important values to be assigned initially, such as the number of clusters and the initial positions of the cluster prototypes, and these may not be reasonable in some cases. Beyond the initialization problem, the k-means algorithm may also fall into a local optimum, clusters in a rigid way, and is not suitable for non-Gaussian distributions. One can see that the search for a good or natural clustering result in fact originates from one's understanding of the concept of clustering. Thus, confusion about or misunderstanding of the definition of clustering always leads to unsatisfactory clustering results. One should consider the definition deeply and seriously. This paper demonstrates the nature of clustering, gives a way of understanding clustering, discusses the methodology of designing a clustering algorithm, and proposes a new clustering method based on relation chains among 2D patterns. In

  14. An agglomerative approach for shot summarization based on content homogeneity

    NASA Astrophysics Data System (ADS)

    Ioannidis, Antonis; Chasanis, Vasileios; Likas, Aristidis

    2015-02-01

    An efficient shot summarization method is presented based on agglomerative clustering of the shot frames. Unlike other agglomerative methods, our approach relies on a cluster merging criterion that computes the content homogeneity of a merged cluster. An important feature of the proposed approach is the automatic estimation of the number of a shot's most representative frames, called keyframes. The method starts by splitting each video sequence into small, equal-sized clusters (segments). Then, agglomerative clustering is performed, where from the current set of clusters, a pair of clusters is selected and merged to form a larger unimodal (homogeneous) cluster. The algorithm proceeds until no further cluster merging is possible. At the end, the medoid of each of the final clusters is selected as keyframe and the set of keyframes constitutes the summary of the shot. Numerical experiments demonstrate that our method reasonably estimates the number of ground-truth keyframes, while extracting non-repetitive keyframes that efficiently summarize the content of each shot.
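
    The final keyframe-selection step can be sketched as a medoid computation: the member of a cluster with the smallest total distance to all other members. The feature vectors and the Euclidean distance below are hypothetical stand-ins, since the abstract does not specify the frame descriptor or the distance used.

```python
import math

def medoid(items, dist):
    """Return the item with the smallest total distance to all other items."""
    return min(items, key=lambda a: sum(dist(a, b) for b in items))

# Hypothetical 2D feature vectors for the frames of one final cluster.
frames = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (10.0, 0.0)]
keyframe = medoid(frames, math.dist)
```

    Unlike a centroid, the medoid is always an actual member of the cluster, so the selected keyframe is a real frame from the shot rather than an average that may not correspond to any image.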

  15. Cluster analysis of WIBS single particle bioaerosol data

    NASA Astrophysics Data System (ADS)

    Robinson, N. H.; Allan, J. D.; Huffman, J. A.; Kaye, P. H.; Foot, V. E.; Gallagher, M.

    2012-09-01

    Hierarchical agglomerative cluster analysis was performed on single-particle multi-spatial datasets comprising optical diameter, asymmetry and three different fluorescence measurements, gathered using two dual Wideband Integrated Bioaerosol Sensors (WIBS). The technique is demonstrated on measurements of various fluorescent and non-fluorescent polystyrene latex spheres (PSL) before being applied to two separate contemporaneous ambient WIBS datasets recorded in a forest site in Colorado, USA as part of the BEACHON-RoMBAS project. Cluster analysis results between both datasets are consistent. Clusters are tentatively interpreted by comparison of concentration time series and cluster average measurement values to the published literature (of which there is a paucity) to represent: non-fluorescent accumulation mode aerosol; bacterial agglomerates; and fungal spores. To our knowledge, this is the first time cluster analysis has been applied to long term online PBAP measurements. The novel application of this clustering technique provides a means for routinely reducing WIBS data to discrete concentration time series which are more easily interpretable, without the need for any a priori assumptions concerning the expected aerosol types. It can reduce the level of subjectivity compared to the more standard analysis approaches, which are typically performed by simple inspection of various ensemble data products. It also has the advantage of potentially resolving less populous or subtly different particle types. This technique is likely to become more robust in the future as fluorescence-based aerosol instrumentation measurement precision, dynamic range and the number of available metrics are improved.

  16. Cluster Morphology Analysis

    PubMed Central

    Jacquez, Geoffrey M.

    2009-01-01

    Most disease clustering methods assume specific shapes and do not evaluate statistical power using the applicable geography, at-risk population, and covariates. Cluster Morphology Analysis (CMA) conducts power analyses of alternative techniques assuming clusters of different relative risks and shapes. Results are ranked by statistical power and false positives, under the rationale that surveillance should (1) find true clusters while (2) avoiding false clusters. CMA then synthesizes results of the most powerful methods. CMA was evaluated in simulation studies and applied to pancreatic cancer mortality in Michigan, and finds clusters of flexible shape while routinely evaluating statistical power. PMID:20234799

  17. Cluster analysis of molecular simulation trajectories for systems where both conformation and orientation of the sampled states are important.

    PubMed

    Abramyan, Tigran M; Snyder, James A; Thyparambil, Aby A; Stuart, Steven J; Latour, Robert A

    2016-08-01

    Clustering methods have been widely used to group together similar conformational states from molecular simulations of biomolecules in solution. For applications such as the interaction of a protein with a surface, the orientation of the protein relative to the surface is also an important clustering parameter because of its potential effect on adsorbed-state bioactivity. This study presents cluster analysis methods that are specifically designed for systems where both molecular orientation and conformation are important, and the methods are demonstrated using test cases of adsorbed proteins for validation. Additionally, because cluster analysis can be a very subjective process, an objective procedure for identifying both the optimal number of clusters and the best clustering algorithm to be applied to analyze a given dataset is presented. The method is demonstrated for several agglomerative hierarchical clustering algorithms used in conjunction with three cluster validation techniques. © 2016 Wiley Periodicals, Inc. PMID:27292100

  18. Cluster analysis of WIBS single-particle bioaerosol data

    NASA Astrophysics Data System (ADS)

    Robinson, N. H.; Allan, J. D.; Huffman, J. A.; Kaye, P. H.; Foot, V. E.; Gallagher, M.

    2013-02-01

    Hierarchical agglomerative cluster analysis was performed on single-particle multi-spatial data sets comprising optical diameter, asymmetry and three different fluorescence measurements, gathered using two dual Wideband Integrated Bioaerosol Sensors (WIBSs). The technique is demonstrated on measurements of various fluorescent and non-fluorescent polystyrene latex spheres (PSL) before being applied to two separate contemporaneous ambient WIBS data sets recorded in a forest site in Colorado, USA, as part of the BEACHON-RoMBAS project. Cluster analysis results between both data sets are consistent. Clusters are tentatively interpreted by comparison of concentration time series and cluster average measurement values to the published literature (of which there is a paucity) to represent the following: non-fluorescent accumulation mode aerosol; bacterial agglomerates; and fungal spores. To our knowledge, this is the first time cluster analysis has been applied to long-term online primary biological aerosol particle (PBAP) measurements. The novel application of this clustering technique provides a means for routinely reducing WIBS data to discrete concentration time series which are more easily interpretable, without the need for any a priori assumptions concerning the expected aerosol types. It can reduce the level of subjectivity compared to the more standard analysis approaches, which are typically performed by simple inspection of various ensemble data products. It also has the advantage of potentially resolving less populous or subtly different particle types. This technique is likely to become more robust in the future as fluorescence-based aerosol instrumentation measurement precision, dynamic range and the number of available metrics are improved.

  19. [Cluster analysis in biomedical researches].

    PubMed

    Akopov, A S; Moskovtsev, A A; Dolenko, S A; Savina, G D

    2013-01-01

    Cluster analysis is one of the most popular methods for the analysis of multi-parameter data. Cluster analysis reveals the internal structure of the data and groups separate observations by their degree of similarity. The review provides a definition of the basic concepts of cluster analysis, and discusses the most popular clustering algorithms: k-means, hierarchical algorithms, and Kohonen network algorithms. Examples of the use of these algorithms in biomedical research are given. PMID:24640781

  20. Agglomerative percolation on the Bethe lattice and the triangular cactus

    NASA Astrophysics Data System (ADS)

    Chae, Huiseung; Yook, Soon-Hyung; Kim, Yup

    2013-08-01

    Agglomerative percolation (AP) on the Bethe lattice and the triangular cactus is studied to establish the exact mean-field theory for AP. Using the self-consistent simulation method based on the exact self-consistent equations, the order parameter P_∞ and the average cluster size S are measured. From the measured P_∞ and S, the critical exponents β_k and γ_k for k = 2 and 3 are evaluated. Here, β_k and γ_k are the critical exponents for P_∞ and S when the growth of clusters spontaneously breaks the Z_k symmetry of the k-partite graph. The obtained values are β_2 = 1.79(3), γ_2 = 0.88(1), β_3 = 1.35(5) and γ_3 = 0.94(2). By comparing these exponents with those for ordinary percolation (β_∞ = 1 and γ_∞ = 1), we also find β_∞ < β_3 < β_2 and γ_∞ > γ_3 > γ_2. These results quantitatively verify the conjecture that the AP model belongs to a new universality class if the Z_k symmetry is broken spontaneously, and the new universality class depends on k.

  1. Mining a Web Citation Database for Author Co-Citation Analysis.

    ERIC Educational Resources Information Center

    He, Yulan; Hui, Siu Cheung

    2002-01-01

    Proposes a mining process to automate author co-citation analysis based on the Web Citation Database, a data warehouse for storing citation indices of Web publications. Describes the use of agglomerative hierarchical clustering for author clustering and multidimensional scaling for displaying author cluster maps, and explains PubSearch, a…

  2. Cluster beam analysis via photoionization

    SciTech Connect

    Grover, J.R.; Herron, W.J.; Coolbaugh, M.T.; Peifer, W.R.; Garvey, J.F.

    1991-08-22

    A photoionization method for quantitatively analyzing the neutral products of free jet expansions is described. The basic principle is to measure the yield of an ion characteristic of each component cluster at a photon energy just below that at which production of the same ion from larger clusters can be detected. Since there is then no problem with fragmentation, the beam density of each neutral cluster can be measured in the presence of larger clusters. Although these measurements must be done in the test ions' onset regions where their yields are often quite small, the technique is made highly practicable by the large intensities of widely tunable vacuum-ultraviolet synchrotron light now available at electron storage rings. As an example, the method is applied to the analysis of cluster beams collimated from the free jet expansion of a 200:1 ammonia-chlorobenzene mixture.

  3. Cluster Analysis by Linear Contrasts.

    ERIC Educational Resources Information Center

    Shafto, Michael

    The purpose of this paper is to suggest a technique of cluster analysis which is similar in aim to the Interactive Intercolumnar Correlation Analysis (IICA), though different in detail. Two methods are proposed for extracting a single bipolar factor (a "contrast component") directly from the initial similarities matrix. The advantages of this…

  4. Detecting Corresponding Vertex Pairs between Planar Tessellation Datasets with Agglomerative Hierarchical Cell-Set Matching

    PubMed Central

    Huh, Yong; Yu, Kiyun; Park, Woojin

    2016-01-01

    This paper proposes a method to detect corresponding vertex pairs between planar tessellation datasets. Applying an agglomerative hierarchical co-clustering, the method finds geometrically corresponding cell-set pairs from which corresponding vertex pairs are detected. Then, the map transformation is performed with the vertex pairs. Since these pairs are independently detected for each corresponding cell-set pair, the method presents improved matching performance regardless of locally uneven positional discrepancies between datasets. The proposed method was applied to complicated synthetic cell datasets assumed as a cadastral map and a topographical map, and showed an improved result with an F-measure of 0.84, compared to a previous matching method with an F-measure of 0.48. PMID:27348229

  5. On Comparison of Clustering Methods for Pharmacoepidemiological Data.

    PubMed

    Feuillet, Fanny; Bellanger, Lise; Hardouin, Jean-Benoit; Victorri-Vigneau, Caroline; Sébille, Véronique

    2015-01-01

    The high consumption of psychotropic drugs is a public health problem. Rigorous statistical methods are needed to identify consumption characteristics in post-marketing phase. Agglomerative hierarchical clustering (AHC) and latent class analysis (LCA) can both provide clusters of subjects with similar characteristics. The objective of this study was to compare these two methods in pharmacoepidemiology, on several criteria: number of clusters, concordance, interpretation, and stability over time. From a dataset on bromazepam consumption, the two methods present a good concordance. AHC is a very stable method and it provides homogeneous classes. LCA is an inferential approach and seems to allow identifying more accurately extreme deviant behavior. PMID:24905478
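
    One common way to quantify concordance between two partitions of the same subjects is the Rand index, the fraction of subject pairs on which both partitions agree (grouped together in both, or apart in both). The abstract does not say which concordance measure was used, so the choice of the Rand index, the function name and the toy labels are all assumptions.

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Fraction of item pairs treated the same way by both partitions."""
    agree = total = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        agree += same_a == same_b  # True counts as 1, False as 0
        total += 1
    return agree / total

# Hypothetical class assignments for four subjects from two methods.
ahc_labels = [0, 0, 1, 1]
lca_labels = [1, 1, 0, 0]  # same grouping, different label names
concordance = rand_index(ahc_labels, lca_labels)
```

    Because the index compares pairs rather than label values, two methods that produce the same grouping under different class names still score 1.0.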

  6. Method for preventing plugging in the pyrolysis of agglomerative coals

    DOEpatents

    Green, Norman W.

    1979-01-23

    To prevent plugging in a pyrolysis operation where an agglomerative coal in a nondeleteriously reactive carrier gas is injected as a turbulent jet from an opening into an elongate pyrolysis reactor, the coal is comminuted to a size where the particles under operating conditions will detackify prior to contact with internal reactor surfaces while a secondary flow of fluid is introduced along the peripheral inner surface of the reactor to prevent backflow of the coal particles. The pyrolysis operation is depicted by two equations which enable preselection of conditions which ensure prevention of reactor plugging.

  7. The SMART CLUSTER METHOD - adaptive earthquake cluster analysis and declustering

    NASA Astrophysics Data System (ADS)

    Schaefer, Andreas; Daniell, James; Wenzel, Friedemann

    2016-04-01

    Earthquake declustering is an essential part of almost any statistical analysis of spatial and temporal properties of seismic activity, with usual applications comprising probabilistic seismic hazard assessments (PSHAs) and earthquake prediction methods. The nature of earthquake clusters and subsequent declustering of earthquake catalogues plays a crucial role in determining the magnitude-dependent earthquake return period and its respective spatial variation. Various methods have been developed by other researchers to address this issue. These range in complexity from rather simple statistical window methods to complex epidemic models. This study introduces the smart cluster method (SCM), a new methodology to identify earthquake clusters, which uses an adaptive point process for spatio-temporal identification. An adaptive search algorithm for data point clusters is adopted. It uses the earthquake density in the spatio-temporal neighbourhood of each event to adjust the search properties. The identified clusters are subsequently analysed to determine directional anisotropy, focussing on a strong correlation along the rupture plane, and the search space is adjusted with respect to directional properties. In the case of rapid subsequent ruptures like the 1992 Landers sequence or the 2010/2011 Darfield-Christchurch events, an adaptive classification procedure is applied to disassemble subsequent ruptures which may have been grouped into an individual cluster using near-field searches, support vector machines and temporal splitting. The steering parameters of the search behaviour are linked to local earthquake properties like magnitude of completeness, earthquake density and Gutenberg-Richter parameters. The method is capable of identifying and classifying earthquake clusters in space and time. It is tested and validated using earthquake data from California and New Zealand.
As a result of the cluster identification process, each event in

  8. Multi-viewpoint clustering analysis

    NASA Technical Reports Server (NTRS)

    Mehrotra, Mala; Wild, Chris

    1993-01-01

    In this paper, we address the feasibility of partitioning rule-based systems into a number of meaningful units to enhance the comprehensibility, maintainability and reliability of expert systems software. Preliminary results have shown that no single structuring principle or abstraction hierarchy is sufficient to understand complex knowledge bases. We therefore propose the Multi View Point - Clustering Analysis (MVP-CA) methodology to provide multiple views of the same expert system. We present the results of using this approach to partition a deployed knowledge-based system that navigates the Space Shuttle's entry. We also discuss the impact of this approach on verification and validation of knowledge-based systems.

  9. Patterns of comorbidity in community-dwelling older people hospitalised for fall-related injury: A cluster analysis

    PubMed Central

    2011-01-01

    Background Community-dwelling older people aged 65+ years sustain falls frequently; these can result in physical injuries necessitating medical attention including emergency department care and hospitalisation. Certain health conditions and impairments have been shown to contribute independently to the risk of falling or experiencing a fall injury, suggesting that individuals with these conditions or impairments should be the focus of falls prevention. Since older people commonly have multiple conditions/impairments, knowledge about which conditions/impairments coexist in at-risk individuals would be valuable in the implementation of a targeted prevention approach. The objective of this study was therefore to examine the prevalence and patterns of comorbidity in this population group. Methods We analysed hospitalisation data from Victoria, Australia's second most populous state, to estimate the prevalence of comorbidity in patients hospitalised at least once between 2005-6 and 2007-8 for treatment of acute fall-related injuries. In patients with two or more comorbid conditions (multicomorbidity) we used an agglomerative hierarchical clustering method to cluster comorbidity variables and identify constellations of conditions. Results More than one in four patients had at least one comorbid condition and among patients with comorbidity one in three had multicomorbidity (range 2-7). The prevalence of comorbidity varied by gender, age group, ethnicity and injury type; it was also associated with a significant increase in the average cumulative length of stay per patient. The cluster analysis identified five distinct, biologically plausible clusters of comorbidity: cardiopulmonary/metabolic, neurological, sensory, stroke and cancer. The cardiopulmonary/metabolic cluster was the largest cluster among the clusters identified. 
Conclusions The consequences of comorbidity clustering in terms of falls and/or injury outcomes of hospitalised patients should be investigated by

  10. The applicability and effectiveness of cluster analysis

    NASA Technical Reports Server (NTRS)

    Ingram, D. S.; Actkinson, A. L.

    1973-01-01

    An insight into the characteristics which determine the performance of a clustering algorithm is presented. In order for the techniques which are examined to accurately cluster data, two conditions must be simultaneously satisfied. First the data must have a particular structure, and second the parameters chosen for the clustering algorithm must be correct. By examining the structure of the data from the Cl flight line, it is clear that no single set of parameters can be used to accurately cluster all the different crops. The effectiveness of either a noniterative or iterative clustering algorithm to accurately cluster data representative of the Cl flight line is questionable. Thus extensive a priori knowledge is required in order to use cluster analysis in its present form for applications like assisting in the definition of field boundaries and evaluating the homogeneity of a field. New or modified techniques are necessary for clustering to be a reliable tool.

  11. Modified distance in average linkage based on M-estimator and MADn criteria in hierarchical cluster analysis

    NASA Astrophysics Data System (ADS)

    Muda, Nora; Othman, Abdul Rahman

    2015-10-01

    The process of grouping a set of objects into classes of similar objects is called clustering. It divides a large group of observations into smaller groups so that the observations within each group are relatively similar and the observations in different groups are relatively dissimilar. In this study, an agglomerative method in hierarchical cluster analysis is chosen and clusters are constructed using an average linkage technique. An average linkage technique requires a distance between clusters, which is calculated as the average distance between all pairs of points, one from each group. This average distance is not robust when there is an outlier. Therefore, the average distance in average linkage needs to be modified to overcome the outlier problem. An outlier detection criterion based on MADn is used and the average distance is recalculated without the outlier. Next, the distance in average linkage is calculated based on a modified one-step M-estimator (MOM). The resulting clusters are presented in a dendrogram. To evaluate the goodness of the modified distance in average linkage clustering, a bootstrap analysis is conducted on the dendrogram and the bootstrap value (BP) is assessed for each branch that forms a group, to ensure the reliability of the branches constructed. This study found that the average linkage technique with modified distance is significantly superior to the usual average linkage technique if there is an outlier. The two techniques are said to be similar if there is no outlier.
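    The outlier-trimmed average linkage distance described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes that outliers are flagged among the pairwise distances (rather than among the points), that MADn is the median absolute deviation scaled by 1.4826, and that the cutoff is 2.24. The function names are hypothetical.

```python
from math import dist
from statistics import median

MADN_SCALE = 1.4826  # rescales MAD to match the standard deviation under normality


def madn(values):
    """Normalised median absolute deviation of a list of numbers."""
    med = median(values)
    return MADN_SCALE * median([abs(v - med) for v in values])


def trimmed_average_linkage(cluster_a, cluster_b, cutoff=2.24):
    """Average inter-cluster distance after dropping MADn-flagged distances.

    The cutoff value and the choice to trim distances (not points) are
    assumptions made for this sketch.
    """
    distances = [dist(p, q) for p in cluster_a for q in cluster_b]
    med = median(distances)
    scale = madn(distances)
    if scale == 0:
        # No spread to measure against, so fall back to the plain average.
        return sum(distances) / len(distances)
    kept = [d for d in distances if abs(d - med) / scale <= cutoff]
    return sum(kept) / len(kept)


def plain_average_linkage(cluster_a, cluster_b):
    """Usual average linkage: mean of all pairwise distances."""
    distances = [dist(p, q) for p in cluster_a for q in cluster_b]
    return sum(distances) / len(distances)
```

    With one far-away point in a cluster, the plain average is pulled towards that point while the trimmed version stays close to the bulk of the pairwise distances.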

  12. Cluster analysis of multiple planetary flow regimes

    NASA Technical Reports Server (NTRS)

    Mo, Kingtse; Ghil, Michael

    1987-01-01

    A modified cluster analysis method was developed to identify spatial patterns of planetary flow regimes, and to study transitions between them. This method was applied first to a simple deterministic model and second to Northern Hemisphere (NH) 500 mb data. The dynamical model is governed by the fully-nonlinear, equivalent-barotropic vorticity equation on the sphere. Clusters of points in the model's phase space are associated with either a few persistent or with many transient events. Two stationary clusters have patterns similar to unstable stationary model solutions, zonal, or blocked. Transient clusters of wave trains serve as way stations between the stationary ones. For the NH data, cluster analysis was performed in the subspace of the first seven empirical orthogonal functions (EOFs). Stationary clusters are found in the low-frequency band of more than 10 days, and transient clusters in the bandpass frequency window between 2.5 and 6 days. In the low-frequency band three pairs of clusters determine, respectively, EOFs 1, 2, and 3. They exhibit well-known regional features, such as blocking, the Pacific/North American (PNA) pattern and wave trains. Both model and low-pass data show strong bimodality. Clusters in the bandpass window show wave-train patterns in the two jet exit regions. They are related, as in the model, to transitions between stationary clusters.

  13. Applications of cluster analysis to satellite soundings

    NASA Technical Reports Server (NTRS)

    Munteanu, M. J.; Jakubowicz, O.; Kalnay, E.; Piraino, P.

    1984-01-01

    The advantages of the use of cluster analysis in the improvement of satellite temperature retrievals were evaluated since the use of natural clusters, which are associated with atmospheric temperature soundings characteristic of different types of air masses, has the potential for improving stratified regression schemes in comparison with currently used methods which stratify soundings based on latitude, season, and land/ocean. The method of discriminatory analysis was used. The correct cluster of temperature profiles from satellite measurements was located in 85% of the cases. Considerable improvement was observed at all mandatory levels using regression retrievals derived in the clusters of temperature (weighted and nonweighted) in comparison with the control experiment and with the regression retrievals derived in the clusters of brightness temperatures of 3 MSU and 5 IR channels.

  14. Cluster analysis of multiple planetary flow regimes

    NASA Technical Reports Server (NTRS)

    Mo, Kingtse; Ghil, Michael

    1988-01-01

    A modified cluster analysis method developed for the classification of quasi-stationary events into a few planetary flow regimes and for the examination of transitions between these regimes is described. The method was applied first to a simple deterministic model and then to a 500-mbar data set for Northern Hemisphere (NH), for which cluster analysis was carried out in the subspace of the first seven empirical orthogonal functions (EOFs). Stationary clusters were found in the low-frequency band of more than 10 days, while transient clusters were found in the band-pass frequency window between 2.5 and 6 days. In the low-frequency band, three pairs of clusters determined EOFs 1, 2, and 3, respectively; they exhibited well-known regional features, such as blocking, the Pacific/North American pattern, and wave trains. Both model and low-pass data exhibited strong bimodality.

  15. Cluster Analysis of Adolescent Blogs

    ERIC Educational Resources Information Center

    Liu, Eric Zhi-Feng; Lin, Chun-Hung; Chen, Feng-Yi; Peng, Ping-Chuan

    2012-01-01

    Emerging web applications and networking systems such as blogs have become popular, and they offer unique opportunities and environments for learners, especially for adolescent learners. This study attempts to explore the writing styles and genres used by adolescents in their blogs by employing content, factor, and cluster analyses. Factor…

  16. Cluster Analysis of the Malaysian Hipposideros

    NASA Astrophysics Data System (ADS)

    Sazali, Siti Nurlydia; Laman, Charlie J.; Abdullah, M. T.

    2008-01-01

    A preliminary study on the morphometric variations among species in the genus Hipposideros was conducted using voucher specimens from the Universiti Malaysia Sarawak (UNIMAS) Zoological Museum and the Department of Wildlife and National Park (DWNP) Kuala Lumpur. A total of 24 individuals from six species of this genus were morphologically studied where all related measurements of body, skull and dental were measured and recorded. The statistical data subjected to the cluster analysis shows that the genus Hipposideros is divided into two major clusters where each species was clearly separated. The cluster analysis among Hipposideros species is useful for aiding in species identification.

  17. Towards optimal cluster power spectrum analysis

    NASA Astrophysics Data System (ADS)

    Smith, Robert E.; Marian, Laura

    2016-04-01

    The power spectrum of galaxy clusters is an important probe of the cosmological model. In this paper, we develop a formalism to compute the optimal weights for the estimation of the matter power spectrum from cluster power spectrum measurements. We find a closed-form analytic expression for the optimal weights, which takes into account: the cluster mass, finite survey volume effects, survey masking, and a flux limit. The optimal weights are $w(M,\chi) \propto b(M,\chi)/[1+\bar{n}_h(\chi)\,\overline{b^2}(\chi)\,\overline{P}(k)]$, where $b(M,\chi)$ is the bias of clusters of mass $M$ at radial position $\chi(z)$, $\bar{n}_h(\chi)$ and $\overline{b^2}(\chi)$ are the expected space density and bias squared of all clusters, and $\overline{P}(k)$ is the matter power spectrum at wavenumber $k$. This result is analogous to that of Percival et al. We compare our optimal weighting scheme with mass weighting and also with the original power spectrum scheme of Feldman et al. We show that our optimal weighting scheme outperforms these approaches for both volume- and flux-limited cluster surveys. Finally, we present a new expression for the Fisher information matrix for cluster power spectrum analysis. Our expression shows that for an optimally weighted cluster survey the cosmological information content is boosted, relative to the standard approach of Tegmark.
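    As a small numerical illustration of the weight expression quoted in this abstract, the proportionality can be evaluated directly. The function and argument names below are hypothetical, the normalisation constant is left out, and the inputs are made-up numbers rather than values from the paper.

```python
def cluster_weight(bias, mean_density, mean_bias_sq, power):
    """Unnormalised weight b / (1 + n_bar * b2_bar * P_bar).

    bias         -- b(M, chi), bias of the cluster being weighted
    mean_density -- n_bar_h(chi), expected space density of all clusters
    mean_bias_sq -- b2_bar(chi), expected bias squared of all clusters
    power        -- P_bar(k), matter power spectrum at the chosen wavenumber
    """
    return bias / (1.0 + mean_density * mean_bias_sq * power)


# Limit where the product n_bar * b2_bar * P_bar vanishes: weight equals the bias.
sparse = cluster_weight(bias=2.0, mean_density=0.0, mean_bias_sq=4.0, power=1e4)

# When that product is large, the same cluster is down-weighted.
dense = cluster_weight(bias=2.0, mean_density=1e-4, mean_bias_sq=4.0, power=1e4)
```

    The two calls show the behaviour that follows from the formula itself: the weight tracks the bias when the product in the denominator is small and is suppressed when it is large.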

  18. ASteCA: Automated Stellar Cluster Analysis

    NASA Astrophysics Data System (ADS)

    Perren, G. I.; Vázquez, R. A.; Piatti, A. E.

    2015-04-01

    We present the Automated Stellar Cluster Analysis package (ASteCA), a suite of tools designed to fully automate the standard tests applied on stellar clusters to determine their basic parameters. The set of functions included in the code make use of positional and photometric data to obtain precise and objective values for a given cluster's center coordinates, radius, luminosity function and integrated color magnitude, as well as characterizing through a statistical estimator its probability of being a true physical cluster rather than a random overdensity of field stars. ASteCA incorporates a Bayesian field star decontamination algorithm capable of assigning membership probabilities using photometric data alone. An isochrone fitting process based on the generation of synthetic clusters from theoretical isochrones and selection of the best fit through a genetic algorithm is also present, which allows ASteCA to provide accurate estimates for a cluster's metallicity, age, extinction and distance values along with its uncertainties. To validate the code we applied it on a large set of over 400 synthetic MASSCLEAN clusters with varying degrees of field star contamination as well as a smaller set of 20 observed Milky Way open clusters (Berkeley 7, Bochum 11, Czernik 26, Czernik 30, Haffner 11, Haffner 19, NGC 133, NGC 2236, NGC 2264, NGC 2324, NGC 2421, NGC 2627, NGC 6231, NGC 6383, NGC 6705, Ruprecht 1, Tombaugh 1, Trumpler 1, Trumpler 5 and Trumpler 14) studied in the literature. The results show that ASteCA is able to recover cluster parameters with an acceptable precision even for those clusters affected by substantial field star contamination. ASteCA is written in Python and is made available as an open source code which can be downloaded ready to be used from its official site.

  19. Using Cluster Analysis to Examine Husband-Wife Decision Making

    ERIC Educational Resources Information Center

    Bonds-Raacke, Jennifer M.

    2006-01-01

    Cluster analysis has a rich history in many disciplines and although cluster analysis has been used in clinical psychology to identify types of disorders, its use in other areas of psychology has been less popular. The purpose of the current experiments was to use cluster analysis to investigate husband-wife decision making. Cluster analysis was…

  20. Subtypes of Autism by Cluster Analysis.

    ERIC Educational Resources Information Center

    Eaves, Linda C.; And Others

    1994-01-01

    Cluster analysis of data from 166 children with autistic spectrum disorders revealed 4 subtypes with differences in behavioral and cognitive areas. The four subtypes include a typically autistic group, a low-functioning group, a high-functioning group (Asperger syndrome/schizoid), and a hard-to-diagnose group with mild/moderate retardation and a…

  1. Identifying Peer Institutions Using Cluster Analysis

    ERIC Educational Resources Information Center

    Boronico, Jess; Choksi, Shail S.

    2012-01-01

    The New York Institute of Technology's (NYIT) School of Management (SOM) wishes to develop a list of peer institutions for the purpose of benchmarking and monitoring/improving performance against other business schools. The procedure utilizes relevant criteria for the purpose of establishing this peer group by way of a cluster analysis. The…

  2. Systematization of actinides using cluster analysis

    SciTech Connect

    Kopyrin, A.A.; Terent'eva, T.N.; Khramov, N.N.

    1994-11-01

    A representation of the actinides in multidimensional property space is proposed for systematization of these elements using cluster analysis. Literature data for their atomic properties are used. Owing to the wide variation of published ionization potentials, medians are used to estimate them. Vertical dendrograms are used for classification on the basis of distances between the actinides in atomic-property space. The properties of actinium and lawrencium are furthest removed from the main group. Thorium and mendelevium exhibit individualized properties. A cluster based on the einsteinium-fermium pair is joined by californium.

  3. A Multivariate Analysis of Galaxy Cluster Properties

    NASA Astrophysics Data System (ADS)

    Ogle, P. M.; Djorgovski, S.

    1993-05-01

    We have assembled from the literature a data base on 394 clusters of galaxies, with up to 16 parameters per cluster. They include optical and x-ray luminosities, x-ray temperatures, galaxy velocity dispersions, central galaxy and particle densities, optical and x-ray core radii and ellipticities, etc. In addition, derived quantities, such as the mass-to-light ratios and x-ray gas masses are included. Doubtful measurements have been identified, and deleted from the data base. Our goal is to explore the correlations between these parameters, and interpret them in the framework of our understanding of evolution of clusters and large-scale structure, such as the Gott-Rees scaling hierarchy. Among the simple, monovariate correlations we found, the most significant include those between the optical and x-ray luminosities, x-ray temperatures, cluster velocity dispersions, and central galaxy densities, in various mutual combinations. While some of these correlations have been discussed previously in the literature, generally smaller samples of objects have been used. We will also present the results of a multivariate statistical analysis of the data, including a principal component analysis (PCA). Such an approach has not been used previously for studies of cluster properties, even though it is much more powerful and complete than the simple monovariate techniques which are commonly employed. The observed correlations may lead to powerful constraints for theoretical models of formation and evolution of galaxy clusters. P.M.O. was supported by a Caltech graduate fellowship. S.D. acknowledges a partial support from the NASA contract NAS5-31348 and the NSF PYI award AST-9157412.

  4. Deterministic algorithm with agglomerative heuristic for location problems

    NASA Astrophysics Data System (ADS)

    Kazakovtsev, L.; Stupina, A.

    2015-10-01

    The authors consider the clustering problem solved with the k-means method and the p-median problem with various distance metrics. The p-median problem and the k-means problem as its special case are the most popular models of location theory. They are used for solving clustering problems and many practically important logistic problems such as the optimal location of factories or warehouses, oil or gas wells, optimal offshore oil drilling, and steam generators in heavy oil fields. The authors propose a new deterministic heuristic algorithm based on ideas of Information Bottleneck Clustering and genetic algorithms with a greedy heuristic. Results of running the new algorithm on various data sets are compared with known deterministic and stochastic methods. The new algorithm is shown to be significantly faster than the Information Bottleneck Clustering method while having comparable precision.

  5. Cluster Analysis for CTBT Seismic Event Monitoring

    SciTech Connect

    Carr, Dorthe B.; Young, Chris J.; Aster, Richard C.; Zhang, Xioabing

    1999-08-03

    , respectively. The clustering techniques prove to be much more effective for the New Mexico data than the Wyoming data, apparently because the New Mexico mines are closer and consequently the signal to noise ratios (SNRs) for those events are higher. To verify this hypothesis we experiment with adding Gaussian noise to the New Mexico data to simulate data from more distant sites. Our results suggest that clustering techniques can be very useful for identifying small anomalous events if at least one good recording is available, and that the only reliable way to improve clustering results is to process the waveforms to improve SNR. For events with good SNR that do have strong grouping, cluster analysis will reveal the inherent groupings regardless of the choice of clustering method.

  6. ClusterViz: A Cytoscape APP for Cluster Analysis of Biological Network.

    PubMed

    Wang, Jianxin; Zhong, Jiancheng; Chen, Gang; Li, Min; Wu, Fang-xiang; Pan, Yi

    2015-01-01

    Cluster analysis of biological networks is one of the most important approaches for identifying functional modules and predicting protein functions. Furthermore, visualization of clustering results is crucial to uncover the structure of biological networks. In this paper, ClusterViz, an APP of Cytoscape 3 for cluster analysis and visualization, has been developed. In order to reduce complexity and enable extendibility for ClusterViz, we designed the architecture of ClusterViz based on the framework of Open Services Gateway Initiative. According to the architecture, the implementation of ClusterViz is partitioned into three modules including interface of ClusterViz, clustering algorithms and visualization and export. ClusterViz facilitates the comparison of the results of different algorithms to do further related analysis. Three commonly used clustering algorithms, FAG-EC, EAGLE and MCODE, are included in the current version. Due to adopting the abstract interface of algorithms in module of the clustering algorithms, more clustering algorithms can be included for the future use. To illustrate usability of ClusterViz, we provided three examples with detailed steps from the important scientific articles, which show that our tool has helped several research teams do their research work on the mechanism of the biological networks. PMID:26357321

  7. Cluster analysis of contaminated sediment data: nodal analysis.

    PubMed

    Hartwell, S Ian; Claflin, Larry W

    2005-07-01

    The objective of the present study was to explore the use of multivariate statistical methods as a means to discern relationships between contaminants and biological and/or toxicological effects in a representative data set from the National Status and Trends (NS&T) Program. Data from the National Oceanic and Atmospheric Administration, NS&T Program's Bioeffects Survey of Delaware Bay, USA, were examined using various univariate and multivariate statistical techniques, including cluster analysis. Each approach identified consistent patterns and relationships between the three types of triad data. The analyses also identified factors that bias the interpretation of the data, primarily the presence of rare and unique species and the dependence of species distributions on physical parameters. Sites and species were clustered with the unweighted pair-group method using arithmetic averages clustering with the Jaccard coefficient that clustered species and sites into mutually consistent groupings. Pearson product moment correlation coefficients, normalized for salinity, also were clustered. The most informative analysis, termed nodal analysis, was the intersection of species cluster analysis with site cluster analysis. This technique produced a visual representation of species association patterns among site clusters. Site characteristics, such as salinity and grain size, not contaminant concentrations, appeared to be the primary factors determining species distributions. This suggests the sediment-quality triad needs to use physical parameters as a distinct leg from chemical concentrations to improve sediment-quality assessments in large bodies of water. Because the Delaware Bay system has confounded gradients of contaminants and physical parameters, analyses were repeated with data from northern Chesapeake Bay, USA, with similar results. PMID:16050601
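    The site clustering step named in this abstract (unweighted pair-group method using arithmetic averages, with the Jaccard coefficient) can be sketched in a few lines. This is an illustrative toy under stated assumptions, not the study's code: each site is represented only by the set of species present, the site and species names are invented, and the naive merge loop is meant for small inputs.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """1 minus the Jaccard coefficient of two presence/absence sets."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def upgma(samples):
    """Cluster named sets by UPGMA; returns merges as (left, right, distance).

    samples maps a site name to the set of species recorded there.
    """
    clusters = [(name,) for name in samples]
    merges = []

    def average_distance(c1, c2):
        # UPGMA distance: arithmetic mean over all cross-cluster pairs.
        total = sum(jaccard_distance(samples[i], samples[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        left, right = min(combinations(clusters, 2),
                          key=lambda pair: average_distance(*pair))
        merges.append((left, right, average_distance(left, right)))
        clusters = [c for c in clusters if c not in (left, right)] + [left + right]
    return merges


# Hypothetical presence/absence data for four sites.
sites = {
    "S1": {"a", "b", "c"},
    "S2": {"a", "b"},
    "S3": {"x", "y"},
    "S4": {"x", "y", "z"},
}
merge_history = upgma(sites)
```

    Running the same routine on the transposed data (species as keys, sets of sites as values) gives the species clustering, and cross-tabulating the two groupings is the idea behind the nodal analysis the abstract describes.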

  8. AMOEBA clustering revisited. [cluster analysis, classification, and image display program

    NASA Technical Reports Server (NTRS)

    Bryant, Jack

    1990-01-01

    A description of the clustering, classification, and image display program AMOEBA is presented. Using a difficult high resolution aircraft-acquired MSS image, the steps the program takes in forming clusters are traced. A number of new features are described here for the first time. Usage of the program is discussed. The theoretical foundation (the underlying mathematical model) is briefly presented. The program can handle images of any size and dimensionality.

  9. Meaningful statistical analysis of large computational clusters.

    SciTech Connect

    Gentile, Ann C.; Marzouk, Youssef M.; Brandt, James M.; Pebay, Philippe Pierre

    2005-07-01

    Effective monitoring of large computational clusters demands the analysis of a vast amount of raw data from a large number of machines. The fundamental interactions of the system are not, however, well-defined, making it difficult to draw meaningful conclusions from this data, even if one were able to efficiently handle and process it. In this paper we show that computational clusters, because they are comprised of a large number of identical machines, behave in a statistically meaningful fashion. We therefore can employ normal statistical methods to derive information about individual systems and their environment and to detect problems sooner than with traditional mechanisms. We discuss design details necessary to use these methods on a large system in a timely and low-impact fashion.
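    One simple way to read the idea in this abstract is that, because the machines are nominally identical, each machine's reading can be compared against the population of its peers. The sketch below is an assumption-laden illustration and not the method of the paper: it uses a plain z-score on a single metric, and the function name, node names, readings and threshold of 3 are all invented for the example.

```python
from statistics import mean, stdev


def flag_outlier_machines(readings, threshold=3.0):
    """Return names of machines whose reading is far from the population mean.

    readings maps a machine name to one numeric measurement (for example a
    temperature). The z-score rule and threshold are assumptions of this sketch.
    """
    values = list(readings.values())
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        return []  # all machines report the same value
    return sorted(name for name, value in readings.items()
                  if abs(value - mu) / sigma > threshold)


# Hypothetical snapshot: twenty similar nodes and one that runs hot.
snapshot = {f"node{i:02d}": 50.0 + (i % 3) for i in range(20)}
snapshot["node20"] = 90.0
suspects = flag_outlier_machines(snapshot)
```

    A robust scale estimate (for example one based on the median) would be less sensitive to the outlier itself, which matters when the population is small.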

  10. Equivalent damage validation by variable cluster analysis

    NASA Astrophysics Data System (ADS)

    Drago, Carlo; Ferlito, Rachele; Zucconi, Maria

    2016-06-01

    The main aim of this work is to perform a clustering analysis on the damage surveyed in the old center of L'Aquila after the earthquake that occurred on April 6, 2009 and to validate an Indicator of Equivalent Damage (ED) that summarizes the information reported on the AeDES card regarding the level of damage and its extent on the surface of the buildings. In particular, we used a sample of 13442 masonry buildings located in an area characterized by a Macroseismic Intensity equal to 8 [1]. The aim is to ensure coherence between the clusters, and their hierarchy, identified in the detected damage data and in the elaborated ED data.

  11. Chaotic map clustering algorithm for EEG analysis

    NASA Astrophysics Data System (ADS)

    Bellotti, R.; De Carlo, F.; Stramaglia, S.

    2004-03-01

    The non-parametric chaotic map clustering algorithm has been applied to the analysis of electroencephalographic signals, in order to recognize Huntington's disease, one of the most dangerous pathologies of the central nervous system. The performance of the method has been compared with those obtained through parametric algorithms, such as K-means and deterministic annealing, and a supervised multi-layer perceptron. While supervised neural networks need a training phase, performed by means of data tagged by the genetic test, and the parametric methods require a prior choice of the number of classes to find, the chaotic map clustering gives natural evidence of the pathological class, without any training or supervision, thus providing a new efficient methodology for the recognition of patterns affected by Huntington's disease.

  12. Adaptive Fuzzy Consensus Clustering Framework for Clustering Analysis of Cancer Data.

    PubMed

    Yu, Zhiwen; Chen, Hantao; You, Jane; Liu, Jiming; Wong, Hau-San; Han, Guoqiang; Li, Le

    2015-01-01

    Performing clustering analysis is one of the important research topics in cancer discovery using gene expression profiles, which is crucial in facilitating the successful diagnosis and treatment of cancer. While there are quite a number of research works which perform tumor clustering, few of them consider how to incorporate fuzzy theory together with an optimization process into a consensus clustering framework to improve the performance of clustering analysis. In this paper, we first propose a random double clustering based cluster ensemble framework (RDCCE) to perform tumor clustering based on gene expression data. Specifically, RDCCE generates a set of representative features using a randomly selected clustering algorithm in the ensemble, and then assigns samples to their corresponding clusters based on the grouping results. In addition, we also introduce the random double clustering based fuzzy cluster ensemble framework (RDCFCE), which is designed to improve the performance of RDCCE by integrating the newly proposed fuzzy extension model into the ensemble framework. RDCFCE adopts the normalized cut algorithm as the consensus function to summarize the fuzzy matrices generated by the fuzzy extension models, partition the consensus matrix, and obtain the final result. Finally, adaptive RDCFCE (A-RDCFCE) is proposed to optimize RDCFCE and improve the performance of RDCFCE further by adopting a self-evolutionary process (SEPP) for the parameter set. Experiments on real cancer gene expression profiles indicate that RDCFCE and A-RDCFCE work well on these data sets, and outperform most of the state-of-the-art tumor clustering algorithms. PMID:26357330

  13. Accelerating DNA analysis applications on GPU clusters

    SciTech Connect

    Tumeo, Antonino; Villa, Oreste

    2010-06-13

    DNA analysis is an emerging application of high performance bioinformatics. Modern sequencing machinery is able to provide, in a few hours, large input streams of data which need to be matched against exponentially growing databases of known fragments. The ability to recognize these patterns effectively and quickly may allow extending the scale and the reach of the investigations performed by biology scientists. Aho-Corasick is an exact, multiple pattern matching algorithm often at the base of this application. High performance systems are a promising platform to accelerate this algorithm, which is computationally intensive but also inherently parallel. Nowadays, high performance systems also include heterogeneous processing elements, such as Graphic Processing Units (GPUs), to further accelerate parallel algorithms. Unfortunately, the Aho-Corasick algorithm exhibits large performance variabilities, depending on the size of the input streams, on the number of patterns to search and on the number of matches, and poses significant challenges for current high performance software and hardware implementations. An adequate mapping of the algorithm on the target architecture, coping with the limits of the underlying hardware, is required to reach the desired high throughputs. Load balancing also plays a crucial role when considering the limited bandwidth among the nodes of these systems. In this paper we present an efficient implementation of the Aho-Corasick algorithm for high performance clusters accelerated with GPUs. We discuss how we partitioned and adapted the algorithm to fit the Tesla C1060 GPU and then present an MPI based implementation for a heterogeneous high performance cluster. We compare this implementation to MPI and MPI with pthreads based implementations for a homogeneous cluster of x86 processors, discussing the stability vs. the performance and the scaling of the solutions, taking into consideration aspects such as the bandwidth among the different nodes.
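
    The multiple-pattern matching idea behind Aho-Corasick can be sketched on a single CPU thread as below. This is only an illustrative sketch, not the GPU or MPI implementation described in the abstract; the function names `build_automaton` and `find_all` and the toy DNA strings are assumptions made for this example.

```python
from collections import deque


def build_automaton(patterns):
    """Build goto, failure and output tables for a set of patterns."""
    goto = [{}]   # goto[state][char] -> next state (state 0 is the root)
    fail = [0]    # failure link of each state
    out = [[]]    # patterns recognised when a state is reached
    for pat in patterns:
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(pat)
    # Breadth-first pass: a state's failure link points to the longest
    # proper suffix of its prefix that is also a prefix of some pattern.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def find_all(text, patterns):
    """Return (start_index, pattern) for every occurrence in one pass over text."""
    goto, fail, out = build_automaton(patterns)
    state, hits = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pat in out[state]:
            hits.append((i - len(pat) + 1, pat))
    return hits


if __name__ == "__main__":
    print(find_all("GATTACAGATT", ["GATT", "ATTA", "TAC", "CAG"]))
```

    The text is scanned once regardless of how many patterns are searched, which is why the number of matches and the size of the pattern set, rather than repeated scans, dominate the cost.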

  14. Cluster analysis of word frequency dynamics

    NASA Astrophysics Data System (ADS)

    Maslennikova, Yu S.; Bochkarev, V. V.; Belashova, I. A.

    2015-01-01

    This paper describes the analysis and modelling of word usage frequency time series. In a previous study, an assumption was put forward that all word usage frequencies have uniform dynamics approaching the shape of a Gaussian function. This assumption can be checked using the frequency dictionaries of the Google Books Ngram database. This database includes 5.2 million books published between 1500 and 2008. The corpus contains over 500 billion words in American English, British English, French, German, Spanish, Russian, Hebrew, and Chinese. We clustered time series of word usage frequencies using a Kohonen neural network. The similarity between input vectors was estimated using several algorithms. As a result of the neural network training procedure, more than ten different forms of time series were found. They describe the dynamics of word usage frequencies from birth to death of individual words. Different groups of word forms were found to have different dynamics of word usage frequency variations.

  15. Plume Heating Analysis of Clustered Rocket Engines

    NASA Astrophysics Data System (ADS)

    Maemura, Takashi; Igarashi, Iwao

    The H-IIB launch vehicle upgrades the launch capacity of the current H-IIA; it has two liquid rocket engines (LE-7A) in the first stage, instead of one for the H-IIA. It has four SRB-As attached to the body, while the standard version of H-IIA had two SRB-As. One of the major design issues of the H-IIB launch vehicle is increased plume heating due to clustering two LE-7A engines and four SRB-As, especially the interaction of engine plumes at high altitude. This paper describes the plume analysis method of the H-IIB launch vehicle, which is based on the flight-proven method of the current H-IIA launch vehicle.

  16. Failure Mode Identification Through Clustering Analysis

    NASA Technical Reports Server (NTRS)

    Arunajadai, Srikesh G.; Stone, Robert B.; Tumer, Irem Y.; Clancy, Daniel (Technical Monitor)

    2002-01-01

    Research has shown that nearly 80% of the costs and problems are created in product development and that cost and quality are essentially designed into products in the conceptual stage. Currently, failure identification procedures (such as FMEA (Failure Modes and Effects Analysis), FMECA (Failure Modes, Effects and Criticality Analysis) and FTA (Fault Tree Analysis)) and design of experiments are being used for quality control and for the detection of potential failure modes during the detail design stage or post-product launch. Though all of these methods have their own advantages, they do not give information as to what are the predominant failures that a designer should focus on while designing a product. This work uses a functional approach to identify failure modes, which hypothesizes that similarities exist between different failure modes based on the functionality of the product/component. In this paper, a statistical clustering procedure is proposed to retrieve information on the set of predominant failures that a function experiences. The various stages of the methodology are illustrated using a hypothetical design example.

  17. Random sequential renormalization and agglomerative percolation in networks: Application to Erdös-Rényi and scale-free graphs

    NASA Astrophysics Data System (ADS)

    Bizhani, Golnoosh; Grassberger, Peter; Paczuski, Maya

    2011-12-01

    We study the statistical behavior under random sequential renormalization (RSR) of several network models including Erdös-Rényi (ER) graphs, scale-free networks, and an annealed model related to ER graphs. In RSR the network is locally coarse grained by choosing at each renormalization step a node at random and joining it to all its neighbors. Compared to previous (quasi-)parallel renormalization methods [Song et al., Nature (London) 433, 392 (2005), doi:10.1038/nature03248], RSR allows a more fine-grained analysis of the renormalization group (RG) flow and unravels new features that were not discussed in the previous analyses. In particular, we find that all networks exhibit a second-order transition in their RG flow. This phase transition is associated with the emergence of a giant hub and can be viewed as a new variant of percolation, called agglomerative percolation. We claim that this transition exists also in previous graph renormalization schemes and explains some of the scaling behavior seen there. For critical trees it happens as N/N0→0 in the limit of large systems (where N0 is the initial size of the graph and N its size at a given RSR step). In contrast, it happens at finite N/N0 in sparse ER graphs and in the annealed model, while it happens for N/N0→1 on scale-free networks. Critical exponents seem to depend on the type of the graph but not on the average degree and obey usual scaling relations for percolation phenomena. For the annealed model they agree with the exponents obtained from a mean-field theory. At late times, the networks exhibit a starlike structure in agreement with the results of Radicchi et al. [Phys. Rev. Lett. 101, 148701 (2008), doi:10.1103/PhysRevLett.101.148701]. While degree distributions are of main interest when regarding the scheme as network renormalization, mass distributions (which are more relevant when considering “supernodes” as clusters) are much easier to study using the fast Newman-Ziff algorithm for
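
    The RSR step described in this abstract (pick a node at random and join it to all its neighbors) can be sketched as a graph contraction. This is a minimal sketch under stated assumptions, not the authors' code: the graph is assumed to be simple and undirected, stored as a dict of neighbor sets with integer node labels, and the names `rsr_step` and `renormalize` are hypothetical.

```python
import random


def rsr_step(adj, rng):
    """One RSR step: pick a random node and absorb all of its neighbours.

    adj maps each node to the set of its neighbours and is modified in place.
    """
    target = rng.choice(sorted(adj))
    for nb in list(adj[target]):
        # Remove the absorbed neighbour and re-attach its links to the target.
        for second in adj.pop(nb):
            adj[second].discard(nb)
            if second != target:
                adj[second].add(target)
                adj[target].add(second)
    return target


def renormalize(adj, seed=0):
    """Repeat RSR steps until no edges remain; return the node count after each step."""
    rng = random.Random(seed)
    sizes = [len(adj)]
    while len(adj) > 1 and any(adj.values()):
        rsr_step(adj, rng)
        sizes.append(len(adj))
    return sizes


if __name__ == "__main__":
    ring = {i: {(i - 1) % 10, (i + 1) % 10} for i in range(10)}
    print(renormalize(ring))
```

    Tracking the sequence of sizes N relative to the initial size N0 is the quantity the abstract refers to as N/N0.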

  18. A Hybrid Monkey Search Algorithm for Clustering Analysis

    PubMed Central

    Chen, Xin; Zhou, Yongquan; Luo, Qifang

    2014-01-01

    Clustering is a popular data analysis and data mining technique. The k-means clustering algorithm is one of the most commonly used methods. However, it depends heavily on the initial solution and easily falls into a local optimum. In view of these disadvantages of the k-means method, this paper proposes a hybrid monkey algorithm, based on the search operator of the artificial bee colony algorithm, for clustering analysis. Experiments on synthetic and real-life datasets show that the algorithm performs better than the basic monkey algorithm for clustering analysis. PMID:24772039

  19. An analysis of hospital brand mark clusters.

    PubMed

    Vollmers, Stacy M; Miller, Darryl W; Kilic, Ozcan

    2010-07-01

    This study analyzed brand mark clusters (i.e., various types of brand marks displayed in combination) used by hospitals in the United States. The brand marks were assessed against several normative criteria for creating brand marks that are memorable and that elicit positive affect. Overall, results show a reasonably high level of adherence to many of these normative criteria. Many of the clusters exhibited pictorial elements that reflected benefits and that were conceptually consistent with the verbal content of the cluster. Also, many clusters featured icons that were balanced and moderately complex. However, only a few contained interactive imagery or taglines communicating benefits. PMID:20582849

  20. A Survey of Popular R Packages for Cluster Analysis

    ERIC Educational Resources Information Center

    Flynt, Abby; Dean, Nema

    2016-01-01

    Cluster analysis is a set of statistical methods for discovering new group/class structure when exploring data sets. This article reviews the following popular libraries/commands in the R software language for applying different types of cluster analysis: from the stats library, the kmeans, and hclust functions; the mclust library; the poLCA…

  1. Simultaneous Two-Way Clustering of Multiple Correspondence Analysis

    ERIC Educational Resources Information Center

    Hwang, Heungsun; Dillon, William R.

    2010-01-01

    A 2-way clustering approach to multiple correspondence analysis is proposed to account for cluster-level heterogeneity of both respondents and variable categories in multivariate categorical data. Specifically, in the proposed method, multiple correspondence analysis is combined with k-means in a unified framework in which "k"-means is applied…

  2. Using Cluster Analysis for Data Mining in Educational Technology Research

    ERIC Educational Resources Information Center

    Antonenko, Pavlo D.; Toy, Serkan; Niederhauser, Dale S.

    2012-01-01

    Cluster analysis is a group of statistical methods that has great potential for analyzing the vast amounts of web server-log data to understand student learning from hyperlinked information resources. In this methodological paper we provide an introduction to cluster analysis for educational technology researchers and illustrate its use through…

  3. Application of cluster analysis to aerometric data (journal version)

    SciTech Connect

    Crutcher, H.L.; Rhodes, R.C.; Graves, M.E.; Fairbairn, B.; Nelson, A.C.

    1986-01-01

    The NORMIX data-analysis program, which incorporates cluster-analysis and multivariate statistical-analysis routines, was modified and revised for use in a UNIVAC 1110 computer. The revised program was tested on three sample data sets and produced results in agreement with those from the original program. The NORMIX program was then used to evaluate and analyze eight sets of aerometric data from various sources. Comparison of the performance of NORMIX with two other cluster analysis algorithms, MIKCA and SAS CLUSTER, revealed that all three programs produce similar results in terms of hierarchical clustering, but NORMIX produces considerably more statistical evaluation and information to the user. Thus NORMIX is recommended as the most useful cluster analysis program of these three.

  4. Cluster analysis of the hot subdwarfs in the PG survey

    NASA Technical Reports Server (NTRS)

    Thejll, Peter; Charache, Darryl; Shipman, Harry L.

    1989-01-01

    Application of cluster analysis to the hot subdwarfs in the Palomar Green (PG) survey of faint blue high-Galactic-latitude objects is assessed, with emphasis on data noise and the number of clusters into which to subdivide the data. The data used in the study are presented, and cluster analysis, using the CLUSTAN program, is applied to them. Distances are calculated using the Euclidean formula, and clustering is done by Ward's method. The results are discussed, and five groups representing natural divisions of the subdwarfs in the PG survey are presented.
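
    As a rough illustration of agglomerative clustering with Euclidean distances and Ward's method, the sketch below repeatedly merges the pair of clusters whose union increases the within-cluster sum of squares the least. It is a naive standard-library sketch, not the CLUSTAN program used in the study; the function name `ward_clusters` and the toy points are assumptions.

```python
def ward_clusters(points, k):
    """Agglomerate points into k clusters using Ward's minimum-variance criterion.

    Merging clusters A and B raises the within-cluster sum of squares by
    |A||B| / (|A| + |B|) times the squared Euclidean distance between centroids.
    """
    # Each cluster is (member indices, centroid).
    clusters = [([i], list(p)) for i, p in enumerate(points)]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                (ma, ca), (mb, cb) = clusters[a], clusters[b]
                d2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
                cost = len(ma) * len(mb) / (len(ma) + len(mb)) * d2
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        (ma, ca), (mb, cb) = clusters[a], clusters[b]
        n = len(ma) + len(mb)
        # Size-weighted mean of the two centroids is the merged centroid.
        centroid = [(len(ma) * x + len(mb) * y) / n for x, y in zip(ca, cb)]
        clusters[a] = (ma + mb, centroid)
        del clusters[b]
    return [sorted(members) for members, _ in clusters]


if __name__ == "__main__":
    toy = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print(ward_clusters(toy, 2))
```

    The pairwise search makes this cubic in the number of points, so it is only suitable for small samples; choosing k (five groups in the study) remains a separate judgement.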

  5. Investigating Subtypes of Child Development: A Comparison of Cluster Analysis and Latent Class Cluster Analysis in Typology Creation

    ERIC Educational Resources Information Center

    DiStefano, Christine; Kamphaus, R. W.

    2006-01-01

    Two classification methods, latent class cluster analysis and cluster analysis, are used to identify groups of child behavioral adjustment underlying a sample of elementary school children aged 6 to 11 years. Behavioral rating information across 14 subscales was obtained from classroom teachers and used as input for analyses. Both the procedures…

  6. Analysis of Network Clustering Algorithms and Cluster Quality Metrics at Scale

    PubMed Central

    Kobourov, Stephen; Gallant, Mike; Börner, Katy

    2016-01-01

    Overview: Notions of community quality underlie the clustering of networks. While studies surrounding network clustering are increasingly common, a precise understanding of the relationship between different cluster quality metrics is unknown. In this paper, we examine the relationship between stand-alone cluster quality metrics and information recovery metrics through a rigorous analysis of four widely-used network clustering algorithms: Louvain, Infomap, label propagation, and smart local moving. We consider the stand-alone quality metrics of modularity, conductance, and coverage, and we consider the information recovery metrics of adjusted Rand score, normalized mutual information, and a variant of normalized mutual information used in previous work. Our study includes both synthetic graphs and empirical data sets of sizes varying from 1,000 to 1,000,000 nodes. Cluster Quality Metrics: We find significant differences among the results of the different cluster quality metrics. For example, clustering algorithms can return a value of 0.4 out of 1 on modularity but score 0 out of 1 on information recovery. We find conductance, though imperfect, to be the stand-alone quality metric that best indicates performance on the information recovery metrics. Additionally, our study shows that the variant of normalized mutual information used in previous work cannot be assumed to differ only slightly from traditional normalized mutual information. Network Clustering Algorithms: Smart local moving is the overall best performing algorithm in our study, but discrepancies between cluster evaluation metrics prevent us from declaring it an absolutely superior algorithm. Interestingly, Louvain performed better than Infomap in nearly all the tests in our study, contradicting the results of previous work in which Infomap was superior to Louvain. We find that although label propagation performs poorly when clusters are less clearly defined, it scales efficiently and accurately to large
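
    Two of the stand-alone quality metrics named in this abstract can be computed from an edge list as sketched below. The definitions used here are the common textbook ones (Newman modularity, and conductance as cut size over the smaller side's volume) and are assumed rather than taken from the paper, whose exact variants are not given in the abstract; the function names and the toy graph are likewise hypothetical.

```python
def modularity(edges, community):
    """Modularity Q of a partition of an undirected, unweighted graph.

    Q = sum over communities of (internal edges / m) - (community degree / 2m)^2.
    """
    m = len(edges)
    internal = {}  # number of edges with both ends in the community
    degree = {}    # sum of node degrees in the community
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2 for c, d in degree.items())


def conductance(edges, community, c):
    """Cut size of community c divided by the smaller of the two side volumes.

    Assumes both sides have non-zero volume.
    """
    cut = vol_in = vol_out = 0
    for u, v in edges:
        inside = (community[u] == c) + (community[v] == c)
        vol_in += inside
        vol_out += 2 - inside
        cut += inside == 1
    return cut / min(vol_in, vol_out)


if __name__ == "__main__":
    # Two triangles joined by a single bridge edge (illustrative data).
    edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
    community = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
    print(modularity(edges, community), conductance(edges, community, "a"))
```

    Higher modularity and lower conductance both indicate a better-separated community, which is why the two metrics can be compared against information recovery scores.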

  7. MASSCLEAN: MASSive CLuster Evolution and ANalysis package -- A new tool for stellar clusters

    NASA Astrophysics Data System (ADS)

    Popescu, Bogdan

    2010-11-01

    Stellar clusters are laboratories for stellar evolution. Their stellar content has a uniform age and chemical composition, but spans a large mass interval. The majority of stars are born in clusters and end up in the general field population. An accurate characterization of stellar clusters could be used to build better models, from stellar evolution to the evolution of an entire galaxy. Although they are so close, many Milky Way clusters are difficult to observe because they are obscured by the dust in the disk of our Galaxy. The clusters from the Local Group and beyond are too distant, so only their integrated properties can be used most of the time. There is one way to analyze the observational data, to search for clusters, and to describe them: simulations. The MASSCLEAN (MASSive CLuster Evolution and ANalysis) package was developed to provide a better characterization of Galactic clusters, to derive selection effects of current surveys, and to provide information about the extra-galactic clusters. Simulations of known Galactic clusters are used to get better constraints on their parameters, like mass, age, extinction, chemical composition and distance. This is the traditional way to describe the Galactic clusters, fitting the data using the available models. The difference is that MASSCLEAN simulations provide a consistent set of parameters. The majority of extra-galactic clusters are known only from their integrated properties, integrated magnitudes and colors. The current models for stellar populations are available only in the infinite mass limit. But the real clusters have a finite mass, and their integrated colors show a large dispersion (stochastic fluctuations). The description of the variation of integrated colors as a function of mass and age led to the creation of the MASSCLEANcolors database, based on 70 million Monte Carlo simulations. Since the entries in the database form a consistent set of integrated colors, integrated

  8. Visual cluster analysis and pattern recognition methods

    DOEpatents

    Osbourn, Gordon Cecil; Martinez, Rubel Francisco

    2001-01-01

    A method of clustering using a novel template to define a region of influence. Using neighboring approximation methods, computation times can be significantly reduced. The template and method are applicable to, and improve, pattern recognition techniques.

  9. The REFLEX II Galaxy Cluster sample: mock catalogues and clustering analysis

    NASA Astrophysics Data System (ADS)

    Balaguera-Antolinez, Andres; Sanchez, Ariel G.; Bohringer, Hans

    2012-09-01

    We present results of the analysis of abundance and clustering from the new ROSAT-ESO Flux-Limited X-Ray (REFLEX) II galaxy cluster catalogue. To model the covariance matrix of the different statistics, we have created a set of 100 mock galaxy cluster catalogues built from a suite of large-volume LambdaCDM N-body simulations (L-BASICC) and calibrated with the X-ray luminosity function. We discuss the calibration scheme and some implications regarding the cluster scaling relations, particularly the link between mass and luminosity. Similarly, we show the behavior of the clustering signal as a function of the X-ray luminosity and some cosmological implications.

  10. The Sensitivity of Atmospheric Trajectory Cluster Analysis Results to Clustering Methods Using Trajectories to the PICO-NARE Station

    NASA Astrophysics Data System (ADS)

    Owen, R. C.; Honrath, R. E.; Merrill, J.

    2003-12-01

    The use of cluster analysis to group atmospheric trajectories according to similar flow paths has become a common tool in atmospheric studies. Many methods are available to conduct a cluster analysis. However, the dependence of the resulting clusters upon the specific clustering method chosen has not been fully characterized. Specifically, the use of hierarchical versus non-hierarchical clustering algorithms has received little focus. This study presents the results of two cluster analyses: one using the hierarchical clustering algorithm average linkage, and one using the non-hierarchical clustering algorithm k-means. These results demonstrate the sensitivity of this cluster analysis to the use of a hierarchical method versus a non-hierarchical method. In addition, this study analyzes methods for dealing with the vertical component of trajectories during the clustering process. The analyses were performed using a 40-year set of trajectories to the PICO-NARE station, located atop Pico Mountain in the Azores Islands in the central North Atlantic.
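
    The contrast between a hierarchical method (average linkage) and a non-hierarchical one (k-means) can be sketched on toy data as follows. This is an illustrative standard-library sketch, not the study's code: each trajectory is assumed to be flattened into a fixed-length coordinate vector, k-means is seeded with the first k vectors (an arbitrary choice), and the function names are hypothetical.

```python
import math


def average_linkage(points, k):
    """Hierarchical clustering: merge the two clusters with the smallest
    mean pairwise Euclidean distance until k clusters remain."""
    clusters = [[i] for i in range(len(points))]

    def avg_dist(a, b):
        total = sum(math.dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > k:
        pairs = ((x, y) for x in range(len(clusters))
                 for y in range(x + 1, len(clusters)))
        a, b = min(pairs, key=lambda p: avg_dist(clusters[p[0]], clusters[p[1]]))
        merged = clusters.pop(b)      # b > a, so index a is unaffected
        clusters[a].extend(merged)
    return sorted(sorted(c) for c in clusters)


def k_means(points, k, iters=100):
    """Non-hierarchical clustering: Lloyd's algorithm seeded with the first k points."""
    centers = [list(p) for p in points[:k]]
    labels = None
    for _ in range(iters):
        new = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        if new == labels:             # assignments stable: converged
            break
        labels = new
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    groups = [[i for i, lab in enumerate(labels) if lab == c] for c in range(k)]
    return sorted(g for g in groups if g)


if __name__ == "__main__":
    toy = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print(average_linkage(toy, 2), k_means(toy, 2))
```

    On well-separated toy data like this the two methods return the same grouping; the sensitivity discussed in the abstract concerns less clearly separated data, where merge order and initial centers can lead to different clusters.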

  11. A Note on Cluster Effects in Latent Class Analysis

    ERIC Educational Resources Information Center

    Kaplan, David; Keller, Bryan

    2011-01-01

    This article examines the effects of clustering in latent class analysis. A comprehensive simulation study is conducted, which begins by specifying a true multilevel latent class model with varying within- and between-cluster sample sizes, varying latent class proportions, and varying intraclass correlations. These models are then estimated under…

  12. Optimal Cluster Sizes for Wireless Sensor Networks: An Experimental Analysis

    NASA Astrophysics Data System (ADS)

    Förster, Anna; Förster, Alexander; Murphy, Amy L.

    Node clustering and data aggregation are popular techniques to reduce energy consumption in large WSNs, and a large body of literature has emerged describing various clustering protocols. Unfortunately, for practitioners wishing to exploit clustering in deployments, there is little help when trying to identify a protocol that meets their needs. This paper takes a step back from specific protocols to consider the fundamental question: what is the optimal cluster size in terms of the resulting communication generated to collect data? Our experimental analysis considers a wide range of parameters that characterize the WSN, and shows that in the most common cases, clusters in which all nodes can communicate in one hop to the cluster head are optimal.

  13. Hierarchical spike clustering analysis for investigation of interneuron heterogeneity.

    PubMed

    Boehlen, Anne; Heinemann, Uwe; Henneberger, Christian

    2016-04-21

    Action potentials represent the output of a neuron. Interneurons in particular display a variety of discharge patterns ranging from regular action potential firing to prominent spike clustering or stuttering. The mechanisms underlying this heterogeneity remain incompletely understood. We established hierarchical cluster analysis of spike trains as a measure of spike clustering. A clustering index was calculated from action potential trains recorded in the whole-cell patch clamp configuration from hippocampal (CA1, stratum radiatum) and entorhinal (medial entorhinal cortex, layer 2) interneurons in acute slices and simulated data. Prominent, region-dependent, but also variable spike clustering was detected using this measure. Further analysis revealed a strong positive correlation between spike clustering and membrane potential oscillations but an inverse correlation with neuronal resonance. Furthermore, clustering was more pronounced when the balance between fast-activating K(+) currents, assessed by the spike repolarisation time, and hyperpolarization-activated currents, gauged by the size of the sag potential, was shifted in favour of fast K(+) currents. Simulations of spike clustering confirmed that variable ratios of fast K(+) and hyperpolarization-activated currents could underlie different degrees of spike clustering and could thus be crucial for temporally structuring interneuron spike output. PMID:26987719

  14. Obstructive Sleep Apnea: A Cluster Analysis at Time of Diagnosis

    PubMed Central

    Grillet, Yves; Richard, Philippe; Stach, Bruno; Vivodtzev, Isabelle; Timsit, Jean-Francois; Lévy, Patrick; Tamisier, Renaud; Pépin, Jean-Louis

    2016-01-01

    Background: The classification of obstructive sleep apnea is based on sleep study criteria that may not adequately capture disease heterogeneity. Improved phenotyping may improve prognosis prediction and help select therapeutic strategies. Objectives: This study used cluster analysis to investigate the clinical clusters of obstructive sleep apnea. Methods: An ascending hierarchical cluster analysis was performed on baseline symptoms, physical examination, risk factor exposure and co-morbidities from 18,263 participants in the OSFP (French national registry of sleep apnea). The probability for criteria to be associated with a given cluster was assessed using odds ratios, determined by univariate logistic regression. Results: Six clusters were identified, in which patients varied considerably in age, sex, symptoms, obesity, co-morbidities and environmental risk factors. The main significant differences between clusters were minimally symptomatic versus sleepy obstructive sleep apnea patients, lean versus obese, and among obese patients different combinations of co-morbidities and environmental risk factors. Conclusions: Our cluster analysis identified six distinct clusters of obstructive sleep apnea. Our findings underscore the high degree of heterogeneity that exists within obstructive sleep apnea patients regarding clinical presentation, risk factors and consequences. This may help in both research and clinical practice for validating new prevention programs, in diagnosis and in decisions regarding therapeutic strategies. PMID:27314230
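
    For a single binary criterion, the odds ratio estimated by univariate logistic regression equals the cross-product ratio of the 2x2 table of criterion by cluster membership, so the quantity can be sketched without fitting a model. The counts below are invented purely for illustration and are not taken from the OSFP registry; the function names and the Woolf (log-scale) confidence interval are assumptions of this sketch.

```python
import math


def odds_ratio(a, b, c, d):
    """Odds ratio from a 2x2 table.

    a: in cluster, criterion present     b: in cluster, criterion absent
    c: outside cluster, criterion present d: outside cluster, criterion absent
    """
    return (a * d) / (b * c)


def woolf_interval(a, b, c, d, z=1.96):
    """Approximate 95% confidence interval on the log-odds scale (Woolf method)."""
    log_or = math.log(odds_ratio(a, b, c, d))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or - z * se), math.exp(log_or + z * se)


if __name__ == "__main__":
    # Hypothetical counts: 30 of 100 patients in a cluster report a symptom,
    # versus 10 of 100 patients outside it.
    print(odds_ratio(30, 70, 10, 90), woolf_interval(30, 70, 10, 90))
```

    An odds ratio above 1 whose interval excludes 1 would indicate that the criterion is associated with membership of that cluster.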

  15. Cluster analysis of water-quality data for Lake Sakakawea, Audubon Lake, and McClusky Canal, central North Dakota, 1990-2003

    USGS Publications Warehouse

    Ryberg, Karen R.

    2006-01-01

    As a result of the Dakota Water Resources Act of 2000, the Bureau of Reclamation, U.S. Department of the Interior, identified eight water-supply alternatives (including a no-action alternative) to meet future water needs in portions of the Red River of the North (Red River) Basin. Of those alternatives, four include the interbasin transfer of water from the Missouri River Basin to the Red River Basin. Three of the interbasin transfer alternatives would use the McClusky Canal, located in central North Dakota, to transport the water. Therefore, the water quality of the McClusky Canal and the sources of its water, Lake Sakakawea and Audubon Lake, is of interest to water-quality stakeholders. The Bureau of Reclamation collected water-quality samples at 23 sites on Lake Sakakawea, Audubon Lake, and the McClusky Canal system from 1990 through 2003. Physical properties and water-quality constituents from these samples were summarized and analyzed by the U.S. Geological Survey using hierarchical agglomerative cluster analysis (HACA). HACA separated the samples into related clusters, or groups. These groups were examined for statistical significance and relation to structure of the McClusky Canal system. Statistically, the sample groupings found using HACA were significantly different from each other and appear to result from spatial and temporal water-quality differences corresponding with different sections of the canal and different operational conditions. Future operational changes of the canal system may justify additional water-quality sampling to characterize possible water-quality changes.

  16. Visual verification and analysis of cluster detection for molecular dynamics.

    PubMed

    Grottel, Sebastian; Reina, Guido; Vrabec, Jadran; Ertl, Thomas

    2007-01-01

    A current research topic in molecular thermodynamics is the condensation of vapor to liquid and the investigation of this process at the molecular level. Condensation is found in many physical phenomena, e.g. the formation of atmospheric clouds or the processes inside steam turbines, where a detailed knowledge of the dynamics of condensation processes will help to optimize energy efficiency and avoid problems with droplets of macroscopic size. The key properties of these processes are the nucleation rate and the critical cluster size. For the calculation of these properties it is essential to make use of a meaningful definition of molecular clusters, which is currently not a completely resolved issue. In this paper a framework capable of interactively visualizing molecular datasets of such nucleation simulations is presented, with an emphasis on the detected molecular clusters. To check the quality of the results of the cluster detection, our framework introduces the concept of flow groups to highlight potential cluster evolution over time which is not detected by the employed algorithm. To confirm the findings of the visual analysis, we coupled the rendering view with a schematic view of the clusters' evolution. This allows one to rapidly assess the quality of the molecular cluster detection algorithm and to identify locations in the simulation data, in space as well as in time, where the cluster detection fails. Thus, thermodynamics researchers can eliminate weaknesses in their cluster detection algorithms. Several examples of the effective and efficient usage of our tool are presented. PMID:17968118

  17. Automated analysis of organic particles using cluster SIMS

    NASA Astrophysics Data System (ADS)

    Gillen, Greg; Zeissler, Cindy; Mahoney, Christine; Lindstrom, Abigail; Fletcher, Robert; Chi, Peter; Verkouteren, Jennifer; Bright, David; Lareau, Richard T.; Boldman, Mike

    2004-06-01

    Cluster primary ion bombardment combined with secondary ion imaging is used on an ion microscope secondary ion mass spectrometer for the spatially resolved analysis of organic particles on various surfaces. Compared to the use of monoatomic primary ion beam bombardment, the use of a cluster primary ion beam (SF5+ or C8-) provides significant improvement in molecular ion yields and a reduction in beam-induced degradation of the analyte molecules. These characteristics of cluster bombardment, along with automated sample stage control and custom image analysis software, are utilized to rapidly characterize the spatial distribution of trace explosive particles, narcotics and inkjet-printed microarrays on a variety of surfaces.

  18. Atlas-guided cluster analysis of large tractography datasets.

    PubMed

    Ros, Christian; Güllmar, Daniel; Stenzel, Martin; Mentzel, Hans-Joachim; Reichenbach, Jürgen Rainer

    2013-01-01

    Diffusion Tensor Imaging (DTI) and fiber tractography are important tools to map the cerebral white matter microstructure in vivo and to model the underlying axonal pathways in the brain with three-dimensional fiber tracts. As the fast and consistent extraction of anatomically correct fiber bundles for multiple datasets is still challenging, we present a novel atlas-guided clustering framework for exploratory data analysis of large tractography datasets. The framework uses a hierarchical cluster analysis approach that exploits the inherent redundancy in large datasets to time-efficiently group fiber tracts. Structural information of a white matter atlas can be incorporated into the clustering to achieve an anatomically correct and reproducible grouping of fiber tracts. This approach facilitates not only the identification of the bundles corresponding to the classes of the atlas; it also enables the extraction of bundles that are not present in the atlas. The new technique was applied to cluster datasets of 46 healthy subjects. Prospects of automatic and anatomically correct as well as reproducible clustering are explored. Reconstructed clusters were well separated and showed good correspondence to anatomical bundles. Using the atlas-guided cluster approach, we observed consistent results across subjects with high reproducibility. In order to investigate the outlier elimination performance of the clustering algorithm, scenarios with varying amounts of noise were simulated and clustered with three different outlier elimination strategies. By exploiting the multithreading capabilities of modern multiprocessor systems in combination with novel algorithms, our toolkit clusters large datasets in a couple of minutes. Experiments were conducted to investigate the achievable speedup and to demonstrate the high performance of the clustering framework in a multiprocessing environment. PMID:24386292

  19. Atlas-Guided Cluster Analysis of Large Tractography Datasets

    PubMed Central

    Ros, Christian; Güllmar, Daniel; Stenzel, Martin; Mentzel, Hans-Joachim; Reichenbach, Jürgen Rainer

    2013-01-01

    Diffusion Tensor Imaging (DTI) and fiber tractography are important tools to map the cerebral white matter microstructure in vivo and to model the underlying axonal pathways in the brain with three-dimensional fiber tracts. As the fast and consistent extraction of anatomically correct fiber bundles for multiple datasets is still challenging, we present a novel atlas-guided clustering framework for exploratory data analysis of large tractography datasets. The framework uses a hierarchical cluster analysis approach that exploits the inherent redundancy in large datasets to time-efficiently group fiber tracts. Structural information of a white matter atlas can be incorporated into the clustering to achieve an anatomically correct and reproducible grouping of fiber tracts. This approach facilitates not only the identification of the bundles corresponding to the classes of the atlas; it also enables the extraction of bundles that are not present in the atlas. The new technique was applied to cluster datasets of 46 healthy subjects. Prospects of automatic and anatomically correct as well as reproducible clustering are explored. Reconstructed clusters were well separated and showed good correspondence to anatomical bundles. Using the atlas-guided cluster approach, we observed consistent results across subjects with high reproducibility. In order to investigate the outlier elimination performance of the clustering algorithm, scenarios with varying amounts of noise were simulated and clustered with three different outlier elimination strategies. By exploiting the multithreading capabilities of modern multiprocessor systems in combination with novel algorithms, our toolkit clusters large datasets in a couple of minutes. Experiments were conducted to investigate the achievable speedup and to demonstrate the high performance of the clustering framework in a multiprocessing environment. PMID:24386292

  20. Effects of Group Size and Lack of Sphericity on the Recovery of Clusters in K-Means Cluster Analysis

    ERIC Educational Resources Information Center

    de Craen, Saskia; Commandeur, Jacques J. F.; Frank, Laurence E.; Heiser, Willem J.

    2006-01-01

    K-means cluster analysis is known for its tendency to produce spherical and equally sized clusters. To assess the magnitude of these effects, a simulation study was conducted, in which populations were created with varying departures from sphericity and group sizes. An analysis of the recovery of clusters in the samples taken from these…

  1. Using cluster analysis to organize and explore regional GPS velocities

    USGS Publications Warehouse

    Simpson, Robert W.; Thatcher, Wayne; Savage, James C.

    2012-01-01

    Cluster analysis offers a simple visual exploratory tool for the initial investigation of regional Global Positioning System (GPS) velocity observations, which are providing increasingly precise mappings of actively deforming continental lithosphere. The deformation fields from dense regional GPS networks can often be concisely described in terms of relatively coherent blocks bounded by active faults, although the choice of blocks, their number and size, can be subjective and is often guided by the distribution of known faults. To illustrate our method, we apply cluster analysis to GPS velocities from the San Francisco Bay Region, California, to search for spatially coherent patterns of deformation, including evidence of block-like behavior. The clustering process identifies four robust groupings of velocities that we identify with four crustal blocks. Although the analysis uses no prior geologic information other than the GPS velocities, the cluster/block boundaries track three major faults, both locked and creeping.

  2. Comparative analysis of genomic signal processing for microarray data clustering.

    PubMed

    Istepanian, Robert S H; Sungoor, Ala; Nebel, Jean-Christophe

    2011-12-01

    Genomic signal processing is a new area of research that combines advanced digital signal processing methodologies for enhanced genetic data analysis. It has many promising applications in bioinformatics and next generation of healthcare systems, in particular, in the field of microarray data clustering. In this paper we present a comparative performance analysis of enhanced digital spectral analysis methods for robust clustering of gene expression across multiple microarray data samples. Three digital signal processing methods: linear predictive coding, wavelet decomposition, and fractal dimension are studied to provide a comparative evaluation of the clustering performance of these methods on several microarray datasets. The results of this study show that the fractal approach provides the best clustering accuracy compared to other digital signal processing and well known statistical methods. PMID:22157075

  3. A Distributed Flocking Approach for Information Stream Clustering Analysis

    SciTech Connect

    Cui, Xiaohui; Potok, Thomas E

    2006-01-01

    Intelligence analysts are currently overwhelmed by the volume of information streams generated every day. There is a lack of comprehensive tools that can analyze these streams in real time. Document clustering analysis plays an important role in improving the accuracy of information retrieval. However, most clustering technologies can only be applied to static document collections because they normally require large computational resources and a long time to produce accurate results. It is very difficult to cluster a dynamically changing text information stream on an individual computer. Our earlier research resulted in a dynamic reactive flock clustering algorithm that can continually refine the clustering result and quickly react to changes in document contents. This characteristic makes the algorithm suitable for clustering dynamically changing document information, such as a text information stream. Because of the decentralized character of this algorithm, a distributed approach is a very natural way to increase its clustering speed. In this paper, we present a distributed multi-agent flocking approach for text information stream clustering and discuss the decentralized architectures and communication schemes for load balancing and status information synchronization in this approach.

  4. Space-Time Cluster Analysis of Invasive Meningococcal Disease

    PubMed Central

    de Melker, Hester; Spanjaard, Lodewijk; Dankert, Jacob; Nagelkerke, Nico

    2004-01-01

    Clusters are recognized when meningococcal cases of the same phenotypic strain (markers: serogroup, serotype, and subtype) occur in spatial and temporal proximity. The incidence of such clusters was compared to the incidence that would be expected by chance by using space-time nearest-neighbor analysis of 4,887 confirmed invasive meningococcal cases identified in the 9-year surveillance period 1993–2001 in the Netherlands. Clustering beyond chance only occurred among the closest neighboring cases (comparable to secondary cases) and was small (3.1%, 95% confidence interval 2.1%–4.1%). PMID:15498165

  5. High-order fluorescence fluctuation analysis of model protein clusters.

    PubMed Central

    Palmer, A G; Thompson, N L

    1989-01-01

    The technique of high-order fluorescence fluctuation autocorrelation for detecting and characterizing protein oligomers was applied to solutions containing two fluorescent proteins in which the more fluorescent proteins were analogues for clusters of the less fluorescent ones. The results show that the model protein clusters can be detected for average numbers of observed subunits (free monomers plus monomers in oligomers) equal to 10-100 and for relative fluorescent yields that correspond to oligomers as small as trimers. High-order fluorescent fluctuation analysis may therefore be applicable to cell surface receptor clusters in natural or model membranes. PMID:2548201

  6. Multivariate Analysis of the Globular Clusters in M87

    NASA Astrophysics Data System (ADS)

    Das, Sukanta; Chattopadhayay, Tanuka; Davoust, Emmanuel

    2015-11-01

    An objective classification of 147 globular clusters (GCs) in the inner region of the giant elliptical galaxy M87 is carried out with the help of two methods of multivariate analysis. First, independent component analysis (ICA) is used to determine a set of independent variables that are linear combinations of various observed parameters (mostly Lick indices) of the GCs. Next, K-means cluster analysis (CA) is applied to the independent components (ICs) to find the optimum number of homogeneous groups having an underlying structure. The properties of the four groups of GCs thus uncovered are used to explain the formation mechanism of the host galaxy. It is suggested that M87 formed in two successive phases. In the first phase, a monolithic collapse gave rise to an inner group of metal-rich clusters with little systematic rotation and an outer group of metal-poor clusters in eccentric orbits. In the second phase, the galaxy accreted low-mass satellites in a dissipationless fashion, from the gas of which the two other groups of GCs formed. Evidence is given for a blue stellar population in the more metal-rich clusters, which we interpret as the result of helium enrichment. Finally, it is found that the clusters of M87 differ in some of their chemical properties (NaD, TiO1, light-element abundances) from GCs in our Galaxy and M31.
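
    The K-means step mentioned in this abstract can be illustrated with a minimal sketch of Lloyd's algorithm. This is not the authors' code: the function names kmeans and sq_dist, the toy points and the choice of initial centres are all assumptions made for illustration, and the ICA preprocessing is omitted.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, centers, max_iter=100):
    """Lloyd's algorithm: alternate nearest-centre assignment and mean update."""
    centers = [tuple(c) for c in centers]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centre.
        labels = [min(range(len(centers)), key=lambda k: sq_dist(p, centers[k]))
                  for p in points]
        # Update step: each centre moves to the mean of its members.
        new_centers = []
        for k, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centers.append(
                    tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centers.append(old)  # keep an empty cluster's centre
        if new_centers == centers:  # converged: centres stopped moving
            break
        centers = new_centers
    return centers, labels


# Hypothetical toy data: two well-separated groups in two dimensions.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, labels = kmeans(points, [points[0], points[3]])
print(labels)
```

    In practice the number of groups is not known in advance; a study of this kind would repeat the fit for several values of k and compare a quality criterion, which this sketch does not attempt.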

  7. The Enhanced Hoshen-Kopelman Algorithm for Cluster Analysis

    NASA Astrophysics Data System (ADS)

    Hoshen, Joseph

    1997-08-01

    In 1976 Hoshen and Kopelman (J. Hoshen and R. Kopelman, Phys. Rev. B 14, 3438 (1976)) introduced a breakthrough algorithm, known today as the Hoshen-Kopelman algorithm, for cluster analysis. This algorithm revolutionized Monte Carlo cluster calculations in percolation theory as it enables analysis of very large lattices containing 10^11 or more sites. Initially the HK algorithm's primary use was in the domain of pure and basic sciences. Later it began finding applications in diverse fields of technology and applied sciences. Examples of such applications are two- and three-dimensional image analysis, composite material modeling, polymers, remote sensing, brain modeling and food processing. While the original HK algorithm provides cluster size data for only one class of sites, the Enhanced HK (EHK) algorithm, presented in this paper, enables calculations of cluster spatial moments -- characteristics of cluster shapes -- for multiple classes of sites. These enhancements preserve the time and space complexities of the original HK algorithm, such that very large lattices can still be analyzed in a single pass through the lattice for cluster sizes, classes and shapes.
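
    The core idea of the original HK algorithm (a raster scan that assigns provisional labels and merges them with a union-find structure) can be sketched as follows. This is a simplified illustration for a two-dimensional square lattice with one class of sites, not the enhanced algorithm of the paper; the function name hoshen_kopelman and the toy lattice are assumptions, and for clarity the sketch relabels sites in a second sweep rather than accumulating sizes during the scan.

```python
def hoshen_kopelman(grid):
    """Label 4-connected clusters of occupied (truthy) sites on a 2D lattice."""
    rows, cols = len(grid), len(grid[0])
    labels = [[0] * cols for _ in range(rows)]
    parent = [0]  # parent[k] is the union-find parent of label k; 0 is unused

    def find(k):
        # Walk up to the root label, then compress the path behind us.
        root = k
        while parent[root] != root:
            root = parent[root]
        while parent[k] != root:
            parent[k], k = root, parent[k]
        return root

    # Raster scan: only the upper and left neighbours have been visited.
    for r in range(rows):
        for c in range(cols):
            if not grid[r][c]:
                continue
            up = labels[r - 1][c] if r > 0 else 0
            left = labels[r][c - 1] if c > 0 else 0
            if not up and not left:
                parent.append(len(parent))      # start a new provisional label
                labels[r][c] = len(parent) - 1
            elif up and left:
                ru, rl = find(up), find(left)
                parent[max(ru, rl)] = min(ru, rl)  # merge the two clusters
                labels[r][c] = min(ru, rl)
            else:
                labels[r][c] = find(up or left)

    # Second sweep: replace provisional labels by roots and count sizes.
    sizes = {}
    for r in range(rows):
        for c in range(cols):
            if labels[r][c]:
                root = find(labels[r][c])
                labels[r][c] = root
                sizes[root] = sizes.get(root, 0) + 1
    return labels, sizes


# Hypothetical 4 x 4 lattice: 1 marks an occupied site.
lattice = [[1, 1, 0, 1],
           [0, 1, 0, 1],
           [1, 0, 0, 1],
           [1, 1, 1, 1]]
labels, sizes = hoshen_kopelman(lattice)
print(sorted(sizes.values()))
```

    Only the previous row of labels is needed during the scan, which is why the approach scales to very large lattices; this sketch keeps the full label array for readability.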

  8. Application of Subspace Clustering in DNA Sequence Analysis.

    PubMed

    Wallace, Tim; Sekmen, Ali; Wang, Xiaofei

    2015-10-01

    Identification and clustering of orthologous genes play an important role in developing evolutionary models such as validating convergent and divergent phylogeny and predicting functional proteins in newly sequenced species of unverified nucleotide protein mappings. Here, we introduce an application of subspace clustering as applied to orthologous gene sequences and discuss the initial results. The working hypothesis is based upon the concept that genetic changes between nucleotide sequences coding for proteins among selected species and groups may lie within a union of subspaces for clusters of the orthologous groups. Estimates for the subspace dimensions were computed for a small population sample. A series of experiments was performed to cluster randomly selected sequences. The experimental design allows for both false positives and false negatives, and estimates for the statistical significance are provided. The clustering results are consistent with the main hypothesis. A simple random mutation binary tree model is used to simulate speciation events that show the interdependence of the subspace rank versus time and mutation rates. The simple mutation model is found to be largely consistent with the observed subspace clustering singular value results. Our study indicates that the subspace clustering method may be applied in orthology analysis. PMID:26162018

  9. Towards eliminating bias in cluster analysis of TB genotyped data.

    PubMed

    van Schalkwyk, Cari; Cule, Madeleine; Welte, Alex; van Helden, Paul; van der Spuy, Gian; Uys, Pieter

    2012-01-01

    The relative contributions of transmission and reactivation of latent infection to TB cases observed clinically have been reported in many situations, but always with some uncertainty. Genotyped data from TB organisms obtained from patients have been used as the basis for heuristic distinctions between circulating (clustered strains) and reactivated infections (unclustered strains). Naïve methods previously applied to the analysis of such data are known to provide biased estimates of the proportion of unclustered cases. The hypergeometric distribution, which generates probabilities of observing clusters of a given size as realized clusters of all possible sizes, is analyzed in this paper to yield a formal estimator for genotype cluster sizes. Subtle aspects of numerical stability, bias, and variance are explored. This formal estimator is seen to be stable with respect to the epidemiologically interesting properties of the cluster size distribution (the number of clusters and the number of singletons) though it does not yield satisfactory estimates of the number of clusters of larger sizes. The problem that even complete coverage of genotyping, in a practical sampling frame, will only provide a partial view of the actual transmission network remains to be explored. PMID:22479534

  10. Towards Eliminating Bias in Cluster Analysis of TB Genotyped Data

    PubMed Central

    Welte, Alex; van Helden, Paul; van der Spuy, Gian; Uys, Pieter

    2012-01-01

    The relative contributions of transmission and reactivation of latent infection to TB cases observed clinically have been reported in many situations, but always with some uncertainty. Genotyped data from TB organisms obtained from patients have been used as the basis for heuristic distinctions between circulating (clustered strains) and reactivated infections (unclustered strains). Naïve methods previously applied to the analysis of such data are known to provide biased estimates of the proportion of unclustered cases. The hypergeometric distribution, which generates probabilities of observing clusters of a given size as realized clusters of all possible sizes, is analyzed in this paper to yield a formal estimator for genotype cluster sizes. Subtle aspects of numerical stability, bias, and variance are explored. This formal estimator is seen to be stable with respect to the epidemiologically interesting properties of the cluster size distribution (the number of clusters and the number of singletons) though it does not yield satisfactory estimates of the number of clusters of larger sizes. The problem that even complete coverage of genotyping, in a practical sampling frame, will only provide a partial view of the actual transmission network remains to be explored. PMID:22479534

  11. Density of points clustering, application to transcriptomic data analysis

    PubMed Central

    Wicker, Nicolas; Dembele, Doulaye; Raffelsberger, Wolfgang; Poch, Olivier

    2002-01-01

    With the increasing amount of data produced by high-throughput technologies in many fields of science, clustering has become an integral step in exploratory data analysis in order to group similar elements into classes. However, many clustering algorithms can only work properly if aided by human expertise. For example, one parameter which is crucial and often manually set is the number of clusters present in the analyzed set. We present a novel stopping rule to find the optimal number of clusters based on the comparison of the density of points inside the clusters and between them. The method is evaluated on synthetic as well as on real transcriptomic data and compared with two current methods. Finally, we illustrate its usefulness in the analysis of the expression profiles of promyelocytic cells before and after treatment with all-trans retinoic acid. Simultaneous clustering for gene regulation and absolute initial expression levels allowed the identification of numerous genes associated with signal transduction revealing the complexity of retinoic acid signaling. PMID:12235383

  12. Application of Vertical Cluster Analysis Method to the Analysis of Time Dependent Biological Data Sets

    NASA Astrophysics Data System (ADS)

    Chandra, Sathees B. C.; Wang, Yao

    The purpose of this study is to apply the vertical cluster analysis method to interpret and analyze habituation of the leg movement response to different odors in fruit flies. In most cases, cluster analysis methods are used to analyze data sets that can be classified into categories. We define this type of method as the horizontal cluster analysis method. In this study, instead of dividing the data into categories, we divide the data based on different periods of time. We define this method as the vertical cluster analysis method. Here we apply the vertical cluster analysis method to evaluate the habituation of leg movement responses of the fruit fly, Drosophila melanogaster. The vertical cluster analyses helped us to identify hidden features of fruit fly behavior.

  13. Bayesian Analysis of Two Stellar Populations in Galactic Globular Clusters III: Analysis of 30 Clusters

    NASA Astrophysics Data System (ADS)

    Wagner-Kaiser, R.; Stenning, D. C.; Sarajedini, A.; von Hippel, T.; van Dyk, D. A.; Robinson, E.; Stein, N.; Jefferys, W. H.

    2016-09-01

    We use Cycle 21 Hubble Space Telescope (HST) observations and HST archival ACS Treasury observations of 30 Galactic Globular Clusters to characterize two distinct stellar populations. A sophisticated Bayesian technique is employed to simultaneously sample the joint posterior distribution of age, distance, and extinction for each cluster, as well as unique helium values for two populations within each cluster and the relative proportion of those populations. We find the helium differences between the two populations in the clusters fall in the range of ~0.04 to 0.11. Because adequate models varying in CNO are not presently available, we view these spreads as upper limits and present them with statistical rather than observational uncertainties. Evidence supports previous studies suggesting an increase in helium content concurrent with increasing mass of the cluster; we also find that the proportion of the first population of stars increases with mass. Our results are examined in the context of proposed globular cluster formation scenarios. Additionally, we leverage our Bayesian technique to shed light on inconsistencies between the theoretical models and the observed data.

  14. Kinematic gait patterns in healthy runners: A hierarchical cluster analysis.

    PubMed

    Phinyomark, Angkoon; Osis, Sean; Hettinga, Blayne A; Ferber, Reed

    2015-11-01

    Previous studies have demonstrated distinct clusters of gait patterns in both healthy and pathological groups, suggesting that different movement strategies may be represented. However, these studies have used discrete time point variables and usually focused on only one specific joint and plane of motion. Therefore, the first purpose of this study was to determine if running gait patterns for healthy subjects could be classified into homogeneous subgroups using three-dimensional kinematic data from the ankle, knee, and hip joints. The second purpose was to identify differences in joint kinematics between these groups. The third purpose was to investigate the practical implications of clustering healthy subjects by comparing these kinematics with runners experiencing patellofemoral pain (PFP). A principal component analysis (PCA) was used to reduce the dimensionality of the entire gait waveform data and then a hierarchical cluster analysis (HCA) determined group sets of similar gait patterns and homogeneous clusters. The results show two distinct running gait patterns were found with the main between-group differences occurring in frontal and sagittal plane knee angles (P<0.001), independent of age, height, weight, and running speed. When these two groups were compared to PFP runners, one cluster exhibited greater while the other exhibited reduced peak knee abduction angles (P<0.05). The variability observed in running patterns across this sample could be the result of different gait strategies. These results suggest care must be taken when selecting samples of subjects in order to investigate the pathomechanics of injured runners. PMID:26456422

  15. Clustering and classification techniques for the analysis of vibration signatures

    NASA Astrophysics Data System (ADS)

    Alguindigue, Israel E.; Loskiewicz-Buczak, Anna; Uhrig, Robert E.

    1992-09-01

    A methodology is proposed for the clustering and classification of vibration signatures in the frequency domain. The technique is based on the technologies of neural networks and fuzzy clustering and it is especially suited for the problem of vibration analysis because it permits the incorporation of specific knowledge about the domain in a very simple manner, and because the system learns from actual process data. The system uses the backpropagation algorithm for classification of compressed signatures, where compression is used as a mechanism for noise removal and automatic feature extraction. The clustering system uses the Fuzzy C algorithm with a matrix of weights for the calculation of distances between patterns and centroids. The matrix is used to assign factors of importance to frequencies in the spectrum which are known to be related to particular defects. The two aspects of the analysis (clustering and classification) are complementary because in many cases the exact operating state of a machine cannot be assessed, and clustering may unveil classes of operating states that would not be discovered otherwise. Accurate results were obtained from testing the system on rolling element bearing data.
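
    The abstract does not state the form of the weighted distance used in the clustering step. One common way to write a fuzzy c-means scheme with a weight matrix is sketched below, purely as an assumption: x_k is the k-th pattern (spectrum), v_i the i-th centroid, W a weight matrix (for example diagonal, with larger entries at defect-related frequencies), c the number of clusters, n the number of patterns and m > 1 the fuzzifier. The membership and centroid updates are the standard fuzzy c-means expressions.

```latex
d_{ik}^{2} = (x_k - v_i)^{\mathsf{T}} \, W \, (x_k - v_i), \qquad
u_{ik} = \left[ \sum_{j=1}^{c} \left( \frac{d_{ik}}{d_{jk}} \right)^{2/(m-1)} \right]^{-1}, \qquad
v_i = \frac{\sum_{k=1}^{n} u_{ik}^{m} \, x_k}{\sum_{k=1}^{n} u_{ik}^{m}}
```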

  16. Phage cluster relationships identified through single gene analysis

    PubMed Central

    2013-01-01

    Background: Phylogenetic comparison of bacteriophages requires whole genome approaches such as dotplot analysis, genome pairwise maps, and gene content analysis. Currently mycobacteriophages, a highly studied phage group, are categorized into related clusters based on the comparative analysis of whole genome sequences. With the recent explosion of phage isolation, a simple method for phage cluster prediction would facilitate analysis of crude or complex samples without whole genome isolation and sequencing. The hypothesis of this study was that mycobacteriophage-cluster prediction is possible using comparison of a single, ubiquitous, semi-conserved gene. Tape Measure Protein (TMP) was selected to test the hypothesis because it is typically the longest gene in mycobacteriophage genomes and because regions within the TMP gene are conserved. Results: A single gene, TMP, identified the known Mycobacteriophage clusters and subclusters using a Gepard dotplot comparison or a phylogenetic tree constructed from global alignment and maximum likelihood comparisons. Gepard analysis of 247 mycobacteriophage TMP sequences appropriately recovered 98.8% of the subcluster assignments that were made by whole-genome comparison. Subcluster-specific primers within TMP allow for PCR determination of the mycobacteriophage subcluster from DNA samples. Using the single-gene comparison approach for siphovirus coliphages, phage groupings by TMP comparison reflected relationships observed in a whole genome dotplot comparison and confirm the potential utility of this approach to another widely studied group of phages. Conclusions: TMP sequence comparison and PCR results support the hypothesis that a single gene can be used for distinguishing phage cluster and subcluster assignments. TMP single-gene analysis can quickly and accurately aid in mycobacteriophage classification. PMID:23777341

  17. Hierarchical clustering of 54 races and strains of the mulberry silkworm, Bombyx mori L: Significance of biochemical parameters.

    PubMed

    Chatterjee, S N; Datta, R K

    1992-12-01

    A detailed analysis was undertaken to test the efficacy of hierarchical agglomerative clustering (UPGMA method) in grouping the races and strains of the mulberry silkworm, Bombyx mori L., and to ascertain the importance of biochemical parameters in the clustering process. The analysis was based on data from two rearing seasons with 54 selected races/strains of different geographic origin and varying yield potentials. The results indicate that seven clusters can be realised with yield parameters alone, whereas the inclusion of biochemical parameters in clustering resulted in two broad groups: one having all the breeds with high cocoon weight and shell weight, the other having all the low-yielding silkworm strains both from India and from other countries. Further sub-grouping under these two groups highlights genetical differences associated with the differentiation of various groups of races in temperate and tropical areas as well as their significance for silkworm breeding. Estimates of all ten variables were further subjected to 'quick clustering' and the results showed that cluster 5, constituted by 38 low-yielding strains of India, China and Europe, had the highest values of the final cluster centre for amylase and the effective rate of rearing (ERR), while clusters 1 and 4 had the highest values for invertase and alkaline phosphatase. The evolutionary aspect of the genetic channelisation of silkworm races from various countries is discussed against the background of differences in the biochemical parameters and yield variables. PMID:24197452
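
    For readers unfamiliar with UPGMA, the following minimal sketch shows the average-linkage merging rule on a small distance matrix. It is an illustration only: the function name upgma, the item names and the toy distances are assumptions and have no connection to the silkworm data.

```python
def upgma(dist, names):
    """Agglomerate items by average linkage (UPGMA); return the merge history."""
    n = len(names)
    clusters = {i: (names[i],) for i in range(n)}  # cluster id -> member names
    # Pairwise distances between active clusters, keyed by unordered id pairs.
    d = {frozenset((i, j)): dist[i][j]
         for i in range(n) for j in range(i + 1, n)}
    merges = []
    next_id = n
    while len(clusters) > 1:
        pair = min(d, key=d.get)          # closest pair of active clusters
        a, b = sorted(pair)
        height = d[pair]
        na, nb = len(clusters[a]), len(clusters[b])
        # The distance from the merged cluster to any other cluster is the
        # size-weighted mean of the two old distances (average linkage).
        for k in clusters:
            if k in (a, b):
                continue
            d[frozenset((next_id, k))] = (
                na * d[frozenset((a, k))] + nb * d[frozenset((b, k))]
            ) / (na + nb)
        # Drop every distance that involves the two clusters just merged.
        d = {p: v for p, v in d.items() if a not in p and b not in p}
        merges.append((clusters[a], clusters[b], height))
        clusters[next_id] = clusters[a] + clusters[b]
        del clusters[a], clusters[b]
        next_id += 1
    return merges


# Hypothetical symmetric distance matrix for four items.
names = ["A", "B", "C", "D"]
dist = [[0, 2, 6, 10],
        [2, 0, 6, 10],
        [6, 6, 0, 4],
        [10, 10, 4, 0]]
for left, right, height in upgma(dist, names):
    print(left, right, height)
```

    Cutting the resulting merge history at a chosen height yields the flat clusters; how that cut is chosen is a separate decision that the sketch leaves out.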

  18. Mokken Scale Analysis Using Hierarchical Clustering Procedures

    ERIC Educational Resources Information Center

    van Abswoude, Alexandra A. H.; Vermunt, Jeroen K.; Hemker, Bas T.; van der Ark, L. Andries

    2004-01-01

    Mokken scale analysis (MSA) can be used to assess and build unidimensional scales from an item pool that is sensitive to multiple dimensions. These scales satisfy a set of scaling conditions, one of which follows from the model of monotone homogeneity. An important drawback of the MSA program is that the sequential item selection and scale…

  19. Cluster Analysis in Minority Group Poverty Studies.

    ERIC Educational Resources Information Center

    Ross, E. Lamar

    This paper, one of a series which arose out of data gathered on Choctaw Indians, Negroes, and whites in a low income area of Mississippi, expands upon one aspect of a recently completed analysis by the author. In the study, an attempt was made to distinguish between the characteristics associated with income levels and those related to ethnic…

  20. Influence of Scholarships on STEM Teachers: Cluster Analysis and Characteristics

    ERIC Educational Resources Information Center

    Liou, Pey-Yan; Desjardins, Christopher David; Lawrenz, Frances

    2010-01-01

    Science, technology, engineering, and mathematics (STEM) teachers' perceptions about the influence of scholarship on their decision to teach and to teach in a high-needs school were examined using cluster analysis. Three hundred and four STEM scholars, who were currently teaching, and who received funding from 45 institutions located throughout…

  1. COMPARATIVE STRATEGIES FOR USING CLUSTER ANALYSIS TO ASSESS DIETARY PATTERNS

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The objective of this study was to characterize dietary patterns using two different cluster analysis strategies. In this cross-sectional study, diet information was assessed by five 24-hour recalls collected over 10 months. All foods were classified into 24 food subgroups. Demographic, health, and ...

  2. Making Sense of Cluster Analysis: Revelations from Pakistani Science Classes

    ERIC Educational Resources Information Center

    Pell, Tony; Hargreaves, Linda

    2011-01-01

    Cluster analysis has been applied to quantitative data in educational research over several decades and has been a feature of Maurice Galton's research in primary and secondary classrooms. It has offered potentially useful insights for teaching yet its implications for practice are rarely implemented. It has been subject also to negative…

  3. A Cluster Analysis of Personality Style in Adults with ADHD

    ERIC Educational Resources Information Center

    Robin, Arthur L.; Tzelepis, Angela; Bedway, Marquita

    2008-01-01

    Objective: The purpose of this study was to use hierarchical linear cluster analysis to examine the normative personality styles of adults with ADHD. Method: A total of 311 adults with ADHD completed the Millon Index of Personality Styles, which consists of 24 scales assessing motivating aims, cognitive modes, and interpersonal behaviors. Results:…

  4. Language Learner Motivational Types: A Cluster Analysis Study

    ERIC Educational Resources Information Center

    Papi, Mostafa; Teimouri, Yasser

    2014-01-01

    The study aimed to identify different second language (L2) learner motivational types drawing on the framework of the L2 motivational self system. A total of 1,278 secondary school students learning English in Iran completed a questionnaire survey. Cluster analysis yielded five different groups based on the strength of different variables within…

  5. K-means cluster analysis and seismicity partitioning for Pakistan

    NASA Astrophysics Data System (ADS)

    Rehman, Khaista; Burton, Paul W.; Weatherill, Graeme A.

    2014-07-01

    Pakistan and the western Himalaya is a region of high seismic activity located at the triple junction between the Arabian, Eurasian and Indian plates. Four devastating earthquakes have resulted in significant numbers of fatalities in Pakistan and the surrounding region in the past century (Quetta, 1935; Makran, 1945; Pattan, 1974 and the recent 2005 Kashmir earthquake). It is therefore necessary to develop an understanding of the spatial distribution of seismicity and the potential seismogenic sources across the region. This forms an important basis for the calculation of seismic hazard; a crucial input in seismic design codes needed to begin to effectively mitigate the high earthquake risk in Pakistan. The development of seismogenic source zones for seismic hazard analysis is driven by both geological and seismotectonic inputs. Despite the many developments in seismic hazard in recent decades, the manner in which seismotectonic information feeds the definition of the seismic source can, in many parts of the world including Pakistan and the surrounding regions, remain a subjective process driven primarily by expert judgment. Whilst much research is ongoing to map and characterise active faults in Pakistan, knowledge of the seismogenic properties of the active faults is still incomplete in much of the region. Consequently, seismicity, both historical and instrumental, remains a primary guide to the seismogenic sources of Pakistan. This study utilises a cluster analysis approach for the purposes of identifying spatial differences in seismicity, which can be utilised to form a basis for delineating seismogenic source regions. An effort is made to examine seismicity partitioning for Pakistan with respect to earthquake database, seismic cluster analysis and seismic partitions in a seismic hazard context. A magnitude homogenous earthquake catalogue has been compiled using various available earthquake data. The earthquake catalogue covers a time span from 1930 to 2007 and…

  6. Cluster analysis of movement patterns in multiarticular actions: a tutorial.

    PubMed

    Rein, Robert; Button, Chris; Davids, Keith; Summers, Jeffery

    2010-04-01

    The present paper proposes a technical analysis method for extracting information about movement patterning in studies of motor control, based on a cluster analysis of movement kinematics. In a tutorial fashion, data from three different experiments are presented to exemplify and validate the technical method. When applied to three different basketball-shooting techniques, the method clearly distinguished between the different patterns. When applied to a cyclical wrist supination-pronation task, the cluster analysis provided the same results as an analysis using the conventional discrete relative phase measure. Finally, when analyzing throwing performance constrained by distance to target, the method grouped movement patterns together according to throwing distance. In conclusion, the proposed technical method provides a valuable tool to improve understanding of coordination and control in different movement models, including multiarticular actions. PMID:20484771

  7. Outcome-Driven Cluster Analysis with Application to Microarray Data.

    PubMed

    Hsu, Jessie J; Finkelstein, Dianne M; Schoenfeld, David A

    2015-01-01

    One goal of cluster analysis is to sort characteristics into groups (clusters) so that those in the same group are more highly correlated to each other than they are to those in other groups. An example is the search for groups of genes whose expression of RNA is correlated in a population of patients. These genes would be of greater interest if their common level of RNA expression were additionally predictive of the clinical outcome. This issue arose in the context of a study of trauma patients on whom RNA samples were available. The question of interest was whether there were groups of genes that were behaving similarly, and whether each gene in the cluster would have a similar effect on who would recover. For this, we develop an algorithm to simultaneously assign characteristics (genes) into groups of highly correlated genes that have the same effect on the outcome (recovery). We propose a random effects model where the genes within each group (cluster) equal the sum of a random effect, specific to the observation and cluster, and an independent error term. The outcome variable is a linear combination of the random effects of each cluster. To fit the model, we implement a Markov chain Monte Carlo algorithm based on the likelihood of the observed data. We evaluate the effect of including outcome in the model through simulation studies and describe a strategy for prediction. These methods are applied to trauma data from the Inflammation and Host Response to Injury research program, revealing a clustering of the genes that are informed by the recovery outcome. PMID:26562156

  8. Outcome-Driven Cluster Analysis with Application to Microarray Data

    PubMed Central

    Hsu, Jessie J.; Finkelstein, Dianne M.; Schoenfeld, David A.

    2015-01-01

    One goal of cluster analysis is to sort characteristics into groups (clusters) so that those in the same group are more highly correlated to each other than they are to those in other groups. An example is the search for groups of genes whose expression of RNA is correlated in a population of patients. These genes would be of greater interest if their common level of RNA expression were additionally predictive of the clinical outcome. This issue arose in the context of a study of trauma patients on whom RNA samples were available. The question of interest was whether there were groups of genes that were behaving similarly, and whether each gene in the cluster would have a similar effect on who would recover. For this, we develop an algorithm to simultaneously assign characteristics (genes) into groups of highly correlated genes that have the same effect on the outcome (recovery). We propose a random effects model where the genes within each group (cluster) equal the sum of a random effect, specific to the observation and cluster, and an independent error term. The outcome variable is a linear combination of the random effects of each cluster. To fit the model, we implement a Markov chain Monte Carlo algorithm based on the likelihood of the observed data. We evaluate the effect of including outcome in the model through simulation studies and describe a strategy for prediction. These methods are applied to trauma data from the Inflammation and Host Response to Injury research program, revealing a clustering of the genes that are informed by the recovery outcome. PMID:26562156

  9. Cluster coarsening during polymer collapse: Finite-size scaling analysis

    NASA Astrophysics Data System (ADS)

    Majumder, Suman; Janke, Wolfhard

    2015-06-01

    We study the kinetics of the collapse of a single flexible polymer when it is quenched from a good solvent to a poor solvent. Results obtained from Monte Carlo simulations show that the collapse occurs through a sequence of events with the formation, growth and subsequent coalescence of clusters of monomers to a single compact globule. Particular emphasis is given in this work to the cluster growth during the collapse, analyzed via the application of finite-size scaling techniques. The growth exponent obtained in our analysis is suggestive of the universal Lifshitz-Slyozov mechanism of cluster growth. The methods used in this work could be of more general validity and applicable to other phenomena such as protein folding.

  10. A cluster analysis investigation of workaholism as a syndrome.

    PubMed

    Aziz, Shahnaz; Zickar, Michael J

    2006-01-01

    Workaholism has been conceptualized as a syndrome although there have been few tests that explicitly consider its syndrome status. The authors analyzed a three-dimensional scale of workaholism developed by Spence and Robbins (1992) using cluster analysis. The authors identified three clusters of individuals, one of which corresponded to Spence and Robbins's profile of the workaholic (high work involvement, high drive to work, low work enjoyment). Consistent with previously conjectured relations with workaholism, individuals in the workaholic cluster were more likely to label themselves as workaholics, more likely to have acquaintances label them as workaholics, and more likely to have lower life satisfaction and higher work-life imbalance. The importance of considering workaholism as a syndrome and the implications for effective interventions are discussed. PMID:16551174

  11. An optical analysis of the merging cluster Abell 3888

    NASA Astrophysics Data System (ADS)

    Shakouri, S.; Johnston-Hollitt, M.; Dehghan, S.

    2016-05-01

    In this paper we present new AAOmega spectroscopy of 254 galaxies within a 30 arcmin radius around Abell 3888. We combine these data with the existing redshifts measured in a one degree radius around the cluster and perform a substructure analysis. We confirm 71 member galaxies within the core of A3888 and determine a new average redshift and velocity dispersion for the cluster of 0.1535 ± 0.0009 and 1181 ± 197 km s⁻¹, respectively. The cluster is elongated along an East-West axis and we find the core is bimodal along this axis with two subgroups of 26 and 41 members detected. Our results suggest that A3888 is a merging system, putting to rest the previous conjecture about the morphological status of the cluster derived from X-ray observations. In addition to the results on A3888 we also present six newly detected galaxy overdensities in the field, three of which we classify as new galaxy clusters.

  12. Full Text Clustering and Relationship Network Analysis of Biomedical Publications

    PubMed Central

    Guan, Renchu; Yang, Chen; Marchese, Maurizio; Liang, Yanchun; Shi, Xiaohu

    2014-01-01

    Rapid developments in the biomedical sciences have increased the demand for automatic clustering of biomedical publications. In contrast to current approaches to text clustering, which focus exclusively on the contents of abstracts, a novel method is proposed for clustering and analysis of complete biomedical article texts. To reduce dimensionality, Cosine Coefficient is used on a sub-space of only two vectors, instead of computing the Euclidean distance within the space of all vectors. Then a strategy and algorithm are introduced for Semi-supervised Affinity Propagation (SSAP) to improve analysis efficiency, using biomedical journal names as an evaluation background. Experimental results show that by avoiding high-dimensional sparse matrix computations, SSAP outperforms conventional k-means methods and improves upon the standard Affinity Propagation algorithm. In constructing a directed relationship network and distribution matrix for the clustering results, it can be noted that overlaps in scope and interests among BioMed publications can be easily identified, providing a valuable analytical tool for editors, authors and readers. PMID:25250864

  13. Full text clustering and relationship network analysis of biomedical publications.

    PubMed

    Guan, Renchu; Yang, Chen; Marchese, Maurizio; Liang, Yanchun; Shi, Xiaohu

    2014-01-01

    Rapid developments in the biomedical sciences have increased the demand for automatic clustering of biomedical publications. In contrast to current approaches to text clustering, which focus exclusively on the contents of abstracts, a novel method is proposed for clustering and analysis of complete biomedical article texts. To reduce dimensionality, Cosine Coefficient is used on a sub-space of only two vectors, instead of computing the Euclidean distance within the space of all vectors. Then a strategy and algorithm are introduced for Semi-supervised Affinity Propagation (SSAP) to improve analysis efficiency, using biomedical journal names as an evaluation background. Experimental results show that by avoiding high-dimensional sparse matrix computations, SSAP outperforms conventional k-means methods and improves upon the standard Affinity Propagation algorithm. In constructing a directed relationship network and distribution matrix for the clustering results, it can be noted that overlaps in scope and interests among BioMed publications can be easily identified, providing a valuable analytical tool for editors, authors and readers. PMID:25250864

  14. The Productivity Analysis of Chennai Automotive Industry Cluster

    NASA Astrophysics Data System (ADS)

    Bhaskaran, E.

    2014-07-01

    Chennai, also called the Detroit of India, is India's second fastest growing auto market and exports auto components and vehicles to the US, Germany, Japan and Brazil. For inclusive growth and sustainable development, 250 auto component industries in the Ambattur, Thirumalisai and Thirumudivakkam Industrial Estates located in Chennai have adopted the Cluster Development Approach called Automotive Component Cluster. The objective is to study the Value Chain, Correlation and Data Envelopment Analysis by determining technical efficiency, peer weights, input and output slacks of 100 auto component industries in three estates. The methodology adopted is Data Envelopment Analysis using the output-oriented Banker-Charnes-Cooper model, taking net worth, fixed assets and employment as inputs and gross output as the output. The non-zero values represent the weights for efficient clusters. The higher slack obtained reveals the excess net worth, fixed assets and employment and the shortage in gross output. To conclude, the variables are highly correlated and the inefficient industries should increase their gross output or decrease their fixed assets or employment. Moreover, for sustainable development, the cluster should strengthen infrastructure, technology, procurement, production and marketing interrelationships to decrease costs and to increase productivity and efficiency to compete in the indigenous and export market.
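
    The output-oriented Banker-Charnes-Cooper (BCC) model named in this abstract can be sketched as a small linear program. The block below is only an illustration under stated assumptions: the function name, the four toy units and their numbers are hypothetical and are not taken from the study, and SciPy's linprog is used as a convenient solver rather than as the authors' tool. For a unit o it maximises the output expansion factor phi subject to a convex combination of peers using no more input than o and producing at least phi times its output; technical efficiency is then 1/phi and the non-zero lambdas are the peer weights.

```python
import numpy as np
from scipy.optimize import linprog


def bcc_output_efficiency(X, Y, o):
    """Output-oriented BCC (variable returns to scale) DEA for unit `o`.

    X: (n_units, n_inputs) inputs, Y: (n_units, n_outputs) outputs.
    Returns (technical_efficiency, peer_weights) with efficiency = 1 / phi.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    n = X.shape[0]

    # Decision vector z = [phi, lambda_1, ..., lambda_n]; linprog minimises, so use -phi.
    c = np.concatenate(([-1.0], np.zeros(n)))

    # Inputs: sum_j lambda_j * x_ij <= x_io for every input i.
    A_in = np.hstack([np.zeros((X.shape[1], 1)), X.T])
    # Outputs: phi * y_ro - sum_j lambda_j * y_rj <= 0 for every output r.
    A_out = np.hstack([Y[o].reshape(-1, 1), -Y.T])
    A_ub = np.vstack([A_in, A_out])
    b_ub = np.concatenate([X[o], np.zeros(Y.shape[1])])

    # Convexity constraint (sum of lambdas = 1) is what makes the model BCC rather than CCR.
    A_eq = np.concatenate(([0.0], np.ones(n))).reshape(1, -1)

    # Default bounds in linprog are (0, None), i.e. phi >= 0 and lambda_j >= 0.
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0], method="highs")
    phi = res.x[0]
    return 1.0 / phi, res.x[1:]


# Hypothetical toy data: one input and one output for four units A, B, C, D.
X = [[1.0], [2.0], [4.0], [3.0]]
Y = [[1.0], [3.0], [4.0], [2.0]]
efficiencies = [bcc_output_efficiency(X, Y, o)[0] for o in range(4)]
eff_d, peers_d = bcc_output_efficiency(X, Y, 3)
```

    In this toy case units A, B and C lie on the frontier, while D is compared against an equal mix of B and C; with several inputs, as in the abstract, only the shapes of X and Y change.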

  15. Analysis of RXTE data on Clusters of Galaxies

    NASA Technical Reports Server (NTRS)

    Petrosian, Vahe

    2004-01-01

    This grant provided support for the reduction, analysis and interpretation of hard X-ray (HXR, for short) observations of the cluster of galaxies RXJ0658-5557 scheduled for the week of August 23, 2002 under the RXTE Cycle 7 program (PI Vahe Petrosian, Obs. ID 70165). The goal of the observation was to search for and characterize the shape of the HXR component beyond the well established thermal soft X-ray (SXR) component. Such hard components have been detected in several nearby clusters. The planned observation of a relatively distant cluster would provide information on the characteristics of this radiation at a different epoch in the evolution of the universe and shed light on its origin. We (Petrosian, 2001) have argued that thermal bremsstrahlung, as proposed earlier, cannot be the mechanism for the production of the HXRs and that the most likely mechanism is Compton upscattering of the cosmic microwave radiation by relativistic electrons which are known to be present in the clusters and be responsible for the observed radio emission. Based on this picture we estimated that this cluster, in spite of its relatively large distance, will have HXR signal comparable to the other nearby ones. The proposed RXTE observations were carried out and the data have been analyzed. We detect a hard X-ray tail in the spectrum of this cluster with a flux very nearly equal to our predicted value. This has strengthened the case for the Compton scattering model. We intend the data obtained via this observation to be a part of a larger data set. We have identified other clusters of galaxies (in archival RXTE and other instrument data sets) with sufficiently high quality data where we can search for and measure (or at least put meaningful limits on) the strength of the hard component. With these studies we expect to clarify the mechanism for acceleration of particles in the intercluster medium and provide guidance for future observations of this intriguing phenomenon by instrument

  16. The Quantitative Analysis of Chennai Automotive Industry Cluster

    NASA Astrophysics Data System (ADS)

    Bhaskaran, Ethirajan

    2016-07-01

    Chennai is also called the Detroit of India due to the presence of an automotive industry producing over 40 % of India's vehicles and components. During 2001-2002, the Automotive Component Industries (ACI) in the Ambattur, Thirumalizai and Thirumudivakkam Industrial Estates, Chennai, faced problems with infrastructure, technology, procurement, production and marketing. The objective is to study the Quantitative Performance of Chennai Automotive Industry Cluster before (2001-2002) and after the CDA (2008-2009). The methodology adopted is the collection of primary data from 100 ACI using a quantitative questionnaire and analysis using Correlation Analysis (CA), Regression Analysis (RA), the Friedman Test (FMT), and the Kruskal-Wallis Test (KWT). The CA computed for the different sets of variables reveals that there is a high degree of relationship between the variables studied. The RA models constructed establish the strong relationship between the dependent variable and a host of independent variables. The models proposed here reveal the approximate relationship in a closer form. The KWT shows that there is no significant difference between the three location clusters with respect to Net Profit, Production Cost, Marketing Costs, Procurement Costs and Gross Output. This supports that each location has contributed uniformly to the development of the automobile component cluster. The FMT shows that there is no significant difference between industrial units in respect of costs such as Production, Infrastructure, Technology, Marketing and Net Profit. To conclude, the Automotive Industries have fully utilized the Physical Infrastructure and Centralised Facilities by adopting CDA and are now exporting their products to North America, South America, Europe, Australia, Africa and Asia. The value chain analysis models have been implemented in all the cluster units. This Cluster Development Approach (CDA) model can be implemented in industries of under developed and developing countries for cost reduction and productivity
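
    The two rank-based tests named in this abstract are available in SciPy, and the sketch below shows how each is called and read. It is only an illustration: the numbers are made-up toy values, not the study's data, and the 0.05 significance level is an assumption because the abstract does not state one. The toy groups are deliberately well separated, so both tests reject here, whereas the study reports no significant difference.

```python
from scipy.stats import friedmanchisquare, kruskal

ALPHA = 0.05  # assumed significance level; not stated in the abstract


def differs(p_value, alpha=ALPHA):
    """True when the null hypothesis of 'no difference' is rejected."""
    return p_value < alpha


# Kruskal-Wallis: compares independent groups (e.g. units in three locations).
# Hypothetical values, chosen so that the groups do not overlap at all.
estate_a = [1.0, 2.0, 3.0, 4.0, 5.0]
estate_b = [6.0, 7.0, 8.0, 9.0, 10.0]
estate_c = [11.0, 12.0, 13.0, 14.0, 15.0]
kw_stat, kw_p = kruskal(estate_a, estate_b, estate_c)

# Friedman: compares three or more related measurements taken on the same units.
# Hypothetical values: each list holds one measure for the same six units.
cost_1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
cost_2 = [v + 10.0 for v in cost_1]
cost_3 = [v + 20.0 for v in cost_1]
fm_stat, fm_p = friedmanchisquare(cost_1, cost_2, cost_3)

print("Kruskal-Wallis:", kw_stat, kw_p, "difference" if differs(kw_p) else "no difference")
print("Friedman:", fm_stat, fm_p, "difference" if differs(fm_p) else "no difference")
```

    A "no significant difference" conclusion like the one reported corresponds to differs(...) returning False, that is, a p-value at or above the chosen level.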

  17. The Quantitative Analysis of Chennai Automotive Industry Cluster

    NASA Astrophysics Data System (ADS)

    Bhaskaran, Ethirajan

    2016-05-01

    Chennai is also called the Detroit of India due to the presence of an automotive industry producing over 40 % of India's vehicles and components. During 2001-2002, the Automotive Component Industries (ACI) in the Ambattur, Thirumalizai and Thirumudivakkam Industrial Estates, Chennai, faced problems with infrastructure, technology, procurement, production and marketing. The objective is to study the Quantitative Performance of Chennai Automotive Industry Cluster before (2001-2002) and after the CDA (2008-2009). The methodology adopted is the collection of primary data from 100 ACI using a quantitative questionnaire and analysis using Correlation Analysis (CA), Regression Analysis (RA), the Friedman Test (FMT), and the Kruskal-Wallis Test (KWT). The CA computed for the different sets of variables reveals that there is a high degree of relationship between the variables studied. The RA models constructed establish the strong relationship between the dependent variable and a host of independent variables. The models proposed here reveal the approximate relationship in a closer form. The KWT shows that there is no significant difference between the three location clusters with respect to Net Profit, Production Cost, Marketing Costs, Procurement Costs and Gross Output. This supports that each location has contributed uniformly to the development of the automobile component cluster. The FMT shows that there is no significant difference between industrial units in respect of costs such as Production, Infrastructure, Technology, Marketing and Net Profit. To conclude, the Automotive Industries have fully utilized the Physical Infrastructure and Centralised Facilities by adopting CDA and are now exporting their products to North America, South America, Europe, Australia, Africa and Asia. The value chain analysis models have been implemented in all the cluster units. This Cluster Development Approach (CDA) model can be implemented in industries of under developed and developing countries for cost reduction and productivity

  18. Bayesian Analysis of Multiple Populations in Galactic Globular Clusters

    NASA Astrophysics Data System (ADS)

    Wagner-Kaiser, Rachel A.; Sarajedini, Ata; von Hippel, Ted; Stenning, David; Piotto, Giampaolo; Milone, Antonino; van Dyk, David A.; Robinson, Elliot; Stein, Nathan

    2016-01-01

    We use GO 13297 Cycle 21 Hubble Space Telescope (HST) observations and archival GO 10775 Cycle 14 HST ACS Treasury observations of Galactic Globular Clusters to find and characterize multiple stellar populations. Determining how globular clusters are able to create and retain enriched material to produce several generations of stars is key to understanding how these objects formed and how they have affected the structural, kinematic, and chemical evolution of the Milky Way. We employ a sophisticated Bayesian technique with an adaptive MCMC algorithm to simultaneously fit the age, distance, absorption, and metallicity for each cluster. At the same time, we also fit unique helium values to two distinct populations of the cluster and determine the relative proportions of those populations. Our unique numerical approach allows objective and precise analysis of these complicated clusters, providing posterior distribution functions for each parameter of interest. We use these results to gain a better understanding of multiple populations in these clusters and their role in the history of the Milky Way. Support for this work was provided by NASA through grant numbers HST-GO-10775 and HST-GO-13297 from the Space Telescope Science Institute, which is operated by AURA, Inc., under NASA contract NAS5-26555. This material is based upon work supported by the National Aeronautics and Space Administration under Grant NNX11AF34G issued through the Office of Space Science. This project was supported by the National Aeronautics & Space Administration through the University of Central Florida's NASA Florida Space Grant Consortium.

  19. Applying cluster analysis to physics education research data

    NASA Astrophysics Data System (ADS)

    Springuel, R. Padraic

    One major thrust of Physics Education Research (PER) is the identification of student ideas about specific physics concepts, both correct ideas and those that differ from the expert consensus. Typically the research process of eliciting the spectrum of student ideas involves the administration of specially designed questions to students. One major analysis task in PER is the sorting of these student responses into thematically coherent groups. This process is one which has previously been done by eye in PER. This thesis explores the possibility of using cluster analysis to perform the task in a more rigorous and less time-intensive fashion while making fewer assumptions about what the students are doing. Since this technique has not previously been used in PER, a summary of the various kinds of cluster analysis is included as well as a discussion of which might be appropriate for the task of sorting student responses into groups. Two example data sets (one based on the Force and Motion Conceptual Evaluation (FMCE), the other looking at acceleration in two dimensions (A2D)) are examined in depth to demonstrate how cluster analysis can be applied to PER data and the various considerations which must be taken into account when doing so. In both cases, the techniques described in this thesis found 5 groups which contained about 90% of the students in the data set. The results of this application are compared to previous research on the topics covered by the two examples to demonstrate that cluster analysis can effectively uncover the same patterns in student responses that have already been identified.

  20. Analysis of data separation and recovery problems using clustered sparsity

    NASA Astrophysics Data System (ADS)

    King, Emily J.; Kutyniok, Gitta; Zhuang, Xiaosheng

    2011-09-01

    Data often have two or more fundamental components, like cartoon-like and textured elements in images; point, filament, and sheet clusters in astronomical data; and tonal and transient layers in audio signals. For many applications, separating these components is of interest. Another issue in data analysis is that of incomplete data, for example a photograph with scratches or seismic data collected with fewer than necessary sensors. There exists a unified approach to solving these problems, namely minimizing the ℓ1 norm of the analysis coefficients with respect to particular frame(s). This approach, using the concept of clustered sparsity, leads to similar theoretical bounds and results, which are presented here. Furthermore, necessary conditions for the frames to lead to sufficiently good solutions are also shown.
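
    The two problems described in this abstract can be written as ℓ1 minimisation over analysis coefficients. The formulation below is a sketch in assumed notation, since none of these symbols appear in the abstract: x is the observed signal, Φ1 and Φ2 (or Φ) are the frames with adjoint marked by an asterisk, P_K is the projection onto the known part of the data, and hats mark the minimisers.

```latex
% Separation: split x into two components, each sparse in its own frame.
(\hat{x}_1, \hat{x}_2) = \operatorname*{arg\,min}_{x_1,\, x_2}
    \; \|\Phi_1^{\ast} x_1\|_1 + \|\Phi_2^{\ast} x_2\|_1
    \quad \text{subject to} \quad x_1 + x_2 = x

% Recovery of incomplete data: agree with x on the known part, be sparse in the frame.
\hat{x} = \operatorname*{arg\,min}_{\tilde{x}}
    \; \|\Phi^{\ast} \tilde{x}\|_1
    \quad \text{subject to} \quad P_K \tilde{x} = P_K x
```

    Both programs minimise the same kind of objective and differ only in their constraint, which is the sense in which the approach is unified.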

  1. Segment clustering methodology for unsupervised Holter recordings analysis

    NASA Astrophysics Data System (ADS)

    Rodríguez-Sotelo, Jose Luis; Peluffo-Ordoñez, Diego; Castellanos Dominguez, German

    2015-01-01

    Cardiac arrhythmia analysis on Holter recordings is an important issue in clinical settings; however, it implicitly involves other problems related to the large amount of unlabelled data, which means a high computational cost. In this work an unsupervised methodology based on a segment framework is presented, which consists of dividing the raw data into a balanced number of segments in order to identify fiducial points and to characterize and cluster the heartbeats in each segment separately. The resulting clusters are merged or split according to an assumed criterion of homogeneity. This framework compensates for the high computational cost of Holter analysis, making its implementation possible for future real-time applications. The performance of the method is measured over the records from the MIT/BIH arrhythmia database, taking advantage of the database labels, and achieves high values of sensitivity and specificity for a broad range of heartbeat types recommended by the AAMI.

  2. Applications of cluster analysis to the creation of perfectionism profiles: a comparison of two clustering approaches

    PubMed Central

    Bolin, Jocelyn H.; Edwards, Julianne M.; Finch, W. Holmes; Cassady, Jerrell C.

    2014-01-01

    Although traditional clustering methods (e.g., K-means) have been shown to be useful in the social sciences it is often difficult for such methods to handle situations where clusters in the population overlap or are ambiguous. Fuzzy clustering, a method already recognized in many disciplines, provides a more flexible alternative to these traditional clustering methods. Fuzzy clustering differs from other traditional clustering methods in that it allows for a case to belong to multiple clusters simultaneously. Unfortunately, fuzzy clustering techniques remain relatively unused in the social and behavioral sciences. The purpose of this paper is to introduce fuzzy clustering to these audiences who are currently relatively unfamiliar with the technique. In order to demonstrate the advantages associated with this method, cluster solutions of a common perfectionism measure were created using both fuzzy clustering and K-means clustering, and the results compared. Results of these analyses reveal that different cluster solutions are found by the two methods, and the similarity between the different clustering solutions depends on the amount of cluster overlap allowed for in fuzzy clustering. PMID:24795683
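
    The contrast drawn in this abstract, hard assignment in K-means versus graded membership in fuzzy clustering, can be illustrated with a small fuzzy c-means routine. This is a minimal NumPy sketch and not the authors' implementation; the function name, the fuzzifier m = 2, and the one-dimensional toy data are assumptions made for illustration. The fuzzifier m plays the role of the "amount of cluster overlap allowed": values closer to 1 give nearly hard assignments.

```python
import numpy as np


def fuzzy_c_means(X, n_clusters, m=2.0, n_iter=200, tol=1e-8, seed=0):
    """Fuzzy c-means. Returns (centers, U); U[i, k] is the membership of case i in cluster k."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)

    # Random initial memberships, normalised so each row sums to 1.
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)

    def weighted_centers(U):
        # Each center is the mean of all cases, weighted by membership ** m.
        W = U ** m
        return (W.T @ X) / W.sum(axis=0)[:, None]

    for _ in range(n_iter):
        centers = weighted_centers(U)
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-12)  # guard against division by zero
        # Membership update: closer centers receive larger shares; rows sum to 1.
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return weighted_centers(U), U


# Hypothetical scores: two tight groups plus one case halfway between them.
X = np.array([[0.0], [0.5], [1.0], [5.0], [9.0], [9.5], [10.0]])
centers, U = fuzzy_c_means(X, n_clusters=2)
hard_labels = U.argmax(axis=1)  # what a hard, K-means-style assignment would keep
```

    The halfway case ends up with roughly equal membership in both clusters, information that the hard labels discard.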

  3. ICAP - An Interactive Cluster Analysis Procedure for analyzing remotely sensed data

    NASA Technical Reports Server (NTRS)

    Wharton, S. W.; Turner, B. J.

    1981-01-01

    An Interactive Cluster Analysis Procedure (ICAP) was developed to derive classifier training statistics from remotely sensed data. ICAP differs from conventional clustering algorithms by allowing the analyst to optimize the cluster configuration by inspection, rather than by manipulating process parameters. Control of the clustering process alternates between the algorithm, which creates new centroids and forms clusters, and the analyst, who can evaluate and elect to modify the cluster structure. Clusters can be deleted, or lumped together pairwise, or new centroids can be added. A summary of the cluster statistics can be requested to facilitate cluster manipulation. The principal advantage of this approach is that it allows prior information (when available) to be used directly in the analysis, since the analyst interacts with ICAP in a straightforward manner, using basic terms with which he is more likely to be familiar. Results from testing ICAP showed that an informed use of ICAP can improve classification, as compared to an existing cluster analysis procedure.

  4. Assessing intraplate volcano compositional similarities with cluster analysis

    NASA Astrophysics Data System (ADS)

    Konter, J. G.

    2012-12-01

    The compositional variation in intraplate volcanoes is commonly assessed as a function of end-members that were recognized as extrema in a 3D space, defined by radiogenic isotope ratios. The specific isotope ratios used are the principal components in the intraplate volcano compositional data set, and by reducing the dimensionality of the data set to 3, groupings and trends in the data can be visually identified. Such groupings can then be used to compare to other geochemical or geophysical data sets (e.g. correlations with seismic models). A complementary approach in examining groupings and trends in a data set is the use of cluster analysis, which can be used to recognize groups of similar intraplate volcanic systems. Since it is not known a priori how many clusters may exist, hierarchical cluster analysis can be used to examine the relationships between individual intraplate volcanic systems. The technique compares the Euclidean distance between the data available at the different locations, and these data can have a large number of dimensions. The results can be visualized as a dendrogram, where individual locations are represented by different branches (or leaves) that join at different distances. We use Matlab to examine the data extracted from pre-compiled GEOROC database files, including location, major elements, large ion lithophile elements, high field strength elements, rare earth elements and radiogenic isotopes. These data do not vary over the same range in values and are therefore first normalized by the total range in the data set for each particular element or isotope ratio. Since multiple samples have been analyzed for most intraplate volcanic systems, we assess the results for the average, the maximum, and the minimum values for each element. In addition, we investigate the robustness of the outcome by removing one element at a time from the data set and recalculating a new dendrogram. One of the outcomes is that the resulting clusters seem to
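
    The workflow in this abstract (range normalisation, Euclidean distances, a hierarchical tree, and a leave-one-variable-out robustness check) can be sketched as follows. The authors used Matlab; this block uses Python with SciPy instead, and the function names, the average linkage, the choice of two clusters, and the synthetic data are all assumptions made for illustration, not details from the study.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def range_normalize(X):
    """Scale each column by its total range so that every variable spans [0, 1]."""
    X = np.asarray(X, dtype=float)
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0  # constant columns carry no information; avoid 0/0
    return (X - X.min(axis=0)) / span


def cluster_labels(X, n_clusters, method="average"):
    """Agglomerative clustering on Euclidean distances, cut into n_clusters groups."""
    Z = linkage(range_normalize(X), method=method, metric="euclidean")
    return fcluster(Z, t=n_clusters, criterion="maxclust")


def same_partition(a, b):
    """True when two label vectors describe the same grouping (label names may differ)."""
    return len(set(zip(a, b))) == len(set(a)) == len(set(b))


def leave_one_variable_out(X, n_clusters):
    """Recluster with each column removed in turn; report whether the grouping survives."""
    X = np.asarray(X, dtype=float)
    reference = cluster_labels(X, n_clusters)
    return [
        same_partition(reference, cluster_labels(np.delete(X, j, axis=1), n_clusters))
        for j in range(X.shape[1])
    ]


# Synthetic stand-in for per-location averages: 10 locations, 3 variables on
# very different scales, drawn around two well-separated compositions.
rng = np.random.default_rng(0)
group_a = rng.normal(loc=[0.0, 10.0, 0.5], scale=[0.1, 1.0, 0.01], size=(5, 3))
group_b = rng.normal(loc=[3.0, 40.0, 0.9], scale=[0.1, 1.0, 0.01], size=(5, 3))
X = np.vstack([group_a, group_b])

labels = cluster_labels(X, n_clusters=2)
stable = leave_one_variable_out(X, n_clusters=2)
```

    Passing the linkage matrix Z to scipy.cluster.hierarchy.dendrogram would draw the tree described in the abstract, and repeating the run on per-location maxima or minima instead of averages mirrors the comparison the authors mention.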

  5. Psychosocial Costs of Racism to Whites: Exploring Patterns through Cluster Analysis

    ERIC Educational Resources Information Center

    Spanierman, Lisa B.; Poteat, V. Paul; Beer, Amanda M.; Armstrong, Patrick Ian

    2006-01-01

    Participants (230 White college students) completed the Psychosocial Costs of Racism to Whites (PCRW) Scale. Using cluster analysis, we identified 5 distinct cluster groups on the basis of PCRW subscale scores: the unempathic and unaware cluster contained the lowest empathy scores; the insensitive and afraid cluster consisted of low empathy and…

  6. Genomic cluster and network analysis for predictive screening for hepatotoxicity.

    PubMed

    Fukushima, Tamio; Kikkawa, Rie; Hamada, Yoshimasa; Horii, Ikuo

    2006-12-01

    The present study was undertaken to estimate the usefulness of genomic approaches to predict hepatotoxicity. Male rats were treated with acetaminophen (APAP), carbon tetrachloride (CCL), amiodarone (AD) or tetracycline (TC) at toxic doses. Their livers were extracted 6 or 24 hr after the dosings and were used for subsequent examinations. At 6 hr there were no histological changes noted in any of the groups except for the CCL group, but at 24 hr, such changes were noted in all but the AD group. Regarding genomic analysis, we performed hierarchical cluster analysis using S-plus software. The individual microarray data were clearly classified into 5 treatment-related clusters at 24 hr as well as at 6 hr, even though no morphological changes were noted at 6 hr. In the gene expression analysis using GeneSpring, transcription factor and oxidative stress- and lipid metabolism-related genes were markedly affected in all treatment groups at both time points when compared with the corresponding control values. Finally, we investigated gene networks in the above-affected genes by using Ingenuity Pathway Analysis software. Down-regulation of lipid metabolism-related genes regulated by SREBP1 was observed in all treatment groups at both time points, and up-regulation of oxidative stress-related genes regulated by Nrf2 was observed in the APAP and CCL treatment groups. From the above findings, for the application of genomic approaches to predict hepatotoxicity, we considered that cluster analysis for classification and early prediction of hepatotoxicity and network analysis for investigation of toxicological biomarkers would be useful. PMID:17202758

  7. Cluster analysis of radionuclide concentrations in beach sand.

    PubMed

    de Meijer, R J; James, I R; Jennings, P J; Koeyers, J E

    2001-03-01

    This paper presents a method in which natural radionuclide concentrations of beach sand minerals are traced along a stretch of coast by cluster analysis. This analysis yields two groups of mineral deposits with different origins. The method deviates from standard methods of following dispersal of radionuclides in the environment, which are usually based on the construction of lines of equal concentrations. The paper focuses on the methodology of quantitatively correlating activity concentrations of natural radionuclides in two groups of minerals. The methodology is widely applicable, but is demonstrated for natural radioactivity in beach sands along the coast of South West Australia. PMID:11214891

  8. Case-cohort analysis of clusters of recurrent events.

    PubMed

    Chen, Feng; Chen, Kani

    2014-01-01

    Case-cohort sampling, first proposed in Prentice (Biometrika 73:1-11, 1986), is one of the most effective cohort designs for analysis of event occurrence, with the regression model typically being the Cox proportional hazards model. This paper extends the case-cohort design to recurrent events with a specific clustering feature, which is captured by a suitably modified Cox-type self-exciting intensity model. We discuss the advantages of using this model and validate the pseudo-likelihood method. Simulation studies are presented in support of the theory. The application is illustrated with an analysis of a bladder cancer data set. PMID:23832308

  9. Analysis of the velocity data of cluster A562

    NASA Astrophysics Data System (ADS)

    Calderón Espinoza, D.; Gómez, P.

    2014-10-01

    We present a recent study of the dynamics of the cluster of galaxies Abell 562 intended to determine if ram pressure is responsible for the jet bending in the Wide-Angle Tailed (WAT) radio source located in the central elliptical galaxy. Given the properties of the jet and of the intra-cluster medium (ICM), a relative velocity between the galaxy and the ICM greater than 800 km/s is needed for this mechanism to bend the WAT jet. We find that the peculiar velocity of the WAT galaxy is 170 ± 140 km/s which is not enough to produce the bending. This is based on the analysis of the velocity of 146 galaxy cluster members obtained with the Gemini Multi-Object Spectrometer (GMOS) at Gemini North. However, our analysis of these velocity data and archival Chandra data suggests that an off-axis merger occurred in this system. This type of merger typically produces bulk flow motions with peak velocities greater than 1000 km/s which should be enough to explain the bending of the jets.

  10. Fractal Segmentation and Clustering Analysis for Seismic Time Slices

    NASA Astrophysics Data System (ADS)

    Ronquillo, G.; Oleschko, K.; Korvin, G.; Arizabalo, R. D.

    2002-05-01

    Fractal analysis has become part of the standard approach for quantifying texture on gray-tone or colored images. In this research we introduce a multi-stage fractal procedure to segment, classify and measure the clustering patterns on seismic time slices from a 3-D seismic survey. Five fractal classifiers (c1)-(c5) were designed to yield standardized, unbiased and precise measures of the clustering of seismic signals. The classifiers were tested on seismic time slices from the AKAL field, Cantarell Oil Complex, Mexico. The generalized lacunarity (c1), fractal signature (c2), heterogeneity (c3), rugosity of boundaries (c4) and continuity resp. tortuosity (c5) of the clusters are shown to be efficient measures of the time-space variability of seismic signals. The Local Fractal Analysis (LFA) of time slices has proved to be a powerful edge detection filter to detect and enhance linear features, like faults or buried meandering rivers. The local fractal dimensions of the time slices were also compared with the self-affinity dimensions of the corresponding parts of porosity-logs. It is speculated that the spectral dimension of the negative-amplitude parts of the time-slice yields a measure of connectivity between the formation's high-porosity zones, and correlates with overall permeability.

  11. Symptom cluster research: conceptual, design, measurement, and analysis issues.

    PubMed

    Barsevick, Andrea M; Whitmer, Kyra; Nail, Lillian M; Beck, Susan L; Dudley, William N

    2006-01-01

    Cancer patients may experience multiple concurrent symptoms caused by the cancer, cancer treatment, or their combination. The complex relationships between and among symptoms, as well as the clinical antecedents and consequences, have not been well described. This paper examines the literature on cancer symptom clusters focusing on the conceptualization, design, measurement, and analytic issues. The investigation of symptom clustering is in an early stage of testing empirically whether the characteristics defined in the conceptual definition can be observed in cancer patients. Decisions related to study design include sample selection, the timing of symptom measures, and the characteristics of symptom interventions. For self-report symptom measures, decisions include symptom dimensions to evaluate, methods of scaling symptoms, and the time frame of responses. Analytic decisions may focus on the application of factor analysis, cluster analysis, and path models. Studying the complex symptoms of oncology patients will yield increased understanding of the patterns of association, interaction, and synergy of symptoms that produce specific clinical outcomes. It will also provide a scientific basis and new directions for clinical assessment and intervention. PMID:16442485

  12. The relative vertex clustering value - a new criterion for the fast discovery of functional modules in protein interaction networks

    PubMed Central

    2015-01-01

    Background: Cellular processes are known to be modular and are realized by groups of proteins implicated in common biological functions. Such groups of proteins are called functional modules, and many community detection methods have been devised for their discovery from protein interaction network (PIN) data. In current agglomerative clustering approaches, vertices with very few neighbors are often classified as separate clusters, which does not make sense biologically. Also, a major limitation of agglomerative techniques is that their computational efficiency does not scale well to large PINs. Finally, PIN data obtained from large-scale experiments generally contain many false positives, and this makes it hard for agglomerative clustering methods to find the correct clusters, since they are known to be sensitive to noisy data. Results: We propose a local similarity premetric, the relative vertex clustering value, as a new criterion for deciding when a node can be added to a given node's cluster; it addresses the above three issues. Based on this criterion, we introduce a novel and very fast agglomerative clustering technique, FAC-PIN, for discovering functional modules and protein complexes from PIN data. Conclusions: Our proposed FAC-PIN algorithm is applied to nine PIN data sets from eight different species including the yeast PIN, and the identified functional modules are validated using Gene Ontology (GO) annotations from DAVID Bioinformatics Resources. Identified protein complexes are also validated using experimentally verified complexes. Computational results show that FAC-PIN can discover functional modules or protein complexes from PINs more accurately and more efficiently than HC-PIN and CNM, the current state-of-the-art approaches for clustering PINs in an agglomerative manner. PMID:25734691

  13. Cluster: Mission Overview and End-of-Life Analysis

    NASA Technical Reports Server (NTRS)

    Pallaschke, S.; Munoz, I.; Rodriquez-Canabal, J.; Sieg, D.; Yde, J. J.

    2007-01-01

    The Cluster mission is part of the scientific programme of the European Space Agency (ESA) and its purpose is the analysis of the Earth's magnetosphere. The Cluster project consists of four satellites. The selected polar orbit has a shape of 4.0 and 19.2 Re which is required for performing measurements near the cusp and the tail of the magnetosphere. When crossing these regions the satellites form a constellation which in most of the cases so far has been a regular tetrahedron. The satellite operations are carried out by the European Space Operations Centre (ESOC) at Darmstadt, Germany. The paper outlines the future orbit evolution and the envisaged operations from a Flight Dynamics point of view. In addition a brief summary of the LEOP and routine operations is included beforehand.

  14. [The hierarchical clustering analysis of hyperspectral image based on probabilistic latent semantic analysis].

    PubMed

    Yi, Wen-Bin; Shen, Li; Qi, Yin-Feng; Tang, Hong

    2011-09-01

    The paper introduces Probabilistic Latent Semantic Analysis (PLSA) to image clustering and proposes an effective clustering algorithm for hyperspectral images that uses the semantic information from PLSA. Firstly, the ISODATA algorithm is used to obtain an initial clustering of the hyperspectral image, and the clusters of this initial result are treated as the visual words of PLSA. Secondly, an object-oriented image segmentation algorithm is used to partition the hyperspectral image, and segments with relatively pure pixels are regarded as the documents in PLSA. Thirdly, a variety of identification methods that can estimate the best number of cluster centers are combined to determine the number of latent semantic topics. Then the conditional distributions of visual words in topics and the mixtures of topics in different documents are estimated using PLSA. Finally, the conditional probabilities of the latent semantic topics are discriminated using a statistical pattern recognition method, the topic type of each visual word in each document is assigned, and the clustering result for the hyperspectral image is thereby obtained. Experimental results show that the clusters produced by the proposed algorithm are better than those of K-MEANS and ISODATA in terms of object-oriented properties, and that the clustering result is closer to the real spatial distribution of the surface. PMID:22097851

  15. Dynamical analysis of galaxy cluster merger Abell 2146

    NASA Astrophysics Data System (ADS)

    White, J. A.; Canning, R. E. A.; King, L. J.; Lee, B. E.; Russell, H. R.; Baum, S. A.; Clowe, D. I.; Coleman, J. E.; Donahue, M.; Edge, A. C.; Fabian, A. C.; Johnstone, R. M.; McNamara, B. R.; O'Dea, C. P.; Sanders, J. S.

    2015-11-01

    We present a dynamical analysis of the merging galaxy cluster system Abell 2146 using spectroscopy obtained with the Gemini Multi-Object Spectrograph on the Gemini North telescope. As revealed by the Chandra X-ray Observatory, the system is undergoing a major merger and has a gas structure indicative of a recent first core passage. The system presents two large shock fronts, making it unique amongst these rare systems. The hot gas structure indicates that the merger axis must be close to the plane of the sky and that the two merging clusters are relatively close in mass, from the observation of two shock fronts. Using 63 spectroscopically determined cluster members, we apply various statistical tests to establish the presence of two distinct massive structures. With the caveat that the system has recently undergone a major merger, the virial mass estimate is M_vir= 8.5^{+4.3}_{-4.7} × 10^{14} M_{⊙} for the whole system, consistent with the mass determination in a previous study using the Sunyaev-Zel'dovich signal. The newly calculated redshift for the system is z = 0.2323. A two-body dynamical model gives an angle of 13°-19° between the merger axis and the plane of the sky, and a time-scale after first core passage of ≈0.24-0.28 Gyr.

  16. Common and Cluster-Specific Simultaneous Component Analysis

    PubMed Central

    De Roover, Kim; Timmerman, Marieke E.; Mesquita, Batja; Ceulemans, Eva

    2013-01-01

    In many fields of research, so-called ‘multiblock’ data are collected, i.e., data containing multivariate observations that are nested within higher-level research units (e.g., inhabitants of different countries). Each higher-level unit (e.g., country) then corresponds to a ‘data block’. For such data, it may be interesting to investigate the extent to which the correlation structure of the variables differs between the data blocks. More specifically, when capturing the correlation structure by means of component analysis, one may want to explore which components are common across all data blocks and which components differ across the data blocks. This paper presents a common and cluster-specific simultaneous component method which clusters the data blocks according to their correlation structure and allows for common and cluster-specific components. Model estimation and model selection procedures are described and simulation results validate their performance. Also, the method is applied to data from cross-cultural values research to illustrate its empirical value. PMID:23667463

  17. Covariance analysis of differential drag-based satellite cluster flight

    NASA Astrophysics Data System (ADS)

    Ben-Yaacov, Ohad; Ivantsov, Anatoly; Gurfil, Pini

    2016-06-01

    One possibility for satellite cluster flight is to control relative distances using differential drag. The idea is to increase or decrease the drag acceleration on each satellite by changing its attitude, and use the resulting small differential acceleration as a controller. The most significant advantage of the differential drag concept is that it enables cluster flight without consuming fuel. However, any drag-based control algorithm must cope with significant aerodynamic and mechanical uncertainties. The goal of the current paper is to develop a method for examining differential drag-based cluster flight performance in the presence of noise and uncertainties. In particular, the differential drag control law is examined under measurement noise, drag uncertainties, and initial condition-related uncertainties. The method used for uncertainty quantification is the Linear Covariance Analysis, which enables us to propagate the augmented state and filter covariance without propagating the state itself. Validation using a Monte-Carlo simulation is provided. The results show that all uncertainties have a relatively small effect on the inter-satellite distance, even in the long term, which validates the robustness of the differential drag controller used.
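
    The covariance propagation idea mentioned in this abstract can be illustrated with the generic discrete-time recursion P(k+1) = Phi P(k) Phi^T + Q. The sketch below is not the authors' model: the two-state system (relative position, relative velocity), the transition matrix `Phi`, the process noise `Q` and the initial covariance `P0` are all hypothetical values chosen only to show that the covariance can be advanced without propagating the state itself.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    """Transpose a matrix given as a list of rows."""
    return [list(row) for row in zip(*a)]


def propagate_covariance(P, Phi, Q, steps):
    """Apply P <- Phi P Phi^T + Q for the given number of steps."""
    for _ in range(steps):
        P = matmul(matmul(Phi, P), transpose(Phi))
        P = [[P[i][j] + Q[i][j] for j in range(len(P))]
             for i in range(len(P))]
    return P


# Hypothetical double-integrator model with a 1 s step (assumption).
dt = 1.0
Phi = [[1.0, dt], [0.0, 1.0]]      # state transition matrix
Q = [[0.0, 0.0], [0.0, 0.01]]      # process noise on the velocity only
P0 = [[1.0, 0.0], [0.0, 0.1]]      # initial covariance

P1 = propagate_covariance(P0, Phi, Q, steps=1)
```

    Under these assumed values, one step couples the position and velocity uncertainties through the off-diagonal terms, which is the kind of growth a linear covariance analysis tracks over time.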

  18. Cluster analysis on mass spectra of biogenic secondary organic aerosol

    NASA Astrophysics Data System (ADS)

    Spindler, C.; Kiendler-Scharr, A.; Kleist, E.; Mensah, A.; Mentel, T.; Tillmann, R.; Wildt, J.

    2009-04-01

    Biogenic secondary organic aerosols (BSOA) are of high importance in the atmosphere. The formation of SOA from the volatile organic compound (VOC) emissions of selected trees was investigated in the JPAC (Jülich Plant Aerosol Chamber) facility. The VOC (mainly monoterpenes) were transferred into a reaction chamber where vapors were photo-chemically oxidized and formed BSOA. The aerosol was characterized by aerosol mass spectrometry (Aerodyne Quadrupole-AMS). Inside the AMS, flash-vaporization of the aerosol particles and electron impact ionization of the evaporated molecules cause a high fragmentation of the organic compounds. Here, we present a classification of the aerosol mass spectra via cluster analysis. Average mass spectra are produced by combining related single mass spectra into so-called clusters. The mass spectra were similar due to the similarity of the precursor substances. However, we can show that there are differences in the BSOA mass spectra of different tree species. Furthermore, we can distinguish the influence of the precursor chemistry and chemical aging. BSOA formed from plants exposed to stress can be distinguished from BSOA formed under non-stressed conditions. Significance and limitations of the clustering method for very similar mass spectra will be demonstrated and discussed.

  19. Three Systems of Insular Functional Connectivity Identified with Cluster Analysis

    PubMed Central

    Pitskel, Naomi B.; Pelphrey, Kevin A.

    2011-01-01

    Despite much research on the function of the insular cortex, few studies have investigated functional subdivisions of the insula in humans. The present study used resting-state functional connectivity magnetic resonance imaging (MRI) to parcellate the human insular lobe based on clustering of functional connectivity patterns. Connectivity maps were computed for each voxel in the insula based on resting-state functional MRI (fMRI) data and segregated using cluster analysis. We identified 3 insular subregions with distinct patterns of connectivity: a posterior region, functionally connected with primary and secondary somatomotor cortices; a dorsal anterior to middle region, connected with dorsal anterior cingulate cortex, along with other regions of a previously described control network; and a ventral anterior region, primarily connected with pregenual anterior cingulate cortex. Applying these regions to a separate task data set, we found that dorsal and ventral anterior insula responded selectively to disgusting images, while posterior insula did not. These results demonstrate that clustering of connectivity patterns can be used to subdivide cerebral cortex into anatomically and functionally meaningful subregions; the insular regions identified here should be useful in future investigations on the function of the insula. PMID:21097516

  20. Clustered Numerical Data Analysis Using Markov Lie Monoid Based Networks

    NASA Astrophysics Data System (ADS)

    Johnson, Joseph

    2016-03-01

    We have designed and built an optimal numerical standardization algorithm that links numerical values with their associated units, error level, and defining metadata, thus supporting automated data exchange and new levels of artificial intelligence (AI). The software manages all dimensional and error analysis and computational tracing. Tables of entities versus properties of these generalized numbers (called "metanumbers") support a transformation of each table into a network among the entities and another network among their properties, where the network connection matrix is based upon a proximity metric between the two items. We previously proved that every network is isomorphic to the Lie algebra that generates continuous Markov transformations. We have also shown that the eigenvectors of these Markov matrices provide an agnostic clustering of the underlying patterns. We will present this methodology and show how our new work on conversion of scientific numerical data through this process can reveal underlying information clusters ordered by the eigenvalues. We will also show how the linking of clusters from different tables can be used to form a "supernet" of all numerical information supporting new initiatives in AI.

  1. [Clustering analysis applied to near-infrared spectroscopy analysis of Chinese traditional medicine].

    PubMed

    Liu, Mu-qing; Zhou, De-cheng; Xu, Xin-yuan; Sun, Yao-jie; Zhou, Xiao-li; Han, Lei

    2007-10-01

    This article discusses cluster analysis applied to near-infrared (NIR) spectroscopy of traditional Chinese medicines, which provides a new method for their classification. The samples selected for the authors' research were safrole, eucalypt oil, laurel oil, turpentine, clove oil and three samples of costmary oil from different suppliers; their absorption spectra were measured in seconds by a multi-channel NIR spectrometer developed in the authors' lab. The spectra in the range of 0.70-1.7 μm were measured with air as background, and the results indicated that they are quite distinct. A qualitative mathematical model was set up and cluster analysis based on the spectra was carried out using different clustering methods for optimization, yielding a cluster correlation coefficient of 0.9742. This indicated that cluster analysis of this group of samples is practicable. The calculated classification of the 8 samples also agreed well with their characteristics; in particular, the three samples of costmary oil were grouped most closely in the cluster analysis. PMID:18306778

  2. The REFLEX II galaxy cluster survey: power spectrum analysis

    NASA Astrophysics Data System (ADS)

    Balaguera-Antolínez, A.; Sánchez, Ariel G.; Böhringer, H.; Collins, C.; Guzzo, L.; Phleps, S.

    2011-05-01

    We present the power spectrum of galaxy clusters measured from the new ROSAT-ESO Flux-Limited X-Ray (REFLEX II) galaxy cluster catalogue. This new sample extends the flux limit of the original REFLEX catalogue to 1.8 × 10^{-12} erg s^{-1} cm^{-2}, yielding a total of 911 clusters with ≥94 per cent completeness in redshift follow-up. The analysis of the data is improved by creating a set of 100 REFLEX II-catalogue-like mock galaxy cluster catalogues built from a suite of large-volume Λ cold dark matter (ΛCDM) N-body simulations (L-BASICC II). The measured power spectrum is in agreement with the predictions from a ΛCDM cosmological model. The measurements show the expected increase in the amplitude of the power spectrum with increasing X-ray luminosity. On large scales, we show that the shape of the measured power spectrum is compatible with a scale-independent bias and provide a model for the amplitude that allows us to connect our measurements with a cosmological model. By implementing a luminosity-dependent power-spectrum estimator, we observe that the power spectrum measured from the REFLEX II sample is weakly affected by flux-selection effects. The shape of the measured power spectrum is compatible with a featureless power spectrum on scales k > 0.01 h Mpc^{-1} and hence no statistically significant signal of baryonic acoustic oscillations can be detected. We show that the measured REFLEX II power spectrum displays signatures of non-linear evolution.

  3. Proteomics Analysis Reveals Overlapping Functions of Clustered Protocadherins

    PubMed Central

    Han, Meng-Hsuan; Lin, Chengyi; Meng, Shuxia; Wang, Xiaozhong

    2010-01-01

    The three tandem-arrayed protocadherin (Pcdh) gene clusters, namely Pcdh-α, Pcdh-β, and Pcdh-γ, play important roles in the development of the vertebrate central nervous system. To gain insight into the molecular action of PCDHs, we performed a systematic proteomics analysis of PCDH-γ-associated protein complexes. We identified a list of 154 non-redundant proteins in the PCDH-γ complexes. This list includes nearly 30 members of clustered Pcdh-α, -β, and -γ families as core components of the complexes and additionally over 120 putative PCDH-associated proteins. We validated a selected subset of PCDH-γ-associated proteins using specific antibodies. Analysis of the identities of PCDH-associated proteins showed that the majority of them overlap with the proteomic profile of postsynaptic density preparations. Further analysis of membrane protein complexes revealed that several validated PCDH-γ-associated proteins exhibit reduced levels in Pcdh-γ-deficient brain tissues. Therefore, PCDH-γs are required for the integrity of the complexes. However, the size of the overall complexes and the abundance of many other proteins remained unchanged, raising a possibility that PCDH-αs and PCDH-βs might compensate for PCDH-γ function in complex formation. As a test of this idea, RNA interference knockdown of both PCDH-αs and PCDH-γs showed that PCDHs have redundant functions in regulating neuronal survival in the chicken spinal cord. Taken together, our data provide evidence that clustered PCDHs coexist in large protein complexes and have overlapping functions during vertebrate neural development. PMID:19843561

  4. Nonuniqueness in traveltime tomography: Ensemble inference and cluster analysis

    SciTech Connect

    Vasco, D.W.; Peterson, J.E. Jr.; Majer, E.L.

    1996-07-01

    The authors examine the nonlinear aspects of seismic traveltime tomography. This is accomplished by completing an extensive set of conjugate gradient inversions on a parallel virtual machine, with each initiated by a different starting model. The goal is an exploratory analysis of a set of conjugate gradient solutions to the traveltime tomography problem. The authors find that distinct local minima are generated when prior constraints are imposed on traveltime tomographic inverse problems. Methods from cluster analysis determine the number and location of the isolated solutions to the traveltime tomography problem. They apply the cluster analysis techniques to a cross-borehole traveltime data set gathered at the Gypsy Pilot Site in Pawnee County, Oklahoma. They find that the 1075 final models, satisfying the traveltime data and a model norm penalty, form up to 61 separate solutions. All solutions appear to contain a central low velocity zone bounded above and below by higher velocity layers. Such a structure agrees with well-logs, hydrological well tests, and a previous seismic inversion.

  5. An enhanced cluster analysis program with bootstrap significance testing for ecological community analysis

    USGS Publications Warehouse

    McKenna, J.E., Jr.

    2003-01-01

    The biosphere is filled with complex living patterns and important questions about biodiversity and community and ecosystem ecology are concerned with structure and function of multispecies systems that are responsible for those patterns. Cluster analysis identifies discrete groups within multivariate data and is an effective method of coping with these complexities, but often suffers from subjective identification of groups. The bootstrap testing method greatly improves objective significance determination for cluster analysis. The BOOTCLUS program makes cluster analysis that reliably identifies real patterns within a data set more accessible and easier to use than previously available programs. A variety of analysis options and rapid re-analysis provide a means to quickly evaluate several aspects of a data set. Interpretation is influenced by sampling design and a priori designation of samples into replicate groups, and ultimately relies on the researcher's knowledge of the organisms and their environment. However, the BOOTCLUS program provides reliable, objectively determined groupings of multivariate data.

  6. Multivariate cluster analysis of forest fire events in Portugal

    NASA Astrophysics Data System (ADS)

    Tonini, Marj; Pereira, Mario; Vega Orozco, Carmen; Parente, Joana

    2015-04-01

    Portugal is one of the major fire-prone European countries, mainly due to its favourable climatic, topographic and vegetation conditions. Compared to the other Mediterranean countries, the number of events registered here from 1980 to the present is the highest; likewise, with respect to the burnt area, Portugal is the third most affected country. Portuguese mapped burnt areas are available from the website of the Institute for the Conservation of Nature and Forests (ICNF). This official geodatabase is the result of satellite measurements starting from the year 1990. The spatial information, delivered in shapefile format, provides a detailed description of the shape and the size of the area burnt by each fire, while the date/time information related to the fire ignition is restricted to the year of occurrence. In terms of a statistical formalism, wildfires can be associated with a stochastic point process, where events are analysed as a set of geographical coordinates corresponding, for example, to the centroid of each burnt area. The analysis of the spatio-temporal pattern of stochastic point processes, including cluster analysis, is a basic procedure for discovering predisposing factors as well as for prevention and forecasting purposes. These kinds of studies are primarily focused on investigating the spatial cluster behaviour of environmental data sequences and/or mapping their distribution at different times. To include both dimensions (space and time), a comprehensive spatio-temporal analysis is needed. In the present study the authors attempt to verify whether, in the case of wildfires in Portugal, space and time act independently or whether, conversely, neighbouring events are also closer in time. We present an application of the spatio-temporal K-function to a long dataset (1990-2012) of mapped burnt areas. Moreover, the multivariate K-function allowed checking for a possible difference in distribution between small and large fires. The final objective is to elaborate a 3D

  7. Time series clustering analysis of health-promoting behavior

    NASA Astrophysics Data System (ADS)

    Yang, Chi-Ta; Hung, Yu-Shiang; Deng, Guang-Feng

    2013-10-01

    Health promotion must be emphasized to achieve the World Health Organization goal of health for all. Since the global population is aging rapidly, the ComCare elder health-promoting service was developed by the Taiwan Institute for Information Industry in 2011. Based on the Pender health promotion model, the ComCare service offers five categories of health-promoting functions to address the everyday needs of seniors: nutrition management, social support, exercise management, health responsibility, and stress management. To assess the overall ComCare service and to improve understanding of the health-promoting behavior of elders, this study analyzed health-promoting behavioral data automatically collected by the ComCare monitoring system. In the 30638 session records collected for 249 elders from January 2012 to March 2013, behavior patterns were identified by a fuzzy c-means time series clustering algorithm combined with autocorrelation-based representation schemes. The analysis showed that time series data for elder health-promoting behavior can be classified into four different clusters. Each type reveals different health-promoting needs, frequencies, function numbers and behaviors. The data analysis result can assist policymakers, health-care providers, and experts in medicine, public health, nursing and psychology, and has been provided to the Taiwan National Health Insurance Administration to assess elder health-promoting behavior.
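
    The two ingredients named in this abstract, an autocorrelation-based representation of each time series and fuzzy c-means memberships, can be sketched as follows. This is a minimal illustration under assumptions, not the study's pipeline: the function names, the number of lags, the fuzzifier `m = 2` and the example values are all hypothetical, and only the membership formula for fixed centers is shown rather than the full iterative algorithm.

```python
import math


def autocorrelation(series, max_lag):
    """Sample autocorrelation coefficients at lags 1..max_lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    return [
        sum((series[t] - mean) * (series[t + lag] - mean)
            for t in range(n - lag)) / var
        for lag in range(1, max_lag + 1)
    ]


def fuzzy_memberships(x, centers, m=2.0):
    """Fuzzy c-means membership of point x in each cluster center.

    Uses u_i = d_i^(-2/(m-1)) / sum_k d_k^(-2/(m-1)), with d_i the
    Euclidean distance from x to center i.
    """
    d = [math.dist(x, c) for c in centers]
    zeros = sum(1 for di in d if di == 0.0)
    if zeros:
        # x coincides with one or more centers: share membership among them.
        return [1.0 / zeros if di == 0.0 else 0.0 for di in d]
    inv = [di ** (-2.0 / (m - 1.0)) for di in d]
    total = sum(inv)
    return [v / total for v in inv]


# Hypothetical usage: represent a short series by two autocorrelation lags.
features = autocorrelation([1.0, 2.0, 3.0, 4.0, 5.0], max_lag=2)
memberships = fuzzy_memberships((0.4,), [(0.0,), (1.0,)])
```

    Unlike a hard assignment, each series receives a degree of membership in every cluster, and the degrees sum to one.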

  8. IPC two-color analysis of x ray galaxy clusters

    NASA Technical Reports Server (NTRS)

    White, Raymond E., III

    1990-01-01

    The mass distributions were determined of several clusters of galaxies by using X ray surface brightness data from the Einstein Observatory Imaging Proportional Counter (IPC). Determining cluster mass distributions is important for constraining the nature of the dark matter which dominates the mass of galaxies, galaxy clusters, and the Universe. Galaxy clusters are permeated with hot gas in hydrostatic equilibrium with the gravitational potentials of the clusters. Cluster mass distributions can be determined from x ray observations of cluster gas by using the equation of hydrostatic equilibrium and knowledge of the density and temperature structure of the gas. The x ray surface brightness at some distance from the cluster is the result of the volume x ray emissivity being integrated along the line of sight in the cluster.
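
    The two relations this abstract relies on can be written out explicitly. What follows is the standard textbook form for a spherically symmetric cluster with ideal-gas pressure, not necessarily the exact expressions used in the report; the symbols (r, R, \rho, T, \mu, m_p, \varepsilon, S) are notation introduced here for illustration.

```latex
% Hydrostatic equilibrium with ideal-gas pressure P = \rho k T / (\mu m_p)
\frac{dP}{dr} = -\frac{G\,M(<r)\,\rho(r)}{r^{2}}
\quad\Longrightarrow\quad
M(<r) = -\frac{k\,T(r)\,r}{G\,\mu\,m_p}
\left[\frac{d\ln\rho}{d\ln r} + \frac{d\ln T}{d\ln r}\right]

% Surface brightness at projected radius R: the volume emissivity
% \varepsilon(r) integrated along the line of sight
S(R) = 2\int_{R}^{\infty} \frac{\varepsilon(r)\,r\,dr}{\sqrt{r^{2}-R^{2}}}
```

    In this form the enclosed mass follows from the logarithmic gradients of the gas density and temperature, which is why both profiles are needed in addition to the surface brightness.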

  9. The composite sequential clustering technique for analysis of multispectral scanner data

    NASA Technical Reports Server (NTRS)

    Su, M. Y.

    1972-01-01

    The clustering technique consists of two parts: (1) a sequential statistical clustering which is essentially a sequential variance analysis, and (2) a generalized K-means clustering. In this composite clustering technique, the output of (1) is a set of initial clusters which are input to (2) for further improvement by an iterative scheme. This unsupervised composite technique was employed for automatic classification of two sets of remote multispectral earth resource observations. The classification accuracy by the unsupervised technique is found to be comparable to that by traditional supervised maximum likelihood classification techniques. The mathematical algorithms for the composite sequential clustering program and a detailed computer program description with job setup are given.
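
    The second stage described above, refining a set of initial clusters by an iterative scheme, can be sketched with ordinary (Lloyd-style) K-means. This is an assumption-laden illustration: the report's "generalized" K-means and its sequential variance analysis are not reproduced, and the function name, the seed centers and the sample points are hypothetical stand-ins for the output of stage (1).

```python
import math


def kmeans_refine(points, initial_centers, max_iter=100):
    """Refine initial cluster centers with standard K-means iterations."""
    centers = [tuple(c) for c in initial_centers]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:
                new_centers.append(tuple(
                    sum(mem[d] for mem in members) / len(members)
                    for d in range(len(old))
                ))
            else:
                new_centers.append(old)  # leave an empty cluster's center as is
        if new_labels == labels and new_centers == centers:
            break  # converged: nothing changed in this iteration
        labels, centers = new_labels, new_centers
    return centers, labels


# Hypothetical data: two well-separated groups and rough initial centers.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
seeds = [(0, 0), (5, 5)]
centers, labels = kmeans_refine(points, seeds)
```

    Starting from seeds supplied by an earlier pass, rather than random ones, is what makes the two-stage design attractive: the iterative step only has to improve clusters that are already roughly right.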

  10. The Use of Cluster Analysis in Typological Research on Community College Students

    ERIC Educational Resources Information Center

    Bahr, Peter Riley; Bielby, Rob; House, Emily

    2011-01-01

    One useful and increasingly popular method of classifying students is known commonly as cluster analysis. The variety of techniques that comprise the cluster analytic family are intended to sort observations (for example, students) within a data set into subsets (clusters) that share similar characteristics and differ in meaningful ways from other…

  11. A Multiple-Methods Approach to the Investigation of WAIS-R Constructs Employing Cluster Analysis.

    ERIC Educational Resources Information Center

    Fraboni, Maryann; And Others

    1989-01-01

    Seven hierarchical clustering methods were applied to the Wechsler Adult Intelligence Scale-Revised (WAIS-R) scores of 121 medical rehabilitation clients to investigate the possibility of method-dependent results and determine the stability of the clusters. This multiple-methods cluster analysis suggests that the underlying constructs of the…

  12. Investigating Faculty Familiarity with Assessment Terminology by Applying Cluster Analysis to Interpret Survey Data

    ERIC Educational Resources Information Center

    Raker, Jeffrey R.; Holme, Thomas A.

    2014-01-01

    A cluster analysis was conducted with a set of survey data on chemistry faculty familiarity with 13 assessment terms. Cluster groupings suggest a high, middle, and low overall familiarity with the terminology and an independent high and low familiarity with terms related to fundamental statistics. The six resultant clusters were found to be…

  13. Cluster analysis of rural, urban, and curbside atmospheric particle size data.

    PubMed

    Beddows, David C S; Dall'Osto, Manuel; Harrison, Roy M

    2009-07-01

    Particle size is a key determinant of the hazard posed by airborne particles. Continuous multivariate particle size data have been collected using aerosol particle size spectrometers sited at four locations within the UK: Harwell (Oxfordshire); Regents Park (London); British Telecom Tower (London); and Marylebone Road (London). These data have been analyzed using k-means cluster analysis, deduced to be the preferred cluster analysis technique from a choice of four partitional cluster packages, namely the following: Fuzzy; k-means; k-median; and Model-Based clustering. Using cluster validation indices, k-means clustering was shown to produce clusters with the smallest size, furthest separation, and, importantly, the highest degree of similarity between the elements within each partition. Using k-means clustering, the complexity of the data set is reduced, allowing characterization of the data according to the temporal and spatial trends of the clusters. At Harwell, the rural background measurement site, the cluster analysis showed that the spectra may be differentiated by their modal-diameters and average temporal trends showing either high counts during the day-time or night-time hours. Likewise for the urban sites, the cluster analysis differentiated the spectra into a small number of size distributions according to their modal-diameter, the location of the measurement site, and time of day. The responsible aerosol emission, formation, and dynamic processes can be inferred according to the cluster characteristics and correlation to concurrently measured meteorological, gas phase, and particle phase measurements. PMID:19673253

  14. The methodology of multi-viewpoint clustering analysis

    NASA Technical Reports Server (NTRS)

    Mehrotra, Mala; Wild, Chris

    1993-01-01

    One of the greatest challenges facing the software engineering community is the ability to produce large and complex computer systems, such as ground support systems for unmanned scientific missions, that are reliable and cost effective. In order to build and maintain these systems, it is important that the knowledge in the system be suitably abstracted, structured, and otherwise clustered in a manner which facilitates its understanding, manipulation, testing, and utilization. Development of complex mission-critical systems will require the ability to abstract overall concepts in the system at various levels of detail and to consider the system from different points of view. The Multi-ViewPoint Clustering Analysis (MVP-CA) methodology has been developed to provide multiple views of large, complicated systems. MVP-CA provides an ability to discover significant structures by providing an automated mechanism to structure both hierarchically (from detail to abstract) and orthogonally (from different perspectives). We propose to integrate MVP-CA into an overall software engineering life cycle to support the development and evolution of complex mission-critical systems.

  15. The XMM Cluster Survey: optical analysis methodology and the first data release

    NASA Astrophysics Data System (ADS)

    Mehrtens, Nicola; Romer, A. Kathy; Hilton, Matt; Lloyd-Davies, E. J.; Miller, Christopher J.; Stanford, S. A.; Hosmer, Mark; Hoyle, Ben; Collins, Chris A.; Liddle, Andrew R.; Viana, Pedro T. P.; Nichol, Robert C.; Stott, John P.; Dubois, E. Naomi; Kay, Scott T.; Sahlén, Martin; Young, Owain; Short, C. J.; Christodoulou, L.; Watson, William A.; Davidson, Michael; Harrison, Craig D.; Baruah, Leon; Smith, Mathew; Burke, Claire; Mayers, Julian A.; Deadman, Paul-James; Rooney, Philip J.; Edmondson, Edward M.; West, Michael; Campbell, Heather C.; Edge, Alastair C.; Mann, Robert G.; Sabirli, Kivanc; Wake, David; Benoist, Christophe; da Costa, Luiz; Maia, Marcio A. G.; Ogando, Ricardo

    2012-06-01

    The XMM Cluster Survey (XCS) is a serendipitous search for galaxy clusters using all publicly available data in the XMM-Newton Science Archive. Its main aims are to measure cosmological parameters and trace the evolution of X-ray scaling relations. In this paper we present the first data release from the XMM Cluster Survey (XCS-DR1). This consists of 503 optically confirmed, serendipitously detected, X-ray clusters. Of these clusters, 256 are new to the literature and 357 are new X-ray discoveries. We present 463 clusters with a redshift estimate (0.06 < z < 1.46), including 261 clusters with spectroscopic redshifts. The remainder have photometric redshifts. In addition, we have measured X-ray temperatures (TX) for 401 clusters (0.4 < TX < 14.7 keV). We highlight seven interesting subsamples of XCS-DR1 clusters: (i) 10 clusters at high redshift (z > 1.0, including a new spectroscopically confirmed cluster at z = 1.01); (ii) 66 clusters with high TX (>5 keV); (iii) 130 clusters/groups with low TX (<2 keV); (iv) 27 clusters with measured TX values in the Sloan Digital Sky Survey (SDSS) ‘Stripe 82’ co-add region; (v) 77 clusters with measured TX values in the Dark Energy Survey region; (vi) 40 clusters detected with sufficient counts to permit mass measurements (under the assumption of hydrostatic equilibrium); (vii) 104 clusters that can be used for applications such as the derivation of cosmological parameters and the measurement of cluster scaling relations. The X-ray analysis methodology used to construct and analyse the XCS-DR1 cluster sample has been presented in a companion paper, Lloyd-Davies et al.

  16. Highlights of the Merging Cluster Collaboration's Analysis of 26 Radio Relic Galaxy Cluster Mergers

    NASA Astrophysics Data System (ADS)

    Dawson, William; Golovich, Nathan; Wittman, David M.; Bradac, Marusa; Brüggen, Marcus; Bullock, James; Elbert, Oliver; Jee, James; Kaplinghat, Manoj; Kim, Stacy; Mahdavi, Andisheh; Merten, Julian; Ng, Karen; Annika, Peter; Rocha, Miguel E.; Sobral, David; Stroe, Andra; Van Weeren, Reinout J.; Merging Cluster Collaboration

    2016-01-01

    Merging galaxy clusters are now recognized as multifaceted probes providing unique insight into the properties of dark matter, the environmental impact of plasma shocks on galaxy evolution, and the physics of high energy particle acceleration. The Merging Cluster Collaboration has used the diffuse radio emission associated with the synchrotron radiation of relativistic particles accelerated by shocks generated during major cluster mergers (i.e. radio relics) to identify a homogeneous sample of 26 galaxy cluster mergers. We have confirmed theoretical expectations that radio relics are predominantly associated with mergers occurring near the plane of the sky and at a relatively common merger phase, making them ideal probes of self-interacting dark matter and eliminating much of the dominant uncertainty when relating the observed star formation rates to the event of the major cluster merger. We will highlight a number of the discovered common traits of this sample as well as detailed measurements of individual mergers.

  17. Cluster analysis of gene expression data based on self-splitting and merging competitive learning.

    PubMed

    Wu, Shuanhu; Liew, Alan Wee-Chung; Yan, Hong; Yang, Mengsu

    2004-03-01

    Cluster analysis of gene expression data from a cDNA microarray is useful for identifying biologically relevant groups of genes. However, finding the natural clusters in the data and estimating the correct number of clusters are still two largely unsolved problems. In this paper, we propose a new clustering framework that is able to address both these problems. By using the one-prototype-take-one-cluster (OPTOC) competitive learning paradigm, the proposed algorithm can find natural clusters in the input data, and the clustering solution is not sensitive to initialization. In order to estimate the number of distinct clusters in the data, we propose a cluster splitting and merging strategy. We have applied the new algorithm to simulated gene expression data for which the correct distribution of genes over clusters is known a priori. The results show that the proposed algorithm can find natural clusters and give the correct number of clusters. The algorithm has also been tested on real gene expression changes during yeast cell cycle, for which the fundamental patterns of gene expression and assignment of genes to clusters are well understood from numerous previous studies. Comparative studies with several clustering algorithms illustrate the effectiveness of our method. PMID:15055797

  18. A Hierarchical Bayesian Procedure for Two-Mode Cluster Analysis

    ERIC Educational Resources Information Center

    DeSarbo, Wayne S.; Fong, Duncan K. H.; Liechty, John; Saxton, M. Kim

    2004-01-01

    This manuscript introduces a new Bayesian finite mixture methodology for the joint clustering of row and column stimuli/objects associated with two-mode asymmetric proximity, dominance, or profile data. That is, common clusters are derived which partition both the row and column stimuli/objects simultaneously into the same derived set of clusters.…

  19. Higgs pair production: choosing benchmarks with cluster analysis

    NASA Astrophysics Data System (ADS)

    Carvalho, Alexandra; Dall'Osso, Martino; Dorigo, Tommaso; Goertz, Florian; Gottardo, Carlo A.; Tosi, Mia

    2016-04-01

    New physics theories often depend on a large number of free parameters. The phenomenology they predict for fundamental physics processes is in some cases drastically affected by the precise value of those free parameters, while in other cases it is left basically invariant at the level of detail experimentally accessible. When designing a strategy for the analysis of experimental data in the search for a signal predicted by a new physics model, it appears advantageous to categorize the parameter space describing the model according to the corresponding kinematical features of the final state. A multi-dimensional test statistic can be used to gauge the degree of similarity in the kinematics predicted by different models; a clustering algorithm using that metric may allow the division of the space into homogeneous regions, each of which can be successfully represented by a benchmark point. Searches targeting those benchmarks are then guaranteed to be sensitive to a large area of the parameter space.

  20. Archetypal TRMM Radar Profiles Identified Through Cluster Analysis

    NASA Technical Reports Server (NTRS)

    Boccippio, Dennis J.

    2003-01-01

    It is widely held that identifiable 'convective regimes' exist in nature, although precise definitions of these are elusive. Examples include land/ocean distinctions, break/monsoon behavior, seasonal differences in the Amazon (SON vs DJF), etc. These regimes are often described by differences in the realized local convective spectra, and measured by various metrics of convective intensity, depth, areal coverage and rainfall amount. Objective regime identification may be valuable in several ways: regimes may serve as natural 'branch points' in satellite retrieval algorithms or data assimilation efforts; one example might be objective identification of regions that 'should' share a similar Z-R relationship. Similarly, objectively defined regimes may provide guidance on optimal siting of ground validation efforts. Objectively defined regimes could also serve as natural (rather than arbitrary geographic) domain 'controls' in studies of convective response to environmental forcing. Quantification of convective vertical structure has traditionally involved parametric study of prescribed quantities thought to be important to convective dynamics: maximum radar reflectivity, cloud top height, 30-35 dBZ echo top height, rain rate, etc. Individually, these parameters are somewhat deficient as their interpretation is often nonunique (the same metric value may signify different physics in different storm realizations). Individual metrics also fail to capture the coherence and interrelationships between vertical levels available in full 3-D radar datasets. An alternative approach is discovery of natural partitions of vertical structure in a globally representative dataset, or 'archetypal' reflectivity profiles. In this study, this is accomplished through cluster analysis of a very large sample (O[10^7]) of TRMM-PR reflectivity columns. Once achieved, the rain-conditional and unconditional 'mix' of archetypal profile types in a given location and/or season provides a description

  1. Characterizing Heterogeneity within Head and Neck Lesions Using Cluster Analysis of Multi-Parametric MRI Data

    PubMed Central

    Borri, Marco; Schmidt, Maria A.; Powell, Ceri; Koh, Dow-Mu; Riddell, Angela M.; Partridge, Mike; Bhide, Shreerang A.; Nutting, Christopher M.; Harrington, Kevin J.; Newbold, Katie L.; Leach, Martin O.

    2015-01-01

    Purpose: To describe a methodology, based on cluster analysis, to partition multi-parametric functional imaging data into groups (or clusters) of similar functional characteristics, with the aim of characterizing functional heterogeneity within head and neck tumour volumes. To evaluate the performance of the proposed approach on a set of longitudinal MRI data, analysing the evolution of the obtained sub-sets with treatment. Material and Methods: The cluster analysis workflow was applied to a combination of dynamic contrast-enhanced and diffusion-weighted imaging MRI data from a cohort of squamous cell carcinoma of the head and neck patients. Cumulative distributions of voxels, containing pre and post-treatment data and including both primary tumours and lymph nodes, were partitioned into k clusters (k = 2, 3 or 4). Principal component analysis and cluster validation were employed to investigate data composition and to independently determine the optimal number of clusters. The evolution of the resulting sub-regions with induction chemotherapy treatment was assessed relative to the number of clusters. Results: The clustering algorithm was able to separate clusters which significantly reduced in voxel number following induction chemotherapy from clusters with a non-significant reduction. Partitioning with the optimal number of clusters (k = 4), determined with cluster validation, produced the best separation between reducing and non-reducing clusters. Conclusion: The proposed methodology was able to identify tumour sub-regions with distinct functional properties, independently separating clusters which were affected differently by treatment. This work demonstrates that unsupervised cluster analysis, with no prior knowledge of the data, can be employed to provide a multi-parametric characterization of functional heterogeneity within tumour volumes. PMID:26398888

  2. Clustered data analysis under miscategorized ordinal outcomes and missing covariates.

    PubMed

    Roy, Surupa; Rana, Subrata; Das, Kalyan

    2016-08-15

    The primary objective of this article is to examine the analysis of clustered ordinal models in which complete information on one or more covariates is unavailable. In addition, we focus on the analysis of miscategorized data, which arise in many situations because outcomes are often classified into a category that does not truly reflect their actual state. A general model structure is assumed to accommodate the information that is obtained via surrogate variables. The theoretical motivation arose while analysing an orthodontic data set to investigate the effects of age, sex and food habit on the extent of plaque deposit. The model we propose is quite flexible and is capable of tackling additional noise such as miscategorization and missingness, which occur frequently in data. A new two-step approach is proposed to estimate the parameters of the model. A rigorous simulation study has also been carried out to justify the validity of the model. Copyright © 2015 John Wiley & Sons, Ltd. PMID:26215983

  3. Cluster analysis application in research on pork quality determinants

    NASA Astrophysics Data System (ADS)

    Przybylski, W.; Wasiewicz, P.; Zieliński, P.; Gromadzka-Ostrowska, J.; Olczak, E.; Jaworska, D.; Niemyjski, S.; Santé-Lhoutellier, V.

    2010-09-01

    In this paper data mining methods were applied to investigate features determining high-quality pork meat. The aim of the study was to analyse how pork meat quality is conditioned in relation to HDL and LDL cholesterol concentration, plasma leptin, triglycerides, plasma glucose and serum. The research was carried out on 54 pigs originating from crossbreeding of Naima sows with boars of the P76-PenArLan hybrid line. Meat quality parameters were evaluated in samples of the Longissimus (LD) muscle taken behind the last rib, on the basis of the pH value, meat colour, drip loss, the RTN, intramuscular fat and glycolytic potential. The results of this study were processed using the R environment and show that cluster and regression analysis can be useful tools for in-depth analysis of the determinants of the quality of pig meat in homogeneous populations of pigs. However, the question of determinants of the level of glycogen and fat in meat requires further research.

  4. WebGimm: An integrated web-based platform for cluster analysis, functional analysis, and interactive visualization of results.

    PubMed

    Joshi, Vineet K; Freudenberg, Johannes M; Hu, Zhen; Medvedovic, Mario

    2011-01-01

    Cluster analysis methods have been extensively researched, but the adoption of new methods is often hindered by technical barriers in their implementation and use. WebGimm is a free cluster analysis web-service, and an open source general purpose clustering web-server infrastructure designed to facilitate easy deployment of integrated cluster analysis servers based on clustering and functional annotation algorithms implemented in R. Integrated functional analyses and interactive browsing of both clustering structure and functional annotations provide a complete analytical environment for cluster analysis and interpretation of results. The Java Web Start client-based interface is modeled after the familiar cluster/treeview packages, making its use intuitive to a wide array of biomedical researchers. For biomedical researchers, WebGimm provides an avenue to access state-of-the-art clustering procedures. For bioinformatics methods developers, WebGimm offers a convenient avenue to deploy their newly developed clustering methods. WebGimm server, software and manuals can be freely accessed at http://ClusterAnalysis.org/. PMID:21241501

  5. Hierarchical cluster analysis applied to workers' exposures in fiberglass insulation manufacturing.

    PubMed

    Wu, J D; Milton, D K; Hammond, S K; Spear, R C

    1999-01-01

    The objectives of this study were to explore the application of cluster analysis to the characterization of multiple exposures in industrial hygiene practice and to compare exposure groupings based on the results of cluster analysis with those based on non-measurement-based approaches commonly used in epidemiology. Cluster analysis was performed for 37 workers simultaneously exposed to three agents (endotoxin, phenolic compounds and formaldehyde) in fiberglass insulation manufacturing. Different clustering algorithms, including complete-linkage (or farthest-neighbor), single-linkage (or nearest-neighbor), group-average and model-based clustering approaches, were used to construct the tree structures from which clusters can be formed. Differences were observed between the exposure clusters constructed by these different clustering algorithms. When contrasting the exposure classification based on tree structures with that based on non-measurement-based information, the results indicate that the exposure clusters identified from the tree structures had little in common with the classification results from either the traditional exposure zone or the work group classification approach. In terms of defining homogeneous exposure groups or from the standpoint of health risk, some toxicological normalization in the components of the exposure vector appears to be required in order to form meaningful exposure groupings from cluster analysis. Finally, it remains important to see if the lack of correspondence between exposure groups based on epidemiological classification and measurement data is a peculiarity of the data or a more general problem in multivariate exposure analysis. PMID:10028893
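
    The observation that different linkage rules can produce different clusters from the same data is easy to reproduce on a toy example. The sketch below is a naive agglomerative procedure written for illustration only (the function name and the one-dimensional test points are assumptions, and the model-based approach mentioned above is not covered). It repeatedly merges the two closest clusters, where "closest" is defined by single, complete or group-average linkage.

```python
def agglomerate(points, k, linkage="complete"):
    """Merge the two closest clusters until k remain.
    Returns the clusters as sorted lists of point indices."""

    def dist(i, j):
        return sum((a - b) ** 2 for a, b in zip(points[i], points[j])) ** 0.5

    def cluster_distance(first, second):
        pairwise = [dist(i, j) for i in first for j in second]
        if linkage == "single":      # nearest neighbour
            return min(pairwise)
        if linkage == "complete":    # farthest neighbour
            return max(pairwise)
        if linkage == "average":     # group average
            return sum(pairwise) / len(pairwise)
        raise ValueError("unknown linkage: " + linkage)

    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        # Find the pair of clusters with the smallest linkage distance.
        a, b = min(
            ((x, y) for x in range(len(clusters)) for y in range(x + 1, len(clusters))),
            key=lambda xy: cluster_distance(clusters[xy[0]], clusters[xy[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return sorted(sorted(c) for c in clusters)
```

    For points whose spacing grows steadily along a line, single linkage tends to chain neighbours into one long cluster, while complete linkage prefers compact groups, so the two rules can cut the same data in different places.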

  6. Genomic Gene Clustering Analysis of Pathways in Eukaryotes

    PubMed Central

    Lee, Jennifer M.; Sonnhammer, Erik L.L.

    2003-01-01

    Genomic clustering of genes in a pathway is commonly found in prokaryotes due to transcriptional operons, but these are not present in most eukaryotes. Yet there might be clustering, to a lesser extent, of pathway members in eukaryotic genomes that assists coregulation of a set of functionally cooperating genes. We analyzed five sequenced eukaryotic genomes for clustering of genes assigned to the same pathway in the KEGG database. Between 98% and 30% of the analyzed pathways in a genome were found to exhibit significantly higher clustering levels than expected by chance. In descending order by the level of clustering, the genomes studied were Saccharomyces cerevisiae, Homo sapiens, Caenorhabditis elegans, Arabidopsis thaliana, and Drosophila melanogaster. Surprisingly, there is not much agreement between genomes in terms of which pathways are most clustered. Only seven of 69 pathways found in all species were significantly clustered in all five of them. This species-specific pattern of pathway clustering may reflect adaptations or evolutionary events unique to a particular lineage. We note that although operons are common in C. elegans, only 58% of the pathways showed significant clustering, which is less than in human. Virtually all pathways in S. cerevisiae showed significant clustering. PMID:12695325

  7. TreeSOM: Cluster analysis in the self-organizing map.

    PubMed

    Samsonova, Elena V; Kok, Joost N; Ijzerman, Ad P

    2006-01-01

    Clustering problems arise in various domains of science and engineering. A large number of methods have been developed to date. The Kohonen self-organizing map (SOM) is a popular tool that maps a high-dimensional space onto a small number of dimensions by placing similar elements close together, forming clusters. Cluster analysis is often left to the user. In this paper we present the method TreeSOM and a set of tools to perform unsupervised SOM cluster analysis, determine cluster confidence and visualize the result as a tree facilitating comparison with existing hierarchical classifiers. We also introduce a distance measure for cluster trees that allows one to select a SOM with the most confident clusters. PMID:16781116

  8. An effective fuzzy kernel clustering analysis approach for gene expression data.

    PubMed

    Sun, Lin; Xu, Jiucheng; Yin, Jiaojiao

    2015-01-01

    Fuzzy clustering is an important tool for analyzing microarray data. A major problem in applying fuzzy clustering methods to microarray gene expression data is the choice of parameters such as the cluster number and centers. This paper proposes a new approach to fuzzy kernel clustering analysis (FKCA) that identifies the desired cluster number and obtains more stable results for gene expression data. First, to optimize characteristic differences and estimate the optimal cluster number, a Gaussian kernel function is introduced to improve the spectrum analysis method (SAM). By combining subtractive clustering with the max-min distance mean, a maximum distance method (MDM) is proposed to determine cluster centers. Then, the corresponding steps of the improved SAM (ISAM) and MDM are given, and their superiority and stability are illustrated through experimental comparisons on gene expression data. Finally, by introducing ISAM and MDM into FKCA, an effective improved FKCA algorithm is proposed. Experimental results from public gene expression data and the UCI database show that the proposed algorithms are feasible for cluster analysis, and that the clustering accuracy is higher than that of other related clustering algorithms. PMID:26405958

  9. AVES: A Computer Cluster System approach for INTEGRAL Scientific Analysis

    NASA Astrophysics Data System (ADS)

    Federici, M.; Martino, B. L.; Natalucci, L.; Umbertini, P.

    The AVES computing system, based on a "cluster" architecture, is a fully integrated, low-cost computing facility dedicated to the archiving and analysis of INTEGRAL data. AVES is a modular system that uses the SLURM software resource manager and allows almost unlimited expandability (65,536 nodes and hundreds of thousands of processors); it currently comprises 30 personal computers with quad-core CPUs, able to reach a computing power of 300 gigaflops (300 × 10^9 floating-point operations per second), with 120 GB of RAM and 7.5 terabytes (TB) of storage in UFS configuration plus 6 TB for the users' area. AVES was designed and built to solve the growing problems raised by the analysis of the large amount of data accumulated by the INTEGRAL mission (currently about 9 TB), which is due to increase every year. The analysis software used is the OSA package, distributed by the ISDC in Geneva. This is a very complex package consisting of dozens of programs that cannot be converted to parallel computing. To overcome this limitation we developed a series of programs that distribute the analysis workload over the various nodes, making AVES automatically divide the analysis into N jobs sent to N cores. This solution produces a result similar to that obtained with a parallel computing configuration. In support of this we have developed tools that allow flexible use of the scientific software and quality control of on-line data storage. The AVES software package consists of about 50 specific programs. As a result, the overall computing time, compared with that of a single-processor personal computer, has been improved by up to a factor of 70.
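
    The idea of dividing a non-parallelisable analysis into N independent jobs can be illustrated with a short sketch. It is not the AVES software: the function name and the notion of a flat list of work items are assumptions made for the example, and the actual submission of each job to a core through the resource manager is left out.

```python
def split_into_jobs(items, n_jobs):
    """Deal the work items round-robin into at most n_jobs lists.
    Each returned list is one independent job; empty jobs are dropped."""
    if n_jobs < 1:
        raise ValueError("n_jobs must be at least 1")
    jobs = [[] for _ in range(n_jobs)]
    for index, item in enumerate(items):
        jobs[index % n_jobs].append(item)
    return [job for job in jobs if job]
```

    Because each job runs the unmodified serial analysis on its own share of the data, the speed-up comes from running the jobs side by side rather than from parallelising the analysis code itself, which is the design choice the abstract describes.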

  10. Cluster analysis of indermediate deep events in the southeastern Aegean

    NASA Astrophysics Data System (ADS)

    Ruscic, Marija; Becker, Dirk; Brüstle, Andrea; Meier, Thomas

    2015-04-01

    The Hellenic subduction zone (HSZ) is the seismically most active region in Europe, where the oceanic African lithosphere is subducting beneath the continental Aegean plate. Although there are numerous studies of seismicity in the HSZ, very few focus on the eastern HSZ and the Wadati-Benioff zone of the subducting slab in that part of the HSZ. In order to gain a better understanding of the geodynamic processes in the region, a dense local seismic network is required. From September 2005 to March 2007, the temporary seismic network EGELADOS was deployed covering the entire HSZ. It consisted of 56 onshore and 23 offshore broadband stations, with the addition of 19 stations from GEOFON, NOA and MedNet to complete the network. Here, we focus on a cluster of intermediate-depth seismicity recorded by the EGELADOS network within the subducting African slab in the region of the Nysiros volcano. The cluster consists of 159 events at 80 to 190 km depth with magnitudes between 0.2 and 4.1 that were located using the nonlinear location tool NonLinLoc. A double-difference earthquake relocation using the HypoDD software is performed with both manual readings of onset times and differential traveltimes obtained by separate cross correlation of P- and S-waveforms. Single event locations are compared to relative relocations. The event hypocenters fall into a thin zone close to the top of the slab, defining its geometry with an accuracy of a few kilometers. At intermediate depth the slab is dipping towards the NW at an angle of about 30°. That means it is dipping more steeply than in the western part of the HSZ. The edge of the slab is clearly defined by an abrupt disappearance of intermediate-depth seismicity towards the NE. It is found approximately beneath the Turkish coastline. Furthermore, results of a cluster analysis based on the cross correlation of three-component waveforms are shown as a function of frequency, and the spatio-temporal migration of the seismic activity is analysed.

  11. The Cluster Analysis of the Databases of the Orbital Parameters of Artificial Satellites

    NASA Astrophysics Data System (ADS)

    Shakun, L. S.; Koshkin, N. I.

    Cluster analysis of a database of orbit parameters of artificial satellites. L. Shakun, N. Koshkin. A relational database of orbital parameters of near-Earth space objects (SO) has been created. For 2007, correlation and cluster analysis was carried out on variations of the A* values for 4.5 thousand low-Earth orbit (LEO) objects. Clusters of LEO objects with similar atmospheric drag behaviour are identified.

  12. Study on Cluster Analysis Used with Laser-Induced Breakdown Spectroscopy

    NASA Astrophysics Data System (ADS)

    He, Li'ao; Wang, Qianqian; Zhao, Yu; Liu, Li; Peng, Zhong

    2016-06-01

    Supervised learning methods (e.g. PLS-DA, SVM, etc.) have been widely used with laser-induced breakdown spectroscopy (LIBS) to classify materials; however, they may yield a low correct classification rate if a test sample type is not included in the training dataset. Unsupervised cluster analysis methods (hierarchical clustering analysis, K-means clustering analysis, and the iterative self-organizing data analysis technique, ISODATA) are investigated in this paper for plastics classification based on the line intensities of LIBS emission. The results of hierarchical clustering analysis using four different similarity measuring methods (single linkage, complete linkage, unweighted pair-group average, and weighted pair-group average) are compared. In K-means clustering analysis, four methods of choosing initial centers are applied and their results are compared. The classification results of hierarchical clustering analysis, K-means clustering analysis, and ISODATA are analyzed. The experimental results demonstrate that cluster analysis methods can be applied to plastics discrimination with LIBS. Supported by Beijing Natural Science Foundation of China (No. 4132063)

  13. Fuzzy and hard clustering analysis for thyroid disease.

    PubMed

    Azar, Ahmad Taher; El-Said, Shaimaa Ahmed; Hassanien, Aboul Ella

    2013-07-01

    Thyroid hormones produced by the thyroid gland help regulate the body's metabolism. A variety of methods have been proposed in the literature for thyroid disease classification. As far as we know, clustering techniques have not so far been applied to thyroid disease data sets. This paper proposes a comparison between hard and fuzzy clustering algorithms for a thyroid disease data set in order to find the optimal number of clusters. Different scalar validity measures are used in comparing the performances of the proposed clustering systems. To demonstrate the performance of each algorithm, the feature values that represent thyroid disease are used as input for the system. Several runs are carried out and recorded with a different number of clusters being specified for each run (between 2 and 11), so as to establish the optimum number of clusters. To find the optimal number of clusters, the so-called elbow criterion is applied. The experimental results revealed that for all algorithms, the elbow was located at c=3. The clustering results for all algorithms are then visualized by the Sammon mapping method to find a low-dimensional (normally 2D or 3D) representation of a set of points distributed in a high dimensional pattern space. At the end of this study, some recommendations are formulated to improve the determination of the actual number of clusters present in the data set. PMID:23357404
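
    The elbow criterion mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes plain hard K-means with a deterministic farthest-first initialisation, uses the within-cluster sum of squares as the validity measure, locates the elbow as the largest second difference of that curve, and runs on synthetic three-group data rather than the thyroid data set. The function names are hypothetical.

```python
import numpy as np

def farthest_first(X, k):
    """Deterministic initial centers: start at X[0], then repeatedly take
    the point farthest from all centers chosen so far."""
    chosen = [0]
    d = np.linalg.norm(X - X[0], axis=1)
    for _ in range(1, k):
        nxt = int(np.argmax(d))
        chosen.append(nxt)
        d = np.minimum(d, np.linalg.norm(X - X[nxt], axis=1))
    return X[chosen].copy()

def kmeans_wcss(X, k, n_iter=100):
    """Run Lloyd's K-means and return the within-cluster sum of squares."""
    centers = farthest_first(X, k)
    for _ in range(n_iter):
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return float((dist.min(axis=1) ** 2).sum())

def elbow(ks, wcss):
    """Return the k at which the curve bends most (largest second difference)."""
    second = np.diff(wcss, n=2)  # second[i] belongs to ks[i + 1]
    return ks[int(np.argmax(second)) + 1]

# Synthetic data: three compact groups at the corners of a triangle.
rng = np.random.default_rng(0)
corners = np.array([[0.0, 0.0], [10.0, 0.0], [5.0, 8.66]])
X = np.vstack([c + rng.normal(0.0, 0.5, size=(50, 2)) for c in corners])

ks = list(range(1, 7))
wcss = [kmeans_wcss(X, k) for k in ks]
best_k = elbow(ks, wcss)
print(best_k)
```

    In practice the elbow is often read off a plot by eye; the second-difference rule used here is just one simple way to automate that judgment.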

  14. Stochastic analysis of the extra clustering model for animal grouping.

    PubMed

    Drmota, Michael; Fuchs, Michael; Lee, Yi-Wen

    2016-07-01

    We consider the extra clustering model which was introduced by Durand et al. (J Theor Biol 249(2):262-270, 2007) in order to describe the grouping of social animals and to test whether genetic relatedness is the main driving force behind the group formation process. Durand and François (J Math Biol 60(3):451-468, 2010) provided a first stochastic analysis of this model by deriving (amongst other things) asymptotic expansions for the mean value of the number of groups. In this paper, we will give a much finer analysis of the number of groups. More precisely, we will derive asymptotic expansions for all higher moments and give a complete characterization of the possible limit laws. In the most interesting case (neutral model), we will prove a central limit theorem with a surprising normalization. In the remaining cases, the limit law will be either a mixture of a discrete and continuous law or a discrete law. Our results show that, except in degenerate cases, strong concentration around the mean value takes place only for the neutral model, whereas in the remaining cases there is also mass concentration away from the mean. PMID:26520857

  15. Investigating Regional Disparities of China's Human Development with Cluster Analysis: A Historical Perspective

    ERIC Educational Resources Information Center

    Yang, Yongheng; Hu, Angang

    2008-01-01

    This paper adopts both one-dimensional and multi-dimensional cluster analysis to analyze China's HDI data for 1982, 1995, 1999, and 2003, and to classify China's provinces into four tiers based on the three basic developmental aspects embedded in HDI. The classifications by cluster analysis depend on the observations' similarities with respect to…

  16. Segmenting Business Students Using Cluster Analysis Applied to Student Satisfaction Survey Results

    ERIC Educational Resources Information Center

    Gibson, Allen

    2009-01-01

    This paper demonstrates a new application of cluster analysis to segment business school students according to their degree of satisfaction with various aspects of the academic program. The resulting clusters provide additional insight into drivers of student satisfaction that are not evident from analysis of the responses of the student body as a…

  17. Tracking Undergraduate Student Achievement in a First-Year Physiology Course Using a Cluster Analysis Approach

    ERIC Educational Resources Information Center

    Brown, S. J.; White, S.; Power, N.

    2015-01-01

    A cluster analysis data classification technique was used on assessment scores from 157 undergraduate nursing students who passed 2 successive compulsory courses in human anatomy and physiology. Student scores in five summative assessment tasks, taken in each of the courses, were used as inputs for a cluster analysis procedure. We aimed to group…

  18. Towards Effective Clustering Techniques for the Analysis of Electric Power Grids

    SciTech Connect

    Hogan, Emilie A.; Cotilla Sanchez, Jose E.; Halappanavar, Mahantesh; Wang, Shaobu; Mackey, Patrick S.; Hines, Paul; Huang, Zhenyu

    2013-11-30

    Clustering is an important data analysis technique with numerous applications in the analysis of electric power grids. Standard clustering techniques are oblivious to the rich structural and dynamic information available for power grids. Therefore, by exploiting the inherent topological and electrical structure in the power grid data, we propose new methods for clustering with applications to model reduction, locational marginal pricing, phasor measurement unit (PMU or synchrophasor) placement, and power system protection. We focus our attention on model reduction for analysis based on time-series information from synchrophasor measurement devices, and spectral techniques for clustering. By comparing different clustering techniques on two instances of realistic power grids we show that the solutions are related and therefore one could leverage that relationship for a computational advantage. Thus, by contrasting different clustering techniques we make a case for exploiting structure inherent in the data with implications for several domains including power systems.

  19. Leukaemia clusters in childhood: geographical analysis in Britain.

    PubMed Central

    Knox, E G

    1994-01-01

    STUDY OBJECTIVE--To validate previously demonstrated spatial clustering of childhood leukaemias by showing relative proximities of selected map features to cluster locations, compared with control locations. If clusters are real, then they are likely to be close to a determining hazard. DESIGN--Cluster postcode loci and partially matched control postcodes were compared in terms of distances to railways, main roads, churches, surface water, woodland areas, and railside industrial installations. Further supporting comparisons between non-clustered cases and random postcode controls with those map features representable as single grid points were made. SETTING--England, Wales, and Scotland 1966-83. SUBJECTS--Grid referenced registrations of 9406 childhood leukaemias and non-Hodgkin's lymphomas, including 264 pairs (or more) separated by < 150 m, and grid references of random postcodes in equal numbers. MAIN RESULTS--The 264 clusters showed relative proximities (or the inverse) to several map features, of which the most powerful was an association with railways. The non-railway associations seemed to be statistically indirect. Some railside industrial installations, identified from a railway atlas, also showed relative proximities to leukaemia clusters, as well as to non-clustered cases, but did not "explain" the railway effect. These installations, with seemingly independent geographical associations, included oil refineries, petrochemical plants, oil storage and oil distribution depots, power stations, and steelworks. CONCLUSIONS--The previously shown childhood leukaemia clusters are confirmed to be non-random through their systematic associations with certain map features when compared with the control locations. The common patterns of close association of clustered and non-clustered cases imply a common aetiological component arising from a common environmental hazard--namely the use of fossil fuels, especially petroleum. PMID:7964336

  20. Two worlds collide: Image analysis methods for quantifying structural variation in cluster molecular dynamics

    SciTech Connect

    Steenbergen, K. G.; Gaston, N.

    2014-02-14

    Inspired by methods of remote sensing image analysis, we analyze structural variation in cluster molecular dynamics (MD) simulations through a unique application of the principal component analysis (PCA) and Pearson Correlation Coefficient (PCC). The PCA analysis characterizes the geometric shape of the cluster structure at each time step, yielding a detailed and quantitative measure of structural stability and variation at finite temperature. Our PCC analysis captures bond structure variation in MD, which can be used to both supplement the PCA analysis as well as compare bond patterns between different cluster sizes. Relying only on atomic position data, without requirement for a priori structural input, PCA and PCC can be used to analyze both classical and ab initio MD simulations for any cluster composition or electronic configuration. Taken together, these statistical tools represent powerful new techniques for quantitative structural characterization and isomer identification in cluster MD.
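
    One plausible reading of "PCA characterizes the geometric shape of the cluster structure at each time step" is sketched below. It is an assumption, not the authors' implementation: the principal components are taken to be the eigenvalues of the covariance matrix of the atomic positions in a single frame, so that one dominant eigenvalue indicates a rod-like shape and three equal eigenvalues indicate a compact, isotropic shape. The function name and the toy coordinates are hypothetical.

```python
import numpy as np

def shape_eigenvalues(positions):
    """Eigenvalues (descending) of the covariance of atomic positions
    for one MD frame; `positions` has shape (n_atoms, 3)."""
    centered = positions - positions.mean(axis=0)
    cov = centered.T @ centered / len(positions)
    return np.sort(np.linalg.eigvalsh(cov))[::-1]

# Toy frame 1: atoms on a straight line (rod-like structure).
rod = np.column_stack([np.linspace(-1.0, 1.0, 11), np.zeros(11), np.zeros(11)])

# Toy frame 2: atoms on the corners of a cube (isotropic structure).
cube = np.array([[x, y, z] for x in (-1, 1) for y in (-1, 1) for z in (-1, 1)],
                dtype=float)

rod_ev = shape_eigenvalues(rod)
cube_ev = shape_eigenvalues(cube)
print(rod_ev, cube_ev)
```

    Applying the function frame by frame along a trajectory would give a time series of three eigenvalues whose changes reflect changes of shape.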

  1. Neighborhood effects on an individual's health using neighborhood measurements developed by factor analysis and cluster analysis.

    PubMed

    Li, Yu-Sheng; Chuang, Ying-Chih

    2009-01-01

    This study suggests a multivariate-structural approach combining factor analysis and cluster analysis that could be used to examine neighborhood effects on an individual's health. Data were from the Taiwan Social Change Survey conducted in 1990, 1995, and 2000. In total, 5,784 women and men aged over 20 years living in 428 neighborhoods were interviewed. Participants' addresses were geocoded with census data for measuring neighborhood-level characteristics. The factor analysis was applied to identify neighborhood dimensions, which were used as entities in the cluster analysis to generate a neighborhood typology. The factor analysis generated three neighborhood dimensions: neighborhood education, age structure, and neighborhood family structure and employment. The cluster analysis generated six types of neighborhoods with combinations of the three neighborhood dimensions. Multilevel binomial regression models were used to assess the effects of neighborhoods on an individual's health. The results showed that the biggest health differences were between two neighborhood types: (1) the highest concentration of inhabitants younger than 15 years, a moderate education level, and a moderate level of single-parent families and (2) the highest educational level, a median level of single-parent families, and a median level of elderly concentrations. Individuals living in the first type had significantly higher chances of having functional limitations and poor self-rated health than the individuals in the second neighborhood type. Our study suggests that the multivariate-structural approach improves neighborhood measurements by addressing neighborhood diversity and examining how an individual's health varies in different neighborhood contexts. PMID:18629650

  2. An evaluation of centrality measures used in cluster analysis

    NASA Astrophysics Data System (ADS)

    Engström, Christopher; Silvestrov, Sergei

    2014-12-01

    Clustering of data into groups of similar objects plays an important part when analysing many types of data, especially when the datasets are large, as they often are in, for example, bioinformatics, social networks and computational linguistics. Many clustering algorithms, such as K-means and some types of hierarchical clustering, need a number of centroids representing the 'center' of the clusters. The choice of centroids for the initial clusters often plays an important role in the quality of the clusters. Since a data point with a high centrality supposedly lies close to the 'center' of some cluster, this can be used to assign centroids rather than using some other method such as picking them at random. Some work has been done to evaluate the use of centrality measures such as degree, betweenness and eigenvector centrality in clustering algorithms. The aim of this article is to compare and evaluate the usefulness of a number of common centrality measures, such as those mentioned above and others such as PageRank and related measures.

  3. Gennclus: New Models for General Nonhierarchical Clustering Analysis.

    ERIC Educational Resources Information Center

    Desarbo, Wayne S.

    1982-01-01

    A general class of nonhierarchical clustering models and associated algorithms for fitting them are presented. These models generalize the Shepard-Arabie Additive clusters model. Two applications are given and extensions to three-way models, nonmetric analyses, and other model specifications are provided. (Author/JKS)

  4. Alternatives to Multilevel Modeling for the Analysis of Clustered Data

    ERIC Educational Resources Information Center

    Huang, Francis L.

    2016-01-01

    Multilevel modeling has grown in use over the years as a way to deal with the nonindependent nature of observations found in clustered data. However, other alternatives to multilevel modeling are available that can account for observations nested within clusters, including the use of Taylor series linearization for variance estimation, the design…

  5. Multilevel Analysis Methods for Partially Nested Cluster Randomized Trials

    ERIC Educational Resources Information Center

    Sanders, Elizabeth A.

    2011-01-01

    This paper explores multilevel modeling approaches for 2-group randomized experiments in which a treatment condition involving clusters of individuals is compared to a control condition involving only ungrouped individuals, otherwise known as partially nested cluster randomized designs (PNCRTs). Strategies for comparing groups from a PNCRT in the…

  6. Detecting Hotspots from Taxi Trajectory Data Using Spatial Cluster Analysis

    NASA Astrophysics Data System (ADS)

    Zhao, P. X.; Qin, K.; Zhou, Q.; Liu, C. K.; Chen, Y. X.

    2015-07-01

    A method of trajectory clustering based on a decision graph and a data field is proposed in this paper. The method uses the data field to describe the spatial distribution of trajectory points and the decision graph to discover cluster centres. It can automatically determine cluster parameters and is suitable for trajectory clustering. The method is applied to taxi trajectory data from Wuhan City, China, for a holiday (May 1st, 2014), a weekday (Wednesday, May 7th, 2014) and a weekend day (Saturday, May 10th, 2014). The hotspots in four one-hour periods (8:00-9:00, 12:00-13:00, 18:00-19:00 and 23:00-24:00) on the three days are discovered and visualized in heat maps. In the future, we will further investigate the spatiotemporal distribution and regularities of these hotspots, and use more data to carry out the experiments.

  7. Boundaries, links and clusters: a new paradigm in spatial analysis?

    PubMed Central

    Jacquez, Geoff M.; Kaufmann, Andy; Goovaerts, Pierre

    2008-01-01

    This paper develops and applies new techniques for the simultaneous detection of boundaries and clusters within a probabilistic framework. The new statistic “little b” (written bij) evaluates boundaries between adjacent areas with different values, as well as links between adjacent areas with similar values. Clusters of high values (hotspots) and low values (coldspots) are then constructed by joining areas abutting locations that are significantly high (e.g., an unusually high disease rate) and that are connected through a “link” such that the values in the adjoining areas are not significantly different. Two techniques are proposed and evaluated for accomplishing cluster construction: “big B” and the “ladder” approach. We compare the statistical power and empirical Type I and Type II error of these approaches to those of wombling and the local Moran test. Significance may be evaluated using distribution theory based on the product of two continuous (i.e., non-discrete) variables. We also provide a “distribution free” algorithm based on resampling of the observed values. The methods are applied to simulated data for which the locations of boundaries and clusters are known, and compared and contrasted with clusters found using the local Moran statistic and with polygon Womble boundaries. The little b approach to boundary detection is comparable to polygon wombling in terms of Type I error, Type II error and empirical statistical power. For cluster detection, both the big B and ladder approaches have lower Type I and Type II error and are more powerful than the local Moran statistic. The new methods are not constrained to find clusters of a pre-specified shape, such as circles, ellipses and donuts, and yield a more accurate description of geographic variation than alternative cluster tests that presuppose a specific cluster shape. We recommend these techniques over existing cluster and boundary detection methods that do not provide such a

  8. Analysis of Bow Shock Oscillations Observed by the Cluster Spacecraft

    NASA Astrophysics Data System (ADS)

    Kruparova, O.; Maksimovic, M.; Krupar, V.; Santolik, O.; Soucek, J.; Safrankova, J.; Nemecek, Z.

    2014-12-01

    We present preliminary results of an analysis of multiple bow shock crossings lasting several hours that were observed by the four Cluster spacecraft when their separation distances were less than 1000 km. Using a simple timing method, we determined the shock normal and the velocity along this normal for a large number of events. We have calculated bow shock standoff distances assuming that the shock surface has a parabolic shape. These distances have been compared with the distances predicted by gas-dynamic models based on upstream plasma parameters measured by the ACE spacecraft. We analyze the oscillations of the standoff distance during multiple crossings in order to define a typical frequency of the bow shock motion and to find the upstream origin of these fluctuations. We also compare the angles θBn (the angle between the magnetic field and the shock normal) retrieved by the timing method with the angles calculated by an iterative method based on Rankine-Hugoniot jump conditions. We have achieved a good agreement between these two techniques.
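
    The "simple timing method" is not spelled out in this abstract. A common four-spacecraft formulation, assumed here, treats the shock as a plane moving at constant speed V along its normal n, so that each spacecraft pair satisfies (r_i - r_0) · n = V (t_i - t_0). Solving the resulting 3 × 3 linear system for m = n / V gives both quantities. The positions, times and function name below are hypothetical test values, not Cluster data.

```python
import numpy as np

def timing_normal_velocity(positions, times):
    """Planar-boundary timing analysis for four spacecraft.

    positions: (4, 3) array of spacecraft positions (km)
    times:     (4,) array of crossing times (s)
    Returns the unit normal and the speed along it (km/s).
    """
    r = np.asarray(positions, dtype=float)
    t = np.asarray(times, dtype=float)
    D = r[1:] - r[0]             # separation vectors relative to spacecraft 0
    dt = t[1:] - t[0]            # crossing-time delays relative to spacecraft 0
    m = np.linalg.solve(D, dt)   # m = n / V
    speed = 1.0 / np.linalg.norm(m)
    normal = m * speed
    return normal, speed

# Synthetic check: a plane with a known normal and speed sweeps over
# a tetrahedron with 500 km separations.
positions = np.array([[0.0, 0.0, 0.0],
                      [500.0, 0.0, 0.0],
                      [0.0, 500.0, 0.0],
                      [0.0, 0.0, 500.0]])
n_true = np.array([1.0, 2.0, 2.0]) / 3.0
v_true = 20.0
times = 100.0 + positions @ n_true / v_true

normal, speed = timing_normal_velocity(positions, times)
print(normal, speed)
```

    The system becomes ill-conditioned when the four spacecraft are nearly coplanar, so the tetrahedron geometry matters for real events.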

  9. Cluster Computing For Real Time Seismic Array Analysis.

    NASA Astrophysics Data System (ADS)

    Martini, M.; Giudicepietro, F.

    A seismic array is an instrument composed of a dense distribution of seismic sensors that makes it possible to measure the directional properties of the wavefield (slowness or wavenumber vector) radiated by a seismic source. In recent years arrays have been widely used in different fields of seismological research. In particular, they are applied in the investigation of seismic sources on volcanoes, where they can be successfully used for studying volcanic microtremor and long-period events, which are critical for obtaining information on the evolution of volcanic systems. For this reason arrays could be usefully employed for volcano monitoring; however, the huge amount of data produced by this type of instrument and the time-consuming processing techniques have limited their potential for this application. In order to favor a direct application of array techniques to continuous volcano monitoring, we designed and built a small PC cluster able to compute in near real time the kinematic properties of the wavefield (slowness or wavenumber vector) produced by local seismic sources. The cluster is composed of 8 dual-processor Intel Pentium-III PCs working at 550 MHz and has 4 gigabytes of RAM. It runs under the Linux operating system. The analysis software package developed is based on the Multiple SIgnal Classification (MUSIC) algorithm and is written in Fortran. The message-passing part is based upon the LAM programming environment package, an open-source implementation of the Message Passing Interface (MPI). The software system includes modules devoted to receiving data via the internet and graphical applications for the continuous display of the processing results. The system has been tested with a data set collected during a seismic experiment conducted on Etna in 1999, when two dense seismic arrays were deployed on the northeast and southeast flanks of the volcano. A real time continuous acquisition system has been simulated by

  10. Visual cluster analysis in support of clinical decision intelligence.

    PubMed

    Gotz, David; Sun, Jimeng; Cao, Nan; Ebadollahi, Shahram

    2011-01-01

    Electronic health records (EHRs) contain a wealth of information about patients. In addition to providing efficient and accurate records for individual patients, large databases of EHRs contain valuable information about overall patient populations. While statistical insights describing an overall population are beneficial, they are often not specific enough to use as the basis for individualized patient-centric decisions. To address this challenge, we describe an approach based on patient similarity which analyzes an EHR database to extract a cohort of patient records most similar to a specific target patient. Clusters of similar patients are then visualized to allow interactive visual refinement by human experts. Statistics are then extracted from the refined patient clusters and displayed to users. The statistical insights taken from these refined clusters provide personalized guidance for complex decisions. This paper focuses on the cluster refinement stage where an expert user must interactively (a) judge the quality and contents of automatically generated similar patient clusters, and (b) refine the clusters based on his/her expertise. We describe the DICON visualization tool which allows users to interactively view and refine multidimensional similar patient clusters. We also present results from a preliminary evaluation where two medical doctors provided feedback on our approach. PMID:22195102

  11. Visual Cluster Analysis in Support of Clinical Decision Intelligence

    PubMed Central

    Gotz, David; Sun, Jimeng; Cao, Nan; Ebadollahi, Shahram

    2011-01-01

    Electronic health records (EHRs) contain a wealth of information about patients. In addition to providing efficient and accurate records for individual patients, large databases of EHRs contain valuable information about overall patient populations. While statistical insights describing an overall population are beneficial, they are often not specific enough to use as the basis for individualized patient-centric decisions. To address this challenge, we describe an approach based on patient similarity which analyzes an EHR database to extract a cohort of patient records most similar to a specific target patient. Clusters of similar patients are then visualized to allow interactive visual refinement by human experts. Statistics are then extracted from the refined patient clusters and displayed to users. The statistical insights taken from these refined clusters provide personalized guidance for complex decisions. This paper focuses on the cluster refinement stage where an expert user must interactively (a) judge the quality and contents of automatically generated similar patient clusters, and (b) refine the clusters based on his/her expertise. We describe the DICON visualization tool which allows users to interactively view and refine multidimensional similar patient clusters. We also present results from a preliminary evaluation where two medical doctors provided feedback on our approach. PMID:22195102

  12. Cluster Analysis in Patients with GOLD 1 Chronic Obstructive Pulmonary Disease

    PubMed Central

    Gagnon, Philippe; Casaburi, Richard; Saey, Didier; Porszasz, Janos; Provencher, Steeve; Milot, Julie; Bourbeau, Jean; O’Donnell, Denis E.; Maltais, François

    2015-01-01

    Background: We hypothesized that heterogeneity exists within the Global Initiative for Chronic Obstructive Lung Disease (GOLD) 1 spirometric category and that different subgroups could be identified within this GOLD category. Methods: Pre-randomization study participants from two clinical trials were symptomatic/asymptomatic GOLD 1 chronic obstructive pulmonary disease (COPD) patients and healthy controls. A hierarchical cluster analysis used pre-randomization demographics, symptom scores, lung function, peak exercise response and daily physical activity levels to derive population subgroups. Results: Considerable heterogeneity existed for clinical variables among patients with GOLD 1 COPD. All parameters, except forced expiratory volume in 1 second (FEV1)/forced vital capacity (FVC), had considerable overlap between GOLD 1 COPD and controls. Three clusters were identified: cluster I (18 [15%] COPD patients; 105 [85%] controls); cluster II (45 [80%] COPD patients; 11 [20%] controls); and cluster III (22 [92%] COPD patients; 2 [8%] controls). Apart from reduced diffusion capacity and lower baseline dyspnea index versus controls, cluster I COPD patients had otherwise preserved lung volumes, exercise capacity and physical activity levels. Cluster II COPD patients had a higher smoking history and greater hyperinflation versus cluster I COPD patients. Cluster III COPD patients had reduced physical activity versus controls and clusters I and II COPD patients, and lower FEV1/FVC versus clusters I and II COPD patients. Conclusions: The results emphasize heterogeneity within GOLD 1 COPD, supporting an individualized therapeutic approach to patients. Trial registration: www.clinicaltrials.gov, NCT01360788 and NCT01072396. PMID:25906326

  13. Topological Analysis of Emerging Bipole Clusters Producing Violent Solar Events

    NASA Astrophysics Data System (ADS)

    Mandrini, C. H.; Schmieder, B.; Démoulin, P.; Guo, Y.; Cristiani, G. D.

    2014-06-01

    During the rising phase of Solar Cycle 24 tremendous activity occurred on the Sun with rapid and compact emergence of magnetic flux leading to bursts of flares (C to M and even X-class). We investigate the violent events occurring in the cluster of two active regions (ARs), NOAA numbers 11121 and 11123, observed in November 2010 with instruments onboard the Solar Dynamics Observatory and from Earth. Within one day the total magnetic flux increased by 70 % with the emergence of new groups of bipoles in AR 11123. From all the events on 11 November, we study, in particular, the ones starting at around 07:16 UT in GOES soft X-ray data and the brightenings preceding them. A magnetic-field topological analysis indicates the presence of null points, associated separatrices, and quasi-separatrix layers (QSLs) where magnetic reconnection is prone to occur. The presence of null points is confirmed by a linear and a non-linear force-free magnetic-field model. Their locations and general characteristics are similar in both modelling approaches, which supports their robustness. However, in order to explain the full extension of the analysed event brightenings, which are not restricted to the photospheric traces of the null separatrices, we compute the locations of QSLs. Based on this more complete topological analysis, we propose a scenario to explain the origin of a low-energy event preceding a filament eruption, which is accompanied by a two-ribbon flare, and a consecutive confined flare in AR 11123. The results of our topology computation can also explain the locations of flare ribbons in two other events, one preceding and one following the ones at 07:16 UT. Finally, this study provides further examples where flare-ribbon locations can be explained when compared to QSLs, but only partially when using separatrices.

  14. Cluster analysis of Wisconsin Breast Cancer dataset using self-organizing maps.

    PubMed

    Pantazi, Stefan; Kagolovsky, Yuri; Moehr, Jochen R

    2002-01-01

    This work deals with multidimensional data analysis, specifically cluster analysis applied to a very well known dataset, the Wisconsin Breast Cancer dataset. After an introduction to the topics of the paper, the concept of cluster analysis is briefly explained and different methods of cluster analysis are compared. Further, the Kohonen model of self-organizing maps is briefly described, together with an example and with explanations of how cluster analysis can be performed using the maps. After describing the data set and the methodology used for the analysis, we present the findings using textual as well as visual descriptions and conclude that the approach is a useful complement for assessing multidimensional data and that this dataset has been overused for automated decision benchmarking purposes, without a thorough analysis of the data it contains. PMID:15460731

  15. Clinical Significance of Asthma Clusters by Longitudinal Analysis in Korean Asthma Cohort

    PubMed Central

    Kim, Sujeong; Yoon, Sun-young; Kwon, Hyouk-Soo; Chang, Yoon-Seok; Cho, You Sook; Jang, An-Soo; Park, Jung Won; Nahm, Dong-Ho; Yoon, Ho-Joo; Cho, Sang-Heon; Cho, Young-Joo; Choi, ByoungWhui; Moon, Hee-Bom; Kim, Tae-Bum

    2013-01-01

    Background: We have previously identified four distinct groups of asthma patients in Korean cohorts using cluster analysis: (A) smoking asthma, (B) severe obstructive asthma, (C) early-onset atopic asthma, and (D) late-onset mild asthma. Methods and Results: A longitudinal analysis of each cluster in a Korean adult asthma cohort was performed to investigate the clinical significance of asthma clusters over 12 months. Cluster A showed relatively high asthma control test (ACT) scores but relatively low FEV1 scores, despite a high percentage of systemic corticosteroid use. Cluster B had the lowest mean FEV1, ACT, and the quality of life questionnaire for adult Korean asthmatics (QLQAKA) scores throughout the year, even though the percentage of systemic corticosteroid use was the highest among the four clusters. Cluster C was ranked second in terms of FEV1, with the second lowest percentage of systemic corticosteroid use, and showed a marked improvement in subjective symptoms over time. Cluster D consistently showed the highest FEV1, the lowest systemic corticosteroid use, and had high ACT and QLQAKA scores. Conclusion: Our asthma clusters had clinical significance with consistency among clusters over 12 months. These distinctive phenotypes may be useful in classifying asthma in real practice. PMID:24391784

  16. Analysis of the dynamical cluster approximation for the Hubbard model

    NASA Astrophysics Data System (ADS)

    Aryanpour, K.; Hettler, M. H.; Jarrell, M.

    2002-04-01

    We examine a central approximation of the recently introduced dynamical cluster approximation (DCA) by example of the Hubbard model. By both analytical and numerical means we study noncompact and compact contributions to the thermodynamic potential. We show that approximating noncompact diagrams by their cluster analogs results in a larger systematic error as compared to the compact diagrams. Consequently, only the compact contributions should be taken from the cluster, whereas noncompact graphs should be inferred from the appropriate Dyson equation. The distinction between noncompact and compact diagrams persists even in the limit of infinite dimensions. Nonlocal corrections beyond the DCA exist for the noncompact diagrams, whereas they vanish for compact diagrams.

  17. Marketing Mix Formulation for Higher Education: An Integrated Analysis Employing Analytic Hierarchy Process, Cluster Analysis and Correspondence Analysis

    ERIC Educational Resources Information Center

    Ho, Hsuan-Fu; Hung, Chia-Chi

    2008-01-01

    Purpose: The purpose of this paper is to examine how a graduate institute at National Chiayi University (NCYU), by using a model that integrates analytic hierarchy process, cluster analysis and correspondence analysis, can develop effective marketing strategies. Design/methodology/approach: This is primarily a quantitative study aimed at…

  18. Cluster analysis of spontaneous preterm birth phenotypes identifies potential associations among preterm birth mechanisms

    PubMed Central

    Esplin, M Sean; Manuck, Tracy A.; Varner, Michael W.; Christensen, Bryce; Biggio, Joseph; Bukowski, Radek; Parry, Samuel; Zhang, Heping; Huang, Hao; Andrews, William; Saade, George; Sadovsky, Yoel; Reddy, Uma M.; Ilekis, John

    2015-01-01

    Objective We sought to employ an innovative tool based on common biological pathways to identify specific phenotypes among women with spontaneous preterm birth (SPTB), in order to enhance investigators' ability to identify and highlight common mechanisms and underlying genetic factors responsible for SPTB. Study Design A secondary analysis of a prospective case-control multicenter study of SPTB. All cases delivered a preterm singleton at SPTB ≤34.0 weeks gestation. Each woman was assessed for the presence of underlying SPTB etiologies. A hierarchical cluster analysis was used to identify groups of women with homogeneous phenotypic profiles. One of the phenotypic clusters was selected for candidate gene association analysis using VEGAS software. Results 1028 women with SPTB were assigned phenotypes. Hierarchical clustering of the phenotypes revealed five major clusters. Cluster 1 (N=445) was characterized by maternal stress, cluster 2 (N=294) by premature membrane rupture, cluster 3 (N=120) by familial factors, and cluster 4 (N=63) by maternal comorbidities. Cluster 5 (N=106) was multifactorial, characterized by infection (INF), decidual hemorrhage (DH) and placental dysfunction (PD). These three phenotypes were highly correlated by Chi-square analysis [PD and DH (p<2.2e-6); PD and INF (p=6.2e-10); INF and DH (p=0.0036)]. Gene-based testing identified the INS (insulin) gene as significantly associated with cluster 3 of SPTB. Conclusion We identified 5 major clusters of SPTB based on a phenotype tool and hierarchical clustering. There was significant correlation between several of the phenotypes. The INS gene was associated with familial factors underlying SPTB. PMID:26070700

  19. Topic modeling for cluster analysis of large biological and medical datasets

    PubMed Central

    2014-01-01

    Background The big data moniker is nowhere better deserved than to describe the ever-increasing prodigiousness and complexity of biological and medical datasets. New methods are needed to generate and test hypotheses, foster biological interpretation, and build validated predictors. Although multivariate techniques such as cluster analysis may allow researchers to identify groups, or clusters, of related variables, the accuracy and effectiveness of traditional clustering methods diminish for large and hyperdimensional datasets. Topic modeling is an active research field in machine learning and has been mainly used as an analytical tool to structure large textual corpora for data mining. Its ability to reduce high dimensionality to a small number of latent variables makes it suitable as a means for clustering or overcoming clustering difficulties in large biological and medical datasets. Results In this study, three topic model-derived clustering methods, highest probable topic assignment, feature selection and feature extraction, are proposed and tested on the cluster analysis of three large datasets: Salmonella pulsed-field gel electrophoresis (PFGE) dataset, lung cancer dataset, and breast cancer dataset, which represent various types of large biological or medical datasets. All three methods are shown to improve the efficacy/effectiveness of clustering results on the three datasets in comparison to traditional methods. A preferable cluster analysis method emerged for each of the three datasets on the basis of replicating known biological truths. Conclusion Topic modeling could be advantageously applied to the large datasets of biological or medical research. The three proposed topic model-derived clustering methods, highest probable topic assignment, feature selection and feature extraction, yield clustering improvements for the three different data types. Clusters more efficaciously represent truthful groupings and subgroupings in the data than

  20. Development and optimization of SPECT gated blood pool cluster analysis for the prediction of CRT outcome

    SciTech Connect

    Lalonde, Michel; Wassenaar, Richard; Wells, R. Glenn; Birnie, David; Ruddy, Terrence D.

    2014-07-15

    Purpose: Phase analysis of single photon emission computed tomography (SPECT) radionuclide angiography (RNA) has been investigated for its potential to predict the outcome of cardiac resynchronization therapy (CRT). However, phase analysis may be limited in its potential at predicting CRT outcome as valuable information may be lost by assuming that time-activity curves (TAC) follow a simple sinusoidal shape. A new method, cluster analysis, is proposed which directly evaluates the TACs and may lead to a better understanding of dyssynchrony patterns and CRT outcome. Cluster analysis algorithms were developed and optimized to maximize their ability to predict CRT response. Methods: About 49 patients (N = 27 ischemic etiology) received a SPECT RNA scan as well as positron emission tomography (PET) perfusion and viability scans prior to undergoing CRT. A semiautomated algorithm sampled the left ventricle wall to produce 568 TACs from SPECT RNA data. The TACs were then subjected to two different cluster analysis techniques, K-means, and normal average, where several input metrics were also varied to determine the optimal settings for the prediction of CRT outcome. Each TAC was assigned to a cluster group based on the comparison criteria and global and segmental cluster size and scores were used as measures of dyssynchrony and used to predict response to CRT. A repeated random twofold cross-validation technique was used to train and validate the cluster algorithm. Receiver operating characteristic (ROC) analysis was used to calculate the area under the curve (AUC) and compare results to those obtained for SPECT RNA phase analysis and PET scar size analysis methods. Results: Using the normal average cluster analysis approach, the septal wall produced statistically significant results for predicting CRT results in the ischemic population (ROC AUC = 0.73;p < 0.05 vs. equal chance ROC AUC = 0.50) with an optimal operating point of 71% sensitivity and 60% specificity. 
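
    The ROC quantities quoted in this abstract (area under the curve, sensitivity and specificity at an operating point) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy scores and the rule "score >= threshold means predicted responder" are all assumptions made for illustration.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a random positive outscores a random
    negative (ties count half). labels: 1 = responder, 0 = non-responder."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(scores, labels, threshold):
    """Assumed decision rule: score >= threshold predicts a responder."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical dyssynchrony scores; not data from the study.
toy_scores = [0.9, 0.3, 0.8, 0.4]
toy_labels = [1, 1, 0, 0]
auc = roc_auc(toy_scores, toy_labels)
sens, spec = sensitivity_specificity(toy_scores, toy_labels, 0.5)
```

    An AUC of 0.50 corresponds to the "equal chance" baseline mentioned in the abstract; the operating point is simply the threshold at which a particular sensitivity and specificity pair is reported.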

  1. Visual cluster analysis and pattern recognition template and methods

    DOEpatents

    Osbourn, G.C.; Martinez, R.F.

    1999-05-04

    A method of clustering using a novel template to define a region of influence is disclosed. Using neighboring approximation methods, computation times can be significantly reduced. The template and method are applicable and improve pattern recognition techniques. 30 figs.

  2. Visual cluster analysis and pattern recognition template and methods

    DOEpatents

    Osbourn, Gordon Cecil; Martinez, Rubel Francisco

    1999-01-01

    A method of clustering using a novel template to define a region of influence. Using neighboring approximation methods, computation times can be significantly reduced. The template and method are applicable and improve pattern recognition techniques.

  3. Analysis of the convective evaporation of nondilute clusters of drops

    NASA Technical Reports Server (NTRS)

    Bellan, J.; Harstad, K.

    1987-01-01

    The penetration distance of an outer flow into a drop cluster volume is the critical, evaporation mode-controlling parameter in the present model for nondilute drop clusters' convective evaporation. The model is found to perform well for such low penetration distances as those obtained for dense clusters in hot environments and low relative velocities between the outer gases and the cluster. For large penetration distances, however, the predictive power of the model deteriorates; in addition, the evaporation time is found to be a weak function of the initial relative velocity and a strong function of the initial drop temperature. The results generally show that the interior drop temperature was transient throughout the drop lifetime, although temperature nonuniformities persisted up to the first third of the total evaporation time at most.

  4. Detecting data fabrication in clinical trials from cluster analysis perspective.

    PubMed

    Wu, Xiaoru; Carlsson, Martin

    2011-01-01

    Detecting data fabrication is of great importance in clinical trials. As the role of statisticians in detecting abnormal data patterns has grown, a large number of statistical procedures have been developed, most of which are based on descriptive statistics. Based upon the fact that substantial data fabrication cases have certain clustering structures, this paper discusses the potential for the use of statistical clustering methods in fraud detection. Three clustering patterns, angular, neighborhood and repeated measurements clustering, are identified and explored. Correspondingly, simple and efficient test statistics are proposed and randomization tests are carried out. The proposed methods are applied to a 12-week multi-center study for illustration. Extensive simulations are conducted to validate the effectiveness of the procedures. PMID:20936626

  5. Visual cluster analysis and pattern recognition template and methods

    SciTech Connect

    Osbourn, G.C.; Martinez, R.F.

    1993-12-31

    This invention is comprised of a method of clustering using a novel template to define a region of influence. Using neighboring approximation methods, computation times can be significantly reduced. The template and method are applicable and improve pattern recognition techniques.

  6. Dynamics of cD clusters of galaxies. II: Analysis of seven Abell clusters

    NASA Technical Reports Server (NTRS)

    Oegerle, William R.; Hill, John M.

    1994-01-01

    We have investigated the dynamics of the seven Abell clusters A193, A399, A401, A1795, A1809, A2063, and A2124, based on redshift data reported previously by us (Hill & Oegerle 1993). These papers present the initial results of a survey of cD cluster kinematics, with an emphasis on studying the nature of peculiar velocity cD galaxies and their parent clusters. In the current sample, we find no evidence for significant peculiar cD velocities, with respect to the global velocity distribution. However, the cD in A2063 has a significant (3 sigma) peculiar velocity with respect to galaxies in the inner 1.5 Mpc/h, which is likely due to the merger of a subcluster with A2063. We also find significant evidence for subclustering in A1795, and a marginally peculiar cD velocity with respect to galaxies within approximately 200 kpc/h of the cD. The available x-ray, optical, and galaxy redshift data strongly suggest that a subcluster has merged with A1795. We propose that the subclusters which merged with A1795 and A2063 were relatively small, with shallow potential wells, so that the cooling flows in these clusters were not disrupted. Two-body gravitational models of the A399/401 and A2063/MKW3S systems indicate that A399/401 is a bound pair with a total virial mass of approximately 4 × 10^15 solar masses/h, while A2063 and MKW3S are very unlikely to be bound.

  7. Delineation of river bed-surface patches by clustering high-resolution spatial grain size data

    NASA Astrophysics Data System (ADS)

    Nelson, Peter A.; Bellugi, Dino; Dietrich, William E.

    2014-01-01

    The beds of gravel-bed rivers commonly display distinct sorting patterns, which at length scales of ~ 0.1 - 1 channel widths appear to form an organization of patches or facies. This paper explores alternatives to traditional visual facies mapping by investigating methods of patch delineation in which clustering analysis is applied to a high-resolution grid of spatial grain-size distributions (GSDs) collected during a flume experiment. Specifically, we examine four clustering techniques: 1) partitional clustering of grain-size distributions with the k-means algorithm (assigning each GSD to a type of patch based solely on its distribution characteristics), 2) spatially-constrained agglomerative clustering ("growing" patches by merging adjacent GSDs, thus generating a hierarchical structure of patchiness), 3) spectral clustering using Normalized Cuts (using the spatial distance between GSDs and the distribution characteristics to generate a matrix describing the similarity between all GSDs, and using the eigenvalues of this matrix to divide the bed into patches), and 4) fuzzy clustering with the fuzzy c-means algorithm (assigning each GSD a membership probability to every patch type). For each clustering method, we calculate metrics describing how well-separated cluster-average GSDs are and how patches are arranged in space. We use these metrics to compute optimal clustering parameters, to compare the clustering methods against each other, and to compare clustering results with patches mapped visually during the flume experiment. All clustering methods produced better-separated patch GSDs than the visually-delineated patches. Although they do not produce crisp cluster assignment, fuzzy algorithms provide useful information that can characterize the uncertainty of a location on the bed belonging to any particular type of patch, and they can be used to characterize zones of transition from one patch to another. The extent to which spatial information influences
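
    The graded (non-crisp) assignment that this abstract attributes to fuzzy c-means can be sketched with the standard membership formula u_j = 1 / sum_k (d_j / d_k)^(2/(m-1)). The Python sketch below is illustrative only: the function name, the Euclidean distance, the fuzzifier m = 2 and the two-value feature vectors are assumptions, not details taken from the paper.

```python
import math


def fuzzy_memberships(point, centers, m=2.0):
    """Fuzzy c-means membership of one point in each cluster center.

    m > 1 is the fuzzifier; larger m gives softer memberships.
    The returned memberships sum to 1.
    """
    dists = [math.dist(point, c) for c in centers]
    # A point lying exactly on one or more centers belongs fully to them.
    zeros = [j for j, d in enumerate(dists) if d == 0.0]
    if zeros:
        return [1.0 / len(zeros) if j in zeros else 0.0
                for j in range(len(centers))]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((dj / dk) ** power for dk in dists) for dj in dists]


# Hypothetical summary features for one grid cell and two patch types.
u = fuzzy_memberships((1.0, 0.0), [(0.0, 0.0), (4.0, 0.0)])
```

    A cell with memberships close to an even split would sit in a transition zone between patch types, which is the kind of uncertainty information the abstract describes.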

  8. Evidence-Based Clustering of Reads and Taxonomic Analysis of Metagenomic Data

    NASA Astrophysics Data System (ADS)

    Folino, Gianluigi; Gori, Fabio; Jetten, Mike S. M.; Marchiori, Elena

    The rapidly emerging field of metagenomics seeks to examine the genomic content of communities of organisms to understand their roles and interactions in an ecosystem. In this paper we focus on clustering methods and their application to taxonomic analysis of metagenomic data. Clustering analysis for metagenomics amounts to grouping similar partial sequences, such as raw sequence reads, into clusters in order to discover information about the internal structure of the considered dataset, or the relative abundance of protein families. Different methods for clustering analysis of metagenomic datasets have been proposed. Here we focus on evidence-based methods for clustering that employ knowledge extracted from proteins identified by a BLASTx search (proxygenes). We consider two clustering algorithms introduced in previous works and a new one. We discuss advantages and drawbacks of the algorithms, and use them to perform taxonomic analysis of metagenomic data. To this aim, three real-life benchmark datasets used in previous work on metagenomic data analysis are used. Comparison of the results indicates satisfactory coherence of the taxonomies output by the three algorithms, with respect to phylogenetic content at the class level and taxonomic distribution at phylum level. In general, the experimental comparative analysis substantiates the effectiveness of evidence-based clustering methods for taxonomic analysis of metagenomic data.

  9. Applying Clustering to Statistical Analysis of Student Reasoning about Two-Dimensional Kinematics

    ERIC Educational Resources Information Center

    Springuel, R. Padraic; Wittman, Michael C.; Thompson, John R.

    2007-01-01

    We use clustering, an analysis method not presently common to the physics education research community, to group and characterize student responses to written questions about two-dimensional kinematics. Previously, clustering has been used to analyze multiple-choice data; we analyze free-response data that includes both sketches of vectors and…

  10. A method of using cluster analysis to study statistical dependence in multivariate data

    NASA Technical Reports Server (NTRS)

    Borucki, W. J.; Card, D. H.; Lyle, G. C.

    1975-01-01

    A technique is presented that uses both cluster analysis and a Monte Carlo significance test of clusters to discover associations between variables in multidimensional data. The method is applied to an example of a noisy function in three-dimensional space, to a sample from a mixture of three bivariate normal distributions, and to the well-known Fisher's Iris data.

  11. Identifying At-Risk Students in General Chemistry via Cluster Analysis of Affective Characteristics

    ERIC Educational Resources Information Center

    Chan, Julia Y. K.; Bauer, Christopher F.

    2014-01-01

    The purpose of this study is to identify academically at-risk students in first-semester general chemistry using affective characteristics via cluster analysis. Through the clustering of six preselected affective variables, three distinct affective groups were identified: low (at-risk), medium, and high. Students in the low affective group…

  12. Social Learning Network Analysis Model to Identify Learning Patterns Using Ontology Clustering Techniques and Meaningful Learning

    ERIC Educational Resources Information Center

    Firdausiah Mansur, Andi Besse; Yusof, Norazah

    2013-01-01

    Clustering on Social Learning Networks is still not widely explored, especially when the network focuses on an e-learning system. Conventional methods are not really suitable for e-learning data. SNA requires content analysis, which involves human intervention and needs to be carried out manually. Some of the previous clustering techniques need…

  13. Wavelet analysis to characterize cluster dynamics in a circulating fluidized bed

    SciTech Connect

    Guenther, C.; Breault, R.W.

    2007-04-30

    A common hydrodynamic feature in heavily loaded circulating fluidized beds is the presence of clusters. The continuous formation and destruction of clusters strongly influences particle hold-up, pressure drop, heat transfer at the wall, and mixing. In this paper fiber optic data is analyzed using discrete wavelet analysis to characterize the dynamic behavior of clusters. Five radial positions at three different axial locations under five different operating conditions spanning three different flow regimes were analyzed using discrete wavelets. Results are summarized with respect to cluster size and frequency.

  14. Quantitative Methylation Analysis of the PCDHB Gene Cluster.

    PubMed

    Banelli, Barbara; Romani, Massimo

    2015-01-01

    Long Range Epigenetic Silencing (LRES) is a repressed chromatin state of large chromosomal regions caused by DNA hypermethylation and histone modifications and is commonly observed in cancer. At 5q31 a LRES region of 800 kb includes three multi-gene clusters (PCDHA@, PCDHB@, and PCDHG@, respectively). Multiple lines of experimental evidence have led to the PCDHB cluster being considered a DNA methylation marker of aggressiveness in neuroblastoma, the second most common solid tumor in childhood. Because of its potential involvement not only in neuroblastoma but also in other malignancies, an easy and fast assay to screen the DNA methylation content of the PCDHB cluster might be useful for the precise stratification of the patients into risk groups and hence for choosing the most appropriate therapeutic protocol. Accordingly, we have developed a simple and cost-effective Pyrosequencing® assay to evaluate the methylation level of 17 genes in the protocadherin B cluster (PCDHB@). The rationale behind this Pyrosequencing assay can in principle be applied to analyze the DNA methylation level of any gene cluster with high homologies for screening purposes. PMID:26103900

  15. Cluster analysis applied to CO₂ concentrations at a rural site.

    PubMed

    Pérez, Isidro A; Sánchez, M Luisa; García, M Ángeles; Ozores, Marta; Pardo, Nuria

    2015-02-01

    In rural environments, atmospheric CO2 is mainly controlled by natural processes such as respiration-photosynthesis or low atmosphere evolution. This paper considers atmospheric CO2 measurements obtained at a rural site during 2011 using the wavelength-scanned cavity ringdown spectroscopy technique and presents two clustering methods, the silhouette being calculated to evaluate procedure validity. In the first method, clusters were formed depending on the similarity of wind roses, with satisfactory silhouette values. An anticyclonic rotation of the wind direction was observed during the daily cycle and clusters were formed by consecutive directions following the mixing layer evolution. However, monthly roses revealed four quite different wind directions, mainly oriented in the E-W axis. Although CO2 was not used in this procedure, a successful link between clusters and CO2 was obtained. In the second procedure, clusters were formed by the similarity of CO2 histograms calculated in intervals of one or two ancillary variables, wind direction, time of day, or month. The influence of a nearby city, the daily evolution of the low atmosphere, and the growing season were highlighted. Finally, the usefulness of the method lies in its easy extension to other gases or variables. PMID:25300184
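
    The silhouette used in this abstract to judge cluster validity has a compact definition: for each observation, s = (b - a) / max(a, b), where a is the mean distance to the other members of its own cluster and b is the smallest mean distance to any other cluster. A minimal Python sketch follows; restricting it to one-dimensional values with absolute difference as the distance is an assumption made to keep it short, and the numbers are invented.

```python
def silhouette_scores(values, labels):
    """Per-observation silhouette for 1-D values with |x - y| as distance."""
    clusters = {}
    for x, lab in zip(values, labels):
        clusters.setdefault(lab, []).append(x)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for x, lab in zip(values, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        # (the point's zero distance to itself adds nothing to the sum)
        a = sum(abs(x - y) for y in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(abs(x - y) for y in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        scores.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return scores


# Invented values forming two well-separated groups.
s = silhouette_scores([0.0, 1.0, 10.0, 11.0], [0, 0, 1, 1])
mean_silhouette = sum(s) / len(s)
```

    Values near 1 indicate compact, well-separated clusters; values near 0 or below suggest overlapping or misassigned observations.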

  16. Fuzzy clustering analysis of the first 10 MEIC chemicals.

    PubMed

    Sârbu, C; Pop, H F

    2000-03-01

    In this paper, we discuss the classification results of the toxicological responses of 32 in vivo and in vitro test systems to the first 10 MEIC chemicals. To this end we have used different fuzzy clustering algorithms, namely hierarchical fuzzy clustering, hierarchical and horizontal fuzzy characteristics clustering and a new clustering technique, namely fuzzy hierarchical cross-classification. The characteristics clustering technique produces fuzzy partitions of the characteristics (chemicals) involved and thus it is a useful tool for studying the (dis)similarities between different chemicals and for essential chemicals selection. The cross-classification algorithm produces not only a fuzzy partition of the test systems analyzed, but also a fuzzy partition of the considered 10 MEIC (multicentre evaluation of in vitro cytotoxicity) chemicals. In this way it is possible to identify which chemicals are responsible for the similarities or differences observed between different groups of test systems. Put another way, it reveals the specific sensitivity of a chemical to one or more toxicological tests. PMID:10665388

  17. Molecular-dynamics analysis of mobile helium cluster reactions near surfaces of plasma-exposed tungsten

    SciTech Connect

    Hu, Lin; Maroudas, Dimitrios; Hammond, Karl D.; Wirth, Brian D.

    2015-10-28

    We report the results of a systematic atomic-scale analysis of the reactions of small mobile helium clusters (Heₙ, 4 ≤ n ≤ 7) near low-Miller-index tungsten (W) surfaces, aiming at a fundamental understanding of the near-surface dynamics of helium-carrying species in plasma-exposed tungsten. These small mobile helium clusters are attracted to the surface and migrate to the surface by Fickian diffusion and drift due to the thermodynamic driving force for surface segregation. As the clusters migrate toward the surface, trap mutation (TM) and cluster dissociation reactions are activated at rates higher than in the bulk. TM produces W adatoms and immobile complexes of helium clusters surrounding W vacancies located within the lattice planes at a short distance from the surface. These reactions are identified and characterized in detail based on the analysis of a large number of molecular-dynamics trajectories for each such mobile cluster near W(100), W(110), and W(111) surfaces. TM is found to be the dominant cluster reaction for all cluster and surface combinations, except for the He₄ and He₅ clusters near W(100) where cluster partial dissociation following TM dominates. We find that there exists a critical cluster size, n = 4 near W(100) and W(111) and n = 5 near W(110), beyond which the formation of multiple W adatoms and vacancies in the TM reactions is observed. The identified cluster reactions are responsible for important structural, morphological, and compositional features in the plasma-exposed tungsten, including surface adatom populations, near-surface immobile helium-vacancy complexes, and retained helium content, which are expected to influence the amount of hydrogen recycling and tritium retention in fusion tokamaks.

  18. Molecular-dynamics analysis of mobile helium cluster reactions near surfaces of plasma-exposed tungsten

    NASA Astrophysics Data System (ADS)

    Hu, Lin; Hammond, Karl D.; Wirth, Brian D.; Maroudas, Dimitrios

    2015-10-01

    We report the results of a systematic atomic-scale analysis of the reactions of small mobile helium clusters (Heₙ, 4 ≤ n ≤ 7) near low-Miller-index tungsten (W) surfaces, aiming at a fundamental understanding of the near-surface dynamics of helium-carrying species in plasma-exposed tungsten. These small mobile helium clusters are attracted to the surface and migrate to the surface by Fickian diffusion and drift due to the thermodynamic driving force for surface segregation. As the clusters migrate toward the surface, trap mutation (TM) and cluster dissociation reactions are activated at rates higher than in the bulk. TM produces W adatoms and immobile complexes of helium clusters surrounding W vacancies located within the lattice planes at a short distance from the surface. These reactions are identified and characterized in detail based on the analysis of a large number of molecular-dynamics trajectories for each such mobile cluster near W(100), W(110), and W(111) surfaces. TM is found to be the dominant cluster reaction for all cluster and surface combinations, except for the He₄ and He₅ clusters near W(100) where cluster partial dissociation following TM dominates. We find that there exists a critical cluster size, n = 4 near W(100) and W(111) and n = 5 near W(110), beyond which the formation of multiple W adatoms and vacancies in the TM reactions is observed. The identified cluster reactions are responsible for important structural, morphological, and compositional features in the plasma-exposed tungsten, including surface adatom populations, near-surface immobile helium-vacancy complexes, and retained helium content, which are expected to influence the amount of hydrogen recycling and tritium retention in fusion tokamaks.

  19. Applying Social Networking and Clustering Algorithms to Galaxy Groups in ALFALFA

    NASA Astrophysics Data System (ADS)

    Bramson, Ali; Wilcots, E. M.

    2012-01-01

    Because most galaxies live in groups, and the environment in which it resides affects the evolution of a galaxy, it is crucial to develop tools to understand how galaxies are distributed within groups. At the same time we must understand how groups are distributed and connected in the larger scale structure of the Universe. I have applied a variety of networking techniques to assess the substructure of galaxy groups, including distance matrices, agglomerative hierarchical clustering algorithms and dendrograms. We use distance matrices to locate groupings spatially in 3-D. Dendrograms created from agglomerative hierarchical clustering results allow us to quantify connections between galaxies and galaxy groups. The shape of the dendrogram reveals if the group is spatially homogeneous or clumpy. These techniques are giving us new insight into the structure and dynamical state of galaxy groups and large scale structure. We specifically apply these techniques to the ALFALFA survey of the Coma-Abell 1367 supercluster and its resident galaxy groups.
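
    The workflow described in this abstract (pairwise 3-D distances, agglomerative merging, and the merge heights that a dendrogram plots) can be sketched in a few lines of Python. This is an illustrative sketch only: the abstract does not state which linkage was used, so single linkage is an assumption here, and the coordinates are invented rather than ALFALFA data.

```python
import math


def single_linkage(points):
    """Agglomerative clustering with single linkage (assumed linkage).

    Returns the merge history as (cluster_a, cluster_b, distance) tuples,
    where clusters are tuples of point indices. The distances are the
    heights at which a dendrogram would join the two branches.
    """
    clusters = [(i,) for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # single linkage: distance between the closest members
                d = min(math.dist(points[p], points[q])
                        for p in clusters[i] for q in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


# Invented 3-D positions: two tight pairs separated by a large gap.
positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0),
             (10.0, 0.0, 0.0), (11.0, 0.0, 0.0)]
history = single_linkage(positions)
heights = [h for _, _, h in history]
```

    A large jump between successive merge heights, as in this toy case, is the signature of a clumpy distribution; merge heights that grow gradually would suggest a more homogeneous one.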

  20. Deconstruction and analysis of multiphonic clusters in the modern flute

    NASA Astrophysics Data System (ADS)

    Barravecchio, Shauna

    The modern flute has been acoustically analyzed in great detail by many, but only from the point of view of traditional playing techniques. Very little research exists to date on more modern, "extended" technique performance. This paper explores the production of multiphonic note clusters as played on the modern flute. Several clusters as notated in James Pellerite's book on flute fingerings are recorded and analyzed for frequency content. Each one is then compared to the expected frequency content based on John Backus' 1978 paper on woodwind multiphonics. Using this information, the fingering configuration of each cluster can be deconstructed and each component pitch explained in terms of the root frequencies, overtone series, and sideband frequencies.

  1. Molecular orbital analysis of dicarbido-transition-metal cluster compounds

    SciTech Connect

    Halet, J.; Mingos, D.M.P.

    1988-01-01

    Molecular orbital calculations on dicarbido-transition-metal carbonyl cluster compounds have shown that the bonding between C₂ and the metal cage results primarily from electron donation from the C₂ σ_ρ- and π-bonding molecular orbitals and back donation from filled metallic molecular orbitals to the C₂ π* orbitals. The bonding therefore follows closely the Chatt-Dewar-Duncanson model that has been established previously for ethyne and ethene complexes but not for interstitial moieties. The C-C separation in the dicarbido clusters depends critically on the geometric constraints imposed by the metal cage and the extent of forward and back donation. In these clusters where the carbon atoms are in adjacent trigonal-prismatic sites the calculated formal bond order is between 1.0 and 1.5, which agrees well with the observed C-C bond lengths.

  2. Functional clustering algorithm for the analysis of dynamic network data

    NASA Astrophysics Data System (ADS)

    Feldt, S.; Waddell, J.; Hetrick, V. L.; Berke, J. D.; Żochowski, M.

    2009-05-01

    We formulate a technique for the detection of functional clusters in discrete event data. The advantage of this algorithm is that no prior knowledge of the number of functional groups is needed, as our procedure progressively combines data traces and derives the optimal clustering cutoff in a simple and intuitive manner through the use of surrogate data sets. In order to demonstrate the power of this algorithm to detect changes in network dynamics and connectivity, we apply it to both simulated neural spike train data and real neural data obtained from the mouse hippocampus during exploration and slow-wave sleep. Using the simulated data, we show that our algorithm performs better than existing methods. In the experimental data, we observe state-dependent clustering patterns consistent with known neurophysiological processes involved in memory consolidation.

  3. A cluster analysis of affective states before and during competition.

    PubMed

    Martinent, Guillaume; Nicolas, Michel; Gaudreau, Patrick; Campo, Mickaël

    2013-12-01

    The purposes of the current study were to identify affective profiles of athletes both before and during the competition and to examine differences between these profiles on coping and attainment of sport goals among a sample of 306 athletes. The results of hierarchical (Ward's method) and nonhierarchical (k means) cluster analyses revealed four different clusters both before and during the competition. The four clusters were very similar at the two measurement occasions: high positive affect facilitators (n = 88 and 81), facilitators (n = 75 and 25), low affect debilitators (n = 83 and 127), and high negative affect debilitators (n = 60 and 73). Results of MANOVAs revealed that coping and attainment of sport achievement goal significantly differed across the affective profiles. Results are discussed in terms of current research on positive and negative affective states. PMID:24334321

  4. Genome cluster database. A sequence family analysis platform for Arabidopsis and rice.

    PubMed

    Horan, Kevin; Lauricha, Josh; Bailey-Serres, Julia; Raikhel, Natasha; Girke, Thomas

    2005-05-01

    The genome-wide protein sequences from Arabidopsis (Arabidopsis thaliana) and rice (Oryza sativa) spp. japonica were clustered into families using sequence similarity and domain-based clustering. The two fundamentally different methods resulted in separate cluster sets with complementary properties to compensate for the limitations for accurate family analysis. Functional names for the identified families were assigned with an efficient computational approach that uses the description of the most common molecular function gene ontology node within each cluster. Subsequently, multiple alignments and phylogenetic trees were calculated for the assembled families. All clustering results and their underlying sequences were organized in the Web-accessible Genome Cluster Database (http://bioinfo.ucr.edu/projects/GCD) with rich interactive and user-friendly sequence family mining tools to facilitate the analysis of any given family of interest for the plant science community. An automated clustering pipeline ensures current information for future updates in the annotations of the two genomes and clustering improvements. The analysis allowed the first systematic identification of family and singlet proteins present in both organisms as well as those restricted to one of them. In addition, the established Web resources for mining these data provide a road map for future studies of the composition and structure of protein families between the two species. PMID:15888677

  5. MASSCLEAN - MASSive CLuster Evolution and ANalysis Package - Description and Tests

    NASA Astrophysics Data System (ADS)

    Hanson, Margaret M.; Popescu, B.

    2009-05-01

    We present MASSCLEAN, a new, sophisticated and robust stellar cluster image and photometry simulation package. This package is able to create color-magnitude diagrams and standard FITS images in any of the traditional optical and near-infrared bands based on cluster characteristics input by the user, including but not limited to distance, age, mass, radius and extinction. At the limit of very distant, unresolved clusters, we have checked the integrated colors created in MASSCLEAN against those from other simple stellar population models with consistent results. We have also tested models which provide a reasonable estimate of the field star contamination in images and color-magnitude diagrams. We demonstrate the package by simulating images and color-magnitude diagrams of well known massive Milky Way clusters and compare their appearance to real data. Because the algorithm populates the cluster with a discrete number of tenable stars, it can be used as part of a Monte Carlo Method to derive the probabilistic range of characteristics (integrated colors, for example) consistent with a given cluster mass and age. The discrete nature of our code is demonstrated in the realistic stochastic variation seen in the predicted V-K integrated colors as compared to the unrealistically smooth color from other SSP codes. Our simulation package is available to download and will run on any standard desktop running UNIX/Linux. Full documentation on installation and its use is also available. Finally, a web-based version of MASSCLEAN which can be immediately used and is sufficiently adaptable for most applications is available through a web interface.

  6. Tobacco, Marijuana, and Alcohol Use in University Students: A Cluster Analysis

    PubMed Central

    Primack, Brian A.; Kim, Kevin H.; Shensa, Ariel; Sidani, Jaime E.; Barnett, Tracey E.; Switzer, Galen E.

    2012-01-01

    Objective: Segmentation of populations may facilitate development of targeted substance abuse prevention programs. We aimed to partition a national sample of university students according to profiles based on substance use. Participants: We used 2008–2009 data from the National College Health Assessment from the American College Health Association. Our sample consisted of 111,245 individuals from 158 institutions. Method: We partitioned the sample using cluster analysis according to current substance use behaviors. We examined the association of cluster membership with individual and institutional characteristics. Results: Cluster analysis yielded six distinct clusters. Three individual factors—gender, year in school, and fraternity/sorority membership—were the most strongly associated with cluster membership. Conclusions: In a large sample of university students, we were able to identify six distinct patterns of substance abuse. It may be valuable to target specific populations of college-aged substance users based on individual factors. However, comprehensive intervention will require a multifaceted approach. PMID:22686360

  7. An Analysis of Spatial Clustering and Implications for Wildlife Management: A Burrowing Owl Example

    NASA Astrophysics Data System (ADS)

    Fisher, Joshua B.; Trulio, Lynne A.; Biging, Gregory S.; Chromczak, Debra

    2007-03-01

    Analysis tools that combine large spatial and temporal scales are necessary for efficient management of wildlife species, such as the burrowing owl (Athene cunicularia). We assessed the ability of Ripley’s K-function analysis integrated into a geographic information system (GIS) to determine changes in burrowing owl nest clustering over two years at NASA Ames Research Center. Specifically, we used these tools to detect changes in spatial and temporal nest clustering before, during, and after conducting management by mowing to maintain low vegetation height at nest burrows. We found that the scale and timing of owl nest clustering matched the scale and timing of our conservation management actions over a short time frame. While this study could not determine a causal link between mowing and nest clustering, we did find that Ripley’s K and GIS were effective in detecting owl nest clustering and show promise for future conservation uses.
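
    The K-function comparison described in this record can be illustrated with a minimal sketch. It assumes a naive estimator without edge correction, a study area of known size, and hypothetical nest coordinates; none of these details come from the study itself, which used a GIS-integrated implementation. Under complete spatial randomness K(r) is approximately pi * r^2, so an estimate well above that value suggests clustering at scale r.

```python
import math

def ripley_k(points, r, area):
    """Naive Ripley's K estimate at radius r (no edge correction)."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    pairs = 0
    # Count ordered pairs (i, j), i != j, that lie within distance r.
    for i, (xi, yi) in enumerate(points):
        for j, (xj, yj) in enumerate(points):
            if i != j and math.hypot(xi - xj, yi - yj) <= r:
                pairs += 1
    return area * pairs / (n * (n - 1))

# Hypothetical nest coordinates in a 10 x 10 plot: two tight groups.
nests = [(1.0, 1.0), (1.2, 1.1), (0.9, 1.3),
         (8.0, 8.0), (8.1, 8.2), (7.9, 7.8)]

k_hat = ripley_k(nests, r=1.0, area=100.0)
k_random = math.pi * 1.0 ** 2  # expectation under complete spatial randomness

print(k_hat, k_random)
```

    Repeating the estimate over a range of radii, and for each survey period, would show the scale and timing of clustering; a real analysis would also need an edge correction and a significance envelope from simulated random patterns.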

  8. An analysis of spatial clustering and implications for wildlife management: a burrowing owl example.

    PubMed

    Fisher, Joshua B; Trulio, Lynne A; Biging, Gregory S; Chromczak, Debra

    2007-03-01

    Analysis tools that combine large spatial and temporal scales are necessary for efficient management of wildlife species, such as the burrowing owl (Athene cunicularia). We assessed the ability of Ripley's K-function analysis integrated into a geographic information system (GIS) to determine changes in burrowing owl nest clustering over two years at NASA Ames Research Center. Specifically, we used these tools to detect changes in spatial and temporal nest clustering before, during, and after conducting management by mowing to maintain low vegetation height at nest burrows. We found that the scale and timing of owl nest clustering matched the scale and timing of our conservation management actions over a short time frame. While this study could not determine a causal link between mowing and nest clustering, we did find that Ripley's K and GIS were effective in detecting owl nest clustering and show promise for future conservation uses. PMID:17253092

  9. Representation in GIS of the Results Obtained by Cluster Analysis in Territorial Profile

    NASA Astrophysics Data System (ADS)

    Dârdalą, Marian; Furtuną, Titus Felix; Reveiu, Adriana

    2010-05-01

    Cluster analysis groups the units under study according to the values of a set of grouping parameters. Statistical cluster analysis uses the minimum-dispersion hierarchical tree method to obtain the information needed to group the administrative units. Economic analyses in territorial profile can use cluster analysis to build hierarchical classifications according to performance or strategy. Hierarchical tree methods identify hierarchies among the units considered. According to their mode of organization, clusters can be vertically integrated, horizontally integrated, or emerging. With GIS, clustering can be applied to spatial data to represent the territorial analysis performed. A usual way to view the results of cluster analysis in GIS is to generate cartograms. In this case, a cartogram requires defining a color ramp with as many colors as there are groups dividing the collectivity. The parameters on which the clustering is based may exist as independent data or be stored in the database of an information system. As a case study we implemented an ArcMap extension that analyzes clusters by selecting the grouping parameters and setting the number of groups that will divide the collectivity. Cartograms can be defined taking into account the multi-level administrative division of the territory. For example, Romania is divided into villages, counties, regions and macro-regions. Analysis can be applied at different levels of administrative organization by aggregating the values of parameters. For example, the value of a parameter for a county can be obtained by aggregating the parameter values of all villages belonging to the county.
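
    The county-level aggregation mentioned at the end of this record can be sketched as follows. The unit names, the population values, and the choice of summation as the aggregation rule are all assumptions made for illustration; the record does not say which aggregate the ArcMap extension computes.

```python
from collections import defaultdict

def aggregate_by_parent(values, parent_of):
    """Sum a parameter over child units to get a value per parent unit."""
    totals = defaultdict(float)
    for unit, value in values.items():
        totals[parent_of[unit]] += value
    return dict(totals)

# Hypothetical villages, each assigned to a hypothetical county.
village_population = {"V1": 1200.0, "V2": 800.0, "V3": 500.0}
village_to_county = {"V1": "County A", "V2": "County A", "V3": "County B"}

county_population = aggregate_by_parent(village_population, village_to_county)
print(county_population)
```

    Applying the same function again with a county-to-region mapping would give region-level values, which mirrors the multi-level division the abstract describes.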

  10. The ACS Virgo Cluster Survey. XIV. Analysis of Color-Magnitude Relations in Globular Cluster Systems

    NASA Astrophysics Data System (ADS)

    Mieske, Steffen; Jordán, Andrés; Côté, Patrick; Kissler-Patig, Markus; Peng, Eric W.; Ferrarese, Laura; Blakeslee, John P.; Mei, Simona; Merritt, David; Tonry, John L.; West, Michael J.

    2006-12-01

    We examine the correlation between globular cluster (GC) color and magnitude using HST ACS imaging for a sample of 79 early-type galaxies (-21.7Cluster Survey. Using the KMM mixture modeling algorithm, we find a highly significant correlation, γz≡d(g-z)/dz=-0.037+/-0.004, between color and magnitude for the subpopulation of blue GCs in the co-added GC color-magnitude diagram of the three brightest Virgo Cluster galaxies (M49, M87, and M60): brighter GCs are redder than their fainter counterparts. For the single GC systems of M87 and M60, we find similar correlations; M49 does not appear to show a significant trend. There is no correlation between (g-z) and Mz for GCs of the red subpopulation. The correlation γg≡d(g-z)/dg for the blue subpopulation is much weaker than d(g-z)/dz. Using Monte Carlo simulations, we attribute this finding to the fact that the blue subpopulation in Mg extends to higher luminosities than does the red subpopulation, which biases the KMM fit results. The correlation between color and Mz thus is a real effect: this conclusion is supported by biweight fits to the same color distributions. We identify two environmental dependencies that influence the derived color-magnitude relation: (1) the slope decreases in significance with decreasing galaxy luminosity; and (2) the slope is stronger for GC populations located at smaller galactocentric distances. We examine several physical mechanisms that might give rise to the observed color-magnitude relation: (1) presence of contaminators; (2) accretion of GCs from low-mass galaxies; (3) stochastic effects; (4) the capture of field stars by individual GCs; and (5) GC self-enrichment. We conclude that self-enrichment and field-star capture, or a combination of these processes, offer the most promising means of explaining our observations. Based on observations with the NASA/ESA Hubble Space Telescope obtained at the Space Telescope

  11. Identification and analysis of the resorcinomycin biosynthetic gene cluster.

    PubMed

    Ooya, Koichi; Ogasawara, Yasushi; Noike, Motoyoshi; Dairi, Tohru

    2015-01-01

    Resorcinomycin (1) is composed of a nonproteinogenic amino acid, (S)-2-(3,5-dihydroxy-4-isopropylphenyl)-2-guanidinoacetic acid (2), and glycine. A biosynthetic gene cluster was identified in a genome database of Streptoverticillium roseoverticillatum by searching for orthologs of the genes responsible for biosynthesis of pheganomycin (3), which possesses a (2)-derivative at its N-terminus. The cluster contained a gene encoding an ATP-grasp-ligase (res5), which was suggested to catalyze the peptide bond formation between 2 and glycine. A res5-deletion mutant lost 1 productivity but accumulated 2 in the culture broth. However, recombinant RES5 did not show catalytic activity to form 1 with 2 and glycine as substrates. Moreover, heterologous expression of the cluster resulted in accumulation of only 2 and no production of 1 was observed. These results suggested that a peptide with glycine at its N-terminus may be used as a nucleophile and then maturated by a peptidase encoded by a gene outside of the cluster. PMID:26034896

  12. An Empirical Comparison of Variable Standardization Methods in Cluster Analysis.

    ERIC Educational Resources Information Center

    Schaffer, Catherine M.; Green, Paul E.

    1996-01-01

    The common marketing research practice of standardizing the columns of a persons-by-variables data matrix prior to clustering the entities corresponding to the rows was evaluated with 10 large-scale data sets. Results indicate that the column standardization practice may be problematic for some kinds of data that marketing researchers used for…

  13. QTL analysis of fruit cluster abundance in grape (Vitis sp.)

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Sustainably maximizing yield or productivity of fruit over time is a major goal of modern viticulture. One major yield component is the number of fruit or flower clusters present on a single shoot of the current year’s growth. A quantitative trait loci (QTL) study was conducted on both average numbe...

  14. Functional Analysis of a Mosquito Short Chain Dehydrogenase Cluster

    PubMed Central

    Mayoral, Jaime G.; Leonard, Kate T.; Defelipe, Lucas A.; Turjansksi, Adrian G.; Nouzova, Marcela; Noriegal, Fernando G.

    2013-01-01

    The short chain dehydrogenases (SDR) constitute one of the oldest and largest families of enzymes with over 46,000 members in sequence databases. About 25% of all known dehydrogenases belong to the SDR family. SDR enzymes have critical roles in lipid, amino acid, carbohydrate, hormone and xenobiotic metabolism as well as in redox sensor mechanisms. This family is present in archaea, bacteria, and eukaryota, emphasizing their versatility and fundamental importance for metabolic processes. We identified a cluster of eight SDRs in the mosquito Aedes aegypti (AaSDRs). Members of the cluster differ in tissue specificity and developmental expression. Heterologous expression produced recombinant proteins that had diverse substrate specificities, but distinct from the conventional insect alcohol (ethanol) dehydrogenases. They are all NADP+-dependent and they have S-enantioselectivity and preference for secondary alcohols with 8–15 carbons. Homology modeling was used to build the structure of AaSDR1 and two additional cluster members. The computational study helped explain the selectivity towards the (10S)-isomers as well as the reduced activity of AaSDR4 and AaSDR9 for longer isoprenoid substrates. Similar clusters of SDRs are present in other species of insects, suggesting similar selection mechanisms causing duplication and diversification of this family of enzymes. PMID:23238893

  15. NeAT: a toolbox for the analysis of biological networks, clusters, classes and pathways

    PubMed Central

    Brohée, Sylvain; Faust, Karoline; Lima-Mendez, Gipsi; Sand, Olivier; Janky, Rekin's; Vanderstocken, Gilles; Deville, Yves; van Helden, Jacques

    2008-01-01

    The network analysis tools (NeAT) (http://rsat.ulb.ac.be/neat/) provide a user-friendly web access to a collection of modular tools for the analysis of networks (graphs) and clusters (e.g. microarray clusters, functional classes, etc.). A first set of tools supports basic operations on graphs (comparison between two graphs, neighborhood of a set of input nodes, path finding and graph randomization). Another set of programs makes the connection between networks and clusters (graph-based clustering, cliques discovery and mapping of clusters onto a network). The toolbox also includes programs for detecting significant intersections between clusters/classes (e.g. clusters of co-expression versus functional classes of genes). NeAT are designed to cope with large datasets and provide a flexible toolbox for analyzing biological networks stored in various databases (protein interactions, regulation and metabolism) or obtained from high-throughput experiments (two-hybrid, mass-spectrometry and microarrays). The web interface interconnects the programs in predefined analysis flows, enabling to address a series of questions about networks of interest. Each tool can also be used separately by entering custom data for a specific analysis. NeAT can also be used as web services (SOAP/WSDL interface), in order to design programmatic workflows and integrate them with other available resources. PMID:18524799

  16. NeAT: a toolbox for the analysis of biological networks, clusters, classes and pathways.

    PubMed

    Brohée, Sylvain; Faust, Karoline; Lima-Mendez, Gipsi; Sand, Olivier; Janky, Rekin's; Vanderstocken, Gilles; Deville, Yves; van Helden, Jacques

    2008-07-01

    The network analysis tools (NeAT) (http://rsat.ulb.ac.be/neat/) provide a user-friendly web access to a collection of modular tools for the analysis of networks (graphs) and clusters (e.g. microarray clusters, functional classes, etc.). A first set of tools supports basic operations on graphs (comparison between two graphs, neighborhood of a set of input nodes, path finding and graph randomization). Another set of programs makes the connection between networks and clusters (graph-based clustering, cliques discovery and mapping of clusters onto a network). The toolbox also includes programs for detecting significant intersections between clusters/classes (e.g. clusters of co-expression versus functional classes of genes). NeAT are designed to cope with large datasets and provide a flexible toolbox for analyzing biological networks stored in various databases (protein interactions, regulation and metabolism) or obtained from high-throughput experiments (two-hybrid, mass-spectrometry and microarrays). The web interface interconnects the programs in predefined analysis flows, enabling to address a series of questions about networks of interest. Each tool can also be used separately by entering custom data for a specific analysis. NeAT can also be used as web services (SOAP/WSDL interface), in order to design programmatic workflows and integrate them with other available resources. PMID:18524799

  17. Quantitative analysis of damage clustering and void linking for spallation modeling in tantalum

    SciTech Connect

    Tonks, D.L.; Zurek, A.K.; Thissell, W.R.; Hixson, R.

    1997-05-01

    In a companion paper in this volume by Zurek et al, micrographs of incipient spallation damage in rolled tantalum were numerically analyzed using image analysis techniques. Void sizes, locations, and overall porosity were measured and tabulated. In this paper, we extend this analysis to include void clusters and examine the correlation between cluster size and the ranges of local instabilities between voids visible in the micrographs. The implications for spallation modeling will be given.

  18. Groundwater source contamination mechanisms: Physicochemical profile clustering, risk factor analysis and multivariate modelling

    NASA Astrophysics Data System (ADS)

    Hynds, Paul; Misstear, Bruce D.; Gill, Laurence W.; Murphy, Heather M.

    2014-04-01

    An integrated domestic well sampling and "susceptibility assessment" programme was undertaken in the Republic of Ireland from April 2008 to November 2010. Overall, 211 domestic wells were sampled, assessed and collated with local climate data. Based upon groundwater physicochemical profile, three clusters have been identified and characterised by source type (borehole or hand-dug well) and local geological setting. Statistical analysis indicates that cluster membership is significantly associated with the prevalence of bacteria (p = 0.001), with mean Escherichia coli presence within clusters ranging from 15.4% (Cluster-1) to 47.6% (Cluster-3). Bivariate risk factor analysis shows that on-site septic tank presence was the only risk factor significantly associated (p < 0.05) with bacterial presence within all clusters. Point agriculture adjacency was significantly associated with both borehole-related clusters. Well design criteria were associated with hand-dug wells and boreholes in areas characterised by high permeability subsoils, while local geological setting was significant for hand-dug wells and boreholes in areas dominated by low/moderate permeability subsoils. Multivariate susceptibility models were developed for all clusters, with predictive accuracies of 84% (Cluster-1) to 91% (Cluster-2) achieved. Septic tank setback was a common variable within all multivariate models, while agricultural sources were also significant, albeit to a lesser degree. Furthermore, well liner clearance was a significant factor in all models, indicating that direct surface ingress is a significant well contamination mechanism. Identification and elucidation of cluster-specific contamination mechanisms may be used to develop improved overall risk management and wellhead protection strategies, while also informing future remediation and maintenance efforts.

  19. Groundwater source contamination mechanisms: physicochemical profile clustering, risk factor analysis and multivariate modelling.

    PubMed

    Hynds, Paul; Misstear, Bruce D; Gill, Laurence W; Murphy, Heather M

    2014-04-01

    An integrated domestic well sampling and "susceptibility assessment" programme was undertaken in the Republic of Ireland from April 2008 to November 2010. Overall, 211 domestic wells were sampled, assessed and collated with local climate data. Based upon groundwater physicochemical profile, three clusters have been identified and characterised by source type (borehole or hand-dug well) and local geological setting. Statistical analysis indicates that cluster membership is significantly associated with the prevalence of bacteria (p=0.001), with mean Escherichia coli presence within clusters ranging from 15.4% (Cluster-1) to 47.6% (Cluster-3). Bivariate risk factor analysis shows that on-site septic tank presence was the only risk factor significantly associated (p<0.05) with bacterial presence within all clusters. Point agriculture adjacency was significantly associated with both borehole-related clusters. Well design criteria were associated with hand-dug wells and boreholes in areas characterised by high permeability subsoils, while local geological setting was significant for hand-dug wells and boreholes in areas dominated by low/moderate permeability subsoils. Multivariate susceptibility models were developed for all clusters, with predictive accuracies of 84% (Cluster-1) to 91% (Cluster-2) achieved. Septic tank setback was a common variable within all multivariate models, while agricultural sources were also significant, albeit to a lesser degree. Furthermore, well liner clearance was a significant factor in all models, indicating that direct surface ingress is a significant well contamination mechanism. Identification and elucidation of cluster-specific contamination mechanisms may be used to develop improved overall risk management and wellhead protection strategies, while also informing future remediation and maintenance efforts. PMID:24583518

  20. Dynamical analysis of the cluster pair: A3407 + A3408

    NASA Astrophysics Data System (ADS)

    Nascimento, R. S.; Ribeiro, A. L. B.; Trevisan, M.; Carrasco, E. R.; Plana, H.; Dupke, R.

    2016-08-01

    We carried out a dynamical study of the galaxy cluster pair A3407 & A3408 based on a spectroscopic survey obtained with the 4 meter Blanco telescope at the CTIO, plus 6dF data, and ROSAT All-Sky-Survey. The sample consists of 122 member galaxies brighter than $m_R = 20$. Our main goal is to probe the galaxy dynamics in this field and verify if the sample constitutes a single galaxy system or corresponds to an ongoing merging process. Statistical tests were applied to cluster members showing that both the composite system A3407 + A3408 as well as each individual cluster have Gaussian velocity distribution. A velocity gradient of $\sim 847 \pm 114$ km s$^{-1}$ was identified around the principal axis of the projected distribution of galaxies, indicating that the global field may be rotating. Applying the KMM algorithm to the distribution of galaxies we found that the solution with two clusters is better than the single unit solution at the 99% c.l. This is consistent with the X-ray distribution around this field, which shows no common X-ray halo involving A3407 and A3408. We also estimated virial masses and applied a two-body model to probe the dynamics of the pair. The more likely scenario is that in which the pair is gravitationally bound and probably experiences a collapse phase, with the cluster cores crossing in less than $\sim 1\,h^{-1}$ Gyr, a pre-merger scenario. The complex X-ray morphology, the gas temperature, and some signs of galaxy evolution in A3408 suggest a post-merger scenario, with cores having crossed each other $\sim 1.65\,h^{-1}$ Gyr ago, as an alternative solution.

  1. Dynamical analysis of the cluster pair: A3407 + A3408

    NASA Astrophysics Data System (ADS)

    Nascimento, R. S.; Ribeiro, A. L. B.; Trevisan, M.; Carrasco, E. R.; Plana, H.; Dupke, R.

    2016-08-01

    We carried out a dynamical study of the galaxy cluster pair A3407 and A3408 based on a spectroscopic survey obtained with the 4 metre Blanco telescope at the Cerro Tololo Interamerican Observatory, plus 6dF data, and ROSAT All-Sky Survey. The sample consists of 122 member galaxies brighter than $m_R = 20$. Our main goal is to probe the galaxy dynamics in this field and verify if the sample constitutes a single galaxy system or corresponds to an ongoing merging process. Statistical tests were applied to cluster members showing that both the composite system A3407 + A3408 as well as each individual cluster have Gaussian velocity distribution. A velocity gradient of $\sim 847 \pm 114$ km s$^{-1}$ was identified around the principal axis of the projected distribution of galaxies, indicating that the global field may be rotating. Applying the KMM algorithm to the distribution of galaxies, we found that the solution with two clusters is better than the single unit solution at the 99 per cent c.l. This is consistent with the X-ray distribution around this field, which shows no common X-ray halo involving A3407 and A3408. We also estimated virial masses and applied a two-body model to probe the dynamics of the pair. The more likely scenario is that in which the pair is gravitationally bound and probably experiences a collapse phase, with the cluster cores crossing in less than $\sim 1\,h^{-1}$ Gyr, a pre-merger scenario. The complex X-ray morphology, the gas temperature, and some signs of galaxy evolution in A3408 suggest a post-merger scenario, with cores having crossed each other $\sim 1.65\,h^{-1}$ Gyr ago, as an alternative solution.

  2. Galaxy cluster mass estimation from stacked spectroscopic analysis

    NASA Astrophysics Data System (ADS)

    Farahi, Arya; Evrard, August E.; Rozo, Eduardo; Rykoff, Eli S.; Wechsler, Risa H.

    2016-08-01

    We use simulated galaxy surveys to study: (i) how galaxy membership in redMaPPer clusters maps to the underlying halo population, and (ii) the accuracy of a mean dynamical cluster mass, $M_\sigma(\lambda)$, derived from stacked pairwise spectroscopy of clusters with richness $\lambda$. Using ~130 000 galaxy pairs patterned after the Sloan Digital Sky Survey (SDSS) redMaPPer cluster sample study of Rozo et al., we show that the pairwise velocity probability density function of central-satellite pairs with $m_i < 19$ in the simulation matches the form seen in Rozo et al. Through joint membership matching, we deconstruct the main Gaussian velocity component into its halo contributions, finding that the top-ranked halo contributes ~60 per cent of the stacked signal. The halo mass scale inferred by applying the virial scaling of Evrard et al. to the velocity normalization matches, to within a few per cent, the log-mean halo mass derived through galaxy membership matching. We apply this approach, along with miscentring and galaxy velocity bias corrections, to estimate the log-mean matched halo mass at z = 0.2 of SDSS redMaPPer clusters. Employing the velocity bias constraints of Guo et al., we find $\langle \ln M \mid \lambda \rangle = \ln(M_{30}) + \alpha_m \ln(\lambda/30)$ with $M_{30} = (1.56 \pm 0.35) \times 10^{14}\,M_\odot$ and $\alpha_m = 1.31 \pm 0.06_{\rm stat} \pm 0.13_{\rm sys}$. Systematic uncertainty in the velocity bias of satellite galaxies overwhelmingly dominates the error budget.
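
    The mass-richness scaling quoted in this record can be evaluated numerically. The short sketch below uses only the central values from the abstract (M30 = 1.56 × 10^14 solar masses, slope 1.31) and ignores the quoted statistical and systematic uncertainties; the function name is hypothetical. Note that exponentiating the relation gives the exponential of the log-mean mass, not the arithmetic mean mass.

```python
import math

M30 = 1.56e14   # pivot mass in solar masses (central value from the abstract)
ALPHA_M = 1.31  # richness slope (central value from the abstract)

def log_mean_mass(richness):
    """Return exp(<ln M | lambda>) for a given richness lambda."""
    ln_mass = math.log(M30) + ALPHA_M * math.log(richness / 30.0)
    return math.exp(ln_mass)

mass_at_pivot = log_mean_mass(30.0)   # recovers M30 by construction
mass_at_double = log_mean_mass(60.0)  # doubling richness scales mass by 2**1.31

print(f"{mass_at_pivot:.3e} {mass_at_double:.3e}")
```

    With a slope above one, doubling the richness raises the inferred mass by a factor of about 2.48, so the relation is steeper than linear.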

  3. SU-E-J-98: Radiogenomics: Correspondence Between Imaging and Genetic Features Based On Clustering Analysis

    SciTech Connect

    Harmon, S; Wendelberger, B; Jeraj, R

    2014-06-01

    Purpose: Radiogenomics aims to establish relationships between patient genotypes and imaging phenotypes. An open question remains on how best to integrate information from these distinct datasets. This work investigates if similarities in genetic features across patients correspond to similarities in PET-imaging features, assessed with various clustering algorithms. Methods: [$^{18}$F]FDG PET data was obtained for 26 NSCLC patients from a public database (TCIA). Tumors were contoured using an in-house segmentation algorithm combining gradient and region-growing techniques; resulting ROIs were used to extract 54 PET-based features. Corresponding genetic microarray data containing 48,778 elements were also obtained for each tumor. Given the mismatch in feature sizes, two dimension reduction techniques were also applied to the genetic data: principal component analysis (PCA) and selective filtering of 25 NSCLC-associated genes-of-interest (GOI). Gene datasets (full, PCA, and GOI) and PET feature datasets were independently clustered using K-means and hierarchical clustering using a variable number of clusters (K). Jaccard Index (JI) was used to score similarity of cluster assignments across different datasets. Results: Patient clusters from imaging data showed poor similarity to clusters from gene datasets, regardless of clustering algorithms or number of clusters (JI$_{\rm mean}$ = 0.3429 ± 0.1623). Notably, we found clustering algorithms had different sensitivities to data reduction techniques. Using hierarchical clustering, the PCA dataset showed perfect cluster agreement to the full-gene set (JI = 1) for all values of K, and the agreement between the GOI set and the full-gene set decreased as number of clusters increased (JI = 0.9231 and 0.5769 for K = 2 and 5, respectively). K-means clustering assignments were highly sensitive to data reduction and showed poor stability for different values of K (JI$_{\rm range}$: 0.2301–1). Conclusion: Using commonly-used clustering algorithms
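
    One way to score agreement between two cluster assignments with a Jaccard index is the pair-counting form sketched below. This is an assumption: the record does not state which variant the authors used. The label vectors are hypothetical and do not reproduce any value reported in the abstract.

```python
from itertools import combinations

def pair_jaccard(labels_a, labels_b):
    """Pair-counting Jaccard index between two clusterings of the same items.

    Counts item pairs placed in the same cluster by both clusterings,
    divided by pairs placed together by at least one of them.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("label vectors must have the same length")
    both = either = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a and same_b:
            both += 1
        if same_a or same_b:
            either += 1
    # Two all-singleton clusterings agree trivially.
    return both / either if either else 1.0

# Hypothetical assignments for six patients from two feature sets.
imaging_labels = [0, 0, 1, 1, 2, 2]
gene_labels = [1, 1, 0, 0, 0, 2]

ji = pair_jaccard(imaging_labels, gene_labels)
print(ji)
```

    Because the score depends only on which items are grouped together, it is unaffected by how the cluster labels are numbered, which matters when comparing the output of independent clustering runs.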

  4. Integrating Data Clustering and Visualization for the Analysis of 3D Gene Expression Data

    SciTech Connect

    Data Analysis and Visualization and the Department of Computer Science, University of California, Davis, One Shields Avenue, Davis CA 95616, USA; International Research Training Group "Visualization of Large and Unstructured Data Sets," University of Kaiserslautern, Germany; Computational Research Division, Lawrence Berkeley National Laboratory, One Cyclotron Road, Berkeley, CA 94720, USA; Genomics Division, Lawrence Berkeley National Laboratory, One Cyclotron Road, Berkeley CA 94720, USA; Life Sciences Division, Lawrence Berkeley National Laboratory, One Cyclotron Road, Berkeley CA 94720, USA; Computer Science Division, University of California, Berkeley, CA, USA; Computer Science Department, University of California, Irvine, CA, USA; All authors are with the Berkeley Drosophila Transcription Network Project, Lawrence Berkeley National Laboratory; Rubel, Oliver; Weber, Gunther H.; Huang, Min-Yu; Bethel, E. Wes; Biggin, Mark D.; Fowlkes, Charless C.; Hendriks, Cris L. Luengo; Keranen, Soile V. E.; Eisen, Michael B.; Knowles, David W.; Malik, Jitendra; Hagen, Hans; Hamann, Bernd

    2008-05-12

    The recent development of methods for extracting precise measurements of spatial gene expression patterns from three-dimensional (3D) image data opens the way for new analyses of the complex gene regulatory networks controlling animal development. We present an integrated visualization and analysis framework that supports user-guided data clustering to aid exploration of these new complex datasets. The interplay of data visualization and clustering-based data classification leads to improved visualization and enables a more detailed analysis than previously possible. We discuss (i) integration of data clustering and visualization into one framework; (ii) application of data clustering to 3D gene expression data; (iii) evaluation of the number of clusters k in the context of 3D gene expression clustering; and (iv) improvement of overall analysis quality via dedicated post-processing of clustering results based on visualization. We discuss the use of this framework to objectively define spatial pattern boundaries and temporal profiles of genes and to analyze how mRNA patterns are controlled by their regulatory transcription factors.

  5. Photometric analysis of Galactic Stellar Clusters in VVV Survey

    NASA Astrophysics Data System (ADS)

    Mauro, F.; Moni Bidin, C.; Cohen, R. E.; Geisler, D.; Villanova, S.; Chené, A. N.

    2014-10-01

    We show the preliminary results of the study of the structure of the Horizontal Branch of Liller 1 and some results from the Calcium Triplet method using Ks magnitude applied to several Galactic Globular clusters using data from the VISTA Variables in the Via Lactea Survey (Minniti et al. 2010) and obtained with GeMS/GSAOI. The data are extracted with the new automatic VVV-SkZ_pipeline photometric pipeline (Mauro et al. 2013).

  6. Insights into quasar UV spectra using unsupervised clustering analysis

    NASA Astrophysics Data System (ADS)

    Tammour, A.; Gallagher, S. C.; Daley, M.; Richards, G. T.

    2016-06-01

    Machine learning techniques can provide powerful tools to detect patterns in multidimensional parameter space. We use K-means, a simple yet powerful unsupervised clustering algorithm which picks out structure in unlabelled data, to study a sample of quasar UV spectra from the Quasar Catalog of the 10th Data Release of the Sloan Digital Sky Survey (SDSS-DR10) of Paris et al. Detecting patterns in large data sets helps us gain insights into the physical conditions and processes giving rise to the observed properties of quasars. We use K-means to find clusters in the parameter space of the equivalent width (EW), the blue- and red-half-width at half-maximum (HWHM) of the Mg II 2800 Å line, the C IV 1549 Å line, and the C III] 1908 Å blend in samples of broad absorption line (BAL) and non-BAL quasars at redshift 1.6-2.1. Using this method, we successfully recover correlations well-known in the UV regime such as the anti-correlation between the EW and blueshift of the C IV emission line and the shape of the ionizing spectral energy distribution (SED) probed by the strength of He II and the Si III]/C III] ratio. We find this to be particularly evident when the properties of C III] are used to find the clusters, while those of Mg II proved to be less strongly correlated with the properties of the other lines in the spectra such as the width of C IV or the Si III]/C III] ratio. We conclude that unsupervised clustering methods (such as K-means) are powerful methods for finding `natural' binning boundaries in multidimensional data sets and discuss caveats and future work.
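
    The K-means procedure referred to above can be sketched in a few lines. This is a minimal illustration of Lloyd's algorithm using only the Python standard library, not the authors' pipeline; the function name `kmeans`, the toy two-feature points and the hand-picked starting centroids are all assumptions made for the example.

```python
import math

def kmeans(points, centroids, n_iter=100):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid update."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid (Euclidean).
        labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return labels, centroids

# Hypothetical, already-scaled (feature_1, feature_2) values for six objects.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.2), (5.0, 5.0), (5.2, 4.8), (4.8, 5.2)]
labels, centroids = kmeans(points, centroids=[points[0], points[3]])
```

    In practice the features would normally be put on comparable scales before clustering, since Euclidean distance is dominated by whichever parameter has the largest numerical range.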

  7. Analysis of the nutritional status of algae by Fourier transform infrared chemical imaging

    NASA Astrophysics Data System (ADS)

    Hirschmugl, Carol J.; Bayarri, Zuheir-El; Bunta, Maria; Holt, Justin B.; Giordano, Mario

    2006-09-01

    A new non-destructive method to study the nutritional status of algal cells and their environments is demonstrated. This approach allows rapid examination of whole cells with little or no pre-treatment, providing a large amount of information on the biochemical composition of cells and growth medium. The method is based on the analysis of a collection of infrared (IR) spectra for individual cells; each spectrum describes the biochemical composition of a portion of a cell; a complete set of spectra is used to reconstruct an image of the entire cell. To obtain spatially resolved information synchrotron radiation was used as a bright IR source. We tested this method on the green flagellate Euglena gracilis; a comparison was conducted between cells grown in nutrient replete conditions (Type 1) and on cells allowed to deplete their medium (Type 2). Complete sets of spectra for individual cells of both types were analyzed with agglomerative hierarchical clustering, leading to distinct clusters representative of the two types of cells. The average spectra for the clusters confirmed the similarities between the clusters and the types of cells. The clustering analysis, therefore, allows the distinction of cells of the same species, but with different nutritional histories. In order to facilitate the application of the method and reduce manipulation (washing), we analyzed the cells in the presence of residual medium. The results obtained showed that even with residual medium the outcome of the clustering analysis is reliable. Our results demonstrate the applicability of FTIR microspectroscopy for ecological and ecophysiological studies.
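
    As a rough illustration of agglomerative hierarchical clustering applied to spectra, the sketch below repeatedly merges the two closest clusters until a requested number remain. The abstract does not state which linkage or distance was used, so average linkage with Euclidean distance is an assumption here, and the three-channel "spectra" are invented toy values.

```python
import math
from itertools import combinations

def agglomerate(spectra, n_clusters):
    """Naive agglomerative clustering with average linkage (assumed linkage)."""
    clusters = [[i] for i in range(len(spectra))]  # start: one cluster per spectrum

    def average_linkage(a, b):
        # Mean pairwise Euclidean distance between members of clusters a and b.
        total = sum(math.dist(spectra[i], spectra[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters; combinations() guarantees a < b.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda ab: average_linkage(clusters[ab[0]], clusters[ab[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

# Hypothetical absorbances at three wavenumbers for six spectra.
spectra = [
    (1.00, 0.20, 0.10), (0.90, 0.25, 0.10), (1.10, 0.20, 0.15),  # "Type 1"-like
    (0.30, 0.90, 0.60), (0.35, 1.00, 0.55), (0.30, 0.95, 0.65),  # "Type 2"-like
]
clusters = agglomerate(spectra, n_clusters=2)
```

    Averaging the member spectra of each resulting cluster gives the per-cluster mean spectrum that the abstract compares against the two cell types.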

  8. StarBooster Demonstrator Cluster Configuration Analysis/Verification Program

    NASA Technical Reports Server (NTRS)

    DeTurris, Dianne J.

    2003-01-01

    In order to study the flight dynamics of the cluster configuration of two first stage boosters and upper-stage, flight-testing of subsonic sub-scale models has been undertaken using two glideback boosters launched on a center upper-stage. Three high power rockets clustered together were built and flown to demonstrate vertical launch, separation and horizontal recovery of the boosters. Although the boosters fly to conventional aircraft landing, the centerstage comes down separately under its own parachute. The goal of the project has been to collect data during separation and flight for comparison with a six degree of freedom simulation. The configuration for the delta wing canard boosters comes from a design by Starcraft Boosters, Inc. The subscale rockets were constructed of foam covered in carbon or fiberglass and were launched with commercially available solid rocket motors. The first set of boosters built were 3-ft tall with a 4-ft tall centerstage, and two additional sets of boosters were made that were each over 5-ft tall with a 7.5 ft centerstage. The rocket cluster is launched vertically, then after motor burnout the boosters are separated and flown to a horizontal landing under radio-control. An on-board data acquisition system recorded data during both the launch and glide phases of flight.

  9. Non-equilibrium relaxation analysis in cluster algorithms

    NASA Astrophysics Data System (ADS)

    Nonomura, Yoshihiko

    2014-03-01

    In Monte Carlo studies of phase transitions, critical slowing down has been a serious problem. In order to overcome this difficulty, two kinds of approaches have been proposed. One is the cluster algorithms, where a global update scheme based on percolation theory is introduced in order to avoid the power-law behavior at the critical point. Another is the non-equilibrium relaxation method, where the power-law critical relaxation process is analyzed by the dynamical scaling theory in order to avoid time-consuming equilibration. Then, the next step is to fuse these two approaches -- to investigate phase transitions with the early-stage relaxation process of cluster algorithms. Since the dynamical scaling theory does not hold in cluster algorithms in principle, such an attempt had been considered impossible. In the present talk we show that such a fusion is actually possible using an empirical scaling form obtained from the 2D Ising models instead of the dynamical scaling theory. Applications to the q >= 3 Potts models, +/- J Ising models etc. will also be explained in the presentation.

  10. How Teachers Use and Manage Their Blogs? A Cluster Analysis of Teachers' Blogs in Taiwan

    ERIC Educational Resources Information Center

    Liu, Eric Zhi-Feng; Hou, Huei-Tse

    2013-01-01

    The development of Web 2.0 has ushered in a new set of web-based tools, including blogs. This study focused on how teachers use and manage their blogs. A sample of 165 teachers' blogs in Taiwan was analyzed by factor analysis, cluster analysis and qualitative content analysis. First, the teachers' blogs were analyzed according to six criteria…

  11. Abundance analysis of an extended sample of open clusters: A search for chemical inhomogeneities

    NASA Astrophysics Data System (ADS)

    Reddy, Arumalla B. S.; Giridhar, Sunetra; Lambert, David L.

    We have initiated a program to explore the presence of chemical inhomogeneities in the Galactic disk using the open clusters as ideal probes. We have analyzed high-dispersion echelle spectra (R ≥ 55,000) of red giant members for eleven open clusters to derive abundances for many elements. The membership to the cluster has been confirmed through their radial velocities and proper motions. The spread in temperatures and gravities being very small among the red giants, nearly the same stellar lines were employed thereby reducing the random errors. The errors of average abundance for the cluster were generally in 0.02 to 0.07 dex range. Our present sample covers galactocentric distances of 8.3 to 11.3 kpc and an age range of 0.2 to 4.3 Gyrs. Our earlier analysis of four open clusters (Reddy A.B.S. et al., 2012, MNRAS, 419, 1350) indicates that abundances relative to Fe for elements from Na to Eu are equal within measurement uncertainties to published abundances for thin disk giants in the field. This supports the view that field stars come from disrupted open clusters. In the enlarged sample of eleven open clusters we find cluster to cluster abundance variations for some s- and r- process elements, with certain elements such as Zr and Ba showing large variation. These differences mark the signatures that these clusters had formed under different environmental conditions (Type II SN, Type Ia SN, AGB stars or a mixture of any of these) unique to the time and site of formation. These eleven clusters support the widely held impression that there is an abundance gradient such that the metallicity [Fe/H] at the solar galactocentric distance decreases outwards at about -0.1 dex per kpc.

  12. The CERN analysis facility—a PROOF cluster for day-one physics analysis

    NASA Astrophysics Data System (ADS)

    G-Oetringhaus, J. F.

    2008-07-01

    ALICE (A Large Ion Collider Experiment) at the LHC plans to use a PROOF cluster at CERN (CAF - CERN Analysis Facility) for analysis. The system is especially aimed at the prototyping phase of analyses that need a high number of development iterations and thus require a short response time. Typical examples are the tuning of cuts during the development of an analysis as well as calibration and alignment. Furthermore, the use of an interactive system with very fast response will allow ALICE to extract physics observables out of first data quickly. An additional use case is fast event simulation and reconstruction. A test setup consisting of 40 machines has been used for evaluation since May 2006. The PROOF system enables the parallel processing and xrootd the access to files distributed on the test cluster. An automatic staging system for files either catalogued in the ALICE file catalog or stored in the CASTOR mass storage system has been developed. The current setup and ongoing development towards disk quotas and CPU fairshare are described. Furthermore, the integration of PROOF into ALICE's software framework (AliRoot) is discussed.

  13. OCAAT: automated analysis of star cluster colour-magnitude diagrams for gauging the local distance scale

    NASA Astrophysics Data System (ADS)

    Perren, Gabriel I.; Vázquez, Ruben A.; Piatti, Andrés E.; Moitinho, André

    2014-05-01

    Star clusters are among the fundamental astrophysical objects used in setting the local distance scale. Despite its crucial importance, the accurate determination of the distances to the Magellanic Clouds (SMC/LMC) remains a fuzzy step in the cosmological distance ladder. The exquisite astrometry of the recently launched ESA Gaia mission is expected to deliver extremely accurate statistical parallaxes, and thus distances, to the SMC/LMC. However, an independent SMC/LMC distance determination via main sequence fitting of star clusters provides an important validation check point for the Gaia distances. This has been a valuable lesson learnt from the famous Hipparcos Pleiades distance discrepancy problem. Current observations will allow hundreds of LMC/SMC clusters to be analyzed in this light. Today, the most common approach for star cluster main sequence fitting is still by eye. The process is intrinsically subjective and affected by large uncertainties, especially when applied to poorly populated clusters. It is also, clearly, not an efficient route for addressing the analysis of hundreds, or thousands, of star clusters. These concerns, together with a new attitude towards advanced statistical techniques in astronomy and the availability of powerful computers, have led to the emergence of software packages designed for analyzing star cluster photometry. With a few rare exceptions, those packages are not publicly available. Here we present OCAAT (Open Cluster Automated Analysis Tool), a suite of publicly available open source tools that fully automates cluster isochrone fitting. The code will be applied to a large set of hundreds of open clusters observed in the Washington system, located in the Milky Way and the Magellanic Clouds. This will allow us to generate an objective and homogeneous catalog of distances up to ~ 60 kpc along with its associated reddening, ages and metallicities and uncertainty estimates.

  14. Comparison of population-averaged and cluster-specific models for the analysis of cluster randomized trials with missing binary outcomes: a simulation study

    PubMed Central

    2013-01-01

    Background The objective of this simulation study is to compare the accuracy and efficiency of population-averaged (i.e. generalized estimating equations (GEE)) and cluster-specific (i.e. random-effects logistic regression (RELR)) models for analyzing data from cluster randomized trials (CRTs) with missing binary responses. Methods In this simulation study, clustered responses were generated from a beta-binomial distribution. The number of clusters per trial arm, the number of subjects per cluster, intra-cluster correlation coefficient, and the percentage of missing data were allowed to vary. Under the assumption of covariate dependent missingness, missing outcomes were handled by complete case analysis, standard multiple imputation (MI) and within-cluster MI strategies. Data were analyzed using GEE and RELR. Performance of the methods was assessed using standardized bias, empirical standard error, root mean squared error (RMSE), and coverage probability. Results GEE performs well on all four measures, provided the downward bias of the standard error (when the number of clusters per arm is small) is adjusted appropriately, under the following scenarios: complete case analysis for CRTs with a small amount of missing data; standard MI for CRTs with variance inflation factor (VIF) < 3; within-cluster MI for CRTs with VIF ≥ 3 and cluster size > 50. RELR performs well only when a small amount of data was missing, and complete case analysis was applied. Conclusion GEE performs well as long as appropriate missing data strategies are adopted based on the design of CRTs and the percentage of missing data. In contrast, RELR does not perform well when either standard or within-cluster MI strategy is applied prior to the analysis. PMID:23343209
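
    The VIF thresholds in the Results can be made concrete with a small helper. The sketch below assumes the usual design-effect definition for equal cluster sizes, VIF = 1 + (m - 1) × ICC, which the abstract does not spell out; the function names are hypothetical, and the decision rule only encodes the two imputation scenarios quoted above (the "small amount of missing data" case is not quantified in the abstract and is left out).

```python
def variance_inflation_factor(cluster_size, icc):
    """Design effect for equal cluster sizes: 1 + (m - 1) * ICC (assumed definition)."""
    return 1.0 + (cluster_size - 1) * icc

def suggested_mi_strategy(cluster_size, icc):
    """Map a trial design onto the imputation scenarios listed in the abstract."""
    vif = variance_inflation_factor(cluster_size, icc)
    if vif < 3:
        return "standard MI"
    if cluster_size > 50:
        return "within-cluster MI"
    # VIF >= 3 with cluster size <= 50 is not addressed in the abstract.
    return "not covered by the abstract"

# Example designs (hypothetical values of cluster size and ICC).
small_design = suggested_mi_strategy(cluster_size=20, icc=0.05)   # VIF = 1.95
large_design = suggested_mi_strategy(cluster_size=100, icc=0.05)  # VIF = 5.95
```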

  15. A weak-lensing analysis of the Abell 383 cluster

    NASA Astrophysics Data System (ADS)

    Huang, Z.; Radovich, M.; Grado, A.; Puddu, E.; Romano, A.; Limatola, L.; Fu, L.

    2011-05-01

    Aims: We use deep CFHT and SUBARU uBVRIz archival images of the Abell 383 cluster (z = 0.187) to estimate its mass by weak-lensing. Methods: To this end, we first use simulated images to check the accuracy provided by our Kaiser-Squires-Broadhurst (KSB) pipeline. These simulations include shear testing programme (STEP) 1 and 2 simulations, as well as more realistic simulations of the distortion of galaxy shapes by a cluster with a Navarro-Frenk-White (NFW) profile. From these simulations we estimate the effect of noise on shear measurement and derive the correction terms. The R-band image is used to derive the mass by fitting the observed tangential shear profile with an NFW mass profile. Photometric redshifts are computed from the uBVRIz catalogs. Different methods for the foreground/background galaxy selection are implemented, namely selection by magnitude, color, and photometric redshifts, and the results are compared. In particular, we developed a semi-automatic algorithm to select the foreground galaxies in the color-color diagram, based on the observed colors. Results: Using color selection or photometric redshifts improves the correction of dilution from foreground galaxies: this leads to higher signals in the inner parts of the cluster. We obtain a cluster mass Mvir = 7.5 (+2.7, -1.9) × 10^14 M⊙: this value is 20% higher than previous estimates and is more consistent with the mass expected from X-ray data. The R-band luminosity function of the cluster is computed and gives a total luminosity Ltot = (2.14 ± 0.5) × 10^12 L⊙ and a mass-to-luminosity ratio M/L ~ 300 M⊙/L⊙. Based on: data collected with the Subaru Telescope (University of Tokyo) and obtained from the SMOKA, which is operated by the Astronomy Data Center, National Astronomical Observatory of Japan; observations obtained with MegaPrime/MegaCam, a joint project of CFHT and CEA/DAPNIA, at the Canada-France-Hawaii Telescope (CFHT), which is operated by the National Research Council (NRC) of Canada

  16. Clinical heterogeneity in patients with early-stage Parkinson's disease: a cluster analysis.

    PubMed

    Liu, Ping; Feng, Tao; Wang, Yong-jun; Zhang, Xuan; Chen, Biao

    2011-09-01

    The aim of this study was to investigate the clinical heterogeneity of Parkinson's disease (PD) among a cohort of Chinese patients in early stages. Clinical data on demographics, motor variables, motor phenotypes, disease progression, global cognitive function, depression, apathy, sleep quality, constipation, fatigue, and L-dopa complications were collected from 138 Chinese PD subjects in early stages (Hoehn and Yahr stages 1-3). The PD subject subtypes were classified using k-means cluster analysis of the clinical data, with five-, four-, and three-cluster solutions computed consecutively. Kappa statistical analysis was performed to evaluate the consistency among different subtype solutions. The cluster analysis indicated four main subtypes: the non-tremor dominant subtype (NTD, n=28, 20.3%), rapid disease progression subtype (RDP, n=7, 5.1%), young-onset subtype (YO, n=50, 36.2%), and tremor dominant subtype (TD, n=53, 38.4%). Overall, 78.3% (108/138) of subjects were always classified between the same three groups (52 always in TD, 7 in RDP, and 49 in NTD), and 98.6% (136/138) between five- and four-cluster solutions. However, subjects classified as NTD in the four-cluster analysis were dispersed into different subtypes in the three-cluster analysis, with low concordance between four- and three-cluster solutions (kappa value=-0.139, P=0.001). This study defines clinical heterogeneity of PD patients in early stages using a data-driven approach. The subtypes generated by the four-cluster solution appear to exhibit ideal internal cohesion and external isolation. PMID:21887844
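
    The kappa statistic used to compare subtype solutions can be illustrated as follows. This is a generic sketch of Cohen's kappa for two labelings of the same subjects, not the study's own analysis; it assumes the cluster labels of the two solutions have already been matched to each other, and the six example subjects are invented.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement between two labelings, corrected for chance."""
    n = len(labels_a)
    # Observed agreement: fraction of subjects given the same label twice.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from the marginal label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / n ** 2
    # Note: undefined (division by zero) if both labelings use a single label.
    return (observed - expected) / (1 - expected)

# Hypothetical subtype assignments for six subjects under two solutions.
solution_1 = ["TD", "TD", "NTD", "NTD", "YO", "YO"]
solution_2 = ["TD", "TD", "NTD", "YO", "YO", "YO"]
kappa = cohen_kappa(solution_1, solution_2)
```

    Values near 1 indicate that the two solutions classify subjects almost identically, values near 0 indicate agreement no better than chance, and negative values indicate less agreement than chance would produce.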

  17. Cluster analysis applied to velocity and attenuation tomography: the case study of Mt. Vesuvius

    NASA Astrophysics Data System (ADS)

    Siniscalchi, A.; Bianco, F.; Del Pezzo, E.; de Siena, L.; di Giuseppe, M. G.; Petrillo, Z.

    2009-04-01

    The interpretation of the results of seismic velocity and attenuation inversion is usually based on the qualitative observation and comparison of the different tomographic images. A promising tool to jointly interpret tomographic models based on different parameters resides in the application of statistical classification methods, such as the k-means clustering method, which minimizes the logic distance among each group of observations having homogeneous physical properties and maximizes the same quantity between groups. The correlation between the models is subsequently examined and significant classes (volumes of high correlation) are identified. Such a technique is able to spatially cluster the zones having similar characteristics in a statistical sense. Each zone is finally identified by the barycenter (centroid) of the corresponding cluster. The Vp velocity and Qp and Qs attenuation structures of Mt. Vesuvius, Italy, have already been qualitatively interpreted by a comparison with other similar investigations. To obtain a more quantitative interpretation gathered in a unified model consistent with the entire dataset, a cluster analysis was applied to these models. An optimizing study on the proper number of classes recognizes five clusters corresponding to separate zones inside the volcano structure.
    - The first cluster can be considered as a "background" cluster, and corresponds to the areas with "average" seismic properties (mainly located below the topographical interface).
    - The second cluster defines a spatial pattern corresponding to the residual part of the feeding conduit of the volcano.
    - The third cluster corresponds to two volumes, the first vertically extended between -1000 and -3000 m above the sea level, North-Eastward the cone; the second, in the same depth range Westward the central cone, and linked to the first one at -2000 m. These two volumes may be associated with hydrothermal basins.
    - The fourth and fifth clusters are described both by
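
    The two quantities this abstract relies on, the barycenter of each cluster and the within-group distance that k-means minimizes, can be computed as sketched below. The voxel values for (Vp, Qp, Qs) and the labels are invented for illustration, the function name is hypothetical, and squared Euclidean distance is assumed as the distance measure.

```python
def centroids_and_wcss(features, labels):
    """Return per-cluster barycenters and the within-cluster sum of squares."""
    groups = {}
    for x, lab in zip(features, labels):
        groups.setdefault(lab, []).append(x)
    # Barycenter: component-wise mean of the members of each cluster.
    centroids = {
        lab: tuple(sum(col) / len(rows) for col in zip(*rows))
        for lab, rows in groups.items()
    }
    # k-means objective: squared distance of every point to its own barycenter.
    wcss = sum(
        sum((xi - ci) ** 2 for xi, ci in zip(x, centroids[lab]))
        for x, lab in zip(features, labels)
    )
    return centroids, wcss

# Hypothetical (Vp, Qp, Qs) values for three voxels and their cluster labels.
voxels = [(3.0, 100.0, 80.0), (5.0, 120.0, 100.0), (6.0, 300.0, 250.0)]
centroids, wcss = centroids_and_wcss(voxels, labels=[0, 0, 1])
```

    Because velocity and attenuation are measured in very different units, a real analysis would typically rescale each parameter before computing distances; that step is omitted here.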

  18. Genome-scale cluster analysis of replicated microarrays using shrinkage correlation coefficient

    PubMed Central

    Yao, Jianchao; Chang, Chunqi; Salmi, Mari L; Hung, Yeung Sam; Loraine, Ann; Roux, Stanley J

    2008-01-01

    Background Currently, clustering with some form of correlation coefficient as the gene similarity metric has become a popular method for profiling genomic data. The Pearson correlation coefficient and the standard deviation (SD)-weighted correlation coefficient are the two most widely-used correlations as the similarity metrics in clustering microarray data. However, these two correlations are not optimal for analyzing replicated microarray data generated by most laboratories. An effective correlation coefficient is needed to provide statistically sufficient analysis of replicated microarray data. Results In this study, we describe a novel correlation coefficient, shrinkage correlation coefficient (SCC), that fully exploits the similarity between the replicated microarray experimental samples. The methodology considers both the number of replicates and the variance within each experimental group in clustering expression data, and provides a robust statistical estimation of the error of replicated microarray data. The value of SCC is revealed by its comparison with two other correlation coefficients that are currently the most widely-used (Pearson correlation coefficient and SD-weighted correlation coefficient) using statistical measures on both synthetic expression data as well as real gene expression data from Saccharomyces cerevisiae. Two leading clustering methods, hierarchical and k-means clustering were applied for the comparison. The comparison indicated that using SCC achieves better clustering performance. Applying SCC-based hierarchical clustering to the replicated microarray data obtained from germinating spores of the fern Ceratopteris richardii, we discovered two clusters of genes with shared expression patterns during spore germination. Functional analysis suggested that some of the genetic mechanisms that control germination in such diverse plant lineages as mosses and angiosperms are also conserved among ferns. 
Conclusion This study shows that SCC is
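
    For orientation, the sketch below shows how a correlation coefficient is turned into a dissimilarity for clustering expression profiles. It uses the plain Pearson correlation, one of the two baseline metrics named in the abstract, and not the shrinkage correlation coefficient itself, whose formula is not given here. The function names and the three toy profiles are assumptions.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)

def correlation_distance_matrix(profiles):
    """Pairwise dissimilarity 1 - r: 0 for co-varying profiles, 2 for opposite ones."""
    return [[1.0 - pearson(p, q) for q in profiles] for p in profiles]

# Hypothetical expression profiles of three genes across three conditions.
profiles = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [3.0, 2.0, 1.0]]
distances = correlation_distance_matrix(profiles)
```

    A matrix like this can then be handed to a hierarchical clustering routine, or the correlation can serve as the similarity measure inside k-means.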

  19. Cluster identification in AA5754 aluminium sheets using mathematical morphology analysis.

    PubMed

    Tewari, A; Tiwari, S; Biswas, P; Mishra, R K

    2008-05-01

    Quantitative image analysis of particle distribution in the microstructure of continuous cast (CC) and direct chill cast (DC) AA5754 aluminium alloy sheets has been conducted. This information can be used as an input for modelling mechanical deformation and instability in these materials. The quantitative analysis reveals that there are significant differences in the microstructure of the two materials even though the total content of second-phase particles is statistically similar. Qualitative observation shows the second-phase particles to be arranged in the form of streaks parallel to the rolling direction in the CC sheets and in a uniform random manner in the DC sheets. The main difference in the geometric microstructure of the CC and DC material is the spatial arrangement of the second-phase particles. A new mathematical technique called proximity analysis is developed to identify clusters and the group of particles belonging to each cluster. Quantification through proximity analysis reveals that the particle clusters in CC sheet are in the form of long clusters (streaks) parallel to the rolling direction and are significantly longer than those in DC sheets (with the largest cluster in CC being four times larger than DC), and also have anisotropic angular orientation parallel to the rolling direction. The lower value of fracture strain observed in the CC sheets compared to DC sheets is attributed to a combination of large sizes of clusters and their preferential alignment along the rolling direction in the CC microstructure. PMID:18445147

  20. Functional cluster analysis of CT perfusion maps: a new tool for diagnosis of acute stroke?

    PubMed

    Baumgartner, Christian; Gautsch, Kurt; Böhm, Christian; Felber, Stephan

    2005-09-01

    CT perfusion imaging constitutes an important contribution to the early diagnosis of acute stroke. Cerebral blood flow (CBF), cerebral blood volume (CBV) and time-to-peak (TTP) maps are used to estimate the severity of cerebral damage after acute ischemia. We introduce functional cluster analysis as a new tool to evaluate CT perfusion in order to identify normal brain, ischemic tissue and large vessels. CBF, CBV and TTP maps represent the basis for cluster analysis applying a partitioning (k-means) and density-based (density-based spatial clustering of applications with noise, DBSCAN) paradigm. In patients with transient ischemic attack and stroke, cluster analysis identified brain areas with distinct hemodynamic properties (gray and white matter) and segmented territorial ischemia. CBF, CBV and TTP values of each detected cluster were displayed. Our preliminary results indicate that functional cluster analysis of CT perfusion maps may become a helpful tool for the interpretation of perfusion maps and provide a rapid means for the segmentation of ischemic tissue. PMID:15827821
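
    The density-based paradigm mentioned above (DBSCAN) differs from k-means in that the number of clusters is not fixed in advance and isolated points are labelled as noise. The following is a compact standard-library sketch of the textbook algorithm, not the authors' implementation; the two-dimensional points and the `eps` and `min_pts` values are invented for illustration.

```python
import math

def dbscan(points, eps, min_pts):
    """Textbook DBSCAN. Returns one label per point; -1 marks noise."""
    NOISE, UNSEEN = -1, None
    labels = [UNSEEN] * len(points)

    def neighbours(i):
        # All points within eps of point i (the point itself is included).
        return [j for j in range(len(points))
                if math.dist(points[i], points[j]) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not UNSEEN:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but does not expand
            if labels[j] is not UNSEEN:
                continue
            labels[j] = cluster
            nbrs = neighbours(j)
            if len(nbrs) >= min_pts:
                queue.extend(nbrs)  # core point: keep growing the cluster
        cluster += 1
    return labels

# Hypothetical feature pairs for seven pixels; the last one is an outlier.
pixels = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(pixels, eps=1.5, min_pts=3)
```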

  1. Symptom Clusters in People Living with HIV Attending Five Palliative Care Facilities in Two Sub-Saharan African Countries: A Hierarchical Cluster Analysis

    PubMed Central

    Moens, Katrien; Siegert, Richard J.; Taylor, Steve; Namisango, Eve; Harding, Richard

    2015-01-01

    Background Symptom research across conditions has historically focused on single symptoms, and the burden of multiple symptoms and their interactions has been relatively neglected especially in people living with HIV. Symptom cluster studies are required to set priorities in treatment planning, and to lessen the total symptom burden. This study aimed to identify and compare symptom clusters among people living with HIV attending five palliative care facilities in two sub-Saharan African countries. Methods Data from cross-sectional self-report of seven-day symptom prevalence on the 32-item Memorial Symptom Assessment Scale-Short Form were used. A hierarchical cluster analysis was conducted using Ward’s method applying squared Euclidean Distance as the similarity measure to determine the clusters. Contingency tables, χ² tests and ANOVA were used to compare the clusters by patient specific characteristics and distress scores. Results Among the sample (N=217) the mean age was 36.5 (SD 9.0), 73.2% were female, and 49.1% were on antiretroviral therapy (ART). The cluster analysis produced five symptom clusters identified as: 1) dermatological; 2) generalised anxiety and elimination; 3) social and image; 4) persistently present; and 5) a gastrointestinal-related symptom cluster. The patients in the first three symptom clusters reported the highest physical and psychological distress scores. Patient characteristics varied significantly across the five clusters by functional status (worst functional physical status in cluster one, p<0.001); being on ART (highest proportions for clusters two and three, p=0.012); global distress (F=26.8, p<0.001), physical distress (F=36.3, p<0.001) and psychological distress subscale (F=21.8, p<0.001) (all subscales worst for cluster one, best for cluster four). Conclusions The greatest burden is associated with cluster one, and should be prioritised in clinical management. Further symptom cluster research in people living with HIV with
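
    Ward's method with squared Euclidean distance merges, at each step, the pair of clusters whose union increases the total within-cluster sum of squares the least. The sketch below illustrates that criterion on invented data; the symptom names and the 0/1 presence vectors across four patients are hypothetical, and the code is a naive illustration rather than the procedure of any statistics package.

```python
def centroid(cluster):
    """Component-wise mean of a list of equal-length vectors."""
    return tuple(sum(col) / len(cluster) for col in zip(*cluster))

def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b are merged."""
    ca, cb = centroid(a), centroid(b)
    sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * sq_dist

def ward_clusters(items, n_clusters):
    """items maps a name to its vector; merge greedily by lowest Ward cost."""
    clusters = [[name] for name in items]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                cost = ward_cost([items[n] for n in clusters[i]],
                                 [items[n] for n in clusters[j]])
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        clusters[i] += clusters[j]
        del clusters[j]
    return clusters

# Hypothetical symptom presence (1) or absence (0) in four patients.
symptoms = {
    "itching": (1, 1, 0, 0),
    "skin_changes": (1, 1, 0, 1),
    "nausea": (0, 0, 1, 1),
    "vomiting": (0, 0, 1, 0),
}
symptom_clusters = ward_clusters(symptoms, n_clusters=2)
```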

  2. MMPI-2: Cluster Analysis of Personality Profiles in Perinatal Depression—Preliminary Evidence

    PubMed Central

    Grillo, Alessandra; Lauriola, Marco; Giacchetti, Nicoletta

    2014-01-01

    Background. To assess personality characteristics of women who develop perinatal depression. Methods. The study started with a screening of a sample of 453 women in their third trimester of pregnancy, who were administered a survey data form, the Edinburgh Postnatal Depression Scale (EPDS) and the Minnesota Multiphasic Personality Inventory 2 (MMPI-2). A clinical group of subjects with perinatal depression (PND, 55 subjects) was selected; clinical and validity scales of MMPI-2 were used as predictors in a hierarchical cluster analysis. Results. The analysis identified three clusters of personality profile: two “clinical” clusters (1 and 3) and an “apparently common” one (cluster 2). The first cluster (39.5%) collects structures of personality with prevalent obsessive or dependent functioning tending to develop a “psychasthenic” depression; the third cluster (13.95%) includes women with prevalent borderline functioning tending to develop “dysphoric” depression; the second cluster (46.5%) shows a normal profile with a “defensive” attitude, probably due to the presence of defense mechanisms or to the fear of stigma. Conclusion. Characteristics of personality have a key role in clinical manifestations of perinatal depression; it is important to detect them to identify mothers at risk and to plan targeted therapeutic interventions. PMID:25574499

  3. Application of Geostatistical Methods and Machine Learning for spatio-temporal Earthquake Cluster Analysis

    NASA Astrophysics Data System (ADS)

    Schaefer, A. M.; Daniell, J. E.; Wenzel, F.

    2014-12-01

    Earthquake clustering tends to be an increasingly important part of general earthquake research especially in terms of seismic hazard assessment and earthquake forecasting and prediction approaches. The distinct identification and definition of foreshocks, aftershocks, mainshocks and secondary mainshocks is taken into account using a point based spatio-temporal clustering algorithm originating from the field of classic machine learning. This can be further applied for declustering purposes to separate background seismicity from triggered seismicity. The results are interpreted and processed to assemble 3D-(x,y,t) earthquake clustering maps which are based on smoothed seismicity records in space and time. In addition, multi-dimensional Gaussian functions are used to capture clustering parameters for spatial distribution and dominant orientations. Clusters are further processed using methodologies originating from geostatistics, which have been mostly applied and developed in mining projects during the last decades. A 2.5D variogram analysis is applied to identify spatio-temporal homogeneity in terms of earthquake density and energy output. The results are mitigated using Kriging to provide an accurate mapping solution for clustering features. As a case study, seismic data of New Zealand and the United States is used, covering events since the 1950s, from which an earthquake cluster catalogue is assembled for most of the major events, including a detailed analysis of the Landers and Christchurch sequences.
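
    The variogram step can be illustrated with the classical empirical semivariogram, which averages half the squared difference of a quantity over all pairs of locations falling in the same separation bin. This is a generic sketch under that textbook definition, not the 2.5D spatio-temporal variant described in the abstract; the coordinates, values and bin settings are invented.

```python
import math
from itertools import combinations

def empirical_semivariogram(coords, values, lag_width, n_lags):
    """gamma(h) = sum over pairs in bin h of (z_i - z_j)^2 / (2 * pairs in bin)."""
    sums = [0.0] * n_lags
    counts = [0] * n_lags
    for i, j in combinations(range(len(coords)), 2):
        h = math.dist(coords[i], coords[j])  # separation distance of the pair
        k = int(h // lag_width)              # index of the lag bin
        if k < n_lags:
            sums[k] += (values[i] - values[j]) ** 2
            counts[k] += 1
    # Bins without any pair are reported as None rather than zero.
    return [s / (2 * c) if c else None for s, c in zip(sums, counts)]

# Hypothetical event densities at three locations along a line.
coords = [(0, 0), (1, 0), (2, 0)]
density = [1.0, 2.0, 4.0]
gamma = empirical_semivariogram(coords, density, lag_width=1.0, n_lags=3)
```

    A model fitted to such a curve is what Kriging uses to weight neighbouring observations when interpolating.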

  4. APPLICATION OF CLUSTER ANALYSIS TO AEROMETRIC DATA. VOLUME I. PART 1: CLUSTERING, VALIDATION, AND CLASSIFICATION OF DATA. PART 2: INVESTIGATION AND REPORT OF CLUSTER ANALYSIS

    EPA Science Inventory

    The calibration and enhancement of Wolfe's NORMIX (normal mixtures) computer program in the National Computing Center of the U.S. Environmental Protection Agency at the Research Triangle Park, NC is documented. The program is available for data clustering, validation, and classif...

  5. Cluster analysis based on dimensional information with applications to feature selection and classification

    NASA Technical Reports Server (NTRS)

    Eigen, D. J.; Fromm, F. R.; Northouse, R. A.

    1974-01-01

    A new clustering algorithm is presented that is based on dimensional information. The algorithm includes an inherent feature selection criterion, which is discussed. Further, a heuristic method for choosing the proper number of intervals for a frequency distribution histogram, a feature necessary for the algorithm, is presented. The algorithm, although usable as a stand-alone clustering technique, is then utilized as a global approximator. Local clustering techniques and configuration of a global-local scheme are discussed, and finally the complete global-local and feature selector configuration is shown in application to a real-time adaptive classification scheme for the analysis of remote sensed multispectral scanner data.

  6. AutoGate: A Macintosh cluster analysis program for flow cytometry data

    SciTech Connect

    Salzman, G.C.; Parson, J.D.; Beckman, R.J.; Stewart, S.J.; Stewart, C.C.

    1993-01-01

    AutoGate, a cluster analysis program for Flow Cytometry Standard Data, has been developed for use on the Macintosh computer. AutoGate reads FCS format list mode files. It partitions the list mode events into a user-selected number of populations using K-means cluster analysis. One or more of the populations can be displayed as colored, bivariate dot plots. Eight variate data and up to twelve clusters can be analyzed. The dot plots can be saved as PICT format files. Data for individual clusters can be saved as FCS or ASCII format files. AutoGate is available from the authors through the National Flow Cytometry and Sorting Research Resource at Los Alamos.
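
    The K-means partitioning step described above can be illustrated with a minimal sketch. This is not AutoGate's code: the function name `kmeans`, the seeded random initialisation and the toy two-parameter events are all assumptions made for illustration, and the loop is plain Lloyd iteration (assign each event to its nearest centroid, then recompute centroids).

```python
import math
import random


def kmeans(events, k, iterations=100, seed=0):
    """Partition events (tuples of floats) into k populations (Lloyd's algorithm).

    Hypothetical helper for illustration; returns (labels, centroids).
    """
    rng = random.Random(seed)
    # Assumption: initial centroids are k distinct events drawn at random.
    centroids = [list(p) for p in rng.sample(events, k)]
    labels = [0] * len(events)
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in events
        ]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(events, labels) if lab == c]
            if members:
                new_centroids.append(
                    [sum(col) / len(members) for col in zip(*members)]
                )
            else:
                # Keep an empty population's centroid where it was.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return labels, centroids


# Toy list-mode events with two measured parameters (made-up values).
events = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = kmeans(events, k=2)
```

    The number of populations `k` is supplied by the caller, mirroring the user-selected number of populations mentioned in the abstract.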

  7. Chaotic Artificial Bee Colony Used for Cluster Analysis

    NASA Astrophysics Data System (ADS)

    Zhang, Yudong; Wu, Lenan; Wang, Shuihua; Huo, Yuankai

    A new approach based on artificial bee colony (ABC) with chaotic theory was proposed to solve the partitional clustering problem. We first investigate the optimization model including both the encoding strategy and the variance ratio criterion (VRC). Second, a chaotic ABC algorithm was developed based on the Rossler attractor. Experiments on three types of artificial data of different degrees of overlapping all demonstrate the CABC is superior to both genetic algorithm (GA) and combinatorial particle swarm optimization (CPSO) in terms of robustness and computation time.
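
    The variance ratio criterion (VRC) named above is the ratio of between-cluster to within-cluster dispersion, each divided by its degrees of freedom: VRC = [B / (k - 1)] / [W / (n - k)]. The sketch below evaluates it for a given hard partition; it is not the authors' chaotic ABC optimiser, and the function name and toy points are assumptions for illustration only.

```python
def variance_ratio_criterion(points, labels):
    """VRC (Calinski-Harabasz index) of a hard partition; larger is better."""
    n = len(points)
    dim = len(points[0])
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    k = len(clusters)
    if k < 2 or k >= n:
        raise ValueError("VRC requires 2 <= k < n")

    def centroid(pts):
        return [sum(p[d] for p in pts) / len(pts) for d in range(dim)]

    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    overall = centroid(points)
    between = 0.0  # B: size-weighted squared distance of centroids to the grand mean
    within = 0.0   # W: squared distance of points to their own centroid
    for members in clusters.values():
        c = centroid(members)
        between += len(members) * sqdist(c, overall)
        within += sum(sqdist(p, c) for p in members)
    if within == 0.0:
        return float("inf")
    return (between / (k - 1)) / (within / (n - k))


# Toy data (made-up): two tight pairs far apart.
points = [(0, 0), (0, 2), (10, 0), (10, 2)]
good = variance_ratio_criterion(points, [0, 0, 1, 1])
bad = variance_ratio_criterion(points, [0, 1, 0, 1])
```

    A partitional optimiser such as the one in the abstract would search over label assignments for the partition that maximises this value; here the natural grouping scores far higher than the mismatched one.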

  8. Statistical analysis of catalogs of extragalactic objects. II - The Abell catalog of rich clusters

    NASA Technical Reports Server (NTRS)

    Hauser, M. G.; Peebles, P. J. E.

    1973-01-01

    The results of a power-spectrum analysis are presented for the distribution of clusters in the Abell catalog. Clear and direct evidence is found for superclusters with small angular scale, in agreement with the recent study of Bogart and Wagoner (1973). It is also found that the degree and angular scale of the apparent superclustering varies with distance in the manner expected if the clustering is intrinsic to the spatial distribution rather than a consequence of patchy local obscuration.

  9. Somatosensory nociceptive characteristics differentiate subgroups in people with chronic low back pain: a cluster analysis.

    PubMed

    Rabey, Martin; Slater, Helen; OʼSullivan, Peter; Beales, Darren; Smith, Anne

    2015-10-01

    The objectives of this study were to explore the existence of subgroups in a cohort with chronic low back pain (n = 294) based on the results of multimodal sensory testing and profile subgroups on demographic, psychological, lifestyle, and general health factors. Bedside (2-point discrimination, brush, vibration and pinprick perception, temporal summation on repeated monofilament stimulation) and laboratory (mechanical detection threshold, pressure, heat and cold pain thresholds, conditioned pain modulation) sensory testing were examined at wrist and lumbar sites. Data were entered into principal component analysis, and 5 component scores were entered into latent class analysis. Three clusters, with different sensory characteristics, were derived. Cluster 1 (31.9%) was characterised by average to high temperature and pressure pain sensitivity. Cluster 2 (52.0%) was characterised by average to high pressure pain sensitivity. Cluster 3 (16.0%) was characterised by low temperature and pressure pain sensitivity. Temporal summation occurred significantly more frequently in cluster 1. Subgroups were profiled on pain intensity, disability, depression, anxiety, stress, life events, fear avoidance, catastrophizing, perception of the low back region, comorbidities, body mass index, multiple pain sites, sleep, and activity levels. Clusters 1 and 2 had a significantly greater proportion of female participants and higher depression and sleep disturbance scores than cluster 3. The proportion of participants undertaking <300 minutes per week of moderate activity was significantly greater in cluster 1 than in clusters 2 and 3. Low back pain, therefore, does not appear to be homogeneous. Pain mechanisms relating to presentations of each subgroup were postulated. Future research may investigate prognoses and interventions tailored towards these subgroups. PMID:26020225

  10. Cluster-based analysis of cycle-to-cycle variations: application to internal combustion engines

    NASA Astrophysics Data System (ADS)

    Cao, Yujun; Kaiser, Eurika; Borée, Jacques; Noack, Bernd R.; Thomas, Lionel; Guilain, Stéphane

    2014-11-01

    We define and illustrate a cluster-based analysis of cycle-to-cycle variations (CCV). The methodology is applied to engine flow but can clearly be valuable for any periodically driven fluid flow at large Reynolds numbers. High-speed particle image velocimetry data acquired during the compression stroke for 161 consecutive engine cycles are used. Clustering is applied to the velocity fields normalised by their kinetic energy. From a phase-averaged analysis of the statistics of cluster content and inter-cluster transitions, we show that CCV can be associated with different sets of trajectories during the second half of the compression phase. Conditional statistics are computed for flow data of each cluster. In particular, we identify a particular subset associated with a loss of large-scale coherence, a very low kinetic energy of the mean flow and a higher fluctuating kinetic energy. This is interpreted as a good indicator of the breakdown of the large-scale coherent tumbling motion. For this particular subset, the cluster analysis confirms the idea of a gradual destabilisation of the in-cylinder flow during the final phase of the compression. Moreover, inter-cycle statistics show that the flow states near TDC and in the measurement zone are statistically independent for consecutive engine cycles. It is important to point out that this approach is generally applicable to very large sets of data, e.g. generated by PIV or LES, and independent of the considered type of information (velocity, concentration, etc.).

  11. An X-Ray Spectral Classification Algorithm with Application to Young Stellar Clusters

    NASA Astrophysics Data System (ADS)

    Hojnacki, S. M.; Kastner, J. H.; Micela, G.; Feigelson, E. D.; LaLonde, S. M.

    2007-04-01

    A large volume of low signal-to-noise, multidimensional data is available from the CCD imaging spectrometers aboard the Chandra X-Ray Observatory and the X-Ray Multimirror Mission (XMM-Newton). To make progress analyzing this data, it is essential to develop methods to sort, classify, and characterize the vast library of X-ray spectra in a nonparametric fashion (complementary to current parametric model fits). We have developed a spectral classification algorithm that handles large volumes of data and operates independently of the requirement of spectral model fits. We use proven multivariate statistical techniques including principal component analysis and an ensemble classifier consisting of agglomerative hierarchical clustering and K-means clustering applied for the first time for spectral classification. The algorithm positions the sources in a multidimensional spectral sequence and then groups the ordered sources into clusters based on their spectra. These clusters appear more distinct for sources with harder observed spectra. The apparent diversity of source spectra is reduced to a three-dimensional locus in principal component space, with spectral outliers falling outside this locus. The algorithm was applied to a sample of 444 strong sources selected from the 1616 X-ray emitting sources detected in deep Chandra imaging spectroscopy of the Orion Nebula Cluster. Classes form sequences in NH, AV, and accretion activity indicators, demonstrating that the algorithm efficiently sorts the X-ray sources into a physically meaningful sequence. The algorithm also isolates important classes of very deeply embedded, active young stellar objects, and yields trends between X-ray spectral parameters and stellar parameters for the lowest mass, pre-main-sequence stars.

  12. Clustering analysis for muon tomography data elaboration in the Muon Portal project

    NASA Astrophysics Data System (ADS)

    Bandieramonte, M.; Antonuccio-Delogu, V.; Becciani, U.; Costa, A.; La Rocca, P.; Massimino, P.; Petta, C.; Pistagna, C.; Riggi, F.; Riggi, S.; Sciacca, E.; Vitello, F.

    2015-05-01

    Clustering analysis is a multivariate data analysis technique that gathers statistical data units into groups so as to minimize the logical distance within each group and maximize the distance between different groups. In these proceedings, the authors present a novel approach to muon tomography data analysis based on clustering algorithms. As a case study we present the Muon Portal project, which aims to build and operate a dedicated particle detector for the inspection of harbor containers to hinder the smuggling of nuclear materials. Clustering techniques, working directly on scattering points, help to detect the presence of suspicious items inside the container, acting, as will be shown, as a filter for a preliminary analysis of the data.

  13. Weighing the Giants - I. Weak-lensing masses for 51 massive galaxy clusters: project overview, data analysis methods and cluster images

    NASA Astrophysics Data System (ADS)

    von der Linden, Anja; Allen, Mark T.; Applegate, Douglas E.; Kelly, Patrick L.; Allen, Steven W.; Ebeling, Harald; Burchat, Patricia R.; Burke, David L.; Donovan, David; Morris, R. Glenn; Blandford, Roger; Erben, Thomas; Mantz, Adam

    2014-03-01

    This is the first in a series of papers in which we measure accurate weak-lensing masses for 51 of the most X-ray luminous galaxy clusters known at redshifts 0.15 ≲ zCl ≲ 0.7, in order to calibrate X-ray and other mass proxies for cosmological cluster experiments. The primary aim is to improve the absolute mass calibration of cluster observables, currently the dominant systematic uncertainty for cluster count experiments. Key elements of this work are the rigorous quantification of systematic uncertainties, high-quality data reduction and photometric calibration, and the `blind' nature of the analysis to avoid confirmation bias. Our target clusters are drawn from X-ray catalogues based on the ROSAT All-Sky Survey, and provide a versatile calibration sample for many aspects of cluster cosmology. We have acquired wide-field, high-quality imaging using the Subaru Telescope and Canada-France-Hawaii Telescope for all 51 clusters, in at least three bands per cluster. For a subset of 27 clusters, we have data in at least five bands, allowing accurate photometric redshift estimates of lensed galaxies. In this paper, we describe the cluster sample and observations, and detail the processing of the SuprimeCam data to yield high-quality images suitable for robust weak-lensing shape measurements and precision photometry. For each cluster, we present wide-field three-colour optical images and maps of the weak-lensing mass distribution, the optical light distribution and the X-ray emission. These provide insights into the large-scale structure in which the clusters are embedded. We measure the offsets between X-ray flux centroids and the brightest cluster galaxies in the clusters, finding these to be small in general, with a median of 20 kpc. For offsets ≲100 kpc, weak-lensing mass measurements centred on the brightest cluster galaxies agree well with values determined relative to the X-ray centroids; miscentring is therefore not a significant source of systematic

  14. Transcriptome Analysis of Aspergillus flavus Reveals veA-Dependent Regulation of Secondary Metabolite Gene Clusters, Including the Novel Aflavarin Cluster

    PubMed Central

    Cary, J. W.; Han, Z.; Yin, Y.; Lohmar, J. M.; Shantappa, S.; Harris-Coward, P. Y.; Mack, B.; Ehrlich, K. C.; Wei, Q.; Arroyo-Manzanares, N.; Uka, V.; Vanhaecke, L.; Bhatnagar, D.; Yu, J.; Nierman, W. C.; Johns, M. A.; Sorensen, D.; Shen, H.; De Saeger, S.; Diana Di Mavungu, J.

    2015-01-01

    The global regulatory veA gene governs development and secondary metabolism in numerous fungal species, including Aspergillus flavus. This is especially relevant since A. flavus infects crops of agricultural importance worldwide, contaminating them with potent mycotoxins. The most well-known are aflatoxins, which are cytotoxic and carcinogenic polyketide compounds. The production of aflatoxins and the expression of genes implicated in the production of these mycotoxins are veA dependent. The genes responsible for the synthesis of aflatoxins are clustered, a signature common for genes involved in fungal secondary metabolism. Studies of the A. flavus genome revealed many gene clusters possibly connected to the synthesis of secondary metabolites. Many of these metabolites are still unknown, or the association between a known metabolite and a particular gene cluster has not yet been established. In the present transcriptome study, we show that veA is necessary for the expression of a large number of genes. Twenty-eight out of the predicted 56 secondary metabolite gene clusters include at least one gene that is differentially expressed depending on presence or absence of veA. One of the clusters under the influence of veA is cluster 39. The absence of veA results in a downregulation of the five genes found within this cluster. Interestingly, our results indicate that the cluster is expressed mainly in sclerotia. Chemical analysis of sclerotial extracts revealed that cluster 39 is responsible for the production of aflavarin. PMID:26209694

  15. Identification of Asthma Phenotypes Using Cluster Analysis in the Severe Asthma Research Program

    PubMed Central

    Moore, Wendy C.; Meyers, Deborah A.; Wenzel, Sally E.; Teague, W. Gerald; Li, Huashi; Li, Xingnan; D'Agostino, Ralph; Castro, Mario; Curran-Everett, Douglas; Fitzpatrick, Anne M.; Gaston, Benjamin; Jarjour, Nizar N.; Sorkness, Ronald; Calhoun, William J.; Chung, Kian Fan; Comhair, Suzy A. A.; Dweik, Raed A.; Israel, Elliot; Peters, Stephen P.; Busse, William W.; Erzurum, Serpil C.; Bleecker, Eugene R.

    2010-01-01

    Rationale: The Severe Asthma Research Program cohort includes subjects with persistent asthma who have undergone detailed phenotypic characterization. Previous univariate methods compared features of mild, moderate, and severe asthma. Objectives: To identify novel asthma phenotypes using an unsupervised hierarchical cluster analysis. Methods: Reduction of the initial 628 variables to 34 core variables was achieved by elimination of redundant data and transformation of categorical variables into ranked ordinal composite variables. Cluster analysis was performed on 726 subjects. Measurements and Main Results: Five groups were identified. Subjects in Cluster 1 (n = 110) have early-onset atopic asthma with normal lung function treated with two or fewer controller medications (82%) and minimal health care utilization. Cluster 2 (n = 321) consists of subjects with early-onset atopic asthma and preserved lung function but increased medication requirements (29% on three or more medications) and health care utilization. Cluster 3 (n = 59) is a unique group of mostly older obese women with late-onset nonatopic asthma, moderate reductions in FEV1, and frequent oral corticosteroid use to manage exacerbations. Subjects in Clusters 4 (n = 120) and 5 (n = 116) have severe airflow obstruction with bronchodilator responsiveness but differ in their ability to attain normal lung function, age of asthma onset, atopic status, and use of oral corticosteroids. Conclusions: Five distinct clinical phenotypes of asthma have been identified using unsupervised hierarchical cluster analysis. All clusters contain subjects who meet the American Thoracic Society definition of severe asthma, which supports clinical heterogeneity in asthma and the need for new approaches for the classification of disease severity in asthma. PMID:19892860

  16. A landscape-based cluster analysis using recursive search instead of a threshold parameter.

    PubMed

    Gladwin, Thomas E; Vink, Matthijs; Mars, Roger B

    2016-01-01

    Cluster-based analysis methods in neuroimaging provide control of whole-brain false positive rates without the need to conservatively correct for the number of voxels and the associated false negative results. The current method defines clusters based purely on shapes in the landscape of activation, instead of requiring the choice of a statistical threshold that may strongly affect results. Statistical significance is determined using permutation testing, combining both size and height of activation. A method is proposed for dealing with relatively small local peaks. Simulations confirm the method controls the false positive rate and correctly identifies regions of activation. The method is also illustrated using real data. • A landscape-based method to define clusters in neuroimaging data avoids the need to pre-specify a threshold to define clusters. • The implementation of the method works as expected, based on simulated and real data. • The recursive method used for defining clusters, the method used for combining clusters, and the definition of the "value" of a cluster may be of interest for future variations. PMID:27489780

  17. Profiling nurses' job satisfaction, acculturation, work environment, stress, cultural values and coping abilities: A cluster analysis.

    PubMed

    Goh, Yong-Shian; Lee, Alice; Chan, Sally Wai-Chi; Chan, Moon Fai

    2015-08-01

    This study aimed to determine whether definable profiles existed in a cohort of nursing staff with regard to demographic characteristics, job satisfaction, acculturation, work environment, stress, cultural values and coping abilities. A survey was conducted in one hospital in Singapore from June to July 2012, and 814 full-time staff nurses completed a self-report questionnaire (89% response rate). Demographic characteristics, job satisfaction, acculturation, work environment, perceived stress, cultural values, ways of coping and intention to leave current workplace were assessed as outcomes. The two-step cluster analysis revealed three clusters. Nurses in cluster 1 (n = 222) had lower acculturation scores than nurses in cluster 3. Cluster 2 (n = 362) was a group of younger nurses who reported higher intention to leave (22.4%), stress level and job dissatisfaction than the other two clusters. Nurses in cluster 3 (n = 230) were mostly Singaporean and reported the lowest intention to leave (13.0%). Resources should be allocated to specifically address the needs of younger nurses and hopefully retain them in the profession. Management should focus their retention strategies on junior nurses and provide a work environment that helps to strengthen their intention to remain in nursing by increasing their job satisfaction. PMID:24754648

  18. Structural Parameters of M81 Globular Clusters: Analysis of their Intensity Profile

    NASA Astrophysics Data System (ADS)

    Santiago-Cortés, M.; Mayya, Y. D.; Rosa-González, D.

    2014-09-01

    We present here an analysis of the surface brightness profiles on the Hubble Space Telescope (HST) F435W and F814W images for 110 Globular Clusters (GCs) in M81. The structural parameters for each of these clusters were obtained by fitting a King model to the observed profiles. The profiles are well-fitted by the King model in the majority of the GCs. We used these structural parameters to classify the GCs based on their halo and core properties. Based on the physical extent of the halo, measured as the isophotal radius at μ_I = 24 mag/arcsec^2 , we divided the clusters into two groups — compact and classical. By analyzing the core properties, we found 7 cuspy clusters, with properties similar to the cuspy clusters found in the Milky Way. In addition, we found 2 clusters that have a blue excess in the core, similar to the brightest GC in M81. We show that all clusters at galactocentric distance less than 4 kpc are tidally limited in M81.

  19. Cluster analysis for the probability of DSB site induced by electron tracks

    NASA Astrophysics Data System (ADS)

    Yoshii, Y.; Sasaki, K.; Matsuya, Y.; Date, H.

    2015-05-01

    To clarify the influence of bio-cells exposed to ionizing radiations, the densely populated pattern of the ionization in the cell nucleus is of importance because it governs the extent of DNA damage which may lead to cell lethality. In this study, we have conducted a cluster analysis of ionization and excitation events to estimate the number of double-strand breaks (DSBs) induced by electron tracks. A Monte Carlo simulation for electrons in liquid water was performed to determine the spatial location of the ionization and excitation events. The events were divided into clusters by using the density-based spatial clustering of applications with noise (DBSCAN) algorithm. The algorithm enables us to sort out the events into the groups (clusters) in which a minimum number of neighboring events are contained within a given radius. For evaluating the number of DSBs in the extracted clusters, we have introduced an aggregation index (AI). The computational results show that a sub-keV electron produces DSBs in a dense formation more effectively than higher energy electrons. The root-mean square radius (RMSR) of the cluster size is below 5 nm, which is smaller than the chromatin fiber thickness. It was found that this size of clustering events has a high possibility to cause lesions in DNA within the chromatin fiber site.
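
    The DBSCAN grouping rule described above (a cluster grows from any event that has at least a minimum number of neighbouring events within a given radius) can be sketched as follows. This is a generic textbook DBSCAN, not the authors' simulation code; the function name, the `eps` and `min_pts` values and the toy coordinates are assumptions for illustration.

```python
import math


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbours(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points)
                if math.dist(points[i], q) <= eps]

    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster_id
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins but does not expand
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster_id
            reachable = neighbours(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # core point: keep growing the cluster
        cluster_id += 1
    return labels


# Toy event positions in arbitrary length units (made-up values).
events = [(0, 0, 0), (1, 0, 0), (0, 1, 0),
          (10, 10, 10), (11, 10, 10), (10, 11, 10),
          (50, 50, 50)]
labels = dbscan(events, eps=1.5, min_pts=3)
```

    Isolated events receive the noise label -1, which is what lets a density-based method separate dense ionization patterns from sparse ones without fixing the number of clusters in advance.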

  20. The distinction of 'psychosomatogenic family types' based on parents' self reported questionnaire information: a cluster analysis.

    PubMed

    Rousseau, Sofie; Grietens, Hans; Vanderfaeillie, Johan; Ceulemans, Eva; Hoppenbrouwers, Karel; Desoete, Annemie; Van Leeuwen, Karla

    2014-06-01

    The theory of 'psychosomatogenic family types' is often used in treatment of somatizing adolescents. This study investigated the validity of distinguishing 'psychosomatogenic family types' based on parents' self-reported family features. The study included a Flemish general population sample of 12-year olds (n = 1428). We performed cluster analysis on 3 variables concerning parents' self-reported problems in family functioning. The distinguished clusters were examined for differences in marital problems, parental emotional problems, professional help for family members, demographics, and adolescents' somatization. Results showed the existence of 5 family types: 'chaotic family functioning,' 'average amount of family functioning problems,' 'few family functioning problems,' 'high amount of support and communication problems,' and 'high amount of sense of security problems' clusters. Membership of the 'chaotic family functioning' and 'average amount of family functioning problems' cluster was significantly associated with higher levels of somatization, compared with 'few family functioning problems' cluster membership. Among additional variables, only marital and parental emotional problems distinguished somatization relevant from non relevant clusters: parents in 'average amount of family functioning problems' and 'chaotic family functioning' clusters reported higher problems. The data showed that 'apparently perfect' or 'enmeshed' patterns of family functioning may not be assessed by means of parent report as adopted in this study. In addition, not only adolescents from 'extreme' types of family functioning may suffer from somatization. Further, professionals should be careful assuming that families in which parents report average to high amounts of family functioning problems also show different demographic characteristics. PMID:24749676

  1. Task Analysis for Health Occupations. Cluster: Dental Assisting. Occupation: Dental Assistant. Education for Employment Task Lists.

    ERIC Educational Resources Information Center

    Lathrop, Janice

    This document contains a task analysis for health occupations (dental assistant) in the dental assisting cluster. For each task listed, occupation, duty area, performance standard, steps, knowledge, attitudes, safety, equipment/supplies, source of analysis, and Illinois state goals for learning are listed. For the duty area of "providing…

  2. Task Analysis for Health Occupations. Cluster: Nursing. Occupation: Home Health Aide. Education for Employment Task Lists.

    ERIC Educational Resources Information Center

    Lake County Area Vocational Center, Grayslake, IL.

    This document contains a task analysis for health occupations (home health aide) in the nursing cluster. For each task listed, occupation, duty area, performance standard, steps, knowledge, attitudes, safety, equipment/supplies, source of analysis, and Illinois state goals for learning are listed. For the duty area of "providing therapeutic…

  3. Standardized Effect Size Measures for Mediation Analysis in Cluster-Randomized Trials

    ERIC Educational Resources Information Center

    Stapleton, Laura M.; Pituch, Keenan A.; Dion, Eric

    2015-01-01

    This article presents 3 standardized effect size measures to use when sharing results of an analysis of mediation of treatment effects for cluster-randomized trials. The authors discuss 3 examples of mediation analysis (upper-level mediation, cross-level mediation, and cross-level mediation with a contextual effect) with demonstration of the…

  4. Importance of Viral Sequence Length and Number of Variable and Informative Sites in Analysis of HIV Clustering

    PubMed Central

    Novitsky, Vlad; Moyo, Sikhulile; Lei, Quanhong; DeGruttola, Victor

    2015-01-01

    Abstract To improve the methodology of HIV cluster analysis, we addressed how analysis of HIV clustering is associated with parameters that can affect the outcome of viral clustering. The extent of HIV clustering and tree certainty was compared between 401 HIV-1C near full-length genome sequences and subgenomic regions retrieved from the LANL HIV Database. Sliding window analysis was based on 99 windows of 1,000 bp and 45 windows of 2,000 bp. Potential associations between the extent of HIV clustering and sequence length and the number of variable and informative sites were evaluated. The near full-length genome HIV sequences showed the highest extent of HIV clustering and the highest tree certainty. At the bootstrap threshold of 0.80 in maximum likelihood (ML) analysis, 58.9% of near full-length HIV-1C sequences but only 15.5% of partial pol sequences (ViroSeq) were found in clusters. Among HIV-1 structural genes, pol showed the highest extent of clustering (38.9% at a bootstrap threshold of 0.80), although it was significantly lower than in the near full-length genome sequences. The extent of HIV clustering was significantly higher for sliding windows of 2,000 bp than 1,000 bp. We found a strong association between the sequence length and proportion of HIV sequences in clusters, and a moderate association between the number of variable and informative sites and the proportion of HIV sequences in clusters. In HIV cluster analysis, the extent of detectable HIV clustering is directly associated with the length of viral sequences used, as well as the number of variable and informative sites. Near full-length genome sequences could provide the most informative HIV cluster analysis. Selected subgenomic regions with a high extent of HIV clustering and high tree certainty could also be considered as a second choice. PMID:25560745

  5. Classifying reanalysis surface temperature probability density functions (PDFs) over North America with cluster analysis

    NASA Astrophysics Data System (ADS)

    Loikith, P. C.; Lintner, B. R.; Kim, J.; Lee, H.; Neelin, J. D.; Waliser, D. E.

    2013-07-01

    An important step in projecting future climate change impacts on extremes involves quantifying the underlying probability distribution functions (PDFs) of climate variables. However, doing so can prove challenging when multiple models and large domains are considered. Here an approach to PDF quantification using k-means clustering is considered. A standard clustering algorithm (with k = 5 clusters) is applied to 33 years of daily January surface temperature from two state-of-the-art reanalysis products, the North American Regional Reanalysis and the Modern Era Retrospective Analysis for Research and Applications. The resulting cluster assignments yield spatially coherent patterns that can be broadly related to distinct climate regimes over North America, e.g., low variability over the tropical oceans or temperature advection across stronger or weaker gradients. This technique has the potential to be a useful and intuitive tool for evaluation of model-simulated PDF structure and could provide insight into projections of future changes in temperature.

  6. 3D Plasma Clusters: Analysis of dynamical evolution and individual particle interaction

    SciTech Connect

    Antonova, T.; Thomas, H. M.; Morfill, G. E.; Annaratone, B. M.

    2008-09-07

    3D plasma clusters (up to 100 particles) have been built inside a small (32 mm^3) plasma volume in gravity. It has been estimated that the external confinement has a negligible influence on the processes inside the clusters. Under such conditions the analysis of dynamical evolution and individual particle interactions has shown that the binary interaction among particles, in addition to the repelling Coulomb force, also exhibits an attractive part. The tendency of the systems to approach the state with minimum energy by rearranging particles inside has been detected. The measured vibrations of a 63-particle cluster are in close agreement with vibrations of a drop with surface tension. This indicates that even a 63-particle cluster already exhibits properties normally associated with the cooperative regime.

  7. Insights into secondary metabolism from a global analysis of prokaryotic biosynthetic gene clusters

    PubMed Central

    Cimermancic, Peter; Medema, Marnix H.; Claesen, Jan; Kurita, Kenji; Wieland Brown, Laura C.; Mavrommatis, Konstantinos; Pati, Amrita; Godfrey, Paul A.; Koehrsen, Michael; Clardy, Jon; Birren, Bruce W.; Takano, Eriko; Sali, Andrej; Linington, Roger G.; Fischbach, Michael A.

    2014-01-01

    Summary Although biosynthetic gene clusters (BGCs) have been discovered for hundreds of bacterial metabolites, our knowledge of their diversity remains limited. Here, we used a novel algorithm to systematically identify BGCs in the extensive extant microbial sequencing data. Network analysis of the predicted BGCs revealed large gene cluster families, the vast majority uncharacterized. We experimentally characterized the most prominent family, consisting of two subfamilies of hundreds of BGCs distributed throughout the Proteobacteria; their products are aryl polyenes, lipids with an aryl head group conjugated to a polyene tail. We identified a distant relationship to a third subfamily of aryl polyene BGCs, and together the three subfamilies represent the largest known family of biosynthetic gene clusters, with more than 1,000 members. Although these clusters are widely divergent in sequence, their small molecule products are remarkably conserved, indicating for the first time the important roles these compounds play in Gram-negative cell biology. PMID:25036635

  8. Functional Interference Clusters in Cancer Patients With Bone Metastases: A Secondary Analysis of RTOG 9714

    SciTech Connect

    Chow, Edward; James, Jennifer; Barsevick, Andrea; Hartsell, William; Ratcliffe, Sarah; Scarantino, Charles; Ivker, Robert; Roach, Mack; Suh, John; Petersen, Ivy; Konski, Andre; Demas, William; Bruner, Deborah

    2010-04-15

    Purpose: To explore the relationships (clusters) among the functional interference items in the Brief Pain Inventory (BPI) in patients with bone metastases. Methods: Patients enrolled in the Radiation Therapy Oncology Group (RTOG) 9714 bone metastases study were eligible. Patients were assessed at baseline and 4, 8, and 12 weeks after randomization for the palliative radiotherapy with the BPI, which consists of seven functional items: general activity, mood, walking ability, normal work, relations with others, sleep, and enjoyment of life. Principal component analysis with varimax rotation was used to determine the clusters between the functional items at baseline and the follow-up. Cronbach's alpha was used to determine the consistency and reliability of each cluster at baseline and follow-up. Results: There were 448 male and 461 female patients, with a median age of 67 years. There were two functional interference clusters at baseline, which accounted for 71% of the total variance. The first cluster (physical interference) included normal work and walking ability, which accounted for 58% of the total variance. The second cluster (psychosocial interference) included relations with others and sleep, which accounted for 13% of the total variance. The Cronbach's alpha statistics were 0.83 and 0.80, respectively. The functional clusters changed at week 12 in responders but persisted through week 12 in nonresponders. Conclusion: Palliative radiotherapy is effective in reducing bone pain. Functional interference component clusters exist in patients treated for bone metastases. These clusters changed over time in this study, possibly attributable to treatment. Further research is needed to examine these effects.
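    The reliability statistic mentioned above can be sketched as follows. This is the textbook formula for Cronbach's alpha, α = k/(k − 1) · (1 − Σ item variances / variance of total scores), and not the authors' own code; the function name and the toy scores are hypothetical.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(scores[0])
    # Sample variance of each item (column) across respondents.
    item_vars = [variance(col) for col in zip(*scores)]
    # Sample variance of each respondent's summed score.
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Made-up interference ratings for two items in one cluster, four patients.
toy_scores = [[2, 3], [4, 5], [6, 6], [8, 9]]
alpha = cronbach_alpha(toy_scores)
```

    Values closer to 1 indicate that the items in a cluster vary together, which is how the reported 0.83 and 0.80 are usually read.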

  9. Preliminary Cluster Analysis For Several Representatives Of Genus Kerivoula (Chiroptera: Vespertilionidae) in Borneo

    NASA Astrophysics Data System (ADS)

    Hasan, Noor Haliza; Abdullah, M. T.

    2008-01-01

    The aim of the study is to use cluster analysis on morphometric parameters within the genus Kerivoula to produce a dendrogram and to determine the suitability of this method to describe the relationship among species within this genus. A total of 15 adult male individuals from genus Kerivoula taken from sampling trips around Borneo and specimens kept at the zoological museum of Universiti Malaysia Sarawak were examined. A total of 27 characters using dental, skull and external body measurements were recorded. Clustering analysis illustrated the grouping and morphometric relationships between the species of this genus. It has clearly separated each species from each other despite the overlapping of measurements of some species within the genus. Cluster analysis provides an alternative approach to make a preliminary identification of a species.

  10. Weak lensing analysis of the galaxy cluster RXJ1117.4+0743 ([VMF98]097)

    NASA Astrophysics Data System (ADS)

    Gonzalez, E. J.; Domínguez, M.; García Lambas, D.; Moreschi, O.; Foex, G.; Nilo Castellon, J. L.; Alonso, M. V.

    We present a weak lensing analysis of the galaxy cluster RXJ1117.4+0743 ([VMF98]097) at ; based on data collected with Gemini South Telescope. The cluster was formerly analyzed by Carrasco et al. (2007, ApJ, 664, 777), and they found a large discrepancy between the mass estimated from X-ray observations and lensing estimates, exceeding the lensing mass by more than a factor of three. Our result for the mass from the weak lensing analysis is lower than the mass obtained by Carrasco et al. and closer to the X-ray mass.

  11. Cluster analysis and relative relocation of mining-induced seismicity using HAMNET data

    NASA Astrophysics Data System (ADS)

    Wehling-Benatelli, S.; Becker, D.; Bischoff, M.; Friederich, W.; Meier, T.

    2012-04-01

    Longwall mining activity in the Ruhr coal mining district leads to mining-induced seismicity. For detailed studies, the seismicity of the single longwall panel S 109 beneath Hamm-Herringen in the eastern Ruhr area was monitored between June 2006 and July 2007. More than 7000 seismic events with magnitudes -1.7 ≤ M_L ≤ 2.0 are localized in this period. 70% of the events occur in the vicinity of the moving longwall face. Moreover, the seismicity pattern shows spatial clustering of events at distances up to 500 m from the panel, which is related to remnant pillars of old workings and tectonic features. Two sources with common location and rock failure mechanism are expected to show identical waveforms. Hence, similar waveforms suggest similarity of source properties. Waveform similarity can be quantified by cross-correlation. Similarity matrices have been established and form the basis of the cluster analysis presented here. We compare two approaches for cluster definition: a single-linkage approach and excerpting clusters by visual inspection of the sorted similarity matrices. In the visual approach, clusters are found as areas of high inter-event similarity in the depicted matrix. In contrast, the single-linkage approach assigns an event to a cluster if the similarity threshold v_sl = 0.9 is exceeded with respect to at least one other member. This method is more restrictive and, in general, leads to clusters with fewer members than visual inspection. Both methods yield clusters which show the same properties. The largest clusters are built by low-magnitude events (around M_L ≈ -0.6) directly at the longwall face at the mining level. Other clusters include events with magnitudes as large as M_L,max = 1.8. Their locations tend to lie above or below the mining level in load-bearing sandstone layers. Mining-accompanying events show face-parallel, near-vertical fault planes, whereas more distant clusters have typical solutions of remnant pillar failure with a medium dip angle. Relative relocation of the events
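    The single-linkage rule described in this abstract (an event joins a cluster when its similarity to at least one member exceeds 0.9) is equivalent to finding connected components of a thresholded similarity graph. The following is a minimal sketch under that reading; the function name and the toy similarity matrix are hypothetical, and real input would be a matrix of waveform cross-correlation coefficients.

```python
def single_linkage_clusters(sim, threshold=0.9):
    """Group events whose pairwise similarity exceeds `threshold`, chained transitively.

    `sim` is a symmetric square matrix (list of lists). Returns a sorted
    list of clusters, each a sorted list of event indices.
    """
    n = len(sim)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if sim[i][j] > threshold:
                parent[find(i)] = find(j)  # merge the two clusters

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Toy matrix: events 0-1 and 1-2 are similar, so 0 and 2 are chained via 1.
toy_sim = [
    [1.00, 0.95, 0.50, 0.10, 0.20],
    [0.95, 1.00, 0.92, 0.10, 0.20],
    [0.50, 0.92, 1.00, 0.10, 0.20],
    [0.10, 0.10, 0.10, 1.00, 0.30],
    [0.20, 0.20, 0.20, 0.30, 1.00],
]
clusters = single_linkage_clusters(toy_sim)  # [[0, 1, 2], [3], [4]]
```

    The toy case shows the chaining behaviour typical of single linkage: events 0 and 2 end up together although their direct similarity is only 0.5.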

  12. Earthquake Cluster Analysis for Turkey and its Application for Seismic Hazard Assessment

    NASA Astrophysics Data System (ADS)

    Schaefer, Andreas; Daniell, James; Wenzel, Friedemann

    2015-04-01

    Earthquake clusters are an important element in general seismology and also for the application in seismic hazard assessment. In probabilistic seismic hazard assessment, the occurrence of earthquakes is often linked to an independent Monte Carlo process, following a stationary Poisson model. But earthquakes are dependent and constrained, especially in terms of earthquake swarms, fore- and aftershocks or even larger sequences as observed for the Landers sequence in California or the Darfield-Christchurch sequence in New Zealand. For earthquake catalogues, the element of declustering is an important step to capture earthquake frequencies by avoiding a bias towards small magnitudes due to aftershocks. On the other hand, declustered catalogues for independent probabilistic seismic activity will underestimate the total number of earthquakes by neglecting dependent seismicity. In this study, the effect of clusters on probabilistic seismic hazard assessment is investigated in detail. To capture the features of earthquake clusters, a uniform framework for earthquake cluster analysis is introduced using methodologies of geostatistics and machine learning. These features represent important cluster characteristics like cluster b-values, temporal decay, rupture orientations and many more. Cluster parameters are mapped in space using kriging. Furthermore, a detailed data analysis is undertaken to provide magnitude-dependent relations for various cluster parameters. The acquired features are used to introduce dependent seismicity within stochastic earthquake catalogues. In addition, the development of smooth seismicity maps based on historic databases is in general biased to the more complete recent decades. A filling methodology is introduced which will add dependent seismicity in catalogues where none has been recorded to avoid the above mentioned bias. As a case study, Turkey has been chosen due to its inherent seismic activity and well-recorded data coverage. Clustering

  13. Dual mode use requirements analysis for the institutional cluster.

    SciTech Connect

    Leland, Robert W.

    2003-09-01

    This paper analyzes what additional costs would be incurred in supporting dual-mode, i.e. both classified and unclassified use of the Institutional Computing (IC) hardware. The following five options are considered: periods processing in which a fraction of the system alternates in time between classified and unclassified modes, static split in which the system is constructed as a set of smaller clusters which remain in one mode or the other, re-configurable split in which the system is constructed in a split fashion but a mechanism is provided to reconfigure it very infrequently, red/black switching in which a mechanism is provided to switch sections of the system between modes frequently, and complementary operation in which parts of the system are operated entirely in one mode at one geographical site and entirely in the other mode at the other geographical site and other systems are repartitioned to balance work load. These options are evaluated against eleven criteria such as disk storage costs, distance computing costs, reductions in capability and capacity as a result of various factors etc. The evaluation is both qualitative and quantitative, and is captured in various summary tables.

  14. Differentiating Procrastinators from Each Other: A Cluster Analysis.

    PubMed

    Rozental, Alexander; Forsell, Erik; Svensson, Andreas; Forsström, David; Andersson, Gerhard; Carlbring, Per

    2015-01-01

    Procrastination refers to the tendency to postpone the initiation and completion of a given course of action. Approximately one-fifth of the adult population and half of the student population perceive themselves as being severe and chronic procrastinators. Albeit not a psychiatric diagnosis, procrastination has been shown to be associated with increased stress and anxiety, exacerbation of illness, and poorer performance in school and work. However, although procrastination can be severely debilitating, little is known about the population of procrastinators in terms of possible subgroups, and previous research has mainly investigated procrastination among university students. The current study examined data from a screening process recruiting participants to a randomized controlled trial of Internet-based cognitive behavior therapy for procrastination (Rozental et al., in press). In total, 710 treatment-seeking individuals completed self-report measures of procrastination, depression, anxiety, and quality of life. The results suggest that there might exist five separate subgroups, or clusters, of procrastinators: "Mild procrastinators" (24.93%), "Average procrastinators" (27.89%), "Well-adjusted procrastinators" (13.94%), "Severe procrastinators" (21.69%), and "Primarily depressed" (11.55%). Hence, there seem to be marked differences among procrastinators in terms of levels of severity, as well as a possible subgroup for which procrastinatory problems are primarily related to depression. Tailoring the treatment interventions to the specific procrastination profile of the individual could thus become important, as well as screening for comorbid psychiatric diagnoses in order to target difficulties associated with, for instance, depression. PMID:26178164

  15. Cluster system using fiber channel as an interconnection network analysis

    NASA Astrophysics Data System (ADS)

    Yang, Yi; Cao, Mingcui; Luo, Zhixiang

    2005-02-01

    In a parallel processing system, large numbers of processors are interconnected in order to improve the performance of the computer, as in the symmetric multiprocessor (SMP) architecture. When the basic node is an SMP or a single-processor computer, the characteristics of the interconnection network are important factors that influence the performance of the entire system. Fibre Channel (FC) has many advantages, such as excellent scalability, large bandwidth, short delay time and high fault tolerance. It is assumed that an SMP is used as the basic node. We construct the cluster system using FC as the interconnection network in two ways: a fabric method and an FC Arbitrated Loop (FC-AL) method. With these methods, if the number of nodes supported by the interconnection network is small, extra nodes can be added at small expense. The bandwidth of each node is large, the delay time is short, and the fault tolerance is high in the interconnection network. When connecting to a shared disk, a large bandwidth is provided and the time required to access the shared disk becomes short.

  16. DEVELOPMENT OF A LONG ISLAND SOUND-SPECIFIC WATER QUALITY INDEX USING CLUSTER ANALYSIS AND DISCRIMINANT ANALYSIS

    EPA Science Inventory

    The objective of this project is to develop a Long Island Sound-specific water quality index. The water quality index will be computed using multivariate cluster analysis and discriminant analysis of a set of individual water quality indicators. A numerical water quality index (a...

  17. Tracking undergraduate student achievement in a first-year physiology course using a cluster analysis approach.

    PubMed

    Brown, S J; White, S; Power, N

    2015-12-01

    A cluster analysis data classification technique was used on assessment scores from 157 undergraduate nursing students who passed 2 successive compulsory courses in human anatomy and physiology. Student scores in five summative assessment tasks, taken in each of the courses, were used as inputs for a cluster analysis procedure. We aimed to group students into high-achieving (HA) and low-achieving (LA) clusters and to determine the ability of each summative assessment task to discriminate between HA and LA students. The two clusters identified in each semester were described as HA (n = 42) and LA (n = 115) in semester 1 (HA1 and LA1, respectively) and HA (n = 91) and LA (n = 66) in semester 2 (HA2 and LA2, respectively). In both semesters, HA and LA means for all inputs were different (all P < 0.001). Nineteen students moved from the HA1 group into the LA2 group, whereas 68 students moved from the LA1 group into the HA2 group. The overall order of importance of inputs that determined group membership was different in semester 1 compared with semester 2; in addition, the within-cluster order of importance in LA groups was different compared with HA groups. This method of analysis may 1) identify students who need extra instruction, 2) identify which assessment is more effective in discriminating between HA and LA students, and 3) provide quantitative evidence to track student achievement. PMID:26628649

  18. Adults' Physical Activity Patterns across Life Domains: Cluster Analysis with Replication

    PubMed Central

    Rovniak, Liza S.; Sallis, James F.; Saelens, Brian E.; Frank, Lawrence D.; Marshall, Simon J.; Norman, Gregory J.; Conway, Terry L.; Cain, Kelli L.; Hovell, Melbourne F.

    2010-01-01

    Objective Identifying adults' physical activity patterns across multiple life domains could inform the design of interventions and policies. Design Cluster analysis was conducted with adults in two US regions (Baltimore-Washington DC, n = 702; Seattle-King County, n = 987) to identify different physical activity patterns based on adults' reported physical activity across four life domains: leisure, occupation, transport, and home. Objectively measured physical activity, and psychosocial and built (physical) environment characteristics of activity patterns were examined. Main Outcome Measures Accelerometer-measured activity, reported domain-specific activity, psychosocial characteristics, built environment, body mass index (BMI). Results Three clusters replicated (kappa = .90-.93) across both regions: Low Activity, Active Leisure, and Active Job. The Low Activity and Active Leisure adults were demographically similar, but Active Leisure adults had the highest psychosocial and built environment support for activity, highest accelerometer-measured activity, and lowest BMI. Compared to the other clusters, the Active Job cluster had lower socioeconomic status and intermediate accelerometer-measured activity. Conclusion Adults can be clustered into groups based on their patterns of accumulating physical activity across life domains. Differences in psychosocial and built environment support between the identified clusters suggest that tailored interventions for different subgroups may be beneficial. PMID:20836604

  19. Comprehensive Behavioral Analysis of Cluster of Differentiation 47 Knockout Mice

    PubMed Central

    Koshimizu, Hisatsugu; Takao, Keizo; Matozaki, Takashi; Ohnishi, Hiroshi; Miyakawa, Tsuyoshi

    2014-01-01

    Cluster of differentiation 47 (CD47) is a member of the immunoglobulin superfamily which functions as a ligand for the extracellular region of signal regulatory protein α (SIRPα), a protein which is abundantly expressed in the brain. Previous studies, including ours, have demonstrated that both CD47 and SIRPα fulfill various functions in the central nervous system (CNS), such as the modulation of synaptic transmission and neuronal cell survival. We previously reported that CD47 is involved in the regulation of depression-like behavior of mice in the forced swim test through its modulation of tyrosine phosphorylation of SIRPα. However, other potential behavioral functions of CD47 remain largely unknown. In this study, in an effort to further investigate functional roles of CD47 in the CNS, CD47 knockout (KO) mice and their wild-type littermates were subjected to a battery of behavioral tests. CD47 KO mice displayed decreased prepulse inhibition, while the startle response did not differ between genotypes. The mutants exhibited slightly but significantly decreased sociability and social novelty preference in Crawley’s three-chamber social approach test, whereas in social interaction tests in which experimental and stimulus mice have direct contact with each other in a freely moving setting in a novel environment or home cage, there were no significant differences between the genotypes. While previous studies suggested that CD47 regulates fear memory in the inhibitory avoidance test in rodents, our CD47 KO mice exhibited normal fear and spatial memory in the fear conditioning and the Barnes maze tests, respectively. These findings suggest that CD47 is potentially involved in the regulation of sensorimotor gating and social behavior in mice. PMID:24586890

  20. Cluster-span threshold: An unbiased threshold for binarising weighted complete networks in functional connectivity analysis.

    PubMed

    Smith, Keith; Azami, Hamed; Parra, Mario A; Starr, John M; Escudero, Javier

    2015-08-01

    We propose a new unbiased threshold for network analysis named the Cluster-Span Threshold (CST). This is based on the clustering coefficient, C, following the logic that a balance of 'clustering' to 'spanning' triples results in a useful topology for network analysis and that the product of complementing properties has a unique value only when perfectly balanced. We threshold networks by fixing C at this balanced value, rather than fixing connection density at an arbitrary value, as has been the trend. Using an electroencephalogram data set of volunteers performing visual short-term memory tasks, we compare results from the CST with those from other thresholds, including maximum spanning trees. We find that the CST holds as a sensitive threshold for distinguishing differences in the functional connectivity between tasks. This provides a sensitive and objective method for setting a threshold on weighted complete networks which may prove influential on the future of functional connectivity research. PMID:26736883
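    One way to read the thresholding idea in this abstract is sketched below. It assumes that the "balanced value" means equal numbers of closed (clustering) and open (spanning) triples, i.e. a global clustering coefficient of C = 0.5; the abstract does not state this number, so the `target` parameter, the function names and the toy weights are all assumptions, and this is not the authors' implementation.

```python
from itertools import combinations


def global_clustering(adj):
    """Fraction of connected triples that are closed, for a boolean adjacency matrix."""
    n = len(adj)
    closed = total = 0
    for v in range(n):
        nbrs = [u for u in range(n) if u != v and adj[v][u]]
        total += len(nbrs) * (len(nbrs) - 1) // 2  # triples centred on v
        closed += sum(1 for a, b in combinations(nbrs, 2) if adj[a][b])
    return closed / total if total else 0.0


def cluster_span_threshold(w, target=0.5):
    """Scan candidate thresholds; return (threshold, C) with C closest to `target`.

    An edge is kept when its weight is >= the threshold. `target=0.5`
    encodes the assumed balance of closed and open triples.
    """
    n = len(w)
    candidates = sorted({w[i][j] for i in range(n) for j in range(i + 1, n)})
    best = None
    for t in candidates:
        adj = [[i != j and w[i][j] >= t for j in range(n)] for i in range(n)]
        c = global_clustering(adj)
        if best is None or abs(c - target) < abs(best[1] - target):
            best = (t, c)
    return best


# Toy weighted complete network on four nodes (made-up connectivity values).
toy_w = [
    [0.0, 0.9, 0.7, 0.1],
    [0.9, 0.0, 0.8, 0.2],
    [0.7, 0.8, 0.0, 0.6],
    [0.1, 0.2, 0.6, 0.0],
]
threshold, c_value = cluster_span_threshold(toy_w)
```

    On small networks C changes in coarse steps, so the sketch returns the nearest achievable value rather than an exact match to the target.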

  1. Functional analysis of the upstream regulatory region of chicken miR-17-92 cluster.

    PubMed

    Min, Cheng; Wenjian, Zhang; Tianyu, Xing; Xiaohong, Yan; Yumao, Li; Hui, Li; Ning, Wang

    2016-08-01

    The miR-17-92 cluster plays important roles in cell proliferation, differentiation, apoptosis, animal development and tumorigenesis. The transcriptional regulation of the miR-17-92 cluster has been extensively studied in mammals, but not in birds. To date, the genomic structure of the avian miR-17-92 cluster has not been fully determined. The promoter location and sequence of the miR-17-92 cluster have not been determined, due to the existence of a genomic gap sequence upstream of the miR-17-92 cluster in all the birds whose genomes have been sequenced. In this study, genome walking was used to close the genomic gap upstream of the chicken miR-17-92 cluster. In addition, bioinformatics analysis, reporter gene assay and truncation mutagenesis were used to investigate the functional role of the genomic gap sequence. Genome walking analysis showed that the gap region was 1704 bp long, and its GC content was 80.11%. Bioinformatics analysis showed that in the gap region, there was a 200 bp conserved sequence among the 10 tested species (Gallus gallus, Homo sapiens, Pan troglodytes, Bos taurus, Sus scrofa, Rattus norvegicus, Mus musculus, Possum, Danio rerio, Rana nigromaculata), which is the core promoter region of the mammalian miR-17-92 host gene (MIR17HG). A promoter luciferase reporter gene vector of the gap region was constructed and a reporter assay was performed. The result showed that the promoter activity of pGL3-cMIR17HG (-4228/-2506) was 417 times that of the negative control (empty pGL3 basic vector), suggesting that the chicken miR-17-92 cluster promoter exists in the gap region. To gain further insight into the promoter structure, two different truncations of the cloned gap sequence were generated by PCR. One had a truncation of 448 bp at the 5'-end and the other had a truncation of 894 bp at the 3'-end. Further reporter analysis showed that compared with the promoter activity of pGL3-cMIR17HG (-4228/-2506), the reporter activities of the 5'-end truncation and the 3'-end truncation were reduced by 19

  2. Cluster Analysis of Velocity Field Derived from Dense GNSS Network of Japan

    NASA Astrophysics Data System (ADS)

    Takahashi, A.; Hashimoto, M.

    2015-12-01

    Dense GNSS networks have been widely used to observe crustal deformation. Simpson et al. (2012) and Savage and Simpson (2013) conducted cluster analyses of the GNSS velocity field in the San Francisco Bay Area and the Mojave Desert, respectively. They successfully found velocity discontinuities and showed an advantage of cluster analysis for classifying GNSS velocity fields. Because strike-slip events are dominant in the western United States, the geometry there is simple. However, the Japanese Islands are tectonically complicated due to subduction of oceanic plates. There are many types of crustal deformation, such as slow slip events and large postseismic deformation. We propose a modified clustering method for the GNSS velocity field in Japan to separate time-variant and static crustal deformation. Our modification is to perform cluster analysis every several months or years and then qualify cluster member similarity. If a GNSS station moved differently from its neighboring GNSS stations, the station will not belong to the cluster that includes its surrounding stations. With this method, time-variant phenomena were distinguished. We applied our method to GNSS data of Japan from 1996 to 2015. From the analyses, the following conclusions were derived. First, the cluster boundaries are consistent with known active faults. For example, the Arima-Takatsuki-Hanaore fault system and the Shimane-Tottori segment proposed by Nishimura (2015) are recognized without using prior information. Second, the detectability of time-variable phenomena is improved, such as a slow slip event in the northern part of the Hokkaido region detected by Ohzono et al. (2015). The last is the classification of postseismic deformation caused by large earthquakes. The result suggested velocity discontinuities in the postseismic deformation of the Tohoku-oki earthquake. This result implies that postseismic deformation does not decay continuously in proportion to distance from its epicenter.

  3. A population-based analysis of clustering identifies a strong genetic contribution to lethal prostate cancer

    PubMed Central

    Nelson, Quentin; Agarwal, Neeraj; Stephenson, Robert; Cannon-Albright, Lisa A.

    2013-01-01

    Background: Prostate cancer is a common and often deadly cancer. Decades of study have yet to identify genes that explain much familial prostate cancer. Traditional linkage analysis of pedigrees has yielded results that are rarely validated. We hypothesize that there are rare segregating variants responsible for high-risk prostate cancer pedigrees, but recognize that within-pedigree heterogeneity is responsible for significant noise that overwhelms signal. Here we introduce a method to identify homogeneous subsets of prostate cancer, based on cancer characteristics, which show the best evidence for an inherited contribution. Methods: We have modified an existing method, the Genealogical Index of Familiality (GIF) used to show evidence for significant familial clustering. The modification allows a test for excess familial clustering of a subset of prostate cancer cases when compared to all prostate cancer cases. Results: Consideration of the familial clustering of eight clinical subsets of prostate cancer cases compared to the expected familial clustering of all prostate cancer cases identified three subsets of prostate cancer cases with evidence for familial clustering significantly in excess of expected. These subsets include prostate cancer cases diagnosed before age 50 years, prostate cancer cases with body mass index (BMI) greater than or equal to 30, and prostate cancer cases for whom prostate cancer contributed to death. Conclusions: This analysis identified several subsets of prostate cancer cases that cluster significantly more than expected when compared to all prostate cancer familial clustering. A focus on high-risk prostate cancer cases or pedigrees with these characteristics will reduce noise and could allow identification of the rare predisposition genes or variants responsible. PMID:23970893

  4. Analysis of Helium Cluster Dynamics near Grain Boundaries of Plasma-Exposed Tungsten

    NASA Astrophysics Data System (ADS)

    Hu, Lin; Hammond, Karl; Wirth, Brian; Maroudas, Dimitrios

    2015-11-01

    We report results of a systematic atomic-scale analysis of the kinetics of small mobile helium clusters near a model symmetric tilt grain boundary (GB) in tungsten (W). The small mobile helium clusters migrate toward the GB region by Fickian diffusion and drift due to an elastic interaction force that drives GB segregation. As the clusters migrate toward the GB, trap mutation (TM) reactions are activated at rates higher than those away from the GB and are the dominant kinetic processes for 4-member and larger mobile helium clusters. Each TM reaction produces a W interstitial atom on the GB, in the form of an extended interstitial configuration, and an immobile helium-vacancy complex with the W vacancy located at a short distance from the GB. These reactions are identified and characterized in detail based on analysis of a large number of molecular-dynamics trajectories. The mobility of the extended W interstitial on the GB depends on the location of the helium-vacancy complex. The identified cluster reactions are responsible for important structural, morphological, and compositional features in plasma-exposed tungsten.

  5. An algol program for dissimilarity analysis: a divisive-omnithetic clustering technique

    USGS Publications Warehouse

    Tipper, J.C.

    1979-01-01

    Clustering techniques are used properly to generate hypotheses about patterns in data. Of the hierarchical techniques, those which are divisive and omnithetic possess many theoretically optimal properties. One such method, dissimilarity analysis, is implemented here in ALGOL 60, and determined to be competitive computationally with most other methods. © 1979.

  6. Who Are Our Students? Cluster Analysis as a Tool for Understanding Community College Student Populations

    ERIC Educational Resources Information Center

    Ammon, Bridget V.; Bowman, Jamillah; Mourad, Roger

    2008-01-01

    This study showcases cluster analysis as a useful tool for those who seek to understand the types of students their community colleges serve. Although educational goal, academic program, and demographics are often used as descriptive variables, it is unclear which, if any, of these are the best way to classify community college students. Cluster…

  7. Student Motivational Profiles in an Introductory MIS Course: An Exploratory Cluster Analysis

    ERIC Educational Resources Information Center

    Nelson, Klara

    2014-01-01

    This study profiles students in an introductory MIS course according to a variety of variables associated with choice of academic major. The data were collected through a survey administered to 12 sections of the course. A two-step cluster analysis was performed with gender as a categorical variable and students' perceptions of task value…

  8. 2 x 2 Achievement Goals and Achievement Emotions: A Cluster Analysis of Students' Motivation

    ERIC Educational Resources Information Center

    Jang, Leong Yeok; Liu, Woon Chia

    2012-01-01

    This study sought to better understand the adoption of multiple achievement goals at an intra-individual level, and its links to emotional well-being, learning, and academic achievement. Participants were 480 Secondary Two students (aged between 13 and 14 years) from two coeducational government schools. Hierarchical cluster analysis revealed the…

  9. Exploring the Relationship between Autism Spectrum Disorder and Epilepsy Using Latent Class Cluster Analysis

    ERIC Educational Resources Information Center

    Cuccaro, Michael L.; Tuchman, Roberto F.; Hamilton, Kara L.; Wright, Harry H.; Abramson, Ruth K.; Haines, Jonathan L.; Gilbert, John R.; Pericak-Vance, Margaret

    2012-01-01

    Epilepsy co-occurs frequently in autism spectrum disorders (ASD). Understanding this co-occurrence requires a better understanding of the ASD-epilepsy phenotype (or phenotypes). To address this, we conducted latent class cluster analysis (LCCA) on an ASD dataset (N = 577) which included 64 individuals with epilepsy. We identified a 5-cluster…

  10. Cluster Analysis of Assessment in Anatomy and Physiology for Health Science Undergraduates

    ERIC Educational Resources Information Center

    Brown, Stephen; White, Sue; Power, Nicola

    2016-01-01

    Academic content common to health science programs is often taught to a mixed group of students; however, content assessment may be consistent for each discipline. This study used a retrospective cluster analysis on such a group, first to identify high and low achieving students, and second, to determine the distribution of students within…

  11. A Cluster Analysis of the Circumstances of Death in Suicides in Hong Kong

    ERIC Educational Resources Information Center

    Chen, Eric Y. H.; Chan, Wincy S. C.; Chan, Sandra S. M.; Liu, Ka Y.; Chan, Cecilia L. W.; Wong, Paul W. C.; Law, Y. W.; Yip, Paul S. F.

    2007-01-01

    Classification of suicides is essential for clinicians to better identify self-harm patients with future suicidal risks. This study examined potential subtypes of suicide in a psychological autopsy sample (N = 148) in Hong Kong. Hierarchical cluster analysis extracted two subgroups of subjects in terms of expressed deliberation assessed by the…

  12. Profiles of More and Less Successful L2 Learners: A Cluster Analysis Study

    ERIC Educational Resources Information Center

    Sparks, Richard L.; Patton, Jon; Ganschow, Leonore

    2012-01-01

    This retrospective study examined L1 achievement, intelligence, L2 aptitude, and L2 proficiency profiles of 208 students completing two years of high school L2 courses. A cluster analysis was performed to determine whether distinct cognitive and achievement profiles of more and less successful L2 learners would emerge. The results of…

  13. Multiscale deep drawing analysis of dual-phase steels using grain cluster-based RGC scheme

    NASA Astrophysics Data System (ADS)

    Tjahjanto, D. D.; Eisenlohr, P.; Roters, F.

    2015-06-01

    Multiscale modelling and simulation play an important role in sheet metal forming analysis, since the overall material responses at macroscopic engineering scales, e.g. formability and anisotropy, are strongly influenced by microstructural properties, such as grain size and crystal orientations (texture). In the present report, multiscale analysis on deep drawing of dual-phase steels is performed using an efficient grain cluster-based homogenization scheme. The homogenization scheme, called relaxed grain cluster (RGC), is based on a generalization of the grain cluster concept, where a (representative) volume element consists of p × q × r (hexahedral) grains. In this scheme, variation of the strain or deformation of individual grains is taken into account through so-called interface relaxation, which is formulated within an energy minimization framework. An interfacial penalty term is introduced into the energy minimization framework in order to account for the effects of grain boundaries. The grain cluster-based homogenization scheme has been implemented and incorporated into the advanced material simulation platform DAMASK, which aims to bridge the macroscale boundary value problems associated with deep drawing analysis to the micromechanical constitutive law, e.g. a crystal plasticity model. Standard Lankford anisotropy tests are performed to validate the model parameters prior to the deep drawing analysis. Model predictions for the deep drawing simulations are analyzed and compared to the corresponding experimental data. The results show that the predictions of the model are in very good agreement with the experimental measurements.

  14. Spectral analysis of A and F dwarf members of the open cluster M6: preliminary results

    NASA Astrophysics Data System (ADS)

    Kılıçoǧlu, T.; Monier, R.; Fossati, L.

    2010-12-01

    We present the first abundance analysis of CD-32 13109 (NGC 6405 47), a member of the M6 open cluster. The photospheric abundances of 14 chemical elements were determined by comparing synthetic spectra and observed spectra of the star. Findings show that this star should be an Am star.

  15. Fuzzy Clustering Analysis in Environmental Impact Assessment--A Complement Tool to Environmental Quality Index.

    ERIC Educational Resources Information Center

    Kung, Hsiang-Te; And Others

    1993-01-01

    In spite of rapid progress achieved in the methodological research underlying environmental impact assessment (EIA), the problem of weighting various parameters has not yet been solved. This paper presents a new approach, fuzzy clustering analysis, which is illustrated with an EIA case study on Baoshan-Wusong District in Shanghai, China. (Author)

  16. Epidemiological and viral genomic sequence analysis of the 2014 ebola outbreak reveals clustered transmission.

    PubMed

    Scarpino, Samuel V; Iamarino, Atila; Wells, Chad; Yamin, Dan; Ndeffo-Mbah, Martial; Wenzel, Natasha S; Fox, Spencer J; Nyenswah, Tolbert; Altice, Frederick L; Galvani, Alison P; Meyers, Lauren Ancel; Townsend, Jeffrey P

    2015-04-01

    Using Ebolavirus genomic and epidemiological data, we conducted the first joint analysis in which both data types were used to fit dynamic transmission models for an ongoing outbreak. Our results indicate that transmission is clustered, highlighting a potential bias in medical demand forecasts, and provide the first empirical estimate of underreporting. PMID:25516185

  17. Clustered Stomates in "Begonia": An Exercise in Data Collection & Statistical Analysis of Biological Space

    ERIC Educational Resources Information Center

    Lau, Joann M.; Korn, Robert W.

    2007-01-01

    In this article, the authors present a laboratory exercise in data collection and statistical analysis in biological space using clustered stomates on leaves of "Begonia" plants. The exercise can be done in middle school classes by students making their own slides and seeing imprints of cells, or at the high school level through collecting data of…

  18. ClusCo: clustering and comparison of protein models

    PubMed Central

    2013-01-01

    Background: The development, optimization and validation of protein modeling methods require efficient tools for structural comparison. Frequently, a large number of models need to be compared with the target native structure. The main reason for the development of the Clusco software was to create a high-throughput tool for all-versus-all comparison, because calculating the similarity matrix is one of the bottlenecks in the protein modeling pipeline. Results: Clusco is fast and easy-to-use software for high-throughput comparison of protein models with different similarity measures (cRMSD, dRMSD, GDT_TS, TM-Score, MaxSub, Contact Map Overlap) and clustering of the comparison results with standard methods: K-means Clustering or Hierarchical Agglomerative Clustering. Conclusions: The application was highly optimized and written in C/C++, including the code for parallel execution on CPU and GPU, which resulted in a significant speedup over similar clustering and scoring computation programs. PMID:23433004
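
    The all-versus-all comparison followed by hierarchical agglomerative clustering described in this record can be illustrated with a small sketch. It is not ClusCo's code: the function names are hypothetical, dRMSD is used because it needs no superposition, and average linkage is an assumption, since the abstract does not name a linkage.

```python
import numpy as np

def drmsd(a, b):
    """Distance RMSD between two models of the same chain.

    a, b: (n_atoms, 3) coordinate arrays. Only intra-model pairwise
    distances are compared, so no superposition is required.
    """
    iu = np.triu_indices(len(a), k=1)
    da = np.linalg.norm(a[:, None] - a[None, :], axis=-1)[iu]
    db = np.linalg.norm(b[:, None] - b[None, :], axis=-1)[iu]
    return float(np.sqrt(np.mean((da - db) ** 2)))

def similarity_matrix(models):
    """All-versus-all dRMSD matrix (symmetric, zero diagonal)."""
    n = len(models)
    d = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d[i, j] = d[j, i] = drmsd(models[i], models[j])
    return d

def agglomerate(d, n_clusters):
    """Average-linkage agglomerative clustering on a distance matrix.

    Starts from singletons and repeatedly merges the two clusters with
    the smallest mean inter-cluster distance (linkage is an assumption).
    """
    clusters = [[i] for i in range(len(d))]
    while len(clusters) > n_clusters:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                dist = d[np.ix_(clusters[p], clusters[q])].mean()
                if best is None or dist < best[0]:
                    best = (dist, p, q)
        _, p, q = best
        clusters[p] += clusters.pop(q)  # q > p, so index p stays valid
    return [sorted(c) for c in clusters]
```

    The quadratic cost of `similarity_matrix` is the bottleneck the abstract refers to; the naive merge loop here is only suitable for small sets of models.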

  19. The Clusters AgeS Experiment (CASE). VI. Analysis of Two Detached Eclipsing Binaries in the Globular Cluster M55

    NASA Astrophysics Data System (ADS)

    Kaluzny, J.; Thompson, I. B.; Dotter, A.; Rozyczka, M.; Pych, W.; Rucinski, S. M.; Burley, G. S.

    2014-03-01

    We present an analysis of the detached eclipsing binaries V44 and V54 belonging to the globular cluster M55. For V54 we obtain the following absolute parameters: Mp=0.726±0.015 Msun, Rp=1.006±0.009 Rsun, Lp=1.38±0.07 Lsun for the primary, and Ms=0.555±0.008 Msun, Rs=0.528±0.005 Rsun, Ls=0.16±0.01 Lsun for the secondary. The age and apparent distance modulus of V54 are estimated at 13.3-14.7 Gyr and 13.94±0.05 mag, respectively. This derived age is substantially larger than ages we have derived from the analysis of binary systems in 47 Tuc and M4. The secondary of V44 is so weak in the optical domain that only mass function and relative parameters are obtained for the components of this system. However, there is a good chance that the velocity curve of the secondary could be derived from near-IR spectra. As the primary of V44 is more evolved than that of V54, such data would impose much tighter limits on the age and distance of M55.

  20. The Feasibility of Using Cluster Analysis to Examine Log Data from Educational Video Games. CRESST Report 790

    ERIC Educational Resources Information Center

    Kerr, Deirdre; Chung, Gregory K. W. K.; Iseli, Markus R.

    2011-01-01

    Analyzing log data from educational video games has proven to be a challenging endeavor. In this paper, we examine the feasibility of using cluster analysis to extract information from the log files that is interpretable in both the context of the game and the context of the subject area. If cluster analysis can be used to identify patterns of…

  1. A New Classification of Diabetic Gait Pattern Based on Cluster Analysis of Biomechanical Data

    PubMed Central

    Sawacha, Zimi; Guarneri, Gabriella; Avogaro, Angelo; Cobelli, Claudio

    2010-01-01

    Background: The diabetic foot, one of the most serious complications of diabetes mellitus and a major risk factor for plantar ulceration, is determined mainly by peripheral neuropathy. Neuropathic patients exhibit decreased stability while standing as well as during dynamic conditions. A new methodology for diabetic gait pattern classification based on cluster analysis has been proposed that aims to identify groups of subjects with similar patterns of gait and verify if three-dimensional gait data are able to distinguish diabetic gait patterns from those of the control subjects. Method: The gait of 20 nondiabetic individuals and 46 diabetes patients with and without peripheral neuropathy was analyzed [mean age 59.0 (2.9) and 61.1 (4.4) years, mean body mass index (BMI) 24.0 (2.8) and 26.3 (2.0)]. K-means cluster analysis was applied to classify the subjects' gait patterns through the analysis of their ground reaction forces, joint and segment (trunk, hip, knee, ankle) angles, and moments. Results: Cluster analysis classification led to definition of four well-separated clusters: one aggregating just neuropathic subjects, one aggregating both neuropathics and non-neuropathics, one including only diabetes patients, and one including either controls or diabetic and neuropathic subjects. Conclusions: Cluster analysis was useful in grouping subjects with similar gait patterns and provided evidence that there were subgroups that might otherwise not be observed if a group ensemble was presented for any specific variable. In particular, we observed the presence of neuropathic subjects with a gait similar to the controls and diabetes patients with a long disease duration with a gait as altered as the neuropathic one. PMID:20920432

  2. Clustering binary fingerprint vectors with missing values for DNA array data analysis.

    PubMed

    Figueroa, Andres; Borneman, James; Jiang, Tao

    2004-01-01

    Oligonucleotide fingerprinting is a powerful DNA array-based method to characterize cDNA and ribosomal RNA gene (rDNA) libraries and has many applications including gene expression profiling and DNA clone classification. We are especially interested in the latter application. A key step in the method is the cluster analysis of fingerprint data obtained from DNA array hybridization experiments. Most of the existing approaches to clustering use (normalized) real intensity values and thus do not treat positive and negative hybridization signals equally (positive signals are much more emphasized). In this paper, we consider a discrete approach. Fingerprint data are first normalized and binarized using control DNA clones. Because there may exist unresolved (or missing) values in this binarization process, we formulate the clustering of (binary) oligonucleotide fingerprints as a combinatorial optimization problem that attempts to identify clusters and resolve the missing values in the fingerprints simultaneously. We study the computational complexity of this clustering problem and a natural parameterized version and present an efficient greedy algorithm based on MINIMUM CLIQUE PARTITION on graphs. The algorithm takes advantage of some unique properties of the graphs considered here, which allow us to efficiently find the maximum cliques as well as some special maximal cliques. Our preliminary experimental results on simulated and real data demonstrate that the algorithm runs faster and performs better than some popular hierarchical and graph-based clustering methods. The results on real data from DNA clone classification also suggest that this discrete approach is more accurate than clustering methods based on real intensity values in terms of separating clones that have different characteristics with respect to the given oligonucleotide probes. PMID:15700408

  3. Clustering binary fingerprint vectors with missing values for DNA array data analysis.

    PubMed

    Figueroa, Andres; Borneman, James; Jiang, Tao

    2003-01-01

    Oligonucleotide fingerprinting is a powerful DNA array based method to characterize cDNA and ribosomal RNA gene (rDNA) libraries and has many applications including gene expression profiling and DNA clone classification. We are especially interested in the latter application. A key step in the method is the cluster analysis of fingerprint data obtained from DNA array hybridization experiments. Most of the existing approaches to clustering use (normalized) real intensity values and thus do not treat positive and negative hybridization signals equally (positive signals are much more emphasized). In this paper, we consider a discrete approach. Fingerprint data are first normalized and binarized using control DNA clones. Because there may exist unresolved (or missing) values in this binarization process, we formulate the clustering of (binary) oligonucleotide fingerprints as a combinatorial optimization problem that attempts to identify clusters and resolve the missing values in the fingerprints simultaneously. We study the computational complexity of this clustering problem and a natural parameterized version, and present an efficient greedy algorithm based on MINIMUM CLIQUE PARTITION on graphs. The algorithm takes advantage of some unique properties of the graphs considered here, which allow us to efficiently find the maximum cliques as well as some special maximal cliques. Our experimental results on simulated and real data demonstrate that the algorithm runs faster and performs better than some popular hierarchical and graph-based clustering methods. The results on real data from DNA clone classification also suggest that this discrete approach is more accurate than clustering methods based on real intensity values, in terms of separating clones that have different characteristics with respect to the given oligonucleotide probes. PMID:16452777

  4. Fault Reactivation Analysis Using Microearthquake Clustering Based on Signal-to-Noise Weighted Waveform Similarity

    NASA Astrophysics Data System (ADS)

    Grund, Michael; Groos, Jörn C.; Ritter, Joachim R. R.

    2016-04-01

    The cluster formation of about 2000 induced microearthquakes (mostly M_L < 2) is studied using a waveform similarity technique based on cross-correlation and a subsequent equivalence class approach. All events were detected within two separated but neighbouring seismic volumes close to the geothermal powerplants near Landau and Insheim in the Upper Rhine Graben, SW Germany between 2006 and 2013. Besides different sensors, sampling rates and individual data gaps, the low signal-to-noise ratios (SNR) of the recordings at most station sites are the main complication for a precise waveform similarity analysis of the microseismic events in this area. To include a large number of events for such an analysis, a newly developed weighting approach was implemented in the waveform similarity analysis which directly considers the individual SNRs across the whole seismic network. The application to both seismic volumes leads to event clusters with high waveform similarities within short (seconds to hours) and long (months to years) time periods covering two magnitude ranges. The estimated relative hypocenter locations are spatially concentrated for each single cluster and mirror the orientations of mapped faults as well as interpreted rupture planes determined from fault plane solutions. Depending on the waveform cross-correlation coefficient threshold, clusters can be resolved in space to as little as one dominant wavelength. The interpretation of these observations implies recurring fault reactivations by fluid injection with very similar faulting mechanisms during different time periods between 2006 and 2013.
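
    The two steps named in this abstract, cross-correlation similarity followed by an equivalence class grouping, can be sketched as below. This is a simplified illustration under stated assumptions: single-trace waveforms, a hypothetical threshold argument, and no SNR weighting (the authors' weighting scheme is not described in enough detail here to reproduce).

```python
import numpy as np

def max_norm_xcorr(x, y):
    """Peak of the normalized cross-correlation over all lags (at most 1)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x = x - x.mean()
    y = y - y.mean()
    x /= np.linalg.norm(x) + 1e-12  # small constant guards against flat traces
    y /= np.linalg.norm(y) + 1e-12
    return float(np.correlate(x, y, mode="full").max())

def equivalence_classes(waveforms, threshold):
    """Group events whose similarity meets the threshold, transitively.

    If A is similar to B and B to C, all three share a class even when
    A and C fall below the threshold. Implemented with union-find.
    """
    n = len(waveforms)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if max_norm_xcorr(waveforms[i], waveforms[j]) >= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

    Raising the threshold splits classes into tighter groups, which is consistent with the abstract's remark that the spatial resolution of clusters depends on the chosen cross-correlation coefficient threshold.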

  5. Alteration mapping at Goldfield, Nevada, by cluster and discriminant analysis of LANDSAT digital data

    NASA Technical Reports Server (NTRS)

    Ballew, G.

    1977-01-01

    The ability of Landsat multispectral digital data to differentiate among 62 combinations of rock and alteration types at the Goldfield mining district of Western Nevada was investigated by using statistical techniques of cluster and discriminant analysis. Multivariate discriminant analysis was not effective in classifying each of the 62 groups, with classification results essentially the same whether data of four channels alone or combined with six ratios of channels were used. Bivariate plots of group means revealed a cluster of three groups including mill tailings, basalt and all other rock and alteration types. Automatic hierarchical clustering based on the four-dimensional Mahalanobis distance between group means of 30 groups having five or more samples was performed. The results of the cluster analysis revealed hierarchies of mill tailings vs. natural materials, basalt vs. non-basalt, highly reflectant rocks vs. other rocks and exclusively unaltered rocks vs. predominantly altered rocks. The hierarchies were used to determine the order in which sets of multiple discriminant analyses were to be performed and the resulting discriminant functions were used to produce a map of geology and alteration which has an overall accuracy of 70 percent for discriminating exclusively unaltered rocks from predominantly altered rocks.
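
    The distance underlying the hierarchical clustering in this record can be sketched as follows. The abstract does not say which covariance matrix was used, so the pooled within-group covariance below is an assumption, and the function names are hypothetical. Each group is an array of samples with one column per channel.

```python
import numpy as np

def pooled_covariance(groups):
    """Pooled within-group covariance of a list of (n_i, p) sample arrays."""
    dof = sum(len(g) - 1 for g in groups)
    scatter = sum((len(g) - 1) * np.cov(g, rowvar=False) for g in groups)
    return scatter / dof

def mahalanobis_between_means(groups):
    """Pairwise Mahalanobis distances between group means.

    d(i, j) = sqrt((m_i - m_j)^T S^-1 (m_i - m_j)), with S the pooled
    covariance (an assumption; the source does not specify S).
    """
    means = np.array([g.mean(axis=0) for g in groups])
    s_inv = np.linalg.inv(pooled_covariance(groups))
    diff = means[:, None, :] - means[None, :, :]
    d2 = np.einsum("ijk,kl,ijl->ij", diff, s_inv, diff)
    return np.sqrt(np.maximum(d2, 0.0))  # clip tiny negative round-off
```

    The resulting symmetric matrix can be handed to any agglomerative routine that accepts precomputed distances. Requiring several samples per group, as the abstract does, keeps the covariance estimate from becoming singular.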

  6. Fault Reactivation Analysis Using Microearthquake Clustering Based on Signal-to-Noise Weighted Waveform Similarity

    NASA Astrophysics Data System (ADS)

    Grund, Michael; Groos, Jörn C.; Ritter, Joachim R. R.

    2016-07-01

    The cluster formation of about 2000 induced microearthquakes (mostly M_L < 2) is studied using a waveform similarity technique based on cross-correlation and a subsequent equivalence class approach. All events were detected within two separated but neighbouring seismic volumes close to the geothermal powerplants near Landau and Insheim in the Upper Rhine Graben, SW Germany between 2006 and 2013. Besides different sensors, sampling rates and individual data gaps, the low signal-to-noise ratios (SNR) of the recordings at most station sites are the main complication for a precise waveform similarity analysis of the microseismic events in this area. To include a large number of events for such an analysis, a newly developed weighting approach was implemented in the waveform similarity analysis which directly considers the individual SNRs across the whole seismic network. The application to both seismic volumes leads to event clusters with high waveform similarities within short (seconds to hours) and long (months to years) time periods covering two magnitude ranges. The estimated relative hypocenter locations are spatially concentrated for each single cluster and mirror the orientations of mapped faults as well as interpreted rupture planes determined from fault plane solutions. Depending on the waveform cross-correlation coefficient threshold, clusters can be resolved in space to as little as one dominant wavelength. The interpretation of these observations implies recurring fault reactivations by fluid injection with very similar faulting mechanisms during different time periods between 2006 and 2013.

  7. Analysis of a continuous-variable quadripartite cluster state from a single optical parametric oscillator

    SciTech Connect

    Midgley, S. L. W.; Olsen, M. K.; Bradley, A. S.; Pfister, O.

    2010-11-15

    We examine the feasibility of generating continuous-variable multipartite entanglement in an intracavity concurrent downconversion scheme that has been proposed for the generation of cluster states by Menicucci et al. [Phys. Rev. Lett. 101, 130501 (2008)]. By calculating optimized versions of the van Loock-Furusawa correlations we demonstrate genuine quadripartite entanglement and investigate the degree of entanglement present. Above the oscillation threshold the basic cluster state geometry under consideration suffers from phase diffusion. We alleviate this problem by incorporating a small injected signal into our analysis. Finally, we investigate squeezed joint operators. While the squeezed joint operators approach zero in the undepleted regime, we find that this is not the case when we consider the full interaction Hamiltonian and the presence of a cavity. In fact, we find that the decay of these operators is minimal in a cavity, and even depletion alone inhibits cluster state formation.

  8. Cluster-based analysis for personalized stress evaluation using physiological signals.

    PubMed

    Xu, Qianli; Nwe, Tin Lay; Guan, Cuntai

    2015-01-01

    Technology development in wearable sensors and biosignal processing has made it possible to detect human stress from the physiological features. However, the intersubject difference in stress responses presents a major challenge for reliable and accurate stress estimation. This research proposes a novel cluster-based analysis method to measure perceived stress using physiological signals, which accounts for the intersubject differences. The physiological data are collected when human subjects undergo a series of task-rest cycles, incurring varying levels of stress that is indicated by an index of the State Trait Anxiety Inventory. Next, a quantitative measurement of stress is developed by analyzing the physiological features in two steps: 1) a k-means clustering process to divide subjects into different categories (clusters), and 2) cluster-wise stress evaluation using the general regression neural network. Experimental results show a significant improvement in evaluation accuracy as compared to traditional methods without clustering. The proposed method is useful in developing intelligent, personalized products for human stress management. PMID:25561450
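
    The two-step procedure in this abstract (k-means, then a cluster-wise general regression neural network) can be sketched as below. This is a simplified illustration, not the authors' implementation: rows are clustered directly rather than subjects, the farthest-point initialization and the `sigma` bandwidth are assumptions, and all function names are hypothetical. The GRNN prediction is written in its standard form as a Gaussian-kernel weighted average of the training targets.

```python
import numpy as np

def init_centers(x, k):
    """Deterministic farthest-point initialization (an assumption)."""
    centers = [x[0]]
    for _ in range(1, k):
        d2 = np.min([((x - c) ** 2).sum(-1) for c in centers], axis=0)
        centers.append(x[np.argmax(d2)])
    return np.array(centers, dtype=float)

def assign(x, centers):
    """Index of the nearest center for each row of x."""
    return np.argmin(((x[:, None] - centers[None]) ** 2).sum(-1), axis=1)

def kmeans(x, k, n_iter=50):
    """Step 1: Lloyd's k-means on the feature rows."""
    x = np.asarray(x, dtype=float)
    centers = init_centers(x, k)
    for _ in range(n_iter):
        labels = assign(x, centers)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = x[labels == c].mean(axis=0)
    return assign(x, centers), centers

def grnn_predict(x_train, y_train, x_query, sigma=1.0):
    """GRNN output: kernel-weighted average of the training targets."""
    d2 = ((x_query[:, None] - x_train[None]) ** 2).sum(-1)
    w = np.exp(-d2 / (2.0 * sigma ** 2))
    return (w @ y_train) / w.sum(axis=1)

def clusterwise_predict(x_train, y_train, x_query, k=2, sigma=1.0):
    """Step 2: predict each query with the GRNN of its own cluster only."""
    x_train = np.asarray(x_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    x_query = np.asarray(x_query, dtype=float)
    labels, centers = kmeans(x_train, k)
    q_labels = assign(x_query, centers)
    y_hat = np.empty(len(x_query))
    for c in range(k):
        mask = q_labels == c
        if mask.any():
            y_hat[mask] = grnn_predict(
                x_train[labels == c], y_train[labels == c], x_query[mask], sigma
            )
    return y_hat
```

    Restricting each prediction to one cluster's training rows is what lets the model reflect intersubject differences: two subjects with similar signals but different stress responses no longer share a single regression surface.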

  9. The heterogeneity of headache patients who self-medicate: a cluster analysis approach.

    PubMed

    Mehuys, Els; Paemeleire, Koen; Crombez, Geert; Adriaens, Els; Van Hees, Thierry; Demarche, Sophie; Christiaens, Thierry; Van Bortel, Luc; Van Tongelen, Inge; Remon, Jean-Paul; Boussery, Koen

    2016-07-01

    Patients with headache often self-treat their condition with over-the-counter analgesics. However, overuse of analgesics can cause medication-overuse headache. The present study aimed to identify subgroups of individuals with headache who self-medicate, as this could be helpful to tailor intervention strategies for prevention of medication-overuse headache. Patients (n = 1021) were recruited from 202 community pharmacies and completed a self-administered questionnaire. A hierarchical cluster analysis was used to group patients as a function of sociodemographics, pain, disability, and medication use for pain. Three patient clusters were identified. Cluster 1 (n = 498, 48.8%) consisted of relatively young individuals, and most of them suffered from migraine. They reported the least number of other pain complaints and the lowest prevalence of medication overuse (MO; 16%). Cluster 2 (n = 301, 29.5%) included older persons with mainly non-migraine headache, a low disability, and on average pain in 2 other locations. Prevalence of MO was 40%. Cluster 3 (n = 222, 21.7%) mostly consisted of patients with migraine who also report pain in many other locations. These patients reported a high disability and a severe limitation of activities. They also showed the highest rates of MO (73%). PMID:26967695

  10. Descriptive characteristics and cluster analysis of male veteran hazardous drinkers in an alcohol moderation intervention.

    PubMed

    Walker, Robrina; Hunt, Yvonne M; Olivier, Jake; Grothe, Karen B; Dubbert, Patricia M; Burke, Randy S; Cushman, William C

    2012-01-01

    Current efforts underway to develop the fifth edition of the Diagnostic and Statistical Manual (DSM-5) have reignited discussions for classifying the substance use disorders. This study's aim was to contribute to the understanding of abusive alcohol use and its validity as a diagnosis. Cluster analysis was used to identify relatively homogeneous groups of hazardous, nondependent drinkers by using data collected from the Prevention and Treatment of Hypertension Study (PATHS), a multisite trial that examined the ability of a cognitive-behavioral-based alcohol reduction intervention, compared to a control condition, to reduce alcohol use. Participants for this study (N = 511) were male military veterans. Variables theoretically associated with alcohol use (eg, demographic, tobacco use, and mental health) were used to create the clusters and a priori, empirically based external criteria were used to assess discriminant validity. Bivariate correlations among cluster variables were generally consistent with previous findings in the literature. Analyses of internal and discriminant validity of the identified clusters were largely nonsignificant, suggesting meaningful differences between clusters could not be identified. Although the typology literature has contributed supportive validity for the alcohol dependence diagnosis, this study's results do not lend supportive validity for the construct of alcohol abuse. PMID:22691012

  11. Links between patterns of racial socialization and discrimination experiences and psychological adjustment: a cluster analysis.

    PubMed

    Ajayi, Alex A; Syed, Moin

    2014-10-01

    This study used a person-oriented analytic approach to identify meaningful patterns of barriers-focused racial socialization and perceived racial discrimination experiences in a sample of 295 late adolescents. Using cluster analysis, three distinct groups were identified: Low Barrier Socialization-Low Discrimination, High Barrier Socialization-Low Discrimination, and High Barrier Socialization-High Discrimination clusters. These groups were substantively unique in terms of the frequency of racial socialization messages about bias preparation and out-group mistrust its members received and their actual perceived discrimination experiences. Further, individuals in the High Barrier Socialization-High Discrimination cluster reported significantly higher depressive symptoms than those in the Low Barrier Socialization-Low Discrimination and High Barrier Socialization-Low Discrimination clusters. However, no differences in adjustment were observed between the Low Barrier Socialization-Low Discrimination and High Barrier Socialization-Low Discrimination clusters. Overall, the findings highlight important individual differences in how young people of color experience their race and how these differences have significant implications on psychological adjustment. PMID:25124381

  12. Exploring the application of latent class cluster analysis for investigating pedestrian crash injury severities in Switzerland.

    PubMed

    Sasidharan, Lekshmi; Wu, Kun-Feng; Menendez, Monica

    2015-12-01

    One of the major challenges in traffic safety analyses is the heterogeneous nature of safety data, due to the sundry factors involved in it. This heterogeneity often leads to difficulties in interpreting results and conclusions due to unrevealed relationships. Understanding the underlying relationship between injury severities and influential factors is critical for the selection of appropriate safety countermeasures. A method commonly employed to address systematic heterogeneity is to focus on any subgroup of data based on the research purpose. However, this need not ensure homogeneity in the data. In this paper, latent class cluster analysis is applied to identify homogeneous subgroups for a specific crash type: pedestrian crashes. The manuscript employs data from police-reported pedestrian crashes (2009-2012) in Switzerland. The analyses demonstrate that dividing pedestrian severity data into seven clusters helps in reducing the systematic heterogeneity of the data and in understanding the hidden relationships between crash severity levels and socio-demographic, environmental, vehicle, temporal, and traffic factors, and the main reason for the crash. The pedestrian crash injury severity models were developed for the whole data and individual clusters, and were compared using receiver operating characteristic curves, for which results favored clustering. Overall, the study suggests that the latent class clustered regression approach is suitable for reducing heterogeneity and revealing important hidden relationships in traffic safety analyses. PMID:26476192

  13. Molecular Clustering Interrelationships and Carbohydrate Conformation in Hull and Seeds Among Barley Cultivars

    SciTech Connect

    N Liu; P Yu

    2011-12-31

    The objective of this study was to use molecular spectral analyses with the diffuse reflectance Fourier transform infrared spectroscopy (DRIFT) bioanalytical technique to study carbohydrate conformation features, molecular clustering and interrelationships in hull and seed among six barley cultivars (AC Metcalfe, CDC Dolly, McLeod, CDC Helgason, CDC Trey, CDC Cowboy), which had different degradation kinetics in rumen. The molecular structure spectral analyses in both hull and seed involved the fingerprint regions of ca. 1536-1484 cm⁻¹ (attributed mainly to aromatic lignin semicircle ring stretch), ca. 1293-1212 cm⁻¹ (attributed mainly to cellulosic compounds in the hull), ca. 1269-1217 cm⁻¹ (attributed mainly to cellulosic compounds in the seeds), and ca. 1180-800 cm⁻¹ (attributed mainly to total CHO C-O stretching vibrations) together with agglomerative hierarchical cluster analysis (AHCA) and principal component analysis (PCA). The results showed that the DRIFT technique plus AHCA and PCA molecular analyses were able to reveal carbohydrate conformation features and identify carbohydrate molecular structure differences in both hull and seeds among the barley varieties. The carbohydrate molecular spectral analyses at the region of ca. 1185-800 cm⁻¹ together with the AHCA and PCA were able to show that the barley seed inherent structures exhibited distinguishable differences among the barley varieties. CDC Helgason differed from AC Metcalfe, McLeod, CDC Cowboy and CDC Dolly in carbohydrate conformation in the seed. Clear molecular cluster classes could be distinguished and identified in the AHCA analysis and the separate ellipses could be grouped in the PCA analysis. But CDC Helgason showed no distinguishable differences from CDC Trey in carbohydrate conformation. These carbohydrate conformation/structure differences could partially explain why the varieties differed in digestive behavior in animals. The molecular spectroscopy

  14. Spatiotemporal Clustering Analysis and Risk Assessments of Human Cutaneous Anthrax in China, 2005–2012

    PubMed Central

    Qian, Quan; Haque, Ubydul; Soares Magalhaes, Ricardo J.; Li, Shen-Long; Tong, Shi-Lu; Li, Cheng-Yi; Sun, Hai-Long; Sun, Yan-Song

    2015-01-01

    Objective: To investigate the epidemic characteristics of human cutaneous anthrax (CA) in China, detect the spatiotemporal clusters at the county level for preemptive public health interventions, and evaluate the differences in the epidemiological characteristics within and outside clusters. Methods: CA cases reported during 2005–2012 from the national surveillance system were evaluated at the county level using the space-time scan statistic. Comparative analysis of the epidemic characteristics within and outside identified clusters was performed using the χ² test or Kruskal-Wallis test. Results: The group aged 30–39 years had the highest incidence of CA, and the fatality rate increased with age, with persons ≥70 years showing a fatality rate of 4.04%. Seasonality analysis showed that most CA cases occurred between May/June and September/October of each year. The primary spatiotemporal cluster contained 19 counties from June 2006 to May 2010, and it was mainly located straddling the borders of Sichuan, Gansu, and Qinghai provinces. In these high-risk areas, CA cases were predominantly found among younger, local, male shepherds who lived on agriculture and stockbreeding, and were characterized by high morbidity, low mortality and a shorter period from illness onset to diagnosis. Conclusion: CA was geographically and persistently clustered in southwestern China during 2005–2012, with notable differences in the epidemic characteristics within and outside spatiotemporal clusters; this demonstrates the necessity for CA interventions such as enhanced surveillance, health education, and mandatory and standard decontamination or disinfection procedures to be geographically targeted to the areas identified in this study. PMID:26208355

  15. Phenotype Clustering of Breast Epithelial Cells in Confocal Images based on Nuclear Protein Distribution Analysis

    SciTech Connect

    Long, Fuhui; Peng, Hanchuan; Sudar, Damir; Levievre, Sophie A.; Knowles, David W.

    2006-09-05

    Background: The distribution of the chromatin-associated proteins plays a key role in directing nuclear function. Previously, we developed an image-based method to quantify the nuclear distributions of proteins and showed that these distributions depended on the phenotype of human mammary epithelial cells. Here we describe a method that creates a hierarchical tree of the given cell phenotypes and calculates the statistical significance between them, based on the clustering analysis of nuclear protein distributions. Results: Nuclear distributions of nuclear mitotic apparatus protein were previously obtained for non-neoplastic S1 and malignant T4-2 human mammary epithelial cells cultured for up to 12 days. Cell phenotype was defined as S1 or T4-2 and the number of days in culture. A probabilistic ensemble approach was used to define a set of consensus clusters from the results of multiple traditional cluster analysis techniques applied to the nuclear distribution data. Cluster histograms were constructed to show how cells in any one phenotype were distributed across the consensus clusters. Grouping various phenotypes allowed us to build phenotype trees and calculate the statistical difference between each group. The results showed that non-neoplastic S1 cells could be distinguished from malignant T4-2 cells with 94.19 percent accuracy; that proliferating S1 cells could be distinguished from differentiated S1 cells with 92.86 percent accuracy; and showed no significant difference between the various phenotypes of T4-2 cells corresponding to increasing tumor sizes. Conclusion: This work presents a cluster analysis method that can identify significant cell phenotypes, based on the nuclear distribution of specific proteins, with high accuracy.

  16. Chemical analysis of giant stars in the young open cluster NGC 3114

    NASA Astrophysics Data System (ADS)

    Santrich, O. J. Katime; Pereira, C. B.; Drake, N. A.

    2013-06-01

    Context. Open clusters are very useful targets for examining possible trends in galactocentric distance and age, especially when young and old open clusters are compared. Aims: We carried out a detailed spectroscopic analysis to derive the chemical composition of seven red giants in the young open cluster NGC 3114. Abundances of C, N, O, Li, Na, Mg, Al, Ca, Si, Ti, Ni, Cr, Y, Zr, La, Ce, and Nd were obtained, as well as the carbon isotopic ratio. Methods: The atmospheric parameters of the studied stars and their chemical abundances were determined using high-resolution optical spectroscopy. We employed the local-thermodynamic-equilibrium model atmospheres of Kurucz and the spectral analysis code MOOG. The abundances of the light elements were derived using the spectral synthesis technique. Results: We found that NGC 3114 has a mean metallicity of [Fe/H] = -0.01 ± 0.03. The isochrone fit yielded a turn-off mass of 4.2 M⊙. The [N/C] ratio is in good agreement with the models predicted by first dredge-up. We found that two stars, HD 87479 and HD 304864, have high rotational velocities of 15.0 km s⁻¹ and 11.0 km s⁻¹; HD 87526 is a halo star and is not a member of NGC 3114. Conclusions: The carbon and nitrogen abundances in NGC 3114 agree with the field and cluster giants. The oxygen abundance in NGC 3114 is lower compared to the field giants. The [O/Fe] ratio is similar to the giants in young clusters. We detected sodium enrichment in the analyzed cluster giants. As far as the other elements are concerned, their [X/Fe] ratios follow the same trend seen in giants with the same metallicity. Based on observations made with the 2.2 m telescope at the European Southern Observatory (La Silla, Chile). Tables 2 and 5 are available in electronic form at http://www.aanda.org

  17. Cluster Method Analysis of K. S. C. Image

    NASA Technical Reports Server (NTRS)

    Rodriguez, Joe, Jr.; Desai, M.

    1997-01-01

    Information obtained from satellite-based systems has moved to the forefront as a method in the identification of many land cover types. Identification of different land features through remote sensing is an effective tool for regional and global assessment of geometric characteristics. Classification data acquired from remote sensing images have a wide variety of applications. In particular, analysis of remote sensing images has special applications in the classification of various types of vegetation. Results obtained from classification studies of a particular area or region serve towards a greater understanding of what parameters (ecological, temporal, etc.) affect the region being analyzed. In this paper, we make a distinction between both types of classification approaches, although focus is given to the unsupervised classification method using 1987 Thematic Mapper (TM) images of Kennedy Space Center.

  18. Automation of Large-scale Computer Cluster Monitoring Information Analysis

    NASA Astrophysics Data System (ADS)

    Magradze, Erekle; Nadal, Jordi; Quadt, Arnulf; Kawamura, Gen; Musheghyan, Haykuhi

    2015-12-01

    High-throughput computing platforms consist of a complex infrastructure and provide a number of services that are prone to failure. To mitigate the impact of failures on the quality of the provided services, constant monitoring and timely reaction are required, which is impossible without automating the system administration processes. This paper introduces a way to automate the analysis of monitoring information in order to provide long- and short-term predictions of the service response time (SRT) for mass storage and batch systems and to identify the status of a service at a given time. The approach for the SRT predictions is based on the Adaptive Neuro Fuzzy Inference System (ANFIS). An evaluation of the approaches is performed on real monitoring data from the WLCG Tier 2 center GoeGrid. Ten-fold cross-validation results demonstrate high efficiency of both approaches in comparison to known methods.

  19. Validation of hierarchical cluster analysis for identification of bacterial species using 42 bacterial isolates

    NASA Astrophysics Data System (ADS)

    Ghebremedhin, Meron; Yesupriya, Shubha; Luka, Janos; Crane, Nicole J.

    2015-03-01

    Recent studies have demonstrated the potential advantages of the use of Raman spectroscopy in the biomedical field due to its rapidity and noninvasive nature. In this study, Raman spectroscopy is applied as a method for differentiating between bacterial isolates by Gram status and by genus and species. We created models for identifying 28 bacterial isolates using spectra collected with a 785 nm laser excitation Raman spectroscopic system. In order to investigate the groupings of these samples, partial least squares discriminant analysis (PLSDA) and hierarchical cluster analysis (HCA) were implemented. In addition, cluster analyses of the isolates were performed using various data types consisting of biochemical tests, gene sequence alignment, high resolution melt (HRM) analysis and antimicrobial susceptibility tests of minimum inhibitory concentration (MIC) and degree of antimicrobial resistance (SIR). In order to evaluate the ability of these models to correctly classify bacterial isolates using solely Raman spectroscopic data, a set of 14 validation samples was tested using the PLSDA models and subsequently the HCA models. External cluster evaluation criteria of purity and Rand index were calculated at different taxonomic levels to compare the performance of clustering using Raman spectra as well as the other datasets. Results showed that Raman spectra performed comparably to, and in some cases better than, the other data types, with Rand index and purity values up to 0.933 and 0.947, respectively. This study clearly demonstrates that the discrimination of bacterial species using Raman spectroscopic data and hierarchical cluster analysis is possible and has the potential to be a powerful point-of-care tool in clinical settings.
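
    The two external evaluation criteria named in this abstract, purity and the Rand index, can be illustrated with a minimal sketch. The toy labels below are hypothetical and are not data from the study; the function names are illustrative as well.

```python
from collections import Counter
from itertools import combinations


def purity(true_labels, cluster_labels):
    """Fraction of samples that share the majority true class of their cluster."""
    members_by_cluster = {}
    for truth, cluster in zip(true_labels, cluster_labels):
        members_by_cluster.setdefault(cluster, []).append(truth)
    majority_total = sum(
        Counter(members).most_common(1)[0][1]
        for members in members_by_cluster.values()
    )
    return majority_total / len(true_labels)


def rand_index(true_labels, cluster_labels):
    """Fraction of sample pairs on which the two labelings agree
    (both place the pair together, or both place it apart)."""
    pairs = list(combinations(range(len(true_labels)), 2))
    agreements = sum(
        (true_labels[i] == true_labels[j]) == (cluster_labels[i] == cluster_labels[j])
        for i, j in pairs
    )
    return agreements / len(pairs)


# Hypothetical toy data: six isolates from two genera, one assigned to the wrong cluster.
genus = ["A", "A", "A", "B", "B", "B"]
cluster = [0, 0, 1, 1, 1, 1]

print(purity(genus, cluster))
print(rand_index(genus, cluster))
```

    Both scores lie between 0 and 1, and both reach 1 only when the clustering reproduces the reference grouping exactly.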

  20. Model-free functional MRI analysis using improved fuzzy cluster analysis techniques

    NASA Astrophysics Data System (ADS)

    Lange, Oliver; Meyer-Baese, Anke; Wismueller, Axel; Hurdal, Monica; Sumners, DeWitt; Auer, Dorothee

    2004-04-01

    Conventional model-based or statistical analysis methods for functional MRI (fMRI) are easy to implement, and are effective in analyzing data with simple paradigms. However, they are not applicable in situations in which patterns of neural response are complicated or in which the fMRI response is unknown. In this paper the Gath-Geva algorithm is adapted and rigorously studied for analyzing fMRI data. The algorithm supports spatial connectivity, aiding in the identification of activation sites in functional brain imaging. A comparison of this new method with the fuzzy n-means algorithm, Kohonen's self-organizing map, fuzzy n-means algorithm with unsupervised initialization, minimal free energy vector quantizer and the "neural gas" network is carried out in a systematic fMRI study showing comparative quantitative evaluations. The most important findings in the paper are: (1) for a large number of codebook vectors, the Gath-Geva algorithm outperforms all other clustering methods in terms of detecting small activation areas, and (2) for a smaller number of codebook vectors, the fuzzy n-means with unsupervised initialization outperforms all other techniques. The applicability of the new algorithm is demonstrated on experimental data.
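
    For orientation, the baseline that the abstract calls fuzzy n-means (commonly known as fuzzy c-means) alternates a membership update and a center update. The sketch below is plain fuzzy c-means on one-dimensional toy values, not the Gath-Geva variant studied in the paper; the data, the fuzzifier m = 2 and the starting centers are assumptions chosen for illustration.

```python
def fuzzy_c_means(points, centers, m=2.0, iterations=50):
    """Plain fuzzy c-means on 1-D data.

    Returns the final centers and the membership rows computed in the
    last iteration (one row per point, one column per center).
    """
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for _ in range(iterations):
        # Membership update: u_i = 1 / sum_j (d_i / d_j)^(2 / (m - 1))
        memberships = []
        for x in points:
            dists = [max(abs(x - c), 1e-12) for c in centers]
            row = [
                1.0 / sum((d_i / d_j) ** exponent for d_j in dists)
                for d_i in dists
            ]
            memberships.append(row)
        # Center update: mean of the points weighted by u^m
        centers = [
            sum(u[i] ** m * x for u, x in zip(memberships, points))
            / sum(u[i] ** m for u in memberships)
            for i in range(len(centers))
        ]
    return centers, memberships


# Hypothetical signal amplitudes: three "inactive" and three "active" values.
signal = [0.1, 0.2, 0.15, 0.9, 1.0, 0.95]
centers, memberships = fuzzy_c_means(signal, centers=[0.0, 1.0])
print(centers)
print(memberships[0])
```

    Unlike hard k-means, every point keeps a graded membership in each cluster, which is what allows weak activations to contribute to a cluster without being forced into it.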

  1. Clustering Analysis of Officer's Behaviours in London Police Foot Patrol Activities

    NASA Astrophysics Data System (ADS)

    Shen, J.; Cheng, T.

    2015-07-01

    In this short paper we present a framework for the conceptual representation and clustering analysis of police officers' patrol patterns obtained by mining their raw movement trajectory data. This has been achieved with a model developed to account for the spatio-temporal dynamics of human movement by incorporating both the behaviour features of the travellers and the semantic meaning of the environment they are moving in. Hence, the similarity metric of traveller behaviours is jointly defined according to the stay time allocation in each spatio-temporal region of interest (ST-ROI) to support clustering analysis of patrol behaviours. The proposed framework enables the analysis of behaviour and preferences at a higher level based on raw movement trajectories. The model is first applied to police patrol data provided by the Metropolitan Police and will be tested on other types of dataset afterwards.

  2. An Abundance Analysis of Red Giant Stars in the Retrograde Galactic Globular Cluster NGC 3201: Implications for Cluster Formation Scenarios

    NASA Astrophysics Data System (ADS)

    Simmerer, Jennifer A.; Ivans, I. I.

    2011-01-01

    Globular clusters have long been central to the study of Galactic Chemical Evolution. They serve as laboratories for stellar physics, evolution, and nucleosynthesis as well as representing fossil remnants of Galactic assembly processes. Our work addresses two recent areas of interest: globular clusters as accreted objects and globular clusters as hosts for multiple stellar populations. The globular cluster NGC 3201 is a curious object on a retrograde orbit. Some studies suggest that it contains stars of more than one metallicity, a property seen only in the peculiar globular cluster Omega Centauri. Both properties hint at an extra-Galactic origin. We present an elemental abundance pattern for NGC 3201 based on high resolution, high signal-to-noise spectra of red giant stars. We present abundance patterns of similar stars from the globular cluster M5 for comparison. Interpretation of our results is complicated by the discovery that at least two of our giants are variable stars. Though we can derive adequate stellar parameter solutions for both stars in every stage of variability and heavy element abundances do not change with the stellar phase, the abundances of the light elements O, Na, Mg, and Al are extremely unstable and vary greatly. Our inability to correctly model light element line formation in the atmosphere of variable red giant stars has significant implications for studies of star to star abundance variations in exactly these elements in globular clusters, which rely on stars at the same evolutionary stage as the variables in NGC 3201.

  3. Cluster analysis of European surface ozone observations for evaluation of MACC reanalysis data

    NASA Astrophysics Data System (ADS)

    Lyapina, Olga; Schultz, Martin G.; Hense, Andreas

    2016-06-01

    The high density of European surface ozone monitoring sites provides unique opportunities for the investigation of regional ozone representativeness and for the evaluation of chemistry climate models. The regional representativeness of European ozone measurements is examined through a cluster analysis (CA) of 4 years of 3-hourly ozone data from 1492 European surface monitoring stations in the Airbase database; the time resolution corresponds to the output frequency of the model that is compared to the data in this study. K-means clustering is implemented for seasonal-diurnal variations (i) in absolute mixing ratio units and (ii) normalized by the overall mean ozone mixing ratio at each site. Statistical tests suggest that each CA can distinguish between four and five different ozone pollution regimes. The individual clusters reveal differences in seasonal-diurnal cycles, showing typical patterns of the ozone behavior for more polluted stations or more rural background. The robustness of the clustering was tested with a series of k-means runs decreasing randomly the size of the initial data set or lengths of the time series. Except for the Po Valley, the clustering does not provide a regional differentiation, as the member stations within each cluster are generally distributed all over Europe. The typical seasonal, diurnal, and weekly cycles of each cluster are compared to the output of the multi-year global reanalysis produced within the Monitoring of Atmospheric Composition and Climate (MACC) project. While the MACC reanalysis generally captures the shape of the diurnal cycles and the diurnal amplitudes, it is not able to reproduce the seasonal cycles very well and it exhibits a high bias up to 12 nmol mol-1. The bias decreases from more polluted clusters to cleaner ones. Also, the seasonal and weekly cycles and frequency distributions of ozone mixing ratios are better described for clusters with relatively clean signatures. 
    Due to relative sparsity of CO and NOx

  4. Arthropod monitoring for fine-scale habitat analysis: A case study of the El Segundo sand dunes

    SciTech Connect

    Mattoni, R.; Longcore, T.; Novotny, V.

    2000-04-01

    Arthropod communities from several habitats on and adjacent to the El Segundo dunes (Los Angeles County, CA) were sampled using pitfall and yellow pan traps to evaluate their possible use as indicators of restoration success. Communities were ordinated and clustered using correspondence analysis, detrended correspondence analysis, two-way indicator species analysis, and Ward's method of agglomerative clustering. The results showed high repeatability among replicates within any sampling arena, which permits discrimination of (1) degraded and relatively undisturbed habitat, (2) different dune habitat types, and (3) annual change. Canonical correspondence analysis showed a significant effect of disturbance history on community composition that explained 5-20% of the variation. Replicates of pitfall and yellow pan traps on single sites clustered together reliably when species abundance was considered, whereas clusters using only species incidence did not group replicates as consistently. The broad taxonomic approach seems appropriate for habitat evaluation and monitoring of restoration projects as an alternative to assessments geared to single species or even single families.
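
    The contrast between abundance-based and incidence-based clustering can be sketched with a naive implementation of Ward's method, which at each step merges the two clusters whose union adds the least within-cluster sum of squares. The trap counts below are hypothetical and only illustrate how presence/absence data can blur replicates that abundance data separate cleanly; they are not taken from the study.

```python
def ward_merge_order(samples):
    """Naive Ward agglomeration.

    Repeatedly merges the pair of clusters with the smallest increase in
    within-cluster sum of squares, n_a * n_b / (n_a + n_b) * ||c_a - c_b||^2,
    and returns the list of (sorted member indices, merge cost).
    """
    clusters = {i: ([i], list(row)) for i, row in enumerate(samples)}
    merges = []
    next_id = len(samples)
    while len(clusters) > 1:
        best = None
        ids = sorted(clusters)
        for pos, a in enumerate(ids):
            for b in ids[pos + 1:]:
                (members_a, centroid_a), (members_b, centroid_b) = clusters[a], clusters[b]
                sq_dist = sum((x - y) ** 2 for x, y in zip(centroid_a, centroid_b))
                n_a, n_b = len(members_a), len(members_b)
                cost = n_a * n_b / (n_a + n_b) * sq_dist
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        cost, a, b = best
        members_a, centroid_a = clusters.pop(a)
        members_b, centroid_b = clusters.pop(b)
        n_a, n_b = len(members_a), len(members_b)
        centroid = [
            (n_a * x + n_b * y) / (n_a + n_b)
            for x, y in zip(centroid_a, centroid_b)
        ]
        clusters[next_id] = (members_a + members_b, centroid)
        merges.append((sorted(members_a + members_b), cost))
        next_id += 1
    return merges


# Hypothetical counts: rows are trap replicates, columns are species.
abundance = [
    [30, 2, 0],   # site 1, replicate a
    [28, 3, 1],   # site 1, replicate b
    [1, 25, 4],   # site 2, replicate a
    [0, 27, 5],   # site 2, replicate b
]
# Incidence keeps only presence (1) or absence (0) of each species.
incidence = [[1 if count > 0 else 0 for count in row] for row in abundance]

abundance_merges = ward_merge_order(abundance)
incidence_merges = ward_merge_order(incidence)
print(abundance_merges)
print(incidence_merges)
```

    In this toy case the abundance data pair each site's replicates first, while the incidence data first join one replicate from each site because their presence/absence profiles are identical.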

  5. Bayesian Analysis of Two Stellar Populations in Galactic Globular Clusters. II. NGC 5024, NGC 5272, and NGC 6352

    NASA Astrophysics Data System (ADS)

    Wagner-Kaiser, R.; Stenning, D. C.; Robinson, E.; von Hippel, T.; Sarajedini, A.; van Dyk, D. A.; Stein, N.; Jefferys, W. H.

    2016-07-01

    We use Cycle 21 Hubble Space Telescope (HST) observations and HST archival Advanced Camera for Surveys Treasury observations of Galactic Globular Clusters to find and characterize two stellar populations in NGC 5024 (M53), NGC 5272 (M3), and NGC 6352. For these three clusters, both single and double-population analyses are used to determine a best fit isochrone(s). We employ a sophisticated Bayesian analysis technique to simultaneously fit the cluster parameters (age, distance, absorption, and metallicity) that characterize each cluster. For the two-population analysis, unique population level helium values are also fit to each distinct population of the cluster and the relative proportions of the populations are determined. We find differences in helium ranging from ∼0.05 to 0.11 for these three clusters. Model grids with solar α-element abundances ([α/Fe] = 0.0) and enhanced α-elements ([α/Fe] = 0.4) are adopted.

  7. Deletion analysis of the avermectin biosynthetic genes of Streptomyces avermitilis by gene cluster displacement.

    PubMed Central

    MacNeil, T; Gewain, K M; MacNeil, D J

    1993-01-01

    Streptomyces avermitilis produces a group of glycosylated, methylated macrocyclic lactones, the avermectins, which have potent anthelmintic activity. A homologous recombination strategy termed gene cluster displacement was used to construct Neor deletion strains with defined endpoints and to clone the corresponding complementary DNA encoding functions for avermectin biosynthesis (avr). Thirty-five unique deletions of 0.5 to > 100 kb over a continuous 150-kb region were introduced into S. avermitilis. Analysis of the avermectin phenotypes of the deletion-containing strains defined the extent and ends of the 95-kb avr gene cluster, identified a regulatory region, and mapped several avr functions. A 60-kb region in the central portion determines the synthesis of the macrolide ring. A 13-kb region at one end of the cluster is responsible for synthesis and attachment of oleandrose disaccharide. A 10-kb region at the other end has functions for positive regulation and C-5 O-methylation. Physical analysis of the deletions and of in vivo-cloned fragments refined a 130-kb physical map of the avr gene cluster region. PMID:8478321

  8. Expanded Natural Product Diversity Revealed by Analysis of Lanthipeptide-Like Gene Clusters in Actinobacteria

    PubMed Central

    Zhang, Qi; Doroghazi, James R.; Zhao, Xiling; Walker, Mark C.

    2015-01-01

    Lanthionine-containing peptides (lanthipeptides) are a rapidly growing family of polycyclic peptide natural products belonging to the large class of ribosomally synthesized and posttranslationally modified peptides (RiPPs). Lanthipeptides are widely distributed in taxonomically distant species, and their currently known biosynthetic systems and biological activities are diverse. Building on the recent natural product gene cluster family (GCF) project, we report here large-scale analysis of lanthipeptide-like biosynthetic gene clusters from Actinobacteria. Our analysis suggests that lanthipeptide biosynthetic pathways, and by extrapolation the natural products themselves, are much more diverse than currently appreciated and contain many different posttranslational modifications. Furthermore, lanthionine synthetases are much more diverse in sequence and domain topology than currently characterized systems, and they are used by the biosynthetic machineries for natural products other than lanthipeptides. The gene cluster families described here significantly expand the chemical diversity and biosynthetic repertoire of lanthionine-related natural products. Biosynthesis of these novel natural products likely involves unusual and unprecedented biochemistries, as illustrated by several examples discussed in this study. In addition, class IV lanthipeptide gene clusters are shown not to be silent, setting the stage to investigate their biological activities. PMID:25888176

  9. Spatial cluster analysis of nanoscopically mapped serotonin receptors for classification of fixed brain tissue

    NASA Astrophysics Data System (ADS)

    Sams, Michael; Silye, Rene; Göhring, Janett; Muresan, Leila; Schilcher, Kurt; Jacak, Jaroslaw

    2014-01-01

    We present a cluster spatial analysis method using nanoscopic dSTORM images to determine changes in protein cluster distributions within brain tissue. Such methods are suitable to investigate human brain tissue and will help to achieve a deeper understanding of brain disease along with aiding drug development. Human brain tissue samples are usually treated postmortem via standard fixation protocols, which are established in clinical laboratories. Therefore, our localization microscopy-based method was adapted to characterize protein density and protein cluster localization in samples fixed using different protocols followed by common fluorescent immunohistochemistry techniques. The localization microscopy allows nanoscopic mapping of serotonin 5-HT1A receptor groups within a two-dimensional image of a brain tissue slice. These nanoscopically mapped proteins can be confined to clusters by applying the proposed statistical spatial analysis. Selected features of such clusters were subsequently used to characterize and classify the tissue. Samples were obtained from different types of patients, fixed with different preparation methods, and finally stored in a human tissue bank. To verify the proposed method, samples of a cryopreserved healthy brain have been compared with epitope-retrieved and paraffin-fixed tissues. Furthermore, samples of healthy brain tissues were compared with data obtained from patients suffering from mental illnesses (e.g., major depressive disorder). Our work demonstrates the applicability of localization microscopy and image analysis methods for comparison and classification of human brain tissues at a nanoscopic level. Furthermore, the presented workflow marks a unique technological advance in the characterization of protein distributions in brain tissue sections.

  10. Analysis of local bond-orientational order for liquid gallium at ambient pressure: Two types of cluster structures.

    PubMed

    Chen, Lin-Yuan; Tang, Ping-Han; Wu, Ten-Ming

    2016-07-14

    In terms of the local bond-orientational order (LBOO) parameters, a cluster approach to analyze local structures of simple liquids was developed. In this approach, a cluster is defined as a combination of neighboring seeds having at least nb local-orientational bonds and their nearest neighbors, and a cluster ensemble is a collection of clusters with a specified nb and number of seeds ns. This cluster analysis was applied to investigate the microscopic structures of liquid Ga at ambient pressure (AP). The liquid structures studied were generated through ab initio molecular dynamics simulations. By scrutinizing the static structure factors (SSFs) of cluster ensembles with different combinations of nb and ns, we found that liquid Ga at AP contained two types of cluster structures, one characterized by sixfold orientational symmetry and the other showing fourfold orientational symmetry. The SSFs of cluster structures with sixfold orientational symmetry were akin to the SSF of a hard-sphere fluid. On the contrary, the SSFs of cluster structures showing fourfold orientational symmetry behaved similarly to the anomalous SSF of liquid Ga at AP, which is well known for exhibiting a high-q shoulder. The local structures of a highly LBOO cluster whose SSF displayed a high-q shoulder were found to be more similar to the structure of β-Ga than to those of other solid phases of Ga. More generally, the cluster structures showing fourfold orientational symmetry tend to resemble β-Ga more closely. PMID:27421419

  11. Analysis of local bond-orientational order for liquid gallium at ambient pressure: Two types of cluster structures

    NASA Astrophysics Data System (ADS)

    Chen, Lin-Yuan; Tang, Ping-Han; Wu, Ten-Ming

    2016-07-01

    In terms of the local bond-orientational order (LBOO) parameters, a cluster approach to analyze local structures of simple liquids was developed. In this approach, a cluster is defined as a combination of neighboring seeds having at least nb local-orientational bonds and their nearest neighbors, and a cluster ensemble is a collection of clusters with a specified nb and number of seeds ns. This cluster analysis was applied to investigate the microscopic structures of liquid Ga at ambient pressure (AP). The liquid structures studied were generated through ab initio molecular dynamics simulations. By scrutinizing the static structure factors (SSFs) of cluster ensembles with different combinations of nb and ns, we found that liquid Ga at AP contained two types of cluster structures, one characterized by sixfold orientational symmetry and the other showing fourfold orientational symmetry. The SSFs of cluster structures with sixfold orientational symmetry were akin to the SSF of a hard-sphere fluid. On the contrary, the SSFs of cluster structures showing fourfold orientational symmetry behaved similarly to the anomalous SSF of liquid Ga at AP, which is well known for exhibiting a high-q shoulder. The local structures of a highly LBOO cluster whose SSF displayed a high-q shoulder were found to be more similar to the structure of β-Ga than to those of other solid phases of Ga. More generally, the cluster structures showing fourfold orientational symmetry tend to resemble β-Ga more closely.

  12. Validation of disease states in schizophrenia: comparison of cluster analysis between US and European populations

    PubMed Central

    Thokagevistk, Katia; Millier, Aurélie; Lenert, Leslie; Sadikhov, Shamil; Moreno, Santiago; Toumi, Mondher

    2016-01-01

    Background There is controversy as to whether use of statistical clustering methods to identify common disease patterns in schizophrenia identifies patterns generalizable across countries. Objective The goal of this study was to compare disease states identified in a published study (Mohr/Lenert, 2004) considering US patients to disease states in a European cohort (EuroSC) considering English, French, and German patients. Methods Using methods paralleling those in Mohr/Lenert, we conducted a principal component analysis (PCA) on Positive and Negative Syndrome Scale items in the EuroSC data set (n=1,208), followed by k-means cluster analyses and a search for an optimal k. The optimal model structure was compared to Mohr/Lenert by assigning discrete severity levels to each cluster in each factor based on the cluster center. A harmonized model was created and patients were assigned to health states using both approaches; agreement rates in state assignment were then calculated. Results Five factors accounting for 56% of total variance were obtained from PCA. These factors corresponded to positive symptoms (Factor 1), negative symptoms (Factor 2), cognitive impairment (Factor 3), hostility/aggression (Factor 4), and mood disorder (Factor 5) (as in Mohr/Lenert). The optimal number of cluster states was six. The kappa statistic (95% confidence interval) for agreement in state assignment was 0.686 (0.670–0.703). Conclusion The patterns of schizophrenia effects identified using clustering in two different data sets were reasonably similar. Results suggest the Mohr/Lenert health state model is potentially generalizable to other populations. PMID:27386054
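
    The agreement measure reported in this abstract is a kappa statistic. The abstract does not say which variant was used, so the sketch below assumes the unweighted Cohen's kappa for two sets of state assignments; the ten patient assignments are hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Unweighted Cohen's kappa: agreement between two labelings,
    corrected for the agreement expected by chance."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical health-state assignments (states 1-6) for ten patients under two models.
model_one = [1, 1, 2, 2, 3, 3, 4, 5, 6, 6]
model_two = [1, 1, 2, 3, 3, 3, 4, 5, 6, 5]

kappa = cohen_kappa(model_one, model_two)
print(round(kappa, 3))
```

    Here the two models agree on eight of ten patients (0.80 raw agreement), but kappa is lower because some of that agreement would be expected by chance alone.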

  13. Sensitivity Enhancement of RF Plasma Etch Endpoint Detection With K-means Cluster Analysis

    NASA Astrophysics Data System (ADS)

    Lee, Honyoung; Jang, Haegyu; Lee, Hak-Seung; Chae, Heeyeop

    2015-09-01

    The plasma etching process is a core process in semiconductor fabrication, and etching endpoint detection (EPD) is one of the essential FDC (Fault Detection and Classification) functions for yield management and mass production. In general, optical emission spectroscopy (OES) has been used to detect the endpoint because OES is a non-invasive, real-time plasma monitoring tool. In OES, the trend of a few sensitive wavelengths is traced. However, small-open-area etch endpoint detection (e.g., contact etch) is at the boundary of the detection limit because of the weak signal intensities of reactants and products. Furthermore, the various materials covering the wafer, such as photoresist, dielectric materials, and metals, complicate the analysis of OES signals. In this study, full spectra of optical emission signals were collected and the data were analyzed by a data-mining approach, modified K-means cluster analysis. The K-means cluster analysis is modified to analyze on the order of a thousand wavelength variables from OES. This technique can improve the sensitivity of EPD for small-area oxide layer etching processes: about 1.0% oxide area. This technique is expected to be applied to various plasma monitoring applications, including fault detection as well as EPD. Keywords: Plasma Etch, EPD, K-means Cluster Analysis.

  14. Interactive Parallel Data Analysis within Data-Centric Cluster Facilities using the IPython Notebook

    NASA Astrophysics Data System (ADS)

    Pascoe, S.; Lansdowne, J.; Iwi, A.; Stephens, A.; Kershaw, P.

    2012-12-01

    The data deluge is making traditional analysis workflows for many researchers obsolete. Support for parallelism within popular tools such as MATLAB, IDL and NCO is not well developed and rarely used. However, parallelism is necessary for processing modern data volumes on a timescale conducive to curiosity-driven analysis. Furthermore, for peta-scale datasets such as the CMIP5 archive, it is no longer practical to bring an entire dataset to a researcher's workstation for analysis, or even to their institutional cluster. Therefore, there is an increasing need to develop new analysis platforms which both enable processing at the point of data storage and provide parallelism. Such an environment should, where possible, maintain the convenience and familiarity of our current analysis environments to encourage curiosity-driven research. We describe how we are combining the interactive Python shell (IPython) with our JASMIN data-cluster infrastructure. IPython has been specifically designed to bridge the gap between the HPC-style parallel workflows and the opportunistic curiosity-driven analysis usually carried out using domain specific languages and scriptable tools. IPython offers a web-based interactive environment, the IPython notebook, and a cluster engine for parallelism, all underpinned by the well-respected Python/SciPy scientific programming stack. JASMIN is designed to support the data analysis requirements of the UK and European climate and earth system modeling community. JASMIN, with its sister facility CEMS, which focuses on the earth observation community, has 4.5 PB of fast parallel disk storage alongside over 370 computing cores that provide local computation. Through the IPython interface to JASMIN, users can make efficient use of JASMIN's multi-core virtual machines to perform interactive analysis on all cores simultaneously or can configure IPython clusters across multiple VMs. Larger-scale clusters can be provisioned through JASMIN's batch scheduling system

  15. Network analysis identifies protein clusters of functional importance in juvenile idiopathic arthritis

    PubMed Central

    2014-01-01

    Introduction Our objective was to utilise network analysis to identify protein clusters of greatest potential functional relevance in the pathogenesis of oligoarticular and rheumatoid factor negative (RF-ve) polyarticular juvenile idiopathic arthritis (JIA). Methods JIA genetic association data were used to build an interactome network model in BioGRID 3.2.99. The top 10% of this protein:protein JIA Interactome was used to generate a minimal essential network (MEN). Reactome FI Cytoscape 2.83 Plugin and the Disease Association Protein-Protein Link Evaluator (Dapple) algorithm were used to assess the functionality of the biological pathways within the MEN and to statistically rank the proteins. JIA gene expression data were integrated with the MEN and clusters of functionally important proteins derived using MCODE. Results A JIA interactome of 2,479 proteins was built from 348 JIA associated genes. The MEN, representing the most functionally related components of the network, comprised seven clusters with distinct functional characteristics. Four gene expression datasets from peripheral blood mononuclear cells (PBMC), neutrophils and synovial fluid monocytes were mapped onto the MEN and a list of genes enriched for functional significance identified. This analysis revealed the genes of greatest potential functional importance to be PTPN2 and STAT1 for oligoarticular JIA and KSR1 for RF-ve polyarticular JIA. Clusters of 23 and 14 related proteins were derived for oligoarticular and RF-ve polyarticular JIA respectively. Conclusions This first report of the application of network biology to JIA, integrating genetic association findings and gene expression data, has prioritised protein clusters for functional validation and identified new pathways for targeted pharmacological intervention. PMID:24886659

  16. Lifestyle health behaviors of Hong Kong Chinese: results of a cluster analysis.

    PubMed

    Chan, Choi Wan; Leung, Sau Fong

    2015-04-01

    Sociodemographics affect health through pathways of lifestyle choices. Using data from a survey of 467 Hong Kong Chinese, this study aims to examine the prevalence of their lifestyle behaviors, identify profiles based on their sociodemographic and lifestyle variables, and compare differences among the profile groups. Two-step cluster analysis was used to identify natural profile groups within the data set: only 37% of the participants engaged in regular physical exercises, and less than 50% monitored their dietary intake carefully. The analysis yields 2 clusters, representing a "healthy" and a "less-healthy" lifestyle group. The "less-healthy" group was predominantly male, younger, employed, and had high-to-middle levels of education. The findings reveal the lifestyle behavior patterns and sociodemographic characteristics of a high-risk group, which are essential to provide knowledge for the planning of health promotion activities. PMID:25296668

  17. Novel Application of Cluster Analysis to Transport Data in Single Molecule Break Junctions

    NASA Astrophysics Data System (ADS)

    Wu, Ben; Ivie, Jeffrey; Johnson, Tyler; Himmelhuber, Roland; Monti, Oliver

    Single molecule based devices represent the ultimate limit in device design, but uncovering the major factors that determine energy level alignment in single molecule junctions and their effect on the charge transport properties of single molecules is still a major challenge. Analysis of break junction data using a novel density based hierarchical clustering algorithm reveals the deep structure of the highly stochastic data that will help hypothesis-driven elucidation of some of the key parameters for quantum transport. The strength of this approach is its scale-invariance and the identification of nested structure that may be overlooked by standard data analysis techniques. The statistical relevance of identified clusters can be gauged using a density based validation index. Arnold and Mabel Beckman Foundation.

  18. Clinical Implications of Cluster Analysis-Based Classification of Acute Decompensated Heart Failure and Correlation with Bedside Hemodynamic Profiles

    PubMed Central

    Ahmad, Tariq; Desai, Nihar; Wilson, Francis; Schulte, Phillip; Dunning, Allison; Jacoby, Daniel; Allen, Larry; Fiuzat, Mona; Rogers, Joseph; Felker, G. Michael; O’Connor, Christopher; Patel, Chetan B.

    2016-01-01

    Background Classification of acute decompensated heart failure (ADHF) is based on subjective criteria that crudely capture disease heterogeneity. Improved phenotyping of the syndrome may help improve therapeutic strategies. Objective To derive cluster analysis-based groupings for patients hospitalized with ADHF, and compare their prognostic performance to hemodynamic classifications derived at the bedside. Methods We performed a cluster analysis on baseline clinical variables and PAC measurements of 172 ADHF patients from the ESCAPE trial. Employing regression techniques, we examined associations between clusters and clinically determined hemodynamic profiles (warm/cold/wet/dry). We assessed association with clinical outcomes using Cox proportional hazards models. Likelihood ratio tests were used to compare the prognostic value of cluster data to that of hemodynamic data. Results We identified four advanced HF clusters: 1) male Caucasians with ischemic cardiomyopathy, multiple comorbidities, lowest B-type natriuretic peptide (BNP) levels; 2) females with non-ischemic cardiomyopathy, few comorbidities, most favorable hemodynamics; 3) young African American males with non-ischemic cardiomyopathy, most adverse hemodynamics, advanced disease; and 4) older Caucasians with ischemic cardiomyopathy, concomitant renal insufficiency, highest BNP levels. There was no association between clusters and bedside-derived hemodynamic profiles (p = 0.70). For all adverse clinical outcomes, Cluster 4 had the highest risk, and Cluster 2, the lowest. Compared to Cluster 4, Clusters 1–3 had 45–70% lower risk of all-cause mortality. Clusters were significantly associated with clinical outcomes, whereas hemodynamic profiles were not. Conclusions By clustering patients with similar objective variables, we identified four clinically relevant phenotypes of ADHF patients, with no discernable relationship to hemodynamic profiles, but distinct associations with adverse outcomes. Our analysis

  19. Detection of Significant Groups in Hierarchical Clustering by Resampling

    PubMed Central

    Sebastiani, Paola; Perls, Thomas T.

    2016-01-01

    Hierarchical clustering is a simple and reproducible technique to rearrange data of multiple variables and sample units and visualize possible groups in the data. Despite the name, hierarchical clustering does not provide clusters automatically, and “tree-cutting” procedures are often used to identify subgroups in the data by cutting the dendrogram that represents the similarities among groups used in the agglomerative procedure. We introduce a resampling-based technique that can be used to identify cut-points of a dendrogram with a significance level based on a reference distribution for the heights of the branch points. The evaluation on synthetic data shows that the technique is robust in a variety of situations. An example with real biomarker data from the Long Life Family Study shows the usefulness of the method. PMID:27551289
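
    To make the "tree-cutting" step concrete, the sketch below builds a dendrogram by single-linkage agglomeration of a few one-dimensional points and then cuts it at a fixed height. Both function names and the data are hypothetical. The cut height here is picked by hand; the resampling-based procedure described in the abstract, which derives a cut-point from a reference distribution of branch heights, is not implemented.

```python
def agglomerate(points):
    """Single-linkage agglomerative clustering of 1-D points.

    Returns the merge history as (height, members_a, members_b) tuples,
    in the order the merges happen.
    """
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(abs(points[i] - points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((d, clusters[a], clusters[b]))
        merged = clusters[a] + clusters[b]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)]
        clusters.append(merged)
    return merges


def cut_tree(points, merges, height):
    """Groups obtained by applying only the merges below the cut height."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            i = parent[i]
        return i

    for d, left, right in merges:
        if d < height:
            parent[find(right[0])] = find(left[0])
    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


points = [1.0, 1.5, 2.0, 10.0, 11.0, 30.0]   # hypothetical measurements
merges = agglomerate(points)
groups = cut_tree(points, merges, height=5.0)
print(groups)
```

    Moving the cut height changes the number of groups, which is why a principled way of choosing it, such as the significance-based cut-points proposed above, matters in practice.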

  20. Detection of Significant Groups in Hierarchical Clustering by Resampling.

    PubMed

    Sebastiani, Paola; Perls, Thomas T

    2016-01-01

    Hierarchical clustering is a simple and reproducible technique to rearrange data of multiple variables and sample units and visualize possible groups in the data. Despite the name, hierarchical clustering does not provide clusters automatically, and "tree-cutting" procedures are often used to identify subgroups in the data by cutting the dendrogram that represents the similarities among groups used in the agglomerative procedure. We introduce a resampling-based technique that can be used to identify cut-points of a dendrogram with a significance level based on a reference distribution for the heights of the branch points. The evaluation on synthetic data shows that the technique is robust in a variety of situations. An example with real biomarker data from the Long Life Family Study shows the usefulness of the method. PMID:27551289

  1. A Three-Step Spatial-Temporal Clustering Method for Human Activity Pattern Analysis

    NASA Astrophysics Data System (ADS)

    Huang, W.; Li, S.; Xu, S.

    2016-06-01

    How people move in cities and what they do in various locations at different times form human activity patterns. Human activity patterns play a key role in urban planning, traffic forecasting, public health and safety, emergency response, friend recommendation, and so on. Therefore, scholars from different fields, such as social science, geography, transportation, physics and computer science, have made great efforts in modelling and analysing human activity patterns or human mobility patterns. One of the essential tasks in such studies is to find the locations or places where individuals stay to perform some kind of activity before further activity pattern analysis. In the era of Big Data, the emergence of social media along with wearable devices enables human activity data to be collected more easily and efficiently. Furthermore, the dimension of the accessible human activity data has been extended from two to three (space or space-time) to four dimensions (space, time and semantics). More specifically, not only the location and time at which people stay are collected, but also what people "say" at a location at a given time can be obtained. The characteristics of these datasets shed new light on the analysis of human mobility, and new methodologies should accordingly be developed to handle them. Traditional methods such as neural networks, statistics and clustering have been applied to study human activity patterns using geosocial media data. Among them, clustering methods have been widely used to analyse spatiotemporal patterns. However, to the best of our knowledge, few clustering algorithms are specifically developed for handling datasets that contain spatial, temporal and semantic aspects all together. In this work, we propose a three-step human activity clustering method based on space, time and semantics to fill this gap. One year of Twitter data, posted in Toronto, Canada, is used to test the clustering-based method. The results show that the

  2. Seismic clusters analysis in North-Eastern Italy by the nearest-neighbor approach

    NASA Astrophysics Data System (ADS)

    Peresan, Antonella; Gentili, Stefania

    2016-04-01

    The main features of earthquake clusters in the Friuli Venezia Giulia Region (North Eastern Italy) are explored, with the aim to get some new insights on local scale patterns of seismicity in the area. The study is based on a systematic analysis of robustly and uniformly detected seismic clusters of small-to-medium magnitude events, as opposed to selected clusters analyzed in earlier studies. To characterize the features of seismicity for FVG, we take advantage of updated information from local OGS bulletins, compiled at the National Institute of Oceanography and Experimental Geophysics, Centre of Seismological Research, since 1977. A preliminary reappraisal of the earthquake bulletins is carried out, in order to identify possible missing events and to remove spurious records (e.g. duplicates and explosions). The area of sufficient completeness is outlined; for this purpose, different techniques are applied, including a comparative analysis with global ISC data, which are available in the region for large and moderate size earthquakes. Various techniques are considered to estimate the average parameters that characterize the earthquake occurrence in the region, including the b-value and the fractal dimension of epicenters distribution. Specifically, besides the classical Gutenberg-Richter Law, the Unified Scaling Law for Earthquakes, USLE, is applied. Using the updated and revised OGS data, a new formal method for detection of earthquake clusters, based on nearest-neighbor distances of events in space-time-energy domain, is applied. The bimodality of the distribution, which characterizes the earthquake nearest-neighbor distances, is used to decompose the seismic catalog into sequences of individual clusters and background seismicity. Accordingly, the method allows for a data-driven identification of main shocks (first event with the largest magnitude in the cluster), foreshocks and aftershocks. Average robust estimates of the USLE parameters (particularly, b
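
    The abstract does not spell out the distance it uses. One form commonly used in nearest-neighbor seismicity studies is eta = dt * r^d * 10^(-b * m), where dt is the time between an earlier event and a later one, r their epicentral distance, m the magnitude of the earlier event, b the Gutenberg-Richter b-value and d the fractal dimension of epicentres. The sketch below assumes that form; the function name, the parameter values and the four-event catalogue are hypothetical.

```python
import math


def nearest_neighbor_links(events, b=1.0, d=1.6):
    """Link each event to the earlier event that minimises eta.

    events: list of (time, x_km, y_km, magnitude) tuples, in any order.
    Returns (parent_index or None, eta) per event, in time-sorted order.
    """
    ordered = sorted(events)
    links = []
    for j, (tj, xj, yj, _) in enumerate(ordered):
        best = (None, math.inf)
        for i in range(j):
            ti, xi, yi, mi = ordered[i]
            dt = tj - ti
            if dt <= 0:
                continue  # only strictly earlier events can be parents
            # Small floor on distance so co-located events do not give eta = 0.
            r = max(math.hypot(xj - xi, yj - yi), 1e-3)
            eta = dt * r ** d * 10 ** (-b * mi)
            if eta < best[1]:
                best = (i, eta)
        links.append(best)
    return links


# Hypothetical catalogue: a magnitude 5 event, two close followers, one distant late event.
catalogue = [
    (0.0, 0.0, 0.0, 5.0),
    (1.0, 1.0, 0.0, 2.0),
    (2.0, 0.5, 0.0, 2.5),
    (100.0, 80.0, 60.0, 3.0),
]
links = nearest_neighbor_links(catalogue)
for parent, eta in links:
    print(parent, eta)
```

    In this toy catalogue the two followers have eta values several orders of magnitude below that of the distant late event. A gap of this kind is what produces the bimodal distribution mentioned above, and a threshold placed between the two modes separates clustered events from background seismicity.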

  3. Cluster analysis of earthquake swarms - results from West Bohemia and South-West Iceland

    NASA Astrophysics Data System (ADS)

    Čermáková, Hana; Cesca, Simone; Horálek, Josef

    2015-04-01

    Earthquake swarms are a specific type of seismic activity in which strain energy is released in numerous, mostly shallow earthquakes that lack a single large event; instead, a few dominant earthquakes reach similar magnitudes, so that smaller events are not associated with any identifiable mainshock. Earthquake swarms distinctively cluster in time and space and last from several hours to several months. They occur at boundaries of the lithospheric plates (interplate) and within the plates (intraplate), and they are very often related to volcanic areas, geothermal fields and ocean ridges. In our study we explored the behaviour of earthquake swarms within a tectonic plate, at a boundary of tectonic plates and in volcanic areas in order to understand why the energy is released successively by sequences of small events in contrast to mainshock-aftershock earthquakes. We used catalogue data from the West Bohemia-Vogtland region (WB), situated within a tectonic plate, and from three different tectonic settings in South-West Iceland (SWI), namely a boundary of tectonic plates (Krísuvík), the edge of a zone where mainshock-aftershock earthquakes typically occur (Olfus, South Iceland Seismic Zone) and a volcanic area (Hengill). In the case of WB we analyzed two swarms, 2000 and 2008, which occurred on the same fault segments. We analyzed the distribution of events in terms of a spatial metric obtained from relative locations and a time metric (for WB and SWI), and a focal mechanism metric based on double couple (DC) solutions (for WB). For this purpose we used the clustering method by Cesca et al. (2014). The results are strongly affected by the subjective choice of two parameters which describe the desired density of points to infer a cluster. For the tested applications, we repeated the clustering several times to decide the best combination of these parameters. The cluster analysis applied to the double-difference locations disclosed several separate clusters in each area

  4. Finding Groups Using Model-Based Cluster Analysis: Heterogeneous Emotional Self-Regulatory Processes and Heavy Alcohol Use Risk

    ERIC Educational Resources Information Center

    Mun, Eun Young; von Eye, Alexander; Bates, Marsha E.; Vaschillo, Evgeny G.

    2008-01-01

    Model-based cluster analysis is a new clustering procedure to investigate population heterogeneity utilizing finite mixture multivariate normal densities. It is an inferentially based, statistically principled procedure that allows comparison of nonnested models using the Bayesian information criterion to compare multiple models and identify the…

  5. A Technique of Two-Stage Clustering Applied to Environmental and Civil Engineering and Related Methods of Citation Analysis.

    ERIC Educational Resources Information Center

    Miyamoto, S.; Nakayama, K.

    1983-01-01

    A method of two-stage clustering of literature based on citation frequency is applied to 5,065 articles from 57 journals in environmental and civil engineering. Results of related methods of citation analysis (hierarchical graph, clustering of journals, multidimensional scaling) applied to same set of articles are compared. Ten references are…

  6. Deriving non-homogeneous DNA Markov chain models by cluster analysis algorithm minimizing multiple alignment entropy.

    PubMed

    Borodovsky, M; Peresetsky, A

    1994-09-01

    Non-homogeneous Markov chain models can represent biologically important regions of DNA sequences. The statistical pattern that is described by these models is usually weak and was found primarily because of strong biological indications. The general method for extracting similar patterns is presented in the current paper. The algorithm incorporates cluster analysis, multiple alignment and entropy minimization. The method was first tested using the set of DNA sequences produced by Markov chain generators. It was shown that artificial gene sequences, which initially have been randomly set up along the multiple alignment panels, are aligned according to the hidden triplet phase. Then the method was applied to real protein-coding sequences and the resulting alignment clearly indicated the triplet phase and produced the parameters of the optimal 3-periodic non-homogeneous Markov chain model. These Markov models were already employed in the GeneMark gene prediction algorithm, which is used in genome sequencing projects. The algorithm can also handle the case in which the sequences to be aligned reveal different statistical patterns, such as Escherichia coli protein-coding sequences belonging to Class II and Class III. The algorithm accepts a random mix of sequences from different classes, and is able to separate them into two groups (clusters), align each cluster separately, and define a non-homogeneous Markov chain model for each sequence cluster. PMID:7952897

  7. Towards combined analysis of the most distant massive galaxy clusters with XMM and Chandra

    NASA Astrophysics Data System (ADS)

    Bartalucci, I.

    2016-06-01

    We present a detailed study of the gas and dark matter properties of the 5 most massive and distant, z ∼ 1, clusters detected via the Sunyaev-Zel'dovich effect. These massive objects represent an ideal laboratory to test our models of structure evolution in a mass regime driven mainly by gravity. This work presents a new method to study these objects, in which information coming from the XMM-Newton and Chandra instruments is efficiently combined. The combination of Chandra's fine spatial resolution and XMM-Newton's effective area allows us to efficiently investigate the properties of the intracluster medium in the core and to probe cluster outskirts. The resulting combined density profiles are used to fully characterize the thermodynamic and physical properties of the gas. Evolution properties are investigated by comparison with the REXCESS local galaxy cluster sample. In the context of the joint analysis of future Chandra and XMM large programs, we discuss the current limitations of this method and future prospects.

  8. Coupled dynamic analysis of a single gimbal control moment gyro cluster integrated with an isolation system

    NASA Astrophysics Data System (ADS)

    Luo, Qing; Li, Dongxu; Jiang, Jianping

    2014-01-01

    Control moment gyros (CMGs) are widely used as actuators for attitude control in spacecraft. However, micro-vibrations produced by CMGs will degrade the pointing performance of high-sensitivity instruments on board the spacecraft. This paper addresses dynamic modelling and performs an analysis on the micro-vibration isolation for a single gimbal CMG (SGCMG) cluster. First, an analytical model was developed to describe both the coupled SGCMG cluster and the multi-axis isolation system that can express the dynamic outputs. This analytical model accurately reflects the mass and inertia properties, the gyroscopic effects and flexible modes of the coupled system, and can be generalized for isolation applications of SGCMG clusters. Second, the analytical model was validated using MSC.NASTRAN software based on the finite element technique. The dynamic characteristics of the coupled system are affected by the mass distribution and the gyroscopic effects of the SGCMGs. The gyroscopic effects produced by the rotary flywheel will stiffen or soften several of the structural modes of the coupled system. In addition, the gyroscopic effect of each SGCMG can interact with or counteract that of the others, which couples the vibration modes together. Finally, the performance of the passive isolation was analysed. It was demonstrated that the gyroscopic effects should be considered in isolation studies on SGCMG clusters; otherwise, the isolation performance will be underestimated.

  9. Outbreak of pyrazinamide-monoresistant tuberculosis identified using genotype cluster and social media analysis

    PubMed Central

    Thomas, T. A.; Heysell, S. K.; Houpt, E. R.; Moore, J. L.; Keller, S. J.

    2015-01-01

    SUMMARY SETTING Monoresistance to pyrazinamide (PZA) has infrequently been associated with Mycobacterium tuberculosis. OBJECTIVE To report an outbreak of PZA-monoresistant M. tuberculosis in Virginia involving two genotype clusters from December 2004 to August 2010. RESULTS Thirty cases were identified involving a predominantly young, US-born population with histories of substance use and incarceration and a large proportion of children aged <15 years (n = 6, 20%); of these, 23 cases (77%) were culture-confirmed as M. tuberculosis complex. DNA fingerprinting and molecular analysis of the PZA resistance gene, pncA, demonstrated a clonal strain that was not M. bovis. Genotypic data provided the initial link between seemingly unrelated cases, and helped reveal a historic genotype cluster of cases from 2004. Further genotype cluster and contact investigation procedures, including the novel use of the social networking website Facebook.com, revealed additional links between the 2004 and 2009 genotype clusters and described an ongoing, extensive outbreak necessitating an enhanced screening and treatment protocol for contacts. CONCLUSIONS This outbreak demonstrates how tuberculosis can spread through a young, vulnerable population. The use of genotypic data and the novel incorporation of social media investigations were critical to understanding the settings and context of infectivity. PMID:24903792

  10. Detection of single unit activity from the rat vagus using cluster analysis of principal components.

    PubMed

    Horn, Charles C; Friedman, Mark I

    2003-01-30

    In vivo recordings from subdiaphragmatic vagal afferent nerves generally lack the resolution to distinguish single unit activity. Several methods for data acquisition and analysis were combined to produce a high degree of reliability in recording electrophysiological signals from gastrointestinal and hepatic afferent fibers in the rat. Recordings with low noise were achieved by paralysis of the respiratory muscles and by pinning the nerve to a recording platform. Single unit activity was isolated using principal component (PC) analysis and cluster cutting of data in multi-dimensional space (1-3 PCs). Cluster assignments were determined by a semi-automated approach using the k-means algorithm. The accuracy of single unit classification was assessed by checking inter-spike intervals (ISIs) to determine the length of the refractory period, and by cross-correlation analysis to assess whether single units were mistakenly split into more than one cluster. These analyses produced up to four isolated single units from each nerve filament (a bundle of nerve fibers), and typically it was possible to further increase yield by recording from several nerve filaments simultaneously using an array of electrodes. PMID:12573473
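
    The inter-spike interval check described above can be sketched as follows: if a cluster really contains a single unit, almost no intervals should fall inside the refractory period. The function name, the 1 ms refractory period and the spike times are assumptions made for illustration and are not values from the paper.

```python
def refractory_violations(spike_times_ms, refractory_ms=1.0):
    """Fraction of inter-spike intervals shorter than the assumed refractory period."""
    times = sorted(spike_times_ms)
    isis = [later - earlier for earlier, later in zip(times, times[1:])]
    if not isis:
        return 0.0
    return sum(isi < refractory_ms for isi in isis) / len(isis)


# Hypothetical spike times (ms) for a well-isolated unit and a contaminated cluster.
clean_unit = [0.0, 12.5, 30.0, 41.0, 77.5]
mixed_cluster = [0.0, 0.4, 12.5, 12.9, 30.0]
clean_rate = refractory_violations(clean_unit)
mixed_rate = refractory_violations(mixed_cluster)
print(clean_rate, mixed_rate)
```

    A high violation rate suggests that spikes from more than one fiber were merged into the cluster. The complementary cross-correlation check mentioned in the abstract, which looks for single units split across clusters, is not shown.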

  11. Assessment of water quality using cluster analysis in coastal region of Mumbai, India.

    PubMed

    Kamble, Swapnil R; Vijay, Ritesh

    2011-07-01

    The coastal water quality of Mumbai is deteriorating due to various point and non-point wastewater sources. Hence, it is desirable to monitor coastal water quality for various water-related activities like bathing, contact water sports, recreation, and commercial fishing. The objective of this paper is to assess the seasonal water quality on the basis of seawater standards. Based on water-quality analysis of 17 seafronts and beaches, most of the parameters exceeded the standards. Statistical cluster analysis was carried out to evaluate the impact of wastewater and sewage discharges. The hierarchical cluster analysis resulted in three clustered groups, namely less polluted, moderately polluted, and highly polluted sites with similar characteristics of water quality. Mahim was found to be the worst-affected beach, due to incoming organic load from the Mithi river, in comparison to other seafronts and beaches. Unaccounted sources of sewage and wastewater should be identified and rerouted through the sewerage system by improving collection efficiency, treatment, and proper disposal to achieve designated receiving water quality standards. PMID:20835920

  12. Classification and identification of metal-accumulating plant species by cluster analysis.

    PubMed

    Yang, Wenhao; Li, He; Zhang, Taoxiang; Sen, Lin; Ni, Wuzhong

    2014-09-01

    Identification and classification of metal-accumulating plant species is essential for phytoextraction. Cluster analysis is used for classifying individuals based on measured characteristics. In this study, classification of plant species for metal accumulation was conducted using cluster analysis based on a practical survey. Forty plant samples belonging to 21 species were collected from an ancient silver-mining site. Five groups were graded: hyperaccumulator, potential hyperaccumulator, accumulator, potential accumulator, and normal accumulating plant. For Cd accumulation, the ancient silver-mining ecotype of Sedum alfredii was treated as a Cd hyperaccumulator, and the others were normal Cd-accumulating plants. For Zn accumulation, S. alfredii was considered a potential Zn hyperaccumulator, Conyza canadensis and Artemisia lavandulaefolia were Zn accumulators, and the others were normal Zn-accumulating plants. For Pb accumulation, S. alfredii and Elatostema lineolatum were potential Pb hyperaccumulators, Rubus hunanensis, Ajuga decumbens, and Erigeron annuus were Pb accumulators, C. canadensis and A. lavandulaefolia were potential Pb accumulators, and the others were normal Pb-accumulating plants. Plant species with the potential for phytoextraction were identified, such as S. alfredii for Cd and Zn, C. canadensis and A. lavandulaefolia for Zn and Pb, and E. lineolatum, R. hunanensis, A. decumbens, and E. annuus for Pb. Cluster analysis is effective in the classification of plant species for metal accumulation and identification of potential species for phytoextraction. PMID:24888623

  13. A spatial cluster analysis of tractor overturns in Kentucky from 1960 to 2002

    USGS Publications Warehouse

    Saman, D.M.; Cole, H.P.; Odoi, A.; Myers, M.L.; Carey, D.I.; Westneat, S.C.

    2012-01-01

    Background: Agricultural tractor overturns without rollover protective structures are the leading cause of farm fatalities in the United States. To our knowledge, no studies have incorporated the spatial scan statistic in identifying high-risk areas for tractor overturns. The aim of this study was to determine whether tractor overturns cluster in certain parts of Kentucky and identify factors associated with tractor overturns. Methods: A spatial statistical analysis using Kulldorff's spatial scan statistic was performed to identify county clusters at greatest risk for tractor overturns. A regression analysis was then performed to identify factors associated with tractor overturns. Results: The spatial analysis revealed a cluster of higher than expected tractor overturns in four counties in northern Kentucky (RR = 2.55) and 10 counties in eastern Kentucky (RR = 1.97). Higher rates of tractor overturns were associated with steeper average percent slope of pasture land by county (p = 0.0002) and a greater percent of total tractors with less than 40 horsepower by county (p<0.0001). Conclusions: This study reveals that geographic hotspots of tractor overturns exist in Kentucky and identifies factors associated with overturns. This study provides policymakers a guide to targeted county-level interventions (e.g., roll-over protective structures promotion interventions) with the intention of reducing tractor overturns in the highest risk counties in Kentucky. © 2012 Saman et al.

  14. A Spatial Cluster Analysis of Tractor Overturns in Kentucky from 1960 to 2002

    PubMed Central

    Saman, Daniel M.; Cole, Henry P.; Odoi, Agricola; Myers, Melvin L.; Carey, Daniel I.; Westneat, Susan C.

    2012-01-01

    Background Agricultural tractor overturns without rollover protective structures are the leading cause of farm fatalities in the United States. To our knowledge, no studies have incorporated the spatial scan statistic in identifying high-risk areas for tractor overturns. The aim of this study was to determine whether tractor overturns cluster in certain parts of Kentucky and identify factors associated with tractor overturns. Methods A spatial statistical analysis using Kulldorff's spatial scan statistic was performed to identify county clusters at greatest risk for tractor overturns. A regression analysis was then performed to identify factors associated with tractor overturns. Results The spatial analysis revealed a cluster of higher than expected tractor overturns in four counties in northern Kentucky (RR = 2.55) and 10 counties in eastern Kentucky (RR = 1.97). Higher rates of tractor overturns were associated with steeper average percent slope of pasture land by county (p = 0.0002) and a greater percent of total tractors with less than 40 horsepower by county (p<0.0001). Conclusions This study reveals that geographic hotspots of tractor overturns exist in Kentucky and identifies factors associated with overturns. This study provides policymakers a guide to targeted county-level interventions (e.g., roll-over protective structures promotion interventions) with the intention of reducing tractor overturns in the highest risk counties in Kentucky. PMID:22291980

  15. A framework for graph-based synthesis, analysis, and visualization of HPC cluster job data.

    SciTech Connect

    Mayo, Jackson R.; Kegelmeyer, W. Philip, Jr.; Wong, Matthew H.; Pebay, Philippe Pierre; Gentile, Ann C.; Thompson, David C.; Roe, Diana C.; De Sapio, Vincent; Brandt, James M.

    2010-08-01

    The monitoring and system analysis of high performance computing (HPC) clusters is of increasing importance to the HPC community. Analysis of HPC job data can be used to characterize system usage and diagnose and examine failure modes and their effects. This analysis is not straightforward, however, due to the complex relationships that exist between jobs. These relationships are based on a number of factors, including shared compute nodes between jobs, proximity of jobs in time, etc. Graph-based techniques represent an approach that is particularly well suited to this problem, and provide an effective technique for discovering important relationships in job queuing and execution data. The efficacy of these techniques is rooted in the use of a semantic graph as a knowledge representation tool. In a semantic graph job data, represented in a combination of numerical and textual forms, can be flexibly processed into edges, with corresponding weights, expressing relationships between jobs, nodes, users, and other relevant entities. This graph-based representation permits formal manipulation by a number of analysis algorithms. This report presents a methodology and software implementation that leverages semantic graph-based techniques for the system-level monitoring and analysis of HPC clusters based on job queuing and execution data. Ontology development and graph synthesis is discussed with respect to the domain of HPC job data. The framework developed automates the synthesis of graphs from a database of job information. It also provides a front end, enabling visualization of the synthesized graphs. Additionally, an analysis engine is incorporated that provides performance analysis, graph-based clustering, and failure prediction capabilities for HPC systems.
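
    One relationship the abstract mentions, shared compute nodes between jobs, can be turned into weighted graph edges along the following lines. This is only a minimal sketch: the job identifiers, node names and the function `shared_node_edges` are hypothetical, and the report's actual ontology and graph synthesis are not reproduced here.

```python
from itertools import combinations

# Hypothetical job records: job id -> set of compute nodes the job ran on.
jobs = {
    "job_a": {"n01", "n02", "n03"},
    "job_b": {"n02", "n03"},
    "job_c": {"n07"},
}


def shared_node_edges(job_nodes):
    """Return {(job1, job2): weight}, where weight counts shared compute nodes."""
    edges = {}
    for (a, nodes_a), (b, nodes_b) in combinations(sorted(job_nodes.items()), 2):
        shared = len(nodes_a & nodes_b)
        if shared:  # jobs with no node in common get no edge
            edges[(a, b)] = shared
    return edges


edges = shared_node_edges(jobs)
print(edges)
```

    Other relationships named in the abstract, such as proximity of jobs in time, could be added as further edge types with their own weights.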

  16. Multi-Resolution Clustering Analysis and Visualization of Around One Million Synthetic Earthquake Events

    NASA Astrophysics Data System (ADS)

    Kaneko, J. Y.; Yuen, D. A.; Dzwinel, W.; Boryszko, K.; Ben-Zion, Y.; Sevre, E. O.

    2002-12-01

    The study of seismic patterns with synthetic data is important for analyzing the seismic hazard of faults because one can precisely control the spatial and temporal domains. Using modern clustering analysis from statistics and a recently introduced visualization software, AMIRA, we have examined the multi-resolution nature of a total assemblage involving 922,672 earthquake events in 4 numerically simulated models, which have different constitutive parameters, with 2 disparately different time intervals in a 3D spatial domain. The evolution of stress and slip on the fault plane was simulated with the 3D elastic dislocation theory for a configuration representing the central San Andreas Fault (Ben-Zion, J. Geophys. Res., 101, 5677-5706, 1996). The 4 different models represent various levels of fault zone disorder and have the following brittle properties and names: uniform properties (model U), a Parkfield type Asperity (A), fractal properties (F), and multi-size-heterogeneities (model M). We employed the MNN (mutual nearest neighbor) clustering method and developed a C-program that calculates simultaneously a number of parameters related to the location of the earthquakes and their magnitude values. Visualization was then used to look at the geometrical locations of the hypocenters and the evolution of seismic patterns. We wrote an AmiraScript that allows us to pass the parameters in an interactive format. With data sets consisting of 150 year time intervals, we have unveiled the distinctly multi-resolutional nature in the spatial-temporal pattern of small and large earthquake correlations shown previously by Eneva and Ben-Zion (J. Geophys. Res., 102, 24513-24528, 1997). In order to search for clearer possible stationary patterns and substructures within the clusters, we have also carried out the same analysis for corresponding data sets with time extending to several thousand years. The larger data sets were studied with finer and finer time intervals and multi
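
    The core idea behind a mutual nearest neighbor criterion can be sketched as follows: two events are linked only if each is the other's nearest neighbor. This is one simple reading of the term, not the authors' C program, which the abstract does not describe in detail. The coordinates and the function name `mutual_nearest_pairs` are hypothetical.

```python
import math


def mutual_nearest_pairs(points):
    """Return sorted index pairs (i, j), i < j, that are each other's nearest neighbor."""

    def nearest(i):
        # Index of the point closest to points[i], excluding the point itself.
        return min(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(points[i], points[j]),
        )

    nn = [nearest(i) for i in range(len(points))]
    return sorted((i, j) for i, j in enumerate(nn) if i < j and nn[j] == i)


# Hypothetical hypocenter coordinates (x, y, depth) in arbitrary units.
events = [(0, 0, 0), (0.1, 0, 0), (5, 5, 5), (5.2, 5, 5), (9, 0, 0)]
pairs = mutual_nearest_pairs(events)
print(pairs)
```

    The brute-force search above is quadratic in the number of events, so a data set approaching a million events would need a spatial index rather than this loop.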

  17. Analysis of plasmaspheric plumes: CLUSTER and IMAGE observations and numerical simulations

    NASA Technical Reports Server (NTRS)

    Darouzet, Fabien; DeKeyser, Johan; Decreau, Pierrette; Gallagher, Dennis; Pierrard, Viviane; Lemaire, Joseph; Dandouras, Iannis; Matsui, Hiroshi; Dunlop, Malcolm; Andre, Mats

    2005-01-01

    Plasmaspheric plumes have been routinely observed by CLUSTER and IMAGE. The CLUSTER mission provides high time resolution four-point measurements of the plasmasphere near perigee. Total electron density profiles can be derived from the plasma frequency and/or from the spacecraft potential (note that the electron spectrometer is usually not operating inside the plasmasphere); ion velocity is also measured onboard these satellites (but ion density is not reliable because of instrumental limitations). The EUV imager onboard the IMAGE spacecraft provides global images of the plasmasphere with a spatial resolution of 0.1 RE every 10 minutes; such images acquired near apogee from high above the pole show the geometry of plasmaspheric plumes, their evolution and motion. We present coordinated observations for 3 plume events and compare CLUSTER in-situ data (panel A) with global images of the plasmasphere obtained from IMAGE (panel B), and with numerical simulations for the formation of plumes based on a model that includes the interchange instability mechanism (panel C). In particular, we study the geometry and the orientation of plasmaspheric plumes by using a four-point analysis method, the spatial gradient. We also compare several aspects of their motion as determined by different methods: (i) inner and outer plume boundary velocity calculated from time delays of this boundary observed by the wave experiment WHISPER on the four spacecraft, (ii) ion velocity derived from the ion spectrometer CIS onboard CLUSTER, (iii) drift velocity measured by the electron drift instrument ED1 onboard CLUSTER and (iv) global velocity determined from successive EUV images. These different techniques consistently indicate that plasmaspheric plumes rotate around the Earth, with their foot fully co-rotating, but with their tip rotating slower and moving farther out.

  18. Constraining AGN triggering mechanisms through the clustering analysis of active black holes

    NASA Astrophysics Data System (ADS)

    Gatti, M.; Shankar, F.; Bouillot, V.; Menci, N.; Lamastra, A.; Hirschmann, M.; Fiore, F.

    2016-02-01

    The triggering mechanisms for active galactic nuclei (AGN) are still debated. Some of the most popular ones include galaxy interactions (IT) and disc instabilities (DIs). Using an advanced semi-analytic model (SAM) of galaxy formation, coupled to accurate halo occupation distribution modelling, we investigate the imprint left by each separate triggering process on the clustering strength of AGN at small and large scales. Our main results are as follows: (i) DIs, irrespective of their exact implementation in the SAM, tend to fall short in triggering AGN activity in galaxies at the centre of haloes with $M_h > 10^{13.5}\,h^{-1}\,M_\odot$. On the contrary, the IT scenario predicts abundance of active central galaxies that generally agrees well with observations at every halo mass. (ii) The relative number of satellite AGN in DIs at intermediate-to-low luminosities is always significantly higher than in IT models, especially in groups and clusters. The low AGN satellite fraction predicted for the IT scenario might suggest that different feeding modes could simultaneously contribute to the triggering of satellite AGN. (iii) Both scenarios are quite degenerate in matching large-scale clustering measurements, suggesting that the sole average bias might not be an effective observational constraint. (iv) Our analysis suggests the presence of both a mild luminosity and a more consistent redshift dependence in the AGN clustering, with AGN inhabiting progressively less massive dark matter haloes as the redshift increases. We also discuss the impact of different observational selection cuts in measuring AGN clustering, including possible discrepancies between optical and X-ray surveys.

  19. Three-Dimensional Measurement and Cluster Analysis for Determining the Size Ranges of Chinese Temporomandibular Joint Replacement Prosthesis.

    PubMed

    Zhang, Lu-Zhu; Meng, Shuai-Shuai; He, Dong-Mei; Fu, Yu-Zhuo; Liu, Ting; Wang, Fei-Yu; Dong, Min-Jun; Chang, Yu-Si

    2016-02-01

    The aim of this study was to investigate the osseous characteristics of Chinese temporomandibular joint (TMJ) and detect the size clusters for total joint prostheses design. Computer tomography (CT) data from 448 Chinese adults (226 male and 222 female, aged from 20 to 83 years, mean age 39.3 years) with 896 normal TMJs were chosen from the Department of Radiology in the Shanghai 9th People's Hospital. Proplan CMF 1.4 software was used to reconstruct the skulls. Three-dimensional (3D) measurements of the TMJ fossa and condyle-ramus units with 13 parameters were performed. Size clusters for prostheses design were determined by hierarchical cluster analyses, nonhierarchical (K-means) cluster analysis, and discriminant analysis. The glenoid fossa was grouped into 3 clusters, and the condyle-ramus units were grouped into 4 clusters. Discriminant analyses were capable of correctly classifying 97.24% of the glenoid fossa and 94.98% of the condyle-ramus units. The means and standard deviations for the parameter values in each cluster were determined. Fossa depth and angles between the condyle and ramus were important parameters for Chinese TMJ prostheses design. 3D measurements and cluster analysis of the osseous morphology of the TMJ provided an anatomical reference and identified the dimensions of the minimum numbers of prosthesis sizes required for Chinese TMJ replacement. PMID:26937929

  20. Three-Dimensional Measurement and Cluster Analysis for Determining the Size Ranges of Chinese Temporomandibular Joint Replacement Prosthesis

    PubMed Central

    Zhang, Lu-Zhu; Meng, Shuai-Shuai; He, Dong-Mei; Fu, Yu-Zhuo; Liu, Ting; Wang, Fei-Yu; Dong, Min-Jun; Chang, Yu-Si

    2016-01-01

    Abstract The aim of this study was to investigate the osseous characteristics of Chinese temporomandibular joint (TMJ) and detect the size clusters for total joint prostheses design. Computer tomography (CT) data from 448 Chinese adults (226 male and 222 female, aged from 20 to 83 years, mean age 39.3 years) with 896 normal TMJs were chosen from the Department of Radiology in the Shanghai 9th People's Hospital. Proplan CMF 1.4 software was used to reconstruct the skulls. Three-dimensional (3D) measurements of the TMJ fossa and condyle-ramus units with 13 parameters were performed. Size clusters for prostheses design were determined by hierarchical cluster analyses, nonhierarchical (K-means) cluster analysis, and discriminant analysis. The glenoid fossa was grouped into 3 clusters, and the condyle-ramus units were grouped into 4 clusters. Discriminant analyses were capable of correctly classifying 97.24% of the glenoid fossa and 94.98% of the condyle-ramus units. The means and standard deviations for the parameter values in each cluster were determined. Fossa depth and angles between the condyle and ramus were important parameters for Chinese TMJ prostheses design. 3D measurements and cluster analysis of the osseous morphology of the TMJ provided an anatomical reference and identified the dimensions of the minimum numbers of prosthesis sizes required for Chinese TMJ replacement. PMID:26937929

  1. Identification of responders to inhaled corticosteroids in a chronic obstructive pulmonary disease population using cluster analysis

    PubMed Central

    Hinds, David R; DiSantostefano, Rachael L; Le, Hoa V; Pascoe, Steven

    2016-01-01

    Objectives To identify clusters of patients who may benefit from treatment with an inhaled corticosteroid (ICS)/long-acting β2 agonist (LABA) versus LABA alone, in terms of exacerbation reduction, and to validate previously identified clusters of patients with chronic obstructive pulmonary disease (COPD) (based on diuretic use and reversibility). Design Post hoc supervised cluster analysis using a modified recursive partitioning algorithm of two 1-year randomised, controlled trials of fluticasone furoate (FF)/vilanterol (VI) versus VI alone, with the primary end points of the annual rate of moderate-to-severe exacerbations. Setting Global. Participants 3255 patients with COPD (intent-to-treat populations) with a history of exacerbations in the past year. Interventions FF/VI 50/25 µg, 100/25 µg or 200/25 µg, or VI 25 µg; all one time per day. Outcome measures Mean annual COPD exacerbation rate to identify clusters of patients who benefit from adding an ICS (FF) to VI bronchodilator therapy. Results Three clusters were identified, including two groups that benefit from FF/VI versus VI: patients with blood eosinophils >2.4% (RR=0.68, 95% CI 0.58 to 0.79), or blood eosinophils ≤2.4% and smoking history ≤46 pack-years, experienced a reduced rate of exacerbations with FF/VI versus VI (RR=0.78, 95% CI 0.63 to 0.96), whereas those with blood eosinophils ≤2.4% and smoking history >46 pack-years were identified as non-responders (RR=1.22, 95% CI 0.94 to 1.58). Clusters of patients previously identified in the fluticasone propionate/salmeterol (SAL) versus SAL trials of similar design were not validated; all clusters of patients tended to benefit from FF/VI versus VI alone irrespective of diuretic use and reversibility. Conclusions In patients with COPD with a history of exacerbations, those with greater blood eosinophils or a lower smoking history may benefit more from ICS/LABA versus LABA alone as measured by a reduced rate of exacerbations. In terms of
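
    The three clusters reported in the Results amount to a small decision rule on blood eosinophils and smoking history. The sketch below encodes that rule using the thresholds and rate ratios quoted in the abstract. The function name and return format are hypothetical, and this is an illustration of the reported partition, not a clinical tool or the authors' recursive partitioning algorithm.

```python
def ics_response_cluster(eosinophil_pct, pack_years):
    """Map a patient to one of the three clusters reported in the abstract.

    Returns (description, rate ratio for FF/VI versus VI as reported).
    """
    if eosinophil_pct > 2.4:
        return ("eosinophils > 2.4%", 0.68)
    if pack_years <= 46:
        return ("eosinophils <= 2.4%, smoking history <= 46 pack-years", 0.78)
    return ("eosinophils <= 2.4%, smoking history > 46 pack-years", 1.22)


# Hypothetical patients, for illustration only.
for eos, packs in [(3.1, 60), (2.0, 30), (1.5, 50)]:
    label, rr = ics_response_cluster(eos, packs)
    print(f"{label}: RR = {rr}")
```

    A rate ratio below 1 corresponds to the two groups reported to benefit from FF/VI, and the third group corresponds to the reported non-responders.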

  2. Study of cluster analysis used in explosives classification with laser-induced breakdown spectroscopy

    NASA Astrophysics Data System (ADS)

    Wang, Q. Q.; He, L. A.; Zhao, Y.; Peng, Z.; Liu, L.

    2016-06-01

    Supervised learning methods (such as partial least squares regression-discriminant analysis, SIMCA, etc) are widely used in explosives recognition. The correct classification rate may be lowered if a sample or substrate is not included in the training dataset. Unsupervised learning methods (such as hierarchical clustering analysis, K-means, etc) have the potential to solve this problem. In this paper we analyzed results of using as input variables the intensities of seven lines and then five intensity ratios of the seven lines. It was demonstrated that unsupervised learning methods had the ability to achieve a better classification result.

  3. AVES: A high performance computer cluster array for the INTEGRAL satellite scientific data analysis

    NASA Astrophysics Data System (ADS)

    Federici, Memmo; Martino, Bruno Luigi; Ubertini, Pietro

    2012-07-01

    In this paper we describe a new computing system array, designed, built and now used at the Space Astrophysics and Planetary Institute (IAPS) in Rome, Italy, for the INTEGRAL Space Observatory scientific data analysis. This new system has become necessary in order to reduce the processing time of the INTEGRAL data accumulated during the more than 9 years of in-orbit operation. In order to fulfill the scientific data analysis requirements with a moderately limited investment the starting approach has been to use a `cluster' array of commercial quad-CPU computers, featuring the extremely large scientific and calibration data archive on line.

  4. A comprehensive comparison of different clustering methods for reliability analysis of microarray data.

    PubMed

    Kafieh, Rahele; Mehridehnavi, Alireza

    2013-01-01

    In this study, we considered some competitive learning methods including hard competitive learning and soft competitive learning with/without fixed network dimensionality for reliability analysis in microarrays. In order to have a more extensive view, and keeping in mind that competitive learning methods aim at error minimization or entropy maximization (different kinds of function optimization), we decided to investigate the abilities of mixture decomposition schemes. Therefore, we assert that this study covers the algorithms based on function optimization with particular emphasis on different competitive learning methods. The goal is to find the most powerful method according to a pre-specified criterion determined with numerical methods and matrix similarity measures. Furthermore, we should provide an indication showing the intrinsic ability of the dataset to form clusters before we apply a clustering algorithm. Therefore, we proposed the Hopkins statistic as a method for finding the intrinsic ability of a dataset to be clustered. The results show the remarkable ability of the Rayleigh mixture model in comparison with other methods in the reliability analysis task. PMID:24083134
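
    The Hopkins statistic mentioned above can be sketched in a few lines. The version below uses one common convention, assumed here rather than taken from the paper: distances from uniformly random probe points to their nearest data point are compared with distances from sampled data points to their nearest other data point, so that values near 0.5 suggest no cluster structure and values near 1 suggest strong clustering. The data are synthetic and the function name `hopkins` is hypothetical.

```python
import math
import random


def hopkins(data, m, rng):
    """Hopkins statistic of `data` (a list of equal-length tuples) using m probes."""
    dims = len(data[0])
    lows = [min(p[d] for p in data) for d in range(dims)]
    highs = [max(p[d] for p in data) for d in range(dims)]

    # w: distance from each sampled data point to its nearest other data point.
    sample = rng.sample(data, m)
    w = [min(math.dist(p, q) for q in data if q is not p) for p in sample]

    # u: distance from each uniform random probe (in the bounding box) to the data.
    u = []
    for _ in range(m):
        probe = tuple(rng.uniform(lows[d], highs[d]) for d in range(dims))
        u.append(min(math.dist(probe, q) for q in data))

    return sum(u) / (sum(u) + sum(w))


# Synthetic example: two tight groups of 30 points each.
rng = random.Random(0)
clustered = [
    (rng.gauss(cx, 0.1), rng.gauss(cy, 0.1))
    for cx, cy in [(0, 0), (10, 10)]
    for _ in range(30)
]
h = hopkins(clustered, 10, rng)
print(f"Hopkins statistic: {h:.3f}")
```

    Some formulations raise the distances to the power of the data dimension or report one minus this value, so the convention should be checked before comparing results across tools.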

  5. Cluster Analysis of Physical and Cognitive Ageing Patterns in Older People from Shanghai.

    PubMed

    Bandelow, Stephan; Xu, Xin; Xiao, Shifu; Hogervorst, Eef

    2016-01-01

    This study investigated the relationship between education, cognitive and physical function in older age, and their respective impacts on activities of daily living (ADL). Data on 148 older participants from a community-based sample recruited in Shanghai, China, included the following measures: age, education, ADL, grip strength, balance, gait speed, global cognition and verbal memory. The majority of participants in the present cohort were cognitively and physically healthy and reported no problems with ADL. Twenty-eight percent of participants needed help with ADL, with the majority of this group being over 80 years of age. Significant predictors of reductions in functional independence included age, balance, global cognitive function (MMSE) and the gait measures. Cluster analysis revealed a protective effect of education on cognitive function that did not appear to extend to physical function. Consistency of such phenotypes of ageing clusters in other cohort studies may provide helpful models for dementia and frailty prevention measures. PMID:26907351

  6. Supercomputer and cluster performance modeling and analysis efforts:2004-2006.

    SciTech Connect

    Sturtevant, Judith E.; Ganti, Anand; Meyer, Harold Edward; Stevenson, Joel O.; Benner, Robert E., Jr.; Goudy, Susan Phelps; Doerfler, Douglas W.; Domino, Stefan Paul; Taylor, Mark A.; Malins, Robert Joseph; Scott, Ryan T.; Barnette, Daniel Wayne; Rajan, Mahesh; Ang, James Alfred; Black, Amalia Rebecca; Laub, Thomas William; Vaughan, Courtenay Thomas; Franke, Brian Claude

    2007-02-01

    This report describes efforts by the Performance Modeling and Analysis Team to investigate performance characteristics of Sandia's engineering and scientific applications on the ASC capability and advanced architecture supercomputers, and Sandia's capacity Linux clusters. Efforts to model various aspects of these computers are also discussed. The goals of these efforts are to quantify and compare Sandia's supercomputer and cluster performance characteristics; to reveal strengths and weaknesses in such systems; and to predict performance characteristics of, and provide guidelines for, future acquisitions and follow-on systems. Described herein are the results obtained from running benchmarks and applications to extract performance characteristics and comparisons, as well as modeling efforts, obtained during the time period 2004-2006. The format of the report, with hypertext links to numerous additional documents, purposefully minimizes the document size needed to disseminate the extensive results from our research.

  7. Cluster Analysis of Physical and Cognitive Ageing Patterns in Older People from Shanghai

    PubMed Central

    Bandelow, Stephan; Xu, Xin; Xiao, Shifu; Hogervorst, Eef

    2016-01-01

    This study investigated the relationship between education, cognitive and physical function in older age, and their respective impacts on activities of daily living (ADL). Data on 148 older participants from a community-based sample recruited in Shanghai, China, included the following measures: age, education, ADL, grip strength, balance, gait speed, global cognition and verbal memory. The majority of participants in the present cohort were cognitively and physically healthy and reported no problems with ADL. Twenty-eight percent of participants needed help with ADL, with the majority of this group being over 80 years of age. Significant predictors of reductions in functional independence included age, balance, global cognitive function (MMSE) and the gait measures. Cluster analysis revealed a protective effect of education on cognitive function that did not appear to extend to physical function. Consistency of such phenotypes of ageing clusters in other cohort studies may provide helpful models for dementia and frailty prevention measures. PMID:26907351

  8. Numerical Analysis of Base Flowfield for a Four-Engine Clustered Nozzle Configuration

    NASA Technical Reports Server (NTRS)

    Wang, Ten-See

    1995-01-01

    Excessive base heating has been a problem for many launch vehicles. For certain designs such as the direct dump of turbine exhaust inside and at the lip of the nozzle, the potential burning of the turbine exhaust in the base region can be of great concern. Accurate prediction of the base environment at altitudes is therefore very important during the vehicle design phase. Otherwise, undesirable consequences may occur. In this study, the turbulent base flowfield of a cold flow experimental investigation for a four-engine clustered nozzle was numerically benchmarked using a pressure-based computational fluid dynamics (CFD) method. This is a necessary step before the benchmarking of hot flow and combustion flow tests can be considered. Since the medium was unheated air, reasonable prediction of the base pressure distribution at high altitude was the main goal. Several physical phenomena pertaining to the multiengine clustered nozzle base flow physics were deduced from the analysis.

  9. Using Image Processing Techniques for Cluster Analysis, and Droplet Formation in Phase Separating Fluids

    NASA Astrophysics Data System (ADS)

    Smith, Gregory; Oprisan, Ana; Hegseth, John; Oprisan, Sorinel; Lecoutre, Carole; Garrabos, Yves; Beysens, Daniel

    2009-03-01

    A series of experiments were performed using the Alice II apparatus in microgravity to study phase separation near critical temperature. Using image analysis techniques, we were able to obtain quantitative information regarding the morphology of gas-liquid interface near critical point of pure SF6 fluid in microgravity. Growth laws for liquid and gas clusters were extracted based on image segmentation both with thresholding and k-means clustering. By measuring the image features we analyzed the formation of spherical droplets during late stage of phase separation for a series of full view images. The growth of a wetting layer around the border of the cell containing the fluid was also investigated using image processing techniques.
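
    Segmentation by k-means clustering, as named in the abstract, can be illustrated on pixel intensities alone. The sketch below runs a one-dimensional k-means with two clusters and uses the midpoint between the resulting centres as a threshold. The pixel values are invented, the function name `kmeans_1d` is hypothetical, and the authors' actual image pipeline is not described in enough detail to reproduce.

```python
def kmeans_1d(values, k=2, iters=20):
    """One-dimensional k-means (k >= 2); returns the cluster centres in ascending order."""
    lo, hi = min(values), max(values)
    # Start with centres spread evenly between the darkest and brightest value.
    centers = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in values:
            idx = min(range(k), key=lambda c: abs(v - centers[c]))
            groups[idx].append(v)
        # Keep the old centre if a cluster received no values.
        centers = [sum(g) / len(g) if g else centers[i] for i, g in enumerate(groups)]
    return sorted(centers)


# Hypothetical grey levels: dark liquid pixels and bright gas pixels.
pixels = [10, 12, 11, 200, 210, 205]
centers = kmeans_1d(pixels)
threshold = sum(centers) / 2
mask = [int(p > threshold) for p in pixels]  # 1 marks the brighter phase
print(centers, threshold, mask)
```

    With two clusters the result is equivalent to a data-driven threshold, which is one way the thresholding and k-means segmentations mentioned in the abstract can be compared.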

  10. Automated regional registration and characterization of corresponding microcalcification clusters on temporal pairs of mammograms for interval change analysis

    SciTech Connect

    Filev, Peter; Hadjiiski, Lubomir; Chan, Heang-Ping; Sahiner, Berkman; Ge Jun; Helvie, Mark A.; Roubidoux, Marilyn; Zhou Chuan

    2008-12-15

    A computerized regional registration and characterization system for analysis of microcalcification clusters on serial mammograms is being developed in our laboratory. The system consists of two stages. In the first stage, based on the location of a detected cluster on the current mammogram, a regional registration procedure identifies the local area on the prior that may contain the corresponding cluster. A search program is used to detect cluster candidates within the local area. The detected cluster on the current image is then paired with the cluster candidates on the prior image to form true (TP-TP) or false (TP-FP) pairs. Automatically extracted features were used in a newly designed correspondence classifier to reduce the number of false pairs. In the second stage, a temporal classifier, based on both current and prior information, is used if a cluster has been detected on the prior image, and a current classifier, based on current information alone, is used if no prior cluster has been detected. The data set used in this study consisted of 261 serial pairs containing biopsy-proven calcification clusters. An MQSA radiologist identified the corresponding clusters on the mammograms. On the priors, the radiologist rated the subtlety of 30 clusters (out of the 261 clusters) as 9 or 10 on a scale of 1 (very obvious) to 10 (very subtle). Leave-one-case-out resampling was used for feature selection and classification in both the correspondence and malignant/benign classification schemes. The search program detected 91.2% (238/261) of the clusters on the priors with an average of 0.42 FPs/image. The correspondence classifier identified 86.6% (226/261) of the TP-TP pairs with 20 false matches (0.08 FPs/image) relative to the entire set of 261 image pairs. In the malignant/benign classification stage the temporal classifier achieved a test $A_z$ of 0.81 for the 246 pairs which contained a detection on the prior. In addition, a classifier was designed by using the

  11. Cloning and Analysis of the Planosporicin Lantibiotic Biosynthetic Gene Cluster of Planomonospora alba

    PubMed Central

    Sherwood, Emma J.; Hesketh, Andrew R.

    2013-01-01

    The increasing prevalence of antibiotic resistance in bacterial pathogens has renewed focus on natural products with antimicrobial properties. Lantibiotics are ribosomally synthesized peptide antibiotics that are posttranslationally modified to introduce (methyl)lanthionine bridges. Actinomycetes are renowned for their ability to produce a large variety of antibiotics, many with clinical applications, but are known to make only a few lantibiotics. One such compound is planosporicin produced by Planomonospora alba, which inhibits cell wall biosynthesis in Gram-positive pathogens. Planosporicin is a type AI lantibiotic structurally similar to those which bind lipid II, the immediate precursor for cell wall biosynthesis. The gene cluster responsible for planosporicin biosynthesis was identified by genome mining and subsequently isolated from a P. alba cosmid library. A minimal cluster of 15 genes sufficient for planosporicin production was defined by heterologous expression in Nonomuraea sp. strain ATCC 39727, while deletion of the gene encoding the precursor peptide from P. alba, which abolished planosporicin production, was also used to confirm the identity of the gene cluster. Deletion of genes encoding likely biosynthetic enzymes identified through bioinformatic analysis revealed that they, too, are essential for planosporicin production in the native host. Reverse transcription-PCR (RT-PCR) analysis indicated that the planosporicin gene cluster is transcribed in three operons. Expression of one of these, pspEF, which encodes an ABC transporter, in Streptomyces coelicolor A3(2) conferred some degree of planosporicin resistance on the heterologous host. The inability to delete these genes from P. alba suggests that they play an essential role in immunity in the natural producer. PMID:23475977

  12. Transcriptional analysis and regulatory signals of the hom-thrB cluster of Brevibacterium lactofermentum.

    PubMed Central

    Mateos, L M; Pisabarro, A; Pátek, M; Malumbres, M; Guerrero, C; Eikmanns, B J; Sahm, H; Martín, J F

    1994-01-01

    Two genes, hom (encoding homoserine dehydrogenase) and thrB (encoding homoserine kinase), of the threonine biosynthetic pathway are clustered in the chromosome of Brevibacterium lactofermentum in the order 5' hom-thrB 3', separated by only 10 bp. The Brevibacterium thrB gene is expressed in Escherichia coli, in Brevibacterium lactofermentum, and in Corynebacterium glutamicum and complements auxotrophs of all three organisms deficient in homoserine kinase, whereas the Brevibacterium hom gene did not complement two different E. coli auxotrophs lacking homoserine dehydrogenase. However, complementation was obtained when the homoserine dehydrogenase was expressed as a fusion protein in E. coli. Northern (RNA) analysis showed that the hom-thrB cluster is transcribed, giving two different transcripts of 2.5 and 1.1 kb. The 2.5-kb transcript corresponds to the entire cluster hom-thrB (i.e., they form a bicistronic operon), and the short transcript (1.1 kb) originates from the thrB gene. The promoter in front of hom and the hom-internal promoter in front of thrB were subcloned in promoter-probe vectors of E. coli and corynebacteria. The thrB promoter is efficiently recognized both in E. coli and corynebacteria, whereas the hom promoter is functional in corynebacteria but not in E. coli. The transcription start points of both promoters have been identified by primer extension and S1 mapping analysis. The thrB promoter was located in an 87-bp fragment that overlaps with the end of the hom gene. A functional transcriptional terminator located downstream from the cluster was subcloned in terminator-probe vectors. PMID:7961509

  13. Pareto-optimal clustering scheme using data aggregation for wireless sensor networks

    NASA Astrophysics Data System (ADS)

    Azad, Puneet; Sharma, Vidushi

    2015-07-01

    The presence of cluster heads (CHs) in a clustered wireless sensor network (WSN) leads to improved data aggregation and enhanced network lifetime. Thus, the selection of appropriate CHs in WSNs is a challenging task, which needs to be addressed. A multicriterion decision-making approach for the selection of CHs is presented using Pareto-optimal theory and technique for order preference by similarity to ideal solution (TOPSIS) methods. CHs are selected using three criteria including energy, cluster density and distance from the sink. The overall network lifetime in this method with 50% data aggregation after simulations is 81% higher than that of distributed hierarchical agglomerative clustering in similar environment and with same set of parameters. Optimum number of clusters is estimated using TOPSIS technique and found to be 9-11 for effective energy usage in WSNs.
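
    The TOPSIS ranking step named in the abstract can be sketched as follows for the three criteria listed there. The textbook form of the method is used: vector-normalise each criterion, apply weights, then score each candidate by its relative closeness to the ideal solution. The candidate values, the equal weights, and the choice to treat energy and cluster density as benefits and distance from the sink as a cost are assumptions made for illustration, not details confirmed by the paper.

```python
import math


def topsis(matrix, weights, benefit):
    """Return a closeness score in [0, 1] for each row; higher is better.

    matrix: rows are candidates, columns are criteria
    weights: one weight per criterion
    benefit: True if larger is better for that criterion, False if smaller is
    Assumes the candidates are not all identical.
    """
    norms = [math.sqrt(sum(x * x for x in col)) for col in zip(*matrix)]
    v = [[w * x / n for x, w, n in zip(row, weights, norms)] for row in matrix]
    cols = list(zip(*v))
    best = [max(c) if b else min(c) for c, b in zip(cols, benefit)]
    worst = [min(c) if b else max(c) for c, b in zip(cols, benefit)]
    scores = []
    for row in v:
        d_best = math.dist(row, best)
        d_worst = math.dist(row, worst)
        scores.append(d_worst / (d_best + d_worst))
    return scores


# Hypothetical candidates: (residual energy, cluster density, distance from sink).
nodes = [(0.9, 6, 20.0), (0.5, 4, 35.0), (0.2, 2, 60.0)]
scores = topsis(nodes, weights=[1 / 3, 1 / 3, 1 / 3], benefit=[True, True, False])
head = scores.index(max(scores))
print(scores, "-> cluster head candidate:", head)
```

    The candidate with the highest closeness score would be preferred as cluster head under these assumed weights.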

  14. The use of the wavelet cluster analysis for asteroid family determination

    NASA Technical Reports Server (NTRS)

    Benjoya, Phillippe; Slezak, E.; Froeschle, Claude

    1992-01-01

    Asteroid family determination has long depended on the analysis method used. A new cluster analysis based on the wavelet transform has allowed an automatic definition of families with a degree of significance versus randomness. This method is in fact rather general and can be applied to any kind of structural analysis. Here we concentrate on the main features of the method. The analysis has been performed on the set of 4100 asteroid proper elements computed by Milani and Knezevic (see Milani and Knezevic 1990). Twenty-one families have been found and the influence of the chosen metric has been tested. The results have been compared to those of Zappala et al. (see Zappala et al 1990), obtained by the use of a completely different method applied to the same set of data. For the first time, a good overlap has been found between the results of both methods, not only for the big well-known families but also for the smallest ones.

  15. Multimorbidity Patterns in Elderly Primary Health Care Patients in a South Mediterranean European Region: A Cluster Analysis

    PubMed Central

    Foguet-Boreu, Quintí; Violán, Concepción; Rodriguez-Blanco, Teresa; Roso-Llorach, Albert; Pons-Vigués, Mariona; Pujol-Ribera, Enriqueta; Cossio Gil, Yolima; Valderas, Jose M.

    2015-01-01

    Objective The purpose of this study was to identify clusters of diagnoses in elderly patients with multimorbidity attended in primary care. Design Cross-sectional study. Setting 251 primary care centres in Catalonia, Spain. Participants Individuals older than 64 years registered with participating practices. Main outcome measures Multimorbidity, defined as the coexistence of 2 or more ICD-10 disease categories in the electronic health record. Using hierarchical cluster analysis, multimorbidity clusters were identified by sex and age group (65–79 and ≥80 years). Results 322,328 patients with multimorbidity were included in the analysis (mean age, 75.4 years [Standard deviation, SD: 7.4], 57.4% women; mean of 7.9 diagnoses [SD: 3.9]). For both men and women, the first cluster in both age groups included the same two diagnoses: Hypertensive diseases and Metabolic disorders. The second cluster contained three diagnoses of the musculoskeletal system in the 65- to 79-year-old group, and five diseases coincided in the ≥80 age group: varicose veins of the lower limbs, senile cataract, dorsalgia, functional intestinal disorders and shoulder lesions. The greatest overlap (54.5%) between the three most common diagnoses was observed in women aged 65–79 years. Conclusion This cluster analysis of elderly primary care patients with multimorbidity revealed a single cluster of circulatory-metabolic diseases that were the most prevalent in both age groups and both sexes, and a cluster of second-most prevalent diagnoses that included musculoskeletal diseases. Clusters unknown to date have been identified. The clusters identified should be considered when developing clinical guidance for this population. PMID:26524599

  16. Genomic structural analysis of porcine fatty acid desaturase cluster on chromosome 2.

    PubMed

    Taniguchi, Masaaki; Arakawa, Aisaku; Motoyama, Michiyo; Nakajima, Ikuyo; Nii, Masahiro; Mikawa, Satoshi

    2015-04-01

    Fatty acid composition is an economically important trait in meat-producing livestock. To gain insight into the molecular genetics of fatty acid desaturase (FADS) genes in pigs, we investigated the genomic structure of the porcine FADS gene family on chromosome 2. We also examined the tissue distribution of FADS gene expression. The genomic structure of FADS family in mammals consists of three isoforms FADS1, FADS2 and FADS3. However, porcine FADS cluster in the latest pig genome assembly (Sscrofa 10.2) containing some gaps is distinct from that in other mammals. We therefore sought to determine the genomic structure, including the FADS cluster in a 200-kbp range by sequencing gap regions. The structure we obtained was similar to that in other mammals. We then investigated the porcine FADS1 transcription start site and identified a novel isoform named FADS1b. Phylogenetic analysis revealed that the three members of the FADS cluster were orthologous among mammals, whereas the various FADS1 isoforms identified in pigs, mice and cattle might be attributable to species-specific transcriptional regulation with alternative promoters. Porcine FADS1b and FADS3 isoforms were predominantly expressed in the inner layer of the subcutaneous adipose tissue. Additional analyses will reveal the effects of these functionally unknown isoforms on fatty acid composition in pig fat tissues. PMID:25409917

  17. Selecting background galaxies in weak-lensing analysis of galaxy clusters

    NASA Astrophysics Data System (ADS)

    Formicola, I.; Radovich, M.; Meneghetti, M.; Mazzotta, P.; Grado, A.; Giocoli, C.

    2016-05-01

    In this paper, we present a new method to select the faint, background galaxies used to derive the mass of galaxy clusters by weak lensing. The method is based on the simultaneous analysis of the shear signal, which should be consistent with zero for the foreground, unlensed galaxies, and of the colours of the galaxies: photometric data from the COSMic evOlution Survey are used to train the colour selection. In order to validate this methodology, we test it against a set of state-of-the-art image simulations of mock galaxy clusters in different redshift [0.23-0.45] and mass [0.5-1.55 × 10^15 M⊙] ranges, mimicking medium-deep multicolour imaging observations [e.g. Subaru, Large Binocular Telescope]. The performance of our method in terms of contamination by unlensed sources is comparable to a selection based on photometric redshifts, which however requires a good spectral coverage and is thus much more observationally demanding. The application of our method to simulations gives an average ratio between estimated and true masses of ~0.98 ± 0.09. As a further test, we finally apply our method to real data, and compare our results with other weak-lensing mass estimates in the literature: for this purpose, we choose the cluster Abell 2219 (z = 0.228), for which multiband (BVRi) data are publicly available.

  18. Quasichemical analysis of the cluster-pair approximation for the thermodynamics of proton hydration

    NASA Astrophysics Data System (ADS)

    Pollard, Travis; Beck, Thomas L.

    2014-06-01

    A theoretical analysis of the cluster-pair approximation (CPA) is presented based on the quasichemical theory of solutions. The sought single-ion hydration free energy of the proton includes an interfacial potential contribution by definition. It is shown, however, that the CPA involves an extra-thermodynamic assumption that does not guarantee uniform convergence to a bulk free energy value with increasing cluster size. A numerical test of the CPA is performed using the classical polarizable AMOEBA force field and supporting quantum chemical calculations. The enthalpy and free energy differences are computed for the kosmotropic Na+/F- ion pair in water clusters of size n = 5, 25, 105. Additional calculations are performed for the chaotropic Rb+/I- ion pair. A small shift in the proton hydration free energy and a larger shift in the hydration enthalpy, relative to the CPA values, are predicted based on the n = 105 simulations. The shifts arise from a combination of sequential hydration and interfacial potential effects. The AMOEBA and quantum chemical results suggest an electrochemical surface potential of water in the range -0.4 to -0.5 V. The physical content of single-ion free energies and implications for ion-water force field development are also discussed.

  19. Cluster Analysis of Atmospheric Dynamics and Pollution Transport in a Coastal Area

    NASA Astrophysics Data System (ADS)

    Sokolov, Anton; Dmitriev, Egor; Maksimovich, Elena; Delbarre, Hervé; Augustin, Patrick; Gengembre, Cyril; Fourmentin, Marc; Locoge, Nadine

    2016-06-01

    Summertime atmospheric dynamics in the coastal zone of the industrialized Dunkerque agglomeration in northern France was characterized by a cluster analysis of back trajectories in the context of pollution transport. The MESO-NH atmospheric model was used to simulate the local dynamics at multiple scales with horizontal resolution down to 500 m, and for the online calculation of the Lagrangian backward trajectories with 30-min temporal resolution. Airmass transport was performed along six principal pathways obtained by the weighted k-means clustering technique. Four of these centroids corresponded to a range of wind speeds over the English Channel: two for wind directions from the north-east and two from the south-west. Another pathway corresponded to a south-westerly continental transport. The backward trajectories of the largest and most dispersed sixth cluster contained low wind speeds, including sea-breeze circulations. Based on analyses of meteorological data and pollution measurements, the principal atmospheric pathways were related to local air-contamination events. Continuous air quality and meteorological data were collected during the Benzene-Toluene-Ethylbenzene-Xylene 2006 campaign. The sites of the pollution measurements served as the endpoints for the backward trajectories. Pollutant transport pathways corresponding to the highest air contamination were defined.
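
    The weighted k-means technique mentioned in this abstract can be sketched as follows. The abstract does not say how the trajectories were weighted or represented, so the per-point weights, the 2-D sample points, and the fixed initial centroids below are purely illustrative assumptions.

```python
def weighted_kmeans(points, weights, centroids, n_iter=20):
    """Lloyd-style k-means in which each point pulls its centroid in
    proportion to its weight. Initial centroids are supplied by the caller."""
    centroids = [list(c) for c in centroids]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        for i, p in enumerate(points):
            labels[i] = min(
                range(len(centroids)),
                key=lambda k: sum((a - b) ** 2 for a, b in zip(p, centroids[k])),
            )
        # Update step: weighted mean of the members of each cluster.
        for k in range(len(centroids)):
            members = [i for i, lab in enumerate(labels) if lab == k]
            total = sum(weights[i] for i in members)
            if total > 0:  # leave an empty cluster's centroid unchanged
                dim = len(centroids[k])
                centroids[k] = [
                    sum(weights[i] * points[i][d] for i in members) / total
                    for d in range(dim)
                ]
    return labels, centroids

# Hypothetical data: four points, the second carrying three times the weight.
pts = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0), (12.0, 0.0)]
wts = [1.0, 3.0, 1.0, 1.0]
labels, cents = weighted_kmeans(pts, wts, centroids=[(0.0, 0.0), (12.0, 0.0)])
```

    With these inputs the first centroid settles at x = 1.5 rather than the unweighted mean of 1.0, which shows how the heavier point shifts the cluster centre.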

  20. Clustering drug-drug interaction networks with energy model layouts: community analysis and drug repurposing

    PubMed Central

    Udrescu, Lucreţia; Sbârcea, Laura; Topîrceanu, Alexandru; Iovanovici, Alexandru; Kurunczi, Ludovic; Bogdan, Paul; Udrescu, Mihai

    2016-01-01

    Analyzing drug-drug interactions may unravel previously unknown drug action patterns, leading to the development of new drug discovery tools. We present a new approach to analyzing drug-drug interaction networks, based on clustering and topological community detection techniques that are specific to complex network science. Our methodology uncovers functional drug categories along with the intricate relationships between them. Using modularity-based and energy-model layout community detection algorithms, we link the network clusters to 9 relevant pharmacological properties. Out of the 1141 drugs from the DrugBank 4.1 database, our extensive literature survey and cross-checking with other databases such as Drugs.com, RxList, and DrugBank 4.3 confirm the predicted properties for 85% of the drugs. As such, we argue that network analysis offers a high-level grasp on a wide area of pharmacological aspects, indicating possible unaccounted interactions and missing pharmacological properties that can lead to drug repositioning for the 15% drugs which seem to be inconsistent with the predicted property. Also, by using network centralities, we can rank drugs according to their interaction potential for both simple and complex multi-pathology therapies. Moreover, our clustering approach can be extended for applications such as analyzing drug-target interactions or phenotyping patients in personalized medicine applications. PMID:27599720

  1. Editing ERTS-1 data to exclude land aids cluster analysis of water targets

    NASA Technical Reports Server (NTRS)

    Erb, R. B. (Principal Investigator)

    1973-01-01

    The author has identified the following significant results. It has been determined that an increase in the number of spectrally distinct coastal water types is achieved when data values over the adjacent land areas are excluded from the processing routine. This finding resulted from an automatic clustering analysis of ERTS-1 system corrected MSS scene 1002-18134 of 25 July 1972 over Monterey Bay, California. When the entire study area data set was submitted to the clustering only two distinct water classes were extracted. However, when the land area data points were removed from the data set and resubmitted to the clustering routine, four distinct groupings of water features were identified. Additionally, unlike the previous separation, the four types could be correlated to features observable in the associated ERTS-1 imagery. This exercise demonstrates that by proper selection of data submitted to the processing routine, based upon the specific application of study, additional information may be extracted from the ERTS-1 MSS data.

  2. Detection and whole genome sequence analysis of an enterovirus 68 cluster

    PubMed Central

    2013-01-01

    Background Enteroviruses are a common cause of human disease and are associated with a wide range of clinical manifestations. Enterovirus 68 is rarely detected yet was reported in many countries in 2010. Here enterovirus 68 was identified for the first time in New Zealand in 2010 and was detected in a further fourteen specimens over a six-month period. Objectives To genetically characterise enterovirus 68 specimens identified in New Zealand in 2010. Study design The genome sequence of a New Zealand representative enterovirus 68 isolate was obtained. Ten clinical specimens were analysed by sequencing the VP1 region of the enterovirus 68 genome. Results Based on sequence analysis of the VP1 region and the full genome of one representative isolate, the New Zealand enterovirus 68 isolates clustered with contemporary enterovirus 68 viruses and do not show any clear distinguishing genetic diversity when compared to other strains. All fifteen specimens showed high similarity with enterovirus 68 by VP1 sequencing. The majority of New Zealand patients suffered from bronchiolitis, were less than two years of age and were of Pacific Island or Maori descent. Conclusions We document the rare occurrence of an enterovirus 68 cluster in New Zealand in 2010. These viruses shared similarity with other clusters of enterovirus 68 that occurred globally in 2010. A greater awareness of enterovirus 68 infection may help detect this virus with increased frequency and enable us to better understand the role this strain plays in disease and the reasons behind this global emergence in 2010. PMID:23548106

  3. Eating Behaviours of British University Students: A Cluster Analysis on a Neglected Issue

    PubMed Central

    Tanton, Jina; Dodd, Lorna J.; Woodfield, Lorayne; Mabhala, Mzwandile

    2015-01-01

    Unhealthy diet is a primary risk factor for noncommunicable diseases. University student populations are known to engage in health risking lifestyle behaviours including risky eating behaviours. The purpose of this study was to examine eating behaviour patterns in a population of British university students using a two-step cluster analysis. Consumption prevalence of snack, convenience, and fast foods in addition to fruit and vegetables was measured using a self-report “Student Eating Behaviours” questionnaire on 345 undergraduate university students. Four clusters were identified: “risky eating behaviours,” “mixed eating behaviours,” “moderate eating behaviours,” and “favourable eating behaviours.” Nineteen percent of students were categorised as having “favourable eating behaviours” whilst just under a third of students were categorised within the two most risky clusters. Riskier eating behaviour patterns were associated with living on campus and Christian faith. The findings of this study highlight the importance of university microenvironments on eating behaviours in university student populations. Religion as a mediator of eating behaviours is a novel finding. PMID:26550495

  4. The Open Connectome Project Data Cluster: Scalable Analysis and Vision for High-Throughput Neuroscience.

    PubMed

    Burns, Randal; Roncal, William Gray; Kleissas, Dean; Lillaney, Kunal; Manavalan, Priya; Perlman, Eric; Berger, Daniel R; Bock, Davi D; Chung, Kwanghun; Grosenick, Logan; Kasthuri, Narayanan; Weiler, Nicholas C; Deisseroth, Karl; Kazhdan, Michael; Lichtman, Jeff; Reid, R Clay; Smith, Stephen J; Szalay, Alexander S; Vogelstein, Joshua T; Vogelstein, R Jacob

    2013-01-01

    We describe a scalable database cluster for the spatial analysis and annotation of high-throughput brain imaging data, initially for 3-d electron microscopy image stacks, but for time-series and multi-channel data as well. The system was designed primarily for workloads that build connectomes (neural connectivity maps of the brain) using the parallel execution of computer vision algorithms on high-performance compute clusters. These services and open-science data sets are publicly available at openconnecto.me. The system design inherits much from NoSQL scale-out and data-intensive computing architectures. We distribute data to cluster nodes by partitioning a spatial index. We direct I/O to different systems (reads to parallel disk arrays and writes to solid-state storage) to avoid I/O interference and maximize throughput. All programming interfaces are RESTful Web services, which are simple and stateless, improving scalability and usability. We include a performance evaluation of the production system, highlighting the effectiveness of spatial data organization. PMID:24401992
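
    One common way to distribute data to nodes by partitioning a spatial index is to order 3-d blocks along a space-filling curve and give each node a contiguous range of the curve. The sketch below uses a Morton (Z-order) code for this purpose. The abstract does not state which index is used, so the choice of Morton order, the function names, and the equal-range partitioning are assumptions for illustration only.

```python
def morton3(x, y, z, bits=10):
    """Interleave the low `bits` bits of x, y, z into a single Z-order key.
    Bit i of x lands at position 3*i, of y at 3*i + 1, of z at 3*i + 2."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (3 * i)
        code |= ((y >> i) & 1) << (3 * i + 1)
        code |= ((z >> i) & 1) << (3 * i + 2)
    return code

def node_for_block(x, y, z, n_nodes, bits=10):
    """Map a block coordinate to a node by splitting the key space into
    n_nodes contiguous, equally sized ranges (hypothetical policy)."""
    key_space = 1 << (3 * bits)
    return morton3(x, y, z, bits) * n_nodes // key_space

# Blocks that are close in space tend to get nearby keys, so they tend to
# land on the same node.
key = morton3(3, 5, 1)                      # 143
node = node_for_block(3, 3, 3, n_nodes=4, bits=2)  # last of four ranges
```

    Contiguous key ranges keep spatially adjacent blocks together, which is the property a spatial partitioning scheme of this kind relies on for locality.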

  5. Quasichemical analysis of the cluster-pair approximation for the thermodynamics of proton hydration

    SciTech Connect

    Pollard, Travis; Beck, Thomas L.

    2014-06-14

    A theoretical analysis of the cluster-pair approximation (CPA) is presented based on the quasichemical theory of solutions. The sought single-ion hydration free energy of the proton includes an interfacial potential contribution by definition. It is shown, however, that the CPA involves an extra-thermodynamic assumption that does not guarantee uniform convergence to a bulk free energy value with increasing cluster size. A numerical test of the CPA is performed using the classical polarizable AMOEBA force field and supporting quantum chemical calculations. The enthalpy and free energy differences are computed for the kosmotropic Na+/F− ion pair in water clusters of size n = 5, 25, 105. Additional calculations are performed for the chaotropic Rb+/I− ion pair. A small shift in the proton hydration free energy and a larger shift in the hydration enthalpy, relative to the CPA values, are predicted based on the n = 105 simulations. The shifts arise from a combination of sequential hydration and interfacial potential effects. The AMOEBA and quantum chemical results suggest an electrochemical surface potential of water in the range −0.4 to −0.5 V. The physical content of single-ion free energies and implications for ion-water force field development are also discussed.

  6. An adaptive regression mixture model for fMRI cluster analysis.

    PubMed

    Oikonomou, Vangelis P; Blekas, Konstantinos

    2013-04-01

    Functional magnetic resonance imaging (fMRI) has become one of the most important techniques for studying the human brain in action. A common problem in fMRI analysis is the detection of activated brain regions in response to an experimental task. In this work we propose a novel clustering approach for addressing this issue using an adaptive regression mixture model. The main contribution of our method is the employment of both spatial and sparse properties over the body of the mixture model. Thus, the clustering approach is converted into a maximum a posteriori estimation approach, where the expectation-maximization algorithm is applied for model training. Special care is also given to estimate the kernel scalar parameter per cluster of the design matrix by presenting a multi-kernel scheme. In addition an incremental training procedure is presented so as to make the approach independent on the initialization of the model parameters. The latter also allows us to introduce an efficient stopping criterion of the process for determining the optimum brain activation area. To assess the effectiveness of our method, we have conducted experiments with simulated and real fMRI data, where we have demonstrated its ability to produce improved performance and functional activation detection capabilities. PMID:23047865

  7. Clustering drug-drug interaction networks with energy model layouts: community analysis and drug repurposing.

    PubMed

    Udrescu, Lucreţia; Sbârcea, Laura; Topîrceanu, Alexandru; Iovanovici, Alexandru; Kurunczi, Ludovic; Bogdan, Paul; Udrescu, Mihai

    2016-01-01

    Analyzing drug-drug interactions may unravel previously unknown drug action patterns, leading to the development of new drug discovery tools. We present a new approach to analyzing drug-drug interaction networks, based on clustering and topological community detection techniques that are specific to complex network science. Our methodology uncovers functional drug categories along with the intricate relationships between them. Using modularity-based and energy-model layout community detection algorithms, we link the network clusters to 9 relevant pharmacological properties. Out of the 1141 drugs from the DrugBank 4.1 database, our extensive literature survey and cross-checking with other databases such as Drugs.com, RxList, and DrugBank 4.3 confirm the predicted properties for 85% of the drugs. As such, we argue that network analysis offers a high-level grasp on a wide area of pharmacological aspects, indicating possible unaccounted interactions and missing pharmacological properties that can lead to drug repositioning for the 15% drugs which seem to be inconsistent with the predicted property. Also, by using network centralities, we can rank drugs according to their interaction potential for both simple and complex multi-pathology therapies. Moreover, our clustering approach can be extended for applications such as analyzing drug-target interactions or phenotyping patients in personalized medicine applications. PMID:27599720

  8. [Classification of allergens by positive percentage agreement and cluster analysis based on specific IgE antibodies in asthmatic children].

    PubMed

    Iwasaki, E; Baba, M

    1992-10-01

    Classification and characterization of allergens is important because allergic patients are sensitized by a variety of allergens. One hundred and sixty-one sera from asthmatic children were investigated for specific IgE antibodies against 35 allergens including 20 inhalants and 15 foods by means of the MAST method. We assessed the allergenic properties of the allergens based on positive percentage agreement and cluster analysis. There was a high positive percentage agreement of specific IgE antibodies between house dust and Dermatophagoides spp., a relatively high agreement between 5 molds, cat and dog epithelium, mugwort and wormwood and 5 grasses. Among the food allergens, the positive percentage agreements were relatively high, especially between cow's milk, casein, cheese, and between 3 cereal grains. In the cluster analysis, house dust and Dermatophagoides spp. made a big cluster; therefore 32 allergens except house dust and mites were analyzed. From the results of the cluster analysis, the major cluster consisted of (1) ragweed, (2) mugwort and wormwood, (3) timothy, sweet vernal, velvet and cultivated rye, (4) wheat, barley and rice, (5) molds, (6) cow's milk, casein, soybean and cheese, (7) shrimp and crab, (8) egg white, (9) Japanese cedar, (10) dog epithelium, (11) cat epithelium. The cluster of grass pollens and cereal grains made one cluster. These results tend to confirm the presence of species cross-reactivities within the major classes of allergens. PMID:1482294

  9. JOINT ANALYSIS OF CLUSTER OBSERVATIONS. II. CHANDRA/XMM-NEWTON X-RAY AND WEAK LENSING SCALING RELATIONS FOR A SAMPLE OF 50 RICH CLUSTERS OF GALAXIES

    SciTech Connect

    Mahdavi, Andisheh; Hoekstra, Henk; Babul, Arif; Bildfell, Chris; Jeltema, Tesla; Henry, J. Patrick

    2013-04-20

    We present a study of multiwavelength X-ray and weak lensing scaling relations for a sample of 50 clusters of galaxies. Our analysis combines Chandra and XMM-Newton data using an energy-dependent cross-calibration. After considering a number of scaling relations, we find that gas mass is the most robust estimator of weak lensing mass, yielding 15% ± 6% intrinsic scatter at r_500^WL (the pseudo-pressure Y_X yields a consistent scatter of 22% ± 5%). The scatter does not change when measured within a fixed physical radius of 1 Mpc. Clusters with small brightest cluster galaxy (BCG) to X-ray peak offsets constitute a very regular population whose members have the same gas mass fractions and whose even smaller (<10%) deviations from regularity can be ascribed to line of sight geometrical effects alone. Cool-core clusters, while a somewhat different population, also show the same (<10%) scatter in the gas mass-lensing mass relation. There is a good correlation and a hint of bimodality in the plane defined by BCG offset and central entropy (or central cooling time). The pseudo-pressure Y_X does not discriminate between the more relaxed and less relaxed populations, making it perhaps the more even-handed mass proxy for surveys. Overall, hydrostatic masses underestimate weak lensing masses by 10% on the average at r_500^WL; but cool-core clusters are consistent with no bias, while non-cool-core clusters have a large and constant 15%-20% bias between r_2500^WL and r_500^WL, in agreement with N-body simulations incorporating unthermalized gas. For non-cool-core clusters, the bias correlates well with BCG ellipticity. We also examine centroid shift variance and power ratios to quantify substructure; these quantities do not correlate with residuals in the scaling relations. Individual clusters have for the most part forgotten the source of their departures from self-similarity.

  10. ANALYSIS OF DETACHED ECLIPSING BINARIES NEAR THE TURNOFF OF THE OPEN CLUSTER NGC 7142

    SciTech Connect

    Sandquist, Eric L.; Serio, Andrew W.; Orosz, Jerome; Shetrone, Matthew

    2013-08-01

    We analyze extensive BVR_CI_C photometry and radial velocity measurements for three double-lined deeply eclipsing binary stars in the field of the old open cluster NGC 7142. The short period (P = 1.9096825 days) detached binary V375 Cep is a high probability cluster member, and has a total eclipse of the secondary star. The characteristics of the primary star (M = 1.288 ± 0.017 M⊙) at the cluster turnoff indicate an age of 3.6 Gyr (with a random uncertainty of 0.25 Gyr), consistent with earlier analysis of the color-magnitude diagram. The secondary star (M = 0.871 ± 0.008 M⊙) is not expected to have evolved significantly, but its radius is more than 10% larger than predicted by models. Because this binary system has a known age, it is useful for testing the idea that radius inflation can occur in short period binaries for stars with significant convective envelopes due to the inhibition of energy transport by magnetic fields. The brighter star in the binary also produces a precision estimate of the distance modulus, independent of reddening estimates: (m - M)_V = 12.86 ± 0.07. The other two eclipsing binary systems are not cluster members, although one of the systems (V2) could only be conclusively ruled out as a present or former member once the stellar characteristics were determined. That binary is within 0.°5 of edge-on, is in a fairly long-period eccentric binary, and contains two almost indistinguishable stars. The other binary (V1) has a small but nonzero eccentricity (e = 0.038) in spite of having an orbital period under 5 days.

  11. Comprehensive curation and analysis of fungal biosynthetic gene clusters of published natural products.

    PubMed

    Li, Yong Fuga; Tsai, Kathleen J S; Harvey, Colin J B; Li, James Jian; Ary, Beatrice E; Berlew, Erin E; Boehman, Brenna L; Findley, David M; Friant, Alexandra G; Gardner, Christopher A; Gould, Michael P; Ha, Jae H; Lilley, Brenna K; McKinstry, Emily L; Nawal, Saadia; Parry, Robert C; Rothchild, Kristina W; Silbert, Samantha D; Tentilucci, Michael D; Thurston, Alana M; Wai, Rebecca B; Yoon, Yongjin; Aiyar, Raeka S; Medema, Marnix H; Hillenmeyer, Maureen E; Charkoudian, Louise K

    2016-04-01

    Microorganisms produce a wide range of natural products (NPs) with clinically and agriculturally relevant biological activities. In bacteria and fungi, genes encoding successive steps in a biosynthetic pathway tend to be clustered on the chromosome as biosynthetic gene clusters (BGCs). Historically, "activity-guided" approaches to NP discovery have focused on bioactivity screening of NPs produced by culturable microbes. In contrast, recent "genome mining" approaches first identify candidate BGCs, express these biosynthetic genes using synthetic biology methods, and finally test for the production of NPs. Fungal genome mining efforts and the exploration of novel sequence and NP space are limited, however, by the lack of a comprehensive catalog of BGCs encoding experimentally-validated products. In this study, we generated a comprehensive reference set of fungal NPs whose biosynthetic gene clusters are described in the published literature. To generate this dataset, we first identified NCBI records that included both a peer-reviewed article and an associated nucleotide record. We filtered these records by text and homology criteria to identify putative NP-related articles and BGCs. Next, we manually curated the resulting articles, chemical structures, and protein sequences. The resulting catalog contains 197 unique NP compounds covering several major classes of fungal NPs, including polyketides, non-ribosomal peptides, terpenoids, and alkaloids. The distribution of articles published per compound shows a bias toward the study of certain popular compounds, such as the aflatoxins. Phylogenetic analysis of biosynthetic genes suggests that much chemical and enzymatic diversity remains to be discovered in fungi. Our catalog was incorporated into the recently launched Minimum Information about Biosynthetic Gene cluster (MIBiG) repository to create the largest known set of fungal BGCs and associated NPs, a resource that we anticipate will guide future genome mining and

  12. Using cluster analysis to identify patterns in students' responses to contextually different conceptual problems

    NASA Astrophysics Data System (ADS)

    Stewart, John; Miller, Mayo; Audo, Christine; Stewart, Gay

    2012-12-01

    This study examined the evolution of student responses to seven contextually different versions of two Force Concept Inventory questions in an introductory physics course at the University of Arkansas. The consistency in answering the closely related questions evolved little over the seven-question exam. A model for the state of student knowledge involving the probability of selecting one of the multiple-choice answers was developed. Criteria for using clustering algorithms to extract model parameters were explored and it was found that the overlap between the probability distributions of the model vectors was an important parameter in characterizing the cluster models. The course data were then clustered and the extracted model showed that students largely fit into two groups both pre- and postinstruction: one that answered all questions correctly with high probability and one that selected the distracter representing the same misconception with high probability. For the course studied, 14% of the students were left with persistent misconceptions post instruction on a static force problem and 30% on a dynamic Newton’s third law problem. These students selected the answer representing the predominant misconception slightly more consistently postinstruction, indicating that the course studied had been ineffective at moving this subgroup of students nearer a Newtonian force concept and had instead moved them slightly farther away from a correct conceptual understanding of these two problems. The consistency in answering pairs of problems with varied physical contexts is shown to be an important supplementary statistic to the score on the problems and suggests that the inclusion of such problem pairs in future conceptual inventories would be efficacious. Multiple, contextually varied questions further probe the structure of students’ knowledge. To allow working instructors to make use of the additional insight gained from cluster analysis, it is our hope that the

  13. Psychological Factors Predict Local and Referred Experimental Muscle Pain: A Cluster Analysis in Healthy Adults

    PubMed Central

    Lee, Jennifer E.; Watson, David; Frey-Law, Laura A.

    2012-01-01

    Background Recent studies suggest an underlying three- or four-factor structure explains the conceptual overlap and distinctiveness of several negative emotionality and pain-related constructs. However, the validity of these latent factors for predicting pain has not been examined. Methods A cohort of 189 (99F; 90M) healthy volunteers completed eight self-report negative emotionality and pain-related measures (Eysenck Personality Questionnaire-Revised; Positive and Negative Affect Schedule; State-Trait Anxiety Inventory; Pain Catastrophizing Scale; Fear of Pain Questionnaire; Somatosensory Amplification Scale; Anxiety Sensitivity Index; Whiteley Index). Using principal axis factoring, three primary latent factors were extracted: General Distress; Catastrophic Thinking; and Pain-Related Fear. Using these factors, individuals clustered into three subgroups of high, moderate, and low negative emotionality responses. Experimental pain was induced via intramuscular acidic infusion into the anterior tibialis muscle, producing local (infusion site) and/or referred (anterior ankle) pain and hyperalgesia. Results Pain outcomes differed between clusters (multivariate analysis of variance and multinomial regression), with individuals in the highest negative emotionality cluster reporting the greatest local pain (p = 0.05), mechanical hyperalgesia (pressure pain thresholds; p = 0.009) and greater odds (2.21 OR) of experiencing referred pain compared to the lowest negative emotionality cluster. Conclusion Our results provide support for three latent psychological factors explaining the majority of the variance between several pain-related psychological measures, and that individuals in the high negative emotionality subgroup are at increased risk for (1) acute local muscle pain; (2) local hyperalgesia; and (3) referred pain using a standardized nociceptive input. PMID:23165778

  14. Analysis of Detached Eclipsing Binaries near the Turnoff of the Open Cluster NGC 7142

    NASA Astrophysics Data System (ADS)

    Sandquist, Eric L.; Shetrone, Matthew; Serio, Andrew W.; Orosz, Jerome

    2013-08-01

    We analyze extensive BVR_C I_C photometry and radial velocity measurements for three double-lined deeply eclipsing binary stars in the field of the old open cluster NGC 7142. The short period (P = 1.9096825 days) detached binary V375 Cep is a high probability cluster member, and has a total eclipse of the secondary star. The characteristics of the primary star (M = 1.288 ± 0.017 M_⊙) at the cluster turnoff indicate an age of 3.6 Gyr (with a random uncertainty of 0.25 Gyr), consistent with earlier analysis of the color-magnitude diagram. The secondary star (M = 0.871 ± 0.008 M_⊙) is not expected to have evolved significantly, but its radius is more than 10% larger than predicted by models. Because this binary system has a known age, it is useful for testing the idea that radius inflation can occur in short period binaries for stars with significant convective envelopes due to the inhibition of energy transport by magnetic fields. The brighter star in the binary also produces a precision estimate of the distance modulus, independent of reddening estimates: (m - M)_V = 12.86 ± 0.07. The other two eclipsing binary systems are not cluster members, although one of the systems (V2) could only be conclusively ruled out as a present or former member once the stellar characteristics were determined. That binary is within 0.5° of edge-on, is in a fairly long-period eccentric binary, and contains two almost indistinguishable stars. The other binary (V1) has a small but nonzero eccentricity (e = 0.038) in spite of having an orbital period under 5 days.

  15. Genetic Analysis of the Serratia marcescens N28b O4 Antigen Gene Cluster

    PubMed Central

    Saigí, Francesc; Climent, Núria; Piqué, Núria; Sanchez, Cesar; Merino, Susana; Rubirés, Xavier; Aguilar, Alicia; Tomás, Juan M.; Regué, Miguel

    1999-01-01

    The Serratia marcescens N28b wbbL gene has been shown to complement the rfb-50 mutation of Escherichia coli K-12 derivatives, and a wbbL mutant has been shown to be impaired in O4-antigen biosynthesis (X. Rubirés, F. Saigí, N. Piqué, N. Climent, S. Merino, S. Albertí, J. M. Tomás, and M. Regué, J. Bacteriol. 179:7581–7586, 1997). We analyzed a recombinant cosmid containing the wbbL gene by subcloning and determination of O-antigen production phenotype in E. coli DH5α by sodium dodecyl sulfate-polyacrylamide electrophoresis and Western blot experiments with S. marcescens O4 antiserum. The results obtained showed that a recombinant plasmid (pSUB6) containing about 10 kb of DNA insert was enough to induce O4-antigen biosynthesis. The same results were obtained when an E. coli K-12 strain with a deletion of the wb cluster was used, suggesting that the O4 wb cluster is located in pSUB6. No O4 antigen was produced when plasmid pSUB6 was introduced in a wecA mutant E. coli strain, suggesting that O4-antigen production is wecA dependent. Nucleotide sequence determination of the whole insert in plasmid pSUB6 showed seven open reading frames (ORFs). On the basis of protein similarity analysis of the ORF-encoded proteins and analysis of the S. marcescens N28b wbbA insertion mutant and wzm-wzt deletion mutant, we suggest that the O4 wb cluster codes for two dTDP-rhamnose biosynthetic enzymes (RmlDC), a rhamnosyltransferase (WbbL), a two-component ATP-binding-cassette-type export system (Wzm Wzt), and a putative glycosyltransferase (WbbA). A sequence showing DNA homology to insertion element IS4 was found downstream from the last gene in the cluster (wbbA), suggesting that an IS4-like element could have been involved in the acquisition of the O4 wb cluster. PMID:10074083

  16. Similarity and Cluster Analysis of Intermediate Deep Events in the Southeastern Aegean

    NASA Astrophysics Data System (ADS)

    Ruscic, Marija; Becker, Dirk; Brüstle, Andrea; Meier, Thomas

    2016-04-01

    In order to gain a better understanding of geodynamic processes in the Hellenic subduction zone (HSZ), in particular in the eastern part of the HSZ, we analyze a cluster of intermediate deep events in the region of Nisyros volcano. The cluster recorded during the deployment of the temporary seismic network EGELADOS consists of 159 events at 80 to 200 km depth with local magnitudes ranging from magnitude 0.2 to magnitude 4.1. The network itself consisted of 56 onshore and 23 offshore broadband stations completed by 19 permanent stations from NOA, GEOFON and MedNet. It was deployed from September 2005 to March 2007 and it covered the entire HSZ. Here, both spatial and temporal clustering of the recorded events is studied by using the three component similarity analysis. The waveform cross-correlation was performed for all event combinations using data recorded on 45 onshore stations. The results are shown as a function of frequency for individual stations and as averaged values over the network. The cross-correlation coefficients at the single stations show a decreasing similarity with increasing epicentral distance as well as the effect of local heterogeneities at particular stations, causing noticeable differences in waveform similarities. Event relocation was performed by using the double-difference earthquake relocation software HypoDD and the results are compared with previously obtained single event locations which were calculated using nonlinear location tool NonLinLoc and station corrections. For the relocation, both differential travel times obtained by separate cross-correlation of P- and S-waveforms and manual readings of onset times are used. It is shown that after the relocation the inter-event distance for highly similar events has been reduced. By comparing the results of the cluster analysis with results obtained from the synthetic catalogs, where the event rate, portion and occurrence time of the aftershocks is varied, it is shown that the event

  17. Spatial assessment of air quality patterns in Malaysia using multivariate analysis

    NASA Astrophysics Data System (ADS)

    Dominick, Doreena; Juahir, Hafizan; Latif, Mohd Talib; Zain, Sharifuddin M.; Aris, Ahmad Zaharin

    2012-12-01

    This study aims to investigate possible sources of air pollutants and the spatial patterns within the eight selected Malaysian air monitoring stations based on a two-year database (2008-2009). The multivariate analysis was applied on the dataset. It incorporated Hierarchical Agglomerative Cluster Analysis (HACA) to assess the spatial patterns, Principal Component Analysis (PCA) to determine the major sources of the air pollution and Multiple Linear Regression (MLR) to assess the percentage contribution of each air pollutant. The HACA results grouped the eight monitoring stations into three different clusters, based on the characteristics of the air pollutants and meteorological parameters. The PCA analysis showed that the major sources of air pollution were emissions from motor vehicles, aircraft, industries and areas of high population density. The MLR analysis demonstrated that the main pollutant contributing to variability in the Air Pollutant Index (API) at all stations was particulate matter with a diameter of less than 10 μm (PM10). Further MLR analysis showed that the main air pollutant influencing the high concentration of PM10 was carbon monoxide (CO). This was due to combustion processes, particularly originating from motor vehicles. Meteorological factors such as ambient temperature, wind speed and humidity were also noted to influence the concentration of PM10.
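
    As an illustration of the hierarchical agglomerative step described in this record, the sketch below groups eight stations into three clusters. It is not the authors' procedure: the abstract does not state the linkage or normalisation used, so average linkage and z-score scaling are assumptions, and the station values are made up for the example.

```python
import math

def zscore_columns(rows):
    """Standardise each column to zero mean and unit (population) standard deviation."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]

def average_linkage(a, b, points):
    """Mean Euclidean distance over all pairs drawn from clusters a and b."""
    total = sum(math.dist(points[i], points[j]) for i in a for j in b)
    return total / (len(a) * len(b))

def agglomerate(points, n_clusters):
    """Start from singleton clusters and repeatedly merge the two closest ones."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = average_linkage(clusters[x], clusters[y], points)
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return [sorted(c) for c in clusters]

# Made-up values for eight hypothetical stations: (PM10 in ug/m3, CO in ppm).
stations = [
    (38.0, 0.40), (40.0, 0.45), (42.0, 0.42),   # low-pollution group
    (55.0, 0.80), (57.0, 0.85), (54.0, 0.78),   # moderate group
    (75.0, 1.50), (78.0, 1.60),                 # high group
]
groups = agglomerate(zscore_columns(stations), n_clusters=3)
print(groups)
```

    Each inner list holds the indices of the stations assigned to one cluster. Scaling the columns first keeps PM10, which has the larger numeric range, from dominating the distance.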

  18. ARCRAIDER. I. Detailed optical and X-ray analysis of the cooling flow cluster Z3146

    NASA Astrophysics Data System (ADS)

    Kausch, W.; Gitti, M.; Erben, T.; Schindler, S.

    2007-08-01

    We present a detailed analysis of the medium redshift (z = 0.2906) galaxy cluster Z3146 which is part of the ongoing ARCRAIDER project, a systematic search for gravitational arcs in massive clusters of galaxies. The analysis of Z3146 is based on deep optical wide field observations in the B, V and R bands obtained with the WFI@ESO2.2m, and shallow archival WFPC2@HST taken with the F606W filter, which are used for strong as well as weak lensing analyses. Additionally we have used publicly available XMM/Newton observations for a detailed X-ray analysis of Z3146. Both methods, lensing and X-ray, were used to determine the dynamical state and to estimate the total mass. We also identified four gravitational arc candidates. We find this cluster to be in a relaxed state, which is confirmed by a large cooling flow with nominal ~1600 M_⊙ per year, regular galaxy density and light distributions and a regular shape of the weak lensing mass reconstruction. The mass content derived with the different methods agrees well within 25% at r_200 = 1661 h_70^{-1} kpc indicating a velocity dispersion of σ_v = 869^{+124}_{-153} km s^{-1}. Based on observations made with the NASA/ESA Hubble Space Telescope, obtained from the data archive at the Space Telescope Institute (PID-number 8301). STScI is operated by the association of Universities for Research in Astronomy, Inc. under the NASA contract NAS 5-26555. Also based on observations made with ESO Telescopes at the La Silla or Paranal Observatories under programme ID 68.A-02555 and 073.A-0050 and on observations with XMM-Newton, an ESA Science Mission with instruments and contributions directly funded by ESA Member states and the USA (NASA).

  19. CLASH: Weak-lensing shear-and-magnification analysis of 20 galaxy clusters

    SciTech Connect

    Umetsu, Keiichi; Czakon, Nicole; Medezinski, Elinor; Lemze, Doron; Ford, Holland; Nonino, Mario; Balestra, Italo; Biviano, Andrea; Merten, Julian; Postman, Marc; Koekemoer, Anton; Meneghetti, Massimo; Donahue, Megan; Molino, Alberto; Benítez, Narciso; Seitz, Stella; Gruen, Daniel; Broadhurst, Tom; Grillo, Claudio; Melchior, Peter; and others

    2014-11-10

    We present a joint shear-and-magnification weak-lensing analysis of a sample of 16 X-ray-regular and 4 high-magnification galaxy clusters at 0.19 ≲ z ≲ 0.69 selected from the Cluster Lensing And Supernova survey with Hubble (CLASH). Our analysis uses wide-field multi-color imaging, taken primarily with Suprime-Cam on the Subaru Telescope. From a stacked-shear-only analysis of the X-ray-selected subsample, we detect the ensemble-averaged lensing signal with a total signal-to-noise ratio of ≅ 25 in the radial range of 200-3500 kpc h^{-1}, providing integrated constraints on the halo profile shape and concentration-mass relation. The stacked tangential-shear signal is well described by a family of standard density profiles predicted for dark-matter-dominated halos in gravitational equilibrium, namely, the Navarro-Frenk-White (NFW), truncated variants of NFW, and Einasto models. For the NFW model, we measure a mean concentration of c_{200c} = 4.01^{+0.35}_{-0.32} at an effective halo mass of M_{200c} = 1.34^{+0.10}_{-0.09} × 10^{15} M_⊙. We show that this is in excellent agreement with Λ cold dark matter (ΛCDM) predictions when the CLASH X-ray selection function and projection effects are taken into account. The best-fit Einasto shape parameter is α_E = 0.191^{+0.071}_{-0.068}, which is consistent with the NFW-equivalent Einasto parameter of ∼0.18. We reconstruct projected mass density profiles of all CLASH clusters from a joint likelihood analysis of shear-and-magnification data and measure cluster masses at several characteristic radii assuming an NFW density profile. We also derive an ensemble-averaged total projected mass profile of the X-ray-selected subsample by stacking their individual mass profiles. The stacked total mass profile, constrained by the shear+magnification data, is shown to be consistent with our shear-based halo-model predictions, including the effects of surrounding large-scale structure as

  20. Cluster analysis reveals risk factors for repeated suicide attempts in a multi-ethnic Asian population.

    PubMed

    Choo, Carol; Diederich, Joachim; Song, Insu; Ho, Roger

    2014-04-01

    This study explores underlying patterns in suicide risk factors using data mining techniques. Medical records of suicide attempters who were admitted to a teaching hospital in January 2004 - December 2006 were studied. Cluster analysis revealed hidden patterns for repeated and single attempters (n=418). Repeated attempters had a more complex clinical picture. Symptoms of psychotic illness, borderline personality disorder, and psychosomatic complaints of insomnia and headaches, reports of adverse life events such as unemployment, divorce and quarrels, experience of negative feelings, and usage of alcohol were associated with risk of repeated overdoses with benzodiazepines and paracetamol. The findings have implications for suicide assessments and interventions. PMID:24655624

  1. Regression Models for Demand Reduction based on Cluster Analysis of Load Profiles

    SciTech Connect

    Yamaguchi, Nobuyuki; Han, Junqiao; Ghatikar, Girish; Piette, Mary Ann; Asano, Hiroshi; Kiliccote, Sila

    2009-06-28

    This paper provides new regression models for demand reduction of Demand Response programs for the purpose of ex ante evaluation of the programs and screening customers for recruitment into the programs. The proposed regression models employ load sensitivity to outside air temperature and representative load pattern derived from cluster analysis of customer baseline load as explanatory variables. The performance of the proposed models was examined in terms of the validity of the explanatory variables and the goodness of fit of the regressions, using actual load profile data of Pacific Gas and Electric Company's commercial and industrial customers who participated in the 2008 Critical Peak Pricing program including Manual and Automated Demand Response.

  2. Performance analysis of the Alliant FX/8 multiprocessor using statistical clustering

    NASA Technical Reports Server (NTRS)

    Dimpsey, Robert Tod

    1988-01-01

    Results for two distinct, real, scientific workloads executed on an Alliant FX/8 are discussed. A combination of user concurrency and system overhead measurements was taken for both workloads. Preliminary analysis shows that the first sampled workload is comprised of consistently high user concurrency, low system overhead, and little paging. The second sample has much less user concurrency, but significant paging and system overhead. Statistical cluster analysis is used to extract a state transition model to jointly characterize user concurrency and system overhead. A skewness factor is introduced and used to bring out the effects of unbalanced clustering when determining states with important transitions. The results from the models show that during the collection of the first sample, the system was operating in states of high user concurrency approximately 75 percent of the time. The second workload sample shows the system in high user concurrency states only 26 percent of the time. In addition, it is ascertained that high system overhead is usually accompanied by low user concurrency. The analysis also shows a high predictability of system behavior for both workloads.
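
    The state transition model mentioned in this record can be illustrated with a short sketch. It assumes that each successive measurement sample has already been assigned a cluster label; the label names and the sequence below are hypothetical, and the skewness factor from the abstract is not reproduced.

```python
from collections import Counter

def occupancy(states):
    """Fraction of samples spent in each state."""
    counts = Counter(states)
    return {s: counts[s] / len(states) for s in counts}

def transition_probabilities(states):
    """Estimate P(next state | current state) from consecutive pairs of samples."""
    pair_counts = Counter(zip(states, states[1:]))
    out_counts = Counter(states[:-1])  # how often each state has a successor
    return {(a, b): n / out_counts[a] for (a, b), n in pair_counts.items()}

# Hypothetical sequence of cluster labels for successive samples.
labels = ["high", "high", "high", "low", "high", "high", "low", "low"]
occ = occupancy(labels)
probs = transition_probabilities(labels)
print(occ)
print(probs)
```

    The occupancy values correspond to statements such as "the system was in high user concurrency states X percent of the time", while the transition probabilities describe how predictable the next state is given the current one.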

  3. Portraying persons who inject drugs recently infected with hepatitis C accessing antiviral treatment: a cluster analysis.

    PubMed

    Bamvita, Jean-Marie; Roy, Elise; Zang, Geng; Jutras-Aswad, Didier; Artenie, Andreea Adelina; Levesque, Annie; Bruneau, Julie

    2014-01-01

    Objectives. To empirically determine a categorization of people who inject drugs (PWIDs) recently infected with hepatitis C virus (HCV), in order to identify profiles most likely associated with early HCV treatment uptake. Methods. The study population was composed of HIV-negative PWIDs with a documented recent HCV infection. Eligibility criteria included being 18 years old or over, and having injected drugs in the 6 months preceding the estimated date of HCV exposure. Participant classification was carried out using a TwoStep cluster analysis. Results. From September 2007 to December 2011, 76 participants were included in the study. 60 participants were eligible for HCV treatment. Twenty-one participants initiated HCV treatment. The cluster analysis yielded 4 classes: class 1: Lukewarm health seekers dismissing HCV treatment offer; class 2: multisubstance users willing to shake off the hell; class 3: PWIDs unlinked to health service use; class 4: health seeker PWIDs willing to reverse the fate. Conclusion. Profiles generated by our analysis suggest that prior health care utilization, a key element for treatment uptake, differs between older and younger PWIDs. Such profiles could inform the development of targeted strategies to improve health outcomes and reduce HCV infection among PWIDs. PMID:25349730

  4. Investigating properties of a set of variable AGN with cluster analysis

    NASA Astrophysics Data System (ADS)

    Nair, A. D.

    1997-05-01

    Optical and gamma-ray properties of a sample of active galactic nuclei monitored at the Rosemary Hill Observatory are analysed using cluster analysis. Cluster analysis can be used to analyse large amounts of data with many variables and investigate linear or non-linear relationships in the data. It is found that the time-scale of variation is not related to the amplitude of variability. For BL Lacs and optically violent variable (OVV) quasars the variability is proportional to the redshift and absolute magnitude, but this is not true for quasars in this sample. The analysis shows that gamma-ray-loud AGN tend to be associated with superluminal sources with OVV-like characteristics. The gamma-ray fluxes, for both OVV quasars and BL Lacs, are proportional to the apparent transverse velocity, and this may point to beaming as the dominant cause for the gamma-ray flux. A large majority of the OVV quasars that display a large amplitude of variability are gamma-ray-loud, but this is not true for BL Lacs.

  5. Heterogeneity of Severe Asthma in Childhood: Confirmation by Cluster Analysis of Children in the NIH/NHLBI Severe Asthma Research Program (SARP)

    PubMed Central

    Fitzpatrick, Anne M.; Teague, W. Gerald; Meyers, Deborah A.; Peters, Stephen P.; Li, Xingnan; Li, Huashi; Wenzel, Sally E.; Aujla, Shean; Castro, Mario; Bacharier, Leonard B.; Gaston, Benjamin M.; Bleecker, Eugene R.; Moore, Wendy C.

    2011-01-01

    Background: Asthma in children is a heterogeneous disorder with many phenotypes. Although unsupervised cluster analysis is a useful tool for identifying phenotypes, it has not been applied to school-age children with persistent asthma across a wide range of severities. Objectives: This study determined how children with severe asthma are distributed across a cluster analysis and how well these clusters conform to current definitions of asthma severity. Methods: Cluster analysis was applied to 12 continuous and composite variables from 161 children at 5 centers enrolled in the Severe Asthma Research Program (SARP). Results: Four clusters of asthma were identified. Children in Cluster 1 (n = 48) had relatively normal lung function and less atopy, while children in Cluster 2 (n = 52) had slightly lower lung function, more atopy, and increased symptoms and medication usage. Cluster 3 (n = 32) had greater co-morbidity, increased bronchial responsiveness and lower lung function. Cluster 4 (n = 29) had the lowest lung function and the greatest symptoms and medication usage. Predictors of cluster assignment were asthma duration, the number of asthma controller medications, and baseline lung function. Children with severe asthma were present in all clusters, and no cluster corresponded to definitions of asthma severity provided in asthma treatment guidelines. Conclusions: Severe asthma in children is highly heterogeneous. Unique phenotypic clusters previously identified in adults can also be identified in children, but with important differences. Larger validation and longitudinal studies are needed to determine the baseline and predictive validity of these phenotypic clusters in the larger clinical setting. PMID:21195471

  6. Nature and Determinants of the Course of Chronic Low Back Pain Over a 12-Month Period: A Cluster Analysis

    PubMed Central

    Maher, Christopher G.; Latimer, Jane; McAuley, James H.; Hodges, Paul W.; Rogers, W. Todd

    2014-01-01

    Background: It has been suggested that low back pain (LBP) is a condition with an unpredictable pattern of exacerbation, remission, and recurrence. However, there is an incomplete understanding of the course of LBP and the determinants of the course. Objective: The purposes of this study were: (1) to identify clusters of LBP patients with similar fluctuating pain patterns over time and (2) to investigate whether demographic and clinical characteristics can distinguish these clusters. Design: This study was a secondary analysis of data extracted from a randomized controlled trial. Methods: Pain scores were collected from 155 participants with chronic nonspecific LBP. Pain intensity was measured monthly over a 1-year period by mobile phone short message service. Cluster analysis was used to identify participants with similar fluctuating patterns of pain based on the pain measures collected over a year, and t tests were used to evaluate if the clusters differed in terms of baseline characteristics. Results: The cluster analysis revealed the presence of 3 main clusters. Pain was of fluctuating nature within 2 of the clusters. Out of the 155 participants, 21 (13.5%) had fluctuating pain. Baseline disability (measured with the Roland-Morris Disability Questionnaire) and treatment groups (from the initial randomized controlled trial) were significantly different in the clusters of patients with fluctuating pain when compared with the cluster of patients without fluctuating pain. Limitations: A limitation of this study was the fact that participants were undergoing treatment that may have been responsible for the rather positive prognosis observed. Conclusions: A small number of patients with fluctuating patterns of pain over time were identified. This number could increase if individuals with episodic pain are included in this fluctuating group. PMID:24072729

  7. An empirical comparison of several clustered data approaches under confounding due to cluster effects in the analysis of complications of coronary angioplasty.

    PubMed

    Berlin, J A; Kimmel, S E; Ten Have, T R; Sammel, M D

    1999-06-01

    In the analysis of binary response data from many types of large studies, the data are likely to have arisen from multiple centers, resulting in a within-center correlation for the response. Such correlation, or clustering, occurs when outcomes within centers tend to be more similar to each other than to outcomes in other centers. In studies where there is also variability among centers with respect to the exposure of interest, analysis of the exposure-outcome association may be confounded, even after accounting for within-center correlations. We apply several analytic methods to compare the risk of major complications associated with two strategies, staged and combined procedures, for performing percutaneous transluminal coronary angioplasty (PTCA), a mechanical means of relieving blockage of blood vessels due to atherosclerosis. Combined procedures are used in some centers as a cost-cutting strategy. We performed a number of population-averaged and cluster-specific (conditional) analyses, which (a) make no adjustments for center effects of any kind; (b) make adjustments for the effect of center on only the response; or (c) make adjustments for both the effect of center on the response and the relationship between center and exposure. The method used for this third approach decomposes the procedure type variable into within-center and among-center components, resulting in two odds ratio estimates. The naive analysis, ignoring clusters, gave a highly significant effect of procedure type (OR = 1.6). Population average models gave marginally to very nonsignificant estimates of the OR for treatment type ranging from 1.6 to 1.2 with adjustment only for the effect of centers on response. These results depended on the assumed correlation structure. Conditional (cluster-specific) models and other methods that decomposed the treatment type variable into among- and within-center components all found no within-center effect of procedure type (OR = 1.02, consistently) and a
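
    The decomposition of an exposure into among-center and within-center components can be sketched as follows. The abstract does not give the exact construction, so this assumes one common form: the among-center part is the center's mean exposure and the within-center part is each individual's deviation from that mean. The data and the function name are hypothetical.

```python
from collections import defaultdict

def decompose_exposure(records):
    """Split a binary exposure into among-center and within-center parts.

    records: list of (center_id, exposure) pairs, exposure coded 0 or 1.
    Returns a list of (center_id, among, within), where among is the
    center's mean exposure and within is the deviation from that mean.
    """
    totals = defaultdict(lambda: [0.0, 0])  # center -> [sum of exposure, count]
    for center, x in records:
        totals[center][0] += x
        totals[center][1] += 1
    means = {c: s / n for c, (s, n) in totals.items()}
    return [(c, means[c], x - means[c]) for c, x in records]

# Hypothetical data: 1 = combined procedure, 0 = staged procedure.
records = [("A", 1), ("A", 1), ("A", 0), ("A", 1),
           ("B", 0), ("B", 0), ("B", 1), ("B", 0)]
parts = decompose_exposure(records)
for row in parts:
    print(row)
```

    Entering the two components as separate covariates in a regression model would give one odds ratio for each, which is how a within-center effect can be distinguished from differences among centers.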

  8. A 2163: Merger events in the hottest Abell galaxy cluster. I. Dynamical analysis from optical data

    NASA Astrophysics Data System (ADS)

    Maurogordato, S.; Cappi, A.; Ferrari, C.; Benoist, C.; Mars, G.; Soucail, G.; Arnaud, M.; Pratt, G. W.; Bourdin, H.; Sauvageot, J.-L.

    2008-04-01

    Context: A 2163 is among the richest and most distant Abell clusters, presenting outstanding properties in different wavelength domains. X-ray observations have revealed a distorted gas morphology and strong features have been detected in the temperature map, suggesting that merging processes are important in this cluster. However, the merging scenario is not yet well-defined. Aims: We have undertaken a complementary optical analysis, aiming to understand the dynamics of the system, to constrain the merging scenario and to test its effect on the properties of galaxies. Methods: We present a detailed optical analysis of A 2163 based on new multicolor wide-field imaging and medium-to-high resolution spectroscopy of several hundred galaxies. Results: The projected galaxy density distribution shows strong subclustering with two dominant structures: a main central component (A), and a northern component (B), visible both in optical and in X-ray, with two other substructures detected at high significance in the optical. At magnitudes fainter than R=19, the galaxy distribution shows a clear elongation approximately with the east-west axis extending over 4 h_70^{-1} Mpc, while a nearly perpendicular bridge of galaxies along the north-south axis appears to connect (B) to (A). The (A) component shows a bimodal morphology, and the positions of its two density peaks depend on galaxy luminosity: at magnitudes fainter than R = 19, the axis joining the peaks shows a counterclockwise rotation (from NE/SW to E-W) centered on the position of the X-ray maximum. Our final spectroscopic catalog of 512 objects includes 476 new galaxy redshifts. We have identified 361 galaxies as cluster members; among them, 326 have high precision redshift measurements, which allow us to perform a detailed dynamical analysis of unprecedented accuracy. The cluster mean redshift and velocity dispersion are respectively z = 0.2005 ± 0.0003 and 1434 ± 60 km s^{-1}. We spectroscopically confirm that the northern

  9. A Nonparametric Method for Detecting Fixations and Saccades Using Cluster Analysis: Removing the Need for Arbitrary Thresholds

    PubMed Central

    König, Seth D.; Buffalo, Elizabeth A.

    2014-01-01

    Background: Eye tracking is an important component of many human and non-human primate behavioral experiments. As behavioral paradigms have become more complex, including unconstrained viewing of natural images, eye movements measured in these paradigms have become more variable and complex as well. Accordingly, the common practice of using acceleration, dispersion, or velocity thresholds to segment viewing behavior into periods of fixations and saccades may be insufficient. New Method: Here we propose a novel algorithm, called Cluster Fix, which uses k-means cluster analysis to take advantage of the qualitative differences between fixations and saccades. The algorithm finds natural divisions in 4 state space parameters (distance, velocity, acceleration, and angular velocity) to separate scan paths into periods of fixations and saccades. The number and size of clusters adjusts to the variability of individual scan paths. Results: Cluster Fix can detect small saccades that were often indistinguishable from noisy fixations. Local analysis of fixations helped determine the transition times between fixations and saccades. Comparison with Existing Methods: Because Cluster Fix detects natural divisions in the data, predefined thresholds are not needed. Conclusions: A major advantage of Cluster Fix is the ability to precisely identify the beginning and end of saccades, which is essential for studying neural activity that is modulated by or time-locked to saccades. Our data suggest that Cluster Fix is more sensitive than threshold-based algorithms but comes at the cost of an increase in computational time. PMID:24509130
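
    To make the idea of separating fixations from saccades without a fixed threshold concrete, the sketch below runs a generic k-means (Lloyd's algorithm) with two clusters. It is not the Cluster Fix algorithm itself: Cluster Fix uses four state space parameters and lets the number of clusters adapt, whereas this example assumes only two made-up features, two clusters, and a deterministic choice of starting centroids.

```python
import math

def kmeans(points, centroids, iterations=20):
    """Plain Lloyd's algorithm: assign each point to its nearest centroid,
    then move every centroid to the mean of its assigned points."""
    labels = []
    for _ in range(iterations):
        labels = [min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
                  for p in points]
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new_centroids.append(centroids[k])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # assignments have stopped changing
        centroids = new_centroids
    return labels, centroids

# Made-up samples: (velocity, acceleration) in arbitrary normalised units.
samples = [(0.02, 0.01), (0.03, 0.02), (0.01, 0.01), (0.90, 0.80),
           (0.95, 0.85), (0.04, 0.02), (0.02, 0.03), (0.88, 0.90)]
# Deterministic start (an assumption): the slowest and fastest samples seed the clusters.
seeds = [min(samples), max(samples)]
labels, centroids = kmeans(samples, seeds)
print(labels)
```

    Here label 0 marks the slow, fixation-like samples and label 1 the fast, saccade-like ones. The boundary between them comes from the data rather than from a predefined velocity or acceleration cutoff.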

  10. Front Crawl Sprint Performance: A Cluster Analysis of Biomechanics, Energetics, Coordinative, and Anthropometric Determinants in Young Swimmers.

    PubMed

    Figueiredo, Pedro; Silva, Ana; Sampaio, António; Vilas-Boas, João Paulo; Fernandes, Ricardo J

    2016-07-01

    The aim of this study was to evaluate the determinants of front crawl sprint performance of young swimmers using a cluster analysis. A total of 103 swimmers, aged 11 to 13 years, performed 25-m front crawl swimming at 50-m pace, recorded by two underwater cameras. The analysis of the swimmers included biomechanical, energetic, coordinative, and anthropometric characteristics. Organizing the subjects into meaningful clusters yielded three groups (1.52 ± 0.16, 1.47 ± 0.17 and 1.40 ± 0.15 m/s, for Clusters 1, 2 and 3, respectively) with differences in velocity between Clusters 1 and 2 compared with Cluster 3 (p = .003). Anthropometric variables were the strongest determinants of the cluster solution. Stroke length and stroke index were also considered relevant. In addition, differences between Cluster 1 and the others were also found for critical velocity, stroke rate and intracycle velocity variation (p < .05). It can be concluded that anthropometrics, technique and energetics (swimming efficiency) are determinant domains of young swimmers' sprint performance. PMID:26061270

  11. Internal dynamics of the radio-halo cluster A2219: A multi-wavelength analysis

    NASA Astrophysics Data System (ADS)

    Boschin, W.; Girardi, M.; Barrena, R.; Biviano, A.; Feretti, L.; Ramella, M.

    2004-03-01

    We present the results of the dynamical analysis of the rich, hot, and X-ray very luminous galaxy cluster A2219, containing a powerful diffuse radio-halo. Our analysis is based on new redshift data for 27 galaxies in the cluster region, measured from spectra obtained at the TNG, with the addition of other 105 galaxies recovered from reduction of CFHT archive data in a cluster region of ~5 arcmin radius (~0.8 h^{-1} Mpc at the cluster distance) centered on the cD galaxy. The investigation of the dynamical status is also performed using X-ray data stored in the Chandra archive. Further, valuable information comes from other bands - optical photometric, infrared, and radio data - which are analyzed and/or discussed, too. We find that A2219 appears as a peak in the velocity space at z=0.225, and select 113 cluster members. We compute a high value for the line-of-sight velocity dispersion, σ_v = 1438^{+109}_{-86} km s^{-1}, consistent with the high average X-ray temperature of 10.3 keV. If dynamical equilibrium is assumed, the virial theorem leads to M ~ 2.8 × 10^{15} M_⊙ for the global mass within the virial region. However, further investigation based on both optical and X-ray data shows significant signs of a young dynamical status. In fact, we find strong evidence for the elongation of the cluster in the SE-NW direction coupled with a significant velocity gradient, as well as for the presence of substructure both in optical data and X-ray data. Moreover, we point out the presence of several active galaxies. We discuss the results of our multi-wavelength investigation suggesting a complex merging scenario where the main, original structure is subject to an ongoing merger with a few clumps aligned in a filament in the foreground oriented in an oblique direction with respect to the line-of-sight. Our conclusion supports the view of the connection between extended radio emission and merging phenomena in galaxy clusters. Based on observations made on the island of La Palma

  12. RNA-seq analysis identifies an intricate regulatory network controlling cluster root development in white lupin

    PubMed Central

    2014-01-01

    Background Highly adapted plant species are able to alter their root architecture to improve nutrient uptake and thrive in environments with limited nutrient supply. Cluster roots (CRs) are specialised structures of dense lateral roots formed by several plant species for the effective mining of nutrient rich soil patches through a combination of increased surface area and exudation of carboxylates. White lupin is becoming a model-species allowing for the discovery of gene networks involved in CR development. A greater understanding of the underlying molecular mechanisms driving these developmental processes is important for the generation of smarter plants for a world with diminishing resources to improve food security. Results RNA-seq analyses for three developmental stages of the CR formed under phosphorus-limited conditions and two of non-cluster roots have been performed for white lupin. In total 133,045,174 high-quality paired-end reads were used for a de novo assembly of the root transcriptome and merged with LAGI01 (Lupinus albus gene index) to generate an improved LAGI02 with 65,097 functionally annotated contigs. This was followed by comparative gene expression analysis. We show marked differences in the transcriptional response across the various cluster root stages to adjust to phosphate limitation by increasing uptake capacity and adjusting metabolic pathways. Several transcription factors such as PLT, SCR, PHB, PHV or AUX/IAA with a known role in the control of meristem activity and developmental processes show an increased expression in the tip of the CR. Genes involved in hormonal responses (PIN, LAX, YUC) and cell cycle control (CYCA/B, CDK) are also differentially expressed. In addition, we identify primary transcripts of miRNAs with established function in the root meristem. Conclusions Our gene expression analysis shows an intricate network of transcription factors and plant hormones controlling CR initiation and formation. In addition

  13. Cluster analysis on the bulk elemental compositions of Antarctic stony meteorites

    NASA Astrophysics Data System (ADS)

    Miyamoto, Hideaki; Niihara, Takafumi; Kuritani, Takeshi; Hong, Peng K.; Dohm, James M.; Sugita, Seiji

    2016-05-01

    Remote sensing observations by recent successful missions to small bodies have revealed the difficulty in classifying the materials which cover their surfaces into a conventional classification of meteorites. Although reflectance spectroscopy is a powerful tool for this purpose, it is influenced by many factors, such as space weathering, lighting conditions, and surface physical conditions (e.g., particle size and style of mixing). Thus, complementary information, such as elemental compositions, which can be obtained by X-ray fluorescence (XRF) and gamma-ray spectrometers (GRS), has been considered very important. However, classifying planetary materials solely based on elemental compositions has not been investigated extensively. In this study, we perform principal component and cluster analyses on 12 major and minor elements of the bulk compositions of 500 meteorites reported in the National Institute of Polar Research (NIPR), Japan database. Our unique approach, which includes using hierarchical cluster analysis, indicates that meteorites can be classified into about 10 groups purely by their bulk elemental compositions. We suggest that Si, Fe, Mg, Ca, and Na are the optimal set of elements, as this set has been used successfully to classify meteorites of the NIPR database with more than 94% accuracy. Principal component analysis indicates that elemental compositions of meteorites form eight clusters in the three-dimensional space of the components. The three major principal components (PC1, PC2, and PC3) are interpreted as (1) degree of differentiation of the source body (i.e., primitive versus differentiated), (2) degree of thermal effects, and (3) degree of chemical fractionation, respectively.

  14. Unsupervised change detection in satellite images using fuzzy c-means clustering and principal component analysis

    NASA Astrophysics Data System (ADS)

    Kesikoğlu, M. H.; Atasever, Ü. H.; Özkan, C.

    2013-10-01

    Change detection analysis is the process of identifying changes in nature or in the state of an object from observations made at different times, or of quantifying temporal effects using multitemporal data sets. Many change detection techniques are described in the literature, and they can be grouped under two main headings: supervised and unsupervised change detection. The aim of this study is to identify land cover changes in a specific area of Kayseri with unsupervised change detection techniques, using remotely sensed Landsat satellite images from different years. First, image-to-image registration is performed so that the images are referenced to each other. Following image enhancement, the image differencing method is applied to the images. The resulting gray-scale difference image is partitioned into 3 × 3 nonoverlapping blocks. Principal component analysis is then applied to obtain the eigenvector space, from which the principal components are derived. Finally, the feature vector space consisting of the principal components is partitioned into two clusters, changed and unchanged areas, using fuzzy c-means clustering, which completes the change detection process.
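The pipeline described in this record (image differencing, 3 × 3 blocks, principal component analysis, two-cluster fuzzy c-means) can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the two images are already registered and enhanced, keeps only the first principal component (found by power iteration), and all function names are hypothetical.

```python
import math

def blocks_3x3(diff):
    """Flatten each non-overlapping 3x3 block of a 2-D list into a 9-vector."""
    rows, cols = len(diff) // 3 * 3, len(diff[0]) // 3 * 3
    return [[diff[r + i][c + j] for i in range(3) for j in range(3)]
            for r in range(0, rows, 3) for c in range(0, cols, 3)]

def first_component(vectors, iters=100):
    """Project vectors onto the leading covariance eigenvector (power iteration)."""
    n, d = len(vectors), len(vectors[0])
    mean = [sum(v[k] for v in vectors) / n for k in range(d)]
    centered = [[v[k] - mean[k] for k in range(d)] for v in vectors]
    cov = [[sum(x[a] * x[b] for x in centered) / n for b in range(d)]
           for a in range(d)]
    w = [1.0] * d  # assumption: start vector is not orthogonal to the answer
    for _ in range(iters):
        w = [sum(cov[a][b] * w[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w)) or 1.0
        w = [x / norm for x in w]
    return [sum(x[k] * w[k] for k in range(d)) for x in centered]

def fuzzy_c_means_1d(values, m=2.0, iters=50):
    """Two-cluster fuzzy c-means on scalars; returns membership in the upper cluster."""
    centers = [min(values), max(values)]
    u_high = [0.5] * len(values)
    power = 2.0 / (m - 1.0)
    for _ in range(iters):
        # Membership update: u_ik = 1 / sum_j (d_ik / d_ij)^(2 / (m - 1))
        for i, x in enumerate(values):
            d_low, d_high = (abs(x - c) + 1e-12 for c in centers)
            u_high[i] = 1.0 / (1.0 + (d_high / d_low) ** power)
        # Center update: mean of the values weighted by membership^m
        w_low = [(1.0 - u) ** m for u in u_high]
        w_high = [u ** m for u in u_high]
        centers = [sum(w * x for w, x in zip(w_low, values)) / sum(w_low),
                   sum(w * x for w, x in zip(w_high, values)) / sum(w_high)]
    return u_high

def detect_change(img_a, img_b):
    """Return one boolean per 3x3 block: True where the block is labelled changed."""
    diff = [[abs(a - b) for a, b in zip(ra, rb)] for ra, rb in zip(img_a, img_b)]
    vecs = blocks_3x3(diff)
    labels = [u > 0.5 for u in fuzzy_c_means_1d(first_component(vecs))]

    def mean_difference(flag):
        sel = [sum(v) for v, lab in zip(vecs, labels) if lab == flag]
        return sum(sel) / len(sel) if sel else 0.0

    # The eigenvector sign is arbitrary, so call "changed" the cluster
    # whose blocks have the larger mean absolute difference.
    if mean_difference(True) < mean_difference(False):
        labels = [not lab for lab in labels]
    return labels
```

For a 6 × 6 image pair that differs only in the top-left block, `detect_change` flags that block alone. A real Landsat workflow would use several principal components and a numerical library rather than pure-Python loops.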

  15. [Research on distribution of patents' holders for Chinese herbal compounds in treating cardiovascular and cerebrovascular based on cluster analysis].

    PubMed

    YANG, Xu-Jie; XIAO, Shi-Ying

    2015-09-01

    To examine the distribution of patent holders for Chinese herbal compounds used to treat cardiovascular and cerebrovascular diseases, the patent holders were analyzed by means of simple statistics and cluster analysis. The clustering variables comprised the number of patent applications, the number of patents maintained, the quantity of related papers, etc. The patent holders were divided into four categories according to their scientific research and patent strength. The key to success for holders of Chinese herbal compound patents is to transform scientific research into patents and to coordinate patent protection with scientific innovation. PMID:26983221

  16. Cosmological constraints from a combination of galaxy clustering and lensing - II. Fisher matrix analysis

    NASA Astrophysics Data System (ADS)

    More, Surhud; van den Bosch, Frank C.; Cacciato, Marcello; More, Anupreeta; Mo, Houjun; Yang, Xiaohu

    2013-04-01

    We quantify the accuracy with which the cosmological parameters characterizing the energy density of matter (Ωm), the amplitude of the power spectrum of matter fluctuations (σ8), the energy density of neutrinos (Ων) and the dark energy equation of state (w0) can be constrained using data from large galaxy redshift surveys. We advocate a joint analysis of the abundance of galaxies, galaxy clustering, and the galaxy-galaxy weak-lensing signal in order to simultaneously constrain the halo occupation statistics (i.e. galaxy bias) and the cosmological parameters of interest. We parametrize the halo occupation distribution of galaxies in terms of the conditional luminosity function and use the analytical framework of the halo model described in Cacciato et al. (our companion Paper III), to predict the relevant observables. By performing a Fisher matrix analysis, we show that a joint analysis of these observables, even with the precision with which they are currently measured from the Sloan Digital Sky Survey, can be used to obtain tight constraints on the cosmological parameters, fully marginalized over uncertainties in galaxy bias. We demonstrate that the cosmological constraints from such an analysis are nearly uncorrelated with the halo occupation distribution constraints, thus, minimizing the systematic impact of any imperfections in modelling the halo occupation statistics on the cosmological constraints. In fact, we demonstrate that the constraints from such an analysis are both complementary to and competitive with existing constraints on these parameters from a number of other techniques, such as cluster abundances, cosmic shear and/or baryon acoustic oscillations, thus paving the way to test the concordance cosmological model.
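The phrases "fully marginalized" and "nearly uncorrelated" in this record refer to standard Fisher matrix quantities: the marginalized 1σ error on parameter i is sqrt((F⁻¹)ii), the error with all other parameters held fixed is 1/sqrt(Fii), and the correlation between two parameters follows from the off-diagonal element of F⁻¹. The sketch below works this out for a two-parameter case; the matrix entries are made-up illustrative numbers, not values from the paper, and the function names are hypothetical.

```python
import math

def invert_2x2(f):
    """Invert a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = f
    det = a * d - b * c
    if det == 0:
        raise ValueError("singular Fisher matrix")
    return [[d / det, -b / det], [-c / det, a / det]]

def marginalized_errors(fisher):
    """1-sigma errors after marginalizing over the other parameter."""
    cov = invert_2x2(fisher)
    return [math.sqrt(cov[i][i]) for i in range(2)]

def conditional_errors(fisher):
    """1-sigma errors with the other parameter held fixed."""
    return [1.0 / math.sqrt(fisher[i][i]) for i in range(2)]

def correlation(fisher):
    """Correlation coefficient between the two parameter estimates."""
    cov = invert_2x2(fisher)
    return cov[0][1] / math.sqrt(cov[0][0] * cov[1][1])

# Hypothetical Fisher matrix for two parameters (illustrative numbers only).
F = [[4.0, 2.0], [2.0, 4.0]]
print(marginalized_errors(F), conditional_errors(F), correlation(F))
```

The marginalized errors are never smaller than the conditional ones, and they coincide when the off-diagonal element is zero. That is the sense in which nearly uncorrelated halo occupation and cosmological constraints limit the impact of modelling imperfections.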

  17. A comparison of hierarchical cluster analysis and league table rankings as methods for analysis and presentation of district health system performance data in Uganda.

    PubMed

    Tashobya, Christine K; Dubourg, Dominique; Ssengooba, Freddie; Speybroeck, Niko; Macq, Jean; Criel, Bart

    2016-03-01

    In 2003, the Uganda Ministry of Health introduced the district league table for district health system performance assessment. The league table presents district performance against a number of input, process and output indicators and a composite index to rank districts. This study explores the use of hierarchical cluster analysis for analysing and presenting district health systems performance data and compares this approach with the use of the league table in Uganda. Ministry of Health and district plans and reports, and published documents were used to provide information on the development and utilization of the Uganda district league table. Quantitative data were accessed from the Ministry of Health databases. Statistical analysis using SPSS version 20 and hierarchical cluster analysis utilizing Ward's method were used. The hierarchical cluster analysis was conducted on the basis of seven clusters determined for each year from 2003 to 2010, ranging from a cluster of good through moderate-to-poor performers. The characteristics and membership of clusters varied from year to year and were determined by the identity and magnitude of performance of the individual variables. Criticisms of the league table include: perceived unfairness, as it did not take into consideration district peculiarities; and being oversummarized and not adequately informative. Clustering organizes the many data points into clusters of similar entities according to an agreed set of indicators and can provide the beginning point for identifying factors behind the observed performance of districts. Although league table rankings emphasize summation and external control, clustering has the potential to encourage a formative, learning approach. More research is required to shed more light on factors behind observed performance of the different clusters. Other countries, especially low-income countries that share many similarities with Uganda, can learn from these experiences. PMID:26024882
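Ward's method, named in this record, merges at each step the pair of clusters whose union gives the smallest increase in within-cluster sum of squares; for clusters A and B that increase is |A||B|/(|A|+|B|) times the squared distance between their centroids. The following is a minimal pure-Python sketch of that rule, stopped at a requested number of clusters. It is not the SPSS procedure used in the study, and the function name and sample points are hypothetical.

```python
def ward_clusters(points, k):
    """Greedy agglomerative clustering with Ward's merge cost until k clusters remain.

    points: list of equal-length numeric tuples; returns lists of point indices.
    """
    clusters = [{"members": [i], "centroid": list(p)} for i, p in enumerate(points)]

    def merge_cost(a, b):
        # Increase in within-cluster sum of squares if a and b were merged.
        na, nb = len(a["members"]), len(b["members"])
        d2 = sum((x - y) ** 2 for x, y in zip(a["centroid"], b["centroid"]))
        return na * nb / (na + nb) * d2

    while len(clusters) > k:
        pairs = ((i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda ij: merge_cost(clusters[ij[0]], clusters[ij[1]]))
        a, b = clusters[i], clusters[j]
        na, nb = len(a["members"]), len(b["members"])
        merged = {
            "members": a["members"] + b["members"],
            # Size-weighted mean of the two centroids.
            "centroid": [(na * x + nb * y) / (na + nb)
                         for x, y in zip(a["centroid"], b["centroid"])],
        }
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]
    return [sorted(c["members"]) for c in clusters]

# Hypothetical two-indicator scores for five districts (illustrative only).
districts = [(0, 0), (0, 1), (10, 10), (10, 11), (50, 50)]
print(ward_clusters(districts, 3))
```

This naive version recomputes every pairwise cost at each step, so it suits only small inputs such as a few hundred districts. Indicators on different scales would normally be standardised first, a step the abstract does not describe.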

  18. A comparison of hierarchical cluster analysis and league table rankings as methods for analysis and presentation of district health system performance data in Uganda†

    PubMed Central

    Tashobya, Christine K; Dubourg, Dominique; Ssengooba, Freddie; Speybroeck, Niko; Macq, Jean; Criel, Bart

    2016-01-01

    In 2003, the Uganda Ministry of Health introduced the district league table for district health system performance assessment. The league table presents district performance against a number of input, process and output indicators and a composite index to rank districts. This study explores the use of hierarchical cluster analysis for analysing and presenting district health systems performance data and compares this approach with the use of the league table in Uganda. Ministry of Health and district plans and reports, and published documents were used to provide information on the development and utilization of the Uganda district league table. Quantitative data were accessed from the Ministry of Health databases. Statistical analysis using SPSS version 20 and hierarchical cluster analysis utilizing Ward’s method were used. The hierarchical cluster analysis was conducted on the basis of seven clusters determined for each year from 2003 to 2010, ranging from a cluster of good through moderate-to-poor performers. The characteristics and membership of clusters varied from year to year and were determined by the identity and magnitude of performance of the individual variables. Criticisms of the league table include: perceived unfairness, as it did not take into consideration district peculiarities; and being oversummarized and not adequately informative. Clustering organizes the many data points into clusters of similar entities according to an agreed set of indicators and can provide the beginning point for identifying factors behind the observed performance of districts. Although league table rankings emphasize summation and external control, clustering has the potential to encourage a formative, learning approach. More research is required to shed more light on factors behind observed performance of the different clusters. Other countries, especially low-income countries that share many similarities with Uganda, can learn from these experiences. PMID:26024882

  19. An application of cluster analysis for determining homogeneous subregions: The agroclimatological point of view. [Rio Grande do Sul, Brazil]

    NASA Technical Reports Server (NTRS)

    Parada, N. D. J. (Principal Investigator); Cappelletti, C. A.

    1982-01-01

    A stratification oriented to crop area and yield estimation problems was performed using an algorithm of clustering. The variables used were a set of agroclimatological characteristics measured in each one of the 232 municipalities of the State of Rio Grande do Sul, Brazil. A nonhierarchical cluster analysis was used and the pseudo F-statistics criterion was implemented for determining the "cut point" in the number of strata.
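A pseudo F-statistic for choosing the number of strata is commonly computed as the ratio of between-cluster to within-cluster variance, each divided by its degrees of freedom: F = [B/(k−1)] / [W/(n−k)]. The record does not give the exact formula used, so the sketch below assumes this common form; the function name and the data are hypothetical.

```python
def pseudo_f(points, labels):
    """Pseudo F-statistic: (between SS / (k - 1)) / (within SS / (n - k)).

    points: list of equal-length numeric tuples; labels: one cluster id per point.
    """
    n, dim = len(points), len(points[0])
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    k = len(groups)
    if k < 2 or k >= n:
        raise ValueError("need 2 <= k < n clusters")
    grand = [sum(p[d] for p in points) / n for d in range(dim)]
    between = within = 0.0
    for members in groups.values():
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        # Between: cluster size times squared centroid-to-grand-mean distance.
        between += len(members) * sum((centroid[d] - grand[d]) ** 2 for d in range(dim))
        # Within: squared distances of members to their own centroid.
        within += sum((p[d] - centroid[d]) ** 2 for p in members for d in range(dim))
    if within == 0:
        return float("inf")
    return (between / (k - 1)) / (within / (n - k))

# Hypothetical one-variable data: a natural split scores far higher than a poor one.
data = [(0,), (2,), (10,), (12,)]
print(pseudo_f(data, [0, 0, 1, 1]), pseudo_f(data, [0, 1, 0, 1]))
```

In practice the statistic is evaluated for each candidate number of strata, and a pronounced maximum suggests the "cut point".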

  20. Analysis of a microchannel interconnect based on the clustering of smart-pixel-device windows

    NASA Astrophysics Data System (ADS)

    Rolston, D. R.; Robertson, B.; Plant, D. V.; Hinton, H. S.

    1996-03-01

    A design analysis of a telecentric microchannel relay system developed for use with a smart-pixel-based photonic backplane is presented. The interconnect uses a clustered-window geometry in which optoelectronic device windows are grouped together about the axis of each microchannel. A Gaussian-beam propagation model is used to analyze the trade-off between window size, window density, transistor count per smart pixel, and lenslet f-number for three cases of window clustering. The results of this analysis show that, with this approach, a window density of 4000 windows/cm² is obtained for a window size of 30 µm and a device plane separation of 25 mm. In addition, an optical power model is developed to determine the nominal power requirements of a 32 × 32 smart-pixel array as a function of window size. The power requirements are obtained assuming a complementary metal-oxide semiconductor inverter-amplifier and dual-rail multiple-quantum-well self-electro-optic-effect devices as the receiver stage of the smart pixel.

  1. Analysis of a microchannel interconnect based on the clustering of smart-pixel-device windows.

    PubMed

    Rolston, D R; Robertson, B; Hinton, H S; Plant, D V

    1996-03-10

    A design analysis of a telecentric microchannel relay system developed for use with a smart-pixel-based photonic backplane is presented. The interconnect uses a clustered-window geometry in which optoelectronic device windows are grouped together about the axis of each microchannel. A Gaussian-beam propagation model is used to analyze the trade-off between window size, window density, transistor count per smart pixel, and lenslet ƒ-number for three cases of window clustering. The results of this analysis show that, with this approach, a window density of 4000 windows/cm² is obtained for a window size of 30 µm and a device plane separation of 25 mm. In addition, an optical power model is developed to determine the nominal power requirements of a 32 × 32 smart-pixel array as a function of window size. The power requirements are obtained assuming a complementary metal-oxide semiconductor inverter-amplifier and dual-rail multiple-quantum-well self-electro-optic-effect devices as the receiver stage of the smart pixel. PMID:21085235

  2. Dietary Patterns Derived by Cluster Analysis are Associated with Cognitive Function among Korean Older Adults.

    PubMed

    Kim, Jihye; Yu, Areum; Choi, Bo Youl; Nam, Jung Hyun; Kim, Mi Kyung; Oh, Dong Hoon; Yang, Yoon Jung

    2015-06-01

    The objective of this study was to investigate major dietary patterns among older Korean adults through cluster analysis and to determine an association between dietary patterns and cognitive function. This is a cross-sectional study. The data from the Korean Multi-Rural Communities Cohort Study was used. Participants included 765 participants aged 60 years and over. A quantitative food frequency questionnaire with 106 items was used to investigate dietary intake. The Korean version of the MMSE-KC (Mini-Mental Status Examination-Korean version) was used to assess cognitive function. Two major dietary patterns were identified using K-means cluster analysis. The "MFDF" dietary pattern indicated high consumption of Multigrain rice, Fish, Dairy products, Fruits and fruit juices, while the "WNC" dietary pattern referred to higher intakes of White rice, Noodles, and Coffee. Means of the total MMSE-KC and orientation score of the participants in the MFDF dietary pattern were higher than those of the WNC dietary pattern. Compared with the WNC dietary pattern, the MFDF dietary pattern showed a lower risk of cognitive impairment after adjusting for covariates (OR 0.64, 95% CI 0.44-0.94). The MFDF dietary pattern, with high consumption of multigrain rice, fish, dairy products, and fruits may be related to better cognition among Korean older adults. PMID:26035243

  3. Analysis of the gene cluster encoding toluene/o-xylene monooxygenase from Pseudomonas stutzeri OX1

    SciTech Connect

    Bertoni, G.; Martino, M.; Galli, E.; Barbieri, P.

    1998-10-01

    The toluene/o-xylene monooxygenase cloned from Pseudomonas stutzeri OX1 displays a very broad range of substrates and a very peculiar regioselectivity, because it is able to hydroxylate more than one position on the aromatic ring of several hydrocarbons and phenols. The nucleotide sequence of the gene cluster coding for this enzymatic system has been determined. The sequence analysis revealed the presence of six open reading frames (ORFs) homologous to other genes clustered in operons coding for multicomponent monooxygenases found in benzene- and toluene-degradative pathways cloned from Pseudomonas strains. Significant similarities were also found with multicomponent monooxygenase systems for phenol, methane, alkene, and dimethyl sulfide cloned from different bacterial strains. The knockout of each ORF and complementation with the wild-type allele indicated that all six ORFs are essential for the full activity of the toluene/o-xylene monooxygenase in Escherichia coli. This analysis also shows that despite its activity on both hydrocarbons and phenols, toluene/o-xylene monooxygenase belongs to a toluene multicomponent monooxygenase subfamily rather than to the monooxygenases active on phenols.

  4. Autonomic specificity of basic emotions: evidence from pattern classification and cluster analysis.

    PubMed

    Stephens, Chad L; Christie, Israel C; Friedman, Bruce H

    2010-07-01

    Autonomic nervous system (ANS) specificity of emotion remains controversial in contemporary emotion research, and has received mixed support over decades of investigation. This study was designed to replicate and extend psychophysiological research, which has used multivariate pattern classification analysis (PCA) in support of ANS specificity. Forty-nine undergraduates (27 women) listened to emotion-inducing music and viewed affective films while a montage of ANS variables, including heart rate variability indices, peripheral vascular activity, systolic time intervals, and electrodermal activity, were recorded. Evidence for ANS discrimination of emotion was found via PCA with 44.6% of overall observations correctly classified into the predicted emotion conditions, using ANS variables (z=16.05, p<.001). Cluster analysis of these data indicated a lack of distinct clusters, which suggests that ANS responses to the stimuli were nomothetic and stimulus-specific rather than idiosyncratic and individual-specific. Collectively these results further confirm and extend support for the notion that basic emotions have distinct ANS signatures. PMID:20338217

  5. Capillary Liquid Chromatography Mass Spectrometry Analysis of Intact Monolayer-Protected Gold Clusters in Complex Mixtures.

    PubMed

    Black, David M; Bach, Stephan B H; Whetten, Robert L

    2016-06-01

    In some respects, large noble-metal clusters protected by thiolate ligands behave as giant molecules of definite composition and structure; however, their rigorous analysis continues to be quite challenging. Analysis of complex mixtures of intact monolayer-protected clusters (MPCs) by liquid chromatography mass spectrometry (LC-MS) could provide quantitative identification of the various components present. This advance is critical for biomedical and toxicological research, as well as in fundamental studies that rely on the identification of selected compositions. This work expands upon the separate LC and MS results previously achieved, by interfacing the capillary liquid chromatograph directly to the electrospray source of the mass spectrometer, in order to provide an extremely sensitive, quantitative, and rapid means to characterize MPCs and their derivatives far beyond that of earlier reports. Here, we show that nonaqueous reversed-phase chromatography can be coupled to mass-spectrometry detection to resolve complex mixtures in minute (∼100 ng) samples of gold MPCs, of molecular masses up to ∼40 kDa, and with single-species sensitivity easily demonstrated for components on the level of sub-10 ng or picomole (1 pmol). PMID:27216373

  6. Eating or Meeting? Cluster Analysis Reveals Intricacies of White Shark (Carcharodon carcharias) Migration and Offshore Behavior

    PubMed Central

    Jorgensen, Salvador J.; Arnoldi, Natalie S.; Estess, Ethan E.; Chapple, Taylor K.; Rückert, Martin; Anderson, Scot D.; Block, Barbara A.

    2012-01-01

    Elucidating how mobile ocean predators utilize the pelagic environment is vital to understanding the dynamics of oceanic species and ecosystems. Pop-up archival transmitting (PAT) tags have emerged as an important tool to describe animal migrations in oceanic environments where direct observation is not feasible. Available PAT tag data, however, are for the most part limited to geographic position, swimming depth and environmental temperature, making effective behavioral observation challenging. However, novel analysis approaches have the potential to extend the interpretive power of these limited observations. Here we developed an approach based on clustering analysis of PAT daily time-at-depth histogram records to distinguish behavioral modes in white sharks (Carcharodon carcharias). We found four dominant and distinctive behavioral clusters matching previously described behavioral patterns, including two distinctive offshore diving modes. Once validated, we mapped behavior mode occurrence in space and time. Our results demonstrate spatial, temporal and sex-based structure in the diving behavior of white sharks in the northeastern Pacific previously unrecognized including behavioral and migratory patterns resembling those of species with lek mating systems. We discuss our findings, in combination with available life history and environmental data, and propose specific testable hypotheses to distinguish between mating and foraging in northeastern Pacific white sharks that can provide a framework for future work. Our methodology can be applied to similar datasets from other species to further define behaviors during unobservable phases. PMID:23144707

  7. Eating or meeting? Cluster analysis reveals intricacies of white shark (Carcharodon carcharias) migration and offshore behavior.

    PubMed

    Jorgensen, Salvador J; Arnoldi, Natalie S; Estess, Ethan E; Chapple, Taylor K; Rückert, Martin; Anderson, Scot D; Block, Barbara A

    2012-01-01

    Elucidating how mobile ocean predators utilize the pelagic environment is vital to understanding the dynamics of oceanic species and ecosystems. Pop-up archival transmitting (PAT) tags have emerged as an important tool to describe animal migrations in oceanic environments where direct observation is not feasible. Available PAT tag data, however, are for the most part limited to geographic position, swimming depth and environmental temperature, making effective behavioral observation challenging. However, novel analysis approaches have the potential to extend the interpretive power of these limited observations. Here we developed an approach based on clustering analysis of PAT daily time-at-depth histogram records to distinguish behavioral modes in white sharks (Carcharodon carcharias). We found four dominant and distinctive behavioral clusters matching previously described behavioral patterns, including two distinctive offshore diving modes. Once validated, we mapped behavior mode occurrence in space and time. Our results demonstrate spatial, temporal and sex-based structure in the diving behavior of white sharks in the northeastern Pacific previously unrecognized including behavioral and migratory patterns resembling those of species with lek mating systems. We discuss our findings, in combination with available life history and environmental data, and propose specific testable hypotheses to distinguish between mating and foraging in northeastern Pacific white sharks that can provide a framework for future work. Our methodology can be applied to similar datasets from other species to further define behaviors during unobservable phases. PMID:23144707

  8. Dietary Patterns Derived by Cluster Analysis are Associated with Cognitive Function among Korean Older Adults

    PubMed Central

    Kim, Jihye; Yu, Areum; Choi, Bo Youl; Nam, Jung Hyun; Kim, Mi Kyung; Oh, Dong Hoon; Yang, Yoon Jung

    2015-01-01

    The objective of this study was to investigate major dietary patterns among older Korean adults through cluster analysis and to determine an association between dietary patterns and cognitive function. This is a cross-sectional study. The data from the Korean Multi-Rural Communities Cohort Study was used. Participants included 765 participants aged 60 years and over. A quantitative food frequency questionnaire with 106 items was used to investigate dietary intake. The Korean version of the MMSE-KC (Mini-Mental Status Examination–Korean version) was used to assess cognitive function. Two major dietary patterns were identified using K-means cluster analysis. The “MFDF” dietary pattern indicated high consumption of Multigrain rice, Fish, Dairy products, Fruits and fruit juices, while the “WNC” dietary pattern referred to higher intakes of White rice, Noodles, and Coffee. Means of the total MMSE-KC and orientation score of the participants in the MFDF dietary pattern were higher than those of the WNC dietary pattern. Compared with the WNC dietary pattern, the MFDF dietary pattern showed a lower risk of cognitive impairment after adjusting for covariates (OR 0.64, 95% CI 0.44–0.94). The MFDF dietary pattern, with high consumption of multigrain rice, fish, dairy products, and fruits may be related to better cognition among Korean older adults. PMID:26035243

  9. Joint Analysis of Galaxy-Galaxy Lensing and Galaxy Clustering: Methodology and Forecasts for DES

    SciTech Connect

    Park, Y.

    2015-07-19

    The joint analysis of galaxy-galaxy lensing and galaxy clustering is a promising method for inferring the growth function of large scale structure. Our analysis will be carried out on data from the Dark Energy Survey (DES), with its measurements of both the distribution of galaxies and the tangential shears of background galaxies induced by these foreground lenses. We develop a practical approach to modeling the assumptions and systematic effects affecting small scale lensing, which provides halo masses, and large scale galaxy clustering. Introducing parameters that characterize the halo occupation distribution (HOD), photometric redshift uncertainties, and shear measurement errors, we study how external priors on different subsets of these parameters affect our growth constraints. Degeneracies within the HOD model, as well as between the HOD and the growth function, are identified as the dominant source of complication, with other systematic effects sub-dominant. The impact of HOD parameters and their degeneracies necessitate the detailed joint modeling of the galaxy sample that we employ. Finally, we conclude that DES data will provide powerful constraints on the evolution of structure growth in the universe, conservatively/optimistically constraining the growth function to 7.9%/4.8% with its first-year data that covered over 1000 square degrees, and to 3.9%/2.3% with its full five-year data that will survey 5000 square degrees, including both statistical and systematic uncertainties.

  10. Dengue Fever Occurrence and Vector Detection by Larval Survey, Ovitrap and MosquiTRAP: A Space-Time Clusters Analysis

    PubMed Central

    de Melo, Diogo Portella Ornelas; Scherrer, Luciano Rios; Eiras, Álvaro Eduardo

    2012-01-01

    The use of vector surveillance tools for preventing dengue disease requires fine assessment of risk, in order to improve vector control activities. Nevertheless, the thresholds between vector detection and dengue fever occurrence are currently not well established. In Belo Horizonte (Minas Gerais, Brazil), dengue has been endemic for several years. From January 2007 to June 2008, the dengue vector Aedes (Stegomyia) aegypti was monitored by ovitrap, the sticky-trap MosquiTRAP™ and larval surveys in a study area in Belo Horizonte. Using a space-time scan for cluster detection implemented in SaTScan software, the vector presence recorded by the different monitoring methods was evaluated. Clusters of vectors and dengue fever were detected. It was verified that ovitrap and MosquiTRAP vector detection methods predicted dengue occurrence better than larval survey, both spatially and temporally. MosquiTRAP and ovitrap presented similar results of space-time intersections to dengue fever clusters. Nevertheless, ovitrap clusters presented longer duration periods than MosquiTRAP ones, signaling the dengue risk areas less accurately, since the detection of vector clusters during most of the study period was not necessarily correlated to dengue fever occurrence. It was verified that ovitrap clusters occurred more than 200 days (values ranged from 97.0±35.35 to 283.0±168.4 days) before dengue fever clusters, whereas MosquiTRAP clusters preceded dengue fever clusters by approximately 80 days (values ranged from 65.5±58.7 to 94.0±14.3 days), the latter proving to be more temporally precise. Thus, in the present cluster analysis study MosquiTRAP presented superior results for signaling dengue transmission risks both geographically and temporally. Since early detection is crucial for planning and deploying effective prevention measures, MosquiTRAP proved to be a reliable tool, and this method provides groundwork for the development of even more precise tools. PMID:22848729

  11. Dengue fever occurrence and vector detection by larval survey, ovitrap and MosquiTRAP: a space-time clusters analysis.

    PubMed

    de Melo, Diogo Portella Ornelas; Scherrer, Luciano Rios; Eiras, Álvaro Eduardo

    2012-01-01

    The use of vector surveillance tools for preventing dengue disease requires fine assessment of risk, in order to improve vector control activities. Nevertheless, the thresholds between vector detection and dengue fever occurrence are currently not well established. In Belo Horizonte (Minas Gerais, Brazil), dengue has been endemic for several years. From January 2007 to June 2008, the dengue vector Aedes (Stegomyia) aegypti was monitored by ovitrap, the sticky-trap MosquiTRAP™ and larval surveys in a study area in Belo Horizonte. Using a space-time scan for cluster detection implemented in SaTScan software, the vector presence recorded by the different monitoring methods was evaluated. Clusters of vectors and dengue fever were detected. It was verified that ovitrap and MosquiTRAP vector detection methods predicted dengue occurrence better than larval survey, both spatially and temporally. MosquiTRAP and ovitrap presented similar results of space-time intersections to dengue fever clusters. Nevertheless, ovitrap clusters presented longer duration periods than MosquiTRAP ones, less accurately signaling the dengue risk areas, since the detection of vector clusters during most of the study period was not necessarily correlated to dengue fever occurrence. It was verified that ovitrap clusters occurred more than 200 days (values ranged from 97.0±35.35 to 283.0±168.4 days) before dengue fever clusters, whereas MosquiTRAP clusters preceded dengue fever clusters by approximately 80 days (values ranged from 65.5±58.7 to 94.0±14.3 days), the former showing to be more temporally precise. Thus, in the present cluster analysis study MosquiTRAP presented superior results for signaling dengue transmission risks both geographically and temporally. Since early detection is crucial for planning and deploying effective preventive measures, MosquiTRAP proved to be a reliable tool and this method provides groundwork for the development of even more precise tools. PMID:22848729

  12. Spatio-temporal cluster analysis of county-based human West Nile virus incidence in the continental United States

    PubMed Central

    Sugumaran, Ramanathan; Larson, Scott R; DeGroote, John P

    2009-01-01

    Background West Nile virus (WNV) is a vector-borne illness that can severely affect human health. After introduction on the East Coast in 1999, the virus quickly spread and became established across the continental United States. However, there have been significant variations in levels of human WNV incidence spatially and temporally. In order to quantify these variations, we used Kulldorff's spatial scan statistic and Anselin's Local Moran's I statistic to uncover spatial clustering of human WNV incidence at the county level in the continental United States from 2002–2008. These two methods were applied with varying analysis thresholds in order to evaluate sensitivity of clusters identified. Results The spatial scan and Local Moran's I statistics revealed several consistent, important clusters or hot-spots with significant year-to-year variation. In 2002, before the pathogen had spread throughout the country, there were significant regional clusters in the upper Midwest and in Louisiana and Mississippi. The largest and most consistent area of clustering throughout the study period was in the Northern Great Plains region including large portions of Nebraska, South Dakota, and North Dakota, and significant sections of Colorado, Wyoming, and Montana. In 2006, a very strong cluster centered in southwest Idaho was prominent. Both the spatial scan statistic and the Local Moran's I statistic were sensitive to the choice of input parameters. Conclusion Significant spatial clustering of human WNV incidence has been demonstrated in the continental United States from 2002–2008. The two techniques were not always consistent in the location and size of clusters identified. Although there was significant inter-annual variation, consistent areas of clustering, with the most persistent and evident being in the Northern Great Plains, were demonstrated. Given the wide variety of mosquito species responsible and the environmental conditions they require, further spatio

  13. Potential emission flux to aerosol pollutants over Bengal Gangetic plain through combined trajectory clustering and aerosol source fields analysis

    NASA Astrophysics Data System (ADS)

    Kumar, D. Bharath; Verma, S.

    2016-09-01

    A hybrid source-receptor analysis was carried out to evaluate the potential emission flux to winter monsoon (WinMon) aerosols over Bengal Gangetic plain urban (Kolkata, Kol) and semi-urban atmospheres (Kharagpur, Kgp). This was done through application of fuzzy c-means clustering to back-trajectory data combined with emission flux and residence time weighted aerosols analysis. WinMon mean aerosol optical depth (AOD) and Ångström exponent (AE) at Kol (AOD: 0.77; AE: 1.17) were respectively slightly higher than and nearly equal to that at Kgp (AOD: 0.71; AE: 1.18). Out of six source region clusters over the Indian subcontinent and two over the Indian oceanic region, the cluster mean AOD was the highest when associated with the mean path of air mass originating from the Bay of Bengal and the Arabian Sea clusters at Kol and that from the Indo-Gangetic plain (IGP) cluster at Kgp. Spatial distribution of weighted AOD fields showed the highest potential source of aerosols over the IGP, primarily over upper IGP (e.g. Punjab, Haryana), lower IGP (e.g. Uttar Pradesh) and eastern region (e.g. West Bengal, Bihar, northeast India) clusters. The emission flux contribution potential (EFCP) of fossil fuel (FF) emissions at surface (SL) of Kol/Kgp, elevated layer (EL) of Kol, and of biomass burning (BB) emissions at SL of Kol were primarily from upper, lower, upper/lower IGP clusters respectively. The EFCP of FF/BB emissions at Kgp-EL/SL, and that of BB at EL of Kol/Kgp were mainly from eastern region and Africa (AFR) clusters respectively. Though the AFR cluster was constituted of significantly high emission flux source potential of dust emissions, the EFCP of dust from northwest India (NWI) was comparable to that from AFR at Kol SL/EL.

  14. Theoretical and experimental analysis of ammonia ionic clusters produced by 252Cf fragment impact on an NH3 ice target.

    PubMed

    Fernandez-Lima, F A; Ponciano, C R; Chaer Nascimento, M A; da Silveira, E F

    2006-08-24

    Positively and negatively charged ammonia clusters produced by the impact of 252Cf fission fragments (FF) on an NH3 ice target have been examined theoretically and experimentally. The ammonia clusters generated by 252Cf FF show an exponential dependence of the cluster population on its mass, and the desorption yields for the positive (NH3)nNH4+ clusters are 1 order of magnitude higher than those for the negative (NH3)nNH2- clusters. The experimental population analysis of the (NH3)nNH4+ (n = 0-18) and (NH3)nNH2- (n = 0-8) cluster series shows a special stability at n = 4 and 16 and n = 2, 4, and 6, respectively. DFT/B3LYP calculations of the (NH3)nNH4+ (n = 0-8) clusters show that the structures of the more stable conformers follow a clear pattern: each additional NH3 group makes a new hydrogen bond with one of the hydrogen atoms of an NH3 unit already bound to the NH4+ core. For the (NH3)nNH2- (n = 0-8) clusters, the DFT/B3LYP calculations show that, within the calculation error, the more stable conformers follow a clear pattern for n = 1-6: each additional NH3 group makes a new hydrogen bond to the NH2- core. For n = 7 and 8, the additional NH3 groups bind to other NH3 groups, probably because of the saturation of the NH2- core. Similar results were obtained at the MP2 level of calculation. A stability analysis was performed using the commonly defined stability function E(n-1) + E(n+1) - 2E(n), where E is the total energy of the cluster, including the zero point correction energy (E = Et + ZPE). The trend in the relative stability of the clusters presents an excellent agreement with the distribution of experimental cluster abundances. Moreover, the stability analysis predicts that the (NH3)4NH4+ and the even negative clusters [(NH3)nNH2-, n = 2, 4, and 6] should be the most stable ones, in perfect agreement with the experimental results. PMID:16913675
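
    The stability function quoted in this abstract is a second difference of the total energy, so it can be evaluated directly from a table of energies. The sketch below is illustrative only: the function name `stability` and the energy values are hypothetical placeholders, not data from the paper. With this sign convention, a positive value marks a cluster size that is lower in energy than the average of its two neighbours.

```python
def stability(energies):
    """Second difference S(n) = E(n-1) + E(n+1) - 2*E(n) for every interior n.

    `energies` maps cluster size n to total energy E(n) (electronic + ZPE).
    """
    return {
        n: energies[n - 1] + energies[n + 1] - 2 * energies[n]
        for n in sorted(energies)
        if n - 1 in energies and n + 1 in energies
    }


# Hypothetical total energies in arbitrary units, NOT values from the paper.
example_energies = {0: 0.0, 1: -1.0, 2: -3.0, 3: -4.0}

scores = stability(example_energies)
# The size with the largest positive second difference is the relatively
# most stable one in this toy table.
most_stable = max(scores, key=scores.get)
```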

  15. Survey on granularity clustering.

    PubMed

    Ding, Shifei; Du, Mingjing; Zhu, Hong

    2015-12-01

    With the rapid development of uncertain artificial intelligence and the arrival of the big data era, conventional clustering analysis and granular computing fail to satisfy the requirements of intelligent information processing in this new setting. There is an essential relationship between granular computing and clustering analysis, so some researchers have tried to combine the two. Working from the idea of granularity, researchers have extended clustering analysis and sought the best clustering results with the help of the basic theories and methods of granular computing. The resulting granularity clustering methods have attracted more and more attention. This paper first summarizes the background of granularity clustering and the intrinsic connection between granular computing and clustering analysis, and then reviews the research status and various methods of granularity clustering. Finally, we analyze existing problems and propose further research. PMID:26557926

  16. Morphometry and Cluster Analysis of Low Shield Volcanoes on Earth and Mars

    NASA Astrophysics Data System (ADS)

    Henderson, A.; Christiansen, E. H.; Radebaugh, J.

    2015-12-01

    Volcanoes are common on all terrestrial planets and their morphology is influenced by eruption mechanisms, volumes, and compositions and temperatures of the magmas; these are in turn influenced by the tectonic setting. In an attempt to better understand the relationship between morphometry and volcanic processes, we compared low-shield volcanoes on Syria Planum, Mars, with basaltic shields of the eastern Snake River Plain (eSRP). We used 133 volcanoes on Syria Planum that are covered by MOLA and HRSC elevation data and 246 eSRP shields covered by the NED. Shields on Syria Planum average 191 +/- 88 m tall, 12 +/- 6 km in diameter, 16 +/- 28 km³ in volume, and have 1.7° +/- 0.8° flank slopes. eSRP shields average 83 +/- 44 m tall, 4 +/- 3 km in diameter, 0.8 +/- 2 km³ in volume, and have 2.5° +/- 1° flank slopes. Bivariate plots of morphometric characteristics show that Syria Planum and eSRP low shields form the extremes of the same morphospace shared with some Icelandic olivine tholeiite shields, but is generally distinct from other terrestrial volcanoes. Cluster analysis of SP and eSRP shields with other terrestrial volcanoes separates these volcanoes into one cluster and the majority of them into the same sub-cluster that is distinct from other terrestrial volcanoes. Principal component and cluster analysis of Syria Planum and eSRP shields using height, area, volume, slope, and eccentricity shows that Syria Planum and eSRP low-shields are similar in shape (slope and eccentricity). Apparently, these low shields formed by similar processes involving Hawaiian-type eruptions of low viscosity (mafic) lavas with fissure controlled eruptions, narrowing to central vents. Initially high eruption rates and long, tube-fed lava flows shifted to the development of small lava lakes that repeatedly overflowed, and on some with late fountaining to form steeper spatter ramparts. However, Syria Planum shields are systematically larger than those on the eastern Snake River Plain.
The

  17. Cluster analysis in soft x-ray spectromicroscopy : finding the patterns in complex specimens.

    SciTech Connect

    Lerotic, M.; Jacobsen, C.; Gillow, J. B.; Wirick, S.; Vogt, S.; Maser, J.; Experimental Facilities Division; State Univ. of New York at Stony Brook; BNL

    2005-06-01

    Soft X-ray spectromicroscopy provides spectral data on the chemical speciation of light elements at sub-100 nanometer spatial resolution. If all chemical species in a specimen are known and separately characterized, existing approaches can be used to measure the concentration of each component at each pixel. In other situations such as in biology or environmental science, this approach may not be possible. We have previously described the use of principal component analysis (PCA) to orthogonalize and noise-filter spectromicroscopy data, and cluster analysis (CA) to classify the analyzed data and obtain thickness maps of representative spectra. We describe here an extension of that work employing an angle distance measure; this measure provides better classification based on spectral signatures alone in specimens with significant thickness variations. The method is illustrated using simulated data, and is also used to examine sporulation in the bacterium Clostridium sp.
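
    To illustrate why an angle-based distance helps when thickness varies, the sketch below compares the angle between two spectra with their Euclidean distance. It assumes that thickness acts as an overall scale factor on a spectrum; the function name `spectral_angle` and all spectra are hypothetical, and this is not the authors' implementation.

```python
import math


def spectral_angle(a, b):
    """Angle in radians between two spectra treated as vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Clamp to [-1, 1] to guard against floating-point rounding.
    cosine = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.acos(cosine)


# Hypothetical three-channel spectra (arbitrary units).
thin = [0.1, 0.4, 0.2]
thick = [3 * x for x in thin]   # same material, three times thicker
other = [0.4, 0.1, 0.2]         # different spectral signature

angle_same = spectral_angle(thin, thick)    # essentially zero
angle_other = spectral_angle(thin, other)   # clearly non-zero
```

    In this toy case the Euclidean distance would place `thin` closer to `other` than to `thick`, while the angle groups the two samples of the same material together.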

  18. Diversity of Xiphinema americanum-group Species and Hierarchical Cluster Analysis of Morphometrics

    PubMed Central

    Lamberti, Franco; Ciancio, Aurelio

    1993-01-01

    Of the 39 species composing the Xiphinema americanum group, 14 were described originally from North America and two others have been reported from this region. Many species are very similar morphologically and can be distinguished only by a difficult comparison of various combinations of some morphometric characters. Study of morphometrics of 49 populations, including the type populations of the 39 species attributed to this group, by principal component analysis and hierarchical cluster analysis placed the populations into five subgroups, proposed here as the X. brevicolle subgroup (seven species), the X. americanum subgroup (17 species), the X. taylori subgroup (two species), the X. pachtaicum subgroup (eight species), and the X. lambertii subgroup (five species). PMID:19279776

  19. Application of Factor Analysis on the Financial Ratios of Indian Cement Industry and Validation of the Results by Cluster Analysis

    NASA Astrophysics Data System (ADS)

    De, Anupam; Bandyopadhyay, Gautam; Chakraborty, B. N.

    2010-10-01

    Financial ratio analysis is an important and commonly used tool for analyzing the financial health of a firm. A large number of financial ratios, which can be categorized in different groups, are used for this analysis. However, to reduce the number of ratios used for financial analysis and to regroup them on the basis of empirical evidence, the Factor Analysis technique has been used successfully by different researchers during the last three decades. In this study Factor Analysis has been applied to audited financial data of Indian cement companies for a period of 10 years. The sample companies are listed on the Stock Exchange India (BSE and NSE). Factor Analysis, conducted over 44 variables (financial ratios) grouped in 7 categories, resulted in 11 underlying categories (factors). Each factor is named in an appropriate manner considering the factor loads and constituent variables (ratios). Representative ratios are identified for each such factor. To validate the results of Factor Analysis and to reach a final conclusion regarding the representative ratios, Cluster Analysis was performed.

  20. Quantification and clustering of phenotypic screening data using time-series analysis for chemotherapy of schistosomiasis

    PubMed Central

    2012-01-01

    Background Neglected tropical diseases, especially those caused by helminths, constitute some of the most common infections of the world's poorest people. Development of techniques for automated, high-throughput drug screening against these diseases, especially in whole-organism settings, constitutes one of the great challenges of modern drug discovery. Method We present a method for enabling high-throughput phenotypic drug screening against diseases caused by helminths with a focus on schistosomiasis. The proposed method allows for a quantitative analysis of the systemic impact of a drug molecule on the pathogen as exhibited by the complex continuum of its phenotypic responses. This method consists of two key parts: first, biological image analysis is employed to automatically monitor and quantify shape-, appearance-, and motion-based phenotypes of the parasites. Next, we represent these phenotypes as time-series and show how to compare, cluster, and quantitatively reason about them using techniques of time-series analysis. Results We present results on a number of algorithmic issues pertinent to the time-series representation of phenotypes. These include results on appropriate representation of phenotypic time-series, analysis of different time-series similarity measures for comparing phenotypic responses over time, and techniques for clustering such responses by similarity. Finally, we show how these algorithmic techniques can be used for quantifying the complex continuum of phenotypic responses of parasites. An important corollary is the ability of our method to recognize and rigorously group parasites based on the variability of their phenotypic response to different drugs. Conclusions The methods and results presented in this paper enable automatic and quantitative scoring of high-throughput phenotypic screens focused on helminthic diseases. Furthermore, these methods allow us to analyze and stratify parasites based on their phenotypic response to drugs

  1. The Cluster Ages Experiment (CASE). VII. Analysis of Two Eclipsing Binaries in the Globular Cluster NGC 6362

    NASA Astrophysics Data System (ADS)

    Kaluzny, J.; Thompson, I. B.; Dotter, A.; Rozyczka, M.; Schwarzenberg-Czerny, A.; Burley, G. S.; Mazur, B.; Rucinski, S. M.

    2015-11-01

    We use photometric and spectroscopic observations of the detached eclipsing binaries V40 and V41 in the globular cluster NGC 6362 to derive masses, radii, and luminosities of the component stars. The orbital periods of these systems are 5.30 and 17.89 days, respectively. The measured masses of the primary and secondary components (Mp, Ms) are (0.8337 ± 0.0063, 0.7947 ± 0.0048) M⊙ for V40 and (0.8215 ± 0.0058, 0.7280 ± 0.0047) M⊙ for V41. The measured radii (Rp, Rs) are (1.3253 ± 0.0075, 0.997 ± 0.013) R⊙ for V40 and (1.0739 ± 0.0048, 0.7307 ± 0.0046) R⊙ for V41. Based on the derived luminosities, we find that the distance modulus of the cluster is 14.74 ± 0.04 mag, in good agreement with 14.72 mag obtained from color-magnitude diagram (CMD) fitting. We compare the absolute parameters of component stars with theoretical isochrones in mass-radius and mass-luminosity diagrams. For assumed abundances [Fe/H] = -1.07, [α/Fe] = 0.4, and Y = 0.25 we find the most probable age of V40 to be 11.7 ± 0.2 Gyr, compatible with the age of the cluster derived from CMD fitting (12.5 ± 0.5 Gyr). V41 seems to be markedly younger than V40. If independently confirmed, this result will suggest that V41 belongs to the younger of the two stellar populations recently discovered in NGC 6362. The orbits of both systems are eccentric. Given the orbital period and age of V40, its orbit should have been tidally circularized some ~7 Gyr ago. The observed eccentricity is most likely the result of a relatively recent close stellar encounter. This paper includes data gathered with the 6.5-m Magellan Baade and Clay Telescopes, and the 2.5-m du Pont Telescope located at Las Campanas Observatory, Chile.
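
    For readers who want to turn the quoted distance modulus into a distance, the standard relation is m - M = 5 log10(d / 10 pc). The sketch below applies it; the function name `distance_pc` is hypothetical, and treating the quoted value without any separate extinction correction is an assumption, since the abstract does not say how reddening was handled.

```python
def distance_pc(distance_modulus):
    """Distance in parsecs from a distance modulus m - M = 5*log10(d / 10 pc)."""
    return 10 ** (distance_modulus / 5 + 1)


# Distance modulus quoted in the abstract for NGC 6362.
# Assumption: no additional extinction term is applied here.
d_ngc6362 = distance_pc(14.74)   # a little under 9 kpc
```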

  2. Spherical cluster analysis for beam angle optimization in intensity-modulated radiation therapy treatment planning

    NASA Astrophysics Data System (ADS)

    Bangert, Mark; Oelfke, Uwe

    2010-10-01

    An intuitive heuristic to establish beam configurations for intensity-modulated radiation therapy is introduced as an extension of beam ensemble selection strategies applying scalar scoring functions. It is validated by treatment plan comparisons for three intra-cranial, pancreas, and prostate cases each. Based on a patient specific matrix listing the radiological quality of candidate beam directions individually for every target voxel, a set of locally ideal beam angles is generated. The spherical distribution of locally ideal beam angles is characteristic for every treatment site and patient: ideal beam angles typically cluster around distinct orientations. We interpret the cluster centroids, which are identified with a spherical K-means algorithm, as irradiation angles of an intensity-modulated radiation therapy treatment plan. The fluence profiles are subsequently optimized during a conventional inverse planning process. The average computation time for the pre-optimization of a beam ensemble is six minutes on a state-of-the-art work station. The treatment planning study demonstrates the potential benefit of the proposed beam angle optimization strategy. For the three prostate cases under investigation, the standard treatment plans applying nine coplanar equi-spaced beams and treatment plans applying an optimized non-coplanar nine-beam ensemble yield clinically comparable dose distributions. For symmetric patient geometries, the dose distribution formed by nine equi-spaced coplanar beams cannot be improved significantly. For the three pancreas and intra-cranial cases under investigation, the optimized non-coplanar beam ensembles enable better sparing of organs at risk while guaranteeing equivalent target coverage. Beam angle optimization by spherical cluster analysis shows the biggest impact for target volumes located asymmetrically within the patient and close to organs at risk.
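
    The clustering step described here can be sketched as a spherical K-means: directions are unit vectors, points are assigned to the centroid with the highest cosine similarity, and each centroid is the renormalized mean of its members. The code below is a minimal pure-Python illustration with hypothetical directions and caller-supplied starting centroids; it is not the authors' implementation and omits the radiological scoring that produces the candidate beam angles.

```python
import math


def normalize(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return tuple(x / norm for x in v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def spherical_kmeans(points, initial_centroids, iterations=20):
    """Cluster directions on the unit sphere by cosine similarity."""
    points = [normalize(p) for p in points]
    centroids = [normalize(c) for c in initial_centroids]
    labels = []
    for _ in range(iterations):
        # Assignment step: nearest centroid means largest cosine similarity.
        labels = [
            max(range(len(centroids)), key=lambda k: dot(p, centroids[k]))
            for p in points
        ]
        # Update step: renormalized mean direction of each cluster.
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if not members:
                continue  # keep the previous centroid for an empty cluster
            mean = [sum(axis) for axis in zip(*members)]
            if any(abs(x) > 1e-12 for x in mean):
                centroids[k] = normalize(mean)
    return centroids, labels


# Hypothetical "locally ideal" beam directions, two loose groups.
directions = [(0.0, 0.1, 1.0), (0.1, 0.0, 2.0), (1.0, 0.1, 0.0), (3.0, 0.0, 0.2)]
centroids, labels = spherical_kmeans(directions, [(0, 0, 1), (1, 0, 0)])
```

    In a planning context each returned centroid would be read as one candidate irradiation direction; the number of centroids plays the role of the number of beams.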

  3. pong: fast analysis and visualization of latent clusters in population genetic data

    PubMed Central

    Behr, Aaron A.; Liu, Katherine Z.; Liu-Fang, Gracie; Nakka, Priyanka; Ramachandran, Sohini

    2016-01-01

    Motivation: A series of methods in population genetics use multilocus genotype data to assign individuals membership in latent clusters. These methods belong to a broad class of mixed-membership models, such as latent Dirichlet allocation used to analyze text corpora. Inference from mixed-membership models can produce different output matrices when repeatedly applied to the same inputs, and the number of latent clusters is a parameter that is often varied in the analysis pipeline. For these reasons, quantifying, visualizing, and annotating the output from mixed-membership models are bottlenecks for investigators across multiple disciplines from ecology to text data mining. Results: We introduce pong, a network-graphical approach for analyzing and visualizing membership in latent clusters with a native interactive D3.js visualization. pong leverages efficient algorithms for solving the Assignment Problem to dramatically reduce runtime while increasing accuracy compared with other methods that process output from mixed-membership models. We apply pong to 225 705 unlinked genome-wide single-nucleotide variants from 2426 unrelated individuals in the 1000 Genomes Project, and identify previously overlooked aspects of global human population structure. We show that pong outpaces current solutions by more than an order of magnitude in runtime while providing a customizable and interactive visualization of population structure that is more accurate than those produced by current tools. Availability and Implementation: pong is freely available and can be installed using the Python package management system pip. pong’s source code is available at https://github.com/abehr/pong. Contact: aaron_behr@alumni.brown.edu or sramachandran@brown.edu Supplementary Information: Supplementary data are available at Bioinformatics online. PMID:27283948

  4. Systematic X-Ray Analysis of Radio Relic Clusters with Suzaku

    NASA Astrophysics Data System (ADS)

    Akamatsu, Hiroki; Kawahara, Hajime

    2013-02-01

    We undertook a systematic X-ray analysis of six giant radio relics in four clusters of galaxies using the Suzaku satellite. The sample included CIZA 2242.8+5301, Zwcl 2341.1-0000, the South-East part of A 3667 and previously published results of the North-West part of A 3667 and A 3376. In particular, we made the first Suzaku observation of the narrow (50 kpc) relic of CIZA 2242.8+5301, which enabled us to reduce the projection effect. We report on X-ray detections of shocks at the positions of the relics in CIZA 2242.8+5301 and A 3667 SE. At the positions of the two relics in Zwcl 2341.1-0000, we did not detect shocks. From spectroscopic temperature profiles across the relic, we found that the temperature profiles exhibit significant jumps across the relics for CIZA 2242.8+5301, A 3376, A 3667 NW, and A 3667 SE. We estimated the Mach number from the X-ray temperature or pressure profile using the Rankine-Hugoniot jump condition, and compared it with the Mach number derived from the radio spectral index. The resulting Mach numbers (M = 1.5-3) are almost consistent with each other, while the Mach number of CIZA 2242.8+5301, derived from the X-ray data, tends to be lower than that of the radio observation. These results indicate that the giant radio relics in merging clusters are related to the shock structure, as suggested by previous studies of individual clusters.
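
    The Mach number estimate from a temperature jump can be sketched with the Rankine-Hugoniot relation for an ideal monatomic gas (adiabatic index 5/3), for which T2/T1 = (5M² - 1)(M² + 3) / (16M²). The adiabatic index, the function names, and the sample ratio below are assumptions made for illustration; the abstract does not state the exact form the authors used.

```python
import math


def temperature_ratio(mach):
    """Post-shock over pre-shock temperature for adiabatic index 5/3."""
    m2 = mach ** 2
    return (5 * m2 - 1) * (m2 + 3) / (16 * m2)


def mach_from_temperature_ratio(ratio):
    """Invert the jump condition: solve 5x^2 + (14 - 16r)x - 3 = 0 for x = M^2."""
    if ratio < 1:
        raise ValueError("a shock requires T2/T1 >= 1")
    b = 14 - 16 * ratio
    m2 = (-b + math.sqrt(b * b + 60)) / 10   # positive root only
    return math.sqrt(m2)


# Hypothetical temperature jump, not a value from the paper.
example_mach = mach_from_temperature_ratio(133 / 64)
```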

  5. Analyzing patients' values by applying cluster analysis and LRFM model in a pediatric dental clinic in Taiwan.

    PubMed

    Wu, Hsin-Hung; Lin, Shih-Yen; Liu, Chih-Wei

    2014-01-01

    This study combines cluster analysis and LRFM (length, recency, frequency, and monetary) model in a pediatric dental clinic in Taiwan to analyze patients' values. A two-stage approach by self-organizing maps and K-means method is applied to segment 1,462 patients into twelve clusters. The average values of L, R, and F excluding monetary covered by national health insurance program are computed for each cluster. In addition, customer value matrix is used to analyze customer values of twelve clusters in terms of frequency and monetary. Customer relationship matrix considering length and recency is also applied to classify different types of customers from these twelve clusters. The results show that three clusters can be classified into loyal patients with L, R, and F values greater than the respective average L, R, and F values, while three clusters can be viewed as lost patients without any variable above the average values of L, R, and F. When different types of patients are identified, marketing strategies can be designed to meet different patients' needs. PMID:25045741
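
    The loyal/lost rule described in this abstract compares each cluster's mean L, R, and F with the overall averages. The sketch below encodes that rule as stated; the function name, the "other" label for mixed cases, and all numbers are hypothetical, and the clustering itself (self-organizing maps followed by K-means) is not reproduced.

```python
def classify_clusters(cluster_means, overall_mean):
    """Label clusters by comparing mean L, R, F with the overall averages."""
    labels = {}
    for name, means in cluster_means.items():
        above = [means[key] > overall_mean[key] for key in ("L", "R", "F")]
        if all(above):
            labels[name] = "loyal"   # every variable above average
        elif not any(above):
            labels[name] = "lost"    # no variable above average
        else:
            labels[name] = "other"   # mixed profile, needs further analysis
    return labels


# Hypothetical averages and cluster means, NOT data from the study.
overall = {"L": 300.0, "R": 200.0, "F": 4.0}
clusters = {
    "c1": {"L": 450.0, "R": 260.0, "F": 7.5},
    "c2": {"L": 120.0, "R": 90.0, "F": 1.5},
    "c3": {"L": 400.0, "R": 150.0, "F": 5.0},
}
segment_labels = classify_clusters(clusters, overall)
```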

  6. Analyzing Patients' Values by Applying Cluster Analysis and LRFM Model in a Pediatric Dental Clinic in Taiwan

    PubMed Central

    Lin, Shih-Yen; Liu, Chih-Wei

    2014-01-01

    This study combines cluster analysis and LRFM (length, recency, frequency, and monetary) model in a pediatric dental clinic in Taiwan to analyze patients' values. A two-stage approach by self-organizing maps and K-means method is applied to segment 1,462 patients into twelve clusters. The average values of L, R, and F excluding monetary covered by national health insurance program are computed for each cluster. In addition, customer value matrix is used to analyze customer values of twelve clusters in terms of frequency and monetary. Customer relationship matrix considering length and recency is also applied to classify different types of customers from these twelve clusters. The results show that three clusters can be classified into loyal patients with L, R, and F values greater than the respective average L, R, and F values, while three clusters can be viewed as lost patients without any variable above the average values of L, R, and F. When different types of patients are identified, marketing strategies can be designed to meet different patients' needs. PMID:25045741

  7. XMM-Newton analysis of a newly discovered, extremely X-ray luminous galaxy cluster at high redshift

    NASA Astrophysics Data System (ADS)

    Thoelken, S.; Schrabback, T.

    2016-06-01

    Galaxy clusters, the largest virialized structures in the universe, provide an excellent method to test cosmology on large scales. The galaxy cluster mass function as a function of redshift is a key tool to determine the fundamental cosmological parameters and especially measurements at high redshifts can e.g. provide constraints on dark energy. The fgas test as a direct cosmological probe is of special importance. Therefore, relaxed galaxy clusters at high redshifts are needed but these objects are considered to be extremely rare in current structure formation models. Here we present first results from an XMM-Newton analysis of an extremely X-ray luminous, newly discovered and potentially cool core cluster at a redshift of z=0.9. We carefully account for background emission and PSF effects and model the cluster emission in three radial bins. Our preliminary results suggest that this cluster is indeed a good candidate for a cool core cluster and thus potentially of extreme value for cosmology.

  8. Task Analysis for Health Occupations. Cluster: Nursing. Occupation: Professional Nurse (Associate Degree). Education for Employment Task Lists.

    ERIC Educational Resources Information Center

    Lake County Area Vocational Center, Grayslake, IL.

    This document contains a task analysis for health occupations (professional nurse) in the nursing cluster. For each task listed, occupation, duty area, performance standard, steps, knowledge, attitudes, safety, equipment/supplies, source of analysis, and Illinois state goals for learning are listed. For the duty area of "providing therapeutic…

  9. Morphology and evolution of simulated and optical clusters: a comparative analysis

    NASA Astrophysics Data System (ADS)

    Rahman, Nurur; Krywult, Janusz; Motl, Patrick M.; Flin, Piotr; Shandarin, Sergei F.

    2006-04-01

    We have made a comparative study of morphological evolution in simulated dark matter (DM) haloes and X-ray brightness distribution, and in optical clusters. Samples of simulated clusters include star formation with supernovae feedback, radiative cooling and simulation in the adiabatic limit at three different redshifts, z = 0.0, 0.10 and 0.25. The optical sample contains 208 Abell, Corwin & Olowin (ACO) clusters within redshift z ≤ 0.25. Cluster morphology, within 0.5 and 1.0 h⁻¹ Mpc from cluster centre, is quantified by multiplicity and ellipticity. We find that the distribution of the DM haloes in the adiabatic simulation appears to be more elongated than the galaxy clusters. Radiative cooling brings halo shapes in excellent agreement with observed clusters; however, cooling along with feedback mechanism makes the haloes more flattened. Our results indicate relatively stronger structural evolution and more clumpy distributions in observed clusters than in the structure of simulated clusters, and slower increase in simulated cluster shapes compared to those in the observed one. Within z ≤ 0.1, we note an interesting agreement in the shapes of clusters obtained from the cooling simulations and observation. We also note that the different samples of observed clusters differ significantly in morphological evolution with redshift. We highlight a few possibilities responsible for the discrepancy in morphological evolution of simulated and observed clusters.

  10. Analysis of FOXF1 and the FOX gene cluster in patients with VACTERL association.

    PubMed

    Agochukwu, Nneamaka B; Pineda-Alvarez, Daniel E; Keaton, Amelia A; Warren-Mora, Nicole; Raam, Manu S; Kamat, Aparna; Chandrasekharappa, Settara C; Solomon, Benjamin D

    2011-01-01

    VACTERL association, a relatively common condition with an incidence of approximately 1 in 20,000-35,000 births, is a non-random association of birth defects that includes vertebral defects (V), anal atresia (A), cardiac defects (C), tracheo-esophageal fistula (TE), renal anomalies (R) and limb malformations (L). Although the etiology is unknown in the majority of patients, there is evidence that it is causally heterogeneous. Several studies have shown evidence for inheritance in VACTERL, implying a role for genetic loci. Recently, patients with component features of VACTERL and a lethal developmental pulmonary disorder, alveolar capillary dysplasia with misalignment of pulmonary veins (ACD/MPV), were found to harbor deletions or mutations affecting FOXF1 and the FOX gene cluster on chromosome 16q24. We investigated this gene through direct sequencing and high-density SNP microarray in 12 patients with VACTERL association but without ACD/MPV. Our mutational analysis of FOXF1 showed normal sequences and no genomic imbalances affecting the FOX gene cluster on chromosome 16q24 in the studied patients. Possible explanations for these results include the etiologic and clinical heterogeneity of VACTERL association, the possibility that mutations affecting this gene may occur only in more severely affected individuals, and insufficient study sample size. PMID:21315191

  11. CN and CH Abundance Analysis in a Sample of Eight Galactic Globular Clusters

    NASA Astrophysics Data System (ADS)

    Smolinski, Jason P.; Lee, Y.; Beers, T. C.; Martell, S. L.; An, D.; Sivarani, T.

    2011-01-01

    Galactic globular clusters exhibit star-to-star variations in their light element abundances that are not predicted by formation and evolution models involving single stellar generations. Recently it has been suggested that internal pollution from early supernovae and AGB winds may have played important roles in forming a second generation of enriched stars. We present updated results of a CN and CH abundance analysis of stars from the base to the tip of the red giant branch, and in some cases down onto the main sequence, for eight globular clusters with available photometric and spectroscopic data from SDSS-I and SDSS-II/SEGUE. These results include a discussion of the radial distribution of CN enrichment and how this may impact the current paradigm. Funding for SDSS-I and SDSS-II has been provided by the Alfred P. Sloan Foundation, the Participating Institutions, the National Science Foundation, the U.S. Department of Energy, the National Aeronautics and Space Administration, the Japanese Monbukagakusho, the Max Planck Society, and the Higher Education Funding Council for England. The SDSS Web Site is http://www.sdss.org/. This work was supported in part by grants PHY 02-16783 and PHY 08-22648: Physics Frontiers Center/Joint Institute for Nuclear Astrophysics (JINA), awarded by the U.S. National Science Foundation.

  12. Outcome of patients with autoimmune diseases in the intensive care unit: a mixed cluster analysis

    PubMed Central

    Bernal-Macías, Santiago; Reyes-Beltrán, Benjamín; Molano-González, Nicolás; Augusto Vega, Daniel; Bichernall, Claudia; Díaz, Luis Aurelio; Rojas-Villarraga, Adriana; Anaya, Juan-Manuel

    2015-01-01

    Objectives: Interest in autoimmune diseases (ADs) and their outcome in the intensive care unit (ICU) has increased because of the clinical challenges they pose for diagnosis, management and prognosis. The current work presents a year of experience on these topics in a tertiary hospital. Methods: A mixed-cluster methodology based on multivariate descriptive methods such as principal component analysis and multiple correspondence analysis was used to summarize sets of related variables with strong associations and common clinical context. Results: Fifty adult patients with ADs with a mean age of 46.7±17.55 years were assessed. The two most common diagnoses were systemic lupus erythematosus and systemic sclerosis, registered in 45% and 20% of patients, respectively. The main causes of admission to ICU were infection and AD flare-up, observed in 36% and 24%, respectively. Mortality during ICU stay was 24%. The length of hospital stay before ICU admission, shock, vasopressors, mechanical ventilation, abdominal sepsis, Glasgow score and plasmapheresis were all factors associated with mortality. Two new clinical cluster variables (NCVs) were defined: Time ICU and ICU Support Profile, which were associated with survivor and no survivor variables. Conclusions: Identification of single factors and groups of factors from NCVs will allow implementation of early and aggressive therapies in patients with ADs at the ICU in order to avoid fatal outcomes. PMID:26688741

  13. A systematic computational analysis of biosynthetic gene cluster evolution: lessons for engineering biosynthesis.

    PubMed

    Medema, Marnix H; Cimermancic, Peter; Sali, Andrej; Takano, Eriko; Fischbach, Michael A

    2014-12-01

    Bacterial secondary metabolites are widely used as antibiotics, anticancer drugs, insecticides and food additives. Attempts to engineer their biosynthetic gene clusters (BGCs) to produce unnatural metabolites with improved properties are often frustrated by the unpredictability and complexity of the enzymes that synthesize these molecules, suggesting that genetic changes within BGCs are limited by specific constraints. Here, by performing a systematic computational analysis of BGC evolution, we derive evidence for three findings that shed light on the ways in which, despite these constraints, nature successfully invents new molecules: 1) BGCs for complex molecules often evolve through the successive merger of smaller sub-clusters, which function as independent evolutionary entities. 2) An important subset of polyketide synthases and nonribosomal peptide synthetases evolve by concerted evolution, which generates sets of sequence-homogenized domains that may hold promise for engineering efforts since they exhibit a high degree of functional interoperability. 3) Individual BGC families evolve in distinct ways, suggesting that design strategies should take into account family-specific functional constraints. These findings suggest novel strategies for using synthetic biology to rationally engineer biosynthetic pathways. PMID:25474254

  14. Chronic low back pain patient groups in primary care – A cross sectional cluster analysis

    PubMed Central

    2013-01-01

    Background: Due to the heterogeneous nature of chronic low back pain (CLBP), it is necessary to identify patient groups and evaluate treatments within these groups. We aimed to identify groups of patients with CLBP in the primary care setting. Methods: We performed a k-means cluster analysis on a large data set (n = 634) of primary care patients with CLBP. Variables of sociodemographic data, pain characteristics, psychological status (i.e., depression, anxiety, somatization), and the patient resources of resilience and coping strategies were included. Results: We found three clusters that can be characterized as “pensioners with age-associated pain caused by degenerative diseases”, “middle-aged patients with high mental distress and poor coping resources”, and “middle-aged patients who are less pain-affected and better positioned with regard to their mental health”. Conclusions: Our results supported current knowledge concerning groups of CLBP patients in primary care. In particular, we identified a group that was most disabled and distressed, and which was mainly characterized by psychological variables. As shown in our study, pain-related coping strategies and resilience were low in these patients and might be addressed in differentiating treatment strategies. Future studies should focus on the identification of this group in order to achieve effective treatment allocation. Trial registration: German Clinical Trial Register DRKS00003123. PMID:24131707
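
    The k-means step named in the Methods above can be sketched as follows. This is a minimal illustration, not the study's pipeline: the three variables (age, pain score, distress score), the nine toy rows, the z-score standardisation and the deterministic farthest-first initialisation are all assumptions introduced here, and the numbers are invented for demonstration only.

```python
from statistics import mean, pstdev


def zscore(rows):
    """Standardise each column to zero mean and unit standard deviation."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in rows]


def sqdist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first(rows, k):
    """Assumed initialisation: start at row 0, then repeatedly pick the row
    farthest from all centres chosen so far (deterministic, no randomness)."""
    centres = [rows[0]]
    while len(centres) < k:
        centres.append(max(rows, key=lambda r: min(sqdist(r, c) for c in centres)))
    return [list(c) for c in centres]


def kmeans(rows, k, max_iter=100):
    """Lloyd's algorithm: alternate nearest-centre assignment and centre update."""
    centres = farthest_first(rows, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sqdist(row, centres[j])) for row in rows]
        if new_labels == labels:  # assignments stable, so the algorithm has converged
            break
        labels = new_labels
        for j in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == j]
            if members:  # keep the old centre if a cluster becomes empty
                centres[j] = [mean(col) for col in zip(*members)]
    return labels, centres


# Hypothetical toy data: (age in years, pain score 0-10, distress score 0-10).
toy_patients = [
    (70, 6, 3), (68, 7, 2), (72, 6, 3),   # older, moderate pain, low distress
    (45, 8, 9), (47, 9, 8), (44, 8, 9),   # middle-aged, high pain and distress
    (46, 3, 2), (43, 2, 3), (48, 3, 2),   # middle-aged, low pain and distress
]

labels, centres = kmeans(zscore(toy_patients), k=3)
print(labels)
```

    Standardising first keeps the age column, which has the largest raw scale, from dominating the distance; whether the study standardised its variables is not stated in the abstract.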

  15. Analysis of FOXF1 and the FOX gene cluster in patients with VACTERL association

    PubMed Central

    Agochukwu, Nneamaka B.; Pineda-Alvarez, Daniel E.; Keaton, Amelia A.; Warren-Mora, Nicole; Raam, Manu S.; Kamat, Aparna; Chandrasekharappa, Settara C.; Solomon, Benjamin D.

    2011-01-01

    VACTERL association, a relatively common condition with an incidence of approximately 1 in 20,000 – 35,000 births, is a non-random association of birth defects that includes vertebral defects (V), anal atresia (A), cardiac defects (C), tracheo-esophageal fistula (TE), renal anomalies (R) and limb malformations (L). Although the etiology is unknown in the majority of patients, there is evidence that it is causally heterogeneous. Several studies have shown evidence for inheritance in VACTERL, implying a role for genetic loci. Recently, patients with component features of VACTERL and a lethal developmental pulmonary disorder, alveolar capillary dysplasia with misalignment of pulmonary veins (ACD/MPV), were found to harbor deletions or mutations affecting FOXF1 and the FOX gene cluster on chromosome 16q24. We investigated this gene through direct sequencing and high-density SNP microarray in 12 patients with VACTERL association but without ACD/MPV. Our mutational analysis of FOXF1 showed normal sequences and no genomic imbalances affecting the FOX gene cluster on chromosome 16q24 in the studied patients. Possible explanations for these results include the etiologic and clinical heterogeneity of VACTERL association, the possibility that mutations affecting this gene may occur only in more severely affected individuals, and insufficient study sample size. PMID:21315191

  16. A Systematic Computational Analysis of Biosynthetic Gene Cluster Evolution: Lessons for Engineering Biosynthesis

    PubMed Central

    Sali, Andrej; Takano, Eriko; Fischbach, Michael A.

    2014-01-01

    Bacterial secondary metabolites are widely used as antibiotics, anticancer drugs, insecticides and food additives. Attempts to engineer their biosynthetic gene clusters (BGCs) to produce unnatural metabolites with improved properties are often frustrated by the unpredictability and complexity of the enzymes that synthesize these molecules, suggesting that genetic changes within BGCs are limited by specific constraints. Here, by performing a systematic computational analysis of BGC evolution, we derive evidence for three findings that shed light on the ways in which, despite these constraints, nature successfully invents new molecules: 1) BGCs for complex molecules often evolve through the successive merger of smaller sub-clusters, which function as independent evolutionary entities. 2) An important subset of polyketide synthases and nonribosomal peptide synthetases evolve by concerted evolution, which generates sets of sequence-homogenized domains that may hold promise for engineering efforts since they exhibit a high degree of functional interoperability. 3) Individual BGC families evolve in distinct ways, suggesting that design strategies should take into account family-specific functional constraints. These findings suggest novel strategies for using synthetic biology to rationally engineer biosynthetic pathways. PMID:25474254

  17. Cluster Analysis of Vortical Flow in Simulations of Cerebral Aneurysm Hemodynamics.

    PubMed

    Oeltze-Jafra, Steffen; Cebral, Juan R; Janiga, Gábor; Preim, Bernhard

    2016-01-01

    Computational fluid dynamic (CFD) simulations of blood flow provide new insights into the hemodynamics of vascular pathologies such as cerebral aneurysms. Understanding the relations between hemodynamics and aneurysm initiation, progression, and risk of rupture is crucial in diagnosis and treatment. Recent studies link the existence of vortices in the blood flow pattern to aneurysm rupture and report observations of embedded vortices (a larger vortex encloses a smaller one flowing in the opposite direction) whose implications are unclear. We present a clustering-based approach for the visual analysis of vortical flow in simulated cerebral aneurysm hemodynamics. We show how embedded vortices develop at saddle-node bifurcations on vortex core lines and convey the participating flow at full manifestation of the vortex by a fast and smart grouping of streamlines and the visualization of group representatives. The grouping result may be refined based on spectral clustering generating a more detailed visualization of the flow pattern, especially further off the core lines. We aim at supporting CFD engineers researching the biological implications of embedded vortices. PMID:26390475

  18. Connecting subsistence harvest and marine ecology: A cluster analysis of communities by fishing and hunting patterns

    NASA Astrophysics Data System (ADS)

    Renner, Martin; Huntington, Henry P.

    2014-11-01

    Alaska Native subsistence hunters and fishers are engaged in environmental sampling, influenced by harvest technology and cultural preferences as well as biogeographical factors. We compared subsistence harvest patterns in 35 communities along the Bering, Chukchi, and Beaufort coasts of Alaska to identify affinities and groupings, and to compare those results with previous ecological analyses done for the same region. We used hierarchical cluster analysis to reveal spatial patterns in subsistence harvest records of coastal Alaska Native villages from the southern Bering Sea to the Beaufort Sea. Three main clusters were identified, correlating strongly with geography. The main division separates coastal villages of western Alaska from arctic villages along the northern Chukchi and Beaufort Seas and on islands of the Bering Sea. K-means groupings corroborate this result, with some differences. The second node splits the arctic villages, along the Chukchi, Beaufort and northern Bering Seas, where marine mammals dominate the harvest, from those on islands of the Bering Sea, characterized by seabird and seal harvests. These patterns closely resemble eco-regions proposed on biological grounds. Biogeography thus appears to be a significant factor in groupings by harvest characteristics, suggesting that subsistence harvests are a viable form of ecosystem sampling.
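
    The hierarchical step described above can be illustrated with a small agglomerative sketch. Everything specific here is an assumption for illustration: the abstract does not state the linkage or distance used, so average linkage on Euclidean distance is chosen arbitrarily, and the six "villages" with their harvest fractions (marine mammals, fish, seabirds) are invented toy values, not the study's records.

```python
import math


def average_linkage(points, a, b):
    """Mean pairwise Euclidean distance between clusters a and b (index lists)."""
    total = sum(math.dist(points[i], points[j]) for i in a for j in b)
    return total / (len(a) * len(b))


def agglomerate(points, n_clusters):
    """Bottom-up clustering: start with singletons and repeatedly merge the
    closest pair of clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]
    merges = []  # (members of first cluster, members of second cluster, distance)
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(points, clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        clusters[a] = clusters[a] + clusters[b]  # merged cluster stays at index a
        del clusters[b]
    return clusters, merges


# Hypothetical harvest composition per village:
# (marine mammal fraction, fish fraction, seabird fraction).
toy_villages = [
    (0.80, 0.15, 0.05), (0.75, 0.20, 0.05),  # mammal-dominated harvest
    (0.20, 0.75, 0.05), (0.25, 0.70, 0.05),  # fish-dominated harvest
    (0.30, 0.10, 0.60), (0.35, 0.10, 0.55),  # seabird-heavy harvest
]

three_groups, _ = agglomerate(toy_villages, 3)
two_groups, _ = agglomerate(toy_villages, 2)
print(three_groups)
print(two_groups)
```

    Cutting the same merge sequence at different cluster counts is what yields the nested "main division" and "second node" structure the abstract describes; the toy values were chosen so that the example has that shape.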

  19. Active Tectonics of Southern California Revealed by Cluster Analysis of GPS Velocities

    NASA Astrophysics Data System (ADS)

    Thatcher, W. R.; Savage, J. C.; Simpson, R. W.

    2013-12-01

    We use cluster analysis of the USGS National Seismic Hazard Map GPS velocity field for southern California with standard deviations < 1 mm/yr to determine velocity gradients that locate the most important faults, the elastic strain associated with them, and regions of possible block-like behavior. Seven to ten well resolved clusters are statistically significant and spatially distinct with small overlap. In map view (see figure), the 7-cluster solution shows bands of relatively constant velocity sub-parallel to the San Andreas (SAF) and San Jacinto (SJF) faults and the major faults of the eastern Mojave shear zone (EMSZ). These bands are due both to elastic strain accumulation on the SAF and relative motion across lower slip rate faults in the EMSZ and Los Angeles and Ventura basins. At the largest scale, the 7-cluster map shows two main trends. The blue dots define the SJ and SA faults from northwest of the Salton Sea (SS) to Parkfield (P); the grey/magenta boundary suggests that the defined Eastern California Shear Zone could be extended farther south to the Salton Sea. The short ~80-km-long San Gorgonio Pass-San Bernardino Mountains (SGP) segment of the SAF has a much lower slip rate, ~7 mm/yr of right-lateral oblique convergence. As generally shown by previous GPS studies, right-lateral strike-slip movement rates vary considerably along the SAF. In the Imperial Valley (IV) the rate is ~40 mm/yr; east of the Salton Sea it drops to ~20 mm/yr, with 10-15 mm/yr having been shunted westward to the SJF; north of the Salton Sea ~10-15 mm/yr of strike-slip is transferred to the faults of the eastern Mojave; therefore the east-trending faults of San Gorgonio Pass (SGP) take up only ~5 mm/yr of strike slip and approximately equal amounts of north-south shortening; on the Mojave (M) segment of the SAF the slip rate increases to ~15-20 mm/yr in the vicinity of Cajon Pass (CP) because of transfer of SJF slip back onto the San Andreas; northwest of Tejon Pass the rate increases again to

  20. VizieR Online Data Catalog: Slug analysis of star clusters in NGC 628 & 7793 (Krumholz+, 2015)

    NASA Astrophysics Data System (ADS)

    Krumholz, M. R.; Adamo, A.; Fumagalli, M.; Wofford, A.; Calzetti, D.; Lee, J. C.; Whitmore, B. C.; Bright, S. N.; Grasha, K.; Gouliermis, D. A.; Kim, H.; Nair, P.; Ryon, J. E.; Smith, L. J.; Thilker, D.; Ubeda, L.; Zackrisson, E.

    2016-02-01

    In this paper we use slug, the Stochastically Lighting Up Galaxies code (da Silva et al. 2012ApJ...745..145D, 2014MNRAS.444.3275D; Krumholz et al. 2015MNRAS.452.1447K), and its post-processing tool for analysis of star cluster properties, cluster_slug, to analyze an initial sample of clusters from the LEGUS (Calzetti et al. 2015AJ....149...51C). A description of the steps required to produce final cluster catalogs of the Legacy Extragalactic UV Survey (LEGUS) targets can be found in Calzetti et al. (2015AJ....149...51C), and in A. Adamo et al. (2015, in preparation). LEGUS is an HST Cycle 21 Treasury program that is imaging 50 nearby galaxies in five broadbands with the WFC3/UVIS, from the NUV to the I band. (1 data file).

  1. IDCS J1426.5+3508: Weak Lensing Analysis of a Massive Galaxy Cluster at z = 1.75

    NASA Astrophysics Data System (ADS)

    Mo, Wenli; Gonzalez, Anthony; Jee, M. James; Massey, Richard; Rhodes, Jason; Brodwin, Mark; Eisenhardt, Peter; Marrone, Daniel P.; Stanford, S. A.; Zeimann, Gregory R.

    2016-02-01

    We present a weak lensing study of the galaxy cluster IDCS J1426.5+3508 at z = 1.75, which is the highest-redshift strong lensing cluster known and the most distant cluster for which a weak lensing analysis has been undertaken. Using F160W, F814W, and F606W observations with the Hubble Space Telescope, we detect tangential shear at 2σ significance. Fitting a Navarro–Frenk–White mass profile to the shear with a theoretical median mass-concentration relation, we derive a mass $M_{200,\mathrm{crit}} = 2.3^{+2.1}_{-1.4} \times 10^{14}\,M_\odot$. This mass is consistent with previous mass estimates from the Sunyaev–Zel’dovich (SZ) effect, X-ray, and strong lensing. The cluster lies on the local SZ–weak lensing mass scaling relation observed at low redshift, indicative of minimal evolution in this relation.
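
    To make the quoted mass concrete, the sketch below converts an M200,crit value into the radius r200 and evaluates the standard Navarro–Frenk–White enclosed-mass formula. It is illustrative only: the flat ΛCDM parameters (H0 = 70 km/s/Mpc, Ωm = 0.3) and the concentration c = 4 are assumptions made here, not values taken from the paper, which used a mass-concentration relation that the abstract does not specify.

```python
import math

G = 4.30091e-9  # gravitational constant in Mpc (km/s)^2 / Msun


def rho_crit(z, h0=70.0, omega_m=0.3):
    """Critical density in Msun/Mpc^3 for an assumed flat LambdaCDM cosmology."""
    hz_sq = h0 ** 2 * (omega_m * (1.0 + z) ** 3 + (1.0 - omega_m))
    return 3.0 * hz_sq / (8.0 * math.pi * G)


def r200_from_m200(m200, z):
    """Radius (Mpc) enclosing a mean density of 200 times the critical density."""
    return (3.0 * m200 / (4.0 * math.pi * 200.0 * rho_crit(z))) ** (1.0 / 3.0)


def nfw_mu(x):
    """Dimensionless NFW mass function: ln(1 + x) - x / (1 + x)."""
    return math.log(1.0 + x) - x / (1.0 + x)


def nfw_enclosed_mass(r, m200, z, c):
    """NFW mass (Msun) inside radius r (Mpc), normalised so M(<r200) = m200.
    The scale radius is r_s = r200 / c, with c the concentration."""
    r_s = r200_from_m200(m200, z) / c
    return m200 * nfw_mu(r / r_s) / nfw_mu(c)


m200 = 2.3e14        # central value quoted in the abstract, in Msun
z_cluster = 1.75     # cluster redshift from the abstract
c_assumed = 4.0      # hypothetical concentration, for illustration only

r200 = r200_from_m200(m200, z_cluster)
print(f"r200 = {r200:.3f} Mpc")
print(f"M(<0.5 r200) = {nfw_enclosed_mass(0.5 * r200, m200, z_cluster, c_assumed):.3e} Msun")
```

    Because the critical density grows with redshift, the same M200,crit corresponds to a smaller r200 at z = 1.75 than it would locally.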

  2. Weak Lensing Analysis of Massive Galaxy Cluster IDCS J1426.5+3508 at z=1.75

    NASA Astrophysics Data System (ADS)

    Mo, Wenli; Gonzalez, Anthony H.; Jee, Myungkook J.; Massey, Richard; Rhodes, Jason; Brodwin, Mark; Eisenhardt, Peter R.; Marrone, Daniel P.; Stanford, S. Adam; Zeimann, Gregory

    2016-01-01

    We present a weak lensing study of the galaxy cluster IDCS J1426.5+3508 at z=1.75, which is the highest redshift strong lensing cluster known and the most distant cluster for which a weak lensing analysis has been undertaken. Using F160W, F814W, and F606W observations with the Hubble Space Telescope, we detect tangential shear at 2σ significance. Fitting a Navarro-Frenk-White mass profile to the shear with a theoretical median mass-concentration relation, we derive a mass consistent with previous mass estimates from the Sunyaev-Zel'dovich (SZ) effect, X-ray, and strong lensing. The cluster lies on the local SZ-weak lensing mass scaling relation observed at low redshift, indicative of minimal evolution in this relation.

  3. Three-dimensional Multi-probe Analysis of the Galaxy Cluster A1689

    NASA Astrophysics Data System (ADS)

    Umetsu, Keiichi; Sereno, Mauro; Medezinski, Elinor; Nonino, Mario; Mroczkowski, Tony; Diego, Jose M.; Ettori, Stefano; Okabe, Nobuhiro; Broadhurst, Tom; Lemze, Doron

    2015-06-01

    We perform a three-dimensional multi-probe analysis of the rich galaxy cluster A1689, one of the most powerful known lenses on the sky, by combining improved weak-lensing data from new wide-field $BVR_{\rm C}i'z'$ Subaru/Suprime-Cam observations with strong-lensing, X-ray, and Sunyaev–Zel’dovich effect (SZE) data sets. We reconstruct the projected matter distribution from a joint weak-lensing analysis of two-dimensional shear and azimuthally integrated magnification constraints, the combination of which allows us to break the mass-sheet degeneracy. The resulting mass distribution reveals elongation with an axis ratio of ∼0.7 in projection, aligned well with the distributions of cluster galaxies and intracluster gas. When assuming a spherical halo, our full weak-lensing analysis yields a projected halo concentration of $c_{200c}^{2D} = 8.9 \pm 1.1$ ($c_{\rm vir}^{2D} \sim 11$), consistent with and improved from earlier weak-lensing work. We find excellent consistency between independent weak and strong lensing in the region of overlap. In a parametric triaxial framework, we constrain the intrinsic structure and geometry of the matter and gas distributions, by combining weak/strong lensing and X-ray/SZE data with minimal geometric assumptions. We show that the data favor a triaxial geometry with minor–major axis ratio 0.39 ± 0.15 and major axis closely aligned with the line of sight (22° ± 10°). We obtain a halo mass $M_{200c} = (1.2 \pm 0.2) \times 10^{15}\,M_\odot\,h^{-1}$ and a halo concentration $c_{200c} = 8.4 \pm 1.3$, which overlaps with the ≳ 1σ tail of the predicted distribution. The shape of the gas is rounder than the underlying matter but quite elongated with minor–major axis ratio 0.60 ± 0.14. The gas mass fraction within 0.9 Mpc is $10^{+3}_{-2}\%$, a typical value for high-mass clusters. The thermal gas pressure contributes to ∼60% of the equilibrium pressure, indicating a significant level of non-thermal pressure support. When compared to Planck's hydrostatic mass estimate

  4. The cosmological analysis of X-ray cluster surveys - I. A new method for interpreting number counts

    NASA Astrophysics Data System (ADS)

    Clerc, N.; Pierre, M.; Pacaud, F.; Sadibekova, T.

    2012-07-01

    We present a new method aimed at simplifying the cosmological analysis of X-ray cluster surveys. It is based on purely instrumental observable quantities considered in a two-dimensional X-ray colour-magnitude diagram (hardness ratio versus count rate). The basic principle is that even in rather shallow surveys, substantial information on cluster redshift and temperature is present in the raw X-ray data and can be statistically extracted; in parallel, such diagrams can be readily predicted from an ab initio cosmological modelling. We illustrate the methodology for the case of a 100-deg$^2$ XMM survey having a sensitivity of ~$10^{-14}$ erg s$^{-1}$ cm$^{-2}$ and fit, at the same time, the survey selection function, the cluster evolutionary scaling relations and the cosmology; our sole assumption, driven by the limited size of the sample considered in the case study, is that the local cluster scaling relations are known. We devote special attention to the realistic modelling of the count-rate measurement uncertainties and evaluate the potential of the method via a Fisher analysis. In the absence of individual cluster redshifts, the count rate and hardness ratio (CR-HR) method appears to be much more efficient than the traditional approach based on cluster counts (i.e. dn/dz, requiring redshifts). In the case where redshifts are available, our method performs similarly to the traditional mass function (dn/dM/dz) for the purely cosmological parameters, but better constrains the parameters defining the cluster scaling relations and their evolution. A further practical advantage of the CR-HR method is its simplicity: this fully top-down approach totally bypasses the tedious steps of deriving cluster masses from X-ray temperature measurements.

  5. THE CLUSTER AGES EXPERIMENT (CASE). V. ANALYSIS OF THREE ECLIPSING BINARIES IN THE GLOBULAR CLUSTER M4

    SciTech Connect

    Kaluzny, J.; Rozyczka, M.; Krzeminski, W.; Pych, W.; Thompson, I. B.; Burley, G. S.; Shectman, S. A.; Dotter, A.; Rucinski, S. M.

    2013-02-01

    We use photometric and spectroscopic observations of the eclipsing binaries V65, V66, and V69 in the field of the globular cluster M4 to derive masses, radii, and luminosities of their components. The orbital periods of these systems are 2.29, 8.11, and 48.19 days, respectively. The measured masses of the primary and secondary components ($M_p$ and $M_s$) are 0.8035 ± 0.0086 and 0.6050 ± 0.0044 $M_\odot$ for V65, 0.7842 ± 0.0045 and 0.7443 ± 0.0042 $M_\odot$ for V66, and 0.7665 ± 0.0053 and 0.7278 ± 0.0048 $M_\odot$ for V69. The measured radii ($R_p$ and $R_s$) are 1.147 ± 0.010 and 0.6110 ± 0.0092 $R_\odot$ for V65, 0.9347 ± 0.0048 and 0.8298 ± 0.0053 $R_\odot$ for V66, and 0.8655 ± 0.0097 and 0.8074 ± 0.0080 $R_\odot$ for V69. The orbits of V65 and V66 are circular, whereas that of V69 has an eccentricity of 0.38. Based on systemic velocities and relative proper motions, we show that all three systems are members of the cluster. We find that the distance to M4 is 1.82 ± 0.04 kpc, in good agreement with recent estimates based on entirely different methods. We compare the absolute parameters of V66 and V69 with two sets of theoretical isochrones in mass-radius and mass-luminosity diagrams, and for assumed [Fe/H] = -1.20, [α/Fe] = 0.4, and Y = 0.25 we find the most probable age of M4 to be between 11.2 and 11.3 Gyr. Color-magnitude diagram (CMD) fitting with the same parameters yields an age close to, or slightly in excess of, 12 Gyr. However, considering the sources of uncertainty involved in CMD fitting, these two methods of age determination are not discrepant. Age and distance determinations can be further improved when infrared eclipse photometry is obtained.

  6. Data Mining of University Philanthropic Giving: Cluster-Discriminant Analysis and Pareto Effects

    ERIC Educational Resources Information Center

    Le Blanc, Louis A.; Rucks, Conway T.

    2009-01-01

    A large sample of 33,000 university alumni records was cluster-analyzed to generate six groups relatively unique in their respective attribute values. The attributes used to cluster the former students included average gift to the university's foundation and to the alumni association for the same institution. Cluster detection is useful in this…

  7. The Swift X-ray Telescope Cluster Survey. II. X-ray spectral analysis

    NASA Astrophysics Data System (ADS)

    Tozzi, P.; Moretti, A.; Tundo, E.; Liu, T.; Rosati, P.; Borgani, S.; Tagliaferri, G.; Campana, S.; Fugazza, D.; D'Avanzo, P.

    2014-07-01

    Aims: We present a spectral analysis of a new, flux-limited sample of 72 X-ray selected clusters of galaxies identified with the X-ray Telescope (XRT) on board the Swift satellite down to a flux limit of ~$10^{-14}$ erg s$^{-1}$ cm$^{-2}$ (SWXCS). We carry out a detailed X-ray spectral analysis with the twofold aim of measuring redshifts and characterizing the properties of the intracluster medium (ICM) for the majority of the SWXCS sources. Methods: Optical counterparts and spectroscopic or photometric redshifts for some of the sources are obtained with a cross-correlation with the NASA/IPAC Extragalactic Database. Additional photometric redshifts are computed with a dedicated follow-up program with the Telescopio Nazionale Galileo and a cross-correlation with the SDSS. In addition, we also blindly search for the Hydrogen-like and He-like iron Kα emission line complex in the X-ray spectrum. We detect the iron emission lines in 35% of the sample, and hence obtain a robust measure of the X-ray redshift $z_X$ with typical rms error 1-5%. We use $z_X$ whenever the optical redshift is not available. Finally, for all the sources with measured redshift, background-subtracted spectra are fitted with a single-temperature mekal model to measure global temperature, X-ray luminosity and iron abundance of the ICM. We perform extensive spectral simulations to account for fitting bias, and to assess the robustness of our results. We derive a criterion to select reliable best-fit models and an empirical formula to account for fitting bias. The bias-corrected values are then used to investigate the scaling properties of the X-ray observables. Results: Overall, we are able to characterize the ICM of 46 sources with redshifts (64% of the sample). The sample consists mostly of clusters with temperatures between 3 and 10 keV, plus 14 low-mass clusters and groups with temperatures below 3 keV. The redshift distribution peaks around z ~ 0.25 and extends up to z ~ 1, with 60% of the sample at 0.1 < z

  8. Spherical Harmonic Analysis of Particle Velocity Distribution Function: Comparison of Moments and Anisotropies using Cluster Data

    NASA Technical Reports Server (NTRS)

    Gurgiolo, Chris; Vinas, Adolfo F.

    2009-01-01

    This paper presents a spherical harmonic analysis of the plasma velocity distribution function using high-angular, energy, and time resolution Cluster data obtained from the PEACE spectrometer instrument to demonstrate how this analysis models the particle distribution function and its moments and anisotropies. The results show that spherical harmonic analysis produced a robust physical representation model of the velocity distribution function, resolving the main features of the measured distributions. From the spherical harmonic analysis, a minimum set of nine spectral coefficients was obtained from which the moment (up to the heat flux), anisotropy, and asymmetry calculations of the velocity distribution function were obtained. The spherical harmonic method provides a potentially effective "compression" technique that can be easily carried out onboar