Sample records for variable four-partite cluster

  1. KmL3D: a non-parametric algorithm for clustering joint trajectories.

    PubMed

    Genolini, C; Pingault, J B; Driss, T; Côté, S; Tremblay, R E; Vitaro, F; Arnaud, C; Falissard, B

    2013-01-01

    In cohort studies, variables are measured repeatedly and can be considered as trajectories. A classic way to work with trajectories is to cluster them in order to detect the existence of homogeneous patterns of evolution. Since cohort studies usually measure a large number of variables, it might be interesting to study the joint evolution of several variables (also called joint-variable trajectories). To date, the only way to cluster joint trajectories is to cluster each trajectory independently, then to cross the partitions obtained. This approach is unsatisfactory because it does not take into account a possible co-evolution of variable trajectories. KmL3D is an R package that implements a version of k-means dedicated to clustering joint trajectories. It provides facilities for the management of missing values, offers several quality criteria, and its graphical interface helps the user to select the best partition. KmL3D can work with any number of joint-variable trajectories. In the restricted case of two joint trajectories, it proposes 3D tools to visualize the partitioning and then export dynamic, rotating 3D graphs to PDF format. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.
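
    KmL3D itself is an R package, and the Python sketch below is not its code. It only illustrates the central idea under stated assumptions (no missing values, Euclidean distance on the flattened time-by-variable matrix, deterministic farthest-first starting centres, and a hypothetical function name `kmeans_joint_trajectories`): each individual's joint trajectory is treated as one object, so the partition reflects how the variables co-evolve instead of crossing one partition per variable.

```python
import numpy as np

def kmeans_joint_trajectories(traj, k, n_iter=100):
    """Plain k-means (Lloyd's algorithm) on joint trajectories.

    traj has shape (n_individuals, n_times, n_variables). Each individual's
    joint trajectory is flattened, so the Euclidean distance compares all
    variables at all time points at once. Assumes no missing values.
    """
    n = traj.shape[0]
    x = traj.reshape(n, -1).astype(float)
    # Farthest-first initialisation (an assumption, for reproducibility):
    # start from individual 0, then add the individual farthest from all
    # centres chosen so far.
    idx = [0]
    for _ in range(1, k):
        d_min = ((x[:, None, :] - x[idx][None, :, :]) ** 2).sum(axis=2).min(axis=1)
        idx.append(int(d_min.argmax()))
    centres = x[idx].copy()
    labels = np.full(n, -1)
    for _ in range(n_iter):
        # Assign each individual to the nearest centre (squared Euclidean).
        d = ((x[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        new_labels = d.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments are stable: converged
        labels = new_labels
        for j in range(k):
            members = x[labels == j]
            if len(members):  # keep the previous centre if a cluster empties
                centres[j] = members.mean(axis=0)
    return labels, centres.reshape(k, *traj.shape[1:])

# Toy data (invented for illustration): 5 time points, 2 variables.
t = np.arange(5, dtype=float)
rising = np.stack([t, 2 * t], axis=1)  # both variables increase together
flat = np.zeros((5, 2))                # both variables stay at zero
data = np.array([rising, rising + 0.1, flat, flat + 0.1])
labels, centres = kmeans_joint_trajectories(data, k=2)
```

    The returned centres keep the (time, variable) shape, so each cluster can be read as a mean joint trajectory.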

  2. Model-based recursive partitioning to identify risk clusters for metabolic syndrome and its components: findings from the International Mobility in Aging Study

    PubMed Central

    Pirkle, Catherine M; Wu, Yan Yan; Zunzunegui, Maria-Victoria; Gómez, José Fernando

    2018-01-01

    Objective Conceptual models underpinning much epidemiological research on ageing acknowledge that environmental, social and biological systems interact to influence health outcomes. Recursive partitioning is a data-driven approach that allows for concurrent exploration of distinct mixtures, or clusters, of individuals that have a particular outcome. Our aim is to use recursive partitioning to examine risk clusters for metabolic syndrome (MetS) and its components, in order to identify vulnerable populations. Study design Cross-sectional analysis of baseline data from a prospective longitudinal cohort called the International Mobility in Aging Study (IMIAS). Setting IMIAS includes sites in three middle-income countries, Tirana (Albania), Natal (Brazil) and Manizales (Colombia), and two in Canada, Kingston (Ontario) and Saint-Hyacinthe (Quebec). Participants Community-dwelling male and female adults, aged 64–75 years (n=2002). Primary and secondary outcome measures We apply recursive partitioning to investigate social and behavioural risk factors for MetS and its components. Model-based recursive partitioning (MOB) was used to cluster participants into age-adjusted risk groups based on variability in: study site, sex, education, living arrangements, childhood adversities, adult occupation, current employment status, income, perceived income sufficiency, smoking status and weekly minutes of physical activity. Results 43% of participants had MetS. Using MOB, the primary partitioning variable was participant sex. Among women from middle-income sites, the predicted proportion with MetS ranged from 58% to 68%. Canadian women with limited physical activity had elevated predicted proportions of MetS (49%, 95% CI 39% to 58%). Among men, MetS ranged from 26% to 41% depending on childhood social adversity and education. Clustering for MetS components differed from the syndrome and across components. Study site was a primary partitioning variable for all components except HDL cholesterol. Sex was important for most components. Conclusion MOB is a promising technique for identifying disease risk clusters (eg, vulnerable populations) in modestly sized samples. PMID:29500203

  3. Extracting Aggregation Free Energies of Mixed Clusters from Simulations of Small Systems: Application to Ionic Surfactant Micelles.

    PubMed

    Zhang, X; Patel, L A; Beckwith, O; Schneider, R; Weeden, C J; Kindt, J T

    2017-11-14

    Micelle cluster distributions from molecular dynamics simulations of a solvent-free coarse-grained model of sodium octyl sulfate (SOS) were analyzed using an improved method to extract equilibrium association constants from small-system simulations containing one or two micelle clusters at equilibrium with free surfactants and counterions. The statistical-thermodynamic and mathematical foundations of this partition-enabled analysis of cluster histograms (PEACH) approach are presented. A dramatic reduction in computational time for analysis was achieved through a strategy similar to the selector variable method to circumvent the need for exhaustive enumeration of the possible partitions of surfactants and counterions into clusters. Using statistics from a set of small-system (up to 60 SOS molecules) simulations as input, equilibrium association constants for micelle clusters were obtained as a function of both number of surfactants and number of associated counterions through a global fitting procedure. The resulting free energies were able to accurately predict micelle size and charge distributions in a large (560 molecule) system. The evolution of micelle size and charge with SOS concentration as predicted by the PEACH-derived free energies and by a phenomenological four-parameter model fit, along with the sensitivity of these predictions to variations in cluster definitions, are analyzed and discussed.

  4. Cluster formation and drag reduction-proposed mechanism of particle recirculation within the partition column of the bottom spray fluid-bed coater.

    PubMed

    Wang, Li Kun; Heng, Paul Wan Sia; Liew, Celine Valeria

    2015-04-01

    Bottom spray fluid-bed coating is a common technique for coating multiparticulates. Under the quality-by-design framework, particle recirculation within the partition column is one of the main variability sources affecting particle coating and coat uniformity. However, the occurrence and mechanism of particle recirculation within the partition column of the coater are not well understood. The purpose of this study was to visualize and define particle recirculation within the partition column. Based on different combinations of partition gap setting, air accelerator insert diameter, and particle size fraction, particle movements within the partition column were captured using a high-speed video camera. The particle recirculation probability and voidage information were mapped using a visiometric process analyzer. High-speed images showed that particles contributing to the recirculation phenomenon were behaving as clustered colonies. Fluid dynamics analysis indicated that particle recirculation within the partition column may be attributed to the combined effect of cluster formation and drag reduction. Both visiometric process analysis and particle coating experiments showed that smaller particles had greater propensity toward cluster formation than larger particles. The influence of cluster formation on coating performance and possible solutions to cluster formation were further discussed. © 2014 Wiley Periodicals, Inc. and the American Pharmacists Association.

  5. GENERAL: Teleportation of a Bipartite Entangled Coherent State via a Four-Partite Cluster-Type Entangled State

    NASA Astrophysics Data System (ADS)

    Chen, Hui-Na; Liu, Jin-Ming

    2009-10-01

    We present an optical scheme to almost completely teleport a bipartite entangled coherent state using a four-partite cluster-type entangled coherent state as quantum channel. The scheme is based on optical elements such as beam splitters, phase shifters, and photon detectors. We also obtain the average fidelity of the teleportation process. It is shown that the average fidelity is quite close to unity if the mean photon number of the coherent state is not too small.

  6. A comparison of latent class, K-means, and K-median methods for clustering dichotomous data.

    PubMed

    Brusco, Michael J; Shireman, Emilie; Steinley, Douglas

    2017-09-01

    The problem of partitioning a collection of objects based on their measurements on a set of dichotomous variables is a well-established problem in psychological research, with applications including clinical diagnosis, educational testing, cognitive categorization, and choice analysis. Latent class analysis and K-means clustering are popular methods for partitioning objects based on dichotomous measures in the psychological literature. The K-median clustering method has recently been touted as a potentially useful tool for psychological data and might be preferable to its close neighbor, K-means, when the variable measures are dichotomous. We conducted simulation-based comparisons of the latent class, K-means, and K-median approaches for partitioning dichotomous data. Although all 3 methods proved capable of recovering cluster structure, K-median clustering yielded the best average performance, followed closely by latent class analysis. We also report results for the 3 methods within the context of an application to transitive reasoning data, in which it was found that the 3 approaches can exhibit profound differences when applied to real data. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
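
    To make the K-means versus K-median distinction concrete for dichotomous data, here is a minimal sketch (not the authors' simulation code; the function name and the farthest-first initialisation are assumptions). K-median replaces the squared Euclidean distance with the L1 distance, which on 0/1 vectors equals the Hamming distance, and replaces the centroid with the coordinate-wise median, which stays in {0, 0.5, 1} instead of drifting to fractional means.

```python
import numpy as np

def k_median_binary(x, k, n_iter=100):
    """K-median clustering of a 0/1 data matrix x (objects x variables).

    Uses the L1 (city-block) distance, which on 0/1 data equals the Hamming
    distance, and moves each centre to the coordinate-wise median of its
    members; on 0/1 data this is a majority vote, and an exact tie gives 0.5.
    """
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    # Farthest-first initialisation (an assumption, for reproducibility).
    idx = [0]
    for _ in range(1, k):
        d_min = np.abs(x[:, None, :] - x[idx][None, :, :]).sum(axis=2).min(axis=1)
        idx.append(int(d_min.argmax()))
    centres = x[idx].copy()
    labels = np.full(n, -1)
    for _ in range(n_iter):
        d = np.abs(x[:, None, :] - centres[None, :, :]).sum(axis=2)
        new_labels = d.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # converged
        labels = new_labels
        for j in range(k):
            members = x[labels == j]
            if len(members):
                centres[j] = np.median(members, axis=0)
    return labels, centres

# Toy 0/1 data (invented): two groups of respondents.
responses = [[1, 1, 1, 0],
             [1, 1, 0, 0],
             [0, 0, 1, 1],
             [0, 0, 0, 1]]
labels, centres = k_median_binary(responses, k=2)
```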

  7. Generalized clustering conditions of Jack polynomials at negative Jack parameter α

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bernevig, B. Andrei (Department of Physics, Princeton University, Princeton, New Jersey 08544); Haldane, F. D. M.

    We present several conjectures on the behavior and clustering properties of Jack polynomials at a negative parameter $\alpha = -(k+1)/(r-1)$, with partitions that violate the (k,r,N)-admissibility rule of Feigin et al. [Int. Math. Res. Notices 23, 1223 (2002)]. We find that the "highest weight" Jack polynomials of specific partitions represent the minimum-degree polynomials in N variables that vanish when s distinct clusters of k+1 particles are formed, where s and k are positive integers. Explicit counting formulas are conjectured. The generalized clustering conditions are useful in a forthcoming description of fractional quantum Hall quasiparticles.

  8. Improving Cluster Analysis with Automatic Variable Selection Based on Trees

    DTIC Science & Technology

    2014-12-01

    ...regression trees; Daisy, DISsimilAritY; PAM, partitioning around medoids; PMA, penalized multivariate analysis; SPC, sparse principal components; UPGMA, unweighted pair-group average method. ...The unweighted pair-group average method (UPGMA) measures dissimilarities between all objects in two clusters and takes the average value.
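
    The UPGMA (average-linkage) rule described in this fragment can be sketched as follows, assuming a precomputed dissimilarity matrix. `average_linkage` and `upgma` are hypothetical helper names, and the naive pairwise search is for illustration only, not the report's implementation.

```python
import numpy as np

def average_linkage(dist, a, b):
    """UPGMA dissimilarity between clusters a and b (lists of object indices):
    the mean of dist[i, j] over all i in a and j in b."""
    dist = np.asarray(dist, dtype=float)
    return float(dist[np.ix_(a, b)].mean())

def upgma(dist, n_clusters):
    """Naive agglomerative clustering: repeatedly merge the two clusters
    with the smallest average dissimilarity until n_clusters remain."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > n_clusters:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                d = average_linkage(dist, clusters[p], clusters[q])
                if best is None or d < best[0]:
                    best = (d, p, q)
        _, p, q = best
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]  # q > p, so index p is unaffected
    return clusters

# Toy example (invented): four objects on a line at 0, 1, 5 and 6.
pts = np.array([0.0, 1.0, 5.0, 6.0])
D = np.abs(pts[:, None] - pts[None, :])
groups = upgma(D, 2)
```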

  9. Cluster-enriched Yang-Baxter equation from SUSY gauge theories

    NASA Astrophysics Data System (ADS)

    Yamazaki, Masahito

    2018-04-01

    We propose a new generalization of the Yang-Baxter equation, where the R-matrix depends on cluster y-variables in addition to the spectral parameters. We point out that we can construct solutions to this new equation from the recently found correspondence between Yang-Baxter equations and supersymmetric gauge theories. The S^2 partition function of a certain 2d N=(2,2) quiver gauge theory gives an R-matrix, whereas its FI parameters can be identified with the cluster y-variables.

  10. Universal partitioning of the hierarchical fold network of 50-residue segments in proteins

    PubMed Central

    Ito, Jun-ichi; Sonobe, Yuki; Ikeda, Kazuyoshi; Tomii, Kentaro; Higo, Junichi

    2009-01-01

    Background Several studies have demonstrated that protein fold space is structured hierarchically and that power-law statistics are satisfied in the relation between the numbers of protein families and protein folds (or superfamilies). We examined the internal structure and statistics in the fold space of 50-residue segments taken from various protein folds. We used inter-residue contact patterns to measure the tertiary structural similarity among segments. Using this similarity measure, the segments were classified into a number (Kc) of clusters. We examined various Kc values for the clustering. The resolution for differentiating segment tertiary structures increases with increasing Kc. Furthermore, we constructed networks by linking structurally similar clusters. Results The network was partitioned persistently into four regions for Kc ≥ 1000. This main partitioning is consistent with results of earlier studies, where similar partitioning was reported in classifying protein domain structures. Furthermore, the network was partitioned naturally into several dozens of sub-networks (i.e., communities). That is, clusters within a sub-network were densely interconnected, whereas clusters in different sub-networks were connected by only a few links. For Kc ≥ 1000, there were about 40 major sub-networks, and their contents were conserved. This sub-partitioning is a novel finding, suggesting that the network is structured hierarchically: segments construct a cluster, clusters form a sub-network, and sub-networks constitute a region. Additionally, the network was characterized by non-power-law statistics, which is also a novel finding. Conclusion Main findings are: (1) The universe of 50-residue segments found here was characterized by non-power-law statistics; therefore, it differs from those previously reported for protein domains. (2) The 50-residue segments were partitioned persistently and universally into some dozens (ca. 40) of major sub-networks, irrespective of the number of clusters. (3) These major sub-networks encompassed 90% of all segments. Consequently, protein tertiary structure is constructed using these dozens of elements (sub-networks). PMID:19454039

  11. The implementation of hybrid clustering using fuzzy c-means and divisive algorithm for analyzing DNA human Papillomavirus cause of cervical cancer

    NASA Astrophysics Data System (ADS)

    Andryani, Diyah Septi; Bustamam, Alhadi; Lestari, Dian

    2017-03-01

    Clustering aims to classify different patterns into groups called clusters. In this clustering method, we use n-mer frequencies to calculate the distance matrix, which is considered more accurate than using DNA alignment. The clustering results could be used to discover biologically important subsections and groups of genes. Many clustering methods have been developed; hard clustering methods are considered less accurate than fuzzy clustering methods, especially for data with outliers. Among fuzzy clustering methods, fuzzy c-means is one of the best known for its accuracy and simplicity. Fuzzy c-means clustering uses a membership function variable, which refers to how likely a data point is to be a member of a cluster. Fuzzy c-means clustering works by minimizing an objective function. A parameter of the membership function, used as a weighting factor, is called the fuzzifier. In this study we implement hybrid clustering using fuzzy c-means and a divisive algorithm, which could improve the accuracy of cluster membership compared with a traditional partitional approach alone. Fuzzy c-means is used in the first step to find the partition. A divisive algorithm is then run in the second step to find sub-clusters and a dendrogram (phylogenetic tree). The best number of clusters is determined by the minimum value of the Davies-Bouldin Index (DBI) of the clustering results. The results show that the method introduced in this paper is better than other partitioning methods. We found 3 clusters with a DBI value of 1.126628 at the first step of clustering. Moreover, the second step of clustering always produces smaller DBI values than the first step alone. This indicates that the hybrid approach in this study improves the clustering results in terms of DBI values.
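
    A minimal sketch of standard fuzzy c-means, the first step of the hybrid approach, assuming Euclidean distances and fuzzifier m = 2 (common defaults, not values reported in the abstract). The n-mer feature extraction and the divisive second step are not shown, and `fuzzy_c_means` is a hypothetical name.

```python
import numpy as np

def fuzzy_c_means(x, c, m=2.0, n_iter=300, tol=1e-8, seed=0):
    """Standard fuzzy c-means (Bezdek) on x with shape (n_samples, n_features).

    m > 1 is the fuzzifier. Returns (u, centres), where u[i, j] is the
    membership of sample i in cluster j and every row of u sums to 1.
    """
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    u = rng.random((x.shape[0], c))
    u /= u.sum(axis=1, keepdims=True)  # random initial memberships
    for _ in range(n_iter):
        w = u ** m
        centres = (w.T @ x) / w.sum(axis=0)[:, None]  # membership-weighted means
        d = np.linalg.norm(x[:, None, :] - centres[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)  # avoid division by zero at a centre
        # u[i, j] = 1 / sum_k (d[i, j] / d[i, k]) ** (2 / (m - 1))
        ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))
        new_u = 1.0 / ratio.sum(axis=2)
        converged = np.abs(new_u - u).max() < tol
        u = new_u
        if converged:
            break
    w = u ** m
    centres = (w.T @ x) / w.sum(axis=0)[:, None]
    return u, centres

# Toy data (invented): two well-separated groups in 2-D.
points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
u, centres = fuzzy_c_means(points, c=2)
hard = u.argmax(axis=1)  # hard labels, if a crisp partition is needed
```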

  12. Implementation of MPEG-2 encoder to multiprocessor system using multiple MVPs (TMS320C80)

    NASA Astrophysics Data System (ADS)

    Kim, HyungSun; Boo, Kenny; Chung, SeokWoo; Choi, Geon Y.; Lee, YongJin; Jeon, JaeHo; Park, Hyun Wook

    1997-05-01

    This paper presents an efficient algorithm mapping for real-time MPEG-2 encoding on the KAIST image computing system (KICS), which has a parallel architecture using five multimedia video processors (MVPs). The MVP is a general-purpose digital signal processor (DSP) from Texas Instruments. It combines one floating-point processor and four fixed-point DSPs on a single chip. The KICS uses the MVP as its primary processing element (PE). Two PEs form a cluster, and there are two processing clusters in the KICS. The real-time MPEG-2 encoder is implemented through spatial and functional partitioning strategies. Encoding of each spatially partitioned half of the video input frame is assigned to one processing cluster. Within a cluster, the two PEs perform the functionally partitioned MPEG-2 encoding tasks in a pipelined mode: one PE carries out the transform coding part and the other performs the predictive coding part of the MPEG-2 encoding algorithm. One of the five MVPs is used for system control and the interface with the host computer. This paper introduces an implementation of the MPEG-2 algorithm with a parallel processing architecture.

  13. Semi-supervised clustering methods.

    PubMed

    Bair, Eric

    2013-01-01

    Cluster analysis methods seek to partition a data set into homogeneous subgroups. Such methods are useful in a wide variety of applications, including document processing and modern genetics. Conventional clustering methods are unsupervised, meaning that there is no outcome variable and nothing is known about the relationships between the observations in the data set. In many situations, however, information about the clusters is available in addition to the values of the features. For example, the cluster labels of some observations may be known, or certain observations may be known to belong to the same cluster. In other cases, one may wish to identify clusters that are associated with a particular outcome variable. This review describes several clustering algorithms (known as "semi-supervised clustering" methods) that can be applied in these situations. The majority of these methods are modifications of the popular k-means clustering method, and several of them are described in detail. A brief description of some other semi-supervised clustering algorithms is also provided.
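
    One simple way to use known labels, often called constrained (or seeded) k-means, is sketched below. It is offered only as an illustration of the kind of k-means modification discussed here, not as a reproduction of any specific algorithm in the review. The function name is hypothetical, and the sketch assumes every cluster has at least one labelled observation.

```python
import numpy as np

def constrained_kmeans(x, seed_labels, k, n_iter=100):
    """k-means in which some observations have known cluster labels.

    seed_labels[i] is the known cluster (0..k-1) of observation i, or -1 if
    unknown. Centres start at the means of the labelled observations, and
    labelled observations never change cluster; only unlabelled ones are
    reassigned. Assumes every cluster has at least one labelled observation.
    """
    x = np.asarray(x, dtype=float)
    seed_labels = np.asarray(seed_labels)
    known = seed_labels >= 0
    centres = np.array([x[seed_labels == j].mean(axis=0) for j in range(k)])
    labels = seed_labels.copy()
    for _ in range(n_iter):
        d = ((x[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        # Keep known labels fixed; assign the rest to the nearest centre.
        new_labels = np.where(known, seed_labels, d.argmin(axis=1))
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        centres = np.array([x[labels == j].mean(axis=0) for j in range(k)])
    return labels, centres

# Toy data (invented): one labelled observation per cluster, four unlabelled.
x = [[0.0], [0.5], [5.0], [5.5], [0.2], [5.2]]
seeds = [0, -1, 1, -1, -1, -1]
labels, centres = constrained_kmeans(x, seeds, k=2)
```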

  14. Data-driven process decomposition and robust online distributed modelling for large-scale processes

    NASA Astrophysics Data System (ADS)

    Shu, Zhang; Lijuan, Li; Lijuan, Yao; Shipin, Yang; Tao, Zou

    2018-02-01

    With increasing attention to networked control, system decomposition and distributed models have become significant in the implementation of model-based control strategies. In this paper, a data-driven system decomposition and online distributed subsystem modelling algorithm is proposed for large-scale chemical processes. The key controlled variables are first partitioned into several clusters by the affinity propagation clustering algorithm. Each cluster can be regarded as a subsystem. Then the inputs of each subsystem are selected by offline canonical correlation analysis between all process variables and the subsystem's controlled variables. Process decomposition is then realised after the screening of input and output variables. Once the system decomposition is finished, online subsystem modelling can be carried out by recursively renewing the samples block-wise. The proposed algorithm was applied to the Tennessee Eastman process, and its validity was verified.
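
    The first step, grouping controlled variables with affinity propagation, can be sketched with the standard responsibility and availability message updates. Assumptions: the caller supplies the similarity matrix (for process variables this might be a correlation-based similarity; the paper's choice is not stated here), the preference is set to the median similarity, the damping and iteration count are arbitrary, and `affinity_propagation` is a hypothetical name. The canonical correlation analysis step is not shown.

```python
import numpy as np

def affinity_propagation(s, damping=0.7, n_iter=300):
    """Affinity propagation (Frey and Dueck, 2007) on a similarity matrix s.

    s[i, k] says how well item k would serve as the exemplar of item i; the
    diagonal holds the 'preferences' that control how many clusters emerge.
    Returns (exemplars, assignment), where assignment[i] is the index of the
    exemplar chosen for item i.
    """
    s = np.asarray(s, dtype=float)
    n = s.shape[0]
    rows = np.arange(n)
    r = np.zeros((n, n))  # responsibilities
    a = np.zeros((n, n))  # availabilities
    for _ in range(n_iter):
        # r(i,k) = s(i,k) - max over k' != k of [a(i,k') + s(i,k')]
        as_ = a + s
        first = as_.argmax(axis=1)
        max1 = as_[rows, first]
        as_[rows, first] = -np.inf
        max2 = as_.max(axis=1)
        r_new = s - max1[:, None]
        r_new[rows, first] = s[rows, first] - max2
        r = damping * r + (1 - damping) * r_new
        # a(i,k) = min(0, r(k,k) + sum over i' not in {i,k} of max(0, r(i',k)))
        # a(k,k) = sum over i' != k of max(0, r(i',k))
        rp = np.maximum(r, 0)
        np.fill_diagonal(rp, np.diag(r))
        a_new = rp.sum(axis=0)[None, :] - rp
        self_avail = np.diag(a_new).copy()
        a_new = np.minimum(a_new, 0)
        np.fill_diagonal(a_new, self_avail)
        a = damping * a + (1 - damping) * a_new
    exemplars = np.flatnonzero(np.diag(a + r) > 0)
    if exemplars.size == 0:
        raise ValueError("no exemplars emerged; try a larger preference")
    assignment = exemplars[s[:, exemplars].argmax(axis=1)]
    assignment[exemplars] = exemplars  # each exemplar represents itself
    return exemplars, assignment

# Toy example (invented): six 1-D items forming two well-separated groups.
points = np.array([0.0, 0.1, 0.3, 5.0, 5.2, 5.25])
s = -(points[:, None] - points[None, :]) ** 2  # negative squared distance
off_diag = s[~np.eye(len(points), dtype=bool)]
np.fill_diagonal(s, np.median(off_diag))       # median preference
exemplars, assignment = affinity_propagation(s)
```

    Unlike k-means, the number of clusters is not fixed in advance; it emerges from the preferences on the diagonal.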

  15. Semi-supervised clustering methods

    PubMed Central

    Bair, Eric

    2013-01-01

    Cluster analysis methods seek to partition a data set into homogeneous subgroups. Such methods are useful in a wide variety of applications, including document processing and modern genetics. Conventional clustering methods are unsupervised, meaning that there is no outcome variable and nothing is known about the relationships between the observations in the data set. In many situations, however, information about the clusters is available in addition to the values of the features. For example, the cluster labels of some observations may be known, or certain observations may be known to belong to the same cluster. In other cases, one may wish to identify clusters that are associated with a particular outcome variable. This review describes several clustering algorithms (known as “semi-supervised clustering” methods) that can be applied in these situations. The majority of these methods are modifications of the popular k-means clustering method, and several of them are described in detail. A brief description of some other semi-supervised clustering algorithms is also provided. PMID:24729830

  16. A Solution Space for a System of Null-State Partial Differential Equations: Part 1

    NASA Astrophysics Data System (ADS)

    Flores, Steven M.; Kleban, Peter

    2015-01-01

    This article is the first of four that completely and rigorously characterize a solution space for a homogeneous system of $2N+3$ linear partial differential equations (PDEs) in $2N$ variables that arises in conformal field theory (CFT) and multiple Schramm-Löwner evolution (SLE). In CFT, these are null-state equations and conformal Ward identities. They govern partition functions for the continuum limit of a statistical cluster or loop-gas model, such as percolation, or more generally the Potts models and $O(n)$ models, at the statistical mechanical critical point. (SLE partition functions also satisfy these equations.) For such a lattice model in a polygon $\mathcal{P}$ with its $2N$ sides exhibiting a free/fixed side-alternating boundary condition, this partition function is proportional to the CFT correlation function $\langle \psi_1^c(w_1)\,\psi_1^c(w_2)\cdots\psi_1^c(w_{2N})\rangle$, where the $w_i$ are the vertices of $\mathcal{P}$ and where $\psi_1^c$ is a one-leg corner operator. (Partition functions for "crossing events" in which clusters join the fixed sides of $\mathcal{P}$ in some specified connectivity are linear combinations of such correlation functions.) When conformally mapped onto the upper half-plane, methods of CFT show that this correlation function satisfies the system of PDEs that we consider. In this first article, we use methods of analysis to prove that the dimension of this solution space is no more than $C_N$, the $N$th Catalan number. While our motivations are based in CFT, our proofs are completely rigorous. This proof is contained entirely within this article, except for the proof of Lemma 14, which constitutes the second article (Flores and Kleban, in Commun Math Phys, arXiv:1404.0035, 2014). In the third article (Flores and Kleban, in Commun Math Phys, arXiv:1303.7182, 2013), we use the results of this article to prove that the solution space of this system of PDEs has dimension $C_N$ and is spanned by solutions constructed with the CFT Coulomb gas (contour integral) formalism. In the fourth article (Flores and Kleban, in Commun Math Phys, arXiv:1405.2747, 2014), we prove further CFT-related properties about these solutions, some useful for calculating cluster-crossing probabilities of critical lattice models in polygons.
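
    The bound $C_N$ is the $N$th Catalan number, $C_N = \binom{2N}{N}/(N+1)$. As a small combinatorial illustration (not part of the paper's proof), the sketch below checks that $C_N$ also counts the non-crossing pairings of $2N$ points arranged around a circle, one familiar way Catalan numbers arise when enumerating planar connectivities. The function names are hypothetical.

```python
from math import comb

def catalan(n):
    """n-th Catalan number C_n = binom(2n, n) / (n + 1)."""
    return comb(2 * n, n) // (n + 1)

def noncrossing_pairings(points):
    """All non-crossing perfect matchings of points listed in circular order."""
    if not points:
        return [[]]
    result = []
    first = points[0]
    # Pair the first point with a partner that leaves an even number of
    # points strictly between them; otherwise some pair would have to cross.
    for j in range(1, len(points), 2):
        inside, outside = points[1:j], points[j + 1:]
        for left in noncrossing_pairings(inside):
            for right in noncrossing_pairings(outside):
                result.append([(first, points[j])] + left + right)
    return result

counts = [len(noncrossing_pairings(list(range(2 * n)))) for n in range(7)]
```

    The recursion mirrors the Catalan recurrence $C_N = \sum_{i=0}^{N-1} C_i C_{N-1-i}$: the points inside and outside the first pair are matched independently.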

  17. Implementation of hybrid clustering based on partitioning around medoids algorithm and divisive analysis on human Papillomavirus DNA

    NASA Astrophysics Data System (ADS)

    Arimbi, Mentari Dian; Bustamam, Alhadi; Lestari, Dian

    2017-03-01

    Data clustering can be executed through partitional or hierarchical methods for many types of data, including DNA sequences. Both clustering methods can be combined by running a partitional algorithm at the first level and a hierarchical one at the second level, an approach called hybrid clustering. In the partition phase, popular methods such as PAM, K-means, or fuzzy c-means could be applied. In this study we selected partitioning around medoids (PAM) for the partition stage. Following the partition algorithm, in the hierarchical stage we applied the divisive analysis algorithm (DIANA) in order to obtain more specific cluster and sub-cluster structures. The number of main clusters is determined using the Davies-Bouldin Index (DBI): we choose the number of clusters that minimizes the DBI value. In this work, we cluster 1252 HPV DNA sequences from GenBank. Feature extraction is performed first, followed by normalization and genetic distance calculation using Euclidean distance. We implemented the hybrid PAM and DIANA approach using the open-source R programming tool. Using PAM in the first stage, we obtained 3 main clusters with an average DBI value of 0.979. After executing DIANA in the second stage, we obtained 4 sub-clusters for Cluster-1, 9 sub-clusters for Cluster-2 and 2 sub-clusters for Cluster-3, with DBI values of 0.972, 0.771, and 0.768 for each main cluster, respectively. Since the second stage produces lower DBI values than the first stage, we conclude that this hybrid approach can improve the accuracy of our clustering results.
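
    Both hybrid-clustering studies above choose the number of clusters by minimising the Davies-Bouldin index. A minimal sketch of the usual definition follows: for each cluster i, take the worst ratio (s_i + s_j) / d(c_i, c_j) over the other clusters j, where s_i is the mean distance of cluster i's members to its centroid and d is the distance between centroids, then average. Euclidean distance and centroids (rather than PAM medoids) are assumptions here, and `davies_bouldin` is a hypothetical name.

```python
import numpy as np

def davies_bouldin(x, labels):
    """Davies-Bouldin index with Euclidean distances (lower is better).
    Requires at least two clusters."""
    x = np.asarray(x, dtype=float)
    labels = np.asarray(labels)
    ks = np.unique(labels)
    centres = np.array([x[labels == k].mean(axis=0) for k in ks])
    # s_i: mean distance from cluster i's members to its centroid
    scatter = np.array([
        np.linalg.norm(x[labels == k] - c, axis=1).mean()
        for k, c in zip(ks, centres)
    ])
    sep = np.linalg.norm(centres[:, None, :] - centres[None, :, :], axis=2)
    ratio = (scatter[:, None] + scatter[None, :]) / np.where(sep > 0, sep, np.inf)
    np.fill_diagonal(ratio, -np.inf)  # ignore j == i
    return float(ratio.max(axis=1).mean())

# Toy data (invented): the same four points under two candidate partitions.
pts = [[0, 0], [0, 2], [10, 0], [10, 2]]
db_good = davies_bouldin(pts, [0, 0, 1, 1])  # compact, well-separated clusters
db_bad = davies_bouldin(pts, [0, 1, 0, 1])   # spread-out, close clusters
```

    In a model-selection loop one would compute this index for each candidate number of clusters and keep the partition with the smallest value.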

  18. Predicting cannabis abuse screening test (CAST) scores: a recursive partitioning analysis using survey data from Czech Republic, Italy, the Netherlands and Sweden.

    PubMed

    Blankers, Matthijs; Frijns, Tom; Belackova, Vendula; Rossi, Carla; Svensson, Bengt; Trautmann, Franz; van Laar, Margriet

    2014-01-01

    Cannabis is Europe's most commonly used illicit drug. Some users do not develop dependence or other problems, whereas others do. Many factors are associated with the occurrence of cannabis-related disorders. This makes it difficult to identify key risk factors and markers to profile at-risk cannabis users using traditional hypothesis-driven approaches. Therefore, the use of a data-mining technique called binary recursive partitioning is demonstrated in this study by creating a classification tree to profile at-risk users. Fifty-nine variables on cannabis use and drug market experiences were extracted from an internet-based survey dataset collected in four European countries (Czech Republic, Italy, Netherlands and Sweden), n = 2617. These 59 potential predictors of problematic cannabis use were used to partition individual respondents into subgroups with low and high risk of having a cannabis use disorder, based on their responses on the Cannabis Abuse Screening Test. Both a generic model for the four countries combined and four country-specific models were constructed. Of the 59 variables included in the first analysis step, only three variables were required to construct a generic partitioning model to classify high-risk cannabis users with 65-73% accuracy. Based on the generic model for the four countries combined, the highest risk for cannabis use disorder is seen in participants reporting cannabis use on more than 200 days in the last 12 months. In comparison to the generic model, the country-specific models led to modest, non-significant improvements in classification accuracy, with an exception for Italy (p = 0.01). Using recursive partitioning, it is feasible to construct classification trees based on only a few variables with acceptable performance to classify cannabis users into groups with low or high risk of meeting criteria for cannabis use disorder. The number of cannabis use days in the last 12 months is the most relevant variable. The identified variables may be considered for use in future screeners for cannabis use disorders.
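
    A minimal sketch of binary recursive partitioning for a 0/1 risk outcome, assuming a CART-style Gini impurity criterion and simple stopping rules (the abstract does not state the software or splitting criterion used). The helper names and the toy data, which loosely echo the finding that days of use is the key predictor, are invented for illustration and are not the survey data.

```python
import numpy as np

def gini(y):
    """Gini impurity of a vector of 0/1 class labels."""
    if len(y) == 0:
        return 0.0
    p = float(np.mean(y))
    return 2.0 * p * (1.0 - p)

def best_split(x, y):
    """Return (feature, threshold, weighted Gini) of the best binary split
    x[:, feature] <= threshold, or None if no feature can be split."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)
    n = len(y)
    best = None
    for f in range(x.shape[1]):
        values = np.unique(x[:, f])
        # Candidate thresholds: midpoints between consecutive observed values.
        for t in (values[:-1] + values[1:]) / 2.0:
            left = x[:, f] <= t
            imp = (left.sum() * gini(y[left]) + (~left).sum() * gini(y[~left])) / n
            if best is None or imp < best[2]:
                best = (f, float(t), float(imp))
    return best

def grow_tree(x, y, depth=0, max_depth=2, min_size=5):
    """Recursively split until max_depth, a small node, or a pure node."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)
    split = None
    if depth < max_depth and len(y) >= min_size and gini(y) > 0.0:
        split = best_split(x, y)
    if split is None:
        return {"predict": int(np.mean(y) >= 0.5), "n": int(len(y))}
    f, t, _ = split
    left = x[:, f] <= t
    return {
        "feature": f,
        "threshold": t,
        "left": grow_tree(x[left], y[left], depth + 1, max_depth, min_size),
        "right": grow_tree(x[~left], y[~left], depth + 1, max_depth, min_size),
    }

# Invented toy data: days of use in the last 12 months -> high risk (1) or not (0).
days = [[10], [50], [100], [210], [250], [300]]
risk = [0, 0, 0, 1, 1, 1]
tree = grow_tree(days, risk)
```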

  19. A Comparison of Heuristic Procedures for Minimum within-Cluster Sums of Squares Partitioning

    ERIC Educational Resources Information Center

    Brusco, Michael J.; Steinley, Douglas

    2007-01-01

    Perhaps the most common criterion for partitioning a data set is the minimization of the within-cluster sums of squared deviation from cluster centroids. Although optimal solution procedures for within-cluster sums of squares (WCSS) partitioning are computationally feasible for small data sets, heuristic procedures are required for most practical…
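
    The exact approach shows why heuristics are needed. The sketch below (hypothetical helper names, not one of the procedures compared in the article) finds the minimum-WCSS partition by enumerating every label vector. That costs $k^n$ evaluations, so it is practical only for a handful of points.

```python
import itertools
import numpy as np

def wcss(x, labels):
    """Within-cluster sum of squared deviations from cluster centroids."""
    x = np.asarray(x, dtype=float)
    labels = np.asarray(labels)
    total = 0.0
    for k in np.unique(labels):
        members = x[labels == k]
        total += float(((members - members.mean(axis=0)) ** 2).sum())
    return total

def optimal_wcss_partition(x, k):
    """Exact minimum-WCSS partition into exactly k non-empty clusters,
    found by trying all k**n label vectors."""
    best = None
    for labels in itertools.product(range(k), repeat=len(x)):
        if len(set(labels)) != k:
            continue  # skip label vectors that leave a cluster empty
        value = wcss(x, labels)
        if best is None or value < best[0]:
            best = (value, labels)
    return best

# Toy data (invented): four 1-D points.
best = optimal_wcss_partition([[0.0], [1.0], [10.0], [11.0]], k=2)
```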

  20. On the complexity of some quadratic Euclidean 2-clustering problems

    NASA Astrophysics Data System (ADS)

    Kel'manov, A. V.; Pyatkin, A. V.

    2016-03-01

    Some problems of partitioning a finite set of points of Euclidean space into two clusters are considered. In these problems, the following criteria are minimized: (1) the sum, over both clusters, of the sums of squared pairwise distances between the elements of the cluster and (2) the sum, over both clusters, of the cluster's cardinality multiplied by the sum of squared distances from the elements of the cluster to its geometric center, where the geometric center (or centroid) of a cluster is defined as the mean value of the elements in that cluster. Additionally, another problem close to (2) is considered, in which the desired center of one of the clusters is given as input, while the center of the other cluster is unknown (it is the variable to be optimized), as in problem (2). Two variants of the problems are analyzed, in which the cardinalities of the clusters are (1) part of the input or (2) optimization variables. It is proved that all the considered problems are strongly NP-hard and that, in general, there is no fully polynomial-time approximation scheme for them (unless P = NP).
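
    Criteria (1) and (2) are closely related. If pairwise distances are summed over unordered pairs, then for any cluster $C$ with centroid $\bar{x}$, $\sum_{\{i,j\} \subseteq C} \lVert x_i - x_j \rVert^2 = |C| \sum_{i \in C} \lVert x_i - \bar{x} \rVert^2$, so the two objectives take the same value on every partition. This is a standard identity rather than a claim made in the abstract; the sketch below checks it numerically, and the helper names are hypothetical.

```python
import itertools
import numpy as np

def pairwise_criterion(cluster):
    """Criterion (1) for one cluster: sum of squared distances over all
    unordered pairs of its elements."""
    pts = np.asarray(cluster, dtype=float)
    return sum(float(np.sum((a - b) ** 2)) for a, b in itertools.combinations(pts, 2))

def weighted_centroid_criterion(cluster):
    """Criterion (2) for one cluster: |C| times the sum of squared distances
    from its elements to the centroid."""
    pts = np.asarray(cluster, dtype=float)
    return len(pts) * float(((pts - pts.mean(axis=0)) ** 2).sum())

example = [[0, 0], [1, 0], [0, 2]]
rng = np.random.default_rng(1)
random_cluster = rng.normal(size=(7, 3))
```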

  1. Mixture modelling for cluster analysis.

    PubMed

    McLachlan, G J; Chang, S U

    2004-10-01

    Cluster analysis via a finite mixture model approach is considered. With this approach to clustering, the data can be partitioned into a specified number of clusters g by first fitting a mixture model with g components. An outright clustering of the data is then obtained by assigning an observation to the component to which it has the highest estimated posterior probability of belonging; that is, the ith cluster consists of those observations assigned to the ith component (i = 1,..., g). The focus is on the use of mixtures of normal components for the cluster analysis of data that can be regarded as being continuous. But attention is also given to the case of mixed data, where the observations consist of both continuous and discrete variables.
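
    The two-stage procedure described above (fit a g-component mixture, then assign each observation to the component with the highest estimated posterior probability) can be sketched for univariate normal components with the EM algorithm. Assumptions: one-dimensional continuous data, quantile-based starting means, a small variance floor, a fixed number of iterations, and the hypothetical name `gmm_cluster_1d`. The mixed continuous/discrete case is not covered.

```python
import numpy as np

def gmm_cluster_1d(x, g, n_iter=200):
    """Fit a g-component univariate normal mixture by EM, then cluster each
    observation to the component with the highest posterior probability."""
    x = np.asarray(x, dtype=float)
    pi = np.full(g, 1.0 / g)                       # mixing proportions
    mu = np.quantile(x, (np.arange(g) + 0.5) / g)  # spread-out initial means
    var = np.full(g, x.var())                      # initial variances
    for _ in range(n_iter):
        # E-step in log space for numerical stability:
        # tau[i, j] = posterior probability that x[i] came from component j.
        log_dens = -0.5 * ((x[:, None] - mu) ** 2 / var + np.log(2 * np.pi * var))
        log_w = np.log(pi) + log_dens
        log_w -= log_w.max(axis=1, keepdims=True)
        tau = np.exp(log_w)
        tau /= tau.sum(axis=1, keepdims=True)
        # M-step: update proportions, means and variances.
        nk = tau.sum(axis=0)
        pi = nk / len(x)
        mu = (tau * x[:, None]).sum(axis=0) / nk
        var = np.maximum((tau * (x[:, None] - mu) ** 2).sum(axis=0) / nk, 1e-6)
    return tau.argmax(axis=1), pi, mu, var

# Toy data (invented): two well-separated groups.
labels, pi, mu, var = gmm_cluster_1d([0.0, 0.5, 1.0, 10.0, 10.5, 11.0], g=2)
```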

  2. Cluster Analysis of Clinical Data Identifies Fibromyalgia Subgroups

    PubMed Central

    Docampo, Elisa; Collado, Antonio; Escaramís, Geòrgia; Carbonell, Jordi; Rivera, Javier; Vidal, Javier; Alegre, José

    2013-01-01

    Introduction Fibromyalgia (FM) is mainly characterized by widespread pain and multiple accompanying symptoms, which hinder FM assessment and management. To reduce FM heterogeneity, we classified clinical data into simplified dimensions that were used to define FM subgroups. Material and Methods 48 variables were evaluated in 1,446 Spanish FM cases fulfilling 1990 ACR FM criteria. A partitioning analysis was performed to find groups of variables similar to each other. Similarities between variables were identified and the variables were grouped into dimensions. This was performed in a subset of 559 patients, and cross-validated in the remaining 887 patients. For each sample and dimension, a composite index was obtained based on the weights of the variables included in the dimension. Finally, a clustering procedure was applied to the indexes, resulting in FM subgroups. Results Variables clustered into three independent dimensions: “symptomatology”, “comorbidities” and “clinical scales”. Only the first two dimensions were considered for the construction of FM subgroups. Resulting scores classified FM samples into three subgroups: low symptomatology and comorbidities (Cluster 1), high symptomatology and comorbidities (Cluster 2), and high symptomatology but low comorbidities (Cluster 3), showing differences in measures of disease severity. Conclusions We have identified three subgroups of FM samples in a large cohort of FM by clustering clinical data. Our analysis stresses the importance of family and personal history of FM comorbidities. Also, the resulting patient clusters could indicate different forms of the disease, relevant to future research, and might have an impact on clinical assessment. PMID:24098674

  3. Comparative study of feature selection with ensemble learning using SOM variants

    NASA Astrophysics Data System (ADS)

    Filali, Ameni; Jlassi, Chiraz; Arous, Najet

    2017-03-01

    Ensemble learning has succeeded in improving clustering stability and accuracy, but its runtime prevents it from scaling up to real-world applications. This study deals with the problem of selecting a subset of the most pertinent features for every cluster from a dataset. The proposed method is another extension of the Random Forests approach to unlabeled data, using self-organizing map (SOM) variants, and it estimates the out-of-bag feature importance from a set of partitions. Every partition is created using a different bootstrap sample and a random subset of the features. We then show that the internal estimates used to measure variable pertinence in Random Forests are also applicable to feature selection in unsupervised learning. This approach aims at dimensionality reduction, visualization and cluster characterization at the same time. We provide empirical results on nineteen benchmark data sets indicating that RFS can lead to significant improvement in terms of clustering accuracy over several state-of-the-art unsupervised methods, with a very limited subset of features. The approach shows promise for very broad domains.

  4. A Genetic Algorithm That Exchanges Neighboring Centers for Fuzzy c-Means Clustering

    ERIC Educational Resources Information Center

    Chahine, Firas Safwan

    2012-01-01

    Clustering algorithms are widely used in pattern recognition and data mining applications. Due to their computational efficiency, partitional clustering algorithms are better suited for applications with large datasets than hierarchical clustering algorithms. K-means is among the most popular partitional clustering algorithms, but has a major…

  5. Measuring Constraint-Set Utility for Partitional Clustering Algorithms

    NASA Technical Reports Server (NTRS)

    Davidson, Ian; Wagstaff, Kiri L.; Basu, Sugato

    2006-01-01

    Clustering with constraints is an active area of machine learning and data mining research. Previous empirical work has convincingly shown that adding constraints to clustering improves the performance of a variety of algorithms. However, in most of these experiments, results are averaged over different randomly chosen constraint sets from a given set of labels, thereby masking interesting properties of individual sets. We demonstrate that constraint sets vary significantly in how useful they are for constrained clustering; some constraint sets can actually decrease algorithm performance. We create two quantitative measures, informativeness and coherence, that can be used to identify useful constraint sets. We show that these measures can also help explain differences in performance for four particular constrained clustering algorithms.
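
    The informativeness measure can be illustrated with a small function. As we understand it, informativeness is the fraction of constraints in a set that the algorithm's own unconstrained partition violates, so a set the algorithm already satisfies adds little. This is a minimal sketch under that reading; the name `informativeness`, the `(i, j, kind)` constraint encoding and the toy labels are hypothetical, and the coherence measure is not shown.

```python
def informativeness(constraints, labels):
    """Fraction of constraints violated by an unconstrained clustering.

    `constraints` holds (i, j, kind) triples with kind "must" (must-link)
    or "cannot" (cannot-link); `labels` is the partition produced by the
    algorithm when run without any constraints.
    """
    violated = 0
    for i, j, kind in constraints:
        same = labels[i] == labels[j]
        if (kind == "must" and not same) or (kind == "cannot" and same):
            violated += 1
    return violated / len(constraints)


labels = [0, 0, 1, 1]  # hypothetical unconstrained output for four points
constraints = [(0, 1, "must"), (1, 2, "must"), (0, 3, "cannot"), (2, 3, "cannot")]
score = informativeness(constraints, labels)
```

    Here two of the four constraints contradict the unconstrained partition, so the score is 0.5.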

  6. Grouped fuzzy SVM with EM-based partition of sample space for clustered microcalcification detection.

    PubMed

    Wang, Huiya; Feng, Jun; Wang, Hongyu

    2017-07-20

    Detection of clustered microcalcification (MC) in mammograms plays an essential role in computer-aided diagnosis of early-stage breast cancer. To tackle problems associated with the diversity of data structures of MC lesions and the variability of normal breast tissues, multi-pattern sample space learning is required. In this paper, a novel grouped fuzzy Support Vector Machine (SVM) algorithm with sample space partition based on Expectation-Maximization (EM), called G-FSVM, is proposed for clustered MC detection. The diversified pattern of training data is partitioned into several groups using the EM algorithm. A series of fuzzy SVMs is then integrated for classification, with each group of samples from the MC lesions and normal breast tissues. From the DDSM database, a total of 1,064 suspicious regions were selected from 239 mammograms; the measured Accuracy, True Positive Rate (TPR), False Positive Rate (FPR) and EVL = TPR* 1-FPR were 0.82, 0.78, 0.14 and 0.72, respectively. The proposed method incorporates the merits of fuzzy SVM and multi-pattern sample space learning, decomposing the MC detection problem into a series of simple two-class classification problems. Experimental results on synthetic data and the DDSM database demonstrate that the integrated classification framework reduces the false positive rate significantly while maintaining the true positive rate.

  7. Multivariate regression model for partitioning tree volume of white oak into round-product classes

    Treesearch

    Daniel A. Yaussy; David L. Sonderman

    1984-01-01

    Describes the development of multivariate equations that predict the expected cubic volume of four round-product classes from independent variables composed of individual tree-quality characteristics. Although the model has limited application at this time, it does demonstrate the feasibility of partitioning total tree cubic volume into round-product classes based on...

  8. An agglomerative hierarchical clustering approach to visualisation in Bayesian clustering problems

    PubMed Central

    Dawson, Kevin J.; Belkhir, Khalid

    2009-01-01

    Clustering problems (including the clustering of individuals into outcrossing populations, hybrid generations, full-sib families and selfing lines) have recently received much attention in population genetics. In these clustering problems, the parameter of interest is a partition of the set of sampled individuals: the sample partition. In a fully Bayesian approach to clustering problems of this type, our knowledge about the sample partition is represented by a probability distribution on the space of possible sample partitions. Since the number of possible partitions grows very rapidly with the sample size, we cannot visualise this probability distribution in its entirety unless the sample is very small. As a solution to this visualisation problem, we recommend using an agglomerative hierarchical clustering algorithm, which we call the exact linkage algorithm. This algorithm is a special case of the maximin clustering algorithm that we introduced previously. The exact linkage algorithm is now implemented in our software package Partition View. It takes the posterior co-assignment probabilities as input and yields as output a rooted binary tree or, more generally, a forest of such trees. Each node of this forest defines a set of individuals, and the node height is the posterior co-assignment probability of this set. This provides a useful visual representation of the uncertainty associated with the assignment of individuals to categories. It is also a useful starting point for a more detailed exploration of the posterior distribution in terms of the co-assignment probabilities. PMID:19337306
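
    The exact linkage algorithm takes posterior co-assignment probabilities as its input. A minimal sketch of estimating pairwise co-assignment probabilities from sampled partitions (for example, MCMC draws of the sample partition, which is an assumption here) is shown below; it does not implement the exact linkage algorithm itself, and the function name and draws are hypothetical.

```python
def coassignment_probabilities(sampled_partitions):
    """Estimate pairwise posterior co-assignment probabilities.

    Each sampled partition is a list of cluster labels, one per individual.
    Labels are arbitrary: only whether two individuals share a label matters.
    Entry [i][j] is the fraction of samples in which i and j share a cluster.
    """
    m = len(sampled_partitions)
    n = len(sampled_partitions[0])
    prob = [[0.0] * n for _ in range(n)]
    for labels in sampled_partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    prob[i][j] += 1.0 / m
    return prob


# Four hypothetical posterior draws for three individuals.
draws = [[0, 0, 1], [1, 1, 0], [0, 1, 1], [2, 2, 2]]
P = coassignment_probabilities(draws)
```

    Individuals 0 and 1 share a cluster in three of the four draws, so P[0][1] is 0.75.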

  9. Network Disruption in the Preclinical Stages of Alzheimer's Disease: From Subjective Cognitive Decline to Mild Cognitive Impairment.

    PubMed

    López-Sanz, David; Garcés, Pilar; Álvarez, Blanca; Delgado-Losada, María Luisa; López-Higes, Ramón; Maestú, Fernando

    2017-12-01

    Subjective Cognitive Decline (SCD) is a largely unknown state thought to represent a preclinical stage of Alzheimer's Disease (AD) preceding mild cognitive impairment (MCI). However, the course of network disruption in these stages is scarcely characterized. We employed resting-state magnetoencephalography in source space to calculate network small-worldness, clustering, modularity and transitivity. Nodal measures (clustering and node degree) as well as modular partitions were compared between groups. The MCI group exhibited decreased small-worldness, clustering and transitivity and increased modularity in the theta and beta bands. SCD showed similar but smaller changes in clustering and transitivity, while exhibiting alterations in the alpha band in the opposite direction to those shown by MCI for modularity and transitivity. At the node level, MCI disrupted both clustering and nodal degree, while SCD showed minor changes in the latter. Additionally, we observed an increase in modular partition variability in both SCD and MCI in the theta and beta bands. SCD elders exhibit significant network disruption, showing intermediate values between the healthy control (HC) and MCI groups in multiple parameters. These results highlight the relevance of cognitive concerns in the clinical setting and suggest that network disorganization in AD could start in the preclinical stages, before the onset of cognitive symptoms.

  10. The threshold bootstrap clustering: a new approach to find families or transmission clusters within molecular quasispecies.

    PubMed

    Prosperi, Mattia C F; De Luca, Andrea; Di Giambenedetto, Simona; Bracciale, Laura; Fabbiani, Massimiliano; Cauda, Roberto; Salemi, Marco

    2010-10-25

    Phylogenetic methods produce hierarchies of molecular species, inferring knowledge about taxonomy and evolution. However, there is not yet a consensus methodology that provides a crisp partition of taxa, which is desirable when considering the problem of intra/inter-patient quasispecies classification or infection transmission event identification. We introduce the threshold bootstrap clustering (TBC), a new methodology for partitioning molecular sequences that does not require a phylogenetic tree estimation. The TBC is an incremental partition algorithm, inspired by the stochastic Chinese restaurant process, and takes advantage of resampling techniques and models of sequence evolution. TBC uses as input a multiple alignment of molecular sequences, and its output is a crisp partition of the taxa into an automatically determined number of clusters. By varying initial conditions, the algorithm can produce different partitions. We describe a procedure that selects a prime partition among a set of candidate ones and calculates a measure of cluster reliability. TBC was successfully tested for the identification of human immunodeficiency virus type 1 (HIV-1) and hepatitis C virus (HCV) subtypes, and compared with previously established methodologies. It was also evaluated in the problem of HIV-1 intra-patient quasispecies clustering, and for transmission cluster identification, using a set of sequences from patients with known transmission event histories. TBC has been shown to be effective for the subtyping of HIV and HCV, and for identifying intra-patient quasispecies. To some extent, the algorithm was also able to infer clusters corresponding to events of infection transmission. The computational complexity of TBC is quadratic in the number of taxa, lower than other established methods; in addition, TBC has been enhanced with a measure of cluster reliability. The TBC can be useful to characterise molecular quasispecies in a broad context.

  11. Finite element modeling of diffusion and partitioning in biological systems: the infinite composite medium problem.

    PubMed

    Missel, P J

    2000-01-01

    Four methods are proposed for modeling diffusion in heterogeneous media where diffusion and partition coefficients take on differing values in each subregion. The exercise was conducted to validate finite element modeling (FEM) procedures in anticipation of modeling drug diffusion with regional partitioning into ocular tissue, though the approach can be useful for other organs, or for modeling diffusion in laminate devices. Partitioning creates a discontinuous value in the dependent variable (concentration) at an intertissue boundary that is not easily handled by available general-purpose FEM codes, which allow for only one value at each node. The discontinuity is handled using a transformation on the dependent variable based upon the region-specific partition coefficient. Methods were evaluated by their ability to reproduce a known exact result, for the problem of the infinite composite medium (Crank, J. The Mathematics of Diffusion, 2nd ed. New York: Oxford University Press, 1975, pp. 38-39.). The most physically intuitive method is based upon the concept of chemical potential, which is continuous across an interphase boundary (method III). This method makes the equation of the dependent variable highly nonlinear. This can be linearized easily by a change of variables (method IV). Results are also given for a one-dimensional problem simulating bolus injection into the vitreous, predicting time disposition of drug in vitreous and retina.

  12. Genetic diversity and population structure analysis between Indian red jungle fowl and domestic chicken using microsatellite markers.

    PubMed

    Kumar, Vinay; Shukla, Sanjeev K; Mathew, Jose; Sharma, Deepak

    2015-01-01

    The present study was conducted to assess the genetic diversity, population structure, and relatedness of Indian red jungle fowl (RJF, Gallus gallus murgi) from northern India and three domestic chicken populations (Gallus gallus domesticus) maintained at the institute farms, namely White Leghorn (WL), Aseel (AS) and Red Cornish (RC), using 25 microsatellite markers. All the markers were polymorphic; the number of alleles per locus ranged from five (MCW0111) to forty-three (LEI0212), with an average of 19 alleles per locus. Across all loci, the mean expected heterozygosity and polymorphic information content were 0.883 and 0.872, respectively. Population-specific alleles were found in each population. A UPGMA dendrogram based on shared allele distances clearly revealed two major clusters among the four populations: cluster I had genotypes from RJF and WL, whereas cluster II had AS and RC genotypes. Population structure was also estimated to understand how genetic variation is partitioned within and among populations. The maximum ΔK value was observed for K = 4, with four identified clusters. Furthermore, factorial analysis clearly showed four clusters, each representing one of the four population types used in the study. These results clearly demonstrate the potential of microsatellite markers for elucidating genetic diversity, relationships, and population structure in RJF and domestic chicken populations.
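
    The UPGMA step mentioned above repeatedly merges the two closest clusters and sets the distance from the merged cluster to every other cluster as the size-weighted mean of the old distances. A minimal sketch on a small distance matrix is shown below. The distance values are hypothetical (chosen only to mimic the reported RJF/WL versus AS/RC split, not taken from the study), and computing shared allele distances from genotypes is not shown.

```python
def upgma(names, dist):
    """UPGMA (average-linkage) clustering of a symmetric distance matrix.

    Returns the merge history as (members_a, members_b, distance) tuples.
    """
    clusters = {i: (name,) for i, name in enumerate(names)}
    sizes = {i: 1 for i in clusters}
    d = {frozenset((i, j)): dist[i][j] for i in clusters for j in clusters if i < j}
    merges = []
    next_id = len(names)
    while len(clusters) > 1:
        pair = min(d, key=d.get)  # closest pair of current clusters
        a, b = sorted(pair)
        merges.append((clusters[a], clusters[b], d[pair]))
        # Distance to the merged cluster is the size-weighted mean of the old distances.
        for c in clusters:
            if c not in (a, b):
                d[frozenset((next_id, c))] = (
                    sizes[a] * d[frozenset((a, c))] + sizes[b] * d[frozenset((b, c))]
                ) / (sizes[a] + sizes[b])
        for key in [k for k in d if a in k or b in k]:
            del d[key]
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        sizes[next_id] = sizes.pop(a) + sizes.pop(b)
        next_id += 1
    return merges


names = ["RJF", "WL", "AS", "RC"]
dist = [  # hypothetical pairwise distances
    [0.00, 0.30, 0.80, 0.85],
    [0.30, 0.00, 0.75, 0.80],
    [0.80, 0.75, 0.00, 0.40],
    [0.85, 0.80, 0.40, 0.00],
]
merges = upgma(names, dist)
```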

  13. Approximation algorithm for the problem of partitioning a sequence into clusters

    NASA Astrophysics Data System (ADS)

    Kel'manov, A. V.; Mikhailova, L. V.; Khamidullin, S. A.; Khandeev, V. I.

    2017-08-01

    We consider the problem of partitioning a finite sequence of Euclidean points into a given number of clusters (subsequences) using the criterion of the minimal sum (over all clusters) of intracluster sums of squared distances from the elements of the clusters to their centers. It is assumed that the center of one of the desired clusters is at the origin, while the center of each of the other clusters is unknown and determined as the mean value over all elements in this cluster. Additionally, the partition obeys two structural constraints on the indices of sequence elements contained in the clusters with unknown centers: (1) the concatenation of the indices of elements in these clusters is an increasing sequence, and (2) the difference between an index and the preceding one is bounded above and below by prescribed constants. It is shown that this problem is strongly NP-hard. A 2-approximation algorithm is constructed that is polynomial-time for a fixed number of clusters.
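
    The objective and the two index constraints can be made concrete by evaluating a single candidate partition. This minimal sketch only scores and checks a given partition; it is not the 2-approximation algorithm. It assumes that label 0 marks the origin-centred cluster, that labels 1, 2, ... mark the unknown-centre clusters in order, and it reads constraint (1) as "indices of cluster 1, then cluster 2, ... form one increasing sequence". The function names and data are hypothetical.

```python
def partition_objective(points, labels):
    """Sum over clusters of intracluster squared distances to the cluster centre.

    Label 0 marks the cluster whose centre is fixed at the origin; every
    other label marks a cluster whose centre is the mean of its members.
    """
    total = 0.0
    groups = {}
    for p, lab in zip(points, labels):
        if lab == 0:
            total += sum(c * c for c in p)  # squared distance to the origin
        else:
            groups.setdefault(lab, []).append(p)
    for members in groups.values():
        centre = [sum(col) / len(members) for col in zip(*members)]
        total += sum(sum((a - b) ** 2 for a, b in zip(m, centre)) for m in members)
    return total


def satisfies_index_constraints(labels, t_min, t_max):
    """Check constraints (1) and (2) on the indices of the unknown-centre clusters.

    Indices of clusters 1, 2, ... are concatenated in label order; the result
    must be increasing, with every gap between t_min and t_max inclusive.
    """
    seq = []
    for lab in sorted(set(labels) - {0}):
        seq.extend(i for i, x in enumerate(labels) if x == lab)
    gaps = [b - a for a, b in zip(seq, seq[1:])]
    return all(g > 0 for g in gaps) and all(t_min <= g <= t_max for g in gaps)


points = [(0.0, 0.0), (4.0, 0.0), (0.0, 1.0), (6.0, 0.0), (10.0, 10.0), (12.0, 10.0)]
labels = [0, 1, 0, 1, 2, 2]
value = partition_objective(points, labels)
```

    For this toy sequence the origin cluster contributes 1, each unknown-centre cluster contributes 2, and the concatenated index sequence 1, 3, 4, 5 has gaps 2, 1, 1.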

  14. A new clustering algorithm applicable to multispectral and polarimetric SAR images

    NASA Technical Reports Server (NTRS)

    Wong, Yiu-Fai; Posner, Edward C.

    1993-01-01

    We describe an application of a scale-space clustering algorithm to the classification of a multispectral and polarimetric SAR image of an agricultural site. After the initial polarimetric and radiometric calibration and noise cancellation, we extracted a 12-dimensional feature vector for each pixel from the scattering matrix. The clustering algorithm was able to partition a set of unlabeled feature vectors from 13 selected sites, each site corresponding to a distinct crop, into 13 clusters without any supervision. The cluster parameters were then used to classify the whole image. The classification map is much less noisy and more accurate than those obtained by hierarchical rules. Starting with every point as a cluster, the algorithm works by melting the system to produce a tree of clusters in the scale space. It can cluster data in any multidimensional space and is insensitive to variability in cluster densities, sizes and ellipsoidal shapes. This algorithm, more powerful than existing ones, may be useful for remote sensing for land use.

  15. Variance-Based Cluster Selection Criteria in a K-Means Framework for One-Mode Dissimilarity Data.

    PubMed

    Vera, J Fernando; Macías, Rodrigo

    2017-06-01

    One of the main problems in cluster analysis is that of determining the number of groups in the data. In general, the approach taken depends on the cluster method used. For K-means, some of the most widely employed criteria are formulated in terms of the decomposition of the total point scatter, regarding a two-mode data set of N points in p dimensions, which are optimally arranged into K classes. This paper addresses the formulation of criteria to determine the number of clusters, in the general situation in which the available information for clustering is a one-mode N × N dissimilarity matrix describing the objects. In this framework, p and the coordinates of points are usually unknown, and the application of criteria originally formulated for two-mode data sets is dependent on their possible reformulation in the one-mode situation. The decomposition of the variability of the clustered objects is proposed in terms of the corresponding block-shaped partition of the dissimilarity matrix. Within-block and between-block dispersion values for the partitioned dissimilarity matrix are derived, and variance-based criteria are subsequently formulated in order to determine the number of groups in the data. A Monte Carlo experiment was carried out to study the performance of the proposed criteria. For simulated clustered points in p dimensions, greater efficiency in recovering the number of clusters is obtained when the criteria are calculated from the related Euclidean distances instead of the known two-mode data set, in general, for unequal-sized clusters and for low dimensionality situations. For simulated dissimilarity data sets, the proposed criteria always outperform the results obtained when these criteria are calculated from their original formulation, using dissimilarities instead of distances.
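
    The within-block and between-block decomposition can be illustrated with a standard identity: for Euclidean distances, the within-cluster sum of squares of cluster k equals the sum of d_ij² over all ordered pairs i, j in k, divided by 2·n_k. The sketch below uses that identity to compute total, within and between dispersion from a dissimilarity matrix alone, and then forms the Calinski-Harabasz ratio as one example of a variance-based criterion. It is a minimal illustration, not necessarily the criteria derived in the paper; the function names and points are hypothetical.

```python
def dispersion_from_dissimilarities(d, labels):
    """Total, within- and between-cluster dispersion from an N x N matrix.

    Uses sum_{i,j in k} d[i][j]**2 / (2 * n_k) as the within-cluster sum of
    squares of cluster k, which is exact when d holds Euclidean distances.
    """
    n = len(d)
    total = sum(d[i][j] ** 2 for i in range(n) for j in range(n)) / (2.0 * n)
    within = 0.0
    for k in set(labels):
        members = [i for i in range(n) if labels[i] == k]
        within += sum(d[i][j] ** 2 for i in members for j in members) / (2.0 * len(members))
    return total, within, total - within


def calinski_harabasz(d, labels):
    """Ratio of between- to within-cluster dispersion, each per degree of freedom."""
    n, k = len(d), len(set(labels))
    _, within, between = dispersion_from_dissimilarities(d, labels)
    return (between / (k - 1)) / (within / (n - k))


xs = [0.0, 1.0, 10.0, 11.0]  # hypothetical 1-D points; only their distances are used
d = [[abs(a - b) for b in xs] for a in xs]
labels = [0, 0, 1, 1]
total, within, between = dispersion_from_dissimilarities(d, labels)
```

    The point coordinates are used only to build the matrix; the dispersion values are computed from d alone, which is the one-mode situation the abstract describes.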

  16. Co-Clustering by Bipartite Spectral Graph Partitioning for Out-of-Tutor Prediction

    ERIC Educational Resources Information Center

    Trivedi, Shubhendu; Pardos, Zachary A.; Sarkozy, Gabor N.; Heffernan, Neil T.

    2012-01-01

    Learning a more distributed representation of the input feature space is a powerful method to boost the performance of a given predictor. Often this is accomplished by partitioning the data into homogeneous groups by clustering so that separate models could be trained on each cluster. Intuitively each such predictor is a better representative of…

  17. On the Partitioning of Squared Euclidean Distance and Its Applications in Cluster Analysis.

    ERIC Educational Resources Information Center

    Carter, Randy L.; And Others

    1989-01-01

    The partitioning of the squared Euclidean (E²) distance between two vectors in M-dimensional space into the sum of squared lengths of vectors in mutually orthogonal subspaces is discussed. Applications to specific cluster analysis problems are provided (i.e., to design Monte Carlo studies for performance comparisons of several clustering methods…
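
    The decomposition described above follows from orthogonality: if the basis vectors of the subspaces together form an orthonormal basis, the squared lengths of the projections of x − y onto each subspace add up to the squared distance. A minimal sketch is shown below; the function name and the rotated basis of R² are hypothetical illustrations.

```python
import math


def projected_squared_lengths(x, y, subspaces):
    """Split ||x - y||**2 across mutually orthogonal subspaces.

    `subspaces` is a list of groups of orthonormal basis vectors; together
    the groups must form an orthonormal basis of the whole space. Returns
    the squared length of the projection of x - y onto each group.
    """
    diff = [a - b for a, b in zip(x, y)]
    parts = []
    for basis in subspaces:
        parts.append(sum(sum(u_i * d_i for u_i, d_i in zip(u, diff)) ** 2 for u in basis))
    return parts


s = 1.0 / math.sqrt(2.0)
subspaces = [[(s, s)], [(s, -s)]]  # two orthogonal 1-D subspaces of R^2
parts = projected_squared_lengths((3.0, 1.0), (0.0, 0.0), subspaces)
```

    Here ||x − y||² = 10 splits into 8 and 2.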

  18. Implementation of spectral clustering on microarray data of carcinoma using k-means algorithm

    NASA Astrophysics Data System (ADS)

    Frisca; Bustamam, Alhadi; Siswantining, Titin

    2017-03-01

    Clustering is a data analysis method that places data with similar characteristics in the same group. Spectral clustering is one of the most popular modern clustering algorithms; as an effective clustering technique, it emerged from the concepts of spectral graph theory. Spectral clustering requires a partitioning algorithm, such as PAM, SOM, fuzzy c-means or k-means. Research by Capital and Choudhury in 2013 found that, with Euclidean distance, the k-means algorithm provides better accuracy than the PAM algorithm, so in this paper we use k-means as our partitioning algorithm. The major advantage of spectral clustering lies in reducing data dimension, in this case the dimension of a large microarray dataset. A microarray is a small chip made of a glass plate containing thousands or even tens of thousands of kinds of genes in DNA fragments derived from doubling cDNA. Microarray data are widely used to detect cancer, for example carcinoma, in which cancer cells express abnormalities in their genes. The purpose of this research is to place data with high similarity in the same group and data with low similarity in different groups. The carcinoma microarray data used in this research contain 7457 genes. Partitioning with the k-means algorithm yields two clusters.

  19. Finding reproducible cluster partitions for the k-means algorithm

    PubMed Central

    2013-01-01

    K-means clustering is widely used for exploratory data analysis. While its dependence on initialisation is well-known, it is common practice to assume that the partition with the lowest total sum-of-squares (SSQ), i.e. within-cluster variance, is both reproducible under repeated initialisations and also the closest that k-means can provide to true structure, when applied to synthetic data. We show that this is generally the case for small numbers of clusters, but for values of k that are still of theoretical and practical interest, similar values of SSQ can correspond to markedly different cluster partitions. This paper extends stability measures previously presented in the context of finding optimal values of cluster number, into a component of a 2-d map of the local minima found by the k-means algorithm, from which not only can values of k be identified for further analysis but, more importantly, it is made clear whether the best SSQ is a suitable solution or whether obtaining a consistently good partition requires further application of the stability index. The proposed method is illustrated by application to five synthetic datasets replicating a real world breast cancer dataset with varying data density, and a large bioinformatics dataset. PMID:23369085
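
    The practice questioned above (rerun k-means from many initialisations and keep the lowest-SSQ partition) can be sketched by pairing each run's SSQ with how closely its partition agrees with the best run. This is a minimal illustration using Lloyd's algorithm and the plain Rand index as a stand-in agreement measure; it is not the authors' stability index or their 2-d map, and the function names and points are hypothetical.

```python
import random


def sqdist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, seed, n_iter=100):
    """Lloyd's k-means started from k randomly chosen data points; returns (labels, SSQ)."""
    rng = random.Random(seed)
    centres = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iter):
        new_labels = [min(range(k), key=lambda c: sqdist(p, centres[c])) for p in points]
        if new_labels == labels:
            break  # assignments are stable: a local minimum of SSQ
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centre
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    ssq = sum(sqdist(p, centres[lab]) for p, lab in zip(points, labels))
    return labels, ssq


def rand_index(a, b):
    """Proportion of point pairs on which two partitions agree."""
    pairs = agree = 0
    for i in range(len(a)):
        for j in range(i + 1, len(a)):
            pairs += 1
            agree += (a[i] == a[j]) == (b[i] == b[j])
    return agree / pairs


def restart_map(points, k, seeds):
    """For each seed, report (SSQ, agreement with the lowest-SSQ partition)."""
    runs = [kmeans(points, k, s) for s in seeds]
    best_labels, _ = min(runs, key=lambda r: r[1])
    return [(ssq, rand_index(labels, best_labels)) for labels, ssq in runs]


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
summary = restart_map(points, 2, range(5))
```

    On well-separated toy data every run reaches the same SSQ with full agreement; runs whose SSQ is close to the best value but whose agreement is low would signal the instability the abstract describes.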

  20. Finding reproducible cluster partitions for the k-means algorithm.

    PubMed

    Lisboa, Paulo J G; Etchells, Terence A; Jarman, Ian H; Chambers, Simon J

    2013-01-01

  1. A similarity based agglomerative clustering algorithm in networks

    NASA Astrophysics Data System (ADS)

    Liu, Zhiyuan; Wang, Xiujuan; Ma, Yinghong

    2018-04-01

    Detecting clusters helps in understanding the organization and functions of networks. Clusters, or communities, are usually groups of nodes that are densely interconnected but sparsely linked with other clusters. To identify communities, an efficient and effective community agglomerative algorithm based on node similarity is proposed. The proposed method first calculates the similarity between each pair of nodes and forms pre-partitions according to the principle that each node is in the same community as its most similar neighbor. Each pre-partition is then checked against a community criterion. Pre-partitions that do not satisfy it are merged with the partitions that exert the greatest attraction on them, until there are no further changes. To measure the attraction ability of a partition, we propose an attraction index based on the importance of the linked nodes in the network. Our proposed method can therefore better exploit the nodes' properties and the network's structure. To test the performance of the algorithm, both synthetic and empirical networks of different scales are tested. Simulation results show that the proposed algorithm obtains superior clustering results compared with six other widely used community detection algorithms.
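
    The pre-partition step (each node joins the community of its most similar neighbour) can be sketched with a union-find structure. The abstract does not specify the similarity measure, so this minimal sketch assumes the Jaccard index of closed neighbourhoods; the community-criterion check and the attraction-based merging are not shown, and the function names and example graph are hypothetical.

```python
def jaccard(a, b):
    """Jaccard index of two sets."""
    return len(a & b) / len(a | b)


def pre_partitions(adjacency):
    """Group every node with its most similar neighbour using union-find.

    `adjacency` maps node -> set of neighbours. Similarity is the Jaccard
    index of closed neighbourhoods (an assumption made for this sketch).
    """
    closed = {v: nbrs | {v} for v, nbrs in adjacency.items()}
    parent = {v: v for v in adjacency}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for v, nbrs in adjacency.items():
        if not nbrs:
            continue  # an isolated node stays in its own pre-partition
        best = max(sorted(nbrs), key=lambda u: jaccard(closed[v], closed[u]))
        parent[find(v)] = find(best)
    groups = {}
    for v in adjacency:
        groups.setdefault(find(v), []).append(v)
    return sorted(sorted(g) for g in groups.values())


# Two triangles joined by the single edge 2-3 (hypothetical example).
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
adjacency = {v: set() for v in range(6)}
for a, b in edges:
    adjacency[a].add(b)
    adjacency[b].add(a)
groups = pre_partitions(adjacency)
```

    Nodes 2 and 3 share only each other in their closed neighbourhoods, so each prefers a neighbour inside its own triangle and the two triangles form separate pre-partitions.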

  2. The Clusters AgeS Experiment (CASE). Variable Stars in the Field of the Globular Cluster NGC 3201

    NASA Astrophysics Data System (ADS)

    Kaluzny, J.; Rozyczka, M.; Thompson, I. B.; Narloch, W.; Mazur, B.; Pych, W.; Schwarzenberg-Czerny, A.

    2016-01-01

    The field of the globular cluster NGC 3201 was monitored between 1998 and 2009 in a search for variable stars. BV light curves were obtained for 152 periodic or likely periodic variables, fifty-seven of which are new detections. Thirty-seven newly detected variables are proper motion members of the cluster. Among them we found seven detached or semi-detached eclipsing binaries, four contact binaries, and eight SX Phe pulsators. Four of the eclipsing binaries are located in the turnoff region, one on the lower main sequence and the remaining two slightly above the subgiant branch. Two contact systems are blue stragglers, and another two reside in the turnoff region. In the blue straggler region a total of 266 objects were found, of which 140 are proper motion (PM) members of NGC 3201, and another nineteen are field stars. Seventy-eight of the remaining objects for which we do not have PM data are located within the half-light radius from the center of the cluster, and most of them are likely genuine blue stragglers. Four variable objects in our field of view were found to coincide with X-ray sources: three chromospherically active stars and a quasar at a redshift z ≈ 0.5.

  3. Automating the expert consensus paradigm for robust lung tissue classification

    NASA Astrophysics Data System (ADS)

    Rajagopalan, Srinivasan; Karwoski, Ronald A.; Raghunath, Sushravya; Bartholmai, Brian J.; Robb, Richard A.

    2012-03-01

    Clinicians confirm the efficacy of dynamic multidisciplinary interactions in diagnosing lung disease/wellness from CT scans. However, routine clinical practice cannot readily accommodate such interactions. Current schemes for automating lung tissue classification are based on a single elusive disease-differentiating metric; this undermines their reliability in routine diagnosis. We propose a computational workflow that uses a collection of 15 probability density function (pdf)-based similarity metrics to automatically cluster 976 pattern-specific volumes of interest (VOIs), covering 5 patterns, extracted from the lung CT scans of 14 patients. The resultant clusters are refined for intra-partition compactness and subsequently aggregated into a super cluster using a cluster ensemble technique. The super clusters were validated against the consensus agreement of four clinical experts. The aggregations correlated strongly with expert consensus. By effectively mimicking the expertise of physicians, the proposed workflow could make automation of lung tissue classification a clinical reality.

  4. Variable length adjacent partitioning for PTS based PAPR reduction of OFDM signal

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ibraheem, Zeyid T.; Rahman, Md. Mijanur; Yaakob, S. N.

    2015-05-15

    Peak-to-average power ratio (PAPR) is a major drawback in OFDM communication. It drives the power amplifier into nonlinear operation, resulting in loss of data integrity. As such, there is a strong motivation to find techniques to reduce PAPR. Partial Transmit Sequence (PTS) is an attractive scheme for this purpose. Judicious partitioning of the OFDM data frame into disjoint subsets is a pivotal component of any PTS scheme. Of the existing partitioning techniques, adjacent partitioning offers an attractive trade-off between cost and performance. To determine the effects of length variability in adjacent partitions, we investigated the performance of variable length adjacent partitioning (VL-AP) and fixed length adjacent partitioning in comparison with other partitioning schemes such as pseudorandom partitioning. Simulation results with different modulation and partitioning scenarios showed that fixed length adjacent partitioning performed better than variable length adjacent partitioning. As expected, simulation results showed slightly better performance for the pseudorandom partitioning technique than for the fixed and variable adjacent partitioning schemes. However, as the pseudorandom technique incurs high computational complexity, adjacent partitioning schemes were still seen as favorable candidates for PAPR reduction.
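
    The three partitioning schemes compared above, and the PTS search that uses them, can be sketched as follows. This is a minimal illustration under stated assumptions: phase factors restricted to {+1, −1}, an exhaustive search over them, a naive inverse DFT without oversampling (so PAPR is only approximated), and random QPSK symbols on 16 subcarriers split into 4 subsets. None of these settings, nor the function names, come from the paper's simulations.

```python
import cmath
import itertools
import random


def adjacent_partition(n, v):
    """Fixed-length adjacent partitioning: v consecutive blocks of n // v subcarriers."""
    assert n % v == 0, "n must be divisible by v"
    size = n // v
    return [list(range(i * size, (i + 1) * size)) for i in range(v)]


def variable_adjacent_partition(lengths):
    """Adjacent partitioning with (possibly unequal) consecutive block lengths."""
    blocks, start = [], 0
    for length in lengths:
        blocks.append(list(range(start, start + length)))
        start += length
    return blocks


def pseudorandom_partition(n, v, seed=0):
    """Pseudorandom partitioning: equal-size subsets of a shuffled index list."""
    assert n % v == 0, "n must be divisible by v"
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    size = n // v
    return [sorted(idx[i * size:(i + 1) * size]) for i in range(v)]


def idft(freq):
    """Naive inverse DFT of a list of complex subcarrier symbols."""
    n = len(freq)
    return [sum(freq[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def papr(signal):
    """Peak power divided by mean power."""
    powers = [abs(s) ** 2 for s in signal]
    return max(powers) / (sum(powers) / len(powers))


def pts_min_papr(freq, partition, phases=(1, -1)):
    """Exhaustive PTS search over phase factors; returns the lowest PAPR found."""
    n = len(freq)
    subsignals = []
    for subset in partition:
        members = set(subset)
        subsignals.append(idft([freq[k] if k in members else 0 for k in range(n)]))
    best = float("inf")
    for factors in itertools.product(phases, repeat=len(partition)):
        combined = [sum(b * sub[t] for b, sub in zip(factors, subsignals)) for t in range(n)]
        best = min(best, papr(combined))
    return best


rng = random.Random(1)
qpsk = [complex(rng.choice((-1, 1)), rng.choice((-1, 1))) for _ in range(16)]
original = papr(idft(qpsk))
results = {
    "adjacent": pts_min_papr(qpsk, adjacent_partition(16, 4)),
    "variable adjacent": pts_min_papr(qpsk, variable_adjacent_partition([2, 6, 3, 5])),
    "pseudorandom": pts_min_papr(qpsk, pseudorandom_partition(16, 4)),
}
```

    Because the all-(+1) phase vector reproduces the original signal, the PTS result can never be worse than the unmodified PAPR. The cost of the exhaustive search grows as 2^V in the number of subsets V, which is why cheaper partitioning and search strategies matter in practice.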

  5. Climatic and physiographic controls of spatial variability in surface water balance over the contiguous United States using the Budyko relationship

    NASA Astrophysics Data System (ADS)

    Abatzoglou, John T.; Ficklin, Darren L.

    2017-09-01

    The geographic variability in the partitioning of precipitation into surface runoff (Q) and evapotranspiration (ET) is fundamental to understanding regional water availability. The Budyko equation suggests this partitioning is strictly a function of aridity, yet observed deviations from this relationship for individual watersheds impede using the framework to model surface water balance in ungauged catchments and under future climate and land use scenarios. A set of climatic, physiographic, and vegetation metrics were used to model the spatial variability in the partitioning of precipitation for 211 watersheds across the contiguous United States (CONUS) within Budyko's framework through the free parameter ω. A generalized additive model found that four widely available variables (precipitation seasonality, the ratio of soil water holding capacity to precipitation, topographic slope, and the fraction of precipitation falling as snow) explained 81.2% of the variability in ω. The ω model applied to the Budyko equation explained 97% of the spatial variability in long-term Q for an independent set of watersheds. The ω model was also applied to estimate the long-term water balance across the CONUS for both contemporary and mid-21st century conditions. The modeled partitioning of observed precipitation to Q and ET compared favorably across the CONUS with estimates from more sophisticated land-surface modeling efforts. For mid-21st century conditions, the model simulated an increase in the fraction of precipitation used by ET across the CONUS with declines in Q for much of the eastern CONUS and mountainous watersheds across the western United States.
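
    The abstract does not state the form of the Budyko equation in which ω appears. A widely used choice is Fu's form, ET/P = 1 + PET/P − [1 + (PET/P)^ω]^(1/ω). The minimal sketch below assumes that form (an assumption, not confirmed by the abstract) and computes long-term runoff as Q = P − ET, neglecting storage change. It does not reproduce the authors' generalized additive model for ω, and the input values are hypothetical.

```python
import math


def fu_evaporative_ratio(aridity, omega):
    """ET/P from Fu's form of the Budyko curve (assumed form).

    aridity = PET / P; omega (> 1) is the free parameter.
    """
    return 1.0 + aridity - (1.0 + aridity ** omega) ** (1.0 / omega)


def long_term_runoff(precip, pet, omega):
    """Long-term Q = P - ET, assuming negligible change in storage."""
    return precip * (1.0 - fu_evaporative_ratio(pet / precip, omega))


q = long_term_runoff(precip=1000.0, pet=1000.0, omega=2.0)  # mm/yr, hypothetical
```

    With PET = P and ω = 2 the evaporative ratio is 2 − √2 ≈ 0.586, so about 414 mm/yr of the 1000 mm/yr becomes runoff. Larger ω shifts more precipitation to ET at the same aridity.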

  6. Manual hierarchical clustering of regional geochemical data using a Bayesian finite mixture model

    USGS Publications Warehouse

    Ellefsen, Karl J.; Smith, David

    2016-01-01

    Interpretation of regional scale, multivariate geochemical data is aided by a statistical technique called “clustering.” We investigate a particular clustering procedure by applying it to geochemical data collected in the State of Colorado, United States of America. The clustering procedure partitions the field samples for the entire survey area into two clusters. The field samples in each cluster are partitioned again to create two subclusters, and so on. This manual procedure generates a hierarchy of clusters, and the different levels of the hierarchy show geochemical and geological processes occurring at different spatial scales. Although there are many different clustering methods, we use Bayesian finite mixture modeling with two probability distributions, which yields two clusters. The model parameters are estimated with Hamiltonian Monte Carlo sampling of the posterior probability density function, which usually has multiple modes. Each mode has its own set of model parameters; each set is checked to ensure that it is consistent both with the data and with independent geologic knowledge. The set of model parameters that is most consistent with the independent geologic knowledge is selected for detailed interpretation and partitioning of the field samples.

  7. Discrete wavelet approach to multifractality

    NASA Astrophysics Data System (ADS)

    Isaacson, Susana I.; Gabbanelli, Susana C.; Busch, Jorge R.

    2000-12-01

    The use of wavelet techniques for multifractal analysis generalizes the box-counting approach and, in addition, provides information on possible deviations from multifractal behavior. By introducing a wavelet partition function Wq and its corresponding free energy β(q), the discrepancies between β(q) and the multifractal free energy τ(q) are shown to be indicative of these deviations. Using Daubechies (D4) wavelets, we study some 1D examples previously treated with Haar wavelets, and we apply the same ideas to some 2D Monte Carlo configurations that simulate a solution under the action of an attractive potential. In this last case, we study the influence of four physical parameters on the multifractal spectra and partition functions: the intensity of the pairwise potential, the temperature, the range of the model potential, and the concentration of the solution. The wavelet partition function Wq carries more information about the cluster statistics than the multifractal partition function Zq, and the location of its peaks helps determine characteristic scales of the measure. In our experience, the information provided by Daubechies wavelets is slightly more accurate than that obtained with Haar wavelets.
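
The wavelet partition function Wq itself is not reproduced here. As a hedged illustration of the box-counting approach it generalizes, the sketch computes Zq(ε) = Σ μ_i^q for a binomial multiplicative cascade (an assumed test measure with weights p and 1 - p) and estimates τ(q) as the least-squares slope of log Zq against log ε. For this measure τ(q) = -log2(p^q + (1 - p)^q) exactly, which makes the estimate easy to check.

```python
import math


def binomial_cascade(p, levels):
    """Masses of the 2**levels dyadic boxes of a binomial cascade on [0, 1]."""
    masses = [1.0]
    for _ in range(levels):
        # Each box splits into two children carrying fractions p and 1 - p.
        masses = [m * w for m in masses for w in (p, 1 - p)]
    return masses


def coarse_grain(masses):
    """Merge sibling boxes pairwise, doubling the box size."""
    return [masses[i] + masses[i + 1] for i in range(0, len(masses), 2)]


def tau(masses, q, min_boxes=2):
    """Estimate tau(q) as the slope of log Z_q(eps) versus log eps over dyadic scales."""
    xs, ys = [], []
    while len(masses) >= min_boxes:
        eps = 1.0 / len(masses)
        xs.append(math.log(eps))
        ys.append(math.log(sum(m ** q for m in masses if m > 0)))
        masses = coarse_grain(masses)
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


if __name__ == "__main__":
    measure = binomial_cascade(0.7, 10)
    for q in (0, 1, 2, 3):
        print(q, round(tau(measure, q), 4), round(-math.log2(0.7 ** q + 0.3 ** q), 4))
```

With this convention τ(0) = -1 (the box-counting dimension of the support, negated) and τ(1) = 0; a non-linear τ(q) signals multifractality.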

  8. Cluster Stability Estimation Based on a Minimal Spanning Trees Approach

    NASA Astrophysics Data System (ADS)

    Volkovich, Zeev (Vladimir); Barzily, Zeev; Weber, Gerhard-Wilhelm; Toledano-Kitai, Dvora

    2009-08-01

    Among the areas of data and text mining employed today in science, economics and technology, clustering theory serves as a preprocessing step in data analysis. However, many open questions still await theoretical and practical treatment; for example, the problem of determining the true number of clusters has not been satisfactorily solved. In the current paper, this problem is addressed by the cluster stability approach. For several possible numbers of clusters, we estimate the stability of partitions obtained from clustering samples. Partitions are considered consistent if their clusters are stable. Cluster validity is measured as the total number of edges, in the clusters' minimal spanning trees, connecting points from different samples. Specifically, we use the Friedman and Rafsky two-sample test statistic. The homogeneity hypothesis, of well-mingled samples within the clusters, leads to an asymptotically normal distribution of this statistic. Based on this fact, the standard score of the number of such edges is computed, and the partition quality is represented by the worst cluster, i.e., the one with the minimal standard score. It is natural to expect that the true number of clusters can be characterized by the empirical distribution having the shortest left tail. The proposed methodology sequentially creates the distribution of this value and estimates its left-asymmetry. Numerical experiments presented in the paper demonstrate the ability of the approach to detect the true number of clusters.
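
The Friedman-Rafsky statistic mentioned above counts minimal-spanning-tree edges that join points from different samples. A minimal sketch, assuming Euclidean distances and a dense O(n^2) Prim's algorithm; in the cluster-stability setting one would evaluate it separately inside each cluster of the clustered pooled data, and the standardization step is omitted here.

```python
import math


def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph; returns MST edges as (i, j) pairs."""
    n = len(points)
    in_tree = [False] * n
    best_dist = [math.inf] * n
    parent = [-1] * n
    best_dist[0] = 0.0
    edges = []
    for _ in range(n):
        # Attach the closest point not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best_dist[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best_dist[v]:
                    best_dist[v], parent[v] = d, u
    return edges


def cross_sample_edges(sample_a, sample_b):
    """Friedman-Rafsky count: MST edges of the pooled data joining the two samples."""
    pooled = list(sample_a) + list(sample_b)
    labels = [0] * len(sample_a) + [1] * len(sample_b)
    return sum(1 for i, j in mst_edges(pooled) if labels[i] != labels[j])


if __name__ == "__main__":
    mingled = cross_sample_edges([(0.0,), (2.0,), (4.0,)], [(1.0,), (3.0,), (5.0,)])
    separated = cross_sample_edges([(0.0,), (1.0,), (2.0,)], [(10.0,), (11.0,), (12.0,)])
    print(mingled, separated)
```

Well-mingled samples produce many cross-sample edges; a cluster in which the two samples separate produces few, which is what drives its standard score down.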

  9. Anharmonic effects in the quantum cluster equilibrium method

    NASA Astrophysics Data System (ADS)

    von Domaros, Michael; Perlt, Eva

    2017-03-01

    The well-established quantum cluster equilibrium (QCE) model provides a statistical thermodynamic framework to apply high-level ab initio calculations of finite cluster structures to macroscopic liquid phases using the partition function. So far, the harmonic approximation has been applied throughout the calculations. In this article, we apply an important correction in the evaluation of the one-particle partition function and account for anharmonicity. To this end, we implemented an analytical approximation to the Morse partition function and the derivatives of its logarithm with respect to temperature, which are required for the evaluation of thermodynamic quantities. This anharmonic QCE approach has been applied to liquid hydrogen chloride and its cluster distributions, and the molar volume, the volumetric thermal expansion coefficient, and the isobaric heat capacity have been calculated. An improved description of all properties is observed when anharmonic effects are considered.
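
To show why anharmonicity matters for the partition function, the sketch below sums the bound levels of a Morse oscillator, E_n = ω_e(n + 1/2) - ω_e x_e (n + 1/2)^2, and compares the result with the harmonic value. This is direct summation in reduced units (energies and kT in the same arbitrary unit), not the analytical approximation the authors implemented, and it ignores the continuum above the last bound level.

```python
import math


def morse_levels(we, wexe):
    """Bound levels E_n = we*(n + 1/2) - wexe*(n + 1/2)**2, up to the level where spacing vanishes."""
    n_max = math.floor(we / (2 * wexe) - 0.5)
    return [we * (n + 0.5) - wexe * (n + 0.5) ** 2 for n in range(n_max + 1)]


def q_vib_morse(we, wexe, kT):
    """Vibrational partition function by direct summation, energies measured from the ground state."""
    levels = morse_levels(we, wexe)
    e0 = levels[0]
    return sum(math.exp(-(e - e0) / kT) for e in levels)


def q_vib_harmonic(we, kT):
    """Harmonic-oscillator partition function with the same ground-state reference."""
    return 1.0 / (1.0 - math.exp(-we / kT))


if __name__ == "__main__":
    for kT in (0.25, 0.5, 1.0, 2.0):
        print(kT, round(q_vib_harmonic(1.0, kT), 4), round(q_vib_morse(1.0, 0.02, kT), 4))
```

Because the Morse level spacing shrinks with n, excited levels lie lower than their harmonic counterparts and the partition function grows, increasingly so at higher temperature; in the limit of small ω_e x_e the two agree.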

  10. Implementation of spectral clustering with partitioning around medoids (PAM) algorithm on microarray data of carcinoma

    NASA Astrophysics Data System (ADS)

    Cahyaningrum, Rosalia D.; Bustamam, Alhadi; Siswantining, Titin

    2017-03-01

    Microarray technology has become an essential tool in life science for observing gene expression levels, including the expression of genes in people with carcinoma. Carcinoma is a cancer that forms in epithelial tissue. These data can be analyzed, for example, to identify the expression of hereditary genes and to build classifications that can improve the diagnosis of carcinoma. Microarray data are usually high-dimensional, so most methods require long computing times to perform the grouping. Therefore, this study uses spectral clustering, which can work with any kind of object and reduces dimensionality. Spectral clustering is based on the spectral decomposition of a matrix that represents the data as a graph. After the data dimensions are reduced, the data are partitioned. One well-known partitioning method is Partitioning Around Medoids (PAM), which minimizes the objective function by iteratively exchanging non-medoid points with medoid points until convergence. The objective of this research is to implement spectral clustering with the PAM partitioning algorithm to group 7457 carcinoma-related genes by similarity. The result of this study is two groups of carcinoma-related genes.
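
A minimal sketch of the PAM step alone, assuming a precomputed symmetric distance matrix; the spectral embedding that precedes it in the study is not reproduced. BUILD picks initial medoids greedily, and SWAP repeatedly applies the single medoid/non-medoid exchange that most reduces the total distance, stopping when no exchange helps.

```python
def total_cost(dist, medoids):
    """Sum over all points of the distance to the nearest medoid."""
    return sum(min(dist[i][m] for m in medoids) for i in range(len(dist)))


def pam(dist, k):
    """Partitioning Around Medoids on a distance matrix; returns (medoids, labels)."""
    n = len(dist)
    # BUILD: most central point first, then greedily add the medoid that lowers cost most.
    medoids = [min(range(n), key=lambda i: sum(dist[i]))]
    while len(medoids) < k:
        candidates = [c for c in range(n) if c not in medoids]
        medoids.append(min(candidates, key=lambda c: total_cost(dist, medoids + [c])))
    # SWAP: apply the best improving medoid/non-medoid exchange until none improves.
    cost = total_cost(dist, medoids)
    while True:
        best_cost, best_swap = cost, None
        for slot in range(k):
            for h in range(n):
                if h in medoids:
                    continue
                trial = medoids[:slot] + [h] + medoids[slot + 1:]
                trial_cost = total_cost(dist, trial)
                if trial_cost < best_cost:
                    best_cost, best_swap = trial_cost, (slot, h)
        if best_swap is None:
            break
        slot, h = best_swap
        medoids[slot] = h
        cost = best_cost
    labels = [min(range(k), key=lambda j: dist[i][medoids[j]]) for i in range(n)]
    return medoids, labels


if __name__ == "__main__":
    xs = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
    dist = [[abs(a - b) for b in xs] for a in xs]
    print(pam(dist, 2))
```

This naive version recomputes the full cost for every trial swap, so it only suits small inputs; a data set of 7457 genes would call for the incremental cost updates used in practical PAM implementations.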

  11. Decrease in Leaf Sucrose Synthesis Leads to Increased Leaf Starch Turnover and Decreased RuBP-limited Photosynthesis But Not Rubisco-limited Photosynthesis in Arabidopsis Null Mutants of SPSA1

    USDA-ARS's Scientific Manuscript database

    SPS (Sucrose phosphate synthase) isoforms from dicots cluster into families A, B and C. In this study, we investigated the individual effect of null mutations of each of the four SPS genes in Arabidopsis (spsa1, spsa2, spsb and spsc) on photosynthesis and carbon partitioning. Null mutants spsa1 and ...

  12. Finding and testing network communities by lumped Markov chains.

    PubMed

    Piccardi, Carlo

    2011-01-01

    Identifying communities (or clusters), namely groups of nodes with comparatively strong internal connectivity, is a fundamental task for a deep understanding of the structure and function of a network. Yet, there is a lack of formal criteria for defining communities and for testing their significance. We propose a sharp definition that is based on a quality threshold. By means of a lumped Markov chain model of a random walker, a quality measure called "persistence probability" is associated with a cluster, which is then defined as an "α-community" if this probability is not smaller than α. Consistently, a partition composed of α-communities is an "α-partition." These definitions turn out to be very effective for finding and testing communities. If a set of candidate partitions is available, setting the desired α-level allows one to immediately select the α-partition with the finest decomposition. Simultaneously, the persistence probabilities quantify the quality of each single community. Given its ability to assess each cluster individually, this approach can also disclose single well-defined communities even in networks that overall do not possess a definite clustered structure.
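
For the special case of an unweighted, undirected graph (an assumption; the paper's formulation is more general), the stationary random walk has π_i proportional to degree and P_ij = A_ij / d_i, so the persistence probability of a cluster C reduces to twice its internal edge count divided by its total degree. The sketch computes that quantity and checks the α-partition condition; the function names are hypothetical.

```python
def persistence_probability(adj, community):
    """Probability that a stationary random walker inside `community` is still there one step later.

    For an undirected graph, u_CC = sum_{i,j in C} A_ij / sum_{i in C} d_i.
    """
    members = set(community)
    internal = sum(1 for i in members for j in adj[i] if j in members)  # each edge counted twice
    volume = sum(len(adj[i]) for i in members)                          # total degree of C
    return internal / volume


def is_alpha_partition(adj, partition, alpha):
    """True if every cluster of the partition is an alpha-community."""
    return all(persistence_probability(adj, c) >= alpha for c in partition)


if __name__ == "__main__":
    # Two triangles joined by the single edge 2-3.
    adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
    print(persistence_probability(adj, [0, 1, 2]))
    print(is_alpha_partition(adj, [[0, 1, 2], [3, 4, 5]], 0.8))
```

Raising α makes the test stricter: here each triangle keeps the walker with probability 6/7, so the two-triangle split is an 0.8-partition but not a 0.9-partition.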

  13. Tensor Spectral Clustering for Partitioning Higher-order Network Structures.

    PubMed

    Benson, Austin R; Gleich, David F; Leskovec, Jure

    2015-01-01

    Spectral graph theory-based methods represent an important class of tools for studying the structure of networks. Spectral methods are based on a first-order Markov chain derived from a random walk on the graph and thus they cannot take advantage of important higher-order network substructures such as triangles, cycles, and feed-forward loops. Here we propose a Tensor Spectral Clustering (TSC) algorithm that allows for modeling higher-order network structures in a graph partitioning framework. Our TSC algorithm allows the user to specify which higher-order network structures (cycles, feed-forward loops, etc.) should be preserved by the network clustering. Higher-order network structures of interest are represented using a tensor, which we then partition by developing a multilinear spectral method. Our framework can be applied to discovering layered flows in networks as well as graph anomaly detection, which we illustrate on synthetic networks. In directed networks, a higher-order structure of particular interest is the directed 3-cycle, which captures feedback loops in networks. We demonstrate that our TSC algorithm produces large partitions that cut fewer directed 3-cycles than standard spectral clustering algorithms.
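
The TSC algorithm itself (the multilinear spectral method) is beyond a short sketch. What can be shown compactly is the evaluation criterion from the abstract: how many directed 3-cycles a two-way partition cuts. The sketch below enumerates directed 3-cycles and counts those whose nodes fall on both sides; node labels are assumed to be comparable (e.g. integers), and the function names are hypothetical.

```python
def directed_3_cycles(edges):
    """Each directed 3-cycle u->v->w->u once, rotated so that its smallest node comes first."""
    succ = {}
    for u, v in edges:
        succ.setdefault(u, set()).add(v)
    cycles = set()
    for u, outs in succ.items():
        for v in outs:
            for w in succ.get(v, ()):
                if len({u, v, w}) == 3 and u in succ.get(w, ()):
                    cyc = (u, v, w)
                    i = cyc.index(min(cyc))
                    cycles.add(cyc[i:] + cyc[:i])  # rotation keeps the orientation
    return cycles


def cut_3_cycles(edges, side):
    """Number of directed 3-cycles whose nodes are not all on the same side of the partition."""
    return sum(1 for cyc in directed_3_cycles(edges) if len({side[x] for x in cyc}) > 1)


if __name__ == "__main__":
    edges = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3), (2, 3), (5, 0)]
    print(cut_3_cycles(edges, {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}))  # keeps both cycles intact
    print(cut_3_cycles(edges, {0: 0, 1: 0, 2: 1, 3: 1, 4: 1, 5: 1}))  # cuts one cycle
```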

  14. Tensor Spectral Clustering for Partitioning Higher-order Network Structures

    PubMed Central

    Benson, Austin R.; Gleich, David F.; Leskovec, Jure

    2016-01-01

    Spectral graph theory-based methods represent an important class of tools for studying the structure of networks. Spectral methods are based on a first-order Markov chain derived from a random walk on the graph and thus they cannot take advantage of important higher-order network substructures such as triangles, cycles, and feed-forward loops. Here we propose a Tensor Spectral Clustering (TSC) algorithm that allows for modeling higher-order network structures in a graph partitioning framework. Our TSC algorithm allows the user to specify which higher-order network structures (cycles, feed-forward loops, etc.) should be preserved by the network clustering. Higher-order network structures of interest are represented using a tensor, which we then partition by developing a multilinear spectral method. Our framework can be applied to discovering layered flows in networks as well as graph anomaly detection, which we illustrate on synthetic networks. In directed networks, a higher-order structure of particular interest is the directed 3-cycle, which captures feedback loops in networks. We demonstrate that our TSC algorithm produces large partitions that cut fewer directed 3-cycles than standard spectral clustering algorithms. PMID:27812399

  15. Three list scheduling temporal partitioning algorithm of time space characteristic analysis and compare for dynamic reconfigurable computing

    NASA Astrophysics Data System (ADS)

    Chen, Naijin

    2013-03-01

    Level Based Partitioning (LBP), Cluster Based Partitioning (CBP), and Enhance Static List (ESL) temporal partitioning algorithms, based on an adjacency matrix and an adjacency table, are designed and implemented in this paper. The partitioning time and memory occupation of the three algorithms are also compared. Experimental results show that the LBP algorithm has the shortest partitioning time and better parallelism, and that, in terms of memory occupation and partitioning time, the algorithms based on the adjacency table require less partitioning time and less memory.
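
The abstract gives no algorithmic detail, so the following is only a hedged sketch of the general idea behind level-based temporal partitioning, under assumed inputs: a task DAG, a per-task area, and a fixed device capacity. Tasks are ordered by their ASAP level (longest path from a source), and partitions are filled in that order, which guarantees that every predecessor is placed in the same or an earlier partition. It stores successors in adjacency lists, in the spirit of the adjacency-table variant.

```python
from collections import deque


def asap_levels(n, edges):
    """Longest-path level of each node in a DAG, computed in Kahn's topological order."""
    indeg = [0] * n
    succ = [[] for _ in range(n)]
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
    level = [0] * n
    queue = deque(i for i in range(n) if indeg[i] == 0)
    seen = 0
    while queue:
        u = queue.popleft()
        seen += 1
        for v in succ[u]:
            level[v] = max(level[v], level[u] + 1)
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    if seen != n:
        raise ValueError("graph has a cycle")
    return level


def level_based_partition(n, edges, area, capacity):
    """Fill temporal partitions in level order without exceeding the device capacity."""
    level = asap_levels(n, edges)
    partitions, current, used = [], [], 0
    for node in sorted(range(n), key=lambda i: level[i]):
        if area[node] > capacity:
            raise ValueError(f"task {node} does not fit on the device")
        if current and used + area[node] > capacity:
            partitions.append(current)  # close the full partition
            current, used = [], 0
        current.append(node)
        used += area[node]
    if current:
        partitions.append(current)
    return partitions


if __name__ == "__main__":
    print(level_based_partition(5, [(0, 2), (1, 2), (2, 3), (2, 4)], [1] * 5, 2))
```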

  16. Multipartite Entanglement Detection with Minimal Effort

    NASA Astrophysics Data System (ADS)

    Knips, Lukas; Schwemmer, Christian; Klein, Nico; Wieśniak, Marcin; Weinfurter, Harald

    2016-11-01

    Certifying entanglement of a multipartite state is generally considered a demanding task. Since an N-qubit state is parametrized by 4^N - 1 real numbers, one might naively expect that the measurement effort of generic entanglement detection also scales exponentially with N. Here, we introduce a general scheme to construct efficient witnesses requiring a constant number of measurements, independent of the number of qubits, for states such as Greenberger-Horne-Zeilinger states, cluster states, and Dicke states. For four qubits, we apply this method to experimental realizations of these states and prove genuine four-partite entanglement with only two measurement settings.

  17. High Performance Computing Based Parallel Hierarchical Modal Association Clustering (HPAR HMAC)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Patlolla, Dilip R; Surendran Nair, Sujithkumar; Graves, Daniel A.

    For many applications, clustering is a crucial step in gaining insight into the makeup of a dataset. The best approach to a given problem often depends on a variety of factors, such as the size of the dataset, time restrictions, and soft clustering requirements. The HMAC algorithm seeks to combine the strengths of two clustering approaches: model-based and linkage-based clustering. One particular weakness of HMAC is its computational complexity, which makes it impractical for mega-scale data clustering. For high-definition imagery, a user would have to wait months or years for a result; for a 16-megapixel image, the estimated runtime exceeds a decade. To improve the execution time of HMAC, it is reasonable to consider a multi-core implementation that utilizes available system resources. An existing implementation (Ray and Cheng 2014) divides the dataset into N partitions, one for each thread, prior to executing the HMAC algorithm. This implementation benefits from two types of optimization: parallelization and divide-and-conquer. By running each partition in parallel, the program accelerates computation by utilizing more system resources. Although the parallel implementation provides considerable improvement over the serial HMAC, it still suffers from poor computational complexity, O(N^2). Once the maximum number of cores on a system is exhausted, the program slows down. We now consider a modification to HMAC that involves a recursive partitioning scheme. Our modification aims to exploit the divide-and-conquer benefits seen in the parallel HMAC implementation. At each level of the recursion tree, partitions are divided into two sub-partitions until a threshold size is reached. When a partition can no longer be divided without falling below the threshold size, the base HMAC algorithm is applied. This results in a significant speedup over the parallel HMAC.
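
The recursive scheme described in this abstract can be sketched as follows. The split rule (halving along alternating coordinates) and the `base_cluster` callback standing in for the base HMAC algorithm are assumptions; the abstract does not say how partitions are split or how leaf results are merged, so this sketch simply concatenates the leaf outputs and runs sequentially rather than in parallel.

```python
def recursive_cluster(points, threshold, base_cluster, axis=0):
    """Halve `points` until a further split would leave a piece smaller than `threshold`,
    then run `base_cluster` (a stand-in for the base HMAC step) on each leaf."""
    if threshold < 1:
        raise ValueError("threshold must be at least 1")
    if len(points) < 2 * threshold:
        return base_cluster(points)
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    next_axis = (axis + 1) % len(points[0])  # alternate coordinates, kd-tree style
    return (recursive_cluster(ordered[:mid], threshold, base_cluster, next_axis)
            + recursive_cluster(ordered[mid:], threshold, base_cluster, next_axis))


if __name__ == "__main__":
    pts = [(float(i), float(i % 7)) for i in range(100)]
    # Hypothetical base step: treat each leaf as a single cluster.
    leaves = recursive_cluster(pts, 10, lambda block: [block])
    print([len(leaf) for leaf in leaves])
```

Each leaf holds between `threshold` and `2 * threshold - 1` points, so the quadratic base step is only ever applied to small inputs.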

  18. Clustering of longitudinal data by using an extended baseline: A new method for treatment efficacy clustering in longitudinal data.

    PubMed

    Schramm, Catherine; Vial, Céline; Bachoud-Lévi, Anne-Catherine; Katsahian, Sandrine

    2018-01-01

    Heterogeneity in treatment efficacy is a major concern in clinical trials. Clustering may help to identify treatment responders and non-responders. In the context of longitudinal cluster analyses, sample size and variability in the times of measurement are the main issues with current methods. Here, we propose a new two-step method for the Clustering of Longitudinal data by using an Extended Baseline. The first step relies on a piecewise linear mixed model for repeated measurements with a treatment-time interaction. The second step clusters the random predictions and considers several parametric (model-based) and non-parametric (partitioning, ascendant hierarchical clustering) algorithms. A simulation study compares all options of the extended-baseline method with the latent-class mixed model. The extended-baseline method with the two model-based algorithms was the most robust. The extended-baseline method with any of the non-parametric algorithms failed when the variances of the treatment effect were unequal between clusters or when subgroup sample sizes were unbalanced. The latent-class mixed model failed when between-patient slope variability was high. Two real data sets, on neurodegenerative disease and on obesity, illustrate the method and show how clustering may help to identify the marker(s) of treatment response. Applying the method in exploratory analysis, as a first stage before setting up stratified designs, can provide a better estimation of treatment effect in future clinical trials.

  19. Theoretical microbial ecology without species

    NASA Astrophysics Data System (ADS)

    Tikhonov, Mikhail

    2017-09-01

    Ecosystems are commonly conceptualized as networks of interacting species. However, partitioning natural diversity of organisms into discrete units is notoriously problematic and mounting experimental evidence raises the intriguing question whether this perspective is appropriate for the microbial world. Here an alternative formalism is proposed that does not require postulating the existence of species as fundamental ecological variables and provides a naturally hierarchical description of community dynamics. This formalism allows approaching the species problem from the opposite direction. While the classical models treat a world of imperfectly clustered organism types as a perturbation around well-clustered species, the presented approach allows gradually adding structure to a fully disordered background. The relevance of this theoretical construct for describing highly diverse natural ecosystems is discussed.

  20. Integrated simultaneous analysis of different biomedical data types with exact weighted bi-cluster editing.

    PubMed

    Sun, Peng; Guo, Jiong; Baumbach, Jan

    2012-07-17

    The explosion of biological data has largely influenced the focus of today's biology research. Integrating and analysing large quantities of data to provide meaningful insights has become the main challenge for biologists and bioinformaticians. One major problem is the combined analysis of data of different types, such as phenotypes and genotypes. These data are modelled as bipartite graphs in which nodes correspond to the different data points (mutations and diseases, for instance) and weighted edges relate to associations between them. Bi-clustering is a special case of clustering designed for partitioning two different types of data simultaneously. We present a bi-clustering approach that solves the NP-hard weighted bi-cluster editing problem by transforming a given bipartite graph into a disjoint union of bicliques. Here we contribute an exact algorithm based on fixed-parameter tractability. We first evaluated its performance on artificial graphs. Afterwards, as an example, we applied our Java implementation to genome-wide association study (GWAS) data, aiming to discover new, previously unobserved geno-to-pheno associations. We believe that our results will serve as guidelines for further wet-lab investigations. Generally, our software can be applied to any kind of data that can be modelled as bipartite graphs. To our knowledge, it is the fastest exact method for the weighted bi-cluster editing problem.
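
To make the objective concrete, the sketch below scores a candidate partition by its editing cost (delete every positive-weight pair split across clusters, insert every negative-weight pair placed in the same cluster) and finds the optimum by brute force over all set partitions. This exhaustive search is a swapped-in technique, not the authors' fixed-parameter algorithm, and it is only feasible for a handful of nodes because the number of partitions grows as the Bell numbers. Pairs missing from the hypothetical `weights` dictionary are assumed to have weight 0.

```python
import math


def set_partitions(items):
    """Yield every partition of `items` as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]  # join an existing block
        yield [[first]] + part                                   # or start a new block


def editing_cost(weights, partition):
    """Cost of turning the weighted bipartite graph into the given disjoint bicliques."""
    cluster_of = {node: idx for idx, block in enumerate(partition) for node in block}
    cost = 0.0
    for (u, v), w in weights.items():
        same = cluster_of[u] == cluster_of[v]
        if same and w < 0:
            cost += -w  # insert a missing edge inside a biclique
        elif not same and w > 0:
            cost += w   # delete an edge between bicliques
    return cost


def exact_bicluster_editing(weights):
    """Brute-force optimum over all partitions of the nodes (tiny instances only)."""
    nodes = sorted({x for pair in weights for x in pair})
    best_cost, best_part = math.inf, None
    for part in set_partitions(nodes):
        c = editing_cost(weights, part)
        if c < best_cost:
            best_cost, best_part = c, part
    return best_cost, best_part


if __name__ == "__main__":
    w = {("a1", "b1"): 2.0, ("a2", "b2"): 2.0, ("a1", "b2"): 0.5, ("a2", "b1"): -1.0}
    print(exact_bicluster_editing(w))
```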

  1. Integrated simultaneous analysis of different biomedical data types with exact weighted bi-cluster editing.

    PubMed

    Sun, Peng; Guo, Jiong; Baumbach, Jan

    2012-06-01

    The explosion of biological data has largely influenced the focus of today's biology research. Integrating and analysing large quantities of data to provide meaningful insights has become the main challenge for biologists and bioinformaticians. One major problem is the combined analysis of data of different types, such as phenotypes and genotypes. These data are modelled as bipartite graphs in which nodes correspond to the different data points (mutations and diseases, for instance) and weighted edges relate to associations between them. Bi-clustering is a special case of clustering designed for partitioning two different types of data simultaneously. We present a bi-clustering approach that solves the NP-hard weighted bi-cluster editing problem by transforming a given bipartite graph into a disjoint union of bicliques. Here we contribute an exact algorithm based on fixed-parameter tractability. We first evaluated its performance on artificial graphs. Afterwards, as an example, we applied our Java implementation to genome-wide association study (GWAS) data, aiming to discover new, previously unobserved geno-to-pheno associations. We believe that our results will serve as guidelines for further wet-lab investigations. Generally, our software can be applied to any kind of data that can be modelled as bipartite graphs. To our knowledge, it is the fastest exact method for the weighted bi-cluster editing problem.

  2. Bayesian clustering of DNA sequences using Markov chains and a stochastic partition model.

    PubMed

    Jääskinen, Väinö; Parkkinen, Ville; Cheng, Lu; Corander, Jukka

    2014-02-01

    In many biological applications it is necessary to cluster DNA sequences into groups that represent underlying organismal units, such as named species or genera. In metagenomics, this grouping typically needs to be achieved on the basis of relatively short sequences that contain different types of errors, making a statistical modeling approach desirable. Here we introduce a novel method for this purpose by developing a stochastic partition model that clusters Markov chains of a given order. The model is based on a Dirichlet process prior, and we use conjugate priors for the Markov chain parameters, which enables an analytical expression for comparing the marginal likelihoods of any two partitions. To find a good candidate for the posterior mode in the partition space, we use a hybrid computational approach that combines the EM algorithm with a greedy search. This approach is shown to be faster and to yield highly accurate results compared with previously suggested clustering methods for the metagenomics application. Our model is fairly generic and could also be used for clustering other types of sequence data for which Markov chains provide a reasonable way to compress information, as illustrated by experiments on shotgun-sequence-type data from an Escherichia coli strain.
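
The analytical comparison of partitions rests on conjugacy: with an independent Dirichlet prior on each row of a first-order transition matrix, the marginal likelihood of a cluster's sequences is a product of Dirichlet-multinomial terms. The sketch assumes a DNA alphabet and a symmetric Dirichlet(α = 1) prior, ignores each sequence's first symbol, and omits the Dirichlet process prior over partitions, so it compares partitions by marginal likelihood alone.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def transition_counts(sequences):
    """First-order transition counts pooled over all sequences of one cluster."""
    counts = Counter()
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[(a, b)] += 1
    return counts


def log_marginal_likelihood(sequences, alpha=1.0):
    """log p(sequences) with the transition matrix integrated out (Dirichlet prior per row)."""
    counts = transition_counts(sequences)
    k = len(ALPHABET)
    total = 0.0
    for a in ALPHABET:
        row = [counts[(a, b)] for b in ALPHABET]
        # Dirichlet-multinomial evidence for the row of previous symbol `a`.
        total += math.lgamma(k * alpha) - math.lgamma(k * alpha + sum(row))
        total += sum(math.lgamma(alpha + c) - math.lgamma(alpha) for c in row)
    return total


def log_partition_score(partition):
    """Sum of cluster-wise log marginal likelihoods (no prior over partitions)."""
    return sum(log_marginal_likelihood(cluster) for cluster in partition)


if __name__ == "__main__":
    s1, s2 = "AC" * 10, "A" * 10 + "C" * 10
    print("separate:", round(log_partition_score([[s1], [s2]]), 3))
    print("merged:  ", round(log_partition_score([[s1, s2]]), 3))
```

The two sequences use the same letters but very different transition patterns, so keeping them in separate clusters scores higher than merging them; a greedy search would move sequences between clusters guided by exactly this kind of difference.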

  3. Comments on "The multisynapse neural network and its application to fuzzy clustering".

    PubMed

    Yu, Jian; Hao, Pengwei

    2005-05-01

    In the above-mentioned paper, Wei and Fahn proposed a neural architecture, the multisynapse neural network, to solve constrained optimization problems including high-order, logarithmic, and sinusoidal forms. As one of its main applications, a fuzzy bidirectional associative clustering network (FBACN) was proposed for fuzzy-partition clustering according to the objective-functional method. The connection between the objective-functional-based fuzzy c-partition algorithms and FBACN is the Lagrange multiplier approach. Unfortunately, the Lagrange multiplier approach was incorrectly applied, so FBACN does not equivalently minimize its corresponding constrained objective function. Additionally, Wei and Fahn adopted the traditional definition of a fuzzy c-partition, which FBACN does not satisfy. Therefore, FBACN cannot solve constrained optimization problems either.
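
For reference, the traditional fuzzy c-partition conditions are: every membership u_ik lies in [0, 1], each point's memberships sum to 1 over the c clusters, and every cluster's total membership lies strictly between 0 and n. The standard fuzzy c-means membership update, which follows from a Lagrange-multiplier treatment of the sum-to-one constraint, satisfies them. The sketch below computes that update and checks the conditions; it illustrates the definition, not FBACN, and assumes no point coincides with a center.

```python
import math


def fcm_memberships(points, centers, m=2.0):
    """Fuzzy c-means update u[i][k] = 1 / sum_j (d_ik / d_jk)**(2/(m-1)).

    Assumes no point coincides with a center (that would divide by zero).
    """
    u = []
    for c in centers:
        row = []
        for x in points:
            d_ic = math.dist(x, c)
            row.append(1.0 / sum((d_ic / math.dist(x, other)) ** (2 / (m - 1)) for other in centers))
        u.append(row)
    return u  # u[i][k]: membership of point k in cluster i


def is_fuzzy_c_partition(u, tol=1e-9):
    """Check the three conditions of a fuzzy c-partition."""
    c, n = len(u), len(u[0])
    in_range = all(0 <= u[i][k] <= 1 for i in range(c) for k in range(n))
    columns_sum_to_one = all(abs(sum(u[i][k] for i in range(c)) - 1) < tol for k in range(n))
    no_empty_or_full = all(0 < sum(u[i]) < n for i in range(c))
    return in_range and columns_sum_to_one and no_empty_or_full


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (9.0, 0.0), (10.0, 0.0)]
    u = fcm_memberships(pts, [(0.5, 0.0), (9.5, 0.0)])
    print([[round(v, 3) for v in row] for row in u], is_fuzzy_c_partition(u))
```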

  4. Random Partition Distribution Indexed by Pairwise Information

    PubMed Central

    Dahl, David B.; Day, Ryan; Tsai, Jerry W.

    2017-01-01

    We propose a random partition distribution indexed by pairwise similarity information such that partitions compatible with the similarities are given more probability. The use of pairwise similarities, in the form of distances, is common in some clustering algorithms (e.g., hierarchical clustering), but we show how to use this type of information to define a prior partition distribution for flexible Bayesian modeling. A defining feature of the distribution is that it allocates probability among partitions within a given number of subsets, but it does not shift probability among sets of partitions with different numbers of subsets. Our distribution places more probability on partitions that group similar items yet keeps the total probability of partitions with a given number of subsets constant. The distribution of the number of subsets (and its moments) is available in closed-form and is not a function of the similarities. Our formulation has an explicit probability mass function (with a tractable normalizing constant) so the full suite of MCMC methods may be used for posterior inference. We compare our distribution with several existing partition distributions, showing that our formulation has attractive properties. We provide three demonstrations to highlight the features and relative performance of our distribution. PMID:29276318

  5. Characterizing Heterogeneity within Head and Neck Lesions Using Cluster Analysis of Multi-Parametric MRI Data.

    PubMed

    Borri, Marco; Schmidt, Maria A; Powell, Ceri; Koh, Dow-Mu; Riddell, Angela M; Partridge, Mike; Bhide, Shreerang A; Nutting, Christopher M; Harrington, Kevin J; Newbold, Katie L; Leach, Martin O

    2015-01-01

    To describe a methodology, based on cluster analysis, to partition multi-parametric functional imaging data into groups (or clusters) of similar functional characteristics, with the aim of characterizing functional heterogeneity within head and neck tumour volumes. To evaluate the performance of the proposed approach on a set of longitudinal MRI data, analysing the evolution of the obtained sub-sets with treatment. The cluster analysis workflow was applied to a combination of dynamic contrast-enhanced and diffusion-weighted imaging MRI data from a cohort of squamous cell carcinoma of the head and neck patients. Cumulative distributions of voxels, containing pre and post-treatment data and including both primary tumours and lymph nodes, were partitioned into k clusters (k = 2, 3 or 4). Principal component analysis and cluster validation were employed to investigate data composition and to independently determine the optimal number of clusters. The evolution of the resulting sub-regions with induction chemotherapy treatment was assessed relative to the number of clusters. The clustering algorithm was able to separate clusters which significantly reduced in voxel number following induction chemotherapy from clusters with a non-significant reduction. Partitioning with the optimal number of clusters (k = 4), determined with cluster validation, produced the best separation between reducing and non-reducing clusters. The proposed methodology was able to identify tumour sub-regions with distinct functional properties, independently separating clusters which were affected differently by treatment. This work demonstrates that unsupervised cluster analysis, with no prior knowledge of the data, can be employed to provide a multi-parametric characterization of functional heterogeneity within tumour volumes.

  6. Major depressive disorder subtypes to predict long-term course

    PubMed Central

    van Loo, Hanna M.; Cai, Tianxi; Gruber, Michael J.; Li, Junlong; de Jonge, Peter; Petukhova, Maria; Rose, Sherri; Sampson, Nancy A.; Schoevers, Robert A.; Wardenaar, Klaas J.; Wilcox, Marsha A.; Al-Hamzawi, Ali Obaid; Andrade, Laura Helena; Bromet, Evelyn J.; Bunting, Brendan; Fayyad, John; Florescu, Silvia E.; Gureje, Oye; Hu, Chiyi; Huang, Yueqin; Levinson, Daphna; Medina-Mora, Maria Elena; Nakane, Yoshibumi; Posada-Villa, Jose; Scott, Kate M.; Xavier, Miguel; Zarkov, Zahari; Kessler, Ronald C.

    2016-01-01

    Background Variation in the course of major depressive disorder (MDD) is not strongly predicted by existing subtype distinctions. A new subtyping approach is considered here. Methods Two data mining techniques, ensemble recursive partitioning and Lasso generalized linear models (GLMs) followed by k-means cluster analysis, are used to search for subtypes based on index episode symptoms predicting subsequent MDD course in the World Mental Health (WMH) Surveys. The WMH surveys are community surveys in 16 countries. Lifetime DSM-IV MDD was reported by 8,261 respondents. Retrospectively reported outcomes included measures of persistence (number of years with an episode; number of years with an episode lasting most of the year) and severity (hospitalization for MDD; disability due to MDD). Results Recursive partitioning found significant clusters defined by the conjunctions of early onset, suicidality, and anxiety (irritability, panic, nervousness-worry-anxiety) during the index episode. GLMs found additional associations involving a number of individual symptoms. Predicted values of the four outcomes were strongly correlated. Cluster analysis of these predicted values found three clusters having consistently high, intermediate, or low predicted scores across all outcomes. The high-risk cluster (30.0% of respondents) accounted for 52.9-69.7% of high persistence and severity and was most strongly predicted by index episode severe dysphoria, suicidality, anxiety, and early onset. A total symptom count, in comparison, was not a significant predictor. Conclusions Despite being based on retrospective reports, results suggest that useful MDD subtyping distinctions can be made using data mining methods. Further studies are needed to test and expand these results with prospective data. PMID:24425049

  7. SLURM: Simple Linux Utility for Resource Management

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jette, M; Dunlap, C; Garlick, J

    2002-04-24

    Simple Linux Utility for Resource Management (SLURM) is an open source, fault-tolerant, and highly scalable cluster management and job scheduling system for Linux clusters of thousands of nodes. Components include machine status, partition management, job management, and scheduling modules. The design also includes a scalable, general-purpose communication infrastructure. Development will take place in four phases: Phase I results in a solid infrastructure; Phase II produces a functional but limited interactive job initiation capability without use of the interconnect/switch; Phase III provides switch support and documentation; Phase IV provides job status, fault-tolerance, and job queuing and control through Livermore's Distributed Production Control System (DPCS), a meta-batch and resource management system.

  8. An Effective Approach for Clustering InhA Molecular Dynamics Trajectory Using Substrate-Binding Cavity Features

    PubMed Central

    Ruiz, Duncan D. A.; Norberto de Souza, Osmar

    2015-01-01

    Protein receptor conformations, obtained from molecular dynamics (MD) simulations, have become a promising way to treat receptor flexibility explicitly in molecular docking experiments applied to drug discovery and development. However, incorporating the entire ensemble of MD conformations in docking experiments to screen large candidate compound libraries is currently an unfeasible task. Clustering algorithms have been widely used as a means to reduce such ensembles to a manageable size. Most studies investigate different algorithms using pairwise Root-Mean Square Deviation (RMSD) values for all, or part of the MD conformations. Nevertheless, RMSD alone may not be the most appropriate gauge to cluster conformations when the target receptor has a plastic active site, since it is influenced by changes that occur on other parts of the structure. Hence, we have applied two partitioning methods (k-means and k-medoids) and four agglomerative hierarchical methods (Complete linkage, Ward’s, Unweighted Pair Group Method and Weighted Pair Group Method) to analyze and compare the quality of partitions between a data set composed of properties from an enzyme receptor substrate-binding cavity and two data sets created using different RMSD approaches. Ensembles of representative MD conformations were generated by selecting a medoid of each group from all partitions analyzed. We investigated the performance of our new method for evaluating binding conformation of drug candidates to the InhA enzyme using cross-docking experiments between a 20 ns MD trajectory and 20 different ligands. Statistical analyses showed that the novel ensemble, which is represented by only 0.48% of the MD conformations, was able to reproduce 75% of all dynamic behaviors within the binding cavity for the docking experiments performed.
Moreover, this new approach not only outperforms the other two RMSD-clustering solutions, but it also shows promise as a strategy to distill biologically relevant information from MD trajectories, especially for docking purposes. PMID:26218832
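    The reduction step described above, keeping one medoid per cluster as the representative conformation, can be sketched from a precomputed distance matrix. This is a generic illustration under stated assumptions: the one-dimensional "cavity feature" values and cluster labels are hypothetical, and the medoid is taken as the member with the smallest summed distance to the other members of its cluster.

```python
def medoids(dist_matrix, labels):
    """Return {cluster label: index of its medoid}, where the medoid is the
    member whose summed distance to all members of its cluster is smallest."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    return {
        lab: min(members, key=lambda i: sum(dist_matrix[i][j] for j in members))
        for lab, members in clusters.items()
    }


# Hypothetical single cavity feature per MD snapshot, already clustered.
feature = [0.0, 0.1, 0.3, 5.0, 5.2, 5.3]
labels = [0, 0, 0, 1, 1, 1]
D = [[abs(a - b) for b in feature] for a in feature]
representatives = medoids(D, labels)  # snapshot indices kept for docking
```

    Unlike a k-means centroid, a medoid is always an actual snapshot, which is why it can be passed directly to a docking run.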

  9. An Effective Approach for Clustering InhA Molecular Dynamics Trajectory Using Substrate-Binding Cavity Features.

    PubMed

    De Paris, Renata; Quevedo, Christian V; Ruiz, Duncan D A; Norberto de Souza, Osmar

    2015-01-01

    Protein receptor conformations, obtained from molecular dynamics (MD) simulations, have become a promising way to treat receptor flexibility explicitly in molecular docking experiments applied to drug discovery and development. However, incorporating the entire ensemble of MD conformations in docking experiments to screen large candidate compound libraries is currently an unfeasible task. Clustering algorithms have been widely used as a means to reduce such ensembles to a manageable size. Most studies investigate different algorithms using pairwise Root-Mean Square Deviation (RMSD) values for all, or part of the MD conformations. Nevertheless, RMSD alone may not be the most appropriate gauge to cluster conformations when the target receptor has a plastic active site, since it is influenced by changes that occur on other parts of the structure. Hence, we have applied two partitioning methods (k-means and k-medoids) and four agglomerative hierarchical methods (Complete linkage, Ward's, Unweighted Pair Group Method and Weighted Pair Group Method) to analyze and compare the quality of partitions between a data set composed of properties from an enzyme receptor substrate-binding cavity and two data sets created using different RMSD approaches. Ensembles of representative MD conformations were generated by selecting a medoid of each group from all partitions analyzed. We investigated the performance of our new method for evaluating binding conformation of drug candidates to the InhA enzyme using cross-docking experiments between a 20 ns MD trajectory and 20 different ligands. Statistical analyses showed that the novel ensemble, which is represented by only 0.48% of the MD conformations, was able to reproduce 75% of all dynamic behaviors within the binding cavity for the docking experiments performed.
Moreover, this new approach not only outperforms the other two RMSD-clustering solutions, but it also shows promise as a strategy to distill biologically relevant information from MD trajectories, especially for docking purposes.

  10. Clustering Qualitative Data Based on Binary Equivalence Relations: Neighborhood Search Heuristics for the Clique Partitioning Problem

    ERIC Educational Resources Information Center

    Brusco, Michael J.; Kohn, Hans-Friedrich

    2009-01-01

    The clique partitioning problem (CPP) requires the establishment of an equivalence relation for the vertices of a graph such that the sum of the edge costs associated with the relation is minimized. The CPP has important applications for the social sciences because it provides a framework for clustering objects measured on a collection of nominal…
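    The CPP objective stated above (minimize the summed edge costs of all pairs placed in the same class) can be sketched together with a simple one-vertex relocation search. This is a generic local search, not Brusco and Kohn's specific neighborhood search heuristics; the cost matrix is hypothetical, with negative costs for pairs that "should" be grouped and positive costs for pairs that should not.

```python
from itertools import combinations


def cpp_cost(costs, labels):
    """Sum of edge costs c[i][j] over all pairs i < j placed in the same cluster."""
    return sum(costs[i][j] for i, j in combinations(range(len(labels)), 2)
               if labels[i] == labels[j])


def relocate_search(costs, labels):
    """Move one vertex at a time to the existing cluster (or a new singleton)
    that lowers the cost most; stop when no single move improves it."""
    labels = list(labels)
    improved = True
    while improved:
        improved = False
        for v in range(len(labels)):
            best, best_cost = labels[v], cpp_cost(costs, labels)
            for target in set(labels) | {max(labels) + 1}:
                labels[v] = target
                cost = cpp_cost(costs, labels)
                if cost < best_cost:  # accept strict improvements only
                    best, best_cost, improved = target, cost, True
            labels[v] = best
    return labels


# Hypothetical symmetric costs: objects 0-1 and 2-3 agree (-2), others disagree (+1).
costs = [[0, -2, 1, 1],
         [-2, 0, 1, 1],
         [1, 1, 0, -2],
         [1, 1, -2, 0]]
result = relocate_search(costs, [0, 1, 2, 3])  # start from all singletons
```

    Because each accepted move strictly lowers a cost over finitely many partitions, the loop terminates, though only at a local optimum in general.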

  11. Effect of video server topology on contingency capacity requirements

    NASA Astrophysics Data System (ADS)

    Kienzle, Martin G.; Dan, Asit; Sitaram, Dinkar; Tetzlaff, William H.

    1996-03-01

    Video servers need to assign a fixed set of resources to each video stream in order to guarantee on-time delivery of the video data. If a server has insufficient resources to guarantee the delivery, it must reject the stream request rather than slowing down all existing streams. Large scale video servers are being built as clusters of smaller components, so as to be economical, scalable, and highly available. This paper uses a blocking model developed for telephone systems to evaluate video server cluster topologies. The goal is to achieve high utilization of the components and low per-stream cost combined with low blocking probability and high user satisfaction. The analysis shows substantial economies of scale achieved by larger server images. Simple distributed server architectures can result in partitioning of resources with low achievable resource utilization. By comparing achievable resource utilization of partitioned and monolithic servers, we quantify the cost of partitioning. Next, we present an architecture for a distributed server system that avoids resource partitioning and results in highly efficient server clusters. Finally, we show how, in these server clusters, further optimizations can be achieved through caching and batching of video streams.
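    The abstract does not name its telephone blocking model; a common one is the Erlang B loss formula, assumed here purely for illustration. The sketch compares one monolithic server against the same capacity and load split into four independent partitions, which shows the economy-of-scale effect the abstract describes. The stream counts and loads are hypothetical, and Erlang B itself assumes Poisson arrivals and blocked requests being rejected.

```python
def erlang_b(servers, offered_load):
    """Erlang B blocking probability via the numerically stable recursion
    B(0) = 1, B(n) = A * B(n-1) / (n + A * B(n-1))."""
    b = 1.0
    for n in range(1, servers + 1):
        b = offered_load * b / (n + offered_load * b)
    return b


# Hypothetical: 100 stream slots and 80 Erlangs of offered load in one server,
# versus four partitions of 25 slots, each receiving a quarter of the load.
monolithic = erlang_b(100, 80.0)
partitioned = erlang_b(25, 20.0)
```

    At equal utilization the partitioned configuration blocks noticeably more requests, which is the "cost of partitioning" in miniature.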

  12. An AO-assisted Variability Study of Four Globular Clusters

    NASA Astrophysics Data System (ADS)

    Salinas, R.; Contreras Ramos, R.; Strader, J.; Hakala, P.; Catelan, M.; Peacock, M. B.; Simunovic, M.

    2016-09-01

    The image-subtraction technique applied to study variable stars in globular clusters represented a leap in the number of new detections, with the drawback that many of these new light curves could not be transformed to magnitudes due to severe crowding. In this paper, we present observations of four Galactic globular clusters, M 2 (NGC 7089), M 10 (NGC 6254), M 80 (NGC 6093), and NGC 1261, taken with the ground-layer adaptive optics module at the SOAR Telescope, SAM. We show that the higher image quality provided by SAM allows for the calibration of the light curves of the great majority of the variables near the cores of these clusters as well as the detection of new variables, even in clusters where image-subtraction searches were already conducted. We report the discovery of 15 new variables in M 2 (12 RR Lyrae stars and 3 SX Phe stars), 12 new variables in M 10 (11 SX Phe and 1 long-period variable), and 1 new W UMa-type variable in NGC 1261. No new detections are found in M 80, but previous uncertain detections are confirmed and the corresponding light curves are calibrated into magnitudes. Additionally, based on the number of detected variables and new Hubble Space Telescope/UVIS photometry, we revisit a previous suggestion that M 80 may be the globular cluster with the richest population of blue stragglers in our Galaxy. Based on observations obtained at the Southern Astrophysical Research (SOAR) telescope, which is a joint project of the Ministério da Ciência, Tecnologia, e Inovação (MCTI) da República Federativa do Brasil, the U.S. National Optical Astronomy Observatory (NOAO), the University of North Carolina at Chapel Hill (UNC), and Michigan State University (MSU).

  13. One-step generation of continuous-variable quadripartite cluster states in a circuit QED system

    NASA Astrophysics Data System (ADS)

    Yang, Zhi-peng; Li, Zhen; Ma, Sheng-li; Li, Fu-li

    2017-07-01

    We propose a dissipative scheme for one-step generation of continuous-variable quadripartite cluster states in a circuit QED setup consisting of four superconducting coplanar waveguide resonators and a gap-tunable superconducting flux qubit. With external driving fields to adjust the desired qubit-resonator and resonator-resonator interactions, we show that continuous-variable quadripartite cluster states of the four resonators can be generated with the assistance of energy relaxation of the qubit. By comparison with the previous proposals, the distinct advantage of our scheme is that only one step of quantum operation is needed to realize the quantum state engineering. This makes our scheme simpler and more feasible in experiment. Our result may have useful application for implementing quantum computation in solid-state circuit QED systems.

  14. Dominant partition method. [based on a wave function formalism]

    NASA Technical Reports Server (NTRS)

    Dixon, R. M.; Redish, E. F.

    1979-01-01

    By use of the L'Huillier, Redish, and Tandy (LRT) wave function formalism, a partially connected method, the dominant partition method (DPM) is developed for obtaining few body reductions of the many body problem in the LRT and Bencze, Redish, and Sloan (BRS) formalisms. The DPM maps the many body problem to a fewer body one by using the criterion that the truncated formalism must be such that consistency with the full Schroedinger equation is preserved. The DPM is based on a class of new forms for the irreducible cluster potential, which is introduced in the LRT formalism. Connectivity is maintained with respect to all partitions containing a given partition, which is referred to as the dominant partition. Degrees of freedom corresponding to the breakup of one or more of the clusters of the dominant partition are treated in a disconnected manner. This approach for simplifying the complicated BRS equations is appropriate for physical problems where a few body reaction mechanism prevails.

  15. The potential of clustering methods to define intersection test scenarios: Assessing real-life performance of AEB.

    PubMed

    Sander, Ulrich; Lubbe, Nils

    2018-04-01

    Intersection accidents are frequent and harmful. The accident types 'straight crossing path' (SCP), 'left turn across path - oncoming direction' (LTAP/OD), and 'left-turn across path - lateral direction' (LTAP/LD) represent around 95% of all intersection accidents and one-third of all police-reported car-to-car accidents in Germany. The European New Car Assessment Program (Euro NCAP) have announced that intersection scenarios will be included in their rating from 2020; however, how these scenarios are to be tested has not been defined. This study investigates whether clustering methods can be used to identify a small number of test scenarios sufficiently representative of the accident dataset to evaluate Intersection Automated Emergency Braking (AEB). Data from the German In-Depth Accident Study (GIDAS) and the GIDAS-based Pre-Crash Matrix (PCM) from 1999 to 2016, containing 784 SCP and 453 LTAP/OD accidents, were analyzed with principal component methods to identify variables that account for the relevant total variances of the sample. Three different methods for data clustering were applied to each of the accident types, two similarity-based approaches, namely Hierarchical Clustering (HC) and Partitioning Around Medoids (PAM), and the probability-based Latent Class Clustering (LCC). The optimum number of clusters was derived for HC and PAM with the silhouette method. The PAM algorithm was both initiated with random start medoid selection and medoids from HC. For LCC, the Bayesian Information Criterion (BIC) was used to determine the optimal number of clusters. Test scenarios were defined from optimal cluster medoids weighted by their real-life representation in GIDAS. The set of variables for clustering was further varied to investigate the influence of variable type and character. We quantified how accurately each cluster variation represents real-life AEB performance using pre-crash simulations with PCM data and a generic algorithm for AEB intervention. 
The usage of different sets of clustering variables resulted in substantially different numbers of clusters. The stability of the resulting clusters increased with prioritization of categorical over continuous variables. For each different set of cluster variables, a strong in-cluster variance of avoided versus non-avoided accidents for the specified Intersection AEB was present. The medoids did not predict the most common Intersection AEB behavior in each cluster. Despite thorough analysis using various cluster methods and variable sets, it was impossible to reduce the diversity of intersection accidents into a set of test scenarios without compromising the ability to predict real-life performance of Intersection AEB. Although this does not imply that other methods cannot succeed, it was observed that small changes in the definition of a scenario resulted in a different avoidance outcome. Therefore, we suggest using limited physical testing to validate more extensive virtual simulations to evaluate vehicle safety. Copyright © 2018 Elsevier Ltd. All rights reserved.
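    The silhouette method mentioned above picks the number of clusters whose partition has the highest mean silhouette width. A minimal sketch follows; it scores any candidate labeling (produced by HC, PAM or anything else) rather than implementing PAM itself, and the four points and both labelings are hypothetical. Singleton clusters get a silhouette of 0 by convention, and at least two clusters are required.

```python
from math import dist


def mean_silhouette(points, labels):
    """Mean of s(i) = (b - a) / max(a, b), where a is the mean distance to the
    rest of i's own cluster and b the smallest mean distance to another cluster."""
    clusters = set(labels)
    scores = []
    for i, p in enumerate(points):
        own = [q for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)  # convention for singleton clusters
            continue
        a = sum(dist(p, q) for q in own) / len(own)
        b = min(
            sum(dist(p, q) for j, q in enumerate(points) if labels[j] == c)
            / labels.count(c)
            for c in clusters if c != labels[i]
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


points = [(0, 0), (0, 1), (5, 5), (5, 6)]
good = mean_silhouette(points, [0, 0, 1, 1])  # compact, well-separated groups
bad = mean_silhouette(points, [0, 1, 0, 1])   # mixes the two groups
```

    To choose k, compute the mean silhouette for the partition obtained at each candidate k and keep the largest.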

  16. An Investigation of Document Partitions.

    ERIC Educational Resources Information Center

    Shaw, W. M., Jr.

    1986-01-01

    Empirical significance of document partitions is investigated as a function of index term-weight and similarity thresholds. Results show the same empirically preferred partitions can be detected by two independent strategies: an analysis of cluster-based retrieval analysis and an analysis of regularities in the underlying structure of the document…

  17. Characterizing Heterogeneity within Head and Neck Lesions Using Cluster Analysis of Multi-Parametric MRI Data

    PubMed Central

    Borri, Marco; Schmidt, Maria A.; Powell, Ceri; Koh, Dow-Mu; Riddell, Angela M.; Partridge, Mike; Bhide, Shreerang A.; Nutting, Christopher M.; Harrington, Kevin J.; Newbold, Katie L.; Leach, Martin O.

    2015-01-01

    Purpose To describe a methodology, based on cluster analysis, to partition multi-parametric functional imaging data into groups (or clusters) of similar functional characteristics, with the aim of characterizing functional heterogeneity within head and neck tumour volumes. To evaluate the performance of the proposed approach on a set of longitudinal MRI data, analysing the evolution of the obtained sub-sets with treatment. Material and Methods The cluster analysis workflow was applied to a combination of dynamic contrast-enhanced and diffusion-weighted imaging MRI data from a cohort of squamous cell carcinoma of the head and neck patients. Cumulative distributions of voxels, containing pre and post-treatment data and including both primary tumours and lymph nodes, were partitioned into k clusters (k = 2, 3 or 4). Principal component analysis and cluster validation were employed to investigate data composition and to independently determine the optimal number of clusters. The evolution of the resulting sub-regions with induction chemotherapy treatment was assessed relative to the number of clusters. Results The clustering algorithm was able to separate clusters which significantly reduced in voxel number following induction chemotherapy from clusters with a non-significant reduction. Partitioning with the optimal number of clusters (k = 4), determined with cluster validation, produced the best separation between reducing and non-reducing clusters. Conclusion The proposed methodology was able to identify tumour sub-regions with distinct functional properties, independently separating clusters which were affected differently by treatment. This work demonstrates that unsupervised cluster analysis, with no prior knowledge of the data, can be employed to provide a multi-parametric characterization of functional heterogeneity within tumour volumes. PMID:26398888

  18. The Clusters AgeS Experiment (CASE). Variable stars in the field of the globular cluster NGC 362

    NASA Astrophysics Data System (ADS)

    Rozyczka, M.; Thompson, I. B.; Narloch, W.; Pych, W.; Schwarzenberg-Czerny, A.

    2016-09-01

    The field of the globular cluster NGC 362 was monitored between 1997 and 2015 in a search for variable stars. BV light curves were obtained for 151 periodic or likely periodic variable stars, over a hundred of which are new detections. Twelve newly detected variable stars are proper-motion members of the cluster: two SX Phe and two RR Lyr pulsators, one contact binary, three detached or semi-detached eclipsing binaries, and four spotted variable stars. The most interesting objects among these are the binary blue straggler V20 with an asymmetric light curve, and the 8.1 d semidetached binary V24 located on the red giant branch of NGC 362, which is a Chandra X-ray source. We also provide substantial new data for 24 previously known variable stars.

  19. Merging K-means with hierarchical clustering for identifying general-shaped groups.

    PubMed

    Peterson, Anna D; Ghosh, Arka P; Maitra, Ranjan

    2018-01-01

    Clustering partitions a dataset such that observations placed together in a group are similar but different from those in other groups. Hierarchical and K-means clustering are two approaches but have different strengths and weaknesses. For instance, hierarchical clustering identifies groups in a tree-like structure but suffers from computational complexity in large datasets while K-means clustering is efficient but designed to identify homogeneous spherically-shaped clusters. We present a hybrid non-parametric clustering approach that amalgamates the two methods to identify general-shaped clusters and that can be applied to larger datasets. Specifically, we first partition the dataset into spherical groups using K-means. We next merge these groups using hierarchical methods with a data-driven distance measure as a stopping criterion. Our proposal has the potential to reveal groups with general shapes and structure in a dataset. We demonstrate good performance on several simulated and real datasets.
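    The two-stage idea above (over-partition with K-means, then merge groups hierarchically) can be sketched as follows. This is not the authors' procedure: their data-driven stopping distance is replaced here by a hypothetical fixed `threshold`, and merging uses single linkage between groups via a small union-find. The chain-plus-blob data set is also hypothetical.

```python
from math import dist


def kmeans(points, k, n_iter=100):
    """Plain Lloyd's k-means with deterministic farthest-first seeding."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        centroids.append(list(max(points, key=lambda p: min(dist(p, c) for c in centroids))))
    labels = None
    for _ in range(n_iter):
        new = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        if new == labels:
            break
        labels = new
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def merge_groups(points, labels, threshold):
    """Merge k-means groups whose closest pair of points is nearer than
    `threshold` (single linkage), tracking merges with union-find."""
    parent = {g: g for g in set(labels)}

    def find(g):
        while parent[g] != g:
            g = parent[g]
        return g

    groups = sorted(parent)
    for i, a in enumerate(groups):
        for b in groups[i + 1:]:
            gap = min(dist(p, q)
                      for p, la in zip(points, labels) if la == a
                      for q, lb in zip(points, labels) if lb == b)
            if gap < threshold:
                parent[find(b)] = find(a)
    return [find(lab) for lab in labels]


# An elongated chain (poorly suited to plain k-means) plus a compact blob.
chain = [(float(x), 0.0) for x in range(10)]
blob = [(30.0, 0.0), (30.0, 1.0), (31.0, 0.0)]
points = chain + blob
merged = merge_groups(points, kmeans(points, k=4), threshold=1.5)
```

    K-means cuts the chain into several spherical pieces; the merge step reassembles them because adjacent pieces touch, while the distant blob stays separate.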

  20. Clustering gene expression data based on predicted differential effects of GV interaction.

    PubMed

    Pan, Hai-Yan; Zhu, Jun; Han, Dan-Fu

    2005-02-01

    Microarray has become a popular biotechnology in biological and medical research. However, systematic and stochastic variabilities in microarray data are expected and unavoidable, resulting in the problem that the raw measurements have inherent "noise" within microarray experiments. Currently, logarithmic ratios are usually analyzed by various clustering methods directly, which may introduce biased interpretation in identifying groups of genes or samples. In this paper, a statistical method based on mixed model approaches was proposed for microarray data cluster analysis. The underlying rationale of this method is to partition the observed total gene expression level into various variations caused by different factors using an ANOVA model, and to predict the differential effects of GV (gene by variety) interaction using the adjusted unbiased prediction (AUP) method. The predicted GV interaction effects can then be used as the inputs of cluster analysis. We illustrated the application of our method with a gene expression dataset and elucidated the utility of our approach using an external validation.
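    The partitioning idea above can be illustrated with the simplest fixed-effects version: in a genes-by-varieties table with one value per cell, the GV interaction is what remains after removing gene and variety main effects. This is an assumption-laden stand-in, not the authors' mixed model or AUP prediction; the example tables are hypothetical.

```python
def gv_interaction(y):
    """Two-way additive decomposition of a genes x varieties table:
    interaction[g][v] = y[g][v] - gene_mean[g] - variety_mean[v] + grand_mean."""
    n_g, n_v = len(y), len(y[0])
    grand = sum(map(sum, y)) / (n_g * n_v)
    gene_mean = [sum(row) / n_v for row in y]
    var_mean = [sum(y[g][v] for g in range(n_g)) / n_g for v in range(n_v)]
    return [[y[g][v] - gene_mean[g] - var_mean[v] + grand for v in range(n_v)]
            for g in range(n_g)]


# A purely additive table (gene effect + variety effect) has no interaction.
additive = gv_interaction([[11, 21], [12, 22], [13, 23]])
# A crossed pattern is carried entirely by the interaction term.
crossed = gv_interaction([[1, 0], [0, 1]])
```

    Each row of the interaction matrix can then serve as a gene's profile for clustering, so that genes are grouped by how they respond differently across varieties rather than by overall expression level.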

  1. Using Cluster Analysis to Compartmentalize a Large Managed Wetland Based on Physical, Biological, and Climatic Geospatial Attributes.

    PubMed

    Hahus, Ian; Migliaccio, Kati; Douglas-Mankin, Kyle; Klarenberg, Geraldine; Muñoz-Carpena, Rafael

    2018-04-27

    Hierarchical and partitional cluster analyses were used to compartmentalize Water Conservation Area 1, a managed wetland within the Arthur R. Marshall Loxahatchee National Wildlife Refuge in southeast Florida, USA, based on physical, biological, and climatic geospatial attributes. Single, complete, average, and Ward's linkages were tested during the hierarchical cluster analyses, with average linkage providing the best results. In general, the partitional method, partitioning around medoids, found clusters that were more evenly sized and more spatially aggregated than those resulting from the hierarchical analyses. However, hierarchical analysis appeared to be better suited to identify outlier regions that were significantly different from other areas. The clusters identified by geospatial attributes were similar to clusters developed for the interior marsh in a separate study using water quality attributes, suggesting that similar factors have influenced variations in both the set of physical, biological, and climatic attributes selected in this study and water quality parameters. However, geospatial data allowed further subdivision of several interior marsh clusters identified from the water quality data, potentially indicating zones with important differences in function. Identification of these zones can be useful to managers and modelers by informing the distribution of monitoring equipment and personnel as well as delineating regions that may respond similarly to future changes in management or climate.
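    The four hierarchical linkages tested above differ only in how they measure the distance between two clusters. A minimal sketch of each rule follows; Ward's criterion is written as the increase in within-cluster sum of squares caused by merging, and the two small clusters are hypothetical.

```python
from itertools import product
from math import dist


def linkage_distance(A, B, method):
    """Distance between clusters A and B (lists of points) under a linkage rule."""
    pairs = [dist(p, q) for p, q in product(A, B)]
    if method == "single":    # nearest cross-cluster pair
        return min(pairs)
    if method == "complete":  # farthest cross-cluster pair
        return max(pairs)
    if method == "average":   # mean over all cross-cluster pairs
        return sum(pairs) / len(pairs)
    if method == "ward":      # increase in within-cluster sum of squares on merging
        ca = [sum(c) / len(A) for c in zip(*A)]
        cb = [sum(c) / len(B) for c in zip(*B)]
        return len(A) * len(B) / (len(A) + len(B)) * dist(ca, cb) ** 2
    raise ValueError(f"unknown linkage: {method}")


A = [(0.0, 0.0), (0.0, 2.0)]
B = [(3.0, 0.0)]
d = {m: linkage_distance(A, B, m) for m in ("single", "complete", "average", "ward")}
```

    Single linkage tends to chain clusters through close outlying points, which is consistent with hierarchical analysis being good at isolating outlier regions, while Ward's and average linkage favour more compact groups.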

  2. Possibilistic clustering for shape recognition

    NASA Technical Reports Server (NTRS)

    Keller, James M.; Krishnapuram, Raghu

    1993-01-01

    Clustering methods have been used extensively in computer vision and pattern recognition. Fuzzy clustering has been shown to be advantageous over crisp (or traditional) clustering in that total commitment of a vector to a given class is not required at each iteration. Recently fuzzy clustering methods have shown spectacular ability to detect not only hypervolume clusters, but also clusters which are actually 'thin shells', i.e., curves and surfaces. Most analytic fuzzy clustering approaches are derived from Bezdek's Fuzzy C-Means (FCM) algorithm. The FCM uses the probabilistic constraint that the memberships of a data point across classes sum to one. This constraint was used to generate the membership update equations for an iterative algorithm. Unfortunately, the memberships resulting from FCM and its derivatives do not correspond to the intuitive concept of degree of belonging, and moreover, the algorithms have considerable trouble in noisy environments. Recently, the clustering problem was cast into the framework of possibility theory. Our approach was radically different from the existing clustering methods in that the resulting partition of the data can be interpreted as a possibilistic partition, and the membership values may be interpreted as degrees of possibility of the points belonging to the classes. An appropriate objective function whose minimum will characterize a good possibilistic partition of the data was constructed, and the membership and prototype update equations from necessary conditions for minimization of our criterion function were derived. The ability of this approach to detect linear and quartic curves in the presence of considerable noise is shown.

  3. Possibilistic clustering for shape recognition

    NASA Technical Reports Server (NTRS)

    Keller, James M.; Krishnapuram, Raghu

    1992-01-01

    Clustering methods have been used extensively in computer vision and pattern recognition. Fuzzy clustering has been shown to be advantageous over crisp (or traditional) clustering in that total commitment of a vector to a given class is not required at each iteration. Recently fuzzy clustering methods have shown spectacular ability to detect not only hypervolume clusters, but also clusters which are actually 'thin shells', i.e., curves and surfaces. Most analytic fuzzy clustering approaches are derived from Bezdek's Fuzzy C-Means (FCM) algorithm. The FCM uses the probabilistic constraint that the memberships of a data point across classes sum to one. This constraint was used to generate the membership update equations for an iterative algorithm. Unfortunately, the memberships resulting from FCM and its derivatives do not correspond to the intuitive concept of degree of belonging, and moreover, the algorithms have considerable trouble in noisy environments. Recently, we cast the clustering problem into the framework of possibility theory. Our approach was radically different from the existing clustering methods in that the resulting partition of the data can be interpreted as a possibilistic partition, and the membership values may be interpreted as degrees of possibility of the points belonging to the classes. We constructed an appropriate objective function whose minimum will characterize a good possibilistic partition of the data, and we derived the membership and prototype update equations from necessary conditions for minimization of our criterion function. In this paper, we show the ability of this approach to detect linear and quartic curves in the presence of considerable noise.
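    The contrast drawn above between the FCM sum-to-one constraint and possibilistic memberships can be seen on a single noisy point. The sketch uses the standard FCM membership update and the possibilistic update u = 1 / (1 + (d^2 / eta)^(1/(m-1))); the prototypes, the point positions and eta = 1 are hypothetical choices, and prototype updates are omitted.

```python
from math import dist


def fcm_memberships(x, prototypes, m=2.0):
    """Fuzzy c-means memberships of x: forced to sum to one across clusters."""
    d = [dist(x, v) for v in prototypes]
    if 0.0 in d:  # x coincides with prototype(s): share full membership among them
        hits = d.count(0.0)
        return [1.0 / hits if di == 0.0 else 0.0 for di in d]
    e = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dk) ** e for dk in d) for di in d]


def pcm_memberships(x, prototypes, eta, m=2.0):
    """Possibilistic memberships: each depends only on the distance to its own
    prototype, scaled by eta[i], so they need not sum to one."""
    return [1.0 / (1.0 + (dist(x, v) ** 2 / h) ** (1.0 / (m - 1.0)))
            for v, h in zip(prototypes, eta)]


prototypes = [(0.0, 0.0), (10.0, 0.0)]
outlier = (5.0, 100.0)  # far from both prototypes, equidistant from each
fcm_out = fcm_memberships(outlier, prototypes)
pcm_out = pcm_memberships(outlier, prototypes, eta=[1.0, 1.0])
fcm_near = fcm_memberships((1.0, 0.0), prototypes)
```

    FCM gives the distant outlier a membership of 0.5 in each cluster simply because the values must sum to one, whereas the possibilistic values are close to zero, matching the intuitive "degree of belonging" the abstract describes.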

  4. Hierarchical Aligned Cluster Analysis for Temporal Clustering of Human Motion.

    PubMed

    Zhou, Feng; De la Torre, Fernando; Hodgins, Jessica K

    2013-03-01

    Temporal segmentation of human motion into plausible motion primitives is central to understanding and building computational models of human motion. Several issues contribute to the challenge of discovering motion primitives: the exponential nature of all possible movement combinations, the variability in the temporal scale of human actions, and the complexity of representing articulated motion. We pose the problem of learning motion primitives as one of temporal clustering, and derive an unsupervised hierarchical bottom-up framework called hierarchical aligned cluster analysis (HACA). HACA finds a partition of a given multidimensional time series into m disjoint segments such that each segment belongs to one of k clusters. HACA combines kernel k-means with the generalized dynamic time alignment kernel to cluster time series data. Moreover, it provides a natural framework to find a low-dimensional embedding for time series. HACA is efficiently optimized with a coordinate descent strategy and dynamic programming. Experimental results on motion capture and video data demonstrate the effectiveness of HACA for segmenting complex motions and as a visualization tool. We also compare the performance of HACA to state-of-the-art algorithms for temporal clustering on data of a honey bee dance. The HACA code is available online.
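    HACA relies on a generalized dynamic time alignment kernel to compare segments that unfold at different speeds. The classic dynamic time warping (DTW) distance, shown below as a simpler related technique and not as HACA itself, illustrates the alignment idea: it matches two sequences monotonically in time so that a slowed-down copy of a motion still scores as identical. The one-dimensional sequences are hypothetical.

```python
def dtw(a, b):
    """Dynamic time warping distance between two 1-D sequences, using
    absolute difference as the local cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: advance in a, advance in b, or advance in both.
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


motion = [0, 1, 2, 3]
slow_motion = [0, 0, 1, 1, 2, 2, 3, 3]  # same movement performed at half speed
```

    A Euclidean comparison cannot even be applied to these two sequences of different length, whereas DTW aligns them at zero cost.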

  5. The Clusters AgeS Experiment (CASE). Variable Stars in the Field of the Globular Cluster M12

    NASA Astrophysics Data System (ADS)

    Kaluzny, J.; Thompson, I. B.; Narloch, W.; Pych, W.; Rozyczka, M.

    2015-09-01

    The field of the globular cluster M12 (NGC 6218) was monitored between 1995 and 2009 in a search for variable stars. BV light curves were obtained for thirty-six periodic or likely periodic variable stars. Thirty-four of these are new detections. Among the latter we identified twenty proper-motion members of the cluster: six detached or semi-detached eclipsing binaries, five contact binaries, five SX Phe pulsators, and three yellow stragglers. Two of the eclipsing binaries are located in the turnoff region, one on the lower main sequence and the remaining three among the blue stragglers. Two contact systems are blue stragglers, and the remaining three reside in the turnoff region. In the blue straggler region a total of 103 objects were found, of which 42 are proper motion members of M12, and another four are field stars. 55 of the remaining objects are located within two core radii from the center of the cluster, and as such they are likely genuine blue stragglers. We also report the discoveries of a radial color gradient of M12, and the shortest period among contact systems in globular clusters in general.

  6. Sputum neutrophils are associated with more severe asthma phenotypes using cluster analysis

    PubMed Central

    Moore, Wendy C.; Hastie, Annette T.; Li, Xingnan; Li, Huashi; Busse, William W.; Jarjour, Nizar N.; Wenzel, Sally E.; Peters, Stephen P.; Meyers, Deborah A.; Bleecker, Eugene R.

    2013-01-01

    Background Clinical cluster analysis from the Severe Asthma Research Program (SARP) identified five asthma subphenotypes that represent the severity spectrum of early onset allergic asthma, late onset severe asthma and severe asthma with COPD characteristics. Analysis of induced sputum from a subset of SARP subjects showed four sputum inflammatory cellular patterns. Subjects with concurrent increases in eosinophils (≥2%) and neutrophils (≥40%) had characteristics of very severe asthma. Objective To better understand interactions between inflammation and clinical subphenotypes we integrated inflammatory cellular measures and clinical variables in a new cluster analysis. Methods Participants in SARP at three clinical sites who underwent sputum induction were included in this analysis (n=423). Fifteen variables including clinical characteristics and blood and sputum inflammatory cell assessments were selected by factor analysis for unsupervised cluster analysis. Results Four phenotypic clusters were identified. Cluster A (n=132) and B (n=127) subjects had mild-moderate early onset allergic asthma with paucigranulocytic or eosinophilic sputum inflammatory cell patterns. In contrast, these inflammatory patterns were present in only 7% of Cluster C (n=117) and D (n=47) subjects who had moderate-severe asthma with frequent health care utilization despite treatment with high doses of inhaled or oral corticosteroids, and in Cluster D, reduced lung function. The majority of these subjects (>83%) had sputum neutrophilia either alone or with concurrent sputum eosinophilia. Baseline lung function and sputum neutrophils were the most important variables determining cluster assignment. Conclusion This multivariate approach identified four asthma subphenotypes representing the severity spectrum from mild-moderate allergic asthma with minimal or eosinophilic predominant sputum inflammation to moderate-severe asthma with neutrophilic predominant or mixed granulocytic inflammation.
PMID:24332216

  7. Prediction of sea ice thickness cluster in the Northern Hemisphere

    NASA Astrophysics Data System (ADS)

    Fuckar, Neven-Stjepan; Guemas, Virginie; Johnson, Nathaniel; Doblas-Reyes, Francisco

    2016-04-01

    Sea ice thickness (SIT) has the potential to contain substantial climate memory and predictability in the northern hemisphere (NH) sea ice system. We use 5-member NH SIT, reconstructed with an ocean-sea-ice general circulation model (NEMOv3.3 with LIM2) with a simple data assimilation routine, to determine NH SIT modes of variability disentangled from the long-term climate change. Specifically, we apply the K-means cluster analysis (one of the nonhierarchical clustering methods that partition data into modes or clusters based on their distances in physical space) to determine the optimal number of NH SIT clusters (K=3) and their historical variability. To examine prediction skill of NH SIT clusters in EC-Earth2.3, a state-of-the-art coupled climate forecast system, we use 5-member ocean and sea ice initial conditions (IC) from the same ocean-sea-ice historical reconstruction and atmospheric IC from ERA-Interim reanalysis. We focus on May 1st and Nov 1st start dates from 1979 to 2010. Common skill metrics of probability forecasts, such as the rank probability skill score, ROC (relative operating characteristics: hit rate versus false alarm rate) and reliability diagrams, show that our dynamical model predominantly performs better than the 1st order Markov chain forecast (which beats the climatological forecast) over the first forecast year. On average, May 1st start dates initially have lower skill than Nov 1st start dates, but their skill degrades at a slower rate than the skill of forecasts started on Nov 1st.
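    The first-order Markov chain baseline mentioned above can be built directly from a historical sequence of cluster labels: count transitions between consecutive years, normalize each row, and propagate the current state forward. The abstract does not specify how its baseline was fitted, so the label history below is hypothetical, and falling back to a uniform forecast for a never-visited cluster is an assumption of this sketch.

```python
def transition_matrix(states, n_states):
    """Estimate P[i][j] = Pr(next = j | now = i) by counting consecutive pairs
    in an observed sequence of cluster labels."""
    counts = [[0] * n_states for _ in range(n_states)]
    for a, b in zip(states, states[1:]):
        counts[a][b] += 1
    P = []
    for row in counts:
        total = sum(row)
        # Assumption: a state never seen as a starting point gets a uniform row.
        P.append([c / total for c in row] if total else [1.0 / n_states] * n_states)
    return P


def forecast(P, current, steps=1):
    """Probability distribution over clusters `steps` transitions ahead."""
    probs = [1.0 if i == current else 0.0 for i in range(len(P))]
    for _ in range(steps):
        probs = [sum(probs[i] * P[i][j] for i in range(len(P))) for j in range(len(P))]
    return probs


# Hypothetical yearly sequence of SIT cluster labels (K = 3).
history = [0, 0, 1, 0, 0, 1]
P = transition_matrix(history, n_states=3)
```

    Probabilistic forecasts like these are what the skill scores named in the abstract evaluate against observed cluster membership.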

  8. Submorphotypes of the maxillary first molar and their effects on alignment and rotation.

    PubMed

    Kim, Hong-Kyun; Kwon, Ho Beom; Hyun, Hong-Keun; Jung, Min-Ho; Han, Seong Ho; Park, Young-Seok

    2014-09-01

    The aim of this study was to explore the shape differences in maxillary first molars with orthographic measurements using 3-dimensional virtual models to assess whether there is variability in morphology that could affect the alignment results when treated by straight-wire appliance systems. A total of 175 maxillary first molars with 4 cusps were selected for classification. With 3-dimensional laser scanning and reconstruction software, virtual casts were constructed. After performing several linear and angular measurements on the virtual occlusal plane, the teeth were clustered into 2 groups by the method of partitioning around medoids. To visualize the 2 groups, occlusal polygons were constructed using the average data of these groups. The resultant 2 clusters showed statistically significant differences in the measurements describing the cusp locations and the buccal and lingual outlines. The rotation along the centers made the 2 cluster polygons look similar, but there was a difference in the direction of the midsagittal lines. There was considerable variability in morphology according to 2 clusters in the population of this study. The occlusal polygons showed that the outlines of the 2 clusters were similar, but the midsagittal line directions and inner geometries were different. The difference between the morphologies of the 2 clusters could result in occlusal contact differences, which might be considered for better alignment of the maxillary posterior segment. Copyright © 2014 American Association of Orthodontists. Published by Elsevier Inc. All rights reserved.
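
    Partitioning around medoids (PAM) restricts cluster centres to actual observations. A minimal sketch of its two classical phases (greedy BUILD, then SWAP until no medoid/non-medoid exchange lowers the total distance) follows. The toy 2-D points stand in for the study's linear and angular measurements and are purely hypothetical, as is the default Euclidean distance.

```python
import math


def total_cost(points, medoids, dist):
    """Sum of distances from every point to its nearest medoid."""
    return sum(min(dist(p, points[m]) for m in medoids) for p in points)


def pam(points, k, dist=math.dist):
    n = len(points)
    # BUILD: greedily add the point that most reduces the total cost.
    medoids = []
    for _ in range(k):
        candidates = [i for i in range(n) if i not in medoids]
        medoids.append(min(candidates, key=lambda i: total_cost(points, medoids + [i], dist)))
    best = total_cost(points, medoids, dist)
    # SWAP: apply the best medoid/non-medoid exchange until none improves the cost.
    while True:
        best_swap = None
        for pos in range(k):
            for h in range(n):
                if h in medoids:
                    continue
                candidate = medoids[:pos] + [h] + medoids[pos + 1:]
                cost = total_cost(points, candidate, dist)
                if cost < best - 1e-12:
                    best, best_swap = cost, candidate
        if best_swap is None:
            break
        medoids = best_swap
    # Assign each point to its closest medoid (labels index into `medoids`).
    labels = [min(range(k), key=lambda j: dist(p, points[medoids[j]])) for p in points]
    return medoids, labels
```

    Each SWAP pass re-evaluates the full cost for every candidate exchange, which keeps the sketch short but is slow for large samples; production implementations update costs incrementally.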

  9. A Dimensionally Reduced Clustering Methodology for Heterogeneous Occupational Medicine Data Mining.

    PubMed

    Saâdaoui, Foued; Bertrand, Pierre R; Boudet, Gil; Rouffiac, Karine; Dutheil, Frédéric; Chamoux, Alain

    2015-10-01

    Clustering is a set of statistical learning techniques aimed at finding heterogeneous partitions that group homogeneous data into so-called clusters. Clustering has been applied successfully in many fields, such as medicine, biology, finance and economics. In this paper, we introduce the notion of clustering in multifactorial data analysis problems. A case study is conducted on an occupational medicine problem with the purpose of analyzing patterns in a population of 813 individuals. To reduce the dimensionality of the data set, we base our approach on Principal Component Analysis (PCA), the statistical tool most commonly used in factorial analysis. However, real-world problems, especially in medicine, often involve heterogeneous mixtures of qualitative and quantitative measurements, whereas PCA only processes quantitative ones. Moreover, qualitative data are often binary-coded versions of underlying quantitative responses that cannot be observed directly. Hence, we propose a new set of strategies for handling quantitative and qualitative data simultaneously. The principle of this approach is to project the qualitative variables onto the subspaces spanned by the quantitative ones. Subsequently, an optimal model is allocated to the resulting PCA-regressed subspaces.

  10. Evaluation of Hierarchical Clustering Algorithms for Document Datasets

    DTIC Science & Technology

    2002-06-03

    link, complete-link, and group average (UPGMA)) and a new set of merging criteria derived from the six partitional criterion functions. Overall, we...used the single-link, complete-link, and UPGMA schemes, as well as the various partitional criterion functions described in Section 3.1. The single-link...other (complete-link approach). The UPGMA scheme [16] (also known as group average) overcomes these problems by measuring the similarity of two clusters
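
    The group-average (UPGMA) criterion mentioned in the excerpt scores a pair of clusters by the mean pairwise similarity between their members. A minimal sketch follows, assuming a caller-supplied similarity function (larger means more alike). The scalar items and negative-absolute-difference similarity in the usage note are hypothetical stand-ins for document vectors and a measure such as cosine similarity.

```python
def group_average(c1, c2, sim):
    """UPGMA score: mean pairwise similarity between members of two clusters."""
    return sum(sim(a, b) for a in c1 for b in c2) / (len(c1) * len(c2))


def upgma(items, sim, n_clusters):
    """Agglomerate singletons, always merging the pair with the highest group-average similarity."""
    clusters = [[x] for x in items]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                score = group_average(clusters[i], clusters[j], sim)
                if best is None or score > best[0]:
                    best = (score, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is still valid
    return clusters
```

    For example, `upgma([1, 2, 10, 11, 20], lambda a, b: -abs(a - b), 2)` merges {1, 2} and {10, 11} first, then joins them, leaving {20} on its own. Unlike single-link (one closest pair) or complete-link (one farthest pair), every member pair contributes to the score.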

  11. Use of multiple cluster analysis methods to explore the validity of a community outcomes concept map.

    PubMed

    Orsi, Rebecca

    2017-02-01

    Concept mapping is now a commonly-used technique for articulating and evaluating programmatic outcomes. However, research regarding validity of knowledge and outcomes produced with concept mapping is sparse. The current study describes quantitative validity analyses using a concept mapping dataset. We sought to increase the validity of concept mapping evaluation results by running multiple cluster analysis methods and then using several metrics to choose from among solutions. We present four different clustering methods based on analyses using the R statistical software package: partitioning around medoids (PAM), fuzzy analysis (FANNY), agglomerative nesting (AGNES) and divisive analysis (DIANA). We then used the Dunn and Davies-Bouldin indices to assist in choosing a valid cluster solution for a concept mapping outcomes evaluation. We conclude that the validity of the outcomes map is high, based on the analyses described. Finally, we discuss areas for further concept mapping methods research. Copyright © 2016 Elsevier Ltd. All rights reserved.
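
    The Dunn and Davies-Bouldin indices used above to choose among solutions can both be computed from a labelled point set. A minimal sketch follows, assuming Euclidean distance, centroid-based scatter for Davies-Bouldin, and at least two clusters with distinct centroids; it is not the R code used in the study.

```python
import math
from itertools import combinations


def _groups(points, labels):
    """Collect points into one list per cluster label."""
    groups = {}
    for p, label in zip(points, labels):
        groups.setdefault(label, []).append(p)
    return list(groups.values())


def dunn_index(points, labels):
    """Smallest between-cluster distance divided by largest cluster diameter (higher is better)."""
    groups = _groups(points, labels)
    min_sep = min(math.dist(a, b)
                  for g1, g2 in combinations(groups, 2) for a in g1 for b in g2)
    max_diam = max((math.dist(a, b) for g in groups for a, b in combinations(g, 2)),
                   default=0.0)
    return min_sep / max_diam if max_diam > 0 else math.inf


def davies_bouldin(points, labels):
    """Mean over clusters of the worst (s_i + s_j) / d(c_i, c_j) ratio (lower is better)."""
    groups = _groups(points, labels)
    cents = [tuple(sum(xs) / len(g) for xs in zip(*g)) for g in groups]
    scatter = [sum(math.dist(p, c) for p in g) / len(g) for g, c in zip(groups, cents)]
    k = len(groups)
    return sum(
        max((scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
            for j in range(k) if j != i)
        for i in range(k)
    ) / k
```

    Comparing candidate solutions then amounts to preferring a high Dunn index and a low Davies-Bouldin index; the two can disagree, which is one reason to consult both.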

  12. A mesh partitioning algorithm for preserving spatial locality in arbitrary geometries

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Nivarti, Girish V.; Salehi, M. Mahdi; Bushe, W. Kendal

    2015-01-15

    Highlights: • An algorithm for partitioning computational meshes is proposed. • The Morton order space-filling curve is modified to achieve improved locality. • A spatial locality metric is defined to compare results with existing approaches. • Results indicate improved performance of the algorithm in complex geometries. Abstract: A space-filling curve (SFC) is a proximity-preserving linear mapping of any multi-dimensional space and is widely used as a clustering tool. Equi-sized partitioning of an SFC ignores the loss in clustering quality that occurs due to inaccuracies in the mapping. Often, this results in poor locality within partitions, especially for the conceptually simple Morton order curves. We present a heuristic that improves partition locality in arbitrary geometries by slicing a Morton order curve at points where spatial locality is sacrificed. In addition, we develop algorithms that evenly distribute points to the extent possible while maintaining spatial locality. A metric is defined to estimate relative inter-partition contact as an indicator of communication in parallel computing architectures. Domain partitioning tests have been conducted on geometries relevant to turbulent reactive flow simulations. The results obtained highlight the performance of our method as an unsupervised and computationally inexpensive domain partitioning tool.
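
    The baseline the authors improve on, equi-sized partitioning of a Morton (Z-order) curve, can be sketched as follows: interleave the bits of integer cell coordinates to get each cell's position on the curve, sort, and cut the sorted list into nearly equal chunks. This assumes non-negative integer 2-D coordinates and reproduces only the conventional baseline, not the authors' locality-preserving slicing heuristic; the function names are hypothetical.

```python
def morton_key(x, y, bits=16):
    """Interleave the bits of non-negative integers x and y (x in even bits, y in odd bits)."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key


def equi_sized_partitions(cells, n_parts):
    """Sort cells along the Morton curve and cut the ordering into near-equal chunks."""
    ordered = sorted(cells, key=lambda c: morton_key(c[0], c[1]))
    size, extra = divmod(len(ordered), n_parts)
    parts, start = [], 0
    for p in range(n_parts):
        end = start + size + (1 if p < extra else 0)
        parts.append(ordered[start:end])
        start = end
    return parts
```

    On a full 4×4 grid split into four parts, each part is one 2×2 quadrant. On irregular geometries, however, fixed-size cuts can fall where the curve jumps between distant regions, which is the loss of locality the abstract describes.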

  13. Ocean surface partitioning strategies using ocean colour remote Sensing: A review

    NASA Astrophysics Data System (ADS)

    Krug, Lilian Anne; Platt, Trevor; Sathyendranath, Shubha; Barbosa, Ana B.

    2017-06-01

    The ocean surface is organized into regions with distinct properties reflecting the complexity of interactions between environmental forcing and biological responses. The delineation of these functional units, each with unique, homogeneous properties and underlying ecosystem structure and dynamics, can be defined as ocean surface partitioning. The main purposes and applications of ocean partitioning include the evaluation of particular marine environments; generation of more accurate satellite ocean colour products; assimilation of data into biogeochemical and climate models; and establishment of ecosystem-based management practices. This paper reviews the diverse approaches implemented for partitioning the ocean surface into functional units using ocean colour remote sensing (OCRS) data, including their purposes, criteria, methods and scales. OCRS offers synoptic, high spatial-temporal resolution, multi-decadal coverage of bio-optical properties, relevant to the applications and value of ocean surface partitioning. In combination with other biotic and/or abiotic data, OCRS-derived data (e.g., chlorophyll-a, optical properties) provide a broad and varied source of information that can be analysed using different delineation methods, ranging from subjective, expert-based approaches to unsupervised learning approaches (e.g., cluster, fuzzy and empirical orthogonal function analyses). Partition schemes are applied at global to mesoscale spatial coverage, with static (time-invariant) or dynamic (time-varying) representations. A case study, the highly heterogeneous area off the SW Iberian Peninsula (NE Atlantic), illustrates how the selection of spatial coverage and temporal representation affects the discrimination of distinct environmental drivers of phytoplankton variability. 
Advances in operational oceanography and in the subject area of satellite ocean colour, including development of new sensors, algorithms and products, are among the potential benefits from extended use, scope and applications of ocean surface partitioning using OCRS.

  14. Distribution of sea anemones (Cnidaria, Actiniaria) in Korea analyzed by environmental clustering

    USGS Publications Warehouse

    Cha, H.-R.; Buddemeier, R.W.; Fautin, D.G.; Sandhei, P.

    2004-01-01

    Using environmental data and the geospatial clustering tools LOICZView and DISCO, we empirically tested the postulated existence and boundaries of four biogeographic regions in the southern part of the Korean peninsula. Environmental variables used included wind speed, sea surface temperature (SST), salinity, tidal amplitude, and the chlorophyll spectral signal. Our analysis confirmed the existence of four biogeographic regions, but the details of the borders between them differ from those previously postulated. Specimen-level distribution records of intertidal sea anemones were mapped; their distribution relative to the environmental data supported the importance of the environmental parameters we selected in defining suitable habitats. From the geographic coincidence between anemone distribution and the clusters based on environmental variables, we infer that geospatial clustering has the power to delimit ranges for marine organisms within relatively small geographical areas.

  15. Sensitivity evaluation of dynamic speckle activity measurements using clustering methods.

    PubMed

    Etchepareborda, Pablo; Federico, Alejandro; Kaufmann, Guillermo H

    2010-07-01

    We evaluate and compare the use of competitive neural networks, self-organizing maps, the expectation-maximization algorithm, K-means, and fuzzy C-means techniques as partitional clustering methods, when the sensitivity of the activity measurement of dynamic speckle images needs to be improved. The temporal history of the acquired intensity generated by each pixel is analyzed in a wavelet decomposition framework, and it is shown that the mean energy of its corresponding wavelet coefficients provides a suitable feature space for clustering purposes. The sensitivity obtained by using the evaluated clustering techniques is also compared with the well-known methods of Konishi-Fujii, weighted generalized differences, and wavelet entropy. The performance of the partitional clustering approach is evaluated using simulated dynamic speckle patterns and also experimental data.
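
    The feature described above, the mean energy of the wavelet coefficients of each pixel's intensity history, can be sketched per decomposition level as follows. The orthonormal Haar wavelet is an assumption made here for brevity (the abstract does not name the wavelet), and the function name is hypothetical. The per-level energies would form the feature vector passed to a partitional method such as K-means.

```python
import math


def haar_level_energies(signal, levels):
    """Mean squared Haar detail coefficient at each decomposition level."""
    approx = list(signal)
    energies = []
    for _ in range(levels):
        if len(approx) < 2:
            break
        n = len(approx) - len(approx) % 2  # drop a trailing odd sample
        detail = [(approx[i] - approx[i + 1]) / math.sqrt(2) for i in range(0, n, 2)]
        approx = [(approx[i] + approx[i + 1]) / math.sqrt(2) for i in range(0, n, 2)]
        energies.append(sum(d * d for d in detail) / len(detail))
    return energies
```

    A static pixel (constant intensity) yields zero energy at every level, while a rapidly fluctuating one concentrates energy in the finest level; this contrast is what makes the energies usable as an activity feature.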

  16. Phobos MRO/CRISM visible and near-infrared (0.5-2.5 μm) spectral modeling

    NASA Astrophysics Data System (ADS)

    Pajola, Maurizio; Roush, Ted; Dalle Ore, Cristina; Marzo, Giuseppe A.; Simioni, Emanuele

    2018-05-01

    This paper focuses on the spectral modeling of the surface of Phobos in the wavelength range between 0.5 and 2.5 μm. We exploit the Phobos Mars Reconnaissance Orbiter/Compact Reconnaissance Imaging Spectrometer for Mars (MRO/CRISM) dataset and extend the study area presented by Fraeman et al. (2012), including spectra from nearly the entire surface observed. Without a priori selection of surface locations, we use the unsupervised K-means partitioning algorithm developed by Marzo et al. (2006) to investigate the spectral variability across the Phobos surface. The statistical partitioning identifies seven clusters. We investigate the compositional information contained within the average spectra of four clusters using the radiative transfer model of Shkuratov et al. (1999). We use optical constants of Tagish Lake meteorite (TL), from Roush (2003), and pyroxene glass (PM80), from Jaeger et al. (1994) and Dorschner et al. (1995), as previously suggested by Pajola et al. (2013), as inputs for the calculations. The model results show good agreement in slope when compared to the averages of the CRISM spectral clusters. In particular, the best fitting model of the cluster with the steepest spectral slope yields relative abundances that are equal to those of Pajola et al. (2013), i.e. 20% PM80 and 80% TL, but grain sizes that are 12 μm smaller for PM80 and 4 μm smaller for TL (the grain sizes are 11 μm for PM80 and 20 μm for TL in Pajola et al. (2013), respectively). This modest discrepancy may arise from the fact that the areas observed by CRISM and those analyzed in Pajola et al. (2013) are on opposite sides of Phobos and are characterized by different morphological and weathering settings. Instead, as the clusters' spectral slopes decrease, the best fits obtained show trends related to both relative abundance and grain size that are not observed for the cluster with the steepest spectral slope. 
With a decrease in slope there is a general increase in the relative percentage of PM80 from 12% to 18% and an associated decrease of TL from 88% to 82%. Simultaneously, the PM80 grain sizes decrease from 9 to 5 μm and the TL grain sizes increase from 13 to 16 μm. The best fitting models show relative abundances and grain sizes that partially overlap. This supports the hypothesis that, from a compositional perspective, the transition between the highest and lowest slopes on Phobos is subtle and is characterized by a smooth change of relative abundances and grain sizes, instead of a distinct dichotomy between the areas.

  17. A possibilistic approach to clustering

    NASA Technical Reports Server (NTRS)

    Krishnapuram, Raghu; Keller, James M.

    1993-01-01

    Fuzzy clustering has been shown to be advantageous over crisp (or traditional) clustering methods in that total commitment of a vector to a given class is not required at each image pattern recognition iteration. Recently fuzzy clustering methods have shown spectacular ability to detect not only hypervolume clusters, but also clusters which are actually 'thin shells', i.e., curves and surfaces. Most analytic fuzzy clustering approaches are derived from the 'Fuzzy C-Means' (FCM) algorithm. The FCM uses the probabilistic constraint that the memberships of a data point across classes sum to one. This constraint was used to generate the membership update equations for an iterative algorithm. Recently, we cast the clustering problem into the framework of possibility theory using an approach in which the resulting partition of the data can be interpreted as a possibilistic partition, and the membership values may be interpreted as degrees of possibility of the points belonging to the classes. We show the ability of this approach to detect linear and quartic curves in the presence of considerable noise.
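
    The contrast drawn above between the probabilistic constraint of FCM and possibilistic memberships can be shown for a single point, given its distances to the current cluster centres. A minimal sketch follows; the possibilistic form 1 / (1 + (d²/η)^(1/(m-1))) is the update usually associated with possibilistic c-means, and the per-cluster scale parameters η (etas) are chosen by hand here as an assumption.

```python
def fcm_memberships(dists, m=2.0):
    """Fuzzy c-means memberships of one point; they always sum to one."""
    zeros = [d == 0 for d in dists]
    if any(zeros):
        # The point coincides with one or more centres: share membership among them.
        n_zero = sum(zeros)
        return [1.0 / n_zero if z else 0.0 for z in zeros]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dk) ** power for dk in dists) for di in dists]


def possibilistic_memberships(dists, etas, m=2.0):
    """Possibilistic memberships: each depends only on the distance to its own centre."""
    return [1.0 / (1.0 + (d * d / eta) ** (1.0 / (m - 1.0))) for d, eta in zip(dists, etas)]
```

    For a distant outlier equidistant from two centres, FCM still assigns 0.5 to each cluster because of the sum-to-one constraint, whereas the possibilistic memberships are both close to zero, which is why the possibilistic formulation copes better with considerable noise.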

  18. Multiple co-clustering based on nonparametric mixture models with heterogeneous marginal distributions

    PubMed Central

    Yoshimoto, Junichiro; Shimizu, Yu; Okada, Go; Takamura, Masahiro; Okamoto, Yasumasa; Yamawaki, Shigeto; Doya, Kenji

    2017-01-01

    We propose a novel method for multiple clustering, which is useful for analysis of high-dimensional data containing heterogeneous types of features. Our method is based on nonparametric Bayesian mixture models in which features are automatically partitioned (into views) for each clustering solution. This feature partition works as feature selection for a particular clustering solution, which screens out irrelevant features. To make our method applicable to high-dimensional data, a co-clustering structure is newly introduced for each view. Further, the outstanding novelty of our method is that we simultaneously model different distribution families, such as Gaussian, Poisson, and multinomial distributions in each cluster block, which widens areas of application to real data. We apply the proposed method to synthetic and real data, and show that our method outperforms other multiple clustering methods both in recovering true cluster structures and in computation time. Finally, we apply our method to a depression dataset with no true cluster structure available, from which useful inferences are drawn about possible clustering structures of the data. PMID:29049392

  19. Unsupervised classification of multivariate geostatistical data: Two algorithms

    NASA Astrophysics Data System (ADS)

    Romary, Thomas; Ors, Fabien; Rivoirard, Jacques; Deraisme, Jacques

    2015-12-01

    With the increasing development of remote sensing platforms and the evolution of sampling facilities in the mining and oil industries, spatial datasets are becoming increasingly large, involve a growing number of variables and cover wider and wider areas. Therefore, it is often necessary to split the domain of study to account for radically different behaviors of the natural phenomenon over the domain and to simplify the subsequent modeling step. The definition of these areas can be seen as a problem of unsupervised classification, or clustering, where we try to divide the domain into homogeneous subdomains with respect to the values taken by the variables at hand. The application of classical clustering methods, designed for independent observations, does not ensure the spatial coherence of the resulting classes. Image segmentation methods, based on e.g. Markov random fields, are not adapted to irregularly sampled data. Other existing approaches, based on mixtures of Gaussian random functions estimated via the expectation-maximization algorithm, are limited to reasonable sample sizes and a small number of variables. In this work, we propose two algorithms based on adaptations of classical algorithms to multivariate geostatistical data. Both algorithms are model free and can handle large volumes of multivariate, irregularly spaced data. The first one proceeds by agglomerative hierarchical clustering. The spatial coherence is ensured by a proximity condition imposed for two clusters to merge. This proximity condition relies on a graph organizing the data in the coordinates space. The hierarchical algorithm can then be seen as a graph-partitioning algorithm. Following this interpretation, a spatial version of the spectral clustering algorithm is also proposed. The performances of both algorithms are assessed on toy examples and a mining dataset.
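
    The proximity condition described above (two clusters may merge only if the neighbourhood graph connects them) can be sketched as follows. This is a minimal illustration under several assumptions not taken from the paper: a single variable per sample, an explicitly supplied edge list, and "smallest gap between cluster means" as a hypothetical merge criterion standing in for the authors' dissimilarity.

```python
def spatial_agglomerative(values, edges, n_clusters):
    """Merge only clusters joined by a graph edge; criterion: smallest gap between cluster means."""
    cluster_of = list(range(len(values)))
    members = {i: [i] for i in range(len(values))}

    def cluster_mean(c):
        return sum(values[k] for k in members[c]) / len(members[c])

    while len(members) > n_clusters:
        best = None
        for i, j in edges:
            a, b = cluster_of[i], cluster_of[j]
            if a == b:
                continue  # already in the same cluster
            cost = abs(cluster_mean(a) - cluster_mean(b))
            if best is None or cost < best[0]:
                best = (cost, a, b)
        if best is None:
            break  # remaining clusters are not connected in the graph
        _, a, b = best
        for k in members[b]:
            cluster_of[k] = a
        members[a].extend(members.pop(b))
    return cluster_of
```

    With values [0.0, 0.1, 5.0, 5.1, 0.05] on a chain graph 0-1-2-3-4 and three clusters requested, the last sample stays in its own cluster even though its value resembles the first two: it is not spatially adjacent to them, which is exactly the coherence a classical, non-spatial clustering would not enforce.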

  20. Partitioning of the Golgi Apparatus during Mitosis in Living HeLa Cells

    PubMed Central

    Shima, David T.; Haldar, Kasturi; Pepperkok, Rainer; Watson, Rose; Warren, Graham

    1997-01-01

    The Golgi apparatus of HeLa cells was fluorescently tagged with a green fluorescent protein (GFP), localized by attachment to the NH2-terminal retention signal of N-acetylglucosaminyltransferase I (NAGT I). The location was confirmed by immunogold and immunofluorescence microscopy using a variety of Golgi markers. The behavior of the fluorescent Golgi marker was observed in fixed and living mitotic cells using confocal microscopy. By metaphase, cells contained a constant number of Golgi fragments dispersed throughout the cytoplasm. Conventional and cryoimmunoelectron microscopy showed that the NAGT I–GFP chimera (NAGFP)-positive fragments were tubulo-vesicular mitotic Golgi clusters. Mitotic conversion of Golgi stacks into mitotic clusters had surprisingly little effect on the polarity of Golgi membrane markers at the level of fluorescence microscopy. In living cells, there was little self-directed movement of the clusters in the period from metaphase to early telophase. In late telophase, the Golgi ribbon began to be reformed by a dynamic process of congregation and tubulation of the newly inherited Golgi fragments. The accuracy of partitioning the NAGFP-tagged Golgi was found to exceed that expected for a stochastic partitioning process. The results provide direct evidence for mitotic clusters as the unit of partitioning and suggest that precise regulation of the number, position, and compartmentation of mitotic membranes is a critical feature for the ordered inheritance of the Golgi apparatus. PMID:9182657

  1. An additional k-means clustering step improves the biological features of WGCNA gene co-expression networks.

    PubMed

    Botía, Juan A; Vandrovcova, Jana; Forabosco, Paola; Guelfi, Sebastian; D'Sa, Karishma; Hardy, John; Lewis, Cathryn M; Ryten, Mina; Weale, Michael E

    2017-04-12

    Weighted Gene Co-expression Network Analysis (WGCNA) is a widely used R software package for the generation of gene co-expression networks (GCN). WGCNA generates both a GCN and a derived partitioning of genes into clusters (modules). We propose k-means clustering as an additional processing step to conventional WGCNA, which we have implemented in the R package km2gcn (k-means to gene co-expression network, https://github.com/juanbot/km2gcn ). We assessed our method on networks created from UKBEC data (10 different human brain tissues), on networks created from GTEx data (42 human tissues, including 13 brain tissues), and on simulated networks derived from GTEx data. We observed substantially improved module properties, including: (1) few or zero misplaced genes; (2) increased counts of replicable clusters in alternate tissues (×3.1 on average); (3) improved enrichment of Gene Ontology terms (seen in 48/52 GCNs); (4) improved cell type enrichment signals (seen in 21/23 brain GCNs); and (5) more accurate partitions in simulated data according to a range of similarity indices. The results obtained from our investigations indicate that our k-means method, applied as an adjunct to standard WGCNA, results in better network partitions. These improved partitions enable more fruitful downstream analyses, as gene modules are more biologically meaningful.
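
    The idea of using k-means as a refinement step can be illustrated by seeding k-means with an existing module assignment and iterating until no gene changes module. A minimal sketch follows. The choices of mean expression profile as centroid and Pearson correlation as similarity are assumptions for illustration and are not necessarily what km2gcn does; the function names and toy profiles are hypothetical.

```python
import math
from statistics import mean


def pearson(a, b):
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    norm = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return cov / norm if norm > 0 else 0.0


def refine_modules(profiles, labels, max_iter=100):
    """k-means-style refinement seeded with an existing module assignment."""
    labels = list(labels)
    for _ in range(max_iter):
        modules = sorted(set(labels))
        # Centroid of each module: the mean expression profile of its genes.
        centroids = {
            m: [mean(col) for col in zip(*[p for p, l in zip(profiles, labels) if l == m])]
            for m in modules
        }
        # Reassign each gene to the module whose centroid it correlates with best.
        new_labels = [max(modules, key=lambda m: pearson(p, centroids[m])) for p in profiles]
        if new_labels == labels:
            break
        labels = new_labels
    return labels
```

    Starting from a deliberately poor assignment of two rising and two falling profiles, the refinement regroups the rising genes together and the falling genes together within two iterations, the kind of "misplaced gene" correction the abstract reports.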

  2. Golgi apparatus partitioning during cell division.

    PubMed

    Rabouille, Catherine; Jokitalo, Eija

    2003-01-01

    This review discusses the mitotic segregation of the Golgi apparatus. The results from classical biochemical and morphological studies have suggested that in mammalian cells this organelle remains distinct during mitosis, although highly fragmented through the formation of mitotic Golgi clusters of small tubules and vesicles. Shedding of free Golgi-derived vesicles would consume Golgi clusters and disperse this organelle throughout the cytoplasm. Vesicles could be partitioned in a stochastic and passive way between the two daughter cells and act as a template for the reassembly of this key organelle. This model has recently been modified by results obtained using GFP- or HRP-tagged Golgi resident enzymes, live cell imaging and electron microscopy. Results obtained with these techniques show that the mitotic Golgi clusters are stable entities throughout mitosis that partition in a microtubule spindle-dependent fashion. Furthermore, a newer model proposes that at the onset of mitosis, the Golgi apparatus completely loses its identity and is reabsorbed into the endoplasmic reticulum. This suggests that the partitioning of the Golgi apparatus is entirely dependent on the partitioning of the endoplasmic reticulum. We critically discuss both models and summarize what is known about the molecular mechanisms underlying the Golgi disassembly and reassembly during and after mitosis. We will also review how the study of the Golgi apparatus during mitosis in other organisms can answer current questions and perhaps reveal novel mechanisms.

  3. Retinal ganglion cells in the eastern newt Notophthalmus viridescens: topography, morphology, and diversity.

    PubMed

    Pushchin, Igor I; Karetin, Yuriy A

    2009-10-20

    The topography and morphology of retinal ganglion cells (RGCs) in the eastern newt were studied. Cells were retrogradely labeled with tetramethylrhodamine-conjugated dextran amines or horseradish peroxidase and examined in retinal wholemounts. Their total number was 18,025 ± 3,602 (mean ± SEM). The spatial density of RGCs varied from 2,100 cells/mm² in the retinal periphery to 4,500 cells/mm² in the dorsotemporal retina. No prominent retinal specializations were found. The spatial resolution estimated from the spatial density of RGCs varied from 1.4 cycles per degree in the periphery to 1.95 cycles per degree in the region of the peak RGC density. A sample of 68 cells was drawn with a camera lucida and subjected to quantitative analysis. A total of 21 parameters related to RGC morphology and stratification in the retina were estimated. Partitionings obtained by using different clustering algorithms combined with automatic variable weighting and dimensionality reduction techniques were compared, and an effective solution was found by using silhouette analysis. A total of seven clusters were identified and associated with potential cell types. Kruskal-Wallis ANOVA-on-Ranks with post hoc Mann-Whitney U tests showed significant pairwise between-cluster differences in one or more of the clustering variables. The average silhouette values of the clusters were reasonably high, ranging from 0.52 to 0.79. Cells assigned to the same cluster displayed similar morphology and stratification in the retina. The advantages and limitations of the methodology adopted are discussed. The present classification is compared with known morphological and physiological RGC classifications in other salamanders.
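
    The per-cluster average silhouette values reported above (0.52 to 0.79) come from the standard silhouette s = (b - a) / max(a, b), where a is a cell's mean distance to its own cluster and b its mean distance to the nearest other cluster. A minimal sketch follows, assuming Euclidean distance on already-weighted morphological variables and at least two clusters; singleton clusters get s = 0 by convention.

```python
import math


def silhouette_values(points, labels):
    """Per-point silhouette s = (b - a) / max(a, b); needs at least two clusters."""
    clusters = set(labels)
    values = []
    for i, (p, own) in enumerate(zip(points, labels)):
        same = [math.dist(p, q) for j, (q, l) in enumerate(zip(points, labels)) if l == own and j != i]
        if not same:
            values.append(0.0)  # common convention for singleton clusters
            continue
        a = sum(same) / len(same)  # mean distance within own cluster
        b = min(  # mean distance to the nearest other cluster
            sum(math.dist(p, q) for q, l in zip(points, labels) if l == other)
            / labels.count(other)
            for other in clusters if other != own
        )
        values.append((b - a) / max(a, b))
    return values


def cluster_average_silhouettes(points, labels):
    """Average silhouette of the members of each cluster."""
    s = silhouette_values(points, labels)
    return {c: sum(v for v, l in zip(s, labels) if l == c) / labels.count(c) for c in set(labels)}
```

    Values near 1 indicate cells that sit well inside their cluster; values near 0 or below flag cells that lie between clusters or may be misassigned.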

  4. "K"-Balance Partitioning: An Exact Method with Applications to Generalized Structural Balance and Other Psychological Contexts

    ERIC Educational Resources Information Center

    Brusco, Michael; Steinley, Douglas

    2010-01-01

    Structural balance theory (SBT) has maintained a venerable status in the psychological literature for more than 5 decades. One important problem pertaining to SBT is the approximation of structural or generalized balance via the partitioning of the vertices of a signed graph into "K" clusters. This "K"-balance partitioning problem also has more…
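
    The K-balance problem scores a partition of a signed graph by its inconsistencies: positive edges that cross clusters plus negative edges inside a cluster. A minimal sketch follows, counting each inconsistency with weight 1 (an assumption; weighted variants exist) and finding an exact optimum by brute-force enumeration of all labelings into at most K clusters. This exhaustive search is only a stand-in for illustration, not the authors' exact method, and is feasible only for very small graphs.

```python
from itertools import product


def imbalance(edges, labels):
    """Count positive edges between clusters plus negative edges within clusters."""
    return sum(1 for u, v, sign in edges if (sign > 0) != (labels[u] == labels[v]))


def best_k_balance(n_vertices, edges, k):
    """Exhaustive search over all k**n labelings; feasible only for tiny graphs."""
    best = None
    for labels in product(range(k), repeat=n_vertices):
        cost = imbalance(edges, labels)
        if best is None or cost < best[0]:
            best = (cost, labels)
    return best
```

    An all-negative triangle illustrates why generalized balance uses K > 2: with two clusters at least one negative edge must fall inside a cluster (imbalance 1), while three clusters achieve perfect balance.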

  5. Utility of K-Means clustering algorithm in differentiating apparent diffusion coefficient values between benign and malignant neck pathologies

    PubMed Central

    Srinivasan, A.; Galbán, C.J.; Johnson, T.D.; Chenevert, T.L.; Ross, B.D.; Mukherji, S.K.

    2014-01-01

    Purpose The objective of our study was to analyze the differences between apparent diffusion coefficient (ADC) partitions (created using the K-Means algorithm) between benign and malignant neck lesions and evaluate its benefit in distinguishing these entities. Material and methods MRI studies of 10 benign and 10 malignant proven neck pathologies were post-processed on a PC using in-house software developed in MATLAB (The MathWorks, Inc., Natick, MA). Lesions were manually contoured by two neuroradiologists with the ADC values within each lesion clustered into two (low ADC-ADCL, high ADC-ADCH) and three partitions (ADCL, intermediate ADC-ADCI, ADCH) using the K-Means clustering algorithm. An unpaired two-tailed Student’s t-test was performed for all metrics to determine statistical differences in the means between the benign and malignant pathologies. Results A statistically significant difference between the mean ADCL clusters in benign and malignant pathologies was seen in the 3-cluster models of both readers (p=0.03 and 0.022, respectively) and the 2-cluster model of reader 2 (p=0.04), with the other metrics (ADCH, ADCI, whole lesion mean ADC) not revealing any significant differences. Receiver operating characteristic curves demonstrated the quantitative difference in mean ADCH and ADCL in both the 2- and 3-cluster models to be predictive of malignancy (2 clusters: p=0.008, area under curve=0.850; 3 clusters: p=0.01, area under curve=0.825). Conclusion The K-Means clustering algorithm that generates partitions of large datasets may provide a better characterization of neck pathologies and may be of additional benefit in distinguishing benign and malignant neck pathologies compared to whole lesion mean ADC alone. PMID:20007723

  6. Utility of the k-means clustering algorithm in differentiating apparent diffusion coefficient values of benign and malignant neck pathologies.

    PubMed

    Srinivasan, A; Galbán, C J; Johnson, T D; Chenevert, T L; Ross, B D; Mukherji, S K

    2010-04-01

    Does the K-means algorithm do a better job of differentiating benign and malignant neck pathologies compared to only mean ADC? The objective of our study was to analyze the differences between ADC partitions to evaluate whether the K-means technique can be of additional benefit to whole-lesion mean ADC alone in distinguishing benign and malignant neck pathologies. MR imaging studies of 10 benign and 10 malignant proved neck pathologies were postprocessed on a PC by using in-house software developed in Matlab. Two neuroradiologists manually contoured the lesions, with the ADC values within each lesion clustered into 2 (low, ADC-ADC(L); high, ADC-ADC(H)) and 3 partitions (ADC(L); intermediate, ADC-ADC(I); ADC(H)) by using the K-means clustering algorithm. An unpaired 2-tailed Student t test was performed for all metrics to determine statistical differences in the means of the benign and malignant pathologies. A statistically significant difference between the mean ADC(L) clusters in benign and malignant pathologies was seen in the 3-cluster models of both readers (P = .03 and .022, respectively) and the 2-cluster model of reader 2 (P = .04), with the other metrics (ADC(H), ADC(I); whole-lesion mean ADC) not revealing any significant differences. ROC curves demonstrated the quantitative differences in mean ADC(H) and ADC(L) in both the 2- and 3-cluster models to be predictive of malignancy (2 clusters: P = .008, area under curve = 0.850; 3 clusters: P = .01, area under curve = 0.825). The K-means clustering algorithm that generates partitions of large datasets may provide a better characterization of neck pathologies and may be of additional benefit in distinguishing benign and malignant neck pathologies compared with whole-lesion mean ADC alone.
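
    The study clusters the ADC values inside each contoured lesion into two (ADC(L), ADC(H)) or three (adding ADC(I)) partitions with K-means. For one-dimensional data this reduces to Lloyd's algorithm on scalars. A minimal sketch follows, assuming the voxel ADC values have already been extracted from the lesion contour; the order-statistic initialization and the example values are hypothetical, and this is not the authors' MATLAB software.

```python
def kmeans_1d(values, k, max_iter=100):
    """Lloyd's algorithm on scalar values; returns the k cluster centres in ascending order."""
    xs = sorted(values)
    # Initialise the centres at evenly spaced order statistics.
    centres = [xs[int((i + 0.5) * len(xs) / k)] for i in range(k)]
    for _ in range(max_iter):
        groups = [[] for _ in range(k)]
        for x in values:
            nearest = min(range(k), key=lambda c: abs(x - centres[c]))
            groups[nearest].append(x)
        # Keep the old centre if a group ends up empty.
        new_centres = [sum(g) / len(g) if g else centres[c] for c, g in enumerate(groups)]
        if new_centres == centres:
            break
        centres = new_centres
    return sorted(centres)
```

    With k=2 the smaller returned centre plays the role of mean ADC(L) and the larger of mean ADC(H); with k=3 the middle value corresponds to ADC(I). These per-lesion partition means, rather than the whole-lesion mean, are what the study compares between benign and malignant groups.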

  7. The Search for Bright Variable Stars in Open Cluster NGC 6819.

    NASA Astrophysics Data System (ADS)

    Talamantes, Antonio; Sandquist, E. L.

    2009-01-01

    During this research period, data were taken on seven nights with the 1-m telescope at Mt. Laguna Observatory for the open cluster NGC 6819. On four of the nights the data were taken using a V-band filter; on the remaining three nights the data were taken using an R-band filter. Photometry was done using the ISIS image subtraction package. Six new variable stars were located using these techniques: one pulsating variable and five detached eclipsing binaries. Of the detached eclipsing binaries, three are near the cluster turnoff and two are in the blue straggler region (one of these has total eclipses). Nine previously known variables (six contact binaries, two detached eclipsing binaries and one near-contact binary) were also studied.

  8. Reconciling Apparent Conflicts between Mitochondrial and Nuclear Phylogenies in African Elephants

    PubMed Central

    Georgiadis, Nicholas J.; David, Victor A.; Zhao, Kai; Stephens, Robert M.; Kolokotronis, Sergios-Orestis; Roca, Alfred L.

    2011-01-01

    Conservation strategies for African elephants would be advanced by resolution of conflicting claims that they comprise one, two, three or four taxonomic groups, and by development of genetic markers that establish more incisively the provenance of confiscated ivory. We addressed these related issues by genotyping 555 elephants from across Africa with microsatellite markers, developing a method to identify those loci most effective at geographic assignment of elephants (or their ivory), and conducting novel analyses of continent-wide datasets of mitochondrial DNA. Results showed that nuclear genetic diversity was partitioned into two clusters, corresponding to African forest elephants (99.5% Cluster-1) and African savanna elephants (99.4% Cluster-2). Hybrid individuals were rare. In a comparison of basal forest “F” and savanna “S” mtDNA clade distributions to nuclear DNA partitions, forest elephant nuclear genotypes occurred only in populations in which S clade mtDNA was absent, suggesting that nuclear partitioning corresponds to the presence or absence of S clade mtDNA. We reanalyzed African elephant mtDNA sequences from 81 locales spanning the continent and discovered that S clade mtDNA was completely absent among elephants at all 30 sampled tropical forest locales. The distribution of savanna nuclear DNA and S clade mtDNA corresponded closely to range boundaries traditionally ascribed to the savanna elephant species based on habitat and morphology. Further, a reanalysis of nuclear genetic assignment results suggested that West African elephants do not comprise a distinct third species. Finally, we show that some DNA markers will be more useful than others for determining the geographic origins of illegal ivory. 
These findings resolve the apparent incongruence between mtDNA and nuclear genetic patterns that has confounded the taxonomy of African elephants, affirm the limitations of using mtDNA patterns to infer elephant systematics or population structure, and strongly support the existence of two elephant species in Africa. PMID:21701575

  9. The Clusters AgeS Experiment (CASE). Variable Stars in the Field of the Globular Cluster NGC 6362

    NASA Astrophysics Data System (ADS)

    Kaluzny, J.; Thompson, I. B.; Rozyczka, M.; Pych, W.; Narloch, W.

    2014-12-01

    The field of the globular cluster NGC 6362 was monitored between 1995 and 2009 in a search for variable stars. BV light curves were obtained for 69 periodic variable stars, including 34 known RR Lyr stars, 10 known objects of other types and 25 newly detected variable stars. Among the latter we identified 18 proper-motion members of the cluster: seven detached eclipsing binaries (DEBs), six SX Phe stars, two W UMa binaries, two spotted red giants, and a very interesting eclipsing binary composed of two red giants, the first example of such a system found in a globular cluster. Five of the DEBs are located in the turnoff region, and the remaining two are redward of the lower main sequence. Eighty-four objects from the central 9×9 arcmin² of the cluster were found in the blue straggler region. Of these, 70 are proper-motion (PM) members of NGC 6362 (including all SX Phe and two W UMa stars), and five are field stars. The remaining nine objects, which lack PM information, are located at the very core of the cluster and are therefore likely genuine blue stragglers.

  10. Clustering of galaxies with f(R) gravity

    NASA Astrophysics Data System (ADS)

    Capozziello, Salvatore; Faizal, Mir; Hameeda, Mir; Pourhassan, Behnam; Salzano, Vincenzo; Upadhyay, Sudhaker

    2018-02-01

    Based on thermodynamics, we discuss the clustering of galaxies in an expanding Universe, assuming that the gravitational interaction is described by the modified Newtonian potential given by f(R) gravity. We compute the corrected N-particle partition function analytically. The corrected partition function leads to more exact equations of state for the system. Assuming that the system is in quasi-equilibrium, we derive the exact distribution function that exhibits the f(R) correction. Moreover, we evaluate the critical temperature and discuss the stability of the system. We also examine the effect of the f(R) correction on the power-law behaviour of the particle-particle correlation function. To check the feasibility of an f(R) gravity approach to the clustering of galaxies, we compare our results with an observational galaxy cluster catalogue.

  11. Clustering of financial time series

    NASA Astrophysics Data System (ADS)

    D'Urso, Pierpaolo; Cappelli, Carmela; Di Lallo, Dario; Massari, Riccardo

    2013-05-01

    This paper addresses the classification of financial time series in a fuzzy framework, proposing two fuzzy clustering models, both based on GARCH models. Because of their peculiar features, financial time series generally require suitable distance measures for clustering. To this end, the first fuzzy clustering model exploits the autoregressive representation of GARCH models and employs the classical autoregressive metric within a partitioning around medoids algorithm. The second fuzzy clustering model, also based on partitioning around medoids, uses the Caiado distance, a Mahalanobis-like distance based on estimated GARCH parameters and their covariances, which takes into account information about the volatility structure of the time series. To illustrate the merits of the proposed fuzzy approaches, an application to classifying 29 time series of Euro exchange rates against international currencies is presented and discussed, and the fuzzy models are compared with their crisp versions.
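    The membership step of a fuzzy partitioning-around-medoids model can be sketched as follows. This is a generic fuzzy c-medoids membership update, assuming a precomputed dissimilarity matrix (for example, the autoregressive metric taken as the Euclidean distance between truncated AR coefficient vectors that are assumed to be already estimated). The names `ar_distance`, `fuzzy_memberships` and `objective` are hypothetical, and neither GARCH estimation nor the Caiado distance is implemented.

```python
import math


def ar_distance(pi_a, pi_b):
    """Hypothetical helper: Euclidean distance between two truncated AR
    coefficient vectors (coefficients are assumed to be pre-estimated)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(pi_a, pi_b)))


def fuzzy_memberships(dist, medoids, m=2.0):
    """Hypothetical helper: fuzzy membership of each object in each medoid's cluster.

    dist: square dissimilarity matrix (list of lists); m > 1 is the fuzziness exponent.
    Each row of the result sums to 1.
    """
    exponent = 1.0 / (m - 1.0)
    u = []
    for i in range(len(dist)):
        d = [dist[i][c] for c in medoids]
        if 0.0 in d:
            # The object coincides with a medoid: crisp membership in that cluster.
            k0 = d.index(0.0)
            u.append([1.0 if k == k0 else 0.0 for k in range(len(medoids))])
            continue
        u.append([1.0 / sum((d[k] / d[j]) ** exponent for j in range(len(d)))
                  for k in range(len(d))])
    return u


def objective(dist, medoids, u, m=2.0):
    """Fuzzy PAM objective: sum of u ** m times the dissimilarity to each medoid."""
    return sum(u[i][k] ** m * dist[i][c]
               for i in range(len(dist)) for k, c in enumerate(medoids))
```

    A full fuzzy PAM loop would alternate this membership update with a search for medoids that lower the objective.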

  12. A Hierarchical Bayesian Procedure for Two-Mode Cluster Analysis

    ERIC Educational Resources Information Center

    DeSarbo, Wayne S.; Fong, Duncan K. H.; Liechty, John; Saxton, M. Kim

    2004-01-01

    This manuscript introduces a new Bayesian finite mixture methodology for the joint clustering of row and column stimuli/objects associated with two-mode asymmetric proximity, dominance, or profile data. That is, common clusters are derived which partition both the row and column stimuli/objects simultaneously into the same derived set of clusters.…

  13. Generalized fuzzy C-means clustering algorithm with improved fuzzy partitions.

    PubMed

    Zhu, Lin; Chung, Fu-Lai; Wang, Shitong

    2009-06-01

    The fuzziness index m has an important influence on the results of fuzzy clustering algorithms, and it should not be forced to the usual value m = 2. Because of its distinctive features in applications and its limitation to m = 2, a recent advance in fuzzy clustering called fuzzy c-means clustering with improved fuzzy partitions (IFP-FCM) is extended in this paper, and a generalized algorithm called GIFP-FCM is proposed for more effective clustering. By introducing a novel membership constraint function, a new objective function is constructed, from which GIFP-FCM clustering is derived. The robustness and convergence of the proposed algorithm are analyzed from the viewpoints of the L(p) norm distance measure and competitive learning. Furthermore, the classical fuzzy c-means algorithm (FCM) and IFP-FCM are two special cases of the proposed algorithm. Several experimental results, including an application to noisy image texture segmentation, are presented to demonstrate its average advantage over FCM and IFP-FCM in both clustering and robustness.
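    To make the role of the fuzziness index concrete, here is a minimal sketch of the classical FCM baseline (not GIFP-FCM, whose membership constraint function is not given here), assuming Euclidean distance and points stored as tuples; `fcm` is a hypothetical helper name. Larger m makes the memberships softer; as m approaches 1 they approach crisp k-means assignments.

```python
import random


def fcm(points, c, m=2.0, iters=100, seed=0):
    """Classical fuzzy c-means (hypothetical helper, illustrative sketch).

    points: list of equal-length tuples; c: number of clusters; m > 1: fuzziness index.
    Returns (centers, memberships); each membership row sums to 1.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, each row normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        total = sum(row)
        u.append([x / total for x in row])
    centers = []
    for _ in range(iters):
        # Center update: mean of the points weighted by u ** m.
        centers = []
        for k in range(c):
            w = [u[i][k] ** m for i in range(n)]
            sw = sum(w)
            centers.append(tuple(sum(w[i] * points[i][d] for i in range(n)) / sw
                                 for d in range(dim)))
        # Membership update from squared Euclidean distances to the centers.
        for i in range(n):
            d2 = [sum((points[i][d] - centers[k][d]) ** 2 for d in range(dim))
                  for k in range(c)]
            if min(d2) == 0.0:
                # The point sits exactly on a center: crisp membership there.
                k0 = d2.index(0.0)
                u[i] = [1.0 if k == k0 else 0.0 for k in range(c)]
            else:
                u[i] = [1.0 / sum((d2[k] / d2[j]) ** (1.0 / (m - 1.0)) for j in range(c))
                        for k in range(c)]
    return centers, u
```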

  14. Orbits of Four Very Massive Binaries in the R136 Cluster

    NASA Astrophysics Data System (ADS)

    Penny, L. R.; Massey, P.; Vukovich, J.

    2001-12-01

    We present radial velocity and photometry for four early-type, massive double-lined spectroscopic binaries in the R136 cluster. Three of these systems are eclipsing, allowing orbital inclinations to be determined. One of these systems, R136-38 (O3 V + O6 V), has one of the highest masses ever measured, 57 M☉, for the primary. Comparison of our masses with those derived from standard evolutionary tracks shows excellent agreement. We also identify five other light variables in the R136 cluster worthy of follow-up study.

  15. Genetic variability of Brazilian isolates of Alternaria alternata detected by AFLP and RAPD techniques

    PubMed Central

    Dini-Andreote, Francisco; Pietrobon, Vivian Cristina; Andreote, Fernando Dini; Romão, Aline Silva; Spósito, Marcel Bellato; Araújo, Welington Luiz

    2009-01-01

    Alternaria brown spot (ABS) is a disease of tangerine plants and their hybrids caused by the fungus Alternaria alternata f. sp. citri, which has been found in Brazil since 2001. Because of its recent occurrence in Brazilian orchards, the epidemiology and genetic variability of this pathogen remain to be addressed. Here we present a survey of the genetic variability of this fungus, characterizing twenty-four pathogenic isolates of A. alternata f. sp. citri from citrus plants and four endophytic isolates from mango (one Alternaria tenuissima and three Alternaria arborescens). Two molecular markers, Random Amplified Polymorphic DNA (RAPD) and Amplified Fragment Length Polymorphism (AFLP), revealed that the isolates cluster into distinct groups when the fingerprints were analyzed by Principal Component Analysis (PCA). Although AFLP gave a better assessment of genetic variability, no significant changes in cluster composition were observed; only isolates LRS 39/3 and 25M shifted slightly in the PCA plots. Furthermore, in both analyses only the isolates from lemon plants clustered together, whereas isolates from other hosts or plant tissues did not. In summary, both RAPD and AFLP analyses efficiently detected genetic variability within the population of the pathogenic fungus Alternaria spp., providing information on the genetic variability of this species as a basis for further studies aimed at disease control. PMID:24031413

  16. Tobacco, Marijuana, and Alcohol Use in University Students: A Cluster Analysis

    PubMed Central

    Primack, Brian A.; Kim, Kevin H.; Shensa, Ariel; Sidani, Jaime E.; Barnett, Tracey E.; Switzer, Galen E.

    2012-01-01

    Objective Segmentation of populations may facilitate development of targeted substance abuse prevention programs. We aimed to partition a national sample of university students according to profiles based on substance use. Participants We used 2008–2009 data from the National College Health Assessment from the American College Health Association. Our sample consisted of 111,245 individuals from 158 institutions. Method We partitioned the sample using cluster analysis according to current substance use behaviors. We examined the association of cluster membership with individual and institutional characteristics. Results Cluster analysis yielded six distinct clusters. Three individual factors—gender, year in school, and fraternity/sorority membership—were the most strongly associated with cluster membership. Conclusions In a large sample of university students, we were able to identify six distinct patterns of substance abuse. It may be valuable to target specific populations of college-aged substance users based on individual factors. However, comprehensive intervention will require a multifaceted approach. PMID:22686360

  17. Cluster Analysis to Identify Possible Subgroups in Tinnitus Patients.

    PubMed

    van den Berge, Minke J C; Free, Rolien H; Arnold, Rosemarie; de Kleine, Emile; Hofman, Rutger; van Dijk, J Marc C; van Dijk, Pim

    2017-01-01

    In tinnitus treatment, there is a tendency to shift from a "one size fits all" approach to a more individual, patient-tailored one. Insight into the heterogeneity of the tinnitus spectrum might improve the management of tinnitus patients in terms of choice of treatment and identification of patients with severe mental distress. The goal of this study was to identify subgroups in a large group of tinnitus patients. Data were collected from patients with severe tinnitus complaints visiting our tertiary referral tinnitus care group at the University Medical Center Groningen. Patient-reported and physician-reported variables were collected during their visit to our clinic. Cluster analyses were used to characterize subgroups. To select the variables entered into the cluster analysis, two approaches were used: (1) variable reduction with principal component analysis and (2) variable selection based on expert opinion. Various variables of 1,783 tinnitus patients were included in the analyses. Cluster analysis (1) included 976 patients and resulted in a four-cluster solution. The effect of external influences was the most discriminative between the groups, or clusters, of patients. The "silhouette measure" of the cluster outcome was low (0.2), indicating "no substantial" cluster structure. Cluster analysis (2) included 761 patients and resulted in a three-cluster solution, comparable to the first analysis. Again, "no substantial" cluster structure was found (0.2). The two cluster analyses on a large database of tinnitus patients revealed that clusters of patients are mostly formed by differences in how their tinnitus responds to external influences. However, both cluster outcomes based on this dataset showed poor stability, suggesting that our tinnitus population comprises a continuum rather than a number of clearly defined subgroups.
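    The silhouette measure cited above can be computed from a distance matrix as sketched below. This assumes the common Kaufman and Rousseeuw definition s(i) = (b(i) - a(i)) / max(a(i), b(i)), averaged over all objects, with singleton clusters scored 0; `silhouette` is a hypothetical helper, not the software used in the study. Under that convention, averages of about 0.25 or less, such as the reported 0.2, are usually read as no substantial structure.

```python
def silhouette(dist, labels):
    """Mean silhouette width (hypothetical helper, illustrative sketch).

    dist: square distance matrix; labels: list of cluster labels (at least 2 clusters).
    """
    n = len(labels)
    clusters = set(labels)
    total = 0.0
    for i in range(n):
        own = [dist[i][j] for j in range(n) if j != i and labels[j] == labels[i]]
        if not own:
            continue  # singleton cluster: s(i) = 0 by convention
        a = sum(own) / len(own)  # mean distance to the rest of its own cluster
        # b: smallest mean distance to the members of any other cluster
        b = min(sum(dist[i][j] for j in range(n) if labels[j] == c) / labels.count(c)
                for c in clusters if c != labels[i])
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / n
```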

  18. Criteria for genuine N-partite continuous-variable entanglement and Einstein-Podolsky-Rosen steering

    NASA Astrophysics Data System (ADS)

    Teh, R. Y.; Reid, M. D.

    2014-12-01

    Following previous work, we distinguish between genuine N-partite entanglement and full N-partite inseparability. Accordingly, we derive criteria to detect genuine multipartite entanglement using continuous-variable (position and momentum) measurements. Our criteria are similar to, but different from, those based on the van Loock-Furusawa inequalities, which detect full N-partite inseparability. We explain how the criteria can be used to detect the genuine N-partite entanglement of continuous-variable states generated from squeezed and vacuum state inputs, including the continuous-variable Greenberger-Horne-Zeilinger state, with explicit predictions for up to N = 9. This makes our work accessible to experiment. For N = 3, we also present criteria for tripartite Einstein-Podolsky-Rosen (EPR) steering. These criteria provide a means to demonstrate a genuine three-party EPR paradox, in which any single party is steerable by the remaining two parties.

  19. Optimal Partitioning of a Data Set Based on the "p"-Median Model

    ERIC Educational Resources Information Center

    Brusco, Michael J.; Kohn, Hans-Friedrich

    2008-01-01

    Although the "K"-means algorithm for minimizing the within-cluster sums of squared deviations from cluster centroids is perhaps the most common method for applied cluster analyses, a variety of other criteria are available. The "p"-median model is an especially well-studied clustering problem that requires the selection of "p" objects to serve as…

  20. Variable stars in the fields of Galactic open clusters detected in the VVV survey

    NASA Astrophysics Data System (ADS)

    Palma, T.; Dékany, I.; Clariá, J. J.; Minniti, D.; Alonso-García, J. A.; Ramírez Alegría, S.; Bonatto, C.

    2016-08-01

    The present project is a massive search for variable stars in the fields of open clusters projected on highly reddened regions of the Galactic disk and bulge. The search is being performed using near-infrared observations in three photometric bands from the VISTA Variables in the Vía Láctea (VVV) survey. We present the first results obtained for four open clusters projected on the Galactic bulge. The 182 new variables discovered in the current work are classified on the basis of their light curves and their locations in the corresponding color-magnitude diagrams. Among the newly discovered variable stars, Cepheids, RR Lyrae stars, δ Scuti stars, eclipsing binaries and other types have been found.

  1. Methods of Conceptual Clustering and their Relation to Numerical Taxonomy.

    DTIC Science & Technology

    1985-07-22

    the conceptual clustering problem is to first solve the aggregation problem, and then the characterization problem. In machine learning, the...clusterings by first generating some number of possible clusterings. For each clustering generated, one calls a learning from examples subroutine, which...class 1 from class 2, and vice versa, only the first combination implies a partition over the set of theoretically possible objects. The first

  2. Visual hallucinatory syndromes and the anatomy of the visual brain.

    PubMed

    Santhouse, A M; Howard, R J; ffytche, D H

    2000-10-01

    We have set out to identify phenomenological correlates of cerebral functional architecture within Charles Bonnet syndrome (CBS) hallucinations by looking for associations between specific hallucination categories. Thirty-four CBS patients were examined with a structured interview/questionnaire to establish the presence of 28 different pathological visual experiences. Associations between categories of pathological experience were investigated by an exploratory factor analysis. Twelve of the pathological experiences partitioned into three segregated syndromic clusters. The first cluster consisted of hallucinations of extended landscape scenes and small figures in costumes with hats; the second, hallucinations of grotesque, disembodied and distorted faces with prominent eyes and teeth; and the third, visual perseveration and delayed palinopsia. The three visual psycho-syndromes mirror the segregation of hierarchical visual pathways into streams and suggest a novel theoretical framework for future research into the pathophysiology of neuropsychiatric syndromes.

  3. The "p"-Median Model as a Tool for Clustering Psychological Data

    ERIC Educational Resources Information Center

    Kohn, Hans-Friedrich; Steinley, Douglas; Brusco, Michael J.

    2010-01-01

    The "p"-median clustering model represents a combinatorial approach to partition data sets into disjoint, nonhierarchical groups. Object classes are constructed around "exemplars", that is, manifest objects in the data set, with the remaining instances assigned to their closest cluster centers. Effective, state-of-the-art implementations of…
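    A brute-force sketch shows how exemplars and closest-exemplar assignment fit together in the p-median model: every subset of p objects is tried as the exemplar set, and the one minimizing the summed distance of all objects to their nearest exemplar is kept. `p_median` is a hypothetical helper; exhaustive enumeration is only feasible for very small data sets, which is why specialized implementations are needed in practice.

```python
from itertools import combinations


def p_median(dist, p):
    """Exact p-median by enumeration (hypothetical helper; small n only).

    dist: square distance matrix. Returns (exemplars, labels, cost), where
    labels[i] is the index of the exemplar nearest to object i.
    """
    n = len(dist)
    best_cost, best = float("inf"), None
    for exemplars in combinations(range(n), p):
        # Each object contributes its distance to the closest exemplar.
        cost = sum(min(dist[i][e] for e in exemplars) for i in range(n))
        if cost < best_cost:
            best_cost, best = cost, exemplars
    labels = [min(best, key=lambda e: dist[i][e]) for i in range(n)]
    return best, labels, best_cost
```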

  4. Assessing the seasonality and uncertainty in evapotranspiration partitioning using a tracer-aided model

    NASA Astrophysics Data System (ADS)

    Smith, A. A.; Welch, C.; Stadnyk, T. A.

    2018-05-01

    Evapotranspiration (ET) partitioning is a growing field of research in hydrology due to the significant fraction of watershed water loss it represents. The use of tracer-aided models has improved understanding of watershed processes, and has significant potential for identifying time-variable partitioning of evaporation (E) from ET. A tracer-aided model was used to establish a time series of E/ET using differences in riverine δ¹⁸O and δ²H in four northern Canadian watersheds (lower Nelson River, Manitoba, Canada). On average, E/ET follows a parabolic trend ranging from 0.7 in the spring and autumn to 0.15 (three watersheds) and 0.5 (fourth watershed) during the summer growing season. In the fourth watershed, wetlands and shrubs dominate land cover. During the summer, E/ET ratios are highest in wetlands for three watersheds (10% higher than unsaturated soil storage), while lowest for the fourth watershed (20% lower than unsaturated soil storage). Uncertainty of the ET partition parameters is strongly influenced by storage volumes, with large storage volumes increasing partition uncertainty. In addition, higher simulated soil moisture increases estimated E/ET. Although unsaturated soil storage accounts for larger surface areas in these watersheds than wetlands, riverine isotopic composition is more strongly affected by E from wetlands. Comparisons of E/ET to measurement-intensive studies in similar ecoregions indicate that the methodology proposed here adequately partitions ET.

  5. Reproducibility of Cognitive Profiles in Psychosis Using Cluster Analysis.

    PubMed

    Lewandowski, Kathryn E; Baker, Justin T; McCarthy, Julie M; Norris, Lesley A; Öngür, Dost

    2018-04-01

    Cognitive dysfunction is a core symptom dimension that cuts across the psychoses. Recent findings support classification of patients along the cognitive dimension using cluster analysis; however, data-derived groupings may be highly determined by sampling characteristics and the measures used to derive the clusters, and so their interpretability must be established. We examined cognitive clusters in a cross-diagnostic sample of patients with psychosis and associations with clinical and functional outcomes. We then compared our findings to a previous report of cognitive clusters in a separate sample using a different cognitive battery. Participants with affective or non-affective psychosis (n=120) and healthy controls (n=31) were administered the MATRICS Consensus Cognitive Battery, and clinical and community functioning assessments. Cluster analyses were performed on cognitive variables, and clusters were compared on demographic, cognitive, and clinical measures. Results were compared to findings from our previous report. A four-cluster solution provided a good fit to the data; profiles included a neuropsychologically normal cluster, a globally impaired cluster, and two clusters of mixed profiles. Cognitive burden was associated with symptom severity and poorer community functioning. The patterns of cognitive performance by cluster were highly consistent with our previous findings. We found evidence of four cognitive subgroups of patients with psychosis, with cognitive profiles that map closely to those produced in our previous work. Clusters were associated with clinical and community variables and a measure of premorbid functioning, suggesting that they reflect meaningful groupings: replicable, and related to clinical presentation and functional outcomes. (JINS, 2018, 24, 382-390).

  6. Spatial coding-based approach for partitioning big spatial data in Hadoop

    NASA Astrophysics Data System (ADS)

    Yao, Xiaochuang; Mokbel, Mohamed F.; Alarabi, Louai; Eldawy, Ahmed; Yang, Jianyu; Yun, Wenju; Li, Lin; Ye, Sijing; Zhu, Dehai

    2017-09-01

    Spatial data partitioning (SDP) plays an important role in distributed storage and parallel computing for spatial data. However, the skewed distribution of spatial data and the varying volume of spatial vector objects make it a significant challenge to ensure both optimal spatial-operation performance and data balance in the cluster. To tackle this problem, we propose a spatial coding-based approach for partitioning big spatial data in Hadoop. First, the approach compresses the whole big spatial dataset based on a spatial coding matrix to create a sensing information set (SIS), including spatial code, size, count and other information. The SIS is then used to build a spatial partitioning matrix, which is finally used to split all spatial objects into different partitions in the cluster. With this approach, neighbouring spatial objects can be partitioned into the same block, while data skew in the Hadoop distributed file system (HDFS) is minimized. In a case study, the approach is compared against random sampling-based partitioning using three measures: spatial index quality, data skew in HDFS, and range query performance. The experimental results show that our spatial coding-based method can improve both the query performance of big spatial data and the data balance in HDFS. We implemented and deployed this approach in Hadoop, and it can also efficiently support any other distributed big spatial data system.
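    The idea of ordering objects by a spatial code so that neighbours land in the same block can be illustrated with a Z-order (Morton) curve. This is a swapped-in, well-known space-filling-curve technique, not the paper's spatial coding matrix or SIS; `morton_code` and `partition_by_code` are hypothetical helpers that assume non-negative integer grid coordinates and balance partitions by object count only (not by object size).

```python
def morton_code(x, y, bits=16):
    """Interleave the bits of non-negative integers x and y (Z-order / Morton code)."""
    code = 0
    for b in range(bits):
        code |= ((x >> b) & 1) << (2 * b)      # bits of x go to even positions
        code |= ((y >> b) & 1) << (2 * b + 1)  # bits of y go to odd positions
    return code


def partition_by_code(points, num_partitions, bits=16):
    """Hypothetical helper: sort points along the Z-order curve, then cut the
    sorted sequence into consecutive, roughly equal-sized blocks, so that
    spatially close points tend to share a block."""
    ordered = sorted(points, key=lambda pt: morton_code(pt[0], pt[1], bits))
    size = max(1, -(-len(ordered) // num_partitions))  # ceiling division
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]
```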

  7. Procedures to handle inventory cluster plots that straddle two or more conditions

    Treesearch

    Jerold T. Hahn; Colin D. MacLean; Stanford L. Arner; William A. Bechtold

    1995-01-01

    We review the relative merits and field procedures for four basic plot designs to handle forest inventory plots that straddle two or more conditions, given that subplots will not be moved. A cluster design is recommended that combines fixed-area subplots and variable-radius plot (VRP) sampling. Each subplot in a cluster consists of a large fixed-area subplot for...

  8. Clustering Financial Time Series by Network Community Analysis

    NASA Astrophysics Data System (ADS)

    Piccardi, Carlo; Calatroni, Lisa; Bertoni, Fabio

    In this paper, we describe a method for clustering financial time series based on community analysis, a recently developed approach for partitioning the nodes of a network (graph). A network with N nodes is associated with the set of N time series. The weight of the link (i, j), which quantifies the similarity between the two corresponding time series, is defined according to a metric based on symbolic time series analysis, which has recently proved effective in the context of financial time series. Searching for network communities then allows one to identify groups of nodes (and hence time series) with strong similarity. A quantitative assessment of the significance of the obtained partition is also provided. The method is applied to two distinct case studies concerning the US and Italian stock exchanges, respectively. In the US case, the stability of the partitions over time is also thoroughly investigated. The results compare favorably with those obtained with the standard tools typically used for clustering financial time series, such as the minimal spanning tree and the hierarchical tree.
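    The abstract does not say which quality function scores a partition; a common choice for community detection on weighted graphs is Newman's modularity, Q = (1/2W) Σ_ij [A_ij - k_i k_j / 2W] δ(c_i, c_j), where k_i is the weighted degree of node i and 2W is the sum of all entries of A. The sketch below assumes that measure and a symmetric similarity matrix with a zero diagonal; `modularity` is a hypothetical helper, and the symbolic time series metric itself is not implemented.

```python
def modularity(weights, labels):
    """Newman modularity of a partition of a weighted, undirected graph.

    Hypothetical helper. weights: symmetric similarity matrix with zero
    diagonal; labels: community label per node (one node per time series).
    """
    n = len(weights)
    strength = [sum(row) for row in weights]  # weighted degree k_i
    two_w = sum(strength)                     # 2W: each edge weight counted twice
    q = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:
                # observed weight minus the weight expected under the null model
                q += weights[i][j] - strength[i] * strength[j] / two_w
    return q / two_w
```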

  9. Application of hybrid clustering using parallel k-means algorithm and DIANA algorithm

    NASA Astrophysics Data System (ADS)

    Umam, Khoirul; Bustamam, Alhadi; Lestari, Dian

    2017-03-01

    DNA is one of the carriers of genetic information in living organisms. Encoding, sequencing, and clustering DNA sequences have become key routine tasks in molecular biology, particularly in bioinformatics applications. There are two types of clustering: hierarchical clustering and partitioning clustering. In this paper, we combine the two types, K-Means (partitioning clustering) and DIANA (hierarchical clustering), into what we call hybrid clustering. Hybrid clustering using the Parallel K-Means algorithm and the DIANA algorithm is applied to cluster DNA sequences of Human Papillomavirus (HPV). The clustering process starts by collecting HPV DNA sequences from NCBI (National Center for Biotechnology Information) and then extracting characteristics of the DNA sequences. The extracted characteristics are stored in a matrix, which is normalized using Min-Max normalization, and genetic distances are calculated using the Euclidean distance. The hybrid clustering is then applied using implementations of the Parallel K-Means and DIANA algorithms. The aim of hybrid clustering is to obtain better clusters. To validate the resulting clusters and obtain the optimum number of clusters, we use the Davies-Bouldin Index (DBI). In this study, Parallel K-Means clustering grouped the data into 5 clusters with a minimal DBI value of 0.8741, and hybrid clustering grouped the data into 13 sub-clusters with minimal DBI values of 0.8216, 0.6845, 0.3331, 0.1994 and 0.3952. The DBI values of hybrid clustering are lower than the DBI value of Parallel K-Means clustering performed as a single first stage. This means that hybrid clustering gives better results for clustering HPV DNA sequences than Parallel K-Means clustering alone.
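    The normalization and validation steps described above can be sketched as follows: Min-Max scaling of each feature column, Euclidean distance, and the Davies-Bouldin index (the average over clusters of the worst ratio of summed within-cluster scatter to centroid separation; lower is better). The helper names are hypothetical, scatter is assumed to be the mean distance of members to their centroid, and no K-Means or DIANA step is included.

```python
import math


def min_max(rows):
    """Scale each column to [0, 1]; constant columns map to 0 (hypothetical helper)."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(r, lo, hi)]
            for r in rows]


def euclid(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def davies_bouldin(rows, labels):
    """Davies-Bouldin index of a labelled partition (needs at least 2 clusters)."""
    clusters = sorted(set(labels))
    cent, scatter = {}, {}
    for c in clusters:
        members = [r for r, l in zip(rows, labels) if l == c]
        cent[c] = [sum(col) / len(members) for col in zip(*members)]
        # scatter: mean distance of the members to their centroid
        scatter[c] = sum(euclid(r, cent[c]) for r in members) / len(members)
    total = 0.0
    for c in clusters:
        # worst (largest) similarity ratio against any other cluster
        total += max((scatter[c] + scatter[o]) / euclid(cent[c], cent[o])
                     for o in clusters if o != c)
    return total / len(clusters)
```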

  10. The genetic structure of a relict population of wood frogs

    USGS Publications Warehouse

    Scherer, Rick; Muths, Erin; Noon, Barry; Oyler-McCance, Sara

    2012-01-01

    Habitat fragmentation and the associated reduction in connectivity between habitat patches are commonly cited causes of genetic differentiation and reduced genetic variation in animal populations. We used eight microsatellite markers to investigate genetic structure and levels of genetic diversity in a relict population of wood frogs (Lithobates sylvatica) in Rocky Mountain National Park, Colorado, where recent disturbances have altered hydrologic processes and fragmented amphibian habitat. We also estimated migration rates among subpopulations, tested for a pattern of isolation-by-distance, and looked for evidence of a recent population bottleneck. The results from the clustering algorithm in Program STRUCTURE indicated the population is partitioned into two genetic clusters (subpopulations), and this result was further supported by factorial component analysis. In addition, an estimate of FST (FST = 0.0675, P < 0.0001) supported the genetic differentiation of the two clusters. Estimates of migration rates among the two subpopulations were low, as were estimates of genetic variability. Conservation of the population of wood frogs may be improved by increasing the spatial distribution of the population and improving gene flow between the subpopulations. Construction or restoration of wetlands in the landscape between the clusters has the potential to address each of these objectives.

  11. Ten-year performance of ponderosa pine provenances in the Great Plains of North America

    Treesearch

    Ralph A. Read

    1983-01-01

    A cluster and discriminant analysis based on nine of the best plantations partitioned the seed provenance populations into six geographic clusters according to their consistency of performance in the plantations. The Northcentral Nebraska cluster of three provenances performed consistently well above average in all plantations. These easternmost...

  12. Exemplar-Based Clustering via Simulated Annealing

    ERIC Educational Resources Information Center

    Brusco, Michael J.; Kohn, Hans-Friedrich

    2009-01-01

    Several authors have touted the p-median model as a plausible alternative to within-cluster sums of squares (i.e., K-means) partitioning. Purported advantages of the p-median model include the provision of "exemplars" as cluster centers, robustness with respect to outliers, and the accommodation of a diverse range of similarity data. We developed…
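    One way to apply simulated annealing to the p-median model is sketched below, assuming a simple swap neighbourhood (replace one exemplar by a random non-exemplar) and geometric cooling. The helper names, starting temperature, cooling rate and step count are illustrative assumptions, not the authors' settings.

```python
import math
import random


def pmedian_cost(dist, exemplars):
    """Sum of distances from every object to its closest exemplar."""
    return sum(min(dist[i][e] for e in exemplars) for i in range(len(dist)))


def anneal_pmedian(dist, p, t0=1.0, cooling=0.95, steps=2000, seed=0):
    """Simulated annealing for the p-median model (hypothetical helper; needs p < n)."""
    rng = random.Random(seed)
    n = len(dist)
    current = rng.sample(range(n), p)
    cost = pmedian_cost(dist, current)
    best, best_cost = list(current), cost
    t = t0
    for _ in range(steps):
        # Swap move: replace one exemplar with a randomly chosen non-exemplar.
        candidate = list(current)
        candidate[rng.randrange(p)] = rng.choice([j for j in range(n) if j not in current])
        new_cost = pmedian_cost(dist, candidate)
        delta = new_cost - cost
        # Always accept improvements; accept worse moves with probability exp(-delta / t).
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current, cost = candidate, new_cost
            if cost < best_cost:
                best, best_cost = list(current), cost
        t = max(t * cooling, 1e-9)  # geometric cooling with a small floor
    return sorted(best), best_cost
```

    Keeping the best solution seen, rather than the final state, guards against the search drifting away from a good configuration while the temperature is still high.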

  13. Application of strainrange partitioning to the prediction of creep-fatigue lives of AISI types 304 and 316 stainless steel

    NASA Technical Reports Server (NTRS)

    Saltsman, J. F.; Halford, G. R.

    1976-01-01

    To demonstrate the predictive capabilities of the method of Strainrange Partitioning, published high-temperature, low-cycle creep-fatigue test results on AISI Types 304 and 316 stainless steel were analyzed, and calculated cyclic lives were compared with observed lives. Predicted lives agreed with observed lives within a factor of two for 76 percent, a factor of three for 93 percent, and a factor of four for 98 percent of the laboratory tests analyzed. Agreement between observed and predicted lives is judged satisfactory, considering that the data involve a number of variables (two alloys, several heats and heat treatments, a range of temperatures, different testing techniques, etc.) that are not directly accounted for in the calculations.
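    The factor-of-N agreement statistic quoted above can be reproduced for any set of predictions: a test counts as a hit when the larger of predicted/observed and observed/predicted is at most N. The sketch assumes positive lives; `fraction_within_factor` is a hypothetical helper, and the numbers in the check below are made up, not data from the study.

```python
def fraction_within_factor(predicted, observed, factor):
    """Share of tests whose predicted life is within `factor` of the observed life.

    Hypothetical helper; assumes all lives are positive numbers.
    """
    hits = sum(1 for p, o in zip(predicted, observed) if max(p / o, o / p) <= factor)
    return hits / len(observed)
```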

  14. Electronic and geometric structures of Au30 clusters: a network of 2e-superatom Au cores protected by tridentate protecting motifs with μ3-S

    NASA Astrophysics Data System (ADS)

    Tian, Zhimei; Cheng, Longjiu

    2015-12-01

    Density functional theory calculations have been performed to study the experimentally synthesized Au30S(SR)18 and two related clusters, Au30(SR)18 and Au30S2(SR)18. The patterns of thiolate ligands on the gold cores of the three thiolate-protected Au30 nanoclusters are based on the "divide and protect" concept. A novel extended protecting motif with a $\mu_3$-S, S(Au2(SR)2)2AuSR, is discovered and termed the tridentate protecting motif. The Au cores of the Au30S(SR)18, Au30(SR)18 and Au30S2(SR)18 clusters are Au17, Au20 and Au14, respectively. The superatom-network (SAN) model and the superatom complex (SAC) model are used to explain the chemical bonding patterns, which are verified by chemical bonding analysis based on the adaptive natural density partitioning (AdNDP) method and by aromaticity analysis based on the nucleus-independent chemical shift (NICS) method. The Au17 core of the Au30S(SR)18 cluster can be viewed as a SAN of one Au6 superatom and four Au4 superatoms. The shape of the Au6 core is identical to that revealed in the recently synthesized Au18(SR)14 cluster. The Au20 core of the Au30(SR)18 cluster can be viewed as a SAN of two Au6 superatoms and four Au4 superatoms. The Au14 core of Au30S2(SR)18 can be regarded as a SAN of two pairs of vertex-sharing Au4 superatoms. Meanwhile, the Au14 core is an 8e superatom with a $1S^2 1P^6$ configuration. Our work may aid understanding and give new insights into the chemical synthesis of thiolate-protected Au clusters.
Electronic supplementary information (ESI) available: The AdNDP localized natural bonding orbitals of the valence shells of the Au30S(SH)18 cluster. IR spectra, absorption spectra and coordinates of Au30S(SCH3)18, Au30(SCH3)18 and Au30S2(SCH3)18 clusters. See DOI: 10.1039/c5nr05020k
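The 8e count quoted for the Au14 core can be cross-checked with the commonly used superatom-complex electron-counting rule, $n^* = N v_A - M - z$. The sketch below assumes one valence electron per Au atom, one electron withdrawn per thiolate (SR), two withdrawn per bridging sulfide (S), and a cluster charge $z$; these counting conventions are our assumption rather than something stated in the abstract.

```python
# Superatom-complex electron count: n* = N*v_A - M - z.
# Assumptions (ours, not the abstract's): v_A = 1 per Au atom, each SR ligand
# withdraws one electron, each bridging S withdraws two, z is the net charge.


def superatom_electrons(n_au: int, n_sr: int, n_s: int = 0, charge: int = 0) -> int:
    """Return the delocalized valence-electron count n* of an Au cluster."""
    withdrawn = n_sr + 2 * n_s  # electrons localized in Au-S bonds
    return n_au - withdrawn - charge


clusters = {
    "Au30S(SR)18": superatom_electrons(30, 18, n_s=1),
    "Au30(SR)18": superatom_electrons(30, 18),
    "Au30S2(SR)18": superatom_electrons(30, 18, n_s=2),
}
for name, n_star in clusters.items():
    print(f"{name}: {n_star} delocalized electrons")
```

Under these assumptions Au30S2(SR)18 comes out at 8 electrons, matching the 8e superatom described above; the totals for the other two clusters (10 and 12) are simply what the same rule gives, not figures reported in the abstract.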

  15. Overlapping clusters for distributed computation.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mirrokni, Vahab; Andersen, Reid; Gleich, David F.

    2010-11-01

    Scalable, distributed algorithms must address communication problems. We investigate overlapping clusters, or vertex partitions that intersect, for graph computations. This setup stores more of the graph than required but affords the ease of implementation of vertex-partitioned algorithms. Our hope is that this technique allows us to reduce communication in a computation on a distributed graph. The motivation above draws on recent work in communication-avoiding algorithms. Mohiyuddin et al. (SC09) design a matrix-powers kernel that gives rise to an overlapping partition. Fritzsche et al. (CSC2009) develop an overlapping clustering for a Schwarz method. Both techniques extend an initial partitioning with overlap. Our procedure generates overlap directly. Indeed, Schwarz methods are commonly used to capitalize on overlap. Elsewhere, overlapping communities (Ahn et al., Nature 2009; Mishra et al., WAW2007) are now a popular model of structure in social networks. These have long been studied in statistics (Cole and Wishart, CompJ 1970). We present two types of results: (i) an estimated swapping probability $\rho_\infty$; and (ii) the communication volume of a parallel PageRank solution (link-following $\alpha = 0.85$) using an additive Schwarz method. The volume ratio is the amount of extra storage for the overlap (2 means we store the graph twice). In our results, as the volume ratio increases, the swapping probability and the PageRank communication volume decrease.
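The volume ratio can be made concrete with a small sketch. For illustration only, we assume storage is measured as graph volume (the sum of vertex degrees) and that every cluster stores each of its vertices with its full adjacency list; the abstract does not specify the exact accounting.

```python
from collections import Counter


def volume_ratio(edges, clusters):
    """Storage of (possibly overlapping) clusters relative to storing the graph once.

    Assumption: storage is graph volume (sum of degrees), and each cluster
    stores every vertex it contains together with that vertex's full adjacency.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    total = sum(degree.values())
    stored = sum(degree[v] for cluster in clusters for v in cluster)
    return stored / total


path = [(0, 1), (1, 2), (2, 3)]  # hypothetical 4-vertex path graph
print(volume_ratio(path, [[0, 1], [2, 3]]))        # disjoint partition
print(volume_ratio(path, [[0, 1, 2], [1, 2, 3]]))  # overlapping clusters
```

A disjoint partition gives a ratio of 1, and two clusters that each hold the whole graph give 2, which matches the "store the graph twice" reading above.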

  16. Multi-viewpoint clustering analysis

    NASA Technical Reports Server (NTRS)

    Mehrotra, Mala; Wild, Chris

    1993-01-01

    In this paper, we address the feasibility of partitioning rule-based systems into a number of meaningful units to enhance the comprehensibility, maintainability and reliability of expert systems software. Preliminary results have shown that no single structuring principle or abstraction hierarchy is sufficient to understand complex knowledge bases. We therefore propose the Multi View Point - Clustering Analysis (MVP-CA) methodology to provide multiple views of the same expert system. We present the results of using this approach to partition a deployed knowledge-based system that navigates the Space Shuttle's entry. We also discuss the impact of this approach on verification and validation of knowledge-based systems.

  17. Scalability and Portability of Two Parallel Implementations of ADI

    NASA Technical Reports Server (NTRS)

    Phung, Thanh; VanderWijngaart, Rob F.

    1994-01-01

    Two domain decompositions for the implementation of the NAS Scalar Penta-diagonal Parallel Benchmark on MIMD systems are investigated, namely transposition and multi-partitioning. Hardware platforms considered are the Intel iPSC/860 and Paragon XP/S-15, and clusters of SGI workstations on Ethernet, communicating through PVM. It is found that the multi-partitioning strategy offers the kind of coarse granularity that allows scaling up to hundreds of processors on a massively parallel machine. Moreover, efficiency is retained when the code is ported verbatim (save message-passing syntax) to a PVM environment on a modest-size cluster of workstations.
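Multi-partitioning helps ADI scale because each processor owns one block in every slab along every axis, so all processors have work at each stage of a line sweep in any direction. Below is a minimal sketch of one common diagonal assignment for $p = q^2$ processors on a $q \times q \times q$ block grid. The mapping `((i - k) mod q, (j - k) mod q)` is our illustrative choice, not necessarily the one used in the study.

```python
def multipartition_owner(i, j, k, q):
    """Processor (row, col) owning block (i, j, k) of a q x q x q block grid.

    Hypothetical diagonal multipartitioning for p = q*q processors: the
    owner pattern shifts diagonally by one position per k-slab.
    """
    return ((i - k) % q, (j - k) % q)


def check_balanced(q):
    """True if every processor owns exactly one block in every slab of every axis."""
    blocks = [(i, j, k) for i in range(q) for j in range(q) for k in range(q)]
    for axis in range(3):
        for s in range(q):
            owners = [multipartition_owner(*b, q) for b in blocks if b[axis] == s]
            if len(set(owners)) != q * q:  # each slab has q*q blocks
                return False
    return True


print(check_balanced(4))
```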

  18. Environmental clustering of lakes to evaluate performance of a macrophyte index of biotic integrity

    USGS Publications Warehouse

    Vondracek, Bruce C.; Vondracek, Bruce; Hatch, Lorin K.

    2013-01-01

    Proper classification of sites is critical for the use of biological indices that can distinguish between natural and human-induced variation in biological response. The macrophyte-based index of biotic integrity (IBI) was developed to assess the condition of Minnesota lakes in relation to anthropogenic stressors, but macrophyte community composition varies naturally across the state. The goal of the study was to identify environmental characteristics that naturally influence macrophyte index response and establish a preliminary lake classification scheme for biological assessment (bioassessment). Using a comprehensive set of environmental variables, we identified similar groups of lakes by clustering using flexible beta classification. Variance partitioning analysis of IBI response indicated that evaluating similar lake clusters could improve the ability of the macrophyte index to identify community change in response to anthropogenic stressors, although lake groups did not fully account for the natural variation in macrophyte composition. Diagnostic capabilities of the index could be improved when evaluating lakes with similar environmental characteristics, suggesting the index has potential for accurate bioassessment provided comparable groups of lakes are evaluated.
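Flexible beta classification is agglomerative clustering with the Lance–Williams flexible-beta update, $d(k, i \cup j) = \frac{1-\beta}{2}\,[d(k,i) + d(k,j)] + \beta\, d(i,j)$. The sketch below uses $\beta = -0.25$, a commonly used value that we assume here. The abstract states neither the $\beta$ used nor the distance measure applied to the environmental variables, so both the value and the input matrix are illustrative.

```python
def flexible_beta(dist, beta=-0.25):
    """Agglomerative clustering with the Lance-Williams flexible-beta update.

    dist: symmetric list-of-lists of pairwise distances between items.
    Returns the merge history as (cluster_a, cluster_b, merge_distance),
    where clusters are frozensets of original item indices.
    """
    alpha = (1 - beta) / 2
    clusters = {idx: frozenset([idx]) for idx in range(len(dist))}
    # Distances keyed by (smaller_id, larger_id).
    d = {(a, b): dist[a][b] for a in clusters for b in clusters if a < b}
    next_id = len(dist)
    merges = []
    while len(clusters) > 1:
        (a, b), dab = min(d.items(), key=lambda item: item[1])
        merges.append((clusters[a], clusters[b], dab))
        merged = clusters.pop(a) | clusters.pop(b)
        for k in clusters:  # distance from every remaining cluster to the merged one
            dka = d[(min(k, a), max(k, a))]
            dkb = d[(min(k, b), max(k, b))]
            d[(k, next_id)] = alpha * dka + alpha * dkb + beta * dab
        # Drop distances that involve the two clusters just merged.
        d = {key: val for key, val in d.items() if a not in key and b not in key}
        clusters[next_id] = merged
        next_id += 1
    return merges


points = [0.0, 1.0, 5.0, 6.0]  # hypothetical 1-D lake scores
dist = [[abs(x - y) for y in points] for x in points]
for left, right, height in flexible_beta(dist):
    print(sorted(left), sorted(right), height)
```

Negative $\beta$ values inflate distances to newly merged groups, which tends to produce compact, well-separated clusters; $\beta$ close to 0 behaves more like average linkage.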

  19. Digital Breast Tomosynthesis: Observer Performance of Clustered Microcalcification Detection on Breast Phantom Images Acquired with an Experimental System Using Variable Scan Angles, Angular Increments, and Number of Projection Views

    PubMed Central

    Goodsitt, Mitchell M.; Helvie, Mark A.; Zelakiewicz, Scott; Schmitz, Andrea; Noroozian, Mitra; Paramagul, Chintana; Roubidoux, Marilyn A.; Nees, Alexis V.; Neal, Colleen H.; Carson, Paul; Lu, Yao; Hadjiiski, Lubomir; Wei, Jun

    2014-01-01

    Purpose To investigate the dependence of microcalcification cluster detectability on tomographic scan angle, angular increment, and number of projection views acquired at digital breast tomosynthesis (DBT). Materials and Methods A prototype DBT system operated in step-and-shoot mode was used to image breast phantoms. Four 5-cm-thick phantoms embedded with 81 simulated microcalcification clusters of three speck sizes (subtle, medium, and obvious) were imaged by using a rhodium target and rhodium filter with 29 kV, 50 mAs, and seven acquisition protocols. Fixed angular increments were used in four protocols (denoted as scan angle, angular increment, and number of projection views, respectively: 16°, 1°, and 17; 24°, 3°, and nine; 30°, 3°, and 11; and 60°, 3°, and 21), and variable increments were used in three (40°, variable, and 13; 40°, variable, and 15; and 60°, variable, and 21). The reconstructed DBT images were interpreted by six radiologists who located the microcalcification clusters and rated their conspicuity. Results The mean sensitivity for detection of subtle clusters ranged from 80% (22.5 of 28) to 96% (26.8 of 28) for the seven DBT protocols; the highest sensitivity was achieved with the 16°, 1°, and 17 protocol (96%), but the difference was significant only for the 60°, 3°, and 21 protocol (80%, P < .002) and did not reach significance for the other five protocols (P = .01–.15). The mean sensitivity for detection of medium and obvious clusters ranged from 97% (28.2 of 29) to 100% (24 of 24), but the differences fell short of significance (P = .08 to >.99).
The conspicuity of subtle and medium clusters with the 16°, 1°, and 17 protocol was rated higher than those with other protocols; the differences were significant for subtle clusters with the 24°, 3°, and nine protocol and for medium clusters with 24°, 3°, and nine; 30°, 3°, and 11; 60°, 3°, and 21; and 60°, variable, and 21 protocols (P < .002). Conclusion With imaging that did not include x-ray source motion or patient motion during acquisition of the projection views, narrow-angle DBT provided higher sensitivity and conspicuity than wide-angle DBT for subtle microcalcification clusters. © RSNA, 2014 PMID:25007048
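The protocol triplets are internally consistent: for each fixed-increment protocol, the number of views equals the scan angle divided by the increment, plus one. The small arithmetic check below assumes equally spaced views that include both ends of the arc, and also converts the mean detected counts (averaged over readers) into the quoted percentages.

```python
# Fixed-increment protocols: (scan angle in degrees, increment in degrees, views).
fixed_protocols = [(16, 1, 17), (24, 3, 9), (30, 3, 11), (60, 3, 21)]

# Assumption: views are equally spaced and include both ends of the arc,
# so an arc of `angle` degrees sampled every `step` degrees gives angle/step + 1 views.
consistent = [angle // step + 1 == views for angle, step, views in fixed_protocols]
print(consistent)


def mean_sensitivity_pct(mean_detected, n_clusters):
    """Mean number of clusters detected (averaged over readers) as a rounded percentage."""
    return round(100 * mean_detected / n_clusters)


print(mean_sensitivity_pct(22.5, 28), mean_sensitivity_pct(26.8, 28))
```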

  20. Calcium-decorated carbyne networks as hydrogen storage media.

    PubMed

    Sorokin, Pavel B; Lee, Hoonkyung; Antipina, Lyubov Yu; Singh, Abhishek K; Yakobson, Boris I

    2011-07-13

    Among the carbon allotropes, carbyne chains appear exceptionally accessible for sorption and are very light. Hydrogen adsorption on a calcium-decorated carbyne chain was studied using ab initio density functional calculations. The estimated surface area of carbyne is four times larger than that of graphene, which makes carbyne attractive as a storage scaffold medium. Furthermore, calculations show that a Ca-decorated carbyne can adsorb up to 6 H2 molecules per Ca atom with a binding energy of ∼0.2 eV, desirable for reversible storage, and the hydrogen storage capacity can exceed ∼8 wt %. Unlike recently reported transition-metal-decorated carbon nanostructures, which suffer from metal clustering that diminishes the storage capacity, the clustering of Ca atoms on carbyne is energetically unfavorable. The thermodynamics of H2 adsorption on the Ca atom was also investigated using the equilibrium grand partition function.
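Gravimetric capacity in wt % is the mass of stored H2 divided by the total mass of host plus hydrogen. The sketch below computes it for a Ca-decorated carbon chain holding 6 H2 per Ca. The carbon-to-calcium ratio is a hypothetical parameter, since the abstract does not give the decoration density behind its ∼8 wt % figure.

```python
# Standard atomic masses in g/mol (rounded).
M_H, M_C, M_CA = 1.008, 12.011, 40.078


def h2_weight_percent(n_h2_per_ca, carbons_per_ca):
    """Gravimetric H2 capacity (wt %) of a Ca-decorated carbon host.

    carbons_per_ca is a hypothetical decoration ratio (C atoms per Ca atom).
    wt % = m(H2) / (m(host) + m(H2)) * 100.
    """
    m_h = 2 * M_H * n_h2_per_ca
    m_host = carbons_per_ca * M_C + M_CA
    return 100 * m_h / (m_host + m_h)


for ratio in (2, 4, 8):  # hypothetical C:Ca ratios
    print(ratio, round(h2_weight_percent(6, ratio), 1))
```

Sparser decoration (more carbon per Ca) lowers the capacity, which is why suppressing Ca clustering, so that each Ca stays individually accessible, matters for the storage figure.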

  1. Validated and longitudinally stable asthma phenotypes based on cluster analysis of the ADEPT study.

    PubMed

    Loza, Matthew J; Djukanovic, Ratko; Chung, Kian Fan; Horowitz, Daniel; Ma, Keying; Branigan, Patrick; Barnathan, Elliot S; Susulic, Vedrana S; Silkoff, Philip E; Sterk, Peter J; Baribaud, Frédéric

    2016-12-15

    Asthma is a disease of varying severity and differing disease mechanisms. To date, studies aimed at stratifying asthma into clinically useful phenotypes have produced a number of phenotypes that have yet to be assessed for stability and to be validated in independent cohorts. The aim of this study was to define and validate, for the first time, clinically driven asthma phenotypes using two independent, severe asthma cohorts: ADEPT and U-BIOPRED. Fuzzy partition-around-medoid clustering was performed on pre-specified data from the ADEPT participants (n = 156) and independently on data from a subset of U-BIOPRED asthma participants (n = 82) for whom the same variables were available. Models for cluster classification probabilities were derived and applied to the 12-month longitudinal ADEPT data and to a larger subset of the U-BIOPRED asthma dataset (n = 397). High and low type-2 inflammation phenotypes were defined as high or low Th2 activity, indicated by gene expression changes downstream of IL-4 or IL-13 in endobronchial biopsies. Four phenotypes were identified in the ADEPT (training) cohort, with distinct clinical and biomarker profiles. Phenotype 1 was "mild, good lung function, early onset", with a low-inflammatory, predominantly Type-2, phenotype. Phenotype 2 had a "moderate, hyper-responsive, eosinophilic" phenotype, with moderate asthma control, mild airflow obstruction and predominant Type-2 inflammation. Phenotype 3 had a "mixed severity, predominantly fixed obstructive, non-eosinophilic and neutrophilic" phenotype, with moderate asthma control and low Type-2 inflammation. Phenotype 4 had a "severe uncontrolled, severe reversible obstruction, mixed granulocytic" phenotype, with moderate Type-2 inflammation. These phenotypes had good longitudinal stability in the ADEPT cohort. They were reproduced and demonstrated high classification probability in two subsets of the U-BIOPRED asthma cohort.
Focusing on the biology of the four clinical, independently validated, easy-to-assess ADEPT asthma phenotypes will help in understanding the unmet need and will aid in developing tailored therapies. NCT01274507 (ADEPT), registered October 28, 2010 and NCT01982162 (U-BIOPRED), registered October 30, 2013.
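Fuzzy partitioning around medoids gives each participant a graded membership in every cluster instead of a single label, with cluster centers restricted to actual participants. Below is a minimal fuzzy c-medoids sketch on a precomputed dissimilarity matrix. It is a generic variant of the idea, not necessarily the algorithm or software used in the study, and the fuzzifier `m = 2` and the initial medoids are assumptions.

```python
def fuzzy_c_medoids(dist, medoids, m=2.0, max_iter=50):
    """Fuzzy c-medoids clustering on a precomputed dissimilarity matrix.

    dist: n x n list of lists of dissimilarities.
    medoids: initial medoid indices (a hypothetical starting choice).
    m: fuzzifier (> 1); larger values give softer memberships.
    Returns (medoids, u), where u[i][j] is item i's membership in cluster j.
    """
    n, c = len(dist), len(medoids)
    medoids = list(medoids)
    p = 1.0 / (m - 1.0)
    u = []
    for _ in range(max_iter):
        # Membership update from dissimilarities to the current medoids.
        u = []
        for i in range(n):
            d = [dist[i][v] for v in medoids]
            zeros = d.count(0.0)
            if zeros:  # item coincides with a medoid: crisp (or evenly split) membership
                u.append([1.0 / zeros if x == 0.0 else 0.0 for x in d])
            else:
                u.append([1.0 / sum((dj / dk) ** p for dk in d) for dj in d])
        # Medoid update: the item minimizing membership-weighted dissimilarity.
        new = [
            min(range(n), key=lambda q, j=j: sum(u[i][j] ** m * dist[i][q] for i in range(n)))
            for j in range(c)
        ]
        if new == medoids:
            break
        medoids = new
    return medoids, u


scores = [0.0, 0.5, 1.0, 10.0, 10.5, 11.0]  # hypothetical 1-D participant scores
dm = [[abs(a - b) for b in scores] for a in scores]
final_medoids, memberships = fuzzy_c_medoids(dm, [0, 3])
print(final_medoids)
```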

  2. Data Clustering

    NASA Astrophysics Data System (ADS)

    Wagstaff, Kiri L.

    2012-03-01

    On obtaining a new data set, the researcher is immediately faced with the challenge of obtaining a high-level understanding from the observations. What does a typical item look like? What are the dominant trends? How many distinct groups are included in the data set, and how is each one characterized? Which observable values are common, and which rarely occur? Which items stand out as anomalies or outliers from the rest of the data? This challenge is exacerbated by the steady growth in data set size [11] as new instruments push into new frontiers of parameter space, via improvements in temporal, spatial, and spectral resolution, or by the desire to "fuse" observations from different modalities and instruments into a larger-picture understanding of the same underlying phenomenon. Data clustering algorithms provide a variety of solutions for this task. They can generate summaries, locate outliers, compress data, identify dense or sparse regions of feature space, and build data models. It is useful to note up front that "clusters" in this context refer to groups of items within some descriptive feature space, not (necessarily) to "galaxy clusters" which are dense regions in physical space. The goal of this chapter is to survey a variety of data clustering methods, with an eye toward their applicability to astronomical data analysis. In addition to improving the individual researcher’s understanding of a given data set, clustering has led directly to scientific advances, such as the discovery of new subclasses of stars [14] and gamma-ray bursts (GRBs) [38]. All clustering algorithms seek to identify groups within a data set that reflect some observed, quantifiable structure. Clustering is traditionally an unsupervised approach to data analysis, in the sense that it operates without any direct guidance about which items should be assigned to which clusters. 
There has been a recent trend in the clustering literature toward supporting semisupervised or constrained clustering, in which some partial information about item assignments or other components of the resulting output is already known and must be accommodated by the solution. Some algorithms seek a partition of the data set into distinct clusters, while others build a hierarchy of nested clusters that can capture taxonomic relationships. Some produce a single optimal solution, while others construct a probabilistic model of cluster membership. More formally, clustering algorithms operate on a data set $X$ composed of items represented by one or more features (dimensions). These could include physical location, such as right ascension and declination, as well as other properties such as brightness, color, temporal change, size, texture, and so on. Let $D$ be the number of dimensions used to represent each item, $x_i \in \mathbb{R}^D$. The clustering goal is to produce an organization $P$ of the items in $X$ that optimizes an objective function $f : P \to \mathbb{R}$, which quantifies the quality of solution $P$. Often $f$ is defined so as to maximize similarity within a cluster and minimize similarity between clusters. To that end, many algorithms make use of a measure $d : X \times X \to \mathbb{R}$ of the distance between two items. A partitioning algorithm produces a set of clusters $P = \{c_1, \ldots, c_k\}$ such that the clusters are nonoverlapping ($c_i \cap c_j = \emptyset$ for $i \neq j$) subsets of the data set ($\bigcup_i c_i = X$). Hierarchical algorithms produce a series of partitions $P = \{p_1, \ldots, p_{n'}\}$. For a complete hierarchy, the number of partitions $n' = n$, the number of items in the data set; the top partition is a single cluster containing all items, and the bottom partition contains $n$ clusters, each containing a single item. For model-based clustering, each cluster $c_j$ is represented by a model $m_j$, such as the cluster center or a Gaussian distribution.
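The two conditions in the formal definition of a partition (pairwise disjoint clusters whose union is $X$) translate directly into code. The sketch below also evaluates one common choice of objective $f$, the within-cluster sum of squared Euclidean distances to each cluster mean (lower is better). This particular $f$ is our example, not one singled out in the text.

```python
from itertools import combinations


def is_partition(clusters, items):
    """True if the clusters are pairwise disjoint and their union equals items."""
    sets = [set(c) for c in clusters]
    disjoint = all(a.isdisjoint(b) for a, b in combinations(sets, 2))
    covers = set().union(*sets) == set(items)
    return disjoint and covers


def within_cluster_sse(points, clusters):
    """Sum of squared Euclidean distances from each item to its cluster mean."""
    total = 0.0
    for c in clusters:
        members = [points[i] for i in c]
        if not members:  # an empty cluster contributes nothing
            continue
        dim = len(members[0])
        mean = [sum(p[k] for p in members) / len(members) for k in range(dim)]
        total += sum(sum((p[k] - mean[k]) ** 2 for k in range(dim)) for p in members)
    return total


pts = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
P = [[0, 1], [2, 3]]
print(is_partition(P, range(len(pts))), within_cluster_sse(pts, P))
```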
The wide array of available clustering algorithms may seem bewildering, and covering all of them is beyond the scope of this chapter. Choosing among them for a particular application involves considerations of the kind of data being analyzed, algorithm runtime efficiency, and how much prior knowledge is available about the problem domain, which can dictate the nature of clusters sought. Fundamentally, the clustering method and its representations of clusters carry with them a definition of what a cluster is, and it is important that this be aligned with the analysis goals for the problem at hand. In this chapter, I emphasize this point by identifying for each algorithm the cluster representation as a model, $m_j$, even for algorithms that are not typically thought of as creating a “model.” This chapter surveys a basic collection of clustering methods useful to any practitioner who is interested in applying clustering to a new data set. The algorithms include k-means (Section 25.2), EM (Section 25.3), agglomerative (Section 25.4), and spectral (Section 25.5) clustering, with side mentions of variants such as kernel k-means and divisive clustering. The chapter also discusses each algorithm’s strengths and limitations and provides pointers to additional in-depth reading for each subject. Section 25.6 discusses methods for incorporating domain knowledge into the clustering process. This chapter concludes with a brief survey of interesting applications of clustering methods to astronomy data (Section 25.7). The chapter begins with k-means because it is both generally accessible and so widely used that understanding it can be considered a necessary prerequisite for further work in the field. EM can be viewed as a more sophisticated version of k-means that uses a generative model for each cluster and probabilistic item assignments.
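Since k-means is the entry point here, the following is a minimal sketch of Lloyd's algorithm in plain Python. It alternates between assigning each item to its nearest center and moving each center to the mean of its items, stopping when the centers no longer change. Initialization by randomly sampling $k$ items and the fixed seed are assumptions; practical implementations usually add multiple restarts or k-means++ seeding.

```python
import random


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's k-means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
            for p in points
        ]
        # Update step: move each center to the mean of its assigned items.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:  # keep an empty cluster's center where it was
                new_centers.append(centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers = kmeans(data, 2)
print(labels, centers)
```

In the chapter's terms, the model $m_j$ for each cluster is simply its center; EM generalizes this by replacing the center with a probabilistic model and the hard assignment with membership probabilities.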
Agglomerative clustering is the most basic form of hierarchical clustering and provides a basis for further exploration of algorithms in that vein. Spectral clustering permits a departure from feature-vector-based clustering and can operate on data sets instead represented as affinity, or similarity, matrices: cases in which only pairwise information is known. The list of algorithms covered in this chapter is representative of those most commonly in use, but it is by no means comprehensive. There is an extensive collection of existing books on clustering that provide additional background and depth. Three early books that remain useful today are Anderberg’s Cluster Analysis for Applications [3], Hartigan’s Clustering Algorithms [25], and Gordon’s Classification [22]. The latter covers basics on similarity measures, partitioning and hierarchical algorithms, fuzzy clustering, overlapping clustering, conceptual clustering, validation methods, and visualization or data reduction techniques such as principal components analysis (PCA), multidimensional scaling, and self-organizing maps. More recently, Jain et al. provided a useful and informative survey [27] of a variety of different clustering algorithms, including those mentioned here as well as fuzzy, graph-theoretic, and evolutionary clustering. Everitt’s Cluster Analysis [19] provides a modern overview of algorithms, similarity measures, and evaluation methods.

  3. Clinical Implications of Cluster Analysis-Based Classification of Acute Decompensated Heart Failure and Correlation with Bedside Hemodynamic Profiles.

    PubMed

    Ahmad, Tariq; Desai, Nihar; Wilson, Francis; Schulte, Phillip; Dunning, Allison; Jacoby, Daniel; Allen, Larry; Fiuzat, Mona; Rogers, Joseph; Felker, G Michael; O'Connor, Christopher; Patel, Chetan B

    2016-01-01

    Classification of acute decompensated heart failure (ADHF) is based on subjective criteria that crudely capture disease heterogeneity. Improved phenotyping of the syndrome may help improve therapeutic strategies. We aimed to derive cluster analysis-based groupings for patients hospitalized with ADHF and to compare their prognostic performance with that of hemodynamic classifications derived at the bedside. We performed a cluster analysis on baseline clinical variables and pulmonary artery catheter (PAC) measurements of 172 ADHF patients from the ESCAPE trial. Employing regression techniques, we examined associations between clusters and clinically determined hemodynamic profiles (warm/cold/wet/dry). We assessed association with clinical outcomes using Cox proportional hazards models. Likelihood ratio tests were used to compare the prognostic value of cluster data to that of hemodynamic data. We identified four advanced HF clusters: 1) male Caucasians with ischemic cardiomyopathy, multiple comorbidities, lowest B-type natriuretic peptide (BNP) levels; 2) females with non-ischemic cardiomyopathy, few comorbidities, most favorable hemodynamics; 3) young African American males with non-ischemic cardiomyopathy, most adverse hemodynamics, advanced disease; and 4) older Caucasians with ischemic cardiomyopathy, concomitant renal insufficiency, highest BNP levels. There was no association between clusters and bedside-derived hemodynamic profiles (p = 0.70). For all adverse clinical outcomes, Cluster 4 had the highest risk, and Cluster 2, the lowest. Compared to Cluster 4, Clusters 1-3 had 45-70% lower risk of all-cause mortality. Clusters were significantly associated with clinical outcomes, whereas hemodynamic profiles were not. By clustering patients with similar objective variables, we identified four clinically relevant phenotypes of ADHF patients, with no discernible relationship to hemodynamic profiles, but distinct associations with adverse outcomes.
Our analysis suggests that ADHF classification using simultaneous considerations of etiology, comorbid conditions, and biomarker levels, may be superior to bedside classifications.
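A likelihood ratio test compares two nested models through $\mathrm{LR} = 2(\ell_{\text{full}} - \ell_{\text{reduced}})$, referred to a $\chi^2$ distribution whose degrees of freedom equal the number of added parameters. The sketch below computes the statistic and its p-value using closed-form chi-square tail formulas for integer degrees of freedom. The log-likelihoods in the usage line are hypothetical, and nothing is assumed about the study's model specifications beyond what the abstract states.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) of a chi-square variable, integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k < df/2} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= (x / 2) / k
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite series in powers of sqrt(x).
    total = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * j + 1)
    return total


def likelihood_ratio_test(loglik_reduced, loglik_full, extra_params):
    """Return (LR statistic, p-value) for nested models."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    return stat, chi2_sf(stat, extra_params)


# Hypothetical example: four clusters enter a model as three indicator terms.
print(likelihood_ratio_test(-250.0, -244.0, 3))
```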

  4. Localized Hotspots Drive Continental Geography of Abnormal Amphibians on U.S. Wildlife Refuges

    PubMed Central

    Reeves, Mari K.; Medley, Kimberly A.; Pinkney, Alfred E.; Holyoak, Marcel; Johnson, Pieter T. J.; Lannoo, Michael J.

    2013-01-01

    Amphibians with missing, misshapen, and extra limbs have garnered public and scientific attention for two decades, yet the extent of the phenomenon remains poorly understood. Despite progress in identifying the causes of abnormalities in some regions, a lack of knowledge about their broader spatial distribution and temporal dynamics has hindered efforts to understand their implications for amphibian population declines and environmental quality. To address this data gap, we conducted a nationwide, 10-year assessment of 62,947 amphibians on U.S. National Wildlife Refuges. Analysis of a core dataset of 48,081 individuals revealed that, consistent with expected background frequencies, an average of 2% were abnormal, but abnormalities exhibited marked spatial variation, with a maximum prevalence of 40%. Variance partitioning analysis demonstrated that factors associated with space (rather than species or year sampled) captured 97% of the variation in abnormalities, and the amount of partitioned variance decreased with increasing spatial scale (from site to refuge to region). Consistent with this, abnormalities occurred in local to regional hotspots, clustering at scales of tens to hundreds of kilometers. We detected such hotspot clusters of high-abnormality sites in the Mississippi River Valley, California, and Alaska. Abnormality frequency was more variable within than outside of hotspot clusters. This is consistent with dynamic phenomena such as disturbance or natural enemies (pathogens or predators), whereas similarity of abnormality frequencies at scales of tens to hundreds of kilometers suggests involvement of factors that are spatially consistent at a regional scale. Our characterization of the spatial and temporal variation inherent in continent-wide amphibian abnormalities demonstrates the disproportionate contribution of local factors in predicting hotspots, and the episodic nature of their occurrence. PMID:24260103

  5. Pre-crash scenarios at road junctions: A clustering method for car crash data.

    PubMed

    Nitsche, Philippe; Thomas, Pete; Stuetz, Rainer; Welsh, Ruth

    2017-10-01

    Given the recent advancements in autonomous driving functions, one of the main challenges is safe and efficient operation in complex traffic situations such as road junctions. There is a need for comprehensive testing, either in virtual simulation environments or on real-world test tracks. This paper presents a novel data analysis method, including the preparation, analysis and visualization of car crash data, to identify the critical pre-crash scenarios at T- and four-legged junctions as a basis for testing the safety of automated driving systems. The presented method employs k-medoids to cluster historical junction crash data into distinct partitions and then applies association rule mining to each cluster to specify the driving scenarios in more detail. The dataset used consists of 1056 junction crashes in the UK, which were exported from the in-depth "On-the-Spot" database. The study resulted in thirteen crash clusters for T-junctions and six crash clusters for crossroads. Association rules revealed common crash characteristics, which were the basis for the scenario descriptions. The results support existing findings on road junction accidents and provide benchmark situations for safety performance tests in order to reduce the number of possible parameter combinations. Copyright © 2017 Elsevier Ltd. All rights reserved.
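Association rules of the form X → y are scored by support (the fraction of records containing X ∪ {y}) and confidence (the support of X ∪ {y} divided by the support of X). The brute-force sketch below enumerates small itemsets within one cluster of crashes. Real analyses typically use Apriori or a similar algorithm, and the attribute names and thresholds here are hypothetical.

```python
from itertools import combinations


def association_rules(transactions, min_support=0.3, min_confidence=0.7, max_len=3):
    """Brute-force association rules (lhs -> y) over itemsets of size 2..max_len.

    transactions: list of sets of categorical attributes (one set per crash).
    Returns (lhs, y, support, confidence) tuples.
    """
    n = len(transactions)
    items = sorted(set().union(*transactions))

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    rules = []
    for size in range(2, max_len + 1):
        for combo in combinations(items, size):
            full = frozenset(combo)
            s = support(full)
            if s == 0 or s < min_support:
                continue
            for y in combo:
                lhs = full - {y}
                conf = s / support(lhs)
                if conf >= min_confidence:
                    rules.append((set(lhs), y, s, conf))
    return rules


# Hypothetical crash records from one cluster.
crashes = [
    {"T-junction", "right turn", "dark"},
    {"T-junction", "right turn"},
    {"crossroads", "straight"},
    {"T-junction", "right turn", "wet"},
]
for rule in association_rules(crashes):
    print(rule)
```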

  6. Consensus-Based Sorting of Neuronal Spike Waveforms

    PubMed Central

    Fournier, Julien; Mueller, Christian M.; Shein-Idelson, Mark; Hemberger, Mike

    2016-01-01

    Optimizing spike-sorting algorithms is difficult because sorted clusters can rarely be checked against independently obtained “ground truth” data. In most spike-sorting algorithms in use today, the optimality of a clustering solution is assessed relative to some assumption on the distribution of the spike shapes associated with a particular single unit (e.g., Gaussianity) and by visual inspection of the clustering solution followed by manual validation. When the spatiotemporal waveforms of spikes from different cells overlap, the decision as to whether two spikes should be assigned to the same source can be quite subjective, if it is not based on reliable quantitative measures. We propose a new approach, whereby spike clusters are identified from the most consensual partition across an ensemble of clustering solutions. Using the variability of the clustering solutions across successive iterations of the same clustering algorithm (template matching based on K-means clusters), we estimate the probability of spikes being clustered together and identify groups of spikes that are not statistically distinguishable from one another. Thus, we identify spikes that are most likely to be clustered together and therefore correspond to consistent spike clusters. This method has the potential advantage that it does not rely on any model of the spike shapes. It also provides estimates of the proportion of misclassified spikes for each of the identified clusters. We tested our algorithm on several datasets for which there exists a ground truth (simultaneous intracellular data), and show that it performs close to the optimum reached by a support vector machine trained on the ground truth. We also show that the estimated rate of misclassification matches the proportion of misclassified spikes measured from the ground truth data. PMID:27536990
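A central ingredient of consensus clustering is a co-association matrix: for each pair of spikes, the fraction of clustering runs in which they land in the same cluster. Because it compares pairs, it is unaffected by arbitrary label permutations between runs. The sketch below builds that matrix from several hypothetical label vectors and groups spikes by thresholding it; the fixed 0.5 threshold is a simple stand-in for the statistical indistinguishability criterion the authors describe, not their method.

```python
def co_association(labelings):
    """Fraction of clustering runs in which each pair of items shares a label."""
    n = len(labelings[0])
    runs = len(labelings)
    return [
        [sum(lab[i] == lab[j] for lab in labelings) / runs for j in range(n)]
        for i in range(n)
    ]


def consensus_groups(co, threshold=0.5):
    """Connected components of the graph linking pairs with co-association > threshold."""
    n = len(co)
    seen, groups = set(), []
    for start in range(n):
        if start in seen:
            continue
        stack, comp = [start], []
        seen.add(start)
        while stack:
            i = stack.pop()
            comp.append(i)
            for j in range(n):
                if j not in seen and co[i][j] > threshold:
                    seen.add(j)
                    stack.append(j)
        groups.append(sorted(comp))
    return groups


# Hypothetical labels for 5 spikes from 3 runs (labels may be permuted between runs).
runs = [[0, 0, 1, 1, 1], [1, 1, 0, 0, 0], [0, 0, 1, 1, 0]]
print(consensus_groups(co_association(runs)))
```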

  7. Consensus-Based Sorting of Neuronal Spike Waveforms.

    PubMed

    Fournier, Julien; Mueller, Christian M; Shein-Idelson, Mark; Hemberger, Mike; Laurent, Gilles

    2016-01-01

    Optimizing spike-sorting algorithms is difficult because sorted clusters can rarely be checked against independently obtained "ground truth" data. In most spike-sorting algorithms in use today, the optimality of a clustering solution is assessed relative to some assumption on the distribution of the spike shapes associated with a particular single unit (e.g., Gaussianity) and by visual inspection of the clustering solution followed by manual validation. When the spatiotemporal waveforms of spikes from different cells overlap, the decision as to whether two spikes should be assigned to the same source can be quite subjective, if it is not based on reliable quantitative measures. We propose a new approach, whereby spike clusters are identified from the most consensual partition across an ensemble of clustering solutions. Using the variability of the clustering solutions across successive iterations of the same clustering algorithm (template matching based on K-means clusters), we estimate the probability of spikes being clustered together and identify groups of spikes that are not statistically distinguishable from one another. Thus, we identify spikes that are most likely to be clustered together and therefore correspond to consistent spike clusters. This method has the potential advantage that it does not rely on any model of the spike shapes. It also provides estimates of the proportion of misclassified spikes for each of the identified clusters. We tested our algorithm on several datasets for which there exists a ground truth (simultaneous intracellular data), and show that it performs close to the optimum reached by a support vector machine trained on the ground truth. We also show that the estimated rate of misclassification matches the proportion of misclassified spikes measured from the ground truth data.

  8. Evaluating Spatial Variability in Sediment and Phosphorus Concentration-Discharge Relationships Using Bayesian Inference and Self-Organizing Maps

    NASA Astrophysics Data System (ADS)

    Underwood, Kristen L.; Rizzo, Donna M.; Schroth, Andrew W.; Dewoolkar, Mandar M.

    2017-12-01

    Given the variable biogeochemical, physical, and hydrological processes driving fluvial sediment and nutrient export, the water science and management communities need data-driven methods to identify regions prone to production and transport under variable hydrometeorological conditions. We use Bayesian analysis to segment concentration-discharge linear regression models for total suspended solids (TSS) and particulate and dissolved phosphorus (PP, DP) using 22 years of monitoring data from 18 Lake Champlain watersheds. Bayesian inference was leveraged to estimate segmented regression model parameters and identify threshold position. The identified threshold positions demonstrated a considerable range below and above the median discharge, which has been used previously as the default breakpoint in segmented regression models to discern differences between pre- and post-threshold export regimes. We then applied a Self-Organizing Map (SOM), which partitioned the watersheds into clusters of TSS, PP, and DP export regimes using watershed characteristics, as well as Bayesian regression intercepts and slopes. A SOM defined two clusters of high-flux basins: one where PP flux was predominantly episodic and hydrologically driven, and another in which the sediment and nutrient sourcing and mobilization were more bimodal, resulting from both hydrologic processes at post-threshold discharges and reactive processes (e.g., nutrient cycling or lateral/vertical exchanges of fine sediment) at pre-threshold discharges. A separate DP SOM defined two high-flux clusters exhibiting a bimodal concentration-discharge response, but driven by differing land use. Our novel framework shows promise as a tool with broad management application that provides insights into landscape drivers of riverine solute and sediment export.
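A segmented (broken-stick) concentration-discharge model adds a hinge term at a threshold discharge $c$: $y = b_0 + b_1 x + b_2 \max(0, x - c)$, with $x$ and $y$ typically log-transformed. The sketch below estimates $c$ by a least-squares grid search with NumPy. This is a plain frequentist stand-in for the Bayesian inference used in the study, and the candidate grid and synthetic data are assumptions.

```python
import numpy as np


def fit_segmented(x, y, candidates):
    """Least-squares hinge regression y ~ b0 + b1*x + b2*max(0, x - c).

    Grid search over breakpoint candidates c; returns (c, coefficients, sse).
    """
    x, y = np.asarray(x, float), np.asarray(y, float)
    best = None
    for c in candidates:
        A = np.column_stack([np.ones_like(x), x, np.maximum(0.0, x - c)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        sse = float(np.sum((y - A @ coef) ** 2))
        if best is None or sse < best[2]:
            best = (float(c), coef, sse)
    return best


# Hypothetical log-discharge / log-concentration data with a threshold at 0.5.
log_q = np.linspace(-2, 2, 41)
log_c = 1.0 + 0.2 * log_q + 1.0 * np.maximum(0.0, log_q - 0.5)
threshold, coefs, sse = fit_segmented(log_q, log_c, np.linspace(-1.5, 1.5, 31))
print(threshold, coefs)
```

The fitted $b_1$ is the pre-threshold slope and $b_1 + b_2$ the post-threshold slope, which is the kind of contrast between export regimes the abstract describes.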

  9. Automatic Clustering Using FSDE-Forced Strategy Differential Evolution

    NASA Astrophysics Data System (ADS)

    Yasid, A.

    2018-01-01

    Clustering analysis is important in data mining for unsupervised data, because adequate prior knowledge is not available. One important task is determining the number of clusters without user involvement, which is known as automatic clustering. This study aims to determine the number of clusters automatically using forced strategy differential evolution (AC-FSDE). Two mutation parameters, namely a constant parameter and a variable parameter, are employed to boost differential evolution performance. Four well-known benchmark datasets were used to evaluate the algorithm. Moreover, the results are compared with other state-of-the-art automatic clustering methods. The experimental results show that AC-FSDE is better than or competitive with other existing automatic clustering algorithms.
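    AC-FSDE builds on differential evolution (DE). The abstract does not give the forced strategy, the two mutation parameters or the cluster encoding, so the Python sketch below shows only the generic DE/rand/1/bin loop (mutation, binomial crossover, greedy selection), not the authors' method. The function name and the defaults `F=0.5` and `CR=0.9` are assumptions; for automatic clustering, `f` would typically score a candidate set of centroids with a cluster validity index.

```python
import numpy as np

def differential_evolution(f, bounds, pop_size=20, generations=200,
                           F=0.5, CR=0.9, seed=0):
    """Minimize f over a box using the classic DE/rand/1/bin scheme."""
    rng = np.random.default_rng(seed)
    bounds = np.asarray(bounds, dtype=float)
    lo, hi = bounds[:, 0], bounds[:, 1]
    dim = len(bounds)
    # Random initial population inside the box
    pop = lo + rng.random((pop_size, dim)) * (hi - lo)
    fitness = np.array([f(x) for x in pop])
    for _ in range(generations):
        for i in range(pop_size):
            # Mutation: combine three distinct members other than i
            others = [j for j in range(pop_size) if j != i]
            a, b, c = pop[rng.choice(others, size=3, replace=False)]
            mutant = np.clip(a + F * (b - c), lo, hi)
            # Binomial crossover, forcing at least one coordinate from the mutant
            cross = rng.random(dim) < CR
            cross[rng.integers(dim)] = True
            trial = np.where(cross, mutant, pop[i])
            # Greedy selection: keep the trial if it is no worse
            f_trial = f(trial)
            if f_trial <= fitness[i]:
                pop[i] = trial
                fitness[i] = f_trial
    best = int(np.argmin(fitness))
    return pop[best], fitness[best]
```

    For example, `differential_evolution(lambda x: float(np.sum(x ** 2)), [(-5, 5)] * 3)` should return a point close to the origin.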

  10. Clustering box office movie with Partition Around Medoids (PAM) Algorithm based on Text Mining of Indonesian subtitle

    NASA Astrophysics Data System (ADS)

    Alfarizy, A. D.; Indahwati; Sartono, B.

    2017-03-01

    In 2015, Indonesia was the largest target market for the Hollywood movie industry in Southeast Asia. Hollywood movies distributed in Indonesia target people of all ages, including children. Because awareness of the need to guide children while they watch movies is low, children may watch films of any rating, even those unsuitable for their age. Even after being translated into Bahasa Indonesia and passing the censorship phase, words that are uncomfortable for children still remain. The purpose of this research is to cluster box office Hollywood movies based on Indonesian subtitles, revenue, IMDb user rating and genre, as one reference for adults choosing suitable movies for their children to watch. Text mining is used to extract words from the subtitles and count the frequency of three groups of words (bad words, sexual words and terror words), while the Partition Around Medoids (PAM) algorithm, with the Gower similarity coefficient as the proximity matrix, is used as the clustering method. We clustered 624 movies from 2006 until the first half of 2016 from IMDb. The partition with the highest silhouette coefficient (0.36) has 5 clusters. High-revenue Animation, Adventure and Comedy movies like those in cluster 5 are recommended for children, while high-revenue Comedy movies like those in cluster 4 should be avoided.
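    The Gower coefficient lets PAM work on records that mix numeric fields (revenue, rating, word counts) with categorical ones (genre). A minimal Python sketch of the Gower dissimilarity matrix follows; the field names and movie records are hypothetical, equal weighting of fields is assumed, and the PAM step that would consume this matrix is not shown.

```python
def gower_matrix(rows, numeric_cols, categorical_cols):
    """Pairwise Gower dissimilarities for records mixing numeric and categorical fields.

    Numeric fields contribute |a - b| / range; categorical fields contribute
    0 if equal and 1 otherwise. The result is the mean over all fields.
    """
    n = len(rows)
    ranges = {}
    for col in numeric_cols:
        values = [row[col] for row in rows]
        ranges[col] = max(values) - min(values)
    n_fields = len(numeric_cols) + len(categorical_cols)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            total = 0.0
            for col in numeric_cols:
                if ranges[col] > 0:  # a constant column carries no information
                    total += abs(rows[i][col] - rows[j][col]) / ranges[col]
            for col in categorical_cols:
                total += 0.0 if rows[i][col] == rows[j][col] else 1.0
            dist[i][j] = dist[j][i] = total / n_fields
    return dist

# Hypothetical records for illustration only
movies = [
    {"revenue": 100, "rating": 7.0, "genre": "Comedy"},
    {"revenue": 300, "rating": 8.0, "genre": "Animation"},
    {"revenue": 200, "rating": 7.5, "genre": "Comedy"},
]
D = gower_matrix(movies, ["revenue", "rating"], ["genre"])
```

    Gower similarity is simply one minus this dissimilarity, so either form can serve as PAM's proximity matrix.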

  11. An unsupervised hierarchical dynamic self-organizing approach to cancer class discovery and marker gene identification in microarray data.

    PubMed

    Hsu, Arthur L; Tang, Sen-Lin; Halgamuge, Saman K

    2003-11-01

    Current Self-Organizing Map (SOM) approaches to gene expression pattern clustering require the user to predefine the expected number of clusters. Hierarchical clustering methods used in this area do not provide a unique partitioning of the data. We describe an unsupervised dynamic hierarchical self-organizing approach, which suggests an appropriate number of clusters, to perform class discovery and marker gene identification in microarray data. In the process of class discovery, the proposed algorithm identifies corresponding sets of predictor genes that best distinguish one class from the others. The approach integrates the merits of hierarchical clustering with the robustness against noise known from self-organizing approaches. Applied to DNA microarray data sets of two types of cancer, the proposed algorithm demonstrated its ability to produce the most suitable number of clusters. Further, the marker genes identified through the unsupervised algorithm also have a strong biological relationship to the specific cancer class. Tested on leukemia microarray data containing three leukemia types, the algorithm determined three major clusters and one minor cluster. Prediction models built for the four clusters indicate that the prediction strength for the smaller cluster is generally low; it is therefore labelled as an uncertain cluster. Further analysis shows that the uncertain cluster can be subdivided further, and the subdivisions are related to two of the original clusters. Another test using colon cancer microarray data automatically derived two clusters, which is consistent with the number of classes in the data (cancerous and normal). Java software of the dynamic SOM tree algorithm is available upon request for academic use. A comparison of rectangular and hexagonal topologies for GSOM is available from http://www.mame.mu.oz.au/mechatronics/journalinfo/Hsu2003supp.pdf

  12. Decision tree modeling using R.

    PubMed

    Zhang, Zhongheng

    2016-08-01

    In the machine learning field, decision tree learners are powerful and easy to interpret. They employ a recursive binary partitioning algorithm that splits the sample on the partitioning variable with the strongest association with the response variable. The process continues until some stopping criteria are met. In the example I focus on the conditional inference tree, which incorporates tree-structured regression models into conditional inference procedures. Because a single tree is sensitive to small changes in the training data, the random forests procedure is introduced to address this problem. The sources of diversity for random forests are random sampling and the restricted set of input variables available for selection at each split. Finally, I introduce R functions to perform model-based recursive partitioning. This method incorporates recursive partitioning into conventional parametric model building.
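    One step of recursive binary partitioning can be sketched in Python as an exhaustive search for the split that most reduces the squared error of a numeric response. Note that this is the CART-style variance-reduction criterion, swapped in for simplicity; conditional inference trees instead choose the variable with a permutation-test association measure. The function name and data layout (a list of feature rows) are assumptions.

```python
def best_split(X, y):
    """Return (feature_index, threshold, sse_reduction) for the best binary split.

    Rows with X[f] <= threshold go left, the rest go right.
    """
    def sse(values):
        # Sum of squared deviations from the mean (0 for an empty group)
        if not values:
            return 0.0
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values)

    total = sse(y)
    best = (None, None, 0.0)
    for f in range(len(X[0])):
        thresholds = sorted(set(row[f] for row in X))
        for t in thresholds[:-1]:  # the largest value would leave the right side empty
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            gain = total - sse(left) - sse(right)
            if gain > best[2]:
                best = (f, t, gain)
    return best
```

    Growing a full tree amounts to applying `best_split` recursively to each side until a stopping rule (minimum node size, minimum gain) is met.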

  13. Searching for Variable Stars in the Field of Dolidze 35 (Abstract)

    NASA Astrophysics Data System (ADS)

    Welch, J.; Smith, J. A.

    2018-06-01

    (Abstract only) We are conducting a study of the open cluster Dolidze-35. Our data set contains several nights of observations spanning four years. One step of our survey is to search these data to identify candidate local standards and potential variable stars. We present early results of the variable search effort.

  14. Cluster-guided imaging-based CFD analysis of airflow and particle deposition in asthmatic human lungs

    NASA Astrophysics Data System (ADS)

    Choi, Jiwoong; Leblanc, Lawrence; Choi, Sanghun; Haghighi, Babak; Hoffman, Eric; Lin, Ching-Long

    2017-11-01

    The goal of this study is to assess inter-subject variability in delivery of orally inhaled drug products to small airways in asthmatic lungs. A recent multiscale imaging-based cluster analysis (MICA) of computed tomography (CT) lung images in an asthmatic cohort identified four clusters with statistically distinct structural and functional phenotypes associated with unique clinical biomarkers. Thus, we aimed to address inter-subject variability via inter-cluster variability. We selected a representative subject from each of the 4 asthma clusters, as well as one male and one female healthy control, and performed computational fluid and particle simulations on CT-based airway models of these subjects. The subjects from one severe and one non-severe asthmatic cluster, characterized by segmental airway constriction, had increased particle deposition efficiency compared with the subjects from the other two clusters (one non-severe and one severe asthmatic) without airway constriction. Constriction-induced jets impinging on distal bifurcations led to excessive particle deposition. The results emphasize the impact of airway constriction, rather than disease severity, on regional particle deposition, demonstrating the potential of using cluster membership to tailor drug delivery. NIH Grants U01HL114494 and S10-RR022421, and FDA Grant U01FD005837. XSEDE.

  15. Travel Time Estimation Using Freeway Point Detector Data Based on Evolving Fuzzy Neural Inference System.

    PubMed

    Tang, Jinjun; Zou, Yajie; Ash, John; Zhang, Shen; Liu, Fang; Wang, Yinhai

    2016-01-01

    Travel time is an important measurement used to evaluate the extent of congestion within road networks. This paper presents a new method to estimate the travel time based on an evolving fuzzy neural inference system. The input variables in the system are traffic flow data (volume, occupancy, and speed) collected from loop detectors located at points both upstream and downstream of a given link, and the output variable is the link travel time. A first order Takagi-Sugeno fuzzy rule set is used to complete the inference. For training the evolving fuzzy neural network (EFNN), two learning processes are proposed: (1) a K-means method is employed to partition input samples into different clusters, and a Gaussian fuzzy membership function is designed for each cluster to measure the membership degree of samples to the cluster centers. As the number of input samples increases, the cluster centers are modified and membership functions are also updated; (2) a weighted recursive least squares estimator is used to optimize the parameters of the linear functions in the Takagi-Sugeno type fuzzy rules. Testing datasets consisting of actual and simulated data are used to test the proposed method. Three common criteria including mean absolute error (MAE), root mean square error (RMSE), and mean absolute relative error (MARE) are utilized to evaluate the estimation performance. Estimation results demonstrate the accuracy and effectiveness of the EFNN method through comparison with existing methods including: multiple linear regression (MLR), instantaneous model (IM), linear model (LM), neural network (NN), and cumulative plots (CP).
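    Step (2) of the EFNN training uses a weighted recursive least squares estimator for the linear consequent parameters. As a rough illustration, not the authors' code, the Python sketch below implements a standard weighted RLS update with an optional forgetting factor. The class name, the `delta` initialization of the covariance and the `forgetting` parameter are assumptions; in a Takagi-Sugeno setting, `weight` would plausibly be a rule's normalized firing strength, which is also an assumption here.

```python
import numpy as np

class RecursiveLeastSquares:
    """Weighted RLS minimizing sum_i lambda**(n-i) * w_i * (y_i - x_i . theta)**2."""

    def __init__(self, n_params, forgetting=1.0, delta=1000.0):
        self.theta = np.zeros(n_params)       # current parameter estimate
        self.P = np.eye(n_params) * delta     # large initial covariance = weak prior
        self.lam = forgetting                 # 1.0 means no forgetting

    def update(self, x, y, weight=1.0):
        x = np.asarray(x, dtype=float)
        Px = self.P @ x
        # Gain vector for a sample with the given weight
        gain = weight * Px / (self.lam + weight * (x @ Px))
        error = y - x @ self.theta
        self.theta = self.theta + gain * error
        # Covariance update (P is symmetric, so x^T P equals (P x)^T)
        self.P = (self.P - np.outer(gain, Px)) / self.lam
        return self.theta
```

    With `forgetting=1.0` and a large `delta`, the estimate after all samples closely matches an ordinary least squares fit; a forgetting factor slightly below 1 lets the parameters track drifting traffic conditions.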

  16. Travel Time Estimation Using Freeway Point Detector Data Based on Evolving Fuzzy Neural Inference System

    PubMed Central

    Tang, Jinjun; Zou, Yajie; Ash, John; Zhang, Shen; Liu, Fang; Wang, Yinhai

    2016-01-01

    Travel time is an important measurement used to evaluate the extent of congestion within road networks. This paper presents a new method to estimate the travel time based on an evolving fuzzy neural inference system. The input variables in the system are traffic flow data (volume, occupancy, and speed) collected from loop detectors located at points both upstream and downstream of a given link, and the output variable is the link travel time. A first order Takagi-Sugeno fuzzy rule set is used to complete the inference. For training the evolving fuzzy neural network (EFNN), two learning processes are proposed: (1) a K-means method is employed to partition input samples into different clusters, and a Gaussian fuzzy membership function is designed for each cluster to measure the membership degree of samples to the cluster centers. As the number of input samples increases, the cluster centers are modified and membership functions are also updated; (2) a weighted recursive least squares estimator is used to optimize the parameters of the linear functions in the Takagi-Sugeno type fuzzy rules. Testing datasets consisting of actual and simulated data are used to test the proposed method. Three common criteria including mean absolute error (MAE), root mean square error (RMSE), and mean absolute relative error (MARE) are utilized to evaluate the estimation performance. Estimation results demonstrate the accuracy and effectiveness of the EFNN method through comparison with existing methods including: multiple linear regression (MLR), instantaneous model (IM), linear model (LM), neural network (NN), and cumulative plots (CP). PMID:26829639

  17. [Study of the clinical phenotype of symptomatic chronic airways disease by hierarchical cluster analysis and two-step cluster analyses].

    PubMed

    Ning, P; Guo, Y F; Sun, T Y; Zhang, H S; Chai, D; Li, X M

    2016-09-01

    To study the distinct clinical phenotypes of chronic airway diseases by hierarchical cluster analysis and two-step cluster analysis. A population sample of adult patients from Donghuamen community, Dongcheng district, and Qinghe community, Haidian district, Beijing, from April 2012 to January 2015, who had wheezed within the last 12 months, underwent detailed investigation, including a clinical questionnaire, pulmonary function tests, total serum IgE levels, blood eosinophil level and a peak flow diary. Nine variables were chosen as evaluating parameters: pre-salbutamol forced expiratory volume in one second (FEV1)/forced vital capacity (FVC) ratio, pre-salbutamol FEV1, percentage of post-salbutamol change in FEV1, residual capacity, diffusing capacity of the lung for carbon monoxide/alveolar volume adjusted for haemoglobin level, peak expiratory flow (PEF) variability, serum IgE level, cumulative tobacco cigarette consumption (pack-years) and respiratory symptoms (cough and expectoration). Subjects' clinical phenotypes were identified by hierarchical cluster analysis and two-step cluster analysis. (1) Four clusters were identified by hierarchical cluster analysis. Cluster 1 comprised smokers with chronic bronchitis and normal pulmonary function. Cluster 2 comprised patients with chronic bronchitis or mild chronic obstructive pulmonary disease (COPD) with mild airflow limitation. Cluster 3 included COPD patients with heavy smoking, poor quality of life and severe airflow limitation. Cluster 4 consisted of atopic patients with mild airflow limitation, elevated serum IgE and clinical features of asthma.
Significant differences were found in pre-salbutamol FEV1/FVC%, pre-salbutamol FEV1% pred, post-salbutamol change in FEV1%, maximal mid-expiratory flow (MMEF)% pred, carbon monoxide diffusing capacity per liter of alveolar volume (DLCO/VA)% pred, residual volume (RV)% pred, total serum IgE level, smoking history (pack-years), St. George's Respiratory Questionnaire (SGRQ) score, acute exacerbations in the past year, PEF variability and allergic dermatitis (P<0.05). (2) Four clusters were also identified by two-step cluster analysis, as follows: cluster 1, COPD patients with moderate to severe airflow limitation; cluster 2, asthma and COPD patients with heavy smoking, airflow limitation and increased airway reversibility; cluster 3, patients with less smoking and normal pulmonary function who had wheezing but no chronic cough; cluster 4, chronic bronchitis patients with normal pulmonary function and chronic cough. Significant differences were found in gender distribution, respiratory symptoms, pre-salbutamol FEV1/FVC%, pre-salbutamol FEV1% pred, post-salbutamol change in FEV1%, MMEF% pred, DLCO/VA% pred, RV% pred, PEF variability, total serum IgE level, cumulative tobacco cigarette consumption (pack-years), and SGRQ score (P<0.05). Using different cluster analyses, distinct clinical phenotypes of chronic airway diseases were identified. These phenotypes may thus help guide doctors in providing individualized treatment.

  18. Optimal Clustering in Graphs with Weighted Edges: A Unified Approach to the Threshold Problem.

    ERIC Educational Resources Information Center

    Goetschel, Roy; Voxman, William

    1987-01-01

    Relations on a finite set V are viewed as weighted graphs. Using the language of graph theory, two methods of partitioning V are examined: selecting threshold values and applying them to a maximal weighted spanning forest, and using a parametric linear program to obtain a most adhesive partition. (Author/EM)
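    The first method in this abstract (applying a threshold to a maximal weighted spanning forest) can be illustrated with a short Python sketch. It builds a maximum-weight spanning forest with Kruskal's algorithm and then keeps only forest edges whose weight meets the threshold; the connected components are the blocks of the partition, and they coincide with the components obtained by thresholding the full graph (the single-linkage picture). The function names and the `(weight, u, v)` edge format are illustrative assumptions, and the parametric linear program for the most adhesive partition is not shown.

```python
class DisjointSet:
    """Union-find with path halving."""

    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, a):
        while self.parent[a] != a:
            self.parent[a] = self.parent[self.parent[a]]
            a = self.parent[a]
        return a

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        self.parent[ra] = rb
        return True

def maximal_spanning_forest(n, edges):
    """Kruskal's algorithm on edges sorted by decreasing weight."""
    ds = DisjointSet(n)
    forest = []
    for w, u, v in sorted(edges, reverse=True):
        if ds.union(u, v):  # edge joins two different trees
            forest.append((w, u, v))
    return forest

def threshold_partition(n, edges, threshold):
    """Partition vertices 0..n-1 by forest edges with weight >= threshold."""
    ds = DisjointSet(n)
    for w, u, v in maximal_spanning_forest(n, edges):
        if w >= threshold:
            ds.union(u, v)
    groups = {}
    for x in range(n):
        groups.setdefault(ds.find(x), []).append(x)
    return sorted(groups.values())
```

    Sweeping the threshold from high to low yields a nested sequence of partitions, from singletons to the connected components of the whole graph.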

  19. Spatial-Temporal dynamics of Newtonian and viscoelastic turbulence

    NASA Astrophysics Data System (ADS)

    Wang, Sung-Ning; Graham, Michael

    2015-11-01

    Introducing a trace amount of polymer into liquid turbulent flows can result in substantial reduction of friction drag. This phenomenon has been widely used in fluid transport, such as the Alaska crude oil pipeline. However, the mechanism is not well understood. We conduct direct numerical simulations of Newtonian and viscoelastic turbulence in large domains, in which the flow shows different characteristics in different regions. In some areas the drag is low and vortex motions are quiescent, while in other areas the drag is higher and the motions are more active. To identify these regions, we apply a statistical method, k-means clustering, which partitions the observations into k clusters by assigning each observation to its nearest centroid. The resulting partition maximizes the between-cluster variance. In the simulations, the observations are the instantaneous wall shear rate. Regions with different levels of drag are automatically identified by the partitioning algorithm. We find that the velocity profiles of the centroids exhibit characteristics similar to the individual coherent structures observed in minimal domain simulations. In addition, as viscoelasticity increases, polymer stretch becomes strongly correlated with wall shear stress. This work was supported by NSF grant CBET-1510291.
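    The k-means procedure described here (assign each observation to its nearest centroid, then recompute centroids) can be sketched as Lloyd's algorithm in Python. Minimizing the within-cluster sum of squares is equivalent to maximizing the between-cluster sum of squares, because the two add up to the fixed total. Initialization from k random observations, the iteration cap and the convergence test are assumptions; in the simulations the observations would be instantaneous wall shear rate samples, whereas the example uses a toy 2-D dataset.

```python
import numpy as np

def _nearest(X, centroids):
    """Index of the nearest centroid (squared Euclidean distance) for each row of X."""
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid update."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialize centroids with k distinct observations
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = _nearest(X, centroids)
        # Move each centroid to the mean of its members (keep it if the cluster is empty)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # Final assignment consistent with the returned centroids
    return _nearest(X, centroids), centroids

labels, centroids = kmeans([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], 2)
```

    Lloyd's algorithm only reaches a local optimum, so in practice it is usually restarted from several initializations and the lowest within-cluster sum of squares is kept.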

  20. Existence and significance of communities in the World Trade Web

    NASA Astrophysics Data System (ADS)

    Piccardi, Carlo; Tajoli, Lucia

    2012-06-01

    The World Trade Web (WTW), which models the international transactions among countries, is a fundamental tool for studying the economics of trade flows, their evolution over time, and their implications for a number of phenomena, including the propagation of economic shocks among countries. In this respect, the possible existence of communities is a key point, because it would imply that countries are organized in groups of preferential partners. In this paper, we use four approaches to analyze communities in the WTW between 1962 and 2008, based, respectively, on modularity optimization, cluster analysis, stability functions, and persistence probabilities. Overall, the four methods agree in finding no evidence of significant partitions. A few weak communities emerge from the analysis, but they do not represent secluded groups of countries, as intercommunity linkages are also strong, supporting the view of a truly globalized trading system.
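    Modularity optimization, the first of the four approaches, scores a partition by comparing within-community link weight with what a degree-preserving random graph would give. The Python sketch below computes the standard Newman-Girvan modularity for an undirected weighted adjacency matrix; the WTW is directed and weighted, so treating it as undirected (for example by symmetrizing trade flows) is an assumption of this sketch, and the function name is hypothetical.

```python
import numpy as np

def modularity(A, labels):
    """Newman-Girvan modularity Q of a partition of an undirected weighted graph.

    Q = (1 / 2m) * sum_ij [A_ij - k_i * k_j / 2m] * delta(c_i, c_j)
    """
    A = np.asarray(A, dtype=float)
    labels = np.asarray(labels)
    k = A.sum(axis=1)              # (weighted) degree of each node
    two_m = A.sum()                # twice the total edge weight
    same = labels[:, None] == labels[None, :]
    return float(((A - np.outer(k, k) / two_m) * same).sum() / two_m)

# Two triangles joined by a single edge (2-3)
A = np.zeros((6, 6))
for u, v in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[u, v] = A[v, u] = 1.0
```

    For this toy graph, splitting into the two triangles gives Q = 5/14 (about 0.36), while putting every node in one community gives Q = 0; values near 0 for the best partition are what "no evidence of significant partitions" would look like.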

  1. Existence and significance of communities in the World Trade Web.

    PubMed

    Piccardi, Carlo; Tajoli, Lucia

    2012-06-01

    The World Trade Web (WTW), which models the international transactions among countries, is a fundamental tool for studying the economics of trade flows, their evolution over time, and their implications for a number of phenomena, including the propagation of economic shocks among countries. In this respect, the possible existence of communities is a key point, because it would imply that countries are organized in groups of preferential partners. In this paper, we use four approaches to analyze communities in the WTW between 1962 and 2008, based, respectively, on modularity optimization, cluster analysis, stability functions, and persistence probabilities. Overall, the four methods agree in finding no evidence of significant partitions. A few weak communities emerge from the analysis, but they do not represent secluded groups of countries, as intercommunity linkages are also strong, supporting the view of a truly globalized trading system.

  2. Characterization of spatial and temporal variability in hydrochemistry of Johor Straits, Malaysia.

    PubMed

    Abdullah, Pauzi; Abdullah, Sharifah Mastura Syed; Jaafar, Othman; Mahmud, Mastura; Khalik, Wan Mohd Afiq Wan Mohd

    2015-12-15

    Characterization of hydrochemistry changes in the Johor Straits over 5 years of monitoring was successfully carried out. Water quality data sets (27 stations and 19 parameters) collected in this area were interpreted using multivariate statistical analysis. Cluster analysis grouped all the stations into four clusters ((Dlink/Dmax) × 100 < 90) and two clusters ((Dlink/Dmax) × 100 < 80) for site and period similarities. Principal component analysis yielded six significant components (eigenvalue > 1) that explained 82.6% of the total variance of the data set. The classification matrix of discriminant analysis assigned 88.9-92.6% and 83.3-100% correctness in spatial and temporal variability, respectively. Time series analysis then confirmed that only four parameters did not change significantly over time. Therefore, it is imperative that the environmental impact of reclamation and dredging works, municipal or industrial discharge, marine aquaculture and shipping activities in this area be effectively controlled and managed. Copyright © 2015 Elsevier Ltd. All rights reserved.
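    The "eigenvalue > 1" rule used to retain principal components (the Kaiser criterion) can be sketched in Python as PCA on the correlation matrix of the standardized parameters. The function name and the toy data are assumptions; the study's actual preprocessing and any rotation of the components are not shown.

```python
import numpy as np

def kaiser_pca(X):
    """PCA on the correlation matrix, retaining components with eigenvalue > 1.

    Returns (number retained, fraction of total variance explained, loadings).
    """
    X = np.asarray(X, dtype=float)
    R = np.corrcoef(X, rowvar=False)          # variables are columns
    eigvals, eigvecs = np.linalg.eigh(R)      # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]         # sort descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > 1.0
    explained = eigvals[keep].sum() / eigvals.sum()
    return int(keep.sum()), float(explained), eigvecs[:, keep]

# Toy data: two pairs of perfectly correlated variables, the pairs uncorrelated
a = np.array([1.0, -1.0, 1.0, -1.0])
b = np.array([1.0, 1.0, -1.0, -1.0])
n_keep, explained, loadings = kaiser_pca(np.column_stack([a, a, b, b]))
```

    Because the trace of a correlation matrix equals the number of variables, an eigenvalue above 1 marks a component that explains more variance than a single standardized variable.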

  3. Gate sequence for continuous variable one-way quantum computation

    PubMed Central

    Su, Xiaolong; Hao, Shuhong; Deng, Xiaowei; Ma, Lingyu; Wang, Meihong; Jia, Xiaojun; Xie, Changde; Peng, Kunchi

    2013-01-01

    Measurement-based one-way quantum computation using cluster states as resources provides an efficient model to perform computation and information processing of quantum codes. Arbitrary Gaussian quantum computation can be implemented sufficiently by long single-mode and two-mode gate sequences. However, continuous variable gate sequences have not been realized so far due to an absence of cluster states larger than four submodes. Here we present the first continuous variable gate sequence consisting of a single-mode squeezing gate and a two-mode controlled-phase gate based on a six-mode cluster state. The quantum property of this gate sequence is confirmed by the fidelities and the quantum entanglement of two output modes, which depend on both the squeezing and controlled-phase gates. The experiment demonstrates the feasibility of implementing Gaussian quantum computation by means of accessible gate sequences.

  4. Design of double fuzzy clustering-driven context neural networks.

    PubMed

    Kim, Eun-Hu; Oh, Sung-Kwun; Pedrycz, Witold

    2018-08-01

    In this study, we introduce a novel category of double fuzzy clustering-driven context neural networks (DFCCNNs). The study focuses on developing advanced design methodologies for redesigning the structure of conventional fuzzy clustering-based neural networks. Conventional fuzzy clustering-based neural networks typically focus on dividing the input space into several local spaces (implied by clusters). In contrast, the proposed DFCCNNs take into account two distinct kinds of local spaces, called context and cluster spaces. Cluster space refers to a local space positioned in the input space, whereas context space concerns a local space formed in the output space. By partitioning the output space into several local spaces, each context space is used as the desired (target) local output to construct local models. To accomplish this, the proposed network includes a new context layer for reasoning about context space in the output space. Fuzzy C-Means (FCM) clustering is used to form local spaces in both the input and output spaces: the first application forms clusters and trains the weights between the input and hidden layers, whereas the second is applied to the output space to form context spaces. The key features of the proposed DFCCNNs can be enumerated as follows: (i) the parameters between the input layer and hidden layer are built through FCM clustering. The connections (weights) are specified as constant terms that are in fact the centers of the clusters. The membership functions (represented through the partition matrix) produced by FCM are used as activation functions in the hidden layer of the "conventional" neural networks. (ii) Following the hidden layer, a context layer is formed to approximate the context space of the output variable, and each node in the context layer represents an individual local model.
The outputs of the context layer are specified as a combination of weights formed as linear functions and the outputs of the hidden layer. The weights are updated using a least squares estimation (LSE)-based method. (iii) At the output layer, the outputs of the context layer are decoded to produce the corresponding numeric output. Here, a weighted average is used, and its weights are also adjusted using the LSE scheme. With a view to performance improvement, the proposed design methodologies are discussed and evaluated on benchmark machine learning datasets. The experiments show that the generalization abilities of the proposed DFCCNNs are better than those of the conventional FCNNs reported in the literature. Copyright © 2018 Elsevier Ltd. All rights reserved.
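    The FCM memberships that serve as hidden-layer activations in feature (i) follow the standard Fuzzy C-Means membership formula. The Python sketch below computes the membership (partition) matrix for fixed cluster centers; the function name, the fuzzifier default `m=2` and the `eps` guard are assumptions, and the alternating center updates of full FCM training are not shown.

```python
import numpy as np

def fcm_memberships(X, centers, m=2.0, eps=1e-12):
    """Fuzzy C-Means membership matrix U (n_samples x n_clusters).

    u[i, j] = 1 / sum_k (d[i, j] / d[i, k]) ** (2 / (m - 1)),
    where d is the Euclidean distance and m > 1 is the fuzzifier.
    """
    X = np.atleast_2d(np.asarray(X, dtype=float))
    C = np.atleast_2d(np.asarray(centers, dtype=float))
    d = np.sqrt(((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2))
    d = np.maximum(d, eps)  # guard against a sample sitting exactly on a center
    # ratio[i, j, k] = (d[i, j] / d[i, k]) ** (2 / (m - 1))
    ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))
    return 1.0 / ratio.sum(axis=2)
```

    Each row of the result sums to 1; for a point at distance 1 from one center and 3 from another (with m = 2), the memberships are 0.9 and 0.1.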

  5. T-cell triggering thresholds are modulated by the number of antigen within individual T-cell receptor clusters

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Manz, Boryana N.; Jackson, Bryan L.; Petit, Rebecca S.

    2011-05-31

    T cells react to extremely small numbers of activating agonist peptides. Spatial organization of T-cell receptors (TCR) and their peptide-major histocompatibility complex (pMHC) ligands into microclusters is correlated with T-cell activation. In this study, we have designed an experimental strategy that enables control over the number of agonist peptides per TCR cluster, without altering the total number engaged by the cell. Supported membranes, partitioned with grids of barriers to lateral mobility, provide an effective way of limiting the total number of pMHC ligands that may be assembled within a single TCR cluster. Observations directly reveal that restriction of pMHC content within individual TCR clusters can decrease T-cell sensitivity for triggering initial calcium flux at fixed total pMHC density. Further analysis suggests that triggering thresholds are determined by the number of activating ligands available to individual TCR clusters, not by the total number encountered by the cell. Results from a series of experiments in which the overall agonist density and the maximum number of agonist per TCR cluster are independently varied in primary T cells indicate that the most probable minimal triggering unit for calcium signaling is at least four pMHC in a single cluster for this system. In conclusion, this threshold is unchanged by inclusion of coagonist pMHC, but costimulation of CD28 by CD80 can modulate the threshold lower.

  6. Resemblance profiles as clustering decision criteria: Estimating statistical power, error, and correspondence for a hypothesis test for multivariate structure.

    PubMed

    Kilborn, Joshua P; Jones, David L; Peebles, Ernst B; Naar, David F

    2017-04-01

    Clustering data continues to be a highly active area of data analysis, and resemblance profiles are being incorporated into ecological methodologies as a hypothesis testing-based approach to clustering multivariate data. However, these new clustering techniques have not been rigorously tested to determine the performance variability based on the algorithm's assumptions or any underlying data structures. Here, we use simulation studies to estimate the statistical error rates for the hypothesis test for multivariate structure based on dissimilarity profiles (DISPROF). We concurrently tested a widely used algorithm that employs the unweighted pair group method with arithmetic mean (UPGMA) to estimate the proficiency of clustering with DISPROF as a decision criterion. We simulated unstructured multivariate data from different probability distributions with increasing numbers of objects and descriptors, and grouped data with increasing overlap, overdispersion for ecological data, and correlation among descriptors within groups. Using simulated data, we measured the resolution and correspondence of clustering solutions achieved by DISPROF with UPGMA against the reference grouping partitions used to simulate the structured test datasets. Our results highlight the dynamic interactions between dataset dimensionality, group overlap, and the properties of the descriptors within a group (i.e., overdispersion or correlation structure) that are relevant to resemblance profiles as a clustering criterion for multivariate data. These methods are particularly useful for multivariate ecological datasets that benefit from distance-based statistical analyses. We propose guidelines for using DISPROF as a clustering decision tool that will help future users avoid potential pitfalls during the application of methods and the interpretation of results.

  7. SOMFlow: Guided Exploratory Cluster Analysis with Self-Organizing Maps and Analytic Provenance.

    PubMed

    Sacha, Dominik; Kraus, Matthias; Bernard, Jurgen; Behrisch, Michael; Schreck, Tobias; Asano, Yuki; Keim, Daniel A

    2018-01-01

    Clustering is a core building block for data analysis, aiming to extract otherwise hidden structures and relations from raw datasets, such as particular groups that can be effectively related, compared, and interpreted. A plethora of visual-interactive cluster analysis techniques has been proposed to date; however, arriving at useful clusterings often requires several rounds of user interaction to fine-tune the data preprocessing and algorithms. We present a multi-stage Visual Analytics (VA) approach for iterative cluster refinement together with an implementation (SOMFlow) that uses Self-Organizing Maps (SOM) to analyze time series data. It supports exploration by offering the analyst a visual platform to analyze intermediate results, adapt the underlying computations, iteratively partition the data, and reflect on previous analytical activities. The history of previous decisions is explicitly visualized within a flow graph, allowing the analyst to compare earlier cluster refinements and to explore relations. We further leverage quality and interestingness measures to guide the analyst in the discovery of useful patterns, relations, and data partitions. We conducted two pair analytics experiments together with a subject matter expert in speech intonation research to demonstrate that the approach is effective for interactive data analysis, supporting enhanced understanding of clustering results as well as the interactive process itself.

  8. A phase cell cluster expansion for Euclidean field theories

    NASA Astrophysics Data System (ADS)

    Battle, Guy A., III; Federbush, Paul

    1982-08-01

    We adapt the cluster expansion first used to treat infrared problems for lattice models (a mass zero cluster expansion) to the usual field theory situation. The field is expanded in terms of special block spin functions, and the cluster expansion is given in terms of the expansion coefficients (phase cell variables); the cluster expansion expresses correlation functions in terms of contributions from finite coupled subsets of these variables. Most of the present work is carried through in d space-time dimensions (for $\phi^4_2$ the details of the cluster expansion are pursued and convergence is proven). Thus most of the results in the present work will apply to a treatment of $\phi^4_3$, to which we hope to return in a succeeding paper. Of particular interest in this paper is a substitute for the stability of the vacuum bound appropriate to this cluster expansion (for d = 2 and d = 3), and a new method for performing estimates with tree graphs. The phase cell cluster expansions have the renormalization group incorporated intimately into their structure. We hope they will ultimately be useful in treating four-dimensional field theories.

  9. VizieR Online Data Catalog: RR Lyrae in 15 Galactic globular clusters (Dambis+, 2014)

    NASA Astrophysics Data System (ADS)

    Dambis, A. K.; Rastorguev, A. S.; Zabolotskikh, M. V.

    2014-11-01

    Last year, the WISE All-Sky Data Release (Cutri et al., 2012, Cat. II/328) was made public, mapping the entire sky in four mid-infrared bands W1, W2, W3 and W4 with effective wavelengths of 3.368, 4.618, 12.082 and 22.194 μm, respectively. We cross-correlated the WISE single-exposure database with the Catalogue of Galactic globular-cluster variables by Clement et al. (2001AJ....122.2587C), the Catalogue of Accurate Equatorial Coordinates for Variable Stars in Globular Clusters by Samus et al. (2009PASP..121.1378S, Cat. J/PASP/121/1378) and the catalogue of Sawyer Hogg (1973PDDO....3....6S, Cat. V/97) (for ω Cen, NGC 6723 and NGC 6934) to compute (via Fourier fits) the intensity-mean W1- and W2-band magnitudes, ⟨W1⟩ and ⟨W2⟩, for a total of 357 and 272 RR Lyrae type variables in 15 and 9 Galactic globular clusters, respectively. (1 data file).
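    Intensity-mean magnitudes such as ⟨W1⟩ average the light curve in flux rather than in magnitude. A minimal Python sketch, assuming the observations are already phased: fit a truncated Fourier series to magnitude versus phase, evaluate it on a uniform phase grid, convert to relative flux with 10^(-0.4 m), average, and convert back. The Fourier order, the grid size and the function names are assumptions, not the catalogue's documented settings.

```python
import numpy as np

def fourier_design(phase, order):
    """Design matrix [1, cos(2*pi*k*phase), sin(2*pi*k*phase)] for k = 1..order."""
    cols = [np.ones_like(phase)]
    for k in range(1, order + 1):
        cols.append(np.cos(2 * np.pi * k * phase))
        cols.append(np.sin(2 * np.pi * k * phase))
    return np.column_stack(cols)

def intensity_mean_magnitude(phase, mag, order=3, n_grid=1000):
    """Intensity-mean magnitude of a periodic light curve via a Fourier fit."""
    phase = np.asarray(phase, dtype=float) % 1.0
    mag = np.asarray(mag, dtype=float)
    coef, *_ = np.linalg.lstsq(fourier_design(phase, order), mag, rcond=None)
    grid = np.arange(n_grid) / n_grid
    model_mag = fourier_design(grid, order) @ coef
    flux = 10.0 ** (-0.4 * model_mag)   # relative flux; zero point cancels out
    return -2.5 * np.log10(flux.mean())
```

    Because flux is a convex function of magnitude, the intensity mean of a variable star is always slightly brighter (numerically smaller) than the plain average of its magnitudes.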

  10. A dynamic re-partitioning strategy based on the distribution of key in Spark

    NASA Astrophysics Data System (ADS)

    Zhang, Tianyu; Lian, Xin

    2018-05-01

    Spark is a memory-based distributed data processing framework that can process massive data sets and has become a focus of Big Data research. However, the performance of the Spark shuffle depends on the distribution of the data: Spark's naive hash partition function cannot guarantee load balancing when the data are skewed, and the running time of a job is dominated by the node that has the most data to process. To handle this problem, dynamic sampling is used. During task execution, a histogram is used to count the key frequency distribution on each node, and these histograms are combined into a global key frequency distribution. Analyzing this distribution of keys makes it possible to balance the load across data partitions. Results show that the dynamic re-partitioning function outperforms the default hash partition, Fine Partition and the Balanced-Schedule strategy: it reduces the execution time of the task and improves the efficiency of the whole cluster.
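    The abstract describes the idea (per-node key histograms merged into a global key-frequency distribution, then a balanced assignment of keys to partitions) but not the exact assignment rule. The sketch below assumes a simple greedy heuristic, heaviest key first to the currently least-loaded partition, and is written in plain Python rather than against Spark's `Partitioner` API; `merge_histograms`, `balanced_assignment` and `partition_loads` are hypothetical names.

```python
import heapq
from collections import Counter

def merge_histograms(node_histograms):
    """Combine per-node key counts (e.g. from sampling) into one global histogram."""
    total = Counter()
    for hist in node_histograms:
        total.update(hist)
    return total

def balanced_assignment(global_hist, num_partitions):
    """Greedy longest-processing-time assignment (an assumed heuristic):
    keys are taken from most to least frequent, and each goes to the
    partition with the smallest load so far."""
    heap = [(0, p) for p in range(num_partitions)]  # (current load, partition id)
    heapq.heapify(heap)
    assignment = {}
    for key, count in sorted(global_hist.items(), key=lambda kv: (-kv[1], kv[0])):
        load, p = heapq.heappop(heap)
        assignment[key] = p
        heapq.heappush(heap, (load + count, p))
    return assignment

def partition_loads(global_hist, assignment, num_partitions):
    """Number of records each partition would receive under the assignment."""
    loads = [0] * num_partitions
    for key, count in global_hist.items():
        loads[assignment[key]] += count
    return loads
```

Under this scheme a single very hot key still lands on one partition, so splitting such keys across partitions would need additional logic.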

  11. A Partitioning and Bounded Variable Algorithm for Linear Programming

    ERIC Educational Resources Information Center

    Sheskin, Theodore J.

    2006-01-01

    An interesting new partitioning and bounded variable algorithm (PBVA) is proposed for solving linear programming problems. The PBVA is a variant of the simplex algorithm which uses a modified form of the simplex method followed by the dual simplex method for bounded variables. In contrast to the two-phase method and the big M method, the PBVA does…

  12. METAGUI 3: A graphical user interface for choosing the collective variables in molecular dynamics simulations

    NASA Astrophysics Data System (ADS)

    Giorgino, Toni; Laio, Alessandro; Rodriguez, Alex

    2017-08-01

    Molecular dynamics (MD) simulations allow the exploration of the phase space of biopolymers through the integration of equations of motion of their constituent atoms. The analysis of MD trajectories often relies on the choice of collective variables (CVs) along which the dynamics of the system is projected. We developed a graphical user interface (GUI) for facilitating the interactive choice of the appropriate CVs. The GUI allows: defining interactively new CVs; partitioning the configurations into microstates characterized by similar values of the CVs; calculating the free energies of the microstates for both unbiased and biased (metadynamics) simulations; clustering the microstates in kinetic basins; visualizing the free energy landscape as a function of a subset of the CVs used for the analysis. A simple mouse click allows one to quickly inspect structures corresponding to specific points in the landscape.
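    For an unbiased simulation, the microstate and free-energy steps listed above can be pictured as binning trajectory frames on a regular grid in CV space and converting microstate populations p_i into free energies F_i = -k_B T ln p_i. The sketch below assumes a hypothetical bin width and energies in kJ/mol; METAGUI's own microstate definition and its reweighting for metadynamics runs are not reproduced.

```python
import math
from collections import Counter

KB_KJ_PER_MOL_K = 0.0083144626  # Boltzmann constant in kJ/(mol K)

def microstates(frames, width):
    """Assign each frame (a tuple of CV values) to a grid cell of side `width` (assumed grid)."""
    return [tuple(math.floor(v / width) for v in cvs) for cvs in frames]

def free_energies(labels, temperature=300.0):
    """F_i = -kT ln p_i for an unbiased run, shifted so the most populated microstate has F = 0."""
    counts = Counter(labels)
    n = len(labels)
    kt = KB_KJ_PER_MOL_K * temperature
    f = {state: -kt * math.log(c / n) for state, c in counts.items()}
    f_min = min(f.values())
    return {state: value - f_min for state, value in f.items()}
```

A microstate visited three times as often as another sits k_B T ln 3 (about 2.7 kJ/mol at 300 K) lower in free energy.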

  13. Analysis of the mutations induced by conazole fungicides in vivo.

    PubMed

    Ross, Jeffrey A; Leavitt, Sharon A

    2010-05-01

    The mouse liver tumorigenic conazole fungicides triadimefon and propiconazole have previously been shown to be in vivo mouse liver mutagens in the Big Blue transgenic mutation assay when administered in feed at tumorigenic doses, whereas the non-tumorigenic conazole myclobutanil was not mutagenic. DNA sequencing of the mutants recovered from each treatment group, as well as from animals receiving control diet, was conducted to gain additional insight into the mode of action by which tumorigenic conazoles induce mutations. Relative dinucleotide mutabilities (RDMs) were calculated for each possible dinucleotide in each treatment group and then examined by multivariate statistical analysis techniques. Unsupervised hierarchical clustering analysis of RDM values grouped the two independent control groups together, along with the non-tumorigen myclobutanil. The two tumorigenic conazoles clustered together in a distinct grouping. Partitioning around medoids of RDM values into two clusters also groups triadimefon and propiconazole together in one cluster and the two control groups and myclobutanil together in a second cluster. Principal component analysis of these results identifies two components that account for 88.3% of the variability in the points. Taken together, these results are consistent with the hypothesis that propiconazole- and triadimefon-induced mutations do not represent clonal expansion of background mutations and support the hypothesis that they arise from the accumulation of reactive electrophilic metabolic intermediates within the liver in vivo.
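    The abstract names partitioning around medoids (PAM) without further detail. A compact sketch of PAM's two phases (greedy BUILD, then SWAP until no medoid/non-medoid exchange lowers the total distance), assuming Euclidean distances between made-up RDM-like vectors. This brute-force version suits small inputs only and is not the implementation the authors used.

```python
import math

def distance_matrix(points):
    """Pairwise Euclidean distances between equal-length vectors."""
    return [[math.dist(p, q) for q in points] for p in points]

def total_cost(dist, medoids):
    """Sum over all points of the distance to their closest medoid."""
    return sum(min(dist[i][m] for m in medoids) for i in range(len(dist)))

def pam(dist, k):
    n = len(dist)
    # BUILD: greedily add the point that lowers the total cost the most.
    medoids = []
    for _ in range(k):
        candidates = [i for i in range(n) if i not in medoids]
        medoids.append(min(candidates, key=lambda i: total_cost(dist, medoids + [i])))
    # SWAP: apply the best cost-lowering (medoid, non-medoid) exchange until none exists.
    cost = total_cost(dist, medoids)
    while True:
        best_cost, best_set = cost, None
        for pos in range(k):
            for o in range(n):
                if o in medoids:
                    continue
                candidate = medoids[:pos] + [o] + medoids[pos + 1:]
                c = total_cost(dist, candidate)
                if c < best_cost:
                    best_cost, best_set = c, candidate
        if best_set is None:
            break
        medoids, cost = best_set, best_cost
    # Each point is labelled by the index of its nearest medoid.
    labels = [min(range(k), key=lambda j: dist[i][medoids[j]]) for i in range(n)]
    return medoids, labels
```

Because medoids are actual data points, PAM works with any dissimilarity matrix, not only Euclidean distances.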

  14. Task-specific image partitioning.

    PubMed

    Kim, Sungwoong; Nowozin, Sebastian; Kohli, Pushmeet; Yoo, Chang D

    2013-02-01

    Image partitioning is an important preprocessing step for many of the state-of-the-art algorithms used for performing high-level computer vision tasks. Typically, partitioning is conducted without regard to the task at hand. We propose a task-specific image partitioning framework that produces a region-based image representation leading to higher task performance than that reached using any task-oblivious partitioning framework or the few existing supervised partitioning frameworks. The proposed method partitions the image by means of correlation clustering, maximizing a linear discriminant function defined over a superpixel graph. The parameters of the discriminant function that define task-specific similarity/dissimilarity among superpixels are estimated based on a structured support vector machine (S-SVM) using task-specific training data. The S-SVM learning leads to better generalization ability, while the construction of the superpixel graph used to define the discriminant function allows a rich set of features to be incorporated to improve discriminability and robustness. We evaluate the learned task-aware partitioning algorithms on three benchmark datasets. Results show that task-aware partitioning leads to better labeling performance than the partitioning computed by the state-of-the-art general-purpose and supervised partitioning algorithms. We believe that the task-specific image partitioning paradigm is widely applicable to improving performance in high-level image understanding tasks.

  15. A classification of substance-dependent men on temperament and severity variables.

    PubMed

    Henderson, Melinda J; Galen, Luke W

    2003-06-01

    This study examined the validity of classifying substance abusers based on temperament and dependence severity, and expanded the scope of typology differences to proximal determinants of use (e.g., expectancies, motives). Patients were interviewed about substance use, depression, and family history of alcohol and drug abuse. Self-report instruments measuring temperament, expectancies, and motives were completed. Participants were 147 male veterans admitted to inpatient substance abuse treatment at a U.S. Department of Veterans Affairs medical center. Cluster analysis identified four types of users: two high substance problem severity groups and two low substance problem severity groups. The two high-problem-severity, early-onset groups differed only on the cluster variable of negative affectivity (NA), but showed differences in antisocial personality characteristics, hypochondriasis, and coping motives for alcohol. The two low-problem-severity groups were distinguished by age of onset and positive affectivity (PA). The late-onset, low-PA group had a higher incidence of depression, a greater tendency to use substances in solitary contexts, and lower enhancement motives for alcohol compared to the early-onset, high-PA cluster. The four-cluster solution yielded more distinctions on external criteria than the two-cluster solution. Such temperament variation within both high and low severity substance abusers may be important for treatment planning.

  16. Variability of O3 and NO2 profile shapes during DISCOVER-AQ: Implications for satellite observations and comparisons to model-simulated profiles

    NASA Astrophysics Data System (ADS)

    Flynn, Clare Marie; Pickering, Kenneth E.; Crawford, James H.; Weinheimer, Andrew J.; Diskin, Glenn; Thornhill, K. Lee; Loughner, Christopher; Lee, Pius; Strode, Sarah A.

    2016-12-01

    To investigate the variability of in situ profile shapes under a variety of meteorological and pollution conditions, results are presented of an agglomerative hierarchical cluster analysis of the in situ O3 and NO2 profiles for each of the four campaigns of the NASA DISCOVER-AQ mission. Understanding the observed profile variability for these trace gases is useful for understanding the accuracy of the assumed profile shapes used in satellite retrieval algorithms as well as for understanding the correlation between satellite column observations and surface concentrations. The four campaigns of the DISCOVER-AQ mission took place in Maryland during July 2011, the San Joaquin Valley of California during January-February 2013, the Houston, Texas, metropolitan region during September 2013, and the Denver-Front Range region of Colorado during July-August 2014. Several distinct profile clusters emerged for the California, Texas, and Colorado campaigns for O3, indicating significant variability of O3 profile shapes, while the Maryland campaign presented only one distinct O3 cluster. In contrast, very few distinct profile clusters emerged for NO2 during any campaign for this particular clustering technique, indicating that NO2 profile behavior was relatively uniform throughout each campaign. Changes in NO2 profile shape were evident as the boundary layer evolved through the day, but they were apparently not significant enough to yield more clusters. The degree of vertical mixing (as indicated by temperature lapse rate) associated with each cluster exerted an important influence on the shapes of the median cluster profiles for O3, and also affected the correlations between the associated column and surface data for each O3 cluster.
The correlation analyses suggest satellites may have the best chance to relate to surface O3 under the conditions encountered during the Maryland campaign (Clusters 1 and 2), which include deep, convective boundary layers and few interruptions to this connection from complex meteorology, chemical environments, or orography. The regional CMAQ model captured the shape factors for O3, and moderately well captured the NO2 shape factors, for the conditions associated with the Maryland campaign, suggesting that a regional air quality model may adequately specify a priori profile shapes for remote sensing retrievals. CMAQ shape factor profiles were not as well represented for the other regions.
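    The abstract does not state the distance measure or linkage rule. Assuming Euclidean distance between profiles sampled at common altitude levels and average linkage, agglomerative hierarchical clustering can be sketched as below; instead of building the full dendrogram, it merges clusters until a chosen number remains. The toy profiles in the check are made up.

```python
import math

def agglomerative(profiles, n_clusters):
    """Average-linkage agglomerative clustering (an assumed linkage rule):
    repeatedly merge the two clusters whose members have the smallest
    mean pairwise Euclidean distance, until n_clusters remain."""
    clusters = [[i] for i in range(len(profiles))]

    def linkage(a, b):
        total = sum(math.dist(profiles[i], profiles[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so index x is unaffected
    return clusters
```

Each profile is treated as one vector (for example, mixing ratios at fixed altitude levels), so profiles must be interpolated onto a shared vertical grid before clustering.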

  17. The link between eddy-driven jet variability and weather regimes in the North Atlantic-European sector

    NASA Astrophysics Data System (ADS)

    Madonna, E.; Li, C.; Grams, C. M.; Woollings, T.

    2017-12-01

    Understanding the variability of the North Atlantic eddy-driven jet is key to unravelling the dynamics, predictability and climate change response of extratropical weather in the region. This study aims to 1) reconcile two perspectives on wintertime variability in the North Atlantic-European sector and 2) clarify their link to atmospheric blocking. Two common views of wintertime variability in the North Atlantic are the zonal-mean framework comprising three preferred locations of the eddy-driven jet (southern, central, northern), and the weather regime framework comprising four classical North Atlantic-European regimes (Atlantic ridge AR, zonal ZO, European/Scandinavian blocking BL, Greenland anticyclone GA). We use a k-means clustering algorithm to characterize the two-dimensional variability of the eddy-driven jet stream, defined by the lower tropospheric zonal wind in the ERA-Interim reanalysis. The first three clusters capture the central jet and northern jet, along with a new mixed jet configuration; a fourth cluster is needed to recover the southern jet. The mixed cluster represents a split or strongly tilted jet, neither of which is well described in the zonal-mean framework, and has a persistence of about one week, similar to the other clusters. Connections between the preferred jet locations and weather regimes are corroborated: southern to GA, central to ZO, and northern to AR. In addition, the new mixed cluster is found to be linked to European/Scandinavian blocking, whose relation to the eddy-driven jet was previously unclear. The results highlight the necessity of bridging from weather to climate scales for a deeper understanding of atmospheric circulation variability.
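    The abstract applies k-means to two-dimensional fields of lower-tropospheric zonal wind. Treating each wind field (one per time step) as a single flattened vector, the core of Lloyd's k-means algorithm can be sketched as follows. The naive initialization from the first k points is a simplifying assumption (practical analyses normally use several randomized starts), and the toy vectors in the check are made up.

```python
import math

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid update.
    Initial centroids are simply the first k points (a simplifying assumption)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[j])  # leave an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: the assignments can no longer change
        centroids = new_centroids
    return labels, centroids
```

Each resulting centroid is itself a mean wind field, which is what gets interpreted as a jet configuration (for example, central, northern or mixed).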

  18. CDF trigger interface board 'FRED'

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Campbell, M.; Dell' Orso, M.; Giannetti, P.

    1985-08-01

    We describe FASTBUS boards which interface sixteen different trigger interrupts to the Collider Detector Facility (CDF) data acquisition system. The boards are known to CDF by the acronym 'FRED'. The data acquisition scheme for CDF allows up to 16 different parts of the detector, called 'Partitions', to run independently. Four partitions are reserved for physics runs and sophisticated calibration and debugging: they use the common Level 1 and Level 2 trigger logic and have access to information from all the components of the CDF detector. These four partitions are called 'CDF Partitions'. The remaining twelve partitions have no access to the common trigger logic and provide their own Level 1 and Level 2 signals: they are called 'Autonomous Partitions'. FRED collects and interprets signals from independent parts of the CDF trigger system and delivers Level 1 and Level 2 responses to the Trigger Supervisors (FASTBUS masters which control the data acquisition process in each partition).

  19. Variability in body size and shape of UK offshore workers: A cluster analysis approach.

    PubMed

    Stewart, Arthur; Ledingham, Robert; Williams, Hector

    2017-01-01

    Male UK offshore workers have enlarged dimensions compared with UK norms, and knowledge of the specific sizes and shapes typifying their physiques will assist a range of functions related to health and ergonomics. A representative sample of the UK offshore workforce (n = 588) underwent 3D photonic scanning, from which 19 extracted dimensional measures were used in k-means cluster analysis to characterise physique groups. Of the 11 resulting clusters, four somatotype groups were expressed: one cluster was muscular and lean, four had greater muscularity than adiposity, three had equal adiposity and muscularity, and three had greater adiposity than muscularity. Some clusters appeared constitutionally similar to others, differing only in absolute size. These cluster centroids represent an evidence base for future designs in apparel and other applications where body size and proportions affect functional performance. They also constitute phenotypic evidence providing insight into the 'offshore culture' which may underpin the enlarged dimensions of offshore workers. Copyright © 2016 Elsevier Ltd. All rights reserved.

  20. Rocky Mountain spotted fever in Georgia, 1961-75: analysis of social and environmental factors affecting occurrence.

    PubMed Central

    Newhouse, V F; Choi, K; Holman, R C; Thacker, S B; D'Angelo, L J; Smith, J D

    1986-01-01

    For the period of 1961 through 1975, 10 geographic and sociologic variables in each of the 159 counties of Georgia were analyzed to determine how they were correlated with the occurrence of Rocky Mountain spotted fever (RMSF). Combinations of variables were transformed into a smaller number of factors using principal-component analysis. Based upon the relative values of these factors, geographic areas of similarity were delineated by cluster analysis. It was found by use of these analyses that the counties of the State formed four similarity clusters, which we called south, central, lower north and upper north. When the incidence of RMSF was subsequently calculated for each of these regions of similarity, the regions had differing RMSF incidence; low in the south and upper north, moderate in the central, and high in the lower north. The four similarity clusters agreed closely with the incidence of RMSF when both were plotted on a map. Thus, when analyzed simultaneously, the 10 variables selected could be used to predict the occurrence of RMSF. The most important variables were those of climate and geography. Of secondary, but still major importance, were the changes over the 15-year period in variables associated with humans and their environmental alterations. Detailed examination of these factors has permitted quantitative evaluation of the simultaneous impacts of the geographic and sociologic variables on the occurrence of RMSF in Georgia. These analyses could be updated to reflect changes in the relevant variables and tested as a means of identifying new high risk areas for RMSF in the State. More generally, this method might be adapted to clarify our understanding of the relative importance of individual variables in the ecology of other diseases or environmental health problems. PMID:3090609

  1. Interaction of multiple biomimetic antimicrobial polymers with model bacterial membranes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Baul, Upayan, E-mail: upayanb@imsc.res.in; Vemparala, Satyavani, E-mail: vani@imsc.res.in; Kuroda, Kenichi, E-mail: kkuroda@umich.edu

    Using atomistic molecular dynamics simulations, the interaction of multiple synthetic random copolymers based on methacrylates with prototypical bacterial membranes is investigated. The simulations show that the cationic polymers form a micellar aggregate in the water phase and that the aggregate, when interacting with the bacterial membrane, induces clustering of oppositely charged anionic lipid molecules and enhances ordering of lipid chains. The model bacterial membrane consequently develops lateral inhomogeneity in its membrane thickness profile compared to the polymer-free system. The individual polymers in the aggregate are released into the bacterial membrane in a phased manner, and the simulations suggest that the most probable location of the partitioned polymers is near the 1-palmitoyl-2-oleoyl-phosphatidylglycerol (POPG) clusters. The partitioned polymers preferentially adopt facially amphiphilic conformations at the lipid-water interface, despite lacking intrinsic secondary structures such as the α-helix or β-sheet found in naturally occurring antimicrobial peptides.

  2. Competitive learning with pairwise constraints.

    PubMed

    Covões, Thiago F; Hruschka, Eduardo R; Ghosh, Joydeep

    2013-01-01

    Constrained clustering has been an active research topic over the last decade, but most studies focus on batch-mode algorithms. This brief introduces two algorithms for on-line constrained learning: on-line linear constrained vector quantization error (O-LCVQE) and constrained rival penalized competitive learning (C-RPCL). The former is a variant of the LCVQE algorithm for on-line settings, whereas the latter is an adaptation of the (on-line) RPCL algorithm to constrained clustering. Accuracy results, measured by normalized mutual information (NMI), from experiments with nine datasets show that the partitions induced by O-LCVQE are competitive with those found by the (batch-mode) LCVQE. Surprisingly, C-RPCL provides better partitions (in terms of NMI) than this strong baseline for most of the datasets. Experiments on a large dataset also show that on-line algorithms for constrained clustering can significantly reduce computational time.
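    NMI has several normalizations in the literature, and the abstract does not say which one was used. A minimal sketch, assuming the geometric-mean form NMI(U, V) = I(U; V) / sqrt(H(U) H(V)) computed from label counts; the handling of single-cluster partitions is a convention chosen here, not part of the paper.

```python
import math
from collections import Counter

def nmi(labels_a, labels_b):
    """Normalized mutual information between two partitions of the same points
    (geometric-mean normalization). Label values themselves are irrelevant."""
    n = len(labels_a)
    ca, cb = Counter(labels_a), Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    mi = sum((nab / n) * math.log(n * nab / (ca[a] * cb[b])) for (a, b), nab in joint.items())
    ha = -sum((c / n) * math.log(c / n) for c in ca.values())
    hb = -sum((c / n) * math.log(c / n) for c in cb.values())
    if ha == 0 or hb == 0:
        # Convention: two single-cluster partitions agree; otherwise no shared information.
        return 1.0 if ha == hb else 0.0
    return mi / math.sqrt(ha * hb)
```

NMI is invariant to relabeling: [0, 0, 1, 1] and [1, 1, 0, 0] score 1, while two independent splits such as [0, 0, 1, 1] and [0, 1, 0, 1] score 0.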

  3. Whole Blood Gene Expression Profiling Predicts Severe Morbidity and Mortality in Cystic Fibrosis: A 5-Year Follow-Up Study.

    PubMed

    Saavedra, Milene T; Quon, Bradley S; Faino, Anna; Caceres, Silvia M; Poch, Katie R; Sanders, Linda A; Malcolm, Kenneth C; Nichols, David P; Sagel, Scott D; Taylor-Cousar, Jennifer L; Leach, Sonia M; Strand, Matthew; Nick, Jerry A

    2018-05-01

    Cystic fibrosis pulmonary exacerbations accelerate pulmonary decline and increase mortality. Previously, we identified a 10-gene leukocyte panel measured directly from whole blood, which indicates response to exacerbation treatment. We hypothesized that molecular characteristics of exacerbations could also predict future disease severity. We tested whether a 10-gene panel measured from whole blood could identify patient cohorts at increased risk for severe morbidity and mortality, beyond standard clinical measures. Transcript abundance for the 10-gene panel was measured from whole blood at the beginning of exacerbation treatment (n = 57). A hierarchical cluster analysis of subjects based on their gene expression was performed, yielding four molecular clusters. An analysis of cluster membership and outcomes incorporating an independent cohort (n = 21) was completed to evaluate robustness of cluster partitioning of genes to predict severe morbidity and mortality. The four molecular clusters were analyzed for differences in forced expiratory volume in 1 second, C-reactive protein, return to baseline forced expiratory volume in 1 second after treatment, time to next exacerbation, and time to morbidity or mortality events (defined as lung transplant referral, lung transplant, intensive care unit admission for respiratory insufficiency, or death). Clustering based on gene expression discriminated between patient groups with significant differences in forced expiratory volume in 1 second, admission frequency, and overall morbidity and mortality. At 5 years, all subjects in cluster 1 (very low risk) were alive and well, whereas 90% of subjects in cluster 4 (high risk) had suffered a major event (P = 0.0001). In multivariable analysis, the ability of gene expression to predict clinical outcomes remained significant, despite adjustment for forced expiratory volume in 1 second, sex, and admission frequency. 
The robustness of gene clustering to categorize patients appropriately in terms of clinical characteristics, and short- and long-term clinical outcomes, remained consistent, even when adding in a secondary population with significantly different clinical outcomes. Whole blood gene expression profiling allows molecular classification of acute pulmonary exacerbations, beyond standard clinical measures, providing a predictive tool for identifying subjects at increased risk for mortality and disease progression.

  4. The Cluster AgeS Experiment (CASE). Variable Stars in the Field of the Globular Cluster M22

    NASA Astrophysics Data System (ADS)

    Rozyczka, M.; Thompson, I. B.; Pych, W.; Narloch, W.; Poleski, R.; Schwarzenberg-Czerny, A.

    2017-09-01

    The field of the globular cluster M22 (NGC 6656) was monitored between 2000 and 2008 in a search for variable stars. BV light curves were obtained for 359 periodic, likely periodic, and long-term variables, 238 of which are new detections. Thirty-nine newly detected variables and 63 previously known ones are members or likely members of the cluster, including 20 SX Phe, 10 RRab and 16 RRc type pulsators, one BL Her type pulsator, 21 contact binaries, and 9 detached or semi-detached eclipsing binaries. The most interesting of the identified objects are V112, a bright multimode SX Phe pulsator; V125, a β Lyr type binary on the blue horizontal branch; V129, a blue/yellow straggler with a W UMa-like light curve, located halfway between the extreme horizontal branch and the red giant branch; and V134, an extreme horizontal branch object with P=2.33 d and a nearly sinusoidal light curve. All four of them are proper motion members of the cluster. Among nonmembers, we found a P=2.83 d detached eclipsing binary hosting a δ Sct type pulsator and a peculiar P=0.93 d binary with ellipsoidal modulation and a narrow minimum in the middle of one of the descending shoulders of the sinusoid. We also collected substantial new data for previously known variables. In particular, we revise the statistics of the occurrence of the Blazhko effect in RR Lyr type variables of M22.

  5. Attention-based classification pattern, a research domain criteria framework, in youths with bipolar disorder and attention-deficit/hyperactivity disorder.

    PubMed

    Kleinman, Ana; Caetano, Sheila Cavalcante; Brentani, Helena; Rocca, Cristiana Castanho de Almeida; dos Santos, Bernardo; Andrade, Enio Roberto; Zeni, Cristian Patrick; Tramontina, Silzá; Rohde, Luis Augusto Paim; Lafer, Beny

    2015-03-01

    The National Institute of Mental Health has initiated the Research Domain Criteria (RDoC) project. Instead of using disorder categories as the basis for grouping individuals, the RDoC suggests finding relevant dimensions that can cut across traditional disorders. Our aim was to use the RDoC's framework to study patterns of attention deficit based on results of Conners' Continuous Performance Test (CPT II) in youths diagnosed with bipolar disorder (BD), attention-deficit/hyperactivity disorder (ADHD), BD+ADHD and controls. Eighteen healthy controls, 23 patients with ADHD, 10 with BD and 33 with BD+ADHD, aged 12-17 years, were assessed. Pattern recognition was used to partition subjects into clusters based simultaneously on their performance in all CPT II variables. A Fisher's linear discriminant analysis was used to build a classifier. Using cluster analysis, the entire sample set was best clustered into two new groups, A and B, independently of the original diagnoses. ADHD and BD+ADHD were divided almost 50% in each subgroup, and there was an agglomeration of controls and BD in group B. Group A presented greater impairment, with higher means in all CPT II variables and lower Children's Global Assessment Scale scores. We found a high cross-validated classification accuracy for groups A and B: 95.2%. Variability of response time was the strongest CPT II measure in the discriminative pattern between groups A and B. Our classificatory exercise supports the concept behind new approaches, such as the RDoC framework, for child and adolescent psychiatry. Our approach was able to define clinical subgroups that could be used in future pathophysiological and treatment studies. © The Royal Australian and New Zealand College of Psychiatrists 2014.

  6. A mathematical programming approach for sequential clustering of dynamic networks

    NASA Astrophysics Data System (ADS)

    Silva, Jonathan C.; Bennett, Laura; Papageorgiou, Lazaros G.; Tsoka, Sophia

    2016-02-01

    A common analysis performed on dynamic networks is community structure detection, a challenging problem that aims to track the temporal evolution of network modules. An emerging area in this field is evolutionary clustering, where the community structure of a network snapshot is identified by taking into account both its current state and previous time points. Based on this concept, we have developed a mixed integer non-linear programming (MINLP) model, SeqMod, that sequentially clusters each snapshot of a dynamic network. The modularity metric is used to determine the quality of the community structure of the current snapshot, and the historical cost is accounted for by optimising the number of node pairs co-clustered at the previous time point that remain so in the current snapshot partition. Our method is tested on social networks of interactions among high school students, college students and members of the Brazilian Congress. We show that, for an adequate parameter setting, our algorithm detects the classes that these students belong to more accurately than partitioning each time step individually or partitioning the aggregated snapshots. Our method also detects drastic discontinuities in interaction patterns across network snapshots. Finally, we present comparative results with similar community detection methods for time-dependent networks from the literature. Overall, we illustrate the applicability of mathematical programming as a flexible, adaptable and systematic approach for these community detection problems. Contribution to the Topical Issue "Temporal Network Theory and Applications", edited by Petter Holme.
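    SeqMod's MINLP formulation is not given in the abstract, but the two quantities it trades off can be computed directly for any candidate partition: Newman's modularity Q of the current snapshot, and the number of node pairs co-clustered at the previous time point that remain together. A sketch, assuming an undirected, unweighted edge list and partitions given as node-to-community dictionaries; how SeqMod weights the two terms is not shown.

```python
from collections import defaultdict
from itertools import combinations

def modularity(edges, community):
    """Newman modularity Q = sum_c [L_c / m - (d_c / 2m)^2] for an undirected,
    unweighted graph, where L_c counts edges inside community c and d_c is the
    total degree of its nodes."""
    m = len(edges)
    internal = defaultdict(int)
    degree_sum = defaultdict(int)
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)

def preserved_pairs(previous, current):
    """Number of node pairs co-clustered in the previous snapshot that are still co-clustered."""
    shared = sorted(set(previous) & set(current))
    return sum(1 for a, b in combinations(shared, 2)
               if previous[a] == previous[b] and current[a] == current[b])
```

For two triangles joined by a single edge, splitting the graph into the two triangles gives Q = 5/14 (about 0.357).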

  7. Procedure of Partitioning Data Into Number of Data Sets or Data Group - A Review

    NASA Astrophysics Data System (ADS)

    Kim, Tai-Hoon

    The goal of clustering is to decompose a dataset into groups of similar items based on an objective function. Several well-established algorithms already exist for data clustering. The objective of these algorithms is to divide the data points of the feature space into a number of groups (or classes) so that a predefined set of criteria is satisfied. This article compares the effectiveness and efficiency of traditional data clustering algorithms. To evaluate the performance of the clustering algorithms, the Minkowski score is computed on several data sets.
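    The abstract does not define the Minkowski score. One common definition in the clustering literature compares co-membership matrices, MS = ||T - C|| / ||T||, where T and C record whether two points share a cluster in the reference and in the computed partition. Lower is better, and 0 means perfect agreement. A sketch, assuming the norm is taken over distinct point pairs:

```python
import math
from itertools import combinations

def minkowski_score(reference, predicted):
    """||T - C|| / ||T|| over co-membership indicators of distinct point pairs.
    0 means the predicted partition reproduces the reference exactly.
    Raises ZeroDivisionError if the reference has no co-clustered pairs."""
    n11 = n10 = n01 = 0  # together in both / only in reference / only in prediction
    for i, j in combinations(range(len(reference)), 2):
        same_ref = reference[i] == reference[j]
        same_pred = predicted[i] == predicted[j]
        if same_ref and same_pred:
            n11 += 1
        elif same_ref:
            n10 += 1
        elif same_pred:
            n01 += 1
    return math.sqrt((n10 + n01) / (n11 + n10))
```

Unlike accuracy-style measures, the score needs no matching between reference and predicted cluster labels, and it penalizes both wrongly merged and wrongly split pairs.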

  8. Electric-field-induced association of colloidal particles

    NASA Astrophysics Data System (ADS)

    Fraden, Seth; Hurd, Alan J.; Meyer, Robert B.

    1989-11-01

    Dilute suspensions of micron diameter dielectric spheres confined to two dimensions are induced to aggregate linearly by application of an electric field. The growth of the average cluster size agrees well with the Smoluchowski equation, but the evolution of the measured cluster size distribution exhibits significant departures from theory at large times due to the formation of long linear clusters which effectively partition space into isolated one-dimensional strips.
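    The abstract does not give the aggregation kernel used. For illustration only, the sketch assumes a constant kernel K, for which a monomer-only start has a closed-form number-averaged cluster size S(t) = 1 + K n0 t / 2 (linear growth) to check against. Field-induced chaining may well call for a different kernel. The equation is integrated with forward Euler and truncated at a maximum cluster size.

```python
def smoluchowski_constant_kernel(n0, kernel, t_end, dt, max_size):
    """Forward-Euler integration of the discrete Smoluchowski equation
        dn_k/dt = (K/2) sum_{i+j=k} n_i n_j - K n_k sum_j n_j
    with a constant kernel K (an assumption), starting from monomers only.
    Cluster sizes above max_size are dropped. Returns the number-averaged
    cluster size sum(k n_k) / sum(n_k) at t_end."""
    n = [0.0] * (max_size + 1)  # n[k] = concentration of k-mers; index 0 unused
    n[1] = n0
    for _ in range(round(t_end / dt)):
        total = sum(n)
        dn = [0.0] * (max_size + 1)
        for k in range(1, max_size + 1):
            gain = 0.5 * kernel * sum(n[i] * n[k - i] for i in range(1, k))
            loss = kernel * n[k] * total
            dn[k] = gain - loss
        n = [x + dt * d for x, d in zip(n, dn)]
    mass = sum(k * x for k, x in enumerate(n))
    return mass / sum(n)
```

With n0 = 1, K = 1 and t = 2, the analytic mean size is 2, and the sketch should agree closely for a small step and a size cutoff well above the mean.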

  9. Zoning method for environmental engineering geological patterns in underground coal mining areas.

    PubMed

    Liu, Shiliang; Li, Wenping; Wang, Qiqing

    2018-09-01

    Environmental engineering geological patterns (EEGPs) are used to express the trend and intensity of changes in the eco-geological environment caused by mining in underground coal mining areas, a complex process controlled by multiple factors. A new zoning method for EEGPs was developed based on variable-weight theory (VWT), in which the weights of factors vary with their values. The method was applied to the Yushenfu mining area, Shaanxi, China. First, the mechanism by which mining produces EEGPs was elucidated, and four types of EEGPs were proposed. Subsequently, 13 key control factors were selected from mining conditions, lithosphere, hydrosphere, ecosphere, and climatic conditions, and their thematic maps were constructed using ArcGIS software and remote-sensing technologies. Then, a stimulation-punishment variable-weight model was built to calculate the variable weights of each factor; it was derived by partitioning the study area into basic evaluation units, constructing a partition state-variable-weight vector, and determining the variable-weight intervals. On this basis, a zoning mathematical model of EEGPs was established, and the zoning results were analyzed. For comparison, traditional constant-weight theory (CWT) was also applied to divide the EEGPs, and the zoning results obtained using VWT and CWT were compared. Field verification indicates that VWT is more accurate and reliable than CWT. The zoning results are consistent with actual conditions and provide a key basis for planning the rational development of coal resources and the protection of the eco-geological environment. Copyright © 2018 Elsevier B.V. All rights reserved.

  10. Phenotyping asthma, rhinitis and eczema in MeDALL population-based birth cohorts: an allergic comorbidity cluster.

    PubMed

    Garcia-Aymerich, J; Benet, M; Saeys, Y; Pinart, M; Basagaña, X; Smit, H A; Siroux, V; Just, J; Momas, I; Rancière, F; Keil, T; Hohmann, C; Lau, S; Wahn, U; Heinrich, J; Tischer, C G; Fantini, M P; Lenzi, J; Porta, D; Koppelman, G H; Postma, D S; Berdel, D; Koletzko, S; Kerkhof, M; Gehring, U; Wickman, M; Melén, E; Hallberg, J; Bindslev-Jensen, C; Eller, E; Kull, I; Lødrup Carlsen, K C; Carlsen, K-H; Lambrecht, B N; Kogevinas, M; Sunyer, J; Kauffmann, F; Bousquet, J; Antó, J M

    2015-08-01

    Asthma, rhinitis and eczema often co-occur in children, but their interrelationships at the population level have been poorly addressed. We assessed co-occurrence of childhood asthma, rhinitis and eczema using unsupervised statistical techniques. We included 17 209 children at 4 years and 14 585 at 8 years from seven European population-based birth cohorts (MeDALL project). At each age period, children were grouped, using partitioning cluster analysis, according to the distribution of 23 variables covering symptoms 'ever' and 'in the last 12 months', doctor diagnosis, age of onset and treatments of asthma, rhinitis and eczema; immunoglobulin E sensitization; weight; and height. We tested the sensitivity of our estimates to subject and variable selections, and to different statistical approaches, including latent class analysis and self-organizing maps. Two groups were identified as the optimal way to cluster the data at both age periods and in all sensitivity analyses. The first (reference) group at 4 and 8 years (including 70% and 79% of children, respectively) was characterized by a low prevalence of symptoms and sensitization, whereas the second (symptomatic) group exhibited more frequent symptoms and sensitization. Ninety-nine percent of children with comorbidities (co-occurrence of asthma, rhinitis and/or eczema) were included in the symptomatic group at both ages. The children's characteristics in both groups were consistent in all sensitivity analyses. At 4 and 8 years, at the population level, asthma, rhinitis and eczema can be classified together as an allergic comorbidity cluster. Future research including time-repeated assessments and biological data will help in understanding the interrelationships between these diseases. © 2015 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.

  11. Mothers of young children cluster into 4 groups based on psychographic food decision influencers.

    PubMed

    Byrd-Bredbenner, Carol; Abbot, Jaclyn Maurer; Cussler, Ellen

    2008-08-01

    This study explored how mothers grouped into clusters according to multiple psychographic food decision influencers and how the clusters differed in nutrient intake and nutrient content of their household food supply. Mothers (n = 201) completed a survey assessing basic demographic characteristics, food shopping and meal preparation activities, self and spouse employment, exposure to formal food or nutrition education, education level and occupation, weight status, nutrition and food preparation knowledge and skill, family member health and nutrition status, food decision influencer constructs, and dietary intake. In addition, an in-home inventory of 100 participants' household food supplies was conducted. Four distinct clusters emerged when 26 psychographic food choice influencers were evaluated. These clusters appear to be valid and robust classifications of mothers in that they discriminated well on the psychographic variables used to construct the clusters as well as numerous other variables not used in the cluster analysis. In addition, the clusters appear to transcend demographic variables that often segment audiences (eg, race, mother's age, socioeconomic status), thereby adding a new dimension to the way in which this audience can be characterized. Furthermore, psychographically defined clusters predicted dietary quality. This study demonstrates that mothers are not a homogeneous group and need to have their unique characteristics taken into consideration when designing strategies to promote health. These results can help health practitioners better understand factors affecting food decisions and tailor interventions to better meet the needs of mothers.

  12. Inter-individual variability and pattern recognition of surface electromyography in front crawl swimming.

    PubMed

    Martens, Jonas; Daly, Daniel; Deschamps, Kevin; Staes, Filip; Fernandes, Ricardo J

    2016-12-01

    Variability of electromyographic (EMG) recordings is a complex phenomenon rarely examined in swimming. Our purposes were to investigate inter-individual variability in muscle activation patterns during front crawl swimming and to assess whether clusters of subpatterns were present. Bilateral muscle activity of rectus abdominis (RA) and deltoideus medialis (DM) was recorded using wireless surface EMG in 15 adult male competitive swimmers. The amplitude of the median EMG trial of six upper arm movement cycles was used for the inter-individual variability assessment, quantified with the coefficient of variation, coefficient of quartile variation, the variance ratio and mean deviation. Key features, selected using qualitative and quantitative classification strategies, were entered into a k-means cluster analysis to examine whether strong subpatterns were present. Such strong subpatterns were found when clustering into two, three and four clusters. Inter-individual variability in this group of highly skilled swimmers was higher than in other cyclic movements, in contrast to what has been reported over the previous 50 years of EMG research in swimming. Coaches should therefore be careful when using overall reference EMG information to enhance the individual swimming technique of their athletes. Copyright © 2016 Elsevier Ltd. All rights reserved.
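    The four variability measures named above can be sketched in a few lines. The variance ratio here follows the common definition for a set of curves sampled at the same time points (within-time-point variance divided by total variance); that this exact formula matches the authors' implementation is an assumption, and the amplitudes in the example are hypothetical.

```python
import statistics


def coefficient_of_variation(x):
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(x) / statistics.mean(x)


def quartile_coefficient_of_variation(x):
    """(Q3 - Q1) / (Q3 + Q1), a robust analogue of the CV."""
    q1, _, q3 = statistics.quantiles(x, n=4)
    return (q3 - q1) / (q3 + q1)


def mean_deviation(x):
    """Mean absolute deviation from the mean."""
    m = statistics.mean(x)
    return sum(abs(v - m) for v in x) / len(x)


def variance_ratio(curves):
    """Variance ratio for n curves sampled at the same k time points.

    Values near 0 mean the curves are nearly identical; values near 1 mean the
    spread between curves is as large as the overall spread.
    """
    n, k = len(curves), len(curves[0])
    grand = sum(sum(c) for c in curves) / (n * k)
    point_means = [sum(c[t] for c in curves) / n for t in range(k)]
    within = sum((c[t] - point_means[t]) ** 2 for c in curves for t in range(k)) / (k * (n - 1))
    total = sum((c[t] - grand) ** 2 for c in curves for t in range(k)) / (k * n - 1)
    return within / total


if __name__ == "__main__":
    peaks = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical amplitudes, one per swimmer
    print(coefficient_of_variation(peaks), quartile_coefficient_of_variation(peaks), mean_deviation(peaks))
    print(variance_ratio([[0, 1, 3, 1], [0, 2, 4, 1], [0, 1, 2, 0]]))  # hypothetical activation curves
```

    The quartile-based coefficient is less sensitive to a single outlying swimmer than the ordinary CV, which is one reason to report both.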

  13. Variable Stars in the Unusual, Metal-Rich Globular Cluster NGC 6388

    NASA Technical Reports Server (NTRS)

    Pritzl, Barton J.; Smith, Horace A.; Catelan, Marcio; Sweigart, Allen V.; Oegerle, William R. (Technical Monitor)

    2002-01-01

    We have undertaken a search for variable stars in the metal-rich globular cluster NGC 6388 using time-series BV photometry. Twenty-eight new variables were found in this survey, increasing the total number of variables found near NGC 6388 to approx. 57. A significant number of the variables are RR Lyrae (approx. 14), most of which are probable cluster members. The periods of the fundamental mode RR Lyrae are shown to be unusually long compared to metal-rich field stars. The existence of these long period RRab stars suggests that the horizontal branch of NGC 6388 is unusually bright. This implies that the metallicity-luminosity relationship for RR Lyrae stars is not universal if the RR Lyrae in NGC 6388 are indeed metal-rich. We consider the alternative possibility that the stars in NGC 6388 may span a range in [Fe/H]. Four candidate Population II Cepheids were also found. If they are members of the cluster, NGC 6388 would be the most metal-rich globular cluster to contain Population II Cepheids. The mean V magnitude of the RR Lyrae is found to be 16.85 +/- 0.05 resulting in a distance of 9.0 to 10.3 kpc, for a range of assumed values of ⟨M_V⟩ for RR Lyrae. We determine the reddening of the cluster to be E(B - V) = 0.40 +/- 0.03 mag, with differential reddening across the face of the cluster. We discuss the difficulty in determining the Oosterhoff classification of NGC 6388 and NGC 6441 due to the unusual nature of their RR Lyrae, and address evolutionary constraints on a recent suggestion that they are of Oosterhoff type II.
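    The quoted distance range follows from the distance modulus once the reddening is converted to V-band extinction. The sketch assumes A_V = 3.1 E(B-V), a standard value that the abstract does not state, and uses illustrative ⟨M_V⟩ values of our own choosing. With ⟨M_V⟩ between roughly 0.55 and 0.85 mag it gives approximately 10.3 and 9.0 kpc, in line with the stated range.

```python
R_V = 3.1  # assumed ratio of total to selective extinction, A_V = R_V * E(B-V)


def distance_kpc(v_mean, abs_mag_v, ebv, r_v=R_V):
    """Distance from a mean apparent V magnitude, an absolute magnitude M_V and E(B-V)."""
    a_v = r_v * ebv                                # V-band extinction in magnitudes
    true_modulus = v_mean - a_v - abs_mag_v        # (m - M)_0
    return 10 ** (true_modulus / 5 + 1) / 1000.0   # parsecs -> kpc


for m_v in (0.55, 0.70, 0.85):  # illustrative RR Lyrae absolute magnitudes (assumed)
    print(f"<M_V> = {m_v:.2f}: d = {distance_kpc(16.85, m_v, 0.40):.1f} kpc")
```

    Because the cluster shows differential reddening, a single E(B-V) is itself an approximation; a 0.03 mag change in E(B-V) shifts the distance by about 4% under these assumptions.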

  14. SciSpark: In-Memory Map-Reduce for Earth Science Algorithms

    NASA Astrophysics Data System (ADS)

    Ramirez, P.; Wilson, B. D.; Whitehall, K. D.; Palamuttam, R. S.; Mattmann, C. A.; Shah, S.; Goodman, A.; Burke, W.

    2016-12-01

    We are developing a lightning-fast Big Data technology called SciSpark based on Apache™ Spark under a NASA AIST grant (PI Mattmann). Spark implements the map-reduce paradigm for parallel computing on a cluster, but emphasizes in-memory computation, "spilling" to disk only as needed, and so outperforms the disk-based Apache Hadoop by 100x in memory and by 10x on disk. SciSpark extends Spark to support Earth Science use in three ways: efficient ingest of N-dimensional geo-located arrays (physical variables) from netCDF3/4, HDF4/5, and/or OPeNDAP URLs; array operations for dense arrays in Scala and Java using the ND4S/ND4J or Breeze libraries; and operations to "split" datasets across a Spark cluster by time, space, or both. For example, a decade-long time series of geo-variables can be split across time to enable parallel "speedups" of analysis by day, month, or season. Similarly, very high-resolution climate grids can be partitioned into spatial tiles for parallel operations across rows, columns, or blocks. In addition, using Spark's gateway into Python, PySpark, one can utilize the entire ecosystem of numpy, scipy, etc. Finally, SciSpark Notebooks provide a modern eNotebook technology in which Scala, Python, or Spark SQL code is entered into cells in the Notebook and executed on the cluster, with results, plots, or graph visualizations displayed in "live widgets". We have exercised SciSpark by implementing three complex Use Cases: discovery and evolution of Mesoscale Convective Complexes (MCCs) in storms, yielding a graph of connected components; PDF Clustering of atmospheric state using parallel K-Means; and statistical "rollups" of geo-variables or model-to-obs. differences (i.e. mean, stddev, skewness, & kurtosis) by day, month, season, year, and multi-year. Geo-variables are ingested and split across the cluster using methods on the sciSparkContext object including netCDFVariables() for spatial decomposition and wholeNetCDFVariables() for time-series. 
The presentation will cover the architecture of SciSpark, the design of the scientific RDD (sRDD) data structures for N-dim. arrays, results from the three science Use Cases, example Notebooks, lessons learned from the algorithm implementations, and parallel performance metrics.
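    The statistical "rollups" are a map-reduce pattern: each sample is keyed by its period (the map step) and each group is reduced to summary moments. The plain-Python sketch below only illustrates that pattern on a single machine; it does not use the SciSpark or Spark APIs. In PySpark the same shape would typically be expressed with RDD methods such as `groupByKey` followed by `mapValues`. The function names and sample data are hypothetical.

```python
import math
from collections import defaultdict
from datetime import date


def moments(values):
    """Mean, population standard deviation, skewness and excess kurtosis."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    if m2 == 0:
        return {"mean": mean, "stddev": 0.0, "skewness": 0.0, "kurtosis": 0.0}
    return {
        "mean": mean,
        "stddev": math.sqrt(m2),
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2 - 3.0,
    }


def rollup(samples, key):
    """Map each (day, value) sample to a period key, then reduce each group to moments."""
    groups = defaultdict(list)
    for day, value in samples:
        groups[key(day)].append(value)
    return {k: moments(v) for k, v in sorted(groups.items())}


def by_month(day):
    return (day.year, day.month)


def by_year(day):
    return day.year


if __name__ == "__main__":
    samples = [(date(2010, 1, 1), 1.0), (date(2010, 1, 2), 3.0), (date(2010, 2, 1), 5.0)]
    print(rollup(samples, by_month))
    print(rollup(samples, by_year))
```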

  15. SLURM: Simple Linux Utility for Resource Management

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jette, M; Grondona, M

    2002-12-19

    Simple Linux Utility for Resource Management (SLURM) is an open source, fault-tolerant, and highly scalable cluster management and job scheduling system for Linux clusters of thousands of nodes. Components include machine status, partition management, job management, scheduling and stream copy modules. This paper presents an overview of the SLURM architecture and functionality.

  16. SLURM: Simple Linux Utility for Resource Management

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jette, M; Grondona, M

    2003-04-22

    Simple Linux Utility for Resource Management (SLURM) is an open source, fault-tolerant, and highly scalable cluster management and job scheduling system for Linux clusters of thousands of nodes. Components include machine status, partition management, job management, scheduling, and stream copy modules. This paper presents an overview of the SLURM architecture and functionality.

  17. Graph partitions and cluster synchronization in networks of oscillators

    PubMed Central

    Schaub, Michael T.; O’Clery, Neave; Billeh, Yazan N.; Delvenne, Jean-Charles; Lambiotte, Renaud; Barahona, Mauricio

    2017-01-01

    Synchronization over networks depends strongly on the structure of the coupling between the oscillators. When the coupling presents certain regularities, the dynamics can be coarse-grained into clusters by means of External Equitable Partitions of the network graph and their associated quotient graphs. We exploit this graph-theoretical concept to study the phenomenon of cluster synchronization, in which different groups of nodes converge to distinct behaviors. We derive conditions and properties of networks in which such clustered behavior emerges, and show that the ensuing dynamics is the result of the localization of the eigenvectors of the associated graph Laplacians linked to the existence of invariant subspaces. The framework is applied to both linear and non-linear models, first for the standard case of networks with positive edges, before being generalized to the case of signed networks with both positive and negative interactions. We illustrate our results with examples of both signed and unsigned graphs for consensus dynamics and for partial synchronization of oscillator networks under the master stability function as well as Kuramoto oscillators. PMID:27781454
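    An External Equitable Partition requires that every node in cell i has the same number (or total weight) of links into each other cell j; links inside a cell are unconstrained. Equivalently, with P the node-to-cell indicator matrix and L the graph Laplacian, L P = P L_π for the quotient Laplacian L_π = (PᵀP)⁻¹PᵀLP, and the eigenvalues of L_π are then a subset of those of L. The sketch below checks this condition numerically; the function name and the small example graph are our own illustrations.

```python
import numpy as np


def quotient_laplacian(adjacency, cells):
    """Return the quotient Laplacian if `cells` is an external equitable partition, else None."""
    A = np.asarray(adjacency, dtype=float)
    L = np.diag(A.sum(axis=1)) - A                  # graph Laplacian
    P = np.zeros((len(A), len(cells)))              # node-to-cell indicator matrix
    for j, cell in enumerate(cells):
        P[cell, j] = 1.0
    L_pi = np.linalg.solve(P.T @ P, P.T @ L @ P)    # (P^T P)^{-1} P^T L P
    # EEP condition: L P = P L_pi, i.e. every node of a cell has the same
    # number (total weight) of links into each other cell.
    return L_pi if np.allclose(L @ P, P @ L_pi) else None


if __name__ == "__main__":
    A = np.zeros((5, 5))
    for u in (0, 1):                                 # complete bipartite graph K_{2,3}
        for v in (2, 3, 4):
            A[u, v] = A[v, u] = 1.0
    A[2, 3] = A[3, 2] = 1.0                          # an edge inside a cell does not break the EEP
    L_pi = quotient_laplacian(A, [[0, 1], [2, 3, 4]])
    L = np.diag(A.sum(axis=1)) - A
    print(L_pi)
    print(np.sort(np.linalg.eigvals(L_pi).real), np.sort(np.linalg.eigvalsh(L)))
```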

  18. Demonstration of Monogamy Relations for Einstein-Podolsky-Rosen Steering in Gaussian Cluster States.

    PubMed

    Deng, Xiaowei; Xiang, Yu; Tian, Caixing; Adesso, Gerardo; He, Qiongyi; Gong, Qihuang; Su, Xiaolong; Xie, Changde; Peng, Kunchi

    2017-06-09

    Understanding how quantum resources can be quantified and distributed over many parties has profound applications in quantum communication. As one of the most intriguing features of quantum mechanics, Einstein-Podolsky-Rosen (EPR) steering is a useful resource for secure quantum networks. By reconstructing the covariance matrix of a continuous variable four-mode square Gaussian cluster state subject to asymmetric loss, we quantify the amount of bipartite steering with a variable number of modes per party, and verify recently introduced monogamy relations for Gaussian steerability, which establish quantitative constraints on the security of information shared among different parties. We observe a very rich structure for the steering distribution, and demonstrate one-way EPR steering of the cluster state under Gaussian measurements, as well as one-to-multimode steering. Our experiment paves the way for exploiting EPR steering in Gaussian cluster states as a valuable resource for multiparty quantum information tasks.
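    Bipartite Gaussian steerability can be computed directly from a covariance matrix. The sketch assumes the measure introduced by Kogias et al. (2015), G^{A→B} = Σ max(0, -ln ν̄_j), where ν̄_j are the symplectic eigenvalues of the Schur complement B - CᵀA⁻¹C, with the vacuum covariance normalized to the identity. It is illustrated on a two-mode squeezed vacuum, for which G = ln cosh 2r; it does not model the four-mode square cluster state, the asymmetric loss, or the monogamy tests of the experiment. Function names are hypothetical.

```python
import numpy as np


def symplectic_eigenvalues(M):
    """Symplectic eigenvalues of a 2n x 2n covariance matrix ordered (x1, p1, x2, p2, ...)."""
    n = M.shape[0] // 2
    omega = np.kron(np.eye(n), np.array([[0.0, 1.0], [-1.0, 0.0]]))
    ev = np.abs(np.linalg.eigvals(1j * omega @ M))
    return np.sort(ev)[::2]  # eigenvalues come in +/- pairs; keep one of each


def quadrature_indices(modes):
    return [q for m in modes for q in (2 * m, 2 * m + 1)]


def gaussian_steerability(sigma, steering_modes, steered_modes, tol=1e-12):
    """G^{A->B}: sum of -ln(nu) over symplectic eigenvalues nu < 1 of the Schur complement."""
    a = quadrature_indices(steering_modes)
    b = quadrature_indices(steered_modes)
    A = sigma[np.ix_(a, a)]
    B = sigma[np.ix_(b, b)]
    C = sigma[np.ix_(a, b)]
    schur = B - C.T @ np.linalg.solve(A, C)  # Schur complement of A in sigma_AB
    nus = symplectic_eigenvalues(schur)
    return max(0.0, float(-np.sum(np.log(nus[nus < 1.0 - tol]))))


def two_mode_squeezed(r):
    """Covariance matrix of a two-mode squeezed vacuum (vacuum = identity)."""
    c, s = np.cosh(2 * r), np.sinh(2 * r)
    Z = np.diag([1.0, -1.0])
    return np.block([[c * np.eye(2), s * Z], [s * Z, c * np.eye(2)]])


if __name__ == "__main__":
    for r in (0.0, 0.25, 0.5):
        sigma = two_mode_squeezed(r)
        print(r, gaussian_steerability(sigma, [0], [1]), np.log(np.cosh(2 * r)))
```

    For a multimode party, pass several mode indices (for example `[1, 2]`); the same Schur-complement formula then gives one-to-multimode steerability of the kind discussed above.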

  19. Demonstration of Monogamy Relations for Einstein-Podolsky-Rosen Steering in Gaussian Cluster States

    NASA Astrophysics Data System (ADS)

    Deng, Xiaowei; Xiang, Yu; Tian, Caixing; Adesso, Gerardo; He, Qiongyi; Gong, Qihuang; Su, Xiaolong; Xie, Changde; Peng, Kunchi

    2017-06-01

    Understanding how quantum resources can be quantified and distributed over many parties has profound applications in quantum communication. As one of the most intriguing features of quantum mechanics, Einstein-Podolsky-Rosen (EPR) steering is a useful resource for secure quantum networks. By reconstructing the covariance matrix of a continuous variable four-mode square Gaussian cluster state subject to asymmetric loss, we quantify the amount of bipartite steering with a variable number of modes per party, and verify recently introduced monogamy relations for Gaussian steerability, which establish quantitative constraints on the security of information shared among different parties. We observe a very rich structure for the steering distribution, and demonstrate one-way EPR steering of the cluster state under Gaussian measurements, as well as one-to-multimode steering. Our experiment paves the way for exploiting EPR steering in Gaussian cluster states as a valuable resource for multiparty quantum information tasks.

  20. Cardiovascular reactivity patterns and pathways to hypertension: a multivariate cluster analysis.

    PubMed

    Brindle, R C; Ginty, A T; Jones, A; Phillips, A C; Roseboom, T J; Carroll, D; Painter, R C; de Rooij, S R

    2016-12-01

    Substantial evidence links exaggerated mental-stress-induced blood pressure reactivity to future hypertension, but the results for heart rate reactivity are less clear. For this reason, a multivariate cluster analysis was carried out to examine the relationship between heart rate and blood pressure reactivity patterns and hypertension in a large prospective cohort (age range 55-60 years). Four clusters emerged with statistically different systolic and diastolic blood pressure and heart rate reactivity patterns. Cluster 1 was characterised by a relatively exaggerated blood pressure and heart rate response, while the blood pressure and heart rate responses of cluster 2 were relatively modest and in line with the sample mean. Cluster 3 was characterised by blunted cardiovascular stress reactivity across all variables and cluster 4 by an exaggerated blood pressure response and a modest heart rate response. Membership of cluster 4 conferred an increased risk of hypertension at 5-year follow-up (hazard ratio=2.98 (95% CI: 1.50-5.90), P<0.01) that survived adjustment for a host of potential confounding variables. These results suggest that cardiac reactivity plays a potentially important role in the link between blood pressure reactivity and hypertension and support the use of multivariate approaches to stress psychophysiology.
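    A reported hazard ratio and its 95% confidence interval can be checked for internal consistency. Assuming the interval is a Wald interval on the log scale, exp(ln HR ± 1.96·SE), as is usual for Cox models, the standard error, z statistic and p-value can be recovered. For HR = 2.98 (1.50 to 5.90) this gives z ≈ 3.1 and p ≈ 0.002, consistent with P<0.01, and the geometric midpoint of the interval (≈ 2.97) matches the point estimate.

```python
import math


def wald_from_ci(hr, lower, upper, z_crit=1.96):
    """Recover SE(log HR), the z statistic and a two-sided p-value from a 95% CI.

    Assumes the interval was computed as exp(log(hr) +/- z_crit * SE).
    """
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(hr) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return se, z, p


se, z, p = wald_from_ci(2.98, 1.50, 5.90)
print(f"SE = {se:.3f}, z = {z:.2f}, p = {p:.4f}")
print("geometric midpoint of CI:", round(math.sqrt(1.50 * 5.90), 2))
```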

  1. A roadmap of clustering algorithms: finding a match for a biomedical application.

    PubMed

    Andreopoulos, Bill; An, Aijun; Wang, Xiaogang; Schroeder, Michael

    2009-05-01

    Clustering is ubiquitously applied in bioinformatics with hierarchical clustering and k-means partitioning being the most popular methods. Numerous improvements of these two clustering methods have been introduced, as well as completely different approaches such as grid-based, density-based and model-based clustering. For improved bioinformatics analysis of data, it is important to match clusterings to the requirements of a biomedical application. In this article, we present a set of desirable clustering features that are used as evaluation criteria for clustering algorithms. We review 40 different clustering algorithms of all approaches and datatypes. We compare algorithms on the basis of desirable clustering features, and outline algorithms' benefits and drawbacks as a basis for matching them to biomedical applications.
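    Since k-means partitioning is singled out as one of the two most popular methods, a minimal sketch of Lloyd's algorithm may help fix ideas: alternate between assigning each point to its nearest centre and moving each centre to the mean of its points. The farthest-first initialization used here is a simple deterministic stand-in for k-means++ seeding, chosen for illustration; production libraries add multiple restarts and better seeding.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm with farthest-first initialization."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialization: one random point, then repeatedly the point farthest from all chosen centres.
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        # Assignment step: squared Euclidean distance from every point to every centre.
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Update step: move each centre to the mean of its points (keep it if the cluster is empty).
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    labels = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
    return labels, centers


if __name__ == "__main__":
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [10, 10], [10, 11], [11, 10], [11, 11]], dtype=float)
    labels, centers = kmeans(X, 2)
    print(labels, centers)
```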

  2. Exploring Different Patterns of Love Attitudes among Chinese College Students.

    PubMed

    Zeng, Xianglong; Pan, Yiqin; Zhou, Han; Yu, Shi; Liu, Xiangping

    2016-01-01

    Individual differences in love attitudes, and the relationship between love attitudes and other variables, have not been explored in depth in Asian cultures. This study conducted cluster analysis with data regarding love attitudes obtained from 389 college students in mainland China. The result of cluster analysis based on love-attitude scales distinguished four types of students: game players, rational lovers, emotional lovers, and absence lovers. These four groups of students showed significant differences in sexual attitudes and personality traits of deliberation and dutifulness but not self-discipline. The study's implications for future studies on love attitudes in certain cultural groups were also discussed.

  3. A CCD Search for Variable Stars of Spectral Type B in the Northern Hemisphere Open Clusters. VII. NGC 1502

    NASA Astrophysics Data System (ADS)

    Michalska, G.; Pigulski, A.; Stęślicki, M.; Narwid, A.

    2009-12-01

    We present results of a variability search in the field of the young open cluster NGC 1502. Eight variable stars were discovered. Of six other stars in the observed field that were suspected of variability, we confirm variability of two, including one β Cep star, NGC 1502-26. The remaining four suspects were found to be constant in our photometry. In addition, UBVI_C photometry of the well-known massive eclipsing binary SZ Cam was obtained. The new variable stars include: two eclipsing binaries, of which one is a relatively bright detached system with an EA-type light curve, an α2 CVn-type variable, an SPB candidate, a field RR Lyr star and three other variables showing variability of unknown origin. The variability of two of them is probably related to their emission in Hα, which has been measured by means of the α index obtained for 57 stars brighter than V ≈ 16 mag in the central part of the observed field. Four other non-variable stars with emission in Hα were also found. Additionally, we provide VI_C photometry for stars down to V = 17 mag and UB photometry for the ~50 brightest stars in the observed field. We also show that the 10 Myr isochrone fits the observed color-magnitude diagram very well if a distance of 1 kpc and a mean reddening E(V-I_C) = 0.9 mag are adopted.

  4. Canonical partition functions: ideal quantum gases, interacting classical gases, and interacting quantum gases

    NASA Astrophysics Data System (ADS)

    Zhou, Chi-Chun; Dai, Wu-Sheng

    2018-02-01

    In statistical mechanics, for a system with a fixed number of particles, e.g. a finite-size system, thermodynamic quantities strictly need to be calculated in the canonical ensemble. However, the canonical partition function is difficult to calculate. In this paper, based on the mathematical theory of symmetric functions, we suggest a method for calculating the canonical partition function of ideal quantum gases, including ideal Bose, Fermi, and Gentile gases. Moreover, we express the canonical partition functions of interacting classical and quantum gases given by the classical and quantum cluster expansion methods in terms of Bell polynomials. The virial coefficients of ideal Bose, Fermi, and Gentile gases are calculated from the exact canonical partition function. The virial coefficients of interacting classical and quantum gases are calculated from the canonical partition function by using the expansion of the Bell polynomial, rather than from the grand canonical potential.
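    For ideal Bose and Fermi gases, a standard route to the exact canonical partition function is the recursion Z_N = (1/N) Σ_{k=1}^{N} (±1)^{k+1} Z_1(kβ) Z_{N-k}, with Z_0 = 1 (plus sign for bosons, alternating sign for fermions). It follows from Newton's identities between power-sum and complete (bosons) or elementary (fermions) symmetric functions. The sketch below implements that recursion and checks it against brute-force enumeration; it is not the paper's Bell-polynomial formulation, and it does not cover Gentile statistics. The energy levels are arbitrary illustrative values.

```python
import math
from itertools import combinations, combinations_with_replacement


def z1(levels, beta):
    """Single-particle partition function Z_1(beta) = sum_i exp(-beta * e_i)."""
    return sum(math.exp(-beta * e) for e in levels)


def canonical_z(levels, n_particles, beta, kind="bose"):
    """Canonical Z_N of an ideal Bose or Fermi gas via
    Z_N = (1/N) * sum_{k=1..N} (+/-1)^(k+1) * Z_1(k*beta) * Z_{N-k}."""
    sign = 1.0 if kind == "bose" else -1.0
    z = [1.0]  # Z_0 = 1
    for n in range(1, n_particles + 1):
        acc = sum(sign ** (k + 1) * z1(levels, k * beta) * z[n - k] for k in range(1, n + 1))
        z.append(acc / n)
    return z[n_particles]


def canonical_z_brute(levels, n_particles, beta, kind="bose"):
    """Direct sum over allowed occupation patterns, for checking small systems."""
    pick = combinations_with_replacement if kind == "bose" else combinations
    return sum(math.exp(-beta * sum(levels[i] for i in occ))
               for occ in pick(range(len(levels)), n_particles))


if __name__ == "__main__":
    levels = [0.0, 0.5, 1.3, 2.0]  # illustrative single-particle energies
    for kind in ("bose", "fermi"):
        print(kind, canonical_z(levels, 3, 0.7, kind), canonical_z_brute(levels, 3, 0.7, kind))
```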

  5. A Photometric Search for Planets in the Open Cluster NGC 7086

    NASA Astrophysics Data System (ADS)

    Rosvick, Joanne M.; Robb, Russell

    2006-12-01

    In an attempt to discover short-period, Jupiter-mass planets orbiting solar-type stars in open clusters, we searched for planetary transits in the populous and relatively unstudied open cluster NGC 7086. A color-magnitude diagram constructed from new B and V photometry is presented, along with revised estimates of the cluster's color excess, distance modulus, and age. Several turnoff stars were observed spectroscopically in order to determine a color excess of E(B-V) = 0.83 +/- 0.02. Empirically fitting the main sequences of two young open clusters and the semiempirical zero-age main sequence of Vandenberg and Poll yielded a distance modulus of (V-M_V) = 13.4 +/- 0.3 mag. This corresponds to a true distance modulus of (m-M)_0 = 10.8 mag, or a distance of 1.5 kpc to NGC 7086. These values were used with isochrones from the Padova group to obtain a cluster age of 100 Myr. Eleven nights of R-band photometry were used to search for planetary transits. Differential magnitudes were constructed for each star in the cluster. Light curves for each star were produced on a night-to-night basis and inspected for variability. No planetary transits were apparent; however, some interesting variable stars were discovered: a pulsating variable that appears to be a member of the γ Dor class and four possible eclipsing binary stars, one of which may actually be a multiple system.
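    Differential magnitudes remove effects common to all stars in a frame, such as changing atmospheric transparency, by subtracting a comparison-star reference from each target measurement. The sketch below uses the mean magnitude of an ensemble of comparison stars and flags a light curve as variable when its scatter exceeds a multiple of the expected noise. Both the ensemble-mean reference and the 3-sigma criterion are simplifying assumptions for illustration, not the authors' reduction pipeline; all magnitudes are hypothetical.

```python
import statistics


def differential_light_curve(target_mags, comparison_mags):
    """Target magnitude minus the mean comparison-star magnitude, frame by frame.

    target_mags: one magnitude per frame.
    comparison_mags: for each frame, the magnitudes of the comparison stars.
    """
    return [t - statistics.mean(c) for t, c in zip(target_mags, comparison_mags)]


def flag_variable(diff_mags, noise_sigma, threshold=3.0):
    """Flag a light curve whose scatter exceeds `threshold` times the expected noise."""
    return statistics.stdev(diff_mags) > threshold * noise_sigma


if __name__ == "__main__":
    transparency = [0.0, 0.2, -0.1, 0.05]             # hypothetical frame-to-frame offsets (mag)
    comps = [[11.0 + o, 13.0 + o] for o in transparency]
    steady = [12.0 + o for o in transparency]
    dimming = [t + dm for t, dm in zip(steady, [0.0, 0.1, 0.1, 0.0])]
    for name, star in (("steady", steady), ("dimming", dimming)):
        diff = differential_light_curve(star, comps)
        print(name, [round(d, 3) for d in diff], flag_variable(diff, noise_sigma=0.01))
```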

  6. High- and low-level hierarchical classification algorithm based on source separation process

    NASA Astrophysics Data System (ADS)

    Loghmari, Mohamed Anis; Karray, Emna; Naceur, Mohamed Saber

    2016-10-01

    High-dimensional data applications have attracted considerable attention in recent years. We focus on the analysis of remote sensing data in high-dimensional spaces, such as hyperspectral data. From a methodological viewpoint, remote sensing data analysis is not a trivial task. Its complexity is caused by many factors, such as large spectral or spatial variability as well as the curse of dimensionality, which describes the problem of data sparseness. In this particular ill-posed problem, a reliable classification approach requires appropriate modeling of the classification process. The proposed approach is based on a hierarchical clustering algorithm designed to handle remote sensing data in high-dimensional space. One obvious way to perform dimensionality reduction is to use independent component analysis as a preprocessing step. The first particularity of our method is the special structure of its cluster tree. Most hierarchical algorithms associate leaves with individual clusters and start from a large number of individual classes equal to the number of pixels; in our approach, however, leaves are associated with the most relevant sources, which are represented along mutually independent axes to specifically represent some land covers associated with a limited number of clusters. These sources contribute to the refinement of the clustering by providing complementary rather than redundant information. The second particularity of our approach is that at each level of the cluster tree, we combine a high-level divisive clustering and a low-level agglomerative clustering. This approach reduces the computational cost, since the high-level divisive clustering is controlled by a simple Boolean operator, and optimizes the clustering results, since the low-level agglomerative clustering is guided by the most relevant independent sources. 
Then at each new step we obtain a new finer partition that will participate in the clustering process to enhance semantic capabilities and give good identification rates.

  7. Assessing Genetic Structure in Common but Ecologically Distinct Carnivores: The Stone Marten and Red Fox.

    PubMed

    Basto, Mafalda P; Santos-Reis, Margarida; Simões, Luciana; Grilo, Clara; Cardoso, Luís; Cortes, Helder; Bruford, Michael W; Fernandes, Carlos

    2016-01-01

    The identification of populations and spatial genetic patterns is important for ecological and conservation research, and spatially explicit individual-based methods have been recognised as powerful tools in this context. Mammalian carnivores are intrinsically vulnerable to habitat fragmentation but not much is known about the genetic consequences of fragmentation in common species. Stone martens (Martes foina) and red foxes (Vulpes vulpes) share a widespread Palearctic distribution and are considered habitat generalists, but in the Iberian Peninsula stone martens tend to occur in higher quality habitats. We compared their genetic structure in Portugal to see whether it is consistent with their differences in ecological plasticity, and also to illustrate an approach to explicitly delineate the spatial boundaries of consistently identified genetic units. We analysed microsatellite data using spatial Bayesian clustering methods (implemented in the software BAPS, GENELAND and TESS), a progressive partitioning approach and a multivariate technique (Spatial Principal Components Analysis, sPCA). Three consensus Bayesian clusters were identified for the stone marten. No consensus was achieved for the red fox, but one cluster was the most probable clustering solution. Progressive partitioning and sPCA suggested additional clusters in the stone marten but they were not consistent among methods and were geographically incoherent. The contrasting results between the two species are consistent with the literature reporting stricter ecological requirements of the stone marten in the Iberian Peninsula. The observed genetic structure in the stone marten may have been influenced by landscape features, particularly rivers, and fragmentation. We suggest that an approach based on a consensus clustering solution of multiple different algorithms may provide an objective and effective means to delineate potential boundaries of inferred subpopulations. 
sPCA and progressive partitioning offer further verification of possible population structure and may be useful for revealing cryptic spatial genetic patterns worth further investigation.
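    One generic way to turn several clustering runs into a consensus is a co-association matrix: count how often each pair of individuals is placed in the same cluster, then join pairs whose agreement reaches a threshold. The sketch below shows that idea with hypothetical cluster labels standing in for outputs from different programs; it is not the procedure the authors used to compare BAPS, GENELAND and TESS, and the `min_agreement` parameter is our own.

```python
from itertools import combinations


def co_association_counts(partitions):
    """counts[i][j] = number of partitions that put individuals i and j in the same cluster."""
    n = len(partitions[0])
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i, j in combinations(range(n), 2):
            if labels[i] == labels[j]:
                counts[i][j] += 1
                counts[j][i] += 1
    return counts


def consensus_labels(partitions, min_agreement=1.0):
    """Join individuals whose co-assignment frequency reaches min_agreement (union-find)."""
    n, m = len(partitions[0]), len(partitions)
    counts = co_association_counts(partitions)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if counts[i][j] / m >= min_agreement - 1e-9:
            parent[find(i)] = find(j)
    # Relabel cluster roots 0, 1, 2, ... in order of first appearance.
    relabel = {}
    return [relabel.setdefault(find(i), len(relabel)) for i in range(n)]


if __name__ == "__main__":
    runs = [[0, 0, 1, 1, 2], [5, 5, 7, 7, 7], [1, 1, 0, 0, 0]]  # hypothetical outputs of three methods
    print(consensus_labels(runs))                     # unanimous agreement only
    print(consensus_labels(runs, min_agreement=2 / 3))
```

    Requiring unanimous agreement is conservative and yields only clusters that every method supports, in the spirit of the consensus solution the authors recommend.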

  8. Assessing Genetic Structure in Common but Ecologically Distinct Carnivores: The Stone Marten and Red Fox

    PubMed Central

    Basto, Mafalda P.; Santos-Reis, Margarida; Simões, Luciana; Grilo, Clara; Cardoso, Luís; Cortes, Helder; Bruford, Michael W.; Fernandes, Carlos

    2016-01-01

    The identification of populations and spatial genetic patterns is important for ecological and conservation research, and spatially explicit individual-based methods have been recognised as powerful tools in this context. Mammalian carnivores are intrinsically vulnerable to habitat fragmentation but not much is known about the genetic consequences of fragmentation in common species. Stone martens (Martes foina) and red foxes (Vulpes vulpes) share a widespread Palearctic distribution and are considered habitat generalists, but in the Iberian Peninsula stone martens tend to occur in higher quality habitats. We compared their genetic structure in Portugal to see whether it is consistent with their differences in ecological plasticity, and also to illustrate an approach to explicitly delineate the spatial boundaries of consistently identified genetic units. We analysed microsatellite data using spatial Bayesian clustering methods (implemented in the software BAPS, GENELAND and TESS), a progressive partitioning approach and a multivariate technique (Spatial Principal Components Analysis, sPCA). Three consensus Bayesian clusters were identified for the stone marten. No consensus was achieved for the red fox, but one cluster was the most probable clustering solution. Progressive partitioning and sPCA suggested additional clusters in the stone marten but they were not consistent among methods and were geographically incoherent. The contrasting results between the two species are consistent with the literature reporting stricter ecological requirements of the stone marten in the Iberian Peninsula. The observed genetic structure in the stone marten may have been influenced by landscape features, particularly rivers, and fragmentation. We suggest that an approach based on a consensus clustering solution of multiple different algorithms may provide an objective and effective means to delineate potential boundaries of inferred subpopulations. 
sPCA and progressive partitioning offer further verification of possible population structure and may be useful for revealing cryptic spatial genetic patterns worth further investigation. PMID:26727497

  9. Assessing and grouping chemicals applying partial ordering: Alkyl anilines as an illustrative example.

    PubMed

    Carlsen, Lars; Bruggemann, Rainer

    2018-06-03

    In chemistry there is a long tradition of classification, usually with methods adopted from the wide field of cluster analysis. Here, using the example of 21 alkyl anilines, we show that concepts from the mathematical discipline of partially ordered sets may also be applied. The chemical compounds are described by a multi-indicator system. For the present study, four indicators, mainly taken from the field of environmental chemistry, were applied and a Hasse diagram was constructed. A Hasse diagram is an acyclic, transitively reduced, triangle-free graph that may have several components. The crucial question is whether the Hasse diagram can be interpreted from a structural chemical point of view. This is indeed the case, but it must be clearly stated that meaningful results cannot be guaranteed in general; further theoretical work is needed for that. Two cluster analysis methods were applied (K-means and a hierarchical cluster method). In both cases, the partitioning of the set of 21 compounds by the component structure of the Hasse diagram appears to be more interpretable. Copyright © Bentham Science Publishers.
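    In a multi-indicator system, compound a lies below compound b when a's value is less than or equal to b's on every indicator; the Hasse diagram keeps only the cover relations (a < b with nothing in between), and its connected components give the partition compared with K-means above. The sketch assumes no two compounds share an identical indicator profile (in practice such equivalent objects are merged first). The compound names and two-indicator profiles are hypothetical; the study used four indicators.

```python
def dominated(a, b):
    """True if profile a <= b on every indicator and the profiles differ."""
    return a != b and all(x <= y for x, y in zip(a, b))


def hasse_edges(objects):
    """Cover relations (lower, upper) of the product order on indicator tuples."""
    names = list(objects)
    less = {(a, b) for a in names for b in names if dominated(objects[a], objects[b])}
    return sorted(
        (a, b) for (a, b) in less
        if not any((a, c) in less and (c, b) in less for c in names)
    )


def hasse_components(objects):
    """Connected components of the Hasse diagram (isolated objects form their own component)."""
    neighbours = {name: set() for name in objects}
    for a, b in hasse_edges(objects):
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, comps = set(), []
    for start in objects:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node not in comp:
                comp.add(node)
                stack.extend(neighbours[node] - comp)
        seen |= comp
        comps.append(sorted(comp))
    return comps


if __name__ == "__main__":
    compounds = {"a": (1, 1), "b": (2, 2), "c": (3, 3), "d": (0, 5)}  # hypothetical profiles
    print(hasse_edges(compounds))       # the transitive pair (a, c) is not drawn
    print(hasse_components(compounds))  # d is incomparable with the others
```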

  10. A New Technique using Electron Velocity Data from the Four Cluster Spacecraft to Explore Magnetofluid Turbulence in the Solar Wind

    NASA Technical Reports Server (NTRS)

    Goldstein, Melvyn L.; Gurgiolo, C.; Fazakerley, A.; Lahiff, A.

    2008-01-01

    It is now possible in certain circumstances to use velocity moments computed from the Plasma Electron and Current Experiment (PEACE) on the four Cluster spacecraft to determine a number of turbulence properties of the solar wind, including direct measurements of the vorticity and compressibility. Assuming that the four spacecraft are not co-planar and that there is only a linear variation of the plasma variables across the volume defined by the four satellites, one can estimate the curl of the fluid velocity, i.e., the vorticity. From the vorticity it is possible to explore directly intermittent regions in the solar wind where dissipation is likely to be enhanced. In addition, one can estimate directly the Taylor microscale.
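    The linear-variation assumption means the velocity near the tetrahedron can be written v(r) = v₀ + G (r - r₀), with G the 3x3 velocity-gradient matrix. Four non-coplanar spacecraft give three independent separation vectors, so G is determined exactly; the vorticity is the antisymmetric part of G and its trace is the divergence, a common proxy for compressibility. This is a minimal sketch of that linear estimate with hypothetical positions and velocities, not the PEACE processing chain.

```python
import numpy as np


def velocity_gradient(positions, velocities):
    """Linear estimate of G[i, j] = dv_i/dx_j from four spacecraft."""
    r = np.asarray(positions, dtype=float)
    v = np.asarray(velocities, dtype=float)
    dR = (r[1:] - r[0]).T  # columns: separation vectors relative to spacecraft 0
    dV = (v[1:] - v[0]).T  # columns: velocity differences relative to spacecraft 0
    if abs(np.linalg.det(dR)) < 1e-12:
        raise ValueError("spacecraft are (nearly) coplanar; gradient is undetermined")
    return dV @ np.linalg.inv(dR)  # solves dV = G dR


def vorticity(G):
    """Curl of the velocity field from the gradient matrix."""
    return np.array([G[2, 1] - G[1, 2], G[0, 2] - G[2, 0], G[1, 0] - G[0, 1]])


if __name__ == "__main__":
    pos = [[0, 0, 0], [100, 0, 0], [0, 100, 0], [0, 0, 100]]  # hypothetical positions (km)
    bulk = np.array([400.0, 0.0, 0.0])                        # hypothetical bulk flow (km/s)
    # Bulk flow plus a weak rigid rotation about z: v = bulk + 1e-3 * (-y, x, 0).
    vel = [bulk + 1e-3 * np.array([-p[1], p[0], 0.0]) for p in pos]
    G = velocity_gradient(pos, vel)
    print("vorticity:", vorticity(G), "divergence:", np.trace(G))
```

    A rigid rotation with angular speed Ω has vorticity 2Ω, which is what this example recovers; real data add measurement noise and nonlinear structure that the linear estimate cannot capture.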

  11. Pre-selection and assessment of green organic solvents by clustering chemometric tools.

    PubMed

    Tobiszewski, Marek; Nedyalkova, Miroslava; Madurga, Sergio; Pena-Pereira, Francisco; Namieśnik, Jacek; Simeonov, Vasil

    2018-01-01

    The study presents the results of applying chemometric tools to select physicochemical parameters of solvents for predicting missing variables: bioconcentration factors and water-octanol and octanol-air partition constants. EPI Suite software was successfully applied to predict missing values for solvents commonly considered "green". Values of log BCF, log KOW and log KOA were modelled for 43 relatively nonpolar solvents and 69 polar ones. Multivariate statistics also proved useful in assessing the modelling results. The presented approach can be one of the first steps and a support tool in assessing the greenness of chemicals. Copyright © 2017 Elsevier Inc. All rights reserved.

  12. Features of asthma which provide meaningful insights for understanding the disease heterogeneity.

    PubMed

    Deliu, M; Yavuz, T S; Sperrin, M; Belgrave, D; Sahiner, U M; Sackesen, C; Kalayci, O; Custovic, A

    2018-01-01

    Data-driven methods such as hierarchical clustering (HC) and principal component analysis (PCA) have been used to identify asthma subtypes, with inconsistent results. We aimed to develop a framework for the discovery of stable and clinically meaningful asthma subtypes. We performed HC in a rich data set from 613 asthmatic children, using 45 clinical variables (Model 1), and after PCA dimensionality reduction (Model 2). Clinical experts then identified a set of asthma features/domains that informed the clusters in the two analyses. In Model 3, we reclustered the data using these features to ascertain whether this improved the discovery process. Cluster stability was poor in Models 1 and 2. Clinical experts highlighted four asthma features/domains that differentiated the clusters in the two models: age of onset, allergic sensitization, severity, and recent exacerbations. In Model 3 (HC using these four features), cluster stability improved substantially. The cluster assignment changed, providing more clinically interpretable results. In a 5-cluster model, we labelled the clusters as: "Difficult asthma" (n = 132); "Early-onset mild atopic" (n = 210); "Early-onset mild non-atopic" (n = 153); "Late-onset" (n = 105); and "Exacerbation-prone asthma" (n = 13). Multinomial regression demonstrated that lung function was significantly diminished among children with "Difficult asthma"; blood eosinophilia was a significant feature of "Difficult," "Early-onset mild atopic," and "Late-onset asthma." Children with moderate-to-severe asthma were present in each cluster. An integrative approach blending the data with clinical expert domain knowledge identified four features, which may be informative for ascertaining asthma endotypes. These findings suggest that variables which are key determinants of asthma presence, severity, or control may not be the most informative for determining asthma subtypes. 
Our results indicate that exacerbation-prone asthma may be a separate asthma endotype and that severe asthma is not a single entity, but an extreme end of the spectrum of several different asthma endotypes. © 2017 The Authors. Clinical & Experimental Allergy published by John Wiley & Sons Ltd.

  13. Sequential detection of temporal communities by estrangement confinement.

    PubMed

    Kawadia, Vikas; Sreenivasan, Sameet

    2012-01-01

    Temporal communities are the result of a consistent partitioning of nodes across multiple snapshots of an evolving network, and they provide insights into how dense clusters in a network emerge, combine, split and decay over time. To reliably detect temporal communities we need to not only find a good community partition in a given snapshot but also ensure that it bears some similarity to the partition(s) found in the previous snapshot(s), a particularly difficult task given the extreme sensitivity of community structure yielded by current methods to changes in the network structure. Here, motivated by the inertia of inter-node relationships, we present a new measure of partition distance called estrangement, and show that constraining estrangement enables one to find meaningful temporal communities at various degrees of temporal smoothness in diverse real-world datasets. Estrangement confinement thus provides a principled approach to uncovering temporal communities in evolving networks.

  14. A Deep X-ray Survey of the Globular Cluster Omega Centauri

    NASA Astrophysics Data System (ADS)

    Henleywillis, Simon; Cool, Adrienne M.; Haggard, Daryl; Heinke, Craig; Callanan, Paul; Zhao, Yue

    2018-03-01

    We identify 233 X-ray sources, of which 95 are new, in a 222 ks exposure of Omega Centauri with the Chandra X-ray Observatory's ACIS-I detector. The limiting unabsorbed flux in the core is f_X(0.5-6.0 keV) ≃ 3×10⁻¹⁶ erg s⁻¹ cm⁻² (L_X ≃ 1×10³⁰ erg s⁻¹ at 5.2 kpc). We estimate that ~60 ± 20 of these are cluster members, of which ~30 lie within the core (r_c = 155 arcsec), and another ~30 between 1-2 core radii. We identify four new optical counterparts, for a total of 45 likely identifications. Probable cluster members include 18 cataclysmic variables (CVs) and CV candidates, one quiescent low-mass X-ray binary, four variable stars, and five stars that are either associated with ω Cen's anomalous red giant branch, or are sub-subgiants. We estimate that the cluster contains 40 ± 10 CVs with L_X > 10³¹ erg s⁻¹, confirming that CVs are underabundant in ω Cen relative to the field. Intrinsic absorption is required to fit X-ray spectra of six of the nine brightest CVs, suggesting magnetic CVs, or high-inclination systems. Though no radio millisecond pulsars (MSPs) are currently known in ω Cen, more than 30 unidentified sources have luminosities and X-ray colours like those of MSPs found in other globular clusters; these could be responsible for the Fermi-detected gamma-ray emission from the cluster. Finally, we identify a CH star as the counterpart to the second-brightest X-ray source in the cluster and argue that it is a symbiotic star. This is the first such giant/white dwarf binary to be identified in a globular cluster.
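
    As a consistency check on the quoted limiting luminosity, the flux and luminosity are related by L_X = 4πd²f_X. The check assumes isotropic emission, the stated 5.2 kpc distance, and 1 kpc ≈ 3.086×10²¹ cm:

```latex
L_X = 4\pi d^2 f_X
    \approx 4\pi \left(5.2 \times 3.086\times10^{21}\ \mathrm{cm}\right)^2
      \times 3\times10^{-16}\ \mathrm{erg\,s^{-1}\,cm^{-2}}
    \approx 9.7\times10^{29}\ \mathrm{erg\,s^{-1}}
    \approx 1\times10^{30}\ \mathrm{erg\,s^{-1}}
```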

  15. A Locally Optimal Algorithm for Estimating a Generating Partition from an Observed Time Series and Its Application to Anomaly Detection.

    PubMed

    Ghalyan, Najah F; Miller, David J; Ray, Asok

    2018-06-12

    Estimation of a generating partition is critical for symbolization of measurements from discrete-time dynamical systems, where a sequence of symbols from a (finite-cardinality) alphabet may uniquely specify the underlying time series. Such symbolization is useful for computing measures (e.g., Kolmogorov-Sinai entropy) to identify or characterize the (possibly unknown) dynamical system. It is also useful for time series classification and anomaly detection. The seminal work of Hirata, Judd, and Kilminster (2004) derives a novel objective function, akin to a clustering objective, that measures the discrepancy between a set of reconstruction values and the points from the time series. They cast estimation of a generating partition as the minimization of their objective function. Unfortunately, their proposed algorithm is nonconvergent, with no guarantee of finding even locally optimal solutions with respect to their objective. The difficulty lies in a heuristic nearest-neighbor symbol assignment step. Instead, we develop a novel, locally optimal algorithm for their objective. We apply iterative nearest-neighbor symbol assignments with guaranteed discrepancy descent, by which joint, locally optimal symbolization of the entire time series is achieved. While most previous approaches frame generating partition estimation as a state-space partitioning problem, we recognize that minimizing the Hirata et al. (2004) objective function does not induce an explicit partitioning of the state space, but rather of the space consisting of the entire time series (effectively, clustering in a (countably) infinite-dimensional space). Our approach also amounts to a novel type of sliding block lossy source coding. Improvement, with respect to several measures, is demonstrated over popular methods for symbolizing chaotic maps. We also apply our approach to time-series anomaly detection, considering both chaotic maps and a failure application in a polycrystalline alloy material.
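
    The sketch below shows what "symbolization" means in the simplest case. It maps an orbit of the logistic map x -> 4x(1 - x) to binary symbols using the two-cell partition at x = 0.5, which is known to be generating for this map. The sketch assumes the partition is already known. It does not reproduce the paper's contribution, which is estimating the partition from data via the Hirata et al. objective. The function names are hypothetical.

```python
def logistic_orbit(x0, n, r=4.0):
    """First n points of x_{t+1} = r * x_t * (1 - x_t)."""
    xs = [x0]
    for _ in range(n - 1):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs

def symbolize(series, threshold=0.5):
    """Binary symbols from a two-cell partition of [0, 1] split at `threshold`."""
    return [0 if x < threshold else 1 for x in series]
```

    Starting from x0 = 0.3, the orbit 0.3, 0.84, 0.5376, 0.9943... yields the symbols [0, 1, 1, 1]. Symbol sequences like this are the input to entropy estimates and to the symbolic anomaly detection the abstract mentions.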

  16. Multi-Wheat-Model Ensemble Responses to Interannual Climate Variability

    NASA Technical Reports Server (NTRS)

    Ruane, Alex C.; Hudson, Nicholas I.; Asseng, Senthold; Camarrano, Davide; Ewert, Frank; Martre, Pierre; Boote, Kenneth J.; Thorburn, Peter J.; Aggarwal, Pramod K.; Angulo, Carlos

    2016-01-01

    We compare 27 wheat models' yield responses to interannual climate variability, analyzed at locations in Argentina, Australia, India, and The Netherlands as part of the Agricultural Model Intercomparison and Improvement Project (AgMIP) Wheat Pilot. Each model simulated grain yield for 1981-2010, and we evaluate results against the interannual variability of growing season temperature, precipitation, and solar radiation. The amount of information used for calibration has only a minor effect on most models' climate response, and even small multi-model ensembles prove beneficial. Wheat model clusters reveal common characteristics of yield response to climate; however, models rarely share the same cluster at all four sites, indicating substantial independence. Only a weak relationship (R² = 0.24) was found between the models' sensitivities to interannual temperature variability and their response to long-term warming, suggesting that additional processes differentiate climate change impacts from observed climate variability analogs and motivating continuing analysis and model development efforts.

  17. The importance of considering rainfall partitioning in afforestation initiatives in semiarid climates: A comparison of common planted tree species in Tehran, Iran.

    PubMed

    Sadeghi, Seyed Mohammad Moein; Attarod, Pedram; Van Stan, John Toland; Pypker, Thomas Grant

    2016-10-15

    As plantations become increasingly important sources of wood and fiber in arid/semiarid places, they have also been increasingly criticized for their hydrological impacts. An examination and comparison of gross rainfall (GR) partitioning across commonly planted tree species (Pinus eldarica, Cupressus arizonica, Robinia pseudoacacia, and Fraxinus rotundifolia) in semiarid regions has great value for watershed and forest managers interested in managing canopy hydrological processes for societal benefit. Therefore, we performed a field study examining GR partitioning into throughfall (TF), stemflow (SF), and rainfall interception (I) for these species in the semiarid Chitgar Forest Park, Tehran, Iran. An advantage of our study is that we explore the effects of forest structural differences in plantation forests experiencing similar climatic factors and storm conditions. As such, variability in GR partitioning due to different meteorological conditions is minimized, allowing comparison of structural attributes across plantations. Our results show that commonly selected afforestation species experiencing the same climate produced differing stand structures that partition GR differently into TF, SF, and I. P. eldarica might be the best of the four species to plant if the primary goal of afforestation is to limit erosion and stormwater runoff, as it intercepted more rainfall than the other species. However, the high SF generation from F. rotundifolia, and the low GR necessary to initiate SF, could maximize retention of water in the soils, since SF has been shown to infiltrate along root pathways and access groundwater. GR partitioning should therefore be considered when selecting species for afforestation/reforestation in water-limited ecosystems. Copyright © 2016 Elsevier B.V. All rights reserved.

  18. SciSpark: Highly Interactive and Scalable Model Evaluation and Climate Metrics

    NASA Astrophysics Data System (ADS)

    Wilson, B. D.; Palamuttam, R. S.; Mogrovejo, R. M.; Whitehall, K. D.; Mattmann, C. A.; Verma, R.; Waliser, D. E.; Lee, H.

    2015-12-01

    Remote sensing data and climate model output are multi-dimensional arrays of massive size, locked away in heterogeneous file formats (HDF5/4, NetCDF 3/4) and metadata models (HDF-EOS, CF), making it difficult to perform multi-stage, iterative science processing since each stage requires writing data to and reading it from disk. We are developing a lightning-fast Big Data technology called SciSpark, based on Apache™ Spark, under a NASA AIST grant (PI Mattmann). Spark implements the map-reduce paradigm for parallel computing on a cluster, but emphasizes in-memory computation, "spilling" to disk only as needed, and so outperforms the disk-based Apache™ Hadoop by 100x in memory and by 10x on disk. SciSpark will enable scalable model evaluation by executing large-scale comparisons of A-Train satellite observations to model grids on a cluster of 10 to 1000 compute nodes. This second-generation capability for NASA's Regional Climate Model Evaluation System (RCMES) will compute simple climate metrics at interactive speeds, and extend to quite sophisticated iterative algorithms such as machine-learning-based clustering of temperature PDFs, and even graph-based algorithms for searching for Mesoscale Convective Complexes. We have implemented a parallel data ingest capability in which the user specifies desired variables (arrays) as several time-sorted lists of URLs (i.e. using OPeNDAP model.nc?varname, or local files). The specified variables are partitioned by time/space, and then each Spark node pulls its bundle of arrays into memory to begin a computation pipeline. We also investigated the performance of several N-dimensional array libraries (Scala Breeze, Java jblas & netlib-java, and ND4J). We are currently developing science codes using ND4J and studying memory behavior on the JVM. On the PySpark side, many of our science codes already use the NumPy and SciPy ecosystems. 
The talk will cover: the architecture of SciSpark, the design of the scientific RDD (sRDD) data structure, our efforts to integrate climate science algorithms in Python and Scala, parallel ingest and partitioning of A-Train satellite observations from HDF files and model grids from netCDF files, first parallel runs to compute comparison statistics and PDFs, and first metrics quantifying parallel speedups and memory & disk usage.
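
    The partition-then-compute pattern described above can be illustrated with a plain-Python sketch. Each time partition is reduced to partial statistics independently (the map side), and the partials are then merged in a reduce step. This is not the SciSpark or Spark API. The function names and the choice of statistic (mean and population variance) are hypothetical.

```python
from functools import reduce

def map_partition(values):
    """Per-partition partial statistics: (count, sum, sum of squares)."""
    return (len(values), sum(values), sum(v * v for v in values))

def combine(a, b):
    """Associative merge of two partials, so partitions can be combined in any grouping."""
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])

def finalize(stats):
    """Turn merged partials into the mean and population variance."""
    n, s, ss = stats
    mean = s / n
    return mean, ss / n - mean * mean

def summarize(partitions):
    """Map each partition to partials, reduce them, then finalize."""
    return finalize(reduce(combine, map(map_partition, partitions)))
```

    For example, summarize([[1.0, 2.0], [3.0, 4.0], [5.0]]) gives a mean of 3.0 and a variance of 2.0, the same result as computing over the unpartitioned values. Because combine is associative, a cluster can merge partials in parallel. For data with large magnitudes, the sum-of-squares form loses precision, so a pairwise update in the style of Chan et al. is the more robust choice.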

  19. Solfatara volcano subsurface imaging: two different approaches to process and interpret multi-variate data sets

    NASA Astrophysics Data System (ADS)

    Bernardinetti, Stefano; Bruno, Pier Paolo; Lavoué, François; Gresse, Marceau; Vandemeulebrouck, Jean; Revil, André

    2017-04-01

    Reducing model uncertainty and producing more reliable geophysical images and interpretations is nowadays a fundamental requirement for geophysical techniques applied in complex environments such as Solfatara Volcano. Using independent geophysical methods yields a wealth of information on the subsurface, owing to the different sensitivities of the data to parameters such as compressional- and shear-wave velocities, bulk electrical conductivity, or density. The joint processing of these multiple physical properties can lead to a very detailed characterization of the subsurface and therefore enhance our imaging and interpretation. In this work, we develop two different processing approaches based on reflection seismology and seismic P-wave tomography on the one hand, and electrical data acquired over the same line on the other. From these data, we obtain an image-guided electrical resistivity tomography and a post-processing integration of tomographic results. The image-guided electrical resistivity tomography is obtained by regularizing the inversion of the electrical data with structural constraints extracted from a migrated seismic section using image processing tools. This approach focuses the reconstruction of electrical resistivity anomalies along the features visible in the seismic section, and acts as a guide for interpretation in terms of subsurface structures and processes. To integrate co-registered P-wave velocity and electrical resistivity values, we apply a data mining tool, the k-means algorithm, to identify relationships between the two sets of variables. This algorithm identifies clusters in the multivariate data set with the objective of minimizing the sum of squared Euclidean distances within each cluster and maximizing it between clusters. 
We obtain a partitioning of the multivariate data set into a finite number of well-correlated clusters, representative of the optimum clustering of our geophysical variables (P-wave velocities and electrical resistivities). The result is an integrated tomography that shows a finite number of homogeneous geophysical facies, and therefore highlights the main geological features of the subsurface.
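
    The k-means step can be sketched as follows, assuming each co-registered cell is a row of (P-wave velocity, resistivity) values. The columns are standardized first so that their different units do not dominate the Euclidean distance. The sketch then runs plain Lloyd's iteration, which never increases the within-cluster sum of squares. The function names are hypothetical, and this is not the authors' code.

```python
import numpy as np

def standardize(X):
    """Z-score each column so velocity and resistivity units are comparable."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0)

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid update."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # k distinct rows as seeds
    for _ in range(n_iter):
        # Squared distance from every point to every centroid: shape (n, k).
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Move each centroid to the mean of its points; keep it if its cluster is empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        converged = np.allclose(new_centers, centers)
        centers = new_centers
        if converged:
            break
    sse = float(((X - centers[labels]) ** 2).sum())  # within-cluster sum of squares
    return labels, centers, sse
```

    Mapping each cell's label back onto the section grid would give a facies image of the kind the abstract describes. The number of clusters k still has to be chosen, for example by comparing the within-cluster sum of squares across several values of k.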

  20. Size distribution and clothing-air partitioning of polycyclic aromatic hydrocarbons generated by barbecue.

    PubMed

    Lao, Jia-Yong; Wu, Chen-Chou; Bao, Lian-Jun; Liu, Liang-Ying; Shi, Lei; Zeng, Eddy Y

    2018-10-15

    Barbecue (BBQ), one of the most popular forms of charcoal cooking worldwide, produces abundant polycyclic aromatic hydrocarbons (PAHs) and particulate matter. The size distribution and clothing-air partitioning of particle-bound PAHs are important for assessing potential health hazards to humans exposed to BBQ fumes, but have not been examined adequately. To address this issue, particle and gaseous samples were collected at 2-m and 10-m distances from a cluster of four BBQ stoves. Personal samplers and cotton clothes were carried by volunteers sitting near the BBQ stoves. Particle-bound PAHs (especially those with 4-6 rings) derived from BBQ fumes were mostly associated with fine particles in the size range of 0.18-1.8 μm. High-molecular-weight PAHs mostly had unimodal size distributions peaking in fine particles, and consequently had small geometric mean diameters and standard deviations. Source diagnostics indicated that particle-bound PAHs in BBQ fumes were generated primarily by combustion of charcoal, fat content in food, and oil. The influence of BBQ fumes on the occurrence of particle-bound PAHs decreased with increasing distance from the BBQ stoves, due to increased impacts of ambient sources, especially petrogenic sources, and to a lesser extent of wind speed and direction. Octanol-air and clothing-air partition coefficients of PAHs obtained from personal air samples were significantly correlated with each other. High-molecular-weight PAHs had higher area-normalized clothing-air partition coefficients in cotton clothes, i.e., cotton fabrics may be a significant reservoir of higher-molecular-weight PAHs. Particle-bound PAHs from barbecue fumes are generated largely from charcoal combustion and food-charred emissions and are mainly associated with fine particles. Copyright © 2018. Published by Elsevier B.V.

  1. West Virginia US Department of Energy experimental program to stimulate competitive research. Section 2: Human resource development; Section 3: Carbon-based structural materials research cluster; Section 3: Data parallel algorithms for scientific computing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Not Available

    1994-02-02

    This report consists of three separate but related reports: (1) Human Resource Development, (2) Carbon-based Structural Materials Research Cluster, and (3) Data Parallel Algorithms for Scientific Computing. To meet the objectives of the Human Resource Development plan, the plan includes K-12 enrichment activities, undergraduate research opportunities for students at the state's two Historically Black Colleges and Universities, graduate research through cluster assistantships and through a traineeship program targeted specifically to minorities, women and the disabled, and faculty development through participation in research clusters. One research cluster is the chemistry and physics of carbon-based materials. The objective of this cluster is to develop a self-sustaining group of researchers in carbon-based materials research within the institutions of higher education in the state of West Virginia. The projects will involve analysis of cokes, graphites and other carbons in order to understand the properties that provide desirable structural characteristics, including resistance to oxidation, levels of anisotropy and structural characteristics of the carbons themselves. In the proposed cluster on parallel algorithms, the research topics of four WVU faculty and three state liberal arts college faculty are: (1) modeling of self-organized critical systems by cellular automata; (2) multiprefix algorithms and fat-free embeddings; (3) offline and online partitioning of data computation; and (4) manipulating and rendering three-dimensional objects. This cluster furthers the state Experimental Program to Stimulate Competitive Research plan by building on existing strengths at WVU in parallel algorithms.

  2. Comprehensive cluster analysis with Transitivity Clustering.

    PubMed

    Wittkop, Tobias; Emig, Dorothea; Truss, Anke; Albrecht, Mario; Böcker, Sebastian; Baumbach, Jan

    2011-03-01

    Transitivity Clustering is a method for partitioning biological data, such as genes, into groups of similar objects. It provides integrated access to various functions addressing each step of a typical cluster analysis. To facilitate this, Transitivity Clustering is accessible online and offers three user-friendly interfaces: a powerful stand-alone version, a web interface, and a collection of Cytoscape plug-ins. In this paper, we describe three major workflows: (i) protein (super)family detection with Cytoscape, (ii) protein homology detection with incomplete gold standards and (iii) clustering of gene expression data. This protocol guides the user through the most important features of Transitivity Clustering and takes ∼1 h to complete.

  3. EXPLORING FUNCTIONAL CONNECTIVITY IN FMRI VIA CLUSTERING.

    PubMed

    Venkataraman, Archana; Van Dijk, Koene R A; Buckner, Randy L; Golland, Polina

    2009-04-01

    In this paper we investigate the use of data-driven clustering methods for functional connectivity analysis in fMRI. In particular, we consider the K-Means and Spectral Clustering algorithms as alternatives to the commonly used Seed-Based Analysis. To enable clustering of the entire brain volume, we use the Nyström Method to approximate the necessary spectral decompositions. We apply K-Means, Spectral Clustering and Seed-Based Analysis to resting-state fMRI data collected from 45 healthy young adults. Without placing any a priori constraints, both clustering methods yield partitions that are associated with brain systems previously identified via Seed-Based Analysis. Our empirical results suggest that clustering provides a valuable tool for functional connectivity analysis.
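
    The Nyström idea can be sketched as follows. An n × n affinity matrix K over all voxels is approximated from its columns at m sampled landmarks as K ≈ C W⁺ Cᵀ, where C holds the affinities of all points to the landmarks and W holds the affinities among the landmarks. The abstract does not specify the affinity used, so the Gaussian (RBF) kernel, gamma, and the landmark choice are assumptions, and the function names are hypothetical. The full product is formed here only for checking. A whole-brain implementation would keep C and W factored and derive approximate eigenvectors from them.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian affinity exp(-gamma * ||a - b||^2) between rows of A and rows of B."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * d2)

def nystrom(X, landmark_idx, gamma=1.0):
    """Approximate the full n x n affinity as C @ pinv(W) @ C.T from m landmarks."""
    X = np.asarray(X, dtype=float)
    L = X[np.asarray(landmark_idx)]
    C = rbf_kernel(X, L, gamma)  # n x m: affinities of all points to the landmarks
    W = rbf_kernel(L, L, gamma)  # m x m: affinities among the landmarks
    return C @ np.linalg.pinv(W) @ C.T
```

    When every point is a landmark, the approximation reproduces K exactly. With fewer landmarks it is a low-rank (rank at most m) surrogate whose quality depends on how well the landmarks cover the data.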

  4. Accelerating semantic graph databases on commodity clusters

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Morari, Alessandro; Castellana, Vito G.; Haglin, David J.

    We are developing a full software system for accelerating semantic graph databases on commodity clusters that scales to hundreds of nodes while maintaining constant query throughput. Our framework comprises a SPARQL-to-C++ compiler, a library of parallel graph methods and a custom multithreaded runtime layer, which provides a Partitioned Global Address Space (PGAS) programming model with fork/join parallelism and automatic load balancing over commodity clusters. We present preliminary results for the compiler and for the runtime.

  5. A Random Walk Approach to Query Informative Constraints for Clustering.

    PubMed

    Abin, Ahmad Ali

    2017-08-09

    This paper presents a random walk approach to the problem of querying informative constraints for clustering. The proposed method is based on the properties of the commute time, that is, the expected time taken for a random walk to travel between two nodes and return, on the adjacency graph of the data. Commute time has the useful property that the more short paths connect two given nodes in a graph, the more similar those nodes are. Since computing the commute time takes the Laplacian eigenspectrum into account, we use this property recursively to query informative constraints for clustering. At each recursion, the proposed method constructs the adjacency graph of the data and uses the spectral properties of the commute time matrix to bipartition it. The method then uses commute-time distances on the graph to query informative constraints between partitions. This process iterates for each partition until a stopping condition is met. Experiments on real-world data show the efficiency of the proposed method for constraint selection.
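
    Commute times can be computed from the Moore-Penrose pseudoinverse L+ of the graph Laplacian using the standard identity C(i, j) = vol(G) * (L+[i,i] + L+[j,j] - 2 L+[i,j]), where vol(G) is the sum of node degrees. The sketch below computes that matrix. As a stand-in for the paper's bipartition step, whose exact criterion the abstract does not give, it then splits the graph by the sign of the Fiedler vector (the Laplacian eigenvector with the second-smallest eigenvalue). The function names are hypothetical, and a connected graph is assumed.

```python
import numpy as np

def laplacian(A):
    """Combinatorial graph Laplacian L = D - A for a symmetric adjacency matrix."""
    A = np.asarray(A, dtype=float)
    return np.diag(A.sum(axis=1)) - A

def commute_times(A):
    """Commute time C(i, j) = vol(G) * (L+[i,i] + L+[j,j] - 2 L+[i,j]) for a connected graph."""
    A = np.asarray(A, dtype=float)
    Lp = np.linalg.pinv(laplacian(A))
    vol = A.sum()  # sum of degrees
    d = np.diag(Lp)
    return vol * (d[:, None] + d[None, :] - 2.0 * Lp)

def fiedler_bipartition(A):
    """Stand-in bipartition: 0/1 labels from the sign of the Fiedler vector."""
    vals, vecs = np.linalg.eigh(laplacian(A))  # eigenvalues in ascending order
    return (vecs[:, 1] >= 0).astype(int)
```

    On a three-node path graph, vol(G) = 4. The end-to-end commute time is 8, and the commute time for each neighbouring pair is 4, so nodes joined by shorter paths have smaller commute times.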

  6. Segmentation by fusion of histogram-based k-means clusters in different color spaces.

    PubMed

    Mignotte, Max

    2008-05-01

    This paper presents a new, simple, and efficient segmentation approach based on a fusion procedure that combines several segmentation maps, each associated with a simpler partition model, to obtain a more reliable and accurate final segmentation. In our application, the label fields to be fused are produced by the same simple (K-means-based) clustering technique applied to an input image expressed in different color spaces. Our fusion strategy combines these segmentation maps with a final clustering procedure whose input features are the local histograms of the class labels, estimated at each site for all of the initial partitions. This fusion framework is simple to implement, fast, and general enough to be applied to various computer vision applications (e.g., motion detection and segmentation), and has been successfully applied to the Berkeley image database. The experiments reported in this paper illustrate the potential of this approach compared to state-of-the-art segmentation methods recently proposed in the literature.
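
    The fusion features described above can be sketched as follows. For each pixel and each input segmentation, the class labels in a small window are counted, and the normalized histograms from all segmentations are concatenated into one feature vector. These vectors would then be clustered, for example with k-means, to produce the fused map. The square window, edge padding, and a shared number of labels per map are assumptions for illustration, and the function name is hypothetical.

```python
import numpy as np

def local_label_histograms(label_maps, n_labels, radius=1):
    """Per-pixel features: normalized label histograms in a (2r+1) x (2r+1) window,
    concatenated across all input segmentations."""
    H, W = label_maps[0].shape
    size = 2 * radius + 1
    feats = []
    for lm in label_maps:
        onehot = np.eye(n_labels)[np.asarray(lm)]  # H x W x n_labels indicator array
        padded = np.pad(onehot, ((radius, radius), (radius, radius), (0, 0)), mode="edge")
        hist = np.zeros_like(onehot)
        for dy in range(size):  # sum the indicators over every window offset
            for dx in range(size):
                hist += padded[dy:dy + H, dx:dx + W]
        feats.append(hist / size ** 2)  # each histogram now sums to 1
    return np.concatenate(feats, axis=2)  # H x W x (len(label_maps) * n_labels)
```

    Reshaping the result to (H * W, features) gives one row per pixel, ready for the final clustering pass.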

  7. Semi-supervised clustering for parcellating brain regions based on resting state fMRI data

    NASA Astrophysics Data System (ADS)

    Cheng, Hewei; Fan, Yong

    2014-03-01

    Many unsupervised clustering techniques have been adopted for parcellating brain regions of interest into functionally homogeneous subregions based on resting state fMRI data. However, unsupervised clustering techniques cannot take advantage of existing knowledge of functional neuroanatomy readily available from studies of cytoarchitectonic parcellation or meta-analyses of the literature. In this study, we propose a semi-supervised clustering method for parcellating the amygdala into functionally homogeneous subregions based on resting state fMRI data. In particular, the semi-supervised clustering is implemented in a graph partitioning framework, and adopts prior information and spatial consistency constraints to obtain a spatially contiguous parcellation. The graph partitioning problem is solved using an efficient algorithm similar to the well-known weighted kernel k-means algorithm. Our method has been validated by parcellating the amygdala into 3 subregions based on resting state fMRI data from 28 subjects. The experimental results demonstrate that the proposed method is more robust than unsupervised clustering and is able to parcellate the amygdala into centromedial, laterobasal, and superficial parts with improved functional homogeneity compared with the cytoarchitectonic parcellation. The validity of the parcellation results is also supported by distinctive functional and structural connectivity patterns of the subregions and high consistency between coactivation patterns derived from a meta-analysis and functional connectivity patterns of corresponding subregions.

  8. Performance Analysis of Entropy Methods on K Means in Clustering Process

    NASA Astrophysics Data System (ADS)

    Dicky Syahputra Lubis, Mhd.; Mawengkang, Herman; Suwilo, Saib

    2017-12-01

    K-means is a non-hierarchical data clustering method that attempts to partition existing data into one or more clusters (groups). It partitions the data so that data with the same characteristics are grouped into the same cluster and data with different characteristics are grouped into other clusters. The purpose of this clustering is to minimize the objective function set in the clustering process, which generally attempts to minimize the variation within a cluster and maximize the variation between clusters. However, a main disadvantage of this method is that the number k is often not known in advance. Furthermore, randomly chosen starting points may place two initial centroids very close to each other. Therefore, the entropy method, a method for determining weights and making a decision among a set of alternatives, is used here to determine the starting points of K-means. Entropy measures how well the data discriminate among alternatives: under the entropy criterion, the attributes with the greatest variation receive the highest weights. The entropy method can thus help K-means determine its starting points, which are usually chosen at random, so that the clustering converges in fewer iterations than standard K-means. Using the postoperative patient dataset from the UCI Machine Learning Repository, with only 12 records as a worked example, the entropy-based method reached the desired end result in only 2 iterations.
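
    The entropy weight method referred to above is commonly computed in three steps. First, each attribute column is normalized to proportions p_ij. Next, the column entropy e_j = -(1/ln n) Σ_i p_ij ln p_ij is computed. Finally, the weights are set proportional to 1 - e_j, so that attributes that vary more across records receive larger weights. This sketch assumes non-negative data with n > 1 records, and the function name is hypothetical. The abstract does not spell out how the weights are turned into initial centroids, so that step is omitted.

```python
import numpy as np

def entropy_weights(X):
    """Entropy weight method for an (n records x m attributes) non-negative matrix."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    P = X / X.sum(axis=0)  # column-wise proportions p_ij
    with np.errstate(divide="ignore", invalid="ignore"):
        plogp = np.where(P > 0, P * np.log(P), 0.0)  # treat 0 * log 0 as 0
    e = -plogp.sum(axis=0) / np.log(n)  # normalized entropy in [0, 1]
    d = 1.0 - e  # degree of diversification
    return d / d.sum()
```

    A constant column has entropy 1 and therefore weight 0. If every column is constant, the weights are undefined because the final normalization divides by zero.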

  9. Effects of variable-density thinning on understory diversity and heterogeneity in young Douglas-fir forests.

    Treesearch

    Juliann E. Aukema; Andrew B. Carey

    2008-01-01

    Nine years after variable-density thinning (VDT) on the Forest Ecosystem Study, we examined low understory vegetation in 60 plots of eight stands (four pairs of VDT and control). We compared native, exotic, ruderal, and nonforest species richness among the stands. We used clustering, ordination, and indicator species analysis to look for distinctive patches of plant...

  10. Resolutions of the Coulomb operator: VIII. Parallel implementation using the modern programming language X10.

    PubMed

    Limpanuparb, Taweetham; Milthorpe, Josh; Rendell, Alistair P

    2014-10-30

    Use of the modern parallel programming language X10 for computing long-range Coulomb and exchange interactions is presented. By using X10, a partitioned global address space language with support for task parallelism and the explicit representation of data locality, the resolution of the Ewald operator can be parallelized in a straightforward manner including use of both intranode and internode parallelism. We evaluate four different schemes for dynamic load balancing of integral calculation using X10's work stealing runtime, and report performance results for long-range HF energy calculation of large molecule/high quality basis running on up to 1024 cores of a high performance cluster machine. Copyright © 2014 Wiley Periodicals, Inc.

  11. Phenotypes Determined by Cluster Analysis in Moderate to Severe Bronchial Asthma.

    PubMed

    Youroukova, Vania M; Dimitrova, Denitsa G; Valerieva, Anna D; Lesichkova, Spaska S; Velikova, Tsvetelina V; Ivanova-Todorova, Ekaterina I; Tumangelova-Yuzeir, Kalina D

    2017-06-01

    Bronchial asthma is a heterogeneous disease that includes various subtypes. They may share similar clinical characteristics, but probably have different pathological mechanisms. The aim was to identify phenotypes using cluster analysis in moderate to severe bronchial asthma and to compare clinical, physiological, immunological and inflammatory data between the clusters. Forty adult patients with moderate to severe bronchial asthma, outside of exacerbation, were included. All underwent clinical assessment, anthropometric measurements, skin prick testing, standard spirometry and measurement of the fraction of exhaled nitric oxide. Blood eosinophil count, serum total IgE and periostin levels were determined. A two-step cluster approach, hierarchical clustering and k-means analysis were used to identify the clusters. We identified four clusters: Cluster 1 (n=14), late-onset, non-atopic asthma with impaired lung function; Cluster 2 (n=13), late-onset, atopic asthma; Cluster 3 (n=6), late-onset eosinophilic asthma with aspirin sensitivity; and Cluster 4 (n=7), early-onset, atopic asthma. Our study is the first in Bulgaria in which cluster analysis is applied to asthmatic patients. The variables with the greatest discriminating power in our study were: age of asthma onset, disease duration, atopy, smoking, blood eosinophils, hypersensitivity to nonsteroidal anti-inflammatory drugs, baseline FEV1/FVC and symptom severity. Our results support the concept of heterogeneity of bronchial asthma and demonstrate that cluster analysis can be a useful tool for phenotyping the disease and for a personalized approach to the treatment of patients.

  12. Quantifying clutter: A comparison of four methods and their relationship to bat detection

    Treesearch

    Joy M. O’Keefe; Susan C. Loeb; Hoke S. Hill Jr.; J. Drew Lanham

    2014-01-01

    The degree of spatial complexity in the environment, or clutter, affects the quality of foraging habitats for bats and their detection with acoustic systems. Clutter has been assessed in a variety of ways but there are no standardized methods for measuring clutter. We compared four methods (Visual Clutter, Cluster, Single Variable, and Clutter Index) and related these...

  13. Additional support for Afrotheria and Paenungulata, the performance of mitochondrial versus nuclear genes, and the impact of data partitions with heterogeneous base composition.

    PubMed

    Springer, M S; Amrine, H M; Burk, A; Stanhope, M J

    1999-03-01

    We concatenated sequences for four mitochondrial genes (12S rRNA, tRNA valine, 16S rRNA, cytochrome b) and four nuclear genes [aquaporin, alpha 2B adrenergic receptor (A2AB), interphotoreceptor retinoid-binding protein (IRBP), von Willebrand factor (vWF)] into a multigene data set representing 11 eutherian orders (Artiodactyla, Hyracoidea, Insectivora, Lagomorpha, Macroscelidea, Perissodactyla, Primates, Proboscidea, Rodentia, Sirenia, Tubulidentata). Within this data set, we recognized nine mitochondrial partitions (both stems and loops, for each of 12S rRNA, tRNA valine, and 16S rRNA; and first, second, and third codon positions of cytochrome b) and 12 nuclear partitions (first, second, and third codon positions, respectively, of each of the four nuclear genes). Four of the 21 partitions (third positions of cytochrome b, A2AB, IRBP, and vWF) showed significant heterogeneity in base composition across taxa. Phylogenetic analyses (parsimony, minimum evolution, maximum likelihood) based on sequences for all 21 partitions provide 99-100% bootstrap support for Afrotheria and Paenungulata. With the elimination of the four partitions exhibiting heterogeneity in base composition, there is also high bootstrap support (89-100%) for cow + horse. Statistical tests reject Altungulata, Anagalida, and Ungulata. Data set heterogeneity between mitochondrial and nuclear genes is most evident when all partitions are included in the phylogenetic analyses. Mitochondrial-gene trees associate cow with horse, whereas nuclear-gene trees associate cow with hedgehog and these two with horse. However, after eliminating third positions of A2AB, IRBP, and vWF, nuclear data agree with mitochondrial data in supporting cow + horse. Nuclear genes provide stronger support for both Afrotheria and Paenungulata. Removal of third positions of cytochrome b results in improved performance for the mitochondrial genes in recovering these clades.

  14. Exploring Different Patterns of Love Attitudes among Chinese College Students

    PubMed Central

    Zeng, Xianglong; Pan, Yiqin; Zhou, Han; Yu, Shi; Liu, Xiangping

    2016-01-01

    Individual differences in love attitudes, and the relationship between love attitudes and other variables in Asian culture, have not been explored in depth. This study conducted cluster analysis with data regarding love attitudes obtained from 389 college students in mainland China. The result of cluster analysis based on love-attitude scales distinguished four types of students: game players, rational lovers, emotional lovers, and absence lovers. These four groups of students showed significant differences in sexual attitudes and personality traits of deliberation and dutifulness but not self-discipline. The study’s implications for future studies on love attitudes in certain cultural groups were also discussed. PMID:27851784

  15. Free energy of singular sticky-sphere clusters.

    PubMed

    Kallus, Yoav; Holmes-Cerfon, Miranda

    2017-02-01

    Networks of particles connected by springs model many condensed-matter systems, from colloids interacting with a short-range potential and complex fluids near jamming, to self-assembled lattices and various metamaterials. Under small thermal fluctuations the vibrational entropy of a ground state is given by the harmonic approximation if it has no zero-frequency vibrational modes, yet such singular modes are at the epicenter of many interesting behaviors in the systems above. We consider a system of N spherical particles, and directly account for the singularities that arise in the sticky limit where the pairwise interaction is strong and short ranged. Although the contribution to the partition function from singular clusters diverges in the limit, its asymptotic value can be calculated and depends on only two parameters, characterizing the depth and range of the potential. The result holds for systems that are second-order rigid, a geometric characterization that describes all known ground-state (rigid) sticky clusters. To illustrate the applications of our theory we address the question of emergence: how does crystalline order arise in large systems when it is strongly disfavored in small ones? We calculate the partition functions of all known rigid clusters up to N≤21 and show the cluster landscape is dominated by hyperstatic clusters (those with more than 3N-6 contacts); singular and isostatic clusters are far less frequent, despite their extra vibrational and configurational entropies. Since the most hyperstatic clusters are close to fragments of a close-packed lattice, this underlies the emergence of order in sticky-sphere systems, even those as small as N=10.

  16. Free energy of singular sticky-sphere clusters

    NASA Astrophysics Data System (ADS)

    Kallus, Yoav; Holmes-Cerfon, Miranda

    2017-02-01

    Networks of particles connected by springs model many condensed-matter systems, from colloids interacting with a short-range potential and complex fluids near jamming, to self-assembled lattices and various metamaterials. Under small thermal fluctuations the vibrational entropy of a ground state is given by the harmonic approximation if it has no zero-frequency vibrational modes, yet such singular modes are at the epicenter of many interesting behaviors in the systems above. We consider a system of N spherical particles, and directly account for the singularities that arise in the sticky limit where the pairwise interaction is strong and short ranged. Although the contribution to the partition function from singular clusters diverges in the limit, its asymptotic value can be calculated and depends on only two parameters, characterizing the depth and range of the potential. The result holds for systems that are second-order rigid, a geometric characterization that describes all known ground-state (rigid) sticky clusters. To illustrate the applications of our theory we address the question of emergence: how does crystalline order arise in large systems when it is strongly disfavored in small ones? We calculate the partition functions of all known rigid clusters up to N≤21 and show the cluster landscape is dominated by hyperstatic clusters (those with more than 3N-6 contacts); singular and isostatic clusters are far less frequent, despite their extra vibrational and configurational entropies. Since the most hyperstatic clusters are close to fragments of a close-packed lattice, this underlies the emergence of order in sticky-sphere systems, even those as small as N=10.
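    The 3N-6 threshold in these two records is the number of coordinates of N spheres in three dimensions (3N) minus the six rigid-body motions (three translations, three rotations). The sketch below only compares a contact count with that threshold; `contact_class` is an illustrative name, and it deliberately does not test for singular (zero-frequency) modes, which contact counting alone cannot detect.

```python
def contact_class(n_particles, n_contacts):
    """Classify a cluster of N spheres by comparing its contact count with 3N - 6,
    i.e. 3N coordinates minus 6 rigid-body motions (3 translations, 3 rotations).
    Contact count alone does NOT decide whether a cluster is singular."""
    threshold = 3 * n_particles - 6
    if n_contacts > threshold:
        return "hyperstatic"
    if n_contacts == threshold:
        return "isostatic"
    return "hypostatic"

# For N = 10 the threshold is 3*10 - 6 = 24 contacts
print(contact_class(10, 25), 3 * 10 - 6)
```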

  17. Normalized Cut Algorithm for Automated Assignment of Protein Domains

    NASA Technical Reports Server (NTRS)

    Samanta, M. P.; Liang, S.; Zha, H.; Biegel, Bryan A. (Technical Monitor)

    2002-01-01

    We present a novel computational method for automatic assignment of protein domains from structural data. At the core of our algorithm lies a recently proposed clustering technique that has been very successful for image-partitioning applications. This graph-theory based clustering method uses the notion of a normalized cut to partition an undirected graph into its strongly-connected components. Computer implementation of our method tested on the standard comparison set of proteins from the literature shows a high success rate (84%), better than most existing alternatives. In addition, several other features of our algorithm, such as reliance on few adjustable parameters, linear run-time with respect to the size of the protein and reduced complexity compared to other graph-theory based algorithms, would make it an attractive tool for structural biologists.
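    The normalized-cut criterion this record builds on scores a bipartition (A, B) of a weighted undirected graph as Ncut(A, B) = cut(A, B)/assoc(A, V) + cut(A, B)/assoc(B, V), where cut sums the edge weights crossing the partition and assoc sums all edge weights incident to one side. The sketch below evaluates that score and finds the minimum by brute force; this exhaustive search is only for illustration on tiny graphs and is not the authors' implementation. The helper names are hypothetical, and every node is assumed to have at least one positive-weight edge.

```python
from itertools import combinations

def graph_from_edges(edges):
    """Build a symmetric adjacency dict {u: {v: weight}} for an undirected graph."""
    g = {}
    for u, v, w in edges:
        g.setdefault(u, {})[v] = w
        g.setdefault(v, {})[u] = w
    return g

def ncut_value(g, part_a):
    """Ncut(A, B) = cut(A, B)/assoc(A, V) + cut(A, B)/assoc(B, V), with B = V - A."""
    part_b = set(g) - part_a
    cut = sum(w for u in part_a for v, w in g[u].items() if v in part_b)
    assoc_a = sum(w for u in part_a for w in g[u].values())
    assoc_b = sum(w for u in part_b for w in g[u].values())
    return cut / assoc_a + cut / assoc_b

def best_bipartition(g):
    """Exhaustive minimum-Ncut bipartition; exponential, so tiny graphs only."""
    nodes = sorted(g)
    first, rest = nodes[0], nodes[1:]
    best = None
    # Keep `first` in A so each partition is counted once; B is never empty.
    for r in range(len(rest)):
        for combo in combinations(rest, r):
            part_a = {first, *combo}
            value = ncut_value(g, part_a)
            if best is None or value < best[0]:
                best = (value, part_a)
    return best
```

    Exact minimization is intractable for graphs of realistic size, which is why the original normalized-cut method works with a spectral (eigenvector) relaxation rather than enumeration.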

  18. Understanding continental megathrust earthquake potential through geological mountain building processes: an example in Nepal Himalaya

    NASA Astrophysics Data System (ADS)

    Zhang, Huai; Zhang, Zhen; Wang, Liangshu; Leroy, Yves; shi, Yaolin

    2017-04-01

    How to reconcile the characteristics of continental megathrust earthquakes, for instance by mapping large-to-great earthquake sequences onto geological mountain-building processes and by partitioning seismic and aseismic slip, is a fundamental and unresolved question. Here, we address these issues by focusing on a typical continental collisional belt, the great Nepal Himalaya. We first prove that refined Nepal Himalaya thrusting sequences, with an accurately defined large-earthquake cycle scale, provide new geodynamical hints on long-term earthquake potential in association with either the seismic-aseismic slip partition, up to the interpretation of the binary interseismic coupling pattern on the Main Himalayan Thrust (MHT), or the classification of large and great earthquakes via seismic cycle patterns on the MHT. Subsequently, sequential limit analysis is adopted to retrieve the detailed thrusting sequences of the Nepal Himalaya mountain wedge. Our model results exhibit an apparent thrusting concentration phenomenon with four thrusting clusters, termed thrusting 'families', which facilitate the development of their respective sub-structural regions. Within the hinterland thrusting family, the total aseismic shortening and the corresponding spatio-temporal release pattern are revealed by mapping projection. In the other three families, by contrast, mapping projection delivers long-term recurrence information on large (M<8) and great (M>8) earthquakes, including total lifespans, frequencies and the alternation of large and great earthquakes, by identifying rupture distances along the MHT. In addition, this partition applies universally to continental-continental collisional orogenic belts with an identified interseismic coupling pattern, but is not applicable in a continental-oceanic megathrust context.

  19. Genetic variability among elite popcorn lines based on molecular and morphoagronomic characteristics.

    PubMed

    Dos Santos, J F; Mangolin, C A; Machado, M F P S; Scapim, C A; Giordani, W; Gonçalves, L S A

    2017-06-29

    Knowledge of genetic diversity among genotypes and relationships among elite lines is of great importance for the development of breeding programs. Therefore, the objective of this study was to evaluate genetic variability based on the morphoagronomic and molecular characterization of 18 elite popcorn (Zea mays var. everta) lines to be used by Universidade Estadual de Maringá breeding programs. We used 31 microsatellite primers (widely distributed in the genome), and 16 morphological descriptors (including the resistance to maize white spot, common rust, polysora rust of maize, cercospora and leaf blights). The molecular data revealed variability among the lines, which were divided into four groups that were partially concordant with unweighted pair group method with arithmetic mean (UPGMA) and Bayesian clusters. The lines G3, G4, G11, and G13 exhibited favorable morphological characters and low disease incidence rates. The four groups were confirmed using the Gower distance in the UPGMA cluster; however, there was no association with the dissimilarity patterns obtained using the molecular data. The absence of a correlation suggests that both characterizations (morphoagronomic and molecular) are important for discriminating among elite popcorn lines.
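    UPGMA, used in this record, is agglomerative clustering in which the distance between two clusters is the unweighted mean of all leaf-to-leaf distances across them. The sketch below is a minimal pure-Python illustration that assumes a precomputed pairwise distance table (for example Gower distances, whose computation is not shown); `upgma` is an illustrative name, not the software used in the study, and the distances in the test are made up.

```python
from itertools import combinations

def upgma(dist):
    """UPGMA (average-linkage) agglomerative clustering.
    dist maps each unordered leaf pair, given in either order, to a distance.
    Returns the merge history as (merged_cluster, joining_distance) tuples."""
    def d(x, y):
        return dist[(x, y)] if (x, y) in dist else dist[(y, x)]

    def avg(a, b):
        # UPGMA distance: unweighted mean over all leaf pairs across the two clusters
        return sum(d(x, y) for x in a for y in b) / (len(a) * len(b))

    leaves = sorted({x for pair in dist for x in pair})
    clusters = [frozenset([x]) for x in leaves]
    merges = []
    while len(clusters) > 1:
        # Merge the closest pair of current clusters
        a, b = min(combinations(clusters, 2), key=lambda pair: avg(*pair))
        merged = a | b
        merges.append((merged, avg(a, b)))
        clusters = [c for c in clusters if c != a and c != b] + [merged]
    return merges
```

    Cutting the resulting tree at a chosen distance yields flat groups such as the four groups reported above; recomputing averages from the leaves is O(n^3) per merge but keeps the sketch short.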

  20. PAQ: Partition Analysis of Quasispecies.

    PubMed

    Baccam, P; Thompson, R J; Fedrigo, O; Carpenter, S; Cornette, J L

    2001-01-01

    The complexities of genetic data may not be accurately described by any single analytical tool. Phylogenetic analysis is often used to study the genetic relationship among different sequences. Evolutionary models and assumptions are invoked to reconstruct trees that describe the phylogenetic relationship among sequences. Genetic databases are rapidly accumulating large amounts of sequences. Newly acquired sequences, which have not yet been characterized, may require preliminary genetic exploration in order to build models describing the evolutionary relationship among sequences. There are clustering techniques that rely less on models of evolution, and thus may provide nice exploratory tools for identifying genetic similarities. Some of the more commonly used clustering methods perform better when data can be grouped into mutually exclusive groups. Genetic data from viral quasispecies, which consist of closely related variants that differ by small changes, however, may best be partitioned by overlapping groups. We have developed an intuitive exploratory program, Partition Analysis of Quasispecies (PAQ), which utilizes a non-hierarchical technique to partition sequences that are genetically similar. PAQ was used to analyze a data set of human immunodeficiency virus type 1 (HIV-1) envelope sequences isolated from different regions of the brain and another data set consisting of the equine infectious anemia virus (EIAV) regulatory gene rev. Analysis of the HIV-1 data set by PAQ was consistent with phylogenetic analysis of the same data, and the EIAV rev variants were partitioned into two overlapping groups. PAQ provides an additional tool which can be used to glean information from genetic data and can be used in conjunction with other tools to study genetic similarities and genetic evolution of viral quasispecies.

  1. SLURM: Simple Linux Utility for Resource Management

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jette, M; Dunlap, C; Garlick, J

    2002-07-08

    Simple Linux Utility for Resource Management (SLURM) is an open source, fault-tolerant, and highly scalable cluster management and job scheduling system for Linux clusters of thousands of nodes. Components include machine status, partition management, job management, scheduling and stream copy modules. The design also includes a scalable, general-purpose communication infrastructure. This paper presents an overview of the SLURM architecture and functionality.

  2. A Latent Class Multidimensional Scaling Model for Two-Way One-Mode Continuous Rating Dissimilarity Data

    ERIC Educational Resources Information Center

    Vera, J. Fernando; Macias, Rodrigo; Heiser, Willem J.

    2009-01-01

    In this paper, we propose a cluster-MDS model for two-way one-mode continuous rating dissimilarity data. The model aims at partitioning the objects into classes and simultaneously representing the cluster centers in a low-dimensional space. Under the normal distribution assumption, a latent class model is developed in terms of the set of…

  3. How to cluster in parallel with neural networks

    NASA Technical Reports Server (NTRS)

    Kamgar-Parsi, Behzad; Gualtieri, J. A.; Devaney, Judy E.; Kamgar-Parsi, Behrooz

    1988-01-01

    Partitioning a set of N patterns in a d-dimensional metric space into K clusters, in such a way that patterns in a given cluster are more similar to each other than to the rest, is a problem of interest in astrophysics, image analysis and other fields. As there are approximately K^N/K! possible ways of partitioning the patterns among K clusters, finding the best solution is beyond exhaustive search when N is large. Researchers show that this problem can be formulated as an optimization problem for which very good, but not necessarily optimal, solutions can be found by using a neural network. To do this the network must start from many randomly selected initial states. The network is simulated on the MPP (a 128 x 128 SIMD array machine), where researchers use the massive parallelism not only in solving the differential equations that govern the evolution of the network, but also by starting the network from many initial states at once, thus obtaining many solutions in one run. Researchers obtain speedups of two to three orders of magnitude over serial implementations and the promise, through analog VLSI implementations, of speedups commensurate with human perceptual abilities.
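    The exact number of ways to split N labelled patterns into exactly K non-empty clusters is the Stirling number of the second kind S(N, K), and K^N/K! is its leading term for large N. The short sketch below, with illustrative function names, computes both so the size of the search space can be checked directly.

```python
from math import factorial

def stirling2(n, k):
    """Exact number of ways to partition n labelled patterns into exactly k
    non-empty clusters: S(n, k) = k*S(n-1, k) + S(n-1, k-1)."""
    row = [1] + [0] * k  # row i holds S(i, 0..k); start at S(0, 0) = 1
    for i in range(1, n + 1):
        new = [0] * (k + 1)
        for j in range(1, min(i, k) + 1):
            new[j] = j * row[j] + row[j - 1]
        row = new
    return row[k]

def approx_partitions(n, k):
    """The K^N / K! estimate: leading term of S(N, K) for large N."""
    return k ** n / factorial(k)

# Compare the exact count with the estimate for K = 3 clusters
for n in (10, 20, 40):
    exact = stirling2(n, 3)
    print(n, exact, approx_partitions(n, 3), exact / approx_partitions(n, 3))
```

    Even for K = 3 the count grows like 3^N, which is why exhaustive search is ruled out once N is large.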

  4. Analyzing Sub-Classifications of Glaucoma via SOM Based Clustering of Optic Nerve Images.

    PubMed

    Yan, Sanjun; Abidi, Syed Sibte Raza; Artes, Paul Habib

    2005-01-01

    We present a data mining framework to cluster optic nerve images obtained by Confocal Scanning Laser Tomography (CSLT) in normal subjects and patients with glaucoma. We use self-organizing maps and expectation maximization methods to partition the data into clusters that provide insights into potential sub-classification of glaucoma based on morphological features. We conclude that our approach provides a first step towards a better understanding of morphological features in optic nerve images obtained from glaucoma patients and healthy controls.

  5. Identification of sensitive parameters in the modeling of SVOC reemission processes from soil to atmosphere.

    PubMed

    Loizeau, Vincent; Ciffroy, Philippe; Roustan, Yelva; Musson-Genon, Luc

    2014-09-15

    Semi-volatile organic compounds (SVOCs) are subject to Long-Range Atmospheric Transport because of successive transport-deposition-reemission processes. Several experimental studies available in the literature suggest that soil is a non-negligible contributor of SVOCs to the atmosphere. Coupling soil and atmosphere in integrated models and simulating reemission processes can therefore be essential for estimating the atmospheric concentration of several pollutants. However, the sources of uncertainty and variability are multiple (soil properties, meteorological conditions, chemical-specific parameters) and can significantly influence the determination of reemissions. In order to identify the key parameters in reemission modeling and their effect on global modeling uncertainty, we conducted a sensitivity analysis targeted on the 'reemission' output variable. Different parameters were tested, including soil properties, partition coefficients and meteorological conditions. We performed EFAST sensitivity analysis for four chemicals (benzo-a-pyrene, hexachlorobenzene, PCB-28 and lindane) and different spatial scenarios (regional and continental scales). Partition coefficients between air, solid and water phases are influential, depending on the precision of the data and the overall behavior of the chemical. Reemissions were less sensitive to soil parameters (soil organic matter and water contents at field capacity and wilting point). A mapping of these parameters at a regional scale is sufficient to correctly estimate reemissions when compared to other sources of uncertainty. Copyright © 2014 Elsevier B.V. All rights reserved.

  6. A Genome Wide Survey of SNP Variation Reveals the Genetic Structure of Sheep Breeds

    PubMed Central

    Kijas, James W.; Townley, David; Dalrymple, Brian P.; Heaton, Michael P.; Maddox, Jillian F.; McGrath, Annette; Wilson, Peter; Ingersoll, Roxann G.; McCulloch, Russell; McWilliam, Sean; Tang, Dave; McEwan, John; Cockett, Noelle; Oddy, V. Hutton; Nicholas, Frank W.; Raadsma, Herman

    2009-01-01

    The genetic structure of sheep reflects their domestication and subsequent formation into discrete breeds. Understanding genetic structure is essential for achieving genetic improvement through genome-wide association studies, genomic selection and the dissection of quantitative traits. After identifying the first genome-wide set of SNP for sheep, we report on levels of genetic variability both within and between a diverse sample of ovine populations. Then, using cluster analysis and the partitioning of genetic variation, we demonstrate sheep are characterised by weak phylogeographic structure, overlapping genetic similarity and generally low differentiation which is consistent with their short evolutionary history. The degree of population substructure was, however, sufficient to cluster individuals based on geographic origin and known breed history. Specifically, African and Asian populations clustered separately from breeds of European origin sampled from Australia, New Zealand, Europe and North America. Furthermore, we demonstrate the presence of stratification within some, but not all, ovine breeds. The results emphasize that careful documentation of genetic structure will be an essential prerequisite when mapping the genetic basis of complex traits. Furthermore, the identification of a subset of SNP able to assign individuals into broad groupings demonstrates even a small panel of markers may be suitable for applications such as traceability. PMID:19270757

  7. Identifying sighting clusters of endangered taxa with historical records.

    PubMed

    Duffy, Karl J

    2011-04-01

    The probability and time of extinction of taxa are often inferred from statistical analyses of historical records. Many of these analyses require the exclusion of multiple records within a unit of time (e.g., a month or a year). Nevertheless, spatially explicit, temporally aggregated data may be useful for identifying clusters of sightings (i.e., sighting clusters) in space and time. Identification of sighting clusters highlights changes in the historical recording of endangered taxa. I used two methods to identify sighting clusters in historical records: the Ederer-Myers-Mantel (EMM) test and the space-time permutation scan (STPS). I applied these methods to the spatially explicit sighting records of three species of orchids that are listed as endangered in the Republic of Ireland under the Wildlife Act (1976): Cephalanthera longifolia, Hammarbya paludosa, and Pseudorchis albida. Results with the EMM test were strongly affected by the choice of the time interval, and thus the number of temporal samples, used to examine the records. For example, sightings of P. albida clustered when the records were partitioned into 20-year temporal samples, but not when they were partitioned into 22-year temporal samples. Because the statistical power of EMM was low, it will not be useful when data are sparse. Nevertheless, the STPS identified regions that contained sighting clusters because it uses a flexible scanning window (defined by cylinders of varying size that move over the study area and evaluate the likelihood of clustering) to detect them, and it identified regions with high and regions with low rates of orchid sightings. The STPS analyses can be used to detect sighting clusters of endangered species that may be related to regions of extirpation and may assist in the categorization of threat status. ©2010 Society for Conservation Biology.

  8. Topological structures in the equities market network

    PubMed Central

    Leibon, Gregory; Pauls, Scott; Rockmore, Daniel; Savell, Robert

    2008-01-01

    We present a new method for articulating scale-dependent topological descriptions of the network structure inherent in many complex systems. The technique is based on “partition decoupled null models,” a new class of null models that incorporate the interaction of clustered partitions into a random model and generalize the Gaussian ensemble. As an application, we analyze a correlation matrix derived from 4 years of close prices of equities in the New York Stock Exchange (NYSE) and National Association of Securities Dealers Automated Quotation (NASDAQ). In this example, we expose (i) a natural structure composed of 2 interacting partitions of the market that both agrees with and generalizes standard notions of scale (e.g., sector and industry) and (ii) structure in the first partition that is a topological manifestation of a well-known pattern of capital flow called “sector rotation.” Our approach gives rise to a natural form of multiresolution analysis of the underlying time series that naturally decomposes the basic data in terms of the effects of the different scales at which it clusters. We support our conclusions and show the robustness of the technique with a successful analysis on a simulated network with an embedded topological structure. The equities market is a prototypical complex system, and we expect that our approach will be of use in understanding a broad class of complex systems in which correlation structures are resident.

  9. Orbits of Four Very Massive Binaries in the R136 Cluster

    NASA Astrophysics Data System (ADS)

    Massey, Philip; Penny, Laura R.; Vukovich, Julia

    2002-02-01

    We present radial velocity and photometry for four early-type, massive, double-lined spectroscopic binaries in the R136 cluster. Three of these systems are eclipsing, allowing orbital inclinations to be determined. One of these systems, R136-38 (O3 V+O6 V), has one of the highest masses ever measured for the primary, 57 Msolar. Comparison of our masses with those derived from standard evolutionary tracks shows excellent agreement. We also identify five other light variables in the R136 cluster that are worthy of follow-up study. Based on observations made with the NASA/ESA Hubble Space Telescope, obtained at the Space Telescope Science Institute, which is operated by the Association of Universities for Research in Astronomy, Inc., under NASA contract NAS 5-26555. These observations are associated with proposal 8217.

  10. Diversity in phenotypic and nutritional traits in vegetable amaranth (Amaranthus tricolor), a nutritionally underutilised crop.

    PubMed

    Shukla, Sudhir; Bhargava, Atul; Chatterjee, Avijeet; Pandey, Avinash Chandra; Mishra, Brij K

    2010-01-15

    Assessment of genetic diversity in a crop-breeding programme helps in the identification of diverse parental combinations to create segregating progenies with maximum genetic variability and facilitates introgression of desirable genes from diverse germplasm into the available genetic base. In the present study, 39 strains of vegetable amaranth (Amaranthus tricolor) were evaluated for eight morphological and seven quality traits for two test seasons to study the extent of genetic divergence among the strains. Multivariate analysis showed that the first four principal components contributed 67.55% of the variability. Cluster analysis grouped the strains into six clusters that displayed a wide range of diversity for most of the traits. Cluster analysis has proved to be an effective method in grouping strains that may facilitate effective management and utilisation in crop-breeding programmes. The diverse strains falling in different clusters were identified, which can be utilised in different hybridisation programmes to develop high-foliage-yielding varieties rich in nutritional components. Copyright (c) 2009 Society of Chemical Industry.

  11. Origin and diversification of winged bean (Psophocarpus tetragonolobus (L.) DC.), a multipurpose underutilized legume.

    PubMed

    Yang, Shuyi; Grall, Aurélie; Chapman, Mark A

    2018-05-01

    For many crops, research into the origin and partitioning of genetic variation is limited and this can slow or prevent crop improvement programs. Many of these underutilized crops have traits that could be of benefit in a changing climate due to stress tolerance or nutritional properties. Winged bean (Psophocarpus tetragonolobus (L.) DC.) is one such crop. All parts of the plant, from the roots to the seeds, can be eaten, and the plant is high in protein as well as other nutrients. The goal of our study was to identify the wild progenitor and analyze the partitioning of genetic variation in the crop. We used molecular phylogenetic analyses (cpDNA and nuclear ITS sequencing) to resolve relationships between all species in the genus, and population genetics (utilizing microsatellites) to identify genetic clusters of winged bean accessions and compare this to geography. We find that winged bean is genetically distinct from all other members of the genus. We also provide support for four groups of species in the genus, largely, but not completely, corresponding to the results of previous morphological analyses. Within winged bean, population genetic analysis using 10 polymorphic microsatellite markers suggests four genetic groups; however, there is little correspondence between the genetic variation and the geography of the accessions. The true wild progenitor of winged bean remains unknown (or is extinct). There has likely been large-scale cross-breeding, trade, and transport of winged bean and/or multiple origins of the crop. © 2018 Botanical Society of America.

  12. Automatic Clustering Using Multi-objective Particle Swarm and Simulated Annealing

    PubMed Central

    Abubaker, Ahmad; Baharum, Adam; Alrefaei, Mahmoud

    2015-01-01

    This paper puts forward a new automatic clustering algorithm based on Multi-Objective Particle Swarm Optimization and Simulated Annealing, “MOPSOSA”. The proposed algorithm is capable of automatic clustering which is appropriate for partitioning datasets to a suitable number of clusters. MOPSOSA combines the features of the multi-objective based particle swarm optimization (PSO) and the Multi-Objective Simulated Annealing (MOSA). Three cluster validity indices were optimized simultaneously to establish the suitable number of clusters and the appropriate clustering for a dataset. The first cluster validity index is centred on Euclidean distance, the second on the point symmetry distance, and the last cluster validity index is based on short distance. A number of algorithms have been compared with the MOPSOSA algorithm in resolving clustering problems by determining the actual number of clusters and optimal clustering. Computational experiments were carried out to study fourteen artificial and five real life datasets. PMID:26132309
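    Optimizing three cluster validity indices at once means comparing candidate clusterings by Pareto dominance rather than by a single score. The sketch below shows only that non-dominance test; it does not reproduce MOPSOSA's particle updates, annealing schedule or archive handling. It assumes every index has been oriented so that smaller is better (a maximized index can be negated), and the scores are hypothetical.

```python
def dominates(a, b):
    """True if objective vector a dominates b, with every objective minimised:
    a is no worse on all objectives and strictly better on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(candidates):
    """Return the candidates that no other candidate dominates."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]

# Hypothetical scores of four candidate clusterings on three validity indices,
# each already oriented so that smaller is better.
scores = [(0.2, 0.5, 0.3), (0.4, 0.1, 0.2), (0.5, 0.6, 0.4), (0.1, 0.9, 0.1)]
print(pareto_front(scores))
```

    Only the third candidate is discarded here, because the first is at least as good on all three indices and strictly better on each.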

  13. Polynomial-Time Approximation Algorithm for the Problem of Cardinality-Weighted Variance-Based 2-Clustering with a Given Center

    NASA Astrophysics Data System (ADS)

    Kel'manov, A. V.; Motkova, A. V.

    2018-01-01

    A strongly NP-hard problem of partitioning a finite set of points of Euclidean space into two clusters is considered. The solution criterion is the minimum of the sum (over both clusters) of weighted sums of squared distances from the elements of each cluster to its geometric center. The weights of the sums are equal to the cardinalities of the desired clusters. The center of one cluster is given as input, while the center of the other is unknown and is determined as the point of space equal to the mean of the cluster elements. A version of the problem is analyzed in which the cardinalities of the clusters are given as input. A polynomial-time 2-approximation algorithm for solving the problem is constructed.
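
    Written out, with z the given center of cluster C_1 and ȳ(C_2) the mean of the elements of C_2, the criterion is F = |C_1| Σ_{y∈C_1} ||y − z||² + |C_2| Σ_{y∈C_2} ||y − ȳ(C_2)||². The sketch below evaluates this criterion and finds the exact optimum by brute force over all splits with a given |C_1| = m. It is exponential and only meant to make the objective concrete on tiny inputs; it is not the authors' polynomial-time 2-approximation, and the helper names are hypothetical.

```python
import itertools

import numpy as np


def objective(points: np.ndarray, in_first: np.ndarray, z: np.ndarray) -> float:
    """Cardinality-weighted sum of squared distances.

    Cluster 1 (in_first == True) is measured against the given center z;
    cluster 2 against its own centroid (the mean of its elements).
    """
    c1, c2 = points[in_first], points[~in_first]
    f1 = len(c1) * np.sum((c1 - z) ** 2)
    f2 = len(c2) * np.sum((c2 - c2.mean(axis=0)) ** 2) if len(c2) else 0.0
    return float(f1 + f2)


def brute_force(points: np.ndarray, z: np.ndarray, m: int):
    """Exact optimum over all splits with |cluster 1| == m (tiny inputs only)."""
    n = len(points)
    best_val, best_mask = np.inf, None
    for idx in itertools.combinations(range(n), m):
        mask = np.zeros(n, dtype=bool)
        mask[list(idx)] = True
        val = objective(points, mask, z)
        if val < best_val:
            best_val, best_mask = val, mask
    return best_val, best_mask


pts = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
z = np.array([0.0, 0.5])  # hypothetical given center of cluster 1
val, mask = brute_force(pts, z, m=2)
print(val, mask.tolist())  # prints 2.0 [True, True, False, False]
```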

  14. Partitioning the impact of environment and spatial structure on alpha and beta components of taxonomic, functional, and phylogenetic diversity in European ants.

    PubMed

    Arnan, Xavier; Cerdá, Xim; Retana, Javier

    2015-01-01

    We analyze the relative contribution of environmental and spatial variables to the alpha and beta components of taxonomic (TD), phylogenetic (PD), and functional (FD) diversity in ant communities found along different climate and anthropogenic disturbance gradients across western and central Europe, in order to assess the mechanisms structuring ant biodiversity. To this end, we calculated alpha and beta TD, PD, and FD for 349 ant communities, which included a total of 155 ant species; we examined 10 functional traits and phylogenetic relatedness. Variation partitioning was used to examine how much variation in ant diversity was explained by environmental and spatial variables. Autocorrelation in diversity measures and each trait's phylogenetic signal were also analyzed. We found strong autocorrelation in diversity measures. Both environmental and spatial variables significantly contributed to variation in TD, PD, and FD at both alpha and beta scales; spatial structure had the larger influence. The different facets of diversity showed similar patterns along environmental gradients. Environment explained a much larger percentage of variation in FD than in TD or PD. All traits demonstrated strong phylogenetic signals. Our results indicate that environmental filtering and dispersal limitations structure all types of diversity in ant communities. Strong dispersal limitations appear to have led to clustering of TD, PD, and FD in western and central Europe, probably because different historical and evolutionary processes generated different pools of species. Remarkably, these three facets of diversity showed parallel patterns along environmental gradients. Trait-mediated species sorting and niche conservatism appear to structure ant diversity, as evidenced by the fact that more variation was explained for FD and that all traits had strong phylogenetic signals. Since environmental variables explained much more variation in FD than in PD, functional diversity should be a better indicator of community assembly processes than phylogenetic diversity.
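
    Variation partitioning of this kind splits the explained variance of a response into a pure environmental fraction, a shared fraction, and a pure spatial fraction by comparing the R² of regressions on environmental, spatial, and combined predictors. The minimal sketch below uses ordinary least squares on a single response and synthetic data. The abstract does not give the study's exact procedure (for example adjusted R², redundancy analysis for multivariate beta diversity, or how spatial predictors were built), so those choices, and all names here, are assumptions.

```python
import numpy as np


def r_squared(y: np.ndarray, X: np.ndarray) -> float:
    """Ordinary R^2 of an OLS fit of y on X (an intercept column is added)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return float(1.0 - resid @ resid / np.sum((y - y.mean()) ** 2))


def variation_partitioning(y, env, space) -> dict:
    """Classic [a] / [b] / [c] / [d] fractions from three nested regressions."""
    r_e = r_squared(y, env)
    r_s = r_squared(y, space)
    r_es = r_squared(y, np.column_stack([env, space]))
    return {
        "pure_env": r_es - r_s,          # [a]
        "shared": r_e + r_s - r_es,      # [b], can be slightly negative
        "pure_space": r_es - r_e,        # [c]
        "unexplained": 1.0 - r_es,       # [d]
    }


# Hypothetical data: diversity driven mostly by environment, weakly by space
rng = np.random.default_rng(0)
n = 200
env = rng.normal(size=(n, 2))
space = rng.normal(size=(n, 2))
y = env @ np.array([1.0, 0.5]) + space @ np.array([0.3, 0.2]) + rng.normal(scale=0.5, size=n)
fractions = variation_partitioning(y, env, space)
print(fractions)
```

    The four fractions sum to 1 by construction; the shared fraction is a difference of R² values rather than a fitted quantity, which is why it can come out slightly negative when predictor sets are nearly independent.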

  15. Sputum neutrophil counts are associated with more severe asthma phenotypes using cluster analysis.

    PubMed

    Moore, Wendy C; Hastie, Annette T; Li, Xingnan; Li, Huashi; Busse, William W; Jarjour, Nizar N; Wenzel, Sally E; Peters, Stephen P; Meyers, Deborah A; Bleecker, Eugene R

    2014-06-01

    Clinical cluster analysis from the Severe Asthma Research Program (SARP) identified 5 asthma subphenotypes that represent the severity spectrum of early-onset allergic asthma, late-onset severe asthma, and severe asthma with chronic obstructive pulmonary disease characteristics. Analysis of induced sputum from a subset of SARP subjects showed 4 sputum inflammatory cellular patterns. Subjects with concurrent increases in eosinophil (≥2%) and neutrophil (≥40%) percentages had characteristics of very severe asthma. To better understand interactions between inflammation and clinical subphenotypes, we integrated inflammatory cellular measures and clinical variables in a new cluster analysis. Participants in SARP who underwent sputum induction at 3 clinical sites were included in this analysis (n = 423). Fifteen variables, including clinical characteristics and blood and sputum inflammatory cell assessments, were selected using factor analysis for unsupervised cluster analysis. Four phenotypic clusters were identified. Cluster A (n = 132) and B (n = 127) subjects had mild-to-moderate early-onset allergic asthma with paucigranulocytic or eosinophilic sputum inflammatory cell patterns. In contrast, these inflammatory patterns were present in only 7% of cluster C (n = 117) and D (n = 47) subjects who had moderate-to-severe asthma with frequent health care use despite treatment with high doses of inhaled or oral corticosteroids and, in cluster D, reduced lung function. The majority of these subjects (>83%) had sputum neutrophilia either alone or with concurrent sputum eosinophilia. Baseline lung function and sputum neutrophil percentages were the most important variables determining cluster assignment. This multivariate approach identified 4 asthma subphenotypes representing the severity spectrum from mild-to-moderate allergic asthma with minimal or eosinophil-predominant sputum inflammation to moderate-to-severe asthma with neutrophil-predominant or mixed granulocytic inflammation. Published by Mosby, Inc.

  16. A local search for a graph clustering problem

    NASA Astrophysics Data System (ADS)

    Navrotskaya, Anna; Il'ev, Victor

    2016-10-01

    In clustering problems, one has to partition a given set of objects (a data set) into subsets (called clusters), taking into consideration only the similarity of the objects. One of the most visual formalizations of clustering is graph clustering: the objects are the vertices of a graph, edges represent similarities between objects, and the vertices are grouped into clusters according to the edge structure of the graph. In the graph k-clustering problem, the number of clusters does not exceed k, and the goal is to minimize the total number of edges between clusters and missing edges within clusters. This problem is NP-hard for any k ≥ 2. We propose a polynomial-time (2k-1)-approximation algorithm for graph k-clustering. We then apply a local search procedure to the feasible solution found by this algorithm and study the resulting heuristics experimentally.
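
    The objective counts disagreements: an edge whose endpoints lie in different clusters, or a non-adjacent pair placed in the same cluster. The sketch below computes that cost and runs a simple single-vertex-move local search over at most k labels. It is an illustration of the problem and of local search in general, assuming an arbitrary starting partition; it is not the authors' (2k-1)-approximation algorithm or their specific local search, and the function names are hypothetical.

```python
from itertools import combinations
from typing import List, Set, Tuple

Edge = Tuple[int, int]


def cost(n: int, edges: Set[Edge], labels: List[int]) -> int:
    """Edges between clusters plus missing edges (non-adjacent pairs) inside clusters."""
    total = 0
    for u, v in combinations(range(n), 2):
        adjacent = (u, v) in edges or (v, u) in edges
        same = labels[u] == labels[v]
        if adjacent != same:  # inter-cluster edge, or missing intra-cluster edge
            total += 1
    return total


def local_search(n: int, edges: Set[Edge], labels: List[int], k: int):
    """Move single vertices among the k labels while the cost strictly decreases."""
    labels = list(labels)
    best = cost(n, edges, labels)
    improved = True
    while improved:
        improved = False
        for v in range(n):
            for c in range(k):
                if c == labels[v]:
                    continue
                old = labels[v]
                labels[v] = c
                new = cost(n, edges, labels)
                if new < best:
                    best, improved = new, True
                else:
                    labels[v] = old  # undo a non-improving move
    return labels, best


# Two triangles joined by the single edge (2, 3); start with everything in one cluster
edges = {(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)}
labels, best = local_search(6, edges, [0] * 6, k=2)
print(labels, best)  # prints [1, 1, 1, 0, 0, 0] 1
```

    Recomputing the full cost after every move is quadratic in n and keeps the sketch short; a practical implementation would update the cost incrementally from the moved vertex's neighbourhood.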

  17. Inter-method Performance Study of Tumor Volumetry Assessment on Computed Tomography Test-retest Data

    PubMed Central

    Buckler, Andrew J.; Danagoulian, Jovanna; Johnson, Kjell; Peskin, Adele; Gavrielides, Marios A.; Petrick, Nicholas; Obuchowski, Nancy A.; Beaumont, Hubert; Hadjiiski, Lubomir; Jarecha, Rudresh; Kuhnigk, Jan-Martin; Mantri, Ninad; McNitt-Gray, Michael; Moltz, Jan Hendrik; Nyiri, Gergely; Peterson, Sam; Tervé, Pierre; Tietjen, Christian; von Lavante, Etienne; Ma, Xiaonan; Pierre, Samantha St.; Athelogou, Maria

    2015-01-01

    Rationale and objectives Tumor volume change has potential as a biomarker for diagnosis, therapy planning, and treatment response. Precision was evaluated and compared among semi-automated lung tumor volume measurement algorithms from clinical thoracic CT datasets. The results inform approaches and testing requirements for establishing conformance with the Quantitative Imaging Biomarker Alliance (QIBA) CT Volumetry Profile. Materials and Methods Industry and academic groups participated in a challenge study. Intra-algorithm repeatability and inter-algorithm reproducibility were estimated. Relative magnitudes of various sources of variability were estimated using a linear mixed effects model. Segmentation boundaries were compared to provide a basis on which to optimize algorithm performance for developers. Results Intra-algorithm repeatability ranged from 13% (best performing) to 100% (least performing), with most algorithms demonstrating improved repeatability as the tumor size increased. Inter-algorithm reproducibility was determined in three partitions and found to be 58% for the four best performing groups, 70% for the set of groups meeting repeatability requirements, and 84% when all groups but the least performer were included. The best performing partition performed markedly better on tumors with equivalent diameters above 40 mm. Larger tumors benefited from human editing but smaller tumors did not. One-fifth to one-half of the total variability came from sources independent of the algorithms. Segmentation boundaries differed substantially, not just in overall volume but in detail. Conclusions Nine of the twelve participating algorithms pass precision requirements similar to what is indicated in the QIBA Profile, with the caveat that the current study was not designed to explicitly evaluate algorithm Profile conformance. Change in tumor volume can be measured with confidence to within ±14% using any of these nine algorithms on tumor sizes above 10 mm. No partition of the algorithms was able to meet the QIBA requirements for interchangeability down to 10 mm, though the partition comprising the best performing algorithms did meet this requirement above a tumor size of approximately 40 mm. PMID:26376841

  18. Genetic and metabolite diversity of Sardinian populations of Helichrysum italicum.

    PubMed

    Melito, Sara; Sias, Angela; Petretto, Giacomo L; Chessa, Mario; Pintore, Giorgio; Porceddu, Andrea

    2013-01-01

    Helichrysum italicum (Asteraceae) is a small shrub endemic to the Mediterranean Basin, growing in fragmented and diverse habitats. The species has attracted attention due to its secondary metabolite content, but little effort has as yet been dedicated to assessing the genetic and metabolite diversity present in these populations. Here, we describe the diversity of 50 H. italicum populations collected from a range of habitats in Sardinia. H. italicum plants were AFLP fingerprinted and the composition of their leaf essential oil characterized by GC-MS. The relationships between the genetic structure of the populations, soil, habitat and climatic variables and the essential oil chemotypes present were evaluated using Bayesian clustering, contingency analyses and AMOVA. The Sardinian germplasm could be partitioned into two AFLP-based clades. Populations collected from the southwestern region constituted a homogeneous group which remained virtually intact even at high values of K (the number of clusters assumed in the Bayesian clustering). The second, much larger clade was more diverse. A positive correlation between genetic diversity and elevation suggested the action of natural purifying selection. Four main classes of compounds were identified among the essential oils, namely monoterpenes, oxygenated monoterpenes, sesquiterpenes and oxygenated sesquiterpenes. Oxygenated monoterpene levels were significantly correlated with the AFLP-based clade structure, suggesting a correspondence between gene pool and chemical diversity. The results suggest an association between chemotype, genetic diversity and collection location which is relevant for the planning of future collections aimed at identifying valuable sources of essential oil.

  19. Temporal information partitioning: Characterizing synergy, uniqueness, and redundancy in interacting environmental variables

    NASA Astrophysics Data System (ADS)

    Goodwell, Allison E.; Kumar, Praveen

    2017-07-01

    Information theoretic measures can be used to identify nonlinear interactions between source and target variables through reductions in uncertainty. In information partitioning, multivariate mutual information is decomposed into synergistic, unique, and redundant components. Synergy is information shared only when sources influence a target together, uniqueness is information provided by only one source, and redundancy is overlapping shared information from multiple sources. While this partitioning has been applied to provide insights into complex dependencies, several proposed partitioning methods overestimate redundant information and omit a component of unique information because they do not account for source dependencies. Additionally, information partitioning has only been applied to time-series data in a limited context, using basic pdf estimation techniques or a Gaussian assumption. We develop a Rescaled Redundancy measure (Rs) to solve the source dependency issue, and present Gaussian, autoregressive, and chaotic test cases to demonstrate its advantages over existing techniques in the presence of noise, various source correlations, and different types of interactions. This study constitutes the first rigorous application of information partitioning to environmental time-series data, and addresses how noise, pdf estimation technique, or source dependencies can influence detected measures. We illustrate how our techniques can unravel the complex nature of forcing and feedback within an ecohydrologic system with an application to 1-min environmental signals of air temperature, relative humidity, and windspeed. The methods presented here are applicable to the study of a broad range of complex systems composed of interacting variables.
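
    The decomposition can be made concrete for two discrete sources X1, X2 and a target Y. The sketch below uses plug-in entropy estimates and the simple minimum-mutual-information (MMI) redundancy, R = min(I(X1;Y), I(X2;Y)), which is a common baseline but is not the paper's Rescaled Redundancy measure Rs; Rs additionally accounts for dependence between the sources. The function names are hypothetical.

```python
from collections import Counter
from math import log2
from typing import Dict, Sequence


def entropy(*cols: Sequence) -> float:
    """Plug-in joint entropy (bits) of one or more aligned discrete sequences."""
    counts = Counter(zip(*cols))
    n = sum(counts.values())
    return -sum(c / n * log2(c / n) for c in counts.values())


def mutual_info(xs: Sequence, ys: Sequence) -> float:
    return entropy(xs) + entropy(ys) - entropy(xs, ys)


def mmi_partition(x1, x2, y) -> Dict[str, float]:
    """Split I(X1,X2;Y) into redundant, unique and synergistic parts (MMI redundancy)."""
    i1, i2 = mutual_info(x1, y), mutual_info(x2, y)
    i_tot = mutual_info(list(zip(x1, x2)), y)  # treat (X1, X2) as one joint source
    r = min(i1, i2)
    u1, u2 = i1 - r, i2 - r
    return {"redundant": r, "unique_1": u1, "unique_2": u2,
            "synergistic": i_tot - u1 - u2 - r}


# XOR: neither source alone says anything about Y, together they determine it
xor = mmi_partition([0, 0, 1, 1], [0, 1, 0, 1], [0, 1, 1, 0])
# Copies: both sources carry the same single bit about Y
copy = mmi_partition([0, 1, 0, 1], [0, 1, 0, 1], [0, 1, 0, 1])
print(xor)   # synergistic == 1.0, everything else 0.0
print(copy)  # redundant == 1.0, everything else 0.0
```

    The two toy cases are the textbook extremes: XOR is purely synergistic and identical copies are purely redundant. For correlated but non-identical sources, the MMI choice is exactly where the abstract's concern about overestimated redundancy applies.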

  20. Variability and reliability of the vastus lateralis muscle anatomy.

    PubMed

    D'Arpa, Salvatore; Toia, Francesca; Brenner, Erich; Melloni, Carlo; Moschella, Francesco; Cordova, Adriana

    2016-08-01

    The aims of this study are to investigate the variability of the morphological and neurovascular anatomy of the vastus lateralis (VL) muscle and to describe the relationships among its intramuscular partitions and with the other muscles of the quadriceps femoris. Clinical implications for its reliability as a flap donor are also discussed. In 2012, the extra- and intramuscular neurovascular anatomy of the VL was investigated in 10 cadaveric lower limbs. In three specimens, the segmental arterial pedicles were injected with latex of different colors to reveal their anastomotic connections. The morphological anatomy was investigated with regard to the mutual relationship of the three muscular partitions and the relation of the VL with the other muscles of the quadriceps femoris. The VL has a segmental morphological anatomy. However, the fibers of its three partitions interconnect with one another and with the other bellies of the quadriceps femoris, particularly with the vastus intermedius in several variable portions, mainly in the posterior part of the VL. The lateral circumflex femoral artery and its branches have variable origin, but demonstrate constant segmental distribution. Intramuscular dissection and colored latex injections show a rich anastomotic vascular network among the three partitions. Moderate variability exists in both the myological and the neurovascular anatomy of the VL. Despite this variability, the anatomy of the VL always has a constant segmental pattern, which makes the VL a reliable flap donor. Detailed knowledge of the VL anatomy could have useful applications in a broad clinical field.

  1. Exact low-temperature series expansion for the partition function of the zero-field Ising model on the infinite square lattice.

    PubMed

    Siudem, Grzegorz; Fronczak, Agata; Fronczak, Piotr

    2016-10-10

    In this paper, we provide the exact expression for the coefficients in the low-temperature series expansion of the partition function of the two-dimensional Ising model on the infinite square lattice. This is equivalent to exact determination of the number of spin configurations at a given energy. With these coefficients, we show that the ferromagnetic-to-paramagnetic phase transition in the square lattice Ising model can be explained through equivalence between the model and the perfect gas of energy clusters model, in which the passage through the critical point is related to the complete change in the thermodynamic preferences on the size of clusters. The combinatorial approach reported in this article is very general and can be easily applied to other lattice models.

  2. Exact low-temperature series expansion for the partition function of the zero-field Ising model on the infinite square lattice

    PubMed Central

    Siudem, Grzegorz; Fronczak, Agata; Fronczak, Piotr

    2016-01-01

    In this paper, we provide the exact expression for the coefficients in the low-temperature series expansion of the partition function of the two-dimensional Ising model on the infinite square lattice. This is equivalent to exact determination of the number of spin configurations at a given energy. With these coefficients, we show that the ferromagnetic–to–paramagnetic phase transition in the square lattice Ising model can be explained through equivalence between the model and the perfect gas of energy clusters model, in which the passage through the critical point is related to the complete change in the thermodynamic preferences on the size of clusters. The combinatorial approach reported in this article is very general and can be easily applied to other lattice models. PMID:27721435
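
    The coefficients of the low-temperature expansion count spin configurations at each energy, i.e. the density of states g(E), with Z(β) = Σ_E g(E) e^{−βE}. For a small finite lattice these counts can be obtained by brute force, which is a useful sanity check on any closed-form result. The sketch below assumes zero field, J = 1 and an L × L torus (periodic boundaries); it is exhaustive enumeration on a finite lattice, not the paper's combinatorial formula for the infinite lattice, and the function names are hypothetical.

```python
import math
from collections import Counter
from itertools import product


def density_of_states(L: int) -> Counter:
    """Exact count of spin configurations per energy on an L x L periodic lattice.

    Zero field, J = 1: E = -sum over nearest-neighbour bonds of s_i * s_j.
    Enumerates all 2**(L*L) states, so only practical for L <= 4; requires
    L >= 3 so that each bond is counted exactly once.
    """
    counts: Counter = Counter()
    for spins in product((-1, 1), repeat=L * L):
        energy = 0
        for r in range(L):
            for c in range(L):
                s = spins[r * L + c]
                # Right and down neighbours with wrap-around: each bond once
                energy -= s * spins[r * L + (c + 1) % L]
                energy -= s * spins[((r + 1) % L) * L + c]
        counts[energy] += 1
    return counts


def partition_function(dos: Counter, beta: float) -> float:
    """Z(beta) = sum over energies of g(E) * exp(-beta * E)."""
    return sum(g * math.exp(-beta * E) for E, g in dos.items())


dos = density_of_states(4)
print(sorted(dos.items())[:3])  # lowest energies and their degeneracies
```

    On the 4 × 4 torus the two ground states have E = −32, and the first excited level (E = −24) consists of the 2 × 16 configurations with a single flipped spin, each breaking four bonds.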

  3. Partitioning of the degradation space for OCR training

    NASA Astrophysics Data System (ADS)

    Barney Smith, Elisa H.; Andersen, Tim

    2006-01-01

    Generally speaking, optical character recognition algorithms tend to perform better when presented with homogeneous data. This paper studies a method that is designed to increase the homogeneity of training data, based on an understanding of the types of degradations that occur during the printing and scanning process, and how these degradations affect the homogeneity of the data. While it has been shown that dividing the degradation space by edge spread improves recognition accuracy over dividing it by threshold or point spread function width alone, the challenge is in deciding how many partitions to make and at what values of edge spread the divisions should be made. Clustering of different types of character features, fonts, sizes, resolutions and noise levels shows that edge spread is indeed a strong indicator of the homogeneity of character data clusters.

  4. Lagrangian based methods for coherent structure detection

    NASA Astrophysics Data System (ADS)

    Allshouse, Michael R.; Peacock, Thomas

    2015-09-01

    There has been a proliferation in the development of Lagrangian analytical methods for detecting coherent structures in fluid flow transport, yielding a variety of qualitatively different approaches. We present a review of four approaches and demonstrate the utility of these methods via their application to the same sample analytic model, the canonical double-gyre flow, highlighting the pros and cons of each approach. Two of the methods, the geometric and probabilistic approaches, are well established and require velocity field data over the time interval of interest to identify particularly important material lines and surfaces, and influential regions, respectively. The other two approaches, implementing tools from cluster and braid theory, seek coherent structures based on limited trajectory data, attempting to partition the flow transport into distinct regions. All four of these approaches share the common trait that they are objective methods, meaning that their results do not depend on the frame of reference used. For each method, we also present a number of example applications ranging from blood flow and chemical reactions to ocean and atmospheric flows.
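
    To make the geometric family concrete, the sketch below computes a finite-time Lyapunov exponent (FTLE) field for the double-gyre flow. FTLE is a widely used Lagrangian diagnostic whose ridges approximate the important material lines; treating it as representative of the review's geometric approach is an assumption. The parameter values A = 0.1, ε = 0.25, ω = 2π/10, the integration time and the grid are common choices in the literature but are assumptions here, not values from the review.

```python
import numpy as np

A, EPS, OMEGA = 0.1, 0.25, 2 * np.pi / 10  # assumed double-gyre parameters


def velocity(x, y, t):
    """Time-periodic double-gyre velocity field on [0, 2] x [0, 1]."""
    a = EPS * np.sin(OMEGA * t)
    b = 1 - 2 * EPS * np.sin(OMEGA * t)
    f = a * x**2 + b * x
    dfdx = 2 * a * x + b
    u = -np.pi * A * np.sin(np.pi * f) * np.cos(np.pi * y)
    v = np.pi * A * np.cos(np.pi * f) * np.sin(np.pi * y) * dfdx
    return u, v


def advect(x, y, t0, T, steps=200):
    """Fourth-order Runge-Kutta integration of particle positions from t0 to t0 + T."""
    h = T / steps
    t = t0
    for _ in range(steps):
        k1 = velocity(x, y, t)
        k2 = velocity(x + 0.5 * h * k1[0], y + 0.5 * h * k1[1], t + 0.5 * h)
        k3 = velocity(x + 0.5 * h * k2[0], y + 0.5 * h * k2[1], t + 0.5 * h)
        k4 = velocity(x + h * k3[0], y + h * k3[1], t + h)
        x = x + h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        y = y + h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        t += h
    return x, y


def ftle(nx=101, ny=51, t0=0.0, T=15.0):
    """FTLE on a regular grid: log of the largest stretching, divided by |T|."""
    xs = np.linspace(0, 2, nx)
    ys = np.linspace(0, 1, ny)
    X, Y = np.meshgrid(xs, ys)  # arrays of shape (ny, nx)
    Xf, Yf = advect(X, Y, t0, T)
    # Flow-map gradient F by finite differences on the initial grid
    dXdx = np.gradient(Xf, xs, axis=1)
    dXdy = np.gradient(Xf, ys, axis=0)
    dYdx = np.gradient(Yf, xs, axis=1)
    dYdy = np.gradient(Yf, ys, axis=0)
    # Cauchy-Green tensor C = F^T F
    c11 = dXdx**2 + dYdx**2
    c12 = dXdx * dXdy + dYdx * dYdy
    c22 = dXdy**2 + dYdy**2
    # Largest eigenvalue of the symmetric 2 x 2 tensor
    lam_max = 0.5 * (c11 + c22) + np.sqrt(0.25 * (c11 - c22) ** 2 + c12**2)
    return np.log(np.sqrt(lam_max)) / abs(T)


field = ftle()
print(field.shape, float(field.max()))
```

    Integrating backward in time (negative T) gives the attracting counterpart; the forward field above highlights repelling structures. Finite differences on the initial grid are the simplest choice; auxiliary-grid schemes are often used for sharper ridges.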

  5. Clustering high dimensional data using RIA

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Aziz, Nazrina

    2015-05-15

    Clustering may simply represent a convenient method for organizing a large data set so that it can easily be understood and information can be retrieved efficiently. However, identifying clusters in high-dimensional data sets is a difficult task because of the curse of dimensionality. Another challenge in clustering is that some traditional functions cannot capture the pattern dissimilarity among objects. In this article, we used an alternative dissimilarity measurement called Robust Influence Angle (RIA) in the partitioning method. RIA is developed using the eigenstructure of the covariance matrix and robust principal component scores. We observe that it can obtain clusters easily and hence avoid the curse of dimensionality. It also manages to cluster large data sets with mixed numeric and categorical values.

  6. Coagulation-fragmentation for a finite number of particles and application to telomere clustering in the yeast nucleus

    NASA Astrophysics Data System (ADS)

    Hozé, Nathanaël; Holcman, David

    2012-01-01

    We develop a coagulation-fragmentation model to study a system composed of a small number of stochastic objects moving in a confined domain, that can aggregate upon binding to form local clusters of arbitrary sizes. A cluster can also dissociate into two subclusters with a uniform probability. To study the statistics of clusters, we combine a Markov chain analysis with a partition number approach. Interestingly, we obtain explicit formulas for the size and the number of clusters in terms of hypergeometric functions. Finally, we apply our analysis to study the statistical physics of telomeres (ends of chromosomes) clustering in the yeast nucleus and show that the diffusion-coagulation-fragmentation process can predict the organization of telomeres.

  7. Influence of acid volatile sulfides and metal concentrations on metal partitioning in contaminated sediments

    USGS Publications Warehouse

    Lee, J.-S.; Lee, B.-G.; Luoma, S.N.; Choi, H.J.; Koh, C.-H.; Brown, C.L.

    2000-01-01

    The influence of acid volatile sulfide (AVS) on the partitioning of Cd, Ni, and Zn in porewater (PW) and sediment as reactive metals (SEM, simultaneously extracted metals) was investigated in laboratory microcosms. Two spiking procedures were compared, and the effects of vertical geochemical gradients and infaunal activity were evaluated. Sediments were spiked with a Cd-Ni-Zn mixture (0.06, 3, 7.5 µmol/g, respectively) containing four levels of AVS (0.5, 7.5, 15, 35 µmol/g). The results were compared to sediments spiked with four levels of Cd-Ni-Zn mixtures at one AVS concentration (7.5 µmol/g). A vertical redox gradient was generated in each treatment by an 18-d incubation with an oxidized water column. [AVS] in the surface sediments decreased by 65-95% due to oxidation during incubation; initial [AVS] was maintained at 0.5-7.5 cm depth. PW metal concentrations were correlated with [SEM - AVS] among all data. But PW metal concentrations were variable, causing the distribution coefficient, Kd(pw) (the ratio of [SEM] to PW metal concentrations) to vary by 2-3 orders of magnitude at a given [SEM - AVS]. One reason for the variability was that vertical profiles in PW metal concentrations appeared to be influenced by diffusion as well as [SEM - AVS]. The presence of animals appeared to enhance the diffusion of at least Zn. The generalization that PW metal concentrations are controlled by [SEM - AVS] is subject to some important qualifications if vertical gradients are complicated, metal concentrations vary, or equilibration times differ.

  8. Cognitive profiles in euthymic patients with bipolar disorders: results from the FACE-BD cohort.

    PubMed

    Roux, Paul; Raust, Aurélie; Cannavo, Anne Sophie; Aubin, Valérie; Aouizerate, Bruno; Azorin, Jean-Michel; Bellivier, Frank; Belzeaux, Raoul; Bougerol, Thierry; Cussac, Iréna; Courtet, Philippe; Etain, Bruno; Gard, Sébastien; Job, Sophie; Kahn, Jean-Pierre; Leboyer, Marion; Olié, Emilie; Henry, Chantal; Passerieux, Christine

    2017-03-01

    Although cognitive deficits are a well-established feature of bipolar disorders (BD), even during periods of euthymia, little is known about cognitive phenotype heterogeneity among patients with BD. We investigated neuropsychological performance in 258 euthymic patients with BD recruited via the French network of expert centers for BD. We used a test battery assessing six domains of cognition. Hierarchical cluster analysis of the cross-sectional data was used to determine the optimal number of subgroups and to assign each patient to a specific cognitive cluster. Subsequently, subjects from each cluster were compared on demographic, clinical functioning, and pharmacological variables. A four-cluster solution was identified. The global cognitive performance was above normal in one cluster and below normal in another. The other two clusters had a near-normal cognitive performance, with above and below average verbal memory, respectively. Among the four clusters, significant differences were observed in estimated intelligence quotient and social functioning, which were lower for the low cognitive performers compared to the high cognitive performers. These results confirm the existence of several distinct cognitive profiles in BD. Identification of these profiles may help to develop profile-specific cognitive remediation programs, which might improve functioning in BD. © 2017 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.

  9. Subtypes of female juvenile offenders: a cluster analysis of the Millon Adolescent Clinical Inventory.

    PubMed

    Stefurak, Tres; Calhoun, Georgia B

    2007-01-01

    The current study sought to explore subtypes of adolescents within a sample of female juvenile offenders. Using the Millon Adolescent Clinical Inventory with 101 female juvenile offenders, a two-step cluster analysis was performed, beginning with a Ward's method hierarchical cluster analysis followed by a K-Means iterative partitioning cluster analysis. The results suggest an optimal three-cluster solution, with cluster profiles leading to the following group labels: Externalizing Problems, Depressed/Interpersonally Ambivalent, and Anxious Prosocial. Analyses of age, race, offense typology and offense chronicity were conducted to further understand the nature of the clusters found. Only the effect for race was significant, with the Anxious Prosocial and Depressed/Interpersonally Ambivalent clusters appearing to be disproportionately composed of African American girls. To establish external validity, clusters were compared across scales of the Behavioral Assessment System for Children - Self Report of Personality, and corroborating distinctions between clusters were found.
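
    A common way to run the two-step procedure described is to cut a Ward hierarchical tree into k groups and use the group means as starting centroids for K-means, which then refines the assignments iteratively. The sketch below does this with SciPy on synthetic, well-separated data. The study's software, variable scaling and settings are not given in the abstract, so the choices here (and the function name) are assumptions.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.cluster.vq import kmeans2


def two_step_clustering(X: np.ndarray, k: int):
    """Ward's hierarchical clustering to seed K-means, then K-means refinement."""
    Z = linkage(X, method="ward")
    ward_labels = fcluster(Z, t=k, criterion="maxclust")  # labels start at 1
    seeds = np.vstack([X[ward_labels == c].mean(axis=0) for c in np.unique(ward_labels)])
    # minit="matrix" makes kmeans2 start from the Ward centroids instead of random ones
    centroids, labels = kmeans2(X, seeds, minit="matrix")
    return centroids, labels


# Hypothetical data: three well-separated groups of 30 profiles on two scales
rng = np.random.default_rng(1)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + 0.3 * rng.standard_normal((30, 2)) for c in centers])
centroids, labels = two_step_clustering(X, k=3)
print(centroids.round(2))
```

    Seeding K-means from the Ward solution removes its sensitivity to random initial centroids, while the K-means pass lets cases that were merged early in the hierarchy move to a better-fitting cluster.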

  10. Dynamic Airspace Configuration

    NASA Technical Reports Server (NTRS)

    Bloem, Michael J.

    2014-01-01

    In air traffic management systems, airspace is partitioned into regions in part to distribute the tasks associated with managing air traffic among different systems and people. These regions, as well as the systems and people allocated to each, are changed dynamically so that air traffic can be safely and efficiently managed. It is expected that new air traffic control systems will enable greater flexibility in how airspace is partitioned and how resources are allocated to airspace regions. In this talk, I will begin by providing an overview of some previous work and open questions in Dynamic Airspace Configuration research, which is concerned with how to partition airspace and assign resources to regions of airspace. For example, I will introduce airspace partitioning algorithms based on clustering, integer programming optimization, and computational geometry. I will conclude by discussing the development of a tablet-based tool that is intended to help air traffic controller supervisors configure airspace and controllers in current operations.

  11. A new type of simplified fuzzy rule-based system

    NASA Astrophysics Data System (ADS)

    Angelov, Plamen; Yager, Ronald

    2012-02-01

    Over the last quarter of a century, two types of fuzzy rule-based (FRB) systems have dominated, namely the Mamdani and Takagi-Sugeno types. Both use the same type of scalar fuzzy sets, defined per input variable, in their antecedent part; these are aggregated at the inference stage by t-norms or co-norms representing logical AND/OR operations. In this paper, we propose a significantly simplified alternative that defines the antecedent part of FRB systems by data Clouds and density distribution. This new type of FRB system goes further in conceptual and computational simplification while preserving the best features (flexibility, modularity, and human intelligibility) of its predecessors. The proposed concept offers an alternative non-parametric form of the rule antecedents, which fully reflects the real data distribution and does not require any explicit aggregation operations or scalar membership functions to be imposed. Instead, it derives the fuzzy membership of a particular data sample to a Cloud from the density distribution of the data associated with that Cloud. Contrast this with clustering, which is a parametric decomposition/partitioning of the data space in which the fuzzy membership to a cluster is measured by the distance to the cluster centre/prototype, ignoring all the data that form that cluster or approximating their distribution. The proposed approach takes fully and exactly into account the spatial distribution and similarity of all the real data by proposing an innovative and much simplified form of the antecedent part. In this paper, we provide several numerical examples aiming to illustrate the concept.
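
    One way to turn "membership from the density of the data in a Cloud" into numbers is a Cauchy-type local density that averages the squared distance from a sample to every member of the Cloud, then normalizes densities across Clouds. The exact density used in the paper is not stated in the abstract, so this form is an assumption. The recursive variant shows that the same value can be computed from two running statistics (the mean and the mean squared norm) without storing the data. All function names are hypothetical.

```python
import numpy as np


def cloud_density(x: np.ndarray, cloud: np.ndarray) -> float:
    """Cauchy-type local density of sample x with respect to all samples in a cloud."""
    mean_sq_dist = np.mean(np.sum((cloud - x) ** 2, axis=1))
    return float(1.0 / (1.0 + mean_sq_dist))


def cloud_density_recursive(x: np.ndarray, mu: np.ndarray, mean_sq_norm: float) -> float:
    """Same quantity from cloud statistics only: mean vector mu and mean of ||x_j||^2.

    Uses mean_j ||x - x_j||^2 = ||x - mu||^2 + mean_j ||x_j||^2 - ||mu||^2.
    """
    return float(1.0 / (1.0 + np.sum((x - mu) ** 2) + mean_sq_norm - np.sum(mu**2)))


def memberships(x: np.ndarray, clouds) -> np.ndarray:
    """Normalized membership of x to each cloud (densities rescaled to sum to 1)."""
    dens = np.array([cloud_density(x, c) for c in clouds])
    return dens / dens.sum()


cloud_a = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])  # hypothetical data clouds
cloud_b = np.array([[5.0, 5.0], [6.0, 5.0]])
x = np.array([0.2, 0.2])
m = memberships(x, [cloud_a, cloud_b])
print(m)  # most of the membership goes to cloud_a
```

    No membership function shape or centre is fitted here: every stored sample (or its two summary statistics) contributes, which is the contrast with prototype-distance memberships drawn in the abstract.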

  12. Choosing the best partition of the output from a large-scale simulation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Challacombe, Chelsea Jordan; Casleton, Emily Michele

    Data partitioning becomes necessary when a large-scale simulation produces more data than can be feasibly stored. The goal is to partition the data, typically so that every element belongs to one and only one partition, and store summary information about the partition, either a representative value plus an estimate of the error or a distribution. Once the partitions are determined and the summary information stored, the raw data is discarded. This process can be performed in situ, that is, while the simulation is running. When creating the partitions there are many decisions that researchers must make: for instance, how to determine when an adequate number of partitions has been created, how the partitions are created with respect to dividing the data, or how many variables should be considered simultaneously. In addition, decisions must be made about how to summarize the information within each partition. Because of the combinatorial number of possible ways to partition and summarize the data, a method of comparing the different possibilities will help guide researchers in choosing a good partitioning and summarization scheme for their application.
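
    A minimal version of the scheme described is to split one output variable into equal-count partitions, keep only a representative value and an error estimate per partition, and compare candidate schemes by how much information the summaries discard. The sketch below uses rank-based partitions, the mean as the representative value, the standard deviation as the error estimate, and the within-partition sum of squares as the comparison criterion; all of these are illustrative assumptions, not the authors' method, and the function names are hypothetical.

```python
import numpy as np


def partition_and_summarize(values: np.ndarray, n_parts: int):
    """Split values into n_parts equal-count partitions (by rank) and keep only summaries.

    Every element lands in exactly one partition; each partition keeps its
    count, mean (representative value) and standard deviation (error estimate).
    """
    order = np.argsort(values)
    summaries = []
    for idx in np.array_split(order, n_parts):
        part = values[idx]
        summaries.append({"count": len(part), "mean": float(part.mean()), "std": float(part.std())})
    return summaries


def information_loss(summaries) -> float:
    """Within-partition sum of squares lost by replacing each value with its partition mean."""
    return float(sum(s["count"] * s["std"] ** 2 for s in summaries))


# Hypothetical simulation output: compare schemes before discarding the raw data
rng = np.random.default_rng(2)
values = rng.normal(size=1000)
for k in (2, 10, 50):
    print(k, round(information_loss(partition_and_summarize(values, k)), 3))
```

    Loss shrinks as partitions are added while storage grows linearly in their number, so a stopping rule for "an adequate number of partitions" can be phrased as the point where the marginal reduction in loss falls below a chosen tolerance.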

  13. Response to traumatic brain injury neurorehabilitation through an artificial intelligence and statistics hybrid knowledge discovery from databases methodology.

    PubMed

    Gibert, Karina; García-Rudolph, Alejandro; García-Molina, Alberto; Roig-Rovira, Teresa; Bernabeu, Montse; Tormos, José María

    2008-01-01

    The aim was to develop a classificatory tool to identify different populations of patients with Traumatic Brain Injury based on the characteristics of their deficits and their response to treatment. A KDD framework was used in which, first, descriptive statistics were computed for every variable, followed by data cleaning and selection of relevant variables. The data were then mined using a generalization of Clustering based on rules (CIBR), a hybrid AI and Statistics technique that combines inductive learning (AI) and clustering (Statistics). A prior Knowledge Base (KB) is used to properly bias the clustering; semantic constraints implied by the KB hold in the final clusters, guaranteeing the interpretability of the results. A generalization (Exogenous Clustering based on rules, ECIBR) is presented that allows the KB to be defined in terms of variables that are not considered in the clustering process itself, giving more flexibility. Several tools, such as the Class panel graph, are introduced into the methodology to assist final interpretation. A set of 5 classes was recommended by the system, and interpretation permitted the labeling of profiles. From the medical point of view, the composition of the classes corresponds well to different patterns of increasing levels of response to rehabilitation treatment. All the patients who were initially assessable form a single group. Severely impaired patients are subdivided into four profiles with clearly distinct response patterns. Particularly interesting is the partial-response profile, in which patients could not improve their executive functions. Meaningful classes were obtained and, from a semantic point of view, the results were noticeably improved relative to classical clustering, supporting our opinion that hybrid AI & Stats techniques are more powerful for KDD than pure ones.
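    The key property claimed for rule-biased clustering, that constraints from the KB hold in the final clusters, can be illustrated with a toy sketch. This is not CIBR or ECIBR: the rule format, the "residual" group, and the gap-based splitting are hypothetical stand-ins. Rules on exogenous variables (here a made-up severity score) first fix the groups, and clustering on a different variable (a made-up response score) runs only inside each group, so no final cluster can mix records from different rules.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Record = Dict[str, float]
Rule = Tuple[str, Callable[[Record], bool]]  # (rule name, predicate on exogenous variables)


def split_at_largest_gap(values: Sequence[float], min_gap: float) -> List[int]:
    """Toy 1-D clustering: cut the sorted values at their widest gap if it is >= min_gap."""
    labels = [0] * len(values)
    if len(values) < 2:
        return labels
    order = sorted(range(len(values)), key=lambda i: values[i])
    gaps = [values[order[k + 1]] - values[order[k]] for k in range(len(order) - 1)]
    cut = max(range(len(gaps)), key=lambda k: gaps[k])
    if gaps[cut] >= min_gap:
        for i in order[cut + 1:]:
            labels[i] = 1
    return labels


def rule_biased_clustering(records: Sequence[Record], rules: Sequence[Rule],
                           feature: str, min_gap: float) -> list:
    """Group records by the first rule they satisfy, then cluster only within each group."""
    groups: Dict[str, List[int]] = {}
    for idx, rec in enumerate(records):
        name = next((n for n, pred in rules if pred(rec)), "residual")
        groups.setdefault(name, []).append(idx)
    result: List[Optional[Tuple[str, int]]] = [None] * len(records)
    for name, members in groups.items():
        sub = split_at_largest_gap([records[i][feature] for i in members], min_gap)
        for i, s in zip(members, sub):
            result[i] = (name, s)
    return result  # final cluster = (rule group, sub-cluster); never mixes rule groups


patients = [  # hypothetical records
    {"severity": 8, "response": 0.10}, {"severity": 9, "response": 0.20},
    {"severity": 8, "response": 0.90}, {"severity": 7, "response": 0.95},
    {"severity": 3, "response": 0.50}, {"severity": 2, "response": 0.55},
]
rules = [("severe", lambda r: r["severity"] >= 7)]
print(rule_biased_clustering(patients, rules, "response", min_gap=0.3))
```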

  14. Temperature-mortality relationship in dairy cattle in France based on an iso-hygro-thermal partition of the territory

    NASA Astrophysics Data System (ADS)

    Morignat, Eric; Gay, Emilie; Vinard, Jean-Luc; Calavas, Didier; Hénaux, Viviane

    2017-11-01

    The issue of global warming, and more specifically its health impact on populations, is of increasing concern. The aim of our study was to evaluate the impact of temperature on dairy cattle mortality in France during the warm season (April-August). We therefore devised and implemented a spatial partitioning method to divide France into areas in which weather conditions were homogeneous, combining a multiple factor analysis with a clustering method using both weather and spatial data. We then used time-series regressions (2001-2008) to model the relationship between the temperature humidity index (an index representing the temperature corrected by the relative humidity) and dairy cattle mortality within these areas. We found a significant effect of heat on dairy cattle mortality, but also an effect of cooler temperatures (to a lesser extent in some areas), which leads to a U-shaped relationship in the studied areas. Our partitioning approach based on weather criteria, associated with classic clustering methods, may contribute to better estimating temperature effects, a critical issue for animal health and welfare. Beyond its use in animal health, this approach may also be of interest in several situations in the context of human health.
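    The temperature humidity index combines air temperature and relative humidity into one number. The abstract does not say which THI formula was used; the sketch below uses one widely cited variant (NRC, 1971), with T in °C and RH in %. The second function is a hypothetical way to let a regression express a U-shaped effect: separate "cold" and "heat" terms that are zero inside a comfort band whose thresholds would have to be chosen or estimated.

```python
from typing import Tuple


def thi(temp_c: float, rel_humidity: float) -> float:
    """One common THI variant (NRC, 1971): temperature in deg C, relative humidity in %."""
    temp_f = 1.8 * temp_c + 32
    return temp_f - (0.55 - 0.0055 * rel_humidity) * (1.8 * temp_c - 26)


def u_shape_terms(thi_value: float, cold_threshold: float, heat_threshold: float) -> Tuple[float, float]:
    """Hypothetical 'hockey-stick' regressors: (cold excursion, heat excursion)."""
    return max(cold_threshold - thi_value, 0.0), max(thi_value - heat_threshold, 0.0)


print(round(thi(25, 50), 3))  # 25 deg C at 50 % relative humidity
print(u_shape_terms(thi(25, 50), cold_threshold=60, heat_threshold=68))
```

    Using two one-sided terms instead of a single linear THI term lets the mortality model estimate different slopes for heat and cold, which is one simple way a U-shaped relationship can appear in a time-series regression.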

  15. Hierarchical Solution of the Traveling Salesman Problem with Random Dyadic Tilings

    NASA Astrophysics Data System (ADS)

    Kalmár-Nagy, Tamás; Bak, Bendegúz Dezső

    We propose a hierarchical heuristic approach for solving the Traveling Salesman Problem (TSP) in the unit square. The points are partitioned with a random dyadic tiling and clusters are formed by the points located in the same tile. Each cluster is represented by its geometrical barycenter and a “coarse” TSP solution is calculated for these barycenters. Midpoints are placed at the middle of each edge in the coarse solution. Near-optimal (or optimal) minimum tours are computed for each cluster. The tours are concatenated using the midpoints yielding a solution for the original TSP. The method is tested on random TSPs (independent, identically distributed points in the unit square) up to 10,000 points as well as on a popular benchmark problem (att532 — coordinates of 532 American cities). Our solutions are 8-13% longer than the optimal ones. We also present an optimization algorithm for the partitioning to improve our solutions. This algorithm further reduces the solution errors (by several percent using 1000 iteration steps). The numerical experiments demonstrate the viability of the approach.
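    The cluster-then-concatenate structure of the method can be sketched as follows. This is a simplified sketch with several assumptions: it uses a regular 2^level × 2^level grid (one special case of a dyadic tiling, not a random one), greedy nearest-neighbour tours instead of near-optimal ones, and it joins cluster tours directly rather than through edge midpoints.

```python
import math
import random
from typing import Callable, Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def tile_of(p: Point, level: int) -> Tuple[int, int]:
    """Index of the dyadic tile (side 2**-level) that contains p in the unit square."""
    n = 2 ** level
    return min(int(p[0] * n), n - 1), min(int(p[1] * n), n - 1)


def nearest_neighbour_order(items, start: Point, coord: Callable) -> list:
    """Greedy nearest-neighbour ordering of items, beginning near `start`."""
    remaining, order, current = list(items), [], start
    while remaining:
        nxt = min(remaining, key=lambda it: math.dist(current, coord(it)))
        remaining.remove(nxt)
        order.append(nxt)
        current = coord(nxt)
    return order


def hierarchical_tour(points: Sequence[Point], level: int) -> List[int]:
    # 1. A cluster is the set of points falling in the same tile.
    clusters: Dict[Tuple[int, int], List[int]] = {}
    for i, p in enumerate(points):
        clusters.setdefault(tile_of(p, level), []).append(i)
    # 2. Represent each cluster by its barycenter and build a coarse tour over them.
    bary = {t: (sum(points[i][0] for i in m) / len(m), sum(points[i][1] for i in m) / len(m))
            for t, m in clusters.items()}
    tiles = list(clusters)
    coarse = nearest_neighbour_order(tiles, bary[tiles[0]], bary.__getitem__)
    # 3. Tour each cluster locally and concatenate in coarse-tour order.
    tour: List[int] = []
    current = bary[coarse[0]]
    for t in coarse:
        local = nearest_neighbour_order(clusters[t], current, lambda i: points[i])
        tour.extend(local)
        current = points[local[-1]]
    return tour


def tour_length(points: Sequence[Point], tour: List[int]) -> float:
    """Length of the closed tour (returns to the starting point)."""
    return sum(math.dist(points[tour[k]], points[tour[(k + 1) % len(tour)]]) for k in range(len(tour)))


rng = random.Random(0)
pts = [(rng.random(), rng.random()) for _ in range(500)]  # i.i.d. points in the unit square
route = hierarchical_tour(pts, level=3)
print(len(route), round(tour_length(pts, route), 3))
```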

  16. A Search for Pulsation in Young Brown Dwarfs and Very Low Mass Stars

    NASA Astrophysics Data System (ADS)

    Cody, Ann Marie

    2012-05-01

    In 2005, Palla and Baraffe proposed that brown dwarfs and very low mass stars (<0.1 solar masses) may be unstable to radial oscillations during the pre-main-sequence deuterium burning phase. With associated oscillation periods of 1-4 hours, this potentially new class of pulsation offers unprecedented opportunities to probe the interiors and evolution of low-mass objects in the 1-15 million year age range. Furthermore, several previous reports of short-period variability have suggested that deuterium-burning pulsation is in fact at work in young clusters. For my dissertation, I developed a photometric monitoring campaign to search for low-amplitude periodic variability in young brown dwarfs and very low mass stars using meter-class telescopes from both the ground and space. The resulting high-precision, high-cadence time-series photometry targeted four young clusters and achieved sensitivity to periodic oscillations with photometric amplitudes down to several millimagnitudes. This unprecedented variability census probed timescales ranging from minutes to weeks in a sample of 200 young, low-mass cluster members of IC 348, Sigma Orionis, Chamaeleon I, and Upper Scorpius. While I find a dearth of photometric periods under 10 hours, the campaign's high time resolution and precision have enabled detailed study of diverse light curve behavior in the clusters: rotational spot modulation, accretion signatures, and occultations by surrounding disk material. Analysis of the data has led to the establishment of a lower limit for the timescale of periodic photometric variability in young low-mass and substellar objects, an extension of the rotation period distribution to the brown dwarf regime, as well as insights into the connection between variability and circumstellar disks in the Sigma Orionis and Chamaeleon I clusters.
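    Searching unevenly sampled photometry for short periods is commonly done with a periodogram. The abstract does not name the period-search method used in the dissertation, so the sketch below simply implements the classical Lomb-Scargle periodogram and applies it to a hypothetical light curve (3-hour period, 10 mmag amplitude, 2 mmag Gaussian noise, 48-hour baseline).

```python
import math
import random
from typing import List, Sequence


def lomb_scargle(times: Sequence[float], mags: Sequence[float], periods: Sequence[float]) -> List[float]:
    """Classical Lomb-Scargle power for unevenly sampled data at each trial period."""
    n = len(mags)
    mean = sum(mags) / n
    y = [m - mean for m in mags]
    var = sum(v * v for v in y) / (n - 1)
    power = []
    for period in periods:
        w = 2 * math.pi / period
        # Offset tau makes the sine and cosine terms orthogonal at this frequency.
        tau = math.atan2(sum(math.sin(2 * w * t) for t in times),
                         sum(math.cos(2 * w * t) for t in times)) / (2 * w)
        c = [math.cos(w * (t - tau)) for t in times]
        s = [math.sin(w * (t - tau)) for t in times]
        yc = sum(a * b for a, b in zip(y, c))
        ys = sum(a * b for a, b in zip(y, s))
        power.append((yc * yc / sum(v * v for v in c) + ys * ys / sum(v * v for v in s)) / (2 * var))
    return power


rng = random.Random(1)
t = sorted(rng.uniform(0, 48) for _ in range(300))  # hours, irregular cadence
mag = [0.010 * math.sin(2 * math.pi * ti / 3.0) + rng.gauss(0, 0.002) for ti in t]
trial = [0.5 + 0.01 * k for k in range(951)]  # trial periods from 0.5 h to 10 h
pw = lomb_scargle(t, mag, trial)
best_period = trial[max(range(len(pw)), key=pw.__getitem__)]
print(round(best_period, 2))
```

    A real search would also assess the false-alarm probability of the highest peak before claiming a detection; that step is omitted here.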

  17. Applications of Some Artificial Intelligence Methods to Satellite Soundings

    NASA Technical Reports Server (NTRS)

    Munteanu, M. J.; Jakubowicz, O.

    1985-01-01

    Hard clustering of temperature profiles and regression temperature retrievals were used, and the method was refined using the probabilities of membership of each pattern vector in each of the clusters derived with discriminant analysis. In hard clustering, the maximum probability is taken and the corresponding cluster is considered the correct one, discarding the rest of the probabilities. In fuzzy partitioned clustering, these probabilities are kept and the final regression retrieval is a weighted combination of the regression retrievals of several clusters. This method was used in the clustering of brightness temperatures, where the purpose was to predict tropopause height. A further refinement is the division of temperature profiles into three major regions for classification purposes. The results are summarized in tables in which the total r.m.s. errors are displayed. An approach based on fuzzy logic, which is intimately related to artificial intelligence methods, is recommended.
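    The difference between the hard and the fuzzy partitioned retrieval comes down to how the membership probabilities are used. In this sketch the per-cluster regression retrievals and the probabilities are hypothetical numbers; in the original work they would come from each cluster's regression and from discriminant analysis.

```python
from typing import List, Sequence


def hard_retrieval(probs: Sequence[float], retrievals: Sequence[Sequence[float]]) -> List[float]:
    """Keep only the retrieval of the most probable cluster; discard other probabilities."""
    best = max(range(len(probs)), key=lambda k: probs[k])
    return list(retrievals[best])


def fuzzy_retrieval(probs: Sequence[float], retrievals: Sequence[Sequence[float]]) -> List[float]:
    """Weight every cluster's retrieval (per pressure level) by its membership probability."""
    total = sum(probs)
    n_levels = len(retrievals[0])
    return [sum(p * r[lev] for p, r in zip(probs, retrievals)) / total for lev in range(n_levels)]


probs = [0.7, 0.3]  # hypothetical membership probabilities for two clusters
retrievals = [[280.0, 260.0], [290.0, 250.0]]  # hypothetical retrieved temperatures (K) at two levels
print(hard_retrieval(probs, retrievals), fuzzy_retrieval(probs, retrievals))
```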

  18. Examination of evidence for collinear cluster tri-partition

    NASA Astrophysics Data System (ADS)

    Pyatkov, Yu. V.; Kamanin, D. V.; Alexandrov, A. A.; Alexandrova, I. A.; Goryainova, Z. I.; Malaza, V.; Mkaza, N.; Kuznetsova, E. A.; Strekalovsky, A. O.; Strekalovsky, O. V.; Zhuchko, V. E.

    2017-12-01

    Background: In a series of experiments at different time-of-flight spectrometers of heavy ions, we have observed manifestations of a new, at least ternary, decay channel of low-excited heavy nuclei. Due to specific features of the effect, it was called collinear cluster tri-partition (CCT). The obtained experimental results have initiated a number of theoretical articles dedicated to different aspects of the CCT. Special attention was paid to kinematic constraints and the stability of collinearity. Purpose: To compare theoretical predictions with our experimental data, only partially published so far, and to develop a model of one of the most populated CCT modes, which gives rise to the so-called "Ni bump." Method: The fission events under analysis form regular two-dimensional linear structures in the mass correlation distributions of the fission fragments. The structures were revealed both at a high level of statistical reliability, though on a background substrate, and at low statistics in an almost noiseless distribution. The structures are bounded by known magic fragments and were reproduced at different spectrometers. All this provides high reliability for our experimental findings. The model of the CCT proposed here is based on recently published theoretical results and a detailed analysis of all available experimental data. Results: Under our model, the CCT mode giving rise to the Ni bump occurs as a two-stage breakup of the initial three-body, chain-like nuclear configuration with an elongated central cluster. After the first scission at the touching point with one of the side clusters, predominantly the heavier one, the deformation energy of the central cluster allows the emission of up to four neutrons flying apart isotropically. The heavy side cluster and a dinuclear system, consisting of the light side cluster and the central one relaxed to a less elongated shape, are accelerated in their mutual Coulomb field. 
The "tip" of the dinuclear system at the moment of its rupture faces either the heavy fragment or the opposite direction, due to a single turn of the system around its center of gravity. Conclusions: Additional experimental information regarding the energies of the CCT partners and the proposed model of the process respond to criticisms concerning the kinematic constraints and the stability of collinearity in the CCT. The octupole-deformed system formed after the first scission is oriented along the fission axis, and its rupture occurs predominantly after full acceleration. Noncollinear true ternary fission and far-asymmetric binary fission, observed earlier, appear to be special cases of the decay of the prescission configuration leading to the CCT. Detection of Ni-72,68 fission fragments with a kinetic energy E < 25 MeV at the Lohengrin mass separator is proposed as an independent experimental verification of the CCT.

  19. OMERACT-based fibromyalgia symptom subgroups: an exploratory cluster analysis.

    PubMed

    Vincent, Ann; Hoskin, Tanya L; Whipple, Mary O; Clauw, Daniel J; Barton, Debra L; Benzo, Roberto P; Williams, David A

    2014-10-16

    The aim of this study was to identify subsets of patients with fibromyalgia with similar symptom profiles using the Outcome Measures in Rheumatology (OMERACT) core symptom domains. Female patients with a diagnosis of fibromyalgia and currently meeting fibromyalgia research survey criteria completed the Brief Pain Inventory, the 30-item Profile of Mood States, the Medical Outcomes Sleep Scale, the Multidimensional Fatigue Inventory, the Multiple Ability Self-Report Questionnaire, the Fibromyalgia Impact Questionnaire-Revised (FIQ-R) and the Short Form-36 between 1 June 2011 and 31 October 2011. Hierarchical agglomerative clustering was used to identify subgroups of patients with similar symptom profiles. To validate the results from this sample, hierarchical agglomerative clustering was repeated in an external sample of female patients with fibromyalgia with similar inclusion criteria. A total of 581 females with a mean age of 55.1 (range, 20.1 to 90.2) years were included. A four-cluster solution best fit the data, and each clustering variable differed significantly (P <0.0001) among the four clusters. The four clusters divided the sample into severity levels: Cluster 1 reflects the lowest average levels across all symptoms, and Cluster 4 reflects the highest average levels. Clusters 2 and 3 capture moderate symptom levels. Clusters 2 and 3 differed mainly in profiles of anxiety and depression, with Cluster 2 having lower levels of depression and anxiety than Cluster 3, despite higher levels of pain. The results of the cluster analysis of the external sample (n = 478) looked very similar to those found in the original cluster analysis, except for a slight difference in sleep problems. This was despite having patients in the validation sample who were significantly younger (P <0.0001) and had more severe symptoms (higher FIQ-R total scores (P = 0.0004)). 
In our study, we incorporated core OMERACT symptom domains, which allowed for clustering based on a comprehensive symptom profile. Although our exploratory cluster solution needs confirmation in a longitudinal study, this approach could provide a rationale to support the study of individualized clinical evaluation and intervention.
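    Hierarchical agglomerative clustering starts with every patient in a separate cluster and repeatedly merges the closest pair until the chosen number of clusters remains. The abstract does not state the linkage criterion; this minimal sketch assumes average linkage with Euclidean distance for brevity (Ward's method is another common choice), and the symptom scores are hypothetical.

```python
import math
from itertools import combinations
from typing import List, Sequence


def average_linkage(points: Sequence[Sequence[float]], k: int) -> List[List[int]]:
    """Merge the two closest clusters (mean pairwise distance) until k clusters remain."""
    clusters = [[i] for i in range(len(points))]

    def linkage(a: List[int], b: List[int]) -> float:
        return sum(math.dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[i].extend(clusters[j])  # i < j, so deleting j leaves index i valid
        del clusters[j]
    return clusters


# Hypothetical symptom scores (pain, fatigue, depression) for 8 patients.
scores = [(0, 0, 0), (0, 1, 0), (10, 0, 0), (10, 1, 0), (0, 10, 0), (1, 10, 0), (10, 10, 0), (11, 10, 0)]
print(average_linkage(scores, 4))
```

    In practice the symptom scales would be standardized first, since the instruments listed above use very different score ranges.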

  20. Binary recursive partitioning: background, methods, and application to psychology.

    PubMed

    Merkle, Edgar C; Shaffer, Victoria A

    2011-02-01

    Binary recursive partitioning (BRP) is a computationally intensive statistical method that can be used in situations where linear models are often used. Instead of imposing many assumptions to arrive at a tractable statistical model, BRP simply seeks to accurately predict a response variable based on values of predictor variables. The method outputs a decision tree depicting the predictor variables that were related to the response variable, along with the nature of the variables' relationships. No significance tests are involved, and the tree's 'goodness' is judged based on its predictive accuracy. In this paper, we describe BRP methods in a detailed manner and illustrate their use in psychological research. We also provide R code for carrying out the methods.
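    The core of binary recursive partitioning is an exhaustive search for the predictor and threshold whose split most reduces prediction error, applied recursively to each half. The paper provides R code; the Python sketch below is not that code. It assumes a numeric response, sum of squared errors as the split criterion, and a simple depth limit instead of cross-validated pruning.

```python
from typing import Optional, Sequence, Tuple, Union

Tree = Union[float, Tuple[int, float, "Tree", "Tree"]]  # leaf mean or (feature, threshold, left, right)


def sse(ys: Sequence[float]) -> float:
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)


def best_split(xs, ys) -> Optional[Tuple[float, int, float]]:
    """Exhaustive search over predictors and thresholds for the lowest total child SSE."""
    best = None
    for f in range(len(xs[0])):
        for thr in sorted({row[f] for row in xs})[:-1]:  # keeps both children non-empty
            left = [y for row, y in zip(xs, ys) if row[f] <= thr]
            right = [y for row, y in zip(xs, ys) if row[f] > thr]
            cost = sse(left) + sse(right)
            if best is None or cost < best[0]:
                best = (cost, f, thr)
    return best


def grow(xs, ys, max_depth: int) -> Tree:
    leaf = sum(ys) / len(ys)
    split = best_split(xs, ys) if max_depth > 0 and len(ys) >= 2 else None
    if split is None or split[0] >= sse(ys):  # no split reduces the error: make a leaf
        return leaf
    _, f, thr = split
    left = [(x, y) for x, y in zip(xs, ys) if x[f] <= thr]
    right = [(x, y) for x, y in zip(xs, ys) if x[f] > thr]
    return (f, thr,
            grow([x for x, _ in left], [y for _, y in left], max_depth - 1),
            grow([x for x, _ in right], [y for _, y in right], max_depth - 1))


def predict(tree: Tree, row) -> float:
    while isinstance(tree, tuple):
        f, thr, left, right = tree
        tree = left if row[f] <= thr else right
    return tree


xs = [[1], [2], [3], [10], [11], [12]]  # hypothetical single predictor
ys = [1, 1, 1, 5, 5, 5]  # hypothetical response
tree = grow(xs, ys, max_depth=2)
print(tree, predict(tree, [11.5]))
```

    As the abstract notes, no significance tests are involved: the tree's usefulness would be judged by its predictive accuracy on held-out data.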

  1. Spatio-temporal dynamics of ocean conditions and forage taxa reveal regional structuring of seabird–prey relationships.

    PubMed

    Santora, Jarrod A; Schroeder, Isaac D; Field, John C; Wells, Brian K; Sydeman, William J

    Studies of predator–prey demographic responses and the physical drivers of such relationships are rare, yet essential for predicting future changes in the structure and dynamics of marine ecosystems. Here, we hypothesize that predator–prey relationships vary spatially in association with underlying physical ocean conditions, leading to observable changes in demographic rates, such as reproduction. To test this hypothesis, we quantified spatio-temporal variability in hydrographic conditions, krill, and forage fish to model predator (seabird) demographic responses over 18 years (1990–2007). We used principal component analysis and spatial correlation maps to assess coherence among ocean conditions, krill, and forage fish, and generalized additive models to quantify interannual variability in seabird breeding success relative to prey abundance. The first principal component of four hydrographic measurements yielded an index that partitioned “warm/weak upwelling” and “cool/strong upwelling” years. Partitioning of krill and forage fish time series among shelf and oceanic regions yielded spatially explicit indicators of prey availability. Krill abundance within the oceanic region was remarkably consistent between years, whereas krill over the shelf showed marked interannual fluctuations in relation to ocean conditions. Anchovy abundance varied on the shelf, and was greater in years of strong stratification, weak upwelling and warmer temperatures. Juvenile forage fish taxa co-varied strongly in space and time with each other and with krill, but were weakly correlated with hydrographic conditions. Relating seabird demographic responses to prey availability revealed spatially variable associations indicative of the dynamic nature of “predator–habitat” relationships. 
Quantification of spatially explicit demographic responses, and of their variability through time, demonstrates the possibility of delineating specific critical areas where the implementation of protective measures could maintain the functions and productivity of central place foraging predators.
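    An index like the one described above (the first principal component of four hydrographic measurements) can be computed by standardizing each series and extracting the leading eigenvector of their correlation matrix. This sketch uses power iteration and toy, perfectly correlated series so the result is easy to check; the real variables and data are not given here. The sign of a principal component is arbitrary, so which sign means "warm/weak upwelling" has to be decided from the loadings.

```python
import math
from statistics import fmean, pstdev
from typing import List, Sequence, Tuple


def standardize(columns: Sequence[Sequence[float]]) -> List[List[float]]:
    """z-scores per variable; assumes no column is constant (pstdev would be 0)."""
    out = []
    for col in columns:
        m, s = fmean(col), pstdev(col)
        out.append([(v - m) / s for v in col])
    return out


def first_principal_component(columns, iterations: int = 500) -> Tuple[List[float], List[float]]:
    """Loadings and per-year scores of PC1 of the standardized variables (power iteration)."""
    z = standardize(columns)
    p, n = len(z), len(z[0])
    corr = [[sum(z[a][t] * z[b][t] for t in range(n)) / n for b in range(p)] for a in range(p)]
    v = [1.0] * p  # start vector; must not be orthogonal to the leading eigenvector
    for _ in range(iterations):
        w = [sum(corr[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    scores = [sum(v[a] * z[a][t] for a in range(p)) for t in range(n)]
    return v, scores


base = [1, 2, 3, 4, 5, 6]  # toy series for six hypothetical years
cols = [base, [2 * b for b in base], [-b for b in base], [b + 10 for b in base]]
loadings, scores = first_principal_component(cols)
print([round(x, 3) for x in loadings], [round(s, 2) for s in scores])
```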

  2. Compositional Variability Associated with Stickney Crater on Phobos

    NASA Technical Reports Server (NTRS)

    Roush, T. L.; Hogan, R. C.

    2001-01-01

    Unsupervised clustering techniques identified four regions in and near Stickney crater on Phobos having unique spectral properties. These spectra are best matched by spectra of naturally occurring materials, e.g., lunar soils, meteorites, and rocks. Additional information is contained in the original extended abstract.

  3. A new physical performance classification system for elite handball players: cluster analysis

    PubMed Central

    Chirosa, Ignacio J.; Robinson, Joseph E.; van der Tillaar, Roland; Chirosa, Luis J.; Martín, Isidoro Martínez

    2016-01-01

    The aim of the present study was to identify different cluster groups of handball players according to their physical performance level assessed in a series of physical assessments, which could then be used to design a training program based on individual strengths and weaknesses, and to determine which of these variables best identified elite performance in a group of under-19 [U19] national level handball players. Players of the U19 National Handball team (n=16) performed a set of tests to determine: 10 m (ST10) and 20 m (ST20) sprint time, ball release velocity (BRv), countermovement jump (CMJ) height and squat jump (SJ) height. All players also performed an incremental-load bench press test to determine the 1 repetition maximum (1RMest), the load corresponding to maximum mean power (LoadMP), the mean propulsive phase power at LoadMP (PMPPMP) and the peak power at LoadMP (PPEAKMP). Cluster analyses of the test results generated four groupings of players. The variables best able to discriminate physical performance were BRv, ST20, 1RMest, PPEAKMP and PMPPMP. These variables could help coaches identify talent or monitor the physical performance of athletes in their team. Each cluster of players has a particular weakness related to physical performance and, therefore, the cluster results can be applied to a specific training program based on individual needs. PMID:28149376
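    Grouping players by test results typically requires putting variables with different units on one scale before clustering. The abstract does not name the clustering algorithm; this sketch assumes z-score standardization followed by k-means with a deterministic farthest-point initialisation, and the player data (ST20 in s, BRv in m/s, CMJ in cm) are hypothetical.

```python
import math
from statistics import fmean, pstdev
from typing import List, Sequence, Tuple


def zscores(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Standardize each test (column) so sprint times, velocities and heights share one scale."""
    stats = [(fmean(col), pstdev(col)) for col in zip(*rows)]
    return [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in rows]


def kmeans(rows: Sequence[Sequence[float]], k: int, max_iter: int = 100) -> Tuple[List[int], List[List[float]]]:
    # Deterministic farthest-point initialisation (an assumption; k-means++ is a common alternative).
    centres = [list(rows[0])]
    while len(centres) < k:
        far = max(rows, key=lambda r: min(math.dist(r, c) for c in centres))
        centres.append(list(far))
    labels: List[int] = []
    for _ in range(max_iter):
        new = [min(range(k), key=lambda c: math.dist(r, centres[c])) for r in rows]
        if new == labels:  # assignments are stable: converged
            break
        labels = new
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:
                centres[c] = [fmean(col) for col in zip(*members)]
    return labels, centres


players = [  # hypothetical [ST20 (s), BRv (m/s), CMJ (cm)]
    [3.00, 25.0, 40.0], [3.10, 24.0, 41.0], [3.05, 25.5, 39.0],
    [2.70, 30.0, 48.0], [2.75, 31.0, 49.0], [2.72, 29.5, 47.0],
]
labels, centres = kmeans(zscores(players), k=2)
print(labels)
```

    The cluster centres, expressed in z-scores, show at a glance which tests a group is weak on, which is how cluster results could feed an individualized training plan.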

  4. Handling Data Skew in MapReduce Cluster by Using Partition Tuning

    PubMed

    Gao, Yufei; Zhou, Yanjie; Zhou, Bing; Shi, Lei; Zhang, Jiacai

    2017-01-01

    The healthcare industry has generated large amounts of data, and analyzing these has emerged as an important problem in recent years. The MapReduce programming model has been successfully used for big data analytics. However, data skew invariably occurs in big data analytics and seriously affects efficiency. To overcome the data skew problem in MapReduce, we have in the past proposed a data processing algorithm called Partition Tuning-based Skew Handling (PTSH). In comparison with the one-stage partitioning strategy used in the traditional MapReduce model, PTSH uses a two-stage strategy and the partition tuning method to disperse key-value pairs in virtual partitions and recombines each partition in case of data skew. The robustness and efficiency of the proposed algorithm were tested on a wide variety of simulated datasets and real healthcare datasets. The results showed that PTSH algorithm can handle data skew in MapReduce efficiently and improve the performance of MapReduce jobs in comparison with the native Hadoop, Closer, and locality-aware and fairness-aware key partitioning (LEEN). We also found that the time needed for rule extraction can be reduced significantly by adopting the PTSH algorithm, since it is more suitable for association rule mining (ARM) on healthcare data. © 2017 Yufei Gao et al.
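    The two-stage idea (spread keys over many small virtual partitions, then combine virtual partitions into reducer inputs so loads are balanced) can be sketched as follows. The abstract does not describe PTSH's actual tuning rules, so this is a generic assumption: crc32 hashing into virtual partitions, then a greedy "heaviest partition to least-loaded reducer" assignment. The key names are hypothetical.

```python
import heapq
import zlib
from collections import Counter
from typing import Dict, Iterable, List


def virtual_partition(keys: Iterable[str], num_virtual: int) -> Counter:
    """Stage 1: hash each record's key into one of many small virtual partitions."""
    loads: Counter = Counter()
    for key in keys:
        # crc32 is stable across runs, unlike Python's salted hash() for str.
        loads[zlib.crc32(key.encode("utf-8")) % num_virtual] += 1
    return loads


def assign_to_reducers(loads: Dict[int, int], num_reducers: int) -> Dict[int, int]:
    """Stage 2: give the heaviest remaining virtual partition to the least-loaded reducer."""
    heap = [(0, r) for r in range(num_reducers)]
    assignment = {}
    for vp, load in sorted(loads.items(), key=lambda kv: -kv[1]):
        total, reducer = heapq.heappop(heap)
        assignment[vp] = reducer
        heapq.heappush(heap, (total + load, reducer))
    return assignment


def reducer_loads(loads: Dict[int, int], assignment: Dict[int, int], num_reducers: int) -> List[int]:
    totals = [0] * num_reducers
    for vp, load in loads.items():
        totals[assignment[vp]] += load
    return totals


# Hypothetical skewed input: one "hot" key plus 200 ordinary keys.
keys = ["hot"] * 50 + [f"k{i}" for i in range(200)]
vp_loads = virtual_partition(keys, num_virtual=32)
plan = assign_to_reducers(vp_loads, num_reducers=4)
print(reducer_loads(vp_loads, plan, 4))
```

    A single very hot key still lands in one virtual partition; splitting such keys would need an extra mechanism beyond this sketch.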

  6. Handling Data Skew in MapReduce Cluster by Using Partition Tuning

    PubMed Central

    Zhou, Yanjie; Zhou, Bing; Shi, Lei

    2017-01-01

    The healthcare industry has generated large amounts of data, and analyzing these has emerged as an important problem in recent years. The MapReduce programming model has been successfully used for big data analytics. However, data skew invariably occurs in big data analytics and seriously affects efficiency. To overcome the data skew problem in MapReduce, we have in the past proposed a data processing algorithm called Partition Tuning-based Skew Handling (PTSH). In comparison with the one-stage partitioning strategy used in the traditional MapReduce model, PTSH uses a two-stage strategy and the partition tuning method to disperse key-value pairs in virtual partitions and recombines each partition in case of data skew. The robustness and efficiency of the proposed algorithm were tested on a wide variety of simulated datasets and real healthcare datasets. The results showed that PTSH algorithm can handle data skew in MapReduce efficiently and improve the performance of MapReduce jobs in comparison with the native Hadoop, Closer, and locality-aware and fairness-aware key partitioning (LEEN). We also found that the time needed for rule extraction can be reduced significantly by adopting the PTSH algorithm, since it is more suitable for association rule mining (ARM) on healthcare data. PMID:29065568

  7. An iterative network partition algorithm for accurate identification of dense network modules

    PubMed Central

    Sun, Siqi; Dong, Xinran; Fu, Yao; Tian, Weidong

    2012-01-01

    A key step in network analysis is to partition a complex network into dense modules. Currently, modularity is one of the most popular benefit functions used to partition network modules. However, recent studies suggested that it has an inherent limitation in detecting dense network modules. In this study, we observed that despite the limitation, modularity has the advantage of preserving the primary network structure of the undetected modules. Thus, we have developed a simple iterative Network Partition (iNP) algorithm to partition a network. The iNP algorithm provides a general framework in which any modularity-based algorithm can be implemented in the network partition step. Here, we tested iNP with three modularity-based algorithms: multi-step greedy (MSG), spectral clustering and Qcut. Compared with the original three methods, iNP achieved a significant improvement in the quality of network partition in a benchmark study with simulated networks, identified more modules with significantly better enrichment of functionally related genes in both yeast protein complex network and breast cancer gene co-expression network, and discovered more cancer-specific modules in the cancer gene co-expression network. As such, iNP should have a broad application as a general method to assist in the analysis of biological networks. PMID:22121225
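    Modularity, the benefit function that the modularity-based algorithms plugged into iNP optimize, compares the fraction of edges inside each module with what would be expected at random given the node degrees. The sketch below computes Newman-Girvan modularity for an undirected, unweighted graph without self-loops; it is not the iNP algorithm itself, and the example graph is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple


def modularity(edges: Iterable[Tuple[Hashable, Hashable]], community: Dict[Hashable, Hashable]) -> float:
    """Newman-Girvan modularity Q of a partition of an undirected, unweighted graph."""
    edges = list(edges)
    m = len(edges)
    degree: Dict[Hashable, int] = defaultdict(int)
    internal: Dict[Hashable, int] = defaultdict(int)  # edges with both ends in one community
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    community_degree: Dict[Hashable, int] = defaultdict(int)
    for node, d in degree.items():
        community_degree[community[node]] += d
    # Q = sum over communities c of [L_c / m - (d_c / 2m)^2]
    return sum(internal[c] / m - (community_degree[c] / (2 * m)) ** 2 for c in community_degree)


# Two triangles joined by a single edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
print(round(modularity(edges, split), 4))
```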

  8. Entropy-based consensus clustering for patient stratification.

    PubMed

    Liu, Hongfu; Zhao, Rui; Fang, Hongsheng; Cheng, Feixiong; Fu, Yun; Liu, Yang-Yu

    2017-09-01

    Patient stratification or disease subtyping is crucial for precision medicine and personalized treatment of complex diseases. The increasing availability of high-throughput molecular data provides a great opportunity for patient stratification. Many clustering methods have been employed to tackle this problem in a purely data-driven manner. Yet existing methods leveraging high-throughput molecular data often suffer from various limitations, e.g. noise, data heterogeneity, high dimensionality or poor interpretability. Here we introduce an Entropy-based Consensus Clustering (ECC) method that overcomes those limitations altogether. Our ECC method employs an entropy-based utility function to fuse many basic partitions into a consensus one that agrees with the basic ones as much as possible. Maximizing the utility function in ECC has a much more meaningful interpretation than in any other consensus clustering method. Moreover, we exactly map the complex utility maximization problem to the classic K-means clustering problem, which can then be efficiently solved with linear time and space complexity. Our ECC method can also naturally integrate multiple molecular data types measured from the same set of subjects, and easily handle missing values without any imputation. We applied ECC to 110 synthetic and 48 real datasets, including 35 cancer gene expression benchmark datasets and 13 cancer types with four molecular data types from The Cancer Genome Atlas. We found that ECC shows superior performance against existing clustering methods. Our results clearly demonstrate the power of ECC in clinically relevant patient stratification. The Matlab package is available at http://scholar.harvard.edu/yyl/ecc . Contact: yunfu@ece.neu.edu or yyl@channing.harvard.edu. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com
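    The notion of a consensus partition that "agrees with the basic ones as much as possible" can be made concrete with an entropy-based agreement score. This sketch assumes the utility is the mutual information between the consensus and each basic partition, and it maximizes the total with a slow single-item local search. That search is a swapped-in technique for illustration only; ECC itself maps the problem to K-means, which this code does not do. The basic partitions are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def entropy(labels: Sequence) -> float:
    n = len(labels)
    return -sum(c / n * math.log(c / n) for c in Counter(labels).values())


def mutual_information(a: Sequence, b: Sequence) -> float:
    # I(a; b) = H(a) + H(b) - H(a, b), i.e. H(b) - H(b | a)
    return entropy(a) + entropy(b) - entropy(list(zip(a, b)))


def total_utility(consensus: Sequence[int], basics: Sequence[Sequence[int]]) -> float:
    return sum(mutual_information(consensus, p) for p in basics)


def local_search_consensus(basics: Sequence[Sequence[int]], k: int) -> Tuple[List[int], float]:
    """Start from the best basic partition, then move single items while utility improves."""
    labels = list(max(basics, key=lambda p: total_utility(p, basics)))
    score = total_utility(labels, basics)
    improved = True
    while improved:
        improved = False
        for i in range(len(labels)):
            for c in range(k):
                if c == labels[i]:
                    continue
                old, labels[i] = labels[i], c
                new = total_utility(labels, basics)
                if new > score + 1e-12:
                    score, improved = new, True
                else:
                    labels[i] = old  # revert a move that does not help
    return labels, score


basics = [[0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 1, 1], [1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 0]]
consensus, utility = local_search_consensus(basics, k=2)
print(consensus, round(utility, 3))
```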

  9. A density-based clustering model for community detection in complex networks

    NASA Astrophysics Data System (ADS)

    Zhao, Xiang; Li, Yantao; Qu, Zehui

    2018-04-01

    Network clustering (or graph partitioning) is an important technique for uncovering the underlying community structures in complex networks, which has been widely applied in various fields including astronomy, bioinformatics, sociology, and bibliometrics. In this paper, we propose a density-based clustering model for community detection in complex networks (DCCN). The key idea is to find group centers with a higher density than their neighbors and a relatively large integrated-distance from nodes with higher density. The experimental results indicate that our approach is efficient and effective for community detection of complex networks.
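    The key idea stated above (centers are denser than their neighbors and far from any denser node) follows the density-peak pattern, which can be sketched on a graph. The abstract does not define DCCN's density or integrated-distance, so this sketch assumes node degree as density and shortest-path hop count as distance; the example graph is hypothetical.

```python
from collections import deque
from typing import Dict, List


def bfs_distances(adj: Dict[int, List[int]], source: int) -> Dict[int, int]:
    """Hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def density_peak_communities(adj: Dict[int, List[int]], num_centres: int) -> Dict[int, int]:
    density = {u: len(adj[u]) for u in adj}  # assumption: node degree as "density"
    dist = {u: bfs_distances(adj, u) for u in adj}
    order = sorted(adj, key=lambda u: (-density[u], u))  # densest first, ties by id
    delta, parent = {}, {}
    for rank, u in enumerate(order):
        higher = [v for v in order[:rank] if v in dist[u]]
        if higher:  # distance to the nearest denser (earlier-ranked) node
            parent[u] = min(higher, key=lambda v: dist[u][v])
            delta[u] = dist[u][parent[u]]
        else:  # densest node of its component: give it the largest possible delta
            parent[u] = None
            delta[u] = max(dist[u].values()) + 1
    centres = sorted(adj, key=lambda u: -density[u] * delta[u])[:num_centres]
    label = {}
    for u in order:  # parents are always labelled before their children
        label[u] = u if (u in centres or parent[u] is None) else label[parent[u]]
    return label


# Two 4-cliques {0,1,2,3} and {4,5,6,7} joined by the edge 3-4.
adj = {0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3], 3: [0, 1, 2, 4],
       4: [3, 5, 6, 7], 5: [4, 6, 7], 6: [4, 5, 7], 7: [4, 5, 6]}
print(density_peak_communities(adj, num_centres=2))
```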

  10. Intratumor partitioning and texture analysis of dynamic contrast-enhanced (DCE)-MRI identifies relevant tumor subregions to predict pathological response of breast cancer to neoadjuvant chemotherapy.

    PubMed

    Wu, Jia; Gong, Guanghua; Cui, Yi; Li, Ruijiang

    2016-11-01

    To predict pathological response of breast cancer to neoadjuvant chemotherapy (NAC) based on quantitative, multiregion analysis of dynamic contrast enhancement magnetic resonance imaging (DCE-MRI). In this Institutional Review Board-approved study, 35 patients diagnosed with stage II/III breast cancer were retrospectively investigated using 3T DCE-MR images acquired before and after the first cycle of NAC. First, principal component analysis (PCA) was used to reduce the dimensionality of the DCE-MRI data with high temporal resolution. We then partitioned the whole tumor into multiple subregions using k-means clustering based on the PCA-defined eigenmaps. Within each tumor subregion, we extracted four quantitative Haralick texture features based on the gray-level co-occurrence matrix (GLCM). The change in texture features in each tumor subregion between pre- and during-NAC was used to predict pathological complete response after NAC. Three tumor subregions were identified through clustering, each with distinct enhancement characteristics. In univariate analysis, all imaging predictors except one extracted from the tumor subregion associated with fast washout were statistically significant (P < 0.05) after correcting for multiple testing, with areas under the receiver operating characteristic (ROC) curve (AUCs) between 0.75 and 0.80. In multivariate analysis, the proposed imaging predictors achieved an AUC of 0.79 (P = 0.002) in leave-one-out cross-validation. This improved upon conventional imaging predictors such as tumor volume (AUC = 0.53) and texture features based on whole-tumor analysis (AUC = 0.65). The heterogeneity of the tumor subregion associated with fast washout on DCE-MRI predicted pathological response to NAC in breast cancer. J. Magn. Reson. Imaging 2016;44:1107-1115. © 2016 International Society for Magnetic Resonance in Medicine.
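    Haralick texture features are computed from a gray-level co-occurrence matrix (GLCM), which counts how often pairs of gray levels occur at a fixed pixel offset. The abstract does not say which four features were used; this sketch computes contrast, energy, homogeneity and entropy for a single horizontal offset on a small image that is assumed to be already quantized to `levels` gray levels. Restricting the counts to one tumor subregion would mean skipping pixel pairs outside its mask, which is omitted here.

```python
import math
from typing import Dict, List, Sequence


def glcm(image: Sequence[Sequence[int]], levels: int, dx: int = 1, dy: int = 0) -> List[List[float]]:
    """Symmetric, normalised gray-level co-occurrence matrix for one pixel offset."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                counts[j][i] += 1  # count both directions -> symmetric matrix
    total = sum(map(sum, counts))  # assumes at least one valid pixel pair
    return [[v / total for v in row] for row in counts]


def haralick_features(p: List[List[float]]) -> Dict[str, float]:
    n = range(len(p))
    cells = [(i, j, p[i][j]) for i in n for j in n]
    return {
        "contrast": sum(v * (i - j) ** 2 for i, j, v in cells),
        "energy": sum(v * v for _, _, v in cells),  # angular second moment
        "homogeneity": sum(v / (1 + (i - j) ** 2) for i, j, v in cells),
        "entropy": -sum(v * math.log(v) for _, _, v in cells if v > 0),
    }


checkerboard = [[0, 1], [1, 0]]
print(haralick_features(glcm(checkerboard, levels=2)))
```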

  11. Effect of partition board color on mood and autonomic nervous function.

    PubMed

    Sakuragi, Sokichi; Sugiyama, Yoshiki

    2011-12-01

    The purpose of this study was to evaluate the effects of the presence or absence (control) of a partition board and its color (red, yellow, blue) on subjective mood ratings and changes in autonomic nervous system indicators induced by a video game task. The increase in the mean Profile of Mood States (POMS) Fatigue score and mean Oppressive feeling rating after the task was lowest with the blue partition board. Multiple-regression analysis identified oppressive feeling and error scores on the second half of the task as statistically significant contributors to Fatigue. When the explanatory variables were limited to the physiological indices, multiple-regression analysis identified a significant contribution of autonomic reactivity (assessed by heart rate variability) to Fatigue. These results suggest that a blue partition board would reduce task-induced subjective fatigue, in part by lowering the oppressive feeling of being enclosed during the task, possibly by increasing autonomic reactivity.

  12. Influence of meteorological variables on rainfall partitioning for deciduous and coniferous tree species in urban area

    NASA Astrophysics Data System (ADS)

    Zabret, Katarina; Rakovec, Jože; Šraj, Mojca

    2018-03-01

    Rainfall partitioning is an important part of the ecohydrological cycle and is influenced by numerous variables. Rainfall partitioning for pine (Pinus nigra Arnold) and birch (Betula pendula Roth.) trees was measured from January 2014 to June 2017 in an urban area of Ljubljana, Slovenia. A total of 180 events from more than three years of observations were analyzed, focusing on 13 meteorological variables, including the number of raindrops, their diameter, and their velocity. Regression tree and boosted regression tree analyses were performed to evaluate the influence of the variables on rainfall interception loss, throughfall, and stemflow in different phenoseasons. The amount of rainfall was recognized as the most influential variable, followed by rainfall intensity and the number of raindrops. Higher rainfall amount, intensity, and number of drops decreased the percentage of rainfall interception loss. Rainfall amount and intensity were the most influential on interception loss by birch and pine trees during the leafed and leafless periods, respectively. Lower wind speed was found to increase throughfall, whereas wind direction had no significant influence. Consideration of drop size spectrum properties proved to be important, since the number of drops, drop diameter, and median volume diameter were often recognized as important influential variables.
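    Interception loss is usually obtained as the residual of the event water balance, I = P - TF - SF (gross rainfall minus throughfall minus stemflow). The abstract does not spell out this computation, so the sketch below assumes that common definition and expresses each component as a percentage of rainfall; the event values are hypothetical.

```python
from typing import Dict


def partition_event(rainfall_mm: float, throughfall_mm: float, stemflow_mm: float) -> Dict[str, float]:
    """Split one rain event into throughfall, stemflow and interception loss (also as % of rainfall)."""
    if rainfall_mm <= 0:
        raise ValueError("rainfall must be positive")
    interception = rainfall_mm - throughfall_mm - stemflow_mm  # water-balance residual
    return {
        "interception_mm": interception,
        "interception_pct": 100 * interception / rainfall_mm,
        "throughfall_pct": 100 * throughfall_mm / rainfall_mm,
        "stemflow_pct": 100 * stemflow_mm / rainfall_mm,
    }


print(partition_event(rainfall_mm=10.0, throughfall_mm=7.0, stemflow_mm=0.5))
```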

  13. Clustering stock market companies via chaotic map synchronization

    NASA Astrophysics Data System (ADS)

    Basalto, N.; Bellotti, R.; De Carlo, F.; Facchi, P.; Pascazio, S.

    2005-01-01

    A pairwise clustering approach is applied to the analysis of the Dow Jones index companies in order to identify similar temporal behavior of the traded stock prices. To this end, the chaotic map clustering algorithm is used: a map is associated with each company, and the correlation coefficients of the financial time series are associated with the coupling strengths between maps. The simulation of the chaotic map dynamics gives rise to a natural partition of the data, as companies belonging to the same industrial branch are often grouped together. The identification of clusters of companies of a given stock market index can be exploited in portfolio optimization strategies.

  14. Detecting treatment-subgroup interactions in clustered data with generalized linear mixed-effects model trees.

    PubMed

    Fokkema, M; Smits, N; Zeileis, A; Hothorn, T; Kelderman, H

    2017-10-25

    Identification of subgroups of patients for whom treatment A is more effective than treatment B, and vice versa, is of key importance to the development of personalized medicine. Tree-based algorithms are helpful tools for the detection of such interactions, but none of the available algorithms allow for taking into account clustered or nested dataset structures, which are particularly common in psychological research. Therefore, we propose the generalized linear mixed-effects model tree (GLMM tree) algorithm, which allows for the detection of treatment-subgroup interactions, while accounting for the clustered structure of a dataset. The algorithm uses model-based recursive partitioning to detect treatment-subgroup interactions, and a GLMM to estimate the random-effects parameters. In a simulation study, GLMM trees show higher accuracy in recovering treatment-subgroup interactions, higher predictive accuracy, and lower type II error rates than linear-model-based recursive partitioning and mixed-effects regression trees. Also, GLMM trees show somewhat higher predictive accuracy than linear mixed-effects models with pre-specified interaction effects, on average. We illustrate the application of GLMM trees on an individual patient-level data meta-analysis on treatments for depression. We conclude that GLMM trees are a promising exploratory tool for the detection of treatment-subgroup interactions in clustered datasets.

  15. Genetic structure of the Caribbean giant barrel sponge Xestospongia muta using the I3-M11 partition of COI

    NASA Astrophysics Data System (ADS)

    López-Legentil, S.; Pawlik, J. R.

    2009-03-01

    In recent years, reports of sponge bleaching, disease, and subsequent mortality have increased alarmingly. Population recovery may depend strongly on the colonization capabilities of the affected species. The giant barrel sponge Xestospongia muta is a dominant reef constituent in the Caribbean. However, little is known about its population structure and gene flow. The 5'-end fragment of the mitochondrial gene cytochrome oxidase subunit I is often used to address these kinds of questions, but it presents very low intraspecific nucleotide variability in sponges. In this study, the usefulness of the I3-M11 partition of COI for determining the genetic structure of X. muta was tested for seven populations from Florida, the Bahamas and Belize. A total of 116 sequences of 544 bp were obtained for the I3-M11 partition, corresponding to four haplotypes. For comparison with the 5'-end partition, 10 sequences per haplotype were analyzed for this fragment. The 40 resulting sequences were 569 bp long and corresponded to two haplotypes. The nucleotide diversity of the I3-M11 partition (π = 0.00386) was higher than that of the 5'-end partition (π = 0.00058), indicating better resolution at the intraspecific level. Sponges with the most divergent external morphologies (smooth vs. digitate surface) had different haplotypes, while those with the most common external morphology (rough surface) presented a mixture of haplotypes. Pairwise tests for genetic differentiation among geographic locations based on FST values showed significant genetic divergence between most populations, but this genetic differentiation was not due to isolation by distance. While limited larval dispersal may have led to differentiation among some of the populations, the patterns of genetic structure appear to be most strongly related to patterns of ocean currents. Therefore, hydrological features may play a major role in sponge colonization and need to be considered in future plans for management and conservation of these important components of coral reef ecosystems.
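
    Nucleotide diversity (π), as compared above, is the average number of nucleotide differences per site between two sequences drawn from the sample. A minimal Python sketch, assuming equal-length, already-aligned sequences without gaps or ambiguity codes; the function name and the toy sequences are hypothetical and are not the X. muta data:

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Mean pairwise proportion of differing sites among aligned sequences."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(combinations(seqs, 2))
    # Count mismatching positions for every unordered pair of sequences
    total_diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    # Average over pairs, then normalise by the number of sites
    return total_diffs / len(pairs) / length


print(nucleotide_diversity(["AAAA", "AAAT", "AATT"]))
```

    Identical haplotypes contribute zero differences, so a fragment that splits the sample into fewer, less divergent haplotypes yields a smaller π, which is the sense in which the higher value above indicates better intraspecific resolution.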

  16. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Do, Hainam (h.do@nottingham.ac.uk); Wheatley, Richard J. (richard.wheatley@nottingham.ac.uk)

    A robust and model-free Monte Carlo simulation method is proposed to address the challenge of computing the classical density of states and partition function of solids. Starting from the minimum configurational energy, the algorithm partitions the entire energy range in the increasing energy direction (“upward”) into subdivisions whose integrated density of states is known. When combined with the density of states computed from the “downward” energy partitioning approach [H. Do, J. D. Hirst, and R. J. Wheatley, J. Chem. Phys. 135, 174105 (2011)], the equilibrium thermodynamic properties can be evaluated at any temperature and in any phase. The method is illustrated in the context of the Lennard-Jones system and can readily be extended to other molecular systems and clusters for which the structures are known.

  17. Genetic diversity and environmental associations of sacsaoul (Haloxylon ammodendron)

    NASA Astrophysics Data System (ADS)

    Zhang, Linjing; Zhao, Guifang; Yue, Ming; Pan, Xiaoling

    2003-07-01

    Random amplified polymorphic DNA (RAPD) markers were used to assess levels and patterns of genetic diversity in H. ammodendron (Chenopodiaceae). A total of 117 plants from 6 subpopulations on the oasis-desert ecotone were analyzed with 16 arbitrarily chosen primers, resulting in highly reproducible RAPD bands. Analysis of molecular variance (AMOVA) of distances among individuals showed that most of the variation (74%) occurred among individuals within subpopulations, as expected for an outcrossing organism, and 26% among subpopulations. Estimates of the Shannon index and Nei's index from allele frequencies corroborated the AMOVA partitioning in H. ammodendron. UPGMA cluster analyses based on genetic distance did not reveal grouping of some geographically proximate populations. This is the first report of the partitioning of genetic variability within and between subpopulations of H. ammodendron, and it provides important baseline data for optimizing sampling strategies and for conserving the genetic resources of this species. The percentage of polymorphic loci was as high as 96%, presumably as a response to the oasis-desert ecotone. Gene flow between subpopulations was estimated at Nm = 5.38 individuals/generation, based on the gene differentiation coefficient (GST = 0.1567), and strong habitat selection overrides this gene flow to maintain subpopulation differentiation. Correlation analyses showed a significant relationship between genetic diversity and soil chloride (Cl⁻) ions.

  18. Nearest neighbor-density-based clustering methods for large hyperspectral images

    NASA Astrophysics Data System (ADS)

    Cariou, Claude; Chehdi, Kacem

    2017-10-01

    We address the problem of hyperspectral image (HSI) pixel partitioning using nearest neighbor - density-based (NN-DB) clustering methods. NN-DB methods are able to cluster objects without specifying the number of clusters to be found. Within the NN-DB approach, we focus on deterministic methods, e.g. ModeSeek, knnClust, and GWENN (standing for Graph WatershEd using Nearest Neighbors). These methods only require the availability of a k-nearest neighbor (kNN) graph based on a given distance metric. Recently, a new DB clustering method, called Density Peak Clustering (DPC), has received much attention, and kNN versions of it have quickly followed and shown their efficiency. However, NN-DB methods still suffer from the difficulty of obtaining the kNN graph, due to its quadratic complexity with respect to the number of pixels. This is why GWENN was embedded into a multiresolution (MR) scheme that bypasses the computation of the full kNN graph over the image pixels. In this communication, we propose to extend the MR-GWENN scheme in three respects. Firstly, similarly to knnClust, the original labeling rule of GWENN is modified to account for local density values, in addition to the labels of previously processed objects. Secondly, we set up a modified NN search procedure within the MR scheme in order to stabilize the number of clusters found from the coarsest to the finest spatial resolution. Finally, we show that these extensions can easily be adapted to the three other NN-DB methods (ModeSeek, knnClust, knnDPC) for pixel clustering in large HSIs. Experiments are conducted to compare the four NN-DB methods for pixel clustering in HSIs. We show that NN-DB methods can outperform a classical clustering method such as fuzzy c-means (FCM) in terms of classification accuracy, relevance of the clusters found, and clustering speed. Finally, we demonstrate the feasibility and evaluate the performance of NN-DB methods on a very large image acquired by our AISA Eagle hyperspectral imaging sensor.
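
    The common core of the NN-DB methods named above is a kNN graph plus a local density estimate. As a rough illustration only, the following Python sketch implements a generic kNN mode-seeking rule in the spirit of ModeSeek: each point links to the densest point in its neighbourhood, and each cluster is the set of points whose links lead to the same mode. It is not GWENN, knnClust or knnDPC; the density estimate (inverse mean kNN distance) and the function name `knn_mode_seek` are assumptions, and it uses the full quadratic kNN computation that the multiresolution scheme is designed to avoid.

```python
import math


def knn_mode_seek(points, k):
    """Cluster points by following each point to its densest kNN neighbour.

    Returns one integer label per point; the number of clusters is not
    specified in advance. Requires 1 <= k < len(points).
    """
    n = len(points)
    # Brute-force kNN graph: O(n^2) distance computations
    neighbours = []
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbours.append(dists[:k])
    # Local density (assumed): inverse of the mean distance to the k nearest neighbours
    density = [1.0 / (sum(d for d, _ in row) / k + 1e-12) for row in neighbours]
    # Each point links to the densest point among itself and its neighbours;
    # max() keeps the point itself on ties, so every link strictly increases density
    parent = [max([i] + [j for _, j in row], key=lambda c: density[c])
              for i, row in enumerate(neighbours)]
    # Follow links until reaching a mode (a point that links to itself)
    modes = []
    for i in range(n):
        j = i
        while parent[j] != j:
            j = parent[j]
        modes.append(j)
    # Relabel modes as consecutive cluster ids 0, 1, 2, ...
    ids = {}
    return [ids.setdefault(m, len(ids)) for m in modes]


blob = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (-0.1, 0.0), (0.0, -0.1)]
points = blob + [(x + 10.0, y + 10.0) for x, y in blob]
print(knn_mode_seek(points, k=3))
```

    Because links only ever point to strictly denser points, the label-following loop always terminates, and the number of clusters emerges from the number of density modes rather than being fixed in advance.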

  19. [Applying the clustering technique for characterising maintenance outsourcing].

    PubMed

    Cruz, Antonio M; Usaquén-Perilla, Sandra P; Vanegas-Pabón, Nidia N; Lopera, Carolina

    2010-06-01

    This study used clustering techniques to characterise companies providing maintenance services to health institutions. It analysed the equipment inventory (264 medical devices) of seven pilot areas. Clustering techniques were applied using 26 variables; response time (RT), operation duration (OD), availability and turnaround time (TAT) were among the most significant. Average biomedical equipment obsolescence value was 0.78. Four service provider clusters were identified: clusters 1 and 3 had better performance, with lower TAT, RT and DR values (56% of the providers coded O, L, C, B, I, S, H, F and G, had 1 to 4 day TAT values:

  20. Fully polynomial-time approximation scheme for a special case of a quadratic Euclidean 2-clustering problem

    NASA Astrophysics Data System (ADS)

    Kel'manov, A. V.; Khandeev, V. I.

    2016-02-01

    The strongly NP-hard problem of partitioning a finite set of points of Euclidean space into two clusters of given sizes (cardinalities) minimizing the sum (over both clusters) of the intracluster sums of squared distances from the elements of the clusters to their centers is considered. It is assumed that the center of one of the sought clusters is specified at the desired (arbitrary) point of space (without loss of generality, at the origin), while the center of the other one is unknown and determined as the mean value over all elements of this cluster. It is shown that unless P = NP, there is no fully polynomial-time approximation scheme for this problem, and such a scheme is substantiated in the case of a fixed space dimension.
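
    To make the objective concrete, here is a brute-force Python sketch under the set-up described above: one cluster (A) has its centre fixed at the origin, the other (B) is centred at the mean of its own points, and the size of B is given (so the size of A is the remainder). It enumerates every subset, so it is exponential in the number of points and only illustrates the cost function; it is not the approximation scheme from the paper, and the function names are hypothetical.

```python
from itertools import combinations


def sq_norm(v):
    """Squared Euclidean norm of a vector."""
    return sum(c * c for c in v)


def two_cluster_cost(points, cluster_b):
    """Sum of intracluster squared distances for one split.

    Cluster A (indices not in cluster_b) is centred at the origin;
    cluster B is centred at the mean of its own points.
    """
    in_b = set(cluster_b)
    b = [points[i] for i in cluster_b]
    a = [p for i, p in enumerate(points) if i not in in_b]
    dim = len(points[0])
    centroid = [sum(p[d] for p in b) / len(b) for d in range(dim)]
    cost_a = sum(sq_norm(p) for p in a)  # distances to the fixed centre (origin)
    cost_b = sum(sq_norm([p[d] - centroid[d] for d in range(dim)]) for p in b)
    return cost_a + cost_b


def brute_force_2_clustering(points, size_b):
    """Exact optimum by enumerating all subsets of size size_b (>= 1); exponential time."""
    best = min(combinations(range(len(points)), size_b),
               key=lambda idx: two_cluster_cost(points, idx))
    return best, two_cluster_cost(points, best)


points = [(0.0, 0.0), (0.1, 0.0), (10.0, 10.0), (10.1, 10.0)]
best, cost = brute_force_2_clustering(points, 2)
print(best, round(cost, 6))
```

    The far-away pair ends up in cluster B, since only that cluster may move its centre; this asymmetry between the fixed and the free centre is what distinguishes the problem from ordinary 2-means.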

  1. The Partition Function in the Four-Dimensional Schwarz-Type Topological Half-Flat Two-Form Gravity

    NASA Astrophysics Data System (ADS)

    Abe, Mitsuko

    We derive the partition functions of the Schwarz-type four-dimensional topological half-flat two-form gravity model on a K3 surface or T⁴ up to on-shell one-loop corrections. In this model the bosonic moduli spaces describe an equivalence class of a trio of Einstein-Kähler forms (the hyper-Kähler forms). The integrand of the partition function is represented by the product of some ∂̄-torsions. The ∂̄-torsion is the extension of the R-torsion for the de Rham complex to the ∂̄-complex of a complex analytic manifold.

  2. Novel approach to classifying patients with pulmonary arterial hypertension using cluster analysis.

    PubMed

    Parikh, Kishan S; Rao, Youlan; Ahmad, Tariq; Shen, Kai; Felker, G Michael; Rajagopal, Sudarshan

    2017-01-01

    Pulmonary arterial hypertension (PAH) patients have distinct disease courses and responses to treatment, but current diagnostic and treatment schemes provide limited insight. We aimed to see if cluster analysis could distinguish clinical phenotypes in PAH. An unbiased cluster analysis was performed on 17 baseline clinical variables of PAH patients from the FREEDOM-M, FREEDOM-C, and FREEDOM-C2 randomized trials of oral treprostinil versus placebo. Participants were either treatment-naïve (FREEDOM-M) or on background therapy (FREEDOM-C, FREEDOM-C2). We tested for association of clusters with outcomes and interaction with respect to treatment. The primary outcome was change in 6-minute walking distance (6MWD). We included 966 participants with 12-week (FREEDOM-M) or 16-week (FREEDOM-C and FREEDOM-C2) follow-up. Four patient clusters were identified. Compared with Clusters 1 (n = 131) and 2 (n = 496), patients in Clusters 3 (n = 246) and 4 (n = 93) were older, heavier, had worse baseline functional class, 6MWD and Borg Dyspnea Index, and had fewer years since PAH diagnosis. Clusters also differed by PAH etiology and background therapies, but not by gender or race. The mean treatment effect of oral treprostinil differed across Clusters 1-4, increasing monotonically (Cluster 1: 10.9 m; Cluster 2: 13.0 m; Cluster 3: 25.0 m; Cluster 4: 50.9 m; interaction P value = 0.048). We identified four distinct clusters of PAH patients based on common patient characteristics. Patients who were older, diagnosed with PAH for a shorter period, and had worse baseline symptoms and exercise capacity had the greatest response to oral treprostinil treatment.

  3. An empirical method to cluster objective nebulizer adherence data among adults with cystic fibrosis.

    PubMed

    Hoo, Zhe H; Campbell, Michael J; Curley, Rachael; Wildman, Martin J

    2017-01-01

    The purpose of using preventative inhaled treatments in cystic fibrosis is to improve health outcomes. Therefore, understanding the relationship between adherence to treatment and health outcome is crucial. Temporal variability, as well as the absolute magnitude of adherence, affects health outcomes, and there is likely to be a threshold effect in the relationship between adherence and outcomes. We therefore propose a pragmatic algorithm-based method for clustering objective nebulizer adherence data, to better understand this relationship and, potentially, to guide clinical decisions. This clustering method consists of three related steps. The first step is to split adherence data for the previous 12 months into four 3-monthly sections. The second step is to calculate mean adherence for each section and to score the section based on mean adherence. The third step is to aggregate the individual scores to determine the final cluster ("cluster 1" = very low adherence; "cluster 2" = low adherence; "cluster 3" = moderate adherence; "cluster 4" = high adherence), taking into account the adherence trend as represented by the sequential individual scores. The individual scores should be displayed along with the final cluster so that clinicians can fully understand the adherence data. We present three cases to illustrate the use of the proposed clustering method. This pragmatic clustering method can deal with adherence data of variable duration (ie, it can be used even if 12 months' worth of data are unavailable) and can cluster adherence data in real time. Empirical support for some of the clustering parameters is not yet available, but the suggested classifications provide a structure for investigating parameters in future prospective datasets with accurate measurements of nebulizer adherence and health outcomes.
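
    The three steps can be sketched in Python as follows, assuming the input is a list of daily adherence percentages with the most recent day last. The section length, the score cut-offs and the aggregation rule are hypothetical placeholders, since the abstract does not state them, and the sketch does not model the adherence trend that the proposed method also takes into account.

```python
from statistics import mean

DAYS_PER_SECTION = 91  # roughly 3 months; an assumption
MAX_SECTIONS = 4       # 12 months split into four 3-monthly sections


def split_sections(daily):
    """Step 1: split up to the last 12 months into ~3-monthly sections (oldest first).

    Shorter histories simply yield fewer (or partial) sections.
    """
    sections, end = [], len(daily)
    while end > 0 and len(sections) < MAX_SECTIONS:
        start = max(0, end - DAYS_PER_SECTION)
        sections.append(daily[start:end])
        end = start
    return sections[::-1]


def score_section(section):
    """Step 2: score one section from its mean adherence (%).

    HYPOTHETICAL cut-offs for illustration only.
    """
    m = mean(section)
    if m >= 80:
        return 3
    if m >= 50:
        return 2
    if m >= 20:
        return 1
    return 0


def classify(daily):
    """Step 3: aggregate section scores into cluster 1 (very low) .. 4 (high).

    HYPOTHETICAL rule: round the mean score; the adherence trend is ignored here.
    """
    if not daily:
        raise ValueError("no adherence data")
    scores = [score_section(s) for s in split_sections(daily)]
    cluster = 1 + int(mean(scores) + 0.5)
    return cluster, scores  # report the scores alongside the final cluster


print(classify([90.0] * 100))
```

    Returning the individual section scores together with the final cluster mirrors the recommendation above that clinicians see both, and the section splitting works from the most recent day backwards so that histories shorter than 12 months are still handled.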

  4. Spicule size variation in Xestospongia testudinaria Lamarck, 1815 at Probolinggo-Situbondo coastal

    NASA Astrophysics Data System (ADS)

    Subagio, Iwenda Bella; Setiawan, Edwin; Hariyanto, Sucipto; Irawan, Bambang

    2017-06-01

    Xestospongia testudinaria Lamarck, 1815 is a marine sponge that is a main constituent of the reef ecosystems in the northern waters of Probolinggo-Situbondo. This barrel sponge species possesses oxea-type spicules whose dimensions (length and width) vary with habitat conditions and location. The study aimed to understand how the spicules of this sponge respond to environmental variables. Sponge specimens were collected by SCUBA diving at depths of 6-7 m, 10-11 m, and 14-15 m at four different localities, and from three different parts of the sponge body (upper, middle and basal parts). Environmental data (salinity, water clarity, temperature, dissolved silica, and depth) were also recorded at each location. The results confirmed that oxea spicule size, in both length and width, at the four locations (Batu Lawang coral cluster [BL], Karang Mayit coral cluster [KM], Paiton coral cluster [PT], and Takat Palapa [TP]) tended to increase with depth. Spicules at TP were also relatively longer than at the other three locations, whereas oxea spicules at PT were relatively wider than at the other three locations. Salinity had a negative effect on spicule length, while depth had a positive effect. Depth, water clarity, dissolved silica, and temperature had negative effects on spicule width, while salinity had a positive effect.

  5. Site-specific polarizabilities as descriptors of metallic behavior in atomic clusters

    NASA Astrophysics Data System (ADS)

    Jackson, Koblar; Jellinek, Julius

    The electric dipole polarizability of a cluster is a measure of its response to an applied electric field. The site specific polarizability method decomposes the total cluster polarizability into contributions from individual atoms and also allows it to be partitioned into charge transfer and electric dipole contributions. By systematically examining the trends in these quantities for several types of metal atom clusters over a wide range of cluster sizes, we find common characteristics that uniquely link the behavior of the clusters to that of the corresponding bulk metals for clusters as small as 10 atoms. We discuss these trends and compare and contrast them with results for non-metal clusters. This work was supported by the Office of Basic Energy Sciences, Division of Chemical Sciences, Geosciences and Biosciences, U.S. Department of Energy under Grant SC0001330 (KAJ) and Contract No. DE-AC02-06CH11357 (JJ).

  6. Text analysis of MEDLINE for discovering functional relationships among genes: evaluation of keyword extraction weighting schemes.

    PubMed

    Liu, Ying; Navathe, Shamkant B; Pivoshenko, Alex; Dasigi, Venu G; Dingledine, Ray; Ciliax, Brian J

    2006-01-01

    One of the key challenges of microarray studies is to derive biological insights from the gene-expression patterns. Clustering genes by functional keyword association can provide direct information about the functional links among genes. However, the quality of the keyword lists significantly affects the clustering results. We compared two keyword weighting schemes: normalised z-score and term frequency-inverse document frequency (TFIDF). Two gene sets were tested to evaluate the effectiveness of the weighting schemes for keyword extraction for gene clustering. Using established measures of cluster quality, the results produced from TFIDF-weighted keywords outperformed those produced from normalised z-score weighted keywords. The optimised algorithms should be useful for partitioning genes from microarray lists into functionally discrete clusters.
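
    For reference, a minimal Python sketch of the classic TF-IDF weight, tf(t, d) × log(N / df(t)), where N is the number of documents and df(t) the number of documents containing term t. The keyword lists (one per gene) are hypothetical, and the exact TF-IDF variant and normalisation used in the study may differ.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document, weight = tf * log(N / df)."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each term
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)  # raw term counts within this document
        weights.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return weights


# Hypothetical keyword lists, one per gene
docs = [
    ["kinase", "receptor", "signaling"],
    ["kinase", "channel", "channel"],
    ["kinase", "receptor", "synapse"],
]
for gene_weights in tfidf(docs):
    print(gene_weights)
```

    A keyword that appears for every gene ("kinase" here) gets weight zero, so it cannot drive the clustering, while keywords specific to few genes are up-weighted.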

  7. Ground-based multicolour photometry of NGC 6811

    NASA Astrophysics Data System (ADS)

    Ocando, S.; Martín-Ruiz, S.; Rodríguez, E.

    2017-03-01

    NGC 6811 is one of the four open clusters in the field of view of the Kepler space mission. Among its members are several known pulsating A-F stars of the δ Scuti, γ Doradus, and hybrid types, which makes this cluster a very interesting object for studying pulsation. During the summers of 2013 and 2014 we carried out an extensive observational campaign of multicolour photometry using the 1.5 m telescope at the Sierra Nevada Observatory. New candidate pulsating variables were detected in this work. We performed a frequency analysis of the known variables, with very good agreement with previous results. Using Strömgren photometry we were able to obtain the main physical parameters of the stars, such as temperature, surface gravity, metallicity and luminosity. We have also determined the corresponding frequency phase shifts and amplitude ratios between different filters as a first step towards identifying the pulsation modes of the variables.

  8. Factor regression for interpreting genotype-environment interaction in bread-wheat trials.

    PubMed

    Baril, C P

    1992-05-01

    The French INRA wheat (Triticum aestivum L. em Thell.) breeding program is based on multilocation trials to produce high-yielding, adapted lines for a wide range of environments. Differential genotypic responses to variable environmental conditions limit the accuracy of yield estimations. Factor regression was used to partition the genotype-environment (GE) interaction into four biologically interpretable terms. Yield data were analyzed from 34 wheat genotypes grown in four environments, using 12 auxiliary agronomic traits as genotypic and environmental covariates. Most of the GE interaction (91%) was explained by the combination of only three traits: 1,000-kernel weight, lodging susceptibility and spike length. These traits are easily measured in breeding programs; therefore, the factor regression model can provide a convenient and useful method for predicting yield.

  9. Immunization Attitudes and Beliefs Among Parents: Beyond a Dichotomous Perspective

    ERIC Educational Resources Information Center

    Gust, Deborah; Brown, Cedric; Sheedy, Kristine; Hibbs, Beth; Weaver, Donna; Nowak, Glen

    2005-01-01

    Objective: To better understand differences among parents in their attitudes, beliefs, and behaviors regarding childhood immunizations and health-related issues. Methods: Forty-four survey variables assessing attitudes and beliefs about immunizations and health were analyzed. The K-means clustering technique was used to identify homogeneous groups…

  10. Environmental Uncertainty and Communication Network Complexity: A Cross-System, Cross-Cultural Test.

    ERIC Educational Resources Information Center

    Danowski, James

    An infographic model is proposed to account for the operation of systems within their information environments. Infographics is a communication paradigm used to indicate the clustering of information processing variables in communication systems. Four propositions concerning environmental uncertainty and internal communication network complexity,…

  11. Folksonomies and clustering in the collaborative system CiteULike

    NASA Astrophysics Data System (ADS)

    Capocci, Andrea; Caldarelli, Guido

    2008-06-01

    We analyze CiteULike, an online collaborative tagging system where users bookmark and annotate scientific papers. Such a system can be naturally represented as a tri-partite graph whose nodes represent papers, users and tags, connected by individual tag assignments. The semantics of tags is studied here in order to uncover the hidden relationships between tags. We find that the clustering coefficient can be used to analyze the semantic patterns among tags.
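
    As an illustration, assuming two tags are connected whenever they are assigned to the same paper (a one-mode projection of the tri-partite graph that the abstract does not spell out), the standard local clustering coefficient, the fraction of a tag's neighbour pairs that are themselves linked, can be computed as below. The projection rule, the function names and the toy posts are assumptions; the paper's exact definition may differ.

```python
from collections import defaultdict
from itertools import combinations


def tag_graph(posts):
    """Link two tags whenever they are assigned to the same paper."""
    adj = defaultdict(set)
    for tags in posts:
        for a, b in combinations(sorted(set(tags)), 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj


def clustering_coefficient(adj, node):
    """Local clustering: 2 * (links among neighbours) / (k * (k - 1))."""
    nbrs = adj.get(node, set())
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return 2.0 * links / (k * (k - 1))


# Hypothetical tag assignments, one list of tags per bookmarked paper
posts = [["network", "graph", "physics"], ["network", "biology"]]
adj = tag_graph(posts)
for tag in sorted(adj):
    print(tag, clustering_coefficient(adj, tag))
```

    In this toy example, "graph" only co-occurs with tags that also co-occur with each other, so its coefficient is 1, while the broader tag "network" links otherwise unrelated tags and scores lower, which is the kind of contrast between specific and general tags that the coefficient can expose.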

  12. Soft Clustering Criterion Functions for Partitional Document Clustering

    DTIC Science & Technology

    2004-05-26

    in the cluster that it already belongs to. The refinement phase ends as soon as we perform an iteration in which no documents moved between... it with the one obtained by the hard criterion functions. We present a comprehensive experimental evaluation involving twelve different datasets

  13. AGT/ℤ2

    NASA Astrophysics Data System (ADS)

    Le Floch, Bruno; Turiaci, Gustavo J.

    2017-12-01

    We relate Liouville/Toda CFT correlators on Riemann surfaces with boundaries and cross-cap states to supersymmetric observables in four-dimensional N=2 gauge theories. Our construction naturally involves four-dimensional theories with fields defined on different ℤ2 quotients of the sphere (hemisphere and projective space) but nevertheless interacting with each other. The six-dimensional origin is a ℤ2 quotient of the setup giving rise to the usual AGT correspondence. To test the correspondence, we work out the ℝℙ4 partition function of four-dimensional N=2 theories by combining 3d lens space and 4d hemisphere partition functions. The same technique reproduces known ℝℙ2 partition functions in a form that lets us easily check two-dimensional Seiberg-like dualities on this nonorientable space. As a bonus we work out boundary and cross-cap wavefunctions in Toda CFT.

  14. New cataclysmic variables and other exotic binaries in the globular cluster 47 Tucanae*

    NASA Astrophysics Data System (ADS)

    Rivera Sandoval, L. E.; van den Berg, M.; Heinke, C. O.; Cohn, H. N.; Lugger, P. M.; Anderson, J.; Cool, A. M.; Edmonds, P. D.; Wijnands, R.; Ivanova, N.; Grindlay, J. E.

    2018-04-01

    We present 22 new (+3 confirmed) cataclysmic variables (CVs) in the non-core-collapsed globular cluster 47 Tucanae (47 Tuc). The total number of CVs in the cluster is now 43, the largest sample in any globular cluster so far. For the identifications we used near-ultraviolet (NUV) and optical images from the Hubble Space Telescope, in combination with X-ray results from the Chandra X-ray Observatory. This allowed us to build the deepest NUV CV luminosity function of the cluster to date. We found that the CVs in 47 Tuc are more concentrated towards the cluster centre than the main-sequence turn-off stars. We compared our results to the CV populations of the core-collapsed globular clusters NGC 6397 and NGC 6752. We found that 47 Tuc has fewer bright CVs per unit mass than those two other clusters. That suggests that dynamical interactions in core-collapsed clusters play a major role in creating new CVs. In 47 Tuc, the CV population is probably dominated by primordial and old dynamically formed systems. We estimated that the CVs in 47 Tuc have total masses of ~1.4 M⊙. We also found that the X-ray luminosity function of the CVs in the three clusters is bimodal. Additionally, we discuss a possible double degenerate system and an intriguing/unclassified object. Finally, we present four systems that could be millisecond pulsar companions given their X-ray and NUV/optical colours. For one of them we present very strong evidence for being an ablated companion. The other three could be CO or He white dwarfs.

  15. Subgroups of advanced cancer patients clustered by their symptom profiles: quality-of-life outcomes.

    PubMed

    Husain, Amna; Myers, Jeff; Selby, Debbie; Thomson, Barbara; Chow, Edward

    2011-11-01

    Symptom cluster analysis is a new frontier of research in symptom management. This study clustered patients by their symptom profiles to identify subgroups that may be at higher risk for poor quality of life (QOL) and that may, therefore, benefit most from targeted interventions. Longitudinal study of metastatic cancer patients using the Edmonton Symptom Assessment Scale (ESAS). We generated two-, three-, and four-cluster subgroups and examined the relationship of cluster membership with patient outcomes. To address the problem of missing longitudinal data, we developed a novel outcome variable (QualTime) that measures both QOL and time in study. Two hundred and twenty-one patients with a mean Palliative Performance Scale (PPS) of 59.1 were enrolled. The three-cluster model was chosen for further analysis. The low-burden subgroup had all low severity symptom scores. The intermediate subgroup separates from the low-burden group on the "debility" profile of fatigue, drowsiness, appetite, and well-being. The high-burden group separates from the intermediate-burden group on pain, depression, and anxiety. At baseline, PPS (p=0.0003) and cluster membership (p<0.0001) contributed significantly to global QOL. In univariate analysis, cluster membership was related to the longitudinal outcome, QualTime. In a multivariate model, the relationship of PPS to QualTime was still significant (p=0.0002), but subgroup membership was no longer significant (p=0.1009). PPS is a stronger predictor of the longitudinal variable than cluster subgroups; however, cluster subgroups provide a target for clinical interventions that may improve QOL.

  16. Finite-sample corrected generalized estimating equation of population average treatment effects in stepped wedge cluster randomized trials.

    PubMed

    Scott, JoAnna M; deCamp, Allan; Juraska, Michal; Fay, Michael P; Gilbert, Peter B

    2017-04-01

    Stepped wedge designs are increasingly commonplace and advantageous for cluster randomized trials when it is both unethical to assign placebo and logistically difficult to allocate an intervention simultaneously to many clusters. We study marginal mean models fit with generalized estimating equations for assessing treatment effectiveness in stepped wedge cluster randomized trials. This approach has advantages over the more commonly used mixed models in that (1) the population-average parameters have an important interpretation for public health applications and (2) it avoids untestable assumptions on latent variable distributions and parametric assumptions about error distributions, thereby providing more robust evidence on treatment effects. However, cluster randomized trials typically have a small number of clusters, rendering the standard generalized estimating equation sandwich variance estimator biased and highly variable and hence yielding incorrect inferences. We study the usual asymptotic generalized estimating equation inferences (i.e., using sandwich variance estimators and asymptotic normality) and four small-sample corrections to generalized estimating equation for stepped wedge cluster randomized trials, and for parallel cluster randomized trials as a comparison. We show by simulation that the small-sample corrections provide improvement, with one correction appearing to provide at least nominal coverage even with only 10 clusters per group. These results demonstrate the viability of the marginal mean approach for both stepped wedge and parallel cluster randomized trials. We also study the comparative performance of the corrected methods for stepped wedge and parallel designs, and describe how the methods can accommodate interval censoring of individual failure times and incorporate semiparametric efficient estimators.

  17. Cities with camera-equipped taxicabs experience reduced taxicab driver homicide rates: United States, 1996-2010.

    PubMed

    Menéndez, Cammie Chaumont; Amandus, Harlan; Damadi, Parisa; Wu, Nan; Konda, Srinivas; Hendricks, Scott

    2014-05-01

    Driving a taxicab remains one of the most dangerous occupations in the United States, with leading homicide rates. Although safety equipment designed to reduce robberies exists, it is not clear what effect it has on reducing taxicab driver homicides. Taxicab driver homicide crime reports for 1996 through 2010 were collected from 20 of the largest cities (population >200,000) in the United States: 7 cities with cameras installed in cabs, 6 cities with partitions installed, and 7 cities with neither cameras nor partitions. Poisson regression modeling using generalized estimating equations provided city taxicab driver homicide rates while accounting for serial correlation and clustering of data within cities. Two separate models were constructed to compare (1) cities with cameras installed in taxicabs versus cities with neither cameras nor partitions and (2) cities with partitions installed in taxicabs versus cities with neither cameras nor partitions. Cities with cameras installed in cabs experienced a significant reduction in homicides after cameras were installed (adjRR = 0.11, CL 0.06-0.24) and compared to cities with neither cameras nor partitions (adjRR = 0.32, CL 0.15-0.67). Cities with partitions installed in taxicabs experienced a reduction in homicides (adjRR = 0.78, CL 0.41-1.47) compared to cities with neither cameras nor partitions, but it was not statistically significant. The findings suggest cameras installed in taxicabs are highly effective in reducing homicides among taxicab drivers. Although not statistically significant, the findings suggest partitions installed in taxicabs may be effective.

  18. Modifiable lifestyle behavior patterns, sedentary time and physical activity contexts: a cluster analysis among middle school boys and girls in the SALTA study.

    PubMed

    Marques, Elisa A; Pizarro, Andreia N; Figueiredo, Pedro; Mota, Jorge; Santos, Maria P

    2013-06-01

    To analyze how modifiable health-related variables are clustered and associated with children's participation in play, active travel and structured exercise and sport among boys and girls. Data were collected from 9 middle schools in the Porto (Portugal) area. A total of 636 children in the 6th grade (340 girls and 296 boys) with a mean age of 11.64 years participated in the study. Cluster analyses were used to identify patterns of lifestyle and healthy/unhealthy behaviors. Multinomial logistic regression analysis was used to estimate associations between cluster allocation, sedentary time and participation in three different physical activity (PA) contexts: play, active travel, and structured exercise/sport. Four distinct clusters were identified based on four lifestyle risk factors. The most disadvantaged cluster was characterized by high body mass index, low high-density lipoprotein cholesterol and cardiorespiratory fitness, and a moderate level of moderate to vigorous PA. Everyday outdoor play (OR=1.85, 95%CI 0.318-0.915) and structured exercise/sport (OR=1.85, 95%CI 0.291-0.990) were associated with healthier lifestyle patterns. There were no significant associations between health patterns and sedentary time or travel mode. Outdoor play and sport/exercise participation seem more important than active travel from school in influencing children's healthy cluster profiles. Copyright © 2013 Elsevier Inc. All rights reserved.

  19. Cluster and propensity based approximation of a network

    PubMed Central

    2013-01-01

    Background The models in this article generalize current models for both correlation networks and multigraph networks. Correlation networks are widely applied in genomics research. In contrast to general networks, it is straightforward to test the statistical significance of an edge in a correlation network. It is also easy to decompose the underlying correlation matrix and generate informative network statistics such as the module eigenvector. However, correlation networks only capture the connections between numeric variables. An open question is whether one can find suitable decompositions of the similarity measures employed in constructing general networks. Multigraph networks are attractive because they support likelihood based inference. Unfortunately, it is unclear how to adjust current statistical methods to detect the clusters inherent in many data sets. Results Here we present an intuitive and parsimonious parametrization of a general similarity measure such as a network adjacency matrix. The cluster and propensity based approximation (CPBA) of a network not only generalizes correlation network methods but also multigraph methods. In particular, it gives rise to a novel and more realistic multigraph model that accounts for clustering and provides likelihood based tests for assessing the significance of an edge after controlling for clustering. We present a novel Majorization-Minimization (MM) algorithm for estimating the parameters of the CPBA. To illustrate the practical utility of the CPBA of a network, we apply it to gene expression data and to a bi-partite network model for diseases and disease genes from the Online Mendelian Inheritance in Man (OMIM). 
Conclusions The CPBA of a network is theoretically appealing since a) it generalizes correlation and multigraph network methods, b) it improves likelihood based significance tests for edge counts, c) it directly models higher-order relationships between clusters, and d) it suggests novel clustering algorithms. The CPBA of a network is implemented in Fortran 95 and bundled in the freely available R package PropClust. PMID:23497424

  20. Toward the identification of molecular cogs.

    PubMed

    Dziubiński, Maciej; Lesyng, Bogdan

    2016-04-05

    Computer simulations of molecular systems allow determination of microscopic interactions between individual atoms or groups of atoms, as well as studies of intramolecular motions. Nevertheless, describing structural transformations at the mesoscopic level and identifying the causal relations associated with these transformations is very difficult. Structural and functional properties are related to free energy changes. Therefore, to better understand the structural and functional properties of molecular systems, we need deeper knowledge of the free energy contributions arising from molecular subsystems in the course of structural transformations. The method presented in this work quantifies the energetic contribution of each pair of atoms to the total free energy change along a given collective variable. Next, with the help of a genetic clustering algorithm, the method proposes a division of the system into two groups of atoms, referred to as molecular cogs. Atoms that cooperate to push the system forward along the collective variable are referred to as forward cogs, and those that work in the opposite direction as reverse cogs. The procedure was tested on several small molecules, for which the genetic clustering algorithm successfully found optimal partitionings into molecular cogs. The primary result of the method is a plot depicting the energetic contributions of the identified molecular cogs to the total Potential of Mean Force (PMF) change. The case studies presented in this work should help clarify the implications of our approach and are intended to pave the way for a future, publicly available implementation. © 2015 Wiley Periodicals, Inc.

  1. Initial Stage of Aerosol Formation from Oversaturated Vapors

    NASA Astrophysics Data System (ADS)

    Lushnikov, A. A.; Zagainov, V. A.; Lyubovtseva, Yu. S.

    2018-03-01

    The formation of aerosol particles from oversaturated vapor was considered under the assumption that the stable nuclei of the new phase contain two (dimers) or three (trimers) condensing vapor molecules. Exact expressions for the partition functions of a dimer and a trimer suspended in a carrier gas were derived and analyzed for rectangular-well and repulsive-core intermolecular potentials. The equilibrium properties of these clusters and the nucleation rate of aerosol particles were discussed. Bound states of clusters were introduced by limiting their total energy: only molecular clusters with negative total energy were considered, which excludes configurations with noninteracting fragments.

  2. Hyper-spectral image segmentation using spectral clustering with covariance descriptors

    NASA Astrophysics Data System (ADS)

    Kursun, Olcay; Karabiber, Fethullah; Koc, Cemalettin; Bal, Abdullah

    2009-02-01

    Image segmentation is an important and difficult computer vision problem. Hyper-spectral images pose even more difficulty due to their high dimensionality. Spectral clustering (SC) is a recently popular clustering/segmentation algorithm. In general, SC lifts the data to a high-dimensional space (also known as the kernel trick), derives eigenvectors in this new space, and finally uses these new dimensions to partition the data into clusters. We demonstrate that SC works efficiently when combined with covariance descriptors, which can be used to assess pixelwise similarities instead of working in the high-dimensional Euclidean space. We present the formulations and some preliminary results of the proposed hybrid image segmentation method for hyper-spectral images.

  3. Fine-Scale Analysis Reveals Cryptic Landscape Genetic Structure in Desert Tortoises

    PubMed Central

    Latch, Emily K.; Boarman, William I.; Walde, Andrew; Fleischer, Robert C.

    2011-01-01

    Characterizing the effects of landscape features on genetic variation is essential for understanding how landscapes shape patterns of gene flow and spatial genetic structure of populations. Most landscape genetics studies have focused on patterns of gene flow at a regional scale. However, the genetic structure of populations at a local scale may be influenced by a unique suite of landscape variables that have little bearing on connectivity patterns observed at broader spatial scales. We investigated fine-scale spatial patterns of genetic variation and gene flow in relation to features of the landscape in desert tortoise (Gopherus agassizii), using 859 tortoises genotyped at 16 microsatellite loci with associated data on geographic location, sex, elevation, slope, and soil type, and spatial relationship to putative barriers (power lines, roads). We used spatially explicit and non-explicit Bayesian clustering algorithms to partition the sample into discrete clusters, and characterize the relationships between genetic distance and ecological variables to identify factors with the greatest influence on gene flow at a local scale. Desert tortoises exhibit weak genetic structure at a local scale, and we identified two subpopulations across the study area. Although genetic differentiation between the subpopulations was low, our landscape genetic analysis identified both natural (slope) and anthropogenic (roads) landscape variables that have significantly influenced gene flow within this local population. We show that desert tortoise movements at a local scale are influenced by features of the landscape, and that these features are different than those that influence gene flow at larger scales. Our findings are important for desert tortoise conservation and management, particularly in light of recent translocation efforts in the region. 
More generally, our results indicate that recent landscape changes can affect gene flow at a local scale and that their effects can be detected almost immediately. PMID:22132143

  4. Fine-scale analysis reveals cryptic landscape genetic structure in desert tortoises.

    PubMed

    Latch, Emily K; Boarman, William I; Walde, Andrew; Fleischer, Robert C

    2011-01-01

    Characterizing the effects of landscape features on genetic variation is essential for understanding how landscapes shape patterns of gene flow and spatial genetic structure of populations. Most landscape genetics studies have focused on patterns of gene flow at a regional scale. However, the genetic structure of populations at a local scale may be influenced by a unique suite of landscape variables that have little bearing on connectivity patterns observed at broader spatial scales. We investigated fine-scale spatial patterns of genetic variation and gene flow in relation to features of the landscape in desert tortoise (Gopherus agassizii), using 859 tortoises genotyped at 16 microsatellite loci with associated data on geographic location, sex, elevation, slope, and soil type, and spatial relationship to putative barriers (power lines, roads). We used spatially explicit and non-explicit Bayesian clustering algorithms to partition the sample into discrete clusters, and characterize the relationships between genetic distance and ecological variables to identify factors with the greatest influence on gene flow at a local scale. Desert tortoises exhibit weak genetic structure at a local scale, and we identified two subpopulations across the study area. Although genetic differentiation between the subpopulations was low, our landscape genetic analysis identified both natural (slope) and anthropogenic (roads) landscape variables that have significantly influenced gene flow within this local population. We show that desert tortoise movements at a local scale are influenced by features of the landscape, and that these features are different than those that influence gene flow at larger scales. Our findings are important for desert tortoise conservation and management, particularly in light of recent translocation efforts in the region. 
More generally, our results indicate that recent landscape changes can affect gene flow at a local scale and that their effects can be detected almost immediately.

  5. Internal Cluster Validation on Earthquake Data in the Province of Bengkulu

    NASA Astrophysics Data System (ADS)

    Rini, D. S.; Novianti, P.; Fransiska, H.

    2018-04-01

    K-means is an algorithm that partitions n objects into k clusters (k < n) based on their attributes. A weakness of the algorithm is that the k initial centroids are chosen randomly before it runs, so repeated runs can produce different clusterings; if the random initialization is poor, the clustering is suboptimal. Cluster validation is a technique for determining the optimal clustering without prior information about the data. There are two types of cluster validation: internal and external. This study examines and applies several internal cluster validation indices, namely the Calinski-Harabasz (CH) index, Silhouette (S) index, Davies-Bouldin (DB) index, Dunn (D) index, and S-Dbw index, to earthquake data from Bengkulu Province. The CH, S, and S-Dbw indices yield an optimal k = 2, the DB index yields k = 6, and the Dunn index yields k = 15. The optimal clustering (k = 6) based on the DB index gives good results for clustering earthquakes in Bengkulu Province.
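    The k-means and Davies-Bouldin steps described above can be sketched in plain Python. This is a minimal illustration, not the authors' code: the toy data and the function names (`kmeans`, `davies_bouldin`, `maximin_init`) are hypothetical, and the seeding (a random first centre followed by farthest-point choices, used so the demo is repeatable) is an assumption standing in for purely random initialization.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two points stored as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(c) / len(points) for c in zip(*points))


def maximin_init(points, k, rng):
    """Random first centre, then repeatedly take the point farthest from all chosen centres."""
    centres = [rng.choice(points)]
    while len(centres) < k:
        centres.append(max(points, key=lambda p: min(dist(p, c) for c in centres)))
    return centres


def kmeans(points, k, seed=0, max_iter=100):
    """Lloyd's k-means: alternate nearest-centre assignment and centre update."""
    centres = maximin_init(points, k, random.Random(seed))
    for _ in range(max_iter):
        labels = [min(range(k), key=lambda j: dist(p, centres[j])) for p in points]
        new = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new.append(centroid(members) if members else centres[j])  # keep centre if empty
        if new == centres:
            break
        centres = new
    return labels, centres


def davies_bouldin(points, labels, centres):
    """Mean over clusters of the worst (S_i + S_j) / d(c_i, c_j) ratio; lower is better."""
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    ids = sorted(groups)
    # S_i: average distance of cluster i's members to its centre.
    scatter = {i: sum(dist(p, centres[i]) for p in groups[i]) / len(groups[i]) for i in ids}
    worst = [max((scatter[i] + scatter[j]) / dist(centres[i], centres[j])
                 for j in ids if j != i) for i in ids]
    return sum(worst) / len(ids)


# Hypothetical data: three tight blobs of four points each.
blobs = [(0, 0), (10, 0), (0, 10)]
data = [(cx + dx, cy + dy) for cx, cy in blobs for dx in (-0.5, 0.5) for dy in (-0.5, 0.5)]
scores = {k: davies_bouldin(data, *kmeans(data, k)) for k in range(2, 5)}
best_k = min(scores, key=scores.get)
```

    On these well-separated toy blobs the DB index is lowest at k = 3; on real data such as earthquake catalogues the different internal indices can disagree, as the study reports.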

  6. Addiction treatment dropout: exploring patients' characteristics.

    PubMed

    López-Goñi, José J; Fernández-Montalvo, Javier; Arteaga, Alfonso

    2012-01-01

    This study explored the characteristics associated with treatment dropout in substance dependence patients. A sample of 122 addicted patients (84 treatment completers and 38 treatment dropouts) who sought outpatient treatment was assessed to collect information on sociodemographic, consumption (assessed by EuropASI), psychopathological (assessed by SCL-90-R), and personality variables (assessed by MCMI-II). Completers and dropouts were compared on all studied variables. According to the results, dropouts scored significantly higher on the EuropASI variables measuring employment/support, alcohol consumption, and family/social problems, as well as on the schizotypal scale of the MCMI-II. Because most of the significant differences were found in EuropASI variables, three cluster analyses (with 2, 3, and 4 groups) based on EuropASI mean scores were carried out to determine clinically relevant information predicting dropout. The most relevant results were obtained when four groups were used. Comparisons between the four groups derived from cluster analysis showed statistically significant differences in the rate of dropout, with one group exhibiting the highest dropout rate. The distinctive characteristics of the group with the highest dropout rate included greater employment problems combined with high alcohol consumption. Furthermore, this group had the highest scores on three scales of the MCMI-II: phobic, dependent, and schizotypal. The implications of these results for further research and clinical practice are discussed. Copyright © American Academy of Addiction Psychiatry.

  7. Convex Regression with Interpretable Sharp Partitions

    PubMed Central

    Petersen, Ashley; Simon, Noah; Witten, Daniela

    2016-01-01

    We consider the problem of predicting an outcome variable on the basis of a small number of covariates, using an interpretable yet non-additive model. We propose convex regression with interpretable sharp partitions (CRISP) for this task. CRISP partitions the covariate space into blocks in a data-adaptive way, and fits a mean model within each block. Unlike other partitioning methods, CRISP is fit using a non-greedy approach by solving a convex optimization problem, resulting in low-variance fits. We explore the properties of CRISP, and evaluate its performance in a simulation study and on a housing price data set. PMID:27635120

  8. Intersecting surface defects and instanton partition functions

    NASA Astrophysics Data System (ADS)

    Pan, Yiwen; Peelaers, Wolfger

    2017-07-01

    We analyze intersecting surface defects inserted in interacting four-dimensional N=2 supersymmetric quantum field theories. We employ the realization of a class of such systems as the infrared fixed points of renormalization group flows from larger theories, triggered by perturbed Seiberg-Witten monopole-like configurations, to compute their partition functions. These results are cast into the form of a partition function of 4d/2d/0d coupled systems. Our computations provide concrete expressions for the instanton partition function in the presence of intersecting defects and we study the corresponding ADHM model.

  9. Cluster Analysis in Nursing Research: An Introduction, Historical Perspective, and Future Directions.

    PubMed

    Dunn, Heather; Quinn, Laurie; Corbridge, Susan J; Eldeirawi, Kamal; Kapella, Mary; Collins, Eileen G

    2017-05-01

    The use of cluster analysis in the nursing literature is limited to the creation of classifications of homogeneous groups and the discovery of new relationships. As such, it is important to provide clarity regarding its use and potential. The purpose of this article is to provide an introduction to distance-based, partitioning-based, and model-based cluster analysis methods commonly utilized in the nursing literature, provide a brief historical overview on the use of cluster analysis in nursing literature, and provide suggestions for future research. An electronic search included three bibliographic databases, PubMed, CINAHL and Web of Science. Key terms were cluster analysis and nursing. The use of cluster analysis in the nursing literature is increasing and expanding. The increased use of cluster analysis in the nursing literature is positioning this statistical method to result in insights that have the potential to change clinical practice.

  10. Metal/Silicate Partitioning of P, Ga, and W at High Pressures and Temperatures: Dependence on Silicate Melt Composition

    NASA Technical Reports Server (NTRS)

    Bailey, Edward; Drake, Michael J.

    2004-01-01

    The distinctive pattern of element concentrations in the upper mantle provides essential evidence in our attempts to understand the accretion and differentiation of the Earth (e.g., Drake and Righter, 2002; Jones and Drake, 1986; Righter et al., 1997; Wanke 1981). Core formation is best investigated through use of metal/silicate partition coefficients for siderophile elements. The variables influencing partition coefficients are temperature, pressure, the major element compositions of the silicate and metal phases, and oxygen fugacity. Examples of studies investigating the effects of these variables on partitioning behavior are: composition of the metal phase by Capobianco et al. (1999) and Righter et al. (1997); silicate melt composition by Watson (1976), Walter and Thibault (1995), Hillgren et al. (1996), Jana and Walker (1997), and Jaeger and Drake (2000); and oxygen fugacity by Capobianco et al. (1999), and Walter and Thibault (1995). Here we address the relative influences of silicate melt composition, pressure and temperature.

  11. Multiple Attribute Group Decision-Making Methods Based on Trapezoidal Fuzzy Two-Dimensional Linguistic Partitioned Bonferroni Mean Aggregation Operators.

    PubMed

    Yin, Kedong; Yang, Benshuo; Li, Xuemei

    2018-01-24

    In this paper, we investigate multiple attribute group decision making (MAGDM) problems where decision makers represent their evaluation of alternatives by trapezoidal fuzzy two-dimensional uncertain linguistic variables. To begin with, we introduce the definition, properties, expectation, and operational laws of trapezoidal fuzzy two-dimensional linguistic information. Then, to improve the accuracy of decision making in cases where there are interrelationships among the attributes, we analyze the partitioned Bonferroni mean (PBM) operator in a trapezoidal fuzzy two-dimensional variable environment and develop two operators: the trapezoidal fuzzy two-dimensional linguistic partitioned Bonferroni mean (TF2DLPBM) aggregation operator and the trapezoidal fuzzy two-dimensional linguistic weighted partitioned Bonferroni mean (TF2DLWPBM) aggregation operator. Furthermore, we develop a novel method to solve MAGDM problems based on the TF2DLWPBM aggregation operator. Finally, a practical example is presented to illustrate the effectiveness of this method and to analyze the impact of different parameters on the decision-making results.

  12. Multiple Attribute Group Decision-Making Methods Based on Trapezoidal Fuzzy Two-Dimensional Linguistic Partitioned Bonferroni Mean Aggregation Operators

    PubMed Central

    Yin, Kedong; Yang, Benshuo

    2018-01-01

    In this paper, we investigate multiple attribute group decision making (MAGDM) problems where decision makers represent their evaluation of alternatives by trapezoidal fuzzy two-dimensional uncertain linguistic variables. To begin with, we introduce the definition, properties, expectation, and operational laws of trapezoidal fuzzy two-dimensional linguistic information. Then, to improve the accuracy of decision making in cases where there are interrelationships among the attributes, we analyze the partitioned Bonferroni mean (PBM) operator in a trapezoidal fuzzy two-dimensional variable environment and develop two operators: the trapezoidal fuzzy two-dimensional linguistic partitioned Bonferroni mean (TF2DLPBM) aggregation operator and the trapezoidal fuzzy two-dimensional linguistic weighted partitioned Bonferroni mean (TF2DLWPBM) aggregation operator. Furthermore, we develop a novel method to solve MAGDM problems based on the TF2DLWPBM aggregation operator. Finally, a practical example is presented to illustrate the effectiveness of this method and to analyze the impact of different parameters on the decision-making results. PMID:29364849

  13. Distribution of Diverse Escherichia coli between Cattle and Pasture.

    PubMed

    NandaKafle, Gitanjali; Seale, Tarren; Flint, Toby; Nepal, Madhav; Venter, Stephanus N; Brözel, Volker S

    2017-09-27

    Escherichia coli is widely considered not to survive for extended periods outside the intestines of warm-blooded animals; however, recent studies demonstrated that E. coli strains maintain populations in soil and water without any known fecal contamination. The objective of this study was to investigate whether niche partitioning of E. coli occurs between cattle and their pasture. We attempted to clarify whether E. coli from bovine feces differs phenotypically and genotypically from isolates maintaining a population in pasture soil over winter. Soil, bovine fecal, and run-off samples were collected before and after the introduction of cattle to the pasture. Isolates (363) were genotyped by uidA and mutS sequences and phylogrouping, and evaluated for curli formation (rough, dry, and red, or RDAR). Three types of clusters emerged: bovine-associated clusters, clusters devoid of cattle isolates and representing isolates endemic to the pasture environment, and clusters with both. All isolates clustered with strains of E. coli sensu stricto, distinct from the cryptic species Clades I, III, IV, and V. Pasture soil endemic and bovine fecal populations had very different phylogroup distributions, indicating niche partitioning. The soil endemic population consisted largely of phylogroup B1 and had a higher average RDAR score than other isolates. These results indicate the existence of environmental E. coli strains that are phylogenetically distinct from bovine fecal isolates, and that have the ability to maintain populations in the soil environment.

  14. Phylodynamic Analysis Reveals CRF01_AE Dissemination between Japan and Neighboring Asian Countries and the Role of Intravenous Drug Use in Transmission

    PubMed Central

    Shiino, Teiichiro; Hattori, Junko; Yokomaku, Yoshiyuki; Iwatani, Yasumasa; Sugiura, Wataru

    2014-01-01

    Background One major circulating HIV-1 subtype in Southeast Asian countries is CRF01_AE, but little is known about its epidemiology in Japan. We conducted a molecular phylodynamic study of patients newly diagnosed with CRF01_AE from 2003 to 2010. Methods Plasma samples from patients registered in the Japanese Drug Resistance HIV-1 Surveillance Network were analyzed for protease-reverse transcriptase sequences; all sequences underwent subtyping and phylogenetic analysis using distance-matrix-based, maximum likelihood and Bayesian coalescent Markov Chain Monte Carlo (MCMC) phylogenetic inference. Transmission clusters were identified using the interior branch test and depth-first searches for sub-tree partitions. Times of the most recent common ancestor (tMRCAs) of significant clusters were estimated using Bayesian MCMC analysis. Results Among 3618 patients registered in our network, 243 were infected with CRF01_AE. The majority of individuals with CRF01_AE were Japanese, predominantly male, and reported heterosexual contact as their risk factor. We found 5 large clusters with ≥5 members and 25 small clusters consisting of pairs of individuals with highly related CRF01_AE strains. The earliest cluster showed a tMRCA of 1996 and consisted of individuals whose known risk was heterosexual contact. The other four large clusters showed later tMRCAs, between 2000 and 2002, with members including intravenous drug users (IVDU) and non-Japanese individuals, but not men who have sex with men (MSM). In contrast, small clusters included a high frequency of individuals reporting MSM risk factors. Phylogenetic analysis also showed that some individuals were infected with HIV strains that had spread in East and Southeast Asian countries. Conclusions Introduction of CRF01_AE viruses into Japan is estimated to have occurred in the 1990s. CRF01_AE spread via heterosexual behavior, then among persons connected with non-Japanese individuals, IVDU, and MSM.
Phylogenetic analysis demonstrated that some viral variants are largely restricted to Japan, while others have a broad geographic distribution. PMID:25025900
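    The idea of a depth-first search for well-supported sub-tree partitions can be sketched as follows. This is a hypothetical illustration, not the study's pipeline: the `Node` class, its `support` field (standing in for the outcome of an interior branch test), the 0.95 threshold and the rule of reporting only the largest supported clade on each path are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Tree node; `support` is a hypothetical branch-support value for the branch above it."""
    name: str = ""
    support: float = 0.0
    children: list = field(default_factory=list)


def leaves(node):
    """All tip names below a node."""
    if not node.children:
        return [node.name]
    return [tip for child in node.children for tip in leaves(child)]


def transmission_clusters(root, threshold=0.95, min_size=2):
    """Depth-first search returning maximal well-supported subtrees as clusters."""
    clusters = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node.children and node is not root and node.support >= threshold:
            tips = leaves(node)
            if len(tips) >= min_size:
                clusters.append(sorted(tips))
                continue  # maximal clade found: do not split it further
        stack.extend(reversed(node.children))  # reversed so children are visited left to right
    return clusters


# Hypothetical tree: one supported clade at the top, one nested inside an unsupported clade.
tree = Node(children=[
    Node(support=0.99, children=[Node("a1"), Node("a2"),
                                 Node(support=0.97, children=[Node("b1"), Node("b2")])]),
    Node("c"),
    Node(support=0.50, children=[Node("d1"),
                                 Node(support=0.96, children=[Node("e1"), Node("e2")])]),
])
clusters = transmission_clusters(tree, threshold=0.95)
```

    Stopping at the first supported clade on each path yields non-overlapping clusters; a real analysis would also apply criteria such as the pairwise-distance cutoffs used to define small clusters.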

  15. Analysis of heart rate variability signal in meditation using second-order difference plot

    NASA Astrophysics Data System (ADS)

    Goswami, Damodar Prasad; Tibarewala, Dewaki Nandan; Bhattacharya, Dilip Kumar

    2011-06-01

    In this article, the heart rate variability signals of subjects practising different types of meditation are investigated to find the underlying similarities among them and how they differ from the non-meditative condition. Four groups of subjects, each practising a different meditation technique, are involved. The data were obtained from PhysioNet and also collected with our own ECG machine. For data analysis, the second-order difference plot is applied. Each second-order difference plot forms a single cluster that is nearly elliptical in shape, apart from some outliers. During meditation, the axis of the elliptical cluster rotates anticlockwise relative to the cluster formed from the premeditation data, although the amount of rotation differs between cases. This study reveals definite and specific changes in the heart rate variability of the subjects during meditation. All four groups of subjects followed different procedures, but surprisingly the resulting physiological effect is, to some extent, the same. This indicates some commonality among all the meditative techniques despite their apparent dissimilarity, and it may be hoped that each of them leads to the same result, as preached by the masters of meditation. The study shows that the meditative state has a completely different physiology and that it can be achieved by any of the meditation techniques we observed. Possible uses of this tool in clinical settings, such as stress management and the treatment of hypertension, are also mentioned.
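    A second-order difference plot pairs each successive difference of a series with the next one, (x[n+1] - x[n], x[n+2] - x[n+1]). The sketch below builds those points and, as an assumption not stated in the abstract, quantifies the orientation of the elliptical cluster by the major-axis angle of its 2x2 covariance matrix; comparing this angle before and during meditation is one possible way to measure the reported anticlockwise rotation. The function names and the RR values are hypothetical.

```python
import math


def sodp(rr):
    """Second-order difference plot points: (x[n+1]-x[n], x[n+2]-x[n+1])."""
    d = [b - a for a, b in zip(rr, rr[1:])]
    return list(zip(d, d[1:]))


def major_axis_angle_deg(points):
    """Major-axis orientation of the scatter, in degrees from the x-axis (about -90 to 90)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    # For covariance [[sxx, sxy], [sxy, syy]], tan(2*theta) = 2*sxy / (sxx - syy).
    return math.degrees(0.5 * math.atan2(2 * sxy, sxx - syy))


rr = [0.80, 0.82, 0.79, 0.83, 0.78, 0.84]  # hypothetical RR intervals in seconds
print(round(major_axis_angle_deg(sodp(rr)), 1))
```

    A positive change in this angle between two recordings corresponds to an anticlockwise rotation of the cluster's major axis.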

  16. A Novel Information-Theoretic Approach for Variable Clustering and Predictive Modeling Using Dirichlet Process Mixtures

    PubMed Central

    Chen, Yun; Yang, Hui

    2016-01-01

    In the era of big data, there is increasing interest in clustering variables to minimize data redundancy and maximize variable relevance. Existing clustering methods, however, depend on nontrivial assumptions about the data structure. Moreover, nonlinear interdependence among variables poses significant challenges to the traditional framework of predictive modeling. In the present work, we reformulate the problem of variable clustering from an information-theoretic perspective that does not require assumptions about the data structure to identify nonlinear interdependence among variables. Specifically, we propose the use of mutual information to characterize and measure nonlinear correlation structures among variables. Further, we develop Dirichlet process (DP) models to cluster variables based on the mutual-information measures among variables. Finally, orthonormalized variables in each cluster are integrated with a group elastic-net model to improve the performance of predictive modeling. Both simulation and real-world case studies showed that the proposed methodology not only effectively reveals the nonlinear interdependence structures among variables but also outperforms traditional variable clustering algorithms such as hierarchical clustering. PMID:27966581
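    The key ingredient of this approach, mutual information as a measure of nonlinear dependence, can be illustrated with a plug-in estimator for discrete data. This is a hedged sketch: `mutual_information` is a hypothetical helper, the paper's estimator for continuous variables is not reproduced, and the Dirichlet process clustering and group elastic-net steps are omitted.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in mutual information (in bits) between two equal-length discrete sequences."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x,y) * log2(p(x,y) / (p(x) p(y))), written with raw counts.
        mi += c / n * math.log2(c * n / (px[x] * py[y]))
    return mi


x = [-1, 0, 1, -1, 0, 1]
y = [v * v for v in x]  # y = x^2: zero linear correlation, yet y is fully determined by x
print(mutual_information(x, y))
```

    Here the Pearson correlation between x and y is exactly zero, while the mutual information equals the entropy of y (about 0.918 bits), which is the kind of nonlinear dependence the abstract targets.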

  17. A Novel Information-Theoretic Approach for Variable Clustering and Predictive Modeling Using Dirichlet Process Mixtures.

    PubMed

    Chen, Yun; Yang, Hui

    2016-12-14

    In the era of big data, there is increasing interest in clustering variables to minimize data redundancy and maximize variable relevance. Existing clustering methods, however, depend on nontrivial assumptions about the data structure. Moreover, nonlinear interdependence among variables poses significant challenges to the traditional framework of predictive modeling. In the present work, we reformulate the problem of variable clustering from an information-theoretic perspective that does not require assumptions about the data structure to identify nonlinear interdependence among variables. Specifically, we propose the use of mutual information to characterize and measure nonlinear correlation structures among variables. Further, we develop Dirichlet process (DP) models to cluster variables based on the mutual-information measures among variables. Finally, orthonormalized variables in each cluster are integrated with a group elastic-net model to improve the performance of predictive modeling. Both simulation and real-world case studies showed that the proposed methodology not only effectively reveals the nonlinear interdependence structures among variables but also outperforms traditional variable clustering algorithms such as hierarchical clustering.

  18. Comparison in partition efficiency of protein separation between four different tubing modifications in spiral high-speed countercurrent chromatography

    PubMed Central

    Ito, Yoichiro; Clary, Robert

    2016-01-01

    High-speed countercurrent chromatography with a spiral tube assembly can retain a satisfactory amount of the stationary phase of polymer phase systems used for protein separation. To improve partition efficiency, a simple tool for modifying the tubing shape was fabricated, and four different tubing modifications were made: intermittently pressed at 10 mm width, flat, flat-wave, and flat-twist. The partition efficiencies of separation columns made from these modified tubings were examined in protein separation with an aqueous-aqueous polymer phase system at flow rates of 1–2 ml/min and 800 rpm. The results indicated that all the columns with modified tubing improved partition efficiency at a flow rate of 1 ml/min, but at a higher flow rate of 2 ml/min the columns made of flattened tubing showed lower partition efficiency, apparently due to loss of the retained stationary phase. Among all the modified columns, the column with intermittently pressed tubing gave the best peak resolution. It may be concluded that the intermittently pressed and flat-twist tubings improve partition efficiency in semi-preparative separation, while the flat and flat-wave configurations may be used for analytical separations at low flow rates. PMID:27790621

  19. Comparison in partition efficiency of protein separation between four different tubing modifications in spiral high-speed countercurrent chromatography.

    PubMed

    Ito, Yoichiro; Clary, Robert

    2016-12-01

    High-speed countercurrent chromatography with a spiral tube assembly can retain a satisfactory amount of the stationary phase of polymer phase systems used for protein separation. To improve the partition efficiency, a simple tool to modify the tubing shape was fabricated, and the following four tubing modifications were made: intermittently pressed at 10 mm width, flat, flat-wave, and flat-twist. The partition efficiencies of separation columns made from this modified tubing were examined in protein separation with an aqueous-aqueous polymer phase system at flow rates of 1-2 ml/min under 800 rpm. The results indicated that all columns with modified tubing improved the partition efficiency at a flow rate of 1 ml/min, but at a higher flow rate of 2 ml/min the columns made of flattened tubing showed lowered partition efficiency, apparently due to the loss of retained stationary phase. Among all the modified columns, the column with intermittently pressed tubing gave the best peak resolution. It may be concluded that intermittently pressed and flat-twist tubing improve the partition efficiency in semi-preparative separations, while the flat and flat-wave configurations may be used for analytical separations at a low flow rate.
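    The records compare the modified columns by peak resolution. As a minimal sketch, assuming the conventional chromatographic definition Rs = 2(t2 − t1)/(w1 + w2), with retention times and baseline peak widths in the same units (the records do not state which definition the study used), resolution between two adjacent protein peaks can be computed as follows; the function name is hypothetical.

```python
def peak_resolution(t1, t2, w1, w2):
    """Resolution between two adjacent peaks.

    t1, t2: retention times of the earlier and the later peak
    w1, w2: baseline peak widths, in the same time (or volume) units
    """
    if w1 + w2 <= 0:
        raise ValueError("peak widths must be positive")
    # Conventional definition: peak spacing divided by the mean baseline width.
    return 2.0 * (t2 - t1) / (w1 + w2)


# Rs >= 1.5 is commonly taken to indicate baseline separation.
print(peak_resolution(10.0, 12.0, 1.0, 1.0))  # 2.0
```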

  20. Genetic and Metabolite Diversity of Sardinian Populations of Helichrysum italicum

    PubMed Central

    Melito, Sara; Sias, Angela; Petretto, Giacomo L.; Chessa, Mario; Pintore, Giorgio; Porceddu, Andrea

    2013-01-01

    Background Helichrysum italicum (Asteraceae) is a small shrub endemic to the Mediterranean Basin, growing in fragmented and diverse habitats. The species has attracted attention due to its secondary metabolite content, but little effort has as yet been dedicated to assessing the genetic and metabolite diversity present in these populations. Here, we describe the diversity of 50 H. italicum populations collected from a range of habitats in Sardinia. Methods H. italicum plants were AFLP fingerprinted and the composition of their leaf essential oil characterized by GC-MS. The relationships between the genetic structure of the populations, soil, habitat and climatic variables and the essential oil chemotypes present were evaluated using Bayesian clustering, contingency analyses and AMOVA. Key results The Sardinian germplasm could be partitioned into two AFLP-based clades. Populations collected from the southwestern region constituted a homogeneous group which remained virtually intact even at high levels of K. The second, much larger clade was more diverse. A positive correlation between genetic diversity and elevation suggested the action of natural purifying selection. Four main classes of compounds were identified among the essential oils, namely monoterpenes, oxygenated monoterpenes, sesquiterpenes and oxygenated sesquiterpenes. Oxygenated monoterpene levels were significantly correlated with the AFLP-based clade structure, suggesting a correspondence between gene pool and chemical diversity. Conclusions The results suggest an association between chemotype, genetic diversity and collection location which is relevant for the planning of future collections aimed at identifying valuable sources of essential oil. PMID:24260149

  1. Genetic characterization of Uruguayan Pampa Rocha pigs with microsatellite markers

    PubMed Central

    Montenegro, M; Llambí, S; Castro, G; Barlocco, N; Vadell, A; Landi, V; Delgado, JV; Martínez, A

    2015-01-01

    In this study, we genetically characterized the Uruguayan pig breed Pampa Rocha. Genetic variability was assessed by analyzing a panel of 25 microsatellite markers in a sample of 39 individuals. Pampa Rocha pigs showed high genetic variability, with observed and expected heterozygosities of 0.583 and 0.603, respectively. The mean number of alleles was 5.72. Twenty-four markers were polymorphic, with 95.8% of them in Hardy-Weinberg equilibrium. The level of inbreeding was low (FIS = 0.0475). A factorial analysis of correspondence was used to assess the genetic differences between Pampa Rocha and other pig breeds; genetic distances were calculated, and a tree was constructed to reflect the distance matrix. Individuals were also allocated to clusters. This analysis showed that the Pampa Rocha breed was separated from the other breeds along the first and second axes. The neighbour-joining tree generated from the DA genetic distances showed clustering of Pampa Rocha with the Meishan breed. The allocation of individuals to clusters showed a clear separation of Pampa Rocha pigs. These results provide insights into the genetic variability of Pampa Rocha pigs and indicate that this breed is a well-defined genetic entity. PMID:25983624
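    The abstract reports observed heterozygosity (Ho), expected heterozygosity (He) and FIS. The textbook single-locus definitions are: Ho is the fraction of heterozygous individuals, He = 1 − Σ p_i² over allele frequencies p_i, and FIS = 1 − Ho/He. A minimal sketch under those definitions follows; the function names are hypothetical, and because the study summarizes 25 loci and may use sample-size-corrected estimators, this sketch is not expected to reproduce its reported values.

```python
from collections import Counter


def observed_heterozygosity(genotypes):
    # Fraction of diploid individuals carrying two different alleles at the locus.
    return sum(1 for a, b in genotypes if a != b) / len(genotypes)


def expected_heterozygosity(genotypes):
    # He = 1 - sum(p_i^2), with p_i the sample allele frequencies (no small-sample correction).
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    n_alleles = sum(counts.values())
    return 1.0 - sum((c / n_alleles) ** 2 for c in counts.values())


def f_is(genotypes):
    # Inbreeding coefficient FIS = 1 - Ho/He; positive values indicate a heterozygote deficit.
    he = expected_heterozygosity(genotypes)
    if he == 0.0:
        raise ValueError("monomorphic locus: FIS is undefined")
    return 1.0 - observed_heterozygosity(genotypes) / he


# One locus, four individuals, alleles labelled 1 and 2.
sample = [(1, 1), (1, 2), (2, 2), (1, 2)]
print(observed_heterozygosity(sample), expected_heterozygosity(sample), f_is(sample))  # 0.5 0.5 0.0
```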

  2. Correlation Study of Physics Achievement, Learning Strategy, Attitude and Gender in an Introductory Physics Course

    ERIC Educational Resources Information Center

    Sezgin Selcuk, Gamze

    2010-01-01

    This study investigates the relationship between multiple predictors of physics achievement including reported use of four learning strategy clusters (elaboration, organization, comprehension monitoring and rehearsal), attitudes towards physics (sense of care and sense of interest) and a demographic variable (gender) in order to determine the…

  3. The comparison of automated clustering algorithms for resampling representative conformer ensembles with RMSD matrix.

    PubMed

    Kim, Hyoungrae; Jang, Cheongyun; Yadav, Dharmendra K; Kim, Mi-Hyun

    2017-03-23

    The accuracy of any 3D-QSAR, pharmacophore, or 3D-similarity-based chemometric target-fishing model is highly dependent on a reasonable sample of active conformations. Although a number of diverse conformational sampling algorithms exist that exhaustively generate enough conformers, model-building methods rely on an explicit number of common conformers. In this work, we have attempted to develop clustering algorithms that can automatically find a reasonable number of representative conformer ensembles from an asymmetric dissimilarity matrix generated with the OpenEye toolkit. RMSD was the key descriptor (variable) in each column of the N × N matrix, whose columns are considered as N variables describing the relationship (network) between the conformer (in a row) and the other N conformers. This approach was used to evaluate the performance of well-known clustering algorithms by comparing them in terms of generating representative conformer ensembles, and to test them over different matrix transformation functions with respect to stability. In the network, the representative conformer group could be resampled by four kinds of algorithms with implicit parameters. The directed dissimilarity matrix is the only input to the clustering algorithms. The Dunn index, Davies-Bouldin index, eta-squared values, and omega-squared values were used to evaluate the clustering algorithms with respect to compactness and explanatory power. The evaluation also covers the reduction (abstraction) rate of the data, the correlation between the sizes of the population and the samples, the computational complexity, and the memory usage. Every algorithm could find representative conformers automatically without any user intervention, and they reduced the data to 14-19% of the original values within at most 1.13 s per sample. The clustering methods are simple and practical, as they are fast and do not require any explicit parameters. RCDTC presented the maximum Dunn and omega-squared values of the four algorithms, in addition to a consistent reduction rate between the population size and the sample size. The performance of the clustering algorithms was consistent over different transformation functions. Moreover, the clustering method can also be applied to the results of molecular dynamics sampling simulations.
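    Among the validation criteria listed, the Dunn index is the ratio of the smallest between-cluster distance to the largest within-cluster diameter, so higher values indicate compact, well-separated clusters. A minimal sketch that computes it from a precomputed dissimilarity matrix follows. It assumes a symmetric matrix; the study's RMSD matrix is asymmetric and would first need symmetrizing (for example by averaging the two directions, an assumption not taken from the source). The function name is hypothetical.

```python
from itertools import combinations


def dunn_index(dist, labels):
    """Dunn index from a symmetric n x n dissimilarity matrix and per-item cluster labels."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")
    # Largest within-cluster diameter (maximum pairwise distance inside any cluster).
    max_diameter = max(
        (dist[i][j] for members in clusters.values() for i, j in combinations(members, 2)),
        default=0.0,
    )
    # Smallest between-cluster separation (closest pair of items in different clusters).
    min_separation = min(
        dist[i][j]
        for a, b in combinations(list(clusters.values()), 2)
        for i in a
        for j in b
    )
    if max_diameter == 0.0:
        return float("inf")  # all clusters are single points or zero-width
    return min_separation / max_diameter


# Four items on a line; the natural split {0, 1} vs {10, 11} scores high.
positions = [0.0, 1.0, 10.0, 11.0]
dist = [[abs(p - q) for q in positions] for p in positions]
print(dunn_index(dist, [0, 0, 1, 1]))  # 9.0
```

    Swapping the labels to interleave the clusters ([0, 1, 0, 1]) drops the score to 0.1, which is the behavior one would want from a compactness-and-separation criterion.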

  4. Start codon targeted (SCoT) and target region amplification polymorphism (TRAP) for evaluating the genetic relationship of Dendrobium species.

    PubMed

    Feng, Shangguo; He, Refeng; Yang, Sai; Chen, Zhe; Jiang, Mengying; Lu, Jiangjie; Wang, Huizhong

    2015-08-10

    Two molecular marker systems, start codon targeted (SCoT) and target region amplification polymorphism (TRAP), were used for genetic relationship analysis of 36 Dendrobium species collected from China. Twenty-two selected SCoT primers produced 337 loci, of which 324 (96%) were polymorphic, whereas 13 TRAP primer combinations produced a total of 510 loci, with 500 (97.8%) of them being polymorphic. Average polymorphism information content values of 0.953 and 0.983 were detected using the SCoT and TRAP primers, respectively, showing that a high degree of genetic diversity exists among Chinese Dendrobium species. The clusters in the unweighted pair group method with arithmetic mean (UPGMA) dendrogram and in the principal coordinate analysis plot based on the SCoT and TRAP markers were similar and grouped the 36 Dendrobium species into four main groups. Our results will provide useful information for resource protection and will also be useful for improving current Dendrobium breeding programs. Our results also demonstrate that SCoT and TRAP markers are informative and can be used to evaluate genetic relationships between Dendrobium species. Copyright © 2015 Elsevier B.V. All rights reserved.

  5. Clustering of Variables for Mixed Data

    NASA Astrophysics Data System (ADS)

    Saracco, J.; Chavent, M.

    2016-05-01

    This chapter presents clustering of variables, whose aim is to lump together strongly related variables. The proposed approach works on a mixed data set, i.e., a data set that contains both numerical and categorical variables. Two algorithms for clustering variables are described: a hierarchical clustering and a k-means-type clustering. A brief description of the PCAmix method (a principal component analysis for mixed data) is provided, since the computation of the synthetic variables summarizing the obtained clusters of variables is based on this multivariate method. Finally, the R packages ClustOfVar and PCAmixdata are illustrated on real mixed data. The PCAmix and ClustOfVar approaches are first used for dimension reduction (step 1) before a standard clustering method is applied in step 2 to obtain groups of individuals.

  6. Lagrangian based methods for coherent structure detection

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Allshouse, Michael R., E-mail: mallshouse@chaos.utexas.edu; Peacock, Thomas, E-mail: tomp@mit.edu

    There has been a proliferation in the development of Lagrangian analytical methods for detecting coherent structures in fluid flow transport, yielding a variety of qualitatively different approaches. We present a review of four approaches and demonstrate the utility of these methods via their application to the same sample analytic model, the canonical double-gyre flow, highlighting the pros and cons of each approach. Two of the methods, the geometric and probabilistic approaches, are well established and require velocity field data over the time interval of interest to identify particularly important material lines and surfaces, and influential regions, respectively. The other two approaches, implementing tools from cluster and braid theory, seek coherent structures based on limited trajectory data, attempting to partition the flow transport into distinct regions. All four of these approaches share the common trait that they are objective methods, meaning that their results do not depend on the frame of reference used. For each method, we also present a number of example applications ranging from blood flow and chemical reactions to ocean and atmospheric flows.

  7. ClusterViz: A Cytoscape APP for Cluster Analysis of Biological Network.

    PubMed

    Wang, Jianxin; Zhong, Jiancheng; Chen, Gang; Li, Min; Wu, Fang-xiang; Pan, Yi

    2015-01-01

    Cluster analysis of biological networks is one of the most important approaches for identifying functional modules and predicting protein functions. Furthermore, visualization of clustering results is crucial to uncover the structure of biological networks. In this paper, we present ClusterViz, a Cytoscape 3 app for cluster analysis and visualization. To reduce complexity and enable extensibility, we designed the architecture of ClusterViz based on the Open Services Gateway Initiative (OSGi) framework. Following this architecture, the implementation of ClusterViz is partitioned into three modules: the ClusterViz interface, the clustering algorithms, and visualization and export. ClusterViz facilitates comparison of the results of different algorithms for further analysis. Three commonly used clustering algorithms, FAG-EC, EAGLE and MCODE, are included in the current version. Because the clustering-algorithm module uses an abstract algorithm interface, more clustering algorithms can be added in the future. To illustrate the usability of ClusterViz, we provide three examples, with detailed steps, drawn from important scientific articles; these show that our tool has helped several research teams study the mechanisms of biological networks.

  8. Electron correlation in the interacting quantum atoms partition via coupled-cluster lagrangian densities.

    PubMed

    Holguín-Gallego, Fernando José; Chávez-Calvillo, Rodrigo; García-Revilla, Marco; Francisco, Evelio; Pendás, Ángel Martín; Rocha-Rinza, Tomás

    2016-07-15

    The electronic energy partition established by the Interacting Quantum Atoms (IQA) approach is an important method of wavefunction analysis that has yielded valuable insights into different phenomena in physical chemistry. Most IQA applications have relied upon approximations that include either no dynamical correlation (DC), such as Hartree-Fock (HF), or no external DC, such as CASSCF theory. Recently, DC was included in the IQA method by means of HF/Coupled-Cluster (CC) transition densities (Chávez-Calvillo et al., Comput. Theory Chem. 2015, 1053, 90). Despite the potential utility of this approach, it has a few drawbacks; for example, it is not consistent with the calculation of CC properties other than the total electronic energy. To improve this situation, we have implemented the IQA energy partition based on CC Lagrangian one- and two-electron orbital density matrices. The development presented in this article is tested and illustrated with the H2, LiH, H2O, H2S, N2, and CO molecules, for which the IQA results obtained under the consideration of (i) the CC Lagrangian, (ii) HF/CC transition densities, and (iii) HF are critically analyzed and compared. Additionally, the effect of DC on the different components of the electronic energy in the formation of the T-shaped (H2)2 van der Waals cluster and in the bimolecular nucleophilic substitution between F⁻ and CH3F is examined. We anticipate that the approach put forward in this article will provide new understanding of subjects in physical chemistry in which DC plays a crucial role, such as molecular interactions, chemical bonding, and reactivity. © 2016 Wiley Periodicals, Inc.

  9. Community structure and function of planktonic Crenarchaeota: changes with depth in the South China Sea.

    PubMed

    Hu, Anyi; Jiao, Nianzhi; Zhang, Chuanlun L

    2011-10-01

    Marine Crenarchaeota represent a widespread and abundant microbial group in marine ecosystems. Here, we investigated the abundance, diversity, and distribution of planktonic Crenarchaeota in the epi-, meso-, and bathypelagic zones at three stations in the South China Sea (SCS) by analysis of crenarchaeal 16S rRNA gene, ammonia monooxygenase gene amoA involved in ammonia oxidation, and biotin carboxylase gene accA putatively involved in archaeal CO2 fixation. Quantitative PCR analyses indicated that crenarchaeal amoA and accA gene abundances varied similarly with archaeal and crenarchaeal 16S rRNA gene abundances at all stations, except that crenarchaeal accA genes were almost absent in the epipelagic zone. Ratios of the crenarchaeal amoA gene to 16S rRNA gene abundances decreased ~2.6 times from the epi- to bathypelagic zones, whereas the ratios of crenarchaeal accA gene to marine group I crenarchaeal 16S rRNA gene or to crenarchaeal amoA gene abundances increased with depth, suggesting that the metabolism of Crenarchaeota may change from the epi- to meso- or bathypelagic zones. Denaturing gradient gel electrophoresis profiling of the 16S rRNA genes revealed depth partitioning in archaeal community structures. Clone libraries of crenarchaeal amoA and accA genes showed two clusters: the "shallow" cluster was exclusively derived from epipelagic water and the "deep" cluster was from meso- and/or bathypelagic waters, suggesting that niche partitioning may take place between the shallow and deep marine Crenarchaeota. Overall, our results show strong depth partitioning of crenarchaeal populations in the SCS and suggest a shift in their community structure and ecological function with increasing depth.

  10. Image-driven Population Analysis through Mixture Modeling

    PubMed Central

    Sabuncu, Mert R.; Balci, Serdar K.; Shenton, Martha E.; Golland, Polina

    2009-01-01

    We present iCluster, a fast and efficient algorithm that clusters a set of images while co-registering them using a parameterized, nonlinear transformation model. The output of the algorithm is a small number of template images that represent different modes in a population. This is in contrast with traditional, hypothesis-driven computational anatomy approaches that assume a single template to construct an atlas. We derive the algorithm based on a generative model of an image population as a mixture of deformable template images. We validate and explore our method in four experiments. In the first experiment, we use synthetic data to explore the behavior of the algorithm and inform a design choice on parameter settings. In the second experiment, we demonstrate the utility of having multiple atlases for the application of localizing temporal lobe brain structures in a pool of subjects that contains healthy controls and schizophrenia patients. Next, we employ iCluster to partition a data set of 415 whole brain MR volumes of subjects aged 18 through 96 years into three anatomical subgroups. Our analysis suggests that these subgroups mainly correspond to age groups. The templates reveal significant structural differences across these age groups that confirm previous findings in aging research. In the final experiment, we run iCluster on a group of 15 patients with dementia and 15 age-matched healthy controls. The algorithm produces two modes, one of which contains dementia patients only. These results suggest that the algorithm can be used to discover sub-populations that correspond to interesting structural or functional “modes.” PMID:19336293

  11. Zodiacal Exoplanets in Time (ZEIT). V. A Uniform Search for Transiting Planets in Young Clusters Observed by K2

    NASA Astrophysics Data System (ADS)

    Rizzuto, Aaron C.; Mann, Andrew W.; Vanderburg, Andrew; Kraus, Adam L.; Covey, Kevin R.

    2017-12-01

    Detection of transiting exoplanets around young stars is more difficult than for older systems owing to increased stellar variability. Nine young open cluster planets have been found in the K2 data, but no single analysis pipeline identified all planets. We have developed a transit search pipeline for young stars that uses a transit-shaped notch and quadratic continuum in a 12 or 24 hr window to fit both the stellar variability and the presence of a transit. In addition, for the most rapid rotators (P_rot < 2 days) we model the variability using a linear combination of observed rotations of each star. To maximally exploit our new pipeline, we update the membership for four stellar populations observed by K2 (Upper Scorpius, Pleiades, Hyades, Praesepe) and conduct a uniform search of the members. We identify all known transiting exoplanets in the clusters, 17 eclipsing binaries, one transiting planet candidate orbiting a potential Pleiades member, and three orbiting unlikely members of the young clusters. Limited injection recovery testing on the known planet hosts indicates that for the older Praesepe systems we are sensitive to additional exoplanets as small as 1-2 R⊕, and for the larger Upper Scorpius planet host (K2-33) our pipeline is sensitive to ~4 R⊕ transiting planets. The lack of detected multiple systems in the young clusters is consistent with the expected frequency from the original Kepler sample, within our detection limits. With a robust pipeline that detects all known planets in the young clusters, occurrence rate testing at young ages is now possible.

  12. Measures of clustering and heterogeneity in multilevel Poisson regression analyses of rates/count data

    PubMed Central

    Austin, Peter C.; Stryhn, Henrik; Leckie, George; Merlo, Juan

    2017-01-01

    Multilevel data occur frequently in many research areas like health services research and epidemiology. A suitable way to analyze such data is through the use of multilevel regression models. These models incorporate cluster‐specific random effects that allow one to partition the total variation in the outcome into between‐cluster variation and between‐individual variation. The magnitude of the effect of clustering provides a measure of the general contextual effect. When outcomes are binary or time‐to‐event in nature, the general contextual effect can be quantified by measures of heterogeneity like the median odds ratio or the median hazard ratio, respectively, which can be calculated from a multilevel regression model. Outcomes that are integer counts denoting the number of times that an event occurred are common in epidemiological and medical research. The median (incidence) rate ratio in multilevel Poisson regression for counts that corresponds to the median odds ratio or median hazard ratio for binary or time‐to‐event outcomes respectively is relatively unknown and is rarely used. The median rate ratio is the median relative change in the rate of the occurrence of the event when comparing identical subjects from 2 randomly selected different clusters that are ordered by rate. We also describe how the variance partition coefficient, which denotes the proportion of the variation in the outcome that is attributable to between‐cluster differences, can be computed with count outcomes. We illustrate the application and interpretation of these measures in a case study analyzing the rate of hospital readmission in patients discharged from hospital with a diagnosis of heart failure. PMID:29114926
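    For a multilevel Poisson model with a normally distributed cluster random intercept of variance σ² on the log-rate scale, the median rate ratio takes the same form as the median odds ratio: MRR = exp(√(2σ²) · Φ⁻¹(0.75)), where Φ⁻¹ is the standard normal quantile function. A minimal sketch follows, assuming that random-intercept specification; the function name is hypothetical, and the variance partition coefficient for counts, which the abstract also mentions, is not covered.

```python
import math
from statistics import NormalDist


def median_rate_ratio(sigma2):
    """Median rate ratio from the random-intercept variance sigma2 (log-rate scale).

    Assumes a Poisson model with normally distributed cluster intercepts.
    """
    if sigma2 < 0:
        raise ValueError("variance must be non-negative")
    # The difference D between two independent cluster intercepts is N(0, 2 * sigma2);
    # the median of |D| is sqrt(2 * sigma2) * Phi^{-1}(0.75), and MRR = exp(median |D|).
    z75 = NormalDist().inv_cdf(0.75)  # about 0.6745
    return math.exp(math.sqrt(2.0 * sigma2) * z75)


print(median_rate_ratio(0.0))            # 1.0: no between-cluster variation
print(round(median_rate_ratio(0.5), 3))  # 1.963
```

    An MRR of 1 means that the cluster a subject belongs to carries no information about the event rate; larger values indicate a stronger general contextual effect.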

  13. Mining the modular structure of protein interaction networks.

    PubMed

    Berenstein, Ariel José; Piñero, Janet; Furlong, Laura Inés; Chernomoretz, Ariel

    2015-01-01

    Cluster-based descriptions of biological networks have received much attention in recent years, fostered by accumulated evidence of meaningful correlations between topological network clusters and biological functional modules. Several well-performing clustering algorithms exist to infer topological network partitions. However, due to their respective technical idiosyncrasies, they might produce dissimilar modular decompositions of a given network. In this contribution, we aimed to analyze how alternative modular descriptions could condition the outcome of follow-up network biology analysis. We considered a human protein interaction network and two paradigmatic cluster recognition algorithms, namely the Clauset-Newman-Moore and infomap procedures. We analyzed to what extent both methodologies yielded different results in terms of granularity and biological congruency. In addition, taking into account Guimera's cartographic role characterization of network nodes, we explored how the adoption of a given clustering methodology impinged on the ability to highlight relevant network meso-scale connectivity patterns. As a case study, we considered a set of aging-related proteins and showed that only the high-resolution modular description provided by infomap could unveil statistically significant associations between them and inter/intra-modular cartographic features. Besides reporting novel biological insights that could be gained from the discovered associations, our contribution warns about possible technical concerns that might affect the tools used to mine for interaction patterns in network biology studies. In particular, our results suggested that partitions that are sub-optimal from the strict point of view of their modularity levels might still be worth analyzing when meso-scale features are explored in connection with external sources of biological knowledge.
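    The Clauset-Newman-Moore procedure greedily maximizes Newman-Girvan modularity Q, whereas infomap optimizes a different objective (the map equation), which is why a partition that is sub-optimal in Q can still be informative. For reference, a minimal sketch computing Q for an undirected, unweighted graph without self-loops or duplicate edges follows, using Q = Σ_c [L_c/m − (d_c/2m)²], where L_c is the number of edges inside community c, d_c the total degree of its nodes, and m the number of edges. Function and variable names are hypothetical.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman-Girvan modularity of a node partition.

    edges: list of (u, v) pairs of an undirected simple graph
    community: dict mapping each node to its community label
    """
    m = len(edges)
    internal = defaultdict(int)    # L_c: edges with both endpoints in community c
    degree_sum = defaultdict(int)  # d_c: total degree of the nodes in community c
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


# Two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
print(round(modularity(edges, split), 4))  # 0.3571
```

    Putting every node in one community gives Q = 0, so the two-triangle split is clearly preferred under this criterion.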

  14. Uncertainty in Twenty-First-Century CMIP5 Sea Level Projections

    NASA Technical Reports Server (NTRS)

    Little, Christopher M.; Horton, Radley M.; Kopp, Robert E.; Oppenheimer, Michael; Yip, Stan

    2015-01-01

    The representative concentration pathway (RCP) simulations included in phase 5 of the Coupled Model Intercomparison Project (CMIP5) quantify the response of the climate system to different natural and anthropogenic forcing scenarios. These simulations differ because of 1) forcing, 2) the representation of the climate system in atmosphere-ocean general circulation models (AOGCMs), and 3) the presence of unforced (internal) variability. Global and local sea level rise projections derived from these simulations, and the emergence of distinct responses to the four RCPs depend on the relative magnitude of these sources of uncertainty at different lead times. Here, the uncertainty in CMIP5 projections of sea level is partitioned at global and local scales, using a 164-member ensemble of twenty-first-century simulations. Local projections at New York City (NYSL) are highlighted. The partition between model uncertainty, scenario uncertainty, and internal variability in global mean sea level (GMSL) is qualitatively consistent with that of surface air temperature, with model uncertainty dominant for most of the twenty-first century. Locally, model uncertainty is dominant through 2100, with maxima in the North Atlantic and the Arctic Ocean. The model spread is driven largely by 4 of the 16 AOGCMs in the ensemble; these models exhibit outlying behavior in all RCPs and in both GMSL and NYSL. The magnitude of internal variability varies widely by location and across models, leading to differences of several decades in the local emergence of RCPs. The AOGCM spread, and its sensitivity to model exclusion and/or weighting, has important implications for sea level assessments, especially if a local risk management approach is utilized.

  15. An observational study identifying obese subgroups among older adults at increased risk of mobility disability: do perceptions of the neighborhood environment matter?

    PubMed

    King, Abby C; Salvo, Deborah; Banda, Jorge A; Ahn, David K; Gill, Thomas M; Miller, Michael; Newman, Anne B; Fielding, Roger A; Siordia, Carlos; Moore, Spencer; Folta, Sara; Spring, Bonnie; Manini, Todd; Pahor, Marco

    2015-12-18

    Obesity is an increasingly prevalent condition among older adults, yet relatively little is known about how built environment variables may be associated with obesity in older age groups. This is particularly the case for more vulnerable older adults already showing functional limitations associated with subsequent disability. The Lifestyle Interventions and Independence for Elders (LIFE) trial dataset (n = 1600) was used to explore the associations between perceived built environment variables and baseline obesity levels. Age-stratified recursive partitioning methods were applied to identify distinct subgroups with varying obesity prevalence. Among participants aged 70-78 years, four distinct subgroups, defined by combinations of perceived environment and race-ethnicity variables, were identified. The subgroups with the lowest obesity prevalence (45.5-59.4%) consisted of participants who reported living in neighborhoods with higher residential density. Among participants aged 79-89 years, the subgroup (of three distinct subgroups identified) with the lowest obesity prevalence (19.4%) consisted of non-African American/Black participants who reported living in neighborhoods with friends or acquaintances similar in demographic characteristics to themselves. Overall support for the partitioned subgroupings was obtained using mixed model regression analysis. The results suggest that, in combination with race/ethnicity, features of the perceived neighborhood built and social environments differentiated distinct groups of vulnerable older adults from different age strata that differed in obesity prevalence. Pending further verification, the results may help to inform subsequent targeting of such subgroups for further investigation. ClinicalTrials.gov identifier: NCT01072500.

  16. Efficiently sampling conformations and pathways using the concurrent adaptive sampling (CAS) algorithm

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ahn, Surl-Hee; Grate, Jay W.; Darve, Eric F.

    Molecular dynamics (MD) simulations are useful for obtaining thermodynamic and kinetic properties of biomolecules but are limited by the timescale barrier; i.e., we may be unable to obtain properties efficiently because we need to run simulations of microseconds or longer using femtosecond time steps. While there are several existing methods to overcome this timescale barrier and efficiently sample thermodynamic and/or kinetic properties, problems remain with regard to sampling unknown systems, dealing with the high-dimensional space of collective variables, and focusing the computational effort on slow timescales. Hence, a new sampling method, called the “Concurrent Adaptive Sampling (CAS) algorithm,” has been developed to tackle these three issues and efficiently obtain conformations and pathways. The method is not constrained to use only one or two collective variables, unlike most reaction-coordinate-dependent methods. Instead, it can use a large number of collective variables and uses macrostates (a partition of the collective variable space) to enhance the sampling. The exploration is done by running a large number of short simulations, and a clustering technique is used to accelerate the sampling. In this paper, we introduce the new methodology and show results from two-dimensional models and biomolecules, such as penta-alanine and a triazine polymer.

  17. Intersecting surface defects and instanton partition functions

    DOE PAGES

    Pan, Yiwen; Peelaers, Wolfger

    2017-07-14

    We analyze intersecting surface defects inserted in interacting four-dimensional N = 2 supersymmetric quantum field theories. We employ the realization of a class of such systems as the infrared fixed points of renormalization group flows from larger theories, triggered by perturbed Seiberg-Witten monopole-like configurations, to compute their partition functions. These results are cast into the form of a partition function of 4d/2d/0d coupled systems. In conclusion, our computations provide concrete expressions for the instanton partition function in the presence of intersecting defects, and we study the corresponding ADHM model.

  18. Intersecting surface defects and instanton partition functions

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pan, Yiwen; Peelaers, Wolfger

    We analyze intersecting surface defects inserted in interacting four-dimensional N = 2 supersymmetric quantum field theories. We employ the realization of a class of such systems as the infrared fixed points of renormalization group flows from larger theories, triggered by perturbed Seiberg-Witten monopole-like configurations, to compute their partition functions. These results are cast into the form of a partition function of 4d/2d/0d coupled systems. In conclusion, our computations provide concrete expressions for the instanton partition function in the presence of intersecting defects, and we study the corresponding ADHM model.

  19. Performance analysis of clustering techniques over microarray data: A case study

    NASA Astrophysics Data System (ADS)

    Dash, Rasmita; Misra, Bijan Bihari

    2018-03-01

    Handling big data is one of the major issues in the field of statistical data analysis, and in such investigations cluster analysis plays a vital role in dealing with large-scale data. There are many clustering techniques based on different cluster-analysis approaches, but it is difficult to predict which approach suits a particular dataset. To deal with this problem, a grading approach is applied across many clustering techniques to identify a stable technique. However, the grading depends on the characteristics of the dataset as well as on the validity indices, so a two-stage grading approach is implemented. In this study, the grading approach is applied to five clustering techniques: hybrid swarm based clustering (HSC), k-means, partitioning around medoids (PAM), vector quantization (VQ) and agglomerative nesting (AGNES). The experiments are conducted on five microarray datasets with seven validity indices. The grading approach's finding that a clustering technique is significant is also confirmed by the Nemenyi post-hoc hypothesis test.
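    Rank-based comparison across datasets and validity indices can be sketched as follows: each (dataset, index) pair is treated as one block, techniques are ranked within each block, and the Nemenyi critical difference gives how far apart two average ranks must be to count as significantly different. This is a generic sketch of the standard Nemenyi procedure, not the authors' two-stage grading scheme; treating five datasets times seven indices as 35 blocks, and the tabulated value q_0.05 = 2.728 for five techniques, are assumptions.

```python
import math


def average_ranks(scores):
    """scores[b][t]: score of technique t on block b (a dataset/index pair),
    higher is better. Returns the mean rank of each technique (1 = best);
    tied techniques share the average of the ranks they span."""
    n_tech = len(scores[0])
    totals = [0.0] * n_tech
    for row in scores:
        order = sorted(range(n_tech), key=lambda t: -row[t])
        i = 0
        while i < n_tech:
            j = i
            while j + 1 < n_tech and row[order[j + 1]] == row[order[i]]:
                j += 1
            shared = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
            for k in range(i, j + 1):
                totals[order[k]] += shared
            i = j + 1
    return [t / len(scores) for t in totals]


def nemenyi_cd(q_alpha, n_tech, n_blocks):
    """Critical difference between average ranks:
    CD = q_alpha * sqrt(k (k + 1) / (6 N))."""
    return q_alpha * math.sqrt(n_tech * (n_tech + 1) / (6.0 * n_blocks))
```

With five techniques and 35 blocks, `nemenyi_cd(2.728, 5, 35)` is about 1.03, so two techniques whose average ranks differ by more than roughly one rank position would be judged significantly different under these assumptions.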

  20. Prediction of Partition Coefficients of Organic Compounds between SPME/PDMS and Aqueous Solution

    PubMed Central

    Chao, Keh-Ping; Lu, Yu-Ting; Yang, Hsiu-Wen

    2014-01-01

    Polydimethylsiloxane (PDMS) is commonly used as the coated polymer in the solid phase microextraction (SPME) technique. In this study, the partition coefficients of organic compounds between SPME/PDMS and the aqueous solution were compiled from literature sources. A correlation analysis of the partition coefficients was conducted to interpret the effect of the compounds' physicochemical properties and descriptors on the partitioning process. The PDMS-water partition coefficients were significantly correlated with the polarizability of the organic compounds (r = 0.977, p < 0.05). An empirical model, consisting of the polarizability, the molecular connectivity index, and an indicator variable, was developed to predict the partition coefficients of the 61 organic compounds in the training set. The predictive ability of the empirical model was demonstrated on a test set of 26 chemicals not included in the training set. The empirical model, which uses straightforwardly calculated molecular descriptors to estimate the PDMS-water partition coefficient, will contribute to practical applications of the SPME technique. PMID:24534804
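    A three-descriptor empirical model of this kind is an ordinary multiple linear regression. The sketch fits y = b0 + b1·x1 + b2·x2 + b3·x3 by least squares, where the columns would hold the polarizability, the molecular connectivity index and the indicator variable. Whether y is the partition coefficient itself or its logarithm is not stated in the abstract, and the helper names are hypothetical.

```python
import numpy as np


def fit_linear_model(X, y):
    """Ordinary least squares with an intercept: y ~ b0 + X @ b.
    Returns the coefficient vector [b0, b1, ..., bp]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict(coef, X):
    """Apply fitted coefficients to new descriptor rows."""
    return coef[0] + np.asarray(X, dtype=float) @ coef[1:]
```

Fitting on the training compounds and then calling `predict` on held-out compounds mirrors the training/test evaluation described above.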

  1. Research on retailer data clustering algorithm based on Spark

    NASA Astrophysics Data System (ADS)

    Huang, Qiuman; Zhou, Feng

    2017-03-01

    Big data analysis is currently a hot topic in the IT field. Spark is a highly reliable, high-performance distributed parallel computing framework for big data sets. The k-means algorithm is one of the classical partitioning methods among clustering algorithms. In this paper, we study the k-means clustering algorithm on Spark. First, the principle of the algorithm is analyzed; then clustering analysis is carried out on supermarket customers through an experiment to find different shopping patterns. At the same time, this paper proposes a parallelization of the k-means algorithm on Spark's distributed computing framework and gives concrete design and implementation schemes. This paper uses two years of sales data from a supermarket to validate the proposed clustering algorithm and achieve the goal of customer segmentation, and then analyzes the clustering results to help enterprises adopt different marketing strategies for different customer groups to improve sales performance.
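    The parallelization described above typically follows a map/reduce pattern: each data partition assigns its points to the nearest centroid and emits per-cluster partial sums, and a reduce step merges those sums into new centroids. The plain-Python sketch mimics that structure without Spark. On an RDD the map step would plausibly run in `mapPartitions` and the merge in `reduceByKey`, but that mapping and the function names are assumptions, not the paper's published design.

```python
import math


def closest(point, centroids):
    """Index of the nearest centroid (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def map_partition(points, centroids):
    """Map step on one data partition: per-cluster (vector sum, count)."""
    dim = len(centroids[0])
    acc = {}
    for p in points:
        c = closest(p, centroids)
        s, n = acc.get(c, ([0.0] * dim, 0))
        acc[c] = ([a + b for a, b in zip(s, p)], n + 1)
    return acc


def reduce_partials(partials, centroids):
    """Reduce step: merge partial sums from all partitions into new centroids."""
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for acc in partials:
        for c, (s, n) in acc.items():
            sums[c] = [a + b for a, b in zip(sums[c], s)]
            counts[c] += n
    # A cluster that received no points keeps its previous centroid.
    return [[v / counts[c] for v in sums[c]] if counts[c] else list(centroids[c])
            for c in range(len(centroids))]


def parallel_kmeans(partitions, centroids, iters=20):
    """Run map/reduce k-means iterations over a list of data partitions."""
    for _ in range(iters):
        partials = [map_partition(part, centroids) for part in partitions]
        centroids = reduce_partials(partials, centroids)
    return centroids
```

Only per-cluster sums and counts cross partition boundaries, which is why this formulation scales to large customer datasets: communication per iteration grows with the number of clusters, not the number of points.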

  2. Robust Intratumor Partitioning to Identify High-Risk Subregions in Lung Cancer: A Pilot Study.

    PubMed

    Wu, Jia; Gensheimer, Michael F; Dong, Xinzhe; Rubin, Daniel L; Napel, Sandy; Diehn, Maximilian; Loo, Billy W; Li, Ruijiang

    2016-08-01

    To develop an intratumor partitioning framework for identifying high-risk subregions from (18)F-fluorodeoxyglucose positron emission tomography (FDG-PET) and computed tomography (CT) imaging and to test whether tumor burden associated with the high-risk subregions is prognostic of outcomes in lung cancer. In this institutional review board-approved retrospective study, we analyzed the pretreatment FDG-PET and CT scans of 44 lung cancer patients treated with radiation therapy. A novel, intratumor partitioning method was developed, based on a 2-stage clustering process: first at the patient level, each tumor was over-segmented into many superpixels by k-means clustering of integrated PET and CT images; next, tumor subregions were identified by merging previously defined superpixels via population-level hierarchical clustering. The volume associated with each of the subregions was evaluated using Kaplan-Meier analysis regarding its prognostic capability in predicting overall survival (OS) and out-of-field progression (OFP). Three spatially distinct subregions were identified within each tumor that were highly robust to uncertainty in PET/CT co-registration. Among these, the volume of the most metabolically active and metabolically heterogeneous solid component of the tumor was predictive of OS and OFP on the entire cohort, with a concordance index or CI of 0.66-0.67. When restricting the analysis to patients with stage III disease (n=32), the same subregion achieved an even higher CI of 0.75 (hazard ratio 3.93, log-rank P=.002) for predicting OS, and a CI of 0.76 (hazard ratio 4.84, log-rank P=.002) for predicting OFP. In comparison, conventional imaging markers, including tumor volume, maximum standardized uptake value, and metabolic tumor volume using threshold of 50% standardized uptake value maximum, were not predictive of OS or OFP, with CI mostly below 0.60 (log-rank P>.05). 
We propose a robust intratumor partitioning method to identify clinically relevant, high-risk subregions in lung cancer. We envision that this approach will be applicable to identifying useful imaging biomarkers in many cancer types. Copyright © 2016 Elsevier Inc. All rights reserved.
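    The two-stage process (patient-level over-segmentation, then population-level merging) can be sketched with NumPy. Assumptions: each voxel is represented only by feature values such as its PET and CT intensities (spatial contiguity of superpixels is ignored), stage 2 uses simple centroid linkage, and the function names are hypothetical; the abstract does not give the study's exact features or linkage.

```python
import numpy as np


def kmeans(X, k, iters=50, seed=0):
    """Plain Lloyd k-means; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(iters):
        labels = np.argmin(((X[:, None, :] - C[None, :, :]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):  # an empty cluster keeps its old centroid
                C[j] = X[labels == j].mean(axis=0)
    return C, labels


def agglomerate(P, n_groups):
    """Centroid-linkage agglomerative clustering of the rows of P."""
    groups = [[i] for i in range(len(P))]
    while len(groups) > n_groups:
        cents = [P[g].mean(axis=0) for g in groups]
        best = None
        for a in range(len(groups)):
            for b in range(a + 1, len(groups)):
                d = float(np.sum((cents[a] - cents[b]) ** 2))
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        groups[a] += groups[b]  # merge the closest pair of groups
        del groups[b]
    labels = np.empty(len(P), dtype=int)
    for g, members in enumerate(groups):
        labels[members] = g
    return labels


def two_stage(patients, k_super, n_subregions):
    """Stage 1: over-segment each patient's voxels (rows = voxels, columns =
    features) into k_super superpixels with k-means.
    Stage 2: pool all superpixel centroids and merge them into subregions."""
    cents = [kmeans(np.asarray(V, dtype=float), k_super, seed=i)[0]
             for i, V in enumerate(patients)]
    pooled = np.vstack(cents)
    return pooled, agglomerate(pooled, n_subregions)
```

Because stage 2 clusters superpixels pooled across all patients, each resulting subregion label means the same thing in every tumor, which is what allows subregion volumes to be compared across the cohort.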

  3. Robust Intratumor Partitioning to Identify High-Risk Subregions in Lung Cancer: A Pilot Study

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wu, Jia; Gensheimer, Michael F.; Dong, Xinzhe

    2016-08-01

    Purpose: To develop an intratumor partitioning framework for identifying high-risk subregions from (18)F-fluorodeoxyglucose positron emission tomography (FDG-PET) and computed tomography (CT) imaging and to test whether tumor burden associated with the high-risk subregions is prognostic of outcomes in lung cancer. Methods and Materials: In this institutional review board–approved retrospective study, we analyzed the pretreatment FDG-PET and CT scans of 44 lung cancer patients treated with radiation therapy. A novel, intratumor partitioning method was developed, based on a 2-stage clustering process: first at the patient level, each tumor was over-segmented into many superpixels by k-means clustering of integrated PET and CT images; next, tumor subregions were identified by merging previously defined superpixels via population-level hierarchical clustering. The volume associated with each of the subregions was evaluated using Kaplan-Meier analysis regarding its prognostic capability in predicting overall survival (OS) and out-of-field progression (OFP). Results: Three spatially distinct subregions were identified within each tumor that were highly robust to uncertainty in PET/CT co-registration. Among these, the volume of the most metabolically active and metabolically heterogeneous solid component of the tumor was predictive of OS and OFP on the entire cohort, with a concordance index or CI of 0.66-0.67. When restricting the analysis to patients with stage III disease (n=32), the same subregion achieved an even higher CI of 0.75 (hazard ratio 3.93, log-rank P=.002) for predicting OS, and a CI of 0.76 (hazard ratio 4.84, log-rank P=.002) for predicting OFP. In comparison, conventional imaging markers, including tumor volume, maximum standardized uptake value, and metabolic tumor volume using threshold of 50% standardized uptake value maximum, were not predictive of OS or OFP, with CI mostly below 0.60 (log-rank P>.05). 
Conclusion: We propose a robust intratumor partitioning method to identify clinically relevant, high-risk subregions in lung cancer. We envision that this approach will be applicable to identifying useful imaging biomarkers in many cancer types.

  4. Characterizing UT/LS O3 from Global Ozonesonde Profiles Using a Clustering Technique and Meteorological Reanalyses

    NASA Astrophysics Data System (ADS)

    Stauffer, R. M.; Thompson, A. M.

    2017-12-01

    Previous studies applying the self-organizing map (SOM) clustering technique to US ozonesonde data proved valuable for quantifying UT/LS O3 variability and for linking meteorological and chemical drivers to the shape of the ozone (O3) profile from the troposphere to the lower stratosphere. Focus has thus far been limited to specific geographical regions, but SOM has demonstrated the advantages of clustering over monthly climatological O3 averages, which mask day-to-day variability in the O3 profile and the correspondence between O3 and meteorology. We extend SOM to a global set of ozonesonde profiles, mostly from WOUDC, spanning 1980 to the present from 30 sites, to evaluate global O3 climatologies and quantify links to geophysical processes for various meteorological regimes. Four clusters of O3 mixing ratio profiles are generated for each site; they show dominant profile shapes that correspond to site latitude. Offsets between O3 profile clusters and monthly O3 climatologies are hundreds of ppbv in the UT/LS at higher-latitude sites with active dynamics. Examination of meteorological reanalyses reveals a clear relationship between SOM clusters and covarying meteorological fields (geopotential height, potential vorticity, and tropopause height) for most sites. Tropical SOM clusters show marked dependence on velocity potential anomalies calculated from reanalysis winds, with low UT/LS O3 amounts corresponding to enhanced upper-level divergence, and vice versa. In addition to creating SOM cluster-based O3 climatologies, these results are meant to inform future approaches to the validation of chemical transport models and satellite retrievals, which often struggle in the UT/LS region.
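    A self-organizing map clusters profiles by training a small set of prototype profiles (nodes) so that each observed profile pulls its best-matching node, and that node's neighbours, toward itself. The minimal NumPy sketch uses a 1-D chain of four nodes, matching the four clusters per site; the node layout, learning-rate and neighbourhood schedules, and the helper names are assumptions, since the abstract does not specify the study's SOM configuration.

```python
import numpy as np


def train_som(profiles, n_nodes=4, epochs=200, lr0=0.5, sigma0=1.0, seed=0):
    """1-D self-organizing map; each node holds a prototype profile
    (e.g. O3 mixing ratio on a fixed altitude grid)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(profiles, dtype=float)
    W = X[rng.choice(len(X), size=n_nodes, replace=False)].copy()
    grid = np.arange(n_nodes)
    for t in range(epochs):
        frac = t / epochs
        lr = lr0 * (1 - frac)                      # learning rate decays linearly
        sigma = max(sigma0 * (1 - frac), 1e-3)     # neighbourhood shrinks
        for x in X[rng.permutation(len(X))]:
            bmu = np.argmin(((W - x) ** 2).sum(axis=1))          # best-matching unit
            h = np.exp(-((grid - bmu) ** 2) / (2 * sigma ** 2))  # neighbourhood weights
            W += lr * h[:, None] * (x - W)
    return W


def assign(profiles, W):
    """Cluster label of each profile = index of its nearest prototype."""
    X = np.asarray(profiles, dtype=float)
    return np.argmin(((X[:, None, :] - W[None, :, :]) ** 2).sum(-1), axis=1)
```

Averaging the profiles assigned to each node gives a cluster-based climatology, which can then be compared with the monthly mean climatology for the same site.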

  5. Effects of rainfall partitioning in the seasonal and spatial variability of soil water content in a Mediterranean downy oak forest

    NASA Astrophysics Data System (ADS)

    Garcia-Estringana, P.; Latron, J.; Molina, A. J.; Llorens, P.

    2012-04-01

    Rainfall partitioning fluxes (throughfall and stemflow) have a large degree of temporal and spatial variability and may consequently lead to significant changes in the volume and composition of water that reaches the understory and the soil. The objective of this work is to study the effect of rainfall partitioning on the seasonal and spatial variability of the soil water content in a Mediterranean downy oak forest (Quercus pubescens), located in the Vallcebre research catchments (42° 12'N, 1° 49'E). The monitoring design, started in July 2011, consists of a set of 20 automatic rain recorders and 40 automatic soil moisture probes located below the canopy. One hundred hemispheric photographs of the canopy were used to place the instruments at representative locations (in terms of canopy cover) within the plot. Bulk rainfall, stemflow and meteorological conditions above the forest cover are also automatically recorded. Canopy cover, in leaf and leafless periods, as well as biometric characteristics of the plot, are also regularly measured. This work presents the first results describing throughfall and soil moisture spatial variability during both the leaf and leafless periods. The main drivers of throughfall variability, such as canopy structure and meteorological conditions, are also analysed.

  6. "To be or not to be Retained … That's the Question!" Retention, Self-esteem, Self-concept, Achievement Goals, and Grades.

    PubMed

    Peixoto, Francisco; Monteiro, Vera; Mata, Lourdes; Sanches, Cristina; Pipa, Joana; Almeida, Leandro S

    2016-01-01

    Keeping students back in the same grade - retention - has always been a controversial issue in Education, with some defending it as a beneficial remedial practice and others arguing against its detrimental effects. This paper undertakes an analysis of this issue, focusing on the differences in student motivation and self-related variables according to their retention-related status, and the interrelationship between retention and these variables. The participants were 695 students selected from two cohorts (5th and 7th graders) of a larger group of students followed over a 3-year project. The students were assigned to four groups according to their retention-related status over time: (1) students with past and recent retention; (2) students with past but no recent retention; (3) students with no past but recent retention; (4) students with no past or recent retention. Measures of achievement goal orientations, self-concept, self-esteem, importance given to school subjects and Grade Point Average (GPA) were collected for all students. Repeated measures MANCOVA analyses were carried out showing group differences in self-esteem, academic self-concept, importance attributed to academic competencies, task and avoidance orientation and academic achievement. To attain a deeper understanding of these results and to identify profiles across variables, a cluster analysis based on achievement goals was conducted and four clusters were identified. Students who were retained at the end of the school year are mainly represented in clusters with less adaptive motivational profiles and almost absent from clusters exhibiting more adaptive ones. Findings highlight that retention leaves a significant mark that remains even when students recover academic achievement and retention is in the distant past. 
This is reflected in the low academic self-concept as well as in the devaluation of academic competencies and in the avoidance orientation which, taken together, can undermine students' academic adjustment and turn retention into a risk factor.

  7. “To be or not to be Retained … That’s the Question!” Retention, Self-esteem, Self-concept, Achievement Goals, and Grades

    PubMed Central

    Peixoto, Francisco; Monteiro, Vera; Mata, Lourdes; Sanches, Cristina; Pipa, Joana; Almeida, Leandro S.

    2016-01-01

    Keeping students back in the same grade – retention – has always been a controversial issue in Education, with some defending it as a beneficial remedial practice and others arguing against its detrimental effects. This paper undertakes an analysis of this issue, focusing on the differences in student motivation and self-related variables according to their retention-related status, and the interrelationship between retention and these variables. The participants were 695 students selected from two cohorts (5th and 7th graders) of a larger group of students followed over a 3-year project. The students were assigned to four groups according to their retention-related status over time: (1) students with past and recent retention; (2) students with past but no recent retention; (3) students with no past but recent retention; (4) students with no past or recent retention. Measures of achievement goal orientations, self-concept, self-esteem, importance given to school subjects and Grade Point Average (GPA) were collected for all students. Repeated measures MANCOVA analyses were carried out showing group differences in self-esteem, academic self-concept, importance attributed to academic competencies, task and avoidance orientation and academic achievement. To attain a deeper understanding of these results and to identify profiles across variables, a cluster analysis based on achievement goals was conducted and four clusters were identified. Students who were retained at the end of the school year are mainly represented in clusters with less adaptive motivational profiles and almost absent from clusters exhibiting more adaptive ones. Findings highlight that retention leaves a significant mark that remains even when students recover academic achievement and retention is in the distant past. 
This is reflected in the low academic self-concept as well as in the devaluation of academic competencies and in the avoidance orientation which, taken together, can undermine students’ academic adjustment and turn retention into a risk factor. PMID:27790167

  8. Scalable multi-sample single-cell data analysis by Partition-Assisted Clustering and Multiple Alignments of Networks

    PubMed Central

    Samusik, Nikolay; Wang, Xiaowei; Guan, Leying; Nolan, Garry P.

    2017-01-01

    Mass cytometry (CyTOF) has greatly expanded the capability of cytometry. It is now easy to generate multiple CyTOF samples in a single study, with each sample containing single-cell measurements of 50 markers for hundreds of thousands of cells or more. Current methods do not adequately address the issues of combining multiple samples for subpopulation discovery, and these issues can be quickly and dramatically amplified as the number of samples increases. To overcome this limitation, we developed Partition-Assisted Clustering and Multiple Alignments of Networks (PAC-MAN) for the fast automatic identification of cell populations in CyTOF data, closely matching expert manual discovery, and for alignment of subpopulations across samples to define dataset-level cellular states. PAC-MAN is computationally efficient, allowing the management of very large CyTOF datasets, which are increasingly common in clinical and cancer studies that monitor various tissue samples for each subject. PMID:29281633

  9. A network model of successive partitioning-limited solute diffusion through the stratum corneum.

    PubMed

    Schumm, Phillip; Scoglio, Caterina M; van der Merwe, Deon

    2010-02-07

    As the most exposed point of contact with the external environment, the skin is an important barrier to many chemical exposures, including medications, potentially toxic chemicals and cosmetics. Traditional dermal absorption models treat the stratum corneum lipids as a homogeneous medium through which solutes diffuse according to Fick's first law of diffusion. This approach does not explain the non-linear absorption and irregular distribution patterns within the stratum corneum lipids observed in experimental data. A network model based on successive partitioning-limited solute diffusion through the stratum corneum, in which the lipid structure is represented by a large, sparse, regular network whose nodes have variable characteristics, offers an alternative, efficient, and flexible approach to dermal absorption modeling that simulates non-linear absorption data patterns. Four model versions are presented: two linear models, which have unlimited node capacities, and two non-linear models, which have limited node capacities. The non-linear model outputs produce absorption-to-dose relationships that are best characterized quantitatively by power equations, similar to the equations used to describe non-linear experimental data.
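    The contrast between unlimited and limited node capacities can be illustrated with a toy 1-D chain of nodes, together with the log-log least-squares fit commonly used for a power equation, absorbed = a·dose^b. The chain is a drastic simplification of the paper's large sparse network, and the transfer rule, the parameters `k` and `capacity`, and the function names are hypothetical.

```python
import numpy as np


def simulate_chain(dose, n_nodes=10, k=0.3, capacity=None, steps=100):
    """Toy chain of lipid 'nodes': each step a fraction k of a node's content
    moves to the next node, and the last node releases into the absorbed pool.
    With a finite capacity, transfer into a node is limited by its free space."""
    nodes = [0.0] * n_nodes
    nodes[0] = float(dose)
    absorbed = 0.0
    for _ in range(steps):
        flux = k * nodes[-1]
        nodes[-1] -= flux
        absorbed += flux
        # Walk from the deep end so material advances at most one node per step.
        for i in range(n_nodes - 2, -1, -1):
            move = k * nodes[i]
            if capacity is not None:
                move = min(move, max(capacity - nodes[i + 1], 0.0))
            nodes[i] -= move
            nodes[i + 1] += move
    return absorbed


def fit_power_law(dose, absorbed):
    """Fit absorbed = a * dose**b by linear least squares on log-log axes."""
    b, log_a = np.polyfit(np.log(dose), np.log(absorbed), 1)
    return float(np.exp(log_a)), float(b)
```

With `capacity=None` every transfer is proportional to node content, so the absorbed amount scales exactly linearly with dose; once a finite capacity binds, the response departs from linearity, and the exponent b returned by `fit_power_law` on (dose, absorbed) pairs summarizes that departure.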

  10. Stochastic modeling of phosphorus transport in the Three Gorges Reservoir by incorporating variability associated with the phosphorus partition coefficient

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Huang, Lei; Fang, Hongwei; Xu, Xingya

    Phosphorus (P) fate and transport play a crucial role in the ecology of rivers and reservoirs in which eutrophication is limited by P. A key uncertainty in models used to help manage P in such systems is the partitioning of P to suspended and bed sediments. By analyzing data from field and laboratory experiments, we stochastically characterize the variability of the partition coefficient (Kd) and derive spatio-temporal solutions for P transport in the Three Gorges Reservoir (TGR). We formulate a set of stochastic partial differential equations (SPDEs) to simulate P transport by randomly sampling Kd from the measured distributions, to obtain statistical descriptions of the P concentration and retention in the TGR. The correspondence between predicted and observed P concentrations and P retention in the TGR, combined with the ability to effectively characterize uncertainty, suggests that a model that incorporates the observed variability can better describe P dynamics and more effectively serve as a tool for P management in the system. This study highlights the importance of considering parametric uncertainty in estimating uncertainty/variability associated with simulated P transport.

  11. Stochastic modeling of phosphorus transport in the Three Gorges Reservoir by incorporating variability associated with the phosphorus partition coefficient

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Huang, Lei; Fang, Hongwei; Xu, Xingya

    Phosphorus (P) fate and transport play a crucial role in the ecology of rivers and reservoirs in which eutrophication is limited by P. A key uncertainty in models used to help manage P in such systems is the partitioning of P to suspended and bed sediments. By analyzing data from field and laboratory experiments, we stochastically characterize the variability of the partition coefficient (Kd) and derive spatio-temporal solutions for P transport in the Three Gorges Reservoir (TGR). Here, we formulate a set of stochastic partial differential equations (SPDEs) to simulate P transport by randomly sampling Kd from the measured distributions, to obtain statistical descriptions of the P concentration and retention in the TGR. Furthermore, the correspondence between predicted and observed P concentrations and P retention in the TGR, combined with the ability to effectively characterize uncertainty, suggests that a model that incorporates the observed variability can better describe P dynamics and more effectively serve as a tool for P management in the system. Our study highlights the importance of considering parametric uncertainty in estimating uncertainty/variability associated with simulated P transport.

  12. Stochastic modeling of phosphorus transport in the Three Gorges Reservoir by incorporating variability associated with the phosphorus partition coefficient

    DOE PAGES

    Huang, Lei; Fang, Hongwei; Xu, Xingya; ...

    2017-08-01

    Phosphorus (P) fate and transport play a crucial role in the ecology of rivers and reservoirs in which eutrophication is limited by P. A key uncertainty in models used to help manage P in such systems is the partitioning of P to suspended and bed sediments. By analyzing data from field and laboratory experiments, we stochastically characterize the variability of the partition coefficient (Kd) and derive spatio-temporal solutions for P transport in the Three Gorges Reservoir (TGR). Here, we formulate a set of stochastic partial differential equations (SPDEs) to simulate P transport by randomly sampling Kd from the measured distributions, to obtain statistical descriptions of the P concentration and retention in the TGR. Furthermore, the correspondence between predicted and observed P concentrations and P retention in the TGR, combined with the ability to effectively characterize uncertainty, suggests that a model that incorporates the observed variability can better describe P dynamics and more effectively serve as a tool for P management in the system. Our study highlights the importance of considering parametric uncertainty in estimating uncertainty/variability associated with simulated P transport.
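    The effect of Kd variability can be illustrated with a Monte Carlo sketch of equilibrium partitioning, in which the dissolved fraction of P is f_d = 1 / (1 + Kd·SS) for a suspended-sediment concentration SS. This shows only the sampling idea, not the study's SPDE transport model; the lognormal distribution for Kd, the units (L/kg and kg/L) and all names are assumptions.

```python
import numpy as np


def dissolved_fraction(kd, ss):
    """Equilibrium dissolved fraction of P for partition coefficient kd (L/kg)
    and suspended-sediment concentration ss (kg/L)."""
    return 1.0 / (1.0 + kd * ss)


def monte_carlo_fd(log_kd_mean, log_kd_sd, ss, n=10_000, seed=0):
    """Sample Kd from an assumed lognormal distribution and summarize the
    resulting dissolved fraction as (mean, [5th, 95th percentiles])."""
    rng = np.random.default_rng(seed)
    kd = rng.lognormal(mean=log_kd_mean, sigma=log_kd_sd, size=n)
    fd = dissolved_fraction(kd, ss)
    return float(fd.mean()), np.percentile(fd, [5, 95])
```

Because f_d is a non-linear function of Kd, the mean of the sampled fractions generally differs from the fraction computed at the mean Kd; this is one reason a single deterministic Kd can misrepresent P behaviour.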

  13. QSAR modeling of human serum protein binding with several modeling techniques utilizing structure-information representation.

    PubMed

    Votano, Joseph R; Parham, Marc; Hall, L Mark; Hall, Lowell H; Kier, Lemont B; Oloff, Scott; Tropsha, Alexander

    2006-11-30

    Four modeling techniques, using topological descriptors to represent molecular structure, were employed to produce models of human serum protein binding (% bound) on a data set of 1008 experimental values, carefully screened from publicly available sources. To our knowledge, these data constitute the largest set on human serum protein binding reported for QSAR modeling. The data were partitioned into a training set of 808 compounds and an external validation test set of 200 compounds. Partitioning was accomplished by clustering the compounds in a structure descriptor space so that random sampling of 20% of the whole data set produced an external test set that is a good representative of the training set with respect to both structure and protein binding values. The four modeling techniques are multiple linear regression (MLR), artificial neural networks (ANN), k-nearest neighbors (kNN), and support vector machines (SVM). With the exception of the MLR model, the ANN, kNN, and SVM QSARs were ensemble models. Training set correlation coefficients and mean absolute errors ranged from r2=0.90 and MAE=7.6 for ANN to r2=0.61 and MAE=16.2 for MLR. Prediction results from the validation set yielded correlation coefficients and mean absolute errors ranging from r2=0.70 and MAE=14.1 for ANN to a low of r2=0.59 and MAE=18.3 for the SVM model. Structure descriptors that contribute significantly to the models are discussed and compared with those found in other published models. For the ANN model, structure descriptor trends with respect to their effects on predicted protein binding can assist the chemist in structure modification during the drug design process.
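    A cluster-based split of this kind can be sketched as stratified sampling: once compounds carry cluster labels from descriptor space, about 20% of each cluster goes to the test set. The function name and the per-cluster rounding are assumptions; the abstract reports 200 of 1008 compounds (about 20%) in the external test set but does not name the clustering method.

```python
import random
from collections import defaultdict


def cluster_stratified_split(labels, test_frac=0.2, seed=0):
    """Return (train_idx, test_idx), drawing about test_frac of each cluster
    into the test set so that it covers descriptor space like the training set."""
    rng = random.Random(seed)
    by_cluster = defaultdict(list)
    for i, c in enumerate(labels):
        by_cluster[c].append(i)
    train, test = [], []
    for members in by_cluster.values():
        rng.shuffle(members)
        n_test = round(test_frac * len(members))
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return sorted(train), sorted(test)
```

Sampling within clusters, rather than from the whole set at once, prevents small structural families from being missing in the test set by chance.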

  14. Coping with Stress in Deprived Urban Neighborhoods: What Is the Role of Green Space According to Life Stage?

    PubMed

    Roe, Jenny J; Aspinall, Peter A; Ward Thompson, Catharine

    2017-01-01

    This study follows previous research showing how green space quantity and contact with nature (via access to gardens/allotments) help mitigate stress in people living in deprived urban environments (Ward Thompson et al., 2016). However, little is known about how these environments aid stress mitigation or how stress levels vary in a population experiencing higher than average stress. This study used Latent Class Analysis (LCA) first to identify latent health clusters in the same population (n = 406) and second to relate health cluster membership to variables of interest, including four hypothetical stress coping scenarios. Results showed that a three-cluster model best fit the data, with membership of health clusters differentiated by age, perceived stress, general health, and subjective well-being. The clusters were labeled by the primary health outcome (i.e., perceived stress) and age group: (1) Low-stress Youth, characterized by ages 16-24; (2) Low-stress Seniors, characterized by ages 65+; and (3) High-stress Mid-Age, characterized by ages 25-44. Next, LCA identified that health cluster membership was significantly related to four hypothetical stress coping scenarios set in people's current residential context: "staying at home" and three scenarios set outwith the home, "seeking peace and quiet," "going for a walk" or "seeking company." Stress coping in Low-stress Youth is characterized by "seeking company" and "going for a walk"; stress coping in Low-stress Seniors and High-stress Mid-Age is characterized by "staying at home." Finally, LCA identified significant relationships between health cluster membership and a range of demographic, other individual and environmental variables including access to, use of and perceptions of local green space. Our study found that the opportunities in the immediate neighborhood for stress reduction vary by age. 
Stress coping in youth is likely supported by being social and keeping physically active outdoors, including local green space visits. By contrast, local green space appears not to support stress regulation in young-middle aged and older adults, who choose to stay at home. We conclude that it is important to understand the complexities of stress management and the opportunities offered by local green space for stress mitigation by age and other demographic variables, such as gender.

  15. Coping with Stress in Deprived Urban Neighborhoods: What Is the Role of Green Space According to Life Stage?

    PubMed Central

    Roe, Jenny J.; Aspinall, Peter A.; Ward Thompson, Catharine

    2017-01-01

    This study follows previous research showing how green space quantity and contact with nature (via access to gardens/allotments) help mitigate stress in people living in deprived urban environments (Ward Thompson et al., 2016). However, little is known about how these environments aid stress mitigation or how stress levels vary in a population experiencing higher than average stress. This study used Latent Class Analysis (LCA) first to identify latent health clusters in the same population (n = 406) and second to relate health cluster membership to variables of interest, including four hypothetical stress coping scenarios. Results showed that a three-cluster model best fit the data, with membership of health clusters differentiated by age, perceived stress, general health, and subjective well-being. The clusters were labeled by the primary health outcome (i.e., perceived stress) and age group: (1) Low-stress Youth, characterized by ages 16–24; (2) Low-stress Seniors, characterized by ages 65+; and (3) High-stress Mid-Age, characterized by ages 25–44. Next, LCA identified that health cluster membership was significantly related to four hypothetical stress coping scenarios set in people's current residential context: “staying at home” and three scenarios set outwith the home, “seeking peace and quiet,” “going for a walk” or “seeking company.” Stress coping in Low-stress Youth is characterized by “seeking company” and “going for a walk”; stress coping in Low-stress Seniors and High-stress Mid-Age is characterized by “staying at home.” Finally, LCA identified significant relationships between health cluster membership and a range of demographic, other individual and environmental variables including access to, use of and perceptions of local green space. Our study found that the opportunities in the immediate neighborhood for stress reduction vary by age. 
Stress coping in youth is likely supported by being social and keeping physically active outdoors, including local green space visits. By contrast, local green space appears not to support stress regulation in young-middle aged and older adults, who choose to stay at home. We conclude that it is important to understand the complexities of stress management and the opportunities offered by local green space for stress mitigation by age and other demographic variables, such as gender. PMID:29093689
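    Latent class analysis is a finite mixture model fitted by expectation-maximization (EM), and the number of classes is often chosen with an information criterion such as BIC. The sketch assumes binary indicators that are independent within each class; the study's actual indicators, software and selection criterion are not stated in the abstract, and the function names are hypothetical.

```python
import numpy as np


def lca_em(Y, n_classes, iters=200, seed=0):
    """Latent class analysis for binary indicators Y (n x j, values 0/1) by EM.
    Returns class weights pi (K,), item probabilities theta (K x j),
    posterior memberships R (n x K) and the log-likelihood of the
    parameters entering the last E-step."""
    rng = np.random.default_rng(seed)
    Y = np.asarray(Y, dtype=float)
    n, j = Y.shape
    pi = np.full(n_classes, 1.0 / n_classes)
    theta = rng.uniform(0.25, 0.75, size=(n_classes, j))
    for _ in range(iters):
        # E-step: log p(y_i, class k) under local independence.
        log_joint = (np.log(pi)[None, :]
                     + Y @ np.log(theta).T
                     + (1 - Y) @ np.log(1 - theta).T)
        m = log_joint.max(axis=1, keepdims=True)
        R = np.exp(log_joint - m)
        norm = R.sum(axis=1, keepdims=True)
        R /= norm
        loglik = float((m + np.log(norm)).sum())
        # M-step: update class sizes and item-response probabilities.
        nk = np.maximum(R.sum(axis=0), 1e-12)
        pi = nk / nk.sum()
        theta = np.clip((R.T @ Y) / nk[:, None], 1e-6, 1 - 1e-6)
    return pi, theta, R, loglik


def bic(loglik, n, n_classes, n_items):
    """Bayesian information criterion for a binary latent class model."""
    n_params = (n_classes - 1) + n_classes * n_items
    return -2 * loglik + n_params * np.log(n)
```

Fitting K = 1, 2, 3, ... and keeping the K with the lowest BIC is one common way to arrive at a choice such as the three-class solution reported above; each person's most probable class is then `R.argmax(axis=1)`.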

  16. Statistical Significance for Hierarchical Clustering

    PubMed Central

    Kimes, Patrick K.; Liu, Yufeng; Hayes, D. Neil; Marron, J. S.

    2017-01-01

    Summary: Cluster analysis has proved to be an invaluable tool for the exploratory and unsupervised analysis of high dimensional datasets. Among methods for clustering, hierarchical approaches have enjoyed substantial popularity in genomics and other fields for their ability to simultaneously uncover multiple layers of clustering structure. A critical and challenging question in cluster analysis is whether the identified clusters represent important underlying structure or are artifacts of natural sampling variation. Few approaches have been proposed for addressing this problem in the context of hierarchical clustering, for which the problem is further complicated by the natural tree structure of the partition, and the multiplicity of tests required to parse the layers of nested clusters. In this paper, we propose a Monte Carlo based approach for testing statistical significance in hierarchical clustering which addresses these issues. The approach is implemented as a sequential testing procedure guaranteeing control of the family-wise error rate. Theoretical justification is provided for our approach, and its power to detect true clustering structure is illustrated through several simulation studies and applications to two cancer gene expression datasets. PMID:28099990
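    The core Monte Carlo idea can be sketched for a single split: compute a 2-means cluster index (within-cluster over total sum of squares) for the data, then compare it with the cluster indices of Gaussian samples that share the data's mean and covariance. This simplified sketch tests only one node; the paper's procedure applies such tests sequentially down the dendrogram with family-wise error control, and its null model details may differ. Function names are hypothetical.

```python
import numpy as np


def two_means_ci(X, iters=50):
    """Cluster index of a 2-means split: within-cluster SS / total SS.
    Initialised by the sign of the projection on the first principal axis."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    labels = (Xc @ vt[0] > 0).astype(int)
    for _ in range(iters):
        if labels.min() == labels.max():
            break  # degenerate split: everything in one group
        cents = np.array([X[labels == g].mean(axis=0) for g in (0, 1)])
        new = np.argmin(((X[:, None, :] - cents[None]) ** 2).sum(-1), axis=1)
        if np.array_equal(new, labels):
            break
        labels = new
    within = sum(((X[labels == g] - X[labels == g].mean(axis=0)) ** 2).sum()
                 for g in np.unique(labels))
    return within / (Xc ** 2).sum()


def split_p_value(X, n_sim=100, seed=0):
    """Monte Carlo p-value for 'X is a single Gaussian cluster'."""
    rng = np.random.default_rng(seed)
    observed = two_means_ci(X)
    mean, cov = X.mean(axis=0), np.cov(X, rowvar=False)
    null = [two_means_ci(rng.multivariate_normal(mean, cov, size=len(X)))
            for _ in range(n_sim)]
    # A smaller cluster index means stronger two-cluster structure.
    return (1 + sum(ci <= observed for ci in null)) / (n_sim + 1)
```

A small p-value suggests that the split is unlikely to arise from a single Gaussian cluster with the same covariance; in a full hierarchical procedure this test would be repeated at successive nodes with an adjusted significance threshold.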

  17. Semisupervised Clustering by Iterative Partition and Regression with Neuroscience Applications

    PubMed Central

    Qian, Guoqi; Wu, Yuehua; Ferrari, Davide; Qiao, Puxue; Hollande, Frédéric

    2016-01-01

    Regression clustering is a statistical learning and data mining method that combines unsupervised and supervised learning, and it is used in a wide range of applications, including artificial intelligence and neuroscience. It performs unsupervised learning when it clusters the data according to their respective unobserved regression hyperplanes. The method also performs supervised learning when it fits regression hyperplanes to the corresponding data clusters. Applying regression clustering in practice requires means of determining the underlying number of clusters in the data, finding the cluster label of each data point, and estimating the regression coefficients of the model. In this paper, we review the estimation and selection issues in regression clustering with regard to least squares and robust statistical methods. We also provide a model-selection-based technique to determine the number of regression clusters underlying the data. We further develop a computing procedure for regression clustering estimation and selection. Finally, simulation studies are presented for assessing the procedure, together with an analysis of a real data set on RGB cell marking in neuroscience to illustrate and interpret the method. PMID:27212939
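The alternation described above (cluster points by their fitted hyperplane, then refit each cluster) can be sketched for the simplest case: straight lines in one predictor, fitted by ordinary least squares. This is an illustration under stated assumptions, not the authors' procedure: it fixes the number of clusters `k` rather than selecting it, omits the robust estimators the paper reviews, and uses random restarts (a hypothetical choice) to escape poor local optima.

```python
import numpy as np

def regression_clustering(x, y, k=2, n_restarts=50, n_iter=100, seed=0):
    """Least-squares regression clustering of (x, y) into k lines.

    Alternates (1) assigning each point to the line with the smallest squared
    residual and (2) refitting each line to its points; keeps the restart with
    the lowest total squared error. Returns (sse, coefs, labels), where
    coefs[j] = (intercept, slope) of line j.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    X = np.column_stack([np.ones_like(x), x])  # design matrix with intercept
    best = (np.inf, None, None)
    for _ in range(n_restarts):
        # start each line from a randomly chosen pair of points
        starts = [rng.choice(len(x), 2, replace=False) for _ in range(k)]
        coefs = np.array([np.linalg.lstsq(X[s], y[s], rcond=None)[0] for s in starts])
        labels, ok = None, True
        for _ in range(n_iter):
            new = ((y[:, None] - X @ coefs.T) ** 2).argmin(axis=1)
            if labels is not None and np.array_equal(new, labels):
                break  # assignments stable: converged
            labels = new
            if np.bincount(labels, minlength=k).min() < X.shape[1]:
                ok = False  # a cluster is too small to fit a line
                break
            coefs = np.array([np.linalg.lstsq(X[labels == j], y[labels == j], rcond=None)[0]
                              for j in range(k)])
        if not ok:
            continue
        sse = ((y - (X * coefs[labels]).sum(axis=1)) ** 2).sum()
        if sse < best[0]:
            best = (sse, coefs.copy(), labels.copy())
    return best

# Synthetic data from two lines, y = 1 + 2x and y = 5 - x
rng = np.random.default_rng(42)
x = rng.uniform(0, 10, 200)
which = rng.integers(0, 2, 200)
y = np.where(which == 0, 1 + 2 * x, 5 - x) + rng.normal(0, 0.1, 200)
sse, coefs, labels = regression_clustering(x, y)
```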

  18. Near-infrared Variability in the Orion Nebula Cluster

    NASA Astrophysics Data System (ADS)

    Rice, Thomas S.; Reipurth, Bo; Wolk, Scott J.; Vaz, Luiz Paulo; Cross, N. J. G.

    2015-10-01

    Using UKIRT on Mauna Kea, we have carried out a new near-infrared J, H, K monitoring survey of almost a square degree of the star-forming Orion Nebula Cluster with observations on 120 nights over three observing seasons, spanning a total of 894 days. We monitored ~15,000 stars down to J ≈ 20 using the WFCAM instrument, and have extracted 1203 significantly variable stars from our data. By studying variability in young stellar objects (YSOs) in the H - K, K color-magnitude diagram, we are able to distinguish between physical mechanisms of variability. Many variables show color behavior indicating either dust-extinction or disk/accretion activity, but we find that when monitored for longer periods of time, a number of stars shift between these two variability mechanisms. Further, we show that the intrinsic timescale of disk/accretion variability in young stars is longer than that of dust-extinction variability. We confirm that variability amplitude is statistically correlated with evolutionary class in all bands and colors. Our investigations of these 1203 variables have revealed 73 periodic AA Tau type variables, many large-amplitude and long-period (P > 15 days) YSOs, including three stars showing widely spaced periodic brightening events consistent with circumbinary disk activity, and four new eclipsing binaries. These phenomena and others indicate the activity of long-term disk/accretion variability processes taking place in young stars. We have made the light curves and associated data for these 1203 variables available online.

  19. Soft sensor modeling based on variable partition ensemble method for nonlinear batch processes

    NASA Astrophysics Data System (ADS)

    Wang, Li; Chen, Xiangguang; Yang, Kai; Jin, Huaiping

    2017-01-01

    Batch processes are generally characterized by nonlinearity and system uncertainty; therefore, a conventional single model may be ill-suited. A local-learning soft sensor based on a variable partition ensemble method is developed for quality prediction in nonlinear and non-Gaussian batch processes. Multiple input variable sets are obtained by bootstrapping and the PMI criterion. Then, multiple local Gaussian process regression (GPR) models are developed, one for each local input variable set. When new test data arrive, the posterior probability of each best-performing local model is estimated by Bayesian inference and used to combine the local GPR models into the final prediction. The proposed soft sensor is demonstrated on an industrial fed-batch chlortetracycline fermentation process.

  20. Simultaneous clustering of gene expression data with clinical chemistry and pathological evaluations reveals phenotypic prototypes

    PubMed Central

    Bushel, Pierre R; Wolfinger, Russell D; Gibson, Greg

    2007-01-01

    Background Commonly employed clustering methods for analysis of gene expression data do not directly incorporate phenotypic data about the samples. Furthermore, clustering of samples with known phenotypes is typically performed in an informal fashion. The inability of clustering algorithms to incorporate biological data in the grouping process can limit proper interpretation of the data and its underlying biology. Results We present a more formal approach, the modk-prototypes algorithm, for clustering biological samples based on simultaneously considering microarray gene expression data and classes of known phenotypic variables such as clinical chemistry evaluations and histopathologic observations. The strategy involves constructing an objective function that uses the sum of squared Euclidean distances for numeric microarray and clinical chemistry data and simple matching for categorical histopathology values to measure the dissimilarity of samples. Separate weighting terms are used for microarray, clinical chemistry and histopathology measurements to control the influence of each data domain on the clustering of the samples. The dynamic validity index for numeric data was modified with a category utility measure for determining the number of clusters in the data sets. A cluster's prototype, formed from the mean of the values for numeric features and the mode of the categorical values of all the samples in the group, is representative of the phenotype of the cluster members. The approach is shown to work well with a simulated mixed data set and two real data examples containing numeric and categorical data types: one from a heart disease study and another from acetaminophen (an analgesic) exposure in rat liver, which causes centrilobular necrosis. 
Conclusion The modk-prototypes algorithm partitioned the simulated data into clusters with samples in their respective class group and the heart disease samples into two groups (sick and buff denoting samples having pain type representative of angina and non-angina respectively) with an accuracy of 79%. This is on par with, or better than, the assignment accuracy of the heart disease samples by several well-known and successful clustering algorithms. Following modk-prototypes clustering of the acetaminophen-exposed samples, informative genes from the cluster prototypes were identified that are descriptive of, and phenotypically anchored to, levels of necrosis of the centrilobular region of the rat liver. The biological processes cell growth and/or maintenance, amine metabolism, and stress response were shown to discern between no and moderate levels of acetaminophen-induced centrilobular necrosis. The use of well-known and traditional measurements directly in the clustering provides some guarantee that the resulting clusters will be meaningfully interpretable. PMID:17408499
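The modk-prototypes objective combines squared Euclidean distances for the numeric domains with simple matching for categorical histopathology, each domain weighted separately, and summarizes each cluster by a mean/mode prototype. A minimal sketch of those two pieces is below; the dictionary layout, the weight names and their values are hypothetical, and the sketch omits the clustering loop and the modified validity index used to choose the number of clusters.

```python
import numpy as np
from collections import Counter

def mixed_dissimilarity(a, b, w_expr=1.0, w_chem=1.0, w_hist=1.0):
    """Weighted dissimilarity between two samples, each a dict with
    'expr' (numeric expression values), 'chem' (numeric clinical chemistry)
    and 'hist' (list of categorical histopathology scores)."""
    d_expr = np.sum((np.asarray(a["expr"]) - np.asarray(b["expr"])) ** 2)
    d_chem = np.sum((np.asarray(a["chem"]) - np.asarray(b["chem"])) ** 2)
    # simple matching: count categorical attributes that disagree
    d_hist = sum(x != y for x, y in zip(a["hist"], b["hist"]))
    return w_expr * d_expr + w_chem * d_chem + w_hist * d_hist

def prototype(samples):
    """Cluster prototype: mean of numeric features, mode of each categorical one."""
    return {
        "expr": np.mean([s["expr"] for s in samples], axis=0),
        "chem": np.mean([s["chem"] for s in samples], axis=0),
        "hist": [Counter(col).most_common(1)[0][0]
                 for col in zip(*(s["hist"] for s in samples))],
    }

a = {"expr": [0.0, 1.0], "chem": [2.0], "hist": ["none", "mild"]}
b = {"expr": [1.0, 1.0], "chem": [0.0], "hist": ["none", "severe"]}
c = {"expr": [2.0, 1.0], "chem": [1.0], "hist": ["mild", "severe"]}
d_ab = mixed_dissimilarity(a, b, w_expr=1.0, w_chem=0.5, w_hist=2.0)
proto = prototype([a, b, c])
```

Raising `w_hist` relative to the numeric weights makes histopathology disagreements dominate the grouping, which is the lever the abstract describes for controlling each domain's influence.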

  1. Combinations of Personal Responsibility: Differences on Pre-service and Practicing Teachers’ Efficacy, Engagement, Classroom Goal Structures and Wellbeing

    PubMed Central

    Daniels, Lia M.; Radil, Amanda I.; Goegan, Lauren D.

    2017-01-01

    Pre-service and practicing teachers feel responsible for a range of educational activities. Four domains of personal responsibility have emerged in the literature: student achievement, student motivation, relationships with students, and responsibility for one's own teaching. To date, most research has used variable-centered approaches to examining responsibilities even though the domains appear related. In two separate samples we used cluster analysis to explore how pre-service (n = 130) and practicing (n = 105) teachers combined personal responsibilities and how these combinations affected three professional cognitions and their wellbeing. Both groups had low and high responsibility clusters, but the third cluster differed: pre-service teachers combined responsibilities for relationships and their own teaching in a cluster we refer to as teacher-based responsibility, whereas practicing teachers combined achievement and motivation in a cluster we refer to as student-outcome focused responsibility. These combinations affected outcomes for pre-service but not practicing teachers. Pre-service teachers in the low responsibility cluster reported less engagement, less use of mastery approaches to instruction, and more performance goal structures than the other two clusters. PMID:28620332

  2. Combinations of Personal Responsibility: Differences on Pre-service and Practicing Teachers' Efficacy, Engagement, Classroom Goal Structures and Wellbeing.

    PubMed

    Daniels, Lia M; Radil, Amanda I; Goegan, Lauren D

    2017-01-01

    Pre-service and practicing teachers feel responsible for a range of educational activities. Four domains of personal responsibility have emerged in the literature: student achievement, student motivation, relationships with students, and responsibility for one's own teaching. To date, most research has used variable-centered approaches to examining responsibilities even though the domains appear related. In two separate samples we used cluster analysis to explore how pre-service (n = 130) and practicing (n = 105) teachers combined personal responsibilities and how these combinations affected three professional cognitions and their wellbeing. Both groups had low and high responsibility clusters, but the third cluster differed: pre-service teachers combined responsibilities for relationships and their own teaching in a cluster we refer to as teacher-based responsibility, whereas practicing teachers combined achievement and motivation in a cluster we refer to as student-outcome focused responsibility. These combinations affected outcomes for pre-service but not practicing teachers. Pre-service teachers in the low responsibility cluster reported less engagement, less use of mastery approaches to instruction, and more performance goal structures than the other two clusters.

  3. Determination of gas-liquid partition coefficients of several organic solutes in trihexyl(tetradecyl)phosphonium bromide using capillary gas chromatography columns.

    PubMed

    Ronco, Nicolás R; Menestrina, Fiorella; Romero, Lílian M; Castells, Cecilia B

    2017-06-09

    In this paper, we report gas-liquid partition constants for thirty-five volatile organic solutes in the room-temperature ionic liquid trihexyl(tetradecyl)phosphonium bromide, measured by gas-liquid chromatography using capillary columns. The relative contributions of gas-liquid partition and interfacial adsorption to retention were evaluated by using columns with different phase ratios. Four capillary columns with exactly known phase ratios were constructed and used to measure solute retention factors at four temperatures between 313.15 and 343.15 K. The partition coefficients were calculated from the slopes of the linear regression of solute retention factors against the reciprocal of the phase ratio at a given temperature, according to gas-liquid chromatographic theory. Gas-liquid interfacial adsorption was detected for a few solutes and was taken into account when calculating their partition coefficients. Reliable infinite-dilution activity coefficients of the solutes can be obtained when retention is determined by a single partitioning mechanism. The partial molar excess enthalpies at infinite dilution were estimated from the temperature dependence of the experimental solute activity coefficients. The uncertainties of the experimental measurements and the main advantages of capillary columns for acquiring this thermodynamic information are discussed in detail. Copyright © 2017 Elsevier B.V. All rights reserved.
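Under the relation described above, retention factors measured on columns with different phase ratios β fall on a straight line in 1/β, with the partition coefficient as the slope. The sketch below assumes k = K_L/β + c, where the intercept c stands in for any interfacial-adsorption contribution and is taken to be the same for all columns (plausible for columns of equal internal diameter). The function name and the numbers in the check are synthetic, not the paper's data.

```python
import numpy as np

def partition_coefficient(phase_ratios, retention_factors):
    """Fit k = K_L * (1/beta) + c by least squares; return (K_L, c).

    K_L: gas-liquid partition coefficient (slope).
    c:   intercept, attributed here to interfacial adsorption (assumption).
    """
    inv_beta = 1.0 / np.asarray(phase_ratios, dtype=float)
    slope, intercept = np.polyfit(inv_beta, np.asarray(retention_factors, dtype=float), 1)
    return slope, intercept

# Synthetic check: four columns, true K_L = 150, adsorption term 0.02
betas = np.array([100.0, 200.0, 400.0, 800.0])
k = 150.0 / betas + 0.02
K_L, c = partition_coefficient(betas, k)
```

A clearly non-zero intercept is the signature of mixed retention; a solute whose line passes through the origin is retained by partitioning alone.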

  4. Assessment of mechanical properties of isolated bovine intervertebral discs from multi-parametric magnetic resonance imaging.

    PubMed

    Recuerda, Maximilien; Périé, Delphine; Gilbert, Guillaume; Beaudoin, Gilles

    2012-10-12

    The treatment planning of spine pathologies requires information on the rigidity and permeability of the intervertebral discs (IVDs). Magnetic resonance imaging (MRI) offers great potential as a sensitive and non-invasive technique for describing the mechanical properties of IVDs. However, the literature has reported small correlation coefficients between mechanical properties and MRI parameters. Our hypothesis is that the compressive modulus and the permeability of the IVD can be predicted by a linear combination of MRI parameters. Sixty IVDs were harvested from bovine tails and randomly assigned to four groups (in-situ, digested-6h, digested-18h, digested-24h). Multi-parametric MRI acquisitions were used to quantify the relaxation times T1 and T2, the magnetization transfer ratio MTR, the apparent diffusion coefficient ADC and the fractional anisotropy FA. Unconfined compression, confined compression and direct permeability measurements were performed to quantify the compressive moduli and the hydraulic permeabilities. Differences between groups were evaluated with a one-way ANOVA. Multilinear regressions were performed with the mechanical properties as dependent variables and the MRI parameters as independent variables to test our hypothesis. A principal component analysis was used to convert the set of possibly correlated variables into a set of linearly uncorrelated variables. Agglomerative hierarchical clustering was performed on the three principal components. Multilinear regressions showed that 45 to 80% of the variance in the Young's modulus E, the aggregate modulus in the absence of deformation HA0, the radial permeability kr and the axial permeability in the absence of deformation k0 can be explained by the MRI parameters within both the nucleus pulposus and the annulus fibrosus. The principal component analysis reduced our variables to two principal components with a cumulative variability of 52–65%, which increased to 70–82% when the third principal component was included. 
The dendrograms showed a natural division into four clusters for the nucleus pulposus and into three or four clusters for the annulus fibrosus. The compressive moduli and the permeabilities of isolated IVDs can be assessed mostly with magnetization transfer (MT) and diffusion sequences. However, the relationships need to be improved by including MRI parameters more sensitive to IVD degeneration. Before this technique is used to quantify the mechanical properties of IVDs in vivo in patients suffering from various diseases, the relationships have to be defined for each degeneration state of the tissue that mimics the pathology. Our MRI protocol, combined with principal component analysis and agglomerative hierarchical clustering, is a promising tool to classify degenerated intervertebral discs and further find biomarkers and predictive factors of the evolution of the pathologies.
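The dimension-reduction step can be sketched as a principal component analysis via the singular value decomposition, returning component scores and the cumulative explained-variance ratio (the quantity behind the 52–65% and 70–82% figures). Standardizing each MRI parameter first is an assumption here, not something the abstract states, and the input below is random placeholder data. The leading score columns could then be passed to an agglomerative routine such as SciPy's `scipy.cluster.hierarchy.linkage`; that step is not shown.

```python
import numpy as np

def pca_explained_variance(X):
    """PCA on standardized variables.

    Returns (scores, cumulative explained-variance ratio). Columns are
    standardized because the MRI parameters have different units (assumption).
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    var = s ** 2                      # proportional to each component's variance
    scores = U * s                    # principal component scores per disc
    return scores, np.cumsum(var) / var.sum()

# Placeholder: 60 discs x 5 parameters (e.g. T1, T2, MTR, ADC, FA)
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5))
scores, cum = pca_explained_variance(X)
first_three = scores[:, :3]           # input for hierarchical clustering
```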

  5. Recognition of building group patterns in topographic maps based on graph partitioning and random forest

    NASA Astrophysics Data System (ADS)

    He, Xianjin; Zhang, Xinchang; Xin, Qinchuan

    2018-02-01

    Recognition of building group patterns (i.e., the arrangement and form exhibited by a collection of buildings at a given mapping scale) is important for understanding and modeling geographic space and is hence essential to a wide range of downstream applications such as map generalization. Most existing methods develop rigid rules based on the topographic relationships between building pairs to identify building group patterns, and thus their applicability is often limited. This study proposes a method to identify a variety of building group patterns in support of map generalization. The method first identifies building group patterns from potential building clusters with a machine-learning algorithm and then partitions the building clusters with no recognized patterns using a graph partitioning method. The proposed method is applied to datasets from three cities that are representative of the complex urban environment in Southern China. Assessment against reference data suggests that the proposed method recognizes both regular (e.g., collinear, curvilinear, and rectangular) and irregular (e.g., L-shaped, H-shaped, and high-density) building group patterns well: correctness values are consistently close to 90% and completeness values are all above 91% for the three study areas. The proposed method shows promise for automated recognition of building group patterns in support of map generalization.
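The abstract does not specify the graph-partitioning method. One common choice for splitting building clusters, shown here purely as an assumed stand-in, is to build a minimum spanning tree over building centroids (Prim's algorithm) and cut every edge longer than a distance threshold. The threshold, the function name and the coordinates are hypothetical, and the random-forest pattern-recognition step is not reproduced.

```python
import numpy as np

def mst_partition(points, max_edge):
    """Group points by cutting minimum-spanning-tree edges longer than max_edge."""
    P = np.asarray(points, dtype=float)
    n = len(P)
    dist = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=2)
    # Prim's algorithm: grow the tree from point 0
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    best = dist[0].copy()              # cheapest link from each point to the tree
    parent = np.zeros(n, dtype=int)
    edges = []
    for _ in range(n - 1):
        j = int(np.where(in_tree, np.inf, best).argmin())
        edges.append((int(parent[j]), j, float(best[j])))
        in_tree[j] = True
        closer = dist[j] < best
        best = np.where(closer, dist[j], best)
        parent = np.where(closer, j, parent)
    # keep only short edges and label connected components (union-find)
    root = list(range(n))
    def find(i):
        while root[i] != i:
            root[i] = root[root[i]]
            i = root[i]
        return i
    for a, b, w in edges:
        if w <= max_edge:
            root[find(a)] = find(b)
    comp = [find(i) for i in range(n)]
    remap = {r: k for k, r in enumerate(dict.fromkeys(comp))}
    return [remap[r] for r in comp]

# Hypothetical centroids: three nearby buildings and a separate pair
groups = mst_partition([(0, 0), (1, 0), (0, 1), (10, 10), (11, 10)], max_edge=2.0)
```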

  6. Evidence of a Partitioned Dynamo Reversal Process from Paleomagnetic Recordings in Tahitian Lavas

    NASA Astrophysics Data System (ADS)

    Hoffman, K. A.; Mochizuki, N.

    2012-12-01

    Lavas that erupted at the Society hotspot during the Matuyama-Brunhes (M-B) reversal record transitional field behavior containing two tight, subhorizontal paleodirectional groups that, when averaged, are antipodal at the 95% confidence level and thus correlate to antipodal clusters of virtual geomagnetic poles (VGPs). These observations (data obtained from two published records of the M-B transition from distinct sections of a succession of flows on Tahiti) are associated with a time when the strength of the axial dipole was significantly reduced. One cluster was recorded by lavas that were not erupted in succession and involves a directional rebound, suggesting that significant time passed during this volcanic activity. The time spent forming the antipodal cluster is unknown, yet it resides in the same location as VGP clusters from four other transitional events obtained from Society hotspot lavas. VGPs calculated at the Society hotspot for both "polarities" of the 400-year averaged historic field, less the axial dipole term, are found in the cluster locations. These findings offer strong support for a two-tiered dynamo process in which nearly the entire axial dipole component undergoes both demise and regeneration quasi-independently from the remainder of the field, the proposed Shallow Core Generated (SCOR) field, whose pattern is tied to long-held physical conditions of the lowermost mantle. 
Apart from polarity reversal, such fixed magnetic features along the core-mantle boundary would also significantly influence the long-term pattern of global paleosecular variation and likely impose strict site-dependent limits on the utility of the geocentric axial dipole (GAD) hypothesis.

    Figure: Clustered Matuyama-Brunhes transitional VGPs reported from the Punaruu Valley (in red), along with the VGP associated with each sign ("polarity") of the 400-year mean historic NAD-field (in yellow) calculated from model gufm1 for the site of the Society hotspot.

  7. Mammographic images segmentation based on chaotic map clustering algorithm

    PubMed Central

    2014-01-01

    Background This work investigates the applicability of a novel clustering approach to the segmentation of mammographic digital images. The chaotic map clustering algorithm is used to group together similar subsets of image pixels, resulting in a medically meaningful partition of the mammogram. Methods The image is divided into pixel subsets characterized by a set of conveniently chosen features, and each of the corresponding points in the feature space is associated with a map. A mutual coupling strength between the maps, depending on the distance between the associated feature-space points, is then introduced. On the system of maps, the simulated evolution through chaotic dynamics leads to its natural partitioning, which corresponds to a particular segmentation scheme of the initial mammographic image. Results The system provides a high recognition rate for small mass lesions (about 94% correctly segmented inside the breast) and reproduces the shape of regions with denser micro-calcifications in about 2/3 of the cases, while being less effective at identifying larger mass lesions. Conclusions Due to the particularities of mammographic images, the chaotic map clustering algorithm should not be used as the sole method of segmentation. Rather, using it jointly with other segmentation techniques could increase segmentation performance and provide extra information for subsequent analysis stages such as the classification of the segmented ROI. PMID:24666766

  8. Distribution of Diverse Escherichia coli between Cattle and Pasture

    PubMed Central

    NandaKafle, Gitanjali; Seale, Tarren; Flint, Toby; Nepal, Madhav; Venter, Stephanus N.; Brözel, Volker S.

    2017-01-01

    Escherichia coli is widely considered not to survive for extended periods outside the intestines of warm-blooded animals; however, recent studies have demonstrated that E. coli strains maintain populations in soil and water without any known fecal contamination. The objective of this study was to investigate whether niche partitioning of E. coli occurs between cattle and their pasture. We attempted to clarify whether E. coli from bovine feces differs phenotypically and genotypically from isolates maintaining a population in pasture soil over winter. Soil, bovine fecal, and run-off samples were collected before and after the introduction of cattle to the pasture. In total, 363 isolates were genotyped by uidA and mutS sequences and phylogrouping, and evaluated for curli formation (red, dry and rough, or RDAR). Three types of clusters emerged: bovine-associated clusters; clusters devoid of cattle isolates, representing isolates endemic to the pasture environment; and clusters containing both. All isolates clustered with strains of E. coli sensu stricto, distinct from the cryptic species Clades I, III, IV, and V. Pasture-soil endemic and bovine fecal populations had very different phylogroup distributions, indicating niche partitioning. The soil endemic population was largely composed of phylogroup B1 and had a higher average RDAR score than other isolates. These results indicate the existence of environmental E. coli strains that are phylogenetically distinct from bovine fecal isolates and that are able to maintain populations in the soil environment. PMID:28747587

  9. Tropospheric Ozonesonde Profiles at Long-term U.S. Monitoring Sites: 1. A Climatology Based on Self-Organizing Maps

    NASA Technical Reports Server (NTRS)

    Stauffer, Ryan M.; Thompson, Anne M.; Young, George S.

    2016-01-01

    Sonde-based climatologies of tropospheric ozone (O3) are vital for developing satellite retrieval algorithms and evaluating chemical transport model output. Typical O3 climatologies average measurements by latitude or region, and season. A recent analysis using self-organizing maps (SOM) to cluster ozonesondes from two tropical sites found that clusters of O3 mixing ratio profiles are an excellent way to capture O3 variability and link meteorological influences to O3 profiles. Clusters correspond to distinct meteorological conditions, e.g., convection, subsidence, cloud cover, and transported pollution. Here the SOM technique is extended to four long-term U.S. sites (Boulder, CO; Huntsville, AL; Trinidad Head, CA; and Wallops Island, VA) with 4530 total profiles. Sensitivity tests on k-means algorithm and SOM justify use of 3×3 SOM (nine clusters). At each site, SOM clusters together O3 profiles with similar tropopause height, 500 hPa height/temperature, and amount of tropospheric and total column O3. Cluster means are compared to monthly O3 climatologies. For all four sites, near-tropopause O3 is double (over +100 parts per billion by volume; ppbv) the monthly climatological O3 mixing ratio in three clusters that contain 13–16% of profiles, mostly in winter and spring. Large mid-tropospheric deviations from monthly means (−6 ppbv, +7–10 ppbv O3 at 6 km) are found in two of the most populated clusters (combined 36–39% of profiles). These two clusters contain distinctly polluted (summer) and clean O3 (fall-winter, high tropopause) profiles, respectively. As for tropical profiles previously analyzed with SOM, O3 averages are often poor representations of U.S. O3 profile statistics.

  10. Tropospheric ozonesonde profiles at long-term U.S. monitoring sites: 1. A climatology based on self-organizing maps

    PubMed Central

    Stauffer, Ryan M.; Thompson, Anne M.; Young, George S.

    2018-01-01

    Sonde-based climatologies of tropospheric ozone (O3) are vital for developing satellite retrieval algorithms and evaluating chemical transport model output. Typical O3 climatologies average measurements by latitude or region, and season. Recent analysis using self-organizing maps (SOM) to cluster ozonesondes from two tropical sites found clusters of O3 mixing ratio profiles are an excellent way to capture O3 variability and link meteorological influences to O3 profiles. Clusters correspond to distinct meteorological conditions, e.g. convection, subsidence, cloud cover, and transported pollution. Here, the SOM technique is extended to four long-term U.S. sites (Boulder, CO; Huntsville, AL; Trinidad Head, CA; Wallops Island, VA) with 4530 total profiles. Sensitivity tests on k-means algorithm and SOM justify use of 3×3 SOM (nine clusters). At each site, SOM clusters together O3 profiles with similar tropopause height, 500 hPa height/temperature, and amount of tropospheric and total column O3. Cluster means are compared to monthly O3 climatologies. For all four sites, near-tropopause O3 is double (over +100 parts per billion by volume; ppbv) the monthly climatological O3 mixing ratio in three clusters that contain 13 – 16% of profiles, mostly in winter and spring. Large mid-tropospheric deviations from monthly means (−6 ppbv, +7 – 10 ppbv O3 at 6 km) are found in two of the most populated clusters (combined 36 – 39% of profiles). These two clusters contain distinctly polluted (summer) and clean O3 (fall-winter, high tropopause) profiles, respectively. As for tropical profiles previously analyzed with SOM, O3 averages are often poor representations of U.S. O3 profile statistics. PMID:29619288

  11. Tropospheric ozonesonde profiles at long-term U.S. monitoring sites: 1. A climatology based on self-organizing maps.

    PubMed

    Stauffer, Ryan M; Thompson, Anne M; Young, George S

    2016-02-16

    Sonde-based climatologies of tropospheric ozone (O3) are vital for developing satellite retrieval algorithms and evaluating chemical transport model output. Typical O3 climatologies average measurements by latitude or region, and season. Recent analysis using self-organizing maps (SOM) to cluster ozonesondes from two tropical sites found clusters of O3 mixing ratio profiles are an excellent way to capture O3 variability and link meteorological influences to O3 profiles. Clusters correspond to distinct meteorological conditions, e.g. convection, subsidence, cloud cover, and transported pollution. Here, the SOM technique is extended to four long-term U.S. sites (Boulder, CO; Huntsville, AL; Trinidad Head, CA; Wallops Island, VA) with 4530 total profiles. Sensitivity tests on k-means algorithm and SOM justify use of 3×3 SOM (nine clusters). At each site, SOM clusters together O3 profiles with similar tropopause height, 500 hPa height/temperature, and amount of tropospheric and total column O3. Cluster means are compared to monthly O3 climatologies. For all four sites, near-tropopause O3 is double (over +100 parts per billion by volume; ppbv) the monthly climatological O3 mixing ratio in three clusters that contain 13–16% of profiles, mostly in winter and spring. Large mid-tropospheric deviations from monthly means (−6 ppbv, +7–10 ppbv O3 at 6 km) are found in two of the most populated clusters (combined 36–39% of profiles). These two clusters contain distinctly polluted (summer) and clean O3 (fall-winter, high tropopause) profiles, respectively. As for tropical profiles previously analyzed with SOM, O3 averages are often poor representations of U.S. O3 profile statistics.
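A self-organizing map like the 3×3 SOM described in these records can be sketched in NumPy. Each profile (for instance, O3 mixing ratio on a fixed set of altitude levels, which is an assumed encoding) is matched to its best-matching node, and that node and its grid neighbours move toward the profile; after training, each node's members form one cluster. The learning-rate and neighbourhood schedules below are illustrative defaults, not the authors' configuration, and the input is random placeholder data.

```python
import numpy as np

def train_som(data, rows=3, cols=3, n_epochs=50, lr0=0.5, sigma0=1.5, seed=0):
    """Train a rectangular SOM; returns node weights of shape (rows*cols, n_features)."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # initialize nodes from randomly chosen profiles
    weights = data[rng.choice(len(data), rows * cols, replace=False)].copy()
    n_steps, step = n_epochs * len(data), 0
    for _ in range(n_epochs):
        for x in data[rng.permutation(len(data))]:
            frac = step / n_steps
            lr = lr0 * (1 - frac)                   # linearly decaying learning rate
            sigma = sigma0 * (1 - frac) + 0.01      # shrinking neighbourhood radius
            bmu = np.argmin(((weights - x) ** 2).sum(axis=1))
            g2 = ((grid - grid[bmu]) ** 2).sum(axis=1)
            h = np.exp(-g2 / (2 * sigma ** 2))      # Gaussian neighbourhood on the grid
            weights += lr * h[:, None] * (x - weights)
            step += 1
    return weights

def assign_clusters(data, weights):
    """Index of the best-matching node (0..8 for a 3x3 map) for each profile."""
    d = ((np.asarray(data, dtype=float)[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

# Placeholder: 200 profiles sampled at 20 altitude levels
profiles = np.random.default_rng(1).normal(size=(200, 20))
weights = train_som(profiles)
labels = assign_clusters(profiles, weights)
```

Cluster means can then be compared with monthly climatologies by averaging the profiles that share a node.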

  12. Multi-temporal clustering of continental floods and associated atmospheric circulations

    NASA Astrophysics Data System (ADS)

    Liu, Jianyu; Zhang, Yongqiang

    2017-12-01

    Investigating clustering of floods has important social, economic and ecological implications. This study examines the clustering of Australian floods at different temporal scales and its possible physical mechanisms. Flood series with different severities are obtained by peaks-over-threshold (POT) sampling at four flood thresholds. At the intra-annual scale, Cox regression and monthly frequency methods are used to examine whether and when flood clustering occurs, respectively. At the inter-annual scale, dispersion indices with four time-variation windows are applied to investigate inter-annual flood clustering and its variation. Furthermore, kernel occurrence rate estimation and bootstrap resampling are used to identify flood-rich and flood-poor periods. Finally, the seasonal variation of horizontal wind at 850 hPa and vertical wind velocity at 500 hPa is used to investigate the possible mechanisms causing temporal flood clustering. Our results show that: (1) flood occurrences exhibit clustering at the intra-annual scale, which is regulated by climate indices representing the impacts of the Pacific and Indian Oceans; (2) the flood-rich months occur from January to March over northern Australia, and from July to September over southwestern and southeastern Australia; (3) stronger inter-annual clustering takes place across southern Australia than northern Australia; and (4) Australian floods are characterised by regional flood-rich and flood-poor periods, with 1987–1992 identified as the flood-rich period across southern Australia but the flood-poor period across northern Australia, and 2001–2006 being the flood-poor period across most regions of Australia. The intra-annual and inter-annual clustering and temporal variation of flood occurrences are in accordance with the variation of atmospheric circulation. 
These results provide relevant information for flood management under the influence of climate variability, and, therefore, are helpful for developing flood hazard mitigation schemes.
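Two of the building blocks named above can be sketched simply: peaks-over-threshold sampling, which keeps one peak per run of exceedances, and a dispersion index (variance-to-mean ratio of event counts per window), which is about 1 for a Poisson process and above 1 when events cluster. This is a simplification: practical POT flood series also impose an independence criterion between successive peaks, and the function names, window lengths and data here are hypothetical.

```python
import numpy as np

def pot_events(flow, threshold):
    """Peaks-over-threshold: index of the maximum of each run above the threshold."""
    flow = np.asarray(flow, dtype=float)
    above = flow > threshold
    events, i = [], 0
    while i < len(flow):
        if above[i]:
            j = i
            while j < len(flow) and above[j]:
                j += 1
            events.append(i + int(np.argmax(flow[i:j])))  # one peak per exceedance run
            i = j
        else:
            i += 1
    return events

def dispersion_index(event_times, t_end, window):
    """Variance-to-mean ratio of event counts in consecutive windows of length `window`.
    About 1 for a Poisson process; > 1 suggests clustering, < 1 regularity."""
    edges = np.arange(0, t_end + window, window)
    counts, _ = np.histogram(event_times, bins=edges)
    return counts.var(ddof=1) / counts.mean()

peaks = pot_events([1, 5, 7, 3, 1, 8, 2], threshold=4)
clustered = dispersion_index([0.5, 0.6, 0.7, 0.8, 5.5], t_end=6, window=1)
regular = dispersion_index([0.5, 1.5, 2.5, 3.5, 4.5, 5.5], t_end=6, window=1)
```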

  13. Inter- and Intra-individual Variability in Response to Transcranial Direct Current Stimulation (tDCS) at Varying Current Intensities.

    PubMed

    Chew, Taariq; Ho, Kerrie-Anne; Loo, Colleen K

    2015-01-01

    Translation of transcranial direct current stimulation (tDCS) from research to clinical practice is hindered by a lack of consensus on optimal stimulation parameters, significant inter-individual variability in response, and insufficient intra-individual reliability data. Inter-individual differences in response to anodal tDCS at a range of current intensities were explored. Intra-individual reliability in response to anodal tDCS across two identical sessions was also investigated. Twenty-nine subjects participated in a crossover study. Anodal tDCS at four different current intensities (0.2, 0.5, 1 and 2 mA), with an anode size of 16 cm², was tested. The 0.5 mA condition was repeated to assess intra-individual variability. TMS was used to elicit 40 motor-evoked potentials (MEPs) before 10 min of tDCS, and 20 MEPs at four time-points over 30 min following tDCS. ANOVA revealed no main effect of TIME for any condition except the first 0.5 mA condition, and no differences in response between the four current intensities. Cluster analysis identified two clusters for the 0.2 and 2 mA conditions only. Frequency distributions based on individual subject responses (excitatory, inhibitory or no response) to each condition indicate possible differential responses between individuals to different current intensities. Test-retest reliability was negligible (ICC(2,1) = -0.50). Significant inter-individual variability in response to tDCS across a range of current intensities was found. 2 mA and 0.2 mA tDCS were most effective at inducing a distinct response. Significant intra-individual variability in response to tDCS was also found. This has implications for interpreting results of single-session tDCS experiments. Crown Copyright © 2015. Published by Elsevier Inc. All rights reserved.
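The reliability statistic reported above, ICC(2,1), is the two-way random-effects, absolute-agreement, single-measurement intraclass correlation in the Shrout and Fleiss convention. A minimal sketch computing it from a subjects × sessions table via the two-way ANOVA mean squares is below; the input layout and the example table are hypothetical. Values below zero, such as the -0.50 reported here, arise when the between-subject mean square is smaller than the error mean square.

```python
import numpy as np

def icc_2_1(Y):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.
    Y is an (n_subjects, k_sessions) table of responses."""
    Y = np.asarray(Y, dtype=float)
    n, k = Y.shape
    grand = Y.mean()
    ss_rows = k * ((Y.mean(axis=1) - grand) ** 2).sum()   # between subjects
    ss_cols = n * ((Y.mean(axis=0) - grand) ** 2).sum()   # between sessions
    ss_err = ((Y - grand) ** 2).sum() - ss_rows - ss_cols  # residual
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Hypothetical 6 subjects x 4 sessions
Y = [[9, 2, 5, 8], [6, 1, 3, 2], [8, 4, 6, 8],
     [7, 1, 2, 6], [10, 5, 6, 9], [6, 2, 4, 7]]
icc = icc_2_1(Y)
```

For a test-retest design like the repeated 0.5 mA condition, the table would have k = 2 columns, one per session.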

  14. Scalable clustering algorithms for continuous environmental flow cytometry.

    PubMed

    Hyrkas, Jeremy; Clayton, Sophie; Ribalet, Francois; Halperin, Daniel; Armbrust, E Virginia; Howe, Bill

    2016-02-01

    Recent technological innovations in flow cytometry now allow oceanographers to collect high-frequency flow cytometry data from particles in aquatic environments on a scale far surpassing conventional flow cytometers. The SeaFlow cytometer continuously profiles microbial phytoplankton populations across thousands of kilometers of the surface ocean. The data streams produced by instruments such as SeaFlow challenge the traditional sample-by-sample approach in cytometric analysis and highlight the need for scalable clustering algorithms to extract population information from these large-scale, high-frequency flow cytometers. We explore how available algorithms commonly used for medical applications perform at classifying such large-scale environmental flow cytometry data. We apply large-scale Gaussian mixture models to massive datasets using Hadoop. This approach outperforms current state-of-the-art cytometry classification algorithms in accuracy and can be coupled with manual or automatic partitioning of data into homogeneous sections for further classification gains. We propose the Gaussian mixture model with partitioning approach for classification of large-scale, high-frequency flow cytometry data. Source code is available for download at https://github.com/jhyrkas/seaflow_cluster, implemented in Java for use with Hadoop. Contact: hyrkas@cs.washington.edu. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
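The core of a Gaussian mixture model is the expectation-maximization (EM) loop. The sketch below shows that loop on one-dimensional data on a single machine. It assumes a fixed number of components with user-supplied starting means and is not the authors' Java/Hadoop implementation; the function name is hypothetical. At scale, the per-point E-step sums are the part that would be distributed.

```python
import math


def gmm_em_1d(xs, init_means, n_iter=100):
    """Fit a 1-D Gaussian mixture by expectation-maximization (EM).
    Returns (weights, means, variances)."""
    k = len(init_means)
    w, mu, var = [1.0 / k] * k, list(init_means), [1.0] * k
    for _ in range(n_iter):
        # E-step: responsibilities P(component j | x), computed in log space.
        resp = []
        for x in xs:
            logp = [math.log(w[j]) - 0.5 * math.log(2 * math.pi * var[j])
                    - (x - mu[j]) ** 2 / (2 * var[j]) for j in range(k)]
            top = max(logp)
            p = [math.exp(lp - top) for lp in logp]
            total = sum(p)
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weights, means and variances from responsibilities.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            w[j] = max(nj / len(xs), 1e-12)
            mu[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var[j] = max(sum(r[j] * (x - mu[j]) ** 2
                             for r, x in zip(resp, xs)) / nj, 1e-6)
    return w, mu, var


# Two well-separated synthetic "populations" (hypothetical data).
weights, means, variances = gmm_em_1d(
    [0.0, 0.2, -0.1, 0.1, 10.0, 10.1, 9.9, 10.2], [1.0, 9.0])
print(weights, means, variances)
```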

  15. A Tutorial on Multilevel Survival Analysis: Methods, Models and Applications

    PubMed Central

    Austin, Peter C.

    2017-01-01

    Summary: Data that have a multilevel structure occur frequently across a range of disciplines, including epidemiology, health services research, public health, education and sociology. We describe three families of regression models for the analysis of multilevel survival data. First, Cox proportional hazards models with mixed effects incorporate cluster-specific random effects that modify the baseline hazard function. Second, piecewise exponential survival models partition the duration of follow-up into mutually exclusive intervals and fit a model that assumes that the hazard function is constant within each interval. This is equivalent to a Poisson regression model that incorporates the duration of exposure within each interval. By incorporating cluster-specific random effects, generalised linear mixed models can be used to analyse these data. Third, after partitioning the duration of follow-up into mutually exclusive intervals, one can use discrete time survival models that use a complementary log–log generalised linear model to model the occurrence of the outcome of interest within each interval. Random effects can be incorporated to account for within-cluster homogeneity in outcomes. We illustrate the application of these methods using data on patients hospitalised with a heart attack, and using three statistical programming languages (R, SAS and Stata). PMID:29307954
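The piecewise exponential model starts by splitting each subject's follow-up into person-period records, one per interval, each with an exposure time and an event indicator. The sketch below assumes follow-up starts at time 0, `cuts[0] == 0`, and that subjects still at risk after the last cut are censored there. Without covariates or random effects, the maximum-likelihood constant hazard in each interval is events divided by exposure. A Poisson regression with log(exposure) as an offset estimates this quantity, and a mixed model would add cluster-specific random effects on top. The function names are hypothetical.

```python
def split_follow_up(time, event, cuts):
    """Split one subject's follow-up (time, event) at the boundaries `cuts`
    (e.g. [0, 30, 90, 365]) into person-period records."""
    rows = []
    for j in range(len(cuts) - 1):
        start, end = cuts[j], cuts[j + 1]
        if time <= start:
            break  # subject no longer at risk in this or later intervals
        exposure = min(time, end) - start
        # The event is counted only in the interval in which follow-up ends.
        event_here = 1 if (event and time <= end) else 0
        rows.append({"interval": j, "exposure": exposure, "event": event_here})
    return rows


def interval_hazards(subjects, cuts):
    """Constant hazard per interval = total events / total exposure."""
    events = [0] * (len(cuts) - 1)
    exposure = [0.0] * (len(cuts) - 1)
    for time, event in subjects:
        for row in split_follow_up(time, event, cuts):
            events[row["interval"]] += row["event"]
            exposure[row["interval"]] += row["exposure"]
    return [e / t if t > 0 else float("nan") for e, t in zip(events, exposure)]


# Subject 1 dies on day 45; subject 2 is censored on day 100.
print(split_follow_up(45, 1, [0, 30, 90, 365]))
print(interval_hazards([(45, 1), (100, 0)], [0, 30, 90]))
```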

  16. A Tutorial on Multilevel Survival Analysis: Methods, Models and Applications.

    PubMed

    Austin, Peter C

    2017-08-01

    Data that have a multilevel structure occur frequently across a range of disciplines, including epidemiology, health services research, public health, education and sociology. We describe three families of regression models for the analysis of multilevel survival data. First, Cox proportional hazards models with mixed effects incorporate cluster-specific random effects that modify the baseline hazard function. Second, piecewise exponential survival models partition the duration of follow-up into mutually exclusive intervals and fit a model that assumes that the hazard function is constant within each interval. This is equivalent to a Poisson regression model that incorporates the duration of exposure within each interval. By incorporating cluster-specific random effects, generalised linear mixed models can be used to analyse these data. Third, after partitioning the duration of follow-up into mutually exclusive intervals, one can use discrete time survival models that use a complementary log-log generalised linear model to model the occurrence of the outcome of interest within each interval. Random effects can be incorporated to account for within-cluster homogeneity in outcomes. We illustrate the application of these methods using data on patients hospitalised with a heart attack, and using three statistical programming languages (R, SAS and Stata).

  17. Relation between financial market structure and the real economy: comparison between clustering methods.

    PubMed

    Musmeci, Nicoló; Aste, Tomaso; Di Matteo, T

    2015-01-01

    We quantify the amount of information filtered by different hierarchical clustering methods on correlations between stock returns, comparing the clustering structure with the underlying industrial activity classification. We apply, for the first time to financial data, a novel hierarchical clustering approach, the Directed Bubble Hierarchical Tree, and we compare it with other methods including Linkage and k-medoids. By taking the industrial sector classification of stocks as a benchmark partition, we evaluate how the different methods retrieve this classification. The results show that the Directed Bubble Hierarchical Tree can outperform other methods, being able to retrieve more information with fewer clusters. Moreover, we show that the economic information is hidden at different levels of the hierarchical structures depending on the clustering method. The dynamical analysis on a rolling window also reveals that the different methods show different degrees of sensitivity to events affecting financial markets, like crises. These results can be of interest for all the applications of clustering methods to portfolio optimization and risk hedging [corrected].
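Hierarchical clustering of stocks needs a distance derived from return correlations. The abstract does not state which transform is used, so the sketch below assumes the common choice d = sqrt(2(1 − ρ)) and then applies single linkage, one of the "Linkage" family, by merging the closest pairs in the style of Kruskal's algorithm until k groups remain. It does not implement the Directed Bubble Hierarchical Tree. The tickers, returns and function names are hypothetical.

```python
import math


def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def correlation_distance(returns):
    """Pairwise d = sqrt(2 * (1 - rho)); 0 for rho = 1, 2 for rho = -1."""
    names = list(returns)
    return {(p, q): math.sqrt(max(0.0, 2 * (1 - pearson(returns[p], returns[q]))))
            for i, p in enumerate(names) for q in names[i + 1:]}


def single_linkage(names, dist, k):
    """Merge the closest pairs (union-find) until k clusters remain."""
    parent = {n: n for n in names}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    clusters = len(names)
    for (p, q), _ in sorted(dist.items(), key=lambda item: item[1]):
        if clusters == k:
            break
        rp, rq = find(p), find(q)
        if rp != rq:
            parent[rp] = rq
            clusters -= 1
    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


returns = {"A": [1, 2, 3, 4, 5], "B": [1.1, 2.0, 3.2, 3.9, 5.1],
           "C": [5, 4, 3, 2, 1], "D": [5.2, 3.9, 3.1, 2.0, 0.8]}
print(single_linkage(list(returns), correlation_distance(returns), 2))
```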

  18. EVIDENCE FOR THE UNIVERSALITY OF PROPERTIES OF RED-SEQUENCE GALAXIES IN X-RAY- AND RED-SEQUENCE-SELECTED CLUSTERS AT z ∼ 1

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Foltz, R.; Wilson, G.; DeGroot, A.

    We study the slope, intercept, and scatter of the color–magnitude and color–mass relations for a sample of 10 infrared red-sequence-selected clusters at z ∼ 1. The quiescent galaxies in these clusters formed the bulk of their stars at z ≳ 3 with an age spread Δt ≳ 1 Gyr. We compare UVJ color–color and spectroscopic-based galaxy selection techniques, and find a 15% difference in the galaxy populations classified as quiescent by these methods. We compare the color–magnitude relations from our red-sequence-selected sample with X-ray- and photometric-redshift-selected cluster samples of similar mass and redshift. Within uncertainties, we are unable to detect any difference in the ages and star formation histories of quiescent cluster members in clusters selected by different methods, suggesting that the dominant quenching mechanism is insensitive to cluster baryon partitioning at z ∼ 1.

  19. Relation between Financial Market Structure and the Real Economy: Comparison between Clustering Methods

    PubMed Central

    Musmeci, Nicoló; Aste, Tomaso; Di Matteo, T.

    2015-01-01

    We quantify the amount of information filtered by different hierarchical clustering methods on correlations between stock returns, comparing the clustering structure with the underlying industrial activity classification. We apply, for the first time to financial data, a novel hierarchical clustering approach, the Directed Bubble Hierarchical Tree, and we compare it with other methods including Linkage and k-medoids. By taking the industrial sector classification of stocks as a benchmark partition, we evaluate how the different methods retrieve this classification. The results show that the Directed Bubble Hierarchical Tree can outperform other methods, being able to retrieve more information with fewer clusters. Moreover, we show that the economic information is hidden at different levels of the hierarchical structures depending on the clustering method. The dynamical analysis on a rolling window also reveals that the different methods show different degrees of sensitivity to events affecting financial markets, like crises. These results can be of interest for all the applications of clustering methods to portfolio optimization and risk hedging. PMID:25786703

  20. Roles of N-terminal fatty acid acylations in membrane compartment partitioning: Arabidopsis h-type thioredoxins as a case study.

    PubMed

    Traverso, José A; Micalella, Chiara; Martinez, Aude; Brown, Spencer C; Satiat-Jeunemaître, Béatrice; Meinnel, Thierry; Giglione, Carmela

    2013-03-01

    N-terminal fatty acylations (N-myristoylation [MYR] and S-palmitoylation [PAL]) are crucial modifications affecting 2 to 4% of eukaryotic proteins. The role of these modifications is to target proteins to membranes. Predictive tools have revealed unexpected targets of these acylations in Arabidopsis thaliana and other plants. However, little is known about how N-terminal lipidation governs membrane compartmentalization of proteins in plants. We show here that h-type thioredoxins (h-TRXs) cluster in four evolutionary subgroups displaying strictly conserved N-terminal modifications. It was predicted that one subgroup undergoes only MYR and another undergoes both MYR and PAL. We used plant TRXs as a model protein family to explore the effect of MYR alone or MYR and PAL in the same family of proteins. We used a high-throughput biochemical strategy to assess MYR of specific TRXs. Moreover, various TRX-green fluorescent protein fusions revealed that MYR localized proteins to the endomembrane system and that partitioning between this membrane compartment and the cytosol correlated with the catalytic efficiency of the N-myristoyltransferase acting at the N terminus of the TRXs. Generalization of these results was obtained using several randomly selected Arabidopsis proteins displaying a MYR site only. Finally, we demonstrated that a palmitoylatable Cys residue flanking the MYR site is crucial to localize proteins to micropatching zones of the plasma membrane.

  1. Roles of N-Terminal Fatty Acid Acylations in Membrane Compartment Partitioning: Arabidopsis h-Type Thioredoxins as a Case Study

    PubMed Central

    Traverso, José A.; Micalella, Chiara; Martinez, Aude; Brown, Spencer C.; Satiat-Jeunemaître, Béatrice; Meinnel, Thierry; Giglione, Carmela

    2013-01-01

    N-terminal fatty acylations (N-myristoylation [MYR] and S-palmitoylation [PAL]) are crucial modifications affecting 2 to 4% of eukaryotic proteins. The role of these modifications is to target proteins to membranes. Predictive tools have revealed unexpected targets of these acylations in Arabidopsis thaliana and other plants. However, little is known about how N-terminal lipidation governs membrane compartmentalization of proteins in plants. We show here that h-type thioredoxins (h-TRXs) cluster in four evolutionary subgroups displaying strictly conserved N-terminal modifications. It was predicted that one subgroup undergoes only MYR and another undergoes both MYR and PAL. We used plant TRXs as a model protein family to explore the effect of MYR alone or MYR and PAL in the same family of proteins. We used a high-throughput biochemical strategy to assess MYR of specific TRXs. Moreover, various TRX–green fluorescent protein fusions revealed that MYR localized proteins to the endomembrane system and that partitioning between this membrane compartment and the cytosol correlated with the catalytic efficiency of the N-myristoyltransferase acting at the N terminus of the TRXs. Generalization of these results was obtained using several randomly selected Arabidopsis proteins displaying a MYR site only. Finally, we demonstrated that a palmitoylatable Cys residue flanking the MYR site is crucial to localize proteins to micropatching zones of the plasma membrane. PMID:23543785

  2. Using cluster analysis to identify phenotypes and validation of mortality in men with COPD.

    PubMed

    Chen, Chiung-Zuei; Wang, Liang-Yi; Ou, Chih-Ying; Lee, Cheng-Hung; Lin, Chien-Chung; Hsiue, Tzuen-Ren

    2014-12-01

    Cluster analysis has been proposed to examine phenotypic heterogeneity in chronic obstructive pulmonary disease (COPD). The aim of this study was to use cluster analysis to define COPD phenotypes and validate them by assessing their relationship with mortality. Male subjects with COPD were recruited to identify and validate COPD phenotypes. Seven variables relevant to COPD were assessed: age, FEV1 % predicted, BMI, history of severe exacerbations, mMRC, SpO2, and Charlson index. COPD groups were identified by cluster analysis and validated prospectively against mortality during a 4-year follow-up. Analysis of 332 COPD subjects identified five clusters, labelled A to E. Assessment of the predictive validity of these clusters showed that cluster E patients had higher all-cause mortality (HR 18.3, p < 0.0001) and respiratory-cause mortality (HR 21.5, p < 0.0001) than those in the other four groups. Cluster E patients also had higher all-cause mortality (HR 14.3, p = 0.0002) and respiratory-cause mortality (HR 10.1, p = 0.0013) than patients in cluster D alone. COPD patients with severe airflow limitation, many symptoms, and a history of frequent severe exacerbations formed a novel and distinct clinical phenotype predicting mortality in men with COPD.

  3. A genetic graph-based approach for partitional clustering.

    PubMed

    Menéndez, Héctor D; Barrero, David F; Camacho, David

    2014-05-01

    Clustering is one of the most versatile tools for data analysis. In recent years, clustering that seeks the continuity of data (as opposed to classical centroid-based approaches) has attracted increasing research interest. It is a challenging problem of remarkable practical interest. The most popular continuity clustering method is the spectral clustering (SC) algorithm, which is based on graph cut: it first generates a similarity graph using a distance measure and then studies the graph spectrum to find the best cut. This approach is sensitive to the parameters of the metric, and a correct parameter choice is critical to the quality of the clustering. This work proposes a new algorithm, inspired by SC, that reduces the parameter dependency while maintaining the quality of the solution. The new algorithm, named genetic graph-based clustering (GGC), takes an evolutionary approach, using a genetic algorithm (GA) to cluster the similarity graph. The experimental validation shows that GGC increases the robustness of SC and performs competitively with classical clustering methods, at least on the synthetic and real datasets used in the experiments.
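The spectral clustering baseline described above can be sketched for the two-cluster case. The code builds a similarity graph, forms the graph Laplacian, and cuts by the sign of its second eigenvector (the Fiedler vector). It assumes a Gaussian (RBF) similarity with kernel width `sigma`, which plays the role of the sensitive metric parameter the abstract mentions, and the unnormalized Laplacian L = D − W. To stay within the standard library, it finds the eigenvector by power iteration. This is standard SC, not the authors' GGC algorithm, and the function name is hypothetical.

```python
import math


def spectral_bipartition(points, sigma=1.0, n_iter=500):
    """Split points into two groups by the sign of the Fiedler vector of the
    unnormalized graph Laplacian L = D - W of a Gaussian similarity graph."""
    n = len(points)
    # Fully connected similarity graph; sigma is the kernel parameter that
    # spectral clustering is sensitive to.
    w = [[0.0 if i == j else
          math.exp(-math.dist(points[i], points[j]) ** 2 / (2 * sigma ** 2))
          for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in w]
    # Eigenvalues of L lie in [0, 2 * max degree], so M = c*I - L is positive
    # semidefinite, and its dominant eigenvector orthogonal to the all-ones
    # vector is the Fiedler vector (second-smallest eigenvector of L).
    c = 2 * max(deg)
    v = [float(i) for i in range(n)]  # deterministic start vector
    for _ in range(n_iter):
        mean = sum(v) / n
        v = [x - mean for x in v]  # project out the trivial constant eigenvector
        mv = [c * v[i] - (deg[i] * v[i] - sum(w[i][j] * v[j] for j in range(n)))
              for i in range(n)]
        norm = math.sqrt(sum(x * x for x in mv))
        v = [x / norm for x in mv]
    return [0 if x < 0 else 1 for x in v]


pts = [(0, 0), (0.1, 0), (0.2, 0), (5, 0), (5.1, 0), (5.2, 0)]
print(spectral_bipartition(pts))
```

Rerunning with a very small or very large `sigma` shows the parameter dependency the abstract describes. If `sigma` is too small, the graph has almost no edges; if it is too large, all points look alike.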

  4. A ground truth based comparative study on clustering of gene expression data.

    PubMed

    Zhu, Yitan; Wang, Zuyi; Miller, David J; Clarke, Robert; Xuan, Jianhua; Hoffman, Eric P; Wang, Yue

    2008-05-01

    Given the variety of available clustering methods for gene expression data analysis, it is important to develop an appropriate and rigorous validation scheme to assess the performance and limitations of the most widely used clustering algorithms. In this paper, we present a ground truth based comparative study on the functionality, accuracy, and stability of five data clustering methods, namely hierarchical clustering, K-means clustering, self-organizing maps, standard finite normal mixture fitting, and a caBIG toolkit (VIsual Statistical Data Analyzer--VISDA), tested on sample clustering of seven published microarray gene expression datasets and one synthetic dataset. We examined the performance of these algorithms in both data-sufficient and data-insufficient cases using quantitative performance measures, including cluster number detection accuracy and mean and standard deviation of partition accuracy. The experimental results showed that VISDA, an interactive coarse-to-fine maximum likelihood fitting algorithm, is a solid performer on most of the datasets, while K-means clustering and self-organizing maps optimized by the mean squared compactness criterion generally produce more stable solutions than the other methods.
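The abstract reports "partition accuracy" without defining it. One common definition, assumed here, is the fraction of samples correctly grouped under the best one-to-one mapping of cluster labels onto ground-truth classes. The sketch below searches all mappings by brute force, which is only practical for a handful of clusters. The Hungarian algorithm is the usual choice for larger k. The function name is hypothetical.

```python
from itertools import permutations


def partition_accuracy(true_labels, pred_labels):
    """Fraction of samples correctly grouped under the best one-to-one mapping
    of predicted cluster labels onto ground-truth classes (brute force)."""
    classes = sorted(set(true_labels))
    clusters = sorted(set(pred_labels))
    # Pad with None so surplus clusters can stay unmatched (count as errors).
    targets = classes + [None] * max(0, len(clusters) - len(classes))
    best = 0
    for perm in permutations(targets, len(clusters)):
        mapping = dict(zip(clusters, perm))
        hits = sum(1 for t, p in zip(true_labels, pred_labels) if mapping[p] == t)
        best = max(best, hits)
    return best / len(true_labels)


# Cluster names are arbitrary, so a relabelled perfect partition scores 1.0.
print(partition_accuracy([0, 0, 1, 1, 2, 2], [1, 1, 0, 0, 2, 2]))
print(partition_accuracy([0, 0, 0, 1, 1, 1], ["a", "a", "b", "b", "b", "b"]))
```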

  5. Accelerating DNA analysis applications on GPU clusters

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tumeo, Antonino; Villa, Oreste

    DNA analysis is an emerging application of high-performance bioinformatics. Modern sequencing machines can produce, in a few hours, large input streams of data that need to be matched against exponentially growing databases of known fragments. The ability to recognize these patterns effectively and quickly may allow biologists to extend the scale and reach of their investigations. Aho-Corasick is an exact, multiple-pattern matching algorithm that often underlies these applications. High-performance systems are a promising platform to accelerate this algorithm, which is computationally intensive but also inherently parallel. Nowadays, high-performance systems also include heterogeneous processing elements, such as Graphics Processing Units (GPUs), to further accelerate parallel algorithms. Unfortunately, the Aho-Corasick algorithm exhibits large performance variability, depending on the size of the input streams, the number of patterns to search and the number of matches, and poses significant challenges for current high-performance software and hardware implementations. An adequate mapping of the algorithm onto the target architecture, coping with the limits of the underlying hardware, is required to reach the desired high throughput. Load balancing also plays a crucial role given the limited bandwidth among the nodes of these systems. In this paper we present an efficient implementation of the Aho-Corasick algorithm for high-performance clusters accelerated with GPUs. We discuss how we partitioned and adapted the algorithm to fit the Tesla C1060 GPU and then present an MPI-based implementation for a heterogeneous high-performance cluster. We compare this implementation to MPI-based and MPI-with-pthreads implementations for a homogeneous cluster of x86 processors, discussing stability versus performance and the scaling of the solutions, taking into consideration aspects such as the bandwidth among the different nodes.
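For readers unfamiliar with Aho-Corasick, the sketch below is a minimal single-threaded version. It builds a trie of the patterns, adds failure links in breadth-first order, and scans the text once, reporting every (start, pattern) match. It is not the authors' GPU/MPI implementation, and the function names are hypothetical. Scanning time grows with the text length plus the number of matches, which is one reason match-heavy inputs cause the performance variability noted above.

```python
from collections import deque


def build_automaton(patterns):
    """Trie with failure links; out[node] lists patterns ending at node."""
    goto, fail, out = [{}], [0], [[]]
    for pat in patterns:
        node = 0
        for ch in pat:
            if ch not in goto[node]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[node][ch] = len(goto) - 1
            node = goto[node][ch]
        out[node].append(pat)
    # Breadth-first pass: a node's failure link points to its longest proper
    # suffix that is also a trie prefix; depth-1 nodes fail to the root.
    queue = deque(goto[0].values())
    while queue:
        node = queue.popleft()
        for ch, child in goto[node].items():
            queue.append(child)
            f = fail[node]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[child] = goto[f].get(ch, 0)
            out[child] = out[child] + out[fail[child]]  # inherit suffix matches
    return goto, fail, out


def search(text, automaton):
    goto, fail, out = automaton
    node, matches = 0, []
    for i, ch in enumerate(text):
        while node and ch not in goto[node]:
            node = fail[node]
        node = goto[node].get(ch, 0)
        for pat in out[node]:
            matches.append((i - len(pat) + 1, pat))
    return matches


print(search("ushers", build_automaton(["he", "she", "his", "hers"])))
print(search("ACGTACGT", build_automaton(["ACG", "CGT", "GTA"])))
```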

  6. Effects of Maximal Sodium and Potassium Conductance on the Stability of Hodgkin-Huxley Model

    PubMed Central

    Wang, Kuanquan; Yuan, Yongfeng; Zhang, Henggui

    2014-01-01

    The Hodgkin-Huxley (HH) equations constitute the first computational cell model and pioneered the use of models to study electrophysiological problems. The model consists of four differential equations based on experimental data from ion channels. Maximal conductance is an important characteristic of different channels. In this study, mathematical methods are used to investigate the importance of the maximal sodium conductance ḡNa and the maximal potassium conductance ḡK. Applying stability theory, and taking ḡNa and ḡK as variables, we analyze the stability and bifurcations of the model. Bifurcations are found when the variables change, and the bifurcation points and boundary are also calculated. There is only one bifurcation point when ḡNa is the variable, while there are two when ḡK is the variable. When both ḡNa and ḡK are variables, the (ḡNa, ḡK) plane is partitioned into two regions, and the upper bifurcation boundary is nearly linear. Numerical simulations illustrate the validity of the analysis. The results could be helpful in studying diseases caused by anomalies in maximal conductance. PMID:25104970
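To show what "four differential equations" means in practice, the sketch below integrates the HH model (membrane voltage V and gating variables m, h, n) with forward Euler. The maximal conductances `g_na` and `g_k` are exposed as parameters so they can be varied, as in the study. It assumes the standard squid-axon parameter set in the modern convention (resting potential about −65 mV, ḡNa = 120, ḡK = 36, gL = 0.3 mS/cm²), which may differ from the paper's settings. The function names are hypothetical, and the sketch simulates only; it does not perform the paper's bifurcation analysis.

```python
import math


def _rate(x, scale):
    # x / (1 - exp(-x / scale)), replaced by its limit (scale) at x = 0
    if abs(x) < 1e-9:
        return scale
    return x / (1.0 - math.exp(-x / scale))


def _alphas_betas(v):
    a_m = 0.1 * _rate(v + 40.0, 10.0)
    b_m = 4.0 * math.exp(-(v + 65.0) / 18.0)
    a_h = 0.07 * math.exp(-(v + 65.0) / 20.0)
    b_h = 1.0 / (1.0 + math.exp(-(v + 35.0) / 10.0))
    a_n = 0.01 * _rate(v + 55.0, 10.0)
    b_n = 0.125 * math.exp(-(v + 65.0) / 80.0)
    return a_m, b_m, a_h, b_h, a_n, b_n


def simulate_hh(i_ext, g_na=120.0, g_k=36.0, t_max=50.0, dt=0.01):
    """Forward-Euler integration of the four HH equations (V, m, h, n).
    Units: mV, ms, uA/cm^2, mS/cm^2. Returns the voltage trace."""
    g_l, e_na, e_k, e_l, c_m = 0.3, 50.0, -77.0, -54.387, 1.0
    v = -65.0
    a_m, b_m, a_h, b_h, a_n, b_n = _alphas_betas(v)
    # Start the gates at their steady-state values for the resting potential.
    m, h, n = a_m / (a_m + b_m), a_h / (a_h + b_h), a_n / (a_n + b_n)
    trace = [v]
    for _ in range(int(t_max / dt)):
        a_m, b_m, a_h, b_h, a_n, b_n = _alphas_betas(v)
        i_na = g_na * m ** 3 * h * (v - e_na)
        i_k = g_k * n ** 4 * (v - e_k)
        i_l = g_l * (v - e_l)
        v += dt * (i_ext - i_na - i_k - i_l) / c_m
        m += dt * (a_m * (1.0 - m) - b_m * m)
        h += dt * (a_h * (1.0 - h) - b_h * h)
        n += dt * (a_n * (1.0 - n) - b_n * n)
        trace.append(v)
    return trace


rest = simulate_hh(0.0)      # no stimulus: stays near -65 mV
spiking = simulate_hh(10.0)  # sustained current: repetitive action potentials
print(round(rest[-1], 2), round(max(spiking), 1))
```

Sweeping `g_na` or `g_k` while recording whether the trace settles or keeps spiking gives a crude numerical view of the stability boundary discussed in the abstract.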

  7. An Island Grouping Genetic Algorithm for Fuzzy Partitioning Problems

    PubMed Central

    Salcedo-Sanz, S.; Del Ser, J.; Geem, Z. W.

    2014-01-01

    This paper presents a novel fuzzy clustering technique based on grouping genetic algorithms (GGAs), a class of evolutionary algorithms specially modified to tackle grouping problems. Our approach hinges on a GGA devised for fuzzy clustering by means of a novel encoding of individuals (containing elements and clusters sections), a new fitness function (an improved modification of the Davies-Bouldin index), specially tailored crossover and mutation operators, and a scheme based on local search and a parallelization process inspired by an island-based model of evolution. The overall performance of our approach has been assessed over a number of synthetic and real fuzzy clustering problems with different objective functions and distance measures, from which it is concluded that the proposed approach shows excellent performance in all cases. PMID:24977235
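The fitness function above modifies the Davies-Bouldin (DB) index. The abstract does not give the fuzzy modification, so the sketch below computes only the standard crisp DB index: for each cluster, take the worst ratio of combined scatter to centroid separation against any other cluster, then average those ratios over clusters. Lower values indicate compact, well-separated clusters. The function name is hypothetical, and at least two clusters are required.

```python
import math


def davies_bouldin(points, labels):
    """Standard (crisp) Davies-Bouldin index; lower is better."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    keys = sorted(clusters)
    centroid, scatter = {}, {}
    for lab in keys:
        pts = clusters[lab]
        c = tuple(sum(coord) / len(pts) for coord in zip(*pts))
        centroid[lab] = c
        # Scatter: mean distance of members to their centroid.
        scatter[lab] = sum(math.dist(p, c) for p in pts) / len(pts)
    total = 0.0
    for i in keys:
        total += max((scatter[i] + scatter[j]) / math.dist(centroid[i], centroid[j])
                     for j in keys if j != i)
    return total / len(keys)


pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
print(davies_bouldin(pts, [0, 0, 1, 1]))  # compact, well separated
print(davies_bouldin(pts, [0, 1, 0, 1]))  # poor grouping
```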

  8. Intuitive visual impressions (cogs) for identifying clusters of diversity within potato species

    USDA-ARS?s Scientific Manuscript database

    One of the basic research activities of genebanks is to partition stocks into groups that facilitate the efficient preservation and evaluation of the full range of useful phenotype diversity. We sought to test the usefulness of making infra-specific groups by replicated rapid visual intuitive imp...

  9. Estimating Accuracy of Land-Cover Composition From Two-Stage Clustering Sampling

    EPA Science Inventory

    Land-cover maps are often used to compute land-cover composition (i.e., the proportion or percent of area covered by each class), for each unit in a spatial partition of the region mapped. We derive design-based estimators of mean deviation (MD), mean absolute deviation (MAD), ...

  10. Tobacco, Marijuana, and Alcohol Use in University Students: A Cluster Analysis

    ERIC Educational Resources Information Center

    Primack, Brian A.; Kim, Kevin H.; Shensa, Ariel; Sidani, Jaime E.; Barnett, Tracey E.; Switzer, Galen E.

    2012-01-01

    Objective: Segmentation of populations may facilitate development of targeted substance abuse prevention programs. The authors aimed to partition a national sample of university students according to profiles based on substance use. Participants: The authors used 2008-2009 data from the National College Health Assessment from the American College…

  11. Clusters of Healthy and Unhealthy Eating Behaviors are Associated with Body Mass Index Among Adults

    PubMed Central

    Heerman, William J.; Jackson, Natalie; Hargreaves, Margaret; Mulvaney, Shelagh A.; Schlundt, David; Wallston, Kenneth A.; Rothman, Russell L.

    2017-01-01

    Objective: To identify eating styles from 6 eating behaviors and test their association with Body Mass Index (BMI) among adults. Design: Cross-sectional analysis of self-report survey data. Setting: 12 primary care and specialty clinics in 5 states. Participants: 11,776 adult patients consented to participate; 9,977 completed survey questions. Variables measured: Frequency of eating healthy food; frequency of eating unhealthy food; breakfast frequency; frequency of snacking; overall diet quality; and problem eating behaviors. The primary dependent variable was BMI, calculated from self-reported height and weight data. Analysis: K-means cluster analysis of eating behaviors was used to determine eating styles. A categorical variable representing each eating style cluster was entered in a multivariate linear regression predicting BMI, controlling for covariates. Results: Four eating styles were identified and defined by healthy vs. unhealthy diet patterns and engagement in problem eating behaviors. Each group had a significantly higher average BMI than the healthy eating style group: healthy with problem eating behaviors (β=1.9, p<0.001), unhealthy (β=2.5, p<0.001), and unhealthy with problem eating behaviors (β=5.1, p<0.001). Conclusions: Future attempts to improve eating styles should address not only the consumption of healthy foods, but also snacking behaviors and the emotional component of eating. PMID:28363804

  12. Russian consumers' motives for food choice.

    PubMed

    Honkanen, Pirjo; Frewer, Lynn

    2009-04-01

    Knowledge about food choice motives that have the potential to influence consumers' consumption decisions is important when designing food and health policies, as well as marketing strategies. Russian consumers' food choice motives were studied in a survey (1081 respondents across four cities), with the purpose of identifying consumer segments based on these motives. These segments were then profiled using consumption, attitudinal and demographic variables. Data were collected through face-to-face interviews and analysed with two-step cluster analysis (SPSS). Three clusters emerged, representing 21.5%, 45.8% and 32.7% of the sample. The clusters were similar in terms of the order of motivations, but differed in motivational level. Sensory factors and availability were the most important motives for food choice in all three clusters, followed by price. This may reflect the turbulence which Russia has recently experienced politically and economically. Cluster profiles differed in relation to socio-demographic factors, consumption patterns and attitudes towards health and healthy food.

  13. Investigation of Spatial and Temporal Trends in Water Quality in Daya Bay, South China Sea

    PubMed Central

    Wu, Mei-Lin; Wang, You-Shao; Dong, Jun-De; Sun, Cui-Ci; Wang, Yu-Tu; Sun, Fu-Lin; Cheng, Hao

    2011-01-01

    The objective is to identify the spatial and temporal variability of the hydrochemical quality of the water column in a subtropical coastal system, Daya Bay, China. Water samples were collected in four seasons at 12 monitoring sites. The Southeast Asian monsoons (northeasterly from October to the following April and southwesterly from May to September) also have an important influence on water quality in Daya Bay. In the spatial pattern, two groups were identified with the help of multidimensional scaling analysis and cluster analysis. Cluster I consisted of sites S3, S8, S10 and S11 in the west and north coastal parts of Daya Bay and is mainly related to anthropogenic activities such as fish-farming. Cluster II consisted of the remaining stations in the center, east and south parts of Daya Bay and is mainly related to seawater exchange with the South China Sea. PMID:21776234

  14. Automatic reconstruction of fault networks from seismicity catalogs: Three-dimensional optimal anisotropic dynamic clustering

    NASA Astrophysics Data System (ADS)

    Ouillon, G.; Ducorbier, C.; Sornette, D.

    2008-01-01

    We propose a new pattern recognition method that is able to reconstruct the three-dimensional structure of the active part of a fault network using the spatial location of earthquakes. The method is a generalization of the so-called dynamic clustering (or k-means) method, which partitions a set of data points into clusters using a global criterion that minimizes the variance of the hypocenter locations about their center of mass. The new method improves on the original k-means method by taking into account the full spatial covariance tensor of each cluster in order to partition the data set into fault-like, anisotropic clusters. Given a catalog of seismic events, the output is the optimal set of plane segments that fits the spatial structure of the data. Each plane segment is fully characterized by its location, size, and orientation. The main tunable parameter is the accuracy of the earthquake locations, which fixes the resolution, i.e., the residual variance of the fit. The resolution determines the number of fault segments needed to describe the earthquake catalog: the better the resolution, the finer the structure of the reconstructed fault segments. The algorithm successfully reconstructs the fault segments of synthetic earthquake catalogs. Applied to a real catalog consisting of a subset of the aftershock sequence of the 28 June 1992 Landers earthquake in southern California, the reconstructed plane segments fully agree with faults already known on geological maps or with blind faults that appear quite obvious in longer-term catalogs. Future improvements of the method are discussed, as well as its potential use in the multiscale study of the inner structure of fault zones.
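The baseline that this method generalizes is ordinary k-means (Lloyd's algorithm). It alternates between assigning each point to its nearest center and moving each center to the center of mass of its points, which reduces the within-cluster variance of the hypocenter locations at each step. The sketch below applies it to 3-D points with caller-supplied initial centers, which is an assumption. It uses isotropic Euclidean distance only, so the paper's covariance-tensor, plane-fitting extension is not shown. The function name and coordinates are hypothetical.

```python
import math


def kmeans(points, init_centers, n_iter=100):
    """Lloyd's k-means with isotropic Euclidean distance."""
    centers = [tuple(c) for c in init_centers]
    labels = []
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
                  for p in points]
        # Update step: move each center to its members' center of mass
        # (an empty cluster keeps its previous center).
        new_centers = []
        for j, c in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new_centers.append(tuple(sum(x) / len(members) for x in zip(*members))
                               if members else c)
        if new_centers == centers:
            break  # converged: assignments no longer move the centers
        centers = new_centers
    return centers, labels


hypocenters = [(0, 0, 0), (0, 0, 1), (0, 1, 0), (10, 10, 10), (10, 10, 11), (10, 11, 10)]
print(kmeans(hypocenters, [(0, 0, 0), (10, 10, 10)]))
```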

  15. The Inter-Annual Variability Analysis of Carbon Exchange in Low Arctic Fen Uncovers The Climate Sensitivity And The Uncertainties Around Net Ecosystem Exchange Partitioning

    NASA Astrophysics Data System (ADS)

    Blanco, E. L.; Lund, M.; Williams, M. D.; Christensen, T. R.; Tamstorf, M. P.

    2015-12-01

    An improvement in our process-based understanding of CO2 exchanges in the Arctic, and their climate sensitivity, is critical for examining the role of tundra ecosystems in changing climates. Arctic organic carbon storage has seen increased attention in recent years due to the large potential for carbon releases following thaw. Our knowledge about the exact scale and sensitivity of a phase change in these C stocks is, however, limited. Minor variations in Gross Primary Production (GPP) and Ecosystem Respiration (Reco) driven by changes in the climate can lead to either C sink or C source states, which will likely affect the overall C cycle of the ecosystem. Eddy covariance data are usually used to partition Net Ecosystem Exchange (NEE) into GPP and Reco by means of flux separation algorithms. However, different partitioning approaches lead to different estimates, as well as undefined uncertainties. The main objectives of this study are to use model-data fusion approaches to (1) determine the inter-annual variability in C source/sink strength for an Arctic fen, and attribute such variations to GPP vs Reco, (2) investigate the climate sensitivity of these processes and (3) explore the uncertainties in NEE partitioning. The intention is to build on the information gathered in an existing catchment area under an extensive cross-disciplinary ecological monitoring program in low Arctic West Greenland, established under the auspices of the Greenland Ecosystem Monitoring (GEM) program. Applying such a thorough long-term (7 years) dataset to the exploration of inter-annual variability in carbon exchange, related driving factors and NEE partitioning uncertainties provides a novel input into our understanding of land-atmosphere CO2 exchange.
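The abstract does not say which flux-separation algorithm is used, so the sketch below assumes one simple, common scheme. Nighttime NEE, when GPP is zero, is treated as pure respiration and fitted with a Q10 temperature model, Reco = R_ref · Q10^((T − T_ref)/10), by least squares on log fluxes. The fitted model is then extrapolated to daytime temperatures, and GPP = Reco − NEE. The sign convention (NEE positive for release to the atmosphere), T_ref = 10 °C, and the function names are assumptions. Swapping in a different respiration model, such as Lloyd-Taylor, changes the GPP estimate, which illustrates the partitioning uncertainty the study examines.

```python
import math


def fit_q10(night_temps, night_nee, t_ref=10.0):
    """Fit Reco = r_ref * q10 ** ((T - t_ref) / 10) to positive nighttime NEE
    by ordinary least squares on ln(Reco)."""
    xs = [(t - t_ref) / 10.0 for t in night_temps]
    ys = [math.log(f) for f in night_nee]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return math.exp(intercept), math.exp(slope)  # (r_ref, q10)


def partition_nee(temps, nee, r_ref, q10, t_ref=10.0):
    """Extrapolate Reco to all records and take GPP = Reco - NEE."""
    reco = [r_ref * q10 ** ((t - t_ref) / 10.0) for t in temps]
    gpp = [r - f for r, f in zip(reco, nee)]
    return reco, gpp


# Synthetic nighttime data generated with r_ref = 2 and q10 = 2.
night_t = [0.0, 5.0, 10.0, 15.0]
r_ref, q10 = fit_q10(night_t, [2.0 * 2.0 ** ((t - 10.0) / 10.0) for t in night_t])
print(r_ref, q10)
print(partition_nee([20.0], [-3.0], r_ref, q10))  # daytime uptake of 3 units
```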

  16. Preliminary results from a simulated laboratory experiment or an encounter of cluster satellite probes with a reconnection layer

    NASA Astrophysics Data System (ADS)

    Yamada, M.; Ren, Y.; Ji, H.; Gerhardt, S.; Darfman, S.

    2006-12-01

    With the recent upgrade of the MRX (Magnetic Reconnection Experiment) device[1], we can carry out a jog experiment in which a current sheet is moved swiftly across an inserted probe assembly. A cluster of probes with variable spacing can be inserted at a known, desired position in the MRX device. This setup resembles the situation in which a cluster of satellites encounters a rapidly moving reconnection layer. If necessary, we can create a neutral sheet where the density on one side is significantly higher than on the other, as is the case for the magnetopause. A variable guide field will be applied to study its effect on reconnection. We proposed[2] to document basic patterns of data during a simulated encounter of the MRX reconnection layer with the four-probe mock-up system and to compare them with data acquired from past satellites. The relative positions of the MMS satellites in the magnetosphere can then be determined. The optimum cluster configuration, or distance between the four satellites, can be determined for various diagnostics or research missions. The relationship between magnetic fluctuations[3] and the observed out-of-plane quadrupole field, a characteristic signature of Hall MHD, can also be studied in this series of experiments. In this paper, results from a preliminary experiment will be presented. These experiments make effective use of the unique ability of MRX to know accurately the location of diagnostics with respect to the moving reconnection layer. Supported by DoE, NASA, NSF. [1] M. Yamada et al., Phys. Plasmas 13, 052119 (2006); [2] M. Yamada et al., MMS-IDS proposal (2006); [3] H. Ji et al., Phys. Rev. Lett. 92, 115001 (2004)

  17. An Introduction to Recursive Partitioning: Rationale, Application, and Characteristics of Classification and Regression Trees, Bagging, and Random Forests

    ERIC Educational Resources Information Center

    Strobl, Carolin; Malley, James; Tutz, Gerhard

    2009-01-01

    Recursive partitioning methods have become popular and widely used tools for nonparametric regression and classification in many scientific fields. Especially random forests, which can deal with large numbers of predictor variables even in the presence of complex interactions, have been applied successfully in genetics, clinical medicine, and…
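    Recursive partitioning grows a tree by repeatedly choosing the split that most reduces node impurity, then applying the same search inside each resulting node. A minimal sketch of the core step is shown below: the best binary split of one numeric predictor under Gini impurity, as used in CART-style classification trees. A full tree would apply it recursively over all predictors. Bagging and random forests repeat tree growing on bootstrap samples, and random forests also sample candidate predictors at each split. The function names are illustrative.

```python
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(x, y):
    """Return (threshold, weighted_impurity) of the best split x <= threshold.

    Candidate thresholds are midpoints between consecutive distinct x values;
    threshold is None if no split improves on the unsplit node.
    """
    pairs = sorted(zip(x, y))
    n = len(pairs)
    best = (None, gini(y))
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between equal predictor values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2, score)
    return best
```

    For x = [1, 2, 3, 10, 11, 12] with labels a, a, a, b, b, b, the best split is x <= 6.5, which yields two pure nodes (impurity 0.0).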

  18. HST Snapshot Study of Variable Stars in Globular Clusters: Inner Region of NGC 6441

    NASA Technical Reports Server (NTRS)

    Pritzl, Barton J.; Smith, Horace A.; Stetson, Peter B.; Catelan, Marcio; Sweigart, Allen V.; Layden, Andrew C.; Rich, R. Michael

    2003-01-01

    We present the results of a Hubble Space Telescope snapshot program to survey the inner region of the metal-rich globular cluster NGC 6441 for its variable stars. A total of 57 variable stars were found, including 38 RR Lyrae stars, 6 Population II Cepheids, and 12 long period variables. Twenty-four of the RR Lyrae stars and all of the Population II Cepheids were previously undiscovered in ground-based surveys. Of the RR Lyrae stars observed in this survey, 26 are pulsating in the fundamental mode with a mean period of 0.753 d and 12 are first-overtone mode pulsators with a mean period of 0.365 d. These values match up very well with those found in ground-based surveys. Combining all the available data for NGC 6441, we find mean periods of 0.759 d and 0.375 d for the RRab and RRc stars, respectively. We also find that the RR Lyrae in this survey are located in the same regions of a period-amplitude diagram as those found in ground-based surveys. The overall ratio of RRc to total RR Lyrae is 0.33. Although NGC 6441 is a metal-rich globular cluster and would, on that ground, be expected either to have few RR Lyrae stars or to be an Oosterhoff type I system, its RR Lyrae more closely resemble those in Oosterhoff type II globular clusters. However, even compared to typical Oosterhoff type II systems, the mean period of its RRab stars is unusually long. We also derived I-band period-luminosity relations for the RR Lyrae stars. Of the six Population II Cepheids, five are of W Virginis type and one is a BL Herculis variable star. This makes NGC 6441, along with NGC 6388, the most metal-rich globular cluster known to contain these types of variable stars. Another variable, V118, may also be a Population II Cepheid given its long period and its separation in magnitude from the RR Lyrae stars. We examine the period-luminosity relation for these Population II Cepheids and compare it to those in other globular clusters and in the Large Magellanic Cloud. We argue that there does not appear to be a change in the period-luminosity relation slope between the BL Herculis and W Virginis stars, but that a change of slope does occur when the RV Tauri stars are added to the period-luminosity relation.

  19. A Novel Artificial Bee Colony Based Clustering Algorithm for Categorical Data

    PubMed Central

    2015-01-01

    Data with categorical attributes are ubiquitous in the real world. However, existing partitional clustering algorithms for categorical data are prone to fall into local optima. To address this issue, in this paper we propose a novel clustering algorithm, ABC-K-Modes (Artificial Bee Colony clustering based on K-Modes), based on the traditional k-modes clustering algorithm and the artificial bee colony approach. In our approach, we first introduce a one-step k-modes procedure, and then integrate this procedure with the artificial bee colony approach to deal with categorical data. In the search process performed by scout bees, we adopt the multi-source search inspired by the idea of batch processing to accelerate the convergence of ABC-K-Modes. The performance of ABC-K-Modes is evaluated by a series of experiments in comparison with that of the other popular algorithms for categorical data. PMID:25993469
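    ABC-K-Modes builds on the k-modes algorithm. In k-modes, distance is the number of mismatched attributes and each cluster centre is the per-attribute mode. The sketch below shows one k-modes iteration (assign, then update modes). Reading the abstract's "one-step k-modes procedure" as a single such iteration is an assumption. The artificial bee colony search that the authors wrap around it is not reproduced, and the function names are illustrative.

```python
from collections import Counter


def mismatch(a, b):
    """Simple matching dissimilarity: number of attributes that differ."""
    return sum(u != v for u, v in zip(a, b))


def k_modes_step(records, modes):
    """One k-modes iteration: assign records to the nearest mode, then recompute modes.

    Returns (labels, new_modes). A cluster that receives no records keeps its old mode.
    """
    labels = [min(range(len(modes)), key=lambda j: mismatch(r, modes[j])) for r in records]
    new_modes = []
    for j, old in enumerate(modes):
        members = [r for r, lab in zip(records, labels) if lab == j]
        if not members:
            new_modes.append(old)
            continue
        # Most frequent category per attribute; ties go to the first value encountered.
        new_modes.append(tuple(Counter(col).most_common(1)[0][0] for col in zip(*members)))
    return labels, new_modes
```

    Because each step can only improve or keep the assignment cost, plain k-modes stops at a local optimum. That is the weakness the population-based bee colony search is meant to address.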

  20. Kaposi’s sarcoma–associated herpesvirus stably clusters its genomes across generations to maintain itself extrachromosomally

    PubMed Central

    Chiu, Ya-Fang; Sugden, Arthur U.

    2017-01-01

    Genetic elements that replicate extrachromosomally are rare in mammals; however, several human tumor viruses, including the papillomaviruses and the gammaherpesviruses, maintain their plasmid genomes by tethering them to cellular chromosomes. We have uncovered an unprecedented mechanism of viral replication: Kaposi’s sarcoma–associated herpesvirus (KSHV) stably clusters its genomes across generations to maintain itself extrachromosomally. To identify and characterize this mechanism, we developed two complementary, independent approaches: live-cell imaging and a predictive computational model. The clustering of KSHV requires the viral protein, LANA1, to bind viral genomes to nucleosomes arrayed on both cellular and viral DNA. Clustering affects both viral partitioning and viral genome numbers of KSHV. The clustering of KSHV plasmids provides it with an effective evolutionary strategy to rapidly increase copy numbers of genomes per cell at the expense of the total numbers of cells infected. PMID:28696226

  1. A novel artificial bee colony based clustering algorithm for categorical data.

    PubMed

    Ji, Jinchao; Pang, Wei; Zheng, Yanlin; Wang, Zhe; Ma, Zhiqiang

    2015-01-01

    Data with categorical attributes are ubiquitous in the real world. However, existing partitional clustering algorithms for categorical data are prone to fall into local optima. To address this issue, in this paper we propose a novel clustering algorithm, ABC-K-Modes (Artificial Bee Colony clustering based on K-Modes), based on the traditional k-modes clustering algorithm and the artificial bee colony approach. In our approach, we first introduce a one-step k-modes procedure, and then integrate this procedure with the artificial bee colony approach to deal with categorical data. In the search process performed by scout bees, we adopt the multi-source search inspired by the idea of batch processing to accelerate the convergence of ABC-K-Modes. The performance of ABC-K-Modes is evaluated by a series of experiments in comparison with that of the other popular algorithms for categorical data.

  2. Kaposi’s sarcoma–associated herpesvirus stably clusters its genomes across generations to maintain itself extrachromosomally

    DOE PAGES

    Chiu, Ya-Fang; Sugden, Arthur U.; Fox, Kathryn; ...

    2017-07-10

    Genetic elements that replicate extrachromosomally are rare in mammals; however, several human tumor viruses, including the papillomaviruses and the gammaherpesviruses, maintain their plasmid genomes by tethering them to cellular chromosomes. We have uncovered an unprecedented mechanism of viral replication: Kaposi’s sarcoma–associated herpesvirus (KSHV) stably clusters its genomes across generations to maintain itself extrachromosomally. To identify and characterize this mechanism, we developed two complementary, independent approaches: live-cell imaging and a predictive computational model. The clustering of KSHV requires the viral protein, LANA1, to bind viral genomes to nucleosomes arrayed on both cellular and viral DNA. Clustering affects both viral partitioning and viral genome numbers of KSHV. The clustering of KSHV plasmids provides it with an effective evolutionary strategy to rapidly increase copy numbers of genomes per cell at the expense of the total numbers of cells infected.

  3. Kaposi’s sarcoma–associated herpesvirus stably clusters its genomes across generations to maintain itself extrachromosomally

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chiu, Ya-Fang; Sugden, Arthur U.; Fox, Kathryn

    Genetic elements that replicate extrachromosomally are rare in mammals; however, several human tumor viruses, including the papillomaviruses and the gammaherpesviruses, maintain their plasmid genomes by tethering them to cellular chromosomes. We have uncovered an unprecedented mechanism of viral replication: Kaposi’s sarcoma–associated herpesvirus (KSHV) stably clusters its genomes across generations to maintain itself extrachromosomally. To identify and characterize this mechanism, we developed two complementary, independent approaches: live-cell imaging and a predictive computational model. The clustering of KSHV requires the viral protein, LANA1, to bind viral genomes to nucleosomes arrayed on both cellular and viral DNA. Clustering affects both viral partitioning and viral genome numbers of KSHV. The clustering of KSHV plasmids provides it with an effective evolutionary strategy to rapidly increase copy numbers of genomes per cell at the expense of the total numbers of cells infected.

  4. Clusterization in Ternary Fission

    NASA Astrophysics Data System (ADS)

    Kamanin, D. V.; Pyatkov, Y. V.

    These lecture notes are devoted to a new kind of ternary decay of low-excited heavy nuclei, which we call "collinear cluster tri-partition" (CCT) because of the features of the observed effect: the decay partners fly apart almost collinearly, and at least one of them has a magic nucleon composition. At the early stage of our work, "true ternary fission" (fission of the nucleus into three fragments of comparable masses) was considered undiscovered for low-excited heavy nuclei. Another possible prototype, three-body cluster radioactivity, was also unknown. Closest to the CCT phenomenon, at least kinematically, is the so-called "polar emission", but only very light ions (up to isotopes of Be) have been observed in it so far.

  5. Replicating cluster subtypes for the prevention of adolescent smoking and alcohol use.

    PubMed

    Babbin, Steven F; Velicer, Wayne F; Paiva, Andrea L; Brick, Leslie Ann D; Redding, Colleen A

    2015-01-01

    Substance abuse interventions tailored to the individual level have produced effective outcomes for a wide variety of behaviors. One approach to enhancing tailoring involves using cluster analysis to identify prevention subtypes that represent different attitudes about substance use. This study applied this approach to better understand tailored interventions for smoking and alcohol prevention. Analyses were performed on a sample of sixth graders from 20 New England middle schools involved in a 36-month tailored intervention study. Most adolescents reported being in the Acquisition Precontemplation (aPC) stage at baseline: not smoking or not drinking and not planning to start in the next six months. For smoking (N=4059) and alcohol (N=3973), each sample was randomly split into five subsamples. Cluster analysis was performed within each subsample based on three variables: Pros and Cons (from Decisional Balance Scales), and Situational Temptations. Across all subsamples for both smoking and alcohol, the following four clusters were identified: (1) Most Protected (MP; low Pros, high Cons, low Temptations); (2) Ambivalent (AM; high Pros, average Cons and Temptations); (3) Risk Denial (RD; average Pros, low Cons, average Temptations); and (4) High Risk (HR; high Pros, low Cons, and very high Temptations). Finding the same four clusters within aPC for both smoking and alcohol, replicating the results across the five subsamples, and demonstrating hypothesized relations among the clusters with additional external validity analyses provide strong evidence of the robustness of these results. These clusters demonstrate evidence of validity and can provide a basis for tailoring interventions. Copyright © 2014. Published by Elsevier Ltd.
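    The abstract does not name the clustering algorithm applied within each subsample. As a hedged illustration only, the sketch below runs Lloyd's k-means on points of three scores, standing in for standardized Pros, Cons and Situational Temptations. Replication would then mean running it on each random subsample and checking that similar cluster profiles reappear. The function name and the fixed seed are illustrative assumptions.

```python
import random


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's k-means on tuples of floats; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids drawn from the data
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [min(range(k), key=lambda j: sum((p - c) ** 2 for p, c in zip(pt, centroids[j])))
                  for pt in points]
        # Update step: each centroid moves to the mean of its members.
        new = []
        for j in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == j]
            if members:
                new.append(tuple(sum(vals) / len(members) for vals in zip(*members)))
            else:
                new.append(centroids[j])  # keep an empty cluster's centroid
        if new == centroids:
            break
        centroids = new
    return centroids, labels
```

    With k = 4, the resulting centroids would be read as profiles such as "low Pros, high Cons, low Temptations". Comparing those profiles across the five subsamples is the replication check the study describes.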

  6. Replicating cluster subtypes for the prevention of adolescent smoking and alcohol use

    PubMed Central

    Babbin, Steven F.; Velicer, Wayne F.; Paiva, Andrea L.; Brick, Leslie Ann D.; Redding, Colleen A.

    2015-01-01

    Introduction Substance abuse interventions tailored to the individual level have produced effective outcomes for a wide variety of behaviors. One approach to enhancing tailoring involves using cluster analysis to identify prevention subtypes that represent different attitudes about substance use. This study applied this approach to better understand tailored interventions for smoking and alcohol prevention. Methods Analyses were performed on a sample of sixth graders from 20 New England middle schools involved in a 36-month tailored intervention study. Most adolescents reported being in the Acquisition Precontemplation (aPC) stage at baseline: not smoking or not drinking and not planning to start in the next six months. For smoking (N=4059) and alcohol (N=3973), each sample was randomly split into five subsamples. Cluster analysis was performed within each subsample based on three variables: Pros and Cons (from Decisional Balance Scales), and Situational Temptations. Results Across all subsamples for both smoking and alcohol, the following four clusters were identified: (1) Most Protected (MP; low Pros, high Cons, low Temptations); (2) Ambivalent (AM; high Pros, average Cons and Temptations); (3) Risk Denial (RD; average Pros, low Cons, average Temptations); and (4) High Risk (HR; high Pros, low Cons, and very high Temptations). Conclusions Finding the same four clusters within aPC for both smoking and alcohol, replicating the results across the five subsamples, and demonstrating hypothesized relations among the clusters with additional external validity analyses provide strong evidence of the robustness of these results. These clusters demonstrate evidence of validity and can provide a basis for tailoring interventions. PMID:25222849

  7. [Clustering patterns of behavioral risk factors linked to chronic disease among young adults in two localities in Bogota, Colombia: importance of sex differences].

    PubMed

    Gómez Gutiérrez, Luis Fernando; Lucumí Cuesta, Diego Iván; Girón Vargas, Sandra Lorena; Espinosa García, Gladys

    2004-01-01

    Characterizing the clustering of behavioral risk factors may provide a guideline for interventions aimed at preventing chronic diseases. This study determined the clustering patterns of some behavioral risk factors in young adults aged 18 to 29 years and established the factors associated with having two or more of them. Clustering patterns by gender were established for four behavioral risk factors (low consumption of fruits and vegetables, physical inactivity in leisure time, current tobacco consumption and acute alcohol consumption) in 1465 young adult participants selected through a multistage probability sample. Regression models identified the sociodemographic variables associated with having two or more of these behavioral risk factors. … had one risk factor, 32.9% two and 17.7% three or four. Acute alcohol consumption was the most frequent risk factor in the combined risk factor patterns among males, and physical inactivity during leisure time the most frequent among females. Among females, having two or more behavioral risk factors was associated with being separated or divorced; among males, it was associated with work having been the main activity over the past 30 days. The combinations of behavioral risk factors studied and the factors associated with clustering show different patterns in males and females. These findings stress the need to design interventions sensitive to gender differences.
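    Describing how these four binary risk factors cluster amounts to counting, per respondent, how many are present and which combination occurs. The sketch below tallies both and reports the share of respondents with two or more factors, the outcome modelled in the study. The field names are hypothetical, survey weights from the multistage sample are ignored, and the data in any example are synthetic.

```python
from collections import Counter

# Hypothetical field names for the four behavioral risk factors.
FACTORS = ("low_fruit_veg", "inactive_leisure", "smoker", "acute_alcohol")


def clustering_summary(people):
    """Summarize co-occurrence of binary risk factors (unweighted).

    people: iterable of dicts mapping each factor name to True/False.
    Returns (count_distribution, pattern_counts, share_with_two_or_more).
    """
    counts = Counter()
    patterns = Counter()
    n = 0
    for person in people:
        present = tuple(f for f in FACTORS if person[f])  # combination, in FACTORS order
        counts[len(present)] += 1
        patterns[present] += 1
        n += 1
    two_plus = sum(c for k, c in counts.items() if k >= 2) / n if n else 0.0
    return counts, patterns, two_plus
```

    Running the summary separately for males and females gives the sex-specific pattern frequencies the abstract compares.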

  8. Detecting communities in large networks

    NASA Astrophysics Data System (ADS)

    Capocci, A.; Servedio, V. D. P.; Caldarelli, G.; Colaiori, F.

    2005-07-01

    We develop an algorithm to detect community structure in complex networks. The algorithm is based on spectral methods and takes into account weights and link orientation. Since the method efficiently detects clustered nodes in large networks even when they are not sharply partitioned, it turns out to be especially suitable for the analysis of social and information networks. We test the algorithm on a large-scale data set from a psychological experiment on word association. In this case, it proves successful both in clustering words and in uncovering mental association patterns.
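    As a generic illustration of spectral community detection, the sketch below bisects an undirected, unweighted graph by the signs of its Fiedler vector. This is the eigenvector of the graph Laplacian L = D - A with the second-smallest eigenvalue. It is not Capocci et al.'s method, which also handles weights and link orientation. Power iteration is applied to c*I - L after projecting out the constant eigenvector. The choice of c, the start vector and the function name are assumptions of this sketch, and the graph is assumed connected.

```python
def fiedler_bisection(n, edges, iters=500):
    """Split nodes 0..n-1 into two groups by the sign of the Fiedler vector.

    Power iteration on M = c*I - L (c exceeds the largest Laplacian eigenvalue),
    with the constant vector projected out, converges to the Fiedler vector
    for a connected graph.
    """
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    c = 2.0 * max(len(nb) for nb in adj) + 1.0  # largest Laplacian eigenvalue <= 2 * max degree
    x = [float(i + 1) for i in range(n)]  # arbitrary non-constant start vector
    for _ in range(iters):
        mean = sum(x) / n
        x = [xi - mean for xi in x]  # remove the component along the constant vector
        # y = (c*I - L) x, where (L x)_i = deg_i * x_i - sum of neighbour values
        y = [c * x[i] - (len(adj[i]) * x[i] - sum(x[j] for j in adj[i])) for i in range(n)]
        norm = sum(v * v for v in y) ** 0.5
        x = [v / norm for v in y]
    group_a = {i for i in range(n) if x[i] < 0}
    return group_a, set(range(n)) - group_a
```

    Consider two triangles joined by a single edge, 0-1-2 and 3-4-5 with the bridge 2-3. The bisection returns {0, 1, 2} and {3, 4, 5}. Applying the split recursively yields more than two communities.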

  9. From globally coupled maps to complex-systems biology

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kaneko, Kunihiko, E-mail: kaneko@complex.c.u-tokyo.ac.jp

    Studies of globally coupled maps, introduced as a network of chaotic dynamics, are briefly reviewed with an emphasis on the novel concepts they introduced, which are universal in high-dimensional dynamical systems. These include clustering of synchronized oscillations, hierarchical clustering, chimera of synchronization and desynchronization, partition complexity, prevalence of Milnor attractors, chaotic itinerancy, and collective chaos. The number of degrees of freedom necessary for high dimensionality is proposed to be the number at which combinatorial growth exceeds exponential growth. Future analysis of high-dimensional dynamical systems with regard to complex-systems biology is briefly discussed.
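    A globally coupled map (GCM) updates each element with the same chaotic map and mixes in the mean field of all elements. The sketch below uses Kaneko's standard logistic-map form x_i' = (1 - eps) f(x_i) + (eps / N) sum_j f(x_j), with f(x) = 1 - a x^2. Elements that end up with identical values form a synchronized cluster. The parameter values, tolerance and function names are illustrative choices.

```python
import random


def gcm_step(x, a, eps):
    """One step of the globally coupled logistic map with f(x) = 1 - a x^2."""
    fx = [1.0 - a * xi * xi for xi in x]
    mean_field = sum(fx) / len(fx)
    return [(1.0 - eps) * fi + eps * mean_field for fi in fx]


def count_clusters(x, tol=1e-6):
    """Count groups of elements whose values coincide to within tol."""
    xs = sorted(x)
    return 1 + sum(1 for p, q in zip(xs, xs[1:]) if q - p > tol)


def simulate(n=50, a=1.9, eps=0.1, steps=2000, seed=1):
    """Iterate a GCM of n elements from random initial values in [-1, 1]."""
    rng = random.Random(seed)
    x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for _ in range(steps):
        x = gcm_step(x, a, eps)
    return x
```

    With strong coupling (eps = 0.8, a = 1.9), the difference between any two elements shrinks by a factor of at most 0.76 per step, so all elements collapse into one synchronized cluster. Scanning eps downward from there is the usual way to look for the multi-cluster and desynchronized regimes the review discusses.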

  10. Network clustering and community detection using modulus of families of loops.

    PubMed

    Shakeri, Heman; Poggi-Corradini, Pietro; Albin, Nathan; Scoglio, Caterina

    2017-01-01

    We study the structure of loops in networks using the notion of modulus of loop families. We introduce an alternate measure of network clustering by quantifying the richness of families of (simple) loops. Modulus tries to minimize the expected overlap among loops by spreading the expected link usage optimally. We propose weighting networks using these expected link usages to improve classical community detection algorithms. We show that the proposed method enhances the performance of certain algorithms, such as spectral partitioning and modularity maximization heuristics, on standard benchmarks.
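    Modularity maximization, one of the algorithms the authors improve, scores a partition by the Newman-Girvan modularity Q. The sketch below computes Q for an undirected graph with optional edge weights. Those weights could be, for example, the expected link usages obtained from loop modulus, but computing them is not shown. The function name is illustrative.

```python
def modularity(edges, communities, weights=None):
    """Newman-Girvan modularity Q of a partition of an undirected graph.

    edges: list of (u, v) pairs; weights: optional list of edge weights;
    communities: dict mapping node -> community label.
    Q = sum over communities c of [W_in(c) / W - (S(c) / (2W))^2], where W_in(c)
    is the edge weight inside c, S(c) the total strength of c's nodes, W the total weight.
    """
    if weights is None:
        weights = [1.0] * len(edges)
    total = sum(weights)
    w_in = {}
    strength = {}
    for (u, v), w in zip(edges, weights):
        cu, cv = communities[u], communities[v]
        strength[cu] = strength.get(cu, 0.0) + w
        strength[cv] = strength.get(cv, 0.0) + w
        if cu == cv:
            w_in[cu] = w_in.get(cu, 0.0) + w
    return sum(w_in.get(c, 0.0) / total - (s / (2.0 * total)) ** 2 for c, s in strength.items())
```

    For two triangles joined by one edge, split into the two triangles, Q = 5/14 (about 0.357).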

  11. How Robust Is Linear Regression with Dummy Variables?

    ERIC Educational Resources Information Center

    Blankmeyer, Eric

    2006-01-01

    Researchers in education and the social sciences make extensive use of linear regression models in which the dependent variable is continuous-valued while the explanatory variables are a combination of continuous-valued regressors and dummy variables. The dummies partition the sample into groups, some of which may contain only a few observations.…
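    Dummy coding turns a categorical variable into 0/1 columns, one per category except a reference category. Dropping the reference avoids perfect collinearity with the intercept. The sketch below is a minimal illustration with a hypothetical function name. Each dummy's coefficient is informed only by the observations in its group, so groups with few observations yield imprecise estimates, which is the situation the abstract highlights.

```python
def dummy_encode(values, reference=None):
    """Encode a categorical variable as 0/1 dummy columns.

    The reference category (by default the first in sorted order) gets no column,
    so the dummies are not collinear with the regression intercept.
    Returns (column_names, rows).
    """
    levels = sorted(set(values))
    if reference is None:
        reference = levels[0]
    kept = [lvl for lvl in levels if lvl != reference]
    rows = [[1 if v == lvl else 0 for lvl in kept] for v in values]
    return kept, rows
```

    For ["urban", "rural", "suburb", "rural"], "rural" becomes the reference. The columns are ["suburb", "urban"] and the rows are [0, 1], [0, 0], [1, 0], [0, 0].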

  12. Conformational Clusters of Phosphorylated Tyrosine.

    PubMed

    Abdelrasoul, Maha; Ponniah, Komala; Mao, Alice; Warden, Meghan S; Elhefnawy, Wessam; Li, Yaohang; Pascal, Steven M

    2017-12-06

    Tyrosine phosphorylation plays an important role in many cellular and intercellular processes, including signal transduction, subcellular localization, and regulation of enzymatic activity. In 1999, Blom et al., using the limited number of Protein Data Bank (PDB) structures available at that time, reported that the side chain structures of phosphorylated tyrosine (pY) are partitioned into two conserved conformational clusters (Blom, N.; Gammeltoft, S.; Brunak, S. J. Mol. Biol. 1999, 294, 1351-1362). We have used the spectral clustering algorithm to cluster the growing number of protein structures with pY sites and have found that the pY residues cluster into three distinct side chain conformations. Two of these pY conformational clusters associate strongly with a narrow range of tyrosine backbone conformations. The novel cluster also correlates highly with the identity of the n + 1 residue and is strongly associated with a sequential pYpY conformation, which places two adjacent pY side chains in a specific relative orientation. Further analysis shows that the three pY clusters are associated with distinct distributions of cognate protein kinases.

  13. Variability of Stratospheric Reactive Nitrogen and Ozone Related to the QBO

    NASA Astrophysics Data System (ADS)

    Park, M.; Randel, W. J.; Kinnison, D. E.; Bourassa, A. E.; Degenstein, D. A.; Roth, C. Z.; McLinden, C. A.; Sioris, C. E.; Livesey, N. J.; Santee, M. L.

    2017-09-01

    The stratospheric quasi-biennial oscillation (QBO) dominates interannual variability of dynamical variables and trace constituents in the tropical stratosphere and provides a natural experiment to test circulation-chemistry interactions. This work quantifies the relationships among ozone (O3), reactive nitrogen (NOy), and source gas N2O, and their links to the QBO, based on satellite constituent measurements and meteorological data spanning 2005-2014 (over four QBO cycles). Data include O3, HNO3, and N2O from the Aura Microwave Limb Sounder and an NOx proxy derived from Optical Spectrograph and Infrared Imager System NO2 measurements combined with a photochemical box model (= NOx*). Results are compared to simulations from the Whole Atmosphere Community Climate Model, version 4, incorporating a QBO circulation nudged to assimilated winds. Cross correlations and composites with respect to the QBO phase show coherent 180° out-of-phase relationships between NOy and N2O throughout the stratosphere, with the NOx/HNO3 ratio increasing with altitude. The anomalies in NOy species propagate coherently downward with the QBO. Ozone is anticorrelated with reactive nitrogen in the middle stratosphere above 28 km due to NOx control of ozone catalytic loss cycles. Quantitative comparisons of nitrogen partitioning and O3 sensitivity to NOx show good overall agreement between satellite observations and model results (suggesting closure of the NOy budget), although the model results show larger (up to 20%) N2O, NOx, and O3 variations near 35 km compared to observations. These analyses serve to assess the consistency of diverse satellite-based data sets and also to evaluate nitrogen partitioning and NOx-dependent ozone chemistry in the global model.
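    A 180° out-of-phase relationship between two anomaly series shows up as a correlation near -1 at zero lag. A quarter-cycle offset moves the extreme correlation to a nonzero lag. The sketch below computes a lagged Pearson correlation between two equal-length series. It does not reproduce the composite analysis in the paper. The synthetic test signal uses a 28-month period, roughly the mean QBO period, and the function name is illustrative.

```python
import math


def lagged_corr(x, y, lag):
    """Pearson correlation between x[t] and y[t + lag] over the overlapping samples.

    x and y must have the same length; lag may be negative.
    """
    if lag >= 0:
        a, b = x[:len(x) - lag], y[lag:]
    else:
        a, b = x[-lag:], y[:len(y) + lag]
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / math.sqrt(va * vb)
```

    For a monthly sinusoid with a 28-month period, correlating it with its negative at lag 0 gives -1. Correlating it with itself at a lag of 14 months (half a cycle) also gives -1.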

  14. Applying probabilistic temporal and multisite data quality control methods to a public health mortality registry in Spain: a systematic approach to quality control of repositories.

    PubMed

    Sáez, Carlos; Zurriaga, Oscar; Pérez-Panadés, Jordi; Melchor, Inma; Robles, Montserrat; García-Gómez, Juan M

    2016-11-01

    To assess the variability in data distributions among data sources and over time through a case study of a large multisite repository, as a systematic approach to data quality (DQ). Novel probabilistic DQ control methods based on information theory and geometry are applied to the Public Health Mortality Registry of the Region of Valencia, Spain, with 512 143 entries from 2000 to 2012, disaggregated into 24 health departments. The methods provide DQ metrics and exploratory visualizations for (1) assessing the variability among multiple sources and (2) monitoring and exploring changes over time. The methods are suited to big data and to multitype, multivariate, and multimodal data. The repository was partitioned into 2 probabilistically separated temporal subgroups following a change in the Spanish National Death Certificate in 2009. Isolated temporal anomalies were observed, caused by a one-off increase in missing data, along with outlying and clustered health departments due to differences in populations or in practices. Changes in protocols, differences in populations, biased practices, or other systematic DQ problems affected data variability. Even if semantic and integration aspects are addressed in data sharing infrastructures, probabilistic variability may still be present. Solutions include fixing or excluding data and analyzing different sites or time periods separately. A systematic approach to assessing temporal and multisite variability is proposed. Multisite and temporal variability in data distributions affects DQ, hindering data reuse, and an assessment of such variability should be part of systematic DQ procedures. © The Author 2016. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
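    A common information-theoretic way to compare data distributions between sites or time periods is the Jensen-Shannon distance. It is symmetric, bounded in [0, 1] with base-2 logarithms, and 0 only for identical distributions. The abstract does not specify which measures the methods use, so treat this as an illustrative choice rather than the authors' metric. The sketch below compares two histograms over the same bins, for example the distribution of one variable in two health departments or two months.

```python
import math


def js_distance(p, q):
    """Jensen-Shannon distance (base-2 logs, range 0..1) between two histograms."""
    sp, sq = sum(p), sum(q)
    p = [v / sp for v in p]
    q = [v / sq for v in q]
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(u, v):
        # Terms with u_i == 0 contribute nothing; m_i > 0 wherever u_i > 0.
        return sum(a * math.log2(a / b) for a, b in zip(u, v) if a > 0)

    jsd = 0.5 * kl(p, m) + 0.5 * kl(q, m)
    return math.sqrt(max(jsd, 0.0))
```

    Identical histograms give 0.0 and histograms with disjoint support give 1.0. A matrix of pairwise distances between sites can then be inspected for outlying or clustered sources.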

  15. Matrix partitioning and EOF/principal component analysis of Antarctic Sea ice brightness temperatures

    NASA Technical Reports Server (NTRS)

    Murray, C. W., Jr.; Mueller, J. L.; Zwally, H. J.

    1984-01-01

    A field of measured anomalies of some physical variable relative to their time averages is partitioned in either the space domain or the time domain. Eigenvectors and corresponding principal components of the smaller covariance matrices associated with the partitioned data sets are calculated independently, then joined to approximate the eigenstructure of the larger covariance matrix associated with the unpartitioned data set. The accuracy of the approximation (fraction of the total variance in the field) and the magnitudes of the largest eigenvalues from the partitioned covariance matrices together determine the number of local EOFs and principal components to be joined at any particular level. The space-time distribution of Nimbus-5 ESMR sea ice measurements is analyzed.
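    For reference, the quantity the partitioned method approximates can be computed directly on a small field. The leading EOF is the top eigenvector of the covariance matrix of the anomalies, and its eigenvalue divided by the total variance is the explained fraction. The sketch below uses power iteration on the full, unpartitioned covariance. It does not implement the partition-and-join scheme of the paper, and the function name is illustrative.

```python
def leading_eof(field, iters=500):
    """Leading EOF of a space-time field via power iteration on its covariance.

    field: list of time samples, each a list of values at the same grid points.
    Returns (eof, explained_variance_fraction). Anomalies are taken relative to
    each grid point's time average.
    """
    nt, ns = len(field), len(field[0])
    means = [sum(row[j] for row in field) / nt for j in range(ns)]
    anom = [[row[j] - means[j] for j in range(ns)] for row in field]
    cov = [[sum(a[i] * a[j] for a in anom) / (nt - 1) for j in range(ns)] for i in range(ns)]
    v = [1.0] * ns  # start vector; assumed not orthogonal to the leading EOF
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(ns)) for i in range(ns)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    eigval = sum(v[i] * sum(cov[i][j] * v[j] for j in range(ns)) for i in range(ns))
    total_var = sum(cov[i][i] for i in range(ns))
    return v, eigval / total_var
```

    For a field that is a single spatial pattern multiplied by a time series, the leading EOF is that pattern normalized to unit length, and it explains all of the variance.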

  16. Hierarchical clusters of phytoplankton variables in dammed water bodies

    NASA Astrophysics Data System (ADS)

    Silva, Eliana Costa e.; Lopes, Isabel Cristina; Correia, Aldina; Gonçalves, A. Manuela

    2017-06-01

    In this paper, a dataset containing biological variables of the water column of several Portuguese reservoirs is analyzed. Hierarchical cluster analysis is used to obtain clusters of phytoplankton variables of the phylum Cyanophyta, with the objective of validating the classification of Portuguese reservoirs previously presented in [1], which divided them into three clusters: (1) Interior Tagus and Aguieira; (2) Douro; and (3) Other rivers. Three new clusters of Cyanophyta variables were found. Kruskal-Wallis and Mann-Whitney tests are used to compare the newly obtained Cyanophyta clusters with the previous reservoir clusters, in order to validate the classification of the water quality of reservoirs. The amount of Cyanophyta algae present in the reservoirs of the three clusters differs significantly, which validates the previous classification.
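    The Kruskal-Wallis test used here compares several groups through the ranks of the pooled observations. H = 12 / (N (N + 1)) * sum(R_i^2 / n_i) - 3 (N + 1), where R_i is the rank sum of group i. The sketch below computes H with average ranks for ties but without the tie-correction factor. For real analyses, scipy.stats.kruskal also applies the tie correction and returns a p-value. The function name is illustrative.

```python
def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic (average ranks for ties, no tie correction)."""
    pooled = sorted((v, g) for g, grp in enumerate(groups) for v in grp)
    n = len(pooled)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1  # extend the run of tied values
        avg = (i + j) / 2 + 1  # 1-based average rank of the tied run
        for k in range(i, j + 1):
            ranks[k] = avg
        i = j + 1
    rank_sums = [0.0] * len(groups)
    for (_, g), r in zip(pooled, ranks):
        rank_sums[g] += r
    h = 12.0 / (n * (n + 1)) * sum(r * r / len(grp) for r, grp in zip(rank_sums, groups))
    return h - 3 * (n + 1)
```

    For three fully separated groups [1, 2, 3], [4, 5, 6] and [7, 8, 9], H = 7.2. H is compared with a chi-square distribution with k - 1 degrees of freedom, where k is the number of groups.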

  17. Analyzing students’ errors on fractions in the number line

    NASA Astrophysics Data System (ADS)

    Widodo, S.; Ikhwanudin, T.

    2018-05-01

    The objective of this study is to identify the types of errors students make when working with fractions on the number line. This study used a qualitative descriptive method and involved 31 sixth-grade students at a primary school in Purwakarta, Indonesia. The results show four types of student errors: unit confusion, tick mark interpretation error, partitioning and un-partitioning error, and estimation error. We recommend that teachers strengthen students' understanding of the unit when studying fractions, help students understand how to interpret tick marks, remind students of the importance of partitioning and un-partitioning strategies, and teach effective estimation strategies.

  18. The C-terminal domain of CblD interacts with CblC and influences intracellular cobalamin partitioning.

    PubMed

    Gherasim, Carmen; Hannibal, Luciana; Rajagopalan, Deepa; Jacobsen, Donald W; Banerjee, Ruma

    2013-05-01

    Mutations in cobalamin or B12 trafficking genes needed for cofactor assimilation and targeting lead to inborn errors of cobalamin metabolism. The gene corresponding to one of these loci, cblD, affects both the mitochondrial and cytoplasmic pathways for B12 processing. We have demonstrated that fibroblast cell lines from patients with mutations in CblD can dealkylate exogenously supplied methylcobalamin (MeCbl), an activity catalyzed by the CblC protein, but show imbalanced intracellular partitioning of the cofactor into the MeCbl and 5'-deoxyadenosylcobalamin (AdoCbl) pools. These results confirm that CblD functions downstream of CblC in the cofactor assimilation pathway and that it plays an important role in controlling the traffic of the cofactor between the competing cytoplasmic and mitochondrial routes for MeCbl and AdoCbl synthesis, respectively. In this study, we report the interaction of CblC with four CblD protein variants with variable N-terminal start sites. We demonstrate that a complex between CblC and CblD can be isolated, particularly under conditions that permit dealkylation of alkylcobalamin by CblC or in the presence of the corresponding dealkylated and oxidized product, hydroxocobalamin (HOCbl). A weak CblC·CblD complex is also seen in the presence of cyanocobalamin. Formation of the CblC·CblD complex is observed with all four CblD variants tested, suggesting that the N-terminal 115 residues missing in the shortest variant are not essential for this interaction. Furthermore, limited proteolysis of the CblD variants indicates the presence of a stable C-terminal domain spanning residues ∼116-296. Our results are consistent with an adapter function for CblD, which in complex with CblC·HOCbl, or possibly the less oxidized CblC·cob(II)alamin, partitions the cofactor between AdoCbl and MeCbl assimilation pathways. Copyright © 2013 Elsevier Masson SAS. All rights reserved.

  19. Spatial pattern recognition of seismic events in South West Colombia

    NASA Astrophysics Data System (ADS)

    Benítez, Hernán D.; Flórez, Juan F.; Duque, Diana P.; Benavides, Alberto; Lucía Baquero, Olga; Quintero, Jiber

    2013-09-01

    Recognition of seismogenic zones in geographical regions supports seismic hazard studies. This recognition is usually based on visual, qualitative and subjective analysis of data. Spatial pattern recognition provides a well-founded means of obtaining relevant information from large amounts of data. The purpose of this work is to identify and classify spatial patterns in instrumental data from the South West Colombian seismic database. In this research, clustering tendency analysis validates whether the seismic database possesses a clustering structure. An unsupervised fuzzy clustering algorithm creates groups of seismic events. Given the sensitivity of fuzzy clustering algorithms to the initial positions of the centroids, we propose a methodology for initializing centroids that generates partitions that are stable with respect to centroid initialization. As a result of this work, a public software tool provides the user with the routines developed for the clustering methodology. The analysis of the seismogenic zones obtained reveals meaningful spatial patterns in South West Colombia. The clustering analysis provides a quantitative location and dispersion of seismogenic zones that facilitates seismological interpretation of seismic activity in South West Colombia.
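    The abstract does not name its fuzzy clustering algorithm. Fuzzy c-means (Bezdek) is the classic choice and illustrates why initialization matters: it alternates soft memberships and weighted centroids from a caller-supplied starting configuration. The sketch below is standard fuzzy c-means with fuzzifier m. The authors' stabilizing initialization methodology is not reproduced, and the input coordinates (for example epicenter positions) and function name are illustrative assumptions.

```python
def fuzzy_c_means(points, centroids, m=2.0, iters=100, tol=1e-9):
    """Fuzzy c-means from given initial centroids.

    points, centroids: lists of equal-length coordinate tuples; m > 1 is the fuzzifier.
    Returns (centroids, memberships), where memberships[i][j] is the degree of point i
    in cluster j, computed against the centroids of the last iteration.
    """
    c = len(centroids)
    centroids = [tuple(cj) for cj in centroids]
    u = []
    for _ in range(iters):
        u = []
        for p in points:
            d = [sum((a - b) ** 2 for a, b in zip(p, cj)) ** 0.5 for cj in centroids]
            if min(d) == 0.0:  # point sits on a centroid: full membership there
                row = [1.0 if dj == 0.0 else 0.0 for dj in d]
                s = sum(row)
                u.append([r / s for r in row])
                continue
            u.append([1.0 / sum((d[j] / d[k]) ** (2.0 / (m - 1.0)) for k in range(c))
                      for j in range(c)])
        new = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(len(points))]  # membership weights
            sw = sum(w)
            new.append(tuple(sum(wi * p[dim] for wi, p in zip(w, points)) / sw
                             for dim in range(len(points[0]))))
        shift = max(sum((a - b) ** 2 for a, b in zip(x, y)) for x, y in zip(new, centroids))
        centroids = new
        if shift < tol:
            break
    return centroids, u
```

    Running the function from several different starting centroids and comparing the resulting partitions is a simple way to see the initialization sensitivity the authors address.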

  20. An extended affinity propagation clustering method based on different data density types.

    PubMed

    Zhao, XiuLi; Xu, WeiXiang

    2015-01-01

    The affinity propagation (AP) algorithm, a novel clustering method, does not require users to specify initial cluster centers in advance: it treats all data points equally as potential exemplars (cluster centers) and forms clusters purely from the similarities among the data points. In many cases, however, a data set contains regions of different density, meaning that it is not homogeneously distributed. In such situations the AP algorithm cannot group the data points into ideal clusters. In this paper, we propose an extended AP clustering algorithm to deal with this problem. Our method has two steps: first, the data set is partitioned into several data density types according to the nearest-neighbor distance of each data point; then AP clustering is applied separately to group the data points within each density type. Two experiments are carried out to evaluate the performance of our algorithm: one uses an artificial data set and the other a real seismic data set. The results show that our algorithm groups the data more accurately than OPTICS and the original AP algorithm.
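    The second step of the extended method runs plain AP inside each density type. The sketch below is a minimal, quadratic-memory version of the standard AP message passing (Frey and Dueck, 2007) for small point sets. It alternates damped responsibility and availability updates. Similarity is negative squared Euclidean distance, and the preference defaults to the median off-diagonal similarity. These are common defaults assumed here, not the authors' settings. The density-typing step is not shown, and the function name is illustrative.

```python
def affinity_propagation(points, preference=None, damping=0.5, iters=200):
    """Plain affinity propagation; returns the exemplar index chosen for each point."""
    n = len(points)
    # Similarity: negative squared Euclidean distance.
    s = [[-sum((p - q) ** 2 for p, q in zip(points[i], points[k])) for k in range(n)]
         for i in range(n)]
    if preference is None:  # median of the off-diagonal similarities
        off = sorted(s[i][k] for i in range(n) for k in range(n) if i != k)
        mid = len(off) // 2
        preference = off[mid] if len(off) % 2 else (off[mid - 1] + off[mid]) / 2
    for k in range(n):
        s[k][k] = preference
    r = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    a = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(iters):
        # r(i,k) = s(i,k) - max over k' != k of [a(i,k') + s(i,k')]
        for i in range(n):
            vals = [a[i][k] + s[i][k] for k in range(n)]
            for k in range(n):
                best_other = max(vals[j] for j in range(n) if j != k)
                r[i][k] = damping * r[i][k] + (1 - damping) * (s[i][k] - best_other)
        # a(k,k) = sum_{i' != k} max(0, r(i',k));
        # a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)))
        for k in range(n):
            pos = [max(0.0, r[i][k]) for i in range(n)]
            support = sum(pos) - pos[k]
            for i in range(n):
                new = support if i == k else min(0.0, r[k][k] + support - pos[i])
                a[i][k] = damping * a[i][k] + (1 - damping) * new
    exemplars = [k for k in range(n) if a[k][k] + r[k][k] > 0]
    if not exemplars:  # fall back to the single strongest candidate
        exemplars = [max(range(n), key=lambda k: a[k][k] + r[k][k])]
    return [i if i in exemplars else max(exemplars, key=lambda e: s[i][e])
            for i in range(n)]
```

    A single global preference is what makes plain AP struggle with uneven density. It sets one cost for every exemplar, which suits either the dense or the sparse regions but not both. Running AP separately per density type sidesteps that.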

  1. Diametrical clustering for identifying anti-correlated gene clusters.

    PubMed

    Dhillon, Inderjit S; Marcotte, Edward M; Roshan, Usman

    2003-09-01

    Clustering genes based on their expression patterns allows us to predict gene function. Most existing clustering algorithms group genes together when their expression patterns show a high positive correlation. However, it has been observed that genes whose expression patterns are strongly anti-correlated can also be functionally similar. Biologically, this is not unintuitive: genes responding to the same stimuli, regardless of the nature of the response, are more likely to operate in the same pathways. We present a new diametrical clustering algorithm that explicitly identifies anti-correlated clusters of genes. Our algorithm iterates between (i) re-partitioning the genes and (ii) computing the dominant singular vector of each gene cluster, each singular vector serving as the prototype of a 'diametric' cluster. We show empirically that the algorithm is effective in identifying diametrical, or anti-correlated, clusters. Testing the algorithm on yeast cell cycle data, fibroblast gene expression data, and DNA microarray data from yeast mutants reveals that opposed cellular pathways can be discovered with this method. We present systems whose mRNA expression patterns, and likely their functions, oppose those of the yeast ribosome and proteasome, along with evidence for the inverse transcriptional regulation of a number of cellular systems.
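The two alternating steps can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: profiles are unit-normalized, each gene is assigned to the prototype with the largest squared dot product (so a profile and its negation count as equally similar), and the dominant singular vector is found by power iteration. The explicit `init` indices used for seeding are a hypothetical simplification.

```python
import math


def _normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def _dominant_vector(members, start, iters=100):
    """Power iteration for the top eigenvector of A = sum_i x_i x_i^T,
    i.e. the dominant left singular vector of the cluster's data matrix."""
    v = start
    for _ in range(iters):
        w = [0.0] * len(v)
        for x in members:
            d = _dot(x, v)  # A v = sum_i (x_i . v) x_i
            for k, xk in enumerate(x):
                w[k] += d * xk
        v = _normalize(w)
    return v


def diametrical_clustering(data, init, max_iter=50):
    """Cluster unit-normalized profiles by squared cosine similarity to prototypes."""
    xs = [_normalize(x) for x in data]
    protos = [xs[i] for i in init]
    labels = None
    for _ in range(max_iter):
        # (i) re-partition: (x . v)^2 ignores the sign, so x and -x are treated alike.
        new = [max(range(len(protos)), key=lambda j: _dot(x, protos[j]) ** 2) for x in xs]
        if new == labels:
            break
        labels = new
        # (ii) recompute each prototype as its cluster's dominant singular vector.
        for j in range(len(protos)):
            members = [x for x, lab in zip(xs, labels) if lab == j]
            if members:
                protos[j] = _dominant_vector(members, protos[j])
    return labels, protos
```

Because assignment uses the squared similarity, a gene profile and its mirror image land in the same cluster, which is the behaviour the abstract calls "diametric".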

  2. The Usefulness of Zone Division Using Belt Partition at the Entry Zone of MRI Machine Room: An Analysis of the Restrictive Effect of Dangerous Action Using a Questionnaire.

    PubMed

    Funada, Tatsuro; Shibuya, Tsubasa

    2016-08-01

    The American College of Radiology recommends dividing magnetic resonance imaging (MRI) machine rooms into four zones depending on the education level. However, structural limitations prevent most Japanese facilities from applying this recommendation. This study examines the effectiveness of using a belt partition to create the zonal division, by means of a questionnaire survey covering three critical parameters: individuals' backgrounds (relevance to MRI, years of experience, post, occupation [i.e., nurse or nursing assistant], and outpatient section or ward); the presence or absence of a door or belt partition (open or closed); and four personnel scenarios that may be encountered during a visit to an MRI site (e.g., visiting the MRI site to receive a patient). In this survey, the influence of individuals' backgrounds on dangerous action was uncertain (maximum odds ratio: 6.3, 95% CI: 1.47-27.31), as was that of the personnel scenarios (maximum risk ratio: 2.4, 95% CI: 1.16-4.85). Conversely, the presence of the door and belt partition had a significant influence (maximum risk ratio: 17.4, 95% CI: 7.94-17.38). We therefore suggest that visual impression strongly influences individuals' actions. Even where structural limitations are present, zonal division by a belt partition provides a visual deterrent, and the partitioned zone then serves as a buffer zone. We conclude that, if used properly, the belt partition is an inexpensive and effective safety management device for MRI rooms.

  3. LPS-induced NO inhibition and antioxidant activities of ethanol extracts and their solvent partitioned fractions from four brown seaweeds

    NASA Astrophysics Data System (ADS)

    Cho, Myoung Lae; Lee, Dong-Jin; Lee, Hyi-Seung; Lee, Yeon-Ju; You, Sang Guan

    2013-12-01

    The nitric oxide inhibitory (NOI) and antioxidant (ABTS and DPPH radical scavenging and reducing power) activities of the ethanol (EtOH) extracts and solvent-partitioned fractions from Scytosiphon lomentaria, Chorda filum, Agarum cribrosum, and Desmarestia viridis were investigated, and the correlation between biological activity and total phenolic (TP) and phlorotannin (TPT) content was determined by principal component analysis (PCA). The yield of EtOH extracts from the four brown seaweeds ranged from 2.6 to 6.6%, with the highest yield from D. viridis, and the predominant compounds in their solvent-partitioned fractions were of medium to low polarity. The TP and TPT contents of the EtOH extracts were in the ranges of 25.0-44.1 mg GAE/g sample and 0.2-4.6 mg PG/g sample, respectively, and were mostly recovered in the organic solvent-partitioned fractions. Strong NOI activity was observed in the EtOH extracts and solvent-partitioned fractions from D. viridis and C. filum. In addition, the EtOH extract of D. viridis and its solvent-partitioned fractions exhibited little cytotoxicity to RAW 264.7 cells. The most potent ABTS and DPPH radical scavenging capacity was shown by the EtOH extracts and solvent-partitioned fractions from S. lomentaria and C. filum, both of which also exhibited strong reducing ability. In the PCA, TPT content correlated well with DPPH (r = 0.62), ABTS (r = 0.69) and reducing power (r = 0.65), whereas the TP and TPT contents correlated poorly with NOI, suggesting that the phlorotannins might be responsible for the DPPH and ABTS radical scavenging activities.

  4. Two-Way Regularized Fuzzy Clustering of Multiple Correspondence Analysis.

    PubMed

    Kim, Sunmee; Choi, Ji Yeh; Hwang, Heungsun

    2017-01-01

    Multiple correspondence analysis (MCA) is a useful tool for investigating the interrelationships among dummy-coded categorical variables. MCA has been combined with clustering methods to examine whether a population contains heterogeneous subgroups. These combined approaches classify either observations only (one-way clustering of MCA) or both observations and variable categories (two-way clustering of MCA). The latter approach is favored because its solutions are easier to interpret: they show explicitly which subgroup of observations is associated with which subset of variable categories. Nonetheless, the two-way approach has been built on hard classification, which assumes that each observation and/or variable category belongs to only one cluster. To relax this assumption, we propose two-way fuzzy clustering of MCA. Specifically, we combine MCA with fuzzy k-means to simultaneously classify a subgroup of observations and a subset of variable categories into a common cluster, while allowing both observations and variable categories to belong partially to multiple clusters. Importantly, we adopt regularized fuzzy k-means, which enables us to determine the degree of fuzziness in cluster memberships automatically. We evaluate the performance of the proposed approach through the analysis of simulated and real data, in comparison with existing two-way clustering approaches.
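The abstract does not specify the regularizer. One common way to regularize fuzzy k-means is an entropy penalty; assuming that form (an assumption, not necessarily the authors' choice), minimizing the weighted squared distances plus lambda times the membership entropy gives softmax memberships in closed form. The sketch shows only this membership step, for one observation, with a fixed `lam`; how the degree of fuzziness is selected automatically is not reproduced.

```python
import math


def entropy_regularized_memberships(sq_dists, lam):
    """Memberships u_k minimizing sum(u_k * d_k) + lam * sum(u_k * ln u_k), sum(u_k) = 1.

    The closed-form solution is a softmax: u_k proportional to exp(-d_k / lam).
    Large lam -> near-uniform (very fuzzy); small lam -> near-hard assignment.
    """
    d_min = min(sq_dists)  # shift for numerical stability; does not change the result
    weights = [math.exp(-(d - d_min) / lam) for d in sq_dists]
    total = sum(weights)
    return [w / total for w in weights]
```

For example, with squared distances `[0.0, 4.0]` to two cluster centers, `lam=1.0` gives a membership of about 0.98 in the first cluster, while `lam=100.0` gives nearly equal memberships.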

  5. Identification of homogeneous genetic architecture of multiple genetically correlated traits by block clustering of genome-wide associations.

    PubMed

    Gupta, Mayetri; Cheung, Ching-Lung; Hsu, Yi-Hsiang; Demissie, Serkalem; Cupples, L Adrienne; Kiel, Douglas P; Karasik, David

    2011-06-01

    Genome-wide association studies (GWAS) using high-density genotyping platforms offer an unbiased strategy to identify new candidate genes for osteoporosis. It is imperative to be able to clearly distinguish signal from noise by focusing on the best phenotype in a genetic study. We performed GWAS of multiple phenotypes associated with fractures [bone mineral density (BMD), bone quantitative ultrasound (QUS), bone geometry, and muscle mass] with approximately 433,000 single-nucleotide polymorphisms (SNPs) and created a database of the resulting associations. We analyzed GWAS data from 23 phenotypes using a novel modification of a block clustering algorithm followed by gene-set enrichment analysis. A data matrix of standardized regression coefficients was partitioned along both axes, SNPs and phenotypes. Each partition represents a distinct cluster of SNPs that have similar effects over a particular set of phenotypes. Application of this method to our data revealed several SNP-phenotype connections. We found a strong cluster of association coefficients of high magnitude for 10 traits (BMD at several skeletal sites, ultrasound measures, cross-sectional bone area, and section modulus of femoral neck and shaft). These clustered traits were highly genetically correlated. Gene-set enrichment analyses indicated that genes clustering with the 10 osteoporosis-related traits were enriched in pathways such as aldosterone signaling in epithelial cells, role of osteoblasts, osteoclasts, and chondrocytes in rheumatoid arthritis, and Parkinson signaling. In addition to several known candidate genes, we also identified PRKCH and SCNN1B as potential candidate genes for multiple bone traits. In conclusion, our mining of GWAS results revealed similarities in association results between bone strength phenotypes that may be attributed to pleiotropic effects of genes. 
This knowledge may prove helpful in identifying novel genes and pathways that underlie several correlated phenotypes, as well as in deciphering genetic and phenotypic modularity underlying osteoporosis risk. Copyright © 2011 American Society for Bone and Mineral Research.
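Partitioning a matrix "along both axes" means assigning each row (SNP) and each column (phenotype) to a group, so that every (row group, column group) pair forms a block. A generic block-clustering objective scores such a partition by how well each block is summarized by its mean. The sketch below computes only that score for given labels; the search over partitions and the authors' specific modification are not shown, and `block_sse` is a hypothetical name.

```python
def block_sse(matrix, row_labels, col_labels):
    """Block means and within-block sum of squared errors for a two-way partition."""
    sums, counts = {}, {}
    for i, row in enumerate(matrix):
        for j, v in enumerate(row):
            key = (row_labels[i], col_labels[j])  # block index of entry (i, j)
            sums[key] = sums.get(key, 0.0) + v
            counts[key] = counts.get(key, 0) + 1
    means = {key: sums[key] / counts[key] for key in sums}
    sse = sum(
        (v - means[(row_labels[i], col_labels[j])]) ** 2
        for i, row in enumerate(matrix)
        for j, v in enumerate(row)
    )
    return means, sse
```

A block-clustering algorithm would then search for row and column labels that keep this error small, so that each block groups SNPs with similar effects on a particular set of phenotypes.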

  6. Determination System Of Food Vouchers For the Poor Based On Fuzzy C-Means Method

    NASA Astrophysics Data System (ADS)

    Anamisa, D. R.; Yusuf, M.; Syakur, M. A.

    2018-01-01

    Food vouchers are a government program to tackle poverty in rural communities. The program aims to help poor households obtain enough food and nutrients from carbohydrates. Several factors determine eligibility for a food voucher, such as job, monthly income, taxes, electricity bill, house size, number of family members, education certificate and weekly rice consumption. The distribution of vouchers often runs into problems: vouchers are misdirected, and the choice of recipients remains subjective. Some decision-making solutions to these problems have not yet been implemented. This research aims to calculate the change in each partition matrix and each cluster using the Fuzzy C-Means method, and we hope it contributes by showing that Fuzzy C-Means gives better results than other methods for this case study. In this research, decision making is performed with the Fuzzy C-Means method, a clustering method that has an organized and scattered cluster structure with regular patterns on two-dimensional datasets. The Fuzzy C-Means method is also used to calculate the change in each partition matrix. Each cluster is sorted by the proximity of its data elements to the cluster centroid to obtain a ranking. Various trials were conducted to group and rank the proposed food-voucher recipients based on the quota of each village. In these tests, the Fuzzy C-Means method was able to determine food-voucher recipients with satisfactory results: recipient fulfillment was 80% to 90%, using data from 115 family cards from 6 villages. These results were obtained with 20 iterations and 3 clusters.
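The procedure described above (alternating partition-matrix and centroid updates for a fixed number of iterations, then ranking each cluster's members by distance to its centroid) can be sketched with standard Fuzzy C-Means. This is a minimal illustration, not the authors' system: the fuzzifier `m=2.0`, the explicit initial centroids, and the function names are assumptions. The defaults of 20 iterations and 3 clusters mirror the settings reported in the abstract. Each point would be a household's (suitably scaled) eligibility factors.

```python
import math


def _memberships(points, centroids, m):
    """Partition matrix U: U[k][i] is point k's membership in cluster i."""
    U = []
    for p in points:
        d = [max(math.dist(p, v), 1e-12) for v in centroids]  # avoid division by zero
        U.append([1.0 / sum((d[i] / d[j]) ** (2.0 / (m - 1.0)) for j in range(len(d)))
                  for i in range(len(d))])
    return U


def fuzzy_c_means(points, init_centroids, m=2.0, iters=20):
    centroids = [list(c) for c in init_centroids]
    for _ in range(iters):
        U = _memberships(points, centroids, m)
        for i in range(len(centroids)):
            w = [row[i] ** m for row in U]  # fuzzified weights for cluster i
            total = sum(w)
            centroids[i] = [sum(wk * p[d] for wk, p in zip(w, points)) / total
                            for d in range(len(points[0]))]
    return _memberships(points, centroids, m), centroids


def rank_within_clusters(points, U, centroids):
    """Assign each point to its highest-membership cluster, then rank by distance."""
    ranking = {i: [] for i in range(len(centroids))}
    for k, row in enumerate(U):
        ranking[max(range(len(row)), key=row.__getitem__)].append(k)
    for i, members in ranking.items():
        members.sort(key=lambda k: math.dist(points[k], centroids[i]))
    return ranking
```

Selecting recipients according to each village's quota could then take the top-ranked members of the appropriate cluster; that selection rule is not described in the abstract and is left out.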

  7. Beverage Consumption Patterns at Age 13 to 17 Years Are Associated with Weight, Height, and Body Mass Index at Age 17 Years.

    PubMed

    Marshall, Teresa A; Van Buren, John M; Warren, John J; Cavanaugh, Joseph E; Levy, Steven M

    2017-05-01

    Sugar-sweetened beverages (SSBs) have been associated with obesity in children and adults; however, associations between beverage patterns and obesity are not understood. Our aim was to describe beverage patterns during adolescence and associations between adolescent beverage patterns and anthropometric measures at age 17 years. We conducted a cross-sectional analysis of longitudinally collected data. Data from participants in the longitudinal Iowa Fluoride Study having at least one beverage questionnaire completed between ages 13.0 and 14.0 years, having a second questionnaire completed between 16.0 and 17.0 years, and attending clinic examination for weight and height measurements at age 17 years (n=369) were included. Beverages were collapsed into four categories (i.e., 100% juice, milk, water and other sugar-free beverages, and SSBs) for the purpose of clustering. Five beverage clusters were identified from standardized age 13 to 17 years mean daily beverage intakes and named by the authors for the dominant beverage: juice, milk, water/sugar-free beverages, neutral, and SSB. Weight, height, and body mass index (BMI; calculated as kg/m²) at age 17 years were analyzed. We used Ward's method for clustering of beverage variables, one-way analysis of variance and χ² tests for bivariable associations, and γ-regression for associations of weight or BMI (outcomes) with beverage clusters and demographic variables. Linear regression was used for associations of height (outcome) with beverage clusters and demographic variables. Participants with family incomes <$60,000 trended shorter (1.5±0.8 cm; P=0.070) and were heavier (2.0±0.7 BMI units; P=0.002) than participants with family incomes ≥$60,000/year. Adjusted mean weight, height, and BMI estimates differed by beverage cluster membership. For example, on average, male and female members of the neutral cluster were 4.5 cm (P=0.010) and 4.2 cm (P=0.034) shorter, respectively, than members of the milk cluster. 
For members of the juice cluster, mean BMI was lower than for members of the milk cluster (by 2.4 units), water/sugar-free beverage cluster (3.5 units), neutral cluster (2.2 units), and SSB cluster (3.2 units) (all P<0.05). Beverage patterns at ages 13 to 17 years were associated with anthropometric measures and BMI at age 17 years in this sample. Beverage patterns might be characteristic of overall food choices and dietary behaviors that influence growth. Copyright © 2017 Academy of Nutrition and Dietetics. Published by Elsevier Inc. All rights reserved.
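Ward's method is agglomerative: at each step it merges the two clusters whose union least increases the total within-cluster sum of squares. For clusters A and B with sizes n_A, n_B and centroids c_A, c_B, that increase is n_A·n_B/(n_A + n_B)·‖c_A − c_B‖². The naive O(n³) sketch below applies this rule until `k` clusters remain. It assumes each point is, for example, one participant's four standardized mean daily intakes; the study's exact preprocessing and software are not given here.

```python
def ward(points, k):
    """Naive agglomerative clustering with Ward's minimum-variance criterion."""
    clusters = [[i] for i in range(len(points))]
    centroids = [list(p) for p in points]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                na, nb = len(clusters[a]), len(clusters[b])
                sq = sum((x - y) ** 2 for x, y in zip(centroids[a], centroids[b]))
                cost = na * nb / (na + nb) * sq  # increase in within-cluster SSE
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        na, nb = len(clusters[a]), len(clusters[b])
        centroids[a] = [(na * x + nb * y) / (na + nb)
                        for x, y in zip(centroids[a], centroids[b])]
        clusters[a].extend(clusters[b])
        del clusters[b], centroids[b]  # b > a, so index a stays valid
    return clusters
```

For real data sets, SciPy's `scipy.cluster.hierarchy.linkage(X, method="ward")` builds the same kind of hierarchy far more efficiently.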

  8. The hydrological effects of varying vegetation characteristics in a temperate water-limited basin: Development of the dynamic Budyko-Choudhury-Porporato (dBCP) model

    NASA Astrophysics Data System (ADS)

    Liu, Qiang; McVicar, Tim R.; Yang, Zhifeng; Donohue, Randall J.; Liang, Liqiao; Yang, Yuting

    2016-12-01

    Vegetation patterns are affected by water availability, which, in turn, influences the hydrological partitioning and regional water balance, especially in water-limited regions. Considering the important role of vegetation in partitioning the catchment water yield, the recently developed Budyko-Choudhury-Porporato (or BCP) model incorporated Porporato's model of key ecohydrological processes into Choudhury's form of the Budyko hydroclimatic framework. Here we extend the steady-state BCP model by incorporating dynamic ecohydrological processes into it and combining it with a typical bucket soil water balance model (resulting in the dynamic BCP, or dBCP, model). The dBCP model is used here to assess the impacts of vegetation on the water balance in a temperate water-limited basin (i.e., the Yellow River Basin (YRB) in north China), where growing season phenology is primarily constrained by low temperatures. The results show that: (i) the incorporation of dynamic growing season (fs) and dynamic effective rooting depth (Ze) conditions into the dBCP model improves results when compared to the original BCP model; (ii) the dBCP model's results vary depending on the time-step used (i.e., we tested mean-annual to monthly), reflecting the influence of catchment variables, e.g., catchment area, catchment-average air temperature, dryness index and Ze; and (iii) actual evapotranspiration (E) is most sensitive to changes in mean storm depth (α), followed by precipitation (P), Ze, and potential evapotranspiration (Ep). When the observed variability of each of the four ecohydrological variables is taken into account, changes in Ze cause the greatest variability in E, generally followed by variability in P and α, and then Ep. The dBCP results indicate that incorporating dynamic ecohydrological processes into the Budyko framework can improve the estimation of inter-annual variability of the regional water balance. 
This can help in understanding water requirements and in establishing suitable water management strategies to adapt to climate change in the YRB. The dBCP model has modest forcing data requirements and can be applied to other basins globally.
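Choudhury's form of the Budyko curve relates mean annual actual evapotranspiration to supply and demand as E = P·Ep / (P^n + Ep^n)^(1/n). The sketch evaluates that curve with n as a free parameter. In the BCP model, n is tied to Porporato's ecohydrological parameters; that link, and the dynamic bucket component of dBCP, are not reproduced here.

```python
def choudhury_et(p, ep, n):
    """Mean annual actual evapotranspiration E from Choudhury's Budyko-type curve.

    E = P * Ep / (P**n + Ep**n) ** (1/n), rewritten as
    E = P / (1 + (P / Ep)**n) ** (1/n) to avoid overflowing P**n.
    p: precipitation, ep: potential evapotranspiration (same units), n > 0.
    """
    if p <= 0 or ep <= 0 or n <= 0:
        raise ValueError("p, ep and n must be positive")
    return p / (1.0 + (p / ep) ** n) ** (1.0 / n)
```

E never exceeds either the water supply P or the energy-limited demand Ep. As n grows, E approaches min(P, Ep), so larger n corresponds to a catchment that converts more of the available supply into evapotranspiration.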

  9. Characterizing Temperature Variability and Associated Large Scale Meteorological Patterns Across South America

    NASA Astrophysics Data System (ADS)

    Detzer, J.; Loikith, P. C.; Mechoso, C. R.; Barkhordarian, A.; Lee, H.

    2017-12-01

    South America's climate varies considerably owing to its large geographic range and diverse topographical features. Spanning the tropics to the mid-latitudes and from high peaks to tropical rainforest, the continent experiences an array of climate and weather patterns. Due to this considerable spatial extent, assessing temperature variability at the continent scale is particularly challenging. It is well documented in the literature that temperatures have been increasing across portions of South America in recent decades, and while there have been many studies that have focused on precipitation variability and change, temperature has received less scientific attention. Therefore, a more thorough understanding of the drivers of temperature variability is critical for interpreting future change. First, k-means cluster analysis is used to identify four primary modes of temperature variability across the continent, stratified by season. Next, composites of large scale meteorological patterns (LSMPs) are calculated for months assigned to each cluster. Initial results suggest that LSMPs, defined using meteorological variables such as sea level pressure (SLP), geopotential height, and wind, are able to identify synoptic scale mechanisms important for driving temperature variability at the monthly scale. Some LSMPs indicate a relationship with known recurrent modes of climate variability. For example, composites of geopotential height suggest that the Southern Annular Mode is an important, but not necessarily dominant, component of temperature variability over southern South America. This work will be extended to assess the drivers of temperature extremes across South America.
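The abstract names k-means (with four clusters) as the first step. The minimal Lloyd's-algorithm sketch below assumes each point is, for example, one month's temperature anomalies at a set of grid cells; the study's exact input vectors are not described here. It takes fixed initial centroids so that the result is deterministic.

```python
import math


def kmeans(points, init_centroids, max_iter=100):
    """Lloyd's algorithm: alternate nearest-centroid assignment and mean update."""
    centroids = [list(c) for c in init_centroids]
    labels = None
    for _ in range(max_iter):
        new = [min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
               for p in points]
        if new == labels:  # assignments stable: converged
            break
        labels = new
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels, centroids
```

In practice, several random restarts or k-means++ seeding would normally replace the fixed initial centroids. Composites of the large-scale meteorological patterns would then be averaged over the months that share a label.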

  10. Seasonal patterns in energy partitioning of two freshwater marsh ecosystems in the Florida Everglades

    Treesearch

    Sparkle L. Malone; Christina L. Staudhammer; Henry W. Loescher; Paulo Olivas; Steven F. Oberbauer; Michael G. Ryan; Jessica Schedlbauer; Gregory Starr

    2014-01-01

    We analyzed energy partitioning in short- and long-hydroperiod freshwater marsh ecosystems in the Florida Everglades by examining energy balance components (eddy covariance derived latent energy (LE) and sensible heat (H) flux). The study period included several wet and dry seasons and variable water levels, allowing us to gain better mechanistic information about the...

  11. High Pressure and Temperature Core Formation as an Alternative to the "Late Veneer" Hypothesis

    NASA Technical Reports Server (NTRS)

    Righter, Kevin; Pando, K.; Humayun, M.; Danielson, L.

    2011-01-01

    The highly siderophile elements (HSE: Re, Au and the platinum group elements Pd, Pt, Rh, Ru, Ir, Os) are commonly utilized to constrain accretion processes in terrestrial differentiated bodies due to their affinity for FeNi metal [1]. These eight elements exhibit highly siderophile behavior but nonetheless have highly diverse metal-silicate partition coefficients [2]. Therefore, the near-chondritic relative concentrations of HSEs in the terrestrial and lunar mantles, as well as in some other bodies, are attributed to late accretion rather than core formation [1]. Evaluation of competing theories, such as high-pressure metal-silicate partitioning or magma ocean hypotheses, has been hindered by a lack of relevant partitioning data for this group of eight elements. In particular, systematic studies isolating the effect of one variable (e.g. temperature or melt composition) are lacking. Here we undertake new experiments on all eight elements, using Fe metal and FeO-bearing silicate melts at fixed pressure but variable temperatures. These experiments, as well as some additional planned experiments, should allow partition coefficients to be more accurately calculated or estimated at the PT conditions and compositions at which core formation is thought to have occurred.

  12. Cluster analysis of obesity and asthma phenotypes.

    PubMed

    Sutherland, E Rand; Goleva, Elena; King, Tonya S; Lehman, Erik; Stevens, Allen D; Jackson, Leisa P; Stream, Amanda R; Fahy, John V; Leung, Donald Y M

    2012-01-01

    Asthma is a heterogeneous disease with variability among patients in characteristics such as lung function, symptoms and control, body weight, markers of inflammation, and responsiveness to glucocorticoids (GC). Cluster analysis of well-characterized cohorts can advance understanding of disease subgroups in asthma and point to unsuspected disease mechanisms. We utilized a hypothesis-free cluster analytical approach to define the contribution of obesity and related variables to asthma phenotype. In a cohort of clinical trial participants (n = 250), minimum-variance hierarchical clustering was used to identify clinical and inflammatory biomarkers important in determining disease cluster membership in mild and moderate persistent asthmatics. In a subset of participants, GC sensitivity was assessed via expression of GC receptor alpha (GCRα) and induction of MAP kinase phosphatase-1 (MKP-1) expression by dexamethasone. Four asthma clusters were identified, with body mass index (BMI, kg/m²) and severity of asthma symptoms (AEQ score) the most significant determinants of cluster membership (F = 57.1, p<0.0001 and F = 44.8, p<0.0001, respectively). Two clusters were composed of predominantly obese individuals; these two obese asthma clusters differed from one another with regard to age of asthma onset, measures of asthma symptoms (AEQ) and control (ACQ), exhaled nitric oxide concentration (FENO) and airway hyperresponsiveness (methacholine PC20) but were similar with regard to measures of lung function (FEV1 (%) and FEV1/FVC), airway eosinophilia, IgE, leptin, adiponectin and C-reactive protein (hsCRP). Members of obese clusters demonstrated evidence of reduced expression of GCRα, a finding which was correlated with a reduced induction of MKP-1 expression by dexamethasone. Obesity is an important determinant of asthma phenotype in adults. There is heterogeneity in expression of clinical and inflammatory biomarkers of asthma across obese individuals. 
Reduced expression of the dominant functional isoform of the GCR may mediate GC insensitivity in obese asthmatics.

  13. Synergistic Effects of Age on Patterns of White and Gray Matter Volume across Childhood and Adolescence

    PubMed Central

    Krongold, Mark; Cooper, Cassandra; Lebel, Catherine

    2015-01-01

    The human brain develops with a nonlinear contraction of gray matter across late childhood and adolescence with a concomitant increase in white matter volume. Across the adult population, properties of cortical gray matter covary within networks that may represent organizational units for development and degeneration. Although gray matter covariance may be strongest within structurally connected networks, the relationship to volume changes in white matter remains poorly characterized. In the present study we examined age-related trends in white and gray matter volume using T1-weighted MR images from 360 human participants from the NIH MRI study of Normal Brain Development. Images were processed through a voxel-based morphometry pipeline. Linear effects of age on white and gray matter volume were modeled within four age bins, spanning 4-18 years, each including 90 participants (45 male). White and gray matter age-slope maps were separately entered into k-means clustering to identify regions with similar age-related variability across the four age bins. Four white matter clusters were identified, each with a dominant direction of underlying fibers: anterior–posterior, left–right, and two clusters with superior–inferior directions. Corresponding, spatially proximal, gray matter clusters encompassed largely cerebellar, fronto-insular, posterior, and sensorimotor regions, respectively. Pairs of gray and white matter clusters followed parallel slope trajectories, with white matter changes generally positive from 8 years onward (indicating volume increases) and gray matter negative (decreases). As developmental disorders likely target networks rather than individual regions, characterizing typical coordination of white and gray matter development can provide a normative benchmark for understanding atypical development. PMID:26464999

  14. Variable Screening for Cluster Analysis.

    ERIC Educational Resources Information Center

    Donoghue, John R.

    Inclusion of irrelevant variables in a cluster analysis adversely affects subgroup recovery. This paper examines using moment-based statistics to screen variables; only variables that pass the screening are then used in clustering. It is shown analytically that normal mixtures often have negative kurtosis. Two related measures, "m" and…
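The kurtosis idea can be illustrated with a simple screen. The two related measures named in the paper are not reproduced, because the abstract is truncated. Instead, this sketch assumes a plain excess-kurtosis rule with a hypothetical cut-off: a well-separated, balanced two-group mixture has strongly negative excess kurtosis, whereas a single normal group is near zero.

```python
def excess_kurtosis(xs):
    """Population excess kurtosis m4 / m2**2 - 3 (0 for a normal distribution)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / m2 ** 2 - 3.0


def screen_variables(columns, threshold=-0.5):
    """Keep variables whose excess kurtosis falls below a (hypothetical) threshold."""
    return [name for name, xs in columns.items() if excess_kurtosis(xs) < threshold]
```

Negative kurtosis is a hint rather than proof of group structure: a uniform variable, for instance, has excess kurtosis of -1.2 and would also pass this screen. Constant columns (zero variance) must be removed beforehand.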

  15. Resource partitioning within major bottom fish species in a highly productive upwelling ecosystem

    NASA Astrophysics Data System (ADS)

    Abdellaoui, Souad; El Halouani, Hassan; Tai, Imane; Masski, Hicham

    2017-09-01

    The Saharan Bank (21-26°N) is a wide subtropical continental shelf and a highly productive upwelling ecosystem. Its bottom communities are dominated by octopus and sparid fish, which are the main targets of bottom-trawl fishing fleets. To investigate resource partitioning within the bottom fish community, the stomach contents of adult fish from 14 of the most abundant species were analysed. Samples were collected during two periods: October 2003 and May 2007. The diets of the analysed species varied more between periods than between size classes, suggesting that temporal or spatial variability in prey availability plays a significant role in their diet. Multivariate analysis and subsequent clustering grouped the species into five trophic guilds. Two species were fish feeders, and the others fed mainly on benthic invertebrates; epibenthic crustaceans, lamellibranchs and fish were the most important prey groups in defining trophic guilds. We found that the studied species had highly overlapping spatial distributions and trophic niches. In this highly productive upwelling ecosystem, where food resources may not be a limiting factor, inter-specific competition did not appear to be an important factor in structuring bottom fish communities. For the species whose proportions of prey categories differed from those in other ecosystems, a common feature was a higher proportion of epibenthic crustaceans in the diet, possibly a consequence of the benthic productivity of this highly productive upwelling ecosystem.

  16. Near-infrared variability study of the central 2.3 × 2.3 arcmin² of the Galactic Centre - II. Identification of RR Lyrae stars in the Milky Way nuclear star cluster

    NASA Astrophysics Data System (ADS)

    Dong, Hui; Schödel, Rainer; Williams, Benjamin F.; Nogueras-Lara, Francisco; Gallego-Cano, Eulalia; Gallego-Calvente, Teresa; Wang, Q. Daniel; Rich, R. Michael; Morris, Mark R.; Do, Tuan; Ghez, Andrea

    2017-11-01

    Because of strong and spatially highly variable interstellar extinction and extreme source crowding, the faint (K ≥ 15) stellar population in the Milky Way's nuclear star cluster is still poorly studied. RR Lyrae stars provide a tool to estimate the mass of the oldest, relatively dim stellar population. Recently, we analysed HST/WFC3/IR observations of the central 2.3 × 2.3 arcmin² of the Milky Way and found 21 variable stars with periods between 0.2 and 1 d. Here, we present a more comprehensive analysis of these stars. The period-luminosity relationship of RR Lyrae stars is used to derive their extinctions and distances. Using multiple approaches, we classify our sample as 4 RRc stars, 4 RRab stars, 3 RRab candidates and 10 binaries. In particular, the four RRab stars show sawtooth light curves and fall exactly on the Oosterhoff I division in the Bailey diagram. Compared with the RRab stars reported by Minniti et al., our new RRab stars have higher extinction (AK > 1.8) and should be closer to the Galactic Centre. The extinction and distance of one of the RRab stars match those given for the Milky Way's nuclear star cluster in previous works. We perform simulations and find that, after correcting for incompleteness, there could be no more than 40 RRab stars within the Milky Way's nuclear star cluster and within our field of view. By comparison with the known globular clusters of the Milky Way, we estimate that if an old, metal-poor (-1.5 < [Fe/H] < -1) stellar population exists in the Milky Way's nuclear star cluster on a scale of 5 × 5 pc, it contributes at most 4.7 × 10^5 M⊙, i.e. ~18 per cent of the stellar mass.
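How a period-luminosity (PL) relation yields both extinction and distance can be summarized with the distance modulus. This is a sketch under stated assumptions: a PL relation gives the absolute magnitude M_λ(P) in two filters λ1 and λ2, and an extinction-law ratio A_λ1/A_λ2 is adopted. The abstract does not give the filters, PL coefficients or extinction law actually used.

```latex
\begin{aligned}
m_{\lambda} - M_{\lambda}(P) &= 5\log_{10}\!\left(\frac{d}{10\,\mathrm{pc}}\right) + A_{\lambda} \\
E_{\lambda_1\lambda_2} &= \left(m_{\lambda_1} - m_{\lambda_2}\right) - \left(M_{\lambda_1}(P) - M_{\lambda_2}(P)\right) = A_{\lambda_1} - A_{\lambda_2} \\
A_{\lambda_2} &= \frac{E_{\lambda_1\lambda_2}}{A_{\lambda_1}/A_{\lambda_2} - 1}
\end{aligned}
```

Substituting the derived A_λ2 back into the first line for λ = λ2 then gives the distance d.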

  17. The adiposity of children is associated with their lifestyle behaviours: a cluster analysis of school-aged children from 12 nations.

    PubMed

    Dumuid, Dorothea; Olds, T; Lewis, L K; Martin-Fernández, J A; Barreira, T; Broyles, S; Chaput, J-P; Fogelholm, M; Hu, G; Kuriyan, R; Kurpad, A; Lambert, E V; Maia, J; Matsudo, V; Onywera, V O; Sarmiento, O L; Standage, M; Tremblay, M S; Tudor-Locke, C; Zhao, P; Katzmarzyk, P; Gillison, F; Maher, C

    2018-02-01

    The relationship between children's adiposity and lifestyle behaviour patterns is an area of growing interest. The objectives of this study are to identify clusters of children based on lifestyle behaviours and to compare children's adiposity among clusters. Cross-sectional data from the International Study of Childhood Obesity, Lifestyle and the Environment were used. The participants were children (9-11 years) from 12 nations (n = 5710). The clustering input variables were 24-h accelerometry, self-reported diet and screen time. Objectively measured adiposity indicators were waist-to-height ratio, percent body fat and body mass index z-scores. Sex-stratified analyses were performed on the global sample and repeated on a site-wise basis. Cluster analysis (using isometric log ratios for compositional data) was used to identify common lifestyle behaviour patterns. Site representation and adiposity were compared across clusters using linear models. Four clusters emerged: (1) Junk Food Screenies, (2) Actives, (3) Sitters and (4) All-Rounders. Countries were represented differently among clusters: Chinese children were over-represented in Sitters and Colombian children in Actives. Adiposity varied across clusters, being highest in Sitters and lowest in Actives. Children from different sites clustered into groups of similar lifestyle behaviours, and cluster membership was linked with differing adiposity. Findings support the implementation of activity interventions in all countries, targeting both physical activity and sedentary time. © 2016 World Obesity Federation.
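Time-use data such as a 24-h day split into sleep, sedentary time and activity are compositional: the parts sum to a constant, so they are usually transformed before clustering. The abstract says isometric log ratios were used but not which basis. The sketch assumes pivot coordinates, z_i = sqrt((D-i)/(D-i+1)) · ln(x_i / gmean(x_{i+1}, …, x_D)) for i = 1, …, D-1, which is one standard ilr basis.

```python
import math


def ilr_pivot(x):
    """Pivot (isometric log-ratio) coordinates of a composition with positive parts."""
    D = len(x)
    if D < 2 or any(v <= 0 for v in x):
        raise ValueError("need at least two strictly positive parts")
    z = []
    for i in range(D - 1):
        rest = x[i + 1:]
        gmean = math.exp(sum(math.log(v) for v in rest) / len(rest))
        k = D - i - 1  # number of parts after the pivot part
        z.append(math.sqrt(k / (k + 1)) * math.log(x[i] / gmean))
    return z
```

For example, `ilr_pivot([540, 500, 300, 100])` maps a hypothetical day of 540, 500, 300 and 100 minutes to three real coordinates that ordinary k-means or hierarchical clustering can use. The result does not change if all parts are rescaled (minutes versus hours). Zero parts must be replaced before taking logs.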

  18. Pipelining Architecture of Indexing Using Agglomerative Clustering

    NASA Astrophysics Data System (ADS)

    Goyal, Deepika; Goyal, Deepti; Gupta, Parul

    2010-11-01

    The World Wide Web is an interlinked collection of billions of documents. Ironically, the huge size of this collection has become an obstacle to information retrieval. Search engines are used to access information on the Internet, and they retrieve pages through an indexer. This paper introduces a novel pipelining technique for structuring the core index-building system that substantially reduces index construction time, together with a clustering algorithm that partitions the set of documents into ordered clusters so that documents within the same cluster are similar and are assigned closer document identifiers. After the documents are assigned to clusters, the system builds a hierarchy of indexes so that searching is efficient: it automatically forms super-clusters and then mega-clusters. The pipeline architecture builds the index in a space- and time-efficient manner. It directs the search from higher to lower levels of the index, that is, from higher-level to lower-level clusters, so that the user obtains matching results quickly. Because each cluster is formed from only two lower-level clusters, the search at each level of the index is limited to two clusters, which saves time.
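The hierarchy described above (clusters built by merging pairs, then searched from the top level down while comparing against only two clusters per level) can be sketched as a binary tree. This is an illustration under stated assumptions, not the paper's pipeline. Documents are assumed to be numeric vectors, merging uses the closest pair of centroids under Euclidean distance, and `Node`, `build_tree` and `search` are hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    centroid: List[float]
    size: int
    doc_id: Optional[int] = None      # set only on leaves
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build_tree(vectors):
    """Agglomerative clustering: repeatedly merge the two closest clusters."""
    nodes = [Node(list(v), 1, doc_id=i) for i, v in enumerate(vectors)]
    while len(nodes) > 1:
        pairs = [(i, j) for i in range(len(nodes)) for j in range(i + 1, len(nodes))]
        a, b = min(pairs, key=lambda ij: math.dist(nodes[ij[0]].centroid, nodes[ij[1]].centroid))
        na, nb = nodes[a], nodes[b]
        size = na.size + nb.size
        centroid = [(na.size * x + nb.size * y) / size
                    for x, y in zip(na.centroid, nb.centroid)]
        merged = Node(centroid, size, left=na, right=nb)
        nodes = [n for k, n in enumerate(nodes) if k not in (a, b)] + [merged]
    return nodes[0]


def search(root, query):
    """Descend from the root, comparing the query with only two children per level."""
    node = root
    while node.doc_id is None:
        node = min((node.left, node.right), key=lambda child: math.dist(child.centroid, query))
    return node.doc_id
```

The greedy descent is fast because it costs one comparison pair per level. However, it is not guaranteed to return the exact nearest document when a query falls between two clusters.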

  19. On the rebound: soil organic carbon stocks can bounce back to near forest levels when agroforests replace agriculture in southern India

    NASA Astrophysics Data System (ADS)

    Hombegowda, H. C.; van Straaten, O.; Köhler, M.; Hölscher, D.

    2015-08-01

    Tropical agroforestry has an enormous potential to sequester carbon while simultaneously producing agricultural yields and tree products. The amount of soil organic carbon (SOC) sequestered is, however, influenced by the type of agroforestry system established, the soil and climatic conditions, and management. In this regional-scale study, we used a chronosequence approach to investigate how SOC stocks changed when the original forests were converted to agriculture, and then subsequently to four different agroforestry systems (AFSs): homegarden, coffee, coconut and mango. In total, we established 224 plots in 56 plot clusters across four climate zones in southern India. Each plot cluster consisted of four plots: a natural forest reference plot, an agriculture reference plot and two plots of the same AFS type at two ages (30-60 years and > 60 years). The conversion of forest to agriculture resulted in a large loss of the original SOC stock (50-61 %) in the top meter of soil, depending on the climate zone. The establishment of homegarden and coffee AFSs on agricultural land caused SOC stocks to rebound to near forest levels, while in mango and coconut AFSs the SOC stock increased only slightly above the agriculture stock. The most important variable regulating SOC stocks and their changes was tree basal area, possibly indicative of organic matter inputs. Furthermore, climatic variables such as temperature and precipitation, and soil variables such as clay fraction and soil pH, were also important regulators of SOC and SOC stock changes. Lastly, we found a strong correlation between tree species diversity in homegarden and coffee AFSs and SOC stocks, highlighting possibilities to increase carbon stocks through appropriate tree species assemblages.

  20. Pinus ponderosa: A checkered past obscured four species.

    PubMed

    Willyard, Ann; Gernandt, David S; Potter, Kevin; Hipkins, Valerie; Marquardt, Paula; Mahalovich, Mary Frances; Langer, Stephen K; Telewski, Frank W; Cooper, Blake; Douglas, Connor; Finch, Kristen; Karemera, Hassani H; Lefler, Julia; Lea, Payton; Wofford, Austin

    2017-01-01

    Molecular genetic evidence can help delineate taxa in species complexes that lack diagnostic morphological characters. Pinus ponderosa (Pinaceae; subsection Ponderosae) is recognized as a problematic taxon: plastid phylogenies of exemplars were paraphyletic, and mitochondrial phylogeography suggested at least four subdivisions of P. ponderosa. These patterns have not been examined in the context of other Ponderosae species. We hypothesized that putative intraspecific subdivisions might each represent a separate taxon. We genotyped six highly variable plastid simple sequence repeats in 1903 individuals from 88 populations of P. ponderosa and related Ponderosae (P. arizonica, P. engelmannii, and P. jeffreyi). We used multilocus haplotype networks and discriminant analysis of principal components to test clustering of individuals into genetically and geographically meaningful taxonomic units. There are at least four distinct plastid clusters within P. ponderosa that roughly correspond to the geographic distribution of mitochondrial haplotypes. Some geographic regions have intermixed plastid lineages, and some mitochondrial and plastid boundaries do not coincide. Based on relative distances to other species of Ponderosae, these clusters diagnose four distinct taxa. Newly revealed geographic boundaries of four distinct taxa (P. benthamiana, P. brachyptera, P. scopulorum, and a narrowed concept of P. ponderosa) do not correspond completely with taxonomies. Further research is needed to understand their morphological and nuclear genetic makeup, but we suggest that resurrecting originally published species names would more appropriately reflect the taxonomy of this checkered classification than their current treatment as varieties of P. ponderosa. © 2017 Willyard et al. Published by the Botanical Society of America. This work is licensed under a Creative Commons public domain license (CC0 1.0).

  1. Neuropsychological phenotypes among men with and without HIV disease in the multicenter AIDS cohort study.

    PubMed

    Molsberry, Samantha A; Cheng, Yu; Kingsley, Lawrence; Jacobson, Lisa; Levine, Andrew J; Martin, Eileen; Miller, Eric N; Munro, Cynthia A; Ragin, Ann; Sacktor, Ned; Becker, James T

    2018-05-11

    Mild forms of HIV-associated neurocognitive disorder (HAND) remain prevalent in the combination anti-retroviral therapy (cART) era. This study's objective was to identify neuropsychological subgroups within the Multicenter AIDS Cohort Study (MACS) based on the participant-based latent structure of cognitive function and to identify factors associated with subgroups. The MACS is a four-site longitudinal study of the natural and treated history of HIV disease among gay and bisexual men. Using neuropsychological domain scores, we applied a cluster variable selection algorithm to identify the optimal subset of domains with cluster information. Latent profile analysis was applied using scores from the identified domains. Exploratory and post-hoc analyses were conducted to identify factors associated with cluster membership and the drivers of the observed associations. Cluster variable selection identified all domains as containing cluster information except for Working Memory. A three-profile solution produced the best fit for the data. Profile 1 performed below average on all domains; Profile 2 performed at an average level on executive functioning, motor, and speed and below average on learning and memory; Profile 3 performed at or above average across all domains. Several demographic, cognitive, and social factors were associated with profile membership; these associations were driven by differences between Profile 1 and the other profiles. There is an identifiable pattern of neuropsychological performance among MACS members determined by all domains except Working Memory. Neither HIV nor HIV-related biomarkers were related to cluster membership, consistent with other findings that cognitive performance patterns do not map directly onto HIV serostatus.

  2. Aboveground sink strength in forests controls the allocation of carbon below ground and its [CO2]-induced enhancement

    Treesearch

    Sari Palmroth; Ram Oren; Heather R. McCarthy; Kurt H. Johnsen; Adrien C. Finzi; John R. Butnor; Michael G. Ryan; William H. Schlesinger

    2006-01-01

    The partitioning among carbon (C) pools of the extra C captured under elevated atmospheric CO2 concentration ([CO2]) determines the enhancement in C sequestration, yet no clear partitioning rules exist. Here, we used first principles and published data from four free-air CO2 enrichment (FACE)...

  3. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wu, J; Gensheimer, M; Dong, X

    Purpose: To develop an intra-tumor partitioning framework for identifying high-risk subregions from ¹⁸F-fluorodeoxyglucose positron emission tomography (FDG-PET) and CT imaging, and to test whether tumor burden associated with the high-risk subregions is prognostic of outcomes in lung cancer. Methods: In this institutional review board-approved retrospective study, we analyzed the pre-treatment FDG-PET and CT scans of 44 lung cancer patients treated with radiotherapy. A novel intra-tumor partitioning method was developed based on a two-stage clustering process: first, at the patient level, each tumor was over-segmented into many superpixels by k-means clustering of integrated PET and CT images; next, tumor subregions were identified by merging the previously defined superpixels via population-level hierarchical clustering. The volume associated with each subregion was evaluated using Kaplan-Meier analysis for its prognostic capability in predicting overall survival (OS) and out-of-field progression (OFP). Results: Three spatially distinct subregions were identified within each tumor, and they were highly robust to uncertainty in PET/CT co-registration. Among these, the volume of the most metabolically active and metabolically heterogeneous solid component of the tumor was predictive of OS and OFP in the entire cohort, with a concordance index (CI) of 0.66–0.67. When the analysis was restricted to patients with stage III disease (n = 32), the same subregion achieved an even higher CI of 0.75 (HR = 3.93, log-rank p = 0.002) for predicting OS and a CI of 0.76 (HR = 4.84, log-rank p = 0.002) for predicting OFP. In comparison, conventional imaging markers, including tumor volume, SUVmax and MTV50, were not predictive of OS or OFP, with CI mostly below 0.60 (p < 0.001). Conclusion: We propose a robust intra-tumor partitioning method to identify clinically relevant, high-risk subregions in lung cancer. We envision that this approach will be applicable to identifying useful imaging biomarkers in many cancer types.
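    The concordance index (CI) reported above measures how often a predictor ranks patients correctly by survival time. A minimal sketch of Harrell's C for right-censored data follows; the abstract does not say which C-index estimator was used, so Harrell's version, its tie handling, and the convention that a higher score means higher risk (shorter survival) are assumptions here.

```python
def harrell_c(times, events, risks):
    """Harrell's concordance index for right-censored survival data.

    times  -- observed follow-up times
    events -- 1 if the event (e.g. death) was observed, 0 if censored
    risks  -- predicted risk scores; higher means shorter expected survival
    A pair (i, j) is comparable when i had an observed event strictly before time j.
    Ties in risk count as half-concordant; pairs with tied times are skipped.
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

    Here `risks` could be, for instance, the volume of a candidate high-risk subregion, with larger volumes treated as higher risk. A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking.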

  4. The cluster model of a hot dense vapor

    NASA Astrophysics Data System (ADS)

    Zhukhovitskii, D. I.

    2015-04-01

    We explore thermodynamic properties of a vapor in the range of state parameters where the contribution to thermodynamic functions from bound states of atoms (clusters) dominates over the interaction between the components of the vapor in free states. The clusters are assumed to be light and sufficiently "hot" for the number of bonds to be minimized. We use the technique of calculation of the cluster partition function for the cluster with a minimum number of interatomic bonds to calculate the caloric properties (heat capacity and velocity of sound) for an ideal mixture of the lightest clusters. The problem proves to be exactly solvable and resulting formulas are functions solely of the equilibrium constant of the dimer formation. These formulas ensure a satisfactory correlation with the reference data for the vapors of cesium, mercury, and argon up to moderate densities in both the sub- and supercritical regions. For cesium, we extend the model to the densities close to the critical one by inclusion of the clusters of arbitrary size. Knowledge of the cluster composition of the cesium vapor makes it possible to treat nonequilibrium phenomena such as nucleation of the supersaturated vapor, for which the effect of the cluster structural transition is likely to be significant.

  5. An advanced method for classifying atmospheric circulation types based on prototypes connectivity graph

    NASA Astrophysics Data System (ADS)

    Zagouras, Athanassios; Argiriou, Athanassios A.; Flocas, Helena A.; Economou, George; Fotopoulos, Spiros

    2012-11-01

    Classification of weather maps at various isobaric levels has been used as a methodological tool for many years in several problems related to meteorology, climatology, atmospheric pollution and other fields. Initially, classification was performed manually. The criteria used by the person performing the classification are features of isobars or isopleths of geopotential height, depending on the type of maps to be classified. Although manual classifications integrate the perceptual experience and other unquantifiable qualities of the meteorology specialists involved, they are typically subjective and time-consuming. In recent years, various automated methods for atmospheric circulation classification have been proposed, which produce automated, so-called objective classifications. In this paper, a new method for the atmospheric circulation classification of isobaric maps is presented. The method is based on graph theory. It starts with an intelligent prototype selection using an over-partitioning mode of the fuzzy c-means (FCM) algorithm, proceeds to a graph formulation for the entire dataset, and produces the clusters using the contemporary dominant sets clustering method. Graph theory offers a novel mathematical approach that allows a more efficient representation of spatially correlated data than the classical Euclidean-space representations used in conventional classification methods. The method has been applied to the classification of the 850 hPa atmospheric circulation over the Eastern Mediterranean. The automated methods are evaluated using statistical indices; results indicate that the classification is adequately comparable with other state-of-the-art automated map classification methods for a variable number of clusters.
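    The prototype-selection step above relies on fuzzy c-means (FCM). The sketch below is the textbook FCM iteration (alternating membership and center updates with fuzzifier m) in plain Python. Running it with more clusters than the final number of circulation types is one reading of the "over-partitioning mode" in the abstract; the graph formulation and dominant-sets stage are not shown, and the deterministic farthest-point initialization and the default parameters are illustrative assumptions.

```python
import math


def _farthest_point_init(points, c):
    """Start at points[0], then repeatedly add the point farthest from the chosen centers."""
    centers = [points[0]]
    while len(centers) < c:
        centers.append(max(points, key=lambda p: min(math.dist(p, ctr) for ctr in centers)))
    return centers


def _memberships(point, centers, exponent):
    """u_i = 1 / sum_j (d_i / d_j) ** (2 / (m - 1)); full membership if the point sits on a center."""
    dists = [math.dist(point, ctr) for ctr in centers]
    if min(dists) == 0.0:
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    return [1.0 / sum((di / dj) ** exponent for dj in dists) for di in dists]


def fuzzy_c_means(points, c, m=2.0, max_iter=300, tol=1e-6):
    """Return (centers, memberships); memberships[k][i] is point k's degree in cluster i."""
    if m <= 1.0:
        raise ValueError("fuzzifier m must be > 1")
    exponent = 2.0 / (m - 1.0)
    dim = len(points[0])
    centers = _farthest_point_init(points, c)
    u = None
    for _ in range(max_iter):
        new_u = [_memberships(p, centers, exponent) for p in points]
        converged = u is not None and max(
            abs(a - b) for row_new, row_old in zip(new_u, u) for a, b in zip(row_new, row_old)
        ) < tol
        u = new_u
        # Center update: mean of the points weighted by membership ** m.
        centers = []
        for i in range(c):
            w = [row[i] ** m for row in u]
            w_sum = sum(w)
            centers.append(tuple(
                sum(wk * p[d] for wk, p in zip(w, points)) / w_sum for d in range(dim)
            ))
        if converged:
            break
    return centers, u
```

    Unlike k-means, every point keeps a graded membership in every cluster, which is what makes the resulting centers usable as soft prototypes for a later merging stage.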

  6. Anionic Pt in Silicate Melts at Low Oxygen Fugacity: Speciation, Partitioning and Implications for Core Formation Processes on Asteroids

    NASA Technical Reports Server (NTRS)

    Medard, E.; Martin, A. M.; Righter, K.; Malouta, A.; Lee, C.-T.

    2017-01-01

    Most siderophile element concentrations in planetary mantles can be explained by metal/silicate equilibration at high temperature and pressure during core formation. Highly siderophile elements (HSE = Au, Re, and the Pt-group elements), however, usually have higher mantle abundances than predicted by partitioning models, suggesting that their concentrations were set by late accretion of material that did not equilibrate with the core. The partitioning of HSE at the low oxygen fugacities relevant for core formation is, however, poorly constrained, owing to a lack of sufficient experimental constraints describing how partitioning varies with key variables such as temperature, pressure and oxygen fugacity. To better understand the relative roles of metal/silicate partitioning and late accretion, we performed a self-consistent set of experiments that parameterizes the influence of oxygen fugacity, temperature and melt composition on the partitioning of Pt, one of the HSE, between metal and silicate melts. The major outcome of this project is the finding that Pt dissolves in an anionic form in silicate melts, causing a dependence of partitioning on oxygen fugacity opposite to that reported in previous studies.

  7. Value-based customer grouping from large retail data sets

    NASA Astrophysics Data System (ADS)

    Strehl, Alexander; Ghosh, Joydeep

    2000-04-01

    In this paper, we propose OPOSSUM, a novel similarity-based clustering algorithm using constrained, weighted graph partitioning. Instead of binary presence or absence of products in a market-basket, we use an extended 'revenue per product' measure to better account for management objectives. Typically the number of clusters desired in a database marketing application is only in the teens or less. OPOSSUM proceeds top-down, which is more efficient and takes a small number of steps to attain the desired number of clusters as compared to bottom-up agglomerative clustering approaches. OPOSSUM delivers clusters that are balanced in terms of either customers (samples) or revenue (value). To facilitate data exploration and validation of results we introduce CLUSION, a visualization toolkit for high-dimensional clustering problems. To enable closed loop deployment of the algorithm, OPOSSUM has no user-specified parameters. Thresholding heuristics are avoided and the optimal number of clusters is automatically determined by a search for maximum performance. Results are presented on a real retail industry data-set of several thousand customers and products, to demonstrate the power of the proposed technique.

  8. Hazard evaluation of ten organophosphorous insecticides against the midge, Chironomus riparius via QSAR

    USGS Publications Warehouse

    Landrum, Peter F.; Fisher, Susan W.; Hwang, Haejo; Hickey, James P.

    1999-01-01

    Toxicities of ten organophosphorus (OP) insecticides were measured against midge larvae (Chironomus riparius) under varying temperature (11, 18, and 25°C) and pH (6, 7, and 8) conditions and with and without sediment. Toxicity usually increased with increasing temperature and was greater in the absence of sediment. No trend was found with varying pH. A series of unidimensional parameters and multidimensional models were used to describe the changes in toxicity. Log Kow was able to explain about 40–60% of the variability in response data for aqueous exposures, while molecular volume and aqueous solubility were less predictive. Likewise, the linear solvation energy relationship (LSER) model explained only 40–70% of the response variability, suggesting that factors other than solubility were most important for producing the observed response. Molecular connectivity was the most useful for describing the variability in the response. In the absence of sediment, ¹χᵛ and ³κ were best able to describe the variation in response among all compounds at each pH (70–90%). In the presence of sediment, even molecular connectivity could not describe the variability until the partitioning potential to sediment was accounted for by assuming equilibrium partitioning. After correcting for partitioning, the same molecular connectivity terms as in the aqueous exposures described most of the variability (61–87%), except for the 11°C data, where correlations were not significant. Molecular connectivity was a better tool than LSER or the unidimensional variables for explaining the steric fitness of OP insecticides, which was crucial to the toxicity.

  9. Harnessing the Bethe free energy

    PubMed Central

    Bapst, Victor

    2016-01-01

    A wide class of problems in combinatorics, computer science and physics can be described along the following lines. There is a large number of variables ranging over a finite domain that interact through constraints, each of which binds a few variables and either encourages or discourages certain value combinations. Examples include the k-SAT problem and the Ising model. Such models naturally induce a Gibbs measure on the set of assignments, which is characterised by its partition function. The present paper deals with the partition function of problems where the interactions between variables and constraints are induced by a sparse random (hyper)graph. According to physics predictions, a generic recipe called the “replica symmetric cavity method” yields the correct value of the partition function if the underlying model enjoys certain properties [Krzakala et al., PNAS (2007) 10318–10323]. Guided by this conjecture, we prove general sufficient conditions for the success of the cavity method. The proofs are based on a “regularity lemma” for probability measures on sets of the form Ωⁿ for a finite Ω and a large n that may be of independent interest. © 2016 Wiley Periodicals, Inc. Random Struct. Alg., 49, 694–741, 2016 PMID:28035178
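    To make "Gibbs measure" and "partition function" concrete, the sketch below computes, by brute-force enumeration, Z = sum over all spin assignments of exp(beta * J * sum over edges (i, j) of s_i * s_j) for an Ising model on a small graph with spins s_i in {-1, +1}. It only illustrates the definition: enumeration is exponential in the number of variables, and it says nothing about the cavity method or the sparse random-graph regime studied in the paper. The coupling J and inverse temperature beta are illustrative parameters.

```python
import itertools
import math


def ising_weight(spins, edges, beta, coupling=1.0):
    """Unnormalised Gibbs weight exp(beta * J * sum over edges of s_i * s_j)."""
    return math.exp(beta * coupling * sum(spins[i] * spins[j] for i, j in edges))


def ising_partition_function(n, edges, beta, coupling=1.0):
    """Brute-force partition function: sum of Gibbs weights over all 2**n spin assignments."""
    return sum(ising_weight(s, edges, beta, coupling)
               for s in itertools.product((-1, 1), repeat=n))


def gibbs_probability(spins, edges, beta, coupling=1.0):
    """Probability of one assignment under the Gibbs measure (recomputes Z on each call)."""
    z = ising_partition_function(len(spins), edges, beta, coupling)
    return ising_weight(spins, edges, beta, coupling) / z
```

    A quick sanity check: on any tree with n vertices, each edge contributes an independent factor 2 cosh(beta * J), so Z = 2 * (2 cosh(beta * J)) ** (n - 1).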

  10. A multiple-time-scale turbulence model based on variable partitioning of turbulent kinetic energy spectrum

    NASA Technical Reports Server (NTRS)

    Kim, S.-W.; Chen, C.-P.

    1987-01-01

    A multiple-time-scale turbulence model of a single point closure and a simplified split-spectrum method is presented. In the model, the effect of the ratio of the production rate to the dissipation rate on eddy viscosity is modeled by use of the multiple-time-scales and a variable partitioning of the turbulent kinetic energy spectrum. The concept of a variable partitioning of the turbulent kinetic energy spectrum and the rest of the model details are based on the previously reported algebraic stress turbulence model. Example problems considered include: a fully developed channel flow, a plane jet exhausting into a moving stream, a wall jet flow, and a weakly coupled wake-boundary layer interaction flow. The computational results compared favorably with those obtained by using the algebraic stress turbulence model as well as experimental data. The present turbulence model, as well as the algebraic stress turbulence model, yielded significantly improved computational results for the complex turbulent boundary layer flows, such as the wall jet flow and the wake boundary layer interaction flow, compared with available computational results obtained by using the standard kappa-epsilon turbulence model.

  11. A multiple-time-scale turbulence model based on variable partitioning of the turbulent kinetic energy spectrum

    NASA Technical Reports Server (NTRS)

    Kim, S.-W.; Chen, C.-P.

    1989-01-01

    A multiple-time-scale turbulence model of a single point closure and a simplified split-spectrum method is presented. In the model, the effect of the ratio of the production rate to the dissipation rate on eddy viscosity is modeled by use of the multiple-time-scales and a variable partitioning of the turbulent kinetic energy spectrum. The concept of a variable partitioning of the turbulent kinetic energy spectrum and the rest of the model details are based on the previously reported algebraic stress turbulence model. Example problems considered include: a fully developed channel flow, a plane jet exhausting into a moving stream, a wall jet flow, and a weakly coupled wake-boundary layer interaction flow. The computational results compared favorably with those obtained by using the algebraic stress turbulence model as well as experimental data. The present turbulence model, as well as the algebraic stress turbulence model, yielded significantly improved computational results for the complex turbulent boundary layer flows, such as the wall jet flow and the wake boundary layer interaction flow, compared with available computational results obtained by using the standard kappa-epsilon turbulence model.

  12. Dietary flexibility and niche partitioning of large herbivores through the Pleistocene of Britain

    NASA Astrophysics Data System (ADS)

    Rivals, Florent; Lister, Adrian M.

    2016-08-01

    Tooth wear analysis techniques (mesowear and microwear) are employed to analyze dietary traits in proboscideans, perissodactyls and artiodactyls from 33 Pleistocene localities in Britain. The objectives of this study are to examine the variability in each taxon, to track dietary shifts through time, and to investigate resource partitioning among species. The integration of mesowear and microwear results first allowed us to examine dietary variability. We identified differences in variability among species, from more stenotopic species such as Capreolus capreolus to more eurytopic species such as Megaloceros giganteus and Cervus elaphus. Broad dietary shifts at the community level are seen between climatic phases, and are the result of species turnover as well as dietary shifts in the more flexible species. The species present at each locality are generally spread over a large part of the dietary spectrum, and resource partitioning was identified at most of these localities. Mixed feeders always coexist with at least one of the two strict dietary groups, grazers or browsers. Finally, for some species, a discrepancy is observed between meso- and microwear signals and may imply that individuals tended to die at a time of year when their normal food was in short supply.

  13. Intercenter Differences in Bronchopulmonary Dysplasia or Death Among Very Low Birth Weight Infants

    PubMed Central

    Walsh, Michele; Bobashev, Georgiy; Das, Abhik; Levine, Burton; Carlo, Waldemar A.; Higgins, Rosemary D.

    2011-01-01

    OBJECTIVES: To determine (1) the magnitude of clustering of bronchopulmonary dysplasia (36 weeks) or death (the outcome) across centers of the Eunice Kennedy Shriver National Institute of Child Health and Human Development Neonatal Research Network, (2) the infant-level variables associated with the outcome and their clustering, and (3) the center-specific practices associated with the differences, and to build predictive models. METHODS: Data on neonates with a birth weight of <1250 g from the cluster-randomized benchmarking trial were used to determine the magnitude of clustering of the outcome by alternating logistic regression with pairwise odds ratios and by predictive modeling. Clinical variables associated with the outcome were identified by using multivariate analysis. The magnitude of clustering was then evaluated after correction for infant-level variables. Predictive models were developed by using center-specific and infant-level variables for data from 2001–2004 and projected to 2006. RESULTS: In 2001–2004, clustering of bronchopulmonary dysplasia/death was significant (pairwise odds ratio: 1.3; P < .001) and increased in 2006 (pairwise odds ratio: 1.6; overall incidence: 52%; range across centers: 32%–74%); center rates were relatively stable over time. Variables that varied according to center and were associated with increased risk of the outcome included lower body temperature at NICU admission, use of prophylactic indomethacin, specific drug therapy on day 1, and lack of endotracheal intubation. Center differences remained significant even after correction for clustered variables. CONCLUSION: Bronchopulmonary dysplasia/death rates demonstrated moderate clustering according to center. Clinical variables associated with the outcome were also clustered. Center differences after correction for clustered variables indicate the presence of as-yet unmeasured center variables. PMID:21149431
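    The pairwise odds ratio (PWOR) used above quantifies within-center clustering of a binary outcome: it compares the odds that one infant from a center has the outcome given that another infant from the same center has it with the corresponding odds given that the other infant does not. The sketch below computes only a crude, pooled empirical PWOR from within-center pairs. The study estimated PWOR with alternating logistic regression, which also adjusts for covariates, so this illustrates the quantity rather than the authors' model; the input layout (a dict mapping a center label to a list of 0/1 outcomes) is an assumption.

```python
from math import comb


def crude_pairwise_odds_ratio(outcomes_by_center):
    """Pooled empirical pairwise odds ratio for a binary outcome clustered by center.

    outcomes_by_center maps a center label to a list of 0/1 outcomes, one per infant.
    Only pairs of infants from the same center are counted.
    """
    both_pos = both_neg = discordant = 0
    for outcomes in outcomes_by_center.values():
        pos = sum(outcomes)
        neg = len(outcomes) - pos
        both_pos += comb(pos, 2)   # unordered pairs where both have the outcome
        both_neg += comb(neg, 2)   # unordered pairs where neither does
        discordant += pos * neg    # unordered pairs where exactly one does
    if discordant == 0:
        raise ValueError("no discordant pairs; odds ratio undefined")
    # Ordered-pair 2x2 table: N11 = 2*both_pos, N00 = 2*both_neg, N10 = N01 = discordant.
    return (2 * both_pos) * (2 * both_neg) / (discordant * discordant)
```

    Values near 1 indicate little within-center agreement beyond chance; values above 1, such as the 1.3 and 1.6 reported above, indicate that infants from the same center tend to share the outcome.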

  14. The partition function of the Bures ensemble as the τ-function of BKP and DKP hierarchies: continuous and discrete

    NASA Astrophysics Data System (ADS)

    Hu, Xing-Biao; Li, Shi-Hao

    2017-07-01

    The relationship between matrix integrals and integrable systems was revealed more than 20 years ago. As is known, matrix integrals over a Gaussian ensemble used in random matrix theory could act as the τ-function of several hierarchies of integrable systems. In this article, we will show that the time-dependent partition function of the Bures ensemble, whose measure has many interesting geometric properties, could act as the τ-function of BKP and DKP hierarchies. In addition, if discrete time variables are introduced, then this partition function could act as the τ-function of discrete BKP and DKP hierarchies. In particular, there are some links between the partition function of the Bures ensemble and Toda-type equations.

  15. Acoustic, genetic and morphological variations within the katydid Gampsocleis sedakovii (Orthoptera, Tettigonioidea)

    PubMed Central

    Zhang, Xue; Wen, Ming; Li, Junjian; Zhu, Hui; Wang, Yinliang; Ren, Bingzhong

    2015-01-01

    In an attempt to explain the variation within this species and clarify its subspecies classification, an analysis of the genetic, calling-song, and morphological variation within Gampsocleis sedakovii from Inner Mongolia, China, is presented. Male calling songs were recorded and compared, and selected acoustic variables were analysed. This analysis is combined with sequencing of mtDNA COI and examination of morphological traits to perform cluster analyses. The trees constructed from the different datasets were structurally similar, bisecting the six geographical populations studied. Based on the two large branches in the analysis, the species Gampsocleis sedakovii was partitioned into two subspecies, Gampsocleis sedakovii sedakovii (Fischer von Waldheim, 1846) and Gampsocleis sedakovii obscura (Walker, 1869). Comparing all traits, individuals from Elunchun (ELC) were of an intermediate type in this species according to their acoustic, genetic, and morphological characteristics. This study provides evidence for insect acoustic signal divergence and the process of subspeciation. PMID:26692795

  16. Similarity-transformed perturbation theory on top of truncated local coupled cluster solutions: Theory and applications to intermolecular interactions

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Azar, Richard Julian, E-mail: julianazar2323@berkeley.edu; Head-Gordon, Martin, E-mail: mhg@cchem.berkeley.edu

    2015-05-28

    We develop and apply fully nonorthogonal, local-reference perturbation theories describing non-covalent interactions. Our formulations are based on a Löwdin partitioning of the similarity-transformed Hamiltonian into a zeroth-order intramonomer piece (taking local CCSD solutions as its zeroth-order eigenfunction) plus a first-order piece coupling the fragments. If considerations are limited to a single molecule, the proposed intermolecular similarity-transformed perturbation theory represents a frozen-orbital variant of the “(2)”-type theories shown to be competitive with CCSD(T) and of similar cost if all terms are retained. Different restrictions on the zeroth- and first-order amplitudes are explored in the context of large-computation tractability and elucidation of non-local effects in the space of singles and doubles. To accurately approximate CCSD intermolecular interaction energies, a quadratically growing number of variables must be included at zeroth order.

  17. When the wind goes out of the sail - declining recovery expectations in the first weeks of back pain.

    PubMed

    Carstens, J K P; Shaw, W S; Boersma, K; Reme, S E; Pransky, G; Linton, S J

    2014-02-01

    Expectations for recovery are a known predictor of return to work. Most studies seem to conclude that the higher the expectancy, the better the outcome. However, the development of expectations over time is rarely researched, and experimental studies show that realistic expectations, rather than high expectancies, are the most adaptive. This study aims to explore patterns of stability and change in expectations for recovery during the first weeks of a back-pain episode and how these patterns relate to other psychological variables and outcome. The study included 496 volunteer patients seeking treatment for work-related, acute back pain. The participants were measured with self-report scales of depression, fear of pain, life impact of pain, catastrophizing and expectations for recovery at two time points. A follow-up focusing on recovery and return to work was conducted 3 months later. A cluster analysis was conducted, categorizing the data on the trajectories of recovery expectations. Cluster analysis revealed four clusters regarding the development of expectations for recovery during a 2-week period after pain onset. Three of the four clusters showed stability in their expectations as well as corresponding levels of proximal psychological factors. The fourth cluster showed increases in distress and a decrease in expectations for recovery. This cluster also had poor odds ratios for return to work and recovery. Decreases in expectancies for recovery seem as important as baseline values in terms of outcome, which has clinical and theoretical implications. © 2013 European Pain Federation - EFIC®

  18. Partitioning dynamic electron correlation energy: Viewing Møller-Plesset correlation energies through Interacting Quantum Atom (IQA) energy partitioning

    NASA Astrophysics Data System (ADS)

    McDonagh, James L.; Vincent, Mark A.; Popelier, Paul L. A.

    2016-10-01

    Here MP2, MP3 and MP4(SDQ) are energy-partitioned for the first time within the Interacting Quantum Atoms (IQA) context, as proof-of-concept for H2, He2 and HF. Energies are decomposed into four primary energy contributions: (i) atomic self-energies, and atomic interaction energies comprising (ii) Coulomb, (iii) exchange and (iv) dynamic electron correlation terms. We generate and partition one- and two-particle density-matrices to obtain all atomic energy components. This work suggests that, in terms of Van der Waals dispersion, the correlation energies represent an atomic stabilisation, by proximity to other atoms, as opposed to direct interactions with other nearby atoms.

  19. Exploring cluster Monte Carlo updates with Boltzmann machines

    NASA Astrophysics Data System (ADS)

    Wang, Lei

    2017-11-01

    Boltzmann machines are physics-informed generative models with broad applications in machine learning. They model the probability distribution of an input data set with latent variables and generate new samples accordingly. Applied back to physics, Boltzmann machines are ideal recommender systems for accelerating Monte Carlo simulations of physical systems, owing to their flexibility and effectiveness. More intriguingly, we show that the generative sampling of Boltzmann machines can even give rise to different cluster Monte Carlo algorithms. The latent representation of Boltzmann machines can be designed to mediate complex interactions and identify clusters of the physical system. We demonstrate these findings with concrete examples of the classical Ising model with and without four-spin plaquette interactions. In the future, automatic searches in the algorithm space parametrized by Boltzmann machines may discover more innovative Monte Carlo updates.

  20. Metal-Silicate Partitioning of Various Siderophile Elements at High Pressure and High Temperatures: a Diamond Anvil Cell Study

    NASA Astrophysics Data System (ADS)

    Badro, J.; Blanchard, I.; Siebert, J.

    2015-12-01

    Core formation is the major chemical fractionation event that occurred on Earth. This event is widely believed to have happened at pressures of at least 40 GPa and temperatures exceeding 3000 K. It left a significant imprint on the chemistry of the mantle by removing most of the siderophile (iron-loving) elements from it. Abundances of most siderophile elements in the bulk silicate Earth are significantly different from those predicted by experiments at low P-T. Among them, vanadium, chromium, cobalt and gallium are four siderophile elements whose abundances in the mantle have been marked by core formation processes. Thus, understanding their respective abundances in the mantle can help constrain the conditions of Earth's differentiation. We performed high-pressure, high-temperature experiments using a laser-heated diamond anvil cell to investigate the metal-silicate partitioning of those four elements. Homogeneous glasses doped with vanadium, chromium, cobalt and gallium were synthesized using a levitation furnace and loaded inside the diamond anvil cell along with metallic powder. Samples were recovered using a focused ion beam and chemically analyzed using an electron microprobe. We investigate the effect of pressure, temperature and metal composition on the metal-silicate partitioning of V, Cr, Co and Ga. Three previous studies focused on V, Cr and Co partitioning at those conditions of pressure and temperature, but none explored gallium partitioning at the relevant extreme conditions of core formation. We will present the first measurements of gallium metal-silicate partitioning performed at the appropriate conditions of pressure and temperature of Earth's differentiation.

  1. Clinical Characteristics of Exacerbation-Prone Adult Asthmatics Identified by Cluster Analysis.

    PubMed

    Kim, Mi Ae; Shin, Seung Woo; Park, Jong Sook; Uh, Soo Taek; Chang, Hun Soo; Bae, Da Jeong; Cho, You Sook; Park, Hae Sim; Yoon, Ho Joo; Choi, Byoung Whui; Kim, Yong Hoon; Park, Choon Sik

    2017-11-01

    Asthma is a heterogeneous disease characterized by various types of airway inflammation and obstruction. Therefore, it is classified into several subphenotypes, such as early-onset atopic, obese non-eosinophilic, benign, and eosinophilic asthma, using cluster analysis. A number of asthmatics frequently experience exacerbation over a long-term follow-up period, but the exacerbation-prone subphenotype has rarely been evaluated by cluster analysis. This prompted us to identify clusters reflecting asthma exacerbation. A uniform cluster analysis method was applied to 259 adult asthmatics who were regularly followed up for over 1 year, using 12 variables selected on the basis of their contribution to asthma phenotypes. After clustering, clinical profiles and exacerbation rates during follow-up were compared among the clusters. Four subphenotypes were identified: cluster 1 comprised patients with early-onset atopic asthma with preserved lung function; cluster 2, late-onset non-atopic asthma with impaired lung function; cluster 3, early-onset atopic asthma with severely impaired lung function; and cluster 4, late-onset non-atopic asthma with well-preserved lung function. The patients in clusters 2 and 3 were identified as exacerbation-prone asthmatics, showing a higher risk of asthma exacerbation. Two different phenotypes of exacerbation-prone asthma were identified among Korean asthmatics using cluster analysis; both were characterized by impaired lung function, but the age at asthma onset and atopic status differed between the two. Copyright © 2017 The Korean Academy of Asthma, Allergy and Clinical Immunology · The Korean Academy of Pediatric Allergy and Respiratory Disease

  2. ADPROCLUS: a graphical user interface for fitting additive profile clustering models to object by variable data matrices.

    PubMed

    Wilderjans, Tom F; Ceulemans, Eva; Van Mechelen, Iven; Depril, Dirk

    2011-03-01

    In many areas of psychology, one is interested in disclosing the underlying structural mechanisms that generated an object by variable data set. Often, based on theoretical or empirical arguments, it may be expected that these underlying mechanisms imply that the objects are grouped into clusters that are allowed to overlap (i.e., an object may belong to more than one cluster). In such cases, analyzing the data with Mirkin's additive profile clustering model may be appropriate. In this model: (1) each object may belong to no, one or several clusters, (2) there is a specific variable profile associated with each cluster, and (3) the scores of the objects on the variables can be reconstructed by adding the cluster-specific variable profiles of the clusters the object in question belongs to. Until now, however, no software program has been publicly available to perform an additive profile clustering analysis. For this purpose, in this article, the ADPROCLUS program, steered by a graphical user interface, is presented. We further illustrate its use by means of the analysis of a patient by symptom data matrix.
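
Rule (3) of the additive profile model can be illustrated with a toy example. The sketch below, with a hypothetical function name and made-up data, only rebuilds the object-by-variable scores from given memberships and profiles; it does not fit the model. Fitting, as ADPROCLUS does, means searching for the memberships and profiles whose reconstruction is closest to the observed data (typically in a least-squares sense), and that search is not shown.

```python
def reconstruct_scores(memberships, profiles):
    """Additive profile model: an object's scores are the sum of the profiles
    of every cluster it belongs to, so overlapping memberships add up."""
    n_vars = len(profiles[0])
    scores = []
    for row in memberships:
        totals = [0.0] * n_vars  # an object in no cluster gets all zeros
        for k, belongs in enumerate(row):
            if belongs:
                totals = [t + p for t, p in zip(totals, profiles[k])]
        scores.append(totals)
    return scores


# Hypothetical toy data: 2 clusters, 3 variables (e.g. symptoms), 4 objects (e.g. patients).
profiles = [
    [2.0, 0.0, 1.0],  # profile of cluster 1
    [0.0, 3.0, 1.0],  # profile of cluster 2
]
memberships = [
    [1, 0],  # only cluster 1
    [0, 1],  # only cluster 2
    [1, 1],  # both clusters: the two profiles are added
    [0, 0],  # no cluster
]
print(reconstruct_scores(memberships, profiles))
```

The third object illustrates the point of the model: belonging to two overlapping clusters yields the sum of both profiles rather than a compromise between them.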

  3. The association between mood state and chronobiological characteristics in bipolar I disorder: a naturalistic, variable cluster analysis-based study.

    PubMed

    Gonzalez, Robert; Suppes, Trisha; Zeitzer, Jamie; McClung, Colleen; Tamminga, Carol; Tohen, Mauricio; Forero, Angelica; Dwivedi, Alok; Alvarado, Andres

    2018-02-19

    Multiple types of chronobiological disturbances have been reported in bipolar disorder, including characteristics associated with general activity levels, sleep, and rhythmicity. Previous studies have focused on examining the individual relationships between affective state and chronobiological characteristics. The aim of this study was to conduct a variable cluster analysis in order to ascertain how mood states are associated with chronobiological traits in bipolar I disorder (BDI). We hypothesized that manic symptomatology would be associated with disturbances of rhythm. Variable cluster analysis identified five chronobiological clusters in 105 BDI subjects. Cluster 1, comprising subjective sleep quality, was associated with both mania and depression. Cluster 2, which comprised variables describing the degree of rhythmicity, was associated with mania. Significant associations between mood state and cluster analysis-identified chronobiological variables were noted. Disturbances of mood were associated with subjectively assessed sleep disturbances as opposed to objectively determined, actigraphy-based sleep variables. No associations with general activity variables were noted. Relationships of gender and of the medication classes in use with cluster analysis-identified chronobiological characteristics were noted. Exploratory analyses noted that medication class had a larger impact on these relationships than the number of psychiatric medications in use. In a BDI sample, variable cluster analysis was able to group related chronobiological variables. The results support our primary hypothesis that mood state, particularly mania, is associated with chronobiological disturbances. Further research is required in order to define these relationships and to determine the directionality of the associations between mood state and chronobiological characteristics.

  4. Clustervision: Visual Supervision of Unsupervised Clustering.

    PubMed

    Kwon, Bum Chul; Eysenbach, Ben; Verma, Janu; Ng, Kenney; De Filippi, Christopher; Stewart, Walter F; Perer, Adam

    2018-01-01

    Clustering, the process of grouping together similar items into distinct partitions, is a common type of unsupervised machine learning that can be useful for summarizing and aggregating complex multi-dimensional data. However, data can be clustered in many ways, and there exists a large body of algorithms designed to reveal different patterns. While having access to a wide variety of algorithms is helpful, in practice, it is quite difficult for data scientists to choose and parameterize algorithms to get the clustering results relevant for their dataset and analytical tasks. To alleviate this problem, we built Clustervision, a visual analytics tool that helps data scientists find the right clustering among the large number of available techniques and parameters. Our system clusters data using a variety of clustering techniques and parameters and then ranks clustering results utilizing five quality metrics. In addition, users can guide the system to produce more relevant results by providing task-relevant constraints on the data. Our visual user interface allows users to find high quality clustering results, explore the clusters using several coordinated visualization techniques, and select the cluster result that best suits their task. We demonstrate this novel approach using a case study with a team of researchers in the medical domain and showcase that our system empowers users to choose an effective representation of their complex data.

  5. Percolation of the site random-cluster model by Monte Carlo method

    NASA Astrophysics Data System (ADS)

    Wang, Songsong; Zhang, Wanzhou; Ding, Chengxiang

    2015-08-01

    We propose a site random-cluster model by introducing an additional cluster weight in the partition function of traditional site percolation. To simulate the model on a square lattice, we combine the color-assignation and Swendsen-Wang methods to design a highly efficient cluster algorithm with only weak critical slowing-down. To verify whether or not it is consistent with the bond random-cluster model, we measure several quantities, such as the wrapping probability Rₑ, the percolating cluster density P∞, and the magnetic susceptibility per site χₚ, as well as two exponents, namely the thermal exponent yₜ and the fractal dimension yₕ of the percolating cluster. We find that for different values of the cluster weight, q = 1.5, 2, 2.5, 3, 3.5, and 4, the numerical estimates of the exponents yₜ and yₕ are consistent with the theoretical values. The universalities of the site random-cluster model and the bond random-cluster model are completely identical. For larger values of q, we find obvious signatures of a first-order percolation transition in the histograms and hysteresis loops of the percolating cluster density and the energy per site. Our results are helpful for understanding the percolation of traditional statistical models.

  6. Multilevel systems biology modeling characterized the atheroprotective efficiencies of modified dairy fats in a hamster model.

    PubMed

    Martin, Jean-Charles; Berton, Amélie; Ginies, Christian; Bott, Romain; Scheercousse, Pierre; Saddi, Alessandra; Gripois, Daniel; Landrier, Jean-François; Dalemans, Daniel; Alessi, Marie-Christine; Delplanque, Bernadette

    2015-09-01

    We assessed the atheroprotective efficiency of modified dairy fats in hyperlipidemic hamsters. A systems biology approach was implemented to reveal and quantify the dietary fat-related components of the disease. Three modified dairy fats (40% energy) were prepared from regular butter by mixing with a plant oil mixture, by removing cholesterol alone, or by removing cholesterol in combination with reducing saturated fatty acids. A plant oil mixture and a regular butter were used as control diets. The atherosclerosis severity (aortic cholesteryl-ester level) was higher in the regular butter-fed hamsters than in the other four groups (P < 0.05). Eighty-seven of the 1,666 variables measured from multiplatform analysis were found to be strongly associated with the disease. When aggregated into 10 biological clusters combined into a multivariate predictive equation, these 87 variables explained 81% of the disease variability. The biological cluster "regulation of lipid transport and metabolism" appeared central to atherogenic development relative to diets. The "vitamin E metabolism" cluster was the main driver of atheroprotection with the best performing transformed dairy fat. Under conditions that promote atherosclerosis, the impact of dairy fats on atherogenesis could be greatly ameliorated by technological modifications. Our modeling approach allowed for identifying and quantifying the contribution of complex factors to atherogenic development in each dietary setup. Copyright © 2015 the American Physiological Society.

  7. Two Cepheid variables in the Fornax dwarf galaxy

    NASA Technical Reports Server (NTRS)

    Light, R. M.; Armandroff, T. E.; Zinn, R.

    1986-01-01

    Two fields surrounding globular clusters 2 and 3 in the Fornax dwarf spheroidal galaxy have been searched for short-period variable stars that are brighter than the horizontal branch. This survey confirmed as variable the two suspected suprahorizontal-branch variables discovered by Buonanno et al. (1985) in their photometry of the clusters. The observations show that the star in cluster 2 is a W Virginis variable with a period of 14.4 days. It is the first W Vir variable to be found in a dwarf spheroidal galaxy, and its proximity to the center of cluster 2 suggests that it is a cluster member. The other star appears to be an anomalous Cepheid with a period of 0.78 days. It lies outside or very near the boundary of cluster 3 and is therefore probably a member of the field population of Fornax. Although no other suprahorizontal-branch variables were discovered in the survey, it did confirm as variable two of the RR Lyrae candidates of Buonanno et al., which appeared at the survey limit. The implications of these observations for understanding the stellar content of Fornax are discussed.

  8. Prediction of chemotherapeutic response in bladder cancer using K-means clustering of dynamic contrast-enhanced (DCE)-MRI pharmacokinetic parameters.

    PubMed

    Nguyen, Huyen T; Jia, Guang; Shah, Zarine K; Pohar, Kamal; Mortazavi, Amir; Zynger, Debra L; Wei, Lai; Yang, Xiangyu; Clark, Daniel; Knopp, Michael V

    2015-05-01

    To apply k-means clustering of two pharmacokinetic parameters derived from 3T dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) to predict the chemotherapeutic response in bladder cancer at the mid-cycle timepoint. With the predetermined number of three clusters, k-means clustering was performed on nondimensionalized Amp and kep estimates of each bladder tumor. Three cluster volume fractions (VFs) were calculated for each tumor at baseline and mid-cycle. The changes in the three cluster VFs from baseline to mid-cycle were correlated with the tumor's chemotherapeutic response. Receiver-operating-characteristic curve analysis was used to evaluate the performance of each cluster VF change as a biomarker of chemotherapeutic response in bladder cancer. The k-means clustering partitioned each bladder tumor into cluster 1 (low kep and low Amp), cluster 2 (low kep and high Amp), and cluster 3 (high kep and low Amp). The changes in all three cluster VFs were found to be associated with bladder tumor response to chemotherapy. The VF change of cluster 2 had the highest area-under-the-curve value (0.96) and the highest sensitivity/specificity/accuracy (96%/100%/97%) with a selected cutoff value. The k-means clustering of the two DCE-MRI pharmacokinetic parameters can characterize the complex microcirculatory changes within a bladder tumor to enable early prediction of the tumor's chemotherapeutic response. © 2014 Wiley Periodicals, Inc.

  9. Prediction of chemotherapeutic response in bladder cancer using k-means clustering of DCE-MRI pharmacokinetic parameters

    PubMed Central

    Nguyen, Huyen T.; Jia, Guang; Shah, Zarine K.; Pohar, Kamal; Mortazavi, Amir; Zynger, Debra L.; Wei, Lai; Yang, Xiangyu; Clark, Daniel; Knopp, Michael V.

    2015-01-01

    Purpose To apply k-means clustering of two pharmacokinetic parameters derived from 3T DCE-MRI to predict chemotherapeutic response in bladder cancer at the mid-cycle time-point. Materials and Methods With the pre-determined number of 3 clusters, k-means clustering was performed on non-dimensionalized Amp and kep estimates of each bladder tumor. Three cluster volume fractions (VFs) were calculated for each tumor at baseline and mid-cycle. The changes in the three cluster VFs from baseline to mid-cycle were correlated with the tumor’s chemotherapeutic response. Receiver-operating-characteristic curve analysis was used to evaluate the performance of each cluster VF change as a biomarker of chemotherapeutic response in bladder cancer. Results k-means clustering partitioned each bladder tumor into cluster 1 (low kep and low Amp), cluster 2 (low kep and high Amp), and cluster 3 (high kep and low Amp). The changes in all three cluster VFs were found to be associated with bladder tumor response to chemotherapy. The VF change of cluster 2 had the highest area-under-the-curve value (0.96) and the highest sensitivity/specificity/accuracy (96%/100%/97%) with a selected cutoff value. Conclusion k-means clustering of the two DCE-MRI pharmacokinetic parameters can characterize the complex microcirculatory changes within a bladder tumor to enable early prediction of the tumor’s chemotherapeutic response. PMID:24943272
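
The two steps described in records 8 and 9, k-means with k = 3 on each voxel's (kep, Amp) pair followed by the fraction of the tumor's voxels in each cluster, can be sketched in pure Python. This is a minimal illustration of plain Lloyd's k-means, not the authors' pipeline. It assumes the two parameters are already nondimensionalized, and it seeds the centroids at the low/high corners so that cluster labels keep the same meaning from scan to scan (the abstracts do not say how labels were matched between baseline and mid-cycle). The function names and voxel values are hypothetical.

```python
import math


def kmeans(points, seeds, max_iter=100):
    """Plain Lloyd's k-means on 2-D points, started from the given seed centroids."""
    centroids = [tuple(s) for s in seeds]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each voxel goes to its nearest centroid (Euclidean distance).
        new_labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its member voxels.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids


def volume_fractions(labels, k):
    """Fraction of the tumor's voxels that fall in each of the k clusters."""
    return [labels.count(j) / len(labels) for j in range(k)]


# Hypothetical nondimensionalized (kep, Amp) pairs for six voxels of one scan.
voxels = [(0.10, 0.10), (0.15, 0.12), (0.10, 0.90),
          (0.12, 0.85), (0.90, 0.10), (0.85, 0.15)]
# Seeds ordered as cluster 1 (low kep, low Amp), 2 (low kep, high Amp), 3 (high kep, low Amp).
seeds = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)]
labels, _ = kmeans(voxels, seeds)
print(volume_fractions(labels, 3))
```

Under this setup, the per-cluster VF change used as a candidate biomarker would presumably be the mid-cycle fraction minus the baseline fraction for the same cluster label.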

  10. Confinement and Mayer cluster expansions

    NASA Astrophysics Data System (ADS)

    Bourgine, Jean-Emile

    2014-05-01

    In this paper, we study a class of grand-canonical partition functions with a kernel depending on a small parameter ɛ. This class is directly relevant to Nekrasov partition functions of 𝒩 = 2 SUSY gauge theories on the 4d Ω-background, for which ɛ is identified with one of the equivariant deformation parameters. In the Nekrasov-Shatashvili limit ɛ→0, we show that the free energy is given by an on-shell effective action. The equations of motion take the form of a TBA equation. The free energy is identified with the Yang-Yang functional of the corresponding system of Bethe roots. We further study the associated canonical model that takes the form of a generalized matrix model. Confinement of the eigenvalues by the short-range potential is observed. In the limit where this confining potential becomes weak, the collective field theory formulation is recovered. Finally, we discuss the connection with the alternative expression of instanton partition functions as sums over Young tableaux.

  11. Buckminsterfullerene's (C60) octanol-water partition coefficient (Kow) and aqueous solubility.

    PubMed

    Jafvert, Chad T; Kulkarni, Pradnya P

    2008-08-15

    To assess the risk and fate of fullerene C60 in the environment, knowledge of its water solubility and of its partition coefficients in various systems is useful. In this study, the log Kow of C60 was measured to be 6.67, and the toluene-water partition coefficient was measured at log Ktw = 8.44. From these values and the respective solubilities of C60 in water-saturated octanol and water-saturated toluene, C60's aqueous solubility was calculated at 7.96 ng/L (1.11 × 10⁻¹¹ M) for the organic solvent-saturated aqueous phase. Additionally, the solubility of C60 was measured in mixtures of ethanol-water and tetrahydrofuran-water and modeled with Wohl's equation to confirm the accuracy of the calculated solubility value. Results of a generator column experiment strongly support the hypothesis that clusters form at aqueous concentrations below or near this calculated solubility. The Kow value is compared to those of other hydrophobic organic compounds, and bioconcentration factors for C60 were estimated on the basis of Kow.
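
Two pieces of arithmetic sit behind the numbers in this abstract. The sketch below, with hypothetical function names, converts a mass concentration to molarity using C60's molar mass (60 × 12.011 g/mol, about 720.66 g/mol) and writes out the partitioning relation the solubility calculation presumably relies on. It assumes ideal equilibrium partitioning up to saturation, so that the aqueous solubility is the solvent-phase solubility divided by K; it is not the authors' code.

```python
C60_MOLAR_MASS = 60 * 12.011  # g/mol, from the standard atomic weight of carbon


def ng_per_litre_to_molar(conc_ng_per_l, molar_mass_g_per_mol=C60_MOLAR_MASS):
    """Convert a mass concentration in ng/L to mol/L."""
    return conc_ng_per_l * 1e-9 / molar_mass_g_per_mol


def aqueous_solubility_from_partition(solubility_in_solvent, log_k):
    """Estimate solubility in the solvent-saturated aqueous phase.

    Assumes K = C_solvent / C_water holds up to saturation, so
    C_water,sat = C_solvent,sat / K (output in the same units as the input).
    """
    return solubility_in_solvent / 10 ** log_k


print(f"{ng_per_litre_to_molar(7.96):.2e} mol/L")
```

For 7.96 ng/L this gives about 1.1 × 10⁻¹¹ M, which agrees with the molarity quoted above to within about 1%.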

  12. Molecular phylogeny of the aquatic beetle family Noteridae (Coleoptera: Adephaga) with an emphasis on data partitioning strategies.

    PubMed

    Baca, Stephen M; Toussaint, Emmanuel F A; Miller, Kelly B; Short, Andrew E Z

    2017-02-01

    The first molecular phylogenetic hypothesis for the aquatic beetle family Noteridae is inferred using DNA sequence data from five gene fragments (mitochondrial and nuclear): COI, H3, 16S, 18S, and 28S. Our analysis is the most comprehensive phylogenetic reconstruction of Noteridae to date, and includes 53 species representing all subfamilies, tribes and 16 of the 17 genera within the family. We examine the impact of data partitioning on phylogenetic inference by comparing two different algorithm-based partitioning strategies: one using predefined subsets of the dataset, and another recently introduced method, which uses the k-means algorithm to iteratively divide the dataset into clusters of sites evolving at similar rates across sampled loci. We conducted both maximum likelihood and Bayesian inference analyses using these different partitioning schemes. The resulting trees are strongly incongruent with prior classifications of Noteridae. We recover differing tree topologies and support values across the implemented partitioning schemes. Bayes factors calculated with marginal likelihoods of Bayesian analyses support a priori partitioning over k-means and unpartitioned data strategies. Our study substantiates the importance of data partitioning in phylogenetic inference, and underscores the use of comparative analyses to determine optimal analytical strategies. Our analyses recover Noterini Thomson as paraphyletic with respect to three other tribes. The genera Suphisellus Crotch and Hydrocanthus Say are also recovered as paraphyletic. Following the results of the preferred partitioning scheme, we here propose a revised classification of Noteridae, comprising two subfamilies, three tribes and 18 genera. The following taxonomic changes are made: Notomicrinae sensu n. (= Phreatodytinae syn. n.) is expanded to include the tribe Phreatodytini; Noterini sensu n. (= Neohydrocoptini syn. n., Pronoterini syn. n., Tonerini syn. n.) is expanded to include all genera of the Noterinae; the genus Suphisellus Crotch is expanded to include species of Pronoterus Sharp syn. n.; and the former subgenus Sternocanthus Guignot stat. rev. is resurrected from synonymy and elevated to genus rank. Copyright © 2016 Elsevier Inc. All rights reserved.

  13. A conceptual framework of outcomes for caregivers of assistive technology users.

    PubMed

    Demers, Louise; Fuhrer, Marcus J; Jutai, Jeffrey; Lenker, James; Depa, Malgorzata; De Ruyter, Frank

    2009-08-01

    To develop and validate the content of a conceptual framework concerning outcomes for caregivers whose recipients are assistive technology users. The study was designed in four stages. First, a list of potential key variables relevant to the caregivers of assistive technology users was generated from a review of the existing literature and semistructured interviews with caregivers. Second, the variables were analyzed, regrouped, and partitioned, using a conceptual mapping approach. Third, the key areas were anchored in a general stress model of caregiving. Finally, the judgments of rehabilitation experts were used to evaluate the conceptual framework. An important result of this study is the identification of a complex set of variables that need to be considered when examining the experience of caregivers of assistive technology users. Stressors, such as types of assistance, number of tasks, and physical effort, are predominant contributors to caregiver outcomes along with caregivers' personal resources acting as mediating factors (intervening variables) and assistive technology acting as a key moderating factor (effect modifier variable). Recipients' use of assistive technology can enhance caregivers' well-being because of its potential for alleviating a number of stressors associated with caregiving. Viewed as a whole, this work demonstrates that the assistive technology experience of caregivers has many facets that merit the attention of outcomes researchers.

  14. Stereotypy and variability of social calls among clustering female big-footed myotis (Myotis macrodactylus).

    PubMed

    Xiao, Yan-Hong; Wang, Lei; Hoyt, Joseph R; Jiang, Ting-Lei; Lin, Ai-Qing; Feng, Jiang

    2018-03-18

    Echolocating bats have developed advanced auditory perception systems, predominantly using acoustic signaling to communicate with each other. They can emit a diverse range of social calls in complex behavioral contexts. This study examined the vocal repertoire of five pregnant big-footed myotis bats (Myotis macrodactylus). In the process of clustering, the last individual to return to the colony (LI) emitted social calls that correlated with behavior, as recorded on a PC-based digital recorder. These last individuals could emit 10 simple monosyllabic and 27 complex multisyllabic call types, built from four syllable types. The social calls were composed of highly stereotyped syllables, hierarchically organized by a common set of syllables. However, intra-specific variation was also found in the number of syllables, syllable order and patterns of syllable repetition across call renditions. Data were obtained to characterize the significant individual differences that existed in the maximum frequency and duration of calls. Time taken to return to the roost was negatively associated with the diversity of social calls. Our findings indicate that variability in social calls may be an effective strategy taken by individuals during reintegration into clusters of female M. macrodactylus.

  15. Tripartite efficacy profiles: a cluster analytic investigation of athletes' perceptions of their relationship with their coach.

    PubMed

    Jackson, Ben; Gucciardi, Daniel F; Dimmock, James A

    2011-06-01

    Recent studies of coach-athlete interaction have explored the bivariate relationships between each of the tripartite efficacy constructs (self-efficacy; other-efficacy; relation-inferred self-efficacy, or RISE) and various indicators of relationship quality. This investigation adopted an alternative approach by using cluster analyses to identify tripartite efficacy profiles within a sample of 377 individual sport athletes (mean age = 20.25 years, SD = 2.12), and examined how individuals in each cluster group differed in their perceptions about their relationship with their coach (i.e., commitment, satisfaction, conflict). Four clusters emerged: High (n = 128), Moderate (n = 95), and Low (n = 78) profiles, in which athletes reported relatively high, moderate, or low scores across all tripartite perceptions, respectively, as well as an Unfulfilled profile (n = 76) in which athletes held relatively high self-efficacy, but perceived lower levels of other-efficacy and RISE. Multivariate analyses revealed differences between the clusters on all relationship variables that were in line with theory. These results underscore the utility of considering synergistic issues in the examination of the tripartite efficacy framework.

  16. An Empirical Taxonomy of Hospital Governing Board Roles

    PubMed Central

    Lee, Shoou-Yih D; Alexander, Jeffrey A; Wang, Virginia; Margolin, Frances S; Combes, John R

    2008-01-01

    Objective To develop a taxonomy of governing board roles in U.S. hospitals. Data Sources 2005 AHA Hospital Governance Survey, 2004 AHA Annual Survey of Hospitals, and Area Resource File. Study Design A governing board taxonomy was developed using cluster analysis. Results were validated and reviewed by industry experts. Differences in hospital and environmental characteristics across clusters were examined. Data Extraction Methods One thousand three hundred thirty-four hospitals with complete information on the study variables were included in the analysis. Principal Findings Five distinct clusters of hospital governing boards were identified. Statistical tests showed that the five clusters had high internal reliability and high internal validity. Statistically significant differences in hospital and environmental conditions were found among clusters. Conclusions The developed taxonomy provides policy makers, health care executives, and researchers with a useful way to describe and understand hospital governing board roles. The taxonomy may also facilitate valid and systematic assessment of governance performance. Further, the taxonomy could be used as a framework for governing boards themselves to identify areas for improvement and direction for change. PMID:18355260

  17. Clustering, Seriation, and Subset Extraction of Confusion Data

    ERIC Educational Resources Information Center

    Brusco, Michael J.; Steinley, Douglas

    2006-01-01

    The study of confusion data is a well established practice in psychology. Although many types of analytical approaches for confusion data are available, among the most common methods are the extraction of 1 or more subsets of stimuli, the partitioning of the complete stimulus set into distinct groups, and the ordering of the stimulus set. Although…

  18. 78 FR 13755 - Notice of Receipt of Petition for Decision That Nonconforming 2003 Jeep Wrangler Multi-Purpose...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-02-28

    ... System, 114 Theft Protection, 116 Motor Vehicle Brake Fluids, 118 Power-Operated Window, Partition, and...: Replacement of the instrument cluster with a U.S.-model component with inscription of the word ``brake'' on the brake failure warning light as well as reading speed in mph. Standard No. 108 Lamps, Reflective...

  19. A Human Rights and History Education Model for Teaching about Historical Events of Mass Violence: The 1947 British India Partition

    ERIC Educational Resources Information Center

    Chhabra, Meenakshi

    2017-01-01

    This article examines singular historical narratives of the 1947 British India Partition in four history textbooks from India, Pakistan, Bangladesh, and Britain, respectively. Drawing on analysis and work in the field, this study proposes a seven-module "integrated snail model" with a human rights orientation that can be applied to…

  20. An application of bioassessment metrics and multivariate techniques to evaluate central Nebraska streams

    USGS Publications Warehouse

    Frenzel, S.A.

    1996-01-01

    Ninety-one stream sites in central Nebraska were classified into four clusters on the basis of a cluster analysis (TWINSPAN) of macroinvertebrate data. Rapid bioassessment protocol scores for macroinvertebrate species were significantly different among sites grouped by the first division into two clusters. This division may have distinguished sites on the basis of water-quality impairment. Individual metrics that differed between clusters of sites were the Hilsenhoff Biotic Index, the number of Ephemeroptera, Plecoptera, and Trichoptera (EPT) taxa, and the ratio of individuals in EPT to Chironomidae taxa. Canonical correspondence analysis of 57 of 91 sites showed that stream width, site altitude, latitude, soil permeability, water temperature, and mean annual precipitation were the most important environmental variables describing variance in the species-environment relation. Stream width and soil permeability reflected streamflow characteristics of a site, whereas site altitude and latitude were factors related to general climatic conditions. Mean annual precipitation was related to both streamflow and climatic conditions.
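
The Hilsenhoff Biotic Index is the abundance-weighted mean tolerance value of a sample, HBI = Σ nᵢtᵢ / N, where nᵢ is the count of taxon i, tᵢ its tolerance value on a 0 to 10 scale (higher means more pollution-tolerant) and N the number of individuals scored. The sketch below computes that index and the two EPT metrics named above. The taxon names, tolerance values and counts are hypothetical; real assessments take tolerance values from published regional tables.

```python
EPT_GROUPS = {"Ephemeroptera", "Plecoptera", "Trichoptera"}


def hilsenhoff_biotic_index(counts, tolerance):
    """Abundance-weighted mean tolerance: sum(n_i * t_i) / N over taxa with a tolerance value."""
    scored = {taxon: n for taxon, n in counts.items() if taxon in tolerance}
    total = sum(scored.values())
    if total == 0:
        raise ValueError("no individuals belong to taxa with a tolerance value")
    return sum(n * tolerance[taxon] for taxon, n in scored.items()) / total


def ept_metrics(counts, group_of):
    """Return (number of EPT taxa present, EPT individuals / Chironomidae individuals)."""
    ept_taxa = [t for t, n in counts.items() if n > 0 and group_of[t] in EPT_GROUPS]
    ept_individuals = sum(counts[t] for t in ept_taxa)
    chironomids = sum(n for t, n in counts.items() if group_of[t] == "Chironomidae")
    # With no chironomids in the sample the ratio is unbounded.
    ratio = ept_individuals / chironomids if chironomids else float("inf")
    return len(ept_taxa), ratio


# Hypothetical sample from one site.
counts = {"mayfly_sp": 30, "caddisfly_sp": 10, "midge_sp": 20}
tolerance = {"mayfly_sp": 4, "caddisfly_sp": 6, "midge_sp": 8}
group_of = {"mayfly_sp": "Ephemeroptera", "caddisfly_sp": "Trichoptera",
            "midge_sp": "Chironomidae"}

print(round(hilsenhoff_biotic_index(counts, tolerance), 2))
print(ept_metrics(counts, group_of))
```

Here the HBI works out to (30·4 + 10·6 + 20·8) / 60 ≈ 5.67. A site dominated by tolerant chironomids would push the index up and the EPT ratio down, which is the direction of water-quality impairment the first TWINSPAN division may have captured.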
