Barker, Daniel; D'Este, Catherine; Campbell, Michael J; McElduff, Patrick
2017-03-09
Stepped wedge cluster randomised trials frequently involve a relatively small number of clusters. The most common frameworks used to analyse data from these types of trials are generalised estimating equations and generalised linear mixed models. A topic of much research into these methods has been their application to cluster randomised trial data and, in particular, the number of clusters required to make reasonable inferences about the intervention effect. However, for stepped wedge trials, which have been claimed by many researchers to have a statistical power advantage over the parallel cluster randomised trial, the minimum number of clusters required has not been investigated. We conducted a simulation study where we considered the most commonly used methods suggested in the literature to analyse cross-sectional stepped wedge cluster randomised trial data. We compared the per cent bias, the type I error rate and power of these methods in a stepped wedge trial setting with a binary outcome, where there are few clusters available and when the appropriate adjustment for a time trend is made, which by design may be confounding the intervention effect. We found that the generalised linear mixed modelling approach is the most consistent when few clusters are available. We also found that none of the common analysis methods for stepped wedge trials were both unbiased and maintained a 5% type I error rate when there were only three clusters. Of the commonly used analysis approaches, we recommend the generalised linear mixed model for small stepped wedge trials with binary outcomes. We also suggest that in a stepped wedge design with three steps, at least two clusters be randomised at each step, to ensure that the intervention effect estimator maintains the nominal 5% significance level and is also reasonably unbiased.
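The closing design recommendation (three steps, at least two clusters randomised per step) can be pictured with a small allocation sketch. This is only an illustration under stated assumptions: the helper name `stepped_wedge_schedule`, the single baseline period, and the equal-sized crossover groups are hypothetical and are not taken from the study.

```python
import random

def stepped_wedge_schedule(n_steps, clusters_per_step, seed=0):
    """Return {cluster_id: [0/1 exposure flag per period]}.

    Assumes n_steps + 1 periods: one baseline period with no cluster
    exposed, then one period per step, with `clusters_per_step`
    clusters crossing over to the intervention at each step.
    """
    rng = random.Random(seed)
    clusters = list(range(n_steps * clusters_per_step))
    rng.shuffle(clusters)  # the random order decides when each cluster crosses over
    schedule = {}
    for position, cluster in enumerate(clusters):
        step = position // clusters_per_step + 1  # first exposed period
        schedule[cluster] = [
            1 if period >= step else 0 for period in range(n_steps + 1)
        ]
    return schedule

# Three steps with two clusters per step gives six clusters in total.
schedule = stepped_wedge_schedule(n_steps=3, clusters_per_step=2)
for cluster in sorted(schedule):
    print(cluster, schedule[cluster])
```

Because every cluster is unexposed in the first period and exposed in the last, the intervention effect is confounded with calendar time by design, which is why the analyses compared in the abstract adjust for a time trend.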
Supervised group Lasso with applications to microarray data analysis
Ma, Shuangge; Song, Xiao; Huang, Jian
2007-01-01
Background A tremendous amount of effort has been devoted to identifying genes for diagnosis and prognosis of diseases using microarray gene expression data. It has been demonstrated that gene expression data have cluster structure, where the clusters consist of co-regulated genes which tend to have coordinated functions. However, most available statistical methods for gene selection do not take into consideration the cluster structure. Results We propose a supervised group Lasso approach that takes into account the cluster structure in gene expression data for gene selection and predictive model building. For gene expression data without biological cluster information, we first divide genes into clusters using the K-means approach and determine the optimal number of clusters using the Gap method. The supervised group Lasso consists of two steps. In the first step, we identify important genes within each cluster using the Lasso method. In the second step, we select important clusters using the group Lasso. Tuning parameters are determined using V-fold cross validation at both steps to allow for further flexibility. Prediction performance is evaluated using leave-one-out cross validation. We apply the proposed method to disease classification and survival analysis with microarray data. Conclusion We analyze four microarray data sets using the proposed approach: two cancer data sets with binary cancer occurrence as outcomes and two lymphoma data sets with survival outcomes. The results show that the proposed approach is capable of identifying a small number of influential gene clusters and important genes within those clusters, and has better prediction performance than existing methods. PMID:17316436
Ning, P; Guo, Y F; Sun, T Y; Zhang, H S; Chai, D; Li, X M
2016-09-01
To study the distinct clinical phenotypes of chronic airway diseases by hierarchical cluster analysis and two-step cluster analysis. A population sample of adult patients in Donghuamen community, Dongcheng district and Qinghe community, Haidian district, Beijing from April 2012 to January 2015, who had wheeze within the last 12 months, underwent detailed investigation, including a clinical questionnaire, pulmonary function tests, total serum IgE levels, blood eosinophil level and a peak flow diary. Nine variables were chosen as evaluating parameters, including pre-salbutamol forced expiratory volume in one second (FEV1)/forced vital capacity (FVC) ratio, pre-salbutamol FEV1, percentage of post-salbutamol change in FEV1, residual capacity, diffusing capacity of the lung for carbon monoxide/alveolar volume adjusted for haemoglobin level, peak expiratory flow (PEF) variability, serum IgE level, cumulative tobacco cigarette consumption (pack-years) and respiratory symptoms (cough and expectoration). Subjects' clinical phenotypes were identified by hierarchical cluster analysis and two-step cluster analysis. (1) Four clusters were identified by hierarchical cluster analysis. Cluster 1 was chronic bronchitis in smokers with normal pulmonary function. Cluster 2 was chronic bronchitis or mild chronic obstructive pulmonary disease (COPD) patients with mild airflow limitation. Cluster 3 included COPD patients with heavy smoking, poor quality of life and severe airflow limitation. Cluster 4 comprised atopic patients with mild airflow limitation, elevated serum IgE and clinical features of asthma.
Significant differences were revealed regarding pre-salbutamol FEV1/FVC%, pre-salbutamol FEV1% pred, post-salbutamol change in FEV1%, maximal mid-expiratory flow (MMEF)% pred, carbon monoxide diffusing capacity per liter of alveolar volume (DLCO/VA)% pred, residual volume (RV)% pred, total serum IgE level, smoking history (pack-years), St. George's respiratory questionnaire (SGRQ) score, acute exacerbation in the past one year, PEF variability and allergic dermatitis (P<0.05). (2) Four clusters were also identified by two-step cluster analysis, as follows: cluster 1, COPD patients with moderate to severe airflow limitation; cluster 2, asthma and COPD patients with heavy smoking, airflow limitation and increased airways reversibility; cluster 3, patients having less smoking and normal pulmonary function with wheezing but no chronic cough; cluster 4, chronic bronchitis patients with normal pulmonary function and chronic cough. Significant differences were revealed regarding gender distribution, respiratory symptoms, pre-salbutamol FEV1/FVC%, pre-salbutamol FEV1% pred, post-salbutamol change in FEV1%, MMEF% pred, DLCO/VA% pred, RV% pred, PEF variability, total serum IgE level, cumulative tobacco cigarette consumption (pack-years), and SGRQ score (P<0.05). The different cluster analyses identify distinct clinical phenotypes of chronic airway diseases. These phenotypes may therefore guide doctors in providing individualized treatments.
NASA Astrophysics Data System (ADS)
Scharfenberg, Franz-Josef; Bogner, Franz X.
2011-08-01
Emphasis on improving higher level biology education continues. A new two-step approach to the experimental phases within an outreach gene technology lab, derived from cognitive load theory, is presented. We compared our approach using a quasi-experimental design with the conventional one-step mode. The difference consisted of additional focused discussions combined with students writing down their ideas (step one) prior to starting any experimental procedure (step two). We monitored students' activities during the experimental phases by continuously videotaping 20 work groups within each approach (N = 131). Subsequent classification of students' activities yielded 10 categories (with well-fitting intra- and inter-observer scores with respect to reliability). Based on the students' individual time budgets, we evaluated students' roles during experimentation from their prevalent activities (by independently using two cluster analysis methods). Independently of the approach, two common clusters emerged, which we labeled as 'all-rounders' and as 'passive students', and two clusters specific to each approach: 'observers' as well as 'high-experimenters' were identified only within the one-step approach whereas under the two-step conditions 'managers' and 'scribes' were identified. Potential changes in group-leadership style during experimentation are discussed, and conclusions for optimizing science teaching are drawn.
Xiao, Yongling; Abrahamowicz, Michal
2010-03-30
We propose two bootstrap-based methods to correct the standard errors (SEs) from Cox's model for within-cluster correlation of right-censored event times. The cluster-bootstrap method resamples, with replacement, only the clusters, whereas the two-step bootstrap method resamples (i) the clusters, and (ii) individuals within each selected cluster, with replacement. In simulations, we evaluate both methods and compare them with the existing robust variance estimator and the shared gamma frailty model, which are available in statistical software packages. We simulate clustered event time data, with latent cluster-level random effects, which are ignored in the conventional Cox's model. For cluster-level covariates, both proposed bootstrap methods yield accurate SEs, and type I error rates, and acceptable coverage rates, regardless of the true random effects distribution, and avoid serious variance under-estimation by conventional Cox-based standard errors. However, the two-step bootstrap method over-estimates the variance for individual-level covariates. We also apply the proposed bootstrap methods to obtain confidence bands around flexible estimates of time-dependent effects in a real-life analysis of cluster event times.
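The two resampling schemes described above differ only in whether individuals are also resampled inside each selected cluster. A minimal sketch, assuming toy data and hypothetical function names (`cluster_bootstrap`, `two_step_bootstrap`); the model fitting itself is deliberately left out:

```python
import random

def cluster_bootstrap(data, rng):
    """Resample whole clusters with replacement; members stay intact."""
    ids = list(data)
    chosen = [rng.choice(ids) for _ in ids]
    return [list(data[c]) for c in chosen]

def two_step_bootstrap(data, rng):
    """Resample clusters, then resample individuals within each chosen
    cluster, both with replacement."""
    resampled = []
    for members in cluster_bootstrap(data, rng):
        resampled.append([rng.choice(members) for _ in members])
    return resampled

# Hypothetical toy data: cluster id -> list of (event_time, event_indicator)
data = {
    "A": [(5.0, 1), (7.5, 0), (3.2, 1)],
    "B": [(2.1, 1), (9.0, 0)],
    "C": [(4.4, 1), (6.3, 1), (8.8, 0), (1.7, 1)],
}

rng = random.Random(42)
print(cluster_bootstrap(data, rng))
print(two_step_bootstrap(data, rng))
```

In the usual bootstrap workflow, the model would be refitted on each of many such replicates and the spread of the resulting coefficient estimates used as the corrected standard error; that step depends on the survival-analysis software in use and is not shown here.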
Du, Yuncheng; Budman, Hector M; Duever, Thomas A
2017-06-01
Accurate and fast quantitative analysis of living cells from fluorescence microscopy images is useful for evaluating experimental outcomes and cell culture protocols. An algorithm is developed in this work to automatically segment and distinguish apoptotic cells from normal cells. The algorithm involves three steps consisting of two segmentation steps and a classification step. The segmentation steps are: (i) a coarse segmentation, combining a range filter with a marching square method, is used as a prefiltering step to provide the approximate positions of cells within a two-dimensional matrix used to store cells' images and the count of the number of cells for a given image; and (ii) a fine segmentation step using the Active Contours Without Edges method is applied to the boundaries of cells identified in the coarse segmentation step. Although this basic two-step approach provides accurate edges when the cells in a given image are sparsely distributed, the occurrence of clusters of cells in high cell density samples requires further processing. Hence, a novel algorithm for clusters is developed to identify the edges of cells within clusters and to approximate their morphological features. Based on the segmentation results, a support vector machine classifier that uses three morphological features: the mean value of pixel intensities in the cellular regions, the variance of pixel intensities in the vicinity of cell boundaries, and the lengths of the boundaries, is developed for distinguishing apoptotic cells from normal cells. The algorithm is shown to be efficient in terms of computational time, quantitative analysis, and differentiation accuracy, as compared with the use of the active contours method without the proposed preliminary coarse segmentation step.
Nevo, Daniel; Zucker, David M; Tamimi, Rulla M; Wang, Molin
2016-12-30
A common paradigm in dealing with heterogeneity across tumors in cancer analysis is to cluster the tumors into subtypes using marker data on the tumor, and then to analyze each of the clusters separately. A more specific target is to investigate the association between risk factors and specific subtypes and to use the results for personalized preventive treatment. This task is usually carried out in two steps: clustering and risk factor assessment. However, two sources of measurement error arise in these problems. The first is the measurement error in the biomarker values. The second is the misclassification error when assigning observations to clusters. We consider the case with a specified set of relevant markers and propose a unified single-likelihood approach for normally distributed biomarkers. As an alternative, we consider a two-step procedure with the tumor type misclassification error taken into account in the second-step risk factor analysis. We describe our method for binary data and also for survival analysis data using a modified version of the Cox model. We present asymptotic theory for the proposed estimators. Simulation results indicate that our methods significantly lower the bias with a small price being paid in terms of variance. We present an analysis of breast cancer data from the Nurses' Health Study to demonstrate the utility of our method. Copyright © 2016 John Wiley & Sons, Ltd.
Clustering of Variables for Mixed Data
NASA Astrophysics Data System (ADS)
Saracco, J.; Chavent, M.
2016-05-01
This chapter presents clustering of variables, whose aim is to lump together strongly related variables. The proposed approach works on a mixed data set, i.e. on a data set which contains numerical variables and categorical variables. Two algorithms for clustering of variables are described: a hierarchical clustering and a k-means type clustering. A brief description of the PCAmix method (a principal component analysis for mixed data) is provided, since the calculation of the synthetic variables summarizing the obtained clusters of variables is based on this multivariate method. Finally, the R packages ClustOfVar and PCAmixdata are illustrated on real mixed data. The PCAmix and ClustOfVar approaches are first used for dimension reduction (step 1) before applying in step 2 a standard clustering method to obtain groups of individuals.
Stefurak, Tres; Calhoun, Georgia B
2007-01-01
The current study sought to explore subtypes of adolescents within a sample of female juvenile offenders. Using the Millon Adolescent Clinical Inventory with 101 female juvenile offenders, a two-step cluster analysis was performed beginning with a Ward's method hierarchical cluster analysis followed by a K-Means iterative partitioning cluster analysis. The results suggest an optimal three-cluster solution, with cluster profiles leading to the following group labels: Externalizing Problems, Depressed/Interpersonally Ambivalent, and Anxious Prosocial. Analyses along the factors of age, race, offense typology and offense chronicity were conducted to further understand the nature of the clusters found. Only the effect for race was significant, with the Anxious Prosocial and Depressed/Interpersonally Ambivalent clusters appearing disproportionately comprised of African American girls. To establish external validity, clusters were compared across scales of the Behavioral Assessment System for Children - Self Report of Personality, and corroborative distinctions between clusters were found here.
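The two-step procedure named above (a hierarchical pass with Ward's method, then K-means started from the hierarchical solution) can be sketched in plain Python. This is an illustrative toy version, assuming a naive pairwise search for the cheapest Ward merge and made-up two-dimensional points; the function names are hypothetical and nothing here reproduces the study's software or data.

```python
def centroid(points):
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def ward_clusters(points, k):
    """Step 1: agglomerative clustering with Ward's merge cost, stopped at k clusters."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                a, b = clusters[i], clusters[j]
                # Increase in within-cluster sum of squares if a and b are merged
                cost = len(a) * len(b) / (len(a) + len(b)) * sq_dist(centroid(a), centroid(b))
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters

def kmeans(points, centers, iterations=20):
    """Step 2: Lloyd iterations started from the given centers.

    Returns the final centers and the assignment they were computed from.
    """
    groups = []
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))
            groups[nearest].append(p)
        # Keep the previous center if a group ends up empty
        centers = [centroid(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups

# Hypothetical toy data: three well-separated groups of points
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (20, 0), (20, 1), (21, 0)]
initial_centers = [centroid(c) for c in ward_clusters(points, k=3)]
centers, groups = kmeans(points, initial_centers)
for center, group in zip(centers, groups):
    print(center, group)
```

Seeding K-means with the hierarchical centroids avoids a random start, so the iterative pass only refines the assignments that the hierarchical pass proposes.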
Student Motivational Profiles in an Introductory MIS Course: An Exploratory Cluster Analysis
ERIC Educational Resources Information Center
Nelson, Klara
2014-01-01
This study profiles students in an introductory MIS course according to a variety of variables associated with choice of academic major. The data were collected through a survey administered to 12 sections of the course. A two-step cluster analysis was performed with gender as a categorical variable and students' perceptions of task value…
Schramm, Catherine; Vial, Céline; Bachoud-Lévi, Anne-Catherine; Katsahian, Sandrine
2018-01-01
Heterogeneity in treatment efficacy is a major concern in clinical trials. Clustering may help to identify the treatment responders and the non-responders. In the context of longitudinal cluster analyses, sample size and variability of the times of measurements are the main issues with the current methods. Here, we propose a new two-step method for the Clustering of Longitudinal data by using an Extended Baseline. The first step relies on a piecewise linear mixed model for repeated measurements with a treatment-time interaction. The second step clusters the random predictions and considers several parametric (model-based) and non-parametric (partitioning, ascendant hierarchical clustering) algorithms. A simulation study compares all options of the clustering of longitudinal data by using an extended baseline method with the latent-class mixed model. The clustering of longitudinal data by using an extended baseline method with the two model-based algorithms was the more robust model. The clustering of longitudinal data by using an extended baseline method with all the non-parametric algorithms failed when there were unequal variances of treatment effect between clusters or when the subgroups had unbalanced sample sizes. The latent-class mixed model failed when the between-patients slope variability is high. Two real data sets on neurodegenerative disease and on obesity illustrate the clustering of longitudinal data by using an extended baseline method and show how clustering may help to identify the marker(s) of the treatment response. The application of the clustering of longitudinal data by using an extended baseline method in exploratory analysis as the first stage before setting up stratified designs can provide a better estimation of treatment effect in future clinical trials.
Coarse Point Cloud Registration by EGI Matching of Voxel Clusters
NASA Astrophysics Data System (ADS)
Wang, Jinhu; Lindenbergh, Roderik; Shen, Yueqian; Menenti, Massimo
2016-06-01
Laser scanning samples the surface geometry of objects efficiently and records versatile information as point clouds. However, often more scans are required to fully cover a scene. Therefore, a registration step is required that transforms the different scans into a common coordinate system. The registration of point clouds is usually conducted in two steps, i.e. coarse registration followed by fine registration. In this study an automatic marker-free coarse registration method for pair-wise scans is presented. First the two input point clouds are re-sampled as voxels and dimensionality features of the voxels are determined by principal component analysis (PCA). Then voxel cells with the same dimensionality are clustered. Next, the Extended Gaussian Image (EGI) descriptors of those voxel clusters are constructed using significant eigenvectors of each voxel in the cluster. Correspondences between clusters in source and target data are obtained according to the similarity between their EGI descriptors. The random sampling consensus (RANSAC) algorithm is employed to remove outlying correspondences until a coarse alignment is obtained. If necessary, a fine registration is performed in a final step. This new method is illustrated on scan data sampling two indoor scenarios. The results of the tests are evaluated by computing the point to point distance between the two input point clouds. The two tests presented resulted in mean distances of 7.6 mm and 9.5 mm respectively, which are adequate for fine registration.
Nevo, Daniel; Zucker, David M.; Tamimi, Rulla M.; Wang, Molin
2017-01-01
A common paradigm in dealing with heterogeneity across tumors in cancer analysis is to cluster the tumors into subtypes using marker data on the tumor, and then to analyze each of the clusters separately. A more specific target is to investigate the association between risk factors and specific subtypes and to use the results for personalized preventive treatment. This task is usually carried out in two steps–clustering and risk factor assessment. However, two sources of measurement error arise in these problems. The first is the measurement error in the biomarker values. The second is the misclassification error when assigning observations to clusters. We consider the case with a specified set of relevant markers and propose a unified single-likelihood approach for normally distributed biomarkers. As an alternative, we consider a two-step procedure with the tumor type misclassification error taken into account in the second-step risk factor analysis. We describe our method for binary data and also for survival analysis data using a modified version of the Cox model. We present asymptotic theory for the proposed estimators. Simulation results indicate that our methods significantly lower the bias with a small price being paid in terms of variance. We present an analysis of breast cancer data from the Nurses’ Health Study to demonstrate the utility of our method. PMID:27558651
A two-step initial mass function: Consequences of clustered star formation for binary properties
NASA Astrophysics Data System (ADS)
Durisen, R. H.; Sterzik, M. F.; Pickett, B. K.
2001-06-01
If stars originate in transient bound clusters of moderate size, these clusters will decay due to dynamic interactions in which a hard binary forms and ejects most or all the other stars. When the cluster members are chosen at random from a reasonable initial mass function (IMF), the resulting binary characteristics do not match current observations. We find a significant improvement in the trends of binary properties from this scenario when an additional constraint is taken into account, namely that there is a distribution of total cluster masses set by the masses of the cloud cores from which the clusters form. Two distinct steps then determine final stellar masses - the choice of a cluster mass and the formation of the individual stars. We refer to this as a "two-step" IMF. Simple statistical arguments are used in this paper to show that a two-step IMF, combined with typical results from dynamic few-body system decay, tends to give better agreement between computed binary characteristics and observations than a one-step mass selection process.
Caso, Giuseppe; de Nardis, Luca; di Benedetto, Maria-Gabriella
2015-10-30
The weighted k-nearest neighbors (WkNN) algorithm is by far the most popular choice in the design of fingerprinting indoor positioning systems based on WiFi received signal strength (RSS). WkNN estimates the position of a target device by selecting k reference points (RPs) based on the similarity of their fingerprints with the measured RSS values. The position of the target device is then obtained as a weighted sum of the positions of the k RPs. Two-step WkNN positioning algorithms were recently proposed, in which RPs are divided into clusters using the affinity propagation clustering algorithm, and one representative for each cluster is selected. Only cluster representatives are then considered during the position estimation, leading to a significant computational complexity reduction compared to traditional, flat WkNN. Flat and two-step WkNN share the issue of properly selecting the similarity metric so as to guarantee good positioning accuracy: in two-step WkNN, in particular, the metric impacts three different steps in the position estimation, that is cluster formation, cluster selection and RP selection and weighting. So far, however, the only similarity metric considered in the literature was the one proposed in the original formulation of the affinity propagation algorithm. This paper fills this gap by comparing different metrics and, based on this comparison, proposes a novel mixed approach in which different metrics are adopted in the different steps of the position estimation procedure. The analysis is supported by an extensive experimental campaign carried out in a multi-floor 3D indoor positioning testbed. The impact of similarity metrics and their combinations on the structure and size of the resulting clusters, 3D positioning accuracy and computational complexity are investigated. 
Results show that the adoption of metrics different from the one proposed in the original affinity propagation algorithm and, in particular, the combination of different metrics can significantly improve the positioning accuracy while preserving the efficiency in computational complexity typical of two-step algorithms.
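The flat WkNN estimate described in this abstract (pick the k reference points whose fingerprints are most similar to the measured RSS, then combine their positions with weights) can be sketched as follows. The similarity metric is exactly what the paper varies, so the choices here are assumptions for illustration only: Euclidean distance between RSS vectors, inverse-distance weights normalised to sum to one, and the hypothetical names `wknn_position` and `reference_points`. The clustering stage of the two-step variant is not shown.

```python
import math

def wknn_position(measured_rss, reference_points, k=3, eps=1e-9):
    """Estimate a position as the weighted mean of the k most similar RPs.

    `reference_points` is a list of (position, fingerprint) pairs.
    Euclidean RSS distance and inverse-distance weights are
    illustrative assumptions, not the metrics studied in the paper.
    """
    scored = []
    for position, fingerprint in reference_points:
        d = math.dist(measured_rss, fingerprint)
        scored.append((d, position))
    scored.sort(key=lambda item: item[0])  # most similar fingerprints first
    nearest = scored[:k]
    weights = [1.0 / (d + eps) for d, _ in nearest]  # eps avoids division by zero
    total = sum(weights)
    dims = len(nearest[0][1])
    return tuple(
        sum(w * pos[i] for w, (_, pos) in zip(weights, nearest)) / total
        for i in range(dims)
    )

# Hypothetical reference points: ((x, y), [RSS from two access points, dBm])
reference_points = [
    ((0.0, 0.0), [-40.0, -70.0]),
    ((10.0, 0.0), [-60.0, -50.0]),
    ((0.0, 10.0), [-80.0, -80.0]),
]
print(wknn_position([-50.0, -60.0], reference_points, k=2))
```

Swapping `math.dist` for another similarity function is the kind of change whose effect on accuracy the abstract reports investigating.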
Caso, Giuseppe; de Nardis, Luca; di Benedetto, Maria-Gabriella
2015-01-01
The weighted k-nearest neighbors (WkNN) algorithm is by far the most popular choice in the design of fingerprinting indoor positioning systems based on WiFi received signal strength (RSS). WkNN estimates the position of a target device by selecting k reference points (RPs) based on the similarity of their fingerprints with the measured RSS values. The position of the target device is then obtained as a weighted sum of the positions of the k RPs. Two-step WkNN positioning algorithms were recently proposed, in which RPs are divided into clusters using the affinity propagation clustering algorithm, and one representative for each cluster is selected. Only cluster representatives are then considered during the position estimation, leading to a significant computational complexity reduction compared to traditional, flat WkNN. Flat and two-step WkNN share the issue of properly selecting the similarity metric so as to guarantee good positioning accuracy: in two-step WkNN, in particular, the metric impacts three different steps in the position estimation, that is cluster formation, cluster selection and RP selection and weighting. So far, however, the only similarity metric considered in the literature was the one proposed in the original formulation of the affinity propagation algorithm. This paper fills this gap by comparing different metrics and, based on this comparison, proposes a novel mixed approach in which different metrics are adopted in the different steps of the position estimation procedure. The analysis is supported by an extensive experimental campaign carried out in a multi-floor 3D indoor positioning testbed. The impact of similarity metrics and their combinations on the structure and size of the resulting clusters, 3D positioning accuracy and computational complexity are investigated. 
Results show that the adoption of metrics different from the one proposed in the original affinity propagation algorithm and, in particular, the combination of different metrics can significantly improve the positioning accuracy while preserving the efficiency in computational complexity typical of two-step algorithms. PMID:26528984
Shawyer, Frances; Enticott, Joanne C; Brophy, Lisa; Bruxner, Annie; Fossey, Ellie; Inder, Brett; Julian, John; Kakuma, Ritsuko; Weller, Penelope; Wilson-Evered, Elisabeth; Edan, Vrinda; Slade, Mike; Meadows, Graham N
2017-05-08
Recovery features strongly in Australian mental health policy; however, evidence is limited for the efficacy of recovery-oriented practice at the service level. This paper describes the Principles Unite Local Services Assisting Recovery (PULSAR) Specialist Care trial protocol for a recovery-oriented practice training intervention delivered to specialist mental health services staff. The primary aim is to evaluate whether adult consumers accessing services where staff have received the intervention report superior recovery outcomes compared to adult consumers accessing services where staff have not yet received the intervention. A qualitative sub-study aims to examine staff and consumer views on implementing recovery-oriented practice. A process evaluation sub-study aims to articulate important explanatory variables affecting the intervention's rollout and outcomes. The mixed methods design incorporates a two-step stepped-wedge cluster randomized controlled trial (cRCT) examining cross-sectional data from three phases, and nested qualitative and process evaluation sub-studies. Participating specialist mental health care services in Melbourne, Victoria are divided into 14 clusters with half randomly allocated to receive the staff training in year one and half in year two. Research participants are consumers aged 18-75 years who attended the cluster within a previous three-month period either at baseline, 12 (step 1) or 24 months (step 2). In the two nested sub-studies, participation extends to cluster staff. The primary outcome is the Questionnaire about the Process of Recovery collected from 756 consumers (252 each at baseline, step 1, step 2). Secondary and other outcomes measuring well-being, service satisfaction and health economic impact are collected from a subset of 252 consumers (63 at baseline; 126 at step 1; 63 at step 2) via interviews.
Interview-based longitudinal data are also collected 12 months apart from 88 consumers with a psychotic disorder diagnosis (44 at baseline, step 1; 44 at step 1, step 2). cRCT data will be analyzed using multilevel mixed-effects modelling to account for clustering and some repeated measures, supplemented by thematic analysis of qualitative interview data. The process evaluation will draw on qualitative, quantitative and documentary data. Findings will provide an evidence-base for the continued transformation of Australian mental health service frameworks toward recovery. Australian and New Zealand Clinical Trial Registry: ACTRN12614000957695. Date registered: 8 September 2014.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Steenbergen, K. G., E-mail: kgsteen@gmail.com; Gaston, N.
2014-02-14
Inspired by methods of remote sensing image analysis, we analyze structural variation in cluster molecular dynamics (MD) simulations through a unique application of the principal component analysis (PCA) and Pearson Correlation Coefficient (PCC). The PCA analysis characterizes the geometric shape of the cluster structure at each time step, yielding a detailed and quantitative measure of structural stability and variation at finite temperature. Our PCC analysis captures bond structure variation in MD, which can be used to both supplement the PCA analysis as well as compare bond patterns between different cluster sizes. Relying only on atomic position data, without requirement for a priori structural input, PCA and PCC can be used to analyze both classical and ab initio MD simulations for any cluster composition or electronic configuration. Taken together, these statistical tools represent powerful new techniques for quantitative structural characterization and isomer identification in cluster MD.
Steenbergen, K G; Gaston, N
2014-02-14
Inspired by methods of remote sensing image analysis, we analyze structural variation in cluster molecular dynamics (MD) simulations through a unique application of the principal component analysis (PCA) and Pearson Correlation Coefficient (PCC). The PCA analysis characterizes the geometric shape of the cluster structure at each time step, yielding a detailed and quantitative measure of structural stability and variation at finite temperature. Our PCC analysis captures bond structure variation in MD, which can be used to both supplement the PCA analysis as well as compare bond patterns between different cluster sizes. Relying only on atomic position data, without requirement for a priori structural input, PCA and PCC can be used to analyze both classical and ab initio MD simulations for any cluster composition or electronic configuration. Taken together, these statistical tools represent powerful new techniques for quantitative structural characterization and isomer identification in cluster MD.
Deckersbach, Thilo; Peters, Amy T.; Sylvia, Louisa G.; Gold, Alexandra K.; da Silva Magalhaes, Pedro Vieira; Henry, David B.; Frank, Ellen; Otto, Michael W.; Berk, Michael; Dougherty, Darin D.; Nierenberg, Andrew A.; Miklowitz, David J.
2016-01-01
Background We sought to address how predictors and moderators of psychotherapy for bipolar depression – identified individually in prior analyses – can inform the development of a metric for prospectively classifying treatment outcome in intensive psychotherapy (IP) versus collaborative care (CC) adjunctive to pharmacotherapy in the Systematic Treatment Enhancement Program (STEP-BD) study. Methods We conducted post-hoc analyses on 135 STEP-BD participants using cluster analysis to identify subsets of participants with similar clinical profiles and investigated this combined metric as a moderator and predictor of response to IP. We used agglomerative hierarchical cluster analyses and k-means clustering to determine the content of the clinical profiles. Logistic regression and Cox proportional hazard models were used to evaluate whether the resulting clusters predicted or moderated likelihood of recovery or time until recovery. Results The cluster analysis yielded a two-cluster solution: 1) “less-recurrent/severe” and 2) “chronic/recurrent.” Rates of recovery in IP were similar for less-recurrent/severe and chronic/recurrent participants. Less-recurrent/severe patients were more likely than chronic/recurrent patients to achieve recovery in CC (p = .040, OR = 4.56). IP yielded a faster recovery for chronic/recurrent participants, whereas CC led to recovery sooner in the less-recurrent/severe cluster (p = .034, OR = 2.62). Limitations Cluster analyses require list-wise deletion of cases with missing data so we were unable to conduct analyses on all STEP-BD participants. Conclusions A well-powered, parametric approach can distinguish patients based on illness history and provide clinicians with symptom profiles of patients that confer differential prognosis in CC vs. IP. PMID:27289316
Yokoyama, Eiji; Uchimura, Masako
2007-11-01
Ninety-five enterohemorrhagic Escherichia coli serovar O157 strains, including 30 strains isolated from 13 intrafamily outbreaks and 14 strains isolated from 3 mass outbreaks, were studied by pulsed-field gel electrophoresis (PFGE) and variable number of tandem repeats (VNTR) typing, and the resulting data were subjected to cluster analysis. Cluster analysis of the VNTR typing data revealed that 57 (60.0%) of 95 strains, including all epidemiologically linked strains, formed clusters with at least 95% similarity. Cluster analysis of the PFGE patterns revealed that 67 (70.5%) of 95 strains, including all but 1 of the epidemiologically linked strains, formed clusters with 90% similarity. The number of epidemiologically unlinked strains forming clusters was significantly less by VNTR cluster analysis than by PFGE cluster analysis. The congruence value between PFGE and VNTR cluster analysis was low and did not show an obvious correlation. With two-step cluster analysis, the number of clustered epidemiologically unlinked strains by PFGE cluster analysis that were divided by subsequent VNTR cluster analysis was significantly higher than the number by VNTR cluster analysis that were divided by subsequent PFGE cluster analysis. These results indicate that VNTR cluster analysis is more efficient than PFGE cluster analysis as an epidemiological tool to trace the transmission of enterohemorrhagic E. coli O157.
Sample size calculation for stepped wedge and other longitudinal cluster randomised trials.
Hooper, Richard; Teerenstra, Steven; de Hoop, Esther; Eldridge, Sandra
2016-11-20
The sample size required for a cluster randomised trial is inflated compared with an individually randomised trial because outcomes of participants from the same cluster are correlated. Sample size calculations for longitudinal cluster randomised trials (including stepped wedge trials) need to take account of at least two levels of clustering: the clusters themselves and times within clusters. We derive formulae for sample size for repeated cross-section and closed cohort cluster randomised trials with normally distributed outcome measures, under a multilevel model allowing for variation between clusters and between times within clusters. Our formulae agree with those previously described for special cases such as crossover and analysis of covariance designs, although simulation suggests that the formulae could underestimate required sample size when the number of clusters is small. Whether using a formula or simulation, a sample size calculation requires estimates of nuisance parameters, which in our model include the intracluster correlation, cluster autocorrelation, and individual autocorrelation. A cluster autocorrelation less than 1 reflects a situation where individuals sampled from the same cluster at different times have less correlated outcomes than individuals sampled from the same cluster at the same time. Nuisance parameters could be estimated from time series obtained in similarly clustered settings with the same outcome measure, using analysis of variance to estimate variance components. Copyright © 2016 John Wiley & Sons, Ltd.
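As a rough illustration of the inflation described in this abstract, the sketch below uses the standard design effect for a simple parallel cluster randomised trial, 1 + (m - 1) × ICC, where m is the cluster size. This is not the longitudinal formula derived by the authors (which also involves cluster and individual autocorrelation); the function names are hypothetical.

```python
import math


def design_effect(cluster_size: int, icc: float) -> float:
    """Standard design effect for a parallel cluster randomised trial.

    Assumption: equal cluster sizes and a single intracluster
    correlation (icc); no time or autocorrelation terms.
    """
    return 1.0 + (cluster_size - 1) * icc


def inflate_sample_size(n_individual: int, cluster_size: int, icc: float) -> int:
    """Inflate an individually randomised sample size for clustering."""
    inflated = n_individual * design_effect(cluster_size, icc)
    # Round first to guard against floating-point noise before taking the ceiling.
    return math.ceil(round(inflated, 9))


if __name__ == "__main__":
    # Hypothetical inputs: 100 participants needed without clustering,
    # clusters of 9 participants, icc of 0.125.
    print(design_effect(9, 0.125))
    print(inflate_sample_size(100, 9, 0.125))
```

With an ICC of zero the design effect is 1 and no inflation occurs; larger clusters or a larger ICC increase the required total.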
Automated modal parameter estimation using correlation analysis and bootstrap sampling
NASA Astrophysics Data System (ADS)
Yaghoubi, Vahid; Vakilzadeh, Majid K.; Abrahamsson, Thomas J. S.
2018-02-01
The estimation of modal parameters from a set of noisy measured data is a highly judgmental task, with user expertise playing a significant role in distinguishing between estimated physical and noise modes of a test-piece. Various methods have been developed to automate this procedure. The common approach is to identify models with different orders and cluster similar modes together. However, most proposed methods based on this approach suffer from high-dimensional optimization problems in either the estimation or clustering step. To overcome this problem, this study presents an algorithm for autonomous modal parameter estimation in which the only required optimization is performed in a three-dimensional space. To this end, a subspace-based identification method is employed for the estimation and a non-iterative correlation-based method is used for the clustering. This clustering is at the heart of the paper. The keys to success are correlation metrics that are able to treat the problems of spatial eigenvector aliasing and nonunique eigenvectors of coalescent modes simultaneously. The algorithm commences by the identification of an excessively high-order model from frequency response function test data. The high number of modes of this model provides bases for two subspaces: one for likely physical modes of the tested system and one for its complement dubbed the subspace of noise modes. By employing the bootstrap resampling technique, several subsets are generated from the same basic dataset and for each of them a model is identified to form a set of models. Then, by correlation analysis with the two aforementioned subspaces, highly correlated modes of these models which appear repeatedly are clustered together and the noise modes are collected in a so-called Trashbox cluster. Stray noise modes attracted to the mode clusters are trimmed away in a second step by correlation analysis. 
The final step of the algorithm is a fuzzy c-means clustering procedure applied to a three-dimensional feature space to assign a degree of physicalness to each cluster. The proposed algorithm is applied to two case studies: one with synthetic data and one with real test data obtained from a hammer impact test. The results indicate that the algorithm successfully clusters similar modes and gives a reasonable quantification of the extent to which each cluster is physical.
Two-step evolution of endosymbiosis between hydra and algae.
Ishikawa, Masakazu; Shimizu, Hiroshi; Nozawa, Masafumi; Ikeo, Kazuho; Gojobori, Takashi
2016-10-01
In the Hydra vulgaris group, only 2 of the 25 strains in the collection of the National Institute of Genetics in Japan currently show endosymbiosis with green algae. However, whether the other non-symbiotic strains also have the potential to harbor algae remains unknown. The endosymbiotic potential of non-symbiotic strains that can harbor algae may have been acquired before or during divergence of the strains. With the aim of understanding the evolutionary process of endosymbiosis in the H. vulgaris group, we examined the endosymbiotic potential of non-symbiotic strains of the H. vulgaris group by artificially introducing endosymbiotic algae. We found that 12 of the 23 non-symbiotic strains were able to harbor the algae until reaching the grand-offspring through the asexual reproduction by budding. Moreover, a phylogenetic analysis of mitochondrial genome sequences showed that all the strains with endosymbiotic potential grouped into a single cluster (cluster γ). This cluster contained two strains (J7 and J10) that currently harbor algae; however, these strains were not the closest relatives. These results suggest that evolution of endosymbiosis occurred in two steps; first, endosymbiotic potential was gained once in the ancestor of the cluster γ lineage; second, strains J7 and J10 obtained algae independently after the divergence of the strains. By demonstrating the evolution of the endosymbiotic potential in non-symbiotic H. vulgaris group strains, we have clearly distinguished two evolutionary steps. The step-by-step evolutionary process provides significant insight into the evolution of endosymbiosis in cnidarians. Copyright © 2016 Elsevier Inc. All rights reserved.
A cross-species bi-clustering approach to identifying conserved co-regulated genes.
Sun, Jiangwen; Jiang, Zongliang; Tian, Xiuchun; Bi, Jinbo
2016-06-15
A growing number of studies have explored the process of pre-implantation embryonic development of multiple mammalian species. However, the conservation and variation among different species in their developmental programming are poorly defined due to the lack of effective computational methods for detecting co-regulated genes that are conserved across species. The most sophisticated method to date for identifying conserved co-regulated genes is a two-step approach. This approach first identifies gene clusters for each species by a cluster analysis of gene expression data, and subsequently computes the overlaps of clusters identified from different species to reveal common subgroups. This approach is ineffective at dealing with the noise in the expression data introduced by the complicated procedures in quantifying gene expression. Furthermore, due to the sequential nature of the approach, the gene clusters identified in the first step may have little overlap among different species in the second step, making it difficult to detect conserved co-regulated genes. We propose a cross-species bi-clustering approach which first denoises the gene expression data of each species into a data matrix. The rows of the data matrices of different species represent the same set of genes that are characterized by their expression patterns over the developmental stages of each species as columns. A novel bi-clustering method is then developed to cluster genes into subgroups by a joint sparse rank-one factorization of all the data matrices. This method decomposes a data matrix into a product of a column vector and a row vector where the column vector is a consistent indicator across the matrices (species) to identify the same gene cluster and the row vector specifies for each species the developmental stages that the clustered genes co-regulate. An efficient optimization algorithm has been developed with convergence analysis. 
This approach was first validated on synthetic data and compared to the two-step method and several recent joint clustering methods. We then applied this approach to two real-world datasets of gene expression during the pre-implantation embryonic development of the human and mouse. Co-regulated genes consistent between the human and mouse were identified, offering insights into conserved functions, as well as similarities and differences in genome activation timing between the human and mouse embryos. The R package containing the implementation of the proposed method in C++ is available at: https://github.com/JavonSun/mvbc.git and also at the R platform https://www.r-project.org/. Contact: jinbo@engr.uconn.edu. © The Author 2016. Published by Oxford University Press.
van Haaften, Rachel I M; Luceri, Cristina; van Erk, Arie; Evelo, Chris T A
2009-06-01
Omics technology used for large-scale measurements of gene expression is rapidly evolving. This work pointed out the need for extensive bioinformatics analyses for array quality assessment before and after gene expression clustering and pathway analysis. A study focused on the effect of red wine polyphenols on rat colon mucosa was used to test the impact of quality control and normalisation steps on the biological conclusions. The integration of data visualization, pathway analysis and clustering revealed an artifact problem that was solved with an adapted normalisation. We propose a possible point-to-point standard analysis procedure, based on a combination of clustering and data visualization, for the analysis of microarray data.
Multi-Spatiotemporal Patterns of Residential Burglary Crimes in Chicago: 2006-2016
NASA Astrophysics Data System (ADS)
Luo, J.
2017-10-01
This research attempts to explore the patterns of burglary crimes at multi-spatiotemporal scales in Chicago between 2006 and 2016. Two spatial scales are investigated: census block and police beat area. At each spatial scale, three temporal scales are integrated to make spatiotemporal slices: hourly scale with two-hour time step from 12:00am to the end of the day; daily scale with one-day step from Sunday to Saturday within a week; monthly scale with one-month step from January to December. A total of six types of spatiotemporal slices will be created as the base for the analysis. Burglary crimes are spatiotemporally aggregated to spatiotemporal slices based on where and when they occurred. For each type of spatiotemporal slice with burglary occurrences integrated, a spatiotemporal neighborhood will be defined and managed in a spatiotemporal matrix. Hot-spot analysis will identify spatiotemporal clusters of each type of spatiotemporal slice. Spatiotemporal trend analysis is conducted to indicate how the clusters shift in space and time. The analysis results will provide helpful information for better-targeted policing and crime prevention policy, such as police patrol scheduling regarding times and places covered.
A Dimensionality Reduction-Based Multi-Step Clustering Method for Robust Vessel Trajectory Analysis
Liu, Jingxian; Wu, Kefeng
2017-01-01
The Shipboard Automatic Identification System (AIS) is crucial for navigation safety and maritime surveillance; data mining and pattern analysis of AIS information have attracted considerable attention in terms of both basic research and practical applications. Clustering of spatio-temporal AIS trajectories can be used to identify abnormal patterns and mine customary route data for transportation safety. Thus, the capacities of navigation safety and maritime traffic monitoring could be enhanced correspondingly. However, trajectory clustering is often sensitive to undesirable outliers and is essentially more complex compared with traditional point clustering. To overcome this limitation, a multi-step trajectory clustering method is proposed in this paper for robust AIS trajectory clustering. In particular, Dynamic Time Warping (DTW), a similarity measurement method, is introduced in the first step to measure the distances between different trajectories. The calculated distances, inversely proportional to the similarities, constitute a distance matrix in the second step. Furthermore, as a widely used dimensionality reduction method, Principal Component Analysis (PCA) is exploited to decompose the obtained distance matrix. In particular, the top k principal components with above 95% accumulative contribution rate are extracted by PCA, and the number of the centers k is chosen. The k centers are found by the improved automatic center selection algorithm. In the last step, the improved center clustering algorithm with k clusters is implemented on the distance matrix to achieve the final AIS trajectory clustering results. In order to improve the accuracy of the proposed multi-step clustering algorithm, an automatic algorithm for choosing the k clusters is developed according to the similarity distance. 
Numerous experiments on realistic AIS trajectory datasets in the bridge area waterway and Mississippi River have been implemented to compare our proposed method with traditional spectral clustering and fast affinity propagation clustering. Experimental results have illustrated its superior performance in terms of quantitative and qualitative evaluations. PMID:28777353
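The first two steps described in this abstract (DTW distances between trajectories, then a pairwise distance matrix) can be sketched as follows. This is a minimal textbook DTW under stated assumptions, not the authors' implementation: trajectories are taken to be lists of (x, y) points, the point-to-point cost is Euclidean, and the function names are hypothetical. The PCA and center-selection steps are not shown.

```python
import math


def dtw_distance(a, b):
    """Classic dynamic-programming DTW between two point sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # Euclidean point distance
            cost[i][j] = d + min(
                cost[i - 1][j],      # a advances, b repeats its point
                cost[i][j - 1],      # b advances, a repeats its point
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def distance_matrix(trajectories):
    """Symmetric matrix of pairwise DTW distances."""
    k = len(trajectories)
    matrix = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            matrix[i][j] = matrix[j][i] = dtw_distance(
                trajectories[i], trajectories[j]
            )
    return matrix


if __name__ == "__main__":
    # Hypothetical toy trajectories of different lengths.
    t1 = [(0, 0), (1, 0), (2, 0)]
    t2 = [(0, 0), (2, 0)]
    print(distance_matrix([t1, t2]))
```

Because DTW allows one trajectory to repeat a point while the other advances, trajectories sampled at different rates can still be compared, which is why it suits AIS tracks of unequal length.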
Bias and inference from misspecified mixed-effect models in stepped wedge trial analysis.
Thompson, Jennifer A; Fielding, Katherine L; Davey, Calum; Aiken, Alexander M; Hargreaves, James R; Hayes, Richard J
2017-10-15
Many stepped wedge trials (SWTs) are analysed by using a mixed-effect model with a random intercept and fixed effects for the intervention and time periods (referred to here as the standard model). However, it is not known whether this model is robust to misspecification. We simulated SWTs with three groups of clusters and two time periods; one group received the intervention during the first period and two groups in the second period. We simulated period and intervention effects that were either common-to-all or varied-between clusters. Data were analysed with the standard model or with additional random effects for period effect or intervention effect. In a second simulation study, we explored the weight given to within-cluster comparisons by simulating a larger intervention effect in the group of the trial that experienced both the control and intervention conditions and applying the three analysis models described previously. Across 500 simulations, we computed bias and confidence interval coverage of the estimated intervention effect. We found up to 50% bias in intervention effect estimates when period or intervention effects varied between clusters and were treated as fixed effects in the analysis. All misspecified models showed undercoverage of 95% confidence intervals, particularly the standard model. A large weight was given to within-cluster comparisons in the standard model. In the SWTs simulated here, mixed-effect models were highly sensitive to departures from the model assumptions, which can be explained by the high dependence on within-cluster comparisons. Trialists should consider including a random effect for time period in their SWT analysis model. © 2017 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd.
Bias and inference from misspecified mixed‐effect models in stepped wedge trial analysis
Fielding, Katherine L.; Davey, Calum; Aiken, Alexander M.; Hargreaves, James R.; Hayes, Richard J.
2017-01-01
Many stepped wedge trials (SWTs) are analysed by using a mixed‐effect model with a random intercept and fixed effects for the intervention and time periods (referred to here as the standard model). However, it is not known whether this model is robust to misspecification. We simulated SWTs with three groups of clusters and two time periods; one group received the intervention during the first period and two groups in the second period. We simulated period and intervention effects that were either common‐to‐all or varied‐between clusters. Data were analysed with the standard model or with additional random effects for period effect or intervention effect. In a second simulation study, we explored the weight given to within‐cluster comparisons by simulating a larger intervention effect in the group of the trial that experienced both the control and intervention conditions and applying the three analysis models described previously. Across 500 simulations, we computed bias and confidence interval coverage of the estimated intervention effect. We found up to 50% bias in intervention effect estimates when period or intervention effects varied between clusters and were treated as fixed effects in the analysis. All misspecified models showed undercoverage of 95% confidence intervals, particularly the standard model. A large weight was given to within‐cluster comparisons in the standard model. In the SWTs simulated here, mixed‐effect models were highly sensitive to departures from the model assumptions, which can be explained by the high dependence on within‐cluster comparisons. Trialists should consider including a random effect for time period in their SWT analysis model. © 2017 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd. PMID:28556355
Method for exploratory cluster analysis and visualisation of single-trial ERP ensembles.
Williams, N J; Nasuto, S J; Saddy, J D
2015-07-30
The validity of ensemble averaging on event-related potential (ERP) data has been questioned, due to its assumption that the ERP is identical across trials. Thus, there is a need for preliminary testing for cluster structure in the data. We propose a complete pipeline for the cluster analysis of ERP data. To increase the signal-to-noise ratio (SNR) of the raw single-trials, we used a denoising method based on Empirical Mode Decomposition (EMD). Next, we used a bootstrap-based method to determine the number of clusters, through a measure called the Stability Index (SI). We then used a clustering algorithm based on a Genetic Algorithm (GA) to define initial cluster centroids for subsequent k-means clustering. Finally, we visualised the clustering results through a scheme based on Principal Component Analysis (PCA). After validating the pipeline on simulated data, we tested it on data from two experiments: a P300 speller paradigm on a single subject and a language processing study on 25 subjects. Results revealed evidence for the existence of 6 clusters in one experimental condition from the language processing study. Further, a two-way chi-square test revealed an influence of subject on cluster membership. Our analysis operates on denoised single-trials, the number of clusters is determined in a principled manner and the results are presented through an intuitive visualisation. Given the cluster structure in some experimental conditions, we suggest application of cluster analysis as a preliminary step before ensemble averaging. Copyright © 2015 Elsevier B.V. All rights reserved.
Unsupervised color image segmentation using a lattice algebra clustering technique
NASA Astrophysics Data System (ADS)
Urcid, Gonzalo; Ritter, Gerhard X.
2011-08-01
In this paper we introduce a lattice algebra clustering technique for segmenting digital images in the Red-Green-Blue (RGB) color space. The proposed technique is a two-step procedure. Given an input color image, the first step determines the finite set of its extreme pixel vectors within the color cube by means of the scaled min-W and max-M lattice auto-associative memory matrices, including the minimum and maximum vector bounds. In the second step, maximal rectangular boxes enclosing each extreme color pixel are found using the Chebychev distance between color pixels; afterwards, clustering is performed by assigning each image pixel to its corresponding maximal box. The two steps in our proposed method are completely unsupervised or autonomous. Illustrative examples are provided to demonstrate the color segmentation results including a brief numerical comparison with two other non-maximal variations of the same clustering technique.
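To make the role of the Chebychev distance concrete, here is a deliberately simplified sketch: it assumes the extreme color pixels are already known and assigns each pixel to the nearest extreme under the Chebychev (maximum-coordinate) distance. It does not compute the lattice auto-associative memories or the maximal boxes of the paper; the function names and the sample colors are hypothetical.

```python
def chebyshev(p, q):
    """Chebychev distance: largest absolute per-channel difference."""
    return max(abs(a - b) for a, b in zip(p, q))


def assign_pixels(pixels, extremes):
    """Label each RGB pixel with the index of its nearest extreme pixel.

    Assumption: `extremes` was obtained elsewhere (in the paper, from
    the min-W and max-M memories); ties go to the lowest index.
    """
    labels = []
    for pixel in pixels:
        distances = [chebyshev(pixel, extreme) for extreme in extremes]
        labels.append(distances.index(min(distances)))
    return labels


if __name__ == "__main__":
    extremes = [(255, 0, 0), (0, 0, 255)]    # hypothetical extreme colors
    pixels = [(250, 10, 5), (3, 20, 200)]    # hypothetical image pixels
    print(assign_pixels(pixels, extremes))
```

A Chebychev ball in RGB space is an axis-aligned cube, which is why this distance pairs naturally with rectangular boxes around each extreme color.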
Cluster analysis of Southeastern U.S. climate stations
NASA Astrophysics Data System (ADS)
Stooksbury, D. E.; Michaels, P. J.
1991-09-01
A two-step cluster analysis of 449 Southeastern climate stations is used to objectively determine general climate clusters (groups of climate stations) for eight southeastern states. The purpose is to objectively define regions of climatic homogeneity that should perform more robustly in subsequent climatic impact models. This type of analysis has been successfully used in many related climate research problems including the determination of corn/climate districts in Iowa (Ortiz-Valdez, 1985) and the classification of synoptic climate types (Davis, 1988). These general climate clusters may be more appropriate for climate research than the standard climate divisions (CD) groupings of climate stations, which are modifications of the agro-economic United States Department of Agriculture crop reporting districts. Unlike the CDs, these objectively determined climate clusters are not restricted by state borders and thus have reduced multicollinearity, which makes them more appropriate for the study of the impact of climate and climatic change.
Crack, Jason C; Gaskell, Alisa A; Green, Jeffrey; Cheesman, Myles R; Le Brun, Nick E; Thomson, Andrew J
2008-02-06
In Escherichia coli, the switch between aerobic and anaerobic metabolism is primarily controlled by the fumarate and nitrate reduction transcriptional regulator FNR. In the absence of O2, FNR binds a [4Fe-4S]2+ cluster, generating a transcriptionally active dimeric form. Exposure to O2 results in the conversion of the cluster to a [2Fe-2S]2+ form, leading to dissociation of the protein into transcriptionally inactive monomers. The [4Fe-4S]2+ to [2Fe-2S]2+ cluster conversion proceeds in two steps. Step 1 involves the one-electron oxidation of the cluster, resulting in the release of Fe2+, generating a [3Fe-4S]1+ cluster intermediate, and a superoxide ion. In step 2, the cluster intermediate spontaneously rearranges to form the [2Fe-2S]2+ cluster, with the release of a Fe3+ ion and two sulfide ions. Here, we demonstrate that, in both native and reconstituted [4Fe-4S] FNR, the reaction environment and, in particular, the presence of Fe2+ and/or Fe3+ chelators can influence significantly the cluster conversion reaction. We demonstrate that while the rate of step 1 is largely insensitive to chelators, that of step 2 is significantly enhanced by both Fe2+ and Fe3+ chelators. We show that, for reactions in Fe3+-coordinating phosphate buffer, step 2 is enhanced to the extent that step 1 becomes the rate determining step and the [3Fe-4S]1+ intermediate is no longer detectable. Furthermore, Fe3+ released during this step is susceptible to reduction in the presence of Fe2+ chelators. This work, which may have significance for the in vivo FNR cluster conversion reaction in the cell cytoplasm, provides an explanation for apparently contradictory results reported from different laboratories.
Iterative Stable Alignment and Clustering of 2D Transmission Electron Microscope Images
Yang, Zhengfan; Fang, Jia; Chittuluru, Johnathan; Asturias, Francisco J.; Penczek, Pawel A.
2012-01-01
SUMMARY Identification of homogeneous subsets of images in a macromolecular electron microscopy (EM) image data set is a critical step in single-particle analysis. The task is handled by iterative algorithms, whose performance is compromised by the compounded limitations of image alignment and K-means clustering. Here we describe an approach, iterative stable alignment and clustering (ISAC) that, relying on a new clustering method and on the concepts of stability and reproducibility, can extract validated, homogeneous subsets of images. ISAC requires only a small number of simple parameters and, with minimal human intervention, can eliminate bias from two-dimensional image clustering and maximize the quality of group averages that can be used for ab initio three-dimensional structural determination and analysis of macromolecular conformational variability. Repeated testing of the stability and reproducibility of a solution within ISAC eliminates heterogeneous or incorrect classes and introduces critical validation to the process of EM image clustering. PMID:22325773
Investigation of correlation classification techniques
NASA Technical Reports Server (NTRS)
Haskell, R. E.
1975-01-01
A two-step classification algorithm for processing multispectral scanner data was developed and tested. The first step is a single pass clustering algorithm that assigns each pixel, based on its spectral signature, to a particular cluster. The output of that step is a cluster tape in which a single integer is associated with each pixel. The cluster tape is used as the input to the second step, where ground truth information is used to classify each cluster using an iterative method of potentials. Once the clusters have been assigned to classes the cluster tape is read pixel-by-pixel and an output tape is produced in which each pixel is assigned to its proper class. In addition to the digital classification programs, a method of using correlation clustering to process multispectral scanner data in real time by means of an interactive color video display is also described.
Feature Selection Using Information Gain for Improved Structural-Based Alert Correlation
Siraj, Maheyzah Md; Zainal, Anazida; Elshoush, Huwaida Tagelsir; Elhaj, Fatin
2016-01-01
Grouping and clustering alerts for intrusion detection based on the similarity of features is referred to as structural-based alert correlation and can discover a list of attack steps. Previous researchers selected different features and data sources manually based on their knowledge and experience, which led to less accurate identification of attack steps and inconsistent clustering accuracy. Furthermore, the existing alert correlation systems deal with a huge amount of data that contains null values, incomplete information, and irrelevant features, causing the analysis of the alerts to be tedious, time-consuming and error-prone. Therefore, this paper focuses on selecting accurate and significant features of alerts that are appropriate to represent the attack steps, thus enhancing the structural-based alert correlation model. A two-tier feature selection method is proposed to obtain the significant features. The first tier aims at ranking the subset of features based on high information gain entropy in decreasing order. The second tier extends additional features with a better discriminative ability than the initially ranked features. Performance analysis results show the significance of the selected features in terms of the clustering accuracy using the 2000 DARPA intrusion detection scenario-specific dataset. PMID:27893821
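The first tier described in this abstract ranks features by information gain. A minimal sketch of that idea follows, assuming discrete feature values and class labels; it shows the generic information gain computation, not the authors' two-tier method, and the alert field names in the example are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


def information_gain(feature_values, labels):
    """Reduction in label entropy after splitting on a discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(
        len(group) / total * entropy(group) for group in groups.values()
    )
    return entropy(labels) - conditional


def rank_features(rows, labels):
    """Return (feature, gain) pairs sorted by decreasing information gain."""
    gains = {
        name: information_gain([row[name] for row in rows], labels)
        for name in rows[0]
    }
    return sorted(gains.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical alerts: "port" separates the two attack steps, "sensor" does not.
    rows = [
        {"port": 80, "sensor": "a"},
        {"port": 80, "sensor": "b"},
        {"port": 22, "sensor": "a"},
        {"port": 22, "sensor": "b"},
    ]
    labels = ["scan", "scan", "exploit", "exploit"]
    print(rank_features(rows, labels))
```

A feature with high gain splits the alerts into groups that are nearly pure with respect to the attack step, which is the property the ranking is meant to reward.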
Metabolic network visualization eliminating node redundance and preserving metabolic pathways
Bourqui, Romain; Cottret, Ludovic; Lacroix, Vincent; Auber, David; Mary, Patrick; Sagot, Marie-France; Jourdan, Fabien
2007-01-01
Background The tools that are available to draw and to manipulate the representations of metabolism are usually restricted to metabolic pathways. This limitation becomes problematic when studying processes that span several pathways. The various attempts that have been made to draw genome-scale metabolic networks are confronted with two shortcomings: (1) they do not use contextual information, which leads to dense, hard-to-interpret drawings; (2) they impose very constrained standards, which implies, in particular, duplicating nodes, making topological analysis considerably more difficult. Results We propose a method, called MetaViz, which draws a genome-scale metabolic network while also taking into account its structure into pathways. This method consists of two steps: a clustering step which addresses the pathway overlapping problem and a drawing step which consists of drawing the clustered graph and each cluster. Conclusion The method we propose is original and addresses new drawing issues arising from the no-duplication constraint. We do not propose a single drawing but rather several alternative ways of presenting metabolism depending on the pathway on which one wishes to focus. We believe that this provides a valuable tool to explore the pathway structure of metabolism. PMID:17608928
Patterns of Dysmorphic Features in Schizophrenia
Scutt, L.E.; Chow, E.W.C.; Weksberg, R.; Honer, W.G.; Bassett, Anne S.
2011-01-01
Congenital dysmorphic features are prevalent in schizophrenia and may reflect underlying neurodevelopmental abnormalities. A cluster analysis approach delineating patterns of dysmorphic features has been used in genetics to classify individuals into more etiologically homogeneous subgroups. In the present study, this approach was applied to schizophrenia, using a sample with a suspected genetic syndrome as a testable model. Subjects (n = 159) with schizophrenia or schizoaffective disorder were ascertained from chronic patient populations (random, n = 123) or referred with possible 22q11 deletion syndrome (referred, n = 36). All subjects were evaluated for presence or absence of 70 reliably assessed dysmorphic features, which were used in a three-step cluster analysis. The analysis produced four major clusters with different patterns of dysmorphic features. Significant between-cluster differences were found for rates of 37 dysmorphic features (P < 0.05), median number of dysmorphic features (P = 0.0001), and validating features not used in the cluster analysis: mild mental retardation (P = 0.001) and congenital heart defects (P = 0.002). Two clusters (1 and 4) appeared to represent more developmental subgroups of schizophrenia with elevated rates of dysmorphic features and validating features. Cluster 1 (n = 27) comprised mostly referred subjects. Cluster 4 (n = 18) had a different pattern of dysmorphic features; one subject had a mosaic Turner syndrome variant. Two other clusters had lower rates and patterns of features consistent with those found in previous studies of schizophrenia. Delineating patterns of dysmorphic features may help identify subgroups that could represent neurodevelopmental forms of schizophrenia with more homogeneous origins. PMID:11803519
An assessment of fatigue in patients with postural orthostatic tachycardia syndrome.
Wise, Shelby; Ross, Amanda; Brown, Abigail; Evans, Meredyth; Jason, Leonard
2017-05-01
Individuals with postural orthostatic tachycardia syndrome share many symptoms with those who have chronic fatigue syndrome; one of which is severe fatigue. Previous literature found that those with chronic fatigue syndrome experience many forms of fatigue. The goal of this study was to investigate whether individuals with postural orthostatic tachycardia syndrome also experience multidimensional fatigue and whether these individuals can be clustered into subgroups based on the types of fatigue they endorse. A convenience sample of 138 participants (aged 14-29) with postural orthostatic tachycardia syndrome completed questionnaires that assessed fatigue, brain fog symptom severity, activities that improve brain fog, and brain fog-related disability. An exploratory factor analysis was conducted on the Fatigue Types Questionnaire, and a three-factor solution was produced. Factor scores were then used to cluster the patients into groups using a TwoStep cluster analysis. This resulted in two clusters, a high severity group and a low severity group. The clusters were then compared on a number of items related to symptom expression. Individuals within the more severe cluster had significantly more brain fog at the beginning and end of the survey when compared to cluster two. Those in the more severe cluster also described more activity impairment as well as more frequent, more severe, and more debilitation from postural orthostatic tachycardia syndrome and brain fog. The findings of the factor analysis suggest that patients with postural orthostatic tachycardia syndrome experience fatigue as a multidimensional construct and they also can be subgrouped based on symptom severity.
Phenotypes Determined by Cluster Analysis in Moderate to Severe Bronchial Asthma.
Youroukova, Vania M; Dimitrova, Denitsa G; Valerieva, Anna D; Lesichkova, Spaska S; Velikova, Tsvetelina V; Ivanova-Todorova, Ekaterina I; Tumangelova-Yuzeir, Kalina D
2017-06-01
Bronchial asthma is a heterogeneous disease that includes various subtypes. They may share similar clinical characteristics, but probably have different pathological mechanisms. The aim was to identify phenotypes using cluster analysis in moderate to severe bronchial asthma and to compare differences in clinical, physiological, immunological and inflammatory data between the clusters. Forty adult patients with moderate to severe bronchial asthma out of exacerbation were included. All underwent clinical assessment, anthropometric measurements, skin prick testing, standard spirometry and measurement of the fraction of exhaled nitric oxide. Blood eosinophil count, serum total IgE and periostin levels were determined. A two-step cluster approach, a hierarchical clustering method and k-means analysis were used for identification of the clusters. We identified four clusters: Cluster 1 (n=14), late-onset, non-atopic asthma with impaired lung function; Cluster 2 (n=13), late-onset, atopic asthma; Cluster 3 (n=6), late-onset, eosinophilic asthma with aspirin sensitivity; and Cluster 4 (n=7), early-onset, atopic asthma. Our study is the first in Bulgaria in which cluster analysis is applied to asthmatic patients. The variables with the greatest power for differentiation in our study were: age of asthma onset, duration of disease, atopy, smoking, blood eosinophils, nonsteroidal anti-inflammatory drug hypersensitivity, baseline FEV1/FVC and symptom severity. Our results support the concept of heterogeneity of bronchial asthma and demonstrate that cluster analysis can be a useful tool for phenotyping of disease and a personalized approach to the treatment of patients.
Construction and engineering of large biochemical pathways via DNA assembler
Shao, Zengyi; Zhao, Huimin
2015-01-01
Summary DNA assembler enables rapid construction and engineering of biochemical pathways in a one-step fashion by exploitation of the in vivo homologous recombination mechanism in Saccharomyces cerevisiae. It has many applications in pathway engineering, metabolic engineering, combinatorial biology, and synthetic biology. Here we use two examples including the zeaxanthin biosynthetic pathway and the aureothin biosynthetic gene cluster to describe the key steps in the construction of pathways containing multiple genes using the DNA assembler approach. Methods for construct design, pathway assembly, pathway confirmation, and functional analysis are shown. The protocol for fine genetic modifications such as site-directed mutagenesis for engineering the aureothin gene cluster is also illustrated. PMID:23996442
Identification and characterization of near-fatal asthma phenotypes by cluster analysis.
Serrano-Pariente, J; Rodrigo, G; Fiz, J A; Crespo, A; Plaza, V
2015-09-01
Near-fatal asthma (NFA) is a heterogeneous clinical entity and several profiles of patients have been described according to different clinical, pathophysiological and histological features. However, there are no previous studies that identify different phenotypes of NFA in an unbiased way, using statistical methods such as cluster analysis. Therefore, the aim of the present study was to identify and to characterize phenotypes of near-fatal asthma using a cluster analysis. Over a period of 2 years, 33 Spanish hospitals enrolled 179 asthmatics admitted for an episode of NFA. A cluster analysis using a two-step algorithm was performed on data from 84 of these cases. The analysis defined three clusters of patients with NFA: cluster 1, the largest, including older patients with clinical and therapeutic criteria of severe asthma; cluster 2, with a high proportion of respiratory arrest (68%), impaired consciousness level (82%) and mechanical ventilation (93%); and cluster 3, which included younger patients, characterized by insufficient anti-inflammatory treatment and frequent sensitization to Alternaria alternata and soybean. These results identify specific asthma phenotypes involved in NFA, confirming in part previous findings observed in studies with a clinical approach. The identification of patients with a specific NFA phenotype could suggest interventions to prevent future severe asthma exacerbations. © 2015 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
Marshman, Z; Broomhead, T; Rodd, H D; Jones, K; Burke, D; Baker, S R
2016-09-28
Emergency departments (EDs) have been identified as key providers of dental care although few studies have examined patterns of attendance or clusters of characteristics. The aim was to identify the reasons for visits to an ED, whether these remained stable over time, and characterize clusters of patients by socio-demographic and attendance variables. Pseudonymized data were obtained for children who attended the ED in 2003-2004, 2004-2005 and 2012-2013. Presenting complaint was categorized as attending for dental or nondental reasons. Other variables analysed included patient (age, sex, ethnicity and deprivation) and attendance characteristics (distance travelled, season, nature of complaint, time elapsed since onset of symptoms, day of week and hours of attendance), together with treatment outcome (advice, antibiotics and referral). To assess trends over time, analyses were conducted on patient, attendance and treatment outcome variables. To examine whether patients could be characterized by socio-demographic and attendance variables, a two-step cluster analysis was undertaken on the 2003-2004 data set and validated on the 2004-2005 and 2012-2013 data sets. In 2003-2004, 550 children attended the ED for dental reasons, rising to 687 in 2012-2013. The most important predictors of dental attendance were as follows: nature of complaint, ethnicity, time elapsed, sex and deprivation of the area in which children lived. The analysis showed two clusters: Cluster 1 comprised children who attended the ED for dental injury, were of White ethnicity and attended within 24 h of onset of symptoms. Children in this cluster were likely to be from the least or less deprived areas (compared to Cluster 2) and were more likely to be males. Cluster 2 comprised children attending the ED for caries, oral mucosal lesions or other complaints, who were likely to be of other (non-White) ethnicities and were likely to attend more than 24 h after symptoms began.
Children in this cluster were more likely to come from the most deprived areas and were both males and females. The clusters varied according to treatment outcome; those patients in Cluster 2 were more likely to be prescribed medication, whilst those children in Cluster 1 were more likely to be referred to another specialty. A significant number of visits to the ED were for dental reasons with two clusters of children. The results have identified groups of patients for whom appropriate dental provision is lacking and where targeted services are needed to improve outcomes for children and reduce the burden on EDs. © 2016 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
Unsupervised spike sorting based on discriminative subspace learning.
Keshtkaran, Mohammad Reza; Yang, Zhi
2014-01-01
Spike sorting is a fundamental preprocessing step for many neuroscience studies which rely on the analysis of spike trains. In this paper, we present two unsupervised spike sorting algorithms based on discriminative subspace learning. The first algorithm simultaneously learns the discriminative feature subspace and performs clustering. It uses histogram of features in the most discriminative projection to detect the number of neurons. The second algorithm performs hierarchical divisive clustering that learns a discriminative 1-dimensional subspace for clustering in each level of the hierarchy until achieving almost unimodal distribution in the subspace. The algorithms are tested on synthetic and in-vivo data, and are compared against two widely used spike sorting methods. The comparative results demonstrate that our spike sorting methods can achieve substantially higher accuracy in lower dimensional feature space, and they are highly robust to noise. Moreover, they provide significantly better cluster separability in the learned subspace than in the subspace obtained by principal component analysis or wavelet transform.
Hou, Jin-Le; Luo, Wen; Wu, Yin-Yin; Su, Hu-Chao; Zhang, Guang-Lin; Zhu, Qin-Yu; Dai, Jie
2015-12-14
Two benzene dicarboxylate (BDC) and salicylate (SAL) substituted titanium-oxo-clusters, Ti13O10(o-BDC)4(SAL)4(O(i)Pr)16 (1) and Ti13O10(o-BDC)4(SAL-Cl)4(O(i)Pr)16 (2), are prepared by one step in situ solvothermal synthesis. Single crystal analysis shows that the two Ti13 clusters take a paddle arrangement with an S4 symmetry. The non-compact (non-sphere) structure is stabilized by the coordination of BDC and SAL. Film photoelectrodes are prepared by the wet coating process using the solution of the clusters and the photocurrent response properties of the electrodes are studied. It is found that the photocurrent density and photoresponsiveness of the electrodes are related to the number of coating layers and the annealing temperature. Using ligand coordinated titanium-oxo-clusters as the molecular precursors of TiO2 anatase films is found to be effective due to their high solubility, appropriate stability in solution and hence the easy controllability.
Goh, Yong-Shian; Lee, Alice; Chan, Sally Wai-Chi; Chan, Moon Fai
2015-08-01
This study aimed to determine whether definable profiles existed in a cohort of nursing staff with regard to demographic characteristics, job satisfaction, acculturation, work environment, stress, cultural values and coping abilities. A survey was conducted in one hospital in Singapore from June to July 2012, and 814 full-time staff nurses completed a self-report questionnaire (89% response rate). Demographic characteristics, job satisfaction, acculturation, work environment, perceived stress, cultural values, ways of coping and intention to leave current workplace were assessed as outcomes. The two-step cluster analysis revealed three clusters. Nurses in cluster 1 (n = 222) had lower acculturation scores than nurses in cluster 3. Cluster 2 (n = 362) was a group of younger nurses who reported higher intention to leave (22.4%), stress level and job dissatisfaction than the other two clusters. Nurses in cluster 3 (n = 230) were mostly Singaporean and reported the lowest intention to leave (13.0%). Resources should be allocated to specifically address the needs of younger nurses and hopefully retain them in the profession. Management should focus their retention strategies on junior nurses and provide a work environment that helps to strengthen their intention to remain in nursing by increasing their job satisfaction. © 2014 Wiley Publishing Asia Pty Ltd.
Horsch, Salome; Kopczynski, Dominik; Kuthe, Elias; Baumbach, Jörg Ingo; Rahmann, Sven
2017-01-01
Motivation Disease classification from molecular measurements typically requires an analysis pipeline from raw noisy measurements to final classification results. Multi capillary column—ion mobility spectrometry (MCC-IMS) is a promising technology for the detection of volatile organic compounds in the air of exhaled breath. From raw measurements, the peak regions representing the compounds have to be identified, quantified, and clustered across different experiments. Currently, several steps of this analysis process require manual intervention of human experts. Our goal is to identify a fully automatic pipeline that yields competitive disease classification results compared to an established but subjective and tedious semi-manual process. Method We combine a large number of modern methods for peak detection, peak clustering, and multivariate classification into analysis pipelines for raw MCC-IMS data. We evaluate all combinations on three different real datasets in an unbiased cross-validation setting. We determine which specific algorithmic combinations lead to high AUC values in disease classifications across the different medical application scenarios. Results The best fully automated analysis process achieves even better classification results than the established manual process. The best algorithms for the three analysis steps are (i) SGLTR (Savitzky-Golay Laplace-operator filter thresholding regions) and LM (Local Maxima) for automated peak identification, (ii) EM clustering (Expectation Maximization) and DBSCAN (Density-Based Spatial Clustering of Applications with Noise) for the clustering step and (iii) RF (Random Forest) for multivariate classification. Thus, automated methods can replace the manual steps in the analysis process to enable an unbiased high throughput use of the technology. PMID:28910313
Clustering of financial time series with application to index and enhanced index tracking portfolio
NASA Astrophysics Data System (ADS)
Dose, Christian; Cincotti, Silvano
2005-09-01
A stochastic-optimization technique based on time series cluster analysis is described for index tracking and enhanced index tracking problems. Our methodology solves the problem in two steps, i.e., by first selecting a subset of stocks and then setting the weight of each stock as a result of an optimization process (asset allocation). The present formulation takes into account constraints on the number of stocks and on the fraction of capital invested in each of them, whilst not including transaction costs. Computational results based on clustering selection are compared to those of random techniques and show the importance of clustering in noise reduction and robust forecasting applications, in particular for enhanced index tracking.
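The two-step scheme described in the abstract above (select a subset of stocks, then set their weights) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' method: cluster labels are assumed to be given already, the representative of each cluster is chosen by correlation with the index (an assumption), and a coarse grid search with a per-stock weight cap stands in for the stochastic optimization. All function names and parameters are hypothetical.

```python
import itertools
import math

def pearson(x, y):
    """Pearson correlation between two equally long return series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def select_representatives(returns, labels, index_returns):
    """Step 1 (assumed rule): keep, per cluster, the stock most correlated with the index."""
    best = {}
    for name, label in labels.items():
        score = pearson(returns[name], index_returns)
        if label not in best or score > best[label][1]:
            best[label] = (name, score)
    return sorted(name for name, _ in best.values())

def tracking_error(weights, returns, names, index_returns):
    """Root-mean-square difference between portfolio and index returns."""
    squared = 0.0
    for t, target in enumerate(index_returns):
        portfolio = sum(w * returns[n][t] for w, n in zip(weights, names))
        squared += (portfolio - target) ** 2
    return math.sqrt(squared / len(index_returns))

def grid_weights(names, returns, index_returns, max_weight=0.8, steps=10):
    """Step 2 (stand-in for the optimizer): grid search over weights that
    sum to 1, are non-negative, and do not exceed max_weight per stock."""
    best_w, best_err = None, float("inf")
    grid = [i / steps for i in range(steps + 1)]
    for combo in itertools.product(grid, repeat=len(names) - 1):
        w = list(combo) + [1.0 - sum(combo)]
        if min(w) < -1e-12 or max(w) > max_weight + 1e-12:
            continue
        err = tracking_error(w, returns, names, index_returns)
        if err < best_err:
            best_w, best_err = w, err
    return best_w, best_err
```

The grid search grows exponentially with the number of selected stocks, so it is only practical for a handful of names; the cap `max_weight` mirrors the abstract's constraint on the fraction of capital invested in each stock, and transaction costs are ignored, as in the abstract.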
Cluster size selectivity in the product distribution of ethene dehydrogenation on niobium clusters.
Parnis, J Mark; Escobar-Cabrera, Eric; Thompson, Matthew G K; Jacula, J Paul; Lafleur, Rick D; Guevara-García, Alfredo; Martínez, Ana; Rayner, David M
2005-08-18
Ethene reactions with niobium atoms and clusters containing up to 25 constituent atoms have been studied in a fast-flow metal cluster reactor. The clusters react with ethene at about the gas-kinetic collision rate, indicating a barrierless association process as the cluster removal step. Exceptions are Nb8 and Nb10, for which a significantly diminished rate is observed, reflecting some cluster size selectivity. Analysis of the experimental primary product masses indicates dehydrogenation of ethene for all clusters save Nb10, yielding either Nb(n)C2H2 or Nb(n)C2. Over the range Nb-Nb6, the extent of dehydrogenation increases with cluster size, then decreases for larger clusters. For many clusters, secondary and tertiary product masses are also observed, showing varying degrees of dehydrogenation corresponding to net addition of C2H4, C2H2, or C2. With Nb atoms and several small clusters, formal addition of at least six ethene molecules is observed, suggesting a polymerization process may be active. Kinetic analysis of the Nb atom and several Nb(n) cluster reactions with ethene shows that the process is consistent with sequential addition of ethene units at rates corresponding approximately to the gas-kinetic collision frequency for several consecutive reacting ethene molecules. Some variation in the rate of ethene pick up is found, which likely reflects small energy barriers or steric constraints associated with individual mechanistic steps. Density functional calculations of structures of Nb clusters up to Nb(6), and the reaction products Nb(n)C2H2 and Nb(n)C2 (n = 1...6) are presented. Investigation of the thermochemistry for the dehydrogenation of ethene to form molecular hydrogen, for the Nb atom and clusters up to Nb6, demonstrates that the exergonicity of the formation of Nb(n)C2 species increases with cluster size over this range, which supports the proposal that the extent of dehydrogenation is determined primarily by thermodynamic constraints. 
Analysis of the structural variations present in the cluster species studied shows an increase in C-H bond lengths with cluster size that closely correlates with the increased thermodynamic drive to full dehydrogenation. This correlation strongly suggests that all steps in the reaction are barrierless, and that weakening of the C-H bonds is directly reflected in the thermodynamics of the overall dehydrogenation process. It is also demonstrated that reaction exergonicity in the initial partial dehydrogenation step must be carried through as excess internal energy into the second dehydrogenation step.
Two-step entanglement concentration for arbitrary electronic cluster state
NASA Astrophysics Data System (ADS)
Zhao, Sheng-Yang; Liu, Jiong; Zhou, Lan; Sheng, Yu-Bo
2013-12-01
We present an efficient protocol for concentrating an arbitrary four-electron less-entangled cluster state into a maximally entangled cluster state. As a two-step entanglement concentration protocol (ECP), it needs only one pair of less-entangled cluster states, which makes this ECP more economical. With the help of an electronic polarization beam splitter (PBS) and charge detection, the whole concentration process is essentially a quantum nondemolition (QND) measurement. Therefore, the concentrated maximally entangled state can be retained for further application. Moreover, the terms discarded in some traditional ECPs can be reused to obtain a high success probability. The protocol is feasible and useful in current one-way quantum computation.
Jung, Inuk; Jo, Kyuri; Kang, Hyejin; Ahn, Hongryul; Yu, Youngjae; Kim, Sun
2017-12-01
Identifying biologically meaningful gene expression patterns from time series gene expression data is important to understand the underlying biological mechanisms. To identify significantly perturbed gene sets between different phenotypes, analysis of time series transcriptome data requires consideration of time and sample dimensions. Thus, the analysis of such time series data seeks to search gene sets that exhibit similar or different expression patterns between two or more sample conditions, constituting the three-dimensional data, i.e. gene-time-condition. Computational complexity for analyzing such data is very high, compared to the already difficult NP-hard two dimensional biclustering algorithms. Because of this challenge, traditional time series clustering algorithms are designed to capture co-expressed genes with similar expression pattern in two sample conditions. We present a triclustering algorithm, TimesVector, specifically designed for clustering three-dimensional time series data to capture distinctively similar or different gene expression patterns between two or more sample conditions. TimesVector identifies clusters with distinctive expression patterns in three steps: (i) dimension reduction and clustering of time-condition concatenated vectors, (ii) post-processing clusters for detecting similar and distinct expression patterns and (iii) rescuing genes from unclassified clusters. Using four sets of time series gene expression data, generated by both microarray and high throughput sequencing platforms, we demonstrated that TimesVector successfully detected biologically meaningful clusters of high quality. TimesVector improved the clustering quality compared to existing triclustering tools and only TimesVector detected clusters with differential expression patterns across conditions successfully. The TimesVector software is available at http://biohealth.snu.ac.kr/software/TimesVector/. Contact: sunkim.bioinfo@snu.ac.kr.
Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
Artim-Esen, Bahar; Çene, Erhan; Şahinkaya, Yasemin; Ertan, Semra; Pehlivan, Özlem; Kamali, Sevil; Gül, Ahmet; Öcal, Lale; Aral, Orhan; Inanç, Murat
2014-07-01
Associations between autoantibodies and clinical features have been described in systemic lupus erythematosus (SLE). Herein, we aimed to define autoantibody clusters and their clinical correlations in a large cohort of patients with SLE. We analyzed 852 patients with SLE who attended our clinic. Seven autoantibodies were selected for cluster analysis: anti-DNA, anti-Sm, anti-RNP, anticardiolipin (aCL) immunoglobulin (Ig)G or IgM, lupus anticoagulant (LAC), anti-Ro, and anti-La. Two-step clustering and Kaplan-Meier survival analyses were used. Five clusters were identified. A cluster consisted of patients with only anti-dsDNA antibodies, a cluster of anti-Sm and anti-RNP, a cluster of aCL IgG/M and LAC, and a cluster of anti-Ro and anti-La antibodies. Analysis revealed 1 more cluster that consisted of patients who did not belong to any of the clusters formed by antibodies chosen for cluster analysis. Sm/RNP cluster had significantly higher incidence of pulmonary hypertension and Raynaud phenomenon. DsDNA cluster had the highest incidence of renal involvement. In the aCL/LAC cluster, there were significantly more patients with neuropsychiatric involvement, antiphospholipid syndrome, autoimmune hemolytic anemia, and thrombocytopenia. According to the Systemic Lupus International Collaborating Clinics damage index, the highest frequency of damage was in the aCL/LAC cluster. Comparison of 10 and 20 years survival showed reduced survival in the aCL/LAC cluster. This study supports the existence of autoantibody clusters with distinct clinical features in SLE and shows that forming clinical subsets according to autoantibody clusters may be useful in predicting the outcome of the disease. Autoantibody clusters in SLE may exhibit differences according to the clinical setting or population.
[Autism Spectrum Disorder and DSM-5: Spectrum or Cluster?].
Kienle, Xaver; Freiberger, Verena; Greulich, Heide; Blank, Rainer
2015-01-01
Within the new DSM-5, the currently differentiated subgroups of "Autistic Disorder" (299.0), "Asperger's Disorder" (299.80) and "Pervasive Developmental Disorder" (299.80) are replaced by the more general "Autism Spectrum Disorder". With regard to patient-oriented and expedient advising and therapy planning, however, the issue of an empirically reproducible and clinically feasible differentiation into subgroups must still be raised. Based on two autism rating scales (ASDS and FSK), an exploratory two-step cluster analysis was conducted with N=103 children (age: 5-18) seen in our social-pediatric health care centre to examine potentially autistic symptoms. In the two-cluster solution of both rating scales, mainly the problems in social communication grouped the children into a cluster "with communication problems" (51% and 41%), and a cluster "without communication problems". Within the three-cluster solution of the ASDS, sensory hypersensitivity, adherence to routines and social-communicative problems generated an "autistic" subgroup (22%). The children of the second cluster ("communication problems", 35%) were only described by social-communicative problems, and the third group did not show any problems (38%). In the three-cluster solution of the FSK, the "autistic cluster" of the two-cluster solution differentiated into a subgroup with mainly social-communicative problems (cluster 1) and a second subgroup described by restrictive, repetitive behavior. The different cluster solutions will be discussed with a view to the new DSM-5 diagnostic criteria. For subsequent studies, a further specification of some of the ASDS and FSK items could be helpful.
Dynamic Trajectory Extraction from Stereo Vision Using Fuzzy Clustering
NASA Astrophysics Data System (ADS)
Onishi, Masaki; Yoda, Ikushi
In recent years, many human tracking methods have been proposed for analyzing dynamic human trajectories. These methods are general technologies applicable to various fields, such as customer purchase analysis in a shopping environment and safety control at a railroad crossing. In this paper, we present a new approach for tracking human positions from stereo images. We use a two-step clustering framework, combining the k-means method and fuzzy clustering, to detect human regions. In the initial clustering, the k-means method quickly forms intermediate clusters from object features extracted by stereo vision. In the final clustering, the fuzzy c-means method groups the intermediate clusters into human regions based on their attributes. By expressing ambiguity through fuzzy clustering, the proposed method clusters correctly even when many people are close to each other. The validity of our technique was evaluated in an experiment extracting the trajectories of doctors and nurses in a hospital emergency room.
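The fuzzy step of the two-stage clustering described above can be illustrated with the standard fuzzy c-means membership formula, in which each point receives a graded degree of membership in every cluster rather than a hard assignment. The sketch below is a minimal illustration, not the authors' implementation: it assumes Euclidean distance, a fuzzifier m = 2 by default, and cluster centers that are already known; the function name and inputs are hypothetical.

```python
import math

def fuzzy_memberships(points, centers, m=2.0):
    """Standard fuzzy c-means membership degrees.

    For point i and center j: u[i][j] = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1)),
    where d_ij is the distance from point i to center j. Each row sums to 1.
    """
    power = 2.0 / (m - 1.0)
    memberships = []
    for p in points:
        dists = [math.dist(p, c) for c in centers]
        if any(d == 0.0 for d in dists):
            # The point coincides with one or more centers: share full
            # membership among those centers to avoid dividing by zero.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            total = sum(hits)
            memberships.append([h / total for h in hits])
            continue
        row = [1.0 / sum((d_j / d_k) ** power for d_k in dists) for d_j in dists]
        memberships.append(row)
    return memberships
```

A point lying between two centers gets intermediate membership in both, which is the kind of ambiguity the abstract relies on when people stand close to each other.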
Multi-party Measurement-Device-Independent Quantum Key Distribution Based on Cluster States
NASA Astrophysics Data System (ADS)
Liu, Chuanqi; Zhu, Changhua; Ma, Shuquan; Pei, Changxing
2018-03-01
We propose a novel multi-party measurement-device-independent quantum key distribution (MDI-QKD) protocol based on cluster states. A four-photon analyzer which can distinguish all 16 cluster states serves as the measurement device for four-party MDI-QKD. Any two out of four participants can build secure keys after the analyzer obtains successful outputs and the two participants perform post-processing. We derive a security analysis for the protocol, and analyze the key rates under different values of polarization misalignment. The results show that four-party MDI-QKD is feasible over 280 km in the optical fiber channel when the key rate is about 10^-6 with the polarization misalignment parameter 0.015. Moreover, our work takes an important step toward a quantum communication network.
[Difficulties in emotion regulation and personal distress in young adults with social anxiety].
Contardi, Anna; Farina, Benedetto; Fabbricatore, Mariantonietta; Tamburello, Stella; Scapellato, Paolo; Penzo, Ilaria; Tamburello, Antonino; Innamorati, Marco
2013-01-01
The aim of this study was to assess the association between social anxiety and difficulties in emotion regulation in a sample of Italian young adults. Our convenience sample was composed of 298 Italian young adults (184 women and 114 men) aged 18-34 years. Participants were administered the Interaction Anxiousness Scale (IAS), the Audience Anxiousness Scale (AAS), the Difficulties in Emotion Regulation Scale (DERS), and the Interpersonal Reactivity Index (IRI). A Two Step cluster analysis was used to group subjects according to their level of social anxiety. The cluster analysis indicated a two-cluster solution. The first cluster included 163 young adults with higher scores on the AAS and the IAS than those included in cluster 2 (n=135). A generalized linear model with groups as dependent variable indicated that people with higher social anxiety (compared to those with lower social anxiety) have higher scores on the dimension personal distress of the IRI (p<0.01), and on the DERS non acceptance of negative emotions (p<0.001) and lack of emotional clarity (p<0.05). The results are consistent with models of psychopathology, which hypothesize that people who cannot deal effectively with their emotions may develop depressive and anxious disorders.
Nara, Ayako; Hashimoto, Takuya; Komatsu, Mamoru; Nishiyama, Makoto; Kuzuyama, Tomohisa; Ikeda, Haruo
2017-05-01
Bafilomycins A1, C1 and B1 (setamycin) produced by Kitasatospora setae KM-6054 belong to the plecomacrolide family, which exhibit antibacterial, antifungal, antineoplastic and immunosuppressive activities. An analysis of gene clusters from K. setae KM-6054 governing the biosynthesis of bafilomycins revealed that it contains five large open reading frames (ORFs) encoding the multifunctional polypeptides of bafilomycin polyketide synthases (PKSs). These clustered PKS genes, which are responsible for bafilomycin biosynthesis, together encode 11 homologous sets of enzyme activities, each catalyzing a specific round of polyketide chain elongation. The region contains an additional 13 ORFs spanning a distance of 73 287 bp, some of which encode polypeptides governing other key steps in bafilomycin biosynthesis. Five ORFs, BfmB, BfmC, BfmD, BfmE and BfmF, were involved in the formation of methoxymalonyl-acyl carrier protein (ACP). Two possible regulatory genes, bfmR and bfmH, were found downstream of the above genes. A gene-knockout analysis revealed that BfmR was only a transcriptional regulator for the transcription of bafilomycin biosynthetic genes. Two genes, bfmI and bfmJ, were found downstream of bfmH. An analysis of these gene-disruption mutants in addition to an enzymatic analysis of BfmI and BfmJ revealed that BfmJ activated fumarate and BfmI functioned as a catalyst to form a fumaryl ester at the C21 hydroxyl residue of bafilomycin A1. A comparative analysis of bafilomycin gene clusters in K. setae KM-6054, Streptomyces lohii JCM 14114 and Streptomyces griseus DSM 2608 revealed that each ORF of both gene clusters in two Streptomyces strains were quite similar to each other. However, each ORF of gene cluster in K. setae KM-6054 was of lower similarity to that of corresponding ORF in the two Streptomyces species.
Kent, Peter; Jensen, Rikke K; Kongsted, Alice
2014-10-02
There are various methodological approaches to identifying clinically important subgroups and one method is to identify clusters of characteristics that differentiate people in cross-sectional and/or longitudinal data using Cluster Analysis (CA) or Latent Class Analysis (LCA). There is a scarcity of head-to-head comparisons that can inform the choice of which clustering method might be suitable for particular clinical datasets and research questions. Therefore, the aim of this study was to perform a head-to-head comparison of three commonly available methods (SPSS TwoStep CA, Latent Gold LCA and SNOB LCA). The performance of these three methods was compared: (i) quantitatively using the number of subgroups detected, the classification probability of individuals into subgroups, the reproducibility of results, and (ii) qualitatively using subjective judgments about each program's ease of use and interpretability of the presentation of results. We analysed five real datasets of varying complexity in a secondary analysis of data from other research projects. Three datasets contained only MRI findings (n = 2,060 to 20,810 vertebral disc levels), one dataset contained only pain intensity data collected for 52 weeks by text (SMS) messaging (n = 1,121 people), and the last dataset contained a range of clinical variables measured in low back pain patients (n = 543 people). Four artificial datasets (n = 1,000 each) containing subgroups of varying complexity were also analysed, testing the ability of these clustering methods to detect subgroups and correctly classify individuals when subgroup membership was known. The results from the real clinical datasets indicated that the number of subgroups detected varied, the certainty of classifying individuals into those subgroups varied, the findings had perfect reproducibility, some programs were easier to use and the interpretability of the presentation of their findings also varied.
The results from the artificial datasets indicated that all three clustering methods showed a near-perfect ability to detect known subgroups and correctly classify individuals into those subgroups. Our subjective judgement was that Latent Gold offered the best balance of sensitivity to subgroups, ease of use and presentation of results with these datasets but we recognise that different clustering methods may suit other types of data and clinical research questions.
Romay-Tallon, Raquel; Rivera-Baltanas, Tania; Allen, Josh; Olivares, Jose M; Kalynchuk, Lisa E; Caruncho, Hector J
2017-01-01
The pattern of serotonin transporter clustering on the plasma membrane of lymphocytes extracted from human whole blood samples has been identified as a putative biomarker of therapeutic efficacy in major depression. Here we evaluated the possibility of performing a similar analysis using blood smears obtained from rats, and from control human subjects and depression patients. We hypothesized that we could optimize a protocol to make the analysis of serotonin protein clustering in blood smears comparable to the analysis of serotonin protein clustering using isolated lymphocytes. Our data indicate that blood smears require a longer fixation time and longer times of incubation with primary and secondary antibodies. In addition, one needs to optimize the image analysis settings for the analysis of smears. When these steps are followed, the quantitative analysis of both the number and size of serotonin transporter clusters on the plasma membrane of lymphocytes is similar using both blood smears and isolated lymphocytes. The development of this novel protocol will greatly facilitate the collection of appropriate samples by eliminating the necessity and cost of specialized personnel for drawing blood samples, and by being a less invasive procedure. Therefore, this protocol will help us advance the validation of membrane protein clustering in lymphocytes as a biomarker of therapeutic efficacy in major depression, and bring it closer to its clinical application.
Group sequential designs for stepped-wedge cluster randomised trials
Grayling, Michael J; Wason, James MS; Mander, Adrian P
2017-01-01
Background/Aims: The stepped-wedge cluster randomised trial design has received substantial attention in recent years. Although various extensions to the original design have been proposed, no guidance is available on the design of stepped-wedge cluster randomised trials with interim analyses. In an individually randomised trial setting, group sequential methods can provide notable efficiency gains and ethical benefits. We address this by discussing how established group sequential methodology can be adapted for stepped-wedge designs. Methods: Utilising the error spending approach to group sequential trial design, we detail the assumptions required for the determination of stepped-wedge cluster randomised trials with interim analyses. We consider early stopping for efficacy, futility, or efficacy and futility. We describe first how this can be done for any specified linear mixed model for data analysis. We then focus on one particular commonly utilised model and, using a recently completed stepped-wedge cluster randomised trial, compare the performance of several designs with interim analyses to the classical stepped-wedge design. Finally, the performance of a quantile substitution procedure for dealing with the case of unknown variance is explored. Results: We demonstrate that the incorporation of early stopping in stepped-wedge cluster randomised trial designs could reduce the expected sample size under the null and alternative hypotheses by up to 31% and 22%, respectively, with no cost to the trial’s type-I and type-II error rates. The use of restricted error maximum likelihood estimation was found to be more important than quantile substitution for controlling the type-I error rate. Conclusion: The addition of interim analyses into stepped-wedge cluster randomised trials could help guard against time-consuming trials conducted on poor performing treatments and also help expedite the implementation of efficacious treatments. 
In future, trialists should consider incorporating early stopping of some kind into stepped-wedge cluster randomised trials according to the needs of the particular trial. PMID:28653550
Group sequential designs for stepped-wedge cluster randomised trials.
Grayling, Michael J; Wason, James Ms; Mander, Adrian P
2017-10-01
The stepped-wedge cluster randomised trial design has received substantial attention in recent years. Although various extensions to the original design have been proposed, no guidance is available on the design of stepped-wedge cluster randomised trials with interim analyses. In an individually randomised trial setting, group sequential methods can provide notable efficiency gains and ethical benefits. We address this by discussing how established group sequential methodology can be adapted for stepped-wedge designs. Utilising the error spending approach to group sequential trial design, we detail the assumptions required for the determination of stepped-wedge cluster randomised trials with interim analyses. We consider early stopping for efficacy, futility, or efficacy and futility. We describe first how this can be done for any specified linear mixed model for data analysis. We then focus on one particular commonly utilised model and, using a recently completed stepped-wedge cluster randomised trial, compare the performance of several designs with interim analyses to the classical stepped-wedge design. Finally, the performance of a quantile substitution procedure for dealing with the case of unknown variance is explored. We demonstrate that the incorporation of early stopping in stepped-wedge cluster randomised trial designs could reduce the expected sample size under the null and alternative hypotheses by up to 31% and 22%, respectively, with no cost to the trial's type-I and type-II error rates. The use of restricted error maximum likelihood estimation was found to be more important than quantile substitution for controlling the type-I error rate. The addition of interim analyses into stepped-wedge cluster randomised trials could help guard against time-consuming trials conducted on poor performing treatments and also help expedite the implementation of efficacious treatments. 
In future, trialists should consider incorporating early stopping of some kind into stepped-wedge cluster randomised trials according to the needs of the particular trial.
Common factor analysis versus principal component analysis: choice for symptom cluster research.
Kim, Hee-Ju
2008-03-01
The purpose of this paper is to examine differences between two factor analytical methods and their relevance for symptom cluster research: common factor analysis (CFA) versus principal component analysis (PCA). Literature was critically reviewed to elucidate the differences between CFA and PCA. A secondary analysis (N = 84) was utilized to show the actual result differences from the two methods. CFA analyzes only the reliable common variance of data, while PCA analyzes all the variance of data. An underlying hypothetical process or construct is involved in CFA but not in PCA. PCA tends to increase factor loadings especially in a study with a small number of variables and/or low estimated communality. Thus, PCA is not appropriate for examining the structure of data. If the study purpose is to explain correlations among variables and to examine the structure of the data (this is usual for most cases in symptom cluster research), CFA provides a more accurate result. If the purpose of a study is to summarize data with a smaller number of variables, PCA is the choice. PCA can also be used as an initial step in CFA because it provides information regarding the maximum number and nature of factors. In using factor analysis for symptom cluster research, several issues need to be considered, including subjectivity of solution, sample size, symptom selection, and level of measure.
User Activity Recognition in Smart Homes Using Pattern Clustering Applied to Temporal ANN Algorithm.
Bourobou, Serge Thomas Mickala; Yoo, Younghwan
2015-05-21
This paper discusses the possibility of recognizing and predicting user activities in the IoT (Internet of Things) based smart environment. The activity recognition is usually done through two steps: activity pattern clustering and activity type decision. Although many related works have been suggested, they had some limited performance because they focused only on one part between the two steps. This paper tries to find the best combination of a pattern clustering method and an activity decision algorithm among various existing works. For the first step, in order to classify so varied and complex user activities, we use a relevant and efficient unsupervised learning method called the K-pattern clustering algorithm. In the second step, the training of smart environment for recognizing and predicting user activities inside his/her personal space is done by utilizing the artificial neural network based on the Allen's temporal relations. The experimental results show that our combined method provides the higher recognition accuracy for various activities, as compared with other data mining classification algorithms. Furthermore, it is more appropriate for a dynamic environment like an IoT based smart home.
Kasza, J; Hemming, K; Hooper, R; Matthews, Jns; Forbes, A B
2017-01-01
Stepped wedge and cluster randomised crossover trials are examples of cluster randomised designs conducted over multiple time periods that are being used with increasing frequency in health research. Recent systematic reviews of both of these designs indicate that the within-cluster correlation is typically taken account of in the analysis of data using a random intercept mixed model, implying a constant correlation between any two individuals in the same cluster no matter how far apart in time they are measured: within-period and between-period intra-cluster correlations are assumed to be identical. Recently proposed extensions allow the within- and between-period intra-cluster correlations to differ, although these methods require that all between-period intra-cluster correlations are identical, which may not be appropriate in all situations. Motivated by a proposed intensive care cluster randomised trial, we propose an alternative correlation structure for repeated cross-sectional multiple-period cluster randomised trials in which the between-period intra-cluster correlation is allowed to decay depending on the distance between measurements. We present results for the variance of treatment effect estimators for varying amounts of decay, investigating the consequences of the variation in decay on sample size planning for stepped wedge, cluster crossover and multiple-period parallel-arm cluster randomised trials. We also investigate the impact of assuming constant between-period intra-cluster correlations instead of decaying between-period intra-cluster correlations. Our results indicate that in certain design configurations, including the one corresponding to the proposed trial, a correlation decay can have an important impact on variances of treatment effect estimators, and hence on sample size and power. An R Shiny app allows readers to interactively explore the impact of correlation decay.
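The decaying correlation structure described in this abstract can be illustrated with a small sketch. It assumes one simple form of decay, in which the intra-cluster correlation between periods s and t equals a within-period correlation multiplied by a per-period decay factor raised to the power |s - t|. The function name `icc_matrix` and the parameters `rho` and `decay` are hypothetical names chosen for illustration; they are not taken from the paper or from its R Shiny app.

```python
def icc_matrix(n_periods, rho, decay):
    """Between-period intra-cluster correlations under an assumed
    exponential decay: entry (s, t) is rho * decay ** |s - t|.

    rho   -- within-period intra-cluster correlation (the diagonal)
    decay -- assumed per-period decay factor, between 0 and 1;
             decay = 1 gives constant correlation across all periods
    """
    return [
        [rho * decay ** abs(s - t) for t in range(n_periods)]
        for s in range(n_periods)
    ]


if __name__ == "__main__":
    # Print the structure for 4 periods with illustrative values.
    for row in icc_matrix(4, rho=0.05, decay=0.8):
        print(["%.4f" % value for value in row])
```

Setting `decay` to 1 makes every entry equal to `rho`, which corresponds to the random intercept assumption that within-period and between-period correlations are identical.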
Determining the Number of Clusters in a Data Set Without Graphical Interpretation
NASA Technical Reports Server (NTRS)
Aguirre, Nathan S.; Davies, Misty D.
2011-01-01
Cluster analysis is a data mining technique that is meant to simplify the process of classifying data points. The basic clustering process requires an input of data points and the number of clusters wanted. The clustering algorithm will then pick starting C points for the clusters, which can be either random spatial points or random data points. It then assigns each data point to the nearest C point, where "nearest" usually means Euclidean distance, but some algorithms use another criterion. The next step is determining whether the clustering arrangement thus found is within a certain tolerance. If it falls within this tolerance, the process ends. Otherwise the C points are adjusted based on how many data points are in each cluster, and the steps repeat until the algorithm converges.
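The loop described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the report's own implementation: it assumes the standard k-means (Lloyd) update, in which each C point moves to the mean of the data points assigned to it, starts from randomly chosen data points, and uses Euclidean distance. The function name `kmeans` and the parameters `tol`, `max_iter` and `seed` are hypothetical.

```python
import math
import random


def kmeans(points, k, tol=1e-6, max_iter=100, seed=0):
    """Lloyd-style k-means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    # Pick k data points at random as the starting centers ("C points").
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center
        # (Euclidean distance).
        labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(
                    tuple(sum(c) / len(members) for c in zip(*members))
                )
            else:
                # An empty cluster keeps its previous center.
                new_centers.append(centers[j])
        # Tolerance check: stop once no center moved more than tol.
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift <= tol:
            break
    return centers, labels
```

Calling `kmeans(points, 2)` on a list of 2-D tuples returns the final centers and one cluster label per point. The number of clusters `k` still has to be supplied by the caller.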
Carbon Fibers Conductivity Studies
NASA Technical Reports Server (NTRS)
Yang, C. Y.; Butkus, A. M.
1980-01-01
In an attempt to understand the process of electrical conduction in polyacrylonitrile (PAN)-based carbon fibers, calculations were carried out on cluster models of the fiber consisting of carbon, nitrogen, and hydrogen atoms using the modified intermediate neglect of differential overlap (MINDO) molecular orbital (MO) method. The models were developed based on the assumption that PAN carbon fibers obtained with heat treatment temperatures (HTT) below 1000 C retain nitrogen in a graphite-like lattice. For clusters modeling an edge nitrogen site, analysis of the occupied MO's indicated an electron distribution similar to that of graphite. A similar analysis for the somewhat less stable interior nitrogen site revealed a partially localized π electron distribution around the nitrogen atom. The differences in bonding trends and structural stability between edge and interior nitrogen clusters led to a two-step process proposed for nitrogen evolution with increasing HTT.
Snell, Deborah L; Surgenor, Lois J; Hay-Smith, E Jean C; Williman, Jonathan; Siegert, Richard J
2015-01-01
Outcomes after mild traumatic brain injury (MTBI) vary, with slow or incomplete recovery for a significant minority. This study examines whether groups of cases with shared psychological factors but with different injury outcomes could be identified using cluster analysis. This is a prospective observational study following 147 adults presenting to a hospital-based emergency department or concussion services in Christchurch, New Zealand. This study examined associations between baseline demographic, clinical, psychological variables (distress, injury beliefs and symptom burden) and outcome 6 months later. A two-step approach to cluster analysis was applied (Ward's method to identify clusters, K-means to refine results). Three meaningful clusters emerged (high-adapters, medium-adapters, low-adapters). Baseline cluster-group membership was significantly associated with outcomes over time. High-adapters appeared recovered by 6-weeks and medium-adapters revealed improvements by 6-months. The low-adapters continued to endorse many symptoms, negative recovery expectations and distress, being significantly at risk for poor outcome more than 6-months after injury (OR (good outcome) = 0.12; CI = 0.03-0.53; p < 0.01). Cluster analysis supported the notion that groups could be identified early post-injury based on psychological factors, with group membership associated with differing outcomes over time. Implications for clinical care providers regarding therapy targets and cases that may benefit from different intensities of intervention are discussed.
El Ansari, Walid; Ssewanyana, Derrick; Stock, Christiane
2018-01-01
Limited research has explored clustering of lifestyle behavioral risk factors (BRFs) among university students. This study aimed to explore clustering of BRFs, composition of clusters, and the association of the clusters with self-rated health and perceived academic performance. We assessed BRFs, namely tobacco smoking, physical inactivity, alcohol consumption, illicit drug use, unhealthy nutrition, and inadequate sleep, using a self-administered general Student Health Survey among 3,706 undergraduates at seven UK universities. A two-step cluster analysis generated: Cluster 1 (the high physically active and health conscious) with very high health awareness/consciousness, good nutrition, and physical activity (PA), and relatively low alcohol, tobacco, and other drug (ATOD) use. Cluster 2 (the abstinent) had very low ATOD use, high health awareness, good nutrition, and medium high PA. Cluster 3 (the moderately health conscious) included the highest regard for healthy eating, second highest fruit/vegetable consumption, and moderately high ATOD use. Cluster 4 (the risk taking) showed the highest ATOD use, were the least health conscious, least fruit consuming, and attached the least importance on eating healthy. Compared to the healthy cluster (Cluster 1), students in other clusters had lower self-rated health, and particularly, students in the risk taking cluster (Cluster 4) reported lower academic performance. These associations were stronger for men than for women. Of the four clusters, Cluster 4 had the youngest students. Our results suggested that prevention among university students should address multiple BRFs simultaneously, with particular focus on the younger students.
Determining the Optimal Number of Clusters with the Clustergram
NASA Technical Reports Server (NTRS)
Fluegemann, Joseph K.; Davies, Misty D.; Aguirre, Nathan D.
2011-01-01
Cluster analysis aids research in many different fields, from business to biology to aerospace. It consists of using statistical techniques to group objects in large sets of data into meaningful classes. However, this process of ordering data points presents much uncertainty because it involves several steps, many of which are subject to researcher judgment as well as inconsistencies depending on the specific data type and research goals. These steps include the method used to cluster the data, the variables on which the cluster analysis will be operating, the number of resulting clusters, and parts of the interpretation process. In most cases, the number of clusters must be guessed or estimated before employing the clustering method. Many remedies have been proposed, but none is unassailable and certainly not for all data types. Thus, the aim of current research for better techniques of determining the number of clusters is generally confined to demonstrating that the new technique excels other methods in performance for several disparate data types. Our research makes use of a new cluster-number-determination technique based on the clustergram: a graph that shows how the number of objects in the cluster and the cluster mean (the ordinate) change with the number of clusters (the abscissa). We use the features of the clustergram to make the best determination of the cluster-number.
Rajab, Maher I
2011-11-01
Since the introduction of epiluminescence microscopy (ELM), image analysis tools have been extended to the field of dermatology, in an attempt to algorithmically reproduce clinical evaluation. Accurate image segmentation of skin lesions is one of the key steps for useful, early and non-invasive diagnosis of cutaneous melanomas. This paper proposes two image segmentation algorithms based on frequency domain processing and k-means clustering/fuzzy k-means clustering. The two methods are capable of segmenting and extracting the true border that reveals the global structure irregularity (indentations and protrusions), which may suggest excessive cell growth or regression of a melanoma. As a pre-processing step, Fourier low-pass filtering is applied to reduce the surrounding noise in a skin lesion image. A quantitative comparison of the techniques is enabled by the use of synthetic skin lesion images that model lesions covered with hair to which Gaussian noise is added. The proposed techniques are also compared with an established optimal-based thresholding skin-segmentation method. It is demonstrated that for lesions with a range of different border irregularity properties, the k-means clustering and fuzzy k-means clustering segmentation methods provide the best performance over a range of signal to noise ratios. The proposed segmentation techniques are also demonstrated to have similar performance when tested on real skin lesions representing high-resolution ELM images. This study suggests that the segmentation results obtained using a combination of low-pass frequency filtering and k-means or fuzzy k-means clustering are superior to the result that would be obtained by using k-means or fuzzy k-means clustering segmentation methods alone.
Pellegrini, Michael; Zoghi, Maryam; Jaberzadeh, Shapour
2018-01-12
Cluster analysis and other subgrouping techniques have risen in popularity in recent years in non-invasive brain stimulation research in the attempt to investigate the issue of inter-individual variability - the issue of why some individuals respond, as traditionally expected, to non-invasive brain stimulation protocols and others do not. Cluster analysis and subgrouping techniques have been used to categorise individuals, based on their response patterns, as responders or non-responders. There is, however, a lack of consensus and consistency on the most appropriate technique to use. This systematic review aimed to provide a systematic summary of the cluster analysis and subgrouping techniques used to date and suggest recommendations moving forward. Twenty studies were included that utilised subgrouping techniques, while seven of these additionally utilised cluster analysis techniques. The results of this systematic review appear to indicate that statistical cluster analysis techniques are effective in identifying subgroups of individuals based on response patterns to non-invasive brain stimulation. This systematic review also reports a lack of consensus amongst researchers on the most effective subgrouping technique and the criteria used to determine whether an individual is categorised as a responder or a non-responder. This systematic review provides a step-by-step guide to carrying out statistical cluster analyses and subgrouping techniques to provide a framework for analysis when developing further insights into the contributing factors of inter-individual variability in response to non-invasive brain stimulation.
User Activity Recognition in Smart Homes Using Pattern Clustering Applied to Temporal ANN Algorithm
Bourobou, Serge Thomas Mickala; Yoo, Younghwan
2015-01-01
This paper discusses the possibility of recognizing and predicting user activities in the IoT (Internet of Things) based smart environment. The activity recognition is usually done through two steps: activity pattern clustering and activity type decision. Although many related works have been suggested, they had some limited performance because they focused only on one part between the two steps. This paper tries to find the best combination of a pattern clustering method and an activity decision algorithm among various existing works. For the first step, in order to classify so varied and complex user activities, we use a relevant and efficient unsupervised learning method called the K-pattern clustering algorithm. In the second step, the training of smart environment for recognizing and predicting user activities inside his/her personal space is done by utilizing the artificial neural network based on the Allen’s temporal relations. The experimental results show that our combined method provides the higher recognition accuracy for various activities, as compared with other data mining classification algorithms. Furthermore, it is more appropriate for a dynamic environment like an IoT based smart home. PMID:26007738
Subgroup Analysis in Burnout: Relations Between Fatigue, Anxiety, and Depression
van Dam, Arno
2016-01-01
Several authors have suggested that burned out patients do not form a homogeneous group and that subgroups should be considered. The identification of these subgroups may contribute to a better understanding of the burnout construct and lead to more specific therapeutic interventions. Subgroup analysis may also help clarify whether burnout is a distinct entity and whether subgroups of burnout overlap with other disorders such as depression and chronic fatigue syndrome. In a group of 113 clinically diagnosed burned out patients, levels of fatigue, depression, and anxiety were assessed. In order to identify possible subgroups, we performed a two-step cluster analysis. The analysis revealed two clusters that differed from one another in terms of symptom severity on the three aforementioned measures. Depression appeared to be the strongest predictor of group membership. These results are considered in the light of the scientific debate on whether burnout can be distinguished from depression and whether burnout subtyping is useful. Finally, implications for clinical practice and future research are discussed. PMID:26869983
A marker-free system for the analysis of movement disabilities.
Legrand, L; Marzani, F; Dusserre, L
1998-01-01
A major step toward improving the treatments of disabled persons may be achieved by using motion analysis equipment. We are developing such a system. It allows the analysis of plane human motion (e.g. gait) without using the tracking of markers. The system is composed of one fixed camera which acquires an image sequence of a human in motion. Then the treatment is divided into two steps: first, a large number of pixels belonging to the boundaries of the human body are extracted at each acquisition time. Secondly, a two-dimensional model of the human body, based on tapered superquadrics, is successively matched with the sets of pixels previously extracted; a specific fuzzy clustering process is used for this purpose. Moreover, an optical flow procedure gives a prediction of the model location at each acquisition time from its location at the previous time. Finally we present some results of this process applied to a leg in motion.
Dogra, Vivek; Bagler, Ganesh; Sreenivasulu, Yelam
2015-01-01
Podophyllum hexandrum Royle is an important high-altitude plant of Himalayas with immense medicinal value. Earlier, it was reported that the cell wall hydrolases were up accumulated during radicle protrusion step of Podophyllum seed germination. In the present study, Podophyllum seed Germination protein interaction Network (PGN) was constructed by using the differentially accumulated protein (DAP) data set of Podophyllum during the radicle protrusion step of seed germination, with reference to Arabidopsis protein–protein interaction network (AtPIN). The developed PGN is comprised of a giant cluster with 1028 proteins having 10,519 interactions and a few small clusters with relevant gene ontological signatures. In this analysis, a germination pathway related cluster which is also central to the topology and information dynamics of PGN was obtained with a set of 60 key proteins. Among these, eight proteins which are known to be involved in signaling, metabolism, protein modification, cell wall modification, and cell cycle regulation processes were found commonly highlighted in both the proteomic and interactome analysis. The systems-level analysis of PGN identified the key proteins involved in radicle protrusion step of seed germination in Podophyllum. PMID:26579141
ERIC Educational Resources Information Center
Lake County Area Vocational Center, Grayslake, IL.
This document contains a task analysis for health occupations (professional nurse) in the nursing cluster. For each task listed, occupation, duty area, performance standard, steps, knowledge, attitudes, safety, equipment/supplies, source of analysis, and Illinois state goals for learning are listed. For the duty area of "providing therapeutic…
ERIC Educational Resources Information Center
Lake County Area Vocational Center, Grayslake, IL.
This document contains a task analysis for health occupations (home health aid) in the nursing cluster. For each task listed, occupation, duty area, performance standard, steps, knowledge, attitudes, safety, equipment/supplies, source of analysis, and Illinois state goals for learning are listed. For the duty area of "providing therapeutic…
Spatial Correlation of Solar-Wind Turbulence from Two-Point Measurements
NASA Technical Reports Server (NTRS)
Matthaeus, W. H.; Milano, L. J.; Dasso, S.; Weygand, J. M.; Smith, C. W.; Kivelson, M. G.
2005-01-01
Interplanetary turbulence, the best studied case of low frequency plasma turbulence, is the only directly quantified instance of astrophysical turbulence. Here, magnetic field correlation analysis, using for the first time only proper two-point, single time measurements, provides a key step in unraveling the space-time structure of interplanetary turbulence. Simultaneous magnetic field data from the Wind, ACE, and Cluster spacecraft are analyzed to determine the correlation (outer) scale, and the Taylor microscale near Earth's orbit.
Identifying and Assessing Interesting Subgroups in a Heterogeneous Population.
Lee, Woojoo; Alexeyenko, Andrey; Pernemalm, Maria; Guegan, Justine; Dessen, Philippe; Lazar, Vladimir; Lehtiö, Janne; Pawitan, Yudi
2015-01-01
Biological heterogeneity is common in many diseases and it is often the reason for therapeutic failures. Thus, there is great interest in classifying a disease into subtypes that have clinical significance in terms of prognosis or therapy response. One of the most popular methods to uncover unrecognized subtypes is cluster analysis. However, classical clustering methods such as k-means clustering or hierarchical clustering are not guaranteed to produce clinically interesting subtypes. This could be because the main statistical variability--the basis of cluster generation--is dominated by genes not associated with the clinical phenotype of interest. Furthermore, a strong prognostic factor might be relevant for a certain subgroup but not for the whole population; thus an analysis of the whole sample may not reveal this prognostic factor. To address these problems we investigate methods to identify and assess clinically interesting subgroups in a heterogeneous population. The identification step uses a clustering algorithm and to assess significance we use a false discovery rate- (FDR-) based measure. Under the heterogeneity condition the standard FDR estimate is shown to overestimate the true FDR value, but this is remedied by an improved FDR estimation procedure. As illustrations, two real data examples from gene expression studies of lung cancer are provided.
Ko, Yi-An; Mukherjee, Bhramar; Smith, Jennifer A; Kardia, Sharon L R; Allison, Matthew; Diez Roux, Ana V
2016-11-01
There has been an increased interest in identifying gene-environment interaction (G × E) in the context of multiple environmental exposures. Most G × E studies analyze one exposure at a time, but we are exposed to multiple exposures in reality. Efficient analysis strategies for complex G × E with multiple environmental factors in a single model are still lacking. Using the data from the Multiethnic Study of Atherosclerosis, we illustrate a two-step approach for modeling G × E with multiple environmental factors. First, we utilize common clustering and classification strategies (e.g., k-means, latent class analysis, classification and regression trees, Bayesian clustering using Dirichlet Process) to define subgroups corresponding to distinct environmental exposure profiles. Second, we illustrate the use of an additive main effects and multiplicative interaction model, instead of the conventional saturated interaction model using product terms of factors, to study G × E with the data-driven exposure subgroups defined in the first step. We demonstrate useful analytical approaches to translate multiple environmental exposures into one summary class. These tools not only allow researchers to consider several environmental exposures in G × E analysis but also provide some insight into how genes modify the effect of a comprehensive exposure profile instead of examining effect modification for each exposure in isolation.
Unequal cluster sizes in stepped-wedge cluster randomised trials: a systematic review
Morris, Tom; Gray, Laura
2017-01-01
Objectives To investigate the extent to which cluster sizes vary in stepped-wedge cluster randomised trials (SW-CRT) and whether any variability is accounted for during the sample size calculation and analysis of these trials. Setting Any, not limited to healthcare settings. Participants Any taking part in an SW-CRT published up to March 2016. Primary and secondary outcome measures The primary outcome is the variability in cluster sizes, measured by the coefficient of variation (CV) in cluster size. Secondary outcomes include the difference between the cluster sizes assumed during the sample size calculation and those observed during the trial, any reported variability in cluster sizes and whether the methods of sample size calculation and methods of analysis accounted for any variability in cluster sizes. Results Of the 101 included SW-CRTs, 48% mentioned that the included clusters were known to vary in size, yet only 13% of these accounted for this during the calculation of the sample size. However, 69% of the trials did use a method of analysis appropriate for when clusters vary in size. Full trial reports were available for 53 trials. The CV was calculated for 23 of these: the median CV was 0.41 (IQR: 0.22–0.52). Actual cluster sizes could be compared with those assumed during the sample size calculation for 14 (26%) of the trial reports; the cluster sizes were between 29% and 480% of that which had been assumed. Conclusions Cluster sizes often vary in SW-CRTs. Reporting of SW-CRTs also remains suboptimal. The effect of unequal cluster sizes on the statistical power of SW-CRTs needs further exploration and methods appropriate to studies with unequal cluster sizes need to be employed. PMID:29146637
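The primary outcome above, the coefficient of variation (CV) in cluster size, is the standard deviation of the cluster sizes divided by their mean. A minimal sketch follows, assuming the sample standard deviation (n - 1 denominator); the review does not state which form was used, and the cluster sizes below are hypothetical, not taken from any included trial.

```python
import statistics


def coefficient_of_variation(cluster_sizes):
    """Return SD / mean of the cluster sizes.

    Assumption: sample standard deviation (n - 1 denominator).
    """
    mean = statistics.mean(cluster_sizes)
    sd = statistics.stdev(cluster_sizes)
    return sd / mean


# Hypothetical cluster sizes for a five-cluster trial.
sizes = [20, 35, 50, 80, 65]
print(f"CV = {coefficient_of_variation(sizes):.2f}")
```

A CV of 0 means all clusters are the same size; larger values indicate greater imbalance.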
Learner Typologies Development Using OIndex and Data Mining Based Clustering Techniques
ERIC Educational Resources Information Center
Luan, Jing
2004-01-01
This explorative data mining project used a distance-based clustering algorithm to study 3 indicators, called OIndex, of student behavioral data and stabilized at a 6-cluster scenario following an exhaustive explorative study of 4, 5, and 6 cluster scenarios produced by K-Means and TwoStep algorithms. Using principles in data mining, the study…
NASA Technical Reports Server (NTRS)
Smedes, H. W.; Linnerud, H. J.; Woolaver, L. B.; Su, M. Y.; Jayroe, R. R.
1972-01-01
Two clustering techniques were used for terrain mapping by computer of test sites in Yellowstone National Park. One test was made with multispectral scanner data using a composite technique which consists of (1) a strictly sequential statistical clustering, which is a sequential variance analysis, and (2) a generalized K-means clustering. In this composite technique, the output of (1) is a first approximation of the cluster centers. This is the input to (2), which consists of steps to improve the determination of cluster centers by iterative procedures. Another test was made using the three emulsion layers of color-infrared aerial film as a three-band spectrometer. Relative film densities were analyzed using a simple clustering technique in three-color space. Important advantages of the clustering technique over conventional supervised computer programs are that (1) human intervention, preparation time, and manipulation of data are reduced, (2) the computer map gives an unbiased indication of where best to select the reference ground control data, (3) easy-to-obtain, inexpensive film can be used, and (4) the geometric distortions can be easily rectified by simple standard photogrammetric techniques.
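The composite technique described above (a sequential pass whose output seeds an iterative K-means refinement) can be sketched as follows. This is an illustration only: the abstract does not specify the sequential variance analysis, so a simple leader-style pass with a hypothetical `radius` threshold stands in for it, and the pixel values are invented.

```python
import math


def sequential_pass(points, radius):
    """Single sweep: open a new cluster when a point is farther than
    `radius` from every existing center, else update the nearest center
    as a running mean. Stand-in for the unspecified sequential step."""
    centers, counts = [], []
    for p in points:
        if centers:
            dists = [math.dist(p, c) for c in centers]
            j = dists.index(min(dists))
            if dists[j] <= radius:
                counts[j] += 1
                centers[j] = [c + (x - c) / counts[j]
                              for c, x in zip(centers[j], p)]
                continue
        centers.append([float(x) for x in p])
        counts.append(1)
    return centers


def nearest(p, centers):
    """Index of the center closest to point p."""
    dists = [math.dist(p, c) for c in centers]
    return dists.index(min(dists))


def kmeans_refine(points, centers, n_iter=20):
    """Iteratively reassign points and recompute centers (Lloyd's steps)."""
    centers = [list(c) for c in centers]
    for _ in range(n_iter):
        labels = [nearest(p, centers) for p in points]
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # an empty cluster keeps its previous center
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    labels = [nearest(p, centers) for p in points]
    return centers, labels


# Hypothetical two-band pixel values forming two obvious groups.
pixels = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
initial = sequential_pass(pixels, radius=3.0)
centers, labels = kmeans_refine(pixels, initial)
```

Seeding K-means from a first pass removes the need to fix the number of clusters in advance; here that number follows from the assumed threshold.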
Typology of people with first-episode psychosis.
Subramaniam, Mythily; Zheng, Huili; Soh, Pauline; Poon, Lye Yin; Vaingankar, Janhavi A; Chong, Siow Ann; Verma, Swapna
2016-08-01
The aim of the current study was to create a typology of patients with first-episode psychosis based on sociodemographic and clinical characteristics, service use and outcomes using cluster analysis. Data from all respondents who were accepted into the Early Psychosis Intervention Programme (EPIP), Singapore from 2007 to 2011 were analysed. A two-step clustering method was carried out to classify the patients into distinct clusters. Two clusters were identified. Cluster 1 consisted largely of younger people with a mean age of 25.5 (6.0) years at treatment contact, who were predominantly male (55.3%), single (98.3%) and living with parents (86.3%). Cluster 1 had a higher proportion of people diagnosed with a schizophrenia spectrum disorder (71.4%) and with a positive family history of psychiatric illness. Patients in cluster 2 were generally older with a mean age of 33.6 (4.7) years and the majority were women (74.2%). People in cluster 1 had higher Positive and Negative Syndrome Scale (PANSS) scores at baseline than those in cluster 2. After a 1-year follow up, their scores were still poorer than those of their counterparts in cluster 2, especially for the PANSS negative score. The functioning level of people in cluster 1 showed less improvement than that of people in cluster 2 after a year of treatment. There is a compelling need to develop new therapies and intensively treat young people presenting with psychosis as this group tends to have poorer outcomes even after 1 year of treatment. © 2014 Wiley Publishing Asia Pty Ltd.
Identification of microRNA-mRNA modules using microarray data.
Jayaswal, Vivek; Lutherborrow, Mark; Ma, David D F; Yang, Yee H
2011-03-06
MicroRNAs (miRNAs) are post-transcriptional regulators of mRNA expression and are involved in numerous cellular processes. Consequently, miRNAs are an important component of gene regulatory networks and an improved understanding of miRNAs will further our knowledge of these networks. There is a many-to-many relationship between miRNAs and mRNAs because a single miRNA targets multiple mRNAs and a single mRNA is targeted by multiple miRNAs. However, most of the current methods for the identification of regulatory miRNAs and their target mRNAs ignore this biological observation and focus on miRNA-mRNA pairs. We propose a two-step method for the identification of many-to-many relationships between miRNAs and mRNAs. In the first step, we obtain miRNA and mRNA clusters using a combination of miRNA-target mRNA prediction algorithms and microarray expression data. In the second step, we determine the associations between miRNA clusters and mRNA clusters based on changes in miRNA and mRNA expression profiles. We consider the miRNA-mRNA clusters with statistically significant associations to be potentially regulatory and, therefore, of biological interest. Our method reduces the interactions between several hundred miRNAs and several thousand mRNAs to a few miRNA-mRNA groups, thereby facilitating a more meaningful biological analysis and a more targeted experimental validation.
A Three-Step Spatial-Temporal Clustering Method for Human Activity Pattern Analysis
NASA Astrophysics Data System (ADS)
Huang, W.; Li, S.; Xu, S.
2016-06-01
How people move in cities and what they do in various locations at different times form human activity patterns. Human activity patterns play a key role in urban planning, traffic forecasting, public health and safety, emergency response, friend recommendation, and so on. Therefore, scholars from different fields, such as social science, geography, transportation, physics and computer science, have made great efforts in modelling and analysing human activity patterns or human mobility patterns. One of the essential tasks in such studies is to find the locations or places where individuals stay to perform some kind of activity before further activity pattern analysis. In the era of Big Data, the emergence of social media along with wearable devices enables human activity data to be collected more easily and efficiently. Furthermore, the dimension of the accessible human activity data has been extended from two to three (space or space-time) to four dimensions (space, time and semantics). More specifically, not only are the locations where people stay and the time they spend there collected, but what people "say" at a location at a given time can also be obtained. The characteristics of these datasets shed new light on the analysis of human mobility, and new methodologies should accordingly be developed to handle them. Traditional methods such as neural networks, statistics and clustering have been applied to study human activity patterns using geosocial media data. Among them, clustering methods have been widely used to analyse spatiotemporal patterns. However, to the best of our knowledge, few clustering algorithms have been specifically developed for handling datasets that contain spatial, temporal and semantic aspects all together. In this work, we propose a three-step human activity clustering method based on space, time and semantics to fill this gap. One year of Twitter data posted in Toronto, Canada, is used to test the clustering-based method. The results show that approximately 55% of the spatiotemporal clusters distributed in different locations can eventually be grouped into the same type of cluster when the semantic aspect is considered.
Comprehensive cluster analysis with Transitivity Clustering.
Wittkop, Tobias; Emig, Dorothea; Truss, Anke; Albrecht, Mario; Böcker, Sebastian; Baumbach, Jan
2011-03-01
Transitivity Clustering is a method for the partitioning of biological data into groups of similar objects, such as genes, for instance. It provides integrated access to various functions addressing each step of a typical cluster analysis. To facilitate this, Transitivity Clustering is accessible online and offers three user-friendly interfaces: a powerful stand-alone version, a web interface, and a collection of Cytoscape plug-ins. In this paper, we describe three major workflows: (i) protein (super)family detection with Cytoscape, (ii) protein homology detection with incomplete gold standards and (iii) clustering of gene expression data. This protocol guides the user through the most important features of Transitivity Clustering and takes ∼1 h to complete.
Clustering on Magnesium Surfaces - Formation and Diffusion Energies.
Chu, Haijian; Huang, Hanchen; Wang, Jian
2017-07-12
The formation and diffusion energies of atomic clusters on Mg surfaces determine the surface roughness and formation of faulted structure, which in turn affect the mechanical deformation of Mg. This paper reports first-principles density functional theory (DFT) based quantum mechanics calculation results of atomic clustering on the low energy surfaces {0001} and [Formula: see text]. In parallel, molecular statics calculations serve to test the validity of two interatomic potentials and to extend the scope of the DFT studies. On a {0001} surface, a compact cluster consisting of fewer than three atoms energetically prefers a face-centered-cubic stacking, to serve as a nucleus of stacking fault. On a [Formula: see text], clusters of any size always prefer hexagonal-close-packed stacking. Adatom diffusion on surface [Formula: see text] is highly anisotropic, while it is isotropic on surface (0001). Three-dimensional Ehrlich-Schwoebel barriers converge when the step height is three atomic layers or thicker. Adatom diffusion along steps is via a hopping mechanism, and that down steps is via an exchange mechanism.
Hierarchical clustering of EMD based interest points for road sign detection
NASA Astrophysics Data System (ADS)
Khan, Jesmin; Bhuiyan, Sharif; Adhami, Reza
2014-04-01
This paper presents an automatic road traffic sign detection and recognition system based on hierarchical clustering of interest points and joint transform correlation. The proposed algorithm consists of the following three stages: interest point detection, clustering of those points and similarity search. At the first stage, highly discriminative, rotation and scale invariant interest points are selected from the image edges based on the 1-D empirical mode decomposition (EMD). We propose a two-step unsupervised clustering technique, which is adaptive and based on two criteria. In this context, the detected points are initially clustered based on the stable local features related to the brightness and color, which are extracted using a Gabor filter. Then points belonging to each partition are reclustered depending on the dispersion of the points in the initial cluster using a position feature. This two-step hierarchical clustering yields the possible candidate road signs or the regions of interest (ROIs). Finally, a fringe-adjusted joint transform correlation (JTC) technique is used for matching the unknown signs with the existing known reference road signs stored in the database. The presented framework provides a novel way to detect a road sign from natural scenes and the results demonstrate the efficacy of the proposed technique, which yields a very low false hit rate.
Cardoza, R. E.; Malmierca, M. G.; Hermosa, M. R.; Alexander, N. J.; McCormick, S. P.; Proctor, R. H.; Tijerino, A. M.; Rumbero, A.; Monte, E.; Gutiérrez, S.
2011-01-01
Trichothecenes are mycotoxins produced by Trichoderma, Fusarium, and at least four other genera in the fungal order Hypocreales. Fusarium has a trichothecene biosynthetic gene (TRI) cluster that encodes transport and regulatory proteins as well as most enzymes required for the formation of the mycotoxins. However, little is known about trichothecene biosynthesis in the other genera. Here, we identify and characterize TRI gene orthologues (tri) in Trichoderma arundinaceum and Trichoderma brevicompactum. Our results indicate that both Trichoderma species have a tri cluster that consists of orthologues of seven genes present in the Fusarium TRI cluster. Organization of genes in the cluster is the same in the two Trichoderma species but differs from the organization in Fusarium. Sequence and functional analysis revealed that the gene (tri5) responsible for the first committed step in trichothecene biosynthesis is located outside the cluster in both Trichoderma species rather than inside the cluster as it is in Fusarium. Heterologous expression analysis revealed that two T. arundinaceum cluster genes (tri4 and tri11) differ in function from their Fusarium orthologues. The Tatri4-encoded enzyme catalyzes only three of the four oxygenation reactions catalyzed by the orthologous enzyme in Fusarium. The Tatri11-encoded enzyme catalyzes a completely different reaction (trichothecene C-4 hydroxylation) than the Fusarium orthologue (trichothecene C-15 hydroxylation). The results of this study indicate that although some characteristics of the tri/TRI cluster have been conserved during evolution of Trichoderma and Fusarium, the cluster has undergone marked changes, including gene loss and/or gain, gene rearrangement, and divergence of gene function. PMID:21642405
Zhang, Shaoliang; Lorenzo, Alberto; Gómez, Miguel-Angel; Mateus, Nuno; Gonçalves, Bruno; Sampaio, Jaime
2018-04-20
The aim of this study was: (i) to group basketball players into similar clusters based on a combination of anthropometric characteristics and playing experience; and (ii) to explore the distribution of players (including starters and non-starters) from different levels of teams within the obtained clusters. The game-related statistics from 699 regular season balanced games were analyzed using a two-step cluster model and a discriminant analysis. The clustering process identified five different player profiles: Top height and weight (HW) with low experience, TopHW-LowE; Middle HW with middle experience, MiddleHW-MiddleE; Middle HW with top experience, MiddleHW-TopE; Low HW with low experience, LowHW-LowE; Low HW with middle experience, LowHW-MiddleE. Discriminant analysis showed that the TopHW-LowE group was distinguished by two-point field goals made and missed, offensive and defensive rebounds, blocks, and personal fouls, whereas the LowHW-LowE group made the fewest passes and touches. The players from weaker teams were mostly distributed in the LowHW-LowE group, whereas players from stronger teams were mainly grouped in the LowHW-MiddleE group, and players that participated in the finals were allocated to the MiddleHW-MiddleE group. These results provide alternative references for basketball staff concerning the process of evaluating performance.
Korkmaz, Selcuk; Zararsiz, Gokmen; Goksuluk, Dincer
2015-01-01
Virtual screening is an important step in the early phase of the drug discovery process. Since there are thousands of compounds, this step should be both fast and effective in order to distinguish drug-like and nondrug-like molecules. Statistical machine learning methods are widely used in drug discovery studies for classification purposes. Here, we aim to develop a new tool that can classify molecules as drug-like or nondrug-like based on various machine learning methods, including discriminant, tree-based, kernel-based, ensemble and other algorithms. To construct this tool, the performances of twenty-three different machine learning algorithms were first compared by ten different measures; then, the ten best performing algorithms were selected based on principal component and hierarchical cluster analysis results. Besides classification, this application can also create a heat map and dendrogram for visual inspection of the molecules through hierarchical cluster analysis. Moreover, users can connect to the PubChem database to download molecular information and to create two-dimensional structures of compounds. This application is freely available through www.biosoft.hacettepe.edu.tr/MLViS/. PMID:25928885
NASA Astrophysics Data System (ADS)
Lu, Xin-Ming
Shallow junction formation made by low energy ion implantation and rapid thermal annealing is facing a major challenge for ULSI (ultra large scale integration) as the line width decreases down to the sub micrometer region. The issues include low beam current, the channeling effect in low energy ion implantation and TED (transient enhanced diffusion) during annealing after ion implantation. In this work, boron containing small cluster ions, such as GeB, SiB and SiB2, were generated by using the SNICS (source of negative ions by cesium sputtering) ion source to implant into Si substrates to form shallow junctions. The use of boron containing cluster ions effectively reduces the boron energy while keeping the energy of the cluster ion beam at a high level. At the same time, it reduces the channeling effect due to amorphization by co-implanted heavy atoms like Ge and Si. Cluster ions have been used to produce 0.65-2 keV boron for low energy ion implantation. Two stage annealing, which is a combination of low temperature (550°C) preannealing and high temperature annealing (1000°C), was carried out to anneal the Si sample implanted by GeB, SiBn clusters. The key concept of two-step annealing, that is, the separation of crystal regrowth, point defects removal with dopant activation from dopant diffusion, is discussed in detail. The advantages of the two stage annealing include better lattice structure, better dopant activation and retarded boron diffusion. The junction depth of the two stage annealed GeB sample was only half that of the one-step annealed sample, indicating that TED was suppressed by two stage annealing. Junction depths as small as 30 nm have been achieved by two stage annealing of sample implanted with 5 x 10-4/cm2 of 5 keV GeB at 1000°C for 1 second. The samples were evaluated by SIMS (secondary ion mass spectrometry) profiling, TEM (transmission electron microscopy) and RBS (Rutherford Backscattering Spectrometry)/channeling. Cluster ion implantation in combination with two-step annealing is effective in fabricating ultra-shallow junctions.
An effective fuzzy kernel clustering analysis approach for gene expression data.
Sun, Lin; Xu, Jiucheng; Yin, Jiaojiao
2015-01-01
Fuzzy clustering is an important tool for analyzing microarray data. A major problem in applying fuzzy clustering method to microarray gene expression data is the choice of parameters with cluster number and centers. This paper proposes a new approach to fuzzy kernel clustering analysis (FKCA) that identifies desired cluster number and obtains more steady results for gene expression data. First of all, to optimize characteristic differences and estimate optimal cluster number, Gaussian kernel function is introduced to improve spectrum analysis method (SAM). By combining subtractive clustering with max-min distance mean, maximum distance method (MDM) is proposed to determine cluster centers. Then, the corresponding steps of improved SAM (ISAM) and MDM are given respectively, whose superiority and stability are illustrated through performing experimental comparisons on gene expression data. Finally, by introducing ISAM and MDM into FKCA, an effective improved FKCA algorithm is proposed. Experimental results from public gene expression data and UCI database show that the proposed algorithms are feasible for cluster analysis, and the clustering accuracy is higher than the other related clustering algorithms.
Taljaard, Monica; Hemming, Karla; Shah, Lena; Giraudeau, Bruno; Grimshaw, Jeremy M; Weijer, Charles
2017-08-01
Background/aims The use of the stepped wedge cluster randomized design is rapidly increasing. This design is commonly used to evaluate health policy and service delivery interventions. Stepped wedge cluster randomized trials have unique characteristics that complicate their ethical interpretation. The 2012 Ottawa Statement provides comprehensive guidance on the ethical design and conduct of cluster randomized trials, and the 2010 CONSORT extension for cluster randomized trials provides guidelines for reporting. Our aims were to assess the adequacy of the ethical conduct and reporting of stepped wedge trials to date, focusing on research ethics review and informed consent. Methods We conducted a systematic review of stepped wedge cluster randomized trials in health research published up to 2014 in English language journals. We extracted details of study intervention and data collection procedures, as well as reporting of research ethics review and informed consent. Two reviewers independently extracted data from each trial; discrepancies were resolved through discussion. We identified the presence of any research participants at the cluster level and the individual level. We assessed ethical conduct by tabulating reporting of research ethics review and informed consent against the presence of research participants. Results Of 32 identified stepped wedge trials, only 24 (75%) reported review by a research ethics committee, and only 16 (50%) reported informed consent from any research participants; yet all trials included research participants at some level. In the subgroup of 20 trials with research participants at cluster level, only 4 (20%) reported informed consent from such participants; in 26 trials with individual-level research participants, only 15 (58%) reported their informed consent. Interventions (regardless of whether targeting cluster- or individual-level participants) were delivered at the group level in more than two-thirds of trials; nine trials (28%) had no identifiable data collected from any research participants. Overall, only three trials (9%) indicated that a waiver of consent had been granted by a research ethics committee. When considering the combined requirement of research ethics review and informed consent (or a waiver), only one in three studies were compliant. Conclusion The ethical conduct and reporting of key ethical protections in stepped wedge trials, namely, research ethics review and informed consent, are inadequate. We recommend that stepped wedge trials be classified as research and reviewed and approved by a research ethics committee. We also recommend that researchers appropriately identify research participants (which may include health professionals), seek informed consent or appeal to an ethics committee for a waiver of consent, and include explicit details of research ethics approval and informed consent in the trial report.
Applying Machine Learning to Star Cluster Classification
NASA Astrophysics Data System (ADS)
Fedorenko, Kristina; Grasha, Kathryn; Calzetti, Daniela; Mahadevan, Sridhar
2016-01-01
Catalogs describing populations of star clusters are essential in investigating a range of important issues, from star formation to galaxy evolution. Star cluster catalogs are typically created in a two-step process: in the first step, a catalog of sources is automatically produced; in the second step, each of the extracted sources is visually inspected by 3-to-5 human classifiers and assigned a category. Classification by humans is labor-intensive and time-consuming; thus it creates a bottleneck and substantially slows down progress in star cluster research. We seek to automate the process of labeling star clusters (the second step) by applying supervised machine learning techniques. This will provide a fast, objective, and reproducible classification. Our data are HST (WFC3 and ACS) images of galaxies in the distance range of 3.5-12 Mpc, with a few thousand star clusters already classified by humans as a part of the LEGUS (Legacy ExtraGalactic UV Survey) project. The classification is based on 4 labels (Class 1 - symmetric, compact cluster; Class 2 - concentrated object with some degree of asymmetry; Class 3 - multiple peak system, diffuse; and Class 4 - spurious detection). We start by looking at basic machine learning methods such as decision trees. We then proceed to evaluate the performance of more advanced techniques, focusing on convolutional neural networks and other Deep Learning methods. We analyze the results, and suggest several directions for further improvement.
Impulsivity profiles in pathological slot machine gamblers.
Aragay, Núria; Barrios, Maite; Ramirez-Gendrau, Isabel; Garcia-Caballero, Anna; Garrido, Gemma; Ramos-Grille, Irene; Galindo, Yésika; Martin-Dombrowski, Jonatan; Vallès, Vicenç
2018-05-01
In gambling disorder (GD), impulsivity has been related with severity, treatment outcome and a greater dropout rate. The aim of the study is to obtain an empirical classification of GD patients based on their impulsivity and compare the resulting groups in terms of sociodemographic, clinical and gambling behavior variables. 126 patients with slot machine GD attending the Pathological Gambling Unit between 2013 and 2016 were included. The UPPS-P Impulsive Behavior Scale was used to assess impulsivity, and the severity of past-year gambling behavior was established with the Screen for Gambling problems questionnaire (NODS). Depression and anxiety symptoms and executive function were also assessed. A two-step cluster analysis was carried out to determine impulsivity profiles. According to the UPPS-P data, two clusters were generated. Cluster 1 showed the highest scores on all the UPPS-P subscales, whereas patients from cluster 2 exhibited only high scores on two UPPS-P subscales: Negative Urgency and Lack of premeditation. Additionally, patients on cluster 1 were younger and showed significantly higher scores on the Beck Depression Inventory and on the State-Trait Anxiety Inventory questionnaires, worse emotional regulation and executive functioning, and reported more psychiatric comorbidity compared to patients in cluster 2. With regard to gambling behavior, cluster 1 patients had significantly higher NODS scores and a higher percentage presented active gambling behavior at treatment start than in cluster 2. We found two impulsivity subtypes of slot machine gamblers. Patients with high impulsivity showed more severe gambling behavior, more clinical psychopathology and worse emotional regulation and executive functioning than those with lower levels of impulsivity. These two different clinical profiles may require different therapeutic approaches. Copyright © 2018 Elsevier Inc. All rights reserved.
ClusterViz: A Cytoscape APP for Cluster Analysis of Biological Network.
Wang, Jianxin; Zhong, Jiancheng; Chen, Gang; Li, Min; Wu, Fang-xiang; Pan, Yi
2015-01-01
Cluster analysis of biological networks is one of the most important approaches for identifying functional modules and predicting protein functions. Furthermore, visualization of clustering results is crucial to uncover the structure of biological networks. In this paper, ClusterViz, an app for Cytoscape 3 for cluster analysis and visualization, has been developed. In order to reduce complexity and enable extendibility for ClusterViz, we designed the architecture of ClusterViz based on the framework of Open Services Gateway Initiative. According to the architecture, the implementation of ClusterViz is partitioned into three modules: the interface of ClusterViz, clustering algorithms, and visualization and export. ClusterViz facilitates the comparison of the results of different algorithms for further related analysis. Three commonly used clustering algorithms, FAG-EC, EAGLE and MCODE, are included in the current version. Because the clustering algorithms module adopts an abstract algorithm interface, more clustering algorithms can be included in future. To illustrate the usability of ClusterViz, we provide three examples with detailed steps from important scientific articles, which show that our tool has helped several research teams in their research on the mechanisms of biological networks.
Sample size determination for GEE analyses of stepped wedge cluster randomized trials.
Li, Fan; Turner, Elizabeth L; Preisser, John S
2018-06-19
In stepped wedge cluster randomized trials, intact clusters of individuals switch from control to intervention from a randomly assigned period onwards. Such trials are becoming increasingly popular in health services research. When a closed cohort is recruited from each cluster for longitudinal follow-up, proper sample size calculation should account for three distinct types of intraclass correlations: the within-period, the inter-period, and the within-individual correlations. Setting the latter two correlation parameters to be equal accommodates cross-sectional designs. We propose sample size procedures for continuous and binary responses within the framework of generalized estimating equations that employ a block exchangeable within-cluster correlation structure defined from the distinct correlation types. For continuous responses, we show that the intraclass correlations affect power only through two eigenvalues of the correlation matrix. We demonstrate that analytical power agrees well with simulated power for as few as eight clusters, when data are analyzed using bias-corrected estimating equations for the correlation parameters concurrently with a bias-corrected sandwich variance estimator. © 2018, The International Biometric Society.
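The block exchangeable structure described above can be written out entry by entry for a single cluster. The sketch below is built only from the abstract's definitions of the three correlation types; the symbol names `alpha0` (within-period), `alpha1` (inter-period) and `alpha2` (within-individual), the period-major ordering, and the numeric values are assumptions made for illustration.

```python
def block_exchangeable(n_individuals, n_periods, alpha0, alpha1, alpha2):
    """Within-cluster correlation matrix for a closed cohort.

    Rows and columns are ordered period-major:
    index = period * n_individuals + individual.
    """
    size = n_individuals * n_periods
    R = [[0.0] * size for _ in range(size)]
    for a in range(size):
        period_a, person_a = divmod(a, n_individuals)
        for b in range(size):
            period_b, person_b = divmod(b, n_individuals)
            if a == b:
                R[a][b] = 1.0       # same observation
            elif period_a == period_b:
                R[a][b] = alpha0    # different individuals, same period
            elif person_a == person_b:
                R[a][b] = alpha2    # same individual, different periods
            else:
                R[a][b] = alpha1    # different individuals, different periods
    return R


# Hypothetical values: 2 individuals followed over 3 periods.
R = block_exchangeable(2, 3, alpha0=0.05, alpha1=0.02, alpha2=0.4)

# Cross-sectional design: within-individual equals inter-period correlation.
R_cs = block_exchangeable(2, 3, alpha0=0.05, alpha1=0.02, alpha2=0.02)
```

Setting `alpha2` equal to `alpha1`, as in `R_cs`, reproduces the abstract's statement that equating the latter two parameters accommodates cross-sectional designs.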
ctsGE-clustering subgroups of expression data.
Sharabi-Schwager, Michal; Or, Etti; Ophir, Ron
2017-07-01
A pre-requisite to clustering noisy data, such as gene-expression data, is the filtering step. As an alternative to this step, the ctsGE R-package applies a sorting step in which all of the data are divided into small groups. The groups are divided according to how the time points are related to the time-series median. Then clustering is performed separately on each group. Thus, the clustering is done in two steps. First, an expression index (i.e. a sequence of 1, -1 and 0) is defined and genes with the same index are grouped together, and then each group of genes is clustered by k-means to create subgroups. The ctsGE package also provides an interactive tool to visualize and explore the gene-expression patterns and their subclusters. ctsGE proposes a way of organizing and exploring expression data without eliminating valuable information. Freely available as part of the Bioconductor project at https://bioconductor.org/packages/ctsGE/ . Contact: ron@agri.gov.il. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com
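The first of the two steps described above, assigning each gene an index of 1, -1 and 0 relative to its time-series median and grouping genes that share an index, can be sketched as follows. This is not the ctsGE package's API: the function names, the fixed `cutoff`, and the absence of any scaling are assumptions for illustration, and the expression values are invented.

```python
import statistics
from collections import defaultdict


def expression_index(series, cutoff=0.5):
    """Map each time point to 1, -1 or 0 depending on whether it lies
    above, below or within `cutoff` of the series median."""
    median = statistics.median(series)
    index = []
    for value in series:
        deviation = value - median
        if deviation > cutoff:
            index.append(1)
        elif deviation < -cutoff:
            index.append(-1)
        else:
            index.append(0)
    return tuple(index)


def group_by_index(profiles, cutoff=0.5):
    """Group gene names by their shared expression index."""
    groups = defaultdict(list)
    for gene, series in profiles.items():
        groups[expression_index(series, cutoff)].append(gene)
    return dict(groups)


# Hypothetical expression profiles over four time points.
profiles = {
    "geneA": [1.0, 2.0, 5.0, 2.1],
    "geneB": [0.0, 1.0, 4.0, 1.2],
    "geneC": [3.0, 3.1, 2.9, 3.0],
}
groups = group_by_index(profiles)
```

In the second step, each group would be clustered separately by k-means to form subgroups; that step is omitted here.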
Frejo, L; Martin-Sanz, E; Teggi, R; Trinidad, G; Soto-Varela, A; Santos-Perez, S; Manrique, R; Perez, N; Aran, I; Almeida-Branco, M S; Batuecas-Caletrio, A; Fraile, J; Espinosa-Sanchez, J M; Perez-Guillen, V; Perez-Garrigues, H; Oliva-Dominguez, M; Aleman, O; Benitez, J; Perez, P; Lopez-Escamez, J A
2017-12-01
To define clinical subgroups by cluster analysis in patients with unilateral Meniere disease (MD) and to compare them with the clinical subgroups found in bilateral MD. A cross-sectional study with a two-step cluster analysis. A tertiary referral multicenter study. Nine hundred and eighty-eight adult patients with unilateral MD. Best predictors to define clinical subgroups with potentially different aetiologies. We established five clusters in unilateral MD. Group 1 is the most frequently found, includes 53% of patients, and is defined as the sporadic, classic MD without migraine and without autoimmune disorder (AD). Group 2 is found in 8% of patients, and it is defined by hearing loss, which antedates the vertigo episodes by months or years (delayed MD), without migraine or AD in most cases. Group 3 involves 13% of patients, and it is considered familial MD, while group 4, which includes 15% of patients, is linked to the presence of migraine in all cases. Group 5 is found in 11% of patients and is defined by a comorbid AD. We found significant differences in the distribution of AD in clusters 3, 4 and 5 between patients with uni- and bilateral MD. Cluster analysis defines clinical subgroups in MD, and it extends the phenotype beyond audiovestibular symptoms. This classification will help to improve the phenotyping in MD and facilitate the selection of patients for randomised clinical trials. © 2017 John Wiley & Sons Ltd.
Russian consumers' motives for food choice.
Honkanen, Pirjo; Frewer, Lynn
2009-04-01
Knowledge about food choice motives which have potential to influence consumer consumption decisions is important when designing food and health policies, as well as marketing strategies. Russian consumers' food choice motives were studied in a survey (1081 respondents across four cities), with the purpose of identifying consumer segments based on these motives. These segments were then profiled using consumption, attitudinal and demographic variables. Face-to-face interviews were used to sample the data, which were analysed with two-step cluster analysis (SPSS). Three clusters emerged, representing 21.5%, 45.8% and 32.7% of the sample. The clusters were similar in terms of the order of motivations, but differed in motivational level. Sensory factors and availability were the most important motives for food choice in all three clusters, followed by price. This may reflect the turbulence which Russia has recently experienced politically and economically. Cluster profiles differed in relation to socio-demographic factors, consumption patterns and attitudes towards health and healthy food.
Evaluation of Second-Level Inference in fMRI Analysis
Roels, Sanne P.; Loeys, Tom; Moerkerke, Beatrijs
2016-01-01
We investigate the impact of decisions in the second-level (i.e., over subjects) inferential process in functional magnetic resonance imaging on (1) the balance between false positives and false negatives and on (2) the data-analytical stability, both proxies for the reproducibility of results. Second-level analysis based on a mass univariate approach typically consists of 3 phases. First, one proceeds via a general linear model for a test image that consists of pooled information from different subjects. We evaluate models that take into account first-level (within-subjects) variability and models that do not take into account this variability. Second, one proceeds via inference based on parametrical assumptions or via permutation-based inference. Third, we evaluate 3 commonly used procedures to address the multiple testing problem: familywise error rate correction, False Discovery Rate (FDR) correction, and a two-step procedure with minimal cluster size. Based on a simulation study and real data we find that the two-step procedure with minimal cluster size results in most stable results, followed by the familywise error rate correction. The FDR results in most variable results, for both permutation-based inference and parametrical inference. Modeling the subject-specific variability yields a better balance between false positives and false negatives when using parametric inference. PMID:26819578
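The abstract does not state which FDR procedure was applied. As an illustration only, the Benjamini-Hochberg step-up rule, one widely used FDR correction, can be sketched as follows. The function name and the default level `q` are assumptions.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans: True where the hypothesis is rejected.

    Step-up rule: find the largest rank k (1-based, p-values sorted
    ascending) with p_(k) <= k * q / m, then reject ranks 1..k.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * q / m:
            k_max = rank
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k_max:
            rejected[i] = True
    return rejected


decisions = benjamini_hochberg([0.01, 0.02, 0.03, 0.5])
```

In a mass univariate setting the p-values would come from the voxelwise tests. The familywise and cluster-size procedures compared in the study are not covered by this sketch.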
Mehta, S; Rice, D; McIntyre, A; Getty, H; Speechley, M; Sequeira, K; Shapiro, A P; Morley-Forster, P; Teasell, R W
2016-01-01
Objective. The current study attempted to identify and characterize distinct chronic pain (CP) subgroups based on their level of dispositional personality traits. The secondary objective was to compare the subgroups in mood, coping, and disability. Methods. Individuals with chronic pain were assessed for demographic, psychosocial, and personality measures. A two-step cluster analysis was conducted in order to identify distinct subgroups of patients based on their level of personality traits. Differences in clinical outcomes were compared using multivariate analysis of variance based on cluster membership. Results. In 229 participants, three clusters were formed. No significant difference was seen among the clusters on patient demographic factors including age, sex, relationship status, duration of pain, and pain intensity. Those with high levels of dispositional personality traits had greater levels of mood impairment compared to the other two groups (p < 0.05). A significant difference in disability was seen between the subgroups. Conclusions. The study identified a high-risk group of CP individuals whose level of personality traits significantly correlated with impaired mood and coping. Use of pharmacological treatment alone may not be successful in improving clinical outcomes among these individuals. Instead, a more comprehensive treatment involving psychological treatments may be important in managing the personality traits that interfere with recovery.
Formation of Nitrogenase NifDK Tetramers in the Mitochondria of Saccharomyces cerevisiae
2017-01-01
Transferring the prokaryotic enzyme nitrogenase into a eukaryotic host with the final aim of developing N2 fixing cereal crops would revolutionize agricultural systems worldwide. Targeting it to mitochondria has potential advantages because of the organelle’s high O2 consumption and the presence of bacterial-type iron–sulfur cluster biosynthetic machinery. In this study, we constructed 96 strains of Saccharomyces cerevisiae in which transcriptional units comprising nine Azotobacter vinelandii nif genes (nifHDKUSMBEN) were integrated into the genome. Two combinatorial libraries of nif gene clusters were constructed: a library of mitochondrial leading sequences consisting of 24 clusters within four subsets of nif gene expression strength, and an expression library of 72 clusters with fixed mitochondrial leading sequences and nif expression levels assigned according to factorial design. In total, 29 promoters and 18 terminators were combined to adjust nif gene expression levels. Expression and mitochondrial targeting was confirmed at the protein level as immunoblot analysis showed that Nif proteins could be efficiently accumulated in mitochondria. NifDK tetramer formation, an essential step of nitrogenase assembly, was experimentally proven both in cell-free extracts and in purified NifDK preparations. This work represents a first step toward obtaining functional nitrogenase in the mitochondria of a eukaryotic cell. PMID:28221768
Adnane, Choaib; Adouly, Taoufik; Khallouk, Amine; Rouadi, Sami; Abada, Redallah; Roubal, Mohamed; Mahtar, Mohamed
2017-02-01
The purpose of this study is to use unsupervised cluster methodology to identify phenotype and mucosal eosinophilia endotype subgroups of patients with medically refractory chronic rhinosinusitis (CRS), and to evaluate the difference in quality of life (QOL) outcomes after endoscopic sinus surgery (ESS) between these clusters for better surgical case selection. A prospective cohort study included 131 patients with medically refractory CRS who elected ESS. The Sino-Nasal Outcome Test (SNOT-22) was used to evaluate QOL before and 12 months after surgery. An unsupervised two-step clustering method was applied. One hundred and thirteen subjects were retained in this study: 46 patients with CRS without nasal polyps and 67 patients with nasal polyps. Nasal polyps, gender, mucosal eosinophilia profile, and prior sinus surgery were the most discriminating factors in the generated clusters. Three clusters were identified. A significant clinical improvement was observed in all clusters 12 months after surgery with a reduction of SNOT-22 scores. There was a significant difference in QOL outcomes between clusters; cluster 1 had the worst QOL improvement after ESS in comparison with clusters 2 and 3. All patients in cluster 1 presented CRS with nasal polyps (CRSwNP) with the highest mucosal eosinophilia endotype. The clustering method is able to classify CRS phenotypes and endotypes with different associated surgical outcomes.
Response to treatment of myasthenia gravis according to clinical subtype.
Akaishi, Tetsuya; Suzuki, Yasushi; Imai, Tomihiro; Tsuda, Emiko; Minami, Naoya; Nagane, Yuriko; Uzawa, Akiyuki; Kawaguchi, Naoki; Masuda, Masayuki; Konno, Shingo; Suzuki, Hidekazu; Murai, Hiroyuki; Aoki, Masashi; Utsugisawa, Kimiaki
2016-11-17
We have previously reported using two-step cluster analysis to classify myasthenia gravis (MG) patients into the following five subtypes: ocular MG; thymoma-associated MG; MG with thymic hyperplasia; anti-acetylcholine receptor antibody (AChR-Ab)-negative MG; and AChR-Ab-positive MG without thymic abnormalities. The objectives of the present study were to examine the reproducibility of this five-subtype classification using a new data set of MG patients and to identify additional characteristics of these subtypes, particularly in regard to response to treatment. A total of 923 consecutive MG patients underwent two-step cluster analysis for the classification of subtypes. The variables used for classification were sex, age of onset, disease duration, presence of thymoma or thymic hyperplasia, positivity for AChR-Ab or anti-muscle-specific tyrosine kinase antibody, positivity for other concurrent autoantibodies, and disease condition at worst and current. The period from the start of treatment until the achievement of minimal manifestation status (early-stage response) was determined and then compared between subtypes using Kaplan-Meier analysis and the log-rank test. In addition, between subtypes, the rate of the number of patients who maintained minimal manifestations during the study period/that of patients who only achieved the status once (stability of improved status) was compared. As a result of two-step cluster analysis, 923 MG patients were classified into five subtypes as follows: ocular MG (AChR-Ab-positivity, 77%; histogram of onset age, skewed to older age); thymoma-associated MG (100%; normal distribution); MG with thymic hyperplasia (89%; skewed to younger age); AChR-Ab-negative MG (0%; normal distribution); and AChR-Ab-positive MG without thymic abnormalities (100%, skewed to older age). 
Furthermore, patients classified as ocular MG showed the best early-stage response to treatment and stability of improved status, followed by those classified as thymoma-associated MG and AChR-Ab-positive MG without thymic abnormalities; by contrast, those classified as AChR-Ab-negative MG showed the worst early-stage response to treatment and stability of improved status. Differences were seen between the five subtypes in demographic characteristics, clinical severity, and therapeutic response. Our five-subtype classification approach would be beneficial not only to elucidate disease subtypes, but also to plan treatment strategies for individual MG patients.
An improved K-means clustering algorithm in agricultural image segmentation
NASA Astrophysics Data System (ADS)
Cheng, Huifeng; Peng, Hui; Liu, Shanmei
Image segmentation is the first important step in image analysis and image processing. In this paper, based on the characteristics of color crop images, we first transform the color space of the image from RGB to HSI, then select suitable initial cluster centers and the number of clusters using a mean-variance approach and rough set theory, and finally run the clustering calculation. This segments the color components automatically and rapidly and extracts target objects from the background accurately, which provides a reliable basis for the identification, analysis, follow-up calculation and processing of crop images. Experimental results demonstrate that the improved k-means clustering algorithm reduces the amount of computation and improves the precision and accuracy of clustering.
Identifying and Assessing Interesting Subgroups in a Heterogeneous Population
Lee, Woojoo; Alexeyenko, Andrey; Pernemalm, Maria; Guegan, Justine; Dessen, Philippe; Lazar, Vladimir; Lehtiö, Janne; Pawitan, Yudi
2015-01-01
Biological heterogeneity is common in many diseases and it is often the reason for therapeutic failures. Thus, there is great interest in classifying a disease into subtypes that have clinical significance in terms of prognosis or therapy response. One of the most popular methods to uncover unrecognized subtypes is cluster analysis. However, classical clustering methods such as k-means clustering or hierarchical clustering are not guaranteed to produce clinically interesting subtypes. This could be because the main statistical variability—the basis of cluster generation—is dominated by genes not associated with the clinical phenotype of interest. Furthermore, a strong prognostic factor might be relevant for a certain subgroup but not for the whole population; thus an analysis of the whole sample may not reveal this prognostic factor. To address these problems we investigate methods to identify and assess clinically interesting subgroups in a heterogeneous population. The identification step uses a clustering algorithm and to assess significance we use a false discovery rate- (FDR-) based measure. Under the heterogeneity condition the standard FDR estimate is shown to overestimate the true FDR value, but this is remedied by an improved FDR estimation procedure. As illustrations, two real data examples from gene expression studies of lung cancer are provided. PMID:26339613
Do protein crystals nucleate within dense liquid clusters?
DOE Office of Scientific and Technical Information (OSTI.GOV)
Maes, Dominique, E-mail: dommaes@vub.ac.be; Vorontsova, Maria A.; Potenza, Marco A. C.
2015-06-27
The evolution of protein-rich clusters and nucleating crystals was characterized by dynamic light scattering (DLS), confocal depolarized dynamic light scattering (cDDLS) and depolarized oblique illumination dark-field microscopy. Newly nucleated crystals within protein-rich clusters were detected directly. These observations indicate that the protein-rich clusters are locations for crystal nucleation. Protein-dense liquid clusters are regions of high protein concentration that have been observed in solutions of several proteins. The typical cluster size varies from several tens to several hundreds of nanometres and their volume fraction remains below 10⁻³ of the solution. According to the two-step mechanism of nucleation, the protein-rich clusters serve as locations for and precursors to the nucleation of protein crystals. While the two-step mechanism explained several unusual features of protein crystal nucleation kinetics, a direct observation of its validity for protein crystals has been lacking. Here, two independent observations of crystal nucleation with the proteins lysozyme and glucose isomerase are discussed. Firstly, the evolutions of the protein-rich clusters and nucleating crystals were characterized simultaneously by dynamic light scattering (DLS) and confocal depolarized dynamic light scattering (cDDLS), respectively. It is demonstrated that protein crystals appear following a significant delay after cluster formation. The cDDLS correlation functions follow a Gaussian decay, indicative of nondiffusive motion. A possible explanation is that the crystals are contained inside large clusters and are driven by the elasticity of the cluster surface. Secondly, depolarized oblique illumination dark-field microscopy reveals the evolution from liquid clusters without crystals to newly nucleated crystals contained in the clusters to grown crystals freely diffusing in the solution. Collectively, the observations indicate that the protein-rich clusters in lysozyme and glucose isomerase solutions are locations for crystal nucleation.
Unequal cluster sizes in stepped-wedge cluster randomised trials: a systematic review.
Kristunas, Caroline; Morris, Tom; Gray, Laura
2017-11-15
To investigate the extent to which cluster sizes vary in stepped-wedge cluster randomised trials (SW-CRT) and whether any variability is accounted for during the sample size calculation and analysis of these trials. Any, not limited to healthcare settings. Any taking part in an SW-CRT published up to March 2016. The primary outcome is the variability in cluster sizes, measured by the coefficient of variation (CV) in cluster size. Secondary outcomes include the difference between the cluster sizes assumed during the sample size calculation and those observed during the trial, any reported variability in cluster sizes and whether the methods of sample size calculation and methods of analysis accounted for any variability in cluster sizes. Of the 101 included SW-CRTs, 48% mentioned that the included clusters were known to vary in size, yet only 13% of these accounted for this during the calculation of the sample size. However, 69% of the trials did use a method of analysis appropriate for when clusters vary in size. Full trial reports were available for 53 trials. The CV was calculated for 23 of these: the median CV was 0.41 (IQR: 0.22-0.52). Actual cluster sizes could be compared with those assumed during the sample size calculation for 14 (26%) of the trial reports; the cluster sizes were between 29% and 480% of that which had been assumed. Cluster sizes often vary in SW-CRTs. Reporting of SW-CRTs also remains suboptimal. The effect of unequal cluster sizes on the statistical power of SW-CRTs needs further exploration and methods appropriate to studies with unequal cluster sizes need to be employed. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2017. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
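The primary outcome above, the coefficient of variation (CV) in cluster size, is the standard deviation of the cluster sizes divided by their mean. A minimal sketch follows. The use of the sample (n-1) standard deviation is an assumption, since the abstract does not say which form the review used, and the example sizes are made up.

```python
from statistics import mean, stdev


def coefficient_of_variation(cluster_sizes):
    """CV = standard deviation / mean of the cluster sizes.

    Uses the sample standard deviation (n - 1 denominator); this choice
    is an assumption of the sketch.
    """
    return stdev(cluster_sizes) / mean(cluster_sizes)


# Hypothetical cluster sizes, for illustration only
cv = coefficient_of_variation([10, 20, 30])
```

A CV of 0 means all clusters are the same size, and larger values indicate more unequal clusters.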
WAIS-III index score profiles in the Canadian standardization sample.
Lange, Rael T
2007-01-01
Representative index score profiles were examined in the Canadian standardization sample of the Wechsler Adult Intelligence Scale-Third Edition (WAIS-III). The identification of profile patterns was based on the methodology proposed by Lange, Iverson, Senior, and Chelune (2002) that aims to maximize the influence of profile shape and minimize the influence of profile magnitude on the cluster solution. A two-step cluster analysis procedure was used (i.e., hierarchical and k-means analyses). Cluster analysis of the four index scores (i.e., Verbal Comprehension [VCI], Perceptual Organization [POI], Working Memory [WMI], Processing Speed [PSI]) identified six profiles in this sample. Profiles were differentiated by pattern of performance and were primarily characterized as (a) high VCI/POI, low WMI/PSI, (b) low VCI/POI, high WMI/PSI, (c) high PSI, (d) low PSI, (e) high VCI/WMI, low POI/PSI, and (f) low VCI, high POI. These profiles are potentially useful for determining whether a patient's WAIS-III performance is unusual in a normal population.
Proposed variations of the stepped-wedge design can be used to accommodate multiple interventions.
Lyons, Vivian H; Li, Lingyu; Hughes, James P; Rowhani-Rahbar, Ali
2017-06-01
Stepped-wedge design (SWD) cluster-randomized trials have traditionally been used for evaluating a single intervention. We aimed to explore design variants suitable for evaluating multiple interventions in an SWD trial. We identified four specific variants of the traditional SWD that would allow two interventions to be conducted within a single cluster-randomized trial: concurrent, replacement, supplementation, and factorial SWDs. These variants were chosen to flexibly accommodate study characteristics that limit a one-size-fits-all approach for multiple interventions. In the concurrent SWD, each cluster receives only one intervention, unlike the other variants. The replacement SWD supports two interventions that will not or cannot be used at the same time. The supplementation SWD is appropriate when the second intervention requires the presence of the first intervention, and the factorial SWD supports the evaluation of intervention interactions. The precision for estimating intervention effects varies across the four variants. Selection of the appropriate design variant should be driven by the research question while considering the trade-off between the number of steps, number of clusters, restrictions for concurrent implementation of the interventions, lingering effects of each intervention, and precision of the intervention effect estimates. Copyright © 2017 Elsevier Inc. All rights reserved.
AMOEBA clustering revisited. [cluster analysis, classification, and image display program
NASA Technical Reports Server (NTRS)
Bryant, Jack
1990-01-01
A description of the clustering, classification, and image display program AMOEBA is presented. Using a difficult high resolution aircraft-acquired MSS image, the steps the program takes in forming clusters are traced. A number of new features are described here for the first time. Usage of the program is discussed. The theoretical foundation (the underlying mathematical model) is briefly presented. The program can handle images of any size and dimensionality.
PCA based clustering for brain tumor segmentation of T1w MRI images.
Kaya, Irem Ersöz; Pehlivanlı, Ayça Çakmak; Sekizkardeş, Emine Gezmez; Ibrikci, Turgay
2017-03-01
Medical images are huge collections of information that are difficult to store and process, consuming extensive computing time. Therefore, reduction techniques are commonly used as a data pre-processing step to make the image data less complex so that high-dimensional data can be identified by an appropriate low-dimensional representation. PCA is one of the most popular multivariate methods for data reduction. This paper is focused on clustering T1-weighted MRI images for brain tumor segmentation with dimension reduction by different common Principal Component Analysis (PCA) algorithms. Our primary aim is to present a comparison between different variations of PCA algorithms on MRIs for two cluster methods. The five most common PCA algorithms, namely the conventional PCA, Probabilistic Principal Component Analysis (PPCA), Expectation Maximization Based Principal Component Analysis (EM-PCA), Generalized Hebbian Algorithm (GHA), and Adaptive Principal Component Extraction (APEX), were applied to reduce dimensionality in advance of two clustering algorithms, K-Means and Fuzzy C-Means. In the study, T1-weighted MRI images of the human brain with brain tumor were used for clustering. In addition to the original size of 512 lines and 512 pixels per line, three more sizes, 256 × 256, 128 × 128 and 64 × 64, were included in the study to examine their effect on the methods. The obtained results were compared in terms of both the reconstruction errors and the Euclidean distance errors among the clustered images containing the same number of principal components. According to the findings, the PPCA obtained the best results among all others. Furthermore, the EM-PCA and the PPCA helped the K-Means algorithm to achieve the best clustering performance in the majority of cases, as well as achieving significant results with both clustering algorithms for all sizes of T1w MRI images. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Rain volume estimation over areas using satellite and radar data
NASA Technical Reports Server (NTRS)
Doneaud, A. A.; Vonderhaar, T. H.
1985-01-01
The feasibility of rain volume estimation over fixed and floating areas was investigated using rapid scan satellite data following a technique recently developed with radar data, called the Area Time Integral (ATI) technique. The radar and rapid scan GOES satellite data were collected during the Cooperative Convective Precipitation Experiment (CCOPE) and North Dakota Cloud Modification Project (NDCMP). Six multicell clusters and cells have been analyzed to date. A two-cycle oscillation emphasizing the multicell character of the clusters is demonstrated. Three clusters were selected on each day, 12 June and 2 July. The 12 June clusters occurred during the daytime, while the 2 July clusters occurred during the nighttime. A total of 86 time steps of radar and 79 time steps of satellite images were analyzed. On average, the interval between radar scans was approximately 12 min.
Rayward, Anna T; Duncan, Mitch J; Brown, Wendy J; Plotnikoff, Ronald C; Burton, Nicola W
2017-08-01
This study aimed to identify how different patterns of physical activity, sleep duration and sleep quality cluster together, and to examine how the identified clusters differ in terms of socio-demographic and health characteristics. Participants were adults from Brisbane, Australia, aged 42-72 years who reported their physical activity, sleep duration, sleep quality, socio-demographic and health characteristics in 2011 (n=5854). Two-step Cluster Analyses were used to identify clusters. Cluster differences in socio-demographic and health characteristics were examined using chi square tests (p<0.05). Four clusters were identified: 'Poor Sleepers' (31.2%), 'Moderate Sleepers' (30.7%), 'Mixed Sleepers/Highly Active' (20.5%), and 'Excellent Sleepers/Mixed Activity' (17.6%). The 'Poor Sleepers' cluster had the highest proportion of participants with less-than-recommended sleep duration and poor sleep quality, had the poorest health characteristics and a high proportion of participants with low physical activity. Physical activity, sleep duration and sleep quality cluster together in distinct patterns and clusters of poor behaviours are associated with poor health status. Multiple health behaviour change interventions which target both physical activity and sleep should be prioritised to improve health outcomes in mid-aged adults. Copyright © 2017 Elsevier B.V. All rights reserved.
Clustering on Magnesium Surfaces – Formation and Diffusion Energies
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chu, Haijian; Huang, Hanchen; Wang, Jian
2017-07-12
The formation and diffusion energies of atomic clusters on Mg surfaces determine the surface roughness and formation of faulted structure, which in turn affect the mechanical deformation of Mg. This paper reports first principles density functional theory (DFT) based quantum mechanics calculation results of atomic clustering on the low energy surfaces {0001} and {$\bar{1}$011}. In parallel, molecular statics calculations serve to test the validity of two interatomic potentials and to extend the scope of the DFT studies. On a {0001} surface, a compact cluster consisting of fewer than three atoms energetically prefers a face-centered-cubic stacking, to serve as a nucleus of stacking fault. On a {$\bar{1}$011} surface, clusters of any size always prefer hexagonal-close-packed stacking. Adatom diffusion on surface {$\bar{1}$011} is highly anisotropic, while it is isotropic on surface (0001). Three-dimensional Ehrlich–Schwoebel barriers converge once the step height is three atomic layers or thicker. Finally, adatom diffusion along steps is via a hopping mechanism, and that down steps is via an exchange mechanism.
Clustering P-Wave Receiver Functions To Constrain Subsurface Seismic Structure
NASA Astrophysics Data System (ADS)
Chai, C.; Larmat, C. S.; Maceira, M.; Ammon, C. J.; He, R.; Zhang, H.
2017-12-01
The acquisition of high-quality data from permanent and temporary dense seismic networks provides the opportunity to apply statistical and machine learning techniques to a broad range of geophysical observations. Lekic and Romanowicz (2011) used clustering analysis on tomographic velocity models of the western United States to perform tectonic regionalization, and the velocity-profile clusters agree well with known geomorphic provinces. A complementary and somewhat less restrictive approach is to apply cluster analysis directly to geophysical observations. In this presentation, we apply clustering analysis to teleseismic P-wave receiver functions (RFs), continuing the efforts of Larmat et al. (2015) and Maceira et al. (2015). These earlier studies validated the approach with surface waves and stacked EARS RFs from the USArray stations. In this study, we experiment with both the K-means and hierarchical clustering algorithms. We also test different distance metrics defined in the vector space of RFs following Lekic and Romanowicz (2011). We cluster data from two distinct data sets. The first, corresponding to the western US, was obtained by smoothing/interpolation of the receiver-function wavefield (Chai et al. 2015). Spatial coherence and agreement with geologic regions increase with this simpler, spatially smoothed set of observations. The second data set is composed of RFs for more than 800 stations of the China Digital Seismic Network (CSN). Preliminary results show a first-order agreement between clusters and tectonic regions, and each regional cluster includes a distinct Ps arrival, which probably reflects differences in crustal thickness. Regionalization remains an important step to characterize a model prior to application of full waveform and/or stochastic imaging techniques because of the computational expense of these types of studies. 
Machine learning techniques can provide valuable information that can be used to design and characterize formal geophysical inversion, providing information on spatial variability in the subsurface geology.
The cosmological analysis of X-ray cluster surveys - I. A new method for interpreting number counts
NASA Astrophysics Data System (ADS)
Clerc, N.; Pierre, M.; Pacaud, F.; Sadibekova, T.
2012-07-01
We present a new method aimed at simplifying the cosmological analysis of X-ray cluster surveys. It is based on purely instrumental observable quantities considered in a two-dimensional X-ray colour-magnitude diagram (hardness ratio versus count rate). The basic principle is that even in rather shallow surveys, substantial information on cluster redshift and temperature is present in the raw X-ray data and can be statistically extracted; in parallel, such diagrams can be readily predicted from an ab initio cosmological modelling. We illustrate the methodology for the case of a 100-deg² XMM survey having a sensitivity of ~10⁻¹⁴ erg s⁻¹ cm⁻² and fit at the same time, the survey selection function, the cluster evolutionary scaling relations and the cosmology; our sole assumption - driven by the limited size of the sample considered in the case study - is that the local cluster scaling relations are known. We devote special attention to the realistic modelling of the count-rate measurement uncertainties and evaluate the potential of the method via a Fisher analysis. In the absence of individual cluster redshifts, the count rate and hardness ratio (CR-HR) method appears to be much more efficient than the traditional approach based on cluster counts (i.e. dn/dz, requiring redshifts). In the case where redshifts are available, our method performs similar to the traditional mass function (dn/dM/dz) for the purely cosmological parameters, but constrains better parameters defining the cluster scaling relations and their evolution. A further practical advantage of the CR-HR method is its simplicity: this fully top-down approach totally bypasses the tedious steps consisting in deriving cluster masses from X-ray temperature measurements.
Automated segmentation of comet assay images using Gaussian filtering and fuzzy clustering.
Sansone, Mario; Zeni, Olga; Esposito, Giovanni
2012-05-01
The comet assay is one of the most popular tests for the detection of DNA damage at the single-cell level. In this study, an algorithm for comet assay analysis has been proposed, aiming to minimize user interaction and provide reproducible measurements. The algorithm comprises two steps: (a) comet identification via Gaussian pre-filtering and morphological operators; (b) comet segmentation via fuzzy clustering. The algorithm has been evaluated using comet images from human leukocytes treated with a commonly used DNA damaging agent. A comparison of the proposed approach with a commercial system has been performed. Results show that fuzzy segmentation can increase overall sensitivity, giving benefits in bio-monitoring studies where weak genotoxic effects are expected.
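The fuzzy clustering step (b) can be illustrated with a minimal sketch of standard fuzzy c-means on one-dimensional pixel intensities. This is not the authors' implementation: the function name, the two-cluster setting, the fuzzifier m = 2 and the evenly spaced starting centers are all assumptions made for illustration.

```python
def fuzzy_c_means(values, n_clusters=2, m=2.0, n_iter=50):
    """Standard fuzzy c-means on 1-D values (illustrative sketch only).

    Returns (centers, memberships); memberships[i][k] is the degree to
    which values[i] belongs to cluster k, and each row sums to 1.
    """
    lo, hi = min(values), max(values)
    # Assumption: deterministic start, centers spread evenly over the range.
    centers = [lo + (hi - lo) * (k + 0.5) / n_clusters for k in range(n_clusters)]
    memberships = []
    for _ in range(n_iter):
        memberships = []
        for x in values:
            # Small offset avoids division by zero when x sits on a center.
            dists = [abs(x - c) + 1e-12 for c in centers]
            # Membership update: u_k = 1 / sum_j (d_k / d_j)^(2 / (m - 1))
            row = [
                1.0 / sum((dk / dj) ** (2.0 / (m - 1.0)) for dj in dists)
                for dk in dists
            ]
            memberships.append(row)
        # Center update: mean of the values weighted by u^m.
        centers = [
            sum((u[k] ** m) * x for u, x in zip(memberships, values))
            / sum(u[k] ** m for u in memberships)
            for k in range(n_clusters)
        ]
    return centers, memberships


# Hypothetical intensities: dark background pixels and bright comet pixels.
intensities = [0.10, 0.12, 0.09, 0.90, 0.88, 0.95]
centers, memberships = fuzzy_c_means(intensities)
```

Unlike a hard threshold, each pixel keeps a graded membership in every cluster, which is one plausible reason a fuzzy segmentation could retain faint comet tails.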
Dynamic clustering detection through multi-valued descriptors of dermoscopic images.
Cozza, Valentina; Guarracino, Maria Rosario; Maddalena, Lucia; Baroni, Adone
2011-09-10
This paper introduces a dynamic clustering methodology based on multi-valued descriptors of dermoscopic images. The main idea is to support medical diagnosis in deciding whether pigmented skin lesions belonging to an uncertain set are nearer to malignant melanoma or to benign nevi. Melanoma is the most deadly skin cancer, and early diagnosis is a current challenge for clinicians. Most data analysis algorithms for skin lesion discrimination focus on segmentation and extraction of features of categorical or numerical type. As an alternative approach, this paper introduces two new concepts: first, it considers multi-valued data, described not only by scalar variables but also by interval or histogram variables; second, it introduces a dynamic clustering method based on Wasserstein distance to compare multi-valued data. The overall strategy of analysis can be summarized in the following steps: first, a segmentation of dermoscopic images allows a set of multi-valued descriptors to be identified; second, we performed a discriminant analysis on a set of images for which there is an a priori classification, so that it is possible to detect which features discriminate the benign and malignant lesions; and third, we applied the proposed dynamic clustering method to the uncertain cases, which need to be associated with one of the two previously mentioned groups. Results based on clinical data show that the grading of specific descriptors associated with dermoscopic characteristics provides a novel way to characterize uncertain lesions that can help the dermatologist's diagnosis. Copyright © 2011 John Wiley & Sons, Ltd.
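To make the comparison of histogram-valued descriptors concrete, here is a minimal sketch of a first-order (W1) Wasserstein distance between two histograms defined on the same equally spaced bins. The abstract does not say which order of Wasserstein distance the authors use, so the choice of W1, the function name and the example histograms are assumptions.

```python
def wasserstein_1d(hist_a, hist_b, bin_width=1.0):
    """First-order Wasserstein distance between two histograms that
    share the same equally spaced bins (illustrative sketch only).

    For 1-D distributions, W1 equals the area between the two
    cumulative distribution functions.
    """
    if len(hist_a) != len(hist_b):
        raise ValueError("histograms must have the same number of bins")
    total_a, total_b = sum(hist_a), sum(hist_b)
    cdf_a = cdf_b = 0.0
    distance = 0.0
    for a, b in zip(hist_a, hist_b):
        # Normalise counts so each histogram is a probability distribution.
        cdf_a += a / total_a
        cdf_b += b / total_b
        distance += abs(cdf_a - cdf_b) * bin_width
    return distance


# Hypothetical descriptors: all mass in the first bin vs. all in the third.
far_apart = wasserstein_1d([1, 0, 0], [0, 0, 1])   # mass moves two bins
identical = wasserstein_1d([2, 5, 3], [2, 5, 3])   # no mass needs to move
```

Because the distance grows with how far mass must be moved, two histograms with non-overlapping but neighbouring bins are treated as closer than two with distant bins, which a bin-by-bin comparison would not capture.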
Montemagni, Cristiana; Frieri, Tiziana; Villari, Vincenzo; Rocca, Paola
2018-06-01
The purpose of the study was to identify homogeneous subgroups, based upon achievement of two functional milestones (marriage and employment) and Global Assessment of Functioning (GAF) score, in a sample of 848 acute patients admitted to the Psychiatric Emergency Service (PES) of the Città della Salute e della Scienza di Torino during a 24-month period. A two-step cluster analysis, using GAF total score and the achievements in the two milestones as input data, was performed. In order to examine whether the identified subgroups differed in external variables that were not included in the clustering process, and consequently to validate the functional profiles found, chi-square tests for categorical variables and analyses of variance (ANOVA) for continuous variables were performed. Five clusters were found. Employed patients (Clusters 4 and 5) had more years of education, less illness chronicity (shorter duration of illness and lower proportion of previous voluntary hospitalizations), lower use of mental health resources in the last year yet higher treatment adherence, larger network size, and higher ordinary discharge. Married inpatients (Clusters 3 and 5) had lower frequencies of substance abuse. The remarkably high rate of unemployment in this inpatient sample, and the evidence of associations between unemployment and poorer functioning, argue for further research and development of evidence-based supported employment programs that put diligent effort into helping people obtain work quickly and sustain it; they may also help to reduce health care service use among that clientele.
Who are the obese? A cluster analysis exploring subgroups of the obese.
Green, M A; Strong, M; Razak, F; Subramanian, S V; Relton, C; Bissell, P
2016-06-01
Body mass index (BMI) can be used to group individuals in terms of their height and weight as obese. However, such a distinction fails to account for the variation within this group across other factors such as health, demographic and behavioural characteristics. The study aims to examine the existence of subgroups of obese individuals. Data were taken from the Yorkshire Health Study (2010-12) including information on demographic, health and behavioural characteristics. Individuals with a BMI of ≥30 were included. A two-step cluster analysis was used to define groups of individuals who shared common characteristics. The cluster analysis found six distinct groups of individuals whose BMI was ≥30. These subgroups were heavy-drinking males; young healthy females; the affluent and healthy elderly; the physically sick but happy elderly; the unhappy and anxious middle aged; and a cluster with the poorest health. It is important to account for the heterogeneity within individuals who are obese. Interventions introduced by clinicians and policymakers should not target obese individuals as a whole but tailor strategies depending upon the subgroups that individuals belong to. © The Author 2015. Published by Oxford University Press on behalf of Faculty of Public Health. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
The Polar Cusp Observed by Cluster Under Constant Imf-Bz Southward
NASA Astrophysics Data System (ADS)
Escoubet, C. P.; Berchem, J.; Pitout, F.; Trattner, K. J.; Richard, R. L.; Taylor, M. G.; Soucek, J.; Grison, B.; Laakso, H. E.; Masson, A.; Dunlop, M. W.; Dandouras, I. S.; Reme, H.; Fazakerley, A. N.; Daly, P. W.
2011-12-01
The Earth's magnetic field is influenced by the interplanetary magnetic field (IMF), especially at the magnetopause, where both magnetic fields come into direct contact and magnetic reconnection can be initiated. In the polar regions, the polar cusp that extends from the magnetopause down to the ionosphere is also directly influenced. Reconnection not only allows ions and electrons from the solar wind to enter the polar cusp but also gives an impulse to the magnetic field lines threading the polar cusp through the reconnection electric field. A dispersion in energy of the ions is subsequently produced by the motion of field lines and the time-of-flight effect on down-going ions. If reconnection is continuous and operates at a constant rate, the ion dispersion is smooth and continuous. On the other hand, if the reconnection rate varies, we expect interruptions in the dispersion, forming energy steps or a staircase. Similarly, multiple entries near the magnetopause could also produce steps at low or mid-altitude when a spacecraft successively crosses the field lines originating from these multiple sources. Cluster, with four spacecraft following each other in the mid-altitude cusp, can be used to distinguish between these "temporal" and "spatial" effects. We will show two Cluster cusp crossings where the spacecraft were separated by a few minutes. The energy dispersions observed in the first crossing were the same during the few minutes that separated the spacecraft. In the second crossing, two ion dispersions were observed on the first spacecraft and only one on the following spacecraft, about 10 min later. The detailed analysis indicates that these steps result from spatial structures.
Clustering of amines and hydrazines in atmospheric nucleation
NASA Astrophysics Data System (ADS)
Li, Siyang; Qu, Kun; Zhao, Hailiang; Ding, Lei; Du, Lin
2016-06-01
It has been proved that the presence of amines in the atmosphere can enhance aerosol formation. Hydrazine (HD) and its substituted derivatives, monomethylhydrazine (MMH) and unsymmetrical dimethylhydrazine (UDMH), which are organic derivatives of amine and ammonia, are common trace atmospheric species that may contribute to the growth of nucleation clusters. The structures of the hydrazine and amine clusters containing one or two common nucleation molecules (ammonia, water, methanol and sulfuric acid) have been optimized using density functional theory (DFT) methods. The cluster growth mechanism has been explored from the thermochemistry by calculating the Gibbs free energies of adding an ammonia, water, methanol or sulfuric acid molecule step by step at room temperature. The results show that hydrazine and its derivatives could enhance heteromolecular homogeneous nucleation in the Earth's atmosphere.
Comparative Effectiveness of Two Walking Interventions on Participation, Step Counts, and Health.
Smith-McLallen, Aaron; Heller, Debbie; Vernisi, Kristin; Gulick, Diana; Cruz, Samantha; Snyder, Richard L
2017-03-01
To (1) compare the effects of two worksite-based walking interventions on employee participation rates; (2) compare average daily step counts between conditions; and (3) examine the effects of increases in average daily step counts on biometric and psychologic outcomes. We conducted a cluster-randomized trial in which six employer groups were randomly selected and randomly assigned to condition. Four manufacturing worksites and two office-based worksites served as the setting. A total of 474 employees from six employer groups were included. A standard walking program was compared to an enhanced program that included incentives, feedback, competitive challenges, and monthly wellness workshops. Walking was measured by self-reported daily step counts. Survey measures and biometric screenings were administered at baseline and 3, 6, and 9 months after baseline. Analysis used linear mixed models with repeated measures. During 9 months, participants in the enhanced condition averaged 726 more steps per day compared with those in the standard condition (p < .001). A 1000-step increase in average daily steps was associated with significant weight loss for both men (-3.8 lbs.) and women (-2.1 lbs.), and reductions in body mass index (-0.41 men, -0.31 women). Higher step counts were also associated with improvements in mood, having more energy, and higher ratings of overall health. An enhanced walking program significantly increases participation rates and daily step counts, which were associated with weight loss and reductions in body mass index.
Mayorga-Vega, Daniel; Viciana, Jesús
2014-06-01
The main purpose of this study was to evaluate the differences in adolescents' objective physical activity levels and perceived effort in physical education, school recess, and extra-curricular organized sport by motivational profiles in physical education. A sample of 102 students 11-16 years old completed a self-report questionnaire assessing self-determined motivation toward physical education. Subsequently, students' objective physical activity levels (steps/min, METs, and moderate-to-vigorous physical activity) and perceived effort were evaluated for each situation. Cluster analysis identified a two-cluster structure: "Moderate motivation toward physical education profile" and "High motivation toward physical education profile." Adolescents in the second cluster had higher physical activity and perceived effort values than adolescents in the first cluster, except for METs and moderate-to-vigorous physical activity in extra-curricular sport. These results support the importance of physical education teachers who should promote self-determined motivation toward physical education so that students can reach the recommended physical activity levels.
Open-Source Sequence Clustering Methods Improve the State Of the Art.
Kopylova, Evguenia; Navas-Molina, Jose A; Mercier, Céline; Xu, Zhenjiang Zech; Mahé, Frédéric; He, Yan; Zhou, Hong-Wei; Rognes, Torbjørn; Caporaso, J Gregory; Knight, Rob
2016-01-01
Sequence clustering is a common early step in amplicon-based microbial community analysis, when raw sequencing reads are clustered into operational taxonomic units (OTUs) to reduce the run time of subsequent analysis steps. Here, we evaluated the performance of recently released state-of-the-art open-source clustering software products, namely, OTUCLUST, Swarm, SUMACLUST, and SortMeRNA, against current principal options (UCLUST and USEARCH) in QIIME, hierarchical clustering methods in mothur, and USEARCH's most recent clustering algorithm, UPARSE. All the latest open-source tools showed promising results, reporting up to 60% fewer spurious OTUs than UCLUST, indicating that the underlying clustering algorithm can vastly reduce the number of these derived OTUs. Furthermore, we observed that stringent quality filtering, such as is done in UPARSE, can cause a significant underestimation of species abundance and diversity, leading to incorrect biological results. Swarm, SUMACLUST, and SortMeRNA have been included in the QIIME 1.9.0 release. IMPORTANCE Massive collections of next-generation sequencing data call for fast, accurate, and easily accessible bioinformatics algorithms to perform sequence clustering. A comprehensive benchmark is presented, including open-source tools and the popular USEARCH suite. Simulated, mock, and environmental communities were used to analyze sensitivity, selectivity, species diversity (alpha and beta), and taxonomic composition. The results demonstrate that recent clustering algorithms can significantly improve accuracy and preserve estimated diversity without the application of aggressive filtering. Moreover, these tools are all open source, apply multiple levels of multithreading, and scale to the demands of modern next-generation sequencing data, which is essential for the analysis of massive multidisciplinary studies such as the Earth Microbiome Project (EMP) (J. A. Gilbert, J. K. Jansson, and R. Knight, BMC Biol 12:69, 2014, http://dx.doi.org/10.1186/s12915-014-0069-1).
Severity of post-stroke aphasia according to aphasia type and lesion location in Koreans.
Kang, Eun Kyoung; Sohn, Hae Min; Han, Moon-Ku; Kim, Won; Han, Tai Ryoon; Paik, Nam-Jong
2010-01-01
To determine the relations between post-stroke aphasia severity and aphasia type and lesion location, a retrospective review was undertaken using the medical records of 97 Korean patients, treated within 90 days of onset, for aphasia caused by unilateral left hemispheric stroke. Types of aphasia were classified according to the validated Korean version of the Western Aphasia Battery (K-WAB), and severities of aphasia were quantified using WAB Aphasia Quotients (AQ). Lesion locations were classified as cortical or subcortical, and were determined by magnetic resonance imaging. Two-step cluster analysis was performed using AQ values to classify aphasia severity by aphasia type and lesion location. Cluster analysis resulted in four severity clusters: 1) mild; anomic type, 2) moderate; Wernicke's, transcortical motor, transcortical sensory, conduction, and mixed transcortical types, 3) moderately severe; Broca's aphasia, and 4) severe; global aphasia, and also in three lesion location clusters: 1) mild; subcortical 2) moderate; cortical lesions involving Broca's and/or Wernicke's areas, and 3) severe; insular and cortical lesions not in Broca's or Wernicke's areas. These results revealed that within 3 months of stroke, global aphasia was the more severely affected type and cortical lesions were more likely to affect language function than subcortical lesions.
An image processing pipeline to detect and segment nuclei in muscle fiber microscopic images.
Guo, Yanen; Xu, Xiaoyin; Wang, Yuanyuan; Wang, Yaming; Xia, Shunren; Yang, Zhong
2014-08-01
Muscle fiber images play an important role in the medical diagnosis and treatment of many muscular diseases. The number of nuclei in skeletal muscle fiber images is a key bio-marker of the diagnosis of muscular dystrophy. In nuclei segmentation one primary challenge is to correctly separate the clustered nuclei. In this article, we developed an image processing pipeline to automatically detect, segment, and analyze nuclei in microscopic image of muscle fibers. The pipeline consists of image pre-processing, identification of isolated nuclei, identification and segmentation of clustered nuclei, and quantitative analysis. Nuclei are initially extracted from background by using local Otsu's threshold. Based on analysis of morphological features of the isolated nuclei, including their areas, compactness, and major axis lengths, a Bayesian network is trained and applied to identify isolated nuclei from clustered nuclei and artifacts in all the images. Then a two-step refined watershed algorithm is applied to segment clustered nuclei. After segmentation, the nuclei can be quantified for statistical analysis. Comparing the segmented results with those of manual analysis and an existing technique, we find that our proposed image processing pipeline achieves good performance with high accuracy and precision. The presented image processing pipeline can therefore help biologists increase their throughput and objectivity in analyzing large numbers of nuclei in muscle fiber images. © 2014 Wiley Periodicals, Inc.
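The thresholding step named in the pipeline can be sketched as follows. This is a plain global Otsu threshold applied to one list of 8-bit intensities; the abstract describes a local variant, which would presumably apply the same computation per image window. The function name, the 256-level assumption and the toy pixel values are illustrative, not taken from the paper.

```python
def otsu_threshold(pixels, levels=256):
    """Otsu's threshold for integer intensities in [0, levels).

    Returns the level t that maximises the between-class variance when
    pixels <= t are treated as background (illustrative sketch only).
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    weight_bg = 0      # number of pixels at or below the candidate level
    sum_bg = 0         # sum of intensities at or below the candidate level
    best_t, best_var = 0, -1.0
    for t in range(levels):
        weight_bg += hist[t]
        sum_bg += t * hist[t]
        if weight_bg == 0:
            continue   # no background pixels yet
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break      # every pixel is background from here on
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        # Between-class variance, up to a constant factor of 1 / total^2.
        between_var = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


# Hypothetical window: dark background (10) and bright nuclei (200).
window = [10] * 50 + [200] * 50
threshold = otsu_threshold(window)
nuclei_mask = [p > threshold for p in window]
```

Computing the threshold per window rather than once per image is what would let such a pipeline cope with uneven illumination across a slide.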
Proposed variations of the stepped-wedge design can be used to accommodate multiple interventions
Lyons, Vivian H; Li, Lingyu; Hughes, James P; Rowhani-Rahbar, Ali
2018-01-01
Objective Stepped wedge design (SWD) cluster randomized trials have traditionally been used for evaluating a single intervention. We aimed to explore design variants suitable for evaluating multiple interventions in a SWD trial. Study Design and Setting We identified four specific variants of the traditional SWD that would allow two interventions to be conducted within a single cluster randomized trial: Concurrent, Replacement, Supplementation and Factorial SWDs. These variants were chosen to flexibly accommodate study characteristics that limit a one-size-fits-all approach for multiple interventions. Results In the Concurrent SWD, each cluster receives only one intervention, unlike the other variants. The Replacement SWD supports two interventions that will not or cannot be employed at the same time. The Supplementation SWD is appropriate when the second intervention requires the presence of the first intervention, and the Factorial SWD supports the evaluation of intervention interactions. The precision for estimating intervention effects varies across the four variants. Conclusion Selection of the appropriate design variant should be driven by the research question while considering the trade-off between the number of steps, number of clusters, restrictions for concurrent implementation of the interventions, lingering effects of each intervention, and precision of the intervention effect estimates. PMID:28412466
High- and low-level hierarchical classification algorithm based on source separation process
NASA Astrophysics Data System (ADS)
Loghmari, Mohamed Anis; Karray, Emna; Naceur, Mohamed Saber
2016-10-01
High-dimensional data applications have earned great attention in recent years. We focus on remote sensing data analysis on high-dimensional space like hyperspectral data. From a methodological viewpoint, remote sensing data analysis is not a trivial task. Its complexity is caused by many factors, such as large spectral or spatial variability as well as the curse of dimensionality. The latter describes the problem of data sparseness. In this particular ill-posed problem, a reliable classification approach requires appropriate modeling of the classification process. The proposed approach is based on a hierarchical clustering algorithm in order to deal with remote sensing data in high-dimensional space. Indeed, one obvious method to perform dimensionality reduction is to use the independent component analysis process as a preprocessing step. The first particularity of our method is the special structure of its cluster tree. Most of the hierarchical algorithms associate leaves to individual clusters, and start from a large number of individual classes equal to the number of pixels; however, in our approach, leaves are associated with the most relevant sources which are represented according to mutually independent axes to specifically represent some land covers associated with a limited number of clusters. These sources contribute to the refinement of the clustering by providing complementary rather than redundant information. The second particularity of our approach is that at each level of the cluster tree, we combine both a high-level divisive clustering and a low-level agglomerative clustering. This approach reduces the computational cost since the high-level divisive clustering is controlled by a simple Boolean operator, and optimizes the clustering results since the low-level agglomerative clustering is guided by the most relevant independent sources. 
Then at each new step we obtain a new finer partition that will participate in the clustering process to enhance semantic capabilities and give good identification rates.
Xu, Jiajiong; Tang, Wei; Ma, Jun; Wang, Hong
2017-07-01
Drinking water treatment processes remove undesirable chemicals and microorganisms from source water, which is vital to public health protection. The purpose of this study was to investigate the effects of treatment processes and configuration on the microbiome by comparing microbial community shifts in two series of different treatment processes operated in parallel within a full-scale drinking water treatment plant (DWTP) in Southeast China. Illumina sequencing of 16S rRNA genes of water samples demonstrated little effect of coagulation/sedimentation and pre-oxidation steps on bacterial communities, in contrast to dramatic and concurrent microbial community shifts during ozonation, granular activated carbon treatment, sand filtration, and disinfection for both series. A large number of unique operational taxonomic units (OTUs) at these four treatment steps further illustrated their strong shaping power towards the drinking water microbial communities. Interestingly, multidimensional scaling analysis revealed tight clustering of biofilm samples collected from different treatment steps, with Nitrospira, the nitrite-oxidizing bacteria, noted at higher relative abundances in biofilm compared to water samples. Overall, this study provides a snapshot of step-to-step microbial evolvement in multi-step drinking water treatment systems, and the results provide insight to control and manipulation of the drinking water microbiome via optimization of DWTP design and operation.
Akar, Servet; Solmaz, Dilek; Kasifoglu, Timucin; Bilge, Sule Yasar; Sari, Ismail; Gumus, Zeynep Zehra; Tunca, Mehmet
2016-02-01
The aim of this study was to evaluate whether there are clinical subgroups that may have different prognoses among FMF patients. The cumulative clinical features of a large group of FMF patients [1168 patients, 593 (50.8%) male, mean age 35.3 years (s.d. 12.4)] were studied. To analyse our data and identify groups of FMF patients with similar clinical characteristics, a two-step cluster analysis using log-likelihood distance measures was performed. For clustering the FMF patients, we evaluated the following variables: gender, current age, age at symptom onset, age at diagnosis, presence of major clinical features, variables related with therapy and family history for FMF, renal failure and carriage of M694V. Three distinct groups of FMF patients were identified. Cluster 1 was characterized by a high prevalence of arthritis, pleuritis, erysipelas-like erythema (ELE) and febrile myalgia. The dosage of colchicine and the frequency of amyloidosis were lower in cluster 1. Patients in cluster 2 had an earlier age of disease onset and diagnosis. M694V carriage and amyloidosis prevalence were the highest in cluster 2. This group of patients was using the highest dose of colchicine. Patients in cluster 3 had the lowest prevalence of arthritis, ELE and febrile myalgia. The frequencies of M694V carriage and amyloidosis were lower in cluster 3 than the overall FMF patients. Non-response to colchicine was also slightly lower in cluster 3. Patients with FMF can be clustered into distinct patterns of clinical and genetic manifestations and these patterns may have different prognostic significance. © The Author 2015. Published by Oxford University Press on behalf of the British Society for Rheumatology. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
NASA Astrophysics Data System (ADS)
Yen, Tsung-Wen; Lim, Thong-Leng; Yoon, Tiem-Leong; Lai, S. K.
2017-11-01
We combined a new parametrized density functional tight-binding (DFTB) theory (Fihey et al. 2015) with an unbiased modified basin hopping (MBH) optimization algorithm (Yen and Lai 2015) and applied it to calculate the lowest energy structures of Au clusters. From the calculated topologies and their conformational changes, we find that this DFTB/MBH method is a necessary procedure for a systematic study of the structural development of Au clusters but is somewhat insufficient for a quantitative study. As a result, we propose an extended hybridized algorithm. This improved algorithm proceeds in two steps. In the first step, the DFTB theory is employed to calculate the total energy of the cluster; this step (running the DFTB/MBH optimization for a given number of Monte Carlo steps) is meant to efficiently bring the Au cluster near to the region of the lowest energy minimum, since the cluster as a whole has explicitly considered the interactions of valence electrons with ions, albeit semi-quantitatively. Then, in the second step, the energy-minimum search continues with the energy function calculated by the DFTB theory in the first step replaced by one calculated in the full density functional theory (DFT). In these subsequent calculations, we also couple the DFT energy with the MBH strategy and proceed with the DFT/MBH optimization until the lowest energy value is found. We checked that this extended hybridized algorithm successfully predicts the twisted pyramidal structure for the Au40 cluster and also correctly confirms the linear shape of C8, which our previous DFTB/MBH method failed to do. Perhaps more remarkable is the topological growth of Au_n: it changes from a planar structure (n = 3-11) → an oblate-like cage (n = 12-15) → a hollow-shape cage (n = 16-18) and finally a pyramidal-like cage (n = 19, 20). These varied forms of the cluster's shapes are consistent with those reported in the literature.
Caroleo, Mariarita; Primerano, Amedeo; Rania, Marianna; Aloi, Matteo; Pugliese, Valentina; Magliocco, Fabio; Fazia, Gilda; Filippo, Andrea; Sinopoli, Flora; Ricchio, Marco; Arturi, Franco; Jimenez-Murcia, Susana; Fernandez-Aranda, Fernando; De Fazio, Pasquale; Segura-Garcia, Cristina
2018-02-01
Considering that specific genetic profiles, psychopathological conditions and neurobiological systems underlie human behaviours, the phenotypic differentiation of obese patients according to eating behaviours should be investigated. The aim of this study was to classify obese patients according to their eating behaviours and to compare these clusters in regard to psychopathology, personality traits, neurocognitive patterns and genetic profiles. A total of 201 obese outpatients seeking weight reduction treatment underwent a dietetic visit, psychological and psychiatric assessment and genotyping for SCL6A2 polymorphisms. Eating behaviours were clustered through two-step cluster analysis, and these clusters were subsequently compared. Two groups emerged: cluster 1 contained patients with predominantly prandial hyperphagia, social eating, an increased frequency of the long allele of the 5-HTTLPR and low scores in all tests; and cluster 2 included patients with more emotionally related eating behaviours (emotional eating, grazing, binge eating, night eating, post-dinner eating, craving for carbohydrates), dysfunctional personality traits, neurocognitive impairment, affective disorders and increased frequencies of the short (S) allele and the S/S genotype. Aside from binge eating, dysfunctional eating behaviours were useful symptoms to identify two different phenotypes of obese patients from a comprehensive set of parameters (genetic, clinical, personality and neuropsychology) in this sample. Grazing and emotional eating were the most important predictors for classifying obese patients, followed by binge eating. This clustering overcomes the idea that 'binging' is the predominant altered eating behaviour, and could help physicians other than psychiatrists to identify whether an obese patient has an eating disorder. 
Finally, recognising different types of obesity may not only allow a more comprehensive understanding of this illness, but also make it possible to tailor patient-specific treatment pathways. Copyright © 2017 Elsevier Masson SAS. All rights reserved.
An information theory analysis of spatial decisions in cognitive development
Scott, Nicole M.; Sera, Maria D.; Georgopoulos, Apostolos P.
2015-01-01
Performance in a cognitive task can be considered as the outcome of a decision-making process operating across various knowledge domains or aspects of a single domain. Therefore, an analysis of these decisions in various tasks can shed light on the interplay and integration of these domains (or elements within a single domain) as they are associated with specific task characteristics. In this study, we applied an information theoretic approach to assess quantitatively the gain of knowledge across various elements of the cognitive domain of spatial, relational knowledge, as a function of development. Specifically, we examined changing spatial relational knowledge from ages 5 to 10 years. Our analyses consisted of a two-step process. First, we performed a hierarchical clustering analysis on the decisions made in 16 different tasks of spatial relational knowledge to determine which tasks were performed similarly at each age group as well as to discover how the tasks clustered together. We next used two measures of entropy to capture the gradual emergence of order in the development of relational knowledge. These measures of “cognitive entropy” were defined based on two independent aspects of chunking, namely (1) the number of clusters formed at each age group, and (2) the distribution of tasks across the clusters. We found that both measures of entropy decreased with age in a quadratic fashion and were positively and linearly correlated. The decrease in entropy and, therefore, gain of information during development was accompanied by improved performance. These results document, for the first time, the orderly and progressively structured “chunking” of decisions across the development of spatial relational reasoning and quantify this gain within a formal information-theoretic framework. PMID:25698915
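The second entropy measure, based on how tasks are distributed across clusters, can be illustrated with an ordinary Shannon entropy. The abstract does not give the authors' exact formulas, so the use of base-2 Shannon entropy, the function name and the example cluster sizes below are assumptions.

```python
import math


def cluster_entropy(cluster_sizes):
    """Shannon entropy, in bits, of how items are spread over clusters.

    cluster_sizes[k] is the number of tasks assigned to cluster k
    (illustrative sketch only).
    """
    total = sum(cluster_sizes)
    entropy = 0.0
    for size in cluster_sizes:
        if size > 0:               # empty clusters contribute nothing
            p = size / total
            entropy -= p * math.log2(p)
    return entropy


# Hypothetical partitions of the 16 tasks at two age groups.
scattered = cluster_entropy([4, 4, 4, 4])   # tasks spread over 4 clusters
ordered = cluster_entropy([16])             # all tasks in a single cluster
```

Under this reading, a drop in entropy with age corresponds to tasks collapsing into fewer, more unequal clusters, which matches the abstract's description of a gradual emergence of order.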
SSAW: A new sequence similarity analysis method based on the stationary discrete wavelet transform.
Lin, Jie; Wei, Jing; Adjeroh, Donald; Jiang, Bing-Hua; Jiang, Yue
2018-05-02
Alignment-free sequence similarity analysis methods often lead to significant savings in computational time over alignment-based counterparts. A new alignment-free sequence similarity analysis method, called SSAW, is proposed. SSAW stands for Sequence Similarity Analysis using the Stationary Discrete Wavelet Transform (SDWT). It extracts k-mers from a sequence, then maps each k-mer to a complex number field. Then, the series of complex numbers formed is transformed into feature vectors using the stationary discrete wavelet transform. After these steps, the original sequence is turned into a feature vector with numeric values, which can then be used for clustering and/or classification. Using two different types of applications, namely, clustering and classification, we compared SSAW against state-of-the-art alignment-free sequence analysis methods. SSAW demonstrates competitive or superior performance in terms of standard indicators, such as accuracy, F-score, precision, and recall. The running time was significantly better in most cases. These results make SSAW a suitable method for sequence analysis, especially given the rapidly increasing volumes of sequence data required by most modern applications.
Dynamical evolution of globular-cluster systems in clusters of galaxies
DOE Office of Scientific and Technical Information (OSTI.GOV)
Muzzio, J.C.
1987-04-01
The dynamical processes that affect globular-cluster systems in clusters of galaxies are analyzed. Two-body and impulsive approximations are utilized to study dynamical friction, drag force, tidal stripping, tidal radii, globular-cluster swapping, tidal accretion, and galactic cannibalism. The evolution of galaxies and the collision of galaxies are simulated numerically; the steps involved in the simulation are described. The simulated data are compared with observations. Consideration is given to the number of galaxies, halo extension, location of the galaxies, distribution of the missing mass, nonequilibrium initial conditions, mass dependence, massive central galaxies, globular-cluster distribution, and lost globular clusters. 116 references.
A two-step approach for mining patient treatment pathways in administrative healthcare databases.
Najjar, Ahmed; Reinharz, Daniel; Girouard, Catherine; Gagné, Christian
2018-05-01
Clustering electronic medical records allows the discovery of information on healthcare practices. Entries in such medical records are usually composed of a succession of diagnostics or therapeutic steps. The corresponding processes are complex and heterogeneous since they depend on medical knowledge integrating clinical guidelines, the physician's individual experience, and patient data and conditions. To analyze such data, we are first proposing to cluster medical visits, consultations, and hospital stays into homogeneous groups, and then to construct higher-level patient treatment pathways over these different groups. These pathways are then also clustered to distill typical pathways, enabling interpretation of clusters by experts. This approach is evaluated on a real-world administrative database of elderly people in Québec suffering from heart failures. Copyright © 2018 Elsevier B.V. All rights reserved.
Apparatus for simultaneously disreefing a centrally reefed clustered parachute system
Johnson, Donald W.
1988-01-01
A single multi-line cutter is connected to each of a cluster of parachutes by a separate short tether line that holds the parachutes, initially reefed by closed loop reefing lines, close to one another. The closed loop reefing lines and tether lines, one from each parachute, are disposed within the cutter to be simultaneously cut by its actuation when a central line attached between the payload and the cutter is stretched upon deployment of the cluster. A pyrotechnic or electronic time delay may be included in the cutter to delay the actual simultaneous cutting of all lines until the clustered parachutes attain a measure of stability prior to being disreefed. A second set of reefing lines and second tether lines may be provided for each parachute, to enable a two-stage, separately timed, step-by-step disreefing.
Apparatus for simultaneously disreefing a centrally reefed clustered parachute system
Johnson, D.W.
1988-06-21
A single multi-line cutter is connected to each of a cluster of parachutes by a separate short tether line that holds the parachutes, initially reefed by closed loop reefing lines, close to one another. The closed loop reefing lines and tether lines, one from each parachute, are disposed within the cutter to be simultaneously cut by its actuation when a central line attached between the payload and the cutter is stretched upon deployment of the cluster. A pyrotechnic or electronic time delay may be included in the cutter to delay the actual simultaneous cutting of all lines until the clustered parachutes attain a measure of stability prior to being disreefed. A second set of reefing lines and second tether lines may be provided for each parachute, to enable a two-stage, separately timed, step-by-step disreefing. 13 figs.
NASA Astrophysics Data System (ADS)
van Rossum, Anne C.; Lin, Hai Xiang; Dubbeldam, Johan; van der Herik, H. Jaap
2018-04-01
In machine vision, typical heuristic methods to extract parameterized objects out of raw data points are the Hough transform and RANSAC. Bayesian models carry the promise to optimally extract such parameterized objects given a correct definition of the model and the type of noise at hand. A category of solvers for Bayesian models are Markov chain Monte Carlo (MCMC) methods. Naive implementations of MCMC methods suffer from slow convergence in machine vision due to the complexity of the parameter space. To address this, blocked Gibbs and split-merge samplers have been developed that assign multiple data points to clusters at once. In this paper we introduce a new split-merge sampler, the triadic split-merge sampler, that performs steps between two and three randomly chosen clusters. This has two advantages. First, it reduces the asymmetry between the split and merge steps. Second, it is able to propose a new cluster that is composed of data points from two different clusters. Both advantages speed up convergence, which we demonstrate on a line extraction problem. We show that the triadic split-merge sampler outperforms the conventional split-merge sampler. Although this new MCMC sampler is demonstrated in this machine vision context, its application extends to the very general domain of statistical inference.
Farooq, U; Malecki, I A; Mahmood, M; Martin, G B
2017-06-01
One of the basic steps in objective analysis of sperm motility is the subdivision of a motile sperm population into slow, medium and rapid categories based on their velocity. However, for CASA analysis of quail sperm, the velocity values for categorization of slow, medium and rapid sperm have not yet been standardized. To identify the cut-off values of "velocity curvilinear" (VCL) for quail sperm categorization, we captured and analysed 22,300 tracks of quail sperm using SCA ® -CASA. The median and mean VCL values were 85 and 97 μm/s. To define the VCL cut-off values, we used two methods. In the first, we identified the upper (rapid sperm) and lower (slow sperm) cut-off values using: (i) median VCL ± 25% or ± 50% or ± 75% of median VCL value; (ii) first and third quartile values of VCL data (i.e. 25% cut-off setting); and (iii) 33% and 66% of VCL data. Among these settings, sperm categories and their corresponding motility characteristics recorded using the "25%" setting (i.e. slow ≤36 ≤ medium ≤154 ≤ rapid) were found the most realistic and coherent with male ranking by fertility. In the second method, we calculated heteroscedasticity in the total VCL data using PCA and the two-step clustering method. With this approach, the mean of the high and low clusters was 165 and 51 μm/s, respectively. Together, the mean from two methods suggested that, for SCA ® -CASA categorization of quail sperm, sperm should be classed as "rapid" at VCL ≥160 μm/s and "slow" at VCL ≤45 μm/s. © 2017 Blackwell Verlag GmbH.
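The quartile-based ("25%") setting described in the abstract above can be sketched as follows. The helper names are hypothetical, and the handling of values that fall exactly on a cut-off is an assumption, since the abstract writes both boundaries with "≤".

```python
import statistics


def quartile_cutoffs(vcl_values):
    """First and third quartiles of the VCL data, used as slow/rapid cut-offs."""
    q1, _median, q3 = statistics.quantiles(vcl_values, n=4)
    return q1, q3


def classify(vcl, slow_max, rapid_min):
    """Assign a velocity category; boundary values are assumed to be inclusive."""
    if vcl <= slow_max:
        return "slow"
    if vcl >= rapid_min:
        return "rapid"
    return "medium"


# Using the cut-offs reported for the "25%" setting (36 and 154 μm/s).
for vcl in (30, 97, 160):
    print(vcl, classify(vcl, slow_max=36, rapid_min=154))
```

`statistics.quantiles` uses the "exclusive" method by default; other software may compute quartiles slightly differently, so cut-offs derived from raw tracks need not match the published values exactly.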
Search automation of the generalized method of device operational characteristics improvement
NASA Astrophysics Data System (ADS)
Petrova, I. Yu; Puchkova, A. A.; Zaripova, V. M.
2017-01-01
The article presents brief results of an analysis of existing methods for searching for the closest patents, which can be applied to determine generalized methods of device operational characteristics improvement. The most widespread clustering algorithms and metrics for determining the degree of proximity between two documents were reviewed. The article proposes a technique for determining generalized methods; it has two implementation variants and consists of 7 steps. This technique has been implemented in the “Patents search” subsystem of the “Intellect” system. The article also gives an example of the use of the proposed technique.
Molecular-dynamics simulations of urea nucleation from aqueous solution
Salvalaglio, Matteo; Perego, Claudio; Giberti, Federico; Mazzotti, Marco; Parrinello, Michele
2015-01-01
Despite its ubiquitous character and relevance in many branches of science and engineering, nucleation from solution remains elusive. In this framework, molecular simulations represent a powerful tool to provide insight into nucleation at the molecular scale. In this work, we combine theory and molecular simulations to describe urea nucleation from aqueous solution. Taking advantage of well-tempered metadynamics, we compute the free-energy change associated to the phase transition. We find that such a free-energy profile is characterized by significant finite-size effects that can, however, be accounted for. The description of the nucleation process emerging from our analysis differs from classical nucleation theory. Nucleation of crystal-like clusters is in fact preceded by large concentration fluctuations, indicating a predominant two-step process, whereby embryonic crystal nuclei emerge from dense, disordered urea clusters. Furthermore, in the early stages of nucleation, two different polymorphs are seen to compete. PMID:25492932
Molecular-dynamics simulations of urea nucleation from aqueous solution.
Salvalaglio, Matteo; Perego, Claudio; Giberti, Federico; Mazzotti, Marco; Parrinello, Michele
2015-01-06
Despite its ubiquitous character and relevance in many branches of science and engineering, nucleation from solution remains elusive. In this framework, molecular simulations represent a powerful tool to provide insight into nucleation at the molecular scale. In this work, we combine theory and molecular simulations to describe urea nucleation from aqueous solution. Taking advantage of well-tempered metadynamics, we compute the free-energy change associated to the phase transition. We find that such a free-energy profile is characterized by significant finite-size effects that can, however, be accounted for. The description of the nucleation process emerging from our analysis differs from classical nucleation theory. Nucleation of crystal-like clusters is in fact preceded by large concentration fluctuations, indicating a predominant two-step process, whereby embryonic crystal nuclei emerge from dense, disordered urea clusters. Furthermore, in the early stages of nucleation, two different polymorphs are seen to compete.
An improved K-means clustering method for cDNA microarray image segmentation.
Wang, T N; Li, T J; Shao, G F; Wu, S X
2015-07-14
Microarray technology is a powerful tool for human genetic research and other biomedical applications. Numerous improvements to the standard K-means algorithm have been carried out to complete the image segmentation step. However, most of the previous studies classify the image into two clusters. In this paper, we propose a novel K-means algorithm, which first classifies the image into three clusters, and then one of the three clusters is divided as the background region and the other two clusters, as the foreground region. The proposed method was evaluated on six different data sets. The analyses of accuracy, efficiency, expression values, special gene spots, and noise images demonstrate the effectiveness of our method in improving the segmentation quality.
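A minimal sketch of the three-cluster idea in the abstract above, on a flat list of pixel intensities. The abstract does not say how the background cluster is chosen or how the centres are initialised; here the lowest-intensity cluster is assumed to be background and the centres start at the minimum, midpoint and maximum intensity. This is plain K-means, not the authors' improved variant, and the function names are hypothetical.

```python
def kmeans_1d(values, centers, iters=50):
    """Plain K-means on scalar values, starting from the given centres."""
    centers = list(centers)
    for _ in range(iters):
        groups = [[] for _ in centers]
        for v in values:
            idx = min(range(len(centers)), key=lambda j: abs(v - centers[j]))
            groups[idx].append(v)
        # Keep the old centre if a cluster ends up empty.
        new = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
        if new == centers:
            break
        centers = new
    return centers


def segment(pixels):
    """Label each pixel 0 (background) or 1 (foreground) via three clusters."""
    lo, hi = min(pixels), max(pixels)
    centers = kmeans_1d(pixels, [lo, (lo + hi) / 2, hi])
    # Assumption: the darkest of the three clusters is the background.
    background = min(range(3), key=lambda j: centers[j])
    labels = []
    for p in pixels:
        idx = min(range(3), key=lambda j: abs(p - centers[j]))
        labels.append(0 if idx == background else 1)
    return labels


print(segment([10, 12, 11, 120, 125, 240, 250, 245]))
```

Merging two of the three clusters into the foreground keeps faint spot pixels that a two-cluster split might assign to the background.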
Busch, Vincent; Van Stel, Henk F; Schrijvers, Augustinus J P; de Leeuw, Johannes R J
2013-12-04
Recent studies show several health-related behaviors to cluster in adolescents. This has important implications for public health. Interrelated behaviors have been shown to be most effectively targeted by multimodal interventions addressing wider-ranging improvements in lifestyle instead of via separate interventions targeting individual behaviors. However, few previous studies have taken into account a broad, multi-disciplinary range of health-related behaviors and connected these behavioral patterns to health-related outcomes. This paper presents an analysis of the clustering of a broad range of health-related behaviors with relevant demographic factors and several health-related outcomes in adolescents. Self-report questionnaire data were collected from a sample of 2,690 Dutch high school adolescents. Behavioral patterns were deducted via Principal Components Analysis. Subsequently a Two-Step Cluster Analysis was used to identify groups of adolescents with similar behavioral patterns and health-related outcomes. Four distinct behavioral patterns describe the analyzed individual behaviors: 1- risk-prone behavior, 2- bully behavior, 3- problematic screen time use, and 4- sedentary behavior. Subsequent cluster analysis identified four clusters of adolescents. Multi-problem behavior was associated with problematic physical and psychosocial health outcomes, as opposed to those exerting relatively few unhealthy behaviors. These associations were relatively independent of demographics such as ethnicity, gender and socio-economic status. The results show that health-related behaviors tend to cluster, indicating that specific behavioral patterns underlie individual health behaviors. In addition, specific patterns of health-related behaviors were associated with specific health outcomes and demographic factors. In general, unhealthy behavior on account of multiple health-related behaviors was associated with both poor psychosocial and physical health. 
These findings have significant meaning for future public health programs, which should be more tailored with use of such knowledge on behavioral clustering via e.g. Transfer Learning.
2013-01-01
Background: Recent studies show several health-related behaviors to cluster in adolescents. This has important implications for public health. Interrelated behaviors have been shown to be most effectively targeted by multimodal interventions addressing wider-ranging improvements in lifestyle instead of via separate interventions targeting individual behaviors. However, few previous studies have taken into account a broad, multi-disciplinary range of health-related behaviors and connected these behavioral patterns to health-related outcomes. This paper presents an analysis of the clustering of a broad range of health-related behaviors with relevant demographic factors and several health-related outcomes in adolescents. Methods: Self-report questionnaire data were collected from a sample of 2,690 Dutch high school adolescents. Behavioral patterns were deducted via Principal Components Analysis. Subsequently a Two-Step Cluster Analysis was used to identify groups of adolescents with similar behavioral patterns and health-related outcomes. Results: Four distinct behavioral patterns describe the analyzed individual behaviors: 1- risk-prone behavior, 2- bully behavior, 3- problematic screen time use, and 4- sedentary behavior. Subsequent cluster analysis identified four clusters of adolescents. Multi-problem behavior was associated with problematic physical and psychosocial health outcomes, as opposed to those exerting relatively few unhealthy behaviors. These associations were relatively independent of demographics such as ethnicity, gender and socio-economic status. Conclusions: The results show that health-related behaviors tend to cluster, indicating that specific behavioral patterns underlie individual health behaviors. In addition, specific patterns of health-related behaviors were associated with specific health outcomes and demographic factors. In general, unhealthy behavior on account of multiple health-related behaviors was associated with both poor psychosocial and physical health. These findings have significant meaning for future public health programs, which should be more tailored with use of such knowledge on behavioral clustering via e.g. Transfer Learning. PMID:24305509
Fleury, Marie-Josée; Grenier, Guy; Bamvita, Jean-Marie
2017-11-13
This study developed a typology describing change in the perceived adequacy of help received among 204 individuals with severe mental disorders, 5 years after transfer to the community following a major mental health reform in Quebec (Canada). Participant typologies were constructed using a two-step cluster analysis. There were significant differences between T0 and T2 for perceived adequacy of help received and other independent variables, including seriousness of needs, help from services or relatives, and care continuity. Five classes emerged from the analysis. Perceived adequacy of help received at T2 increased for Class 1, mainly comprised of older women with mood disorders. Overall, greater care continuity and levels of help from services and relatives related to higher perceived adequacy of help received. Changes in perceived adequacy of help received resulting from several combinations of associated variables indicate that mental health service delivery should respond to specific profiles and determinants.
NASA Astrophysics Data System (ADS)
Wang, Yi-Min; Li, Cheng-Zu
2010-01-01
We propose theoretical schemes to generate highly entangled cluster states with superconducting qubits in a circuit QED architecture. Charge qubits are located inside a superconducting transmission line, which serves as a quantum data bus. We show that large cluster states can be efficiently generated in just one step with the long-range Ising-like unitary operators. The quantum operations, which are generally realized by one of two coupling mechanisms (voltage coupling or current coupling), depend only on global geometric features and are insensitive not only to the thermal state of the transmission line but also to certain random operation errors. Thus high-fidelity one-way quantum computation can be achieved.
Characteristics of Brazilian Offenders and Victims of Interpersonal Violence: An Exploratory Study.
d'Avila, Sérgio; Campos, Ana Cristina; Bernardino, Ítalo de Macedo; Cavalcante, Gigliana Maria Sobral; Nóbrega, Lorena Marques da; Ferreira, Efigênia Ferreira E
2016-10-01
The aim of this study was to characterize the profile of Brazilian offenders and victims of interpersonal violence, following a medicolegal and forensic perspective. A cross-sectional and exploratory study was performed in a Center of Forensic Medicine and Dentistry. The sample was made up of 1,704 victims of nonlethal interpersonal violence with some type of trauma. The victims were subject to forensic examinations by a criminal investigative team that identified and recorded the extent of the injuries. For data collection, a specific form was designed consisting of four parts according to the information provided in the medicolegal and social records: sociodemographic data of the victims, offender's characteristics, aggression characteristics, and types of injuries. Descriptive and multivariate statistics using cluster analysis (CA) were performed. The two-step cluster method was used to characterize the profile of the victims and offenders. Most of the events occurred during the nighttime (50.9%) and on weekdays (66.3%). Soft tissue injuries were the most prevalent type (94.6%). Based on the CA results, two clusters for the victims and two for the offenders were identified. Victims: Cluster 1 was formed typically by women, aged 30 to 59 years, and married; Cluster 2 was composed of men, aged 20 to 29 years, and unmarried. Offenders: Cluster 1 was characterized by men, who perpetrated violence in a community environment. Cluster 2 was formed by men, who perpetrated violence in the familiar environment. These findings revealed different risk groups with distinct characteristics for both victims and offenders, allowing the planning of targeted measures of care, prevention, and health promotion. This study assesses the profile of violence through morbidity data and significantly contributes to building an integrated system of health surveillance in Brazil, as well as linking police stations, forensic services, and emergency hospitals.
Damianos, Konstantina; Ferrando, Riccardo
2012-02-21
The structural modifications of small supported gold clusters caused by realistic surface defects (steps) in the MgO(001) support are investigated by computational methods. The most stable gold cluster structures on a stepped MgO(001) surface are searched for in the size range up to 24 Au atoms, and locally optimized by density-functional calculations. Several structural motifs are found within energy differences of 1 eV: inclined leaflets, arched leaflets, pyramidal hollow cages and compact structures. We show that the interaction with the step clearly modifies the structures with respect to adsorption on the flat defect-free surface. We find that leaflet structures clearly dominate for smaller sizes. These leaflets are either inclined and quasi-horizontal, or arched, at variance with the case of the flat surface in which vertical leaflets prevail. With increasing cluster size pyramidal hollow cages begin to compete against leaflet structures. Cage structures become more and more favourable as size increases. The only exception is size 20, at which the tetrahedron is found as the most stable isomer. This tetrahedron is however quite distorted. The comparison of two different exchange-correlation functionals (Perdew-Burke-Ernzerhof and local density approximation) shows the same qualitative trends. This journal is © The Royal Society of Chemistry 2012
Van Landuyt, K L; Peumans, M; Fieuws, S; De Munck, J; Cardoso, M V; Ermis, R B; Lambrechts, P; Van Meerbeek, B
2008-10-01
One-step self-etch adhesives are the most recent generation of adhesives introduced onto the market. The objective of this randomized controlled clinical trial was to test the hypothesis that a one-step self-etch adhesive performs equally well as a conventional three-step etch&rinse adhesive (gold standard). Fifty-two patients had 267 non-carious cervical lesions restored with Gradia Direct Anterior (GC). These composite restorations were bonded either with the 'all-in-one' adhesive G-Bond (GC) or with the three-step etch&rinse adhesive Optibond FL (Kerr). The restorations were evaluated after 6 and 12 months clinical service regarding their retention, marginal integrity and discoloration, caries occurrence, preservation of tooth vitality and post-operative sensitivity. Retention loss, severe marginal defects and/or discoloration that needed intervention (repair or replacement) and the occurrence of caries were considered as clinical failures. A logistic regression analysis with generalized estimating equations was used to account for the clustered data (multiple restorations per patient). The recall rate at 1 year was 98%. The statistical analysis revealed a relatively low patient factor, indicating that supplementary information could be obtained from the additional restorations placed per patient. The retention rate for G-Bond was 98.5% compared to 99.3% for Optibond FL, due to the retention loss of two and one restorations, respectively. There were no significant differences between the two adhesives regarding the evaluated parameters except for the presence of small enamel marginal defects with G-Bond. After 12 months, the simplified one-step G-Bond and the three-step Optibond FL were clinically equally successful, even though both adhesives were characterized by progressive degradation of marginal adaptation, and G-Bond exhibited more small enamel marginal defects.
NASA Astrophysics Data System (ADS)
Bruynooghe, Michel M.
1998-04-01
In this paper, we present a robust method for automatic object detection and delineation in noisy complex images. The proposed procedure is a three stage process that integrates image segmentation by multidimensional pixel clustering and geometrically constrained optimization of deformable contours. The first step is to enhance the original image by nonlinear unsharp masking. The second step is to segment the enhanced image by multidimensional pixel clustering, using our reducible neighborhoods clustering algorithm that has a very interesting theoretical maximal complexity. Then, candidate objects are extracted and initially delineated by an optimized region merging algorithm, that is based on ascendant hierarchical clustering with contiguity constraints and on the maximization of average contour gradients. The third step is to optimize the delineation of previously extracted and initially delineated objects. Deformable object contours have been modeled by cubic splines. An affine invariant has been used to control the undesired formation of cusps and loops. Non linear constrained optimization has been used to maximize the external energy. This avoids the difficult and non reproducible choice of regularization parameters, that are required by classical snake models. The proposed method has been applied successfully to the detection of fine and subtle microcalcifications in X-ray mammographic images, to defect detection by moire image analysis, and to the analysis of microrugosities of thin metallic films. The later implementation of the proposed method on a digital signal processor associated to a vector coprocessor would allow the design of a real-time object detection and delineation system for applications in medical imaging and in industrial computer vision.
A VTVH MCD and EPR Spectroscopic Study of the Maturation of the "Second" Nitrogenase P-Cluster.
Rupnik, Kresimir; Lee, Chi Chung; Hu, Yilin; Ribbe, Markus W; Hales, Brian J
2018-04-16
The P-cluster of the nitrogenase MoFe protein is a [Fe8S7] cluster that mediates efficient transfer of electrons to the active site for substrate reduction. Arguably the most complex homometallic FeS cluster found in nature, the biosynthetic mechanism of the P-cluster is of considerable theoretical and synthetic interest to chemists and biochemists alike. Previous studies have revealed a biphasic assembly mechanism of the two P-clusters in the MoFe protein upon incubation with Fe protein and ATP, in which the first P-cluster is formed through fast fusion of a pair of [Fe4S4]+ clusters within 5 min and the second P-cluster is formed through slow fusion of the second pair of [Fe4S4]+ clusters in a period of 2 h. Here we report a VTVH MCD and EPR spectroscopic study of the biosynthesis of the slow-forming, second P-cluster within the MoFe protein. Our results show that the first major step in the formation of the second P-cluster is the conversion of one of the precursor [Fe4S4]+ clusters into the integer spin cluster [Fe4S3-4]α, a process aided by the assembly protein NifZ, whereas the second major biosynthetic step appears to be the formation of a diamagnetic cluster with a possible structure of [Fe8S7-8]β, which is eventually converted into the P-cluster.
Chen, Bin; Kim, Hyunmi; Keasler, Samuel J; Nellas, Ricky B
2008-04-03
The aggregation-volume-bias Monte Carlo based simulation technique, which has led to our recent success in vapor-liquid nucleation research, was extended to the study of crystal nucleation processes. In contrast to conventional bulk-phase techniques, this method deals with crystal nucleation events in cluster systems. This approach was applied to the crystal nucleation of Lennard-Jonesium under a wide range of undercooling conditions from 35% to 13% below the triple point. It was found that crystal nucleation in these model clusters proceeds initially via a vapor-liquid like aggregation followed by the formation of crystals inside the aggregates. The separation of these two stages of nucleation is distinct except at deeper undercooling conditions where the crystal nucleation barrier was found to diminish. The simulation results obtained for these two nucleation steps are separately compared to the classical nucleation theory (CNT). For the vapor-liquid nucleation step, the CNT was shown to provide a reasonable description of the critical cluster size but overestimate the barrier heights, consistent with previous simulation studies. On the contrary, for the crystal nucleation step, nearly perfect agreement with the barrier heights was found between the simulations and the CNT. For the critical cluster size, the comparison is more difficult as the simulation data were found to be sensitive to the definition of the solid cluster, but a stringent criterion and lower undercooling conditions generally lead to results closer with the CNT. Additional simulations at undercooling conditions of 40% or above indicate a nearly barrierless transition from the liquid to crystalline-like structure for sufficiently large clusters, which leads to further departure of the barrier height predicted by the CNT from the simulation data for the aggregation step. 
This is consistent with the latest experimental results on argon that show an unusually large underestimation of the nucleation rate by the CNT toward deep undercooling conditions.
Huang, Rao; Lo, Li-Ta; Wen, Yuhua; Voter, Arthur F; Perez, Danny
2017-10-21
Modern molecular-dynamics-based techniques are extremely powerful to investigate the dynamical evolution of materials. With the increase in sophistication of the simulation techniques and the ubiquity of massively parallel computing platforms, atomistic simulations now generate very large amounts of data, which have to be carefully analyzed in order to reveal key features of the underlying trajectories, including the nature and characteristics of the relevant reaction pathways. We show that clustering algorithms, such as the Perron Cluster Cluster Analysis, can provide reduced representations that greatly facilitate the interpretation of complex trajectories. To illustrate this point, clustering tools are used to identify the key kinetic steps in complex accelerated molecular dynamics trajectories exhibiting shape fluctuations in Pt nanoclusters. This analysis provides an easily interpretable coarse representation of the reaction pathways in terms of a handful of clusters, in contrast to the raw trajectory that contains thousands of unique states and tens of thousands of transitions.
NASA Astrophysics Data System (ADS)
Huang, Rao; Lo, Li-Ta; Wen, Yuhua; Voter, Arthur F.; Perez, Danny
2017-10-01
Modern molecular-dynamics-based techniques are extremely powerful to investigate the dynamical evolution of materials. With the increase in sophistication of the simulation techniques and the ubiquity of massively parallel computing platforms, atomistic simulations now generate very large amounts of data, which have to be carefully analyzed in order to reveal key features of the underlying trajectories, including the nature and characteristics of the relevant reaction pathways. We show that clustering algorithms, such as the Perron Cluster Cluster Analysis, can provide reduced representations that greatly facilitate the interpretation of complex trajectories. To illustrate this point, clustering tools are used to identify the key kinetic steps in complex accelerated molecular dynamics trajectories exhibiting shape fluctuations in Pt nanoclusters. This analysis provides an easily interpretable coarse representation of the reaction pathways in terms of a handful of clusters, in contrast to the raw trajectory that contains thousands of unique states and tens of thousands of transitions.
Vautier, S; Jmel, S; Fourio, C; Moncany, D
2007-09-01
The present study investigates the heterogeneity of the population of young adult drinkers with respect to alcohol consumption and Positive Alcohol Expectancies (PAEs). Based on the positive relationship between both kinds of variables, PAE is commonly viewed as a potential motivational factor of alcoholic addiction. Empirical analyses based on the regression of alcohol consumption on PAEs suppose that the observations are statistically homogeneous with respect to the level of alcohol consumption, however. We explored the existence of moderate drinkers with a high PAE profile, and abusive drinkers with a low PAE profile. 1,017 young adult drinkers, mean age=23 +/- 2.84, with various educational levels, comprising 506 males and 511 females, were recruited as voluntary participants in a survey by undergraduate psychology students from the University of Toulouse Le Mirail. They completed a French version of the Alcohol Use Disorders Identification Test (AUDIT) and a French adaptation of the Alcohol Expectancy Questionnaire (AEQ). Three levels of alcohol consumption were defined using the AUDIT score, and six composite scores were obtained by averaging the relevant item-scores from the AEQ. The AEQ scores were interpreted as measurement of six kinds of PAEs, namely Global positive change, Sexual enhancement, Social and physical pleasure, Social assertiveness, Relaxation, and Arousal/Power. The TwoStep cluster methodology was used to explore the data. This methodology is convenient to deal with a mix of quantitative and qualitative variables, and it provides a classification model which is optimized through the use of an information criterion such as Schwarz's Bayesian Information Criterion (BIC). The automatic clustering suggested five clusters, whose stability was ascertained until 75% of the sample size. Low drinkers (n=527) were split into one cluster of low PAEs (I1) and, interestingly, one cluster of high PAEs (I3, 46%). 
High drinkers (n=344) were split into one cluster of intermediate PAEs (II4) and one cluster of high PAEs (II5, 52%). Interestingly again, abusive drinkers (n=146) remained a single group (III2), exhibiting high PAEs. Clusters I3 and III3 comprised a significant proportion of males. Constraining the algorithm to find 6 clusters did not affect class III2, but split low drinkers into three clusters. Although the present results should be considered cautiously because of the novelty of TwoStep cluster methodology, they suggest a group of moderate drinkers with high PAEs. Also, abusive drinkers express high PAEs (except for 2 cases). Statistical homogeneity of moderate drinkers with respect to PAE variables appears as a dubious assumption.
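As a reminder of how an information criterion of this kind selects the number of clusters, the generic Schwarz criterion can be written as below. This is the textbook definition only; the symbols J (number of clusters), m_J (number of free parameters of the J-cluster model) and N (sample size, 1,017 here) are notation assumed for illustration, and the exact cluster-specific form used by the TwoStep procedure is not given in the abstract.

```latex
\mathrm{BIC}(J) = -2 \ln \hat{L}_J + m_J \ln N,
\qquad
\hat{J} = \arg\min_{J} \mathrm{BIC}(J)
```

Here \hat{L}_J is the maximized likelihood of the J-cluster model. The penalty term m_J \ln N grows with model complexity, so adding clusters is only rewarded when the gain in fit outweighs the extra parameters.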
NASA Astrophysics Data System (ADS)
Badrzadeh, Honey; Sarukkalige, Ranjan; Jayawardena, A. W.
2013-12-01
Discrete wavelet transform was applied to decompose ANN and ANFIS inputs. A novel approach of WNF with subtractive clustering was applied for flow forecasting. Forecasting was performed 1-5 steps ahead, using multivariate inputs. Forecasting accuracy of peak values and longer lead times improved significantly.
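To illustrate what decomposing an input series with a discrete wavelet transform involves, here is a minimal sketch of a single-level Haar transform. The Haar wavelet, the function names and the example flow values are assumptions made for illustration; the highlights above do not state which wavelet family or decomposition level was used.

```python
import math


def haar_dwt(signal):
    """One level of the Haar DWT: returns (approximation, detail) coefficients."""
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    # Pairwise scaled sums capture the smooth trend of the series.
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    # Pairwise scaled differences capture the local fluctuation.
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_idwt(approx, detail):
    """Inverse of haar_dwt: rebuilds the original series from both bands."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / s)
        out.append((a - d) / s)
    return out


# Hypothetical flow values, for illustration only.
flow = [4.0, 6.0, 10.0, 12.0]
approx, detail = haar_dwt(flow)
rebuilt = haar_idwt(approx, detail)
```

In a forecasting setup of this kind, the approximation and detail series would typically be fed to the model as separate inputs instead of the raw series.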
Room-temperature current blockade in atomically defined single-cluster junctions
NASA Astrophysics Data System (ADS)
Lovat, Giacomo; Choi, Bonnie; Paley, Daniel W.; Steigerwald, Michael L.; Venkataraman, Latha; Roy, Xavier
2017-11-01
Fabricating nanoscopic devices capable of manipulating and processing single units of charge is an essential step towards creating functional devices where quantum effects dominate transport characteristics. The archetypal single-electron transistor comprises a small conducting or semiconducting island separated from two metallic reservoirs by insulating barriers. By enabling the transfer of a well-defined number of charge carriers between the island and the reservoirs, such a device may enable discrete single-electron operations. Here, we describe a single-molecule junction comprising a redox-active, atomically precise cobalt chalcogenide cluster wired between two nanoscopic electrodes. We observe current blockade at room temperature in thousands of single-cluster junctions. Below a threshold voltage, charge transfer across the junction is suppressed. The device is turned on when the temporary occupation of the core states by a transiting carrier is energetically enabled, resulting in a sequential tunnelling process and an increase in current by a factor of ∼600. We perform in situ and ex situ cyclic voltammetry as well as density functional theory calculations to unveil a two-step process mediated by an orbital localized on the core of the cluster in which charge carriers reside before tunnelling to the collector reservoir. As the bias window of the junction is opened wide enough to include one of the cluster frontier orbitals, the current blockade is lifted and charge carriers can tunnel sequentially across the junction.
Evans, Christopher M; Love, Alyssa M; Weiss, Emily A
2012-10-17
This article reports control of the competition between step-growth and living chain-growth polymerization mechanisms in the formation of cadmium chalcogenide colloidal quantum dots (QDs) from CdSe(S) clusters by varying the concentration of anionic surfactant in the synthetic reaction mixture. The growth of the particles proceeds by step-addition from initially nucleated clusters in the absence of excess phosphinic or carboxylic acids, which adsorb as their anionic conjugate bases, and proceeds indirectly by dissolution of clusters, and subsequent chain-addition of monomers to stable clusters (Ostwald ripening) in the presence of excess phosphinic or carboxylic acid. Fusion of clusters by step-growth polymerization is an explanation for the consistent observation of so-called "magic-sized" clusters in QD growth reactions. Living chain-addition (chain addition with no explicit termination step) produces QDs over a larger range of sizes with better size dispersity than step-addition. Tuning the molar ratio of surfactant to Se(2-)(S(2-)), the limiting ionic reagent, within the living chain-addition polymerization allows for stoichiometric control of QD radius without relying on reaction time.
Improving clustering with metabolic pathway data.
Milone, Diego H; Stegmayer, Georgina; López, Mariana; Kamenetzky, Laura; Carrari, Fernando
2014-04-10
It is a common practice in bioinformatics to validate each group returned by a clustering algorithm through manual analysis, according to a-priori biological knowledge. This procedure helps finding functionally related patterns to propose hypotheses for their behavior and the biological processes involved. Therefore, this knowledge is used only as a second step, after data are just clustered according to their expression patterns. Thus, it could be very useful to be able to improve the clustering of biological data by incorporating prior knowledge into the cluster formation itself, in order to enhance the biological value of the clusters. A novel training algorithm for clustering is presented, which evaluates the biological internal connections of the data points while the clusters are being formed. Within this training algorithm, the calculation of distances among data points and neurons centroids includes a new term based on information from well-known metabolic pathways. The standard self-organizing map (SOM) training versus the biologically-inspired SOM (bSOM) training were tested with two real data sets of transcripts and metabolites from Solanum lycopersicum and Arabidopsis thaliana species. Classical data mining validation measures were used to evaluate the clustering solutions obtained by both algorithms. Moreover, a new measure that takes into account the biological connectivity of the clusters was applied. The results of bSOM show important improvements in the convergence and performance for the proposed clustering method in comparison to standard SOM training, in particular, from the application point of view. Analyses of the clusters obtained with bSOM indicate that including biological information during training can certainly increase the biological value of the clusters found with the proposed method. 
It is worth highlighting that this fact has effectively improved the results, which can simplify their further analysis. The algorithm is available as a web-demo at http://fich.unl.edu.ar/sinc/web-demo/bsom-lite/. The source code and the data sets supporting the results of this article are available at http://sourceforge.net/projects/sourcesinc/files/bsom.
Gathering Real World Evidence with Cluster Analysis for Clinical Decision Support.
Xia, Eryu; Liu, Haifeng; Li, Jing; Mei, Jing; Li, Xuejun; Xu, Enliang; Li, Xiang; Hu, Gang; Xie, Guotong; Xu, Meilin
2017-01-01
Clinical decision support systems are information technology systems that assist clinical decision-making tasks, which have been shown to enhance clinical performance. Cluster analysis, which groups similar patients together, aims to separate patient cases into phenotypically heterogeneous groups and to define therapeutically homogeneous patient subclasses. Useful as it is, the application of cluster analysis in clinical decision support systems is less reported. Here, we describe the usage of cluster analysis in clinical decision support systems, by first dividing patient cases into similar groups and then providing diagnosis or treatment suggestions based on the group profiles. This integration provides data for clinical decisions and compiles a wide range of clinical practices to inform the performance of individual clinicians. We also include an example usage of the system under the scenario of blood lipid management in type 2 diabetes. These efforts represent a step toward promoting patient-centered care and enabling precision medicine.
NASA Astrophysics Data System (ADS)
Wismüller, Axel; DSouza, Adora M.; Abidin, Anas Z.; Wang, Xixi; Hobbs, Susan K.; Nagarajan, Mahesh B.
2015-03-01
Echo state networks (ESN) are recurrent neural networks where the hidden layer is replaced with a fixed reservoir of neurons. Unlike feed-forward networks, neuron training in ESN is restricted to the output neurons alone thereby providing a computational advantage. We demonstrate the use of such ESNs in our mutual connectivity analysis (MCA) framework for recovering the primary motor cortex network associated with hand movement from resting state functional MRI (fMRI) data. Such a framework consists of two steps - (1) defining a pair-wise affinity matrix between different pixel time series within the brain to characterize network activity and (2) recovering network components from the affinity matrix with non-metric clustering. Here, ESNs are used to evaluate pair-wise cross-estimation performance between pixel time series to create the affinity matrix, which is subsequently subject to non-metric clustering with the Louvain method. For comparison, the ground truth of the motor cortex network structure is established with a task-based fMRI sequence. Overlap between the primary motor cortex network recovered with our model free MCA approach and the ground truth was measured with the Dice coefficient. Our results show that network recovery with our proposed MCA approach is in close agreement with the ground truth. Such network recovery is achieved without requiring low-pass filtering of the time series ensembles prior to analysis, an fMRI preprocessing step that has courted controversy in recent years. Thus, we conclude our MCA framework can allow recovery and visualization of the underlying functionally connected networks in the brain on resting state fMRI.
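The Dice coefficient used above to score overlap between the recovered network and the ground truth can be sketched as follows. This is a minimal illustration on flat binary masks; the function name and the toy masks are assumptions, not data or code from the study.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice overlap 2|A ∩ B| / (|A| + |B|) for two equal-length binary masks."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same length")
    intersection = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    size_a = sum(1 for a in mask_a if a)
    size_b = sum(1 for b in mask_b if b)
    if size_a + size_b == 0:
        return 1.0  # convention chosen here: two empty masks agree perfectly
    return 2.0 * intersection / (size_a + size_b)


# Hypothetical pixel memberships (1 = pixel belongs to the network).
recovered = [1, 1, 0, 1, 0, 0]
ground_truth = [1, 0, 0, 1, 1, 0]
score = dice_coefficient(recovered, ground_truth)
```

A score of 1 means the two pixel sets coincide and 0 means they share no pixels; in the toy example two of the three pixels in each mask overlap, giving 2/3.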
Holden, Richard J; Kulanthaivel, Anand; Purkayastha, Saptarshi; Goggins, Kathryn M; Kripalani, Sunil
2017-12-01
Personas are a canonical user-centered design method increasingly used in health informatics research. Personas, empirically derived user archetypes, can be used by eHealth designers to gain a robust understanding of their target end users such as patients. The objective was to develop biopsychosocial personas of older patients with heart failure using quantitative analysis of survey data. Data were collected using standardized surveys and medical record abstraction from 32 older adults with heart failure recently hospitalized for acute heart failure exacerbation. Hierarchical cluster analysis was performed on a final dataset of n=30. Nonparametric analyses were used to identify differences between clusters on 30 clustering variables and seven outcome variables. Six clusters were produced, ranging in size from two to eight patients per cluster. Clusters differed significantly on these biopsychosocial domains and subdomains: demographics (age, sex); medical status (comorbid diabetes); functional status (exhaustion, household work ability, hygiene care ability, physical ability); psychological status (depression, health literacy, numeracy); technology (Internet availability); healthcare system (visit by home healthcare, trust in providers); social context (informal caregiver support, cohabitation, marital status); and economic context (employment status). Tabular and narrative persona descriptions provide an easy reference guide for informatics designers. Personas development using approaches such as clustering of structured survey data is an important tool for health informatics professionals. We describe insights from our study of patients with heart failure, then recommend a generic ten-step personas development process. Methodological strengths and limitations of the study and of personas development generally are discussed. Copyright © 2017 Elsevier B.V. All rights reserved.
Severity of Post-stroke Aphasia According to Aphasia Type and Lesion Location in Koreans
Kang, Eun Kyoung; Sohn, Hae Min; Han, Moon-Ku; Kim, Won; Han, Tai Ryoon
2010-01-01
To determine the relations between post-stroke aphasia severity and aphasia type and lesion location, a retrospective review was undertaken using the medical records of 97 Korean patients, treated within 90 days of onset, for aphasia caused by unilateral left hemispheric stroke. Types of aphasia were classified according to the validated Korean version of the Western Aphasia Battery (K-WAB), and severities of aphasia were quantified using WAB Aphasia Quotients (AQ). Lesion locations were classified as cortical or subcortical, and were determined by magnetic resonance imaging. Two-step cluster analysis was performed using AQ values to classify aphasia severity by aphasia type and lesion location. Cluster analysis resulted in four severity clusters: 1) mild; anomic type, 2) moderate; Wernicke's, transcortical motor, transcortical sensory, conduction, and mixed transcortical types, 3) moderately severe; Broca's aphasia, and 4) severe; global aphasia, and also in three lesion location clusters: 1) mild; subcortical, 2) moderate; cortical lesions involving Broca's and/or Wernicke's areas, and 3) severe; insular and cortical lesions not in Broca's or Wernicke's areas. These results revealed that within 3 months of stroke, global aphasia was the most severely affected type and cortical lesions were more likely to affect language function than subcortical lesions. PMID:20052357
Progressive myoclonic epilepsies
Michelucci, Roberto; Canafoglia, Laura; Striano, Pasquale; Gambardella, Antonio; Magaudda, Adriana; Tinuper, Paolo; La Neve, Angela; Ferlazzo, Edoardo; Gobbi, Giuseppe; Giallonardo, Anna Teresa; Capovilla, Giuseppe; Visani, Elisa; Panzica, Ferruccio; Avanzini, Giuliano; Tassinari, Carlo Alberto; Bianchi, Amedeo; Zara, Federico
2014-01-01
Objective: To define the clinical spectrum and etiology of progressive myoclonic epilepsies (PMEs) in Italy using a database developed by the Genetics Commission of the Italian League against Epilepsy. Methods: We collected clinical and laboratory data from patients referred to 25 Italian epilepsy centers regardless of whether a positive causative factor was identified. PMEs of undetermined origins were grouped using 2-step cluster analysis. Results: We collected clinical data from 204 patients, including 77 with a diagnosis of Unverricht-Lundborg disease and 37 with a diagnosis of Lafora body disease; 31 patients had PMEs due to rarer genetic causes, mainly neuronal ceroid lipofuscinoses. Two more patients had celiac disease. Despite extensive investigation, we found no definitive etiology for 57 patients. Cluster analysis indicated that these patients could be grouped into 2 clusters defined by age at disease onset, age at myoclonus onset, previous psychomotor delay, seizure characteristics, photosensitivity, associated signs other than those included in the cardinal definition of PME, and pathologic MRI findings. Conclusions: Information concerning the distribution of different genetic causes of PMEs may provide a framework for an updated diagnostic workup. Phenotypes of the patients with PME of undetermined cause varied widely. The presence of separate clusters suggests that novel forms of PME are yet to be clinically and genetically characterized. PMID:24384641
Progressive myoclonic epilepsies: definitive and still undetermined causes.
Franceschetti, Silvana; Michelucci, Roberto; Canafoglia, Laura; Striano, Pasquale; Gambardella, Antonio; Magaudda, Adriana; Tinuper, Paolo; La Neve, Angela; Ferlazzo, Edoardo; Gobbi, Giuseppe; Giallonardo, Anna Teresa; Capovilla, Giuseppe; Visani, Elisa; Panzica, Ferruccio; Avanzini, Giuliano; Tassinari, Carlo Alberto; Bianchi, Amedeo; Zara, Federico
2014-02-04
To define the clinical spectrum and etiology of progressive myoclonic epilepsies (PMEs) in Italy using a database developed by the Genetics Commission of the Italian League against Epilepsy. We collected clinical and laboratory data from patients referred to 25 Italian epilepsy centers regardless of whether a positive causative factor was identified. PMEs of undetermined origins were grouped using 2-step cluster analysis. We collected clinical data from 204 patients, including 77 with a diagnosis of Unverricht-Lundborg disease and 37 with a diagnosis of Lafora body disease; 31 patients had PMEs due to rarer genetic causes, mainly neuronal ceroid lipofuscinoses. Two more patients had celiac disease. Despite extensive investigation, we found no definitive etiology for 57 patients. Cluster analysis indicated that these patients could be grouped into 2 clusters defined by age at disease onset, age at myoclonus onset, previous psychomotor delay, seizure characteristics, photosensitivity, associated signs other than those included in the cardinal definition of PME, and pathologic MRI findings. Information concerning the distribution of different genetic causes of PMEs may provide a framework for an updated diagnostic workup. Phenotypes of the patients with PME of undetermined cause varied widely. The presence of separate clusters suggests that novel forms of PME are yet to be clinically and genetically characterized.
Classification Order of Surface-Confined Intermixing at Epitaxial Interface
NASA Astrophysics Data System (ADS)
Michailov, M.
The self-organization phenomena at epitaxial interface hold special attention in contemporary material science. Being relevant to the fundamental physical problem of competing, long-range and short-range atomic interactions in systems with reduced dimensionality, these phenomena have found exacting academic interest. They are also of great technological importance for their ability to bring spontaneous formation of regular nanoscale surface patterns and superlattices with exotic properties. The basic phenomenon involved in this process is surface diffusion. That is the motivation behind the present study which deals with important details of diffusion scenarios that control the fine atomic structure of epitaxial interface. Comprising surface imperfections (terraces, steps, kinks, and vacancies), the interface offers a variety of barriers for surface diffusion. Therefore, the adatoms and clusters need a certain critical energy to overcome the corresponding diffusion barriers. In the most general case the critical energies can be attained by variation of the system temperature. Hence, their values define temperature limits of system energy gaps associated with different diffusion scenarios. This systematization implies a classification order of surface alloying: blocked, incomplete, and complete. On that background, two diffusion problems, related to the atomic-scale surface morphology, will be discussed. The first problem deals with diffusion of atomic clusters on atomically smooth interface. On flat domains, far from terraces and steps, we analyzed the impact of size, shape, and cluster/substrate lattice misfit on the diffusion behavior of atomic clusters (islands). We found that the lattice constant of small clusters depends on the number N of building atoms at 1 < N ≤ 10. In heteroepitaxy, this effect of variable lattice constant originates from the enhanced charge transfer and the strong influence of the surface potential on cluster atomic arrangement.
At constant temperature, the variation of the lattice constant leads to variable misfit which affects the island migration. The cluster/substrate commensurability influences the oscillation behavior of the diffusion coefficient caused by variation in the cluster shape. We discuss the results in a physical model that implies cluster diffusion with size-dependent cluster/substrate misfit. The second problem is devoted to diffusion phenomena in the vicinity of atomic terraces on stepped or vicinal surfaces. Here, we develop a computational model that refines important details of diffusion behavior of adatoms accounting for the energy barriers at specific atomic sites (smooth domains, terraces, and steps) located on the crystal surface. The dynamic competition between energy gained by mixing and substrate strain energy results in a diffusion scenario where adatoms form alloyed islands and alloyed stripes in the vicinity of terrace edges. Being in agreement with recent experimental findings, the observed effect of stripe and island alloy formation opens up a way for regular surface patterns to be configured at different atomic levels on the crystal surface. The complete surface alloying of the entire interface layer is also briefly discussed with critical analysis and classification of experimental findings and simulation data.
A Typology of Social Workers in Long-Term Care Facilities in Israel.
Lev, Sagit; Ayalon, Liat
2018-04-01
This article explores moral distress among long-term care facility (LTCF) social workers by examining the relationships between moral distress and environmental and personal features. Based on these features, authors identified a typology of LTCF social workers and how they handle moral distress. Such a typology can assist in the identification of social workers who are in a particular need for assistance. Overall, 216 LTCF social workers took part in the study. A two-step cluster analysis was conducted to identify a typology of LTCF social workers based on features such as ethical environment, support in workplace, mastery, and resilience. The variance of the identified clusters and their associations with moral distress were examined, and four clusters of LTCF social workers were identified. The clusters varied from each other in relation to their personal and environmental features and in relation to their experience of moral distress. The article concludes with a discussion of the importance of developing programs for LTCF social workers that provide support and enhancement of personal resources and an adequate and ethical environment for practice.
Kim, Hee-Sook; Eun, Sang Jun; Hwang, Jin Yong; Lee, Kun-Sei; Cho, Sung-Il
2018-05-01
Most patients with acute myocardial infarction (AMI) experience more than one symptom at onset. Although symptoms are an important early indicator, patients and physicians may have difficulty interpreting symptoms and detecting AMI at an early stage. This study aimed to identify symptom clusters among Korean patients with ST-elevation myocardial infarction (STEMI), to examine the relationship between symptom clusters and patient-related variables, and to investigate the influence of symptom clusters on treatment time delay (decision time [DT], onset-to-balloon time [OTB]). This was a prospective multicenter study with a descriptive design that used face-to-face interviews. A total of 342 patients with STEMI were included in this study. To identify symptom clusters, two-step cluster analysis was performed using SPSS software. Multinomial logistic regression to explore factors related to each cluster and multiple logistic regression to determine the effect of symptom clusters on treatment time delay were conducted. Three symptom clusters were identified: cluster 1 (classic MI; characterized by chest pain); cluster 2 (stress symptoms; sweating and chest pain); and cluster 3 (multiple symptoms; dizziness, sweating, chest pain, weakness, and dyspnea). Compared with patients in clusters 2 and 3, those in cluster 1 were more likely to have diabetes or prior MI. Patients in clusters 2 and 3, who predominantly showed other symptoms in addition to chest pain, had a significantly shorter DT and OTB than those in cluster 1. In conclusion, to decrease treatment time delay, it seems important that patients and clinicians recognize symptom clusters, rather than relying on chest pain alone. Further research is necessary to translate our findings into clinical practice and to improve patient education and public education campaigns.
Regional Classification of Traditional Japanese Folk Songs
NASA Astrophysics Data System (ADS)
Kawase, Akihiro; Tokosumi, Akifumi
In this study, we focus on the melodies of Japanese folk songs, and examine the basic structures of Japanese folk songs that represent the characteristics of different regions. We sample the five largest song genres within the music corpora of the Nihon Min-yo Taikan (Anthology of Japanese Folk Songs), consisting of 202,246 tones from 1,794 song pieces from 45 prefectures in Japan. Then, we calculate the probabilities of 24 transition patterns that fill the interval of the perfect fourth pitch, which is the interval that maintains most of the frequency for one-step and two-step pitch transitions within 11 regions, in order to determine the parameters for cluster analysis. As a result, we successfully classify the regions into two basic groups, eastern Japan and western Japan, which correspond to geographical factors and cultural backgrounds, and also match accent distributions in the Japanese language.
Real-time observation of formation and relaxation dynamics of NH4 in (CH3OH)m(NH3)n clusters.
Yamada, Yuji; Nishino, Yoko; Fujihara, Akimasa; Ishikawa, Haruki; Fuke, Kiyokazu
2009-03-26
The formation and relaxation dynamics of NH4(CH3OH)m(NH3)n clusters produced by photolysis of ammonia-methanol mixed clusters has been observed by a time-resolved pump-probe method with femtosecond pulse lasers. From the detailed analysis of the time evolutions of the protonated cluster ions, NH4(+)(CH3OH)m(NH3)n, the kinetic model has been constructed, which consists of sequential three-step reaction: ultrafast hydrogen-atom transfer producing the radical pair (NH4-NH2)*, the relaxation process of radical-pair clusters, and dissociation of the solvated NH4 clusters. The initial hydrogen transfer hardly occurs between ammonia and methanol, implying the unfavorable formation of radical pair, (CH3OH2-NH2)*. The remarkable dependence of the time constants in each step on the number and composition of solvents has been explained by the following factors: hydrogen delocalization within the clusters, the internal conversion of the excited-state radical pair, and the stabilization of NH4 by solvation. The dependence of the time profiles on the probe wavelength is attributed to the different ionization efficiency of the NH4(CH3OH)m(NH3)n clusters.
Vavougios, George D; George D, George; Pastaka, Chaido; Zarogiannis, Sotirios G; Gourgoulianis, Konstantinos I
2016-02-01
Phenotyping obstructive sleep apnea syndrome's comorbidity has been attempted for the first time only recently. The aim of our study was to determine phenotypes of comorbidity in obstructive sleep apnea syndrome patients employing a data-driven approach. Data from 1472 consecutive patient records were recovered from our hospital's database. Categorical principal component analysis and two-step clustering were employed to detect distinct clusters in the data. Univariate comparisons between clusters included one-way analysis of variance with Bonferroni correction and chi-square tests. Predictors of pairwise cluster membership were determined via a binary logistic regression model. The analyses revealed six distinct clusters: A, 'healthy, reporting sleeping related symptoms'; B, 'mild obstructive sleep apnea syndrome without significant comorbidities'; C1: 'moderate obstructive sleep apnea syndrome, obesity, without significant comorbidities'; C2: 'moderate obstructive sleep apnea syndrome with severe comorbidity, obesity and the exclusive inclusion of stroke'; D1: 'severe obstructive sleep apnea syndrome and obesity without comorbidity and a 33.8% prevalence of hypertension'; and D2: 'severe obstructive sleep apnea syndrome with severe comorbidities, along with the highest Epworth Sleepiness Scale score and highest body mass index'. Clusters differed significantly in apnea-hypopnea index, oxygen desaturation index; arousal index; age, body mass index, minimum oxygen saturation and daytime oxygen saturation (one-way analysis of variance P < 0.0001). Binary logistic regression indicated that older age, greater body mass index, lower daytime oxygen saturation and hypertension were associated independently with an increased risk of belonging in a comorbid cluster. Six distinct phenotypes of obstructive sleep apnea syndrome and its comorbidities were identified. 
Mapping the heterogeneity of the obstructive sleep apnea syndrome may help the early identification of at-risk groups. Finally, determining predictors of comorbidity for the moderate and severe strata of these phenotypes implies a need to take these factors into account when considering obstructive sleep apnea syndrome treatment options. © 2015 The Authors. Journal of Sleep Research published by John Wiley & Sons Ltd on behalf of European Sleep Research Society.
Lara, Jose; McCrum, Leigh-Ann; Mathers, John C
2014-11-01
Health behaviours including diet, smoking, alcohol consumption, and physical activity, predict health risks at the population level. We explored health behaviours, barriers to healthy eating and self-rated health among individuals of retirement age. Study design: 82 men and 124 women participated in an observational, cross-sectional online survey. Main outcome measures: A 14-item Mediterranean diet score (MDPS), perceived barriers to healthy eating (PBHE), self-reported smoking, physical activity habits, and current and prior perceived health status (PHS) were assessed. A health behaviours score (HBS) including smoking, physical activity, body mass index (BMI) and MDPS was created to evaluate associations with PHS. Two-step cluster analysis identified natural groups based on PBHE. Analysis of variance was used to evaluate between-group comparisons. PBHE number was associated with BMI (r=0.28, P<0.001), age (r=-0.19; P=0.006), and MDPS (r=-0.31; P<0.001). PBHE cluster analysis produced three clusters. Cluster-1 members (busy lifestyle) were significantly younger (57 years), more overweight (28kg/m(2)), scored lower on MDPS (4.7) and reported more PBHE (7). Cluster-3 members (no characteristic PBHE) were leaner (25kg/m(2)), reported the lowest number of PBHE (2), and scored higher on HBS (2.7) and MDPS (6.2). Those in PHS categories, bad/fair, good, and very good, reported mean HBS of 2.0, 2.4 and 3.0, respectively (P<0.001). Compared with the previous year, no significant associations between PHS and HBS were observed. PBHE clusters were associated with BMI, MDPS and PHS and could be a useful tool to tailor interventions for those of peri-retirement age. Copyright © 2014 The Authors. Published by Elsevier Ireland Ltd. All rights reserved.
An Evaluation of Second Step: What Are the Benefits for Youth With and Without Disabilities?
ERIC Educational Resources Information Center
Sullivan, Terri N.; Sutherland, Kevin S.; Farrell, Albert D.; Taylor, Katherine A.
2015-01-01
The impact of a school-based violence prevention program, Second Step, on peer victimization and aggression, and emotion regulation was evaluated among 457 sixth graders. A cluster-randomized trial was conducted with classrooms randomly assigned to intervention (n = 14) or control (n = 14) conditions. A repeated measures analysis of covariance on…
A novel data-mining approach leveraging social media to monitor consumer opinion of sitagliptin.
Akay, Altug; Dragomir, Andrei; Erlandsson, Björn-Erik
2015-01-01
A novel data mining method was developed to gauge the experience of the drug Sitagliptin (trade name Januvia) by patients with diabetes mellitus type 2. To this end, we devised a two-step analysis framework. Initial exploratory analysis using self-organizing maps was performed to determine structures based on user opinions among the forum posts. The results were a compilation of user clusters and their correlated (positive or negative) opinion of the drug. Subsequent modeling using network analysis methods was used to determine influential users among the forum members. These findings can open new avenues of research into rapid data collection, feedback, and analysis that can enable improved outcomes and solutions for public health and important feedback for the manufacturer.
A holistic image segmentation framework for cloud detection and extraction
NASA Astrophysics Data System (ADS)
Shen, Dan; Xu, Haotian; Blasch, Erik; Horvath, Gregory; Pham, Khanh; Zheng, Yufeng; Ling, Haibin; Chen, Genshe
2013-05-01
Atmospheric clouds are commonly encountered phenomena affecting visual tracking from air-borne or space-borne sensors. Generally clouds are difficult to detect and extract because they are complex in shape and interact with sunlight in a complex fashion. In this paper, we propose an image segmentation approach based on a game-theoretic clustering formulation to identify, extract, and patch clouds. In our framework, the first step is to decompose a given image containing clouds. The problem of image segmentation is considered as a "clustering game". Within this context, the notion of a cluster is equivalent to a classical equilibrium concept from game theory, as the game equilibrium reflects both the internal and external (e.g., two-player) cluster conditions. To obtain the evolutionarily stable strategies, we explore three evolutionary dynamics: fictitious play, replicator dynamics, and infection and immunization dynamics (InImDyn). Secondly, we use the boundary and shape features to refine the cloud segments. This step can lower the false alarm rate. In the third step, we remove the detected clouds and patch the empty spots by performing background recovery. A demonstration of our cloud detection framework on a video clip provides supportive results.
Systems analysis and improvement to optimize pMTCT (SAIA): a cluster randomized trial
2014-01-01
Background Despite significant increases in global health investment and the availability of low-cost, efficacious interventions to prevent mother-to-child HIV transmission (pMTCT) in low- and middle-income countries with high HIV burden, the translation of scientific advances into effective delivery strategies has been slow, uneven and incomplete. As a result, pediatric HIV infection remains largely uncontrolled. A five-step, facility-level systems analysis and improvement intervention (SAIA) was designed to maximize effectiveness of pMTCT service provision by improving understanding of inefficiencies (step one: cascade analysis), guiding identification and prioritization of low-cost workflow modifications (step two: value stream mapping), and iteratively testing and redesigning these modifications (steps three through five). This protocol describes the SAIA intervention and methods to evaluate the intervention’s impact on reducing drop-offs along the pMTCT cascade. Methods This study employs a two-arm, longitudinal cluster randomized trial design. The unit of randomization is the health facility. A total of 90 facilities were identified in Côte d’Ivoire, Kenya and Mozambique (30 per country). A subset was randomly selected and assigned to intervention and comparison arms, stratified by country and service volume, resulting in 18 intervention and 18 comparison facilities across all three countries, with six intervention and six comparison facilities per country. The SAIA intervention will be implemented for six months in the 18 intervention facilities. 
Primary trial outcomes are designed to assess improvements in the pMTCT service cascade, and include the percentage of pregnant women being tested for HIV at the first antenatal care visit, the percentage of HIV-infected pregnant women receiving adequate prophylaxis or combination antiretroviral therapy in pregnancy, and the percentage of newborns exposed to HIV in pregnancy receiving an HIV diagnosis eight weeks postpartum. The Consolidated Framework for Implementation Research (CFIR) will guide collection and analysis of qualitative data on implementation process. Discussion This study is a pragmatic trial that has the potential benefit of improving maternal and infant outcomes by reducing drop-offs along the pMTCT cascade. The SAIA intervention is designed to provide simple tools to guide decision-making for pMTCT program staff at the facility level, and to identify low cost, contextually appropriate pMTCT improvement strategies. Trial registration ClinicalTrials.gov NCT02023658 PMID:24885976
Systems analysis and improvement to optimize pMTCT (SAIA): a cluster randomized trial.
Sherr, Kenneth; Gimbel, Sarah; Rustagi, Alison; Nduati, Ruth; Cuembelo, Fatima; Farquhar, Carey; Wasserheit, Judith; Gloyd, Stephen
2014-05-08
Despite significant increases in global health investment and the availability of low-cost, efficacious interventions to prevent mother-to-child HIV transmission (pMTCT) in low- and middle-income countries with high HIV burden, the translation of scientific advances into effective delivery strategies has been slow, uneven and incomplete. As a result, pediatric HIV infection remains largely uncontrolled. A five-step, facility-level systems analysis and improvement intervention (SAIA) was designed to maximize effectiveness of pMTCT service provision by improving understanding of inefficiencies (step one: cascade analysis), guiding identification and prioritization of low-cost workflow modifications (step two: value stream mapping), and iteratively testing and redesigning these modifications (steps three through five). This protocol describes the SAIA intervention and methods to evaluate the intervention's impact on reducing drop-offs along the pMTCT cascade. This study employs a two-arm, longitudinal cluster randomized trial design. The unit of randomization is the health facility. A total of 90 facilities were identified in Côte d'Ivoire, Kenya and Mozambique (30 per country). A subset was randomly selected and assigned to intervention and comparison arms, stratified by country and service volume, resulting in 18 intervention and 18 comparison facilities across all three countries, with six intervention and six comparison facilities per country. The SAIA intervention will be implemented for six months in the 18 intervention facilities. Primary trial outcomes are designed to assess improvements in the pMTCT service cascade, and include the percentage of pregnant women being tested for HIV at the first antenatal care visit, the percentage of HIV-infected pregnant women receiving adequate prophylaxis or combination antiretroviral therapy in pregnancy, and the percentage of newborns exposed to HIV in pregnancy receiving an HIV diagnosis eight weeks postpartum. 
The Consolidated Framework for Implementation Research (CFIR) will guide collection and analysis of qualitative data on implementation process. This study is a pragmatic trial that has the potential benefit of improving maternal and infant outcomes by reducing drop-offs along the pMTCT cascade. The SAIA intervention is designed to provide simple tools to guide decision-making for pMTCT program staff at the facility level, and to identify low cost, contextually appropriate pMTCT improvement strategies. ClinicalTrials.gov NCT02023658.
NASA Astrophysics Data System (ADS)
Andryani, Diyah Septi; Bustamam, Alhadi; Lestari, Dian
2017-03-01
Clustering aims to classify different patterns into groups called clusters. In this clustering method, we use n-mer frequencies to calculate the distance matrix, which is considered more accurate than using DNA alignment. The clustering results can be used to discover biologically important sub-sections and groups of genes. Many clustering methods have been developed; hard clustering methods are considered less accurate than fuzzy clustering methods, especially for data containing outliers. Among fuzzy clustering methods, fuzzy c-means is one of the best known for its accuracy and simplicity. Fuzzy c-means clustering uses a membership function variable, which describes how likely a data point is to belong to a cluster, and works by minimizing an objective function. The parameter of the membership function is used as a weighting factor, also called the fuzzifier. In this study we implement hybrid clustering using fuzzy c-means and a divisive algorithm, which can improve the accuracy of cluster membership compared with a traditional partitional approach alone. Fuzzy c-means is used in the first step to find the partition results. A divisive algorithm then runs in the second step to find sub-clusters and the dendrogram of the phylogenetic tree. The best number of clusters is determined using the minimum value of the Davies-Bouldin Index (DBI) of the cluster results. The results show that the method introduced in this paper is better than other partitioning methods. We found 3 clusters with a DBI value of 1.126628 at the first step of clustering. Moreover, the DBI values after the second step of clustering are always smaller than those obtained using first-step clustering only. This indicates that the hybrid approach in this study produces better cluster results in terms of DBI values.
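The Davies-Bouldin Index used above to choose the number of clusters can be sketched as follows. This is a minimal illustration of the standard definition (mean distance to the centroid as within-cluster scatter, Euclidean distance between centroids), not the authors' implementation; the function names and the toy points are assumptions made for the example.

```python
import math


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def davies_bouldin(clusters):
    """Davies-Bouldin Index for a hard partition (a list of point lists).

    Lower values indicate compact, well-separated clusters.
    Assumes at least two clusters with distinct centroids.
    """
    cents = [centroid(c) for c in clusters]
    # Scatter of each cluster: mean distance of its points to its centroid.
    scatter = [
        sum(math.dist(p, m) for p in c) / len(c)
        for c, m in zip(clusters, cents)
    ]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        # Worst-case similarity of cluster i to any other cluster.
        total += max(
            (scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
            for j in range(k)
            if j != i
        )
    return total / k


# Hypothetical toy data: the same two groups, once tight and once spread out.
tight = [[(0, 0), (0, 2)], [(10, 0), (10, 2)]]
loose = [[(0, 0), (0, 8)], [(10, 0), (10, 8)]]
print(davies_bouldin(tight), davies_bouldin(loose))
```

For a fuzzy partition such as the one fuzzy c-means produces, each point would first have to be assigned to its highest-membership cluster before this function applies; that step is omitted here.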
Liston, Adam D; De Munck, Jan C; Hamandi, Khalid; Laufs, Helmut; Ossenblok, Pauly; Duncan, John S; Lemieux, Louis
2006-07-01
Simultaneous acquisition of EEG and fMRI data enables the investigation of the hemodynamic correlates of interictal epileptiform discharges (IEDs) during the resting state in patients with epilepsy. This paper addresses two issues: (1) the semi-automation of IED classification in statistical modelling for fMRI analysis and (2) the improvement of IED detection to increase experimental fMRI efficiency. For patients with multiple IED generators, sensitivity to IED-correlated BOLD signal changes can be improved when the fMRI analysis model distinguishes between IEDs of differing morphology and field. In an attempt to reduce the subjectivity of visual IED classification, we implemented a semi-automated system, based on the spatio-temporal clustering of EEG events. We illustrate the technique's usefulness using EEG-fMRI data from a subject with focal epilepsy in whom 202 IEDs were visually identified and then clustered semi-automatically into four clusters. Each cluster of IEDs was modelled separately for the purpose of fMRI analysis. This revealed IED-correlated BOLD activations in distinct regions corresponding to three different IED categories. In a second step, Signal Space Projection (SSP) was used to project the scalp EEG onto the dipoles corresponding to each IED cluster. This resulted in 123 previously unrecognised IEDs, the inclusion of which, in the General Linear Model (GLM), increased the experimental efficiency as reflected by significant BOLD activations. We have also shown that the detection of extra IEDs is robust in the face of fluctuations in the set of visually detected IEDs. We conclude that automated IED classification can result in more objective fMRI models of IEDs and significantly increased sensitivity.
Kristunas, Caroline A; Smith, Karen L; Gray, Laura J
2017-03-07
The current methodology for sample size calculations for stepped-wedge cluster randomised trials (SW-CRTs) is based on the assumption of equal cluster sizes. However, as is often the case in cluster randomised trials (CRTs), the clusters in SW-CRTs are likely to vary in size, which in other designs of CRT leads to a reduction in power. The effect of an imbalance in cluster size on the power of SW-CRTs has not previously been reported, nor what an appropriate adjustment to the sample size calculation should be to allow for any imbalance. We aimed to assess the impact of an imbalance in cluster size on the power of a cross-sectional SW-CRT and recommend a method for calculating the sample size of a SW-CRT when there is an imbalance in cluster size. The effect of varying degrees of imbalance in cluster size on the power of SW-CRTs was investigated using simulations. The sample size was calculated using both the standard method and two proposed adjusted design effects (DEs), based on those suggested for CRTs with unequal cluster sizes. The data were analysed using generalised estimating equations with an exchangeable correlation matrix and robust standard errors. An imbalance in cluster size was not found to have a notable effect on the power of SW-CRTs. The two proposed adjusted DEs resulted in trials that were generally considerably over-powered. We recommend that the standard method of sample size calculation for SW-CRTs be used, provided that the assumptions of the method hold. However, it would be beneficial to investigate, through simulation, what effect the maximum likely amount of inequality in cluster sizes would have on the power of the trial and whether any inflation of the sample size would be required.
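To make the idea of a design effect adjusted for unequal cluster sizes concrete, the sketch below compares the usual parallel-CRT design effect, 1 + (m - 1)ρ, with one widely used adjustment based on the coefficient of variation (CV) of cluster size, 1 + ((CV² + 1)m̄ - 1)ρ. These are parallel-CRT formulas shown for illustration only; the abstract does not state the two adjusted DEs the authors proposed for SW-CRTs, and the function names, cluster sizes and ICC below are assumptions.

```python
from statistics import mean, pstdev


def design_effect_equal(m, icc):
    """Standard design effect for a parallel CRT with equal cluster size m."""
    return 1 + (m - 1) * icc


def design_effect_unequal(sizes, icc):
    """CV-based design effect for a parallel CRT with varying cluster sizes.

    Uses the population standard deviation of the sizes; using the sample
    standard deviation instead is an equally reasonable choice.
    """
    m_bar = mean(sizes)
    cv = pstdev(sizes) / m_bar
    return 1 + ((cv ** 2 + 1) * m_bar - 1) * icc


# Hypothetical example: same mean cluster size, with and without imbalance.
icc = 0.05
print(design_effect_equal(20, icc))
print(design_effect_unequal([20, 20, 20], icc))
print(design_effect_unequal([10, 20, 30], icc))
```

With no variation in size the adjusted formula reduces to the standard one, and any imbalance inflates it, which is consistent with the abstract's finding that adjusted DEs can over-power a trial when imbalance has little real effect.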
Sekigami, Yuka; Kobayashi, Takuya; Omi, Ai; Nishitsuji, Koki; Ikuta, Tetsuro; Fujiyama, Asao; Satoh, Noriyuki; Saiga, Hidetoshi
2017-01-01
Hox gene clusters with at least 13 paralog group (PG) members are common in vertebrate genomes and in that of amphioxus. Ascidians, which belong to the subphylum Tunicata (Urochordata), are phylogenetically positioned between vertebrates and amphioxus, and traditionally divided into two groups: the Pleurogona and the Enterogona. An enterogonan ascidian, Ciona intestinalis (Ci), possesses nine Hox genes localized on two chromosomes; thus, the Hox gene cluster is disintegrated. We investigated the Hox gene cluster of a pleurogonan ascidian, Halocynthia roretzi (Hr), to determine whether Hox gene cluster disintegration is common among ascidians, and if so, how such disintegration occurred during ascidian or tunicate evolution. Our phylogenetic analysis reveals that the Hr Hox gene complement comprises nine members, including one with a relatively divergent Hox homeodomain sequence. Eight of nine Hr Hox genes were orthologous to Ci-Hox1, 2, 3, 4, 5, 10, 12 and 13. Following the phylogenetic classification into 13 PGs, we designated Hr Hox genes as Hox1, 2, 3, 4, 5, 10, 11/12/13.a, 11/12/13.b and HoxX. To address the chromosomal arrangement of the nine Hox genes, we performed two-color chromosomal fluorescent in situ hybridization, which revealed that the nine Hox genes are localized on a single chromosome in Hr, distinct from their arrangement in Ci. We further examined the order of the nine Hox genes on the chromosome by chromosome/scaffold walking. This analysis suggested a gene order of Hox1, 11/12/13.b, 11/12/13.a, 10, 5, X, followed by either Hox4, 3, 2 or Hox2, 3, 4 on the chromosome. Based on the present results and those previously reported in Ci, we discuss the establishment of the Hox gene complement and disintegration of Hox gene clusters during the course of ascidian or tunicate evolution. The Hox gene cluster and the genome must have experienced extensive reorganization during the course of evolution from the ancestral tunicate to Hr and Ci.
Nevertheless, some features are shared in Hox gene components and gene arrangement on the chromosomes, suggesting that Hox gene cluster disintegration in ascidians involved early events common to tunicates as well as later ascidian lineage-specific events.
Fe-S Cluster Hsp70 Chaperones: The ATPase Cycle and Protein Interactions.
Dutkiewicz, Rafal; Nowak, Malgorzata; Craig, Elizabeth A; Marszalek, Jaroslaw
2017-01-01
Hsp70 chaperones and their obligatory J-protein cochaperones function together in many cellular processes. Via cycles of binding to short stretches of exposed amino acids on substrate proteins, Hsp70/J-protein chaperones not only facilitate protein folding but also drive intracellular protein transport, biogenesis of cellular structures, and disassembly of protein complexes. The biogenesis of iron-sulfur (Fe-S) clusters is one of the critical cellular processes that require Hsp70/J-protein action. Fe-S clusters are ubiquitous cofactors critical for activity of proteins performing diverse functions in, for example, metabolism, RNA/DNA transactions, and environmental sensing. This biogenesis process can be divided into two sequential steps: first, the assembly of an Fe-S cluster on a conserved scaffold protein, and second, the transfer of the cluster from the scaffold to a recipient protein. The second step involves Hsp70/J-protein chaperones. Via binding to the scaffold, chaperones enable cluster transfer to recipient proteins. In eukaryotic cells mitochondria have a key role in Fe-S cluster biogenesis. In this review, we focus on methods that enabled us to dissect protein interactions critical for the function of Hsp70/J-protein chaperones in the mitochondrial process of Fe-S cluster biogenesis in the yeast Saccharomyces cerevisiae. © 2017 Elsevier Inc. All rights reserved.
Huang, Yangxin; Lu, Xiaosun; Chen, Jiaqing; Liang, Juan; Zangmeister, Miriam
2017-10-27
Longitudinal and time-to-event data are often observed together. Finite mixture models are currently used to analyze nonlinear heterogeneous longitudinal data, which, by releasing the homogeneity restriction of nonlinear mixed-effects (NLME) models, can cluster individuals into one of the pre-specified classes with class membership probabilities. This clustering may have clinical significance, and be associated with clinically important time-to-event data. This article develops a joint modeling approach to a finite mixture of NLME models for longitudinal data and proportional hazard Cox model for time-to-event data, linked by individual latent class indicators, under a Bayesian framework. The proposed joint models and method are applied to a real AIDS clinical trial data set, followed by simulation studies to assess the performance of the proposed joint model and a naive two-step model, in which finite mixture model and Cox model are fitted separately.
NASA Astrophysics Data System (ADS)
Corsaro, Enrico; Lee, Yueh-Ning; García, Rafael A.; Hennebelle, Patrick; Mathur, Savita; Beck, Paul G.; Mathis, Stephane; Stello, Dennis; Bouvier, Jérôme
2017-10-01
Stars originate by the gravitational collapse of a turbulent molecular cloud of a diffuse medium, and are often observed to form clusters. Stellar clusters therefore play an important role in our understanding of star formation and of the dynamical processes at play. However, investigating cluster formation is difficult because the density of the molecular cloud undergoes a change of many orders of magnitude. Hierarchical-step approaches to decompose the problem into different stages are therefore required, as well as reliable assumptions on the initial conditions in the clouds. We report for the first time the use of the full potential of NASA Kepler asteroseismic observations coupled with 3D numerical simulations, to put strong constraints on the early formation stages of open clusters. Thanks to a Bayesian peak bagging analysis of about 50 red giant members of NGC 6791 and NGC 6819, the two most populated open clusters observed in the nominal Kepler mission, we derive a complete set of detailed oscillation mode properties for each star, with thousands of oscillation modes characterized. We therefore show how these asteroseismic properties lead us to a discovery about the rotation history of stellar clusters. Finally, our observational findings will be compared with hydrodynamical simulations for stellar cluster formation to constrain the physical processes of turbulence, rotation, and magnetic fields that are in action during the collapse of the progenitor cloud into a proto-cluster.
NASA Astrophysics Data System (ADS)
Chung, Yongjin; Christwardana, Marcelinus; Tannia, Daniel Chris; Kim, Ki Jae; Kwon, Yongchai
2017-08-01
An enzyme cluster composite (TPA/GOx) formed from glucose oxidase (GOx) and terephthalaldehyde (TPA) that is coated onto polyethyleneimine (PEI) and carbon nanotubes (CNTs) is suggested as a new catalyst ([(TPA/GOx)/PEI]/CNT). In this catalyst, TPA promotes inter-GOx links by crosslinking to form a large and porous structure, and the TPA/GOx composite is again crosslinked with PEI/CNT to increase the amount of immobilized GOx. Such a two-step crosslinking (i) increases electron transfer because of electron delocalization by π conjugation and (ii) reduces GOx denaturation because of the formation of strong chemical bonds while its porosity facilitates mass transfer. With these features, an enzymatic biofuel cell (EBC) employing the new catalyst is fabricated and delivers an excellent maximum power density (1.62 ± 0.08 mW cm⁻²), while the catalytic activity of the [(TPA/GOx)/PEI]/CNT catalyst is outstanding. This is clear evidence that the two-step crosslinking and porous structure caused by adoption of the TPA/GOx composite enhance the performance of the EBC.
Cluster observations of ion dispersion discontinuities in the polar cusp
NASA Astrophysics Data System (ADS)
Escoubet, C. P.; Berchem, J.; Pitout, F.; Richard, R. L.; Trattner, K. J.; Grison, B.; Taylor, M. G.; Masson, A.; Dunlop, M. W.; Dandouras, I. S.; Reme, H.; Fazakerley, A. N.
2009-12-01
The reconnection between the interplanetary magnetic field (IMF) and the Earth’s magnetic field is taking place at the magnetopause on magnetic field lines threading through the polar cusp. When the IMF is southward, reconnection occurs near the subsolar point, which is magnetically connected to the equatorward boundary of the polar cusp. Subsequently the ions injected through the reconnection point precipitate in the cusp and are dispersed poleward. If reconnection is continuous and operates at constant rate, the ion dispersion is smooth and continuous. On the other hand, if the reconnection rate varies, we expect interruptions in the dispersion, forming energy steps or a staircase. Similarly, multiple entries near the magnetopause could also produce steps at low or mid-altitude when a spacecraft successively crosses the field lines originating from these multiple sources. In addition, motion of the magnetopause induced by solar wind pressure changes or erosion due to reconnection can also induce a motion of the polar cusp and a disruption of the ion dispersion observed by a spacecraft. Cluster, with four spacecraft following each other in the mid-altitude cusp, can be used to distinguish between these “temporal” and “spatial” effects. We will present a cusp crossing with two spacecraft, separated by around two minutes. The two spacecraft observed a very similar dispersion with a step in energy in its centre and two other dispersions poleward. We will show that the steps could be temporal (assuming that the time between two reconnection bursts corresponds to the time delay between the two spacecraft) but it would be a fortuitous coincidence. On the other hand, the steps and the two poleward dispersions could be explained by spatial effects if we take into account the motion of the open-closed boundary between the two spacecraft crossings.
Debelle, Aurelien; Boulle, Alexandre; Chartier, Alain; ...
2014-11-25
We present a combination of experimental and computational evaluations of disorder level and lattice swelling in ion-irradiated materials. Information obtained from X-ray diffraction experiments is compared to X-ray diffraction data generated using atomic-scale simulations. The proposed methodology, which can be applied to a wide range of crystalline materials, is used to study the amorphization process in irradiated SiC. Results show that this process can be divided into two steps. In the first step, point defects and small defect clusters are produced and generate both large lattice swelling and high elastic energy. In the second step, enhanced coalescence of defects and defect clusters occurs to limit this increase in energy, which rapidly leads to complete amorphization.
Border preserving skin lesion segmentation
NASA Astrophysics Data System (ADS)
Kamali, Mostafa; Samei, Golnoosh
2008-03-01
Melanoma is a fatal cancer with a growing incidence rate. However, it can be cured if diagnosed in its early stages. The first step in detecting melanoma is the separation of the skin lesion from healthy skin. There are particular features associated with a malignant lesion whose successful detection relies upon accurately extracted borders. We propose a two-step approach. First, we apply the K-means clustering method (to 3D RGB space), which extracts relatively accurate borders. In the second step we perform an extra refining step to detect the fading area around some lesions as accurately as possible. Our method has a number of novelties. Firstly, as the clustering method is directly applied to the 3D color space, we do not overlook the dependencies between different color channels. In addition, it is capable of extracting fine lesion borders up to pixel level in spite of the difficulties associated with fading areas around the lesion. Performing clustering in different color spaces reveals that 3D RGB color space is preferred. The application of the proposed algorithm to an extensive database of skin lesions shows that its performance is superior to that of existing methods both in terms of accuracy and computational complexity.
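The first step described above, K-means applied directly to pixels in 3D RGB space, can be sketched as follows. This is a plain Lloyd's-algorithm illustration with two clusters, not the authors' code; the function name, the fixed initial centroids and the four sample pixels are assumptions, and the refinement step for fading lesion areas is not shown.

```python
import math


def kmeans_rgb(pixels, centroids, iters=20):
    """Cluster RGB tuples with Lloyd's algorithm from given initial centroids.

    Returns (labels, centroids). All three channels are used jointly, so
    dependencies between color channels are preserved.
    """
    centroids = [tuple(c) for c in centroids]
    for _ in range(iters):
        # Assignment step: attach each pixel to its nearest centroid.
        groups = [[] for _ in centroids]
        for p in pixels:
            idx = min(range(len(centroids)),
                      key=lambda i: math.dist(p, centroids[i]))
            groups[idx].append(p)
        # Update step: move each centroid to the mean of its pixels;
        # an empty cluster keeps its previous centroid.
        updated = [
            tuple(sum(ch) / len(g) for ch in zip(*g)) if g else c
            for g, c in zip(groups, centroids)
        ]
        if updated == centroids:
            break
        centroids = updated
    labels = [
        min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
        for p in pixels
    ]
    return labels, centroids


# Hypothetical pixels: two light (skin-like) and two dark (lesion-like).
pixels = [(200, 170, 150), (210, 180, 160), (60, 40, 30), (70, 50, 35)]
labels, centroids = kmeans_rgb(pixels, [(255, 255, 255), (0, 0, 0)])
print(labels, centroids)
```

In an image, the label array would be reshaped to the image dimensions, and the border would lie where neighbouring pixels carry different labels; treating the darker cluster as the lesion is an assumption of this example.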
Photon-Induced Thermal Desorption of CO from Small Metal-Carbonyl Clusters
NASA Astrophysics Data System (ADS)
Lüttgens, G.; Pontius, N.; Bechthold, P. S.; Neeb, M.; Eberhardt, W.
2002-02-01
Thermal CO desorption from photoexcited free metal-carbonyl clusters has been resolved in real time using two-color pump-probe photoelectron spectroscopy. Sequential energy dissipation steps between the initial photoexcitation and the final desorption event, e.g., electron relaxation and thermalization, have been resolved for Au₂(CO)⁻ and Pt₂(CO)₅⁻. The desorption rates for the two clusters differ considerably due to the different numbers of vibrational degrees of freedom. The unimolecular CO-desorption thresholds of Au₂(CO)⁻ and Pt₂(CO)₅⁻ have been approximated by means of a statistical Rice-Ramsperger-Kassel calculation using the experimentally derived desorption rate constants.
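The dependence of the desorption rate on the number of vibrational degrees of freedom can be illustrated with the classical Rice-Ramsperger-Kassel expression, k(E) = ν(1 - E₀/E)^(s-1), where E is the internal energy, E₀ the desorption threshold, s the number of oscillators and ν a frequency factor. The sketch below assumes this classical form; the abstract does not say which RRK variant the authors used, and all numerical values are hypothetical.

```python
def rrk_rate(energy, threshold, s, nu):
    """Classical RRK unimolecular rate constant.

    energy    -- internal energy E of the cluster
    threshold -- desorption threshold E0 (same units as energy)
    s         -- number of vibrational degrees of freedom (oscillators)
    nu        -- frequency factor, in the units wanted for the rate
    """
    if energy <= threshold:
        return 0.0  # Not enough energy to desorb.
    return nu * (1 - threshold / energy) ** (s - 1)


# Hypothetical values: same energy and threshold, few versus many oscillators.
few = rrk_rate(3.0, 1.0, 3, 1e13)
many = rrk_rate(3.0, 1.0, 15, 1e13)
print(few, many)
```

At a fixed energy the rate falls steeply as s grows, because the energy is shared among more oscillators. Inverting the same expression for E₀ from a measured rate is one way a threshold could be estimated, under the same assumptions.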
Wang, Jiang; Yu, Yi; Tang, Kexuan; Liu, Wen; He, Xinyi; Huang, Xi; Deng, Zixin
2010-01-01
Thiopeptide antibiotics are an important class of natural products resulting from posttranslational modifications of ribosomally synthesized peptides. Cyclothiazomycin is a typical thiopeptide antibiotic that has a unique bridged macrocyclic structure derived from an 18-amino-acid structural peptide. Here we report the cloning, sequencing, and heterologous expression of the cyclothiazomycin biosynthetic gene cluster from Streptomyces hygroscopicus 10-22. Remarkably, successful heterologous expression of a 22.7-kb gene cluster in Streptomyces lividans 1326 suggested that there is a minimum set of 15 open reading frames that includes all of the functional genes required for cyclothiazomycin production. Six of these genes, cltBCDEFG flanking the structural gene cltA, were predicted to encode the enzymes required for the main framework of cyclothiazomycin, and two enzymes encoded by a putative operon, cltMN, were hypothesized to participate in the tailoring step to generate the tertiary thioether, leading to the final cyclization of the bridged macrocyclic structure. This rigorous bioinformatics analysis based on heterologous expression of cyclothiazomycin resulted in an ideal biosynthetic model for us to understand the biosynthesis of thiopeptides. PMID:20154110
Halligan, Brian D.; Geiger, Joey F.; Vallejos, Andrew K.; Greene, Andrew S.; Twigger, Simon N.
2009-01-01
One of the major difficulties for many laboratories setting up proteomics programs has been obtaining and maintaining the computational infrastructure required for the analysis of the large flow of proteomics data. We describe a system that combines distributed cloud computing and open source software to allow laboratories to set up scalable virtual proteomics analysis clusters without the investment in computational hardware or software licensing fees. Additionally, the pricing structure of distributed computing providers, such as Amazon Web Services, allows laboratories or even individuals to have large-scale computational resources at their disposal at a very low cost per run. We provide detailed step-by-step instructions on how to implement the virtual proteomics analysis clusters as well as a list of currently available preconfigured Amazon machine images containing the OMSSA and X!Tandem search algorithms and sequence databases on the Medical College of Wisconsin Proteomics Center website (http://proteomics.mcw.edu/vipdac). PMID:19358578
Halligan, Brian D; Geiger, Joey F; Vallejos, Andrew K; Greene, Andrew S; Twigger, Simon N
2009-06-01
One of the major difficulties for many laboratories setting up proteomics programs has been obtaining and maintaining the computational infrastructure required for the analysis of the large flow of proteomics data. We describe a system that combines distributed cloud computing and open source software to allow laboratories to set up scalable virtual proteomics analysis clusters without the investment in computational hardware or software licensing fees. Additionally, the pricing structure of distributed computing providers, such as Amazon Web Services, allows laboratories or even individuals to have large-scale computational resources at their disposal at a very low cost per run. We provide detailed step-by-step instructions on how to implement the virtual proteomics analysis clusters as well as a list of currently available preconfigured Amazon machine images containing the OMSSA and X!Tandem search algorithms and sequence databases on the Medical College of Wisconsin Proteomics Center Web site (http://proteomics.mcw.edu/vipdac).
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ma, Xiang; Zhang, Shuai; Jiao, Fang
Two-step nucleation pathways in which disordered, amorphous, or dense liquid states precede appearance of crystalline phases have been reported for a wide range of materials, but the dynamics of such pathways are poorly understood. Moreover, whether these pathways are general features of crystallizing systems or a consequence of system-specific structural details that select for direct vs two-step processes is unknown. Using atomic force microscopy to directly observe crystallization of sequence-defined polymers, we show that crystallization pathways are indeed sequence dependent. When a short hydrophobic region is added to a sequence that directly forms crystalline particles, crystallization instead follows a two-step pathway that begins with creation of disordered clusters of 10-20 molecules and is characterized by highly non-linear crystallization kinetics in which clusters transform into ordered structures that then enter the growth phase. The results shed new light on non-classical crystallization mechanisms and have implications for design of self-assembling polymer systems.
VizieR Online Data Catalog: Slug analysis of star clusters in NGC 628 & 7793 (Krumholz+, 2015)
NASA Astrophysics Data System (ADS)
Krumholz, M. R.; Adamo, A.; Fumagalli, M.; Wofford, A.; Calzetti, D.; Lee, J. C.; Whitmore, B. C.; Bright, S. N.; Grasha, K.; Gouliermis, D. A.; Kim, H.; Nair, P.; Ryon, J. E.; Smith, L. J.; Thilker, D.; Ubeda, L.; Zackrisson, E.
2016-02-01
In this paper we use slug, the Stochastically Lighting Up Galaxies code (da Silva et al. 2012ApJ...745..145D, 2014MNRAS.444.3275D; Krumholz et al. 2015MNRAS.452.1447K), and its post-processing tool for analysis of star cluster properties, cluster_slug, to analyze an initial sample of clusters from the LEGUS (Calzetti et al. 2015AJ....149...51C). A description of the steps required to produce final cluster catalogs of the Legacy Extragalactic UV Survey (LEGUS) targets can be found in Calzetti et al. (2015AJ....149...51C), and in A. Adamo et al. (2015, in preparation). LEGUS is an HST Cycle 21 Treasury program that is imaging 50 nearby galaxies in five broadbands with the WFC3/UVIS, from the NUV to the I band. (1 data file).
Thompson, Jennifer A; Fielding, Katherine; Hargreaves, James; Copas, Andrew
2017-12-01
Background/Aims We sought to optimise the design of stepped wedge trials with an equal allocation of clusters to sequences and explored sample size comparisons with alternative trial designs. Methods We developed a new expression for the design effect for a stepped wedge trial, assuming that observations are equally correlated within clusters and an equal number of observations in each period between sequences switching to the intervention. We minimised the design effect with respect to (1) the fraction of observations before the first and after the final sequence switches (the periods with all clusters in the control or intervention condition, respectively) and (2) the number of sequences. We compared the design effect of this optimised stepped wedge trial to the design effects of a parallel cluster-randomised trial, a cluster-randomised trial with baseline observations, and a hybrid trial design (a mixture of cluster-randomised trial and stepped wedge trial) with the same total cluster size for all designs. Results We found that a stepped wedge trial with an equal allocation to sequences is optimised by obtaining all observations after the first sequence switches and before the final sequence switches to the intervention; this means that the first sequence remains in the control condition and the last sequence remains in the intervention condition for the duration of the trial. With this design, the optimal number of sequences is [Formula: see text], where [Formula: see text] is the cluster-mean correlation, [Formula: see text] is the intracluster correlation coefficient, and m is the total cluster size. The optimal number of sequences is small when the intracluster correlation coefficient and cluster size are small and large when the intracluster correlation coefficient or cluster size is large. A cluster-randomised trial remains more efficient than the optimised stepped wedge trial when the intracluster correlation coefficient or cluster size is small. 
A cluster-randomised trial with baseline observations always requires a larger sample size than the optimised stepped wedge trial. The hybrid design can always give an equally or more efficient design, but will be at most 5% more efficient. We provide a strategy for selecting a design if the optimal number of sequences is unfeasible. For a non-optimal number of sequences, the sample size may be reduced by allowing a proportion of observations before the first or after the final sequence has switched. Conclusion The standard stepped wedge trial is inefficient. To reduce sample sizes when a hybrid design is unfeasible, stepped wedge trial designs should have no observations before the first sequence switches or after the final sequence switches.
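For orientation only, the comparison baseline named in this abstract can be sketched numerically. The snippet below computes the textbook design effect of a parallel cluster-randomised trial, 1 + (m - 1)ρ, where m is the total cluster size and ρ is the intracluster correlation coefficient. It does not reproduce the authors' new stepped wedge expression, which is elided in the abstract; the function name is a hypothetical choice for illustration.

```python
def parallel_design_effect(m, icc):
    """Textbook design effect for a parallel cluster-randomised trial.

    m   -- total cluster size (observations per cluster)
    icc -- intracluster correlation coefficient (rho)

    This is the standard variance inflation factor, not the stepped
    wedge expression derived in the paper.
    """
    return 1.0 + (m - 1) * icc


if __name__ == "__main__":
    # Small clusters and a small ICC inflate the sample size only mildly.
    print(parallel_design_effect(10, 0.05))
    # Larger clusters with the same ICC inflate it much more.
    print(parallel_design_effect(200, 0.05))
```

The rapid growth of this factor with m is consistent with the abstract's statement that a cluster-randomised trial is the more efficient choice only when the intracluster correlation coefficient or cluster size is small.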
Scott, JoAnna M; deCamp, Allan; Juraska, Michal; Fay, Michael P; Gilbert, Peter B
2017-04-01
Stepped wedge designs are increasingly commonplace and advantageous for cluster randomized trials when it is both unethical to assign placebo, and it is logistically difficult to allocate an intervention simultaneously to many clusters. We study marginal mean models fit with generalized estimating equations for assessing treatment effectiveness in stepped wedge cluster randomized trials. This approach has advantages over the more commonly used mixed models: (1) the population-average parameters have an important interpretation for public health applications and (2) they avoid untestable assumptions on latent variable distributions and avoid parametric assumptions about error distributions, therefore providing more robust evidence on treatment effects. However, cluster randomized trials typically have a small number of clusters, rendering the standard generalized estimating equation sandwich variance estimator biased and highly variable and hence yielding incorrect inferences. We study the usual asymptotic generalized estimating equation inferences (i.e., using sandwich variance estimators and asymptotic normality) and four small-sample corrections to generalized estimating equations for stepped wedge cluster randomized trials and for parallel cluster randomized trials as a comparison. We show by simulation that the small-sample corrections provide improvement, with one correction appearing to provide at least nominal coverage even with only 10 clusters per group. These results demonstrate the viability of the marginal mean approach for both stepped wedge and parallel cluster randomized trials. We also study the comparative performance of the corrected methods for stepped wedge and parallel designs, and describe how the methods can accommodate interval censoring of individual failure times and incorporate semiparametric efficient estimators.
Vinholes, Daniele Botelho; Assunção, Maria Cecília Formoso; Neutzling, Marilda Borges
2009-04-01
This study aimed to measure frequency of healthy eating habits and associated factors using the 10 Steps to Healthy Eating score proposed by the Ministry of Health in the adult population in Pelotas, Rio Grande do Sul State, Brazil. A cross-sectional population-based survey was conducted on a cluster sample of 3,136 adult residents in Pelotas. The frequency of each step to healthy eating was collected with a pre-coded questionnaire. Data analysis consisted of descriptive analysis, followed by bivariate analysis using the chi-square test. Only 1.1% of the population followed all the recommended steps. The average number of steps was six. Step four, salt intake, showed the highest frequency, while step nine, physical activity, showed the lowest. Knowledge of the population's eating habits and their distribution according to demographic and socioeconomic variables is important to guide local and national strategies to promote healthy eating habits and thus improve quality of life.
Clustered-dot halftoning with direct binary search.
Goyal, Puneet; Gupta, Madhur; Staelin, Carl; Fischer, Mani; Shacham, Omri; Allebach, Jan P
2013-02-01
In this paper, we present a new algorithm for aperiodic clustered-dot halftoning based on direct binary search (DBS). The DBS optimization framework has been modified for designing clustered-dot texture, by using filters with different sizes in the initialization and update steps of the algorithm. Following an intuitive explanation of how the clustered-dot texture results from this modified framework, we derive a closed-form cost metric which, when minimized, equivalently generates stochastic clustered-dot texture. An analysis of the cost metric and its influence on the texture quality is presented, which is followed by a modification to the cost metric to reduce computational cost and to make it more suitable for screen design.
Gur-Ozmen, S; Mula, M; Agrawal, N; Cock, H R; Lozsadi, D; von Oertzen, T J
2017-09-01
People with epilepsy are at increased risk of accidents and injuries but, despite several studies on this subject, data regarding preventable causes are still contradictory. The aim of this study was to investigate the relationship between injuries, side effects of antiepileptic drugs (AEDs) and depression. Data from a consecutive sample of adult patients with epilepsy attending the outpatient clinics at St George's University Hospital in London were included. All patients were asked if they had had any injury since the last clinic appointment and completed the Liverpool Adverse Event Profile (LAEP) and Neurological Disorders Depression Inventory for Epilepsy. Among 407 patients (243 females, mean age 43.1 years), 71 (17.4%) reported injuries since the last appointment. A two-step cluster analysis revealed two clusters with the major cluster (53.5% of the injured group) showing a total score for LAEP ≥45, a positive Neurological Disorders Depression Inventory for Epilepsy screening and presence of AED polytherapy. A total score for LAEP ≥45 was the most important predictor. Antiepileptic drug treatment should be reviewed in patients reporting injuries in order to evaluate the potential contribution and burden of AED side effects. © 2017 EAN.
Software forecasting as it is really done: A study of JPL software engineers
NASA Technical Reports Server (NTRS)
Griesel, Martha Ann; Hihn, Jairus M.; Bruno, Kristin J.; Fouser, Thomas J.; Tausworthe, Robert C.
1993-01-01
This paper presents a summary of the results to date of a Jet Propulsion Laboratory internally funded research task to study the costing process and parameters used by internally recognized software cost estimating experts. Protocol Analysis and Markov process modeling were used to capture software engineers' forecasting mental models. While there is significant variation between the mental models that were studied, it was nevertheless possible to identify a core set of cost forecasting activities, and it was also found that the mental models cluster around three forecasting techniques. Further partitioning of the mental models revealed a clustering of activities that is very suggestive of a forecasting lifecycle. The different forecasting methods identified were based on the use of multiple-decomposition steps or multiple forecasting steps. The multiple forecasting steps involved either forecasting software size or an additional effort forecast. Virtually no subject used risk reduction steps in combination. The results of the analysis include: the identification of a core set of well defined costing activities, a proposed software forecasting life cycle, and the identification of several basic software forecasting mental models. The paper concludes with a discussion of the implications of the results for current individual and institutional practices.
Scoring clustering solutions by their biological relevance.
Gat-Viks, I; Sharan, R; Shamir, R
2003-12-12
A central step in the analysis of gene expression data is the identification of groups of genes that exhibit similar expression patterns. Clustering gene expression data into homogeneous groups was shown to be instrumental in functional annotation, tissue classification, regulatory motif identification, and other applications. Although there is a rich literature on clustering algorithms for gene expression analysis, very few works addressed the systematic comparison and evaluation of clustering results. Typically, different clustering algorithms yield different clustering solutions on the same data, and there is no agreed upon guideline for choosing among them. We developed a novel statistically based method for assessing a clustering solution according to prior biological knowledge. Our method can be used to compare different clustering solutions or to optimize the parameters of a clustering algorithm. The method is based on projecting vectors of biological attributes of the clustered elements onto the real line, such that the ratio of between-groups and within-group variance estimators is maximized. The projected data are then scored using a non-parametric analysis of variance test, and the score's confidence is evaluated. We validate our approach using simulated data and show that our scoring method outperforms several extant methods, including the separation to homogeneity ratio and the silhouette measure. We apply our method to evaluate results of several clustering methods on yeast cell-cycle gene expression data. The software is available from the authors upon request.
An application of data mining in district heating substations for improving energy performance
NASA Astrophysics Data System (ADS)
Xue, Puning; Zhou, Zhigang; Chen, Xin; Liu, Jing
2017-11-01
Automatic meter reading systems are capable of collecting and storing huge amounts of district heating (DH) data. However, the data obtained are rarely fully utilized. Data mining is a promising technology for discovering potentially interesting knowledge in vast data sets. This paper applies data mining methods to analyse these massive data in order to improve the energy performance of a DH substation. The technical approach contains three steps: data selection, cluster analysis and association rule mining (ARM). Data from two heating seasons of a substation are used for a case study. Cluster analysis identifies six distinct heating patterns based on the primary heat of the substation. ARM reveals that secondary pressure difference and secondary flow rate have a strong correlation. Using the discovered rules, a fault occurring in a remote flow meter installed in the secondary network is detected accurately. The application demonstrates that data mining techniques can effectively extract potentially useful knowledge to better understand substation operation strategies and improve substation energy performance.
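The association rule mining step can be illustrated with the two standard rule metrics, support and confidence. This is a minimal sketch under assumptions: the discretised item labels (`dp_high`, `flow_high`, and so on) and the function name are hypothetical, and the abstract does not state which ARM algorithm or thresholds the authors used.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent.

    transactions -- list of sets of discretised readings (hypothetical labels)
    support      -- share of transactions containing antecedent and consequent
    confidence   -- share of antecedent transactions that also contain
                    the consequent
    """
    a, c = set(antecedent), set(consequent)
    n = len(transactions)
    n_a = sum(1 for t in transactions if a <= set(t))
    n_ac = sum(1 for t in transactions if (a | c) <= set(t))
    support = n_ac / n if n else 0.0
    confidence = n_ac / n_a if n_a else 0.0
    return support, confidence


if __name__ == "__main__":
    # Hypothetical hourly records: pressure difference and flow rate levels.
    records = [
        {"dp_high", "flow_high"},
        {"dp_high", "flow_high"},
        {"dp_high", "flow_low"},
        {"dp_low", "flow_low"},
    ]
    print(rule_metrics(records, {"dp_high"}, {"flow_high"}))
```

A record that violates a high-confidence rule, such as the third one here, is the kind of observation that could be flagged for inspection as a possible meter fault.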
Perche-Letuvée, Phanélie; Kathirvelu, Velavan; Berggren, Gustav; Clemancey, Martin; Latour, Jean-Marc; Maurel, Vincent; Douki, Thierry; Armengaud, Jean; Mulliez, Etienne; Fontecave, Marc; Garcia-Serres, Ricardo; Gambarelli, Serge; Atta, Mohamed
2012-01-01
Wybutosine and its derivatives are found in position 37 of tRNA encoding Phe in eukaryotes and archaea. They are believed to play a key role in the decoding function of the ribosome. The second step in the biosynthesis of wybutosine is catalyzed by TYW1 protein, which is a member of the well established class of metalloenzymes called “Radical-SAM.” These enzymes use a [4Fe-4S] cluster, chelated by three cysteines in a CX3CX2C motif, and S-adenosyl-l-methionine (SAM) to generate a 5′-deoxyadenosyl radical that initiates various chemically challenging reactions. Sequence analysis of TYW1 proteins revealed, in the N-terminal half of the enzyme beside the Radical-SAM cysteine triad, an additional highly conserved cysteine motif. In this study we show by combining analytical and spectroscopic methods including UV-visible absorption, Mössbauer, EPR, and HYSCORE spectroscopies that these additional cysteines are involved in the coordination of a second [4Fe-4S] cluster displaying a free coordination site that interacts with pyruvate, the second substrate of the reaction. The presence of two distinct iron-sulfur clusters on TYW1 is reminiscent of MiaB, another tRNA-modifying metalloenzyme whose active form was shown to bind two iron-sulfur clusters. A possible role for the second [4Fe-4S] cluster in the enzyme activity is discussed. PMID:23043105
The X-CLASS-redMaPPer galaxy cluster comparison. I. Identification procedures
NASA Astrophysics Data System (ADS)
Sadibekova, T.; Pierre, M.; Clerc, N.; Faccioli, L.; Gastaud, R.; Le Fevre, J.-P.; Rozo, E.; Rykoff, E.
2014-11-01
Context. This paper is the first in a series undertaking a comprehensive correlation analysis between optically selected and X-ray-selected cluster catalogues. The rationale of the project is to develop a holistic picture of galaxy clusters utilising optical and X-ray-cluster-selected catalogues with well-understood selection functions. Aims: Unlike most of the X-ray/optical cluster correlations to date, the present paper focuses on the non-matching objects in either waveband. We investigate how the differences observed between the optical and X-ray catalogues may stem from (1) a shortcoming of the detection algorithms; (2) dispersion in the X-ray/optical scaling relations; or (3) substantial intrinsic differences between the cluster populations probed in the X-ray and optical bands. The aim is to inventory and elucidate these effects in order to account for selection biases in the further determination of X-ray/optical cluster scaling relations. Methods: We correlated the X-CLASS serendipitous cluster catalogue extracted from the XMM archive with the redMaPPer optical cluster catalogue derived from the Sloan Digital Sky Survey (DR8). We performed a detailed and, in large part, interactive analysis of the matching output from the correlation. The overlap between the two catalogues has been accurately determined and possible cluster positional errors were manually recovered. The final samples comprise 270 and 355 redMaPPer and X-CLASS clusters, respectively. X-ray cluster matching rates were analysed as a function of optical richness. In the second step, the redMaPPer clusters were correlated with the entire X-ray catalogue, containing point and uncharacterised sources (down to a few 10⁻¹⁵ erg s⁻¹ cm⁻² in the [0.5-2] keV band). A stacking analysis was performed for the remaining undetected optical clusters. Results: We find that all rich (λ ≥ 80) clusters are detected in X-rays out to z = 0.6.
Below this redshift, the richness threshold for X-ray detection steadily decreases with redshift. Likewise, all X-ray bright clusters are detected by redMaPPer. After correcting for obvious pipeline shortcomings (about 10% of the cases both in optical and X-ray), ~50% of the redMaPPer clusters (down to a richness of 20) are found to coincide with an X-CLASS cluster; when considering X-ray sources of any type, this fraction increases to ~80%; for the remaining objects, the stacking analysis finds a weak signal within 0.5 Mpc around the cluster optical centres. The fraction of clusters totally dominated by AGN-type emission appears to be a few percent. Conversely, ~40% of the X-CLASS clusters are identified with a redMaPPer cluster (down to a richness of 20), part of the non-matches being due to the X-CLASS sample extending further out than redMaPPer (z < 1.5 vs. z < 0.6), but extending the correlation down to a richness of 5 raises the matching rate to ~65%. Conclusions: This state-of-the-art study involving two well-validated cluster catalogues has shown itself to be complex, and it points to a number of issues inherent to blind cross-matching, owing both to pipeline shortcomings and cluster peculiar properties. These can only be accounted for after a manual check. The combined X-ray and optical scaling relations will be presented in a subsequent article.
Post-genome research on the biosynthesis of ergot alkaloids.
Li, Shu-Ming; Unsöld, Inge A
2006-10-01
Genome sequencing provides new opportunities and challenges for identifying genes for the biosynthesis of secondary metabolites. A putative biosynthetic gene cluster of fumigaclavine C, an ergot alkaloid of the clavine type, was identified in the genome sequence of Aspergillus fumigatus by a bioinformatic approach. This cluster spans 22 kb of genomic DNA and comprises at least 11 open reading frames (ORFs). Seven of them are orthologous to genes from the biosynthetic gene cluster of ergot alkaloids in Claviceps purpurea. Experimental evidence of the identified cluster was provided by heterologous expression and biochemical characterization of two ORFs, FgaPT1 and FgaPT2, in the cluster of A. fumigatus, which show remarkable similarities to dimethylallyltryptophan synthase from C. purpurea and function as prenyltransferases. FgaPT2 converts L-tryptophan to dimethylallyltryptophan and thereby catalyzes the first step of ergot alkaloid biosynthesis, whilst FgaPT1 catalyzes the last step of the fumigaclavine C biosynthesis, i.e., the prenylation of fumigaclavine A at C-2 position of the indole nucleus. In addition to information obtained from the gene cluster of ergot alkaloids from C. purpurea, the identification of the biosynthetic gene cluster of fumigaclavine C in A. fumigatus opens an alternative way to study the biosynthesis of ergot alkaloids in fungi.
Pulley, Simon; Foster, Ian; Collins, Adrian L
2017-06-01
The objective classification of sediment source groups is at present an under-investigated aspect of source tracing studies, which has the potential to statistically improve discrimination between sediment sources and reduce uncertainty. This paper investigates this potential using three different source group classification schemes. The first classification scheme was simple surface and subsurface groupings (Scheme 1). The tracer signatures were then used in a two-step cluster analysis to identify the sediment source groupings naturally defined by the tracer signatures (Scheme 2). The cluster source groups were then modified by splitting each one into a surface and subsurface component to suit catchment management goals (Scheme 3). The schemes were tested using artificial mixtures of sediment source samples. Controlled corruptions were made to some of the mixtures to mimic the potential causes of tracer non-conservatism present when using tracers in natural fluvial environments. It was determined how accurately the known proportions of sediment sources in the mixtures were identified after unmixing modelling using the three classification schemes. The cluster analysis derived source groups (2) significantly increased tracer variability ratios (inter-/intra-source group variability) (up to 2122%, median 194%) compared to the surface and subsurface groupings (1). As a result, the composition of the artificial mixtures was identified an average of 9.8% more accurately on the 0-100% contribution scale. It was found that the cluster groups could be reclassified into a surface and subsurface component (3) with no significant increase in composite uncertainty (a 0.1% increase over Scheme 2). The far smaller effects of simulated tracer non-conservatism for the cluster analysis based schemes (2 and 3) were primarily attributed to the increased inter-group variability producing a far larger sediment source signal than the non-conservatism noise (1).
Modified cluster analysis based classification methods have the potential to reduce composite uncertainty significantly in future source tracing studies. Copyright © 2016 Elsevier Ltd. All rights reserved.
Netz, Daili J. A.; Pierik, Antonio J.; Stümpfig, Martin; Bill, Eckhard; Sharma, Anil K.; Pallesen, Leif J.; Walden, William E.; Lill, Roland
2012-01-01
The essential P-loop NTPases Cfd1 and Nbp35 of the cytosolic iron-sulfur (Fe-S) protein assembly machinery perform a scaffold function for Fe-S cluster synthesis. Both proteins contain a nucleotide binding motif of unknown function and a C-terminal motif with four conserved cysteine residues. The latter motif defines the Mrp/Nbp35 subclass of P-loop NTPases and is suspected to be involved in transient Fe-S cluster binding. To elucidate the function of these two motifs, we first created cysteine mutant proteins of Cfd1 and Nbp35 and investigated the consequences of these mutations by genetic, cell biological, biochemical, and spectroscopic approaches. The two central cysteine residues (CPXC) of the C-terminal motif were found to be crucial for cell viability, protein function, coordination of a labile [4Fe-4S] cluster, and Cfd1-Nbp35 hetero-tetramer formation. Surprisingly, the two proximal cysteine residues were dispensable for all these functions, despite their strict evolutionary conservation. Several lines of evidence suggest that the C-terminal CPXC motifs of Cfd1-Nbp35 coordinate a bridging [4Fe-4S] cluster. Upon mutation of the nucleotide binding motifs Fe-S clusters could no longer be assembled on these proteins unless wild-type copies of Cfd1 and Nbp35 were present in trans. This result indicated that Fe-S cluster loading on these scaffold proteins is a nucleotide-dependent step. We propose that the bridging coordination of the C-terminal Fe-S cluster may be ideal for its facile assembly, labile binding, and efficient transfer to target Fe-S apoproteins, a step facilitated by the cytosolic iron-sulfur (Fe-S) protein assembly proteins Nar1 and Cia1 in vivo. PMID:22362766
A cyber-event correlation framework and metrics
NASA Astrophysics Data System (ADS)
Kang, Myong H.; Mayfield, Terry
2003-08-01
In this paper, we propose a cyber-event fusion, correlation, and situation assessment framework that, when instantiated, will allow cyber defenders to better understand the local, regional, and global cyber-situation. This framework, with associated metrics, can be used to guide assessment of our existing cyber-defense capabilities, and to help evaluate the state of cyber-event correlation research and where we must focus our future cyber-event correlation research. The framework, based on the cyber-event gathering activities and analysis functions, consists of five operational steps, each of which provides a richer set of contextual information to support greater situational understanding. The first three steps are categorically depicted as increasingly richer and broader-scoped contexts achieved through correlation activity, while in the final two steps, these richer contexts are achieved through analytical activities (situation assessment, and threat analysis & prediction). Category 1 Correlation focuses on the detection of suspicious activities and the correlation of events from a single cyber-event source. Category 2 Correlation clusters the same or similar events from multiple detectors that are located at close proximity and prioritizes them. Finally, the events from different time periods and event sources at different location/regions are correlated at Category 3 to recognize the relationship among different events. This is the category that focuses on the detection of large-scale and coordinated attacks. The situation assessment step (Category 4) focuses on the assessment of cyber asset damage and the analysis of the impact on missions. The threat analysis and prediction step (Category 5) analyzes attacks based on attack traces and predicts the next steps. Metrics that can distinguish correlation and cyber-situation assessment tools for each category are also proposed.
A Hierarchical Framework for State-Space Matrix Inference and Clustering.
Zuo, Chandler; Chen, Kailei; Hewitt, Kyle J; Bresnick, Emery H; Keleş, Sündüz
2016-09-01
In recent years, a large number of genomic and epigenomic studies have been focusing on the integrative analysis of multiple experimental datasets measured over a large number of observational units. The objectives of such studies include not only inferring a hidden state of activity for each unit over individual experiments, but also detecting highly associated clusters of units based on their inferred states. Although there are a number of methods tailored for specific datasets, there is currently no state-of-the-art modeling framework for this general class of problems. In this paper, we develop the MBASIC (Matrix Based Analysis for State-space Inference and Clustering) framework. MBASIC consists of two parts: state-space mapping and state-space clustering. In state-space mapping, it maps observations onto a finite state-space, representing the activation states of units across conditions. In state-space clustering, MBASIC incorporates a finite mixture model to cluster the units based on their inferred state-space profiles across all conditions. Both the state-space mapping and clustering can be simultaneously estimated through an Expectation-Maximization algorithm. MBASIC flexibly adapts to a large number of parametric distributions for the observed data, as well as the heterogeneity in replicate experiments. It allows for imposing structural assumptions on each cluster, and enables model selection using information criterion. In our data-driven simulation studies, MBASIC showed significant accuracy in recovering both the underlying state-space variables and clustering structures. We applied MBASIC to two genome research problems using large numbers of datasets from the ENCODE project. The first application grouped genes based on transcription factor occupancy profiles of their promoter regions in two different cell types.
The second application focused on identifying groups of loci that are similar to a GATA2 binding site that is functional at its endogenous locus by utilizing transcription factor occupancy data and illustrated applicability of MBASIC in a wide variety of problems. In both studies, MBASIC showed higher levels of raw data fidelity than analyzing these data with a two-step approach using ENCODE results on transcription factor occupancy data.
NASA Astrophysics Data System (ADS)
Chen, Y.; Ho, C.; Chang, L.
2011-12-01
In previous decades, climate change caused by global warming has increased the frequency of extreme hydrological events. Water supply shortages caused by extreme events create great challenges for water resource management. To evaluate future climate variations, general circulation models (GCMs) are the most widely known tools; they show possible weather conditions under pre-defined CO2 emission scenarios announced by the IPCC. Because the study area of GCMs is the entire earth, the grid sizes of GCMs are much larger than the basin scale. To bridge this gap, a statistical downscaling technique can transform regional-scale weather factors into basin-scale precipitation. Statistical downscaling techniques can be divided into three categories: transfer function, weather generator and weather type. The first two categories describe the relationships between the weather factors and precipitation based, respectively, on deterministic algorithms, such as linear or nonlinear regression and ANN, and on stochastic approaches, such as Markov chain theory and statistical distributions. The weather type method clusters weather factors, which are high-dimensional and continuous variables, into weather types, which are a limited number of discrete states. In this study, the proposed downscaling model integrates the weather type method, using the K-means clustering algorithm, and the weather generator, using kernel density estimation. The study area is the Shihmen basin in northern Taiwan. The research process contains two steps, a calibration step and a synthesis step. Three sub-steps were used in the calibration step. First, weather factors, such as pressure, humidity and wind speed, obtained from NCEP and the precipitation observed at rainfall stations were collected for downscaling. Second, K-means clustering grouped the weather factors into four weather types.
Third, the Markov chain transition matrices and the conditional probability density function (PDF) of precipitation, approximated by kernel density estimation, are calculated for each weather type. In the synthesis step, 100 patterns of synthetic data are generated. First, the weather type of the n-th day is determined from the results of K-means clustering. The associated transition matrix and PDF of the weather type are also determined for use in the next sub-step of the synthesis process. Second, the precipitation condition, dry or wet, is synthesized based on the transition matrix. If the synthesized condition is dry, the quantity of precipitation is zero; otherwise, the quantity is determined in the third sub-step. Third, the quantity of the synthesized precipitation is drawn as a random variable from the PDF defined above. The synthesis efficiency is assessed by comparing the monthly mean curves and monthly standard deviation curves of the historical precipitation data with those of the 100 patterns of synthetic data.
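The three synthesis sub-steps described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the weather type of each day is taken as given (standing in for the K-means result), the transition matrix is reduced to two wet-day probabilities per weather type, and the kernel density estimate is assumed to be Gaussian, so a draw from it is an observed wet-day amount plus Gaussian noise. All names and dictionary keys are hypothetical.

```python
import random


def synthesize_series(weather_types, models, rng, start_wet=False):
    """Generate one synthetic daily precipitation series.

    weather_types -- weather type label for each day (assumed to come
                     from a prior clustering step)
    models        -- per-type dict with hypothetical keys:
                     'p_wet_after_wet', 'p_wet_after_dry' (Markov chain),
                     'wet_amounts' (observed wet-day amounts),
                     'bandwidth' (Gaussian kernel bandwidth)
    rng           -- random.Random instance, for reproducibility
    """
    series = []
    wet = start_wet
    for wt in weather_types:
        model = models[wt]
        # Sub-step 2: dry or wet, from the type's transition probabilities.
        p_wet = model["p_wet_after_wet"] if wet else model["p_wet_after_dry"]
        wet = rng.random() < p_wet
        if not wet:
            series.append(0.0)
            continue
        # Sub-step 3: draw from a Gaussian KDE by picking an observed
        # amount and perturbing it; negative draws are clipped to zero.
        centre = rng.choice(model["wet_amounts"])
        series.append(max(0.0, rng.gauss(centre, model["bandwidth"])))
    return series


if __name__ == "__main__":
    demo_models = {
        0: {"p_wet_after_wet": 0.6, "p_wet_after_dry": 0.2,
            "wet_amounts": [1.5, 4.0, 12.0], "bandwidth": 0.8},
        1: {"p_wet_after_wet": 0.9, "p_wet_after_dry": 0.5,
            "wet_amounts": [8.0, 20.0, 35.0], "bandwidth": 2.0},
    }
    print(synthesize_series([0, 0, 1, 1, 0], demo_models, random.Random(1)))
```

Repeating the call 100 times with different seeds would give the 100 synthetic patterns whose monthly means and standard deviations are compared with the historical record.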
Ligand Rearrangements at Fe/S Cofactors: Slow Isomerization of a Biomimetic [2Fe-2S] Cluster.
Bergner, Marie; Roy, Lisa; Dechert, Sebastian; Neese, Frank; Ye, Shengfa; Meyer, Franc
2017-04-18
Ligand exchange plays an important role in the biogenesis of Fe/S clusters, most prominently during cluster transfer from a scaffold protein to its target protein. Although in vivo and in vitro studies have provided some insight into this process, the microscopic details of the ligand exchange steps are mostly unknown. In this work, the kinetics of the ligand rearrangement in a biomimetic [2Fe-2S] cluster with mixed S/N capping ligands have been studied. Two geometrical isomers of the cluster are present in solution, and mechanistic insight into the isomerization process was obtained by variable-temperature ¹H NMR spectroscopy. Combined experimental and computational results reveal that this is an associative process that involves the coordination of a solvent molecule to one of the ferric ions. The cluster isomerizes at least two orders of magnitude faster in its protonated and mixed-valent states. These findings may contribute to a deeper understanding of cluster transfer and sensing processes occurring in Fe/S cluster biogenesis. © 2017 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.
A novel framework for feature extraction in multi-sensor action potential sorting.
Wu, Shun-Chi; Swindlehurst, A Lee; Nenadic, Zoran
2015-09-30
Extracellular recordings of multi-unit neural activity have become indispensable in neuroscience research. The analysis of the recordings begins with the detection of the action potentials (APs), followed by a classification step where each AP is associated with a given neural source. A feature extraction step is required prior to classification in order to reduce the dimensionality of the data and the impact of noise, allowing source clustering algorithms to work more efficiently. In this paper, we propose a novel framework for multi-sensor AP feature extraction based on the so-called Matched Subspace Detector (MSD), which is shown to be a natural generalization of standard single-sensor algorithms. Clustering using both simulated data and real AP recordings taken in the locust antennal lobe demonstrates that the proposed approach yields features that are discriminatory and lead to promising results. Unlike existing methods, the proposed algorithm finds joint spatio-temporal feature vectors that match the dominant subspace observed in the two-dimensional data without the need for a forward propagation model or AP templates. The proposed MSD approach provides more discriminatory features for unsupervised AP sorting applications. Copyright © 2015 Elsevier B.V. All rights reserved.
Gravitational lensing by clusters of galaxies - Constraining the mass distribution
NASA Technical Reports Server (NTRS)
Miralda-Escude, Jordi
1991-01-01
The possibility of placing constraints on the mass distribution of a cluster of galaxies by analyzing the cluster's gravitational lensing effect on the images of more distant galaxies is investigated theoretically in the limit of weak distortion. The steps in the proposed analysis are examined in detail, and it is concluded that detectable distortion can be produced by clusters with line-of-sight velocity dispersions of over 500 km/sec. Hence it should be possible to determine (1) the cluster center position (with accuracy equal to the mean separation of the background galaxies), (2) the cluster-potential quadrupole moment (to within about 20 percent of the total potential if velocity dispersion is 1000 km/sec), and (3) the power law for the outer-cluster density profile (if enough background galaxies in the surrounding region are observed).
An extended affinity propagation clustering method based on different data density types.
Zhao, XiuLi; Xu, WeiXiang
2015-01-01
The affinity propagation (AP) algorithm, a novel clustering method, does not require users to specify the initial cluster centers in advance; it regards all data points equally as potential exemplars (cluster centers) and forms the clusters solely from the degree of similarity among the data points. In many cases, however, a data set contains areas of different density, which means that the data are not distributed homogeneously. In such situations the AP algorithm cannot group the data points into ideal clusters. In this paper, we propose an extended AP clustering algorithm to deal with this problem. There are two steps in our method: first, the data set is partitioned into several data density types according to the nearest distances of each data point; then the AP clustering method is applied separately within each data density type to group the data points into clusters. Two experiments are carried out to evaluate the performance of our algorithm: one uses an artificial data set and the other uses a real seismic data set. The experimental results show that our algorithm obtains groups more accurately than OPTICS and the AP clustering algorithm itself.
A Multicriteria Decision Making Approach for Estimating the Number of Clusters in a Data Set
Peng, Yi; Zhang, Yong; Kou, Gang; Shi, Yong
2012-01-01
Determining the number of clusters in a data set is an essential yet difficult step in cluster analysis. Since this task involves more than one criterion, it can be modeled as a multiple criteria decision making (MCDM) problem. This paper proposes a multiple criteria decision making (MCDM)-based approach to estimate the number of clusters for a given data set. In this approach, MCDM methods consider different numbers of clusters as alternatives and the outputs of any clustering algorithm on validity measures as criteria. The proposed method is examined by an experimental study using three MCDM methods, the well-known clustering algorithm–k-means, ten relative measures, and fifteen public-domain UCI machine learning data sets. The results show that MCDM methods work fairly well in estimating the number of clusters in the data and outperform the ten relative measures considered in the study. PMID:22870181
Two-Way Regularized Fuzzy Clustering of Multiple Correspondence Analysis.
Kim, Sunmee; Choi, Ji Yeh; Hwang, Heungsun
2017-01-01
Multiple correspondence analysis (MCA) is a useful tool for investigating the interrelationships among dummy-coded categorical variables. MCA has been combined with clustering methods to examine whether there exist heterogeneous subclusters of a population, which exhibit cluster-level heterogeneity. These combined approaches aim to classify either observations only (one-way clustering of MCA) or both observations and variable categories (two-way clustering of MCA). The latter approach is favored because its solutions are easier to interpret by providing explicitly which subgroup of observations is associated with which subset of variable categories. Nonetheless, the two-way approach has been built on hard classification that assumes observations and/or variable categories to belong to only one cluster. To relax this assumption, we propose two-way fuzzy clustering of MCA. Specifically, we combine MCA with fuzzy k-means simultaneously to classify a subgroup of observations and a subset of variable categories into a common cluster, while allowing both observations and variable categories to belong partially to multiple clusters. Importantly, we adopt regularized fuzzy k-means, thereby enabling us to decide the degree of fuzziness in cluster memberships automatically. We evaluate the performance of the proposed approach through the analysis of simulated and real data, in comparison with existing two-way clustering approaches.
Critical behavior of a two-step contagion model with multiple seeds
NASA Astrophysics Data System (ADS)
Choi, Wonjun; Lee, Deokjae; Kahng, B.
2017-06-01
A two-step contagion model with a single seed serves as a cornerstone for understanding the critical behaviors and underlying mechanism of discontinuous percolation transitions induced by cascade dynamics. When the contagion spreads from a single seed, a cluster of infected and recovered nodes grows without any cluster merging process. However, when the contagion starts from multiple seeds of O (N ) where N is the system size, a node weakened by a seed can be infected more easily when it is in contact with another node infected by a different pathogen seed. This contagion process can be viewed as a cluster merging process in a percolation model. Here we show analytically and numerically that when the density of infectious seeds is relatively small but O (1 ) , the epidemic transition is hybrid, exhibiting both continuous and discontinuous behavior, whereas when it is sufficiently large and reaches a critical point, the transition becomes continuous. We determine the full set of critical exponents describing the hybrid and the continuous transitions. Their critical behaviors differ from those in the single-seed case.
Helium cluster isolation spectroscopy
NASA Astrophysics Data System (ADS)
Higgins, John Paul
Clusters of helium, each containing ~10^3-10^4 atoms, are produced in a molecular beam and are doped with alkali metal atoms (Li, Na, and K) and large organic molecules. Electronic spectroscopy in the visible and UV regions of the spectrum is carried out on the dopant species. Since large helium clusters are liquid and attain an equilibrium internal temperature of 0.4 K, they interact weakly with atoms or molecules absorbed on their surface or resident inside the cluster. The spectra that are obtained are characterized by small frequency shifts from the positions of the gas phase transitions, narrow lines, and cold vibrational temperatures. Alkali atoms aggregate on the helium cluster surface to form dimers and trimers. The spectra of singlet alkali dimers exhibit the presence of elementary excitations in the superfluid helium cluster matrix. It is found that preparation of the alkali molecules on the surface of helium clusters leads to the preferential formation of high-spin, van der Waals bound, triplet dimers and quartet trimers. Four bound-bound and two bound-free transitions are observed in the triplet manifold of the alkali dimers. The quartet trimers serve as an ideal system for the study of a simple unimolecular reaction in the cold helium cluster environment. Analysis of the lowest quartet state provides valuable insight into three-body forces in a van der Waals trimer. The wide range of atomic and molecular systems studied in this thesis constitutes a preliminary step in the development of helium cluster isolation spectroscopy, a hybrid technique combining the advantages of high resolution spectroscopy with the synthetic, low temperature environment of matrices.
Universal dynamical properties preclude standard clustering in a large class of biochemical data.
Gomez, Florian; Stoop, Ralph L; Stoop, Ruedi
2014-09-01
Clustering of chemical and biochemical data based on observed features is a central cognitive step in the analysis of chemical substances, in particular in combinatorial chemistry, or of complex biochemical reaction networks. Often, for reasons unknown to the researcher, this step produces disappointing results. Once the sources of the problem are known, improved clustering methods might revitalize the statistical approach of compound and reaction search and analysis. Here, we present a generic mechanism that may be at the origin of many clustering difficulties. The variety of dynamical behaviors that can be exhibited by complex biochemical reactions on variation of the system parameters are fundamental system fingerprints. In parameter space, shrimp-like or swallow-tail structures separate parameter sets that lead to stable periodic dynamical behavior from those leading to irregular behavior. We work out the genericity of this phenomenon and demonstrate novel examples for their occurrence in realistic models of biophysics. Although we elucidate the phenomenon by considering the emergence of periodicity in dependence on system parameters in a low-dimensional parameter space, the conclusions from our simple setting are shown to continue to be valid for features in a higher-dimensional feature space, as long as the feature-generating mechanism is not too extreme and the dimension of this space is not too high compared with the amount of available data. For online versions of super-paramagnetic clustering see http://stoop.ini.uzh.ch/research/clustering. Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yamazaki, Kaoru; Nakamura, Takashi; Kanno, Manabu
2014-09-28
To establish the fundamental understanding of the fragmentation dynamics of highly positively charged nano- and bio-materials, we carried out on-the-fly classical trajectory calculations on the fragmentation dynamics of C60^q+ (q = 20–60). We used the UB3LYP/3-21G level of density functional theory and the self-consistent charge density-functional based tight-binding theory. For q ≥ 20, we found that a two-step explosion mechanism governs the fragmentation dynamics: C60^q+ first ejects singly and multiply charged fast atomic cations C^z+ (z ≥ 1) via Coulomb explosions on a timescale of 10 fs to stabilize the remaining core cluster. Thermal evaporations of slow atomic and molecular fragments from the core cluster subsequently occur on a timescale of 100 fs to 1 ps. Increasing the charge q makes the fragments smaller. This two-step mechanism governs the fragmentation dynamics in the most likely case that the initial kinetic energy accumulated upon ionization to C60^q+ by ion impact or X-ray free electron laser is larger than 100 eV.
Diffusion maps, clustering and fuzzy Markov modeling in peptide folding transitions
NASA Astrophysics Data System (ADS)
Nedialkova, Lilia V.; Amat, Miguel A.; Kevrekidis, Ioannis G.; Hummer, Gerhard
2014-09-01
Using the helix-coil transitions of alanine pentapeptide as an illustrative example, we demonstrate the use of diffusion maps in the analysis of molecular dynamics simulation trajectories. Diffusion maps and other nonlinear data-mining techniques provide powerful tools to visualize the distribution of structures in conformation space. The resulting low-dimensional representations help in partitioning conformation space, and in constructing Markov state models that capture the conformational dynamics. In an initial step, we use diffusion maps to reduce the dimensionality of the conformational dynamics of Ala5. The resulting pretreated data are then used in a clustering step. The identified clusters show excellent overlap with clusters obtained previously by using the backbone dihedral angles as input, with small—but nontrivial—differences reflecting torsional degrees of freedom ignored in the earlier approach. We then construct a Markov state model describing the conformational dynamics in terms of a discrete-time random walk between the clusters. We show that by combining fuzzy C-means clustering with a transition-based assignment of states, we can construct robust Markov state models. This state-assignment procedure suppresses short-time memory effects that result from the non-Markovianity of the dynamics projected onto the space of clusters. In a comparison with previous work, we demonstrate how manifold learning techniques may complement and enhance informed intuition commonly used to construct reduced descriptions of the dynamics in molecular conformation space.
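The fuzzy C-means step mentioned in this abstract assigns each point a graded membership in every cluster instead of a single label. The following is a minimal sketch of the standard membership formula only, not the authors' implementation: the function name `fuzzy_memberships`, the fuzzifier default `m=2.0`, and the toy coordinates are assumptions made for illustration, and the transition-based state assignment described in the abstract is not reproduced here.

```python
import math


def fuzzy_memberships(points, centers, m=2.0):
    """Standard fuzzy C-means memberships (illustrative sketch).

    u[k][i] = 1 / sum_j (d(k, i) / d(k, j)) ** (2 / (m - 1)),
    where d(k, i) is the Euclidean distance from point k to center i
    and m > 1 is the fuzzifier (m = 2.0 is an assumed default).
    """
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for point in points:
        dists = [math.dist(point, center) for center in centers]
        if any(d == 0.0 for d in dists):
            # The point coincides with one or more centers: share the
            # full membership among those centers and give 0 elsewhere.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            total = sum(hits)
            memberships.append([h / total for h in hits])
            continue
        row = [
            1.0 / sum((d_i / d_j) ** exponent for d_j in dists)
            for d_i in dists
        ]
        memberships.append(row)
    return memberships


# Toy one-dimensional example (hypothetical data, two centers).
centers = [(0.0,), (1.0,)]
points = [(0.0,), (0.25,), (0.5,)]
u = fuzzy_memberships(points, centers)
```

Each row of `u` sums to one, so a point halfway between two centers receives equal membership in both, while a point sitting on a center belongs to it entirely.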
Diffusion maps, clustering and fuzzy Markov modeling in peptide folding transitions
Nedialkova, Lilia V.; Amat, Miguel A.; Kevrekidis, Ioannis G.; Hummer, Gerhard
2014-01-01
Using the helix-coil transitions of alanine pentapeptide as an illustrative example, we demonstrate the use of diffusion maps in the analysis of molecular dynamics simulation trajectories. Diffusion maps and other nonlinear data-mining techniques provide powerful tools to visualize the distribution of structures in conformation space. The resulting low-dimensional representations help in partitioning conformation space, and in constructing Markov state models that capture the conformational dynamics. In an initial step, we use diffusion maps to reduce the dimensionality of the conformational dynamics of Ala5. The resulting pretreated data are then used in a clustering step. The identified clusters show excellent overlap with clusters obtained previously by using the backbone dihedral angles as input, with small—but nontrivial—differences reflecting torsional degrees of freedom ignored in the earlier approach. We then construct a Markov state model describing the conformational dynamics in terms of a discrete-time random walk between the clusters. We show that by combining fuzzy C-means clustering with a transition-based assignment of states, we can construct robust Markov state models. This state-assignment procedure suppresses short-time memory effects that result from the non-Markovianity of the dynamics projected onto the space of clusters. In a comparison with previous work, we demonstrate how manifold learning techniques may complement and enhance informed intuition commonly used to construct reduced descriptions of the dynamics in molecular conformation space. PMID:25240340
A comparison of regional flood frequency analysis approaches in a simulation framework
NASA Astrophysics Data System (ADS)
Ganora, D.; Laio, F.
2016-07-01
Regional frequency analysis (RFA) is a well-established methodology to provide an estimate of the flood frequency curve at ungauged (or scarcely gauged) sites. Different RFA approaches exist, depending on the way the information is transferred to the site of interest, but it is not clear in the literature if a specific method systematically outperforms the others. The aim of this study is to provide a framework wherein carrying out the intercomparison by building up a virtual environment based on synthetically generated data. The considered regional approaches include: (i) a unique regional curve for the whole region; (ii) a multiple-region model where homogeneous subregions are determined through cluster analysis; (iii) a Region-of-Influence model which defines a homogeneous subregion for each site; (iv) a spatially smooth estimation procedure where the parameters of the regional model vary continuously along the space. Virtual environments are generated considering different patterns of heterogeneity, including step change and smooth variations. If the region is heterogeneous, with the parent distribution changing continuously within the region, the spatially smooth regional approach outperforms the others, with overall errors 10-50% lower than the other methods. In the case of a step-change, the spatially smooth and clustering procedures perform similarly if the heterogeneity is moderate, while clustering procedures work better when the step-change is severe. To extend our findings, an extensive sensitivity analysis has been performed to investigate the effect of sample length, number of virtual stations, return period of the predicted quantile, variability of the scale parameter of the parent distribution, number of predictor variables and different parent distribution. 
Overall, the spatially smooth approach appears as the most robust approach as its performances are more stable across different patterns of heterogeneity, especially when short records are considered.
Clustering of color map pixels: an interactive approach
NASA Astrophysics Data System (ADS)
Moon, Yiu Sang; Luk, Franklin T.; Yuen, K. N.; Yeung, Hoi Wo
2003-12-01
The demand for digital maps continues to rise as mobile electronic devices become more popular. Instead of creating the entire map from scratch, we may convert a scanned paper map into a digital one. Color clustering is the very first step of the conversion process. Currently, most of the existing clustering algorithms are fully automatic. They are fast and efficient but may not work well in map conversion because of the numerous ambiguities associated with printed maps. Here we introduce two interactive approaches for color clustering on the map: color clustering with pre-calculated index colors (PCIC) and color clustering with pre-calculated color ranges (PCCR). We also introduce a memory model that could enhance and integrate different image processing techniques for fine-tuning the clustering results. Problems and examples of the algorithms are discussed in the paper.
Clermont, Gilles; Chen, Lujie; Dubrawski, Artur W.; Ren, Dianxu; Hoffman, Leslie A.; Pinsky, Michael R.; Hravnak, Marilyn
2018-01-01
Cardiorespiratory instability (CRI) in monitored step-down unit (SDU) patients has a variety of etiologies, and likely manifests in patterns of vital signs (VS) changes. We explored use of clustering techniques to identify patterns in the initial CRI epoch (CRI1; first exceedances of VS beyond stability thresholds after SDU admission) of unstable patients, and inter-cluster differences in admission characteristics and outcomes. Continuous noninvasive monitoring of heart rate (HR), respiratory rate (RR), and pulse oximetry (SpO2) were sampled at 1/20 Hz. We identified CRI1 in 165 patients, employed hierarchical and k-means clustering, tested several clustering solutions, used 10-fold cross validation to establish the best solution and assessed inter-cluster differences in admission characteristics and outcomes. Three clusters (C) were derived: C1) normal/high HR and RR, normal SpO2 (n = 30); C2) normal HR and RR, low SpO2 (n = 103); and C3) low/normal HR, low RR and normal SpO2 (n = 32). Clusters were significantly different based on age (p < 0.001; older patients in C2), number of comorbidities (p = 0.008; more C2 patients had ≥ 2) and hospital length of stay (p = 0.006; C1 patients stayed longer). There were no between-cluster differences in SDU length of stay, or mortality. Three different clusters of VS presentations for CRI1 were identified. Clusters varied on age, number of comorbidities and hospital length of stay. Future study is needed to determine if there are common physiologic underpinnings of VS clusters which might inform clinical decision-making when CRI first manifests. PMID:28229353
Bioinformatics in proteomics: application, terminology, and pitfalls.
Wiemer, Jan C; Prokudin, Alexander
2004-01-01
Bioinformatics applies data mining, i.e., modern computer-based statistics, to biomedical data. It leverages on machine learning approaches, such as artificial neural networks, decision trees and clustering algorithms, and is ideally suited for handling huge data amounts. In this article, we review the analysis of mass spectrometry data in proteomics, starting with common pre-processing steps and using single decision trees and decision tree ensembles for classification. Special emphasis is put on the pitfall of overfitting, i.e., of generating too complex single decision trees. Finally, we discuss the pros and cons of the two different decision tree usages.
Jeong, Ji-Wook; Chae, Seung-Hoon; Chae, Eun Young; Kim, Hak Hee; Choi, Young-Wook; Lee, Sooyeul
2016-01-01
We propose a computer-aided detection (CADe) algorithm for microcalcification (MC) clusters in reconstructed digital breast tomosynthesis (DBT) images. The algorithm consists of prescreening, MC detection, clustering, and false-positive (FP) reduction steps. The DBT images containing the MC-like objects were enhanced by a multiscale Hessian-based three-dimensional (3D) objectness response function, and a connected-component segmentation method was applied to extract the cluster seed objects as potential clustering centers of MCs. Secondly, a signal-to-noise ratio (SNR) enhanced image was also generated to detect the individual MC candidates and prescreen the MC-like objects. Each cluster seed candidate was prescreened by counting the individual MC candidates near the cluster seed object according to several microcalcification clustering criteria. As a second step, we introduced bounding boxes for the accepted seed candidate, clustered all the overlapping cubes, and examined. After the FP reduction step, the average number of FPs per case was estimated to be 2.47 per DBT volume with a sensitivity of 83.3%.
Fornasaro, Stefano; Vicario, Annalisa; De Leo, Luigina; Bonifacio, Alois; Not, Tarcisio; Sergo, Valter
2018-05-14
Raman hyperspectral imaging is an emerging practice in biological and biomedical research for label free analysis of tissues and cells. Using this method, both spatial distribution and spectral information of analyzed samples can be obtained. The current study reports the first Raman microspectroscopic characterisation of colon tissues from patients with Coeliac Disease (CD). The aim was to assess if Raman imaging coupled with hyperspectral multivariate image analysis is capable of detecting the alterations in the biochemical composition of intestinal tissues associated with CD. The analytical approach was based on a multi-step methodology: duodenal biopsies from healthy and coeliac patients were measured and processed with Multivariate Curve Resolution Alternating Least Squares (MCR-ALS). Based on the distribution maps and the pure spectra of the image constituents obtained from MCR-ALS, interesting biochemical differences between healthy and coeliac patients have been derived. Notably, a reduced distribution of complex lipids in the pericryptic space, and a different distribution and abundance of proteins rich in beta-sheet structures, were found in CD patients. The output of the MCR-ALS analysis was then used as a starting point for two clustering algorithms (k-means clustering and hierarchical clustering methods). Both methods converged with similar results providing precise segmentation over multiple Raman images of studied tissues.
Panagopoulos, G P; Angelopoulou, D; Tzirtzilakis, E E; Giannoulopoulos, P
2016-10-01
This paper presents an innovative method for the discrimination of groundwater samples into common groups representing the hydrogeological units from which they have been pumped. This method proved very efficient even in areas with complex hydrogeological regimes. The proposed method requires chemical analyses of water samples only for major ions, meaning that it is applicable in most cases worldwide. Another benefit of the method is that it gives further insight into the aquifer hydrogeochemistry, as it identifies the ions that are responsible for the discrimination of the groups. The procedure begins with cluster analysis of the dataset in order to classify the samples into the corresponding hydrogeological unit. The feasibility of the method is shown by the fact that the samples of volcanic origin were separated into two different clusters, namely the lava units and the pyroclastic-ignimbritic aquifer. The second step is the discriminant analysis of the data, which provides the functions that distinguish the groups from each other and the most significant variables that define the hydrochemical composition of the aquifer. The whole procedure was highly successful, as 94.7% of the samples were classified to the correct aquifer system. Finally, the resulting functions can be safely used to categorize samples of either unknown or doubtful origin, thus improving the quality and the size of existing hydrochemical databases.
NASA Astrophysics Data System (ADS)
Titantah, John T.; Karttunen, Mikko
2016-05-01
Electronic and optical properties of silver clusters were calculated using two different ab initio approaches: (1) based on all-electron full-potential linearized-augmented plane-wave method and (2) local basis function pseudopotential approach. Agreement is found between the two methods for small and intermediate sized clusters for which the former method is limited due to its all-electron formulation. The latter, due to non-periodic boundary conditions, is the more natural approach to simulate small clusters. The effect of cluster size is then explored using the local basis function approach. We find that as the cluster size increases, the electronic structure undergoes a transition from molecular behavior to nanoparticle behavior at a cluster size of 140 atoms (diameter ~1.7 nm). Above this cluster size the step-like electronic structure, evident as several features in the imaginary part of the polarizability of all clusters smaller than Ag147, gives way to a dominant plasmon peak localized at wavelengths 350 nm ≤ λ ≤ 600 nm. It is, thus, at this length-scale that the conduction electrons' collective oscillations that are responsible for plasmonic resonances begin to dominate the opto-electronic properties of silver nanoclusters.
Function analysis of 5'-UTR of the cellulosomal xyl-doc cluster in Clostridium papyrosolvens.
Zou, Xia; Ren, Zhenxing; Wang, Na; Cheng, Yin; Jiang, Yuanyuan; Wang, Yan; Xu, Chenggang
2018-01-01
Anaerobic, mesophilic, and cellulolytic Clostridium papyrosolvens produces an efficient cellulolytic extracellular complex named cellulosome that hydrolyzes plant cell wall polysaccharides into simple sugars. Its genome harbors two long cellulosomal clusters: cip - cel operon encoding major cellulosome components (including scaffolding) and xyl - doc gene cluster encoding hemicellulases. Compared with works on cip - cel operon, there are much fewer studies on xyl - doc mainly due to its rare location in cellulolytic clostridia. Sequence analysis of xyl - doc revealed that it harbors a 5' untranslated region (5'-UTR) which potentially plays a role in the regulation of downstream gene expression. Here, we analyzed the function of 5'-UTR of xyl - doc cluster in C. papyrosolvens in vivo via transformation technology developed in this study. In this study, we firstly developed an electrotransformation method for C. papyrosolvens DSM 2782 before the analysis of 5'-UTR of xyl - doc cluster. In the optimized condition, a field with an intensity of 7.5-9.0 kV/cm was applied to a cuvette (0.2 cm gap) containing a mixture of plasmid and late cell suspended in exponential phase to form a 5 ms pulse in a sucrose-containing buffer. Afterwards, the putative promoter and the 5'-UTR of xyl - doc cluster were determined by sequence alignment. It is indicated that xyl - doc possesses a long conservative 5'-UTR with a complex secondary structure encompassing at least two perfect stem-loops which are potential candidates for controlling the transcriptional termination. In the last step, we employed an oxygen-independent flavin-based fluorescent protein (FbFP) as a quantitative reporter to analyze promoter activity and 5'-UTR function in vivo. It revealed that 5'-UTR significantly blocked transcription of downstream genes, but corn stover can relieve its suppression. 
In the present study, our results demonstrated that 5'-UTR of the cellulosomal xyl - doc cluster blocks the transcriptional activity of promoter. However, some substrates, such as corn stover, can relieve the effect of depression of 5'-UTR. Thus, it is speculated that 5'-UTR of xyl - doc was a putative riboswitch to regulate the expression of downstream cellulosomal genes, which is helpful to understand the complex regulation of cellulosome.
Li, Hong Zhi; Hu, Li Hong; Tao, Wei; Gao, Ting; Li, Hui; Lu, Ying Hua; Su, Zhong Min
2012-01-01
A DFT-SOFM-RBFNN method is proposed to improve the accuracy of DFT calculations on Y-NO (Y = C, N, O, S) homolysis bond dissociation energies (BDE) by combining density functional theory (DFT) and artificial intelligence/machine learning methods, which consist of self-organizing feature mapping neural networks (SOFMNN) and radial basis function neural networks (RBFNN). A descriptor refinement step including SOFMNN clustering analysis and correlation analysis is implemented. The SOFMNN clustering analysis is applied to classify descriptors, and the representative descriptors in the groups are selected as neural network inputs according to their closeness to the experimental values through correlation analysis. Redundant descriptors and intuitively biased choices of descriptors can be avoided by this newly introduced step. Using RBFNN calculation with the selected descriptors, chemical accuracy (≤1 kcal·mol−1) is achieved for all 92 calculated organic Y-NO homolysis BDE calculated by DFT-B3LYP, and the mean absolute deviations (MADs) of the B3LYP/6-31G(d) and B3LYP/STO-3G methods are reduced from 4.45 and 10.53 kcal·mol−1 to 0.15 and 0.18 kcal·mol−1, respectively. The improved results for the minimal basis set STO-3G reach the same accuracy as those of 6-31G(d), and thus B3LYP calculation with the minimal basis set is recommended to be used for minimizing the computational cost and to expand the applications to large molecular systems. Further extrapolation tests are performed with six molecules (two containing Si-NO bonds and two containing fluorine), and the accuracy of the tests was within 1 kcal·mol−1. This study shows that DFT-SOFM-RBFNN is an efficient and highly accurate method for Y-NO homolysis BDE. The method may be used as a tool to design new NO carrier molecules. PMID:22942689
Cluster-based analysis improves predictive validity of spike-triggered receptive field estimates
Malone, Brian J.
2017-01-01
Spectrotemporal receptive field (STRF) characterization is a central goal of auditory physiology. STRFs are often approximated by the spike-triggered average (STA), which reflects the average stimulus preceding a spike. In many cases, the raw STA is subjected to a threshold defined by gain values expected by chance. However, such correction methods have not been universally adopted, and the consequences of specific gain-thresholding approaches have not been investigated systematically. Here, we evaluate two classes of statistical correction techniques, using the resulting STRF estimates to predict responses to a novel validation stimulus. The first, more traditional technique eliminated STRF pixels (time-frequency bins) with gain values expected by chance. This correction method yielded significant increases in prediction accuracy, including when the threshold setting was optimized for each unit. The second technique was a two-step thresholding procedure wherein clusters of contiguous pixels surviving an initial gain threshold were then subjected to a cluster mass threshold based on summed pixel values. This approach significantly improved upon even the best gain-thresholding techniques. Additional analyses suggested that allowing threshold settings to vary independently for excitatory and inhibitory subfields of the STRF resulted in only marginal additional gains, at best. In summary, augmenting reverse correlation techniques with principled statistical correction choices increased prediction accuracy by over 80% for multi-unit STRFs and by over 40% for single-unit STRFs, furthering the interpretational relevance of the recovered spectrotemporal filters for auditory systems analysis. PMID:28877194
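The two-step procedure described in this abstract (an initial per-pixel gain threshold, then a cluster mass threshold on contiguous surviving pixels) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the function name `cluster_mass_threshold`, the use of 4-connectivity, grouping only same-sign pixels, measuring mass as the sum of absolute pixel values, and passing both thresholds as fixed numbers rather than deriving them from chance distributions.

```python
from collections import deque


def cluster_mass_threshold(strf, gain_thresh, mass_thresh):
    """Keep only pixels that belong to a sufficiently massive cluster.

    strf is a list of rows (time-frequency bins). Both thresholds are
    hypothetical fixed values chosen by the caller.
    """
    rows, cols = len(strf), len(strf[0])
    # Step 1: pixels whose absolute gain survives the initial threshold.
    keep = [[abs(strf[r][c]) >= gain_thresh for c in range(cols)]
            for r in range(rows)]
    seen = [[False] * cols for _ in range(rows)]
    out = [[0.0] * cols for _ in range(rows)]

    for r in range(rows):
        for c in range(cols):
            if not keep[r][c] or seen[r][c]:
                continue
            # Step 2: breadth-first search over 4-connected pixels of the
            # same sign (assumed definition of a "contiguous cluster").
            positive = strf[r][c] > 0
            queue = deque([(r, c)])
            seen[r][c] = True
            cluster = []
            while queue:
                i, j = queue.popleft()
                cluster.append((i, j))
                for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ni, nj = i + di, j + dj
                    if (0 <= ni < rows and 0 <= nj < cols
                            and keep[ni][nj] and not seen[ni][nj]
                            and (strf[ni][nj] > 0) == positive):
                        seen[ni][nj] = True
                        queue.append((ni, nj))
            # Cluster mass: summed absolute pixel values (an assumption).
            mass = sum(abs(strf[i][j]) for i, j in cluster)
            if mass >= mass_thresh:
                for i, j in cluster:
                    out[i][j] = strf[i][j]
    return out


example = [
    [0.1, 2.0, 2.5],
    [0.0, 0.2, 0.1],
    [3.0, 0.1, -0.1],
]
filtered = cluster_mass_threshold(example, gain_thresh=1.0, mass_thresh=4.0)
```

In this toy input the two adjacent pixels in the top row survive together, while the isolated pixel in the bottom-left corner passes the gain threshold but fails the mass threshold and is removed.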
Biclustering of gene expression data using reactive greedy randomized adaptive search procedure.
Dharan, Smitha; Nair, Achuthsankar S
2009-01-30
Biclustering algorithms belong to a distinct class of clustering algorithms that perform simultaneous clustering of both rows and columns of the gene expression matrix and can be a very useful analysis tool when some genes have multiple functions and experimental conditions are diverse. Cheng and Church introduced a measure called the mean squared residue score to evaluate the quality of a bicluster, and it has become one of the most popular measures used to search for biclusters. In this paper, we review basic concepts of the metaheuristic Greedy Randomized Adaptive Search Procedure (GRASP), namely its construction and local search phases, and propose a new method, a variant of GRASP called Reactive Greedy Randomized Adaptive Search Procedure (Reactive GRASP), to detect significant biclusters from large microarray datasets. The method has two major steps. First, high quality bicluster seeds are generated by means of k-means clustering. In the second step, these seeds are grown using the Reactive GRASP, in which the basic parameter that defines the restrictiveness of the candidate list is self-adjusted, depending on the quality of the solutions found previously. We performed statistical and biological validations of the biclusters obtained and evaluated the method against the results of basic GRASP as well as the classic work of Cheng and Church. The experimental results indicate that the Reactive GRASP approach outperforms the basic GRASP algorithm and the Cheng and Church approach. The Reactive GRASP approach for the detection of significant biclusters is robust and does not require calibration efforts.
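As a brief illustration of the mean squared residue score mentioned above, the sketch below computes it for a bicluster given as a small matrix. It assumes the usual definition, in which the residue of an entry is the entry minus its row mean, minus its column mean, plus the overall mean, and the score is the average squared residue. The function name `mean_squared_residue` and the toy matrices are hypothetical and are not drawn from the paper.

```python
def mean_squared_residue(bicluster):
    """Average squared residue of a bicluster (list of equal-length rows)."""
    n_rows, n_cols = len(bicluster), len(bicluster[0])
    row_means = [sum(row) / n_cols for row in bicluster]
    col_means = [sum(row[j] for row in bicluster) / n_rows
                 for j in range(n_cols)]
    overall = sum(row_means) / n_rows
    total = 0.0
    for i, row in enumerate(bicluster):
        for j, value in enumerate(row):
            # Residue: deviation from the additive row + column model.
            residue = value - row_means[i] - col_means[j] + overall
            total += residue ** 2
    return total / (n_rows * n_cols)


# Rows differ only by a constant shift, so the additive model fits exactly.
coherent = [[1, 2, 3], [2, 3, 4], [4, 5, 6]]
# A checkerboard pattern cannot be explained by row and column effects.
incoherent = [[1, 0], [0, 1]]

score_coherent = mean_squared_residue(coherent)
score_incoherent = mean_squared_residue(incoherent)
```

A lower score indicates a more coherent bicluster; the shifted-row matrix scores zero, while the checkerboard scores 0.25.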
MC²: galaxy imaging and redshift analysis of the merging cluster CIZA J2242.8+5301
Dawson, William A.; Jee, M. James; Stroe, Andra; ...
2015-05-28
X-ray and radio observations of CIZA J2242.8+5301 suggest that it is a major cluster merger. Despite being well studied in the X-ray and radio, little has been presented on the cluster structure and dynamics inferred from its galaxy population. We carried out a deep (i < 25) broad band imaging survey of the system with Subaru SuprimeCam (g & i bands) and the Canada France Hawaii Telescope (r band) as well as a comprehensive spectroscopic survey of the cluster area (505 redshifts) using Keck DEIMOS. We use this data to perform a comprehensive galaxy/redshift analysis of the system, which is the first step to a proper understanding of the geometry and dynamics of the merger, as well as to using the merger to constrain self-interacting dark matter.
Yi, Chucai; Tian, Yingli
2012-09-01
In this paper, we propose a novel framework to extract text regions from scene images with complex backgrounds and multiple text appearances. This framework consists of three main steps: boundary clustering (BC), stroke segmentation, and string fragment classification. In BC, we propose a new bigram-color-uniformity-based method to model both text and attachment surface, and cluster edge pixels based on color pairs and spatial positions into boundary layers. Then, stroke segmentation is performed at each boundary layer by color assignment to extract character candidates. We propose two algorithms to combine the structural analysis of text stroke with color assignment and filter out background interferences. Further, we design a robust string fragment classification based on Gabor-based text features. The features are obtained from feature maps of gradient, stroke distribution, and stroke width. The proposed framework of text localization is evaluated on scene images, born-digital images, broadcast video images, and images of handheld objects captured by blind persons. Experimental results on respective datasets demonstrate that the framework outperforms state-of-the-art localization algorithms.
Results from DESDM Pipeline on Data From Blanco Cosmology Survey
NASA Astrophysics Data System (ADS)
Desai, Shantanu; Mohr, J.; Armstrong, R.; Bertin, E.; Zenteno, A.; Tucker, D.; Song, J.; Ngeow, C.; Lin, H.; Bazin, G.; Liu, J.; Cosmology Survey, Blanco
2011-01-01
The Blanco Cosmology Survey (BCS) is a 60-night survey of the southern skies using the CTIO Blanco 4 m telescope, whose main goal is to study cosmic acceleration using galaxy clusters. BCS has carried out observations in two 50 degree patches of the southern skies centered at 23 hr and 5 hr in griz bands. These fields were chosen to maximize overlap with the South Pole Telescope. The data from this survey have been processed using the Dark Energy Survey Data Management System (DESDM) on Teragrid resources at NCSA and CCT. DESDM is developed to analyze data from the Dark Energy Survey, which begins around 2011, and analysis of real data provides a valuable warm-up exercise before the DES survey starts. We describe in detail the key steps in producing science-ready catalogs from the raw data. These include detrending, astrometric calibration, photometric calibration, and co-addition with PSF homogenization. The final catalogs are constructed using model-fitting photometry, which includes detailed galaxy fitting models convolved with the local PSF. We illustrate how photometric redshifts of galaxy clusters are estimated using red-sequence fitting and show results from a few clusters.
The Bacillus subtilis GntR family repressor YtrA responds to cell wall antibiotics.
Salzberg, Letal I; Luo, Yun; Hachmann, Anna-Barbara; Mascher, Thorsten; Helmann, John D
2011-10-01
The transglycosylation step of cell wall synthesis is a prime antibiotic target because it is essential and specific to bacteria. Two antibiotics, ramoplanin and moenomycin, target this step by binding to the substrate lipid II and the transglycosylase enzyme, respectively. Here, we compare the ramoplanin and moenomycin stimulons in the Gram-positive model organism Bacillus subtilis. Ramoplanin strongly induces the LiaRS two-component regulatory system, while moenomycin almost exclusively induces genes that are part of the regulon of the extracytoplasmic function (ECF) σ factor σM. Ramoplanin additionally induces the ytrABCDEF and ywoBCD operons, which are not part of a previously characterized antibiotic-responsive regulon. Cluster analysis reveals that these two operons are selectively induced by a subset of cell wall antibiotics that inhibit lipid II function or recycling. Repression of both operons requires YtrA, which recognizes an inverted repeat in front of its own operon and in front of ywoB. These results suggest that YtrA is an additional regulator of cell envelope stress responses.
The Bacillus subtilis GntR Family Repressor YtrA Responds to Cell Wall Antibiotics
Salzberg, Letal I.; Luo, Yun; Hachmann, Anna-Barbara; Mascher, Thorsten; Helmann, John D.
2011-01-01
The transglycosylation step of cell wall synthesis is a prime antibiotic target because it is essential and specific to bacteria. Two antibiotics, ramoplanin and moenomycin, target this step by binding to the substrate lipid II and the transglycosylase enzyme, respectively. Here, we compare the ramoplanin and moenomycin stimulons in the Gram-positive model organism Bacillus subtilis. Ramoplanin strongly induces the LiaRS two-component regulatory system, while moenomycin almost exclusively induces genes that are part of the regulon of the extracytoplasmic function (ECF) σ factor σM. Ramoplanin additionally induces the ytrABCDEF and ywoBCD operons, which are not part of a previously characterized antibiotic-responsive regulon. Cluster analysis reveals that these two operons are selectively induced by a subset of cell wall antibiotics that inhibit lipid II function or recycling. Repression of both operons requires YtrA, which recognizes an inverted repeat in front of its own operon and in front of ywoB. These results suggest that YtrA is an additional regulator of cell envelope stress responses. PMID:21856850
NASA Astrophysics Data System (ADS)
Root, D. B.; Mattinson, J. M.; Hacker, B. R.; Wooden, J. L.
2002-12-01
Understanding the formation and exhumation of the ultrahigh-pressure (UHP) rocks of western Norway hinges on precise determination of the time of eclogite recrystallization. Our study consists of SHRIMP analysis, in conjunction with CL imagery, of zircon from four UHP and high-pressure (HP) eclogites; and detailed TIMS analysis of zircon from two samples subjected to combined thermal annealing and multi-step chemical abrasion (CA). SHRIMP analyses of the Otnheim and Langenes eclogites yield Caledonian spot ages of ca. 400 Ma from zircon rims. CL imagery and Th/U ratios from the Langenes eclogite indicate formation of rims by recrystallization of inherited zircon. SHRIMP analysis of the UHP Flatraket eclogite yielded a broad range of apparently concordant Caledonian ages. CA analyses of two fractions yielded moderate Pb loss from the first (lowest T) steps; possible minor Pb loss or minor growth at 400 Ma from the second steps; and a 407-404 Ma cluster of slightly discordant 206Pb/238U ages, most likely free from Pb loss, from the remaining steps. We interpret the latter to reflect recrystallization of inherited zircon, with possible new growth, at ca. 400-395 Ma. Alternatively, the high-temperature CA steps could represent growth at 407-404 Ma, with apparent discordance due to intermediate daughter product effects. HP/UHP zircon recrystallization in the Flatraket eclogite is inferred from three lines of evidence: i) zircon occurs as inclusions in garnet, omphacite, breunnerite, dolomite, and quartz, as well as in symplectites after phengite and omphacite; ii) association of zircon with rutile implies zircon formation during HP breakdown of Zr-ilmenite; and iii) chondrite-normalized ICP-MS analyses of the CA steps reveal small Eu anomalies and shallow HREE profiles, indicating zircon recrystallization in the presence of garnet. 
CA analysis of the Verpeneset eclogite yielded distinctly discordant step ages from two steps comprising <90% of the sample, with 206Pb/238U ages of 408 and 414 Ma. CL imagery indicates incomplete recrystallization of inherited igneous zircon, in keeping with steep HREE profiles determined from chondrite-normalized ICP-MS analyses. Our zircon age of ca. 400-395 Ma for the Flatraket eclogite is significantly younger than the 425 Ma age often cited for western Norway eclogite recrystallization, implying, in conjunction with 390-385 Ma 40Ar/39Ar white mica cooling ages, faster rates of exhumation (ca. 15 km/m.y.), and weakening the link between UHP metamorphism and ophiolite emplacement at 430-425 Ma.
NASA Astrophysics Data System (ADS)
Haghighi, Babak; Choi, Jiwoong; Choi, Sanghun; Hoffman, Eric A.; Lin, Ching-Long
2017-11-01
Accurate modeling of small airway diameters in patients with chronic obstructive pulmonary disease (COPD) is a crucial step toward patient-specific CFD simulations of regional airflow and particle transport. We proposed to use computed tomography (CT) imaging-based cluster membership to identify structural characteristics of airways in each cluster and use them to develop cluster-specific airway diameter models. We analyzed 284 COPD smokers with airflow limitation and 69 healthy controls. We used multiscale imaging-based cluster analysis (MICA) to classify smokers into 4 clusters. With representative cluster patients and healthy controls, we performed multiple regressions to quantify variation of airway diameters by generation as well as by cluster. Clusters 2 and 4 showed a larger decrease in diameter with increasing generation than the other clusters. Cluster 4 had more rapid decreases of airway diameters in the upper lobes, while cluster 2 had more rapid decreases in the lower lobes. We then used these regression models to estimate airway diameters in CT-unresolved regions to obtain pressure-volume hysteresis curves using a 1D resistance model. These 1D flow solutions can be used to provide the patient-specific boundary conditions for 3D CFD simulations in COPD patients. Support for this study was provided, in part, by NIH Grants U01-HL114494, R01-HL112986 and S10-RR022421.
Sarrafzadeh, Omid; Dehnavi, Alireza Mehri
2015-01-01
Segmentation of leukocytes acts as the foundation for all automated image-based hematological disease recognition systems. Most of the time, hematologists are interested in evaluation of white blood cells only. Digital image processing techniques can help them in their analysis and diagnosis. The main objective of this paper is to detect leukocytes from a blood smear microscopic image and segment them into their two dominant elements, nucleus and cytoplasm. The segmentation is conducted using two stages of applying K-means clustering. First, the nuclei are segmented using K-means clustering. Then, a proposed method based on region growing is applied to separate the connected nuclei. Next, the nuclei are subtracted from the original image. Finally, the cytoplasm is segmented using the second stage of K-means clustering. The results indicate that the proposed method is able to extract the nucleus and cytoplasm regions accurately and works well even though there is no significant contrast between the components in the image. In this paper, a method based on K-means clustering and region growing is proposed in order to detect leukocytes from a blood smear microscopic image and segment their components, the nucleus and the cytoplasm. As the region growing step of the algorithm relies on edge information, it will not be able to separate connected nuclei accurately where edges are poor; it requires at least a weak edge to exist between the nuclei. The nucleus and cytoplasm segments of a leukocyte can be used for feature extraction and classification, which leads to automated leukemia detection.
Nucleus and cytoplasm segmentation in microscopic images using K-means clustering and region growing
Sarrafzadeh, Omid; Dehnavi, Alireza Mehri
2015-01-01
Background: Segmentation of leukocytes acts as the foundation for all automated image-based hematological disease recognition systems. Most of the time, hematologists are interested in evaluation of white blood cells only. Digital image processing techniques can help them in their analysis and diagnosis. Materials and Methods: The main objective of this paper is to detect leukocytes from a blood smear microscopic image and segment them into their two dominant elements, nucleus and cytoplasm. The segmentation is conducted using two stages of applying K-means clustering. First, the nuclei are segmented using K-means clustering. Then, a proposed method based on region growing is applied to separate the connected nuclei. Next, the nuclei are subtracted from the original image. Finally, the cytoplasm is segmented using the second stage of K-means clustering. Results: The results indicate that the proposed method is able to extract the nucleus and cytoplasm regions accurately and works well even though there is no significant contrast between the components in the image. Conclusions: In this paper, a method based on K-means clustering and region growing is proposed in order to detect leukocytes from a blood smear microscopic image and segment their components, the nucleus and the cytoplasm. As the region growing step of the algorithm relies on edge information, it will not be able to separate connected nuclei accurately where edges are poor; it requires at least a weak edge to exist between the nuclei. The nucleus and cytoplasm segments of a leukocyte can be used for feature extraction and classification, which leads to automated leukemia detection. PMID:26605213
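The two-stage K-means idea in this abstract can be sketched on a flat list of grey-level pixel intensities. This is only a rough illustration under strong assumptions that do not come from the paper: clustering is done on one-dimensional intensity alone, each stage uses k = 2, the nucleus is taken to be the darkest cluster, and the region-growing step that separates touching nuclei is omitted entirely. The names `kmeans_1d` and `two_stage_segmentation` are hypothetical.

```python
def kmeans_1d(values, k, iters=50):
    """Plain 1-D K-means with evenly spaced initial centres (k >= 2)."""
    lo, hi = min(values), max(values)
    centers = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assign each value to its nearest centre.
        labels = [min(range(k), key=lambda j: abs(v - centers[j]))
                  for v in values]
        new_centers = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            # Keep the old centre if a cluster happens to be empty.
            new_centers.append(sum(members) / len(members)
                               if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


def two_stage_segmentation(pixels):
    """Label each pixel as 'nucleus', 'cytoplasm' or 'background'."""
    # Stage 1: the darkest of two clusters is assumed to be the nucleus.
    labels1, centers1 = kmeans_1d(pixels, 2)
    dark1 = centers1.index(min(centers1))
    result = ["nucleus" if lab == dark1 else "background" for lab in labels1]

    # "Subtract" the nuclei: only the remaining pixels enter stage 2.
    rest = [i for i, lab in enumerate(labels1) if lab != dark1]
    labels2, centers2 = kmeans_1d([pixels[i] for i in rest], 2)
    dark2 = centers2.index(min(centers2))
    for i, lab in zip(rest, labels2):
        if lab == dark2:
            result[i] = "cytoplasm"
    return result


toy_pixels = [10, 12, 15, 150, 160, 155, 230, 240, 235]
segments = two_stage_segmentation(toy_pixels)
```

On this toy input the three darkest values are labelled nucleus, the mid-range values cytoplasm, and the brightest values background. Real smear images would need colour features and spatial information, which this sketch deliberately leaves out.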
Searching regional rainfall homogeneity using atmospheric fields
NASA Astrophysics Data System (ADS)
Gabriele, Salvatore; Chiaravalloti, Francesco
2013-03-01
The correct identification of homogeneous areas in regional rainfall frequency analysis is fundamental to ensure the best selection of the probability distribution and the regional model which produce low bias and low root mean square error of quantiles estimation. In an attempt at rainfall spatial homogeneity, the paper explores a new approach that is based on meteo-climatic information. The results are verified ex-post using standard homogeneity tests applied to the annual maximum daily rainfall series. The first step of the proposed procedure selects two different types of homogeneous large regions: convective macro-regions, which contain high values of the Convective Available Potential Energy index, normally associated with convective rainfall events, and stratiform macro-regions, which are characterized by low values of the Q vector Divergence index, associated with dynamic instability and stratiform precipitation. These macro-regions are identified using Hot Spot Analysis to emphasize clusters of extreme values of the indexes. In the second step, inside each identified macro-region, homogeneous sub-regions are found using kriging interpolation on the mean direction of the Vertically Integrated Moisture Flux. To check the proposed procedure, two detailed examples of homogeneous sub-regions are examined.
Enhancement of deuterium retention in damaged tungsten by plasma-induced defect clustering
NASA Astrophysics Data System (ADS)
Jin, Younggil; Roh, Ki-Baek; Sheen, Mi-Hyang; Kim, Nam-Kyun; Song, Jaemin; Kim, Young-Woon; Kim, Gon-Ho
2017-12-01
The enhancement of deuterium retention was investigated for tungsten in the presence of both 2.8 MeV self-ion induced cascade damage and fuel hydrogen isotope plasma. Vacancy clustering in cascade damaged polycrystalline tungsten occurred due to deuterium irradiation and was observed near the grain boundary by using all-step transmission electron microscopy analysis. Analysis of the highest desorption temperature peak using thermal desorption spectroscopy supports reasonable evidence of defect clustering in the damaged polycrystalline tungsten. The defect clustering was neither observed on the damaged polycrystalline tungsten without deuterium irradiation nor on the damaged single-crystalline tungsten with deuterium irradiation. This result implies the synergetic role of deuterium and grain boundary on defect clustering. This study proposes a path for the defect transform from point defect to defect cluster, by the agglomeration between irradiated deuterium and cascade damage-induced defect. This agglomeration may induce more severe damage on the tungsten divertor at which the high fuel hydrogen ions, fast neutrons, and self-ions are irradiated simultaneously and it would increase the in-vessel tritium inventory.
Gifford, Elizabeth V; Tavakoli, Sara; Weingardt, Kenneth R; Finney, John W; Pierson, Heather M; Rosen, Craig S; Hagedorn, Hildi J; Cook, Joan M; Curran, Geoff M
2012-01-01
Evidence-based psychological treatments (EBPTs) are clusters of interventions, but it is unclear how providers actually implement these clusters in practice. A disaggregated measure of EBPTs was developed to characterize clinicians' component-level evidence-based practices and to examine relationships among these practices. Survey items captured components of evidence-based treatments based on treatment integrity measures. The Web-based survey was conducted with 75 U.S. Department of Veterans Affairs (VA) substance use disorder (SUD) practitioners and 149 non-VA community-based SUD practitioners. Clinicians' self-designated treatment orientations were positively related to their endorsement of those EBPT components; however, clinicians used components from a variety of EBPTs. Hierarchical cluster analysis indicated that clinicians combined and organized interventions from cognitive-behavioral therapy, the community reinforcement approach, motivational interviewing, structured family and couples therapy, 12-step facilitation, and contingency management into clusters including empathy and support, treatment engagement and activation, abstinence initiation, and recovery maintenance. Understanding how clinicians use EBPT components may lead to improved evidence-based practice dissemination and implementation. Published by Elsevier Inc.
Comparing the performance of biomedical clustering methods.
Wiwie, Christian; Baumbach, Jan; Röttger, Richard
2015-11-01
Identifying groups of similar objects is a popular first step in biomedical data analysis, but it is error-prone and impossible to perform manually. Many computational methods have been developed to tackle this problem. Here we assessed 13 well-known methods using 24 data sets ranging from gene expression to protein domains. Performance was judged on the basis of 13 common cluster validity indices. We developed a clustering analysis platform, ClustEval (http://clusteval.mpi-inf.mpg.de), to promote streamlined evaluation, comparison and reproducibility of clustering results in the future. This allowed us to objectively evaluate the performance of all tools on all data sets with up to 1,000 different parameter sets each, resulting in a total of more than 4 million calculated cluster validity indices. We observed that there was no universal best performer, but on the basis of this wide-ranging comparison we were able to develop a short guideline for biomedical clustering tasks. ClustEval allows biomedical researchers to pick the appropriate tool for their data type and allows method developers to compare their tool to the state of the art.
Automatic classification of atypical lymphoid B cells using digital blood image processing.
Alférez, S; Merino, A; Mujica, L E; Ruiz, M; Bigorra, L; Rodellar, J
2014-08-01
There are automated systems for digital peripheral blood (PB) cell analysis, but they operate most effectively in nonpathological blood samples. The objective of this work was to design a methodology to improve the automatic classification of abnormal lymphoid cells. We analyzed 340 digital images of individual lymphoid cells from PB films obtained in the CellaVision DM96: 150 chronic lymphocytic leukemia (CLL) cells, 100 hairy cell leukemia (HCL) cells, and 90 normal lymphocytes (N). We implemented the Watershed Transformation to segment the nucleus, the cytoplasm, and the peripheral cell region. We extracted 44 features and then the clustering Fuzzy C-Means (FCM) was applied in two steps for the lymphocyte classification. The images were automatically clustered in three groups, one of them with 98% of the HCL cells. The set of the remaining cells was clustered again using FCM and texture features. The two new groups contained 83.3% of the N cells and 71.3% of the CLL cells, respectively. The approach has been able to automatically classify with high precision three types of lymphoid cells. The addition of more descriptors and other classification techniques will allow extending the classification to other classes of atypical lymphoid cells. © 2013 John Wiley & Sons Ltd.
Davey, Calum; Aiken, Alexander M; Hayes, Richard J; Hargreaves, James R
2015-01-01
Introduction: Helminth (worm) infections cause morbidity among poor communities worldwide. An influential study conducted in Kenya in 1998–99 reported that a school-based drug-and-educational intervention had benefits for worm infections and school attendance. Methods: In this statistical replication, we re-analysed data from this cluster quasi-randomized stepped-wedge trial, specifying two co-primary outcomes: school attendance and examination performance. We estimated intention-to-treat effects using year-stratified cluster-summary analysis and observation-level random-effects regression, and combined both years with a random-effects model accounting for year. The participants were not blinded to allocation status, and other interventions were concurrently conducted in a sub-set of schools. A protocol guiding outcome data collection was not available. Results: Quasi-randomization resulted in three similar groups of 25 schools. There was a substantial amount of missing data. In year-stratified cluster-summary analysis, there was no clear evidence for improvement in either school attendance or examination performance. In year-stratified regression models, there was some evidence of improvement in school attendance [adjusted odds ratios (aOR): year 1: 1.48, 95% confidence interval (CI) 0.88–2.52, P = 0.147; year 2: 1.23, 95% CI 1.01–1.51, P = 0.044], but not examination performance (adjusted differences: year 1: −0.135, 95% CI −0.323–0.054, P = 0.161; year 2: −0.017, 95% CI −0.201–0.166, P = 0.854). When both years were combined, there was strong evidence of an effect on attendance (aOR 1.82, 95% CI 1.74–1.91, P < 0.001), but not examination performance (adjusted difference −0.121, 95% CI −0.293–0.052, P = 0.169). Conclusions: The evidence supporting an improvement in school attendance differed by analysis method. This, and various other important limitations of the data, caution against over-interpretation of the results. 
We find that the study provides some evidence, but with high risk of bias, that a school-based drug-treatment and health-education intervention improved school attendance and no evidence of effect on examination performance. PMID:26203171
Aoki, Shuichiro; Murata, Hiroshi; Fujino, Yuri; Matsuura, Masato; Miki, Atsuya; Tanito, Masaki; Mizoue, Shiro; Mori, Kazuhiko; Suzuki, Katsuyoshi; Yamashita, Takehiro; Kashiwagi, Kenji; Hirasawa, Kazunori; Shoji, Nobuyuki; Asaoka, Ryo
2017-12-01
To investigate the usefulness of the Octopus (Haag-Streit) EyeSuite's cluster trend analysis in glaucoma. Ten visual fields (VFs) with the Humphrey Field Analyzer (Carl Zeiss Meditec), spanning 7.7 years on average, were obtained from 728 eyes of 475 primary open angle glaucoma patients. Mean total deviation (mTD) trend analysis and EyeSuite's cluster trend analysis were performed on various series of VFs (from 1st to 10th: VF1-10 to 6th to 10th: VF6-10). The results of the cluster-based trend analysis, based on different lengths of VF series, were compared against mTD trend analysis. Cluster-based trend analysis and mTD trend analysis results were significantly associated in all clusters and with all lengths of VF series. Between 21.2% and 45.9% (depending on VF series length and location) of clusters were deemed to progress when the mTD trend analysis suggested no progression. On the other hand, 4.8% of eyes were observed to progress using the mTD trend analysis when cluster trend analysis suggested no progression in any two (or more) clusters. Whole field trend analysis can miss local VF progression. Cluster trend analysis appears as robust as mTD trend analysis and useful to assess both sectorial and whole field progression. Cluster-based trend analyses, in particular the definition of two or more progressing clusters, may help clinicians to detect glaucomatous progression in a timelier manner than using a whole field trend analysis, without significantly compromising specificity. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2017. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yang, Chi-Ta; Wood, Brandon C.; Bhethanabotla, Venkat R.
2015-09-04
In this study, using density functional theory calculations, we investigate the influence of size-dependent cluster morphology on the synergistic catalytic properties of anatase TiO2(101) surfaces decorated with subnanometer Pt clusters. Focusing on the formation of the key precursor in the CO2 photoreduction reaction (bent CO2−), we find that flatter (2D-like) Pt clusters that “wet” the TiO2 surface offer significantly less benefit than 3D-like Pt clusters. We attribute the differences to three factors. First, the 3D clusters provide a greater number of accessible Pt–TiO2 interfacial sites with geometries that can aid CO2 bond bending and charge transfer processes. Second, binding competition among each Pt–CO2 bonding interaction mitigates maximum orbital overlaps, leading to insufficient CO2 binding. Third and also most interestingly, the 3D clusters tend to possess higher structural fluxionality than the flatter clusters, which is shown to correlate positively with CO2 binding strength. The preferred morphology adopted by the clusters depends on several factors, including the cluster size and the presence of oxygen vacancies on the TiO2 surface; this suggests a strategy for optimizing the synergistic effect between Pt clusters and TiO2 surfaces for CO2 photocatalysis. Clusters of ~6–8 atoms should provide the largest benefit, since they retain the desired 3D morphology, yet are small enough to exhibit high structural fluxionality. Electronic structure analysis provides additional insight into the electronic motivations for the enhanced binding of CO2 on TiO2-supported 3D Pt clusters, as well as suppressed binding on flattened, 2D-like clusters.
de Freitas, Mariana Gonçalves; Bonolo, Palmira de Fátima; de Moraes, Edgar Nunes; Machado, Carla Jorge
2015-03-01
The article aims to describe the profile of elderly victims of falls and traffic accidents from the data of the Surveillance Survey of Violence and Accidents (VIVA). The VIVA Survey was conducted in the emergency health-services of the Unified Health System in the capitals of Brazil in 2011. The sample of elderly by type of accident was subjected to the two-step cluster procedure. Of the 2463 elderly persons in question, 79.8% suffered falls and 20.2% were the victims of traffic accidents. The 1812 elderly who fell were grouped together into 4 clusters: Cluster 1, in which all had disabilities; Cluster 2, all were non-white and falls took place in the home; Cluster 3, younger and active seniors; and Cluster 4, with a higher proportion of seniors 80 years old or above who were white. Among cases of traffic accidents, 446 seniors were grouped into two clusters: Cluster 1 of younger elderly, drivers or passengers; Cluster 2, with higher age seniors, mostly pedestrians. The main victims of falls were women with low schooling and unemployed; traffic accident victims were mostly younger and male. Complications were similar in victims of falls and traffic accidents. Clusters allow adoption of targeted measures of care, prevention and health promotion.
Modeling solute clustering in the diffusion layer around a growing crystal.
Shiau, Lie-Ding; Lu, Yung-Fang
2009-03-07
The mechanism of crystal growth from solution is often thought to consist of a mass transfer diffusion step followed by a surface reaction step. Solute molecules might form clusters in the diffusion step before incorporating into the crystal lattice. A model is proposed in this work to simulate the evolution of the cluster size distribution due to the simultaneous aggregation and breakage of solute molecules in the diffusion layer around a growing crystal in the stirred solution. The crystallization of KAl(SO(4))(2)12H(2)O from aqueous solution is studied to illustrate the effect of supersaturation and diffusion layer thickness on the number-average degree of clustering and the size distribution of solute clusters in the diffusion layer.
Multiple-locus variable-number tandem repeat analysis for molecular typing of Aspergillus fumigatus
2010-01-01
Background Multiple-locus variable-number tandem repeat (VNTR) analysis (MLVA) is a prominent subtyping method to resolve closely related microbial isolates to provide information for establishing genetic patterns among isolates and to investigate disease outbreaks. The usefulness of MLVA was recently demonstrated for the avian major pathogen Chlamydophila psittaci. In the present study, we developed a similar method for another pathogen of birds: the filamentous fungus Aspergillus fumigatus. Results We selected 10 VNTR markers located on 4 different chromosomes (1, 5, 6 and 8) of A. fumigatus. These markers were tested with 57 unrelated isolates from different hosts or their environment (53 isolates from avian species in France, China or Morocco, 3 isolates from humans collected at CHU Henri Mondor hospital in France and the reference strain CBS 144.89). The Simpson index for individual markers ranged from 0.5771 to 0.8530. A combined loci index calculated with all the markers yielded an index of 0.9994. In a second step, the panel of 10 markers was used in different epidemiological situations and tested on 277 isolates, including 62 isolates from birds in Guangxi province in China, 95 isolates collected in two duck farms in France and 120 environmental isolates from a turkey hatchery in France. A database was created with the results of the present study (http://minisatellites.u-psud.fr/MLVAnet/). Three major clusters of isolates were defined by using the graphing algorithm termed Minimum Spanning Tree (MST). The first cluster comprised most of the avian isolates collected in the two duck farms in France, the second cluster comprised most of the avian isolates collected in poultry farms in China and the third one comprised most of the isolates collected in the turkey hatchery in France. Conclusions MLVA displayed excellent discriminatory power. The method showed good reproducibility. MST analysis revealed an interesting clustering with a clear separation between isolates according to their geographic origin rather than their respective hosts. PMID:21143842
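The per-marker Simpson index quoted in this abstract can be illustrated with a short sketch. It assumes the finite-sample form commonly used in microbial typing, D = 1 - sum(n_j(n_j - 1)) / (N(N - 1)); the abstract does not state which form the authors used, and the function name and the allele list below are hypothetical.

```python
from collections import Counter


def simpson_diversity(alleles):
    """Simpson's index of diversity for one marker.

    alleles: one allele call (e.g. a repeat count) per isolate.
    Assumed formula: D = 1 - sum(n_j * (n_j - 1)) / (N * (N - 1)),
    where n_j is the number of isolates carrying allele j.
    """
    n_total = len(alleles)
    if n_total < 2:
        raise ValueError("need at least two isolates")
    counts = Counter(alleles)
    same_pairs = sum(c * (c - 1) for c in counts.values())
    return 1.0 - same_pairs / (n_total * (n_total - 1))


# Hypothetical repeat counts for eight isolates at a single VNTR marker.
example_alleles = [3, 3, 4, 4, 4, 5, 6, 6]
diversity = simpson_diversity(example_alleles)
```

A value near 1 means two randomly chosen isolates are very likely to differ at the marker, which is how an index such as 0.9994 for the combined panel would be read.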
Periorbital melasma: Hierarchical cluster analysis of clinical features in Asian patients.
Jung, Y S; Bae, J M; Kim, B J; Kang, J-S; Cho, S B
2017-11-01
Studies have shown melasma lesions to be distributed across the face in centrofacial, malar, and mandibular patterns. Meanwhile, however, melasma lesions of the periorbital area have yet to be thoroughly described. We analyzed normal and ultraviolet light-exposed photographs of patients with melasma. The periorbital melasma lesions were measured according to anatomical reference points and a hierarchical cluster analysis was performed. The periorbital melasma lesions showed clinical features of fine and homogenous melasma pigmentation, involving both the upper and lower eyelids that extended to other anatomical sites with a darker and coarser appearance. The hierarchical cluster analysis indicated that patients with periorbital melasma can be categorized into two clusters according to the surface anatomy of the face. Significant differences between cluster 1 and cluster 2 were found in lateral distance and inferolateral distance, but not in medial distance and superior distance. Comparing the two clusters, patients in cluster 2 were found to be significantly older and more commonly accompanied by melasma lesions of the temple and medial cheek. Our hierarchical cluster analysis of periorbital melasma lesions demonstrated that Asian patients with periorbital melasma can be categorized into two clusters according to the surface anatomy of the face. © 2017 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
The Rise of Radicals in Bioinorganic Chemistry.
Gray, Harry B; Winkler, Jay R
2016-10-01
Prior to 1950, the consensus was that biological transformations occurred in two-electron steps, thereby avoiding the generation of free radicals. Dramatic advances in spectroscopy, biochemistry, and molecular biology have led to the realization that protein-based radicals participate in a vast array of vital biological mechanisms. Redox processes involving high-potential intermediates formed in reactions with O2 are particularly susceptible to radical formation. Clusters of tyrosine (Tyr) and tryptophan (Trp) residues have been found in many O2-reactive enzymes, raising the possibility that they play an antioxidant protective role. In blue copper proteins with plastocyanin-like domains, Tyr/Trp clusters are uncommon in the low-potential single-domain electron-transfer proteins and in the two-domain copper nitrite reductases. The two-domain multicopper oxidases, however, exhibit clusters of Tyr and Trp residues near the trinuclear copper active site where O2 is reduced. These clusters may play a protective role to ensure that reactive oxygen species are not liberated during O2 reduction.
ERIC Educational Resources Information Center
DiStefano, Christine; Kamphaus, R. W.
2006-01-01
Two classification methods, latent class cluster analysis and cluster analysis, are used to identify groups of child behavioral adjustment underlying a sample of elementary school children aged 6 to 11 years. Behavioral rating information across 14 subscales was obtained from classroom teachers and used as input for analyses. Both the procedures…
Clustering ENTLN sferics to improve TGF temporal analysis
NASA Astrophysics Data System (ADS)
Pradhan, E.; Briggs, M. S.; Stanbro, M.; Cramer, E.; Heckman, S.; Roberts, O.
2017-12-01
Using TGFs detected with the Fermi Gamma-ray Burst Monitor (GBM) and simultaneous radio sferics detected by the Earth Network Total Lightning Network (ENTLN), we establish a temporal correlation between them. The first step is to find ENTLN strokes that are closely associated with GBM TGFs. We then identify all the related strokes in the lightning flash to which the TGF-associated stroke belongs. After trying several algorithms, we found that the DBSCAN clustering algorithm was best for clustering related ENTLN strokes into flashes. The operation of DBSCAN was optimized using a single separation measure that combined time and distance separation. Previous analysis found that these strokes show three timescales with respect to the gamma-ray time. We will use the improved identification of flashes to investigate this.
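The idea of running DBSCAN on a single measure that combines time and distance separation can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the form of the measure, so the scaled Euclidean combination, the constants t_scale and d_scale, the eps and min_pts values and the stroke list are all hypothetical, and a plain textbook DBSCAN stands in for whatever implementation the authors used.

```python
import math


def separation(a, b, t_scale=0.5, d_scale=10.0):
    """Combined, dimensionless separation between two strokes.

    Each stroke is (time_s, x_km, y_km). t_scale (seconds) and d_scale
    (kilometres) are hypothetical normalisation constants.
    """
    dt = (a[0] - b[0]) / t_scale
    dd = math.hypot(a[1] - b[1], a[2] - b[2]) / d_scale
    return math.hypot(dt, dd)


def dbscan(points, eps, min_pts, dist):
    """Textbook DBSCAN. Returns one label per point; -1 marks noise."""
    noise = -1
    labels = [None] * len(points)  # None means "not visited yet"

    def neighbours(i):
        return [j for j in range(len(points))
                if dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)  # includes point i itself
        if len(seeds) < min_pts:
            labels[i] = noise
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == noise:
                labels[j] = cluster  # border point joins the cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # j is a core point: keep expanding
    return labels


# Hypothetical strokes: two flashes about 5 s apart plus one isolated stroke.
strokes = [
    (0.00, 0.0, 0.0), (0.05, 1.0, 0.0), (0.10, 0.0, 2.0),
    (5.00, 0.0, 0.0), (5.04, 1.0, 1.0), (5.09, 2.0, 0.0),
    (20.0, 300.0, 300.0),
]
labels = dbscan(strokes, eps=1.0, min_pts=2, dist=separation)
```

Collapsing time and distance into one measure leaves a single eps to tune, which is one plausible reading of why a combined separation simplifies optimizing the algorithm.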
Reich, Richard R; Lengacher, Cecile A; Alinat, Carissa B; Kip, Kevin E; Paterson, Carly; Ramesar, Sophia; Han, Heather S; Ismail-Khan, Roohi; Johnson-Mallard, Versie; Moscoso, Manolete; Budhrani-Shani, Pinky; Shivers, Steve; Cox, Charles E; Goodman, Matthew; Park, Jong
2017-01-01
Breast cancer survivors (BCS) face adverse physical and psychological symptoms, often co-occurring. Biologic and psychological factors may link symptoms within clusters, distinguishable by prevalence and/or severity. Few studies have examined the effects of behavioral interventions or treatment of symptom clusters. The aim of this study was to identify symptom clusters among post-treatment BCS and determine symptom cluster improvement following the Mindfulness-Based Stress Reduction for Breast Cancer (MBSR(BC)) program. Three hundred twenty-two Stage 0-III post-treatment BCS were randomly assigned to either a six-week MBSR(BC) program or usual care. Psychological (depression, anxiety, stress, and fear of recurrence), physical (fatigue, pain, sleep, and drowsiness), and cognitive symptoms and quality of life were assessed at baseline, six, and 12 weeks, along with demographic and clinical history data at baseline. A three-step analytic process included the error-accounting models of factor analysis and structural equation modeling. Four symptom clusters emerged at baseline: pain, psychological, fatigue, and cognitive. From baseline to six weeks, the model demonstrated evidence of MBSR(BC) effectiveness in both the psychological (anxiety, depression, perceived stress and QOL, emotional well-being) (P = 0.007) and fatigue (fatigue, sleep, and drowsiness) (P < 0.001) clusters. Results between six and 12 weeks showed sustained effects, but further improvement was not observed. Our results provide clinical effectiveness evidence that MBSR(BC) works to improve symptom clusters, particularly for psychological and fatigue symptom clusters, with the greatest improvement occurring during the six-week program with sustained effects for several weeks after MBSR(BC) training. Name and URL of Registry: ClinicalTrials.gov. Registration number: NCT01177124. Copyright © 2016. Published by Elsevier Inc.
Cellular metabolic network analysis: discovering important reactions in Treponema pallidum.
Chen, Xueying; Zhao, Min; Qu, Hong
2015-01-01
The metabolism of T. pallidum, the pathogen that causes syphilis, differs markedly from that of other bacterial pathogens. Developing a safe and effective syphilis vaccine requires identifying the important steps in T. pallidum's metabolism. Here, we apply Flux Balance Analysis to represent the reactions quantitatively, which makes it possible to cluster all reactions in T. pallidum. By calculating minimal cut sets and analyzing the topological structure of the metabolic network of T. pallidum, critical reactions are identified. As a comparison, we also apply the analytical approaches to the metabolic network of H. pylori to find coregulated drug targets and unique drug targets for different microorganisms. Based on the clustering results, all reactions are further classified into various roles. A general picture of the metabolic network is thereby obtained, and two types of reactions, both of which are involved in nucleic acid metabolism, are found to be essential for T. pallidum. It is also discovered that both hubs of reactions and the isolated reactions in purine and pyrimidine metabolism play important roles in T. pallidum. These reactions could be potential drug targets for treating syphilis.
A hierarchical clustering methodology for the estimation of toxicity.
Martin, Todd M; Harten, Paul; Venkatapathy, Raghuraman; Das, Shashikala; Young, Douglas M
2008-01-01
A quantitative structure-activity relationship (QSAR) methodology based on hierarchical clustering was developed to predict toxicological endpoints. This methodology utilizes Ward's method to divide a training set into a series of structurally similar clusters. The structural similarity is defined in terms of 2-D physicochemical descriptors (such as connectivity and E-state indices). A genetic algorithm-based technique is used to generate statistically valid QSAR models for each cluster (using the pool of descriptors described above). The toxicity for a given query compound is estimated using the weighted average of the predictions from the closest cluster from each step in the hierarchical clustering assuming that the compound is within the domain of applicability of the cluster. The hierarchical clustering methodology was tested using a Tetrahymena pyriformis acute toxicity data set containing 644 chemicals in the training set and with two prediction sets containing 339 and 110 chemicals. The results from the hierarchical clustering methodology were compared to the results from several different QSAR methodologies.
An Empirical Typology of Narcissism and Mental Health in Late Adolescence
ERIC Educational Resources Information Center
Lapsley, Daniel K.; Aalsma, Matthew C.
2006-01-01
A two-step cluster analytic strategy was used in two studies to identify an empirically derived typology of narcissism in late adolescence. In Study 1, late adolescents (N=204) responded to the profile of narcissistic dispositions and measures of grandiosity (''superiority'') and idealization (''goal instability'') inspired by Kohut's theory,…
Ye, Yong-ling; Wang, Pei-gang; Qu, Geng-cong; Yuan, Shuai; Phongsavan, Philayrath; He, Qi-qiang
2016-01-01
Although there is substantial evidence that health risk behaviors increase risks of premature morbidity and mortality, little is known about multiple health risk behaviors in Chinese college students. Here, we investigated the prevalence of multiple health risk behaviors and their relation to mental health among Chinese college students. A cross-sectional study was conducted in Wuhan, China from May to June 2012. The students reported their health risk behaviors using self-administered questionnaires. Depression and anxiety were assessed using the self-rating depression scale and self-rating anxiety scale, respectively. A total of 2422 college students (1433 males) aged 19.7 ± 1.2 years participated in the study. The prevalence of physical inactivity, sleep disturbance, poor dietary behavior, Internet addiction disorder (IAD), frequent alcohol use and current smoking was 62.0, 42.6, 29.8, 22.3, 11.6 and 9.3%, respectively. Significantly increased risks for depression and anxiety were found among students with frequent alcohol use, sleep disturbance, poor dietary behavior and IAD. Two-step cluster analysis identified two different clusters. Participants in the cluster with more unhealthy behaviors showed significantly increased risk for depression (odds ratio (OR): 2.21; 95% confidence interval (CI): 1.83, 2.67) and anxiety (OR: 2.32; 95% CI: 1.85, 2.92). This study indicates a relatively high prevalence of multiple health risk behaviors among Chinese college students. Furthermore, the clustering of health risk behaviors was significantly associated with increased risks for depression and anxiety.
ERIC Educational Resources Information Center
Lake County Area Vocational Center, Grayslake, IL.
This task analysis for nursing education provides performance standards, steps to be followed, knowledge required, attitudes to be developed, safety procedures, and equipment and supplies needed for 13 tasks performed by geriatric aides in the duty area of performing diagnostic measures and for 30 tasks in the duty area of providing therapeutic…
Deregulation upon DNA damage revealed by joint analysis of context-specific perturbation data
2011-01-01
Background Deregulation between two different cell populations manifests itself in changing gene expression patterns and changing regulatory interactions. Accumulating knowledge about biological networks creates an opportunity to study these changes in their cellular context. Results We analyze re-wiring of regulatory networks based on cell population-specific perturbation data and knowledge about signaling pathways and their target genes. We quantify deregulation by merging regulatory signal from the two cell populations into one score. This joint approach, called JODA, proves advantageous over separate analysis of the cell populations and analysis without incorporation of knowledge. JODA is implemented and freely available in a Bioconductor package 'joda'. Conclusions Using JODA, we show wide-spread re-wiring of gene regulatory networks upon neocarzinostatin-induced DNA damage in Human cells. We recover 645 deregulated genes in thirteen functional clusters performing the rich program of response to damage. We find that the clusters contain many previously characterized neocarzinostatin target genes. We investigate connectivity between those genes, explaining their cooperation in performing the common functions. We review genes with the most extreme deregulation scores, reporting their involvement in response to DNA damage. Finally, we investigate the indirect impact of the ATM pathway on the deregulated genes, and build a hypothetical hierarchy of direct regulation. These results prove that JODA is a step forward to a systems level, mechanistic understanding of changes in gene regulation between different cell populations. PMID:21693013
Deregulation upon DNA damage revealed by joint analysis of context-specific perturbation data.
Szczurek, Ewa; Markowetz, Florian; Gat-Viks, Irit; Biecek, Przemysław; Tiuryn, Jerzy; Vingron, Martin
2011-06-21
Deregulation between two different cell populations manifests itself in changing gene expression patterns and changing regulatory interactions. Accumulating knowledge about biological networks creates an opportunity to study these changes in their cellular context. We analyze re-wiring of regulatory networks based on cell population-specific perturbation data and knowledge about signaling pathways and their target genes. We quantify deregulation by merging regulatory signal from the two cell populations into one score. This joint approach, called JODA, proves advantageous over separate analysis of the cell populations and analysis without incorporation of knowledge. JODA is implemented and freely available in a Bioconductor package 'joda'. Using JODA, we show wide-spread re-wiring of gene regulatory networks upon neocarzinostatin-induced DNA damage in Human cells. We recover 645 deregulated genes in thirteen functional clusters performing the rich program of response to damage. We find that the clusters contain many previously characterized neocarzinostatin target genes. We investigate connectivity between those genes, explaining their cooperation in performing the common functions. We review genes with the most extreme deregulation scores, reporting their involvement in response to DNA damage. Finally, we investigate the indirect impact of the ATM pathway on the deregulated genes, and build a hypothetical hierarchy of direct regulation. These results prove that JODA is a step forward to a systems level, mechanistic understanding of changes in gene regulation between different cell populations.
Real-time dynamics of RNA Polymerase II clustering in live human cells
NASA Astrophysics Data System (ADS)
Cisse, Ibrahim
2014-03-01
Transcription is the first step in the central dogma of molecular biology, when genetic information encoded on DNA is made into messenger RNA. How this fundamental process occurs within living cells (in vivo) is poorly understood [1], despite extensive biochemical characterizations with isolated biomolecules (in vitro). For higher-order organisms, like humans, transcription is reported to be spatially compartmentalized in nuclear foci consisting of clusters of RNA Polymerase II, the enzyme responsible for synthesizing all messenger RNAs. However, little is known of when these foci assemble or their relative stability. We developed an approach based on photo-activation localization microscopy (PALM) combined with a temporal correlation analysis, which we refer to as tcPALM. The tcPALM method enables the real-time characterization of biomolecular spatiotemporal organization, with single-molecule sensitivity, directly in living cells [2]. Using tcPALM, we observed that RNA Polymerase II clusters form transiently, with an average lifetime of 5.1 (+/- 0.4) seconds. Stimuli affecting transcription regulation yielded orders of magnitude changes in the dynamics of the polymerase clusters, implying that clustering is regulated and plays a role in the cell's ability to effect rapid response to external signals. Our results suggest that the transient crowding of enzymes may aid in rate-limiting steps of genome regulation.
An empirical method to cluster objective nebulizer adherence data among adults with cystic fibrosis.
Hoo, Zhe H; Campbell, Michael J; Curley, Rachael; Wildman, Martin J
2017-01-01
The purpose of using preventative inhaled treatments in cystic fibrosis is to improve health outcomes. Therefore, understanding the relationship between adherence to treatment and health outcome is crucial. Temporal variability, as well as absolute magnitude of adherence affects health outcomes, and there is likely to be a threshold effect in the relationship between adherence and outcomes. We therefore propose a pragmatic algorithm-based clustering method of objective nebulizer adherence data to better understand this relationship, and potentially, to guide clinical decisions. This clustering method consists of three related steps. The first step is to split adherence data for the previous 12 months into four 3-monthly sections. The second step is to calculate mean adherence for each section and to score the section based on mean adherence. The third step is to aggregate the individual scores to determine the final cluster ("cluster 1" = very low adherence; "cluster 2" = low adherence; "cluster 3" = moderate adherence; "cluster 4" = high adherence), and taking into account adherence trend as represented by sequential individual scores. The individual scores should be displayed along with the final cluster for clinicians to fully understand the adherence data. We present three cases to illustrate the use of the proposed clustering method. This pragmatic clustering method can deal with adherence data of variable duration (ie, can be used even if 12 months' worth of data are unavailable) and can cluster adherence data in real time. Empirical support for some of the clustering parameters is not yet available, but the suggested classifications provide a structure to investigate parameters in future prospective datasets in which there are accurate measurements of nebulizer adherence and health outcomes.
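The three steps described in this abstract (split into 3-monthly sections, score each section by its mean adherence, aggregate the scores into a final cluster) can be sketched roughly as below. The abstract gives neither the scoring cut-offs nor the aggregation rule, so the CUTOFFS values, the function names and the rule that counts the most recent section twice (a crude stand-in for "taking into account adherence trend") are hypothetical placeholders, not the authors' parameters.

```python
from statistics import mean

# Hypothetical cut-offs: (upper bound of mean % adherence, section score).
# These are illustrative only and are not taken from the paper.
CUTOFFS = [(25.0, 1), (50.0, 2), (80.0, 3)]


def score_section(values):
    """Step 2: score one section (1 = very low ... 4 = high) by its mean."""
    section_mean = mean(values)
    for upper, score in CUTOFFS:
        if section_mean < upper:
            return score
    return 4


def cluster_adherence(monthly, months_per_section=3):
    """Return (section scores, final cluster) for monthly adherence in %.

    monthly is ordered oldest first. Fewer than 12 months of data simply
    yields fewer (or shorter) sections.
    """
    recent = monthly[-12:]
    # Step 1: split the most recent 12 months into 3-monthly sections.
    sections = [recent[i:i + months_per_section]
                for i in range(0, len(recent), months_per_section)]
    scores = [score_section(s) for s in sections]
    # Step 3 (assumed rule): average the scores, counting the latest
    # section twice so that a recent trend shifts the final cluster.
    weighted = scores + [scores[-1]]
    final_cluster = int(mean(weighted) + 0.5)  # round half up
    return scores, final_cluster


# Hypothetical patient whose adherence declines over the year.
declining = [90, 85, 88, 70, 65, 60, 45, 40, 42, 20, 15, 10]
section_scores, final_cluster = cluster_adherence(declining)
```

Returning the section scores alongside the final cluster mirrors the abstract's point that the individual scores should be displayed with the final cluster.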
Using cluster ensemble and validation to identify subtypes of pervasive developmental disorders.
Shen, Jess J; Lee, Phil-Hyoun; Holden, Jeanette J A; Shatkay, Hagit
2007-10-11
Pervasive Developmental Disorders (PDD) are neurodevelopmental disorders characterized by impairments in social interaction, communication and behavior. Given the diversity and varying severity of PDD, diagnostic tools attempt to identify homogeneous subtypes within PDD. Identifying subtypes can lead to targeted etiology studies and to effective type-specific intervention. Cluster analysis can suggest coherent subsets in data; however, different methods and assumptions lead to different results. Several previous studies applied clustering to PDD data, varying in number and characteristics of the produced subtypes. Most studies used a relatively small dataset (fewer than 150 subjects), and all applied only a single clustering method. Here we study a relatively large dataset (358 PDD patients), using an ensemble of three clustering methods. The results are evaluated using several validation methods, and consolidated through an integration step. Four clusters are identified, analyzed and compared to subtypes previously defined by the widely used diagnostic tool DSM-IV.
Using Cluster Ensemble and Validation to Identify Subtypes of Pervasive Developmental Disorders
Shen, Jess J.; Lee, Phil Hyoun; Holden, Jeanette J.A.; Shatkay, Hagit
2007-01-01
Pervasive Developmental Disorders (PDD) are neurodevelopmental disorders characterized by impairments in social interaction, communication and behavior.1 Given the diversity and varying severity of PDD, diagnostic tools attempt to identify homogeneous subtypes within PDD. Identifying subtypes can lead to targeted etiology studies and to effective type-specific intervention. Cluster analysis can suggest coherent subsets in data; however, different methods and assumptions lead to different results. Several previous studies applied clustering to PDD data, varying in number and characteristics of the produced subtypes19. Most studies used a relatively small dataset (fewer than 150 subjects), and all applied only a single clustering method. Here we study a relatively large dataset (358 PDD patients), using an ensemble of three clustering methods. The results are evaluated using several validation methods, and consolidated through an integration step. Four clusters are identified, analyzed and compared to subtypes previously defined by the widely used diagnostic tool DSM-IV.2 PMID:18693920
Another collision for the Coma cluster
NASA Technical Reports Server (NTRS)
Vikhlinin, A.; Forman, W.; Jones, C.
1996-01-01
The wavelet transform analysis of the ROSAT position sensitive proportional counter (PSPC) images of the Coma cluster is presented. The analysis shows, on small scales, a substructure dominated by two extended sources surrounding the two bright clusters NGC 4874 and NGC 4889. On scales of about 2 arcmin to 3 arcmin, the analysis reveals a tail of X-ray emission originating near the cluster center, curving to the south and east for approximately 25 arcmin and ending near the galaxy NGC 4911. The results are interpreted in terms of a merger of a group, having a core mass of approximately 10^13 solar masses, with the main body of the Coma cluster.
Stepped wedge designs: insights from a design of experiments perspective.
Matthews, J N S; Forbes, A B
2017-10-30
Stepped wedge designs (SWDs) have received considerable attention recently, as they are potentially a useful way to assess new treatments in areas such as health services implementation. Because allocation is usually by cluster, SWDs are often viewed as a form of cluster-randomized trial. However, since the treatment within a cluster changes during the course of the study, they can also be viewed as a form of crossover design. This article explores SWDs from the perspective of crossover trials and designed experiments more generally. We show that the treatment effect estimator in a linear mixed effects model can be decomposed into a weighted mean of the estimators obtained from (1) regarding an SWD as a conventional row-column design and (2) a so-called vertical analysis, which is a row-column design with row effects omitted. This provides a precise representation of "horizontal" and "vertical" comparisons, respectively, which to date have appeared without formal description in the literature. This decomposition displays a sometimes surprising way the analysis corrects for the partial confounding between time and treatment effects. The approach also permits the quantification of the loss of efficiency caused by mis-specifying the correlation parameter in the mixed-effects model. Optimal extensions of the vertical analysis are obtained, and these are shown to be highly inefficient for values of the within-cluster dependence that are likely to be encountered in practice. Some recently described extensions to the classic SWD incorporating multiple treatments are also compared using the experimental design framework. Copyright © 2017 John Wiley & Sons, Ltd.
NASA Astrophysics Data System (ADS)
Mostafa, Mostafa E.
2005-10-01
The present study shows that reconstructing the reduced stress tensor (RST) from the measurable fault-slip data (FSD) and the immeasurable shear stress magnitudes (SSM) is a typical iteration problem. The result of direct inversion of FSD presented by Angelier [1990. Geophysical Journal International 103, 363-376] is considered as a starting point (zero step iteration) where all SSM are assigned a constant value (λ = √3/2). By iteration, the SSM and RST update each other until they converge to fixed values. Angelier [1990. Geophysical Journal International 103, 363-376] designed the function upsilon (υ) and the two estimators, relative upsilon (RUP) and ANG, to express the divergence between the measured and calculated shear stresses. Plotting individual faults' RUP at successive iteration steps shows that they tend to zero (simulated data) or to fixed values (real data) at a rate depending on the orientation and homogeneity of the data. FSD of related origin tend to aggregate in clusters. Plots of the estimators ANG versus RUP show that by iteration, labeled data points are disposed in clusters about a straight line. These two new plots form the basis of a technique for separating FSD into homogeneous clusters.
NASA Astrophysics Data System (ADS)
André, Marie-Françoise; Hall, Kevin
2005-02-01
Analysis of three generations of glacial deposits and of a range of geomorphic features including widespread honeycombs and tafonis at Two Step Cliffs/Mars Oasis (71°52′S, 68°15′W) provides new insights into the geomorphological evolution of West Antarctica, with special respect to alveolar weathering. At Two Step Terrace, indicators of the inherited character of cavernous weathering were found, such as 97% non-flaking and varnished backwalls, and 80% tafoni floors that are till-covered and/or sealed by lithobiontic coatings. Based on the NE predominant aspect of the alveolized boulder faces, tafoni initiation is attributed to coastal salt spray weathering by halite coming from the George VI Sound during the 6.5 ka BP open water period. The present-day activity of these inherited cavities is restricted to roof flaking attributed to a combination of processes involving thermal stresses. This 6.5 ka BP phase of coastal alveolization is the first step of a six-stage Holocene geomorphological scenario that includes alternatively phases of glacial advance or stationing, and phases of vegetal colonization and/or rock weathering and aeolian abrasion on the deglaciated outcrops. This geomorphic scenario is tentatively correlated with the available palaeoenvironmental record in the Antarctic Peninsula region, with two potential geomorphic indicators of the Holocene Optimum being identified: (1) clusters of centimetric honeycombs facing the sound (marine optimum at 6.5 ka BP); (2) salmon-pink lithobiontic coatings preserved inside cavities and at the boulder surface (terrestrial optimum at 4–3 ka BP).
Review of Recent Methodological Developments in Group-Randomized Trials: Part 2-Analysis.
Turner, Elizabeth L; Prague, Melanie; Gallis, John A; Li, Fan; Murray, David M
2017-07-01
In 2004, Murray et al. reviewed methodological developments in the design and analysis of group-randomized trials (GRTs). We have updated that review with developments in analysis of the past 13 years, with a companion article to focus on developments in design. We discuss developments in the topics of the earlier review (e.g., methods for parallel-arm GRTs, individually randomized group-treatment trials, and missing data) and in new topics, including methods to account for multiple-level clustering and alternative estimation methods (e.g., augmented generalized estimating equations, targeted maximum likelihood, and quadratic inference functions). In addition, we describe developments in analysis of alternative group designs (including stepped-wedge GRTs, network-randomized trials, and pseudocluster randomized trials), which require clustering to be accounted for in their design and analysis.
Low Temperature Kinetics of the First Steps of Water Cluster Formation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bourgalais, J.; Roussel, V.; Capron, M.
2016-03-01
We present a combined experimental and theoretical low temperature kinetic study of water cluster formation. Water cluster growth takes place in low temperature (23-69 K) supersonic flows. The observed kinetics of formation of water clusters are reproduced with a kinetic model based on theoretical predictions for the first steps of clusterization. The temperature- and pressure-dependent association and dissociation rate coefficients are predicted with an ab initio transition state theory based master equation approach over a wide range of temperatures (20-100 K) and pressures (10^-6 to 10 bar).
Buddeberg-Fischer, B; Stamm, M; Buddeberg, C; Bauer, G; Hämmig, O; Klaghofer, R
2008-11-01
Based on the Effort-Reward-Imbalance Model by Siegrist, a study was undertaken to find out (a) in what way young doctors assess effort and reward during their specialist training; (b) whether there are certain job stress patterns over time; and (c) what the correlations are, if any, between perceived job stress, health and satisfaction with life. Within the framework of a prospective study (2001-2007), 370 doctors who had just qualified and were residents in the German-speaking part of Switzerland were assessed four times by means of anonymized questionnaires. Job stress, measured by the Effort-Reward-Imbalance scale, as well as health and satisfaction with life were assessed in these doctors' 2nd (T2), 4th (T3), and 6th (T4) year of specialist training ("residents"). Stress patterns of the participants were evaluated, based on the effort and reward scale values at T2, T3, and T4, by two-step cluster analysis. Gender differences between the clusters were calculated by the χ² test and differences in the continuous variables by analysis of variance with repeated measurements. During residency the percentage of doctors who experienced an Effort-Reward-Imbalance (ratio between effort and reward, ERI > 1) increased from 18% at T2 to 20% at T3 to 25% at T4. The cluster analysis revealed two clusters: Type 1 (67%) with effort values below average and reward values above average (ER balance) across the three measurement points, and Type 2 (33%) with effort values above average and reward values below average (ER imbalance). Subjects in cluster 2 showed unfavorable values, when compared with those in cluster 1, in overcommitment, in workload and in the health variables (anxiety, depression, physical and psychological well-being), as well as in their assessed satisfaction with life at all three measurement points. One third of the doctors experienced stress at work, caused by an effort-reward imbalance. This had a negative impact on their health and satisfaction with life. 
Regular supervision and goal-oriented career counselling provided by senior physicians could contribute to young doctors not feeling so much stressed at work, feeling well and being more content with their work.
Reinhart, F.; Huber, A.; Thiele, R.; Unden, G.
2010-01-01
The sensor kinase NreB from Staphylococcus carnosus contains an O2-sensitive [4Fe-4S]2+ cluster which is converted by O2 to a [2Fe-2S]2+ cluster, followed by complete degradation and formation of Fe-S-less apo-NreB. NreB·[2Fe-2S]2+ and apoNreB are devoid of kinase activity. NreB contains four Cys residues which ligate the Fe-S clusters. The accessibility of the Cys residues to alkylating agents was tested and used to differentiate Fe-S-containing and Fe-S-less NreB. In a two-step labeling procedure, accessible Cys residues in the native protein were first labeled by iodoacetate. In the second step, Cys residues not labeled in the first step were alkylated with the fluorescent monobromobimane (mBBr) after denaturing of the protein. In purified (aerobic) apoNreB, most (96%) of the Cys residues were alkylated in the first step, but in anaerobic (Fe-S-containing) NreB only a small portion (23%) were alkylated. In anaerobic bacteria, a very small portion of the Cys residues of NreB (9%) were accessible to alkylation in the native state, whereas most (89%) of the Cys residues from aerobic bacteria were accessible. The change in accessibility allowed determination of the half-time (6 min) for the conversion of NreB·[4Fe-4S]2+ to apoNreB after the addition of air in vitro. Overall, in anaerobic bacteria most of the NreB exists as NreB·[4Fe-4S]2+, whereas in aerobic bacteria the (Fe-S-less) apoNreB is predominant and represents the physiological form. The number of accessible Cys residues was also determined by iodoacetate alkylation followed by mass spectrometry of Cys-containing peptides. The pattern of mass increases confirmed the results from the two-step labeling experiments. PMID:19854899
The composite sequential clustering technique for analysis of multispectral scanner data
NASA Technical Reports Server (NTRS)
Su, M. Y.
1972-01-01
The clustering technique consists of two parts: (1) a sequential statistical clustering which is essentially a sequential variance analysis, and (2) a generalized K-means clustering. In this composite clustering technique, the output of (1) is a set of initial clusters which are input to (2) for further improvement by an iterative scheme. This unsupervised composite technique was employed for automatic classification of two sets of remote multispectral earth resource observations. The classification accuracy by the unsupervised technique is found to be comparable to that by traditional supervised maximum likelihood classification techniques. The mathematical algorithms for the composite sequential clustering program and a detailed computer program description with job setup are given.
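The two-part structure described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the report's program: the sequential pass uses a simple distance threshold as a stand-in for the sequential variance analysis, and the names `sequential_seed`, `kmeans_refine`, and `threshold` are hypothetical.

```python
import math

def sequential_seed(points, threshold):
    """Part 1 (stand-in, assumed rule): a single pass over the data. A point
    joins the nearest existing cluster if it lies within `threshold` of that
    cluster's center; otherwise it starts a new cluster."""
    centers, counts = [], []
    for p in points:
        if centers:
            j = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            if math.dist(p, centers[j]) <= threshold:
                counts[j] += 1
                # Running-mean update of the center that absorbed the point.
                centers[j] = tuple(c + (x - c) / counts[j]
                                   for c, x in zip(centers[j], p))
                continue
        centers.append(tuple(p))
        counts.append(1)
    return centers

def nearest(p, centers):
    """Index of the center closest to point p."""
    return min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))

def kmeans_refine(points, centers, max_iter=100):
    """Part 2: ordinary K-means iterations started from the seeded centers."""
    for _ in range(max_iter):
        labels = [nearest(p, centers) for p in points]
        new_centers = []
        for i, c in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                new_centers.append(tuple(sum(xs) / len(members)
                                         for xs in zip(*members)))
            else:
                new_centers.append(c)  # keep an empty cluster's center unchanged
        if new_centers == centers:
            break
        centers = new_centers
    return centers, [nearest(p, centers) for p in points]

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
seeds = sequential_seed(points, threshold=3.0)
centers, labels = kmeans_refine(points, seeds)
```

Seeding K-means from a cheap first pass means the number of clusters comes out of the data and the threshold rather than being fixed in advance, which is the role the abstract assigns to part (1).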
Temperament clusters associate with anxiety disorder comorbidity in depression.
Paavonen, Vesa; Luoto, Kaisa; Lassila, Antero; Leinonen, Esa; Kampman, Olli
2018-08-15
Individual temperament is associated with psychiatric morbidity and could explain differences in psychiatric comorbidities. We investigated the association of temperament profile clusters with anxiety disorder comorbidity in patients with depression. We assessed the temperament of 204 specialized care-treated depressed patients with the Temperament and Character Inventory (TCI-R) and their diagnoses with the Mini International Neuropsychiatric Interview. Two-step cluster analysis was used for defining patients' temperament profiles and logistic regression analysis was used for predicting different anxiety disorders for various temperament profiles. Four temperament clusters were found: 1) Novelty seekers with highest Novelty Seeking scores (n = 56), 2) Persistent with highest Persistence scores (n = 36), 3) Reserved with lowest Novelty Seeking scores (n = 66) and 4) Wearied with highest Harm Avoidance, lowest Reward Dependence and lowest Persistence scores (n = 58). After adjusting for clinical variables, panic disorder and/or agoraphobia were predicted by Novelty seekers' temperament profile with odds ratio [OR] = 3.5 (95% confidence interval [CI] = 1.8 - 6.9, p < 0.001), social anxiety disorder was predicted by Wearied temperament profile with OR = 3.4 (95% CI = 1.6 - 7.5, p = 0.002), and generalized anxiety disorder was predicted by Reserved temperament profile with OR = 2.6 (95% CI = 1.2 - 5.3, p = 0.01). The patients' temperament profiles were assessed while displaying depressive symptoms, which may have affected results. Temperament clusters with unique dimensional profiles were specifically associated with different anxiety disorders in this study. These results suggest that TCI-R could offer a valuable dimensional method for predicting the risk of anxiety disorders in diverse depressed patients. Copyright © 2018 Elsevier B.V. All rights reserved.
Pimenta e Silva Machado, Luciana; de Macedo Nery, Marianita Batista; de Góis Nery, Cláudio; Leles, Cláudio Rodrigues
2012-08-02
Temporomandibular disorder (TMD) patients might present a number of concurrent clinical diagnoses that may be clustered according to their similarity. Profiling patients' clinical presentations can be useful for better understanding the behavior of TMD and for providing appropriate treatment planning. The aim of this study was to simultaneously classify symptomatic patients diagnosed with a variety of subtypes of TMD into homogenous groups based on their clinical presentation and occurrence of comorbidities. Clinical records of 357 consecutive TMD patients seeking treatment in a private specialized clinic were included in the study sample. Patients presenting multiple subtypes of TMD diagnosed simultaneously were categorized according to the AAOP criteria. Descriptive statistics and two-step cluster analysis were used to characterize the clinical presentation of these patients based on the primary and secondary clinical diagnoses. The most common diagnoses were localized masticatory muscle pain (n = 125) and disc displacement without reduction (n = 104). Comorbidity was identified in 288 patients. The automatic selection of an optimal number of clusters included 100% of cases, generating an initial 6-cluster solution and a final 4-cluster solution. The interpretation of within-group ranking of the importance of variables in the clustering solutions resulted in the following characterization of clusters: chronic facial pain (n = 36), acute muscle pain (n = 125), acute articular pain (n = 75) and chronic articular impairment (n = 121). Subgroups of acute and chronic TMD patients seeking treatment can be identified using clustering methods to provide a better understanding of the clinical presentation of TMD when multiple diagnoses are present. Classifying patients into identifiable symptomatic profiles would help clinicians to estimate how common a disorder is within a population of TMD patients and to understand the probability of certain patterns of clinical complaints.
Miller, Christopher B; Bartlett, Delwyn J; Mullins, Anna E; Dodds, Kirsty L; Gordon, Christopher J; Kyle, Simon D; Kim, Jong Won; D'Rozario, Angela L; Lee, Rico S C; Comas, Maria; Marshall, Nathaniel S; Yee, Brendon J; Espie, Colin A; Grunstein, Ronald R
2016-11-01
To empirically derive and evaluate potential clusters of Insomnia Disorder through cluster analysis from polysomnography (PSG). We hypothesized that clusters would differ on neurocognitive performance, sleep-onset measures of quantitative (q)-EEG and heart rate variability (HRV). Research volunteers with Insomnia Disorder (DSM-5) completed a neurocognitive assessment and overnight PSG measures of total sleep time (TST), wake time after sleep onset (WASO), and sleep onset latency (SOL) were used to determine clusters. From 96 volunteers with Insomnia Disorder, cluster analysis derived at least two clusters from objective sleep parameters: Insomnia with normal objective sleep duration (I-NSD: n = 53) and Insomnia with short sleep duration (I-SSD: n = 43). At sleep onset, differences in HRV between I-NSD and I-SSD clusters suggest attenuated parasympathetic activity in I-SSD (P < 0.05). Preliminary work suggested three clusters by retaining the I-NSD and splitting the I-SSD cluster into two: I-SSD A (n = 29): defined by high WASO and I-SSD B (n = 14): a second I-SSD cluster with high SOL and medium WASO. The I-SSD B cluster performed worse than I-SSD A and I-NSD for sustained attention (P ≤ 0.05). In an exploratory analysis, q-EEG revealed reduced spectral power also in I-SSD B before (Delta, Alpha, Beta-1) and after sleep-onset (Beta-2) compared to I-SSD A and I-NSD (P ≤ 0.05). Two insomnia clusters derived from cluster analysis differ in sleep onset HRV. Preliminary data suggest evidence for three clusters in insomnia with differences for sustained attention and sleep-onset q-EEG. Insomnia 100 sleep study: Australia New Zealand Clinical Trials Registry (ANZCTR) identification number 12612000049875. URL: https://www.anzctr.org.au/Trial/Registration/TrialReview.aspx?id=347742. © 2016 Associated Professional Sleep Societies, LLC.
Initial Analysis of and Predictive Model Development for Weather Reroute Advisory Use
NASA Technical Reports Server (NTRS)
Arneson, Heather M.
2016-01-01
In response to severe weather conditions, traffic management coordinators specify reroutes to route air traffic around affected regions of airspace. Providing analysis and recommendations of available reroute options would assist the traffic management coordinators in making more efficient rerouting decisions. These recommendations can be developed by examining historical data to determine which previous reroute options were used in similar weather and traffic conditions, essentially using previous information to inform future decisions. This paper describes the initial steps and methodology used towards this goal. A method to extract relevant features from the large volume of weather data to quantify the convective weather scenario during a particular time range is presented. Similar routes are clustered. An algorithm to identify which cluster of reroute advisories was actually followed by pilots is described. Models built for fifteen of the top twenty most frequently used reroute clusters correctly predict the use of the cluster for over 60% of the test examples. Results are preliminary but indicate that the methodology is worth pursuing with modifications based on insight gained from this analysis.
Lee, Hyunsoo; Lee, Han-Bo-Ram; Kwon, Sangku; Salmeron, Miquel; Park, Jeong Young
2015-04-28
We report on the physical and chemical properties of atomic steps on the surface of highly oriented pyrolytic graphite (HOPG) investigated using atomic force microscopy. Two types of step edges are identified: internal (formed during crystal growth) and external (formed by mechanical cleavage of bulk HOPG). The external steps exhibit higher friction than the internal steps due to the broken bonds of the exposed edge C atoms, while carbon atoms in the internal steps are not exposed. The reactivity of the atomic steps is manifested in a variety of ways, including the preferential attachment of Pt nanoparticles deposited on HOPG when using atomic layer deposition and KOH clusters formed during drop casting from aqueous solutions. These phenomena imply that only external atomic steps can be used for selective electrodeposition for nanoscale electronic devices.
Analysis of the Seismicity Preceding Large Earthquakes
NASA Astrophysics Data System (ADS)
Stallone, A.; Marzocchi, W.
2016-12-01
The most common earthquake forecasting models assume that the magnitude of the next earthquake is independent from the past. This feature is probably one of the most severe limitations of the capability to forecast large earthquakes. In this work, we empirically investigate this specific aspect, exploring whether spatial-temporal variations in seismicity encode some information on the magnitude of the future earthquakes. For this purpose, and to verify the universality of the findings, we consider seismic catalogs covering quite different space-time-magnitude windows, such as the Alto Tiberina Near Fault Observatory (TABOO) catalogue, and the California and Japanese seismic catalogs. Our method is inspired by the statistical methodology proposed by Zaliapin (2013) to distinguish triggered and background earthquakes, using the nearest-neighbor clustering analysis in a two-dimensional plane defined by rescaled time and space. In particular, we generalize the metric based on the nearest-neighbor to a metric based on the k-nearest-neighbors clustering analysis that allows us to consider the overall space-time-magnitude distribution of k-earthquakes (k-foreshocks) which anticipate one target event (the mainshock); then we analyze the statistical properties of the clusters identified in this rescaled space. In essence, the main goal of this study is to verify if different classes of mainshock magnitudes are characterized by distinctive k-foreshocks distribution. The final step is to show how the findings of this work may (or not) improve the skill of existing earthquake forecasting models.
NASA Astrophysics Data System (ADS)
Kafle, Amol; Coy, Stephen L.; Wong, Bryan M.; Fornace, Albert J.; Glick, James J.; Vouros, Paul
2014-07-01
A systematic study involving the use and optimization of gas-phase modifiers in quantitative differential mobility-mass spectrometry (DMS-MS) analysis is presented using nucleoside-adduct biomarkers of DNA damage as an important reference point for analysis in complex matrices. Commonly used polar protic and polar aprotic modifiers have been screened for use against two deoxyguanosine adducts of DNA: N-(deoxyguanosin-8-yl)-4-aminobiphenyl (dG-C8-4-ABP) and N-(deoxyguanosin-8-yl)-2-amino-1-methyl-6-phenylimidazo[4,5-b]pyridine (dG-C8-PhIP). Particular attention was paid to compensation voltage (CoV) shifts, peak shapes, and product ion signal intensities while optimizing the DMS-MS conditions. The optimized parameters were then applied to rapid quantitation of the DNA adducts in calf thymus DNA. After a protein precipitation step, adduct levels corresponding to less than one modification in 10^6 normal DNA bases were detected using the DMS-MS platform. Based on DMS fundamentals and ab initio thermochemical results, we interpret the complexity of DMS modifier responses in terms of thermal activation and the development of solvent shells. At very high bulk gas temperature, modifier dipole moment may be the most important factor in cluster formation and cluster geometry, but at lower temperatures, multi-neutral clusters are important and less predictable. This work provides a useful protocol for targeted DNA adduct quantitation and a basis for future work on DMS modifier effects.
Kafle, Amol; Coy, Stephen L.; Wong, Bryan M.; Fornace, Albert J.; Glick, James J.; Vouros, Paul
2014-01-01
A systematic study involving the use and optimization of gas phase modifiers in quantitative differential mobility-mass spectrometry (DMS-MS) analysis is presented using nucleoside-adduct biomarkers of DNA damage as an important reference point for analysis in complex matrices. Commonly used polar protic and polar aprotic modifiers have been screened for use against two deoxyguanosine adducts of DNA: N-(deoxyguanosin-8-yl)-4-aminobiphenyl (dG-C8-4-ABP) and N-(deoxyguanosin-8-yl)-2-amino-1-methyl-6-phenylimidazo[4,5-b]pyridine (dG-C8-PhIP). Particular attention was paid to compensation voltage (CoV) shifts, peak shapes and product ion signal intensities while optimizing the DMS-MS conditions. The optimized parameters were then applied to rapid quantitation of the DNA adducts in calf thymus DNA. After a protein precipitation step, adduct levels corresponding to less than one modification in 10^6 normal DNA bases were detected using the DMS-MS platform. Based on DMS fundamentals and ab initio thermochemical results we interpret the complexity of DMS modifier responses in terms of thermal activation and the development of solvent shells. At very high bulk gas temperature, modifier dipole moment may be the most important factor in cluster formation and cluster geometry in mobility differences, but at lower temperatures multi-neutral clusters are important and less predictable. This work provides a useful protocol for targeted DNA adduct quantitation and a basis for future work on DMS modifier effects. PMID:24452298
NASA Astrophysics Data System (ADS)
Mengis, Nadine; Keller, David P.; Oschlies, Andreas
2018-01-01
This study introduces the Systematic Correlation Matrix Evaluation (SCoMaE) method, a bottom-up approach which combines expert judgment and statistical information to systematically select transparent, nonredundant indicators for a comprehensive assessment of the state of the Earth system. The method consists of two basic steps: (1) the calculation of a correlation matrix among variables relevant for a given research question and (2) the systematic evaluation of the matrix, to identify clusters of variables with similar behavior and respective mutually independent indicators. Optional further analysis steps include (3) the interpretation of the identified clusters, enabling a learning effect from the selection of indicators, (4) testing the robustness of identified clusters with respect to changes in forcing or boundary conditions, (5) enabling a comparative assessment of varying scenarios by constructing and evaluating a common correlation matrix, and (6) the inclusion of expert judgment, for example, to prescribe indicators, to allow for considerations other than statistical consistency. The example application of the SCoMaE method to Earth system model output forced by different CO2 emission scenarios reveals the necessity of reevaluating indicators identified in a historical scenario simulation for an accurate assessment of an intermediate-high, as well as a business-as-usual, climate change scenario simulation. This necessity arises from changes in prevailing correlations in the Earth system under varying climate forcing. For a comparative assessment of the three climate change scenarios, we construct and evaluate a common correlation matrix, in which we identify robust correlations between variables across the three considered scenarios.
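The two basic steps named in this abstract, computing a correlation matrix and then evaluating it to find clusters and indicators, can be illustrated with a short sketch. The abstract does not state the evaluation rule, so the greedy rule below (pick the variable with the most strong correlations, group its partners with it, repeat) and the 0.8 threshold are assumptions of this example; the function names are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(series):
    """Step (1): pairwise correlations between all named variables."""
    names = list(series)
    return {a: {b: pearson(series[a], series[b]) for b in names} for a in names}

def select_indicators(corr, threshold=0.8):
    """Step (2), assumed greedy rule: the variable with the most strong
    correlations among those still unassigned becomes an indicator, and the
    variables strongly correlated with it form its cluster."""
    remaining = set(corr)
    clusters = {}
    while remaining:
        def partners(v):
            return {w for w in remaining
                    if w != v and abs(corr[v][w]) >= threshold}
        # sorted() makes tie-breaking deterministic (alphabetical order).
        indicator = max(sorted(remaining), key=lambda v: len(partners(v)))
        members = partners(indicator) | {indicator}
        clusters[indicator] = sorted(members)
        remaining -= members
    return clusters

# Toy time series, invented for illustration only.
series = {
    "a": [1, 2, 3, 4],
    "b": [2, 4, 6, 8.1],
    "c": [4, 3, 2, 1],
    "d": [1, -1, 1, -1],
}
clusters = select_indicators(correlation_matrix(series))
```

Because each chosen indicator removes its strongly correlated partners from further consideration, the indicators that remain are weakly correlated with one another, which is the nonredundancy property the abstract emphasizes.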
Biclustering of gene expression data using reactive greedy randomized adaptive search procedure
Dharan, Smitha; Nair, Achuthsankar S
2009-01-01
Background Biclustering algorithms belong to a distinct class of clustering algorithms that perform simultaneous clustering of both rows and columns of the gene expression matrix and can be a very useful analysis tool when some genes have multiple functions and experimental conditions are diverse. Cheng and Church introduced a measure called the mean squared residue score to evaluate the quality of a bicluster, and it has become one of the most popular measures used to search for biclusters. In this paper, we review basic concepts of the metaheuristic Greedy Randomized Adaptive Search Procedure (GRASP), namely its construction and local search phases, and propose a new method which is a variant of GRASP called Reactive Greedy Randomized Adaptive Search Procedure (Reactive GRASP) to detect significant biclusters from large microarray datasets. The method has two major steps. First, high quality bicluster seeds are generated by means of k-means clustering. In the second step, these seeds are grown using the Reactive GRASP, in which the basic parameter that defines the restrictiveness of the candidate list is self-adjusted, depending on the quality of the solutions found previously. Results We performed statistical and biological validations of the biclusters obtained and evaluated the method against the results of basic GRASP as well as the classic work of Cheng and Church. The experimental results indicate that the Reactive GRASP approach outperforms the basic GRASP algorithm and the Cheng and Church approach. Conclusion The Reactive GRASP approach for the detection of significant biclusters is robust and does not require calibration efforts. PMID:19208127
Caminiti, Caterina; Iezzi, Elisa; Passalacqua, Rodolfo
2017-01-01
Introduction Our group previously demonstrated the feasibility of the HuCare Quality Improvement Strategy (HQIS), aimed at integrating into practice six psychosocial interventions recommended by international guidelines. This trial will assess whether the introduction of the strategy in oncology wards improves patients' health-related quality of life (HRQoL). Methods and analysis Multicentre, incomplete stepped-wedge cluster randomised controlled trial, conducted in three clusters of five centres each, in three equally spaced time epochs. The study also includes an initial epoch when none of the centres are exposed to the intervention, and a final epoch when all centres will have implemented the strategy. The intervention is applied at a cluster level, and assessed at an individual level with a cross-sectional model. A total of 720 patients who received a cancer diagnosis in the previous 2 months and are about to start medical treatment will be enrolled. The primary aim is to evaluate the effectiveness of the HQIS versus standard care in terms of improvement of at least one of two domains (emotional and social functions) of HRQoL using the EORTC QLQ-C30 (European Organisation for Research and Treatment of Cancer Quality of Life Questionnaire-Core 30 items) questionnaire, at baseline and at 3 months. This outcome was chosen because patients with cancer generally exhibit low HRQoL, particularly at certain stages of care, and because it allows assessment of the strategy's impact as perceived by patients themselves. The HQIS comprises three phases: (1) clinician training, to improve communication-relational skills and instruct on the project; (2) centre support, consisting of four on-site visits by experts of the project team, aimed to boost motivation, help with context analysis and identification of solutions; (3) implementation of Evidence-Based Medicine (EBM) recommendations at the centre. 
Ethics and dissemination Ethics committee review approval has been obtained from the Ethics Committee of Parma. Results will be disseminated at conferences, and in peer-reviewed and professional journals intended for policymakers and managers. Trial registration number NCT03008993; Pre-results. PMID:28988170
Accident patterns for construction-related workers: a cluster analysis
NASA Astrophysics Data System (ADS)
Liao, Chia-Wen; Tyan, Yaw-Yauan
2012-01-01
The construction industry has been identified as one of the most hazardous industries. The risk of construction-related workers is far greater than that in a manufacturing based industry. However, some steps can be taken to reduce worker risk through effective injury prevention strategies. In this article, k-means clustering methodology is employed in specifying the factors related to different worker types and in identifying the patterns of industrial occupational accidents. Accident reports during the period 1998 to 2008 are extracted from case reports of the Northern Region Inspection Office of the Council of Labor Affairs of Taiwan. The results show that the cluster analysis can indicate some patterns of occupational injuries in the construction industry. Inspection plans should be proposed according to the type of construction-related workers. The findings provide a direction for more effective inspection strategies and injury prevention programs.
Accident patterns for construction-related workers: a cluster analysis
NASA Astrophysics Data System (ADS)
Liao, Chia-Wen; Tyan, Yaw-Yauan
2011-12-01
The construction industry has been identified as one of the most hazardous industries. The risk of construction-related workers is far greater than that in a manufacturing based industry. However, some steps can be taken to reduce worker risk through effective injury prevention strategies. In this article, k-means clustering methodology is employed in specifying the factors related to different worker types and in identifying the patterns of industrial occupational accidents. Accident reports during the period 1998 to 2008 are extracted from case reports of the Northern Region Inspection Office of the Council of Labor Affairs of Taiwan. The results show that the cluster analysis can indicate some patterns of occupational injuries in the construction industry. Inspection plans should be proposed according to the type of construction-related workers. The findings provide a direction for more effective inspection strategies and injury prevention programs.
Electrician Cluster, STEP Training Plan. Skills Training and Education Program.
ERIC Educational Resources Information Center
Alabama State Dept. of Postsecondary Education, Montgomery.
This guide is a training plan for the electrical skills cluster of the Skills Training and Education Program (STEP), an open-entry, open-exit program funded by the Job Training Partnership Act (JTPA). In the STEP training plan, each task has its own lesson plan guide. This manual contains the following information: definitions, instructions for…
Clerical Cluster, STEP Training Plan. Skills Training and Education Program.
ERIC Educational Resources Information Center
Alabama State Dept. of Postsecondary Education, Montgomery.
This guide is a training plan for the clerical skills cluster of the Skills Training and Education Program (STEP), an open-entry, open-exit program funded by the Job Training Partnership Act (JTPA). In the STEP training plan, each task has its own lesson plan guide. This manual contains the following information: definitions, instructions for…
ERIC Educational Resources Information Center
Miyamoto, S.; Nakayama, K.
1983-01-01
A method of two-stage clustering of literature based on citation frequency is applied to 5,065 articles from 57 journals in environmental and civil engineering. Results of related methods of citation analysis (hierarchical graph, clustering of journals, multidimensional scaling) applied to same set of articles are compared. Ten references are…
A two step Bayesian approach for genomic prediction of breeding values.
Shariati, Mohammad M; Sørensen, Peter; Janss, Luc
2012-05-21
In genomic models that assign an individual variance to each marker, the contribution of one marker to the posterior distribution of the marker variance is only one degree of freedom (df), which introduces many variance parameters with only little information per variance parameter. A better alternative could be to form clusters of markers with similar effects where markers in a cluster have a common variance. Therefore, the influence of each marker group of size p on the posterior distribution of the marker variances will be p df. The simulated data from the 15th QTL-MAS workshop were analyzed such that SNP markers were ranked based on their effects and markers with similar estimated effects were grouped together. In step 1, all markers with minor allele frequency more than 0.01 were included in a SNP-BLUP prediction model. In step 2, markers were ranked based on their estimated variance on the trait in step 1 and each 150 markers were assigned to one group with a common variance. In further analyses, subsets of 1500 and 450 markers with largest effects in step 2 were kept in the prediction model. Grouping markers outperformed SNP-BLUP model in terms of accuracy of predicted breeding values. However, the accuracies of predicted breeding values were lower than Bayesian methods with marker specific variances. Grouping markers is less flexible than allowing each marker to have a specific marker variance but, by grouping, the power to estimate marker variances increases. A prior knowledge of the genetic architecture of the trait is necessary for clustering markers and appropriate prior parameterization.
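The grouping in step 2 of this abstract can be sketched in a few lines. This is a minimal illustration, assuming the step 1 marker effects are already available as a list; ranking by squared effect is an assumption standing in for the abstract's "estimated variance on the trait", and `group_markers` is a hypothetical name.

```python
def group_markers(effects, group_size=150):
    """Rank markers by squared estimated effect (largest first) and assign
    consecutive blocks of `group_size` markers to one variance group.
    Returns the group index of each marker, in the original marker order."""
    order = sorted(range(len(effects)),
                   key=lambda i: effects[i] ** 2, reverse=True)
    groups = [0] * len(effects)
    for rank, marker in enumerate(order):
        groups[marker] = rank // group_size
    return groups

# Toy effects, invented for illustration; groups of 2 instead of 150.
groups = group_markers([0.1, -0.9, 0.5, 0.0], group_size=2)
```

With p markers sharing one variance parameter, each group contributes p degrees of freedom to the posterior of that variance instead of one, which is the gain in information the abstract describes.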
Fast clustering using adaptive density peak detection.
Wang, Xiao-Feng; Xu, Yifan
2017-12-01
Common limitations of clustering methods include slow algorithm convergence, instability with respect to the pre-specification of a number of intrinsic parameters, and lack of robustness to outliers. A recent clustering approach proposed a fast search algorithm of cluster centers based on their local densities. However, the selection of the key intrinsic parameters in the algorithm was not systematically investigated. It is relatively difficult to estimate the "optimal" parameters since the original definition of the local density in the algorithm is based on a truncated counting measure. In this paper, we propose a clustering procedure with adaptive density peak detection, where the local density is estimated through nonparametric multivariate kernel estimation. The model parameter can then be calculated from equations with statistical theoretical justification. We also develop an automatic cluster centroid selection method through maximizing an average silhouette index. The advantage and flexibility of the proposed method are demonstrated through simulation studies and the analysis of a few benchmark gene expression data sets. The method runs in a single step without any iteration and is therefore fast, with great potential for big data analysis. A user-friendly R package ADPclust is developed for public use.
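The density-peak idea in this abstract can be sketched as follows. This is a simplified illustration and not the ADPclust implementation: the bandwidth is passed in by hand rather than calculated, centers are picked by the largest density-times-separation product rather than by the silhouette criterion, and the function name `density_peaks` is hypothetical.

```python
import math

def density_peaks(points, bandwidth, n_clusters):
    """Single-pass density peak clustering with a Gaussian kernel density.
    rho[i]   : kernel density estimate at point i (unnormalized)
    delta[i] : distance from i to the nearest point of higher density"""
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]
    rho = [sum(math.exp(-0.5 * (d[i][j] / bandwidth) ** 2) for j in range(n))
           for i in range(n)]
    order = sorted(range(n), key=lambda i: rho[i], reverse=True)
    delta = [0.0] * n
    parent = [-1] * n
    for pos, i in enumerate(order):
        if pos == 0:
            delta[i] = max(d[i])  # convention for the densest point
            continue
        j = min(order[:pos], key=lambda k: d[i][k])
        delta[i], parent[i] = d[i][j], j
    # Assumed selection rule: the largest rho * delta values mark the centers.
    centers = sorted(range(n), key=lambda i: rho[i] * delta[i],
                     reverse=True)[:n_clusters]
    labels = [-1] * n
    for c, i in enumerate(centers):
        labels[i] = c
    # Visit points by decreasing density so each parent is labeled first.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, centers

points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
          (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
labels, centers = density_peaks(points, bandwidth=1.0, n_clusters=2)
```

Every non-center point inherits the label of its nearest higher-density neighbor, so the assignment needs one ordered pass over the data and no iterative refinement.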
Performance analysis of parallel gravitational N-body codes on large GPU clusters
NASA Astrophysics Data System (ADS)
Huang, Si-Yi; Spurzem, Rainer; Berczik, Peter
2016-01-01
We compare the performance of two very different parallel gravitational N-body codes for astrophysical simulations on large Graphics Processing Unit (GPU) clusters, both of which are pioneers in their own fields as well as on certain mutual scales - NBODY6++ and Bonsai. We carry out benchmarks of the two codes by analyzing their performance, accuracy and efficiency through the modeling of structure decomposition and timing measurements. We find that both codes are heavily optimized to leverage the computational potential of GPUs as their performance has approached half of the maximum single precision performance of the underlying GPU cards. With such performance we predict that a speed-up of 200 - 300 can be achieved when up to 1k processors and GPUs are employed simultaneously. We discuss the quantitative information about comparisons of the two codes, finding that in the same cases Bonsai adopts larger time steps as well as larger relative energy errors than NBODY6++, typically ranging from 10 - 50 times larger, depending on the chosen parameters of the codes. Although the two codes are built for different astrophysical applications, in specified conditions they may overlap in performance at certain physical scales, thus allowing the user to choose either one by fine-tuning parameters accordingly.
Using Cluster Analysis and ICP-MS to Identify Groups of Ecstasy Tablets in Sao Paulo State, Brazil.
Maione, Camila; de Oliveira Souza, Vanessa Cristina; Togni, Loraine Rezende; da Costa, José Luiz; Campiglia, Andres Dobal; Barbosa, Fernando; Barbosa, Rommel Melgaço
2017-11-01
The variations found in the elemental composition of ecstasy samples result in spectral profiles with useful information for data analysis, and cluster analysis of these profiles can help uncover different categories of the drug. We provide a cluster analysis of ecstasy tablets based on their elemental composition. Twenty-five elements were determined by ICP-MS in tablets seized by Sao Paulo's State Police, Brazil. We employ the K-means clustering algorithm along with a C4.5 decision tree to help us interpret the clustering results. We found that two clusters best describe the data, which may correspond to the approximate number of sources of the drug supplying the cities where the seizures took place. The C4.5 model was capable of differentiating the ecstasy samples from the two clusters with high prediction accuracy using leave-one-out cross-validation. The model used only Nd, Ni, and Pb concentration values in the classification of the samples. © 2017 American Academy of Forensic Sciences.
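The K-means step mentioned above can be sketched in plain Python. This is a minimal version of Lloyd's algorithm with caller-supplied starting centers; it is an illustration only, not the authors' pipeline, and it omits the C4.5 decision tree and the cross-validation. The function name and the toy coordinates in the usage note are assumptions.

```python
import math

def kmeans(points, centers, iterations=20):
    """Lloyd's algorithm: alternate nearest-center assignment and mean update."""
    for _ in range(iterations):
        # Assignment step: label each point with the index of its nearest center.
        labels = [min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break  # converged: assignments can no longer change
        centers = new_centers
    return labels, centers
```

For example, `kmeans([(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)], [(0.0, 0.0), (10.0, 10.0)])` groups the first two and the last two points. In practice the result depends on the starting centers, so several initialisations are usually compared.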
Orbit Clustering Based on Transfer Cost
NASA Technical Reports Server (NTRS)
Gustafson, Eric D.; Arrieta-Camacho, Juan J.; Petropoulos, Anastassios E.
2013-01-01
We propose using cluster analysis to perform quick screening for combinatorial global optimization problems. The key missing component currently preventing cluster analysis from use in this context is the lack of a useable metric function that defines the cost to transfer between two orbits. We study several proposed metrics and clustering algorithms, including k-means and the expectation maximization algorithm. We also show that proven heuristic methods such as the Q-law can be modified to work with cluster analysis.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Aubart, M.A.; Chandler, B.D.; Gould, R.A.T.
Platinum- and palladium-gold cluster compounds were evaluated with respect to their ability to catalyze H2-D2 equilibration. In addition, these phosphine-stabilized complexes were structurally characterized. Mechanistic studies for this reaction were performed by kinetic and spectroscopic analysis. The catalytic reaction appears to occur in three steps, which were determined.
Stephenson, Geoffrey M; Zygouris, Nikolaos
2007-02-01
Thirty clients receiving Twelve-Step Facilitation Therapy in a rehabilitation setting formed the intervention group. They were asked to complete, in the third person, a weekly evaluation of progress based on reading personal "Feelings" diaries they had written on a daily basis over a period of one week, starting 3 weeks previously. The diaries of the clients and of a further 60 clients in two matched control groups were compared. One control group consisted of clients receiving treatment before the intervention was introduced, and clients in the second control group received treatment after the intervention was terminated. Clients in the control groups were matched to the intervention group according to presenting disorder (alcohol, drugs or food), gender and age. Analysis of the number of words written and diaries produced suggested that the experimental group's productivity was enhanced. Linguistic and cluster analyses indicated that the clients in the intervention group referred more frequently to key elements of the programme (steps and spirituality) and responded in a more integrated way to the major aspects of their treatment regime. The study supported the expectation that by promoting self-reflection on progress in therapeutic settings, an increase in programme engagement can be expected.
Terminal-Area Aircraft Intent Inference Approach Based on Online Trajectory Clustering.
Yang, Yang; Zhang, Jun; Cai, Kai-quan
2015-01-01
Terminal-area aircraft intent inference (T-AII) is a prerequisite for detecting and avoiding potential aircraft conflict in the terminal airspace. T-AII challenges state-of-the-art AII approaches because of the uncertainties of the air traffic situation, in particular the undefined flight routes and frequent maneuvers. In this paper, a novel T-AII approach is introduced to address these limitations by solving the problem in two steps: intent modeling and intent inference. In the modeling step, an online trajectory clustering procedure is designed to recognize the routes available in real time, in place of the missing planned routes. In the inference step, we then present a probabilistic T-AII approach based on multiple flight attributes to improve the inference performance in maneuvering scenarios. The proposed approach is validated with real radar trajectory and flight attribute data from 34 days collected in the Chengdu terminal area in China. Preliminary results show the efficacy of the presented approach.
Simultaneous Two-Way Clustering of Multiple Correspondence Analysis
ERIC Educational Resources Information Center
Hwang, Heungsun; Dillon, William R.
2010-01-01
A 2-way clustering approach to multiple correspondence analysis is proposed to account for cluster-level heterogeneity of both respondents and variable categories in multivariate categorical data. Specifically, in the proposed method, multiple correspondence analysis is combined with k-means in a unified framework in which "k"-means is…
Clinical Characteristics of Exacerbation-Prone Adult Asthmatics Identified by Cluster Analysis.
Kim, Mi Ae; Shin, Seung Woo; Park, Jong Sook; Uh, Soo Taek; Chang, Hun Soo; Bae, Da Jeong; Cho, You Sook; Park, Hae Sim; Yoon, Ho Joo; Choi, Byoung Whui; Kim, Yong Hoon; Park, Choon Sik
2017-11-01
Asthma is a heterogeneous disease characterized by various types of airway inflammation and obstruction. Therefore, it is classified into several subphenotypes, such as early-onset atopic, obese non-eosinophilic, benign, and eosinophilic asthma, using cluster analysis. A number of asthmatics frequently experience exacerbation over a long-term follow-up period, but the exacerbation-prone subphenotype has rarely been evaluated by cluster analysis. This prompted us to identify clusters reflecting asthma exacerbation. A uniform cluster analysis method was applied to 259 adult asthmatics who were regularly followed-up for over 1 year using 12 variables, selected on the basis of their contribution to asthma phenotypes. After clustering, clinical profiles and exacerbation rates during follow-up were compared among the clusters. Four subphenotypes were identified: cluster 1 was comprised of patients with early-onset atopic asthma with preserved lung function, cluster 2 late-onset non-atopic asthma with impaired lung function, cluster 3 early-onset atopic asthma with severely impaired lung function, and cluster 4 late-onset non-atopic asthma with well-preserved lung function. The patients in clusters 2 and 3 were identified as exacerbation-prone asthmatics, showing a higher risk of asthma exacerbation. Two different phenotypes of exacerbation-prone asthma were identified among Korean asthmatics using cluster analysis; both were characterized by impaired lung function, but the age at asthma onset and atopic status were different between the two. Copyright © 2017 The Korean Academy of Asthma, Allergy and Clinical Immunology · The Korean Academy of Pediatric Allergy and Respiratory Disease
Konno, Satoshi; Taniguchi, Natsuko; Makita, Hironi; Nakamaru, Yuji; Shimizu, Kaoruko; Shijubo, Noriharu; Fuke, Satoshi; Takeyabu, Kimihiro; Oguri, Mitsuru; Kimura, Hirokazu; Maeda, Yukiko; Suzuki, Masaru; Nagai, Katsura; Ito, Yoichi M; Wenzel, Sally E; Nishimura, Masaharu
2015-12-01
Smoking may have multifactorial effects on asthma phenotypes, particularly in severe asthma. Cluster analysis has been applied to explore novel phenotypes, which are not based on any a priori hypotheses. To explore novel severe asthma phenotypes by cluster analysis when including cigarette smokers. We recruited a total of 127 subjects with severe asthma, including 59 current or ex-smokers, from our university hospital and its 29 affiliated hospitals/pulmonary clinics. Twelve clinical variables obtained during a 2-day hospital stay were used for cluster analysis. After clustering using clinical variables, the sputum levels of 14 molecules were measured to biologically characterize the clinical clusters. Five clinical clusters were identified, including two characterized by high pack-year exposure to cigarette smoking and low FEV1/FVC. There were marked differences between the two clusters of cigarette smokers. One had high levels of circulating eosinophils, high IgE levels, and a high sinus disease score. The other was characterized by low levels of the same parameters. Sputum analysis revealed increased levels of IL-5 in the former cluster and increased levels of IL-6 and osteopontin in the latter. The other three clusters were similar to those previously reported: young onset/atopic, nonsmoker/less eosinophilic, and female/obese. Key clinical variables were confirmed to be stable and consistent 1 year later. This study reveals two distinct phenotypes of severe asthma in current and former cigarette smokers with potentially different biological pathways contributing to fixed airflow limitation. Clinical trial registered with www.umin.ac.jp (000003254).
NASA Astrophysics Data System (ADS)
Abdi, Abdi M.; Szu, Harold H.
2003-04-01
With the growing rate of interconnection among computer systems, network security is becoming a real challenge. An Intrusion Detection System (IDS) is designed to protect the availability, confidentiality and integrity of critical network information systems. Today's approach to network intrusion detection involves the use of rule-based expert systems to identify an indication of known attacks or anomalies. However, these techniques are less successful in identifying today's attacks. Hackers are perpetually inventing new and previously unanticipated techniques to compromise information infrastructure. This paper proposes a dynamic way of detecting network intruders in time series data. The proposed approach consists of a two-step process. First, we obtain an efficient multi-user detection method, employing the recently introduced complexity minimization approach as a generalization of standard ICA. Second, we identify an unsupervised learning neural network architecture based on Kohonen's Self-Organizing Map for potential functional clustering. These two steps working together adaptively will provide a pseudo-real-time novelty detection attribute to supplement the current intrusion detection statistical methodology.
Fablet, C; Rose, N; Grasland, B; Robert, N; Lewandowski, E; Gosselin, M
2018-01-01
Growing and finishing performances of pigs strongly influence farm efficiency and profitability. The performances of the pigs rely on the herd health status and also on several non-infectious factors. Many recommendations for the improvement of the technical performances of a herd are based on the results of studies assessing the effect of one or a limited number of infections or environmental factors. Few studies have jointly investigated the influence of both types of factors on swine herd performances. This work aimed at identifying infectious and non-infectious factors associated with the growing and finishing performances of 41 French swine herds. Two groups of herds were identified using a clustering analysis: a cluster of 24 herds with the highest technical performance values (mean average daily gain = 781.1 g/day +/- 26.3; mean feed conversion ratio = 2.5 kg/kg +/- 0.1; mean mortality rate = 4.1% +/- 0.9; and mean carcass slaughter weight = 121.2 kg +/- 5.2) and a cluster of 17 herds with the lowest performance values (mean average daily gain = 715.8 g/day +/- 26.5; mean feed conversion ratio = 2.6 kg/kg +/- 0.1; mean mortality rate = 6.8% +/- 2.0; and mean carcass slaughter weight = 117.7 kg +/- 3.6). Multiple correspondence analysis was used to identify factors associated with the level of technical performance. Infection with the porcine reproductive and respiratory syndrome virus and the porcine circovirus type 2 were infectious factors associated with the cluster having the lowest performance values. This cluster also featured farrow-to-finish type herds, a short interval between successive batches of pigs (≤3 weeks) and mixing of pigs from different batches in the growing and/or finishing steps. Inconsistency between nursery and fattening building management was another factor associated with the low-performance cluster. 
The odds of a herd showing low growing-finishing performance was significantly increased when infected by PRRS virus in the growing-finishing steps (OR = 8.8, 95% confidence interval [95% CI]: 1.8-41.7) and belonging to a farrow-to-finish type herd (OR = 5.1, 95% CI = 1.1-23.8). Herd management and viral infections significantly influenced the performance levels of the swine herds included in this study.
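For readers unfamiliar with how an odds ratio and its confidence interval are obtained, here is a minimal Python sketch for a single 2x2 table using the Wald interval on the log scale. It is illustrative only: the abstract's estimates most likely come from a fitted regression model, the function name is hypothetical, and any counts passed in are made up rather than taken from the study.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval from a 2x2 table.

    a: exposed, low performance     b: exposed, high performance
    c: unexposed, low performance   d: unexposed, high performance
    All four counts must be positive.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) for a 2x2 table.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper
```

With few herds in each cell the standard error is large, which is why intervals as wide as those reported above are typical of small samples.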
A taxonomy of accountable care organizations for policy and practice.
Shortell, Stephen M; Wu, Frances M; Lewis, Valerie A; Colla, Carrie H; Fisher, Elliott S
2014-12-01
To develop an exploratory taxonomy of Accountable Care Organizations (ACOs) to describe and understand early ACO development and to provide a basis for technical assistance and future evaluation of performance. Data from the National Survey of Accountable Care Organizations, fielded between October 2012 and May 2013, of 173 Medicare, Medicaid, and commercial payer ACOs. Drawing on resource dependence and institutional theory, we develop measures of eight attributes of ACOs such as size, scope of services offered, and the use of performance accountability mechanisms. Data are analyzed using a two-step cluster analysis approach that accounts for both continuous and categorical data. We identified a reliable and internally valid three-cluster solution: larger, integrated systems that offer a broad scope of services and frequently include one or more postacute facilities; smaller, physician-led practices, centered in primary care, and that possess a relatively high degree of physician performance management; and moderately sized, joint hospital-physician and coalition-led groups that offer a moderately broad scope of services with some involvement of postacute facilities. ACOs can be characterized into three distinct clusters. The taxonomy provides a framework for assessing performance, for targeting technical assistance, and for diagnosing potential antitrust violations. © Health Research and Educational Trust.
A Hierarchical Bayesian Procedure for Two-Mode Cluster Analysis
ERIC Educational Resources Information Center
DeSarbo, Wayne S.; Fong, Duncan K. H.; Liechty, John; Saxton, M. Kim
2004-01-01
This manuscript introduces a new Bayesian finite mixture methodology for the joint clustering of row and column stimuli/objects associated with two-mode asymmetric proximity, dominance, or profile data. That is, common clusters are derived which partition both the row and column stimuli/objects simultaneously into the same derived set of clusters.…
NASA Astrophysics Data System (ADS)
Vargas-Magaña, Mariana; Ho, Shirley; Fromenteau, Sebastien; Cuesta, Antonio J.
2017-05-01
The reconstruction algorithm introduced by Eisenstein et al., which is widely used in clustering analysis, is based on the inference of the first-order Lagrangian displacement field from the Gaussian smoothed galaxy density field in redshift space. The smoothing scale applied to the density field affects the inferred displacement field that is used to move the galaxies, and partially erases the non-linear evolution of the density field. In this article, we explore this crucial step in the reconstruction algorithm. We study the performance of the reconstruction technique using two metrics: first, we study the performance using the anisotropic clustering, extending previous studies focused on isotropic clustering; secondly, we study its effect on the displacement field. We find that smoothing has a strong effect on the quadrupole of the correlation function and affects the accuracy and precision with which we can measure D_A(z) and H(z). We find that the optimal smoothing scale to use in the reconstruction algorithm applied to Baryonic Oscillations Spectroscopic Survey-Constant (stellar) MASS (CMASS) is between 5 and 10 h^-1 Mpc. Varying from the 'usual' 15 to 5 h^-1 Mpc shows ~0.3 per cent variations in D_A(z) and ~0.4 per cent in H(z), and uncertainties are also reduced by 40 per cent and 30 per cent, respectively. We also find that the accuracy of velocity field reconstruction depends strongly on the smoothing scale used for the density field. We measure the bias and uncertainties associated with different choices of smoothing length.
Yang, Li-Gang; Tucker, Joseph D; Yang, Bin; Shen, Song-Ying; Sun, Xi-Feng; Chen, Yong-Feng; Chen, Xiang-Sheng
2010-12-30
Syphilis cases have risen in many parts of China, with developed regions reporting the greatest share of cases. Since syphilis increases in these areas are likely driven by both increased screening and changes in sexual behaviours, distinguishing between these two factors is important. Examining municipal-level primary syphilis cases with spatial analysis allows a more direct understanding of changing sexual behaviours at a more policy-relevant level. In this study we examined all reported primary syphilis cases from Guangdong Province, a southern province in China, since the disease was first incorporated into the mandatory reporting system in 1995. Spatial autocorrelation statistics were used to correlate municipal-level clustering of reported primary syphilis cases and gross domestic product (GDP). A total of 52,036 primary syphilis cases were reported over the period 1995-2008, and the primary syphilis cases increased from 0.88 per 100,000 population in 1995 to 7.61 per 100,000 in 2008. The Pearl River Delta region has a disproportionate share (44.7%) of syphilis cases compared to other regions. Syphilis cases were spatially clustered (p = 0.01) and Moran's I analysis found that syphilis cases were clustered in municipalities with higher GDP (p = 0.004). Primary syphilis cases continue to increase in Guangdong Province, especially in the Pearl River Delta region. Considering the economic impact of syphilis and its tendency to spatially cluster, expanded syphilis testing in specific municipalities and further investigating the costs and benefits of syphilis screening are critical next steps.
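The spatial autocorrelation statistic named above, Moran's I, can be written out directly. The sketch below is a minimal univariate global Moran's I in Python, assuming a user-supplied spatial weights matrix with a zero diagonal; it does not reproduce the authors' analysis relating case counts to GDP, and the function name and example adjacency are hypothetical.

```python
def morans_i(values, weights):
    """Global Moran's I for one variable and a spatial weights matrix.

    values:  one observation per region
    weights: weights[i][j] > 0 when regions i and j are neighbours, else 0
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_sum = sum(sum(row) for row in weights)
    # Cross-products of deviations, counted only between neighbouring regions.
    num = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / w_sum) * (num / den)
```

Values near +1 indicate that similar rates sit next to each other (clustering), values near -1 indicate alternation, and values near the expectation of -1/(n-1) indicate no spatial pattern. Significance, such as the p-values quoted above, is usually assessed by permutation, which this sketch omits.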
Improving stability of prediction models based on correlated omics data by using network approaches.
Tissier, Renaud; Houwing-Duistermaat, Jeanine; Rodríguez-Girondo, Mar
2018-01-01
Building prediction models based on complex omics datasets such as transcriptomics, proteomics, metabolomics remains a challenge in bioinformatics and biostatistics. Regularized regression techniques are typically used to deal with the high dimensionality of these datasets. However, due to the presence of correlation in the datasets, it is difficult to select the best model and application of these methods yields unstable results. We propose a novel strategy for model selection where the obtained models also perform well in terms of overall predictability. Several three step approaches are considered, where the steps are 1) network construction, 2) clustering to empirically derive modules or pathways, and 3) building a prediction model incorporating the information on the modules. For the first step, we use weighted correlation networks and Gaussian graphical modelling. Identification of groups of features is performed by hierarchical clustering. The grouping information is included in the prediction model by using group-based variable selection or group-specific penalization. We compare the performance of our new approaches with standard regularized regression via simulations. Based on these results we provide recommendations for selecting a strategy for building a prediction model given the specific goal of the analysis and the sizes of the datasets. Finally we illustrate the advantages of our approach by application of the methodology to two problems, namely prediction of body mass index in the DIetary, Lifestyle, and Genetic determinants of Obesity and Metabolic syndrome study (DILGOM) and prediction of response of each breast cancer cell line to treatment with specific drugs using a breast cancer cell lines pharmacogenomics dataset.
Cluster Analysis to Identify Possible Subgroups in Tinnitus Patients.
van den Berge, Minke J C; Free, Rolien H; Arnold, Rosemarie; de Kleine, Emile; Hofman, Rutger; van Dijk, J Marc C; van Dijk, Pim
2017-01-01
In tinnitus treatment, there is a tendency to shift from a "one size fits all" to a more individual, patient-tailored approach. Insight into the heterogeneity of the tinnitus spectrum might improve the management of tinnitus patients in terms of choice of treatment and identification of patients with severe mental distress. The goal of this study was to identify subgroups in a large group of tinnitus patients. Data were collected from patients with severe tinnitus complaints visiting our tertiary referral tinnitus care group at the University Medical Center Groningen. Patient-reported and physician-reported variables were collected during their visit to our clinic. Cluster analyses were used to characterize subgroups. For the selection of the right variables to enter in the cluster analysis, two approaches were used: (1) variable reduction with principal component analysis and (2) variable selection based on expert opinion. Various variables of 1,783 tinnitus patients were included in the analyses. Cluster analysis (1) included 976 patients and resulted in a four-cluster solution. The effect of external influences was the most discriminative between the groups, or clusters, of patients. The "silhouette measure" of the cluster outcome was low (0.2), indicating a "no substantial" cluster structure. Cluster analysis (2) included 761 patients and resulted in a three-cluster solution, comparable to the first analysis. Again, a "no substantial" cluster structure was found (0.2). Two cluster analyses on a large database of tinnitus patients revealed that clusters of patients are mostly formed by a different response to external influences on their disease. However, both cluster outcomes based on this dataset showed poor stability, suggesting that our tinnitus population comprises a continuum rather than a number of clearly defined subgroups.
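The "silhouette measure" quoted above summarises how well each observation fits its own cluster compared with the nearest other cluster. A minimal Python sketch of the mean silhouette width follows, assuming Euclidean distance and at least two clusters; the study's software and distance measure are not stated in the abstract, and the function name is hypothetical.

```python
import math

def mean_silhouette(points, labels):
    """Mean silhouette width over all points (Euclidean distance, >= 2 clusters)."""
    n = len(points)
    clusters = set(labels)
    scores = []
    for i in range(n):
        own = [math.dist(points[i], points[j])
               for j in range(n) if j != i and labels[j] == labels[i]]
        if not own:
            scores.append(0.0)  # common convention for a single-member cluster
            continue
        a = sum(own) / len(own)  # mean distance to the point's own cluster
        # Mean distance to the closest other cluster.
        b = min(
            sum(math.dist(points[i], points[j]) for j in range(n) if labels[j] == c)
            / labels.count(c)
            for c in clusters if c != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / n
```

Scores close to 1 indicate compact, well-separated clusters, while a value around 0.2, as reported above, means that points lie almost as close to a neighbouring cluster as to their own.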
NASA Astrophysics Data System (ADS)
Keshtkaran, Mohammad Reza; Yang, Zhi
2017-06-01
Objective. Spike sorting is a fundamental preprocessing step for many neuroscience studies which rely on the analysis of spike trains. Most of the feature extraction and dimensionality reduction techniques that have been used for spike sorting give a projection subspace which is not necessarily the most discriminative one. Therefore, clusters which appear inherently separable in some discriminative subspace may overlap if projected using conventional feature extraction approaches, leading to poor sorting accuracy, especially when the noise level is high. In this paper, we propose a noise-robust and unsupervised spike sorting algorithm based on learning discriminative spike features for clustering. Approach. The proposed algorithm uses discriminative subspace learning to extract low-dimensional and most discriminative features from the spike waveforms and performs clustering with automatic detection of the number of clusters. The core part of the algorithm involves iterative subspace selection using linear discriminant analysis and clustering using a Gaussian mixture model with outlier detection. A statistical test in the discriminative subspace is proposed to automatically detect the number of clusters. Main results. Comparative results on publicly available simulated and real in vivo datasets demonstrate that our algorithm achieves substantially improved cluster distinction, leading to higher sorting accuracy and more reliable detection of clusters which are highly overlapping and not detectable using conventional feature extraction techniques such as principal component analysis or wavelets. Significance. By providing more accurate information about the activity of a larger number of individual neurons with high robustness to neural noise and outliers, the proposed unsupervised spike sorting algorithm facilitates more detailed and accurate analysis of single- and multi-unit activities in neuroscience and brain machine interface studies.
Examining Subtypes of Behavioral/Emotional Risk Using Cluster Analysis
ERIC Educational Resources Information Center
Dever, Bridget V.; Gallagher, Emily K.; Hochbein, Craig D.; Loukas, Austin; Dai, Chenchen
2017-01-01
Behavioral and emotional problems among children and adolescents can lead to numerous negative outcomes without intervention. From a prevention standpoint, screening for behavioral and emotional risk is an important step toward identifying such problems before the point of diagnosis or referral. The present study conducted a k-means cluster…
A statistical software tool, Stream Fish Community Predictor (SFCP), based on EMAP stream sampling in the mid-Atlantic Highlands, was developed to predict stream fish communities using stream and watershed characteristics. Step one in the tool development was a cluster analysis t...
The effect of billboard design specifications on driving: A pilot study.
Marciano, Hadas; Setter, Pe'erly
2017-07-01
Decades of research on the effects of advertising billboards on road accident rates, driver performance, and driver visual scanning behavior have produced no conclusive findings. We suggest that road safety researchers should shift their focus and attempt to identify the billboard characteristics that are most distracting to drivers. This line of research may produce concrete guidelines for permissible billboards that would be likely to reduce the influence of the billboards on road safety. The current study is a first step towards this end. A pool of 161 photos of real advertising billboards was used as stimuli within a triple task paradigm designed to simulate certain components of driving. Each trial consisted of one ongoing tracking task accompanied by two additional concurrent tasks: (1) billboard observation task; and (2) circle color change identification task. Five clusters of billboards, identified by conducting a cluster analysis of their graphic content, were used as a within variable in one-way ANOVAs conducted on performance level data collected from the multiple tasks. Cluster 5, labeled Loaded Billboards, yielded significantly deteriorated performance on the tracking task. Cluster 4, labeled Graphical Billboards, yielded deteriorated performance primarily on the color change identification task. Cluster 3, labeled Minimal Billboards, had no effect on any of these tasks. We strongly recommend that these clusters be systematically explored in experiments involving additional real driving settings, such as driving simulators and field studies. This will enable validation of the current results and help incorporate them into real driving situations. Copyright © 2017. Published by Elsevier Ltd.
Exploiting visual search theory to infer social interactions
NASA Astrophysics Data System (ADS)
Rota, Paolo; Dang-Nguyen, Duc-Tien; Conci, Nicola; Sebe, Nicu
2013-03-01
In this paper we propose a new method to infer human social interactions using typical techniques adopted in the literature for visual search and information retrieval. The main piece of information we use to discriminate among different types of interactions is provided by proxemics cues acquired by a tracker, and used to distinguish between intentional and casual interactions. The proxemics information has been acquired through the analysis of two different metrics: on the one hand we observe the current distance between subjects, and on the other hand we measure the O-space synergy between subjects. The obtained values are taken at every time step over a temporal sliding window, and processed in the Discrete Fourier Transform (DFT) domain. The features are eventually merged into a unique array, and clustered using the K-means algorithm. The clusters are reorganized using a second larger temporal window into a Bag Of Words framework, so as to build the feature vector that will feed the SVM classifier.
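The sliding-window DFT step described above can be sketched as follows. This minimal Python example assumes a single scalar series (for instance, the distance between two tracked subjects at each time step) and keeps the DFT magnitudes of each window as its feature vector; the window width, the step, and the function names are assumptions, and the later K-means, Bag Of Words, and SVM stages are not shown.

```python
import cmath

def dft_magnitudes(window):
    """Magnitudes of the discrete Fourier transform of one window."""
    n = len(window)
    return [abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, x in enumerate(window)))
            for k in range(n)]

def sliding_dft_features(series, width, step=1):
    """One feature vector of DFT magnitudes per sliding window over the series."""
    return [dft_magnitudes(series[start:start + width])
            for start in range(0, len(series) - width + 1, step)]
```

A constant series puts all of its energy in the zero-frequency bin, whereas subjects who repeatedly approach and separate would produce energy at higher frequencies. The direct sum costs O(n^2) per window, which is acceptable for short windows; an FFT routine would be the usual choice for longer ones.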
Miller, Christopher B.; Bartlett, Delwyn J.; Mullins, Anna E.; Dodds, Kirsty L.; Gordon, Christopher J.; Kyle, Simon D.; Kim, Jong Won; D'Rozario, Angela L.; Lee, Rico S.C.; Comas, Maria; Marshall, Nathaniel S.; Yee, Brendon J.; Espie, Colin A.; Grunstein, Ronald R.
2016-01-01
Study Objectives: To empirically derive and evaluate potential clusters of Insomnia Disorder through cluster analysis from polysomnography (PSG). We hypothesized that clusters would differ on neurocognitive performance, sleep-onset measures of quantitative (q)-EEG and heart rate variability (HRV). Methods: Research volunteers with Insomnia Disorder (DSM-5) completed a neurocognitive assessment and overnight PSG measures of total sleep time (TST), wake time after sleep onset (WASO), and sleep onset latency (SOL) were used to determine clusters. Results: From 96 volunteers with Insomnia Disorder, cluster analysis derived at least two clusters from objective sleep parameters: Insomnia with normal objective sleep duration (I-NSD: n = 53) and Insomnia with short sleep duration (I-SSD: n = 43). At sleep onset, differences in HRV between I-NSD and I-SSD clusters suggest attenuated parasympathetic activity in I-SSD (P < 0.05). Preliminary work suggested three clusters by retaining the I-NSD and splitting the I-SSD cluster into two: I-SSD A (n = 29): defined by high WASO and I-SSD B (n = 14): a second I-SSD cluster with high SOL and medium WASO. The I-SSD B cluster performed worse than I-SSD A and I-NSD for sustained attention (P ≤ 0.05). In an exploratory analysis, q-EEG revealed reduced spectral power also in I-SSD B before (Delta, Alpha, Beta-1) and after sleep-onset (Beta-2) compared to I-SSD A and I-NSD (P ≤ 0.05). Conclusions: Two insomnia clusters derived from cluster analysis differ in sleep onset HRV. Preliminary data suggest evidence for three clusters in insomnia with differences for sustained attention and sleep-onset q-EEG. Clinical Trial Registration: Insomnia 100 sleep study: Australia New Zealand Clinical Trials Registry (ANZCTR) identification number 12612000049875. URL: https://www.anzctr.org.au/Trial/Registration/TrialReview.aspx?id=347742. 
Citation: Miller CB, Bartlett DJ, Mullins AE, Dodds KL, Gordon CJ, Kyle SD, Kim JW, D'Rozario AL, Lee RS, Comas M, Marshall NS, Yee BJ, Espie CA, Grunstein RR. Clusters of Insomnia Disorder: an exploratory cluster analysis of objective sleep parameters reveals differences in neurocognitive functioning, quantitative EEG, and heart rate variability. SLEEP 2016;39(11):1993–2004. PMID:27568796
Albaum, Stefan P; Neuweger, Heiko; Fränzel, Benjamin; Lange, Sita; Mertens, Dominik; Trötschel, Christian; Wolters, Dirk; Kalinowski, Jörn; Nattkemper, Tim W; Goesmann, Alexander
2009-12-01
The goal of present -omics sciences is to understand biological systems as a whole in terms of interactions of the individual cellular components. One of the main building blocks in this field of study is proteomics, where tandem mass spectrometry (LC-MS/MS) in combination with isotopic labelling techniques provides a common way to obtain a direct insight into regulation at the protein level. Methods to identify and quantify the peptides contained in a sample are well established, and their output usually results in lists of identified proteins and calculated relative abundance values. The next step is to move ahead from these abstract lists and apply statistical inference methods to compare measurements, to identify genes that are significantly up- or down-regulated, or to detect clusters of proteins with similar expression profiles. We introduce the Rich Internet Application (RIA) Qupe, which provides comprehensive data management and analysis functions for LC-MS/MS experiments. Starting with the import of mass spectra data, the system guides the experimenter through the process of protein identification by database search, the calculation of protein abundance ratios, and in particular, the statistical evaluation of the quantification results including multivariate analysis methods such as analysis of variance or hierarchical cluster analysis. While a data model to store these results has been developed, a well-defined programming interface facilitates the integration of novel approaches. A compute cluster is utilized to distribute computationally intensive calculations, and a web service allows information to be exchanged with other -omics software applications. To demonstrate that Qupe represents a step forward in quantitative proteomics analysis, an application study on Corynebacterium glutamicum has been carried out. Qupe is implemented in Java utilizing Hibernate, Echo2, R and the Spring framework.
We encourage the usage of the RIA in the sense of the 'software as a service' concept, maintained on our servers and accessible at the following location: http://qupe.cebitec.uni-bielefeld.de. Supplementary data are available at Bioinformatics online.
Hanrath, Michael; Engels-Putzka, Anna
2010-08-14
In this paper, we present an efficient implementation of general tensor contractions, which is part of a new coupled-cluster program. The tensor contractions, used to evaluate the residuals in each coupled-cluster iteration, are particularly important for the performance of the program. We developed a generic procedure, which carries out contractions of two tensors irrespective of their explicit structure. It can handle coupled-cluster-type expressions of arbitrary excitation level. To make the contraction efficient without losing flexibility, we use a three-step procedure. First, the data contained in the tensors are rearranged into matrices, then a matrix-matrix multiplication is performed, and finally the result is backtransformed to a tensor. The current implementation is significantly more efficient than previous ones capable of treating arbitrarily high excitations.
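The three-step procedure described above (rearrange into matrices, multiply, back-transform) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `matmul`, `contract_ikj_kl` and `contract_direct` are hypothetical, the tensors are plain nested lists, and only one fixed index pattern, C[i][j][l] = sum over k of A[i][k][j] * B[k][l], is handled.

```python
def matmul(X, Y):
    """Plain matrix product of two row-major nested lists."""
    cols = list(zip(*Y))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in X]


def contract_ikj_kl(A, B):
    """C[i][j][l] = sum_k A[i][k][j] * B[k][l], computed via matricization."""
    I, K, J = len(A), len(A[0]), len(A[0][0])
    # Step 1: rearrange A so the contracted index k comes last, and flatten
    # the free indices (i, j) into the row index of an (I*J) x K matrix.
    A_mat = [[A[i][k][j] for k in range(K)] for i in range(I) for j in range(J)]
    # Step 2: a single matrix-matrix multiplication does all the arithmetic.
    C_mat = matmul(A_mat, B)  # shape (I*J) x L
    # Step 3: back-transform row r = i*J + j into the tensor entry C[i][j].
    return [[C_mat[i * J + j] for j in range(J)] for i in range(I)]


def contract_direct(A, B):
    """Reference implementation with explicit loops, used only for checking."""
    I, K, J, L = len(A), len(A[0]), len(A[0][0]), len(B[0])
    return [[[sum(A[i][k][j] * B[k][l] for k in range(K)) for l in range(L)]
             for j in range(J)] for i in range(I)]


# Illustrative data: A has shape (2, 3, 2), B has shape (3, 2).
A = [[[1, 2], [3, 4], [5, 6]], [[7, 8], [9, 10], [11, 12]]]
B = [[1, 0], [0, 1], [1, 1]]
C = contract_ikj_kl(A, B)
```

The point of the rearrangement is that step 2 can be handed to an optimized matrix-multiplication routine; the pure-Python `matmul` here only stands in for such a routine.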
Bruno, Andrew E.; Ruby, Amanda M.; Luft, Joseph R.; Grant, Thomas D.; Seetharaman, Jayaraman; Montelione, Gaetano T.; Hunt, John F.; Snell, Edward H.
2014-01-01
Many bioscience fields employ high-throughput methods to screen multiple biochemical conditions. The analysis of these becomes tedious without a degree of automation. Crystallization, a rate limiting step in biological X-ray crystallography, is one of these fields. Screening of multiple potential crystallization conditions (cocktails) is the most effective method of probing a protein's phase diagram and guiding crystallization but the interpretation of results can be time-consuming. To aid this empirical approach a cocktail distance coefficient was developed to quantitatively compare macromolecule crystallization conditions and outcome. These coefficients were evaluated against an existing similarity metric developed for crystallization, the C6 metric, using both virtual crystallization screens and by comparison of two related 1,536-cocktail high-throughput crystallization screens. Hierarchical clustering was employed to visualize one of these screens and the crystallization results from an exopolyphosphatase-related protein from Bacteroides fragilis (BfR192) overlaid on this clustering. This demonstrated a strong correlation between certain chemically related clusters and crystal lead conditions. While this analysis was not used to guide the initial crystallization optimization, it led to the re-evaluation of unexplained peaks in the electron density map of the protein and to the insertion and correct placement of sodium, potassium and phosphate atoms in the structure. With these in place, the resulting structure of the putative active site demonstrated features consistent with active sites of other phosphatases which are involved in binding the phosphoryl moieties of nucleotide triphosphates. The new distance coefficient, CDcoeff, appears to be robust in this application, and coupled with hierarchical clustering and the overlay of crystallization outcome, reveals information of biological relevance.
While tested with a single example, the potential applications related to crystallography appear promising and the distance coefficient, clustering, and hierarchical visualization of results undoubtedly have applications in wider fields. PMID:24971458
X-ray and EPR Characterization of the Auxiliary Fe-S Clusters in the Radical SAM Enzyme PqqE.
Barr, Ian; Stich, Troy A; Gizzi, Anthony S; Grove, Tyler L; Bonanno, Jeffrey B; Latham, John A; Chung, Tyler; Wilmot, Carrie M; Britt, R David; Almo, Steven C; Klinman, Judith P
2018-02-27
The Radical SAM (RS) enzyme PqqE catalyzes the first step in the biosynthesis of the bacterial cofactor pyrroloquinoline quinone, forming a new carbon-carbon bond between two side chains within the ribosomally synthesized peptide substrate PqqA. In addition to the active site RS 4Fe-4S cluster, PqqE is predicted to have two auxiliary Fe-S clusters, like the other members of the SPASM domain family. Here we identify these sites and examine their structure using a combination of X-ray crystallography and Mössbauer and electron paramagnetic resonance (EPR) spectroscopies. X-ray crystallography allows us to identify the ligands to each of the two auxiliary clusters at the C-terminal region of the protein. The auxiliary cluster nearest the RS site (AuxI) is in the form of a 2Fe-2S cluster ligated by four cysteines, an Fe-S center not seen previously in other SPASM domain proteins; this assignment is further supported by Mössbauer and EPR spectroscopies. The second, more remote cluster (AuxII) is a 4Fe-4S center that is ligated by three cysteine residues and one aspartate residue. In addition, we examined the roles these ligands play in catalysis by the RS and AuxII clusters using site-directed mutagenesis coupled with EPR spectroscopy. Lastly, we discuss the possible functional consequences that these unique AuxI and AuxII clusters may have in catalysis for PqqE and how these may extend to additional RS enzymes catalyzing the post-translational modification of ribosomally encoded peptides.
NASA Astrophysics Data System (ADS)
Bustamam, A.; Ulul, E. D.; Hura, H. F. A.; Siswantining, T.
2017-07-01
Hierarchical clustering is one of the effective methods for creating a phylogenetic tree based on the distance matrix between DNA (deoxyribonucleic acid) sequences. One of the well-known methods to calculate the distance matrix is the k-mer method. Generally, k-mer is more efficient than some other distance matrix calculation techniques. The k-mer method starts by creating a k-mer sparse matrix, followed by creating k-mer singular value vectors. The last step is computing the distance amongst the vectors. In this paper, we analyze the sequences of MERS-CoV (Middle East Respiratory Syndrome - Coronavirus) DNA by implementing hierarchical clustering using the k-mer sparse matrix in order to perform the phylogenetic analysis. Our results show that the ancestor of our MERS-CoV comes from Egypt. Moreover, we found that the MERS-CoV infection that occurs in one country may not necessarily come from the same country of origin. This suggests that the process of MERS-CoV mutation might not only be influenced by geographical factors.
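As a rough illustration of the k-mer idea, the sketch below builds k-mer count vectors and a pairwise Euclidean distance matrix. It is a simplification under stated assumptions: the singular-value step mentioned in the abstract is omitted, the choice of Euclidean distance is an assumption, and the names `kmer_vector`, `euclidean` and `distance_matrix`, as well as the toy sequences, are hypothetical.

```python
from collections import Counter
from itertools import product
from math import sqrt


def kmer_vector(seq, k=2, alphabet="ACGT"):
    """Count every k-mer of `seq`, in a fixed order over all |alphabet|**k words."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return [counts["".join(word)] for word in product(alphabet, repeat=k)]


def euclidean(u, v):
    """Euclidean distance between two equal-length count vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def distance_matrix(seqs, k=2):
    """Symmetric matrix of pairwise distances between k-mer count vectors."""
    vectors = [kmer_vector(s, k) for s in seqs]
    return [[euclidean(u, v) for v in vectors] for u in vectors]


# Toy sequences (not real MERS-CoV data): the first two differ by one base.
sequences = ["ACGTACGT", "ACGTACGA", "TTTTTTTT"]
D = distance_matrix(sequences, k=2)
```

A matrix such as `D` is what a hierarchical clustering routine would then consume to build the tree.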
Castro-González, Maribeb; Braker, Gesche; Farías, Laura; Ulloa, Osvaldo
2005-09-01
The major sites of water column denitrification in the ocean are oxygen minimum zones (OMZ), such as one in the eastern South Pacific (ESP). To understand the structure of denitrifying communities in the OMZ off Chile, denitrifier communities at two sites in the Chilean OMZ (Antofagasta and Iquique) and at different water depths were explored by terminal restriction fragment length polymorphism analysis and cloning of polymerase chain reaction (PCR)-amplified nirS genes. NirS is a functional marker gene for denitrification encoding cytochrome cd1-containing nitrite reductase, which catalyses the reduction of nitrite to nitric oxide, the key step in denitrification. Major differences were found between communities from the two geographic locations. Shifts in community structure occurred along a biogeochemical gradient at Antofagasta. Canonical correspondence analysis indicated that O2, NO3-, NO2- and depth were important environmental factors governing these communities along the biogeochemical gradient in the water column. Phylogenetic analysis grouped the majority of clones from the ESP in distinct clusters of genes from presumably novel and as yet uncultivated denitrifiers. These nirS clusters were distantly related to those found in the water column of the Arabian Sea but the phylogenetic distance was even higher compared with environmental sequences from marine sediments or any other habitat. This finding suggests similar environmental conditions trigger the development of denitrifiers with related nirS genotypes despite large geographic distances.
Numerical Analysis of Base Flowfield for a Four-Engine Clustered Nozzle Configuration
NASA Technical Reports Server (NTRS)
Wang, Ten-See
1995-01-01
Excessive base heating has been a problem for many launch vehicles. For certain designs such as the direct dump of turbine exhaust inside and at the lip of the nozzle, the potential burning of the turbine exhaust in the base region can be of great concern. Accurate prediction of the base environment at altitudes is therefore very important during the vehicle design phase. Otherwise, undesirable consequences may occur. In this study, the turbulent base flowfield of a cold flow experimental investigation for a four-engine clustered nozzle was numerically benchmarked using a pressure-based computational fluid dynamics (CFD) method. This is a necessary step before the benchmarking of hot flow and combustion flow tests can be considered. Since the medium was unheated air, reasonable prediction of the base pressure distribution at high altitude was the main goal. Several physical phenomena pertaining to the multiengine clustered nozzle base flow physics were deduced from the analysis.
The two-step assemblies of basic-amino-acid-rich peptide with a highly charged polyoxometalate.
Zhang, Teng; Li, Hong-Wei; Wu, Yuqing; Wang, Yizhan; Wu, Lixin
2015-06-15
Two-step assembly of a peptide from HPV16 L1 with a highly charged europium-substituted polyoxometalate (POM) cluster, accompanying a great luminescence enhancement of the inorganic polyanions, is reported. The mechanism is discussed in detail by analyzing the thermodynamic parameters from isothermal titration calorimetry (ITC), time-resolved fluorescent and NMR spectra. By comparing the actions of the peptide analogues, a binding process and model are proposed accordingly. The driving forces in each binding step are clarified, and the initial POM aggregation, basic-sequence and hydrophobic C termini of peptide are revealed to contribute essentially to the two-step assembly. The present study demonstrates both a meaningful preparation for bioinorganic materials and a strategy using POMs to modulate the assembly of peptides and even proteins, which could be extended to other proteins and/or viruses by using peptides and POMs with similar properties. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Chidori, Kazuhiro; Yamamoto, Yuji
2017-01-01
The aim of this study was to evaluate the effects of the lateral amplitude and regularity of upper body fluctuation on step time variability. Return map analysis was used to clarify the relationship between step time variability and a history of falling. Eleven healthy, community-dwelling older adults and twelve younger adults participated in the study. All of the subjects walked 25 m at a comfortable speed. Trunk acceleration was measured using triaxial accelerometers attached to the third lumbar vertebrae (L3) and the seventh cervical vertebrae (C7). The normalized average magnitude of acceleration, the coefficient of determination ($R^2$) of the return map, and the step time variabilities, were calculated. Cluster analysis using the average fluctuation and the regularity of C7 fluctuation identified four walking patterns in the mediolateral (ML) direction. The participants with higher fluctuation and lower regularity showed significantly greater step time variability compared with the others. Additionally, elderly participants who had fallen in the past year had higher amplitude and a lower regularity of fluctuation during walking. In conclusion, by focusing on the time evolution of each step, it is possible to understand the cause of stride and/or step time variability that is associated with a risk of falls. PMID:28700633
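A return map plots each value of a series against the next one, and its coefficient of determination summarizes how regular the series is. The sketch below assumes a straight-line fit of x[n+1] on x[n], in which case $R^2$ equals the squared Pearson correlation between the two; the abstract does not state the fitted model, so this is an assumption, and the function name `return_map_r2` and the sample values are hypothetical.

```python
def return_map_r2(series):
    """R^2 of a straight-line fit of series[n+1] against series[n]."""
    x, y = series[:-1], series[1:]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0  # a constant series has no variance to explain
    # For simple linear regression, R^2 is the squared correlation.
    return (sxy * sxy) / (sxx * syy)


# Hypothetical signals: a strictly alternating one and an irregular one.
regular = [1.0, 2.0, 1.0, 2.0, 1.0]
irregular = [0.50, 0.52, 0.49, 0.55, 0.47, 0.51]
r2_regular = return_map_r2(regular)
r2_irregular = return_map_r2(irregular)
```

A value near 1 indicates that each sample is well predicted by the previous one, which is the sense of "regularity" used here.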
Genome-scale cluster analysis of replicated microarrays using shrinkage correlation coefficient.
Yao, Jianchao; Chang, Chunqi; Salmi, Mari L; Hung, Yeung Sam; Loraine, Ann; Roux, Stanley J
2008-06-18
Currently, clustering with some form of correlation coefficient as the gene similarity metric has become a popular method for profiling genomic data. The Pearson correlation coefficient and the standard deviation (SD)-weighted correlation coefficient are the two most widely-used correlations as the similarity metrics in clustering microarray data. However, these two correlations are not optimal for analyzing replicated microarray data generated by most laboratories. An effective correlation coefficient is needed to provide statistically sufficient analysis of replicated microarray data. In this study, we describe a novel correlation coefficient, shrinkage correlation coefficient (SCC), that fully exploits the similarity between the replicated microarray experimental samples. The methodology considers both the number of replicates and the variance within each experimental group in clustering expression data, and provides a robust statistical estimation of the error of replicated microarray data. The value of SCC is revealed by its comparison with two other correlation coefficients that are currently the most widely-used (Pearson correlation coefficient and SD-weighted correlation coefficient) using statistical measures on both synthetic expression data as well as real gene expression data from Saccharomyces cerevisiae. Two leading clustering methods, hierarchical and k-means clustering were applied for the comparison. The comparison indicated that using SCC achieves better clustering performance. Applying SCC-based hierarchical clustering to the replicated microarray data obtained from germinating spores of the fern Ceratopteris richardii, we discovered two clusters of genes with shared expression patterns during spore germination. Functional analysis suggested that some of the genetic mechanisms that control germination in such diverse plant lineages as mosses and angiosperms are also conserved among ferns. 
This study shows that SCC is an alternative to the Pearson correlation coefficient and the SD-weighted correlation coefficient, and is particularly useful for clustering replicated microarray data. This computational approach should be generally useful for proteomic data or other high-throughput analysis methodology.
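For orientation, the Pearson baseline that SCC is compared against can be written in a few lines. The sketch shows only the standard Pearson correlation and the common conversion to a dissimilarity, 1 - r, for use in clustering; it does not implement SCC, whose formula is not given in the abstract. The names `pearson` and `correlation_distance` and the toy profiles are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_distance(x, y):
    """Dissimilarity in [0, 2]: 0 for perfectly correlated profiles."""
    return 1.0 - pearson(x, y)


# Toy expression profiles across three conditions.
gene_a = [1.0, 2.0, 3.0]
gene_b = [2.0, 4.0, 6.0]   # same shape as gene_a, different scale
gene_c = [3.0, 2.0, 1.0]   # opposite trend
```

Because correlation ignores scale, `gene_a` and `gene_b` are treated as identical in shape, which is why correlation-based metrics are popular for expression profiles.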
Ramón, M; Martínez-Pastor, F
2018-04-23
Computer-aided sperm analysis (CASA) produces a wealth of data that is frequently ignored. The use of multiparametric statistical methods can help explore these datasets, unveiling the subpopulation structure of sperm samples. In this review we analyse the significance of the internal heterogeneity of sperm samples and its relevance. We also provide a brief description of the statistical tools used for extracting sperm subpopulations from the datasets, namely unsupervised clustering (with non-hierarchical, hierarchical and two-step methods) and the most advanced supervised methods, based on machine learning. The former method has allowed exploration of subpopulation patterns in many species, whereas the latter offers further possibilities, especially considering functional studies and the practical use of subpopulation analysis. We also consider novel approaches, such as the use of geometric morphometrics or imaging flow cytometry. Finally, although the data provided by CASA systems provides valuable information on sperm samples by applying clustering analyses, there are several caveats. Protocols for capturing and analysing motility or morphometry should be standardised and adapted to each experiment, and the algorithms should be open in order to allow comparison of results between laboratories. Moreover, we must be aware of new technology that could change the paradigm for studying sperm motility and morphology.
Life analysis of multiroller planetary traction drive
NASA Technical Reports Server (NTRS)
Coy, J. J.; Rohn, D. A.; Loewenthal, S. H.
1981-01-01
A contact fatigue life analysis was performed for a constant ratio, Nasvytis Multiroller Traction Drive. The analysis was based on the Lundberg-Palmgren method for rolling element bearing life prediction. Life adjustment factors for materials, processing, lubrication and traction were included. The 14.7 to 1 ratio drive consisted of a single stage planetary configuration with two rows of stepped planet rollers of five rollers per row, having a roller cluster diameter of approximately 0.21 m, a width of 0.06 m and a weight of 9 kg. Drive system 10 percent life ranged from 18,800 hours at 16.6 kW (22.2 hp) and 25,000 rpm sun roller speed, to 305 hours at maximum operating conditions of 149 kW (200 hp) and 75,000 rpm sun roller speed. The effects of roller diameter and roller center location on life were determined. It was found that an optimum life geometry exists.
Stressful jobs and non-stressful jobs: a cluster analysis of office jobs.
Carayon, P
1994-02-01
The purpose of the study was to determine if office jobs could be characterized by a small number of combinations of stressors that could be related to job-title information and self-report of psychological strain. Two-hundred-and-sixty-two office workers from three public service organizations provided data on nine job stressors and seven indicators of psychological strain. Using cluster analysis on the nine stressors, office jobs were classified into three clusters. The first cluster included jobs with high skill utilization, task clarity, job control and social support and low future ambiguity, but also high on job demands such as quantitative work-load, attention and work pressure. The second cluster included jobs with high demands and future ambiguity and low skill utilization, task clarity, job control and social support. The third cluster was intermediary between the first two clusters. The three clusters were related to job-title information. The second cluster was the highest on a range of psychological strain indicators, while the other two clusters were high on certain strain indicators but low on others. The study showed that office jobs could be characterized by a small number of combinations of stressors that were related to job-title information and psychological strain.
Variable Screening for Cluster Analysis.
ERIC Educational Resources Information Center
Donoghue, John R.
Inclusion of irrelevant variables in a cluster analysis adversely affects subgroup recovery. This paper examines using moment-based statistics to screen variables; only variables that pass the screening are then used in clustering. Normal mixtures are analytically shown often to possess negative kurtosis. Two related measures, "m" and…
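The kurtosis observation can be checked numerically with a short sketch. It computes the usual moment-based excess kurtosis, m4 / m2^2 - 3, for one variable; it does not reproduce the paper's own screening measures, which are cut off in the abstract above. The function name `excess_kurtosis` and the toy data are hypothetical.

```python
def excess_kurtosis(values):
    """Moment-based excess kurtosis: m4 / m2**2 - 3 (0 for a normal distribution)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / (m2 * m2) - 3.0


# An extreme two-group variable: half the cases at -1, half at +1.
two_groups = [-1.0] * 50 + [1.0] * 50
# A single group with two outliers, which gives heavy tails instead.
one_group_with_outliers = [0.0] * 98 + [-10.0, 10.0]

k_two_groups = excess_kurtosis(two_groups)
k_outliers = excess_kurtosis(one_group_with_outliers)
```

The balanced two-group variable has strongly negative excess kurtosis, while the outlier-driven variable is strongly positive, which is the contrast a kurtosis-based screen would rely on.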
3D reconstruction from non-uniform point clouds via local hierarchical clustering
NASA Astrophysics Data System (ADS)
Yang, Jiaqi; Li, Ruibo; Xiao, Yang; Cao, Zhiguo
2017-07-01
Raw scanned 3D point clouds are usually irregularly distributed due to the essential shortcomings of laser sensors, which therefore poses a great challenge for high-quality 3D surface reconstruction. This paper tackles this problem by proposing a local hierarchical clustering (LHC) method to improve the consistency of point distribution. Specifically, LHC consists of two steps: 1) adaptive octree-based decomposition of 3D space, and 2) hierarchical clustering. The former aims at reducing the computational complexity and the latter transforms the non-uniform point set into a uniform one. Experimental results on real-world scanned point clouds validate the effectiveness of our method from both qualitative and quantitative aspects.
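The first of the two steps, adaptive octree decomposition, might look roughly like the sketch below. It is an illustration under assumptions, not the LHC method itself: the stopping rule (a cell is split until it holds at most `max_points` points or `max_depth` is reached), the unit-cube root cell, and the name `octree_leaves` are all hypothetical, and the subsequent hierarchical clustering inside each cell is not shown.

```python
def octree_leaves(points, center=(0.5, 0.5, 0.5), half=0.5,
                  max_points=4, max_depth=8, depth=0):
    """Return the non-empty leaf cells (lists of points) of an adaptive octree."""
    if len(points) <= max_points or depth == max_depth:
        return [points] if points else []
    # Assign each point to one of the eight octants around the cell center.
    octants = {}
    for p in points:
        key = tuple(p[axis] >= center[axis] for axis in range(3))
        octants.setdefault(key, []).append(p)
    leaves = []
    quarter = half / 2.0
    for key, members in octants.items():
        child_center = tuple(
            center[axis] + (quarter if key[axis] else -quarter)
            for axis in range(3)
        )
        # Dense regions keep splitting; sparse regions stop early.
        leaves.extend(octree_leaves(members, child_center, quarter,
                                    max_points, max_depth, depth + 1))
    return leaves


# Toy cloud in the unit cube: a dense streak near the origin plus two
# isolated points, mimicking a non-uniform scan.
dense = [(0.01 * i, 0.01 * i, 0.01 * i) for i in range(1, 17)]
sparse = [(0.9, 0.9, 0.9), (0.9, 0.1, 0.9)]
cells = octree_leaves(dense + sparse)
```

Because the subdivision is adaptive, the dense streak ends up in several small cells while each isolated point sits alone in a large one, so any per-cell processing works on a bounded number of points.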
Alivisatos, A. Paul; Colvin, Vicki L.
1998-01-01
Methods are described for attaching semiconductor nanocrystals to solid inorganic surfaces, using self-assembled bifunctional organic monolayers as bridge compounds. Two different techniques are presented. One relies on the formation of self-assembled monolayers on these surfaces. When exposed to solutions of nanocrystals, these bridge compounds bind the crystals and anchor them to the surface. The second technique attaches nanocrystals already coated with bridge compounds to the surfaces. Analyses indicate the presence of quantum confined clusters on the surfaces at the nanolayer level. These materials allow electron spectroscopies to be completed on condensed phase clusters, and represent a first step towards synthesis of an organized assembly of clusters. These new products are also disclosed.
NASA Astrophysics Data System (ADS)
Soltanian-Zadeh, Hamid; Windham, Joe P.; Peck, Donald J.
1997-04-01
This paper presents development and performance evaluation of an MRI feature space method. The method is useful for: identification of tissue types; segmentation of tissues; and quantitative measurements on tissues, to obtain information that can be used in decision making (diagnosis, treatment planning, and evaluation of treatment). The steps of the work accomplished are as follows: (1) Four T2-weighted and two T1-weighted images (before and after injection of Gadolinium) were acquired for ten tumor patients. (2) Images were analyzed by two image analysts according to the following algorithm. The intracranial brain tissues were segmented from the scalp and background. The additive noise was suppressed using a multi-dimensional non-linear edge-preserving filter which preserves partial volume information on average. Image nonuniformities were corrected using a modified lowpass filtering approach. The resulting images were used to generate and visualize an optimal feature space. Cluster centers were identified on the feature space. Then images were segmented into normal tissues and different zones of the tumor. (3) Biopsy samples were extracted from each patient and were subsequently analyzed by the pathology laboratory. (4) Image analysis results were compared to each other and to the biopsy results. Pre- and post-surgery feature spaces were also compared. The proposed algorithm made it possible to visualize the MRI feature space and to segment the image. In all cases, the operators were able to find clusters for normal and abnormal tissues. Also, clusters for different zones of the tumor were found. Based on the clusters marked for each zone, the method successfully segmented the image into normal tissues (white matter, gray matter, and CSF) and different zones of the lesion (tumor, cyst, edema, radiation necrosis, necrotic core, and infiltrated tumor). The results agreed with those obtained from the biopsy samples.
Comparison of pre- to post-surgery and radiation feature spaces confirmed that the tumor was not present in the second study but radiation necrosis was generated as a result of radiation.
Mitchell, Kimberly J; Finkelhor, David; Becker-Blease, Kathryn A
2007-06-01
This article utilizes data from clinical reports of 929 adults to examine whether various problematic Internet experiences are distinctly different from or extensions of conventional problems. A TwoStep Cluster Analysis identified three mutually exclusive groups of adults, those with (1) online relationship problems and victimization; (2) online and offline problems; and (3) marital discord. Results suggest some initial support for the idea that problematic Internet experiences are often extensions of experiences and behaviors that pre-date the Internet. However, the Internet may be introducing some qualitatively new dimensions (such as an increased severity, an increased frequency, or unique dynamics) that require new responses or interventions.
Belevich, Nikolai P; Bertsova, Yulia V; Verkhovskaya, Marina L; Baykov, Alexander A; Bogachev, Alexander V
2016-02-01
Bacterial Na(+)-translocating NADH:quinone oxidoreductase (Na(+)-NQR) uses a unique set of prosthetic redox groups (two covalently bound FMN residues, a [2Fe-2S] cluster, FAD, riboflavin and a Cys4[Fe] center) to catalyze electron transfer from NADH to ubiquinone in a reaction coupled with Na(+) translocation across the membrane. Here we used an ultra-fast microfluidic stopped-flow instrument to determine rate constants and the difference spectra for the six consecutive reaction steps of Vibrio harveyi Na(+)-NQR reduction by NADH. The instrument, with a dead time of 0.25 ms and optical path length of 1 cm, allowed collection of visible spectra in 50-μs intervals. By comparing the spectra of reaction steps with the spectra of known redox transitions of individual enzyme cofactors, we were able to identify the chemical nature of most intermediates and the sequence of electron transfer events. A previously unknown spectral transition was detected and assigned to the Cys4[Fe] center reduction. Electron transfer from the [2Fe-2S] cluster to the Cys4[Fe] center and all subsequent steps were markedly accelerated when Na(+) concentration was increased from 20 μM to 25 mM, suggesting coupling of the former step with tight Na(+) binding to or occlusion by the enzyme. An alternating access mechanism was proposed to explain electron transfer between subunits NqrF and NqrC. According to the proposed mechanism, the Cys4[Fe] center is alternatively exposed to either side of the membrane, allowing the [2Fe-2S] cluster of NqrF and the FMN residue of NqrC to alternatively approach the Cys4[Fe] center from different sides of the membrane. Copyright © 2015 Elsevier B.V. All rights reserved.
Kristunas, Caroline A; Hemming, Karla; Eborall, Helen C; Gray, Laura J
2017-01-01
Introduction: The stepped-wedge cluster randomised trial (SW-CRT) is a complex design, for which many decisions about key design parameters must be made during the planning. These include the number of steps and the duration of time needed to embed the intervention. Feasibility studies are likely to be useful for informing these decisions and increasing the likelihood of the main trial's success. However, the number of feasibility studies being conducted for SW-CRTs is currently unknown. This review aims to establish the number of feasibility studies being conducted for SW-CRTs and determine which feasibility issues are commonly investigated. Methods and analysis: Fully published feasibility studies for SW-CRTs will be identified, according to predefined inclusion criteria, from searches conducted in Ovid MEDLINE, Scopus, Embase and PsycINFO. To also identify and gain information on unpublished feasibility studies the following will be contacted: authors of published SW-CRTs (identified from the most recent systematic reviews); contacts for registered SW-CRTs (identified from clinical trials registries); lead statisticians of UK registered clinical trials units and researchers known to work in the area of SW-CRTs. Data extraction will be conducted independently by two reviewers. For the fully published feasibility studies, data will be extracted on the study characteristics, the rationale for the study, the process for determining progression to a main trial, how the study informed the main trial and whether the main trial went ahead. The researchers involved in the unpublished feasibility studies will be contacted to elicit the same information. A narrative synthesis will be conducted and provided alongside a descriptive analysis of the study characteristics. Ethics and dissemination: This review does not require ethical approval, as no individual patient data will be used. The results of this review will be published in an open-access peer-reviewed journal.
PMID:28765139
Jabson, Jennifer M.; Bowen, Deborah; Weinberg, Janice; Kroenke, Candyce; Luo, Juhua; Messina, Catherine; Shumaker, Sally; Tindle, Hilary A.
2016-01-01
BACKGROUND: Strategies for identifying the most relevant psychosocial predictors in studies of racial/ethnic minority women’s health are limited because they largely exclude cultural influences and they assume that psychosocial predictors are independent. This paper proposes and tests an empirical solution. METHODS: Hierarchical cluster analysis, conducted with data from 140,652 Women’s Health Initiative participants, identified clusters among individual psychosocial predictors. Multivariable analyses tested associations between clusters and health outcomes. RESULTS: A Social Cluster and a Stress Cluster were identified. The Social Cluster was positively associated with well-being and inversely associated with chronic disease index, and the Stress Cluster was inversely associated with well-being and positively associated with chronic disease index. As hypothesized, the magnitude of association between clusters and outcomes differed by race/ethnicity. CONCLUSIONS: By identifying psychosocial clusters and their associations with health, we have taken an important step toward understanding how individual psychosocial predictors interrelate and how empirically formed Stress and Social clusters relate to health outcomes. This study has also demonstrated important insight about differences in associations between these psychosocial clusters and health among racial/ethnic minorities. These differences could signal the best pathways for intervention modification and tailoring. PMID:27279761
Ugulu, Ilker; Aydin, Halil
2016-01-01
We propose an approach to clustering and visualization of students' cognitive structural models. We use the self-organizing map (SOM) combined with Ward's clustering to conduct cluster analysis. In the study carried out on 100 subjects, a conceptual understanding test consisting of open-ended questions was used as a data collection tool. The results of analyses indicated that students constructed the aliveness concept by associating it predominantly with human. Motion appeared as the most frequently associated term with the aliveness concept. The results suggest that the aliveness concept has been constructed using anthropocentric and animistic cognitive structures. In the next step, we used the data obtained from the conceptual understanding test for training the SOM. Consequently, we propose a visualization method about cognitive structure of the aliveness concept. PMID:26819579
Review of Recent Methodological Developments in Group-Randomized Trials: Part 1—Design
Li, Fan; Gallis, John A.; Prague, Melanie; Murray, David M.
2017-01-01
In 2004, Murray et al. reviewed methodological developments in the design and analysis of group-randomized trials (GRTs). We have highlighted the developments of the past 13 years in design with a companion article to focus on developments in analysis. As a pair, these articles update the 2004 review. We have discussed developments in the topics of the earlier review (e.g., clustering, matching, and individually randomized group-treatment trials) and in new topics, including constrained randomization and a range of randomized designs that are alternatives to the standard parallel-arm GRT. These include the stepped-wedge GRT, the pseudocluster randomized trial, and the network-randomized GRT, which, like the parallel-arm GRT, require clustering to be accounted for in both their design and analysis. PMID:28426295
Review of Recent Methodological Developments in Group-Randomized Trials: Part 1-Design.
Turner, Elizabeth L; Li, Fan; Gallis, John A; Prague, Melanie; Murray, David M
2017-06-01
In 2004, Murray et al. reviewed methodological developments in the design and analysis of group-randomized trials (GRTs). We have highlighted the developments of the past 13 years in design with a companion article to focus on developments in analysis. As a pair, these articles update the 2004 review. We have discussed developments in the topics of the earlier review (e.g., clustering, matching, and individually randomized group-treatment trials) and in new topics, including constrained randomization and a range of randomized designs that are alternatives to the standard parallel-arm GRT. These include the stepped-wedge GRT, the pseudocluster randomized trial, and the network-randomized GRT, which, like the parallel-arm GRT, require clustering to be accounted for in both their design and analysis.
NeAT: a toolbox for the analysis of biological networks, clusters, classes and pathways.
Brohée, Sylvain; Faust, Karoline; Lima-Mendez, Gipsi; Sand, Olivier; Janky, Rekin's; Vanderstocken, Gilles; Deville, Yves; van Helden, Jacques
2008-07-01
The network analysis tools (NeAT) (http://rsat.ulb.ac.be/neat/) provide user-friendly web access to a collection of modular tools for the analysis of networks (graphs) and clusters (e.g. microarray clusters, functional classes, etc.). A first set of tools supports basic operations on graphs (comparison between two graphs, neighborhood of a set of input nodes, path finding and graph randomization). Another set of programs makes the connection between networks and clusters (graph-based clustering, clique discovery and mapping of clusters onto a network). The toolbox also includes programs for detecting significant intersections between clusters/classes (e.g. clusters of co-expression versus functional classes of genes). NeAT are designed to cope with large datasets and provide a flexible toolbox for analyzing biological networks stored in various databases (protein interactions, regulation and metabolism) or obtained from high-throughput experiments (two-hybrid, mass-spectrometry and microarrays). The web interface interconnects the programs in predefined analysis flows, enabling users to address a series of questions about networks of interest. Each tool can also be used separately by entering custom data for a specific analysis. NeAT can also be used as web services (SOAP/WSDL interface), in order to design programmatic workflows and integrate them with other available resources.
Xu, Feng; Miras, Haralampos N.; Scullion, Rachel A.; Long, De-Liang; Thiel, Johannes; Cronin, Leroy
2012-01-01
Molecular self-assembly has often been suggested as the ultimate route for the bottom-up construction of building blocks atom-by-atom for functional nanotechnology, yet structural design or prediction of nanomolecular assemblies is still far from reach. Whereas nature uses complex machinery such as the ribosome, chemists use painstakingly engineered step-by-step approaches to build complex molecules but the size and complexity of such molecules, not to mention the accessible yields, can be limited. Herein we present the discovery of a palladium oxometalate {Pd84}-ring cluster 3.3 nm in diameter; [Pd84O42(OAc)28(PO4)42]70- ({Pd84} ≡ {Pd12}7) that is formed in water just by mixing two reagents at room temperature, giving crystals of the compound in just a few days. The structure of the {Pd84}-ring has sevenfold symmetry, comprises 196 building blocks, and we also show, using mass spectrometry, that a large library of other related nanostructures is present in solution. Finally, by analysis of the symmetry and the building block library that construct the {Pd84} we show that the correlation of the symmetry, subunit number, and overall cluster nuclearity can be used as a “Rosetta Stone” to rationalize the “magic numbers” defining a number of other systems. This is because the discovery of {Pd84} allows the relationship between seemingly unrelated families of molecular inorganic nanosystems to be decoded from the overall cluster magic-number nuclearity, to the symmetry and building blocks that define such structures allowing the prediction of other members of these nanocluster families. PMID:22753516
Sample size calculations for stepped wedge and cluster randomised trials: a unified approach
Hemming, Karla; Taljaard, Monica
2016-01-01
Objectives To clarify and illustrate sample size calculations for the cross-sectional stepped wedge cluster randomized trial (SW-CRT) and to present a simple approach for comparing the efficiencies of competing designs within a unified framework. Study Design and Setting We summarize design effects for the SW-CRT, the parallel cluster randomized trial (CRT), and the parallel cluster randomized trial with before and after observations (CRT-BA), assuming cross-sectional samples are selected over time. We present new formulas that enable trialists to determine the required cluster size for a given number of clusters. We illustrate by example how to implement the presented design effects and give practical guidance on the design of stepped wedge studies. Results For a fixed total cluster size, the choice of study design that provides the greatest power depends on the intracluster correlation coefficient (ICC) and the cluster size. When the ICC is small, the CRT tends to be more efficient; when the ICC is large, the SW-CRT tends to be more efficient and can serve as an alternative design when the CRT is an infeasible design. Conclusion Our unified approach allows trialists to easily compare the efficiencies of three competing designs to inform the decision about the most efficient design in a given scenario. PMID:26344808
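As a rough illustration of what a design effect does in the parallel CRT case, the sketch below uses the standard inflation factor 1 + (m - 1) x ICC for equal cluster sizes. The function names and the numbers are illustrative assumptions; the SW-CRT and CRT-BA design effects presented in the paper are not reproduced here.

```python
import math


def design_effect(cluster_size: int, icc: float) -> float:
    """Standard design effect for a parallel CRT with equal cluster sizes."""
    return 1.0 + (cluster_size - 1) * icc


def clusters_per_arm(n_individual: int, cluster_size: int, icc: float) -> int:
    """Clusters per arm after inflating an individually randomised sample size.

    n_individual is the per-arm sample size that would be needed if
    individuals, rather than clusters, were randomised (hypothetical input).
    """
    inflated = n_individual * design_effect(cluster_size, icc)
    return math.ceil(inflated / cluster_size)


# Illustrative values only: 100 per arm under individual randomisation,
# 20 participants per cluster, ICC of 0.05.
example_de = design_effect(cluster_size=20, icc=0.05)
example_k = clusters_per_arm(n_individual=100, cluster_size=20, icc=0.05)
```

With these made-up inputs the inflation factor is about 1.95, so roughly twice as many participants are needed as under individual randomisation.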
Sathish, Thirunavukkarasu; Williams, Emily D; Pasricha, Naanki; Absetz, Pilvikki; Lorgelly, Paula; Wolfe, Rory; Mathews, Elezebeth; Aziz, Zahra; Thankappan, Kavumpurathu Raman; Zimmet, Paul; Fisher, Edwin; Tapp, Robyn; Hollingsworth, Bruce; Mahal, Ajay; Shaw, Jonathan; Jolley, Damien; Daivadanam, Meena; Oldenburg, Brian
2013-11-04
India currently has more than 60 million people with Type 2 Diabetes Mellitus (T2DM) and this is predicted to increase by nearly two-thirds by 2030. While management of those with T2DM is important, preventing or delaying the onset of the disease, especially in those individuals at 'high risk' of developing T2DM, is urgently needed, particularly in resource-constrained settings. This paper describes the protocol for a cluster randomised controlled trial of a peer-led lifestyle intervention program to prevent diabetes in Kerala, India. A total of 60 polling booths are randomised to the intervention arm or control arm in rural Kerala, India. Data collection is conducted in two steps. Step 1 (Home screening): Participants aged 30-60 years are administered a screening questionnaire. Those having no history of T2DM and other chronic illnesses with an Indian Diabetes Risk Score value of ≥60 are invited to attend a mobile clinic (Step 2). At the mobile clinic, participants complete questionnaires, undergo physical measurements, and provide blood samples for biochemical analysis. Participants identified with T2DM at Step 2 are excluded from further study participation. Participants in the control arm are provided with a health education booklet containing information on symptoms, complications, and risk factors of T2DM with the recommended levels for primary prevention. Participants in the intervention arm receive: (1) eleven peer-led small group sessions to motivate, guide and support in planning, initiation and maintenance of lifestyle changes; (2) two diabetes prevention education sessions led by experts to raise awareness on T2DM risk factors, prevention and management; (3) a participant handbook containing information primarily on peer support and its role in assisting with lifestyle modification; (4) a participant workbook to guide self-monitoring of lifestyle behaviours, goal setting and goal review; (5) the health education booklet that is given to the control arm. 
Follow-up assessments are conducted at 12 and 24 months. The primary outcome is incidence of T2DM. Secondary outcomes include behavioural, psychosocial, clinical, and biochemical measures. An economic evaluation is planned. Results from this trial will contribute to improved policy and practice regarding lifestyle intervention programs to prevent diabetes in India and other resource-constrained settings. Australia and New Zealand Clinical Trials Registry: ACTRN12611000262909.
Moens, Katrien; Siegert, Richard J; Taylor, Steve; Namisango, Eve; Harding, Richard
2015-01-01
Symptom research across conditions has historically focused on single symptoms, and the burden of multiple symptoms and their interactions has been relatively neglected especially in people living with HIV. Symptom cluster studies are required to set priorities in treatment planning, and to lessen the total symptom burden. This study aimed to identify and compare symptom clusters among people living with HIV attending five palliative care facilities in two sub-Saharan African countries. Data from cross-sectional self-report of seven-day symptom prevalence on the 32-item Memorial Symptom Assessment Scale-Short Form were used. A hierarchical cluster analysis was conducted using Ward's method applying squared Euclidean Distance as the similarity measure to determine the clusters. Contingency tables, χ² tests and ANOVA were used to compare the clusters by patient specific characteristics and distress scores. Among the sample (N=217) the mean age was 36.5 (SD 9.0), 73.2% were female, and 49.1% were on antiretroviral therapy (ART). The cluster analysis produced five symptom clusters identified as: 1) dermatological; 2) generalised anxiety and elimination; 3) social and image; 4) persistently present; and 5) a gastrointestinal-related symptom cluster. The patients in the first three symptom clusters reported the highest physical and psychological distress scores. Patient characteristics varied significantly across the five clusters by functional status (worst functional physical status in cluster one, p<0.001); being on ART (highest proportions for clusters two and three, p=0.012); global distress (F=26.8, p<0.001), physical distress (F=36.3, p<0.001) and psychological distress subscale (F=21.8, p<0.001) (all subscales worst for cluster one, best for cluster four). The greatest burden is associated with cluster one, and should be prioritised in clinical management. 
Further symptom cluster research in people living with HIV with longitudinally collected symptom data to test cluster stability and identify common symptom trajectories is recommended.
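Ward's method with squared Euclidean distance, as named in the abstract above, can be sketched in a few lines. This is a naive, standard-library-only illustration on made-up two-dimensional points, not the software or data used in the study; at each step it merges the pair of clusters whose union increases the within-cluster sum of squares the least.

```python
def centroid(points):
    """Coordinate-wise mean of a list of equal-length tuples."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def ward_cost(cluster_a, cluster_b):
    """Increase in within-cluster sum of squares if the two clusters merge."""
    ca, cb = centroid(cluster_a), centroid(cluster_b)
    sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))  # squared Euclidean
    na, nb = len(cluster_a), len(cluster_b)
    return na * nb / (na + nb) * sq_dist


def ward_clustering(points, n_clusters):
    """Agglomerate singletons until n_clusters remain (naive O(n^3) search)."""
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        pairs = ((i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda ij: ward_cost(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical toy data: two well-separated pairs of points.
toy_points = [(0, 0), (0, 1), (5, 5), (5, 6)]
toy_clusters = ward_clustering(toy_points, n_clusters=2)
```

In practice a library routine would normally be preferred; the explicit loop is only meant to show the merge criterion.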
Loss of balance during balance beam walking elicits a multifocal theta band electrocortical response
Gwin, Joseph T.; Makeig, Scott; Ferris, Daniel P.
2013-01-01
Determining the neural correlates of loss of balance during walking could lead to improved clinical assessment and treatment for individuals predisposed to falls. We used high-density electroencephalography (EEG) combined with independent component analysis (ICA) to study loss of balance during human walking. We examined 26 healthy young subjects performing heel-to-toe walking on a treadmill-mounted balance beam as well as walking on the treadmill belt (both at 0.22 m/s). ICA identified clusters of electrocortical EEG sources located in or near anterior cingulate, anterior parietal, superior dorsolateral-prefrontal, and medial sensorimotor cortex that exhibited significantly larger mean spectral power in the theta band (4–7 Hz) during walking on the balance beam compared with treadmill walking. Left and right sensorimotor cortex clusters produced significantly less power in the beta band (12–30 Hz) during walking on the balance beam compared with treadmill walking. For each source cluster, we also computed a normalized mean time/frequency spectrogram time locked to the gait cycle during loss of balance (i.e., when subjects stepped off the balance beam). All clusters except the medial sensorimotor cluster exhibited a transient increase in theta band power during loss of balance. Cluster spectrograms demonstrated that the first electrocortical indication of impending loss of balance occurred in the left sensorimotor cortex at the transition from single support to double support prior to stepping off the beam. These findings provide new insight into the neural correlates of walking balance control and could aid future studies on elderly individuals and others with balance impairments. PMID:23926037
Sipp, Amy R; Gwin, Joseph T; Makeig, Scott; Ferris, Daniel P
2013-11-01
Determining the neural correlates of loss of balance during walking could lead to improved clinical assessment and treatment for individuals predisposed to falls. We used high-density electroencephalography (EEG) combined with independent component analysis (ICA) to study loss of balance during human walking. We examined 26 healthy young subjects performing heel-to-toe walking on a treadmill-mounted balance beam as well as walking on the treadmill belt (both at 0.22 m/s). ICA identified clusters of electrocortical EEG sources located in or near anterior cingulate, anterior parietal, superior dorsolateral-prefrontal, and medial sensorimotor cortex that exhibited significantly larger mean spectral power in the theta band (4-7 Hz) during walking on the balance beam compared with treadmill walking. Left and right sensorimotor cortex clusters produced significantly less power in the beta band (12-30 Hz) during walking on the balance beam compared with treadmill walking. For each source cluster, we also computed a normalized mean time/frequency spectrogram time locked to the gait cycle during loss of balance (i.e., when subjects stepped off the balance beam). All clusters except the medial sensorimotor cluster exhibited a transient increase in theta band power during loss of balance. Cluster spectrograms demonstrated that the first electrocortical indication of impending loss of balance occurred in the left sensorimotor cortex at the transition from single support to double support prior to stepping off the beam. These findings provide new insight into the neural correlates of walking balance control and could aid future studies on elderly individuals and others with balance impairments.
MPIGeneNet: Parallel Calculation of Gene Co-Expression Networks on Multicore Clusters.
Gonzalez-Dominguez, Jorge; Martin, Maria J
2017-10-10
In this work we present MPIGeneNet, a parallel tool that applies Pearson's correlation and Random Matrix Theory to construct gene co-expression networks. It is based on the state-of-the-art sequential tool RMTGeneNet, which provides networks with high robustness and sensitivity at the expense of relatively long runtimes for large-scale input datasets. MPIGeneNet returns the same results as RMTGeneNet but improves the memory management, reduces the I/O cost, and accelerates the two most computationally demanding steps of co-expression network construction by exploiting the compute capabilities of common multicore CPU clusters. Our performance evaluation on two different systems using three typical input datasets shows that MPIGeneNet is significantly faster than RMTGeneNet. As an example, our tool is up to 175.41 times faster on a cluster with eight nodes, each one containing two 12-core Intel Haswell processors. Source code of MPIGeneNet, as well as a reference manual, are available at https://sourceforge.net/projects/mpigenenet/.
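The correlation step of co-expression network construction can be sketched sequentially as follows. This is not MPIGeneNet's code or interface: the gene names and expression values are invented, and the fixed `threshold` argument is a stand-in assumption for the cut-off that the tool derives via Random Matrix Theory.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (non-constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def coexpression_edges(expr, threshold):
    """Return (gene, gene, r) for every pair with |r| >= threshold.

    expr maps a gene name to its expression profile across samples.
    """
    genes = sorted(expr)
    edges = []
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            r = pearson(expr[g], expr[h])
            if abs(r) >= threshold:
                edges.append((g, h, r))
    return edges


# Hypothetical expression profiles over four samples.
toy_expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [4.0, 1.0, 3.0, 2.0],
}
toy_edges = coexpression_edges(toy_expr, threshold=0.9)
```

The all-pairs loop is quadratic in the number of genes, which is one reason a parallel implementation is attractive for large datasets.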
Cheng, Gong; Lu, Quan; Ma, Ling; Zhang, Guocai; Xu, Liang; Zhou, Zongshan
2017-01-01
Recently, Docker technology has received increasing attention throughout the bioinformatics community. However, its implementation has not yet been mastered by most biologists; accordingly, its application in biological research has been limited. In order to popularize this technology in the field of bioinformatics and to promote the use of publicly available bioinformatics tools, such as Dockerfiles and Images from communities, government sources, and private owners in the Docker Hub Registry and other Docker-based resources, we introduce here a complete and accurate bioinformatics workflow based on Docker. The present workflow enables analysis and visualization of pan-genomes and biosynthetic gene clusters of bacteria. This provides a new solution for bioinformatics mining of big data from various publicly available biological databases. The present step-by-step guide creates an integrative workflow through a Dockerfile to allow researchers to build their own Image and run Container easily.
Cheng, Gong; Zhang, Guocai; Xu, Liang
2017-01-01
Recently, Docker technology has received increasing attention throughout the bioinformatics community. However, its implementation has not yet been mastered by most biologists; accordingly, its application in biological research has been limited. In order to popularize this technology in the field of bioinformatics and to promote the use of publicly available bioinformatics tools, such as Dockerfiles and Images from communities, government sources, and private owners in the Docker Hub Registry and other Docker-based resources, we introduce here a complete and accurate bioinformatics workflow based on Docker. The present workflow enables analysis and visualization of pan-genomes and biosynthetic gene clusters of bacteria. This provides a new solution for bioinformatics mining of big data from various publicly available biological databases. The present step-by-step guide creates an integrative workflow through a Dockerfile to allow researchers to build their own Image and run Container easily. PMID:29204317
Lindström, Miia; Hinderink, Katja; Somervuo, Panu; Kiviniemi, Katri; Nevas, Mari; Chen, Ying; Auvinen, Petri; Carter, Andrew T.; Mason, David R.; Peck, Michael W.; Korkeala, Hannu
2009-01-01
Comparative genomic hybridization analysis of 32 Nordic group I Clostridium botulinum type B strains isolated from various sources revealed two homogeneous clusters, clusters BI and BII. The type B strains differed from reference strain ATCC 3502 by 413 coding sequence (CDS) probes, sharing 88% of all the ATCC 3502 genes represented on the microarray. The two Nordic type B clusters differed from each other by their response to 145 CDS probes related mainly to transport and binding, adaptive mechanisms, fatty acid biosynthesis, the cell membranes, bacteriophages, and transposon-related elements. The most prominent differences between the two clusters were related to resistance to toxic compounds frequently found in the environment, such as arsenic and cadmium, reflecting different adaptive responses in the evolution of the two clusters. Other relatively variable CDS groups were related to surface structures and the gram-positive cell wall, suggesting that the two clusters possess different antigenic properties. All the type B strains carried CDSs putatively related to capsule formation, which may play a role in adaptation to different environmental and clinical niches. Sequencing showed that representative strains of the two type B clusters both carried subtype B2 neurotoxin genes. As many of the type B strains studied have been isolated from foods or associated with botulism, it is expected that the two group I C. botulinum type B clusters present a public health hazard in Nordic countries. Knowing the genetic and physiological markers of these clusters will assist in targeting control measures against these pathogens. PMID:19270141
NASA Astrophysics Data System (ADS)
Ma, Mengli; Lei, En; Meng, Hengling; Wang, Tiantao; Xie, Linyan; Shen, Dong; Xianwang, Zhou; Lu, Bingyue
2017-08-01
Amomum tsao-ko is a commercial plant that is used for various purposes in the medicinal and food industries. For the present investigation, 44 germplasm samples were collected from Jinping County of Yunnan Province. Cluster analysis and 2-dimensional principal component analysis (PCA) were used to represent the genetic relationships among Amomum tsao-ko samples by using simple sequence repeat (SSR) markers. Cluster analysis clearly distinguished the sample groups. Two major clusters were formed: the first (Cluster I) consisted of 34 individuals and the second (Cluster II) consisted of 10 individuals; Cluster I, as the main group, contained multiple sub-clusters. PCA also showed 2 groups: PCA Group 1 included 29 individuals and PCA Group 2 included 12 individuals, consistent with the results of cluster analysis. The purpose of the present investigation was to provide information on the genetic relationships of Amomum tsao-ko germplasm resources in the main producing areas, and to provide a theoretical basis for the protection and utilization of Amomum tsao-ko resources.
Cluster analysis of multiple planetary flow regimes
NASA Technical Reports Server (NTRS)
Mo, Kingtse; Ghil, Michael
1987-01-01
A modified cluster analysis method was developed to identify spatial patterns of planetary flow regimes, and to study transitions between them. This method was applied first to a simple deterministic model and second to Northern Hemisphere (NH) 500 mb data. The dynamical model is governed by the fully-nonlinear, equivalent-barotropic vorticity equation on the sphere. Clusters of points in the model's phase space are associated with either a few persistent or with many transient events. Two stationary clusters have patterns similar to unstable stationary model solutions, zonal, or blocked. Transient clusters of wave trains serve as way stations between the stationary ones. For the NH data, cluster analysis was performed in the subspace of the first seven empirical orthogonal functions (EOFs). Stationary clusters are found in the low-frequency band of more than 10 days, and transient clusters in the bandpass frequency window between 2.5 and 6 days. In the low-frequency band three pairs of clusters determine, respectively, EOFs 1, 2, and 3. They exhibit well-known regional features, such as blocking, the Pacific/North American (PNA) pattern and wave trains. Both model and low-pass data show strong bimodality. Clusters in the bandpass window show wave-train patterns in the two jet exit regions. They are related, as in the model, to transitions between stationary clusters.
Tu, Po-An; Lin, Der-Yuh; Li, Guang-Fu; Huang, Jan-Chi; Wang, De-Chi; Wang, Pei-Hwa
2014-01-01
In recent years, the population size of Taiwan yellow cattle has drastically declined, and the breed has even become endangered. A preservation project, the Taiwan Yellow Cattle Genetic Preservation Project (TYCGPP), was carried out at the Livestock Research Institute (LRI) Hengchun branch (1988-present). An analysis of intra- and inter-population variability was performed as the first step in preserving this precious genetic resource. In this work, a total of 140 individuals selected from the five Taiwan yellow cattle populations were analyzed using 12 microsatellite markers (loci). These markers determined the level of genetic variation within and among populations as well as the phylogenetic structure. The total number of alleles detected (122, 10.28 per locus) and the expected heterozygosity (0.712) indicated that these five populations had a high level of genetic variability. Bayesian cluster analysis showed that the most likely number of groups was 2 (K = 2). Genetic differentiation among clusters was moderate (F_ST = 0.095). The result of AMOVA showed that yellow cattle in Taiwan had maintained a high level of within-population genetic differentiation (91%), the remainder being accounted for by differentiation among subpopulations (4%), and by differentiation among regions (5%). The results of STRUCTURE and principal component analysis (PCA) revealed two divergent clusters. The individual unrooted phylogenetic tree showed that some Kinmen yellow cattle in the Hengchun facility (KMHC individuals) overlapped with the Taiwan yellow cattle (TW) and Taiwan yellow cattle Hengchun (HC) populations. They also overlapped with the Kinmen × Taiwan (KT) and Kinmen yellow cattle (KM) populations. It is possible that KMHC kept similar phenotypic characteristics and analogous genotypes between TW and KM. A significant inbreeding coefficient (F_IS = 0.185; P < 0.01) was detected, suggesting a medium level of inbreeding for yellow cattle in Taiwan. 
The hypothesis that yellow cattle in Taiwan were derived from two different clusters was also supported by the phylogenetic tree constructed by the UPGMA, indicating that the yellow cattle in Taiwan and in Kinmen should be treated as two different management units. This result will be applied to maintain a good level of genetic variability and rusticity (stress-resistance) and to avoid further inbreeding for yellow cattle population in Taiwan.
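For readers unfamiliar with the summary statistics quoted above, the sketch below computes expected heterozygosity (He = 1 - sum of squared allele frequencies), observed heterozygosity, and the inbreeding coefficient F_IS = 1 - Ho/He at a single locus. It uses the simple uncorrected estimators and invented genotypes; the study's own software and any small-sample corrections are not known from the abstract and are not reproduced.

```python
from collections import Counter


def expected_heterozygosity(genotypes):
    """He = 1 - sum(p_i^2) from a list of diploid (allele, allele) pairs."""
    alleles = [a for genotype in genotypes for a in genotype]
    total = len(alleles)
    counts = Counter(alleles)
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def observed_heterozygosity(genotypes):
    """Proportion of individuals carrying two different alleles."""
    return sum(1 for a, b in genotypes if a != b) / len(genotypes)


def inbreeding_coefficient(genotypes):
    """F_IS = 1 - Ho / He (undefined when the locus is monomorphic)."""
    he = expected_heterozygosity(genotypes)
    ho = observed_heterozygosity(genotypes)
    return 1.0 - ho / he


# Hypothetical microsatellite genotypes (allele sizes in base pairs).
toy_genotypes = [(120, 120), (120, 124), (124, 124), (124, 124)]
toy_he = expected_heterozygosity(toy_genotypes)
toy_fis = inbreeding_coefficient(toy_genotypes)
```

A positive F_IS, as in this toy locus, indicates fewer heterozygotes than expected under random mating.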
NASA Astrophysics Data System (ADS)
Taira, T.; Kato, A.
2013-12-01
A high-resolution Vp/Vs ratio estimate is one of the key parameters to understand spatial variations of composition and physical state within the Earth. Lin and Shearer (2007, BSSA) recently developed a methodology to obtain local Vp/Vs ratios in individual similar earthquake clusters, based on P- and S-wave differential times. A waveform cross-correlation approach is typically employed to measure those differential times for pairs of seismograms from similar earthquake clusters, at narrow time windows around the direct P and S waves. This approach effectively collects P- and S-wave differential times, but it requires robust P- and S-wave time windows that are extracted from either manually or automatically picked P- and S-phases. We present another technique to estimate P- and S-wave differential times by exploiting temporal properties of delayed time as a function of elapsed time on the seismograms with a moving-window cross-correlation analysis (e.g., Snieder, 2002, Phys. Rev. E; Niu et al. 2003, Nature). Our approach is based on the principle that the delayed time for the direct S wave differs from that for the direct P wave. When two seismograms from a pair of similar earthquakes are aligned by the direct P waves, the delayed times become zero around the direct P wave. In contrast, delayed times obtained from time windows including the direct S wave have non-zero values. Our approach, in principle, is capable of measuring both P- and S-wave differential times from single-component seismograms. In an ideal case, the temporal evolution of delayed time becomes a step function with its discontinuity at the onset of the direct S wave. The offset in the resulting step function would be the S-wave differential time, relative to the P-wave differential time as the two waveforms are aligned by the direct P wave. 
We apply our moving-window cross-correlation technique to two different data sets collected at: 1) the Wakayama district, Japan and 2) the Geysers geothermal field, California. Both target areas are characterized by earthquake swarms that provide a number of similar event clusters. We use the following automated procedure to systematically analyze the two data sets: 1) the identification of the direct P arrivals by using an Akaike Information Criterion based phase picking algorithm introduced by Zhang and Thurber (2003, BSSA), 2) the waveform alignment by the P-wave with a waveform cross-correlation to obtain the P-wave differential time, 3) the moving-time window analysis to estimate the S-wave differential time. Kato et al. (2010, GRL) have estimated the Vp/Vs ratios for a few similar earthquake clusters from the Wakayama data set, by a conventional approach to obtain differential times. We find that the resulting Vp/Vs ratios from our approach for the same earthquake clusters are comparable with those obtained by Kato et al. (2010, GRL). We show that the moving-window cross-correlation technique effectively measures both P- and S-wave differential times for seismograms in which clear P and S phases are not observed. We will show spatial distributions in Vp/Vs ratios in our two target areas.
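The moving-window idea described above can be illustrated with a toy example. The sketch below is an assumption-laden simplification, not the authors' processing code: the traces are synthetic, delays are resolved only to whole samples, and the window length, step and maximum lag are arbitrary. Two traces share an aligned first pulse (standing in for P) while the second pulse (standing in for S) is delayed by three samples in one trace, so the measured delay steps from 0 to 3.

```python
import math


def normalized_xcorr(x, y):
    """Zero-lag normalized correlation; returns 0.0 for a flat window."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den if den > 0 else 0.0


def moving_window_delays(ref, other, window, step, max_lag):
    """For each window of ref, find the integer lag of other that fits best.

    Returns (window_start, best_lag, correlation) tuples. A positive lag
    means the feature arrives later in other than in ref. Ties keep the
    lag closest to zero.
    """
    lags = sorted(range(-max_lag, max_lag + 1), key=abs)
    results = []
    for start in range(max_lag, len(ref) - window - max_lag + 1, step):
        segment = ref[start:start + window]
        best_lag, best_cc = 0, float("-inf")
        for lag in lags:
            cc = normalized_xcorr(segment, other[start + lag:start + lag + window])
            if cc > best_cc:
                best_lag, best_cc = lag, cc
        results.append((start, best_lag, best_cc))
    return results


def synthetic_trace(length, onsets, wavelet):
    """Place copies of wavelet at the given onset samples on a zero trace."""
    trace = [0.0] * length
    for onset in onsets:
        for k, value in enumerate(wavelet):
            trace[onset + k] += value
    return trace


# Hypothetical traces: first pulse aligned, second pulse delayed by 3 samples.
pulse = [1.0, -2.0, 3.0, -2.0, 1.0]
ref_trace = synthetic_trace(60, [10, 40], pulse)
other_trace = synthetic_trace(60, [10, 43], pulse)
delays = moving_window_delays(ref_trace, other_trace, window=12, step=1, max_lag=5)
```

Windows that contain no signal return a correlation of 0.0 and should be discarded before interpreting the delay curve; real data would also call for sub-sample interpolation of the correlation peak.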
Optimization Strategies for Hardware-Based Cofactorization
NASA Astrophysics Data System (ADS)
Loebenberger, Daniel; Putzka, Jens
We use the specific structure of the inputs to the cofactorization step in the general number field sieve (GNFS) in order to optimize the runtime for the cofactorization step on a hardware cluster. An optimal distribution of bitlength-specific ECM modules is proposed and compared to existing ones. With our optimizations we obtain a speedup between 17% and 33% of the cofactorization step of the GNFS when compared to the runtime of an unoptimized cluster.
A Study of Pupil Control Ideology: A Person-Oriented Approach to Data Analysis
ERIC Educational Resources Information Center
Adwere-Boamah, Joseph
2010-01-01
Responses of urban school teachers to the Pupil Control Ideology questionnaire were studied using Latent Class Analysis. The results of the analysis suggest that the best fitting model to the data is a two-cluster solution. In particular, the pupil control ideology of the sample delineates into two clusters of teachers, those with humanistic and…
[Visual field progression in glaucoma: cluster analysis].
Bresson-Dumont, H; Hatton, J; Foucher, J; Fonteneau, M
2012-11-01
Visual field progression analysis is one of the key points in glaucoma monitoring, but distinction between true progression and random fluctuation is sometimes difficult. There are several different algorithms but no real consensus for detecting visual field progression. The trend analysis of global indices (MD, sLV) may miss localized deficits or be affected by media opacities. Conversely, point-by-point analysis makes progression difficult to differentiate from physiological variability, particularly when the sensitivity of a point is already low. The goal of our study was to analyse visual field progression with the EyeSuite™ Octopus Perimetry Clusters algorithm in patients with no significant changes in global indices or worsening of the analysis of pointwise linear regression. We analyzed the visual fields of 162 eyes (100 patients - 58 women, 42 men, average age 66.8 ± 10.91) with ocular hypertension or glaucoma. For inclusion, at least six reliable visual fields per eye were required, and the trend analysis (EyeSuite™ Perimetry) of visual field global indices (MD and sLV) had to show no significant progression. The analysis of changes in cluster mode was then performed. In a second step, eyes with statistically significant worsening of at least one of their clusters were analyzed point-by-point with the Octopus Field Analysis (OFA). Fifty-four eyes (33.33%) had a significant worsening in some clusters, while their global indices remained stable over time. In this group of patients, more advanced glaucoma was present than in the stable group (MD 6.41 dB vs. 2.87); 64.82% (35/54) of those eyes in which the clusters progressed, however, had no statistically significant change in the trend analysis by pointwise linear regression. Most software algorithms for analyzing visual field progression are essentially trend analyses of global indices, or point-by-point linear regression. This study shows the potential role of cluster trend analysis. 
However, for best results, it is preferable to compare the analyses of several tests in combination with morphologic exam. Copyright © 2012 Elsevier Masson SAS. All rights reserved.
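The contrast between a global trend and a cluster trend described in this abstract can be illustrated with a small sketch. It is not the EyeSuite algorithm: the function names, the grouping of test locations into clusters, and the toy sensitivities are all assumptions chosen for illustration. It fits an ordinary least-squares slope to the mean sensitivity of each cluster and to the mean of the whole field.

```python
from statistics import mean

def ols_slope(times, values):
    """Least-squares slope of values against times (e.g. dB per year)."""
    t_bar, v_bar = mean(times), mean(values)
    sxx = sum((t - t_bar) ** 2 for t in times)
    sxy = sum((t - t_bar) * (v - v_bar) for t, v in zip(times, values))
    return sxy / sxx

def cluster_slopes(times, fields, clusters):
    """Slope of the mean sensitivity within each cluster of test locations.

    fields: one list of per-location sensitivities (dB) per examination.
    clusters: hypothetical mapping from cluster name to location indices.
    """
    slopes = {}
    for name, idx in clusters.items():
        series = [mean(exam[i] for i in idx) for exam in fields]
        slopes[name] = ols_slope(times, series)
    return slopes

# Toy data (assumed): six examinations, four locations.
# Locations 2 and 3 lose 1 dB per year; locations 0 and 1 are stable.
times = [0, 1, 2, 3, 4, 5]
fields = [[30, 30, 28 - t, 28 - t] for t in times]
clusters = {"stable": [0, 1], "worsening": [2, 3]}

slopes = cluster_slopes(times, fields, clusters)
global_slope = ols_slope(times, [mean(exam) for exam in fields])
```

In this toy case the whole-field slope is half as steep as the slope of the worsening cluster, which is the dilution effect the abstract attributes to global indices. A real analysis would also need a significance test on each slope.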
Clustering multilayer omics data using MuNCut.
Teran Hidalgo, Sebastian J; Ma, Shuangge
2018-03-14
Omics profiling is now a routine component of biomedical studies. In the analysis of omics data, clustering is an essential step and serves multiple purposes including for example revealing the unknown functionalities of omics units, assisting dimension reduction in outcome model building, and others. In the most recent omics studies, a prominent trend is to conduct multilayer profiling, which collects multiple types of genetic, genomic, epigenetic and other measurements on the same subjects. In the literature, clustering methods tailored to multilayer omics data are still limited. Directly applying the existing clustering methods to multilayer omics data and clustering each layer first and then combining across layers are both "suboptimal" in that they do not accommodate the interconnections within layers and across layers in an informative way. In this study, we develop the MuNCut (Multilayer NCut) clustering approach. It is tailored to multilayer omics data and sufficiently accounts for both across- and within-layer connections. It is based on the novel NCut technique and also takes advantage of regularized sparse estimation. It has an intuitive formulation and is computationally very feasible. To facilitate implementation, we develop the function muncut in the R package NcutYX. Under a wide spectrum of simulation settings, it outperforms competitors. The analysis of TCGA (The Cancer Genome Atlas) data on breast cancer and cervical cancer shows that MuNCut generates biologically meaningful results which differ from those using the alternatives. We propose a more effective clustering analysis of multiple omics data. It provides a new avenue for jointly analyzing genetic, genomic, epigenetic and other measurements.
Subspace Clustering via Learning an Adaptive Low-Rank Graph.
Yin, Ming; Xie, Shengli; Wu, Zongze; Zhang, Yun; Gao, Junbin
2018-08-01
By using a sparse representation or low-rank representation of data, the graph-based subspace clustering has recently attracted considerable attention in computer vision, given its capability and efficiency in clustering data. However, the graph weights built using the representation coefficients are not the exact ones as the traditional definition is in a deterministic way. The two steps of representation and clustering are conducted in an independent manner, thus an overall optimal result cannot be guaranteed. Furthermore, it is unclear how the clustering performance will be affected by using this graph. For example, the graph parameters, i.e., the weights on edges, have to be artificially pre-specified while it is very difficult to choose the optimum. To this end, in this paper, a novel subspace clustering via learning an adaptive low-rank graph affinity matrix is proposed, where the affinity matrix and the representation coefficients are learned in a unified framework. As such, the pre-computed graph regularizer is effectively obviated and better performance can be achieved. Experimental results on several famous databases demonstrate that the proposed method performs better against the state-of-the-art approaches, in clustering.
Strong Lensing Analysis of the Galaxy Cluster MACS J1319.9+7003 and the Discovery of a Shell Galaxy
NASA Astrophysics Data System (ADS)
Zitrin, Adi
2017-01-01
We present a strong-lensing (SL) analysis of the galaxy cluster MACS J1319.9+7003 (z = 0.33, also known as Abell 1722), as part of our ongoing effort to analyze massive clusters with archival Hubble Space Telescope (HST) imaging. We spectroscopically measured with Keck/Multi-Object Spectrometer For Infra-Red Exploration (MOSFIRE) two galaxies multiply imaged by the cluster. Our analysis reveals a modest lens, with an effective Einstein radius of θ_e(z=2) = 12 ± 1″, enclosing (2.1 ± 0.3) × 10^13 M⊙. We briefly discuss the SL properties of the cluster, using two different modeling techniques (see the text for details), and make the mass models publicly available (ftp://wise-ftp.tau.ac.il/pub/adiz/MACS1319/). Independently, we identified a noteworthy, young shell galaxy (SG) system forming around two likely interacting cluster members, 20″ north of the brightest cluster galaxy. SGs are rare in galaxy clusters, and indeed, a simple estimate reveals that they are only expected in roughly one in several dozen, to several hundred, massive galaxy clusters (the estimate can easily change by an order of magnitude within a reasonable range of characteristic values relevant for the calculation). Taking advantage of our lens model best-fit, mass-to-light scaling relation for cluster members, we infer that the total mass of the SG system is ∼1.3 × 10^11 M⊙, with a host-to-companion mass ratio of about 10:1. Despite being rare in high density environments, the SG constitutes an example to how stars of cluster galaxies are efficiently redistributed to the intra-cluster medium. Dedicated numerical simulations for the observed shell configuration, perhaps aided by the mass model, might cast interesting light on the interaction history and properties of the two galaxies. An archival HST search in galaxy cluster images can reveal more such systems.
Zhang, Ding-Kun; Han, Xue; Tan, Peng; Li, Rui-Yu; Niu, Ming; Zhang, Cong-En; Wang, Jia-Bo; Yang, Ming; Xiao, Xiao-He
2017-01-01
Aconite is a valuable drug and also a toxic material, which can be used only after detoxification processing. Although traditional processing methods can achieve detoxification effect as desired, there are some obvious drawbacks, including a significant loss of alkaloids and poor quality consistency. It is thus necessary to develop a new detoxification approach. In the present study, we designed a novel one-step detoxification approach by quickly drying fresh-cut aconite particles. In order to evaluate the technical advantages, the contents of mesaconitine, aconitine, hypaconitine, benzoylmesaconine, benzoylaconine, benzoylhypaconine, neoline, fuziline, songorine, and talatisamine were determined using HPLC and UHPLC/Q-TOF-MS. Multivariate analysis methods, such as cluster analysis and principal component analysis, were applied to determine the quality differences between samples. Our results showed that traditional processes could reduce toxicity as desired, but also led to more than 85.2% alkaloid loss. However, our novel one-step method was capable of achieving virtually the same detoxification effect, with only an approximately 30% alkaloid loss. Cluster analysis and principal component analysis suggested that Shengfupian and the novel products were significantly different from various traditional products. Acute toxicity testing showed that the novel products achieved a good detoxification effect, with their maximum tolerated dose being equivalent to 20 times the adult dosage. Cardiac effect testing also showed that the activity of the novel products was stronger than that of traditional products. Moreover, particle specification greatly improved the quality consistency of the novel products, which was far superior to that of the traditional products. These results would help guide the rational optimization of aconite processing technologies, providing better drugs for clinical treatment. Copyright © 2017 China Pharmaceutical University. Published by Elsevier B.V. All rights reserved.
Focused maternity care in Ghana: results of a cluster analysis.
Ayanore, Martin Amogre; Pavlova, Milena; Groot, Wim
2016-08-17
Ghana missed out in attaining Millennium Development Goal 5 in 2015. The provision of adequate prenatal and postnatal care remains problematic, with poor evidence on women's views on met and unmet maternity care needs across all regions in Ghana. This paper examines maternal care utilization in Ghana by applying WHO indicators for focused maternal care utilization. Two-step cluster analysis segregated women into groups based on the components of the maternity care used. Using cluster membership variables as dependent variables, we applied multinomial and binary regression to examine associations of care use with individual, household and regional characteristics. We identified three patterns of care use: adequate, less adequate and least adequate care. The presence of a female and skilled provider is an indicator of adequate care. Women in Volta, Upper West, Northern and Western regions received less adequate care compared with other regions. Supply-related factors (drugs availability, distance/transport, health insurance ownership, rural residence) were associated with adequacy of care. The lack of female autonomy, widowed/divorced women, age and parity were associated with less adequate care. Care patterns were distinctively associated with the quality of health care support (skilled and female attendant) instead of with the number of visits made to the facility. Across regions and within rural settings, disparities exist, often compounded by supply-related factors. Efforts to address skilled workforce shortages, greater accountability for quality and equity, improving women's motivation for care seeking and active participation are important for maternity care in Ghana.
Case-based fracture image retrieval.
Zhou, Xin; Stern, Richard; Müller, Henning
2012-05-01
Case-based fracture image retrieval can assist surgeons in decisions regarding new cases by supplying visually similar past cases. This tool may guide fracture fixation and management through comparison of long-term outcomes in similar cases. A fracture image database collected over 10 years at the orthopedic service of the University Hospitals of Geneva was used. This database contains 2,690 fracture cases associated with 43 classes (based on the AO/OTA classification). A case-based retrieval engine was developed and evaluated using retrieval precision as a performance metric. Only cases in the same class as the query case are considered as relevant. The scale-invariant feature transform (SIFT) is used for image analysis. Performance evaluation was computed in terms of mean average precision (MAP) and early precision (P10, P30). Retrieval results produced with the GNU image finding tool (GIFT) were used as a baseline. Two sampling strategies were evaluated. One used a dense 40 × 40 pixel grid sampling, and the second one used the standard SIFT features. Based on dense pixel grid sampling, three unsupervised feature selection strategies were introduced to further improve retrieval performance. With dense pixel grid sampling, the image is divided into 1,600 (40 × 40) square blocks. The goal is to emphasize the salient regions (blocks) and ignore irrelevant regions. Regions are considered as important when a high variance of the visual features is found. The first strategy is to calculate the variance of all descriptors on the global database. The second strategy is to calculate the variance of all descriptors for each case. A third strategy is to perform a thumbnail image clustering in a first step and then to calculate the variance for each cluster. Finally, a fusion between a SIFT-based system and GIFT is performed. 
A first comparison on the selection of sampling strategies using SIFT features shows that dense sampling using a pixel grid (MAP = 0.18) outperformed the SIFT detector-based sampling approach (MAP = 0.10). In a second step, three unsupervised feature selection strategies were evaluated. A grid parameter search is applied to optimize parameters for feature selection and clustering. Results show that using half of the regions (700 or 800) obtains the best performance for all three strategies. Increasing the number of clusters in clustering can also improve the retrieval performance. The SIFT descriptor variance in each case gave the best indication of saliency for the regions (MAP = 0.23), better than the other two strategies (MAP = 0.20 and 0.21). Combining GIFT (MAP = 0.23) and the best SIFT strategy (MAP = 0.23) produced significantly better results (MAP = 0.27) than each system alone. A case-based fracture retrieval engine was developed and is available for online demonstration. SIFT is used to extract local features, and three feature selection strategies were introduced and evaluated. A baseline using the GIFT system was used to evaluate the salient point-based approaches. Without supervised learning, SIFT-based systems with optimized parameters slightly outperformed the GIFT system. A fusion of the two approaches shows that the information contained in the two approaches is complementary. Supervised learning on the feature space is foreseen as the next step of this study.
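The evaluation measures named in this abstract (mean average precision and early precision such as P10 or P30) can be sketched as follows. This is a generic illustration, not the authors' evaluation code: the function names, the class labels, and the toy rankings are assumptions. As in the abstract, a retrieved case counts as relevant only when it belongs to the same class as the query.

```python
def precision_at_k(ranked_classes, query_class, k):
    """Fraction of the top-k retrieved cases that share the query's class."""
    top = ranked_classes[:k]
    return sum(c == query_class for c in top) / k

def average_precision(ranked_classes, query_class, n_relevant):
    """Mean of the precision values at each rank where a relevant case appears.

    n_relevant is the number of relevant cases in the whole database,
    so relevant cases that are never retrieved lower the score.
    """
    hits, total = 0, 0.0
    for rank, c in enumerate(ranked_classes, start=1):
        if c == query_class:
            hits += 1
            total += hits / rank
    return total / n_relevant if n_relevant else 0.0

def mean_average_precision(runs):
    """runs: list of (ranked_classes, query_class, n_relevant) tuples."""
    return sum(average_precision(*r) for r in runs) / len(runs)

# Toy rankings with hypothetical class labels "A", "B", "C".
runs = [
    (["A", "B", "A", "C"], "A", 2),  # relevant cases at ranks 1 and 3
    (["B", "C", "C", "A"], "C", 2),  # relevant cases at ranks 2 and 3
]
map_score = mean_average_precision(runs)
p_at_2 = precision_at_k(runs[0][0], runs[0][1], 2)
```

Early precision rewards relevant cases near the top of the list, while MAP also accounts for how far down the remaining relevant cases appear.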
Classification of posture maintenance data with fuzzy clustering algorithms
NASA Technical Reports Server (NTRS)
Bezdek, James C.
1992-01-01
Sensory inputs from the visual, vestibular, and proprioceptive systems are integrated by the central nervous system to maintain postural equilibrium. Sustained exposure to microgravity causes neurosensory adaptation during spaceflight, which results in decreased postural stability until readaptation occurs upon return to the terrestrial environment. Data which simulate sensory inputs under various sensory organization test (SOT) conditions were collected in conjunction with Johnson Space Center postural control studies using a tilt-translation device (TTD). The University of West Florida applied the fuzzy c-means (FCM) clustering algorithms to this data with a view towards identifying various states and stages of subjects experiencing such changes. Feature analysis, time step analysis, pooling data, response of the subjects, and the algorithms used are discussed.
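For readers unfamiliar with fuzzy c-means, the standard alternating update can be sketched in a few lines. This is a textbook-style illustration rather than the code used in the study: the function name, the fuzzifier m = 2, the fixed iteration count, and the toy one-dimensional data are assumptions. Each point receives a degree of membership in every cluster instead of a single hard label.

```python
import math

def fcm(points, centers, m=2.0, n_iter=50):
    """Fuzzy c-means: alternate membership and center updates.

    points, centers: lists of equal-length coordinate tuples.
    Returns (centers, u), where u[i][j] is the membership of point i in
    cluster j as computed in the last membership update.
    """
    centers = [tuple(c) for c in centers]
    dim = len(points[0])
    u = []
    for _ in range(n_iter):
        # Membership update: u_ij is proportional to d_ij^(-2/(m-1)).
        u = []
        for p in points:
            d = [math.dist(p, c) for c in centers]
            if 0.0 in d:
                # Point coincides with a center: give it full membership there.
                row = [1.0 if x == 0.0 else 0.0 for x in d]
            else:
                row = [x ** (-2.0 / (m - 1.0)) for x in d]
            s = sum(row)
            u.append([x / s for x in row])
        # Center update: mean of the points weighted by u_ij^m.
        new_centers = []
        for j in range(len(centers)):
            w = [row[j] ** m for row in u]
            tot = sum(w)
            new_centers.append(tuple(
                sum(wi * p[k] for wi, p in zip(w, points)) / tot
                for k in range(dim)
            ))
        centers = new_centers
    return centers, u

# Toy data (assumed): two well-separated groups on a line.
points = [(0.0,), (0.2,), (0.1,), (5.0,), (5.2,), (5.1,)]
centers, u = fcm(points, centers=[(0.0,), (5.0,)])
```

The memberships in each row sum to one, so a point lying between two groups would show split membership, which is what makes the method suited to identifying gradual transitions between states.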
Clustering of food and activity preferences in primary school children.
Rodenburg, Gerda; Oenema, Anke; Pasma, Marleen; Kremers, Stef P J; van de Mheen, Dike
2013-01-01
This study examined clustering of food and activity preferences in Dutch primary school children. It also explored whether the preference clusters are associated with child and parental background characteristics and with parenting practices. Data were used from 1480 parent-child dyads participating in the IVO Nutrition and Physical Activity Child cohort (INPACT). Children aged 8-11 years reported their preferences for food (e.g. fruit and sweet snacks) and activities (e.g. biking and watching television) at school with a newly-developed, visual instrument designed for primary school children. Parents completed a questionnaire at home. Principal component analysis was used to identify preference clusters. Backward regression analyses were used to examine the relationship between child and parental characteristics with cluster scores. We found (1) a clustering of preferences for unhealthy foods and unhealthy drinks, (2) a clustering of preferences for various physical activity behaviours, and (3) a clustering of preferences for unhealthy drinks and sedentary behaviour. Boys had a higher cluster score than girls on all three preference clusters. In addition, physical activity-related parenting practices were negatively related to unhealthy preference clusters and positively to the physical-activity-preference cluster. The next step is to relate our preference clusters to child dietary and activity behaviours, with special attention to gender differences. This may help in the development of interventions aimed at improving children's food and activity preferences. Copyright © 2012 Elsevier Ltd. All rights reserved.
Glatman-Freedman, Aharona; Kaufman, Zalman; Kopel, Eran; Bassal, Ravit; Taran, Diana; Valinsky, Lea; Agmon, Vered; Shpriz, Manor; Cohen, Daniel; Anis, Emilia; Shohat, Tamy
2016-08-01
To enhance timely surveillance of bacterial enteric pathogens, space-time cluster analysis was introduced in Israel in May 2013. Stool isolation data of Salmonella, Shigella, and Campylobacter from patients of a large Health Maintenance Organization were analyzed weekly by ArcGIS and SaTScan, and cluster results were sent promptly to local departments of health (LDOHs). During eighteen months, we identified 52 Shigella sonnei clusters, two Salmonella clusters, and no Campylobacter clusters. S. sonnei clusters lasted from one to 33 days and included three to 30 individuals. Thirty-one (60%) of the S. sonnei clusters were known to LDOHs prior to cluster analysis. Clusters not previously known by the LDOHs prompted epidemiologic investigations. In 31 of the 37 (84%) confirmed clusters, educational institutes (nursery schools, kindergartens, and a primary school) were involved. Cluster analysis demonstrated capability to complement enteric disease surveillance. Scaling up the system can further enhance timely detection and control of outbreaks. Copyright © 2016 The British Infection Association. Published by Elsevier Ltd. All rights reserved.
Repair of Clustered Damage and DNA Polymerase Iota.
Belousova, E A; Lavrik, O I
2015-08-01
Multiple DNA lesions occurring within one or two turns of the DNA helix known as clustered damage are a source of double-stranded DNA breaks, which represent a serious threat to the cells. Repair of clustered lesions is accomplished in several steps. If a clustered lesion contains oxidized bases, an individual DNA lesion is repaired by the base excision repair (BER) mechanism involving a specialized DNA polymerase after excising DNA damage. Here, we investigated DNA synthesis catalyzed by DNA polymerase iota using damaged DNA templates. Two types of DNA substrates were used as model DNAs: partial DNA duplexes containing breaks of different length, and DNA duplexes containing 5-formyluracil (5-foU) and uracil as a precursor of apurinic/apyrimidinic sites (AP) in opposite DNA strands. For the first time, we showed that DNA polymerase iota is able to catalyze DNA synthesis using partial DNA duplexes having breaks of different length as substrates. In addition, we found that DNA polymerase iota could catalyze DNA synthesis during repair of clustered damage via the BER system by using both undamaged and 5-foU-containing templates. We found that hPCNA (human proliferating cell nuclear antigen) increased efficacy of DNA synthesis catalyzed by DNA polymerase iota.
[Anxiety and depression in residents - results of a Swiss longitudinal study].
Buddeberg-Fischer, Barbara; Stamm, Martina; Buddeberg, Claus; Klaghofer, Richard
2009-01-01
The study investigates the development of anxiety and depression during residents' postgraduate training as well as the symptom patterns and the prediction of these patterns of impaired affectivity by personality factors. It furthermore regards the differences between these patterns in workplace- and career-related factors as well as in worklife balance. In a prospective cohort study (2001-2007), 390 junior physicians of various specialties (54.9% females, 45.1% males) were investigated with respect to the percentage of participants with elevated anxiety and depression scores at the beginning of the second, fourth, and sixth year of residency, respectively. Symptom patterns were evaluated by two-step cluster analysis. The prediction of the assignment to the symptom patterns was investigated by logistic regression analysis. The differences in further factors between the two patterns were analyzed by t-tests. In the second year of residency, relevant anxiety symptoms were found in 30% of the physicians, and in the fourth and sixth year in 20%; relevant depression symptoms were found in 15% and 10%, respectively. The cluster analysis revealed two symptom patterns: Type A (n = 135, 34.6%) with continuously elevated anxiety and depression symptoms; and type B (n = 255, 65.4%) with continuously low values. Personality factors such as the sense of coherence, self-esteem, occupational self-efficacy expectation, and overcommitment significantly predicted the assignment to the symptom patterns. Also in terms of workload, mentoring experience, career satisfaction, and worklife balance, persons of type A differ from those of type B. Personality factors play an important role in physicians' ability to cope with job demands. Persons with an elevated vulnerability for anxiety and depression should be continuously supported and counselled by a mentor during residency.
ERIC Educational Resources Information Center
Hale, Robert L.; Dougherty, Donna
1988-01-01
Compared the efficacy of two methods of cluster analysis, the unweighted pair-groups method using arithmetic averages (UPGMA) and Ward's method, for students grouped on intelligence, achievement, and social adjustment by both clustering methods. Found UPGMA more efficacious based on output, on cophenetic correlation coefficients generated by each…
Grimsley, Jasmine M S; Gadziola, Marie A; Wenstrup, Jeffrey J
2012-01-01
Mouse pups vocalize at high rates when they are cold or isolated from the nest. The proportions of each syllable type produced carry information about disease state and are being used as behavioral markers for the internal state of animals. Manual classifications of these vocalizations identified 10 syllable types based on their spectro-temporal features. However, manual classification of mouse syllables is time consuming and vulnerable to experimenter bias. This study uses an automated cluster analysis to identify acoustically distinct syllable types produced by CBA/CaJ mouse pups, and then compares the results to prior manual classification methods. The cluster analysis identified two syllable types, based on their frequency bands, that have continuous frequency-time structure, and two syllable types featuring abrupt frequency transitions. Although cluster analysis computed fewer syllable types than manual classification, the clusters represented well the probability distributions of the acoustic features within syllables. These probability distributions indicate that some of the manually classified syllable types are not statistically distinct. The characteristics of the four classified clusters were used to generate a Microsoft Excel-based mouse syllable classifier that rapidly categorizes syllables, with over a 90% match, into the syllable types determined by cluster analysis.
Concept mapping and network analysis: an analytic approach to measure ties among constructs.
Goldman, Alyssa W; Kane, Mary
2014-12-01
Group concept mapping is a mixed-methods approach that helps a group visually represent its ideas on a topic of interest through a series of related maps. The maps and additional graphics are useful for planning, evaluation and theory development. Group concept maps are typically described, interpreted and utilized through points, clusters and distances, and the implications of these features in understanding how constructs relate to one another. This paper focuses on the application of network analysis to group concept mapping to quantify the strength and directionality of relationships among clusters. The authors outline the steps of this analysis, and illustrate its practical use through an organizational strategic planning example. Additional benefits of this analysis to evaluation projects are also discussed, supporting the overall utility of this supplemental technique to the standard concept mapping methodology. Copyright © 2014 Elsevier Ltd. All rights reserved.
A time-series approach for clustering farms based on slaughterhouse health aberration data.
Hulsegge, B; de Greef, K H
2018-05-01
A large amount of data is collected routinely in meat inspection in pig slaughterhouses. A time series clustering approach is presented and applied that groups farms based on similar statistical characteristics of meat inspection data over time. A three-step characteristic-based clustering approach was used, based on the idea that the data contain more information than the incidence figures alone. A stratified subset containing 511,645 pigs was derived as a study set from 3.5 years of meat inspection data. The monthly averages of incidence of pleuritis and of pneumonia of 44 Dutch farms (delivering 5149 batches to 2 pig slaughterhouses) were subjected to 1) derivation of farm level data characteristics, 2) factor analysis and 3) clustering into groups of farms. The characteristic-based clustering was able to cluster farms for both lung aberrations. Three groups of data characteristics were informative, describing incidence, time pattern and degree of autocorrelation. The consistency of clustering similar farms was confirmed by repetition of the analysis in a larger dataset. The robustness of the clustering was tested on a substantially extended dataset. This confirmed the earlier results: three data distribution aspects make up the majority of distinction between groups of farms, and in these groups (clusters) the majority of the farms was allocated comparably to the earlier allocation (75% and 62% for pleuritis and pneumonia, respectively). The difference between pleuritis and pneumonia in their seasonal dependency was confirmed, supporting the biological relevance of the clustering. Comparison of the identified clusters of statistically comparable farms can be used to detect farm level risk factors causing the health aberrations beyond comparison on disease incidence and trend alone. Copyright © 2018 Elsevier B.V. All rights reserved.
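The first step of the approach, deriving farm-level characteristics from each monthly incidence series, can be sketched as below. The abstract does not list the exact characteristics used, so the three computed here (mean incidence, linear trend, lag-1 autocorrelation) are assumptions chosen to mirror the three informative groups it names: incidence, time pattern and degree of autocorrelation. The function names and toy series are likewise hypothetical.

```python
from statistics import mean

def lag1_autocorrelation(series):
    """Correlation between each value and the next one in the series."""
    m = mean(series)
    denom = sum((x - m) ** 2 for x in series)
    if denom == 0:
        return 0.0  # constant series: no variation to correlate
    num = sum((a - m) * (b - m) for a, b in zip(series, series[1:]))
    return num / denom

def linear_trend(series):
    """Least-squares slope of the series against its month index."""
    n = len(series)
    t_bar, y_bar = (n - 1) / 2, mean(series)
    sxx = sum((t - t_bar) ** 2 for t in range(n))
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in enumerate(series))
    return sxy / sxx

def farm_characteristics(monthly_incidence):
    """Summarise one farm's monthly incidence series as a feature vector."""
    return {
        "mean_incidence": mean(monthly_incidence),
        "trend_per_month": linear_trend(monthly_incidence),
        "lag1_autocorr": lag1_autocorrelation(monthly_incidence),
    }

# Toy series (assumed): one steadily rising farm, one alternating farm.
rising = farm_characteristics([0.10, 0.12, 0.14, 0.16, 0.18, 0.20])
alternating = farm_characteristics([1, 0, 1, 0, 1, 0])
```

Feature vectors of this kind, one per farm, would then feed the factor analysis and clustering steps; two farms with the same mean incidence can still differ clearly in trend or autocorrelation.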
Seifert, L; De Jesus, K; Komar, J; Ribeiro, J; Abraldes, J A; Figueiredo, P; Vilas-Boas, J P; Fernandes, R J
2016-06-01
The aim was to examine behavioural variability within and between individuals, especially in a swimming task, to explore how swimmers with various specialty (competitive short distance swimming vs. triathlon) adapt to repetitive events of sub-maximal intensity, controlled in speed but of various distances. Five swimmers and five triathletes randomly performed three variants (with steps of 200, 300 and 400m distances) of a front crawl incremental step test until exhaustion. Multi-camera system was used to collect and analyse eight kinematical and swimming efficiency parameters. Analysis of variance showed significant differences between swimmers and triathletes, with significant individual effect. Cluster analysis put these parameters together to investigate whether each individual used the same pattern(s) and one or several patterns to achieve the task goal. Results exhibited ten patterns for the whole population, with only two behavioural patterns shared between swimmers and triathletes. Swimmers tended to use higher hand velocity and index of coordination than triathletes. Mono-stability occurred in swimmers whatever the task constraint showing high stability, while triathletes revealed bi-stability because they switched to another pattern at mid-distance of the task. Finally, our analysis helped to explain and understand effect of specialty and more broadly individual adaptation to task constraint. Copyright © 2016 Elsevier B.V. All rights reserved.
Groundwater quality assessment of urban Bengaluru using multivariate statistical techniques
NASA Astrophysics Data System (ADS)
Gulgundi, Mohammad Shahid; Shetty, Amba
2018-03-01
Groundwater quality deterioration due to anthropogenic activities has become a subject of prime concern. The objective of the study was to assess the spatial and temporal variations in groundwater quality and to identify the sources in the western half of the Bengaluru city using multivariate statistical techniques. Water quality index rating was calculated for pre and post monsoon seasons to quantify overall water quality for human consumption. The post-monsoon samples show signs of poor quality in drinking purpose compared to pre-monsoon. Cluster analysis (CA), principal component analysis (PCA) and discriminant analysis (DA) were applied to the groundwater quality data measured on 14 parameters from 67 sites distributed across the city. Hierarchical cluster analysis (CA) grouped the 67 sampling stations into two groups, cluster 1 having high pollution and cluster 2 having lesser pollution. Discriminant analysis (DA) was applied to delineate the most meaningful parameters accounting for temporal and spatial variations in groundwater quality of the study area. Temporal DA identified pH as the most important parameter, which discriminates between water quality in the pre-monsoon and post-monsoon seasons and accounts for 72% seasonal assignation of cases. Spatial DA identified Mg, Cl and NO3 as the three most important parameters discriminating between two clusters and accounting for 89% spatial assignation of cases. Principal component analysis was applied to the dataset obtained from the two clusters, which evolved three factors in each cluster, explaining 85.4 and 84% of the total variance, respectively. Varifactors obtained from principal component analysis showed that groundwater quality variation is mainly explained by dissolution of minerals from rock water interactions in the aquifer, effect of anthropogenic activities and ion exchange processes in water.
Chaillou, Stéphane; Zagorec, Monique; Champomier-Vergès, Marie-Christine
2013-01-01
In silico analysis of the genome sequence of the meat-borne lactic acid bacterium (LAB) Lactobacillus sakei 23K has revealed a repertoire of potential functions related to the adaptation of this bacterium to the meat environment. Among these functions, the ability to use N-acetyl-neuraminic acid (NANA) as a carbon source could provide a competitive advantage for growth on meat in which this amino sugar is present. In this work, we proposed to analyze the functionality of a gene cluster encompassing nanTEAR and nanK (nanTEAR-nanK). We established that this cluster encoded a pathway allowing transport and early steps of the catabolism of NANA in this genome. We also demonstrated that this cluster was absent from the genome of other L. sakei strains that were shown to be unable to grow on NANA. Moreover, L. sakei 23K nanA, nanT, nanK, and nanE genes were able to complement Escherichia coli mutants. Construction of different mutants in L. sakei 23K ΔnanR, ΔnanT, and ΔnanK and the double mutant L. sakei 23K Δ(nanA-nanE) made it possible to show that all were impaired for growth on NANA. In addition, two genes located downstream from nanK, lsa1644 and lsa1645, are involved in the catabolism of sialic acid in L. sakei 23K, as a L. sakei 23K Δlsa1645 mutant was no longer able to grow on NANA. All these results demonstrate that the gene cluster nanTEAR-nanK-lsa1644-lsa1645 is indeed involved in the use of NANA as an energy source by L. sakei. PMID:23335758
Steps Toward Understanding Mitochondrial Fe/S Cluster Biogenesis.
Melber, Andrew; Winge, Dennis R
2018-01-01
Iron-sulfur clusters (Fe/S clusters) are essential cofactors required throughout the clades of biology for performing a myriad of unique functions including nitrogen fixation, ribosome assembly, DNA repair, mitochondrial respiration, and metabolite catabolism. Although Fe/S clusters can be synthesized in vitro and transferred to a client protein without enzymatic assistance, biology has evolved intricate mechanisms to assemble and transfer Fe/S clusters within the cellular environment. In eukaryotes, the foundation of all cellular clusters starts within the mitochondria. The focus of this review is to detail the mitochondrial Fe/S biogenesis (ISC) pathway along with the Fe/S cluster transfer steps necessary to mature Fe/S proteins. New advances in our understanding of the mitochondrial Fe/S biogenesis machinery will be highlighted. Additionally, we will address various experimental approaches that have been successful in the identification and characterization of components of the ISC pathway. © 2018 Elsevier Inc. All rights reserved.
Alivisatos, A.P.; Colvin, V.L.
1998-05-12
Methods are described for attaching semiconductor nanocrystals to solid inorganic surfaces, using self-assembled bifunctional organic monolayers as bridge compounds. Two different techniques are presented. One relies on the formation of self-assembled monolayers on these surfaces. When exposed to solutions of nanocrystals, these bridge compounds bind the crystals and anchor them to the surface. The second technique attaches nanocrystals already coated with bridge compounds to the surfaces. Analyses indicate the presence of quantum confined clusters on the surfaces at the nanolayer level. These materials allow electron spectroscopies to be completed on condensed phase clusters, and represent a first step towards synthesis of an organized assembly of clusters. These new products are also disclosed. 10 figs.
NASA Astrophysics Data System (ADS)
Farsadnia, F.; Rostami Kamrood, M.; Moghaddam Nia, A.; Modarres, R.; Bray, M. T.; Han, D.; Sadatinejad, J.
2014-02-01
One of several methods for estimating flood quantiles in ungauged or data-scarce watersheds is regional frequency analysis. Among the approaches to regional frequency analysis, different clustering techniques have been proposed in the literature to determine hydrologically homogeneous regions. Recently, the Self-Organizing Feature Map (SOM), a modern hydroinformatic tool, has been applied in several studies for clustering watersheds. However, further studies are still needed on the interpretation of the SOM output map for identifying hydrologically homogeneous regions. In this study, a two-level SOM and three clustering methods (fuzzy c-means, K-means, and Ward's agglomerative hierarchical clustering) are applied in an effort to identify hydrologically homogeneous regions in Mazandaran province watersheds in the north of Iran, and their results are compared with each other. First, the SOM is used to form a two-dimensional feature map. Next, the output nodes of the SOM are clustered by using the unified distance matrix algorithm and the three clustering methods to form regions for flood frequency analysis. The heterogeneity test indicates that the four regions achieved by the two-level SOM and Ward approach after adjustments are sufficiently homogeneous. The results suggest that the combination of SOM and Ward is much better than the combination of either SOM and FCM or SOM and K-means.
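The second level of the approach described above, grouping SOM output nodes with Ward's method, can be illustrated with a minimal sketch. It assumes the SOM has already been trained and that each node is represented by its weight vector. The function name `ward_clusters` and the toy vectors are hypothetical, and this naive implementation is an illustration, not the authors' code.

```python
def ward_clusters(vectors, k):
    """Naive Ward agglomerative clustering of node weight vectors into k groups.

    At every step the two clusters whose merge gives the smallest increase in
    within-cluster sum of squares are merged (Ward's criterion):
        cost = n_a * n_b / (n_a + n_b) * ||centroid_a - centroid_b||^2
    """
    if not 1 <= k <= len(vectors):
        raise ValueError("k must be between 1 and the number of vectors")
    # Each cluster is stored as (member indices, centroid).
    clusters = [([i], list(v)) for i, v in enumerate(vectors)]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                (ma, ca), (mb, cb) = clusters[a], clusters[b]
                na, nb = len(ma), len(mb)
                d2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
                cost = na * nb / (na + nb) * d2
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        (ma, ca), (mb, cb) = clusters[a], clusters[b]
        na, nb = len(ma), len(mb)
        centroid = [(na * x + nb * y) / (na + nb) for x, y in zip(ca, cb)]
        merged = (ma + mb, centroid)
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)]
        clusters.append(merged)
    return [sorted(members) for members, _ in clusters]


# Hypothetical SOM node weights: two well-separated pairs of nodes.
node_weights = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
regions = ward_clusters(node_weights, k=2)
```

Watersheds would then inherit the region of the SOM node they map to. Homogeneity of the resulting regions still has to be checked separately, as the abstract notes.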
Hartley, Suzanne; Foy, Robbie; Walwyn, Rebecca E A; Cicero, Robert; Farrin, Amanda J; Francis, Jill J; Lorencatto, Fabiana; Gould, Natalie J; Grant-Casey, John; Grimshaw, Jeremy M; Glidewell, Liz; Michie, Susan; Morris, Stephen; Stanworth, Simon J
2017-07-03
Blood for transfusion is a frequently used clinical intervention, and is also a costly and limited resource with risks. Many transfusions are given to stable and non-bleeding patients despite no clear evidence of benefit from clinical studies. Audit and feedback (A&F) is widely used to improve the quality of healthcare, including appropriate use of blood. However, its effects are often inconsistent, indicating the need for coordinated research including more head-to-head trials comparing different ways of delivering feedback. A programmatic series of research projects, termed the 'Audit and Feedback INterventions to Increase evidence-based Transfusion practIcE' (AFFINITIE) programme, aims to test different ways of developing and delivering feedback within an existing national audit structure. The evaluation will comprise two linked 2×2 factorial, cross-sectional cluster-randomised controlled trials. Each trial will estimate the effects of two feedback interventions, 'enhanced content' and 'enhanced follow-on support', designed in earlier stages of the AFFINITIE programme, compared to current practice. The interventions will be embedded within two rounds of the UK National Comparative Audit of Blood Transfusion (NCABT) focusing on patient blood management in surgery and use of blood transfusions in patients with haematological malignancies. The unit of randomisation will be National Health Service (NHS) trust or health board. Clusters providing care relevant to the audit topics will be randomised following each baseline audit (separately for each trial), with stratification for size (volume of blood transfusions) and region (Regional Transfusion Committee). The primary outcome for each topic will be the proportion of patients receiving a transfusion coded as unnecessary. For each audit topic a linked, mixed-method fidelity assessment and cost-effectiveness analysis will be conducted in parallel to the trial. 
AFFINITIE involves a series of studies to explore how A&F may be refined to change practice including two cluster randomised trials linked to national audits of transfusion practice. The methodology represents a step-wise increment in study design to more fully evaluate the effects of two enhanced feedback interventions on patient- and trust-level clinical, cost, safety and process outcomes. http://www.isrctn.com/ISRCTN15490813.
New clinical grading scales and objective measurement for conjunctival injection.
Park, In Ki; Chun, Yeoun Sook; Kim, Kwang Gi; Yang, Hee Kyung; Hwang, Jeong-Min
2013-08-05
To establish a new clinical grading scale and objective measurement method to evaluate conjunctival injection. Photographs of conjunctival injection with variable ocular diseases in 429 eyes were reviewed. Seventy-three images with concordance by three ophthalmologists were classified into a 4-step and 10-step subjective grading scale, and used as standard photographs. Each image was quantified in four ways: the relative magnitude of the redness component of each red-green-blue (RGB) pixel; two different algorithms based on the area occupied by blood vessels (K-means clustering with LAB color model and contrast-limited adaptive histogram equalization [CLAHE] algorithm); and the presence of blood vessel edges, based on the Canny edge-detection algorithm. Areas under the receiver operating characteristic curve (AUCs) were calculated to summarize diagnostic accuracies of the four algorithms. The RGB color model, K-means clustering with LAB color model, and CLAHE algorithm showed good correlation with the clinical 10-step grading scale (R = 0.741, 0.784, 0.919, respectively) and with the clinical 4-step grading scale (R = 0.645, 0.702, 0.838, respectively). The CLAHE method showed the largest AUC, best distinction power (P < 0.001, ANOVA, Bonferroni multiple comparison test), and high reproducibility (R = 0.996). The CLAHE algorithm showed the best correlation with the 10-step and 4-step subjective clinical grading scales together with high distinction power and reproducibility. The CLAHE algorithm can be a useful method for assessment of conjunctival injection.
NASA Astrophysics Data System (ADS)
Newman, W. I.; Turcotte, D. L.
2002-12-01
We have studied a hybrid model combining the forest-fire model with the site-percolation model in order to better understand the earthquake cycle. We consider a square array of sites. At each time step, a "tree" is dropped on a randomly chosen site and is planted if the site is unoccupied. When a cluster of "trees" spans the site (a percolating cluster), all the trees in the cluster are removed ("burned") in a "fire." The removal of the cluster is analogous to a characteristic earthquake and planting "trees" is analogous to increasing the regional stress. The clusters are analogous to the metastable regions of a fault over which an earthquake rupture can propagate once triggered. We find that the frequency-area statistics of the metastable regions are power-law with a negative exponent of two (as in the forest-fire model). This is analogous to the Gutenberg-Richter distribution of seismicity. This "self-organized critical behavior" can be explained in terms of an inverse cascade of clusters. Individual trees move from small to larger clusters until they are destroyed. This inverse cascade of clusters is self-similar and the power-law distribution of cluster sizes has been shown to have an exponent of two. We have quantified the forecasting of the spanning fires using error diagrams. The assumption that "fires" (earthquakes) are quasi-periodic has moderate predictability. The density of trees gives an improved degree of predictability, while the size of the largest cluster of trees provides a substantial improvement in forecasting a "fire."
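The planting and burning rule described above can be sketched as a short simulation. This is a minimal sketch under stated assumptions: a cluster is taken to "span" when it connects two opposite edges of the grid through 4-neighbour contacts, and the names `simulate`, `connected_cluster` and `spans`, the grid size and the step count are all hypothetical choices that the abstract does not specify.

```python
import random


def connected_cluster(grid, r, c):
    """Return the set of occupied sites 4-connected to the occupied site (r, c)."""
    n = len(grid)
    seen = {(r, c)}
    stack = [(r, c)]
    while stack:
        i, j = stack.pop()
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ni, nj = i + di, j + dj
            if 0 <= ni < n and 0 <= nj < n and grid[ni][nj] and (ni, nj) not in seen:
                seen.add((ni, nj))
                stack.append((ni, nj))
    return seen


def spans(cluster, n):
    """Assumed spanning rule: the cluster touches two opposite edges of the grid."""
    rows = {i for i, _ in cluster}
    cols = {j for _, j in cluster}
    return (0 in rows and n - 1 in rows) or (0 in cols and n - 1 in cols)


def simulate(n=16, steps=5000, seed=0):
    """Drop one tree per time step; burn any cluster that spans the grid."""
    rng = random.Random(seed)
    grid = [[False] * n for _ in range(n)]
    fires = []  # size of each burned spanning cluster
    for _ in range(steps):
        r, c = rng.randrange(n), rng.randrange(n)
        if grid[r][c]:
            continue  # site already occupied: the dropped tree is discarded
        grid[r][c] = True
        # Only the cluster containing the new tree can have become spanning.
        cluster = connected_cluster(grid, r, c)
        if spans(cluster, n):
            for i, j in cluster:
                grid[i][j] = False
            fires.append(len(cluster))
    return grid, fires


grid, fires = simulate()
```

The list `fires` gives the sequence of "characteristic earthquake" sizes. Tracking tree density or the largest cluster size between fires would give the forecasting variables mentioned in the abstract.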
Analysis of earthquake clustering and source spectra in the Salton Sea Geothermal Field
NASA Astrophysics Data System (ADS)
Cheng, Y.; Chen, X.
2015-12-01
The Salton Sea Geothermal Field is located within the tectonic step-over between the San Andreas Fault and the Imperial Fault. Since the 1980s, geothermal energy exploration has resulted in a step-like increase in microearthquake activity, which mirrors the expansion of the geothermal field. Distinguishing naturally occurring from induced seismicity, and their corresponding characteristics (e.g., energy release), is important for hazard assessment. Between 2008 and 2014, seismic data recorded by a local borehole array were made publicly accessible by CalEnergy through the SCEC data center, and the high-quality local recording of over 7000 microearthquakes provides a unique opportunity to sort out the characteristics of induced versus natural activity. We obtain high-resolution earthquake locations using improved S-wave picks, waveform cross-correlation and a new 3D velocity model. We then develop a method to identify spatio-temporally isolated earthquake clusters. These clusters are classified into aftershock-type, swarm-type, and mixed-type (aftershock-like, with low skew, low magnitude and shorter duration), based on the relative timing of the largest earthquakes and moment release. The mixed-type clusters are mostly located at 3-4 km depth near injection wells, while aftershock-type and swarm-type clusters also occur further from injection wells. By counting the number of aftershocks within 1 day following the mainshock in each cluster, we find that the mixed-type clusters have much higher aftershock productivity compared with other types and historic M4 earthquakes. We analyze the detailed spatial variation of the 'b-value'. We find that the mixed-type clusters are mostly located within high b-value patches, while large (M>3) earthquakes and other types of clusters are located within low b-value patches. We are currently processing P- and S-wave spectra to analyze the spatio-temporal correlation of earthquake stress parameter and seismicity characteristics.
Preliminary results suggest that the mixed-type clusters and high b-value patches are spatially correlated with low stress drop earthquakes, indicating high-productivity microearthquakes within low differential stress region, potentially due to deeper injection activities.
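The abstract does not say how the b-value was estimated. One widely used estimator is the Aki maximum-likelihood formula with Utsu's correction for binned magnitudes, b = log10(e) / (mean(M) - (Mc - dM/2)), sketched below. The function name `b_value`, the completeness magnitude `mc` and the toy catalogue are hypothetical and only illustrate the arithmetic.

```python
import math


def b_value(magnitudes, mc, bin_width=0.0):
    """Aki maximum-likelihood b-value for events with magnitude >= mc.

    bin_width is the magnitude binning interval (0.0 for unbinned magnitudes);
    the Mc - bin_width / 2 term is Utsu's correction for binned data.
    """
    selected = [m for m in magnitudes if m >= mc]
    if len(selected) < 2:
        raise ValueError("need at least two events at or above mc")
    mean_mag = sum(selected) / len(selected)
    denom = mean_mag - (mc - bin_width / 2.0)
    if denom <= 0:
        raise ValueError("mean magnitude must exceed the corrected mc")
    return math.log10(math.e) / denom


# Hypothetical toy catalogue; the event below mc is ignored.
catalogue = [0.5, 1.0, 1.5, 2.0, 2.5]
b = b_value(catalogue, mc=1.0)
```

Mapping spatial variation, as in the abstract, would amount to applying such an estimator to the events falling in each spatial patch, provided each patch holds enough events above the completeness magnitude.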
Ishikawa, Akio; Neurock, Matthew; Iglesia, Enrique
2007-10-31
The identity and reversibility of the elementary steps required for catalytic combustion of dimethyl ether (DME) on Pt clusters were determined by combining isotopic and kinetic analyses with density functional theory estimates of reaction energies and activation barriers to probe the lowest energy paths. Reaction rates are limited by C-H bond activation in DME molecules adsorbed on surfaces of Pt clusters containing chemisorbed oxygen atoms at near-saturation coverages. Reaction energies and activation barriers for C-H bond activation in DME to form methoxymethyl and hydroxyl surface intermediates show that this step is more favorable than the activation of C-O bonds to form two methoxides, consistent with measured rates and kinetic isotope effects. This kinetic preference is driven by the greater stability of the CH3OCH2* and OH* intermediates relative to chemisorbed methoxides. Experimental activation barriers on Pt clusters agree with density functional theory (DFT)-derived barriers on oxygen-covered Pt(111). Measured DME turnover rates increased with increasing DME pressure, but decreased as the O2 pressure increased, because vacancies (*) on Pt surfaces nearly saturated with chemisorbed oxygen are required for DME chemisorption. DFT calculations show that although these surface vacancies are required, higher oxygen coverages lead to lower C-H activation barriers, because the basicity of oxygen adatoms increases with coverage and they become more effective in hydrogen abstraction from DME. Water inhibits reaction rates via quasi-equilibrated adsorption on vacancy sites, consistent with DFT results indicating that water binds more strongly than DME on vacancies. These conclusions are consistent with the measured kinetic response of combustion rates to DME, O2, and H2O, with H/D kinetic isotope effects, and with the absence of isotopic scrambling in reactants containing isotopic mixtures of 18O2-16O2 or 12CH3O12CH3-13CH3O13CH3. 
Turnover rates increased with Pt cluster size, because small clusters, with more coordinatively unsaturated surface atoms, bind oxygen atoms more strongly than larger clusters and exhibit lower steady-state vacancy concentrations and a consequently smaller number of adsorbed DME intermediates involved in kinetically relevant steps. These effects of cluster size and metal-oxygen bond energies on reactivity are ubiquitous in oxidation reactions requiring vacancies on surfaces nearly saturated with intermediates derived from O2.
Computational methods for evaluation of cell-based data assessment--Bioconductor.
Le Meur, Nolwenn
2013-02-01
Recent advances in miniaturization and automation of technologies have enabled cell-based assay high-throughput screening, bringing along new challenges in data analysis. Automation, standardization, and reproducibility have become requirements for qualitative research. The Bioconductor community has worked in that direction, proposing several R packages to handle high-throughput data including flow cytometry (FCM) experiments. Altogether, these packages cover the main steps of an FCM analysis workflow, that is, data management, quality assessment, normalization, outlier detection, automated gating, cluster labeling, and feature extraction. Additionally, the open-source philosophy of R and Bioconductor, which offers room for new development, continuously drives research and improvement of these analysis methods, especially in the field of clustering and data mining. This review presents the principal FCM packages currently available in R and Bioconductor, their advantages and their limits. Copyright © 2012 Elsevier Ltd. All rights reserved.
Whitaker, Rhiannon; Perrett, Stephanie; Zou, Lu; Hickman, Matthew; Lyons, Marion
2015-01-01
Background: The prevalence of hepatitis C (HCV) is elevated within prison populations, yet diagnosis in prisons remains low. Dried blood spot testing (DBST) is a simple procedure for the detection of HCV antibodies; its impact on testing in the prison context is unknown. Methods: We carried out a stepped-wedge cluster-randomized control trial of DBST for HCV among prisoners within five male prisons and one female prison. Each prison was a separate cluster. The order in which the intervention (training in use of DBST for HCV testing and logistic support) was introduced was randomized across clusters. The outcome measure was the HCV testing rate by prison. Imputation analysis was carried out to account for missing data. Planned and actual intervention times differed in some prisons; data were thus analysed by intention to treat (ITT) and by observed step times. Results: There was insufficient evidence of an effect of the intervention on testing rate using either the ITT intervention time (OR: 0.84; 95% CI: 0.68–1.03; P = 0.088) or using the actual intervention time (OR: 0.86; 95% CI: 0.71–1.06; P = 0.153). This was confirmed by the pooled results of five imputed data sets. Conclusions: DBST as a stand-alone intervention was insufficient to increase HCV diagnosis within the UK prison setting. Factors such as staff training and allocation of staff time for regular clinics are key to improving service delivery. We demonstrate that prisons can conduct rigorous studies of new interventions, but data collection can be problematic. Trial registration: International Standard Randomized Controlled Trial Number Register (ISRCTN number ISRCTN05628482). PMID:25061233
Automatic document classification of biological literature
Chen, David; Müller, Hans-Michael; Sternberg, Paul W
2006-01-01
Background Document classification is a wide-spread problem with many applications, from organizing search engine snippets to spam filtering. We previously described Textpresso, a text-mining system for biological literature, which marks up full text according to a shallow ontology that includes terms of biological interest. This project investigates document classification in the context of biological literature, making use of the Textpresso markup of a corpus of Caenorhabditis elegans literature. Results We present a two-step text categorization algorithm to classify a corpus of C. elegans papers. Our classification method first uses a support vector machine-trained classifier, followed by a novel, phrase-based clustering algorithm. This clustering step autonomously creates cluster labels that are descriptive and understandable by humans. This clustering engine performed better on a standard test-set (Reuters 21578) compared to previously published results (F-value of 0.55 vs. 0.49), while producing cluster descriptions that appear more useful. A web interface allows researchers to quickly navigate through the hierarchy and look for documents that belong to a specific concept. Conclusion We have demonstrated a simple method to classify biological documents that embodies an improvement over current methods. While the classification results are currently optimized for Caenorhabditis elegans papers by human-created rules, the classification engine can be adapted to different types of documents. We have demonstrated this by presenting a web interface that allows researchers to quickly navigate through the hierarchy and look for documents that belong to a specific concept. PMID:16893465
A Taxonomy of Accountable Care Organizations for Policy and Practice
Shortell, Stephen M; Wu, Frances M; Lewis, Valerie A; Colla, Carrie H; Fisher, Elliott S
2014-01-01
Objective To develop an exploratory taxonomy of Accountable Care Organizations (ACOs) to describe and understand early ACO development and to provide a basis for technical assistance and future evaluation of performance. Data Sources/Study Setting Data from the National Survey of Accountable Care Organizations, fielded between October 2012 and May 2013, of 173 Medicare, Medicaid, and commercial payer ACOs. Study Design Drawing on resource dependence and institutional theory, we develop measures of eight attributes of ACOs such as size, scope of services offered, and the use of performance accountability mechanisms. Data are analyzed using a two-step cluster analysis approach that accounts for both continuous and categorical data. Principal Findings We identified a reliable and internally valid three-cluster solution: larger, integrated systems that offer a broad scope of services and frequently include one or more postacute facilities; smaller, physician-led practices, centered in primary care, and that possess a relatively high degree of physician performance management; and moderately sized, joint hospital–physician and coalition-led groups that offer a moderately broad scope of services with some involvement of postacute facilities. Conclusions ACOs can be characterized into three distinct clusters. The taxonomy provides a framework for assessing performance, for targeting technical assistance, and for diagnosing potential antitrust violations. PMID:25251146
2014-01-01
Background As it becomes increasingly possible to obtain DNA sequences of orthologous genes from diverse sets of taxa, species trees are frequently being inferred from multilocus data. However, the behavior of many methods for performing this inference has remained largely unexplored. Some methods have been proven to be consistent given certain evolutionary models, whereas others rely on criteria that, although appropriate for many parameter values, have peculiar zones of the parameter space in which they fail to converge on the correct estimate as data sets increase in size. Results Here, using North American pines, we empirically evaluate the behavior of 24 strategies for species tree inference using three alternative outgroups (72 strategies total). The data consist of 120 individuals sampled in eight ingroup species from subsection Strobus and three outgroup species from subsection Gerardianae, spanning ∼47 kilobases of sequence at 121 loci. Each “strategy” for inferring species trees consists of three features: a species tree construction method, a gene tree inference method, and a choice of outgroup. We use multivariate analysis techniques such as principal components analysis and hierarchical clustering to identify tree characteristics that are robustly observed across strategies, as well as to identify groups of strategies that produce trees with similar features. We find that strategies that construct species trees using only topological information cluster together and that strategies that use additional non-topological information (e.g., branch lengths) also cluster together. Strategies that utilize more than one individual within a species to infer gene trees tend to produce estimates of species trees that contain clades present in trees estimated by other strategies. 
Strategies that use the minimize-deep-coalescences criterion to construct species trees tend to produce species tree estimates that contain clades that are not present in trees estimated by the Concatenation, RTC, SMRT, STAR, and STEAC methods, and that in general are more balanced than those inferred by these other strategies. Conclusions When constructing a species tree from a multilocus set of sequences, our observations provide a basis for interpreting differences in species tree estimates obtained via different approaches that have a two-stage structure in common, one step for gene tree estimation and a second step for species tree estimation. The methods explored here employ a number of distinct features of the data, and our analysis suggests that recovery of the same results from multiple methods that tend to differ in their patterns of inference can be a valuable tool for obtaining reliable estimates. PMID:24678701
DeGiorgio, Michael; Syring, John; Eckert, Andrew J; Liston, Aaron; Cronn, Richard; Neale, David B; Rosenberg, Noah A
2014-03-29
As it becomes increasingly possible to obtain DNA sequences of orthologous genes from diverse sets of taxa, species trees are frequently being inferred from multilocus data. However, the behavior of many methods for performing this inference has remained largely unexplored. Some methods have been proven to be consistent given certain evolutionary models, whereas others rely on criteria that, although appropriate for many parameter values, have peculiar zones of the parameter space in which they fail to converge on the correct estimate as data sets increase in size. Here, using North American pines, we empirically evaluate the behavior of 24 strategies for species tree inference using three alternative outgroups (72 strategies total). The data consist of 120 individuals sampled in eight ingroup species from subsection Strobus and three outgroup species from subsection Gerardianae, spanning ∼47 kilobases of sequence at 121 loci. Each "strategy" for inferring species trees consists of three features: a species tree construction method, a gene tree inference method, and a choice of outgroup. We use multivariate analysis techniques such as principal components analysis and hierarchical clustering to identify tree characteristics that are robustly observed across strategies, as well as to identify groups of strategies that produce trees with similar features. We find that strategies that construct species trees using only topological information cluster together and that strategies that use additional non-topological information (e.g., branch lengths) also cluster together. Strategies that utilize more than one individual within a species to infer gene trees tend to produce estimates of species trees that contain clades present in trees estimated by other strategies. 
Strategies that use the minimize-deep-coalescences criterion to construct species trees tend to produce species tree estimates that contain clades that are not present in trees estimated by the Concatenation, RTC, SMRT, STAR, and STEAC methods, and that in general are more balanced than those inferred by these other strategies. When constructing a species tree from a multilocus set of sequences, our observations provide a basis for interpreting differences in species tree estimates obtained via different approaches that have a two-stage structure in common, one step for gene tree estimation and a second step for species tree estimation. The methods explored here employ a number of distinct features of the data, and our analysis suggests that recovery of the same results from multiple methods that tend to differ in their patterns of inference can be a valuable tool for obtaining reliable estimates.
Analysis of the mutations induced by conazole fungicides in vivo.
Ross, Jeffrey A; Leavitt, Sharon A
2010-05-01
The mouse liver tumorigenic conazole fungicides triadimefon and propiconazole have previously been shown to be in vivo mouse liver mutagens in the Big Blue transgenic mutation assay when administered in feed at tumorigenic doses, whereas the non-tumorigenic conazole myclobutanil was not mutagenic. DNA sequencing of the mutants recovered from each treatment group as well as from animals receiving control diet was conducted to gain additional insight into the mode of action by which tumorigenic conazoles induce mutations. Relative dinucleotide mutabilities (RDMs) were calculated for each possible dinucleotide in each treatment group and then examined by multivariate statistical analysis techniques. Unsupervised hierarchical clustering analysis of RDM values segregated two independent control groups together, along with the non-tumorigen myclobutanil. The two tumorigenic conazoles clustered together in a distinct grouping. Partitioning around medoids of RDM values into two clusters also groups triadimefon and propiconazole together in one cluster and the two control groups and myclobutanil together in a second cluster. Principal component analysis of these results identifies two components that account for 88.3% of the variability in the points. Taken together, these results are consistent with the hypothesis that propiconazole- and triadimefon-induced mutations do not represent clonal expansion of background mutations and support the hypothesis that they arise from the accumulation of reactive electrophilic metabolic intermediates within the liver in vivo.
NASA Astrophysics Data System (ADS)
Kawahara, Hajime; Reese, Erik D.; Kitayama, Tetsu; Sasaki, Shin; Suto, Yasushi
2008-11-01
Our previous analysis indicates that small-scale fluctuations in the intracluster medium (ICM) from cosmological hydrodynamic simulations follow the lognormal probability density function. In order to test the lognormal nature of the ICM directly against X-ray observations of galaxy clusters, we develop a method of extracting statistical information about the three-dimensional properties of the fluctuations from the two-dimensional X-ray surface brightness. We first create a set of synthetic clusters with lognormal fluctuations around their mean profile given by spherical isothermal β-models, later considering polytropic temperature profiles as well. Performing mock observations of these synthetic clusters, we find that the resulting X-ray surface brightness fluctuations also follow the lognormal distribution fairly well. Systematic analysis of the synthetic clusters provides an empirical relation between the three-dimensional density fluctuations and the two-dimensional X-ray surface brightness. We analyze Chandra observations of the galaxy cluster Abell 3667, and find that its X-ray surface brightness fluctuations follow the lognormal distribution. While the lognormal model was originally motivated by cosmological hydrodynamic simulations, this is the first observational confirmation of the lognormal signature in a real cluster. Finally we check the synthetic cluster results against clusters from cosmological hydrodynamic simulations. As a result of the complex structure exhibited by simulated clusters, the empirical relation between the two- and three-dimensional fluctuation properties calibrated with synthetic clusters when applied to simulated clusters shows large scatter. Nevertheless we are able to reproduce the true value of the fluctuation amplitude of simulated clusters within a factor of 2 from their two-dimensional X-ray surface brightness alone. 
Our current methodology combined with existing observational data is useful in describing and inferring the statistical properties of the three-dimensional inhomogeneity in galaxy clusters.
Cluster Stability Estimation Based on a Minimal Spanning Trees Approach
NASA Astrophysics Data System (ADS)
Volkovich, Zeev (Vladimir); Barzily, Zeev; Weber, Gerhard-Wilhelm; Toledano-Kitai, Dvora
2009-08-01
Among the areas of data and text mining employed today in science, economy and technology, clustering serves as a preprocessing step in data analysis. However, there are many open questions still waiting for a theoretical and practical treatment; for example, the problem of determining the true number of clusters has not been satisfactorily solved. In the current paper, this problem is addressed by the cluster stability approach. For several possible numbers of clusters we estimate the stability of partitions obtained from clustering of samples. Partitions are considered consistent if their clusters are stable. Cluster validity is measured as the total number of edges, in the clusters' minimal spanning trees, connecting points from different samples. In effect, we use the Friedman and Rafsky two-sample test statistic. The homogeneity hypothesis, of well-mingled samples within the clusters, leads to an asymptotically normal distribution of the considered statistic. Based on this fact, the standard score of this edge count is computed, and the partition quality is represented by the worst cluster, corresponding to the minimal standard score value. It is natural to expect that the true number of clusters can be characterized by the empirical distribution having the shortest left tail. The proposed methodology sequentially creates the described value distribution and estimates its left-asymmetry. Numerical experiments, presented in the paper, demonstrate the ability of the approach to detect the true number of clusters.
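The edge count at the heart of this approach can be sketched as follows: build a minimal spanning tree over the points of one cluster and count the tree edges whose endpoints come from different samples. This is a minimal sketch assuming Euclidean distance and a simple O(n^2) Prim's algorithm; the function names and toy points are hypothetical, and the standardisation to a z-score described in the abstract is not shown.

```python
import math


def mst_edges(points):
    """Edges (u, v) of a Euclidean minimal spanning tree, via Prim's algorithm."""
    n = len(points)
    if n < 2:
        return []
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known connection cost to the tree
    parent = [-1] * n       # tree vertex providing that connection
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the cheapest vertex not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges


def cross_sample_edges(points, sample_labels):
    """Number of MST edges joining points drawn from different samples."""
    return sum(1 for u, v in mst_edges(points)
               if sample_labels[u] != sample_labels[v])


# Hypothetical cluster: four points on a line, drawn from two samples.
pts = [(0.0,), (1.0,), (2.0,), (3.0,)]
separated = cross_sample_edges(pts, [0, 0, 1, 1])  # samples occupy different ends
mingled = cross_sample_edges(pts, [0, 1, 0, 1])    # samples alternate
```

A larger count means the samples are well mingled inside the cluster, which is the behaviour expected of a stable cluster under the homogeneity hypothesis.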
Py, Béatrice; Barras, Frédéric
2015-06-01
Since their discovery in the 50's, Fe-S cluster proteins have attracted much attention from chemists, biophysicists and biochemists. However, in the 80's they were joined by geneticists who helped to realize that in vivo maturation of Fe-S cluster bound proteins required assistance of a large number of factors defining complex multi-step pathways. The question of how clusters are formed and distributed in vivo has since been the focus of much effort. Here we review how genetics in discovering genes and investigating processes as they unfold in vivo has provoked seminal advances toward our understanding of Fe-S cluster biogenesis. The power and limitations of genetic approaches are discussed. As a final comment, we argue how the marriage of classic strategies and new high-throughput technologies should allow genetics of Fe-S cluster biology to be even more insightful in the future. This article is part of a Special Issue entitled: Fe/S proteins: Analysis, structure, function, biogenesis and diseases. Copyright © 2015 Elsevier B.V. All rights reserved.
Innovating Big Data Computing Geoprocessing for Analysis of Engineered-Natural Systems
NASA Astrophysics Data System (ADS)
Rose, K.; Baker, V.; Bauer, J. R.; Vasylkivska, V.
2016-12-01
Big data computing and analytical techniques offer opportunities to improve predictions about subsurface systems while quantifying and characterizing associated uncertainties from these analyses. Spatial analyses, big data and otherwise, of subsurface natural and engineered systems are based on variable resolution, discontinuous, and often point-driven data to represent continuous phenomena. We will present examples from two spatio-temporal methods that have been adapted for use with big datasets and big data geo-processing capabilities. The first approach uses regional earthquake data to evaluate spatio-temporal trends associated with natural and induced seismicity. The second algorithm, the Variable Grid Method (VGM), is a flexible approach that presents spatial trends and patterns, such as those resulting from interpolation methods, while simultaneously visualizing and quantifying uncertainty in the underlying spatial datasets. In this presentation we will show how we are utilizing Hadoop to store and perform spatial analyses to efficiently consume and utilize large geospatial data in these custom analytical algorithms through the development of custom Spark and MapReduce applications that incorporate ESRI Hadoop libraries. The team will present custom 'Big Data' geospatial applications that run on the Hadoop cluster and integrate with ESRI ArcMap with the team's probabilistic VGM approach. The VGM-Hadoop tool has been specially built as a multi-step MapReduce application running on the Hadoop cluster for the purpose of data reduction. This reduction is accomplished by generating multi-resolution, non-overlapping, attributed topology that is then further processed using ESRI's geostatistical analyst to convey a probabilistic model of a chosen study region.
Finally, we will share our approach for implementation of data reduction and topology generation via custom multi-step Hadoop applications, performance benchmarking comparisons, and Hadoop-centric opportunities for greater parallelization of geospatial operations.
NASA Astrophysics Data System (ADS)
Wang, Xiao; Gao, Feng; Dong, Junyu; Qi, Qiang
2018-04-01
Synthetic aperture radar (SAR) imagery is independent of atmospheric conditions, making it an ideal image source for change detection. Existing methods directly analyze all regions of the speckle-noise-contaminated difference image, so their performance is easily degraded by small noisy regions. In this paper, we propose a novel saliency-guided change detection framework based on pattern and intensity distinctiveness analysis. The saliency analysis step can remove small noisy regions and therefore makes the proposed method more robust to speckle noise. In the proposed method, the log-ratio operator is first used to obtain a difference image (DI). Then, a saliency detection method based on pattern and intensity distinctiveness analysis is used to obtain the changed-region candidates. Finally, principal component analysis and k-means clustering are employed to analyze pixels in the changed-region candidates, and the final change map is obtained by classifying these pixels as changed or unchanged. Experimental results on two real SAR image datasets demonstrate the effectiveness of the proposed method.
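As an illustration of the first step only, the log-ratio operator can be sketched as follows. This is a minimal sketch, not the authors' implementation: the function name `log_ratio_difference`, the nested-list image representation, and the `eps` offset (added to avoid taking the logarithm of zero) are assumptions made for this example, and the pixel values are invented toy data.

```python
import math


def log_ratio_difference(img1, img2, eps=1.0):
    """Per-pixel |log((img2 + eps) / (img1 + eps))| for two same-shaped images.

    img1 and img2 are nested lists of non-negative intensities.
    eps is an assumed offset that keeps the logarithm finite for zero pixels.
    """
    same_rows = len(img1) == len(img2)
    if not same_rows or any(len(a) != len(b) for a, b in zip(img1, img2)):
        raise ValueError("images must have the same shape")
    return [
        [abs(math.log((b + eps) / (a + eps))) for a, b in zip(row1, row2)]
        for row1, row2 in zip(img1, img2)
    ]


# Toy 2x2 images: only the bottom-right pixel changes between acquisitions.
before = [[10.0, 10.0], [10.0, 10.0]]
after = [[10.0, 10.0], [10.0, 120.0]]
di = log_ratio_difference(before, after)
```

Unchanged pixels map to zero and the changed pixel to a large value, which is what a later saliency or clustering stage would operate on. Working with the ratio rather than the difference suits the multiplicative nature of speckle noise.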
Botía, Juan A; Vandrovcova, Jana; Forabosco, Paola; Guelfi, Sebastian; D'Sa, Karishma; Hardy, John; Lewis, Cathryn M; Ryten, Mina; Weale, Michael E
2017-04-12
Weighted Gene Co-expression Network Analysis (WGCNA) is a widely used R software package for the generation of gene co-expression networks (GCN). WGCNA generates both a GCN and a derived partitioning of clusters of genes (modules). We propose k-means clustering as an additional processing step to conventional WGCNA, which we have implemented in the R package km2gcn (k-means to gene co-expression network, https://github.com/juanbot/km2gcn ). We assessed our method on networks created from UKBEC data (10 different human brain tissues), on networks created from GTEx data (42 human tissues, including 13 brain tissues), and on simulated networks derived from GTEx data. We observed substantially improved module properties, including: (1) few or zero misplaced genes; (2) increased counts of replicable clusters in alternate tissues (x3.1 on average); (3) improved enrichment of Gene Ontology terms (seen in 48/52 GCNs) (4) improved cell type enrichment signals (seen in 21/23 brain GCNs); and (5) more accurate partitions in simulated data according to a range of similarity indices. The results obtained from our investigations indicate that our k-means method, applied as an adjunct to standard WGCNA, results in better network partitions. These improved partitions enable more fruitful downstream analyses, as gene modules are more biologically meaningful.
Martín, Juan F.; Liras, Paloma
2017-01-01
The clavine alkaloids produced by the fungi of the Aspergillaceae and Arthrodermataceae families differ from the ergot alkaloids produced by Claviceps and Neotyphodium. The clavine alkaloids lack the extensive peptide chain modifications that occur in lysergic acid-derived ergot alkaloids. Both clavine and ergot alkaloids arise from the condensation of tryptophan and dimethylallylpyrophosphate by the action of the dimethylallyltryptophan synthase. The first five steps of the biosynthetic pathway, which convert tryptophan and dimethylallyl-pyrophosphate (DMA-PP) into chanoclavine-1-aldehyde, are common to both clavine and ergot alkaloids. The biosynthesis of ergot alkaloids has been extensively studied and is not considered in this article. We focus this review on recent advances in the gene clusters for clavine alkaloids in the species of Penicillium, Aspergillus (Neosartorya), Arthroderma and Trichophyton and the enzymes encoded by them. The final products of the clavine alkaloid pathways derive from the tetracyclic ergoline ring, which is modified by late enzymes, including a reverse-type prenyltransferase, P450 monooxygenases and acetyltransferases. In Aspergillus japonicus, an α-ketoglutarate and Fe2+-dependent dioxygenase is involved in the cyclization of a festuclavine-like intermediate of unknown type into cycloclavine. Related dioxygenases occur in the biosynthetic gene clusters of ergot alkaloids in Claviceps purpurea and also in the clavine clusters in Penicillium species. The final products of the clavine alkaloid pathway in these fungi differ from each other depending on the late biosynthetic enzymes involved. An important difference between clavine and ergot alkaloid pathways is that clavine producers lack the enzyme CloA, a P450 monooxygenase involved in one of the steps of the conversion of chanoclavine-1-aldehyde into lysergic acid.
Bioinformatic analysis of the sequenced genomes of the Aspergillaceae and Arthrodermataceae fungi showed the presence of clavine gene clusters in Arthroderma species, Penicillium roqueforti, Penicillium commune, Penicillium camemberti, Penicillium expansum, Penicillium steckii and Penicillium griseofulvum. Analysis of the gene clusters in several clavine alkaloid producers indicates that there are gene gains, gene losses and gene rearrangements. These findings may be explained by divergent evolution of the ergot and clavine alkaloid gene clusters from a common ancestral six-gene cluster, although horizontal gene transfer of some specific genes may have occurred more recently. PMID:29186777
On the Partitioning of Squared Euclidean Distance and Its Applications in Cluster Analysis.
ERIC Educational Resources Information Center
Carter, Randy L.; And Others
1989-01-01
The partitioning of squared Euclidean--E(sup 2)--distance between two vectors in M-dimensional space into the sum of squared lengths of vectors in mutually orthogonal subspaces is discussed. Applications to specific cluster analysis problems are provided (i.e., to design Monte Carlo studies for performance comparisons of several clustering methods…
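The partition named in this abstract can be stated compactly. The notation below is an assumption introduced for illustration and is not taken from the article: $V_1, \dots, V_K$ are mutually orthogonal subspaces whose direct sum is the whole M-dimensional space, and $P_k$ is the orthogonal projector onto $V_k$.

```latex
% Assumed notation: x, y \in \mathbb{R}^M, \quad
% \mathbb{R}^M = V_1 \oplus \cdots \oplus V_K with V_j \perp V_k for j \neq k,
% P_k = orthogonal projector onto V_k, so that \sum_{k=1}^{K} P_k = I.
E^2(x, y) = \lVert x - y \rVert^2
          = \Bigl\lVert \sum_{k=1}^{K} P_k (x - y) \Bigr\rVert^2
          = \sum_{k=1}^{K} \lVert P_k (x - y) \rVert^2
```

The last equality is the Pythagorean theorem: the cross terms $\langle P_j(x-y), P_k(x-y) \rangle$ vanish for $j \neq k$ because the subspaces are orthogonal. For example, with $V_1$ spanned by the vector of ones and $V_2$ its orthogonal complement, the first term measures the difference in mean level between the two profiles and the second the difference in their mean-centred parts.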
Maturation of nitrogenase cofactor—the role of a class E radical SAM methyltransferase NifB
Hu, Yilin; Ribbe, Markus W.
2016-01-01
Nitrogenase catalyzes the important reactions of N2-, CO- and CO2-reduction at its active cofactor site. Designated the M-cluster, this complex metallocofactor is assembled through the generation of a characteristic 8Fe-core prior to the insertion of Mo and homocitrate that completes the stoichiometry of the M-cluster. NifB catalyzes the critical step of radical SAM-dependent carbide insertion that occurs concomitant with the insertion of a “9th” sulfur and the rearrangement/coupling of two 4Fe-clusters into a complete 8Fe-core of the M-cluster. Further categorization of a family of NifB proteins as a new class of radical SAM methyltransferases suggests a general function of these proteins in complex metallocofactor assembly and provides a new platform for unveiling unprecedented chemical reactions catalyzed by biological systems. PMID:26969410
Automated detection of microcalcification clusters in mammograms
NASA Astrophysics Data System (ADS)
Karale, Vikrant A.; Mukhopadhyay, Sudipta; Singh, Tulika; Khandelwal, Niranjan; Sadhu, Anup
2017-03-01
Mammography is the most efficient modality for detection of breast cancer at an early stage. Microcalcifications are tiny bright spots in mammograms and can often be missed by the radiologist during diagnosis. The presence of microcalcification clusters in mammograms can act as an early sign of breast cancer. This paper presents a completely automated computer-aided detection (CAD) system for detection of microcalcification clusters in mammograms. Unsharp masking is used as a preprocessing step to enhance the contrast between microcalcifications and the background. The preprocessed image is thresholded, and various shape- and intensity-based features are extracted. A support vector machine (SVM) classifier is used to reduce the false positives while preserving the true microcalcification clusters. The proposed technique is applied to two different databases, i.e., DDSM and a private database. The proposed technique shows good sensitivity with moderate false positives (FPs) per image on both databases.
Cappelletti, Martina; Presentato, Alessandro; Milazzo, Giorgio; Turner, Raymond J.; Fedi, Stefano; Frascari, Dario; Zannoni, Davide
2015-01-01
Rhodococcus sp. strain BCP1 was initially isolated for its ability to grow on gaseous n-alkanes, which act as inducers for the co-metabolic degradation of low-chlorinated compounds. Here, both molecular and metabolic features of BCP1 cells grown on gaseous and short-chain n-alkanes (up to n-heptane) were examined in detail. We show that propane metabolism generated terminal and sub-terminal oxidation products such as 1- and 2-propanol, whereas 1-butanol was the only terminal oxidation product detected from n-butane metabolism. Two gene clusters, prmABCD and smoABCD, coding for Soluble Di-Iron Monooxygenases (SDIMOs) involved in gaseous n-alkane oxidation, were detected in the BCP1 genome. By means of Reverse Transcriptase-quantitative PCR (RT-qPCR) analysis, a set of substrates inducing the expression of the sdimo genes in BCP1 was assessed, as well as their transcriptional repression in the presence of sugars, organic acids, or during cell growth on rich medium (Luria–Bertani broth). The transcriptional start sites of both sdimo gene clusters were identified by means of primer extension experiments. Finally, proteomic studies revealed changes in the protein pattern induced by growth on gaseous (n-butane) and/or liquid (n-hexane) short-chain n-alkanes as compared to growth on succinate. Among the differentially expressed protein spots, two chaperonins and an isocitrate lyase were identified, along with oxidoreductases involved in oxidation reactions downstream of the initial monooxygenase reaction step. PMID:26029173
Structural and morphological properties of mesoporous carbon coated molybdenum oxide films
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dayal, Saurabh, E-mail: saurabhdayal153@gmail.com; Kumar, C. Sasi, E-mail: csasimv@gmail.com
2016-05-06
In the present study, we report the structural and morphological properties of mesoporous carbon coated molybdenum oxide films. The films were deposited in a two-step process. The first step involved deposition of molybdenum and carbon bilayer thin films using DC magnetron sputtering. In the second step, the sample was annealed ex situ in a muffle furnace at different temperatures (400°C to 600°C) and air cooled in the ambient atmosphere. The formation of the mesoporous carbon clusters on molybdenum oxide during the cooling step was investigated using FESEM and AFM techniques. The structural details were explored using XRD. The mesoporous carbon was found to grow over the molybdenum oxide layer as a result of segregation phenomena.
NASA Astrophysics Data System (ADS)
Truong, Thanh N.; Stefanovich, Eugene V.
1997-05-01
We present a study of micro-solvation of the Cl anion by water clusters of up to seven molecules using a perturbative Monte Carlo approach with a hybrid HF/MM potential. In this approach, perturbation theory is used to avoid performing full SCF calculations at every Monte Carlo step. In this study, the anion is treated quantum mechanically at the HF/6-31G* level of theory, while interactions between solvent waters are represented by the TIP3P potential force field. Analysis of the solvent-induced dipole moment of the ion indicates that the Cl anion resides most of the time on the surface of the clusters. The accuracy of the perturbative MC approach is also discussed.
Automatic Semantic Orientation of Adjectives for Indonesian Language Using PMI-IR and Clustering
NASA Astrophysics Data System (ADS)
Riyanti, Dewi; Arif Bijaksana, M.; Adiwijaya
2018-03-01
We present our work on sentiment analysis for the Indonesian language. We focus on building automatic semantic orientation using available resources in Indonesian. In this research we used an Indonesian corpus of 9 million words from kompas.txt and tempo.txt that was manually tagged and annotated with a part-of-speech tagset. We then constructed a dataset by taking all the adjectives from the corpus and removing adjectives with no orientation. The resulting set contained 923 adjectives. The system includes several steps, such as text pre-processing and clustering. The text pre-processing aims to increase accuracy. Finally, the clustering method classifies each word into its sentiment class, positive or negative. With improvements to the text pre-processing, an accuracy of 72% was achieved.
Towards Effective Clustering Techniques for the Analysis of Electric Power Grids
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hogan, Emilie A.; Cotilla Sanchez, Jose E.; Halappanavar, Mahantesh
2013-11-30
Clustering is an important data analysis technique with numerous applications in the analysis of electric power grids. Standard clustering techniques are oblivious to the rich structural and dynamic information available for power grids. Therefore, by exploiting the inherent topological and electrical structure in the power grid data, we propose new methods for clustering with applications to model reduction, locational marginal pricing, phasor measurement unit (PMU or synchrophasor) placement, and power system protection. We focus our attention on model reduction for analysis based on time-series information from synchrophasor measurement devices, and spectral techniques for clustering. By comparing different clustering techniques on two instances of realistic power grids we show that the solutions are related and therefore one could leverage that relationship for a computational advantage. Thus, by contrasting different clustering techniques we make a case for exploiting structure inherent in the data with implications for several domains including power systems.
Wada, Masayoshi; Takahashi, Hiroki; Altaf-Ul-Amin, Md; Nakamura, Kensuke; Hirai, Masami Y; Ohta, Daisaku; Kanaya, Shigehiko
2012-07-15
Operon-like arrangements of genes occur in eukaryotes ranging from yeasts and filamentous fungi to nematodes, plants, and mammals. In plants, several examples of operon-like gene clusters involved in metabolic pathways have recently been characterized, e.g. the cyclic hydroxamic acid pathways in maize, the avenacin biosynthesis gene clusters in oat, the thalianol pathway in Arabidopsis thaliana, and the diterpenoid momilactone cluster in rice. Such operon-like gene clusters are defined by their co-regulation or by neighboring positions within the immediate vicinity of chromosomal regions. A comprehensive analysis of the expression of neighboring genes is therefore a crucial step toward revealing the complete set of operon-like gene clusters within a genome. Genome-wide prediction of operon-like gene clusters should contribute to functional annotation efforts and also provide novel insight into evolutionary aspects of the acquisition of certain biological functions. We predicted co-expressed gene clusters by comparing the Pearson correlation coefficient of neighboring genes and randomly selected gene pairs, based on a statistical method that takes false discovery rate (FDR) into consideration, for 1469 microarray gene expression datasets of A. thaliana. We estimated that A. thaliana contains 100 operon-like gene clusters in total. We predicted 34 statistically significant gene clusters consisting of 3 to 22 genes each, based on a stringent FDR threshold of 0.1. Functional relationships among genes in individual clusters were estimated by sequence similarity and functional annotation of genes. Duplicated gene pairs (determined based on BLAST with a cutoff of E < 10^-5) are included in 27 clusters. Five clusters are associated with metabolism, containing P450 genes restricted to the Brassica family and predicted to be involved in secondary metabolism.
Operon-like clusters tend to include genes encoding bio-machinery associated with ribosomes, the ubiquitin/proteasome system, secondary metabolic pathways, lipid and fatty-acid metabolism, and the lipid transfer system. Copyright © 2012 Elsevier B.V. All rights reserved.
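The two ingredients named in this abstract, Pearson correlation between chromosomal neighbors and an FDR-controlled significance call, can be sketched as follows. This is a hedged illustration only: the abstract does not say which FDR procedure was used, so the Benjamini-Hochberg step-up rule is an assumption here, as are the function names and all numbers in the toy data. How p-values are derived from the comparison with randomly selected gene pairs is not shown.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def neighbor_correlations(profiles):
    """Correlation of each gene with its next neighbor.

    profiles: list of per-gene expression vectors in chromosomal order.
    """
    return [pearson(profiles[i], profiles[i + 1]) for i in range(len(profiles) - 1)]


def benjamini_hochberg(pvalues, fdr=0.1):
    """Boolean significance calls under the Benjamini-Hochberg step-up rule."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= fdr * rank / m:
            cutoff_rank = rank  # keep the largest rank that passes
    significant = [False] * m
    for idx in order[:cutoff_rank]:
        significant[idx] = True
    return significant


# Hypothetical toy data: three genes measured in four samples.
profiles = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1]]
correlations = neighbor_correlations(profiles)

# Hypothetical p-values for four candidate neighbor pairs.
calls = benjamini_hochberg([0.001, 0.04, 0.03, 0.5], fdr=0.1)
```

In this toy input the first neighbor pair is perfectly correlated and the second perfectly anti-correlated, and three of the four hypothetical p-values pass at an FDR of 0.1, the threshold quoted in the abstract.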
Assessment of cluster yield components by image analysis.
Diago, Maria P; Tardaguila, Javier; Aleixos, Nuria; Millan, Borja; Prats-Montalban, Jose M; Cubero, Sergio; Blasco, Jose
2015-04-01
Berry weight, berry number and cluster weight are key parameters for yield estimation in the wine and table grape industries. Current yield prediction methods are destructive, labour-demanding and time-consuming. In this work, a new methodology based on image analysis was developed to determine cluster yield components in a fast and inexpensive way. Clusters of seven different red varieties of grapevine (Vitis vinifera L.) were photographed under laboratory conditions and their cluster yield components manually determined after image acquisition. Two algorithms based on the Canny and the logarithmic image processing approaches were tested to find the contours of the berries in the images prior to berry detection performed by means of the Hough Transform. Results were obtained in two ways: by analysing either a single image of the cluster or four images per cluster from different orientations. The best results (R² between 69% and 95% in berry detection and between 65% and 97% in cluster weight estimation) were achieved using four images and the Canny algorithm. The capability of the image-analysis-based model to predict berry weight was 84%. The new and low-cost methodology presented here enabled the assessment of cluster yield components, saving time and providing inexpensive information in comparison with current manual methods. © 2014 Society of Chemical Industry.
2 x 2 Achievement Goals and Achievement Emotions: A Cluster Analysis of Students' Motivation
ERIC Educational Resources Information Center
Jang, Leong Yeok; Liu, Woon Chia
2012-01-01
This study sought to better understand the adoption of multiple achievement goals at an intra-individual level, and its links to emotional well-being, learning, and academic achievement. Participants were 480 Secondary Two students (aged between 13 and 14 years) from two coeducational government schools. Hierarchical cluster analysis revealed the…
Spot detection and image segmentation in DNA microarray data.
Qin, Li; Rueda, Luis; Ali, Adnan; Ngom, Alioune
2005-01-01
Following the invention of microarrays in 1994, the development and applications of this technology have grown exponentially. The numerous applications of microarray technology include clinical diagnosis and treatment, drug design and discovery, tumour detection, and environmental health research. One of the key issues in the experimental approaches utilising microarrays is to extract quantitative information from the spots, which represent genes in a given experiment. For this process, the initial stages are important and they influence future steps in the analysis. Identifying the spots and separating the background from the foreground is a fundamental problem in DNA microarray data analysis. In this review, we present an overview of state-of-the-art methods for microarray image segmentation. We discuss the foundations of the circle-shaped approach, adaptive shape segmentation, histogram-based methods and the recently introduced clustering-based techniques. We analytically show that clustering-based techniques are equivalent to the one-dimensional, standard k-means clustering algorithm that utilises the Euclidean distance.
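The one-dimensional k-means algorithm mentioned at the end of this abstract can be sketched in a few lines. This is an illustrative sketch rather than any method from the review: the function name `kmeans_1d`, the evenly spaced deterministic initialisation, and the pixel intensities are all assumptions made for the example.

```python
def kmeans_1d(values, k=2, iterations=100):
    """Standard (Lloyd) k-means on scalar values with squared Euclidean distance.

    Returns (labels, centroids). Initial centroids are spread evenly over the
    value range, an assumed choice that keeps the example deterministic.
    """
    values = list(values)
    lo, hi = min(values), max(values)
    centroids = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: (v - centroids[c]) ** 2) for v in values
        ]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centroids.append(sum(members) / len(members) if members else centroids[c])
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels, centroids


# Hypothetical pixel intensities: dim background and a bright spot.
pixels = [12, 15, 11, 14, 200, 210, 190, 13]
labels, centroids = kmeans_1d(pixels)
```

With two clusters, label 1 marks the brighter group, which here plays the role of the spot foreground, and label 0 the background.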
Marateb, Hamid Reza; Mansourian, Marjan; Adibi, Peyman; Farina, Dario
2014-01-01
Background: selecting the correct statistical test and data mining method depends highly on the measurement scale of the data, the type of variables, and the purpose of the analysis. Different measurement scales are studied in detail, and statistical comparison, modeling, and data mining methods are studied using several medical examples. We present two ordinal-variable clustering examples, ordinal variables being the more challenging variable type in analysis, using the Wisconsin Breast Cancer Data (WBCD). Ordinal-to-interval scale conversion example: a breast cancer database of nine 10-level ordinal variables for 683 patients was analyzed by two ordinal-scale clustering methods. The performance of the clustering methods was assessed by comparison with the gold standard groups of malignant and benign cases that had been identified by clinical tests. Results: the sensitivity and accuracy of the two clustering methods were 98% and 96%, respectively. Their specificity was comparable. Conclusion: by using an appropriate clustering algorithm based on the measurement scale of the variables in the study, high performance can be achieved. Moreover, descriptive and inferential statistics, in addition to the modeling approach, must be selected based on the scale of the variables. PMID:24672565
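The performance measures reported above (sensitivity, specificity, accuracy) are computed by comparing cluster assignments with gold-standard labels. The following is a minimal sketch of that comparison under stated assumptions: the function name `diagnostic_metrics` and the eight toy cases are hypothetical and are not drawn from the WBCD data, and each cluster is assumed to have already been mapped to a class name.

```python
def diagnostic_metrics(truth, predicted, positive="malignant"):
    """Sensitivity, specificity and accuracy for a two-class labelling.

    truth and predicted are equal-length sequences of class names;
    `positive` names the class treated as the positive finding.
    """
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")
    pairs = list(zip(truth, predicted))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
        "accuracy": (tp + tn) / len(pairs) if pairs else float("nan"),
    }


# Hypothetical toy labels: 3 malignant and 5 benign cases.
truth = ["malignant"] * 3 + ["benign"] * 5
predicted = ["malignant", "malignant", "benign",
             "benign", "benign", "benign", "benign", "malignant"]
metrics = diagnostic_metrics(truth, predicted)
```

Sensitivity is the fraction of malignant cases placed in the malignant cluster, specificity the fraction of benign cases placed in the benign cluster, and accuracy the overall fraction of agreeing cases.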
Accidents of Electrical and Mechanical Works for Public Sector Projects in Hong Kong.
Wong, Francis K W; Chan, Albert P C; Wong, Andy K D; Hon, Carol K H; Choi, Tracy N Y
2018-03-10
A study on electrical and mechanical (E&M) works-related accidents for public sector projects provided the opportunity to gain a better understanding of the causes of accidents by analyzing the circumstances of all E&M works accidents. The research aims to examine accidents of E&M works which happened in public sector projects. A total of 421 E&M works-related accidents in the "Public Works Programme Construction Site Safety and Environmental Statistics" (PCSES) system were extracted for analysis. Two-step cluster analysis was conducted to classify the E&M accidents into different groups. The results identified three E&M accidents groups: (1) electricians with over 15 years of experience were prone to 'fall of person from height'; (2) electricians with zero to five years of experience were prone to 'slip, trip or fall on same level'; (3) air-conditioning workers with zero to five years of experience were prone to multiple types of accidents. Practical measures were recommended for each specific cluster group to avoid recurrence of similar accidents. The accident analysis would be vital for industry practitioners to enhance the safety performance of public sector projects. This study contributes to filling the knowledge gap of how and why E&M accidents occur and promulgating preventive measures for E&M accidents which have been under researched.
Accidents of Electrical and Mechanical Works for Public Sector Projects in Hong Kong
Wong, Francis K. W.; Chan, Albert P. C.; Wong, Andy K. D.; Choi, Tracy N. Y.
2018-01-01
A study on electrical and mechanical (E&M) works-related accidents for public sector projects provided the opportunity to gain a better understanding of the causes of accidents by analyzing the circumstances of all E&M works accidents. The research aims to examine accidents of E&M works which happened in public sector projects. A total of 421 E&M works-related accidents in the “Public Works Programme Construction Site Safety and Environmental Statistics” (PCSES) system were extracted for analysis. Two-step cluster analysis was conducted to classify the E&M accidents into different groups. The results identified three E&M accidents groups: (1) electricians with over 15 years of experience were prone to ‘fall of person from height’; (2) electricians with zero to five years of experience were prone to ‘slip, trip or fall on same level’; (3) air-conditioning workers with zero to five years of experience were prone to multiple types of accidents. Practical measures were recommended for each specific cluster group to avoid recurrence of similar accidents. The accident analysis would be vital for industry practitioners to enhance the safety performance of public sector projects. This study contributes to filling the knowledge gap of how and why E&M accidents occur and promulgating preventive measures for E&M accidents which have been under researched. PMID:29534429
Storbeck, Sonja; Rolfes, Sarah; Raux-Deery, Evelyne; Warren, Martin J; Jahn, Dieter; Layer, Gunhild
2010-12-13
Heme is an essential prosthetic group for many proteins involved in fundamental biological processes in all three domains of life. In Eukaryota and Bacteria heme is formed via a conserved and well-studied biosynthetic pathway. Surprisingly, in Archaea heme biosynthesis proceeds via an alternative route which is poorly understood. In order to formulate a working hypothesis for this novel pathway, we searched 59 completely sequenced archaeal genomes for the presence of gene clusters consisting of established heme biosynthetic genes and colocalized conserved candidate genes. Within the majority of archaeal genomes it was possible to identify such heme biosynthesis gene clusters. From this analysis we have been able to identify several novel heme biosynthesis genes that are restricted to archaea. Intriguingly, several of the encoded proteins display similarity to enzymes involved in heme d(1) biosynthesis. To initiate an experimental verification of our proposals two Methanosarcina barkeri proteins predicted to catalyze the initial steps of archaeal heme biosynthesis were recombinantly produced, purified, and their predicted enzymatic functions verified.
Shin, Sung Hee; Yun, Eun Kyoung
2011-06-01
This study was conducted to explore the profiles of online health information users in terms of certain psychological characteristics and to suggest guidelines for the provision of better user-oriented health information service. The cross-sectional study design was used with convenient sampling by Web-based questionnaire survey in Korea. To analyze health information user profiles on the Internet, a two-step cluster analysis was conducted. The results reveal that online health information users can be classified into four groups according to their level of subjective knowledge and health concern. The findings also suggest that four clusters that exhibit distinct profile patterns exist. The findings of this study would be useful for health portal developers who would like to understand users' characteristics and behaviors and to provide more user-oriented service in a satisfactory manner. It is suggested that to develop a full understanding of users' behaviors regarding Internet health information service, further research would be needed to explore users' various needs, their preferences, and relevant factors among users across a variety of health problem-addressing Web sites at different professional levels.
Storbeck, Sonja; Rolfes, Sarah; Raux-Deery, Evelyne; Warren, Martin J.; Jahn, Dieter; Layer, Gunhild
2010-01-01
Heme is an essential prosthetic group for many proteins involved in fundamental biological processes in all three domains of life. In Eukaryota and Bacteria heme is formed via a conserved and well-studied biosynthetic pathway. Surprisingly, in Archaea heme biosynthesis proceeds via an alternative route which is poorly understood. In order to formulate a working hypothesis for this novel pathway, we searched 59 completely sequenced archaeal genomes for the presence of gene clusters consisting of established heme biosynthetic genes and colocalized conserved candidate genes. Within the majority of archaeal genomes it was possible to identify such heme biosynthesis gene clusters. From this analysis we have been able to identify several novel heme biosynthesis genes that are restricted to archaea. Intriguingly, several of the encoded proteins display similarity to enzymes involved in heme d 1 biosynthesis. To initiate an experimental verification of our proposals two Methanosarcina barkeri proteins predicted to catalyze the initial steps of archaeal heme biosynthesis were recombinantly produced, purified, and their predicted enzymatic functions verified. PMID:21197080
Chapter 7. Cloning and analysis of natural product pathways.
Gust, Bertolt
2009-01-01
The identification of gene clusters of natural products has led to an enormous wealth of information about their biosynthesis and its regulation, and about self-resistance mechanisms. Well-established routine techniques are now available for the cloning and sequencing of gene clusters. The subsequent functional analysis of the complex biosynthetic machinery requires efficient genetic tools for manipulation. Until recently, techniques for the introduction of defined changes into Streptomyces chromosomes were very time-consuming. In particular, manipulation of large DNA fragments has been challenging due to the absence of suitable restriction sites for restriction- and ligation-based techniques. The homologous recombination approach called recombineering (referred to as Red/ET-mediated recombination in this chapter) has greatly facilitated targeted genetic modifications of complex biosynthetic pathways from actinomycetes by eliminating many of the time-consuming and labor-intensive steps. This chapter describes techniques for the cloning and identification of biosynthetic gene clusters, for the generation of gene replacements within such clusters, for the construction of integrative library clones and their expression in heterologous hosts, and for the assembly of entire biosynthetic gene clusters from the inserts of individual library clones. A systematic approach toward insertional mutation of a complete Streptomyces genome is shown by the use of an in vitro transposon mutagenesis procedure.
NASA Technical Reports Server (NTRS)
Hasler, Nicole; Bulbul, Esra; Bonamente, Massimiliano; Carlstrom, John E.; Culverhouse, Thomas L.; Gralla, Megan; Greer, Christopher; Lamb, James W.; Hawkins, David; Hennessy, Ryan;
2012-01-01
We perform a joint analysis of X-ray and Sunyaev-Zel'dovich effect data using an analytic model that describes the gas properties of galaxy clusters. The joint analysis allows the measurement of the cluster gas mass fraction profile and Hubble constant independent of cosmological parameters. Weak cosmological priors are used to calculate the overdensity radius within which the gas mass fractions are reported. Such an analysis can provide direct constraints on the evolution of the cluster gas mass fraction with redshift. We validate the model and the joint analysis on high signal-to-noise data from the Chandra X-ray Observatory and the Sunyaev-Zel'dovich Array for two clusters, A2631 and A2204.
NASA Technical Reports Server (NTRS)
Wharton, S. W.
1980-01-01
An Interactive Cluster Analysis Procedure (ICAP) was developed to derive classifier training statistics from remotely sensed data. The algorithm interfaces the rapid numerical processing capacity of a computer with the human ability to integrate qualitative information. Control of the clustering process alternates between the algorithm, which creates new centroids and forms clusters, and the analyst, who evaluates the cluster structure and may elect to modify it. Clusters can be deleted or lumped pairwise, or new centroids can be added. A summary of the cluster statistics can be requested to facilitate cluster manipulation. The ICAP was implemented in APL (A Programming Language), an interactive computer language. The flexibility of the algorithm was evaluated using data from different LANDSAT scenes to simulate two situations: one in which the analyst is assumed to have no prior knowledge about the data and wishes to have the clusters formed more or less automatically; and the other in which the analyst is assumed to have some knowledge about the data structure and wishes to use that information to closely supervise the clustering process. For comparison, an existing clustering method was also applied to the two data sets.
TCW: Transcriptome Computational Workbench
Soderlund, Carol; Nelson, William; Willer, Mark; Gang, David R.
2013-01-01
Background The analysis of transcriptome data involves many steps and various programs, along with organization of large amounts of data and results. Without a methodical approach for storage, analysis and query, the resulting ad hoc analysis can lead to human error, loss of data and results, inefficient use of time, and lack of verifiability, repeatability, and extensibility. Methodology The Transcriptome Computational Workbench (TCW) provides Java graphical interfaces for methodical analysis for both single and comparative transcriptome data without the use of a reference genome (e.g. for non-model organisms). The singleTCW interface steps the user through importing transcript sequences (e.g. Illumina) or assembling long sequences (e.g. Sanger, 454, transcripts), annotating the sequences, and performing differential expression analysis using published statistical programs in R. The data, metadata, and results are stored in a MySQL database. The multiTCW interface builds a comparison database by importing sequence and annotation from one or more single TCW databases, executes the ESTscan program to translate the sequences into proteins, and then incorporates one or more clusterings, where the clustering options are to execute the orthoMCL program, compute transitive closure, or import clusters. Both singleTCW and multiTCW allow extensive query and display of the results, where singleTCW displays the alignment of annotation hits to transcript sequences, and multiTCW displays multiple transcript alignments with MUSCLE or pairwise alignments. The query programs can be executed on the desktop for fastest analysis, or from the web for sharing the results. Conclusion It is now affordable to buy a multi-processor machine, and easy to install Java and MySQL. By simply downloading the TCW, the user can interactively analyze, query and view their data. The TCW allows in-depth data mining of the results, which can lead to a better understanding of the transcriptome. 
TCW is freely available from www.agcol.arizona.edu/software/tcw. PMID:23874959
TCW: transcriptome computational workbench.
Soderlund, Carol; Nelson, William; Willer, Mark; Gang, David R
2013-01-01
The analysis of transcriptome data involves many steps and various programs, along with organization of large amounts of data and results. Without a methodical approach for storage, analysis and query, the resulting ad hoc analysis can lead to human error, loss of data and results, inefficient use of time, and lack of verifiability, repeatability, and extensibility. The Transcriptome Computational Workbench (TCW) provides Java graphical interfaces for methodical analysis for both single and comparative transcriptome data without the use of a reference genome (e.g. for non-model organisms). The singleTCW interface steps the user through importing transcript sequences (e.g. Illumina) or assembling long sequences (e.g. Sanger, 454, transcripts), annotating the sequences, and performing differential expression analysis using published statistical programs in R. The data, metadata, and results are stored in a MySQL database. The multiTCW interface builds a comparison database by importing sequence and annotation from one or more single TCW databases, executes the ESTscan program to translate the sequences into proteins, and then incorporates one or more clusterings, where the clustering options are to execute the orthoMCL program, compute transitive closure, or import clusters. Both singleTCW and multiTCW allow extensive query and display of the results, where singleTCW displays the alignment of annotation hits to transcript sequences, and multiTCW displays multiple transcript alignments with MUSCLE or pairwise alignments. The query programs can be executed on the desktop for fastest analysis, or from the web for sharing the results. It is now affordable to buy a multi-processor machine, and easy to install Java and MySQL. By simply downloading the TCW, the user can interactively analyze, query and view their data. The TCW allows in-depth data mining of the results, which can lead to a better understanding of the transcriptome. 
TCW is freely available from www.agcol.arizona.edu/software/tcw.
Temporal diagnostic analysis of the SWAT model to detect dominant periods of poor model performance
NASA Astrophysics Data System (ADS)
Guse, Björn; Reusser, Dominik E.; Fohrer, Nicola
2013-04-01
Hydrological models generally include thresholds and non-linearities, such as snow-rain-temperature thresholds, non-linear reservoirs, infiltration thresholds and the like. When relating observed variables to modelling results, formal methods often calculate performance metrics over long periods, reporting model performance with only a few numbers. Such approaches are not well suited to compare dominating processes between reality and model and to better understand when thresholds and non-linearities are driving model results. We present a combination of two temporally resolved model diagnostic tools to answer when a model is performing (not so) well and what the dominant processes are during these periods. We look at the temporal dynamics of parameter sensitivities and model performance to answer this question. For this, the eco-hydrological SWAT model is applied in the Treene lowland catchment in Northern Germany. As a first step, temporal dynamics of parameter sensitivities are analyzed using the Fourier Amplitude Sensitivity Test (FAST). The sensitivities of the eight model parameters investigated show strong temporal variations. High sensitivities were detected for two groundwater parameters (GW_DELAY, ALPHA_BF) and one evaporation parameter (ESCO) most of the time. The periods of high parameter sensitivity can be related to different phases of the hydrograph, with the groundwater parameters dominating in the recession phases and ESCO dominating in baseflow and resaturation periods. Surface runoff parameters show high parameter sensitivities in phases of a precipitation event in combination with high soil water contents. The dominant parameters indicate the controlling processes during a given period for the hydrological catchment. The second step included the temporal analysis of model performance. For each time step, model performance was characterized with a "finger print" consisting of a large set of performance measures. 
These finger prints were clustered into four reoccurring patterns of typical model performance, which can be related to different phases of the hydrograph. Overall, the baseflow cluster has the lowest performance. By combining the periods with poor model performance with the dominant model components during these phases, the groundwater module was detected as the model part with the highest potential for model improvements. The detection of dominant processes in periods of poor model performance enhances the understanding of the SWAT model. Based on this, concepts how to improve the SWAT model structure for the application in German lowland catchment are derived.
Analysis of candidates for interacting galaxy clusters. I. A1204 and A2029/A2033
NASA Astrophysics Data System (ADS)
Gonzalez, Elizabeth Johana; de los Rios, Martín; Oio, Gabriel A.; Lang, Daniel Hernández; Tagliaferro, Tania Aguirre; Domínguez R., Mariano J.; Castellón, José Luis Nilo; Cuevas L., Héctor; Valotto, Carlos A.
2018-04-01
Context. Merging galaxy clusters allow for the study of different mass components, dark and baryonic, separately. Also, their occurrence makes it possible to test the ΛCDM scenario, which can be used to put constraints on the self-interacting cross-section of the dark-matter particle. Aim. It is necessary to perform a homogeneous analysis of these systems. Hence, based on a recently presented sample of candidates for interacting galaxy clusters, we present the analysis of two of these cataloged systems. Methods. In this work, the first of a series devoted to characterizing galaxy clusters in merger processes, we perform a weak lensing analysis of clusters A1204 and A2029/A2033 to derive the total masses of each identified interacting structure together with a dynamical study based on a two-body model. We also describe the gas and the mass distributions in the field through a lensing and an X-ray analysis. This is the first of a series of works which will analyze these types of systems in order to characterize them. Results. Neither merging cluster candidate shows evidence of having had a recent merger event. Nevertheless, there is dynamical evidence that these systems could be interacting or could interact in the future. Conclusions. It is necessary to include more constraints in order to improve the methodology of classifying merging galaxy clusters. Characterization of these clusters is important in order to properly understand the nature of these systems and their connection with dynamical studies.
A biogenesis step upstream of Microprocessor controls miR-17~92 expression
Du, Peng; Wang, Longfei; Sliz, Piotr; Gregory, Richard I.
2015-01-01
SUMMARY The precise control of miR-17~92 microRNA (miRNA) is essential for normal development and overexpression of certain miRNAs from this cluster is oncogenic. Here we find the relative expression of the six miRNAs processed from the primary (pri-miR-17~92) transcript is dynamically regulated during embryonic stem cell (ESC) differentiation. Pri-miR-17~92 is processed to a biogenesis intermediate, termed ‘progenitor-miRNA’ (pro-miRNA). Pro-miRNA is an efficient substrate for Microprocessor and is required to selectively license production of pre-miR-17, -18a, -19a, -20a, and -19b from this cluster. Two complementary cis-regulatory repression domains within pri-miR-17~92 are required for the blockade of miRNA processing through the formation of an autoinhibitory RNA conformation. The endonuclease CPSF3 (CPSF73) and the Spliceosome-associated ISY1 are responsible for pro-miRNA biogenesis and expression of all miRNAs within the cluster except miR-92. Thus, developmentally regulated pro-miRNA processing is a key step controlling miRNA expression and explains the posttranscriptional control of miR-17~92 expression in development. PMID:26255770
Chen, Shan; Li, Xiao-ning; Liang, Yi-zeng; Zhang, Zhi-min; Liu, Zhao-xia; Zhang, Qi-ming; Ding, Li-xia; Ye, Fei
2010-08-01
During Raman spectroscopy analysis, organic molecules and contaminants can obscure or swamp Raman signals. The present study starts from Raman spectra of prednisone acetate tablets and glibenclamide tablets, acquired with the BWTek i-Raman spectrometer. The background is corrected with the R package baselineWavelet. Principal component analysis and random forests are then used to perform clustering analysis. By analyzing the Raman spectra of the two medicines, the accuracy and validity of this background-correction algorithm are checked and the influence of the fluorescence background on Raman spectra clustering analysis is discussed. It is concluded that correcting the fluorescence background is important for further analysis, and an effective background-correction solution is provided for clustering or other analysis.
Molgenis-impute: imputation pipeline in a box.
Kanterakis, Alexandros; Deelen, Patrick; van Dijk, Freerk; Byelas, Heorhiy; Dijkstra, Martijn; Swertz, Morris A
2015-08-19
Genotype imputation is an important procedure in current genomic analysis such as genome-wide association studies, meta-analyses and fine mapping. Although high quality tools are available that perform the steps of this process, considerable effort and expertise is required to set up and run a best practice imputation pipeline, particularly for larger genotype datasets, where imputation has to scale out in parallel on computer clusters. Here we present MOLGENIS-impute, an 'imputation in a box' solution that seamlessly and transparently automates the set up and running of all the steps of the imputation process. These steps include genome build liftover (liftovering), genotype phasing with SHAPEIT2, quality control, sample and chromosomal chunking/merging, and imputation with IMPUTE2. MOLGENIS-impute builds on MOLGENIS-compute, a simple pipeline management platform for submission and monitoring of bioinformatics tasks in High Performance Computing (HPC) environments like local/cloud servers, clusters and grids. All the required tools, data and scripts are downloaded and installed in a single step. Researchers with diverse backgrounds and expertise have tested MOLGENIS-impute on different locations and imputed over 30,000 samples so far using the 1,000 Genomes Project and new Genome of the Netherlands data as the imputation reference. The tests have been performed on PBS/SGE clusters, cloud VMs and in a grid HPC environment. MOLGENIS-impute gives priority to the ease of setting up, configuring and running an imputation. It has minimal dependencies and wraps the pipeline in a simple command line interface, without sacrificing flexibility to adapt or limiting the options of underlying imputation tools. It does not require knowledge of a workflow system or programming, and is targeted at researchers who just want to apply best practices in imputation via simple commands. 
It is built on the MOLGENIS compute workflow framework to enable customization with additional computational steps or it can be included in other bioinformatics pipelines. It is available as open source from: https://github.com/molgenis/molgenis-imputation.
Performance analysis of clustering techniques over microarray data: A case study
NASA Astrophysics Data System (ADS)
Dash, Rasmita; Misra, Bijan Bihari
2018-03-01
Handling big data is one of the major issues in the field of statistical data analysis. In such investigations, cluster analysis plays a vital role in dealing with large-scale data. There are many clustering techniques, each taking a different approach to cluster analysis, but which approach suits a particular dataset is difficult to predict. To deal with this problem, a grading approach is introduced over many clustering techniques to identify a stable technique. Because the grading approach depends on the characteristics of the dataset as well as on the validity indices, a two-stage grading approach is implemented. In this study the grading approach is applied to five clustering techniques: hybrid swarm based clustering (HSC), k-means, partitioning around medoids (PAM), vector quantization (VQ) and agglomerative nesting (AGNES). The experimentation is conducted over five microarray datasets with seven validity indices. The finding of the grading approach that a clustering technique is significant is also confirmed by the Nemenyi post-hoc hypothesis test.
A 3D particle Monte Carlo approach to studying nucleation
NASA Astrophysics Data System (ADS)
Köhn, Christoph; Enghoff, Martin Bødker; Svensmark, Henrik
2018-06-01
The nucleation of sulphuric acid molecules plays a key role in the formation of aerosols. We here present a three dimensional particle Monte Carlo model to study the growth of sulphuric acid clusters as well as its dependence on the ambient temperature and the initial particle density. We initiate a swarm of sulphuric acid-water clusters with a size of 0.329 nm with densities between 10^7 and 10^8 cm^-3 at temperatures between 200 and 300 K and a relative humidity of 50%. After every time step, we update the position of particles as a function of size-dependent diffusion coefficients. If two particles encounter each other, we merge them and add their volumes and masses. Inversely, we check after every time step whether a polymer evaporates liberating a molecule. We present the spatial distribution as well as the size distribution calculated from individual clusters. We also calculate the nucleation rate of clusters with a radius of 0.85 nm as a function of time, initial particle density and temperature. The nucleation rates obtained from the presented model agree well with experimentally obtained values and those of a numerical model which serves as a benchmark of our code. In contrast to previous nucleation models, we here present for the first time a code capable of tracing individual particles and thus of capturing the physics related to the discrete nature of particles.
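The diffusion and merge rules described in the abstract above (move each particle, then merge particles that meet by adding their volumes and masses) can be illustrated with a minimal sketch. This is not the authors' code: the `Particle` class, the helper names, the Gaussian displacement with per-axis standard deviation sqrt(2 D dt), the contact criterion (centre distance below the sum of radii) and the mass-weighted position of the merged particle are all assumptions made for illustration, and evaporation is left out.

```python
import math
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Particle:
    x: float
    y: float
    z: float
    radius: float  # nm
    mass: float    # arbitrary units

    @property
    def volume(self) -> float:
        return 4.0 / 3.0 * math.pi * self.radius ** 3


def merge(a: Particle, b: Particle) -> Particle:
    """Merge two particles: volumes and masses add (as in the abstract).

    Placing the result at the mass-weighted centre is an assumption.
    """
    volume = a.volume + b.volume
    mass = a.mass + b.mass
    radius = (3.0 * volume / (4.0 * math.pi)) ** (1.0 / 3.0)
    w = a.mass / mass
    return Particle(
        w * a.x + (1.0 - w) * b.x,
        w * a.y + (1.0 - w) * b.y,
        w * a.z + (1.0 - w) * b.z,
        radius,
        mass,
    )


def diffuse(p: Particle, diff_coeff: float, dt: float, rng: random.Random) -> None:
    """Random-walk step; the caller supplies a size-dependent diff_coeff."""
    sigma = math.sqrt(2.0 * diff_coeff * dt)
    p.x += rng.gauss(0.0, sigma)
    p.y += rng.gauss(0.0, sigma)
    p.z += rng.gauss(0.0, sigma)


def merge_touching(particles: List[Particle]) -> List[Particle]:
    """Merge pairs whose centres are closer than the sum of their radii,
    repeating until no touching pair remains."""
    particles = list(particles)
    merged = True
    while merged:
        merged = False
        for i in range(len(particles)):
            for j in range(i + 1, len(particles)):
                a, b = particles[i], particles[j]
                dist = math.dist((a.x, a.y, a.z), (b.x, b.y, b.z))
                if dist < a.radius + b.radius:
                    rest = [p for k, p in enumerate(particles) if k not in (i, j)]
                    particles = rest + [merge(a, b)]
                    merged = True
                    break
            if merged:
                break
    return particles
```

The pairwise search is quadratic in the number of particles, which is fine for a sketch; a production code would more likely use a spatial grid or neighbour list.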
Social network types and well-being among South Korean older adults.
Park, Sojung; Smith, Jacqui; Dunkle, Ruth E
2014-01-01
The social networks of older individuals reflect personal life history and cultural factors. Despite these two sources of variation, four similar network types have been identified in Europe, North America, Japan, and China: namely 'restricted', 'family', 'friend', and 'diverse'. This study identified the social network types of Korean older adults and examined differential associations of the network types with well-being. The analysis used data from the 2008 wave of the Korean Longitudinal Study of Aging (KLoSA: N = 4251, age range 65-108). We used a two-step cluster analytical approach to identify network types from seven indicators of network structure and function. Regression models determined associations between network types and well-being outcomes, including life satisfaction and depressive symptomatology. Cluster analysis of indicators of network structure and function revealed four types, including the restricted, friend, and diverse types. Instead of a family type, we found a couple-focused type. The young-old (age 65-74) were more likely to be in the couple-focused type and more of the oldest old (age 85+) belonged to the restricted type. Compared with the restricted network, older adults in all other networks were more likely to report higher life satisfaction and lower depressive symptomatology. Life course and cohort-related factors contribute to similarities across societies in network types and their associations with well-being. Korean-specific life course and socio-historical factors, however, may contribute to our unique findings about network types.
An unsupervised classification technique for multispectral remote sensing data.
NASA Technical Reports Server (NTRS)
Su, M. Y.; Cummings, R. E.
1973-01-01
Description of a two-part clustering technique consisting of (a) a sequential statistical clustering, which is essentially a sequential variance analysis, and (b) a generalized K-means clustering. In this composite clustering technique, the output of (a) is a set of initial clusters which are input to (b) for further improvement by an iterative scheme. This unsupervised composite technique was employed for automatic classification of two sets of remote multispectral earth resource observations. The classification accuracy by the unsupervised technique is found to be comparable to that by traditional supervised maximum-likelihood classification techniques.
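The second stage of the composite technique, in which initial clusters are improved iteratively, can be sketched as below. This is a hedged illustration only: it uses plain K-means with Euclidean distance rather than the "generalized" variant of the abstract, whose details are not given, and the function names `centroid_of` and `kmeans_refine` are hypothetical. The initial centroids stand in for the output of stage (a), which is not reproduced here.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def centroid_of(points: Sequence[Point]) -> Point:
    """Component-wise mean of a non-empty collection of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans_refine(
    points: Sequence[Point],
    initial_centroids: Sequence[Point],
    max_iter: int = 100,
) -> Tuple[List[Point], List[int]]:
    """Refine initial centroids by alternating assignment and update steps."""
    centroids = list(initial_centroids)
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = centroid_of(members)
    return centroids, labels
```

Starting from reasonable initial clusters typically reduces the number of iterations and the risk of a poor local optimum, which is the motivation for the two-part design described above.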
Unsupervised classification of earth resources data.
NASA Technical Reports Server (NTRS)
Su, M. Y.; Jayroe, R. R., Jr.; Cummings, R. E.
1972-01-01
A new clustering technique is presented. It consists of two parts: (a) a sequential statistical clustering which is essentially a sequential variance analysis and (b) a generalized K-means clustering. In this composite clustering technique, the output of (a) is a set of initial clusters which are input to (b) for further improvement by an iterative scheme. This unsupervised composite technique was employed for automatic classification of two sets of remote multispectral earth resource observations. The classification accuracy by the unsupervised technique is found to be comparable to that by the existing supervised maximum likelihood classification technique.
Stepwise and stagewise approaches for spatial cluster detection
Xu, Jiale
2016-01-01
Spatial cluster detection is an important tool in many areas such as sociology, botany and public health. Previous work has mostly taken either a hypothesis testing framework or a Bayesian framework. In this paper, we propose a few approaches under a frequentist variable selection framework for spatial cluster detection. The forward stepwise methods search for multiple clusters by iteratively adding the currently most likely cluster while adjusting for the effects of previously identified clusters. The stagewise methods also consist of a series of steps, but with a tiny step size in each iteration. We study the features and performances of our proposed methods using simulations on idealized grids or real geographic areas. From the simulations, we compare the performance of the proposed methods in terms of estimation accuracy and power of detection. These methods are applied to the well-known New York leukemia data as well as Indiana poverty data. PMID:27246273
Stepwise and stagewise approaches for spatial cluster detection.
Xu, Jiale; Gangnon, Ronald E
2016-05-01
Spatial cluster detection is an important tool in many areas such as sociology, botany and public health. Previous work has mostly taken either a hypothesis testing framework or a Bayesian framework. In this paper, we propose a few approaches under a frequentist variable selection framework for spatial cluster detection. The forward stepwise methods search for multiple clusters by iteratively adding the currently most likely cluster while adjusting for the effects of previously identified clusters. The stagewise methods also consist of a series of steps, but with a tiny step size in each iteration. We study the features and performances of our proposed methods using simulations on idealized grids or real geographic areas. From the simulations, we compare the performance of the proposed methods in terms of estimation accuracy and power. These methods are applied to the well-known New York leukemia data as well as Indiana poverty data. Copyright © 2016 Elsevier Ltd. All rights reserved.
Friesen, Melissa C; Shortreed, Susan M; Wheeler, David C; Burstyn, Igor; Vermeulen, Roel; Pronk, Anjoeka; Colt, Joanne S; Baris, Dalsu; Karagas, Margaret R; Schwenn, Molly; Johnson, Alison; Armenti, Karla R; Silverman, Debra T; Yu, Kai
2015-05-01
Rule-based expert exposure assessment based on questionnaire response patterns in population-based studies improves the transparency of the decisions. The number of unique response patterns, however, can be nearly equal to the number of jobs. An expert may reduce the number of patterns that need assessment using expert opinion, but each expert may identify different patterns of responses that identify an exposure scenario. Here, hierarchical clustering methods are proposed as a systematic data reduction step to reproducibly identify similar questionnaire response patterns prior to obtaining expert estimates. As a proof-of-concept, we used hierarchical clustering methods to identify groups of jobs (clusters) with similar responses to diesel exhaust-related questions and then evaluated whether the jobs within a cluster had similar (previously assessed) estimates of occupational diesel exhaust exposure. Using the New England Bladder Cancer Study as a case study, we applied hierarchical cluster models to the diesel-related variables extracted from the occupational history and job- and industry-specific questionnaires (modules). Cluster models were separately developed for two subsets: (i) 5395 jobs with ≥1 variable extracted from the occupational history indicating a potential diesel exposure scenario, but without a module with diesel-related questions; and (ii) 5929 jobs with both occupational history and module responses to diesel-relevant questions. For each subset, we varied the numbers of clusters extracted from the cluster tree developed for each model from 100 to 1000 groups of jobs. Using previously made estimates of the probability (ordinal), intensity (µg m(-3) respirable elemental carbon), and frequency (hours per week) of occupational exposure to diesel exhaust, we examined the similarity of the exposure estimates for jobs within the same cluster in two ways. 
First, the clusters' homogeneity (defined as >75% with the same estimate) was examined compared to a dichotomized probability estimate (<5 versus ≥5%; <50 versus ≥50%). Second, for the ordinal probability metric and continuous intensity and frequency metrics, we calculated the intraclass correlation coefficients (ICCs) between each job's estimate and the mean estimate for all jobs within the cluster. Within-cluster homogeneity increased when more clusters were used. For example, ≥80% of the clusters were homogeneous when 500 clusters were used. Similarly, ICCs were generally above 0.7 when ≥200 clusters were used, indicating minimal within-cluster variability. The most within-cluster variability was observed for the frequency metric (ICCs from 0.4 to 0.8). We estimated that using an expert to assign exposure at the cluster-level assignment and then to review each job in non-homogeneous clusters would require ~2000 decisions per expert, in contrast to evaluating 4255 unique questionnaire patterns or 14983 individual jobs. This proof-of-concept shows that using cluster models as a data reduction step to identify jobs with similar response patterns prior to obtaining expert ratings has the potential to aid rule-based assessment by systematically reducing the number of exposure decisions needed. While promising, additional research is needed to quantify the actual reduction in exposure decisions and the resulting homogeneity of exposure estimates within clusters for an exposure assessment effort that obtains cluster-level expert assessments as part of the assessment process. Published by Oxford University Press on behalf of the British Occupational Hygiene Society 2014.
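The homogeneity criterion used in the abstract above (a cluster counts as homogeneous when more than 75% of its jobs share the same estimate) can be written down in a few lines. The sketch below is illustrative only: the function names, the input layout (parallel lists of cluster labels and dichotomized estimates) and the example data are assumptions, not taken from the study.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Sequence


def cluster_homogeneity(
    cluster_ids: Sequence[Hashable],
    estimates: Sequence[Hashable],
    threshold: float = 0.75,
) -> Dict[Hashable, bool]:
    """Flag each cluster as homogeneous when the share of jobs holding the
    most common estimate is strictly greater than `threshold`."""
    groups = defaultdict(list)
    for cid, est in zip(cluster_ids, estimates):
        groups[cid].append(est)
    flags = {}
    for cid, ests in groups.items():
        top_count = Counter(ests).most_common(1)[0][1]
        flags[cid] = top_count / len(ests) > threshold
    return flags


def fraction_homogeneous(flags: Dict[Hashable, bool]) -> float:
    """Share of clusters flagged as homogeneous."""
    return sum(flags.values()) / len(flags)
```

Cutting the cluster tree at more groups yields smaller clusters, which is consistent with the reported increase in homogeneity as the number of clusters grows.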
NASA Astrophysics Data System (ADS)
Gandomkar, Ziba; Tay, Kevin; Ryder, Will; Brennan, Patrick C.; Mello-Thoms, Claudia
2016-03-01
Radiologists' gaze-related parameters combined with image-based features were utilized to classify suspicious mammographic areas ultimately scored as True Positives (TP) and False Positives (FP). Eight breast radiologists read 120 two-view digital mammograms of which 59 had biopsy proven cancer. Eye tracking data was collected and nearby fixations were clustered together. Suspicious areas on mammograms were independently identified based on thresholding an intensity saliency map followed by automatic segmentation and pruning steps. For each radiologist reported area, radiologist's fixation clusters in the area, as well as neighboring suspicious areas within 2.5° of the center of fixation, were found. A 45-dimensional feature vector containing gaze parameters of the corresponding cluster along with image-based characteristics was constructed. Gaze parameters included total number of fixations in the cluster, dwell time, time to hit the cluster for the first time, maximum number of consecutive fixations, and saccade magnitude of the first fixation in the cluster. Image-based features consisted of intensity, shape, and texture descriptors extracted from the region around the suspicious area, its surrounding tissue, and the entire breast. For each radiologist, a user-specific Support Vector Machine (SVM) model was built to classify the reported areas as TPs or FPs. Leave-one-out cross validation was utilized to avoid over-fitting. A feature selection step was embedded in the SVM training procedure by allowing radial basis function kernels to have 45 scaling factors. The proposed method was compared with the radiologists' performance using the jackknife alternative free-response receiver operating characteristic (JAFROC). The JAFROC figure of merit increased significantly for six radiologists.
ERIC Educational Resources Information Center
Xu, Beijie; Recker, Mimi; Qi, Xiaojun; Flann, Nicholas; Ye, Lei
2013-01-01
This article examines clustering as an educational data mining method. In particular, two clustering algorithms, the widely used K-means and the model-based Latent Class Analysis, are compared, using usage data from an educational digital library service, the Instructional Architect (IA.usu.edu). Using a multi-faceted approach and multiple data…
DOE Office of Scientific and Technical Information (OSTI.GOV)
Briggs, Beverly D.; Palafox-Hernandez, J. Pablo; Li, Yue
Materials-binding peptides represent a unique avenue towards controlling the shape and size of nanoparticles (NPs) grown under aqueous conditions. Here, employing a bionanocombinatorics approach, two such materials-binding peptides were linked at either end of a photoswitchable spacer, forming a multi-domain materials-binding molecule to control the in situ synthesis and organization of Ag and Au NPs under ambient conditions. These multi-domain molecules retained the peptides’ ability to nucleate, grow, and stabilize Ag and Au NPs in aqueous media. Disordered co-assemblies of the two nanomaterials were observed by TEM imaging of dried samples after sequential growth of the two metals, and showed a clustering behavior that was not observed without both metals and the linker molecules. While TEM evidence indicated the formation of AuNP/AgNP assemblies upon drying, SAXS analysis indicated that no extended assemblies existed in solution, suggesting that sample drying plays an important role in facilitating NP clustering. Molecular simulations and experimental data revealed tunable materials-binding based upon the isomerization state of the photoswitchable unit and metal employed. This work is a first step in generating externally actuated biomolecules with specific material-binding properties that could be used as the building blocks to achieve multi-material switchable NP assemblies.
Water quality analysis of the Rapur area, Andhra Pradesh, South India using multivariate techniques
NASA Astrophysics Data System (ADS)
Nagaraju, A.; Sreedhar, Y.; Thejaswi, A.; Sayadi, Mohammad Hossein
2017-10-01
Groundwater samples from the Rapur area were collected from different sites to evaluate the major ion chemistry. The large volume of data can lead to difficulties in the integration, interpretation, and representation of the results. Two multivariate statistical methods, hierarchical cluster analysis (HCA) and factor analysis (FA), were applied to evaluate their usefulness in classifying and identifying the geochemical processes controlling groundwater geochemistry. Four statistically significant clusters were obtained from 30 sampling stations. This resulted in two important clusters, viz. cluster 1 (pH, Si, CO3, Mg, SO4, Ca, K, HCO3, alkalinity, Na, Na + K, Cl, and hardness) and cluster 2 (EC and TDS), which are released to the study area from different sources. The application of different multivariate statistical techniques, such as principal component analysis (PCA), assists in the interpretation of complex data matrices for a better understanding of the water quality of a study area. From PCA, it is clear that the first factor (factor 1), which accounted for 36.2% of the total variance, had high positive loadings on EC, Mg, Cl, TDS, and hardness. Based on the PCA scores, four significant cluster groups of sampling locations were detected on the basis of similarity of their water quality.
Missing continuous outcomes under covariate dependent missingness in cluster randomised trials
Diaz-Ordaz, Karla; Bartlett, Jonathan W
2016-01-01
Attrition is a common occurrence in cluster randomised trials which leads to missing outcome data. Two approaches for analysing such trials are cluster-level analysis and individual-level analysis. This paper compares the performance of unadjusted cluster-level analysis, baseline covariate adjusted cluster-level analysis and linear mixed model analysis, under baseline covariate dependent missingness in continuous outcomes, in terms of bias, average estimated standard error and coverage probability. The methods of complete records analysis and multiple imputation are used to handle the missing outcome data. We considered four scenarios, with the missingness mechanism and baseline covariate effect on outcome either the same or different between intervention groups. We show that both unadjusted cluster-level analysis and baseline covariate adjusted cluster-level analysis give unbiased estimates of the intervention effect only if both intervention groups have the same missingness mechanisms and there is no interaction between baseline covariate and intervention group. Linear mixed model and multiple imputation give unbiased estimates under all four considered scenarios, provided that an interaction of intervention and baseline covariate is included in the model when appropriate. Cluster mean imputation has been proposed as a valid approach for handling missing outcomes in cluster randomised trials. We show that cluster mean imputation only gives unbiased estimates when missingness mechanism is the same between the intervention groups and there is no interaction between baseline covariate and intervention group. Multiple imputation shows overcoverage for small number of clusters in each intervention group. PMID:27177885
Missing continuous outcomes under covariate dependent missingness in cluster randomised trials.
Hossain, Anower; Diaz-Ordaz, Karla; Bartlett, Jonathan W
2017-06-01
Attrition is a common occurrence in cluster randomised trials which leads to missing outcome data. Two approaches for analysing such trials are cluster-level analysis and individual-level analysis. This paper compares the performance of unadjusted cluster-level analysis, baseline covariate adjusted cluster-level analysis and linear mixed model analysis, under baseline covariate dependent missingness in continuous outcomes, in terms of bias, average estimated standard error and coverage probability. The methods of complete records analysis and multiple imputation are used to handle the missing outcome data. We considered four scenarios, with the missingness mechanism and baseline covariate effect on outcome either the same or different between intervention groups. We show that both unadjusted cluster-level analysis and baseline covariate adjusted cluster-level analysis give unbiased estimates of the intervention effect only if both intervention groups have the same missingness mechanisms and there is no interaction between baseline covariate and intervention group. Linear mixed model and multiple imputation give unbiased estimates under all four considered scenarios, provided that an interaction of intervention and baseline covariate is included in the model when appropriate. Cluster mean imputation has been proposed as a valid approach for handling missing outcomes in cluster randomised trials. We show that cluster mean imputation only gives unbiased estimates when missingness mechanism is the same between the intervention groups and there is no interaction between baseline covariate and intervention group. Multiple imputation shows overcoverage for small number of clusters in each intervention group.
User’s guide for GcClust—An R package for clustering of regional geochemical data
Ellefsen, Karl J.; Smith, David B.
2016-04-08
GcClust is a software package developed by the U.S. Geological Survey for statistical clustering of regional geochemical data, and similar data such as regional mineralogical data. Functions within the software package are written in the R statistical programming language. These functions, their documentation, and a copy of the user’s guide are bundled together in R’s unit of sharable code, which is called a “package.” The user’s guide includes step-by-step instructions showing how the functions are used to cluster data and to evaluate the clustering results. These functions are demonstrated in this report using test data, which are included in the package.
ERIC Educational Resources Information Center
Kerr, Deirdre; Chung, Gregory K. W. K.
2012-01-01
The assessment cycle of "evidence-centered design" (ECD) provides a framework for treating an educational video game or simulation as an assessment. One of the main steps in the assessment cycle of ECD is the identification of the key features of student performance. While this process is relatively simple for multiple choice tests, when…
Warden, Craig R
2008-01-01
Background With limited resources available, injury prevention efforts need to be targeted both geographically and to specific populations. As part of a pediatric injury prevention project, data was obtained on all pediatric medical and injury incidents in a fire district to evaluate geographical clustering of pediatric injuries. This will be the first step in attempting to prevent these injuries with specific interventions depending on locations and mechanisms. Results There were a total of 4803 incidents involving patients less than 15 years of age that the fire district responded to during 2001–2005 of which 1997 were categorized as injuries and 2806 as medical calls. The two cohorts (injured versus medical) differed in age distribution (7.7 ± 4.4 years versus 5.4 ± 4.8 years, p < 0.001) and location type of incident (school or church 12% versus 15%, multifamily residence 22% versus 13%, single family residence 51% versus 28%, sport, park or recreational facility 3% versus 8%, public building 8% versus 7%, and street or road 3% versus 30%, respectively, p < 0.001). Using the medical incident locations as controls, there was no significant clustering for environmental or assault injuries using the Bernoulli method while there were four significant clusters for all injury mechanisms combined, 13 clusters for motor vehicle collisions, one for falls, and two for pedestrian or bicycle injuries. Using the Poisson cluster method on incidence rates by census tract identified four clusters for all injuries, three for motor vehicle collisions, four for fall injuries, and one each for environmental and assault injuries. The two detection methods shared a minority of overlapping geographical clusters. Conclusion Significant clustering occurs overall for all injury mechanisms combined and for each mechanism depending on the cluster detection method used. There was some overlap in geographic clusters identified by both methods. 
The Bernoulli method allows more focused cluster mapping and evaluation since it directly uses location data. Once clusters are found, interventions can be targeted to specific geographic locations, location types, ages of victims, and mechanisms of injury. PMID:18808720
The Cluster Environment of Two High-mass Protostars
NASA Astrophysics Data System (ADS)
Montes, Virginie; Hofner, Peter
2017-06-01
Characterizing the environment and stellar population in which high-mass stars form is an important step to decide between the main massive star formation theories. In the monolithic collapse model, the mass of the core will determine the final stellar mass (e.g., McKee & Tan 2003). In contrast, in the competitive accretion model (e.g., Bonnell & Bate 2006), the mass of the high-mass star is related to the properties of the cluster. As dynamical processes substantially affect the appearance of a cluster, we study early stages of high-mass star formation. These regions often show extended emission from hot dust at infrared wavelengths, which can make it difficult to define the cluster. We use a multi-wavelength technique to study nearby high-mass star clusters, based on X-ray observations with the Chandra X-Ray Telescope, in conjunction with infrared data and VLA data. The technique relies on the fact that YSOs are particularly bright in X-ray and that contamination is relatively small. X-ray observations allow us to determine the cluster size. The cluster membership and YSO classification are established using infrared identification of the X-ray sources, and color-color and color-magnitude diagrams. In this talk, I will present our findings on the cluster study of two high-mass star forming regions: IRAS 20126+4104 and IRAS 16562-3959. While most massive stars appear to be formed in a rich cluster environment, those two sources are candidates for the formation of massive stars in a relatively poor cluster. In contrast to what was found in previous studies (Qiu et al. 2008), the dominant B0-type protostar in IRAS 20126+4104 is associated with a small cluster of low-mass stars. I will also show our current work on IRAS 16562-3959, which contains one of the most luminous O-type protostars in the Galaxy.
In the vicinity of this particularly interesting region there is a multitude of small clusters, for which I will present how their stellar populations differ from the high-mass star-forming cluster IRAS 16562-3959.
Warden, Craig R
2008-09-22
With limited resources available, injury prevention efforts need to be targeted both geographically and to specific populations. As part of a pediatric injury prevention project, data was obtained on all pediatric medical and injury incidents in a fire district to evaluate geographical clustering of pediatric injuries. This will be the first step in attempting to prevent these injuries with specific interventions depending on locations and mechanisms. There were a total of 4803 incidents involving patients less than 15 years of age that the fire district responded to during 2001-2005 of which 1997 were categorized as injuries and 2806 as medical calls. The two cohorts (injured versus medical) differed in age distribution (7.7 +/- 4.4 years versus 5.4 +/- 4.8 years, p < 0.001) and location type of incident (school or church 12% versus 15%, multifamily residence 22% versus 13%, single family residence 51% versus 28%, sport, park or recreational facility 3% versus 8%, public building 8% versus 7%, and street or road 3% versus 30%, respectively, p < 0.001). Using the medical incident locations as controls, there was no significant clustering for environmental or assault injuries using the Bernoulli method while there were four significant clusters for all injury mechanisms combined, 13 clusters for motor vehicle collisions, one for falls, and two for pedestrian or bicycle injuries. Using the Poisson cluster method on incidence rates by census tract identified four clusters for all injuries, three for motor vehicle collisions, four for fall injuries, and one each for environmental and assault injuries. The two detection methods shared a minority of overlapping geographical clusters. Significant clustering occurs overall for all injury mechanisms combined and for each mechanism depending on the cluster detection method used. There was some overlap in geographic clusters identified by both methods. 
The Bernoulli method allows more focused cluster mapping and evaluation since it directly uses location data. Once clusters are found, interventions can be targeted to specific geographic locations, location types, ages of victims, and mechanisms of injury.
Fidai, Insiya; Wachnowsky, Christine; Cowan, J A
2016-12-07
Ferredoxins are protein mediators of biological electron-transfer reactions and typically contain either [2Fe-2S] or [4Fe-4S] clusters. Two ferredoxin homologues have been identified in the human genome, Fdx1 and Fdx2, that share 43% identity and 69% similarity in protein sequence and both bind [2Fe-2S] clusters. Despite the high similarity, the two ferredoxins play very specific roles in distinct physiological pathways and cannot replace each other in function. Both eukaryotic and prokaryotic ferredoxins and homologues have been reported to receive their Fe-S cluster from scaffold/delivery proteins such as IscU, Isa, glutaredoxins, and Nfu. However, the preferred and physiologically relevant pathway for receiving the [2Fe-2S] cluster by ferredoxins is subject to speculation and is not clearly identified. In this work, we report on in vitro UV-visible (UV-vis) circular dichroism studies of [2Fe-2S] cluster transfer to the ferredoxins from a variety of partners. The results reveal rapid and quantitative transfer to both ferredoxins from several donor proteins (IscU, Isa1, Grx2, and Grx3). Transfer from Isa1 to Fdx2 was also observed to be faster than that of IscU to Fdx2, suggesting that Fdx2 could receive its cluster from Isa1 instead of IscU. Several other transfer combinations were also investigated and the results suggest a complex, but kinetically detailed map for cellular cluster trafficking. This is the first step toward building a network map for all of the possible iron-sulfur cluster transfer pathways in the mitochondria and cytosol, providing insights on the most likely cellular pathways and possible redundancies in these pathways.
Lee, Junghee; Rizzo, Shemra; Altshuler, Lori; Glahn, David C; Miklowitz, David J; Sugar, Catherine A; Wynn, Jonathan K; Green, Michael F
2017-02-01
Bipolar disorder (BD) and schizophrenia (SZ) show substantial overlap. It has been suggested that a subgroup of patients might contribute to these overlapping features. This study employed a cross-diagnostic cluster analysis to identify subgroups of individuals with shared cognitive phenotypes. 143 participants (68 BD patients, 39 SZ patients and 36 healthy controls) completed a battery of EEG and performance assessments on perception, nonsocial cognition and social cognition. A K-means cluster analysis was conducted with all participants across diagnostic groups. Clinical symptoms, functional capacity, and functional outcome were assessed in patients. A two-cluster solution across 3 groups was the most stable. One cluster including 44 BD patients, 31 controls and 5 SZ patients showed better cognition (High cluster) than the other cluster with 24 BD patients, 35 SZ patients and 5 controls (Low cluster). BD patients in the High cluster performed better than BD patients in the Low cluster across cognitive domains. Within each cluster, participants with different clinical diagnoses showed different profiles across cognitive domains. All patients were in the chronic phase and out of mood episode at the time of assessment, and most of the assessments were behavioral measures. This study identified two clusters with shared cognitive phenotype profiles that were not proxies for clinical diagnoses. The finding of better social cognitive performance of BD patients than SZ patients in the Low cluster suggests that relatively preserved social cognition may be important to identify disease processes distinct to each disorder. Copyright © 2016 Elsevier B.V. All rights reserved.
Friesen, Melissa C.; Shortreed, Susan M.; Wheeler, David C.; Burstyn, Igor; Vermeulen, Roel; Pronk, Anjoeka; Colt, Joanne S.; Baris, Dalsu; Karagas, Margaret R.; Schwenn, Molly; Johnson, Alison; Armenti, Karla R.; Silverman, Debra T.; Yu, Kai
2015-01-01
Objectives: Rule-based expert exposure assessment based on questionnaire response patterns in population-based studies improves the transparency of the decisions. The number of unique response patterns, however, can be nearly equal to the number of jobs. An expert may reduce the number of patterns that need assessment using expert opinion, but each expert may identify different patterns of responses that identify an exposure scenario. Here, hierarchical clustering methods are proposed as a systematic data reduction step to reproducibly identify similar questionnaire response patterns prior to obtaining expert estimates. As a proof-of-concept, we used hierarchical clustering methods to identify groups of jobs (clusters) with similar responses to diesel exhaust-related questions and then evaluated whether the jobs within a cluster had similar (previously assessed) estimates of occupational diesel exhaust exposure. Methods: Using the New England Bladder Cancer Study as a case study, we applied hierarchical cluster models to the diesel-related variables extracted from the occupational history and job- and industry-specific questionnaires (modules). Cluster models were separately developed for two subsets: (i) 5395 jobs with ≥1 variable extracted from the occupational history indicating a potential diesel exposure scenario, but without a module with diesel-related questions; and (ii) 5929 jobs with both occupational history and module responses to diesel-relevant questions. For each subset, we varied the numbers of clusters extracted from the cluster tree developed for each model from 100 to 1000 groups of jobs. Using previously made estimates of the probability (ordinal), intensity (µg m−3 respirable elemental carbon), and frequency (hours per week) of occupational exposure to diesel exhaust, we examined the similarity of the exposure estimates for jobs within the same cluster in two ways. 
First, the clusters’ homogeneity (defined as >75% with the same estimate) was examined compared to a dichotomized probability estimate (<5 versus ≥5%; <50 versus ≥50%). Second, for the ordinal probability metric and continuous intensity and frequency metrics, we calculated the intraclass correlation coefficients (ICCs) between each job’s estimate and the mean estimate for all jobs within the cluster. Results: Within-cluster homogeneity increased when more clusters were used. For example, ≥80% of the clusters were homogeneous when 500 clusters were used. Similarly, ICCs were generally above 0.7 when ≥200 clusters were used, indicating minimal within-cluster variability. The most within-cluster variability was observed for the frequency metric (ICCs from 0.4 to 0.8). We estimated that using an expert to assign exposure at the cluster-level assignment and then to review each job in non-homogeneous clusters would require ~2000 decisions per expert, in contrast to evaluating 4255 unique questionnaire patterns or 14983 individual jobs. Conclusions: This proof-of-concept shows that using cluster models as a data reduction step to identify jobs with similar response patterns prior to obtaining expert ratings has the potential to aid rule-based assessment by systematically reducing the number of exposure decisions needed. While promising, additional research is needed to quantify the actual reduction in exposure decisions and the resulting homogeneity of exposure estimates within clusters for an exposure assessment effort that obtains cluster-level expert assessments as part of the assessment process. PMID:25477475
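The homogeneity criterion in the abstract above (a cluster is homogeneous when more than 75% of its jobs share the same exposure estimate) can be illustrated with a small Python sketch. This is only an illustration under stated assumptions, not the authors' code: the function names, the cluster labels, and the toy estimates are all hypothetical, and the 0.75 threshold is the only value taken from the abstract.

```python
from collections import Counter


def is_homogeneous(estimates, threshold=0.75):
    """Return True when strictly more than `threshold` of the jobs
    in one cluster share the same (dichotomized) exposure estimate."""
    if not estimates:
        return False
    # Size of the largest group of identical estimates in the cluster.
    most_common_count = Counter(estimates).most_common(1)[0][1]
    return most_common_count / len(estimates) > threshold


def homogeneous_fraction(clusters, threshold=0.75):
    """Fraction of clusters that satisfy the homogeneity criterion."""
    flags = [is_homogeneous(jobs, threshold) for jobs in clusters.values()]
    return sum(flags) / len(flags)


# Hypothetical toy data: cluster id -> dichotomized probability estimates.
clusters = {
    "c1": ["<5%", "<5%", "<5%", "<5%"],     # 100% agreement: homogeneous
    "c2": ["<5%", ">=5%", "<5%", ">=5%"],   # 50% agreement: not homogeneous
    "c3": [">=5%", ">=5%", ">=5%", "<5%"],  # exactly 75%: not homogeneous (strict >)
}

print(homogeneous_fraction(clusters))
```

Under this reading, only clusters that fail the check would need job-by-job expert review, which is how the abstract arrives at a reduced number of decisions.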
DOE Office of Scientific and Technical Information (OSTI.GOV)
Young, M; Craft, D
Purpose: To develop an efficient, pathway-based classification system using network biology statistics to assist in patient-specific response predictions to radiation and drug therapies across multiple cancer types. Methods: We developed PICS (Pathway Informed Classification System), a novel two-step cancer classification algorithm. In PICS, a matrix m of mRNA expression values for a patient cohort is collapsed into a matrix p of biological pathways. The entries of p, which we term pathway scores, are obtained from either principal component analysis (PCA), normal tissue centroid (NTC), or gene expression deviation (GED). The pathway score matrix is clustered using both k-means and hierarchical clustering, and a clustering is judged by how well it groups patients into distinct survival classes. The most effective pathway scoring/clustering combination, per clustering p-value, thus generates various ‘signatures’ for conventional and functional cancer classification. Results: PICS successfully regularized large dimension gene data, separated normal and cancerous tissues, and clustered a large patient cohort spanning six cancer types. Furthermore, PICS clustered patient cohorts into distinct, statistically-significant survival groups. For a suboptimally-debulked ovarian cancer set, the pathway-classified Kaplan-Meier survival curve (p = .00127) showed significant improvement over that of a prior gene expression-classified study (p = .0179). For a pancreatic cancer set, the pathway-classified Kaplan-Meier survival curve (p = .00141) showed significant improvement over that of a prior gene expression-classified study (p = .04). Pathway-based classification confirmed biomarkers for the pyrimidine, WNT-signaling, glycerophosphoglycerol, beta-alanine, and pantothenic acid pathways for ovarian cancer. Despite its robust nature, PICS requires significantly less run time than current pathway scoring methods.
Conclusion: This work validates the PICS method to improve cancer classification using biological pathways. Patients are classified with greater specificity and physiological relevance as compared to current gene-specific approaches. Focus now moves to utilizing PICS for pan-cancer patient-specific treatment response prediction.
Altiparmak, Fatih; Ferhatosmanoglu, Hakan; Erdal, Selnur; Trost, Donald C
2006-04-01
An effective analysis of clinical trials data involves analyzing different types of data such as heterogeneous and high dimensional time series data. The current time series analysis methods generally assume that the series at hand have sufficient length to apply statistical techniques to them. Other ideal case assumptions are that data are collected in equal length intervals, and while comparing time series, the lengths are usually expected to be equal to each other. However, these assumptions are not valid for many real data sets, especially for the clinical trials data sets. In addition, the data sources are different from each other, the data are heterogeneous, and the sensitivity of the experiments varies by the source. Approaches for mining time series data need to be revisited, keeping the wide range of requirements in mind. In this paper, we propose a novel approach for information mining that involves two major steps: applying a data mining algorithm over homogeneous subsets of data, and identifying common or distinct patterns over the information gathered in the first step. Our approach is implemented specifically for heterogeneous and high dimensional time series clinical trials data. Using this framework, we propose a new way of utilizing frequent itemset mining, as well as clustering and declustering techniques with novel distance metrics for measuring similarity between time series data. By clustering the data, we find groups of analytes (substances in blood) that are most strongly correlated. Most of these relationships already known are verified by the clinical panels, and, in addition, we identify novel groups that need further biomedical analysis. A slight modification to our algorithm results in an effective declustering of high dimensional time series data, which is then used for "feature selection." Using industry-sponsored clinical trials data sets, we are able to identify a small set of analytes that effectively models the state of normal health.
2013-01-01
Background Community-based health care planning and regulation necessitates grouping facilities and areal units into regions of similar health care use. Limited research has explored the methodologies used in creating these regions. We offer a new methodology that clusters facilities based on similarities in patient utilization patterns and geographic location. Our case study focused on Hospital Groups in Michigan, the allocation units used for predicting future inpatient hospital bed demand in the state’s Bed Need Methodology. The scientific, practical, and political concerns that were considered throughout the formulation and development of the methodology are detailed. Methods The clustering methodology employs a 2-step K-means + Ward’s clustering algorithm to group hospitals. The final number of clusters is selected using a heuristic that integrates both a statistical-based measure of cluster fit and characteristics of the resulting Hospital Groups. Results Using recent hospital utilization data, the clustering methodology identified 33 Hospital Groups in Michigan. Conclusions Despite being developed within the politically charged climate of Certificate of Need regulation, we have provided an objective, replicable, and sustainable methodology to create Hospital Groups. Because the methodology is built upon theoretically sound principles of clustering analysis and health care service utilization, it is highly transferable across applications and suitable for grouping facilities or areal units. PMID:23964905
Delamater, Paul L; Shortridge, Ashton M; Messina, Joseph P
2013-08-22
Community-based health care planning and regulation necessitates grouping facilities and areal units into regions of similar health care use. Limited research has explored the methodologies used in creating these regions. We offer a new methodology that clusters facilities based on similarities in patient utilization patterns and geographic location. Our case study focused on Hospital Groups in Michigan, the allocation units used for predicting future inpatient hospital bed demand in the state's Bed Need Methodology. The scientific, practical, and political concerns that were considered throughout the formulation and development of the methodology are detailed. The clustering methodology employs a 2-step K-means + Ward's clustering algorithm to group hospitals. The final number of clusters is selected using a heuristic that integrates both a statistical-based measure of cluster fit and characteristics of the resulting Hospital Groups. Using recent hospital utilization data, the clustering methodology identified 33 Hospital Groups in Michigan. Despite being developed within the politically charged climate of Certificate of Need regulation, we have provided an objective, replicable, and sustainable methodology to create Hospital Groups. Because the methodology is built upon theoretically sound principles of clustering analysis and health care service utilization, it is highly transferable across applications and suitable for grouping facilities or areal units.
Hardy, Victoria; O'Connor, Yvonne; Heavin, Ciara; Mastellos, Nikolaos; Tran, Tammy; O'Donoghue, John; Fitzpatrick, Annette L; Ide, Nicole; Wu, Tsung-Shu Joseph; Chirambo, Griphin Baxter; Muula, Adamson S; Nyirenda, Moffat; Carlsson, Sven; Andersson, Bo; Thompson, Matthew
2017-10-11
There is evidence to suggest that frontline community health workers in Malawi are under-referring children to higher-level facilities. Integrating a digitized version of paper-based methods of Community Case Management (CCM) could strengthen delivery, increasing urgent referral rates and preventing unnecessary re-consultations and hospital admissions. This trial aims to evaluate the added value of the Supporting LIFE electronic Community Case Management Application (SL eCCM App) compared to paper-based CCM on urgent referral, re-consultation and hospitalization rates, in two districts in Northern Malawi. This is a pragmatic, stepped-wedge cluster-randomized trial assessing the added value of the SL eCCM App on urgent referral, re-consultation and hospitalization rates of children aged 2 months and older to up to 5 years, within 7 days of the index visit. One hundred and two health surveillance assistants (HSAs) were stratified into six clusters based on geographical location, and clusters randomized to the timing of crossover to the intervention using simple, computer-generated randomization. Training workshops were conducted prior to the control (paper-CCM) and intervention (paper-CCM + SL eCCM App) in assigned clusters. Neither participants nor study personnel were blinded to allocation. Outcome measures were determined by abstraction of clinical data from patient records 2 weeks after recruitment. A nested qualitative study explored perceptions of adherence to urgent referral recommendations and a cost evaluation determined the financial and time-related costs to caregivers of subsequent health care utilization. The trial was conducted between July 2016 and February 2017. This is the first large-scale trial evaluating the value of adding a mobile application of CCM to the assessment of children aged under 5 years. The trial will generate evidence on the potential use of mobile health for CCM in Malawi, and more widely in other low- and middle-income countries. 
ClinicalTrials.gov, ID: NCT02763345 . Registered on 3 May 2016.
Vasylkivska, Veronika S.; Huerta, Nicolas J.
2017-06-24
Determining the spatiotemporal characteristics of natural and induced seismic events holds the opportunity to gain new insights into why these events occur. Linking the seismicity characteristics with other geologic, geographic, natural, or anthropogenic factors could help to identify the causes and suggest mitigation strategies that reduce the risk associated with such events. The nearest-neighbor approach utilized in this work represents a practical first step toward identifying statistically correlated clusters of recorded earthquake events. Detailed study of the Oklahoma earthquake catalog’s inherent errors, empirical model parameters, and model assumptions is presented. We found that the cluster analysis results are stable with respect to empirical parameters (e.g., fractal dimension) but were sensitive to epicenter location errors and seismicity rates. Most critically, we show that the patterns in the distribution of earthquake clusters in Oklahoma are primarily defined by spatial relationships between events. This observation is a stark contrast to California (also known for induced seismicity) where a comparable cluster distribution is defined by both spatial and temporal interactions between events. These results highlight the difficulty in understanding the mechanisms and behavior of induced seismicity but provide insights for future work.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Vasylkivska, Veronika S.; Huerta, Nicolas J.
Determining the spatiotemporal characteristics of natural and induced seismic events holds the opportunity to gain new insights into why these events occur. Linking the seismicity characteristics with other geologic, geographic, natural, or anthropogenic factors could help to identify the causes and suggest mitigation strategies that reduce the risk associated with such events. The nearest-neighbor approach utilized in this work represents a practical first step toward identifying statistically correlated clusters of recorded earthquake events. Detailed study of the Oklahoma earthquake catalog’s inherent errors, empirical model parameters, and model assumptions is presented. We found that the cluster analysis results are stable with respect to empirical parameters (e.g., fractal dimension) but were sensitive to epicenter location errors and seismicity rates. Most critically, we show that the patterns in the distribution of earthquake clusters in Oklahoma are primarily defined by spatial relationships between events. This observation is a stark contrast to California (also known for induced seismicity) where a comparable cluster distribution is defined by both spatial and temporal interactions between events. These results highlight the difficulty in understanding the mechanisms and behavior of induced seismicity but provide insights for future work.
Boyack, Kevin W.; Newman, David; Duhon, Russell J.; Klavans, Richard; Patek, Michael; Biberstine, Joseph R.; Schijvenaars, Bob; Skupin, André; Ma, Nianli; Börner, Katy
2011-01-01
Background We investigate the accuracy of different similarity approaches for clustering over two million biomedical documents. Clustering large sets of text documents is important for a variety of information needs and applications such as collection management and navigation, summary and analysis. The few comparisons of clustering results from different similarity approaches have focused on small literature sets and have given conflicting results. Our study was designed to seek a robust answer to the question of which similarity approach would generate the most coherent clusters of a biomedical literature set of over two million documents. Methodology We used a corpus of 2.15 million recent (2004-2008) records from MEDLINE, and generated nine different document-document similarity matrices from information extracted from their bibliographic records, including titles, abstracts and subject headings. The nine approaches were comprised of five different analytical techniques with two data sources. The five analytical techniques are cosine similarity using term frequency-inverse document frequency vectors (tf-idf cosine), latent semantic analysis (LSA), topic modeling, and two Poisson-based language models – BM25 and PMRA (PubMed Related Articles). The two data sources were a) MeSH subject headings, and b) words from titles and abstracts. Each similarity matrix was filtered to keep the top-n highest similarities per document and then clustered using a combination of graph layout and average-link clustering. Cluster results from the nine similarity approaches were compared using (1) within-cluster textual coherence based on the Jensen-Shannon divergence, and (2) two concentration measures based on grant-to-article linkages indexed in MEDLINE. 
Conclusions PubMed's own related article approach (PMRA) generated the most coherent and most concentrated cluster solution of the nine text-based similarity approaches tested, followed closely by the BM25 approach using titles and abstracts. Approaches using only MeSH subject headings were not competitive with those based on titles and abstracts. PMID:21437291
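One of the similarity approaches named in the abstract above, tf-idf cosine followed by keeping the top-n highest similarities per document, can be sketched in plain Python. This is a minimal illustration under assumptions, not the study's pipeline: the tokenized toy documents, the function names, and the unsmoothed log(N/df) weighting are all chosen here for the example, and the graph layout and average-link clustering steps are not shown.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build sparse tf-idf vectors (term -> weight) for tokenized documents.
    Assumes raw term counts and an unsmoothed idf of log(N / df)."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        term_freq = Counter(doc)
        vectors.append(
            {t: c * math.log(n_docs / doc_freq[t]) for t, c in term_freq.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_n_similar(vectors, n):
    """For each document, keep the n most similar other documents
    as (document index, similarity) pairs."""
    result = []
    for i, u in enumerate(vectors):
        sims = [(cosine(u, v), j) for j, v in enumerate(vectors) if j != i]
        sims.sort(key=lambda pair: (-pair[0], pair[1]))
        result.append([(j, s) for s, j in sims[:n]])
    return result


# Hypothetical toy corpus (already tokenized).
docs = [
    ["gene", "expression", "cluster"],
    ["gene", "expression", "tumor"],
    ["galaxy", "cluster", "survey"],
]
vectors = tfidf_vectors(docs)
print(top_n_similar(vectors, 1))
```

The all-pairs loop is quadratic in the number of documents, so a corpus of the size described above would need an inverted index or a similar pruning strategy; the sketch only shows the weighting and filtering logic.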
Defrance, Matthieu; Janky, Rekin's; Sand, Olivier; van Helden, Jacques
2008-01-01
This protocol explains how to discover functional signals in genomic sequences by detecting over- or under-represented oligonucleotides (words) or spaced pairs thereof (dyads) with the Regulatory Sequence Analysis Tools (http://rsat.ulb.ac.be/rsat/). Two typical applications are presented: (i) predicting transcription factor-binding motifs in promoters of coregulated genes and (ii) discovering phylogenetic footprints in promoters of orthologous genes. The steps of this protocol include purging genomic sequences to discard redundant fragments, discovering over-represented patterns and assembling them to obtain degenerate motifs, scanning sequences and drawing feature maps. The main strength of the method is its statistical ground: the binomial significance provides an efficient control on the rate of false positives. In contrast with optimization-based pattern discovery algorithms, the method supports the detection of under- as well as over-represented motifs. Computation times vary from seconds (gene clusters) to minutes (whole genomes). The execution of the whole protocol should take approximately 1 h.
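The binomial significance mentioned above can be illustrated with a short sketch. It assumes the simplest setting: a word has a prior probability `p` at each of `n` scanned positions and is observed `k` times. The function names and the numbers in the usage lines are hypothetical, and the actual RSAT programs may apply further corrections (for example for multiple testing) that are not shown here.

```python
import math


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): evidence for over-representation."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def binomial_lower_tail(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p): evidence for under-representation."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(0, k + 1))


# Hypothetical example: a hexanucleotide expected at 0.1% of positions,
# seen 6 times in 1000 positions of a promoter set.
p_over = binomial_upper_tail(6, 1000, 0.001)
p_under = binomial_lower_tail(6, 1000, 0.001)
```

A small upper-tail probability suggests over-representation and a small lower-tail probability suggests under-representation, which is why the same test covers both cases.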
Logistics Enterprise Evaluation Model Based On Fuzzy Clustering Analysis
NASA Astrophysics Data System (ADS)
Fu, Pei-hua; Yin, Hong-bo
In this thesis, we introduce an evaluation model for logistics enterprises based on a fuzzy clustering algorithm. First, we present the evaluation index system, which covers basic information, management level, technical strength, transport capacity, informatization level, market competition and customer service. We determine the index weights according to the grades and evaluate the integrated ability of the logistics enterprises using fuzzy cluster analysis. We describe the system evaluation module and the cluster analysis module in detail, including how the two modules were implemented. Finally, we give the results of the system.
Okamoto, Susumu; Taguchi, Takaaki; Ochi, Kozo; Ichinose, Koji
2009-02-27
All known benzoisochromanequinone (BIQ) biosynthetic gene clusters carry a set of genes encoding a two-component monooxygenase homologous to the ActVA-ORF5/ActVB system for actinorhodin biosynthesis in Streptomyces coelicolor A3(2). Here, we conducted molecular genetic and biochemical studies of this enzyme system. Inactivation of actVA-ORF5 yielded a shunt product, actinoperylone (ACPL), apparently derived from 6-deoxy-dihydrokalafungin. Similarly, deletion of actVB resulted in accumulation of ACPL, indicating a critical role for the monooxygenase system in C-6 oxygenation, a biosynthetic step common to all BIQ biosyntheses. Furthermore, in vitro, we showed a quinone-forming activity of the ActVA-ORF5/ActVB system in addition to that of a known C-6 monooxygenase, ActVA-ORF6, by using emodinanthrone as a model substrate. Our results demonstrate that the act gene cluster encodes two alternative routes for quinone formation by C-6 oxygenation in BIQ biosynthesis.
Ecological tolerances of Miocene larger benthic foraminifera from Indonesia
NASA Astrophysics Data System (ADS)
Novak, Vibor; Renema, Willem
2018-01-01
To provide a comprehensive palaeoenvironmental reconstruction based on larger benthic foraminifera (LBF), a quantitative analysis of their assemblage composition is needed. Besides microfacies analysis, which includes environmental preferences of foraminiferal taxa, statistical analyses should also be employed. Therefore, detrended correspondence analysis and cluster analysis were performed on relative abundance data of identified LBF assemblages deposited in mixed carbonate-siliciclastic (MCS) systems and blue-water (BW) settings. Studied MCS system localities include ten sections from the central part of the Kutai Basin in East Kalimantan, ranging from late Burdigalian to Serravallian age. The BW samples were collected from eleven sections of the Bulu Formation on Central Java, dated as Serravallian. Results from detrended correspondence analysis reveal significant differences between these two environmental settings. Cluster analysis produced five clusters of samples: clusters 1 and 2 comprise dominantly MCS samples, clusters 3 and 4 are dominated by BW samples, and cluster 5 shows a mixed composition with both MCS and BW samples. The results of cluster analysis were then subjected to indicator species analysis, which distinguished three groups among LBF taxa: typical assemblage indicators, regularly occurring taxa and rare taxa. By interpreting the results of detrended correspondence analysis, cluster analysis and indicator species analysis, along with environmental preferences of identified LBF taxa, a palaeoenvironmental model is proposed for the distribution of LBF in Miocene MCS systems and adjacent BW settings of Indonesia.
NASA Technical Reports Server (NTRS)
Justice, C.; Townshend, J. (Principal Investigator)
1981-01-01
Two unsupervised classification procedures were applied to ratioed and unratioed LANDSAT multispectral scanner data of an area of spatially complex vegetation and terrain. An objective accuracy assessment was undertaken on each classification and the classification accuracies were compared. The two unsupervised procedures use the same clustering algorithm. In one procedure the entire area is clustered; in the other, a representative sample of the area is clustered and the resulting statistics are extrapolated to the remaining area using a maximum likelihood classifier. The major steps in the classification procedures are explained, including image preprocessing, classification, interpretation of cluster classes, and accuracy assessment. Of the four classifications undertaken, the monocluster block approach on the unratioed data gave the highest accuracy of 80% for five coarse cover classes. This accuracy was increased to 84% by applying a 3 x 3 contextual filter to the classified image. A detailed description and partial explanation is provided for the major misclassification. The classification of the unratioed data produced higher percentage accuracies than for the ratioed data and the monocluster block approach gave higher accuracies than clustering the entire area. The monocluster block approach was additionally the most economical in terms of computing time.
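The abstract does not say which 3 x 3 contextual filter was applied to the classified image. One common choice for label images is a majority (modal) filter, sketched below under that assumption; the function name `majority_filter_3x3` and the tie rule (keep the original label unless another label is strictly more frequent) are illustrative choices, not details from the report.

```python
from collections import Counter


def majority_filter_3x3(labels):
    """Replace each pixel label by the most common label in its 3 x 3 neighbourhood.

    Windows are clipped at the image border. A pixel keeps its own label
    unless some other label is strictly more frequent in the window.
    """
    rows, cols = len(labels), len(labels[0])
    out = [row[:] for row in labels]
    for r in range(rows):
        for c in range(cols):
            window = [
                labels[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
            ]
            counts = Counter(window)
            best = max(counts.values())
            if counts[labels[r][c]] < best:
                out[r][c] = counts.most_common(1)[0][0]
    return out


# A single isolated pixel of class 2 inside a field of class 1 is smoothed away.
image = [
    [1, 1, 1],
    [1, 2, 1],
    [1, 1, 1],
]
smoothed = majority_filter_3x3(image)
```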
One-step generation of continuous-variable quadripartite cluster states in a circuit QED system
NASA Astrophysics Data System (ADS)
Yang, Zhi-peng; Li, Zhen; Ma, Sheng-li; Li, Fu-li
2017-07-01
We propose a dissipative scheme for one-step generation of continuous-variable quadripartite cluster states in a circuit QED setup consisting of four superconducting coplanar waveguide resonators and a gap-tunable superconducting flux qubit. With external driving fields to adjust the desired qubit-resonator and resonator-resonator interactions, we show that continuous-variable quadripartite cluster states of the four resonators can be generated with the assistance of energy relaxation of the qubit. By comparison with the previous proposals, the distinct advantage of our scheme is that only one step of quantum operation is needed to realize the quantum state engineering. This makes our scheme simpler and more feasible in experiment. Our result may have useful application for implementing quantum computation in solid-state circuit QED systems.
NASA Astrophysics Data System (ADS)
Tench, R. J.
1992-11-01
For the first time, nanometer scale uranium clusters were created on the basal plane of highly oriented pyrolytic graphite by laser ablation under ultra-high vacuum conditions. The physical and chemical properties of these clusters were investigated by scanning tunneling microscopy (STM) as well as standard surface science techniques. Auger electron and X-ray photoelectron spectroscopies found the uranium deposit to be free of contamination and showed that no carbide had formed with the underlying graphite. Clusters with sizes ranging from 42 to 630 square angstroms were observed upon initial room temperature deposition. Surface diffusion of uranium was observed after annealing the substrate above 800 K, as evidenced by the decreased number density and the increased size of the clusters. Preferential depletion of clusters on terraces near step edges as a result of annealing was observed. The activation energy for diffusion deduced from these measurements was found to be 15 kcal/mol. Novel formation of ordered uranium thin films was observed for coverages greater than two monolayers after annealing above 900 K. These ordered films displayed islands with hexagonally faceted edges rising in uniform step heights characteristic of the unit cell of the (beta)-phase of uranium. In addition, atomic resolution STM images of these ordered films indicated the formation of the (beta)-phase of uranium. The chemical properties of these surfaces were investigated and it was shown that these uranium films had a reduced oxidation rate in air as compared to bulk metal and that STM imaging in air induced a polarity-dependent enhancement of the oxidation rate.
Craen, Saskia de; Commandeur, Jacques J F; Frank, Laurence E; Heiser, Willem J
2006-06-01
K-means cluster analysis is known for its tendency to produce spherical and equally sized clusters. To assess the magnitude of these effects, a simulation study was conducted, in which populations were created with varying departures from sphericity and group sizes. An analysis of the recovery of clusters in the samples taken from these populations showed a significant effect of lack of sphericity and group size. This effect was, however, not as large as expected, with still a recovery index of more than 0.5 in the "worst case scenario." An interaction effect between the two data aspects was also found. The decreasing trend in the recovery of clusters for increasing departures from sphericity is different for equal and unequal group sizes.
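The abstract does not name the recovery index it reports. One widely used measure of how well a clustering recovers known groups is the adjusted Rand index, sketched here as an assumption rather than as the index used in the study. It equals 1 for perfect recovery (up to relabelling) and is close to 0 for chance-level agreement.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(true_labels, found_labels):
    """Adjusted Rand index between two partitions of the same n >= 2 objects."""
    n = len(true_labels)
    # Pairs of objects placed together in both partitions (contingency cells).
    cells = Counter(zip(true_labels, found_labels))
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_true = sum(comb(c, 2) for c in Counter(true_labels).values())
    sum_found = sum(comb(c, 2) for c in Counter(found_labels).values())
    expected = sum_true * sum_found / comb(n, 2)
    max_index = 0.5 * (sum_true + sum_found)
    if max_index == expected:
        # Degenerate case: both partitions are one big cluster or all singletons.
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


# Same grouping with swapped label names: perfect recovery.
perfect = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
# Every true pair is split by the found clustering: worse than chance.
crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```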
Gillett-Kunnath, Miriam M.; Sevov, Slavi C.
2012-01-01
Although the first studies of Zintl ions date between the late 1890's and early 1930's they were not structurally characterized until many years later [1,2]. Their redox chemistry is even younger, just about ten years old, but despite this short history these deltahedral cluster ions E9(n-) (E = Si, Ge, Sn, Pb; n = 2, 3, 4) have already shown interesting and diverse reactivity and have been at the forefront of rapidly developing and exciting new chemistry [3-6]. Notable milestones are the oxidative coupling of Ge9(4-) clusters to oligomers and infinite chains [7-19], their metallation [14-16,20-25], capping by transition-metal organometallic fragments [26-34], insertion of a transition-metal atom at the center of the cluster which is sometimes combined with capping and oligomerization [35-47], addition of main-group organometallic fragments as exo-bonded substituents [48-50], and functionalization with various organic residues by reactions with organic halides and alkynes [51-58]. This latter development of attaching organic fragments directly to the clusters has opened up a new field, namely organo-Zintl chemistry, that is potentially fertile for further synthetic explorations, and it is the step-by-step procedure for the synthesis of germanium-divinyl clusters described herein. The initial steps outline the synthesis of an intermetallic precursor of K4Ge9 from which the Ge9(4-) clusters are extracted later in solution. This involves fused-silica glass blowing, arc-welding of niobium containers, and handling of highly air-sensitive materials in a glove box. The air-sensitive K4Ge9 is then dissolved in ethylenediamine in the box and then alkenylated by a reaction with Me3SiC≡CSiMe3. The reaction is followed by electrospray mass spectrometry while the resulting solution is used for obtaining single crystals containing the functionalized clusters [H2C=CH-Ge9-CH=CH2](2-). For this purpose the solution is centrifuged, filtered, and carefully layered with a toluene solution of 18-crown-6.
Left undisturbed for a few days, the so-layered solutions produced orange crystalline blocks of [K(18-crown-6)]2[Ge9(HCCH2)2]•en which were characterized by single-crystal X-ray diffraction. The process highlights standard reaction techniques, work-up, and analysis towards functionalized deltahedral Zintl clusters. It is hoped that it will help towards further development and understanding of these compounds in the community at large. PMID:22349121
Martínez-del Campo, Ana; Bodea, Smaranda; Hamer, Hilary A.; Marks, Jonathan A.; Haiser, Henry J.; Turnbaugh, Peter J.
2015-01-01
ABSTRACT Elucidation of the molecular mechanisms underlying the human gut microbiota’s effects on health and disease has been complicated by difficulties in linking metabolic functions associated with the gut community as a whole to individual microorganisms and activities. Anaerobic microbial choline metabolism, a disease-associated metabolic pathway, exemplifies this challenge, as the specific human gut microorganisms responsible for this transformation have not yet been clearly identified. In this study, we established the link between a bacterial gene cluster, the choline utilization (cut) cluster, and anaerobic choline metabolism in human gut isolates by combining transcriptional, biochemical, bioinformatic, and cultivation-based approaches. Quantitative reverse transcription-PCR analysis and in vitro biochemical characterization of two cut gene products linked the entire cluster to growth on choline and supported a model for this pathway. Analyses of sequenced bacterial genomes revealed that the cut cluster is present in many human gut bacteria, is predictive of choline utilization in sequenced isolates, and is widely but discontinuously distributed across multiple bacterial phyla. Given that bacterial phylogeny is a poor marker for choline utilization, we were prompted to develop a degenerate PCR-based method for detecting the key functional gene choline TMA-lyase (cutC) in genomic and metagenomic DNA. Using this tool, we found that new choline-metabolizing gut isolates universally possessed cutC. We also demonstrated that this gene is widespread in stool metagenomic data sets. Overall, this work represents a crucial step toward understanding anaerobic choline metabolism in the human gut microbiota and underscores the importance of examining this microbial community from a function-oriented perspective. PMID:25873372
Bruehl, Stephen; Maihöfner, Christian; Stanton-Hicks, Michael; Perez, Roberto S G M; Vatine, Jean-Jacques; Brunner, Florian; Birklein, Frank; Schlereth, Tanja; Mackey, Sean; Mailis-Gagnon, Angela; Livshitz, Anatoly; Harden, R Norman
2016-08-01
Limited research suggests that there may be Warm complex regional pain syndrome (CRPS) and Cold CRPS subtypes, with inflammatory mechanisms contributing most strongly to the former. This study for the first time used an unbiased statistical pattern recognition technique to evaluate whether distinct Warm vs Cold CRPS subtypes can be discerned in the clinical population. An international, multisite study was conducted using standardized procedures to evaluate signs and symptoms in 152 patients with clinical CRPS at baseline, with 3-month follow-up evaluations in 112 of these patients. Two-step cluster analysis using automated cluster selection identified a 2-cluster solution as optimal. Results revealed a Warm CRPS patient cluster characterized by a warm, red, edematous, and sweaty extremity and a Cold CRPS patient cluster characterized by a cold, blue, and less edematous extremity. Median pain duration was significantly (P < 0.001) shorter in the Warm CRPS (4.7 months) than in the Cold CRPS subtype (20 months), with pain intensity comparable. A derived total inflammatory score was significantly (P < 0.001) elevated in the Warm CRPS group (compared with Cold CRPS) at baseline but diminished significantly (P < 0.001) over the follow-up period, whereas this score did not diminish in the Cold CRPS group (time × subtype interaction: P < 0.001). Results support the existence of a Warm CRPS subtype common in patients with acute (<6 months) CRPS and a relatively distinct Cold CRPS subtype most common in chronic CRPS. The pattern of clinical features suggests that inflammatory mechanisms contribute most prominently to the Warm CRPS subtype but that these mechanisms diminish substantially during the first year postinjury.
NASA Astrophysics Data System (ADS)
Kamann, S.; Husser, T.-O.; Dreizler, S.; Emsellem, E.; Weilbacher, P. M.; Martens, S.; Bacon, R.; den Brok, M.; Giesers, B.; Krajnović, D.; Roth, M. M.; Wendt, M.; Wisotzki, L.
2018-02-01
This is the first of a series of papers presenting the results from our survey of 25 Galactic globular clusters with the MUSE integral-field spectrograph. In combination with our dedicated algorithm for source deblending, MUSE provides unique multiplex capabilities in crowded stellar fields and allows us to acquire samples of up to 20 000 stars within the half-light radius of each cluster. The present paper focuses on the analysis of the internal dynamics of 22 out of the 25 clusters, using about 500 000 spectra of 200 000 individual stars. Thanks to the large stellar samples per cluster, we are able to perform a detailed analysis of the central rotation and dispersion fields using both radial profiles and two-dimensional maps. The velocity dispersion profiles we derive show a good general agreement with existing radial velocity studies but typically reach closer to the cluster centres. By comparison with proper motion data, we derive or update the dynamical distance estimates to 14 clusters. Compared to previous dynamical distance estimates for 47 Tuc, our value is in much better agreement with other methods. We further find significant (>3σ) rotation in the majority (13/22) of our clusters. Our analysis seems to confirm earlier findings of a link between rotation and the ellipticities of globular clusters. In addition, we find a correlation between the strengths of internal rotation and the relaxation times of the clusters, suggesting that the central rotation fields are relics of the cluster formation that are gradually dissipated via two-body relaxation.
Progeny Clustering: A Method to Identify Biological Phenotypes
Hu, Chenyue W.; Kornblau, Steven M.; Slater, John H.; Qutub, Amina A.
2015-01-01
Estimating the optimal number of clusters is a major challenge in applying cluster analysis to any type of dataset, especially to biomedical datasets, which are high-dimensional and complex. Here, we introduce an improved method, Progeny Clustering, which is stability-based and exceptionally efficient in computing, to find the ideal number of clusters. The algorithm employs a novel Progeny Sampling method to reconstruct cluster identity, a co-occurrence probability matrix to assess the clustering stability, and a set of reference datasets to overcome inherent biases in the algorithm and data space. Our method was shown successful and robust when applied to two synthetic datasets (datasets of two-dimensions and ten-dimensions containing eight dimensions of pure noise), two standard biological datasets (the Iris dataset and Rat CNS dataset) and two biological datasets (a cell phenotype dataset and an acute myeloid leukemia (AML) reverse phase protein array (RPPA) dataset). Progeny Clustering outperformed some popular clustering evaluation methods in the ten-dimensional synthetic dataset as well as in the cell phenotype dataset, and it was the only method that successfully discovered clinically meaningful patient groupings in the AML RPPA dataset. PMID:26267476
Nonclassical nucleation pathways in protein crystallization
NASA Astrophysics Data System (ADS)
Zhang, Fajun
2017-11-01
Classical nucleation theory (CNT), which was established about 90 years ago, has been very successful in many research fields, and continues to be the most commonly used theory in describing the nucleation process. For a fluid-to-solid phase transition, CNT states that the solute molecules in a supersaturated solution reversibly form small clusters. Once the cluster size reaches a critical value, it becomes thermodynamically stable and favored for further growth. One of the most important assumptions of CNT is that the nucleation process is described by one reaction coordinate and all order parameters proceed simultaneously. Recent studies in experiments, computer simulations and theory have revealed nonclassical features in the early stage of nucleation. In particular, the decoupling of order parameters involved during a fluid-to-solid transition leads to the so-called two-step nucleation mechanism, in which a metastable intermediate phase (MIP) exists between the initial supersaturated solution and the final crystals. Depending on the exact free energy landscapes, the MIPs can be a high density liquid phase, mesoscopic clusters, or a pre-ordered state. In this review, we focus on the studies of nonclassical pathways in protein crystallization and discuss the applications of the various scenarios of two-step nucleation theory. In particular, we focus on protein solutions in the presence of multivalent salts, which serve as a model protein system to study the nucleation pathways. We wish to point out the unique features of proteins as model systems for further studies.
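The critical cluster size in CNT can be made concrete with the standard textbook expression for a spherical cluster. The symbols are assumptions introduced here and do not appear in the abstract: r is the cluster radius, \gamma the interfacial free energy per unit area, and \Delta g_v > 0 the bulk free-energy gain per unit volume of the new phase.

```latex
\Delta G(r) = 4\pi r^{2}\gamma - \tfrac{4}{3}\pi r^{3}\,\Delta g_v ,
\qquad
\left.\frac{d\,\Delta G}{dr}\right|_{r=r^{*}} = 0
\;\Longrightarrow\;
r^{*} = \frac{2\gamma}{\Delta g_v},
\qquad
\Delta G^{*} = \Delta G(r^{*}) = \frac{16\pi\gamma^{3}}{3\,\Delta g_v^{2}} .
```

Clusters smaller than r^{*} lower their free energy by dissolving, while larger ones lower it by growing. The single variable r plays the role of the one reaction coordinate that CNT assumes; the two-step mechanisms discussed in the review relax that assumption.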
Peng, Bingyin; Huang, Shuangcheng; Liu, Tingting; Geng, Anli
2015-05-17
Xylose isomerase (XI) catalyzes the conversion of xylose to xylulose, which is the key step for anaerobic ethanolic fermentation of xylose. Very few bacterial XIs can function actively in Saccharomyces cerevisiae. Here, we illustrate a group of XIs that would function for xylose fermentation in S. cerevisiae through phylogenetic analysis, recombinant yeast strain construction, and xylose fermentation. Phylogenetic analysis of deposited XI sequences showed that XI evolutionary relationship was highly consistent with the bacterial taxonomic orders and quite a few functional XIs in S. cerevisiae were clustered with XIs from mammal gut Bacteroidetes group. An XI from Bacteroides vulgatus in this cluster was actively expressed in S. cerevisiae with an activity comparable to the fungal XI from Piromyces sp. Two XI genes were isolated from the environmental metagenome and they were clustered with XIs from environmental Bacteroidetes group. These two XIs could not be expressed in yeast with activity. With the XI from B. vulgatus expressed in S. cerevisiae, background yeast strains were optimized by pentose metabolizing pathway enhancement and adaptive evolution in xylose medium. Afterwards, more XIs from the mammal gut Bacteroidetes group, including those from B. vulgatus, Tannerella sp. 6_1_58FAA_CT1, Paraprevotella xylaniphila and Alistipes sp. HGB5, were individually transformed into S. cerevisiae. The known functional XI from Orpinomyces sp. ukk1, a mammal gut fungus, was used as the control. All the resulting recombinant yeast strains were able to ferment xylose. The respiration-deficient strains harboring B. vulgatus and Alistipes sp. HGB5 XI genes respectively obtained specific xylose consumption rate of 0.662 and 0.704 g xylose gcdw(-1) h(-1), and ethanol specific productivity of 0.277 and 0.283 g ethanol gcdw(-1) h(-1), comparable to those obtained by the control strain carrying Orpinomyces sp. ukk1 XI gene.
This study demonstrated that XIs clustered in the mammal gut Bacteroidetes group were able to be expressed functionally in S. cerevisiae and background strain anaerobic adaptive evolution in xylose medium is essential for the screening of functional XIs. The methods outlined in this paper are instructive for the identification of novel XIs that are functional in S. cerevisiae.
Comparative analysis of peak-detection techniques for comprehensive two-dimensional chromatography.
Latha, Indu; Reichenbach, Stephen E; Tao, Qingping
2011-09-23
Comprehensive two-dimensional gas chromatography (GC×GC) is a powerful technology for separating complex samples. The typical goal of GC×GC peak detection is to aggregate data points of analyte peaks based on their retention times and intensities. Two techniques commonly used for two-dimensional peak detection are the two-step algorithm and the watershed algorithm. A recent study [4] compared the performance of the two-step and watershed algorithms for GC×GC data with retention-time shifts in the second-column separations. In that analysis, the peak retention-time shifts were corrected while applying the two-step algorithm but the watershed algorithm was applied without shift correction. The results indicated that the watershed algorithm has a higher probability of erroneously splitting a single two-dimensional peak than the two-step approach. This paper reconsiders the analysis by comparing peak-detection performance for resolved peaks after correcting retention-time shifts for both the two-step and watershed algorithms. Simulations with wide-ranging conditions indicate that when shift correction is employed with both algorithms, the watershed algorithm detects resolved peaks with greater accuracy than the two-step method. Copyright © 2011 Elsevier B.V. All rights reserved.
The balance between keystone clustering and bed roughness in experimental step-pool stabilization
NASA Astrophysics Data System (ADS)
Johnson, J. P.
2016-12-01
Predicting how mountain channels will respond to environmental perturbations such as floods requires an improved quantitative understanding of morphodynamic feedbacks among bed topography, surface grain size and sediment sorting. In boulder-rich gravel streams, transport and sorting often lead to the development of step pool morphologies, which are expressed both in bed topography and coarse grain clustering. Bed stability is difficult to measure, and is sometimes inferred from the presence of step pools. I use scaled flume experiments to explore feedbacks among surface grain sizes, coarse grain clustering, bed roughness and hydraulic roughness during progressive bed stabilization and over a range of sediment transport rates. While grain clusters are sometimes identified by subjective interpretation, I quantify the degree of coarse surface grain clustering using spatial statistics, including a novel normalization of Ripley's K function. This approach is objective and provides information on the strength of clustering over a range of length scales. Flume experiments start with an initial bed surface with a broad grain size distribution and spatially random positions. Flow causes the bed surface to progressively stabilize in response to erosion, surface coarsening, roughening and grain reorganization. At 95% confidence, many but not all beds stabilized with coarse grains becoming more clustered than complete spatial randomness (CSR). I observe a tradeoff between topographic roughness and clustering. Beds that stabilized with higher degrees of coarse-grain clustering were topographically smoother, and vice-versa. Initial conditions influenced the degree of clustering at stability: Beds that happened to have fewer initial coarse grains had more coarse grain reorganization during stabilization, leading to more clustering. 
Finally, regressions demonstrate that clustering statistics actually predict hydraulic roughness significantly better than does D84 (the size at which 84% of grains are smaller). In the experimental data, the spatial organization of surface grains is a stronger control on flow characteristics than the size of surface grains.
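The use of Ripley's K function to quantify clustering can be sketched as follows. This is a naive estimator without edge correction, and the normalization shown is Besag's standard L transform, not the novel normalization mentioned in the abstract (which is not described there). The function names, point coordinates and radius are hypothetical.

```python
import math


def ripley_k(points, r, area):
    """Naive Ripley's K estimate at radius r for n >= 2 points (no edge correction).

    K(r) = area / (n * (n - 1)) * number of ordered pairs (i, j), i != j,
    whose separation is at most r. Under complete spatial randomness (CSR)
    the expected value is pi * r**2.
    """
    n = len(points)
    close_pairs = sum(
        1
        for i, (xi, yi) in enumerate(points)
        for j, (xj, yj) in enumerate(points)
        if i != j and math.hypot(xi - xj, yi - yj) <= r
    )
    return area * close_pairs / (n * (n - 1))


def besag_l(points, r, area):
    """L(r) - r: positive values indicate clustering, negative values dispersion."""
    return math.sqrt(ripley_k(points, r, area) / math.pi) - r


# Hypothetical coarse-grain positions in a unit-area patch of bed surface.
clustered = [(0.10, 0.10), (0.12, 0.10), (0.80, 0.80), (0.82, 0.80)]
dispersed = [(0.25, 0.25), (0.75, 0.25), (0.25, 0.75), (0.75, 0.75)]
k_clustered = ripley_k(clustered, r=0.05, area=1.0)
l_clustered = besag_l(clustered, r=0.05, area=1.0)
l_dispersed = besag_l(dispersed, r=0.05, area=1.0)
```

Evaluating the statistic over a range of r gives the strength of clustering as a function of length scale. A confidence statement such as the 95% level quoted above would additionally need a reference distribution, for example from simulated CSR patterns.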
A Constrained-Clustering Approach to the Analysis of Remote Sensing Data.
1983-01-01
One old and two new clustering methods were applied to the constrained-clustering problem of separating different agricultural fields based on multispectral remote sensing satellite data. (Constrained-clustering involves double clustering in multispectral measurement similarity and geographical location.) The results of applying the three methods are provided along with a discussion of their relative strengths and weaknesses and a detailed description of their algorithms.
On evaluating clustering procedures for use in classification
NASA Technical Reports Server (NTRS)
Pore, M. D.; Moritz, T. E.; Register, D. T.; Yao, S. S.; Eppler, W. G. (Principal Investigator)
1979-01-01
The problem of evaluating clustering algorithms and their respective computer programs for use in a preprocessing step for classification is addressed. In clustering for classification the probability of correct classification is suggested as the ultimate measure of accuracy on training data. A means of implementing this criterion and a measure of cluster purity are discussed. Examples are given. A procedure for cluster labeling that is based on cluster purity and sample size is presented.
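A labeling rule based on cluster purity and sample size might look like the sketch below. The abstract does not define its purity measure, so the usual definition is assumed: the fraction of a cluster's labeled training samples that belong to its majority class. The thresholds `min_purity` and `min_size`, the function names and the crop labels are hypothetical.

```python
from collections import Counter, defaultdict


def cluster_purity(cluster_ids, class_labels):
    """Return {cluster_id: (majority_class, purity, size)} from labeled samples."""
    members = defaultdict(list)
    for cid, label in zip(cluster_ids, class_labels):
        members[cid].append(label)
    summary = {}
    for cid, labels in members.items():
        majority, count = Counter(labels).most_common(1)[0]
        summary[cid] = (majority, count / len(labels), len(labels))
    return summary


def label_clusters(summary, min_purity=0.8, min_size=3):
    """Assign the majority class only to clusters that are pure and large enough."""
    return {
        cid: (majority if purity >= min_purity and size >= min_size else None)
        for cid, (majority, purity, size) in summary.items()
    }


# Hypothetical training samples: cluster 0 is mixed, cluster 1 is pure.
cluster_ids = [0, 0, 0, 0, 1, 1, 1]
classes = ["wheat", "wheat", "wheat", "corn", "corn", "corn", "corn"]
summary = cluster_purity(cluster_ids, classes)
assigned = label_clusters(summary)
```

Clusters left as `None` would need another treatment, for example being split further or labeled by an analyst.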
2013-01-01
Background India currently has more than 60 million people with Type 2 Diabetes Mellitus (T2DM) and this is predicted to increase by nearly two-thirds by 2030. While management of those with T2DM is important, preventing or delaying the onset of the disease, especially in those individuals at ‘high risk’ of developing T2DM, is urgently needed, particularly in resource-constrained settings. This paper describes the protocol for a cluster randomised controlled trial of a peer-led lifestyle intervention program to prevent diabetes in Kerala, India. Methods/design A total of 60 polling booths are randomised to the intervention arm or control arm in rural Kerala, India. Data collection is conducted in two steps. Step 1 (Home screening): Participants aged 30–60 years are administered a screening questionnaire. Those having no history of T2DM and other chronic illnesses with an Indian Diabetes Risk Score value of ≥60 are invited to attend a mobile clinic (Step 2). At the mobile clinic, participants complete questionnaires, undergo physical measurements, and provide blood samples for biochemical analysis. Participants identified with T2DM at Step 2 are excluded from further study participation. Participants in the control arm are provided with a health education booklet containing information on symptoms, complications, and risk factors of T2DM with the recommended levels for primary prevention. 
Participants in the intervention arm receive: (1) eleven peer-led small group sessions to motivate, guide and support in planning, initiation and maintenance of lifestyle changes; (2) two diabetes prevention education sessions led by experts to raise awareness on T2DM risk factors, prevention and management; (3) a participant handbook containing information primarily on peer support and its role in assisting with lifestyle modification; (4) a participant workbook to guide self-monitoring of lifestyle behaviours, goal setting and goal review; (5) the health education booklet that is given to the control arm. Follow-up assessments are conducted at 12 and 24 months. The primary outcome is incidence of T2DM. Secondary outcomes include behavioural, psychosocial, clinical, and biochemical measures. An economic evaluation is planned. Discussion Results from this trial will contribute to improved policy and practice regarding lifestyle intervention programs to prevent diabetes in India and other resource-constrained settings. Trial registration Australia and New Zealand Clinical Trials Registry: ACTRN12611000262909. PMID:24180316
Estimation of a cover-type change matrix from error-prone data
Steen Magnussen
2009-01-01
Coregistration and classification errors seriously compromise per-pixel estimates of land cover change. A more robust estimation of change is proposed in which adjacent pixels are grouped into 3x3 clusters and treated as a unit of observation. A complete change matrix is recovered in a two-step process. The diagonal elements of a change matrix are recovered from...
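As a rough illustration of treating 3x3 groups of adjacent pixels as the unit of observation, the sketch below partitions a classified grid into blocks and tallies cover types per block. It does not reproduce the estimator in the paper, whose description is truncated here; non-overlapping blocks, dropping incomplete edge blocks, and the name `block_class_counts` are all assumptions made for the example.

```python
from collections import Counter


def block_class_counts(grid, size=3):
    """Tally class labels within non-overlapping size x size pixel blocks.

    Assumption: blocks do not overlap and incomplete edge blocks are dropped.
    Returns a dict mapping (block_row, block_col) to a Counter of labels.
    """
    rows, cols = len(grid), len(grid[0])
    blocks = {}
    for r0 in range(0, rows - size + 1, size):
        for c0 in range(0, cols - size + 1, size):
            labels = [
                grid[r][c]
                for r in range(r0, r0 + size)
                for c in range(c0, c0 + size)
            ]
            blocks[(r0 // size, c0 // size)] = Counter(labels)
    return blocks


# Hypothetical classified map: F = forest, U = urban, W = water
grid = [
    ["F", "F", "F", "W", "W", "W"],
    ["F", "F", "U", "W", "W", "W"],
    ["F", "U", "U", "W", "W", "F"],
]
counts = block_class_counts(grid)
```

Running the same tally on maps from two dates would give per-block class counts that could be compared, which is less sensitive to a one-pixel misregistration than a per-pixel comparison.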
Improving Cluster Analysis with Automatic Variable Selection Based on Trees
2014-12-01
regression trees; Daisy: DISsimilAritY; PAM: partitioning around medoids; PMA: penalized multivariate analysis; SPC: sparse principal components; UPGMA: unweighted...unweighted pair-group average method (UPGMA). This method measures dissimilarities between all objects in two clusters and takes the average value
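The average-linkage idea mentioned in this snippet (the mean of all pairwise dissimilarities between the objects of two clusters) can be sketched in a few lines. This is a minimal illustration, not the report's implementation; `average_linkage` and `abs_diff` are hypothetical names, and the absolute difference on one-dimensional points merely stands in for whatever dissimilarity measure is actually used.

```python
from itertools import product


def average_linkage(cluster_a, cluster_b, dissimilarity):
    """Mean pairwise dissimilarity between all objects in two clusters."""
    if not cluster_a or not cluster_b:
        raise ValueError("both clusters must be non-empty")
    pairs = list(product(cluster_a, cluster_b))
    return sum(dissimilarity(x, y) for x, y in pairs) / len(pairs)


def abs_diff(x, y):
    # Hypothetical dissimilarity for 1-D points, chosen only for illustration.
    return abs(x - y)


# Pairwise dissimilarities are 5, 7, 3 and 5, so the average is 5.0.
d = average_linkage([0.0, 2.0], [5.0, 7.0], abs_diff)
```

In agglomerative clustering, the two clusters with the smallest such average would be merged at each step.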
The quantitative analysis of silicon carbide surface smoothing by Ar and Xe cluster ions
NASA Astrophysics Data System (ADS)
Ieshkin, A. E.; Kireev, D. S.; Ermakov, Yu. A.; Trifonov, A. S.; Presnov, D. E.; Garshev, A. V.; Anufriev, Yu. V.; Prokhorova, I. G.; Krupenin, V. A.; Chernysh, V. S.
2018-04-01
The gas cluster ion beam (GCIB) technique was used to smooth the surface of silicon carbide crystals. The effects of processing with cluster ions of two inert gases, argon and xenon, were quantitatively compared. While argon is a standard element for GCIB, results for xenon clusters have not yet been reported. Scanning probe microscopy and high resolution transmission electron microscopy were used to analyse the surface roughness and the quality of the surface crystal layer. Gas cluster ion beam processing smooths the surface relief down to an average roughness of about 1 nm for both elements. Xenon was shown to be the more effective working gas: the sputtering rate for xenon clusters is 2.5 times higher than for argon at the same beam energy. High resolution transmission electron microscopy analysis of the surface defect layer gives thicknesses of 7 ± 2 nm and 8 ± 2 nm for treatment with argon and xenon clusters, respectively.
NASA Astrophysics Data System (ADS)
Huang, Da; Freeley, Mark; Palma, Matteo
2017-03-01
We present a facile strategy of general applicability for the assembly of individual nanoscale moieties in array configurations with single-molecule control. Combining the programming ability of DNA as a scaffolding material with a one-step lithographic process, we demonstrate the patterning of single quantum dots (QDs) at predefined locations on silicon and transparent glass surfaces: as proof of concept, clusters of either one, two, or three QDs were assembled in highly uniform arrays with a 60 nm interdot spacing within each cluster. Notably, the platform developed is reusable after a simple cleaning process and can be designed to exhibit different geometrical arrangements.
Universal quantum computation with temporal-mode bilayer square lattices
NASA Astrophysics Data System (ADS)
Alexander, Rafael N.; Yokoyama, Shota; Furusawa, Akira; Menicucci, Nicolas C.
2018-03-01
We propose an experimental design for universal continuous-variable quantum computation that incorporates recent innovations in linear-optics-based continuous-variable cluster state generation and cubic-phase gate teleportation. The first ingredient is a protocol for generating the bilayer-square-lattice cluster state (a universal resource state) with temporal modes of light. With this state, measurement-based implementation of Gaussian unitary gates requires only homodyne detection. Second, we describe a measurement device that implements an adaptive cubic-phase gate, up to a random phase-space displacement. It requires a two-step sequence of homodyne measurements and consumes a (non-Gaussian) cubic-phase state.
NASA Astrophysics Data System (ADS)
U-thaipan, Kasira; Tedsree, Karaked
2018-06-01
The surface morphology of flower-like Ag/ZnO nanorods can be manipulated by adopting different synthetic routes and by loading different levels of Ag, altering their surface structures to achieve the maximum photocatalytic efficiency. In the single-step preparation method, Ag/ZnO was prepared by directly heating a mixture of Zn2+ and Ag+ precursors in an aqueous NaOH-ethylene glycol solution, while in the two-step preparation method an intermediate of flower-shaped ZnO nanorods was obtained by a hydrothermal process before Ag particles were deposited on the ZnO surfaces by chemical reduction. The structure, morphology and optical properties of the synthesized samples were characterized using TEM, SEM, XRD, DRS and PL techniques. The samples prepared by the single-step method are characterized by agglomeration of Ag atoms as clusters on the surface of ZnO, whereas in the samples prepared by the two-step method Ag atoms are found uniformly dispersed and deposited as discrete Ag nanoparticles on the surface of ZnO. A significant enhancement in the absorption of visible light was evident for Ag/ZnO samples prepared by the two-step method, especially with low Ag content (0.5 mol%). The flower-like Ag/ZnO nanorods prepared with 0.5 mol% Ag by the two-step process were found to be the most efficient photocatalyst for the degradation of phenol, decomposing 90% of phenol within 120 min.