cluster analysis procedure: Topics by Science.gov

Sample records for cluster analysis procedure

ICAP - An Interactive Cluster Analysis Procedure for analyzing remotely sensed data

NASA Technical Reports Server (NTRS)

Wharton, S. W.; Turner, B. J.

1981-01-01

An Interactive Cluster Analysis Procedure (ICAP) was developed to derive classifier training statistics from remotely sensed data. ICAP differs from conventional clustering algorithms by allowing the analyst to optimize the cluster configuration by inspection, rather than by manipulating process parameters. Control of the clustering process alternates between the algorithm, which creates new centroids and forms clusters, and the analyst, who can evaluate and elect to modify the cluster structure. Clusters can be deleted, or lumped together pairwise, or new centroids can be added. A summary of the cluster statistics can be requested to facilitate cluster manipulation. The principal advantage of this approach is that it allows prior information (when available) to be used directly in the analysis, since the analyst interacts with ICAP in a straightforward manner, using basic terms with which he is more likely to be familiar. Results from testing ICAP showed that an informed use of ICAP can improve classification, as compared to an existing cluster analysis procedure.
Finding Groups Using Model-Based Cluster Analysis: Heterogeneous Emotional Self-Regulatory Processes and Heavy Alcohol Use Risk

ERIC Educational Resources Information Center

Mun, Eun Young; von Eye, Alexander; Bates, Marsha E.; Vaschillo, Evgeny G.

2008-01-01

Model-based cluster analysis is a new clustering procedure to investigate population heterogeneity utilizing finite mixture multivariate normal densities. It is an inferentially based, statistically principled procedure that allows comparison of nonnested models using the Bayesian information criterion to compare multiple models and identify the…
The Gap Procedure: for the identification of phylogenetic clusters in HIV-1 sequence data.

PubMed

Vrbik, Irene; Stephens, David A; Roger, Michel; Brenner, Bluma G

2015-11-04

In the context of infectious disease, sequence clustering can be used to provide important insights into the dynamics of transmission. Cluster analysis is usually performed using a phylogenetic approach whereby clusters are assigned on the basis of sufficiently small genetic distances and high bootstrap support (or posterior probabilities). The computational burden involved in this phylogenetic threshold approach is a major drawback, especially when a large number of sequences are being considered. In addition, this method requires a skilled user to specify the appropriate threshold values, which may vary widely depending on the application. This paper presents the Gap Procedure, a distance-based clustering algorithm for the classification of DNA sequences sampled from individuals infected with the human immunodeficiency virus type 1 (HIV-1). Our heuristic algorithm bypasses the need for phylogenetic reconstruction, thereby supporting the quick analysis of large genetic data sets. Moreover, this fully automated procedure relies on data-driven gaps in sorted pairwise distances to infer clusters, so no user-specified threshold values are required. The clustering results obtained by the Gap Procedure on both real and simulated data closely agree with those found using the threshold approach, while requiring only a fraction of the time to complete the analysis. Apart from the dramatic gains in computational time, the Gap Procedure is highly effective in finding distinct groups of genetically similar sequences and obviates the need for subjective user-specified values. The clusters of genetically similar sequences returned by this procedure can be used to detect patterns in HIV-1 transmission and thereby aid in the prevention, treatment and containment of the disease.
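The idea of cutting at a data-driven gap in the sorted pairwise distances can be illustrated with a much-simplified sketch. This is not the authors' Gap Procedure: it assumes a precomputed symmetric distance matrix, uses only the single largest gap as the cut-off, and links every pair of items closer than that cut-off. The function name `gap_clusters` and the toy distances are hypothetical.

```python
from itertools import combinations


def gap_clusters(dist):
    """Cluster items by cutting at the largest gap in sorted pairwise distances.

    dist: symmetric matrix (list of lists) of pairwise distances between
    at least three items. Returns one integer cluster label per item.
    """
    n = len(dist)
    pairs = sorted(dist[i][j] for i, j in combinations(range(n), 2))
    # Locate the largest jump between consecutive sorted distances.
    gaps = [(pairs[k + 1] - pairs[k], k) for k in range(len(pairs) - 1)]
    _, k = max(gaps)
    threshold = (pairs[k] + pairs[k + 1]) / 2.0

    # Union-find: link every pair closer than the data-driven threshold.
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if dist[i][j] < threshold:
            parent[find(i)] = find(j)

    # Relabel the union-find roots as 0, 1, 2, ... in order of appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


# Hypothetical distances: items 0-2 are mutually close, as are items 3-4.
toy = [
    [0.00, 0.01, 0.02, 0.20, 0.21],
    [0.01, 0.00, 0.015, 0.22, 0.20],
    [0.02, 0.015, 0.00, 0.21, 0.23],
    [0.20, 0.22, 0.21, 0.00, 0.012],
    [0.21, 0.20, 0.23, 0.012, 0.00],
]
labels = gap_clusters(toy)
print(labels)
```

No threshold is passed in; the cut-off comes entirely from the distances themselves, which is the property the abstract emphasizes.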
A New Variable Weighting and Selection Procedure for K-Means Cluster Analysis

ERIC Educational Resources Information Center

Steinley, Douglas; Brusco, Michael J.

2008-01-01

A variance-to-range ratio variable weighting procedure is proposed. We show how this weighting method is theoretically grounded in the inherent variability found in data exhibiting cluster structure. In addition, a variable selection procedure is proposed to operate in conjunction with the variable weighting technique. The performances of these…
I. Excluded volume effects in Ising cluster distributions and nuclear multifragmentation. II. Multiple-chance effects in alpha-particle evaporation

NASA Astrophysics Data System (ADS)

Breus, Dimitry Eugene

In Part I, geometric clusters of the Ising model are studied as possible model clusters for nuclear multifragmentation. These clusters cannot be treated as non-interacting (an ideal gas) because of the excluded volume effect, which is predominantly an artifact of the clusters' finite size. Interaction significantly complicates the use of clusters in the analysis of thermodynamic systems. Stillinger's theory is used as a basis for the analysis, which within the RFL (Reiss, Frisch, Lebowitz) fluid-of-spheres approximation produces a prediction for cluster concentrations well obeyed by geometric clusters of the Ising model. If the thermodynamic condition of phase coexistence is met, these concentrations can be incorporated into a differential equation procedure of moderate complexity to elucidate the liquid-vapor phase diagram of the system with cluster interaction included. The drawback of increased complexity is outweighed by the reward of greater accuracy of the phase diagram, as demonstrated with the Ising model. A novel nuclear-cluster analysis procedure is developed by modifying Fisher's model to contain cluster interaction and employing the differential equation procedure to obtain thermodynamic variables. With this procedure applied to geometric clusters, guidelines are developed for detecting the excluded volume effect in nuclear multifragmentation. In Part II, an explanation is offered for the recently observed oscillations in the energy spectra of alpha-particles emitted from hot compound nuclei. Contrary to what was previously expected, the oscillations are assumed to be caused by the multiple-chance nature of alpha-evaporation. In a semi-empirical fashion this assumption is successfully confirmed by a technique of two-spectra decomposition which treats experimental alpha-spectra as having contributions from at least two independent emitters. Building upon the success of the multiple-chance explanation of the oscillations, Moretto's single-chance evaporation theory is augmented to include multiple-chance emission and tested on experimental data to yield positive results.
Tracking Undergraduate Student Achievement in a First-Year Physiology Course Using a Cluster Analysis Approach

ERIC Educational Resources Information Center

Brown, S. J.; White, S.; Power, N.

2015-01-01

A cluster analysis data classification technique was used on assessment scores from 157 undergraduate nursing students who passed 2 successive compulsory courses in human anatomy and physiology. Student scores in five summative assessment tasks, taken in each of the courses, were used as inputs for a cluster analysis procedure. We aimed to group…
ICAP: An Interactive Cluster Analysis Procedure for analyzing remotely sensed data. [to classify the radiance data to produce a thematic map]

NASA Technical Reports Server (NTRS)

Wharton, S. W.

1980-01-01

An Interactive Cluster Analysis Procedure (ICAP) was developed to derive classifier training statistics from remotely sensed data. The algorithm interfaces the rapid numerical processing capacity of a computer with the human ability to integrate qualitative information. Control of the clustering process alternates between the algorithm, which creates new centroids and forms clusters, and the analyst, who can evaluate and elect to modify the cluster structure. Clusters can be deleted or lumped pairwise, or new centroids can be added. A summary of the cluster statistics can be requested to facilitate cluster manipulation. The ICAP was implemented in APL (A Programming Language), an interactive computer language. The flexibility of the algorithm was evaluated using data from different LANDSAT scenes to simulate two situations: one in which the analyst is assumed to have no prior knowledge about the data and wishes to have the clusters formed more or less automatically; and the other in which the analyst is assumed to have some knowledge about the data structure and wishes to use that information to closely supervise the clustering process. For comparison, an existing clustering method was also applied to the two data sets.
A Hierarchical Bayesian Procedure for Two-Mode Cluster Analysis

ERIC Educational Resources Information Center

DeSarbo, Wayne S.; Fong, Duncan K. H.; Liechty, John; Saxton, M. Kim

2004-01-01

This manuscript introduces a new Bayesian finite mixture methodology for the joint clustering of row and column stimuli/objects associated with two-mode asymmetric proximity, dominance, or profile data. That is, common clusters are derived which partition both the row and column stimuli/objects simultaneously into the same derived set of clusters.…
A Systematic Approach for Determining Vertical Pile Depth of Embedment in Cohesionless Soils to Withstand Lateral Barge Train Impact Loads

DTIC Science & Technology

2017-01-30

dynamic structural time-history response analysis of flexible approach walls founded on clustered pile groups using Impact_Deck. In Preparation, ERDC ... research (Ebeling et al. 2012) has developed simplified analysis procedures for flexible approach wall systems founded on clustered groups of vertical ... history response analysis of flexible approach walls founded on clustered pile groups using Impact_Deck. In Preparation, ERDC/ITL TR-16-X. Vicksburg, MS
The Equivalence of Three Statistical Packages for Performing Hierarchical Cluster Analysis

ERIC Educational Resources Information Center

Blashfield, Roger

1977-01-01

Three different software programs which contain hierarchical agglomerative cluster analysis procedures were shown to generate different solutions on the same data set using apparently the same options. The basis for the differences in the solutions was the formulae used to calculate Euclidean distance. (Author/JKS)
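The abstract does not say which distance formulae the three packages used. As an illustration only, the sketch below assumes one plausible discrepancy, plain versus squared Euclidean distance, and shows that under average linkage the two can disagree about which cluster is nearest. All points and function names are hypothetical.

```python
import math


def euclidean(p, q):
    """Plain Euclidean distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def squared_euclidean(p, q):
    """Squared Euclidean distance (no square root)."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def average_linkage(cluster_a, cluster_b, metric):
    """Mean of all pairwise distances between two clusters."""
    total = sum(metric(p, q) for p in cluster_a for q in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))


# Hypothetical clusters: a is a single point, b is spread out, c is compact.
a = [(0.0, 0.0)]
b = [(1.0, 0.0), (5.0, 0.0)]
c = [(-3.4, 0.0), (-3.6, 0.0)]

for name, metric in [("euclidean", euclidean), ("squared", squared_euclidean)]:
    d_ab = average_linkage(a, b, metric)
    d_ac = average_linkage(a, c, metric)
    nearest = "b" if d_ab < d_ac else "c"
    print(f"{name}: d(a,b)={d_ab:.2f} d(a,c)={d_ac:.2f} -> merge a with {nearest}")
```

Squaring preserves the ordering of individual pairwise distances but not the ordering of their averages, so two programs run with "the same" options can still produce different merge sequences.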
Identifying Peer Institutions Using Cluster Analysis

ERIC Educational Resources Information Center

Boronico, Jess; Choksi, Shail S.

2012-01-01

The New York Institute of Technology's (NYIT) School of Management (SOM) wishes to develop a list of peer institutions for the purpose of benchmarking and monitoring/improving performance against other business schools. The procedure utilizes relevant criteria for the purpose of establishing this peer group by way of a cluster analysis. The…
Multivariate Cluster Analysis.

ERIC Educational Resources Information Center

McRae, Douglas J.

Procedures for grouping students into homogeneous subsets have long interested educational researchers. The research reported in this paper is an investigation of a set of objective grouping procedures based on multivariate analysis considerations. Four multivariate functions that might serve as criteria for adequate grouping are given and…
Using Cluster Bootstrapping to Analyze Nested Data With a Few Clusters.

PubMed

Huang, Francis L

2018-04-01

Cluster randomized trials involving participants nested within intact treatment and control groups are commonly performed in various educational, psychological, and biomedical studies. However, recruiting and retaining intact groups present various practical, financial, and logistical challenges to evaluators, and cluster randomized trials are often performed with a low number of clusters (~20 groups). Although multilevel models are often used to analyze nested data, researchers may be concerned about potentially biased results due to having only a few groups under study. Cluster bootstrapping has been suggested as an alternative procedure when analyzing clustered data, though it has seen very little use in educational and psychological studies. Using a Monte Carlo simulation that varied the number of clusters, average cluster size, and intraclass correlations, we compared standard errors using cluster bootstrapping with those derived using ordinary least squares regression and multilevel models. Results indicate that cluster bootstrapping, though more computationally demanding, can be used as an alternative procedure for the analysis of clustered data when treatment effects at the group level are of primary interest. Supplementary material showing how to perform cluster bootstrapped regressions using R is also provided.
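The paper's supplementary material uses R and bootstrapped regressions; the Python sketch below is not that code. It is a minimal illustration of the resampling idea, assuming a simple treatment-minus-control difference in means as the estimator and resampling whole clusters separately within each arm. The function name and toy data are hypothetical.

```python
import random
import statistics


def cluster_bootstrap_se(clusters, n_boot=2000, seed=0):
    """Bootstrap standard error of a treatment-minus-control difference in means.

    clusters: list of (treated: bool, outcomes: list of float), one per cluster.
    Whole clusters are resampled with replacement, separately within each arm
    (an assumption of this sketch), so within-cluster correlation is preserved.
    """
    rng = random.Random(seed)
    treated = [y for t, y in clusters if t]
    control = [y for t, y in clusters if not t]

    def pooled_mean(sample):
        # Mean over all observations in the resampled clusters.
        values = [v for outcomes in sample for v in outcomes]
        return statistics.fmean(values)

    estimates = []
    for _ in range(n_boot):
        t_star = rng.choices(treated, k=len(treated))
        c_star = rng.choices(control, k=len(control))
        estimates.append(pooled_mean(t_star) - pooled_mean(c_star))
    return statistics.stdev(estimates)


# Hypothetical data: three treated and three control clusters.
toy = [
    (True, [5.1, 4.8, 5.5]),
    (True, [6.0, 6.2]),
    (True, [4.9, 5.3, 5.0, 5.2]),
    (False, [4.0, 4.2, 3.9]),
    (False, [4.5, 4.4]),
    (False, [3.8, 4.1, 4.0]),
]
print(round(cluster_bootstrap_se(toy), 3))
```

The unit of resampling is the cluster, not the individual observation; resampling individuals would ignore the nesting and tend to understate the standard error.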
Multivariate Analysis of the Visual Information Processing of Numbers

ERIC Educational Resources Information Center

Levine, David M.

1977-01-01

Nonmetric multidimensional scaling and hierarchical clustering procedures are applied to a confusion matrix of numerals. Two dimensions were interpreted: straight versus curved, and locus of curvature. Four major clusters of numerals were developed. (Author/JKS)
Quantifying the impact of fixed effects modeling of clusters in multiple imputation for cluster randomized trials

PubMed Central

Andridge, Rebecca R.

2011-01-01

In cluster randomized trials (CRTs), identifiable clusters rather than individuals are randomized to study groups. Resulting data often consist of a small number of clusters with correlated observations within a treatment group. Missing data often present a problem in the analysis of such trials, and multiple imputation (MI) has been used to create complete data sets, enabling subsequent analysis with well-established analysis methods for CRTs. We discuss strategies for accounting for clustering when multiply imputing a missing continuous outcome, focusing on estimation of the variance of group means as used in an adjusted t-test or ANOVA. These analysis procedures are congenial to (can be derived from) a mixed effects imputation model; however, this imputation procedure is not yet available in commercial statistical software. An alternative approach that is readily available and has been used in recent studies is to include fixed effects for cluster, but the impact of using this convenient method has not been studied. We show that under this imputation model the MI variance estimator is positively biased and that smaller ICCs lead to larger overestimation of the MI variance. Analytical expressions for the bias of the variance estimator are derived in the case of data missing completely at random (MCAR), and cases in which data are missing at random (MAR) are illustrated through simulation. Finally, various imputation methods are applied to data from the Detroit Middle School Asthma Project, a recent school-based CRT, and differences in inference are compared. PMID:21259309
A Preliminary Comparison of the Effectiveness of Cluster Analysis Weighting Procedures for Within-Group Covariance Structure.

ERIC Educational Resources Information Center

Donoghue, John R.

A Monte Carlo study compared the usefulness of six variable weighting methods for cluster analysis. Data were 100 bivariate observations from 2 subgroups, generated according to a finite normal mixture model. Subgroup size, within-group correlation, within-group variance, and distance between subgroup centroids were manipulated. Of the clustering…
Selection of Variables in Cluster Analysis: An Empirical Comparison of Eight Procedures

ERIC Educational Resources Information Center

Steinley, Douglas; Brusco, Michael J.

2008-01-01

Eight different variable selection techniques for model-based and non-model-based clustering are evaluated across a wide range of cluster structures. It is shown that several methods have difficulties when non-informative variables (i.e., random noise) are included in the model. Furthermore, the distribution of the random noise greatly impacts the…
Somatotyping using 3D anthropometry: a cluster analysis.

PubMed

Olds, Tim; Daniell, Nathan; Petkov, John; David Stewart, Arthur

2013-01-01

Somatotyping is the quantification of human body shape, independent of body size. Hitherto, somatotyping (including the most popular method, the Heath-Carter system) has been based on subjective visual ratings, sometimes supported by surface anthropometry. This study used data derived from three-dimensional (3D) whole-body scans as inputs for cluster analysis to objectively derive clusters of similar body shapes. Twenty-nine dimensions normalised for body size were measured on a purposive sample of 301 adults aged 17-56 years who had been scanned using a Vitus Smart laser scanner. K-means Cluster Analysis with v-fold cross-validation was used to determine shape clusters. Three male and three female clusters emerged, and were visualised using those scans closest to the cluster centroid and a caricature defined by doubling the difference between the average scan and the cluster centroid. The male clusters were decidedly endomorphic (high fatness), ectomorphic (high linearity), and endo-mesomorphic (a mixture of fatness and muscularity). The female clusters were clearly endomorphic, ectomorphic, and ecto-mesomorphic (a mixture of linearity and muscularity). An objective shape quantification procedure combining 3D scanning and cluster analysis yielded shape clusters strikingly similar to traditional somatotyping.
Optimizing disinfection by-product monitoring points in a distribution system using cluster analysis.

PubMed

Delpla, Ianis; Florea, Mihai; Pelletier, Geneviève; Rodriguez, Manuel J

2018-06-04

Trihalomethanes (THMs) and Haloacetic Acids (HAAs) are the main groups detected in drinking water and are consequently strictly regulated. However, the increasing quantity of data for disinfection byproducts (DBPs) produced from research projects and regulatory programs remains largely unexploited, despite a great potential for its use in optimizing drinking water quality monitoring to meet specific objectives. In this work, we developed a procedure to optimize locations and periods for DBP monitoring based on a set of monitoring scenarios using the cluster analysis technique. The optimization procedure used a robust set of spatio-temporal monitoring results on DBPs (THMs and HAAs) generated from intensive sampling campaigns conducted in a residential sector of a water distribution system. Results show that cluster analysis allows for the classification of water quality in different groups of THMs and HAAs according to their similarities, and the identification of locations presenting water quality concerns. By using cluster analysis with different monitoring objectives, this work provides a set of monitoring solutions and a comparison between various monitoring scenarios for decision-making purposes. Finally, it was demonstrated that the data from intensive monitoring of free chlorine residual and water temperature as DBP proxy parameters, when processed using cluster analysis, could also help identify the optimal sampling points and periods for regulatory THM and HAA monitoring. Copyright © 2018 Elsevier Ltd. All rights reserved.
Cluster analysis of molecular simulation trajectories for systems where both conformation and orientation of the sampled states are important.

PubMed

Abramyan, Tigran M; Snyder, James A; Thyparambil, Aby A; Stuart, Steven J; Latour, Robert A

2016-08-05

Clustering methods have been widely used to group together similar conformational states from molecular simulations of biomolecules in solution. For applications such as the interaction of a protein with a surface, the orientation of the protein relative to the surface is also an important clustering parameter because of its potential effect on adsorbed-state bioactivity. This study presents cluster analysis methods that are specifically designed for systems where both molecular orientation and conformation are important, and the methods are demonstrated using test cases of adsorbed proteins for validation. Additionally, because cluster analysis can be a very subjective process, an objective procedure for identifying both the optimal number of clusters and the best clustering algorithm to be applied to analyze a given dataset is presented. The method is demonstrated for several agglomerative hierarchical clustering algorithms used in conjunction with three cluster validation techniques. © 2016 Wiley Periodicals, Inc.

An unsupervised classification approach for analysis of Landsat data to monitor land reclamation in Belmont county, Ohio

NASA Technical Reports Server (NTRS)

Brumfield, J. O.; Bloemer, H. H. L.; Campbell, W. J.

1981-01-01

Two unsupervised classification procedures for analyzing Landsat data used to monitor land reclamation in a surface mining area in east central Ohio are compared for agreement with data collected from the corresponding locations on the ground. One procedure is based on a traditional unsupervised-clustering/maximum-likelihood algorithm sequence that assumes spectral groupings in the Landsat data in n-dimensional space; the other is based on a nontraditional unsupervised-clustering/canonical-transformation/clustering algorithm sequence that not only assumes spectral groupings in n-dimensional space but also includes an additional feature-extraction technique. It is found that the nontraditional procedure provides an appreciable improvement in spectral groupings and apparently increases the level of accuracy in the classification of land cover categories.
Using data mining to segment healthcare markets from patients' preference perspectives.

PubMed

Liu, Sandra S; Chen, Jie

2009-01-01

This paper aims to provide an example of how to use data mining techniques to identify patient segments regarding preferences for healthcare attributes and their demographic characteristics. Data were derived from a number of individuals who received in-patient care at a health network in 2006. Data mining and conventional hierarchical clustering with average linkage and Pearson correlation procedures are employed and compared to show how each procedure best determines segmentation variables. Data mining tools identified three differentiable segments by means of cluster analysis. These three clusters have significantly different demographic profiles. The study reveals that, compared with traditional statistical methods, data mining provides an efficient and effective tool for market segmentation. When numerous cluster variables are involved, researchers and practitioners need to incorporate factor analysis to reduce the number of variables so that the clusters can be understood clearly and meaningfully. Interest in and applications of data mining are increasing in many businesses. However, this technology is seldom applied to healthcare customer experience management. The paper shows that efficient and effective application of data mining methods can aid the understanding of patient healthcare preferences.
Detailed analysis of CAMS procedures for phase 3 using ground truth inventories

NASA Technical Reports Server (NTRS)

Carnes, J. G.

1979-01-01

The results of a study of Procedure 1 as used during LACIE Phase 3 are presented. The study was performed by comparing the Procedure 1 classification results with digitized ground-truth inventories. The proportion estimation accuracy, dot labeling accuracy, and clustering effectiveness are discussed.
Finding Groups Using Model-based Cluster Analysis: Heterogeneous Emotional Self-regulatory Processes and Heavy Alcohol Use Risk

PubMed Central

Mun, Eun-Young; von Eye, Alexander; Bates, Marsha E.; Vaschillo, Evgeny G.

2010-01-01

Model-based cluster analysis is a new clustering procedure to investigate population heterogeneity utilizing finite mixture multivariate normal densities. It is an inferentially based, statistically principled procedure that allows comparison of non-nested models using the Bayesian Information Criterion (BIC) to compare multiple models and identify the optimum number of clusters. The current study clustered 36 young men and women based on their baseline heart rate (HR) and HR variability (HRV), chronic alcohol use, and reasons for drinking. Two cluster groups were identified and labeled High Alcohol Risk and Normative groups. Compared to the Normative group, individuals in the High Alcohol Risk group had higher levels of alcohol use and more strongly endorsed disinhibition and suppression reasons for use. The High Alcohol Risk group showed significant HRV changes in response to positive and negative emotional and appetitive picture cues, compared to neutral cues. In contrast, the Normative group showed a significant HRV change only to negative cues. Findings suggest that the individuals with autonomic self-regulatory difficulties may be more susceptible to heavy alcohol use and use alcohol for emotional regulation. PMID:18331138
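The model-comparison step can be sketched on a toy problem. The code below is not the study's analysis: it assumes univariate data, fits Gaussian mixtures with a basic EM loop, and scores them with the BIC in its "smaller is better" form, BIC = -2 ln L + k ln n (some software reports the opposite sign). The data, function names, and initialisation scheme are hypothetical.

```python
import math


def normal_pdf(x, mu, var):
    """Density of a normal distribution with mean mu and variance var."""
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_mixture(xs, k, n_iter=200):
    """Fit a k-component univariate Gaussian mixture by EM; return the log-likelihood."""
    n = len(xs)
    s = sorted(xs)
    # Initialise means at evenly spaced quantiles, with a shared starting variance.
    mus = [s[int((j + 0.5) * n / k)] for j in range(k)]
    mean = sum(xs) / n
    var0 = sum((x - mean) ** 2 for x in xs) / n
    variances = [var0] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each point.
        resp = []
        for x in xs:
            dens = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, mus, variances)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: update mixing weights, means and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            mus[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            spread = sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(spread, 1e-6)  # floor avoids a degenerate component
    return sum(
        math.log(sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, mus, variances)))
        for x in xs
    )


def bic(log_lik, n_params, n_obs):
    """Bayesian information criterion, 'smaller is better' convention."""
    return -2.0 * log_lik + n_params * math.log(n_obs)


# Hypothetical data with two well-separated groups.
xs = [1.0, 1.2, 0.8, 1.1, 0.9, 1.3, 0.7, 5.0, 5.2, 4.8, 5.1, 4.9, 5.3, 4.7]
for k in (1, 2):
    n_params = 3 * k - 1  # k means + k variances + (k - 1) free mixing weights
    print(k, round(bic(fit_mixture(xs, k), n_params, len(xs)), 2))
```

Because the BIC is computed from the likelihood and a parameter-count penalty, the one- and two-component models can be compared directly even though this sketch makes no use of any nesting relationship between them.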
Statistical Significance for Hierarchical Clustering

PubMed Central

Kimes, Patrick K.; Liu, Yufeng; Hayes, D. Neil; Marron, J. S.

2017-01-01

Summary: Cluster analysis has proved to be an invaluable tool for the exploratory and unsupervised analysis of high dimensional datasets. Among methods for clustering, hierarchical approaches have enjoyed substantial popularity in genomics and other fields for their ability to simultaneously uncover multiple layers of clustering structure. A critical and challenging question in cluster analysis is whether the identified clusters represent important underlying structure or are artifacts of natural sampling variation. Few approaches have been proposed for addressing this problem in the context of hierarchical clustering, for which the problem is further complicated by the natural tree structure of the partition, and the multiplicity of tests required to parse the layers of nested clusters. In this paper, we propose a Monte Carlo based approach for testing statistical significance in hierarchical clustering which addresses these issues. The approach is implemented as a sequential testing procedure guaranteeing control of the family-wise error rate. Theoretical justification is provided for our approach, and its power to detect true clustering structure is illustrated through several simulation studies and applications to two cancer gene expression datasets. PMID:28099990
A Typology of Burnout in Professional Counselors

ERIC Educational Resources Information Center

Lee, Sang Min; Cho, Seong Ho; Kissinger, Daniel; Ogle, Nick T.

2010-01-01

The authors used a cluster analysis procedure and the Counselor Burnout Inventory (S. M. Lee et al., 2007) to identify professional counselors' burnout types. Three clusters were identified: well-adjusted, persevering, and disconnected counselors. The results also indicated that counselors' job satisfaction and self-esteem were good discriminators…
Regression analysis of clustered failure time data with informative cluster size under the additive transformation models.

PubMed

Chen, Ling; Feng, Yanqin; Sun, Jianguo

2017-10-01

This paper discusses regression analysis of clustered failure time data, which occur when the failure times of interest are collected from clusters. In particular, we consider the situation where the correlated failure times of interest may be related to cluster sizes. For inference, we present two estimation procedures, the weighted estimating equation-based method and the within-cluster resampling-based method, when the correlated failure times of interest arise from a class of additive transformation models. The former makes use of the inverse of cluster sizes as weights in the estimating equations, while the latter can be easily implemented by using the existing software packages for right-censored failure time data. An extensive simulation study is conducted and indicates that the proposed approaches work well in both the situations with and without informative cluster size. They are applied to a dental study that motivated this study.
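The two ideas named in the abstract, inverse cluster-size weights and within-cluster resampling, can be shown side by side on a deliberately simple problem. The sketch below is not the paper's method: it assumes a plain mean as the estimand, with no censoring and no additive transformation model. Function names and data are hypothetical.

```python
import random
import statistics


def inverse_size_weighted_mean(clusters):
    """Mean in which each observation is weighted by 1 / (size of its cluster)."""
    num = sum(y / len(c) for c in clusters for y in c)
    den = sum(1.0 / len(c) for c in clusters for _ in c)
    return num / den


def within_cluster_resampling_mean(clusters, n_resamples=5000, seed=0):
    """Draw one observation per cluster, estimate the mean, and average over resamples."""
    rng = random.Random(seed)
    estimates = [
        statistics.fmean([rng.choice(c) for c in clusters])
        for _ in range(n_resamples)
    ]
    return statistics.fmean(estimates)


# Hypothetical data in which the largest cluster has the smallest values,
# i.e. cluster size is informative about the outcome.
clusters = [[10.0, 12.0], [4.0, 5.0, 6.0, 5.0, 4.0, 6.0], [8.0]]

naive = statistics.fmean([y for c in clusters for y in c])
weighted = inverse_size_weighted_mean(clusters)
resampled = within_cluster_resampling_mean(clusters)
print(round(naive, 2), round(weighted, 2), round(resampled, 2))
```

Both approaches give every cluster the same total influence, so they target the same quantity here, whereas the naive pooled mean is pulled toward the largest cluster.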
Segmenting Student Markets with a Student Satisfaction and Priorities Survey.

ERIC Educational Resources Information Center

Borden, Victor M. H.

1995-01-01

A market segmentation analysis of 872 university students compared 2 hierarchical clustering procedures for deriving market segments: 1 using matching-type measures and an agglomerative clustering algorithm, and 1 using the chi-square based automatic interaction detection. Results and implications for planning, evaluating, and improving academic…
Cluster analysis in phenotyping a Portuguese population.

PubMed

Loureiro, C C; Sa-Couto, P; Todo-Bom, A; Bousquet, J

2015-09-03

Unbiased cluster analysis using clinical parameters has identified asthma phenotypes. Adding inflammatory biomarkers to this analysis provided a better insight into the disease mechanisms. This approach has not yet been applied to asthmatic Portuguese patients. The objective was to identify phenotypes of asthma using cluster analysis in a Portuguese asthmatic population treated in secondary medical care. Consecutive patients with asthma were recruited from the outpatient clinic. Patients were optimally treated according to GINA guidelines and enrolled in the study. Procedures were performed according to a standard evaluation of asthma. Phenotypes were identified by cluster analysis using Ward's clustering method. Of the 72 patients enrolled, 57 had full data and were included for cluster analysis. Distribution was set in 5 clusters described as follows: cluster (C) 1, early onset mild allergic asthma; C2, moderate allergic asthma, with long evolution, female prevalence and mixed inflammation; C3, allergic brittle asthma in young females with early disease onset and no evidence of inflammation; C4, severe asthma in obese females with late disease onset, highly symptomatic despite low Th2 inflammation; C5, severe asthma with chronic airflow obstruction, late disease onset and eosinophilic inflammation. In our study population, the identified clusters were mainly coincident with those of other larger-scale cluster analyses. Variables such as age at disease onset, obesity, lung function, FeNO (Th2 biomarker) and disease severity were important for cluster distinction. Copyright © 2015. Published by Elsevier España, S.L.U.
An integrated bioinformatics approach to improve two-color microarray quality-control: impact on biological conclusions.

PubMed

van Haaften, Rachel I M; Luceri, Cristina; van Erk, Arie; Evelo, Chris T A

2009-06-01

Omics technology used for large-scale measurements of gene expression is rapidly evolving. This work pointed out the need for extensive bioinformatics analyses for array quality assessment before and after gene expression clustering and pathway analysis. A study focused on the effect of red wine polyphenols on rat colon mucosa was used to test the impact of quality control and normalisation steps on the biological conclusions. The integration of data visualization, pathway analysis and clustering revealed an artifact problem that was solved with an adapted normalisation. We propose a possible point to point standard analysis procedure, based on a combination of clustering and data visualization, for the analysis of microarray data.
Software system for data management and distributed processing of multichannel biomedical signals.

PubMed

Franaszczuk, P J; Jouny, C C

2004-01-01

The presented software is designed for efficient utilization of a cluster of PC computers for signal analysis of multichannel physiological data. The system consists of three main components: 1) a library of input and output procedures, 2) a database storing additional information about location in a storage system, 3) a user interface for selecting data for analysis, choosing programs for analysis, and distributing computing and output data on cluster nodes. The system allows for processing multichannel time series data in multiple binary formats. Descriptions of the data format, channels and time of recording are included in separate text files. Definition and selection of multiple channel montages is possible. Epochs for analysis can be selected both manually and automatically. Implementation of new signal processing procedures is possible with minimal programming overhead for the input/output processing and user interface. The number of nodes in the cluster used for computations and the amount of storage can be changed with no major modification to the software. Current implementations include the time-frequency analysis of multiday, multichannel recordings of intracranial EEG of epileptic patients as well as evoked response analyses of repeated cognitive tasks.
False Discovery Control in Large-Scale Spatial Multiple Testing

PubMed Central

Sun, Wenguang; Reich, Brian J.; Cai, T. Tony; Guindani, Michele; Schwartzman, Armin

2014-01-01

Summary This article develops a unified theoretical and computational framework for false discovery control in multiple testing of spatial signals. We consider both point-wise and cluster-wise spatial analyses, and derive oracle procedures which optimally control the false discovery rate, false discovery exceedance and false cluster rate, respectively. A data-driven finite approximation strategy is developed to mimic the oracle procedures on a continuous spatial domain. Our multiple testing procedures are asymptotically valid and can be effectively implemented using Bayesian computational algorithms for analysis of large spatial data sets. Numerical results show that the proposed procedures lead to more accurate error control and better power performance than conventional methods. We demonstrate our methods for analyzing the time trends in tropospheric ozone in eastern US. PMID:25642138
The cosmological analysis of X-ray cluster surveys. III. 4D X-ray observable diagrams

NASA Astrophysics Data System (ADS)

Pierre, M.; Valotti, A.; Faccioli, L.; Clerc, N.; Gastaud, R.; Koulouridis, E.; Pacaud, F.

2017-11-01

Context. Despite compelling theoretical arguments, the use of clusters as cosmological probes is, in practice, frequently questioned because of the many uncertainties surrounding cluster-mass estimates. Aims: Our aim is to develop a fully self-consistent cosmological approach of X-ray cluster surveys, exclusively based on observable quantities rather than masses. This procedure is justified given the possibility to directly derive the cluster properties via ab initio modelling, either analytically or by using hydrodynamical simulations. In this third paper, we evaluate the method on cluster toy-catalogues. Methods: We model the population of detected clusters in the count-rate - hardness-ratio - angular size - redshift space and compare the corresponding four-dimensional diagram with theoretical predictions. The best cosmology+physics parameter configuration is determined using a simple minimisation procedure; errors on the parameters are estimated by averaging the results from ten independent survey realisations. The method allows a simultaneous fit of the cosmological parameters of the cluster evolutionary physics and of the selection effects. Results: When using information from the X-ray survey alone plus redshifts, this approach is shown to be as accurate as the modelling of the mass function for the cosmological parameters and to perform better for the cluster physics, for a similar level of assumptions on the scaling relations. It enables the identification of degenerate combinations of parameter values. Conclusions: Given the considerably shorter computer times involved for running the minimisation procedure in the observed parameter space, this method appears to clearly outperform traditional mass-based approaches when X-ray survey data alone are available.
Estimating multilevel logistic regression models when the number of clusters is low: a comparison of different statistical software procedures.

PubMed

Austin, Peter C

2010-04-22

Multilevel logistic regression models are increasingly being used to analyze clustered data in medical, public health, epidemiological, and educational research. Procedures for estimating the parameters of such models are available in many statistical software packages. There is currently little evidence on the minimum number of clusters necessary to reliably fit multilevel regression models. We conducted a Monte Carlo study to compare the performance of different statistical software procedures for estimating multilevel logistic regression models when the number of clusters was low. We examined procedures available in BUGS, HLM, R, SAS, and Stata. We found that there were qualitative differences in the performance of different software procedures for estimating multilevel logistic models when the number of clusters was low. Among the likelihood-based procedures, estimation methods based on adaptive Gauss-Hermite approximations to the likelihood (glmer in R and xtlogit in Stata) or adaptive Gaussian quadrature (Proc NLMIXED in SAS) tended to have superior performance for estimating variance components when the number of clusters was small, compared to software procedures based on penalized quasi-likelihood. However, only Bayesian estimation with BUGS allowed for accurate estimation of variance components when there were fewer than 10 clusters. For all statistical software procedures, estimation of variance components tended to be poor when there were only five subjects per cluster, regardless of the number of clusters.
Task Analysis for Health Occupations. Cluster: Medical Assisting. Occupation: Medical Assistant. Education for Employment Task Lists.

ERIC Educational Resources Information Center

Lathrop, Janice

Task analyses are provided for two duty areas for the occupation of medical assistant in the medical assisting cluster. Five tasks for the duty area "providing therapeutic measures" are as follows: assist with dressing change, apply clean dressing, apply elastic bandage, assist physician in therapeutic procedure, and apply topical…
Homogeneity tests of clustered diagnostic markers with applications to the BioCycle Study

PubMed Central

Tang, Liansheng Larry; Liu, Aiyi; Schisterman, Enrique F.; Zhou, Xiao-Hua; Liu, Catherine Chun-ling

2014-01-01

Diagnostic trials often require the use of a homogeneity test among several markers. Such a test may be necessary to determine the power both during the design phase and in the initial analysis stage. However, no formal method is available for the power and sample size calculation when the number of markers is greater than two and marker measurements are clustered in subjects. This article presents two procedures for testing the accuracy among clustered diagnostic markers. The first procedure is a test of homogeneity among continuous markers based on a global null hypothesis of the same accuracy. The result under the alternative provides the explicit distribution for the power and sample size calculation. The second procedure is a simultaneous pairwise comparison test based on weighted areas under the receiver operating characteristic curves. This test is particularly useful if a global difference among markers is found by the homogeneity test. We apply our procedures to the BioCycle Study designed to assess and compare the accuracy of hormone and oxidative stress markers in distinguishing women with ovulatory menstrual cycles from those without. PMID:22733707
A Comparison of Heuristic Procedures for Minimum within-Cluster Sums of Squares Partitioning

ERIC Educational Resources Information Center

Brusco, Michael J.; Steinley, Douglas

2007-01-01

Perhaps the most common criterion for partitioning a data set is the minimization of the within-cluster sums of squared deviation from cluster centroids. Although optimal solution procedures for within-cluster sums of squares (WCSS) partitioning are computationally feasible for small data sets, heuristic procedures are required for most practical…
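The within-cluster sums of squares criterion mentioned above can be made concrete with a small sketch. This is only an illustration of the criterion itself, not any of the heuristics compared in the paper; the function names and the four sample points are hypothetical.

```python
def centroid(points):
    """Coordinate-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def wcss(points, labels):
    """Sum of squared deviations of each point from its own cluster centroid."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    total = 0.0
    for members in clusters.values():
        c = centroid(members)
        total += sum(sum((x - m) ** 2 for x, m in zip(p, c)) for p in members)
    return total


# Hypothetical data: two tight pairs of points, far apart along the x axis.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
good = wcss(points, [0, 0, 1, 1])  # partition that keeps each pair together
bad = wcss(points, [0, 1, 0, 1])   # partition that splits each pair
print(good, bad)
```

A partitioning heuristic searches for the label assignment with the smallest value of `wcss`; here the partition that keeps the nearby points together scores far lower than the one that splits them.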
Psychological profiling of offender characteristics from crime behaviors in serial rape offences.

PubMed

Kocsis, Richard N; Cooksey, Ray W; Irwin, Harvey J

2002-04-01

Criminal psychological profiling has progressively been incorporated into police procedures despite a dearth of empirical research. Indeed, in the study of serial violent crimes for the purpose of psychological profiling, very few original, quantitative, academically reviewed studies actually exist. This article reports on the analysis of 62 incidents of serial sexual assault. The statistical procedure of multidimensional scaling was employed in the analysis of these data, which in turn produced a five-cluster model of serial rapist behavior. First, a central cluster of behaviors was identified that represents behaviors common to all patterns of serial rape. Second, four distinct outlying patterns were identified as demonstrating distinct offence styles, these being assigned the following descriptive labels: brutality, intercourse, chaotic, and ritual. Furthermore, analysis of these patterns also identified distinct offender characteristics that allow for the use of empirically robust offender profiles in future serial rape investigations.
Effective implementation of hierarchical clustering

NASA Astrophysics Data System (ADS)

Verma, Mudita; Vijayarajan, V.; Sivashanmugam, G.; Bessie Amali, D. Geraldine

2017-11-01

Hierarchical clustering is a form of cluster analysis in which a hierarchy of clusters is built up. To determine which cluster should be split, a large number of observations are carried out. Here, a data set of US-based personalities is considered for clustering. After applying hierarchical clustering to the data set, we group it into three clusters: politicians, sports persons and musicians. The training set is the main parameter that decides the category to be assigned to the observations being collected; the category of these observations must be known. Recognition comes from the formulation of classification. Classification is the main instance of supervised learning, whereas clustering is an instance of an unsupervised procedure. Clustering consists of grouping data that have similar properties, which are either their own or inherited from some other source.
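As a rough illustration of how a hierarchy of clusters can be cut into three groups, the sketch below uses bottom-up (agglomerative) single-linkage merging. The abstract does not state which linkage or direction the authors used, so the linkage rule, the function name `agglomerate` and the six sample points are all assumptions made for this example.

```python
from math import dist


def agglomerate(points, n_clusters):
    """Merge the two closest clusters (single linkage) until n_clusters remain.

    Returns a list of clusters, each a sorted list of point indices.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None  # (distance, index_a, index_b) of the closest pair so far
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the closest members.
                d = min(dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])
        del clusters[b]
    return [sorted(c) for c in clusters]


# Hypothetical 2-D feature vectors forming three well-separated pairs.
points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0), (10, 1)]
groups = agglomerate(points, 3)
print(groups)
```

Stopping the merging at three clusters plays the role of cutting the hierarchy at the level that yields the three groups described in the abstract.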
The acceptability among young Hindus and Muslims of actively ending the lives of newborns with genetic defects.

PubMed

Kamble, Shanmukh; Ahmed, Ramadan; Sorum, Paul Clay; Mullet, Etienne

2014-03-01

To explore the views in non-Western cultures about ending the lives of damaged newborns. 254 university students from India and 150 from Kuwait rated the acceptability of ending the lives of newborns with genetic defects in 54 vignettes consisting of all combinations of four factors: gestational age (term or 7 months); severity of genetic defect (trisomy 21 alone, trisomy 21 with serious morphological abnormalities or trisomy 13 with impending death); the parents' attitude about prolonging care (unknown, in favour or opposed); and the procedure used (withholding treatment, withdrawing it or injecting a lethal substance). Four clusters were identified by cluster analysis and subjected to analysis of variance. Cluster I, labelled 'Never Acceptable', included 4% of the Indians and 59% of the Kuwaitis. Cluster II, 'No Firm Opinion', had little variation in rating from one scenario to the next; it included 38% of the Indians and 18% of the Kuwaitis. In Cluster III, 'Parents' Attitude+Severity+Procedure', all three factors affected the ratings; it was composed of 18% of the Indians and 16% of the Kuwaitis. Cluster IV was called 'Severity+Parents' Attitude' because these had the strongest impact; it was composed of 40% of the Indians and 7% of the Kuwaitis. In accordance with the teachings of Islam versus Hinduism, Kuwaiti students were more likely to oppose ending a newborn's life under all conditions, Indian students more likely to favour it and to judge its acceptability in light of the different circumstances.

Urban hospital 'clusters' do shift high-risk procedures to key facilities, but more could be done.

PubMed

Luke, Roice D; Luke, Tyler; Muller, Nancy

2011-09-01

Since the 1990s, rapid consolidation in the hospital sector has resulted in the vast majority of hospitals joining systems that already had a considerable presence within their markets. We refer to these important local and regional systems as "clusters." To determine whether hospital clusters have taken measurable steps aimed at improving the quality of care (specifically, by concentrating low-volume, high-complexity services within selected "lead" facilities), this study examined within-cluster concentrations of high-risk cases for seven surgical procedures. We found that lead hospitals on average performed fairly high percentages of the procedures per cluster, ranging from 59 percent for esophagectomy to 87 percent for aortic valve replacement. The numbers indicate that hospitals might need to work with rival facilities outside their cluster to concentrate cases for the lowest-volume procedures, such as esophagectomies, whereas coordination among cluster members might be sufficient for higher-volume procedures. The results imply that policy makers should focus on clusters' potential for restructuring care and further coordinating services across hospitals in local areas.
Extending the Functionality of Behavioural Change-Point Analysis with k-Means Clustering: A Case Study with the Little Penguin (Eudyptula minor)

PubMed Central

Zhang, Jingjing; Dennis, Todd E.

2015-01-01

We present a simple framework for classifying mutually exclusive behavioural states within the geospatial lifelines of animals. This method involves use of three sequentially applied statistical procedures: (1) behavioural change point analysis to partition movement trajectories into discrete bouts of same-state behaviours, based on abrupt changes in the spatio-temporal autocorrelation structure of movement parameters; (2) hierarchical multivariate cluster analysis to determine the number of different behavioural states; and (3) k-means clustering to classify inferred bouts of same-state location observations into behavioural modes. We demonstrate application of the method by analysing synthetic trajectories of known ‘artificial behaviours’ comprised of different correlated random walks, as well as real foraging trajectories of little penguins (Eudyptula minor) obtained by global-positioning-system telemetry. Our results show that the modelling procedure correctly classified 92.5% of all individual location observations in the synthetic trajectories, demonstrating reasonable ability to successfully discriminate behavioural modes. Most individual little penguins were found to exhibit three unique behavioural states (resting, commuting/active searching, area-restricted foraging), with variation in the timing and locations of observations apparently related to ambient light, bathymetry, and proximity to coastlines and river mouths. Addition of k-means clustering extends the utility of behavioural change point analysis, by providing a simple means through which the behaviours inferred for the location observations comprising individual movement trajectories can be objectively classified. PMID:25922935
Extending the Functionality of Behavioural Change-Point Analysis with k-Means Clustering: A Case Study with the Little Penguin (Eudyptula minor).

PubMed

Zhang, Jingjing; O'Reilly, Kathleen M; Perry, George L W; Taylor, Graeme A; Dennis, Todd E

2015-01-01

We present a simple framework for classifying mutually exclusive behavioural states within the geospatial lifelines of animals. This method involves use of three sequentially applied statistical procedures: (1) behavioural change point analysis to partition movement trajectories into discrete bouts of same-state behaviours, based on abrupt changes in the spatio-temporal autocorrelation structure of movement parameters; (2) hierarchical multivariate cluster analysis to determine the number of different behavioural states; and (3) k-means clustering to classify inferred bouts of same-state location observations into behavioural modes. We demonstrate application of the method by analysing synthetic trajectories of known 'artificial behaviours' comprised of different correlated random walks, as well as real foraging trajectories of little penguins (Eudyptula minor) obtained by global-positioning-system telemetry. Our results show that the modelling procedure correctly classified 92.5% of all individual location observations in the synthetic trajectories, demonstrating reasonable ability to successfully discriminate behavioural modes. Most individual little penguins were found to exhibit three unique behavioural states (resting, commuting/active searching, area-restricted foraging), with variation in the timing and locations of observations apparently related to ambient light, bathymetry, and proximity to coastlines and river mouths. Addition of k-means clustering extends the utility of behavioural change point analysis, by providing a simple means through which the behaviours inferred for the location observations comprising individual movement trajectories can be objectively classified.
Comparative study of two protocols for quantitative image-analysis of serotonin transporter clustering in lymphocytes, a putative biomarker of therapeutic efficacy in major depression.

PubMed

Romay-Tallon, Raquel; Rivera-Baltanas, Tania; Allen, Josh; Olivares, Jose M; Kalynchuk, Lisa E; Caruncho, Hector J

2017-01-01

The pattern of serotonin transporter clustering on the plasma membrane of lymphocytes extracted from human whole blood samples has been identified as a putative biomarker of therapeutic efficacy in major depression. Here we evaluated the possibility of performing a similar analysis using blood smears obtained from rats, and from control human subjects and depression patients. We hypothesized that we could optimize a protocol to make the analysis of serotonin protein clustering in blood smears comparable to the analysis of serotonin protein clustering using isolated lymphocytes. Our data indicate that blood smears require a longer fixation time and longer times of incubation with primary and secondary antibodies. In addition, one needs to optimize the image analysis settings for the analysis of smears. When these steps are followed, the quantitative analysis of both the number and size of serotonin transporter clusters on the plasma membrane of lymphocytes is similar using both blood smears and isolated lymphocytes. The development of this novel protocol will greatly facilitate the collection of appropriate samples by eliminating the necessity and cost of specialized personnel for drawing blood samples, and by being a less invasive procedure. Therefore, this protocol will help us advance the validation of membrane protein clustering in lymphocytes as a biomarker of therapeutic efficacy in major depression, and bring it closer to its clinical application.
Assessment of repeatability of composition of perfumed waters by high-performance liquid chromatography combined with numerical data analysis based on cluster analysis (HPLC UV/VIS - CA).

PubMed

Ruzik, L; Obarski, N; Papierz, A; Mojski, M

2015-06-01

High-performance liquid chromatography (HPLC) with UV/VIS spectrophotometric detection combined with the chemometric method of cluster analysis (CA) was used for the assessment of repeatability of composition of nine types of perfumed waters. In addition, the chromatographic method of separating components of the perfume waters under analysis was subjected to an optimization procedure. The chromatograms thus obtained were used as sources of data for the chemometric method of cluster analysis (CA). The result was a classification of a set comprising 39 perfumed water samples with a similar composition at a specified level of probability (level of agglomeration). A comparison of the classification with the manufacturer's declarations reveals a good degree of consistency and demonstrates similarity between samples in different classes. A combination of the chromatographic method with cluster analysis (HPLC UV/VIS - CA) makes it possible to quickly assess the repeatability of composition of perfumed waters at selected levels of probability. © 2014 Society of Cosmetic Scientists and the Société Française de Cosmétologie.
Relationship between Procedural Tactical Knowledge and Specific Motor Skills in Young Soccer Players

PubMed Central

Aquino, Rodrigo; Marques, Renato Francisco R.; Petiot, Grégory Hallé; Gonçalves, Luiz Guilherme C.; Moraes, Camila; Santiago, Paulo Roberto P.; Puggina, Enrico Fuini

2016-01-01

The purpose of this study was to investigate the association between offensive tactical knowledge and the soccer-specific motor skills performance. Fifteen participants were submitted to two evaluation tests, one to assess their technical and tactical analysis. The motor skills performance was measured through four tests of technical soccer skills: ball control, shooting, passing and dribbling. The tactical performance was based on a tactical assessment system called FUT-SAT (Analyses of Procedural Tactical Knowledge in Soccer). Afterwards, technical and tactical evaluation scores were ranked with and without the use of the cluster method. A positive, weak correlation was perceived in both analyses (rho = 0.39, not significant p = 0.14 (with cluster analysis); and rho = 0.35; not significant p = 0.20 (without cluster analysis)). We can conclude that there was a weak association between the technical and the offensive tactical knowledge. This shows the need to reflect on the use of such tests to assess technical skills in team sports since they do not take into account the variability and unpredictability of game actions and disregard the inherent needs to assess such skill performance in the game. PMID:29910300
Failure Mode Identification Through Clustering Analysis

NASA Technical Reports Server (NTRS)

Arunajadai, Srikesh G.; Stone, Robert B.; Tumer, Irem Y.; Clancy, Daniel (Technical Monitor)

2002-01-01

Research has shown that nearly 80% of the costs and problems are created in product development and that cost and quality are essentially designed into products in the conceptual stage. Currently, failure identification procedures (such as FMEA (Failure Modes and Effects Analysis), FMECA (Failure Modes, Effects and Criticality Analysis) and FTA (Fault Tree Analysis)) and design of experiments are being used for quality control and for the detection of potential failure modes during the detail design stage or post-product launch. Though all of these methods have their own advantages, they do not give information as to what are the predominant failures that a designer should focus on while designing a product. This work uses a functional approach to identify failure modes, which hypothesizes that similarities exist between different failure modes based on the functionality of the product/component. In this paper, a statistical clustering procedure is proposed to retrieve information on the set of predominant failures that a function experiences. The various stages of the methodology are illustrated using a hypothetical design example.
COVARIATE-ADAPTIVE CLUSTERING OF EXPOSURES FOR AIR POLLUTION EPIDEMIOLOGY COHORTS*

PubMed Central

Keller, Joshua P.; Drton, Mathias; Larson, Timothy; Kaufman, Joel D.; Sandler, Dale P.; Szpiro, Adam A.

2017-01-01

Cohort studies in air pollution epidemiology aim to establish associations between health outcomes and air pollution exposures. Statistical analysis of such associations is complicated by the multivariate nature of the pollutant exposure data as well as the spatial misalignment that arises from the fact that exposure data are collected at regulatory monitoring network locations distinct from cohort locations. We present a novel clustering approach for addressing this challenge. Specifically, we present a method that uses geographic covariate information to cluster multi-pollutant observations and predict cluster membership at cohort locations. Our predictive k-means procedure identifies centers using a mixture model and is followed by multi-class spatial prediction. In simulations, we demonstrate that predictive k-means can reduce misclassification error by over 50% compared to ordinary k-means, with minimal loss in cluster representativeness. The improved prediction accuracy results in large gains of 30% or more in power for detecting effect modification by cluster in a simulated health analysis. In an analysis of the NIEHS Sister Study cohort using predictive k-means, we find that the association between systolic blood pressure (SBP) and long-term fine particulate matter (PM2.5) exposure varies significantly between different clusters of PM2.5 component profiles. Our cluster-based analysis shows that for subjects assigned to a cluster located in the Midwestern U.S., a 10 μg/m3 difference in exposure is associated with 4.37 mmHg (95% CI, 2.38, 6.35) higher SBP. PMID:28572869
Clustering algorithm evaluation and the development of a replacement for procedure 1. [for crop inventories]

NASA Technical Reports Server (NTRS)

Lennington, R. K.; Johnson, J. K.

1979-01-01

An efficient procedure which clusters data using a completely unsupervised clustering algorithm and then uses labeled pixels to label the resulting clusters or perform a stratified estimate using the clusters as strata is developed. Three clustering algorithms, CLASSY, AMOEBA, and ISOCLS, are compared for efficiency. Three stratified estimation schemes and three labeling schemes are also considered and compared.
The Cluster Analysis of Jobs Based on Data from the Position Analysis Questionnaire (PAQ). Report No. 7.

ERIC Educational Resources Information Center

DeNisi, Angelo S.; McCormick, Ernest J.

The Position Analysis Questionnaire (PAQ) is a structured job analysis procedure that provides for the analysis of jobs in terms of each of 187 job elements, these job elements being grouped into six divisions: information input, mental processes, work output, relationships with other persons, job context, and other job characteristics. Two…
[Study on procedure of seed quality testing and seed grading scale of Phellodendron amurense].

PubMed

Liu, Yanlu; Zhang, Zhao; Dai, Lingchao; Zhang, Bengang; Zhang, Xiaoling; Wang, Han

2011-12-01

To study the procedure of seed quality testing and seed grading scale of Phellodendron amurense. Seed quality testing methods were developed, which included the test of sampling, seed purity, weight per 1 000 seeds, seed moisture, seed viability and germination rate. The related data from 62 cases of seed specimens of P. amurense were analyzed by cluster analysis. The seed quality test procedure was developed, and the seed quality grading scale was formulated.
Review of Instructional Approaches in Ethics Education.

PubMed

Mulhearn, Tyler J; Steele, Logan M; Watts, Logan L; Medeiros, Kelsey E; Mumford, Michael D; Connelly, Shane

2017-06-01

Increased investment in ethics education has prompted a variety of instructional objectives and frameworks. Yet, no systematic procedure to classify these varying instructional approaches has been attempted. In the present study, a quantitative clustering procedure was conducted to derive a typology of instruction in ethics education. In total, 330 ethics training programs were included in the cluster analysis. The training programs were appraised with respect to four instructional categories including instructional content, processes, delivery methods, and activities. Eight instructional approaches were identified through this clustering procedure, and these instructional approaches showed different levels of effectiveness. Instructional effectiveness was assessed based on one of nine commonly used ethics criteria. With respect to specific training types, Professional Decision Processes Training (d = 0.50) and Field-Specific Compliance Training (d = 0.46) appear to be viable approaches to ethics training based on Cohen's d effect size estimates. By contrast, two commonly used approaches, General Discussion Training (d = 0.31) and Norm Adherence Training (d = 0.37), were found to be considerably less effective. The implications for instruction in ethics training are discussed.
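The effect sizes quoted above are Cohen's d values. As a reminder of what that statistic measures, here is a minimal sketch of the usual pooled-standard-deviation form; the abstract does not say exactly how the study computed d, and the scores below are invented purely for illustration.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(treated, control):
    """Standardized mean difference using the pooled sample variance."""
    n1, n2 = len(treated), len(control)
    pooled_var = ((n1 - 1) * variance(treated)
                  + (n2 - 1) * variance(control)) / (n1 + n2 - 2)
    return (mean(treated) - mean(control)) / sqrt(pooled_var)


# Hypothetical outcome scores for a trained group and a comparison group.
trained = [2.0, 4.0, 6.0]
comparison = [1.0, 3.0, 5.0]
d = cohens_d(trained, comparison)
print(round(d, 2))
```

Read this way, a value of d = 0.50 means the trained group's mean sits half a pooled standard deviation above the comparison group's mean.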
Aftershock identification problem via the nearest-neighbor analysis for marked point processes

NASA Astrophysics Data System (ADS)

Gabrielov, A.; Zaliapin, I.; Wong, H.; Keilis-Borok, V.

2007-12-01

The centennial observations on the world seismicity have revealed a wide variety of clustering phenomena that unfold in the space-time-energy domain and provide most reliable information about the earthquake dynamics. However, there is neither a unifying theory nor a convenient statistical apparatus that would naturally account for the different types of seismic clustering. In this talk we present a theoretical framework for nearest-neighbor analysis of marked processes and obtain new results on hierarchical approach to studying seismic clustering introduced by Baiesi and Paczuski (2004). Recall that under this approach one defines an asymmetric distance D in space-time-energy domain such that the nearest-neighbor spanning graph with respect to D becomes a time-oriented tree. We demonstrate how this approach can be used to detect earthquake clustering. We apply our analysis to the observed seismicity of California and synthetic catalogs from ETAS model and show that the earthquake clustering part is statistically different from the homogeneous part. This finding may serve as a basis for an objective aftershock identification procedure.
Simultaneous Classification and Multidimensional Scaling with External Information

ERIC Educational Resources Information Center

Kiers, Henk A. L.; Vicari, Donatella; Vichi, Maurizio

2005-01-01

For the exploratory analysis of a matrix of proximities or (dis)similarities between objects, one often uses cluster analysis (CA) or multidimensional scaling (MDS). Solutions resulting from such analyses are sometimes interpreted using external information on the objects. Usually the procedures of CA, MDS and using external information are…
Automated modal parameter estimation using correlation analysis and bootstrap sampling

NASA Astrophysics Data System (ADS)

Yaghoubi, Vahid; Vakilzadeh, Majid K.; Abrahamsson, Thomas J. S.

2018-02-01

The estimation of modal parameters from a set of noisy measured data is a highly judgmental task, with user expertise playing a significant role in distinguishing between estimated physical and noise modes of a test-piece. Various methods have been developed to automate this procedure. The common approach is to identify models with different orders and cluster similar modes together. However, most proposed methods based on this approach suffer from high-dimensional optimization problems in either the estimation or clustering step. To overcome this problem, this study presents an algorithm for autonomous modal parameter estimation in which the only required optimization is performed in a three-dimensional space. To this end, a subspace-based identification method is employed for the estimation and a non-iterative correlation-based method is used for the clustering. This clustering is at the heart of the paper. The keys to success are correlation metrics that are able to treat the problems of spatial eigenvector aliasing and nonunique eigenvectors of coalescent modes simultaneously. The algorithm commences by the identification of an excessively high-order model from frequency response function test data. The high number of modes of this model provides bases for two subspaces: one for likely physical modes of the tested system and one for its complement dubbed the subspace of noise modes. By employing the bootstrap resampling technique, several subsets are generated from the same basic dataset and for each of them a model is identified to form a set of models. Then, by correlation analysis with the two aforementioned subspaces, highly correlated modes of these models which appear repeatedly are clustered together and the noise modes are collected in a so-called Trashbox cluster. Stray noise modes attracted to the mode clusters are trimmed away in a second step by correlation analysis. 
The final step of the algorithm is a fuzzy c-means clustering procedure applied to a three-dimensional feature space to assign a degree of physicalness to each cluster. The proposed algorithm is applied to two case studies: one with synthetic data and one with real test data obtained from a hammer impact test. The results indicate that the algorithm successfully clusters similar modes and gives a reasonable quantification of the extent to which each cluster is physical.
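The fuzzy c-means step named above assigns each item a graded membership in every cluster rather than a hard label. The sketch below shows the textbook fuzzy c-means updates on a three-dimensional feature space; it is not the authors' implementation, and the feature values, the fuzzifier `m = 2`, the fixed iteration count and the choice of initial centers are all assumptions for illustration.

```python
from math import dist


def memberships(points, centers, m=2.0):
    """Textbook FCM membership update: rows are points, columns are clusters."""
    power = 2.0 / (m - 1.0)
    rows = []
    for p in points:
        d = [dist(p, c) for c in centers]
        if 0.0 in d:
            # The point coincides with a center: give it full membership there.
            hits = [1.0 if x == 0.0 else 0.0 for x in d]
            rows.append([h / sum(hits) for h in hits])
        else:
            rows.append([1.0 / sum((di / dj) ** power for dj in d) for di in d])
    return rows


def update_centers(points, u, m=2.0):
    """Each center is the mean of all points weighted by membership ** m."""
    dims = len(points[0])
    centers = []
    for k in range(len(u[0])):
        w = [row[k] ** m for row in u]
        total = sum(w)
        centers.append(tuple(sum(wi * p[d] for wi, p in zip(w, points)) / total
                             for d in range(dims)))
    return centers


def fuzzy_c_means(points, centers, m=2.0, iterations=50):
    """Alternate the two updates for a fixed number of iterations."""
    for _ in range(iterations):
        u = memberships(points, centers, m)
        centers = update_centers(points, u, m)
    return centers, memberships(points, centers, m)


# Hypothetical 3-D feature vectors: two near the origin, two near (1, 1, 1).
features = [(0.1, 0.1, 0.0), (0.0, 0.2, 0.1), (0.9, 1.0, 0.9), (1.0, 0.9, 1.0)]
centers, u = fuzzy_c_means(features, [features[0], features[2]])
for row in u:
    print([round(x, 3) for x in row])
```

Each row of `u` sums to one, so a membership value can be read as a degree between 0 and 1, which is the kind of graded quantity the abstract calls a degree of physicalness.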
Challenges in microarray class discovery: a comprehensive examination of normalization, gene selection and clustering

PubMed Central

2010-01-01

Background Cluster analysis, and in particular hierarchical clustering, is widely used to extract information from gene expression data. The aim is to discover new classes, or sub-classes, of either individuals or genes. Performing a cluster analysis commonly involves decisions on how to handle missing values, standardize the data and select genes. In addition, pre-processing, involving various types of filtration and normalization procedures, can have an effect on the ability to discover biologically relevant classes. Here we consider cluster analysis in a broad sense and perform a comprehensive evaluation that covers several aspects of cluster analyses, including normalization. Results We evaluated 2780 cluster analysis methods on seven publicly available 2-channel microarray data sets with common reference designs. Each cluster analysis method differed in data normalization (5 normalizations were considered), missing value imputation (2), standardization of data (2), gene selection (19) or clustering method (11). The cluster analyses are evaluated using known classes, such as cancer types, and the adjusted Rand index. The performances of the different analyses vary between the data sets and it is difficult to give general recommendations. However, normalization, gene selection and clustering method are all variables that have a significant impact on the performance. In particular, gene selection is important and it is generally necessary to include a relatively large number of genes in order to get good performance. Selecting genes with high standard deviation or using principal component analysis are shown to be the preferred gene selection methods. Hierarchical clustering using Ward's method, k-means clustering and Mclust are the clustering methods considered in this paper that achieve the highest adjusted Rand index. 
Normalization can have a significant positive impact on the ability to cluster individuals, and there are indications that background correction is preferable, in particular if the gene selection is successful. However, this is an area that needs to be studied further in order to draw any general conclusions. Conclusions The choice of cluster analysis, and in particular gene selection, has a large impact on the ability to cluster individuals correctly based on expression profiles. Normalization has a positive effect, but the relative performance of different normalizations is an area that needs more research. In summary, although clustering, gene selection and normalization are considered standard methods in bioinformatics, our comprehensive analysis shows that selecting the right methods, and the right combinations of methods, is far from trivial and that much is still unexplored in what is considered to be the most basic analysis of genomic data. PMID:20937082
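The adjusted Rand index used above to score each clustering against known classes can be computed from pair counts. The following is a small sketch of the standard formula; the function name and the toy labels are illustrative and are not taken from the paper.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same samples."""
    n = len(labels_true)
    if n < 2:
        return 1.0  # no pairs to compare; treat as trivial agreement
    # Pairs of samples that share a cell, a true class, or a predicted cluster.
    sum_cells = sum(comb(v, 2) for v in Counter(zip(labels_true, labels_pred)).values())
    sum_true = sum(comb(v, 2) for v in Counter(labels_true).values())
    sum_pred = sum(comb(v, 2) for v in Counter(labels_pred).values())
    expected = sum_true * sum_pred / comb(n, 2)
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings have one group
    return (sum_cells - expected) / (max_index - expected)


# Toy example: cluster names do not matter, only the grouping does.
known = ["tumour", "tumour", "normal", "normal"]
print(adjusted_rand_index(known, [1, 1, 0, 0]))  # perfect agreement -> 1.0
print(adjusted_rand_index(known, [0, 1, 0, 1]))  # worse than chance -> -0.5
```

A value of 1 means the clustering reproduces the known classes exactly, values near 0 are what random assignment would give, and negative values indicate agreement below chance.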
On evaluating clustering procedures for use in classification

NASA Technical Reports Server (NTRS)

Pore, M. D.; Moritz, T. E.; Register, D. T.; Yao, S. S.; Eppler, W. G. (Principal Investigator)

1979-01-01

The problem of evaluating clustering algorithms and their respective computer programs for use in a preprocessing step for classification is addressed. In clustering for classification the probability of correct classification is suggested as the ultimate measure of accuracy on training data. A means of implementing this criterion and a measure of cluster purity are discussed. Examples are given. A procedure for cluster labeling that is based on cluster purity and sample size is presented.
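A purity-based labeling rule of the kind mentioned above might look like the sketch below. The abstract does not give the report's actual definitions, so the purity measure (share of the majority class among a cluster's labelled samples), the thresholds `min_purity` and `min_size`, and the function name are all assumptions made for illustration.

```python
from collections import Counter


def label_clusters(cluster_ids, class_labels, min_purity=0.8, min_size=3):
    """Map each cluster id to (label or None, purity, size)."""
    by_cluster = {}
    for cid, cls in zip(cluster_ids, class_labels):
        by_cluster.setdefault(cid, Counter())[cls] += 1
    result = {}
    for cid, counts in by_cluster.items():
        size = sum(counts.values())
        majority, hits = counts.most_common(1)[0]
        purity = hits / size
        # Hypothetical rule: label only clusters that are both pure and large enough.
        label = majority if purity >= min_purity and size >= min_size else None
        result[cid] = (label, purity, size)
    return result


# Illustrative training samples: cluster assignment and known class.
clusters = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2]
classes = ["wheat", "wheat", "wheat", "wheat",
           "wheat", "corn", "corn", "wheat",
           "corn", "corn"]
for cid, info in sorted(label_clusters(clusters, classes).items()):
    print(cid, info)
```

In this toy case cluster 0 is labelled, cluster 1 is left unlabelled because it is mixed, and cluster 2 is left unlabelled because it has too few samples despite being pure.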
The smart cluster method. Adaptive earthquake cluster identification and analysis in strong seismic regions

NASA Astrophysics Data System (ADS)

Schaefer, Andreas M.; Daniell, James E.; Wenzel, Friedemann

2017-07-01

Earthquake clustering is an essential part of almost any statistical analysis of spatial and temporal properties of seismic activity. The nature of earthquake clusters and subsequent declustering of earthquake catalogues plays a crucial role in determining the magnitude-dependent earthquake return period and its respective spatial variation for probabilistic seismic hazard assessment. This study introduces the Smart Cluster Method (SCM), a new methodology to identify earthquake clusters, which uses an adaptive point process for spatio-temporal cluster identification. It utilises the magnitude-dependent spatio-temporal earthquake density to adjust the search properties, subsequently analyses the identified clusters to determine directional variation and adjusts its search space with respect to directional properties. In the case of rapid subsequent ruptures like the 1992 Landers sequence or the 2010-2011 Darfield-Christchurch sequence, a reclassification procedure is applied to disassemble subsequent ruptures using near-field searches, nearest neighbour classification and temporal splitting. The method is capable of identifying and classifying earthquake clusters in space and time. It has been tested and validated using earthquake data from California and New Zealand. A total of more than 1500 clusters have been found in both regions since 1980 with M_min = 2.0. Utilising the knowledge of cluster classification, the method has been adjusted to provide an earthquake declustering algorithm, which has been compared to existing methods. Its performance is comparable to established methodologies. The analysis of earthquake clustering statistics led to various new and updated correlation functions, e.g. for ratios between mainshock and strongest aftershock and general aftershock activity metrics.
Data processing 1: Advancements in machine analysis of multispectral data

NASA Technical Reports Server (NTRS)

Swain, P. H.

1972-01-01

Multispectral data processing procedures are outlined beginning with the data display process used to accomplish data editing and proceeding through clustering, feature selection criterion for error probability estimation, and sample clustering and sample classification. The effective utilization of large quantities of remote sensing data by formulating a three stage sampling model for evaluation of crop acreage estimates represents an improvement in determining the cost benefit relationship associated with remote sensing technology.
A Unique Procedure to Identify Cell Surface Markers Through a Spherical Self-Organizing Map Applied to DNA Microarray Analysis.

PubMed

Sugii, Yuh; Kasai, Tomonari; Ikeda, Masashi; Vaidyanath, Arun; Kumon, Kazuki; Mizutani, Akifumi; Seno, Akimasa; Tokutaka, Heizo; Kudoh, Takayuki; Seno, Masaharu

2016-01-01

To identify cell-specific markers, we designed a DNA microarray platform with oligonucleotide probes for human membrane-anchored proteins. Human glioma cell lines were analyzed using microarray and compared with normal and fetal brain tissues. For the microarray analysis, we employed a spherical self-organizing map, which is a clustering method suitable for the conversion of multidimensional data into two-dimensional data and displays the relationship on a spherical surface. Based on the gene expression profile, the cell surface characteristics were successfully mirrored onto the spherical surface, thereby distinguishing normal brain tissue from the disease model based on the strength of gene expression. The clustered glioma-specific genes were further analyzed by polymerase chain reaction procedure and immunocytochemical staining of glioma cells. Our platform and the following procedure were successfully demonstrated to categorize the genes coding for cell surface proteins that are specific to glioma cells. Our assessment demonstrates that a spherical self-organizing map is a valuable tool for distinguishing cell surface markers and can be employed in marker discovery studies for the treatment of cancer.

Application of Multiple Imputation for Missing Values in Three-Way Three-Mode Multi-Environment Trial Data

PubMed Central

Tian, Ting; McLachlan, Geoffrey J.; Dieters, Mark J.; Basford, Kaye E.

2015-01-01

It is a common occurrence in plant breeding programs to observe missing values in three-way three-mode multi-environment trial (MET) data. We proposed modifications of models for estimating missing observations for these data arrays, and developed a novel approach in terms of hierarchical clustering. Multiple imputation (MI) was used in four ways: multiple agglomerative hierarchical clustering, normal distribution model, normal regression model, and predictive mean match. The latter three models used both Bayesian analysis and non-Bayesian analysis, while the first approach used a clustering procedure with randomly selected attributes and assigned real values from the nearest neighbour to the one with missing observations. Different proportions of data entries in six complete datasets were randomly selected to be missing and the MI methods were compared based on the efficiency and accuracy of estimating those values. The results indicated that the models using Bayesian analysis had slightly higher accuracy of estimation performance than those using non-Bayesian analysis but they were more time-consuming. However, the novel approach of multiple agglomerative hierarchical clustering demonstrated the overall best performances. PMID:26689369
Application of Multiple Imputation for Missing Values in Three-Way Three-Mode Multi-Environment Trial Data.

PubMed

Tian, Ting; McLachlan, Geoffrey J; Dieters, Mark J; Basford, Kaye E

2015-01-01

It is a common occurrence in plant breeding programs to observe missing values in three-way three-mode multi-environment trial (MET) data. We proposed modifications of models for estimating missing observations for these data arrays, and developed a novel approach in terms of hierarchical clustering. Multiple imputation (MI) was used in four ways: multiple agglomerative hierarchical clustering, normal distribution model, normal regression model, and predictive mean match. The latter three models used both Bayesian analysis and non-Bayesian analysis, while the first approach used a clustering procedure with randomly selected attributes and assigned real values from the nearest neighbour to the one with missing observations. Different proportions of data entries in six complete datasets were randomly selected to be missing and the MI methods were compared based on the efficiency and accuracy of estimating those values. The results indicated that the models using Bayesian analysis had slightly higher accuracy of estimation performance than those using non-Bayesian analysis but they were more time-consuming. However, the novel approach of multiple agglomerative hierarchical clustering demonstrated the overall best performances.
A QUANTITATIVE ANALYSIS OF DISTANT OPEN CLUSTERS

DOE Office of Scientific and Technical Information (OSTI.GOV)

Janes, Kenneth A.; Hoq, Sadia

2011-03-15

The oldest open star clusters are important for tracing the history of the Galactic disk, but many of the more distant clusters are heavily reddened and projected against the rich stellar background of the Galaxy. We have undertaken an investigation of several distant clusters (Berkeley 19, Berkeley 44, King 25, NGC 6802, NGC 6827, Berkeley 52, Berkeley 56, NGC 7142, NGC 7245, and King 9) to develop procedures for separating probable cluster members from the background field. We next created a simple quantitative approach for finding approximate cluster distances, reddenings, and ages. We first conclude that with the possible exception of King 25 they are probably all physical clusters. We also find that for these distant clusters our typical errors are about ±0.07 in E(B - V), ±0.15 in log(age), and ±0.25 in (m - M)_0. The clusters range in age from 470 Myr to 7 Gyr and range from 7.1 to 16.4 kpc from the Galactic center.
A comparison of regional flood frequency analysis approaches in a simulation framework

NASA Astrophysics Data System (ADS)

Ganora, D.; Laio, F.

2016-07-01

Regional frequency analysis (RFA) is a well-established methodology to provide an estimate of the flood frequency curve at ungauged (or scarcely gauged) sites. Different RFA approaches exist, depending on the way the information is transferred to the site of interest, but it is not clear in the literature if a specific method systematically outperforms the others. The aim of this study is to provide a framework in which to carry out the intercomparison by building up a virtual environment based on synthetically generated data. The considered regional approaches include: (i) a unique regional curve for the whole region; (ii) a multiple-region model where homogeneous subregions are determined through cluster analysis; (iii) a Region-of-Influence model which defines a homogeneous subregion for each site; (iv) a spatially smooth estimation procedure where the parameters of the regional model vary continuously in space. Virtual environments are generated considering different patterns of heterogeneity, including step change and smooth variations. If the region is heterogeneous, with the parent distribution changing continuously within the region, the spatially smooth regional approach outperforms the others, with overall errors 10-50% lower than the other methods. In the case of a step-change, the spatially smooth and clustering procedures perform similarly if the heterogeneity is moderate, while clustering procedures work better when the step-change is severe. To extend our findings, an extensive sensitivity analysis has been performed to investigate the effect of sample length, number of virtual stations, return period of the predicted quantile, variability of the scale parameter of the parent distribution, number of predictor variables and different parent distributions. 
Overall, the spatially smooth approach appears as the most robust approach as its performances are more stable across different patterns of heterogeneity, especially when short records are considered.
Network module detection: Affinity search technique with the multi-node topological overlap measure

PubMed Central

Li, Ai; Horvath, Steve

2009-01-01

Background Many clustering procedures only allow the user to input a pairwise dissimilarity or distance measure between objects. We propose a clustering method that can input a multi-point dissimilarity measure d(i1, i2, ..., iP) where the number of points P can be larger than 2. The work is motivated by gene network analysis where clusters correspond to modules of highly interconnected nodes. Here, we define modules as clusters of network nodes with high multi-node topological overlap. The topological overlap measure is a robust measure of interconnectedness which is based on shared network neighbors. In previous work, we have shown that the multi-node topological overlap measure yields biologically meaningful results when used as input of network neighborhood analysis. Findings We adapt network neighborhood analysis for the use of module detection. We propose the Module Affinity Search Technique (MAST), which is a generalized version of the Cluster Affinity Search Technique (CAST). MAST can accommodate a multi-node dissimilarity measure. Clusters grow around user-defined or automatically chosen seeds (e.g. hub nodes). We propose both local and global cluster growth stopping rules. We use several simulations and a gene co-expression network application to argue that the MAST approach leads to biologically meaningful results. We compare MAST with hierarchical clustering and partitioning around medoid clustering. Conclusion Our flexible module detection method is implemented in the MTOM software which can be downloaded from the following webpage: http://www.genetics.ucla.edu/labs/horvath/MTOM/ PMID:19619323
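To make the idea of topological overlap concrete, the sketch below computes the widely used pairwise form for an unweighted network: shared neighbors plus the direct link, divided by the smaller connectivity plus one minus the direct link. This is only the two-node case; the multi-node measure and the MAST procedure from the abstract are not reproduced here, and the function name and the toy network are illustrative assumptions.

```python
def topological_overlap(adj):
    """Pairwise topological overlap for a symmetric 0/1 adjacency matrix
    with a zero diagonal. Returns an n x n list of lists."""
    n = len(adj)
    # Connectivity (degree) of each node.
    k = [sum(adj[i][u] for u in range(n) if u != i) for i in range(n)]
    tom = [[1.0] * n for _ in range(n)]  # a node fully overlaps with itself
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Number of neighbors shared by nodes i and j.
            shared = sum(adj[i][u] * adj[u][j]
                         for u in range(n) if u not in (i, j))
            tom[i][j] = (shared + adj[i][j]) / (min(k[i], k[j]) + 1 - adj[i][j])
    return tom


# Toy network: a triangle 0-1-2 with an extra node 3 attached to node 2.
adjacency = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
overlap = topological_overlap(adjacency)
print(overlap[0][1])  # linked, and both neighbors of node 2 -> 1.0
print(overlap[0][3])  # not linked, one shared neighbor -> 0.5
```

A dissimilarity suitable as clustering input is then commonly taken as one minus the overlap.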
Network module detection: Affinity search technique with the multi-node topological overlap measure.

PubMed

Li, Ai; Horvath, Steve

2009-07-20

Many clustering procedures only allow the user to input a pairwise dissimilarity or distance measure between objects. We propose a clustering method that can input a multi-point dissimilarity measure d(i1, i2, ..., iP) where the number of points P can be larger than 2. The work is motivated by gene network analysis where clusters correspond to modules of highly interconnected nodes. Here, we define modules as clusters of network nodes with high multi-node topological overlap. The topological overlap measure is a robust measure of interconnectedness which is based on shared network neighbors. In previous work, we have shown that the multi-node topological overlap measure yields biologically meaningful results when used as input of network neighborhood analysis. We adapt network neighborhood analysis for the use of module detection. We propose the Module Affinity Search Technique (MAST), which is a generalized version of the Cluster Affinity Search Technique (CAST). MAST can accommodate a multi-node dissimilarity measure. Clusters grow around user-defined or automatically chosen seeds (e.g. hub nodes). We propose both local and global cluster growth stopping rules. We use several simulations and a gene co-expression network application to argue that the MAST approach leads to biologically meaningful results. We compare MAST with hierarchical clustering and partitioning around medoid clustering. Our flexible module detection method is implemented in the MTOM software which can be downloaded from the following webpage: http://www.genetics.ucla.edu/labs/horvath/MTOM/
Application of microarray analysis on computer cluster and cloud platforms.

PubMed

Bernau, C; Boulesteix, A-L; Knaus, J

2013-01-01

Analysis of recent high-dimensional biological data tends to be computationally intensive as many common approaches such as resampling or permutation tests require the basic statistical analysis to be repeated many times. A crucial advantage of these methods is that they can be easily parallelized due to the computational independence of the resampling or permutation iterations, which has induced many statistics departments to establish their own computer clusters. An alternative is to rent computing resources in the cloud, e.g. at Amazon Web Services. In this article we analyze whether a selection of statistical projects, recently implemented at our department, can be efficiently realized on these cloud resources. Moreover, we illustrate an opportunity to combine computer cluster and cloud resources. In order to compare the efficiency of computer cluster and cloud implementations and their respective parallelizations we use microarray analysis procedures and compare their runtimes on the different platforms. Amazon Web Services provide various instance types which meet the particular needs of the different statistical projects we analyzed in this paper. Moreover, the network capacity is sufficient and the parallelization is comparable in efficiency to standard computer cluster implementations. Our results suggest that many statistical projects can be efficiently realized on cloud resources. It is important to mention, however, that workflows can change substantially as a result of a shift from computer cluster to cloud computing.
Coping profiles, perceived stress and health-related behaviors: a cluster analysis approach.

PubMed

Doron, Julie; Trouillet, Raphael; Maneveau, Anaïs; Ninot, Grégory; Neveu, Dorine

2015-03-01

Using a cluster analytic procedure, this study aimed (i) to determine whether people could be differentiated on the basis of coping profiles (or unique combinations of coping strategies); and (ii) to examine the relationships between these profiles and perceived stress and health-related behaviors. A sample of 578 French students (345 females, 233 males; M(age) = 21.78, SD(age) = 2.21) completed the Perceived Stress Scale-14 (Bruchon-Schweitzer, 2002), the Brief COPE (Muller and Spitz, 2003) and a series of items measuring health-related behaviors. A two-phased cluster analytic procedure (i.e. hierarchical and non-hierarchical k-means) was employed to derive clusters of coping strategy profiles. The results yielded four distinctive coping profiles: High Copers, Adaptive Copers, Avoidant Copers and Low Copers. The results showed that clusters differed significantly in perceived stress and health-related behaviors. High Copers and Avoidant Copers displayed higher levels of perceived stress and engaged more in unhealthy behavior, compared with Adaptive Copers and Low Copers who reported lower levels of stress and engaged more in healthy behaviors. These findings suggested that individuals' relative reliance on some strategies and de-emphasis on others may be a more advantageous way of understanding the manner in which individuals cope with stress. Therefore, a cluster analysis approach may provide an advantage over more traditional statistical techniques by identifying distinct coping profiles that might best benefit from interventions. Future research should consider coping profiles to provide a deeper understanding of the relationships between coping strategies and health outcomes and to identify risk groups. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
The Technical and Biological Reproducibility of Matrix-Assisted Laser Desorption Ionization-Time of Flight Mass Spectrometry (MALDI-TOF MS) Based Typing: Employment of Bioinformatics in a Multicenter Study.

PubMed

Oberle, Michael; Wohlwend, Nadia; Jonas, Daniel; Maurer, Florian P; Jost, Geraldine; Tschudin-Sutter, Sarah; Vranckx, Katleen; Egli, Adrian

2016-01-01

The technical, biological, and inter-center reproducibility of matrix-assisted laser desorption ionization-time of flight mass spectrometry (MALDI TOF MS) typing data has not yet been explored. The aim of this study is to compare typing data from multiple centers employing bioinformatics using bacterial strains from two past outbreaks and non-related strains. Participants received twelve extended spectrum betalactamase-producing E. coli isolates and followed the same standard operating procedure (SOP) including a full-protein extraction protocol. All laboratories provided visually read spectra via flexAnalysis (Bruker, Germany). Raw data from each laboratory allowed calculating the technical and biological reproducibility between centers using BioNumerics (Applied Maths NV, Belgium). Technical and biological reproducibility ranged between 96.8-99.4% and 47.6-94.4%, respectively. The inter-center reproducibility showed a comparable clustering among identical isolates. Principal component analysis indicated a higher tendency to cluster within the same center. Therefore, we used a discriminant analysis, which completely separated the clusters. Next, we defined a reference center and performed a statistical analysis to identify specific peaks to identify the outbreak clusters. Finally, we used a classifier algorithm and a linear support vector machine on the determined peaks as classifier. A validation showed that within the set of the reference center, the identification of the cluster was 100% correct with a large contrast between the score with the correct cluster and the next best scoring cluster. Based on the sufficient technical and biological reproducibility of MALDI-TOF MS based spectra, detection of specific clusters is possible from spectra obtained from different centers. However, we believe that a shared SOP and a bioinformatics approach are required to make the analysis robust and reliable.
The association between content of the elements S, Cl, K, Fe, Cu, Zn and Br in normal and cirrhotic liver tissue from Danes and Greenlandic Inuit examined by dual hierarchical clustering analysis.

PubMed

Laursen, Jens; Milman, Nils; Pind, Niels; Pedersen, Henrik; Mulvad, Gert

2014-01-01

Meta-analysis of previous studies evaluating associations between content of elements sulphur (S), chlorine (Cl), potassium (K), iron (Fe), copper (Cu), zinc (Zn) and bromine (Br) in normal and cirrhotic autopsy liver tissue samples. Normal liver samples from 45 Greenlandic Inuit, median age 60 years and from 71 Danes, median age 61 years. Cirrhotic liver samples from 27 Danes, median age 71 years. Element content was measured using X-ray fluorescence spectrometry. Dual hierarchical clustering analysis, creating a dual dendrogram, one clustering element contents according to calculated similarities, one clustering elements according to correlation coefficients between the element contents, both using Euclidean distance and Ward's procedure. One dendrogram separated subjects in 7 clusters showing no differences in ethnicity, gender or age. The analysis discriminated between elements in normal and cirrhotic livers. The other dendrogram clustered elements in four clusters: sulphur and chlorine; copper and bromine; potassium and zinc; iron. There were significant correlations between the elements in normal liver samples: S was associated with Cl, K, Br and Zn; Cl with S and Br; K with S, Br and Zn; Cu with Br; Zn with S and K; Br with S, Cl, K and Cu. Fe did not show significant associations with any other element. In contrast to simple statistical methods, which analyse the content of elements separately one by one, dual hierarchical clustering analysis incorporates all elements at the same time and can be used to examine the linkage and interplay between multiple elements in tissue samples. Copyright © 2013 Elsevier GmbH. All rights reserved.
Analysis of temporal gene expression profiles: clustering by simulated annealing and determining the optimal number of clusters.

PubMed

Lukashin, A V; Fuchs, R

2001-05-01

Cluster analysis of genome-wide expression data from DNA microarray hybridization studies has proved to be a useful tool for identifying biologically relevant groupings of genes and samples. In the present paper, we focus on several important issues related to clustering algorithms that have not yet been fully studied. We describe a simple and robust algorithm for the clustering of temporal gene expression profiles that is based on the simulated annealing procedure. In general, this algorithm is guaranteed to eventually find the globally optimal distribution of genes over clusters. We introduce an iterative scheme that serves to evaluate quantitatively the optimal number of clusters for each specific data set. The scheme is based on standard approaches used in regular statistical tests. The basic idea is to organize the search for the optimal number of clusters simultaneously with the optimization of the distribution of genes over clusters. The efficiency of the proposed algorithm has been evaluated by means of a reverse engineering experiment, that is, a situation in which the correct distribution of genes over clusters is known a priori. The employment of this statistically rigorous test has shown that our algorithm places more than 90% of genes into correct clusters. Finally, the algorithm has been tested on real gene expression data (expression changes during yeast cell cycle) for which the fundamental patterns of gene expression and the assignment of genes to clusters are well understood from numerous previous studies.
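The general idea of clustering by simulated annealing can be sketched as follows. This is a generic illustration, not the published algorithm: the energy function (within-cluster sum of squared distances), the geometric cooling schedule, the single-gene reassignment move, and all parameter values are assumptions, and the scheme for choosing the number of clusters is not reproduced.

```python
import math
import random


def within_cluster_ss(profiles, labels, k):
    """Energy: sum of squared distances of profiles to their cluster mean."""
    total = 0.0
    for c in range(k):
        members = [p for p, lab in zip(profiles, labels) if lab == c]
        if not members:
            continue
        dim = len(members[0])
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        total += sum(math.dist(p, centroid) ** 2 for p in members)
    return total


def anneal_clusters(profiles, k, t_start=1.0, t_end=1e-3, cooling=0.95,
                    moves_per_t=50, seed=0):
    """Return (best_labels, best_energy) found during the annealing run."""
    rng = random.Random(seed)
    labels = [rng.randrange(k) for _ in profiles]
    energy = within_cluster_ss(profiles, labels, k)
    best_labels, best_energy = labels[:], energy
    t = t_start
    while t > t_end:
        for _ in range(moves_per_t):
            # Propose moving one randomly chosen profile to a random cluster.
            i = rng.randrange(len(profiles))
            old = labels[i]
            labels[i] = rng.randrange(k)
            new_energy = within_cluster_ss(profiles, labels, k)
            delta = new_energy - energy
            # Metropolis rule: always accept improvements, sometimes accept worse.
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                energy = new_energy
                if energy < best_energy:
                    best_labels, best_energy = labels[:], energy
            else:
                labels[i] = old
        t *= cooling
    return best_labels, best_energy


# Illustrative temporal profiles: three rising and three falling genes.
profiles = [(0.0, 1.0, 2.0), (0.0, 1.1, 2.1), (0.1, 0.9, 2.0),
            (2.0, 1.0, 0.0), (2.1, 1.1, 0.0), (2.0, 0.9, 0.1)]
labels, energy = anneal_clusters(profiles, k=2)
print(labels, round(energy, 4))
```

Accepting some uphill moves at high temperature is what lets the search leave poor local optima; the sketch also keeps the best assignment seen so far, which is a convenience added here rather than something stated in the abstract.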
The Mucciardi-Gose Clustering Algorithm and Its Applications in Automatic Pattern Recognition.

DTIC Science & Technology

A procedure known as the Mucciardi-Gose clustering algorithm, CLUSTR, for determining the geometrical or statistical relationships among groups of N...discussion of clustering algorithms is given; the particular advantages of the Mucciardi-Gose procedure are described. The mathematical basis for, and the
Cluster Analysis of Clinical Data Identifies Fibromyalgia Subgroups

PubMed Central

Docampo, Elisa; Collado, Antonio; Escaramís, Geòrgia; Carbonell, Jordi; Rivera, Javier; Vidal, Javier; Alegre, José

2013-01-01

Introduction Fibromyalgia (FM) is mainly characterized by widespread pain and multiple accompanying symptoms, which hinder FM assessment and management. In order to reduce FM heterogeneity we classified clinical data into simplified dimensions that were used to define FM subgroups. Material and Methods 48 variables were evaluated in 1,446 Spanish FM cases fulfilling 1990 ACR FM criteria. A partitioning analysis was performed to find groups of variables similar to each other. Similarities between variables were identified and the variables were grouped into dimensions. This was performed in a subset of 559 patients, and cross-validated in the remaining 887 patients. For each sample and dimension, a composite index was obtained based on the weights of the variables included in the dimension. Finally, a clustering procedure was applied to the indexes, resulting in FM subgroups. Results Variables clustered into three independent dimensions: “symptomatology”, “comorbidities” and “clinical scales”. Only the two first dimensions were considered for the construction of FM subgroups. Resulting scores classified FM samples into three subgroups: low symptomatology and comorbidities (Cluster 1), high symptomatology and comorbidities (Cluster 2), and high symptomatology but low comorbidities (Cluster 3), showing differences in measures of disease severity. Conclusions We have identified three subgroups of FM samples in a large cohort of FM by clustering clinical data. Our analysis stresses the importance of family and personal history of FM comorbidities. Also, the resulting patient clusters could indicate different forms of the disease, relevant to future research, and might have an impact on clinical assessment. PMID:24098674
Accurate calibration of a molecular beam time-of-flight mass spectrometer for on-line analysis of high molecular weight species.

PubMed

Apicella, B; Wang, X; Passaro, M; Ciajolo, A; Russo, C

2016-10-15

Time-of-Flight (TOF) Mass Spectrometry is a powerful analytical technique, provided that an accurate calibration by standard molecules in the same m/z range of the analytes is performed. Calibration in a very large m/z range is a difficult task, particularly in studies focusing on the detection of high molecular weight clusters of different molecules or high molecular weight species. External calibration is the most common procedure used for TOF mass spectrometric analysis in the gas phase and, generally, the only available standards are made up of mixtures of noble gases, covering a small mass range for calibration, up to m/z 136 (higher mass isotope of xenon). In this work, an accurate calibration of a Molecular Beam Time-of Flight Mass Spectrometer (MB-TOFMS) is presented, based on the use of water clusters up to m/z 3000. The advantages of calibrating a MB-TOFMS with water clusters for the detection of analytes with masses above those of the traditional calibrants such as noble gases were quantitatively shown by statistical calculations. A comparison of the water cluster and noble gases calibration procedures in attributing the masses to a test mixture extending up to m/z 800 is also reported. In the case of the analysis of combustion products, another important feature of water cluster calibration was shown, that is the possibility of using them as "internal standard" directly formed from the combustion water, under suitable experimental conditions. The water clusters calibration of a MB-TOFMS gives rise to a ten-fold reduction in error compared to the traditional calibration with noble gases. The consequent improvement in mass accuracy in the calibration of a MB-TOFMS has important implications in various fields where detection of high molecular mass species is required. In combustion products analysis, it is also possible to obtain a new calibration spectrum before the acquisition of each spectrum, only modifying some operative conditions. 
Copyright © 2016 John Wiley & Sons, Ltd.
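A time-of-flight calibration of the kind discussed above is commonly expressed as t = t0 + k·sqrt(m/z) and fitted to peaks of known mass. The sketch below fits that relation by least squares. Everything numerical in it is an assumption for illustration: the calibrant peaks are taken to be protonated water clusters (H2O)nH+ with approximate masses, and the flight times are synthetic values generated from made-up constants, not measurements from the paper.

```python
import math


def fit_tof_calibration(mz_values, flight_times):
    """Least-squares fit of t = t0 + k * sqrt(m/z); returns (k, t0)."""
    x = [math.sqrt(m) for m in mz_values]
    n = len(x)
    mean_x = sum(x) / n
    mean_t = sum(flight_times) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxt = sum((xi - mean_x) * (ti - mean_t) for xi, ti in zip(x, flight_times))
    k = sxt / sxx
    t0 = mean_t - k * mean_x
    return k, t0


def time_to_mz(t, k, t0):
    """Invert the calibration to assign m/z to a measured flight time."""
    return ((t - t0) / k) ** 2


# Assumed calibrants: (H2O)n H+ with approximate masses 18.011 * n + 1.007.
cluster_sizes = [1, 5, 20, 60, 120, 166]
calibrant_mz = [18.011 * n + 1.007 for n in cluster_sizes]

# Synthetic flight times from hypothetical instrument constants (arbitrary units).
TRUE_K, TRUE_T0 = 2.0, 0.5
times = [TRUE_T0 + TRUE_K * math.sqrt(m) for m in calibrant_mz]

k_fit, t0_fit = fit_tof_calibration(calibrant_mz, times)
print(round(k_fit, 6), round(t0_fit, 6))
print(round(time_to_mz(TRUE_T0 + TRUE_K * math.sqrt(800.0), k_fit, t0_fit), 3))
```

Because the fitted relation is applied by interpolation when the calibrants span the analyte range, calibrant peaks that extend to high m/z avoid the extrapolation needed when only light standards are available, which is consistent with the motivation given in the abstract.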
Task Analysis for Health Occupations. Cluster: Nursing. Occupation: Geriatric Aide. Education for Employment Task Lists.

ERIC Educational Resources Information Center

Lake County Area Vocational Center, Grayslake, IL.

This task analysis for nursing education provides performance standards, steps to be followed, knowledge required, attitudes to be developed, safety procedures, and equipment and supplies needed for 13 tasks performed by geriatric aides in the duty area of performing diagnostic measures and for 30 tasks in the duty area of providing therapeutic…
Intra-Group Motivational Analysis of Students with Learning Disabilities: A Goal Orientation Approach

ERIC Educational Resources Information Center

Sideris, Georgios D.; Tsorbatzoudis, Charalambos

2003-01-01

The purpose of the present study was to profile, using a K-means cluster analysis, the cognitive, motivational, affective, and goal orientation characteristics of elementary school students with and without learning disabilities (LD). Participants were 58 fifth and sixth graders (29 typical and 29 LD) selected using stratified random procedures.…
Clustering analysis of proteins from microbial genomes at multiple levels of resolution.

PubMed

Zaslavsky, Leonid; Ciufo, Stacy; Fedorov, Boris; Tatusova, Tatiana

2016-08-31

Microbial genomes at the National Center for Biotechnology Information (NCBI) represent a large collection of more than 35,000 assemblies. There are several complexities associated with the data: a great variation in sampling density since human pathogens are densely sampled while other bacteria are less represented; different protein families occur in annotations with different frequencies; and the quality of genome annotation varies greatly. In order to extract useful information from these sophisticated data, the analysis needs to be performed at multiple levels of phylogenomic resolution and protein similarity, with an adequate sampling strategy. Protein clustering is used to construct meaningful and stable groups of similar proteins to be used for analysis and functional annotation. Our approach is to create protein clusters at three levels. First, tight clusters in groups of closely-related genomes (species-level clades) are constructed using a combined approach that takes into account both sequence similarity and genome context. Second, clustroids of conservative in-clade clusters are organized into seed global clusters. Finally, global protein clusters are built around the seed clusters. We propose filtering strategies that allow limiting the protein set included in global clustering. The in-clade clustering procedure, subsequent selection of clustroids and organization into seed global clusters provide a robust representation and high rate of compression. Seed protein clusters are further extended by adding related proteins. Extended seed clusters include a significant part of the data and represent all major known cell machinery. The remaining part, coming from either non-conservative (unique) or rapidly evolving proteins, from rare genomes, or resulting from low-quality annotation, does not group together well. Processing these proteins requires significant computational resources and results in a large number of questionable clusters. The developed filtering strategies make it possible to identify and exclude such peripheral proteins, limiting the protein dataset in global clustering. Overall, the proposed methodology allows the relevant data at different levels of detail to be obtained and data redundancy eliminated while keeping biologically interesting variations.
A Hybrid Approach to Composite Damage and Failure Analysis Combining Synergistic Damage Mechanics and Peridynamics

DTIC Science & Technology

2016-09-30

far from uniform. The final nonuniform distribution of fibers consists of clustered regions and resin pockets. The clustered fiber regions promote...period. Approach and Results A novel procedure has been devised to create nonuniform fiber distributions from the initial fiber bundle (with...used in simulations to produce nonuniform configurations.
Determining the trophic guilds of fishes and macroinvertebrates in a seagrass food web

USGS Publications Warehouse

Luczkovich, J.J.; Ward, G.P.; Johnson, J.C.; Christian, R.R.; Baird, D.; Neckles, H.; Rizzo, W.M.

2002-01-01

We established trophic guilds of macroinvertebrate and fish taxa using correspondence analysis and a hierarchical clustering strategy for a seagrass food web in winter in the northeastern Gulf of Mexico. To create the diet matrix, we characterized the trophic linkages of macroinvertebrate and fish taxa present in Halodule wrightii seagrass habitat areas within the St. Marks National Wildlife Refuge (Florida) using binary data, combining dietary links obtained from relevant literature for macroinvertebrates with stomach analysis of common fishes collected during January and February of 1994. Hierarchical average-linkage cluster analysis of the 73 taxa of fishes and macroinvertebrates in the diet matrix yielded 14 clusters with diet similarity ≥ 0.60. We then used correspondence analysis with three factors to jointly plot the coordinates of the consumers (identified by cluster membership) and of the 33 food sources. Correspondence analysis served as a visualization tool for assigning each taxon to one of eight trophic guilds: herbivores, detritivores, suspension feeders, omnivores, molluscivores, meiobenthos consumers, macrobenthos consumers, and piscivores. These trophic groups, cross-classified with major taxonomic groups, were further used to develop consumer compartments in a network analysis model of carbon flow in this seagrass ecosystem. The method presented here should greatly improve the development of future network models of food webs by providing an objective procedure for aggregating trophic groups.
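The average-linkage step in the record above can be illustrated with a minimal sketch. It is not the authors' code: the abstract does not name the similarity measure, so Jaccard similarity on binary diet sets is an assumption, and the function names and taxon labels are hypothetical. Clusters are merged while the best average pairwise similarity stays at or above the 0.60 threshold.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard similarity between two sets of food sources (assumed measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def average_linkage(diets, threshold=0.60):
    """Merge clusters while the best average pairwise similarity is >= threshold."""
    clusters = [[name] for name in diets]
    while len(clusters) > 1:
        best_sim, best_pair = -1.0, None
        for i, j in combinations(range(len(clusters)), 2):
            # Average linkage: mean similarity over all cross-cluster pairs.
            sims = [jaccard(diets[a], diets[b])
                    for a in clusters[i] for b in clusters[j]]
            avg = sum(sims) / len(sims)
            if avg > best_sim:
                best_sim, best_pair = avg, (i, j)
        if best_sim < threshold:
            break  # no remaining pair is similar enough to merge
        i, j = best_pair
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Hypothetical binary diet data: taxon -> set of food sources it consumes.
diets = {
    "taxon1": {"algae", "detritus", "seagrass"},
    "taxon2": {"algae", "detritus", "seagrass", "epiphytes"},
    "taxon3": {"fish", "shrimp"},
    "taxon4": {"fish", "shrimp", "crabs"},
}
guild_candidates = average_linkage(diets)
```

With these made-up diets, the two grazers merge and the two predators merge, and the cross-group similarity of zero stops further merging.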
Markov Chain Monte Carlo Joint Analysis of Chandra X-Ray Imaging Spectroscopy and Sunyaev-Zel'dovich Effect Data

NASA Technical Reports Server (NTRS)

Bonamente, Massimiliano; Joy, Marshall K.; Carlstrom, John E.; Reese, Erik D.; LaRoque, Samuel J.

2004-01-01

X-ray and Sunyaev-Zel'dovich effect data can be combined to determine the distance to galaxy clusters. High-resolution X-ray data are now available from Chandra, which provides both spatial and spectral information, and Sunyaev-Zel'dovich effect data were obtained from the BIMA and Owens Valley Radio Observatory (OVRO) arrays. We introduce a Markov Chain Monte Carlo procedure for the joint analysis of X-ray and Sunyaev-Zel'dovich effect data. The advantages of this method are the high computational efficiency and the ability to measure simultaneously the probability distribution of all parameters of interest, such as the spatial and spectral properties of the cluster gas, and of derived quantities such as the distance to the cluster. We demonstrate this technique by applying it to the Chandra X-ray data and the OVRO radio data for the galaxy cluster A611. Comparisons with traditional likelihood ratio methods reveal the robustness of the method. This method will be used in a follow-up paper to determine the distances to a large sample of galaxy clusters.
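As a generic illustration of how a Markov Chain Monte Carlo procedure yields a full probability distribution rather than a single best-fit value, here is a minimal random-walk Metropolis sketch. It is not the sampler used in the paper: the one-dimensional toy posterior, the step size, and all names are assumptions made for the example.

```python
import math
import random

def metropolis(log_post, start, step, n_samples, seed=0):
    """Random-walk Metropolis sampler for a one-dimensional parameter."""
    rng = random.Random(seed)
    x, lp = start, log_post(start)
    samples = []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)
        lp_new = log_post(proposal)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, lp_new - lp)):
            x, lp = proposal, lp_new
        samples.append(x)
    return samples

def toy_log_posterior(theta):
    """Hypothetical log-posterior: a unit-width Gaussian centred on 2.0."""
    return -0.5 * (theta - 2.0) ** 2

chain = metropolis(toy_log_posterior, start=0.0, step=1.0, n_samples=20000)
kept = chain[2000:]                      # discard an initial burn-in segment
posterior_mean = sum(kept) / len(kept)   # should lie close to 2.0
```

The retained samples approximate the posterior, so the distribution of any derived quantity follows by evaluating that quantity on each sample.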

Novel approaches to pin cluster synchronization on complex dynamical networks in Lur'e forms

NASA Astrophysics Data System (ADS)

Tang, Ze; Park, Ju H.; Feng, Jianwen

2018-04-01

This paper investigates the cluster synchronization of complex dynamical networks consisting of identical or nonidentical Lur'e systems. Due to the special topology structure of the complex networks and the existence of stochastic perturbations, a kind of randomly occurring pinning controller is designed which not only synchronizes all Lur'e systems in the same cluster but also decreases the negative influence among different clusters. Firstly, based on an extended integral inequality, the convex combination theorem and S-procedure, the conditions for cluster synchronization of identical Lur'e networks are derived in a convex domain. Secondly, randomly occurring adaptive pinning controllers with two independent Bernoulli stochastic variables are designed and then sufficient conditions are obtained for the cluster synchronization on complex networks consisting of nonidentical Lur'e systems. In addition, suitable control gains for successful cluster synchronization of nonidentical Lur'e networks are acquired by designing some adaptive updating laws. Finally, we present two numerical examples to demonstrate the validity of the control scheme and the theoretical analysis.
Robust continuous clustering

PubMed Central

Shah, Sohil Atul

2017-01-01

Clustering is a fundamental procedure in the analysis of scientific data. It is used ubiquitously across the sciences. Despite decades of research, existing clustering algorithms have limited effectiveness in high dimensions and often require tuning parameters for different domains and datasets. We present a clustering algorithm that achieves high accuracy across multiple domains and scales efficiently to high dimensions and large datasets. The presented algorithm optimizes a smooth continuous objective, which is based on robust statistics and allows heavily mixed clusters to be untangled. The continuous nature of the objective also allows clustering to be integrated as a module in end-to-end feature learning pipelines. We demonstrate this by extending the algorithm to perform joint clustering and dimensionality reduction by efficiently optimizing a continuous global objective. The presented approach is evaluated on large datasets of faces, hand-written digits, objects, newswire articles, sensor readings from the Space Shuttle, and protein expression levels. Our method achieves high accuracy across all datasets, outperforming the best prior algorithm by a factor of 3 in average rank. PMID:28851838
Cluster analysis as a tool for evaluating the exploration potential of Known Geothermal Resource Areas

DOE PAGES

Lindsey, Cary R.; Neupane, Ghanashym; Spycher, Nicolas; ...

2018-01-03

Although many Known Geothermal Resource Areas in Oregon and Idaho were identified during the 1970s and 1980s, few were subsequently developed commercially. Because of advances in power plant design and energy conversion efficiency since the 1980s, some previously identified KGRAs may now be economically viable prospects. Unfortunately, available characterization data vary widely in accuracy, precision, and granularity, making assessments problematic. In this paper, we suggest a procedure for comparing test areas against proven resources using Principal Component Analysis and cluster identification. The result is a low-cost tool for evaluating potential exploration targets using uncertain or incomplete data.
Cluster analysis as a tool for evaluating the exploration potential of Known Geothermal Resource Areas

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lindsey, Cary R.; Neupane, Ghanashym; Spycher, Nicolas

Although many Known Geothermal Resource Areas in Oregon and Idaho were identified during the 1970s and 1980s, few were subsequently developed commercially. Because of advances in power plant design and energy conversion efficiency since the 1980s, some previously identified KGRAs may now be economically viable prospects. Unfortunately, available characterization data vary widely in accuracy, precision, and granularity, making assessments problematic. In this paper, we suggest a procedure for comparing test areas against proven resources using Principal Component Analysis and cluster identification. The result is a low-cost tool for evaluating potential exploration targets using uncertain or incomplete data.
Finding text in color images

NASA Astrophysics Data System (ADS)

Zhou, Jiangying; Lopresti, Daniel P.; Tasdizen, Tolga

1998-04-01

In this paper, we consider the problem of locating and extracting text from WWW images. A previous algorithm based on color clustering and connected components analysis works well as long as the color of each character is relatively uniform and the typography is fairly simple. It breaks down quickly, however, when these assumptions are violated. In this paper, we describe more robust techniques for dealing with this challenging problem. We present an improved color clustering algorithm that measures similarity based on both RGB and spatial proximity. Layout analysis is also incorporated to handle more complex typography. These changes significantly enhance the performance of our text detection procedure.
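One way to picture a similarity measure that uses both RGB and spatial proximity is a distance that adds a weighted image-plane term to the color term. The abstract does not say how the two are combined, so the additive form, the `spatial_weight` parameter, and the function name below are all assumptions for illustration.

```python
import math

def pixel_distance(p, q, spatial_weight=0.5):
    """Distance between two pixels given as (x, y, r, g, b) tuples.

    Hypothetical combination: Euclidean RGB distance plus a weighted
    Euclidean distance in the image plane.
    """
    color = math.dist(p[2:], q[2:])    # difference in RGB space
    spatial = math.dist(p[:2], q[:2])  # separation in pixel coordinates
    return color + spatial_weight * spatial

# Two identically colored pixels 5 units apart, and two co-located
# pixels whose red channels differ by 5.
same_color_far = pixel_distance((0, 0, 255, 0, 0), (3, 4, 255, 0, 0))
same_place_diff = pixel_distance((0, 0, 255, 0, 0), (0, 0, 250, 0, 0))
```

A larger `spatial_weight` makes distant pixels of the same color less likely to fall into one cluster, which is the kind of trade-off such a measure has to tune.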
Detecting multiple outliers in linear functional relationship model for circular variables using clustering technique

NASA Astrophysics Data System (ADS)

Mokhtar, Nurkhairany Amyra; Zubairi, Yong Zulina; Hussin, Abdul Ghapor

2017-05-01

Outlier detection has been used extensively in data analysis to detect anomalous observations and has important applications in fraud detection and robust analysis. In this paper, we propose a method for detecting multiple outliers for circular variables in a linear functional relationship model. Using the residual values of the Caires and Wyatt model, we applied a hierarchical clustering procedure. With the use of a tree diagram, we illustrate a graphical approach to outlier detection. A simulation study is done to verify the accuracy of the proposed method. Also, an application to a real data set is given to show its practical applicability.
Determination of Cluster Distances from Chandra Imaging Spectroscopy and Sunyaev-Zeldovich Effect Measurements. I; Analysis Methods and Initial Results

NASA Technical Reports Server (NTRS)

Bonamente, Massimiliano; Joy, Marshall K.; Carlstrom, John E.; LaRoque, Samuel J.

2004-01-01

X-ray and Sunyaev-Zeldovich Effect data can be combined to determine the distance to galaxy clusters. High-resolution X-ray data are now available from the Chandra Observatory, which provides both spatial and spectral information, and interferometric radio measurements of the Sunyaev-Zeldovich Effect are available from the BIMA and OVRO arrays. We introduce a Monte Carlo Markov chain procedure for the joint analysis of X-ray and Sunyaev-Zeldovich Effect data. The advantages of this method are the high computational efficiency and the ability to measure the full probability distribution of all parameters of interest, such as the spatial and spectral properties of the cluster gas and the cluster distance. We apply this technique to the Chandra X-ray data and the OVRO radio data for the galaxy cluster Abell 611. Comparisons with traditional likelihood-ratio methods reveal the robustness of the method. This method will be used in a follow-up paper to determine the distance of a large sample of galaxy clusters for which high-resolution Chandra X-ray and BIMA/OVRO radio data are available.
Identifying and Assessing Interesting Subgroups in a Heterogeneous Population.

PubMed

Lee, Woojoo; Alexeyenko, Andrey; Pernemalm, Maria; Guegan, Justine; Dessen, Philippe; Lazar, Vladimir; Lehtiö, Janne; Pawitan, Yudi

2015-01-01

Biological heterogeneity is common in many diseases and it is often the reason for therapeutic failures. Thus, there is great interest in classifying a disease into subtypes that have clinical significance in terms of prognosis or therapy response. One of the most popular methods to uncover unrecognized subtypes is cluster analysis. However, classical clustering methods such as k-means clustering or hierarchical clustering are not guaranteed to produce clinically interesting subtypes. This could be because the main statistical variability--the basis of cluster generation--is dominated by genes not associated with the clinical phenotype of interest. Furthermore, a strong prognostic factor might be relevant for a certain subgroup but not for the whole population; thus an analysis of the whole sample may not reveal this prognostic factor. To address these problems we investigate methods to identify and assess clinically interesting subgroups in a heterogeneous population. The identification step uses a clustering algorithm and to assess significance we use a false discovery rate- (FDR-) based measure. Under the heterogeneity condition the standard FDR estimate is shown to overestimate the true FDR value, but this is remedied by an improved FDR estimation procedure. As illustrations, two real data examples from gene expression studies of lung cancer are provided.
Fast clustering using adaptive density peak detection.

PubMed

Wang, Xiao-Feng; Xu, Yifan

2017-12-01

Common limitations of clustering methods include slow algorithm convergence, instability with respect to the pre-specification of a number of intrinsic parameters, and a lack of robustness to outliers. A recent clustering approach proposed a fast search algorithm for cluster centers based on their local densities. However, the selection of the key intrinsic parameters in the algorithm was not systematically investigated. It is relatively difficult to estimate the "optimal" parameters since the original definition of the local density in the algorithm is based on a truncated counting measure. In this paper, we propose a clustering procedure with adaptive density peak detection, where the local density is estimated through nonparametric multivariate kernel estimation. The model parameter can then be calculated from equations with a statistical theoretical justification. We also develop an automatic cluster centroid selection method through maximizing an average silhouette index. The advantage and flexibility of the proposed method are demonstrated through simulation studies and the analysis of a few benchmark gene expression data sets. The method runs in a single step without any iteration, so it is fast and has great potential for big data analysis. A user-friendly R package, ADPclust, has been developed for public use.
Some Integrated Squared Error Procedures for Multivariate Normal Data,

DTIC Science & Technology

1986-01-01

a linear regression or experimental design model). Our procedures have also been used widely on non-linear models but we do not address non-linear...of fit, outliers, influence functions, experimental design, cluster analysis, robustness...structured data such as multivariate experimental designs. Several illustrations are provided.
The contribution of cluster and discriminant analysis to the classification of complex aquifer systems.

PubMed

Panagopoulos, G P; Angelopoulou, D; Tzirtzilakis, E E; Giannoulopoulos, P

2016-10-01

This paper presents an innovative method for the discrimination of groundwater samples into common groups representing the hydrogeological units from which they have been pumped. This method proved very efficient even in areas with complex hydrogeological regimes. The proposed method requires chemical analyses of water samples only for major ions, meaning that it is applicable to most cases worldwide. Another benefit of the method is that it gives further insight into the aquifer hydrogeochemistry, as it provides the ions that are responsible for the discrimination of the group. The procedure begins with cluster analysis of the dataset in order to classify the samples into the corresponding hydrogeological unit. The feasibility of the method is shown by the fact that the samples of volcanic origin were separated into two different clusters, namely the lava units and the pyroclastic-ignimbritic aquifer. The second step is the discriminant analysis of the data, which provides the functions that distinguish the groups from each other and the most significant variables that define the hydrochemical composition of the aquifer. The whole procedure was highly successful, as 94.7% of the samples were classified to the correct aquifer system. Finally, the resulting functions can be safely used to categorize samples of either unknown or doubtful origin, thus improving the quality and the size of existing hydrochemical databases.
Screening and clustering of sparse regressions with finite non-Gaussian mixtures.

PubMed

Zhang, Jian

2017-06-01

This article proposes a method to address the problem that can arise when covariates in a regression setting are not Gaussian, which may give rise to approximately mixture-distributed errors, or when a true mixture of regressions produced the data. The method begins with non-Gaussian mixture-based marginal variable screening, followed by fitting a full but relatively smaller mixture regression model to the selected data with the help of a new penalization scheme. Under certain regularity conditions, the new screening procedure is shown to possess a sure screening property even when the population is heterogeneous. We further prove that there exists an elbow point in the associated scree plot which results in a consistent estimator of the set of active covariates in the model. By simulations, we demonstrate that the new procedure can substantially improve the performance of the existing procedures in the context of variable screening and data clustering. By applying the proposed procedure to motif data analysis in molecular biology, we demonstrate that the new method holds promise in practice. © 2016, The International Biometric Society.
Extracting Aggregation Free Energies of Mixed Clusters from Simulations of Small Systems: Application to Ionic Surfactant Micelles.

PubMed

Zhang, X; Patel, L A; Beckwith, O; Schneider, R; Weeden, C J; Kindt, J T

2017-11-14

Micelle cluster distributions from molecular dynamics simulations of a solvent-free coarse-grained model of sodium octyl sulfate (SOS) were analyzed using an improved method to extract equilibrium association constants from small-system simulations containing one or two micelle clusters at equilibrium with free surfactants and counterions. The statistical-thermodynamic and mathematical foundations of this partition-enabled analysis of cluster histograms (PEACH) approach are presented. A dramatic reduction in computational time for analysis was achieved through a strategy similar to the selector variable method to circumvent the need for exhaustive enumeration of the possible partitions of surfactants and counterions into clusters. Using statistics from a set of small-system (up to 60 SOS molecules) simulations as input, equilibrium association constants for micelle clusters were obtained as a function of both number of surfactants and number of associated counterions through a global fitting procedure. The resulting free energies were able to accurately predict micelle size and charge distributions in a large (560 molecule) system. The evolution of micelle size and charge with SOS concentration as predicted by the PEACH-derived free energies and by a phenomenological four-parameter model fit, along with the sensitivity of these predictions to variations in cluster definitions, are analyzed and discussed.
Energy spectra of X-ray clusters of galaxies

NASA Technical Reports Server (NTRS)

Avni, Y.

1976-01-01

A procedure for estimating the ranges of parameters that describe the spectra of X-rays from clusters of galaxies is presented. The applicability of the method is proved by statistical simulations of cluster spectra; such a proof is necessary because of the nonlinearity of the spectral functions. Implications for the spectra of the Perseus, Coma, and Virgo clusters are discussed. The procedure can be applied in more general problems of parameter estimation.
Mapping of terrain by computer clustering techniques using multispectral scanner data and using color aerial film

NASA Technical Reports Server (NTRS)

Smedes, H. W.; Linnerud, H. J.; Woolaver, L. B.; Su, M. Y.; Jayroe, R. R.

1972-01-01

Two clustering techniques were used for terrain mapping by computer of test sites in Yellowstone National Park. One test was made with multispectral scanner data using a composite technique which consists of (1) a strictly sequential statistical clustering, which is a sequential variance analysis, and (2) a generalized K-means clustering. In this composite technique, the output of (1) is a first approximation of the cluster centers. This is the input to (2), which consists of steps to improve the determination of cluster centers by iterative procedures. Another test was made using the three emulsion layers of color-infrared aerial film as a three-band spectrometer. Relative film densities were analyzed using a simple clustering technique in three-color space. Important advantages of the clustering technique over conventional supervised computer programs are that (1) human intervention, preparation time, and manipulation of data are reduced, (2) the computer map gives an unbiased indication of where best to select the reference ground control data, (3) easily obtained, inexpensive film can be used, and (4) the geometric distortions can be easily rectified by simple standard photogrammetric techniques.
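The second stage of the composite technique, refining a first approximation of the cluster centers by iteration, can be sketched as follows. This shows plain K-means with squared Euclidean distance rather than the "generalized" variant the record mentions, whose details the abstract does not give; the function name and sample data are hypothetical.

```python
def kmeans_refine(points, centers, max_iter=100):
    """Refine initial cluster centers by standard K-means iterations."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest center.
        groups = [[] for _ in centers]
        for p in points:
            d2 = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            groups[d2.index(min(d2))].append(p)
        # Update step: move each center to the mean of its members.
        new_centers = []
        for c, g in zip(centers, groups):
            if g:
                new_centers.append(tuple(sum(col) / len(g) for col in zip(*g)))
            else:
                new_centers.append(c)  # an empty cluster keeps its old center
        if new_centers == centers:
            break  # converged: no center moved
        centers = new_centers
    return centers

# Rough first-pass centers (standing in for the sequential stage's output).
points = [(1.0, 1.0), (1.0, 3.0), (9.0, 9.0), (9.0, 11.0)]
refined = kmeans_refine(points, centers=[(0.0, 0.0), (5.0, 5.0)])
```

Starting from centers that are only roughly right, the iteration moves each one to the mean of the points it attracts and stops once nothing changes.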
Preferences and needs of patients with a rheumatic disease regarding the structure and content of online self-management support.

PubMed

Ammerlaan, Judy W; van Os-Medendorp, Harmieke; de Boer-Nijhof, Nienke; Maat, Bertha; Scholtus, Lieske; Kruize, Aike A; Bijlsma, Johannes W J; Geenen, Rinie

2017-03-01

The aim of this study was to investigate preferences and needs regarding the structure and content of a person-centered online self-management support intervention for patients with a rheumatic disease. A four-step procedure, consisting of online focus group interviews, consensus meetings with patient representatives, a card sorting task, and hierarchical cluster analysis, was used to identify the preferences and needs. Preferences concerning the structure involved 1) suitability to individual needs and questions, 2) fit to the life stage, 3) creating the opportunity to share experiences and be in contact with others, 4) having an expert patient as trainer, 5) allowing for doing the training at one's own pace, and 6) offering a brief intervention. Hierarchical cluster analysis of 55 content needs yielded eleven clusters: 1) treatment knowledge, 2) societal procedures, 3) physical activity, 4) psychological distress, 5) self-efficacy, 6) provider, 7) fluctuations, 8) dealing with rheumatic disease, 9) communication, 10) intimate relationship, and 11) having children. A comprehensive assessment of preferences and needs in patients with a rheumatic disease is expected to contribute to motivation, adherence to and outcome of self-management-support programs. The overview of preferences and needs can be used to build an online self-management intervention. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Manual hierarchical clustering of regional geochemical data using a Bayesian finite mixture model

USGS Publications Warehouse

Ellefsen, Karl J.; Smith, David

2016-01-01

Interpretation of regional scale, multivariate geochemical data is aided by a statistical technique called “clustering.” We investigate a particular clustering procedure by applying it to geochemical data collected in the State of Colorado, United States of America. The clustering procedure partitions the field samples for the entire survey area into two clusters. The field samples in each cluster are partitioned again to create two subclusters, and so on. This manual procedure generates a hierarchy of clusters, and the different levels of the hierarchy show geochemical and geological processes occurring at different spatial scales. Although there are many different clustering methods, we use Bayesian finite mixture modeling with two probability distributions, which yields two clusters. The model parameters are estimated with Hamiltonian Monte Carlo sampling of the posterior probability density function, which usually has multiple modes. Each mode has its own set of model parameters; each set is checked to ensure that it is consistent both with the data and with independent geologic knowledge. The set of model parameters that is most consistent with the independent geologic knowledge is selected for detailed interpretation and partitioning of the field samples.
Distribution and Genetic Diversity of Bacteriocin Gene Clusters in Rumen Microbial Genomes.

PubMed

Azevedo, Analice C; Bento, Cláudia B P; Ruiz, Jeronimo C; Queiroz, Marisa V; Mantovani, Hilário C

2015-10-01

Some species of ruminal bacteria are known to produce antimicrobial peptides, but the screening procedures have mostly been based on in vitro assays using standardized methods. Recent sequencing efforts have made available the genome sequences of hundreds of ruminal microorganisms. In this work, we performed genome mining of the complete and partial genome sequences of 224 ruminal bacteria and 5 ruminal archaea to determine the distribution and diversity of bacteriocin gene clusters. A total of 46 bacteriocin gene clusters were identified in 33 strains of ruminal bacteria. Twenty gene clusters were related to lanthipeptide biosynthesis, while 11 gene clusters were associated with sactipeptide production, 7 gene clusters were associated with class II bacteriocin production, and 8 gene clusters were associated with class III bacteriocin production. The frequency of strains whose genomes encode putative antimicrobial peptide precursors was 14.4%. Clusters related to the production of sactipeptides were identified for the first time among ruminal bacteria. BLAST analysis indicated that the majority of the gene clusters (88%) encoding putative lanthipeptides contained all the essential genes required for lanthipeptide biosynthesis. Most strains of Streptococcus (66.6%) harbored complete lanthipeptide gene clusters, in addition to an open reading frame encoding a putative class II bacteriocin. Albusin B-like proteins were found in 100% of the Ruminococcus albus strains screened in this study. The in silico analysis provided evidence of novel biosynthetic gene clusters in bacterial species not previously related to bacteriocin production, suggesting that the rumen microbiota represents an underexplored source of antimicrobial peptides. Copyright © 2015, American Society for Microbiology. All Rights Reserved.
Graph-based analysis of kinetics on multidimensional potential-energy surfaces.

PubMed

Okushima, T; Niiyama, T; Ikeda, K S; Shimizu, Y

2009-09-01

The aim of this paper is twofold: one is to give a detailed description of an alternative graph-based analysis method, which we call saddle connectivity graph, for analyzing the global topography and the dynamical properties of many-dimensional potential-energy landscapes and the other is to give examples of applications of this method in the analysis of the kinetics of realistic systems. A Dijkstra-type shortest path algorithm is proposed to extract dynamically dominant transition pathways by kinetically defining transition costs. The applicability of this approach is first confirmed by an illustrative example of a low-dimensional random potential. We then show that a coarse-graining procedure tailored for saddle connectivity graphs can be used to obtain the kinetic properties of 13- and 38-atom Lennard-Jones clusters. The coarse-graining method not only reduces the complexity of the graphs, but also, with iterative use, reveals a self-similar hierarchical structure in these clusters. We also propose that the self-similarity is common to many-atom Lennard-Jones clusters.
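To make the idea of a Dijkstra-type search over transition costs concrete, here is a minimal sketch of the textbook algorithm on a small graph. It is not the paper's saddle connectivity graph: the node labels and edge costs are hypothetical, and the sketch assumes non-negative costs, as standard Dijkstra requires.

```python
import heapq

def cheapest_path(graph, start, goal):
    """Return (total_cost, path) of the lowest-cost route, or (inf, []) if none."""
    queue = [(0.0, start, [start])]
    settled = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in settled:
            continue  # a cheaper route to this node was already processed
        settled.add(node)
        for neighbor, step_cost in graph.get(node, {}).items():
            if neighbor not in settled:
                heapq.heappush(queue, (cost + step_cost, neighbor, path + [neighbor]))
    return float("inf"), []

# Hypothetical graph: nodes stand for local minima, edge weights for
# kinetically defined transition costs between connected minima.
graph = {
    "A": {"B": 1.0, "C": 4.0},
    "B": {"C": 1.5, "D": 5.0},
    "C": {"D": 1.0},
    "D": {},
}
cost, pathway = cheapest_path(graph, "A", "D")
```

On this toy graph, the route through B and C is cheaper than either direct alternative, which is the sense in which such a search extracts a dominant pathway.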
The role of chemometrics in single and sequential extraction assays: a review. Part II. Cluster analysis, multiple linear regression, mixture resolution, experimental design and other techniques.

PubMed

Giacomino, Agnese; Abollino, Ornella; Malandrino, Mery; Mentasti, Edoardo

2011-03-04

Single and sequential extraction procedures are used for studying element mobility and availability in solid matrices, like soils, sediments, sludge, and airborne particulate matter. In the first part of this review we reported an overview on these procedures and described the applications of chemometric uni- and bivariate techniques and of multivariate pattern recognition techniques based on variable reduction to the experimental results obtained. The second part of the review deals with the use of chemometrics not only for the visualization and interpretation of data, but also for the investigation of the effects of experimental conditions on the response, the optimization of their values and the calculation of element fractionation. We will describe the principles of the multivariate chemometric techniques considered, the aims for which they were applied and the key findings obtained. The following topics will be critically addressed: pattern recognition by cluster analysis (CA), linear discriminant analysis (LDA) and other less common techniques; modelling by multiple linear regression (MLR); investigation of spatial distribution of variables by geostatistics; calculation of fractionation patterns by a mixture resolution method (Chemometric Identification of Substrates and Element Distributions, CISED); optimization and characterization of extraction procedures by experimental design; other multivariate techniques less commonly applied. Copyright © 2010 Elsevier B.V. All rights reserved.

Computer program documentation: ISOCLS iterative self-organizing clustering program, program C094

NASA Technical Reports Server (NTRS)

Minter, R. T. (Principal Investigator)

1972-01-01

The author has identified the following significant results. This program implements an algorithm which, ideally, sorts a given set of multivariate data points into similar groups or clusters. The program is intended for use in the evaluation of multispectral scanner data; however, the algorithm could be used for other data types as well. The user may specify a set of initial estimated cluster means to begin the procedure, or he may begin with the assumption that all the data belongs to one cluster. The procedure is initialized by assigning each data point to the nearest (in absolute distance) cluster mean. If no initial cluster means were input, all of the data is assigned to cluster 1. The means and standard deviations are calculated for each cluster.
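The initialization step described in this abstract can be sketched as follows. This is an illustration only, not the ISOCLS code: it assumes that "absolute distance" means the sum of absolute coordinate differences (city-block distance), and the function names and sample points are hypothetical.

```python
from statistics import mean, pstdev

def assign_to_nearest(points, means):
    """Label each point with the index of the nearest mean (city-block distance)."""
    labels = []
    for p in points:
        dists = [sum(abs(a - b) for a, b in zip(p, m)) for m in means]
        labels.append(dists.index(min(dists)))
    return labels

def cluster_stats(points, labels, k):
    """Per-cluster (means, standard deviations) for each channel; None if empty."""
    stats = []
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if not members:
            stats.append(None)
            continue
        columns = list(zip(*members))
        stats.append(([mean(col) for col in columns],
                      [pstdev(col) for col in columns]))
    return stats

# Made-up two-channel data and initial means.
points = [(1.0, 2.0), (2.0, 1.0), (9.0, 9.0), (10.0, 8.0)]
initial_means = [(0.0, 0.0), (10.0, 10.0)]
labels = assign_to_nearest(points, initial_means)
stats = cluster_stats(points, labels, k=2)
```

If no initial means are supplied, the abstract's rule is equivalent to labelling every point 0 and calling `cluster_stats` with `k=1`.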
Determining the trophic guilds of fishes and macroinvertebrates in a seagrass food web

USGS Publications Warehouse

Luczkovich, J.J.; Ward, G.P.; Johnson, J.C.; Christian, R.R.; Baird, D.; Neckles, H.; Rizzo, W.M.

2002-01-01

We established trophic guilds of macroinvertebrate and fish taxa using correspondence analysis and a hierarchical clustering strategy for a seagrass food web in winter in the northeastern Gulf of Mexico. To create the diet matrix, we characterized the trophic linkages of macroinvertebrate and fish taxa present in Halodule wrightii seagrass habitat areas within the St. Marks National Wildlife Refuge (Florida) using binary data, combining dietary links obtained from relevant literature for macroinvertebrates with stomach analysis of common fishes collected during January and February of 1994. Hierarchical average-linkage cluster analysis of the 73 taxa of fishes and macroinvertebrates in the diet matrix yielded 14 clusters with diet similarity greater than or equal to 0.60. We then used correspondence analysis with three factors to jointly plot the coordinates of the consumers (identified by cluster membership) and of the 33 food sources. Correspondence analysis served as a visualization tool for assigning each taxon to one of eight trophic guilds: herbivores, detritivores, suspension feeders, omnivores, molluscivores, meiobenthos consumers, macrobenthos consumers, and piscivores. These trophic groups, cross-classified with major taxonomic groups, were further used to develop consumer compartments in a network analysis model of carbon flow in this seagrass ecosystem. The method presented here should greatly improve the development of future network models of food webs by providing an objective procedure for aggregating trophic groups.
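Average-linkage clustering of binary diet data with a similarity cutoff, as described in this abstract, can be sketched in a few lines. The abstract does not name the similarity coefficient, so the Jaccard index used here is an assumption, and the taxa and food items are placeholders rather than data from the study.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets of food items."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0

def average_linkage(diets, threshold):
    """Merge clusters until no pair has average similarity >= threshold."""
    clusters = [[name] for name in diets]

    def avg_sim(c1, c2):
        total = sum(jaccard(diets[x], diets[y]) for x in c1 for y in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        best, pair = -1.0, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = avg_sim(clusters[i], clusters[j])
                if s > best:
                    best, pair = s, (i, j)
        if best < threshold:
            break  # remaining clusters are too dissimilar to merge
        i, j = pair
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Placeholder diet matrix: each taxon maps to the set of foods it consumes.
diets = {
    "taxon_a": {"food_1", "food_2", "food_3"},
    "taxon_b": {"food_1", "food_2", "food_3", "food_4"},
    "taxon_c": {"food_3", "food_5"},
    "taxon_d": {"food_3", "food_5"},
    "taxon_e": {"food_6", "food_7"},
}
guild_candidates = average_linkage(diets, threshold=0.60)
```

With the 0.60 cutoff, the two near-identical pairs merge and the remaining taxon stays on its own, which mirrors how a fixed similarity level turns a dendrogram into a set of clusters.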
The Technical and Biological Reproducibility of Matrix-Assisted Laser Desorption Ionization-Time of Flight Mass Spectrometry (MALDI-TOF MS) Based Typing: Employment of Bioinformatics in a Multicenter Study

PubMed Central

Oberle, Michael; Wohlwend, Nadia; Jonas, Daniel; Maurer, Florian P.; Jost, Geraldine; Tschudin-Sutter, Sarah; Vranckx, Katleen; Egli, Adrian

2016-01-01

Background The technical, biological, and inter-center reproducibility of matrix-assisted laser desorption ionization-time of flight mass spectrometry (MALDI TOF MS) typing data has not yet been explored. The aim of this study is to compare typing data from multiple centers employing bioinformatics using bacterial strains from two past outbreaks and non-related strains. Material/Methods Participants received twelve extended spectrum betalactamase-producing E. coli isolates and followed the same standard operating procedure (SOP) including a full-protein extraction protocol. All laboratories provided visually read spectra via flexAnalysis (Bruker, Germany). Raw data from each laboratory allowed calculating the technical and biological reproducibility between centers using BioNumerics (Applied Maths NV, Belgium). Results Technical and biological reproducibility ranged between 96.8–99.4% and 47.6–94.4%, respectively. The inter-center reproducibility showed a comparable clustering among identical isolates. Principal component analysis indicated a higher tendency to cluster within the same center. Therefore, we used a discriminant analysis, which completely separated the clusters. Next, we defined a reference center and performed a statistical analysis to identify specific peaks to identify the outbreak clusters. Finally, we used a classifier algorithm and a linear support vector machine on the determined peaks as classifier. A validation showed that within the set of the reference center, the identification of the cluster was 100% correct with a large contrast between the score with the correct cluster and the next best scoring cluster. Conclusions Based on the sufficient technical and biological reproducibility of MALDI-TOF MS based spectra, detection of specific clusters is possible from spectra obtained from different centers. However, we believe that a shared SOP and a bioinformatics approach are required to make the analysis robust and reliable. PMID:27798637
A Typology of Students Based on Academic Entitlement

ERIC Educational Resources Information Center

Luckett, Michael; Trocchia, Philip J.; Noel, Noel Mark; Marlin, Dan

2017-01-01

Two hundred ninety-three university business students were surveyed using an academic entitlement (AE) scale updated to include new technologies. Using factor analysis, three components of AE were identified: grade entitlement, behavioral entitlement, and service entitlement. A k-means clustering procedure was then applied to identify four groups…
Consequence of Winning: Interdisciplinary Analysis for Deontological Perspectives of Moral Function and the Interaction with Motivation in Division I College Athletes

ERIC Educational Resources Information Center

Orr, Brandon

2013-01-01

This is a pilot study of a proposed model for examining the main and interactionist effects of achievement goal orientations on moral function and the role of perceived ability as a potential moderator in sport morality levels through cluster analysis procedures. One hundred and three (103) elite athletes participating in Division I wrestling…
GRC RBCC Concept Multidisciplinary Analysis

NASA Technical Reports Server (NTRS)

Suresh, Ambady

2001-01-01

This report outlines the GRC RBCC Concept for Multidisciplinary Analysis. The multidisciplinary coupling procedure is presented, along with technique validations and axisymmetric multidisciplinary inlet and structural results. The NPSS (Numerical Propulsion System Simulation) test bed developments and code parallelization are also presented. These include milestones and accomplishments, a discussion of running R4 fan application on the PII cluster as compared to other platforms, and the National Combustor Code speedup.
Identifying and Assessing Interesting Subgroups in a Heterogeneous Population

PubMed Central

Lee, Woojoo; Alexeyenko, Andrey; Pernemalm, Maria; Guegan, Justine; Dessen, Philippe; Lazar, Vladimir; Lehtiö, Janne; Pawitan, Yudi

2015-01-01

Biological heterogeneity is common in many diseases and it is often the reason for therapeutic failures. Thus, there is great interest in classifying a disease into subtypes that have clinical significance in terms of prognosis or therapy response. One of the most popular methods to uncover unrecognized subtypes is cluster analysis. However, classical clustering methods such as k-means clustering or hierarchical clustering are not guaranteed to produce clinically interesting subtypes. This could be because the main statistical variability—the basis of cluster generation—is dominated by genes not associated with the clinical phenotype of interest. Furthermore, a strong prognostic factor might be relevant for a certain subgroup but not for the whole population; thus an analysis of the whole sample may not reveal this prognostic factor. To address these problems we investigate methods to identify and assess clinically interesting subgroups in a heterogeneous population. The identification step uses a clustering algorithm and to assess significance we use a false discovery rate- (FDR-) based measure. Under the heterogeneity condition the standard FDR estimate is shown to overestimate the true FDR value, but this is remedied by an improved FDR estimation procedure. As illustrations, two real data examples from gene expression studies of lung cancer are provided. PMID:26339613
GC-MS analyses and chemometric processing to discriminate the local and long-distance sources of PAHs associated to atmospheric PM2.5.

PubMed

Masiol, Mauro; Centanni, Elena; Squizzato, Stefania; Hofer, Angelika; Pecorari, Eliana; Rampazzo, Giancarlo; Pavoni, Bruno

2012-09-01

This study presents a procedure to differentiate the local and remote sources of particulate-bound polycyclic aromatic hydrocarbons (PAHs). Data were collected during an extended PM(2.5) sampling campaign (2009-2010) carried out for 1 year in Venice-Mestre, Italy, at three stations with different emissive scenarios: urban, industrial, and semirural background. Diagnostic ratios and factor analysis were initially applied to point out the most probable sources. In a second step, the areal distribution of the identified sources was studied by applying the discriminant analysis on factor scores. Third, samples collected in days with similar atmospheric circulation patterns were grouped using a cluster analysis on wind data. Local contributions to PM(2.5) and PAHs were then assessed by interpreting cluster results with chemical data. Results evidenced that significantly lower levels of PM(2.5) and PAHs were found when faster winds changed air masses, whereas in presence of scarce ventilation, locally emitted pollutants were trapped and concentrations increased. This way, an estimation of pollutant loads due to local sources can be derived from data collected in days with similar wind patterns. Long-range contributions were detected by a cluster analysis on the air mass back-trajectories. Results revealed that PM(2.5) concentrations were relatively high when air masses had passed over the Po Valley. However, external sources do not significantly contribute to the PAHs load. The proposed procedure can be applied to other environments with minor modifications, and the obtained information can be useful to design local and national air pollution control strategies.
A modified procedure for mixture-model clustering of regional geochemical data

USGS Publications Warehouse

Ellefsen, Karl J.; Smith, David B.; Horton, John D.

2014-01-01

A modified procedure is proposed for mixture-model clustering of regional-scale geochemical data. The key modification is the robust principal component transformation of the isometric log-ratio transforms of the element concentrations. This principal component transformation and the associated dimension reduction are applied before the data are clustered. The principal advantage of this modification is that it significantly improves the stability of the clustering. The principal disadvantage is that it requires subjective selection of the number of clusters and the number of principal components. To evaluate the efficacy of this modified procedure, it is applied to soil geochemical data that comprise 959 samples from the state of Colorado (USA) for which the concentrations of 44 elements are measured. The distributions of element concentrations that are derived from the mixture model and from the field samples are similar, indicating that the mixture model is a suitable representation of the transformed geochemical data. Each cluster and the associated distributions of the element concentrations are related to specific geologic and anthropogenic features. In this way, mixture model clustering facilitates interpretation of the regional geochemical data.
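The isometric log-ratio (ilr) transform named in this abstract maps a D-part composition to D-1 unconstrained coordinates. The sketch below uses one common choice of basis (pivot coordinates); the abstract does not say which basis the authors used, so that choice, the function name, and the example composition are assumptions. The robust principal component step is not shown.

```python
import math

def ilr(composition):
    """Pivot-coordinate ilr transform of a strictly positive composition."""
    logs = [math.log(v) for v in composition]
    d = len(logs)
    coords = []
    for i in range(d - 1):
        rest = logs[i + 1:]   # log-parts that come after part i
        r = len(rest)
        # sqrt(r / (r + 1)) * ln(x_i / geometric mean of the remaining parts)
        coords.append(math.sqrt(r / (r + 1)) * (logs[i] - sum(rest) / r))
    return coords

# Made-up three-part composition (fractions summing to 1).
coords = ilr([0.2, 0.3, 0.5])
```

Because only ratios of parts enter the formula, rescaling the composition (for example from fractions to percentages) leaves the coordinates unchanged.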
Shape analysis of H II regions - I. Statistical clustering

NASA Astrophysics Data System (ADS)

Campbell-White, Justyn; Froebrich, Dirk; Kume, Alfred

2018-07-01

We present here our shape analysis method for a sample of 76 Galactic H II regions from MAGPIS 1.4 GHz data. The main goal is to determine whether physical properties and initial conditions of massive star cluster formation are linked to the shape of the regions. We outline a systematic procedure for extracting region shapes and perform hierarchical clustering on the shape data. We identified six groups that categorize H II regions by common morphologies. We confirmed the validity of these groupings by bootstrap re-sampling and the ordinance technique multidimensional scaling. We then investigated associations between physical parameters and the assigned groups. Location is mostly independent of group, with a small preference for regions of similar longitudes to share common morphologies. The shapes are homogeneously distributed across Galactocentric distance and latitude. One group contains regions that are all younger than 0.5 Myr and ionized by low- to intermediate-mass sources. Those in another group are all driven by intermediate- to high-mass sources. One group was distinctly separated from the other five and contained regions at the surface brightness detection limit for the survey. We find that our hierarchical procedure is most sensitive to the spatial sampling resolution used, which is determined for each region from its distance. We discuss how these errors can be further quantified and reduced in future work by utilizing synthetic observations from numerical simulations of H II regions. We also outline how this shape analysis has further applications to other diffuse astronomical objects.
Shape Analysis of HII Regions - I. Statistical Clustering

NASA Astrophysics Data System (ADS)

Campbell-White, Justyn; Froebrich, Dirk; Kume, Alfred

2018-04-01

We present here our shape analysis method for a sample of 76 Galactic HII regions from MAGPIS 1.4 GHz data. The main goal is to determine whether physical properties and initial conditions of massive star cluster formation are linked to the shape of the regions. We outline a systematic procedure for extracting region shapes and perform hierarchical clustering on the shape data. We identified six groups that categorise HII regions by common morphologies. We confirmed the validity of these groupings by bootstrap re-sampling and the ordinance technique multidimensional scaling. We then investigated associations between physical parameters and the assigned groups. Location is mostly independent of group, with a small preference for regions of similar longitudes to share common morphologies. The shapes are homogeneously distributed across Galactocentric distance and latitude. One group contains regions that are all younger than 0.5 Myr and ionised by low- to intermediate-mass sources. Those in another group are all driven by intermediate- to high-mass sources. One group was distinctly separated from the other five and contained regions at the surface brightness detection limit for the survey. We find that our hierarchical procedure is most sensitive to the spatial sampling resolution used, which is determined for each region from its distance. We discuss how these errors can be further quantified and reduced in future work by utilising synthetic observations from numerical simulations of HII regions. We also outline how this shape analysis has further applications to other diffuse astronomical objects.
Clustering analysis of moving target signatures

NASA Astrophysics Data System (ADS)

Martone, Anthony; Ranney, Kenneth; Innocenti, Roberto

2010-04-01

Previously, we developed a moving target indication (MTI) processing approach to detect and track slow-moving targets inside buildings, which successfully detected moving targets (MTs) from data collected by a low-frequency, ultra-wideband radar. Our MTI algorithms include change detection, automatic target detection (ATD), clustering, and tracking. The MTI algorithms can be implemented in a real-time or near-real-time system; however, a person-in-the-loop is needed to select input parameters for the clustering algorithm. Specifically, the number of clusters to input into the cluster algorithm is unknown and requires manual selection. A critical need exists to automate all aspects of the MTI processing formulation. In this paper, we investigate two techniques that automatically determine the number of clusters: the adaptive knee-point (KP) algorithm and the recursive pixel finding (RPF) algorithm. The KP algorithm is based on a well-known heuristic approach for determining the number of clusters. The RPF algorithm is analogous to the image processing, pixel labeling procedure. Both algorithms are used to analyze the false alarm and detection rates of three operational scenarios of personnel walking inside wood and cinderblock buildings.
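The knee-point idea referred to in this abstract can be illustrated with a generic heuristic: plot a clustering score against the number of clusters and pick the point farthest from the straight line joining the first and last points. This is a sketch of that well-known heuristic only, not the authors' adaptive KP algorithm, and the scores below are invented.

```python
def knee_point(ks, scores):
    """Return the k whose (k, score) point lies farthest from the end-to-end chord."""
    # Rescale both axes to [0, 1] so neither dominates the distance.
    xs = [(k - ks[0]) / (ks[-1] - ks[0]) for k in ks]
    lo, hi = min(scores), max(scores)
    ys = [(s - lo) / (hi - lo) for s in scores]
    dx, dy = xs[-1] - xs[0], ys[-1] - ys[0]

    def dist(x, y):
        # Proportional to the perpendicular distance from the chord.
        return abs(dy * (x - xs[0]) - dx * (y - ys[0]))

    best = max(range(len(ks)), key=lambda i: dist(xs[i], ys[i]))
    return ks[best]

# Invented within-cluster dispersion values for k = 1..6.
ks = [1, 2, 3, 4, 5, 6]
scores = [100.0, 40.0, 12.0, 10.0, 9.0, 8.0]
chosen_k = knee_point(ks, scores)
```

The dispersion drops sharply up to three clusters and flattens afterwards, so the heuristic selects k = 3 for this toy curve.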
AMMI adjustment for statistical analysis of an international wheat yield trial.

PubMed

Crossa, J; Fox, P N; Pfeiffer, W H; Rajaram, S; Gauch, H G

1991-01-01

Multilocation trials are important for the CIMMYT Bread Wheat Program in producing high-yielding, adapted lines for a wide range of environments. This study investigated procedures for improving predictive success of a yield trial, grouping environments and genotypes into homogeneous subsets, and determining the yield stability of 18 CIMMYT bread wheats evaluated at 25 locations. Additive Main effects and Multiplicative Interaction (AMMI) analysis gave more precise estimates of genotypic yields within locations than means across replicates. This precision facilitated formation by cluster analysis of more cohesive groups of genotypes and locations for biological interpretation of interactions than occurred with unadjusted means. Locations were clustered into two subsets for which genotypes with positive interactions manifested in high, stable yields were identified. The analyses highlighted superior selections with both broad and specific adaptation.
Procedures to handle inventory cluster plots that straddle two or more conditions

Treesearch

Jerold T. Hahn; Colin D. MacLean; Stanford L. Arner; William A. Bechtold

1995-01-01

We review the relative merits and field procedures for four basic plot designs to handle forest inventory plots that straddle two or more conditions, given that subplots will not be moved. A cluster design is recommended that combines fixed-area subplots and variable-radius plot (VRP) sampling. Each subplot in a cluster consists of a large fixed-area subplot for...
Influence of atmospheric transport on ozone and trace-level toxic air contaminants over the northeastern United States

NASA Astrophysics Data System (ADS)

Brankov, Elvira

This thesis presents a methodology for examining the relationship between synoptic-scale atmospheric transport patterns and observed pollutant concentration levels. It involves calculating a large number of back-trajectories from the observational site and subjecting them to cluster analysis. The pollutant concentration data observed at that site are then segregated according to the back-trajectory clusters. If the pollutant observations extend over several seasons, it is important to filter out seasonal and long-term components from the time series data before pollutant cluster-segregation, because only the short-term component of the time series data is related to the synoptic-scale transport. Multiple comparison procedures are used to test for significant differences in the chemical composition of pollutant data associated with each cluster. This procedure is useful in indicating potential pollutant source regions and isolating meteorological regimes associated with pollutant transport from those regions. If many observational sites are available, the spatial and temporal scales of the pollution transport from a given direction can be extracted through the time-lagged inter-site correlation analysis of pollutant concentrations. The proposed methodology is applicable to any pollutant at any site if a sufficiently abundant data set is available. This is illustrated through examination of five-year long time series data of ozone concentrations at several sites in the Northeast. The results provide evidence of ozone transport to these sites, revealing the characteristic spatial and temporal scales involved in the transport and identifying source regions for this pollutant. Problems related to statistical analyses of censored data are addressed in the second half of this thesis.
Although censoring (reporting concentrations in a non-quantitative way) is typical for trace-level measurements, methods for statistical analysis, inference and interpretation of such data are complex and still under development. In this study, multiple comparison of censored data sets was required in order to examine the influence of synoptic-scale circulations on concentration levels of several trace-level toxic pollutants observed in the Northeast (e.g., As, Se, Mn, V, etc.). Since the traditional multiple comparison procedures are not readily applicable to such data sets, a Monte Carlo simulation study was performed to assess several nonparametric methods for multiple comparison of censored data sets. Application of an appropriate comparison procedure to clusters of toxic trace elements observed in the Northeast led to the identification of potential source regions and atmospheric patterns associated with the long-range transport of these pollutants. A method for comparison of proportions and elemental ratio calculations were used to confirm/clarify these inferences with a greater degree of confidence.
Detection of Anomalies in Hydrometric Data Using Artificial Intelligence Techniques

NASA Astrophysics Data System (ADS)

Lauzon, N.; Lence, B. J.

2002-12-01

This work focuses on the detection of anomalies in hydrometric data sequences, such as 1) outliers, which are individual data having statistical properties that differ from those of the overall population; 2) shifts, which are sudden changes over time in the statistical properties of the historical records of data; and 3) trends, which are systematic changes over time in the statistical properties. For the purpose of the design and management of water resources systems, it is important to be aware of these anomalies in hydrometric data, for they can induce a bias in the estimation of water quantity and quality parameters. These anomalies may be viewed as specific patterns affecting the data, and therefore pattern recognition techniques can be used for identifying them. However, the number of possible patterns is very large for each type of anomaly and consequently large computing capacities are required to account for all possibilities using the standard statistical techniques, such as cluster analysis. Artificial intelligence techniques, such as the Kohonen neural network and fuzzy c-means, are clustering techniques commonly used for pattern recognition in several areas of engineering and have recently begun to be used for the analysis of natural systems. They require much less computing capacity than the standard statistical techniques, and therefore are well suited for the identification of outliers, shifts and trends in hydrometric data. This work constitutes a preliminary study, using synthetic data representing hydrometric data that can be found in Canada. The analysis of the results obtained shows that the Kohonen neural network and fuzzy c-means are reasonably successful in identifying anomalies. This work also addresses the problem of uncertainties inherent to the calibration procedures that fit the clusters to the possible patterns for both the Kohonen neural network and fuzzy c-means. 
Indeed, for the same database, different sets of clusters can be established with these calibration procedures. A simple method for analyzing uncertainties associated with the Kohonen neural network and fuzzy c-means is developed here. The method combines the results from several sets of clusters, either from the Kohonen neural network or fuzzy c-means, so as to provide an overall diagnosis as to the identification of outliers, shifts and trends. The results indicate an improvement in the performance for identifying anomalies when the method of combining cluster sets is used, compared with when only one cluster set is used.
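For readers unfamiliar with fuzzy c-means, the sketch below shows the standard membership formula, in which each point receives a degree of membership in every cluster rather than a single label. It illustrates the general technique only; the fuzzifier value `m=2.0`, the function name, and the sample coordinates are assumptions, and the authors' calibration and cluster-set combination method are not reproduced.

```python
import math

def fcm_memberships(point, centers, m=2.0):
    """Fuzzy c-means membership of one point in each cluster (memberships sum to 1)."""
    dists = [math.dist(point, c) for c in centers]
    if any(d == 0.0 for d in dists):
        # The point sits exactly on one or more centers: share membership among them.
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        total = sum(hits)
        return [h / total for h in hits]
    power = 2.0 / (m - 1.0)
    # u_i = 1 / sum_k (d_i / d_k)^(2 / (m - 1))
    return [1.0 / sum((di / dk) ** power for dk in dists) for di in dists]

# Made-up example: a point one unit from the first center and two from the second.
u = fcm_memberships((1.0, 0.0), [(0.0, 0.0), (3.0, 0.0)])
```

A point with no dominant membership is one simple way an observation can stand out as unusual, which is the kind of signal an outlier diagnosis can build on.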
Evaluation of Second-Level Inference in fMRI Analysis

PubMed Central

Roels, Sanne P.; Loeys, Tom; Moerkerke, Beatrijs

2016-01-01

We investigate the impact of decisions in the second-level (i.e., over subjects) inferential process in functional magnetic resonance imaging on (1) the balance between false positives and false negatives and on (2) the data-analytical stability, both proxies for the reproducibility of results. Second-level analysis based on a mass univariate approach typically consists of 3 phases. First, one proceeds via a general linear model for a test image that consists of pooled information from different subjects. We evaluate models that take into account first-level (within-subjects) variability and models that do not take into account this variability. Second, one proceeds via inference based on parametrical assumptions or via permutation-based inference. Third, we evaluate 3 commonly used procedures to address the multiple testing problem: familywise error rate correction, False Discovery Rate (FDR) correction, and a two-step procedure with minimal cluster size. Based on a simulation study and real data we find that the two-step procedure with minimal cluster size results in most stable results, followed by the familywise error rate correction. The FDR results in most variable results, for both permutation-based inference and parametrical inference. Modeling the subject-specific variability yields a better balance between false positives and false negatives when using parametric inference. PMID:26819578
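As an illustration of FDR correction, the sketch below implements the Benjamini-Hochberg step-up procedure, the most widely used FDR method. The abstract does not state which FDR procedure was evaluated, so treating it as Benjamini-Hochberg is an assumption, and the p-values are invented.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans: True where the hypothesis is rejected at FDR level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    # Find the largest rank r such that p_(r) <= q * r / m.
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= q * rank / m:
            cutoff_rank = rank
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected

# Invented p-values standing in for eight voxel-level tests.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
rejected = benjamini_hochberg(p_values, q=0.05)
```

Only the two smallest p-values fall under their rank-dependent thresholds here, so only those two tests are declared significant.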
HICOSMO - cosmology with a complete sample of galaxy clusters - I. Data analysis, sample selection and luminosity-mass scaling relation

NASA Astrophysics Data System (ADS)

Schellenberger, G.; Reiprich, T. H.

2017-08-01

The X-ray regime, where the most massive visible component of galaxy clusters, the intracluster medium, is visible, offers directly measured quantities, like the luminosity, and derived quantities, like the total mass, to characterize these objects. The aim of this project is to analyse a complete sample of galaxy clusters in detail and constrain cosmological parameters, like the matter density, Ωm, or the amplitude of initial density fluctuations, σ8. The purely X-ray flux-limited sample (HIFLUGCS) consists of the 64 X-ray brightest galaxy clusters, which are excellent targets to study the systematic effects, that can bias results. We analysed in total 196 Chandra observations of the 64 HIFLUGCS clusters, with a total exposure time of 7.7 Ms. Here, we present our data analysis procedure (including an automated substructure detection and an energy band optimization for surface brightness profile analysis) that gives individually determined, robust total mass estimates. These masses are tested against dynamical and Planck Sunyaev-Zeldovich (SZ) derived masses of the same clusters, where good overall agreement is found with the dynamical masses. The Planck SZ masses seem to show a mass-dependent bias to our hydrostatic masses; possible biases in this mass-mass comparison are discussed including the Planck selection function. Furthermore, we show the results for the (0.1-2.4) keV luminosity versus mass scaling relation. The overall slope of the sample (1.34) is in agreement with expectations and values from literature. Splitting the sample into galaxy groups and clusters reveals, even after a selection bias correction, that galaxy groups exhibit a significantly steeper slope (1.88) compared to clusters (1.06).
A segmentation/clustering model for the analysis of array CGH data.

PubMed

Picard, F; Robin, S; Lebarbier, E; Daudin, J-J

2007-09-01

Microarray-CGH (comparative genomic hybridization) experiments are used to detect and map chromosomal imbalances. A CGH profile can be viewed as a succession of segments that represent homogeneous regions in the genome whose representative sequences share the same relative copy number on average. Segmentation methods constitute a natural framework for the analysis, but they do not provide a biological status for the detected segments. We propose a new model for this segmentation/clustering problem, combining a segmentation model with a mixture model. We present a new hybrid algorithm called dynamic programming-expectation maximization (DP-EM) to estimate the parameters of the model by maximum likelihood. This algorithm combines DP and the EM algorithm. We also propose a model selection heuristic to select the number of clusters and the number of segments. An example of our procedure is presented, based on publicly available data sets. We compare our method to segmentation methods and to hidden Markov models, and we show that the new segmentation/clustering model is a promising alternative that can be applied in the more general context of signal processing.
Profiling Local Optima in K-Means Clustering: Developing a Diagnostic Technique

ERIC Educational Resources Information Center

Steinley, Douglas

2006-01-01

Using the cluster generation procedure proposed by D. Steinley and R. Henson (2005), the author investigated the performance of K-means clustering under the following scenarios: (a) different probabilities of cluster overlap; (b) different types of cluster overlap; (c) varying sample sizes, clusters, and dimensions; (d) different multivariate…

Accounting for measurement error in biomarker data and misclassification of subtypes in the analysis of tumor data

PubMed Central

Nevo, Daniel; Zucker, David M.; Tamimi, Rulla M.; Wang, Molin

2017-01-01

A common paradigm in dealing with heterogeneity across tumors in cancer analysis is to cluster the tumors into subtypes using marker data on the tumor, and then to analyze each of the clusters separately. A more specific target is to investigate the association between risk factors and specific subtypes and to use the results for personalized preventive treatment. This task is usually carried out in two steps–clustering and risk factor assessment. However, two sources of measurement error arise in these problems. The first is the measurement error in the biomarker values. The second is the misclassification error when assigning observations to clusters. We consider the case with a specified set of relevant markers and propose a unified single-likelihood approach for normally distributed biomarkers. As an alternative, we consider a two-step procedure with the tumor type misclassification error taken into account in the second-step risk factor analysis. We describe our method for binary data and also for survival analysis data using a modified version of the Cox model. We present asymptotic theory for the proposed estimators. Simulation results indicate that our methods significantly lower the bias with a small price being paid in terms of variance. We present an analysis of breast cancer data from the Nurses’ Health Study to demonstrate the utility of our method. PMID:27558651
A comparison of unsupervised classification procedures on LANDSAT MSS data for an area of complex surface conditions in Basilicata, Southern Italy

NASA Technical Reports Server (NTRS)

Justice, C.; Townshend, J. (Principal Investigator)

1981-01-01

Two unsupervised classification procedures were applied to ratioed and unratioed LANDSAT multispectral scanner data of an area of spatially complex vegetation and terrain. An objective accuracy assessment was undertaken on each classification and comparison was made of the classification accuracies. The two unsupervised procedures use the same clustering algorithm. By one procedure the entire area is clustered and by the other a representative sample of the area is clustered and the resulting statistics are extrapolated to the remaining area using a maximum likelihood classifier. Explanation is given of the major steps in the classification procedures including image preprocessing; classification; interpretation of cluster classes; and accuracy assessment. Of the four classifications undertaken, the monocluster block approach on the unratioed data gave the highest accuracy of 80% for five coarse cover classes. This accuracy was increased to 84% by applying a 3 x 3 contextual filter to the classified image. A detailed description and partial explanation is provided for the major misclassification. The classification of the unratioed data produced higher percentage accuracies than for the ratioed data and the monocluster block approach gave higher accuracies than clustering the entire area. The monocluster block approach was additionally the most economical in terms of computing time.
Subspace K-means clustering.

PubMed

Timmerman, Marieke E; Ceulemans, Eva; De Roover, Kim; Van Leeuwen, Karla

2013-12-01

To achieve an insightful clustering of multivariate data, we propose subspace K-means. Its central idea is to model the centroids and cluster residuals in reduced spaces, which allows for dealing with a wide range of cluster types and yields rich interpretations of the clusters. We review the existing related clustering methods, including deterministic, stochastic, and unsupervised learning approaches. To evaluate subspace K-means, we performed a comparative simulation study, in which we manipulated the overlap of subspaces, the between-cluster variance, and the error variance. The study shows that the subspace K-means algorithm is sensitive to local minima but that the problem can be reasonably dealt with by using partitions of various cluster procedures as a starting point for the algorithm. Subspace K-means performs very well in recovering the true clustering across all conditions considered and appears to be superior to its competitor methods: K-means, reduced K-means, factorial K-means, mixtures of factor analyzers (MFA), and MCLUST. The best competitor method, MFA, showed a performance similar to that of subspace K-means in easy conditions but deteriorated in more difficult ones. Using data from a study on parental behavior, we show that subspace K-means analysis provides a rich insight into the cluster characteristics, in terms of both the relative positions of the clusters (via the centroids) and the shape of the clusters (via the within-cluster residuals).
Automatic script identification from images using cluster-based templates

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hochberg, J.; Kerns, L.; Kelly, P.

We have developed a technique for automatically identifying the script used to generate a document that is stored electronically in bit image form. Our approach differs from previous work in that the distinctions among scripts are discovered by an automatic learning procedure, without any hands-on analysis. We first develop a set of representative symbols (templates) for each script in our database (Cyrillic, Roman, etc.). We do this by identifying all textual symbols in a set of training documents, scaling each symbol to a fixed size, clustering similar symbols, pruning minor clusters, and finding each cluster's centroid. To identify a new document's script, we identify and scale a subset of symbols from the document and compare them to the templates for each script. We choose the script whose templates provide the best match. Our current system distinguishes among the Armenian, Burmese, Chinese, Cyrillic, Ethiopic, Greek, Hebrew, Japanese, Korean, Roman, and Thai scripts with over 90% accuracy.
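The template-building and matching steps described in this abstract can be sketched roughly as follows. This is only an illustration, not the authors' system: the greedy "leader" clustering rule, the `radius` and `min_size` parameters, the average nearest-template distance used as the match score, and the tiny two-dimensional "symbols" are all assumptions made for the example. Symbols are taken to be already scaled to fixed-length feature vectors.

```python
from math import dist

def build_templates(symbols, radius, min_size):
    """Cluster similar symbols, prune minor clusters, return cluster centroids.

    Hypothetical clustering rule: a symbol joins the first cluster whose
    seed lies within `radius`; otherwise it starts a new cluster.
    """
    clusters = []  # list of (seed, members)
    for s in symbols:
        for seed, members in clusters:
            if dist(seed, s) <= radius:
                members.append(s)
                break
        else:
            clusters.append((s, [s]))
    templates = []
    for _, members in clusters:
        if len(members) >= min_size:  # prune minor clusters
            n = len(members)
            templates.append(tuple(sum(col) / n for col in zip(*members)))
    return templates

def identify_script(symbols, templates_by_script):
    """Choose the script whose templates are, on average, closest to the symbols."""
    def score(templates):
        return sum(min(dist(s, t) for t in templates) for s in symbols) / len(symbols)
    return min(templates_by_script, key=lambda name: score(templates_by_script[name]))

# Made-up training symbols for two hypothetical scripts "A" and "B".
training = {
    "A": [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)],
    "B": [(1.0, 1.0), (1.1, 1.0), (1.0, 0.9)],
}
templates_by_script = {
    name: build_templates(syms, radius=0.5, min_size=2)
    for name, syms in training.items()
}
guess = identify_script([(0.05, 0.05), (0.0, 0.05)], templates_by_script)
```

In this toy data the isolated symbol (5.0, 5.0) forms a one-member cluster and is pruned, so script "A" ends up with a single template. The sketch assumes every script keeps at least one template after pruning.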
Data evaluation of trace elements determined in Nigerian coal using cluster procedures.

PubMed

Ewa, I O B

2004-05-01

Large data-sets of elements determined by instrumental neutron activation analysis (INAA) require meaningful interpretation in order to determine the pattern of their existence in host matrices. This could be achieved using cluster procedures. Element abundances (Al, As, Ba, Br, Ca, Ce, Cs, Dy, Eu, Fe, Ga, Gd, Hf, K, La, Lu, Mg, Mn, Na, O, Rb, Sb, Sc, Sm, Sr, Ta, Tb, Th, Ti, U, V, Yb, Zn and Zr) of prepared and run-of-mine coals from eight principal mines (Onyeama, Ogbete, Enugu, Gombe, Asaba-Ugwashi, Okaba, Afikpo and Lafia) in Nigeria were determined by INAA. Quality control of the measurements was assured by the re-determination of a standard reference material, NIST 1632a. These data-sets were then tested for multi-variate statistics using METHOD = SINGLE in the cluster procedure. The computer-assisted package SAS was used to generate the dendrograms while the algorithm used was stored Euclidean distances. The results showed a recognition pattern, useful for the interpretation of coalification histories and the prediction of fuel ranking for Nigerian coals. High segregation of coal fly ash was observed, while metallurgical coal grouped together with high-ranking coals of Okaba, Enugu and Obi (Lafia). Further work revealed some of these coals as having high gross calorific value (7908 kcal kg⁻¹ for Enugu coal; 7200 kcal kg⁻¹ for Okaba) and low sulphur thereby making them efficient fuel materials.
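For readers unfamiliar with single linkage (the METHOD = SINGLE option mentioned above), a minimal pure-Python sketch of the idea follows. It is a stand-in for illustration, not the SAS procedure used in the study: the function name `single_linkage`, the stopping rule (a fixed number of clusters instead of a full dendrogram), and the sample points are all assumptions.

```python
from math import dist

def single_linkage(points, n_clusters):
    """Agglomerative clustering with single linkage on Euclidean distance.

    Repeatedly merges the two clusters whose closest members are nearest
    to each other, until `n_clusters` clusters remain. Returns lists of
    point indices.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None  # (distance, index_a, index_b)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])
        del clusters[b]
    return [sorted(c) for c in clusters]

# Made-up two-element "abundance" vectors for five hypothetical samples.
samples = [(0, 0), (0, 1), (10, 10), (10, 11), (50, 50)]
three_groups = single_linkage(samples, 3)
two_groups = single_linkage(samples, 2)
```

Because single linkage looks only at the closest pair between clusters, it tends to chain nearby groups together, which is why the far-away fifth sample stays on its own in both results here.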
Classification of municipal occupations.

PubMed

Ilmarinen, J; Suurnäkki, T; Nygård, C H; Landau, K

1991-01-01

Eighty-eight job titles were analyzed with the "ergonomic job analysis procedure" [in German, Arbeitswissenschaftliche Erhebungsverfahren zur Tätigkeitsanalyse, abbreviated AET]. The objective was to classify the wide range of municipal jobs into homogeneous groups according to job demand and to provide better possibilities to study the relationships between work and health among the aging municipal working population. Altogether 216 items were classified. First, a hierarchical cluster analysis was made, and a dendrogram of the analyzed job titles was drawn. Second, a profile analysis was done in which the single items were grouped into 39 sum items, and a graphic profile was drawn. Finally, the stress factors were listed and drawn in ranking order. The cluster analysis formed 13 groups. Groups exposed to the highest stress factor level were kitchen supervisors, dentists, and physicians. More than 10 stress factors (greater than 50% of the maximum) were found in nursing, administration, installation, transport, and technical supervision.
A hybrid clustering approach for multivariate time series - A case study applied to failure analysis in a gas turbine.

PubMed

Fontes, Cristiano Hora; Budman, Hector

2017-11-01

A clustering problem involving multivariate time series (MTS) requires the selection of similarity metrics. This paper shows the limitations of the PCA similarity factor (SPCA) as a single metric in nonlinear problems where there are differences in magnitude of the same process variables due to expected changes in operation conditions. A novel method for clustering MTS based on a combination between SPCA and the average-based Euclidean distance (AED) within a fuzzy clustering approach is proposed. Case studies involving either simulated or real industrial data collected from a large scale gas turbine are used to illustrate that the hybrid approach enhances the ability to recognize normal and fault operating patterns. This paper also proposes an oversampling procedure to create synthetic multivariate time series that can be useful in commonly occurring situations involving unbalanced data sets. Copyright © 2017 ISA. Published by Elsevier Ltd. All rights reserved.
Analyzing coastal environments by means of functional data analysis

NASA Astrophysics Data System (ADS)

Sierra, Carlos; Flor-Blanco, Germán; Ordoñez, Celestino; Flor, Germán; Gallego, José R.

2017-07-01

Here we used Functional Data Analysis (FDA) to examine particle-size distributions (PSDs) in a beach/shallow marine sedimentary environment in Gijón Bay (NW Spain). The work involved both Functional Principal Components Analysis (FPCA) and Functional Cluster Analysis (FCA). The grain size of the sand samples was characterized by means of laser dispersion spectroscopy. Within this framework, FPCA was used as a dimension reduction technique to explore and uncover patterns in grain-size frequency curves. This procedure proved useful to describe variability in the structure of the data set. Moreover, an alternative approach, FCA, was applied to identify clusters and to interpret their spatial distribution. Results obtained with this latter technique were compared with those obtained by means of two vector approaches that combine PCA with CA (Cluster Analysis). The first method, the point density function (PDF), was employed after fitting a log-normal distribution to each PSD and summarizing each of the density functions by its mean, sorting, skewness and kurtosis. The second applied a centered-log-ratio (clr) to the original data. PCA was then applied to the transformed data, and finally CA to the retained principal component scores. The study revealed functional data analysis, specifically FPCA and FCA, as a suitable alternative with considerable advantages over traditional vector analysis techniques in sedimentary geology studies.
Continuously Variable Rating: a new, simple and logical procedure to evaluate original scientific publications

PubMed Central

Silva, Mauricio Rocha e

2011-01-01

OBJECTIVE: Impact Factors (IF) are widely used surrogates to evaluate single articles, in spite of known shortcomings imposed by cite distribution skewness. We quantify this asymmetry and propose a simple computer-based procedure for evaluating individual articles. METHOD: (a) Analysis of symmetry. Journals clustered around nine Impact Factor points were selected from the medical “Subject Categories” in Journal Citation Reports 2010. Citable items published in 2008 were retrieved and ranked by granted citations over the Jan/2008 - Jun/2011 period. Frequency distribution of cites, normalized cumulative cites and absolute cites/decile were determined for each journal cluster. (b) Positive Predictive Value. Three arbitrarily established evaluation classes were generated: LOW (1.3≤IF<2.6); MID: (2.6≤IF<3.9); HIGH: (IF≥3.9). Positive Predictive Value for journal clusters within each class range was estimated. (c) Continuously Variable Rating. An alternative evaluation procedure is proposed to allow the rating of individually published articles in comparison to all articles published in the same journal within the same year of publication. The general guiding lines for the construction of a totally dedicated software program are delineated. RESULTS AND CONCLUSIONS: Skewness followed the Pareto Distribution for (1
Analyzing ZnO clusters through the density-functional theory.

PubMed

Zaragoza, Irineo-Pedro; Soriano-Agueda, Luis-Antonio; Hernández-Esparza, Raymundo; Vargas, Rubicelia; Garza, Jorge

2018-06-16

The potential energy surface of Zn_nO_n clusters (n = 2, 4, 6, 8) has been explored by using a simulated annealing method. For n = 2, 4, and 6, the CCSD(T)/TZP method was used as the reference, and from here it is shown that the M06-2X/TZP method gives the lowest deviations over PBE, PBE0, B3LYP, M06, and MP2 methods. Thus, with the M06-2X method we predict isomers of Zn_nO_n clusters, which coincide with some isomers reported previously. By using the atoms in molecules analysis, possible contacts between Zn and O atoms were found for all structures studied in this article. The bond paths involved in several clusters suggest that Zn_nO_n clusters can be obtained from the zincite (ZnO crystal); this observation was confirmed for clusters with n = 2-9, 18 and 20. The structure with n = 23 was obtained by the procedure presented here, from crystal information, which could be important to confirm experimental data delivered for n = 18 and 23.
Source Apportionment of Atmospheric Particles by Electron Probe X-Ray Microanalysis and Receptor Models.

NASA Astrophysics Data System (ADS)

van Borm, Werner August

Electron probe X-ray microanalysis (EPXMA) in combination with an automation system and an energy-dispersive X-ray detection system was used to analyse thousands of microscopical particles, originating from the ambient atmosphere. The huge amount of data was processed by a newly developed X-ray correction method and a number of data reduction procedures. A standardless ZAF procedure for EPXMA was developed for quick semi-quantitative analysis of particles starting from simple corrections, valid for bulk samples and modified taking into account the particle finite diameter, assuming a spherical shape. Tested on a limited database of bulk and particulate samples, the compromise between calculation speed and accuracy yielded for elements with Z > 14 accuracies on concentrations less than 10% while absolute deviations remained below 4 weight%, thus being only important for low concentrations. Next, the possibilities for the use of supervised and unsupervised multivariate particle classification were investigated for source apportionment of individual particles. In a detailed study of the unsupervised cluster analysis technique several aspects were considered, that have a severe influence on the final cluster analysis results, i.e. data acquisition, X-ray peak identification, data normalization, scaling, variable selection, similarity measure, cluster strategy, cluster significance and error propagation. A supervised approach was developed using an expert system-like approach in which identification rules are built to describe the particle classes in a unique manner. Applications are presented for particles sampled (1) near a zinc smelter (Vieille-Montagne, Balen, Belgium), analyzed for heavy metals, (2) in an urban aerosol (Antwerp, Belgium), analyzed for over 20 elements and (3) in a rural aerosol originating from a Swiss mountain area (Bern).
Thus it was possible to pinpoint a number of known and unknown sources and characterize their emissions in terms of particles abundance and particle composition. Alternatively, the bulk analysis of filters (total, fine and coarse mode) using Particle Induced X-Ray Emission (PIXE) and the application of a receptor modeling approach provided for complementary information on a macroscopical level. A computer program was developed incorporating an absolute factor analysis based receptor modeling procedure. Source profiles and contributions are described by elemental concentrations and an atmospheric mass balance is put forward. The latter method was applied in a two year study of the Antwerp urban aerosol and for the Swiss aerosol, revealing a number of previously known and unknown sources. Both methods were successfully combined to increase the source resolution.
Sampling procedures for inventory of commercial volume tree species in Amazon Forest.

PubMed

Netto, Sylvio P; Pelissari, Allan L; Cysneiros, Vinicius C; Bonazza, Marcelo; Sanquetta, Carlos R

2017-01-01

The spatial distribution of tropical tree species can affect the consistency of the estimators in commercial forest inventories; therefore, appropriate sampling procedures are required to survey species with different spatial patterns in the Amazon Forest. To this end, the present study aims to evaluate the conventional sampling procedures and introduce adaptive cluster sampling for volumetric inventories of Amazonian tree species, considering the hypotheses that the density, the spatial distribution and the zero-plots affect the consistency of the estimators, and that adaptive cluster sampling yields more accurate volumetric estimation. We use data from a census carried out in Jamari National Forest, Brazil, where trees with diameters equal to or higher than 40 cm were measured in 1,355 plots. Species with different spatial patterns were selected and sampled with simple random sampling, systematic sampling, linear cluster sampling and adaptive cluster sampling, whereby the accuracy of the volumetric estimation and presence of zero-plots were evaluated. The sampling procedures applied to species were affected by the low density of trees and the large number of zero-plots, whereas the adaptive clusters allowed concentrating the sampling effort in plots with trees and, thus, gathering more representative samples to estimate the commercial volume.
Precision growth index using the clustering of cosmic structures and growth data

DOE Office of Scientific and Technical Information (OSTI.GOV)

Pouri, Athina; Basilakos, Spyros; Plionis, Manolis, E-mail: athpouri@phys.uoa.gr, E-mail: svasil@academyofathens.gr, E-mail: mplionis@physics.auth.gr

2014-08-01

We use the clustering properties of Luminous Red Galaxies (LRGs) and the growth rate data provided by the various galaxy surveys in order to constrain the growth index (γ) of the linear matter fluctuations. We perform a standard χ²-minimization procedure between theoretical expectations and data, followed by a joint likelihood analysis, and we find a value of γ = 0.56 ± 0.05, perfectly consistent with the expectations of the ΛCDM model, and Ω_m0 = 0.29 ± 0.01, in very good agreement with the latest Planck results. Our analysis provides significantly more stringent growth index constraints with respect to previous studies, as indicated by the fact that the corresponding uncertainty is only ∼ 0.09 γ. Finally, allowing γ to vary with redshift in two manners (Taylor expansion around z=0, and Taylor expansion around the scale factor), we find that the combined statistical analysis between our clustering and literature growth data alleviates the degeneracy, and we obtain more stringent constraints with respect to other recent studies.
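A χ² minimization of this kind can be illustrated with a small grid search. The sketch below is not the authors' pipeline. It assumes the common growth-index parametrization f(z) = Ω_m(z)^γ in a flat ΛCDM background, fits γ alone with Ω_m0 held fixed, and uses synthetic data points generated inside the script (no survey measurements are reproduced). All function names are hypothetical.

```python
def omega_m(z, omega_m0):
    """Matter density parameter at redshift z for a flat LambdaCDM background."""
    a3 = (1.0 + z) ** 3
    return omega_m0 * a3 / (omega_m0 * a3 + 1.0 - omega_m0)

def growth_rate(z, gamma, omega_m0):
    """Assumed parametrization of the growth rate: f(z) = Omega_m(z) ** gamma."""
    return omega_m(z, omega_m0) ** gamma

def chi2(gamma, data, omega_m0):
    """Sum of squared, error-weighted residuals between data and model."""
    return sum(((f_obs - growth_rate(z, gamma, omega_m0)) / sigma) ** 2
               for z, f_obs, sigma in data)

def fit_gamma(data, omega_m0, grid):
    """Return the grid value of gamma with the smallest chi-square."""
    return min(grid, key=lambda g: chi2(g, data, omega_m0))

# Synthetic, noise-free "measurements" (z, f, sigma) generated with gamma = 0.55.
OMEGA_M0 = 0.29
synthetic = [(z, growth_rate(z, 0.55, OMEGA_M0), 0.05) for z in (0.2, 0.4, 0.6, 0.8)]
gamma_grid = [0.40 + 0.01 * i for i in range(31)]  # 0.40 ... 0.70
best_gamma = fit_gamma(synthetic, OMEGA_M0, gamma_grid)
```

With noise-free synthetic input the search simply recovers the value used to generate the data. A real analysis would use measured values with their uncertainties and would typically fit Ω_m0 jointly with γ.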
Distribution-based fuzzy clustering of electrical resistivity tomography images for interface detection

NASA Astrophysics Data System (ADS)

Ward, W. O. C.; Wilkinson, P. B.; Chambers, J. E.; Oxby, L. S.; Bai, L.

2014-04-01

A novel method for the effective identification of bedrock subsurface elevation from electrical resistivity tomography images is described. Identifying subsurface boundaries in the topographic data can be difficult due to smoothness constraints used in inversion, so a statistical population-based approach is used that extends previous work in calculating isoresistivity surfaces. The analysis framework involves a procedure for guiding a clustering approach based on the fuzzy c-means algorithm. An approximation of resistivity distributions, found using kernel density estimation, was utilized as a means of guiding the cluster centroids used to classify data. A fuzzy method was chosen over hard clustering due to uncertainty in hard edges in the topography data, and a measure of clustering uncertainty was identified based on the reciprocal of cluster membership. The algorithm was validated using a direct comparison of known observed bedrock depths at two 3-D survey sites, using real-time GPS information of exposed bedrock by quarrying on one site, and borehole logs at the other. Results show similarly accurate detection as a leading isosurface estimation method, and the proposed algorithm requires significantly less user input and prior site knowledge. Furthermore, the method is effectively dimension-independent and will scale to data of increased spatial dimensions without a significant effect on the runtime. A discussion on the results by automated versus supervised analysis is also presented.
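The fuzzy c-means step that underlies this approach can be sketched in a few lines for one-dimensional values. This is the textbook algorithm only, written under assumptions: the initial centroids are passed in by hand (the paper derives its guidance from kernel density estimates, which is not reproduced here), the fuzzifier `m` defaults to 2, and the sample "resistivity" values are invented.

```python
def memberships(x, centers, m=2.0):
    """Fuzzy memberships of value x in each cluster; they sum to 1."""
    d = [abs(x - c) for c in centers]
    zeros = d.count(0.0)
    if zeros:  # x coincides with one or more centers
        return [1.0 / zeros if di == 0.0 else 0.0 for di in d]
    p = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dj) ** p for dj in d) for di in d]

def fuzzy_c_means(values, centers, m=2.0, n_iter=50):
    """Alternate membership and centroid updates for a fixed number of iterations."""
    centers = list(centers)
    for _ in range(n_iter):
        u = [memberships(x, centers, m) for x in values]
        for k in range(len(centers)):
            w = [row[k] ** m for row in u]
            centers[k] = sum(wi * x for wi, x in zip(w, values)) / sum(w)
    return centers, [memberships(x, centers, m) for x in values]

# Invented values forming a low group and a high group.
values = [1.0, 1.2, 0.9, 10.0, 10.5, 9.8]
final_centers, final_u = fuzzy_c_means(values, centers=[0.0, 5.0])
```

Unlike hard clustering, every value keeps a graded membership in each cluster, so points near a boundary can be flagged as uncertain instead of being forced into one class.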
Phenotypes of sleeplessness: stressing the need for psychodiagnostics in the assessment of insomnia.

PubMed

van de Laar, Merijn; Leufkens, Tim; Bakker, Bart; Pevernagie, Dirk; Overeem, Sebastiaan

2017-09-01

Insomnia is too general a term for various subtypes that might have different etiologies and therefore require different types of treatment. In this explorative study we used cluster analysis to distinguish different phenotypes in 218 patients with insomnia, taking into account several factors including sleep variables and characteristics related to personality and psychiatric comorbidity. Three clusters emerged from the analysis. The 'moderate insomnia with low psychopathology'-cluster was characterized by relatively normal personality traits, as well as normal levels of anxiety and depressive symptoms in the presence of moderate insomnia severity. The 'severe insomnia with moderate psychopathology'-cluster showed relatively high scores on the Insomnia Severity Index and scores on the sleep log that were indicative of severe insomnia. Anxiety and depressive symptoms were slightly above the cut-off and they were characterized by below average self-sufficiency and less goal-directed behavior. The 'early onset insomnia with high psychopathology'-cluster showed a much younger age and earlier insomnia onset than the other two groups. Anxiety and depressive symptoms were well above the cut-off score and the group consisted of a higher percentage of subjects with comorbid psychiatric disorders. This cluster showed a 'typical psychiatric' personality profile. Our findings stress the need for psychodiagnostic procedures next to a sleep-related diagnostic approach, especially in the younger insomnia patients. Specific treatment suggestions are given based on the three phenotypes.
Improving clustering with metabolic pathway data.

PubMed

Milone, Diego H; Stegmayer, Georgina; López, Mariana; Kamenetzky, Laura; Carrari, Fernando

2014-04-10

It is a common practice in bioinformatics to validate each group returned by a clustering algorithm through manual analysis, according to a-priori biological knowledge. This procedure helps to find functionally related patterns and to propose hypotheses for their behavior and the biological processes involved. Therefore, this knowledge is used only as a second step, after data are just clustered according to their expression patterns. Thus, it could be very useful to be able to improve the clustering of biological data by incorporating prior knowledge into the cluster formation itself, in order to enhance the biological value of the clusters. A novel training algorithm for clustering is presented, which evaluates the biological internal connections of the data points while the clusters are being formed. Within this training algorithm, the calculation of distances among data points and neuron centroids includes a new term based on information from well-known metabolic pathways. The standard self-organizing map (SOM) training versus the biologically-inspired SOM (bSOM) training were tested with two real data sets of transcripts and metabolites from Solanum lycopersicum and Arabidopsis thaliana species. Classical data mining validation measures were used to evaluate the clustering solutions obtained by both algorithms. Moreover, a new measure that takes into account the biological connectivity of the clusters was applied. The results of bSOM show important improvements in the convergence and performance for the proposed clustering method in comparison to standard SOM training, in particular, from the application point of view. Analyses of the clusters obtained with bSOM indicate that including biological information during training can certainly increase the biological value of the clusters found with the proposed method.
It is worth highlighting that this fact has effectively improved the results, which can simplify their further analysis. The algorithm is available as a web-demo at http://fich.unl.edu.ar/sinc/web-demo/bsom-lite/. The source code and the data sets supporting the results of this article are available at http://sourceforge.net/projects/sourcesinc/files/bsom.
Chapter 7. Cloning and analysis of natural product pathways.

PubMed

Gust, Bertolt

2009-01-01

The identification of gene clusters of natural products has led to an enormous wealth of information about their biosynthesis and its regulation, and about self-resistance mechanisms. Well-established routine techniques are now available for the cloning and sequencing of gene clusters. The subsequent functional analysis of the complex biosynthetic machinery requires efficient genetic tools for manipulation. Until recently, techniques for the introduction of defined changes into Streptomyces chromosomes were very time-consuming. In particular, manipulation of large DNA fragments has been challenging due to the absence of suitable restriction sites for restriction- and ligation-based techniques. The homologous recombination approach called recombineering (referred to as Red/ET-mediated recombination in this chapter) has greatly facilitated targeted genetic modifications of complex biosynthetic pathways from actinomycetes by eliminating many of the time-consuming and labor-intensive steps. This chapter describes techniques for the cloning and identification of biosynthetic gene clusters, for the generation of gene replacements within such clusters, for the construction of integrative library clones and their expression in heterologous hosts, and for the assembly of entire biosynthetic gene clusters from the inserts of individual library clones. A systematic approach toward insertional mutation of a complete Streptomyces genome is shown by the use of an in vitro transposon mutagenesis procedure.
THE PREPARATION OF CURRICULUM MATERIALS AND THE DEVELOPMENT OF TEACHERS FOR AN EXPERIMENTAL APPLICATION OF THE CLUSTER CONCEPT OF VOCATIONAL EDUCATION AT THE SECONDARY SCHOOL LEVEL. VOLUME II, INSTRUCTIONAL PLANS FOR THE CONSTRUCTION CLUSTER.

ERIC Educational Resources Information Center

MALEY, DONALD

DESIGNED FOR USE WITH 11TH AND 12TH GRADE STUDENTS, THIS CURRICULUM GUIDE FOR THE OCCUPATIONAL CLUSTER IN CONSTRUCTION WAS DEVELOPED BY PARTICIPATING TEACHERS FROM RESULTS OF THE RESEARCH PROCEDURES DESCRIBED IN VOLUME I (VT 004 162). THE COURSE DESCRIPTION, NEED FOR THE COURSE, COURSE OBJECTIVES, PROCEDURE, AND INSTRUCTIONAL PLAN ARE DISCUSSED…
WAIS-III index score profiles in the Canadian standardization sample.

PubMed

Lange, Rael T

2007-01-01

Representative index score profiles were examined in the Canadian standardization sample of the Wechsler Adult Intelligence Scale-Third Edition (WAIS-III). The identification of profile patterns was based on the methodology proposed by Lange, Iverson, Senior, and Chelune (2002) that aims to maximize the influence of profile shape and minimize the influence of profile magnitude on the cluster solution. A two-step cluster analysis procedure was used (i.e., hierarchical and k-means analyses). Cluster analysis of the four index scores (i.e., Verbal Comprehension [VCI], Perceptual Organization [POI], Working Memory [WMI], Processing Speed [PSI]) identified six profiles in this sample. Profiles were differentiated by pattern of performance and were primarily characterized as (a) high VCI/POI, low WMI/PSI, (b) low VCI/POI, high WMI/PSI, (c) high PSI, (d) low PSI, (e) high VCI/WMI, low POI/PSI, and (f) low VCI, high POI. These profiles are potentially useful for determining whether a patient's WAIS-III performance is unusual in a normal population.
Latent Cluster Analysis of Instructional Practices Reported by High- and Low-Performing Mathematics Teachers in Four Countries

ERIC Educational Resources Information Center

Cheng, Qiang; Hsu, Hsien-Yuan

2017-01-01

Using Trends in International Mathematics and Science Study (TIMSS) 2011 eighth-grade international dataset, this study explored the profiles of instructional practices reported by high- and low-performing mathematics teachers across the US, Finland, Korea, and Russia. Concepts of conceptual teaching and procedural teaching were used to frame the…

Accounting for measurement error in biomarker data and misclassification of subtypes in the analysis of tumor data.

PubMed

Nevo, Daniel; Zucker, David M; Tamimi, Rulla M; Wang, Molin

2016-12-30

A common paradigm in dealing with heterogeneity across tumors in cancer analysis is to cluster the tumors into subtypes using marker data on the tumor, and then to analyze each of the clusters separately. A more specific target is to investigate the association between risk factors and specific subtypes and to use the results for personalized preventive treatment. This task is usually carried out in two steps-clustering and risk factor assessment. However, two sources of measurement error arise in these problems. The first is the measurement error in the biomarker values. The second is the misclassification error when assigning observations to clusters. We consider the case with a specified set of relevant markers and propose a unified single-likelihood approach for normally distributed biomarkers. As an alternative, we consider a two-step procedure with the tumor type misclassification error taken into account in the second-step risk factor analysis. We describe our method for binary data and also for survival analysis data using a modified version of the Cox model. We present asymptotic theory for the proposed estimators. Simulation results indicate that our methods significantly lower the bias with a small price being paid in terms of variance. We present an analysis of breast cancer data from the Nurses' Health Study to demonstrate the utility of our method. Copyright © 2016 John Wiley & Sons, Ltd.
Semisupervised Clustering by Iterative Partition and Regression with Neuroscience Applications

PubMed Central

Qian, Guoqi; Wu, Yuehua; Ferrari, Davide; Qiao, Puxue; Hollande, Frédéric

2016-01-01

Regression clustering is a statistical learning and data mining method that mixes unsupervised and supervised learning and is found in a wide range of applications including artificial intelligence and neuroscience. It performs unsupervised learning when it clusters the data according to their respective unobserved regression hyperplanes. The method also performs supervised learning when it fits regression hyperplanes to the corresponding data clusters. Applying regression clustering in practice requires means of determining the underlying number of clusters in the data, finding the cluster label of each data point, and estimating the regression coefficients of the model. In this paper, we review the estimation and selection issues in regression clustering with regard to the least squares and robust statistical methods. We also provide a model selection based technique to determine the number of regression clusters underlying the data. We further develop a computing procedure for regression clustering estimation and selection. Finally, simulation studies are presented for assessing the procedure, together with analyzing a real data set on RGB cell marking in neuroscience to illustrate and interpret the method. PMID:27212939
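The alternation between partitioning and regression can be illustrated with a toy example. The sketch below is a simplified stand-in, not the authors' procedure: it assumes a known number of clusters, one predictor, ordinary least squares (no robust fitting or model selection), and hand-picked starting lines. All names and data are hypothetical.

```python
def fit_line(pts):
    """Ordinary least-squares line through (x, y) pairs; returns (slope, intercept)."""
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    slope = sum((x - mx) * (y - my) for x, y in pts) / sxx if sxx else 0.0
    return slope, my - slope * mx

def regression_clustering(points, lines, n_iter=20):
    """Alternate (1) assigning points to the best-fitting line and (2) refitting."""
    lines = list(lines)
    labels = []
    for _ in range(n_iter):
        # Partition step: each point goes to the line with the smallest squared residual.
        labels = [
            min(range(len(lines)),
                key=lambda k: (y - lines[k][0] * x - lines[k][1]) ** 2)
            for x, y in points
        ]
        # Regression step: refit each line on its current members.
        for k in range(len(lines)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if len(members) >= 2:
                lines[k] = fit_line(members)
    return lines, labels

# Noise-free toy data drawn from y = 2x + 1 and y = -x + 10.
xs = [0, 1, 2, 4, 5, 6]
points = [(x, 2 * x + 1) for x in xs] + [(x, -x + 10) for x in xs]
start = [(1.5, 0.0), (-0.5, 9.0)]  # rough initial (slope, intercept) guesses
fitted, labels = regression_clustering(points, start)
```

Like K-means, this alternation can settle into a local optimum, so the result depends on the starting lines. Trying several starts and keeping the best total squared error is a common precaution.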
Sample size estimation for alternating logistic regressions analysis of multilevel randomized community trials of under-age drinking.

PubMed

Reboussin, Beth A; Preisser, John S; Song, Eun-Young; Wolfson, Mark

2012-07-01

Under-age drinking is an enormous public health issue in the USA. Evidence that community level structures may impact on under-age drinking has led to a proliferation of efforts to change the environment surrounding the use of alcohol. Although the focus of these efforts is to reduce drinking by individual youths, environmental interventions are typically implemented at the community level with entire communities randomized to the same intervention condition. A distinct feature of these trials is the tendency of the behaviours of individuals residing in the same community to be more alike than that of others residing in different communities, which is herein called 'clustering'. Statistical analyses and sample size calculations must account for this clustering to avoid type I errors and to ensure an appropriately powered trial. Clustering itself may also be of scientific interest. We consider the alternating logistic regressions procedure within the population-averaged modelling framework to estimate the effect of a law enforcement intervention on the prevalence of under-age drinking behaviours while modelling the clustering at multiple levels, e.g. within communities and within neighbourhoods nested within communities, by using pairwise odds ratios. We then derive sample size formulae for estimating intervention effects when planning a post-test-only or repeated cross-sectional community-randomized trial using the alternating logistic regressions procedure.
Applications of modern statistical methods to analysis of data in physical science

NASA Astrophysics Data System (ADS)

Wicker, James Eric

Modern methods of statistical and computational analysis offer solutions to dilemmas confronting researchers in physical science. Although the ideas behind modern statistical and computational analysis methods were originally introduced in the 1970's, most scientists still rely on methods written during the early era of computing. These researchers, who analyze increasingly voluminous and multivariate data sets, need modern analysis methods to extract the best results from their studies. The first section of this work showcases applications of modern linear regression. Since the 1960's, many researchers in spectroscopy have used classical stepwise regression techniques to derive molecular constants. However, problems with thresholds of entry and exit for model variables plague this analysis method. Other criticisms of this kind of stepwise procedure include its inefficient searching method, the order in which variables enter or leave the model and problems with overfitting data. We implement an information scoring technique that overcomes the assumptions inherent in the stepwise regression process to calculate molecular model parameters. We believe that this kind of information based model evaluation can be applied to more general analysis situations in physical science. The second section proposes new methods of multivariate cluster analysis. The K-means algorithm and the EM algorithm, introduced in the 1960's and 1970's respectively, formed the basis of multivariate cluster analysis methodology for many years. However, several shortcomings of these methods include strong dependence on initial seed values and inaccurate results when the data seriously depart from hypersphericity. We propose new cluster analysis methods based on genetic algorithms that overcome the strong dependence on initial seed values.
In addition, we propose a generalization of the Genetic K-means algorithm which can accurately identify clusters with complex hyperellipsoidal covariance structures. We then use this new algorithm in a genetic algorithm based Expectation-Maximization process that can accurately calculate parameters describing complex clusters in a mixture model routine. Using the accuracy of this GEM algorithm, we assign information scores to cluster calculations in order to best identify the number of mixture components in a multivariate data set. We will showcase how these algorithms can be used to process multivariate data from astronomical observations.
THE PREPARATION OF CURRICULUM MATERIALS AND THE DEVELOPMENT OF TEACHERS FOR AN EXPERIMENTAL APPLICATION OF THE CLUSTER CONCEPT OF VOCATIONAL EDUCATION AT THE SECONDARY SCHOOL LEVEL. VOLUME III, INSTRUCTIONAL PLANS FOR THE METAL FORMING AND FABRICATION CLUSTER.

ERIC Educational Resources Information Center

MALEY, DONALD

DESIGNED FOR USE WITH 11TH AND 12TH GRADE STUDENTS, THIS CURRICULUM GUIDE FOR THE OCCUPATIONAL CLUSTER IN METAL FORMING AND FABRICATION WAS DEVELOPED BY PARTICIPATING TEACHERS FROM RESULTS OF THE RESEARCH PROCEDURES DESCRIBED IN VOLUME I (VT 004 162). THE COURSE DESCRIPTION, NEED FOR THE COURSE, COURSE OBJECTIVES, PROCEDURES AND INSTRUCTIONAL PLAN…
THE PREPARATION OF CURRICULUM MATERIALS AND THE DEVELOPMENT OF TEACHERS FOR AN EXPERIMENTAL APPLICATION OF THE CLUSTER CONCEPT OF VOCATIONAL EDUCATION AT THE SECONDARY SCHOOL LEVEL. VOLUME IV, INSTRUCTIONAL PLANS FOR THE ELECTRO-MECHANICAL CLUSTER.

ERIC Educational Resources Information Center

MALEY, DONALD

DESIGNED FOR USE WITH 11TH AND 12TH GRADE STUDENTS, THIS CURRICULUM GUIDE FOR THE OCCUPATIONAL CLUSTER IN ELECTRO-MECHANICAL INSTALLATION AND REPAIR WAS DEVELOPED BY PARTICIPATING TEACHERS FROM RESULTS OF THE RESEARCH PROCEDURES DESCRIBED IN VOLUME I (VT 004 162). THE COURSE DESCRIPTIONS, NEED FOR THE COURSE, COURSE OBJECTIVES, PROCEDURES, AND…
antiSMASH 3.0—a comprehensive resource for the genome mining of biosynthetic gene clusters

PubMed Central

Blin, Kai; Duddela, Srikanth; Krug, Daniel; Kim, Hyun Uk; Bruccoleri, Robert; Lee, Sang Yup; Fischbach, Michael A; Müller, Rolf; Wohlleben, Wolfgang; Breitling, Rainer; Takano, Eriko

2015-01-01

Microbial secondary metabolism constitutes a rich source of antibiotics, chemotherapeutics, insecticides and other high-value chemicals. Genome mining of gene clusters that encode the biosynthetic pathways for these metabolites has become a key methodology for novel compound discovery. In 2011, we introduced antiSMASH, a web server and stand-alone tool for the automatic genomic identification and analysis of biosynthetic gene clusters, available at http://antismash.secondarymetabolites.org. Here, we present version 3.0 of antiSMASH, which has undergone major improvements. A full integration of the recently published ClusterFinder algorithm now allows using this probabilistic algorithm to detect putative gene clusters of unknown types. Also, a new dereplication variant of the ClusterBlast module now identifies similarities of identified clusters to any of 1172 clusters with known end products. At the enzyme level, active sites of key biosynthetic enzymes are now pinpointed through a curated pattern-matching procedure and Enzyme Commission numbers are assigned to functionally classify all enzyme-coding genes. Additionally, chemical structure prediction has been improved by incorporating polyketide reduction states. Finally, in order for users to be able to organize and analyze multiple antiSMASH outputs in a private setting, a new XML output module allows offline editing of antiSMASH annotations within the Geneious software. PMID:25948579
BESIII Physical Analysis on Hadoop Platform

NASA Astrophysics Data System (ADS)

Huo, Jing; Zang, Dongsong; Lei, Xiaofeng; Li, Qiang; Sun, Gongxing

2014-06-01

In the past 20 years, computing clusters have been widely used for High Energy Physics data processing. Jobs running on a traditional cluster with a Data-to-Computing structure have to read large volumes of data via the network to the computing nodes for analysis, which makes I/O latency a bottleneck of the whole system. The new distributed computing technology based on the MapReduce programming model has many advantages, such as high concurrency, high scalability and high fault tolerance, and it can benefit us in dealing with Big Data. This paper introduces the idea of using the MapReduce model to do BESIII physical analysis, and presents a new data analysis system structure based on the Hadoop platform, which not only greatly improves the efficiency of data analysis, but also reduces the cost of system building. Moreover, this paper establishes an event pre-selection system based on the event-level metadata (TAGs) database to optimize the data analysis procedure.
Is antibody clustering predictive of clinical subsets and damage in systemic lupus erythematosus?

PubMed

To, C H; Petri, M

2005-12-01

To examine autoantibody clusters and their associations with clinical features and organ damage accrual in patients with systemic lupus erythematosus (SLE). The study group comprised 1,357 consecutive patients with SLE who were recruited to participate in a prospective longitudinal cohort study. In the cohort, 92.6% of the patients were women, the mean +/- SD age of the patients was 41.3 +/- 12.7 years, 55.9% were Caucasian, 39.1% were African American, and 5% were Asian. Seven autoantibodies (anti-double-stranded DNA [anti-dsDNA], anti-Sm, anti-Ro, anti-La, anti-RNP, lupus anticoagulant (LAC), and anticardiolipin antibody [aCL]) were selected for cluster analysis using the K-means cluster analysis procedure. Three distinct autoantibody clusters were identified: cluster 1 (anti-Sm and anti-RNP), cluster 2 (anti-dsDNA, anti-Ro, and anti-La), and cluster 3 (anti-dsDNA, LAC, and aCL). Patients in cluster 1 (n = 451), when compared with patients in clusters 2 (n = 470) and 3 (n = 436), had the lowest incidence of proteinuria (39.7%), anemia (52.8%), lymphopenia (33.9%), and thrombocytopenia (13.7%). The incidence of nephrotic syndrome and leukopenia was also lower in cluster 1 than in cluster 2. Cluster 2 had the highest female-to-male ratio (22:1) and the greatest proportion of Asian patients. Among the 3 clusters, cluster 2 had significantly more patients presenting with secondary Sjögren's syndrome (15.7%). Cluster 3, when compared with the other 2 clusters, consisted of more Caucasian and fewer African American patients and was characterized by the highest incidence of arterial thrombosis (17.4%), venous thrombosis (25.7%), and livedo reticularis (31.4%). 
By using the Systemic Lupus International Collaborating Clinics/American College of Rheumatology Damage Index, the greatest frequency of nephrotic syndrome (8.9%) was observed in patients in cluster 2, whereas cluster 3 patients had the highest percentage of damage due to cerebrovascular accident (12.8%) and venous thrombosis (7.8%). Osteoporotic fracture (11.9%) was also more common in cluster 3 than in cluster 2. Autoantibody clustering is a valuable tool to differentiate between various subsets of SLE, allowing prediction of subsequent clinical course and organ damage.
Adaptive clustering procedure for continuous gravitational wave searches

NASA Astrophysics Data System (ADS)

Singh, Avneet; Papa, Maria Alessandra; Eggenstein, Heinz-Bernd; Walsh, Sinéad

2017-10-01

In hierarchical searches for continuous gravitational waves, clustering of candidates is an important post-processing step because it reduces the number of noise candidates that are followed up at successive stages [J. Aasi et al., Phys. Rev. D 88, 102002 (2013), 10.1103/PhysRevD.88.102002; B. Behnke, M. A. Papa, and R. Prix, Phys. Rev. D 91, 064007 (2015), 10.1103/PhysRevD.91.064007; M. A. Papa et al., Phys. Rev. D 94, 122006 (2016), 10.1103/PhysRevD.94.122006]. Previous clustering procedures bundled together nearby candidates ascribing them to the same root cause (be it a signal or a disturbance), based on a predefined cluster volume. In this paper, we present a procedure that adapts the cluster volume to the data itself and checks for consistency of such volume with what is expected from a signal. This significantly improves the noise rejection capabilities at fixed detection threshold, and at fixed computing resources for the follow-up stages, this results in an overall more sensitive search. This new procedure was employed in the first Einstein@Home search on data from the first science run of the advanced LIGO detectors (O1) [LIGO Scientific Collaboration and Virgo Collaboration, arXiv:1707.02669 [Phys. Rev. D (to be published)]].
Multi-Sample Cluster Analysis Using Akaike’s Information Criterion.

DTIC Science & Technology

1982-12-20

Intervals. For more details on these test procedures refer to Gabriel [7], Krishnaiah ([10], [11]), Srivastava [16], and others. As noted in Consul...723. [4] Consul, P. C. (1969), "The Exact Distributions of Likelihood Criteria for Different Hypotheses," in P. R. Krishnaiah (Ed.), Multivariate...1178. [7] Gabriel, K. R. (1969), "A Comparison of Some Methods of Simultaneous Inference in MANOVA," in P. R. Krishnaiah (Ed.), Multivariate Analysis-II
Biclustering of gene expression data using reactive greedy randomized adaptive search procedure.

PubMed

Dharan, Smitha; Nair, Achuthsankar S

2009-01-30

Biclustering algorithms belong to a distinct class of clustering algorithms that perform simultaneous clustering of both rows and columns of the gene expression matrix and can be a very useful analysis tool when some genes have multiple functions and experimental conditions are diverse. Cheng and Church introduced a measure called the mean squared residue score to evaluate the quality of a bicluster, and it has become one of the most popular measures used to search for biclusters. In this paper, we review the basic concepts of the metaheuristic Greedy Randomized Adaptive Search Procedure (GRASP), namely its construction and local search phases, and propose a new method, a variant of GRASP called Reactive Greedy Randomized Adaptive Search Procedure (Reactive GRASP), to detect significant biclusters from large microarray datasets. The method has two major steps. First, high quality bicluster seeds are generated by means of k-means clustering. In the second step, these seeds are grown using the Reactive GRASP, in which the basic parameter that defines the restrictiveness of the candidate list is self-adjusted, depending on the quality of the solutions found previously. We performed statistical and biological validations of the biclusters obtained and evaluated the method against the results of basic GRASP as well as the classic work of Cheng and Church. The experimental results indicate that the Reactive GRASP approach outperforms the basic GRASP algorithm and the Cheng and Church approach. The Reactive GRASP approach for the detection of significant biclusters is robust and does not require calibration efforts.
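The mean squared residue score mentioned in this abstract can be illustrated with a short sketch. The function below follows the usual statement of the Cheng and Church score (each entry minus its row mean and column mean, plus the overall mean, squared and averaged over the bicluster); the function name and the toy expression matrix are assumptions made for illustration, not material from the paper.

```python
def mean_squared_residue(matrix, rows, cols):
    """Mean squared residue H(I, J) of the bicluster given by rows I and columns J."""
    sub = [[matrix[i][j] for j in cols] for i in rows]
    n_rows, n_cols = len(rows), len(cols)
    row_means = [sum(r) / n_cols for r in sub]
    col_means = [sum(sub[i][j] for i in range(n_rows)) / n_rows
                 for j in range(n_cols)]
    overall = sum(row_means) / n_rows
    total = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            # Residue: what is left after removing row, column and overall effects.
            residue = sub[i][j] - row_means[i] - col_means[j] + overall
            total += residue ** 2
    return total / (n_rows * n_cols)


# Hypothetical expression matrix: the first three columns are additive shifts
# of one another across rows, the fourth column breaks that pattern.
expression = [
    [1.0, 2.0, 3.0, 9.0],
    [2.0, 3.0, 4.0, 1.0],
    [4.0, 5.0, 6.0, 7.0],
]
additive_score = mean_squared_residue(expression, [0, 1, 2], [0, 1, 2])
full_score = mean_squared_residue(expression, [0, 1, 2], [0, 1, 2, 3])
```

A perfectly additive bicluster scores (numerically) zero, while including the inconsistent fourth column raises the score, which is why a low residue is read as a coherent bicluster.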
New insights into old methods for identifying causal rare variants.

PubMed

Wang, Haitian; Huang, Chien-Hsun; Lo, Shaw-Hwa; Zheng, Tian; Hu, Inchi

2011-11-29

The advance of high-throughput next-generation sequencing technology makes possible the analysis of rare variants. However, the investigation of rare variants in unrelated-individuals data sets faces the challenge of low power, and most methods circumvent the difficulty by using various collapsing procedures based on genes, pathways, or gene clusters. We suggest a new way to identify causal rare variants using the F-statistic and sliced inverse regression. The procedure is tested on the data set provided by the Genetic Analysis Workshop 17 (GAW17). After preliminary data reduction, we ranked markers according to their F-statistic values. Top-ranked markers were then subjected to sliced inverse regression, and those with higher absolute coefficients in the most significant sliced inverse regression direction were selected. The procedure yields good false discovery rates for the GAW17 data and thus is a promising method for future study on rare variants.
Microforms in gravel bed rivers: Formation, disintegration, and effects on bedload transport

USGS Publications Warehouse

Strom, K.; Papanicolaou, A.N.; Evangelopoulos, N.; Odeh, M.

2004-01-01

This research aims to advance current knowledge on cluster formation and evolution by tackling some of the aspects associated with cluster microtopography and the effects of clusters on bedload transport. The specific objectives of the study are (1) to identify the bed shear stress range in which clusters form and disintegrate, (2) to quantitatively describe the spacing characteristics and orientation of clusters with respect to flow characteristics, (3) to quantify the effects clusters have on the mean bedload rate, and (4) to assess the effects of clusters on the pulsating nature of bedload. In order to meet the objectives of this study, two main experimental scenarios, namely, Test Series A and B (20 experiments overall) are considered in a laboratory flume under well-controlled conditions. Series A tests are performed to address objectives (1) and (2) while Series B is designed to meet objectives (3) and (4). Results show that cluster microforms develop in uniform sediment at 1.25 to 2 times the Shields parameter of an individual particle and start disintegrating at about 2.25 times the Shields parameter. It is found that during an unsteady flow event, effects of clusters on bedload transport rate can be classified in three different phases: a sink phase where clusters absorb incoming sediment, a neutral phase where clusters do not affect bedload, and a source phase where clusters release particles. Clusters also increase the magnitude of the fluctuations in bedload transport rate, showing that clusters amplify the unsteady nature of bedload transport. A fourth-order autoregressive, autoregressive integrated moving average model is employed to describe the time series of bedload and provide a predictive formula for predicting bedload at different periods. 
Finally, a change-point analysis enhanced with a binary segmentation procedure is performed to identify the abrupt changes in the bedload statistic characteristics due to the effects of clusters and detect the different phases in bedload time series using probability theory. The analysis verifies the experimental findings that three phases are detected in the bedload rate time series structure, namely, sink, neutral, and source. © ASCE / JUNE 2004.
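As a rough illustration of binary segmentation for change-point detection, the sketch below recursively splits a series wherever a split most reduces the within-segment sum of squared errors. The squared-error criterion, the `min_gain` threshold and the toy series are assumptions chosen for brevity; the abstract does not state which test statistic the authors used.

```python
def sse(values):
    """Sum of squared deviations from the mean of a segment."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(values, min_size):
    """Return (gain, index) of the split that most reduces the SSE, or None."""
    base = sse(values)
    best = None
    for k in range(min_size, len(values) - min_size + 1):
        gain = base - sse(values[:k]) - sse(values[k:])
        if best is None or gain > best[0]:
            best = (gain, k)
    return best


def binary_segmentation(values, min_gain, min_size=2, offset=0):
    """Split recursively while the best split reduces the SSE by at least min_gain."""
    found = best_split(values, min_size)
    if found is None or found[0] < min_gain:
        return []
    _, k = found
    left = binary_segmentation(values[:k], min_gain, min_size, offset)
    right = binary_segmentation(values[k:], min_gain, min_size, offset + k)
    return left + [offset + k] + right


# Hypothetical bedload-rate series with three plateaus (low, medium, high).
series = [1.0, 1.2, 0.9, 1.1, 5.0, 5.2, 4.9, 5.1, 9.0, 9.1, 8.9, 9.0]
change_points = binary_segmentation(series, min_gain=1.0)
```

Each returned index marks the first sample of a new segment, so three phases in a series would appear as two change points.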
Clustering and group selection of multiple criteria alternatives with application to space-based networks.

PubMed

Malakooti, Behnam; Yang, Ziyong

2004-02-01

In many real-world problems, the range of consequences of different alternatives is considerably different. In addition, sometimes, selection of a group of alternatives (instead of only one best alternative) is necessary. Traditional decision making approaches treat the set of alternatives with the same method of analysis and selection. In this paper, we propose clustering alternatives into different groups so that different methods of analysis, selection, and implementation for each group can be applied. As an example, consider the selection of a group of functions (or tasks) to be processed by a group of processors. The set of tasks can be grouped according to their similar criteria, and hence, each cluster of tasks can be processed by a processor. The selection of the best alternative for each clustered group can be performed using existing methods; however, the process of selecting groups is different from the process of selecting alternatives within a group. We develop theories and procedures for clustering discrete multiple criteria alternatives. We also demonstrate how the set of alternatives is clustered into mutually exclusive groups based on 1) similar features among alternatives; 2) ideal (or most representative) alternatives given by the decision maker; and 3) other preferential information of the decision maker. The clustering of multiple criteria alternatives also has the following advantages. 1) It decreases the set of alternatives to be considered by the decision maker (for example, different decision makers are assigned to different groups of alternatives). 2) It decreases the number of criteria. 3) It may provide a different approach for analyzing problems with multiple decision makers. Each decision maker may cluster alternatives differently, and hence, clustering of alternatives may provide a basis for negotiation. 
The developed approach is applicable for solving a class of telecommunication network problems where a set of objects (such as routers, processors, or intelligent autonomous vehicles) are to be clustered into similar groups. Objects are clustered based on several criteria and the decision maker's preferences.
The Wilcoxon signed rank test for paired comparisons of clustered data.

PubMed

Rosner, Bernard; Glynn, Robert J; Lee, Mei-Ling T

2006-03-01

The Wilcoxon signed rank test is a frequently used nonparametric test for paired data (e.g., consisting of pre- and posttreatment measurements) based on independent units of analysis. This test cannot be used for paired comparisons arising from clustered data (e.g., if paired comparisons are available for each of two eyes of an individual). To incorporate clustering, a generalization of the randomization test formulation for the signed rank test is proposed, where the unit of randomization is at the cluster level (e.g., person), while the individual paired units of analysis are at the subunit within cluster level (e.g., eye within person). An adjusted variance estimate of the signed rank test statistic is then derived, which can be used for either balanced (same number of subunits per cluster) or unbalanced (different number of subunits per cluster) data, with an exchangeable correlation structure, with or without tied values. The resulting test statistic is shown to be asymptotically normal as the number of clusters becomes large, if the cluster size is bounded. Simulation studies are performed based on simulating correlated ranked data from a signed log-normal distribution. These studies indicate appropriate type I error for data sets with ≥20 clusters and a superior power profile compared with either the ordinary signed rank test based on the average cluster difference score or the multivariate signed rank test of Puri and Sen. Finally, the methods are illustrated with two data sets, (i) an ophthalmologic data set involving a comparison of electroretinogram (ERG) data in retinitis pigmentosa (RP) patients before and after undergoing an experimental surgical procedure, and (ii) a nutritional data set based on a randomized prospective study of nutritional supplements in RP patients where vitamin E intake outside of study capsules is compared before and after randomization to monitor compliance with nutritional protocols.
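For orientation, the simpler comparator named in this abstract, the ordinary signed rank test applied to the average difference score of each cluster, can be sketched as follows. This is not the adjusted-variance statistic the authors derive; the function names, the example data and the omission of a tie correction in the variance are all simplifying assumptions.

```python
import math


def midranks(values):
    """Rank values ascending (1-based), giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = average
        start = end + 1
    return ranks


def cluster_mean_signed_rank(clusters):
    """Signed rank statistic W+ and normal-approximation z on per-cluster mean differences."""
    means = [sum(d) / len(d) for d in clusters if d]
    means = [m for m in means if m != 0]          # zero differences are dropped
    n = len(means)
    if n == 0:
        raise ValueError("no non-zero cluster differences")
    ranks = midranks([abs(m) for m in means])
    w_plus = sum(r for r, m in zip(ranks, means) if m > 0)
    expected = n * (n + 1) / 4
    variance = n * (n + 1) * (2 * n + 1) / 24     # no tie correction in this sketch
    return w_plus, (w_plus - expected) / math.sqrt(variance)


# Hypothetical paired differences (after minus before), one list per person,
# one entry per eye; a person may contribute one or two eyes.
differences = [[1.0, 3.0], [-1.0], [2.5, 3.5], [0.5, 1.5], [-4.0, -6.0]]
w_plus, z = cluster_mean_signed_rank(differences)
```

Averaging within each person restores independent units of analysis, at the cost of discarding within-cluster information, which is the loss of power the abstract reports the proposed test avoids.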
Training Effectiveness and Cost Iterative Technique (TECIT). Volume 2. Cost Effectiveness Analysis

DTIC Science & Technology

1988-07-01

Moving Tank in a Field Exercise. The task cluster identified as tank commander’s station/tank gunnery and the sub-task of firing an M250 grenade launcher...Firing Procedures, Task Number 171-126-1028. OBJECTIVE: Given an M1 tank with crew, loaded M250 grenade launcher, the commander’s station powered up
Globular and Open Clusters Observed by SDSS/SEGUE: the Giant Stars

DOE Office of Scientific and Technical Information (OSTI.GOV)

Morrison, Heather L.; Ma, Zhibo; Clem, James L.

We present griz observations for the clusters M92, M13 and NGC 6791 and gr photometry for M71, Be 29 and NGC 7789. In addition we present new membership identifications for all these clusters, which have been observed spectroscopically as calibrators for the SDSS/SEGUE survey; this paper focuses in particular on the red giant branch stars in the clusters. In a number of cases, these giants were too bright to be observed in the normal SDSS survey operations, and we describe the procedure used to obtain spectra for these stars. For M71, we also present a new variable reddening map and a new fiducial for the gr giant branch. For NGC 7789, we derived a transformation from Teff to g-r for giants of near solar abundance, using IRFM Teff measures of stars with good ugriz and 2MASS photometry and SEGUE spectra. The result of our analysis is a robust list of known cluster members with correctly dereddened and (if needed) transformed gr photometry for crucial calibration efforts for SDSS and SEGUE.
Globular and Open Clusters Observed by SDSS/SEGUE: The Giant Stars

NASA Astrophysics Data System (ADS)

Morrison, Heather L.; Ma, Zhibo; Clem, James L.; An, Deokkeun; Connor, Thomas; Schechtman-Rook, Andrew; Casagrande, Luca; Rockosi, Constance; Yanny, Brian; Harding, Paul; Beers, Timothy C.; Johnson, Jennifer A.; Schneider, Donald P.

2016-01-01

We present griz observations for the clusters M92, M13 and NGC 6791 and gr photometry for M71, Be 29 and NGC 7789. In addition we present new membership identifications for all these clusters, which have been observed spectroscopically as calibrators for the Sloan Digital Sky Survey (SDSS)/SEGUE survey; this paper focuses in particular on the red giant branch stars in the clusters. In a number of cases, these giants were too bright to be observed in the normal SDSS survey operations, and we describe the procedure used to obtain spectra for these stars. For M71, we also present a new variable reddening map and a new fiducial for the gr giant branch. For NGC 7789, we derived a transformation from Teff to g-r for giants of near solar abundance, using IRFM Teff measures of stars with good ugriz and 2MASS photometry and SEGUE spectra. The result of our analysis is a robust list of known cluster members with correctly dereddened and (if needed) transformed gr photometry for crucial calibration efforts for SDSS and SEGUE.
Globular and Open Clusters Observed by SDSS/SEGUE: the Giant Stars

DOE PAGES

Morrison, Heather L.; Ma, Zhibo; Clem, James L.; ...

2015-12-18

We present griz observations for the clusters M92, M13 and NGC 6791 and gr photometry for M71, Be 29 and NGC 7789. In addition we present new membership identifications for all these clusters, which have been observed spectroscopically as calibrators for the SDSS/SEGUE survey; this paper focuses in particular on the red giant branch stars in the clusters. In a number of cases, these giants were too bright to be observed in the normal SDSS survey operations, and we describe the procedure used to obtain spectra for these stars. For M71, we also present a new variable reddening map and a new fiducial for the gr giant branch. For NGC 7789, we derived a transformation from Teff to g-r for giants of near solar abundance, using IRFM Teff measures of stars with good ugriz and 2MASS photometry and SEGUE spectra. The result of our analysis is a robust list of known cluster members with correctly dereddened and (if needed) transformed gr photometry for crucial calibration efforts for SDSS and SEGUE.

Motion estimation using point cluster method and Kalman filter.

PubMed

Senesh, M; Wolf, A

2009-05-01

The most frequently used method in three-dimensional human gait analysis involves placing markers on the skin of the analyzed segment. This introduces a significant artifact, which strongly influences the bone position and orientation and joint kinematic estimates. In this study, we tested and evaluated the effect of adding a Kalman filter procedure to the previously reported point cluster technique (PCT) in the estimation of a rigid body motion. We demonstrated the procedures by motion analysis of a compound planar pendulum from indirect opto-electronic measurements of markers attached to an elastic appendage that is restrained to slide along the rigid body long axis. The elastic frequency is close to the pendulum frequency, as in the biomechanical problem, where the soft tissue frequency content is similar to the actual movement of the bones. Comparison of the real pendulum angle to that obtained by several estimation procedures--PCT, Kalman filter followed by PCT, and low pass filter followed by PCT--enables evaluation of the accuracy of the procedures. When comparing the maximal amplitude, no effect was noted by adding the Kalman filter; however, a closer look at the signal revealed that the estimated angle based only on the PCT method was very noisy with fluctuation, while the estimated angle based on the Kalman filter followed by the PCT was a smooth signal. It was also noted that the instantaneous frequencies obtained from the estimated angle based on the PCT method are more dispersed than those obtained from the estimated angle based on the Kalman filter followed by the PCT method. Addition of a Kalman filter to the PCT method in the estimation procedure of rigid body motion results in a smoother signal that better represents the real motion, with less signal distortion than when using a digital low pass filter. 
Furthermore, it can be concluded that adding a Kalman filter to the PCT procedure substantially reduces the dispersion of the maximal and minimal instantaneous frequencies.
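To give a feel for why a Kalman filter yields a smoother angle estimate, here is a minimal scalar sketch. It assumes a random-walk state model and hand-picked noise variances; the abstract does not describe the authors' state model or tuning, so `process_var`, `measurement_var` and the sample data are purely illustrative.

```python
def kalman_smooth(measurements, process_var, measurement_var):
    """Scalar Kalman filter with a random-walk state model (illustrative only)."""
    estimate = measurements[0]
    error_var = measurement_var
    filtered = [estimate]
    for z in measurements[1:]:
        # Predict: the state is assumed unchanged, so only the uncertainty grows.
        error_var += process_var
        # Update: blend the prediction and the new measurement via the Kalman gain.
        gain = error_var / (error_var + measurement_var)
        estimate += gain * (z - estimate)
        error_var *= (1.0 - gain)
        filtered.append(estimate)
    return filtered


# Hypothetical noisy angle readings (radians) fluctuating around 1.0.
raw_angles = [1.2, 0.8, 1.2, 0.8, 1.2, 0.8, 1.2, 0.8]
smoothed = kalman_smooth(raw_angles, process_var=0.001, measurement_var=0.04)
raw_spread = max(raw_angles) - min(raw_angles)
smoothed_spread = max(smoothed) - min(smoothed)
```

The filtered series fluctuates over a narrower range than the raw readings, which mirrors the qualitative finding that the filtered estimate is a smoother signal.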
Adding Dimensions to the Analysis of the Quality of Health Information of Websites Returned by Google: Cluster Analysis Identifies Patterns of Websites According to their Classification and the Type of Intervention Described.

PubMed

Yaqub, Mubashar; Ghezzi, Pietro

2015-01-01

Most of the instruments used to assess the quality of health information on the Web (e.g., the JAMA criteria) only analyze one dimension of information quality (IQ), trustworthiness. In this study, we analyzed the type of intervention that websites describe, whether supported by evidence-based medicine (EBM) or not, to provide a further dimension of IQ, accuracy, and correlated this with the established criteria. We searched Google for "migraine cure" and analyzed the first 200 websites for: (1) JAMA criteria (authorship, attribution, disclosure, currency); (2) class of websites (commercial, health portals, professional, patient groups, no-profit); and (3) type of intervention described (approved drugs, alternative medicine, food, procedures, lifestyle, drugs still at the research stage). We used hierarchical cluster analysis to identify different patterns of websites according to their classification and the information provided. Subgroup analysis on the first 10 websites returned was performed. Google returned health portals (44%), followed by commercial websites (31%) and journalism websites (11%). The type of intervention mentioned most often was alternative medicine (55%), followed by procedures (49%), lifestyle (42%), food (41%), and approved drugs (35%). Cluster analysis indicated that health portals are more likely to describe more than one type of treatment while commercial websites most often describe only one. The average JAMA score of commercial websites was significantly lower than for health portals or journalism websites, and this was mainly due to lack of information on the authors of the text and indication of the date the information was written. Looking at the first 10 websites from Google, commercial websites are underrepresented and approved drugs overrepresented. 
Analyzing the type of therapies/prevention methods provides additional information to the trustworthiness measures, such as the JAMA score, and could be a convenient and objective indicator of websites whose information is based on EBM.
Evaluation of the procedure 1A component of the 1980 US/Canada wheat and barley exploratory experiment

NASA Technical Reports Server (NTRS)

Chapman, G. M. (Principal Investigator); Carnes, J. G.

1981-01-01

Several techniques which use clusters generated by a new clustering algorithm, CLASSY, are proposed as alternatives to random sampling to obtain greater precision in crop proportion estimation: (1) Proportional Allocation/Relative Count Estimator (PA/RCE) uses proportional allocation of dots to clusters on the basis of cluster size and a relative count cluster-level estimate; (2) Proportional Allocation/Bayes Estimator (PA/BE) uses proportional allocation of dots to clusters and a Bayesian cluster-level estimate; and (3) Bayes Sequential Allocation/Bayesian Estimator (BSA/BE) uses sequential allocation of dots to clusters and a Bayesian cluster-level estimate. Clustering is an effective method for making proportion estimates. It is estimated that, to obtain the same precision with random sampling as obtained by the proportional sampling of 50 dots with an unbiased estimator, samples of 85 or 166 would need to be taken if dot sets with AI labels (integrated procedure) or ground truth labels, respectively, were input. Dot reallocation provides dot sets that are unbiased. It is recommended that these proportion estimation techniques be maintained, particularly the PA/BE because it provides the greatest precision.
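One plausible reading of the PA/RCE idea can be sketched as follows: allocate sample dots to clusters in proportion to cluster size, then combine the per-cluster fraction of dots labeled as the crop, weighted by cluster size. The report's exact definitions are not given in the abstract, so both functions, the largest-remainder rounding rule and the numbers below are assumptions for illustration.

```python
def proportional_allocation(cluster_sizes, total_dots):
    """Split total_dots across clusters in proportion to size (largest remainder)."""
    total = sum(cluster_sizes)
    quotas = [total_dots * s / total for s in cluster_sizes]
    counts = [int(q) for q in quotas]
    leftover = total_dots - sum(counts)
    # Hand the remaining dots to the clusters with the largest fractional parts.
    by_remainder = sorted(range(len(quotas)),
                          key=lambda i: quotas[i] - counts[i], reverse=True)
    for i in by_remainder[:leftover]:
        counts[i] += 1
    return counts


def relative_count_estimate(cluster_sizes, labeled_dots):
    """Size-weighted mean of the per-cluster fraction of dots labeled as the crop."""
    total = sum(cluster_sizes)
    estimate = 0.0
    for size, labels in zip(cluster_sizes, labeled_dots):
        if labels:  # clusters that received no dots contribute nothing
            estimate += (size / total) * (sum(labels) / len(labels))
    return estimate


# Hypothetical scene: three clusters (pixel counts) and 50 dots to allocate.
sizes = [500, 300, 200]
allocation = proportional_allocation(sizes, 50)
# Hypothetical labels per allocated dot: 1 = crop of interest, 0 = other.
labels = [[1] * 20 + [0] * 5, [1] * 3 + [0] * 12, [0] * 10]
crop_proportion = relative_count_estimate(sizes, labels)
```

Because each cluster is sampled in proportion to its size, homogeneous clusters reduce the variance of the overall proportion estimate compared with drawing the same number of dots at random.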
Clusters of cultures: diversity in meaning of family value and gender role items across Europe.

PubMed

van Vlimmeren, Eva; Moors, Guy B D; Gelissen, John P T M

2017-01-01

Survey data are often used to map cultural diversity by aggregating scores of attitude and value items across countries. However, this procedure only makes sense if the same concept is measured in all countries. In this study we argue that when (co)variances among sets of items are similar across countries, these countries share a common way of assigning meaning to the items. Clusters of cultures can then be observed by doing a cluster analysis on the (co)variance matrices of sets of related items. This study focuses on family values and gender role attitudes. We find four clusters of cultures that assign a distinct meaning to these items, especially in the case of gender roles. Some of these differences reflect response style behavior in the form of acquiescence. Adjusting for this style effect impacts on country comparisons hence demonstrating the usefulness of investigating the patterns of meaning given to sets of items prior to aggregating scores into cultural characteristics.
Cluster Analysis of Weighted Bipartite Networks: A New Copula-Based Approach

PubMed Central

Chessa, Alessandro; Crimaldi, Irene; Riccaboni, Massimo; Trapin, Luca

2014-01-01

In this work we are interested in identifying clusters of “positional equivalent” actors, i.e. actors who play a similar role in a system. In particular, we analyze weighted bipartite networks that describe the relationships between actors on one side and features or traits on the other, together with the intensity level to which actors show their features. We develop a methodological approach that takes into account the underlying multivariate dependence among groups of actors. The idea is that positions in a network could be defined on the basis of the similar intensity levels that the actors exhibit in expressing some features, instead of just considering relationships that actors hold with each other. Moreover, we propose a new clustering procedure that exploits the potential of copula functions, a mathematical instrument for modelling the stochastic dependence structure. Our clustering algorithm can be applied both to binary and real-valued matrices. We validate it with simulations and applications to real-world data. PMID:25303095
Optimized data fusion for K-means Laplacian clustering

PubMed Central

Yu, Shi; Liu, Xinhai; Tranchevent, Léon-Charles; Glänzel, Wolfgang; Suykens, Johan A. K.; De Moor, Bart; Moreau, Yves

2011-01-01

Motivation: We propose a novel algorithm to combine multiple kernels and Laplacians for clustering analysis. The new algorithm is formulated on a Rayleigh quotient objective function and is solved as a bi-level alternating minimization procedure. Using the proposed algorithm, the coefficients of kernels and Laplacians can be optimized automatically. Results: Three variants of the algorithm are proposed. The performance is systematically validated on two real-life data fusion applications. The proposed Optimized Kernel Laplacian Clustering (OKLC) algorithms perform significantly better than other methods. Moreover, the coefficients of kernels and Laplacians optimized by OKLC show some correlation with the performance ranking of the individual data sources. Though in our evaluation the K values are predefined, in practical studies, the optimal cluster number can be consistently estimated from the eigenspectrum of the combined kernel Laplacian matrix. Availability: The MATLAB code of algorithms implemented in this paper is downloadable from http://homes.esat.kuleuven.be/~sistawww/bioi/syu/oklc.html. Contact: shiyu@uchicago.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:20980271
A Spatial Division Clustering Method and Low Dimensional Feature Extraction Technique Based Indoor Positioning System

PubMed Central

Mo, Yun; Zhang, Zhongzhao; Meng, Weixiao; Ma, Lin; Wang, Yao

2014-01-01

Indoor positioning systems based on the fingerprint method are widely used due to the large number of existing devices with a wide range of coverage. However, extensive positioning regions with a massive fingerprint database may cause high computational complexity and large error margins; clustering methods are therefore widely applied as a solution. Yet traditional clustering methods in positioning systems can only measure the similarity of the Received Signal Strength, without regard for the continuity of physical coordinates. Moreover, outage of access points could result in asymmetric matching problems that severely affect the fine positioning procedure. To solve these issues, in this paper we propose a positioning system based on the Spatial Division Clustering (SDC) method for clustering the fingerprint dataset subject to physical distance constraints. With the Genetic Algorithm and Support Vector Machine techniques, SDC can achieve higher coarse positioning accuracy than traditional clustering algorithms. In terms of fine localization, based on the Kernel Principal Component Analysis method, the proposed positioning system outperforms its counterparts based on other feature extraction methods in low dimensionality. Apart from balancing the online matching computational burden, the new positioning system exhibits advantageous performance on radio map clustering, and also shows better robustness and adaptability with respect to the asymmetric matching problem. PMID:24451470
A cluster expansion model for predicting activation barrier of atomic processes

DOE Office of Scientific and Technical Information (OSTI.GOV)

Rehman, Tafizur; Jaipal, M.; Chatterjee, Abhijit, E-mail: achatter@iitk.ac.in

2013-06-15

We introduce a procedure based on cluster expansion models for predicting the activation barrier of atomic processes encountered while studying the dynamics of a material system using the kinetic Monte Carlo (KMC) method. Starting with an interatomic potential description, a mathematical derivation is presented to show that the local environment dependence of the activation barrier can be captured using cluster interaction models. Next, we develop a systematic procedure for training the cluster interaction model on-the-fly, which involves: (i) obtaining activation barriers for a handful of local environments using nudged elastic band (NEB) calculations, (ii) identifying the local environment by analyzing the NEB results, and (iii) estimating the cluster interaction model parameters from the activation barrier data. Once a cluster expansion model has been trained, it is used to predict activation barriers without requiring any additional NEB calculations. Numerical studies are performed to validate the cluster expansion model by studying hop processes in Ag/Ag(100). We show that the use of a cluster expansion model with KMC enables efficient generation of an accurate process rate catalog.
Multiple receptor conformation docking, dock pose clustering and 3D QSAR studies on human poly(ADP-ribose) polymerase-1 (PARP-1) inhibitors.

PubMed

Fatima, Sabiha; Jatavath, Mohan Babu; Bathini, Raju; Sivan, Sree Kanth; Manga, Vijjulatha

2014-10-01

Poly(ADP-ribose) polymerase-1 (PARP-1) functions as a DNA damage sensor and signaling molecule. It plays a vital role in the repair of DNA strand breaks induced by radiation and chemotherapeutic drugs; inhibitors of this enzyme have the potential to improve cancer chemotherapy or radiotherapy. Three-dimensional quantitative structure activity relationship (3D QSAR) models were developed using comparative molecular field analysis, comparative molecular similarity indices analysis and docking studies. A set of 88 molecules were docked into the active site of six X-ray crystal structures of poly(ADP-ribose)polymerase-1 (PARP-1), by a procedure called multiple receptor conformation docking (MRCD), in order to improve the 3D QSAR models through the analysis of binding conformations. The docked poses were clustered to obtain the best receptor binding conformation. These dock poses from clustering were used for 3D QSAR analysis. Based on MRCD and QSAR information, some key features have been identified that explain the observed variance in the activity. Two receptor-based QSAR models were generated; these models showed good internal and external statistical reliability that is evident from the [Formula: see text], [Formula: see text] and [Formula: see text]. The identified key features enabled us to design new PARP-1 inhibitors.
Gene expression pattern recognition algorithm inferences to classify samples exposed to chemical agents

NASA Astrophysics Data System (ADS)

Bushel, Pierre R.; Bennett, Lee; Hamadeh, Hisham; Green, James; Ableson, Alan; Misener, Steve; Paules, Richard; Afshari, Cynthia

2002-06-01

We present an analysis of pattern recognition procedures used to predict the classes of samples exposed to pharmacologic agents by comparing gene expression patterns from samples treated with two classes of compounds. Rat liver mRNA samples following exposure for 24 hours with phenobarbital or peroxisome proliferators were analyzed using a 1700 rat cDNA microarray platform. Sets of genes that were consistently differentially expressed in the rat liver samples following treatment were stored in the MicroArray Project System (MAPS) database. MAPS identified 238 genes in common that possessed a low probability (P < 0.01) of being randomly detected as differentially expressed at the 95% confidence level. Hierarchical cluster analysis on the 238 genes clustered specific gene expression profiles that separated samples based on exposure to a particular class of compound.
Unsupervised pattern recognition methods in ciders profiling based on GCE voltammetric signals.

PubMed

Jakubowska, Małgorzata; Sordoń, Wanda; Ciepiela, Filip

2016-07-15

This work presents a complete methodology for distinguishing between different brands of cider and degrees of ageing, based on voltammetric signals, utilizing dedicated data preprocessing procedures and unsupervised multivariate analysis. It was demonstrated that voltammograms recorded on a glassy carbon electrode in Britton-Robinson buffer at pH 2 are reproducible for each brand. By applying clustering algorithms and principal component analysis, visibly homogeneous clusters were obtained. An advanced signal processing strategy, which included automatic baseline correction, interval scaling and continuous wavelet transform with a dedicated mother wavelet, was a key step in the correct recognition of the objects. The results show that voltammetry combined with optimized univariate and multivariate data processing is a sufficient tool to distinguish between ciders from various brands and to evaluate their freshness. Copyright © 2016 Elsevier Ltd. All rights reserved.
Forensic analysis of Salvia divinorum using multivariate statistical procedures. Part I: discrimination from related Salvia species.

PubMed

Willard, Melissa A Bodnar; McGuffin, Victoria L; Smith, Ruth Waddell

2012-01-01

Salvia divinorum is a hallucinogenic herb that is internationally regulated. In this study, salvinorin A, the active compound in S. divinorum, was extracted from S. divinorum plant leaves using a 5-min extraction with dichloromethane. Four additional Salvia species (Salvia officinalis, Salvia guaranitica, Salvia splendens, and Salvia nemorosa) were extracted using this procedure, and all extracts were analyzed by gas chromatography-mass spectrometry. Differentiation of S. divinorum from other Salvia species was successful based on visual assessment of the resulting chromatograms. To provide a more objective comparison, the total ion chromatograms (TICs) were subjected to principal components analysis (PCA). Prior to PCA, the TICs were subjected to a series of data pretreatment procedures to minimize non-chemical sources of variance in the data set. Successful discrimination of S. divinorum from the other four Salvia species was possible based on visual assessment of the PCA scores plot. To provide a numerical assessment of the discrimination, a series of statistical procedures such as Euclidean distance measurement, hierarchical cluster analysis, Student's t tests, Wilcoxon rank-sum tests, and Pearson product moment correlation were also applied to the PCA scores. The statistical procedures were then compared to determine the advantages and disadvantages for forensic applications.
antiSMASH 3.0-a comprehensive resource for the genome mining of biosynthetic gene clusters.

PubMed

Weber, Tilmann; Blin, Kai; Duddela, Srikanth; Krug, Daniel; Kim, Hyun Uk; Bruccoleri, Robert; Lee, Sang Yup; Fischbach, Michael A; Müller, Rolf; Wohlleben, Wolfgang; Breitling, Rainer; Takano, Eriko; Medema, Marnix H

2015-07-01

Microbial secondary metabolism constitutes a rich source of antibiotics, chemotherapeutics, insecticides and other high-value chemicals. Genome mining of gene clusters that encode the biosynthetic pathways for these metabolites has become a key methodology for novel compound discovery. In 2011, we introduced antiSMASH, a web server and stand-alone tool for the automatic genomic identification and analysis of biosynthetic gene clusters, available at http://antismash.secondarymetabolites.org. Here, we present version 3.0 of antiSMASH, which has undergone major improvements. A full integration of the recently published ClusterFinder algorithm now allows using this probabilistic algorithm to detect putative gene clusters of unknown types. Also, a new dereplication variant of the ClusterBlast module now identifies similarities of identified clusters to any of 1172 clusters with known end products. At the enzyme level, active sites of key biosynthetic enzymes are now pinpointed through a curated pattern-matching procedure and Enzyme Commission numbers are assigned to functionally classify all enzyme-coding genes. Additionally, chemical structure prediction has been improved by incorporating polyketide reduction states. Finally, in order for users to be able to organize and analyze multiple antiSMASH outputs in a private setting, a new XML output module allows offline editing of antiSMASH annotations within the Geneious software. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
A strategy for analysis of (molecular) equilibrium simulations: Configuration space density estimation, clustering, and visualization

NASA Astrophysics Data System (ADS)

Hamprecht, Fred A.; Peter, Christine; Daura, Xavier; Thiel, Walter; van Gunsteren, Wilfred F.

2001-02-01

We propose an approach for summarizing the output of long simulations of complex systems, affording a rapid overview and interpretation. First, multidimensional scaling techniques are used in conjunction with dimension reduction methods to obtain a low-dimensional representation of the configuration space explored by the system. A nonparametric estimate of the density of states in this subspace is then obtained using kernel methods. The free energy surface is calculated from that density, and the configurations produced in the simulation are then clustered according to the topography of that surface, such that all configurations belonging to one local free energy minimum form one class. This topographical cluster analysis is performed using basin spanning trees which we introduce as subgraphs of Delaunay triangulations. Free energy surfaces obtained in dimensions lower than four can be visualized directly using iso-contours and -surfaces. Basin spanning trees also afford a glimpse of higher-dimensional topographies. The procedure is illustrated using molecular dynamics simulations on the reversible folding of peptide analoga. Finally, we emphasize the intimate relation of density estimation techniques to modern enhanced sampling algorithms.
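The step from an estimated density to a free energy surface can be illustrated with the standard Boltzmann relation F = -kT ln p, which is assumed here because the abstract does not state the formula it uses. This is a minimal one-dimensional sketch; the sample values, the bandwidth, and the grid are hypothetical, and the basin spanning trees of the paper are not reproduced.

```python
import math

def gaussian_kde(samples, bandwidth):
    """Return a function that estimates a 1-D density with Gaussian kernels."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2)
                          for s in samples)
    return density

def free_energy(density_value, kT=1.0):
    """F = -kT * ln(p), defined up to an additive constant."""
    return -kT * math.log(density_value)

# Hypothetical reduced coordinate: a well-populated state near 0
# and a sparsely populated state near 3.
samples = [-0.2, -0.1, 0.0, 0.0, 0.1, 0.2, 2.9, 3.0, 3.1]
p = gaussian_kde(samples, bandwidth=0.3)

grid = [i * 0.1 for i in range(-10, 41)]          # -1.0 ... 4.0
F = [free_energy(p(x)) for x in grid]
global_min_x = grid[F.index(min(F))]              # location of the deepest basin
```

Configurations would then be assigned to the basin (local minimum of F) they fall into; in this toy case the deeper basin sits at the more densely sampled state.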
Best (but oft-forgotten) practices: designing, analyzing, and reporting cluster randomized controlled trials.

PubMed

Brown, Andrew W; Li, Peng; Bohan Brown, Michelle M; Kaiser, Kathryn A; Keith, Scott W; Oakes, J Michael; Allison, David B

2015-08-01

Cluster randomized controlled trials (cRCTs; also known as group randomized trials and community-randomized trials) are multilevel experiments in which units that are randomly assigned to experimental conditions are sets of grouped individuals, whereas outcomes are recorded at the individual level. In human cRCTs, clusters that are randomly assigned are typically families, classrooms, schools, worksites, or counties. With growing interest in community-based, public health, and policy interventions to reduce obesity or improve nutrition, the use of cRCTs has increased. Errors in the design, analysis, and interpretation of cRCTs are unfortunately all too common. This situation seems to stem in part from investigator confusion about how the unit of randomization affects causal inferences and the statistical procedures required for the valid estimation and testing of effects. In this article, we provide a brief introduction and overview of the importance of cRCTs and highlight and explain important considerations for the design, analysis, and reporting of cRCTs by using published examples. © 2015 American Society for Nutrition.
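One way to see why the unit of randomization matters is the textbook design effect, 1 + (m - 1) × ICC, where m is the cluster size and ICC the intraclass correlation. The abstract does not give this formula; it is added here as a commonly used approximation that assumes equal cluster sizes, and the numbers are hypothetical.

```python
def design_effect(cluster_size, icc):
    """Variance inflation from randomizing clusters of equal size."""
    return 1.0 + (cluster_size - 1) * icc

def effective_sample_size(n_individuals, cluster_size, icc):
    """Number of independent observations the clustered sample is worth."""
    return n_individuals / design_effect(cluster_size, icc)

# Hypothetical trial: 40 classrooms of 25 pupils, ICC = 0.04.
deff = design_effect(cluster_size=25, icc=0.04)
n_eff = effective_sample_size(n_individuals=1000, cluster_size=25, icc=0.04)
```

Under these assumed values the 1000 pupils carry roughly the information of 510 independent individuals, which is why analyses that ignore clustering overstate precision.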
Multiwavelength mock observations of the WHIM in a simulated galaxy cluster

NASA Astrophysics Data System (ADS)

Planelles, Susana; Mimica, Petar; Quilis, Vicent; Cuesta-Martínez, Carlos

2018-06-01

About half of the expected total baryon budget in the local Universe is 'missing'. Hydrodynamical simulations suggest that most of the missing baryons are located in a mildly overdense, warm-hot intergalactic medium (WHIM), which is difficult to detect at most wavelengths. In this paper, we explore multiwavelength synthetic observations of a massive galaxy cluster developed in a full Eulerian-adaptive mesh refinement cosmological simulation. A novel numerical procedure is applied to the outputs of the simulation, which are post-processed with a full-radiative transfer code that can compute the change of the intensity at any frequency along the null geodesic of photons. We compare the emission from the whole intergalactic medium and from the WHIM component (defined as the gas with a temperature in the range 10^5-10^7 K) at three observational bands associated with thermal X-rays, thermal and kinematic Sunyaev-Zel'dovich effect, and radio emission. The synthetic maps produced by this procedure could be directly compared with existing observational maps and could be used as a guide for future observations with forthcoming instruments. The analysis of the different emissions associated with a high-resolution galaxy cluster is in broad agreement with previous simulated and observational estimates of both gas components.
Analysis of indoor air pollutants checklist using environmetric technique for health risk assessment of sick building complaint in nonindustrial workplace

PubMed Central

Syazwan, AI; Rafee, B Mohd; Juahir, Hafizan; Azman, AZF; Nizar, AM; Izwyn, Z; Syahidatussyakirah, K; Muhaimin, AA; Yunos, MA Syafiq; Anita, AR; Hanafiah, J Muhamad; Shaharuddin, MS; Ibthisham, A Mohd; Hasmadi, I Mohd; Azhar, MN Mohamad; Azizan, HS; Zulfadhli, I; Othman, J; Rozalini, M; Kamarul, FT

2012-01-01

Purpose: To analyze and characterize a multidisciplinary, integrated indoor air quality checklist for evaluating the health risk of building occupants in a nonindustrial workplace setting. Design: A cross-sectional study based on a participatory occupational health program conducted by the National Institute of Occupational Safety and Health (Malaysia) and Universiti Putra Malaysia. Method: A modified version of the indoor environmental checklist published by the Department of Occupational Health and Safety, based on the literature and discussion with occupational health and safety professionals, was used in the evaluation process. Summated scores were given according to the cluster analysis and principal component analysis in the characterization of risk. Environmetric techniques were used to classify the risk of variables in the checklist. Identification of the possible source of item pollutants was also evaluated from a semiquantitative approach. Result: Hierarchical agglomerative cluster analysis resulted in the grouping of factorial components into three clusters (high complaint, moderate-high complaint, moderate complaint), which were further analyzed by discriminant analysis. From this, 15 major variables that influence indoor air quality were determined. Principal component analysis of each cluster revealed that the main factors influencing the high complaint group were fungal-related problems, chemical indoor dispersion, detergent, renovation, thermal comfort, and location of fresh air intake. The moderate-high complaint group showed significant high loading on ventilation, air filters, and smoking-related activities. The moderate complaint group showed high loading on dampness, odor, and thermal comfort. Conclusion: This semiquantitative assessment, which graded risk from low to high based on the intensity of the problem, shows promising and reliable results. It should be used as an important tool in the preliminary assessment of indoor air quality and as a categorizing method for further IAQ investigations and complaints procedures. PMID:23055779
Analysis of indoor air pollutants checklist using environmetric technique for health risk assessment of sick building complaint in nonindustrial workplace.

PubMed

Syazwan, Ai; Rafee, B Mohd; Juahir, Hafizan; Azman, Azf; Nizar, Am; Izwyn, Z; Syahidatussyakirah, K; Muhaimin, Aa; Yunos, Ma Syafiq; Anita, Ar; Hanafiah, J Muhamad; Shaharuddin, Ms; Ibthisham, A Mohd; Hasmadi, I Mohd; Azhar, Mn Mohamad; Azizan, Hs; Zulfadhli, I; Othman, J; Rozalini, M; Kamarul, Ft

2012-01-01

To analyze and characterize a multidisciplinary, integrated indoor air quality checklist for evaluating the health risk of building occupants in a nonindustrial workplace setting. A cross-sectional study based on a participatory occupational health program conducted by the National Institute of Occupational Safety and Health (Malaysia) and Universiti Putra Malaysia. A modified version of the indoor environmental checklist published by the Department of Occupational Health and Safety, based on the literature and discussion with occupational health and safety professionals, was used in the evaluation process. Summated scores were given according to the cluster analysis and principal component analysis in the characterization of risk. Environmetric techniques were used to classify the risk of variables in the checklist. Identification of the possible source of item pollutants was also evaluated from a semiquantitative approach. Hierarchical agglomerative cluster analysis resulted in the grouping of factorial components into three clusters (high complaint, moderate-high complaint, moderate complaint), which were further analyzed by discriminant analysis. From this, 15 major variables that influence indoor air quality were determined. Principal component analysis of each cluster revealed that the main factors influencing the high complaint group were fungal-related problems, chemical indoor dispersion, detergent, renovation, thermal comfort, and location of fresh air intake. The moderate-high complaint group showed significant high loading on ventilation, air filters, and smoking-related activities. The moderate complaint group showed high loading on dampness, odor, and thermal comfort. This semiquantitative assessment, which graded risk from low to high based on the intensity of the problem, shows promising and reliable results. It should be used as an important tool in the preliminary assessment of indoor air quality and as a categorizing method for further IAQ investigations and complaints procedures.
Differences in Coping Styles among Persons with Spinal Cord Injury: A Cluster-Analytic Approach.

ERIC Educational Resources Information Center

Frank, Robert G.; And Others

1987-01-01

Identified and validated two subgroups in group of 53 persons with spinal cord injury by applying cluster-analytic procedures to subjects' self-reported coping and health locus of control belief scores. Cluster 1 coped less effectively and tended to be psychologically distressed; Cluster 2 subjects emphasized internal health attributions and…
Analyzing simulation-based PRA data through traditional and topological clustering: A BWR station blackout case study

DOE PAGES

Maljovec, D.; Liu, S.; Wang, B.; ...

2015-07-14

Here, dynamic probabilistic risk assessment (DPRA) methodologies couple system simulator codes (e.g., RELAP and MELCOR) with simulation controller codes (e.g., RAVEN and ADAPT). Whereas system simulator codes model system dynamics deterministically, simulation controller codes introduce both deterministic (e.g., system control logic and operating procedures) and stochastic (e.g., component failures and parameter uncertainties) elements into the simulation. Typically, a DPRA is performed by sampling values of a set of parameters and simulating the system behavior for that specific set of parameter values. For complex systems, a major challenge in using DPRA methodologies is to analyze the large number of scenarios generated, where clustering techniques are typically employed to better organize and interpret the data. In this paper, we focus on the analysis of two nuclear simulation datasets that are part of the risk-informed safety margin characterization (RISMC) boiling water reactor (BWR) station blackout (SBO) case study. We provide the domain experts with a software tool that encodes traditional and topological clustering techniques within an interactive analysis and visualization environment, for understanding the structures of such high-dimensional nuclear simulation datasets. We demonstrate through our case study that both types of clustering techniques complement each other for enhanced structural understanding of the data.

Rainfall over Friuli-Venezia Giulia: High amounts and strong geographical gradients

NASA Astrophysics Data System (ADS)

Ceschia, M.; Micheletti, St.; Carniel, R.

1991-12-01

The precipitation distribution over Friuli-Venezia Giulia — the easternmost region of Northern Italy extending from the Adriatic Sea to the Alps — has been studied. Monthly rainfall data over the region and the bordering areas of Veneto and Slovenia during the period from 1951 to 1986 have been analyzed by standard statistical methods, including cluster analysis. The overall results emphasize a distribution with rainfall increasing from the sea to the prealpine areas. The highest precipitations were recorded over the Musi-Canin range, with average values exceeding 3 200 mm per year. Noteworthy is the unforeseen subdivision of the region by the clustering procedure by means of the Angot index.
Method for evaluating wind turbine wake effects on wind farm performance

NASA Technical Reports Server (NTRS)

Neustadter, H. E.; Spera, D. A.

1985-01-01

A method of testing the performance of a cluster of wind turbine units and data analysis equations are presented, which together form a simple and direct procedure for determining the reduction in energy output caused by the wake of an upwind turbine. This method appears to solve the problems presented by data scatter and wind variability. Test data from the three-unit Mod-2 wind turbine cluster at Goldendale, Washington, are analyzed to illustrate the application of the proposed method. In this sample case the reduction in energy was found to be about 10 percent when the Mod-2 units were separated by a distance equal to seven diameters and winds were below rated.
Factors that cause genotype by environment interaction and use of a multiple-trait herd-cluster model for milk yield of Holstein cattle from Brazil and Colombia.

PubMed

Cerón-Muñoz, M F; Tonhati, H; Costa, C N; Rojas-Sarmiento, D; Echeverri Echeverri, D M

2004-08-01

Descriptive herd variables (DVHE) were used to explain genotype by environment interactions (G x E) for milk yield (MY) in Brazilian and Colombian production environments and to develop a herd-cluster model to estimate covariance components and genetic parameters for each herd environment group. Data consisted of 180,522 lactation records of 94,558 Holstein cows from 937 Brazilian and 400 Colombian herds. Herds in both countries were jointly grouped in thirds according to 8 DVHE: production level, phenotypic variability, age at first calving, calving interval, percentage of imported semen, lactation length, and herd size. For each DVHE, REML bivariate animal model analyses were used to estimate genetic correlations for MY between upper and lower thirds of the data. Based on estimates of genetic correlations, weights were assigned to each DVHE to group herds in a cluster analysis using the FASTCLUS procedure in SAS. Three clusters were defined, and genetic and residual variance components were heterogeneous among herd clusters. Estimates of heritability in clusters 1 and 3 were 0.28 and 0.29, respectively, but the estimate was larger (0.39) in Cluster 2. The genetic correlations of MY from different clusters ranged from 0.89 to 0.97. The herd-cluster model based on DVHE properly takes into account G x E by grouping similar environments accordingly and seems to be an alternative to simply considering country borders to distinguish between environments.
[Space-time suicide clustering in the community of Antequera (Spain)].

PubMed

Pérez-Costillas, Lucía; Blasco-Fontecilla, Hilario; Benítez, Nicolás; Comino, Raquel; Antón, José Miguel; Ramos-Medina, Valentín; Lopez, Amalia; Palomo, José Luis; Madrigal, Lucía; Alcalde, Javier; Perea-Millá, Emilio; Artieda-Urrutia, Paula; de León-Martínez, Victoria; de Diego Otero, Yolanda

2015-01-01

Approximately 3,500 people commit suicide every year in Spain. The main aim of this study is to explore whether spatial and temporal clustering of suicide exists in the region of Antequera (Málaga, Spain). Sample and procedure: All suicides from January 1, 2004 to December 31, 2008 were identified using data from the Forensic Pathology Department of the Institute of Legal Medicine, Málaga (Spain). Geolocalisation: Google Earth was used to calculate the coordinates for each suicide decedent's address. Statistical analysis: A spatiotemporal permutation scan statistic and Ripley's K function were used to explore spatiotemporal clustering. Pearson's chi-squared test was used to determine whether there were differences between suicides inside and outside the spatiotemporal clusters. A total of 120 individuals committed suicide within the region of Antequera, of whom 96 (80%) were included in our analyses. Statistically significant evidence for 7 spatiotemporal suicide clusters emerged within critical limits for the 0-2.5 km distance and for the first and second weeks (P<.05 in both cases) after suicide. Not a single subject among the suicides within clusters had been diagnosed with a current psychotic disorder, whereas outside clusters 20% had this diagnosis (χ²=4.13; df=1; P<.05). There are spatiotemporal suicide clusters in the area surrounding Antequera. Patients diagnosed with a current psychotic disorder are less likely to be influenced by the factors explaining suicide clustering. Copyright © 2013 SEP y SEPB. Published by Elsevier España. All rights reserved.
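For readers unfamiliar with Ripley's K function mentioned above, a naive purely spatial estimate can be sketched as follows. This is an illustration under stated assumptions, not the study's analysis: it uses one common form of the estimator, omits edge correction and the temporal dimension, and the coordinates and study area are hypothetical. Under complete spatial randomness K(r) is approximately πr², so larger values suggest clustering at scale r.

```python
import math

def ripley_k(points, radius, area):
    """Naive Ripley's K estimate (no edge correction) for 2-D points."""
    n = len(points)
    close_pairs = sum(1
                      for i in range(n)
                      for j in range(n)
                      if i != j and math.dist(points[i], points[j]) <= radius)
    return area * close_pairs / (n * (n - 1))

# Hypothetical coordinates in km inside a 10 km x 10 km study area:
# four events close together and four scattered ones.
points = [(1.0, 1.0), (1.2, 1.1), (0.9, 1.3), (1.1, 0.8),
          (5.0, 5.0), (8.0, 2.0), (3.0, 8.0), (9.0, 9.0)]

k_hat = ripley_k(points, radius=1.0, area=100.0)
k_random = math.pi * 1.0 ** 2   # expectation under complete spatial randomness
```

In practice significance is judged against simulated envelopes rather than by comparing the two numbers directly.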
Biclustering of gene expression data using reactive greedy randomized adaptive search procedure

PubMed Central

Dharan, Smitha; Nair, Achuthsankar S

2009-01-01

Background: Biclustering algorithms belong to a distinct class of clustering algorithms that perform simultaneous clustering of both rows and columns of the gene expression matrix and can be a very useful analysis tool when some genes have multiple functions and experimental conditions are diverse. Cheng and Church introduced a measure called the mean squared residue score to evaluate the quality of a bicluster, and it has become one of the most popular measures used to search for biclusters. In this paper, we review the basic concepts of the metaheuristic Greedy Randomized Adaptive Search Procedure (GRASP), namely its construction and local search phases, and propose a new method, a variant of GRASP called Reactive Greedy Randomized Adaptive Search Procedure (Reactive GRASP), to detect significant biclusters in large microarray datasets. The method has two major steps. First, high quality bicluster seeds are generated by means of k-means clustering. In the second step, these seeds are grown using Reactive GRASP, in which the basic parameter that defines the restrictiveness of the candidate list is self-adjusted, depending on the quality of the solutions found previously. Results: We performed statistical and biological validations of the biclusters obtained and evaluated the method against the results of basic GRASP as well as the classic work of Cheng and Church. The experimental results indicate that the Reactive GRASP approach outperforms the basic GRASP algorithm and the Cheng and Church approach. Conclusion: The Reactive GRASP approach for the detection of significant biclusters is robust and does not require calibration efforts. PMID:19208127
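The mean squared residue score of Cheng and Church referred to above averages the squared residue (a_ij - a_iJ - a_Ij + a_IJ) over the cells of a candidate bicluster, where a_iJ, a_Ij and a_IJ are the row, column and overall means of the submatrix. A minimal sketch follows; the function name and the small expression matrix are hypothetical, and the GRASP search itself is not shown.

```python
def mean_squared_residue(matrix, rows, cols):
    """Cheng-Church mean squared residue of the submatrix (rows x cols)."""
    sub = [[matrix[i][j] for j in cols] for i in rows]
    n_r, n_c = len(rows), len(cols)
    row_means = [sum(r) / n_c for r in sub]
    col_means = [sum(sub[i][j] for i in range(n_r)) / n_r for j in range(n_c)]
    overall = sum(row_means) / n_r
    total = sum((sub[i][j] - row_means[i] - col_means[j] + overall) ** 2
                for i in range(n_r) for j in range(n_c))
    return total / (n_r * n_c)

# Hypothetical expression matrix (genes x conditions).
expr = [[1, 2, 3, 9],
        [2, 3, 4, 1],
        [4, 5, 6, 7],
        [0, 5, 1, 8]]

# Rows 0-2 over columns 0-2 differ only by a constant shift per row,
# so this bicluster is perfectly coherent.
coherent = mean_squared_residue(expr, rows=[0, 1, 2], cols=[0, 1, 2])
whole = mean_squared_residue(expr, rows=[0, 1, 2, 3], cols=[0, 1, 2, 3])
```

A score near zero marks a coherent (additive) pattern, so search procedures of this kind typically look for large submatrices whose score stays below a chosen threshold.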
The ASTRODEEP Frontier Fields catalogues. I. Multiwavelength photometry of Abell-2744 and MACS-J0416

NASA Astrophysics Data System (ADS)

Merlin, E.; Amorín, R.; Castellano, M.; Fontana, A.; Buitrago, F.; Dunlop, J. S.; Elbaz, D.; Boucaud, A.; Bourne, N.; Boutsia, K.; Brammer, G.; Bruce, V. A.; Capak, P.; Cappelluti, N.; Ciesla, L.; Comastri, A.; Cullen, F.; Derriere, S.; Faber, S. M.; Ferguson, H. C.; Giallongo, E.; Grazian, A.; Lotz, J.; Michałowski, M. J.; Paris, D.; Pentericci, L.; Pilo, S.; Santini, P.; Schreiber, C.; Shu, X.; Wang, T.

2016-05-01

Context. The Frontier Fields survey is a pioneering observational program aimed at collecting photometric data, both from space (Hubble Space Telescope and Spitzer Space Telescope) and from ground-based facilities (VLT Hawk-I), for six deep fields pointing at clusters of galaxies and six nearby deep parallel fields, in a wide range of passbands. The analysis of these data is a natural outcome of the Astrodeep project, an EU collaboration aimed at developing methods and tools for extragalactic photometry and creating valuable public photometric catalogues. Aims: We produce multiwavelength photometric catalogues (from B to 4.5 μm) for the first two of the Frontier Fields, Abell-2744 and MACS-J0416 (plus their parallel fields). Methods: To detect faint sources even in the central regions of the clusters, we develop a robust and repeatable procedure that uses the public codes Galapagos and Galfit to model and remove most of the light contribution from both the brightest cluster members, and the intra-cluster light. We perform the detection on the processed HST H160 image to obtain a pure H-selected sample, which is the primary catalogue that we publish. We also add a sample of sources which are undetected in the H160 image but appear on a stacked infrared image. Photometry on the other HST bands is obtained using SExtractor, again on processed images after the procedure for foreground light removal. Photometry on the Hawk-I and IRAC bands is obtained using our PSF-matching deconfusion code t-phot. A similar procedure, but without the need for the foreground light removal, is adopted for the Parallel fields. Results: The procedure of foreground light subtraction allows for the detection and the photometric measurements of ~2500 sources per field. We deliver and release complete photometric H-detected catalogues, with the addition of the complementary sample of infrared-detected sources. 
All objects have multiwavelength coverage including B to H HST bands, plus K-band from Hawk-I, and 3.6-4.5 μm from Spitzer. A full and detailed treatment of photometric errors is included. We perform basic sanity checks on the reliability of our results. Conclusions: The multiwavelength photometric catalogues are available publicly and are ready to be used for scientific purposes. Our procedure allows for the detection of outshone objects near the bright galaxies, which, coupled with the magnification effect of the clusters, can reveal extremely faint high-redshift sources. A full analysis of photometric redshifts is presented in Paper II. The catalogues, together with the final processed images for all HST bands (as well as some diagnostic data and images), are publicly available and can be downloaded from the Astrodeep website at http://www.astrodeep.eu/frontier-fields/ and from a dedicated CDS webpage (http://astrodeep.u-strasbg.fr/ff/index.html). The catalogues are also available at the CDS via anonymous ftp to http://cdsarc.u-strasbg.fr (http://130.79.128.5) or via http://cdsarc.u-strasbg.fr/viz-bin/qcat?J/A+A/590/A31
Flow Cytometry Data Preparation Guidelines for Improved Automated Phenotypic Analysis.

PubMed

Jimenez-Carretero, Daniel; Ligos, José M; Martínez-López, María; Sancho, David; Montoya, María C

2018-05-15

Advances in flow cytometry (FCM) increasingly demand adoption of computational analysis tools to tackle the ever-growing data dimensionality. In this study, we tested different data input modes to evaluate how cytometry acquisition configuration and data compensation procedures affect the performance of unsupervised phenotyping tools. An analysis workflow was set up and tested for the detection of changes in reference bead subsets and in a rare subpopulation of murine lymph node CD103+ dendritic cells acquired by conventional or spectral cytometry. Raw spectral data or pseudospectral data acquired with the full set of available detectors by conventional cytometry consistently outperformed datasets acquired and compensated according to FCM standards. Our results thus challenge the paradigm of one-fluorochrome/one-parameter acquisition in FCM for unsupervised cluster-based analysis. Instead, we propose to configure instrument acquisition to use all available fluorescence detectors and to avoid integration and compensation procedures, thereby using raw spectral or pseudospectral data for improved automated phenotypic analysis. Copyright © 2018 by The American Association of Immunologists, Inc.
A Novel Artificial Bee Colony Based Clustering Algorithm for Categorical Data

PubMed Central

2015-01-01

Data with categorical attributes are ubiquitous in the real world. However, existing partitional clustering algorithms for categorical data are prone to fall into local optima. To address this issue, in this paper we propose a novel clustering algorithm, ABC-K-Modes (Artificial Bee Colony clustering based on K-Modes), based on the traditional k-modes clustering algorithm and the artificial bee colony approach. In our approach, we first introduce a one-step k-modes procedure, and then integrate this procedure with the artificial bee colony approach to deal with categorical data. In the search process performed by scout bees, we adopt the multi-source search inspired by the idea of batch processing to accelerate the convergence of ABC-K-Modes. The performance of ABC-K-Modes is evaluated by a series of experiments in comparison with that of the other popular algorithms for categorical data. PMID:25993469
A novel artificial bee colony based clustering algorithm for categorical data.

PubMed

Ji, Jinchao; Pang, Wei; Zheng, Yanlin; Wang, Zhe; Ma, Zhiqiang

2015-01-01

Data with categorical attributes are ubiquitous in the real world. However, existing partitional clustering algorithms for categorical data are prone to fall into local optima. To address this issue, in this paper we propose a novel clustering algorithm, ABC-K-Modes (Artificial Bee Colony clustering based on K-Modes), based on the traditional k-modes clustering algorithm and the artificial bee colony approach. In our approach, we first introduce a one-step k-modes procedure, and then integrate this procedure with the artificial bee colony approach to deal with categorical data. In the search process performed by scout bees, we adopt the multi-source search inspired by the idea of batch processing to accelerate the convergence of ABC-K-Modes. The performance of ABC-K-Modes is evaluated by a series of experiments in comparison with that of the other popular algorithms for categorical data.
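The abstract builds on a one-step k-modes procedure. As a rough illustration only, the sketch below shows one iteration of standard k-modes (assign each record to its nearest mode by Hamming distance, then recompute each mode as the most frequent category per attribute). It is not the authors' ABC-K-Modes code; the function names `hamming` and `kmodes_step` and the toy records are assumptions made for this example, and the bee-colony search is not included.

```python
from collections import Counter

def hamming(a, b):
    """Number of attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))

def kmodes_step(records, modes):
    """One standard k-modes iteration: assign records, then recompute modes."""
    clusters = [[] for _ in modes]
    labels = []
    for rec in records:
        # Nearest mode by Hamming distance (ties go to the lower index).
        nearest = min(range(len(modes)), key=lambda j: hamming(rec, modes[j]))
        labels.append(nearest)
        clusters[nearest].append(rec)
    new_modes = []
    for j, members in enumerate(clusters):
        if not members:
            new_modes.append(modes[j])  # keep the old mode of an empty cluster
            continue
        # New mode: the most frequent category of each attribute.
        new_modes.append(tuple(Counter(col).most_common(1)[0][0]
                               for col in zip(*members)))
    return labels, new_modes

# Hypothetical toy data: (colour, size, shape).
records = [("red", "small", "round"), ("red", "small", "square"),
           ("blue", "large", "round"), ("blue", "large", "square"),
           ("red", "large", "round")]
labels, modes = kmodes_step(records, [records[0], records[2]])
print(labels, modes)
```

Repeating `kmodes_step` until the labels stop changing gives the usual k-modes loop, which can stall in a local optimum depending on the initial modes; that sensitivity is the issue the abstract sets out to address.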
Conjoint Analysis for New Service Development on Electricity Distribution in Indonesia

NASA Astrophysics Data System (ADS)

Widaningrum, D. L.; Chynthia; Astuti, L. D.; Seran, M. A. B.

2017-07-01

Illegal use of electricity is still rampant in Indonesia, especially for activities where no power source is available, such as at the locations of street vendors. It is not only detrimental to the state, but also harms the perpetrators of electricity theft and the surrounding communities. The purpose of this study is to create a New Service Development (NSD) to provide a new electricity source for street vendors' activity based on their preferences. The methods applied in the NSD are Conjoint Analysis, Cluster Analysis, Quality Function Deployment (QFD), Service Blueprint, Process Flow Diagrams and Quality Control Plan. The results of this study are the attributes of the new electricity service and their importance based on street vendors' preferences as customers, the customer segmentation, the service design for the new service, the design of the technical response, the design of the operational procedures, and the quality control plan for the operational procedures.
Early phase drug discovery: cheminformatics and computational techniques in identifying lead series.

PubMed

Duffy, Bryan C; Zhu, Lei; Decornez, Hélène; Kitchen, Douglas B

2012-09-15

Early drug discovery processes rely on hit finding procedures followed by extensive experimental confirmation in order to select high priority hit series which then undergo further scrutiny in hit-to-lead studies. The experimental cost and the risk associated with poor selection of lead series can be greatly reduced by the use of many different computational and cheminformatic techniques to sort and prioritize compounds. We describe the steps in typical hit identification and hit-to-lead programs and then describe how cheminformatic analysis assists this process. In particular, scaffold analysis, clustering and property calculations assist in the design of high-throughput screening libraries, the early analysis of hits and then organizing compounds into series for their progression from hits to leads. Additionally, these computational tools can be used in virtual screening to design hit-finding libraries and as procedures to help with early SAR exploration. Copyright © 2012 Elsevier Ltd. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Kwon, Deukwoo; Little, Mark P.; Miller, Donald L.

Purpose: To determine more accurate regression formulas for estimating peak skin dose (PSD) from reference air kerma (RAK) or kerma-area product (KAP). Methods: After grouping of the data from 21 procedures into 13 clinically similar groups, assessments were made of optimal clustering using the Bayesian information criterion to obtain the optimal linear regressions of (log-transformed) PSD vs RAK, PSD vs KAP, and PSD vs RAK and KAP. Results: Three clusters of clinical groups were optimal in regression of PSD vs RAK, seven clusters of clinical groups were optimal in regression of PSD vs KAP, and six clusters of clinical groups were optimal in regression of PSD vs RAK and KAP. Prediction of PSD using both RAK and KAP is significantly better than prediction of PSD with either RAK or KAP alone. The regression of PSD vs RAK provided better predictions of PSD than the regression of PSD vs KAP. The partial-pooling (clustered) method yields smaller mean squared errors compared with the complete-pooling method. Conclusion: PSD distributions for interventional radiology procedures are log-normal. Estimates of PSD derived from RAK and KAP jointly are most accurate, followed closely by estimates derived from RAK alone. Estimates of PSD derived from KAP alone are the least accurate. Using a stochastic search approach, it is possible to cluster together certain dissimilar types of procedures to minimize the total error sum of squares.
Clustering Algorithms: Their Application to Gene Expression Data

PubMed Central

Oyelade, Jelili; Isewon, Itunuoluwa; Oladipupo, Funke; Aromolaran, Olufemi; Uwoghiren, Efosa; Ameh, Faridah; Achas, Moses; Adebiyi, Ezekiel

2016-01-01

Gene expression data hide vital information required to understand the biological process that takes place in a particular organism in relation to its environment. Deciphering the hidden patterns in gene expression data offers a great opportunity to strengthen the understanding of functional genomics. The complexity of biological networks and the volume of genes present increase the challenges of comprehending and interpreting the resulting mass of data, which consists of millions of measurements; these data also exhibit vagueness, imprecision, and noise. Therefore, the use of clustering techniques is a first step toward addressing these challenges, which is essential in the data mining process to reveal natural structures and identify interesting patterns in the underlying data. The clustering of gene expression data has been proven to be useful in making known the natural structure inherent in gene expression data, understanding gene functions, cellular processes, and subtypes of cells, mining useful information from noisy data, and understanding gene regulation. The other benefit of clustering gene expression data is the identification of homology, which is very important in vaccine design. This review examines the various clustering algorithms applicable to gene expression data in order to discover and provide useful knowledge of the appropriate clustering technique that will guarantee stability and a high degree of accuracy in its analysis procedure. PMID:27932867
The application of time series models to cloud field morphology analysis

NASA Technical Reports Server (NTRS)

Chin, Roland T.; Jau, Jack Y. C.; Weinman, James A.

1987-01-01

A modeling method for the quantitative description of remotely sensed cloud field images is presented. A two-dimensional texture modeling scheme based on one-dimensional time series procedures is adopted for this purpose. The time series procedure used is the seasonal autoregressive, moving average (ARMA) process in Box and Jenkins. Cloud field properties such as directionality, clustering and cloud coverage can be retrieved by this method. It has been demonstrated that a cloud field image can be quantitatively defined by a small set of parameters and synthesized surrogates can be reconstructed from these model parameters. This method enables cloud climatology to be studied quantitatively.
Comparison of different hydrological similarity measures to estimate flow quantiles

NASA Astrophysics Data System (ADS)

Rianna, M.; Ridolfi, E.; Napolitano, F.

2017-07-01

This paper aims to evaluate the influence of hydrological similarity measures on the definition of homogeneous regions. To this end, several attribute sets have been analyzed in the context of the Region of Influence (ROI) procedure. Several combinations of geomorphological, climatological, and geographical characteristics are also used to cluster potentially homogeneous regions. To verify the goodness of the resulting pooled sites, homogeneity tests are carried out. Through a Monte Carlo simulation and a jack-knife procedure, flow quantiles are estimated for the regions that are effectively homogeneous. The analyses are performed in both the so-called gauged and ungauged scenarios to analyze the effect of hydrological measures on flow quantile estimation.
Non-invasive localization of atrial ectopic beats by using simulated body surface P-wave integral maps

PubMed Central

Godoy, Eduardo J.; Lozano, Miguel; Martínez-Mateu, Laura; Atienza, Felipe; Saiz, Javier; Sebastian, Rafael

2017-01-01

Non-invasive localization of continuous atrial ectopic beats remains a cornerstone for the treatment of atrial arrhythmias. The lack of accurate tools to guide electrophysiologists leads to an increase in the recurrence rate of ablation procedures. Existing approaches are based on the analysis of the P-waves main characteristics and the forward body surface potential maps (BSPMs) or on the inverse estimation of the electric activity of the heart from those BSPMs. These methods have not provided an efficient and systematic tool to localize ectopic triggers. In this work, we propose the use of machine learning techniques to spatially cluster and classify ectopic atrial foci into clearly differentiated atrial regions by using the body surface P-wave integral map (BSPiM) as a biomarker. Our simulated results show that ectopic foci with similar BSPiM naturally cluster into differentiated non-intersected atrial regions and that new patterns could be correctly classified with an accuracy of 97% when considering 2 clusters and 96% for 4 clusters. Our results also suggest that an increase in the number of clusters is feasible at the cost of decreasing accuracy. PMID:28704537
Structural Analysis of Cubane-Type Iron Clusters

PubMed Central

Tan, Lay Ling; Holm, R. H.; Lee, Sonny C.

2013-01-01

The generalized cluster type [M4(μ3-Q)4Ln]x contains the cubane-type [M4Q4]z core unit that can approach, but typically deviates from, perfect Td symmetry. The geometric properties of this structure have been analyzed with reference to Td symmetry by a new protocol. Using coordinates of M and Q atoms, expressions have been derived for interatomic separations, bond angles, and volumes of tetrahedral core units (M4, Q4) and the total [M4Q4] core (as a tetracapped M4 tetrahedron). Values for structural parameters have been calculated from observed average values for a given cluster type. Comparison of calculated and observed values measures the extent of deviation of a given parameter from that required in an exact tetrahedral structure. The procedure has been applied to the structures of over 130 clusters containing [Fe4Q4] (Q = S2−, Se2−, Te2−, [NPR3]−, [NR]2−) units, of which synthetic and biological sulfide-bridged clusters constitute the largest subset. General structural features and trends in structural parameters are identified and summarized. An extensive database of structural properties (distances, angles, volumes) has been compiled in Supporting Information. PMID:24072952
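The abstract describes deriving interatomic separations and tetrahedral core volumes from atomic coordinates. The sketch below is only a generic illustration of that kind of calculation: it computes the six edge lengths and the volume of a tetrahedron from four points using the scalar triple product. The coordinates are an idealized, hypothetical M4 tetrahedron (alternating corners of a cube), not data from the paper, and the function names are assumptions.

```python
import math
from itertools import combinations

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def tetra_volume(p0, p1, p2, p3):
    """Volume of a tetrahedron: |(p1-p0) . ((p2-p0) x (p3-p0))| / 6."""
    return abs(dot(sub(p1, p0), cross(sub(p2, p0), sub(p3, p0)))) / 6.0

def edge_lengths(points):
    """All six pairwise separations of four atoms."""
    return [math.dist(a, b) for a, b in combinations(points, 2)]

# Hypothetical ideal Td arrangement: alternating corners of a cube of side 2.
m4 = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]
edges = edge_lengths(m4)
volume = tetra_volume(*m4)

# Spread of the edge lengths is one simple measure of deviation from Td:
# it is zero for an exact tetrahedron and grows as the core distorts.
spread = max(edges) - min(edges)
print(edges, volume, spread)
```

For a regular tetrahedron of edge a the volume is a^3/(6*sqrt(2)), so the example (a = 2*sqrt(2)) gives 8/3, which is a convenient check on the triple-product formula.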
Structural Analysis of Cubane-Type Iron Clusters.

PubMed

Tan, Lay Ling; Holm, R H; Lee, Sonny C

2013-07-13

The generalized cluster type [M4(μ3-Q)4Ln]x contains the cubane-type [M4Q4]z core unit that can approach, but typically deviates from, perfect Td symmetry. The geometric properties of this structure have been analyzed with reference to Td symmetry by a new protocol. Using coordinates of M and Q atoms, expressions have been derived for interatomic separations, bond angles, and volumes of tetrahedral core units (M4, Q4) and the total [M4Q4] core (as a tetracapped M4 tetrahedron). Values for structural parameters have been calculated from observed average values for a given cluster type. Comparison of calculated and observed values measures the extent of deviation of a given parameter from that required in an exact tetrahedral structure. The procedure has been applied to the structures of over 130 clusters containing [Fe4Q4] (Q = S2-, Se2-, Te2-, [NPR3]-, [NR]2-) units, of which synthetic and biological sulfide-bridged clusters constitute the largest subset. General structural features and trends in structural parameters are identified and summarized. An extensive database of structural properties (distances, angles, volumes) has been compiled in Supporting Information.
GLOBULAR AND OPEN CLUSTERS OBSERVED BY SDSS/SEGUE: THE GIANT STARS

DOE Office of Scientific and Technical Information (OSTI.GOV)

Morrison, Heather L.; Ma, Zhibo; Connor, Thomas

We present griz observations for the clusters M92, M13 and NGC 6791 and gr photometry for M71, Be 29 and NGC 7789. In addition we present new membership identifications for all these clusters, which have been observed spectroscopically as calibrators for the Sloan Digital Sky Survey (SDSS)/SEGUE survey; this paper focuses in particular on the red giant branch stars in the clusters. In a number of cases, these giants were too bright to be observed in the normal SDSS survey operations, and we describe the procedure used to obtain spectra for these stars. For M71, we also present a new variable reddening map and a new fiducial for the gr giant branch. For NGC 7789, we derived a transformation from T_eff to g–r for giants of near solar abundance, using IRFM T_eff measures of stars with good ugriz and 2MASS photometry and SEGUE spectra. The result of our analysis is a robust list of known cluster members with correctly dereddened and (if needed) transformed gr photometry for crucial calibration efforts for SDSS and SEGUE.
The X-CLASS-redMaPPer galaxy cluster comparison. I. Identification procedures

NASA Astrophysics Data System (ADS)

Sadibekova, T.; Pierre, M.; Clerc, N.; Faccioli, L.; Gastaud, R.; Le Fevre, J.-P.; Rozo, E.; Rykoff, E.

2014-11-01

Context. This paper is the first in a series undertaking a comprehensive correlation analysis between optically selected and X-ray-selected cluster catalogues. The rationale of the project is to develop a holistic picture of galaxy clusters utilising optical and X-ray-cluster-selected catalogues with well-understood selection functions. Aims: Unlike most of the X-ray/optical cluster correlations to date, the present paper focuses on the non-matching objects in either waveband. We investigate how the differences observed between the optical and X-ray catalogues may stem from (1) a shortcoming of the detection algorithms; (2) dispersion in the X-ray/optical scaling relations; or (3) substantial intrinsic differences between the cluster populations probed in the X-ray and optical bands. The aim is to inventory and elucidate these effects in order to account for selection biases in the further determination of X-ray/optical cluster scaling relations. Methods: We correlated the X-CLASS serendipitous cluster catalogue extracted from the XMM archive with the redMaPPer optical cluster catalogue derived from the Sloan Digital Sky Survey (DR8). We performed a detailed and, in large part, interactive analysis of the matching output from the correlation. The overlap between the two catalogues has been accurately determined and possible cluster positional errors were manually recovered. The final samples comprise 270 and 355 redMaPPer and X-CLASS clusters, respectively. X-ray cluster matching rates were analysed as a function of optical richness. In the second step, the redMaPPer clusters were correlated with the entire X-ray catalogue, containing point and uncharacterised sources (down to a few 10^-15 erg s^-1 cm^-2 in the [0.5-2] keV band). A stacking analysis was performed for the remaining undetected optical clusters. Results: We find that all rich (λ ≥ 80) clusters are detected in X-rays out to z = 0.6.
Below this redshift, the richness threshold for X-ray detection steadily decreases with redshift. Likewise, all X-ray bright clusters are detected by redMaPPer. After correcting for obvious pipeline shortcomings (about 10% of the cases both in optical and X-ray), ~50% of the redMaPPer clusters (down to a richness of 20) are found to coincide with an X-CLASS cluster; when considering X-ray sources of any type, this fraction increases to ~80%; for the remaining objects, the stacking analysis finds a weak signal within 0.5 Mpc around the cluster optical centres. The fraction of clusters totally dominated by AGN-type emission appears to be a few percent. Conversely, ~40% of the X-CLASS clusters are identified with a redMaPPer cluster (down to a richness of 20) - part of the non-matches being due to the X-CLASS sample extending further out than redMaPPer (z < 1.5 vs. z < 0.6), but extending the correlation down to a richness of 5 raises the matching rate to ~65%. Conclusions: This state-of-the-art study involving two well-validated cluster catalogues has shown itself to be complex, and it points to a number of issues inherent to blind cross-matching, owing both to pipeline shortcomings and peculiar cluster properties. These can only be accounted for after a manual check. The combined X-ray and optical scaling relations will be presented in a subsequent article.
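The abstract centres on correlating two catalogues and then inspecting the non-matching objects. As a hedged illustration of a blind positional cross-match, the sketch below pairs each entry of one catalogue with its nearest neighbour in the other and keeps the pair only if the angular separation is within a fixed radius. The fixed angular radius, the function names and the catalogue entries are all assumptions for this example; the paper's actual matching criteria and manual checks are not reproduced here.

```python
import math

def angular_sep_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees, using the haversine formula."""
    r1, d1, r2, d2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((d2 - d1) / 2) ** 2
         + math.cos(d1) * math.cos(d2) * math.sin((r2 - r1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(h)))

def cross_match(cat_a, cat_b, radius_deg):
    """Nearest-neighbour match of cat_a against cat_b within radius_deg.

    Each catalogue is a list of (name, ra_deg, dec_deg) tuples.
    Returns (matches, unmatched_names_from_cat_a).
    """
    matches, unmatched = [], []
    for name, ra, dec in cat_a:
        best = min(cat_b, key=lambda s: angular_sep_deg(ra, dec, s[1], s[2]))
        sep = angular_sep_deg(ra, dec, best[1], best[2])
        if sep <= radius_deg:
            matches.append((name, best[0], sep))
        else:
            unmatched.append(name)
    return matches, unmatched

# Hypothetical positions in degrees; not taken from X-CLASS or redMaPPer.
optical = [("opt1", 150.00, 2.000), ("opt2", 150.50, 2.300),
           ("opt3", 151.20, 1.100)]
xray = [("x1", 150.01, 2.005), ("x2", 151.90, 1.800)]

matches, unmatched = cross_match(optical, xray, radius_deg=0.05)
matching_rate = len(matches) / len(optical)
print(matches, unmatched, matching_rate)
```

The `unmatched` list is the part such a study would examine further, since a non-match can come from pipeline shortcomings, positional errors, or real differences between the populations.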

Bagging Voronoi classifiers for clustering spatial functional data

NASA Astrophysics Data System (ADS)

Secchi, Piercesare; Vantini, Simone; Vitelli, Valeria

2013-06-01

We propose a bagging strategy based on random Voronoi tessellations for the exploration of geo-referenced functional data, suitable for different purposes (e.g., classification, regression, dimensional reduction, …). Urged by an application to environmental data contained in the Surface Solar Energy database, we focus in particular on the problem of clustering functional data indexed by the sites of a spatial finite lattice. We thus illustrate our strategy by implementing a specific algorithm whose rationale is to (i) replace the original data set with a reduced one, composed of local representatives of neighborhoods covering the entire investigated area; (ii) analyze the local representatives; (iii) repeat the previous analysis many times for different reduced data sets associated with randomly generated different sets of neighborhoods, thus obtaining many different weak formulations of the analysis; (iv) finally, bag together the weak analyses to obtain a conclusive strong analysis. Through an extensive simulation study, we show that this new procedure - which does not require an explicit model for spatial dependence - is statistically and computationally efficient.
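Steps (i) to (iv) can be sketched in a few lines under strong simplifications. In the sketch below each site carries a single number instead of a function, the local representative of a Voronoi cell is the cell mean, the weak analysis is a tiny 1-D 2-means, and labels are aligned across replicates by always calling the lower-centre cluster 0. All of these choices, the function names and the synthetic data are assumptions for illustration and should not be read as the authors' algorithm.

```python
import random
from collections import Counter

def two_means_1d(values, iters=20):
    """Tiny 1-D 2-means; label 0 is always the cluster with the lower centre."""
    lo, hi = min(values), max(values)
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [0 if abs(v - lo) <= abs(v - hi) else 1 for v in values]
        low = [v for v, lab in zip(values, labels) if lab == 0]
        high = [v for v, lab in zip(values, labels) if lab == 1]
        if low:
            lo = sum(low) / len(low)
        if high:
            hi = sum(high) / len(high)
    return labels

def nearest(site, nuclei):
    """Index of the nucleus closest to a site (its Voronoi cell)."""
    return min(range(len(nuclei)),
               key=lambda c: (site[0] - nuclei[c][0]) ** 2
                             + (site[1] - nuclei[c][1]) ** 2)

def bagging_voronoi(sites, data, n_cells, n_reps, rng):
    votes = [Counter() for _ in sites]
    for _ in range(n_reps):                      # (iii) repeat many times
        nuclei = rng.sample(sites, n_cells)      # random Voronoi tessellation
        cell = [nearest(s, nuclei) for s in sites]
        reps = []
        for c in range(n_cells):                 # (i) local representatives
            members = [data[i] for i, ci in enumerate(cell) if ci == c]
            reps.append(sum(members) / len(members))
        rep_labels = two_means_1d(reps)          # (ii) weak analysis
        for i, ci in enumerate(cell):
            votes[i][rep_labels[ci]] += 1
    return [v.most_common(1)[0][0] for v in votes]  # (iv) bag by majority vote

# Synthetic 10 x 10 lattice: left half near 0, right half near 5.
rng = random.Random(0)
sites = [(x, y) for x in range(10) for y in range(10)]
truth = [0 if x < 5 else 1 for x, y in sites]
data = [5.0 * t + rng.gauss(0, 0.5) for t in truth]

labels = bagging_voronoi(sites, data, n_cells=20, n_reps=50, rng=rng)
accuracy = sum(l == t for l, t in zip(labels, truth)) / len(sites)
print(accuracy)
```

Averaging within cells borrows strength from spatial neighbours without an explicit dependence model, and the vote over many random tessellations smooths out the arbitrary cell boundaries of any single replicate.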
Cluster analysis of word frequency dynamics

NASA Astrophysics Data System (ADS)

Maslennikova, Yu S.; Bochkarev, V. V.; Belashova, I. A.

2015-01-01

This paper describes the analysis and modelling of word usage frequency time series. In a previous study, an assumption was put forward that all word usage frequencies have uniform dynamics approaching the shape of a Gaussian function. This assumption can be checked using the frequency dictionaries of the Google Books Ngram database. This database includes 5.2 million books published between 1500 and 2008. The corpus contains over 500 billion words in American English, British English, French, German, Spanish, Russian, Hebrew, and Chinese. We clustered time series of word usage frequencies using a Kohonen neural network. The similarity between input vectors was estimated using several algorithms. As a result of the neural network training procedure, more than ten different forms of time series were found. They describe the dynamics of word usage frequencies from birth to death of individual words. Different groups of word forms were found to have different dynamics of word usage frequency variations.
Diffusion maps, clustering and fuzzy Markov modeling in peptide folding transitions

NASA Astrophysics Data System (ADS)

Nedialkova, Lilia V.; Amat, Miguel A.; Kevrekidis, Ioannis G.; Hummer, Gerhard

2014-09-01

Using the helix-coil transitions of alanine pentapeptide as an illustrative example, we demonstrate the use of diffusion maps in the analysis of molecular dynamics simulation trajectories. Diffusion maps and other nonlinear data-mining techniques provide powerful tools to visualize the distribution of structures in conformation space. The resulting low-dimensional representations help in partitioning conformation space, and in constructing Markov state models that capture the conformational dynamics. In an initial step, we use diffusion maps to reduce the dimensionality of the conformational dynamics of Ala5. The resulting pretreated data are then used in a clustering step. The identified clusters show excellent overlap with clusters obtained previously by using the backbone dihedral angles as input, with small—but nontrivial—differences reflecting torsional degrees of freedom ignored in the earlier approach. We then construct a Markov state model describing the conformational dynamics in terms of a discrete-time random walk between the clusters. We show that by combining fuzzy C-means clustering with a transition-based assignment of states, we can construct robust Markov state models. This state-assignment procedure suppresses short-time memory effects that result from the non-Markovianity of the dynamics projected onto the space of clusters. In a comparison with previous work, we demonstrate how manifold learning techniques may complement and enhance informed intuition commonly used to construct reduced descriptions of the dynamics in molecular conformation space.
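The last two steps of the abstract (a transition-based assignment of states from fuzzy memberships, then a discrete-time Markov model between clusters) can be illustrated with a toy sketch. Here a frame only changes state when its largest fuzzy membership reaches a threshold, and otherwise keeps the previous state; the transition matrix is then estimated by counting transitions at a fixed lag. The threshold rule, its value of 0.8, the function names and the membership values are assumptions for this example, not the authors' procedure or data.

```python
def transition_based_assignment(memberships, threshold=0.8):
    """Assign a state per frame; switch only on a confident membership."""
    first = memberships[0]
    current = max(range(len(first)), key=lambda k: first[k])
    labels = []
    for m in memberships:
        best = max(range(len(m)), key=lambda k: m[k])
        if m[best] >= threshold:
            current = best       # confident frame: commit to this state
        labels.append(current)   # ambiguous frame: keep the previous state
    return labels

def transition_matrix(labels, n_states, lag=1):
    """Row-normalized transition counts between states at the given lag."""
    counts = [[0] * n_states for _ in range(n_states)]
    for a, b in zip(labels[:-lag], labels[lag:]):
        counts[a][b] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix

# Hypothetical fuzzy memberships of 8 frames in 2 clusters.
memberships = [(0.90, 0.10), (0.60, 0.40), (0.45, 0.55), (0.70, 0.30),
               (0.20, 0.80), (0.10, 0.90), (0.55, 0.45), (0.95, 0.05)]

hard = [max(range(2), key=lambda k: m[k]) for m in memberships]
smooth = transition_based_assignment(memberships)
T = transition_matrix(smooth, n_states=2)
print(hard, smooth, T)
```

In the example the plain arg-max labels flip back and forth on ambiguous frames, while the thresholded labels record a single excursion to state 1. Suppressing such recrossings is the kind of short-time memory effect the abstract says the state-assignment procedure is meant to remove.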
Diffusion maps, clustering and fuzzy Markov modeling in peptide folding transitions

DOE Office of Scientific and Technical Information (OSTI.GOV)

Nedialkova, Lilia V.; Amat, Miguel A.; Kevrekidis, Ioannis G.

Using the helix-coil transitions of alanine pentapeptide as an illustrative example, we demonstrate the use of diffusion maps in the analysis of molecular dynamics simulation trajectories. Diffusion maps and other nonlinear data-mining techniques provide powerful tools to visualize the distribution of structures in conformation space. The resulting low-dimensional representations help in partitioning conformation space, and in constructing Markov state models that capture the conformational dynamics. In an initial step, we use diffusion maps to reduce the dimensionality of the conformational dynamics of Ala5. The resulting pretreated data are then used in a clustering step. The identified clusters show excellent overlap with clusters obtained previously by using the backbone dihedral angles as input, with small, but nontrivial, differences reflecting torsional degrees of freedom ignored in the earlier approach. We then construct a Markov state model describing the conformational dynamics in terms of a discrete-time random walk between the clusters. We show that by combining fuzzy C-means clustering with a transition-based assignment of states, we can construct robust Markov state models. This state-assignment procedure suppresses short-time memory effects that result from the non-Markovianity of the dynamics projected onto the space of clusters. In a comparison with previous work, we demonstrate how manifold learning techniques may complement and enhance informed intuition commonly used to construct reduced descriptions of the dynamics in molecular conformation space.
Diffusion maps, clustering and fuzzy Markov modeling in peptide folding transitions

PubMed Central

Nedialkova, Lilia V.; Amat, Miguel A.; Kevrekidis, Ioannis G.; Hummer, Gerhard

2014-01-01

Using the helix-coil transitions of alanine pentapeptide as an illustrative example, we demonstrate the use of diffusion maps in the analysis of molecular dynamics simulation trajectories. Diffusion maps and other nonlinear data-mining techniques provide powerful tools to visualize the distribution of structures in conformation space. The resulting low-dimensional representations help in partitioning conformation space, and in constructing Markov state models that capture the conformational dynamics. In an initial step, we use diffusion maps to reduce the dimensionality of the conformational dynamics of Ala5. The resulting pretreated data are then used in a clustering step. The identified clusters show excellent overlap with clusters obtained previously by using the backbone dihedral angles as input, with small—but nontrivial—differences reflecting torsional degrees of freedom ignored in the earlier approach. We then construct a Markov state model describing the conformational dynamics in terms of a discrete-time random walk between the clusters. We show that by combining fuzzy C-means clustering with a transition-based assignment of states, we can construct robust Markov state models. This state-assignment procedure suppresses short-time memory effects that result from the non-Markovianity of the dynamics projected onto the space of clusters. In a comparison with previous work, we demonstrate how manifold learning techniques may complement and enhance informed intuition commonly used to construct reduced descriptions of the dynamics in molecular conformation space. PMID:25240340
An efficient matrix-matrix multiplication based antisymmetric tensor contraction engine for general order coupled cluster.

PubMed

Hanrath, Michael; Engels-Putzka, Anna

2010-08-14

In this paper, we present an efficient implementation of general tensor contractions, which is part of a new coupled-cluster program. The tensor contractions, used to evaluate the residuals in each coupled-cluster iteration, are particularly important for the performance of the program. We developed a generic procedure, which carries out contractions of two tensors irrespective of their explicit structure. It can handle coupled-cluster-type expressions of arbitrary excitation level. To make the contraction efficient without losing flexibility, we use a three-step procedure. First, the data contained in the tensors are rearranged into matrices, then a matrix-matrix multiplication is performed, and finally the result is backtransformed to a tensor. The current implementation is significantly more efficient than previous ones capable of treating arbitrarily high excitations.
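The three-step procedure (rearrange into matrices, multiply, back-transform) can be illustrated on a small dense example. The sketch below contracts C[a][b][c][d] = sum over k of A[a][k][b] * B[k][c][d] using plain nested lists. It ignores antisymmetry, sparsity and performance entirely, so it only shows the index bookkeeping; the function names and the test tensors are assumptions for this example.

```python
from itertools import product

def matmul(X, Y):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def contract(A, B):
    """C[a][b][c][d] = sum_k A[a][k][b] * B[k][c][d] via one matrix product."""
    na, nk, nb = len(A), len(A[0]), len(A[0][0])
    nc, nd = len(B[0]), len(B[0][0])
    # Step 1: rearrange. Rows of A_mat are the free pair (a, b), columns are k;
    # rows of B_mat are k, columns are the free pair (c, d).
    A_mat = [[A[a][k][b] for k in range(nk)]
             for a, b in product(range(na), range(nb))]
    B_mat = [[B[k][c][d] for c, d in product(range(nc), range(nd))]
             for k in range(nk)]
    # Step 2: a single matrix-matrix multiplication does the summation over k.
    C_mat = matmul(A_mat, B_mat)
    # Step 3: back-transform the (a*nb + b, c*nd + d) matrix to a 4-index tensor.
    return [[[[C_mat[a * nb + b][c * nd + d] for d in range(nd)]
              for c in range(nc)]
             for b in range(nb)]
            for a in range(na)]

def contract_direct(A, B):
    """Reference implementation with an explicit sum, for checking only."""
    na, nk, nb = len(A), len(A[0]), len(A[0][0])
    nc, nd = len(B[0]), len(B[0][0])
    return [[[[sum(A[a][k][b] * B[k][c][d] for k in range(nk))
               for d in range(nd)]
              for c in range(nc)]
             for b in range(nb)]
            for a in range(na)]

# Small integer tensors: A has shape (2, 3, 2), B has shape (3, 2, 2).
A = [[[a + 2 * k - b for b in range(2)] for k in range(3)] for a in range(2)]
B = [[[k * c + d + 1 for d in range(2)] for c in range(2)] for k in range(3)]

C = contract(A, B)
print(C == contract_direct(A, B))
```

The point of the rearrangement is that, once both operands are matrices, the summation can be handed to a single optimized matrix-multiplication routine regardless of how many indices the original tensors had.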
Competency Based Teacher Education Component. Curriculum Methods and Materials, Elementary Mathematics and Social Studies.

ERIC Educational Resources Information Center

Woodworth, William D.

Four mathematical/social studies module clusters are presented in an effort to develop proficiency in instruction and in inductive and deductive teaching procedures. Modules within the first cluster concern systems of numeration, set operations, numbers, measurement, geometry, mathematics, and reasoning. The second mathematical cluster presents…
Use of market segmentation to identify untapped consumer needs in vision correction surgery for future growth.

PubMed

Loarie, Thomas M; Applegate, David; Kuenne, Christopher B; Choi, Lawrence J; Horowitz, Diane P

2003-01-01

Market segmentation analysis identifies discrete segments of the population whose beliefs are consistent with exhibited behaviors such as purchase choice. This study applies market segmentation analysis to low myopes (-1 to -3 D with less than 1 D cylinder) in their consideration and choice of a refractive surgery procedure to discover opportunities within the market. A quantitative survey based on focus group research was sent to a demographically balanced sample of myopes using contact lenses and/or glasses. A variable reduction process followed by a clustering analysis was used to discover discrete belief-based segments. The resulting segments were validated both analytically and through in-market testing. Discontented individuals who wear contact lenses are the primary target for vision correction surgery. However, 81% of the target group is apprehensive about laser in situ keratomileusis (LASIK). They are nervous about the procedure and strongly desire reversibility and exchangeability. There exists a large untapped opportunity for vision correction surgery within the low myope population. Market segmentation analysis helped determine how to best meet this opportunity through repositioning existing procedures or developing new vision correction technology, and could also be applied to identify opportunities in other vision correction populations.
On the problem of nonsense correlations in allergological tests after routine extraction.

PubMed

Rijckaert, G

1981-01-01

The influence of extraction procedures and culturing methods of material used for the preparation of allergenic extracts on correlation patterns found in allergological testing (skin test and RAST) was investigated. In our laboratory a short extraction procedure performed at 0 degrees C was used for Aspergillus repens, A. penicilloides, Wallemia sebi, their rearing media and non-inoculated medium. For the commercially available extracts from house dust, house-dust mite, pollen of Dactylus glomerata and A. penicilloides a longer procedure (several days) performed at room temperature was used. Statistical analysis showed a separation of all test results into two clusters, each cluster being composed of correlations between extracts from only one of the manufacturers; extracts from different manufacturers did not show any correlation. The correlations found between the short time incubated extracts of the xerophilic fungi and their rearing media could be explained by genetical and biochemical relationships between these fungi depending on ecological conditions. However, while the correlation found between house dust and house-dust mite is understandable, correlations found between long time incubated extracts from house-dust mite and D. glomerata or A. penicilloides may be nonsense correlations that do not adequately describe the in vivo situation. The similarity of these extracts is presumably artificially created during extraction.
Utility of correlation techniques in gravity and magnetic interpretation

NASA Technical Reports Server (NTRS)

Chandler, V. W.; Koski, J. S.; Braile, L. W.; Hinze, W. J.

1977-01-01

Two methods of quantitative combined analysis, internal correspondence and clustering, are presented. Model studies are used to illustrate implementation and interpretation procedures of these methods, particularly internal correspondence. Analysis of the results of applying these methods to data from the midcontinent and a transcontinental profile show they can be useful in identifying crustal provinces, providing information on horizontal and vertical variations of physical properties over province size zones, validating long wave-length anomalies, and isolating geomagnetic field removal problems. Thus, these techniques are useful in considering regional data acquired by satellites.
The CNO Bi-cycle in the Open Cluster NGC 752

NASA Astrophysics Data System (ADS)

Hawkins, Keith; Schuler, S.; King, J.; The, L.

2011-01-01

The CNO bi-cycle is the primary energy source for main sequence stars more massive than the sun. To test our understanding of stellar evolution models using the CNO bi-cycle, we have undertaken light-element (CNO) abundance analysis of three main sequence dwarf stars and three red giant stars in the open cluster NGC 752 utilizing high resolution (R ≈ 50,000) spectroscopy from the Keck Observatory. Preliminary results indicate, as expected, there is a depletion of carbon in the giants relative to the dwarfs. Additional analysis is needed to determine if the amount of depletion is in line with model predictions, as seen in the Hyades open cluster. Oxygen abundances are derived from the high-excitation O I triplet, and there is a 0.19 dex offset in the [O/H] abundances between the giants and dwarfs which may be explained by non-local thermodynamic equilibrium (NLTE), although further analysis is needed to verify this. The standard procedure for spectroscopically determining stellar parameters used here allows for a measurement of the cluster metallicity, [Fe/H] = 0.04 ± 0.02. In addition to the Fe abundances we have determined Na, Mg, and Al abundances to determine the status of other nucleosynthesis processes. The Na, Mg and Al abundances of the giants are enhanced relative to the dwarfs, which is consistent with similar findings in giants of other open clusters. Support for K. Hawkins was provided by the NOAO/KPNO Research Experiences for Undergraduates (REU) Program which is funded by the National Science Foundation Research Experiences for Undergraduates Program and the Department of Defense ASSURE program through Scientific Program Order No. 13 (AST-0754223) of the Cooperative Agreement No. AST-0132798 between the Association of Universities for Research in Astronomy (AURA) and the NSF.
Development and selection of Asian-specific humeral implants based on statistical atlas: toward planning minimally invasive surgery.

PubMed

Wu, K; Daruwalla, Z J; Wong, K L; Murphy, D; Ren, H

2015-08-01

The commercial humeral implants based on the Western population are currently not entirely compatible with Asian patients, due to differences in bone size, shape and structure. Surgeons may have to compromise or use different implants that are less conforming, which may cause complications as well as inconvenience in positioning the implant. The construction of Asian humerus atlases of different clusters has therefore been proposed to eradicate this problem and to facilitate planning minimally invasive surgical procedures [6,31]. According to the features of the atlases, new implants could be designed specifically for different patients. Furthermore, an automatic implant selection algorithm has been proposed as well in order to reduce the complications caused by implant and bone mismatch. Prior to the design of the implant, data clustering and extraction of the relevant features were carried out on the datasets of each gender. The fuzzy C-means clustering method is explored in this paper. In addition, two new implant selection schemes, namely the Procrustes analysis-based (PA-based) scheme and the group average distance-based (GAD-based) scheme, were proposed to better search the database for implants matching new patients. Neither algorithm has been used in this area before, yet both turn out to have excellent performance in implant selection. Additionally, algorithms to calculate the matching scores between various implants and the patient data are proposed in this paper to assist the implant selection procedure. The results obtained have indicated the feasibility of the proposed development and selection scheme. The 16 sets of male data were divided into two clusters with 8 and 8 subjects, respectively, and the 11 female datasets were also divided into two clusters with 5 and 6 subjects, respectively. 
Based on the features of each cluster, the implants designed by the proposed algorithm fit very well on their reference humeri and the proposed implant selection procedure allows for a scenario of treating a patient with merely a preoperative anatomical model in order to correctly select the implant that has the best fit. Based on the leave-one-out validation, it can be concluded that both the PA-based method and GAD-based method are able to achieve excellent performance when dealing with the problem of implant selection. The accuracy and average execution time for the PA-based method were 100 % and 0.132 s, respectively, while those of the GAD-based method were 100 % and 0.058 s. Therefore, the GAD-based method outperformed the PA-based method in terms of execution speed. The primary contributions of this paper include the proposal of methods for development of Asian-, gender- and cluster-specific implants based on shape features and selection of the best fit implants for future patients according to their features. To the best of our knowledge, this is the first work that proposes implant design and selection for Asian patients automatically based on features extracted from cluster-specific statistical atlases.
Identification of subsurface microorganisms at Yucca Mountain; Second quarterly report, October 1, 1993--December 31, 1993

DOE Office of Scientific and Technical Information (OSTI.GOV)

Stetzenbach, L.D.

1993-12-31

The primary effort of this past quarter was to develop a procedure where accumulated data files could be evaluated to determine the naming consistency and inter-relationships of the various species which have been identified by the Microbial Identification System (MIDI). This involved a series of steps, including the clustering of similarly named organisms in a dendrogram format to determine how closely similarly named isolates are related. The experience of other researchers using the MIDI system has shown that clusters which are joined at a Euclidean distance of 10 or less belong to the same species. Strains which are very similar cluster at less than 6 Euclidean units and clusters below two units have nearly identical fatty acid patterns. When the dendrograms derived from the springs were scrutinized, some organisms were found which did not match the pattern of their named group. Then a decision was made whether to rename the isolates and exclude them from the group or redefine the group. This decision was assisted by plotting the principal components derived from an analysis of the fatty acid composition of members of the genus. Each species can be examined by the same procedure to determine group homogeneity. In these 2-dimensional plots members of the same species are roughly bounded by a box of 100 squared units while closely related strains are grouped more tightly together. The 2-dimensional plot of isolates of Micrococcus luteus demonstrates the presence of three identifiable sub-species.
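The distance cut-offs quoted in this record (10, 6 and 2 Euclidean units) can be read as a simple decision rule. The following is a minimal Python sketch of that rule only; the function names and the example fatty acid profiles are hypothetical, and it assumes a plain pairwise Euclidean distance between two profiles, which may differ from the dendrogram joining distance that the MIDI software reports.

```python
import math

def relatedness_from_distance(distance):
    """Map a Euclidean distance to the relatedness levels quoted in the abstract."""
    if distance < 2:
        return "nearly identical fatty acid patterns"
    if distance < 6:
        return "very similar strains"
    if distance <= 10:
        return "same species"
    return "different species"

def compare_profiles(profile_a, profile_b):
    """Compare two fatty acid profiles (hypothetical lists of percentages)."""
    distance = math.dist(profile_a, profile_b)  # plain Euclidean distance
    return distance, relatedness_from_distance(distance)

# Hypothetical two-component profiles, chosen only to give a round distance.
example_distance, example_label = compare_profiles([10.0, 20.0], [13.0, 24.0])
```

Isolates whose label disagrees with their assigned name would be the candidates for renaming or for redefining the group, as the record describes.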
The Minnesota Center for Twin and Family Research Genome-Wide Association Study

PubMed Central

Miller, Michael B.; Basu, Saonli; Cunningham, Julie; Eskin, Eleazar; Malone, Steven M.; Oetting, William S.; Schork, Nicholas; Sul, Jae Hoon; Iacono, William G.; Mcgue, Matt

2012-01-01

As part of the Genes, Environment and Development Initiative (GEDI), the Minnesota Center for Twin and Family Research (MCTFR) undertook a genome-wide association study (GWAS), which we describe here. A total of 8405 research participants, clustered in 4-member families, have been successfully genotyped on 527,829 single nucleotide polymorphism (SNP) markers using Illumina’s Human660W-Quad array. Quality control screening of samples and markers as well as SNP imputation procedures are described. We also describe methods for ancestry control and how the familial clustering of the MCTFR sample can be accounted for in the analysis using a Rapid Feasible Generalized Least Squares algorithm. The rich longitudinal MCTFR assessments provide numerous opportunities for collaboration. PMID:23363460
Stability and change in adolescent spirituality/religiosity: a person-centered approach.

PubMed

Good, Marie; Willoughby, Teena; Busseri, Michael A

2011-03-01

Although there has been a substantial increase over the past decade in studies that have examined the psychosocial correlates of spirituality/religiosity in adolescence, very little is known about spirituality/religiosity as a domain of development in its own right. To address this limitation, the authors identified configurations of multiple dimensions of spirituality/religiosity across 2 time points with an empirical classification procedure (cluster analysis) and assessed development in these configurations at the sample and individual level. Participants included 756 predominately Canadian-born adolescents (53% female, 47% male) from southern Ontario, Canada, who completed a survey in Grade 11 (M age = 16.41 years) and Grade 12 (M age = 17.36 years). Measures included religious activity involvement, enjoyment of religious activities, the Spiritual Transcendence Index, wondering about spiritual issues, frequency of prayer, and frequency of meditation. Sample-level development (structural stability and change) was assessed by examining whether the structural configurations of the clusters were consistent over time. Individual-level development was assessed by examining intraindividual stability and change in cluster membership over time. Results revealed that a five-cluster solution was optimal at both grades. Clusters were identified as aspiritual/irreligious, disconnected wonderers, high institutional and personal, primarily personal, and meditators. With the exception of the high institutional and personal cluster, the cluster structures were stable over time. There also was significant intraindividual stability in all clusters over time; however, a significant proportion of individuals classified as high institutional and personal in Grade 11 moved into the primarily personal cluster in Grade 12. PsycINFO Database Record (c) 2011 APA, all rights reserved.
Spatial scan statistics for detection of multiple clusters with arbitrary shapes.

PubMed

Lin, Pei-Sheng; Kung, Yi-Hung; Clayton, Murray

2016-12-01

In applying scan statistics for public health research, it would be valuable to develop a detection method for multiple clusters that accommodates spatial correlation and covariate effects in an integrated model. In this article, we connect the concepts of the likelihood ratio (LR) scan statistic and the quasi-likelihood (QL) scan statistic to provide a series of detection procedures sufficiently flexible to apply to clusters of arbitrary shape. First, we use an independent scan model for detection of clusters and then a variogram tool to examine the existence of spatial correlation and regional variation based on residuals of the independent scan model. When the estimate of regional variation is significantly different from zero, a mixed QL estimating equation is developed to estimate coefficients of geographic clusters and covariates. We use the Benjamini-Hochberg procedure (1995) to find a threshold for p-values to address the multiple testing problem. A quasi-deviance criterion is used to regroup the estimated clusters to find geographic clusters with arbitrary shapes. We conduct simulations to compare the performance of the proposed method with other scan statistics. For illustration, the method is applied to enterovirus data from Taiwan. © 2016, The International Biometric Society.
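The Benjamini-Hochberg step mentioned in this record can be illustrated with a short Python sketch. This is the standard step-up rule for finding a p-value threshold, not the authors' code; the function names, the example p-values and the level q = 0.05 are assumptions made for illustration.

```python
def bh_threshold(p_values, q=0.05):
    """Return the largest sorted p-value p_(k) with p_(k) <= k*q/m, or None."""
    m = len(p_values)
    threshold = None
    for k, p in enumerate(sorted(p_values), start=1):
        if p <= k * q / m:
            threshold = p  # keep the largest p-value that meets its bound
    return threshold

def bh_reject(p_values, q=0.05):
    """Flag each hypothesis whose p-value is at or below the BH threshold."""
    threshold = bh_threshold(p_values, q)
    return [threshold is not None and p <= threshold for p in p_values]

# Hypothetical p-values for ten candidate clusters.
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
example_threshold = bh_threshold(example_p)
example_flags = bh_reject(example_p)
```

With these example values only the two smallest p-values fall at or below the threshold, so only those two candidate clusters would be retained.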
Harmonic decomposition of magneto-optical signal from suspensions of superparamagnetic nanoparticles

NASA Astrophysics Data System (ADS)

Patterson, Cody; Syed, Maarij; Takemura, Yasushi

2018-04-01

Magnetic nanoparticles (MNPs) are widely used in biomedical applications. Characterizing dilute suspensions of superparamagnetic iron oxide nanoparticles (SPIONs) in bio-relevant media is particularly valuable for magnetic particle imaging, hyperthermia, drug delivery, etc. Here, we study dilute aqueous suspensions of single-domain magnetite nanoparticles using an AC Faraday rotation (FR) setup. The setup uses an oscillating magnetic field (800 Hz) which generates a multi-harmonic response. Each harmonic is collected and analyzed using the Fourier components of the theoretical signal determined by a Langevin-like magnetization. With this procedure, we determine the average magnetic moment per particle μ , particle number density n, and Verdet constant of the sample. The fitted values of μ and n are shown to be consistent across each harmonic. Additionally, we present the results of these parameters as n is varied. The large values of μ reveal the possibility of clustering as reported in other literature. This suggests that μ is representative of the average magnetic moment per cluster of nanoparticles. Multiple factors, including the external magnetic field, surfactant degradation, and laser absorption, can contribute to dynamic and long-term aggregation leading to FR signals that represent space- and time-averaged sample parameters. Using this powerful analysis procedure, future studies are aimed at determining the clustering mechanisms in this AC system and characterizing SPION suspensions at different frequencies and viscosities.
Multiple-locus variable number of tandem repeat analysis as a tool for molecular epidemiology of botulism: The Italian experience.

PubMed

Anniballi, Fabrizio; Fillo, Silvia; Giordani, Francesco; Auricchio, Bruna; Tehran, Domenico Azarnia; di Stefano, Enrica; Mandarino, Giuseppina; De Medici, Dario; Lista, Florigio

2016-12-01

Clostridium botulinum is the bacterial agent of botulism, a rare but severe neuro-paralytic disease. Because of its high impact, in Italy botulism is monitored by an ad hoc surveillance system. The National Reference Centre for Botulism, as part of this system, collects and analyzes all demographic, epidemiologic, microbiological, and molecular data recovered during cases and/or outbreaks that occurred in Italy. A panel of 312 C. botulinum strains belonging to group I were submitted to MLVA sub-typing. Strains, isolated from clinical specimens, food and environmental samples collected during the surveillance activities, were representative of all forms of botulism from all Italian regions. Through clustering analysis isolates were grouped into 12 main clusters. No regional or temporal clustering was detected, demonstrating the high heterogeneity of strains circulating in Italy. This study confirmed that MLVA is capable of sub-typing C. botulinum strains. Moreover, MLVA is effective at tracing and tracking the source of contamination and is helpful for the surveillance system in terms of planning and upgrading of procedures, activities and data collection forms. Copyright © 2016 Elsevier B.V. All rights reserved.
Biochemical imaging of tissues by SIMS for biomedical applications

NASA Astrophysics Data System (ADS)

Lee, Tae Geol; Park, Ji-Won; Shon, Hyun Kyong; Moon, Dae Won; Choi, Won Woo; Li, Kapsok; Chung, Jin Ho

2008-12-01

With the development of optimal surface cleaning techniques by cluster ion beam sputtering, certain applications of SIMS for analyzing cells and tissues have been actively investigated. For this report, we collaborated with bio-medical scientists to study bio-SIMS analyses of skin and cancer tissues for biomedical diagnostics. We pay close attention to the setting up of a routine procedure for preparing tissue specimens and treating the surface before obtaining the bio-SIMS data. Bio-SIMS was used to study two biosystems, skin tissues for understanding the effects of photoaging and colon cancer tissues for insight into the development of new cancer diagnostics. Time-of-flight SIMS imaging measurements were taken after surface cleaning with cluster ion bombardment by Bi_n or C_60 under varying conditions. The imaging capability of bio-SIMS with a spatial resolution of a few microns combined with principal component analysis reveals biologically meaningful information, but the lack of high molecular weight peaks even with cluster ion bombardment was a problem. This, among other problems, shows that discourse with biologists and medical doctors is critical to glean any meaningful information from SIMS mass spectrometric and imaging data. For SIMS to be accepted as a routine, daily analysis tool in biomedical laboratories, various practical sample handling methodologies such as surface matrix treatment, including nano-metal particles and metal coating, in addition to cluster sputtering, should be studied.
Characterization of edible seaweed harvested on the Galician coast (northwestern Spain) using pattern recognition techniques and major and trace element data.

PubMed

Romarís-Hortas, Vanessa; García-Sartal, Cristina; Barciela-Alonso, María Carmen; Moreda-Piñeiro, Antonio; Bermejo-Barrera, Pilar

2010-02-10

Major and trace elements in North Atlantic seaweed originating from Galicia (northwestern Spain) were determined by using inductively coupled plasma-optical emission spectrometry (ICP-OES) (Ba, Ca, Cu, K, Mg, Mn, Na, Sr, and Zn), inductively coupled plasma-mass spectrometry (ICP-MS) (Br and I) and hydride generation-atomic fluorescence spectrometry (HG-AFS) (As). Pattern recognition techniques were then used to classify the edible seaweed according to their type (red, brown, and green seaweed) and also their variety (Wakame, Fucus, Sea Spaghetti, Kombu, Dulse, Nori, and Sea Lettuce). Principal component analysis (PCA) and cluster analysis (CA) were used as exploratory techniques, and linear discriminant analysis (LDA) and soft independent modeling of class analogy (SIMCA) were used as classification procedures. In total, 12 elements were determined in a range of 35 edible seaweed samples (20 brown seaweed, 10 red seaweed, 4 green seaweed, and 1 canned seaweed). Natural groupings of the samples (brown, red, and green types) were observed using PCA and CA (squared Euclidean distance between objects and Ward method as clustering procedure). The application of LDA gave correct assignation percentages of 100% for brown, red, and green types at a significance level of 5%. However, a satisfactory classification (recognition and prediction) using SIMCA was obtained only for red seaweed (100% of cases correctly classified), whereas percentages of 89 and 80% were obtained for brown seaweed for recognition (training set) and prediction (testing set), respectively.

A new method for mapping multidimensional data to lower dimensions

NASA Technical Reports Server (NTRS)

Gowda, K. C.

1983-01-01

A multispectral mapping method is proposed which is based on the new concept of BEND (Bidimensional Effective Normalised Difference). The method, which involves taking one sample point at a time and finding the interrelationships between its features, is found very economical from the point of view of storage and processing time. It has good dimensionality reduction and clustering properties, and is highly suitable for computer analysis of large amounts of data. The transformed values obtained by this procedure are suitable for either a planar 2-space mapping of geological sample points or for making grayscale and color images of geo-terrains. A few examples are given to justify the efficacy of the proposed procedure.
Variable number of tandem repeats and pulsed-field gel electrophoresis cluster analysis of enterohemorrhagic Escherichia coli serovar O157 strains.

PubMed

Yokoyama, Eiji; Uchimura, Masako

2007-11-01

Ninety-five enterohemorrhagic Escherichia coli serovar O157 strains, including 30 strains isolated from 13 intrafamily outbreaks and 14 strains isolated from 3 mass outbreaks, were studied by pulsed-field gel electrophoresis (PFGE) and variable number of tandem repeats (VNTR) typing, and the resulting data were subjected to cluster analysis. Cluster analysis of the VNTR typing data revealed that 57 (60.0%) of 95 strains, including all epidemiologically linked strains, formed clusters with at least 95% similarity. Cluster analysis of the PFGE patterns revealed that 67 (70.5%) of 95 strains, including all but 1 of the epidemiologically linked strains, formed clusters with 90% similarity. The number of epidemiologically unlinked strains forming clusters was significantly less by VNTR cluster analysis than by PFGE cluster analysis. The congruence value between PFGE and VNTR cluster analysis was low and did not show an obvious correlation. With two-step cluster analysis, the number of clustered epidemiologically unlinked strains by PFGE cluster analysis that were divided by subsequent VNTR cluster analysis was significantly higher than the number by VNTR cluster analysis that were divided by subsequent PFGE cluster analysis. These results indicate that VNTR cluster analysis is more efficient than PFGE cluster analysis as an epidemiological tool to trace the transmission of enterohemorrhagic E. coli O157.
Non-specific filtering of beta-distributed data.

PubMed

Wang, Xinhui; Laird, Peter W; Hinoue, Toshinori; Groshen, Susan; Siegmund, Kimberly D

2014-06-19

Non-specific feature selection is a dimension reduction procedure performed prior to cluster analysis of high dimensional molecular data. Not all measured features are expected to show biological variation, so only the most varying are selected for analysis. In DNA methylation studies, DNA methylation is measured as a proportion, bounded between 0 and 1, with variance a function of the mean. Filtering on standard deviation biases the selection of probes to those with mean values near 0.5. We explore the effect this has on clustering, and develop alternate filter methods that utilize a variance stabilizing transformation for Beta distributed data and do not share this bias. We compared results for 11 different non-specific filters on eight Infinium HumanMethylation data sets, selected to span a variety of biological conditions. We found that for data sets having a small fraction of samples showing abnormal methylation of a subset of normally unmethylated CpGs, a characteristic of the CpG island methylator phenotype in cancer, a novel filter statistic that utilized a variance-stabilizing transformation for Beta distributed data outperformed the common filter of using standard deviation of the DNA methylation proportion, or its log-transformed M-value, in its ability to detect the cancer subtype in a cluster analysis. However, the standard deviation filter always performed among the best for distinguishing subgroups of normal tissue. The novel filter and standard deviation filter tended to favour features in different genome contexts; for the same data set, the novel filter always selected more features from CpG island promoters and the standard deviation filter always selected more features from non-CpG island intergenic regions. Interestingly, despite selecting largely non-overlapping sets of features, the two filters did find sample subsets that overlapped for some real data sets. 
We found two different filter statistics that tended to prioritize features with different characteristics, each performed well for identifying clusters of cancer and non-cancer tissue, and identifying a cancer CpG island hypermethylation phenotype. Since cluster analysis is for discovery, we would suggest trying both filters on any new data sets, evaluating the overlap of features selected and clusters discovered.
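The bias described in this record, where filtering on standard deviation favours probes with mean methylation near 0.5, can be reproduced in a small simulation. The Python sketch below assumes the arcsine square-root transform as the variance-stabilizing transformation; this is a classical choice for proportions and is used here only as a stand-in, since the record does not say which transformation the authors' filter uses. The means, precision and sample size are hypothetical.

```python
import math
import random
import statistics

def arcsine_sqrt(p):
    """Classical variance-stabilizing transform for a proportion in [0, 1]."""
    return math.asin(math.sqrt(p))

def simulate_feature(mean, precision, n, rng):
    """Draw n Beta-distributed proportions with the given mean and precision (a + b)."""
    a = mean * precision
    b = (1.0 - mean) * precision
    return [rng.betavariate(a, b) for _ in range(n)]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
low = simulate_feature(0.1, 100.0, 20000, rng)  # mostly unmethylated probe
mid = simulate_feature(0.5, 100.0, 20000, rng)  # half-methylated probe

# Same precision, yet the raw SD is clearly larger near 0.5.
raw_ratio = statistics.stdev(mid) / statistics.stdev(low)

# After the transform the two SDs are roughly equal.
vst_ratio = (statistics.stdev([arcsine_sqrt(p) for p in mid])
             / statistics.stdev([arcsine_sqrt(p) for p in low]))
```

A filter that ranks probes by the SD of the transformed values would therefore not favour the probe with mean 0.5 merely because of where its mean lies.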
Tests for informative cluster size using a novel balanced bootstrap scheme.

PubMed

Nevalainen, Jaakko; Oja, Hannu; Datta, Somnath

2017-07-20

Clustered data are often encountered in biomedical studies, and to date, a number of approaches have been proposed to analyze such data. However, the phenomenon of informative cluster size (ICS) is a challenging problem, and its presence has an impact on the choice of a correct analysis methodology. For example, Dutta and Datta (2015, Biometrics) presented a number of marginal distributions that could be tested. Depending on the nature and degree of informativeness of the cluster size, these marginal distributions may differ, as do the choices of the appropriate test. In particular, they applied their new test to a periodontal data set where the plausibility of the informativeness was mentioned, but no formal test for the same was conducted. We propose bootstrap tests for testing the presence of ICS. A balanced bootstrap method is developed to successfully estimate the null distribution by merging the re-sampled observations with closely matching counterparts. Relying on the assumption of exchangeability within clusters, the proposed procedure performs well in simulations even with a small number of clusters, at different distributions and against different alternative hypotheses, thus making it an omnibus test. We also explain how to extend the ICS test to a regression setting and thereby enhancing its practical utility. The methodologies are illustrated using the periodontal data set mentioned earlier. Copyright © 2017 John Wiley & Sons, Ltd.
Long-term surface EMG monitoring using K-means clustering and compressive sensing

NASA Astrophysics Data System (ADS)

Balouchestani, Mohammadreza; Krishnan, Sridhar

2015-05-01

In this work, we present an advanced K-means clustering algorithm based on Compressed Sensing theory (CS) in combination with the K-Singular Value Decomposition (K-SVD) method for clustering of long-term recordings of surface Electromyography (sEMG) signals. The long-term monitoring of sEMG signals aims at recording the electrical activity produced by muscles, which is a very useful procedure for treatment and diagnostic purposes as well as for detection of various pathologies. The proposed algorithm is examined for three scenarios of sEMG signals including a healthy person (sEMG-Healthy), a patient with myopathy (sEMG-Myopathy), and a patient with neuropathy (sEMG-Neuropathy), respectively. The proposed algorithm can easily scan large sEMG datasets of long-term sEMG recording. We test the proposed algorithm with Principal Component Analysis (PCA) and Linear Correlation Coefficient (LCC) dimensionality reduction methods. Then, the output of the proposed algorithm is fed to K-Nearest Neighbours (K-NN) and Probabilistic Neural Network (PNN) classifiers in order to calculate the clustering performance. The proposed algorithm achieves a classification accuracy of 99.22%. This ability allows reducing 17% of Average Classification Error (ACE), 9% of Training Error (TE), and 18% of Root Mean Square Error (RMSE). The proposed algorithm also reduces 14% clustering energy consumption compared to the existing K-Means clustering algorithm.
Personality traits and clinical/biochemical course in the first year after kidney transplant.

PubMed

Thomas, Caroline Venzon; de Castro, Elisa Kern; Antonello, Ivan Carlos Ferreira

2016-10-01

The relationship between personality and health is frequently studied in scientific research. This study investigated the clinical/biochemical course of kidney transplant patients based on personality traits. A longitudinal study assessed 114 kidney transplant patients (men = 68 and women = 46) with an average age of 47.72 years (SD = 11.4). Personality was evaluated using the Brazilian Factorial Personality Inventory (BFP/Big Five Model). Clinical variables were analyzed based on patient charts (estimated glomerular filtration rate (eGFR), hypertension, acute rejection, infection, graft loss, and death). Personality types were assessed by hierarchical cluster analysis. Two groups with personality types were differentiated by psychological characteristics: Cluster 1 - average neuroticism, high surgency, agreeableness and conscientiousness, and low openness; Cluster 2 - high neuroticism, average surgency and agreeableness, average conscientiousness, and low openness. There was no statistically significant difference between the clusters in terms of hypertension, acute infection, graft loss, death, and Human Leukocyte Antigen (HLA) I and II panel reactive antibodies. eGFR was associated with the personality types. Cluster 2 was associated with a better renal function in the 9-month follow-up period after kidney transplantation. In this study, patients from Cluster 2 exhibited higher eGFR 9 months after the transplant procedure compared to those from Cluster 1. Monitoring these patients over a longer period may provide a better understanding of the relationship between personality traits and clinical course during the post-transplant period.
Statistical Features of the 2010 Beni-Ilmane, Algeria, Aftershock Sequence

NASA Astrophysics Data System (ADS)

Hamdache, M.; Peláez, J. A.; Gospodinov, D.; Henares, J.

2018-03-01

The aftershock sequence of the 2010 Beni-Ilmane (M_W 5.5) earthquake is studied in depth to analyze the spatial and temporal variability of seismicity parameters of the relationships modeling the sequence. The b value of the frequency-magnitude distribution is examined rigorously. A threshold magnitude of completeness equal to 2.1, using the maximum curvature procedure or the changing point algorithm, and a b value equal to 0.96 ± 0.03 have been obtained for the entire sequence. Two clusters have been identified and characterized by their faulting type, exhibiting b values equal to 0.99 ± 0.05 and 1.04 ± 0.05. Additionally, the temporal decay of the aftershock sequence was examined using a stochastic point process. The analysis was done through the restricted epidemic-type aftershock sequence (RETAS) stochastic model, which allows the possibility to recognize the prevailing clustering pattern of the relaxation process in the examined area. The analysis selected the epidemic-type aftershock sequence (ETAS) model to offer the most appropriate description of the temporal distribution, which presumes that all events in the sequence can cause secondary aftershocks. Finally, the fractal dimensions are estimated using the integral correlation. The obtained D_2 values are 2.15 ± 0.01, 2.23 ± 0.01 and 2.17 ± 0.02 for the entire sequence, and for the first and second cluster, respectively. An analysis of the temporal evolution of the fractal dimensions D_{-2}, D_0, D_2 and the spectral slope has been also performed to derive and characterize the different clusters included in the sequence.
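A b value such as the one reported in this record is commonly obtained with the Aki maximum-likelihood estimator, b = log10(e) / (mean(M) - (Mc - dM/2)), with an approximate standard error of b / sqrt(N). The Python sketch below applies that estimator to a synthetic catalogue; the record does not state which estimator the authors used, so this is an assumed illustration, and the function name and the synthetic data are hypothetical. Only the completeness magnitude of 2.1 comes from the record.

```python
import math
import random

def b_value_mle(magnitudes, mc, bin_width=0.0):
    """Aki maximum-likelihood b value for events at or above completeness magnitude mc.

    bin_width is the magnitude rounding interval (0.0 for unrounded magnitudes).
    Returns (b, standard_error) with standard_error approximated as b / sqrt(N).
    """
    above = [m for m in magnitudes if m >= mc]
    mean_m = sum(above) / len(above)
    b = math.log10(math.e) / (mean_m - (mc - bin_width / 2.0))
    return b, b / math.sqrt(len(above))

# Synthetic catalogue: Gutenberg-Richter magnitudes with a true b of 1.0 above Mc = 2.1.
rng = random.Random(42)
true_b, mc = 1.0, 2.1
catalogue = [mc + rng.expovariate(true_b * math.log(10)) for _ in range(20000)]
b_hat, b_err = b_value_mle(catalogue, mc)
```

Because the synthetic magnitudes are unrounded, bin_width stays at 0.0; for a real catalogue reported to one decimal place it would be set to 0.1.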
Disordered eating in a Swedish community sample of adolescent girls: subgroups, stability, and associations with body esteem, deliberate self-harm and other difficulties.

PubMed

Viborg, Njördur; Wångby-Lundh, Margit; Lundh, Lars-Gunnar; Wallin, Ulf; Johnsson, Per

2018-01-01

The developmental study of subtypes of disordered eating (DE) during adolescence may be relevant to understand the development of eating disorders. The purpose of the present study was to identify subgroups with different profiles of DE in a community sample of adolescent girls aged 13-15 years, and to study the stability of these profiles and subgroups over a one-year interval in order to find patterns that may need to be addressed in further research and prevention. Cluster analysis according to the LICUR procedure was performed on five aspects of DE, and the structural and individual stability of these clusters was analysed. The clusters were compared with regard to BMI, body esteem, deliberate self-harm, and other kinds of psychological difficulties. The analysis revealed six clusters (Multiple eating problems including purging, Multiple eating problems without purging, Social eating problems, Weight concerns, Fear of not being able to stop eating, and No eating problems) all of which had structurally stable profiles and five of which showed stability at the individual level. The more pronounced DE clusters (Multiple eating problems including/without purging) were consistently associated with higher levels of psychological difficulties and lower levels of body esteem. Furthermore, girls that reported purging reported engaging in self-harm to a larger extent. Subgroups of 13-15 year old girls show stable patterns of disordered eating that are associated with higher rates of psychological impairment and lower body esteem. The subgroup of girls who engage in purging also engage in more deliberate self-harm.
Implementation of client versus care-provider strategies to improve external cephalic version rates: a cluster randomized controlled trial.

PubMed

Vlemmix, Floortje; Rosman, Ageeth N; Rijnders, Marlies E; Beuckens, Antje; Opmeer, Brent C; Mol, Ben W J; Kok, Marjolein; Fleuren, Margot A H

2015-05-01

To determine the effectiveness of a client or care-provider strategy to improve the implementation of external cephalic version. Cluster randomized controlled trial. Twenty-five clusters; hospitals and their referring midwifery practices randomly selected in the Netherlands. Singleton breech presentation from 32 weeks of gestation onwards. We randomized clusters to a client strategy (written information leaflets and decision aid), a care-provider strategy (1-day counseling course focused on knowledge and counseling skills), a combined client and care-provider strategy and care-as-usual strategy. We performed an intention-to-treat analysis. Rate of external cephalic version in various strategies. Secondary outcomes were the percentage of women counseled and opting for a version attempt. The overall implementation rate of external cephalic version was 72% (1169 of 1613 eligible clients) with a range between clusters of 8-95%. Neither the client strategy (OR 0.8, 95% CI 0.4-1.5) nor the care-provider strategy (OR 1.2, 95% CI 0.6-2.3) showed significant improvements. Results were comparable when we limited the analysis to those women who were actually offered intervention (OR 0.6, 95% CI 0.3-1.4 and OR 2.0, 95% CI 0.7-4.5). Neither a client nor a care-provider strategy improved the external cephalic version implementation rate for breech presentation, neither with regard to the number of version attempts offered nor the number of women accepting the procedure. © 2015 Nordic Federation of Societies of Obstetrics and Gynecology.
Computer object segmentation by nonlinear image enhancement, multidimensional clustering, and geometrically constrained contour optimization

NASA Astrophysics Data System (ADS)

Bruynooghe, Michel M.

1998-04-01

In this paper, we present a robust method for automatic object detection and delineation in noisy complex images. The proposed procedure is a three stage process that integrates image segmentation by multidimensional pixel clustering and geometrically constrained optimization of deformable contours. The first step is to enhance the original image by nonlinear unsharp masking. The second step is to segment the enhanced image by multidimensional pixel clustering, using our reducible neighborhoods clustering algorithm that has a very interesting theoretical maximal complexity. Then, candidate objects are extracted and initially delineated by an optimized region merging algorithm, that is based on ascendant hierarchical clustering with contiguity constraints and on the maximization of average contour gradients. The third step is to optimize the delineation of previously extracted and initially delineated objects. Deformable object contours have been modeled by cubic splines. An affine invariant has been used to control the undesired formation of cusps and loops. Non linear constrained optimization has been used to maximize the external energy. This avoids the difficult and non reproducible choice of regularization parameters, that are required by classical snake models. The proposed method has been applied successfully to the detection of fine and subtle microcalcifications in X-ray mammographic images, to defect detection by moire image analysis, and to the analysis of microrugosities of thin metallic films. The later implementation of the proposed method on a digital signal processor associated to a vector coprocessor would allow the design of a real-time object detection and delineation system for applications in medical imaging and in industrial computer vision.
Procedure of Partitioning Data Into Number of Data Sets or Data Group - A Review

NASA Astrophysics Data System (ADS)

Kim, Tai-Hoon

The goal of clustering is to decompose a dataset into groups of similar items based on an objective function. Several well-established algorithms exist for data clustering. Their objective is to divide the data points of the feature space into a number of groups (or classes) so that a predefined set of criteria is satisfied. The article presents a comparative study of the effectiveness and efficiency of traditional data clustering algorithms. The Minkowski score is used to evaluate the performance of the clustering algorithms on different data sets.
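For readers unfamiliar with the Minkowski score mentioned above, here is a minimal sketch under the usual pair-counting definition: MS = sqrt((n01 + n10) / (n11 + n10)), where n11 counts point pairs grouped together in both the reference and the computed partition, n10 pairs grouped together only in the reference, and n01 pairs grouped together only in the computed partition. Lower is better and 0 means perfect agreement. The abstract does not spell out the formula, so this definition and the function name are assumptions.

```python
import math
from itertools import combinations


def minkowski_score(true_labels, pred_labels):
    """Pair-counting Minkowski score (assumed standard definition)."""
    if len(true_labels) != len(pred_labels):
        raise ValueError("label sequences must have the same length")
    n11 = n10 = n01 = 0
    for i, j in combinations(range(len(true_labels)), 2):
        same_true = true_labels[i] == true_labels[j]
        same_pred = pred_labels[i] == pred_labels[j]
        if same_true and same_pred:
            n11 += 1          # together in both partitions
        elif same_true:
            n10 += 1          # together only in the reference
        elif same_pred:
            n01 += 1          # together only in the computed partition
    if n11 + n10 == 0:
        raise ValueError("reference partition has no co-clustered pairs")
    return math.sqrt((n01 + n10) / (n11 + n10))
```

The score depends only on which points share a cluster, so it is unaffected by how the cluster labels are named.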
A novel procedure on next generation sequencing data analysis using text mining algorithm.

PubMed

Zhao, Weizhong; Chen, James J; Perkins, Roger; Wang, Yuping; Liu, Zhichao; Hong, Huixiao; Tong, Weida; Zou, Wen

2016-05-13

Next-generation sequencing (NGS) technologies have provided researchers with vast possibilities in various biological and biomedical research areas. Efficient data mining strategies are in high demand for large scale comparative and evolutional studies to be performed on the large amounts of data derived from NGS projects. Topic modeling is an active research field in machine learning and has been mainly used as an analytical tool to structure large textual corpora for data mining. We report a novel procedure to analyse NGS data using topic modeling. It consists of four major procedures: NGS data retrieval, preprocessing, topic modeling, and data mining using Latent Dirichlet Allocation (LDA) topic outputs. The NGS data set of the Salmonella enterica strains was used as a case study to show the workflow of this procedure. The perplexity measurement of the topic numbers and the convergence efficiencies of Gibbs sampling were calculated and discussed for achieving the best result from the proposed procedure. The output topics by LDA algorithms could be treated as features of Salmonella strains to accurately describe the genetic diversity of fliC gene in various serotypes. The results of a two-way hierarchical clustering and data matrix analysis on LDA-derived matrices successfully classified Salmonella serotypes based on the NGS data. The implementation of topic modeling in NGS data analysis provides a new way to elucidate genetic information from NGS data, and identify the gene-phenotype relationships and biomarkers, especially in the era of biological and medical big data.
Deep Brain Stimulation of the Subthalamic Nucleus Improves Lexical Switching in Parkinson's Disease Patients.

PubMed

Vonberg, Isabelle; Ehlen, Felicitas; Fromm, Ortwin; Kühn, Andrea A; Klostermann, Fabian

2016-01-01

Reduced verbal fluency (VF) has been reported in patients with Parkinson's disease (PD), especially those treated by Deep Brain Stimulation of the subthalamic nucleus (STN DBS). To delineate the nature of this dysfunction we aimed at identifying the particular VF-related operations modified by STN DBS. Eleven PD patients performed VF tasks in their STN DBS ON and OFF condition. To differentiate VF-components modulated by the stimulation, a temporal cluster analysis was performed, separating production spurts (i.e., 'clusters' as correlates of automatic activation spread within lexical fields) from slower cluster transitions (i.e., 'switches' reflecting set-shifting towards new lexical fields). The results were compared to those of eleven healthy control subjects. PD patients produced significantly more switches accompanied by shorter switch times in the STN DBS ON compared to the STN DBS OFF condition. The number of clusters and time intervals between words within clusters were not affected by the treatment state. Although switch behavior in patients with DBS ON improved, their task performance was still lower compared to that of healthy controls. Beyond impacting on motor symptoms, STN DBS seems to influence the dynamics of cognitive procedures. Specifically, the results are in line with basal ganglia roles for cognitive switching, in the particular case of VF, from prevailing lexical concepts to new ones.
Cherry-picking functionally relevant substates from long md trajectories using a stratified sampling approach.

PubMed

Chandramouli, Balasubramanian; Mancini, Giordano

2016-01-01

Classical Molecular Dynamics (MD) simulations can provide insights at the nanoscopic scale into protein dynamics. Currently, simulations of large proteins and complexes can be routinely carried out in the ns-μs time regime. Clustering of MD trajectories is often performed to identify selective conformations and to compare simulation and experimental data coming from different sources on closely related systems. However, clustering techniques are usually applied without a careful validation of results and benchmark studies involving the application of different algorithms to MD data often deal with relatively small peptides instead of average or large proteins; finally clustering is often applied as a means to analyze refined data and also as a way to simplify further analysis of trajectories. Herein, we propose a strategy to classify MD data while carefully benchmarking the performance of clustering algorithms and internal validation criteria for such methods. We demonstrate the method on two showcase systems with different features, and compare the classification of trajectories in real and PCA space. We posit that the prototype procedure adopted here could be highly fruitful in clustering large trajectories of multiple systems or that resulting especially from enhanced sampling techniques like replica exchange simulations. Copyright: © 2016 by Fabrizio Serra editore, Pisa · Roma.
Clumpak: a program for identifying clustering modes and packaging population structure inferences across K.

PubMed

Kopelman, Naama M; Mayzel, Jonathan; Jakobsson, Mattias; Rosenberg, Noah A; Mayrose, Itay

2015-09-01

The identification of the genetic structure of populations from multilocus genotype data has become a central component of modern population-genetic data analysis. Application of model-based clustering programs often entails a number of steps, in which the user considers different modelling assumptions, compares results across different predetermined values of the number of assumed clusters (a parameter typically denoted K), examines multiple independent runs for each fixed value of K, and distinguishes among runs belonging to substantially distinct clustering solutions. Here, we present Clumpak (Cluster Markov Packager Across K), a method that automates the postprocessing of results of model-based population structure analyses. For analysing multiple independent runs at a single K value, Clumpak identifies sets of highly similar runs, separating distinct groups of runs that represent distinct modes in the space of possible solutions. This procedure, which generates a consensus solution for each distinct mode, is performed by the use of a Markov clustering algorithm that relies on a similarity matrix between replicate runs, as computed by the software Clumpp. Next, Clumpak identifies an optimal alignment of inferred clusters across different values of K, extending a similar approach implemented for a fixed K in Clumpp and simplifying the comparison of clustering results across different K values. Clumpak incorporates additional features, such as implementations of methods for choosing K and comparing solutions obtained by different programs, models, or data subsets. Clumpak, available at http://clumpak.tau.ac.il, simplifies the use of model-based analyses of population structure in population genetics and molecular ecology. © 2015 John Wiley & Sons Ltd.
Sensory description of sweet wines obtained by the winemaking procedures of raisining, botrytisation and fortification.

PubMed

González-Álvarez, Mariana; Noguerol-Pato, Raquel; González-Barreiro, Carmen; Cancho-Grande, Beatriz; Simal-Gándara, Jesús

2014-02-15

The effect of winemaking procedures on the sensory modification of sweet wines was investigated. Garnacha Tintorera-based sweet wines were obtained by two different processes: by using raisins for vinification to obtain a naturally sweet wine and by using freshly harvested grapes with the stoppage of the fermentation by the addition of alcohol. Eight international sweet wines were also subjected to sensory analysis for comparative description purposes. Wines were described with a sensory profile by 12 trained panellists on 70 sensory attributes by employing the frequency of citation method. Analysis of variance of the descriptive data confirmed the existence of subtle sensory differences among Garnacha Tintorera-based sweet wines depending on the procedure used for their production. Cluster analysis emphasised discriminated attributes between the Garnacha Tintorera-based and the commercial groups of sweet wines for both those obtained by raisining and by fortification. Several kinds of discriminant functions were used to separate groups of sweet wines--obtained by botrytisation, raisining and fortification--to show the key descriptors that contribute to their separation and define the sensory perception of each type of wine. Copyright © 2013 Elsevier Ltd. All rights reserved.
Challenges in performance of food safety management systems: a case of fish processing companies in Tanzania.

PubMed

Kussaga, Jamal B; Luning, Pieternel A; Tiisekwa, Bendantunguka P M; Jacxsens, Liesbeth

2014-04-01

This study provides insight for food safety (FS) performance in light of the current performance of core FS management system (FSMS) activities and context riskiness of these systems to identify the opportunities for improvement of the FSMS. A FSMS diagnostic instrument was applied to assess the performance levels of FSMS activities regarding context riskiness and FS performance in 14 fish processing companies in Tanzania. Two clusters (cluster I and II) with average FSMS (level 2) operating under moderate-risk context (score 2) were identified. Overall, cluster I had better (score 3) FS performance than cluster II (score 2 to 3). However, a majority of the fish companies need further improvement of their FSMS and reduction of context riskiness to assure good FS performance. The FSMS activity levels could be improved through hygienic design of equipment and facilities, strict raw material control, proper follow-up of critical control point analysis, developing specific sanitation procedures and company-specific sampling design and measuring plans, independent validation of preventive measures, and establishing comprehensive documentation and record-keeping systems. The risk level of the context could be reduced through automation of production processes (such as filleting, packaging, and sanitation) to restrict people's interference, recruitment of permanent high-skilled technological staff, and setting requirements on product use (storage and distribution conditions) on customers. However, such intervention measures for improvement could be taken in phases, starting with less expensive ones (such as sanitation procedures) that can be implemented in the short term to more expensive interventions (setting up assurance activities) to be adopted in the long term. These measures are essential for fish processing companies to move toward FSMS that are more effective.
Density-based clustering: A 'landscape view' of multi-channel neural data for inference and dynamic complexity analysis.

PubMed

Baglietto, Gabriel; Gigante, Guido; Del Giudice, Paolo

2017-01-01

Two partially interwoven hot topics in the analysis and statistical modeling of neural data are the development of efficient and informative representations of the time series derived from multiple neural recordings, and the extraction of information about the connectivity structure of the underlying neural network from the recorded neural activities. In the present paper we show that state-space clustering can provide an easy and effective option for reducing the dimensionality of multiple neural time series, that it can improve inference of synaptic couplings from neural activities, and that it can also allow the construction of a compact representation of the multi-dimensional dynamics, that easily lends itself to complexity measures. We apply a variant of the 'mean-shift' algorithm to perform state-space clustering, and validate it on a Hopfield network in the glassy phase, in which metastable states are largely uncorrelated from memories embedded in the synaptic matrix. In this context, we show that the neural states identified as clusters' centroids offer a parsimonious parametrization of the synaptic matrix, which allows a significant improvement in inferring the synaptic couplings from the neural activities. Moving to the more realistic case of a multi-modular spiking network, with spike-frequency adaptation inducing history-dependent effects, we propose a procedure inspired by Boltzmann learning, but extending its domain of application, to learn inter-module synaptic couplings so that the spiking network reproduces a prescribed pattern of spatial correlations; we then illustrate, in the spiking network, how clustering is effective in extracting relevant features of the network's state-space landscape. Finally, we show that the knowledge of the cluster structure allows casting the multi-dimensional neural dynamics in the form of a symbolic dynamics of transitions between clusters; as an illustration of the potential of such reduction, we define and analyze a measure of complexity of the neural time series.
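The abstract applies "a variant" of mean-shift without giving details. As a rough illustration of the basic idea only, the sketch below implements plain flat-kernel mean-shift: each point is repeatedly moved to the mean of the data points within a fixed radius until it stops moving, and the resting positions are taken as cluster centroids. The bandwidth, tolerance, merge rule, and function name are assumptions made for this example, not the authors' variant.

```python
import math


def mean_shift_modes(points, bandwidth, tol=1e-9, max_iter=100):
    """Flat-kernel mean-shift (illustrative, not the paper's variant).

    points    : list of equal-length tuples of floats
    bandwidth : radius of the flat kernel window
    Returns the list of distinct modes (cluster centroids) found.
    """
    modes = []
    for start in points:
        x = tuple(start)
        for _ in range(max_iter):
            # Data points inside the window centred on the current position.
            nbrs = [p for p in points if math.dist(p, x) <= bandwidth]
            new_x = tuple(sum(c) / len(nbrs) for c in zip(*nbrs))
            if math.dist(new_x, x) < tol:
                break
            x = new_x
        # Merge trajectories that ended close to an already known mode.
        if not any(math.dist(m, x) < bandwidth / 2 for m in modes):
            modes.append(x)
    return modes
```

Labelling each time step by its nearest mode would then turn a multi-dimensional trajectory into a sequence of cluster symbols, which is the kind of symbolic dynamics the abstract describes.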
Effects of bursting dynamic features on the generation of multi-clustered structure of neural network with symmetric spike-timing-dependent plasticity learning rule.

PubMed

Liu, Hui; Song, Yongduan; Xue, Fangzheng; Li, Xiumin

2015-11-01

In this paper, the generation of multi-clustered structure of self-organized neural network with different neuronal firing patterns, i.e., bursting or spiking, has been investigated. The initially all-to-all-connected spiking neural network or bursting neural network can be self-organized into clustered structure through the symmetric spike-timing-dependent plasticity learning for both bursting and spiking neurons. However, the time consumption of this clustering procedure of the burst-based self-organized neural network (BSON) is much shorter than the spike-based self-organized neural network (SSON). Our results show that the BSON network has more obvious small-world properties, i.e., higher clustering coefficient and smaller shortest path length than the SSON network. Also, the results of larger structure entropy and activity entropy of the BSON network demonstrate that this network has higher topological complexity and dynamical diversity, which benefits for enhancing information transmission of neural circuits. Hence, we conclude that the burst firing can significantly enhance the efficiency of clustering procedure and the emergent clustered structure renders the whole network more synchronous and therefore more sensitive to weak input. This result is further confirmed from its improved performance on stochastic resonance. Therefore, we believe that the multi-clustered neural network which self-organized from the bursting dynamics has high efficiency in information processing.
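The small-world comparison above rests on two graph statistics: the clustering coefficient and the shortest path length. A minimal sketch of both follows, assuming an unweighted, undirected, connected graph stored as a dict of neighbour sets with integer node ids. The networks in the paper have plastic synaptic weights, so how they were reduced to such a graph is not stated in the abstract; the data structure and function names here are illustrative assumptions.

```python
from collections import deque


def average_clustering(adj):
    """Mean local clustering coefficient of an undirected graph.

    adj : dict mapping integer node id -> set of neighbour ids
    """
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue  # nodes with fewer than two neighbours contribute 0
        # Count edges that exist among the neighbours of this node.
        links = sum(1 for u in nbrs for v in nbrs if u < v and v in adj[u])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


def average_shortest_path(adj):
    """Mean shortest path length over reachable ordered pairs (BFS)."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs
```

A network is usually called small-world when its clustering is much higher than that of a comparable random graph while its average path length stays similarly short.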
Testing the accuracy of clustering redshifts with simulations

NASA Astrophysics Data System (ADS)

Scottez, V.; Benoit-Lévy, A.; Coupon, J.; Ilbert, O.; Mellier, Y.

2018-03-01

We explore the accuracy of clustering-based redshift inference within the MICE2 simulation. This method uses the spatial clustering of galaxies between a spectroscopic reference sample and an unknown sample. This study gives an estimate of the reachable accuracy of this method. First, we discuss the requirements for the number of objects in the two samples, confirming that this method does not require a representative spectroscopic sample for calibration. In the context of the next generation of cosmological surveys, we estimated that the density of the Quasi Stellar Objects in BOSS allows us to reach 0.2 per cent accuracy in the mean redshift. Secondly, we estimate individual redshifts for galaxies in the densest regions of colour space (~30 per cent of the galaxies) without using the photometric redshifts procedure. The advantage of this procedure is threefold. It allows: (i) the use of cluster-zs for any field in astronomy, (ii) the possibility to combine photo-zs and cluster-zs to get an improved redshift estimation, (iii) the use of cluster-z to define tomographic bins for weak lensing. Finally, we explore this last option and build five cluster-z selected tomographic bins from redshift 0.2 to 1. We found a bias on the mean redshift estimate of 0.002 per bin. We conclude that cluster-z could be used as a primary redshift estimator by the next generation of cosmological surveys.

Effects of bursting dynamic features on the generation of multi-clustered structure of neural network with symmetric spike-timing-dependent plasticity learning rule

DOE Office of Scientific and Technical Information (OSTI.GOV)

Liu, Hui; Song, Yongduan; Xue, Fangzheng

In this paper, the generation of multi-clustered structure of self-organized neural network with different neuronal firing patterns, i.e., bursting or spiking, has been investigated. The initially all-to-all-connected spiking neural network or bursting neural network can be self-organized into clustered structure through the symmetric spike-timing-dependent plasticity learning for both bursting and spiking neurons. However, the time consumption of this clustering procedure of the burst-based self-organized neural network (BSON) is much shorter than the spike-based self-organized neural network (SSON). Our results show that the BSON network has more obvious small-world properties, i.e., higher clustering coefficient and smaller shortest path length than the SSON network. Also, the results of larger structure entropy and activity entropy of the BSON network demonstrate that this network has higher topological complexity and dynamical diversity, which benefits for enhancing information transmission of neural circuits. Hence, we conclude that the burst firing can significantly enhance the efficiency of clustering procedure and the emergent clustered structure renders the whole network more synchronous and therefore more sensitive to weak input. This result is further confirmed from its improved performance on stochastic resonance. Therefore, we believe that the multi-clustered neural network which self-organized from the bursting dynamics has high efficiency in information processing.
Neotectonic control on drainage systems: GIS-based geomorphometric and morphotectonic assessment for Crete, Greece

NASA Astrophysics Data System (ADS)

Argyriou, Athanasios V.; Teeuw, Richard M.; Soupios, Pantelis; Sarris, Apostolos

2017-11-01

Geomorphic indices can be used to examine the geomorphological and tectonic processes responsible for the development of the drainage basins. Such indices can be dependent on tectonics, erosional processes and other factors that control the morphology of the landforms. The inter-relationships between geomorphic indices can determine the influence of regional tectonic activity in the shape development of drainage basins. A Multi-Criteria Decision Analysis (MCDA) procedure has been used to perform an integrated cluster analysis that highlights information associated with the dominant regional tectonic activity. Factor Analysis (FA) and Analytical Hierarchy Process (AHP) were considered within that procedure, producing a representation of the distributed regional tectonic activity of the drainage basins studied. The study area is western Crete, located in the outer fore-arc of the Hellenic subduction zone, one of the world's most tectonically active regions. The results indicate that in the landscape evolution of the study area (especially the western basins) tectonic controls dominate over lithological controls.
Simplified multi-element analysis of ground and instant coffees by ICP-OES and FAAS.

PubMed

Szymczycha-Madeja, Anna; Welna, Maja; Pohl, Pawel

2015-01-01

A simplified alternative to the wet digestion sample preparation procedure for roasted ground and instant coffees has been developed and validated for the determination of different elements by inductively coupled plasma optical emission spectrometry (ICP-OES) (Al, Ba, Cd, Co, Cr, Cu, Mn, Ni, Pb, Sr, Zn) and flame atomic absorption spectrometry (FAAS) (Ca, Fe, K, Mg, Na). The proposed procedure, i.e. the ultrasound-assisted solubilisation in aqua regia, is quite fast and simple, requires minimal use of reagents, and demonstrated good analytical performance, i.e. accuracy from -4.7% to 1.9%, precision within 0.5-8.6% and recovery in the range 93.5-103%. Detection limits of elements were from 0.086 ng ml⁻¹ (Sr) to 40 ng ml⁻¹ (Fe). A preliminary classification of 18 samples of ground and instant coffees was successfully made based on concentrations of selected elements and using principal component analysis and hierarchic cluster analysis.
Historic changes in fish assemblage structure in midwestern nonwadeable rivers

USGS Publications Warehouse

Parks, Timothy P.; Quist, Michael C.; Pierce, Clay L.

2014-01-01

Historical change in fish assemblage structure was evaluated in the mainstems of the Des Moines, Iowa, Cedar, Wapsipinicon, and Maquoketa rivers, in Iowa. Fish occurrence data were compared in each river between historical and recent time periods to characterize temporal changes among 126 species distributions and assess spatiotemporal patterns in faunal similarity. A resampling procedure was used to estimate species occurrences in rivers during each assessment period and changes in species occurrence were summarized. Spatiotemporal shifts in species composition were analyzed at the river and river section scale using cluster analysis, pairwise Jaccard's dissimilarities, and analysis of multivariate beta dispersion. The majority of species exhibited either increases or declines in distribution in all rivers with the exception of several “unknown” or inconclusive trends exhibited by species in the Maquoketa River. Cluster analysis identified temporal patterns of similarity among fish assemblages in the Des Moines, Cedar, and Iowa rivers within the historical and recent assessment period indicating a significant change in species composition. Prominent declines of backwater species with phytophilic spawning strategies contributed to assemblage changes occurring across river systems.
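The pairwise Jaccard dissimilarity used above for presence/absence data is simple to state: one minus the number of shared species divided by the number of species found in either assemblage. The sketch below shows the calculation; the function name is illustrative, and any species lists passed to it would be the reader's own data, not the study's.

```python
def jaccard_dissimilarity(assemblage_a, assemblage_b):
    """Jaccard dissimilarity between two species lists (presence/absence)."""
    a, b = set(assemblage_a), set(assemblage_b)
    union = a | b
    if not union:
        return 0.0  # two empty assemblages are treated as identical
    return 1.0 - len(a & b) / len(union)
```

Computing this for every pair of river sections and time periods gives the dissimilarity matrix on which a cluster analysis can then operate.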
Hausdorff clustering

NASA Astrophysics Data System (ADS)

Basalto, Nicolas; Bellotti, Roberto; de Carlo, Francesco; Facchi, Paolo; Pantaleo, Ester; Pascazio, Saverio

2008-10-01

A clustering algorithm based on the Hausdorff distance is analyzed and compared to the single, complete, and average linkage algorithms. The four clustering procedures are applied to a toy example and to the time series of financial data. The dendrograms are scrutinized and their features compared. The Hausdorff linkage relies on firm mathematical grounds and turns out to be very effective when one has to discriminate among complex structures.
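To make the comparison among linkage rules concrete, the sketch below computes three inter-cluster distances for two point sets: single linkage (closest pair), complete linkage (farthest pair), and the Hausdorff distance (the largest distance from any point in one set to its nearest point in the other). It assumes Euclidean points and the standard Hausdorff definition; the abstract does not state the paper's exact formulation, and the function names are illustrative.

```python
import math


def single_linkage(a, b):
    """Smallest pairwise distance between the two clusters."""
    return min(math.dist(p, q) for p in a for q in b)


def complete_linkage(a, b):
    """Largest pairwise distance between the two clusters."""
    return max(math.dist(p, q) for p in a for q in b)


def directed_hausdorff(a, b):
    """Largest distance from a point of a to its nearest point of b."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two finite point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))
```

In an agglomerative procedure, the pair of clusters with the smallest value of the chosen distance is merged at each step. For any two clusters the Hausdorff distance lies between the single and complete linkage values.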
Data-driven inference for the spatial scan statistic.

PubMed

Almeida, Alexandre C L; Duarte, Anderson R; Duczmal, Luiz H; Oliveira, Fernando L P; Takahashi, Ricardo H C

2011-08-02

Kulldorff's spatial scan statistic for aggregated area maps searches for clusters of cases without specifying their size (number of areas) or geographic location in advance. Their statistical significance is tested while adjusting for the multiple testing inherent in such a procedure. However, as is shown in this work, this adjustment is not done in an even manner for all possible cluster sizes. A modification is proposed to the usual inference test of the spatial scan statistic, incorporating additional information about the size of the most likely cluster found. A new interpretation of the results of the spatial scan statistic is given, posing a modified inference question: what is the probability that the null hypothesis is rejected for the original observed cases map with a most likely cluster of size k, taking into account only those most likely clusters of size k found under null hypothesis for comparison? This question is especially important when the p-value computed by the usual inference process is near the alpha significance level, regarding the correctness of the decision based on this inference. A practical procedure is provided to make more accurate inferences about the most likely cluster found by the spatial scan statistic.
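The modified inference question above can be read as a change in which Monte Carlo replicates are used for comparison. The sketch below contrasts the usual Monte Carlo p-value, which ranks the observed statistic against all null replicates, with a size-conditioned version that ranks it only against null replicates whose most likely cluster has the same size k. This is one interpretation of the abstract, not the authors' published procedure; the function names and the (statistic, size) tuple layout are assumptions.

```python
def monte_carlo_p(observed_stat, null_stats):
    """Usual Monte Carlo p-value: rank of the observed statistic among nulls."""
    exceed = sum(1 for value in null_stats if value >= observed_stat)
    return (exceed + 1) / (len(null_stats) + 1)


def size_conditioned_p(observed_stat, observed_size, null_runs):
    """p-value using only null replicates with the same cluster size.

    null_runs : list of (statistic, size) pairs, one per simulated map,
                describing the most likely cluster found under the null
                (hypothetical layout chosen for this sketch).
    """
    same_size = [stat for stat, size in null_runs if size == observed_size]
    return monte_carlo_p(observed_stat, same_size)
```

When few null replicates share the observed size, the conditioned estimate becomes coarse, so in practice many more simulations would be needed than for the unconditional test.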
Health Occupations Cluster.

ERIC Educational Resources Information Center

Walraven, Catherine; And Others

These instructional materials consist of a series of curriculum worksheets that cover tasks to be mastered by students in health occupations cluster programs. Covered in the curriculum worksheets are diagnostic procedures; observing/recording/reporting/planning; safety; nutrition/elimination; hygiene/personal care/comfort;…
Mathematical Geology

ERIC Educational Resources Information Center

Merriam, Daniel F.

1978-01-01

Geomathematics is a developing field that is being used in practical applications. Classification is an important element and the dynamic-cluster method (DCM), a nonhierarchical procedure, was introduced this past year. A method for testing the degree of cluster distinctness was developed also. (MA)
The Effectiveness of Two Grammar Treatment Procedures for Children with SLI: A Randomized Clinical Trial

ERIC Educational Resources Information Center

Smith-Lock, Karen M.; Leitão, Suze; Prior, Polly; Nickels, Lyndsey

2015-01-01

Purpose: This study compared the effectiveness of two grammar treatment procedures for children with specific language impairment. Method: A double-blind superiority trial with cluster randomization was used to compare a cueing procedure, designed to elicit a correct production following an initial error, to a recasting procedure, which required…
Sequence-structure relationship study in all-α transmembrane proteins using an unsupervised learning approach.

PubMed

Esque, Jérémy; Urbain, Aurélie; Etchebest, Catherine; de Brevern, Alexandre G

2015-11-01

Transmembrane proteins (TMPs) are major drug targets, but the knowledge of their precise topology structure remains highly limited compared with globular proteins. In spite of the difficulties in obtaining their structures, an important effort has been made these last years to increase their number from an experimental and computational point of view. In view of this emerging challenge, the development of computational methods to extract knowledge from these data is crucial for the better understanding of their functions and in improving the quality of structural models. Here, we revisit an efficient unsupervised learning procedure, called Hybrid Protein Model (HPM), which is applied to the analysis of transmembrane proteins belonging to the all-α structural class. HPM method is an original classification procedure that efficiently combines sequence and structure learning. The procedure was initially applied to the analysis of globular proteins. In the present case, HPM classifies a set of overlapping protein fragments, extracted from a non-redundant databank of TMP 3D structure. After fine-tuning of the learning parameters, the optimal classification results in 65 clusters. They represent at best similar relationships between sequence and local structure properties of TMPs. Interestingly, HPM distinguishes among the resulting clusters two helical regions with distinct hydrophobic patterns. This underlines the complexity of the topology of these proteins. The HPM classification enlightens unusual relationship between amino acids in TMP fragments, which can be useful to elaborate new amino acids substitution matrices. Finally, two challenging applications are described: the first one aims at annotating protein functions (channel or not), the second one intends to assess the quality of the structures (X-ray or models) via a new scoring function deduced from the HPM classification.
Multiple imputation by chained equations for systematically and sporadically missing multilevel data.

PubMed

Resche-Rigon, Matthieu; White, Ian R

2018-06-01

In multilevel settings such as individual participant data meta-analysis, a variable is 'systematically missing' if it is wholly missing in some clusters and 'sporadically missing' if it is partly missing in some clusters. Previously proposed methods to impute incomplete multilevel data handle either systematically or sporadically missing data, but frequently both patterns are observed. We describe a new multiple imputation by chained equations (MICE) algorithm for multilevel data with arbitrary patterns of systematically and sporadically missing variables. The algorithm is described for multilevel normal data but can easily be extended for other variable types. We first propose two methods for imputing a single incomplete variable: an extension of an existing method and a new two-stage method which conveniently allows for heteroscedastic data. We then discuss the difficulties of imputing missing values in several variables in multilevel data using MICE, and show that even the simplest joint multilevel model implies conditional models which involve cluster means and heteroscedasticity. However, a simulation study finds that the proposed methods can be successfully combined in a multilevel MICE procedure, even when cluster means are not included in the imputation models.
The application of data mining techniques to oral cancer prognosis.

PubMed

Tseng, Wan-Ting; Chiang, Wei-Fan; Liu, Shyun-Yeu; Roan, Jinsheng; Lin, Chun-Nan

2015-05-01

This study adopted an integrated procedure that combines the clustering and classification features of data mining technology to determine the differences between the symptoms shown in past cases where patients died from or survived oral cancer. Two data mining tools, namely decision tree and artificial neural network, were used to analyze the historical cases of oral cancer, and their performance was compared with that of logistic regression, the popular statistical analysis tool. Both decision tree and artificial neural network models showed superiority to the traditional statistical model. However, for clinicians, the trees created by the decision tree models are easier to interpret than the artificial neural network models. Cluster analysis also revealed that stage 4 patients who also possess the following four characteristics have an extremely low survival rate: pN is N2b, level of RLNM is level I-III, AJCC-T is T4, and cell mutation status (G) is moderate.
Predicting lower mantle heterogeneity from 4-D Earth models

NASA Astrophysics Data System (ADS)

Flament, Nicolas; Williams, Simon; Müller, Dietmar; Gurnis, Michael; Bower, Dan J.

2016-04-01

The Earth's lower mantle is characterized by two large low-shear-velocity provinces (LLSVPs), approximately 15000 km in diameter and 500-1000 km high, located under Africa and the Pacific Ocean. The spatial stability and chemical nature of these LLSVPs are debated. Here, we compare the lower mantle structure predicted by forward global mantle flow models constrained by tectonic reconstructions (Bower et al., 2015) to an analysis of five global tomography models. In the dynamic models, spanning 230 million years, slabs subducting deep into the mantle deform an initially uniform basal layer containing 2% of the volume of the mantle. Basal density, convective vigour (Rayleigh number Ra), mantle viscosity, absolute plate motions, and relative plate motions are varied in a series of model cases. We use cluster analysis to classify a set of equally-spaced points (average separation ~0.45°) on the Earth's surface into two groups of points with similar variations in present-day temperature between 1000 and 2800 km depth, for each model case. Below ~2400 km depth, this procedure reveals a high-temperature cluster in which mantle temperature is significantly larger than ambient and a low-temperature cluster in which mantle temperature is lower than ambient. The spatial extent of the high-temperature cluster is in first-order agreement with the outlines of the African and Pacific LLSVPs revealed by a similar cluster analysis of five tomography models (Lekic et al., 2012). Model success is quantified by computing the accuracy and sensitivity of the predicted temperature clusters in predicting the low-velocity cluster obtained from tomography (Lekic et al., 2012). In these cases, the accuracy varies between 0.61 and 0.80, where a value of 0.5 represents the random case, and the sensitivity ranges between 0.18 and 0.83.
The largest accuracies and sensitivities are obtained for models with Ra ≈ 5 × 10^7, no asthenosphere (or an asthenosphere restricted to the oceanic domain), and a basal layer ~4% denser than ambient mantle. Increasing convective vigour (Ra ≈ 5 × 10^8) or decreasing the density of the basal layer decreases both the accuracy and sensitivity of the predicted lower mantle structure. References: D. J. Bower, M. Gurnis, N. Flament, Assimilating lithosphere and slab history in 4-D Earth models. Phys. Earth Planet. Inter. 238, 8-22 (2015). V. Lekic, S. Cottaar, A. Dziewonski, B. Romanowicz, Cluster analysis of global lower mantle tomography: A new class of structure and implications for chemical heterogeneity. Earth Planet. Sci. Lett. 357, 68-77 (2012).
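The accuracy and sensitivity scores mentioned above can be sketched as follows. This assumes the standard confusion-matrix definitions, which the abstract does not spell out, and treats every point as equally weighted. The function name and the example membership lists are hypothetical.

```python
def accuracy_and_sensitivity(predicted, reference):
    """Score a predicted cluster against a reference cluster.

    predicted[i] is True where point i falls in the predicted
    high-temperature cluster; reference[i] is True where it falls in the
    low-velocity cluster from tomography.
    """
    if len(predicted) != len(reference) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p and r)          # true positives
    tn = sum(1 for p, r in pairs if not p and not r)  # true negatives
    fn = sum(1 for p, r in pairs if not p and r)      # false negatives
    accuracy = (tp + tn) / len(pairs)
    # Sensitivity is undefined when the reference cluster is empty.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    return accuracy, sensitivity


# Hypothetical membership flags for eight surface points.
predicted = [True, True, False, False, True, False, False, False]
reference = [True, False, False, False, True, True, False, False]
acc, sens = accuracy_and_sensitivity(predicted, reference)
```

With these toy flags the accuracy is 6/8 and the sensitivity is 2/3.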
Extending cluster Lot Quality Assurance Sampling designs for surveillance programs

PubMed Central

Hund, Lauren; Pagano, Marcello

2014-01-01

Lot quality assurance sampling (LQAS) has a long history of applications in industrial quality control. LQAS is frequently used for rapid surveillance in global health settings, with areas classified as poor or acceptable performance based on the binary classification of an indicator. Historically, LQAS surveys have relied on simple random samples from the population; however, implementing two-stage cluster designs for surveillance sampling is often more cost-effective than simple random sampling. By applying survey sampling results to the binary classification procedure, we develop a simple and flexible non-parametric procedure to incorporate clustering effects into the LQAS sample design to appropriately inflate the sample size, accommodating finite numbers of clusters in the population when relevant. We use this framework to then discuss principled selection of survey design parameters in longitudinal surveillance programs. We apply this framework to design surveys to detect rises in malnutrition prevalence in nutrition surveillance programs in Kenya and South Sudan, accounting for clustering within villages. By combining historical information with data from previous surveys, we design surveys to detect spikes in the childhood malnutrition rate. PMID:24633656
Extending cluster lot quality assurance sampling designs for surveillance programs.

PubMed

Hund, Lauren; Pagano, Marcello

2014-07-20

Lot quality assurance sampling (LQAS) has a long history of applications in industrial quality control. LQAS is frequently used for rapid surveillance in global health settings, with areas classified as poor or acceptable performance on the basis of the binary classification of an indicator. Historically, LQAS surveys have relied on simple random samples from the population; however, implementing two-stage cluster designs for surveillance sampling is often more cost-effective than simple random sampling. By applying survey sampling results to the binary classification procedure, we develop a simple and flexible nonparametric procedure to incorporate clustering effects into the LQAS sample design to appropriately inflate the sample size, accommodating finite numbers of clusters in the population when relevant. We use this framework to then discuss principled selection of survey design parameters in longitudinal surveillance programs. We apply this framework to design surveys to detect rises in malnutrition prevalence in nutrition surveillance programs in Kenya and South Sudan, accounting for clustering within villages. By combining historical information with data from previous surveys, we design surveys to detect spikes in the childhood malnutrition rate. Copyright © 2014 John Wiley & Sons, Ltd.
Empirical Identification of Hierarchies.

ERIC Educational Resources Information Center

McCormick, Douglas; And Others

Outlining a cluster procedure which maximizes specific criteria while building scales from binary measures using a sequential, agglomerative, overlapping, non-hierarchic method results in indices giving truer results than exploratory factor analyses or multidimensional scaling. In a series of eleven figures, patterns within cluster histories…
A new approach to spike sorting for multi-neuronal activities recorded with a tetrode--how ICA can be practical.

PubMed

Takahashi, Susumu; Anzai, Yuichiro; Sakurai, Yoshio

2003-07-01

Multi-neuronal recording with a tetrode is a powerful technique to reveal neuronal interactions in local circuits. However, it is difficult to detect precise spike timings among closely neighboring neurons because the spike waveforms of individual neurons overlap on the electrode when more than two neurons fire simultaneously. In addition, the spike waveforms of single neurons, especially in the presence of complex spikes, are often non-stationary. These problems limit the ability of ordinary spike sorting to sort multi-neuronal activities recorded using tetrodes into their single-neuron components. Though sorting with independent component analysis (ICA) can solve these problems, it has one serious limitation that the number of separated neurons must be less than the number of electrodes. Using a combination of ICA and the efficiency of ordinary spike sorting technique (k-means clustering), we developed an automatic procedure to solve the spike-overlapping and the non-stationarity problems with no limitation on the number of separated neurons. The results for the procedure applied to real multi-neuronal data demonstrated that some outliers which may be assigned to distinct clusters if ordinary spike-sorting methods were used can be identified as overlapping spikes, and that there are functional connections between a putative pyramidal neuron and its putative dendrite. These findings suggest that the combination of ICA and k-means clustering can provide insights into the precise nature of functional circuits among neurons, i.e. cell assemblies.
Advances in Significance Testing for Cluster Detection

NASA Astrophysics Data System (ADS)

Coleman, Deidra Andrea

Over the past two decades, much attention has been given to data-driven project goals such as the Human Genome Project and the development of syndromic surveillance systems. A major component of these types of projects is analyzing the abundance of data. Detecting clusters within the data can be beneficial as it can lead to the identification of specified sequences of DNA nucleotides that are related to important biological functions or the locations of epidemics such as disease outbreaks or bioterrorism attacks. Cluster detection techniques require efficient and accurate hypothesis testing procedures. In this dissertation, we improve upon the hypothesis testing procedures for cluster detection by enhancing distributional theory and providing an alternative method for spatial cluster detection using syndromic surveillance data. In Chapter 2, we provide an efficient method to compute the exact distribution of the number and coverage of h-clumps of a collection of words. This method involves defining a Markov chain using a minimal deterministic automaton to reduce the number of states needed for computation. We allow words of the collection to contain other words of the collection, making the method more general. We use our method to compute the distributions of the number and coverage of h-clumps in the Chi motif of H. influenzae. In Chapter 3, we provide an efficient algorithm to compute the exact distribution of multiple window discrete scan statistics for higher-order, multi-state Markovian sequences. This algorithm involves defining a Markov chain to efficiently keep track of probabilities needed to compute p-values of the statistic. We use our algorithm to identify cases where the available approximation does not perform well. We also use our algorithm to detect unusual clusters of made free throw shots by National Basketball Association players during the 2009-2010 regular season.
In Chapter 4, we give a procedure to detect outbreaks using syndromic surveillance data while controlling the Bayesian False Discovery Rate (BFDR). The procedure entails choosing an appropriate Bayesian model that captures the spatial dependency inherent in epidemiological data and considers all days of interest, selecting a test statistic based on a chosen measure that provides the magnitude of the maximal spatial cluster for each day, and identifying a cutoff value that controls the BFDR for rejecting the collective null hypothesis of no outbreak over a collection of days for a specified region. We use our procedure to analyze botulism-like syndrome data collected by the North Carolina Disease Event Tracking and Epidemiologic Collection Tool (NC DETECT).
A local search for a graph clustering problem

NASA Astrophysics Data System (ADS)

Navrotskaya, Anna; Il'ev, Victor

2016-10-01

In clustering problems one has to partition a given set of objects (a data set) into subsets (called clusters), taking into consideration only the similarity of the objects. One of the most visual formalizations of clustering is graph clustering, that is, grouping the vertices of a graph into clusters taking into consideration the edge structure of the graph, whose vertices are objects and whose edges represent similarities between the objects. In the graph k-clustering problem the number of clusters does not exceed k and the goal is to minimize the number of edges between clusters and the number of missing edges within clusters. This problem is NP-hard for any k ≥ 2. We propose a polynomial time (2k-1)-approximation algorithm for graph k-clustering. Then we apply a local search procedure to the feasible solution found by this algorithm and conduct an experimental study of the resulting heuristics.
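The objective described above, and one possible local search over it, can be sketched as follows. The abstract does not specify the neighbourhood used by the authors, so the single-vertex move below is an assumption, and the (2k-1)-approximation algorithm itself is not reproduced. All function names and the example graph are hypothetical.

```python
from itertools import combinations


def clustering_cost(n, edges, labels):
    """Number of edges between clusters plus missing edges within clusters."""
    edge_set = {frozenset(e) for e in edges}
    cost = 0
    for u, v in combinations(range(n), 2):
        same_cluster = labels[u] == labels[v]
        linked = frozenset((u, v)) in edge_set
        if same_cluster != linked:  # cut edge, or absent edge inside a cluster
            cost += 1
    return cost


def local_search(n, edges, labels, k):
    """Move one vertex at a time to another of the k clusters while the
    cost strictly decreases (assumed neighbourhood, see lead-in)."""
    labels = list(labels)
    best = clustering_cost(n, edges, labels)
    improved = True
    while improved:
        improved = False
        for v in range(n):
            keep = labels[v]  # best label found so far for vertex v
            for c in range(k):
                if c == keep:
                    continue
                labels[v] = c
                cost = clustering_cost(n, edges, labels)
                if cost < best:
                    best, keep, improved = cost, c, True
            labels[v] = keep
    return labels, best


# Two triangles joined by one edge; vertex 3 starts in the wrong cluster.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
start = [0, 0, 0, 0, 1, 1]
start_cost = clustering_cost(6, edges, start)
final_labels, final_cost = local_search(6, edges, start, k=2)
```

On this toy graph the search moves vertex 3 into the second triangle, leaving only the bridging edge as a cost.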
Washington photometry of 14 intermediate-age to old star clusters in the Small Magellanic Cloud

NASA Astrophysics Data System (ADS)

Piatti, Andrés E.; Clariá, Juan J.; Bica, Eduardo; Geisler, Doug; Ahumada, Andrea V.; Girardi, Léo

2011-10-01

We present CCD photometry in the Washington system C, T1 and T2 passbands down to T1 ~ 23 in the fields of L3, L28, HW 66, L100, HW 79, IC 1708, L106, L108, L109, NGC 643, L112, HW 84, HW 85 and HW 86, 14 Small Magellanic Cloud (SMC) clusters, most of them poorly studied objects. We measured T1 magnitudes and C-T1 and T1-T2 colours for a total of 213 516 stars spread throughout cluster areas of 14.7 × 14.7 arcmin^2 each. We carried out an in-depth analysis of the field star contamination of the colour-magnitude diagrams (CMDs) and statistically cleaned the cluster CMDs. Based on the best fits of isochrones computed by the Padova group to the (T1, C-T1) CMDs, as well as from the δ(T1) index and the standard giant branch procedure, we derived ages and metallicities for the cluster sample. With the exception of IC 1708, a relatively metal-poor Hyades-age cluster, the remaining 13 objects are between intermediate and old age (from 1.0 to 6.3 Gyr), their [Fe/H] values ranging from -1.4 to -0.7 dex. By combining these results with others available in the literature, we compiled a sample of 43 well-known SMC clusters older than 1 Gyr, with which we produced a revised age distribution. We found that the present clusters' age distribution reveals two primary excesses of clusters at t ~ 2 and 5 Gyr, which engraves the SMC with clear signs of enhanced formation episodes at both ages. In addition, we found that from the birth of the SMC cluster system until approximately the first 4 Gyr of its lifetime, the cluster formation resembles that of a constant formation rate scenario.

The COMPTEL Processing and Analysis Software system (COMPASS)

NASA Astrophysics Data System (ADS)

de Vries, C. P.; COMPTEL Collaboration

The data analysis system of the gamma-ray Compton Telescope (COMPTEL) onboard the Compton-GRO spacecraft is described. A continuous stream of data of the order of 1 kbyte per second is generated by the instrument. The data processing and analysis software is built around a relational database management system (RDBMS) in order to be able to trace heritage and processing status of all data in the processing pipeline. Four institutes cooperate in this effort, requiring procedures to keep local RDBMS contents identical between the sites and swift exchange of data using network facilities. Lately, there has been a gradual move of the system from central processing facilities towards clusters of workstations.
Fast ground filtering for TLS data via Scanline Density Analysis

NASA Astrophysics Data System (ADS)

Che, Erzhuo; Olsen, Michael J.

2017-07-01

Terrestrial Laser Scanning (TLS) efficiently collects 3D information based on lidar (light detection and ranging) technology. TLS has been widely used in topographic mapping, engineering surveying, forestry, industrial facilities, cultural heritage, and so on. Ground filtering is a common procedure in lidar data processing, which separates the point cloud data into ground points and non-ground points. Effective ground filtering is helpful for subsequent procedures such as segmentation, classification, and modeling. Numerous ground filtering algorithms have been developed for Airborne Laser Scanning (ALS) data. However, many of these are error prone in application to TLS data because of its different angle of view and highly variable resolution. Further, many ground filtering techniques are limited in application within challenging topography and experience difficulty coping with some objects such as short vegetation, steep slopes, and so forth. Lastly, due to the large size of point cloud data, operations such as data traversing, multiple iterations, and neighbor searching significantly affect the computation efficiency. In order to overcome these challenges, we present an efficient ground filtering method for TLS data via a Scanline Density Analysis, which is very fast because it exploits the grid structure storing TLS data. The process first separates the ground candidates, density features, and unidentified points based on an analysis of point density within each scanline. Second, a region growth using the scan pattern is performed to cluster the ground candidates and further refine the ground points (clusters). In the experiment, the effectiveness, parameter robustness, and efficiency of the proposed method are demonstrated with datasets collected from an urban scene and a natural scene, respectively.
Identification of subsurface microorganisms at Yucca Mountain; Third quarterly report, January 1, 1994--March 31, 1994

DOE Office of Scientific and Technical Information (OSTI.GOV)

Stetzenbach, L.D.

1994-05-01

Bacteria isolated from ground water samples taken from 31 springs during 1993 were collected and processed according to procedures described in earlier reports. These procedures required aseptic collection of surface water samples in sterile screw-capped containers, transportation to the HRC microbiology laboratory, and culture by spread plating onto R2A medium. The isolates were further processed for identification using a gas chromatographic analysis of fatty acid methyl esters (FAME) extracted from cell membranes. This work generated a presumptive identification of 113 bacterial species distributed among 45 genera using a database obtained from Microbial ID, Inc., Newark, Delaware (MIDI). A preliminary examination of the FAME data was accomplished using cluster analysis and principal component analysis software obtained from MIDI. Typically, bacterial strains that cluster at less than 10 Euclidean distance units have fatty acid patterns consistent among members of the same species. Thus an organism obtained from one source can be recognized if it is isolated again from the same or any other source. This makes it possible to track the distribution of organisms and monitor environmental conditions or fluid transport mechanisms. Microorganisms are seldom found as monocultures in natural environments. They are more likely to be closely associated with other genera with complementary metabolic requirements. An understanding of the indigenous microorganism population is useful in understanding subtle changes in the environment. However, classification of environmental organisms using traditional methods is not ideal because differentiation of species with small variations or genera with very similar taxonomic characteristics is beyond the capabilities of traditional microbiological methods.
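The rule of thumb that strains clustering at less than 10 Euclidean distance units share a species-level fatty acid pattern can be sketched as a distance-threshold grouping. The single-linkage rule below is an assumption made for illustration and is not a description of the MIDI software. The isolate names and fatty acid percentages are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two fatty acid profiles of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def group_by_distance(profiles, threshold=10.0):
    """Single-linkage grouping: an isolate joins a group when it lies closer
    than `threshold` to at least one member of that group."""
    names = list(profiles)
    parent = {name: name for name in names}

    def find(name):
        # Union-find root lookup with path halving.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if euclidean(profiles[a], profiles[b]) < threshold:
                parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical profiles: percentage of three fatty acids per isolate.
profiles = {
    "isolate_1": [30.0, 25.0, 10.0],
    "isolate_2": [33.0, 24.0, 12.0],
    "isolate_3": [60.0, 5.0, 2.0],
    "isolate_4": [58.0, 8.0, 3.0],
}
groups = group_by_distance(profiles)
```

Other linkage rules (average or complete linkage) would give different groups for borderline isolates, so the choice matters when distances sit near the threshold.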
Molecular counting of membrane receptor subunits with single-molecule localization microscopy

NASA Astrophysics Data System (ADS)

Krüger, Carmen; Fricke, Franziska; Karathanasis, Christos; Dietz, Marina S.; Malkusch, Sebastian; Hummer, Gerhard; Heilemann, Mike

2017-02-01

We report on quantitative single-molecule localization microscopy, a method that next to super-resolved images of cellular structures provides information on protein copy numbers in protein clusters. This approach is based on the analysis of blinking cycles of single fluorophores, and on a model-free description of the distribution of the number of blinking events. We describe the experimental and analytical procedures, present cellular data of plasma membrane proteins and discuss the applicability of this method.
Visual Pattern Analysis in Histopathology Images Using Bag of Features

NASA Astrophysics Data System (ADS)

Cruz-Roa, Angel; Caicedo, Juan C.; González, Fabio A.

This paper presents a framework to analyse visual patterns in a collection of medical images in a two stage procedure. First, a set of representative visual patterns from the image collection is obtained by constructing a visual-word dictionary under a bag-of-features approach. Second, an analysis of the relationships between visual patterns and semantic concepts in the image collection is performed. The most important visual patterns for each semantic concept are identified using correlation analysis. A matrix visualization of the structure and organization of the image collection is generated using a cluster analysis. The experimental evaluation was conducted on a histopathology image collection and results showed clear relationships between visual patterns and semantic concepts, that in addition, are of easy interpretation and understanding.
Optimized Clustering Estimators for BAO Measurements Accounting for Significant Redshift Uncertainty

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ross, Ashley J.; Banik, Nilanjan; Avila, Santiago

2017-05-15

We determine an optimized clustering statistic to be used for galaxy samples with significant redshift uncertainty, such as those that rely on photometric redshifts. To do so, we study the BAO information content as a function of the orientation of galaxy clustering modes with respect to their angle to the line-of-sight (LOS). The clustering along the LOS, as observed in a redshift-space with significant redshift uncertainty, has contributions from clustering modes with a range of orientations with respect to the true LOS. For redshift uncertainty $\sigma_z \geq 0.02(1+z)$ we find that while the BAO information is confined to transverse clustering modes in the true space, it is spread nearly evenly in the observed space. Thus, measuring clustering in terms of the projected separation (regardless of the LOS) is an efficient and nearly lossless compression of the signal for $\sigma_z \geq 0.02(1+z)$. For reduced redshift uncertainty, a more careful consideration is required. We then use more than 1700 realizations of galaxy simulations mimicking the Dark Energy Survey Year 1 sample to validate our analytic results and optimized analysis procedure. We find that using the correlation function binned in projected separation, we can achieve uncertainties that are within 10 per cent of those predicted by Fisher matrix forecasts. We predict that DES Y1 should achieve a 5 per cent distance measurement using our optimized methods. We expect the results presented here to be important for any future BAO measurements made using photometric redshift data.
Optimized clustering estimators for BAO measurements accounting for significant redshift uncertainty

NASA Astrophysics Data System (ADS)

Ross, Ashley J.; Banik, Nilanjan; Avila, Santiago; Percival, Will J.; Dodelson, Scott; Garcia-Bellido, Juan; Crocce, Martin; Elvin-Poole, Jack; Giannantonio, Tommaso; Manera, Marc; Sevilla-Noarbe, Ignacio

2017-12-01

We determine an optimized clustering statistic to be used for galaxy samples with significant redshift uncertainty, such as those that rely on photometric redshifts. To do so, we study the baryon acoustic oscillation (BAO) information content as a function of the orientation of galaxy clustering modes with respect to their angle to the line of sight (LOS). The clustering along the LOS, as observed in a redshift-space with significant redshift uncertainty, has contributions from clustering modes with a range of orientations with respect to the true LOS. For redshift uncertainty σz ≥ 0.02(1 + z), we find that while the BAO information is confined to transverse clustering modes in the true space, it is spread nearly evenly in the observed space. Thus, measuring clustering in terms of the projected separation (regardless of the LOS) is an efficient and nearly lossless compression of the signal for σz ≥ 0.02(1 + z). For reduced redshift uncertainty, a more careful consideration is required. We then use more than 1700 realizations (combining two separate sets) of galaxy simulations mimicking the Dark Energy Survey Year 1 (DES Y1) sample to validate our analytic results and optimized analysis procedure. We find that using the correlation function binned in projected separation, we can achieve uncertainties that are within 10 per cent of those predicted by Fisher matrix forecasts. We predict that DES Y1 should achieve a 5 per cent distance measurement using our optimized methods. We expect the results presented here to be important for any future BAO measurements made using photometric redshift data.
An Intercomparison Between Radar Reflectivity and the IR Cloud Classification Technique for the TOGA-COARE Area

NASA Technical Reports Server (NTRS)

Carvalho, L. M. V.; Rickenbach, T.

1999-01-01

Satellite infrared (IR) and visible (VIS) images from the Tropical Ocean Global Atmosphere - Coupled Ocean Atmosphere Response Experiment (TOGA-COARE) experiment are investigated through the use of Clustering Analysis. The clusters are obtained from the values of IR and VIS counts and the local variance for both channels. The clustering procedure is based on the standardized histogram of each variable obtained from 179 pairs of images. A new approach to classify high clouds using only IR and the clustering technique is proposed. This method allows the separation of the enhanced convection in two main classes: convective tops, more closely related to the most active core of the storm, and convective systems, which produce regions of merged, thick anvil clouds. The resulting classification of different portions of cloudiness is compared to the radar reflectivity field for intensive events. Convective Systems and Convective Tops are followed during their life cycle using the IR clustering method. The areal coverage of precipitation and features related to convective and stratiform rain is obtained from the radar for each stage of the evolving Mesoscale Convective Systems (MCS). In order to compare the IR clustering method with a simple threshold technique, two IR thresholds (Tir) were used to identify different portions of cloudiness, Tir=240K which roughly defines the extent of all cloudiness associated with the MCS, and Tir=220K which indicates the presence of deep convection. It is shown that the IR clustering technique can be used as a simple alternative to identify the actual portion of convective and stratiform rainfall.
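The simple threshold technique used as the baseline can be sketched as follows. The two thresholds, 240 K and 220 K, come from the abstract; the class labels, the function names, the use of inclusive comparisons, and the sample brightness temperatures are assumptions made for illustration.

```python
from collections import Counter

LABELS = ("deep convection", "MCS cloudiness", "outside MCS")


def classify_ir_pixel(t_ir_kelvin, cloud_threshold=240.0, deep_threshold=220.0):
    """Label one pixel from its IR brightness temperature (colder = higher cloud top)."""
    if t_ir_kelvin <= deep_threshold:
        return "deep convection"
    if t_ir_kelvin <= cloud_threshold:
        return "MCS cloudiness"
    return "outside MCS"


def class_fractions(temperatures_kelvin):
    """Fraction of pixels falling in each threshold class."""
    if not temperatures_kelvin:
        raise ValueError("at least one pixel is required")
    counts = Counter(classify_ir_pixel(t) for t in temperatures_kelvin)
    total = len(temperatures_kelvin)
    return {label: counts[label] / total for label in LABELS}


# Hypothetical brightness temperatures (K) for eight pixels.
pixels = [205.0, 218.5, 231.0, 239.9, 255.0, 280.0, 219.0, 245.0]
fractions = class_fractions(pixels)
```

Fractions of this kind give the areal coverage that the abstract compares against the clustering-based classes and the radar field.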
Identifying knowledge activism in worker health and safety representation: A cluster analysis.

PubMed

Hall, Alan; Oudyk, John; King, Andrew; Naqvi, Syed; Lewchuk, Wayne

2016-01-01

Although worker representation in OHS has been widely recognized as contributing to health and safety improvements at work, few studies have examined the role that worker representatives play in this process. Using a large quantitative sample, this paper seeks to confirm findings from an earlier exploratory qualitative study that worker representatives can be differentiated by the knowledge intensive tactics and strategies that they use to achieve changes in their workplace. Just under 900 worker health and safety representatives in Ontario completed surveys which asked them to report on the amount of time they devoted to different types of representation activities (i.e., technical activities such as inspections and report writing vs. political activities such as mobilizing workers to build support), the kinds of conditions or hazards they tried to address through their representation (e.g., housekeeping vs. modifications in ventilation systems), and their reported success in making positive improvements. A cluster analysis was used to determine whether the worker representatives could be distinguished in terms of the relative time devoted to different activities and the clusters were then compared with reference to types of intervention efforts and outcomes. The cluster analysis identified three distinct groupings of representatives with significant differences in reported types of interventions and in their level of reported impact. Two of the clusters were consistent with the findings in the exploratory study, identified as knowledge activism for greater emphasis on knowledge based political activity and technical-legal representation for greater emphasis on formalized technical oriented procedures and legal regulations. Knowledge activists were more likely to take on challenging interventions and they reported more impact across the full range of interventions. 
This paper provides further support for the concepts of knowledge activism and technical-legal representation when differentiating the strategic orientations and impact of worker health and safety representatives, with important implications for education, political support and recruitment. © 2015 Wiley Periodicals, Inc.
The Gaia-ESO Survey. Mg-Al anti-correlation in iDR4 globular clusters

NASA Astrophysics Data System (ADS)

Pancino, E.; Romano, D.; Tang, B.; Tautvaišienė, G.; Casey, A. R.; Gruyters, P.; Geisler, D.; San Roman, I.; Randich, S.; Alfaro, E. J.; Bragaglia, A.; Flaccomio, E.; Korn, A. J.; Recio-Blanco, A.; Smiljanic, R.; Carraro, G.; Bayo, A.; Costado, M. T.; Damiani, F.; Jofré, P.; Lardo, C.; de Laverny, P.; Monaco, L.; Morbidelli, L.; Sbordone, L.; Sousa, S. G.; Villanova, S.

2017-05-01

We use Gaia-ESO (GES) Survey iDR4 data to explore the Mg-Al anti-correlation in globular clusters that were observed as calibrators, as a demonstration of the quality of Gaia-ESO Survey data and analysis. The results compare well with the available literature, within 0.1 dex or less, after a small (compared to the internal spreads) offset between the UVES and GIRAFFE data of 0.10-0.15 dex was taken into account. In particular, for the first time we present data for NGC 5927, which is one of the most metal-rich globular clusters studied in the literature so far with [Fe/H] = -0.39 ± 0.04 dex; this cluster was included to connect with the open cluster regime in the Gaia-ESO Survey internal calibration. The extent and shape of the Mg-Al anti-correlation provide strong constraints on the multiple population phenomenon in globular clusters. In particular, we studied the dependency of the Mg-Al anti-correlation extension with metallicity, present-day mass, and age of the clusters, using GES data in combination with a large set of homogenized literature measurements. We find a dependency with both metallicity and mass, which is evident when fitting for the two parameters simultaneously, but we do not find significant dependency with age. We confirm that the Mg-Al anti-correlation is not seen in all clusters, but disappears for the less massive or most metal-rich clusters. We also use our data set to see whether a normal anti-correlation would explain the low [Mg/α] observed in some extragalactic globular clusters, but find that none of the clusters in our sample can reproduce it; a more extreme chemical composition, such as that of NGC 2419, would be required. We conclude that GES iDR4 data already meet the requirements set by the main survey goals and can be used to study globular clusters in detail, even if the analysis procedures were not specifically designed for them.
Based on data products from observations made with ESO Telescopes at the La Silla Paranal Observatory under programme ID 188.B-3002. Full Table 2 is only available at the CDS via anonymous ftp to http://cdsarc.u-strasbg.fr (http://130.79.128.5) or via http://cdsarc.u-strasbg.fr/viz-bin/qcat?J/A+A/601/A112
Fast clustering algorithm for large ECG data sets based on CS theory in combination with PCA and K-NN methods.

PubMed

Balouchestani, Mohammadreza; Krishnan, Sridhar

2014-01-01

Long-term recording of Electrocardiogram (ECG) signals plays an important role in health care systems for the diagnosis and treatment of heart diseases. Clustering and classification of the collected data are essential for detecting concealed information in the P-QRS-T waves of long-term ECG recordings. Currently used algorithms have two main drawbacks: 1) clustering and classification cannot be done in real time; 2) they suffer from huge energy consumption and sampling load. These drawbacks motivated us to develop a novel optimized clustering algorithm that can easily scan large ECG datasets, enabling low-power long-term ECG recording. In this paper, we present an advanced K-means clustering algorithm based on Compressed Sensing (CS) theory as a random sampling procedure. Then, two dimensionality reduction methods, Principal Component Analysis (PCA) and Linear Correlation Coefficient (LCC), followed by sorting the data using the K-Nearest Neighbours (K-NN) and Probabilistic Neural Network (PNN) classifiers, are applied to the proposed algorithm. Our algorithm based on PCA features in combination with the K-NN classifier performs better than the other methods. The proposed algorithm outperforms existing algorithms by increasing classification accuracy by 11%. In addition, the proposed algorithm illustrates classification accuracy for K-NN and PNN classifiers, and a Receiver Operating Characteristics (ROC) area of 99.98%, 99.83%, and 99.75% respectively.
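The K-NN classification step named in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the two-feature vectors (standing in for, say, two PCA scores per heartbeat), the labels "normal" and "abnormal", and the function name `knn_predict` are all hypothetical, and Euclidean distance with majority voting is assumed.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    # Rank training indices by Euclidean distance to the query point.
    order = sorted(range(len(train_points)),
                   key=lambda i: math.dist(train_points[i], query))
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Hypothetical feature vectors and labels, for illustration only.
train_points = [(0.1, 0.2), (0.0, 0.4), (0.3, 0.1),
                (2.0, 2.1), (2.2, 1.9), (1.8, 2.3)]
train_labels = ["normal", "normal", "normal",
                "abnormal", "abnormal", "abnormal"]

print(knn_predict(train_points, train_labels, (0.2, 0.3)))
print(knn_predict(train_points, train_labels, (2.1, 2.0)))
```

With k = 3, each query here is surrounded by three training points of a single class, so the vote is unanimous; on real data k would normally be tuned, for example by cross-validation.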
Newspaper coverage of suicide and initiation of suicide clusters in teenagers in the USA, 1988-96: a retrospective, population-based, case-control study.

PubMed

Gould, Madelyn S; Kleinman, Marjorie H; Lake, Alison M; Forman, Judith; Midle, Jennifer Bassett

2014-06-01

Public health and clinical efforts to prevent suicide clusters are seriously hampered by the unanswered question of why such outbreaks occur. We aimed to establish whether an environmental factor (newspaper reports of suicide) has a role in the emergence of suicide clusters. In this retrospective, population-based, case-control study, we identified suicide clusters in young people aged 13-20 years in the USA from 1988 to 1996 (preceding the advent of social media) using the time-space Scan statistic. For each cluster community, we selected two matched non-cluster control communities in which suicides of similarly aged youth occurred, from non-contiguous counties within the same state as the cluster. We examined newspapers within each cluster community for stories about suicide published in the days between the first and second suicides in the cluster. In non-cluster communities, we examined a matched length of time after the matched control suicide. We used a content-analysis procedure to code the characteristics of each story and compared newspaper stories about suicide published in case and control communities with mixed-effect regression analyses. We identified 53 suicide clusters, of which 48 were included in the media review. For one cluster we could identify only one appropriate control; therefore, 95 matched control communities were included. The mean number of news stories about suicidal individuals published after an index cluster suicide (7·42 [SD 10·02]) was significantly greater than the mean number of suicide stories published after a non-cluster suicide (5·14 [6·00]; p<0·0001). Several story characteristics, including front-page placement, headlines containing the word suicide or a description of the method used, and detailed descriptions of the suicidal individual and act, appeared more often in stories published after the index cluster suicides than after non-cluster suicides.
Our identification of an association between newspaper reports about suicide (including specific story characteristics) and the initiation of teenage suicide clusters should provide an empirical basis to support efforts by mental health professionals, community officials, and the media to work together to identify and prevent the onset of suicide clusters. Funding: US National Institute of Mental Health and American Foundation for Suicide Prevention. Copyright © 2014 Elsevier Ltd. All rights reserved.
Training a Network of Electronic Neurons for Control of a Mobile Robot

NASA Astrophysics Data System (ADS)

Vromen, T. G. M.; Steur, E.; Nijmeijer, H.

An adaptive training procedure is developed for a network of electronic neurons, which controls a mobile robot driving around in an unknown environment while avoiding obstacles. The neuronal network controls the angular velocity of the wheels of the robot based on the sensor readings. The nodes in the neuronal network controller are clusters of neurons rather than single neurons. The adaptive training procedure ensures that the input-output behavior of the clusters is identical, even though the constituting neurons are nonidentical and have, in isolation, nonidentical responses to the same input. In particular, we let the neurons interact via a diffusive coupling, and the proposed training procedure modifies the diffusion interaction weights such that the neurons behave synchronously with a predefined response. The working principle of the training procedure is experimentally validated and results of an experiment with a mobile robot that is completely autonomously driving in an unknown environment with obstacles are presented.
The effect of genotype on methotrexate polyglutamate variability in juvenile idiopathic arthritis and association with drug response.

PubMed

Becker, Mara L; Gaedigk, Roger; van Haandel, Leon; Thomas, Bradley; Lasky, Andrew; Hoeltzel, Mark; Dai, Hongying; Stobaugh, John; Leeder, J Steven

2011-01-01

The response to and toxicity of methotrexate (MTX) are unpredictable in patients with juvenile idiopathic arthritis (JIA). Intracellular polyglutamation of MTX, assessed by measuring concentrations of MTX polyglutamates (MTXGlu), has been demonstrated to be a promising predictor of drug response. Therefore, this study was aimed at investigating the genetic predictors of MTXGlu variability and associations between MTXGlu and drug response in JIA. The study was designed as a single-center cross-sectional analysis of patients with JIA who were receiving stable doses of MTX at a tertiary care children's hospital. After informed consent was obtained from the 104 patients with JIA, blood was withdrawn during routine MTX-screening laboratory testing. Clinical data were collected by chart review. Genotyping for 34 single-nucleotide polymorphisms (SNPs) in 18 genes within the MTX metabolic pathway was performed. An ion-pair chromatographic procedure with mass spectrometric detection was used to measure MTXGlu1-7. Analysis and genotyping of MTXGlu was completed in the 104 patients. K-means clustering resulted in 3 distinct patterns of MTX polyglutamation. Cluster 1 had low red blood cell (RBC) MTXGlu concentrations, cluster 2 had moderately high RBC MTXGlu1+2 concentrations, and cluster 3 had high concentrations of MTXGlu, specifically MTXGlu3-5. SNPs in the purine and pyrimidine synthesis pathways, as well as the adenosine pathway, were significantly associated with cluster subtype. The cluster with high concentrations of MTXGlu3-5 was associated with elevated liver enzyme levels on liver function tests (LFTs), and there were higher concentrations of MTXGlu3-5 in children who reported gastrointestinal side effects and had abnormal findings on LFTs. No association was noted between MTXGlu and active arthritis. MTXGlu remains a potentially useful tool for determining outcomes in patients with JIA being treated with MTX. 
The genetic predictors of MTXGlu variability may also contribute to a better understanding of the intracellular biotransformation of MTX in these patients. Copyright © 2011 by the American College of Rheumatology.
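For readers unfamiliar with the K-means step mentioned in this abstract, the following is a minimal sketch of Lloyd's algorithm. It is an illustration under stated assumptions, not the study's analysis: the two-dimensional points (imagined as short-chain and long-chain concentrations), the choice of initial centroids, and the function name `kmeans` are hypothetical.

```python
import math

def kmeans(points, centroids, n_iter=100):
    """Lloyd's algorithm: alternate nearest-centroid assignment and mean update."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(len(centroids)),
                      key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centroids.append(old)  # an empty cluster keeps its centroid
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids

# Hypothetical (short-chain, long-chain) concentration pairs forming three groups.
points = [(1, 1), (2, 1), (1, 2),
          (10, 2), (11, 1), (10, 1),
          (3, 12), (2, 11), (3, 10)]
labels, centroids = kmeans(points, [points[0], points[3], points[6]])
print(labels)
```

K-means is sensitive to its starting centroids, so in practice it is usually run from several random initializations and the solution with the lowest within-cluster sum of squares is kept.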
Procedural Guide for Designation Surveys of Ocean Dredged Material Disposal Sites. Revision

DTIC Science & Technology

1990-04-01

data standardization." One of the most frequently used clustering strategies is called UPGMA (unweighted pair-group method using arithmetic averages...Sneath and Sokal 1973). Romesburg (1984) evaluated many possible methods and concluded that UPGMA is appropriate for most types of cluster
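UPGMA, as named in the excerpt above, repeatedly merges the two clusters whose average pairwise distance is smallest. The sketch below is a naive illustration of that rule, assuming Euclidean distance; the function name `upgma` and the one-dimensional sample values are hypothetical and are not taken from the report.

```python
import math
from itertools import combinations

def upgma(points, n_clusters=1):
    """Agglomerative clustering with average linkage (UPGMA).

    Returns the final clusters (lists of point indices) and the merge
    history as (cluster_a, cluster_b, average_distance) tuples.
    """
    clusters = [[i] for i in range(len(points))]
    history = []

    def avg_dist(a, b):
        # Unweighted mean of all between-cluster pairwise distances.
        total = sum(math.dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Merge the pair of clusters with the smallest average distance.
        a, b = min(combinations(clusters, 2), key=lambda pair: avg_dist(*pair))
        history.append((a, b, avg_dist(a, b)))
        clusters = [c for c in clusters if c is not a and c is not b] + [a + b]
    return clusters, history

# Hypothetical one-dimensional observations.
samples = [(0.0,), (1.0,), (5.0,), (6.0,), (20.0,)]
clusters, history = upgma(samples, n_clusters=2)
print(sorted(sorted(c) for c in clusters))
```

The merge history is what a dendrogram draws: each tuple records which two groups were joined and at what average distance. This version recomputes every distance at every step, which is fine for a small illustration but too slow for large data sets.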
Origin and evolution of the Perm Anomaly

NASA Astrophysics Data System (ADS)

Flament, N. E.; Williams, S.; Müller, D.; Gurnis, M.; Bower, D. J.

2016-12-01

Earth's lower mantle is characterized by two large-low-shear velocity provinces (LLSVPs, ~15000 km in diameter, 500-1000 km high) located under Africa and the Pacific Ocean. In addition, a single, much smaller (~1000 km in diameter, 500 km high) deep mantle structure named the "Perm Anomaly" was recently identified through the analysis of seismic tomography models. This discovery challenges current reconstructions of the evolution of the plate-mantle system that invoke plumes rising from the edges of the two LLSVPs, assumed spatially fixed and non-deforming in time. Here, we present mantle flow models constrained by tectonic reconstructions that reproduce the present-day structure of the lower mantle, and show a Perm-like anomaly. In the dynamic models, spanning 230 Myr, subducting slabs deform an initially uniform basal layer containing 2% of the volume of the mantle. Basal density, convective vigour, mantle viscosity, absolute plate motions, and relative plate motions are varied in a series of model cases. We use cluster analysis to classify equally-spaced points on Earth's surface into two groups with similar variations in present-day temperature between 1000-2800 km depth, for each model case. The procedure reveals a high-temperature cluster and a low-temperature cluster with respect to ambient mantle temperature below 2400 km depth. The spatial extent of the high-temperature cluster is in first-order agreement with the outlines of the LLSVPs and of the Perm Anomaly revealed by a similar cluster analysis of seven tomography models. Model success is quantified by computing the accuracy (between 0.56 and 0.76) of the temperature clusters in predicting the low-velocity cluster obtained from tomography, and qualified by the occurrence of a separate Perm-like anomaly.
The anomaly formed in isolation prior to 150 Ma within a long-lived subduction network 22000 km in circumference composed of the Mongol-Okhotsk subduction along Eurasia to the west, northern Tethys subduction to the south, and east Asia subduction to the east, then migrated 2500 km westward at an average rate of 1.7 cm/yr, indicating a greater mobility of deep mantle structures than previously recognized. We infer that the mobile Perm Anomaly could be linked to the Emeishan volcanics, in contrast to the previously proposed Siberian Traps.
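As a quick consistency check on the figures quoted in this abstract, the migration distance and average rate imply a migration time. The short calculation below assumes only the two quoted numbers (2500 km and 1.7 cm/yr); the variable names are illustrative.

```python
KM_TO_CM = 1.0e5        # centimetres per kilometre
distance_km = 2500.0    # westward migration quoted in the abstract
rate_cm_per_yr = 1.7    # average migration rate quoted in the abstract

# time = distance / rate, converted from years to millions of years
duration_myr = distance_km * KM_TO_CM / rate_cm_per_yr / 1.0e6
print(f"implied migration time: {duration_myr:.0f} Myr")
```

The result is about 147 Myr, which is consistent with the abstract's statement that the anomaly formed before 150 Ma and migrated afterwards.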
Skin movement artefact assessment and compensation in the estimation of knee-joint kinematics.

PubMed

Lucchetti, L; Cappozzo, A; Cappello, A; Della Croce, U

1998-11-01

In three-dimensional (3-D) human movement analysis using close-range photogrammetry, surface marker clusters deform and rigidly move relative to the underlying bone. This introduces an important artefact (skin movement artefact) which propagates to bone position and orientation and joint kinematics estimates. This occurs to the extent that those joint attitude components that undergo small variations result in totally unreliable values. This paper presents an experimental and analytical procedure, to be included in a subject-specific movement analysis protocol, which allows for the assessment of skin movement artefacts and, based on this knowledge, for their compensation. The effectiveness of this procedure was verified with reference to knee-joint kinematics and to the artefacts caused by the hip movements on markers located on the thigh surface. Quantitative validation was achieved through experimental paradigms whereby prior reliable information on the target joint kinematics was available. When position and orientation of bones were determined during the execution of a motor task, using a least-squares optimal estimator, but the rigid artefactual marker cluster movement was not dealt with, then knee joint translations and rotations were affected by root mean square (r.m.s.) errors up to 14 mm and 6 degrees, respectively. When the rigid artefactual movement was also compensated for, then r.m.s. errors were reduced to less than 4 mm and 3 degrees, respectively. In addition, errors originally strongly correlated with hip rotations, after compensation, lost this correlation.
Morphometric and kinematic sperm subpopulations in split ejaculates of normozoospermic men

PubMed Central

Santolaria, Pilar; Soler, Carles; Recreo, Pilar; Carretero, Teresa; Bono, Araceli; Berné, José M; Yániz, Jesús L

2016-01-01

This study was designed to analyze the sperm kinematic and morphometric subpopulations in the different fractions of the ejaculate in normozoospermic men. Ejaculates from eight normozoospermic men were collected by masturbation in three fractions after 3–5 days of sexual abstinence. Analyses of sperm motility by computer-assisted sperm analysis (CASA-Mot), and of sperm morphometry by computer-assisted sperm morphometry analysis (CASA-Morph) using fluorescence were performed. Clustering and discriminant procedures were performed to identify sperm subpopulations in the kinematic and morphometric data obtained. Clustering procedures resulted in the classification of spermatozoa into three kinematic subpopulations (slow with low ALH [35.6% of all motile spermatozoa], with circular trajectories [32.0%], and rapid with high ALH [32.4%]), and three morphometric subpopulations (large-round [33.9% of all spermatozoa], elongated [32.0%], and small [34.1%]). The distribution of kinematic sperm subpopulations was different among ejaculate fractions (P < 0.001), with higher percentages of spermatozoa exhibiting slow movements with low ALH in the second and third portions, and with a more homogeneous distribution of kinematic sperm subpopulations in the first portion. The distribution of morphometric sperm subpopulations was also different among ejaculate fractions (P < 0.001), with more elongated spermatozoa in the first, and of small spermatozoa in the third, portion. It is concluded that important variations in the distribution of kinematic and morphometric sperm subpopulations exist between ejaculate fractions, with possible functional implications. PMID:27624985
Discrimination and chemical phylogenetic study of seven species of Dendrobium using infrared spectroscopy combined with cluster analysis

NASA Astrophysics Data System (ADS)

Luo, Congpei; He, Tao; Chun, Ze

2013-04-01

Dendrobium is a commonly used and precious herb in Traditional Chinese Medicine. The high biodiversity of Dendrobium and the therapeutic needs require tools for the correct and fast discrimination of different Dendrobium species. This study investigates Fourier transform infrared spectroscopy followed by cluster analysis for discrimination and chemical phylogenetic study of seven Dendrobium species. Despite the general pattern of the IR spectra, different intensities, shapes, and peak positions were found in the IR spectra of these samples, especially in the range of 1800-800 cm^-1. The second derivative transformation and alcoholic extracting procedure obviously enlarged the tiny spectral differences among these samples. The results indicated each Dendrobium species had a characteristic IR spectral profile, which could be used to discriminate them. The similarity coefficients among the samples were analyzed based on their second derivative IR spectra; they ranged from 0.7632 to 0.9700 among the seven Dendrobium species, and from 0.5163 to 0.9615 among the ethanol extracts. A dendrogram was constructed based on cluster analysis of the IR spectra for studying the chemical phylogenetic relationships among the samples. The results indicated that D. denneanum and D. crepidatum could be the alternative resources to substitute D. chrysotoxum, D. officinale and D. nobile which were officially recorded in Chinese Pharmacopoeia. In conclusion, with the advantages of high resolution, speediness and convenience, the experimental approach can successfully discriminate and construct the chemical phylogenetic relationships of the seven Dendrobium species.
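The abstract does not say which similarity coefficient was used. As a hedged illustration only, the sketch below assumes the Pearson correlation between second-derivative spectra, with the second derivative approximated by central finite differences on an evenly spaced wavenumber grid. The absorbance values and function names are hypothetical.

```python
import math

def second_derivative(spectrum):
    """Central finite-difference second derivative on an evenly spaced grid."""
    return [spectrum[i - 1] - 2.0 * spectrum[i] + spectrum[i + 1]
            for i in range(1, len(spectrum) - 1)]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length, non-constant sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical absorbance readings; spectrum_b is spectrum_a rescaled and
# placed on a sloping linear baseline.
spectrum_a = [0.1, 0.3, 0.9, 0.4, 0.2, 0.6, 0.1]
spectrum_b = [2.0 * a + 0.05 * i + 1.0 for i, a in enumerate(spectrum_a)]

similarity = pearson(second_derivative(spectrum_a), second_derivative(spectrum_b))
print(round(similarity, 6))
```

The example shows why second-derivative spectra are convenient for comparison: a linear baseline drift vanishes under double differencing, so the two spectra above score as essentially identical even though their raw values differ. A matrix of such coefficients, converted to distances (for example 1 minus the similarity), is the usual input to the hierarchical clustering that produces a dendrogram.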
An optical catalog of galaxy clusters obtained from an adaptive matched filter finder applied to SDSS DR9 data

NASA Astrophysics Data System (ADS)

Banerjee, P.; Szabo, T.; Pierpaoli, E.; Franco, G.; Ortiz, M.; Oramas, A.; Tornello, B.

2018-01-01

We present a new galaxy cluster catalog constructed from the Sloan Digital Sky Survey Data Release 9 (SDSS DR9) using an Adaptive Matched Filter (AMF) technique. Our catalog has 46,479 galaxy clusters with richness Λ200 > 20 in the redshift range 0.045 ≤ z < 0.641 in ∼11,500 deg^2 of the sky. Angular position, richness, core and virial radii and redshift estimates for these clusters, as well as their error analysis, are provided as part of this catalog. In addition to the main version of the catalog, we also provide an extended version with a lower richness cut, containing 79,368 clusters. This version, in addition to the clusters in the main catalog, also contains those clusters (with richness 10 < Λ200 < 20) which have a one-to-one match in the DR8 catalog developed by Wen et al. (WHL). We obtain probabilities for cluster membership for each galaxy and implement several procedures for the identification and removal of false cluster detections. We cross-correlate the main AMF DR9 catalog with a number of cluster catalogs in different wavebands (Optical, X-ray). We compare our catalog with other SDSS-based ones such as the redMaPPer (26,350 clusters) and the Wen et al. (WHL) (132,684 clusters) in the same area of the sky and in the overlapping redshift range. We match 97% of the richest Abell clusters (Richness group 3), the same as WHL, while redMaPPer matches ∼90% of these clusters. Considering AMF DR9 richness bins, redMaPPer does not have one-to-one matches for 70% of our lowest richness clusters (20 < Λ200 < 40), while WHL matches 54% of these missed clusters (not present in redMaPPer). redMaPPer consistently does not possess one-to-one matches for ∼20% AMF DR9 clusters with Λ200 > 40, while WHL matches ≥ 70% of these missed clusters on average. For comparisons with X-ray clusters, we match the AMF catalog with BAX, MCXC and a combined catalog from NORAS and REFLEX.
We consistently obtain a greater number of one-to-one matches for X-ray clusters across higher luminosity bins (Lx > 6 × 10^44 ergs/sec) than redMaPPer while WHL matches the most clusters overall. For the most luminous clusters (Lx > 8), our catalog performs equivalently to WHL. This new catalog provides a wider sample than redMaPPer while retaining many fewer objects than WHL.

EnzDP: improved enzyme annotation for metabolic network reconstruction based on domain composition profiles.

PubMed

Nguyen, Nam-Ninh; Srihari, Sriganesh; Leong, Hon Wai; Chong, Ket-Fah

2015-10-01

Determining the entire complement of enzymes and their enzymatic functions is a fundamental step for reconstructing the metabolic network of cells. High quality enzyme annotation helps in enhancing metabolic networks reconstructed from the genome, especially by reducing gaps and increasing the enzyme coverage. Currently, structure-based and network-based approaches can only cover a limited number of enzyme families, and the accuracy of homology-based approaches can be further improved. Bottom-up homology-based approach improves the coverage by rebuilding Hidden Markov Model (HMM) profiles for all known enzymes. However, its clustering procedure relies firmly on BLAST similarity score, ignoring protein domains/patterns, and is sensitive to changes in cut-off thresholds. Here, we use functional domain architecture to score the association between domain families and enzyme families (Domain-Enzyme Association Scoring, DEAS). The DEAS score is used to calculate the similarity between proteins, which is then used in clustering procedure, instead of using sequence similarity score. We improve the enzyme annotation protocol using a stringent classification procedure, and by choosing optimal threshold settings and checking for active sites. Our analysis shows that our stringent protocol EnzDP can cover up to 90% of enzyme families available in Swiss-Prot. It achieves a high accuracy of 94.5% based on five-fold cross-validation. EnzDP outperforms existing methods across several testing scenarios. Thus, EnzDP serves as a reliable automated tool for enzyme annotation and metabolic network reconstruction. Available at: www.comp.nus.edu.sg/~nguyennn/EnzDP.
Nonlinear dimensionality reduction of data lying on the multicluster manifold.

PubMed

Meng, Deyu; Leung, Yee; Fung, Tung; Xu, Zongben

2008-08-01

A new method, which is called decomposition-composition (D-C) method, is proposed for the nonlinear dimensionality reduction (NLDR) of data lying on the multicluster manifold. The main idea is first to decompose a given data set into clusters and independently calculate the low-dimensional embeddings of each cluster by the decomposition procedure. Based on the intercluster connections, the embeddings of all clusters are then composed into their proper positions and orientations by the composition procedure. Different from other NLDR methods for multicluster data, which consider associatively the intracluster and intercluster information, the D-C method capitalizes on the separate employment of the intracluster neighborhood structures and the intercluster topologies for effective dimensionality reduction. This, on one hand, isometrically preserves the rigid-body shapes of the clusters in the embedding process and, on the other hand, guarantees the proper locations and orientations of all clusters. The theoretical arguments are supported by a series of experiments performed on the synthetic and real-life data sets. In addition, the computational complexity of the proposed method is analyzed, and its efficiency is theoretically analyzed and experimentally demonstrated. Related strategies for automatic parameter selection are also examined.
A cross-species bi-clustering approach to identifying conserved co-regulated genes.

PubMed

Sun, Jiangwen; Jiang, Zongliang; Tian, Xiuchun; Bi, Jinbo

2016-06-15

A growing number of studies have explored the process of pre-implantation embryonic development of multiple mammalian species. However, the conservation and variation among different species in their developmental programming are poorly defined due to the lack of effective computational methods for detecting co-regulated genes that are conserved across species. The most sophisticated method to date for identifying conserved co-regulated genes is a two-step approach. This approach first identifies gene clusters for each species by a cluster analysis of gene expression data, and subsequently computes the overlaps of clusters identified from different species to reveal common subgroups. This approach is ineffective at dealing with the noise in the expression data introduced by the complicated procedures used to quantify gene expression. Furthermore, due to the sequential nature of the approach, the gene clusters identified in the first step may have little overlap among different species in the second step, making conserved co-regulated genes difficult to detect. We propose a cross-species bi-clustering approach which first denoises the gene expression data of each species into a data matrix. The rows of the data matrices of different species represent the same set of genes that are characterized by their expression patterns over the developmental stages of each species as columns. A novel bi-clustering method is then developed to cluster genes into subgroups by a joint sparse rank-one factorization of all the data matrices. This method decomposes a data matrix into a product of a column vector and a row vector where the column vector is a consistent indicator across the matrices (species) to identify the same gene cluster and the row vector specifies for each species the developmental stages that the clustered genes co-regulate. An efficient optimization algorithm has been developed with convergence analysis.
This approach was first validated on synthetic data and compared to the two-step method and several recent joint clustering methods. We then applied this approach to two real world datasets of gene expression during the pre-implantation embryonic development of the human and mouse. Co-regulated genes consistent between the human and mouse were identified, offering insights into conserved functions, as well as similarities and differences in genome activation timing between the human and mouse embryos. The R package containing the implementation of the proposed method in C++ is available at: https://github.com/JavonSun/mvbc.git and also at the R platform https://www.r-project.org/. Contact: jinbo@engr.uconn.edu. © The Author 2016. Published by Oxford University Press.
A novel exploratory chemometric approach to environmental monitoring by combining block clustering with Partial Least Square (PLS) analysis

PubMed Central

2013-01-01

Background: Given the serious threats posed to terrestrial ecosystems by industrial contamination, environmental monitoring is a standard procedure used for assessing the current status of an environment or trends in environmental parameters. Measurement of metal concentrations at different trophic levels followed by their statistical analysis using exploratory multivariate methods can provide meaningful information on the status of environmental quality. In this context, the present paper proposes a novel chemometric approach to standard statistical methods by combining the Block clustering with Partial least square (PLS) analysis to investigate the accumulation patterns of metals in anthropized terrestrial ecosystems. The present study focused on copper, zinc, manganese, iron, cobalt, cadmium, nickel, and lead transfer along a soil-plant-snail food chain, and the hepatopancreas of the Roman snail (Helix pomatia) was used as a biological end-point of metal accumulation. Results: Block clustering delineates between the areas exposed to industrial and vehicular contamination. The toxic metals have similar distributions in the nettle leaves and snail hepatopancreas. PLS analysis showed that (1) zinc and copper concentrations at the lower trophic levels are the most important latent factors that contribute to metal accumulation in land snails; (2) cadmium and lead are the main determinants of pollution pattern in areas exposed to industrial contamination; (3) at the sites located near roads lead is the most threatening metal for terrestrial ecosystems. Conclusion: There were three major benefits of applying block clustering with PLS for processing the obtained data: firstly, it helped in grouping sites depending on the type of contamination. Secondly, it was valuable for identifying the latent factors that contribute the most to metal accumulation in land snails.
Finally, it optimized the number and type of data that are best for monitoring the status of metallic contamination in terrestrial ecosystems exposed to different kinds of anthropic pollution. PMID:23987502
Fully Automated Single-Zone Elliptic Grid Generation for Mars Science Laboratory (MSL) Aeroshell and Canopy Geometries

NASA Technical Reports Server (NTRS)

Kaul, Upender K.

2008-01-01

A procedure for generating smooth uniformly clustered single-zone grids using enhanced elliptic grid generation has been demonstrated here for the Mars Science Laboratory (MSL) geometries such as aeroshell and canopy. The procedure obviates the need for generating multizone grids for such geometries, as reported in the literature. This has been possible because the enhanced elliptic grid generator automatically generates clustered grids without manual prescription of decay parameters needed with the conventional approach. In fact, these decay parameters are calculated as decay functions as part of the solution, and they are not constant over a given boundary. Since these decay functions vary over a given boundary, orthogonal grids near any arbitrary boundary can be clustered automatically without having to break up the boundaries and the corresponding interior domains into various zones for grid generation.
Qualitative mechanism models and the rationalization of procedures

NASA Technical Reports Server (NTRS)

Farley, Arthur M.

1989-01-01

A qualitative, cluster-based approach to the representation of hydraulic systems is described and its potential for generating and explaining procedures is demonstrated. Many ideas are formalized and implemented as part of an interactive, computer-based system. The system allows for designing, displaying, and reasoning about hydraulic systems. The interactive system has an interface consisting of three windows: a design/control window, a cluster window, and a diagnosis/plan window. A qualitative mechanism model for the ORS (Orbital Refueling System) is presented to coordinate with ongoing research on this system being conducted at NASA Ames Research Center.
Evaluation of large area crop estimation techniques using LANDSAT and ground-derived data. [Missouri

NASA Technical Reports Server (NTRS)

Amis, M. L.; Lennington, R. K.; Martin, M. V.; Mcguire, W. G.; Shen, S. S. (Principal Investigator)

1981-01-01

The results of the Domestic Crops and Land Cover Classification and Clustering study on large area crop estimation using LANDSAT and ground truth data are reported. The current crop area estimation approach of the Economics and Statistics Service of the U.S. Department of Agriculture was evaluated in terms of the factors that are likely to influence the bias and variance of the estimator. Also, alternative procedures involving replacements for the clustering algorithm, the classifier, or the regression model used in the original U.S. Department of Agriculture procedures were investigated.
Eye-gaze determination of user intent at the computer interface

DOE Office of Scientific and Technical Information (OSTI.GOV)

Goldberg, J.H.; Schryver, J.C.

1993-12-31

Determination of user intent at the computer interface through eye-gaze monitoring can significantly aid applications for the disabled, as well as telerobotics and process control interfaces. Whereas current eye-gaze control applications are limited to object selection and x/y gazepoint tracking, a methodology was developed here to discriminate a more abstract interface operation: zooming-in or out. This methodology first collects samples of eye-gaze location looking at controlled stimuli, at 30 Hz, just prior to a user's decision to zoom. The sample is broken into data frames, or temporal snapshots. Within a data frame, all spatial samples are connected into a minimum spanning tree, then clustered, according to user defined parameters. Each cluster is mapped to one in the prior data frame, and statistics are computed from each cluster. These characteristics include cluster size, position, and pupil size. A multiple discriminant analysis uses these statistics both within and between data frames to formulate optimal rules for assigning the observations into zooming, zoom-out, or no zoom conditions. The statistical procedure effectively generates heuristics for future assignments, based upon these variables. Future work will enhance the accuracy and precision of the modeling technique, and will empirically test users in controlled experiments.
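The minimum-spanning-tree clustering step described in this abstract can be illustrated with a minimal sketch. It assumes that the "user defined parameter" is a maximum edge length (the name `max_edge` is hypothetical) and that clusters are obtained by cutting every tree edge longer than that length; the abstract does not state the actual rule, and the gaze coordinates below are made up.

```python
import math

def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph; returns (dist, i, j) edges."""
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n      # cheapest known distance from the tree to each node
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # pick the closest node that is not yet in the tree
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges

def mst_clusters(points, max_edge):
    """Cut MST edges longer than max_edge (assumed rule); return one label per point."""
    link = list(range(len(points)))  # union-find parent array

    def find(i):
        while link[i] != i:
            link[i] = link[link[i]]  # path halving
            i = link[i]
        return i

    for d, i, j in mst_edges(points):
        if d <= max_edge:
            link[find(i)] = find(j)
    # relabel roots as 0, 1, 2, ... in order of first appearance
    order = {}
    return [order.setdefault(find(i), len(order)) for i in range(len(points))]

# Made-up gaze samples (x, y) within one data frame
gaze = [(0.0, 0.0), (1.0, 0.5), (0.5, 1.0), (10.0, 10.0), (10.5, 9.5)]
print(mst_clusters(gaze, max_edge=2.0))
```

Per-cluster statistics such as size and mean position could then be computed from the returned labels and passed to a discriminant analysis.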
Using Cluster Analysis to Examine Husband-Wife Decision Making

ERIC Educational Resources Information Center

Bonds-Raacke, Jennifer M.

2006-01-01

Cluster analysis has a rich history in many disciplines and although cluster analysis has been used in clinical psychology to identify types of disorders, its use in other areas of psychology has been less popular. The purpose of the current experiments was to use cluster analysis to investigate husband-wife decision making. Cluster analysis was…
New detections of embedded clusters in the Galactic halo

NASA Astrophysics Data System (ADS)

Camargo, D.; Bica, E.; Bonatto, C.

2016-09-01

Context. Until recently it was thought that high Galactic latitude clouds were a non-star-forming ensemble. However, in a previous study we reported the discovery of two embedded clusters (ECs) far away from the Galactic plane (~ 5 kpc). In our recent star cluster catalogue we provided additional high and intermediate latitude cluster candidates. Aims: This work aims to clarify whether our previous detection of star clusters far away from the disc represents just an episodic event or whether star cluster formation is currently a systematic phenomenon in the Galactic halo. We analyse the nature of four clusters found in our recent catalogue and report the discovery of three new ECs each with an unusually high latitude and distance from the Galactic disc midplane. Methods: The analysis is based on 2MASS and WISE colour-magnitude diagrams (CMDs), and stellar radial density profiles (RDPs). The CMDs are built by applying a field-star decontamination procedure, which uncovers the cluster's intrinsic CMD morphology. Results: All of these clusters are younger than 5 Myr. The high-latitude ECs C 932, C 934, and C 939 appear to be related to a cloud complex about 5 kpc below the Galactic disc, under the Local arm. The other clusters are above the disc, C 1074 and C 1100 with a vertical distance of ~3 kpc, C 1099 with ~ 2 kpc, and C 1101 with ~1.8 kpc. Conclusions: According to the derived parameters ECs located below and above the disc occur, which gives evidence of widespread star cluster formation throughout the Galactic halo. This study therefore represents a paradigm shift, by demonstrating that a sterile halo must now be understood as a host for ongoing star formation. The origin and fate of these ECs remain open. There are two possibilities for their origin, Galactic fountains or infall. The discovery of ECs far from the disc suggests that the Galactic halo is more actively forming stars than previously thought. 
Furthermore, since most ECs do not survive the infant mortality, stars may be raining from the halo into the disc, and/or the halo may be harbouring generations of stars formed in clusters like those detected in our survey.
Meta-analysis using Dirichlet process.

PubMed

Muthukumarana, Saman; Tiwari, Ram C

2016-02-01

This article develops a Bayesian approach for meta-analysis using the Dirichlet process. The key aspect of the Dirichlet process in meta-analysis is the ability to assess evidence of statistical heterogeneity or variation in the underlying effects across studies while relaxing the distributional assumptions. We assume that the study effects are generated from a Dirichlet process. Under a Dirichlet process model, the study effects parameters have support on a discrete space and enable borrowing of information across studies while facilitating clustering among studies. We illustrate the proposed method by applying it to a dataset on the Program for International Student Assessment on 30 countries. Results from the data analysis, simulation studies, and the log pseudo-marginal likelihood model selection procedure indicate that the Dirichlet process model performs better than conventional alternative methods. © The Author(s) 2012.
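The clustering behaviour of a Dirichlet process mentioned in this abstract can be seen in its Chinese restaurant process representation. The sketch below only draws a random partition from that prior; it is not the authors' posterior inference, and the concentration parameter `alpha`, the seed, and the function name are illustrative assumptions.

```python
import random

def chinese_restaurant_process(n, alpha, seed=0):
    """Draw a random partition of n items from the Chinese restaurant process."""
    rng = random.Random(seed)
    sizes = []    # sizes[k] = number of items currently in cluster k
    labels = []   # labels[i] = cluster of item i
    for i in range(n):
        # Item i joins existing cluster k with probability sizes[k] / (i + alpha)
        # and opens a new cluster with probability alpha / (i + alpha).
        r = rng.uniform(0, i + alpha)
        acc = 0.0
        for k, size in enumerate(sizes):
            acc += size
            if r < acc:
                sizes[k] += 1
                labels.append(k)
                break
        else:
            sizes.append(1)
            labels.append(len(sizes) - 1)
    return labels, sizes

# 30 items, echoing the 30 countries in the abstract; alpha = 1.0 is arbitrary
labels, sizes = chinese_restaurant_process(30, alpha=1.0)
print(sizes)
```

Items that share a label share one effect parameter, which is how a Dirichlet process model lets studies borrow information from each other; larger `alpha` values tend to produce more clusters.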
Comparing large covariance matrices under weak conditions on the dependence structure and its application to gene clustering.

PubMed

Chang, Jinyuan; Zhou, Wen; Zhou, Wen-Xin; Wang, Lan

2017-03-01

Comparing large covariance matrices has important applications in modern genomics, where scientists are often interested in understanding whether relationships (e.g., dependencies or co-regulations) among a large number of genes vary between different biological states. We propose a computationally fast procedure for testing the equality of two large covariance matrices when the dimensions of the covariance matrices are much larger than the sample sizes. A distinguishing feature of the new procedure is that it imposes no structural assumptions on the unknown covariance matrices. Hence, the test is robust with respect to various complex dependence structures that frequently arise in genomics. We prove that the proposed procedure is asymptotically valid under weak moment conditions. As an interesting application, we derive a new gene clustering algorithm which shares the same nice property of avoiding restrictive structural assumptions for high-dimensional genomics data. Using an asthma gene expression dataset, we illustrate how the new test helps compare the covariance matrices of the genes across different gene sets/pathways between the disease group and the control group, and how the gene clustering algorithm provides new insights on the way gene clustering patterns differ between the two groups. The proposed methods have been implemented in an R-package HDtest and are available on CRAN. © 2016, The International Biometric Society.
A Unique Four-Hub Protein Cluster Associates to Glioblastoma Progression

PubMed Central

Simeone, Pasquale; Trerotola, Marco; Urbanella, Andrea; Lattanzio, Rossano; Ciavardelli, Domenico; Di Giuseppe, Fabrizio; Eleuterio, Enrica; Sulpizio, Marilisa; Eusebi, Vincenzo; Pession, Annalisa; Piantelli, Mauro; Alberti, Saverio

2014-01-01

Gliomas are the most frequent brain tumors. Among them, glioblastomas are malignant and largely resistant to available treatments. Histopathology is the gold standard for classification and grading of brain tumors. However, brain tumor heterogeneity is remarkable and histopathology procedures for glioma classification remain unsatisfactory for predicting disease course as well as response to treatment. Proteins that tightly associate with cancer differentiation and progression can bear important prognostic information. Here, we describe the identification of protein clusters differentially expressed in high-grade versus low-grade gliomas. Tissue samples from 25 high-grade tumors, 10 low-grade tumors and 5 normal brain cortices were analyzed by 2D-PAGE and proteomic profiling by mass spectrometry. This led to the identification of 48 differentially expressed protein markers between tumors and normal samples. Protein clustering by multivariate analyses (PCA and PLS-DA) provided discrimination between pathological samples to an unprecedented extent, and revealed a unique network of deranged proteins. We discovered a novel glioblastoma control module centered on four major network hubs: Huntingtin, HNF4α, c-Myc and 14-3-3ζ. Immunohistochemistry, western blotting and unbiased proteome-wide meta-analysis revealed altered expression of this glioblastoma control module in human glioma samples as compared with normal controls. Moreover, the four-hub network was found to cross-talk with both p53 and EGFR pathways. In summary, the findings of this study indicate the existence of a unifying signaling module controlling glioblastoma pathogenesis and malignant progression, and suggest novel targets for development of diagnostic and therapeutic procedures. PMID:25050814
[Procedure of seed quality testing and seed grading standard of Prunus humilis].

PubMed

Wen, Hao; Ren, Guang-Xi; Gao, Ya; Luo, Jun; Liu, Chun-Sheng; Li, Wei-Dong

2014-11-01

So far, no quality test procedures or grading standards exist for the seed of Prunus humilis, which is one of the important sources of semen pruni. Therefore, we set up test procedures adapted to the characteristics of the P. humilis seed by studying sampling, seed purity, thousand-grain weight, seed moisture, seed viability and germination percentage. Fifty seed specimens of P. humilis were tested. The related data were analyzed by cluster analysis. Through this research, the seed quality test procedure was developed, and the seed quality grading standard was formulated. The seed quality of each grade should meet the following requirements: for first grade seeds, germination percentage ≥ 68%, thousand-grain weight ≥ 383 g, purity ≥ 93%, seed moisture ≤ 5%; for second grade seeds, germination percentage ≥ 26%, thousand-grain weight ≥ 266 g, purity ≥ 73%, seed moisture ≤ 9%; for third grade seeds, germination percentage ≥ 10%, purity ≥ 50%, thousand-grain weight ≥ 08 g, seed moisture ≤ 13%.
Cluster size statistic and cluster mass statistic: two novel methods for identifying changes in functional connectivity between groups or conditions.

PubMed

Ing, Alex; Schwarzbauer, Christian

2014-01-01

Functional connectivity has become an increasingly important area of research in recent years. At a typical spatial resolution, approximately 300 million connections link each voxel in the brain with every other. This pattern of connectivity is known as the functional connectome. Connectivity is often compared between experimental groups and conditions. Standard methods used to control the type 1 error rate are likely to be insensitive when comparisons are carried out across the whole connectome, due to the huge number of statistical tests involved. To address this problem, two new cluster based methods--the cluster size statistic (CSS) and cluster mass statistic (CMS)--are introduced to control the family wise error rate across all connectivity values. These methods operate within a statistical framework similar to the cluster based methods used in conventional task based fMRI. Both methods are data driven, permutation based and require minimal statistical assumptions. Here, the performance of each procedure is evaluated in a receiver operator characteristic (ROC) analysis, utilising a simulated dataset. The relative sensitivity of each method is also tested on real data: BOLD (blood oxygen level dependent) fMRI scans were carried out on twelve subjects under normal conditions and during the hypercapnic state (induced through the inhalation of 6% CO2 in 21% O2 and 73% N2). Both CSS and CMS detected significant changes in connectivity between normal and hypercapnic states. A family wise error correction carried out at the individual connection level exhibited no significant changes in connectivity.
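The general idea behind a permutation-based cluster size or cluster mass statistic can be shown with a one-dimensional toy analogue. This is a sketch under strong assumptions, not the authors' method: neighbouring "connections" are simply adjacent array positions, the test statistic is a plain mean of paired differences, permutations are random sign flips per subject, and all data and names are made up.

```python
import random

def suprathreshold_runs(stats, threshold):
    """Return (size, mass) for each contiguous run of values above threshold."""
    runs, size, mass = [], 0, 0.0
    for s in stats:
        if s > threshold:
            size += 1
            mass += s
        elif size:
            runs.append((size, mass))
            size, mass = 0, 0.0
    if size:
        runs.append((size, mass))
    return runs

def max_cluster_size(stats, threshold):
    """Largest run length above threshold (0 if nothing exceeds it)."""
    return max((size for size, _ in suprathreshold_runs(stats, threshold)), default=0)

def signed_mean(diffs, signs):
    """Mean paired difference per position after flipping each subject's sign."""
    n = len(diffs)
    return [sum(sg * row[k] for sg, row in zip(signs, diffs)) / n
            for k in range(len(diffs[0]))]

def css_p_value(diffs, threshold, n_perm=999, seed=0):
    """Compare the observed maximum cluster size with its sign-flip null distribution."""
    rng = random.Random(seed)
    observed = max_cluster_size(signed_mean(diffs, [1] * len(diffs)), threshold)
    exceed = 0
    for _ in range(n_perm):
        signs = [rng.choice((-1, 1)) for _ in diffs]
        if max_cluster_size(signed_mean(diffs, signs), threshold) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)

# Made-up paired differences: 8 subjects, 6 adjacent positions
diffs = [[0.0, 0.0, 1.0, 1.0, 1.0, 0.0] for _ in range(8)]
observed, p = css_p_value(diffs, threshold=0.5)
print(observed, p)
```

Because only the maximum cluster size per permutation enters the null distribution, a single comparison covers all positions at once, which is what controls the family wise error rate. Replacing the run length with the summed run values gives the corresponding mass statistic.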
Cluster decomposition of full configuration interaction wave functions: A tool for chemical interpretation of systems with strong correlation

NASA Astrophysics Data System (ADS)

Lehtola, Susi; Tubman, Norm M.; Whaley, K. Birgitta; Head-Gordon, Martin

2017-10-01

Approximate full configuration interaction (FCI) calculations have recently become tractable for systems of unforeseen size, thanks to stochastic and adaptive approximations to the exponentially scaling FCI problem. The result of an FCI calculation is a weighted set of electronic configurations, which can also be expressed in terms of excitations from a reference configuration. The excitation amplitudes contain information on the complexity of the electronic wave function, but this information is contaminated by contributions from disconnected excitations, i.e., those excitations that are just products of independent lower-level excitations. The unwanted contributions can be removed via a cluster decomposition procedure, making it possible to examine the importance of connected excitations in complicated multireference molecules which are outside the reach of conventional algorithms. We present an implementation of the cluster decomposition analysis and apply it to both true FCI wave functions, as well as wave functions generated from the adaptive sampling CI algorithm. The cluster decomposition is useful for interpreting calculations in chemical studies, as a diagnostic for the convergence of various excitation manifolds, as well as a guidepost for polynomially scaling electronic structure models. Applications are presented for (i) the double dissociation of water, (ii) the carbon dimer, (iii) the π space of polyacenes, and (iv) the chromium dimer. While the cluster amplitudes exhibit rapid decay with an increasing rank for the first three systems, even connected octuple excitations still appear important in Cr2, suggesting that spin-restricted single-reference coupled-cluster approaches may not be tractable for some problems in transition metal chemistry.
Cluster Size Statistic and Cluster Mass Statistic: Two Novel Methods for Identifying Changes in Functional Connectivity Between Groups or Conditions

PubMed Central

Ing, Alex; Schwarzbauer, Christian

2014-01-01

Functional connectivity has become an increasingly important area of research in recent years. At a typical spatial resolution, approximately 300 million connections link each voxel in the brain with every other. This pattern of connectivity is known as the functional connectome. Connectivity is often compared between experimental groups and conditions. Standard methods used to control the type 1 error rate are likely to be insensitive when comparisons are carried out across the whole connectome, due to the huge number of statistical tests involved. To address this problem, two new cluster based methods – the cluster size statistic (CSS) and cluster mass statistic (CMS) – are introduced to control the family wise error rate across all connectivity values. These methods operate within a statistical framework similar to the cluster based methods used in conventional task based fMRI. Both methods are data driven, permutation based and require minimal statistical assumptions. Here, the performance of each procedure is evaluated in a receiver operator characteristic (ROC) analysis, utilising a simulated dataset. The relative sensitivity of each method is also tested on real data: BOLD (blood oxygen level dependent) fMRI scans were carried out on twelve subjects under normal conditions and during the hypercapnic state (induced through the inhalation of 6% CO2 in 21% O2 and 73% N2). Both CSS and CMS detected significant changes in connectivity between normal and hypercapnic states. A family wise error correction carried out at the individual connection level exhibited no significant changes in connectivity. PMID:24906136
Typology of eaters based on conventional and organic food consumption: results from the NutriNet-Santé cohort study.

PubMed

Baudry, Julia; Touvier, Mathilde; Allès, Benjamin; Péneau, Sandrine; Méjean, Caroline; Galan, Pilar; Hercberg, Serge; Lairon, Denis; Kesse-Guyot, Emmanuelle

2016-08-01

Limited information is available on large-scale populations regarding the socio-demographic and nutrient profiles and eating behaviour of consumers, taking into account both organic and conventional foods. The aims of this study were to draw up a typology of consumers according to their eating habits, based both on their dietary patterns and the mode of food production, and to outline their socio-demographic, behavioural and nutritional characteristics. Data were collected from 28 245 participants of the NutriNet-Santé study. Dietary information was obtained using a 264-item, semi-quantitative, organic FFQ. To identify clusters of consumers, principal component analysis was applied on sixteen conventional and sixteen organic food groups followed by a clustering procedure. The following five clusters of consumers were identified: (1) a cluster characterised by low energy intake, low consumption of organic food and high prevalence of inadequate nutrient intakes; (2) a cluster of big eaters of conventional foods with high intakes of SFA and cholesterol; (3) a cluster with high consumption of organic food and relatively adequate nutritional diet quality; (4) a group with a high percentage of organic food consumers, 14 % of which were either vegetarians or vegans, who exhibited a high nutritional diet quality and a low prevalence of inadequate intakes of most vitamins except B12; and (5) a group of moderate organic food consumers with a particularly high intake of proteins and alcohol and a poor nutritional diet quality. These findings may have implications for future aetiological studies investigating the potential impact of organic food consumption.
Differentiation of Leishmania species by FT-IR spectroscopy

NASA Astrophysics Data System (ADS)

Aguiar, Josafá C.; Mittmann, Josane; Ferreira, Isabelle; Ferreira-Strixino, Juliana; Raniero, Leandro

2015-05-01

Leishmaniasis is a parasitic infectious disease caused by protozoa that belong to the genus Leishmania. It is transmitted by the bite of an infected female sand fly. The disease is endemic in 88 countries (16 developed countries and 72 developing countries) on four continents (Desjeux, 2001) [1]. In Brazil, epidemiological data show the disease is present in all Brazilian regions, with the highest incidences in the North and Northeast. There are several methods used to diagnose leishmaniasis, but these procedures have many limitations, are time consuming, have low sensitivity, and are expensive. In this context, Fourier Transform Infrared Spectroscopy (FT-IR) analysis has the potential to provide rapid results and may be adapted for a clinical test with high sensitivity and specificity. In this work, FT-IR was used as a tool to investigate the promastigotes of Leishmania amazonensis, Leishmania chagasi, and Leishmania major species. The spectra were analyzed by cluster analysis and a deconvolution procedure based on the second derivatives of the spectra. Results: cluster analysis found four specific regions that are able to identify the Leishmania species. The dendrogram representation clearly indicates the heterogeneity among Leishmania species. The band deconvolution done by curve fitting in these regions quantitatively differentiated the polysaccharides, amide III, phospholipids, proteins, and nucleic acids. L. chagasi and L. major showed a greater biochemical similarity and have three bands that were not registered in L. amazonensis. L. amazonensis presented three specific bands that were not recorded in the other two species. It is evident that the FT-IR method is an indispensable tool to discriminate these parasites. The high sensitivity and specificity of this technique open up possibilities for further studies on the characterization of other microorganisms.
A Weight-Adaptive Laplacian Embedding for Graph-Based Clustering.

PubMed

Cheng, De; Nie, Feiping; Sun, Jiande; Gong, Yihong

2017-07-01

Graph-based clustering methods perform clustering on a fixed input data graph. Thus such clustering results are sensitive to the particular graph construction. If this initial construction is of low quality, the resulting clustering may also be of low quality. We address this drawback by allowing the data graph itself to be adaptively adjusted in the clustering procedure. In particular, our proposed weight adaptive Laplacian (WAL) method learns a new data similarity matrix that can adaptively adjust the initial graph according to the similarity weight in the input data graph. We develop three versions of these methods based on the L2-norm, fuzzy entropy regularizer, and another exponential-based weight strategy, that yield three new graph-based clustering objectives. We derive optimization algorithms to solve these objectives. Experimental results on synthetic data sets and real-world benchmark data sets exhibit the effectiveness of these new graph-based clustering methods.

Parallel Clustering Algorithm for Large-Scale Biological Data Sets

PubMed Central

Wang, Minchao; Zhang, Wu; Ding, Wang; Dai, Dongbo; Zhang, Huiran; Xie, Hao; Chen, Luonan; Guo, Yike; Xie, Jiang

2014-01-01

Background: The recent explosion of biological data poses a great challenge for traditional clustering algorithms. With the increasing scale of data sets, much larger memory and longer runtime are required for cluster identification problems. The affinity propagation algorithm outperforms many other classical clustering algorithms and is widely applied in biological research. However, its time and space complexity become a great bottleneck when handling large-scale data sets. Moreover, the similarity matrix, whose construction takes a long runtime, is required before running the affinity propagation algorithm, since the algorithm clusters data sets based on the similarities between data pairs. Methods: Two types of parallel architectures are proposed in this paper to accelerate the similarity matrix construction and the affinity propagation algorithm. The shared-memory architecture is used to construct the similarity matrix, and the distributed system is used for the affinity propagation algorithm, because of its large memory size and great computing capacity. An appropriate way of data partition and reduction is designed in our method, in order to minimize the global communication cost among processes. Results: A speedup of 100 is gained with 128 cores. The runtime is reduced from several hours to a few seconds, which indicates that the parallel algorithm is capable of handling large-scale data sets effectively. The parallel affinity propagation also achieves good performance when clustering large-scale gene data (microarray) and detecting families in large protein superfamilies. PMID:24705246
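The row-wise partitioning that makes similarity matrix construction easy to parallelise can be sketched as follows. This is an illustration only, not the paper's implementation: it assumes the negative squared Euclidean distance as the similarity (a common choice for affinity propagation), uses Python threads as a stand-in for a shared-memory architecture, and all names are hypothetical. CPython threads do not speed up pure-Python arithmetic, so the sketch shows the data partition rather than the reported speedup.

```python
from concurrent.futures import ThreadPoolExecutor

def similarity_rows(points, rows):
    """Similarity rows for the given indices: s(i, k) = -||x_i - x_k||^2 (assumed)."""
    return [[-sum((a - b) ** 2 for a, b in zip(points[i], q)) for q in points]
            for i in rows]

def similarity_matrix(points, n_workers=4):
    """Split the rows into contiguous blocks and compute each block in a worker."""
    n = len(points)
    chunk = max(1, -(-n // n_workers))  # ceiling division, at least 1
    blocks = [range(start, min(start + chunk, n)) for start in range(0, n, chunk)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # map() returns the blocks in submission order, so rows stay in order
        parts = pool.map(lambda rows: similarity_rows(points, rows), blocks)
        return [row for part in parts for row in part]

# Made-up expression profiles (one tuple per gene)
genes = [(0.0, 1.0), (0.5, 1.5), (4.0, 4.0), (4.5, 3.5), (9.0, 0.0)]
S = similarity_matrix(genes, n_workers=2)
print(S[0])
```

Each block depends only on read-only input data, so no communication between workers is needed until the blocks are concatenated.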
Investigating the usefulness of a cluster-based trend analysis to detect visual field progression in patients with open-angle glaucoma.

PubMed

Aoki, Shuichiro; Murata, Hiroshi; Fujino, Yuri; Matsuura, Masato; Miki, Atsuya; Tanito, Masaki; Mizoue, Shiro; Mori, Kazuhiko; Suzuki, Katsuyoshi; Yamashita, Takehiro; Kashiwagi, Kenji; Hirasawa, Kazunori; Shoji, Nobuyuki; Asaoka, Ryo

2017-12-01

To investigate the usefulness of the Octopus (Haag-Streit) EyeSuite's cluster trend analysis in glaucoma. Ten visual fields (VFs) with the Humphrey Field Analyzer (Carl Zeiss Meditec), spanning 7.7 years on average were obtained from 728 eyes of 475 primary open angle glaucoma patients. Mean total deviation (mTD) trend analysis and EyeSuite's cluster trend analysis were performed on various series of VFs (from 1st to 10th: VF1-10 to 6th to 10th: VF6-10). The results of the cluster-based trend analysis, based on different lengths of VF series, were compared against mTD trend analysis. Cluster-based trend analysis and mTD trend analysis results were significantly associated in all clusters and with all lengths of VF series. Between 21.2% and 45.9% (depending on VF series length and location) of clusters were deemed to progress when the mTD trend analysis suggested no progression. On the other hand, 4.8% of eyes were observed to progress using the mTD trend analysis when cluster trend analysis suggested no progression in any two (or more) clusters. Whole field trend analysis can miss local VF progression. Cluster trend analysis appears as robust as mTD trend analysis and useful to assess both sectorial and whole field progression. Cluster-based trend analyses, in particular the definition of two or more progressing cluster, may help clinicians to detect glaucomatous progression in a timelier manner than using a whole field trend analysis, without significantly compromising specificity. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2017. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
The neural substrates of procrastination: A voxel-based morphometry study.

PubMed

Hu, Yue; Liu, Peiwei; Guo, Yiqun; Feng, Tingyong

2018-03-01

Procrastination is a pervasive phenomenon across different cultures and has serious consequences for performance, subjective well-being, and even public policy. However, little is known about the neural substrates of procrastination. In order to shed light upon this question, we investigated the neuroanatomical substrates of procrastination across two independent samples using the voxel-based morphometry (VBM) method. The whole-brain analysis showed procrastination was positively correlated with the gray matter (GM) volume of clusters in the parahippocampal gyrus (PHG) and the orbital frontal cortex (OFC), while negatively correlated with the GM volume of clusters in the inferior frontal gyrus (IFG) and the middle frontal gyrus (MFG) in sample one (151 participants). We further conducted a verification procedure on another sample (108 participants) using region-of-interest analysis to examine the reliability of these results. Results showed procrastination can be predicted by the GM volume of the OFC and the MFG. The present findings suggest that the MFG and OFC, which are the key regions of self-control and emotion regulation, may play an important role in procrastination. Copyright © 2018 Elsevier Inc. All rights reserved.
Pain Behavior in Rheumatoid Arthritis Patients: Identification of Pain Behavior Subgroups

PubMed Central

Waters, Sandra J.; Riordan, Paul A.; Keefe, Francis J.; Lefebvre, John C.

2008-01-01

This study used Ward’s minimum variance hierarchical cluster analysis to identify homogeneous subgroups of rheumatoid arthritis patients suffering from chronic pain who exhibited similar pain behavior patterns during a videotaped behavior sample. Ninety-two rheumatoid arthritis patients were divided into two samples. Six motor pain behaviors were examined: guarding, bracing, active rubbing, rigidity, grimacing, and sighing. The cluster analysis procedure identified four similar subgroups in Sample 1 and Sample 2. The first subgroup exhibited low levels of all pain behaviors. The second subgroup exhibited a high level of guarding and low levels of other pain behaviors. The third subgroup exhibited high levels of guarding and rigidity and low levels of other pain behaviors. The fourth subgroup exhibited high levels of guarding and active rubbing and low levels of other pain behaviors. Sample 1 contained a fifth subgroup that exhibited a high level of active rubbing and low levels of other pain measures. The results of this study suggest that there are homogeneous subgroups within rheumatoid arthritis patient populations who differ in the motor pain behaviors they exhibit. PMID:18358682
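Ward's minimum variance criterion, named in this abstract, merges at each step the two clusters whose union increases the within-cluster sum of squares the least. The following is a naive sketch of that criterion, assuming Euclidean data and a fixed target number of clusters `k`; the scores are made up for illustration and do not come from the study.

```python
def ward_clusters(points, k):
    """Naive agglomerative clustering with Ward's criterion, stopping at k clusters."""
    # each cluster is stored as (member indices, centroid)
    clusters = [([i], list(p)) for i, p in enumerate(points)]

    def merge_cost(a, b):
        na, nb = len(a[0]), len(b[0])
        sq = sum((x - y) ** 2 for x, y in zip(a[1], b[1]))
        return na * nb / (na + nb) * sq  # increase in within-cluster sum of squares

    while len(clusters) > k:
        # find the pair of clusters that is cheapest to merge
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: merge_cost(clusters[ij[0]], clusters[ij[1]]),
        )
        a, b = clusters[i], clusters[j]
        na, nb = len(a[0]), len(b[0])
        centroid = [(na * x + nb * y) / (na + nb) for x, y in zip(a[1], b[1])]
        clusters[i] = (a[0] + b[0], centroid)
        del clusters[j]
    return [sorted(members) for members, _ in clusters]

# Made-up (guarding, active rubbing) scores for six hypothetical patients
scores = [(0.1, 0.1), (0.2, 0.0), (0.9, 0.1), (1.0, 0.2), (0.9, 0.9), (1.0, 1.0)]
print(ward_clusters(scores, 3))
```

With these toy scores the three groups correspond to low levels of both behaviors, high guarding only, and high guarding with high rubbing. The search over all pairs at every step is cubic in the number of points, which is acceptable only for small samples.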
Comparative Microbial Modules Resource: Generation and Visualization of Multi-species Biclusters

PubMed Central

Bate, Ashley; Eichenberger, Patrick; Bonneau, Richard

2011-01-01

The increasing abundance of large-scale, high-throughput datasets for many closely related organisms provides opportunities for comparative analysis via the simultaneous biclustering of datasets from multiple species. These analyses require a reformulation of how to organize multi-species datasets and visualize comparative genomics data analyses results. Recently, we developed a method, multi-species cMonkey, which integrates heterogeneous high-throughput datatypes from multiple species to identify conserved regulatory modules. Here we present an integrated data visualization system, built upon the Gaggle, enabling exploration of our method's results (available at http://meatwad.bio.nyu.edu/cmmr.html). The system can also be used to explore other comparative genomics datasets and outputs from other data analysis procedures – results from other multiple-species clustering programs or from independent clustering of different single-species datasets. We provide an example use of our system for two bacteria, Escherichia coli and Salmonella Typhimurium. We illustrate the use of our system by exploring conserved biclusters involved in nitrogen metabolism, uncovering a putative function for yjjI, a currently uncharacterized gene that we predict to be involved in nitrogen assimilation. PMID:22144874
Comparative microbial modules resource: generation and visualization of multi-species biclusters.

PubMed

Kacmarczyk, Thadeous; Waltman, Peter; Bate, Ashley; Eichenberger, Patrick; Bonneau, Richard

2011-12-01

The increasing abundance of large-scale, high-throughput datasets for many closely related organisms provides opportunities for comparative analysis via the simultaneous biclustering of datasets from multiple species. These analyses require a reformulation of how to organize multi-species datasets and visualize comparative genomics data analyses results. Recently, we developed a method, multi-species cMonkey, which integrates heterogeneous high-throughput datatypes from multiple species to identify conserved regulatory modules. Here we present an integrated data visualization system, built upon the Gaggle, enabling exploration of our method's results (available at http://meatwad.bio.nyu.edu/cmmr.html). The system can also be used to explore other comparative genomics datasets and outputs from other data analysis procedures - results from other multiple-species clustering programs or from independent clustering of different single-species datasets. We provide an example use of our system for two bacteria, Escherichia coli and Salmonella Typhimurium. We illustrate the use of our system by exploring conserved biclusters involved in nitrogen metabolism, uncovering a putative function for yjjI, a currently uncharacterized gene that we predict to be involved in nitrogen assimilation. © 2011 Kacmarczyk et al.
A scheme for racquet sports video analysis with the combination of audio-visual information

NASA Astrophysics Data System (ADS)

Xing, Liyuan; Ye, Qixiang; Zhang, Weigang; Huang, Qingming; Yu, Hua

2005-07-01

As a very important category of sports video, racquet sports video (e.g., table tennis, tennis and badminton) has received little attention in past years. Considering the characteristics of this kind of sports video, we propose a new scheme for structure indexing and highlight generation based on the combination of audio and visual information. First, a supervised classification method is employed to detect important audio symbols including impact (ball hit), audience cheers, commentator speech, etc. Meanwhile, an unsupervised algorithm is proposed to group video shots into various clusters. Second, by taking advantage of the temporal relationship between audio and visual signals, we assign semantic labels, including rally scenes and break scenes, to the scene clusters. Third, a refinement procedure is developed to reduce false rally scenes by further audio analysis. Finally, an excitement model is proposed to rank the detected rally scenes, from which many exciting video clips such as game (match) points can be correctly retrieved. Experiments on two representative types of racquet sports video, table tennis and tennis, demonstrate encouraging results.
Bimetallic clustered thin films with variable electro-optical properties

NASA Astrophysics Data System (ADS)

Antipov, A.; Bukharov, D.; Arakelyan, S.; Osipov, A.; Lelekova, A.

2018-01-01

The drop deposition of colloidal nanoparticles was performed from water-based colloidal solutions. The proposed procedure is based on the agglomeration of colloidal particles in laser-assisted evaporation processes. The evaporation process resulted in the formation of clustered thin films on a glass substrate. In the experiments with bimetallic Au:Ag solutions, clustered films with an average height of 100 nm were grown. The optical properties of the deposited structures were investigated experimentally. It is shown that the obtained films may become transparent and that their properties are defined by their morphology.
Mapping Dark Matter in Simulated Galaxy Clusters

NASA Astrophysics Data System (ADS)

Bowyer, Rachel

2018-01-01

Galaxy clusters are the most massive bound objects in the Universe with most of their mass being dark matter. Cosmological simulations of structure formation show that clusters are embedded in a cosmic web of dark matter filaments and large scale structure. It is thought that these filaments are found preferentially close to the long axes of clusters. We extract galaxy clusters from the simulations "cosmo-OWLS" in order to study their properties directly and also to infer their properties from weak gravitational lensing signatures. We investigate various stacking procedures to enhance the signal of the filaments and large scale structure surrounding the clusters to better understand how the filaments of the cosmic web connect with galaxy clusters. This project was supported in part by the NSF REU grant AST-1358980 and by the Nantucket Maria Mitchell Association.
Correlation filtering in financial time series (Invited Paper)

NASA Astrophysics Data System (ADS)

Aste, T.; Di Matteo, Tiziana; Tumminello, M.; Mantegna, R. N.

2005-05-01

We apply a method to filter relevant information from the correlation coefficient matrix by extracting a network of relevant interactions. This method generates networks with the same hierarchical structure as the Minimum Spanning Tree but containing a larger number of links, resulting in a richer network topology that allows loops and cliques. In Tumminello et al. [1], we have shown that this method, applied to a financial portfolio of 100 stocks in the USA equity markets, is efficient in filtering relevant information about the clustering of the system and its hierarchical structure, both on the whole system and within each cluster. In particular, we have found that triangular loops and 4-element cliques have important and significant relations with the market structure and properties. Here we apply this filtering procedure to the analysis of correlation in two different kinds of interest rate time series (16 Eurodollars and 34 US interest rates).
Configural approaches to temperament assessment: implications for predicting risk of unintentional injury in children.

PubMed

Berry, Jack W; Schwebel, David C

2009-10-01

This study used two configural approaches to understand how temperament factors (surgency/extraversion, negative affect, and effortful control) might predict child injury risk. In the first approach, clustering procedures were applied to trait dimensions to identify discrete personality prototypes. In the second approach, two- and three-way trait interactions were considered dimensionally in regression models predicting injury outcomes. Injury risk was assessed through four measures: lifetime prevalence of injuries requiring professional medical attention, scores on the Injury Behavior Checklist, and frequency and severity of injuries reported in a 2-week injury diary. In the prototype analysis, three temperament clusters were obtained, which resembled resilient, overcontrolled, and undercontrolled types found in previous research. Undercontrolled children had greater risk of injury than children in the other groups. In the dimensional interaction analyses, an interaction between surgency/extraversion and negative affect tended to predict injury, especially when children lacked capacity for effortful control.
Modular analysis of the probabilistic genetic interaction network.

PubMed

Hou, Lin; Wang, Lin; Qian, Minping; Li, Dong; Tang, Chao; Zhu, Yunping; Deng, Minghua; Li, Fangting

2011-03-15

Epistatic Miniarray Profiles (EMAP) has enabled the mapping of large-scale genetic interaction networks; however, the quantitative information gained from EMAP cannot be fully exploited since the data are usually interpreted as a discrete network based on an arbitrary hard threshold. To address such limitations, we adopted a mixture modeling procedure to construct a probabilistic genetic interaction network and then implemented a Bayesian approach to identify densely interacting modules in the probabilistic network. Mixture modeling has been demonstrated to be an effective soft-threshold technique for EMAP measures. The Bayesian approach was applied to an EMAP dataset studying the early secretory pathway in Saccharomyces cerevisiae. Twenty-seven modules were identified, and 14 of those were enriched by gold standard functional gene sets. We also conducted a detailed comparison with state-of-the-art algorithms, hierarchical clustering and Markov clustering. The experimental results show that the Bayesian approach outperforms the others in efficiently recovering biologically significant modules.
On Learning Cluster Coefficient of Private Networks

PubMed Central

Wang, Yue; Wu, Xintao; Zhu, Jun; Xiang, Yang

2013-01-01

Enabling accurate analysis of social network data while preserving differential privacy has been challenging since graph features such as clustering coefficient or modularity often have high sensitivity, which is different from traditional aggregate functions (e.g., count and sum) on tabular data. In this paper, we treat a graph statistic as a function f and develop a divide and conquer approach to enforce differential privacy. The basic procedure of this approach is to first decompose the target computation f into several less complex unit computations f1, …, fm connected by basic mathematical operations (e.g., addition, subtraction, multiplication, division), then perturb the output of each fi with Laplace noise derived from its own sensitivity value and the distributed privacy threshold εi, and finally combine those perturbed fi as the perturbed output of computation f. We examine how various operations affect the accuracy of complex computations. When unit computations have large global sensitivity values, we enforce differential privacy by calibrating noise based on the smooth sensitivity, rather than the global sensitivity. By doing this, we achieve the strict differential privacy guarantee with smaller magnitude noise. We illustrate our approach by using the clustering coefficient, which is a popular statistic used in social network analysis. Empirical evaluations on five real social networks and various synthetic graphs generated from three random graph models show the developed divide and conquer approach outperforms the direct approach. PMID:24429843
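The decompose, perturb and combine steps described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes a statistic written as a ratio of two unit computations (for example, a clustering coefficient expressed as a triangle-based count divided by a connected-triple count), and the function names, counts, sensitivity values and privacy budgets are all hypothetical. It uses global sensitivity only and does not cover the smooth-sensitivity calibration mentioned above.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponential variables with mean `scale`
    # follows a Laplace distribution with location 0 and that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def perturb(value, sensitivity, epsilon, rng):
    # Laplace mechanism for one unit computation f_i:
    # noise scale = sensitivity of f_i / privacy budget eps_i.
    return value + laplace_noise(sensitivity / epsilon, rng)


def private_ratio(numerator, denominator, sens_num, sens_den,
                  eps_num, eps_den, rng):
    # Perturb each unit computation separately, then combine the noisy
    # outputs with the connecting operation (here: division).
    # Under sequential composition the total budget is eps_num + eps_den.
    noisy_num = perturb(numerator, sens_num, eps_num, rng)
    noisy_den = perturb(denominator, sens_den, eps_den, rng)
    return noisy_num / noisy_den


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical unit-computation outputs and sensitivities.
    estimate = private_ratio(300.0, 1000.0, sens_num=10.0, sens_den=20.0,
                             eps_num=0.5, eps_den=0.5, rng=rng)
    print(estimate)
```

Splitting the budget evenly between the two unit computations is only one possible choice; the abstract indicates that how the budget and the operations interact affects accuracy.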
Genetic analysis of the ADGF multigene family by homologous recombination and gene conversion in Drosophila.

PubMed

Dolezal, Tomas; Gazi, Michal; Zurovec, Michal; Bryant, Peter J

2003-10-01

Many Drosophila genes exist as members of multigene families and within each family the members can be functionally redundant, making it difficult to identify them by classical mutagenesis techniques based on phenotypic screening. We have addressed this problem in a genetic analysis of a novel family of six adenosine deaminase-related growth factors (ADGFs). We used ends-in targeting to introduce mutations into five of the six ADGF genes, taking advantage of the fact that five of the family members are encoded by a three-gene cluster and a two-gene cluster. We used two targeting constructs to introduce loss-of-function mutations into all five genes, as well as to isolate different combinations of multiple mutations, independent of phenotypic consequences. The results show that (1) it is possible to use ends-in targeting to disrupt gene clusters; (2) gene conversion, which is usually considered a complication in gene targeting, can be used to help recover different mutant combinations in a single screening procedure; (3) the reduction of duplication to a single copy by induction of a double-strand break is better explained by the single-strand annealing mechanism than by simple crossing over between repeats; and (4) loss of function of the most abundantly expressed family member (ADGF-A) leads to disintegration of the fat body and the development of melanotic tumors in mutant larvae.
DCE: A Distributed Energy-Efficient Clustering Protocol for Wireless Sensor Network Based on Double-Phase Cluster-Head Election.

PubMed

Han, Ruisong; Yang, Wei; Wang, Yipeng; You, Kaiming

2017-05-01

Clustering is an effective technique used to reduce energy consumption and extend the lifetime of wireless sensor network (WSN). The characteristic of energy heterogeneity of WSNs should be considered when designing clustering protocols. We propose and evaluate a novel distributed energy-efficient clustering protocol called DCE for heterogeneous wireless sensor networks, based on a Double-phase Cluster-head Election scheme. In DCE, the procedure of cluster head election is divided into two phases. In the first phase, tentative cluster heads are elected with the probabilities which are decided by the relative levels of initial and residual energy. Then, in the second phase, the tentative cluster heads are replaced by their cluster members to form the final set of cluster heads if any member in their cluster has more residual energy. Employing two phases for cluster-head election ensures that the nodes with more energy have a higher chance to be cluster heads. Energy consumption is well-distributed in the proposed protocol, and the simulation results show that DCE achieves longer stability periods than other typical clustering protocols in heterogeneous scenarios.
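The two election phases described above might be sketched as below. The abstract does not give the election probability formula, so the energy weighting in `tentative_heads`, the nearest-head cluster formation, the fallback when no node is elected, and all names and parameters are assumptions made for illustration rather than the published DCE protocol.

```python
import random
from dataclasses import dataclass


@dataclass
class Node:
    node_id: int
    x: float
    y: float
    initial_energy: float
    residual_energy: float


def tentative_heads(nodes, base_prob, rng):
    # Phase 1: probabilistic election. The weighting below (initial energy
    # relative to the network mean, times the fraction of energy left) is a
    # hypothetical stand-in for the protocol's actual probability.
    mean_initial = sum(n.initial_energy for n in nodes) / len(nodes)
    heads = []
    for n in nodes:
        p = (base_prob * (n.initial_energy / mean_initial)
             * (n.residual_energy / n.initial_energy))
        if rng.random() < min(1.0, p):
            heads.append(n)
    if not heads:
        # Assumed fallback so that at least one cluster exists.
        heads.append(max(nodes, key=lambda n: n.residual_energy))
    return heads


def form_clusters(nodes, heads):
    # Each non-head node joins its nearest tentative head (an assumption).
    clusters = {h.node_id: [h] for h in heads}
    for n in nodes:
        if n.node_id in clusters:
            continue
        nearest = min(heads, key=lambda h: (h.x - n.x) ** 2 + (h.y - n.y) ** 2)
        clusters[nearest.node_id].append(n)
    return clusters


def final_heads(clusters):
    # Phase 2: a member with strictly more residual energy replaces the
    # tentative head. The head is listed first, so ties keep the head.
    return [max(members, key=lambda n: n.residual_energy)
            for members in clusters.values()]
```

A round would call `tentative_heads`, then `form_clusters`, then `final_heads`; energy bookkeeping between rounds is left out of this sketch.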
Latent Class Detection and Class Assignment: A Comparison of the MAXEIG Taxometric Procedure and Factor Mixture Modeling Approaches

ERIC Educational Resources Information Center

Lubke, Gitta; Tueller, Stephen

2010-01-01

Taxometric procedures such as MAXEIG and factor mixture modeling (FMM) are used in latent class clustering, but they have very different sets of strengths and weaknesses. Taxometric procedures, popular in psychiatric and psychopathology applications, do not rely on distributional assumptions. Their sole purpose is to detect the presence of latent…
Can cluster environment modify the dynamical evolution of spiral galaxies?

NASA Technical Reports Server (NTRS)

Amram, P.; Balkowski, C.; Cayatte, V.; Marcelin, M.; Sullivan, W. T., III

1993-01-01

Over the past decade many effects of the cluster environment on member galaxies have been established. These effects are manifest in the amount and distribution of gas in cluster spirals, the luminosity and light distributions within galaxies, and the segregation of morphological types. All these effects could indicate a specific dynamical evolution for galaxies in clusters. Nevertheless, more direct evidence, such as a different mass distribution for spiral galaxies in clusters and in the field, is not yet clearly established. Indeed, Rubin, Whitmore, and Ford (1988) and Whitmore, Forbes, and Rubin (1988) (referred to as RWF) presented evidence that inner cluster spirals have falling rotation curves, unlike those of outer cluster spirals or the great majority of field spirals. If falling rotation curves exist in centers of clusters, as argued by RWF, it would suggest that dark matter halos were absent from cluster spirals, either because the halos had become stripped by interactions with other galaxies or with an intracluster medium, or because the halos had never formed in the first place. Even if they did not disagree with RWF, other researchers pointed out that the behaviour of the slope of the rotation curves of spiral galaxies (in Virgo) is not so clear. Amram, using a different sample of spiral galaxies in clusters, found only 10% of declining rotation curves (2 declining vs 17 flat or rising), in contrast to RWF, who find about 40% of declining rotation curves in their sample (6 declining vs 10 flat or rising). We briefly discuss the Amram data paper and compare it to the results of RWF. We have measured the rotation curves for a sample of 21 spiral galaxies in 5 nearby clusters. These rotation curves have been constructed from detailed two-dimensional maps of each galaxy's velocity field as traced by emission from the Hα line. This complete mapping, combined with the sensitivity of our CFHT 3.60 m + Perot-Fabry + CCD observations, allows the construction of high-quality rotation curves. Details concerning the acquisition and reduction procedures of the data are given in Amram. We present and discuss our preliminary analysis and compare it with RWF's results.
Palate dimensions in six-year-old children with unilateral cleft lip and palate: a six-center study on dental casts.

PubMed

Koželj, Vesna; Vegnuti, Miljana; Drevenšek, Martina; Hortis-Dzierzbicka, Maria; Gonzalez-Landa, Gonzalo; Hanstein, Siiri; Klimova, Irena; Kobus, Kazimierz; Kobus-Zaleśna, Katarzyna; Semb, Gunvor; Shaw, Bill

2012-11-01

To compare palatal dimensions in 6-year-old children with unilateral cleft lip and palate (UCLP) treated by different protocols with those of noncleft children. Retrospective intercenter outcome study. Patients: Upper dental casts from 129 children with repaired UCLP and 30 controls were analyzed by the trigonometric method. Six European cleft centers. Main outcome measures: Sagittal, transverse, and vertical dimensions of the palate were observed. Palate variables were analyzed with descriptive methods and nonparametric tests. Because many characteristics were measured on a relatively small number of subjects, hierarchical clustering, k-means clustering, and principal component analysis were also used. Mean values of the observed dimensions for five cleft groups differed significantly from the control (p < .05). The group with one-stage closure of the cleft differed significantly from all other cleft groups in most variables (p < .05). Principal component analysis of all 159 cases identified three clusters with specific morphologic characteristics of the palate. A similar number of treated children were classified into each cluster, while all children without clefts were classified in the same cluster. The percentage of treated children from a particular group that fit this cluster ranged from 0% to 70% and increased with age at palatal closure and number of primary surgical procedures. At 6 years of age, children with stepwise repair and hard palate closure after the age of two more frequently have palatal dimensions similar to those of noncleft controls than children with earlier palatal closure and one-stage cleft repair.
RUPRECHT 147: THE OLDEST NEARBY OPEN CLUSTER AS A NEW BENCHMARK FOR STELLAR ASTROPHYSICS

DOE Office of Scientific and Technical Information (OSTI.GOV)

Curtis, Jason L.; Wright, Jason T.; Wolfgang, Angie

2013-05-15

Ruprecht 147 is a hitherto unappreciated open cluster that holds great promise as a standard in fundamental stellar astrophysics. We have conducted a radial velocity survey of astrometric candidates with Lick, Palomar, and MMT observatories and have identified over 100 members, including 5 blue stragglers, 11 red giants, and 5 double-lined spectroscopic binaries (SB2s). We estimate the cluster metallicity from spectroscopic analysis, using Spectroscopy Made Easy (SME), and find it to be [M/H] = +0.07 ± 0.03. We have obtained deep CFHT/MegaCam g'r'i'z' photometry and fit Padova isochrones to the (g' - i') and Two Micron All Sky Survey (J - K_S) color-magnitude diagrams, using the τ² maximum-likelihood procedure of Naylor, and an alternative method using two-dimensional cross-correlations developed in this work. We find best fits for Padova isochrones at age t = 2.5 ± 0.25 Gyr, m - M = 7.35 ± 0.1, and A_V = 0.25 ± 0.05, with additional uncertainty from the unresolved binary population and possibility of differential extinction across this large cluster. The inferred age is heavily dependent on our choice of stellar evolution model: fitting Dartmouth and PARSEC models yield age parameters of 3 Gyr and 3.25 Gyr, respectively. At ~300 pc and ~3 Gyr, Ruprecht 147 is by far the oldest nearby star cluster.
Subtypes of female juvenile offenders: a cluster analysis of the Millon Adolescent Clinical Inventory.

PubMed

Stefurak, Tres; Calhoun, Georgia B

2007-01-01

The current study sought to explore subtypes of adolescents within a sample of female juvenile offenders. Using the Millon Adolescent Clinical Inventory with 101 female juvenile offenders, a two-step cluster analysis was performed beginning with a Ward's method hierarchical cluster analysis followed by a K-Means iterative partitioning cluster analysis. The results suggest an optimal three-cluster solution, with cluster profiles leading to the following group labels: Externalizing Problems, Depressed/Interpersonally Ambivalent, and Anxious Prosocial. Analyses along the factors of age, race, offense typology and offense chronicity were conducted to further understand the nature of the clusters found. Only the effect for race was significant, with the Anxious Prosocial and Depressed/Interpersonally Ambivalent clusters appearing disproportionately comprised of African American girls. To establish external validity, clusters were compared across scales of the Behavioral Assessment System for Children - Self Report of Personality, and corroborative distinctions between clusters were found here.
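A two-step procedure of this kind (Ward's hierarchical clustering to obtain starting centroids, then K-Means to refine the partition) can be sketched in plain Python as below. This is an illustrative sketch on hypothetical numeric tuples, not the study's analysis: the function names are invented here, the Ward step is a naive implementation suitable only for small samples, and how the study chose the number of clusters is not reproduced.

```python
def sq_dist(a, b):
    # Squared Euclidean distance between two points given as tuples.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def ward_clusters(points, k):
    # Step 1: agglomerative clustering. At each pass, merge the pair of
    # clusters whose merger least increases the within-cluster sum of
    # squares: |A||B| / (|A| + |B|) * ||centroid(A) - centroid(B)||^2.
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                a, b = clusters[i], clusters[j]
                cost = (len(a) * len(b) / (len(a) + len(b))
                        * sq_dist(centroid(a), centroid(b)))
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def kmeans(points, centers, max_iter=100):
    # Step 2: iterative partitioning, started from the supplied centers.
    centers = list(centers)
    for _ in range(max_iter):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda c: sq_dist(p, centers[c]))
            groups[nearest].append(p)
        new_centers = [centroid(g) if g else centers[i]
                       for i, g in enumerate(groups)]
        if new_centers == centers:
            break
        centers = new_centers
    labels = [min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))
              for p in points]
    return labels, centers


def two_step(points, k):
    # Ward centroids seed K-Means, which then refines the assignment.
    seeds = [centroid(c) for c in ward_clusters(points, k)]
    return kmeans(points, seeds)
```

Seeding K-Means with the Ward centroids removes the dependence on random starting points, which is a common motivation for combining the two steps.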

[Cluster analysis in biomedical researches].

PubMed

Akopov, A S; Moskovtsev, A A; Dolenko, S A; Savina, G D

2013-01-01

Cluster analysis is one of the most popular methods for the analysis of multi-parameter data. Cluster analysis reveals the internal structure of the data by grouping separate observations according to their degree of similarity. The review provides definitions of the basic concepts of cluster analysis and discusses the most popular clustering algorithms: k-means, hierarchical algorithms, and Kohonen network algorithms. Examples of the use of these algorithms in biomedical research are given.
The development and cross-validation of an MMPI typology of murderers.

PubMed

Holcomb, W R; Adams, N A; Ponder, H M

1985-06-01

A sample of 80 male offenders charged with premeditated murder was divided into five personality types using MMPI scores. A hierarchical clustering procedure was used with a subsequent internal cross-validation analysis using a second sample of 80 premeditated murderers. A Discriminant Analysis resulted in a 96.25% correct classification of subjects from the second sample into the five types. Clinical data from a mental status interview schedule supported the external validity of these types. There were significant differences among the five types in hallucinations, disorientation, hostility, depression, and paranoid thinking. Both similarities and differences between the present typology and prior research were discussed. Additional research questions were suggested.
Modeling of correlated data with informative cluster sizes: An evaluation of joint modeling and within-cluster resampling approaches.

PubMed

Zhang, Bo; Liu, Wei; Zhang, Zhiwei; Qu, Yanping; Chen, Zhen; Albert, Paul S

2017-08-01

Joint modeling and within-cluster resampling are two approaches that are used for analyzing correlated data with informative cluster sizes. Motivated by a developmental toxicity study, we examined the performances and validity of these two approaches in testing covariate effects in generalized linear mixed-effects models. We show that the joint modeling approach is robust to the misspecification of cluster size models in terms of Type I and Type II errors when the corresponding covariates are not included in the random effects structure; otherwise, statistical tests may be affected. We also evaluate the performance of the within-cluster resampling procedure and thoroughly investigate the validity of it in modeling correlated data with informative cluster sizes. We show that within-cluster resampling is a valid alternative to joint modeling for cluster-specific covariates, but it is invalid for time-dependent covariates. The two methods are applied to a developmental toxicity study that investigated the effect of exposure to diethylene glycol dimethyl ether.
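As a rough illustration of the within-cluster resampling idea, the sketch below repeatedly draws one observation per cluster, computes an estimate on each resampled data set, and averages the estimates. The abstract concerns covariate effects in generalized linear mixed-effects models; the simple mean used here is a deliberate simplification, and the function name and data are hypothetical.

```python
import random
from statistics import mean


def within_cluster_resample_mean(clusters, n_resamples, rng):
    # Each resample keeps exactly one randomly chosen observation per
    # cluster, so every cluster carries equal weight regardless of its size.
    estimates = []
    for _ in range(n_resamples):
        draw = [rng.choice(observations) for observations in clusters]
        estimates.append(mean(draw))
    # The final estimate is the average over all resampled data sets.
    return mean(estimates)


if __name__ == "__main__":
    rng = random.Random(7)
    # Hypothetical outcomes: cluster size is related to the outcome level.
    clusters = [[1.0, 1.2, 0.9, 1.1, 1.0], [0.2], [0.4, 0.3]]
    pooled = mean(x for obs in clusters for x in obs)
    resampled = within_cluster_resample_mean(clusters, 1000, rng)
    print(pooled, resampled)
```

When cluster size is informative, the pooled mean is pulled toward the large clusters, whereas the resampled estimate targets the typical member of a typical cluster.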
Application of adaptive cluster sampling to low-density populations of freshwater mussels

USGS Publications Warehouse

Smith, D.R.; Villella, R.F.; Lemarie, D.P.

2003-01-01

Freshwater mussels appear to be promising candidates for adaptive cluster sampling because they are benthic macroinvertebrates that cluster spatially and are frequently found at low densities. We applied adaptive cluster sampling to estimate density of freshwater mussels at 24 sites along the Cacapon River, WV, where a preliminary timed search indicated that mussels were present at low density. Adaptive cluster sampling increased yield of individual mussels and detection of uncommon species; however, it did not improve precision of density estimates. Because finding uncommon species, collecting individuals of those species, and estimating their densities are important conservation activities, additional research is warranted on application of adaptive cluster sampling to freshwater mussels. However, at this time we do not recommend routine application of adaptive cluster sampling to freshwater mussel populations. The ultimate, and currently unanswered, question is how to tell when adaptive cluster sampling should be used, i.e., when is a population sufficiently rare and clustered for adaptive cluster sampling to be efficient and practical? A cost-effective procedure needs to be developed to identify biological populations for which adaptive cluster sampling is appropriate.
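The sampling step of adaptive cluster sampling can be sketched as follows: an initial set of units is surveyed, and whenever a unit meets a condition (here, at least `threshold` individuals), its neighbouring units are added to the sample. The grid layout, the four-neighbour rule, the threshold and the function name are assumptions for illustration; the study's field design and its density estimators are not reproduced.

```python
from collections import deque


def adaptive_cluster_sample(counts, initial_cells, threshold=1):
    # counts: 2-D list of per-cell counts; initial_cells: (row, col) pairs
    # from the initial sample. Returns the set of all cells surveyed.
    rows, cols = len(counts), len(counts[0])
    sampled = set()
    queue = deque(initial_cells)
    while queue:
        r, c = queue.popleft()
        if (r, c) in sampled:
            continue
        sampled.add((r, c))
        # Expand only from cells that satisfy the condition; empty edge
        # cells are surveyed but do not trigger further sampling.
        if counts[r][c] >= threshold:
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in sampled:
                    queue.append((nr, nc))
    return sampled


if __name__ == "__main__":
    # Hypothetical mussel counts on a small grid of quadrats.
    grid = [
        [0, 0, 0, 0],
        [0, 3, 2, 0],
        [0, 0, 0, 0],
    ]
    cells = adaptive_cluster_sample(grid, [(1, 1)])
    print(sorted(cells))
```

The extra effort is concentrated around occupied cells, which is consistent with the reported increase in the number of individuals found; the final sample size, however, is not known in advance.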
Participation of adults with visual and severe or profound intellectual disabilities: Definition and operationalization.

PubMed

Hanzen, Gineke; van Nispen, Ruth M A; van der Putten, Annette A J; Waninge, Aly

2017-02-01

The available opinions regarding participation do not appear to be applicable to adults with visual and severe or profound intellectual disabilities (VSPID). Because a clear definition and operationalization are lacking, it is difficult for support professionals to give meaning to participation for adults with VSPID. The purpose of the present study was to develop a definition and operationalization of the concept of participation of adults with VSPID. Parents or family members, professionals, and experts participated in an online concept mapping procedure. This procedure includes generating statements, clustering them, and rating their importance. The data were analyzed quantitatively using multidimensional scaling and qualitatively with triangulation. A total of 53 participants generated 319 statements of which 125 were clustered and rated. The final cluster map of the statements contained seven clusters: (1) Experience and discover; (2) Inclusion; (3) Involvement; (4) Leisure and recreation; (5) Communication and being understood; (6) Social relations; and (7) Self-management and autonomy. The average importance rating of the statements varied from 6.49 to 8.95. A definition of participation of this population was developed which included these seven clusters. The combination of the developed definition, the clusters, and the statements in these clusters, derived from the perceptions of parents or family members, professionals, and experts, can be employed to operationalize the construct of participation of adults with VSPID. This operationalization supports professionals in their ability to give meaning to participation in these adults. Future research will focus on using the operationalization as a checklist of participation for adults with VSPID. Copyright © 2016 Elsevier Ltd. All rights reserved.
The effect of clustering on perceived quantity in humans (Homo sapiens) and in chicks (Gallus gallus).

PubMed

Bertamini, Marco; Guest, Martin; Vallortigara, Giorgio; Rugani, Rosa; Regolin, Lucia

2018-04-30

Animals can perceive the numerosity of sets of visual elements. Qualitative and quantitative similarities in different species suggest the existence of a shared system (approximate number system). Biases associated with sensory properties are informative about the underlying mechanisms. In humans, regular spacing increases perceived numerosity (regular-random numerosity illusion). This has led to a model that predicts numerosity based on occupancy (a measure that decreases when elements are close together). We used a procedure in which observers selected one of two stimuli and were given feedback with respect to whether the choice was correct. One configuration had 20 elements and the other 40, randomly placed inside a circular region. Participants had to discover the rule based on feedback. Because density and clustering covaried with numerosity, different dimensions could be used. After reaching a criterion, test trials presented two types of configurations with 30 elements. One type had a larger interelement distance than the other (high or low clustering). If observers had adopted a numerosity strategy, they would choose low clustering (if reinforced with 40) and high clustering (if reinforced with 20). A clustering or density strategy predicts the opposite. Human adults used a numerosity strategy. Chicks were tested using a similar procedure. There were two behavioral measures: first approach response and final circumnavigation (walking behind the screen). The prediction based on numerosity was confirmed by the first approach data. For chicks, one clear pattern from both responses was a preference for the configurations with higher clustering. (PsycINFO Database Record (c) 2018 APA, all rights reserved).
Genetic Variability among Lucerne Cultivars Based on Biochemical (SDS-PAGE) and Morphological Markers

NASA Astrophysics Data System (ADS)

Farshadfar, M.; Farshadfar, E.

The present research was conducted to determine the genetic variability of 18 Lucerne cultivars, based on morphological and biochemical markers. The traits studied were plant height, tiller number, biomass, dry yield, dry yield/biomass, dry leaf/dry yield, macro and micro elements, crude protein, dry matter, crude fiber and ash percentage, and SDS-PAGE in seed and leaf samples. Field experiments included 18 plots of two-meter rows. Data based on morphological, chemical and SDS-PAGE markers were analyzed using SPSSWIN software and multivariate statistical procedures: cluster analysis (UPGMA) and principal component analysis. Analysis of variance and mean comparison for morphological traits reflected significant differences among genotypes. Genotypes 13 and 15 had the greatest values for most traits. The Phenotypic Coefficient of Variation (PCV) for different characters ranged from 12.49 to 26.58%, and the Genotypic Coefficient of Variation (GCV) ranged from 6.84 to 18.84%. The greatest value of Heritability (Hb) was 0.94 for stem number. Based on morphological traits, the Lucerne genotypes could be classified into four clusters, and 94% of the variance among the genotypes was explained by two principal components. Based on chemical traits, they were classified into five groups, and 73.492% of the variance was explained by four principal components; dry matter, protein, fiber, P, K, Na, Mg and Zn had higher variance. Based on the SDS-PAGE patterns, all genotypes were classified into three clusters. The greatest genetic distance was between cultivar 10 and the others; therefore, they would be suitable parents in a breeding program.
EXPLORING FUNCTIONAL CONNECTIVITY IN FMRI VIA CLUSTERING.

PubMed

Venkataraman, Archana; Van Dijk, Koene R A; Buckner, Randy L; Golland, Polina

2009-04-01

In this paper we investigate the use of data driven clustering methods for functional connectivity analysis in fMRI. In particular, we consider the K-Means and Spectral Clustering algorithms as alternatives to the commonly used Seed-Based Analysis. To enable clustering of the entire brain volume, we use the Nyström Method to approximate the necessary spectral decompositions. We apply K-Means, Spectral Clustering and Seed-Based Analysis to resting-state fMRI data collected from 45 healthy young adults. Without placing any a priori constraints, both clustering methods yield partitions that are associated with brain systems previously identified via Seed-Based Analysis. Our empirical results suggest that clustering provides a valuable tool for functional connectivity analysis.
Measuring Vocational Preferences: Ranking versus Categorical Rating Procedures.

ERIC Educational Resources Information Center

Carifio, James

1978-01-01

Describes a study to compare the relative validities of ranking versus categorical rating procedures for obtaining student vocational preference data in exploratory program assignment situations. Students indicated their vocational program preferences from career clusters, and the frequency of wrong assignments made by each method was analyzed. (MF)
Group sequential designs for stepped-wedge cluster randomised trials

PubMed Central

Grayling, Michael J; Wason, James MS; Mander, Adrian P

2017-01-01

Background/Aims: The stepped-wedge cluster randomised trial design has received substantial attention in recent years. Although various extensions to the original design have been proposed, no guidance is available on the design of stepped-wedge cluster randomised trials with interim analyses. In an individually randomised trial setting, group sequential methods can provide notable efficiency gains and ethical benefits. We address this by discussing how established group sequential methodology can be adapted for stepped-wedge designs. Methods: Utilising the error spending approach to group sequential trial design, we detail the assumptions required for the determination of stepped-wedge cluster randomised trials with interim analyses. We consider early stopping for efficacy, futility, or efficacy and futility. We describe first how this can be done for any specified linear mixed model for data analysis. We then focus on one particular commonly utilised model and, using a recently completed stepped-wedge cluster randomised trial, compare the performance of several designs with interim analyses to the classical stepped-wedge design. Finally, the performance of a quantile substitution procedure for dealing with the case of unknown variance is explored. Results: We demonstrate that the incorporation of early stopping in stepped-wedge cluster randomised trial designs could reduce the expected sample size under the null and alternative hypotheses by up to 31% and 22%, respectively, with no cost to the trial’s type-I and type-II error rates. The use of restricted error maximum likelihood estimation was found to be more important than quantile substitution for controlling the type-I error rate. Conclusion: The addition of interim analyses into stepped-wedge cluster randomised trials could help guard against time-consuming trials conducted on poor performing treatments and also help expedite the implementation of efficacious treatments. In future, trialists should consider incorporating early stopping of some kind into stepped-wedge cluster randomised trials according to the needs of the particular trial. PMID:28653550
Group sequential designs for stepped-wedge cluster randomised trials.

PubMed

Grayling, Michael J; Wason, James Ms; Mander, Adrian P

2017-10-01

The stepped-wedge cluster randomised trial design has received substantial attention in recent years. Although various extensions to the original design have been proposed, no guidance is available on the design of stepped-wedge cluster randomised trials with interim analyses. In an individually randomised trial setting, group sequential methods can provide notable efficiency gains and ethical benefits. We address this by discussing how established group sequential methodology can be adapted for stepped-wedge designs. Utilising the error spending approach to group sequential trial design, we detail the assumptions required for the determination of stepped-wedge cluster randomised trials with interim analyses. We consider early stopping for efficacy, futility, or efficacy and futility. We describe first how this can be done for any specified linear mixed model for data analysis. We then focus on one particular commonly utilised model and, using a recently completed stepped-wedge cluster randomised trial, compare the performance of several designs with interim analyses to the classical stepped-wedge design. Finally, the performance of a quantile substitution procedure for dealing with the case of unknown variance is explored. We demonstrate that the incorporation of early stopping in stepped-wedge cluster randomised trial designs could reduce the expected sample size under the null and alternative hypotheses by up to 31% and 22%, respectively, with no cost to the trial's type-I and type-II error rates. The use of restricted error maximum likelihood estimation was found to be more important than quantile substitution for controlling the type-I error rate. The addition of interim analyses into stepped-wedge cluster randomised trials could help guard against time-consuming trials conducted on poor performing treatments and also help expedite the implementation of efficacious treatments. 
In future, trialists should consider incorporating early stopping of some kind into stepped-wedge cluster randomised trials according to the needs of the particular trial.
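As an illustration of the error spending approach named in the abstract above, the following is a minimal sketch of two widely used Lan-DeMets spending functions (O'Brien-Fleming-type and Pocock-type). The function names, the equally spaced information fractions, and the two-sided alpha of 0.05 are assumptions made for illustration; they are not taken from the paper, where the information fractions would follow from the chosen linear mixed model. The sketch only shows how much type-I error is released at each interim analysis and does not compute stopping boundaries.

```python
from math import exp, log
from statistics import NormalDist

STD_NORMAL = NormalDist()

def obrien_fleming_spend(t, alpha=0.05):
    """Cumulative two-sided type-I error spent at information fraction t (0 < t <= 1)."""
    if t <= 0:
        return 0.0
    z = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return 2 * (1 - STD_NORMAL.cdf(z / t ** 0.5))

def pocock_spend(t, alpha=0.05):
    """Pocock-type spending function: releases error more evenly over time."""
    return alpha * log(1 + (exp(1) - 1) * t)

def increments(spend, fractions, alpha=0.05):
    """Type-I error released at each analysis (difference of cumulative spend)."""
    cumulative = [spend(t, alpha) for t in fractions]
    return [c - p for c, p in zip(cumulative, [0.0] + cumulative[:-1])]

# Hypothetical design: four analyses at equally spaced information fractions.
fractions = [0.25, 0.5, 0.75, 1.0]
obf_increments = increments(obrien_fleming_spend, fractions)
pocock_increments = increments(pocock_spend, fractions)
```

Both functions spend the full alpha at the final analysis; the O'Brien-Fleming-type function spends very little early, which makes early stopping harder at the first interim analyses.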
(GTG)5 MSP-PCR fingerprinting as a technique for discrimination of wine associated yeasts?

PubMed

Ramírez-Castrillón, Mauricio; Mendes, Sandra Denise Camargo; Inostroza-Ponta, Mario; Valente, Patricia

2014-01-01

In microbiology, identification of all isolates by sequencing is still unfeasible in small research laboratories. Therefore, many yeast diversity studies follow a screening procedure consisting of clustering the yeast isolates using MSP-PCR fingerprinting, followed by identification of one or a few selected representatives of each cluster by sequencing. Although this procedure has been widely applied in the literature, it has not been properly validated. We evaluated a standardized protocol using MSP-PCR fingerprinting with the primers (GTG)5 and M13 for the discrimination of wine associated yeasts in South Brazil. Two datasets were used: yeasts isolated from bottled wines and vineyard environments. We compared the discriminatory power of both primers in a subset of 16 strains, choosing the primer (GTG)5 for further evaluation. Afterwards, we applied this technique to 245 strains, and compared the results with the identification obtained by partial sequencing of the LSU rRNA gene, considered as the gold standard. An array matrix was constructed for each dataset and used as input for clustering with two methods (hierarchical dendrograms and QAPGrid layout). For both yeast datasets, unrelated species were clustered in the same group. The sensitivity score of (GTG)5 MSP-PCR fingerprinting was high, but specificity was low. As a conclusion, the yeast diversity inferred in several previous studies may have been underestimated and some isolates were probably misidentified due to the compliance to this screening procedure.
(GTG)5 MSP-PCR Fingerprinting as a Technique for Discrimination of Wine Associated Yeasts?

PubMed Central

Inostroza-Ponta, Mario; Valente, Patricia

2014-01-01

In microbiology, identification of all isolates by sequencing is still unfeasible in small research laboratories. Therefore, many yeast diversity studies follow a screening procedure consisting of clustering the yeast isolates using MSP-PCR fingerprinting, followed by identification of one or a few selected representatives of each cluster by sequencing. Although this procedure has been widely applied in the literature, it has not been properly validated. We evaluated a standardized protocol using MSP-PCR fingerprinting with the primers (GTG)5 and M13 for the discrimination of wine associated yeasts in South Brazil. Two datasets were used: yeasts isolated from bottled wines and vineyard environments. We compared the discriminatory power of both primers in a subset of 16 strains, choosing the primer (GTG)5 for further evaluation. Afterwards, we applied this technique to 245 strains, and compared the results with the identification obtained by partial sequencing of the LSU rRNA gene, considered as the gold standard. An array matrix was constructed for each dataset and used as input for clustering with two methods (hierarchical dendrograms and QAPGrid layout). For both yeast datasets, unrelated species were clustered in the same group. The sensitivity score of (GTG)5 MSP-PCR fingerprinting was high, but specificity was low. As a conclusion, the yeast diversity inferred in several previous studies may have been underestimated and some isolates were probably misidentified due to the compliance to this screening procedure. PMID:25171185
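The abstract above reports high sensitivity and low specificity but does not state how the scores were computed. One common way to score a clustering against a gold standard is pair counting, sketched below under that assumption: a pair of isolates counts as a true positive when both the fingerprint clustering and sequencing place the two isolates together. The function name and the toy labels are hypothetical.

```python
from itertools import combinations

def pairwise_sensitivity_specificity(cluster_labels, species_labels):
    """Pair-counting sensitivity and specificity of a clustering versus a gold standard."""
    tp = fp = tn = fn = 0
    for i, j in combinations(range(len(cluster_labels)), 2):
        same_cluster = cluster_labels[i] == cluster_labels[j]
        same_species = species_labels[i] == species_labels[j]
        if same_species and same_cluster:
            tp += 1  # same species, grouped together
        elif same_species:
            fn += 1  # same species, split apart
        elif same_cluster:
            fp += 1  # different species, grouped together
        else:
            tn += 1  # different species, kept apart
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity

# Toy data (not from the study): fingerprint cluster A mixes two species.
clusters = ["A", "A", "A", "B", "B"]
species = ["s1", "s1", "s2", "s3", "s3"]
sens, spec = pairwise_sensitivity_specificity(clusters, species)
```

In the toy data every same-species pair is grouped together (sensitivity 1.0), while two pairs of unrelated isolates share a cluster (specificity 0.75), which mirrors the pattern the abstract describes.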
Model for spectral and chromatographic data

DOEpatents

Jarman, Kristin [Richland, WA; Willse, Alan [Richland, WA; Wahl, Karen [Richland, WA; Wahl, Jon [Richland, WA

2002-11-26

A method and apparatus using a spectral analysis technique are disclosed. In one form of the invention, probabilities are selected to characterize the presence (and in another form, also a quantification of a characteristic) of peaks in an indexed data set for samples that match a reference species, and other probabilities are selected for samples that do not match the reference species. An indexed data set is acquired for a sample, and a determination is made according to techniques exemplified herein as to whether the sample matches or does not match the reference species. When quantification of peak characteristics is undertaken, the model is appropriately expanded, and the analysis accounts for the characteristic model and data. Further techniques are provided to apply the methods and apparatuses to process control, cluster analysis, hypothesis testing, analysis of variance, and other procedures involving multiple comparisons of indexed data.
A sampling design framework for monitoring secretive marshbirds

USGS Publications Warehouse

Johnson, D.H.; Gibbs, J.P.; Herzog, M.; Lor, S.; Niemuth, N.D.; Ribic, C.A.; Seamans, M.; Shaffer, T.L.; Shriver, W.G.; Stehman, S.V.; Thompson, W.L.

2009-01-01

A framework for a sampling plan for monitoring marshbird populations in the contiguous 48 states is proposed here. The sampling universe is the breeding habitat (i.e. wetlands) potentially used by marshbirds. Selection protocols would be implemented within each of several large geographical strata, such as Bird Conservation Regions. Site selection will be done using a two-stage cluster sample. Primary sampling units (PSUs) would be land areas, such as legal townships, and would be selected by a procedure such as systematic sampling. Secondary sampling units (SSUs) will be wetlands or portions of wetlands in the PSUs. SSUs will be selected by a randomized spatially balanced procedure. For analysis, the use of a variety of methods is encouraged as a means of increasing confidence in the conclusions reached. Additional effort will be required to work out details and implement the plan.
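The two-stage selection described above can be sketched as follows. This is a simplified illustration, not the framework's protocol: PSUs are chosen by systematic sampling with a random start, and SSUs are drawn by simple random sampling, which is swapped in here for the randomized spatially balanced procedure the abstract calls for. The frame of townships and wetlands, and all function names, are hypothetical.

```python
import random

def systematic_sample(units, n, rng):
    """Pick n units at a fixed interval from an ordered list, with a random start."""
    step = len(units) / n
    start = rng.random() * step
    # min() guards against a floating-point index equal to len(units).
    return [units[min(int(start + k * step), len(units) - 1)] for k in range(n)]

def two_stage_sample(psus, n_psu, n_ssu, seed=0):
    """Stage 1: systematic sample of PSUs. Stage 2: random SSUs within each PSU."""
    rng = random.Random(seed)
    chosen = systematic_sample(sorted(psus), n_psu, rng)
    return {p: rng.sample(psus[p], min(n_ssu, len(psus[p]))) for p in chosen}

# Hypothetical frame: 12 townships (PSUs), each holding 5 wetlands (SSUs).
frame = {
    f"township_{i:02d}": [f"wetland_{i:02d}_{j}" for j in range(5)]
    for i in range(12)
}
sample = two_stage_sample(frame, n_psu=4, n_ssu=2, seed=1)
```

In a stratified design this selection would be repeated independently within each stratum, such as each Bird Conservation Region.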
Career Decision Statuses among Portuguese Secondary School Students: A Cluster Analytical Approach

ERIC Educational Resources Information Center

Santos, Paulo Jorge; Ferreira, Joaquim Armando

2012-01-01

Career indecision is a complex phenomenon and an increasing number of authors have proposed that undecided individuals do not form a group with homogeneous characteristics. This study examines career decision statuses among a sample of 362 12th-grade Portuguese students. A cluster-analytical procedure, based on a battery of instruments designed to…
HLM in Cluster-Randomised Trials--Measuring Efficacy across Diverse Populations of Learners

ERIC Educational Resources Information Center

Hegedus, Stephen; Tapper, John; Dalton, Sara; Sloane, Finbarr

2013-01-01

We describe the application of Hierarchical Linear Modelling (HLM) in a cluster-randomised study to examine learning algebraic concepts and procedures in an innovative, technology-rich environment in the US. HLM is applied to measure the impact of such treatment on learning and on contextual variables. We provide a detailed description of such…
ClusterViz: A Cytoscape APP for Cluster Analysis of Biological Network.

PubMed

Wang, Jianxin; Zhong, Jiancheng; Chen, Gang; Li, Min; Wu, Fang-xiang; Pan, Yi

2015-01-01

Cluster analysis of biological networks is one of the most important approaches for identifying functional modules and predicting protein functions. Furthermore, visualization of clustering results is crucial to uncover the structure of biological networks. In this paper, ClusterViz, an APP of Cytoscape 3 for cluster analysis and visualization, has been developed. In order to reduce complexity and enable extendibility for ClusterViz, we designed the architecture of ClusterViz based on the framework of Open Services Gateway Initiative. According to the architecture, the implementation of ClusterViz is partitioned into three modules: the ClusterViz interface, the clustering algorithms, and visualization and export. ClusterViz facilitates the comparison of the results of different algorithms for further related analysis. Three commonly used clustering algorithms, FAG-EC, EAGLE and MCODE, are included in the current version. Because the clustering-algorithm module adopts an abstract algorithm interface, more clustering algorithms can be added in the future. To illustrate the usability of ClusterViz, we provide three examples with detailed steps from important scientific articles, which show that our tool has helped several research teams in their research on the mechanisms of biological networks.
Photoionization cross section by Stieltjes imaging applied to coupled cluster Lanczos pseudo-spectra

NASA Astrophysics Data System (ADS)

Cukras, Janusz; Coriani, Sonia; Decleva, Piero; Christiansen, Ove; Norman, Patrick

2013-09-01

A recently implemented asymmetric Lanczos algorithm for computing (complex) linear response functions within the coupled cluster singles (CCS), coupled cluster singles and iterative approximate doubles (CC2), and coupled cluster singles and doubles (CCSD) is coupled to a Stieltjes imaging technique in order to describe the photoionization cross section of atoms and molecules, in the spirit of a similar procedure recently proposed by Averbukh and co-workers within the Algebraic Diagrammatic Construction approach. Pilot results are reported for the atoms He, Ne, and Ar and for the molecules H2, H2O, NH3, HF, CO, and CO2.
Photoionization cross section by Stieltjes imaging applied to coupled cluster Lanczos pseudo-spectra.

PubMed

Cukras, Janusz; Coriani, Sonia; Decleva, Piero; Christiansen, Ove; Norman, Patrick

2013-09-07

A recently implemented asymmetric Lanczos algorithm for computing (complex) linear response functions within the coupled cluster singles (CCS), coupled cluster singles and iterative approximate doubles (CC2), and coupled cluster singles and doubles (CCSD) is coupled to a Stieltjes imaging technique in order to describe the photoionization cross section of atoms and molecules, in the spirit of a similar procedure recently proposed by Averbukh and co-workers within the Algebraic Diagrammatic Construction approach. Pilot results are reported for the atoms He, Ne, and Ar and for the molecules H2, H2O, NH3, HF, CO, and CO2.

Preoptimised VB: a fast method for the ground and excited states of ionic clusters I. Localised preoptimisation for (ArCO)+, (ArN2)+ and N4+

NASA Astrophysics Data System (ADS)

Langenberg, J. H.; Bucur, I. B.; Archirel, P.

1997-09-01

We show that in the simple case of van der Waals ionic clusters, the optimisation of orbitals within VB can be easily simulated with the help of pseudopotentials. The procedure yields the ground and the first excited states of the cluster simultaneously. This makes the calculation of potential energy surfaces for tri- and tetraatomic clusters possible, with very acceptable computation times. We give potential curves for (ArCO)+, (ArN2)+ and N4+. An application to the simulation of the SCF method is shown for Na+H2O.
Further Automate Planned Cluster Maintenance to Minimize System Downtime during Maintenance Windows

DOE Office of Scientific and Technical Information (OSTI.GOV)

Springmeyer, R.

This report documents the integration and testing of the automated update process of compute clusters in LC to minimize impact to user productivity. Description: A set of scripts will be written and deployed to further standardize cluster maintenance activities and minimize downtime during planned maintenance windows. Completion Criteria: When the scripts have been deployed and used during planned maintenance windows and a timing comparison is completed between the existing process and the new more automated process, this milestone is complete. This milestone was completed on Aug 23, 2016 on the new CTS1 cluster called Jade when a request to upgrade the version of TOSS 3 was initiated while SWL jobs and normal user jobs were running. Jobs that were running when the update to the system began continued to run to completion. New jobs on the cluster started on the new release of TOSS 3. No system administrator action was required. Current update procedures in TOSS 2 begin by killing all user jobs. Then all diskfull nodes are updated, which can take a few hours. Only after the updates are applied are all nodes rebooted and finally put back into service. A system administrator is required for all steps. In terms of human time spent during a cluster OS update, the TOSS 3 automated procedure on Jade took 0 FTE hours. Doing the same update without the Toss Update Tool would have required 4 FTE hours.
An Objective Classification of Saturn Cloud Features from Cassini ISS Images

NASA Technical Reports Server (NTRS)

Del Genio, Anthony D.; Barbara, John M.

2016-01-01

A k-means clustering algorithm is applied to Cassini Imaging Science Subsystem continuum and methane band images of Saturn's northern hemisphere to objectively classify regional albedo features and aid in their dynamical interpretation. The procedure is based on a technique applied previously to visible-infrared images of Earth. It provides a new perspective on giant planet cloud morphology and its relationship to the dynamics and a meteorological context for the analysis of other types of simultaneous Saturn observations. The method identifies 6 clusters that exhibit distinct morphology, vertical structure, and preferred latitudes of occurrence. These correspond to areas dominated by deep convective cells; low contrast areas, some including thinner and thicker clouds possibly associated with baroclinic instability; regions with possible isolated thin cirrus clouds; darker areas due to thinner low level clouds or clearer skies due to downwelling, or due to absorbing particles; and fields of relatively shallow cumulus clouds. The spatial associations among these cloud types suggest that dynamically, there are three distinct types of latitude bands on Saturn: deep convectively disturbed latitudes in cyclonic shear regions poleward of the eastward jets; convectively suppressed regions near and surrounding the westward jets; and baroclinically unstable latitudes near eastward jet cores and in the anti-cyclonic regions equatorward of them. These are roughly analogous to some of the features of Earth's tropics, subtropics, and midlatitudes, respectively. This classification may be more useful for dynamics purposes than the traditional belt-zone partitioning. Temporal variations of feature contrast and cluster occurrence suggest that the upper tropospheric haze in the northern hemisphere may have thickened by 2014. The results suggest that routine use of clustering may be a worthwhile complement to many different types of planetary atmospheric data analysis.
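For readers unfamiliar with k-means, the following is a minimal sketch of the standard Lloyd iteration on two-feature vectors. It is not the authors' pipeline: the two features stand in for hypothetical per-region continuum and methane-band brightness values in arbitrary units, the data are invented, and k = 2 is used only to keep the toy example small (the study reports 6 clusters).

```python
import random

def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iterations=50, seed=0):
    """Plain Lloyd's algorithm: assign points to the nearest centroid, then update."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k distinct data points
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                updated.append(centroids[c])  # keep an empty cluster's centroid
        if updated == centroids:  # converged
            break
        centroids = updated
    return labels, centroids

# Invented feature vectors: (continuum brightness, methane-band brightness).
features = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
            (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centroids = kmeans(features, k=2)
```

The result depends on the initial centroids in general, so practical uses typically repeat the procedure from several random starts and keep the solution with the lowest within-cluster sum of squares.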
CCD photometry of NGC 6101 - Another globular cluster with blue straggler stars

NASA Technical Reports Server (NTRS)

Sarajedini, Ata; Da Costa, G. S.

1991-01-01

Results are presented on CCD photometric observations of a large sample of stars in the southern globular cluster NGC 6101, and the procedures used to derive the color-magnitude (C-M) diagram of the cluster are described. No indication was found of any difference in age, at the less than 2 Gyr level, between NGC 6101 and other clusters of similar abundance, such as M92. The C-M diagram revealed a significant blue straggler population. It was found that, in NGC 6101, these stars are more centrally concentrated than the cluster subgiants of similar magnitude, indicating that the blue stragglers have larger masses. Results on the magnitude and luminosity function of the sample are consistent with the binary mass transfer or merger hypotheses for the origin of blue straggler stars.
Topology in two dimensions. II - The Abell and ACO cluster catalogues

NASA Astrophysics Data System (ADS)

Plionis, Manolis; Valdarnini, Riccardo; Coles, Peter

1992-09-01

We apply a method for quantifying the topology of projected galaxy clustering to the Abell and ACO catalogues of rich clusters. We use numerical simulations to quantify the statistical bias involved in using high peaks to define the large-scale structure, and we use the results obtained to correct our observational determinations for this known selection effect and also for possible errors introduced by boundary effects. We find that the Abell cluster sample is consistent with clusters being identified with high peaks of a Gaussian random field, but that the ACO shows a slight meatball shift away from the Gaussian behavior over and above that expected purely from the high-peak selection. The most conservative explanation of this effect is that it is caused by some artefact of the procedure used to select the clusters in the two samples.
Patterns of long-term care services use in a suburban municipality of Japan: a population-based study.

PubMed

Igarashi, Ayumi; Yamamoto-Mitani, Noriko; Yoshie, Satoru; Iijima, Katsuya

2017-05-01

Increasing service use under the long-term care insurance (LTCI) system in Japan requires a comprehensive understanding of how the services are actually used. This study aimed to identify patterns of LTCI service use and to examine the characteristics of the patterns. We analyzed data from a population of 4,339 older adults living in the community who were certified as "Needing Care" and were using at least one LTCI service in a suburban municipality of Japan. We identified six patterns of service use using cluster analysis based on the amount of fees for LTCI services and compared characteristics among the clusters. The clusters were: 1) light use of care services (n = 1,852); 2) day care-centered (n = 1,071); 3) day care with rehabilitation-centered (n = 616); 4) home help-centered (n = 365); 5) short-stay respite service-centered (n = 246); and 6) compound uses of visiting services (n = 189). The "home help-centered" and "short-stay respite service-centered" clusters incurred high fees, whereas the "compound uses of visiting services" cluster did not, despite its members' severe conditions. The "day care-centered (with rehabilitation)" classification included few people who needed medical procedures, likely due to the lack of medical facilities in those agencies. The results show the impact of social and medical factors on LTCI service use, suggesting possible difficulties in the socialization of care. The clusters could be used as typical service use patterns, providing a framework for further studies, such as those evaluating the services' effects. Geriatr Gerontol Int 2017; 17: 753-759. © 2016 Japan Geriatrics Society.
Recurrent-neural-network-based Boolean factor analysis and its application to word clustering.

PubMed

Frolov, Alexander A; Husek, Dusan; Polyakov, Pavel Yu

2009-07-01

The objective of this paper is to introduce a neural-network-based algorithm for word clustering as an extension of the neural-network-based Boolean factor analysis algorithm (Frolov et al., 2007). It is shown that this extended algorithm supports even the more complex model of signals that are supposed to be related to textual documents. It is hypothesized that every topic in textual data is characterized by a set of words which coherently appear in documents dedicated to a given topic. The appearance of each word in a document is coded by the activity of a particular neuron. In accordance with the Hebbian learning rule implemented in the network, sets of coherently appearing words (treated as factors) create tightly connected groups of neurons, hence revealing them as attractors of the network dynamics. The found factors are eliminated from the network memory by the Hebbian unlearning rule, facilitating the search for other factors. Topics related to the found sets of words can be identified based on the words' semantics. To make the method complete, a special technique based on a Bayesian procedure has been developed for the following purposes: first, to provide a complete description of factors in terms of component probability, and second, to enhance the accuracy of classification of a signal to determine whether it contains the factor. Since it is assumed that every word may possibly contribute to several topics, the proposed method might be related to the method of fuzzy clustering. In this paper, we show that the results of Boolean factor analysis and fuzzy clustering are not contradictory, but complementary. To demonstrate the capabilities of this approach, the method is applied to two types of textual data on neural networks in two different languages. The obtained topics and corresponding words are at a good level of agreement despite the fact that identical topics in Russian and English conferences contain different sets of keywords.
Adaptive Localization of Focus Point Regions via Random Patch Probabilistic Density from Whole-Slide, Ki-67-Stained Brain Tumor Tissue

PubMed Central

Alomari, Yazan M.; MdZin, Reena Rahayu

2015-01-01

Analysis of whole-slide tissue for digital pathology images has been clinically approved to provide a second opinion to pathologists. Localization of focus points from Ki-67-stained histopathology whole-slide tissue microscopic images is considered the first step in the process of proliferation rate estimation. Pathologists use eye pooling or eagle-view techniques to localize the highly stained cell-concentrated regions from the whole slide under the microscope, which are called focus-point regions. This procedure leads to high inter-observer variability, is time consuming and tedious, and can cause inaccurate findings. The localization of focus-point regions can be addressed as a clustering problem. This paper aims to automate the localization of focus-point regions from whole-slide images using the random patch probabilistic density (RPPD) method. Unlike other clustering methods, the random patch probabilistic density method can adaptively localize focus-point regions without predetermining the number of clusters. The proposed method was compared with the k-means and fuzzy c-means clustering methods. The proposed method performed well when the results were evaluated by three expert pathologists, achieving an average false-positive rate of 0.84% for the focus-point region localization error. Moreover, regarding RPPD used to localize tissue from whole-slide images, 228 whole-slide images have been tested; 97.3% localization accuracy was achieved. PMID:25793010
Specialized Computer Systems for Environment Visualization

NASA Astrophysics Data System (ADS)

Al-Oraiqat, Anas M.; Bashkov, Evgeniy A.; Zori, Sergii A.

2018-06-01

The need for real-time image generation of landscapes arises in various fields as part of tasks solved by virtual and augmented reality systems, as well as geographic information systems. Such systems provide opportunities for collecting, storing, analyzing and graphically visualizing geographic data. Algorithmic and hardware-software tools for increasing the realism and efficiency of environment visualization in 3D visualization systems are proposed. This paper discusses a modified path tracing algorithm with a two-level hierarchy of bounding volumes and intersection tests against Axis-Aligned Bounding Boxes. The proposed algorithm eliminates branching and is therefore more suitable for implementation on multi-threaded CPUs and GPUs. A modified ROAM algorithm is used for high-quality visualization of reliefs and landscapes. The algorithm is implemented on parallel systems: clusters and Compute Unified Device Architecture networks. Results show that the implementation on MPI clusters is more efficient than on Graphics Processing Unit/Graphics Processing Clusters and allows real-time synthesis. The organization and algorithms of a parallel GPU system for 3D pseudo stereo image/video synthesis are proposed. 3D pseudo stereo synthesis is performed after analyzing whether each stage can be realized on a parallel GPU architecture. An experimental prototype of a specialized hardware-software system for 3D pseudo stereo imaging and video was developed on the CPU/GPU. The experimental results show that the proposed adaptation of 3D pseudo stereo imaging to the architecture of GPU systems is efficient. It also accelerates the computational procedures of 3D pseudo-stereo synthesis for the anaglyph and anamorphic formats of the 3D stereo frame without performing optimization procedures. The acceleration is on average 11 and 54 times for test GPUs.
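The abstract does not give the authors' intersection routine, so the sketch below shows the standard "slab" test for a ray against an Axis-Aligned Bounding Box, written with min/max arithmetic instead of per-axis conditionals. It is offered as an assumption about the general technique, not as the paper's algorithm; the function names are hypothetical. On a GPU the same arithmetic maps to min/max instructions, which is what makes this form attractive where divergent branches are costly.

```python
import math

def safe_inverse(d):
    """Reciprocal of a direction component; a zero component maps to a signed infinity."""
    return 1.0 / d if d != 0.0 else math.copysign(math.inf, d)

def ray_hits_aabb(origin, direction, box_min, box_max):
    """Slab test: intersect the ray's parameter intervals for the x, y and z slabs."""
    t_near, t_far = 0.0, math.inf
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        inv = safe_inverse(d)
        t1 = (lo - o) * inv
        t2 = (hi - o) * inv
        t_near = max(t_near, min(t1, t2))  # latest entry across slabs
        t_far = min(t_far, max(t1, t2))    # earliest exit across slabs
    return t_near <= t_far

# Unit box; one ray passes through it, the other travels parallel to it and misses.
hit = ray_hits_aabb((-1.0, 0.5, 0.5), (1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (1.0, 1.0, 1.0))
miss = ray_hits_aabb((-1.0, 5.0, 0.5), (1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (1.0, 1.0, 1.0))
```

One known edge case is left unhandled: if a direction component is zero and the origin lies exactly on that slab's boundary plane, the product 0 × infinity yields NaN, and production code usually guards against it.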
Cluster Analysis in Nursing Research: An Introduction, Historical Perspective, and Future Directions.

PubMed

Dunn, Heather; Quinn, Laurie; Corbridge, Susan J; Eldeirawi, Kamal; Kapella, Mary; Collins, Eileen G

2017-05-01

The use of cluster analysis in the nursing literature is limited to the creation of classifications of homogeneous groups and the discovery of new relationships. As such, it is important to provide clarity regarding its use and potential. The purpose of this article is to provide an introduction to distance-based, partitioning-based, and model-based cluster analysis methods commonly utilized in the nursing literature, provide a brief historical overview on the use of cluster analysis in nursing literature, and provide suggestions for future research. An electronic search included three bibliographic databases, PubMed, CINAHL and Web of Science. Key terms were cluster analysis and nursing. The use of cluster analysis in the nursing literature is increasing and expanding. The increased use of cluster analysis in the nursing literature is positioning this statistical method to result in insights that have the potential to change clinical practice.
Rapid quality assessment of Radix Aconiti Preparata using direct analysis in real time mass spectrometry.

PubMed

Zhu, Hongbin; Wang, Chunyan; Qi, Yao; Song, Fengrui; Liu, Zhiqiang; Liu, Shuying

2012-11-08

This study presents a novel and rapid method to identify chemical markers for the quality control of Radix Aconiti Preparata, a traditional herbal medicine widely used around the world. In the method, samples prepared with a fast extraction procedure were analyzed using direct analysis in real time mass spectrometry (DART MS) combined with multivariate data analysis. At present, the quality assessment approach for Radix Aconiti Preparata is based on the two processing methods recorded in the Chinese Pharmacopoeia for the purpose of reducing the toxicity of Radix Aconiti and ensuring its clinical therapeutic efficacy. In order to ensure safety and effectiveness in clinical use, the processing degree of Radix Aconiti should be well controlled and assessed. In the paper, hierarchical cluster analysis and principal component analysis were performed to evaluate the DART MS data of Radix Aconiti Preparata samples at different processing times. The results showed that the well processed Radix Aconiti Preparata, the unqualified processed samples and the raw Radix Aconiti could be clustered reasonably according to their constituents. The loading plot shows that the main chemical markers having the most influence on the discrimination between the qualified and unqualified samples were mainly some monoester diterpenoid aconitines and diester diterpenoid aconitines, i.e. benzoylmesaconine, hypaconitine, mesaconitine, neoline, benzoylhypaconine, benzoylaconine, fuziline, aconitine and 10-OH-mesaconitine. The established DART MS approach in combination with multivariate data analysis provides a very flexible and reliable method for quality assessment of toxic herbal medicine. Copyright © 2012 Elsevier B.V. All rights reserved.
Hundreds of new cluster candidates in the VISTA Variables in the Vía Láctea survey DR1

NASA Astrophysics Data System (ADS)

Barbá, R. H.; Roman-Lopes, A.; Nilo Castellón, J. L.; Firpo, V.; Minniti, D.; Lucas, P.; Emerson, J. P.; Hempel, M.; Soto, M.; Saito, R. K.

2015-09-01

Context. VISTA Variables in the Vía Láctea (VVV) is an ESO Public survey dedicated to scanning the bulge and an adjacent portion of the Galactic disk in the fourth quadrant using the VISTA telescope and its near-infrared camera VIRCAM. One of the leading goals of the VVV survey is to contribute to knowledge of the star cluster population of the Milky Way. Aims: To improve the census of Galactic star clusters, we performed a systematic and careful scan of the JHKs images of the Galactic plane section of the VVV survey. Methods: Our detection procedure is based on a combination of stellar density maps and visual inspection of promising features in the J-, H-, and KS-band images. The material examined consists of VVV JHKS color-composite images corresponding to Data Release 1 of VVV. Results: We report the discovery of 493 new infrared star cluster candidates. The analysis of the spatial distribution shows that the clusters are very concentrated in the Galactic plane, presenting some local maxima around the position of large star-forming complexes, such as G305, RCW 95, and RCW 106. The vast majority of the new star cluster candidates are quite compact and generally surrounded by bright and/or dark nebulosities. IRAS point sources are associated with 59% of the sample, while 88% are associated with MSX point sources. GLIMPSE 8 μm images of the cluster candidates show a variety of morphologies, with 292 clusters dominated by knotty sources, while 361 clusters show some kind of nebulosity in this wavelength regime. Spatial cross-correlation with young stellar objects, masers, and extended green-object catalogs suggests that a large sample of the new cluster candidates are extremely young. In particular, 104 star clusters associated with methanol masers are excellent candidates for ongoing massive star formation. 
Also, there is a special set of sixteen cluster candidates that present clear signposts of star-forming activity, being simultaneously associated with dark nebulae, young stellar objects, extended green objects, and masers. Full Tables 1-3 are only available at the CDS via anonymous ftp to http://cdsarc.u-strasbg.fr (ftp://130.79.128.5) or via http://cdsarc.u-strasbg.fr/viz-bin/qcat?J/A+A/581/A120
Missing continuous outcomes under covariate dependent missingness in cluster randomised trials

PubMed Central

Diaz-Ordaz, Karla; Bartlett, Jonathan W

2016-01-01

Attrition is a common occurrence in cluster randomised trials which leads to missing outcome data. Two approaches for analysing such trials are cluster-level analysis and individual-level analysis. This paper compares the performance of unadjusted cluster-level analysis, baseline covariate adjusted cluster-level analysis and linear mixed model analysis, under baseline covariate dependent missingness in continuous outcomes, in terms of bias, average estimated standard error and coverage probability. The methods of complete records analysis and multiple imputation are used to handle the missing outcome data. We considered four scenarios, with the missingness mechanism and baseline covariate effect on outcome either the same or different between intervention groups. We show that both unadjusted cluster-level analysis and baseline covariate adjusted cluster-level analysis give unbiased estimates of the intervention effect only if both intervention groups have the same missingness mechanisms and there is no interaction between baseline covariate and intervention group. Linear mixed model and multiple imputation give unbiased estimates under all four considered scenarios, provided that an interaction of intervention and baseline covariate is included in the model when appropriate. Cluster mean imputation has been proposed as a valid approach for handling missing outcomes in cluster randomised trials. We show that cluster mean imputation only gives unbiased estimates when the missingness mechanism is the same between the intervention groups and there is no interaction between baseline covariate and intervention group. Multiple imputation shows overcoverage for a small number of clusters in each intervention group. PMID:27177885
Missing continuous outcomes under covariate dependent missingness in cluster randomised trials.

PubMed

Hossain, Anower; Diaz-Ordaz, Karla; Bartlett, Jonathan W

2017-06-01

Attrition is a common occurrence in cluster randomised trials which leads to missing outcome data. Two approaches for analysing such trials are cluster-level analysis and individual-level analysis. This paper compares the performance of unadjusted cluster-level analysis, baseline covariate adjusted cluster-level analysis and linear mixed model analysis, under baseline covariate dependent missingness in continuous outcomes, in terms of bias, average estimated standard error and coverage probability. The methods of complete records analysis and multiple imputation are used to handle the missing outcome data. We considered four scenarios, with the missingness mechanism and baseline covariate effect on outcome either the same or different between intervention groups. We show that both unadjusted cluster-level analysis and baseline covariate adjusted cluster-level analysis give unbiased estimates of the intervention effect only if both intervention groups have the same missingness mechanisms and there is no interaction between baseline covariate and intervention group. Linear mixed model and multiple imputation give unbiased estimates under all four considered scenarios, provided that an interaction of intervention and baseline covariate is included in the model when appropriate. Cluster mean imputation has been proposed as a valid approach for handling missing outcomes in cluster randomised trials. We show that cluster mean imputation only gives unbiased estimates when the missingness mechanism is the same between the intervention groups and there is no interaction between baseline covariate and intervention group. Multiple imputation shows overcoverage for a small number of clusters in each intervention group.
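The unadjusted cluster-level analysis with complete records described above can be illustrated with a minimal sketch. It assumes a hypothetical record layout of (arm, cluster id, outcome) with missing outcomes stored as None, and hypothetical arm labels "intervention" and "control"; the function name is illustrative and is not taken from the paper.

```python
from statistics import mean


def cluster_level_effect(records):
    """Unadjusted cluster-level estimate of the intervention effect.

    records: iterable of (arm, cluster_id, outcome) tuples, where outcome
    is a float or None when missing (hypothetical layout for illustration).
    """
    outcomes_by_cluster = {}
    for arm, cluster_id, outcome in records:
        if outcome is None:
            # Complete records analysis: drop individuals with missing outcomes.
            continue
        outcomes_by_cluster.setdefault((arm, cluster_id), []).append(outcome)

    # Summarise each cluster by the mean of its observed outcomes.
    cluster_means_by_arm = {}
    for (arm, _), outcomes in outcomes_by_cluster.items():
        cluster_means_by_arm.setdefault(arm, []).append(mean(outcomes))

    # Compare arms using the unweighted mean of the cluster means.
    return (mean(cluster_means_by_arm["intervention"])
            - mean(cluster_means_by_arm["control"]))
```

As the abstract notes, an estimate of this kind is unbiased only under restrictive conditions on the missingness mechanism; the sketch shows the mechanics of the calculation, not a recommended analysis.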
Implementation of novel statistical procedures and other advanced approaches to improve analysis of CASA data.

PubMed

Ramón, M; Martínez-Pastor, F

2018-04-23

Computer-aided sperm analysis (CASA) produces a wealth of data that is frequently ignored. The use of multiparametric statistical methods can help explore these datasets, unveiling the subpopulation structure of sperm samples. In this review we analyse the significance of the internal heterogeneity of sperm samples and its relevance. We also provide a brief description of the statistical tools used for extracting sperm subpopulations from the datasets, namely unsupervised clustering (with non-hierarchical, hierarchical and two-step methods) and the most advanced supervised methods, based on machine learning. The former method has allowed exploration of subpopulation patterns in many species, whereas the latter offers further possibilities, especially for functional studies and the practical use of subpopulation analysis. We also consider novel approaches, such as the use of geometric morphometrics or imaging flow cytometry. Finally, although applying clustering analyses to the data provided by CASA systems yields valuable information on sperm samples, there are several caveats. Protocols for capturing and analysing motility or morphometry should be standardised and adapted to each experiment, and the algorithms should be open in order to allow comparison of results between laboratories. Moreover, we must be aware of new technology that could change the paradigm for studying sperm motility and morphology.
Cluster and principal component analysis based on SSR markers of Amomum tsao-ko in Jinping County of Yunnan Province

NASA Astrophysics Data System (ADS)

Ma, Mengli; Lei, En; Meng, Hengling; Wang, Tiantao; Xie, Linyan; Shen, Dong; Xianwang, Zhou; Lu, Bingyue

2017-08-01

Amomum tsao-ko is a commercial plant that is used for various purposes in the medicinal and food industries. For the present investigation, 44 germplasm samples were collected from Jinping County of Yunnan Province. Cluster analysis and 2-dimensional principal component analysis (PCA) were used to represent the genetic relations among Amomum tsao-ko samples by using simple sequence repeat (SSR) markers. Cluster analysis clearly distinguished the sample groups. Two major clusters were formed: the first (Cluster I) consisted of 34 individuals and the second (Cluster II) consisted of 10 individuals; Cluster I, as the main group, contained multiple sub-clusters. PCA also showed 2 groups: PCA Group 1 included 29 individuals and PCA Group 2 included 12 individuals, consistent with the results of cluster analysis. The purpose of the present investigation was to provide information on the genetic relationships of Amomum tsao-ko germplasm resources in the main producing areas, and to provide a theoretical basis for the protection and utilization of Amomum tsao-ko resources.
Development and optimization of SPECT gated blood pool cluster analysis for the prediction of CRT outcome.

PubMed

Lalonde, Michel; Wells, R Glenn; Birnie, David; Ruddy, Terrence D; Wassenaar, Richard

2014-07-01

Phase analysis of single photon emission computed tomography (SPECT) radionuclide angiography (RNA) has been investigated for its potential to predict the outcome of cardiac resynchronization therapy (CRT). However, phase analysis may be limited in its potential to predict CRT outcome, as valuable information may be lost by assuming that time-activity curves (TAC) follow a simple sinusoidal shape. A new method, cluster analysis, is proposed which directly evaluates the TACs and may lead to a better understanding of dyssynchrony patterns and CRT outcome. Cluster analysis algorithms were developed and optimized to maximize their ability to predict CRT response. About 49 patients (N = 27 ischemic etiology) received a SPECT RNA scan as well as positron emission tomography (PET) perfusion and viability scans prior to undergoing CRT. A semiautomated algorithm sampled the left ventricle wall to produce 568 TACs from SPECT RNA data. The TACs were then subjected to two different cluster analysis techniques, K-means and normal average, where several input metrics were also varied to determine the optimal settings for the prediction of CRT outcome. Each TAC was assigned to a cluster group based on the comparison criteria, and global and segmental cluster size and scores were used as measures of dyssynchrony and to predict response to CRT. A repeated random twofold cross-validation technique was used to train and validate the cluster algorithm. Receiver operating characteristic (ROC) analysis was used to calculate the area under the curve (AUC) and compare results to those obtained for SPECT RNA phase analysis and PET scar size analysis methods. Using the normal average cluster analysis approach, the septal wall produced statistically significant results for predicting CRT results in the ischemic population (ROC AUC = 0.73; p < 0.05 vs. equal chance ROC AUC = 0.50) with an optimal operating point of 71% sensitivity and 60% specificity. Cluster analysis results were similar to SPECT RNA phase analysis (ROC AUC = 0.78, p = 0.73 vs cluster AUC; sensitivity/specificity = 59%/89%) and PET scar size analysis (ROC AUC = 0.73, p = 1.0 vs cluster AUC; sensitivity/specificity = 76%/67%). A SPECT RNA cluster analysis algorithm was developed for the prediction of CRT outcome. Cluster analysis produced results equivalent to those obtained from Fourier and scar analysis.
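The ROC quantities reported above (AUC, and sensitivity and specificity at an operating point) can be computed from per-patient scores as in the following minimal sketch. It assumes a hypothetical setup in which each patient has one numeric dyssynchrony score and higher scores indicate a predicted responder; the function names and the scoring convention are illustrative assumptions, not the authors' implementation.

```python
def roc_auc(scores_pos, scores_neg):
    """AUC as the probability that a responder scores above a non-responder.

    Ties count as one half, which is the Mann-Whitney form of the AUC.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def sensitivity_specificity(scores_pos, scores_neg, threshold):
    """Sensitivity and specificity when scores >= threshold are called positive."""
    true_pos = sum(1 for p in scores_pos if p >= threshold)
    true_neg = sum(1 for n in scores_neg if n < threshold)
    return true_pos / len(scores_pos), true_neg / len(scores_neg)
```

An AUC of 0.50 corresponds to the "equal chance" reference mentioned in the abstract; sweeping the threshold over all observed scores traces out the ROC curve from which an operating point is chosen.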
Development and optimization of SPECT gated blood pool cluster analysis for the prediction of CRT outcome

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lalonde, Michel, E-mail: mlalonde15@rogers.com; Wassenaar, Richard; Wells, R. Glenn

2014-07-15

Purpose: Phase analysis of single photon emission computed tomography (SPECT) radionuclide angiography (RNA) has been investigated for its potential to predict the outcome of cardiac resynchronization therapy (CRT). However, phase analysis may be limited in its potential at predicting CRT outcome as valuable information may be lost by assuming that time-activity curves (TAC) follow a simple sinusoidal shape. A new method, cluster analysis, is proposed which directly evaluates the TACs and may lead to a better understanding of dyssynchrony patterns and CRT outcome. Cluster analysis algorithms were developed and optimized to maximize their ability to predict CRT response. Methods: About 49 patients (N = 27 ischemic etiology) received a SPECT RNA scan as well as positron emission tomography (PET) perfusion and viability scans prior to undergoing CRT. A semiautomated algorithm sampled the left ventricle wall to produce 568 TACs from SPECT RNA data. The TACs were then subjected to two different cluster analysis techniques, K-means, and normal average, where several input metrics were also varied to determine the optimal settings for the prediction of CRT outcome. Each TAC was assigned to a cluster group based on the comparison criteria and global and segmental cluster size and scores were used as measures of dyssynchrony and used to predict response to CRT. A repeated random twofold cross-validation technique was used to train and validate the cluster algorithm. Receiver operating characteristic (ROC) analysis was used to calculate the area under the curve (AUC) and compare results to those obtained for SPECT RNA phase analysis and PET scar size analysis methods. Results: Using the normal average cluster analysis approach, the septal wall produced statistically significant results for predicting CRT results in the ischemic population (ROC AUC = 0.73; p < 0.05 vs. equal chance ROC AUC = 0.50) with an optimal operating point of 71% sensitivity and 60% specificity. Cluster analysis results were similar to SPECT RNA phase analysis (ROC AUC = 0.78, p = 0.73 vs cluster AUC; sensitivity/specificity = 59%/89%) and PET scar size analysis (ROC AUC = 0.73, p = 1.0 vs cluster AUC; sensitivity/specificity = 76%/67%). Conclusions: A SPECT RNA cluster analysis algorithm was developed for the prediction of CRT outcome. Cluster analysis results produced results equivalent to those obtained from Fourier and scar analysis.
Brief Report: Clustered Forward Chaining with Embedded Mastery Probes to Teach Recipe Following

ERIC Educational Resources Information Center

Chazin, Kate T.; Bartelmay, Danielle N.; Lambert, Joseph M.; Houchins-Juárez, Nealetta J.

2017-01-01

This study evaluated the effectiveness of a clustered forward chaining (CFC) procedure to teach a 23-year-old male with autism to follow written recipes. CFC incorporates elements of forward chaining (FC) and total task chaining (TTC) by teaching a small number of steps (i.e., units) using TTC, introducing new units sequentially (akin to FC), and…
Development of the ion source for cluster implantation

NASA Astrophysics Data System (ADS)

Kulevoy, T. V.; Seleznev, D. N.; Kozlov, A. V.; Kuibeda, R. P.; Kropachev, G. N.; Alexeyenko, O. V.; Dugin, S. N.; Oks, E. M.; Gushenets, V. I.; Hershcovitch, A.; Jonson, B.; Poole, H. J.

2014-02-01

Development of a Bernas ion source to meet the needs of ion implanters operating at hundreds of electron-volts for shallow junction production is in progress at the Institute for Theoretical and Experimental Physics. The ion source provides a high-intensity ion beam of boron clusters in a self-cleaning operation mode. Recent progress in ion source operation is presented. The mechanism of the self-cleaning procedure is described.

Noise/spike detection in phonocardiogram signal as a cyclic random process with non-stationary period interval.

PubMed

Naseri, H; Homaeinezhad, M R; Pourkhajeh, H

2013-09-01

The major aim of this study is to describe a unified procedure for detecting noisy segments and spikes in transduced signals with a cyclic but non-stationary periodic nature. According to this procedure, the cycles of the signal (onset and offset locations) are detected. Then, the cycles are clustered into a finite number of groups based on appropriate geometrical- and frequency-based time series. Next, the median template of each time series of each cluster is calculated. Afterwards, a correlation-based technique is devised for making a comparison between a test cycle feature and the associated time series of each cluster. Finally, by applying a suitably chosen threshold to the calculated correlation values, a segment is labelled as either clean or noisy. A key merit of the procedure is that it provides decision support for choosing between orthogonal-expansion-based filtering and removal of noisy segments. In this paper, the application of the proposed method is comprehensively described by applying it to phonocardiogram (PCG) signals for finding noisy cycles. The database consists of 126 records from several patients of a domestic research station, acquired by a 3M Littmann® 3200 electronic stethoscope at a 4 kHz sampling frequency. By running the noisy segment detection algorithm on this database, a sensitivity of Se = 91.41% and a positive predictive value of PPV = 92.86% were obtained based on physicians' assessments. Copyright © 2013 Elsevier Ltd. All rights reserved.
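The template-and-threshold step of the procedure above can be sketched as follows. This is a simplified illustration under several assumptions that are not stated in the abstract: cycles have already been detected, resampled to a common length, and assigned cluster labels; a single time series per cycle is used; similarity is the Pearson correlation; and the threshold value of 0.8 is arbitrary. The function name is hypothetical.

```python
import numpy as np


def flag_noisy_cycles(cycles, labels, threshold=0.8):
    """Flag cycles that correlate poorly with the median template of their cluster.

    cycles: array-like of shape (n_cycles, n_samples), equal-length cycles.
    labels: array-like of shape (n_cycles,), cluster label of each cycle.
    Returns a boolean array where True marks a cycle judged noisy.
    """
    cycles = np.asarray(cycles, dtype=float)
    labels = np.asarray(labels)
    noisy = np.zeros(len(cycles), dtype=bool)
    for label in np.unique(labels):
        members = np.flatnonzero(labels == label)
        # Median template: sample-wise median over all cycles in the cluster.
        template = np.median(cycles[members], axis=0)
        for i in members:
            r = np.corrcoef(cycles[i], template)[0, 1]
            # A NaN correlation (e.g. a flat cycle) is also treated as noisy.
            noisy[i] = not (r >= threshold)
    return noisy
```

The median is used for the template because it is less affected by a few corrupted cycles than the mean, which is consistent with the abstract's choice of a median template.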
Shadow detection and removal in RGB VHR images for land use unsupervised classification

NASA Astrophysics Data System (ADS)

Movia, A.; Beinat, A.; Crosilla, F.

2016-09-01

Nowadays, high resolution aerial images are widely available thanks to the diffusion of advanced technologies such as UAVs (Unmanned Aerial Vehicles) and new satellite missions. Although these developments offer new opportunities for accurate land use analysis and change detection, cloud and terrain shadows actually limit benefits and possibilities of modern sensors. Focusing on the problem of shadow detection and removal in VHR color images, the paper proposes new solutions and analyses how they can enhance common unsupervised classification procedures for identifying land use classes related to the CO2 absorption. To this aim, an improved fully automatic procedure has been developed for detecting image shadows using exclusively RGB color information, and avoiding user interaction. Results show a significant accuracy enhancement with respect to similar methods using RGB based indexes. Furthermore, novel solutions derived from Procrustes analysis have been applied to remove shadows and restore brightness in the images. In particular, two methods implementing the so called "anisotropic Procrustes" and the "not-centered oblique Procrustes" algorithms have been developed and compared with the linear correlation correction method based on the Cholesky decomposition. To assess how shadow removal can enhance unsupervised classifications, results obtained with classical methods such as k-means, maximum likelihood, and self-organizing maps, have been compared to each other and with a supervised clustering procedure.
Mapping extragalactic dark matter annihilation with galaxy surveys: A systematic study of stacked group searches

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lisanti, Mariangela; Mishra-Sharma, Siddharth; Rodd, Nicholas L.

Dark matter in the halos surrounding galaxy groups and clusters can annihilate to high-energy photons. Recent advancements in the construction of galaxy group catalogs provide many thousands of potential extragalactic targets for dark matter. In this paper, we outline a procedure to infer the dark matter signal associated with a given galaxy group. Applying this procedure to a catalog of sources, one can create a full-sky map of the brightest extragalactic dark matter targets in the nearby Universe (z≲0.03), supplementing sources of dark matter annihilation from within the local group. As with searches for dark matter in dwarf galaxies, these extragalactic targets can be stacked together to enhance the signals associated with dark matter. We validate this procedure on mock Fermi gamma-ray data sets using a galaxy catalog constructed from the DarkSky N-body cosmological simulation and demonstrate that the limits are robust, at O(1) levels, to systematic uncertainties on halo mass and concentration. We also quantify other sources of systematic uncertainty arising from the analysis and modeling assumptions. Lastly, our results suggest that a stacking analysis using galaxy group catalogs provides a powerful opportunity to discover extragalactic dark matter and complements existing studies of Milky Way dwarf galaxies.
Mapping extragalactic dark matter annihilation with galaxy surveys: A systematic study of stacked group searches

NASA Astrophysics Data System (ADS)

Lisanti, Mariangela; Mishra-Sharma, Siddharth; Rodd, Nicholas L.; Safdi, Benjamin R.; Wechsler, Risa H.

2018-03-01

Dark matter in the halos surrounding galaxy groups and clusters can annihilate to high-energy photons. Recent advancements in the construction of galaxy group catalogs provide many thousands of potential extragalactic targets for dark matter. In this paper, we outline a procedure to infer the dark matter signal associated with a given galaxy group. Applying this procedure to a catalog of sources, one can create a full-sky map of the brightest extragalactic dark matter targets in the nearby Universe (z≲0.03), supplementing sources of dark matter annihilation from within the local group. As with searches for dark matter in dwarf galaxies, these extragalactic targets can be stacked together to enhance the signals associated with dark matter. We validate this procedure on mock Fermi gamma-ray data sets using a galaxy catalog constructed from the DarkSky N-body cosmological simulation and demonstrate that the limits are robust, at O(1) levels, to systematic uncertainties on halo mass and concentration. We also quantify other sources of systematic uncertainty arising from the analysis and modeling assumptions. Our results suggest that a stacking analysis using galaxy group catalogs provides a powerful opportunity to discover extragalactic dark matter and complements existing studies of Milky Way dwarf galaxies.
Mapping extragalactic dark matter annihilation with galaxy surveys: A systematic study of stacked group searches

DOE PAGES

Lisanti, Mariangela; Mishra-Sharma, Siddharth; Rodd, Nicholas L.; ...

2018-03-09

Dark matter in the halos surrounding galaxy groups and clusters can annihilate to high-energy photons. Recent advancements in the construction of galaxy group catalogs provide many thousands of potential extragalactic targets for dark matter. In this paper, we outline a procedure to infer the dark matter signal associated with a given galaxy group. Applying this procedure to a catalog of sources, one can create a full-sky map of the brightest extragalactic dark matter targets in the nearby Universe (z≲0.03), supplementing sources of dark matter annihilation from within the local group. As with searches for dark matter in dwarf galaxies, these extragalactic targets can be stacked together to enhance the signals associated with dark matter. We validate this procedure on mock Fermi gamma-ray data sets using a galaxy catalog constructed from the DarkSky N-body cosmological simulation and demonstrate that the limits are robust, at O(1) levels, to systematic uncertainties on halo mass and concentration. We also quantify other sources of systematic uncertainty arising from the analysis and modeling assumptions. Lastly, our results suggest that a stacking analysis using galaxy group catalogs provides a powerful opportunity to discover extragalactic dark matter and complements existing studies of Milky Way dwarf galaxies.
Near real-time space-time cluster analysis for detection of enteric disease outbreaks in a community setting.

PubMed

Glatman-Freedman, Aharona; Kaufman, Zalman; Kopel, Eran; Bassal, Ravit; Taran, Diana; Valinsky, Lea; Agmon, Vered; Shpriz, Manor; Cohen, Daniel; Anis, Emilia; Shohat, Tamy

2016-08-01

To enhance timely surveillance of bacterial enteric pathogens, space-time cluster analysis was introduced in Israel in May 2013. Stool isolation data of Salmonella, Shigella, and Campylobacter from patients of a large Health Maintenance Organization were analyzed weekly by ArcGIS and SaTScan, and cluster results were sent promptly to local departments of health (LDOHs). During eighteen months, we identified 52 Shigella sonnei clusters, two Salmonella clusters, and no Campylobacter clusters. S. sonnei clusters lasted from one to 33 days and included three to 30 individuals. Thirty-one (60%) of the S. sonnei clusters were known to LDOHs prior to cluster analysis. Clusters not previously known by the LDOHs prompted epidemiologic investigations. In 31 of the 37 (84%) confirmed clusters, educational institutes (nursery schools, kindergartens, and a primary school) were involved. Cluster analysis demonstrated capability to complement enteric disease surveillance. Scaling up the system can further enhance timely detection and control of outbreaks. Copyright © 2016 The British Infection Association. Published by Elsevier Ltd. All rights reserved.
An effective fuzzy kernel clustering analysis approach for gene expression data.

PubMed

Sun, Lin; Xu, Jiucheng; Yin, Jiaojiao

2015-01-01

Fuzzy clustering is an important tool for analyzing microarray data. A major problem in applying fuzzy clustering method to microarray gene expression data is the choice of parameters with cluster number and centers. This paper proposes a new approach to fuzzy kernel clustering analysis (FKCA) that identifies desired cluster number and obtains more steady results for gene expression data. First of all, to optimize characteristic differences and estimate optimal cluster number, Gaussian kernel function is introduced to improve spectrum analysis method (SAM). By combining subtractive clustering with max-min distance mean, maximum distance method (MDM) is proposed to determine cluster centers. Then, the corresponding steps of improved SAM (ISAM) and MDM are given respectively, whose superiority and stability are illustrated through performing experimental comparisons on gene expression data. Finally, by introducing ISAM and MDM into FKCA, an effective improved FKCA algorithm is proposed. Experimental results from public gene expression data and UCI database show that the proposed algorithms are feasible for cluster analysis, and the clustering accuracy is higher than the other related clustering algorithms.
Importance of Viral Sequence Length and Number of Variable and Informative Sites in Analysis of HIV Clustering.

PubMed

Novitsky, Vlad; Moyo, Sikhulile; Lei, Quanhong; DeGruttola, Victor; Essex, M

2015-05-01

To improve the methodology of HIV cluster analysis, we addressed how analysis of HIV clustering is associated with parameters that can affect the outcome of viral clustering. The extent of HIV clustering and tree certainty was compared between 401 HIV-1C near full-length genome sequences and subgenomic regions retrieved from the LANL HIV Database. Sliding window analysis was based on 99 windows of 1,000 bp and 45 windows of 2,000 bp. Potential associations between the extent of HIV clustering and sequence length and the number of variable and informative sites were evaluated. The near full-length genome HIV sequences showed the highest extent of HIV clustering and the highest tree certainty. At the bootstrap threshold of 0.80 in maximum likelihood (ML) analysis, 58.9% of near full-length HIV-1C sequences but only 15.5% of partial pol sequences (ViroSeq) were found in clusters. Among HIV-1 structural genes, pol showed the highest extent of clustering (38.9% at a bootstrap threshold of 0.80), although it was significantly lower than in the near full-length genome sequences. The extent of HIV clustering was significantly higher for sliding windows of 2,000 bp than 1,000 bp. We found a strong association between the sequence length and proportion of HIV sequences in clusters, and a moderate association between the number of variable and informative sites and the proportion of HIV sequences in clusters. In HIV cluster analysis, the extent of detectable HIV clustering is directly associated with the length of viral sequences used, as well as the number of variable and informative sites. Near full-length genome sequences could provide the most informative HIV cluster analysis. Selected subgenomic regions with a high extent of HIV clustering and high tree certainty could also be considered as a second choice.
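The sliding window analysis and the counts of variable and informative sites can be illustrated with a minimal sketch. The step size between windows is not given in the abstract, so `step` is left as a free parameter, and both function names are hypothetical. The sketch uses the usual definitions: a site is variable if more than one nucleotide is observed in its alignment column, and parsimony-informative if at least two nucleotides each occur in at least two sequences.

```python
from collections import Counter


def sliding_windows(seq_len, window, step):
    """Return (start, end) pairs of half-open windows covering the sequence."""
    return [(start, start + window)
            for start in range(0, seq_len - window + 1, step)]


def count_variable_and_informative(alignment):
    """Count variable and parsimony-informative columns in an alignment.

    alignment: list of equal-length strings; characters other than
    A, C, G, T (such as gaps or ambiguity codes) are ignored per column.
    """
    variable = 0
    informative = 0
    for column in zip(*alignment):
        counts = Counter(base for base in column if base in "ACGT")
        if len(counts) > 1:
            variable += 1
            # Informative: at least two states each seen in two or more sequences.
            if sum(1 for c in counts.values() if c >= 2) >= 2:
                informative += 1
    return variable, informative
```

To apply the site counts per window, each sequence in the alignment can be sliced with the (start, end) pairs returned by `sliding_windows` before calling `count_variable_and_informative`.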
Importance of Viral Sequence Length and Number of Variable and Informative Sites in Analysis of HIV Clustering

PubMed Central

Novitsky, Vlad; Moyo, Sikhulile; Lei, Quanhong; DeGruttola, Victor

2015-01-01

To improve the methodology of HIV cluster analysis, we addressed how analysis of HIV clustering is associated with parameters that can affect the outcome of viral clustering. The extent of HIV clustering and tree certainty was compared between 401 HIV-1C near full-length genome sequences and subgenomic regions retrieved from the LANL HIV Database. Sliding window analysis was based on 99 windows of 1,000 bp and 45 windows of 2,000 bp. Potential associations between the extent of HIV clustering and sequence length and the number of variable and informative sites were evaluated. The near full-length genome HIV sequences showed the highest extent of HIV clustering and the highest tree certainty. At the bootstrap threshold of 0.80 in maximum likelihood (ML) analysis, 58.9% of near full-length HIV-1C sequences but only 15.5% of partial pol sequences (ViroSeq) were found in clusters. Among HIV-1 structural genes, pol showed the highest extent of clustering (38.9% at a bootstrap threshold of 0.80), although it was significantly lower than in the near full-length genome sequences. The extent of HIV clustering was significantly higher for sliding windows of 2,000 bp than 1,000 bp. We found a strong association between the sequence length and proportion of HIV sequences in clusters, and a moderate association between the number of variable and informative sites and the proportion of HIV sequences in clusters. In HIV cluster analysis, the extent of detectable HIV clustering is directly associated with the length of viral sequences used, as well as the number of variable and informative sites. Near full-length genome sequences could provide the most informative HIV cluster analysis. Selected subgenomic regions with a high extent of HIV clustering and high tree certainty could also be considered as a second choice. PMID:25560745
Single exposure three-dimensional imaging of dusty plasma clusters.

PubMed

Hartmann, Peter; Donkó, István; Donkó, Zoltán

2013-02-01

We have worked out the details of a single camera, single exposure method to perform three-dimensional imaging of a finite particle cluster. The procedure is based on the plenoptic imaging principle and utilizes a commercial Lytro light field still camera. We demonstrate the capabilities of our technique on a single layer particle cluster in a dusty plasma, where the camera is aligned and inclined at a small angle to the particle layer. The reconstruction of the third coordinate (depth) is found to be accurate and even shadowing particles can be identified.
Identifying protein domains by global analysis of soluble fragment data.

PubMed

Bulloch, Esther M M; Kingston, Richard L

2014-11-15

The production and analysis of individual structural domains is a common strategy for studying large or complex proteins, which may be experimentally intractable in their full-length form. However, identifying domain boundaries is challenging if there is little structural information concerning the protein target. One experimental procedure for mapping domains is to screen a library of random protein fragments for solubility, since truncation of a domain will typically expose hydrophobic groups, leading to poor fragment solubility. We have coupled fragment solubility screening with global data analysis to develop an effective method for identifying structural domains within a protein. A gene fragment library is generated using mechanical shearing, or by uracil doping of the gene and a uracil-specific enzymatic digest. A split green fluorescent protein (GFP) assay is used to screen the corresponding protein fragments for solubility when expressed in Escherichia coli. The soluble fragment data are then analyzed using two complementary approaches. Fragmentation "hotspots" indicate possible interdomain regions. Clustering algorithms are used to group related fragments, and concomitantly predict domain location. The effectiveness of this Domain Seeking procedure is demonstrated by application to the well-characterized human protein p85α. Copyright © 2014 Elsevier Inc. All rights reserved.
Reexamining cluster radioactivity in trans-lead nuclei with consideration of specific density distributions in daughter nuclei and clusters

NASA Astrophysics Data System (ADS)

Qian, Yibin; Ren, Zhongzhou; Ni, Dongdong

2016-08-01

We further investigate the cluster emission from heavy nuclei beyond the lead region in the framework of the preformed cluster model. The refined cluster-core potential is constructed by the double-folding integral of the density distributions of the daughter nucleus and the emitted cluster, where the radius or the diffuseness parameter in the Fermi density distribution formula is determined according to the available experimental data on the charge radii and the neutron skin thickness. The Schrödinger equation of the cluster-daughter relative motion is then solved with the outgoing Coulomb wave-function boundary conditions to obtain the decay width. It is found that the present decay width of cluster emitters is clearly enhanced as compared to that in the previous case, which involved the fixed parametrization for the density distributions of daughter nuclei and clusters. Throughout the procedure, the nuclear deformation of clusters is also introduced into the calculations, and the degree of its influence on the final decay half-life is checked to some extent. Moreover, the effect of the bubble density distribution of clusters on the final decay width is carefully discussed by using the central depressed distribution.
Possible world based consistency learning model for clustering and classifying uncertain data.

PubMed

Liu, Han; Zhang, Xianchao; Zhang, Xiaotong

2018-06-01

Possible world has been shown to be effective for handling various types of data uncertainty in uncertain data management. However, few uncertain data clustering and classification algorithms have been proposed based on possible world. Moreover, existing possible world based algorithms suffer from the following issues: (1) they deal with each possible world independently and ignore the consistency principle across different possible worlds; (2) they require an extra post-processing procedure to obtain the final result, so their effectiveness depends heavily on the post-processing method and their efficiency suffers. In this paper, we propose a novel possible world based consistency learning model for uncertain data, which can be extended both for clustering and classifying uncertain data. This model utilizes the consistency principle to learn a consensus affinity matrix for uncertain data, which can make full use of the information across different possible worlds and then improve the clustering and classification performance. Meanwhile, this model imposes a new rank constraint on the Laplacian matrix of the consensus affinity matrix, thereby ensuring that the number of connected components in the consensus affinity matrix is exactly equal to the number of classes. This also means that the clustering and classification results can be directly obtained without any post-processing procedure. Furthermore, we derive efficient optimization methods to solve the proposed model for the clustering and classification tasks, respectively. Experimental results on real benchmark datasets and real world uncertain datasets show that the proposed model outperforms the state-of-the-art uncertain data clustering and classification algorithms in effectiveness and performs competitively in efficiency. Copyright © 2018 Elsevier Ltd. All rights reserved.
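The link between the Laplacian rank constraint and post-processing-free clustering rests on a standard graph fact: a graph with n nodes and c connected components has a Laplacian of rank n - c. Once the learned affinity matrix has exactly c components, cluster labels can be read off by a graph traversal. The following is a minimal sketch of that read-off step only, not of the authors' optimization; the function name `connected_components`, the tolerance `tol`, and the toy matrix `W` are illustrative assumptions.

```python
from collections import deque


def connected_components(affinity, tol=1e-12):
    """Label each point with the index of its connected component.

    `affinity` is a square list-of-lists; entries above `tol` count as edges.
    """
    n = len(affinity)
    labels = [-1] * n          # -1 means "not visited yet"
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        queue = deque([start])
        while queue:           # breadth-first search over the affinity graph
            i = queue.popleft()
            for j in range(n):
                linked = affinity[i][j] > tol or affinity[j][i] > tol
                if labels[j] == -1 and linked:
                    labels[j] = current
                    queue.append(j)
        current += 1
    return labels


# Hypothetical consensus affinity with two blocks and no cross-block affinity.
W = [
    [0.0, 0.9, 0.8, 0.0, 0.0],
    [0.9, 0.0, 0.7, 0.0, 0.0],
    [0.8, 0.7, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.6],
    [0.0, 0.0, 0.0, 0.6, 0.0],
]
labels = connected_components(W)
print(labels)  # [0, 0, 0, 1, 1]
```

Because the components are exact, no k-means or spectral rounding step is needed after the affinity matrix is learned, which is the property the abstract emphasizes.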
Mining the modular structure of protein interaction networks.

PubMed

Berenstein, Ariel José; Piñero, Janet; Furlong, Laura Inés; Chernomoretz, Ariel

2015-01-01

Cluster-based descriptions of biological networks have received much attention in recent years fostered by accumulated evidence of the existence of meaningful correlations between topological network clusters and biological functional modules. Several well-performing clustering algorithms exist to infer topological network partitions. However, due to respective technical idiosyncrasies they might produce dissimilar modular decompositions of a given network. In this contribution, we aimed to analyze how alternative modular descriptions could condition the outcome of follow-up network biology analysis. We considered a human protein interaction network and two paradigmatic cluster recognition algorithms, namely: the Clauset-Newman-Moore and the infomap procedures. We analyzed to what extent both methodologies yielded different results in terms of granularity and biological congruency. In addition, taking into account Guimera's cartographic role characterization of network nodes, we explored how the adoption of a given clustering methodology impinged on the ability to highlight relevant network meso-scale connectivity patterns. As a case study we considered a set of aging related proteins and showed that only the high-resolution modular description provided by infomap, could unveil statistically significant associations between them and inter/intra modular cartographic features. Besides reporting novel biological insights that could be gained from the discovered associations, our contribution warns against possible technical concerns that might affect the tools used to mine for interaction patterns in network biology studies. In particular our results suggested that sub-optimal partitions from the strict point of view of their modularity levels might still be worth being analyzed when meso-scale features were to be explored in connection with external source of biological knowledge.
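The closing remark about partitions that are sub-optimal in modularity can be made concrete by computing the Newman-Girvan modularity Q of alternative partitions of the same network. The sketch below assumes an undirected, unweighted edge list; the function name `modularity` and the toy graph are illustrative and do not come from the study.

```python
def modularity(edges, membership):
    """Newman-Girvan modularity: sum over modules of L_c/m - (d_c / 2m)^2.

    edges: list of (u, v) pairs for an undirected, unweighted graph.
    membership: dict mapping each node to its module label.
    """
    m = len(edges)
    intra = {}   # number of edges with both ends inside module c (L_c)
    degree = {}  # total degree of the nodes in module c (d_c)
    for u, v in edges:
        cu, cv = membership[u], membership[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            intra[cu] = intra.get(cu, 0) + 1
    return sum(intra.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree.items())


# Toy network: two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_modules = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_module = {node: "all" for node in range(6)}

q_two = modularity(edges, two_modules)   # 5/14, about 0.357
q_one = modularity(edges, one_module)    # 0.0
print(q_two, q_one)
```

A finer partition can score lower on Q than a coarser one and still, as the abstract argues, expose meso-scale features worth analyzing.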
Effects of Group Size and Lack of Sphericity on the Recovery of Clusters in K-Means Cluster Analysis

ERIC Educational Resources Information Center

de Craen, Saskia; Commandeur, Jacques J. F.; Frank, Laurence E.; Heiser, Willem J.

2006-01-01

K-means cluster analysis is known for its tendency to produce spherical and equally sized clusters. To assess the magnitude of these effects, a simulation study was conducted, in which populations were created with varying departures from sphericity and group sizes. An analysis of the recovery of clusters in the samples taken from these…
Changing cluster composition in cluster randomised controlled trials: design and analysis considerations

PubMed Central

2014-01-01

Background There are many methodological challenges in the conduct and analysis of cluster randomised controlled trials, but one that has received little attention is that of post-randomisation changes to cluster composition. To illustrate this, we focus on the issue of cluster merging, considering the impact on the design, analysis and interpretation of trial outcomes. Methods We explored the effects of merging clusters on study power using standard methods of power calculation. We assessed the potential impacts on study findings of both homogeneous cluster merges (involving clusters randomised to the same arm of a trial) and heterogeneous merges (involving clusters randomised to different arms of a trial) by simulation. To determine the impact on bias and precision of treatment effect estimates, we applied standard methods of analysis to different populations under analysis. Results Cluster merging produced a systematic reduction in study power. This effect depended on the number of merges and was most pronounced when variability in cluster size was at its greatest. Simulations demonstrate that the impact on analysis was minimal when cluster merges were homogeneous, with impact on study power being balanced by a change in observed intracluster correlation coefficient (ICC). We found a decrease in study power when cluster merges were heterogeneous, and the estimate of treatment effect was attenuated. Conclusions Examples of cluster merges found in previously published reports of cluster randomised trials were typically homogeneous rather than heterogeneous. Simulations demonstrated that trial findings in such cases would be unbiased. However, simulations also showed that any heterogeneous cluster merges would introduce bias that would be hard to quantify, as well as having negative impacts on the precision of estimates obtained. Further methodological development is warranted to better determine how to analyse such trials appropriately. 
Interim recommendations include avoidance of cluster merges where possible, discontinuation of clusters following heterogeneous merges, allowance for potential loss of clusters and additional variability in cluster size in the original sample size calculation, and use of appropriate ICC estimates that reflect cluster size. PMID:24884591
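The loss of power from merging can be illustrated with one commonly used approximation for the design effect under unequal cluster sizes, DE = 1 + ((cv^2 + 1) * m_bar - 1) * ICC, where m_bar is the mean cluster size and cv its coefficient of variation. This is a sketch under stated assumptions: the abstract does not say which formula the authors used, the ICC is held fixed across the merge (the abstract notes that the observed ICC may change), and the cluster sizes are invented.

```python
from statistics import mean, pstdev


def design_effect(cluster_sizes, icc):
    """Design effect allowing for variable cluster size (approximation)."""
    m_bar = mean(cluster_sizes)
    cv = pstdev(cluster_sizes) / m_bar   # coefficient of variation of size
    return 1 + ((cv ** 2 + 1) * m_bar - 1) * icc


def effective_sample_size(cluster_sizes, icc):
    """Number of independent observations the clustered sample is worth."""
    return sum(cluster_sizes) / design_effect(cluster_sizes, icc)


icc = 0.05
before = [20] * 10            # ten clusters of 20 participants each
after = [20] * 8 + [40]       # two of those clusters merged into one of 40

ess_before = effective_sample_size(before, icc)   # 200 / 1.95, about 102.6
ess_after = effective_sample_size(after, icc)     # 200 / 2.15, about 93.0
print(round(ess_before, 1), round(ess_after, 1))
```

The same 200 participants carry less information after the merge, because there are fewer clusters and their sizes vary more. This matches the direction of the reported effect and the recommendation to allow for extra variability in cluster size when calculating the original sample size.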
A generalized analysis of hydrophobic and loop clusters within globular protein sequences

PubMed Central

Eudes, Richard; Le Tuan, Khanh; Delettré, Jean; Mornon, Jean-Paul; Callebaut, Isabelle

2007-01-01

Background Hydrophobic Cluster Analysis (HCA) is an efficient way to compare highly divergent sequences through the implicit secondary structure information directly derived from hydrophobic clusters. However, its efficiency and application are currently limited by the need of user expertise. In order to help the analysis of HCA plots, we report here the structural preferences of hydrophobic cluster species, which are frequently encountered in globular domains of proteins. These species are characterized only by their hydrophobic/non-hydrophobic dichotomy. This analysis has been extended to loop-forming clusters, using an appropriate loop alphabet. Results The structural behavior of hydrophobic cluster species, which are typical of protein globular domains, was investigated within banks of experimental structures, considered at different levels of sequence redundancy. The 294 more frequent hydrophobic cluster species were analyzed with regard to their association with the different secondary structures (frequencies of association with secondary structures and secondary structure propensities). Hydrophobic cluster species are predominantly associated with regular secondary structures, and a large part (60 %) reveals preferences for α-helices or β-strands. Moreover, the analysis of the hydrophobic cluster amino acid composition generally allows for finer prediction of the regular secondary structure associated with the considered cluster within a cluster species. We also investigated the behavior of loop forming clusters, using a "PGDNS" alphabet. These loop clusters do not overlap with hydrophobic clusters and are highly associated with coils. Finally, the structural information contained in the hydrophobic structural words, as deduced from experimental structures, was compared to the PSI-PRED predictions, revealing that β-strands and especially α-helices are generally over-predicted within the limits of typical β and α hydrophobic clusters. 
Conclusion The dictionary of hydrophobic clusters described here can help the HCA user to interpret and compare the HCA plots of globular protein sequences, as well as provides an original fundamental insight into the structural bricks of protein folds. Moreover, the novel loop cluster analysis brings additional information for secondary structure prediction on the whole sequence through a generalized cluster analysis (GCA), and not only on regular secondary structures. Such information lays the foundations for developing a new and original tool for secondary structure prediction. PMID:17210072
Cluster-based analysis improves predictive validity of spike-triggered receptive field estimates

PubMed Central

Malone, Brian J.

2017-01-01

Spectrotemporal receptive field (STRF) characterization is a central goal of auditory physiology. STRFs are often approximated by the spike-triggered average (STA), which reflects the average stimulus preceding a spike. In many cases, the raw STA is subjected to a threshold defined by gain values expected by chance. However, such correction methods have not been universally adopted, and the consequences of specific gain-thresholding approaches have not been investigated systematically. Here, we evaluate two classes of statistical correction techniques, using the resulting STRF estimates to predict responses to a novel validation stimulus. The first, more traditional technique eliminated STRF pixels (time-frequency bins) with gain values expected by chance. This correction method yielded significant increases in prediction accuracy, including when the threshold setting was optimized for each unit. The second technique was a two-step thresholding procedure wherein clusters of contiguous pixels surviving an initial gain threshold were then subjected to a cluster mass threshold based on summed pixel values. This approach significantly improved upon even the best gain-thresholding techniques. Additional analyses suggested that allowing threshold settings to vary independently for excitatory and inhibitory subfields of the STRF resulted in only marginal additional gains, at best. In summary, augmenting reverse correlation techniques with principled statistical correction choices increased prediction accuracy by over 80% for multi-unit STRFs and by over 40% for single-unit STRFs, furthering the interpretational relevance of the recovered spectrotemporal filters for auditory systems analysis. PMID:28877194
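The two-step procedure described above can be sketched as follows: pixels are first thresholded on absolute gain, contiguous survivors are grouped into clusters, and each cluster is kept only if its summed absolute gain (its mass) reaches a second threshold. The 4-neighbour connectivity, the grouping of pixels by sign, the threshold values, and the toy STRF are assumptions made for illustration; the abstract does not specify them.

```python
def cluster_mass_threshold(strf, gain_thresh, mass_thresh):
    """Zero out STRF pixels that fail a gain threshold or lie in a
    contiguous same-sign cluster whose summed |gain| is below mass_thresh."""
    rows, cols = len(strf), len(strf[0])
    keep = [[abs(v) >= gain_thresh for v in row] for row in strf]
    seen = [[False] * cols for _ in range(rows)]
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if not keep[r][c] or seen[r][c]:
                continue
            positive = strf[r][c] > 0
            stack, cluster = [(r, c)], []
            seen[r][c] = True
            while stack:  # flood fill over 4-connected, same-sign survivors
                y, x = stack.pop()
                cluster.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and keep[ny][nx] and not seen[ny][nx]
                            and (strf[ny][nx] > 0) == positive):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            mass = sum(abs(strf[y][x]) for y, x in cluster)
            if mass >= mass_thresh:
                for y, x in cluster:
                    out[y][x] = strf[y][x]
    return out


# Hypothetical 4x4 STRF (rows: frequency bins, columns: time bins).
strf = [
    [0.1, 2.0, 2.5, 0.0],
    [0.0, 1.8, 0.2, 0.0],
    [0.0, 0.0, 0.0, 1.6],
    [-1.7, -2.1, 0.0, 0.0],
]
cleaned = cluster_mass_threshold(strf, gain_thresh=1.5, mass_thresh=3.0)
for row in cleaned:
    print(row)
```

In this toy input the isolated pixel with gain 1.6 passes the gain threshold but is removed by the mass threshold, while the three-pixel excitatory cluster and the two-pixel inhibitory cluster survive.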
Canonical PSO Based K-Means Clustering Approach for Real Datasets.

PubMed

Dey, Lopamudra; Chakraborty, Sanjay

2014-01-01

The significance and application of clustering are spread over various fields. Clustering is an unsupervised process in data mining, which is why the proper evaluation of the results and the measurement of the compactness and separability of the clusters are important issues. The procedure of evaluating the results of a clustering algorithm is known as cluster validity measurement. Different types of indices are used to solve different types of problems, and index selection depends on the kind of available data. This paper first proposes a Canonical PSO based K-means clustering algorithm, then analyses some important clustering indices (intercluster, intracluster), and then evaluates the effects of those indices on a real-time air pollution database and on wholesale customer, wine, and vehicle datasets using typical K-means, Canonical PSO based K-means, simple PSO based K-means, DBSCAN, and Hierarchical clustering algorithms. This paper also describes the nature of the clusters and finally compares the performances of these clustering algorithms according to the validity assessment. It also identifies which of these algorithms is most desirable for producing compact clusters on these particular real-life datasets. It deals with the behaviour of these clustering algorithms with respect to validation indices and presents the results of their evaluation in mathematical and graphical forms.
Characterizing Heterogeneity within Head and Neck Lesions Using Cluster Analysis of Multi-Parametric MRI Data.

PubMed

Borri, Marco; Schmidt, Maria A; Powell, Ceri; Koh, Dow-Mu; Riddell, Angela M; Partridge, Mike; Bhide, Shreerang A; Nutting, Christopher M; Harrington, Kevin J; Newbold, Katie L; Leach, Martin O

2015-01-01

To describe a methodology, based on cluster analysis, to partition multi-parametric functional imaging data into groups (or clusters) of similar functional characteristics, with the aim of characterizing functional heterogeneity within head and neck tumour volumes. To evaluate the performance of the proposed approach on a set of longitudinal MRI data, analysing the evolution of the obtained sub-sets with treatment. The cluster analysis workflow was applied to a combination of dynamic contrast-enhanced and diffusion-weighted imaging MRI data from a cohort of squamous cell carcinoma of the head and neck patients. Cumulative distributions of voxels, containing pre and post-treatment data and including both primary tumours and lymph nodes, were partitioned into k clusters (k = 2, 3 or 4). Principal component analysis and cluster validation were employed to investigate data composition and to independently determine the optimal number of clusters. The evolution of the resulting sub-regions with induction chemotherapy treatment was assessed relative to the number of clusters. The clustering algorithm was able to separate clusters which significantly reduced in voxel number following induction chemotherapy from clusters with a non-significant reduction. Partitioning with the optimal number of clusters (k = 4), determined with cluster validation, produced the best separation between reducing and non-reducing clusters. The proposed methodology was able to identify tumour sub-regions with distinct functional properties, independently separating clusters which were affected differently by treatment. This work demonstrates that unsupervised cluster analysis, with no prior knowledge of the data, can be employed to provide a multi-parametric characterization of functional heterogeneity within tumour volumes.

Thermodynamic properties of small aggregates of rare-gas atoms

NASA Technical Reports Server (NTRS)

Etters, R. D.; Kaelberer, J.

1975-01-01

The present work reports on the equilibrium thermodynamic properties of small clusters of xenon, krypton, and argon atoms, determined from a biased random-walk Monte Carlo procedure. Cluster sizes ranged from 3 to 13 atoms. Each cluster was found to have an abrupt liquid-gas phase transition at a temperature much less than for the bulk material. An abrupt solid-liquid transition is observed for thirteen- and eleven-particle clusters. For cluster sizes smaller than 11, a gradual transition from solid to liquid occurred over a fairly broad range of temperatures. Distribution of number of bond lengths as a function of bond length was calculated for several systems at various temperatures. The effects of box boundary conditions are discussed. Results show the importance of a correct description of boundary conditions. A surprising result is the slow rate at which system properties approach bulk behavior as cluster size is increased.
Evaluating Mixture Modeling for Clustering: Recommendations and Cautions

ERIC Educational Resources Information Center

Steinley, Douglas; Brusco, Michael J.

2011-01-01

This article provides a large-scale investigation into several of the properties of mixture-model clustering techniques (also referred to as latent class cluster analysis, latent profile analysis, model-based clustering, probabilistic clustering, Bayesian classification, unsupervised learning, and finite mixture models; see Vermunt & Magdison,…
Investigating Subtypes of Child Development: A Comparison of Cluster Analysis and Latent Class Cluster Analysis in Typology Creation

ERIC Educational Resources Information Center

DiStefano, Christine; Kamphaus, R. W.

2006-01-01

Two classification methods, latent class cluster analysis and cluster analysis, are used to identify groups of child behavioral adjustment underlying a sample of elementary school children aged 6 to 11 years. Behavioral rating information across 14 subscales was obtained from classroom teachers and used as input for analyses. Both the procedures…
Predicting Student Actions in a Procedural Training Environment

ERIC Educational Resources Information Center

Riofrio-Luzcando, Diego; Ramirez, Jaime; Berrocal-Lobo, Marta

2017-01-01

Data mining is known to have a potential for predicting user performance. However, there are few studies that explore its potential for predicting student behavior in a procedural training environment. This paper presents a collective student model, which is built from past student logs. These logs are first grouped into clusters. Then, an…
Segmentation by fusion of histogram-based k-means clusters in different color spaces.

PubMed

Mignotte, Max

2008-05-01

This paper presents a new, simple, and efficient segmentation approach, based on a fusion procedure which aims at combining several segmentation maps associated with simpler partition models in order to obtain a more reliable and accurate segmentation result. The different label fields to be fused in our application are given by the same simple (K-means based) clustering technique applied to an input image expressed in different color spaces. Our fusion strategy aims at combining these segmentation maps with a final clustering procedure that uses as input features the local histogram of the class labels, previously estimated and associated with each site for all these initial partitions. This fusion framework remains simple to implement, fast, general enough to be applied to various computer vision applications (e.g., motion detection and segmentation), and has been successfully applied on the Berkeley image database. The experiments reported in this paper illustrate the potential of this approach compared to the state-of-the-art segmentation methods recently proposed in the literature.
Quantum wavepacket ab initio molecular dynamics: an approach for computing dynamically averaged vibrational spectra including critical nuclear quantum effects.

PubMed

Sumner, Isaiah; Iyengar, Srinivasan S

2007-10-18

We have introduced a computational methodology to study vibrational spectroscopy in clusters inclusive of critical nuclear quantum effects. This approach is based on the recently developed quantum wavepacket ab initio molecular dynamics method that combines quantum wavepacket dynamics with ab initio molecular dynamics. The computational efficiency of the dynamical procedure is drastically improved (by several orders of magnitude) through the utilization of wavelet-based techniques combined with the previously introduced time-dependent deterministic sampling procedure measure to achieve stable, picosecond length, quantum-classical dynamics of electrons and nuclei in clusters. The dynamical information is employed to construct a novel cumulative flux/velocity correlation function, where the wavepacket flux from the quantized particle is combined with classical nuclear velocities to obtain the vibrational density of states. The approach is demonstrated by computing the vibrational density of states of [Cl-H-Cl]-, inclusive of critical quantum nuclear effects, and our results are in good agreement with experiment. A general hierarchical procedure is also provided, based on electronic structure harmonic frequencies, classical ab initio molecular dynamics, computation of nuclear quantum-mechanical eigenstates, and employing quantum wavepacket ab initio dynamics to understand vibrational spectroscopy in hydrogen-bonded clusters that display large degrees of anharmonicities.
A DNA fingerprinting procedure for ultra high-throughput genetic analysis of insects.

PubMed

Schlipalius, D I; Waldron, J; Carroll, B J; Collins, P J; Ebert, P R

2001-12-01

Existing procedures for the generation of polymorphic DNA markers are not optimal for insect studies in which the organisms are often tiny and background molecular information is often non-existent. We have used a new high throughput DNA marker generation protocol called randomly amplified DNA fingerprints (RAF) to analyse the genetic variability in three separate strains of the stored grain pest, Rhyzopertha dominica. This protocol is quick, robust and reliable even though it requires minimal sample preparation, minute amounts of DNA and no prior molecular analysis of the organism. Arbitrarily selected oligonucleotide primers routinely produced approximately 50 scoreable polymorphic DNA markers, between individuals of three independent field isolates of R. dominica. Multivariate cluster analysis using forty-nine arbitrarily selected polymorphisms generated from a single primer reliably separated individuals into three clades corresponding to their geographical origin. The resulting clades were quite distinct, with an average genetic difference of 37.5 +/- 6.0% between clades and of 21.0 +/- 7.1% between individuals within clades. As a prelude to future gene mapping efforts, we have also assessed the performance of RAF under conditions commonly used in gene mapping. In this analysis, fingerprints from pooled DNA samples accurately and reproducibly reflected RAF profiles obtained from individual DNA samples that had been combined to create the bulked samples.
Phenotypes Determined by Cluster Analysis in Moderate to Severe Bronchial Asthma.

PubMed

Youroukova, Vania M; Dimitrova, Denitsa G; Valerieva, Anna D; Lesichkova, Spaska S; Velikova, Tsvetelina V; Ivanova-Todorova, Ekaterina I; Tumangelova-Yuzeir, Kalina D

2017-06-01

Bronchial asthma is a heterogeneous disease that includes various subtypes. They may share similar clinical characteristics, but probably have different pathological mechanisms. To identify phenotypes using cluster analysis in moderate to severe bronchial asthma and to compare differences in clinical, physiological, immunological and inflammatory data between the clusters. Forty adult patients with moderate to severe bronchial asthma out of exacerbation were included. All underwent clinical assessment, anthropometric measurements, skin prick testing, standard spirometry and measurement of the fraction of exhaled nitric oxide. Blood eosinophil count, serum total IgE and periostin levels were determined. A two-step cluster approach, a hierarchical clustering method and k-means analysis were used for identification of the clusters. We have identified four clusters. Cluster 1 (n=14) - late-onset, non-atopic asthma with impaired lung function, Cluster 2 (n=13) - late-onset, atopic asthma, Cluster 3 (n=6) - late-onset, aspirin sensitivity, eosinophilic asthma, and Cluster 4 (n=7) - early-onset, atopic asthma. Our study is the first in Bulgaria in which cluster analysis is applied to asthmatic patients. We identified four clusters. The variables with the greatest power of differentiation in our study were: age of asthma onset, duration of disease, atopy, smoking, blood eosinophils, nonsteroidal anti-inflammatory drug hypersensitivity, baseline FEV1/FVC and symptom severity. Our results support the concept of heterogeneity of bronchial asthma and demonstrate that cluster analysis can be a useful tool for phenotyping of disease and a personalized approach to the treatment of patients.
Geomorphological activity at a rock glacier front detected with a 3D density-based clustering algorithm

NASA Astrophysics Data System (ADS)

Micheletti, Natan; Tonini, Marj; Lane, Stuart N.

2017-02-01

Acquisition of high density point clouds using terrestrial laser scanners (TLSs) has become commonplace in geomorphic science. The derived point clouds are often interpolated onto regular grids and the grids compared to detect change (i.e. erosion and deposition/advancement movements). This procedure is necessary for some applications (e.g. digital terrain analysis), but it inevitably leads to a certain loss of potentially valuable information contained within the point clouds. In the present study, an alternative methodology for geomorphological analysis and feature detection from point clouds is proposed. It rests on the use of the Density-Based Spatial Clustering of Applications with Noise (DBSCAN), applied to TLS data for a rock glacier front slope in the Swiss Alps. The proposed methods allowed the detection and isolation of movements directly from point clouds, so that the accuracy of the subsequent volume computations depends only on the actual registered distance between points. We demonstrated that these values are more conservative than volumes computed with the traditional DEM comparison. The results are illustrated for the summer of 2015, a season of enhanced geomorphic activity associated with exceptionally high temperatures.
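For readers unfamiliar with DBSCAN, the following is a minimal brute-force sketch of the algorithm on 3D points. It is not the authors' implementation: the parameter values `eps` and `min_pts`, the toy coordinates, and the interpretation of the input as points flagged as changed between two surveys are assumptions made for illustration.

```python
from math import dist


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    NOISE = -1
    labels = [None] * len(points)      # None means "not visited yet"

    def neighbors(i):
        # Brute-force eps-neighbourhood; includes the point itself.
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:       # not a core point (may become border)
            labels[i] = NOISE
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:     # noise reached from a core: border point
                labels[j] = cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:   # j is a core point: keep expanding
                queue.extend(nbrs)
        cluster += 1
    return labels


# Hypothetical changed points (x, y, z in metres): two compact patches of
# movement and one isolated point that should be treated as noise.
pts = [
    (0.0, 0.0, 0.0), (0.2, 0.0, 0.0), (0.0, 0.2, 0.0), (0.1, 0.1, 0.1),
    (5.0, 5.0, 5.0), (5.2, 5.0, 5.0), (5.0, 5.2, 5.0), (5.1, 5.1, 5.1),
    (10.0, 0.0, 3.0),
]
labels = dbscan(pts, eps=0.5, min_pts=3)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

The brute-force neighbour search is quadratic in the number of points; real TLS clouds would need a spatial index, which is omitted here to keep the sketch short.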
Decomposing the Apoptosis Pathway Into Biologically Interpretable Principal Components

PubMed Central

Wang, Min; Kornblau, Steven M; Coombes, Kevin R

2018-01-01

Principal component analysis (PCA) is one of the most common techniques in the analysis of biological data sets, but applying PCA raises 2 challenges. First, one must determine the number of significant principal components (PCs). Second, because each PC is a linear combination of genes, it rarely has a biological interpretation. Existing methods to determine the number of PCs are either subjective or computationally extensive. We review several methods and describe a new R package, PCDimension, that implements additional methods, the most important being an algorithm that extends and automates a graphical Bayesian method. Using simulations, we compared the methods. Our newly automated procedure is competitive with the best methods when considering both accuracy and speed and is the most accurate when the number of objects is small compared with the number of attributes. We applied the method to a proteomics data set from patients with acute myeloid leukemia. Proteins in the apoptosis pathway could be explained using 6 PCs. By clustering the proteins in PC space, we were able to replace the PCs by 6 “biological components,” 3 of which could be immediately interpreted from the current literature. We expect this approach combining PCA with clustering to be widely applicable. PMID:29881252
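To make the first challenge concrete, the sketch below applies the classical broken-stick rule, one simple heuristic for choosing the number of PCs. It is not the automated graphical Bayesian algorithm the abstract highlights, and it is written in Python for illustration rather than taken from the R package; the function name and the eigenvalues are hypothetical.

```python
def broken_stick_dimension(eigenvalues):
    """Count leading PCs whose share of variance exceeds the broken-stick
    expectation b_k = (1/p) * sum_{i=k}^{p} 1/i."""
    p = len(eigenvalues)
    total = sum(eigenvalues)
    n_keep = 0
    for k, lam in enumerate(sorted(eigenvalues, reverse=True), start=1):
        expected = sum(1.0 / i for i in range(k, p + 1)) / p
        if lam / total > expected:
            n_keep += 1
        else:
            break                      # stop at the first PC that falls short
    return n_keep


# Hypothetical eigenvalues of a covariance matrix with six attributes.
eigenvalues = [5.0, 2.5, 0.8, 0.4, 0.2, 0.1]
n_pcs = broken_stick_dimension(eigenvalues)
print(n_pcs)  # 2
```

Here the first two components explain about 56% and 28% of the variance, above the broken-stick expectations of about 41% and 24%, while the third (about 9%) falls below its expectation of about 16%.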
Cross-scale analysis of cluster correspondence using different operational neighborhoods

NASA Astrophysics Data System (ADS)

Lu, Yongmei; Thill, Jean-Claude

2008-09-01

Cluster correspondence analysis examines the spatial autocorrelation of multi-location events at the local scale. This paper argues that patterns of cluster correspondence are highly sensitive to the definition of operational neighborhoods that form the spatial units of analysis. A subset of multi-location events is examined for cluster correspondence if they are associated with the same operational neighborhood. This paper discusses the construction of operational neighborhoods for cluster correspondence analysis based on the spatial properties of the underlying zoning system and the scales at which the zones are aggregated into neighborhoods. Impacts of this construction on the degree of cluster correspondence are also analyzed. Empirical analyses of cluster correspondence between paired vehicle theft and recovery locations are conducted on different zoning methods and across a series of geographic scales and the dynamics of cluster correspondence patterns are discussed.
M-DAS: System for multispectral data analysis. [in Saginaw Bay, Michigan]

NASA Technical Reports Server (NTRS)

Johnson, R. H.

1975-01-01

M-DAS is a ground data processing system designed for analysis of multispectral data. M-DAS operates on multispectral data from LANDSAT, S-192, M2S and other sources in CCT form. Interactive training by operator-investigators using a variable cursor on a color display was used to derive optimum processing coefficients and data on cluster separability. An advanced multivariate normal-maximum likelihood processing algorithm was used to produce output in various formats: color-coded film images, geometrically corrected map overlays, moving displays of scene sections, coverage tabulations and categorized CCTs. The analysis procedure for M-DAS involves three phases: (1) screening and training, (2) analysis of training data to compute performance predictions and processing coefficients, and (3) processing of multichannel input data into categorized results. Typical M-DAS applications involve iteration between each of these phases. A series of photographs of the M-DAS display are used to illustrate M-DAS operation.
A K-means multivariate approach for clustering independent components from magnetoencephalographic data.

PubMed

Spadone, Sara; de Pasquale, Francesco; Mantini, Dante; Della Penna, Stefania

2012-09-01

Independent component analysis (ICA) is typically applied to functional magnetic resonance imaging (fMRI), electroencephalographic and magnetoencephalographic (MEG) data because of its data-driven nature. In these applications, ICA needs to be extended from single to multi-session and multi-subject studies so that results can be interpreted and assigned statistical significance at the group level. Here a novel strategy for analyzing MEG independent components (ICs) is presented: the Multivariate Algorithm for Grouping MEG Independent Components K-means based (MAGMICK). The proposed approach is able to capture spatio-temporal dynamics of brain activity in MEG studies by running ICA at subject level and then clustering the ICs across sessions and subjects. Distinctive features of MAGMICK are: i) the implementation of an efficient set of "MEG fingerprints" designed to summarize properties of MEG ICs, built on spatial, temporal and spectral parameters; ii) the implementation of a modified version of the standard K-means procedure to improve its data-driven character. This algorithm groups the obtained ICs, automatically estimating the number of clusters through an adaptive weighting of the parameters and a constraint on IC independence, i.e. components coming from the same session (at subject level) or subject (at group level) cannot be grouped together. The performance of MAGMICK is illustrated by analyzing two sets of MEG data obtained during a finger tapping task and median nerve stimulation. The results demonstrate that the method can extract consistent patterns of spatial topography and spectral properties across sessions and subjects that are in good agreement with the literature. In addition, these results are compared to those from a modified version of the affinity propagation clustering method. The comparison, evaluated in terms of different clustering validity indices, shows that the proposed methodology often outperforms affinity propagation. Finally, these results are confirmed by a comparison with a MEG-tailored version of the self-organizing group ICA, which is widely used for fMRI IC clustering. Copyright © 2012 Elsevier Inc. All rights reserved.
Inductive Approaches to Improving Diagnosis and Design for Diagnosability

NASA Technical Reports Server (NTRS)

Fisher, Douglas H. (Principal Investigator)

1995-01-01

The first research area under this grant addresses the problem of classifying time series according to their morphological features in the time domain. A supervised learning system called CALCHAS, which induces a classification procedure for signatures from preclassified examples, was developed. For each of several signature classes, the system infers a model that captures the class's morphological features using Bayesian model induction and the minimum message length approach to assign priors. After induction, a time series (signature) is classified in one of the classes when there is enough evidence to support that decision. Time series with sufficiently novel features, belonging to classes not present in the training set, are recognized as such. A second area of research assumes two sources of information about a system: a model or domain theory that encodes aspects of the system under study and data from actual system operations over time. A model, when it exists, represents strong prior expectations about how a system will perform. Our work with a diagnostic model of the RCS (Reaction Control System) of the Space Shuttle motivated the development of SIG, a system which combines information from a model (or domain theory) and data. As it tracks RCS behavior, the model computes quantitative and qualitative values. Induction is then performed over the data represented by both the 'raw' features and the model-computed high-level features. Finally, work on clustering for operating mode discovery motivated some important extensions to the clustering strategy we had used. One modification appends an iterative optimization technique onto the clustering system; this optimization strategy appears to be novel in the clustering literature. A second modification improves the noise tolerance of the clustering system. 
In particular, we adapt resampling-based pruning strategies used by supervised learning systems to the task of simplifying hierarchical clusterings, thus making post-clustering analysis easier.
Population delineation of polar bears using satellite collar data

USGS Publications Warehouse

Bethke, R.; Taylor, Mitchell K.; Amstrup, Steven C.; Messier, François

1996-01-01

To produce reliable estimates of the size or vital rates of a given population, it is important that the boundaries of the population under study are clearly defined. This is particularly critical for large, migratory animals where levels of sustainable harvest are based on these estimates, and where small errors may have serious long-term consequences for the population. Once populations are delineated, rates of exchange between adjacent populations can be determined and accounted/corrected for when calculating abundance (e.g., based on mark-recapture data). Using satellite radio-collar locations for polar bears in the western Canadian Arctic, we illustrate one approach to delineating wildlife populations that integrates cluster analysis methods for determining group membership with home range plotting procedures to define spatial utilization. This approach is flexible with respect to the specific procedures used and provides an objective and quantitative basis for defining population boundaries.
Chemometric study of Maya Blue from the voltammetry of microparticles approach.

PubMed

Doménech, Antonio; Doménech-Carbó, María Teresa; de Agredos Pascual, María Luisa Vazquez

2007-04-01

The use of the voltammetry of microparticles at paraffin-impregnated graphite electrodes allows for the characterization of different types of Maya Blue (MB) used in wall paintings from different archaeological sites of Campeche and Yucatán (Mexico). Using voltammetric signals for electron-transfer processes involving palygorskite-associated indigo and quinone functionalities generated by scratching the graphite surface, voltammograms provide information on the composition and texture of MB samples. Application of hierarchical cluster analysis and other chemometric methods allows us to characterize samples from different archaeological sites and to distinguish between samples originating from different chronological periods. Comparison between microscopic, spectroscopic, and electrochemical examination of genuine MB samples and synthetic specimens indicated that the preparation procedure of the pigment evolved in time via successive steps anticipating modern synthetic procedures, namely, hybrid organic-inorganic synthesis, temperature control of chemical reactivity, and template-like synthesis.
Multivariate Statistical Analysis of Water Quality data in Indian River Lagoon, Florida

NASA Astrophysics Data System (ADS)

Sayemuzzaman, M.; Ye, M.

2015-12-01

The Indian River Lagoon, part of the longest barrier island complex in the United States, is a region of particular concern to environmental scientists because of the rapid rate of human development throughout the region and its geographical position between the colder temperate zone and the warmer sub-tropical zone. Surface water quality analysis in this region therefore continues to yield new information. In the present study, multivariate statistical procedures were applied to analyze the spatial and temporal water quality in the Indian River Lagoon over the period 1998-2013. Twelve parameters were analyzed at twelve key water monitoring stations in and beside the lagoon using monthly datasets (a total of 27,648 observations). The dataset was treated using cluster analysis (CA), principal component analysis (PCA) and non-parametric trend analysis. The CA was used to cluster the twelve monitoring stations into four groups, with stations having similar surrounding characteristics falling in the same group. The PCA was then applied to each group to find the important water quality parameters. Principal components (PCs) PC1 to PC5 were retained based on cumulative explained variances of 75% to 85% in each cluster group. Nutrient species (phosphorus and nitrogen), salinity, specific conductivity and erosion factors (TSS, turbidity) were the major variables involved in the construction of the PCs. Statistically significant positive or negative trends and abrupt trend shifts were detected for the important water quality parameters at each individual station by applying the Mann-Kendall trend test and the Sequential Mann-Kendall (SQMK) test. Land use and land cover change patterns, local anthropogenic activities and extreme climate events such as drought might be associated with these trends. This study presents a multivariate statistical assessment aimed at obtaining better information about the quality of surface water, so that effective pollution control and management of the surface waters can be undertaken.
The global Minmax k-means algorithm.

PubMed

Wang, Xiaoyan; Bai, Yanping

2016-01-01

The global k-means algorithm is an incremental approach to clustering that dynamically adds one cluster center at a time through a deterministic global search procedure from suitable initial positions, and employs k-means to minimize the sum of the intra-cluster variances. However, the global k-means algorithm sometimes produces singleton clusters, and the initial positions are sometimes poor; after a bad initialization, the k-means algorithm can easily converge to a poor local optimum. In this paper, we modify the global k-means algorithm to first eliminate the singleton clusters, and then apply the MinMax k-means clustering error method to the global k-means algorithm to overcome the effect of bad initialization, yielding the proposed global Minmax k-means algorithm. The proposed clustering method is tested on some popular data sets and compared to the k-means algorithm, the global k-means algorithm and the MinMax k-means algorithm. The experimental results show that the proposed algorithm outperforms the other algorithms considered in the paper.
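The incremental search described in the abstract above can be sketched as follows. This covers only the baseline global k-means idea (add one center at a time, trying every data point as the starting position of the new center). The singleton-cluster elimination and the MinMax weighting proposed in the paper are not reproduced, since the abstract does not specify them. Function names and the sample points are illustrative assumptions.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, centers, iters=100):
    """Plain Lloyd iterations from the given centers; returns (centers, error)."""
    centers = [tuple(c) for c in centers]
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            groups[nearest].append(p)
        new_centers = []
        for c, g in zip(centers, groups):
            if g:
                new_centers.append(tuple(sum(col) / len(g) for col in zip(*g)))
            else:
                new_centers.append(c)  # keep the center of an empty cluster
        if new_centers == centers:
            break
        centers = new_centers
    error = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return centers, error

def global_kmeans(points, k):
    """Add one center at a time, trying each data point as its initial position."""
    # For k = 1 the optimal center is the mean of all points.
    centers = [tuple(sum(col) / len(points) for col in zip(*points))]
    for _ in range(1, k):
        best = None
        for candidate in points:
            trial, err = kmeans(points, centers + [candidate])
            if best is None or err < best[1]:
                best = (trial, err)
        centers = best[0]
    return centers

# Hypothetical two-dimensional data with two obvious groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers = sorted(global_kmeans(points, 2))
print(centers)
```

Because every data point is tried as a candidate, the search is deterministic but costs one full k-means run per point per added center, which becomes expensive on large data sets.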
Kappa statistic for clustered matched-pair data.

PubMed

Yang, Zhao; Zhou, Ming

2014-07-10

The kappa statistic is widely used to assess the agreement between two procedures in independent matched-pair data. For matched-pair data collected in clusters, on the basis of the delta method and sampling techniques, we propose a nonparametric variance estimator for the kappa statistic without within-cluster correlation structure or distributional assumptions. The results of an extensive Monte Carlo simulation study demonstrate that the proposed kappa statistic provides consistent estimation and the proposed variance estimator behaves reasonably well for at least a moderately large number of clusters (e.g., K ≥ 50). Compared with the variance estimator ignoring dependence within a cluster, the proposed variance estimator performs better in maintaining the nominal coverage probability when the intra-cluster correlation is fair (ρ ≥ 0.3), with more pronounced improvement when ρ is further increased. To illustrate the practical application of the proposed estimator, we analyze two real data examples of clustered matched-pair data. Copyright © 2014 John Wiley & Sons, Ltd.
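For orientation, the point estimate of kappa for binary matched pairs can be computed from a pooled 2x2 agreement table, as in the sketch below. This shows only the standard Cohen's kappa formula applied to pairs pooled over clusters. The nonparametric variance estimator proposed in the paper is not reproduced, because the abstract does not give it. The cluster data are hypothetical.

```python
def cohen_kappa(a, b, c, d):
    """Kappa for a 2x2 table: a = both positive, d = both negative,
    b and c = the two kinds of disagreement."""
    n = a + b + c + d
    p_obs = (a + d) / n  # observed agreement
    # Agreement expected by chance from the marginal totals.
    p_exp = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    return (p_obs - p_exp) / (1 - p_exp)

def pooled_table(clusters):
    """Pool binary (rating1, rating2) pairs from all clusters into one 2x2 table."""
    a = b = c = d = 0
    for pairs in clusters:
        for r1, r2 in pairs:
            if r1 and r2:
                a += 1
            elif r1 and not r2:
                b += 1
            elif r2:
                c += 1
            else:
                d += 1
    return a, b, c, d

# Hypothetical clustered matched-pair data: two clusters of four pairs each.
clusters = [
    [(1, 1), (1, 1), (0, 0), (1, 0)],
    [(0, 0), (0, 0), (1, 1), (0, 1)],
]
kappa = cohen_kappa(*pooled_table(clusters))
print(kappa)
```

Pooling leaves the point estimate usable but says nothing about its uncertainty. Treating the pooled pairs as independent is exactly the practice the abstract warns against when computing the variance.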
Open clusters in the Kepler field. II. NGC 6866

DOE Office of Scientific and Technical Information (OSTI.GOV)

Janes, Kenneth; Hoq, Sadia; Barnes, Sydney A.

We have developed a maximum-likelihood procedure to fit theoretical isochrones to the observed cluster color-magnitude diagrams of NGC 6866, an open cluster in the Kepler spacecraft field of view. The Markov chain Monte Carlo algorithm permits exploration of the entire parameter space of a set of isochrones to find both the best solution and the statistical uncertainties. For clusters in the age range of NGC 6866 with few, if any, red giant members, a purely photometric determination of the cluster properties is not well-constrained. Nevertheless, based on our UBVRI photometry alone, we have derived the distance, reddening, age, and metallicity of the cluster and established estimates for the binary nature and membership probability of individual stars. We derive the following values for the cluster properties: (m – M)_V = 10.98 ± 0.24, E(B – V) = 0.16 ± 0.04 (so the distance = 1250 pc), age = 705 ± 170 Myr, and Z = 0.014 ± 0.005.
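The quoted distance can be checked from the apparent distance modulus and the reddening. The sketch below assumes the conventional ratio of total to selective extinction R_V = 3.1, which the abstract does not state. The function and parameter names are illustrative.

```python
def distance_pc(apparent_modulus_v, ebv, r_v=3.1):
    """Distance in parsecs from the apparent V distance modulus and E(B - V).

    r_v = 3.1 is an assumed standard extinction ratio, not a value taken
    from the abstract.
    """
    a_v = r_v * ebv                           # visual extinction A_V
    true_modulus = apparent_modulus_v - a_v   # extinction-corrected (m - M)_0
    return 10 ** (true_modulus / 5 + 1)       # (m - M)_0 = 5 log10(d) - 5

d = distance_pc(10.98, 0.16)
print(round(d))  # about 1250 pc, matching the abstract
```

With A_V = 3.1 × 0.16 ≈ 0.50 mag, the corrected modulus is about 10.48, which corresponds to roughly 1250 pc.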

Defining clusters in APT reconstructions of ODS steels.

PubMed

Williams, Ceri A; Haley, Daniel; Marquis, Emmanuelle A; Smith, George D W; Moody, Michael P

2013-09-01

Oxide nanoclusters in a consolidated Fe-14Cr-2W-0.3Ti-0.3Y₂O₃ ODS steel and in the alloy powder after mechanical alloying (but before consolidation) are investigated by atom probe tomography (APT). The maximum separation method is a standard method to define and characterise clusters from within APT data, but this work shows that the extent of clustering between the two materials is sufficiently different that the nanoclusters in the mechanically alloyed powder and in the consolidated material cannot be compared directly using the same cluster selection parameters. As the cluster selection parameters influence the size and composition of the clusters significantly, a procedure to optimise the input parameters for the maximum separation method is proposed by sweeping the d(max) and N(min) parameter space. By applying this method of cluster parameter selection combined with a 'matrix correction' to account for trajectory aberrations, differences in the oxide nanoclusters can then be reliably quantified. Copyright © 2012 Elsevier B.V. All rights reserved.
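A parameter sweep of the kind described in the abstract above can be sketched as follows. The sketch uses the usual description of the maximum separation method: solute atoms closer than d_max are linked into the same cluster, and clusters with fewer than N_min atoms are discarded. It is a brute-force illustration with hypothetical coordinates, not the authors' optimisation procedure, and it omits the 'matrix correction'.

```python
import math
from itertools import combinations

def max_separation_clusters(atoms, d_max, n_min):
    """Link atoms separated by at most d_max; keep clusters of at least n_min atoms."""
    parent = list(range(len(atoms)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(atoms)), 2):
        if math.dist(atoms[i], atoms[j]) <= d_max:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(atoms)):
        groups.setdefault(find(i), []).append(i)
    return [g for g in groups.values() if len(g) >= n_min]

# Hypothetical solute-atom positions (nm): two groups plus two isolated atoms.
atoms = [
    (0.0, 0.0, 0.0), (0.3, 0.0, 0.0), (0.0, 0.3, 0.0), (0.3, 0.3, 0.0),
    (5.0, 5.0, 5.0), (5.4, 5.0, 5.0), (5.0, 5.4, 5.0),
    (10.0, 0.0, 0.0), (0.0, 10.0, 0.0),
]

# Sweep the (d_max, N_min) parameter space and record the number of clusters.
sweep = {
    (d_max, n_min): len(max_separation_clusters(atoms, d_max, n_min))
    for d_max in (0.35, 0.5)
    for n_min in (3, 4)
}
print(sweep)
```

The cluster count changes with both parameters even on this tiny example, which is why the same parameter values cannot be assumed to suit two materials with different degrees of clustering.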
Modest validity and fair reproducibility of dietary patterns derived by cluster analysis.

PubMed

Funtikova, Anna N; Benítez-Arciniega, Alejandra A; Fitó, Montserrat; Schröder, Helmut

2015-03-01

Cluster analysis is widely used to analyze dietary patterns. We aimed to analyze the validity and reproducibility of the dietary patterns defined by cluster analysis derived from a food frequency questionnaire (FFQ). We hypothesized that the dietary patterns derived by cluster analysis have fair to modest reproducibility and validity. Dietary data were collected from 107 individuals from a population-based survey, by an FFQ at baseline (FFQ1) and after 1 year (FFQ2), and by twelve 24-hour dietary recalls (24-HDR). Repeatability and validity were measured by comparing clusters obtained by the FFQ1 and FFQ2 and by the FFQ2 and 24-HDR (reference method), respectively. Cluster analysis identified a "fruits & vegetables" and a "meat" pattern in each dietary data source. Cluster membership was concordant for 66.7% of participants in FFQ1 and FFQ2 (reproducibility), and for 67.0% in FFQ2 and 24-HDR (validity). Spearman correlation analysis showed reasonable reproducibility, especially in the "fruits & vegetables" pattern, and lower validity, also especially in the "fruits & vegetables" pattern. The κ statistic revealed fair validity and reproducibility of clusters. Our findings indicate a reasonable reproducibility and fair to modest validity of dietary patterns derived by cluster analysis. Copyright © 2015 Elsevier Inc. All rights reserved.
Cluster Analysis to Identify Possible Subgroups in Tinnitus Patients.

PubMed

van den Berge, Minke J C; Free, Rolien H; Arnold, Rosemarie; de Kleine, Emile; Hofman, Rutger; van Dijk, J Marc C; van Dijk, Pim

2017-01-01

In tinnitus treatment, there is a tendency to shift from a "one size fits all" to a more individual, patient-tailored approach. Insight into the heterogeneity of the tinnitus spectrum might improve the management of tinnitus patients in terms of choice of treatment and identification of patients with severe mental distress. The goal of this study was to identify subgroups in a large group of tinnitus patients. Data were collected from patients with severe tinnitus complaints visiting our tertiary referral tinnitus care group at the University Medical Center Groningen. Patient-reported and physician-reported variables were collected during their visit to our clinic. Cluster analyses were used to characterize subgroups. For the selection of the right variables to enter in the cluster analysis, two approaches were used: (1) variable reduction with principal component analysis and (2) variable selection based on expert opinion. Various variables of 1,783 tinnitus patients were included in the analyses. Cluster analysis (1) included 976 patients and resulted in a four-cluster solution. The effect of external influences was the most discriminative between the groups, or clusters, of patients. The "silhouette measure" of the cluster outcome was low (0.2), indicating "no substantial" cluster structure. Cluster analysis (2) included 761 patients and resulted in a three-cluster solution, comparable to the first analysis. Again, "no substantial" cluster structure was found (0.2). Two cluster analyses on a large database of tinnitus patients revealed that clusters of patients are mostly formed by a different response to external influences on their disease. However, both cluster outcomes based on this dataset showed poor stability, suggesting that our tinnitus population comprises a continuum rather than a number of clearly defined subgroups.
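The "silhouette measure" reported in the abstract above summarizes how well each record sits in its assigned cluster. The sketch below computes the mean silhouette width with Euclidean distances on hypothetical points. The distance measure and software used in the study are not stated in the abstract, so both are assumptions here.

```python
import math

def mean_silhouette(points, labels):
    """Mean silhouette width: s = (b - a) / max(a, b) averaged over all points."""
    scores = []
    for i, p in enumerate(points):
        # a: mean distance to the other members of the point's own cluster.
        same = [math.dist(p, q) for j, q in enumerate(points)
                if labels[j] == labels[i] and j != i]
        if not same:
            scores.append(0.0)  # convention for singleton clusters
            continue
        a = sum(same) / len(same)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(p, q) for q, lab in zip(points, labels) if lab == other)
            / labels.count(other)
            for other in set(labels) if other != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Hypothetical data: the same four points under two different labelings.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = [0, 0, 1, 1]   # labels follow the two visible groups
bad = [0, 1, 0, 1]    # labels cut across the groups

print(round(mean_silhouette(points, good), 2))
print(round(mean_silhouette(points, bad), 2))
```

Values near 1 indicate compact, well-separated clusters, while values near 0 or below indicate overlapping or arbitrary assignments. This is consistent with the abstract's reading of 0.2 as "no substantial" structure.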
Ecological tolerances of Miocene larger benthic foraminifera from Indonesia

NASA Astrophysics Data System (ADS)

Novak, Vibor; Renema, Willem

2018-01-01

To provide a comprehensive palaeoenvironmental reconstruction based on larger benthic foraminifera (LBF), a quantitative analysis of their assemblage composition is needed. Besides microfacies analysis which includes environmental preferences of foraminiferal taxa, statistical analyses should also be employed. Therefore, detrended correspondence analysis and cluster analysis were performed on relative abundance data of identified LBF assemblages deposited in mixed carbonate-siliciclastic (MCS) systems and blue-water (BW) settings. Studied MCS system localities include ten sections from the central part of the Kutai Basin in East Kalimantan, ranging from late Burdigalian to Serravallian age. The BW samples were collected from eleven sections of the Bulu Formation on Central Java, dated as Serravallian. Results from detrended correspondence analysis reveal significant differences between these two environmental settings. Cluster analysis produced five clusters of samples; clusters 1 and 2 comprise dominantly MCS samples, clusters 3 and 4 with dominance of BW samples, and cluster 5 showing a mixed composition with both MCS and BW samples. The results of cluster analysis were afterwards subjected to indicator species analysis resulting in the interpretation that generated three groups among LBF taxa: typical assemblage indicators, regularly occurring taxa and rare taxa. By interpreting the results of detrended correspondence analysis, cluster analysis and indicator species analysis, along with environmental preferences of identified LBF taxa, a palaeoenvironmental model is proposed for the distribution of LBF in Miocene MCS systems and adjacent BW settings of Indonesia.
Identification of five chronic obstructive pulmonary disease subgroups with different prognoses in the ECLIPSE cohort using cluster analysis.

PubMed

Rennard, Stephen I; Locantore, Nicholas; Delafont, Bruno; Tal-Singer, Ruth; Silverman, Edwin K; Vestbo, Jørgen; Miller, Bruce E; Bakke, Per; Celli, Bartolomé; Calverley, Peter M A; Coxson, Harvey; Crim, Courtney; Edwards, Lisa D; Lomas, David A; MacNee, William; Wouters, Emiel F M; Yates, Julie C; Coca, Ignacio; Agustí, Alvar

2015-03-01

Chronic obstructive pulmonary disease (COPD) is a heterogeneous disease that likely includes clinically relevant subgroups. To identify subgroups of COPD in ECLIPSE (Evaluation of COPD Longitudinally to Identify Predictive Surrogate Endpoints) subjects using cluster analysis and to assess clinically meaningful outcomes of the clusters during 3 years of longitudinal follow-up. Factor analysis was used to reduce 41 variables determined at recruitment in 2,164 patients with COPD to 13 main factors, and the variables with the highest loading were used for cluster analysis. Clusters were evaluated for their relationship with clinically meaningful outcomes during 3 years of follow-up. The relationships among clinical parameters were evaluated within clusters. Five subgroups were distinguished using cross-sectional clinical features. These groups differed regarding outcomes. Cluster A included patients with milder disease and had fewer deaths and hospitalizations. Cluster B had less systemic inflammation at baseline but had notable changes in health status and emphysema extent. Cluster C had many comorbidities, evidence of systemic inflammation, and the highest mortality. Cluster D had low FEV1, severe emphysema, and the highest exacerbation and COPD hospitalization rate. Cluster E was intermediate for most variables and may represent a mixed group that includes further clusters. The relationships among clinical variables within clusters differed from that in the entire COPD population. Cluster analysis using baseline data in ECLIPSE identified five COPD subgroups that differ in outcomes and inflammatory biomarkers and show different relationships between clinical parameters, suggesting the clusters represent clinically and biologically different subtypes of COPD.
Surface enhanced Raman spectroscopy (SERS) from a molecule adsorbed on a nanoscale silver particle cluster in a holographic plate

NASA Astrophysics Data System (ADS)

Jusinski, Leonard E.; Bahuguna, Ramen; Das, Amrita; Arya, Karamjeet

2006-02-01

Surface enhanced Raman spectroscopy has become a viable technique for the detection of single molecules. This highly sensitive technique is due to the very large (up to 14 orders in magnitude) enhancement in the Raman cross section when the molecule is adsorbed on a metal nanoparticle cluster. We report here SERS (Surface Enhanced Raman Spectroscopy) experiments performed by adsorbing analyte molecules on nanoscale silver particle clusters within the gelatin layer of commercially available holographic plates which have been developed and fixed. The Ag particles range in size between 5 - 30 nanometers (nm). Sample preparation was performed by immersing the prepared holographic plate in an analyte solution for a few minutes. We report here the production of SERS signals from Rhodamine 6G (R6G) molecules of nanomolar concentration. These measurements demonstrate a fast, low cost, reproducible technique of producing SERS substrates in a matter of minutes compared to the conventional procedure of preparing Ag clusters from colloidal solutions. SERS active colloidal solutions require up to a full day to prepare. In addition, the preparations of colloidal aggregates are not consistent in shape, contain additional interfering chemicals, and do not generate consistent SERS enhancement. Colloidal solutions require the addition of KCl or NaCl to increase the ionic strength to allow aggregation and cluster formation. We find no need to add KCl or NaCl to create SERS active clusters in the holographic gelatin matrix. These holographic plates, prepared using simple, conventional procedures, can be stored in an inert environment and preserve SERS activity after several weeks subsequent to preparation.
Interactive visual exploration and refinement of cluster assignments.

PubMed

Kern, Michael; Lex, Alexander; Gehlenborg, Nils; Johnson, Chris R

2017-09-12

With ever-increasing amounts of data produced in biology research, scientists are in need of efficient data analysis methods. Cluster analysis, combined with visualization of the results, is one such method that can be used to make sense of large data volumes. At the same time, cluster analysis is known to be imperfect and depends on the choice of algorithms, parameters, and distance measures. Most clustering algorithms don't properly account for ambiguity in the source data, as records are often assigned to discrete clusters, even if an assignment is unclear. While there are metrics and visualization techniques that allow analysts to compare clusterings or to judge cluster quality, there is no comprehensive method that allows analysts to evaluate, compare, and refine cluster assignments based on the source data, derived scores, and contextual data. In this paper, we introduce a method that explicitly visualizes the quality of cluster assignments, allows comparisons of clustering results and enables analysts to manually curate and refine cluster assignments. Our methods are applicable to matrix data clustered with partitional, hierarchical, and fuzzy clustering algorithms. Furthermore, we enable analysts to explore clustering results in context of other data, for example, to observe whether a clustering of genomic data results in a meaningful differentiation in phenotypes. Our methods are integrated into Caleydo StratomeX, a popular, web-based, disease subtype analysis tool. We show in a usage scenario that our approach can reveal ambiguities in cluster assignments and produce improved clusterings that better differentiate genotypes and phenotypes.
Canonical PSO Based K-Means Clustering Approach for Real Datasets

PubMed Central

Dey, Lopamudra; Chakraborty, Sanjay

2014-01-01

The significance and application of clustering are spread over various fields. Clustering is an unsupervised process in data mining, which is why proper evaluation of the results and measurement of the compactness and separability of the clusters are important issues. The procedure of evaluating the results of a clustering algorithm is known as cluster validity measure. Different types of indexes are used to solve different types of problems, and index selection depends on the kind of available data. This paper first proposes a Canonical PSO based K-means clustering algorithm, then analyses some important clustering indices (intercluster, intracluster), and then evaluates the effects of those indices on a real-time air pollution database and on wholesale customer, wine, and vehicle datasets using typical K-means, Canonical PSO based K-means, simple PSO based K-means, DBSCAN, and hierarchical clustering algorithms. This paper also describes the nature of the clusters and finally compares the performance of these clustering algorithms according to the validity assessment. It also identifies which of these algorithms is most desirable for producing compact clusters on these particular real-life datasets. It deals with the behaviour of these clustering algorithms with respect to validation indexes and presents the results of their evaluation in mathematical and graphical forms. PMID:27355083
Research on dissociative seizures: A bibliometric analysis and visualization of the scientific landscape.

PubMed

Popkirov, Stoyan; Jungilligens, Johannes; Schlegel, Uwe; Wellmer, Jörg

2018-06-01

Dissociative seizures are a common and often elusive differential diagnosis in epilepsy centers. Considering their high prevalence, long diagnostic delays, and disappointing rates of treatment response, scientific research dedicated to dissociative seizures is surprisingly scarce. In order to chart the scientific landscape of dissociative seizures and to visualize thematic clusters and trends in research, a comprehensive bibliometric analysis was performed. The Web of Science database was examined to identify relevant English language documents from the last half-century. A total of 1751 documents with titles referring to dissociative seizures were identified. Automated textual analysis of all titles and abstracts revealed that research clusters around three major topics: differential diagnosis in epilepsy centers, management and treatment, and psychopathology. Time analysis of term networks revealed that the focus of clinical research has moved from diagnostic procedures to treatment approaches. Furthermore, interest within etiological research is shifting from an emphasis on early life trauma and personality traits to the role of anxiety and emotion regulation. With respect to individual contributing authors, a relatively small network of prolific scientists with a remarkable degree of collaboration emerges. By mapping relevant publications, it becomes evident that dissociative seizures still represent a subject mostly within the realm of neurology and epileptology, with a tendency to settle in the latter domain. This analysis sheds light on an important niche subject and highlights trends in research focus and output. Copyright © 2018 Elsevier Inc. All rights reserved.
MicroRNA-Target Network Inference and Local Network Enrichment Analysis Identify Two microRNA Clusters with Distinct Functions in Head and Neck Squamous Cell Carcinoma

PubMed Central

Sass, Steffen; Pitea, Adriana; Unger, Kristian; Hess, Julia; Mueller, Nikola S.; Theis, Fabian J.

2015-01-01

MicroRNAs represent ~22 nt long endogenous small RNA molecules that have been experimentally shown to regulate gene expression post-transcriptionally. One main interest in miRNA research is the investigation of their functional roles, which can typically be accomplished by identification of mi-/mRNA interactions and functional annotation of target gene sets. We here present a novel method “miRlastic”, which infers miRNA-target interactions using transcriptomic data as well as prior knowledge and performs functional annotation of target genes by exploiting the local structure of the inferred network. For the network inference, we applied linear regression modeling with elastic net regularization on matched microRNA and messenger RNA expression profiling data to perform feature selection on prior knowledge from sequence-based target prediction resources. The novelty of miRlastic inference originates in predicting data-driven intra-transcriptome regulatory relationships through feature selection. With synthetic data, we showed that miRlastic outperformed commonly used methods and was suitable even for low sample sizes. To gain insight into the functional role of miRNAs and to determine joint functional properties of miRNA clusters, we introduced a local enrichment analysis procedure. The principle of this procedure lies in identifying regions of high functional similarity by evaluating the shortest paths between genes in the network. We can finally assign functional roles to the miRNAs by taking their regulatory relationships into account. We thoroughly evaluated miRlastic on a cohort of head and neck cancer (HNSCC) patients provided by The Cancer Genome Atlas. We inferred an mi-/mRNA regulatory network for human papilloma virus (HPV)-associated miRNAs in HNSCC. The resulting network best enriched for experimentally validated miRNA-target interaction, when compared to common methods. 
Finally, the local enrichment step identified two functional clusters of miRNAs that were predicted to mediate HPV-associated dysregulation in HNSCC. Our novel approach was able to characterize distinct pathway regulations from matched miRNA and mRNA data. An R package of miRlastic was made available through: http://icb.helmholtz-muenchen.de/mirlastic. PMID:26694379
MicroRNA-Target Network Inference and Local Network Enrichment Analysis Identify Two microRNA Clusters with Distinct Functions in Head and Neck Squamous Cell Carcinoma.

PubMed

Sass, Steffen; Pitea, Adriana; Unger, Kristian; Hess, Julia; Mueller, Nikola S; Theis, Fabian J

2015-12-18

MicroRNAs represent ~22 nt long endogenous small RNA molecules that have been experimentally shown to regulate gene expression post-transcriptionally. One main interest in miRNA research is the investigation of their functional roles, which can typically be accomplished by identification of mi-/mRNA interactions and functional annotation of target gene sets. We here present a novel method "miRlastic", which infers miRNA-target interactions using transcriptomic data as well as prior knowledge and performs functional annotation of target genes by exploiting the local structure of the inferred network. For the network inference, we applied linear regression modeling with elastic net regularization on matched microRNA and messenger RNA expression profiling data to perform feature selection on prior knowledge from sequence-based target prediction resources. The novelty of miRlastic inference originates in predicting data-driven intra-transcriptome regulatory relationships through feature selection. With synthetic data, we showed that miRlastic outperformed commonly used methods and was suitable even for low sample sizes. To gain insight into the functional role of miRNAs and to determine joint functional properties of miRNA clusters, we introduced a local enrichment analysis procedure. The principle of this procedure lies in identifying regions of high functional similarity by evaluating the shortest paths between genes in the network. We can finally assign functional roles to the miRNAs by taking their regulatory relationships into account. We thoroughly evaluated miRlastic on a cohort of head and neck cancer (HNSCC) patients provided by The Cancer Genome Atlas. We inferred an mi-/mRNA regulatory network for human papilloma virus (HPV)-associated miRNAs in HNSCC. The resulting network best enriched for experimentally validated miRNA-target interaction, when compared to common methods. 
Finally, the local enrichment step identified two functional clusters of miRNAs that were predicted to mediate HPV-associated dysregulation in HNSCC. Our novel approach was able to characterize distinct pathway regulations from matched miRNA and mRNA data. An R package of miRlastic was made available through: http://icb.helmholtz-muenchen.de/mirlastic.
Clusters of Occupations Based on Systematically Derived Work Dimensions: An Exploratory Study.

ERIC Educational Resources Information Center

Cunningham, J. W.; And Others

The study explored the feasibility of deriving an educationally relevant occupational cluster structure based on Occupational Analysis Inventory (OAI) work dimensions. A hierarchical cluster analysis was applied to the factor score profiles of 814 occupations on 22 higher-order OAI work dimensions. From that analysis, 73 occupational clusters were…
Using cluster analysis to identify phenotypes and validation of mortality in men with COPD.

PubMed

Chen, Chiung-Zuei; Wang, Liang-Yi; Ou, Chih-Ying; Lee, Cheng-Hung; Lin, Chien-Chung; Hsiue, Tzuen-Ren

2014-12-01

Cluster analysis has been proposed to examine phenotypic heterogeneity in chronic obstructive pulmonary disease (COPD). The aim of this study was to use cluster analysis to define COPD phenotypes and validate them by assessing their relationship with mortality. Male subjects with COPD were recruited to identify and validate COPD phenotypes. Seven variables were assessed for their relevance to COPD: age, FEV(1) % predicted, BMI, history of severe exacerbations, mMRC, SpO(2), and Charlson index. COPD groups were identified by cluster analysis and validated prospectively against mortality during a 4-year follow-up. Analysis of 332 COPD subjects identified five clusters, labeled cluster A to cluster E. Assessment of the predictive validity of these clusters showed that cluster E patients had higher all-cause mortality (HR 18.3, p < 0.0001) and respiratory-cause mortality (HR 21.5, p < 0.0001) than those in the other four groups. Cluster E patients also had higher all-cause mortality (HR 14.3, p = 0.0002) and respiratory-cause mortality (HR 10.1, p = 0.0013) than patients in cluster D alone. COPD with severe airflow limitation, many symptoms, and a history of frequent severe exacerbations was a novel and distinct clinical phenotype predicting mortality in men with COPD.
Neural activity in relation to empirically derived personality syndromes in depression using a psychodynamic fMRI paradigm

PubMed Central

Taubner, Svenja; Wiswede, Daniel; Kessler, Henrik

2013-01-01

Objective: The heterogeneity between patients with depression cannot be captured adequately with existing descriptive systems of diagnosis and neurobiological models of depression. Furthermore, considering the highly individual nature of depression, the application of general stimuli in past research efforts may not capture the essence of the disorder. This study aims to identify subtypes of depression by using empirically derived personality syndromes, and to explore neural correlates of the derived personality syndromes. Materials and Methods: In the present exploratory study, an individually tailored and psychodynamically based functional magnetic resonance imaging paradigm using dysfunctional relationship patterns was presented to 20 chronically depressed patients. Results from the Shedler–Westen Assessment Procedure (SWAP-200) were analyzed by Q-factor analysis to identify clinically relevant subgroups of depression and related brain activation. Results: The principal component analysis of SWAP-200 items from all 20 patients led to a two-factor solution: “Depressive Personality” and “Emotional-Hostile-Externalizing Personality.” Both factors were used in a whole-brain correlational analysis but only the second factor yielded significant positive correlations in four regions: a large cluster in the right orbitofrontal cortex (OFC), the left ventral striatum, a small cluster in the left temporal pole, and another small cluster in the right middle frontal gyrus. Discussion: The degree to which patients with depression score high on the factor “Emotional-Hostile-Externalizing Personality” correlated with relatively higher activity in three key areas involved in emotion processing, evaluation of reward/punishment, negative cognitions, depressive pathology, and social knowledge (OFC, ventral striatum, temporal pole).
Results may contribute to an alternative description of neural correlates of depression showing differential brain activation dependent on the extent of specific personality syndromes in depression. PMID:24363644
Characterizing Heterogeneity within Head and Neck Lesions Using Cluster Analysis of Multi-Parametric MRI Data

PubMed Central

Borri, Marco; Schmidt, Maria A.; Powell, Ceri; Koh, Dow-Mu; Riddell, Angela M.; Partridge, Mike; Bhide, Shreerang A.; Nutting, Christopher M.; Harrington, Kevin J.; Newbold, Katie L.; Leach, Martin O.

2015-01-01

Purpose To describe a methodology, based on cluster analysis, to partition multi-parametric functional imaging data into groups (or clusters) of similar functional characteristics, with the aim of characterizing functional heterogeneity within head and neck tumour volumes. To evaluate the performance of the proposed approach on a set of longitudinal MRI data, analysing the evolution of the obtained sub-sets with treatment. Material and Methods The cluster analysis workflow was applied to a combination of dynamic contrast-enhanced and diffusion-weighted imaging MRI data from a cohort of squamous cell carcinoma of the head and neck patients. Cumulative distributions of voxels, containing pre and post-treatment data and including both primary tumours and lymph nodes, were partitioned into k clusters (k = 2, 3 or 4). Principal component analysis and cluster validation were employed to investigate data composition and to independently determine the optimal number of clusters. The evolution of the resulting sub-regions with induction chemotherapy treatment was assessed relative to the number of clusters. Results The clustering algorithm was able to separate clusters which significantly reduced in voxel number following induction chemotherapy from clusters with a non-significant reduction. Partitioning with the optimal number of clusters (k = 4), determined with cluster validation, produced the best separation between reducing and non-reducing clusters. Conclusion The proposed methodology was able to identify tumour sub-regions with distinct functional properties, independently separating clusters which were affected differently by treatment. This work demonstrates that unsupervised cluster analysis, with no prior knowledge of the data, can be employed to provide a multi-parametric characterization of functional heterogeneity within tumour volumes. PMID:26398888
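The workflow described above (partition voxels into k clusters for several k, then use cluster validation to pick k) can be illustrated with a minimal sketch. This is not the authors' implementation: plain k-means, the mean silhouette width as the validation index, the farthest-point initialisation, and the toy two-parameter "voxel" values are all assumptions made for illustration.

```python
import math

def dist(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, k, max_iter=100):
    """Plain k-means with a deterministic farthest-point initialisation."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(dist(p, c) for c in centroids)))
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels

def silhouette(points, labels):
    """Mean silhouette width; higher means tighter, better-separated clusters."""
    total = 0.0
    for i, p in enumerate(points):
        by_cluster = {}
        for j, q in enumerate(points):
            if j != i:
                by_cluster.setdefault(labels[j], []).append(dist(p, q))
        own = by_cluster.pop(labels[i], [])
        if not own or not by_cluster:
            continue  # a singleton cluster contributes a score of 0
        a = sum(own) / len(own)                                 # mean distance within own cluster
        b = min(sum(d) / len(d) for d in by_cluster.values())   # nearest other cluster
        total += (b - a) / max(a, b)
    return total / len(points)

# Hypothetical voxels described by two functional parameters each.
voxels = [
    (0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2),
    (5.0, 5.0), (5.2, 5.1), (5.1, 4.8), (4.9, 5.2),
    (10.0, 0.0), (10.1, 0.2), (9.8, 0.1), (10.2, -0.1),
]
scores = {k: silhouette(voxels, kmeans(voxels, k)) for k in (2, 3, 4)}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

On real imaging data the parameters would normally be standardised first, and other validation indices could replace the silhouette width.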
clusterProfiler: an R package for comparing biological themes among gene clusters.

PubMed

Yu, Guangchuang; Wang, Li-Gen; Han, Yanyan; He, Qing-Yu

2012-05-01

Increasing quantitative data generated from transcriptomics and proteomics require integrative strategies for analysis. Here, we present an R package, clusterProfiler that automates the process of biological-term classification and the enrichment analysis of gene clusters. The analysis module and visualization module were combined into a reusable workflow. Currently, clusterProfiler supports three species, including humans, mice, and yeast. Methods provided in this package can be easily extended to other species and ontologies. The clusterProfiler package is released under Artistic-2.0 License within Bioconductor project. The source code and vignette are freely available at http://bioconductor.org/packages/release/bioc/html/clusterProfiler.html.
Effect of centrifugation on dynamic susceptibility of magnetic fluids

NASA Astrophysics Data System (ADS)

Pshenichnikov, Alexander; Lebedev, Alexander; Lakhtina, Ekaterina; Kuznetsov, Andrey

2017-06-01

The dispersive composition, dynamic susceptibility and spectrum of magnetization relaxation times were investigated experimentally for six samples of magnetic fluid obtained by centrifuging two base colloidal solutions of magnetite in kerosene. The base solutions differed in the concentration of the magnetic phase and the width of the particle size distribution. The procedure of cluster analysis allowing one to estimate the characteristic sizes of aggregates with uncompensated magnetic moments was described. The results of the magnetogranulometric and cluster analyses were discussed. It was shown that centrifugation has a strong effect on the physical properties of the separated fractions, which is related to the spatial redistribution of particles and multi-particle aggregates. The presence of aggregates in magnetic fluids is interpreted as the main reason for the low-frequency (0.1-10 kHz) dispersion of the dynamic susceptibility. The obtained results count in favor of using centrifugation as an effective means of changing the dynamic susceptibility over wide limits and obtaining fluids with the specified type of susceptibility dispersion.
Structural characterization of MAPLE deposited lipase biofilm

NASA Astrophysics Data System (ADS)

Aronne, Antonio; Ausanio, Giovanni; Bloisi, Francesco; Calabria, Raffaela; Califano, Valeria; Fanelli, Esther; Massoli, Patrizio; Vicari, Luciano R. M.

2014-11-01

Lipases (triacylglycerol ester hydrolases) are enzymes used in several industrial applications. Enzyme immobilization can be used to address key issues limiting widespread application at the industrial level. Immobilization efficiency is related to the ability to preserve the native conformation of the enzyme. The MAPLE (Matrix Assisted Pulsed Laser Evaporation) technique, a laser deposition procedure for treating organic/polymeric/biomaterials, was applied for the deposition of lipase enzyme in an ice matrix, using near infrared laser radiation. Microscopy analysis showed that the deposition occurred in micrometric and submicrometric clusters with a wide size distribution. AFM imaging showed that inter-cluster regions are uniformly covered with smaller aggregates of nanometric size. Fourier transform infrared spectroscopy was used for both recognizing the deposited material and analyzing its secondary structure. Results showed that the protein underwent reversible self-association during the deposition process. Indeed, preliminary tests of MAPLE deposited lipase used for soybean oil transesterification with isopropyl alcohol followed by gas chromatography-mass spectrometry gave results consistent with undamaged deposition of lipase.
Clinical Characteristics of Exacerbation-Prone Adult Asthmatics Identified by Cluster Analysis.

PubMed

Kim, Mi Ae; Shin, Seung Woo; Park, Jong Sook; Uh, Soo Taek; Chang, Hun Soo; Bae, Da Jeong; Cho, You Sook; Park, Hae Sim; Yoon, Ho Joo; Choi, Byoung Whui; Kim, Yong Hoon; Park, Choon Sik

2017-11-01

Asthma is a heterogeneous disease characterized by various types of airway inflammation and obstruction. Therefore, it is classified into several subphenotypes, such as early-onset atopic, obese non-eosinophilic, benign, and eosinophilic asthma, using cluster analysis. A number of asthmatics frequently experience exacerbation over a long-term follow-up period, but the exacerbation-prone subphenotype has rarely been evaluated by cluster analysis. This prompted us to identify clusters reflecting asthma exacerbation. A uniform cluster analysis method was applied to 259 adult asthmatics who were regularly followed-up for over 1 year using 12 variables, selected on the basis of their contribution to asthma phenotypes. After clustering, clinical profiles and exacerbation rates during follow-up were compared among the clusters. Four subphenotypes were identified: cluster 1 was comprised of patients with early-onset atopic asthma with preserved lung function, cluster 2 late-onset non-atopic asthma with impaired lung function, cluster 3 early-onset atopic asthma with severely impaired lung function, and cluster 4 late-onset non-atopic asthma with well-preserved lung function. The patients in clusters 2 and 3 were identified as exacerbation-prone asthmatics, showing a higher risk of asthma exacerbation. Two different phenotypes of exacerbation-prone asthma were identified among Korean asthmatics using cluster analysis; both were characterized by impaired lung function, but the age at asthma onset and atopic status were different between the two. Copyright © 2017 The Korean Academy of Asthma, Allergy and Clinical Immunology · The Korean Academy of Pediatric Allergy and Respiratory Disease
Cluster analysis of autoantibodies in 852 patients with systemic lupus erythematosus from a single center.

PubMed

Artim-Esen, Bahar; Çene, Erhan; Şahinkaya, Yasemin; Ertan, Semra; Pehlivan, Özlem; Kamali, Sevil; Gül, Ahmet; Öcal, Lale; Aral, Orhan; Inanç, Murat

2014-07-01

Associations between autoantibodies and clinical features have been described in systemic lupus erythematosus (SLE). Herein, we aimed to define autoantibody clusters and their clinical correlations in a large cohort of patients with SLE. We analyzed 852 patients with SLE who attended our clinic. Seven autoantibodies were selected for cluster analysis: anti-DNA, anti-Sm, anti-RNP, anticardiolipin (aCL) immunoglobulin (Ig)G or IgM, lupus anticoagulant (LAC), anti-Ro, and anti-La. Two-step clustering and Kaplan-Meier survival analyses were used. Five clusters were identified. A cluster consisted of patients with only anti-dsDNA antibodies, a cluster of anti-Sm and anti-RNP, a cluster of aCL IgG/M and LAC, and a cluster of anti-Ro and anti-La antibodies. Analysis revealed 1 more cluster that consisted of patients who did not belong to any of the clusters formed by antibodies chosen for cluster analysis. Sm/RNP cluster had significantly higher incidence of pulmonary hypertension and Raynaud phenomenon. DsDNA cluster had the highest incidence of renal involvement. In the aCL/LAC cluster, there were significantly more patients with neuropsychiatric involvement, antiphospholipid syndrome, autoimmune hemolytic anemia, and thrombocytopenia. According to the Systemic Lupus International Collaborating Clinics damage index, the highest frequency of damage was in the aCL/LAC cluster. Comparison of 10 and 20 years survival showed reduced survival in the aCL/LAC cluster. This study supports the existence of autoantibody clusters with distinct clinical features in SLE and shows that forming clinical subsets according to autoantibody clusters may be useful in predicting the outcome of the disease. Autoantibody clusters in SLE may exhibit differences according to the clinical setting or population.

Is It Feasible to Identify Natural Clusters of TSC-Associated Neuropsychiatric Disorders (TAND)?

PubMed

Leclezio, Loren; Gardner-Lubbe, Sugnet; de Vries, Petrus J

2018-04-01

Tuberous sclerosis complex (TSC) is a genetic disorder with multisystem involvement. The lifetime prevalence of TSC-Associated Neuropsychiatric Disorders (TAND) is in the region of 90% in an apparently unique, individual pattern. This "uniqueness" poses significant challenges for diagnosis, psycho-education, and intervention planning. To date, no studies have explored whether there may be natural clusters of TAND. The purpose of this feasibility study was (1) to investigate the practicability of identifying natural TAND clusters, and (2) to identify appropriate multivariate data analysis techniques for larger-scale studies. TAND Checklist data were collected from 56 individuals with a clinical diagnosis of TSC (n = 20 from South Africa; n = 36 from Australia). Using R, the open-source statistical platform, mean squared contingency coefficients were calculated to produce a correlation matrix, and various cluster analyses and exploratory factor analysis were examined. Ward's method rendered six TAND clusters with good face validity and significant convergence with a six-factor exploratory factor analysis solution. The "bottom-up" data-driven strategies identified a "scholastic" cluster of TAND manifestations, an "autism spectrum disorder-like" cluster, a "dysregulated behavior" cluster, a "neuropsychological" cluster, a "hyperactive/impulsive" cluster, and a "mixed/mood" cluster. These feasibility results suggest that a combination of cluster analysis and exploratory factor analysis methods may be able to identify clinically meaningful natural TAND clusters. Findings require replication and expansion in larger datasets, and could include quantification of cluster or factor scores at an individual level. Copyright © 2018 Elsevier Inc. All rights reserved.
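As a rough illustration of the correlation-matrix step, the sketch below assumes that the "mean squared contingency coefficient" refers to the phi coefficient between pairs of binary (yes/no) checklist items. The study itself used R; this Python version, the function names, and the toy responses are hypothetical and only show how such a matrix could be built before applying a hierarchical method such as Ward's.

```python
import math

def phi(x, y):
    """Phi coefficient between two binary (0/1) item vectors."""
    n11 = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    n10 = sum(1 for a, b in zip(x, y) if a == 1 and b == 0)
    n01 = sum(1 for a, b in zip(x, y) if a == 0 and b == 1)
    n00 = sum(1 for a, b in zip(x, y) if a == 0 and b == 0)
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denom == 0:
        return 0.0  # an item with no variation carries no association information
    return (n11 * n00 - n10 * n01) / denom

def phi_matrix(items):
    """Symmetric matrix of phi coefficients between all pairs of items."""
    return [[phi(a, b) for b in items] for a in items]

# Hypothetical responses of six individuals to three checklist items.
items = [
    [1, 1, 0, 0, 1, 0],
    [1, 1, 0, 0, 1, 1],
    [0, 0, 1, 1, 0, 1],
]
corr = phi_matrix(items)
# A dissimilarity such as 1 - phi can then be passed to a hierarchical clustering routine.
dissimilarity = [[1 - r for r in row] for row in corr]
```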
Cluster headache: present and future therapy.

PubMed

Leone, Massimo; Giustiniani, Alessandro; Cecchini, Alberto Proietti

2017-05-01

Cluster headache is characterized by severe, unilateral headache attacks of orbital, supraorbital or temporal pain lasting 15-180 min accompanied by ipsilateral lacrimation, rhinorrhea and other cranial autonomic manifestations. Cluster headache attacks need fast-acting abortive agents because the pain peaks very quickly; sumatriptan injection is the gold standard acute treatment. First-line preventative drugs include verapamil and carbolithium. Other drugs demonstrated effective in open trials include topiramate, valproic acid, gabapentin and others. Steroids are very effective; local injection in the occipital area is also effective but its prolonged use needs caution. Monoclonal antibodies against calcitonin gene-related peptide are under investigation as prophylactic agents in both episodic and chronic cluster headache. A number of neurostimulation procedures including occipital nerve stimulation, vagus nerve stimulation, sphenopalatine ganglion stimulation and the more invasive hypothalamic stimulation are employed in chronic intractable cluster headache.
Psychosocial Costs of Racism to Whites: Exploring Patterns through Cluster Analysis

ERIC Educational Resources Information Center

Spanierman, Lisa B.; Poteat, V. Paul; Beer, Amanda M.; Armstrong, Patrick Ian

2006-01-01

Participants (230 White college students) completed the Psychosocial Costs of Racism to Whites (PCRW) Scale. Using cluster analysis, we identified 5 distinct cluster groups on the basis of PCRW subscale scores: the unempathic and unaware cluster contained the lowest empathy scores; the insensitive and afraid cluster consisted of low empathy and…
Allergen Sensitization Pattern by Sex: A Cluster Analysis in Korea.

PubMed

Ohn, Jungyoon; Paik, Seung Hwan; Doh, Eun Jin; Park, Hyun-Sun; Yoon, Hyun-Sun; Cho, Soyun

2017-12-01

Allergens tend to sensitize simultaneously. Etiology of this phenomenon has been suggested to be allergen cross-reactivity or concurrent exposure. However, little is known about specific allergen sensitization patterns. The aim of this study was to investigate the allergen sensitization characteristics according to gender. Multiple allergen simultaneous test (MAST) is widely used as a screening tool for detecting allergen sensitization in dermatologic clinics. We retrospectively reviewed the medical records of patients with MAST results between 2008 and 2014 in our Department of Dermatology. A cluster analysis was performed to elucidate the allergen-specific immunoglobulin (Ig)E cluster pattern. The results of MAST (39 allergen-specific IgEs) from 4,360 cases were analyzed. By cluster analysis, the 39 items were grouped into 8 clusters. Each cluster had characteristic features. Compared with the female group, the male group tended to be sensitized more frequently to all tested allergens, except for the fungus allergen cluster. The cluster and comparative analysis results demonstrate that allergen sensitization is clustered, manifesting allergen similarity or co-exposure. Only the fungus cluster allergens tend to sensitize the female group more frequently than the male group.
Numerical trials of HISSE

NASA Technical Reports Server (NTRS)

Peters, C.; Kampe, F. (Principal Investigator)

1980-01-01

The mathematical description and implementation of the statistical estimation procedure known as the Houston integrated spatial/spectral estimator (HISSE) is discussed. HISSE is based on a normal mixture model and is designed to take advantage of spectral and spatial information of LANDSAT data pixels, utilizing the initial classification and clustering information provided by the AMOEBA algorithm. The HISSE calculates parametric estimates of class proportions which reduce the error inherent in estimates derived from typical classify and count procedures common to nonparametric clustering algorithms. It also singles out spatial groupings of pixels which are most suitable for labeling classes. These calculations are designed to aid the analyst/interpreter in labeling patches with a crop class label. Finally, HISSE's initial performance on an actual LANDSAT agricultural ground truth data set is reported.
Orbit Clustering Based on Transfer Cost

NASA Technical Reports Server (NTRS)

Gustafson, Eric D.; Arrieta-Camacho, Juan J.; Petropoulos, Anastassios E.

2013-01-01

We propose using cluster analysis to perform quick screening for combinatorial global optimization problems. The key missing component currently preventing cluster analysis from use in this context is the lack of a useable metric function that defines the cost to transfer between two orbits. We study several proposed metrics and clustering algorithms, including k-means and the expectation maximization algorithm. We also show that proven heuristic methods such as the Q-law can be modified to work with cluster analysis.
Teaching surgery takes time: the impact of surgical education on time in the operating room

PubMed Central

Vinden, Christopher; Malthaner, Richard; McGee, Jacob; McClure, J. Andrew; Winick-Ng, Jennifer; Liu, Kuan; Nash, Danielle M.; Welk, Blayne; Dubois, Luc

2016-01-01

Background It is generally accepted that surgical training is associated with increased surgical duration. The purpose of this study was to determine the magnitude of this increase for common surgical procedures by comparing surgery duration in teaching and nonteaching hospitals. Methods This retrospective population-based cohort study included all adult residents of Ontario, Canada, who underwent 1 of 14 surgical procedures between 2002 and 2012. We used several linked administrative databases to identify the study cohort in addition to patient-, surgeon- and procedure-related variables. We determined surgery duration using anesthesiology billing records. Negative binomial regression was used to model the association between teaching versus nonteaching hospital status and surgery duration. Results Of the 713 573 surgical cases included in this study, 20.8% were performed in a teaching hospital. For each procedure, the mean surgery duration was significantly longer for teaching hospitals, with differences ranging from 5 to 62 minutes across individual procedures in unadjusted analyses (all p < 0.001). In regression analysis, procedures performed in teaching hospitals were associated with an overall 22% (95% confidence interval 20%–24%) increase in surgery duration, adjusting for patient-, surgeon- and procedure-related variables as well as the clustering of patients within surgeons and hospitals. Conclusion Our results show that a wide range of surgical procedures require significantly more time to perform in teaching than nonteaching hospitals. Given the magnitude of this difference, the impact of surgical training on health care costs and clinical outcomes should be a priority for future studies. PMID:27007088
Mechanisms of Diagonal-Shear Failure in Reinforced Concrete Beams analyzed by AE-SiGMA

NASA Astrophysics Data System (ADS)

Ohno, Kentaro; Shimozono, Shinichiro; Sawada, Yosuke; Ohtsu, Masayasu

Serious shear failures in reinforced concrete (RC) structures were reported in the Hanshin-Awaji Earthquake. In particular, it was demonstrated that a diagonal-shear failure could lead to disastrous damage. However, mechanisms of the diagonal-shear failure in RC beams have not been completely clarified yet. In this study, the diagonal-shear failure in RC beams is investigated, applying acoustic emission (AE) method. To identify source mechanisms of AE signals, SiGMA (Simplified Green's functions for Moment tensor Analysis) procedure was applied. Prior to four-point bending tests of RC beams, theoretical waveforms were calculated to determine the optimal arrangement of AE sensors. Then, cracking mechanisms in experiments were investigated by applying the SiGMA procedure to AE waveforms. From results of the SiGMA analysis, dominant motions of micro-cracks are found to be of shear crack in all the loading stages. As the load increased, the number of tensile cracks increased and eventually the diagonal-shear failure occurred in the shear span. Prior to final failure, AE cluster of micro-cracks was intensely observed in the shear span. To classify AE sources into tensile and shear cracks, AE parameter analysis was also applied. As a result, most of AE hits are classified into tensile cracks. The difference between results obtained by the AE parameter analysis and by the SiGMA analysis is investigated and discussed.
Interactive K-Means Clustering Method Based on User Behavior for Different Analysis Target in Medicine.

PubMed

Lei, Yang; Yu, Dai; Bin, Zhang; Yang, Yang

2017-01-01

Clustering algorithms are widely used in analysis systems as a basis of data analysis. However, because the data are high-dimensional, a clustering algorithm may overlook the business relations between dimensions, especially in the medical field. As a result, the clustering result often does not meet the business goals of the users. If the clustering process can incorporate the knowledge of the users, that is, the doctor's knowledge or the analysis intent, the clustering result can be more satisfactory. In this paper, we propose an interactive K-means clustering method to improve the user's satisfaction with the result. The core of this method is to obtain the user's feedback on the clustering result and use it to optimize that result. A particle swarm optimization algorithm is then used to optimize the parameters, especially the weight settings in the clustering algorithm, so that they reflect the user's business preference as far as possible. Based on this parameter optimization and adjustment, the clustering result can be brought closer to the user's requirement. Finally, we use an example from breast cancer to test our method. The experiments show the better performance of our algorithm.
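To make the role of the weight settings concrete, here is a minimal sketch of the assignment step of a feature-weighted K-means. It is an illustration under assumptions, not the method of the paper: the weighted squared Euclidean distance, the function names, and the numbers are hypothetical, and the particle swarm search that would tune the weights from user feedback is omitted.

```python
def weighted_sq_dist(p, c, weights):
    """Squared Euclidean distance with one non-negative weight per feature."""
    return sum(w * (a - b) ** 2 for w, a, b in zip(weights, p, c))

def nearest_centroid(p, centroids, weights):
    """Index of the centroid closest to p under the weighted distance."""
    return min(range(len(centroids)), key=lambda j: weighted_sq_dist(p, centroids[j], weights))

# Hypothetical record with two features and two candidate cluster centres.
record = (2.0, 3.5)
centroids = [(1.0, 0.0), (5.0, 4.0)]

equal = nearest_centroid(record, centroids, (1.0, 1.0))      # both features count equally
favour_f0 = nearest_centroid(record, centroids, (1.0, 0.1))  # second feature down-weighted
print(equal, favour_f0)
```

Changing the weights changes which centroid a record is assigned to, which is the lever an optimizer can use to pull the partition towards the analyst's preference.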
The CLASSY clustering algorithm: Description, evaluation, and comparison with the iterative self-organizing clustering system (ISOCLS). [used for LACIE data]

NASA Technical Reports Server (NTRS)

Lennington, R. K.; Malek, H.

1978-01-01

A clustering method, CLASSY, was developed, which alternates maximum likelihood iteration with a procedure for splitting, combining, and eliminating the resulting statistics. The method maximizes the fit of a mixture of normal distributions to the observed first through fourth central moments of the data and produces an estimate of the proportions, means, and covariances in this mixture. The mathematical model which is the basis for CLASSY and the actual operation of the algorithm are described. Data comparing the performances of CLASSY and ISOCLS on simulated and actual LACIE data are presented.
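The maximum likelihood iteration for a normal mixture can be sketched with a plain expectation-maximization loop. The example below is a simplified, one-dimensional, two-component illustration and is not CLASSY itself: the split, combine, and eliminate steps and the moment-fitting criterion are left out, and the function names, starting values, and data are assumptions.

```python
import math

def normal_pdf(x, mu, var):
    """Density of a univariate normal distribution."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_two_normals(data, n_iter=50, var_floor=1e-6):
    """EM estimates of proportions, means and variances for a two-component mixture."""
    mu = [min(data), max(data)]   # crude starting means
    var = [1.0, 1.0]
    w = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each observation
        resp = []
        for x in data:
            dens = [w[k] * normal_pdf(x, mu[k], var[k]) for k in range(2)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate proportions, means and variances
        for k in range(2):
            nk = sum(r[k] for r in resp)
            w[k] = nk / len(data)
            mu[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var[k] = max(var_floor, sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, data)) / nk)
    return w, mu, var

# Hypothetical one-band pixel values drawn from two well-separated classes.
pixels = [-0.2, 0.1, 0.0, 0.3, -0.1, 9.8, 10.1, 10.0, 10.2, 9.9]
proportions, means, variances = em_two_normals(pixels)
print(proportions, means, variances)
```

A production version would work in log space and handle components whose responsibility mass shrinks to zero, which is roughly where a split/combine/eliminate procedure would come in.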
Three-dimensional cluster formation and structure in heterogeneous dose distribution of intensity modulated radiation therapy.

PubMed

Chao, Ming; Wei, Jie; Narayanasamy, Ganesh; Yuan, Yading; Lo, Yeh-Chi; Peñagarícano, José A

2018-05-01

To investigate three-dimensional cluster structure and its correlation to clinical endpoint in heterogeneous dose distributions from intensity modulated radiation therapy. Twenty-five clinical plans from twenty-one head and neck (HN) patients were used for a phenomenological study of the cluster structure formed from the dose distributions of organs at risk (OARs) close to the planning target volumes (PTVs). Initially, OAR clusters were searched to examine the pattern consistency among ten HN patients and five clinically similar plans from another HN patient. Second, clusters of the esophagus from another ten HN patients were scrutinized to correlate their sizes to radiobiological parameters. Finally, an extensive Monte Carlo (MC) procedure was implemented to gain deeper insights into the behavioral properties of the cluster formation. Clinical studies showed that OAR clusters had drastic differences despite similar PTV coverage among different patients, and the radiobiological parameters failed to positively correlate with the cluster sizes. MC study demonstrated the inverse relationship between the cluster size and the cluster connectivity, and the nonlinear changes in cluster size with dose thresholds. In addition, the clusters were insensitive to the shape of OARs. The results demonstrated that the cluster size could serve as an insightful index of normal tissue damage. The clinical outcome of the same dose-volume might be potentially different. Copyright © 2018 Elsevier B.V. All rights reserved.
Using concept mapping in the knowledge-to-action process to compare stakeholder opinions on barriers to use of cancer screening among South Asians.

PubMed

Lobb, Rebecca; Pinto, Andrew D; Lofters, Aisha

2013-03-23

Using the knowledge-to-action (KTA) process, this study examined barriers to use of evidence-based interventions to improve early detection of cancer among South Asians from the perspective of multiple stakeholders. In 2011, we used concept mapping with South Asian residents, and representatives from health service and community service organizations in the region of Peel Ontario. As part of concept mapping procedures, brainstorming sessions were conducted with stakeholders (n = 53) to identify barriers to cancer screening among South Asians. Participants (n = 46) sorted barriers into groups, and rated barriers from lowest (1) to highest (6) in terms of importance for use of mammograms, Pap tests and fecal occult blood tests, and how feasible it would be to address them. Multi-dimensional scaling, cluster analysis, and descriptive statistics were used to analyze the data. A total of 45 unique barriers to use of mammograms, Pap tests, and fecal occult blood tests among South Asians were classified into seven clusters using concept mapping procedures: patient's beliefs, fears, lack of social support; health system; limited knowledge among residents; limited knowledge among physicians; health education programs; ethno-cultural discordance with the health system; and cost. Overall, the top three ranked clusters of barriers were 'limited knowledge among residents,' 'ethno-cultural discordance,' and 'health education programs' across surveys. Only residents ranked 'cost' second in importance for fecal occult blood testing, and stakeholders from health service organizations ranked 'limited knowledge among physicians' third for the feasibility survey. Stakeholders from health services organizations ranked 'limited knowledge among physicians' fourth for all other surveys, but this cluster consistently ranked lowest among residents. The limited reach of cancer control programs to racial and ethnic minority groups is a critical implementation issue that requires attention. 
Opinions of community service and health service organizations on why this deficit in implementation occurs are fundamental to understanding the solutions because these are the settings in which evidence-based interventions are implemented. Using concept mapping within a KTA process can facilitate the engagement of multiple stakeholders in the utilization of study results and in identifying next steps for action.
[Typologies of Madrid's citizens (Spain) at the end-of-life: cluster analysis].

PubMed

Ortiz-Gonçalves, Belén; Perea-Pérez, Bernardo; Labajo González, Elena; Albarrán Juan, Elena; Santiago-Sáez, Andrés

2018-03-06

To establish typologies within Madrid's citizens (Spain) with regard to end-of-life by cluster analysis. The SPAD 8 programme was implemented in a sample from a health care centre in the autonomous region of Madrid (Spain). A multiple correspondence analysis technique was used, followed by a cluster analysis to create a dendrogram. A cross-sectional study was made beforehand with the results of the questionnaire. Five clusters stand out. Cluster 1: a group who preferred not to answer numerous questions (5%). Cluster 2: in favour of receiving palliative care and euthanasia (40%). Cluster 3: would oppose assisted suicide and would not ask for spiritual assistance (15%). Cluster 4: would like to receive palliative care and assisted suicide (16%). Cluster 5: would oppose assisted suicide and would ask for spiritual assistance (24%). The following four clusters stood out. Clusters 2 and 4 would like to receive palliative care, together with euthanasia (cluster 2) or assisted suicide (cluster 4). Clusters 4 and 5 regularly practiced their faith and their family members did not receive palliative care. Clusters 3 and 5 would be opposed to euthanasia and assisted suicide in particular. Clusters 2, 4 and 5 had not completed an advance directive document. Clusters 2 and 3 seldom practiced their faith. This study could be taken into consideration to improve the quality of end-of-life care choices. Copyright © 2017 SESPAS. Publicado por Elsevier España, S.L.U. All rights reserved.
Two worlds collide: Image analysis methods for quantifying structural variation in cluster molecular dynamics

DOE Office of Scientific and Technical Information (OSTI.GOV)

Steenbergen, K. G., E-mail: kgsteen@gmail.com; Gaston, N.

2014-02-14

Inspired by methods of remote sensing image analysis, we analyze structural variation in cluster molecular dynamics (MD) simulations through a unique application of the principal component analysis (PCA) and Pearson Correlation Coefficient (PCC). The PCA analysis characterizes the geometric shape of the cluster structure at each time step, yielding a detailed and quantitative measure of structural stability and variation at finite temperature. Our PCC analysis captures bond structure variation in MD, which can be used to both supplement the PCA analysis as well as compare bond patterns between different cluster sizes. Relying only on atomic position data, without requirement for a priori structural input, PCA and PCC can be used to analyze both classical and ab initio MD simulations for any cluster composition or electronic configuration. Taken together, these statistical tools represent powerful new techniques for quantitative structural characterization and isomer identification in cluster MD.
Two worlds collide: image analysis methods for quantifying structural variation in cluster molecular dynamics.

PubMed

Steenbergen, K G; Gaston, N

2014-02-14

Inspired by methods of remote sensing image analysis, we analyze structural variation in cluster molecular dynamics (MD) simulations through a unique application of the principal component analysis (PCA) and Pearson Correlation Coefficient (PCC). The PCA analysis characterizes the geometric shape of the cluster structure at each time step, yielding a detailed and quantitative measure of structural stability and variation at finite temperature. Our PCC analysis captures bond structure variation in MD, which can be used to both supplement the PCA analysis as well as compare bond patterns between different cluster sizes. Relying only on atomic position data, without requirement for a priori structural input, PCA and PCC can be used to analyze both classical and ab initio MD simulations for any cluster composition or electronic configuration. Taken together, these statistical tools represent powerful new techniques for quantitative structural characterization and isomer identification in cluster MD.
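As a rough illustration of how a Pearson correlation coefficient can compare bond patterns using only atomic positions, the sketch below correlates the sorted interatomic distances of two structures. This is a minimal sketch and not the authors' procedure: the choice of sorted pairwise distances as the compared quantity, the function names, and the 4-atom coordinates are all assumptions made for illustration.

```python
import math
from itertools import combinations

def pairwise_distances(positions):
    """Return the sorted list of all interatomic distances."""
    return sorted(math.dist(a, b) for a, b in combinations(positions, 2))

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical 4-atom structures (coordinates are illustrative only).
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
stretched = [(0.0, 0.0, 0.0), (1.2, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]

# A value near 1 means the two distance patterns are almost linearly related.
similarity = pearson(pairwise_distances(reference), pairwise_distances(stretched))
```

Because only distances enter the comparison, the score does not change when a structure is translated or rotated, which is one reason a position-only measure of this kind needs no a priori structural input.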
Clinical Phenotype of Diabetic Peripheral Neuropathy and Relation to Symptom Patterns: Cluster and Factor Analysis in Patients with Type 2 Diabetes in Korea.

PubMed

Won, Jong Chul; Im, Yong-Jin; Lee, Ji-Hyun; Kim, Chong Hwa; Kwon, Hyuk Sang; Cha, Bong-Yun; Park, Tae Sun

2017-01-01

Diabetic peripheral neuropathy (DPN) is the most common complication of diabetes. However, patients usually suffer not only from diverse sensory deficits but also from neuropathy-related discomforts. The aim of this study is to identify distinct groups of patients with DPN with respect to its clinical impacts on symptom patterns and comorbidities. A hierarchical cluster analysis and factor analysis were performed to identify relevant subgroups of patients with DPN (n = 1338) and symptom patterns. Patients with DPN were divided into three clusters: asymptomatic (cluster 1, n = 448, 33.5%), moderate symptoms with disturbed sleep (cluster 2, n = 562, 42.0%), and severe symptoms with decreased quality of life (cluster 3, n = 328, 24.5%). Patients in cluster 3, compared with clusters 1 and 2, were characterized by higher levels of HbA1c and more severe pain and physical impairments. Patients in cluster 2 had moderate pain levels but disturbed sleep patterns comparable to those in cluster 3. The frequency of symptoms on each item of MNSI by "painful" symptom pattern showed a similar distribution pattern with increasing intensities along the three clusters. Cluster and factor analysis endorsed the use of comprehensive and symptomatic subgrouping to individualize the evaluation of patients with DPN.
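The hierarchical cluster analysis mentioned above can be sketched as repeated merging of the two closest groups. This is a minimal sketch under stated assumptions: the abstract does not name a linkage rule, so average linkage with Euclidean distance is assumed here, and the function name and the six two-variable "patients" are hypothetical.

```python
import math

def average_linkage(points, n_clusters):
    """Merge the two closest clusters (average linkage) until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]

    def distance(a, b):
        # Mean Euclidean distance over all cross-cluster pairs.
        total = sum(math.dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                d = distance(clusters[p], clusters[q])
                if best is None or d < best[0]:
                    best = (d, p, q)
        _, p, q = best
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return [sorted(c) for c in clusters]

# Hypothetical scores (pain, sleep disturbance) for six patients.
scores = [(0, 0), (1, 0), (5, 6), (6, 5), (9, 9), (10, 9)]
groups = average_linkage(scores, 3)
```

Cutting the merge sequence at three groups mirrors the three-cluster solution reported in the abstract, although in practice the number of clusters would be chosen from the dendrogram or a fit criterion rather than fixed in advance.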
A hybrid monkey search algorithm for clustering analysis.

PubMed

Chen, Xin; Zhou, Yongquan; Luo, Qifang

2014-01-01

Clustering is a popular data analysis and data mining technique. The k-means clustering algorithm is one of the most commonly used methods. However, it depends heavily on the initial solution and easily falls into a local optimum. In view of these disadvantages of the k-means method, this paper proposes a hybrid monkey algorithm based on the search operator of the artificial bee colony algorithm for clustering analysis. Experiments on synthetic and real-life datasets show that the algorithm performs better than the basic monkey algorithm for clustering analysis.
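The sensitivity of k-means to its initial solution can be seen by running the same algorithm from several random starts and comparing the resulting within-cluster sums of squares. The sketch below uses plain random restarts, not the hybrid monkey algorithm proposed in the paper; the function names, the 1-D data, and the seeds are assumptions chosen for illustration.

```python
import random

def kmeans_1d(values, k, seed, iterations=50):
    """Plain Lloyd's k-means on 1-D data; returns (centroids, inertia)."""
    rng = random.Random(seed)
    centroids = rng.sample(values, k)  # random initial solution
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: (v - centroids[c]) ** 2)
            groups[nearest].append(v)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [sum(g) / len(g) if g else centroids[i]
                     for i, g in enumerate(groups)]
    inertia = sum(min((v - c) ** 2 for c in centroids) for v in values)
    return centroids, inertia

def best_of_restarts(values, k, seeds):
    """Run k-means once per seed and keep the lowest-inertia solution."""
    return min((kmeans_1d(values, k, s) for s in seeds), key=lambda r: r[1])

data = [1.0, 1.1, 0.9, 5.0, 5.2, 4.8, 9.0, 9.1, 8.9]
runs = [kmeans_1d(data, 3, s) for s in range(10)]
centroids, inertia = best_of_restarts(data, 3, range(10))
```

Printing the inertia of each entry in `runs` shows whether different seeds converge to different solutions. Restarts only sample the space of initial solutions, which is the gap that population-based search methods such as the one in the paper aim to close.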
Blue emitting undecaplatinum clusters

NASA Astrophysics Data System (ADS)

Chakraborty, Indranath; Bhuin, Radha Gobinda; Bhat, Shridevi; Pradeep, T.

2014-07-01

A blue luminescent 11-atom platinum cluster showing step-like optical features and the absence of plasmon absorption was synthesized. The cluster was purified using high performance liquid chromatography (HPLC). Electrospray ionization (ESI) and matrix assisted laser desorption ionization (MALDI) mass spectrometry (MS) suggest a composition, Pt11(BBS)8, which was confirmed by a range of other experimental tools. The cluster is highly stable and compatible with many organic solvents. Electronic supplementary information (ESI) available: Details of experimental procedures, instrumentation, chromatogram of the crude cluster; SEM/EDAX, DLS, PXRD, TEM, FT-IR, and XPS of the isolated Pt11 cluster; UV/Vis, MALDI MS and SEM/EDAX of isolated 2 and 3; and 195Pt NMR of the K2PtCl6 standard. See DOI: 10.1039/c4nr02778g
New measurements of radial velocities in clusters of galaxies. II

NASA Astrophysics Data System (ADS)

Proust, D.; Mazure, A.; Sodre, L.; Capelato, H.; Lund, G.

1988-03-01

Heliocentric radial velocities are determined for 100 galaxies in five clusters, on the basis of 380-518-nm observations obtained using a CCD detector coupled by optical fibers to the OCTOPUS multiobject spectrograph at the Cassegrain focus of the 3.6-m telescope at ESO La Silla. The data-reduction procedures and error estimates are discussed, and the results are presented in tables and graphs and briefly characterized.
Mass propagation of shoots of Stevia rebaudiana using a large scale bioreactor.

PubMed

Akita, M; Shigeoka, T; Koizumi, Y; Kawamura, M

1994-01-01

A procedure for the mass propagation of multiple shoots of Stevia rebaudiana is described. Isolated shoot primordia were used as the inoculum to obtain clusters of shoot primordia. Such clusters were grown in a 500 liter bioreactor to obtain shoots. A total of 64.6 Kg of shoots were propagated from 460 g of the inoculated shoot primordia. These shoots were easily acclimatized in soil.

Simultaneous contrast: evidence from licking microstructure and cross-solution comparisons.

PubMed

Dwyer, Dominic M; Lydall, Emma S; Hayward, Andrew J

2011-04-01

The microstructure of rats' licking responses was analyzed to investigate both "classic" simultaneous contrast (e.g., Flaherty & Largen, 1975) and a novel discrete-trial contrast procedure where access to an 8% test solution of sucrose was preceded by a sample of either 2%, 8%, or 32% sucrose (Experiments 1 and 2, respectively). Consumption of a given concentration of sucrose was higher when consumed alongside a low rather than high concentration comparison solution (positive contrast) and consumption of a given concentration of sucrose was lower when consumed alongside a high rather than a low concentration comparison solution (negative contrast). Furthermore, positive contrast increased the size of lick clusters while negative contrast decreased the size of lick clusters. Lick cluster size has a positive monotonic relationship with the concentration of palatable solutions and so positive and negative contrasts produced changes in lick cluster size that were analogous to raising or lowering the concentration of the test solution respectively. Experiment 3 utilized the discrete-trial procedure and compared contrast between two solutions of the same type (sucrose-sucrose or maltodextrin-maltodextrin) or contrast across solutions (sucrose-maltodextrin or maltodextrin-sucrose). Contrast effects on consumption were present, but reduced in size, in the cross-solution conditions. Moreover, lick cluster sizes were not affected at all by cross-solution contrasts as they were by same-solution contrasts. These results are consistent with the idea that simultaneous contrast effects depend, at least partially, on sensory mechanisms.
Cluster analysis of spontaneous preterm birth phenotypes identifies potential associations among preterm birth mechanisms

PubMed Central

Esplin, M Sean; Manuck, Tracy A.; Varner, Michael W.; Christensen, Bryce; Biggio, Joseph; Bukowski, Radek; Parry, Samuel; Zhang, Heping; Huang, Hao; Andrews, William; Saade, George; Sadovsky, Yoel; Reddy, Uma M.; Ilekis, John

2015-01-01

Objective: We sought to employ an innovative tool based on common biological pathways to identify specific phenotypes among women with spontaneous preterm birth (SPTB), in order to enhance investigators' ability to identify and highlight common mechanisms and underlying genetic factors responsible for SPTB. Study Design: A secondary analysis of a prospective case-control multicenter study of SPTB. All cases delivered a preterm singleton at SPTB ≤34.0 weeks gestation. Each woman was assessed for the presence of underlying SPTB etiologies. A hierarchical cluster analysis was used to identify groups of women with homogeneous phenotypic profiles. One of the phenotypic clusters was selected for candidate gene association analysis using VEGAS software. Results: 1028 women with SPTB were assigned phenotypes. Hierarchical clustering of the phenotypes revealed five major clusters. Cluster 1 (N=445) was characterized by maternal stress, cluster 2 (N=294) by premature membrane rupture, cluster 3 (N=120) by familial factors, and cluster 4 (N=63) by maternal comorbidities. Cluster 5 (N=106) was multifactorial, characterized by infection (INF), decidual hemorrhage (DH) and placental dysfunction (PD). These three phenotypes were highly correlated by Chi-square analysis [PD and DH (p<2.2e-6); PD and INF (p=6.2e-10); INF and DH (p=0.0036)]. Gene-based testing identified the INS (insulin) gene as significantly associated with cluster 3 of SPTB. Conclusion: We identified 5 major clusters of SPTB based on a phenotype tool and hierarchical clustering. There was significant correlation between several of the phenotypes. The INS gene was associated with familial factors underlying SPTB. PMID:26070700
Method for exploratory cluster analysis and visualisation of single-trial ERP ensembles.

PubMed

Williams, N J; Nasuto, S J; Saddy, J D

2015-07-30

The validity of ensemble averaging on event-related potential (ERP) data has been questioned, due to its assumption that the ERP is identical across trials. Thus, there is a need for preliminary testing for cluster structure in the data. We propose a complete pipeline for the cluster analysis of ERP data. To increase the signal-to-noise ratio (SNR) of the raw single-trials, we used a denoising method based on Empirical Mode Decomposition (EMD). Next, we used a bootstrap-based method to determine the number of clusters, through a measure called the Stability Index (SI). We then used a clustering algorithm based on a Genetic Algorithm (GA) to define initial cluster centroids for subsequent k-means clustering. Finally, we visualised the clustering results through a scheme based on Principal Component Analysis (PCA). After validating the pipeline on simulated data, we tested it on data from two experiments - a P300 speller paradigm on a single subject and a language processing study on 25 subjects. Results revealed evidence for the existence of 6 clusters in one experimental condition from the language processing study. Further, a two-way chi-square test revealed an influence of subject on cluster membership. Our analysis operates on denoised single-trials, the number of clusters is determined in a principled manner, and the results are presented through an intuitive visualisation. Given the cluster structure in some experimental conditions, we suggest application of cluster analysis as a preliminary step before ensemble averaging. Copyright © 2015 Elsevier B.V. All rights reserved.
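The two-way chi-square test of subject against cluster membership can be sketched from a contingency table of trial counts. This is a minimal sketch of the standard Pearson chi-square statistic, not the authors' code; the function name and the counts are hypothetical.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected if subject and cluster were independent.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof

# Hypothetical trial counts: rows are subjects, columns are clusters.
counts = [[30, 10], [10, 30]]
stat, dof = chi_square_statistic(counts)
```

For this made-up table every expected count is 20, so the statistic is 20 with one degree of freedom, well above the 3.841 critical value at the 0.05 level, which would indicate that cluster membership depends on subject.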
Clustering and Hazard Estimation in the Auckland Volcanic Field, New Zealand

NASA Astrophysics Data System (ADS)

Cronin, S. J.; Bebbington, M. S.

2009-12-01

The Auckland Volcanic Field (AVF) with its 49 eruptive centres formed over the last c. 250 ka presents several unique challenges to our understanding of distributed volcanic field construction and evolution. Due to the youth of the field, high-resolution stratigraphy of eruption centres and ash-fall sequences is possible, allowing time-breaks, soil and peat formation between eruption units to be identified. Radiocarbon dating of sediments between volcanic deposits shows that at least five of the centres have erupted on more than one occasion, with time breaks of 50-100 years between episodes. In addition, paleomagnetic and ash fall evidence implies that there has been strong clustering of eruption events over time, with a specific “flare-up” event involving possibly up to 19 eruptions occurring between 35-25 ka, in spatially disparate locations. An additional complicating factor is that the only centre that shows any major evidence for evolution out of standard alkali basaltic compositions is also the youngest and largest in volume by several orders of magnitude. All of these features of the AVF, along with relatively poor age-control for many of the vents, make spatio-temporal hazard forecasting for the field based on assumptions of past behaviour extremely difficult. Any relationships that take volumetric considerations into account are particularly difficult, since any trend analysis produces unreasonably large future eruptions. The most reasonable model is spatial, via eruption location. We have re-examined the age progression of eruptive events in the AVF, incorporating the most reliable sources of age and stratigraphic data, including developing new correlations between ashfall records in cores and likely vent locations via a probabilistic model of tephra dispersal. A Monte Carlo procedure using the age-progression, stratigraphy and dating constraints can then randomly reproduce likely orderings of events in the field.
These were fitted by a clustering-based model of vent locations as originally applied by Magill et al (2005: Mathematical Geol. 37: 227-242) to the Allen and Smith (1994; Geosci. Report Shizuoka Univ 20: 5-14) age ordering of volcanism at AVF. Applying this model, modified by allowing continuation of activity at or around the youngest event, to sampled age orderings from the Monte Carlo procedure shows a very different spatial forecast to the earlier analysis. It is also different to the distribution from randomly ordered events, implying there is at least some clustering control on the location of eruptions in the field. Further iterations of this modelling approach will be tested in relation to eruptive volume and applied to other comparative volcanic fields.
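The Monte Carlo step that reproduces likely event orderings can be sketched as rejection sampling: draw an age for each vent within its dating range, discard draws that break a stratigraphic constraint, and record the resulting order. This is a minimal sketch, not the authors' procedure; the uniform age distributions, the function name, and the three vents with their age ranges are assumptions made for illustration.

```python
import random

def sample_orderings(age_ranges, must_precede, n_samples, seed=0):
    """Draw event orderings (oldest first) consistent with dating and stratigraphy.

    age_ranges: {name: (min_age_ka, max_age_ka)}
    must_precede: list of (older, younger) pairs taken from stratigraphy
    """
    rng = random.Random(seed)
    orderings = []
    while len(orderings) < n_samples:
        ages = {name: rng.uniform(lo, hi) for name, (lo, hi) in age_ranges.items()}
        # Rejection step: discard draws that violate a stratigraphic constraint.
        if all(ages[older] > ages[younger] for older, younger in must_precede):
            orderings.append(sorted(ages, key=ages.get, reverse=True))
    return orderings

# Hypothetical vents with overlapping age ranges in ka.
age_ranges = {"A": (20.0, 40.0), "B": (25.0, 35.0), "C": (10.0, 30.0)}
orderings = sample_orderings(age_ranges, [("A", "B")], n_samples=100)
```

Each sampled ordering can then be passed to a spatial model, so that the forecast reflects uncertainty in the eruption sequence rather than a single assumed age ordering.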
A diagnostic for determining the quality of single-reference electron correlation methods

NASA Technical Reports Server (NTRS)

Lee, Timothy J.; Taylor, Peter R.

1989-01-01

It was recently proposed that the Euclidean norm of the t(sub 1) vector of the coupled cluster wave function (normalized by the number of electrons included in the correlation procedure) could be used to determine whether a single-reference-based electron correlation procedure is appropriate. This diagnostic, T(sub 1), is defined for use with self-consistent-field molecular orbitals and is invariant to the same orbital rotations as the coupled cluster energy. T(sub 1) is investigated for several different chemical systems which exhibit a range of multireference behavior, and is shown to be an excellent measure of the importance of non-dynamical electron correlation and is far superior to C(sub 0) from a singles and doubles configuration interaction wave function. It is further suggested that when the aim is to recover a large fraction of the dynamical electron correlation energy, a large T(sub 1) (i.e., greater than 0.02) probably indicates the need for a multireference electron correlation procedure.
A diagnostic for determining the quality of single-reference electron correlation methods

NASA Technical Reports Server (NTRS)

Lee, Timothy J.; Taylor, Peter R.

1989-01-01

It was recently proposed that the Euclidean norm of the t sub 1 vector of the coupled cluster wave function (normalized by the number of electrons included in the correlation procedure) could be used to determine whether a single-reference-based electron correlation procedure is appropriate. This diagnostic, T sub 1, is defined for use with self-consistent-field molecular orbitals and is invariant to the same orbital rotations as the coupled cluster energy. T sub 1 is investigated for several different chemical systems which exhibit a range of multireference behavior, and is shown to be an excellent measure of the importance of nondynamical electron correlation and is far superior to C sub 0 from a singles and doubles configuration interaction wave function. It is further suggested that when the aim is to recover a large fraction of the dynamical electron correlation energy, a large T sub 1 (i.e., greater than 0.02) probably indicates the need for a multireference electron correlation procedure.
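The diagnostic described above amounts to a short calculation once the single-excitation amplitudes are known. The sketch below assumes the normalization is a division of the Euclidean norm by the square root of the number of correlated electrons, which is how the diagnostic is usually written; the abstract itself only says "normalized by the number of electrons". The function name and the amplitude values are hypothetical.

```python
import math

# Threshold quoted in the abstract for a "large" diagnostic value.
T1_THRESHOLD = 0.02

def t1_diagnostic(t1_amplitudes, n_correlated_electrons):
    """Euclidean norm of the t1 vector divided by sqrt(N) (assumed normalization)."""
    squared_norm = sum(t * t for t in t1_amplitudes)
    return math.sqrt(squared_norm / n_correlated_electrons)

# Hypothetical amplitudes for a system with 4 correlated electrons.
value = t1_diagnostic([0.03, -0.04], 4)
needs_multireference = value > T1_THRESHOLD
```

With these made-up amplitudes the norm is 0.05 and the diagnostic is 0.025, which exceeds 0.02 and would therefore suggest considering a multireference treatment.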
A New Two-Step Approach for Hands-On Teaching of Gene Technology: Effects on Students' Activities During Experimentation in an Outreach Gene Technology Lab

NASA Astrophysics Data System (ADS)

Scharfenberg, Franz-Josef; Bogner, Franz X.

2011-08-01

Emphasis on improving higher level biology education continues. A new two-step approach to the experimental phases within an outreach gene technology lab, derived from cognitive load theory, is presented. We compared our approach using a quasi-experimental design with the conventional one-step mode. The difference consisted of additional focused discussions combined with students writing down their ideas (step one) prior to starting any experimental procedure (step two). We monitored students' activities during the experimental phases by continuously videotaping 20 work groups within each approach (N = 131). Subsequent classification of students' activities yielded 10 categories (with well-fitting intra- and inter-observer scores with respect to reliability). Based on the students' individual time budgets, we evaluated students' roles during experimentation from their prevalent activities (by independently using two cluster analysis methods). Independently of the approach, two common clusters emerged, which we labeled as 'all-rounders' and as 'passive students', and two clusters specific to each approach: 'observers' as well as 'high-experimenters' were identified only within the one-step approach whereas under the two-step conditions 'managers' and 'scribes' were identified. Potential changes in group-leadership style during experimentation are discussed, and conclusions for optimizing science teaching are drawn.
A Perfusion MRI Study of Emotional Valence and Arousal in Parkinson's Disease

PubMed Central

Limsoontarakul, Sunsern; Campbell, Meghan C.; Black, Kevin J.

2011-01-01

Background. Brain regions subserving emotion have mostly been studied using functional magnetic resonance imaging (fMRI) during emotion provocation procedures in healthy participants. Objective. To identify neuroanatomical regions associated with spontaneous changes in emotional state over time. Methods. Self-rated emotional valence and arousal scores, and regional cerebral blood flow (rCBF) measured by perfusion MRI, were measured 4 or 8 times spanning at least 2 weeks in each of 21 subjects with Parkinson's disease (PD). A random-effects SPM analysis, corrected for multiple comparisons, identified significant clusters of contiguous voxels in which rCBF varied with valence or arousal. Results. Emotional valence correlated positively with rCBF in several brain regions, including medial globus pallidus, orbital prefrontal cortex (PFC), and white matter near putamen, thalamus, insula, and medial PFC. Valence correlated negatively with rCBF in striatum, subgenual cingulate cortex, ventrolateral PFC, and precuneus—posterior cingulate cortex (PCC). Arousal correlated positively with rCBF in clusters including claustrum-thalamus-ventral striatum and inferior parietal lobule and correlated negatively in clusters including posterior insula—mediodorsal thalamus and midbrain. Conclusion. This study demonstrates that the temporal stability of perfusion MRI allows within-subject investigations of spontaneous fluctuations in mental state, such as mood, over relatively long-time intervals. PMID:21969917
Camps 2.0: exploring the sequence and structure space of prokaryotic, eukaryotic, and viral membrane proteins.

PubMed

Neumann, Sindy; Hartmann, Holger; Martin-Galiano, Antonio J; Fuchs, Angelika; Frishman, Dmitrij

2012-03-01

Structural bioinformatics of membrane proteins is still in its infancy, and the picture of their fold space is only beginning to emerge. Because only a handful of three-dimensional structures are available, sequence comparison and structure prediction remain the main tools for investigating sequence-structure relationships in membrane protein families. Here we present a comprehensive analysis of the structural families corresponding to α-helical membrane proteins with at least three transmembrane helices. The new version of our CAMPS database (CAMPS 2.0) covers nearly 1300 eukaryotic, prokaryotic, and viral genomes. Using an advanced classification procedure, which is based on high-order hidden Markov models and considers both sequence similarity as well as the number of transmembrane helices and loop lengths, we identified 1353 structurally homogeneous clusters roughly corresponding to membrane protein folds. Only 53 clusters are associated with experimentally determined three-dimensional structures, and for these clusters CAMPS is in reasonable agreement with structure-based classification approaches such as SCOP and CATH. We therefore estimate that ∼1300 structures would need to be determined to provide a sufficient structural coverage of polytopic membrane proteins. CAMPS 2.0 is available at http://webclu.bio.wzw.tum.de/CAMPS2.0/. Copyright © 2011 Wiley Periodicals, Inc.
A Study on 4 Reactions Forming 46Ti*

NASA Astrophysics Data System (ADS)

Cicerchia, M.; Marchi, T.; Gramegna, F.; Cinausero, M.; Fabris, D.; Mantovani, G.; Degerlier, M.; Morelli, L.; Bruno, M.; DAgostino, M.; Frosin, C.; Barlini, S.; Piantelli, S.; Valdrè, S.; Bini, M.; Pasquali, G.; Casini, G.; Pastore, G.; Gruyer, D.; Ottanelli, P.; Camaiani, A.; Gelli, N.; Olmi, A.; Poggi, G.; Lombardo, I.; Dell'Aquila, D.; Cieplicka-Orynczak, N.

2018-02-01

The NUCL-EX collaboration is carrying out an extensive research program on preequilibrium emission of light charged particles from hot nuclei. The ultimate goal is to study how cluster structures affect nuclear reactions [1,2,3,4]. Indeed, a strong correlation between nuclear structure and reaction dynamics emerges when some nucleons or clusters of nucleons are emitted or captured [5]. For this purpose, the four reactions 16O+30Si, 16O+30Si, 18O+28Si and 19F+27Al have been measured at about 120 MeV projectile energy. Experimental data were collected at Legnaro National Laboratories, using the GARFIELD+RCo array, fully equipped with digital electronics [6]. Following an initial identification of particles and the energy calibration procedures, the complete analysis is being performed on an event-by-event basis. Experimental data are then compared to the theoretical predictions, where events are generated by numerical codes based on pre-equilibrium and statistical models and then filtered through a software replica of the setup. Differences between the experimental data and the predicted data highlight effects related to the entrance channel and to the cluster nature of the colliding ions. After a general introduction on the experimental campaign, this contribution will focus on the preliminary results obtained so far.
Elucidation of the Pattern of the Onset of Male Lower Urinary Tract Symptoms Using Cluster Analysis: Efficacy of Tamsulosin in Each Symptom Group.

PubMed

Aikawa, Ken; Kataoka, Masao; Ogawa, Soichiro; Akaihata, Hidenori; Sato, Yuichi; Yabe, Michihiro; Hata, Junya; Koguchi, Tomoyuki; Kojima, Yoshiyuki; Shiragasawa, Chihaya; Kobayashi, Toshimitsu; Yamaguchi, Osamu

2015-08-01

To present a new grouping of male patients with lower urinary tract symptoms (LUTS) based on symptom patterns and clarify whether the therapeutic effect of α1-blocker differs among the groups. We performed secondary analysis of anonymous data from 4815 patients enrolled in a postmarketing surveillance study of tamsulosin in Japan. Data on 7 International Prostate Symptom Score (IPSS) items at the initial visit were used in the cluster analysis. IPSS and quality of life (QOL) scores before and after tamsulosin treatment for 12 weeks were assessed in each cluster. Partial correlation coefficients were also obtained for IPSS and QOL scores based on changes before and after treatment. Five symptom groups were identified by cluster analysis of IPSS. Based on its symptom profile, each cluster was labeled as minimal type (cluster 1), multiple severe type (cluster 2), weak stream type (cluster 3), storage type (cluster 4), and voiding type (cluster 5). Prevalence and the mean symptom score were significantly improved in almost all symptoms in all clusters by tamsulosin treatment. Nocturia and weak stream had the strongest effect on QOL in clusters 1, 2, and 4 and clusters 3 and 5, respectively. The study clarified that 5 characteristic symptom patterns exist by cluster analysis of IPSS in male patients with LUTS. Tamsulosin improved various symptoms and QOL in each symptom group. The study reports many male patients with LUTS being satisfied with monotherapy using tamsulosin and suggests the usefulness of α1-blockers as a drug of first choice. Copyright © 2015 Elsevier Inc. All rights reserved.
Multiscale visual quality assessment for cluster analysis with self-organizing maps

NASA Astrophysics Data System (ADS)

Bernard, Jürgen; von Landesberger, Tatiana; Bremm, Sebastian; Schreck, Tobias

2011-01-01

Cluster analysis is an important data mining technique for analyzing large amounts of data, reducing many objects to a limited number of clusters. Cluster visualization techniques aim at supporting the user in better understanding the characteristics and relationships among the found clusters. While promising approaches to visual cluster analysis already exist, these usually fall short of incorporating the quality of the obtained clustering results. However, due to the nature of the clustering process, quality plays an important role, as for most practical data sets, typically many different clusterings are possible. Being aware of clustering quality is important to judge the expressiveness of a given cluster visualization, or to adjust the clustering process with refined parameters, among others. In this work, we present an encompassing suite of visual tools for quality assessment of an important visual cluster algorithm, namely, the Self-Organizing Map (SOM) technique. We define, measure, and visualize the notion of SOM cluster quality along a hierarchy of cluster abstractions. The quality abstractions range from simple scalar-valued quality scores up to the structural comparison of a given SOM clustering with output of additional supportive clustering methods. The suite of methods allows the user to assess the SOM quality on the appropriate abstraction level, and arrive at improved clustering results. We implement our tools in an integrated system, apply it on experimental data sets, and show its applicability.
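One common example of a simple scalar-valued quality score for a SOM is the quantization error, the mean distance between each sample and its best-matching unit. The sketch below is offered as an assumption about what such a score could look like; the abstract does not say which scores the authors use, and the function name and data are hypothetical.

```python
import math

def quantization_error(samples, codebook):
    """Mean Euclidean distance from each sample to its best-matching unit."""
    return sum(min(math.dist(s, w) for w in codebook) for s in samples) / len(samples)

# Hypothetical 2-D samples and two candidate codebooks (SOM unit weights).
samples = [(0.0, 0.0), (1.0, 0.0), (4.0, 0.0)]
coarse = [(0.0, 0.0), (4.0, 0.0)]
fine = [(0.0, 0.0), (1.0, 0.0), (4.0, 0.0)]

coarse_error = quantization_error(samples, coarse)
fine_error = quantization_error(samples, fine)
```

A lower value means the units represent the data more closely, but the score alone says nothing about whether the map preserves neighbourhood structure, which is why a hierarchy of complementary quality views is useful.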
Hydrometeor classification through statistical clustering of polarimetric radar measurements: a semi-supervised approach

NASA Astrophysics Data System (ADS)

Besic, Nikola; Ventura, Jordi Figueras i.; Grazioli, Jacopo; Gabella, Marco; Germann, Urs; Berne, Alexis

2016-09-01

Polarimetric radar-based hydrometeor classification is the procedure of identifying different types of hydrometeors by exploiting polarimetric radar observations. The main drawback of the existing supervised classification methods, mostly based on fuzzy logic, is a significant dependency on a presumed electromagnetic behaviour of different hydrometeor types. Namely, the results of the classification largely rely upon the quality of scattering simulations. When it comes to the unsupervised approach, it lacks the constraints related to the hydrometeor microphysics. The idea of the proposed method is to compensate for these drawbacks by combining the two approaches in a way that microphysical hypotheses can, to a degree, adjust the content of the classes obtained statistically from the observations. This is done by means of an iterative approach, performed offline, which, in a statistical framework, examines clustered representative polarimetric observations by comparing them to the presumed polarimetric properties of each hydrometeor class. Aside from comparing, a routine alters the content of clusters by encouraging further statistical clustering in case of non-identification. By merging all identified clusters, the multi-dimensional polarimetric signatures of various hydrometeor types are obtained for each of the studied representative datasets, i.e. for each radar system of interest. These are depicted by sets of centroids which are then employed in operational labelling of different hydrometeors. The method has been applied on three C-band datasets, each acquired by different operational radar from the MeteoSwiss Rad4Alp network, as well as on two X-band datasets acquired by two research mobile radars. The results are discussed through a comparative analysis which includes a corresponding supervised and unsupervised approach, emphasising the operational potential of the proposed method.
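The final operational step, labelling observations with the centroids obtained for each hydrometeor type, can be sketched as a nearest-centroid assignment. This is a minimal sketch under stated assumptions: plain Euclidean distance in an unscaled two-variable feature space is assumed, and the function name, class names, centroid values, and observations are purely illustrative rather than taken from the study.

```python
import math

def label_observations(observations, centroids):
    """Assign each observation the label of its nearest centroid."""
    labels = []
    for obs in observations:
        label = min(centroids, key=lambda name: math.dist(obs, centroids[name]))
        labels.append(label)
    return labels

# Hypothetical centroids in a 2-D feature space (values are illustrative only).
centroids = {"rain": (30.0, 1.5), "snow": (15.0, 0.3), "hail": (55.0, 0.0)}
observations = [(28.0, 1.2), (16.0, 0.5), (52.0, 0.1)]
labels = label_observations(observations, centroids)
```

In a real setting the polarimetric variables have very different ranges, so some form of scaling or a weighted distance would be needed before a nearest-centroid rule gives sensible labels.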
Complex regional pain syndrome: evidence for warm and cold subtypes in a large prospective clinical sample.

PubMed

Bruehl, Stephen; Maihöfner, Christian; Stanton-Hicks, Michael; Perez, Roberto S G M; Vatine, Jean-Jacques; Brunner, Florian; Birklein, Frank; Schlereth, Tanja; Mackey, Sean; Mailis-Gagnon, Angela; Livshitz, Anatoly; Harden, R Norman

2016-08-01

Limited research suggests that there may be Warm complex regional pain syndrome (CRPS) and Cold CRPS subtypes, with inflammatory mechanisms contributing most strongly to the former. This study for the first time used an unbiased statistical pattern recognition technique to evaluate whether distinct Warm vs Cold CRPS subtypes can be discerned in the clinical population. An international, multisite study was conducted using standardized procedures to evaluate signs and symptoms in 152 patients with clinical CRPS at baseline, with 3-month follow-up evaluations in 112 of these patients. Two-step cluster analysis using automated cluster selection identified a 2-cluster solution as optimal. Results revealed a Warm CRPS patient cluster characterized by a warm, red, edematous, and sweaty extremity and a Cold CRPS patient cluster characterized by a cold, blue, and less edematous extremity. Median pain duration was significantly (P < 0.001) shorter in the Warm CRPS (4.7 months) than in the Cold CRPS subtype (20 months), with pain intensity comparable. A derived total inflammatory score was significantly (P < 0.001) elevated in the Warm CRPS group (compared with Cold CRPS) at baseline but diminished significantly (P < 0.001) over the follow-up period, whereas this score did not diminish in the Cold CRPS group (time × subtype interaction: P < 0.001). Results support the existence of a Warm CRPS subtype common in patients with acute (<6 months) CRPS and a relatively distinct Cold CRPS subtype most common in chronic CRPS. The pattern of clinical features suggests that inflammatory mechanisms contribute most prominently to the Warm CRPS subtype but that these mechanisms diminish substantially during the first year postinjury.
Use of a modified Comprehensive Pain Evaluation Questionnaire: Characteristics and functional status of patients on entry to a tertiary care pain clinic

PubMed Central

Nelli, Jennifer M; Nicholson, Keith; Lakha, S Fatima; Louffat, Ada F; Chapparo, Luis; Furlan, Julio; Mailis-Gagnon, Angela

2012-01-01

BACKGROUND: With increasing knowledge of chronic pain, clinicians have attempted to assess chronic pain patients with lengthy assessment tools. OBJECTIVES: To describe the functional and emotional status of patients presenting to a tertiary care pain clinic; to assess the reliability and validity of a diagnostic classification system for chronic pain patients modelled after the Multidimensional Pain Inventory; to provide psychometric data on a modified Comprehensive Pain Evaluation Questionnaire (CPEQ); and to evaluate the relationship between the modified CPEQ construct scores and clusters with Diagnostic and Statistical Manual, Fourth Edition – Text Revision Pain Disorder diagnoses. METHODS: Data on 300 new patients over the course of nine months were collected using standardized assessment procedures plus a modified CPEQ at the Comprehensive Pain Program, Toronto Western Hospital, Toronto, Ontario. RESULTS: Cluster analysis of the modified CPEQ revealed three patient profiles, labelled Adaptive Copers, Dysfunctional, and Interpersonally Distressed, which closely resembled those previously reported. The distribution of modified CPEQ construct T scores across profile subtypes was similar to that previously reported for the original CPEQ. A novel finding was that of a strong relationship between the modified CPEQ clusters and constructs with Diagnostic and Statistical Manual, Fourth Edition – Text Revision Pain Disorder diagnoses. DISCUSSION AND CONCLUSIONS: The CPEQ, either the original or modified version, yields reproducible results consistent with the results of other studies. This technique may usefully classify chronic pain patients, but more work is needed to determine the meaning of the CPEQ clusters, what psychological or biomedical variables are associated with CPEQ constructs or clusters, and whether this instrument may assist in treatment planning or predict response to treatment. PMID:22518368
Hurdle models for multilevel zero-inflated data via h-likelihood.

PubMed

Molas, Marek; Lesaffre, Emmanuel

2010-12-30

Count data often exhibit overdispersion. One type of overdispersion arises when there is an excess of zeros in comparison with the standard Poisson distribution. Zero-inflated Poisson and hurdle models have been proposed to perform a valid likelihood-based analysis to account for the surplus of zeros. Further, data often arise in clustered, longitudinal or multiple-membership settings. The proper analysis needs to reflect the design of a study. Typically random effects are used to account for dependencies in the data. We examine the h-likelihood estimation and inference framework for hurdle models with random effects for complex designs. We extend the h-likelihood procedures to fit hurdle models, thereby extending h-likelihood to truncated distributions. Two applications of the methodology are presented. Copyright © 2010 John Wiley & Sons, Ltd.
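As a rough illustration of the hurdle idea described in this abstract, the sketch below evaluates the probability mass function of a plain hurdle Poisson model: a point mass at zero plus a zero-truncated Poisson for positive counts. It deliberately omits the random effects and the h-likelihood estimation that the paper is about; the function name and the parameter values `p_zero` and `lam` are illustrative assumptions, not taken from the source.

```python
import math

def hurdle_poisson_pmf(k: int, p_zero: float, lam: float) -> float:
    """P(Y = k) for a hurdle Poisson model without random effects."""
    if k < 0:
        return 0.0
    if k == 0:
        # The "hurdle": zeros are modelled by their own probability.
        return p_zero
    # Positive counts follow a zero-truncated Poisson, so the ordinary
    # Poisson pmf is renormalized over k >= 1.
    poisson_k = math.exp(-lam) * lam ** k / math.factorial(k)
    return (1.0 - p_zero) * poisson_k / (1.0 - math.exp(-lam))

# Hypothetical parameters: 60% zeros, truncated-Poisson rate of 2.
probs = [hurdle_poisson_pmf(k, p_zero=0.6, lam=2.0) for k in range(60)]
total = sum(probs)  # should be numerically close to 1
```

Because the zero probability is a free parameter, the same form can represent either an excess or a deficit of zeros relative to the standard Poisson distribution.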
Genetic structure of Cantharellus formosus populations in a second-growth temperate rain forest of the Pacific Northwest

USGS Publications Warehouse

Redman, Regina S.; Ranson, Judith; Rodriguez, Rusty J.

2006-01-01

Cantharellus formosus growing on the Olympic Peninsula of the Pacific Northwest was sampled from September to November 1995 for genetic analysis. A total of ninety-six basidiomes from five clusters separated from one another by 3-25 meters were genetically characterized by PCR analysis of 13 arbitrary loci and rDNA sequences. The number of basidiomes in each cluster varied from 15 to 25 and genetic analysis delineated 15 genets among the clusters. Analysis of variance utilizing thirteen apPCR-generated genetic molecular markers and PCR amplification of the ribosomal ITS regions indicated that 81.41% of the genetic variation occurred between clusters and 18.59% within clusters. Proximity of the basidiomes within a cluster was not an indicator of genotypic similarity. The molecular profiles of each cluster were distinct and defined as unique populations containing 2-6 genets. The monitoring and analysis of this species through non-lethal sampling and future applications are discussed.
[Achene morphology cluster analysis of Taraxacum F. H. Wigg. from northeast China and molecule systematics evidence determined by SRAP].

PubMed

Li, Hai-juan; Zhao, Xin; Jia, Qing-fei; Li, Tian-lai; Ning, Wei

2012-08-01

The morphological and micro-morphological characteristics of the achenes of six species of the genus Taraxacum from northeastern China were examined, together with SRAP cluster analysis, as evidence for their classification. The achenes were observed by microscope and EPMA. Cluster analysis was performed on the basis of the size, shape, cone proportion, color, and surface sculpture of the achenes. Differences in achene morphology among the Taraxacum species are obvious, particularly in spinule distribution and size, achene color, and achene size. According to the cluster analysis of achene morphology, T. antungense Kitag. and T. urbanum Kitag. should be combined as the same species. The results of the achene morphology cluster analysis and of the SRAP-based molecular systematic clustering are consistent with the grouping given in "the Chinese flora". Achene morphological characteristics in Taraxacum are stable and conservative, so interspecific delimitation and kinship analysis can be carried out according to differences in combinations of achene characteristics. Both the achene morphology cluster analysis and the SRAP-based molecular systematics support the dandelion classification of "the Chinese flora".
Exploratory Item Classification Via Spectral Graph Clustering

PubMed Central

Chen, Yunxiao; Li, Xiaoou; Liu, Jingchen; Xu, Gongjun; Ying, Zhiliang

2017-01-01

Large-scale assessments are supported by a large item pool. An important task in test development is to assign items into scales that measure different characteristics of individuals, and a popular approach is cluster analysis of items. Classical methods in cluster analysis, such as the hierarchical clustering, K-means method, and latent-class analysis, often induce a high computational overhead and have difficulty handling missing data, especially in the presence of high-dimensional responses. In this article, the authors propose a spectral clustering algorithm for exploratory item cluster analysis. The method is computationally efficient, effective for data with missing or incomplete responses, easy to implement, and often outperforms traditional clustering algorithms in the context of high dimensionality. The spectral clustering algorithm is based on graph theory, a branch of mathematics that studies the properties of graphs. The algorithm first constructs a graph of items, characterizing the similarity structure among items. It then extracts item clusters based on the graphical structure, grouping similar items together. The proposed method is evaluated through simulations and an application to the revised Eysenck Personality Questionnaire. PMID:29033476
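The two steps named in this abstract (build a similarity graph of items, then extract clusters from its structure) can be sketched in a generic form. The code below is a textbook two-way spectral split using the sign of the Fiedler vector of the graph Laplacian, computed by power iteration; it is not the authors' algorithm, and the similarity matrix is a made-up example with two obvious item groups.

```python
import math

def spectral_bipartition(weights, iterations=500):
    """Split a weighted graph in two using the sign of the Fiedler vector.

    `weights` is a symmetric similarity matrix with a zero diagonal.
    """
    n = len(weights)
    degree = [sum(row) for row in weights]
    shift = 2.0 * max(degree)  # upper bound on the Laplacian's eigenvalues

    # Deterministic, non-constant starting vector.
    x = [float(i) for i in range(n)]
    for _ in range(iterations):
        # Remove the constant component (the trivial eigenvector of L).
        mean = sum(x) / n
        x = [v - mean for v in x]
        # Multiply by (shift * I - L), where L = D - W.
        y = [
            (shift - degree[i]) * x[i]
            + sum(weights[i][j] * x[j] for j in range(n))
            for i in range(n)
        ]
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
    return [0 if v < 0 else 1 for v in x]

# Hypothetical item similarities: items 0-2 and items 3-5 form two groups.
strong, weak = 0.9, 0.1
similarity = [
    [0.0 if i == j else (strong if (i < 3) == (j < 3) else weak) for j in range(6)]
    for i in range(6)
]
labels = spectral_bipartition(similarity)
```

Methods of this family usually go on to cluster several eigenvectors at once (for example with k-means) when more than two item groups are expected.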
Sulfur in Cometary Dust

NASA Technical Reports Server (NTRS)

Fomenkova, M. N.

1997-01-01

The computer-intensive project consisted of the analysis and synthesis of existing data on composition of comet Halley dust particles. The main objective was to obtain a complete inventory of sulfur containing compounds in the comet Halley dust by building upon the existing classification of organic and inorganic compounds and applying a variety of statistical techniques for cluster and cross-correlational analyses. A student hired for this project wrote and tested the software to perform cluster analysis. The following tasks were carried out: (1) selecting the data from existing database for the proposed project; (2) finding access to a standard library of statistical routines for cluster analysis; (3) reformatting the data as necessary for input into the library routines; (4) performing cluster analysis and constructing hierarchical cluster trees using three methods to define the proximity of clusters; (5) presenting the output results in different formats to facilitate the interpretation of the obtained cluster trees; (6) selecting groups of data points common for all three trees as stable clusters. We have also considered the chemistry of sulfur in inorganic compounds.
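Tasks (4) and (6) in this abstract can be illustrated with a small sketch: build hierarchical clusterings under three different definitions of cluster proximity, cut each to the same number of groups, and keep only the groups that appear in all three. The abstract does not name the three methods, so single, complete, and average linkage are assumed here; the data points and the choice of three groups are likewise hypothetical.

```python
import math
from itertools import combinations

def agglomerate(points, k, linkage):
    """Merge clusters bottom-up until k remain; return a set of frozensets."""
    def proximity(c1, c2):
        pair_d = [math.dist(points[a], points[b]) for a in c1 for b in c2]
        if linkage == "single":
            return min(pair_d)
        if linkage == "complete":
            return max(pair_d)
        return sum(pair_d) / len(pair_d)  # "average"

    clusters = [frozenset([i]) for i in range(len(points))]
    while len(clusters) > k:
        # Find and merge the closest pair under the chosen linkage.
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: proximity(*pair))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return set(clusters)

# Hypothetical data: two tight groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (5, 6)]
trees = [agglomerate(points, 3, m) for m in ("single", "complete", "average")]

# Groups found by all three linkage rules are treated as stable clusters.
stable = trees[0] & trees[1] & trees[2]
```

Intersecting the results is one simple way to read "groups of data points common for all three trees"; other definitions of stability are possible.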

Performance analysis of clustering techniques over microarray data: A case study

NASA Astrophysics Data System (ADS)

Dash, Rasmita; Misra, Bijan Bihari

2018-03-01

Handling big data is one of the major issues in the field of statistical data analysis. In such investigations, cluster analysis plays a vital role in dealing with large-scale data. There are many clustering techniques, each with a different approach to cluster analysis, but it is difficult to predict which approach suits a particular dataset. To deal with this problem, a grading approach is applied over many clustering techniques to identify a stable technique. Because the grading depends on the characteristics of the dataset as well as on the validity indices, a two-stage grading approach is implemented. In this study the grading approach is applied to five clustering techniques: hybrid swarm based clustering (HSC), k-means, partitioning around medoids (PAM), vector quantization (VQ), and agglomerative nesting (AGNES). The experiments are conducted over five microarray datasets with seven validity indices. The finding of the grading approach that a clustering technique is significant is also confirmed by the Nemenyi post-hoc hypothesis test.
THE HUBBLE SPACE TELESCOPE UV LEGACY SURVEY OF GALACTIC GLOBULAR CLUSTERS. VIII. PRELIMINARY PUBLIC CATALOG RELEASE

DOE Office of Scientific and Technical Information (OSTI.GOV)

Soto, M.; Bellini, A.; Anderson, J.

The Hubble Space Telescope (HST) UV Legacy Survey of Galactic Globular Clusters (GO-13297) has been specifically designed to complement the existing F606W and F814W observations of the Advanced Camera for Surveys (ACS) Globular Cluster Survey (GO-10775) by observing the most accessible 47 of the previous survey’s 65 clusters in three WFC3/UVIS filters F275W, F336W, and F438W. The new survey also adds super-solar metallicity open cluster NGC 6791 to increase the metallicity diversity. The combined survey provides a homogeneous 5-band data set that can be used to pursue a broad range of scientific investigations. In particular, the chosen UV filters allow the identification of multiple stellar populations by targeting the regions of the spectrum that are sensitive to abundance variations in C, N, and O. In order to provide the community with uniform preliminary catalogs, we have devised an automated procedure that performs high-quality photometry on the new UV observations (along with similar observations of seven other programs in the archive). This procedure finds and measures the potential sources on each individual exposure using library point-spread functions and cross-correlates these observations with the original ACS-Survey catalog. The catalog of 57 clusters we publish here will be useful to identify stars in the different stellar populations, in particular for spectroscopic follow-up. Eventually, we will construct a more sophisticated catalog and artificial-star tests based on an optimal reduction of the UV survey data, but the catalogs presented here give the community the chance to make early use of this HST Treasury survey.
Clusters of Insomnia Disorder: An Exploratory Cluster Analysis of Objective Sleep Parameters Reveals Differences in Neurocognitive Functioning, Quantitative EEG, and Heart Rate Variability.

PubMed

Miller, Christopher B; Bartlett, Delwyn J; Mullins, Anna E; Dodds, Kirsty L; Gordon, Christopher J; Kyle, Simon D; Kim, Jong Won; D'Rozario, Angela L; Lee, Rico S C; Comas, Maria; Marshall, Nathaniel S; Yee, Brendon J; Espie, Colin A; Grunstein, Ronald R

2016-11-01

To empirically derive and evaluate potential clusters of Insomnia Disorder through cluster analysis from polysomnography (PSG). We hypothesized that clusters would differ on neurocognitive performance, sleep-onset measures of quantitative (q)-EEG and heart rate variability (HRV). Research volunteers with Insomnia Disorder (DSM-5) completed a neurocognitive assessment and overnight PSG measures of total sleep time (TST), wake time after sleep onset (WASO), and sleep onset latency (SOL) were used to determine clusters. From 96 volunteers with Insomnia Disorder, cluster analysis derived at least two clusters from objective sleep parameters: Insomnia with normal objective sleep duration (I-NSD: n = 53) and Insomnia with short sleep duration (I-SSD: n = 43). At sleep onset, differences in HRV between I-NSD and I-SSD clusters suggest attenuated parasympathetic activity in I-SSD (P < 0.05). Preliminary work suggested three clusters by retaining the I-NSD and splitting the I-SSD cluster into two: I-SSD A (n = 29): defined by high WASO and I-SSD B (n = 14): a second I-SSD cluster with high SOL and medium WASO. The I-SSD B cluster performed worse than I-SSD A and I-NSD for sustained attention (P ≤ 0.05). In an exploratory analysis, q-EEG revealed reduced spectral power also in I-SSD B before (Delta, Alpha, Beta-1) and after sleep-onset (Beta-2) compared to I-SSD A and I-NSD (P ≤ 0.05). Two insomnia clusters derived from cluster analysis differ in sleep onset HRV. Preliminary data suggest evidence for three clusters in insomnia with differences for sustained attention and sleep-onset q-EEG. Insomnia 100 sleep study: Australia New Zealand Clinical Trials Registry (ANZCTR) identification number 12612000049875. URL: https://www.anzctr.org.au/Trial/Registration/TrialReview.aspx?id=347742. © 2016 Associated Professional Sleep Societies, LLC.
Methods of extending crop signatures from one area to another

NASA Technical Reports Server (NTRS)

Minter, T. C. (Principal Investigator)

1979-01-01

Efforts to develop a technology for signature extension during LACIE phases 1 and 2 are described. A number of haze and Sun angle correction procedures were developed and tested. These included the ROOSTER and OSCAR cluster-matching algorithms and their modifications, the MLEST and UHMLE maximum likelihood estimation procedures, and the ATCOR procedure. All these algorithms were tested on simulated data and consecutive-day LANDSAT imagery. The ATCOR, OSCAR, and MLEST algorithms were also tested for their capability to geographically extend signatures using LANDSAT imagery.
Spotting effect in microarray experiments

PubMed Central

Mary-Huard, Tristan; Daudin, Jean-Jacques; Robin, Stéphane; Bitton, Frédérique; Cabannes, Eric; Hilson, Pierre

2004-01-01

Background: Microarray data must be normalized because they suffer from multiple biases. We have identified a source of spatial experimental variability that significantly affects data obtained with Cy3/Cy5 spotted glass arrays. It yields a periodic pattern altering both signal (Cy3/Cy5 ratio) and intensity across the array. Results: Using the variogram, a geostatistical tool, we characterized the observed variability, called here the spotting effect because it most probably arises during steps in the array printing procedure. Conclusions: The spotting effect is not appropriately corrected by current normalization methods, even by those addressing spatial variability. Importantly, the spotting effect may alter differential and clustering analysis. PMID:15151695
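To show why a variogram can expose a periodic spatial pattern of the kind described here, the sketch below computes the standard empirical semivariance at several lags for a one-dimensional sequence. Real arrays are two-dimensional and the authors' exact procedure is not given in the abstract, so this is a simplified illustration; the signal with period 4 is synthetic.

```python
def semivariogram(values, lag):
    """Empirical semivariance of a 1-D sequence at an integer lag.

    gamma(h) = sum((z[i] - z[i + h]) ** 2) / (2 * number_of_pairs)
    """
    pairs = [(values[i], values[i + lag]) for i in range(len(values) - lag)]
    return sum((a - b) ** 2 for a, b in pairs) / (2 * len(pairs))

# Synthetic signal that repeats every 4 positions (a stand-in for a
# periodic bias along one axis of an array).
signal = [0.0, 1.0, 0.0, -1.0] * 6
gamma = {lag: semivariogram(signal, lag) for lag in range(1, 9)}
```

For a periodic signal the semivariance drops back toward zero at lags that are multiples of the period, which is the signature that distinguishes a periodic artifact from a smooth spatial trend.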
CLUSFAVOR 5.0: hierarchical cluster and principal-component analysis of microarray-based transcriptional profiles

PubMed Central

Peterson, Leif E

2002-01-01

CLUSFAVOR (CLUSter and Factor Analysis with Varimax Orthogonal Rotation) 5.0 is a Windows-based computer program for hierarchical cluster and principal-component analysis of microarray-based transcriptional profiles. CLUSFAVOR 5.0 standardizes input data; sorts data according to gene-specific coefficient of variation, standard deviation, average and total expression, and Shannon entropy; performs hierarchical cluster analysis using nearest-neighbor, unweighted pair-group method using arithmetic averages (UPGMA), or furthest-neighbor joining methods, and Euclidean, correlation, or jack-knife distances; and performs principal-component analysis. PMID:12184816
Stages of Change – Continuous Measure (URICA-E2): psychometrics of a Norwegian version

PubMed Central

Lerdal, Anners; Moe, Britt; Digre, Elin; Harding, Thomas; Kristensen, Frode; Grov, Ellen K; Bakken, Linda N; Eklund, Marthe L; Ruud, Ireen; Rossi, Joseph S

2009-01-01

Aim: This paper is a report of research to translate the English version of the Stages of Change continuous measure questionnaire (URICA-E2) into Norwegian and to test the validity of the questionnaire and its usefulness in predicting behavioural change. Background: While the psychometric properties of the Stages of Change categorical measure have been tested extensively, evaluation of the psychometric properties of the continuous questionnaire has not been described elsewhere in the literature. Method: Cross-sectional data were collected with a convenience sample of 198 undergraduate nursing students in 2005 and 2006. The English version of URICA-E2 was translated into Norwegian according to standardized procedures. Findings: Principal components analysis clearly confirmed five of the dimensions of readiness to change (Precontemplation Non-Believers, Precontemplation Believers, Contemplation, Preparation and Maintenance), while the sixth dimension, Action, showed the lowest Eigenvalue (0.93). Findings from the cluster analysis indicate distinct profiles among the respondents in terms of readiness to change their exercise behaviour. Conclusion: The URICA-E2 was for the most part replicated from Reed’s original work. The result of the cluster analysis of the items associated with the factor ‘Action’ suggests that these do not adequately measure the factor. PMID:19032513
Geospatiotemporal Data Mining of Remotely Sensed Phenology for Unsupervised Forest Threat Detection

NASA Astrophysics Data System (ADS)

Mills, R. T.; Hoffman, F. M.; Kumar, J.; Vulli, S. S.; Hargrove, W. W.; Spruce, J.

2010-12-01

Hargrove and Hoffman have previously developed and applied a scalable geospatiotemporal data mining approach to define a set of categorical, multivariate classes or states for describing and tracking the behavior of ecosystem properties through time within a multi-dimensional phase or state space. The method employs a standard k-means cluster analysis with enhancements that reduce the number of required comparisons, dramatically accelerating iterative convergence. In support of efforts by the USDA Forest Service to develop a National Early Warning System for Forest Disturbances, we have applied this geospatiotemporal cluster analysis procedure to annual phenology patterns derived from Moderate Resolution Imaging Spectroradiometer (MODIS) Normalized Difference Vegetation Index (NDVI) for unsupervised change detection. We will present initial results from the analysis of seven years of 250-m MODIS NDVI data for the conterminous United States. While determining what constitutes a "normal" phenological pattern for any given location is challenging due to interannual climate variability, a spatially varying climate change trend, and the relatively short record of MODIS NDVI observations, these results demonstrate the utility of the method for detecting significant mortality events, like the progressive damage from mountain pine beetle, and suggest that the technique may be successfully implemented as a key component in an early warning system for identifying forest threats from natural and anthropogenic disturbances at a continental scale.
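The abstract builds on a standard k-means cluster analysis, which can be sketched in a few lines. The version below is the ordinary assign-then-update loop and does not include the comparison-reducing enhancements mentioned in the abstract, whose details are not given there; the four-value "annual profiles" and the initial centroids are invented for illustration.

```python
import math

def kmeans(points, initial_centroids, max_iter=100):
    """Plain k-means: returns (labels, centroids)."""
    centroids = list(initial_centroids)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the loop has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centroids

# Hypothetical seasonal profiles (four values per location).
profiles = [
    (0.20, 0.60, 0.80, 0.30),
    (0.25, 0.65, 0.75, 0.30),
    (0.20, 0.30, 0.30, 0.20),
    (0.22, 0.28, 0.32, 0.18),
]
labels, centroids = kmeans(profiles, [profiles[0], profiles[2]])
```

In a change-detection setting of the kind described, a location whose profile moves to a different cluster from one year to the next would be a candidate for closer inspection.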
Automated Image Analysis of HER2 Fluorescence In Situ Hybridization to Refine Definitions of Genetic Heterogeneity in Breast Cancer Tissue

PubMed Central

Radziuviene, Gedmante; Rasmusson, Allan; Augulis, Renaldas; Lesciute-Krilaviciene, Daiva; Laurinaviciene, Aida; Clim, Eduard

2017-01-01

Human epidermal growth factor receptor 2 gene- (HER2-) targeted therapy for breast cancer relies primarily on HER2 overexpression established by immunohistochemistry (IHC) with borderline cases being further tested for amplification by fluorescence in situ hybridization (FISH). Manual interpretation of HER2 FISH is based on a limited number of cells and rather complex definitions of equivocal, polysomic, and genetically heterogeneous (GH) cases. Image analysis (IA) can extract high-capacity data and potentially improve HER2 testing in borderline cases. We investigated statistically derived indicators of HER2 heterogeneity in HER2 FISH data obtained by automated IA of 50 IHC borderline (2+) cases of invasive ductal breast carcinoma. Overall, IA significantly underestimated the conventional HER2, CEP17 counts, and HER2/CEP17 ratio; however, it collected more amplified cells in some cases below the lower limit of GH definition by manual procedure. Indicators for amplification, polysomy, and bimodality were extracted by factor analysis and allowed clustering of the tumors into amplified, nonamplified, and equivocal/polysomy categories. The bimodality indicator provided independent cell diversity characteristics for all clusters. Tumors classified as bimodal only partially coincided with the conventional GH heterogeneity category. We conclude that automated high-capacity nonselective tumor cell assay can generate evidence-based HER2 intratumor heterogeneity indicators to refine GH definitions. PMID:28752092
Automated Image Analysis of HER2 Fluorescence In Situ Hybridization to Refine Definitions of Genetic Heterogeneity in Breast Cancer Tissue.

PubMed

Radziuviene, Gedmante; Rasmusson, Allan; Augulis, Renaldas; Lesciute-Krilaviciene, Daiva; Laurinaviciene, Aida; Clim, Eduard; Laurinavicius, Arvydas

2017-01-01

Human epidermal growth factor receptor 2 gene- (HER2-) targeted therapy for breast cancer relies primarily on HER2 overexpression established by immunohistochemistry (IHC) with borderline cases being further tested for amplification by fluorescence in situ hybridization (FISH). Manual interpretation of HER2 FISH is based on a limited number of cells and rather complex definitions of equivocal, polysomic, and genetically heterogeneous (GH) cases. Image analysis (IA) can extract high-capacity data and potentially improve HER2 testing in borderline cases. We investigated statistically derived indicators of HER2 heterogeneity in HER2 FISH data obtained by automated IA of 50 IHC borderline (2+) cases of invasive ductal breast carcinoma. Overall, IA significantly underestimated the conventional HER2, CEP17 counts, and HER2/CEP17 ratio; however, it collected more amplified cells in some cases below the lower limit of GH definition by manual procedure. Indicators for amplification, polysomy, and bimodality were extracted by factor analysis and allowed clustering of the tumors into amplified, nonamplified, and equivocal/polysomy categories. The bimodality indicator provided independent cell diversity characteristics for all clusters. Tumors classified as bimodal only partially coincided with the conventional GH heterogeneity category. We conclude that automated high-capacity nonselective tumor cell assay can generate evidence-based HER2 intratumor heterogeneity indicators to refine GH definitions.
DICON: interactive visual analysis of multidimensional clusters.

PubMed

Cao, Nan; Gotz, David; Sun, Jimeng; Qu, Huamin

2011-12-01

Clustering as a fundamental data analysis technique has been widely used in many analytic applications. However, it is often difficult for users to understand and evaluate multidimensional clustering results, especially the quality of clusters and their semantics. For large and complex data, high-level statistical information about the clusters is often needed for users to evaluate cluster quality while a detailed display of multidimensional attributes of the data is necessary to understand the meaning of clusters. In this paper, we introduce DICON, an icon-based cluster visualization that embeds statistical information into a multi-attribute display to facilitate cluster interpretation, evaluation, and comparison. We design a treemap-like icon to represent a multidimensional cluster, and the quality of the cluster can be conveniently evaluated with the embedded statistical information. We further develop a novel layout algorithm which can generate similar icons for similar clusters, making comparisons of clusters easier. User interaction and clutter reduction are integrated into the system to help users more effectively analyze and refine clustering results for large datasets. We demonstrate the power of DICON through a user study and a case study in the healthcare domain. Our evaluation shows the benefits of the technique, especially in support of complex multidimensional cluster analysis. © 2011 IEEE
Elderly patients attended in emergency health services in Brazil: a study for victims of falls and traffic accidents.

PubMed

de Freitas, Mariana Gonçalves; Bonolo, Palmira de Fátima; de Moraes, Edgar Nunes; Machado, Carla Jorge

2015-03-01

The article aims to describe the profile of elderly victims of falls and traffic accidents from the data of the Surveillance Survey of Violence and Accidents (VIVA). The VIVA Survey was conducted in the emergency health-services of the Unified Health System in the capitals of Brazil in 2011. The sample of elderly by type of accident was subjected to the two-step cluster procedure. Of the 2463 elderly persons in question, 79.8% suffered falls and 20.2% were the victims of traffic accidents. The 1812 elderly who fell were grouped together into 4 clusters: Cluster 1, in which all had disabilities; Cluster 2, all were non-white and falls took place in the home; Cluster 3, younger and active seniors; and Cluster 4, with a higher proportion of seniors 80 years old or above who were white. Among cases of traffic accidents, 446 seniors were grouped into two clusters: Cluster 1 of younger elderly, drivers or passengers; Cluster 2, with higher age seniors, mostly pedestrians. The main victims of falls were women with low schooling and unemployed; traffic accident victims were mostly younger and male. Complications were similar in victims of falls and traffic accidents. Clusters allow adoption of targeted measures of care, prevention and health promotion.
Sizing the star cluster population of the Large Magellanic Cloud

NASA Astrophysics Data System (ADS)

Piatti, Andrés E.

2018-04-01

The number of star clusters that populate the Large Magellanic Cloud (LMC) at deprojected distances <4 deg has been recently found to be nearly double the known size of the system. Because of the unprecedented consequences of this outcome in our knowledge of the LMC cluster formation and dissolution histories, we closely revisited such a compilation of objects and found that only ~35 per cent of the previously known catalogued clusters have been included. The remaining entries are likely related to stellar overdensities of the LMC composite star field, because there is a remarkable enhancement of objects with assigned ages older than log(t yr^-1) ~ 9.4, which contrasts with the existence of the LMC cluster age gap; the assumption of a cluster formation rate similar to that of the LMC star field does not help to reconcile so large a number of clusters either; and nearly 50 per cent of them come from cluster search procedures known to produce more than 90 per cent of false detections. The lack of further analyses to confirm that the identified overdensities are genuine star clusters also casts doubt on those results. We support the view that the actual size of the LMC main body cluster population is close to that previously known.
Towards a comprehensive knowledge of the star cluster population in the Small Magellanic Cloud

NASA Astrophysics Data System (ADS)

Piatti, A. E.

2018-07-01

The Small Magellanic Cloud (SMC) has recently been found to harbour an increase of more than 200 per cent in its known cluster population. Here, we provide solid evidence that this unprecedented number of clusters could be greatly overestimated. On the one hand, the fully automatic procedure used to identify such an enormous cluster candidate sample did not recover ~50 per cent, on average, of the known relatively bright clusters located in the SMC main body. On the other hand, the number of new cluster candidates per time unit as a function of time is noticeably different from the intrinsic SMC cluster frequency (CF), which should not be the case if these new detections were genuine physical systems. We found additionally that the SMC CF varies spatially, in such a way that it resembles an outside-in process coupled with the effects of a relatively recent interaction with the Large Magellanic Cloud. By assuming that clusters and field stars share the same formation history, we showed for the first time that the cluster dissolution rate also depends on position in the galaxy. The cluster dissolution becomes higher as the concentration of galaxy mass increases or if external tidal forces are present.
A Stationary Wavelet Entropy-Based Clustering Approach Accurately Predicts Gene Expression

PubMed Central

Nguyen, Nha; Vo, An; Choi, Inchan

2015-01-01

Studying epigenetic landscapes is important to understand the condition for gene regulation. Clustering is a useful approach to study epigenetic landscapes by grouping genes based on their epigenetic conditions. However, classical clustering approaches that often use a representative value of the signals in a fixed-sized window do not fully use the information written in the epigenetic landscapes. Clustering approaches to maximize the information of the epigenetic signals are necessary for better understanding gene regulatory environments. For effective clustering of multidimensional epigenetic signals, we developed a method called Dewer, which uses the entropy of stationary wavelet of epigenetic signals inside enriched regions for gene clustering. Interestingly, the gene expression levels were highly correlated with the entropy levels of epigenetic signals. Dewer separates genes better than a window-based approach in the assessment using gene expression and achieved a correlation coefficient above 0.9 without using any training procedure. Our results show that the changes of the epigenetic signals are useful to study gene regulation. PMID:25383910
Cluster Correspondence Analysis.

PubMed

van de Velden, M; D'Enza, A Iodice; Palumbo, F

2017-03-01

A method is proposed that combines dimension reduction and cluster analysis for categorical data by simultaneously assigning individuals to clusters and optimal scaling values to categories in such a way that a single between variance maximization objective is achieved. In a unified framework, a brief review of alternative methods is provided and we show that the proposed method is equivalent to GROUPALS applied to categorical data. Performance of the methods is appraised by means of a simulation study. The results of the joint dimension reduction and clustering methods are compared with the so-called tandem approach, a sequential analysis of dimension reduction followed by cluster analysis. The tandem approach is conjectured to perform worse when variables are added that are unrelated to the cluster structure. Our simulation study confirms this conjecture. Moreover, the results of the simulation study indicate that the proposed method also consistently outperforms alternative joint dimension reduction and clustering methods.
Towards Effective Clustering Techniques for the Analysis of Electric Power Grids

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hogan, Emilie A.; Cotilla Sanchez, Jose E.; Halappanavar, Mahantesh

2013-11-30

Clustering is an important data analysis technique with numerous applications in the analysis of electric power grids. Standard clustering techniques are oblivious to the rich structural and dynamic information available for power grids. Therefore, by exploiting the inherent topological and electrical structure in the power grid data, we propose new methods for clustering with applications to model reduction, locational marginal pricing, phasor measurement unit (PMU or synchrophasor) placement, and power system protection. We focus our attention on model reduction for analysis based on time-series information from synchrophasor measurement devices, and spectral techniques for clustering. By comparing different clustering techniques on two instances of realistic power grids we show that the solutions are related and therefore one could leverage that relationship for a computational advantage. Thus, by contrasting different clustering techniques we make a case for exploiting structure inherent in the data with implications for several domains including power systems.
Are clusters of dietary patterns and cluster membership stable over time? Results of a longitudinal cluster analysis study.

PubMed

Walthouwer, Michel Jean Louis; Oenema, Anke; Soetens, Katja; Lechner, Lilian; de Vries, Hein

2014-11-01

Developing nutrition education interventions based on clusters of dietary patterns can only be done adequately when it is clear if distinctive clusters of dietary patterns can be derived and reproduced over time, if cluster membership is stable, and if it is predictable which type of people belong to a certain cluster. Hence, this study aimed to: (1) identify clusters of dietary patterns among Dutch adults, (2) test the reproducibility of these clusters and stability of cluster membership over time, and (3) identify sociodemographic predictors of cluster membership and cluster transition. This study had a longitudinal design with online measurements at baseline (N=483) and 6 months follow-up (N=379). Dietary intake was assessed with a validated food frequency questionnaire. A hierarchical cluster analysis was performed, followed by a K-means cluster analysis. Multinomial logistic regression analyses were conducted to identify the sociodemographic predictors of cluster membership and cluster transition. At baseline and follow-up, a comparable three-cluster solution was derived, distinguishing a healthy, moderately healthy, and unhealthy dietary pattern. Male and lower educated participants were significantly more likely to have a less healthy dietary pattern. Further, 251 (66.2%) participants remained in the same cluster, 45 (11.9%) participants changed to an unhealthier cluster, and 83 (21.9%) participants shifted to a healthier cluster. Men and people living alone were significantly more likely to shift toward a less healthy dietary pattern. Distinctive clusters of dietary patterns can be derived. Yet, cluster membership is unstable and only few sociodemographic factors were associated with cluster membership and cluster transition. These findings imply that clusters based on dietary intake may not be suitable as a basis for nutrition education interventions. Copyright © 2014 Elsevier Ltd. All rights reserved.
X-ray and optical substructures of the DAFT/FADA survey clusters

NASA Astrophysics Data System (ADS)

Guennou, L.; Durret, F.; Adami, C.; Lima Neto, G. B.

2013-04-01

We have undertaken the DAFT/FADA survey with the double aim of setting constraints on dark energy based on weak lensing tomography and of obtaining homogeneous and high quality data for a sample of 91 massive clusters in the redshift range 0.4-0.9 for which there were HST archive data. We have analysed the XMM-Newton data available for 42 of these clusters to derive their X-ray temperatures and luminosities and search for substructures. Out of these, a spatial analysis was possible for 30 clusters, but only 23 had deep enough X-ray data for a really robust analysis. This study was coupled with a dynamical analysis for the 26 clusters having at least 30 spectroscopic galaxy redshifts in the cluster range. Altogether, the X-ray sample of 23 clusters and the optical sample of 26 clusters have 14 clusters in common. We present preliminary results on the coupled X-ray and dynamical analyses of these 14 clusters.
Identifying novel phenotypes of acute heart failure using cluster analysis of clinical variables.

PubMed

Horiuchi, Yu; Tanimoto, Shuzou; Latif, A H M Mahbub; Urayama, Kevin Y; Aoki, Jiro; Yahagi, Kazuyuki; Okuno, Taishi; Sato, Yu; Tanaka, Tetsu; Koseki, Keita; Komiyama, Kota; Nakajima, Hiroyoshi; Hara, Kazuhiro; Tanabe, Kengo

2018-07-01

Acute heart failure (AHF) is a heterogeneous disease caused by various cardiovascular (CV) pathophysiologies and multiple non-CV comorbidities. We aimed to identify clinically important subgroups to improve our understanding of the pathophysiology of AHF and inform clinical decision-making. We evaluated detailed clinical data of 345 consecutive AHF patients using non-hierarchical cluster analysis of 77 variables, including age, sex, HF etiology, comorbidities, physical findings, laboratory data, electrocardiogram, echocardiogram and treatment during hospitalization. Cox proportional hazards regression analysis was performed to estimate the association between the clusters and clinical outcomes. Three clusters were identified. Cluster 1 (n=108) represented "vascular failure". This cluster had the highest average systolic blood pressure at admission and lung congestion with type 2 respiratory failure. Cluster 2 (n=89) represented "cardiac and renal failure". They had the lowest ejection fraction (EF) and worst renal function. Cluster 3 (n=148) comprised mostly older patients and had the highest prevalence of atrial fibrillation and preserved EF. Death or HF hospitalization within 12 months occurred in 23% of Cluster 1, 36% of Cluster 2 and 36% of Cluster 3 (p=0.034). Compared with Cluster 1, risk of death or HF hospitalization was 1.74 (95% CI, 1.03-2.95, p=0.037) for Cluster 2 and 1.82 (95% CI, 1.13-2.93, p=0.014) for Cluster 3. Cluster analysis may be effective in producing clinically relevant categories of AHF, and may suggest underlying pathophysiology and potential utility in predicting clinical outcomes. Copyright © 2018 Elsevier B.V. All rights reserved.

Mixture modelling for cluster analysis.

PubMed

McLachlan, G J; Chang, S U

2004-10-01

Cluster analysis via a finite mixture model approach is considered. With this approach to clustering, the data can be partitioned into a specified number of clusters g by first fitting a mixture model with g components. An outright clustering of the data is then obtained by assigning an observation to the component to which it has the highest estimated posterior probability of belonging; that is, the ith cluster consists of those observations assigned to the ith component (i = 1,..., g). The focus is on the use of mixtures of normal components for the cluster analysis of data that can be regarded as being continuous. But attention is also given to the case of mixed data, where the observations consist of both continuous and discrete variables.
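The assignment rule described in this abstract can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes a one-dimensional normal mixture whose parameters have already been fitted, and the parameter values and data below are hypothetical.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a univariate normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior_probs(x, weights, means, sds):
    """Posterior probability that x belongs to each mixture component."""
    joint = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
    total = sum(joint)
    return [j / total for j in joint]


def assign_clusters(data, weights, means, sds):
    """Assign each observation to the component with the highest posterior."""
    labels = []
    for x in data:
        post = posterior_probs(x, weights, means, sds)
        labels.append(max(range(len(post)), key=post.__getitem__))
    return labels


# Hypothetical fitted parameters for g = 2 components (not from the paper).
weights = [0.5, 0.5]
means = [0.0, 5.0]
sds = [1.0, 1.0]
data = [-0.3, 0.8, 4.6, 5.9, 2.4]

labels = assign_clusters(data, weights, means, sds)
print(labels)
```

In practice the mixture parameters would come from a fitting step such as the EM algorithm, which this sketch leaves out.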
Clusters of Insomnia Disorder: An Exploratory Cluster Analysis of Objective Sleep Parameters Reveals Differences in Neurocognitive Functioning, Quantitative EEG, and Heart Rate Variability

PubMed Central

Miller, Christopher B.; Bartlett, Delwyn J.; Mullins, Anna E.; Dodds, Kirsty L.; Gordon, Christopher J.; Kyle, Simon D.; Kim, Jong Won; D'Rozario, Angela L.; Lee, Rico S.C.; Comas, Maria; Marshall, Nathaniel S.; Yee, Brendon J.; Espie, Colin A.; Grunstein, Ronald R.

2016-01-01

Study Objectives: To empirically derive and evaluate potential clusters of Insomnia Disorder through cluster analysis from polysomnography (PSG). We hypothesized that clusters would differ on neurocognitive performance, sleep-onset measures of quantitative (q)-EEG and heart rate variability (HRV). Methods: Research volunteers with Insomnia Disorder (DSM-5) completed a neurocognitive assessment and overnight PSG measures of total sleep time (TST), wake time after sleep onset (WASO), and sleep onset latency (SOL) were used to determine clusters. Results: From 96 volunteers with Insomnia Disorder, cluster analysis derived at least two clusters from objective sleep parameters: Insomnia with normal objective sleep duration (I-NSD: n = 53) and Insomnia with short sleep duration (I-SSD: n = 43). At sleep onset, differences in HRV between I-NSD and I-SSD clusters suggest attenuated parasympathetic activity in I-SSD (P < 0.05). Preliminary work suggested three clusters by retaining the I-NSD and splitting the I-SSD cluster into two: I-SSD A (n = 29): defined by high WASO and I-SSD B (n = 14): a second I-SSD cluster with high SOL and medium WASO. The I-SSD B cluster performed worse than I-SSD A and I-NSD for sustained attention (P ≤ 0.05). In an exploratory analysis, q-EEG revealed reduced spectral power also in I-SSD B before (Delta, Alpha, Beta-1) and after sleep-onset (Beta-2) compared to I-SSD A and I-NSD (P ≤ 0.05). Conclusions: Two insomnia clusters derived from cluster analysis differ in sleep onset HRV. Preliminary data suggest evidence for three clusters in insomnia with differences for sustained attention and sleep-onset q-EEG. Clinical Trial Registration: Insomnia 100 sleep study: Australia New Zealand Clinical Trials Registry (ANZCTR) identification number 12612000049875. URL: https://www.anzctr.org.au/Trial/Registration/TrialReview.aspx?id=347742. 
Citation: Miller CB, Bartlett DJ, Mullins AE, Dodds KL, Gordon CJ, Kyle SD, Kim JW, D'Rozario AL, Lee RS, Comas M, Marshall NS, Yee BJ, Espie CA, Grunstein RR. Clusters of Insomnia Disorder: an exploratory cluster analysis of objective sleep parameters reveals differences in neurocognitive functioning, quantitative EEG, and heart rate variability. SLEEP 2016;39(11):1993–2004. PMID:27568796
Functional Principal Component Analysis and Randomized Sparse Clustering Algorithm for Medical Image Analysis

PubMed Central

Lin, Nan; Jiang, Junhai; Guo, Shicheng; Xiong, Momiao

2015-01-01

Due to the advancement in sensor technology, the growing large medical image data have the ability to visualize the anatomical changes in biological tissues. As a consequence, the medical images have the potential to enhance the diagnosis of disease, the prediction of clinical outcomes and the characterization of disease progression. But in the meantime, the growing data dimensions pose great methodological and computational challenges for the representation and selection of features in image cluster analysis. To address these challenges, we first extend the functional principal component analysis (FPCA) from one dimension to two dimensions to fully capture the spatial variation of the image signals. The image signals contain a large number of redundant features which provide no additional information for clustering analysis. The widely used methods for removing the irrelevant features are sparse clustering algorithms using a lasso-type penalty to select the features. However, the accuracy of clustering using a lasso-type penalty depends on the selection of the penalty parameters and the threshold value. In practice, they are difficult to determine. Recently, randomized algorithms have received a great deal of attention in big data analysis. This paper presents a randomized algorithm for accurate feature selection in image clustering analysis. The proposed method is applied to both the liver and kidney cancer histology image data from the TCGA database. The results demonstrate that the randomized feature selection method coupled with functional principal component analysis substantially outperforms the current sparse clustering algorithms in image cluster analysis. PMID:26196383
Numerical taxonomy and ecology of petroleum-degrading bacteria.

PubMed Central

Austin, B; Calomiris, J J; Walker, J D; Colwell, R R

1977-01-01

A total of 99 strains of petroleum-degrading bacteria isolated from Chesapeake Bay water and sediment were identified by using numerical taxonomy procedures. The isolates, together with 33 reference cultures, were examined for 48 biochemical, cultural, morphological, and physiological characters. The data were analyzed by computer, using both the simple matching and the Jaccard coefficients. Clustering was achieved by the unweighted average linkage method. From the sorted similarity matrix and dendrogram, 14 phenetic groups, comprising 85 of the petroleum-degrading bacteria, were defined at the 80 to 85% similarity level. These groups were identified as actinomycetes (mycelial forms, four clusters), coryneforms, Enterobacteriaceae, Klebsiella aerogenes, Micrococcus spp. (two clusters), Nocardia species (two clusters), Pseudomonas spp. (two clusters), and Sphaerotilus natans. It is concluded that the degradation of petroleum is accomplished by a diverse range of bacterial taxa, some of which were isolated only at given sampling stations and, more specifically, from sediment collected at a given station. PMID:889329
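The two similarity coefficients named in this abstract can be illustrated for a pair of strains scored on binary (present/absent) characters. This is a minimal sketch under that assumption; the two character vectors are hypothetical and the average-linkage clustering step is not shown.

```python
def simple_matching(a, b):
    """Fraction of characters on which two strains agree (1-1 and 0-0 matches)."""
    if len(a) != len(b):
        raise ValueError("character vectors must have the same length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def jaccard(a, b):
    """Shared positives divided by characters positive in either strain.

    Negative (0-0) matches are ignored, unlike in the simple matching coefficient.
    """
    if len(a) != len(b):
        raise ValueError("character vectors must have the same length")
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    # Convention assumed here: two all-negative strains get similarity 0.
    return both / either if either else 0.0


# Hypothetical test results for two strains on six characters.
strain_a = [1, 1, 0, 0, 1, 0]
strain_b = [1, 0, 0, 0, 1, 1]

print(simple_matching(strain_a, strain_b))  # 4 agreements out of 6 characters
print(jaccard(strain_a, strain_b))          # 2 shared positives out of 4
```

The difference between the two values shows why both are often reported: joint absences raise the simple matching coefficient but leave the Jaccard coefficient unchanged.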
Commentary on Steinley and Brusco (2011): Recommendations and Cautions

ERIC Educational Resources Information Center

McLachlan, Geoffrey J.

2011-01-01

I discuss the recommendations and cautions in Steinley and Brusco's (2011) article on the use of finite models to cluster a data set. In their article, much use is made of comparison with the "K"-means procedure. As noted by researchers for over 30 years, the "K"-means procedure can be viewed as a special case of finite mixture modeling in which…
Regional Frequency Analysis of Annual Maximum Streamflow in Gipuzkoa (Spain)

NASA Astrophysics Data System (ADS)

Erro, J.; López, J. J.

2012-04-01

Extreme streamflow events have been an important cause of recent flooding in Gipuzkoa, and any change in the magnitude of such events may have severe impacts upon urban structures such as dams, urban drainage systems and flood defences, and cause failures to occur. So a regional frequency analysis of annual maximum streamflow was developed for Gipuzkoa, using the well known L-moments approach together with the index-flood procedure, and following the four steps that characterize it: initial screening of the data, identification of homogeneous regions, choice of the appropriate frequency distribution and estimation of quantiles for different return periods. The preliminary study, completed in 2009, was based on the observations recorded at 22 stations distributed throughout the area. A primary filtering of the data revealed the absence of jumps, inconsistencies and changes in trends within the series, and the discordancy measures showed that none of the sites used in the analysis had to be considered discordant with the others. Regionalization was performed by cluster analysis, grouping the stations according to eight physical site characteristics: latitude, longitude, drainage basin area, elevation, main channel length of the basin, slope, annual mean rainfall and annual maximum rainfall. It resulted in two groups - one cluster with the 18 sites of small-medium basin area, and a second cluster with the 4 remaining sites of major basin area - in which the homogeneity criteria were tested and satisfied. However, the short length of the series, together with the introduction of the observations of 2010 and the inclusion of a historic extreme streamflow event that occurred in northern Spain in November 2011, completely changed the results. With this consideration and adjustment, all of Gipuzkoa could be treated as a homogeneous region. The goodness-of-fit measures indicated that Generalized Logistic (GLO) is the only suitable distribution to characterize Gipuzkoa. Using the regional L-moment algorithm, quantiles associated with return periods of interest were estimated, and Monte Carlo simulation was used to compute RMSE, bias and error bounds for the estimates.
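The at-site statistics that feed an L-moments analysis can be computed from a single annual-maximum series as sketched below. This is only an illustration of the standard unbiased sample estimators based on probability-weighted moments; the function name and the example series are assumptions, and the regional averaging, discordancy and homogeneity tests are not shown.

```python
def sample_l_moments(values):
    """Return (l1, l2, t3): L-location, L-scale and L-skewness of a sample.

    Uses the unbiased probability-weighted moment estimators b0, b1, b2
    computed from the ordered sample.
    """
    x = sorted(values)
    n = len(x)
    if n < 3:
        raise ValueError("at least three observations are required")
    # With 0-based index i, the weight of x[i] in b1 is i / (n - 1) and in
    # b2 is i * (i - 1) / ((n - 1) * (n - 2)).
    b0 = sum(x) / n
    b1 = sum(i * x[i] for i in range(n)) / (n * (n - 1))
    b2 = sum(i * (i - 1) * x[i] for i in range(n)) / (n * (n - 1) * (n - 2))
    l1 = b0
    l2 = 2.0 * b1 - b0
    l3 = 6.0 * b2 - 6.0 * b1 + b0
    if l2 == 0:
        raise ValueError("L-scale is zero; L-skewness is undefined")
    return l1, l2, l3 / l2


# Hypothetical annual maximum flows in m^3/s (not data from the study).
annual_maxima = [112.0, 95.0, 240.0, 130.0, 88.0, 175.0, 101.0, 310.0]
l1, l2, t3 = sample_l_moments(annual_maxima)
print(l1, l2, t3)
```

In the index-flood procedure, ratios such as the L-skewness are typically averaged across the sites of a region, weighted by record length, before a regional distribution is fitted.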
A correlation analysis-based detection and delineation of ECG characteristic events using template waveforms extracted by ensemble averaging of clustered heart cycles.

PubMed

Homaeinezhad, M R; Erfanianmoshiri-Nejad, M; Naseri, H

2014-01-01

The goal of this study is to introduce a simple, standard and safe procedure to detect and to delineate P and T waves of the electrocardiogram (ECG) signal in real conditions. The proposed method consists of four major steps: (1) a secure QRS detection and delineation algorithm, (2) a pattern recognition algorithm designed for distinguishing various ECG clusters which take place between consecutive R-waves, (3) extracting template of the dominant events of each cluster waveform and (4) application of the correlation analysis in order to delineate automatically the P- and T-waves in noisy conditions. The performance characteristics of the proposed P and T detection-delineation algorithm are evaluated versus various ECG signals whose qualities are altered from the best to the worst cases based on the random-walk noise theory. Also, the method is applied to the MIT-BIH Arrhythmia and the QT databases for comparing some parts of its performance characteristics with a number of P and T detection-delineation algorithms. The conducted evaluations indicate that in a signal with low quality value of about 0.6, the proposed method detects the P and T events with sensitivity Se=85% and positive predictive value of P+=89%, respectively. In addition, at the same quality, the average delineation errors associated with those ECG events are 45 and 63ms, respectively. Stable delineation error, high detection accuracy and high noise tolerance were the most important aspects considered during development of the proposed method. © 2013 Elsevier Ltd. All rights reserved.
Personalized identification of differentially expressed pathways in pediatric sepsis.

PubMed

Li, Binjie; Zeng, Qiyi

2017-10-01

Sepsis is a leading killer of children worldwide with numerous differentially expressed genes reported to be associated with sepsis. Identifying core pathways in an individual is important for understanding septic mechanisms and for the future application of custom therapeutic decisions. Samples used in the study were from a control group (n=18) and pediatric sepsis group (n=52). Based on Kauffman's attractor theory, differentially expressed pathways associated with pediatric sepsis were detected as attractors. When the distribution results of attractors are consistent with the distribution of total data assessed using support vector machine, the individualized pathway aberrance score (iPAS) was calculated to distinguish differences. Through attractor and Kyoto Encyclopedia of Genes and Genomes functional analysis, 277 enriched pathways were identified as attractors. There were 81 pathways with P<0.05 and 59 pathways with P<0.01. Distribution outcomes of screened attractors were mostly consistent with the total data demonstrated by the six classifying parameters, which suggested the efficiency of attractors. Cluster analysis of pediatric sepsis using the iPAS method identified seven pathway clusters and four sample clusters. Thus, in the majority of pediatric sepsis samples, core pathways can be detected as different from accumulated normal samples. In conclusion, a novel procedure that identified the dysregulated attractors in individuals with pediatric sepsis was constructed. Attractors can be markers to identify pathways involved in pediatric sepsis. iPAS may provide a correlation score for each of the signaling pathways present in an individual patient. This process may improve the personalized interpretation of disease mechanisms and may be useful in the forthcoming era of personalized medicine.
Analysis of heart rate variability signal in meditation using second-order difference plot

NASA Astrophysics Data System (ADS)

Goswami, Damodar Prasad; Tibarewala, Dewaki Nandan; Bhattacharya, Dilip Kumar

2011-06-01

In this article, heart rate variability signals taken from subjects practising different types of meditation have been investigated to find the underlying similarity among them and how they differ from the non-meditative condition. Four different groups of subjects using different meditation techniques are involved. The data have been obtained from PhysioNet and also collected with our own ECG machine. For data analysis, the second-order difference plot is applied. Each of the plots obtained from the second-order differences forms a single cluster which is nearly elliptical in shape except for some outliers. In meditation, the axis of the elliptical cluster rotates anticlockwise relative to the cluster formed from the premeditation data, although the amount of rotation is not the same in every case. This study reveals definite and specific changes in the heart rate variability of the subjects during meditation. All four groups of subjects followed different procedures but surprisingly the resulting physiological effect is the same to some extent. It indicates that there is some commonness among all the meditative techniques in spite of their apparent dissimilarity and it may be hoped that each of them leads to the same result as preached by the masters of meditation. The study shows that the meditative state has a completely different physiology and that it can be achieved by any of the meditation techniques we have observed. Possible use of this tool in clinical settings, such as in stress management and in the treatment of hypertension, is also mentioned.
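A second-order difference plot can be built from an interval series as sketched below. The point construction follows the usual definition, plotting each successive difference against the next one. The orientation measure is an assumption added for illustration (the principal-axis angle from the covariance of the points), not necessarily how the authors quantified the rotation, and the RR intervals are hypothetical.

```python
import math


def second_order_difference_points(rr):
    """Return points (x[n+1] - x[n], x[n+2] - x[n+1]) for a series x."""
    diffs = [b - a for a, b in zip(rr, rr[1:])]
    return list(zip(diffs[:-1], diffs[1:]))


def principal_axis_angle(points):
    """Angle (radians) of the major axis of the point cloud, from its covariance."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mean_x) ** 2 for p in points) / n
    syy = sum((p[1] - mean_y) ** 2 for p in points) / n
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points) / n
    return 0.5 * math.atan2(2.0 * sxy, sxx - syy)


# Hypothetical RR intervals in milliseconds (not data from the study).
rr_intervals = [800, 810, 790, 805, 795]
points = second_order_difference_points(rr_intervals)
print(points)
print(math.degrees(principal_axis_angle(points)))
```

Comparing the angle obtained before and during meditation would give one simple number for the rotation of the elliptical cluster.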
MOCASSIN-prot: a multi-objective clustering approach for protein similarity networks.

PubMed

Keel, Brittney N; Deng, Bo; Moriyama, Etsuko N

2018-04-15

Proteins often include multiple conserved domains. Various evolutionary events including duplication and loss of domains, domain shuffling, as well as sequence divergence contribute to generating complexities in protein structures, and consequently, in their functions. The evolutionary history of proteins is hence best modeled through networks that incorporate information both from the sequence divergence and the domain content. Here, a game-theoretic approach proposed for protein network construction is adapted into the framework of multi-objective optimization, and extended to incorporate a clustering refinement procedure. The new method, MOCASSIN-prot, was applied to cluster multi-domain proteins from ten genomes. The performance of MOCASSIN-prot was compared against two protein clustering methods, Markov clustering (TRIBE-MCL) and spectral clustering (SCPS). We showed that compared to these two methods, MOCASSIN-prot, which uses both domain composition and quantitative sequence similarity information, generates fewer false positives. It achieves more functionally coherent protein clusters and better differentiates protein families. MOCASSIN-prot, implemented in Perl and Matlab, is freely available at http://bioinfolab.unl.edu/emlab/MOCASSINprot. Contact: emoriyama2@unl.edu. Supplementary data are available at Bioinformatics online.
Photodetachment and UV-Vis spectral properties of Cl2•−·nH2O clusters: Extrapolation to bulk

NASA Astrophysics Data System (ADS)

Pathak, A. K.; Mukherjee, T.; Maity, D. K.

2008-03-01

Vertical detachment energies (VDE) and UV-Vis spectra of Cl2•−·nH2O clusters (n = 1-11) are reported based on first-principles electronic structure calculations. The VDEs of the hydrated clusters are calculated using second-order Møller-Plesset perturbation theory (MP2) as well as coupled-cluster theory with the 6-311++G(d,p) basis set. The excess electron in these hydrated clusters is mainly localized over the solute Cl atoms. A linear relationship is obtained for VDE vs. (n + 2.6)^(-1/3), and the bulk VDE of Cl2•− in aqueous solution is calculated as 10.61 eV at the CCSD(T) level of theory. UV-Vis spectra of these hydrated clusters are calculated using the configuration interaction singles (CIS) procedure. The simulated UV-Vis spectrum of the Cl2•−·10H2O cluster is in excellent agreement with the reported spectrum of the Cl2•− (aq) system, although λmax for the Cl2•−·11H2O system is calculated to be red-shifted.
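The extrapolation to bulk can be illustrated as a straight-line fit of VDE against the shifted inverse cube root of the cluster size, with the intercept taken as the bulk value (the limit of infinite n). This is only a sketch: the energies below are synthetic values generated from an arbitrary line to show the mechanics, not the computed VDEs from the paper, and the function names are assumptions.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def bulk_extrapolation(sizes, energies, shift=2.6):
    """Fit energies against (n + shift)^(-1/3); the intercept is the bulk limit."""
    x = [(n + shift) ** (-1.0 / 3.0) for n in sizes]
    return fit_line(x, energies)


# Synthetic illustration only: energies generated from an arbitrary line.
ASSUMED_BULK = 10.0   # eV, made-up value
ASSUMED_SLOPE = -8.0  # eV, made-up value
sizes = list(range(1, 12))
energies = [ASSUMED_BULK + ASSUMED_SLOPE * (n + 2.6) ** (-1.0 / 3.0) for n in sizes]

slope, bulk = bulk_extrapolation(sizes, energies)
print(slope, bulk)
```

Because (n + 2.6)^(-1/3) tends to zero as n grows, the fitted intercept plays the role of the bulk detachment energy.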
D2O clusters isolated in rare-gas solids: Dependence of infrared spectrum on concentration, deposition rate, heating temperature, and matrix material

NASA Astrophysics Data System (ADS)

Shimazaki, Yoichi; Arakawa, Ichiro; Yamakawa, Koichiro

2018-04-01

The infrared absorption spectra of D2O monomers and clusters isolated in rare-gas matrices were systematically reinvestigated under the control of the following factors: the D2O concentration, deposition rate, heating temperature, and rare-gas species. We clearly show that the cluster-size distribution is dependent on not only the D2O concentration but also the deposition rate of a sample; as the rate got higher, smaller clusters were preferentially formed. Under the heating procedures at different temperatures, the cluster-size growth was successfully observed. Since the monomer diffusion was not enough to balance the changes in the column densities of the clusters, the dimer diffusion was likely to contribute to the cluster growth. The frequencies of the bonded-OD stretches of (D2O)k with k = 2-6 were almost linearly correlated with the square root of the critical temperature of the matrix material. Additional absorption peaks of (D2O)2 and (D2O)3 in a Xe matrix were assigned to the species trapped in tight accommodation sites.
[Utilization of Big Data in Medicine and Future Outlook].

PubMed

Kinosada, Yasutomi; Uematsu, Machiko; Fujiwara, Takuya

2016-03-01

"Big data" is a new buzzword. The point is not to be dazzled by the volume of data, but rather to analyze it, and convert it into insights, innovations, and business value. There are also real differences between conventional analytics and big data. In this article, we show some results of big data analysis using open DPC (Diagnosis Procedure Combination) data in areas of the central part of Japan: Toyama, Ishikawa, Fukui, Nagano, Gifu, Aichi, Shizuoka, and Mie Prefectures. These 8 prefectures contain 51 medical administration areas called second medical areas. By applying big data analysis techniques such as k-means, hierarchical clustering, and self-organizing maps to DPC data, we can visualize the disease structure and detect similarities or variations among the 51 second medical areas. The combination of a big data analysis technique and open DPC data is a very powerful method to depict real figures on patient distribution in Japan.
A marker-free system for the analysis of movement disabilities.

PubMed

Legrand, L; Marzani, F; Dusserre, L

1998-01-01

A major step toward improving the treatments of disabled persons may be achieved by using motion analysis equipment. We are developing such a system. It allows the analysis of plane human motion (e.g. gait) without using the tracking of markers. The system is composed of one fixed camera which acquires an image sequence of a human in motion. Then the treatment is divided into two steps: first, a large number of pixels belonging to the boundaries of the human body are extracted at each acquisition time. Secondly, a two-dimensional model of the human body, based on tapered superquadrics, is successively matched with the sets of pixels previously extracted; a specific fuzzy clustering process is used for this purpose. Moreover, an optical flow procedure gives a prediction of the model location at each acquisition time from its location at the previous time. Finally we present some results of this process applied to a leg in motion.
Wildlife management by habitat units: A preliminary plan of action

NASA Technical Reports Server (NTRS)

Frentress, C. D.; Frye, R. G.

1975-01-01

Procedures for yielding vegetation type maps were developed using LANDSAT data and a computer assisted classification analysis (LARSYS) to assist in managing populations of wildlife species by defined area units. Ground cover in Travis County, Texas was classified on two occasions using a modified version of the unsupervised approach to classification. The first classification produced a total of 17 classes. Examination revealed that further grouping was justified. A second analysis produced 10 classes which were displayed on printouts which were later color-coded. The final classification was 82 percent accurate. While the classification map appeared to satisfactorily depict the existing vegetation, two classes were determined to contain significant error. The major sources of error could have been eliminated by stratifying cluster sites more closely among previously mapped soil associations that are identified with particular plant associations and by precisely defining class nomenclature using established criteria early in the analysis.
Utility of correlation techniques in gravity and magnetic interpretation

NASA Technical Reports Server (NTRS)

Chandler, V. W.; Koski, J. S.; Braice, L. W.; Hinze, W. J.

1977-01-01

Internal correspondence uses Poisson's Theorem in a moving-window linear regression analysis between the anomalous first vertical derivative of gravity and total magnetic field reduced to the pole. The regression parameters provide critical information on source characteristics. The correlation coefficient indicates the strength of the relation between magnetics and gravity. Slope value gives delta j/delta sigma estimates of the anomalous source. The intercept furnishes information on anomaly interference. Cluster analysis consists of the classification of subsets of data into groups of similarity based on correlation of selected characteristics of the anomalies. Model studies are used to illustrate implementation and interpretation procedures of these methods, particularly internal correspondence. Analysis of the results of applying these methods to data from the midcontinent and a transcontinental profile shows they can be useful in identifying crustal provinces, providing information on horizontal and vertical variations of physical properties over province size zones, validating long wavelength anomalies, and isolating geomagnetic field removal problems.
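The moving-window regression behind internal correspondence can be sketched as follows: in each window the magnetic signal is regressed on the gravity derivative, and the slope, intercept and correlation coefficient are kept as profile attributes. The function names, the choice of regressing magnetics on gravity, and the sample profiles are assumptions made for illustration.

```python
import math


def linear_fit(x, y):
    """Least-squares slope, intercept and correlation coefficient of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0:
        raise ValueError("x is constant within the window")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # A flat y within the window has no defined correlation; report 0 there.
    r = sxy / math.sqrt(sxx * syy) if syy > 0 else 0.0
    return slope, intercept, r


def moving_window_regression(gravity_deriv, magnetic, window):
    """Apply linear_fit to every window of consecutive samples along a profile."""
    if len(gravity_deriv) != len(magnetic):
        raise ValueError("profiles must have the same length")
    last_start = len(gravity_deriv) - window
    return [
        linear_fit(gravity_deriv[i:i + window], magnetic[i:i + window])
        for i in range(last_start + 1)
    ]


# Hypothetical profiles: magnetic = 2 * gravity_deriv + 1 everywhere.
gravity_deriv = [0.0, 1.0, 2.0, 3.0, 4.0]
magnetic = [1.0, 3.0, 5.0, 7.0, 9.0]

results = moving_window_regression(gravity_deriv, magnetic, window=3)
for slope, intercept, r in results:
    print(slope, intercept, r)
```

Under Poisson's relation the slope is proportional to the magnetization-to-density contrast ratio, so windows with high correlation and a stable slope point to a common source for both anomalies.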
Comparing cosmic web classifiers using information theory

DOE Office of Scientific and Technical Information (OSTI.GOV)

Leclercq, Florent; Lavaux, Guilhem; Wandelt, Benjamin

We introduce a decision scheme for optimally choosing a classifier, which segments the cosmic web into different structure types (voids, sheets, filaments, and clusters). Our framework, based on information theory, accounts for the design aims of different classes of possible applications: (i) parameter inference, (ii) model selection, and (iii) prediction of new observations. As an illustration, we use cosmographic maps of web-types in the Sloan Digital Sky Survey to assess the relative performance of the classifiers T-WEB, DIVA and ORIGAMI for: (i) analyzing the morphology of the cosmic web, (ii) discriminating dark energy models, and (iii) predicting galaxy colors. Our study substantiates a data-supported connection between cosmic web analysis and information theory, and paves the path towards principled design of analysis procedures for the next generation of galaxy surveys. We have made the cosmic web maps, galaxy catalog, and analysis scripts used in this work publicly available.
A hierarchical cluster analysis of normal-tension glaucoma using spectral-domain optical coherence tomography parameters.

PubMed

Bae, Hyoung Won; Ji, Yongwoo; Lee, Hye Sun; Lee, Naeun; Hong, Samin; Seong, Gong Je; Sung, Kyung Rim; Kim, Chan Yun

2015-01-01

Normal-tension glaucoma (NTG) is a heterogenous disease, and there is still controversy about subclassifications of this disorder. On the basis of spectral-domain optical coherence tomography (SD-OCT), we subdivided NTG with hierarchical cluster analysis using optic nerve head (ONH) parameters and retinal nerve fiber layer (RNFL) thicknesses. A total of 200 eyes of 200 NTG patients between March 2011 and June 2012 underwent SD-OCT scans to measure ONH parameters and RNFL thicknesses. We classified NTG into homogenous subgroups based on these variables using a hierarchical cluster analysis, and compared clusters to evaluate diverse NTG characteristics. Three clusters were found after hierarchical cluster analysis. Cluster 1 (62 eyes) had the thickest RNFL and widest rim area, and showed early glaucoma features. Cluster 2 (60 eyes) was characterized by the largest cup/disc ratio and cup volume, and showed advanced glaucomatous damage. Cluster 3 (78 eyes) had small disc areas in SD-OCT and were comprised of patients with significantly younger age, longer axial length, and greater myopia than the other 2 groups. A hierarchical cluster analysis of SD-OCT scans divided NTG patients into 3 groups based upon ONH parameters and RNFL thicknesses. It is anticipated that the small disc area group comprised of younger and more myopic patients may show unique features unlike the other 2 groups.
Children's patterns of reasoning about reading and addition concepts.

PubMed

Farrington-Flint, Lee; Canobi, Katherine H; Wood, Clare; Faulkner, Dorothy

2010-06-01

Children's reasoning was examined within two educational contexts (word reading and addition) so as to understand the factors that contribute to relational reasoning in the two domains. Sixty-seven 5- to 7-year-olds were given a series of related words to read or single-digit addition items to solve (interspersed with unrelated items). The frequency, accuracy, and response times of children's self-reports on the conceptually related items provided a measure of relational reasoning, while performance on the unrelated addition and reading items provided a measure of procedural skill. The results indicated that the children's ability to use conceptual relations to solve both reading and addition problems enhanced speed and accuracy levels, increased with age, and was related to procedural skill. However, regression analyses revealed that domain-specific competencies can best explain the use of conceptual relations in both reading and addition. Moreover, a cluster analysis revealed that children differ according to the academic domain in which they first apply conceptual relations and these differences are related to individual variation in their procedural skills within these particular domains. These results highlight the developmental significance of relational reasoning in the context of reading and addition and underscore the importance of concept-procedure links in explaining children's literacy and arithmetical development.
Cluster analysis of spontaneous preterm birth phenotypes identifies potential associations among preterm birth mechanisms.

PubMed

Esplin, M Sean; Manuck, Tracy A; Varner, Michael W; Christensen, Bryce; Biggio, Joseph; Bukowski, Radek; Parry, Samuel; Zhang, Heping; Huang, Hao; Andrews, William; Saade, George; Sadovsky, Yoel; Reddy, Uma M; Ilekis, John

2015-09-01

We sought to use an innovative tool that is based on common biologic pathways to identify specific phenotypes among women with spontaneous preterm birth (SPTB) to enhance investigators' ability to identify and to highlight common mechanisms and underlying genetic factors that are responsible for SPTB. We performed a secondary analysis of a prospective case-control multicenter study of SPTB. All cases delivered a preterm singleton at SPTB ≤34.0 weeks' gestation. Each woman was assessed for the presence of underlying SPTB causes. A hierarchic cluster analysis was used to identify groups of women with homogeneous phenotypic profiles. One of the phenotypic clusters was selected for candidate gene association analysis with the use of VEGAS software. One thousand twenty-eight women with SPTB were assigned phenotypes. Hierarchic clustering of the phenotypes revealed 5 major clusters. Cluster 1 (n = 445) was characterized by maternal stress; cluster 2 (n = 294) was characterized by premature membrane rupture; cluster 3 (n = 120) was characterized by familial factors, and cluster 4 (n = 63) was characterized by maternal comorbidities. Cluster 5 (n = 106) was multifactorial and characterized by infection (INF), decidual hemorrhage (DH), and placental dysfunction (PD). These 3 phenotypes were correlated highly by χ² analysis (PD and DH, P < 2.2e-6; PD and INF, P = 6.2e-10; INF and DH, P = .0036). Gene-based testing identified the INS (insulin) gene as significantly associated with cluster 3 of SPTB. We identified 5 major clusters of SPTB based on a phenotype tool and hierarchic clustering. There was significant correlation between several of the phenotypes. The INS gene was associated with familial factors that were underlying SPTB. Copyright © 2015 Elsevier Inc. All rights reserved.
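The pairwise phenotype associations reported above come from χ² analysis. The sketch below shows a Pearson χ² test on a 2×2 table, assuming no continuity correction, since the abstract does not say whether one was applied. The counts are invented and are not taken from the study.

```python
from math import erfc, sqrt


def chi2_2x2(a, b, c, d):
    """Pearson chi-square test (1 degree of freedom) for the 2x2 table

                 trait B present   trait B absent
    A present          a                 b
    A absent           c                 d

    Returns (statistic, p_value). No continuity correction is applied.
    """
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom the upper tail probability is erfc(sqrt(stat / 2)).
    return stat, erfc(sqrt(stat / 2))


# Hypothetical counts among 106 women: both phenotypes, first only,
# second only, neither.
stat, p = chi2_2x2(30, 10, 15, 51)
print(f"chi2 = {stat:.3f}, p = {p:.2e}")
```

A small p-value here indicates that the two phenotypes co-occur more often than independence would predict, which is the sense in which the abstract calls them highly correlated.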
