Interactive visual exploration and refinement of cluster assignments.
Kern, Michael; Lex, Alexander; Gehlenborg, Nils; Johnson, Chris R
2017-09-12
With ever-increasing amounts of data produced in biology research, scientists are in need of efficient data analysis methods. Cluster analysis, combined with visualization of the results, is one such method that can be used to make sense of large data volumes. At the same time, cluster analysis is known to be imperfect and depends on the choice of algorithms, parameters, and distance measures. Most clustering algorithms don't properly account for ambiguity in the source data, as records are often assigned to discrete clusters, even if an assignment is unclear. While there are metrics and visualization techniques that allow analysts to compare clusterings or to judge cluster quality, there is no comprehensive method that allows analysts to evaluate, compare, and refine cluster assignments based on the source data, derived scores, and contextual data. In this paper, we introduce a method that explicitly visualizes the quality of cluster assignments, allows comparisons of clustering results and enables analysts to manually curate and refine cluster assignments. Our methods are applicable to matrix data clustered with partitional, hierarchical, and fuzzy clustering algorithms. Furthermore, we enable analysts to explore clustering results in context of other data, for example, to observe whether a clustering of genomic data results in a meaningful differentiation in phenotypes. Our methods are integrated into Caleydo StratomeX, a popular, web-based, disease subtype analysis tool. We show in a usage scenario that our approach can reveal ambiguities in cluster assignments and produce improved clusterings that better differentiate genotypes and phenotypes.
ICAP - An Interactive Cluster Analysis Procedure for analyzing remotely sensed data
NASA Technical Reports Server (NTRS)
Wharton, S. W.; Turner, B. J.
1981-01-01
An Interactive Cluster Analysis Procedure (ICAP) was developed to derive classifier training statistics from remotely sensed data. ICAP differs from conventional clustering algorithms by allowing the analyst to optimize the cluster configuration by inspection, rather than by manipulating process parameters. Control of the clustering process alternates between the algorithm, which creates new centroids and forms clusters, and the analyst, who can evaluate and elect to modify the cluster structure. Clusters can be deleted, or lumped together pairwise, or new centroids can be added. A summary of the cluster statistics can be requested to facilitate cluster manipulation. The principal advantage of this approach is that it allows prior information (when available) to be used directly in the analysis, since the analyst interacts with ICAP in a straightforward manner, using basic terms with which he is more likely to be familiar. Results from testing ICAP showed that an informed use of ICAP can improve classification, as compared to an existing cluster analysis procedure.
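The alternation described above, where the algorithm forms clusters and the analyst then deletes, merges, or adds centroids, can be sketched in a few lines. This is only an illustrative sketch and not the ICAP implementation: the function names, the unweighted midpoint used for merging, and the toy points are all assumptions.

```python
import math

def assign(points, centroids):
    """Algorithm step: label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)),
                key=lambda k: math.dist(p, centroids[k]))
            for p in points]

def recompute(points, labels, centroids):
    """Algorithm step: move each centroid to the mean of its members."""
    updated = []
    for k, c in enumerate(centroids):
        members = [p for p, l in zip(points, labels) if l == k]
        if members:
            updated.append(tuple(sum(x) / len(members) for x in zip(*members)))
        else:
            updated.append(c)  # empty cluster: leave the centroid where it is
    return updated

# Analyst steps (hypothetical helpers mirroring the three edits in the abstract).
def delete_cluster(centroids, k):
    return centroids[:k] + centroids[k + 1:]

def merge_clusters(centroids, i, j):
    # Assumption: merge by taking the unweighted midpoint of the two centroids.
    merged = tuple((a + b) / 2 for a, b in zip(centroids[i], centroids[j]))
    keep = [c for idx, c in enumerate(centroids) if idx not in (i, j)]
    return keep + [merged]

def add_centroid(centroids, new):
    return centroids + [tuple(new)]

# Toy data: the analyst sees an isolated point and adds a third centroid near it.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]
centroids = [(0.0, 0.5), (5.0, 5.5)]
centroids = add_centroid(centroids, (9.0, 0.0))
labels = assign(points, centroids)
centroids = recompute(points, labels, centroids)
```

After each analyst edit, control returns to `assign` and `recompute`, which is the alternation the abstract describes.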
Patterns of victimization between and within peer clusters in a high school social network.
Swartz, Kristin; Reyns, Bradford W; Wilcox, Pamela; Dunham, Jessica R
2012-01-01
This study presents a descriptive analysis of patterns of violent victimization between and within the various cohesive clusters of peers comprising a sample of more than 500 9th-12th grade students from one high school. Social network analysis techniques provide a visualization of the overall friendship network structure and allow for the examination of variation in victimization across the various peer clusters within the larger network. Social relationships among clusters with varying levels of victimization are also illustrated so as to provide a sense of possible spatial clustering or diffusion of victimization across proximal peer clusters. Additionally, to provide a sense of the sorts of peer clusters that support (or do not support) victimization, characteristics of clusters at both the high and low ends of the victimization scale are discussed. Finally, several of the peer clusters at both the high and low ends of the victimization continuum are "unpacked", allowing examination of within-network individual-level differences in victimization for these select clusters.
ERIC Educational Resources Information Center
Mun, Eun Young; von Eye, Alexander; Bates, Marsha E.; Vaschillo, Evgeny G.
2008-01-01
Model-based cluster analysis is a new clustering procedure to investigate population heterogeneity utilizing finite mixture multivariate normal densities. It is an inferentially based, statistically principled procedure that allows comparison of nonnested models using the Bayesian information criterion to compare multiple models and identify the…
Comparing the performance of biomedical clustering methods.
Wiwie, Christian; Baumbach, Jan; Röttger, Richard
2015-11-01
Identifying groups of similar objects is a popular first step in biomedical data analysis, but it is error-prone and impossible to perform manually. Many computational methods have been developed to tackle this problem. Here we assessed 13 well-known methods using 24 data sets ranging from gene expression to protein domains. Performance was judged on the basis of 13 common cluster validity indices. We developed a clustering analysis platform, ClustEval (http://clusteval.mpi-inf.mpg.de), to promote streamlined evaluation, comparison and reproducibility of clustering results in the future. This allowed us to objectively evaluate the performance of all tools on all data sets with up to 1,000 different parameter sets each, resulting in a total of more than 4 million calculated cluster validity indices. We observed that there was no universal best performer, but on the basis of this wide-ranging comparison we were able to develop a short guideline for biomedical clustering tasks. ClustEval allows biomedical researchers to pick the appropriate tool for their data type and allows method developers to compare their tool to the state of the art.
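As an illustration of what a cluster validity index measures, the sketch below computes the mean silhouette width, one widely used index. The abstract does not list the 13 indices, so treating the silhouette as one of them is an assumption, and the code is independent of ClustEval.

```python
import math

def silhouette(points, labels):
    """Mean silhouette width; assumes at least two clusters and distinct points.

    Points in singleton clusters are given a score of 0 by convention.
    """
    scores = []
    for i, p in enumerate(points):
        same = [math.dist(p, q) for j, q in enumerate(points)
                if labels[j] == labels[i] and j != i]
        if not same:
            scores.append(0.0)
            continue
        a = sum(same) / len(same)  # mean distance to own cluster
        b = min(                   # mean distance to the nearest other cluster
            sum(math.dist(p, q) for q, l in zip(points, labels) if l == other)
            / labels.count(other)
            for other in set(labels) if other != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = [0, 0, 1, 1]  # groups the two nearby pairs
bad = [0, 1, 0, 1]   # pairs each point with a distant one
```

On this toy data the `good` labelling scores close to 1 and the `bad` labelling scores below 0, which is the kind of ranking a validity index is meant to provide when comparing parameter sets.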
Clustering analysis of proteins from microbial genomes at multiple levels of resolution.
Zaslavsky, Leonid; Ciufo, Stacy; Fedorov, Boris; Tatusova, Tatiana
2016-08-31
Microbial genomes at the National Center for Biotechnology Information (NCBI) represent a large collection of more than 35,000 assemblies. There are several complexities associated with the data: a great variation in sampling density since human pathogens are densely sampled while other bacteria are less represented; different protein families occur in annotations with different frequencies; and the quality of genome annotation varies greatly. In order to extract useful information from these sophisticated data, the analysis needs to be performed at multiple levels of phylogenomic resolution and protein similarity, with an adequate sampling strategy. Protein clustering is used to construct meaningful and stable groups of similar proteins to be used for analysis and functional annotation. Our approach is to create protein clusters at three levels. First, tight clusters in groups of closely-related genomes (species-level clades) are constructed using a combined approach that takes into account both sequence similarity and genome context. Second, clustroids of conservative in-clade clusters are organized into seed global clusters. Finally, global protein clusters are built around the seed clusters. We propose filtering strategies that limit the protein set included in global clustering. The in-clade clustering procedure, subsequent selection of clustroids and organization into seed global clusters provide a robust representation and a high rate of compression. Seed protein clusters are further extended by adding related proteins. Extended seed clusters include a significant part of the data and represent all major known cell machinery. The remaining part, coming from either non-conservative (unique) or rapidly evolving proteins, from rare genomes, or resulting from low-quality annotation, does not group together well. Processing these proteins requires significant computational resources and results in a large number of questionable clusters. The developed filtering strategies make it possible to identify and exclude such peripheral proteins, limiting the protein dataset in global clustering. Overall, the proposed methodology allows the relevant data to be obtained at different levels of detail and data redundancy to be eliminated while keeping biologically interesting variations.
Application of multivariable statistical techniques in plant-wide WWTP control strategies analysis.
Flores, X; Comas, J; Roda, I R; Jiménez, L; Gernaey, K V
2007-01-01
The main objective of this paper is to present the application of selected multivariable statistical techniques in plant-wide wastewater treatment plant (WWTP) control strategies analysis. In this study, cluster analysis (CA), principal component analysis/factor analysis (PCA/FA) and discriminant analysis (DA) are applied to the evaluation matrix data set obtained by simulation of several control strategies applied to the plant-wide IWA Benchmark Simulation Model No 2 (BSM2). These techniques make it possible (i) to determine natural groups or clusters of control strategies with similar behaviour, (ii) to find and interpret hidden, complex and causal relation features in the data set and (iii) to identify important discriminant variables within the groups found by the cluster analysis. This study illustrates the usefulness of multivariable statistical techniques for both analysis and interpretation of complex multicriteria data sets and allows an improved use of information for effective evaluation of control strategies.
Bolin, Jocelyn H.; Edwards, Julianne M.; Finch, W. Holmes; Cassady, Jerrell C.
2014-01-01
Although traditional clustering methods (e.g., K-means) have been shown to be useful in the social sciences it is often difficult for such methods to handle situations where clusters in the population overlap or are ambiguous. Fuzzy clustering, a method already recognized in many disciplines, provides a more flexible alternative to these traditional clustering methods. Fuzzy clustering differs from other traditional clustering methods in that it allows for a case to belong to multiple clusters simultaneously. Unfortunately, fuzzy clustering techniques remain relatively unused in the social and behavioral sciences. The purpose of this paper is to introduce fuzzy clustering to these audiences who are currently relatively unfamiliar with the technique. In order to demonstrate the advantages associated with this method, cluster solutions of a common perfectionism measure were created using both fuzzy clustering and K-means clustering, and the results compared. Results of these analyses reveal that different cluster solutions are found by the two methods, and the similarity between the different clustering solutions depends on the amount of cluster overlap allowed for in fuzzy clustering. PMID:24795683
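The difference between hard and fuzzy assignment can be made concrete with the standard fuzzy c-means membership formula, evaluated here for fixed cluster centers. This is a minimal sketch: the centers, the test point, and the function name are illustrative assumptions, and the fuzzifier `m` stands in for "the amount of cluster overlap allowed".

```python
import math

def fuzzy_memberships(point, centers, m=2.0):
    """Fuzzy c-means membership of one point in each cluster (m > 1).

    u_k = 1 / sum_j (d_k / d_j) ** (2 / (m - 1)); memberships sum to 1.
    """
    d = [math.dist(point, c) for c in centers]
    if 0.0 in d:  # the point coincides with one or more centers
        hits = d.count(0.0)
        return [1.0 / hits if x == 0.0 else 0.0 for x in d]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((d_k / d_j) ** power for d_j in d) for d_k in d]

centers = [(0.0, 0.0), (4.0, 0.0)]
u_crisp = fuzzy_memberships((1.0, 0.0), centers, m=2.0)  # about [0.9, 0.1]
u_soft = fuzzy_memberships((1.0, 0.0), centers, m=3.0)   # about [0.75, 0.25]

# A K-means style hard assignment keeps only the largest membership.
hard_label = max(range(len(centers)), key=lambda k: u_crisp[k])
```

Raising `m` pulls the memberships toward equal shares, so the hard label (cluster 0 here) stays the same while the fuzzy solution reports increasing overlap.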
Cohen, Mitchell J; Grossman, Adam D; Morabito, Diane; Knudson, M Margaret; Butte, Atul J; Manley, Geoffrey T
2010-01-01
Advances in technology have made extensive monitoring of patient physiology the standard of care in intensive care units (ICUs). While many systems exist to compile these data, there has been no systematic multivariate analysis and categorization across patient physiological data. The sheer volume and complexity of these data make pattern recognition or identification of patient state difficult. Hierarchical cluster analysis allows visualization of high dimensional data and enables pattern recognition and identification of physiologic patient states. We hypothesized that processing of multivariate data using hierarchical clustering techniques would allow identification of otherwise hidden patient physiologic patterns that would be predictive of outcome. Multivariate physiologic and ventilator data were collected continuously using a multimodal bioinformatics system in the surgical ICU at San Francisco General Hospital. These data were incorporated with non-continuous data and stored on a server in the ICU. A hierarchical clustering algorithm grouped each minute of data into 1 of 10 clusters. Clusters were correlated with outcome measures including incidence of infection, multiple organ failure (MOF), and mortality. We identified 10 clusters, which we defined as distinct patient states. While patients transitioned between states, they spent significant amounts of time in each. Clusters were enriched for our outcome measures: 2 of the 10 states were enriched for infection, 6 of 10 were enriched for MOF, and 3 of 10 were enriched for death. Further analysis of correlations between pairs of variables within each cluster reveals significant differences in physiology between clusters. Here we show for the first time the feasibility of clustering physiological measurements to identify clinically relevant patient states after trauma. 
These results demonstrate that hierarchical clustering techniques can be useful for visualizing complex multivariate data and may provide new insights for the care of critically injured patients.
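The grouping step, in which each minute of multivariate data is placed into one of a fixed number of clusters, can be sketched with a small agglomerative procedure. The linkage rule, the two variables, and the values below are assumptions for illustration; the abstract does not state which linkage or preprocessing the authors used.

```python
import math

def agglomerate(points, n_clusters):
    """Average-linkage agglomerative clustering.

    Starts with one cluster per observation and repeatedly merges the two
    closest clusters until n_clusters remain. Returns lists of row indices.
    """
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        total = sum(math.dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        pairs = [(linkage(clusters[i], clusters[j]), i, j)
                 for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        _, i, j = min(pairs)
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Hypothetical per-minute observations: (heart rate, mean arterial pressure).
minutes = [(60.0, 90.0), (62.0, 92.0), (110.0, 60.0), (112.0, 58.0), (85.0, 75.0)]
states = agglomerate(minutes, 3)
```

In practice variables measured in different units would normally be standardized before distances are computed, and each resulting group of minutes would then be compared against outcome measures.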
Analysis of the nutritional status of algae by Fourier transform infrared chemical imaging
NASA Astrophysics Data System (ADS)
Hirschmugl, Carol J.; Bayarri, Zuheir-El; Bunta, Maria; Holt, Justin B.; Giordano, Mario
2006-09-01
A new non-destructive method to study the nutritional status of algal cells and their environments is demonstrated. This approach allows rapid examination of whole cells with little or no pre-treatment, providing a large amount of information on the biochemical composition of cells and growth medium. The method is based on the analysis of a collection of infrared (IR) spectra for individual cells; each spectrum describes the biochemical composition of a portion of a cell; a complete set of spectra is used to reconstruct an image of the entire cell. To obtain spatially resolved information, synchrotron radiation was used as a bright IR source. We tested this method on the green flagellate Euglena gracilis; a comparison was conducted between cells grown in nutrient replete conditions (Type 1) and cells allowed to deplete their medium (Type 2). Complete sets of spectra for individual cells of both types were analyzed with agglomerative hierarchical clustering, leading to distinct clusters representative of the two types of cells. The average spectra for the clusters confirmed the similarities between the clusters and the types of cells. The clustering analysis, therefore, allows the distinction of cells of the same species, but with different nutritional histories. In order to facilitate the application of the method and reduce manipulation (washing), we analyzed the cells in the presence of residual medium. The results obtained showed that even with residual medium the outcome of the clustering analysis is reliable. Our results demonstrate the applicability of FTIR microspectroscopy for ecological and ecophysiological studies.
NASA Technical Reports Server (NTRS)
Hasler, Nicole; Bulbul, Esra; Bonamente, Massimiliano; Carlstrom, John E.; Culverhouse, Thomas L.; Gralla, Megan; Greer, Christopher; Lamb, James W.; Hawkins, David; Hennessy, Ryan;
2012-01-01
We perform a joint analysis of X-ray and Sunyaev-Zel'dovich effect data using an analytic model that describes the gas properties of galaxy clusters. The joint analysis allows the measurement of the cluster gas mass fraction profile and Hubble constant independent of cosmological parameters. Weak cosmological priors are used to calculate the overdensity radius within which the gas mass fractions are reported. Such an analysis can provide direct constraints on the evolution of the cluster gas mass fraction with redshift. We validate the model and the joint analysis on high signal-to-noise data from the Chandra X-ray Observatory and the Sunyaev-Zel'dovich Array for two clusters, A2631 and A2204.
a Web-Based Interactive Platform for Co-Clustering Spatio-Temporal Data
NASA Astrophysics Data System (ADS)
Wu, X.; Poorthuis, A.; Zurita-Milla, R.; Kraak, M.-J.
2017-09-01
Since current studies on clustering analysis mainly focus on exploring spatial or temporal patterns separately, a co-clustering algorithm is utilized in this study to enable the concurrent analysis of spatio-temporal patterns. To allow users to adopt and adapt the algorithm for their own analysis, it is integrated within the server side of an interactive web-based platform. The client side of the platform, running within any modern browser, is a graphical user interface (GUI) with multiple linked visualizations that facilitates the understanding, exploration and interpretation of the raw dataset and co-clustering results. Users can also upload their own datasets and adjust clustering parameters within the platform. To illustrate the use of this platform, an annual temperature dataset from 28 weather stations over 20 years in the Netherlands is used. After the dataset is loaded, it is visualized in a set of linked visualizations: a geographical map, a timeline and a heatmap. This aids the user in understanding the nature of their dataset and the appropriate selection of co-clustering parameters. Once the dataset is processed by the co-clustering algorithm, the results are visualized in the small multiples, a heatmap and a timeline to provide various views for better understanding and also further interpretation. Since the visualization and analysis are integrated in a seamless platform, the user can explore different sets of co-clustering parameters and instantly view the results in order to do iterative, exploratory data analysis. As such, this interactive web-based platform allows users to analyze spatio-temporal data using the co-clustering method and also helps the understanding of the results using multiple linked visualizations.
Clustering analysis for muon tomography data elaboration in the Muon Portal project
NASA Astrophysics Data System (ADS)
Bandieramonte, M.; Antonuccio-Delogu, V.; Becciani, U.; Costa, A.; La Rocca, P.; Massimino, P.; Petta, C.; Pistagna, C.; Riggi, F.; Riggi, S.; Sciacca, E.; Vitello, F.
2015-05-01
Clustering analysis is a multivariate data analysis technique that gathers statistical data units into groups so as to minimize the logical distance within each group and maximize the distance between different groups. In these proceedings, the authors present a novel approach to muon tomography data analysis based on clustering algorithms. As a case study we present the Muon Portal project, which aims to build and operate a dedicated particle detector for the inspection of harbor containers to hinder the smuggling of nuclear materials. Clustering techniques, working directly on scattering points, help to detect the presence of suspicious items inside the container, acting, as will be shown, as a filter for a preliminary analysis of the data.
Symptom Cluster Research With Biomarkers and Genetics Using Latent Class Analysis.
Conley, Samantha
2017-12-01
The purpose of this article is to provide an overview of latent class analysis (LCA) and examples from symptom cluster research that includes biomarkers and genetics. A review of LCA with genetics and biomarkers was conducted using Medline, Embase, PubMed, and Google Scholar. LCA is a robust latent variable model used to cluster categorical data and allows for the determination of empirically determined symptom clusters. Researchers should consider using LCA to link empirically determined symptom clusters to biomarkers and genetics to better understand the underlying etiology of symptom clusters. The full potential of LCA in symptom cluster research has not yet been realized because it has been used in limited populations, and researchers have explored limited biologic pathways.
Freud: a software suite for high-throughput simulation analysis
NASA Astrophysics Data System (ADS)
Harper, Eric; Spellings, Matthew; Anderson, Joshua; Glotzer, Sharon
Computer simulation is an indispensable tool for the study of a wide variety of systems. As simulations scale to fill petascale and exascale supercomputing clusters, so too does the size of the data produced, as well as the difficulty in analyzing these data. We present Freud, an analysis software suite for efficient analysis of simulation data. Freud makes no assumptions about the system being analyzed, allowing for general analysis methods to be applied to nearly any type of simulation. Freud includes standard analysis methods such as the radial distribution function, as well as new methods including the potential of mean force and torque and local crystal environment analysis. Freud combines a Python interface with fast, parallel C++ analysis routines to run efficiently on laptops, workstations, and supercomputing clusters. Data analysis on clusters reduces data transfer requirements, a prohibitive cost for petascale computing. Used in conjunction with simulation software, Freud allows for smart simulations that adapt to the current state of the system, enabling the study of phenomena such as nucleation and growth, intelligent investigation of phases and phase transitions, and determination of effective pair potentials.
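For readers unfamiliar with the radial distribution function mentioned above, the following plain-Python sketch shows what such an analysis computes. It deliberately does not use Freud's own interface; the function name, the cubic periodic box, and the brute-force pair loop are assumptions chosen for clarity rather than speed.

```python
import math

def radial_distribution(positions, box, r_max, n_bins):
    """Histogram estimate of g(r) in a cubic periodic box of side `box`.

    Assumes r_max <= box / 2 so the minimum-image convention is valid.
    """
    n = len(positions)
    dr = r_max / n_bins
    counts = [0] * n_bins
    for i in range(n):
        for j in range(i + 1, n):
            d2 = 0.0
            for a, b in zip(positions[i], positions[j]):
                delta = a - b
                delta -= box * round(delta / box)  # nearest periodic image
                d2 += delta * delta
            r = math.sqrt(d2)
            if r < r_max:
                counts[int(r / dr)] += 2  # the pair counts once per particle
    density = n / box ** 3
    g = []
    for k, c in enumerate(counts):
        # Normalize by the pair count expected for an ideal gas in this shell.
        shell = 4.0 / 3.0 * math.pi * (((k + 1) * dr) ** 3 - (k * dr) ** 3)
        g.append(c / (n * density * shell))
    return g

# Two particles one unit apart: all weight falls in the bin covering r = 1.
g = radial_distribution([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
                        box=10.0, r_max=2.0, n_bins=4)
```

The double loop scales quadratically with particle count, which is the kind of cost that motivates compiled, parallel routines for large simulations.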
Ofner, Johannes; Kamilli, Katharina A; Eitenberger, Elisabeth; Friedbacher, Gernot; Lendl, Bernhard; Held, Andreas; Lohninger, Hans
2015-09-15
The chemometric analysis of multisensor hyperspectral data allows a comprehensive image-based analysis of precipitated atmospheric particles. Atmospheric particulate matter was precipitated on aluminum foils and analyzed by Raman microspectroscopy and subsequently by electron microscopy and energy dispersive X-ray spectroscopy. All images were acquired from the same 100 × 100 μm² spot. The two hyperspectral data sets and the high-resolution scanning electron microscope images were fused into a combined multisensor hyperspectral data set. This multisensor data cube was analyzed using principal component analysis, hierarchical cluster analysis, k-means clustering, and vertex component analysis. The detailed chemometric analysis of the multisensor data allowed an extensive chemical interpretation of the precipitated particles, and their structure and composition led to a comprehensive understanding of atmospheric particulate matter.
the-wizz: clustering redshift estimation for everyone
NASA Astrophysics Data System (ADS)
Morrison, C. B.; Hildebrandt, H.; Schmidt, S. J.; Baldry, I. K.; Bilicki, M.; Choi, A.; Erben, T.; Schneider, P.
2017-05-01
We present the-wizz, an open source and user-friendly software for estimating the redshift distributions of photometric galaxies with unknown redshifts by spatially cross-correlating them against a reference sample with known redshifts. The main benefit of the-wizz is in separating the angular pair finding and correlation estimation from the computation of the output clustering redshifts allowing anyone to create a clustering redshift for their sample without the intervention of an 'expert'. It allows the end user of a given survey to select any subsample of photometric galaxies with unknown redshifts, match this sample's catalogue indices into a value-added data file and produce a clustering redshift estimation for this sample in a fraction of the time it would take to run all the angular correlations needed to produce a clustering redshift. We show results with this software using photometric data from the Kilo-Degree Survey (KiDS) and spectroscopic redshifts from the Galaxy and Mass Assembly survey and the Sloan Digital Sky Survey. The results we present for KiDS are consistent with the redshift distributions used in a recent cosmic shear analysis from the survey. We also present results using a hybrid machine learning-clustering redshift analysis that enables the estimation of clustering redshifts for individual galaxies. the-wizz can be downloaded at http://github.com/morriscb/The-wiZZ/.
Shah, Sohil Atul
2017-01-01
Clustering is a fundamental procedure in the analysis of scientific data. It is used ubiquitously across the sciences. Despite decades of research, existing clustering algorithms have limited effectiveness in high dimensions and often require tuning parameters for different domains and datasets. We present a clustering algorithm that achieves high accuracy across multiple domains and scales efficiently to high dimensions and large datasets. The presented algorithm optimizes a smooth continuous objective, which is based on robust statistics and allows heavily mixed clusters to be untangled. The continuous nature of the objective also allows clustering to be integrated as a module in end-to-end feature learning pipelines. We demonstrate this by extending the algorithm to perform joint clustering and dimensionality reduction by efficiently optimizing a continuous global objective. The presented approach is evaluated on large datasets of faces, hand-written digits, objects, newswire articles, sensor readings from the Space Shuttle, and protein expression levels. Our method achieves high accuracy across all datasets, outperforming the best prior algorithm by a factor of 3 in average rank. PMID:28851838
Kasza, J; Hemming, K; Hooper, R; Matthews, Jns; Forbes, A B
2017-01-01
Stepped wedge and cluster randomised crossover trials are examples of cluster randomised designs conducted over multiple time periods that are being used with increasing frequency in health research. Recent systematic reviews of both of these designs indicate that the within-cluster correlation is typically taken account of in the analysis of data using a random intercept mixed model, implying a constant correlation between any two individuals in the same cluster no matter how far apart in time they are measured: within-period and between-period intra-cluster correlations are assumed to be identical. Recently proposed extensions allow the within- and between-period intra-cluster correlations to differ, although these methods require that all between-period intra-cluster correlations are identical, which may not be appropriate in all situations. Motivated by a proposed intensive care cluster randomised trial, we propose an alternative correlation structure for repeated cross-sectional multiple-period cluster randomised trials in which the between-period intra-cluster correlation is allowed to decay depending on the distance between measurements. We present results for the variance of treatment effect estimators for varying amounts of decay, investigating the consequences of the variation in decay on sample size planning for stepped wedge, cluster crossover and multiple-period parallel-arm cluster randomised trials. We also investigate the impact of assuming constant between-period intra-cluster correlations instead of decaying between-period intra-cluster correlations. Our results indicate that in certain design configurations, including the one corresponding to the proposed trial, a correlation decay can have an important impact on variances of treatment effect estimators, and hence on sample size and power. An R Shiny app allows readers to interactively explore the impact of correlation decay.
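The contrast between a constant and a decaying between-period correlation can be illustrated with a small matrix. The exponential form used here, in which the correlation shrinks by a fixed factor for each period of separation, is an assumption made for the sketch; the abstract only states that the correlation decays with the distance between measurements.

```python
def period_correlations(n_periods, icc, decay):
    """Correlation between two individuals in the same cluster, by period pair.

    Entry [s][t] is icc * decay ** |s - t|: the diagonal holds the
    within-period intra-cluster correlation, and decay = 1 recovers the
    constant correlation implied by a random intercept model.
    """
    return [[icc * decay ** abs(s - t) for t in range(n_periods)]
            for s in range(n_periods)]

constant = period_correlations(4, icc=0.05, decay=1.0)
decaying = period_correlations(4, icc=0.05, decay=0.8)
```

With `decay=0.8`, individuals measured three periods apart have roughly half the correlation of individuals measured in the same period, which is the feature that can change the variance of the treatment effect estimator and hence the required sample size.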
Unsupervised analysis of small animal dynamic Cerenkov luminescence imaging
NASA Astrophysics Data System (ADS)
Spinelli, Antonello E.; Boschi, Federico
2011-12-01
Clustering analysis (CA) and principal component analysis (PCA) were applied to dynamic Cerenkov luminescence images (dCLI). In order to investigate the performance of the proposed approaches, two distinct dynamic data sets obtained by injecting mice with 32P-ATP and 18F-FDG were acquired using the IVIS 200 optical imager. The k-means clustering algorithm was applied to dCLI and was implemented using Interactive Data Language (IDL) 8.1. We show that cluster analysis gives good agreement between the clustered regions and the corresponding emission regions such as the bladder, the liver, and the tumor. We also show a good correspondence between the time activity curves of the different regions obtained by using CA and manual region of interest analysis on dCLIT and PCA images. We conclude that CA provides an automatic unsupervised method for the analysis of preclinical dynamic Cerenkov luminescence image data.
Murugesan, Sugeerth; Bouchard, Kristofer; Chang, Edward; ...
2017-06-06
There exists a need for effective and easy-to-use software tools supporting the analysis of complex Electrocorticography (ECoG) data. Understanding how epileptic seizures develop or identifying diagnostic indicators for neurological diseases requires the in-depth analysis of neural activity data from ECoG. Such data is multi-scale and is of high spatio-temporal resolution. Comprehensive analysis of this data should be supported by interactive visual analysis methods that allow a scientist to understand functional patterns at varying levels of granularity and comprehend its time-varying behavior. We introduce a novel multi-scale visual analysis system, ECoG ClusterFlow, for the detailed exploration of ECoG data. Our system detects and visualizes dynamic high-level structures, such as communities, derived from the time-varying connectivity network. The system supports two major views: 1) an overview summarizing the evolution of clusters over time and 2) an electrode view using hierarchical glyph-based design to visualize the propagation of clusters in their spatial, anatomical context. We present case studies that were performed in collaboration with neuroscientists and neurosurgeons using simulated and recorded epileptic seizure data to demonstrate our system's effectiveness. ECoG ClusterFlow supports the comparison of spatio-temporal patterns for specific time intervals and allows a user to utilize various clustering algorithms. Neuroscientists can identify the site of seizure genesis and its spatial progression during the various stages of a seizure. Our system serves as a fast and powerful means for the generation of preliminary hypotheses that can be used as a basis for subsequent application of rigorous statistical methods, with the ultimate goal being the clinical treatment of epileptogenic zones.
Multiscale visual quality assessment for cluster analysis with self-organizing maps
NASA Astrophysics Data System (ADS)
Bernard, Jürgen; von Landesberger, Tatiana; Bremm, Sebastian; Schreck, Tobias
2011-01-01
Cluster analysis is an important data mining technique for analyzing large amounts of data, reducing many objects to a limited number of clusters. Cluster visualization techniques aim at supporting the user in better understanding the characteristics and relationships among the found clusters. While promising approaches to visual cluster analysis already exist, these usually fall short of incorporating the quality of the obtained clustering results. However, due to the nature of the clustering process, quality plays an important role, since for most practical data sets many different clusterings are possible. Being aware of clustering quality is important for judging the expressiveness of a given cluster visualization or for adjusting the clustering process with refined parameters, among other uses. In this work, we present an encompassing suite of visual tools for quality assessment of an important visual cluster algorithm, namely, the Self-Organizing Map (SOM) technique. We define, measure, and visualize the notion of SOM cluster quality along a hierarchy of cluster abstractions. The quality abstractions range from simple scalar-valued quality scores up to the structural comparison of a given SOM clustering with the output of additional supportive clustering methods. The suite of methods allows the user to assess the SOM quality on the appropriate abstraction level and arrive at improved clustering results. We implement our tools in an integrated system, apply it to experimental data sets, and show its applicability.
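At the simplest end of the abstraction hierarchy described above sits a single scalar score. The sketch below computes the quantization error, a commonly used SOM quality measure; whether it is among the scores the authors use is an assumption, and the codebook and data values are invented for illustration.

```python
import math

def quantization_error(data, codebook):
    """Mean distance from each data vector to its best-matching SOM unit."""
    return sum(min(math.dist(x, w) for w in codebook) for x in data) / len(data)

def unit_hits(data, codebook):
    """Number of data vectors mapped to each unit (per-unit detail level)."""
    hits = [0] * len(codebook)
    for x in data:
        best = min(range(len(codebook)), key=lambda k: math.dist(x, codebook[k]))
        hits[best] += 1
    return hits

data = [(0.0, 0.0), (1.0, 0.0), (4.0, 0.0)]
codebook = [(0.0, 0.0), (4.0, 0.0)]  # weight vectors of two trained SOM units
score = quantization_error(data, codebook)
hits = unit_hits(data, codebook)
```

A lower `score` means the units represent the data more closely, while `hits` gives a per-unit view that can expose units carrying most of the data or none at all.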
Fernández-Arjona, María Del Mar; Grondona, Jesús M; Granados-Durán, Pablo; Fernández-Llebrez, Pedro; López-Ávalos, María D
2017-01-01
It is known that microglia morphology and function are closely related, but only a few studies have objectively described different morphological subtypes. To address this issue, morphological parameters of microglial cells were analyzed in a rat model of aseptic neuroinflammation. After the injection of a single dose of the enzyme neuraminidase (NA) within the lateral ventricle (LV) an acute inflammatory process occurs. Sections from NA-injected animals and sham controls were immunolabeled with the microglial marker IBA1, which highlights ramifications and features of the cell shape. Using images obtained by section scanning, individual microglial cells were sampled from various regions (septofimbrial nucleus, hippocampus and hypothalamus) at different times post-injection (2, 4 and 12 h). Each cell yielded a set of 15 morphological parameters by means of image analysis software. Five initial parameters (including fractal measures) were statistically different in cells from NA-injected rats (most of them IL-1β positive, i.e., M1-state) compared to those from control animals (none of them IL-1β positive, i.e., surveillant state). However, additional multimodal parameters proved more suitable for hierarchical cluster analysis (HCA). This method classified the microglia population into four clusters. Furthermore, a linear discriminant analysis (LDA) suggested three specific parameters to objectively classify any microglia by a decision tree. In addition, a principal components analysis (PCA) revealed two extra valuable variables that allowed further classification of microglia into a total of eight sub-clusters or types. The spatio-temporal distribution of these different morphotypes in our rat inflammation model made it possible to relate specific morphotypes to microglial activation status and brain location. An objective method for microglia classification based on morphological parameters is proposed.
Main points: Microglia undergo a quantifiable morphological change upon neuraminidase-induced inflammation. Hierarchical cluster and principal components analysis allow morphological classification of microglia. Brain location of microglia is a relevant factor.
Fernández-Arjona, María del Mar; Grondona, Jesús M.; Granados-Durán, Pablo; Fernández-Llebrez, Pedro; López-Ávalos, María D.
2017-01-01
It is known that microglia morphology and function are closely related, but only a few studies have objectively described different morphological subtypes. To address this issue, morphological parameters of microglial cells were analyzed in a rat model of aseptic neuroinflammation. After the injection of a single dose of the enzyme neuraminidase (NA) within the lateral ventricle (LV) an acute inflammatory process occurs. Sections from NA-injected animals and sham controls were immunolabeled with the microglial marker IBA1, which highlights ramifications and features of the cell shape. Using images obtained by section scanning, individual microglial cells were sampled from various regions (septofimbrial nucleus, hippocampus and hypothalamus) at different times post-injection (2, 4 and 12 h). Each cell yielded a set of 15 morphological parameters by means of image analysis software. Five initial parameters (including fractal measures) were statistically different in cells from NA-injected rats (most of them IL-1β positive, i.e., M1-state) compared to those from control animals (none of them IL-1β positive, i.e., surveillant state). However, additional multimodal parameters proved more suitable for hierarchical cluster analysis (HCA). This method classified the microglia population into four clusters. Furthermore, a linear discriminant analysis (LDA) suggested three specific parameters to objectively classify any microglia by a decision tree. In addition, a principal components analysis (PCA) revealed two extra valuable variables that allowed further classification of microglia into a total of eight sub-clusters or types. The spatio-temporal distribution of these different morphotypes in our rat inflammation model made it possible to relate specific morphotypes to microglial activation status and brain location. An objective method for microglia classification based on morphological parameters is proposed.
Main points: Microglia undergo a quantifiable morphological change upon neuraminidase-induced inflammation. Hierarchical cluster and principal components analysis allow morphological classification of microglia. Brain location of microglia is a relevant factor. PMID:28848398
NASA Astrophysics Data System (ADS)
Crawford, I.; Ruske, S.; Topping, D. O.; Gallagher, M. W.
2015-07-01
In this paper we present improved methods for discriminating and quantifying Primary Biological Aerosol Particles (PBAP) by applying hierarchical agglomerative cluster analysis to multi-parameter ultraviolet-light-induced fluorescence (UV-LIF) spectrometer data. The methods employed in this study can be applied to data sets in excess of 1 × 10^6 points on a desktop computer, allowing for each fluorescent particle in a dataset to be explicitly clustered. This reduces the potential for misattribution found in subsampling and comparative attribution methods used in previous approaches, improving our capacity to discriminate and quantify PBAP meta-classes. We evaluate the performance of several hierarchical agglomerative cluster analysis linkages and data normalisation methods using laboratory samples of known particle types and an ambient dataset. Fluorescent and non-fluorescent polystyrene latex spheres were sampled with a Wideband Integrated Bioaerosol Spectrometer (WIBS-4) where the optical size, asymmetry factor and fluorescent measurements were used as inputs to the analysis package. It was found that the Ward linkage with z-score or range normalisation performed best, correctly attributing 98 and 98.1 % of the data points respectively. The best performing methods were applied to the BEACHON-RoMBAS ambient dataset where it was found that the z-score and range normalisation methods yield similar results with each method producing clusters representative of fungal spores and bacterial aerosol, consistent with previous results. The z-score result was compared to clusters generated with previous approaches (WIBS AnalysiS Program, WASP) where we observe that the subsampling and comparative attribution method employed by WASP results in the overestimation of the fungal spore concentration by a factor of 1.5 and the underestimation of bacterial aerosol concentration by a factor of 5.
We suggest that this is likely due to errors arising from misattribution due to poor centroid definition and failure to assign particles to a cluster as a result of the subsampling and comparative attribution method employed by WASP. The methods used here allow for the entire fluorescent population of particles to be analysed yielding an explicit cluster attribution for each particle, improving cluster centroid definition and our capacity to discriminate and quantify PBAP meta-classes compared to previous approaches.
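The two normalisation schemes compared in the abstract can be sketched as follows. This is a minimal illustration, not the authors' analysis package: the function names, the use of the population standard deviation for the z-score, and the sample values are all assumptions.

```python
import statistics

def z_score(values):
    """Centre on the mean and scale by the (population) standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]

def range_norm(values):
    """Rescale linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical optical sizes for five particles (arbitrary units).
sizes = [1.0, 2.0, 3.0, 4.0, 5.0]
z_sizes = z_score(sizes)
r_sizes = range_norm(sizes)
print(z_sizes)
print(r_sizes)
```

Either scheme would be applied to each input parameter (size, asymmetry factor, fluorescence channels) before clustering, so that no single parameter dominates the distances used by the linkage.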
NeatMap--non-clustering heat map alternatives in R.
Rajaram, Satwik; Oono, Yoshi
2010-01-22
The clustered heat map is the most popular means of visualizing genomic data. It compactly displays a large amount of data in an intuitive format that facilitates the detection of hidden structures and relations in the data. However, it is hampered by its use of cluster analysis which does not always respect the intrinsic relations in the data, often requiring non-standardized reordering of rows/columns to be performed post-clustering. This sometimes leads to uninformative and/or misleading conclusions. Often it is more informative to use dimension-reduction algorithms (such as Principal Component Analysis and Multi-Dimensional Scaling) which respect the topology inherent in the data. Yet, despite their proven utility in the analysis of biological data, they are not as widely used. This is at least partially due to the lack of user-friendly visualization methods with the visceral impact of the heat map. NeatMap is an R package designed to meet this need. NeatMap offers a variety of novel plots (in 2 and 3 dimensions) to be used in conjunction with these dimension-reduction techniques. Like the heat map, but unlike traditional displays of such results, it allows the entire dataset to be displayed while visualizing relations between elements. It also allows superimposition of cluster analysis results for mutual validation. NeatMap is shown to be more informative than the traditional heat map with the help of two well-known microarray datasets. NeatMap thus preserves many of the strengths of the clustered heat map while addressing some of its deficiencies. It is hoped that NeatMap will spur the adoption of non-clustering dimension-reduction algorithms.
Samsir, Sri A'jilah; Bunawan, Hamidun; Yen, Choong Chee; Noor, Normah Mohd
2016-09-01
In this dataset, we distinguish 15 accessions of Garcinia mangostana from Peninsular Malaysia using Fourier transform-infrared spectroscopy coupled with chemometric analysis. We found that the position and intensity of characteristic peaks at 3600-3100 cm^-1 in IR spectra allowed discrimination of G. mangostana from different locations. Further principal component analysis (PCA) of all the accessions suggests that two main clusters were formed: samples from Johor, Melaka, and Negeri Sembilan (South) were clustered together in one group while samples from Perak, Kedah, Penang, Selangor, Kelantan, and Terengganu (North and East Coast) were in another clustered group.
Potashev, Konstantin; Sharonova, Natalia; Breus, Irina
2014-07-01
Clustering was employed for the analysis of obtained experimental data set (42 plants in total) on seed germination in leached chernozem contaminated with kerosene. Among investigated plants were 31 cultivated plants from 11 families (27 species and 20 varieties) and 11 wild plant species from 7 families, 23 annual and 19 perennial/biannual plant species, 11 monocotyledonous and 31 dicotyledonous plants. Two-dimensional (two-parameter) clustering approach, allowing the estimation of tolerance of germinating seeds using a pair of independent parameters (С75%, V7%) was found to be most effective. These parameters characterized the ability of seeds to both withstand high concentrations of contaminants without the significant reduction of the germination, and maintain high germination rate within certain contaminant concentrations. The performed clustering revealed a number of plant features, which define the relation of a particular plant to a particular tolerance cluster; it has also demonstrated the possibility of generalizing the kerosene results for n-tridecane, which is one of the typical kerosene components. In contrast to the "manual" plant ranking based on the assessment of germination at discrete concentrations of the contaminant, the proposed clustering approach allowed a generalized characterization of the seed tolerance/sensitivity to hydrocarbon contaminants. Copyright © 2014 Elsevier B.V. All rights reserved.
Jade: using on-demand cloud analysis to give scientists back their flow
NASA Astrophysics Data System (ADS)
Robinson, N.; Tomlinson, J.; Hilson, A. J.; Arribas, A.; Powell, T.
2017-12-01
The UK's Met Office generates 400 TB of weather and climate data every day by running physical models on its Top 20 supercomputer. As data volumes explode, there is a danger that analysis workflows become dominated by watching progress bars, and not thinking about science. We have been researching how we can use distributed computing to allow analysts to process these large volumes of high velocity data in a way that's easy, effective and cheap. Our prototype analysis stack, Jade, tries to encapsulate this. Functionality includes: an under-the-hood Dask engine which parallelises and distributes computations, without the need to retrain analysts; hybrid compute clusters (AWS, Alibaba, and local compute) comprising many thousands of cores; clusters which autoscale up/down in response to calculation load using Kubernetes, and balance the cluster across providers based on the current price of compute; and lazy data access from cloud storage via containerised OpenDAP. This technology stack allows us to perform calculations many orders of magnitude faster than is possible on local workstations. It is also possible to outperform dedicated local compute clusters, as cloud compute can, in principle, scale to much larger scales. The use of ephemeral compute resources also makes this implementation cost efficient.
Walton, Barbara L; Verbeck, Guido F
2014-08-19
Matrix-assisted laser desorption ionization (MALDI) imaging is gaining popularity, but matrix effects such as mass spectral interference and damage to the sample limit its applications. Replacing traditional matrices with silver particles capable of equivalent or increased photon energy absorption from the incoming laser has proven to be beneficial for low mass analysis. Not only can silver clusters be advantageous for low mass compound detection, but they can be used for imaging as well. Conventional matrix application methods can obstruct samples, such as fingerprints, rendering them useless after mass analysis. The ability to image latent fingerprints without causing damage to the ridge pattern is important as it allows for further characterization of the print. The application of silver clusters by soft-landing ion mobility allows for enhanced MALDI and preservation of fingerprint integrity.
Two-Way Regularized Fuzzy Clustering of Multiple Correspondence Analysis.
Kim, Sunmee; Choi, Ji Yeh; Hwang, Heungsun
2017-01-01
Multiple correspondence analysis (MCA) is a useful tool for investigating the interrelationships among dummy-coded categorical variables. MCA has been combined with clustering methods to examine whether there exist heterogeneous subclusters of a population, which exhibit cluster-level heterogeneity. These combined approaches aim to classify either observations only (one-way clustering of MCA) or both observations and variable categories (two-way clustering of MCA). The latter approach is favored because its solutions are easier to interpret by providing explicitly which subgroup of observations is associated with which subset of variable categories. Nonetheless, the two-way approach has been built on hard classification that assumes observations and/or variable categories to belong to only one cluster. To relax this assumption, we propose two-way fuzzy clustering of MCA. Specifically, we combine MCA with fuzzy k-means simultaneously to classify a subgroup of observations and a subset of variable categories into a common cluster, while allowing both observations and variable categories to belong partially to multiple clusters. Importantly, we adopt regularized fuzzy k-means, thereby enabling us to decide the degree of fuzziness in cluster memberships automatically. We evaluate the performance of the proposed approach through the analysis of simulated and real data, in comparison with existing two-way clustering approaches.
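To make "belonging partially to multiple clusters" concrete, the sketch below computes membership degrees with the standard fuzzy k-means (fuzzy c-means) update, using a fixed fuzzifier `m`. This is not the regularized variant proposed in the abstract, which chooses the degree of fuzziness automatically; the function name and toy centroids are assumptions.

```python
import math

def fuzzy_memberships(point, centroids, m=2.0):
    """Standard fuzzy c-means membership degrees of one point (fuzzifier m > 1)."""
    dists = [math.dist(point, c) for c in centroids]
    # A point lying exactly on one or more centroids belongs fully to them.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    power = 2.0 / (m - 1.0)
    # u_i = 1 / sum_j (d_i / d_j)^(2 / (m - 1)); the degrees sum to 1.
    return [1.0 / sum((d_i / d_j) ** power for d_j in dists) for d_i in dists]

# Hypothetical example: two centroids on a line, one point closer to the first.
centroids = [(0.0, 0.0), (4.0, 0.0)]
u = fuzzy_memberships((1.0, 0.0), centroids)
print(u)
```

With hard classification the same point would simply be assigned to the first cluster; the fuzzy degrees keep the information that it is also somewhat related to the second.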
NASA Astrophysics Data System (ADS)
Ravagnan, Luca; Divitini, Giorgio; Rebasti, Sara; Marelli, Mattia; Piseri, Paolo; Milani, Paolo
2009-04-01
Nanocomposite films were fabricated by supersonic cluster beam deposition (SCBD) of palladium clusters on poly(methyl methacrylate) (PMMA) surfaces. The evolution of the electrical conductance with cluster coverage and microscopy analysis show that Pd clusters are implanted in the polymer and form a continuous layer extending for several tens of nanometres beneath the polymer surface. This allows the deposition, using stencil masks, of cluster-assembled Pd microstructures on PMMA showing a remarkably high adhesion compared with metallic films obtained by thermal evaporation. These results suggest that SCBD is a promising tool for the fabrication of metallic microstructures on flexible polymeric substrates.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Murugesan, Sugeerth; Bouchard, Kristofer; Chang, Edward
There exists a need for effective and easy-to-use software tools supporting the analysis of complex Electrocorticography (ECoG) data. Understanding how epileptic seizures develop or identifying diagnostic indicators for neurological diseases requires the in-depth analysis of neural activity data from ECoG. Such data is multi-scale and is of high spatio-temporal resolution. Comprehensive analysis of this data should be supported by interactive visual analysis methods that allow a scientist to understand functional patterns at varying levels of granularity and comprehend its time-varying behavior. We introduce a novel multi-scale visual analysis system, ECoG ClusterFlow, for the detailed exploration of ECoG data. Our system detects and visualizes dynamic high-level structures, such as communities, derived from the time-varying connectivity network. The system supports two major views: 1) an overview summarizing the evolution of clusters over time and 2) an electrode view using hierarchical glyph-based design to visualize the propagation of clusters in their spatial, anatomical context. We present case studies that were performed in collaboration with neuroscientists and neurosurgeons using simulated and recorded epileptic seizure data to demonstrate our system's effectiveness. ECoG ClusterFlow supports the comparison of spatio-temporal patterns for specific time intervals and allows a user to utilize various clustering algorithms. Neuroscientists can identify the site of seizure genesis and its spatial progression during the various stages of a seizure. Our system serves as a fast and powerful means for the generation of preliminary hypotheses that can be used as a basis for subsequent application of rigorous statistical methods, with the ultimate goal being the clinical treatment of epileptogenic zones.
Cluster randomised trials in the medical literature: two bibliometric surveys
Bland, J Martin
2004-01-01
Background Several reviews of published cluster randomised trials have reported that about half did not take clustering into account in the analysis, which was thus incorrect and potentially misleading. In this paper I ask whether cluster randomised trials are increasing in both number and quality of reporting. Methods Computer search for papers on cluster randomised trials since 1980, hand search of trial reports published in selected volumes of the British Medical Journal over 20 years. Results There has been a large increase in the numbers of methodological papers and of trial reports using the term 'cluster random' in recent years, with about equal numbers of each type of paper. The British Medical Journal contained more such reports than any other journal. In this journal there was a corresponding increase over time in the number of trials where subjects were randomised in clusters. In 2003 all reports showed awareness of the need to allow for clustering in the analysis. In 1993 and before clustering was ignored in most such trials. Conclusion Cluster trials are becoming more frequent and reporting is of higher quality. Perhaps statistician pressure works. PMID:15310402
Alexander, Nathan; Woetzel, Nils; Meiler, Jens
2011-02-01
Clustering algorithms are used as data analysis tools in a wide variety of applications in Biology. Clustering has become especially important in protein structure prediction and virtual high throughput screening methods. In protein structure prediction, clustering is used to structure the conformational space of thousands of protein models. In virtual high throughput screening, databases with millions of drug-like molecules are organized by structural similarity, e.g. common scaffolds. The tree-like dendrogram structure obtained from hierarchical clustering can provide a qualitative overview of the results, which is important for focusing detailed analysis. However, in practice it is difficult to relate specific components of the dendrogram directly back to the objects of which it is comprised and to display all desired information within the two dimensions of the dendrogram. The current work presents a hierarchical agglomerative clustering method termed bcl::Cluster. bcl::Cluster utilizes the Pymol Molecular Graphics System to graphically depict dendrograms in three dimensions. This allows simultaneous display of relevant biological molecules as well as additional information about the clusters and the members comprising them.
Structure and substructure analysis of DAFT/FADA galaxy clusters in the [0.4–0.9] redshift range
DOE Office of Scientific and Technical Information (OSTI.GOV)
Guennou, L.; et al.
2014-01-17
Context. The DAFT/FADA survey is based on the study of ~90 rich (masses found in the literature >2 x 10^14 M_⊙) and moderately distant clusters (redshifts 0.4 < z < 0.9), all with HST imaging data available. This survey has two main objectives: to constrain dark energy (DE) using weak lensing tomography on galaxy clusters and to build a database (deep multi-band imaging allowing photometric redshift estimates, spectroscopic data, X-ray data) of rich distant clusters to study their properties.
2014-01-01
Background There are many methodological challenges in the conduct and analysis of cluster randomised controlled trials, but one that has received little attention is that of post-randomisation changes to cluster composition. To illustrate this, we focus on the issue of cluster merging, considering the impact on the design, analysis and interpretation of trial outcomes. Methods We explored the effects of merging clusters on study power using standard methods of power calculation. We assessed the potential impacts on study findings of both homogeneous cluster merges (involving clusters randomised to the same arm of a trial) and heterogeneous merges (involving clusters randomised to different arms of a trial) by simulation. To determine the impact on bias and precision of treatment effect estimates, we applied standard methods of analysis to different populations under analysis. Results Cluster merging produced a systematic reduction in study power. This effect depended on the number of merges and was most pronounced when variability in cluster size was at its greatest. Simulations demonstrate that the impact on analysis was minimal when cluster merges were homogeneous, with impact on study power being balanced by a change in observed intracluster correlation coefficient (ICC). We found a decrease in study power when cluster merges were heterogeneous, and the estimate of treatment effect was attenuated. Conclusions Examples of cluster merges found in previously published reports of cluster randomised trials were typically homogeneous rather than heterogeneous. Simulations demonstrated that trial findings in such cases would be unbiased. However, simulations also showed that any heterogeneous cluster merges would introduce bias that would be hard to quantify, as well as having negative impacts on the precision of estimates obtained. Further methodological development is warranted to better determine how to analyse such trials appropriately. 
Interim recommendations include avoidance of cluster merges where possible, discontinuation of clusters following heterogeneous merges, allowance for potential loss of clusters and additional variability in cluster size in the original sample size calculation, and use of appropriate ICC estimates that reflect cluster size. PMID:24884591
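The loss of power from merging can be illustrated with the usual design-effect approximation for unequal cluster sizes, DE = 1 + (sum(m_i^2) / sum(m_i) - 1) * ICC, which reduces to 1 + (m - 1) * ICC when all clusters have size m. This is a minimal sketch, not the authors' power calculation: it holds the ICC fixed, whereas the abstract notes that the observed ICC may itself change after homogeneous merges, and the function names and cluster sizes are hypothetical.

```python
def design_effect(cluster_sizes, icc):
    """Variance inflation from clustering, allowing unequal cluster sizes."""
    total = sum(cluster_sizes)
    # sum(m_i^2) / sum(m_i) is the size-weighted mean cluster size.
    weighted_mean_size = sum(m * m for m in cluster_sizes) / total
    return 1.0 + (weighted_mean_size - 1.0) * icc

def effective_sample_size(cluster_sizes, icc):
    """Number of independent observations the clustered sample is worth."""
    return sum(cluster_sizes) / design_effect(cluster_sizes, icc)

# Hypothetical trial arm: four clusters of 10, then two of them merge.
before = [10, 10, 10, 10]
after = [20, 10, 10]
icc = 0.05
print(effective_sample_size(before, icc))
print(effective_sample_size(after, icc))
```

The total number of participants is unchanged, but the merged configuration has fewer, more variable clusters, so the effective sample size falls. This is why the interim recommendations suggest allowing for potential cluster loss and extra size variability in the original sample size calculation.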
A generalized analysis of hydrophobic and loop clusters within globular protein sequences
Eudes, Richard; Le Tuan, Khanh; Delettré, Jean; Mornon, Jean-Paul; Callebaut, Isabelle
2007-01-01
Background Hydrophobic Cluster Analysis (HCA) is an efficient way to compare highly divergent sequences through the implicit secondary structure information directly derived from hydrophobic clusters. However, its efficiency and application are currently limited by the need of user expertise. In order to help the analysis of HCA plots, we report here the structural preferences of hydrophobic cluster species, which are frequently encountered in globular domains of proteins. These species are characterized only by their hydrophobic/non-hydrophobic dichotomy. This analysis has been extended to loop-forming clusters, using an appropriate loop alphabet. Results The structural behavior of hydrophobic cluster species, which are typical of protein globular domains, was investigated within banks of experimental structures, considered at different levels of sequence redundancy. The 294 more frequent hydrophobic cluster species were analyzed with regard to their association with the different secondary structures (frequencies of association with secondary structures and secondary structure propensities). Hydrophobic cluster species are predominantly associated with regular secondary structures, and a large part (60 %) reveals preferences for α-helices or β-strands. Moreover, the analysis of the hydrophobic cluster amino acid composition generally allows for finer prediction of the regular secondary structure associated with the considered cluster within a cluster species. We also investigated the behavior of loop forming clusters, using a "PGDNS" alphabet. These loop clusters do not overlap with hydrophobic clusters and are highly associated with coils. Finally, the structural information contained in the hydrophobic structural words, as deduced from experimental structures, was compared to the PSI-PRED predictions, revealing that β-strands and especially α-helices are generally over-predicted within the limits of typical β and α hydrophobic clusters. 
Conclusion The dictionary of hydrophobic clusters described here can help the HCA user to interpret and compare the HCA plots of globular protein sequences, as well as provides an original fundamental insight into the structural bricks of protein folds. Moreover, the novel loop cluster analysis brings additional information for secondary structure prediction on the whole sequence through a generalized cluster analysis (GCA), and not only on regular secondary structures. Such information lays the foundations for developing a new and original tool for secondary structure prediction. PMID:17210072
Wilderjans, Tom F; Ceulemans, Eva; Van Mechelen, Iven; Depril, Dirk
2011-03-01
In many areas of psychology, one is interested in disclosing the underlying structural mechanisms that generated an object by variable data set. Often, based on theoretical or empirical arguments, it may be expected that these underlying mechanisms imply that the objects are grouped into clusters that are allowed to overlap (i.e., an object may belong to more than one cluster). In such cases, analyzing the data with Mirkin's additive profile clustering model may be appropriate. In this model: (1) each object may belong to no, one or several clusters, (2) there is a specific variable profile associated with each cluster, and (3) the scores of the objects on the variables can be reconstructed by adding the cluster-specific variable profiles of the clusters the object in question belongs to. Until now, however, no software program has been publicly available to perform an additive profile clustering analysis. For this purpose, in this article, the ADPROCLUS program, steered by a graphical user interface, is presented. We further illustrate its use by means of the analysis of a patient by symptom data matrix.
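The reconstruction rule in point (3) can be sketched directly: each object's model scores are the sum of the profiles of the clusters it belongs to. The code below only illustrates that rule with made-up memberships and profiles; it is not the ADPROCLUS fitting algorithm, and the function name is an assumption.

```python
def reconstruct(memberships, profiles):
    """Rebuild object-by-variable scores by adding cluster-specific profiles."""
    n_vars = len(profiles[0])
    rows = []
    for member_row in memberships:
        row = [0.0] * n_vars
        for belongs, profile in zip(member_row, profiles):
            if belongs:
                # Add this cluster's variable profile to the object's scores.
                row = [r + p for r, p in zip(row, profile)]
        rows.append(row)
    return rows

# Hypothetical example: 4 objects, 2 overlapping clusters, 3 variables.
memberships = [
    [1, 0],  # object 1: cluster A only
    [0, 1],  # object 2: cluster B only
    [1, 1],  # object 3: both clusters
    [0, 0],  # object 4: no cluster
]
profiles = [
    [2.0, 0.0, 1.0],  # profile of cluster A
    [0.0, 3.0, 1.0],  # profile of cluster B
]
model = reconstruct(memberships, profiles)
for row in model:
    print(row)
```

Object 3 shows the additive part of the model: its scores are the element-wise sum of both profiles, while object 4, which belongs to no cluster, is reconstructed as all zeros.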
NASA Astrophysics Data System (ADS)
Crawford, I.; Ruske, S.; Topping, D. O.; Gallagher, M. W.
2015-11-01
In this paper we present improved methods for discriminating and quantifying primary biological aerosol particles (PBAPs) by applying hierarchical agglomerative cluster analysis to multi-parameter ultraviolet-light-induced fluorescence (UV-LIF) spectrometer data. The methods employed in this study can be applied to data sets in excess of 1 × 10^6 points on a desktop computer, allowing for each fluorescent particle in a data set to be explicitly clustered. This reduces the potential for misattribution found in subsampling and comparative attribution methods used in previous approaches, improving our capacity to discriminate and quantify PBAP meta-classes. We evaluate the performance of several hierarchical agglomerative cluster analysis linkages and data normalisation methods using laboratory samples of known particle types and an ambient data set. Fluorescent and non-fluorescent polystyrene latex spheres were sampled with a Wideband Integrated Bioaerosol Spectrometer (WIBS-4) where the optical size, asymmetry factor and fluorescent measurements were used as inputs to the analysis package. It was found that the Ward linkage with z-score or range normalisation performed best, correctly attributing 98 and 98.1 % of the data points respectively. The best-performing methods were applied to the BEACHON-RoMBAS (Bio-hydro-atmosphere interactions of Energy, Aerosols, Carbon, H2O, Organics and Nitrogen-Rocky Mountain Biogenic Aerosol Study) ambient data set, where it was found that the z-score and range normalisation methods yield similar results, with each method producing clusters representative of fungal spores and bacterial aerosol, consistent with previous results.
The z-score result was compared to clusters generated with previous approaches (WIBS AnalysiS Program, WASP) where we observe that the subsampling and comparative attribution method employed by WASP results in the overestimation of the fungal spore concentration by a factor of 1.5 and the underestimation of bacterial aerosol concentration by a factor of 5. We suggest that this is likely due to errors arising from misattribution due to poor centroid definition and failure to assign particles to a cluster as a result of the subsampling and comparative attribution method employed by WASP. The methods used here allow for the entire fluorescent population of particles to be analysed, yielding an explicit cluster attribution for each particle and improving cluster centroid definition and our capacity to discriminate and quantify PBAP meta-classes compared to previous approaches.
Dias, Claudia; Mendes, Luís
2018-01-01
Despite the importance of the literature on food quality labels in the European Union (PDO, PGI and TSG), our search did not find any review joining the various research topics on this subject. This study aims therefore to consolidate the state of academic research in this field, and so the methodological option was to elaborate a bibliometric analysis resorting to the term co-occurrence technique. Analysis was made of 501 articles on the ISI Web of Science database, covering publications up to 2016. The results of the bibliometric analysis allowed identification of four clusters: "Protected Geographical Indication", "Certification of Olive Oil and Cultivars", "Certification of Cheese and Milk" and "Certification and Chemical Composition". Unlike the other clusters, where the PDO label predominates, the "Protected Geographical Indication" cluster covers the study of PGI products, highlighting analysis of consumer behaviour in relation to this type of product. The focus of studies in the "Certification of Olive Oil and Cultivars" cluster and the "Certification of Cheese and Milk" cluster is the development of authentication methods for certified traditional products. In the "Certification and Chemical Composition" cluster, standing out is analysis of the profiles of fatty acids present in this type of product. Copyright © 2017 Elsevier Ltd. All rights reserved.
Model-based clustering for RNA-seq data.
Si, Yaqing; Liu, Peng; Li, Pinghua; Brutnell, Thomas P
2014-01-15
RNA-seq technology has been widely adopted as an attractive alternative to microarray-based methods to study global gene expression. However, robust statistical tools to analyze these complex datasets are still lacking. By grouping genes with similar expression profiles across treatments, cluster analysis provides insight into gene functions and networks, and hence is an important technique for RNA-seq data analysis. In this manuscript, we derive clustering algorithms based on appropriate probability models for RNA-seq data. An expectation-maximization algorithm and another two stochastic versions of expectation-maximization algorithms are described. In addition, a strategy for initialization based on likelihood is proposed to improve the clustering algorithms. Moreover, we present a model-based hybrid-hierarchical clustering method to generate a tree structure that allows visualization of relationships among clusters as well as flexibility of choosing the number of clusters. Results from both simulation studies and analysis of a maize RNA-seq dataset show that our proposed methods provide better clustering results than alternative methods such as the K-means algorithm and hierarchical clustering methods that are not based on probability models. An R package, MBCluster.Seq, has been developed to implement our proposed algorithms. This R package provides fast computation and is publicly available at http://www.r-project.org
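The core of an expectation-maximization clustering of count data is the E-step, which computes for each gene the posterior probability of belonging to each cluster. The sketch below assumes a plain Poisson mixture with known cluster means per treatment; it is a simplified illustration, not the model or code of MBCluster.Seq, and all names and numbers are hypothetical.

```python
import math

def poisson_log_pmf(k, lam):
    """Log probability of observing count k under a Poisson with mean lam."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)

def responsibilities(counts, cluster_means, weights):
    """E-step: posterior probability of each cluster for one gene."""
    log_post = []
    for means, w in zip(cluster_means, weights):
        # Counts across treatments are treated as independent given the cluster.
        ll = sum(poisson_log_pmf(k, lam) for k, lam in zip(counts, means))
        log_post.append(math.log(w) + ll)
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_post)
    unnorm = [math.exp(lp - top) for lp in log_post]
    total = sum(unnorm)
    return [u / total for u in unnorm]

# Hypothetical gene with read counts under two treatments.
counts = [5, 50]
cluster_means = [[5.0, 50.0], [50.0, 5.0]]  # "up" and "down" expression profiles
weights = [0.5, 0.5]
r = responsibilities(counts, cluster_means, weights)
print(r)
```

In a full EM loop the M-step would then re-estimate the cluster means and weights from these responsibilities, and the two steps would alternate until the likelihood stops improving.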
Henry, David; Dymnicki, Allison B.; Mohatt, Nathaniel; Allen, James; Kelly, James G.
2016-01-01
Qualitative methods potentially add depth to prevention research, but can produce large amounts of complex data even with small samples. Studies conducted with culturally distinct samples often produce voluminous qualitative data, but may lack sufficient sample sizes for sophisticated quantitative analysis. Currently lacking in mixed methods research are methods allowing for more fully integrating qualitative and quantitative analysis techniques. Cluster analysis can be applied to coded qualitative data to clarify the findings of prevention studies by aiding efforts to reveal such things as the motives of participants for their actions and the reasons behind counterintuitive findings. By clustering groups of participants with similar profiles of codes in a quantitative analysis, cluster analysis can serve as a key component in mixed methods research. This article reports two studies. In the first study, we conduct simulations to test the accuracy of cluster assignment using three different clustering methods with binary data as produced when coding qualitative interviews. Results indicated that hierarchical clustering, K-Means clustering, and latent class analysis produced similar levels of accuracy with binary data, and that the accuracy of these methods did not decrease with samples as small as 50. Whereas the first study explores the feasibility of using common clustering methods with binary data, the second study provides a “real-world” example using data from a qualitative study of community leadership connected with a drug abuse prevention project. We discuss the implications of this approach for conducting prevention research, especially with small samples and culturally distinct communities. PMID:25946969
Henry, David; Dymnicki, Allison B; Mohatt, Nathaniel; Allen, James; Kelly, James G
2015-10-01
Qualitative methods potentially add depth to prevention research but can produce large amounts of complex data even with small samples. Studies conducted with culturally distinct samples often produce voluminous qualitative data but may lack sufficient sample sizes for sophisticated quantitative analysis. Currently lacking in mixed-methods research are methods allowing for more fully integrating qualitative and quantitative analysis techniques. Cluster analysis can be applied to coded qualitative data to clarify the findings of prevention studies by aiding efforts to reveal such things as the motives of participants for their actions and the reasons behind counterintuitive findings. By clustering groups of participants with similar profiles of codes in a quantitative analysis, cluster analysis can serve as a key component in mixed-methods research. This article reports two studies. In the first study, we conduct simulations to test the accuracy of cluster assignment using three different clustering methods with binary data as produced when coding qualitative interviews. Results indicated that hierarchical clustering, K-means clustering, and latent class analysis produced similar levels of accuracy with binary data and that the accuracy of these methods did not decrease with samples as small as 50. Whereas the first study explores the feasibility of using common clustering methods with binary data, the second study provides a "real-world" example using data from a qualitative study of community leadership connected with a drug abuse prevention project. We discuss the implications of this approach for conducting prevention research, especially with small samples and culturally distinct communities.
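As an illustration of clustering participants by their binary code profiles, the following is a minimal sketch, not the authors' procedure. It assumes a Hamming (simple mismatch) distance and single-linkage agglomerative clustering; the participant labels, the code profiles, and the function names `hamming` and `single_linkage` are hypothetical.

```python
from itertools import combinations

def hamming(a, b):
    """Number of codes on which two participants differ."""
    return sum(x != y for x, y in zip(a, b))

# Hypothetical coded interviews: 1 = code present, 0 = code absent.
profiles = {
    "P1": [1, 1, 0, 0, 1],
    "P2": [1, 1, 0, 0, 0],
    "P3": [0, 0, 1, 1, 0],
    "P4": [0, 0, 1, 1, 1],
}

def single_linkage(profiles, n_clusters):
    """Agglomerative clustering: repeatedly merge the two closest clusters."""
    clusters = [[name] for name in profiles]
    while len(clusters) > n_clusters:
        def gap(pair):
            # Single linkage: distance between the closest members of two clusters.
            i, j = pair
            return min(hamming(profiles[a], profiles[b])
                       for a in clusters[i] for b in clusters[j])
        i, j = min(combinations(range(len(clusters)), 2), key=gap)
        clusters[i] += clusters.pop(j)  # j > i, so index i stays valid
    return [sorted(c) for c in clusters]

result = single_linkage(profiles, 2)
print(result)
```

With these made-up profiles, participants who share most codes end up in the same group. Other distance measures or linkage rules could be substituted without changing the overall structure.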
Gorzalczany, Marian B; Rudzinski, Filip
2017-06-07
This paper presents a generalization of self-organizing maps with 1-D neighborhoods (neuron chains) that can be effectively applied to complex cluster analysis problems. The essence of the generalization consists in introducing mechanisms that allow the neuron chain--during learning--to disconnect into subchains, to reconnect some of the subchains again, and to dynamically regulate the overall number of neurons in the system. These features enable the network--working in a fully unsupervised way (i.e., using unlabeled data without a predefined number of clusters)--to automatically generate collections of multiprototypes that are able to represent a broad range of clusters in data sets. First, the operation of the proposed approach is illustrated on some synthetic data sets. Then, this technique is tested using several real-life, complex, and multidimensional benchmark data sets available from the University of California at Irvine (UCI) Machine Learning repository and the Knowledge Extraction based on Evolutionary Learning data set repository. A sensitivity analysis of our approach to changes in control parameters and a comparative analysis with an alternative approach are also performed.
SOMFlow: Guided Exploratory Cluster Analysis with Self-Organizing Maps and Analytic Provenance.
Sacha, Dominik; Kraus, Matthias; Bernard, Jurgen; Behrisch, Michael; Schreck, Tobias; Asano, Yuki; Keim, Daniel A
2018-01-01
Clustering is a core building block for data analysis, aiming to extract otherwise hidden structures and relations from raw datasets, such as particular groups that can be effectively related, compared, and interpreted. A plethora of visual-interactive cluster analysis techniques has been proposed to date; however, arriving at useful clusterings often requires several rounds of user interactions to fine-tune the data preprocessing and algorithms. We present a multi-stage Visual Analytics (VA) approach for iterative cluster refinement together with an implementation (SOMFlow) that uses Self-Organizing Maps (SOM) to analyze time series data. It supports exploration by offering the analyst a visual platform to analyze intermediate results, adapt the underlying computations, iteratively partition the data, and reflect on previous analytical activities. The history of previous decisions is explicitly visualized within a flow graph, allowing the analyst to compare earlier cluster refinements and to explore relations. We further leverage quality and interestingness measures to guide the analyst in the discovery of useful patterns, relations, and data partitions. We conducted two pair analytics experiments together with a subject matter expert in speech intonation research to demonstrate that the approach is effective for interactive data analysis, supporting enhanced understanding of clustering results as well as the interactive process itself.
Baglietto, Gabriel; Gigante, Guido; Del Giudice, Paolo
2017-01-01
Two partially interwoven hot topics in the analysis and statistical modeling of neural data are the development of efficient and informative representations of the time series derived from multiple neural recordings, and the extraction of information about the connectivity structure of the underlying neural network from the recorded neural activities. In the present paper we show that state-space clustering can provide an easy and effective option for reducing the dimensionality of multiple neural time series, that it can improve inference of synaptic couplings from neural activities, and that it can also allow the construction of a compact representation of the multi-dimensional dynamics that easily lends itself to complexity measures. We apply a variant of the 'mean-shift' algorithm to perform state-space clustering, and validate it on a Hopfield network in the glassy phase, in which metastable states are largely uncorrelated from memories embedded in the synaptic matrix. In this context, we show that the neural states identified as clusters' centroids offer a parsimonious parametrization of the synaptic matrix, which allows a significant improvement in inferring the synaptic couplings from the neural activities. Moving to the more realistic case of a multi-modular spiking network, with spike-frequency adaptation inducing history-dependent effects, we propose a procedure inspired by Boltzmann learning, but extending its domain of application, to learn inter-module synaptic couplings so that the spiking network reproduces a prescribed pattern of spatial correlations; we then illustrate, in the spiking network, how clustering is effective in extracting relevant features of the network's state-space landscape.
Finally, we show that the knowledge of the cluster structure allows casting the multi-dimensional neural dynamics in the form of a symbolic dynamics of transitions between clusters; as an illustration of the potential of such reduction, we define and analyze a measure of complexity of the neural time series.
Sample size calculation for stepped wedge and other longitudinal cluster randomised trials.
Hooper, Richard; Teerenstra, Steven; de Hoop, Esther; Eldridge, Sandra
2016-11-20
The sample size required for a cluster randomised trial is inflated compared with an individually randomised trial because outcomes of participants from the same cluster are correlated. Sample size calculations for longitudinal cluster randomised trials (including stepped wedge trials) need to take account of at least two levels of clustering: the clusters themselves and times within clusters. We derive formulae for sample size for repeated cross-section and closed cohort cluster randomised trials with normally distributed outcome measures, under a multilevel model allowing for variation between clusters and between times within clusters. Our formulae agree with those previously described for special cases such as crossover and analysis of covariance designs, although simulation suggests that the formulae could underestimate required sample size when the number of clusters is small. Whether using a formula or simulation, a sample size calculation requires estimates of nuisance parameters, which in our model include the intracluster correlation, cluster autocorrelation, and individual autocorrelation. A cluster autocorrelation less than 1 reflects a situation where individuals sampled from the same cluster at different times have less correlated outcomes than individuals sampled from the same cluster at the same time. Nuisance parameters could be estimated from time series obtained in similarly clustered settings with the same outcome measure, using analysis of variance to estimate variance components. Copyright © 2016 John Wiley & Sons, Ltd.
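To make the inflation concrete, the sketch below uses the familiar single-period design effect, 1 + (m - 1) × ICC, for a parallel cluster randomised trial with m participants per cluster. This is the textbook special case, not the longitudinal formulae derived in the paper, and it ignores cluster and individual autocorrelation. The function names and example numbers are hypothetical.

```python
import math

def design_effect(cluster_size, icc):
    """Inflation factor for a single cross-section with equal cluster sizes."""
    return 1 + (cluster_size - 1) * icc

def clusters_needed(n_individual, cluster_size, icc):
    """Total clusters needed, given the sample size that an individually
    randomised trial would require (n_individual)."""
    inflated = n_individual * design_effect(cluster_size, icc)
    return math.ceil(inflated / cluster_size)

# Hypothetical inputs: 200 participants under individual randomisation,
# 20 participants per cluster, intracluster correlation 0.05.
print(design_effect(20, 0.05))
print(clusters_needed(200, 20, 0.05))
```

Even a small intracluster correlation nearly doubles the required sample size in this example, which is why good estimates of the nuisance parameters matter.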
fluff: exploratory analysis and visualization of high-throughput sequencing data
Georgiou, Georgios
2016-01-01
Summary. In this article we describe fluff, a software package that allows for simple exploration, clustering and visualization of high-throughput sequencing data mapped to a reference genome. The package contains three command-line tools to generate publication-quality figures in an uncomplicated manner using sensible defaults. Genome-wide data can be aggregated, clustered and visualized in a heatmap, according to different clustering methods. This includes a predefined setting to identify dynamic clusters between different conditions or developmental stages. Alternatively, clustered data can be visualized in a bandplot. Finally, fluff includes a tool to generate genomic profiles. As command-line tools, the fluff programs can easily be integrated into standard analysis pipelines. The installation is straightforward and documentation is available at http://fluff.readthedocs.org. Availability. fluff is implemented in Python and runs on Linux. The source code is freely available for download at https://github.com/simonvh/fluff. PMID:27547532
The Open Cluster Chemical Abundances and Mapping (OCCAM) Survey: Current Status
NASA Astrophysics Data System (ADS)
Frinchaboy, Peter; O'Connell, Julia; Donor, John; Cunha, Katia; Thompson, Benjamin; Melendez, Matthew; Shetrone, Matthew; Zasowski, Gail; Majewski, Steven R.; APOGEE TEAM
2018-01-01
The Open Cluster Chemical Analysis and Mapping (OCCAM) survey aims to produce a comprehensive, uniform, infrared-based data set for hundreds of open clusters, and constrain key Galactic dynamical and chemical parameters using the SDSS/APOGEE survey and follow-up from the McDonald Observatory Otto Struve 2.1-m telescope and Sandiford Cass Echelle Spectrograph (R ~ 60,000). We report on multi-element radial abundance gradients obtained from a sample of over 30 disk open clusters. The APOGEE chemical abundances were derived automatically by the ASPCAP pipeline and are part of the SDSS IV Data Release 14; the optical follow-up data were analyzed using equivalent width analysis and spectral synthesis. We present the current open cluster sample, which spans a significant range in age, allowing exploration of the evolution of the Galactic abundance gradients. This work is supported by NSF AAG grants AST-1311835 & AST-1715662.
Amendola, Antonella; Bianchi, Silvia; Frati, Elena R; Ciceri, Giulia; Faccini, Marino; Senatore, Sabrina; Colzani, Daniela; Lamberti, Anna; Baggieri, Melissa; Cereda, Danilo; Gramegna, Maria; Nicoletti, Loredana; Magurano, Fabio; Tanzi, Elisabetta
2017-08-17
A large measles outbreak has been ongoing in Milan and surrounding areas. From 1 March to 30 June 2017, 203 measles cases were laboratory-confirmed (108 sporadic cases and 95 related to 47 clusters). Phylogenetic analysis revealed the co-circulation of two different genotypes, D8 and B3. Both genotypes caused nosocomial clusters in two hospitals. The rapid analysis of epidemiological and phylogenetic data allowed effective surveillance and tracking of transmission pathways. This article is copyright of The Authors, 2017.
Amendola, Antonella; Bianchi, Silvia; Frati, Elena R; Ciceri, Giulia; Faccini, Marino; Senatore, Sabrina; Colzani, Daniela; Lamberti, Anna; Baggieri, Melissa; Cereda, Danilo; Gramegna, Maria; Nicoletti, Loredana; Magurano, Fabio; Tanzi, Elisabetta
2017-01-01
A large measles outbreak has been ongoing in Milan and surrounding areas. From 1 March to 30 June 2017, 203 measles cases were laboratory-confirmed (108 sporadic cases and 95 related to 47 clusters). Phylogenetic analysis revealed the co-circulation of two different genotypes, D8 and B3. Both genotypes caused nosocomial clusters in two hospitals. The rapid analysis of epidemiological and phylogenetic data allowed effective surveillance and tracking of transmission pathways. PMID:28840825
antiSMASH 3.0—a comprehensive resource for the genome mining of biosynthetic gene clusters
Blin, Kai; Duddela, Srikanth; Krug, Daniel; Kim, Hyun Uk; Bruccoleri, Robert; Lee, Sang Yup; Fischbach, Michael A; Müller, Rolf; Wohlleben, Wolfgang; Breitling, Rainer; Takano, Eriko
2015-01-01
Abstract Microbial secondary metabolism constitutes a rich source of antibiotics, chemotherapeutics, insecticides and other high-value chemicals. Genome mining of gene clusters that encode the biosynthetic pathways for these metabolites has become a key methodology for novel compound discovery. In 2011, we introduced antiSMASH, a web server and stand-alone tool for the automatic genomic identification and analysis of biosynthetic gene clusters, available at http://antismash.secondarymetabolites.org. Here, we present version 3.0 of antiSMASH, which has undergone major improvements. A full integration of the recently published ClusterFinder algorithm now allows using this probabilistic algorithm to detect putative gene clusters of unknown types. Also, a new dereplication variant of the ClusterBlast module now identifies similarities of identified clusters to any of 1172 clusters with known end products. At the enzyme level, active sites of key biosynthetic enzymes are now pinpointed through a curated pattern-matching procedure and Enzyme Commission numbers are assigned to functionally classify all enzyme-coding genes. Additionally, chemical structure prediction has been improved by incorporating polyketide reduction states. Finally, in order for users to be able to organize and analyze multiple antiSMASH outputs in a private setting, a new XML output module allows offline editing of antiSMASH annotations within the Geneious software. PMID:25948579
Simulations of the Formation and Evolution of X-ray Clusters
NASA Astrophysics Data System (ADS)
Bryan, G. L.; Klypin, A.; Norman, M. L.
1994-05-01
We describe results from a set of Omega = 1 Cold plus Hot Dark Matter (CHDM) and Cold Dark Matter (CDM) simulations. We examine the formation and evolution of X-ray clusters in a cosmological setting with sufficient numbers to perform statistical analysis. We find that CDM, normalized to COBE, seems to produce too many large clusters, both in terms of the luminosity (dn/dL) and temperature (dn/dT) functions. The CHDM simulation produces fewer clusters and the temperature distribution (our numerically most secure result) matches observations where they overlap. The computed cluster luminosity function drops below observations, but we are almost surely underestimating the X-ray luminosity. Because of the lower fluctuations in CHDM, there are only a small number of bright clusters in our simulation volume; however, we can use the simulated clusters to fix the relation between temperature and velocity dispersion, allowing us to use collisionless N-body codes to probe larger length scales with correspondingly brighter clusters. The hydrodynamic simulations have been performed with a hybrid particle-mesh scheme for the dark matter and a high resolution grid-based piecewise parabolic method for the adiabatic gas dynamics. This combination has been implemented for massively parallel computers, allowing us to achieve grids as large as 512^3.
Intermediate and advanced topics in multilevel logistic regression analysis.
Austin, Peter C; Merlo, Juan
2017-09-10
Multilevel data occur frequently in health services, population and public health, and epidemiologic research. In such research, binary outcomes are common. Multilevel logistic regression models allow one to account for the clustering of subjects within clusters of higher-level units when estimating the effect of subject and cluster characteristics on subject outcomes. A search of the PubMed database demonstrated that the use of multilevel or hierarchical regression models is increasing rapidly. However, our impression is that many analysts simply use multilevel regression models to account for the nuisance of within-cluster homogeneity that is induced by clustering. In this article, we describe a suite of analyses that can complement the fitting of multilevel logistic regression models. These ancillary analyses permit analysts to estimate the marginal or population-average effect of covariates measured at the subject and cluster level, in contrast to the within-cluster or cluster-specific effects arising from the original multilevel logistic regression model. We describe the interval odds ratio and the proportion of opposed odds ratios, which are summary measures of effect for cluster-level covariates. We describe the variance partition coefficient and the median odds ratio, which are measures of components of variance and heterogeneity in outcomes. These measures allow one to quantify the magnitude of the general contextual effect. We describe an R² measure that allows analysts to quantify the proportion of variation explained by different multilevel logistic regression models. We illustrate the application and interpretation of these measures by analyzing mortality in patients hospitalized with a diagnosis of acute myocardial infarction. © 2017 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd.
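Two of the measures named above have simple closed forms once the between-cluster variance of a random-intercept logistic model has been estimated. The sketch below assumes the commonly used latent-variable formulation, in which the subject-level variance is fixed at π²/3, and the standard median odds ratio formula exp(√(2σ²) · Φ⁻¹(0.75)). The function names and the example variance are hypothetical, and this is not code from the article.

```python
import math
from statistics import NormalDist

def variance_partition_coefficient(cluster_variance):
    """Share of latent outcome variance attributable to clusters.
    Assumes the latent-variable formulation: level-1 variance = pi^2 / 3."""
    return cluster_variance / (cluster_variance + math.pi ** 2 / 3)

def median_odds_ratio(cluster_variance):
    """Median odds ratio between two randomly chosen clusters,
    comparing the higher-risk cluster with the lower-risk one."""
    z75 = NormalDist().inv_cdf(0.75)  # 75th percentile of the standard normal
    return math.exp(math.sqrt(2 * cluster_variance) * z75)

# Hypothetical estimate of the between-hospital variance on the logit scale.
sigma2 = 1.0
print(round(variance_partition_coefficient(sigma2), 3))
print(round(median_odds_ratio(sigma2), 3))
```

A median odds ratio of 1 corresponds to no between-cluster variation. Larger values indicate a stronger general contextual effect.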
Analysis of candidates for interacting galaxy clusters. I. A1204 and A2029/A2033
NASA Astrophysics Data System (ADS)
Gonzalez, Elizabeth Johana; de los Rios, Martín; Oio, Gabriel A.; Lang, Daniel Hernández; Tagliaferro, Tania Aguirre; Domínguez R., Mariano J.; Castellón, José Luis Nilo; Cuevas L., Héctor; Valotto, Carlos A.
2018-04-01
Context: Merging galaxy clusters allow for the study of different mass components, dark and baryonic, separately. Also, their occurrence enables testing of the ΛCDM scenario, which can be used to put constraints on the self-interacting cross-section of the dark-matter particle. Aims: It is necessary to perform a homogeneous analysis of these systems. Hence, based on a recently presented sample of candidates for interacting galaxy clusters, we present the analysis of two of these cataloged systems. Methods: In this work, the first of a series devoted to characterizing galaxy clusters in merger processes, we perform a weak lensing analysis of clusters A1204 and A2029/A2033 to derive the total masses of each identified interacting structure together with a dynamical study based on a two-body model. We also describe the gas and the mass distributions in the field through a lensing and an X-ray analysis. This is the first of a series of works which will analyze these types of systems in order to characterize them. Results: Neither merging cluster candidate shows evidence of having had a recent merger event. Nevertheless, there is dynamical evidence that these systems could be interacting or could interact in the future. Conclusions: It is necessary to include more constraints in order to improve the methodology of classifying merging galaxy clusters. Characterization of these clusters is important in order to properly understand the nature of these systems and their connection with dynamical studies.
Ortholog-based screening and identification of genes related to intracellular survival.
Yang, Xiaowen; Wang, Jiawei; Bing, Guoxia; Bie, Pengfei; De, Yanyan; Lyu, Yanli; Wu, Qingmin
2018-04-20
Bioinformatics and comparative genomics analysis methods were used to predict unknown pathogen genes based on homology with identified or functionally clustered genes. In this study, the genes of common pathogens were analyzed to screen and identify genes associated with intracellular survival through sequence similarity, phylogenetic tree analysis and the λ-Red recombination system test method. A total of 38,952 protein-coding genes of common pathogens were divided into 19,775 clusters. As demonstrated through a COG analysis, information storage and processing genes might play an important role in intracellular survival. Only 19 clusters were present in facultative intracellular pathogens, and not all were present in extracellular pathogens. Construction of a phylogenetic tree selected 18 of these 19 clusters. Comparisons with the DEG database and previous research revealed that seven other clusters are considered essential gene clusters and that seven other clusters are associated with intracellular survival. Moreover, this study confirmed that clusters screened by orthologs with similar function could be replaced with an approved uvrY gene and its orthologs, and the results revealed that the usg gene is associated with intracellular survival. The study improves the current understanding of the characteristics of intracellular pathogens and allows further exploration of the intracellular survival-related gene modules in these pathogens. Copyright © 2018. Published by Elsevier B.V.
Integrating Multiple Data Views for Improved Malware Analysis
DOE Office of Scientific and Technical Information (OSTI.GOV)
Anderson, Blake H.
2014-01-31
Exploiting multiple views of a program makes obfuscating the intended behavior of a program more difficult allowing for better performance in classification, clustering, and phylogenetic reconstruction.
Wang, Yi; Coleman-Derr, Devin; Chen, Guoping; Gu, Yong Q
2015-07-01
Genome wide analysis of orthologous clusters is an important component of comparative genomics studies. Identifying the overlap among orthologous clusters can enable us to elucidate the function and evolution of proteins across multiple species. Here, we report a web platform named OrthoVenn that is useful for genome wide comparisons and visualization of orthologous clusters. OrthoVenn provides coverage of vertebrates, metazoa, protists, fungi, plants and bacteria for the comparison of orthologous clusters and also supports uploading of customized protein sequences from user-defined species. An interactive Venn diagram, summary counts, and functional summaries of the disjunction and intersection of clusters shared between species are displayed as part of the OrthoVenn result. OrthoVenn also includes in-depth views of the clusters using various sequence analysis tools. Furthermore, OrthoVenn identifies orthologous clusters of single copy genes and allows for a customized search of clusters of specific genes through key words or BLAST. OrthoVenn is an efficient and user-friendly web server freely accessible at http://probes.pw.usda.gov/OrthoVenn or http://aegilops.wheat.ucdavis.edu/OrthoVenn. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
Cluster analysis and prediction of treatment outcomes for chronic rhinosinusitis.
Soler, Zachary M; Hyer, J Madison; Rudmik, Luke; Ramakrishnan, Viswanathan; Smith, Timothy L; Schlosser, Rodney J
2016-04-01
Current clinical classifications of chronic rhinosinusitis (CRS) have weak prognostic utility regarding treatment outcomes. Simplified discriminant analysis based on unsupervised clustering has identified novel phenotypic subgroups of CRS, but prognostic utility is unknown. We sought to determine whether discriminant analysis allows prognostication in patients choosing surgery versus continued medical management. A multi-institutional prospective study of patients with CRS in whom initial medical therapy failed who then self-selected continued medical management or surgical treatment was used to separate patients into 5 clusters based on a previously described discriminant analysis using total Sino-Nasal Outcome Test-22 (SNOT-22) score, age, and missed productivity. Patients completed the SNOT-22 at baseline and for 18 months of follow-up. Baseline demographic and objective measures included olfactory testing, computed tomography, and endoscopy scoring. SNOT-22 outcomes for surgical versus continued medical treatment were compared across clusters. Data were available on 690 patients. Baseline differences in demographics, comorbidities, objective disease measures, and patient-reported outcomes were similar to previous clustering reports. Three of 5 clusters identified by means of discriminant analysis had improved SNOT-22 outcomes with surgical intervention when compared with continued medical management (surgery was a mean of 21.2 points better across these 3 clusters at 6 months, P < .05). These differences were sustained at 18 months of follow-up. Two of 5 clusters had similar outcomes when comparing surgery with continued medical management. A simplified discriminant analysis based on 3 common clinical variables is able to cluster patients and provide prognostic information regarding surgical treatment versus continued medical management in patients with CRS. Copyright © 2015 American Academy of Allergy, Asthma & Immunology. Published by Elsevier Inc. 
All rights reserved.
Analysis of Tropical Cyclone Tracks in the North Indian Ocean
NASA Astrophysics Data System (ADS)
Patwardhan, A.; Paliwal, M.; Mohapatra, M.
2011-12-01
Cyclones are regarded as one of the most dangerous meteorological phenomena of the tropical region. The probability of landfall of a tropical cyclone depends on its movement (trajectory). Analysis of trajectories of tropical cyclones could be useful for identifying potentially predictable characteristics. There is a long history of analysis of tropical cyclone tracks. A common approach is to use different clustering techniques to group the cyclone tracks on the basis of certain characteristics. Various clustering methods have been used to study tropical cyclones in different ocean basins, such as the western North Pacific Ocean (Elsner and Liu, 2003; Camargo et al., 2007) and the North Atlantic Ocean (Elsner, 2003; Gaffney et al. 2007; Nakamura et al., 2009). In this study, tropical cyclone tracks in the North Indian Ocean basin for the period 1961-2010 have been analyzed and grouped into clusters based on their spatial characteristics. A tropical cyclone trajectory is approximated as an open curve and described by its first two moments. The resulting clusters have different centroid locations and also differently shaped variance ellipses. These track characteristics are then used in the standard clustering algorithms, which allows the whole track shape, length, and location to be incorporated into the clustering methodology. The resulting clusters have different genesis locations and trajectory shapes. We have also examined characteristics such as life span, maximum sustained wind speed, landfall, and seasonality, many of which are significantly different across the identified clusters. The clustering approach groups cyclones with higher maximum wind speed and longest life span into one cluster. Another cluster includes short duration cyclonic events that are mostly deep depressions and significant for rainfall over Eastern and Central India. The clustering approach is likely to prove useful for analysis of events of significance with regard to impacts.
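The idea of summarising a track by its first two moments can be sketched as follows. This is a minimal illustration that treats a track as an unweighted set of (longitude, latitude) fixes; the study's exact weighting and coordinate handling are not given in the abstract. The function name `track_moments` and the sample track are hypothetical.

```python
def track_moments(points):
    """Return the centroid (first moment) and the variances and covariance
    (second central moments) of a list of (lon, lat) positions."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in points) / n
    var_y = sum((y - mean_y) ** 2 for _, y in points) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in points) / n
    return (mean_x, mean_y), (var_x, var_y, cov_xy)

# Hypothetical track moving north-west across the Bay of Bengal.
track = [(88.0, 10.0), (87.0, 12.0), (86.0, 14.0)]
centroid, spread = track_moments(track)

# Five numbers per track; these could be passed to a standard
# clustering algorithm such as k-means.
features = [*centroid, *spread]
print(features)
```

The centroid captures where a track lies, while the variances and covariance define the size and orientation of its variance ellipse, so tracks of any length reduce to a fixed-length feature vector.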
Fens, Niki; van Rossum, Annelot G J; Zanen, Pieter; van Ginneken, Bram; van Klaveren, Rob J; Zwinderman, Aeilko H; Sterk, Peter J
2013-06-01
Classification of COPD is currently based on the presence and severity of airways obstruction. However, this may not fully reflect the phenotypic heterogeneity of COPD in the (ex-) smoking community. We hypothesized that factor analysis followed by cluster analysis of functional, clinical, radiological and exhaled breath metabolomic features identifies subphenotypes of COPD in a community-based population of heavy (ex-) smokers. Adults between 50-75 years with a smoking history of at least 15 pack-years derived from a random population-based survey as part of the NELSON study underwent detailed assessment of pulmonary function, chest CT scanning, questionnaires and exhaled breath molecular profiling using an electronic nose. Factor and cluster analyses were performed on the subgroup of subjects fulfilling the GOLD criteria for COPD (post-BD FEV1/FVC < 0.70). Three hundred subjects were recruited, of which 157 fulfilled the criteria for COPD and were included in the factor and cluster analysis. Four clusters were identified: cluster 1 (n = 35; 22%): mild COPD, limited symptoms and good quality of life. Cluster 2 (n = 48; 31%): low lung function, combined emphysema and chronic bronchitis and a distinct breath molecular profile. Cluster 3 (n = 60; 38%): emphysema predominant COPD with preserved lung function. Cluster 4 (n = 14; 9%): highly symptomatic COPD with mildly impaired lung function. In a leave-one-out validation analysis an accuracy of 97.4% was reached. This unbiased taxonomy for mild to moderate COPD reinforces clusters found in previous studies and thereby allows better phenotyping of COPD in the general (ex-) smoking population.
Topic modeling for cluster analysis of large biological and medical datasets
2014-01-01
Background The big data moniker is nowhere better deserved than to describe the ever-increasing prodigiousness and complexity of biological and medical datasets. New methods are needed to generate and test hypotheses, foster biological interpretation, and build validated predictors. Although multivariate techniques such as cluster analysis may allow researchers to identify groups, or clusters, of related variables, the accuracies and effectiveness of traditional clustering methods diminish for large and hyper dimensional datasets. Topic modeling is an active research field in machine learning and has been mainly used as an analytical tool to structure large textual corpora for data mining. Its ability to reduce high dimensionality to a small number of latent variables makes it suitable as a means for clustering or overcoming clustering difficulties in large biological and medical datasets. Results In this study, three topic model-derived clustering methods, highest probable topic assignment, feature selection and feature extraction, are proposed and tested on the cluster analysis of three large datasets: Salmonella pulsed-field gel electrophoresis (PFGE) dataset, lung cancer dataset, and breast cancer dataset, which represent various types of large biological or medical datasets. All three methods are shown to improve the efficacy/effectiveness of clustering results on the three datasets in comparison to traditional methods. A preferable cluster analysis method emerged for each of the three datasets on the basis of replicating known biological truths. Conclusion Topic modeling could be advantageously applied to the large datasets of biological or medical research. The three proposed topic model-derived clustering methods, highest probable topic assignment, feature selection and feature extraction, yield clustering improvements for the three different data types.
Clusters more efficaciously represent truthful groupings and subgroupings in the data than traditional methods, suggesting that topic model-based methods could provide an analytic advancement in the analysis of large biological or medical datasets. PMID:25350106
Topic modeling for cluster analysis of large biological and medical datasets.
Zhao, Weizhong; Zou, Wen; Chen, James J
2014-01-01
The big data moniker is nowhere better deserved than to describe the ever-increasing prodigiousness and complexity of biological and medical datasets. New methods are needed to generate and test hypotheses, foster biological interpretation, and build validated predictors. Although multivariate techniques such as cluster analysis may allow researchers to identify groups, or clusters, of related variables, the accuracy and effectiveness of traditional clustering methods diminish for large and hyper-dimensional datasets. Topic modeling is an active research field in machine learning and has been mainly used as an analytical tool to structure large textual corpora for data mining. Its ability to reduce high dimensionality to a small number of latent variables makes it suitable as a means for clustering or overcoming clustering difficulties in large biological and medical datasets. In this study, three topic model-derived clustering methods, highest probable topic assignment, feature selection and feature extraction, are proposed and tested on the cluster analysis of three large datasets: Salmonella pulsed-field gel electrophoresis (PFGE) dataset, lung cancer dataset, and breast cancer dataset, which represent various types of large biological or medical datasets. All three methods are shown to improve the efficacy/effectiveness of clustering results on the three datasets in comparison to traditional methods. A preferable cluster analysis method emerged for each of the three datasets on the basis of replicating known biological truths. Topic modeling could be advantageously applied to the large datasets of biological or medical research. The three proposed topic model-derived clustering methods, highest probable topic assignment, feature selection and feature extraction, yield clustering improvements for the three different data types. 
Clusters more efficaciously represent truthful groupings and subgroupings in the data than traditional methods, suggesting that topic model-based methods could provide an analytic advancement in the analysis of large biological or medical datasets.
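Of the three methods named above, highest probable topic assignment is the simplest to illustrate: each sample is placed in the cluster of the topic to which a fitted topic model gives the largest probability. The sketch below assumes the sample-by-topic probabilities are already available; the matrix values and function names are hypothetical and are not taken from the paper.

```python
def assign_by_highest_topic(sample_topic_probs):
    """Map each sample to the index of its most probable topic (ties go to the lowest index)."""
    return [max(range(len(row)), key=row.__getitem__) for row in sample_topic_probs]


def group_by_cluster(labels):
    """Collect sample indices under the cluster label they were assigned."""
    clusters = {}
    for sample_index, label in enumerate(labels):
        clusters.setdefault(label, []).append(sample_index)
    return clusters


if __name__ == "__main__":
    # Hypothetical sample-by-topic probabilities (each row sums to 1).
    probs = [
        [0.7, 0.2, 0.1],
        [0.1, 0.8, 0.1],
        [0.6, 0.3, 0.1],
        [0.2, 0.2, 0.6],
    ]
    labels = assign_by_highest_topic(probs)
    print(labels, group_by_cluster(labels))
```

The number of clusters is then bounded by the number of topics, which is one reason the topic count would need to be chosen with care in such an approach.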
antiSMASH 2.0--a versatile platform for genome mining of secondary metabolite producers.
Blin, Kai; Medema, Marnix H; Kazempour, Daniyal; Fischbach, Michael A; Breitling, Rainer; Takano, Eriko; Weber, Tilmann
2013-07-01
Microbial secondary metabolites are a potent source of antibiotics and other pharmaceuticals. Genome mining of their biosynthetic gene clusters has become a key method to accelerate their identification and characterization. In 2011, we developed antiSMASH, a web-based analysis platform that automates this process. Here, we present the highly improved antiSMASH 2.0 release, available at http://antismash.secondarymetabolites.org/. For the new version, antiSMASH was entirely re-designed using a plug-and-play concept that allows easy integration of novel predictor or output modules. antiSMASH 2.0 now supports input of multiple related sequences simultaneously (multi-FASTA/GenBank/EMBL), which allows the analysis of draft genomes comprising multiple contigs. Moreover, direct analysis of protein sequences is now possible. antiSMASH 2.0 has also been equipped with the capacity to detect additional classes of secondary metabolites, including oligosaccharide antibiotics, phenazines, thiopeptides, homo-serine lactones, phosphonates and furans. The algorithm for predicting the core structure of the cluster end product is now also covering lantipeptides, in addition to polyketides and non-ribosomal peptides. The antiSMASH ClusterBlast functionality has been extended to identify sub-clusters involved in the biosynthesis of specific chemical building blocks. The new features currently make antiSMASH 2.0 the most comprehensive resource for identifying and analyzing novel secondary metabolite biosynthetic pathways in microorganisms.
Sul, Woo Jun; Cole, James R.; Jesus, Ederson da C.; Wang, Qiong; Farris, Ryan J.; Fish, Jordan A.; Tiedje, James M.
2011-01-01
High-throughput sequencing of 16S rRNA genes has increased our understanding of microbial community structure, but now even higher-throughput methods to the Illumina scale allow the creation of much larger datasets with more samples and orders-of-magnitude more sequences that swamp current analytic methods. We developed a method capable of handling these larger datasets on the basis of assignment of sequences into an existing taxonomy using a supervised learning approach (taxonomy-supervised analysis). We compared this method with a commonly used clustering approach based on sequence similarity (taxonomy-unsupervised analysis). We sampled 211 different bacterial communities from various habitats and obtained ∼1.3 million 16S rRNA sequences spanning the V4 hypervariable region by pyrosequencing. Both methodologies gave similar ecological conclusions in that β-diversity measures calculated by using these two types of matrices were significantly correlated to each other, as were the ordination configurations and hierarchical clustering dendrograms. In addition, our taxonomy-supervised analyses were also highly correlated with phylogenetic methods, such as UniFrac. The taxonomy-supervised analysis has the advantages that it is not limited by the exhaustive computation required for the alignment and clustering necessary for the taxonomy-unsupervised analysis, is more tolerant of sequencing errors, and allows comparisons when sequences are from different regions of the 16S rRNA gene. With the tremendous expansion in 16S rRNA data acquisition underway, the taxonomy-supervised approach offers the potential to provide more rapid and extensive community comparisons across habitats and samples. PMID:21873204
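To make the taxonomy-supervised idea concrete: once every sequence has been binned into an existing taxon, each sample becomes a vector of taxon counts, and a β-diversity measure can be computed directly from those counts without alignment or clustering. The sketch below uses the Bray-Curtis dissimilarity as one common choice; the abstract does not state which measures were used, and the taxon labels and function names here are hypothetical.

```python
from collections import Counter


def taxon_counts(assignments):
    """Count how many sequences of one sample were assigned to each taxon."""
    return Counter(assignments)


def bray_curtis(counts_a, counts_b):
    """Bray-Curtis dissimilarity between two taxon-count tables (0 = identical, 1 = disjoint)."""
    total = sum(counts_a.values()) + sum(counts_b.values())
    if total == 0:
        return 0.0  # two empty samples are treated as identical
    taxa = set(counts_a) | set(counts_b)
    shared = sum(min(counts_a[t], counts_b[t]) for t in taxa)
    return 1.0 - 2.0 * shared / total


if __name__ == "__main__":
    # Hypothetical classifier output: one taxon label per sequence.
    sample_1 = taxon_counts(["Bacillus", "Bacillus", "Pseudomonas"])
    sample_2 = taxon_counts(["Bacillus", "Streptomyces", "Streptomyces"])
    print(bray_curtis(sample_1, sample_2))
```

Because the bins come from a fixed taxonomy, samples sequenced from different regions of the gene can in principle share the same bins, which is the comparability advantage the abstract describes.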
Self-assembly of a tetrahedral 58-nuclear barium vanadium oxide cluster.
Kastner, Katharina; Puscher, Bianka; Streb, Carsten
2013-01-07
We report the synthesis and characterization of a molecular barium vanadium oxide cluster featuring high nuclearity and high symmetry. The tetrameric, 2.3 nm cluster H(5)[Ba(10)(NMP)(14)(H(2)O)(8)[V(12)O(33)](4)Br] is based on a bromide-centred, octahedral barium scaffold which is capped by four previously unknown [V(12)O(33)](6-) clusters in a tetrahedral fashion. The compound represents the largest polyoxovanadate-based heterometallic cluster known to date. The cluster is formed in organic solution and it is suggested that the bulky N-methyl-2-pyrrolidone (NMP) solvent ligands allow the isolation of this giant molecule and prevent further condensation to a solid-state metal oxide. The cluster is fully characterized using single-crystal XRD, elemental analysis, ESI mass spectrometry and other spectroscopic techniques.
Graph analysis of cell clusters forming vascular networks
NASA Astrophysics Data System (ADS)
Alves, A. P.; Mesquita, O. N.; Gómez-Gardeñes, J.; Agero, U.
2018-03-01
This manuscript describes the experimental observation of vasculogenesis in chick embryos by means of network analysis. The formation of the vascular network was observed in the area opaca of embryos from 40 to 55 h of development. In the area opaca, endothelial cell clusters self-organize as a primitive and approximately regular network of capillaries. The process was observed by bright-field microscopy in control embryos and in embryos treated with Bevacizumab (Avastin), an antibody that inhibits the signalling of the vascular endothelial growth factor (VEGF). The sequence of images of the vascular growth was thresholded and used to quantify the forming network in control and Avastin-treated embryos. This characterization is made by measuring the vessel density, the number of cell clusters and the density of the largest cluster. From the original images, the topology of the vascular network was extracted and characterized by means of the usual network metrics, such as the degree distribution, average clustering coefficient, average shortest path length and assortativity, among others. This analysis makes it possible to monitor how the largest connected cluster of the vascular network evolves in time and provides quantitative evidence of the disruptive effects that Avastin has on the tree structure of vascular networks.
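Three of the network metrics mentioned above (degree distribution, average clustering coefficient and size of the largest connected cluster) can be computed from a plain adjacency structure. The following is a minimal sketch assuming an undirected, unweighted graph stored as a dictionary of neighbour sets; the toy graph and the function names are hypothetical and do not come from the paper.

```python
from collections import Counter, deque


def degree_distribution(adj):
    """Return a Counter mapping degree -> number of nodes with that degree."""
    return Counter(len(neighbours) for neighbours in adj.values())


def average_clustering(adj):
    """Mean local clustering coefficient; nodes with fewer than two neighbours count as 0."""
    total = 0.0
    for neighbours in adj.values():
        k = len(neighbours)
        if k < 2:
            continue
        nbrs = list(neighbours)
        # Count edges that exist between pairs of neighbours of this node.
        links = sum(
            1
            for a in range(k)
            for b in range(a + 1, k)
            if nbrs[b] in adj[nbrs[a]]
        )
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


def largest_component_size(adj):
    """Size of the largest connected component, found by breadth-first search."""
    seen, best = set(), 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            node = queue.popleft()
            size += 1
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        best = max(best, size)
    return best


if __name__ == "__main__":
    # Toy graph: a triangle a-b-c with a branch c-d, plus a separate pair e-f.
    graph = {
        "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
        "d": {"c"}, "e": {"f"}, "f": {"e"},
    }
    print(degree_distribution(graph), average_clustering(graph), largest_component_size(graph))
```

Tracking the largest-component size frame by frame is one simple way to follow how a growing network connects up over time.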
NASA Astrophysics Data System (ADS)
Kamann, S.; Husser, T.-O.; Dreizler, S.; Emsellem, E.; Weilbacher, P. M.; Martens, S.; Bacon, R.; den Brok, M.; Giesers, B.; Krajnović, D.; Roth, M. M.; Wendt, M.; Wisotzki, L.
2018-02-01
This is the first of a series of papers presenting the results from our survey of 25 Galactic globular clusters with the MUSE integral-field spectrograph. In combination with our dedicated algorithm for source deblending, MUSE provides unique multiplex capabilities in crowded stellar fields and allows us to acquire samples of up to 20 000 stars within the half-light radius of each cluster. The present paper focuses on the analysis of the internal dynamics of 22 out of the 25 clusters, using about 500 000 spectra of 200 000 individual stars. Thanks to the large stellar samples per cluster, we are able to perform a detailed analysis of the central rotation and dispersion fields using both radial profiles and two-dimensional maps. The velocity dispersion profiles we derive show a good general agreement with existing radial velocity studies but typically reach closer to the cluster centres. By comparison with proper motion data, we derive or update the dynamical distance estimates to 14 clusters. Compared to previous dynamical distance estimates for 47 Tuc, our value is in much better agreement with other methods. We further find significant (>3σ) rotation in the majority (13/22) of our clusters. Our analysis seems to confirm earlier findings of a link between rotation and the ellipticities of globular clusters. In addition, we find a correlation between the strengths of internal rotation and the relaxation times of the clusters, suggesting that the central rotation fields are relics of the cluster formation that are gradually dissipated via two-body relaxation.
A Dimensionally Reduced Clustering Methodology for Heterogeneous Occupational Medicine Data Mining.
Saâdaoui, Foued; Bertrand, Pierre R; Boudet, Gil; Rouffiac, Karine; Dutheil, Frédéric; Chamoux, Alain
2015-10-01
Clustering is a set of statistical learning techniques aimed at partitioning heterogeneous data into homogeneous groups called clusters. Clustering has been applied successfully in several fields, such as medicine, biology, finance and economics. In this paper, we introduce the notion of clustering in multifactorial data analysis problems. A case study is conducted for an occupational medicine problem with the purpose of analyzing patterns in a population of 813 individuals. To reduce the dimensionality of the data set, we base our approach on Principal Component Analysis (PCA), which is the statistical tool most commonly used in factorial analysis. However, real-world problems, especially in medicine, are often based on heterogeneous qualitative and quantitative measurements, whereas PCA only processes quantitative ones. Moreover, qualitative data are originally unobservable quantitative responses that are usually binary-coded. Hence, we propose a new set of strategies that handle quantitative and qualitative data simultaneously. The principle of this approach is to project the qualitative variables onto the subspaces spanned by the quantitative ones. Subsequently, an optimal model is allocated to the resulting PCA-regressed subspaces.
Metrics and methods for characterizing dairy farm intensification using farm survey data.
Gonzalez-Mejia, Alejandra; Styles, David; Wilson, Paul; Gibbons, James
2018-01-01
Evaluation of agricultural intensification requires comprehensive analysis of trends in farm performance across physical and socio-economic aspects, which may diverge across farm types. Typical reporting of economic indicators at sectorial or the "average farm" level does not represent farm diversity and provides limited insight into the sustainability of specific intensification pathways. Using farm business data from a total of 7281 farm survey observations of English and Welsh dairy farms over a 14-year period, we calculate a time series of 16 key performance indicators (KPIs) pertinent to farm structure, environmental and socio-economic aspects of sustainability. We then apply principal component analysis and model-based clustering analysis to identify statistically the number of distinct dairy farm typologies for each year of study, and link these clusters through time using multidimensional scaling. Between 2001 and 2014, dairy farms have largely consolidated and specialized into two distinct clusters: more extensive farms relying predominantly on grass, with lower milk yields but higher labour intensity, and more intensive farms producing more milk per cow with more concentrate and more maize, but lower labour intensity. There is some indication that these clusters are converging as the extensive cluster is intensifying slightly faster than the intensive cluster, in terms of milk yield per cow and use of concentrate feed. In 2014, annual milk yields were 6,835 and 7,500 l/cow for extensive and intensive farm types, respectively, whilst annual concentrate feed use was 1.3 and 1.5 tonnes per cow. For several KPIs, such as milk yield, the mean trend across all farms differed substantially from the means of the extensive and intensive typologies. The indicators and analysis methodology developed allow identification of distinct farm types and industry trends using readily available survey data. 
The identified groups allow the accurate evaluation of the consequences of the reduction in dairy farm numbers and intensification at national and international scales.
Metrics and methods for characterizing dairy farm intensification using farm survey data
Gonzalez-Mejia, Alejandra; Styles, David; Wilson, Paul
2018-01-01
Evaluation of agricultural intensification requires comprehensive analysis of trends in farm performance across physical and socio-economic aspects, which may diverge across farm types. Typical reporting of economic indicators at sectorial or the “average farm” level does not represent farm diversity and provides limited insight into the sustainability of specific intensification pathways. Using farm business data from a total of 7281 farm survey observations of English and Welsh dairy farms over a 14-year period, we calculate a time series of 16 key performance indicators (KPIs) pertinent to farm structure, environmental and socio-economic aspects of sustainability. We then apply principal component analysis and model-based clustering analysis to identify statistically the number of distinct dairy farm typologies for each year of study, and link these clusters through time using multidimensional scaling. Between 2001 and 2014, dairy farms have largely consolidated and specialized into two distinct clusters: more extensive farms relying predominantly on grass, with lower milk yields but higher labour intensity, and more intensive farms producing more milk per cow with more concentrate and more maize, but lower labour intensity. There is some indication that these clusters are converging as the extensive cluster is intensifying slightly faster than the intensive cluster, in terms of milk yield per cow and use of concentrate feed. In 2014, annual milk yields were 6,835 and 7,500 l/cow for extensive and intensive farm types, respectively, whilst annual concentrate feed use was 1.3 and 1.5 tonnes per cow. For several KPIs, such as milk yield, the mean trend across all farms differed substantially from the means of the extensive and intensive typologies. The indicators and analysis methodology developed allow identification of distinct farm types and industry trends using readily available survey data. 
The identified groups allow the accurate evaluation of the consequences of the reduction in dairy farm numbers and intensification at national and international scales. PMID:29742166
Cluster Masses Derived from X-ray and Sunyaev-Zeldovich Effect Measurements
NASA Technical Reports Server (NTRS)
Laroque, S.; Joy, Marshall; Bonamente, M.; Carlstrom, J.; Dawson, K.
2003-01-01
We infer the gas mass and total gravitational mass of 11 clusters using two different methods: analysis of X-ray data from the Chandra X-ray Observatory and analysis of centimeter-wave Sunyaev-Zel'dovich Effect (SZE) data from the BIMA and OVRO interferometers. This flux-limited sample of clusters from the BCS cluster catalogue was chosen so as to be well above the surface brightness limit of the ROSAT All Sky Survey; this is therefore an orientation-unbiased sample. The gas mass fraction, f_g, is calculated for each cluster using both X-ray and SZE data, and the results are compared at a fiducial radius of r_500. Comparison of the X-ray and SZE results for this orientation-unbiased sample allows us to constrain cluster systematics, such as clumping of the intracluster medium. We derive an upper limit on Omega_M assuming that the mass composition of clusters within r_500 reflects the universal mass composition: Omega_M h_100 is greater than Omega_B / f_g. We also demonstrate how the mean f_g derived from the sample can be used to estimate the masses of clusters discovered by upcoming deep SZE surveys.
Characterization of high explosive particles using cluster secondary ion mass spectrometry.
Gillen, Greg; Mahoney, Christine; Wight, Scott; Lareau, Richard
2006-01-01
The use of secondary ion mass spectrometry (SIMS) for the detection and spatially resolved analysis of individual high explosive particles is described. A C(8)(-) carbon cluster primary ion beam was used in a commercial SIMS instrument to analyze samples of high explosives dispersed as particles on silicon substrates. In comparison with monatomic primary ion bombardment, the carbon cluster primary ion beam was found to greatly enhance characteristic secondary ion signals from the explosive compounds while causing minimal beam-induced degradation. The resistance of these compounds to degradation under ion bombardment allows explosive particles to be analyzed under high primary ion dose bombardment (dynamic SIMS) conditions, facilitating the rapid acquisition of spatially resolved molecular information. The use of cluster SIMS combined with computer control of the sample stage position allows for the automated identification and counting of explosive particle distributions on silicon surfaces. This will be useful for characterizing the efficiency of transfer of particulates in trace explosive detection portal collectors and/or swipes utilized for ion mobility spectrometry applications.
antiSMASH 3.0-a comprehensive resource for the genome mining of biosynthetic gene clusters.
Weber, Tilmann; Blin, Kai; Duddela, Srikanth; Krug, Daniel; Kim, Hyun Uk; Bruccoleri, Robert; Lee, Sang Yup; Fischbach, Michael A; Müller, Rolf; Wohlleben, Wolfgang; Breitling, Rainer; Takano, Eriko; Medema, Marnix H
2015-07-01
Microbial secondary metabolism constitutes a rich source of antibiotics, chemotherapeutics, insecticides and other high-value chemicals. Genome mining of gene clusters that encode the biosynthetic pathways for these metabolites has become a key methodology for novel compound discovery. In 2011, we introduced antiSMASH, a web server and stand-alone tool for the automatic genomic identification and analysis of biosynthetic gene clusters, available at http://antismash.secondarymetabolites.org. Here, we present version 3.0 of antiSMASH, which has undergone major improvements. A full integration of the recently published ClusterFinder algorithm now allows using this probabilistic algorithm to detect putative gene clusters of unknown types. Also, a new dereplication variant of the ClusterBlast module now identifies similarities of identified clusters to any of 1172 clusters with known end products. At the enzyme level, active sites of key biosynthetic enzymes are now pinpointed through a curated pattern-matching procedure and Enzyme Commission numbers are assigned to functionally classify all enzyme-coding genes. Additionally, chemical structure prediction has been improved by incorporating polyketide reduction states. Finally, in order for users to be able to organize and analyze multiple antiSMASH outputs in a private setting, a new XML output module allows offline editing of antiSMASH annotations within the Geneious software. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
Halligan, Brian D.; Geiger, Joey F.; Vallejos, Andrew K.; Greene, Andrew S.; Twigger, Simon N.
2009-01-01
One of the major difficulties for many laboratories setting up proteomics programs has been obtaining and maintaining the computational infrastructure required for the analysis of the large flow of proteomics data. We describe a system that combines distributed cloud computing and open source software to allow laboratories to set up scalable virtual proteomics analysis clusters without the investment in computational hardware or software licensing fees. Additionally, the pricing structure of distributed computing providers, such as Amazon Web Services, allows laboratories or even individuals to have large-scale computational resources at their disposal at a very low cost per run. We provide detailed step by step instructions on how to implement the virtual proteomics analysis clusters as well as a list of current available preconfigured Amazon machine images containing the OMSSA and X!Tandem search algorithms and sequence databases on the Medical College of Wisconsin Proteomics Center website (http://proteomics.mcw.edu/vipdac). PMID:19358578
Halligan, Brian D; Geiger, Joey F; Vallejos, Andrew K; Greene, Andrew S; Twigger, Simon N
2009-06-01
One of the major difficulties for many laboratories setting up proteomics programs has been obtaining and maintaining the computational infrastructure required for the analysis of the large flow of proteomics data. We describe a system that combines distributed cloud computing and open source software to allow laboratories to set up scalable virtual proteomics analysis clusters without the investment in computational hardware or software licensing fees. Additionally, the pricing structure of distributed computing providers, such as Amazon Web Services, allows laboratories or even individuals to have large-scale computational resources at their disposal at a very low cost per run. We provide detailed step-by-step instructions on how to implement the virtual proteomics analysis clusters as well as a list of current available preconfigured Amazon machine images containing the OMSSA and X!Tandem search algorithms and sequence databases on the Medical College of Wisconsin Proteomics Center Web site ( http://proteomics.mcw.edu/vipdac ).
Optimizing R with SparkR on a commodity cluster for biomedical research.
Sedlmayr, Martin; Würfl, Tobias; Maier, Christian; Häberle, Lothar; Fasching, Peter; Prokosch, Hans-Ulrich; Christoph, Jan
2016-12-01
Medical researchers are challenged today by the enormous amount of data collected in healthcare. Analysis methods such as genome-wide association studies (GWAS) are often computationally intensive and thus require enormous resources to be performed in a reasonable amount of time. While dedicated clusters and public clouds may deliver the desired performance, their use requires upfront financial efforts or anonymous data, which is often not possible for preliminary or occasional tasks. We explored the possibilities to build a private, flexible cluster for processing scripts in R based on commodity, non-dedicated hardware of our department. For this, a GWAS-calculation in R on a single desktop computer, a Message Passing Interface (MPI)-cluster, and a SparkR-cluster were compared with regards to the performance, scalability, quality, and simplicity. The original script had a projected runtime of three years on a single desktop computer. Optimizing the script in R already yielded a significant reduction in computing time (2 weeks). By using R-MPI and SparkR, we were able to parallelize the computation and reduce the time to less than three hours (2.6 h) on already available, standard office computers. While MPI is a proven approach in high-performance clusters, it requires rather static, dedicated nodes. SparkR and its Hadoop siblings allow for a dynamic, elastic environment with automated failure handling. SparkR also scales better with the number of nodes in the cluster than MPI due to optimized data communication. R is a popular environment for clinical data analysis. The new SparkR solution offers elastic resources and allows supporting big data analysis using R even on non-dedicated resources with minimal change to the original code. To unleash the full potential, additional efforts should be invested to customize and improve the algorithms, especially with regards to data distribution. Copyright © 2016 The Authors. Published by Elsevier Ireland Ltd.. 
Application of Artificial Intelligence For Euler Solutions Clustering
NASA Astrophysics Data System (ADS)
Mikhailov, V.; Galdeano, A.; Diament, M.; Gvishiani, A.; Agayan, S.; Bogoutdinov, Sh.; Graeva, E.; Sailhac, P.
Results of Euler deconvolution strongly depend on the selection of viable solutions. Synthetic calculations using multiple causative sources show that Euler solutions cluster in the vicinity of causative bodies even when they do not group densely about the perimeter of the bodies. We have developed a clustering technique to serve as a tool for selecting appropriate solutions. The method RODIN, employed in this study, is based on artificial intelligence and was originally designed for problems of classification of large data sets. It is based on a geometrical approach to study object concentration in a finite metric space of any dimension. The method uses a formal definition of cluster and includes free parameters that facilitate the search for clusters of given properties. Tests on synthetic and real data showed that the clustering technique outlines causative bodies more accurately than other methods of discriminating Euler solutions. In complicated field cases such as the magnetic field in the Gulf of Saint Malo region (Brittany, France), the method provides geologically insightful solutions. The clustering method has two further advantages. First, clusters provide solutions associated with particular bodies or parts of bodies, permitting the analysis of different clusters of Euler solutions separately; this may allow computation of average parameters for individual causative bodies. Second, those measurements of the anomalous field that yield clusters also form dense clusters themselves; the clustering technique thus outlines areas where the influence of different causative sources is more prominent, which allows one to focus on areas for reinterpretation using different window sizes, structural indices and so on.
Silver, Sunshine C; Gardenghi, David J; Naik, Sunil G; Shepard, Eric M; Huynh, Boi Hanh; Szilagyi, Robert K; Broderick, Joan B
2014-03-01
Spore photoproduct lyase (SPL), a member of the radical S-adenosyl-L-methionine (SAM) superfamily, catalyzes the direct reversal of the spore photoproduct, a thymine dimer specific to bacterial spores, to two thymines. SPL requires SAM and a redox-active [4Fe-4S] cluster for catalysis. Mössbauer analysis of anaerobically purified SPL indicates the presence of a mixture of cluster states with the majority (40 %) as [2Fe-2S](2+) clusters and a smaller amount (15 %) as [4Fe-4S](2+) clusters. On reduction, the cluster content changes to primarily (60 %) [4Fe-4S](+). The speciation information from Mössbauer data allowed us to deconvolute iron and sulfur K-edge X-ray absorption spectra to uncover electronic (X-ray absorption near-edge structure, XANES) and geometric (extended X-ray absorption fine structure, EXAFS) structural features of the Fe-S clusters, and their interactions with SAM. The iron K-edge EXAFS data provide evidence for elongation of a [2Fe-2S] rhomb of the [4Fe-4S] cluster on binding SAM on the basis of an Fe···Fe scatterer at 3.0 Å. The XANES spectra of reduced SPL in the absence and presence of SAM overlay one another, indicating that SAM is not undergoing reductive cleavage. The X-ray absorption spectroscopy data for SPL samples and data for model complexes from the literature allowed the deconvolution of contributions from [2Fe-2S] and [4Fe-4S] clusters to the sulfur K-edge XANES spectra. The analysis of pre-edge features revealed electronic changes in the Fe-S clusters as a function of the presence of SAM. The spectroscopic findings were further corroborated by density functional theory calculations that provided insights into structural and electronic perturbations that can be correlated by considering the role of SAM as a catalyst or substrate.
Tweets clustering using latent semantic analysis
NASA Astrophysics Data System (ADS)
Rasidi, Norsuhaili Mahamed; Bakar, Sakhinah Abu; Razak, Fatimah Abdul
2017-04-01
Social media are becoming overloaded with information due to the increasing number of information feeds. Unlike other social media, Twitter allows users to broadcast a short message called a "tweet". In this study, we extract tweets related to MH370 over a certain period of time. We present an overview of our approach to tweet clustering, which we use to analyze users' responses to the MH370 tragedy. The tweets were clustered based on the frequency of terms obtained from the classification process. The method we used for the text classification is Latent Semantic Analysis. As a result, we identify two types of tweets responding to the MH370 tragedy: emotional and non-emotional. We show some of our initial results to demonstrate the effectiveness of our approach.
The positioning of sustainability within residential property marketing.
Kriese, Ulrich; Scholz, Roland W
2011-01-01
This article investigates the evolution of sustainability positioning in residential property marketing to shed light on the specific role and responsibility of housebuilders and housing investors in urban development. To this end, an analysis is made of housing advertisements published in Basel, Switzerland, over a period of more than 100 years. The paper demonstrates how to draw successfully on advertisements to discern sustainability patterns in housing, using criteria situated along the dimensions building, location and people. Cluster analysis allows five clusters of sustainability positioning to be described—namely, good location, green building, comfort living, pre-sustainability and sustainability. Investor and builder types are differently located in these clusters. Location emerges as an issue which, to a large extent, is advertised independently from other sustainability issues.
Baseline adjustments for binary data in repeated cross-sectional cluster randomized trials.
Nixon, R M; Thompson, S G
2003-09-15
Analysis of covariance models, which adjust for a baseline covariate, are often used to compare treatment groups in a controlled trial in which individuals are randomized. Such analysis adjusts for any baseline imbalance and usually increases the precision of the treatment effect estimate. We assess the value of such adjustments in the context of a cluster randomized trial with repeated cross-sectional design and a binary outcome. In such a design, a new sample of individuals is taken from the clusters at each measurement occasion, so that baseline adjustment has to be at the cluster level. Logistic regression models are used to analyse the data, with cluster level random effects to allow for different outcome probabilities in each cluster. We compare the estimated treatment effect and its precision in models that incorporate a covariate measuring the cluster level probabilities at baseline and those that do not. In two data sets, taken from a cluster randomized trial in the treatment of menorrhagia, the value of baseline adjustment is only evident when the number of subjects per cluster is large. We assess the generalizability of these findings by undertaking a simulation study, and find that increased precision of the treatment effect requires both large cluster sizes and substantial heterogeneity between clusters at baseline, but baseline imbalance arising by chance in a randomized study can always be effectively adjusted for. Copyright 2003 John Wiley & Sons, Ltd.
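The kind of model described above can be written out explicitly. The following is a sketch under stated assumptions, not the authors' exact specification: the symbols are introduced here for illustration, with y_ij the binary outcome of individual j in cluster i at follow-up, t_i the treatment indicator of cluster i, x_i the cluster-level baseline covariate and u_i the cluster-level random effect.

```latex
\begin{align}
  y_{ij} \mid \pi_{ij} &\sim \mathrm{Bernoulli}(\pi_{ij}), \\
  \operatorname{logit}(\pi_{ij}) &= \beta_0 + \beta_T\, t_i + \beta_B\, x_i + u_i, \\
  u_i &\sim \mathcal{N}(0, \sigma_u^2).
\end{align}
```

Under this notation, the unadjusted model is the special case with \beta_B = 0, and the comparison in the abstract amounts to comparing the estimate of \beta_T and its standard error between the two models. Because a new sample of individuals is drawn at each occasion, x_i can only vary between clusters, not between individuals.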
Network module detection: Affinity search technique with the multi-node topological overlap measure
Li, Ai; Horvath, Steve
2009-01-01
Background: Many clustering procedures only allow the user to input a pairwise dissimilarity or distance measure between objects. We propose a clustering method that can input a multi-point dissimilarity measure d(i1, i2, ..., iP) where the number of points P can be larger than 2. The work is motivated by gene network analysis where clusters correspond to modules of highly interconnected nodes. Here, we define modules as clusters of network nodes with high multi-node topological overlap. The topological overlap measure is a robust measure of interconnectedness which is based on shared network neighbors. In previous work, we have shown that the multi-node topological overlap measure yields biologically meaningful results when used as input of network neighborhood analysis. Findings: We adapt network neighborhood analysis for the use of module detection. We propose the Module Affinity Search Technique (MAST), which is a generalized version of the Cluster Affinity Search Technique (CAST). MAST can accommodate a multi-node dissimilarity measure. Clusters grow around user-defined or automatically chosen seeds (e.g. hub nodes). We propose both local and global cluster growth stopping rules. We use several simulations and a gene co-expression network application to argue that the MAST approach leads to biologically meaningful results. We compare MAST with hierarchical clustering and partitioning around medoid clustering. Conclusion: Our flexible module detection method is implemented in the MTOM software which can be downloaded from the following webpage: PMID:19619323
Network module detection: Affinity search technique with the multi-node topological overlap measure.
Li, Ai; Horvath, Steve
2009-07-20
Many clustering procedures only allow the user to input a pairwise dissimilarity or distance measure between objects. We propose a clustering method that can input a multi-point dissimilarity measure d(i1, i2, ..., iP) where the number of points P can be larger than 2. The work is motivated by gene network analysis where clusters correspond to modules of highly interconnected nodes. Here, we define modules as clusters of network nodes with high multi-node topological overlap. The topological overlap measure is a robust measure of interconnectedness which is based on shared network neighbors. In previous work, we have shown that the multi-node topological overlap measure yields biologically meaningful results when used as input of network neighborhood analysis. We adapt network neighborhood analysis for the use of module detection. We propose the Module Affinity Search Technique (MAST), which is a generalized version of the Cluster Affinity Search Technique (CAST). MAST can accommodate a multi-node dissimilarity measure. Clusters grow around user-defined or automatically chosen seeds (e.g. hub nodes). We propose both local and global cluster growth stopping rules. We use several simulations and a gene co-expression network application to argue that the MAST approach leads to biologically meaningful results. We compare MAST with hierarchical clustering and partitioning around medoid clustering. Our flexible module detection method is implemented in the MTOM software which can be downloaded from the following webpage: http://www.genetics.ucla.edu/labs/horvath/MTOM/
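As a point of reference for the topological overlap measure mentioned above, the following is a minimal Python sketch of the commonly used pairwise form for an unweighted network. The multi-node measure used by MAST generalizes this idea and is not reproduced here. The function name and the toy adjacency matrix are illustrative assumptions and are not taken from the MTOM software.

```python
def pairwise_topological_overlap(adj):
    """Pairwise topological overlap for a symmetric 0/1 adjacency matrix.

    `adj` is a list of lists with zeros on the diagonal (assumed input format).
    Uses the common pairwise definition
        TOM[i][j] = (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij),
    where l_ij counts the neighbors shared by i and j and k_i is the degree of i.
    """
    n = len(adj)
    degree = [sum(row) for row in adj]
    tom = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                tom[i][j] = 1.0  # convention: a node fully overlaps with itself
                continue
            shared = sum(adj[i][u] * adj[u][j] for u in range(n) if u not in (i, j))
            denom = min(degree[i], degree[j]) + 1 - adj[i][j]
            tom[i][j] = (shared + adj[i][j]) / denom
    return tom


# Toy network (made-up example): edges 0-1, 0-2, 1-2, 2-3.
toy_adj = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
toy_tom = pairwise_topological_overlap(toy_adj)
```

A dissimilarity suitable for clustering can then be taken as 1 - TOM, so that nodes sharing many neighbors end up close together.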
Generalized Analysis Tools for Multi-Spacecraft Missions
NASA Astrophysics Data System (ADS)
Chanteur, G. M.
2011-12-01
Analysis tools for multi-spacecraft missions like CLUSTER or MMS have been designed since the late 1990s to estimate gradients of fields or to characterize discontinuities crossed by a cluster of spacecraft. Different approaches have been presented and discussed in the book "Analysis Methods for Multi-Spacecraft Data" published as Scientific Report 001 of the International Space Science Institute in Bern, Switzerland (G. Paschmann and P. Daly Eds., 1998). On the one hand, the least-squares approach has the advantage of applying to any number of spacecraft [1], but it is not convenient for analytical computation, especially when considering the error analysis. On the other hand, the barycentric approach is powerful, as it provides simple analytical formulas involving the reciprocal vectors of the tetrahedron [2], but it appears limited to clusters of four spacecraft. Moreover, the barycentric approach allows theoretical formulas to be derived for the errors affecting the estimators built from the reciprocal vectors [2,3,4]. Following a first generalization of reciprocal vectors proposed by Vogt et al. [4], and despite the present lack of projects with more than four spacecraft, we present generalized reciprocal vectors for a cluster made of any number of spacecraft: each spacecraft is given a positive or null weight. The non-coplanarity of at least four spacecraft with strictly positive weights is a necessary and sufficient condition for this analysis to be enabled. The weights make it possible to minimize the influence of a spacecraft whose location or data quality is not appropriate, or simply to extract subsets of spacecraft from the cluster. Estimators presented in [2] are generalized within this new frame, except for the error analysis, which is still under investigation. References [1] Harvey, C. C.: Spatial Gradients and the Volumetric Tensor, in: Analysis Methods for Multi-Spacecraft Data, G. Paschmann and P. Daly (eds.), pp. 307-322, ISSI SR-001, 1998. [2] Chanteur, G.: Spatial Interpolation for Four Spacecraft: Theory, in: Analysis Methods for Multi-Spacecraft Data, G. Paschmann and P. Daly (eds.), pp. 371-393, ISSI SR-001, 1998. [3] Chanteur, G.: Accuracy of field gradient estimations by Cluster: Explanation of its dependency upon elongation and planarity of the tetrahedron, pp. 265-268, ESA SP-449, 2000. [4] Vogt, J., Paschmann, G., and Chanteur, G.: Reciprocal Vectors, pp. 33-46, ISSI SR-008, 2008.
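The classical four-spacecraft case that the abstract generalizes can be sketched in a few lines of Python. The sketch below computes the reciprocal vectors of a tetrahedron and uses them as a linear gradient estimator; it does not implement the weighted generalization to any number of spacecraft described above. All function names and the sample positions are illustrative assumptions.

```python
def sub(a, b):
    return [a[i] - b[i] for i in range(3)]


def dot(a, b):
    return sum(a[i] * b[i] for i in range(3))


def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def reciprocal_vectors(positions):
    """Reciprocal vectors k_alpha of a non-coplanar tetrahedron.

    k_alpha is normal to the face opposite vertex alpha and is scaled so that
    k_alpha . (r_alpha - r_beta) = 1 for any other vertex beta.
    """
    k = []
    for alpha in range(4):
        beta, gamma, lam = [(alpha + s) % 4 for s in (1, 2, 3)]
        normal = cross(sub(positions[gamma], positions[beta]),
                       sub(positions[lam], positions[beta]))
        scale = dot(sub(positions[alpha], positions[beta]), normal)
        k.append([c / scale for c in normal])
    return k


def estimate_gradient(positions, values):
    """Linear estimate of grad f from four point measurements: sum_alpha k_alpha * f_alpha."""
    k = reciprocal_vectors(positions)
    return [sum(k[a][i] * values[a] for a in range(4)) for i in range(3)]


# Made-up spacecraft positions and a linear test field f = 2x - 3y + 0.5z + 7.
positions = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
values = [2 * x - 3 * y + 0.5 * z + 7 for x, y, z in positions]
gradient = estimate_gradient(positions, values)
```

For a field that is exactly linear the estimator recovers the true gradient. The scale factor vanishes when the four points are coplanar, which is why non-coplanarity is required.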
MIXOR: a computer program for mixed-effects ordinal regression analysis.
Hedeker, D; Gibbons, R D
1996-03-01
MIXOR provides maximum marginal likelihood estimates for mixed-effects ordinal probit, logistic, and complementary log-log regression models. These models can be used for analysis of dichotomous and ordinal outcomes from either a clustered or longitudinal design. For clustered data, the mixed-effects model assumes that data within clusters are dependent. The degree of dependency is jointly estimated with the usual model parameters, thus adjusting for dependence resulting from clustering of the data. Similarly, for longitudinal data, the mixed-effects approach can allow for individual-varying intercepts and slopes across time, and can estimate the degree to which these time-related effects vary in the population of individuals. MIXOR uses marginal maximum likelihood estimation, utilizing a Fisher-scoring solution. For the scoring solution, the Cholesky factor of the random-effects variance-covariance matrix is estimated, along with the effects of model covariates. Examples illustrating usage and features of MIXOR are provided.
High-dimensional cluster analysis with the Masked EM Algorithm
Kadir, Shabnam N.; Goodman, Dan F. M.; Harris, Kenneth D.
2014-01-01
Cluster analysis faces two problems in high dimensions: first, the “curse of dimensionality” that can lead to overfitting and poor generalization performance; and second, the sheer time taken for conventional algorithms to process large amounts of high-dimensional data. We describe a solution to these problems, designed for the application of “spike sorting” for next-generation high channel-count neural probes. In this problem, only a small subset of features provide information about the cluster membership of any one data vector, but this informative feature subset is not the same for all data points, rendering classical feature selection ineffective. We introduce a “Masked EM” algorithm that allows accurate and time-efficient clustering of up to millions of points in thousands of dimensions. We demonstrate its applicability to synthetic data, and to real-world high-channel-count spike sorting data. PMID:25149694
Fantini, Marco; Malinverni, Duccio; De Los Rios, Paolo; Pastore, Annalisa
2017-01-01
Direct coupling analysis (DCA) is a powerful statistical inference tool used to study protein evolution. It was introduced to predict protein folds and protein-protein interactions, and has also been applied to the prediction of entire interactomes. Here, we have used it to analyze three proteins of the iron-sulfur biogenesis machine, an essential metabolic pathway conserved in all organisms. We show that DCA can correctly reproduce structural features of the CyaY/frataxin family (a protein involved in the human disease Friedreich's ataxia) despite being based on the relatively small number of sequences allowed by its genomic distribution. This result gives us confidence in the method. Its application to the iron-sulfur cluster scaffold protein IscU, which has been suggested to function both as an ordered and a disordered form, allows us to distinguish evolutionary traces of the structured species, suggesting that, if present in the cell, the disordered form has not left evolutionary imprinting. We observe instead, for the first time, direct indications of how the protein can dimerize head-to-head and bind 4Fe4S clusters. Analysis of the alternative scaffold protein IscA provides strong support to a coordination of the cluster by a dimeric form rather than a tetramer, as previously suggested. Our analysis also suggests the presence in solution of a mixture of monomeric and dimeric species, and guides us to the prevalent one. Finally, we used DCA to analyze interactions between some of these proteins, and discuss the potentials and limitations of the method. PMID:28664160
NASA Astrophysics Data System (ADS)
Lin, Yen-Ting; Hsieh, Bau-Ching; Lin, Sheng-Chieh; Oguri, Masamune; Chen, Kai-Feng; Tanaka, Masayuki; Chiu, I.-Non; Huang, Song; Kodama, Tadayuki; Leauthaud, Alexie; More, Surhud; Nishizawa, Atsushi J.; Bundy, Kevin; Lin, Lihwai; Miyazaki, Satoshi
2017-12-01
The unprecedented depth and area surveyed by the Subaru Strategic Program with the Hyper Suprime-Cam (HSC-SSP) have enabled us to construct and publish the largest distant cluster sample out to z ∼ 1 to date. In this exploratory study of cluster galaxy evolution from z = 1 to z = 0.3, we investigate the stellar mass assembly history of brightest cluster galaxies (BCGs), the evolution of stellar mass and luminosity distributions, the stellar mass surface density profile, as well as the population of radio galaxies. Our analysis is the first high-redshift application of the top N richest cluster selection, which is shown to allow us to trace the cluster galaxy evolution faithfully. Over the 230 deg^2 area of the current HSC-SSP footprint, selecting the top 100 clusters in each of the four redshift bins allows us to observe the buildup of galaxy population in descendants of clusters whose z ≈ 1 mass is about 2 × 10^14 M⊙. Our stellar mass is derived from a machine-learning algorithm, which is found to be unbiased and accurate with respect to the COSMOS data. We find very mild stellar mass growth in BCGs (about 35% between z = 1 and 0.3), and no evidence for evolution in either the total stellar mass–cluster mass correlation or the shape of the stellar mass surface density profile. We also present the first measurement of the radio luminosity distribution in clusters out to z ∼ 1, and show hints of changes in the dominant accretion mode powering the cluster radio galaxies at z ∼ 0.8.
Intermediate and advanced topics in multilevel logistic regression analysis
Merlo, Juan
2017-01-01
Multilevel data occur frequently in health services, population and public health, and epidemiologic research. In such research, binary outcomes are common. Multilevel logistic regression models allow one to account for the clustering of subjects within clusters of higher‐level units when estimating the effect of subject and cluster characteristics on subject outcomes. A search of the PubMed database demonstrated that the use of multilevel or hierarchical regression models is increasing rapidly. However, our impression is that many analysts simply use multilevel regression models to account for the nuisance of within‐cluster homogeneity that is induced by clustering. In this article, we describe a suite of analyses that can complement the fitting of multilevel logistic regression models. These ancillary analyses permit analysts to estimate the marginal or population‐average effect of covariates measured at the subject and cluster level, in contrast to the within‐cluster or cluster‐specific effects arising from the original multilevel logistic regression model. We describe the interval odds ratio and the proportion of opposed odds ratios, which are summary measures of effect for cluster‐level covariates. We describe the variance partition coefficient and the median odds ratio, which are measures of components of variance and heterogeneity in outcomes. These measures allow one to quantify the magnitude of the general contextual effect. We describe an R² measure that allows analysts to quantify the proportion of variation explained by different multilevel logistic regression models. We illustrate the application and interpretation of these measures by analyzing mortality in patients hospitalized with a diagnosis of acute myocardial infarction. © 2017 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd. PMID:28543517
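Two of the heterogeneity measures named above have simple closed forms for a random-intercept logistic model, and the following Python sketch evaluates them from an estimated cluster-level variance. It assumes the widely used formulas MOR = exp(sqrt(2σ²) · Φ⁻¹(0.75)) and the latent-variable VPC = σ² / (σ² + π²/3). The function names are illustrative, and the article may present further variants that are not covered here.

```python
import math
from statistics import NormalDist


def median_odds_ratio(cluster_variance):
    """Median odds ratio for a random-intercept logistic model.

    `cluster_variance` is the estimated variance of the cluster-level
    random intercepts on the log-odds scale.
    """
    z75 = NormalDist().inv_cdf(0.75)  # 75th percentile of the standard normal
    return math.exp(math.sqrt(2.0 * cluster_variance) * z75)


def variance_partition_coefficient(cluster_variance):
    """Latent-variable VPC: the level-1 variance is fixed at pi^2 / 3 (standard logistic)."""
    return cluster_variance / (cluster_variance + math.pi ** 2 / 3.0)


# Example with an assumed (made-up) cluster variance of 1.0 on the log-odds scale.
mor = median_odds_ratio(1.0)
vpc = variance_partition_coefficient(1.0)
```

A cluster variance of zero gives a median odds ratio of 1 and a VPC of 0, which corresponds to no between-cluster heterogeneity.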
Bates, Katie; Garrett, Brendan; Henderson, Richard A
2007-12-24
The rates of proton transfer from [pyrH]+ (pyr = pyrrolidine) to the binuclear complexes [Fe2S2Cl4]2- and [S2MS2FeCl2]2- (M = Mo or W) are reported. The reactions were studied using stopped-flow spectrophotometry, and the rate constants for proton transfer were determined from analysis of the kinetics of the substitution reactions of these clusters with the nucleophiles Br- or PhS- in the presence of [pyrH]+. In general, Br- is a poor nucleophile for these clusters, and proton transfer occurs before Br- binds, allowing direct measurement of the rate of proton transfer from [pyrH]+ to the cluster. In contrast, PhS- is a better nucleophile, and a pathway in which PhS- binds preferentially to the cluster prior to proton transfer from [pyrH]+ usually operates. For the reaction of [Fe2S2Cl4]2- with PhS- in the presence of [pyrH]+ both pathways are observed. Comparison of the results presented in this paper with analogous studies reported earlier on cuboidal Fe-S-based clusters allows discussion of the factors which affect the rates of proton transfer in synthetic clusters including the nuclearity of the cluster core, the metal composition, and the nature of the terminal ligands. The possible relevance of these findings to the protonation sites of natural Fe-S-based clusters, including FeMo-cofactor from nitrogenase, is presented.
Ion induced electron emission statistics under Agm- cluster bombardment of Ag
NASA Astrophysics Data System (ADS)
Breuers, A.; Penning, R.; Wucher, A.
2018-05-01
The electron emission from a polycrystalline silver surface under bombardment with Agm- cluster ions (m = 1, 2, 3) is investigated in terms of ion induced kinetic excitation. The electron yield γ is determined directly by a current measurement method on the one hand and implicitly by the analysis of the electron emission statistics on the other hand. Successful measurements of the electron emission spectra ensure a deeper understanding of the ion induced kinetic electron emission process, with particular emphasis on the effect of the projectile cluster size to the yield as well as to emission statistics. The results allow a quantitative comparison to computer simulations performed for silver atoms and clusters impinging onto a silver surface.
A Survey of Variable Extragalactic Sources with XTE's All Sky Monitor (ASM)
NASA Technical Reports Server (NTRS)
Jernigan, Garrett
1998-01-01
The original goal of the project was the near real-time detection of AGN utilizing the SSC 3 of the ASM on XTE which does a deep integration on one 100 square degree region of the sky. While the SSC never performed sufficiently well to allow the success of this goal, the work on the project has led to the development of a new analysis method for coded aperture systems which has now been applied to ASM data for mapping regions near clusters of galaxies such as the Perseus Cluster and the Coma Cluster. Publications are in preparation that describe both the new method and the results from mapping clusters of galaxies.
BioTextQuest: a web-based biomedical text mining suite for concept discovery.
Papanikolaou, Nikolas; Pafilis, Evangelos; Nikolaou, Stavros; Ouzounis, Christos A; Iliopoulos, Ioannis; Promponas, Vasilis J
2011-12-01
BioTextQuest combines automated discovery of significant terms in article clusters with structured knowledge annotation, via Named Entity Recognition services, offering interactive user-friendly visualization. The terms labeling each document cluster are shown in a tag-cloud-based illustration and are semantically annotated according to the biological entity, and a list of document titles enables users to simultaneously compare terms and documents of each cluster, facilitating concept association and hypothesis generation. BioTextQuest allows customization of analysis parameters, e.g. clustering/stemming algorithms, exclusion of documents/significant terms, to better match the biological question addressed. Availability: http://biotextquest.biol.ucy.ac.cy Contact: vprobon@ucy.ac.cy; iliopj@med.uoc.gr Supplementary information: Supplementary data are available at Bioinformatics online.
Schacht, Julia; Gaston, Nicola
2016-10-18
The electronic properties of doped thiolate-protected gold clusters are often referred to as tunable, but their study to date, conducted at different levels of theory, does not allow a systematic evaluation of this claim. Here, using density functional theory, the applicability of the superatomic model to these clusters is critically evaluated, and related to the degree of structural distortion and electronic inhomogeneity in the differently doped clusters, with dopant atoms Pd, Pt, Cu, and Ag. The effect of electron number is systematically evaluated by varying the charge on the overall cluster, and the nominal number of delocalized electrons, employed in the superatomic model, is compared to the numbers obtained from Bader analysis of individual atomic charges. We find that the superatomic model is highly applicable to all of these clusters, and is able to predict and explain the changing electronic structure as a function of charge. However, significant perturbations of the model arise upon doping, owing to distortions of the core structure of the Au13[RS(AuSR)2]6- cluster. In addition, analysis of the electronic structure indicates that the superatomic character is distributed further across the ligand shell in the case of the doped clusters, which may have implications for the self-assembly of these clusters into materials. The prediction of appropriate clusters for such superatomic solids relies critically on such quantitative analysis of the tunability of the electronic structure. © 2016 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.
Oberle, Michael; Wohlwend, Nadia; Jonas, Daniel; Maurer, Florian P; Jost, Geraldine; Tschudin-Sutter, Sarah; Vranckx, Katleen; Egli, Adrian
2016-01-01
The technical, biological, and inter-center reproducibility of matrix-assisted laser desorption ionization-time of flight mass spectrometry (MALDI TOF MS) typing data has not yet been explored. The aim of this study is to compare typing data from multiple centers employing bioinformatics using bacterial strains from two past outbreaks and non-related strains. Participants received twelve extended spectrum betalactamase-producing E. coli isolates and followed the same standard operating procedure (SOP) including a full-protein extraction protocol. All laboratories provided visually read spectra via flexAnalysis (Bruker, Germany). Raw data from each laboratory allowed calculating the technical and biological reproducibility between centers using BioNumerics (Applied Maths NV, Belgium). Technical and biological reproducibility ranged between 96.8-99.4% and 47.6-94.4%, respectively. The inter-center reproducibility showed a comparable clustering among identical isolates. Principal component analysis indicated a higher tendency to cluster within the same center. Therefore, we used a discriminant analysis, which completely separated the clusters. Next, we defined a reference center and performed a statistical analysis to identify specific peaks to identify the outbreak clusters. Finally, we used a classifier algorithm and a linear support vector machine on the determined peaks as classifier. A validation showed that within the set of the reference center, the identification of the cluster was 100% correct with a large contrast between the score with the correct cluster and the next best scoring cluster. Based on the sufficient technical and biological reproducibility of MALDI-TOF MS based spectra, detection of specific clusters is possible from spectra obtained from different centers. However, we believe that a shared SOP and a bioinformatics approach are required to make the analysis robust and reliable.
A web portal for hydrodynamical, cosmological simulations
NASA Astrophysics Data System (ADS)
Ragagnin, A.; Dolag, K.; Biffi, V.; Cadolle Bel, M.; Hammer, N. J.; Krukau, A.; Petkova, M.; Steinborn, D.
2017-07-01
This article describes a data centre hosting a web portal for accessing and sharing the output of large, cosmological, hydro-dynamical simulations with a broad scientific community. It also allows users to receive related scientific data products by directly processing the raw simulation data on a remote computing cluster. The data centre has a multi-layer structure: a web portal, a job control layer, a computing cluster and a HPC storage system. The outer layer enables users to choose an object from the simulations. Objects can be selected by visually inspecting 2D maps of the simulation data, by performing highly compounded and elaborated queries or graphically by plotting arbitrary combinations of properties. The user can run analysis tools on a chosen object. These services allow users to run analysis tools on the raw simulation data. The job control layer is responsible for handling and performing the analysis jobs, which are executed on a computing cluster. The innermost layer is formed by a HPC storage system which hosts the large, raw simulation data. The following services are available for the users: (I) CLUSTERINSPECT visualizes properties of member galaxies of a selected galaxy cluster; (II) SIMCUT returns the raw data of a sub-volume around a selected object from a simulation, containing all the original, hydro-dynamical quantities; (III) SMAC creates idealized 2D maps of various, physical quantities and observables of a selected object; (IV) PHOX generates virtual X-ray observations with specifications of various current and upcoming instruments.
A formal concept analysis approach to consensus clustering of multi-experiment expression data
2014-01-01
Background: Presently, with the increasing number and complexity of available gene expression datasets, the combination of data from multiple microarray studies addressing a similar biological question is gaining importance. The analysis and integration of multiple datasets are expected to yield more reliable and robust results since they are based on a larger number of samples and the effects of the individual study-specific biases are diminished. This is supported by recent studies suggesting that important biological signals are often preserved or enhanced by multiple experiments. An approach to combining data from different experiments is the aggregation of their clusterings into a consensus or representative clustering solution which increases the confidence in the common features of all the datasets and reveals the important differences among them. Results: We propose a novel generic consensus clustering technique that applies a Formal Concept Analysis (FCA) approach for the consolidation and analysis of clustering solutions derived from several microarray datasets. These datasets are initially divided into groups of related experiments with respect to a predefined criterion. Subsequently, a consensus clustering algorithm is applied to each group, resulting in a clustering solution per group. These solutions are pooled together and further analysed by employing FCA, which allows extracting valuable insights from the data and generating a gene partition over all the experiments. In order to validate the FCA-enhanced approach, two consensus clustering algorithms are adapted to incorporate the FCA analysis. Their performance is evaluated on gene expression data from a multi-experiment study examining the global cell-cycle control of fission yeast. 
The FCA results derived from both methods demonstrate that, although both algorithms optimize different clustering characteristics, FCA is able to overcome and diminish these differences and preserve some relevant biological signals. Conclusions: The proposed FCA-enhanced consensus clustering technique is a general approach to the combination of clustering algorithms with FCA for deriving clustering solutions from multiple gene expression matrices. The experimental results presented herein demonstrate that it is a robust data integration technique able to produce a good-quality clustering solution that is representative of the whole set of expression matrices. PMID:24885407
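To make the FCA step more concrete, the following Python sketch enumerates the formal concepts of a small context in which the objects are genes and the attributes are cluster labels from the per-group clustering solutions. It uses the textbook construction (closing the set of object intents under intersection) and is not the authors' implementation. The gene names, the group labels "A" and "B", and the function name are made up for illustration.

```python
def formal_concepts(context):
    """Return all formal concepts (extent, intent) of a formal context.

    `context` maps each object to the set of attributes it has. Intents are
    obtained by closing the object intents under intersection; the extent of
    an intent is the set of objects that carry every attribute in it.
    """
    all_attributes = frozenset().union(*context.values()) if context else frozenset()
    intents = {all_attributes}
    for attrs in context.values():
        attrs = frozenset(attrs)
        intents |= {attrs & intent for intent in intents}
    concepts = []
    for intent in intents:
        extent = frozenset(obj for obj, attrs in context.items() if intent <= attrs)
        concepts.append((extent, intent))
    return concepts


# Hypothetical context: cluster labels of four genes in two experiment groups, A and B.
toy_context = {
    "g1": {"A:1", "B:1"},
    "g2": {"A:1", "B:1"},
    "g3": {"A:1", "B:2"},
    "g4": {"A:2", "B:2"},
}
concepts = formal_concepts(toy_context)
```

In this toy context, genes g1 and g2 share every label and therefore form the extent of one concept, which is the kind of grouping from which a partition over all experiments can be read off.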
Star clusters: age, metallicity and extinction from integrated spectra
NASA Astrophysics Data System (ADS)
González Delgado, Rosa M.; Cid Fernandes, Roberto
2010-01-01
Integrated optical spectra of star clusters in the Magellanic Clouds and a few Galactic globular clusters are fitted using high-resolution spectral models for single stellar populations. The goal is to estimate the age, metallicity and extinction of the clusters, and evaluate the degeneracies among these parameters. Several sets of evolutionary models that were computed with recent high-spectral-resolution stellar libraries (MILES, GRANADA, STELIB), are used as inputs to the starlight code to perform the fits. The comparison of the results derived from this method and previous estimates available in the literature allow us to evaluate the pros and cons of each set of models to determine star cluster properties. In addition, we quantify the uncertainties associated with the age, metallicity and extinction determinations resulting from variance in the ingredients for the analysis.
Structure and substructure analysis of DAFT/FADA galaxy clusters in the [0.4-0.9] redshift range
NASA Astrophysics Data System (ADS)
Guennou, L.; Adami, C.; Durret, F.; Lima Neto, G. B.; Ulmer, M. P.; Clowe, D.; LeBrun, V.; Martinet, N.; Allam, S.; Annis, J.; Basa, S.; Benoist, C.; Biviano, A.; Cappi, A.; Cypriano, E. S.; Gavazzi, R.; Halliday, C.; Ilbert, O.; Jullo, E.; Just, D.; Limousin, M.; Márquez, I.; Mazure, A.; Murphy, K. J.; Plana, H.; Rostagni, F.; Russeil, D.; Schirmer, M.; Slezak, E.; Tucker, D.; Zaritsky, D.; Ziegler, B.
2014-01-01
Context. The DAFT/FADA survey is based on the study of ~90 rich (masses found in the literature >2 × 10^14 M⊙) and moderately distant clusters (redshifts 0.4 < z < 0.9), all with HST imaging data available. This survey has two main objectives: to constrain dark energy (DE) using weak lensing tomography on galaxy clusters and to build a database (deep multi-band imaging allowing photometric redshift estimates, spectroscopic data, X-ray data) of rich distant clusters to study their properties. Aims: We analyse the structures of all the clusters in the DAFT/FADA survey for which XMM-Newton and/or a sufficient number of galaxy redshifts in the cluster range are available, with the aim of detecting substructures and evidence for merging events. These properties are discussed in the framework of standard cold dark matter (ΛCDM) cosmology. Methods: In X-rays, we analysed the XMM-Newton data available, fit a β-model, and subtracted it to identify residuals. We used Chandra data, when available, to identify point sources. In the optical, we applied a Serna & Gerbal (SG) analysis to clusters with at least 15 spectroscopic galaxy redshifts available in the cluster range. We discuss the substructure detection efficiencies of both methods. Results: XMM-Newton data were available for 32 clusters, for which we derive the X-ray luminosity and a global X-ray temperature for 25 of them. For 23 clusters we were able to fit the X-ray emissivity with a β-model and subtract it to detect substructures in the X-ray gas. A dynamical analysis based on the SG method was applied to the clusters having at least 15 spectroscopic galaxy redshifts in the cluster range: 18 X-ray clusters and 11 clusters with no X-ray data. The choice of a minimum number of 15 redshifts implies that only major substructures will be detected. Ten substructures were detected both in X-rays and by the SG method. 
Most of the substructures detected both in X-rays and with the SG method are probably at their first cluster pericentre approach and are relatively recent infalls. We also find hints of a decreasing X-ray gas density profile core radius with redshift. Conclusions: The percentage of mass included in substructures was found to be roughly constant with redshift values of 5-15%, in agreement both with the general CDM framework and with the results of numerical simulations. Galaxies in substructures show the same general behaviour as regular cluster galaxies; however, in substructures, there is a deficiency of both late type and old stellar population galaxies. Late type galaxies with recent bursts of star formation seem to be missing in the substructures close to the bottom of the host cluster potential well. However, our sample would need to be increased to allow a more robust analysis. Tables 1, 2, 4 and Appendices A-C are available in electronic form at http://www.aanda.org
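The β-model subtraction described in the Methods can be illustrated with a short Python sketch. It assumes the standard isothermal β-model surface-brightness profile, S(r) = S0 [1 + (r/rc)²]^(-3β + 1/2), and simply evaluates residuals for given parameters. The fitting step itself, the parameter values, and the radial profile below are made-up placeholders and are not taken from the survey.

```python
def beta_model(radius, s0, core_radius, beta):
    """Isothermal beta-model surface brightness at projected radius `radius`."""
    return s0 * (1.0 + (radius / core_radius) ** 2) ** (-3.0 * beta + 0.5)


def beta_model_residuals(radii, observed, s0, core_radius, beta):
    """Observed minus model surface brightness; positive values flag excess emission."""
    return [obs - beta_model(r, s0, core_radius, beta)
            for r, obs in zip(radii, observed)]


# Hypothetical profile: a smooth beta-model plus an artificial excess in one radial bin.
radii = [0.0, 1.0, 2.0, 3.0]
params = dict(s0=2.0, core_radius=1.0, beta=2.0 / 3.0)
observed = [beta_model(r, **params) for r in radii]
observed[2] += 0.1  # injected "substructure" for illustration
residuals = beta_model_residuals(radii, observed, **params)
```

In practice the three parameters would first be fitted to the data, and the residual map would then be inspected for significant excesses.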
magHD: a new approach to multi-dimensional data storage, analysis, display and exploitation
NASA Astrophysics Data System (ADS)
Angleraud, Christophe
2014-06-01
The ever-increasing amount of data and processing capabilities, following the well-known Moore's law, is challenging the way scientists and engineers currently exploit large datasets. Scientific visualization tools, although quite powerful, are often too generic and provide abstract views of phenomena, thus preventing cross-disciplinary fertilization. On the other hand, Geographic Information Systems allow visually appealing maps to be built, but these often become confusing as more layers are added. Moreover, the introduction of time as a fourth analysis dimension, to allow analysis of time-dependent phenomena such as meteorological or climate models, is encouraging real-time data exploration techniques that allow spatio-temporal points of interest to be detected through the integration of moving images by the human brain. Magellium has been involved in high-performance image processing chains for satellite image processing, as well as in scientific signal analysis and geographic information management, since its creation in 2003. We believe that recent work on big data, GPU and peer-to-peer collaborative processing can open a new breakthrough in data analysis and display that will serve many new applications in collaborative scientific computing, environment mapping and understanding. The magHD (for Magellium Hyper-Dimension) project aims at developing software solutions that will bring highly interactive tools for the analysis and exploration of complex datasets to commodity hardware, targeting small to medium scale clusters with expansion capabilities to large cloud-based clusters.
Predicting healthcare outcomes in prematurely born infants using cluster analysis.
MacBean, Victoria; Lunt, Alan; Drysdale, Simon B; Yarzi, Muska N; Rafferty, Gerrard F; Greenough, Anne
2018-05-23
Prematurely born infants are at high risk of respiratory morbidity following neonatal unit discharge, though prediction of outcomes is challenging. We have tested the hypothesis that cluster analysis would identify discrete groups of prematurely born infants with differing respiratory outcomes during infancy. A total of 168 infants (median (IQR) gestational age 33 (31-34) weeks) were recruited in the neonatal period from consecutive births in a tertiary neonatal unit. The baseline characteristics of the infants were used to classify them into hierarchical agglomerative clusters. Rates of viral lower respiratory tract infections (LRTIs) were recorded for 151 infants in the first year after birth. Infants could be classified according to birth weight (BW) and duration of neonatal invasive mechanical ventilation (MV) into three clusters. Cluster one (MV ≤5 days) had few LRTIs. Clusters two and three (both MV ≥6 days, but BW ≥882 g and <882 g, respectively) had significantly higher LRTI rates. Cluster two had a higher proportion of infants experiencing respiratory syncytial virus LRTIs (P = 0.01) and cluster three a higher proportion of rhinovirus LRTIs (P < 0.001). CONCLUSIONS: Readily available clinical data allowed classification of prematurely born infants into one of three distinct groups with differing subsequent respiratory morbidity in infancy. © 2018 Wiley Periodicals, Inc.
Lara-Ramírez, Edgar E.; Salazar, Ma Isabel; López-López, María de Jesús; Salas-Benito, Juan Santiago; Sánchez-Varela, Alejandro
2014-01-01
The increasing number of dengue virus (DENV) genome sequences available allows identification of the factors contributing to DENV evolution. In the present study, the codon usage in serotypes 1–4 (DENV1–4) has been explored for 3047 sequenced genomes using different statistical methods. The correlation analysis of total GC content (GC) with GC content at the three nucleotide positions of codons (GC1, GC2, and GC3), as well as the effective number of codons (ENC, ENCp) versus GC3 plots, revealed mutational bias and purifying selection pressures as the major forces influencing the codon usage, but with distinct pressure on specific nucleotide positions in the codon. The correspondence analysis (CA) and clustering analysis on relative synonymous codon usage (RSCU) within each serotype showed clustering patterns similar to those of the phylogenetic analysis of nucleotide sequences for DENV1–4. These clustering patterns are strongly related to the geographic origin of the virus. The phylogenetic dependence analysis also suggests that stabilizing selection acts on the codon usage bias. Our large-scale analysis reveals new features of DENV genomic evolution. PMID:25136631
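As a minimal illustration of the GC1, GC2, and GC3 quantities named in the abstract above, the following Python sketch computes the fraction of G or C nucleotides at each of the three codon positions of a coding sequence. The function name and the toy sequence are hypothetical and are not taken from the study; the sketch assumes the sequence is already in frame and ignores any trailing partial codon.

```python
def gc_by_codon_position(seq):
    """Return (GC1, GC2, GC3): the G+C fraction at codon positions 1, 2 and 3.

    Assumes `seq` is an in-frame coding sequence; trailing bases that do not
    form a complete codon are ignored.
    """
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3      # drop an incomplete final codon
    n_codons = usable // 3
    if n_codons == 0:
        raise ValueError("sequence must contain at least one full codon")
    counts = [0, 0, 0]
    for i in range(usable):
        if seq[i] in "GC":
            counts[i % 3] += 1            # i % 3 is the position within the codon
    return tuple(c / n_codons for c in counts)


# Toy example with two codons, ATG and GCC (hypothetical input).
gc1, gc2, gc3 = gc_by_codon_position("ATGGCC")
```

For the toy input, position 1 holds A and G, position 2 holds T and C, and position 3 holds G and C, so only GC3 reaches 1.0.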
Clustering by soft-constraint affinity propagation: applications to gene-expression data.
Leone, Michele; Sumedha; Weigt, Martin
2007-10-15
Similarity-measure-based clustering is a crucial problem appearing throughout scientific data analysis. Recently, a powerful new algorithm called Affinity Propagation (AP) based on message-passing techniques was proposed by Frey and Dueck (2007a). In AP, each cluster is identified by a common exemplar to which all other data points of the same cluster refer, and exemplars have to refer to themselves. Despite its proven power, AP in its present form suffers from a number of drawbacks. The hard constraint of having exactly one exemplar per cluster restricts AP to classes of regularly shaped clusters, and leads to suboptimal performance, e.g. in analyzing gene expression data. This limitation can be overcome by relaxing the AP hard constraints. A new parameter controls the importance of the constraints compared to the aim of maximizing the overall similarity, and allows interpolation between the simple case where each data point selects its closest neighbor as an exemplar and the original AP. The resulting soft-constraint affinity propagation (SCAP) becomes more informative and accurate, and leads to more stable clustering. Even though a new a priori free parameter is introduced, the overall dependence of the algorithm on external tuning is reduced, as robustness is increased and an optimal strategy for parameter selection emerges more naturally. SCAP is tested on biological benchmark data, including in particular microarray data related to various cancer types. We show that the algorithm efficiently unveils the hierarchical cluster structure present in the data sets. Furthermore, it allows extraction of sparse gene expression signatures for each cluster.
Estimating life expectancies for US small areas: a regression framework
NASA Astrophysics Data System (ADS)
Congdon, Peter
2014-01-01
Analysis of area mortality variations and estimation of area life tables raise methodological questions relevant to assessing spatial clustering, and socioeconomic inequalities in mortality. Existing small area analyses of US life expectancy variation generally adopt ad hoc amalgamations of counties to alleviate potential instability of mortality rates involved in deriving life tables, and use conventional life table analysis which takes no account of correlated mortality for adjacent areas or ages. The alternative strategy here uses structured random effects methods that recognize correlations between adjacent ages and areas, and allows retention of the original county boundaries. This strategy generalizes to include effects of area category (e.g. poverty status, ethnic mix), allowing estimation of life tables according to area category, and providing additional stabilization of estimated life table functions. This approach is used here to estimate stabilized mortality rates, derive life expectancies in US counties, and assess trends in clustering and in inequality according to county poverty category.
NASA Astrophysics Data System (ADS)
De, Sandip; Schaefer, Bastian; Sadeghi, Ali; Sicher, Michael; Kanhere, D. G.; Goedecker, Stefan
2014-02-01
Based on a recently introduced metric for measuring distances between configurations, we introduce distance-energy (DE) plots to characterize the potential energy surface of clusters. Producing such plots is computationally feasible on the density functional level since it requires only a few hundred stable low energy configurations including the global minimum. By using standard criteria based on disconnectivity graphs and the dynamics of Lennard-Jones clusters, we show that the DE plots convey the necessary information about the character of the potential energy surface and allow us to distinguish between glassy and nonglassy systems. We then apply this analysis to real clusters at the density functional theory level and show that both glassy and nonglassy clusters can be found in simulations. It turns out that among our investigated clusters only those can be synthesized experimentally which exhibit a nonglassy landscape.
Delpla, Ianis; Florea, Mihai; Pelletier, Geneviève; Rodriguez, Manuel J
2018-06-04
Trihalomethanes (THMs) and Haloacetic Acids (HAAs) are the main groups detected in drinking water and are consequently strictly regulated. However, the increasing quantity of data for disinfection byproducts (DBPs) produced from research projects and regulatory programs remains largely unexploited, despite a great potential for its use in optimizing drinking water quality monitoring to meet specific objectives. In this work, we developed a procedure to optimize locations and periods for DBP monitoring based on a set of monitoring scenarios using the cluster analysis technique. The optimization procedure used a robust set of spatio-temporal monitoring results on DBPs (THMs and HAAs) generated from intensive sampling campaigns conducted in a residential sector of a water distribution system. Results show that cluster analysis allows for the classification of water quality in different groups of THMs and HAAs according to their similarities, and the identification of locations presenting water quality concerns. By using cluster analysis with different monitoring objectives, this work provides a set of monitoring solutions and a comparison between various monitoring scenarios for decision-making purposes. Finally, it was demonstrated that the data from intensive monitoring of free chlorine residual and water temperature as DBP proxy parameters, when processed using cluster analysis, could also help identify the optimal sampling points and periods for regulatory THM and HAA monitoring. Copyright © 2018 Elsevier Ltd. All rights reserved.
Applying Model Analysis to a Resource-Based Analysis of the Force and Motion Conceptual Evaluation
ERIC Educational Resources Information Center
Smith, Trevor I.; Wittmann, Michael C.; Carter, Tom
2014-01-01
Previously, we analyzed the Force and Motion Conceptual Evaluation in terms of a resources-based model that allows for clustering of questions so as to provide useful information on how students correctly or incorrectly reason about physics. In this paper, we apply model analysis to show that the associated model plots provide more information…
Chemical structural analysis of diamondlike carbon films: II. Raman analysis
NASA Astrophysics Data System (ADS)
Takabayashi, Susumu; Ješko, Radek; Shinohara, Masanori; Hayashi, Hiroyuki; Sugimoto, Rintaro; Ogawa, Shuichi; Takakuwa, Yuji
2018-02-01
The chemical structure of diamondlike carbon (DLC) films, synthesized by photoemission-assisted glow discharge, has been analyzed by Raman spectroscopy. Raman analysis in conjunction with the sp2 cluster model clarified the film structure. The sp2 clusters in DLC films synthesized at low temperature preferred various aliphatic structures. Sufficient argon-ion assist allowed for formation of less strained DLC films containing large amounts of hydrogen. As the synthesis temperature was increased, thermal desorption of hydrogen left carbon dangling bonds with active unpaired electrons in the films, and the reactions that followed created strained films containing aromatic sp2 clusters. In parallel, the desorption of methane molecules from the growing surface by chemisorption of hydrogen radicals prevented the action of argon ions, promoting internal strain of the films. However, in synthesis at very high temperature, where sp2 clusters are sufficiently dominant, the strain was dissolved gradually. In contrast, the DLC films synthesized at low temperature were more stable than other films synthesized at the same temperature because of stable hydrogen-carbon bonds in the films.
Rehm, Thomas; Baums, Christoph G; Strommenger, Birgit; Beyerbach, Martin; Valentin-Weigand, Peter; Goethe, Ralph
2007-01-01
Amplified fragment length polymorphism (AFLP) typing was applied to 116 Streptococcus suis isolates with different clinical backgrounds (invasive/pneumonia/carrier/human) and with known profiles of virulence-associated genes (cps1, -2, -7 and -9, as well as mrp, epf and sly). A dendrogram was generated that allowed identification of two clusters (A and C) with different subclusters (A1, A2, C1 and C2) and two heterogeneous groups of strains (B and D). For comparison, three strains from each AFLP subcluster and group were subjected to multilocus sequence typing (MLST) analysis. The closest relationship and lowest diversity were found for patterns clustering within AFLP subcluster A1, which corresponded with sequence type (ST) complex 1. Strains within subcluster A1 were mainly invasive cps1 and mrp+ epf+ (or epf*) sly+ cps2+ strains of porcine or human origin. A new finding of this study was the clustering of invasive mrp* cps9 isolates within subcluster A2. MLST analysis suggested that A2 correlates with a single ST complex (ST87). In contrast to A1 and A2, subclusters C1 and C2 contained mainly pneumonia isolates of genotype cps7 or cps2 and epf- sly-. In conclusion, this study demonstrates that AFLP allows identification of clusters of S. suis strains with clinical relevance.
Wavelet-based clustering of resting state MRI data in the rat.
Medda, Alessio; Hoffmann, Lukas; Magnuson, Matthew; Thompson, Garth; Pan, Wen-Ju; Keilholz, Shella
2016-01-01
While functional connectivity has typically been calculated over the entire length of the scan (5-10min), interest has been growing in dynamic analysis methods that can detect changes in connectivity on the order of cognitive processes (seconds). Previous work with sliding window correlation has shown that changes in functional connectivity can be observed on these time scales in the awake human and in anesthetized animals. This exciting advance creates a need for improved approaches to characterize dynamic functional networks in the brain. Previous studies were performed using sliding window analysis on regions of interest defined based on anatomy or obtained from traditional steady-state analysis methods. The parcellation of the brain may therefore be suboptimal, and the characteristics of the time-varying connectivity between regions are dependent upon the length of the sliding window chosen. This manuscript describes an algorithm based on wavelet decomposition that allows data-driven clustering of voxels into functional regions based on temporal and spectral properties. Previous work has shown that different networks have characteristic frequency fingerprints, and the use of wavelets ensures that both the frequency and the timing of the BOLD fluctuations are considered during the clustering process. The method was applied to resting state data acquired from anesthetized rats, and the resulting clusters agreed well with known anatomical areas. Clusters were highly reproducible across subjects. Wavelet cross-correlation values between clusters from a single scan were significantly higher than the values from randomly matched clusters that shared no temporal information, indicating that wavelet-based analysis is sensitive to the relationship between areas. Copyright © 2015 Elsevier Inc. All rights reserved.
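The sliding window correlation that the abstract above cites as prior work can be sketched in a few lines of Python. This is a generic illustration, not the authors' wavelet-based method; the function names, the window length, and the toy time series are assumptions made for the example.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (NaN if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def sliding_window_correlation(x, y, window):
    """Correlate x and y inside each window of `window` consecutive samples."""
    if len(x) != len(y):
        raise ValueError("time series must have the same length")
    if window < 2 or window > len(x):
        raise ValueError("window must be between 2 and the series length")
    return [pearson(x[i:i + window], y[i:i + window])
            for i in range(len(x) - window + 1)]


# Two toy signals that move together at first and in opposition at the end.
a = [1, 2, 3, 4, 3, 2]
b = [2, 4, 6, 8, 9, 10]
r = sliding_window_correlation(a, b, window=3)
```

The resulting sequence has one value per window position, which makes the dependence on the chosen window length, noted in the abstract, easy to explore by varying `window`.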
Mun, Eun-Young; von Eye, Alexander; Bates, Marsha E.; Vaschillo, Evgeny G.
2010-01-01
Model-based cluster analysis is a new clustering procedure to investigate population heterogeneity utilizing finite mixture multivariate normal densities. It is an inferentially based, statistically principled procedure that allows comparison of non-nested models using the Bayesian Information Criterion (BIC) to compare multiple models and identify the optimum number of clusters. The current study clustered 36 young men and women based on their baseline heart rate (HR) and HR variability (HRV), chronic alcohol use, and reasons for drinking. Two cluster groups were identified and labeled High Alcohol Risk and Normative groups. Compared to the Normative group, individuals in the High Alcohol Risk group had higher levels of alcohol use and more strongly endorsed disinhibition and suppression reasons for use. The High Alcohol Risk group showed significant HRV changes in response to positive and negative emotional and appetitive picture cues, compared to neutral cues. In contrast, the Normative group showed a significant HRV change only to negative cues. Findings suggest that the individuals with autonomic self-regulatory difficulties may be more susceptible to heavy alcohol use and use alcohol for emotional regulation. PMID:18331138
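For reference, the Bayesian Information Criterion used above to compare mixture models is commonly written as follows, where \hat{L} is the maximized likelihood of the fitted model, k the number of free parameters, and n the number of observations (the symbols are introduced here for illustration and do not appear in the abstract):

```latex
\[
\mathrm{BIC} = -2 \ln \hat{L} + k \ln n
\]
```

Under this sign convention the model with the lowest BIC is preferred; some software reports the negated quantity, in which case the highest value is preferred. The abstract does not state which convention the study used.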
Identifying technical aliases in SELDI mass spectra of complex mixtures of proteins
2013-01-01
Background Biomarker discovery datasets created using mass spectrum protein profiling of complex mixtures of proteins contain many peaks that represent the same protein with different charge states. Correlated variables such as these can confound the statistical analyses of proteomic data. Previously we developed an algorithm that clustered mass spectrum peaks that were biologically or technically correlated. Here we demonstrate an algorithm that clusters correlated technical aliases only. Results In this paper, we propose a preprocessing algorithm that can be used for grouping technical aliases in mass spectrometry protein profiling data. The stringency of the variance allowed for clustering is customizable, thereby affecting the number of peaks that are clustered. Subsequent analysis of the clusters, instead of individual peaks, helps reduce difficulties associated with technically-correlated data, and can aid more efficient biomarker identification. Conclusions This software can be used to pre-process and thereby decrease the complexity of protein profiling proteomics data, thus simplifying the subsequent analysis of biomarkers by decreasing the number of tests. The software is also a practical tool for identifying which features to investigate further by purification, identification and confirmation. PMID:24010718
Cloud CPFP: a shotgun proteomics data analysis pipeline using cloud and high performance computing.
Trudgian, David C; Mirzaei, Hamid
2012-12-07
We have extended the functionality of the Central Proteomics Facilities Pipeline (CPFP) to allow use of remote cloud and high performance computing (HPC) resources for shotgun proteomics data processing. CPFP has been modified to include modular local and remote scheduling for data processing jobs. The pipeline can now be run on a single PC or server, a local cluster, a remote HPC cluster, and/or the Amazon Web Services (AWS) cloud. We provide public images that allow easy deployment of CPFP in its entirety in the AWS cloud. This significantly reduces the effort necessary to use the software, and allows proteomics laboratories to pay for compute time ad hoc, rather than obtaining and maintaining expensive local server clusters. Alternatively the Amazon cloud can be used to increase the throughput of a local installation of CPFP as necessary. We demonstrate that cloud CPFP allows users to process data at higher speed than local installations but with similar cost and lower staff requirements. In addition to the computational improvements, the web interface to CPFP is simplified, and other functionalities are enhanced. The software is under active development at two leading institutions and continues to be released under an open-source license at http://cpfp.sourceforge.net.
Transcriptome database resource and gene expression atlas for the rose
2012-01-01
Background For centuries roses have been selected based on a number of traits. Little information exists on the genetic and molecular basis that contributes to these traits, mainly because information on expressed genes for this economically important ornamental plant is scarce. Results Here, we used a combination of Illumina and 454 sequencing technologies to generate information on Rosa sp. transcripts using RNA from various tissues and in response to biotic and abiotic stresses. A total of 80714 transcript clusters were identified and 76611 peptides have been predicted among which 20997 have been clustered into 13900 protein families. BLASTp hits in closely related Rosaceae species revealed that about half of the predicted peptides in the strawberry and peach genomes have orthologs in Rosa dataset. Digital expression was obtained using RNA samples from organs at different development stages and under different stress conditions. qPCR validated the digital expression data for a selection of 23 genes with high or low expression levels. Comparative gene expression analyses between the different tissues and organs allowed the identification of clusters that are highly enriched in given tissues or under particular conditions, demonstrating the usefulness of the digital gene expression analysis. A web interface ROSAseq was created that allows data interrogation by BLAST, subsequent analysis of DNA clusters and access to thorough transcript annotation including best BLAST matches on Fragaria vesca, Prunus persica and Arabidopsis. The rose peptides dataset was used to create the ROSAcyc resource pathway database that allows access to the putative genes and enzymatic pathways. Conclusions The study provides useful information on Rosa expressed genes, with thorough annotation and an overview of expression patterns for transcripts with good accuracy. PMID:23164410
From virtual clustering analysis to self-consistent clustering analysis: a mathematical study
NASA Astrophysics Data System (ADS)
Tang, Shaoqiang; Zhang, Lei; Liu, Wing Kam
2018-03-01
In this paper, we propose a new homogenization algorithm, virtual clustering analysis (VCA), as well as provide a mathematical framework for the recently proposed self-consistent clustering analysis (SCA) (Liu et al. in Comput Methods Appl Mech Eng 306:319-341, 2016). In the mathematical theory, we clarify the key assumptions and ideas of VCA and SCA, and derive the continuous and discrete Lippmann-Schwinger equations. Based on a key postulation of "once response similarly, always response similarly", clustering is performed in an offline stage by machine learning techniques (k-means and SOM), and facilitates substantial reduction of computational complexity in an online predictive stage. The clear mathematical setup allows for the first time a convergence study of clustering refinement in one space dimension. Convergence is proved rigorously, and found to be of second order from numerical investigations. Furthermore, we propose to suitably enlarge the domain in VCA, such that the boundary terms may be neglected in the Lippmann-Schwinger equation, by virtue of Saint-Venant's principle. In contrast, they were not obtained in the original SCA paper, and we discover these terms may well be responsible for the numerical dependency on the choice of reference material property. Since VCA enhances the accuracy by overcoming the modeling error, and reduces the numerical cost by avoiding an outer loop iteration for attaining the material property consistency in SCA, its efficiency is expected to be even higher than that of the recently proposed SCA algorithm.
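The offline clustering stage described above relies on k-means. The following Python sketch shows plain k-means (Lloyd's iteration) on scalar values; it is a generic illustration under assumed inputs, not the authors' VCA or SCA code, and the function name, the initial centroids, and the toy "response" values are hypothetical.

```python
def kmeans_1d(values, initial_centroids, max_iterations=100):
    """Plain k-means on scalars: assign to nearest centroid, then recompute means."""
    centroids = list(initial_centroids)

    def nearest(v):
        # Index of the centroid closest to v.
        return min(range(len(centroids)), key=lambda k: abs(v - centroids[k]))

    for _ in range(max_iterations):
        groups = [[] for _ in centroids]
        for v in values:
            groups[nearest(v)].append(v)
        # An empty group keeps its previous centroid.
        updated = [sum(g) / len(g) if g else centroids[k]
                   for k, g in enumerate(groups)]
        if updated == centroids:          # assignments can no longer change
            break
        centroids = updated
    labels = [nearest(v) for v in values]
    return centroids, labels


# Hypothetical scalar responses at six material points, grouped into two clusters.
responses = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
centers, labels = kmeans_1d(responses, initial_centroids=[0.0, 6.0])
```

Points that share a label would then be treated as one cluster in the online stage, which is where the reduction in computational cost mentioned in the abstract comes from.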
Interactive visual exploration and analysis of origin-destination data
NASA Astrophysics Data System (ADS)
Ding, Linfang; Meng, Liqiu; Yang, Jian; Krisp, Jukka M.
2018-05-01
In this paper, we propose a visual analytics approach for the exploration of spatiotemporal interaction patterns of massive origin-destination data. Firstly, we visually query the movement database for data at certain time windows. Secondly, we conduct interactive clustering to allow the users to select input variables/features (e.g., origins, destinations, distance, and duration) and to adjust clustering parameters (e.g. distance threshold). The agglomerative hierarchical clustering method is applied for the multivariate clustering of the origin-destination data. Thirdly, we design a parallel coordinates plot for visualizing the precomputed clusters and for further exploration of interesting clusters. Finally, we propose a gradient line rendering technique to show the spatial and directional distribution of origin-destination clusters on a map view. We implement the visual analytics approach in a web-based interactive environment and apply it to real-world floating car data from Shanghai. The experiment results show the origin/destination hotspots and their spatial interaction patterns. They also demonstrate the effectiveness of our proposed approach.
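The second step above, agglomerative hierarchical clustering with a user-adjustable distance threshold, can be sketched as follows. This is a minimal Python illustration and not the authors' implementation: the choice of average linkage, the Euclidean distance, the feature layout (origin x, origin y, destination x, destination y), and the toy trips are all assumptions.

```python
import math


def agglomerative(points, threshold):
    """Merge the two closest clusters repeatedly until the closest pair is
    farther apart than `threshold` (average linkage, Euclidean distance)."""
    clusters = [[i] for i in range(len(points))]

    def average_linkage(a, b):
        total = sum(math.dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        best = None                       # (distance, index_a, index_b)
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = average_linkage(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        if best[0] > threshold:           # nothing close enough left to merge
            break
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


# Hypothetical trips as (origin_x, origin_y, destination_x, destination_y).
trips = [(0, 0, 5, 5), (0, 1, 5, 6), (10, 10, 2, 2), (10, 11, 2, 3)]
groups = agglomerative(trips, threshold=2.0)
```

Raising the threshold merges more trips into fewer clusters, which is the kind of parameter adjustment the interactive step exposes to the user.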
DOE Office of Scientific and Technical Information (OSTI.GOV)
Andersson, Karl E.; /Stockholm U. /SLAC; Peterson, J.R.
2007-04-17
We propose a new Monte Carlo method to study extended X-ray sources with the European Photon Imaging Camera (EPIC) aboard XMM Newton. The Smoothed Particle Inference (SPI) technique, described in a companion paper, is applied here to the EPIC data for the clusters of galaxies Abell 1689, Centaurus and RXJ 0658-55 (the "bullet cluster"). We aim to show the advantages of this method of simultaneous spectral-spatial modeling over traditional X-ray spectral analysis. In Abell 1689 we confirm our earlier findings about structure in temperature distribution and produce a high resolution temperature map. We also confirm our findings about velocity structure within the gas. In the bullet cluster, RXJ 0658-55, we produce the highest resolution temperature map ever to be published of this cluster allowing us to trace what looks like the motion of the bullet in the cluster. We even detect a south to north temperature gradient within the bullet itself. In the Centaurus cluster we detect, by dividing up the luminosity of the cluster in bands of gas temperatures, a striking feature to the north-east of the cluster core. We hypothesize that this feature is caused by a subcluster left over from a substantial merger that slightly displaced the core. We conclude that our method is very powerful in determining the spatial distributions of plasma temperatures and very useful for systematic studies in cluster structure.
Visual verification and analysis of cluster detection for molecular dynamics.
Grottel, Sebastian; Reina, Guido; Vrabec, Jadran; Ertl, Thomas
2007-01-01
A current research topic in molecular thermodynamics is the condensation of vapor to liquid and the investigation of this process at the molecular level. Condensation is found in many physical phenomena, e.g. the formation of atmospheric clouds or the processes inside steam turbines, where a detailed knowledge of the dynamics of condensation processes will help to optimize energy efficiency and avoid problems with droplets of macroscopic size. The key properties of these processes are the nucleation rate and the critical cluster size. For the calculation of these properties it is essential to make use of a meaningful definition of molecular clusters, which is currently an issue that is not completely resolved. In this paper a framework capable of interactively visualizing molecular datasets of such nucleation simulations is presented, with an emphasis on the detected molecular clusters. To check the quality of the results of the cluster detection, our framework introduces the concept of flow groups to highlight potential cluster evolution over time which is not detected by the employed algorithm. To confirm the findings of the visual analysis, we coupled the rendering view with a schematic view of the clusters' evolution. This allows rapid assessment of the quality of the molecular cluster detection algorithm and identification of locations in the simulation data, in space as well as in time, where the cluster detection fails. Thus, thermodynamics researchers can eliminate weaknesses in their cluster detection algorithms. Several examples for the effective and efficient usage of our tool are presented.
Rapid identification of Enterobacter hormaechei and Enterobacter cloacae genetic cluster III.
Ohad, S; Block, C; Kravitz, V; Farber, A; Pilo, S; Breuer, R; Rorman, E
2014-05-01
Enterobacter cloacae complex bacteria are of both clinical and environmental importance. Phenotypic methods are unable to distinguish between some of the species in this complex, which often renders their identification incomplete. The goal of this study was to develop molecular assays to identify Enterobacter hormaechei and Ent. cloacae genetic cluster III which are relatively frequently encountered in clinical material. The molecular assays developed in this study are qPCR technology based and served to identify both Ent. hormaechei and Ent. cloacae genetic cluster III. qPCR results were compared to hsp60 sequence analysis. Most clinical isolates were assigned to Ent. hormaechei subsp. steigerwaltii and Ent. cloacae genetic cluster III. The latter was proportionately more frequently isolated from bloodstream infections than from other material (P < 0·05). The qPCR assays detecting Ent. hormaechei and Ent. cloacae genetic cluster III demonstrated high sensitivity and specificity. The presented qPCR assays allow accurate and rapid identification of clinical isolates of the Ent. cloacae complex. The improved identifications obtained can specifically assist analysis of Ent. hormaechei and Ent. cloacae genetic cluster III in nosocomial outbreaks and can promote rapid environmental monitoring. An association was observed between Ent. cloacae cluster III and systemic infection that deserves further attention. © 2014 The Society for Applied Microbiology.
Hahus, Ian; Migliaccio, Kati; Douglas-Mankin, Kyle; Klarenberg, Geraldine; Muñoz-Carpena, Rafael
2018-04-27
Hierarchical and partitional cluster analyses were used to compartmentalize Water Conservation Area 1, a managed wetland within the Arthur R. Marshall Loxahatchee National Wildlife Refuge in southeast Florida, USA, based on physical, biological, and climatic geospatial attributes. Single, complete, average, and Ward's linkages were tested during the hierarchical cluster analyses, with average linkage providing the best results. In general, the partitional method, partitioning around medoids, found clusters that were more evenly sized and more spatially aggregated than those resulting from the hierarchical analyses. However, hierarchical analysis appeared to be better suited to identify outlier regions that were significantly different from other areas. The clusters identified by geospatial attributes were similar to clusters developed for the interior marsh in a separate study using water quality attributes, suggesting that similar factors have influenced variations in both the set of physical, biological, and climatic attributes selected in this study and water quality parameters. However, geospatial data allowed further subdivision of several interior marsh clusters identified from the water quality data, potentially indicating zones with important differences in function. Identification of these zones can be useful to managers and modelers by informing the distribution of monitoring equipment and personnel as well as delineating regions that may respond similarly to future changes in management or climate.
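The single, complete, and average linkages compared above differ only in how the distance between two clusters is derived from the pairwise distances of their members. The Python sketch below makes that difference concrete; it is an illustration with hypothetical names and toy coordinates, not the study's code, and Ward's linkage is left out because it is defined through the increase in within-cluster variance rather than through pairwise distances alone.

```python
import math


def linkage_distance(cluster_a, cluster_b, method="average"):
    """Distance between two clusters of points under a given linkage rule."""
    pairwise = [math.dist(p, q) for p in cluster_a for q in cluster_b]
    if method == "single":        # closest pair of members
        return min(pairwise)
    if method == "complete":      # farthest pair of members
        return max(pairwise)
    if method == "average":       # mean over all cross-cluster pairs
        return sum(pairwise) / len(pairwise)
    raise ValueError(f"unknown linkage method: {method}")


# Two toy clusters on a line; the pairwise distances are 3, 5, 2 and 4.
left = [(0.0, 0.0), (1.0, 0.0)]
right = [(3.0, 0.0), (5.0, 0.0)]
distances = {m: linkage_distance(left, right, m)
             for m in ("single", "complete", "average")}
```

Because single linkage looks only at the closest pair, it tends to chain clusters together, whereas complete linkage favours compact groups; average linkage sits between the two.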
OMERACT-based fibromyalgia symptom subgroups: an exploratory cluster analysis.
Vincent, Ann; Hoskin, Tanya L; Whipple, Mary O; Clauw, Daniel J; Barton, Debra L; Benzo, Roberto P; Williams, David A
2014-10-16
The aim of this study was to identify subsets of patients with fibromyalgia with similar symptom profiles using the Outcome Measures in Rheumatology (OMERACT) core symptom domains. Female patients with a diagnosis of fibromyalgia and currently meeting fibromyalgia research survey criteria completed the Brief Pain Inventory, the 30-item Profile of Mood States, the Medical Outcomes Sleep Scale, the Multidimensional Fatigue Inventory, the Multiple Ability Self-Report Questionnaire, the Fibromyalgia Impact Questionnaire-Revised (FIQ-R) and the Short Form-36 between 1 June 2011 and 31 October 2011. Hierarchical agglomerative clustering was used to identify subgroups of patients with similar symptom profiles. To validate the results from this sample, hierarchical agglomerative clustering was repeated in an external sample of female patients with fibromyalgia with similar inclusion criteria. A total of 581 females with a mean age of 55.1 (range, 20.1 to 90.2) years were included. A four-cluster solution best fit the data, and each clustering variable differed significantly (P <0.0001) among the four clusters. The four clusters divided the sample into severity levels: Cluster 1 reflects the lowest average levels across all symptoms, and cluster 4 reflects the highest average levels. Clusters 2 and 3 capture moderate symptoms levels. Clusters 2 and 3 differed mainly in profiles of anxiety and depression, with Cluster 2 having lower levels of depression and anxiety than Cluster 3, despite higher levels of pain. The results of the cluster analysis of the external sample (n = 478) looked very similar to those found in the original cluster analysis, except for a slight difference in sleep problems. This was despite having patients in the validation sample who were significantly younger (P <0.0001) and had more severe symptoms (higher FIQ-R total scores (P = 0.0004)). 
In our study, we incorporated core OMERACT symptom domains, which allowed for clustering based on a comprehensive symptom profile. Although our exploratory cluster solution needs confirmation in a longitudinal study, this approach could provide a rationale to support the study of individualized clinical evaluation and intervention.
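The abstract does not state which linkage or distance measure was used, so the following is only a minimal sketch of hierarchical agglomerative clustering, assuming average linkage, Euclidean distance, and hypothetical two-domain symptom scores. The function name `agglomerate` and the data are illustrative and are not taken from the study.

```python
from itertools import combinations
from math import dist


def agglomerate(points, k):
    """Repeatedly merge the two closest clusters until k clusters remain.

    Assumption: average linkage on Euclidean distance; the study's actual
    linkage and distance are not given in the abstract.
    """
    clusters = [[i] for i in range(len(points))]  # start with singletons

    def linkage(a, b):
        # Mean pairwise distance between members of clusters a and b.
        total = sum(dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > k:
        # combinations() yields index pairs (a, b) with a < b.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe because b > a
    return [sorted(c) for c in clusters]


# Hypothetical (pain, fatigue) scores for six patients.
scores = [(1.0, 1.2), (1.1, 0.9), (5.0, 5.1), (5.2, 4.9), (9.0, 9.1), (8.8, 9.3)]
groups = agglomerate(scores, 3)
print(groups)
```

Cutting the merge sequence at a chosen number of clusters, as done here with `k`, corresponds to selecting a cluster solution such as the four-cluster one reported above; how that number was chosen in the study is not described in the abstract.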
Fast gene ontology based clustering for microarray experiments.
Ovaska, Kristian; Laakso, Marko; Hautaniemi, Sampsa
2008-11-21
Analysis of a microarray experiment often results in a list of hundreds of disease-associated genes. In order to suggest common biological processes and functions for these genes, Gene Ontology annotations with statistical testing are widely used. However, these analyses can produce a very large number of significantly altered biological processes. Thus, it is often challenging to interpret GO results and identify novel testable biological hypotheses. We present fast software for advanced gene annotation using semantic similarity for Gene Ontology terms combined with clustering and heat map visualisation. The methodology allows rapid identification of genes sharing the same Gene Ontology cluster. Our R-based semantic similarity open-source package has a speed advantage of over 2000-fold compared to existing implementations. From the resulting hierarchical clustering dendrogram, genes sharing a GO term can be identified, and their differences in the gene expression patterns can be seen from the heat map. These methods facilitate advanced annotation of genes resulting from data analysis.
Phonological awareness of English by Chinese and Korean bilinguals
NASA Astrophysics Data System (ADS)
Chung, Hyunjoo; Schmidt, Anna; Cheng, Tse-Hsuan
2002-05-01
This study examined non-native speakers' phonological awareness of spoken English. Chinese speaking adults, Korean speaking adults, and English speaking adults were tested. The L2 speakers had been in the US for less than 6 months. Chinese and Korean allow no consonant clusters and have limited numbers of consonants allowable in syllable final position, whereas English allows a variety of clusters and various consonants in syllable final position. Subjects participated in eight phonological awareness tasks (4 replacement tasks and 4 deletion tasks) based on English phonology. In addition, digit span was measured. Preliminary analysis indicates that Chinese and Korean speaker errors appear to reflect L1 influences (such as orthography, phonotactic constraints, and phonology). All three groups of speakers showed more difficulty with manipulation of rime than onset, especially with postvocalic nasals. Results will be discussed in terms of syllable structure, L1 influence, and association with short term memory.
Systematic detection and classification of earthquake clusters in Italy
NASA Astrophysics Data System (ADS)
Poli, P.; Ben-Zion, Y.; Zaliapin, I. V.
2017-12-01
We perform a systematic analysis of spatio-temporal clustering of 2007-2017 earthquakes in Italy with magnitudes m>3. The study employs the nearest-neighbor approach of Zaliapin and Ben-Zion [2013a, 2013b] with basic data-driven parameters. The results indicate that seismicity in Italy (an extensional tectonic regime) is dominated by clustered events, with a smaller proportion of background events than in California. Evaluation of internal cluster properties allows separation of swarm-like from burst-like seismicity. This classification highlights a strong geographical coherence of cluster properties. Swarm-like seismicity is dominant in regions characterized by relatively slow deformation with possible elevated temperature and/or fluids (e.g. Alto Tiberina, Pollino), while burst-like seismicity is observed in crystalline tectonic regions (Alps and Calabrian Arc) and in Central Italy where moderate to large earthquakes are frequent (e.g. L'Aquila, Amatrice). To better assess the variation of seismicity style across Italy, we also perform a clustering analysis with region-specific parameters. This analysis highlights clear spatial changes of the threshold separating background and clustered seismicity, and permits better resolution of different clusters in specific geological regions. For example, a large proportion of repeaters is found in the Etna region as expected for volcanic-induced seismicity. A similar behavior is observed in the northern Apennines with high pore pressure associated with mantle degassing. The observed variations of earthquake properties highlight shortcomings of practices using large-scale average seismic properties, and point to connections between seismicity and local properties of the lithosphere. The observations help to improve the understanding of the physics governing the occurrence of earthquakes in different regions.
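As a rough illustration of the nearest-neighbor idea referred to above, the sketch below links each event to the earlier event that minimizes a space-time-magnitude proximity of the form eta = t * r^d_f * 10^(-b * m), and then separates clustered from background events with a threshold. The parameter values (`b`, `d_f`), the fixed threshold `ETA_0`, the planar coordinates, and the toy catalog are all assumptions made for this example; the abstract states only that the parameters were data-driven and does not give them.

```python
from math import hypot, inf


def proximity(parent, child, b=1.0, d_f=1.6):
    """Proximity eta = t * r**d_f * 10**(-b * m_parent); inf if parent is not earlier.

    b and d_f are assumed illustrative values, not those of the study.
    """
    t = child["t"] - parent["t"]  # inter-event time (days, assumed unit)
    if t <= 0:
        return inf  # a parent must precede its child
    # Planar distance in km (a simplification of epicentral distance).
    r = hypot(child["x"] - parent["x"], child["y"] - parent["y"])
    return t * r ** d_f * 10 ** (-b * parent["m"])


def nearest_parents(events):
    """For each event, return (index of nearest earlier event or None, eta)."""
    result = []
    for j, child in enumerate(events):
        eta, parent = inf, None
        for i, cand in enumerate(events):
            if i == j:
                continue
            e = proximity(cand, child)
            if e < eta:
                eta, parent = e, i
        result.append((parent, eta))
    return result


# Hypothetical catalog: time (days), position (km), magnitude.
catalog = [
    {"t": 0.0, "x": 0.0, "y": 0.0, "m": 5.0},      # mainshock-like event
    {"t": 0.01, "x": 1.0, "y": 0.0, "m": 3.2},     # close in space and time
    {"t": 300.0, "x": 200.0, "y": 0.0, "m": 3.5},  # far in space and time
]

links = nearest_parents(catalog)
parents = [p for p, _ in links]

ETA_0 = 1e-3  # hypothetical threshold separating clustered from background
labels = ["clustered" if eta < ETA_0 else "background" for _, eta in links]
print(parents, labels)
```

In the region-specific analysis described in the abstract, the threshold would vary between regions rather than being a single constant as in this sketch.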
Cluster tool solution for fabrication and qualification of advanced photomasks
NASA Astrophysics Data System (ADS)
Schaetz, Thomas; Hartmann, Hans; Peter, Kai; Lalanne, Frederic P.; Maurin, Olivier; Baracchi, Emanuele; Miramond, Corinne; Brueck, Hans-Juergen; Scheuring, Gerd; Engel, Thomas; Eran, Yair; Sommer, Karl
2000-07-01
The reduction of wavelength in optical lithography, phase shift technology and optical proximity correction (OPC) require a rapid increase in cost-effective qualification of photomasks. The knowledge about CD variation, loss of pattern fidelity especially for OPC pattern and mask defects concerning the impact on wafer level is becoming a key issue for mask quality assessment. As part of the European Community supported ESPRIT project 'Q-CAP', a new cluster concept has been developed, which allows the combination of hardware tools as well as software tools via network communication. It is designed to be open for any tool manufacturer and mask house. The bi-directional network access allows the exchange of all relevant mask data including grayscale images, measurement results, lithography parameters, defect coordinates, layout data, process data etc. and its storage to a SQL database. The system uses SEMI format descriptions as well as standard network hardware and software components for the client server communication. Each tool is used mainly to perform its specific application without using expensive time to perform optional analysis, but the availability of the database allows each component to share the full data set gathered by all components. Therefore, the cluster can be considered as one single virtual tool. The paper shows the advantage of the cluster approach, the benefits of the tools linked together already, and a vision of a mask house in the near future.
Sensitivity Analysis of Multiple Informant Models When Data Are Not Missing at Random
ERIC Educational Resources Information Center
Blozis, Shelley A.; Ge, Xiaojia; Xu, Shu; Natsuaki, Misaki N.; Shaw, Daniel S.; Neiderhiser, Jenae M.; Scaramella, Laura V.; Leve, Leslie D.; Reiss, David
2013-01-01
Missing data are common in studies that rely on multiple informant data to evaluate relationships among variables for distinguishable individuals clustered within groups. Estimation of structural equation models using raw data allows for incomplete data, and so all groups can be retained for analysis even if only 1 member of a group contributes…
Frachon, E; Hamon, S; Nicolas, L; de Barjac, H
1991-01-01
Gas-liquid chromatography of fatty acid methyl esters and numerical analysis were carried out with 114 Bacillus sphaericus strains. Since only two clusters harbored mosquitocidal strains, this technique could be developed in screening programs to limit bioassays on mosquito larvae. It also allows differentiation of highly homologous strains. PMID:1781697
Knutson, Stacy T.; Westwood, Brian M.; Leuthaeuser, Janelle B.; Turner, Brandon E.; Nguyendac, Don; Shea, Gabrielle; Kumar, Kiran; Hayden, Julia D.; Harper, Angela F.; Brown, Shoshana D.; Morris, John H.; Ferrin, Thomas E.; Babbitt, Patricia C.
2017-01-01
Abstract Protein function identification remains a significant problem. Solving this problem at the molecular functional level would allow mechanistic determinant identification—amino acids that distinguish details between functional families within a superfamily. Active site profiling was developed to identify mechanistic determinants. DASP and DASP2 were developed as tools to search sequence databases using active site profiling. Here, TuLIP (Two‐Level Iterative clustering Process) is introduced as an iterative, divisive clustering process that utilizes active site profiling to separate structurally characterized superfamily members into functionally relevant clusters. Underlying TuLIP is the observation that functionally relevant families (curated by Structure‐Function Linkage Database, SFLD) self‐identify in DASP2 searches; clusters containing multiple functional families do not. Each TuLIP iteration produces candidate clusters, each evaluated to determine if it self‐identifies using DASP2. If so, it is deemed a functionally relevant group. Divisive clustering continues until each structure is either a functionally relevant group member or a singlet. TuLIP is validated on enolase and glutathione transferase structures, superfamilies well‐curated by SFLD. Correlation is strong; small numbers of structures prevent statistically significant analysis. TuLIP‐identified enolase clusters are used in DASP2 GenBank searches to identify sequences sharing functional site features. Analysis shows a true positive rate of 96%, false negative rate of 4%, and maximum false positive rate of 4%. F‐measure and performance analysis on the enolase search results and comparison to GEMMA and SCI‐PHY demonstrate that TuLIP avoids the over‐division problem of these methods. Mechanistic determinants for enolase families are evaluated and shown to correlate well with literature results. PMID:28054422
Knutson, Stacy T; Westwood, Brian M; Leuthaeuser, Janelle B; Turner, Brandon E; Nguyendac, Don; Shea, Gabrielle; Kumar, Kiran; Hayden, Julia D; Harper, Angela F; Brown, Shoshana D; Morris, John H; Ferrin, Thomas E; Babbitt, Patricia C; Fetrow, Jacquelyn S
2017-04-01
Protein function identification remains a significant problem. Solving this problem at the molecular functional level would allow mechanistic determinant identification-amino acids that distinguish details between functional families within a superfamily. Active site profiling was developed to identify mechanistic determinants. DASP and DASP2 were developed as tools to search sequence databases using active site profiling. Here, TuLIP (Two-Level Iterative clustering Process) is introduced as an iterative, divisive clustering process that utilizes active site profiling to separate structurally characterized superfamily members into functionally relevant clusters. Underlying TuLIP is the observation that functionally relevant families (curated by Structure-Function Linkage Database, SFLD) self-identify in DASP2 searches; clusters containing multiple functional families do not. Each TuLIP iteration produces candidate clusters, each evaluated to determine if it self-identifies using DASP2. If so, it is deemed a functionally relevant group. Divisive clustering continues until each structure is either a functionally relevant group member or a singlet. TuLIP is validated on enolase and glutathione transferase structures, superfamilies well-curated by SFLD. Correlation is strong; small numbers of structures prevent statistically significant analysis. TuLIP-identified enolase clusters are used in DASP2 GenBank searches to identify sequences sharing functional site features. Analysis shows a true positive rate of 96%, false negative rate of 4%, and maximum false positive rate of 4%. F-measure and performance analysis on the enolase search results and comparison to GEMMA and SCI-PHY demonstrate that TuLIP avoids the over-division problem of these methods. Mechanistic determinants for enolase families are evaluated and shown to correlate well with literature results. © 2017 The Authors Protein Science published by Wiley Periodicals, Inc. on behalf of The Protein Society.
Tremblay, Marlène; Hess, Justin P; Christenson, Brock M; McIntyre, Kolby K; Smink, Ben; van der Kamp, Arjen J; de Jong, Lisanne G; Döpfer, Dörte
2016-07-01
Automatic milking systems (AMS) are implemented in a variety of situations and environments. Consequently, there is a need to characterize individual farming practices and regional challenges to streamline management advice and objectives for producers. Benchmarking is often used in the dairy industry to compare farms by computing percentile ranks of the production values of groups of farms. Grouping for conventional benchmarking is commonly limited to the use of a few factors such as farms' geographic region or breed of cattle. We hypothesized that herds' production data and management information could be clustered in a meaningful way using cluster analysis and that this clustering approach would yield better peer groups of farms than benchmarking methods based on criteria such as country, region, breed, or breed and region. By applying mixed latent-class model-based cluster analysis to 529 North American AMS dairy farms with respect to 18 significant risk factors, 6 clusters were identified. Each cluster (i.e., peer group) represented unique management styles, challenges, and production patterns. When compared with peer groups based on criteria similar to the conventional benchmarking standards, the 6 clusters better predicted milk produced (kilograms) per robot per day. Each cluster represented a unique management and production pattern that requires specialized advice. For example, cluster 1 farms were those that recently installed AMS robots, whereas cluster 3 farms (the most northern farms) fed high amounts of concentrates through the robot to compensate for low-energy feed in the bunk. In addition to general recommendations for farms within a cluster, individual farms can generate their own specific goals by comparing themselves to farms within their cluster. This is very comparable to benchmarking but adds the specific characteristics of the peer group, resulting in better farm management advice. 
The improvement that cluster analysis allows for is characterized by the multivariable approach and the fact that comparisons between production units can be accomplished within a cluster and between clusters as a choice. Copyright © 2016 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
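To make the difference between conventional and cluster-based benchmarking concrete, here is a minimal sketch that computes a farm's percentile rank once against all farms and once against its own peer group. The farm names, cluster labels, and milk values (kilograms per robot per day) are hypothetical, and the percentile-rank definition used (share of values strictly below the farm's value, with the farm itself included in the reference group) is one common convention assumed for this example.

```python
def percentile_rank(value, peers):
    """Percentage of peer values strictly below `value`."""
    return 100.0 * sum(p < value for p in peers) / len(peers)


# Hypothetical farms: name -> (cluster label, milk in kg per robot per day).
farms = {
    "A": (1, 1500.0), "B": (1, 1600.0), "C": (1, 1700.0), "D": (1, 1800.0),
    "E": (3, 1900.0), "F": (3, 2000.0), "G": (3, 2100.0), "H": (3, 2200.0),
}


def peer_values(cluster):
    """Milk values of all farms assigned to the given cluster."""
    return [milk for c, milk in farms.values() if c == cluster]


cluster_d, milk_d = farms["D"]
rank_all = percentile_rank(milk_d, [m for _, m in farms.values()])
rank_peer = percentile_rank(milk_d, peer_values(cluster_d))
print(rank_all, rank_peer)
```

In this toy data, farm D looks below the median when ranked against every farm but sits at the top of its own peer group, which is the kind of shift in interpretation that peer grouping by cluster is meant to provide.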
Supervised group Lasso with applications to microarray data analysis
Ma, Shuangge; Song, Xiao; Huang, Jian
2007-01-01
Background A tremendous amount of effort has been devoted to identifying genes for diagnosis and prognosis of diseases using microarray gene expression data. It has been demonstrated that gene expression data have cluster structure, where the clusters consist of co-regulated genes which tend to have coordinated functions. However, most available statistical methods for gene selection do not take into consideration the cluster structure. Results We propose a supervised group Lasso approach that takes into account the cluster structure in gene expression data for gene selection and predictive model building. For gene expression data without biological cluster information, we first divide genes into clusters using the K-means approach and determine the optimal number of clusters using the Gap method. The supervised group Lasso consists of two steps. In the first step, we identify important genes within each cluster using the Lasso method. In the second step, we select important clusters using the group Lasso. Tuning parameters are determined using V-fold cross validation at both steps to allow for further flexibility. Prediction performance is evaluated using leave-one-out cross validation. We apply the proposed method to disease classification and survival analysis with microarray data. Conclusion We analyze four microarray data sets using the proposed approach: two cancer data sets with binary cancer occurrence as outcomes and two lymphoma data sets with survival outcomes. The results show that the proposed approach is capable of identifying a small number of influential gene clusters and important genes within those clusters, and has better prediction performance than existing methods. PMID:17316436
Recognizing different tissues in human fetal femur cartilage by label-free Raman microspectroscopy
NASA Astrophysics Data System (ADS)
Kunstar, Aliz; Leijten, Jeroen; van Leuveren, Stefan; Hilderink, Janneke; Otto, Cees; van Blitterswijk, Clemens A.; Karperien, Marcel; van Apeldoorn, Aart A.
2012-11-01
Traditionally, the composition of bone and cartilage is determined by standard histological methods. We used Raman microscopy, which provides a molecular "fingerprint" of the investigated sample, to detect differences between the zones in human fetal femur cartilage without the need for additional staining or labeling. Raman area scans were made from the (pre)articular cartilage, resting, proliferative, and hypertrophic zones of growth plate and endochondral bone within human fetal femora. Multivariate data analysis was performed on Raman spectral datasets to construct cluster images with corresponding cluster averages. Cluster analysis resulted in detection of individual chondrocyte spectra that could be separated from cartilage extracellular matrix (ECM) spectra and was verified by comparing cluster images with intensity-based Raman images for the deoxyribonucleic acid/ribonucleic acid (DNA/RNA) band. Specific dendrograms were created using Ward's clustering method, and principal component analysis (PCA) was performed with the separated and averaged Raman spectra of cells and ECM of all measured zones. Overall (dis)similarities between measured zones were effectively visualized on the dendrograms and main spectral differences were revealed by PCA allowing for label-free detection of individual cartilaginous zones and for label-free evaluation of proper cartilaginous matrix formation for future tissue engineering and clinical purposes.
Fetterman, Christina D; Rannala, Bruce; Walter, Michael A
2008-09-24
Members of the forkhead gene family act as transcription regulators in biological processes including development and metabolism. The evolution of forkhead genes has not been widely examined and selection pressures at the molecular level influencing subfamily evolution and differentiation have not been explored. Here, in silico methods were used to examine selection pressures acting on the coding sequence of five multi-species FOX protein subfamily clusters; FoxA, FoxD, FoxI, FoxO and FoxP. Application of site models, which estimate overall selection pressures on individual codons throughout the phylogeny, showed that the amino acid changes observed were either neutral or under negative selection. Branch-site models, which allow estimated selection pressures along specified lineages to vary as compared to the remaining phylogeny, identified positive selection along branches leading to the FoxA3 and Protostomia clades in the FoxA cluster and the branch leading to the FoxO3 clade in the FoxO cluster. Residues that may differentiate paralogs were identified in the FoxA and FoxO clusters and residues that differentiate orthologs were identified in the FoxA cluster. Neutral amino acid changes were identified in the forkhead domain of the FoxA, FoxD and FoxP clusters while positive selection was identified in the forkhead domain of the Protostomia lineage of the FoxA cluster. A series of residues under strong negative selection adjacent to the N- and C-termini of the forkhead domain were identified in all clusters analyzed suggesting a new method for refinement of domain boundaries. Extrapolation of domains among cluster members in conjunction with selection pressure information allowed prediction of residue function in the FoxA, FoxO and FoxP clusters and exclusion of known domain function in residues of the FoxA and FoxI clusters. 
Consideration of selection pressures observed in conjunction with known functional information allowed prediction of residue function and refinement of domain boundaries. Identification of residues that differentiate orthologs and paralogs provided insight into the development and functional consequences of paralogs and forkhead subfamily composition differences among species. Overall we found that after gene duplication of forkhead family members, rapid differentiation and subsequent fixation of amino acid changes through negative selection has occurred.
Oberle, Michael; Wohlwend, Nadia; Jonas, Daniel; Maurer, Florian P.; Jost, Geraldine; Tschudin-Sutter, Sarah; Vranckx, Katleen; Egli, Adrian
2016-01-01
Background The technical, biological, and inter-center reproducibility of matrix-assisted laser desorption ionization-time of flight mass spectrometry (MALDI TOF MS) typing data has not yet been explored. The aim of this study is to compare typing data from multiple centers employing bioinformatics using bacterial strains from two past outbreaks and non-related strains. Material/Methods Participants received twelve extended spectrum betalactamase-producing E. coli isolates and followed the same standard operating procedure (SOP) including a full-protein extraction protocol. All laboratories provided visually read spectra via flexAnalysis (Bruker, Germany). Raw data from each laboratory allowed calculating the technical and biological reproducibility between centers using BioNumerics (Applied Maths NV, Belgium). Results Technical and biological reproducibility ranged between 96.8–99.4% and 47.6–94.4%, respectively. The inter-center reproducibility showed a comparable clustering among identical isolates. Principal component analysis indicated a higher tendency to cluster within the same center. Therefore, we used a discriminant analysis, which completely separated the clusters. Next, we defined a reference center and performed a statistical analysis to identify specific peaks to identify the outbreak clusters. Finally, we used a classifier algorithm and a linear support vector machine on the determined peaks as classifier. A validation showed that within the set of the reference center, the identification of the cluster was 100% correct with a large contrast between the score with the correct cluster and the next best scoring cluster. Conclusions Based on the sufficient technical and biological reproducibility of MALDI-TOF MS based spectra, detection of specific clusters is possible from spectra obtained from different centers. However, we believe that a shared SOP and a bioinformatics approach are required to make the analysis robust and reliable. PMID:27798637
X-Ray Morphological Analysis of the Planck ESZ Clusters
NASA Astrophysics Data System (ADS)
Lovisari, Lorenzo; Forman, William R.; Jones, Christine; Ettori, Stefano; Andrade-Santos, Felipe; Arnaud, Monique; Démoclès, Jessica; Pratt, Gabriel W.; Randall, Scott; Kraft, Ralph
2017-09-01
X-ray observations show that galaxy clusters have a very large range of morphologies. The most disturbed systems, which are good to study how clusters form and grow and to test physical models, may potentially complicate cosmological studies because the cluster mass determination becomes more challenging. Thus, we need to understand the cluster properties of our samples to reduce possible biases. This is complicated by the fact that different experiments may detect different cluster populations. For example, Sunyaev-Zeldovich (SZ) selected cluster samples have been found to include a greater fraction of disturbed systems than X-ray selected samples. In this paper we determine eight morphological parameters for the Planck Early Sunyaev-Zeldovich (ESZ) objects observed with XMM-Newton. We found that two parameters, concentration and centroid shift, are the best to distinguish between relaxed and disturbed systems. For each parameter we provide the values that allow selecting the most relaxed or most disturbed objects from a sample. We found that there is no mass dependence on the cluster dynamical state. By comparing our results with what was obtained with REXCESS clusters, we also confirm that the ESZ clusters indeed tend to be more disturbed, as found by previous studies.
X-Ray Morphological Analysis of the Planck ESZ Clusters
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lovisari, Lorenzo; Forman, William R.; Jones, Christine
2017-09-01
X-ray observations show that galaxy clusters have a very large range of morphologies. The most disturbed systems, which are good to study how clusters form and grow and to test physical models, may potentially complicate cosmological studies because the cluster mass determination becomes more challenging. Thus, we need to understand the cluster properties of our samples to reduce possible biases. This is complicated by the fact that different experiments may detect different cluster populations. For example, Sunyaev–Zeldovich (SZ) selected cluster samples have been found to include a greater fraction of disturbed systems than X-ray selected samples. In this paper we determine eight morphological parameters for the Planck Early Sunyaev–Zeldovich (ESZ) objects observed with XMM-Newton. We found that two parameters, concentration and centroid shift, are the best to distinguish between relaxed and disturbed systems. For each parameter we provide the values that allow selecting the most relaxed or most disturbed objects from a sample. We found that there is no mass dependence on the cluster dynamical state. By comparing our results with what was obtained with REXCESS clusters, we also confirm that the ESZ clusters indeed tend to be more disturbed, as found by previous studies.
An information model for use in software management estimation and prediction
NASA Technical Reports Server (NTRS)
Li, Ningda R.; Zelkowitz, Marvin V.
1993-01-01
This paper describes the use of cluster analysis for determining the information model within collected software engineering development data at the NASA/GSFC Software Engineering Laboratory. We describe the Software Management Environment tool that allows managers to predict development attributes during early phases of a software project and the modifications we propose to allow it to develop dynamic models for better predictions of these attributes.
Scientific Cluster Deployment and Recovery - Using puppet to simplify cluster management
NASA Astrophysics Data System (ADS)
Hendrix, Val; Benjamin, Doug; Yao, Yushu
2012-12-01
Deployment, maintenance and recovery of a scientific cluster, which has complex, specialized services, can be a time-consuming task requiring the assistance of Linux system administrators, network engineers as well as domain experts. Universities and small institutions that have a part-time FTE with limited time for and knowledge of the administration of such clusters can be strained by such maintenance tasks. This current work is the result of an effort to maintain a data analysis cluster (DAC) with minimal effort by a local system administrator. The realized benefit is that the scientist, who is the local system administrator, is able to focus on the data analysis instead of the intricacies of managing a cluster. Our work provides a cluster deployment and recovery process (CDRP) based on the puppet configuration engine allowing a part-time FTE to easily deploy and recover entire clusters with minimal effort. Puppet is a configuration management system (CMS) used widely in computing centers for the automatic management of resources. Domain experts use Puppet's declarative language to define reusable modules for service configuration and deployment. Our CDRP has three actors: domain experts, a cluster designer and a cluster manager. The domain experts first write the puppet modules for the cluster services. A cluster designer would then define a cluster. This includes the creation of cluster roles, mapping the services to those roles and determining the relationships between the services. Finally, a cluster manager would acquire the resources (machines, networking), enter the cluster input parameters (hostnames, IP addresses) and automatically generate deployment scripts used by puppet to configure each machine to act as a designated role. In the event of a machine failure, the originally generated deployment scripts along with puppet can be used to easily reconfigure a new machine.
The cluster definition produced in our CDRP is an integral part of automating cluster deployment in a cloud environment. Our future cloud efforts will further build on this work.
Spatial Analysis of Great Lakes Regional Icing Cloud Liquid Water Content
NASA Technical Reports Server (NTRS)
Ryerson, Charles C.; Koenig, George G.; Melloh, Rae A.; Meese, Debra A.; Reehorst, Andrew L.; Miller, Dean R.
2003-01-01
Abstract Clustering of cloud microphysical conditions, such as liquid water content (LWC) and drop size, can affect the rate and shape of ice accretion and the airworthiness of aircraft. Clustering may also degrade the accuracy of cloud LWC measurements from radars and microwave radiometers being developed by the government for remotely mapping icing conditions ahead of aircraft in flight. This paper evaluates spatial clustering of LWC in icing clouds using measurements collected during NASA research flights in the Great Lakes region. We used graphical and analytical approaches to describe clustering. The analytical approach involves determining the average size of clusters and computing a clustering intensity parameter. We analyzed flight data composed of 1-s-frequency LWC measurements for 12 periods ranging from 17.4 minutes (73 km) to 45.3 minutes (190 km) in duration. Graphically some flight segments showed evidence of consistency with regard to clustering patterns. Cluster intensity varied from 0.06, indicating little clustering, to a high of 2.42. Cluster lengths ranged from 0.1 minutes (0.6 km) to 4.1 minutes (17.3 km). Additional analyses will allow us to determine if clustering climatologies can be developed to characterize cluster conditions by region, time period, or weather condition.
Microstructure and tuber properties of potato varieties with different genetic profiles.
Romano, Annalisa; Masi, Paolo; Aversano, Riccardo; Carucci, Francesca; Palomba, Sara; Carputo, Domenico
2018-01-15
The objectives of this research were to study tuber starch characteristics and chemical - thermal properties of 21 potato varieties, and to determine their genetic diversity through SSR markers. Starch granular size varied among samples, with a wide diameter distribution (5-85μm), while granule shapes were similar. Differential Scanning Calorimeter analysis showed that the transition temperatures (69°C-74°C) and enthalpies of gelatinization (0.9J/g-3.8J/g) of tubers were also variety dependent. SSR analysis allowed the detection of 157 alleles across all varieties, with an average value of 6.8 alleles per locus. Variety-specific alleles were also identified. SSR-based cluster analysis revealed that varieties with interesting quality attributes were distributed among all clusters and sub-clusters, suggesting that the genetic basis of traits analyzed may differ among our varieties. The information obtained in this study may be useful to identify and develop varieties with slowly digestible starch. Copyright © 2017 Elsevier Ltd. All rights reserved.
Adamek, Martina; Alanjary, Mohammad; Sales-Ortells, Helena; Goodfellow, Michael; Bull, Alan T; Winkler, Anika; Wibberg, Daniel; Kalinowski, Jörn; Ziemert, Nadine
2018-06-01
Genome mining tools have enabled us to predict biosynthetic gene clusters that might encode compounds with valuable functions for industrial and medical applications. With the continuously increasing number of genomes sequenced, we are confronted with an overwhelming number of predicted clusters. In order to guide the effective prioritization of biosynthetic gene clusters towards finding the most promising compounds, knowledge about diversity, phylogenetic relationships and distribution patterns of biosynthetic gene clusters is necessary. Here, we provide a comprehensive analysis of the model actinobacterial genus Amycolatopsis and its potential for the production of secondary metabolites. A phylogenetic characterization, together with a pan-genome analysis showed that within this highly diverse genus, four major lineages could be distinguished which differed in their potential to produce secondary metabolites. Furthermore, we were able to distinguish gene cluster families whose distribution correlated with phylogeny, indicating that vertical gene transfer plays a major role in the evolution of secondary metabolite gene clusters. Still, the vast majority of the diverse biosynthetic gene clusters were derived from clusters unique to the genus, and also unique in comparison to a database of known compounds. Our study on the locations of biosynthetic gene clusters in the genomes of Amycolatopsis' strains showed that clusters acquired by horizontal gene transfer tend to be incorporated into non-conserved regions of the genome thereby allowing us to distinguish core and hypervariable regions in Amycolatopsis genomes. Using a comparative genomics approach, it was possible to determine the potential of the genus Amycolatopsis to produce a huge diversity of secondary metabolites. Furthermore, the analysis demonstrates that horizontal and vertical gene transfer play an important role in the acquisition and maintenance of valuable secondary metabolites. 
Our results cast light on the interconnections between secondary metabolite gene clusters and provide a way to prioritize biosynthetic pathways in the search and discovery of novel compounds.
2015-01-01
Background: Though cluster analysis has become a routine analytic task for bioinformatics research, it is still arduous for researchers to assess the quality of a clustering result. To select the best clustering method and its parameters for a dataset, researchers have to run multiple clustering algorithms and compare them. However, such a comparison task with multiple clustering results is cognitively demanding and laborious. Results: In this paper, we present XCluSim, a visual analytics tool that enables users to interactively compare multiple clustering results based on the Visual Information Seeking Mantra. We build a taxonomy for categorizing existing techniques of clustering results visualization in terms of the Gestalt principles of grouping. Using the taxonomy, we choose the most appropriate interactive visualizations for presenting individual clustering results from different types of clustering algorithms. The efficacy of XCluSim is shown through case studies with a bioinformatician. Conclusions: Compared to other relevant tools, XCluSim enables users to compare multiple clustering results in a more scalable manner. Moreover, XCluSim supports diverse clustering algorithms and dedicated visualizations and interactions for different types of clustering results, allowing more effective exploration of details on demand. Through case studies with a bioinformatics researcher, we received positive feedback on the functionalities of XCluSim, including its ability to help identify stably clustered items across multiple clustering results. PMID:26328893
NASA Astrophysics Data System (ADS)
Sams, Michael; Silye, Rene; Göhring, Janett; Muresan, Leila; Schilcher, Kurt; Jacak, Jaroslaw
2014-01-01
We present a cluster spatial analysis method using nanoscopic dSTORM images to determine changes in protein cluster distributions within brain tissue. Such methods are suitable to investigate human brain tissue and will help to achieve a deeper understanding of brain disease along with aiding drug development. Human brain tissue samples are usually treated postmortem via standard fixation protocols, which are established in clinical laboratories. Therefore, our localization microscopy-based method was adapted to characterize protein density and protein cluster localization in samples fixed using different protocols followed by common fluorescent immunohistochemistry techniques. The localization microscopy allows nanoscopic mapping of serotonin 5-HT1A receptor groups within a two-dimensional image of a brain tissue slice. These nanoscopically mapped proteins can be confined to clusters by applying the proposed statistical spatial analysis. Selected features of such clusters were subsequently used to characterize and classify the tissue. Samples were obtained from different types of patients, fixed with different preparation methods, and finally stored in a human tissue bank. To verify the proposed method, samples of a cryopreserved healthy brain have been compared with epitope-retrieved and paraffin-fixed tissues. Furthermore, samples of healthy brain tissues were compared with data obtained from patients suffering from mental illnesses (e.g., major depressive disorder). Our work demonstrates the applicability of localization microscopy and image analysis methods for comparison and classification of human brain tissues at a nanoscopic level. Furthermore, the presented workflow marks a unique technological advance in the characterization of protein distributions in brain tissue sections.
VizieR Online Data Catalog: HSTPROMO catalogs. II. Kinematic profiles (Watkins+, 2015)
NASA Astrophysics Data System (ADS)
Watkins, L. L.; van der Marel, R. P.; Bellini, A.; Anderson, J.
2015-07-01
In Bellini et al. (2014, J/ApJ/797/115, Paper 1), we recently presented a set of Hubble Space Telescope (HST) proper-motion catalogs for 22 Milky Way globular clusters. These catalogs are the result of a search through archival HST data to find fields in Galactic globular clusters that had been previously observed for other projects at multiple epochs, allowing us to measure proper motions. Thanks to both the stability and longevity of HST, we were able to achieve exceptional precision over baselines of up to 12 yr. We begin here an analysis of the kinematical profiles and maps for each of the 22 clusters. (2 data files).
NASA Technical Reports Server (NTRS)
Stauffer, John R.; Petre, Robert (Technical Monitor)
2000-01-01
This grant was originally awarded to Dr. Charles Prosser, who died tragically in a car accident in Tucson in 1998. We had hoped to finish the work Charles had started, which involved analysis of ROSAT data for three programs (observations of the clusters NGC 2232, Cr 140 and the Pleiades) and also analysis of optical data for each cluster in order to allow interpretation of the ROSAT observations. The Pleiades portion of the program was completed during the past year, and a paper published. We have obtained optical imaging of the other two clusters, and those data are being analyzed. Dr. Brian Patten intends to complete analysis of the ROSAT observations and to combine those data with the optical photometry, but progress on those efforts has been slow due to the press of other work (Dr. Patten is responsible for the pipeline processing of data from SWAS). We intend to publish those results as soon as we can, but the work will now be completed without further support from this grant.
Multivariate time series clustering on geophysical data recorded at Mt. Etna from 1996 to 2003
NASA Astrophysics Data System (ADS)
Di Salvo, Roberto; Montalto, Placido; Nunnari, Giuseppe; Neri, Marco; Puglisi, Giuseppe
2013-02-01
Time series clustering is an important data analysis task for extracting implicit, previously unknown, and potentially useful information from a large collection of data. Finding useful similar trends in multivariate time series represents a challenge in several areas, including geophysical and environmental research. While traditional time series analysis methods deal only with univariate time series, multivariate time series analysis is a more suitable approach in fields of research where different kinds of data are available. Moreover, conventional time series clustering techniques do not provide the desired results for geophysical datasets because of the huge amount of data, whose sampling rate differs according to the nature of the signal. In this paper, a novel approach to geophysical multivariate time series clustering is proposed using dynamic time series segmentation and Self Organizing Maps techniques. This method allows finding coupling among trends of different geophysical data recorded from monitoring networks at Mt. Etna spanning from 1996 to 2003, when the transition from summit eruptions to flank eruptions occurred. This information can be used to carry out a more careful evaluation of the state of the volcano and to define potential hazard assessment at Mt. Etna.
NASA Technical Reports Server (NTRS)
Sutherland, Betsy M.; Georgakilas, Alexandros G.; Bennett, Paula V.; Laval, Jacques; Sutherland, John C.; Gewirtz, A. M. (Principal Investigator)
2003-01-01
Assessing DNA damage induction, repair and consequences of such damages requires measurement of specific DNA lesions by methods that are independent of biological responses to such lesions. Lesions affecting one DNA strand (altered bases, abasic sites, single strand breaks (SSB)) as well as damages affecting both strands (clustered damages, double strand breaks) can be quantified by direct measurement of DNA using gel electrophoresis, gel imaging and number average length analysis. Damage frequencies as low as a few sites per gigabase pair (10^9 bp) can be quantified by this approach in about 50 ng of non-radioactive DNA, and single molecule methods may allow such measurements in DNA from single cells. This review presents the theoretical basis, biochemical requirements and practical aspects of this approach, and shows examples of their applications in identification and quantitation of complex clustered damages.
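The number average length analysis mentioned above can be illustrated with a minimal sketch. It assumes, beyond what the abstract states, that each gel lane has already been reduced to bins of (molecular length in bp, DNA mass), and that the lesion frequency is taken as the difference of reciprocal number average lengths between a treated and a control lane. The function names and the input layout are hypothetical.

```python
def number_average_length(bins):
    """Number average length of a DNA population.

    bins: list of (length_bp, dna_mass) pairs for one gel lane (assumed layout).
    """
    total_mass = sum(mass for _, mass in bins)
    # The number of molecules in a bin is proportional to mass / length.
    total_molecules = sum(mass / length for length, mass in bins)
    return total_mass / total_molecules


def lesion_frequency(treated_bins, control_bins):
    """Lesions per bp, assuming each lesion was converted into one strand break."""
    return (1.0 / number_average_length(treated_bins)
            - 1.0 / number_average_length(control_bins))


if __name__ == "__main__":
    control = [(10000, 1.0)]   # illustrative values, not measured data
    treated = [(5000, 1.0)]
    print(lesion_frequency(treated, control) * 1e6, "lesions per Mbp")
```

Working with reciprocal lengths means that breaks already present in the control lane are subtracted rather than counted as induced damage.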
SpatialEpiApp: A Shiny web application for the analysis of spatial and spatio-temporal disease data.
Moraga, Paula
2017-11-01
In recent years, public health surveillance has been facilitated by the existence of several packages implementing statistical methods for the analysis of spatial and spatio-temporal disease data. However, these methods are still inaccessible for many researchers lacking the programming skills needed to use the required software effectively. In this paper we present SpatialEpiApp, a Shiny web application that integrates two of the most common approaches in health surveillance: disease mapping and detection of clusters. SpatialEpiApp is easy to use and does not require any programming knowledge. Given information about the cases, population and optionally covariates for each of the areas and dates of study, the application allows users to fit Bayesian models to obtain disease risk estimates and their uncertainty by using R-INLA, and to detect disease clusters by using SaTScan. The application allows user interaction and the creation of interactive data visualizations and reports showing the analyses performed. Copyright © 2017 Elsevier Ltd. All rights reserved.
Quantum chemical calculations in the structural analysis of phloretin
NASA Astrophysics Data System (ADS)
Gómez-Zavaglia, Andrea
2009-07-01
In this work, a conformational search on the molecule of phloretin [2',4',6'-Trihydroxy-3-(4-hydroxyphenyl)-propiophenone] has been performed. The molecule of phloretin has eight dihedral angles, four of them taking part in the carbon backbone and the other four related to the orientation of the hydroxyl groups. A systematic search involving a random variation of the dihedral angles has been used to generate input structures for the quantum chemical calculations. Calculations at the DFT(B3LYP)/6-311++G(d,p) level of theory permitted the identification of 58 local minima belonging to the C1 symmetry point group. The molecular structures of the conformers have been analyzed using hierarchical cluster analysis. This method allowed us to group conformers according to their similarities, and thus, to correlate the conformers' stability with structural parameters. The dendrogram obtained from the hierarchical cluster analysis depicted two main clusters. Cluster I included all the conformers with relative energies lower than 25 kJ mol^-1 and cluster II, the remaining conformers. The possibility of forming intramolecular hydrogen bonds proved to be the main factor contributing to the stability. Accordingly, all conformers depicting intramolecular H-bonds belong to cluster I. These conformations are clearly favored when the carbon backbone is as planar as possible. The values of the νC=O and νOH vibrational modes were compared among all the conformers of phloretin. The redshifts associated with intramolecular H-bonds were correlated with the H-bond distances and energies.
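As a rough illustration of how hierarchical cluster analysis groups similar structures, the sketch below implements a small agglomerative clustering with average linkage. The abstract does not state which linkage, distance, or structural descriptors were used, so those choices, the function name, and the toy feature vectors are all assumptions; in particular, plain Euclidean distance ignores the periodicity of dihedral angles.

```python
import math


def agglomerate(points, n_clusters):
    """Merge the two closest clusters (average linkage) until n_clusters remain.

    points: list of equal-length numeric tuples (hypothetical structural descriptors).
    Returns a list of clusters, each a list of indices into points.
    """
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Average pairwise Euclidean distance between members of a and b.
        total = sum(math.dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest linkage distance.
        _, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


if __name__ == "__main__":
    toy = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
    print(agglomerate(toy, 2))
```

Recording the linkage distance at each merge would give the heights needed to draw a dendrogram like the one the abstract describes.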
Early dynamical evolution of young substructured clusters
NASA Astrophysics Data System (ADS)
Dorval, Julien; Boily, Christian
2017-03-01
Stellar clusters form with a high level of substructure, inherited from the molecular cloud and the star formation process. Evidence from observations and simulations also indicates that the stars in such young clusters form a subvirial system. The subsequent dynamical evolution can cause important mass loss, ejecting a large part of the birth population in the field. It can also imprint the stellar population and still be inferred from observations of evolved clusters. N-body simulations allow a better understanding of these early twists and turns, given realistic initial conditions. Nowadays, substructured, clumpy young clusters are usually obtained through pseudo-fractal growth and velocity inheritance. We introduce a new way to create clumpy initial conditions through a "Hubble expansion" which naturally produces self-consistent clumps, velocity-wise. In-depth analysis of the resulting clumps shows consistency with hydrodynamical simulations of young star clusters. We use these initial conditions to investigate the dynamical evolution of young subvirial clusters. We find the collapse to be soft, with hierarchical merging leading to a high level of mass segregation. The subsequent evolution is less pronounced than the equilibrium achieved from a cold collapse formation scenario.
Neutrino and axion bounds from the globular cluster M5 (NGC 5904).
Viaux, N; Catelan, M; Stetson, P B; Raffelt, G G; Redondo, J; Valcarce, A A R; Weiss, A
2013-12-06
The red-giant branch (RGB) in globular clusters is extended to larger brightness if the degenerate helium core loses too much energy in "dark channels." Based on a large set of archival observations, we provide high-precision photometry for the Galactic globular cluster M5 (NGC 5904), allowing for a detailed comparison between the observed tip of the RGB with predictions based on contemporary stellar evolution theory. In particular, we derive 95% confidence limits of g_ae < 4.3×10^-13 on the axion-electron coupling and μ_ν < 4.5×10^-12 μ_B (Bohr magneton μ_B = e/2m_e) on a neutrino dipole moment, based on a detailed analysis of statistical and systematic uncertainties. The cluster distance is the single largest source of uncertainty and can be improved in the future.
Raleiras, Patrícia; Kellers, Petra; Lindblad, Peter; Styring, Stenbjörn; Magnuson, Ann
2013-06-21
In nitrogen-fixing cyanobacteria, hydrogen evolution is associated with hydrogenases and nitrogenase, making these enzymes interesting targets for genetic engineering aimed at increased hydrogen production. Nostoc punctiforme ATCC 29133 is a filamentous cyanobacterium that expresses the uptake hydrogenase HupSL in heterocysts under nitrogen-fixing conditions. Little is known about the structural and biophysical properties of HupSL. The small subunit, HupS, has been postulated to contain three iron-sulfur clusters, but the details regarding their nature have been unclear due to unusual cluster binding motifs in the amino acid sequence. We now report the cloning and heterologous expression of Nostoc punctiforme HupS as a fusion protein, f-HupS. We have characterized the anaerobically purified protein by UV-visible and EPR spectroscopies. Our results show that f-HupS contains three iron-sulfur clusters. UV-visible absorption of f-HupS has bands ∼340 and 420 nm, typical for iron-sulfur clusters. The EPR spectrum of the oxidized f-HupS shows a narrow g = 2.023 resonance, characteristic of a low-spin (S = ½) [3Fe-4S] cluster. The reduced f-HupS presents complex EPR spectra with overlapping resonances centered on g = 1.94, g = 1.91, and g = 1.88, typical of low-spin (S = ½) [4Fe-4S] clusters. Analysis of the spectroscopic data allowed us to distinguish between two species attributable to two distinct [4Fe-4S] clusters, in addition to the [3Fe-4S] cluster. This indicates that f-HupS binds [4Fe-4S] clusters despite the presence of unusual coordinating amino acids. Furthermore, our expression and purification of what seems to be an intact HupS protein allows future studies on the significance of ligand nature on redox properties of the iron-sulfur clusters of HupS.
Timmerman, Marieke E; Ceulemans, Eva; De Roover, Kim; Van Leeuwen, Karla
2013-12-01
To achieve an insightful clustering of multivariate data, we propose subspace K-means. Its central idea is to model the centroids and cluster residuals in reduced spaces, which allows for dealing with a wide range of cluster types and yields rich interpretations of the clusters. We review the existing related clustering methods, including deterministic, stochastic, and unsupervised learning approaches. To evaluate subspace K-means, we performed a comparative simulation study, in which we manipulated the overlap of subspaces, the between-cluster variance, and the error variance. The study shows that the subspace K-means algorithm is sensitive to local minima but that the problem can be reasonably dealt with by using partitions of various cluster procedures as a starting point for the algorithm. Subspace K-means performs very well in recovering the true clustering across all conditions considered and appears to be superior to its competitor methods: K-means, reduced K-means, factorial K-means, mixtures of factor analyzers (MFA), and MCLUST. The best competitor method, MFA, showed a performance similar to that of subspace K-means in easy conditions but deteriorated in more difficult ones. Using data from a study on parental behavior, we show that subspace K-means analysis provides a rich insight into the cluster characteristics, in terms of both the relative positions of the clusters (via the centroids) and the shape of the clusters (via the within-cluster residuals).
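The sensitivity to local minima noted above is usually handled by running the algorithm from several starting points and keeping the best solution. The sketch below shows that idea for ordinary K-means (Lloyd's algorithm), which is one of the competitor methods named in the abstract and not subspace K-means itself. It uses random initial centroids rather than partitions from other cluster procedures, and all names and defaults are assumptions.

```python
import math
import random


def kmeans(points, k, n_starts=10, n_iter=100, seed=0):
    """Ordinary K-means with several random starts; returns (labels, loss).

    points: list of equal-length numeric tuples. The solution with the lowest
    within-cluster sum of squares over all starts is kept.
    """
    rng = random.Random(seed)
    best_labels, best_loss = None, math.inf
    for _ in range(n_starts):
        centroids = rng.sample(points, k)  # random distinct data points as start
        labels = None
        for _ in range(n_iter):
            # Assignment step: nearest centroid for every point.
            new_labels = [
                min(range(k), key=lambda c: math.dist(p, centroids[c]))
                for p in points
            ]
            if new_labels == labels:
                break  # converged for this start
            labels = new_labels
            # Update step: move each centroid to the mean of its members.
            for c in range(k):
                members = [p for p, lab in zip(points, labels) if lab == c]
                if members:
                    centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
        loss = sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
        if loss < best_loss:
            best_labels, best_loss = labels, loss
    return best_labels, best_loss


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
            (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    print(kmeans(data, 2))
```

Replacing the random initial centroids with the centroids of partitions produced by other clustering procedures would be closer to the starting strategy the abstract describes.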
Multilevel Hierarchical Kernel Spectral Clustering for Real-Life Large Scale Complex Networks
Mall, Raghvendra; Langone, Rocco; Suykens, Johan A. K.
2014-01-01
Kernel spectral clustering corresponds to a weighted kernel principal component analysis problem in a constrained optimization framework. The primal formulation leads to an eigen-decomposition of a centered Laplacian matrix at the dual level. The dual formulation allows a model to be built on a representative subgraph of the large scale network in the training phase, and the model parameters are estimated in the validation stage. The KSC model has a powerful out-of-sample extension property which allows cluster affiliation for the unseen nodes of the big data network. In this paper we exploit the structure of the projections in the eigenspace during the validation stage to automatically determine a set of increasing distance thresholds. We use these distance thresholds in the test phase to obtain multiple levels of hierarchy for the large scale network. The hierarchical structure in the network is determined in a bottom-up fashion. We empirically showcase that real-world networks have multilevel hierarchical organization which cannot be detected efficiently by several state-of-the-art large scale hierarchical community detection techniques like the Louvain, OSLOM and Infomap methods. We show that a major advantage of our proposed approach is the ability to locate good quality clusters at both the finer and coarser levels of hierarchy using internal cluster quality metrics on 7 real-life networks. PMID:24949877
Choque, Elodie; Klopp, Christophe; Valiere, Sophie; Raynal, José; Mathieu, Florence
2018-03-15
Black Aspergilli represent one of the most important fungal resources of primary and secondary metabolites for the biotechnological industry. Having several sequenced black Aspergilli genomes should allow targeting the production of certain metabolites with bioactive properties. In this study, we report the draft genome of a black Aspergillus, A. tubingensis G131, isolated from a French Mediterranean vineyard. This 35 Mb genome includes 10,994 predicted genes. A genome-based discovery approach identifies 80 secondary metabolite biosynthetic gene clusters. Genomic sequences of these clusters were blasted against 3 chosen black Aspergilli genomes: A. tubingensis CBS 134.48, A. niger CBS 513.88 and A. kawachii IFO 4308. This comparison highlights different levels of cluster conservation between the four strains. It also allows identifying seven unique clusters in A. tubingensis G131. Moreover, the putative secondary metabolite clusters for asperazine and naphtho-gamma-pyrone production were proposed based on this genomic analysis. Key biosynthetic genes required for the production of 2 mycotoxins, ochratoxin A and fumonisin, are absent from this draft genome. Even if intergenic sequences of these mycotoxin biosynthetic pathways are present, this could not lead to the production of those mycotoxins by A. tubingensis G131. Functional and bioinformatics analyses of the A. tubingensis G131 genome highlight its potential for metabolite production, in particular for TAN-1612, asperazine and naphtho-gamma-pyrones, which present antioxidant, anticancer or antibiotic properties.
Wrobel, Tomasz P; Mateuszuk, Lukasz; Kostogrys, Renata B; Chlopicki, Stefan; Baranska, Malgorzata
2013-11-07
In this work the quantitative determination of atherosclerotic lesion area (ApoE/LDLR^-/- mice) by FT-IR imaging is presented and validated by comparison with atherosclerotic lesion area determination by classic Oil Red O staining. Cluster analysis of FT-IR-based measurements in the 2800-3025 cm^-1 range allowed for quantitative analysis of the atherosclerosis plaque area, the results of which were highly correlated with those of Oil Red O histological staining (R^2 = 0.935). Moreover, a specific class obtained from a second cluster analysis of the aortic cross-section samples at different stages of disease progression (3, 4 and 6 months old) seemed to represent the macrophages (CD68) area within the atherosclerotic plaque.
Analysis of cytokine release assay data using machine learning approaches.
Xiong, Feiyu; Janko, Marco; Walker, Mindi; Makropoulos, Dorie; Weinstock, Daniel; Kam, Moshe; Hrebien, Leonid
2014-10-01
The possible onset of Cytokine Release Syndrome (CRS) is an important consideration in the development of monoclonal antibody (mAb) therapeutics. In this study, several machine learning approaches are used to analyze CRS data. The analyzed data come from a human blood in vitro assay which was used to assess the potential of mAb-based therapeutics to produce cytokine release similar to that induced by Anti-CD28 superagonistic (Anti-CD28 SA) mAbs. The data contain 7 mAbs and two negative controls, a total of 423 samples coming from 44 donors. Three (3) machine learning approaches were applied in combination to observations obtained from that assay, namely (i) Hierarchical Cluster Analysis (HCA); (ii) Principal Component Analysis (PCA) followed by K-means clustering; and (iii) Decision Tree Classification (DTC). All three approaches were able to identify the treatment that caused the most severe cytokine response. HCA was able to provide information about the expected number of clusters in the data. PCA coupled with K-means clustering allowed classification of treatments sample by sample, and visualizing clusters of treatments. DTC models showed the relative importance of various cytokines such as IFN-γ, TNF-α and IL-10 to CRS. The use of these approaches in tandem provides better selection of parameters for one method based on outcomes from another, and an overall improved analysis of the data through complementary approaches. Moreover, the DTC analysis showed in addition that IL-17 may be correlated with CRS reactions, although this correlation has not yet been corroborated in the literature. Copyright © 2014 Elsevier B.V. All rights reserved.
Groundwater Quality: Analysis of Its Temporal and Spatial Variability in a Karst Aquifer.
Pacheco Castro, Roger; Pacheco Ávila, Julia; Ye, Ming; Cabrera Sansores, Armando
2018-01-01
This study develops an approach based on hierarchical cluster analysis for investigating the spatial and temporal variation of water quality governing processes. The water quality data used in this study were collected in the karst aquifer of Yucatan, Mexico, the only source of drinking water for a population of nearly two million people. Hierarchical cluster analysis was applied to the quality data of all the sampling periods lumped together. This was motivated by the observation that, if water quality does not vary significantly in time, two samples from the same sampling site will belong to the same cluster. The resulting distribution maps of clusters and box-plots of the major chemical components reveal the spatial and temporal variability of groundwater quality. Principal component analysis was used to verify the results of cluster analysis and to derive the variables that explained most of the variation of the groundwater quality data. Results of this work increase the knowledge about how precipitation and human contamination impact groundwater quality in Yucatan. Spatial variability of groundwater quality in the study area is caused by: a) seawater intrusion and groundwater rich in sulfates at the west and in the coast, b) water rock interactions and the average annual precipitation at the middle and east zones respectively, and c) human contamination present in two localized zones. Changes in the amount and distribution of precipitation cause temporal variation by diluting groundwater in the aquifer. This approach allows the variation of groundwater quality controlling processes to be analyzed efficiently and simultaneously. © 2017, National Ground Water Association.
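The reasoning behind lumping all sampling periods together can be made concrete with a small sketch: once every (site, period) sample has a cluster label, a site whose samples all share one label shows no detectable temporal variation, while a site whose label changes does. The record layout, the function name, and the toy labels below are assumptions for illustration and are not taken from the study.

```python
from collections import defaultdict


def temporally_stable_sites(assignments):
    """Flag sites whose samples fall in a single cluster across all periods.

    assignments: iterable of (site, period, cluster_label) records, as would be
    produced by clustering the lumped data (hypothetical layout).
    Returns {site: True if every sample of that site has the same label}.
    """
    labels_by_site = defaultdict(set)
    for site, _period, label in assignments:
        labels_by_site[site].add(label)
    return {site: len(labels) == 1 for site, labels in labels_by_site.items()}


if __name__ == "__main__":
    toy = [
        ("well_A", "dry", 1), ("well_A", "wet", 1),
        ("well_B", "dry", 1), ("well_B", "wet", 2),
    ]
    print(temporally_stable_sites(toy))
```

Sites flagged as unstable are the ones where a map of cluster membership per sampling period would show a change over time.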
NGC 346: Looking in the Cradle of a Massive Star Cluster
NASA Astrophysics Data System (ADS)
Gouliermis, Dimitrios A.; Hony, Sacha
2017-03-01
How does a star cluster of more than a few 10,000 solar masses form? We present the case of the cluster NGC 346 in the Small Magellanic Cloud, still embedded in its natal star-forming region N66, and we propose a scenario for its formation, based on observations of the rich stellar populations in the region. Young massive clusters host a high fraction of early-type stars, indicating an extremely high star formation efficiency. The Milky Way galaxy hosts several young massive clusters that fill the gap between young low-mass open clusters and old massive globular clusters. Only a handful, though, are young enough to study their formation. Moreover, the investigation of their gaseous natal environments suffers from contamination by the Galactic disk. Young massive clusters are very abundant in distant starburst and interacting galaxies, but the distances of their host galaxies do not allow a detailed analysis of their formation. The Magellanic Clouds, on the other hand, host young massive clusters in a wide range of ages, with the youngest still embedded in their giant HII regions. Hubble Space Telescope imaging of such star-forming complexes provides a stellar sampling with a high dynamic range in stellar masses, allowing the detailed study of star formation at scales typical for molecular clouds. Our cluster analysis on the distribution of newly-born stars in N66 shows that star formation in the region proceeds in a clumpy hierarchical fashion, leading to the formation of both a dominant young massive cluster, hosting about half of the observed pre-main-sequence population, and a self-similar dispersed distribution of the remaining stars. We investigate the correlation between stellar surface density (and star formation rate derived from star-counts) and molecular gas surface density (derived from dust column density) in order to unravel the physical conditions that gave birth to NGC 346.
A power law fit to the data yields a steep correlation between these two parameters with a considerable scatter. The fraction of stellar over the total (gas plus young stars) mass is found to be systematically higher within the central 15 pc (where the young massive cluster is located) than outside, which suggests variations in the star formation efficiency within the same star-forming complex. This trend possibly reflects a change of star formation efficiency in N66 between clustered and non-clustered star formation. Our findings suggest that the formation of NGC 346 is the combined result of star formation regulated by turbulence and of early dynamical evolution induced by the gravitational potential of the dense interstellar medium.
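A power law fit of one surface density against another is often done as a straight-line fit in log-log space. The sketch below shows that approach with ordinary least squares; the abstract does not say which fitting method was used or how the scatter and measurement uncertainties were treated, so this is only an illustration, and the function name and sample values are hypothetical.

```python
import math


def fit_power_law(x, y):
    """Fit y = a * x**n by ordinary least squares on (log x, log y).

    x, y: sequences of positive numbers of equal length.
    Returns (a, n), the normalisation and the power-law index.
    """
    log_x = [math.log(v) for v in x]
    log_y = [math.log(v) for v in y]
    mean_x = sum(log_x) / len(log_x)
    mean_y = sum(log_y) / len(log_y)
    # Slope of the regression line in log-log space is the power-law index.
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(log_x, log_y))
             / sum((a - mean_x) ** 2 for a in log_x))
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), slope


if __name__ == "__main__":
    gas = [1.0, 2.0, 4.0, 8.0]            # illustrative values only
    stars = [3.0 * g ** 2 for g in gas]   # exact power law with index 2
    print(fit_power_law(gas, stars))
```

A large slope in such a fit corresponds to the steep correlation described above, and the residuals around the line measure its scatter.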
Monitoring Fatigue Status with HRV Measures in Elite Athletes: An Avenue Beyond RMSSD?
Schmitt, Laurent; Regnard, Jacques; Millet, Grégoire P
2015-01-01
Among the tools proposed to assess the athlete's "fatigue," the analysis of heart rate variability (HRV) provides an indirect evaluation of the settings of autonomic control of heart activity. HRV analysis is performed through assessment of time-domain indices. The square root of the mean of the sum of the squares of differences between adjacent normal R-R intervals (RMSSD), measured during short (5 min) recordings in supine position upon awakening in the morning, and particularly the logarithm of RMSSD (LnRMSSD), has been proposed as the most useful resting HRV indicator. However, while RMSSD can help the practitioner to identify a global "fatigue" level, it does not allow discriminating different types of fatigue. Recent results using spectral HRV analysis highlighted firstly that HRV profiles assessed in supine and standing positions are independent and complementary, and secondly that using these postural profiles allows the clustering of distinct sub-categories of "fatigue." Since cardiovascular control settings are different in standing and lying postures, using the HRV figures of both postures to cluster fatigue states embeds information on the dynamics of control responses. As such, HRV spectral analysis appears more sensitive and enlightening than time-domain HRV indices. The richer information provided by this spectral analysis should improve the monitoring of the adaptive training-recovery process in athletes.
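The RMSSD definition given above translates directly into a few lines of code. The sketch below assumes the R-R intervals are supplied in milliseconds and have already been cleaned of ectopic or artefact beats; the function names and the sample values are illustrative only.

```python
import math


def rmssd(rr_ms):
    """Root mean square of successive differences of normal R-R intervals (ms)."""
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def ln_rmssd(rr_ms):
    """Natural logarithm of RMSSD (LnRMSSD)."""
    return math.log(rmssd(rr_ms))


if __name__ == "__main__":
    rr = [800, 810, 790, 800]  # illustrative R-R intervals in ms
    print(rmssd(rr), ln_rmssd(rr))
```

Because RMSSD collapses a recording into one number, two recordings with different underlying profiles can share the same value, which is the limitation the abstract raises.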
2-Way k-Means as a Model for Microbiome Samples.
Jackson, Weston J; Agarwal, Ipsita; Pe'er, Itsik
2017-01-01
Motivation. Microbiome sequencing allows defining clusters of samples with shared composition. However, this paradigm poorly accounts for samples whose composition is a mixture of cluster-characterizing ones and which therefore lie in between them in the cluster space. This paper addresses unsupervised learning of 2-way clusters. It defines a mixture model that allows 2-way cluster assignment and describes a variant of generalized k-means for learning such a model. We demonstrate applicability to microbial 16S rDNA sequencing data from the Human Vaginal Microbiome Project. PMID:29177026
NASA Astrophysics Data System (ADS)
Georgiadis, A.; Berg, S.; Makurat, A.; Maitland, G.; Ott, H.
2013-09-01
We investigated the cluster-size distribution of the residual nonwetting phase in a sintered glass-bead porous medium at two-phase flow conditions, by means of micro-computed-tomography (μCT) imaging with pore-scale resolution. Cluster-size distribution functions and cluster volumes were obtained by image analysis for a range of injected pore volumes under both imbibition and drainage conditions; the field of view was larger than the porosity-based representative elementary volume (REV). We did not attempt to make a definition for a two-phase REV but used the nonwetting-phase cluster-size distribution as an indicator. Most of the nonwetting-phase total volume was found to be contained in clusters that were one to two orders of magnitude larger than the porosity-based REV. The largest observed clusters in fact ranged in volume from 65% to 99% of the entire nonwetting phase in the field of view. As a consequence, the largest clusters observed were statistically not represented and were found to be smaller than the estimated maximum cluster length. The results indicate that the two-phase REV is larger than the field of view attainable by μCT scanning, at a resolution which allows for the accurate determination of cluster connectivity.
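A cluster-size distribution of this kind is typically obtained by labelling connected components in a segmented image. The sketch below does this for a small two-dimensional binary grid with 4-connectivity using a breadth-first flood fill; the actual study worked on three-dimensional μCT volumes, and its segmentation and connectivity rules are not given in the abstract, so the grid, the connectivity choice, and the function name are assumptions.

```python
from collections import deque


def cluster_sizes(grid):
    """Sizes of 4-connected clusters of truthy cells, largest first.

    grid: list of equal-length rows; truthy cells mark the nonwetting phase.
    """
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    sizes = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] and not seen[r][c]:
                # Flood-fill one cluster starting from this unvisited cell.
                seen[r][c] = True
                queue = deque([(r, c)])
                size = 0
                while queue:
                    y, x = queue.popleft()
                    size += 1
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and grid[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                sizes.append(size)
    return sorted(sizes, reverse=True)


if __name__ == "__main__":
    image = [
        [1, 1, 0, 0],
        [1, 1, 0, 1],
        [0, 0, 0, 1],
        [1, 0, 0, 1],
    ]
    sizes = cluster_sizes(image)
    print(sizes, "largest fraction:", sizes[0] / sum(sizes))
```

The ratio of the largest cluster to the total phase volume is the quantity the abstract reports as ranging from 65% to 99%.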
Software system for data management and distributed processing of multichannel biomedical signals.
Franaszczuk, P J; Jouny, C C
2004-01-01
The presented software is designed for efficient utilization of a cluster of PC computers for signal analysis of multichannel physiological data. The system consists of three main components: 1) a library of input and output procedures, 2) a database storing additional information about location in a storage system, 3) a user interface for selecting data for analysis, choosing programs for analysis, and distributing computing and output data on cluster nodes. The system allows for processing multichannel time series data in multiple binary formats. The description of data format, channels and time of recording is included in separate text files. Definition and selection of multiple channel montages is possible. Epochs for analysis can be selected both manually and automatically. Implementation of new signal processing procedures is possible with minimal programming overhead for the input/output processing and user interface. The number of nodes in the cluster used for computations and the amount of storage can be changed with no major modification to the software. Current implementations include the time-frequency analysis of multiday, multichannel recordings of intracranial EEG of epileptic patients as well as evoked response analyses of repeated cognitive tasks.
Detection and Characterization of Galaxy Systems at Intermediate Redshift.
NASA Astrophysics Data System (ADS)
Barrena, Rafael
2004-11-01
This thesis is divided into two closely related parts. In the first part we implement and apply a galaxy cluster detection method based on multiband observations in the visible. For this purpose, we use a new algorithm, the Voronoi Galaxy Cluster Finder, which identifies overdensities over a Poissonian field of objects. By applying this algorithm over four photometric bands (B, V, R and I) we reduce the possibility of detecting galaxy projection effects and spurious detections instead of real galaxy clusters. The B, V, R and I photometry allows a good characterization of galaxy systems. Therefore, we analyze the colour and early-type sequences in the colour-magnitude diagrams of the detected clusters. This analysis helps us to confirm the selected candidates as actual galaxy systems. In addition, by comparing observational early-type sequences with a semiempirical model we can estimate a photometric redshift for the detected clusters. We will apply this detection method to four 0.5x0.5 square degree areas that partially overlap the Postman Distant Cluster Survey (PDCS). The observations were performed as part of the International Time Programme 1999-B using the Wide Field Camera mounted at the Isaac Newton Telescope (Roque de los Muchachos Observatory, La Palma island, Spain). The B and R data obtained were completed with V and I photometry performed by Marc Postman. The comparison of our cluster catalogue with that of PDCS reveals that our work is a clear improvement in cluster detection techniques. Our method efficiently selects galaxy clusters, in particular low mass galaxy systems, even at relatively high redshift, and estimates a precise photometric redshift. The validation of our method comes from spectroscopic observations of several selected candidates.
By comparing photometric and spectroscopic redshifts we conclude: 1) our photometric estimation method gives a precision better than 0.1; 2) our detection technique is even able to detect galaxy systems at z~0.7 using visible photometric bands. In the second part of this thesis we analyze in detail the dynamical state of 1E0657-56 (z=0.296), a hot galaxy cluster with strong X-ray and radio emissions. Using spectroscopic and photometric observations in the visible (obtained with the New Technology Telescope and the Very Large Telescope, both located at La Silla Observatory, Chile) we analyze the velocity field, morphology, colour and star formation in the galaxy population of this cluster. 1E0657-56 is involved in a collision event. We identify the substructure involved in this collision and we propose a dynamical model that allows us to investigate the origins of X-ray and radio emissions and the relation between them. The analysis of 1E0657-56 presented in this thesis constitutes a good example of the kind of properties that could be studied in some of the clusters catalogued in the first part of this thesis. In addition, the detailed analysis of this cluster represents an improvement in the study of the origin of X-ray and radio emissions and merging processes in galaxy clusters.
NASA Astrophysics Data System (ADS)
Miyazaki, Satoshi; Oguri, Masamune; Hamana, Takashi; Shirasaki, Masato; Koike, Michitaro; Komiyama, Yutaka; Umetsu, Keiichi; Utsumi, Yousuke; Okabe, Nobuhiro; More, Surhud; Medezinski, Elinor; Lin, Yen-Ting; Miyatake, Hironao; Murayama, Hitoshi; Ota, Naomi; Mitsuishi, Ikuyuki
2018-01-01
We present the result of searching for clusters of galaxies based on weak gravitational lensing analysis of the ~160 deg² area surveyed by Hyper Suprime-Cam (HSC) as a Subaru Strategic Program. HSC is a new prime focus optical imager with a 1.5°-diameter field of view on the 8.2 m Subaru telescope. The superb median seeing on the HSC i-band images of 0.56" allows the reconstruction of high angular resolution mass maps via weak lensing, which is crucial for the weak lensing cluster search. We identify 65 mass map peaks with a signal-to-noise (S/N) ratio larger than 4.7, and carefully examine their properties by cross-matching the clusters with optical and X-ray cluster catalogs. We find that all the 39 peaks with S/N > 5.1 have counterparts in the optical cluster catalogs, and only 2 out of the 65 peaks are probably false positives. The upper limits of X-ray luminosities from the ROSAT All Sky Survey (RASS) imply the existence of an X-ray underluminous cluster population. We show that the X-rays from the shear-selected clusters can be statistically detected by stacking the RASS images. The inferred average X-ray luminosity is about half that of the X-ray-selected clusters of the same mass. The radial profile of the dark matter distribution derived from the stacking analysis is well modeled by the Navarro-Frenk-White profile with a small concentration parameter value of c500 ~ 2.5, which suggests that the selection bias on the orientation or the internal structure for our shear-selected cluster sample is not strong.
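For readers unfamiliar with the Navarro-Frenk-White (NFW) profile, the sketch below evaluates its standard density and enclosed-mass forms and shows how a concentration parameter relates an overdensity radius to the scale radius. The function names and the unit-free inputs are assumptions for illustration; the fitting procedure used in the stacking analysis is not reproduced here.

```python
import math


def nfw_density(r, rho_s, r_s):
    """NFW density: rho_s / ((r/r_s) * (1 + r/r_s)**2)."""
    x = r / r_s
    return rho_s / (x * (1.0 + x) ** 2)


def nfw_enclosed_mass(r, rho_s, r_s):
    """Mass inside radius r: 4*pi*rho_s*r_s**3 * (ln(1+x) - x/(1+x)), with x = r/r_s."""
    x = r / r_s
    return 4.0 * math.pi * rho_s * r_s ** 3 * (math.log(1.0 + x) - x / (1.0 + x))


def scale_radius(r_delta, concentration):
    """Scale radius from an overdensity radius and concentration c = r_delta / r_s."""
    return r_delta / concentration
```

A lower concentration means a larger scale radius relative to the overdensity radius, that is, a less centrally peaked mass distribution.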
Uchiyama, Ikuo; Mihara, Motohiro; Nishide, Hiroyo; Chiba, Hirokazu
2015-01-01
The microbial genome database for comparative analysis (MBGD) (available at http://mbgd.genome.ad.jp/) is a comprehensive ortholog database for flexible comparative analysis of microbial genomes, where the users are allowed to create an ortholog table among any specified set of organisms. Because of the rapid increase in microbial genome data owing to the next-generation sequencing technology, it becomes increasingly challenging to maintain high-quality orthology relationships while allowing the users to incorporate the latest genomic data available into an analysis. Because many of the recently accumulating genomic data are draft genome sequences for which some complete genome sequences of the same or closely related species are available, MBGD now stores draft genome data and allows the users to incorporate them into a user-specific ortholog database using the MyMBGD functionality. In this function, draft genome data are incorporated into an existing ortholog table created only from the complete genome data in an incremental manner to prevent low-quality draft data from affecting clustering results. In addition, to provide high-quality orthology relationships, the standard ortholog table containing all the representative genomes, which is first created by the rapid classification program DomClust, is now refined using DomRefine, a recently developed program for improving domain-level clustering using multiple sequence alignment information. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
Mass profile and dynamical status of the z ~ 0.8 galaxy cluster LCDCS 0504
NASA Astrophysics Data System (ADS)
Guennou, L.; Biviano, A.; Adami, C.; Limousin, M.; Lima Neto, G. B.; Mamon, G. A.; Ulmer, M. P.; Gavazzi, R.; Cypriano, E. S.; Durret, F.; Clowe, D.; LeBrun, V.; Allam, S.; Basa, S.; Benoist, C.; Cappi, A.; Halliday, C.; Ilbert, O.; Johnston, D.; Jullo, E.; Just, D.; Kubo, J. M.; Márquez, I.; Marshall, P.; Martinet, N.; Maurogordato, S.; Mazure, A.; Murphy, K. J.; Plana, H.; Rostagni, F.; Russeil, D.; Schirmer, M.; Schrabback, T.; Slezak, E.; Tucker, D.; Zaritsky, D.; Ziegler, B.
2014-06-01
Context. Constraints on the mass distribution in high-redshift clusters of galaxies are currently not very strong. Aims: We aim to constrain the mass profile, M(r), and dynamical status of the z ~ 0.8 LCDCS 0504 cluster of galaxies that is characterized by prominent giant gravitational arcs near its center. Methods: Our analysis is based on deep X-ray, optical, and infrared imaging as well as optical spectroscopy, collected with various instruments, which we complemented with archival data. We modeled the mass distribution of the cluster with three different mass density profiles, whose parameters were constrained by the strong lensing features of the inner cluster region, by the X-ray emission from the intracluster medium, and by the kinematics of 71 cluster members. Results: We obtain consistent M(r) determinations from three methods based on kinematics (dispersion-kurtosis, caustics, and MAMPOSSt), out to the cluster virial radius, ≃1.3 Mpc and beyond. The mass profile inferred by the strong lensing analysis in the central cluster region is slightly higher than, but still consistent with, the kinematics estimate. On the other hand, the X-ray based M(r) is significantly lower than the kinematics and strong lensing estimates. Theoretical predictions from ΛCDM cosmology for the concentration-mass relation agree with our observational results, when taking into account the uncertainties in the observational and theoretical estimates. There appears to be a central deficit in the intracluster gas mass fraction compared with nearby clusters. Conclusions: Despite the relaxed appearance of this cluster, the determinations of its mass profile by different probes show substantial discrepancies, the origin of which remains to be determined. 
The extension of a dynamical analysis similar to that of other clusters of the DAFT/FADA survey with multiwavelength data of sufficient quality will allow shedding light on the possible systematics that affect the determination of mass profiles of high-z clusters, which is possibly related to our incomplete understanding of intracluster baryon physics. Table 2 is available in electronic form at http://www.aanda.org
Far-infrared image restoration analysis of the protostellar cluster in S140
NASA Technical Reports Server (NTRS)
Lester, D. F.; Harvey, P. M.; Joy, M.; Ellis, H. B., Jr.
1986-01-01
Image restoration techniques are applied to one-dimensional scans at 50 and 100 microns of the protostellar cluster in S140. These measurements resolve the surrounding nebula clearly, and Fourier methods are used to match the effective beam profiles at these wavelengths. This allows the radial distribution of temperature and dust column density to be derived at a diffraction limited spatial resolution of 23 arcsec (0.1 pc). Evidence for heating of the S140 molecular cloud by a nearby ionization front is established, and the dissociation of molecules inside the ionization front is spatially well correlated with the heating of the dust. The far-infrared spectral distribution of the three near-infrared sources within 10 arcsec of the cluster center is presented.
de la Torre, E; Tello, M; Mateu, E M; Torre, E
2005-11-01
Classical biotyping characterizes strains by creating biotype profiles that consider only positive and negative results for a predefined set of biochemical tests. This method allows Salmonella subspecies to be distinguished but does not allow serotypes and phage types to be distinguished. The objective of this study was to determine the relatedness of isolates belonging to distinct Salmonella enterica subsp. enterica serotypes by using a refined biotyping process that considers the kinetics at which biochemical reactions take place. Using a Vitek GNI+ card for the identification of gram-negative organisms, we determined the biochemical kinetic reactions (28 biochemical tests) of 135 Salmonella enterica subsp. enterica strains of pig origin collected in Spain from 1997 to 2002 (59 Salmonella serotype Typhimurium strains, 25 Salmonella serotype Typhimurium monophasic variant strains, 25 Salmonella serotype Anatum strains, 12 Salmonella serotype Tilburg strains, 7 Salmonella serotype Virchow strains, 6 Salmonella serotype Choleraesuis strains, and 1 Salmonella enterica serotype 4,5,12:-:- strain). The results were expressed as the colorimetric and turbidimetric changes (in percent) and were used to enhance the classical biotype profile by adding kinetic categories. A hierarchical cluster analysis was performed by using the enhanced profiles and resulted in 14 clusters. Six major clusters grouped 94% of all isolates with a similarity of ≥95% within any given cluster, and eight clusters contained a single isolate. The six major clusters grouped not only serotypes of the same type but also phenotypic serotype variations into individual clusters. This suggests that metabolic kinetic reaction data from the biochemical tests commonly used for classic Salmonella enterica subsp. enterica biotyping can possibly be used to determine the relatedness between isolates in an easy and timely manner.
Chalmet, Kristen; Staelens, Delfien; Blot, Stijn; Dinakis, Sylvie; Pelgrom, Jolanda; Plum, Jean; Vogelaers, Dirk; Vandekerckhove, Linos; Verhofstede, Chris
2010-09-07
The number of HIV-1 infected individuals in the Western world continues to rise. More in-depth understanding of regional HIV-1 epidemics is necessary for the optimal design and adequate use of future prevention strategies. The use of a combination of phylogenetic analysis of HIV sequences, with data on patients' demographics, infection route, clinical information and laboratory results, will allow a better characterization of individuals responsible for local transmission. Baseline HIV-1 pol sequences, obtained through routine drug-resistance testing, from 506 patients, newly diagnosed between 2001 and 2009, were used to construct phylogenetic trees and identify transmission-clusters. Patients' demographics, laboratory and clinical data, were retrieved anonymously. Statistical analysis was performed to identify subtype-specific and transmission-cluster-specific characteristics. Multivariate analysis showed significant differences between the 59.7% of individuals with subtype B infection and the 40.3% non-B infected individuals, with regard to route of transmission, origin, infection with Chlamydia (p = 0.01) and infection with Hepatitis C virus (p = 0.017). More and larger transmission-clusters were identified among the subtype B infections (p < 0.001). Overall, in multivariate analysis, clustering was significantly associated with Caucasian origin, infection through homosexual contact and younger age (all p < 0.001). Bivariate analysis additionally showed a correlation between clustering and syphilis (p < 0.001), higher CD4 counts (p = 0.002), Chlamydia infection (p = 0.013) and primary HIV (p = 0.017). Combination of phylogenetics with demographic information, laboratory and clinical data, revealed that HIV-1 subtype B infected Caucasian men-who-have-sex-with-men with high prevalence of sexually transmitted diseases, account for the majority of local HIV-transmissions. 
This finding elucidates observed epidemiological trends through molecular analysis, and justifies sustained focus in prevention on this high risk group.
Tissue Gene Expression Analysis Using Arrayed Normalized cDNA Libraries
Eickhoff, Holger; Schuchhardt, Johannes; Ivanov, Igor; Meier-Ewert, Sebastian; O'Brien, John; Malik, Arif; Tandon, Neeraj; Wolski, Eryk-Witold; Rohlfs, Elke; Nyarsik, Lajos; Reinhardt, Richard; Nietfeld, Wilfried; Lehrach, Hans
2000-01-01
We have used oligonucleotide-fingerprinting data on 60,000 cDNA clones from two different mouse embryonic stages to establish a normalized cDNA clone set. The normalized set of 5,376 clones represents different clusters and therefore, in almost all cases, different genes. The inserts of the cDNA clones were amplified by PCR and spotted on glass slides. The resulting arrays were hybridized with mRNA probes prepared from six different adult mouse tissues. Expression profiles were analyzed by hierarchical clustering techniques. We have chosen radioactive detection because it combines robustness with sensitivity and allows the comparison of multiple normalized experiments. Sensitive detection combined with highly effective clustering algorithms allowed the identification of tissue-specific expression profiles and the detection of genes specifically expressed in the tissues investigated. The obtained results are publicly available (http://www.rzpd.de) and can be used by other researchers as a digital expression reference. [The sequence data described in this paper have been submitted to the EMBL data library under accession nos. AL360374–AL36537.] PMID:10958641
SIMS of organics—Advances in 2D and 3D imaging and future outlook
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gilmore, Ian S.
Secondary ion mass spectrometry (SIMS) has become a powerful technique for the label-free analysis of organics from cells to electronic devices. The development of cluster ion sources has revolutionized the field, increasing the sensitivity for organics by two or three orders of magnitude and for large clusters, such as C60 and argon clusters, allowing depth profiling of organics. The latter has provided the capability to generate stunning three dimensional images with depth resolutions of around 5 nm, simply unavailable by other techniques. Current state-of-the-art allows molecular images with a spatial resolution of around 500 nm to be achieved and future developments are likely to progress into the sub-100 nm regime. This review is intended to bring those with some familiarity with SIMS up-to-date with the latest developments for organics, the fundamental principles that underpin this and define the future progress. State-of-the-art examples are showcased and signposts to more in-depth reviews about specific topics given for the specialist.
Dinov, Martin; Leech, Robert
2017-01-01
Part of the process of EEG microstate estimation involves clustering EEG channel data at the global field power (GFP) maxima, very commonly using a modified K-means approach. Clustering has also been done deterministically, despite there being uncertainties in multiple stages of the microstate analysis, including the GFP peak definition, the clustering itself and in the post-clustering assignment of microstates back onto the EEG timecourse of interest. We perform a fully probabilistic microstate clustering and labeling, to account for these sources of uncertainty using the closest probabilistic analog to KM called Fuzzy C-means (FCM). We train softmax multi-layer perceptrons (MLPs) using the KM and FCM-inferred cluster assignments as target labels, to then allow for probabilistic labeling of the full EEG data instead of the usual correlation-based deterministic microstate label assignment typically used. We assess the merits of the probabilistic analysis vs. the deterministic approaches in EEG data recorded while participants perform real or imagined motor movements from a publicly available data set of 109 subjects. Though FCM group template maps that are almost topographically identical to KM were found, there is considerable uncertainty in the subsequent assignment of microstate labels. In general, imagined motor movements are less predictable on a time point-by-time point basis, possibly reflecting the more exploratory nature of the brain state during imagined, compared to during real motor movements. We find that some relationships may be more evident using FCM than using KM and propose that future microstate analysis should preferably be performed probabilistically rather than deterministically, especially in situations such as with brain computer interfaces, where both training and applying models of microstates need to account for uncertainty. 
Probabilistic neural network-driven microstate assignment has a number of advantages that we have discussed, which are likely to be further developed and exploited in future studies. In conclusion, probabilistic clustering and a probabilistic neural network-driven approach to microstate analysis is likely to better model and reveal details and the variability hidden in current deterministic and binarized microstate assignment and analyses. PMID:29163110
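The contrast between deterministic K-means labels and Fuzzy C-means (FCM) memberships can be made concrete with the standard FCM membership formula, in which the membership of a point in cluster i is 1 / sum_k (d_i / d_k)^(2/(m-1)). The sketch below is a generic illustration with hypothetical function names and a default fuzzifier m = 2; it is not the authors' microstate pipeline and does not include the MLP labeling step.

```python
import math


def fcm_memberships(x, centroids, m=2.0):
    """Standard FCM memberships of point x in each cluster (they sum to 1)."""
    dists = [math.dist(x, c) for c in centroids]
    zeros = [d == 0.0 for d in dists]
    if any(zeros):
        # x coincides with one or more centroids: split membership among them.
        n = sum(zeros)
        return [1.0 / n if z else 0.0 for z in zeros]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dk) ** power for dk in dists) for di in dists]


def hard_label(memberships):
    """Deterministic (K-means style) label: index of the largest membership."""
    return max(range(len(memberships)), key=memberships.__getitem__)
```

For a point at distance 1 from one centroid and 2 from another, with m = 2, the memberships are 0.8 and 0.2; the hard label keeps only the first cluster and discards the remaining uncertainty.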
Fokkema, M; Smits, N; Zeileis, A; Hothorn, T; Kelderman, H
2017-10-25
Identification of subgroups of patients for whom treatment A is more effective than treatment B, and vice versa, is of key importance to the development of personalized medicine. Tree-based algorithms are helpful tools for the detection of such interactions, but none of the available algorithms allow for taking into account clustered or nested dataset structures, which are particularly common in psychological research. Therefore, we propose the generalized linear mixed-effects model tree (GLMM tree) algorithm, which allows for the detection of treatment-subgroup interactions, while accounting for the clustered structure of a dataset. The algorithm uses model-based recursive partitioning to detect treatment-subgroup interactions, and a GLMM to estimate the random-effects parameters. In a simulation study, GLMM trees show higher accuracy in recovering treatment-subgroup interactions, higher predictive accuracy, and lower type II error rates than linear-model-based recursive partitioning and mixed-effects regression trees. Also, GLMM trees show somewhat higher predictive accuracy than linear mixed-effects models with pre-specified interaction effects, on average. We illustrate the application of GLMM trees on an individual patient-level data meta-analysis on treatments for depression. We conclude that GLMM trees are a promising exploratory tool for the detection of treatment-subgroup interactions in clustered datasets.
Methods for sample size determination in cluster randomized trials
Rutterford, Clare; Copas, Andrew; Eldridge, Sandra
2015-01-01
Background: The use of cluster randomized trials (CRTs) is increasing, along with the variety in their design and analysis. The simplest approach for their sample size calculation is to calculate the sample size assuming individual randomization and inflate this by a design effect to account for randomization by cluster. The assumptions of a simple design effect may not always be met; alternative or more complicated approaches are required. Methods: We summarise a wide range of sample size methods available for cluster randomized trials. For those familiar with sample size calculations for individually randomized trials but with less experience in the clustered case, this manuscript provides formulae for a wide range of scenarios with associated explanation and recommendations. For those with more experience, comprehensive summaries are provided that allow quick identification of methods for a given design, outcome and analysis method. Results: We present first those methods applicable to the simplest two-arm, parallel group, completely randomized design followed by methods that incorporate deviations from this design such as: variability in cluster sizes; attrition; non-compliance; or the inclusion of baseline covariates or repeated measures. The paper concludes with methods for alternative designs. Conclusions: There is a large amount of methodology available for sample size calculations in CRTs. This paper gives the most comprehensive description of published methodology for sample size calculation and provides an important resource for those designing these trials. PMID:26174515
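The simplest approach mentioned above, inflating an individually randomized sample size by a design effect, is usually written as DE = 1 + (m - 1) * ICC for equal cluster sizes m and intracluster correlation coefficient ICC. The sketch below applies that formula; the function names and the example numbers are illustrative assumptions, and it does not cover the extensions (variable cluster sizes, attrition, covariates) that the paper reviews.

```python
import math


def design_effect(cluster_size, icc):
    """Simple design effect for equal cluster sizes: 1 + (m - 1) * ICC."""
    return 1.0 + (cluster_size - 1) * icc


def inflate_sample_size(n_individual, cluster_size, icc):
    """Return (participants, clusters) per arm after applying the design effect.

    n_individual is the per-arm sample size assuming individual randomization.
    """
    inflated = n_individual * design_effect(cluster_size, icc)
    # Round before ceil so floating-point noise cannot add a spurious participant.
    n_total = math.ceil(round(inflated, 6))
    n_clusters = math.ceil(n_total / cluster_size)
    return n_total, n_clusters
```

With 200 participants per arm under individual randomization, clusters of 20, and an ICC of 0.05, the design effect is 1.95, giving 390 participants in 20 clusters per arm.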
Constraining the Mass of the Spectacular Pandora's Cluster, Abell 2744
NASA Astrophysics Data System (ADS)
Carrasco, Rodrigo; Frye, Brenda; Coe, Dan; Dupke, Renato; Merten, Julian; Sodre, Laerte; Massey, Richard; Braglia, Filberto; Cypriano, Eduardo; Zitrin, Adi; Krick, Jessica; Benitez, Narciso
2011-08-01
Violent cluster mergers provide a unique opportunity to study the interplay between dark matter (DM) and ICM and to set constraints on the nature of DM. In particular, cluster mergers near first core passage allow us to "see" DM by comparing the spatial distribution of the intra-cluster gas (baryonic) to that of DM. We have recently finished a lensing analysis of the particularly interesting merging system, A2744, the Pandora cluster. We found that it is the result of a spectacular merging event, significantly more complex than the "Bullet Cluster", that produced a wide variety of new phenomenologies, among them, a Bullet, a Dark sub-cluster (no gas), a Ghost sub-cluster (no DM), which can provide fundamental insights to the physics of the ICM, and begs further observations. Our analyses revealed 34 arcs produced by strong gravitational lensing, none of which had been published to date. Spectroscopic redshifts of these arcs are essential to determine precise masses of the main merging system providing crucial information for further numerical simulations and to set stronger constraints on the DM self-interaction cross-section. Therefore we are requesting 17.2 hours on Gemini+GMOS-S, primarily to obtain spectroscopic redshifts of multiply strongly lensed arcs produced by this impressive cluster.
Le Vu, Stéphane; Ratmann, Oliver; Delpech, Valerie; Brown, Alison E; Gill, O Noel; Tostevin, Anna; Fraser, Christophe; Volz, Erik M
2018-06-01
Phylogenetic clustering of HIV sequences from a random sample of patients can reveal epidemiological transmission patterns, but interpretation is hampered by limited theoretical support, and the statistical properties of clustering analysis remain poorly understood. Alternatively, source attribution methods allow fitting of HIV transmission models and thereby quantify aspects of disease transmission. A simulation study was conducted to assess error rates of clustering methods for detecting transmission risk factors. We modeled HIV epidemics among men having sex with men and generated phylogenies comparable to those that can be obtained from HIV surveillance data in the UK. Clustering and source attribution approaches were applied to evaluate their ability to identify patient attributes as transmission risk factors. We find that commonly used methods show a misleading association between cluster size or odds of clustering and covariates that are correlated with time since infection, regardless of their influence on transmission. Clustering methods usually have higher error rates and lower sensitivity than the source attribution method for identifying transmission risk factors. But neither method provides robust estimates of transmission risk ratios. The source attribution method can alleviate drawbacks from phylogenetic clustering, but formal population genetic modeling may be required to estimate quantitative transmission risk factors. Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.
Py, Béatrice; Barras, Frédéric
2015-06-01
Since their discovery in the 50's, Fe-S cluster proteins have attracted much attention from chemists, biophysicists and biochemists. However, in the 80's they were joined by geneticists who helped to realize that in vivo maturation of Fe-S cluster bound proteins required assistance of a large number of factors defining complex multi-step pathways. The question of how clusters are formed and distributed in vivo has since been the focus of much effort. Here we review how genetics in discovering genes and investigating processes as they unfold in vivo has provoked seminal advances toward our understanding of Fe-S cluster biogenesis. The power and limitations of genetic approaches are discussed. As a final comment, we argue how the marriage of classic strategies and new high-throughput technologies should allow genetics of Fe-S cluster biology to be even more insightful in the future. This article is part of a Special Issue entitled: Fe/S proteins: Analysis, structure, function, biogenesis and diseases. Copyright © 2015 Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Hedlund, Anne; Sandquist, Eric L.; Arentoft, Torben; Brogaard, Karsten; Grundahl, Frank; Stello, Dennis; Bedin, Luigi R.; Libralato, Mattia; Malavolta, Luca; Nardiello, Domenico; Molenda-Zakowicz, Joanna; Vanderburg, Andrew
2018-06-01
V1178 Tau is a double-lined spectroscopic eclipsing binary in NGC1817, one of the more massive clusters observed in the K2 mission. We have determined the orbital period (P = 2.20 d) for the first time, and we model radial velocity measurements from the HARPS and ALFOSC spectrographs, light curves collected by Kepler, and ground based light curves using the Eclipsing Light Curve code (ELC, Orosz & Hauschildt 2000). We present masses and radii for the stars in the binary, allowing for a reddening-independent means of determining the cluster age. V1178 Tau is particularly useful for calculating the age of the cluster because the stars are close to the cluster turnoff, providing a more precise age determination. Furthermore, because one of the stars in the binary is a delta Scuti variable, the analysis provides improved insight into their pulsations.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rusek, Marian; Orlowski, Arkadiusz
2005-04-01
The dynamics of small (≤55 atoms) argon clusters ionized by an intense femtosecond laser pulse is studied using a time-dependent Thomas-Fermi model. The resulting Bloch-like hydrodynamic equations are solved numerically using the smooth particle hydrodynamics method without the necessity of grid simulations. As follows from recent experiments, absorption of radiation and subsequent ionization of clusters observed in the short-wavelength laser frequency regime (98 nm) differs considerably from that in the optical spectral range (800 nm). Our theoretical approach provides a unified framework for treating these very different frequency regimes and allows for a deeper understanding of the underlying cluster explosion mechanisms. The results of our analysis following from extensive numerical simulations presented in this paper are compared both with experimental findings and with predictions of other theoretical models.
Comparative genomic analysis by microbial COGs self-attraction rate.
Santoni, Daniele; Romano-Spica, Vincenzo
2009-06-21
Whole genome analysis provides new perspectives to determine phylogenetic relationships among microorganisms. The availability of whole nucleotide sequences allows different levels of comparison among genomes by several approaches. In this work, self-attraction rates were considered for each cluster of orthologous groups of proteins (COGs) class in order to analyse gene aggregation levels in physical maps. Phylogenetic relationships among microorganisms were obtained by comparing self-attraction coefficients. Eighteen-dimensional vectors were computed for a set of 168 completely sequenced microbial genomes (19 archaea, 149 bacteria). The components of the vector represent the aggregation rate of the genes belonging to each of 18 COGs classes. Genes involved in nonessential functions or related to environmental conditions showed the highest aggregation rates. By contrast, genes involved in basic cellular tasks showed a more uniform distribution along the genome, except for translation genes. The self-attraction clustering approach allowed classification of Proteobacteria, Bacilli and other species belonging to Firmicutes. Rearrangement and Lateral Gene Transfer events may influence divergences from classical taxonomy. Each set of COG classes' aggregation values represents an intrinsic property of the microbial genome. This novel approach provides a new point of view for whole genome analysis and bacterial characterization.
NASA Astrophysics Data System (ADS)
Lamb, Derek A.
2016-10-01
While sunspots follow a well-defined pattern of emergence in space and time, small-scale flux emergence is assumed to occur randomly at all times in the quiet Sun. HMI's full-disk coverage, high cadence, spatial resolution, and duty cycle allow us to probe that basic assumption. Some case studies of emergence suggest that temporal clustering on spatial scales of 50-150 Mm may occur. If clustering is present, it could serve as a diagnostic of large-scale subsurface magnetic field structures. We present the results of a manual survey of small-scale flux emergence events over a short time period, and a statistical analysis addressing the question of whether these events show spatio-temporal behavior that is anything other than random.
MIXREG: a computer program for mixed-effects regression analysis with autocorrelated errors.
Hedeker, D; Gibbons, R D
1996-05-01
MIXREG is a program that provides estimates for a mixed-effects regression model (MRM) for normally-distributed response data including autocorrelated errors. This model can be used for analysis of unbalanced longitudinal data, where individuals may be measured at a different number of timepoints, or even at different timepoints. Autocorrelated errors of a general form or following an AR(1), MA(1), or ARMA(1,1) form are allowable. This model can also be used for analysis of clustered data, where the mixed-effects model assumes data within clusters are dependent. The degree of dependency is estimated jointly with estimates of the usual model parameters, thus adjusting for clustering. MIXREG uses maximum marginal likelihood estimation, utilizing both the EM algorithm and a Fisher-scoring solution. For the scoring solution, the covariance matrix of the random effects is expressed in its Gaussian decomposition, and the diagonal matrix reparameterized using the exponential transformation. Estimation of the individual random effects is accomplished using an empirical Bayes approach. Examples illustrating usage and features of MIXREG are provided.
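As a small illustration of the AR(1) error structure mentioned above, the following Python sketch builds the error covariance matrix for one subject measured at unequally spaced waves. It is not MIXREG code and does not reproduce MIXREG's parameterization; the function name `ar1_covariance` and the parameters `sigma2` (marginal error variance) and `rho` (lag-one autocorrelation) are assumptions made for this example.

```python
def ar1_covariance(timepoints, sigma2, rho):
    """Covariance of stationary AR(1) errors at the given integer timepoints.

    Assumed form for illustration: Cov(e_t, e_s) = sigma2 * rho ** |t - s|.
    """
    return [[sigma2 * rho ** abs(t - s) for s in timepoints] for t in timepoints]


# Hypothetical subject observed at waves 0, 1 and 3 (wave 2 missing),
# i.e. the kind of unbalanced longitudinal record the abstract describes.
cov = ar1_covariance([0, 1, 3], sigma2=2.0, rho=0.5)
for row in cov:
    print(row)
# prints:
# [2.0, 1.0, 0.25]
# [1.0, 2.0, 0.5]
# [0.25, 0.5, 2.0]
```

Because the matrix is indexed by the actual observation times, subjects with different numbers of timepoints simply get matrices of different sizes.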
Precision growth index using the clustering of cosmic structures and growth data
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pouri, Athina; Basilakos, Spyros; Plionis, Manolis, E-mail: athpouri@phys.uoa.gr, E-mail: svasil@academyofathens.gr, E-mail: mplionis@physics.auth.gr
2014-08-01
We use the clustering properties of Luminous Red Galaxies (LRGs) and the growth rate data provided by the various galaxy surveys in order to constrain the growth index (γ) of the linear matter fluctuations. We perform a standard χ²-minimization procedure between theoretical expectations and data, followed by a joint likelihood analysis, and we find a value of γ = 0.56 ± 0.05, perfectly consistent with the expectations of the ΛCDM model, and Ω_m0 = 0.29 ± 0.01, in very good agreement with the latest Planck results. Our analysis provides significantly more stringent growth index constraints with respect to previous studies, as indicated by the fact that the corresponding uncertainty is only ∼0.09γ. Finally, allowing γ to vary with redshift in two manners (Taylor expansion around z = 0, and Taylor expansion around the scale factor), we find that the combined statistical analysis between our clustering and literature growth data alleviates the degeneracy and yields more stringent constraints with respect to other recent studies.
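The χ² minimization described above can be sketched in Python as a one-parameter grid search. This is only a toy under stated assumptions: it uses the common approximation f(z) ≈ Ω_m(z)^γ for a flat ΛCDM background, and the "data" are noiseless synthetic points generated inside the script, not the LRG clustering or survey growth measurements used by the authors. All function and variable names are hypothetical.

```python
def omega_m(z, om0):
    """Matter density parameter at redshift z for a flat LCDM background."""
    a3 = (1.0 + z) ** 3
    return om0 * a3 / (om0 * a3 + 1.0 - om0)


def growth_rate(z, om0, gamma):
    """Growth rate under the approximation f(z) = Omega_m(z) ** gamma."""
    return omega_m(z, om0) ** gamma


def chi2(gamma, data, om0):
    """Sum of squared, error-weighted residuals for a trial growth index."""
    return sum(((f_obs - growth_rate(z, om0, gamma)) / err) ** 2
               for z, f_obs, err in data)


# Synthetic, noiseless data (assumption): generated with gamma = 0.55.
OM0 = 0.29
TRUE_GAMMA = 0.55
data = [(z, growth_rate(z, OM0, TRUE_GAMMA), 0.05) for z in (0.2, 0.4, 0.6, 0.8)]

# Grid search over gamma in [0.30, 0.90] with step 0.01.
grid = [0.30 + 0.01 * i for i in range(61)]
best_gamma = min(grid, key=lambda g: chi2(g, data, OM0))
print(round(best_gamma, 2))
```

A real analysis would fit Ω_m0 and γ jointly and propagate the measurement covariance; the grid search is used here only because it keeps the example dependency-free.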
Clustering Financial Time Series by Network Community Analysis
NASA Astrophysics Data System (ADS)
Piccardi, Carlo; Calatroni, Lisa; Bertoni, Fabio
In this paper, we describe a method for clustering financial time series which is based on community analysis, a recently developed approach for partitioning the nodes of a network (graph). A network with N nodes is associated to the set of N time series. The weight of the link (i, j), which quantifies the similarity between the two corresponding time series, is defined according to a metric based on symbolic time series analysis, which has recently proved effective in the context of financial time series. Then, searching for network communities allows one to identify groups of nodes (and then time series) with strong similarity. A quantitative assessment of the significance of the obtained partition is also provided. The method is applied to two distinct case-studies concerning the US and Italy Stock Exchange, respectively. In the US case, the stability of the partitions over time is also thoroughly investigated. The results favorably compare with those obtained with the standard tools typically used for clustering financial time series, such as the minimal spanning tree and the hierarchical tree.
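To make the idea of scoring a partition of a weighted similarity network concrete, here is a minimal Python sketch that computes Newman's weighted modularity Q. This is a swapped-in, widely used quality score chosen for illustration; the abstract does not say which significance measure the authors use. The toy network (two tightly linked triples of "series" joined by one weak link) and all names are hypothetical.

```python
def modularity(weights, communities):
    """Weighted modularity of a partition of an undirected graph.

    weights: dict mapping (i, j) -> link weight, each undirected link listed once,
             no self-loops.
    communities: dict mapping node -> community label.
    """
    strength = {}
    total = 0.0
    for (i, j), w in weights.items():
        strength[i] = strength.get(i, 0.0) + w
        strength[j] = strength.get(j, 0.0) + w
        total += w
    two_m = 2.0 * total

    # Fraction of total weight that falls inside communities.
    q = 0.0
    for (i, j), w in weights.items():
        if communities[i] == communities[j]:
            q += 2.0 * w / two_m

    # Subtract the fraction expected if links were placed at random
    # while preserving node strengths.
    comm_strength = {}
    for node, s in strength.items():
        label = communities[node]
        comm_strength[label] = comm_strength.get(label, 0.0) + s
    for s in comm_strength.values():
        q -= (s / two_m) ** 2
    return q


# Toy similarity network: two triangles (weight 1.0) joined by a weak link (0.1).
weights = {
    ("a", "b"): 1.0, ("a", "c"): 1.0, ("b", "c"): 1.0,
    ("d", "e"): 1.0, ("d", "f"): 1.0, ("e", "f"): 1.0,
    ("c", "d"): 0.1,
}
split = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
lumped = {node: 0 for node in split}

q_split = modularity(weights, split)
q_lumped = modularity(weights, lumped)
print(round(q_split, 4), round(q_lumped, 4))
```

The two-community partition scores well above the single-community one, which scores zero by construction.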
The differentiation of camel breeds based on meat measurements using discriminant analysis.
Al-Atiyat, Raed Mahmoud; Suliman, Gamal; AlSuhaibani, Entissar; El-Waziry, Ahmad; Al-Owaimer, Abdullah; Basmaeil, Saeid
2016-06-01
The meat productivity of camels in the tropics is still under investigation to identify a better meat breed or type. Therefore, four one-humped Saudi Arabian (SA) camel breeds, Majaheem, Maghateer, Hamrah, and Safrah, were studied in order to differentiate them from each other based on meat measurements. The measurements were biometrical meat traits measured on six intact males from each breed. The results showed higher values for the Majaheem breed than for the other breeds, except in a few cases such as dressing percentage and rib-eye area. In the differentiation analysis, the most discriminating meat variables were myofibrillar protein index, meat color components (L*, a*, and b*), and cooking loss. Consequently, the Safrah and the Majaheem breeds presented the largest dissimilarity, as evidenced by their multivariate means. The canonical discriminant analysis allowed an additional understanding of the differentiation between breeds. Furthermore, two large clusters emerged. One was formed by Hamrah and Maghateer grouped together with Safrah; this classification may place these breeds in one cluster because they are better as meat producers. The Majaheem was clustered alone in the other cluster, which might be a result of its being better as a milk producer. Nevertheless, the productivity type of the camel breeds of SA needs further morphological and genetic description.
Chemodynamical Clustering Applied to APOGEE Data: Rediscovering Globular Clusters
NASA Astrophysics Data System (ADS)
Chen, Boquan; D’Onghia, Elena; Pardy, Stephen A.; Pasquali, Anna; Bertelli Motta, Clio; Hanlon, Bret; Grebel, Eva K.
2018-06-01
We have developed a novel technique based on a clustering algorithm that searches for kinematically and chemically clustered stars in the APOGEE DR12 Cannon data. As compared to classical chemical tagging, the kinematic information included in our methodology allows us to identify stars that are members of known globular clusters with greater confidence. We apply our algorithm to the entire APOGEE catalog of 150,615 stars whose chemical abundances are derived by the Cannon. Our methodology found anticorrelations between the elements Al and Mg, Na and O, and C and N previously identified in the optical spectra in globular clusters, even though we omit these elements in our algorithm. Our algorithm identifies globular clusters without a priori knowledge of their locations in the sky. Thus, not only does this technique promise to discover new globular clusters, but it also allows us to identify candidate streams of kinematically and chemically clustered stars in the Milky Way.
NASA Astrophysics Data System (ADS)
Popescu, Bogdan; Hanson, M. M.; Elmegreen, Bruce G.
2012-06-01
We present new age and mass estimates for 920 stellar clusters in the Large Magellanic Cloud (LMC) based on previously published broadband photometry and the stellar cluster analysis package, MASSCLEANage. Expressed in the generic fitting formula, d²N/dM dt ∝ M^α t^β, the distribution of observed clusters is described by α = -1.5 to -1.6 and β = -2.1 to -2.2. For 288 of these clusters, ages have recently been determined based on stellar photometric color-magnitude diagrams, allowing us to gauge the confidence of our ages. The results look very promising, opening up the possibility that this sample of 920 clusters, with reliable and consistent age, mass, and photometric measures, might be used to constrain important characteristics about the stellar cluster population in the LMC. We also investigate a traditional age determination method that uses a χ² minimization routine to fit observed cluster colors to standard infinite-mass limit simple stellar population models. This reveals serious defects in the derived cluster age distribution using this method. The traditional χ² minimization method, due to the variation of U, B, V, R colors, will always produce an overdensity of younger and older clusters, with an underdensity of clusters in the log (age/yr) = [7.0, 7.5] range. Finally, we present a unique simulation aimed at illustrating and constraining the fading limit in observed cluster distributions that includes the complex effects of stochastic variations in the observed properties of stellar clusters.
Van Cann, Joannes; Virgilio, Massimiliano; Jordaens, Kurt; De Meyer, Marc
2015-01-01
Previous attempts to resolve the Ceratitis FAR complex (Ceratitis fasciventris, Ceratitis anonae, Ceratitis rosa, Diptera, Tephritidae) showed contrasting results and revealed the occurrence of five microsatellite genotypic clusters (A, F1, F2, R1, R2). In this paper we explore the potential of wing morphometrics for the diagnosis of FAR morphospecies and genotypic clusters. We considered a set of 227 specimens previously morphologically identified and genotyped at 16 microsatellite loci. Seventeen wing landmarks and 6 wing band areas were used for morphometric analyses. Permutational multivariate analysis of variance detected significant differences both across morphospecies and genotypic clusters (for both males and females). Unconstrained and constrained ordinations did not properly resolve groups corresponding to morphospecies or genotypic clusters. However, posterior group membership probabilities (PGMPs) of the Discriminant Analysis of Principal Components (DAPC) allowed the consistent identification of a relevant proportion of specimens (but with performances differing across morphospecies and genotypic clusters). This study suggests that wing morphometrics and PGMPs might represent a possible tool for the diagnosis of species within the FAR complex. Here, we propose a tentative diagnostic method and provide a first reference library of morphometric measures that might be used for the identification of additional and unidentified FAR specimens.
Comparison of identification methods for oral asaccharolytic Eubacterium species.
Wade, W G; Slayne, M A; Aldred, M J
1990-12-01
Thirty one strains of oral, asaccharolytic Eubacterium spp. and the type strains of E. brachy, E. nodatum and E. timidum were subjected to three identification techniques--protein-profile analysis, determination of metabolic end-products, and the API ATB32A identification kit. Five clusters were obtained from numerical analysis of protein profiles and excellent correlations were seen with the other two methods. Protein profiles alone allowed unequivocal identification.
Is antibody clustering predictive of clinical subsets and damage in systemic lupus erythematosus?
To, C H; Petri, M
2005-12-01
To examine autoantibody clusters and their associations with clinical features and organ damage accrual in patients with systemic lupus erythematosus (SLE). The study group comprised 1,357 consecutive patients with SLE who were recruited to participate in a prospective longitudinal cohort study. In the cohort, 92.6% of the patients were women, the mean +/- SD age of the patients was 41.3 +/- 12.7 years, 55.9% were Caucasian, 39.1% were African American, and 5% were Asian. Seven autoantibodies (anti-double-stranded DNA [anti-dsDNA], anti-Sm, anti-Ro, anti-La, anti-RNP, lupus anticoagulant (LAC), and anticardiolipin antibody [aCL]) were selected for cluster analysis using the K-means cluster analysis procedure. Three distinct autoantibody clusters were identified: cluster 1 (anti-Sm and anti-RNP), cluster 2 (anti-dsDNA, anti-Ro, and anti-La), and cluster 3 (anti-dsDNA, LAC, and aCL). Patients in cluster 1 (n = 451), when compared with patients in clusters 2 (n = 470) and 3 (n = 436), had the lowest incidence of proteinuria (39.7%), anemia (52.8%), lymphopenia (33.9%), and thrombocytopenia (13.7%). The incidence of nephrotic syndrome and leukopenia was also lower in cluster 1 than in cluster 2. Cluster 2 had the highest female-to-male ratio (22:1) and the greatest proportion of Asian patients. Among the 3 clusters, cluster 2 had significantly more patients presenting with secondary Sjögren's syndrome (15.7%). Cluster 3, when compared with the other 2 clusters, consisted of more Caucasian and fewer African American patients and was characterized by the highest incidence of arterial thrombosis (17.4%), venous thrombosis (25.7%), and livedo reticularis (31.4%). 
By using the Systemic Lupus International Collaborating Clinics/American College of Rheumatology Damage Index, the greatest frequency of nephrotic syndrome (8.9%) was observed in patients in cluster 2, whereas cluster 3 patients had the highest percentage of damage due to cerebrovascular accident (12.8%) and venous thrombosis (7.8%). Osteoporotic fracture (11.9%) was also more common in cluster 3 than in cluster 2. Autoantibody clustering is a valuable tool to differentiate between various subsets of SLE, allowing prediction of subsequent clinical course and organ damage.
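The K-means procedure applied to autoantibody profiles can be illustrated with a small, dependency-free Python sketch. The nine "patients" below are invented binary profiles (1 = antibody positive) in the order anti-dsDNA, anti-Sm, anti-Ro, anti-La, anti-RNP, LAC, aCL; they are not data from the cohort, and the fixed starting centroids are an assumption chosen to keep the run deterministic. The study itself used a statistical package's K-means procedure, not this code.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, centroids, n_iter=20):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    labels = []
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(len(centroids)), key=lambda k: sq_dist(p, centroids[k]))
                  for p in points]
        # Update step: each centroid becomes the mean of its members.
        updated = []
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                updated.append([sum(col) / len(members) for col in zip(*members)])
            else:
                updated.append(centroids[k])  # keep an empty cluster's centroid
        if updated == centroids:
            break  # converged
        centroids = updated
    return labels, centroids


# Hypothetical profiles: dsDNA, Sm, Ro, La, RNP, LAC, aCL
patients = [
    [0, 1, 0, 0, 1, 0, 0], [0, 1, 0, 0, 1, 0, 0], [1, 1, 0, 0, 1, 0, 0],
    [1, 0, 1, 1, 0, 0, 0], [1, 0, 1, 1, 0, 0, 0], [0, 0, 1, 1, 0, 0, 0],
    [1, 0, 0, 0, 0, 1, 1], [1, 0, 0, 0, 0, 1, 1], [1, 0, 0, 0, 0, 0, 1],
]
start = [patients[0], patients[3], patients[6]]
labels, centroids = kmeans(patients, start)
print(labels)
# prints: [0, 0, 0, 1, 1, 1, 2, 2, 2]
```

Each final centroid can be read as the proportion of cluster members positive for each antibody, which is how a cluster such as "anti-Sm and anti-RNP" would be characterized.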
A clustering algorithm for determining community structure in complex networks
NASA Astrophysics Data System (ADS)
Jin, Hong; Yu, Wei; Li, ShiJun
2018-02-01
Clustering algorithms are attractive for the task of community detection in complex networks. DENCLUE is a representative density-based clustering algorithm which has a firm mathematical basis and good clustering properties, allowing for arbitrarily shaped clusters in high-dimensional datasets. However, this method cannot be directly applied to community discovery due to its inability to deal with network data. Moreover, it requires a careful selection of the density parameter and the noise threshold. To solve these issues, a new community detection method is proposed in this paper. First, we use a spectral analysis technique to map the network data into a low-dimensional Euclidean space which can preserve node structural characteristics. Then, DENCLUE is applied to detect the communities in the network. A mathematical method named the Sheather-Jones plug-in is chosen to select the density parameter, which can describe the intrinsic clustering structure accurately. Moreover, every node in the network is meaningful, so there are no noise nodes; as a result, the noise threshold can be ignored. We test our algorithm on both benchmark and real-life networks, and the results demonstrate the effectiveness of our algorithm over other popular density-based clustering algorithms adapted to community detection.
High-throughput analysis of spatio-temporal dynamics in Dictyostelium
Sawai, Satoshi; Guan, Xiao-Juan; Kuspa, Adam; Cox, Edward C
2007-01-01
We demonstrate a time-lapse video approach that allows rapid examination of the spatio-temporal dynamics of Dictyostelium cell populations. Quantitative information was gathered by sampling life histories of more than 2,000 mutant clones from a large mutagenesis collection. Approximately 4% of the clonal lines showed a mutant phenotype at one stage. Many of these could be ordered by clustering into functional groups. The dataset allows one to search and retrieve movies on a gene-by-gene and phenotype-by-phenotype basis. PMID:17659086
NASA Astrophysics Data System (ADS)
Amirnasr, Elham
It is widely recognized that nonwoven basis weight non-uniformity affects various properties of nonwovens. However, few studies can be found on this topic. The development of uniformity definition and measurement methods and the study of their impact on various web properties such as filtration properties and air permeability would be beneficial both in industrial applications and in academia. They can be utilized as a quality control tool and would provide insights about nonwoven behaviors that cannot be solely explained by average values. Therefore, to quantify nonwoven web basis weight uniformity, we pursue the development of an optical analytical tool. The quadrant method and clustering analysis were utilized in an image analysis scheme to help define "uniformity" and its spatial variation. Implementing the quadrant method in an image analysis system allows the establishment of a uniformity index that can be used to quantify the degree of uniformity. Clustering analysis has also been modified and verified using uniform and random simulated images with known parameters. The number of clusters and cluster properties such as cluster size, membership, and density were determined. We also utilized this new measurement method to evaluate the uniformity of nonwovens produced with different processes and investigated the impact of uniformity on filtration and permeability. The results show that the uniformity index computed from the quadrant method covers a good range for the non-uniformity of nonwoven webs. Clustering analysis has also been applied to reference nonwovens with known visual uniformity. From the clustering analysis results, cluster size is a promising uniformity parameter. It has been shown that non-uniform nonwovens produce larger cluster sizes than uniform nonwovens. We attempted to find a relationship between web properties and the uniformity index (as a web characteristic).
To achieve this, the filtration properties, air permeability, solidity, and uniformity index of meltblown and spunbond samples were measured. Results for the filtration test show some deviation between theoretical and experimental filtration efficiency when different types of fiber diameter are considered. This deviation can occur due to variation in basis weight non-uniformity. An appropriate theory is therefore required to predict the variation of filtration efficiency with respect to the non-uniformity of nonwoven filter media. The results of the air permeability test showed that the uniformity index determined by the quadrant method and the measured properties are related. In other words, air permeability decreases as the uniformity index of the nonwoven web increases.
Chimeras in small, globally coupled networks: Experiments and stability analysis
NASA Astrophysics Data System (ADS)
Hart, Joseph D.; Bansal, Kanika; Murphy, Thomas E.; Roy, Rajarshi
Since the initial observation of chimera states, there has been much discussion of the conditions under which these states emerge. The emphasis thus far has mainly been to analyze large networks of coupled oscillators; however, recent studies have begun to focus on the opposite limit: what is the smallest system of coupled oscillators in which chimeras can exist? We experimentally observe chimeras and other partially synchronous patterns in a network of four globally-coupled chaotic opto-electronic oscillators. By examining the equations of motion, we demonstrate that symmetries in the network topology allow a variety of synchronous states to exist, including cluster synchronous states and a chimera state. Using the group theoretical approach recently developed for analyzing cluster synchronization, we show how to derive the variational equations for these synchronous patterns and calculate their linear stability. The stability analysis gives good agreement with our experimental results. Both experiments and simulations suggest that these chimera states often appear in regions of multistability between global, cluster, and desynchronized states.
NASA Astrophysics Data System (ADS)
Lin, Yen-Ting; Hsieh, Bau-Ching; Lin, Sheng-Chieh; Oguri, Masamune; Chen, Kai-Feng; Tanaka, Masayuki; Chiu, I.-non; Huang, Song; Kodama, Tadayuki; Leauthaud, Alexie; More, Surhud; Nishizawa, Atsushi J.; Bundy, Kevin; Lin, Lihwai; Miyazaki, Satoshi; HSC Collaboration
2018-01-01
The unprecedented depth and area surveyed by the Subaru Strategic Program with the Hyper Suprime-Cam (HSC-SSP) have enabled us to construct and publish the largest distant cluster sample out to z~1 to date. In this exploratory study of cluster galaxy evolution from z=1 to z=0.3, we investigate the stellar mass assembly history of brightest cluster galaxies (BCGs), and evolution of stellar mass and luminosity distributions, stellar mass surface density profile, as well as the population of radio galaxies. Our analysis is the first high redshift application of the top N richest cluster selection, which is shown to allow us to trace the cluster galaxy evolution faithfully. Our stellar mass is derived from a machine-learning algorithm, which we show to be unbiased and accurate with respect to the COSMOS data. We find very mild stellar mass growth in BCGs, and no evidence for evolution in both the total stellar mass-cluster mass correlation and the shape of the stellar mass surface density profile. The clusters are found to contain more red galaxies compared to the expectations from the field, even after the differences in density between the two environments have been taken into account. We also present the first measurement of the radio luminosity distribution in clusters out to z~1.
Factors of Intensification in the Hops Cluster of Chuvashia
ERIC Educational Resources Information Center
Zakharov, Anatoly I.; Evgrafov, Oleg V.; Zakharov, Dmitry A.; Ivanova, Elena V.; Tolstova, Marija L.; Tsaregorodtsev, Evgeny I.
2016-01-01
The complex analysis of development of hop-growing for 1971-2015 is carried out. In the conditions of the field experiment made in the Chuvash Republic hop-growing intensification elements--technology of its cultivation, mechanization are fulfilled. Based on researches it is established that the main internal allowance of increase in efficiency of…
Robertson, Patrick A; Villani, Luigi; Dissanayake, Uresha L M; Duncan, Luke F; Abbott, Belinda M; Wilson, David J D; Robertson, Evan G
2018-03-28
The electronic spectra of 2-bromoethylbenzene and its chloro and fluoro analogues have been recorded by resonant two-photon ionisation (R2PI) spectroscopy. Anti and gauche conformers have been assigned by rotational band contour analysis and IR-UV ion depletion spectroscopy in the CH region. Hydrate clusters of the anti conformers have also been observed, allowing the role of halocarbons as hydrogen bond acceptors to be examined in this context. The donor OH stretch of water bound to chlorine is red-shifted by 36 cm⁻¹, or 39 cm⁻¹ in the case of bromine. Although classed as weak H-bond acceptors, halocarbons are favourable acceptor sites compared to π systems. Fluorine stands out as the weakest H-bond acceptor amongst the halogens. Chlorine and bromine are also weak H-bond acceptors, but allow for more geometric lability, facilitating complementary secondary interactions within the host molecule. Ab initio and DFT quantum chemical calculations, both harmonic and anharmonic, aid the structural assignments and analysis.
Pilot testing model to uncover industrial symbiosis in Brazilian industrial clusters.
Saraceni, Adriana Valélia; Resende, Luis Mauricio; de Andrade Júnior, Pedro Paulo; Pontes, Joseane
2017-04-01
The main objective of this study was to create a pilot model to uncover industrial symbiosis practices in Brazilian industrial clusters. For this purpose, a systematic review was conducted of journals selected from two categories of the ISI Web of Knowledge: Engineering, Environmental and Engineering, Industrial. After an in-depth review of the literature, the results allowed the creation of an analysis structure. A methodology based on fuzzy logic was applied and used to attribute the weights of industrial symbiosis variables. It was thus possible to extract the intensity indicators of the interrelations required to analyse the development level of each correlation between the variables. Determination of variables and their weights initially resulted in a framework for the theory of industrial symbiosis assessments. Research results allowed the creation of a pilot model that could precisely identify the loopholes or development levels in each sphere. Ontology charts for data analysis were also generated. This study contributes to science by presenting the foundations for building an instrument that enables application and compilation of the pilot model, in order to identify opportunities for symbiotic development, which derive from "uncovering" existing symbioses.
Yiu, Sean; Farewell, Vernon T; Tom, Brian D M
2018-02-01
In psoriatic arthritis, it is important to understand the joint activity (represented by swelling and pain) and damage processes because both are related to severe physical disability. The paper aims to provide a comprehensive investigation into both processes occurring over time, in particular their relationship, by specifying a joint multistate model at the individual hand joint level, which also accounts for many of their important features. As there are multiple hand joints, such an analysis will be based on the use of clustered multistate models. Here we consider an observation level random-effects structure with dynamic covariates and allow for the possibility that a subpopulation of patients is at minimal risk of damage. Such an analysis is found to provide further understanding of the activity-damage relationship beyond that provided by previous analyses. Consideration is also given to the modelling of mean sojourn times and jump probabilities. In particular, a novel model parameterization which allows easily interpretable covariate effects to act on these quantities is proposed.
Relative importance of attributes of drug benefit plans: Thai civil servants' perspective.
Ngorsuraches, Surachat; Wanishayakorn, Tanatape; Tanvejsilp, Pimwara; Udomaksorn, Siripa
2013-01-01
The drug benefit plan of Thailand's Civil Servant Medical Benefit Scheme (CSMBS) must be amended to control increasing costs; to that end, it is important to gather the views of beneficiaries before making changes to the benefit plan. To examine the relative importance of attributes of drug benefit plans from the perspective of CSMBS beneficiaries. Attributes and levels adopted from focus group discussions and a preliminary survey were used to develop a questionnaire concerning hypothetical drug benefit plans. A convenience sample of 650 CSMBS beneficiaries in Songkhla province was asked to rate the drug benefit plans. To determine the beneficiaries' decision models, judgment analysis was used. Policy-capturing analysis was used to examine the beneficiaries' preferences, and cluster analysis was conducted to explore the variability among judgment plans. Judgment policy insight was also examined. The results of the study showed that the beneficiaries weighed on cost-sharing as their most important attribute. The results remained unchanged, although only data from the beneficiaries who used the compensatory model were analyzed. The results of the cluster analysis showed that the largest cluster of beneficiaries weighed mostly on the cost-sharing attribute. The judgment policy insight results not only supported the finding that most beneficiaries focused on the cost-sharing attribute but also revealed that they might have the least understanding of how the formulary attribute affected beneficiaries' decision making. Cost-sharing was the most important attribute for the CSMBS beneficiaries. This study indicated that a possible preferred drug benefit plan should have no cost-sharing, permit access only to drugs listed in a closed formulary, allow beneficiaries to obtain 3 months of drugs, and allow them to obtain drugs from either a community pharmacy or a government hospital. Copyright © 2013 Elsevier Inc. All rights reserved.
Cholera epidemic in Guinea-Bissau (2008): the importance of "place".
Luquero, Francisco J; Banga, Cunhate Na; Remartínez, Daniel; Palma, Pedro Pablo; Baron, Emanuel; Grais, Rebeca F
2011-05-04
As resources are limited when responding to cholera outbreaks, knowledge about where to orient interventions is crucial. We describe the cholera epidemic affecting Guinea-Bissau in 2008 focusing on the geographical spread in order to guide prevention and control activities. We conducted two studies: 1) a descriptive analysis of the cholera epidemic in Guinea-Bissau focusing on its geographical spread (country level and within the capital); and 2) a cross-sectional study to measure the prevalence of houses with at least one cholera case in the most affected neighbourhood of the capital (Bairro Bandim) to detect clustering of households with cases (cluster analysis). All cholera cases attending the cholera treatment centres in Guinea-Bissau who fulfilled a modified World Health Organization clinical case definition during the epidemic were included in the descriptive study. For the cluster analysis, a sample of houses was selected from a satellite photo (Google Earth™); 140 houses (and the four closest houses) were assessed from the 2,202 identified structures. We applied K-functions and Kernel smoothing to detect clustering. We confirmed the clustering using Kulldorff's spatial scan statistic. A total of 14,222 cases and 225 deaths were reported in the country (AR = 0.94%, CFR = 1.64%). The more affected regions were Biombo, Bijagos and Bissau (the capital). Bairro Bandim was the most affected neighborhood of the capital (AR = 4.0). We found at least one case in 22.7% of the houses (95%CI: 19.5-26.2) in this neighborhood. The cluster analysis identified two areas within Bairro Bandim at highest risk: a market and an intersection where runoff accumulates waste (p<0.001). Our analysis allowed for the identification of the most affected regions in Guinea-Bissau during the 2008 cholera outbreak, and the most affected areas within the capital. 
This information was essential for making decisions on where to reinforce treatment and to guide control and prevention activities.
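The abstract mentions K-functions as one of the tools used to detect clustering of houses with cases. As a rough illustration only, the following sketch computes a naive Ripley's K estimate without edge correction. The function name `ripley_k`, the coordinates, and the unit-area window are assumptions made for this example; they are not taken from the study, which does not describe its implementation.

```python
import math
from itertools import combinations


def ripley_k(points, radii, area):
    """Naive Ripley's K estimate (no edge correction) for 2-D points."""
    n = len(points)
    # Distances for every unordered pair of points.
    pair_dists = [math.dist(p, q) for p, q in combinations(points, 2)]
    estimates = []
    for r in radii:
        close_pairs = sum(1 for d in pair_dists if d <= r)
        # Each unordered pair appears twice in the ordered-pair sum.
        estimates.append(area * 2 * close_pairs / (n * (n - 1)))
    return estimates


# Hypothetical house coordinates (arbitrary units) in a window of area 1.
houses = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
k_values = ripley_k(houses, radii=[0.5, 1.0, 2.0], area=1.0)
```

Under complete spatial randomness K(r) is approximately πr², so estimates well above that value at a given radius suggest clustering at that scale. A real analysis would also need an edge correction and a significance test.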
Cholera Epidemic in Guinea-Bissau (2008): The Importance of “Place”
Luquero, Francisco J.; Banga, Cunhate Na; Remartínez, Daniel; Palma, Pedro Pablo; Baron, Emanuel; Grais, Rebeca F.
2011-01-01
Background As resources are limited when responding to cholera outbreaks, knowledge about where to orient interventions is crucial. We describe the cholera epidemic affecting Guinea-Bissau in 2008 focusing on the geographical spread in order to guide prevention and control activities. Methodology/Principal Findings We conducted two studies: 1) a descriptive analysis of the cholera epidemic in Guinea-Bissau focusing on its geographical spread (country level and within the capital); and 2) a cross-sectional study to measure the prevalence of houses with at least one cholera case in the most affected neighbourhood of the capital (Bairro Bandim) to detect clustering of households with cases (cluster analysis). All cholera cases attending the cholera treatment centres in Guinea-Bissau who fulfilled a modified World Health Organization clinical case definition during the epidemic were included in the descriptive study. For the cluster analysis, a sample of houses was selected from a satellite photo (Google Earth™); 140 houses (and the four closest houses) were assessed from the 2,202 identified structures. We applied K-functions and Kernel smoothing to detect clustering. We confirmed the clustering using Kulldorff's spatial scan statistic. A total of 14,222 cases and 225 deaths were reported in the country (AR = 0.94%, CFR = 1.64%). The more affected regions were Biombo, Bijagos and Bissau (the capital). Bairro Bandim was the most affected neighborhood of the capital (AR = 4.0). We found at least one case in 22.7% of the houses (95%CI: 19.5–26.2) in this neighborhood. The cluster analysis identified two areas within Bairro Bandim at highest risk: a market and an intersection where runoff accumulates waste (p<0.001). Conclusions/Significance Our analysis allowed for the identification of the most affected regions in Guinea-Bissau during the 2008 cholera outbreak, and the most affected areas within the capital. 
This information was essential for making decisions on where to reinforce treatment and to guide control and prevention activities. PMID:21572530
Clustering fossils in solid inflation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Akhshik, Mohammad, E-mail: m.akhshik@ipm.ir
In solid inflation the single field non-Gaussianity consistency condition is violated. As a result, the long tensor perturbation induces observable clustering fossils in the form of quadrupole anisotropy in the large scale structure power spectrum. In this work we revisit the bispectrum analysis for the scalar-scalar-scalar and tensor-scalar-scalar bispectrum for the general parameter space of solid inflation. We consider the parameter space of the model in which the level of non-Gaussianity generated is consistent with the Planck constraints. Specializing to this allowed range of model parameters, we calculate the quadrupole anisotropy induced by the long tensor perturbations on the power spectrum of the scalar perturbations. We argue that the imprints of clustering fossils from primordial gravitational waves on large scale structures can be detected by future galaxy surveys.
Persistent molecular superfluid response in doped para-hydrogen clusters.
Raston, P L; Jäger, W; Li, H; Le Roy, R J; Roy, P-N
2012-06-22
Direct observation of superfluid response in para-hydrogen (p-H(2)) remains a challenge because of the need for a probe that would not induce localization and a resultant reduction in superfluid fraction. Earlier work [H. Li, R. J. Le Roy, P.-N. Roy, and A. R. W. McKellar, Phys. Rev. Lett. 105, 133401 (2010)] has shown that carbon dioxide can probe the effective inertia of p-H(2) although larger clusters show a lower superfluid response due to localization. It is shown here that the lighter carbon monoxide probe molecule allows one to measure the effective inertia of p-H(2) clusters while maintaining a maximum superfluid response with respect to dopant rotation. Microwave spectroscopy and a theoretical analysis based on Feynman path-integral simulations are used to support this conclusion.
Monitoring Fatigue Status with HRV Measures in Elite Athletes: An Avenue Beyond RMSSD?
Schmitt, Laurent; Regnard, Jacques; Millet, Grégoire P.
2015-01-01
Among the tools proposed to assess the athlete's “fatigue,” the analysis of heart rate variability (HRV) provides an indirect evaluation of the settings of autonomic control of heart activity. HRV analysis is commonly performed through assessment of time-domain indices. Among these, the square root of the mean of the sum of the squares of differences between adjacent normal R-R intervals (RMSSD), measured during short (5 min) recordings in supine position upon awakening in the morning, and particularly the logarithm of RMSSD (LnRMSSD), has been proposed as the most useful resting HRV indicator. However, although RMSSD can help the practitioner to identify a global “fatigue” level, it does not allow discriminating between different types of fatigue. Recent results using spectral HRV analysis highlighted firstly that HRV profiles assessed in supine and standing positions are independent and complementary; and secondly that using these postural profiles allows the clustering of distinct sub-categories of “fatigue.” Since cardiovascular control settings are different in standing and lying postures, using the HRV figures of both postures to cluster fatigue states embeds information on the dynamics of control responses. As such, HRV spectral analysis appears more sensitive and enlightening than time-domain HRV indices. The richer information provided by this spectral analysis should improve the monitoring of the adaptive training-recovery process in athletes. PMID:26635629
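The RMSSD definition given in the abstract can be written out in a few lines. This is a minimal sketch: the function names and the R-R interval values (in milliseconds) are made up for illustration, and artifact rejection of non-normal beats is assumed to have been done beforehand.

```python
import math


def rmssd(rr_intervals_ms):
    """Root mean square of successive differences between adjacent R-R intervals."""
    diffs = [b - a for a, b in zip(rr_intervals_ms, rr_intervals_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def ln_rmssd(rr_intervals_ms):
    """Natural logarithm of RMSSD (LnRMSSD)."""
    return math.log(rmssd(rr_intervals_ms))


# Hypothetical short recording: successive normal R-R intervals in ms.
rr = [800, 810, 790, 800]
value = rmssd(rr)        # successive differences are 10, -20, 10
log_value = ln_rmssd(rr)
```

As the abstract notes, a single time-domain number of this kind gives a global indicator only; it does not separate the supine and standing spectral profiles used for clustering fatigue states.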
Fully microscopic analysis of laser-driven finite plasmas using the example of clusters
NASA Astrophysics Data System (ADS)
Peltz, Christian; Varin, Charles; Brabec, Thomas; Fennel, Thomas
2012-06-01
We discuss a microscopic particle-in-cell (MicPIC) approach that allows bridging of the microscopic and macroscopic realms of laser-driven plasma physics. The simultaneous resolution of collisions and electromagnetic field propagation in MicPIC enables the investigation of processes that have been inaccessible to rigorous numerical scrutiny so far. This is illustrated by the two main findings of our analysis of pre-ionized, resonantly laser-driven clusters, which can be realized experimentally in pump-probe experiments. In the linear response regime, MicPIC data are used to extract the individual microscopic contributions to the dielectric cluster response function, such as surface and bulk collision frequencies. We demonstrate that the competition between surface collisions and radiation damping is responsible for the maximum in the size-dependent lifetime of the Mie surface plasmon. The capacity to determine the microscopic underpinning of optical material parameters opens new avenues for modeling nano-plasmonics and nano-photonics systems. In the non-perturbative regime, we analyze the formation and evolution of recollision-induced plasma waves in laser-driven clusters. The resulting dynamics of the electron density and local field hot spots opens a new research direction for the field of attosecond science.
The use of the wavelet cluster analysis for asteroid family determination
NASA Technical Reports Server (NTRS)
Benjoya, Phillippe; Slezak, E.; Froeschle, Claude
1992-01-01
Asteroid family determination has long been dependent on the analysis method. A new cluster analysis based on the wavelet transform has allowed an automatic definition of families with a degree of significance versus randomness. This method is in fact rather general and can be applied to any kind of structural analysis. We concentrate here on the main features of the method. The analysis has been performed on the set of 4100 asteroid proper elements computed by Milani and Knezevic (see Milani and Knezevic 1990). Twenty-one families have been found and the influence of the chosen metric has been tested. The results have been compared to those of Zappala et al. (see Zappala et al. 1990), obtained with a completely different method applied to the same set of data. For the first time, a good overlap has been found between the results of both methods, not only for the big well-known families but also for the smallest ones.
AMICO: optimized detection of galaxy clusters in photometric surveys
NASA Astrophysics Data System (ADS)
Bellagamba, Fabio; Roncarelli, Mauro; Maturi, Matteo; Moscardini, Lauro
2018-02-01
We present Adaptive Matched Identifier of Clustered Objects (AMICO), a new algorithm for the detection of galaxy clusters in photometric surveys. AMICO is based on the Optimal Filtering technique, which maximizes the signal-to-noise ratio (S/N) of the clusters. In this work, we focus on the new iterative approach to the extraction of cluster candidates from the map produced by the filter. In particular, we provide a definition of membership probability for the galaxies close to any cluster candidate, which allows us to remove its imprint from the map, allowing the detection of smaller structures. As demonstrated in our tests, this method allows the deblending of close-by and aligned structures in more than 50 per cent of the cases for objects at radial distance equal to 0.5 × R200 or redshift distance equal to 2 × σz, where σz is the typical uncertainty of photometric redshifts. Running AMICO on mocks derived from N-body simulations and semi-analytical modelling of the galaxy evolution, we obtain a consistent mass-amplitude relation through the redshift range of 0.3 < z < 1, with a logarithmic slope of ∼0.55 and a logarithmic scatter of ∼0.14. The fraction of false detections is steeply decreasing with S/N and negligible at S/N > 5.
NASA Astrophysics Data System (ADS)
Brisset, J.; Colwell, J. E.; Dove, A.; Maukonen, D.; Brown, N.; Lai, K.; Hoover, B.
2015-12-01
We report on the results of the NanoRocks experiment on the International Space Station (ISS), which simulates collisions that occur in protoplanetary disks and planetary ring systems. A critical stage of the process of early planet formation is the growth of solid bodies from mm-sized chondrules and aggregates to km-sized planetesimals. To characterize the collision behavior of dust in protoplanetary conditions, experimental data is required, working hand in hand with models and numerical simulations. In addition, the collisional evolution of planetary rings takes place in the same collisional regime. The objective of the NanoRocks experiment is to study low-energy collisions of mm-sized particles of different shapes and materials. An aluminum tray (~8x8x2 cm) divided into eight sample cells holding different types of particles gets shaken every 60 s, providing particles with initial velocities of a few cm/s. In September 2014, NanoRocks reached the ISS, and 220 video files, each covering one shaking cycle, have already been downloaded from the Station. The data analysis is focused on the dynamical evolution of the multi-particle systems and on the formation of clusters. We track the particles down to mean relative velocities less than 1 mm/s, where we observe cluster formation. The mean velocity evolution after each shaking event allows for a determination of the mean coefficient of restitution for each particle set. These values can be used as input into protoplanetary disk and planetary ring simulations. In addition, the cluster analysis allows for a determination of the mean final cluster size and the average particle velocity of clustering onset. The size and shape of these particle clumps are crucial to understanding the first stages of planet formation inside protoplanetary disks as well as many a feature of Saturn's rings. 
We report on the results from the ensemble of these collision experiments and discuss applications to planetesimal formation and planetary ring evolution.
The MICE grand challenge lightcone simulation - I. Dark matter clustering
NASA Astrophysics Data System (ADS)
Fosalba, P.; Crocce, M.; Gaztañaga, E.; Castander, F. J.
2015-04-01
We present a new N-body simulation from the Marenostrum Institut de Ciències de l'Espai (MICE) collaboration, the MICE Grand Challenge (MICE-GC), containing about 70 billion dark matter particles in a (3 Gpc h^-1)^3 comoving volume. Given its large volume and fine spatial resolution, spanning over five orders of magnitude in dynamic range, it allows an accurate modelling of the growth of structure in the universe from the linear through the highly non-linear regime of gravitational clustering. We validate the dark matter simulation outputs using 3D and 2D clustering statistics, and discuss mass-resolution effects in the non-linear regime by comparing to previous simulations and the latest numerical fits. We show that the MICE-GC run allows for a measurement of the BAO feature with per cent level accuracy and compare it to state-of-the-art theoretical models. We also use sub-arcmin resolution pixelized 2D maps of the dark matter counts in the lightcone to make tomographic analyses in real and redshift space. Our analysis shows the simulation reproduces the Kaiser effect on large scales, whereas we find a significant suppression of power on non-linear scales relative to the real space clustering. We complete our validation by presenting an analysis of the three-point correlation function in this and previous MICE simulations, finding further evidence for mass-resolution effects. This is the first of a series of three papers in which we present the MICE-GC simulation, along with a wide and deep mock galaxy catalogue built from it. This mock is made publicly available through a dedicated web portal, http://cosmohub.pic.es.
NASA Astrophysics Data System (ADS)
Rinderer, M.; McGlynn, B. L.; van Meerveld, I. H. J.
2016-12-01
Groundwater measurements can help us to improve our understanding of runoff generation at the catchment-scale but typically only provide point-scale data. These measurements, therefore, need to be interpolated or upscaled in order to obtain information about catchment scale groundwater dynamics. Our approach used data from 51 spatially distributed groundwater monitoring sites in a Swiss pre-alpine catchment and time series clustering to define six groundwater response clusters. Each of the clusters was characterized by distinctly different site characteristics (i.e., Topographic Wetness Index and curvature), which allowed us to assign all unmonitored locations to one of these clusters. Time series modeling and the definition of response thresholds (i.e., the depth of more transmissive soil layers) allowed us to derive maps of the spatial distribution of active (i.e., responding) locations across the catchment at 15 min time intervals. Connectivity between all active locations and the stream network was determined using a graph theory approach. The extent of the active and connected areas differed during events and suggests that not all active locations directly contributed to streamflow. Gate keeper sites prevented connectivity of upslope locations to the channel network. Streamflow dynamics at the catchment outlet were correlated to catchment average connectivity dynamics. In a sensitivity analysis we tested six different groundwater levels for a site to be considered "active", which showed that the definition of the threshold did not significantly influence the conclusions drawn from our analysis. This study is the first one to derive patterns of groundwater dynamics based on empirical data (rather than interpolation) and provides insight into the spatio-temporal evolution of the active and connected runoff source areas at the catchment-scale that is critical to understanding the dynamics of water quantity and quality in streams.
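The distinction the abstract draws between "active" and "connected" locations can be illustrated with a small graph search. This is a sketch under stated assumptions, not the authors' implementation: the site names, the flow-path edges, and the function `connected_active_sites` are hypothetical, and connectivity is taken to mean that a path to the stream exists that passes only through active sites.

```python
from collections import deque


def connected_active_sites(edges, active, stream):
    """Return active sites linked to a stream node through active sites only."""
    # Build an undirected adjacency map from the flow-path edges.
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    # Breadth-first search outward from the stream network.
    seen = set(stream)
    queue = deque(stream)
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, ()):
            if neighbour not in seen and neighbour in active:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen & set(active)


# Hypothetical hillslope transect: stream - A - B - C - D (D is furthest upslope).
edges = [("stream", "A"), ("A", "B"), ("B", "C"), ("C", "D")]
active = {"A", "C", "D"}  # B has not crossed its response threshold
connected = connected_active_sites(edges, active, stream={"stream"})
```

In this toy transect, sites C and D are active but not connected, because the inactive site B sits between them and the channel. This is the role the abstract assigns to gatekeeper sites.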
NASA Astrophysics Data System (ADS)
De Marchi, G.; Paresce, F.; Straniero, O.; Prada Moroni, P. G.
2004-03-01
Very deep images of the Galactic globular cluster M 4 (NGC 6121) through the F606W and F814W filters were taken in 2001 with the WFPC2 on board the HST. A first published analysis of this data set (Richer et al. 2002) produced the result that the age of M 4 is 12.7 ± 0.7 Gyr (Hansen et al. 2002), thus setting a robust lower limit to the age of the universe. In view of the great astronomical importance of getting this number right, we have subjected the same data set to the simplest possible photometric analysis that completely avoids uncertain assumptions about the origin of the detected sources. This analysis clearly reveals both a thin main sequence, from which can be deduced the deepest statistically complete mass function yet determined for a globular cluster, and a white dwarf (WD) sequence extending all the way down to the 5σ detection limit at I ≃ 27. The WD sequence is abruptly terminated at exactly this limit as expected by detection statistics. Using our most recent theoretical WD models (Prada Moroni & Straniero 2002) to obtain the expected WD sequence for different ages in the observed bandpasses, we find that the data so far obtained do not reach the peak of the WD luminosity function, thus only allowing one to set a lower limit to the age of M 4 of ~9 Gyr. Thus, the problem of determining the absolute age of a globular cluster and, therefore, the onset of GC formation with cosmologically significant accuracy remains completely open. Only observations several magnitudes deeper than the limit obtained so far would allow one to approach this objective. Based on observations with the NASA/ESA Hubble Space Telescope, obtained at the Space Telescope Science Institute, which is operated by AURA for NASA under contract NAS5-26555.
Cancer detection based on Raman spectra super-paramagnetic clustering
NASA Astrophysics Data System (ADS)
González-Solís, José Luis; Guizar-Ruiz, Juan Ignacio; Martínez-Espinosa, Juan Carlos; Martínez-Zerega, Brenda Esmeralda; Juárez-López, Héctor Alfonso; Vargas-Rodríguez, Héctor; Gallegos-Infante, Luis Armando; González-Silva, Ricardo Armando; Espinoza-Padilla, Pedro Basilio; Palomares-Anda, Pascual
2016-08-01
The clustering of Raman spectra of serum samples is analyzed using the super-paramagnetic clustering technique based on the Potts spin model. We investigated the clustering of biochemical networks by using Raman data that define edge lengths in the network, and where the interactions are functions of the Raman spectra's individual band intensities. For this study, we used two groups of 58 and 102 control Raman spectra and the intensities of 160, 150 and 42 Raman spectra of serum samples from breast and cervical cancer and leukemia patients, respectively. The spectra were collected from patients at different hospitals in Mexico. By using the super-paramagnetic clustering technique, we identified the most natural and compact clusters, allowing us to discriminate between the control and cancer patients. Of special interest was the leukemia case, where the nearly hierarchical observed structure allowed the identification of the patients' leukemia type. The goal of this study is to apply a model of statistical physics, super-paramagnetic clustering, to find these natural clusters that allow us to design a cancer detection method. To the best of our knowledge, this is the first report of preliminary results evaluating the usefulness of super-paramagnetic clustering in the discipline of spectroscopy where it is used for classification of spectra.
Fernández-Alvira, Juan Miguel; Börnhorst, Claudia; Bammann, Karin; Gwozdz, Wencke; Krogh, Vittorio; Hebestreit, Antje; Barba, Gianvincenzo; Reisch, Lucia; Eiben, Gabriele; Iglesia, Iris; Veidebaum, Tomas; Kourides, Yannis A; Kovacs, Eva; Huybrechts, Inge; Pigeot, Iris; Moreno, Luis A
2015-02-14
Exploring changes in children's diet over time and the relationship between these changes and socio-economic status (SES) may help to understand the impact of social inequalities on dietary patterns. The aim of the present study was to describe dietary patterns by applying a cluster analysis to 9301 children participating in the baseline (2-9 years old) and follow-up (4-11 years old) surveys of the Identification and Prevention of Dietary- and Lifestyle-induced Health Effects in Children and Infants Study, and to describe the cluster memberships of these children over time and their association with SES. We applied the K-means clustering algorithm based on the similarities between the relative frequencies of consumption of forty-two food items. The following three consistent clusters were obtained at baseline and follow-up: processed (higher frequency of consumption of snacks and fast food); sweet (higher frequency of consumption of sweet foods and sweetened drinks); healthy (higher frequency of consumption of fruits, vegetables and wholemeal products). Children with higher-educated mothers and fathers and the highest household income were more likely to be allocated to the healthy cluster at baseline and follow-up and less likely to be allocated to the sweet cluster. Migrants were more likely to be allocated to the processed cluster at baseline and follow-up. Applying the cluster analysis to derive dietary patterns at the two time points allowed us to identify groups of children from a lower socio-economic background presenting persistently unhealthier dietary profiles. This finding reflects the need for healthy eating interventions specifically targeting children from lower socio-economic backgrounds.
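The approach described above, K-means applied to relative frequencies of food consumption, can be sketched with a plain Lloyd iteration. Everything concrete here is an assumption for illustration: three food groups instead of forty-two items, invented weekly counts, fixed starting centroids, and hypothetical function names. The study's actual software and initialization are not stated in the abstract.

```python
import math


def relative_frequencies(counts):
    """Convert raw consumption counts into shares of the child's total."""
    total = sum(counts)
    return [c / total for c in counts]


def kmeans(rows, centroids, n_iter=20):
    """Plain Lloyd's algorithm with Euclidean distance."""
    for _ in range(n_iter):
        # Assignment step: index of the nearest centroid for each row.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(row, centroids[k]))
            for row in rows
        ]
        # Update step: mean of the members, keeping empty clusters unchanged.
        updated = []
        for k, old in enumerate(centroids):
            members = [row for row, lab in zip(rows, labels) if lab == k]
            if members:
                updated.append([sum(col) / len(members) for col in zip(*members)])
            else:
                updated.append(old)
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids


# Hypothetical weekly counts per child: [snacks, sweets, fruit and vegetables].
children = [[14, 3, 3], [12, 4, 4], [2, 3, 15], [3, 3, 14]]
profiles = [relative_frequencies(c) for c in children]
labels, centres = kmeans(profiles, centroids=[profiles[0], profiles[2]])
```

Working on relative rather than absolute frequencies means two children with the same dietary mix but different overall reporting levels end up with the same profile.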
Degree-based statistic and center persistency for brain connectivity analysis.
Yoo, Kwangsun; Lee, Peter; Chung, Moo K; Sohn, William S; Chung, Sun Ju; Na, Duk L; Ju, Daheen; Jeong, Yong
2017-01-01
Brain connectivity analyses have been widely performed to investigate the organization and functioning of the brain, or to observe changes in neurological or psychiatric conditions. However, connectivity analysis inevitably introduces the problem of mass-univariate hypothesis testing. Although several cluster-wise correction methods have been suggested to address this problem and shown to provide high sensitivity, these approaches fundamentally have two drawbacks: the lack of spatial specificity (localization power) and the arbitrariness of an initial cluster-forming threshold. In this study, we propose a novel method, degree-based statistic (DBS), performing cluster-wise inference. DBS is designed to overcome the above-mentioned two shortcomings. From a network perspective, a few brain regions are of critical importance and considered to play pivotal roles in network integration. Regarding this notion, DBS defines a cluster as a set of edges of which one ending node is shared. This definition enables the efficient detection of clusters and their center nodes. Furthermore, a new measure of a cluster, center persistency (CP), was introduced. The efficiency of DBS was demonstrated with a known "ground truth" simulation. DBS was then applied to two experimental datasets and was shown to successfully detect the persistent clusters. In conclusion, by adopting a graph theoretical concept of degrees and borrowing the concept of persistence from algebraic topology, DBS could sensitively identify clusters with centric nodes that would play pivotal roles in an effect of interest. DBS is potentially widely applicable to various cognitive or clinical situations and allows us to obtain statistically reliable and easily interpretable results. Hum Brain Mapp 38:165-181, 2017. © 2016 Wiley Periodicals, Inc.
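The cluster definition used by DBS, a set of edges sharing one end node, can be illustrated as follows. This sketch covers only that definition and the resulting node degree; it does not implement the inference procedure or center persistency, whose details the abstract does not give. The region names, edge statistics, threshold, and the function `star_clusters` are all hypothetical.

```python
from collections import defaultdict


def star_clusters(edge_stats, threshold):
    """Group supra-threshold edges by each node they touch."""
    clusters = defaultdict(set)
    for (u, v), stat in edge_stats.items():
        if stat > threshold:
            # The same edge belongs to the cluster of both of its end nodes.
            clusters[u].add((u, v))
            clusters[v].add((u, v))
    return dict(clusters)


# Hypothetical edge-wise test statistics between four brain regions.
edge_stats = {
    ("A", "B"): 3.1,
    ("A", "C"): 2.8,
    ("A", "D"): 2.6,
    ("B", "C"): 0.4,
    ("C", "D"): 2.7,
}
clusters = star_clusters(edge_stats, threshold=2.5)
# Degree of a node = number of supra-threshold edges in its cluster.
degrees = {node: len(edges) for node, edges in clusters.items()}
center = max(degrees, key=degrees.get)
```

Because every cluster is anchored on one node, the node with the highest degree is a natural candidate center, which is what gives the approach its spatial specificity.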
TCW: Transcriptome Computational Workbench
Soderlund, Carol; Nelson, William; Willer, Mark; Gang, David R.
2013-01-01
Background The analysis of transcriptome data involves many steps and various programs, along with organization of large amounts of data and results. Without a methodical approach for storage, analysis and query, the resulting ad hoc analysis can lead to human error, loss of data and results, inefficient use of time, and lack of verifiability, repeatability, and extensibility. Methodology The Transcriptome Computational Workbench (TCW) provides Java graphical interfaces for methodical analysis for both single and comparative transcriptome data without the use of a reference genome (e.g. for non-model organisms). The singleTCW interface steps the user through importing transcript sequences (e.g. Illumina) or assembling long sequences (e.g. Sanger, 454, transcripts), annotating the sequences, and performing differential expression analysis using published statistical programs in R. The data, metadata, and results are stored in a MySQL database. The multiTCW interface builds a comparison database by importing sequence and annotation from one or more single TCW databases, executes the ESTscan program to translate the sequences into proteins, and then incorporates one or more clusterings, where the clustering options are to execute the orthoMCL program, compute transitive closure, or import clusters. Both singleTCW and multiTCW allow extensive query and display of the results, where singleTCW displays the alignment of annotation hits to transcript sequences, and multiTCW displays multiple transcript alignments with MUSCLE or pairwise alignments. The query programs can be executed on the desktop for fastest analysis, or from the web for sharing the results. Conclusion It is now affordable to buy a multi-processor machine, and easy to install Java and MySQL. By simply downloading the TCW, the user can interactively analyze, query and view their data. The TCW allows in-depth data mining of the results, which can lead to a better understanding of the transcriptome. 
TCW is freely available from www.agcol.arizona.edu/software/tcw. PMID:23874959
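One of the clustering options listed for multiTCW is transitive closure. The idea can be sketched with a union-find structure: if sequence t1 is linked to t2 and t2 to t3, all three land in one cluster. This is an illustrative sketch, not TCW code (TCW itself is a Java application); the sequence identifiers, the pair list, and the function name are hypothetical.

```python
def transitive_closure_clusters(pairs):
    """Cluster items so that any chain of pairwise links joins one group."""
    parent = {}

    def find(x):
        # Follow parent pointers to the root, halving the path on the way.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for item in list(parent):
        groups.setdefault(find(item), set()).add(item)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical pairwise similarity hits between translated sequences.
hits = [("t1", "t2"), ("t2", "t3"), ("t4", "t5")]
clusters = transitive_closure_clusters(hits)
```

A known trade-off of this approach is that a single spurious link can merge two otherwise unrelated groups, which is one reason a tool may offer alternative clustering methods alongside it.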
TCW: transcriptome computational workbench.
Soderlund, Carol; Nelson, William; Willer, Mark; Gang, David R
2013-01-01
The analysis of transcriptome data involves many steps and various programs, along with organization of large amounts of data and results. Without a methodical approach for storage, analysis and query, the resulting ad hoc analysis can lead to human error, loss of data and results, inefficient use of time, and lack of verifiability, repeatability, and extensibility. The Transcriptome Computational Workbench (TCW) provides Java graphical interfaces for methodical analysis for both single and comparative transcriptome data without the use of a reference genome (e.g. for non-model organisms). The singleTCW interface steps the user through importing transcript sequences (e.g. Illumina) or assembling long sequences (e.g. Sanger, 454, transcripts), annotating the sequences, and performing differential expression analysis using published statistical programs in R. The data, metadata, and results are stored in a MySQL database. The multiTCW interface builds a comparison database by importing sequence and annotation from one or more single TCW databases, executes the ESTscan program to translate the sequences into proteins, and then incorporates one or more clusterings, where the clustering options are to execute the orthoMCL program, compute transitive closure, or import clusters. Both singleTCW and multiTCW allow extensive query and display of the results, where singleTCW displays the alignment of annotation hits to transcript sequences, and multiTCW displays multiple transcript alignments with MUSCLE or pairwise alignments. The query programs can be executed on the desktop for fastest analysis, or from the web for sharing the results. It is now affordable to buy a multi-processor machine, and easy to install Java and MySQL. By simply downloading the TCW, the user can interactively analyze, query and view their data. The TCW allows in-depth data mining of the results, which can lead to a better understanding of the transcriptome. 
TCW is freely available from www.agcol.arizona.edu/software/tcw.
Towards high accuracy tests on the substellar IMF in young clusters. A survey in NGC 2024.
NASA Astrophysics Data System (ADS)
Da Rio, Nicola
2017-08-01
Measuring the Initial Mass Function in young clusters, and testing its universality, is a fundamental benchmark to constrain the physical processes and theoretical models of star formation. The shape and universality of the stellar IMF are well known. Our observational characterization of the substellar IMF, on the other hand, remains more uncertain, along with its possible environmental variations. Because of this, the physical processes that play a role in the formation of brown dwarfs are not fully constrained. In Cycle 22 we were awarded HST time to carry out the deepest spectro-photometric census of BDs in a young cluster: the Orion Nebula Cluster. Through deep WFC3/IR narrow band imaging, we are able to obtain Teff and A_V down to 15 Mjup. Preliminary analysis limited to a portion of the total field of view allows us to classify several hundred BDs, place them in the HRD and obtain, for an extinction limited sample, the complete and consistent IMF down to planetary masses. The substellar slope is consistent with the Galactic IMF but a rapid drop is found at the H-burning limit. We propose to carry out a nearly identical survey with HST in a younger, less massive nearby cluster: NGC2024 in the Flame Nebula. This will allow us to derive the complete census of the young population down to planetary masses, derive the IMF, enabling a consistent comparison with the results in the ONC. We will specifically look for statistically significant IMF variations with environmental properties (cluster mass, density) and investigate primordial mass segregation in the substellar regime. These results will significantly help to constrain the mechanisms involved in BD formation.
Galactic Astronomy in the Ultraviolet
NASA Astrophysics Data System (ADS)
Rastorguev, A. S.; Sachkov, M. E.; Zabolotskikh, M. V.
2017-12-01
We propose a number of prospective observational programs for the ultraviolet space observatory WSO-UV, which seem to be of great importance to modern galactic astronomy. The programs include the search for binary Cepheids; the search and detailed photometric study and the analysis of radial distribution of UV-bright stars in globular clusters ("blue stragglers", blue horizontal-branch stars, RR Lyrae variables, white dwarfs, and stars with UV excesses); the investigation of stellar content and kinematics of young open clusters and associations; the study of spectral energy distribution in hot stars, including calculation of the extinction curves in the UV, optical and NIR; and accurate definition of the relations between the UV-colors and effective temperature. The high angular resolution of the observatory allows accurate astrometric measurements of stellar proper motions and their kinematic analysis.
Zakharov, A.; Vitale, C.; Kilinc, E.; Koroleva, K.; Fayuk, D.; Shelukhina, I.; Naumenko, N.; Skorinkin, A.; Khazipov, R.; Giniatullin, R.
2015-01-01
Trigeminal nerves in meninges are implicated in generation of nociceptive firing underlying migraine pain. However, the neurochemical mechanisms of nociceptive firing in meningeal trigeminal nerves are little understood. In this study, using suction electrode recordings from peripheral branches of the trigeminal nerve in isolated rat meninges, we analyzed spontaneous and capsaicin-induced orthodromic spiking activity. In control, biphasic single spikes with variable amplitude and shapes were observed. Application of the transient receptor potential vanilloid 1 (TRPV1) agonist capsaicin to meninges dramatically increased firing whereas the amplitudes and shapes of spikes remained essentially unchanged. This effect was antagonized by the specific TRPV1 antagonist capsazepine. Using the clustering approach, several groups of uniform spikes (clusters) were identified. The clustering approach combined with capsaicin application allowed us to detect and to distinguish “responder” (65%) from “non-responder” clusters (35%). Notably, responders fired spikes at frequencies exceeding 10 Hz, high enough to provide postsynaptic temporal summation of excitation at brainstem and spinal cord level. Almost all spikes were suppressed by tetrodotoxin (TTX) suggesting an involvement of the TTX-sensitive sodium channels in nociceptive signaling at the peripheral branches of trigeminal neurons. Our analysis also identified transient (desensitizing) and long-lasting (slowly desensitizing) responses to the continuous application of capsaicin. Thus, the persistent activation of nociceptors in capsaicin-sensitive nerve fibers shown here may be involved in trigeminal pain signaling and plasticity along with the release of migraine-related neuropeptides from TRPV1 positive neurons. Furthermore, cluster analysis could be widely used to characterize the temporal and neurochemical profiles of other pain transducers likely implicated in migraine. PMID:26283923
Psychological profiling of offender characteristics from crime behaviors in serial rape offences.
Kocsis, Richard N; Cooksey, Ray W; Irwin, Harvey J
2002-04-01
Criminal psychological profiling has progressively been incorporated into police procedures despite a dearth of empirical research. Indeed, in the study of serial violent crimes for the purpose of psychological profiling, very few original, quantitative, academically reviewed studies actually exist. This article reports on the analysis of 62 incidents of serial sexual assault. The statistical procedure of multidimensional scaling was employed in the analysis of these data, which in turn produced a five-cluster model of serial rapist behavior. First, a central cluster of behaviors was identified that represents behaviors common to all patterns of serial rape. Second, four distinct outlying patterns were identified as demonstrating distinct offence styles, these being assigned the following descriptive labels: brutality, intercourse, chaotic, and ritual. Furthermore, analysis of these patterns also identified distinct offender characteristics that allow for the use of empirically robust offender profiles in future serial rape investigations.
Grande, J A; Borrego, J; Morales, J A; de la Torre, M L
2003-04-01
In the last few decades, the space-time distribution and variation of heavy metals in estuaries have been extensively studied as an environmental indicator. In the case described here, the combination of acid water from mines, industrial effluents and sea water plays a determining role in the evolutionary process of the chemical makeup of the water in the estuary of the Tinto and Odiel Rivers, located in the southwest of the Iberian Peninsula. Based on the statistical treatment of the data from the analysis of the water samples from this system, which has been affected by processes of industrial and mining pollution, the 16 variables analyzed can be grouped into two large families. Each family presents high, positive Pearson r values that suggest common origins (fluvial or sea) for the pollutants present in the water analyzed and allow their subsequent contrast through cluster analysis.
Determining requirements for patient-centred care: a participatory concept mapping study.
Ogden, Kathryn; Barr, Jennifer; Greenfield, David
2017-11-28
Recognition of a need for patient-centred care is not new, however making patient-centred care a reality remains a challenge to organisations. We need empirical studies to extend current understandings, create new representations of the complexity of patient-centred care, and guide collective action toward patient-centred health care. To achieve these ends, the research aim was to empirically determine what organisational actions are required for patient-centred care to be achieved. We used an established participatory concept mapping methodology. Cross-sector stakeholders contributed to the development of statements for patient-centred care requirements, sorting statements into groupings according to similarity, and rating each statement according to importance, feasibility, and achievement. The resultant data were analysed to produce a visual concept map representing participants' conceptualisation of patient-centred care requirements. Analysis included the development of a similarity matrix, multidimensional scaling, hierarchical cluster analysis, selection of the number of clusters and their labels, identifying overarching domains and quantitative representation of rating data. The outcome was the development of a conceptual map for the Requirements of Patient-Centred Care Systems (ROPCCS). ROPCCS incorporates 123 statements sorted into 13 clusters. Cluster labels were: shared responsibility for personalised health literacy; patient provider dynamic for care partnership; collaboration; shared power and responsibility; resources for coordination of care; recognition of humanity - skills and attributes; knowing and valuing the patient; relationship building; system review evaluation and new models; commitment to supportive structures and processes; elements to facilitate change; professional identity and capability development; and explicit education and learning. 
The clusters were grouped into three overarching domains, representing a cross-sectoral approach: humanity and partnership; career spanning education and training; and health systems, policy and management. Rating of statements allowed the generation of go-zone maps for further interrogation of the relative importance, feasibility, and achievement of each patient-centred care requirement and cluster. The study has empirically determined requirements for patient-centred care through the development of ROPCCS. The unique map emphasises collaborative responsibility of stakeholders to ensure that patient-centred care is comprehensively progressed. ROPCCS allows the complex requirements for patient-centred care to be understood, implemented, evaluated, measured, and shown to be occurring.
RNA polymerase beta-subunit gene (rpoB) sequence analysis for the identification of Bacteroides spp.
Ko, K S; Kuwahara, T; Haehwa, L; Yoon, Y-J; Kim, B-J; Lee, K-H; Ohnishi, Y; Kook, Y-H
2007-01-01
Partial rpoB sequences (317 bp) of 11 species of Bacteroides, two Porphyromonas spp. and two Prevotella spp. were compared to delineate the genetic relationships among Bacteroides and closely related anaerobic species. The high level of inter-species sequence dissimilarities (7.6-20.8%) allowed the various Bacteroides spp. to be distinguished. The position of the Bacteroides distasonis and Bacteroides merdae cluster in the rpoB tree was different from the position in the 16S rRNA gene tree. Based on rpoB sequence similarity and clustering in the rpoB tree, it was possible to correctly re-identify 80 clinical isolates of Bacteroides. In addition to two subgroups, cfiA-negative (division I) and cfiA-positive (division II), of Bacteroides fragilis isolates, two distinct subgroups were also found among Bacteroides ovatus and Bacteroides thetaiotaomicron isolates. Bacteroides genus-specific rpoB PCR and B. fragilis species-specific rpoB PCR allowed Bacteroides spp. to be differentiated from Porphyromonas and Prevotella spp., and also allowed B. fragilis to be differentiated from other non-fragilis Bacteroides spp. included in the present study.
Kovalska, M P; Bürki, E; Schoetzau, A; Orguel, S F; Orguel, S; Grieshaber, M C
2011-04-01
The distinction of real progression from test variability in visual field (VF) series may be based on clinical judgment, on trend analysis based on follow-up of test parameters over time, or on identification of a significant change related to the mean of baseline exams (event analysis). The aim of this study was to compare a new population-based method (Octopus field analysis, OFA) with classic regression analyses and clinical judgment for detecting glaucomatous VF changes. 240 VF series of 240 patients with at least 9 consecutive examinations available were included into this study. They were independently classified by two experienced investigators. The results of such a classification served as a reference for comparison for the following statistical tests: (a) t-test global, (b) r-test global, (c) regression analysis of 10 VF clusters and (d) point-wise linear regression analysis. 32.5 % of the VF series were classified as progressive by the investigators. The sensitivity and specificity were 89.7 % and 92.0 % for r-test, and 73.1 % and 93.8 % for the t-test, respectively. In the point-wise linear regression analysis, the specificity was comparable (89.5 % versus 92 %), but the sensitivity was clearly lower than in the r-test (22.4 % versus 89.7 %) at a significance level of p = 0.01. A regression analysis for the 10 VF clusters showed a markedly higher sensitivity for the r-test (37.7 %) than the t-test (14.1 %) at a similar specificity (88.3 % versus 93.8 %) for a significant trend (p = 0.005). In regard to the cluster distribution, the paracentral clusters and the superior nasal hemifield progressed most frequently. The population-based regression analysis seems to be superior to the trend analysis in detecting VF progression in glaucoma, and may eliminate the drawbacks of the event analysis. 
Further, it may assist the clinician in the evaluation of VF series and may allow better visualization of the correlation between function and structure owing to VF clusters. © Georg Thieme Verlag KG Stuttgart · New York.
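The sensitivity and specificity values quoted in this abstract come from comparing each test's calls with the investigators' reference classification. A minimal Python sketch of that arithmetic follows. The per-series counts are not reported in the abstract; they are back-calculated here from the stated percentages (32.5 % of 240 series gives 78 progressive and 162 stable series) and should be read as an illustrative assumption only. The function name is hypothetical.

```python
def sensitivity_specificity(reference, predicted):
    """Return (sensitivity, specificity) for paired boolean labels.

    reference: True where the reference classification says "progressive".
    predicted: True where the statistical test flags progression.
    """
    pairs = list(zip(reference, predicted))
    tp = sum(1 for r, p in pairs if r and p)          # true positives
    fn = sum(1 for r, p in pairs if r and not p)      # missed progressions
    tn = sum(1 for r, p in pairs if not r and not p)  # true negatives
    fp = sum(1 for r, p in pairs if not r and p)      # false alarms
    return tp / (tp + fn), tn / (tn + fp)


# Assumed counts (back-calculated, not taken from the paper):
# 78 progressive series, 70 of them flagged; 162 stable series, 13 flagged.
reference = [True] * 78 + [False] * 162
predicted = [True] * 70 + [False] * 8 + [False] * 149 + [True] * 13

sens, spec = sensitivity_specificity(reference, predicted)
print(round(100 * sens, 1), round(100 * spec, 1))
```

With these assumed counts the result rounds to the 89.7 % and 92.0 % reported for the r-test.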
Pyglidein - A Simple HTCondor Glidein Service
NASA Astrophysics Data System (ADS)
Schultz, D.; Riedel, B.; Merino, G.
2017-10-01
A major challenge for data processing and analysis at the IceCube Neutrino Observatory presents itself in connecting a large set of individual clusters together to form a computing grid. Most of these clusters do not provide a “standard” grid interface. Using a local account on each submit machine, HTCondor glideins can be submitted to virtually any type of scheduler. The glideins then connect back to a main HTCondor pool, where jobs can run normally with no special syntax. To respond to dynamic load, a simple server advertises the number of idle jobs in the queue and the resources they request. The submit script can query this server to optimize glideins to what is needed, or not submit if there is no demand. Configuring HTCondor dynamic slots in the glideins allows us to efficiently handle varying memory requirements as well as whole-node jobs. One step of the IceCube simulation chain, photon propagation in the ice, heavily relies on GPUs for faster execution. Therefore, one important requirement for any workload management system in IceCube is to handle GPU resources properly. Within the pyglidein system, we have successfully configured HTCondor glideins to use any GPU allocated to them, with jobs using the standard HTCondor GPU syntax to request and use a GPU. This mechanism allows us to seamlessly integrate our local GPU cluster with remote non-Grid GPU clusters, including specially allocated resources at XSEDE supercomputers.
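The demand-driven behaviour described in this abstract (submit glideins only when the server reports idle jobs, sized to what those jobs request) can be pictured with a small Python sketch. This is not pyglidein's actual code or API: the function name, the job dictionary layout, the two limits, and the choice to serve GPU demand first are all assumptions made for illustration.

```python
def glideins_to_submit(idle_jobs, running_glideins, max_total=100, max_per_cycle=10):
    """Hypothetical policy: decide how many CPU and GPU glideins to submit.

    idle_jobs: list of dicts such as {"gpus": 1}, standing in for what a
               demand server might advertise (assumed format).
    running_glideins: number of glideins already queued or running.
    """
    if not idle_jobs:
        # No demand: do not submit anything.
        return {"cpu": 0, "gpu": 0}
    # Never exceed the site-wide cap or the per-cycle cap.
    budget = min(max(0, max_total - running_glideins), max_per_cycle)
    gpu_demand = sum(1 for job in idle_jobs if job.get("gpus", 0) > 0)
    cpu_demand = len(idle_jobs) - gpu_demand
    # Assumed design choice: GPU requests are served before CPU-only ones.
    gpu = min(gpu_demand, budget)
    cpu = min(cpu_demand, budget - gpu)
    return {"cpu": cpu, "gpu": gpu}


idle = [{"gpus": 1}, {"gpus": 0}, {"gpus": 0}]
print(glideins_to_submit(idle, running_glideins=95))
```

In this toy call the budget is five glideins, so all three idle jobs are covered: one GPU glidein and two CPU glideins.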
ERIC Educational Resources Information Center
Dopke, Nancy Carter; Lovett, Timothy Neal
2007-01-01
Mass spectrometry is a widely used and versatile tool for scientists in many different fields. Soft ionization techniques such as matrix-assisted laser desorption/ionization (MALDI) allow for the analysis of biomolecules, polymers, and clusters. This article describes a MALDI mass spectrometry experiment designed for students in introductory…
Using Cluster Analysis to Extend Usability Testing to Instructional Content. CRESST Report 816
ERIC Educational Resources Information Center
Kerr, Deirdre S.; Chung, Gregory K. W. K.
2012-01-01
Commercial video games undergo usability studies to determine the degree to which the player is able to learn, control, and understand the game. Usability studies allow game designers to improve their games before they are released to the public. If usability studies could be expanded to include information about the presentation of the…
An Analysis of Category Management of Service Contracts
2017-12-01
management teams a way to make informed, data-driven decisions. Data-driven decisions derived from clustering not only align with Category...savings. Furthermore, this methodology provides a data-driven visualization to inform sound business decisions on potential Category Management ...Category Management initiatives. The Maptitude software will allow future research to collect data and develop visualizations to inform Category
NASA Astrophysics Data System (ADS)
Sokolov, Anton; Dmitriev, Egor; Delbarre, Hervé; Augustin, Patrick; Gengembre, Cyril; Fourmenten, Marc
2016-04-01
The problem of atmospheric contamination by principal air pollutants was considered in the industrialized coastal region of the English Channel in Dunkirk, influenced by north European metropolitan areas. MESO-NH nested models were used for the simulation of the local atmospheric dynamics and the online calculation of Lagrangian backward trajectories with 15-minute temporal resolution and horizontal resolution down to 500 m. The one-month mesoscale numerical simulation was coupled with local pollution measurements of volatile organic components, particulate matter, ozone, sulphur dioxide and nitrogen oxides. Principal atmospheric pathways were determined by a clustering technique applied to the simulated backward trajectories. Six clusters were obtained which describe local atmospheric dynamics: four with winds blowing through the English Channel, one with wind coming from the south, and the biggest cluster with low wind speeds. This last cluster includes mostly sea breeze events. The analysis of meteorological data and pollution measurements allows the principal atmospheric pathways to be related to local air contamination events. It was shown that contamination events are mostly connected with a channelling of pollution from local sources and low-turbulent states of the local atmosphere.
Measurements of resonant scattering in the Perseus cluster core with Hitomi SXS
NASA Astrophysics Data System (ADS)
Sato, K.; Zhuravleva, I.
2017-10-01
Hitomi (ASTRO-H) SXS allows us to investigate fine structures of emission lines in extended X-ray sources for the first time. Thanks to its high energy resolution of 5 eV at 6 keV in orbit, Hitomi SXS finds a quiescent atmosphere in the intracluster medium of the Perseus cluster core, where the gas has a line-of-sight velocity dispersion below 200 km/s from the line width in the spectral analysis (Hitomi collaboration, Nature, 2016). Resonant scattering is also important for measuring the gas velocity, as a probe complementary to the direct measurement from the line width. Particularly in the cluster core, resonant scattering should be taken into account when inferring physical properties from line intensities because the optical depth of the He-alpha resonant line is expected to be larger than 1. The observed line flux ratio of Fe XXV He-α resonant to forbidden lines is found to be lower in the cluster core when compared to the outer region, consistent with resonant scattering of the resonant line and also in support of the low turbulent velocity.
Applying Pose Clustering and MD Simulations To Eliminate False Positives in Molecular Docking.
Makeneni, Spandana; Thieker, David F; Woods, Robert J
2018-03-26
In this work, we developed a computational protocol that employs multiple molecular docking experiments, followed by pose clustering, molecular dynamics simulations (10 ns), and energy rescoring to produce reliable 3D models of antibody-carbohydrate complexes. The protocol was applied to 10 antibody-carbohydrate co-complexes and three unliganded (apo) antibodies. Pose clustering significantly reduced the number of potential poses. For each system, 15 or fewer clusters out of 100 initial poses were generated and chosen for further analysis. Molecular dynamics (MD) simulations allowed the docked poses to either converge or disperse, and rescoring increased the likelihood that the best-ranked pose was an acceptable pose. This approach is amenable to automation and can be a valuable aid in determining the structure of antibody-carbohydrate complexes provided there is no major side chain rearrangement or backbone conformational change in the H3 loop of the CDR regions. Further, the basic protocol of docking a small ligand to a known binding site, clustering the results, and performing MD with a suitable force field is applicable to any protein-ligand system.
Influence of Aromatic Molecules on the Structure and Spectroscopy of Water Clusters
NASA Astrophysics Data System (ADS)
Tabor, Daniel P.; Sibert, Edwin; Walsh, Patrick S.; Zwier, Timothy S.
2016-06-01
Isomer-specific resonant ion-dip infrared spectra are presented for benzene-(water)_n, 1,2-diphenoxyethane-(water)_n, and tricyclophane-(water)_n clusters. The IR spectra are modeled with a local mode Hamiltonian that was originally formulated for the analysis of benzene-(water)_n clusters with up to seven waters. The model accounts for stretch-bend Fermi coupling, which can complicate the IR spectra in the 3150-3300 cm⁻¹ region. When the water clusters interact with each of the solutes, the hydrogen bond lengths between the water molecules change in a characteristic way, reflecting the strength of the solute-water interaction. These structural effects are also reflected spectroscopically in the shifts of the local mode OH stretch frequencies. When diphenoxyethane is the solute, the water clusters distort more significantly than when bound to benzene. Tricyclophane's structure provides an aromatic-rich binding pocket for the water clusters. The local mode model is used to extract Hamiltonians for individual water molecules. These monomer Hamiltonians divide into groups based on their local H-bonding architecture, allowing for further classification of the wide variety of water environments encountered in this study.
Cataloging the Praesepe Cluster: Identifying Interlopers and Binary Systems
NASA Astrophysics Data System (ADS)
Lucey, Madeline R.; Gosnell, Natalie M.; Mann, Andrew; Douglas, Stephanie
2018-01-01
We present radial velocity measurements from an ongoing survey of the Praesepe open cluster using the WIYN 3.5m Telescope. Our target stars include 229 early-K to mid-M dwarfs with proper motion memberships that have been observed by the repurposed Kepler mission, K2. With this survey, we will provide a well-constrained membership list of the cluster. By removing interloping stars and determining the cluster binary frequency we can avoid systematic errors in our analysis of the K2 findings and more accurately determine exoplanet properties in the Praesepe cluster. Obtaining accurate exoplanet parameters in open clusters allows us to study the temporal dimension of exoplanet parameter space. We find Praesepe to have a mean radial velocity of 34.09 km/s and a velocity dispersion of 1.13 km/s, which is consistent with previous studies. We derive radial velocity membership probabilities for stars with ≥3 radial velocity measurements and compare against published membership probabilities. We also identify radial velocity variables and potential double-lined spectroscopic binaries. We plan to obtain more observations to determine the radial velocity membership of all the stars in our sample, as well as follow up on radial velocity variables to determine binary orbital solutions.
Global survey of star clusters in the Milky Way. VI. Age distribution and cluster formation history
NASA Astrophysics Data System (ADS)
Piskunov, A. E.; Just, A.; Kharchenko, N. V.; Berczik, P.; Scholz, R.-D.; Reffert, S.; Yen, S. X.
2018-06-01
Context. The all-sky Milky Way Star Clusters (MWSC) survey provides uniform and precise ages, along with other relevant parameters, for a wide variety of clusters in the extended solar neighbourhood. Aims: In this study we aim to construct the cluster age distribution, investigate its spatial variations, and discuss constraints on cluster formation scenarios of the Galactic disk during the last 5 Gyrs. Methods: Due to the spatial extent of the MWSC, we have considered spatial variations of the age distribution along galactocentric radius RG, and along Z-axis. For the analysis of the age distribution we used 2242 clusters, which all lie within roughly 2.5 kpc of the Sun. To connect the observed age distribution to the cluster formation history we built an analytical model based on simple assumptions on the cluster initial mass function and on the cluster mass-lifetime relation, fit it to the observations, and determined the parameters of the cluster formation law. Results: Comparison with the literature shows that earlier results strongly underestimated the number of evolved clusters with ages t ≳ 100 Myr. Recent studies based on all-sky catalogues agree better with our data, but still lack the oldest clusters with ages t ≳ 1 Gyr. We do not observe a strong variation in the age distribution along RG, though we find an enhanced fraction of older clusters (t > 1 Gyr) in the inner disk. In contrast, the distribution strongly varies along Z. The high altitude distribution practically does not contain clusters with t < 1 Gyr. With simple assumptions on the cluster formation history, the cluster initial mass function and the cluster lifetime we can reproduce the observations. The cluster formation rate and the cluster lifetime are strongly degenerate, which does not allow us to disentangle different formation scenarios. In all cases the cluster formation rate is strongly declining with time, and the cluster initial mass function is very shallow at the high mass end.
TSCAN: Pseudo-time reconstruction and evaluation in single-cell RNA-seq analysis
Ji, Zhicheng; Ji, Hongkai
2016-01-01
When analyzing single-cell RNA-seq data, constructing a pseudo-temporal path to order cells based on the gradual transition of their transcriptomes is a useful way to study gene expression dynamics in a heterogeneous cell population. Currently, a limited number of computational tools are available for this task, and quantitative methods for comparing different tools are lacking. Tools for Single Cell Analysis (TSCAN) is a software tool developed to better support in silico pseudo-Time reconstruction in Single-Cell RNA-seq ANalysis. TSCAN uses a cluster-based minimum spanning tree (MST) approach to order cells. Cells are first grouped into clusters and an MST is then constructed to connect cluster centers. Pseudo-time is obtained by projecting each cell onto the tree, and the ordered sequence of cells can be used to study dynamic changes of gene expression along the pseudo-time. Clustering cells before MST construction reduces the complexity of the tree space. This often leads to improved cell ordering. It also allows users to conveniently adjust the ordering based on prior knowledge. TSCAN has a graphical user interface (GUI) to support data visualization and user interaction. Furthermore, quantitative measures are developed to objectively evaluate and compare different pseudo-time reconstruction methods. TSCAN is available at https://github.com/zji90/TSCAN and as a Bioconductor package. PMID:27179027
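The cluster-then-MST ordering described in this abstract can be sketched in a few lines of Python. This is an illustrative toy, not TSCAN's implementation: the cluster centers are assumed to be already computed, the tree is assumed to be an unbranched path, and each cell is projected onto its nearest tree edge, which is a simplification of whatever projection rule the package uses. All function names are hypothetical.

```python
import math


def prim_mst(points):
    """Minimum spanning tree over cluster centers (Prim's algorithm)."""
    n = len(points)
    in_tree, edges = {0}, []
    while len(in_tree) < n:
        # Shortest edge joining the current tree to a node outside it.
        _, i, j = min(
            (math.dist(points[a], points[b]), a, b)
            for a in in_tree
            for b in range(n)
            if b not in in_tree
        )
        edges.append((i, j))
        in_tree.add(j)
    return edges


def path_order(n, edges):
    """Walk the tree from one leaf to the other (assumes no branching)."""
    nbrs = {i: [] for i in range(n)}
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    order, prev = [min(i for i in nbrs if len(nbrs[i]) == 1)], None
    while len(order) < n:
        nxt = [k for k in nbrs[order[-1]] if k != prev]
        prev = order[-1]
        order.append(nxt[0])
    return order


def pseudotime(cell, centers, order):
    """Arc length along the path at the cell's closest projection."""
    best, offset = None, 0.0
    for a, b in zip(order, order[1:]):
        pa, pb = centers[a], centers[b]
        seg = math.dist(pa, pb)
        # Fraction along the segment, clamped to its end points.
        t = sum((c - x) * (y - x) for c, x, y in zip(cell, pa, pb)) / seg**2
        t = max(0.0, min(1.0, t))
        proj = tuple(x + t * (y - x) for x, y in zip(pa, pb))
        d = math.dist(cell, proj)
        if best is None or d < best[0]:
            best = (d, offset + t * seg)
        offset += seg
    return best[1]


# Toy data in a 2-D reduced space (made up for illustration).
centers = [(0.0, 0.0), (4.0, 0.0), (2.0, 1.0)]
cells = [(3.0, 0.6), (0.5, 0.1), (2.1, 0.9)]
order = path_order(len(centers), prim_mst(centers))
times = [pseudotime(c, centers, order) for c in cells]
ordering = sorted(range(len(cells)), key=lambda k: times[k])
print(order, ordering)
```

Because the tree is built over a handful of cluster centers rather than over every cell, the search space stays small, which is the complexity reduction the abstract refers to.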
TSCAN: Pseudo-time reconstruction and evaluation in single-cell RNA-seq analysis.
Ji, Zhicheng; Ji, Hongkai
2016-07-27
When analyzing single-cell RNA-seq data, constructing a pseudo-temporal path to order cells based on the gradual transition of their transcriptomes is a useful way to study gene expression dynamics in a heterogeneous cell population. Currently, a limited number of computational tools are available for this task, and quantitative methods for comparing different tools are lacking. Tools for Single Cell Analysis (TSCAN) is a software tool developed to better support in silico pseudo-Time reconstruction in Single-Cell RNA-seq ANalysis. TSCAN uses a cluster-based minimum spanning tree (MST) approach to order cells. Cells are first grouped into clusters and an MST is then constructed to connect cluster centers. Pseudo-time is obtained by projecting each cell onto the tree, and the ordered sequence of cells can be used to study dynamic changes of gene expression along the pseudo-time. Clustering cells before MST construction reduces the complexity of the tree space. This often leads to improved cell ordering. It also allows users to conveniently adjust the ordering based on prior knowledge. TSCAN has a graphical user interface (GUI) to support data visualization and user interaction. Furthermore, quantitative measures are developed to objectively evaluate and compare different pseudo-time reconstruction methods. TSCAN is available at https://github.com/zji90/TSCAN and as a Bioconductor package. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
[Differences in living conditions and health between cities: construction of a composite indicator].
Luiz, Olinda do Carmo; Heimann, Luiza Sterman; Boaretto, Roberta Cristina; Pacheco, Adriana Galvão; Pessoto, Umberto Catarino; Ibanhes, Lauro Cesar; Castro, Iracema Ester do Nascimento; Kayano, Jorge; Junqueira, Virginia; Rocha, Jucilene Leite da; Cortizo, Carlos Tato; Telesi Junior, Emílio
2009-02-01
To describe an index to identify inequities in living conditions and health and its relationship with health planning. Variables and indicators that would reflect demographic, economic, environment and education processes as well as supply and production of health services were applied for nondimensional scaling and clustering of 5,507 Brazilian municipalities. Data sources were the 2000 Census and the Brazilian Ministry of Health information systems. Z-score test statistics and cluster analysis were performed, allowing 4 groups of municipalities to be defined by living conditions. A polarization was seen between the group with the best living conditions and health (Group 1) and the group with the worst living conditions (Group 4). Group 1 consisted of municipalities with larger populations while Group 4 comprised mainly the smallest municipalities. As for Brazilian macroregions, municipalities in Group 1 are clustered in the South and Southeast and those in Group 4 are in the Northeast. The living conditions and health index comprises dimensions of reality such as housing, environment and health, which allows the most vulnerable municipalities to be identified and can provide input for setting priorities and developing criteria for more equitable financing and resource allocation.
Bringing Clouds into Our Lab! - The Influence of Turbulence on the Early Stage Rain Droplets
NASA Astrophysics Data System (ADS)
Yavuz, Mehmet Altug; Kunnen, Rudie; Heijst, Gertjan; Clercx, Herman
2015-11-01
We are investigating a droplet-laden flow in an air-filled turbulence chamber, forced by speaker-driven air jets. The speakers are running in a random manner; yet they allow us to control and define the statistics of the turbulence. We study the motion of droplets with tunable size (Stokes numbers ~ 0.13 - 9) in a turbulent flow, mimicking the early stages of raindrop formation. 3D Particle Tracking Velocimetry (PTV) together with Laser Induced Fluorescence (LIF) methods are chosen as the experimental method to track the droplets and collect data for statistical analysis. Thereby it is possible to study the spatial distribution of the droplets in turbulence using the so-called Radial Distribution Function (RDF), a statistical measure to quantify the clustering of particles. Additionally, the 3D-PTV technique allows us to measure velocity statistics of the droplets and the influence of the turbulence on droplet trajectories, both individually and collectively. In this contribution, we will present the clustering probability quantified by the RDF for different Stokes numbers. We will explain the physics underlying the influence of turbulence on droplet cluster behavior. This study is supported by FOM/NWO Netherlands.
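The Radial Distribution Function mentioned in this abstract compares the number of particle pairs found at separation r with the number expected for uniformly scattered particles, so g(r) near 1 means no clustering and g(r) above 1 at small r means clustering. A minimal Python sketch follows. It assumes a periodic cubic box and the minimum-image convention, which is a common simulation convenience and not a description of the laboratory measurement volume; the function name and the random test positions are likewise illustrative.

```python
import math
import random


def radial_distribution(points, box, r_max, n_bins):
    """g(r) for 3-D points in a periodic cube of side `box` (r_max <= box/2)."""
    n = len(points)
    dr = r_max / n_bins
    counts = [0] * n_bins
    for i in range(n):
        for j in range(i + 1, n):
            d2 = 0.0
            for a, b in zip(points[i], points[j]):
                d = abs(a - b)
                d = min(d, box - d)  # minimum-image convention
                d2 += d * d
            r = math.sqrt(d2)
            if r < r_max:
                counts[min(int(r / dr), n_bins - 1)] += 1
    n_pairs = n * (n - 1) / 2
    g = []
    for k, c in enumerate(counts):
        # Volume of the spherical shell covered by bin k.
        shell = 4.0 / 3.0 * math.pi * (((k + 1) * dr) ** 3 - (k * dr) ** 3)
        # Pairs expected in that shell if positions were uniform.
        expected = n_pairs * shell / box**3
        g.append(c / expected)
    return g


# Uniformly random positions should give g(r) close to 1 in every bin.
random.seed(1)
pts = [(random.random(), random.random(), random.random()) for _ in range(400)]
g = radial_distribution(pts, box=1.0, r_max=0.5, n_bins=5)
print([round(x, 2) for x in g])
```

For clustered droplets one would expect the first bins to rise above 1, with the size of the excess depending on the Stokes number.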
Algorithmic localisation of noise sources in the tip region of a low-speed axial flow fan
NASA Astrophysics Data System (ADS)
Tóth, Bence; Vad, János
2017-04-01
An objective and algorithmised methodology is proposed to analyse beamform data obtained for axial fans. Its application is demonstrated in a case study regarding the tip region of a low-speed cooling fan. First, beamforming is carried out in a co-rotating frame of reference. Then, a distribution of source strength is extracted along the circumference of the rotor at the blade tip radius in each analysed third-octave band. The circumferential distributions are expanded into Fourier series, which allows for filtering out the effects of perturbations, on the basis of an objective criterion. The remaining Fourier components are then considered as base sources to determine the blade-passage-periodic flow mechanisms responsible for the broadband noise. Based on their frequency and angular location, the base sources are grouped together. This is done using the fuzzy c-means clustering method to allow the overlap of the source mechanisms. The number of clusters is determined in a validity analysis. Finally, the obtained clusters are assigned to source mechanisms based on the literature. Thus, turbulent boundary layer - trailing edge interaction noise, tip leakage flow noise, and double leakage flow noise are identified.
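Fuzzy c-means, the clustering method named in this abstract, gives every item a graded membership in each cluster instead of a single label, which is what lets overlapping source mechanisms share base sources. The sketch below is a generic textbook version in Python with fuzzifier m = 2, not the authors' code; the two-dimensional toy points stand in for (frequency, angular location) pairs and are invented, and the initial centers are passed in explicitly as an assumption to keep the run deterministic.

```python
import math


def fuzzy_c_means(points, centers, m=2.0, n_iter=50):
    """Alternate membership and center updates; return (memberships, centers)."""
    dims = range(len(points[0]))
    for _ in range(n_iter):
        # Membership step: u_ik proportional to d_ik ** (-2 / (m - 1)).
        u = []
        for p in points:
            d = [math.dist(p, c) for c in centers]
            if min(d) == 0.0:
                # Point sits exactly on a center: full membership there.
                row = [1.0 if x == 0.0 else 0.0 for x in d]
            else:
                row = [x ** (-2.0 / (m - 1.0)) for x in d]
            total = sum(row)
            u.append([x / total for x in row])
        # Center step: mean of the points weighted by u_ik ** m.
        centers = []
        for i in range(len(u[0])):
            w = [row[i] ** m for row in u]
            centers.append(
                tuple(sum(wk * p[k] for wk, p in zip(w, points)) / sum(w) for k in dims)
            )
    # Memberships are those computed in the final membership step.
    return u, centers


# Two tight groups plus one point exactly halfway between them.
points = [(0.0, 1.0), (0.2, 1.2), (0.4, 1.0),
          (5.0, 4.0), (4.8, 3.8), (4.6, 4.0),
          (2.5, 2.5)]
u, centers = fuzzy_c_means(points, centers=[points[0], points[3]])
print([round(x, 3) for x in u[0]], [round(x, 3) for x in u[6]])
```

The halfway point ends up with roughly equal membership in both clusters, while points inside a group belong almost entirely to their own cluster.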
NASA Astrophysics Data System (ADS)
Martinez, F.; Marx, G.; Schweikhard, L.; Vass, A.; Ziegler, F.
2011-07-01
ClusterTrap has been designed to investigate properties of atomic clusters in the gas phase with particular emphasis on the dependence on the cluster size and charge state. The combination of cluster source, Penning trap and time-of-flight mass spectrometry allows a variety of experimental schemes including collision-induced dissociation, photo-dissociation, further ionization by electron impact, and electron attachment. Due to the storage capability of the trap extended-delay reaction experiments can be performed. Several recent modifications have resulted in an improved setup. In particular, an electrostatic quadrupole deflector allows the coupling of several sources or detectors to the Penning trap. Furthermore, a linear radio-frequency quadrupole trap has been added for accumulation and ion bunching and by switching the potential of a drift tube the kinetic energy of the cluster ions can be adjusted on their way towards or from the Penning trap. Recently, experiments on multiply negatively charged clusters have been resumed.
Zhang, Shaoliang; Lorenzo, Alberto; Gómez, Miguel-Angel; Mateus, Nuno; Gonçalves, Bruno; Sampaio, Jaime
2018-04-20
The aim of this study was: (i) to group basketball players into similar clusters based on a combination of anthropometric characteristics and playing experience; and (ii) to explore the distribution of players (including starters and non-starters) from different levels of teams within the obtained clusters. The game-related statistics from 699 regular season balanced games were analyzed using a two-step cluster model and a discriminant analysis. The clustering process identified five different player profiles: Top height and weight (HW) with low experience, TopHW-LowE; Middle HW with middle experience, MiddleHW-MiddleE; Middle HW with top experience, MiddleHW-TopE; Low HW with low experience, LowHW-LowE; Low HW with middle experience, LowHW-MiddleE. Discriminant analysis showed that the TopHW-LowE group was distinguished by two-point field goals made and missed, offensive and defensive rebounds, blocks, and personal fouls, whereas the LowHW-LowE group made the fewest passes and touches. The players from weaker teams were mostly distributed in the LowHW-LowE group, whereas players from stronger teams were mainly grouped in the LowHW-MiddleE group, and players that participated in the finals were allocated to the MiddleHW-MiddleE group. These results provide alternative references for basketball staff concerning the process of evaluating performance.
Review of methods for handling confounding by cluster and informative cluster size in clustered data
Seaman, Shaun; Pavlou, Menelaos; Copas, Andrew
2014-01-01
Clustered data are common in medical research. Typically, one is interested in a regression model for the association between an outcome and covariates. Two complications that can arise when analysing clustered data are informative cluster size (ICS) and confounding by cluster (CBC). ICS and CBC mean that the outcome of a member given its covariates is associated with, respectively, the number of members in the cluster and the covariate values of other members in the cluster. Standard generalised linear mixed models for cluster-specific inference and standard generalised estimating equations for population-average inference assume, in general, the absence of ICS and CBC. Modifications of these approaches have been proposed to account for CBC or ICS. This article is a review of these methods. We express their assumptions in a common format, thus providing greater clarity about the assumptions that methods proposed for handling CBC make about ICS and vice versa, and about when different methods can be used in practice. We report relative efficiencies of methods where available, describe how methods are related, identify a previously unreported equivalence between two key methods, and propose some simple additional methods. Unnecessarily using a method that allows for ICS/CBC has an efficiency cost when ICS and CBC are absent. We review tools for identifying ICS/CBC. A strategy for analysis when CBC and ICS are suspected is demonstrated by examining the association between socio-economic deprivation and preterm neonatal death in Scotland. PMID:25087978
Precise strong lensing mass profile of the CLASH galaxy cluster MACS 2129
NASA Astrophysics Data System (ADS)
Monna, A.; Seitz, S.; Balestra, I.; Rosati, P.; Grillo, C.; Halkola, A.; Suyu, S. H.; Coe, D.; Caminha, G. B.; Frye, B.; Koekemoer, A.; Mercurio, A.; Nonino, M.; Postman, M.; Zitrin, A.
2017-04-01
We present a detailed strong lensing (SL) mass reconstruction of the core of the galaxy cluster MACS J2129.4-0741 (z_cl = 0.589) obtained by combining high-resolution Hubble Space Telescope photometry from the CLASH (Cluster Lensing And Supernovae survey with Hubble) survey with new spectroscopic observations from the CLASH-VLT (Very Large Telescope) survey. A background bright red passive galaxy at z_sp = 1.36, sextuply lensed in the cluster core, has four radial lensed images located over the three central cluster members. Further 19 background lensed galaxies are spectroscopically confirmed by our VLT survey, including 3 additional multiple systems. A total of 31 multiple images are used in the lensing analysis. This allows us to trace with high precision the total mass profile of the cluster in its very inner region (R < 100 kpc). Our final lensing mass model reproduces the multiple images systems identified in the cluster core with high accuracy of 0.4 arcsec. This translates to a high-precision mass reconstruction of MACS 2129, which is constrained at a level of 2 per cent. The cluster has Einstein parameter Θ_E = (29 ± 4) arcsec and a projected total mass of M_tot(<Θ_E) = (1.35 ± 0.03) × 10^14 M⊙ within such radius. Together with the cluster mass profile, we provide here also the complete spectroscopic data set for the cluster members and lensed images measured with VLT/Visible Multi-Object Spectrograph within the CLASH-VLT survey.
Paladino, Simona; Lebreton, Stéphanie; Lelek, Mickaël; Riccio, Patrizia; De Nicola, Sergio; Zimmer, Christophe
2017-01-01
Spatio-temporal compartmentalization of membrane proteins is critical for the regulation of diverse vital functions in eukaryotic cells. It was previously shown that, at the apical surface of polarized MDCK cells, glycosylphosphatidylinositol (GPI)-anchored proteins (GPI-APs) are organized in small cholesterol-independent clusters of single GPI-AP species (homoclusters), which are required for the formation of larger cholesterol-dependent clusters formed by multiple GPI-AP species (heteroclusters). This clustered organization is crucial for the biological activities of GPI-APs; hence, understanding the spatio-temporal properties of their membrane organization is of fundamental importance. Here, by using direct stochastic optical reconstruction microscopy coupled to pair correlation analysis (pc-STORM), we were able to visualize and measure the size of these clusters. Specifically, we show that they are non-randomly distributed and have an average size of 67 nm. We also demonstrated that polarized MDCK and non-polarized CHO cells have similar cluster distribution and size, but different sensitivity to cholesterol depletion. Finally, we derived a model that allowed a quantitative characterization of the cluster organization of GPI-APs at the apical surface of polarized MDCK cells for the first time. Experimental FRET (fluorescence resonance energy transfer)/FLIM (fluorescence-lifetime imaging microscopy) data were correlated to the theoretical predictions of the model. PMID:29046391
MODEL-FREE MULTI-PROBE LENSING RECONSTRUCTION OF CLUSTER MASS PROFILES
DOE Office of Scientific and Technical Information (OSTI.GOV)
Umetsu, Keiichi
2013-05-20
Lens magnification by galaxy clusters induces characteristic spatial variations in the number counts of background sources, amplifying their observed fluxes and expanding the area of sky, the net effect of which, known as magnification bias, depends on the intrinsic faint-end slope of the source luminosity function. The bias is strongly negative for red galaxies, dominated by the geometric area distortion, whereas it is mildly positive for blue galaxies, enhancing the blue counts toward the cluster center. We generalize the Bayesian approach of Umetsu et al. for reconstructing projected cluster mass profiles, by incorporating multiple populations of background sources for magnification-bias measurements and combining them with complementary lens-distortion measurements, effectively breaking the mass-sheet degeneracy and improving the statistical precision of cluster mass measurements. The approach can be further extended to include strong-lensing projected mass estimates, thus allowing for non-parametric absolute mass determinations in both the weak and strong regimes. We apply this method to our recent CLASH lensing measurements of MACS J1206.2-0847, and demonstrate how combining multi-probe lensing constraints can improve the reconstruction of cluster mass profiles. This method will also be useful for a stacked lensing analysis, combining all lensing-related effects in the cluster regime, for a definitive determination of the averaged mass profile.
Identification of Hard X-ray Sources in Galactic Globular Clusters: Simbol-X Simulations
NASA Astrophysics Data System (ADS)
Servillat, M.
2009-05-01
Globular clusters harbour an excess of X-ray sources compared to the number of X-ray sources in the Galactic plane. It has been proposed that many of these X-ray sources are cataclysmic variables that have an intermediate magnetic field, i.e. intermediate polars, which remains to be confirmed and understood. We present here several methods to identify intermediate polars in globular clusters from multiwavelength analysis. First, we report on XMM-Newton, Chandra and HST observations of the very dense Galactic globular cluster NGC 2808. By comparing UV and X-ray properties of the cataclysmic variable candidates, the fraction of intermediate polars in this cluster can be estimated. We also present the optical spectra of two cataclysmic variables in the globular cluster M 22. The HeII (4686 Å) emission line in these spectra could be related to the presence of a magnetic field in these objects. Simulations of Simbol-X observations indicate that the angular resolution is sufficient to study X-ray sources in the core of close, less dense globular clusters, such as M 22. The sensitivity of Simbol-X in an extended energy band up to 80 keV will allow us to discriminate between hard X-ray sources (such as magnetic cataclysmic variables) and soft X-ray sources (such as chromospherically active binaries).
Yao, Hiroshi; Iwatsu, Mana
2016-04-05
Synthesis of atomically precise, water-soluble phosphine-protected gold clusters is still limited, probably due to a stability issue. We here present the synthesis, magic-number isolation, and exploration of the electronic structures as well as the asymmetric conversion of triphenylphosphine monosulfonate (TPPS)-protected gold clusters. Electrospray ionization mass spectrometry and elemental analysis indicate the primary formation of the Au11(TPPS)9Cl undecagold cluster compound. Magnetic circular dichroism (MCD) spectroscopy clarifies that extremely weak transitions are present in the low-energy region unresolved in the UV-vis absorption, which can be due to the Faraday B-terms based on the magnetically allowed transitions in the cluster. Asymmetric conversion without changing the nuclearity is remarkable by the chiral phase transfer in a synergistic fashion, which yields a rather small anisotropy factor (g-factor) of at most (2.5-7.0) × 10^-5. Quantum chemical calculations for model undecagold cluster compounds are then used to evaluate the optical and chiroptical responses induced by the chiral phase transfer. On this basis, we find that the Au core distortion is negligible, and the chiral ion-pairing causes a slight increase in the CD response of the Au11 cluster.
Saavedra, Milene T; Quon, Bradley S; Faino, Anna; Caceres, Silvia M; Poch, Katie R; Sanders, Linda A; Malcolm, Kenneth C; Nichols, David P; Sagel, Scott D; Taylor-Cousar, Jennifer L; Leach, Sonia M; Strand, Matthew; Nick, Jerry A
2018-05-01
Cystic fibrosis pulmonary exacerbations accelerate pulmonary decline and increase mortality. Previously, we identified a 10-gene leukocyte panel measured directly from whole blood, which indicates response to exacerbation treatment. We hypothesized that molecular characteristics of exacerbations could also predict future disease severity. We tested whether a 10-gene panel measured from whole blood could identify patient cohorts at increased risk for severe morbidity and mortality, beyond standard clinical measures. Transcript abundance for the 10-gene panel was measured from whole blood at the beginning of exacerbation treatment (n = 57). A hierarchical cluster analysis of subjects based on their gene expression was performed, yielding four molecular clusters. An analysis of cluster membership and outcomes incorporating an independent cohort (n = 21) was completed to evaluate robustness of cluster partitioning of genes to predict severe morbidity and mortality. The four molecular clusters were analyzed for differences in forced expiratory volume in 1 second, C-reactive protein, return to baseline forced expiratory volume in 1 second after treatment, time to next exacerbation, and time to morbidity or mortality events (defined as lung transplant referral, lung transplant, intensive care unit admission for respiratory insufficiency, or death). Clustering based on gene expression discriminated between patient groups with significant differences in forced expiratory volume in 1 second, admission frequency, and overall morbidity and mortality. At 5 years, all subjects in cluster 1 (very low risk) were alive and well, whereas 90% of subjects in cluster 4 (high risk) had suffered a major event (P = 0.0001). In multivariable analysis, the ability of gene expression to predict clinical outcomes remained significant, despite adjustment for forced expiratory volume in 1 second, sex, and admission frequency. 
The robustness of gene clustering to categorize patients appropriately in terms of clinical characteristics, and short- and long-term clinical outcomes, remained consistent, even when adding in a secondary population with significantly different clinical outcomes. Whole blood gene expression profiling allows molecular classification of acute pulmonary exacerbations, beyond standard clinical measures, providing a predictive tool for identifying subjects at increased risk for mortality and disease progression.
Ramón, M; Martínez-Pastor, F
2018-04-23
Computer-aided sperm analysis (CASA) produces a wealth of data that is frequently ignored. The use of multiparametric statistical methods can help explore these datasets, unveiling the subpopulation structure of sperm samples. In this review we analyse the significance of the internal heterogeneity of sperm samples and its relevance. We also provide a brief description of the statistical tools used for extracting sperm subpopulations from the datasets, namely unsupervised clustering (with non-hierarchical, hierarchical and two-step methods) and the most advanced supervised methods, based on machine learning. The former method has allowed exploration of subpopulation patterns in many species, whereas the latter offers further possibilities, especially considering functional studies and the practical use of subpopulation analysis. We also consider novel approaches, such as the use of geometric morphometrics or imaging flow cytometry. Finally, although the data provided by CASA systems provides valuable information on sperm samples by applying clustering analyses, there are several caveats. Protocols for capturing and analysing motility or morphometry should be standardised and adapted to each experiment, and the algorithms should be open in order to allow comparison of results between laboratories. Moreover, we must be aware of new technology that could change the paradigm for studying sperm motility and morphology.
Clustering the Orion B giant molecular cloud based on its molecular emission.
Bron, Emeric; Daudon, Chloé; Pety, Jérôme; Levrier, François; Gerin, Maryvonne; Gratier, Pierre; Orkisz, Jan H; Guzman, Viviana; Bardeau, Sébastien; Goicoechea, Javier R; Liszt, Harvey; Öberg, Karin; Peretto, Nicolas; Sievers, Albrecht; Tremblin, Pascal
2018-02-01
Previous attempts at segmenting molecular line maps of molecular clouds have focused on using position-position-velocity data cubes of a single molecular line to separate the spatial components of the cloud. In contrast, wide field spectral imaging over a large spectral bandwidth in the (sub)mm domain now allows one to combine multiple molecular tracers to understand the different physical and chemical phases that constitute giant molecular clouds (GMCs). We aim at using multiple tracers (sensitive to different physical processes and conditions) to segment a molecular cloud into physically/chemically similar regions (rather than spatially connected components), thus disentangling the different physical/chemical phases present in the cloud. We use a machine learning clustering method, namely the Meanshift algorithm, to cluster pixels with similar molecular emission, ignoring spatial information. Clusters are defined around each maximum of the multidimensional Probability Density Function (PDF) of the line integrated intensities. Simple radiative transfer models were used to interpret the astrophysical information uncovered by the clustering analysis. A clustering analysis based only on the J = 1-0 lines of three isotopologues of CO proves sufficient to reveal distinct density/column density regimes (n_H ~ 100 cm^-3, ~ 500 cm^-3, and > 1000 cm^-3), closely related to the usual definitions of diffuse, translucent and high-column-density regions. Adding two UV-sensitive tracers, the J = 1-0 line of HCO+ and the N = 1-0 line of CN, allows us to distinguish two clearly distinct chemical regimes, characteristic of UV-illuminated and UV-shielded gas. The UV-illuminated regime shows overbright HCO+ and CN emission, which we relate to a photochemical enrichment effect. We also find a tail of high CN/HCO+ intensity ratio in UV-illuminated regions. 
Finer distinctions in density classes (n_H ~ 7 × 10^3 cm^-3, ~ 4 × 10^4 cm^-3) for the densest regions are also identified, likely related to the higher critical density of the CN and HCO+ (1-0) lines. These distinctions are only possible because the high-density regions are spatially resolved. Molecules are versatile tracers of GMCs because their line intensities bear the signature of the physics and chemistry at play in the gas. The association of simultaneous multi-line, wide-field mapping and powerful machine learning methods such as the Meanshift clustering algorithm reveals how to decode the complex information available in these molecular tracers.
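The idea of clustering pixels around maxima of their intensity distribution, while ignoring pixel positions, can be sketched with a small flat-kernel mean shift. This is a simplified pure-Python illustration, not the implementation used in the study; the pixel intensities, the two-tracer feature space, and the bandwidth are hypothetical.

```python
import math

def mean_shift(points, bandwidth, iters=50, tol=1e-6):
    """Flat-kernel mean shift: move each point to the mean of its
    neighbours until it settles on a local density maximum (mode)."""
    modes = []
    for p in points:
        x = list(p)
        for _ in range(iters):
            near = [q for q in points if math.dist(x, q) <= bandwidth]
            new = [sum(c) / len(near) for c in zip(*near)]
            shift = math.dist(x, new)
            x = new
            if shift < tol:
                break
        modes.append(x)
    # Modes closer than one bandwidth are treated as the same cluster.
    centers, labels = [], []
    for m in modes:
        for i, c in enumerate(centers):
            if math.dist(m, c) <= bandwidth:
                labels.append(i)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return centers, labels

# Hypothetical integrated intensities of two tracers for six pixels.
# Pixel coordinates are deliberately absent: only emission is clustered.
pixels = [(1.0, 0.1), (1.1, 0.2), (0.9, 0.15),
          (5.0, 2.0), (5.2, 2.1), (4.9, 1.9)]
centers, labels = mean_shift(pixels, bandwidth=1.0)
print(labels)
```

The number of clusters is not specified in advance; it follows from the bandwidth, which controls how finely the density maxima are resolved.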
Xue, Y.; Liu, S.; Hu, Y.; Yang, J.; Chen, Q.
2007-01-01
To improve the accuracy in prediction, Genetic Algorithm based Adaptive Neural Network Ensemble (GA-ANNE) is presented. Intersections are allowed between different training sets based on the fuzzy clustering analysis, which ensures the diversity as well as the accuracy of individual Neural Networks (NNs). Moreover, to improve the accuracy of the adaptive weights of individual NNs, GA is used to optimize the cluster centers. Empirical results in predicting carbon flux of Duke Forest reveal that GA-ANNE can predict the carbon flux more accurately than Radial Basis Function Neural Network (RBFNN), Bagging NN ensemble, and ANNE. © 2007 IEEE.
MetaABC--an integrated metagenomics platform for data adjustment, binning and clustering.
Su, Chien-Hao; Hsu, Ming-Tsung; Wang, Tse-Yi; Chiang, Sufeng; Cheng, Jen-Hao; Weng, Francis C; Kao, Cheng-Yan; Wang, Daryi; Tsai, Huai-Kuang
2011-08-15
MetaABC is a metagenomic platform that integrates several binning tools coupled with methods for removing artifacts, analyzing unassigned reads and controlling sampling biases. It allows users to arrive at a better interpretation via series of distinct combinations of analysis tools. After execution, MetaABC provides outputs in various visual formats such as tables, pie and bar charts as well as clustering result diagrams. MetaABC source code and documentation are available at http://bits2.iis.sinica.edu.tw/MetaABC/ CONTACT: dywang@gate.sinica.edu.tw; hktsai@iis.sinica.edu.tw Supplementary data are available at Bioinformatics online.
High-Performance Data Analysis Tools for Sun-Earth Connection Missions
NASA Technical Reports Server (NTRS)
Messmer, Peter
2011-01-01
The data analysis tool of choice for many Sun-Earth Connection missions is the Interactive Data Language (IDL) by ITT VIS. The increasing amount of data produced by these missions and the increasing complexity of image processing algorithms requires access to higher computing power. Parallel computing is a cost-effective way to increase the speed of computation, but algorithms oftentimes have to be modified to take advantage of parallel systems. Enhancing IDL to work on clusters gives scientists access to increased performance in a familiar programming environment. The goal of this project was to enable IDL applications to benefit from both computing clusters as well as graphics processing units (GPUs) for accelerating data analysis tasks. The tool suite developed in this project enables scientists now to solve demanding data analysis problems in IDL that previously required specialized software, and it allows them to be solved orders of magnitude faster than on conventional PCs. The tool suite consists of three components: (1) TaskDL, a software tool that simplifies the creation and management of task farms, collections of tasks that can be processed independently and require only small amounts of data communication; (2) mpiDL, a tool that allows IDL developers to use the Message Passing Interface (MPI) inside IDL for problems that require large amounts of data to be exchanged among multiple processors; and (3) GPULib, a tool that simplifies the use of GPUs as mathematical coprocessors from within IDL. mpiDL is unique in its support for the full MPI standard and its support of a broad range of MPI implementations. GPULib is unique in enabling users to take advantage of an inexpensive piece of hardware, possibly already installed in their computer, and achieve orders of magnitude faster execution time for numerically complex algorithms. TaskDL enables the simple setup and management of task farms on compute clusters. 
The products developed in this project have the potential to interact, so one can build a cluster of PCs, each equipped with a GPU, and use mpiDL to communicate between the nodes and GPULib to accelerate the computations on each node.
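The task-farm pattern described above (independent tasks that need little communication) can be sketched in Python with the standard library. This is only an analogy for the concept; it does not show the TaskDL, mpiDL, or GPULib interfaces, and the task function and its workload are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def process_task(task_id):
    """Hypothetical independent task: it needs only its own small input."""
    return task_id, sum(i * i for i in range(task_id * 1000))

def run_task_farm(task_ids, workers=4):
    """Hand tasks to a pool of workers and collect results by task id."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(process_task, task_ids))

results = run_task_farm(range(1, 6))
print(sorted(results))
```

A thread pool keeps the sketch portable; for CPU-bound work on a real cluster, a process pool or a batch scheduler would be the more likely choice.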
Three-dimensional x-ray diffraction nanoscopy
NASA Astrophysics Data System (ADS)
Nikulin, Andrei Y.; Dilanian, Ruben A.; Zatsepin, Nadia A.; Muddle, Barry C.
2008-08-01
A novel approach to x-ray diffraction data analysis for non-destructive determination of the shape of nanoscale particles and clusters in three dimensions is illustrated with representative examples of composite nanostructures. The technique is insensitive to the coherence of the x-rays, which allows 3D reconstruction of a modal image without tomographic synthesis and in-situ analysis of a large (over several cubic millimeters) volume of material with a spatial resolution of a few nanometers, rendering the approach suitable for laboratory facilities.
ALP conversion and the soft X-ray excess in the outskirts of the Coma cluster
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kraljic, David; Rummel, Markus; Conlon, Joseph P., E-mail: David.Kraljic@physics.ox.ac.uk, E-mail: Markus.Rummel@physics.ox.ac.uk, E-mail: j.conlon1@physics.ox.ac.uk
2015-01-01
It was recently found that the soft X-ray excess in the center of the Coma cluster can be fitted by conversion of axion-like-particles (ALPs) of a cosmic axion background (CAB) to photons. We extend this analysis to the outskirts of Coma, including regions up to 5 Mpc from the center of the cluster. We extract the excess soft X-ray flux from ROSAT All-Sky Survey data and compare it to the expected flux from ALP to photon conversion of a CAB. The soft X-ray excess both in the center and the outskirts of Coma can be simultaneously fitted by ALP to photon conversion of a CAB. Given the uncertainties of the cluster magnetic field in the outskirts we constrain the parameter space of the CAB. In particular, an upper limit on the CAB mean energy and a range of allowed ALP-photon couplings are derived.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Glagolev, Mikhail K.; Vasilevskaya, Valentina V., E-mail: vvvas@polly.phys.msu.ru; Khokhlov, Alexei R.
Impact of mixture composition on self-organization in concentrated solutions of stiff helical and flexible macromolecules was studied by means of molecular dynamics simulation. The macromolecules were composed of identical amphiphilic monomer units but a fraction f of macromolecules had stiff helical backbones and the remaining chains were flexible. In poor solvents the compacted flexible macromolecules coexist with bundles or filament clusters from few intertwined stiff helical macromolecules. The increase of relative content f of helical macromolecules leads to increase of the length of helical clusters, to alignment of clusters with each other, and then to liquid-crystalline-like ordering along a single direction. The formation of filament clusters causes segregation of helical and flexible macromolecules and the alignment of the filaments induces effective liquid-like ordering of flexible macromolecules. A visual analysis and calculation of order parameter relaying the anisotropy of diffraction allow concluding that transition from disordered to liquid-crystalline state proceeds sharply at relatively low content of stiff components.
Genovesi, Benjamin; Berrebi, Patrick; Nagai, Satoshi; Reynaud, Nathalie; Wang, Jinhui; Masseret, Estelle
2015-09-15
The intra-specific diversity and genetic structure within the Alexandrium pacificum Litaker (A. catenella - Group IV) populations along the Temperate Asian coasts, were studied among individuals isolated from Japan to China. The UPGMA dendrogram and FCA revealed the existence of 3 clusters. Assignment analysis suggested the occurrence of gene flows between the Japanese Pacific coast (cluster-1) and the Chinese Zhejiang coast (cluster-2). Human transportations are suspected to explain the lack of genetic difference between several pairs of distant Japanese samples, hardly explained by a natural dispersal mechanism. The genetic isolation of the population established in the Sea of Japan (cluster-3) suggested the existence of a strong ecological and geographical barrier. Along the Pacific coasts, the South-North current allows limited exchanges between Chinese and Japanese populations. The relationships between Temperate Asian and Mediterranean individuals suggested different scenario of large-scale dispersal mechanisms. Copyright © 2015. Published by Elsevier Ltd.
Genetic Network Inference: From Co-Expression Clustering to Reverse Engineering
NASA Technical Reports Server (NTRS)
Dhaeseleer, Patrik; Liang, Shoudan; Somogyi, Roland
2000-01-01
Advances in molecular biological, analytical, and computational technologies are enabling us to systematically investigate the complex molecular processes underlying biological systems. In particular, using high-throughput gene expression assays, we are able to measure the output of the gene regulatory network. We aim here to review datamining and modeling approaches for conceptualizing and unraveling the functional relationships implicit in these datasets. Clustering of co-expression profiles allows us to infer shared regulatory inputs and functional pathways. We discuss various aspects of clustering, ranging from distance measures to clustering algorithms and multiple-cluster memberships. More advanced analysis aims to infer causal connections between genes directly, i.e., who is regulating whom and how. We discuss several approaches to the problem of reverse engineering of genetic networks, from discrete Boolean networks, to continuous linear and non-linear models. We conclude that the combination of predictive modeling with systematic experimental verification will be required to gain a deeper insight into living organisms, therapeutic targeting, and bioengineering.
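The role of the distance measure in co-expression clustering can be illustrated with a short sketch. It assumes hypothetical expression profiles and uses one minus the Pearson correlation as the distance, a common choice for comparing profile shape; the abstract does not state which measures the review favours.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def correlation_distance(x, y):
    """0 for identically shaped profiles, 2 for perfectly opposed ones."""
    return 1.0 - pearson(x, y)

# Hypothetical expression levels over four time points.
gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [2.0, 4.0, 6.0, 8.0]   # same shape as gene_a, larger amplitude
gene_c = [4.0, 3.0, 2.0, 1.0]   # mirror image of gene_a

print(correlation_distance(gene_a, gene_b))
print(correlation_distance(gene_a, gene_c))
print(math.dist(gene_a, gene_b))  # Euclidean distance, for comparison
```

Under the correlation distance, gene_a and gene_b count as co-expressed despite their different amplitudes, whereas the Euclidean distance keeps them apart.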
Dynamic Evolution Model Based on Social Network Services
NASA Astrophysics Data System (ADS)
Xiong, Xi; Gou, Zhi-Jian; Zhang, Shi-Bin; Zhao, Wen
2013-11-01
Based on the analysis of evolutionary characteristics of public opinion in social networking services (SNS), we propose a dynamic evolution model in which opinions are coupled with topology. This model shows the clustering phenomenon of opinions in dynamic network evolution. The simulation results show that the model can fit the data from a social network site. The dynamic evolution of networks accelerates opinion separation and aggregation. The scale and the number of clusters are influenced by the confidence limit and the rewiring probability. Dynamic changes of the topology reduce the number of isolated nodes, while the increased confidence limit allows nodes to communicate more sufficiently. The two effects make the distribution of opinion more neutral. The dynamic evolution of networks generates central clusters with high connectivity and high betweenness, which make it difficult to control public opinions in SNS.
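How a confidence limit and a rewiring probability can couple opinions to topology may be sketched as follows. This is a generic bounded-confidence model with random rewiring, written under stated assumptions; it is not the authors' model, and the update rule, the ring-shaped starting network, and all parameter values are hypothetical.

```python
import random

def simulate(n=30, steps=2000, confidence=0.3, rewire_p=0.2, mu=0.5, seed=0):
    """Bounded-confidence opinion dynamics on a network that rewires."""
    rng = random.Random(seed)
    opinions = [rng.random() for _ in range(n)]
    # Start from a ring in which every node links to its next two neighbours.
    edges = [(i, (i + k) % n) for i in range(n) for k in (1, 2)]
    for _ in range(steps):
        idx = rng.randrange(len(edges))
        i, j = edges[idx]
        if abs(opinions[i] - opinions[j]) <= confidence:
            # Within the confidence limit: both opinions move closer.
            oi, oj = opinions[i], opinions[j]
            opinions[i] = oi + mu * (oj - oi)
            opinions[j] = oj + mu * (oi - oj)
        elif rng.random() < rewire_p:
            # Too far apart: node i drops j and links to a random new node.
            k = rng.randrange(n)
            if k != i and (i, k) not in edges and (k, i) not in edges:
                edges[idx] = (i, k)
    return opinions, edges

final_opinions, final_edges = simulate()
print(sorted(round(o, 2) for o in final_opinions))
```

Varying `confidence` and `rewire_p` in such a sketch is one way to explore how the two parameters shape the number and size of opinion clusters.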
How mutation affects evolutionary games on graphs
Allen, Benjamin; Traulsen, Arne; Tarnita, Corina E.; Nowak, Martin A.
2011-01-01
Evolutionary dynamics are affected by population structure, mutation rates and update rules. Spatial or network structure facilitates the clustering of strategies, which represents a mechanism for the evolution of cooperation. Mutation dilutes this effect. Here we analyze how mutation influences evolutionary clustering on graphs. We introduce new mathematical methods to evolutionary game theory, specifically the analysis of coalescing random walks via generating functions. These techniques allow us to derive exact identity-by-descent (IBD) probabilities, which characterize spatial assortment on lattices and Cayley trees. From these IBD probabilities we obtain exact conditions for the evolution of cooperation and other game strategies, showing the dual effects of graph topology and mutation rate. High mutation rates diminish the clustering of cooperators, hindering their evolutionary success. Our model can represent either genetic evolution with mutation, or social imitation processes with random strategy exploration. PMID:21473871
Chemometric analysis of minerals in gluten-free products.
Gliszczyńska-Świgło, Anna; Klimczak, Inga; Rybicka, Iga
2018-06-01
Numerous studies indicate mineral deficiencies in people on a gluten-free (GF) diet. These deficiencies may indicate that GF products are a less valuable source of minerals than gluten-containing products. In the study, the nutritional quality of 50 GF products is discussed taking into account the nutritional requirements for minerals expressed as percentage of recommended daily allowance (%RDA) or percentage of adequate intake (%AI) for a model celiac patient. Elements analyzed were calcium, potassium, magnesium, sodium, copper, iron, manganese, and zinc. Analysis of %RDA or %AI was performed using principal component analysis (PCA) and hierarchical cluster analysis (HCA). PCA made it possible to differentiate products based on rice, corn, potato, and GF wheat starch from those based on buckwheat, chickpea, millet, oats, amaranth, teff, quinoa, chestnut, and acorn. In the HCA, four clusters were created. The main criterion determining the adherence of the sample to the cluster was the content of all minerals included in HCA (K, Mg, Cu, Fe, Mn); however, only the Mn content differentiated the four groups formed. GF products made of buckwheat, chickpea, millet, oats, amaranth, teff, quinoa, chestnut, and acorn are a better source of minerals than those based on other GF raw materials, which was confirmed by PCA and HCA. © 2017 Society of Chemical Industry.
Yücel, Yasin; Sultanoğlu, Pınar
2013-09-01
Chemical characterisation has been carried out on 45 honey samples collected from Hatay region of Turkey. The concentrations of 17 elements were determined by inductively coupled plasma optical emission spectrometry (ICP-OES). Ca, K, Mg and Na were the most abundant elements, with mean contents of 219.38, 446.93, 49.06 and 95.91 mg kg^-1 respectively. The trace element mean contents ranged between 0.03 and 15.07 mg kg^-1. Chemometric methods such as principal component analysis (PCA) and cluster analysis (CA) techniques were applied to classify honey according to mineral content. The first most important principal component (PC) was strongly associated with the value of Al, B, Cd and Co. CA showed eight clusters corresponding to the eight botanical origins of honey. PCA explained 75.69% of the variance with the first six PC variables. Chemometric analysis of the analytical data allowed the accurate classification of the honey samples according to origin. Copyright © 2013 Elsevier Ltd. All rights reserved.
Proper motions in the VVV Survey: Results for more than 15 million stars across NGC 6544
NASA Astrophysics Data System (ADS)
Contreras Ramos, R.; Zoccali, M.; Rojas, F.; Rojas-Arriagada, A.; Gárate, M.; Huijse, P.; Gran, F.; Soto, M.; Valcarce, A. A. R.; Estévez, P. A.; Minniti, D.
2017-12-01
Context. In the last six years, the VISTA Variable in the Vía Láctea (VVV) survey mapped 562 sq. deg. across the bulge and southern disk of the Galaxy. However, a detailed study of these regions, which include 36 globular clusters (GCs) and thousands of open clusters, is by no means an easy challenge. High differential reddening and severe crowding along the line of sight make it very difficult to reliably distinguish stars belonging to different populations and/or systems. Aims: The aim of this study is to separate stars that likely belong to the Galactic GC NGC 6544 from its surrounding field by means of proper motion (PM) techniques. Methods: This work was based upon a new astrometric reduction method optimized for images of the VVV survey. Results: PSF-fitting photometry over the six-year baseline of the survey allowed us to obtain a mean precision of 0.51 mas yr^-1, in each PM coordinate, for stars with Ks < 15 mag. In the area studied here, cluster stars separate very well from field stars, down to the main sequence turnoff and below, allowing us to derive for the first time the absolute PM of NGC 6544. Isochrone fitting on the clean and differential reddening corrected cluster color magnitude diagram yields an age of 11-13 Gyr, and metallicity [Fe/H] = -1.5 dex, in agreement with previous studies restricted to the cluster core. We were able to derive the cluster orbit assuming an axisymmetric model of the Galaxy and conclude that NGC 6544 is likely a halo GC. We have not detected tidal tail signatures associated with the cluster, but a remarkable elongation in the galactic center direction has been found. The precision achieved in the PM determination also allows us to separate bulge stars from foreground disk stars, enabling the kinematical selection of bona fide bulge stars across the whole survey area. Conclusions: Kinematical techniques are a fundamental step toward disentangling different stellar populations that overlap in a studied field. 
Our results show that VVV data is perfectly suitable for this kind of analysis. Based on observations taken with ESO telescopes at Paranal Observatory under programme IDs 179.B-2002.
Yokoyama, Eiji; Uchimura, Masako
2007-11-01
Ninety-five enterohemorrhagic Escherichia coli serovar O157 strains, including 30 strains isolated from 13 intrafamily outbreaks and 14 strains isolated from 3 mass outbreaks, were studied by pulsed-field gel electrophoresis (PFGE) and variable number of tandem repeats (VNTR) typing, and the resulting data were subjected to cluster analysis. Cluster analysis of the VNTR typing data revealed that 57 (60.0%) of 95 strains, including all epidemiologically linked strains, formed clusters with at least 95% similarity. Cluster analysis of the PFGE patterns revealed that 67 (70.5%) of 95 strains, including all but 1 of the epidemiologically linked strains, formed clusters with 90% similarity. The number of epidemiologically unlinked strains forming clusters was significantly less by VNTR cluster analysis than by PFGE cluster analysis. The congruence value between PFGE and VNTR cluster analysis was low and did not show an obvious correlation. With two-step cluster analysis, the number of clustered epidemiologically unlinked strains by PFGE cluster analysis that were divided by subsequent VNTR cluster analysis was significantly higher than the number by VNTR cluster analysis that were divided by subsequent PFGE cluster analysis. These results indicate that VNTR cluster analysis is more efficient than PFGE cluster analysis as an epidemiological tool to trace the transmission of enterohemorrhagic E. coli O157.
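Grouping strains into clusters at a fixed similarity threshold, as in the 95% and 90% cut-offs above, can be sketched as follows. The similarity coefficient (fraction of loci with identical repeat counts), the single-linkage rule, and the strain profiles are assumptions made for illustration; the abstract does not specify the coefficient or linkage used.

```python
def similarity(a, b):
    """Fraction of loci with identical repeat counts (assumed coefficient)."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def threshold_clusters(profiles, min_similarity):
    """Single-linkage grouping: strains connected by any chain of pairs
    whose similarity meets the threshold end up in the same cluster."""
    names = list(profiles)
    parent = {n: n for n in names}

    def find(n):
        # Follow parent links to the cluster representative.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if similarity(profiles[a], profiles[b]) >= min_similarity:
                parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(groups.values())

# Hypothetical VNTR profiles: repeat counts at four loci per strain.
profiles = {
    "s1": [5, 3, 8, 2],
    "s2": [5, 3, 8, 2],
    "s3": [5, 3, 8, 3],
    "s4": [9, 1, 4, 7],
}
print(threshold_clusters(profiles, 0.95))
print(threshold_clusters(profiles, 0.75))
```

Lowering the threshold merges more strains into shared clusters, which is one reason the choice of cut-off affects how many epidemiologically unlinked strains appear to cluster together.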
Characterization and identification of microorganisms by FT-IR microspectrometry
NASA Astrophysics Data System (ADS)
Ngo-Thi, N. A.; Kirschner, C.; Naumann, D.
2003-12-01
We report on a novel FT-IR approach for microbial characterization/identification based on a light microscope coupled to an infrared spectrometer, which offers the possibility to acquire IR spectra of microcolonies containing only a few hundred cells. Microcolony samples suitable for FT-IR microspectroscopic measurements were obtained by a replica technique with a stamping device that transfers cells of microcolonies growing on solid culture plates, in a spatially accurate manner, to a special IR-transparent or reflecting stamping plate. High quality spectra could be recorded by applying either the transmission/absorbance or the reflectance/absorbance mode of the infrared microscope. Signal-to-noise ratios higher than 1000 were obtained for microcolonies as small as 40 μm in diameter. Reproducibility levels were established that allowed species and strain identification. The differentiation and classification capacity of the FT-IR microscopic technique was tested for different selected microorganisms. Cluster and factor analysis methods were used to evaluate the complex spectral data. Excellent discrimination between bacteria and yeasts, and at the same time between Gram-negative and Gram-positive bacterial strains, was obtained. Twenty-two selected strains of different species within the genus Staphylococcus were repetitively measured and could be grouped into the correct species clusters. Moreover, the results indicated that the method also allows identification at the subspecies level. Additionally, the new approach allowed spectral mapping analysis of single colonies, which provided spatially resolved characterization of growth heterogeneity within complex microbial populations such as colonies.
Naegle, Kristen M; Welsch, Roy E; Yaffe, Michael B; White, Forest M; Lauffenburger, Douglas A
2011-07-01
Advances in proteomic technologies continue to substantially accelerate capability for generating experimental data on protein levels, states, and activities in biological samples. For example, studies on receptor tyrosine kinase signaling networks can now capture the phosphorylation state of hundreds to thousands of proteins across multiple conditions. However, little is known about the function of many of these protein modifications, or the enzymes responsible for modifying them. To address this challenge, we have developed an approach that enhances the power of clustering techniques to infer functional and regulatory meaning of protein states in cell signaling networks. We have created a new computational framework for applying clustering to biological data in order to overcome the typical dependence on specific a priori assumptions and expert knowledge concerning the technical aspects of clustering. Multiple clustering analysis methodology ('MCAM') employs an array of diverse data transformations, distance metrics, set sizes, and clustering algorithms, in a combinatorial fashion, to create a suite of clustering sets. These sets are then evaluated based on their ability to produce biological insights through statistical enrichment of metadata relating to knowledge concerning protein functions, kinase substrates, and sequence motifs. We applied MCAM to a set of dynamic phosphorylation measurements of the ERBB network to explore the relationships between algorithmic parameters and the biological meaning that could be inferred and report on interesting biological predictions. Further, we applied MCAM to multiple phosphoproteomic datasets for the ERBB network, which allowed us to compare independent and incomplete overlapping measurements of phosphorylation sites in the network. We report specific and global differences of the ERBB network stimulated with different ligands and with changes in HER2 expression.
Overall, we offer MCAM as a broadly-applicable approach for analysis of proteomic data which may help increase the current understanding of molecular networks in a variety of biological problems. © 2011 Naegle et al.
Clustering and flow around a sphere moving into a grain cloud.
Seguin, A; Lefebvre-Lepot, A; Faure, S; Gondret, P
2016-06-01
A bidimensional simulation of a sphere moving at constant velocity into a cloud of smaller spherical grains far from any boundaries and without gravity is presented with a non-smooth contact dynamics method. A dense granular "cluster" zone builds progressively around the moving sphere until a stationary regime appears with a constant upstream cluster size. The key point is that the upstream cluster size increases with the initial solid fraction [Formula: see text] but the cluster packing fraction takes an approximately constant value independent of [Formula: see text]. Although the upstream cluster size around the moving sphere diverges when [Formula: see text] approaches a critical value, the drag force exerted by the grains on the sphere does not. The detailed analysis of the local strain rate and local stress fields in the non-parallel granular flow inside the cluster allows us to extract the local invariants of the two tensors: dilation rate, shear rate, pressure and shear stress. Despite different spatial variations of these invariants, the local friction coefficient μ appears to depend only on the local inertial number I as well as the local solid fraction, which means that a local rheology does exist in the present non-parallel flow. The key point is that the spatial variations of I inside the cluster do not depend on the sphere velocity and explore only a small range around the value one.
Lee, Hyokyeong; Moody-Davis, Asher; Saha, Utsab; Suzuki, Brian M; Asarnow, Daniel; Chen, Steven; Arkin, Michelle; Caffrey, Conor R; Singh, Rahul
2012-01-01
Neglected tropical diseases, especially those caused by helminths, constitute some of the most common infections of the world's poorest people. Development of techniques for automated, high-throughput drug screening against these diseases, especially in whole-organism settings, constitutes one of the great challenges of modern drug discovery. We present a method for enabling high-throughput phenotypic drug screening against diseases caused by helminths with a focus on schistosomiasis. The proposed method allows for a quantitative analysis of the systemic impact of a drug molecule on the pathogen as exhibited by the complex continuum of its phenotypic responses. This method consists of two key parts: first, biological image analysis is employed to automatically monitor and quantify shape-, appearance-, and motion-based phenotypes of the parasites. Next, we represent these phenotypes as time-series and show how to compare, cluster, and quantitatively reason about them using techniques of time-series analysis. We present results on a number of algorithmic issues pertinent to the time-series representation of phenotypes. These include results on appropriate representation of phenotypic time-series, analysis of different time-series similarity measures for comparing phenotypic responses over time, and techniques for clustering such responses by similarity. Finally, we show how these algorithmic techniques can be used for quantifying the complex continuum of phenotypic responses of parasites. An important corollary is the ability of our method to recognize and rigorously group parasites based on the variability of their phenotypic response to different drugs. The methods and results presented in this paper enable automatic and quantitative scoring of high-throughput phenotypic screens focused on helmintic diseases. Furthermore, these methods allow us to analyze and stratify parasites based on their phenotypic response to drugs. 
Together, these advancements represent a significant breakthrough for the process of drug discovery against schistosomiasis in particular and can be extended to other helmintic diseases which together afflict a large part of humankind.
2012-01-01
Background: Neglected tropical diseases, especially those caused by helminths, constitute some of the most common infections of the world's poorest people. Development of techniques for automated, high-throughput drug screening against these diseases, especially in whole-organism settings, constitutes one of the great challenges of modern drug discovery. Method: We present a method for enabling high-throughput phenotypic drug screening against diseases caused by helminths with a focus on schistosomiasis. The proposed method allows for a quantitative analysis of the systemic impact of a drug molecule on the pathogen as exhibited by the complex continuum of its phenotypic responses. This method consists of two key parts: first, biological image analysis is employed to automatically monitor and quantify shape-, appearance-, and motion-based phenotypes of the parasites. Next, we represent these phenotypes as time-series and show how to compare, cluster, and quantitatively reason about them using techniques of time-series analysis. Results: We present results on a number of algorithmic issues pertinent to the time-series representation of phenotypes. These include results on appropriate representation of phenotypic time-series, analysis of different time-series similarity measures for comparing phenotypic responses over time, and techniques for clustering such responses by similarity. Finally, we show how these algorithmic techniques can be used for quantifying the complex continuum of phenotypic responses of parasites. An important corollary is the ability of our method to recognize and rigorously group parasites based on the variability of their phenotypic response to different drugs. Conclusions: The methods and results presented in this paper enable automatic and quantitative scoring of high-throughput phenotypic screens focused on helmintic diseases. Furthermore, these methods allow us to analyze and stratify parasites based on their phenotypic response to drugs.
Together, these advancements represent a significant breakthrough for the process of drug discovery against schistosomiasis in particular and can be extended to other helmintic diseases which together afflict a large part of humankind. PMID:22369037
Matsuyama, T; Fukuda, Y; Sakai, T; Tanimoto, N; Nakanishi, M; Nakamura, Y; Takano, T; Nakayasu, C
2017-08-01
Bacterial haemolytic jaundice caused by Ichthyobacterium seriolicida has been responsible for mortality in farmed yellowtail, Seriola quinqueradiata, in western Japan since the 1980s. In this study, polymorphic analysis of I. seriolicida was performed using three molecular methods: amplified fragment length polymorphism (AFLP) analysis, multilocus sequence typing (MLST) and multiple-locus variable-number tandem repeat analysis (MLVA). Twenty-eight isolates were analysed using AFLP, while 31 isolates were examined by MLST and MLVA. No polymorphisms were identified by AFLP analysis using EcoRI and MseI, or by MLST of internal fragments of eight housekeeping genes. However, MLVA revealed variation in repeat numbers of three elements, allowing separation of the isolates into 16 sequence types. The unweighted pair group method using arithmetic averages cluster analysis of the MLVA data identified four major clusters, and all isolates belonged to clonal complexes. It is likely that I. seriolicida populations share a common ancestor, which may be a recently introduced strain. © 2016 John Wiley & Sons Ltd.
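The UPGMA step mentioned in the abstract above can be sketched as follows. This is a minimal illustration in Python, not the authors' pipeline: the isolate names, the three-locus repeat-number profiles, and the mismatch-count distance are all hypothetical assumptions chosen only to show how average-linkage merging proceeds.

```python
from itertools import combinations


def profile_distance(a, b):
    """Count loci whose repeat numbers differ (an assumed, simple metric)."""
    return sum(x != y for x, y in zip(a, b))


def upgma(profiles):
    """Average-linkage (UPGMA) clustering.

    Returns the merge history as (members_a, members_b, distance) tuples.
    """
    names = list(profiles)
    clusters = {i: (name,) for i, name in enumerate(names)}
    dist = {
        frozenset((i, j)): float(
            profile_distance(profiles[names[i]], profiles[names[j]])
        )
        for i, j in combinations(range(len(names)), 2)
    }
    merges = []
    next_id = len(names)
    while len(clusters) > 1:
        # Closest pair of current clusters; ties are broken by cluster id.
        pair = min(dist, key=lambda p: (dist[p], sorted(p)))
        i, j = sorted(pair)
        d = dist.pop(pair)
        n_i, n_j = len(clusters[i]), len(clusters[j])
        for k in clusters:
            if k in (i, j):
                continue
            # Size-weighted mean keeps the new distance equal to the
            # unweighted average over all member pairs.
            d_ik = dist.pop(frozenset((i, k)))
            d_jk = dist.pop(frozenset((j, k)))
            dist[frozenset((next_id, k))] = (n_i * d_ik + n_j * d_jk) / (n_i + n_j)
        merges.append((clusters[i], clusters[j], d))
        clusters[next_id] = clusters.pop(i) + clusters.pop(j)
        next_id += 1
    return merges


# Hypothetical repeat numbers at three loci for four isolates.
example_profiles = {
    "iso_A": (3, 5, 2),
    "iso_B": (3, 5, 2),
    "iso_C": (3, 6, 2),
    "iso_D": (7, 1, 9),
}
merge_history = upgma(example_profiles)
for left, right, height in merge_history:
    print(left, "+", right, "at distance", height)
```

Cutting the merge history at a chosen distance gives the major clusters; where to cut is an analyst's decision that this sketch does not address.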
Determinants of the use of dietary supplements among secondary and high school students
Gajda, Karolina; Zielińska, Monika; Ciecierska, Anna; Hamułka, Jadwiga
All over the world, including Poland, the sale of dietary supplements is increasing. More and more often, people, including children and adolescents, use dietary supplements on their own initiative and without any medical indication or knowledge in this field. The aim of this study was to analyze the determinants of the use of vitamin and mineral dietary supplements among secondary school and high school students in Poland. The study included 396 students aged 13-18 years (249 girls and 147 boys). The authors' questionnaire was used to evaluate the intake of dietary supplements. Cluster analysis made it possible to distinguish groups of students with similar socio-demographic characteristics and frequency of dietary supplement use. In the studied population, three clusters were identified that differed significantly in socio-demographic characteristics. Clusters 1 and 2 consisted mostly of students who used dietary supplements (56% and 100% of respondents, respectively). Cluster 1 consisted mostly of students from rural areas and small cities, with a worse financial situation, mainly boys (56%), while cluster 2 was dominated by girls (81%) living in a big city, coming from families with a good financial situation, who were more likely to be underweight (28.8%). Cluster 3 consisted mostly of older students (62%) who did not take dietary supplements. Compared with cluster 2, they had a lower frequency of breakfast consumption (55% vs. 69%) but a higher frequency of consumption of soft drinks, fast food, and coffee, as well as of salt use at the table. The results show that the use of dietary supplements in adolescence is a common phenomenon and is slightly conditioned by eating behaviors. This unfavorable habit of common dietary supplement intake observed among students indicates the need for education on the benefits and risks of supplement use.
Planck's view on the spectrum of the Sunyaev-Zeldovich effect
NASA Astrophysics Data System (ADS)
Erler, Jens; Basu, Kaustuv; Chluba, Jens; Bertoldi, Frank
2018-05-01
We present a detailed analysis of the stacked frequency spectrum of a large sample of galaxy clusters using Planck data, together with auxiliary data from the AKARI and IRAS missions. Our primary goal is to search for the imprint of relativistic corrections to the thermal Sunyaev-Zeldovich effect (tSZ) spectrum, which allow the temperature of the intracluster medium to be measured. We remove Galactic and extragalactic foregrounds with a matched filtering technique, which is validated using simulations with realistic mock data sets. The extracted spectra show the tSZ signal at high significance and reveal an additional far-infrared (FIR) excess, which we attribute to thermal emission from the galaxy clusters themselves. This excess FIR emission from clusters is accounted for in our spectral model. We are able to measure the tSZ relativistic corrections at 2.2σ by constraining the mean temperature of our cluster sample to 4.4^{+2.1}_{-2.0} keV. We repeat the same analysis on a subsample containing only the 100 hottest clusters, for which we measure the mean temperature to be 6.0^{+3.8}_{-2.9} keV, corresponding to 2.0σ. The temperature of the emitting dust grains in our FIR model is constrained to ≃20 K, consistent with previous studies. We control for systematic biases by fitting mock clusters, from which we also show that using the non-relativistic spectrum for SZ signal extraction will lead to a bias in the integrated Compton parameter Y, which can be up to 14% for the most massive clusters. We conclude by providing an outlook for the upcoming CCAT-prime telescope, which will improve upon Planck with lower noise and better spatial resolution.
NASA Astrophysics Data System (ADS)
Okabe, Taizo; Nishimichi, Takahiro; Oguri, Masamune; Peirani, Sébastien; Kitayama, Tetsu; Sasaki, Shin; Suto, Yasushi
2018-04-01
While various observations measured ellipticities of galaxy clusters and alignments between orientations of the brightest cluster galaxies and their host clusters, there are only a handful of numerical simulations that implement realistic baryon physics to allow direct comparisons with those observations. Here we investigate ellipticities of galaxy clusters and alignments between various components of them and the central galaxies in the state-of-the-art cosmological hydrodynamical simulation Horizon-AGN, which contains dark matter, stellar, and gas components in a large simulation box of (100 h^{-1} Mpc)^3 with high spatial resolution (~1 kpc). We estimate ellipticities of total matter, dark matter, stellar, gas surface mass density distributions, X-ray surface brightness, and the Compton y-parameter of the Sunyaev-Zel'dovich effect, as well as alignments between these components and the central galaxies for 120 projected images of galaxy clusters with masses M_200 > 5 × 10^13 M⊙. Our results indicate that the distributions of these components are well aligned with the major axes of the central galaxies, with the root mean square value of differences of their position angles of ~20°, which vary little from the inner to the outer regions. We also estimate alignments of these various components with total matter distributions, and find tighter alignments than those for central galaxies with the root mean square value of ~15°. We compare our results with previous observations of ellipticities and position angle alignments and find reasonable agreement. The comprehensive analysis presented in this paper provides useful prior information for analyzing stacked lensing signals as well as designing future observations to study ellipticities and alignments of galaxy clusters.
NASA Astrophysics Data System (ADS)
Okabe, Taizo; Nishimichi, Takahiro; Oguri, Masamune; Peirani, Sébastien; Kitayama, Tetsu; Sasaki, Shin; Suto, Yasushi
2018-07-01
While various observations measured ellipticities of galaxy clusters and alignments between orientations of the brightest cluster galaxies and their host clusters, there are only a handful of numerical simulations that implement realistic baryon physics to allow direct comparisons with those observations. Here, we investigate ellipticities of galaxy clusters and alignments between various components of them and the central galaxies in the state-of-the-art cosmological hydrodynamical simulation Horizon-AGN, which contains dark matter, stellar, and gas components in a large simulation box of (100 h^{-1} Mpc)^3 with high spatial resolution (~1 kpc). We estimate ellipticities of total matter, dark matter, stellar, gas surface mass density distributions, X-ray surface brightness, and the Compton y-parameter of the Sunyaev-Zel'dovich effect, as well as alignments between these components and the central galaxies for 120 projected images of galaxy clusters with masses M_200 > 5 × 10^13 M⊙. Our results indicate that the distributions of these components are well aligned with the major axes of the central galaxies, with the root-mean-square value of differences of their position angles of ~20°, which vary little from the inner to the outer regions. We also estimate alignments of these various components with total matter distributions, and find tighter alignments than those for central galaxies with the root-mean-square value of ~15°. We compare our results with previous observations of ellipticities and position angle alignments and find reasonable agreement. The comprehensive analysis presented in this paper provides useful prior information for analysing stacked lensing signals as well as designing future observations to study ellipticities and alignments of galaxy clusters.
Hahn, Noel G.
2017-01-01
Geospatial analyses were used to investigate the spatial distribution of populations of Halyomorpha halys, an important invasive agricultural pest in mid-Atlantic peach orchards. This spatial analysis will improve efficiency by allowing growers and farm managers to predict insect arrangement and target management strategies. Data on the presence of H. halys were collected from 2012 to 2014 from five peach orchards, located in different land-use contexts, at four farms in New Jersey. A point pattern analysis, using Ripley's K function, was used to describe clustering of H. halys. In addition, the clustering of damage indicative of H. halys feeding was described. With low populations early in the growing season, H. halys did not exhibit signs of clustering in the orchards at most distances. At sites with low populations throughout the season, clustering was not apparent. However, later in the season, high infestation levels led to more evident clustering of H. halys. Damage, although present throughout the entire orchard, was found at low levels. When looking at trees with greater than 10% fruit damage, damage was shown to cluster in orchards. The Moran's I statistic showed that spatial autocorrelation of H. halys was present within the orchards on the August sample dates, in relation to both population density and levels of damage. Kriging the abundance of H. halys and the severity of damage to peaches revealed that the estimated values of both are generally found in the same region of the orchards. This information on the clustering of H. halys populations will be useful to help predict presence of insects for use in management or scouting programs. PMID:28362797
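To make the Moran's I statistic named in the abstract above concrete, here is a minimal Python sketch. It uses the standard global form I = (n / W) · Σ_ij w_ij (x_i − x̄)(x_j − x̄) / Σ_i (x_i − x̄)². The binary distance-band weights, the tree coordinates, and the insect counts are hypothetical assumptions and are not taken from the study.

```python
import math


def morans_i(coords, values, max_dist):
    """Global Moran's I with binary weights: w_ij = 1 if the two
    locations are within max_dist of each other (i != j), else 0."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    cross = 0.0   # sum of w_ij * dev_i * dev_j
    w_total = 0.0  # sum of all weights
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(coords[i], coords[j]) <= max_dist:
                cross += dev[i] * dev[j]
                w_total += 1.0
    variance_sum = sum(d * d for d in dev)
    if w_total == 0.0 or variance_sum == 0.0:
        raise ValueError("no neighbours within max_dist, or constant values")
    return (n / w_total) * (cross / variance_sum)


# Four hypothetical trees in a row, one unit apart.
tree_coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]

# Counts concentrated at one end of the row: positive autocorrelation.
i_clustered = morans_i(tree_coords, [10, 10, 0, 0], max_dist=1.0)

# Counts alternating along the row: negative autocorrelation.
i_alternating = morans_i(tree_coords, [10, 0, 10, 0], max_dist=1.0)

print(i_clustered, i_alternating)
```

Values above the expectation under no autocorrelation, −1/(n − 1), suggest that similar counts sit near each other; a significance test (for example by permutation) would still be needed before drawing conclusions.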
The Projected Dark and Baryonic Ellipsoidal Structure of 20 CLASH Galaxy Clusters
NASA Astrophysics Data System (ADS)
Umetsu, Keiichi; Sereno, Mauro; Tam, Sut-Ieng; Chiu, I.-Non; Fan, Zuhui; Ettori, Stefano; Gruen, Daniel; Okumura, Teppei; Medezinski, Elinor; Donahue, Megan; Meneghetti, Massimo; Frye, Brenda; Koekemoer, Anton; Broadhurst, Tom; Zitrin, Adi; Balestra, Italo; Benítez, Narciso; Higuchi, Yuichi; Melchior, Peter; Mercurio, Amata; Merten, Julian; Molino, Alberto; Nonino, Mario; Postman, Marc; Rosati, Piero; Sayers, Jack; Seitz, Stella
2018-06-01
We reconstruct the two-dimensional (2D) matter distributions in 20 high-mass galaxy clusters selected from the CLASH survey by using the new approach of performing a joint weak gravitational lensing analysis of 2D shear and azimuthally averaged magnification measurements. This combination allows for a complete analysis of the field, effectively breaking the mass-sheet degeneracy. In a Bayesian framework, we simultaneously constrain the mass profile and morphology of each individual cluster, assuming an elliptical Navarro–Frenk–White halo characterized by the mass, concentration, projected axis ratio, and position angle (PA) of the projected major axis. We find that spherical mass estimates of the clusters from azimuthally averaged weak-lensing measurements in previous work are in excellent agreement with our results from a full 2D analysis. Combining all 20 clusters in our sample, we detect the elliptical shape of weak-lensing halos at the 5σ significance level within a scale of 2 Mpc h^{-1}. The median projected axis ratio is 0.67 ± 0.07 at a virial mass of M_vir = (15.2 ± 2.8) × 10^14 M⊙, which is in agreement with theoretical predictions from recent numerical simulations of the standard collisionless cold dark matter model. We also study misalignment statistics of the brightest cluster galaxy, X-ray, thermal Sunyaev–Zel’dovich effect, and strong-lensing morphologies with respect to the weak-lensing signal. Among the three baryonic tracers studied here, we find that the X-ray morphology is best aligned with the weak-lensing mass distribution, with a median misalignment angle of |ΔPA| = 21° ± 7°. We also conduct a stacked quadrupole shear analysis of the 20 clusters assuming that the X-ray major axis is aligned with that of the projected mass distribution. This yields a consistent axis ratio of 0.67 ± 0.10, suggesting again a tight alignment between the intracluster gas and dark matter.
Based in part on data collected at the Subaru Telescope, which is operated by the National Astronomical Society of Japan.
Classifying bent radio galaxies from a mixture of point-like/extended images with Machine Learning.
NASA Astrophysics Data System (ADS)
Bastien, David; Oozeer, Nadeem; Somanah, Radhakrishna
2017-05-01
The hypothesis that bent radio sources are found in rich, massive galaxy clusters, together with the availability of huge amounts of data from radio surveys, motivated us to use Machine Learning (ML) to identify bent radio sources and thus use them as tracers for galaxy clusters. The shapelet analysis allowed us to decompose radio images into 256 features that could be fed into the ML algorithm. Additionally, ideas from the field of neuro-psychology led us to consider training the machine to identify bent galaxies at different orientations. From our analysis, we found that the Random Forest algorithm was the most effective, with an accuracy of 92% for the classification of point and extended sources and an accuracy of 80% for the bent and unbent classification.
Lopsidedness of Self-consistent Galaxies Caused by the External Field Effect of Clusters
NASA Astrophysics Data System (ADS)
Wu, Xufen; Wang, Yougang; Feix, Martin; Zhao, HongSheng
2017-08-01
Adopting Schwarzschild’s orbit-superposition technique, we construct a series of self-consistent galaxy models, embedded in the external field of galaxy clusters in the framework of Milgrom’s MOdified Newtonian Dynamics (MOND). These models represent relatively massive ellipticals with a Hernquist radial profile at various distances from the cluster center. Using N-body simulations, we perform a first analysis of these models and their evolution. We find that self-gravitating axisymmetric density models, even under a weak external field, lose their symmetry by instability and generally evolve to triaxial configurations. A kinematic analysis suggests that the instability originates from both box and nonclassified orbits with low angular momentum. We also consider a self-consistent isolated system that is then placed in a strong external field and allowed to evolve freely. This model, just like the corresponding equilibrium model in the same external field, eventually settles to a triaxial equilibrium as well, but has a higher velocity radial anisotropy and is rounder. The presence of an external field in the MOND universe generically predicts some lopsidedness of galaxy shapes.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Biewer, Theodore M.; Marcus, Chris; Klepper, C Christopher
The divertor-specific ITER Diagnostic Residual Gas Analyzer (DRGA) will provide essential information relating to DT fusion plasma performance. This includes pulse-resolving measurements of the fuel isotopic mix reaching the pumping ducts, as well as the concentration of the helium generated as the ash of the fusion reaction. In the present baseline design, the cluster of sensors attached to this diagnostic's differentially pumped analysis chamber assembly includes a radiation compatible version of a commercial quadrupole mass spectrometer, as well as an optical gas analyzer using a plasma-based light excitation source. This paper reports on a laboratory study intended to validate the performance of this sensor cluster, with emphasis on the detection limit of the isotopic measurement. This validation study was carried out in a laboratory set-up that closely prototyped the analysis chamber assembly configuration of the baseline design. This includes an ITER-specific placement of the optical gas measurement downstream from the first turbine of the chamber's turbo-molecular pump to provide sufficient light emission while preserving the gas dynamics conditions that allow for ~1 s response time from the sensor cluster [1].
Yang, Haixuan; Seoighe, Cathal
2016-01-01
Nonnegative Matrix Factorization (NMF) has proved to be an effective method for unsupervised clustering analysis of gene expression data. By the nonnegativity constraint, NMF provides a decomposition of the data matrix into two matrices that have been used for clustering analysis. However, the decomposition is not unique. This allows different clustering results to be obtained, resulting in different interpretations of the decomposition. To alleviate this problem, some existing methods directly enforce uniqueness to some extent by adding regularization terms in the NMF objective function. Alternatively, various normalization methods have been applied to the factor matrices; however, the effects of the choice of normalization have not been carefully investigated. Here we investigate the performance of NMF for the task of cancer class discovery, under a wide range of normalization choices. After extensive evaluations, we observe that the maximum norm showed the best performance, although the maximum norm has not previously been used for NMF. Matlab codes are freely available from: http://maths.nuigalway.ie/~haixuanyang/pNMF/pNMF.htm.
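The non-uniqueness described in the abstract above can be illustrated with a small Python sketch. It does not reproduce the authors' method or their Matlab code: the factor matrices below are hypothetical, and the choice to normalize the columns of W by their maximum entry (with compensating scaling of the rows of H) is an assumption made only to show how a normalization can change cluster labels while leaving the product W·H unchanged.

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def normalize_max(w, h):
    """Rescale so every column of W has maximum entry 1.

    Row c of H is multiplied by the same factor that column c of W is
    divided by, so the product W @ H is unchanged.
    """
    k = len(h)
    # An all-zero column keeps scale 1.0 to avoid division by zero.
    scales = [max(row[c] for row in w) or 1.0 for c in range(k)]
    w_n = [[row[c] / scales[c] for c in range(k)] for row in w]
    h_n = [[scales[c] * x for x in h[c]] for c in range(k)]
    return w_n, h_n


def assign_clusters(h):
    """Assign each sample (column of H) to the factor with the largest coefficient."""
    return [max(range(len(h)), key=lambda c: h[c][j]) for j in range(len(h[0]))]


# Hypothetical factorization: 3 genes x 2 factors, 2 factors x 3 samples.
W = [[2.0, 0.1],
     [4.0, 0.2],
     [1.0, 0.5]]
H = [[1.0, 0.9, 0.2],
     [3.0, 0.5, 4.0]]

labels_raw = assign_clusters(H)
W_max, H_max = normalize_max(W, H)
labels_max = assign_clusters(H_max)

print(labels_raw)  # labels from the unnormalized H
print(labels_max)  # labels after max-norm rescaling
```

In this toy case the first sample changes cluster after rescaling even though the reconstructed data matrix is identical, which is why the choice of normalization matters for class discovery.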
Lopsidedness of Self-consistent Galaxies Caused by the External Field Effect of Clusters
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wu, Xufen; Wang, Yougang; Feix, Martin
2017-08-01
Adopting Schwarzschild’s orbit-superposition technique, we construct a series of self-consistent galaxy models, embedded in the external field of galaxy clusters in the framework of Milgrom’s MOdified Newtonian Dynamics (MOND). These models represent relatively massive ellipticals with a Hernquist radial profile at various distances from the cluster center. Using N-body simulations, we perform a first analysis of these models and their evolution. We find that self-gravitating axisymmetric density models, even under a weak external field, lose their symmetry by instability and generally evolve to triaxial configurations. A kinematic analysis suggests that the instability originates from both box and nonclassified orbits with low angular momentum. We also consider a self-consistent isolated system that is then placed in a strong external field and allowed to evolve freely. This model, just like the corresponding equilibrium model in the same external field, eventually settles to a triaxial equilibrium as well, but has a higher velocity radial anisotropy and is rounder. The presence of an external field in the MOND universe generically predicts some lopsidedness of galaxy shapes.
Mining the SDSS SkyServer SQL queries log
NASA Astrophysics Data System (ADS)
Hirota, Vitor M.; Santos, Rafael; Raddick, Jordan; Thakar, Ani
2016-05-01
SkyServer, the Internet portal for the Sloan Digital Sky Survey (SDSS) astronomical catalog, provides a set of tools that allows data access for astronomers and scientific education. One of SkyServer's data access interfaces allows users to enter ad-hoc SQL statements to query the catalog. SkyServer also presents some template queries that can be used as a basis for more complex queries. This interface has logged over 330 million queries submitted since 2001. It is expected that analysis of this data can be used to investigate usage patterns, identify potential new classes of queries, find similar queries, etc., and to shed some light on how users interact with the Sloan Digital Sky Survey data and how scientists have adopted the new paradigm of e-Science, which could in turn lead to enhancements to the user interfaces and experience in general. In this paper we review some approaches to SQL query mining, apply the traditional techniques used in the literature and present lessons learned, namely, that the general text mining approach for feature extraction and clustering does not seem to be adequate for this type of data, and, most importantly, we find that this type of analysis can result in very different queries being clustered together.
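The lesson reported in the abstract above, that generic text-mining features can group very different queries together, can be illustrated with a small Python sketch. The tokenizer, the bag-of-tokens representation, and the two example queries are assumptions made for illustration; they are not the feature extraction used in the paper.

```python
import math
import re
from collections import Counter

# Words, integers, or single punctuation/operator characters.
TOKEN = re.compile(r"[a-z_]+|\d+|[^\sa-z_\d]")


def token_counts(query):
    """Bag-of-tokens representation of a SQL string (case-insensitive)."""
    return Counter(TOKEN.findall(query.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


# Two illustrative queries that select complementary sets of objects.
bright = "SELECT ra, dec FROM PhotoObj WHERE r < 18"
faint = "SELECT ra, dec FROM PhotoObj WHERE r > 18"

similarity = cosine(token_counts(bright), token_counts(faint))
print(round(similarity, 3))
```

The two queries differ in a single operator token out of ten, so a token-based similarity rates them as nearly identical although they return disjoint result sets. Features that capture query structure or semantics would be needed to separate them.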
NASA Astrophysics Data System (ADS)
von der Linden, Anja; Allen, Mark T.; Applegate, Douglas E.; Kelly, Patrick L.; Allen, Steven W.; Ebeling, Harald; Burchat, Patricia R.; Burke, David L.; Donovan, David; Morris, R. Glenn; Blandford, Roger; Erben, Thomas; Mantz, Adam
2014-03-01
This is the first in a series of papers in which we measure accurate weak-lensing masses for 51 of the most X-ray luminous galaxy clusters known at redshifts 0.15 ≲ z_Cl ≲ 0.7, in order to calibrate X-ray and other mass proxies for cosmological cluster experiments. The primary aim is to improve the absolute mass calibration of cluster observables, currently the dominant systematic uncertainty for cluster count experiments. Key elements of this work are the rigorous quantification of systematic uncertainties, high-quality data reduction and photometric calibration, and the 'blind' nature of the analysis to avoid confirmation bias. Our target clusters are drawn from X-ray catalogues based on the ROSAT All-Sky Survey, and provide a versatile calibration sample for many aspects of cluster cosmology. We have acquired wide-field, high-quality imaging using the Subaru Telescope and Canada-France-Hawaii Telescope for all 51 clusters, in at least three bands per cluster. For a subset of 27 clusters, we have data in at least five bands, allowing accurate photometric redshift estimates of lensed galaxies. In this paper, we describe the cluster sample and observations, and detail the processing of the SuprimeCam data to yield high-quality images suitable for robust weak-lensing shape measurements and precision photometry. For each cluster, we present wide-field three-colour optical images and maps of the weak-lensing mass distribution, the optical light distribution and the X-ray emission. These provide insights into the large-scale structure in which the clusters are embedded. We measure the offsets between X-ray flux centroids and the brightest cluster galaxies in the clusters, finding these to be small in general, with a median of 20 kpc.
For offsets ≲100 kpc, weak-lensing mass measurements centred on the brightest cluster galaxies agree well with values determined relative to the X-ray centroids; miscentring is therefore not a significant source of systematic uncertainty for our weak-lensing mass measurements. In accompanying papers, we discuss the key aspects of our photometric calibration and photometric redshift measurements (Kelly et al.), and measure cluster masses using two methods, including a novel Bayesian weak-lensing approach that makes full use of the photometric redshift probability distributions for individual background galaxies (Applegate et al.). In subsequent papers, we will incorporate these weak-lensing mass measurements into a self-consistent framework to simultaneously determine cluster scaling relations and cosmological parameters.
OrthoMCL: Identification of Ortholog Groups for Eukaryotic Genomes
Li, Li; Stoeckert, Christian J.; Roos, David S.
2003-01-01
The identification of orthologous groups is useful for genome annotation, studies on gene/protein evolution, comparative genomics, and the identification of taxonomically restricted sequences. Methods successfully exploited for prokaryotic genome analysis have proved difficult to apply to eukaryotes, however, as larger genomes may contain multiple paralogous genes, and sequence information is often incomplete. OrthoMCL provides a scalable method for constructing orthologous groups across multiple eukaryotic taxa, using a Markov Cluster algorithm to group (putative) orthologs and paralogs. This method performs similarly to the INPARANOID algorithm when applied to two genomes, but can be extended to cluster orthologs from multiple species. OrthoMCL clusters are coherent with groups identified by EGO, but improved recognition of “recent” paralogs permits overlapping EGO groups representing the same gene to be merged. Comparison with previously assigned EC annotations suggests a high degree of reliability, implying utility for automated eukaryotic genome annotation. OrthoMCL has been applied to the proteome data set from seven publicly available genomes (human, fly, worm, yeast, Arabidopsis, the malaria parasite Plasmodium falciparum, and Escherichia coli). A Web interface allows queries based on individual genes or user-defined phylogenetic patterns (http://www.cbil.upenn.edu/gene-family). Analysis of clusters incorporating P. falciparum genes identifies numerous enzymes that were incompletely annotated in first-pass annotation of the parasite genome. PMID:12952885
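As a rough illustration of the Markov Cluster (MCL) idea mentioned in the abstract above, the following minimal sketch alternates expansion (matrix squaring) and inflation (element-wise powering followed by column normalization) on a small graph. It is not the OrthoMCL pipeline: the toy adjacency matrix, the function names, the inflation value of 2.0, and the fixed iteration count are all assumptions made for illustration, and OrthoMCL itself works on sequence-similarity scores rather than a 0/1 adjacency matrix.

```python
def normalize_columns(m):
    """Scale every column so that it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def expand(m):
    """Expansion step: square the matrix (two-step random walks)."""
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def inflate(m, r):
    """Inflation step: raise entries to the power r, then renormalize columns."""
    return normalize_columns([[x ** r for x in row] for row in m])


def mcl(adjacency, inflation=2.0, iterations=50):
    """Toy MCL: returns clusters as sorted lists of node indices."""
    n = len(adjacency)
    # Add self-loops, a common MCL convention, then make columns stochastic.
    m = [[float(adjacency[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalize_columns(m)
    for _ in range(iterations):
        m = inflate(expand(m), inflation)
    # Each non-empty row of the converged matrix lists one cluster's members.
    clusters = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)


# Hypothetical graph: two disconnected triangles (nodes 0-2 and 3-5).
toy_graph = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
clusters = mcl(toy_graph)
```

Higher inflation values generally yield finer-grained clusters, which is why the parameter matters when grouping orthologs and recent paralogs.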
A taxonomy of epithelial human cancer and their metastases
2009-01-01
Background Microarray technology has made it possible to molecularly characterize many different cancer sites. This technology has the potential to individualize therapy and to discover new drug targets. However, due to technological differences and issues in standardized sample collection no study has evaluated the molecular profile of epithelial human cancer in a large number of samples and tissues. Additionally, it has not yet been extensively investigated whether metastases resemble their tissue of origin or tissue of destination. Methods We studied the expression profiles of a series of 1566 primary tumors and 178 metastases by unsupervised hierarchical clustering. The clustering profile was subsequently investigated and correlated with clinico-pathological data. Statistical enrichment of clinico-pathological annotations of groups of samples was investigated using Fisher exact test. Gene set enrichment analysis (GSEA) and DAVID functional enrichment analysis were used to investigate the molecular pathways. Kaplan-Meier survival analysis and log-rank tests were used to investigate prognostic significance of gene signatures. Results Large clusters corresponding to breast, gastrointestinal, ovarian and kidney primary tissues emerged from the data. Chromophobe renal cell carcinoma clustered together with follicular differentiated thyroid carcinoma, which supports recent morphological descriptions of thyroid follicular carcinoma-like tumors in the kidney and suggests that they represent a subtype of chromophobe carcinoma. We also found an expression signature identifying primary tumors of squamous cell histology in multiple tissues. Next, a subset of ovarian tumors enriched with endometrioid histology clustered together with endometrium tumors, confirming that they share their etiopathogenesis, which strongly differs from serous ovarian tumors. In addition, the clustering of colon and breast tumors correlated with clinico-pathological characteristics.
Moreover, a signature was developed based on our unsupervised clustering of breast tumors and this was predictive for disease-specific survival in three independent studies. Next, the metastases from ovarian, breast, lung and vulva cluster with their tissue of origin while metastases from colon showed a bimodal distribution. A significant part clusters with tissue of origin while the remaining tumors cluster with the tissue of destination. Conclusion Our molecular taxonomy of epithelial human cancer indicates surprising correlations over tissues. This may have a significant impact on the classification of many cancer sites and may guide pathologists, both in research and daily practice. Moreover, these results based on unsupervised analysis yielded a signature predictive of clinical outcome in breast cancer. Additionally, we hypothesize that metastases from gastrointestinal origin either remember their tissue of origin or adapt to the tissue of destination. More specifically, colon metastases in the liver show strong evidence for such a bimodal tissue specific profile. PMID:20017941
Chen, Jin; Roth, Robert E; Naito, Adam T; Lengerich, Eugene J; MacEachren, Alan M
2008-01-01
Background Kulldorff's spatial scan statistic and its software implementation – SaTScan – are widely used for detecting and evaluating geographic clusters. However, two issues make using the method and interpreting its results non-trivial: (1) the method lacks cartographic support for understanding the clusters in geographic context and (2) results from the method are sensitive to parameter choices related to cluster scaling (abbreviated as scaling parameters), but the system provides no direct support for making these choices. We employ both established and novel geovisual analytics methods to address these issues and to enhance the interpretation of SaTScan results. We demonstrate our geovisual analytics approach in a case study analysis of cervical cancer mortality in the U.S. Results We address the first issue by providing an interactive visual interface to support the interpretation of SaTScan results. Our research to address the second issue prompted a broader discussion about the sensitivity of SaTScan results to parameter choices. Sensitivity has two components: (1) the method can identify clusters that, while being statistically significant, have heterogeneous contents comprised of both high-risk and low-risk locations and (2) the method can identify clusters that are unstable in location and size as the spatial scan scaling parameter is varied. To investigate cluster result stability, we conducted multiple SaTScan runs with systematically selected parameters. The results, when scanning a large spatial dataset (e.g., U.S. data aggregated by county), demonstrate that no single spatial scan scaling value is known to be optimal to identify clusters that exist at different scales; instead, multiple scans that vary the parameters are necessary. We introduce a novel method of measuring and visualizing reliability that facilitates identification of homogeneous clusters that are stable across analysis scales. 
Finally, we propose a logical approach to proceed through the analysis of SaTScan results. Conclusion The geovisual analytics approach described in this manuscript facilitates the interpretation of spatial cluster detection methods by providing cartographic representation of SaTScan results and by providing visualization methods and tools that support selection of SaTScan parameters. Our methods distinguish between heterogeneous and homogeneous clusters and assess the stability of clusters across analytic scales. Method We analyzed the cervical cancer mortality data for the United States aggregated by county between 2000 and 2004. We ran SaTScan on the dataset fifty times with different parameter choices. Our geovisual analytics approach couples SaTScan with our visual analytic platform, allowing users to interactively explore and compare SaTScan results produced by different parameter choices. The Standardized Mortality Ratio and reliability scores are visualized for all the counties to identify stable, homogeneous clusters. We evaluated our analysis result by comparing it to that produced by other independent techniques including the Empirical Bayes Smoothing and Kafadar spatial smoother methods. The geovisual analytics approach introduced here is developed and implemented in our Java-based Visual Inquiry Toolkit. PMID:18992163
Chen, Jin; Roth, Robert E; Naito, Adam T; Lengerich, Eugene J; Maceachren, Alan M
2008-11-07
Kulldorff's spatial scan statistic and its software implementation - SaTScan - are widely used for detecting and evaluating geographic clusters. However, two issues make using the method and interpreting its results non-trivial: (1) the method lacks cartographic support for understanding the clusters in geographic context and (2) results from the method are sensitive to parameter choices related to cluster scaling (abbreviated as scaling parameters), but the system provides no direct support for making these choices. We employ both established and novel geovisual analytics methods to address these issues and to enhance the interpretation of SaTScan results. We demonstrate our geovisual analytics approach in a case study analysis of cervical cancer mortality in the U.S. We address the first issue by providing an interactive visual interface to support the interpretation of SaTScan results. Our research to address the second issue prompted a broader discussion about the sensitivity of SaTScan results to parameter choices. Sensitivity has two components: (1) the method can identify clusters that, while being statistically significant, have heterogeneous contents comprised of both high-risk and low-risk locations and (2) the method can identify clusters that are unstable in location and size as the spatial scan scaling parameter is varied. To investigate cluster result stability, we conducted multiple SaTScan runs with systematically selected parameters. The results, when scanning a large spatial dataset (e.g., U.S. data aggregated by county), demonstrate that no single spatial scan scaling value is known to be optimal to identify clusters that exist at different scales; instead, multiple scans that vary the parameters are necessary. We introduce a novel method of measuring and visualizing reliability that facilitates identification of homogeneous clusters that are stable across analysis scales. 
Finally, we propose a logical approach to proceed through the analysis of SaTScan results. The geovisual analytics approach described in this manuscript facilitates the interpretation of spatial cluster detection methods by providing cartographic representation of SaTScan results and by providing visualization methods and tools that support selection of SaTScan parameters. Our methods distinguish between heterogeneous and homogeneous clusters and assess the stability of clusters across analytic scales. We analyzed the cervical cancer mortality data for the United States aggregated by county between 2000 and 2004. We ran SaTScan on the dataset fifty times with different parameter choices. Our geovisual analytics approach couples SaTScan with our visual analytic platform, allowing users to interactively explore and compare SaTScan results produced by different parameter choices. The Standardized Mortality Ratio and reliability scores are visualized for all the counties to identify stable, homogeneous clusters. We evaluated our analysis result by comparing it to that produced by other independent techniques including the Empirical Bayes Smoothing and Kafadar spatial smoother methods. The geovisual analytics approach introduced here is developed and implemented in our Java-based Visual Inquiry Toolkit.
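The Standardized Mortality Ratio (SMR) mentioned in the abstract above is conventionally the ratio of observed to expected deaths, where the expected count comes from applying reference rates to the local population (indirect standardization). The sketch below shows that arithmetic only. The function names, age strata, rates, and counts are hypothetical and are not taken from the study or from SaTScan.

```python
def expected_deaths(reference_rates, local_population):
    """Expected deaths if the local population experienced the reference rates.

    Both arguments are dicts keyed by stratum (for example, age group).
    """
    return sum(reference_rates[s] * local_population[s] for s in local_population)


def standardized_mortality_ratio(observed, reference_rates, local_population):
    """SMR = observed deaths / expected deaths (indirect standardization)."""
    expected = expected_deaths(reference_rates, local_population)
    if expected == 0:
        raise ValueError("expected deaths is zero; SMR is undefined")
    return observed / expected


# Hypothetical county: made-up reference rates and population by age stratum.
rates = {"under_50": 0.001, "50_plus": 0.004}
population = {"under_50": 10_000, "50_plus": 5_000}
smr = standardized_mortality_ratio(45, rates, population)
```

An SMR above 1 indicates more deaths than expected under the reference rates; values for sparsely populated counties are unstable, which is one reason smoothing methods such as those named in the abstract are used for comparison.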
Boesten, Rolf; Schuren, Frank; Wind, Richèle D; Knol, Jan; de Vos, Willem M
2011-09-01
A total of 20 Bifidobacterium strains were isolated from fecal samples of 4 breast- and bottle-fed infants and all were characterized as Bifidobacterium breve based on 16S rRNA gene sequence and metabolic analysis. These isolates were further characterized and compared to the type strains of B. breve and 7 other Bifidobacterium spp. by comparative genome hybridization. For this purpose, we constructed and used a DNA-based microarray containing over 2000 randomly cloned DNA fragments from B. breve type strain LMG13208. This molecular analysis revealed a high degree of genomic variation between the isolated strains and allowed the vast majority to be grouped into 4 clusters. One cluster contained a single isolate that was virtually indistinguishable from the B. breve type strain. The 3 other clusters included 19 B. breve strains that differed considerably from all type strains. Remarkably, each of the 4 clusters included strains that were isolated from a single infant, indicating that a niche adaptation may contribute to variation within the B. breve species. Based on genomic hybridization data, the new B. breve isolates were estimated to contain approximately 60-90% of the genes of the B. breve type strain, attesting to the existence of various subspecies within the species B. breve. Further bioinformatic analysis identified several hundred diagnostic clones specific to the genomic clustering of the B. breve isolates. Molecular analysis of representatives of these revealed that annotated genes from the conserved B. breve core encoded mainly housekeeping functions, while the strain-specific genes were predicted to code for functions related to life style, such as carbohydrate metabolism and transport. This is compatible with genetic adaptation of the strains to their niche, a combination of infants and diet. Copyright © 2011 Institut Pasteur. Published by Elsevier Masson SAS. All rights reserved.
Quantitative analysis of single-molecule superresolution images
Coltharp, Carla; Yang, Xinxing; Xiao, Jie
2014-01-01
This review highlights the quantitative capabilities of single-molecule localization-based superresolution imaging methods. In addition to revealing fine structural details, the molecule coordinate lists generated by these methods provide the critical ability to quantify the number, clustering, and colocalization of molecules with 10 – 50 nm resolution. Here we describe typical workflows and precautions for quantitative analysis of single-molecule superresolution images. These guidelines include potential pitfalls and essential control experiments, allowing critical assessment and interpretation of superresolution images. PMID:25179006
Dupuy, Céline; Morignat, Eric; Maugey, Xavier; Vinard, Jean-Luc; Hendrikx, Pascal; Ducrot, Christian; Calavas, Didier; Gay, Emilie
2013-04-30
The slaughterhouse is a central processing point for food animals and thus a source of both demographic data (age, breed, sex) and health-related data (reason for condemnation and condemned portions) that are not available through other sources. Using these data for syndromic surveillance is therefore tempting. However, many possible reasons for condemnation and condemned portions exist, making the definition of relevant syndromes challenging. The objective of this study was to determine a typology of cattle with at least one portion of the carcass condemned in order to define syndromes. Multiple factor analysis (MFA) in combination with clustering methods was performed using both health-related data and demographic data. Analyses were performed on 381,186 cattle with at least one portion of the carcass condemned among the 1,937,917 cattle slaughtered in ten French abattoirs. Results of the MFA and clustering methods led to 12 clusters considered as stable according to year of slaughter and slaughterhouse. One cluster was specific to a disease of public health importance (cysticercosis). Two clusters were linked to the slaughtering process (fecal contamination of heart or lungs and deterioration lesions). Two clusters respectively characterized by chronic liver lesions and chronic peritonitis could be linked to diseases of economic importance to farmers. Three clusters could be linked respectively to reticulo-pericarditis, fatty liver syndrome and farmer's lung syndrome, which are related to both diseases of economic importance to farmers and herd management issues. Three clusters respectively characterized by arthritis, myopathy and Dark Firm Dry (DFD) meat could notably be linked to animal welfare issues. Finally, one cluster, characterized by bronchopneumonia, could be linked to both animal health and herd management issues.
The statistical approach of combining multiple factor analysis with cluster analysis showed its relevance for the detection of syndromes using available large and complex slaughterhouse data. The advantages of this statistical approach are to i) define groups of reasons for condemnation based on meat inspection data, ii) help grouping reasons for condemnation among a list of various possible reasons for condemnation for which a consensus among experts could be difficult to reach, iii) assign each animal to a single syndrome which allows the detection of changes in trends of syndromes to detect unusual patterns in known diseases and emergence of new diseases.
McLean, K G; Hanson, D J; Jervis, S M; Drake, M A
2017-11-01
Bacon is one of the most recognizable consumer pork products and is differentiated by appearance, flavor, thickness, and several possible product claims. The objective of this study was to explore the attributes of retail bacon that influence consumers to purchase and consume bacon. An Adaptive Choice-Based Conjoint (ACBC) survey was designed for attributes of raw American-style bacon. An ACBC survey (N = 1410 consumers) and Kano questioning were applied to determine the key attributes that influenced consumer purchase. Attributes included package size, brand, thickness, label claims, flavor, price, and images of the bacon package displaying fat:lean ratio. Maximum Difference Scaling (MaxDiff) was used to rank appeal of 20 different bacon images with variable fat:lean ratio and slice shape. The most important attribute for bacon purchase was price followed by fat:lean appearance and then flavor. Three consumer clusters were identified with distinct preferences. For 2 clusters, price was not the primary attribute. Understanding preferences of distinct consumer clusters will enable manufacturers to target consumers and make more appealing bacon. Adaptive Choice-Based Conjoint (ACBC) is a research technique that allows consumers to react to assembled products and identify product attributes that they prefer. Kano questions allow researchers to look at the individual aspects of a product and understand consumer sentiment and expectations towards those product qualities while Maximum Difference scaling allows consumers to directly rank single attributes of a product relative to one another. A combination of these 3 approaches can provide key understandings on consumer perception of retail bacon allowing companies to optimize and maximize their development and advertising resources. © 2017 Institute of Food Technologists®.
Cluster stability in the analysis of mass cytometry data.
Melchiotti, Rossella; Gracio, Filipe; Kordasti, Shahram; Todd, Alan K; de Rinaldis, Emanuele
2017-01-01
Manual gating has been traditionally applied to cytometry data sets to identify cells based on protein expression. The advent of mass cytometry allows for a higher number of proteins to be simultaneously measured on cells, therefore providing a means to define cell clusters in a high dimensional expression space. This enhancement, whilst opening unprecedented opportunities for single cell-level analyses, makes the incremental replacement of manual gating with automated clustering a compelling need. To this aim many methods have been implemented and their successful applications demonstrated in different settings. However, the reproducibility of automatically generated clusters is proving challenging and an analytical framework to distinguish spurious clusters from more stable entities, and presumably more biologically relevant ones, is still missing. One way to estimate cell clusters' stability is the evaluation of their consistent re-occurrence within- and between-algorithms, a metric that is commonly used to evaluate results from gene expression. Herein we report the usage and importance of cluster stability evaluations, when applied to results generated from three popular clustering algorithms - SPADE, FLOCK and PhenoGraph - run on four different data sets. These algorithms were shown to generate clusters with various degrees of statistical stability, many of them being unstable. By comparing the results of automated clustering with manually gated populations, we illustrate how information on cluster stability can assist towards a more rigorous and informed interpretation of clustering results. We also explore the relationships between statistical stability and other properties such as clusters' compactness and isolation, demonstrating that whilst cluster stability is linked to other properties it cannot be reliably predicted by any of them. 
Our study proposes the introduction of cluster stability as a necessary checkpoint for cluster interpretation and contributes to the construction of a more systematic and standardized analytical framework for the assessment of cytometry clustering results. © 2016 International Society for Advancement of Cytometry.
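One common way to quantify how consistently a cluster re-occurs, as discussed in the abstract above, is to match each reference cluster to its best counterpart in every repeated clustering run and average the Jaccard similarities. The sketch below assumes that approach; it is not necessarily the metric used by the authors, and the function names and toy cell indices are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections of item identifiers."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster_stability(reference, reruns):
    """Mean best-match Jaccard score of each reference cluster across reruns.

    reference: dict mapping cluster name -> list of item ids
    reruns:    list of clusterings, each a list of clusters (lists of item ids),
               e.g. from resampled data or from a different algorithm
    """
    scores = {}
    for name, members in reference.items():
        best_matches = [max(jaccard(members, other) for other in run)
                        for run in reruns]
        scores[name] = sum(best_matches) / len(best_matches)
    return scores


# Hypothetical example: six cells, one reference clustering, two reruns.
reference = {"A": [0, 1, 2, 3], "B": [4, 5]}
reruns = [
    [[0, 1, 2, 3], [4, 5]],   # identical to the reference
    [[0, 1], [2, 3, 4, 5]],   # clusters split and merged differently
]
stability = cluster_stability(reference, reruns)
```

Scores near 1 indicate clusters that reappear almost unchanged, while low scores flag clusters that may be spurious and deserve cautious interpretation.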
NASA Astrophysics Data System (ADS)
Franke, R.
2016-11-01
In many networks discovered in biology, medicine, neuroscience and other disciplines special properties like a certain degree distribution and hierarchical cluster structure (also called communities) can be observed as general organizing principles. Detecting the cluster structure of an unknown network promises to identify functional subdivisions, hierarchy and interactions on a mesoscale. Choosing an appropriate detection algorithm is not trivial because there are multiple network, cluster and algorithmic properties to be considered. Edges can be weighted and/or directed, clusters overlap or build a hierarchy in several ways. Algorithms differ not only in runtime and memory requirements but also in the network and cluster properties they allow. Each is also based on a specific definition of what a cluster is. On the one hand, a comprehensive network creation model is needed to build a large variety of benchmark networks with different reasonable structures to compare algorithms. On the other hand, if a cluster structure is already known, it is desirable to separate effects of this structure from other network properties. This can be done with null model networks that mimic an observed cluster structure to improve statistics on other network features. A third important application is the general study of properties in networks with different cluster structures, possibly evolving over time. Currently there are good benchmark and creation models available. But what is left is a precise sandbox model to build hierarchical, overlapping and directed clusters for undirected or directed, binary or weighted complex random networks on basis of a sophisticated blueprint. This gap shall be closed by the model CHIMERA (Cluster Hierarchy Interconnection Model for Evaluation, Research and Analysis) which will be introduced and described here for the first time.
LoCuSS: weak-lensing mass calibration of galaxy clusters
NASA Astrophysics Data System (ADS)
Okabe, Nobuhiro; Smith, Graham P.
2016-10-01
We present weak-lensing mass measurements of 50 X-ray luminous galaxy clusters at 0.15 ≤ z ≤ 0.3, based on uniform high-quality observations with Suprime-Cam mounted on the 8.2-m Subaru telescope. We pay close attention to possible systematic biases, aiming to control them at the ≲4 per cent level. The dominant source of systematic bias in weak-lensing measurements of the mass of individual galaxy clusters is contamination of background galaxy catalogues by faint cluster and foreground galaxies. We extend our conservative method for selecting background galaxies with (V - I') colours redder than the red sequence of cluster members to use a colour-cut that depends on cluster-centric radius. This allows us to define background galaxy samples that suffer ≤1 per cent contamination, and comprise 13 galaxies per square arcminute. Thanks to the purity of our background galaxy catalogue, the largest systematic that we identify in our analysis is a shape measurement bias of 3 per cent, that we measure using simulations that probe weak shears up to g = 0.3. Our individual cluster mass and concentration measurements are in excellent agreement with predictions of the mass-concentration relation. Equally, our stacked shear profile is in excellent agreement with the Navarro Frenk and White profile. Our new Local Cluster Substructure Survey mass measurements are consistent with the Canadian Cluster Cosmology Project and Cluster Lensing And Supernova Survey with Hubble surveys, and in tension with Weighing the Giants at ~1σ-2σ significance. Overall, the consensus at z ≤ 0.3 that is emerging from these complementary surveys represents important progress for cluster mass calibration, and augurs well for cluster cosmology.
Track structure in radiation biology: theory and applications.
Nikjoo, H; Uehara, S; Wilson, W E; Hoshi, M; Goodhead, D T
1998-04-01
A brief review is presented of the basic concepts in track structure, and the relative merits of various theoretical approaches adopted in Monte-Carlo track-structure codes are examined. In the second part of the paper, a formal cluster analysis is introduced to calculate cluster-distance distributions. Total experimental ionization cross-sections were least-square fitted and compared with the calculation by various theoretical methods. Monte-Carlo track-structure code Kurbuc was used to examine and compare the spectrum of the secondary electrons generated by using functions given by Born-Bethe, Jain-Khare, Gryzinsky, Kim-Rudd, Mott and Vriens' theories. The cluster analysis in track structure was carried out using the k-means method and Hartigan algorithm. Data are presented on experimental and calculated total ionization cross-sections: inverse mean free path (IMFP) as a function of electron energy used in Monte-Carlo track-structure codes; the spectrum of secondary electrons generated by different functions for 500 eV primary electrons; cluster analysis for 4 MeV and 20 MeV alpha-particles in terms of the frequency of total cluster energy to the root-mean-square (rms) radius of the cluster and differential distance distributions for a pair of clusters; and finally relative frequency distribution for energy deposited in DNA, single-strand break and double-strand breaks for 10 MeV/u protons, alpha-particles and carbon ions. There are a number of Monte-Carlo track-structure codes that have been developed independently and the bench-marking presented in this paper allows a better choice of the theoretical method adopted in a track-structure code to be made. A systematic bench-marking of cross-sections and spectra of the secondary electrons shows differences between the codes at atomic level, but such differences are not significant in biophysical modelling at the macromolecular level.
Clustered-damage evaluation shows that a substantial proportion of dose (~30%) is deposited by low-energy electrons; the majority of DNA damage lesions are of simple type; the complexity of damage increases with increased LET, while the total yield of strand breaks remains constant; and at high LET values nearly 70% of all double-strand breaks are of complex type.
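To make the k-means step and the rms cluster radius from the abstract above concrete, here is a minimal sketch using plain Lloyd iterations on 3-D points. This is a substitute for illustration: the paper names the k-means method with the Hartigan algorithm, which is not reproduced here, and the coordinates, starting centroids, and function names are hypothetical.

```python
import math


def kmeans(points, centroids, iterations=100):
    """Lloyd-style k-means from given starting centroids.

    Returns the final centroids and the list of point groups, one per centroid.
    """
    centroids = [tuple(c) for c in centroids]
    groups = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        groups = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda k: math.dist(p, centroids[k]))
            groups[nearest].append(p)
        # Update step: move each centroid to the mean of its group.
        updated = [tuple(sum(axis) / len(g) for axis in zip(*g)) if g else c
                   for g, c in zip(groups, centroids)]
        if updated == centroids:
            break  # assignments can no longer change
        centroids = updated
    return centroids, groups


def rms_radius(group, centroid):
    """Root-mean-square distance of a cluster's points from its centroid."""
    return math.sqrt(sum(math.dist(p, centroid) ** 2 for p in group) / len(group))


# Hypothetical energy-deposition sites (arbitrary length units).
sites = [(0, 0, 0), (2, 0, 0), (10, 0, 0), (12, 0, 0)]
centres, groups = kmeans(sites, centroids=[(0, 0, 0), (12, 0, 0)])
radii = [rms_radius(g, c) for g, c in zip(groups, centres)]
```

The result of Lloyd iterations depends on the starting centroids, so in practice several initializations are usually compared.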
Hollunder, Jens; Friedel, Maik; Kuiper, Martin; Wilhelm, Thomas
2010-04-01
Many large 'omics' datasets have been published and many more are expected in the near future. New analysis methods are needed for best exploitation. We have developed a graphical user interface (GUI) for easy data analysis. Our discovery of all significant substructures (DASS) approach elucidates the underlying modularity, a typical feature of complex biological data. It is related to biclustering and other data mining approaches. Importantly, DASS-GUI also allows handling of multi-sets and calculation of statistical significances. DASS-GUI contains tools for further analysis of the identified patterns: analysis of the pattern hierarchy, enrichment analysis, module validation, analysis of additional numerical data, easy handling of synonymous names, clustering, filtering and merging. Different export options allow easy usage of additional tools such as Cytoscape. Source code, pre-compiled binaries for different systems, a comprehensive tutorial, case studies and many additional datasets are freely available at http://www.ifr.ac.uk/dass/gui/. DASS-GUI is implemented in Qt.
NASA Astrophysics Data System (ADS)
Belianinov, Alex; Ganesh, Panchapakesan; Lin, Wenzhi; Sales, Brian C.; Sefat, Athena S.; Jesse, Stephen; Pan, Minghu; Kalinin, Sergei V.
2014-12-01
Atomic level spatial variability of electronic structure in Fe-based superconductor FeTe0.55Se0.45 (Tc = 15 K) is explored using current-imaging tunneling-spectroscopy. Multivariate statistical analysis of the data differentiates regions of dissimilar electronic behavior that can be identified with the segregation of chalcogen atoms, as well as boundaries between terminations and near neighbor interactions. Subsequent clustering analysis allows identification of the spatial localization of these dissimilar regions. Similar statistical analysis of modeled calculated density of states of chemically inhomogeneous FeTe1-xSex structures further confirms that the two types of chalcogens, i.e., Te and Se, can be identified by their electronic signature and differentiated by their local chemical environment. This approach allows detailed chemical discrimination of the scanning tunneling microscopy data including separation of atomic identities, proximity, and local configuration effects and can be universally applicable to chemically and electronically inhomogeneous surfaces.
NASA Astrophysics Data System (ADS)
Giorgino, Toni; Laio, Alessandro; Rodriguez, Alex
2017-08-01
Molecular dynamics (MD) simulations allow the exploration of the phase space of biopolymers through the integration of equations of motion of their constituent atoms. The analysis of MD trajectories often relies on the choice of collective variables (CVs) along which the dynamics of the system is projected. We developed a graphical user interface (GUI) for facilitating the interactive choice of the appropriate CVs. The GUI allows: defining interactively new CVs; partitioning the configurations into microstates characterized by similar values of the CVs; calculating the free energies of the microstates for both unbiased and biased (metadynamics) simulations; clustering the microstates in kinetic basins; visualizing the free energy landscape as a function of a subset of the CVs used for the analysis. A simple mouse click allows one to quickly inspect structures corresponding to specific points in the landscape.
Aarabi, A; Grebe, R; Berquin, P; Bourel Ponchel, E; Jalin, C; Fohlen, M; Bulteau, C; Delalande, O; Gondry, C; Héberlé, C; Moullart, V; Wallois, F
2012-06-01
This case study aims to demonstrate that spatiotemporal spike discrimination and source analysis are effective to monitor the development of sources of epileptic activity in time and space. Therefore, they can provide clinically useful information allowing a better understanding of the pathophysiology of individual seizures with time- and space-resolved characteristics of successive epileptic states, including interictal, preictal, postictal, and ictal states. High spatial resolution scalp EEGs (HR-EEG) were acquired from a 2-year-old girl with refractory central epilepsy and single-focus seizures as confirmed by intracerebral EEG recordings and ictal single-photon emission computed tomography (SPECT). Evaluation of HR-EEG consists of the following three global steps: (1) creation of the initial head model, (2) automatic spike and seizure detection, and finally (3) source localization. During the source localization phase, epileptic states are determined to allow state-based spike detection and localization of underlying sources for each spike. In a final cluster analysis, localization results are integrated to determine the possible sources of epileptic activity. The results were compared with the cerebral locations identified by intracerebral EEG recordings and SPECT. The results obtained with this approach were concordant with those of MRI, SPECT and distribution of intracerebral potentials. Dipole cluster centres found for spikes in interictal, preictal, ictal and postictal states were situated an average of 6.3mm from the intracerebral contacts with the highest voltage. Both amplitude and shape of spikes change between states. Dispersion of the dipoles was higher in the preictal state than in the postictal state. Two clusters of spikes were identified. The centres of these clusters changed position periodically during the various epileptic states. 
High-resolution surface EEG evaluated by an advanced algorithmic approach can be used to investigate the spatiotemporal characteristics of sources located in the epileptic focus. The results were validated by standard methods, ensuring good spatial resolution by MRI and SPECT and optimal temporal resolution by intracerebral EEG. Surface EEG can be used to identify different spike clusters and sources of the successive epileptic states. The method that was used in this study will provide physicians with a better understanding of the pathophysiological characteristics of epileptic activities. In particular, this method may be useful for more effective positioning of implantable intracerebral electrodes. Copyright © 2011 Elsevier Masson SAS. All rights reserved.
NASA Astrophysics Data System (ADS)
Wagner-Kaiser, R.; Mackey, Dougal; Sarajedini, Ata; Chaboyer, Brian; Cohen, Roger E.; Yang, Soung-Chul; Cummings, Jeffrey D.; Geisler, Doug; Grocholski, Aaron J.
2017-11-01
We analyse Hubble Space Telescope observations of six globular clusters in the Large Magellanic Cloud (LMC) from programme GO-14164 in Cycle 23. These are the deepest available observations of the LMC globular cluster population; their uniformity facilitates a precise comparison with globular clusters in the Milky Way. Measuring the magnitude of the main-sequence turn-off point relative to template Galactic globular clusters allows the relative ages of the clusters to be determined with a mean precision of 8.4 per cent, and down to 6 per cent for individual objects. We find that the mean age of our LMC cluster ensemble is identical to the mean age of the oldest metal-poor clusters in the Milky Way halo to 0.2 ± 0.4 Gyr. This provides the most sensitive test to date of the synchronicity of the earliest epoch of globular cluster formation in two independent galaxies. Horizontal branch magnitudes and subdwarf fitting to the main sequence allow us to determine distance estimates for each cluster and examine their geometric distribution in the LMC. Using two different methods, we find an average distance to the LMC of 18.52 ± 0.05.
Dynamic clustering detection through multi-valued descriptors of dermoscopic images.
Cozza, Valentina; Guarracino, Maria Rosario; Maddalena, Lucia; Baroni, Adone
2011-09-10
This paper introduces a dynamic clustering methodology based on multi-valued descriptors of dermoscopic images. The main idea is to support medical diagnosis to decide if pigmented skin lesions belonging to an uncertain set are nearer to malignant melanoma or to benign nevi. Melanoma is the most deadly skin cancer, and early diagnosis is a current challenge for clinicians. Most data analysis algorithms for skin lesion discrimination focus on segmentation and extraction of features of categorical or numerical type. As an alternative approach, this paper introduces two new concepts: first, it considers multi-valued data, described not only by scalar variables but also by interval or histogram variables; second, it introduces a dynamic clustering method based on Wasserstein distance to compare multi-valued data. The overall strategy of analysis can be summarized into the following steps: first, a segmentation of dermoscopic images allows us to identify a set of multi-valued descriptors; second, we performed a discriminant analysis on a set of images where there is an a priori classification so that it is possible to detect which features discriminate the benign and malignant lesions; and third, we performed the proposed dynamic clustering method on the uncertain cases, which need to be associated with one of the two previously mentioned groups. Results based on clinical data show that the grading of specific descriptors associated with dermoscopic characteristics provides a novel way to characterize uncertain lesions that can help the dermatologist's diagnosis. Copyright © 2011 John Wiley & Sons, Ltd.
NASA Astrophysics Data System (ADS)
ZuHone, J. A.; Kowalik, K.; Öhman, E.; Lau, E.; Nagai, D.
2018-01-01
We present the “Galaxy Cluster Merger Catalog.” This catalog provides an extensive suite of mock observations and related data for N-body and hydrodynamical simulations of galaxy cluster mergers and clusters from cosmological simulations. These mock observations consist of projections of a number of important observable quantities in several different wavebands, as well as along different lines of sight through each simulation domain. The web interface to the catalog consists of easily browsable images over epoch and projection direction, as well as download links for the raw data and a JS9 interface for interactive data exploration. The data are presented within a consistent format so that comparison between simulations is straightforward. All of the data products are provided in the standard Flexible Image Transport System file format. The data are being stored on the yt Hub (http://hub.yt), which allows for remote access and analysis using a Jupyter notebook server. Future versions of the catalog will include simulations from a number of research groups and a variety of research topics related to the study of interactions of galaxy clusters with each other and with their member galaxies. The catalog is located at http://gcmc.hub.yt.
Modeling and Testing Dark Energy and Gravity with Galaxy Cluster Data
NASA Astrophysics Data System (ADS)
Rapetti, David; Cataneo, Matteo; Heneka, Caroline; Mantz, Adam; Allen, Steven W.; Von Der Linden, Anja; Schmidt, Fabian; Lombriser, Lucas; Li, Baojiu; Applegate, Douglas; Kelly, Patrick; Morris, Glenn
2018-06-01
The abundance of galaxy clusters is a powerful probe to constrain the properties of dark energy and gravity at large scales. We employed a self-consistent analysis that includes survey, observable-mass scaling relations and weak gravitational lensing data to obtain constraints on f(R) gravity, which are an order of magnitude tighter than the best previously achieved, as well as on cold dark energy of negligible sound speed. The latter implies clustering of the dark energy fluid at all scales, allowing us to measure the effects of dark energy perturbations at cluster scales. For this study, we recalibrated the halo mass function using the following non-linear characteristic quantities: the spherical collapse threshold, the virial overdensity and an additional mass contribution for cold dark energy. We also presented a new modeling of the f(R) gravity halo mass function that incorporates novel corrections to capture key non-linear effects of the Chameleon screening mechanism, as found in high resolution N-body simulations. All these results permit us to predict, as I will also exemplify, and eventually obtain the next generation of cluster constraints on such models, and provide us with frameworks that can also be applied to other proposed dark energy and modified gravity models using cluster abundance observations.
Strain-Level Diversity of Secondary Metabolism in Streptomyces albus
Seipke, Ryan F.
2015-01-01
Streptomyces spp. are robust producers of medicinally-, industrially- and agriculturally-important small molecules. Increased resistance to antibacterial agents and the lack of new antibiotics in the pipeline have led to a renaissance in natural product discovery. This endeavor has benefited from inexpensive high quality DNA sequencing technology, which has generated more than 140 genome sequences for taxonomic type strains and environmental Streptomyces spp. isolates. Many of the sequenced streptomycetes belong to the same species. For instance, Streptomyces albus has been isolated from diverse environmental niches and seven strains have been sequenced; consequently, this species has been sequenced more than any other streptomycete, allowing valuable analyses of strain-level diversity in secondary metabolism. Bioinformatics analyses identified a total of 48 unique biosynthetic gene clusters harboured by Streptomyces albus strains. Eighteen of these gene clusters specify the core secondary metabolome of the species. Fourteen of the gene clusters are contained by one or more strains and are considered auxiliary, while 16 of the gene clusters encode the production of putative strain-specific secondary metabolites. Analysis of Streptomyces albus strains suggests that each strain of a Streptomyces species likely harbours at least one strain-specific biosynthetic gene cluster. Importantly, this implies that deep sequencing of a species will not exhaust gene cluster diversity and will continue to yield novelty. PMID:25635820
Efficiency and credit ratings: a permutation-information-theory analysis
NASA Astrophysics Data System (ADS)
Fernandez Bariviera, Aurelio; Zunino, Luciano; Belén Guercio, M.; Martinez, Lisana B.; Rosso, Osvaldo A.
2013-08-01
The role of credit rating agencies has been under severe scrutiny after the subprime crisis. In this paper we explore the relationship between credit ratings and informational efficiency of a sample of thirty nine corporate bonds of US oil and energy companies from April 2008 to November 2012. For this purpose we use a powerful statistical tool, relatively new in the financial literature: the complexity-entropy causality plane. This representation space allows us to graphically classify the different bonds according to their degree of informational efficiency. We find that this classification agrees with the credit ratings assigned by Moody’s. In particular, we detect the formation of two clusters, which correspond to the global categories of investment and speculative grades. Regarding the latter cluster, two subgroups reflect distinct levels of efficiency. Additionally, we also find an intriguing absence of correlation between informational efficiency and firm characteristics. This allows us to conclude that the proposed permutation-information-theory approach provides an alternative practical way to justify bond classification.
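The two coordinates of the complexity-entropy causality plane can be sketched as below. This is an illustrative sketch rather than the authors' code: it assumes the usual construction from Bandt-Pompe ordinal patterns, with the normalized permutation entropy on one axis and a Jensen-Shannon statistical complexity on the other. The function names, the pattern length, and the short input series are hypothetical.

```python
import math
from collections import Counter
from itertools import permutations

def ordinal_distribution(series, order=3):
    """Bandt-Pompe probabilities of the ordinal patterns of length `order`."""
    counts = Counter(
        tuple(sorted(range(order), key=lambda k: series[i + k]))
        for i in range(len(series) - order + 1)
    )
    total = sum(counts.values())
    return [counts.get(p, 0) / total for p in permutations(range(order))]

def shannon(p):
    """Shannon entropy (natural logarithm) of a probability vector."""
    return -sum(x * math.log(x) for x in p if x > 0)

def jensen_shannon(a, b):
    """Jensen-Shannon divergence between two probability vectors."""
    mix = [(x + y) / 2 for x, y in zip(a, b)]
    return shannon(mix) - shannon(a) / 2 - shannon(b) / 2

def entropy_complexity(series, order=3):
    """Return (normalized permutation entropy H, statistical complexity C)."""
    p = ordinal_distribution(series, order)
    n = len(p)
    uniform = [1.0 / n] * n
    h = shannon(p) / math.log(n)
    # Normalize the divergence by its maximum, reached for a single-pattern distribution.
    delta = [1.0] + [0.0] * (n - 1)
    c = jensen_shannon(p, uniform) / jensen_shannon(delta, uniform) * h
    return h, c

# Hypothetical short series with pattern length 2 (4 rising steps, 2 falling steps).
h, c = entropy_complexity([4, 7, 9, 10, 6, 11, 3], order=2)
```

In this picture a series with H close to 1 and C close to 0 behaves like noise, which is the sense in which a point's location in the plane is read as a degree of informational efficiency.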
Alerts Analysis and Visualization in Network-based Intrusion Detection Systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yang, Dr. Li
2010-08-01
The alerts produced by network-based intrusion detection systems, e.g. Snort, can be difficult for network administrators to efficiently review and respond to due to the enormous number of alerts generated in a short time frame. This work describes how the visualization of raw IDS alert data assists network administrators in understanding the current state of a network and quickens the process of reviewing and responding to intrusion attempts. The project presented in this work consists of three primary components. The first component provides a visual mapping of the network topology that allows the end-user to easily browse clustered alerts. The second component is based on the flocking behavior of birds such that birds tend to follow other birds with similar behaviors. This component allows the end-user to see the clustering process and provides an efficient means for reviewing alert data. The third component discovers and visualizes patterns of multistage attacks by profiling the attacker's behaviors.
Cluster optical coding: from biochips to counterfeit security
NASA Astrophysics Data System (ADS)
Haglmueller, Jakob; Alguel, Yilmaz; Mayer, Christian; Matyushin, Viacheslav; Bauer, Georg; Pittner, Fritz; Leitner, Alfred; Aussenegg, Franz R.; Schalkhammer, Thomas G.
2004-07-01
Spatially tuned resonant nano-clusters allow high local field enhancement when excited by electromagnetic radiation. A number of phenomena had been described and subsequently applied to novel nano- and bionano-devices. Decisive for these types of devices and sensors is the precise nanometric assembly, coupling the local field surrounding a cluster to allow resonance with other elements interacting with this field. In particular, the distance cluster-mirror or cluster-fluorophore gives rise to a variety of enhancement phenomena. High throughput transducers using metal cluster resonance technology are based on surface-enhancement of metal cluster light absorption (SEA). The optical property for the analytical application of metal cluster films is the so-called anomalous absorption. At a well defined nanometric distance of a cluster to a mirror the reflected electromagnetic field has the same phase at the position of the absorbing cluster as the incident fields. This feedback mechanism strongly enhances the effective cluster absorption coefficient. The system is characterised by a narrow reflection minimum. Based on this SEA-phenomenon (licensed to and further developed and optimized by NovemberAG, Germany Erlangen) a number of commercial products have been constructed. Brandsealing(R) uses the patented SEA cluster technology to produce optical codings. Cluster SEA thin film systems show a characteristic color-flip effect and are extremely mechanically and thermally robust. This is the basis for its application as a unique security feature. The specific spectroscopic properties, e.g. narrow band multi-resonance of the cluster layers, allow the authentication of the optical code, which can be easily achieved with a mobile hand-held reader developed by november AG and Siemens AG. Thus, these features are machine-readable, which makes them superior to comparable technologies.
Cluster labels are available in two formats: as a label for tamper-proof product packaging, and as a direct label, where label and logo are permanently and irremovably applied directly to the product surface. Together with Infineon Technologies and HUECK FOLIEN, the SEA technology is currently being developed as a direct label for e.g. SmartCards.
Electrofacies analysis for coal lithotype profiling based on high-resolution wireline log data
NASA Astrophysics Data System (ADS)
Roslin, A.; Esterle, J. S.
2016-06-01
The traditional approach to coal lithotype analysis is based on a visual characterisation of coal in core, mine or outcrop exposures. As not all wells are fully cored, the petroleum and coal mining industries increasingly use geophysical wireline logs for lithology interpretation. This study demonstrates a method for interpreting coal lithotypes from geophysical wireline logs, and in particular discriminating between bright or banded, and dull coal at similar densities to a decimetre level. The study explores the optimum combination of geophysical log suites for training the coal electrofacies interpretation, using a neural network approach, and then propagating the results to wells with fewer wireline data. This approach is objective and has a recordable reproducibility and rule set. In addition to conventional gamma ray and density logs, laterolog resistivity, microresistivity and PEF data were used in the study. Array resistivity data from a compact micro imager (CMI tool) were processed into a single microresistivity curve and integrated with the conventional resistivity data in the cluster analysis. Microresistivity data were included in the analysis to test the hypothesis that the improved vertical resolution of the microresistivity curve can enhance the accuracy of the clustering analysis. The addition of the PEF log allowed discrimination between low density bright to banded coal electrofacies and low density inertinite-rich dull electrofacies. The results of the clustering analysis were validated statistically, and the electrofacies results were compared to manually derived coal lithotype logs.
ICM: a web server for integrated clustering of multi-dimensional biomedical data.
He, Song; He, Haochen; Xu, Wenjian; Huang, Xin; Jiang, Shuai; Li, Fei; He, Fuchu; Bo, Xiaochen
2016-07-08
Large-scale efforts for parallel acquisition of multi-omics profiling continue to generate extensive amounts of multi-dimensional biomedical data. Thus, integrated clustering of multiple types of omics data is essential for developing individual-based treatments and precision medicine. However, while rapid progress has been made, methods for integrated clustering lack an intuitive web interface that supports biomedical researchers without sufficient programming skills. Here, we present a web tool, named Integrated Clustering of Multi-dimensional biomedical data (ICM), that provides an interface from which to fuse, cluster and visualize multi-dimensional biomedical data and knowledge. With ICM, users can explore the heterogeneity of a disease or a biological process by identifying subgroups of patients. The results obtained can then be interactively modified by using an intuitive user interface. Researchers can also exchange the results from ICM with collaborators via a web link containing a Project ID number that will directly pull up the analysis results being shared. ICM also supports incremental clustering that allows users to add new sample data into the data of a previous study to obtain a clustering result. Currently, the ICM web server is available with no login requirement and at no cost at http://biotech.bmi.ac.cn/icm/. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Paladino, Simona; Lebreton, Stéphanie; Lelek, Mickaël; Riccio, Patrizia; De Nicola, Sergio; Zimmer, Christophe; Zurzolo, Chiara
2017-12-01
Spatio-temporal compartmentalization of membrane proteins is critical for the regulation of diverse vital functions in eukaryotic cells. It was previously shown that, at the apical surface of polarized MDCK cells, glycosylphosphatidylinositol (GPI)-anchored proteins (GPI-APs) are organized in small cholesterol-independent clusters of single GPI-AP species (homoclusters), which are required for the formation of larger cholesterol-dependent clusters formed by multiple GPI-AP species (heteroclusters). This clustered organization is crucial for the biological activities of GPI-APs; hence, understanding the spatio-temporal properties of their membrane organization is of fundamental importance. Here, by using direct stochastic optical reconstruction microscopy coupled to pair correlation analysis (pc-STORM), we were able to visualize and measure the size of these clusters. Specifically, we show that they are non-randomly distributed and have an average size of 67 nm. We also demonstrated that polarized MDCK and non-polarized CHO cells have similar cluster distribution and size, but different sensitivity to cholesterol depletion. Finally, we derived a model that allowed a quantitative characterization of the cluster organization of GPI-APs at the apical surface of polarized MDCK cells for the first time. Experimental FRET (fluorescence resonance energy transfer)/FLIM (fluorescence-lifetime imaging microscopy) data were correlated to the theoretical predictions of the model. © 2017 The Author(s).
Mwangi, Benson; Soares, Jair C; Hasan, Khader M
2014-10-30
Neuroimaging machine learning studies have largely utilized supervised algorithms - meaning they require both neuroimaging scan data and corresponding target variables (e.g. healthy vs. diseased) to be successfully 'trained' for a prediction task. Noticeably, this approach may not be optimal or possible when the global structure of the data is not well known and the researcher does not have an a priori model to fit the data. We set out to investigate the utility of an unsupervised machine learning technique; t-distributed stochastic neighbour embedding (t-SNE) in identifying 'unseen' sample population patterns that may exist in high-dimensional neuroimaging data. Multimodal neuroimaging scans from 92 healthy subjects were pre-processed using atlas-based methods, integrated and input into the t-SNE algorithm. Patterns and clusters discovered by the algorithm were visualized using a 2D scatter plot and further analyzed using the K-means clustering algorithm. t-SNE was evaluated against classical principal component analysis. Remarkably, based on unlabelled multimodal scan data, t-SNE separated study subjects into two very distinct clusters which corresponded to subjects' gender labels (cluster silhouette index value=0.79). The resulting clusters were used to develop an unsupervised minimum distance clustering model which identified 93.5% of subjects' gender. Notably, from a neuropsychiatric perspective this method may allow discovery of data-driven disease phenotypes or sub-types of treatment responders. Copyright © 2014 Elsevier B.V. All rights reserved.
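The two evaluation steps named in the abstract, a cluster silhouette index and a minimum-distance assignment rule, can be sketched as below. This is a toy illustration under stated assumptions, not the study's pipeline: the 2D points stand in for a low-dimensional embedding such as t-SNE output, the labels stand in for K-means assignments, Euclidean distance is assumed, and all names and values are hypothetical. It needs Python 3.8 or later for `math.dist`.

```python
import math

def centroid(points):
    """Component-wise mean of a list of equal-length coordinate tuples."""
    return tuple(sum(axis) / len(points) for axis in zip(*points))

def nearest_centroid(point, centroids):
    """Minimum-distance rule: return the label of the closest centroid."""
    return min(centroids, key=lambda label: math.dist(point, centroids[label]))

def mean_silhouette(points, labels):
    """Mean silhouette index over all points (assumes the points do not all coincide)."""
    scores = []
    for i, p in enumerate(points):
        by_label = {}
        for j, q in enumerate(points):
            if j != i:
                by_label.setdefault(labels[j], []).append(math.dist(p, q))
        own = by_label.pop(labels[i], None)
        if not own or not by_label:
            scores.append(0.0)  # singleton cluster, or only one cluster present
            continue
        a = sum(own) / len(own)                              # mean distance within own cluster
        b = min(sum(d) / len(d) for d in by_label.values())  # mean distance to nearest other cluster
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Hypothetical 2D embedding with two well separated groups.
embedding = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels = [0, 0, 0, 1, 1, 1]
centroids = {
    lab: centroid([p for p, l in zip(embedding, labels) if l == lab])
    for lab in set(labels)
}
score = mean_silhouette(embedding, labels)
predicted = nearest_centroid((9, 9), centroids)
```

A silhouette value near 1 indicates compact, well separated clusters, which is how a reported value such as 0.79 is usually read.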
Community structure from spectral properties in complex networks
NASA Astrophysics Data System (ADS)
Servedio, V. D. P.; Colaiori, F.; Capocci, A.; Caldarelli, G.
2005-06-01
We analyze the spectral properties of complex networks focusing on their relation to the community structure, and develop an algorithm based on correlations among components of different eigenvectors. The algorithm applies to general weighted networks, and, in a suitably modified version, to the case of directed networks. Our method correctly detects communities in sharply partitioned graphs; however, it is also useful for the analysis of more complex networks without a well-defined cluster structure, such as social and information networks. As an example, we test the algorithm on a large scale data-set from a psychological experiment of free word association, where it proves to be successful both in clustering words, and in uncovering mental association patterns.
Whistler Waves Driven by Anisotropic Strahl Velocity Distributions: Cluster Observations
NASA Technical Reports Server (NTRS)
Vinas, A.F.; Gurgiolo, C.; Nieves-Chinchilla, T.; Gary, S. P.; Goldstein, M. L.
2010-01-01
Observed properties of the strahl using high resolution 3D electron velocity distribution data obtained from the Cluster/PEACE experiment are used to investigate its linear stability. An automated method to isolate the strahl is used to allow its moments to be computed independent of the solar wind core+halo. Results show that the strahl can have a high temperature anisotropy (T⊥/T∥ ≳ 2). This anisotropy is shown to be an important free energy source for the excitation of high frequency whistler waves. The analysis suggests that the resultant whistler waves are strong enough to regulate the electron velocity distributions in the solar wind through pitch-angle scattering.
The CNO Bi-cycle in the Open Cluster NGC 752
NASA Astrophysics Data System (ADS)
Hawkins, Keith; Schuler, S.; King, J.; The, L.
2011-01-01
The CNO bi-cycle is the primary energy source for main sequence stars more massive than the sun. To test our understanding of stellar evolution models using the CNO bi-cycle, we have undertaken light-element (CNO) abundance analysis of three main sequence dwarf stars and three red giant stars in the open cluster NGC 752 utilizing high resolution (R ~ 50,000) spectroscopy from the Keck Observatory. Preliminary results indicate, as expected, there is a depletion of carbon in the giants relative to the dwarfs. Additional analysis is needed to determine if the amount of depletion is in line with model predictions, as seen in the Hyades open cluster. Oxygen abundances are derived from the high-excitation O I triplet, and there is a 0.19 dex offset in the [O/H] abundances between the giants and dwarfs which may be explained by non-local thermodynamic equilibrium (NLTE), although further analysis is needed to verify this. The standard procedure for spectroscopically determining stellar parameters used here allows for a measurement of the cluster metallicity, [Fe/H] = 0.04 ± 0.02. In addition to the Fe abundances we have determined Na, Mg, and Al abundances to determine the status of other nucleosynthesis processes. The Na, Mg and Al abundances of the giants are enhanced relative to the dwarfs, which is consistent with similar findings in giants of other open clusters. Support for K. Hawkins was provided by the NOAO/KPNO Research Experiences for Undergraduates (REU) Program which is funded by the National Science Foundation Research Experiences for Undergraduates Program and the Department of Defense ASSURE program through Scientific Program Order No. 13 (AST-0754223) of the Cooperative Agreement No. AST-0132798 between the Association of Universities for Research in Astronomy (AURA) and the NSF.
Mingers, Daniel; Köhler, Denis; Huchzermeier, Christian; Hinrichs, Günter
2017-01-01
Does the Youth Psychopathic Traits Inventory identify one or more high-risk subgroups among young offenders? Which recommendations for possible courses of action can be derived for individual clinical or forensic cases? Method: Model-based cluster analysis (Raftery, 1995) was conducted on a sample of young offenders (N = 445, age 14–22 years, M = 18.5, SD = 1.65). The resulting model was then tested for differences between clusters with relevant context variables of psychopathy. The variables included measures of intelligence, social competence, drug use, and antisocial behavior. Results: Three clusters were found (Low Trait, Impulsive/Irresponsible, Psychopathy) that differ highly significantly concerning YPI scores and the variables mentioned above. The YPI Scores Δ Low = 4.28 (Low Trait – Impulsive/Irresponsible) and Δ High = 6.86 (Impulsive/Irresponsible – Psychopathy) were determined to be thresholds between the clusters. The allocation of a person to be assessed within the calculated clusters allows for an orientation of consequent tests beyond the diagnosis of psychopathy. We conclude that the YPI is a valuable instrument for the assessment of young offenders, as it yields clinically and forensically relevant information concerning the cause and expected development of psychopathological behavior.
Shocks and cold fronts in merging and massive galaxy clusters: new detections with Chandra
NASA Astrophysics Data System (ADS)
Botteon, A.; Gastaldello, F.; Brunetti, G.
2018-06-01
A number of merging galaxy clusters show the presence of shocks and cold fronts, i.e. sharp discontinuities in surface brightness and temperature. The observation of these features requires an X-ray telescope with high spatial resolution like Chandra, and allows the study of important aspects of the physics of the intracluster medium (ICM), such as its thermal conduction and viscosity, as well as providing information on the physical conditions leading to the acceleration of cosmic rays and magnetic field amplification in the cluster environment. In this work we search for new discontinuities in 15 merging and massive clusters observed with Chandra by using different imaging and spectral techniques of X-ray observations. Our analysis led to the discovery of 22 edges: six shocks, eight cold fronts, and eight with uncertain origin. All six shocks detected have M < 2 derived from density and temperature jumps. This work increases the number of discontinuities detected in clusters and shows the potential of combining diverse approaches aimed at identifying edges in the ICM. A radio follow-up of the shocks discovered in this paper will be useful to study the connection between weak shocks and radio relics.
The Nature of Red-Sequence Cluster Spiral Galaxies
NASA Astrophysics Data System (ADS)
Kashur, Lane; Barkhouse, Wayne; Sultanova, Madina; Kalawila Vithanage, Sandanuwa; Archer, Haylee; Foote, Gregory; Mathew, Elijah; Rude, Cody; Lopez-Cruz, Omar
2017-01-01
Preliminary analysis of the red-sequence galaxy population from a sample of 57 low-redshift galaxy clusters observed using the KPNO 0.9m telescope and 74 clusters from the WINGS dataset indicates that a small fraction of red-sequence galaxies have a morphology consistent with spiral systems. For spiral galaxies to acquire the color of elliptical/S0s at a similar luminosity, they must either have been stripped of their star-forming gas at an earlier epoch, or contain a larger than normal fraction of dust. To test these ideas we have compiled a sample of red-sequence spiral galaxies and examined their infrared properties as measured by 2MASS, WISE, Spitzer, and Herschel. These IR data allow us to estimate the amount of dust in each of our red-sequence spiral galaxies. We compare the estimated dust mass in each of these red-sequence late-type galaxies with spiral galaxies located in the same cluster field but having colors inconsistent with the red-sequence. We thus provide a statistical measure to discriminate between purely passive spiral galaxy evolution and dusty spirals to explain the presence of these late-type systems in cluster red-sequences.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Spinella, Corrado; Bongiorno, Corrado; Nicotra, Giuseppe
2005-07-25
We present an analytical methodology, based on electron energy loss spectroscopy (EELS) and energy-filtered transmission electron microscopy, which allows us to quantify the clustered silicon concentration in annealed substoichiometric silicon oxide layers, deposited by plasma-enhanced chemical vapor deposition. The clustered Si volume fraction was deduced from a fit to the experimental EELS spectrum using a theoretical description proposed to calculate the dielectric function of a system of spherical particles of equal radii, located at random in a host material. The methodology allowed us to demonstrate that the clustered Si concentration is only one half of the excess Si concentration dissolved in the layer.
NASA Astrophysics Data System (ADS)
Okolelova, Ella; Shibaeva, Marina; Shalnev, Oleg
2018-03-01
The article analyses risks in high-rise construction in terms of investment value with account of the maximum probable loss in case of a risk event. The authors scrutinized the risks of high-rise construction in regions with various geographic, climatic and socio-economic conditions that may influence the project environment. Risk classification is presented in general terms and includes aggregated characteristics of risks that are common to many regions. Cluster analysis tools that allow considering generalized groups of risk depending on their qualitative and quantitative features were used in order to model the influence of the risk factors on the implementation of an investment project. For convenience of further calculations, each type of risk is assigned a separate code with the number of the cluster and the subtype of risk. This approach and the coding of risk factors make it possible to build a risk matrix, which greatly facilitates the task of determining the degree of impact of risks. The authors clarified and expanded the concept of the price of risk, which is defined as the expected value of the event; this extends the capabilities of the model and allows estimating an interval of the probability of occurrence as well as using other probabilistic methods of calculation.
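The coding scheme and the risk matrix described above can be sketched as follows. This is only an illustration under stated assumptions: the code format, the probabilities, and the loss figures are hypothetical, and the price of a risk is taken to be the probability of the event multiplied by its maximum probable loss, which is one simple reading of "expected value of the event".

```python
from collections import defaultdict

# Hypothetical input: (cluster number, subtype number) ->
# (probability of the risk event, maximum probable loss in arbitrary currency units)
risks = {
    (1, 1): (0.10, 500.0),
    (1, 2): (0.05, 1200.0),
    (2, 1): (0.20, 300.0),
}

def risk_code(cluster, subtype):
    """Hypothetical code format combining cluster number and risk subtype."""
    return f"R{cluster}.{subtype}"

def price_of_risk(probability, max_loss):
    """Expected loss of a single risk event."""
    return probability * max_loss

def risk_matrix(risks):
    """Nested dict: matrix[cluster][subtype] = expected loss."""
    matrix = defaultdict(dict)
    for (cluster, subtype), (probability, loss) in risks.items():
        matrix[cluster][subtype] = price_of_risk(probability, loss)
    return {cluster: dict(row) for cluster, row in matrix.items()}

matrix = risk_matrix(risks)
codes = sorted(risk_code(c, s) for c, s in risks)
```

Summing a row of the matrix gives an aggregate expected loss for one risk cluster, which is one way such a matrix could be used to compare groups of risks.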
GraphTeams: a method for discovering spatial gene clusters in Hi-C sequencing data.
Schulz, Tizian; Stoye, Jens; Doerr, Daniel
2018-05-08
Hi-C sequencing offers novel, cost-effective means to study the spatial conformation of chromosomes. We use data obtained from Hi-C experiments to provide new evidence for the existence of spatial gene clusters. These are sets of genes with associated functionality that exhibit close proximity to each other in the spatial conformation of chromosomes across several related species. We present the first gene cluster model capable of handling spatial data. Our model generalizes a popular computational model for gene cluster prediction, called δ-teams, from sequences to graphs. Following previous lines of research, we subsequently extend our model to allow for several vertices being associated with the same label. The model, called δ-teams with families, is particularly suitable for our application as it enables handling of gene duplicates. We develop algorithmic solutions for both models. We implemented the algorithm for discovering δ-teams with families and integrated it into a fully automated workflow for discovering gene clusters in Hi-C data, called GraphTeams. We applied it to human and mouse data to find intra- and interchromosomal gene cluster candidates. The results include intrachromosomal clusters that seem to exhibit a closer proximity in space than on their chromosomal DNA sequence. We further discovered interchromosomal gene clusters that contain genes from different chromosomes within the human genome, but are located on a single chromosome in mouse. By identifying δ-teams with families, we provide a flexible model to discover gene cluster candidates in Hi-C data. Our analysis of Hi-C data from human and mouse reveals several known gene clusters (thus validating our approach), but also a few sparsely studied or possibly unknown gene cluster candidates that could be the source of further experimental investigations.
NASA Astrophysics Data System (ADS)
Yoon, Mijin; Jee, Myungkook James; Tyson, Tony
2018-01-01
The Deep Lens Survey (DLS), a precursor to the Large Synoptic Survey Telescope (LSST), is a 20 sq. deg survey carried out with NOAO’s Blanco and Mayall telescopes. The strength of the survey lies in its depth reaching down to ~27th mag in BVRz bands. This enables a broad redshift baseline study and allows us to investigate cosmological evolution of the large-scale structure. In this poster, we present the first cosmological analysis from the DLS using galaxy-shear correlations and galaxy clustering signals. Our DLS shear calibration accuracy has been validated through the most recent public weak-lensing data challenge. Photometric redshift systematic errors are tested by performing lens-source flip tests. Instead of real-space correlations, we reconstruct band-limited power spectra for cosmological parameter constraints. Our analysis puts a tight constraint on the matter density and the power spectrum normalization parameters. Our results are highly consistent with our previous cosmic shear analysis and also with the Planck CMB results.
[Comparative analysis of variable regions in the genomes of variola virus].
Babkin, I V; Nepomniashchikh, T S; Maksiutov, R A; Gutorov, V V; Babkina, I N; Shchelkunov, S N
2008-01-01
Nucleotide sequences of two extended segments of the terminal variable regions in the variola virus genome were determined. The size of the left segment was 13.5 kbp and of the right, 10.5 kbp. In total, over 540 kbp were sequenced for 22 variola virus strains. The phylogenetic analysis conducted, together with previously published data, allowed us to find the interrelations between 70 variola virus isolates, the character of their clustering, and the degree of intergroup and intragroup variations of the clusters of variola virus strains. The most polymorphic loci of the genome segments studied were determined. It was demonstrated that these loci are localized either to noncoding genome regions or to the regions of destroyed open reading frames, characteristic of the ancestor virus. These loci are promising for development of a strategy for genotyping variola virus strains. Analysis of recombination using various methods demonstrated that, with a single exception, no statistically significant recombinational events in the genomes of the variola virus strains studied were detectable.
Chen, Yi; Luo, Yan; Curry, Phillip; Timme, Ruth; Melka, David; Doyle, Matthew; Parish, Mickey; Hammack, Thomas S.; Allard, Marc W.; Brown, Eric W.; Strain, Errol A.
2017-01-01
A listeriosis outbreak in the United States implicated contaminated ice cream produced by one company, which operated 3 facilities. We performed single nucleotide polymorphism (SNP)-based whole genome sequencing (WGS) analysis on Listeria monocytogenes from food, environmental and clinical sources, identifying two clusters and a single branch, belonging to PCR serogroup IIb and genetic lineage I. WGS Cluster I, representing one outbreak strain, contained 82 food and environmental isolates from Facility I and 4 clinical isolates. These isolates differed by up to 29 SNPs, exhibited 9 pulsed-field gel electrophoresis (PFGE) profiles and multilocus sequence typing (MLST) sequence type (ST) 5 of clonal complex 5 (CC5). WGS Cluster II contained 51 food and environmental isolates from Facility II, 4 food isolates from Facility I and 5 clinical isolates. Among them the isolates from Facility II and clinical isolates formed a clade and represented another outbreak strain. Isolates in this clade differed by up to 29 SNPs, exhibited 3 PFGE profiles and ST5. The only isolate collected from Facility III belonged to singleton ST489, which was in a single branch separate from Clusters I and II, and was not associated with the outbreak. WGS analyses clustered together outbreak-associated isolates exhibiting multiple PFGE profiles, while differentiating them from epidemiologically unrelated isolates that exhibited outbreak PFGE profiles. The complete genome of a Cluster I isolate allowed the identification and analyses of putative prophages, revealing that Cluster I isolates differed by the gain or loss of three putative prophages, causing the banding pattern differences among all 3 AscI-PFGE profiles observed in Cluster I isolates. WGS data suggested that certain ice cream varieties and/or production lines might have contamination sources unique to them. 
The SNP-based analysis was able to distinguish CC5 as a group from non-CC5 isolates and differentiate among CC5 isolates from different outbreaks/incidents. PMID:28166293
Sloan, Chantel D.; Nordsborg, Rikke B.; Jacquez, Geoffrey M.; Raaschou-Nielsen, Ole; Meliker, Jaymie R.
2015-01-01
Though the etiology is largely unknown, testicular cancer incidence has seen recent significant increases in northern Europe and throughout many Western regions. Testicular cancer is the most common cancer in males under age 40, and age-period-cohort models have posited exposures in the in utero environment or in early childhood as possible causes of increased risk. Some of these factors may be tied to geography through being associated with behavioral, cultural, sociodemographic or built environment characteristics. If so, this could result in detectable geographic clusters of cases that could lead to hypotheses regarding environmental targets for intervention. Given a latency period between exposure to an environmental carcinogen and testicular cancer diagnosis, mobility histories are beneficial for spatial cluster analyses. Nearest-neighbor based Q-statistics allow for the incorporation of changes in residency in spatial disease cluster detection. Using these methods, a space-time cluster analysis was conducted on a population-wide case-control population selected from the Danish Cancer Registry with mobility histories since 1971 extracted from the Danish Civil Registration System. Cases (N=3297) were diagnosed between 1991 and 2003, and two sets of controls (N=3297 for each set) matched on sex and date of birth were included in the study. We also examined spatial patterns in maternal residential history for those cases and controls born in 1971 or later (N=589 case-control pairs). Several small clusters were detected when aligning individuals by year prior to diagnosis, age at diagnosis and calendar year of diagnosis. However, the largest of these clusters contained only 2 statistically significant individuals at its center, and the clusters were not replicated in SaTScan spatial-only analyses, which are less susceptible to multiple testing bias. We found little evidence of local clusters in residential histories of testicular cancer cases in this Danish population. PMID:25756204
Study of parameters of the nearest neighbour shared algorithm on clustering documents
NASA Astrophysics Data System (ADS)
Mustika Rukmi, Alvida; Budi Utomo, Daryono; Imro’atus Sholikhah, Neni
2018-03-01
Document clustering is one way of automatically managing documents, extracting document topics and quickly filtering information. The preprocessing of documents for clustering is done by text mining and consists of keyword extraction using Rapid Automatic Keyphrase Extraction (RAKE) and representation of each document as a concept vector using Latent Semantic Analysis (LSA). Clustering is then performed on the preprocessed documents so that documents with similar topics fall in the same cluster. The Shared Nearest Neighbour (SNN) algorithm is a clustering method based on the number of nearest neighbours shared. Its parameters are k, the number of nearest-neighbour documents; ε, the number of shared nearest-neighbour documents; and MinT, the minimum number of similar documents that can form a cluster. The SNN algorithm is characterized by its reliance on shared-neighbour properties. Each cluster is formed by keywords that are shared by the documents. The SNN algorithm allows a cluster to be built on more than one keyword if the frequency of those keywords in the documents is also high. The choice of parameter values affects the document clustering results. A higher value of k increases the number of neighbour documents of each document, so the similarity of neighbouring documents is lower and the accuracy of each cluster is also low. A higher value of ε means that each document captures only neighbour documents with high similarity to build a cluster, which also leaves more unclassified documents (noise). A higher value of MinT decreases the number of clusters, since groups of similar documents smaller than MinT cannot form clusters. The parameters of the SNN algorithm thus determine the performance of the clustering result and the amount of noise (unclustered documents). The Silhouette coefficient is almost the same across many experiments, above 0.9, which means that the SNN algorithm works well with different parameter values.
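The roles of k, ε and MinT described in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: it uses a simplified Jarvis-Patrick-style variant of shared-nearest-neighbour clustering, and the function name `snn_clusters`, the use of cosine similarity, the mutual-neighbour requirement, the reading of MinT as a minimum cluster size, and the toy vectors are all assumptions made for illustration.

```python
from itertools import combinations
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def snn_clusters(vectors, k, eps, min_t):
    """Label each vector with a cluster id, or -1 for noise (hypothetical helper)."""
    n = len(vectors)

    # k nearest neighbours of every document, excluding the document itself.
    knn = []
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: cosine(vectors[i], vectors[j]),
                        reverse=True)
        knn.append(set(others[:k]))

    # Union-find over documents; linked documents end up in one component.
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Link two documents if they are in each other's neighbour lists
    # and share at least eps nearest neighbours.
    for i, j in combinations(range(n), 2):
        mutual = i in knn[j] and j in knn[i]
        if mutual and len(knn[i] & knn[j]) >= eps:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)

    # Components smaller than min_t are treated as noise (label -1).
    labels = [-1] * n
    next_label = 0
    for members in groups.values():
        if len(members) >= min_t:
            for i in members:
                labels[i] = next_label
            next_label += 1
    return labels


# Toy "concept vectors": two tight topics plus one document in between.
docs = [(1.0, 0.0), (0.9, 0.1), (0.95, 0.05),
        (0.0, 1.0), (0.1, 0.9), (0.05, 0.95),
        (1.0, 1.0)]

print(snn_clusters(docs, k=2, eps=1, min_t=2))  # -> [0, 0, 0, 1, 1, 1, -1]
print(snn_clusters(docs, k=2, eps=2, min_t=2))  # -> [-1, -1, -1, -1, -1, -1, -1]
```

On this toy input, raising ε from 1 to 2 turns every document into noise, which matches the abstract's observation that a higher ε leaves more documents unclustered.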
Stratification of co-evolving genomic groups using ranked phylogenetic profiles
Freilich, Shiri; Goldovsky, Leon; Gottlieb, Assaf; Blanc, Eric; Tsoka, Sophia; Ouzounis, Christos A
2009-01-01
Background: Previous methods of detecting the taxonomic origins of arbitrary sequence collections, with a significant impact on genome analysis and in particular metagenomics, have primarily focused on compositional features of genomes. The evolutionary patterns of phylogenetic distribution of genes or proteins, represented by phylogenetic profiles, provide an alternative approach for the detection of taxonomic origins, but typically suffer from low accuracy. Herein, we present rank-BLAST, a novel approach for the assignment of protein sequences into genomic groups of the same taxonomic origin, based on the ranking order of phylogenetic profiles of target genes or proteins across the reference database. Results: The rank-BLAST approach is validated by computing the phylogenetic profiles of all sequences for five distinct microbial species of varying degrees of phylogenetic proximity, against a reference database of 243 fully sequenced genomes. The approach, a combination of sequence searches, statistical estimation and clustering, analyses the degree of sequence divergence between sets of protein sequences and allows the classification of protein sequences according to the species of origin with high accuracy, allowing taxonomic classification of 64% of the proteins studied. In most cases, a main cluster is detected, representing the corresponding species. Secondary, functionally distinct and species-specific clusters exhibit different patterns of phylogenetic distribution, thus flagging gene groups of interest. Detailed analyses of such cases are provided as examples. Conclusion: Our results indicate that the rank-BLAST approach can capture the taxonomic origins of sequence collections in an accurate and efficient manner. The approach can be useful both for the analysis of genome evolution and the detection of species groups in metagenomics samples. PMID:19860884
Diffuse light and building history of the galaxy cluster Abell 2667
NASA Astrophysics Data System (ADS)
Covone, G.; Adami, C.; Durret, F.; Kneib, J.-P.; Lima Neto, G. B.; Slezak, E.
2006-12-01
Aims: We searched for diffuse intracluster light in the galaxy cluster Abell 2667 (z=0.233) from HST images in three broad-band filters. Methods: We applied an iterative multi-scale wavelet analysis and reconstruction technique to these images, which allows stars and galaxies to be subtracted from the original images. Results: We detect a zone of diffuse emission southwest of the cluster center (DS1) and a second faint object (ComDif) within DS1. Another diffuse source (DS2) may be detected at lower confidence level northeast of the center. These sources of diffuse light contribute 10-15% of the total visible light in the cluster. Whether they are independent entities or part of the very elliptical external envelope of the central galaxy remains unclear. Deep VLT VIMOS integral field spectroscopy reveals a faint continuum at the positions of DS1 and ComDif but does not allow a redshift to be computed, so we cannot conclude whether these sources are part of the central galaxy or not. A hierarchical substructure detection method reveals the presence of several galaxy pairs and groups defining a similar direction to the one drawn by the DS1 - central galaxy - DS2 axis. The analysis of archive XMM-Newton and Chandra observations shows X-ray emission elongated in the same direction. The X-ray temperature map shows the presence of a cool core, a broad cool zone stretching from north to south, and hotter regions towards the northeast, southwest, and northwest. This might suggest shock fronts along these directions produced by infalling material, even if uncertainties remain quite large on the temperature determination far from the center. Conclusions: These various data are consistent with a picture in which diffuse sources are concentrations of tidal debris and harassed matter expelled from infalling galaxies by tidal stripping and undergoing an accretion process onto the central cluster galaxy; as such, they are expected to be found along the main infall directions. Note, however, that the limited signal to noise of the various data and the apparent lack of large numbers of well-defined independent tidal tails, besides the one named ComDif, preclude definitive conclusions on this scenario.
Technical structure of the global nanoscience and nanotechnology literature
NASA Astrophysics Data System (ADS)
Kostoff, Ronald N.; Koytcheff, Raymond G.; Lau, Clifford G. Y.
2007-10-01
Text mining was used to extract technical intelligence from the open source global nanotechnology and nanoscience research literature. An extensive nanotechnology/nanoscience-focused query was applied to the Science Citation Index/Social Science Citation Index (SCI/SSCI) databases. The nanotechnology/nanoscience research literature technical structure (taxonomy) was obtained using computational linguistics/document clustering and factor analysis. The infrastructure (prolific authors, key journals/institutions/countries, most cited authors/journals/documents) for each of the clusters generated by the document clustering algorithm was obtained using bibliometrics. Another novel addition was the use of phrase auto-correlation maps to show technical thrust areas based on phrase co-occurrence in Abstracts, and the use of phrase-phrase cross-correlation maps to show technical thrust areas based on phrase relations due to the sharing of common co-occurring phrases. The ~400 most cited nanotechnology papers since 1991 were grouped, and their characteristics generated. Whereas the main analysis provided technical thrusts of all nanotechnology papers retrieved, analysis of the most cited papers allowed their characteristics to be displayed. Finally, most cited papers from selected time periods were extracted, along with all publications from those time periods, and the institutions and countries were compared based on their representation in the most cited documents list relative to their representation in the most publications list.
Embedded Star Formation in the Eagle Nebula with Spitzer GLIMPSE
NASA Astrophysics Data System (ADS)
Indebetouw, R.; Robitaille, T. P.; Whitney, B. A.; Churchwell, E.; Babler, B.; Meade, M.; Watson, C.; Wolfire, M.
2007-09-01
We present new Spitzer photometry of the Eagle Nebula (M16, containing the optical cluster NGC 6611) combined with near-infrared photometry from 2MASS. We use dust radiative transfer models, mid-infrared and near-infrared color-color analysis, and mid-infrared spectral indices to analyze point-source spectral energy distributions, select candidate YSOs, and constrain their mass and evolutionary state. Comparison of the different protostellar selection methods shows that mid-infrared methods are consistent, but as has been known for some time, near-infrared-only analysis misses some young objects. We reveal more than 400 protostellar candidates, including one massive YSO that has not been previously highlighted. The YSO distribution supports a picture of distributed low-level star formation, with no strong evidence of triggered star formation in the "pillars." We confirm the youth of NGC 6611 by a large fraction of infrared excess sources and reveal a younger cluster of YSOs in the nearby molecular cloud. Analysis of the YSO clustering properties shows a possible imprint of the molecular cloud's Jeans length. Multiwavelength mid-IR imaging thus allows us to analyze the protostellar population, to measure the dust temperature and column density, and to relate these in a consistent picture of star formation in M16.
Low-level processing for real-time image analysis
NASA Technical Reports Server (NTRS)
Eskenazi, R.; Wilf, J. M.
1979-01-01
A system that detects object outlines in television images in real time is described. A high-speed pipeline processor transforms the raw image into an edge map and a microprocessor, which is integrated into the system, clusters the edges, and represents them as chain codes. Image statistics, useful for higher level tasks such as pattern recognition, are computed by the microprocessor. Peak intensity and peak gradient values are extracted within a programmable window and are used for iris and focus control. The algorithms implemented in hardware and the pipeline processor architecture are described. The strategy for partitioning functions in the pipeline was chosen to make the implementation modular. The microprocessor interface allows flexible and adaptive control of the feature extraction process. The software algorithms for clustering edge segments, creating chain codes, and computing image statistics are also discussed. A strategy for real time image analysis that uses this system is given.
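The chain-code representation mentioned in the abstract above can be sketched as follows. The abstract does not say which coding convention the system used, so this example assumes the common 8-direction Freeman convention (0 = east, counting counter-clockwise, with image rows growing downward); the function name `chain_code` and the sample outline are hypothetical.

```python
# Assumed 8-direction Freeman convention, keyed by (row step, column step).
# Rows grow downward as in image coordinates, so "north" is row - 1.
DIRECTIONS = {
    (0, 1): 0,    # east
    (-1, 1): 1,   # north-east
    (-1, 0): 2,   # north
    (-1, -1): 3,  # north-west
    (0, -1): 4,   # west
    (1, -1): 5,   # south-west
    (1, 0): 6,    # south
    (1, 1): 7,    # south-east
}


def chain_code(points):
    """Encode an ordered list of 8-connected (row, col) edge pixels as direction codes."""
    codes = []
    for (r0, c0), (r1, c1) in zip(points, points[1:]):
        step = (r1 - r0, c1 - c0)
        if step not in DIRECTIONS:
            raise ValueError(f"pixels {(r0, c0)} and {(r1, c1)} are not 8-connected neighbours")
        codes.append(DIRECTIONS[step])
    return codes


# Outline of a 3x3 square traced clockwise from its top-left corner.
square = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0), (0, 0)]
print(chain_code(square))  # -> [0, 0, 6, 6, 4, 4, 2, 2]
```

Storing one small code per step instead of a coordinate pair is what makes the representation compact; the start pixel has to be kept separately to reconstruct the outline.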
Ultrafast dynamics in atomic clusters: Analysis and control
Bonačić-Koutecký, Vlasta; Mitrić, Roland; Werner, Ute; Wöste, Ludger; Berry, R. Stephen
2006-01-01
We present a study of dynamics and ultrafast observables in the frame of pump–probe negative-to-neutral-to-positive ion (NeNePo) spectroscopy illustrated by the examples of bimetallic trimers Ag2Au−/Ag2Au/Ag2Au+ and silver oxides Ag3O2−/Ag3O2/Ag3O2+ in the context of cluster reactivity. First principle multistate adiabatic dynamics allows us to determine time scales of different ultrafast processes and conditions under which these processes can be experimentally observed. Furthermore, we present a strategy for optimal pump–dump control in complex systems based on the ab initio Wigner distribution approach and apply it to tailor laser fields for selective control of the isomerization process in Na3F2. The shapes of pulses can be assigned to underlying processes, and therefore control can be used as a tool for analysis. PMID:16740664
Clark, David J; Fondrie, William E; Liao, Zhongping; Hanson, Phyllis I; Fulton, Amy; Mao, Li; Yang, Austin J
2015-10-20
Exosomes are microvesicles of endocytic origin constitutively released by multiple cell types into the extracellular environment. With evidence that exosomes can be detected in the blood of patients with various malignancies, the development of a platform that uses exosomes as a diagnostic tool has been proposed. However, it has been difficult to truly define the exosome proteome due to the challenge of discerning contaminant proteins that may be identified via mass spectrometry using various exosome enrichment strategies. To better define the exosome proteome in breast cancer, we combined a Tandem-Mass-Tag (TMT) quantitative proteomics approach with Support Vector Machine (SVM) cluster analysis of three conditioned media derived fractions corresponding to a 10 000g cellular debris pellet, a 100 000g crude exosome pellet, and an Optiprep enriched exosome pellet. The quantitative analysis identified 2 179 proteins in all three fractions, with known exosomal cargo proteins displaying at least a 2-fold enrichment in the exosome fraction based on the TMT protein ratios. Employing SVM cluster analysis allowed for the classification of 251 proteins as "true" exosomal cargo proteins. This study provides a robust and vigorous framework for the future development of using exosomes as a potential multiprotein marker phenotyping tool that could be useful in breast cancer diagnosis and monitoring disease progression.
NASA Astrophysics Data System (ADS)
Carraro, G.; Villanova, S.; Demarque, P.; Moni Bidin, C.; McSwain, M. V.
2008-05-01
We report on a new, wide-field (20 × 20 arcmin^2), multicolour (UBVI), photometric campaign in the area of the nearby old open cluster NGC 2112. At the same time, we provide medium-resolution spectroscopy of 35 (and high-resolution of additional 5) red giant and turn-off stars. This material is analysed with the aim to update the fundamental parameters of this traditionally difficult cluster, which is very sparse and suffers from heavy field star contamination. Among the 40 stars with spectra, we identified 21 bona fide radial velocity members which allow us to put more solid constraints on the cluster's metal abundance, long suggested to be as low as the metallicity of globulars. As indicated earlier by us on a purely photometric basis, the cluster [Fe/H] abundance is slightly supersolar ([Fe/H] = 0.16 ± 0.03) and close to the Hyades value, as inferred from a detailed abundance analysis of three of the five stars with higher resolution spectra. Abundance ratios are also marginally supersolar. Based on this result, we revise the properties of NGC 2112 using stellar models from the Padova and Yale-Yonsei groups. For this metal abundance, we find that the cluster's age, reddening and distance values are 1.8 Gyr, 0.60 mag and 940 pc, respectively. Both the Yale-Yonsei and Padova models predict the same values for the fundamental parameters within the errors. Overall, NGC 2112 is a typical solar neighbourhood, thin-disc star cluster, sharing the same chemical properties of F-G stars and open clusters close to the Sun. This investigation outlines the importance of a detailed membership analysis in the study of disc star clusters. This paper includes data gathered with the 6.5 m Magellan Telescopes, located at Las Campanas Observatory, Chile. The data discussed in this paper will be made available at the WEBDA open cluster data base http://www.univie.ac.at/webda, which is maintained by E. Paunzen and J.-C. Mermilliod.
Taranto, F; D'Agostino, N; Greco, B; Cardi, T; Tripodi, P
2016-11-21
Knowledge of population structure and genetic diversity in vegetable crops is essential for association mapping studies and genomic selection. Genotyping by sequencing (GBS) represents an innovative method for large scale SNP detection and genotyping of genetic resources. Herein we used the GBS approach for the genome-wide identification of SNPs in a collection of Capsicum spp. accessions and for the assessment of the level of genetic diversity in a subset of 222 cultivated pepper (Capsicum annuum) genotypes. GBS analysis generated a total of 7,568,894 master tags, of which 43.4% uniquely aligned to the reference genome CM334. A total of 108,591 SNP markers were identified, of which 105,184 were in C. annuum accessions. In order to explore the genetic diversity of C. annuum and to select a minimal core set representing most of the total genetic variation with minimum redundancy, a subset of 222 C. annuum accessions was analysed using 32,950 high quality SNPs. Based on Bayesian and Hierarchical clustering it was possible to divide the collection into three clusters. Cluster I had the majority of varieties and landraces mainly from Southern and Northern Italy, and from Eastern Europe, whereas clusters II and III comprised accessions of different geographical origins. Considering the genome-wide genetic variation among the accessions included in cluster I, a second round of Bayesian (K = 3) and Hierarchical (K = 2) clustering was performed. These analyses showed that genotypes were grouped not only based on geographical origin, but also on fruit-related features. GBS data have proven useful to assess the genetic diversity in a collection of C. annuum accessions. The high number of SNP markers, uniformly distributed on the 12 chromosomes, allowed the accessions to be distinguished according to geographical origin and fruit-related features.
SNP markers and information on population structure developed in this study will undoubtedly support genome-wide association mapping studies and marker-assisted selection programs.
Resolving galaxy cluster gas properties at z ~ 1 with XMM-Newton and Chandra
NASA Astrophysics Data System (ADS)
Bartalucci, I.; Arnaud, M.; Pratt, G. W.; Démoclès, J.; van der Burg, R. F. J.; Mazzotta, P.
2017-02-01
Massive, high-redshift, galaxy clusters are useful laboratories to test cosmological models and to probe structure formation and evolution, but observations are challenging due to cosmological dimming and angular distance effects. Here we present a pilot X-ray study of the five most massive (M500 > 5 × 10^14 M⊙), distant (z ~ 1), clusters detected via the Sunyaev-Zel'dovich effect. We optimally combine XMM-Newton and Chandra X-ray observations by leveraging the throughput of XMM-Newton to obtain spatially-resolved spectroscopy, and the spatial resolution of Chandra to probe the bright inner parts and to detect embedded point sources. Capitalising on the excellent agreement in flux-related measurements, we present a new method to derive the density profiles, which are constrained in the centre by Chandra and in the outskirts by XMM-Newton. We show that the Chandra-XMM-Newton combination is fundamental for morphological analysis at these redshifts, the Chandra resolution being required to remove point source contamination, and the XMM-Newton sensitivity allowing higher significance detection of faint substructures. Measuring the morphology using images from both instruments, we found that the sample is dominated by dynamically disturbed objects. We use the combined Chandra-XMM-Newton density profiles and spatially-resolved temperature profiles to investigate thermodynamic quantities including entropy and pressure. From comparison of the scaled profiles with the local REXCESS sample, we find no significant departure from standard self-similar evolution, within the dispersion, at any radius, except for the entropy beyond 0.7 R500. The baryon mass fraction tends towards the cosmic value, with a weaker dependence on mass than that observed in the local Universe. We make a comparison with the predictions from numerical simulations.
The present pilot study demonstrates the utility and feasibility of spatially-resolved analysis of individual objects at high-redshift through the combination of XMM-Newton and Chandra observations. Observations of a larger sample will allow a fuller statistical analysis to be undertaken, in particular of the intrinsic scatter in the structural and scaling properties of the cluster population.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Balestra, I.; Sartoris, B.; Girardi, M.
2016-06-01
We present VIMOS-Very Large Telescope (VLT) spectroscopy of the Frontier Fields cluster MACS J0416.1-2403 (z = 0.397). Taken as part of the CLASH-VLT survey, the large spectroscopic campaign provided more than 4000 reliable redshifts over ∼600 arcmin^2, including ∼800 cluster member galaxies. The unprecedented sample of cluster members at this redshift allows us to perform a highly detailed dynamical and structural analysis of the cluster out to ∼2.2 r_200 (∼4 Mpc). Our analysis of substructures reveals a complex system composed of a main massive cluster (M_200 ∼ 0.9 × 10^15 M_⊙ and σ_V,r200 ∼ 1000 km s^−1) presenting two major features: (i) a bimodal velocity distribution, showing two central peaks separated by ΔV_rf ∼ 1100 km s^−1 with comparable galaxy content and velocity dispersion, and (ii) a projected elongation of the main substructures along the NE–SW direction, with a prominent sub-clump ∼600 kpc SW of the center and an isolated BCG approximately halfway between the center and the SW clump. We also detect a low-mass structure at z ∼ 0.390, ∼10′ south of the cluster center, projected at ∼3 Mpc, with a relative line-of-sight velocity of ΔV_rf ∼ −1700 km s^−1. The cluster mass profile that we obtain through our dynamical analysis deviates significantly from the “universal” NFW, being best fit by a Softened Isothermal Sphere model instead. The mass profile measured from the galaxy dynamics is found to be in relatively good agreement with those obtained from strong and weak lensing, as well as with that from the X-rays, despite the clearly unrelaxed nature of the cluster. Our results reveal an overall complex dynamical state of this massive cluster and support the hypothesis that the two main subclusters are being observed in a pre-collisional phase, in agreement with recent findings from radio and deep X-ray data.
In this article, we also release the entire redshift catalog of 4386 sources in the field of this cluster, which includes 60 identified Chandra X-ray sources and 105 JVLA radio sources.
The cosmological analysis of X-ray cluster surveys. III. 4D X-ray observable diagrams
NASA Astrophysics Data System (ADS)
Pierre, M.; Valotti, A.; Faccioli, L.; Clerc, N.; Gastaud, R.; Koulouridis, E.; Pacaud, F.
2017-11-01
Context. Despite compelling theoretical arguments, the use of clusters as cosmological probes is, in practice, frequently questioned because of the many uncertainties surrounding cluster-mass estimates. Aims: Our aim is to develop a fully self-consistent cosmological approach of X-ray cluster surveys, exclusively based on observable quantities rather than masses. This procedure is justified given the possibility to directly derive the cluster properties via ab initio modelling, either analytically or by using hydrodynamical simulations. In this third paper, we evaluate the method on cluster toy-catalogues. Methods: We model the population of detected clusters in the count-rate - hardness-ratio - angular size - redshift space and compare the corresponding four-dimensional diagram with theoretical predictions. The best cosmology+physics parameter configuration is determined using a simple minimisation procedure; errors on the parameters are estimated by averaging the results from ten independent survey realisations. The method allows a simultaneous fit of the cosmological parameters of the cluster evolutionary physics and of the selection effects. Results: When using information from the X-ray survey alone plus redshifts, this approach is shown to be as accurate as the modelling of the mass function for the cosmological parameters and to perform better for the cluster physics, for a similar level of assumptions on the scaling relations. It enables the identification of degenerate combinations of parameter values. Conclusions: Given the considerably shorter computer times involved for running the minimisation procedure in the observed parameter space, this method appears to clearly outperform traditional mass-based approaches when X-ray survey data alone are available.
Cool Core Clusters from Cosmological Simulations
NASA Astrophysics Data System (ADS)
Rasia, E.; Borgani, S.; Murante, G.; Planelles, S.; Beck, A. M.; Biffi, V.; Ragone-Figueroa, C.; Granato, G. L.; Steinborn, L. K.; Dolag, K.
2015-11-01
We present results obtained from a set of cosmological hydrodynamic simulations of galaxy clusters, aimed at comparing predictions with observational data on the diversity between cool-core (CC) and non-cool-core (NCC) clusters. Our simulations include the effects of stellar and active galactic nucleus (AGN) feedback and are based on an improved version of the smoothed particle hydrodynamics code GADGET-3, which ameliorates gas mixing and better captures gas-dynamical instabilities by including a suitable artificial thermal diffusion. In this Letter, we focus our analysis on the entropy profiles, the primary diagnostic we used to classify the degree of cool-coreness of clusters, and the iron profiles. In keeping with observations, our simulated clusters display a variety of behaviors in entropy profiles: they range from steadily decreasing profiles at small radii, characteristic of CC systems, to nearly flat core isentropic profiles, characteristic of NCC systems. Using observational criteria to distinguish between the two classes of objects, we find that they occur in similar proportions in both simulations and observations. Furthermore, we also find that simulated CC clusters have profiles of iron abundance that are steeper than those of NCC clusters, which is also in agreement with observational results. We show that the capability of our simulations to generate a realistic CC structure in the cluster population is due to AGN feedback and artificial thermal diffusion: their combined action allows us to naturally distribute the energy extracted from super-massive black holes and to compensate for the radiative losses of low-entropy gas with short cooling time residing in the cluster core.
Order statistics applied to the most massive and most distant galaxy clusters
NASA Astrophysics Data System (ADS)
Waizmann, J.-C.; Ettori, S.; Bartelmann, M.
2013-06-01
In this work, we present an analytic framework for calculating the individual and joint distributions of the nth most massive or nth highest redshift galaxy cluster for a given survey characteristic, allowing us to formulate Λ cold dark matter (ΛCDM) exclusion criteria. We show that the cumulative distribution functions steepen with increasing order, giving them a higher constraining power with respect to the extreme value statistics. Additionally, we find that the order statistics in mass (being dominated by clusters at lower redshifts) is sensitive to the matter density and the normalization of the matter fluctuations, whereas the order statistics in redshift is particularly sensitive to the geometric evolution of the Universe. For a fixed cosmology, both order statistics are efficient probes of the functional shape of the mass function at the high-mass end. To allow a quick assessment of both order statistics, we provide fits as a function of the survey area that allow percentile estimation with an accuracy better than 2 per cent. Furthermore, we discuss the joint distributions in the two-dimensional case and find that, for the combination of the largest and the second largest observation, the two are most likely to be realized with similar values, with a broadly peaked distribution. When combining the largest observation with higher orders, it is more likely to find a larger gap between the observations, and when combining higher orders in general, the joint probability density function peaks more strongly. Having introduced the theory, we apply the order statistical analysis to the South Pole Telescope (SPT) massive cluster sample and to the Meta-Catalogue of X-ray detected Clusters of galaxies (MCXC), and find that the 10 most massive clusters in the sample are consistent with ΛCDM and the Tinker mass function.
For the order statistics in redshift, we find a discrepancy between the data and the theoretical distributions, which could in principle indicate a deviation from the standard cosmology. However, we attribute this deviation to the uncertainty in the modelling of the SPT survey selection function. In turn, by assuming the ΛCDM reference cosmology, order statistics can also be utilized for consistency checks of the completeness of the observed sample and of the modelling of the survey selection function.
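The idea of an order statistic can be illustrated with a minimal Python sketch. It assumes a fixed number `n_draws` of independent clusters sharing one cumulative distribution value `cdf_value` at the threshold of interest; this is a textbook simplification and not the survey-dependent framework of the abstract, and the function name is hypothetical.

```python
from math import comb


def kth_largest_cdf(cdf_value: float, n_draws: int, k: int) -> float:
    """P(k-th largest of n_draws i.i.d. values <= x), given F(x) = cdf_value.

    The k-th largest value is <= x exactly when at most k - 1 draws exceed x,
    so the result is a binomial tail sum over the number of exceedances j.
    """
    if not 0.0 <= cdf_value <= 1.0:
        raise ValueError("cdf_value must lie in [0, 1]")
    if not 1 <= k <= n_draws:
        raise ValueError("k must satisfy 1 <= k <= n_draws")
    exceed = 1.0 - cdf_value
    return sum(
        comb(n_draws, j) * exceed**j * cdf_value ** (n_draws - j)
        for j in range(k)
    )


# Probability that the largest (k=1) and second largest (k=2) of 10 draws
# fall below the 90th percentile of the parent distribution.
p_largest = kth_largest_cdf(0.9, 10, 1)
p_second = kth_largest_cdf(0.9, 10, 2)
```

With `k = 1` this reduces to the familiar extreme-value result `F(x) ** n_draws`.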
Goovaerts, Pierre; Jacquez, Geoffrey M
2004-01-01
Background Complete Spatial Randomness (CSR) is the null hypothesis employed by many statistical tests for spatial pattern, such as local cluster or boundary analysis. CSR is however not a relevant null hypothesis for highly complex and organized systems such as those encountered in the environmental and health sciences in which underlying spatial pattern is present. This paper presents a geostatistical approach to filter the noise caused by spatially varying population size and to generate spatially correlated neutral models that account for regional background obtained by geostatistical smoothing of observed mortality rates. These neutral models were used in conjunction with the local Moran statistics to identify spatial clusters and outliers in the geographical distribution of male and female lung cancer in Nassau, Queens, and Suffolk counties, New York, USA. Results We developed a typology of neutral models that progressively relaxes the assumptions of null hypotheses, allowing for the presence of spatial autocorrelation, non-uniform risk, and incorporation of spatially heterogeneous population sizes. Incorporation of spatial autocorrelation led to fewer significant ZIP codes than found in previous studies, confirming earlier claims that CSR can lead to over-identification of the number of significant spatial clusters or outliers. Accounting for population size through geostatistical filtering increased the size of clusters while removing most of the spatial outliers. Integration of regional background into the neutral models yielded substantially different spatial clusters and outliers, leading to the identification of ZIP codes where SMR values significantly depart from their regional background. Conclusion The approach presented in this paper enables researchers to assess geographic relationships using appropriate null hypotheses that account for the background variation extant in real-world systems. 
In particular, this new methodology allows one to identify geographic pattern above and beyond background variation. The implementation of this approach in spatial statistical software will facilitate the detection of spatial disparities in mortality rates, establishing the rationale for targeted cancer control interventions, including consideration of health services needs, and resource allocation for screening and diagnostic testing. It will allow researchers to systematically evaluate how sensitive their results are to assumptions implicit under alternative null hypotheses. PMID:15272930
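For orientation, the local Moran statistic mentioned in this abstract can be sketched in a few lines of Python. This sketch uses the standard Anselin form and only computes the statistic itself; the geostatistical neutral models used to assess significance are not reproduced, and the function name and the row-standardised `weights` matrix are assumptions made for illustration.

```python
def local_moran(values, weights):
    """Local Moran's I_i = (z_i / m2) * sum_j w_ij * z_j.

    values  : list of rates, one per areal unit (e.g. ZIP code)
    weights : square list-of-lists spatial weight matrix (w_ii is ignored)
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]           # deviations from the global mean
    m2 = sum(d * d for d in z) / n           # second moment of the deviations
    if m2 == 0:
        raise ValueError("all values are identical; statistic is undefined")
    return [
        z[i] / m2 * sum(weights[i][j] * z[j] for j in range(n) if j != i)
        for i in range(n)
    ]


# Four units on a line; each row of the weight matrix sums to 1.
rates = [10.0, 10.0, 1.0, 1.0]
w = [
    [0.0, 1.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 1.0, 0.0],
]
moran = local_moran(rates, w)
```

Positive values flag a unit surrounded by similar values (a cluster); negative values flag a unit that differs from its neighbours (a spatial outlier).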
FOSS GIS on the GFZ HPC cluster: Towards a service-oriented Scientific Geocomputation Environment
NASA Astrophysics Data System (ADS)
Loewe, P.; Klump, J.; Thaler, J.
2012-12-01
High performance compute clusters can be used as geocomputation workbenches. Their wealth of resources enables us to take on geocomputation tasks which exceed the limitations of smaller systems. These general capabilities can be harnessed via tools such as Geographic Information Systems (GIS), provided they are able to utilize the available cluster configuration/architecture and provide a sufficient degree of user friendliness to allow for wide application. While server-level computing is clearly not sufficient for the growing number of data- or computation-intensive tasks undertaken, these tasks do not come close to the requirements needed for access to "top shelf" national cluster facilities. Until recently, this kind of geocomputation research was therefore effectively barred by a lack of access to adequate resources. In this paper we report on the experiences gained by providing GRASS GIS as a software service on an HPC compute cluster at the German Research Centre for Geosciences using Platform Computing's Load Sharing Facility (LSF). GRASS GIS is the oldest and largest Free Open Source (FOSS) GIS project. During ramp up in 2011, multiple versions of GRASS GIS (v 6.4.2, 6.5 and 7.0) were installed on the HPC compute cluster, which currently consists of 234 nodes with 480 CPUs providing 3084 cores. Nineteen different processing queues with varying hardware capabilities and priorities are provided, allowing for fine-grained scheduling and load balancing. After successful initial testing, mechanisms were developed to deploy scripted geocomputation tasks onto dedicated processing queues. The mechanisms are based on earlier work by NETELER et al. (2008) and allow all 3084 cores to be used for GRASS based geocomputation work. However, in practice applications are limited to the fewer resources assigned to their respective queue.
Applications of the new GIS functionality so far comprise hydrological analysis, remote sensing, and the generation of maps of simulated tsunamis in the Mediterranean Sea for the Tsunami Atlas of the FP-7 TRIDEC Project (www.tridec-online.eu). This included the processing of complex problems requiring significant amounts of processing time, up to 20 full CPU days. This GRASS GIS-based service is provided as a research utility in the sense of "Software as a Service" (SaaS) and is a first step towards a GFZ corporate cloud service.
Phillips, Yvonne F; Towsey, Michael; Roe, Paul
2018-01-01
Audio recordings of the environment are an increasingly important technique to monitor biodiversity and ecosystem function. While the acquisition of long-duration recordings is becoming easier and cheaper, the analysis and interpretation of that audio remains a significant research area. The issue addressed in this paper is the automated reduction of environmental audio data to facilitate ecological investigations. We describe a method that first reduces environmental audio to vectors of acoustic indices, which are then clustered. This can reduce the audio data by six to eight orders of magnitude yet retain useful ecological information. We describe techniques to visualise sequences of cluster occurrence (using for example, diel plots, rose plots) that assist interpretation of environmental audio. Colour coding acoustic clusters allows months and years of audio data to be visualised in a single image. These techniques are useful in identifying and indexing the contents of long-duration audio recordings. They could also play an important role in monitoring long-term changes in species abundance brought about by habitat degradation and/or restoration. PMID:29494629
Cosmological parameter estimation from CMB and X-ray cluster after Planck
NASA Astrophysics Data System (ADS)
Hu, Jian-Wei; Cai, Rong-Gen; Guo, Zong-Kuan; Hu, Bin
2014-05-01
We investigate constraints on cosmological parameters in three 8-parameter models with the summed neutrino mass as a free parameter, by a joint analysis of CCCP X-ray cluster data, the newly released Planck CMB data, and some external data sets including baryon acoustic oscillation measurements from the 6dFGS, SDSS DR7 and BOSS DR9 surveys, and the Hubble Space Telescope H_0 measurement. We find that the combined data strongly favor a non-zero neutrino mass at more than 3σ confidence level in these non-vanilla models. Allowing the CMB lensing amplitude A_L to vary, we find A_L > 1 at 3σ confidence level. For dark energy with a constant equation of state w, we obtain w < -1 at 3σ confidence level. The estimate of the matter power spectrum amplitude σ_8 is discrepant with the Planck value at 2σ confidence level, which reflects some tension between X-ray cluster data and Planck data in these non-vanilla models. The tension can be alleviated by adding a 9% systematic shift in the cluster mass function.
Clustering the Orion B giant molecular cloud based on its molecular emission
Bron, Emeric; Daudon, Chloé; Pety, Jérôme; Levrier, François; Gerin, Maryvonne; Gratier, Pierre; Orkisz, Jan H.; Guzman, Viviana; Bardeau, Sébastien; Goicoechea, Javier R.; Liszt, Harvey; Öberg, Karin; Peretto, Nicolas; Sievers, Albrecht; Tremblin, Pascal
2017-01-01
Context. Previous attempts at segmenting molecular line maps of molecular clouds have focused on using position-position-velocity data cubes of a single molecular line to separate the spatial components of the cloud. In contrast, wide field spectral imaging over a large spectral bandwidth in the (sub)mm domain now allows one to combine multiple molecular tracers to understand the different physical and chemical phases that constitute giant molecular clouds (GMCs). Aims. We aim at using multiple tracers (sensitive to different physical processes and conditions) to segment a molecular cloud into physically/chemically similar regions (rather than spatially connected components), thus disentangling the different physical/chemical phases present in the cloud. Methods. We use a machine learning clustering method, namely the Meanshift algorithm, to cluster pixels with similar molecular emission, ignoring spatial information. Clusters are defined around each maximum of the multidimensional Probability Density Function (PDF) of the line integrated intensities. Simple radiative transfer models were used to interpret the astrophysical information uncovered by the clustering analysis. Results. A clustering analysis based only on the J = 1 – 0 lines of three isotopologues of CO proves sufficient to reveal distinct density/column density regimes (n_H ~ 100 cm^−3, ~ 500 cm^−3, and > 1000 cm^−3), closely related to the usual definitions of diffuse, translucent and high-column-density regions. Adding two UV-sensitive tracers, the J = 1 − 0 line of HCO+ and the N = 1 − 0 line of CN, allows us to distinguish two clearly distinct chemical regimes, characteristic of UV-illuminated and UV-shielded gas. The UV-illuminated regime shows overbright HCO+ and CN emission, which we relate to a photochemical enrichment effect. We also find a tail of high CN/HCO+ intensity ratio in UV-illuminated regions.
Finer distinctions in density classes (n_H ~ 7 × 10^3 cm^−3, ~ 4 × 10^4 cm^−3) for the densest regions are also identified, likely related to the higher critical density of the CN and HCO+ (1 – 0) lines. These distinctions are only possible because the high-density regions are spatially resolved. Conclusions. Molecules are versatile tracers of GMCs because their line intensities bear the signature of the physics and chemistry at play in the gas. The association of simultaneous multi-line, wide-field mapping and powerful machine learning methods such as the Meanshift clustering algorithm reveals how to decode the complex information available in these molecular tracers. PMID:29456256
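The mode-seeking behaviour of mean shift described in this abstract can be sketched as follows. This is a minimal pure-Python illustration with a flat (top-hat) kernel; the kernel choice, the `bandwidth` value, the mode-merging rule and the toy "intensity" vectors are all assumptions, not the authors' configuration. Each point stands for one pixel's vector of line intensities, with no spatial coordinates.

```python
from math import dist


def mean_shift(points, bandwidth, max_iter=100, tol=1e-6):
    """Flat-kernel mean shift: move each point uphill to a density mode.

    Returns (labels, modes); points converging to the same mode share a label.
    """
    modes, labels = [], []
    for p in points:
        x = tuple(p)
        for _ in range(max_iter):
            # Average all data points inside the window centred on x.
            near = [q for q in points if dist(x, q) <= bandwidth]
            new_x = tuple(sum(c) / len(near) for c in zip(*near))
            shift = dist(x, new_x)
            x = new_x
            if shift < tol:
                break
        # Merge with an already-found mode if it lies close enough.
        for k, m in enumerate(modes):
            if dist(x, m) < bandwidth / 2:
                labels.append(k)
                break
        else:
            modes.append(x)
            labels.append(len(modes) - 1)
    return labels, modes


# Two well-separated groups of hypothetical two-line intensity vectors.
pixels = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
labels, modes = mean_shift(pixels, bandwidth=1.0)
```

The number of clusters is not set in advance; it follows from the number of density maxima found at the chosen bandwidth.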
Austin, Peter C; Wagner, Philippe; Merlo, Juan
2017-03-15
Multilevel data occurs frequently in many research areas like health services research and epidemiology. A suitable way to analyze such data is through the use of multilevel regression models (MLRM). MLRM incorporate cluster-specific random effects which allow one to partition the total individual variance into between-cluster variation and between-individual variation. Statistically, MLRM account for the dependency of the data within clusters and provide correct estimates of uncertainty around regression coefficients. Substantively, the magnitude of the effect of clustering provides a measure of the General Contextual Effect (GCE). When outcomes are binary, the GCE can also be quantified by measures of heterogeneity like the Median Odds Ratio (MOR) calculated from a multilevel logistic regression model. Time-to-event outcomes within a multilevel structure occur commonly in epidemiological and medical research. However, the Median Hazard Ratio (MHR) that corresponds to the MOR in multilevel (i.e., 'frailty') Cox proportional hazards regression is rarely used. Analogously to the MOR, the MHR is the median relative change in the hazard of the occurrence of the outcome when comparing identical subjects from two randomly selected different clusters that are ordered by risk. We illustrate the application and interpretation of the MHR in a case study analyzing the hazard of mortality in patients hospitalized for acute myocardial infarction at hospitals in Ontario, Canada. We provide R code for computing the MHR. The MHR is a useful and intuitive measure for expressing cluster heterogeneity in the outcome and, thereby, estimating general contextual effects in multilevel survival analysis. © 2016 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd.
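As a rough illustration of the measure described above, the sketch below computes a median hazard ratio from the variance of the cluster-level random effects using the same closed form as the median odds ratio, MHR = exp(sqrt(2·σ²) · Φ⁻¹(0.75)). It assumes normally distributed random effects on the log-hazard scale; the function name and the example variance are hypothetical, and this is not the R code the authors provide.

```python
from math import exp, sqrt
from statistics import NormalDist


def median_hazard_ratio(cluster_variance: float) -> float:
    """MHR from the variance of normally distributed cluster random effects.

    The difference of two independent cluster effects has variance 2*sigma^2;
    the median of its absolute value is sqrt(2*sigma^2) * Phi^{-1}(0.75).
    """
    if cluster_variance < 0:
        raise ValueError("variance must be non-negative")
    z75 = NormalDist().inv_cdf(0.75)  # 75th percentile of the standard normal
    return exp(sqrt(2.0 * cluster_variance) * z75)


# Hypothetical between-hospital variance of 0.25 on the log-hazard scale.
mhr = median_hazard_ratio(0.25)
```

A variance of zero gives an MHR of exactly 1, meaning no between-cluster heterogeneity; larger variances push the MHR above 1.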
Moskovchenko, D V; Kurchatova, A N; Fefilov, N N; Yurtaev, A A
2017-05-01
The concentrations of several trace elements and iron were determined in 26 soil samples from Belyi Island in the Kara Sea (West Siberian sector of Russian Arctic). The major types of soils predominating in the soil cover were sampled. The concentrations of trace elements (mg kg^-1) varied within the following ranges: 119-561 for Mn, 9.5-126 for Zn, 0.082-2.5 for Cd, <0.5-19.2 for Cu, <0.5-132 for Pb, 0.011-0.081 for Hg, <0.5-10.3 for Co, and 7.6-108 for Cr; the concentration of Fe varied from 3943 to 37,899 mg kg^-1. The impact of particular soil properties (pH, carbon and nitrogen contents, particle-size distribution) on metal concentrations was analyzed by the methods of correlation, cluster, and factor analyses. The correlation analysis showed that metal concentrations are negatively correlated with the sand content and positively correlated with the contents of silt and clay fractions. The cluster analysis allowed separation of the soils into three clusters. Cluster I included the soils with the high organic matter content formed under conditions of poor drainage; cluster II, the low-humus sandy soils of the divides and slopes; and cluster III, saline soils of coastal marshes. It was concluded that the geomorphic position largely controls the soil properties. The obtained data were compared with data on metal concentrations in other regions of the Russian Arctic. In general, the concentrations of trace elements in the studied soils were within the ranges typical of the background Arctic territories. However, some soils of Belyi Island contained elevated concentrations of Pb and Cd.
Frickenhaus, Stephan; Kannan, Srinivasaraghavan; Zacharias, Martin
2009-02-01
A direct conformational clustering and mapping approach for peptide conformations based on backbone dihedral angles has been developed and applied to compare conformational sampling of Met-enkephalin using two molecular dynamics (MD) methods. Efficient clustering in dihedrals has been achieved by evaluating all combinations resulting from independent clustering of each dihedral angle distribution, thus resolving all conformational substates. In contrast, Cartesian clustering was unable to accurately distinguish between all substates. Projection of clusters on dihedral principal component (PCA) subspaces did not result in efficient separation of highly populated clusters. However, representation in a nonlinear metric by Sammon mapping was able to separate well the 48 highest populated clusters in just two dimensions. In addition, this approach also allowed us to visualize the transition frequencies between clusters efficiently. Significantly, higher transition frequencies between more distinct conformational substates were found for a recently developed biasing-potential replica exchange MD simulation method allowing faster sampling of possible substates compared to conventional MD simulations. Although the number of theoretically possible clusters grows exponentially with peptide length, in practice, the number of clusters is only limited by the sampling size (typically much smaller), and therefore the method is well suited also for large systems. The approach could be useful to rapidly and accurately evaluate conformational sampling during MD simulations, to compare different sampling strategies and eventually to detect kinetic bottlenecks in folding pathways.
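The combination step described in this abstract, where each dihedral angle is clustered on its own and a conformation is labelled by the tuple of per-angle states, can be sketched as follows. The per-angle state centres are simply supplied by hand here, which is an assumption made for illustration; how the authors derive the one-dimensional clusters is not stated in the abstract, and all names are hypothetical.

```python
from collections import Counter


def circular_diff(a: float, b: float) -> float:
    """Smallest absolute difference between two angles, in degrees."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)


def combine_dihedral_states(frames, centres):
    """Label each frame by the tuple of nearest per-angle state indices.

    frames  : list of tuples, one backbone dihedral angle per position
    centres : for each position, a list of state centres for that angle
    """
    labels = []
    for frame in frames:
        label = tuple(
            min(range(len(c)), key=lambda k: circular_diff(angle, c[k]))
            for angle, c in zip(frame, centres)
        )
        labels.append(label)
    return labels, Counter(labels)


# Two dihedral angles per frame, two assumed states for each angle.
frames = [(-60.0, -45.0), (-65.0, 140.0), (175.0, 150.0), (-58.0, -40.0)]
centres = [[-60.0, 180.0], [-45.0, 145.0]]
labels, populations = combine_dihedral_states(frames, centres)
```

Only combinations that actually occur in the trajectory appear in `populations`, which is why the number of clusters stays bounded by the number of sampled frames even though the number of possible combinations grows exponentially.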
Konno, Takayuki; Yatsuyanagi, Jun; Saito, Shioko
2011-01-01
A total of 18 strains of EHEC O157:H7 were isolated from distinct cases in Akita Prefecture, Japan from July to September 2007. The genetic relatedness of these isolates was investigated by performing a multilocus variable number of tandem repeats analysis (MLVA) and a pulsed-field gel electrophoresis (PFGE) analysis using XbaI. The PFGE analyses allowed us to group these 18 isolates into three major clusters. The MLVA results correlated closely with those obtained by PFGE, although some variants were found within the clusters obtained by PFGE, thus highlighting the utility of this technique for determining a precise classification when it is difficult to differentiate between isolates with indistinguishable or very similar PFGE patterns. In addition, MLVA is a much easier and more rapid method than PFGE for analysis of the genetic relatedness of strains. Thus, as a second molecular epidemiological subtyping method, MLVA is useful for the regional outbreak surveillance of EHEC O157:H7 infections.
Support Vector Data Descriptions and k-Means Clustering: One Class?
Gornitz, Nico; Lima, Luiz Alberto; Muller, Klaus-Robert; Kloft, Marius; Nakajima, Shinichi
2017-09-27
We present ClusterSVDD, a methodology that unifies support vector data descriptions (SVDDs) and k-means clustering into a single formulation. This allows both methods to benefit from one another, i.e., by adding flexibility using multiple spheres for SVDDs and increasing anomaly resistance and flexibility through kernels to k-means. In particular, our approach leads to a new interpretation of k-means as a regularized mode seeking algorithm. The unifying formulation further allows for deriving new algorithms by transferring knowledge from one-class learning settings to clustering settings and vice versa. As a showcase, we derive a clustering method for structured data based on a one-class learning scenario. Additionally, our formulation can be solved via a particularly simple optimization scheme. We evaluate our approach empirically to highlight some of the proposed benefits on artificially generated data, as well as on real-world problems, and provide a Python software package comprising various implementations of primal and dual SVDD as well as our proposed ClusterSVDD.
González-Calabozo, Jose M; Valverde-Albacete, Francisco J; Peláez-Moreno, Carmen
2016-09-15
Gene Expression Data (GED) analysis poses a great challenge to the scientific community that can be framed into the Knowledge Discovery in Databases (KDD) and Data Mining (DM) paradigm. Biclustering has emerged as the machine learning method of choice to solve this task, but its unsupervised nature makes result assessment problematic. This is often addressed by means of Gene Set Enrichment Analysis (GSEA). We put forward a framework in which GED analysis is understood as an Exploratory Data Analysis (EDA) process where we provide support for continuous human interaction with data, aiming at improving the step of hypothesis abduction and assessment. We focus on the adaptation to human cognition of data interpretation and visualization of the output of EDA. First, we give a proper theoretical background to biclustering using Lattice Theory and provide a set of analysis tools revolving around [Formula: see text]-Formal Concept Analysis ([Formula: see text]-FCA), a lattice-theoretic unsupervised learning technique for real-valued matrices. By using different kinds of cost structures to quantify expression, we obtain different sequences of hierarchical biclusterings for gene under- and over-expression using thresholds. Consequently, we provide a method with interleaved analysis steps and visualization devices so that the sequences of lattices for a particular experiment summarize the researcher's vision of the data. This also allows us to define measures of persistence and robustness of biclusters to assess them. Second, the resulting biclusters are used to index external omics databases, for instance Gene Ontology (GO), thus offering a new way of accessing publicly available resources. This provides different flavors of gene set enrichment against which to assess the biclusters, by obtaining their p-values according to the terminology of those resources. We illustrate the exploration procedure on a real data example, confirming previously published results.
The GED analysis problem gets transformed into the exploration of a sequence of lattices enabling the visualization of the hierarchical structure of the biclusters with a certain degree of granularity. The ability of FCA-based biclustering methods to index external databases such as GO allows us to obtain a quality measure of the biclusters, to observe the evolution of a gene throughout the different biclusters it appears in, and to look for relevant biclusters, by observing their genes and their persistence, in order to infer, for instance, hypotheses on their function.
The First Photometric Analysis of the Open Clusters Dolidze 32 and 36
NASA Astrophysics Data System (ADS)
Amin, M. Y.; Elsanhory, W. H.; Haroon, A. A.
2018-06-01
We present a first study of the two open clusters Dolidze 32 and Dolidze 36 in the near-infrared JHKs bands with the aid of the PPMXL catalog. In our study, we used a method able to separate open cluster stars from those that belong to the stellar background. Our calculations indicate that the numbers of probable members of Dolidze 32 and Dolidze 36 are 286 and 780, respectively. We estimate the cluster centers of Dolidze 32 and Dolidze 36 to be α = 18h41m4s.188, δ = -04°04'57''.144 and α = 20h02m29s.95, δ = 42°05'49''.2, respectively. The limiting radii of Dolidze 32 and Dolidze 36 are about 0.94 ± 0.03 pc and 0.81 ± 0.03 pc, respectively. The color-magnitude diagram allows us to estimate the reddening as E(B - V) = 1.41 ± 0.03 mag for Dolidze 32 and E(B - V) = 0.19 ± 0.04 mag for Dolidze 36, with distance moduli (m - M) of 11.36 ± 0.02 and 10.10 ± 0.03, respectively. The luminosity and mass functions of the two clusters have also been estimated, giving masses of 437 ± 21 M⊙ and 678 ± 26 M⊙ and mass function slopes of -2.56 ± 0.62 and -2.01 ± 0.70 for Dolidze 32 and Dolidze 36, respectively. Finally, the dynamical state of these two clusters shows that only Dolidze 36 can be considered a dynamically relaxed cluster.
Analysis of Chromobacterium sp. natural isolates from different Brazilian ecosystems
Lima-Bittencourt, Cláudia I; Astolfi-Filho, Spartaco; Chartone-Souza, Edmar; Santos, Fabrício R; Nascimento, Andréa MA
2007-01-01
Background Chromobacterium violaceum is a free-living bacterium able to survive under diverse environmental conditions. In this study we evaluate the genetic and physiological diversity of Chromobacterium sp. isolates from three Brazilian ecosystems: Brazilian Savannah (Cerrado), Atlantic Rain Forest and Amazon Rain Forest. We have analyzed the diversity with molecular approaches (16S rRNA gene sequences and amplified ribosomal DNA restriction analysis) and phenotypic surveys of antibiotic resistance and biochemistry profiles. Results In general, the clusters based on physiological profiles included isolates from two or more geographical locations, indicating that they are not restricted to a single ecosystem. The isolates from Brazilian Savannah presented greater physiologic diversity and their biochemical profile was the most variable of all groupings. The isolates recovered from Amazon and Atlantic Rain Forests presented the most similar biochemical characteristics to the Chromobacterium violaceum ATCC 12472 strain. Clusters based on biochemical profiles were congruent with clusters obtained by the 16S rRNA gene tree. According to the phylogenetic analyses, isolates from the Amazon Rain Forest and Savannah displayed a closer relationship to Chromobacterium violaceum ATCC 12472. Furthermore, the 16S rRNA gene tree revealed a good correlation between phylogenetic clustering and geographic origin. Conclusion The physiological analyses clearly demonstrate the high biochemical versatility found in the C. violaceum genome, and molecular methods allowed detection of the intra- and inter-population diversity of isolates from three Brazilian ecosystems. PMID:17584942
Is the cluster environment quenching the Seyfert activity in elliptical and spiral galaxies?
NASA Astrophysics Data System (ADS)
de Souza, R. S.; Dantas, M. L. L.; Krone-Martins, A.; Cameron, E.; Coelho, P.; Hattab, M. W.; de Val-Borro, M.; Hilbe, J. M.; Elliott, J.; Hagen, A.; COIN Collaboration
2016-09-01
We developed a hierarchical Bayesian model (HBM) to investigate how the presence of Seyfert activity in galaxies relates to their environment, herein represented by the galaxy cluster mass, M200, and the normalized cluster centric distance, r/r200. We achieved this by constructing an unbiased sample of galaxies from the Sloan Digital Sky Survey, with morphological classifications provided by the Galaxy Zoo Project. A propensity score matching approach is introduced to control the effects of confounding variables: stellar mass, galaxy colour, and star formation rate. The connection between Seyfert activity and environmental properties in the de-biased sample is modelled within an HBM framework using the so-called logistic regression technique, suitable for the analysis of binary data (e.g. whether or not a galaxy hosts an AGN). Unlike standard ordinary least-squares fitting methods, our methodology naturally allows modelling the probability of Seyfert-AGN activity in galaxies on their natural scale, i.e. as a binary variable. Furthermore, we demonstrate how an HBM can incorporate information on each particular galaxy morphological type in a unified framework. In elliptical galaxies our analysis indicates a strong correlation of Seyfert-AGN activity with r/r200, and a weaker correlation with the mass of the host cluster. In spiral galaxies these trends do not appear, suggesting that the link between Seyfert activity and the properties of spiral galaxies is independent of the environment.
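The logistic link mentioned in the abstract above can be illustrated with a small sketch. This is a plain maximum-likelihood logistic regression on one predictor, fitted by gradient ascent. It is not the hierarchical Bayesian model of the paper, and the function names, learning rate, and toy inputs are illustrative assumptions.

```python
import math

def sigmoid(z):
    """Logistic function: maps a linear predictor to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, iters=5000):
    """Fit P(y = 1 | x) = sigmoid(b0 + b1 * x) by gradient ascent.

    xs: predictor values (e.g. a normalized distance, hypothetical here)
    ys: binary outcomes, 0 or 1
    """
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(iters):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = y - sigmoid(b0 + b1 * x)  # gradient of the log-likelihood
            g0 += err
            g1 += err * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1
```

Because the outcome is modelled through a probability, the fitted curve stays between 0 and 1, which a straight-line least-squares fit to 0/1 data does not guarantee.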
Roessner, Ute; Willmitzer, Lothar; Fernie, Alisdair R.
2001-01-01
We conducted a comprehensive metabolic phenotyping of potato (Solanum tuberosum L. cv Desiree) tuber tissue that had been modified either by transgenesis or exposure to different environmental conditions using a recently developed gas chromatography-mass spectrometry profiling protocol. Applying this technique, we were able to identify and quantify the major constituent metabolites of the potato tuber within a single chromatographic run. The plant systems that we selected to profile were tuber discs incubated in varying concentrations of fructose, sucrose, and mannitol and transgenic plants impaired in their starch biosynthesis. The resultant profiles were then compared, first at the level of individual metabolites and then using the statistical tools hierarchical cluster analysis and principal component analysis. These tools allowed us to assign clusters to the individual plant systems and to determine relative distances between these clusters; furthermore, analyzing the loadings of these analyses enabled identification of the most important metabolites in the definition of these clusters. The metabolic profiles of the sugar-fed discs were dramatically different from the wild-type steady-state values. When these profiles were compared with one another and also with those we assessed in previous studies, however, we were able to evaluate potential phenocopies. These comparisons highlight the importance of such an approach in the functional and qualitative assessment of diverse systems to gain insights into important mediators of metabolism. PMID:11706160
Rapid Disaster Damage Estimation
NASA Astrophysics Data System (ADS)
Vu, T. T.
2012-07-01
Experience from recent disaster events has shown that detailed information derived from high-resolution satellite images can meet the requirements of damage analysts and disaster management practitioners. The richer information contained in such high-resolution images, however, increases the complexity of image analysis. As a result, few image analysis solutions can be used in practice under time pressure in post-disaster and emergency response. To fill this gap in the use of remote sensing in disaster response, this research develops a rapid high-resolution satellite mapping solution built upon a dual-scale contextual framework to support damage estimation after a catastrophe. The target objects are buildings (or building blocks) and their condition. At the coarse processing level, statistical region merging is deployed to group pixels into a number of coarse clusters. Based on a majority rule applied to the vegetation, water, and shadow indices, irrelevant clusters can be eliminated. The remaining clusters likely consist of building structures and other objects. At the fine processing level, smaller objects are formed within each remaining cluster using morphological analysis. Numerous indicators, including spectral, textural, and shape indices, are computed for use in a rule-based object classification. The computation time of raster-based analysis depends strongly on the image size, in other words on the number of processed pixels. Splitting the processing into two levels reduces the number of processed pixels and the redundancy of processing irrelevant information. In addition, it allows a data- and task-based parallel implementation. The performance is demonstrated with QuickBird images of an area of Phanga, Thailand affected by the 2004 Indian Ocean tsunami.
The developed solution will be implemented in different platforms as well as a web processing service for operational uses.
Chromium: A Stream-Processing Framework for Interactive Rendering on Clusters
DOE Office of Scientific and Technical Information (OSTI.GOV)
Humphreys, G.; Houston, M.; Ng, Y.-R.
2002-01-11
We describe Chromium, a system for manipulating streams of graphics API commands on clusters of workstations. Chromium's stream filters can be arranged to create sort-first and sort-last parallel graphics architectures that, in many cases, support the same applications while using only commodity graphics accelerators. In addition, these stream filters can be extended programmatically, allowing the user to customize the stream transformations performed by nodes in a cluster. Because our stream processing mechanism is completely general, any cluster-parallel rendering algorithm can be either implemented on top of or embedded in Chromium. In this paper, we give examples of real-world applications that use Chromium to achieve good scalability on clusters of workstations, and describe other potential uses of this stream processing technology. By completely abstracting the underlying graphics architecture, network topology, and API command processing semantics, we allow a variety of applications to run in different environments.
Laboratory-based validation of the baseline sensors of the ITER diagnostic residual gas analyzer
NASA Astrophysics Data System (ADS)
Klepper, C. C.; Biewer, T. M.; Marcus, C.; Andrew, P.; Gardner, W. L.; Graves, V. B.; Hughes, S.
2017-10-01
The divertor-specific ITER Diagnostic Residual Gas Analyzer (DRGA) will provide essential information relating to DT fusion plasma performance. This includes pulse-resolving measurements of the fuel isotopic mix reaching the pumping ducts, as well as the concentration of the helium generated as the ash of the fusion reaction. In the present baseline design, the cluster of sensors attached to this diagnostic's differentially pumped analysis chamber assembly includes a radiation compatible version of a commercial quadrupole mass spectrometer, as well as an optical gas analyzer using a plasma-based light excitation source. This paper reports on a laboratory study intended to validate the performance of this sensor cluster, with emphasis on the detection limit of the isotopic measurement. This validation study was carried out in a laboratory set-up that closely prototyped the analysis chamber assembly configuration of the baseline design. This includes an ITER-specific placement of the optical gas measurement downstream from the first turbine of the chamber's turbo-molecular pump to provide sufficient light emission while preserving the gas dynamics conditions that allow for ~1 s response time from the sensor cluster [1].
A statistical study of EMIC waves observed by Cluster: 1. Wave properties
NASA Astrophysics Data System (ADS)
Allen, R. C.; Zhang, J.-C.; Kistler, L. M.; Spence, H. E.; Lin, R.-L.; Klecker, B.; Dunlop, M. W.; André, M.; Jordanova, V. K.
2015-07-01
Electromagnetic ion cyclotron (EMIC) waves are an important mechanism for particle energization and losses inside the magnetosphere. In order to better understand the effects of these waves on particle dynamics, detailed information about the occurrence rate, wave power, ellipticity, normal angle, energy propagation angle distributions, and local plasma parameters is required. Previous statistical studies have used in situ observations to investigate the distribution of these parameters in the magnetic local time versus L-shell (MLT-L) frame within a limited magnetic latitude (MLAT) range. In this study, we present a statistical analysis of EMIC wave properties using 10 years (2001-2010) of data from Cluster, totaling 25,431 min of wave activity. Due to the polar orbit of Cluster, we are able to investigate EMIC waves at all MLATs and MLTs. This allows us to further investigate the MLAT dependence of various wave properties inside different MLT sectors and further explore the effects of Shabansky orbits on EMIC wave generation and propagation. The statistical analysis is presented in two papers. This paper focuses on the wave occurrence distribution as well as the distribution of wave properties. The companion paper focuses on local plasma parameters during wave observations as well as wave generation proxies.
Population-based 3D genome structure analysis reveals driving forces in spatial genome organization
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tjong, Harianto; Li, Wenyuan; Kalhor, Reza
Conformation capture technologies (e.g., Hi-C) chart physical interactions between chromatin regions on a genome-wide scale. However, the structural variability of the genome between cells poses a great challenge to interpreting ensemble-averaged Hi-C data, particularly for long-range and interchromosomal interactions. Here, we present a probabilistic approach for deconvoluting Hi-C data into a model population of distinct diploid 3D genome structures, which facilitates the detection of chromatin interactions likely to co-occur in individual cells. Our approach incorporates the stochastic nature of chromosome conformations and allows a detailed analysis of alternative chromatin structure states. For example, we predict and experimentally confirm the presence of large centromere clusters with distinct chromosome compositions varying between individual cells. The stability of these clusters varies greatly with their chromosome identities. We show that these chromosome-specific clusters can play a key role in the overall chromosome positioning in the nucleus and stabilizing specific chromatin interactions. By explicitly considering genome structural variability, our population-based method provides an important tool for revealing novel insights into the key factors shaping the spatial genome organization.
Population-based 3D genome structure analysis reveals driving forces in spatial genome organization
Tjong, Harianto; Li, Wenyuan; Kalhor, Reza; ...
2016-03-07
Conformation capture technologies (e.g., Hi-C) chart physical interactions between chromatin regions on a genome-wide scale. However, the structural variability of the genome between cells poses a great challenge to interpreting ensemble-averaged Hi-C data, particularly for long-range and interchromosomal interactions. Here, we present a probabilistic approach for deconvoluting Hi-C data into a model population of distinct diploid 3D genome structures, which facilitates the detection of chromatin interactions likely to co-occur in individual cells. Our approach incorporates the stochastic nature of chromosome conformations and allows a detailed analysis of alternative chromatin structure states. For example, we predict and experimentally confirm the presence of large centromere clusters with distinct chromosome compositions varying between individual cells. The stability of these clusters varies greatly with their chromosome identities. We show that these chromosome-specific clusters can play a key role in the overall chromosome positioning in the nucleus and stabilizing specific chromatin interactions. By explicitly considering genome structural variability, our population-based method provides an important tool for revealing novel insights into the key factors shaping the spatial genome organization.
The Alaska Arctic Vegetation Archive (AVA-AK)
DOE Office of Scientific and Technical Information (OSTI.GOV)
Walker, Donald; Breen, Amy; Druckenmiller, Lisa
The Alaska Arctic Vegetation Archive (AVA-AK, GIVD-ID: NA-US-014) is a free, publicly available database archive of vegetation-plot data from the Arctic tundra region of northern Alaska. The archive currently contains 24 datasets with 3,026 non-overlapping plots. Of these, 74% have geolocation data with 25-m or better precision. Species cover data and header data are stored in a Turboveg database. A standardized Pan Arctic Species List provides a consistent nomenclature for vascular plants, bryophytes, and lichens in the archive. A web-based online Alaska Arctic Geoecological Atlas (AGA-AK) allows viewing and downloading the species data in a variety of formats, and provides access to a wide variety of ancillary data. We conducted a preliminary cluster analysis of the first 16 datasets (1,613 plots) to examine how the spectrum of derived clusters is related to the suite of datasets, habitat types, and environmental gradients. Here, we present the contents of the archive, assess its strengths and weaknesses, and provide three supplementary files that include the data dictionary, a list of habitat types, an overview of the datasets, and details of the cluster analysis.
The Alaska Arctic Vegetation Archive (AVA-AK)
Walker, Donald; Breen, Amy; Druckenmiller, Lisa; ...
2016-05-17
The Alaska Arctic Vegetation Archive (AVA-AK, GIVD-ID: NA-US-014) is a free, publicly available database archive of vegetation-plot data from the Arctic tundra region of northern Alaska. The archive currently contains 24 datasets with 3,026 non-overlapping plots. Of these, 74% have geolocation data with 25-m or better precision. Species cover data and header data are stored in a Turboveg database. A standardized Pan Arctic Species List provides a consistent nomenclature for vascular plants, bryophytes, and lichens in the archive. A web-based online Alaska Arctic Geoecological Atlas (AGA-AK) allows viewing and downloading the species data in a variety of formats, and provides access to a wide variety of ancillary data. We conducted a preliminary cluster analysis of the first 16 datasets (1,613 plots) to examine how the spectrum of derived clusters is related to the suite of datasets, habitat types, and environmental gradients. Here, we present the contents of the archive, assess its strengths and weaknesses, and provide three supplementary files that include the data dictionary, a list of habitat types, an overview of the datasets, and details of the cluster analysis.
Lin, Y S; Kuan, C S; Weng, I S; Tsai, C C
2015-11-25
The genetic relationships among 27 pineapple [Ananas comosus (L.) Merr.] cultivars and lines were examined using 16 simple sequence repeat (SSR) markers. The number of alleles per locus of the SSR markers ranged from 2 to 6 (average 3.19), for a total of 51 alleles. Similarity coefficients were calculated on the basis of 51 amplified bands. A dendrogram was created according to the 16 SSR markers by the unweighted pair-group method. The banding patterns obtained from the SSR primers allowed most of the cultivars and lines to be distinguished, with the exception of vegetative clones. According to the dendrogram, the 27 pineapple cultivars and lines were clustered into three main clusters and four individual clusters. As expected, the dendrogram showed that derived cultivars and lines are closely related to their parental cultivars; the genetic relationships between pineapple cultivars agree with the genealogy of their breeding history. In addition, the analysis showed that there is no obvious correlation between SSR markers and morphological characters. In conclusion, SSR analysis is an efficient method for pineapple cultivar identification and can offer valuable informative characters to identify pineapple cultivars in Taiwan.
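The dendrogram construction described above can be sketched as follows. The abstract does not say which similarity coefficient was used, so the Jaccard coefficient here is an assumption, as are the function names and the 0/1 band-profile representation. The clustering step is the unweighted pair-group method with arithmetic mean (UPGMA), written in plain Python rather than with any particular phylogenetics package.

```python
def jaccard(a, b):
    """Similarity of two 0/1 band profiles: shared bands / bands in either."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 1.0

def upgma(profiles):
    """Average-linkage (UPGMA) clustering; returns the merge history.

    `profiles` maps a cultivar name to its 0/1 band vector. Each merge is
    recorded as (members_of_cluster_a, members_of_cluster_b, distance).
    """
    clusters = [(name,) for name in profiles]

    def dist(p, q):
        return 1.0 - jaccard(profiles[p], profiles[q])

    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # UPGMA distance: mean over all cross-cluster leaf pairs.
                pairs = [dist(p, q) for p in clusters[i] for q in clusters[j]]
                d = sum(pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges
```

Two accessions with identical band profiles merge at distance 0, which mirrors the observation that vegetative clones cannot be told apart by their banding patterns.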
Millstone: software for multiplex microbial genome analysis and engineering
DOE Office of Scientific and Technical Information (OSTI.GOV)
Goodman, Daniel B.; Kuznetsov, Gleb; Lajoie, Marc J.
Inexpensive DNA sequencing and advances in genome editing have made computational analysis a major rate-limiting step in adaptive laboratory evolution and microbial genome engineering. Here, we describe Millstone, a web-based platform that automates genotype comparison and visualization for projects with up to hundreds of genomic samples. To enable iterative genome engineering, Millstone allows users to design oligonucleotide libraries and create successive versions of reference genomes. Millstone is open source and easily deployable to a cloud platform, local cluster, or desktop, making it a scalable solution for any lab.
Millstone: software for multiplex microbial genome analysis and engineering.
Goodman, Daniel B; Kuznetsov, Gleb; Lajoie, Marc J; Ahern, Brian W; Napolitano, Michael G; Chen, Kevin Y; Chen, Changping; Church, George M
2017-05-25
Inexpensive DNA sequencing and advances in genome editing have made computational analysis a major rate-limiting step in adaptive laboratory evolution and microbial genome engineering. We describe Millstone, a web-based platform that automates genotype comparison and visualization for projects with up to hundreds of genomic samples. To enable iterative genome engineering, Millstone allows users to design oligonucleotide libraries and create successive versions of reference genomes. Millstone is open source and easily deployable to a cloud platform, local cluster, or desktop, making it a scalable solution for any lab.
Millstone: software for multiplex microbial genome analysis and engineering
Goodman, Daniel B.; Kuznetsov, Gleb; Lajoie, Marc J.; ...
2017-05-25
Inexpensive DNA sequencing and advances in genome editing have made computational analysis a major rate-limiting step in adaptive laboratory evolution and microbial genome engineering. Here, we describe Millstone, a web-based platform that automates genotype comparison and visualization for projects with up to hundreds of genomic samples. To enable iterative genome engineering, Millstone allows users to design oligonucleotide libraries and create successive versions of reference genomes. Millstone is open source and easily deployable to a cloud platform, local cluster, or desktop, making it a scalable solution for any lab.
Experimental research of phase transitions in a melt of high-purity aluminum
NASA Astrophysics Data System (ADS)
Vorontsov, V. B.; Pershin, V. K.
2017-12-01
This work is devoted to studying the genetic connection between the structures of the solid and liquid phases. Fourier analysis was performed on acoustic emission (AE) signals accompanying the heating of high-purity aluminum from the melting point up to 860 °C. The experimental data made it possible to follow the dynamics of disorder zones in the melt with increasing melt temperature up to their complete destruction. The results of the spectral analysis of the signals were interpreted from the standpoint of the cluster theory of the melting of metals.
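As a rough illustration of the kind of spectral analysis mentioned above, the sketch below computes a discrete Fourier transform of a sampled signal and reports its strongest non-DC frequency. The function names, sampling rate, and synthetic test signal are assumptions for illustration only; the abstract does not describe the authors' actual signal processing chain.

```python
import cmath
import math

def dft_magnitudes(signal):
    """Magnitudes of the DFT bins 0 .. n//2 of a real-valued signal."""
    n = len(signal)
    return [
        abs(sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n)
                for t in range(n)))
        for k in range(n // 2 + 1)
    ]

def dominant_frequency(signal, sample_rate):
    """Frequency (in the units of sample_rate) of the strongest non-DC bin."""
    mags = dft_magnitudes(signal)
    k = max(range(1, len(mags)), key=mags.__getitem__)  # skip the DC bin
    return k * sample_rate / len(signal)
```

This direct O(n^2) transform is only practical for short records; a fast Fourier transform would normally be used on real acoustic emission data.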
Content-addressable read/write memories for image analysis
NASA Technical Reports Server (NTRS)
Snyder, W. E.; Savage, C. D.
1982-01-01
The commonly encountered image analysis problems of region labeling and clustering are found to be cases of the search-and-rename problem, which can be solved in parallel by a system architecture that is inherently suitable for VLSI implementation. This architecture is a novel form of content-addressable memory (CAM) which provides parallel search and update functions, allowing the time per operation to be reduced to a constant. It has been proposed in related investigations by Hall (1981) that, with VLSI, CAM-based structures with enhanced instruction sets for general purpose processing will be feasible.
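To make the search-and-rename formulation concrete, here is a minimal sequential sketch of region labeling, assuming a binary image stored as a list of rows and 4-connectivity (both assumptions, as is the function name). The inner rename loop is the step that a content-addressable memory would perform as one parallel search and update; in this software emulation it costs a full pass over the label array.

```python
def label_regions(image):
    """Label 4-connected foreground regions of a binary image.

    Whenever two different labels meet, every cell carrying the larger
    label is searched for and renamed to the smaller one.
    """
    rows, cols = len(image), len(image[0])
    labels = [[0] * cols for _ in range(rows)]
    next_label = 1
    for r in range(rows):
        for c in range(cols):
            if not image[r][c]:
                continue
            up = labels[r - 1][c] if r > 0 else 0
            left = labels[r][c - 1] if c > 0 else 0
            if up == 0 and left == 0:
                # No labeled neighbor yet: start a new region.
                labels[r][c] = next_label
                next_label += 1
            elif up and left and up != left:
                keep, drop = min(up, left), max(up, left)
                labels[r][c] = keep
                # Search-and-rename: sequential here, parallel in a CAM.
                for rr in range(rows):
                    for cc in range(cols):
                        if labels[rr][cc] == drop:
                            labels[rr][cc] = keep
            else:
                labels[r][c] = up or left
    return labels
```

Labels of merged regions are retired rather than reused, so the final label values need not be consecutive.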
Plasma Properties in the Plume of a Hall Thruster Cluster
2003-06-04
The Hall thruster cluster is an attractive propulsion approach for spacecraft requiring very high-power electric propulsion systems. This article...probes in the plume of a low-power, four-engine Hall thruster cluster. Simple analytical formulas are introduced that allow these quantities to be
A Study of the Multiple Populations in M10
NASA Astrophysics Data System (ADS)
Gerber, Jeffrey M.; Friel, Eileen D.; Vesperini, Enrico
2017-06-01
We present an analysis of CN and CH band strengths which allow the identification of multiple populations in red giant stars in the globular cluster M10. Our measurements come from low-resolution spectroscopy obtained for ~140 red and asymptotic giant branch stars over two observation runs using Hydra on the WIYN 3.5m telescope. We sort the stars into nitrogen normal and enhanced populations based on the distribution of CN band strength as a function of magnitude. Once the stars are sorted into first and second generation (CN normal and enhanced, respectively), we compare this analysis to other ways of determining multiple stellar populations such as with the light elements Na and O and photometric indicators, particularly the UV photometry from the Hubble Space Telescope. C and N abundances are determined by matching observed CN and CH band measurements with those produced by synthetic spectra created with the Synthetic Spectrum Generator (SSG). The large sample size also allows us to study characteristics like radial distribution, and evolutionary effects such as the depletion of carbon (and subsequent nitrogen enrichment) as a star climbs the red giant branch. We find a rate of carbon depletion as a function of time for both populations in M10 and compare our result to M13, a cluster similar in metallicity.
Application of neuroanatomical features to tractography clustering.
Wang, Qian; Yap, Pew-Thian; Wu, Guorong; Shen, Dinggang
2013-09-01
Diffusion tensor imaging allows unprecedented insight into brain neural connectivity in vivo by allowing reconstruction of neuronal tracts via captured patterns of water diffusion in white matter microstructures. However, tractography algorithms often output hundreds of thousands of fibers, rendering subsequent data analysis intractable. As a remedy, fiber clustering techniques are able to group fibers into dozens of bundles and thus facilitate analyses. Most existing fiber clustering methods rely on geometrical information of fibers, by viewing them as curves in 3D Euclidean space. The important neuroanatomical aspect of fibers, however, is ignored. In this article, the neuroanatomical information of each fiber is encapsulated in the associativity vector, which functions as the unique "fingerprint" of the fiber. Specifically, each entry in the associativity vector describes the relationship between the fiber and a certain anatomical ROI in a fuzzy manner. The value of the entry approaches 1 if the fiber is spatially related to the ROI at high confidence; on the contrary, the value drops closer to 0. The confidence of the ROI is calculated by diffusing the ROI according to the underlying fibers from tractography. In particular, we have adopted the fast marching method for simulation of ROI diffusion. Using the associativity vectors of fibers, we further model fibers as observations sampled from multivariate Gaussian mixtures in the feature space. To group all fibers into relevant major bundles, an expectation-maximization clustering approach is employed. Experimental results indicate that our method results in anatomically meaningful bundles that are highly consistent across subjects. Copyright © 2012 Wiley Periodicals, Inc., a Wiley company.
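The expectation-maximization step described above can be sketched in a simplified form. The example below fits a one-dimensional Gaussian mixture as a stand-in for the multivariate mixtures over associativity vectors used in the paper. The function name, the initial means, the unit initial variances, and the variance floor are all illustrative assumptions.

```python
import math

def em_gmm_1d(xs, means, iters=50):
    """EM for a 1-D Gaussian mixture; `means` gives the initial centres.

    Returns (weights, means, variances, responsibilities).
    """
    k = len(means)
    means = list(means)
    variances = [1.0] * k
    weights = [1.0 / k] * k
    resp = []
    for _ in range(iters):
        # E-step: responsibility of each component for each observation.
        resp = []
        for x in xs:
            dens = [
                weights[j]
                * math.exp(-(x - means[j]) ** 2 / (2 * variances[j]))
                / math.sqrt(2 * math.pi * variances[j])
                for j in range(k)
            ]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means, and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(xs)
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2
                      for r, x in zip(resp, xs)) / nj
            variances[j] = max(var, 1e-6)  # floor avoids a collapsed component
    return weights, means, variances, resp
```

Assigning each observation to the component with the largest responsibility then gives a hard grouping, analogous to assigning each fiber to a bundle.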
Special features of the CLUSTER antenna and radial booms design, development and verification
NASA Technical Reports Server (NTRS)
Gianfiglio, G.; Yorck, M.; Luhmann, H. J.
1995-01-01
CLUSTER is a scientific space mission to investigate the Earth's plasma environment in situ by means of four identical spin-stabilized spacecraft. Each spacecraft is provided with a set of four rigid booms: two Antenna Booms and two Radial Booms. This paper presents a summary of the boom development and verification phases, addressing the key aspects of the Radial Boom design. In particular, it concentrates on the difficulties encountered in simultaneously fulfilling the requirements of minimum torque ratio and maximum allowed shock loads at boom latching for this two-degree-of-freedom boom. The paper also provides an overview of the analysis campaign and testing program performed to achieve sufficient confidence in the boom performance and operation.
tropical cyclone risk analysis: a decisive role of its track
NASA Astrophysics Data System (ADS)
Chelsea Nam, C.; Park, Doo-Sun R.; Ho, Chang-Hoi
2016-04-01
The tracks of 85 tropical cyclones (TCs) that made landfall to South Korea for the period 1979-2010 are classified into four clusters by using a fuzzy c-means clustering method. The four clusters are characterized by 1) east-short, 2) east-long, 3) west-long, and 4) west-short based on the moving routes around Korean peninsula. We conducted risk comparison analysis for these four clusters regarding their hazards, exposure, and damages. Here, hazard parameters are calculated from two different sources independently, one from the best-track data (BT) and the other from the 60 weather stations over the country (WS). The results show distinct characteristics of the four clusters in terms of the hazard parameters and economic losses (EL), suggesting that there is a clear track-dependency in the overall TC risk. It is appeared that whether there occurred an "effective collision" overweighs the intensity of the TC per se. The EL ranking did not agree with the BT parameters (maximum wind speed, central pressure, or storm radius), but matches to WS parameter (especially, daily accumulated rainfall and TC-influenced period). The west-approaching TCs (i.e. west-long and west-short clusters) generally recorded larger EL than the east-approaching TCs (i.e. east-short and east-long clusters), although the east-long clusters are the strongest in BT point of view. This can be explained through the spatial distribution of the WS parameters and the regional EL maps corresponding to it. West-approaching TCs accompanied heavy rainfall on the southern regions with the helps of the topographic effect on their tracks, and of the extended stay on the Korean Peninsula in their extratropical transition, that were not allowed to the east-approaching TCs. On the other hand, some regions had EL that are not directly proportional to the hazards, and this is partly attributed to spatial disparity in wealth and vulnerability. 
Correlation analysis also revealed the importance of rainfall; daily accumulated rainfall is the hazard parameter most strongly correlated with EL among all BT and WS hazard parameters for all clusters except the east-short. The least-correlated hazard parameter is the storm radius, which showed significant correlations with EL for only the short clusters. In conclusion, this study suggests that the TC track is essential in determining how a TC causes damage in South Korea. Thus, damage warning and adaptation policy need to differ for different TC tracks, even though South Korea is relatively small compared to the average TC size.
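As a rough illustration of how fuzzy c-means gives each track a graded membership in every cluster rather than a single hard label, the following Python sketch computes the standard membership update for fixed cluster centres. The function name, the fuzzifier m = 2, and the toy coordinates are assumptions for illustration; none of them are taken from the study.

```python
import math

def fcm_memberships(point, centers, m=2.0):
    """Return the fuzzy c-means membership of `point` in each cluster.

    Standard update: u_i = 1 / sum_j (d_i / d_j) ** (2 / (m - 1)),
    where d_i is the Euclidean distance to centre i and m > 1 is the fuzzifier.
    """
    dists = [math.dist(point, c) for c in centers]
    # A point that coincides with one or more centres belongs fully to them.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dj) ** power for dj in dists) for di in dists]

# Hypothetical two-dimensional track features and four cluster centres.
centers = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0), (4.0, 4.0)]
u = fcm_memberships((1.0, 1.0), centers)
print([round(x, 3) for x in u])
```

The memberships always sum to one, so a track lying midway between clusters is flagged as ambiguous instead of being forced into one group.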
2013-01-01
Background The slaughterhouse is a central processing point for food animals and thus a source of both demographic data (age, breed, sex) and health-related data (reason for condemnation and condemned portions) that are not available through other sources. Using these data for syndromic surveillance is therefore tempting. However many possible reasons for condemnation and condemned portions exist, making the definition of relevant syndromes challenging. The objective of this study was to determine a typology of cattle with at least one portion of the carcass condemned in order to define syndromes. Multiple factor analysis (MFA) in combination with clustering methods was performed using both health-related data and demographic data. Results Analyses were performed on 381,186 cattle with at least one portion of the carcass condemned among the 1,937,917 cattle slaughtered in ten French abattoirs. Results of the MFA and clustering methods led to 12 clusters considered as stable according to year of slaughter and slaughterhouse. One cluster was specific to a disease of public health importance (cysticercosis). Two clusters were linked to the slaughtering process (fecal contamination of heart or lungs and deterioration lesions). Two clusters respectively characterized by chronic liver lesions and chronic peritonitis could be linked to diseases of economic importance to farmers. Three clusters could be linked respectively to reticulo-pericarditis, fatty liver syndrome and farmer’s lung syndrome, which are related to both diseases of economic importance to farmers and herd management issues. Three clusters respectively characterized by arthritis, myopathy and Dark Firm Dry (DFD) meat could notably be linked to animal welfare issues. Finally, one cluster, characterized by bronchopneumonia, could be linked to both animal health and herd management issues. 
Conclusion The statistical approach of combining multiple factor analysis with cluster analysis showed its relevance for the detection of syndromes using available large and complex slaughterhouse data. The advantages of this statistical approach are that it can i) define groups of reasons for condemnation based on meat inspection data, ii) help group reasons for condemnation among a list of various possible reasons for which a consensus among experts could be difficult to reach, and iii) assign each animal to a single syndrome, which allows changes in syndrome trends to be monitored in order to detect unusual patterns in known diseases and the emergence of new diseases. PMID:23628140
Statistical Features of the 2010 Beni-Ilmane, Algeria, Aftershock Sequence
NASA Astrophysics Data System (ADS)
Hamdache, M.; Peláez, J. A.; Gospodinov, D.; Henares, J.
2018-03-01
The aftershock sequence of the 2010 Beni-Ilmane (Mw 5.5) earthquake is studied in depth to analyze the spatial and temporal variability of seismicity parameters of the relationships modeling the sequence. The b value of the frequency-magnitude distribution is examined rigorously. A threshold magnitude of completeness equal to 2.1, using the maximum curvature procedure or the changing point algorithm, and a b value equal to 0.96 ± 0.03 have been obtained for the entire sequence. Two clusters have been identified and characterized by their faulting type, exhibiting b values equal to 0.99 ± 0.05 and 1.04 ± 0.05. Additionally, the temporal decay of the aftershock sequence was examined using a stochastic point process. The analysis was done through the restricted epidemic-type aftershock sequence (RETAS) stochastic model, which allows the possibility to recognize the prevailing clustering pattern of the relaxation process in the examined area. The analysis selected the epidemic-type aftershock sequence (ETAS) model to offer the most appropriate description of the temporal distribution, which presumes that all events in the sequence can cause secondary aftershocks. Finally, the fractal dimensions are estimated using the integral correlation. The obtained D2 values are 2.15 ± 0.01, 2.23 ± 0.01 and 2.17 ± 0.02 for the entire sequence, and for the first and second cluster, respectively. An analysis of the temporal evolution of the fractal dimensions D-2, D0, D2 and the spectral slope has been also performed to derive and characterize the different clusters included in the sequence.
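For readers unfamiliar with how a b value is obtained from a catalogue, the sketch below uses the widely used Aki-Utsu maximum-likelihood estimator. The abstract does not state which estimator the authors applied, so this is an assumption, as are the function name, the 0.1 magnitude bin width, and the toy magnitudes.

```python
import math

def b_value_mle(magnitudes, mc, bin_width=0.1):
    """Aki-Utsu maximum-likelihood b value for events with M >= mc.

    b = log10(e) / (mean(M) - (mc - bin_width / 2))
    Returns (b, sigma_b), where sigma_b = b / sqrt(N) is Aki's
    first-order standard error.
    """
    complete = [m for m in magnitudes if m >= mc]
    n = len(complete)
    if n < 2:
        raise ValueError("need at least two events above the completeness magnitude")
    denom = sum(complete) / n - (mc - bin_width / 2.0)
    if denom <= 0.0:
        raise ValueError("mean magnitude must exceed the corrected threshold")
    b = math.log10(math.e) / denom
    return b, b / math.sqrt(n)

# Hypothetical magnitudes; the 1.5 event falls below mc = 2.1 and is ignored.
b, sigma_b = b_value_mle([2.1, 2.6, 3.1, 1.5], mc=2.1, bin_width=0.0)
print(round(b, 4), round(sigma_b, 4))
```

Events below the completeness magnitude are discarded first, which is why the choice of threshold (2.1 in the abstract) directly affects the estimate.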
NASA Astrophysics Data System (ADS)
Bruynooghe, Michel M.
1998-04-01
In this paper, we present a robust method for automatic object detection and delineation in noisy complex images. The proposed procedure is a three stage process that integrates image segmentation by multidimensional pixel clustering and geometrically constrained optimization of deformable contours. The first step is to enhance the original image by nonlinear unsharp masking. The second step is to segment the enhanced image by multidimensional pixel clustering, using our reducible neighborhoods clustering algorithm that has a very interesting theoretical maximal complexity. Then, candidate objects are extracted and initially delineated by an optimized region merging algorithm, that is based on ascendant hierarchical clustering with contiguity constraints and on the maximization of average contour gradients. The third step is to optimize the delineation of previously extracted and initially delineated objects. Deformable object contours have been modeled by cubic splines. An affine invariant has been used to control the undesired formation of cusps and loops. Non linear constrained optimization has been used to maximize the external energy. This avoids the difficult and non reproducible choice of regularization parameters, that are required by classical snake models. The proposed method has been applied successfully to the detection of fine and subtle microcalcifications in X-ray mammographic images, to defect detection by moire image analysis, and to the analysis of microrugosities of thin metallic films. The later implementation of the proposed method on a digital signal processor associated to a vector coprocessor would allow the design of a real-time object detection and delineation system for applications in medical imaging and in industrial computer vision.
Long-term surface EMG monitoring using K-means clustering and compressive sensing
NASA Astrophysics Data System (ADS)
Balouchestani, Mohammadreza; Krishnan, Sridhar
2015-05-01
In this work, we present an advanced K-means clustering algorithm based on Compressed Sensing (CS) theory in combination with the K-Singular Value Decomposition (K-SVD) method for clustering long-term recordings of surface electromyography (sEMG) signals. Long-term monitoring of sEMG signals aims at recording the electrical activity produced by muscles, which is a very useful procedure for treatment and diagnostic purposes as well as for the detection of various pathologies. The proposed algorithm is examined for three scenarios of sEMG signals: a healthy person (sEMG-Healthy), a patient with myopathy (sEMG-Myopathy), and a patient with neuropathy (sEMG-Neuropathy). The proposed algorithm can easily scan large datasets of long-term sEMG recordings. We test the proposed algorithm with the Principal Component Analysis (PCA) and Linear Correlation Coefficient (LCC) dimensionality reduction methods. Then, the output of the proposed algorithm is fed to K-Nearest Neighbours (K-NN) and Probabilistic Neural Network (PNN) classifiers in order to calculate the clustering performance. The proposed algorithm achieves a classification accuracy of 99.22%. It reduces the Average Classification Error (ACE) by 17%, the Training Error (TE) by 9%, and the Root Mean Square Error (RMSE) by 18%. The proposed algorithm also reduces clustering energy consumption by 14% compared to the existing K-means clustering algorithm.
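The baseline that the abstract compares against is ordinary K-means. The following minimal sketch shows plain Lloyd iterations only; it does not include the compressed-sensing or K-SVD steps of the proposed method, and the function name, the deterministic initialisation, and the toy feature vectors are assumptions made for illustration.

```python
import math

def kmeans(points, k, iterations=50):
    """Plain Lloyd's K-means; returns (centroids, labels).

    Centroids are initialised from the first k points so that the
    run is deterministic.
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels

# Hypothetical two-dimensional feature vectors forming two groups.
features = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (10.0, 11.0), (1.0, 0.0), (11.0, 10.0)]
centroids, labels = kmeans(features, k=2)
print(labels)
```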
Ancient genomic architecture for mammalian olfactory receptor clusters
Aloni, Ronny; Olender, Tsviya; Lancet, Doron
2006-01-01
Background Mammalian olfactory receptor (OR) genes reside in numerous genomic clusters of up to several dozen genes. Whole-genome sequence alignment nets of five mammals allow their comprehensive comparison, aimed at reconstructing the ancestral olfactory subgenome. Results We developed a new and general tool for genome-wide definition of genomic gene clusters conserved in multiple species. Syntenic orthologs, defined as gene pairs showing conservation of both genomic location and coding sequence, were subjected to a graph theory algorithm for discovering CLICs (clusters in conservation). When applied to ORs in five mammals, including the marsupial opossum, more than 90% of the OR genes were found within a framework of 48 multi-species CLICs, invoking a general conservation of gene order and composition. A detailed analysis of individual CLICs revealed multiple differences among species, interpretable through species-specific genomic rearrangements and reflecting complex mammalian evolutionary dynamics. One significant instance involves CLIC #1, which lacks a human member, implying the human-specific deletion of an OR cluster, whose mouse counterpart has been tentatively associated with isovaleric acid odorant detection. Conclusion The identified multi-species CLICs demonstrate that most of the mammalian OR clusters have a common ancestry, preceding the split between marsupials and placental mammals. However, only two of these CLICs were capable of incorporating chicken OR genes, parsimoniously implying that all other CLICs emerged subsequent to the avian-mammalian divergence. PMID:17010214
The FOSS GIS Workbench on the GFZ Load Sharing Facility compute cluster
NASA Astrophysics Data System (ADS)
Löwe, P.; Klump, J.; Thaler, J.
2012-04-01
Compute clusters can be used as GIS workbenches: their wealth of resources allows us to take on geocomputation tasks which exceed the limitations of smaller systems. Harnessing these capabilities requires a Geographic Information System (GIS) that is able to utilize the available cluster configuration/architecture and offers a sufficient degree of user friendliness to allow for wide application. In this paper we report on the first successful porting of GRASS GIS, the oldest and largest Free Open Source (FOSS) GIS project, onto a compute cluster using Platform Computing's Load Sharing Facility (LSF). In 2008, GRASS6.3 was installed on the GFZ compute cluster, which at that time comprised 32 nodes. The interaction with the GIS was limited to the command line interface, which required further development to encapsulate the GRASS GIS business layer to facilitate its use by users not familiar with GRASS GIS. During the summer of 2011, multiple versions of GRASS GIS (v 6.4, 6.5 and 7.0) were installed on the upgraded GFZ compute cluster, now consisting of 234 nodes with 480 CPUs providing 3084 cores. The GFZ compute cluster currently offers 19 different processing queues with varying hardware capabilities and priorities, allowing for fine-grained scheduling and load balancing. After successful testing of core GIS functionalities, including the graphical user interface, mechanisms were developed to deploy scripted geocomputation tasks onto dedicated processing queues. The mechanisms are based on earlier work by NETELER et al. (2008). A first application of the new GIS functionality was the generation of maps of simulated tsunamis in the Mediterranean Sea for the Tsunami Atlas of the FP-7 TRIDEC Project (www.tridec-online.eu). For this, up to 500 processing nodes were used in parallel. Further trials included the processing of geometrically complex problems, requiring significant amounts of processing time.
The GIS cluster successfully completed all these tasks, with processing times of up to a full 20 CPU days. The deployment of GRASS GIS on a compute cluster allows our users to tackle GIS tasks previously out of reach of single workstations. In addition, this GRASS GIS cluster implementation will be made available to other users at GFZ in the course of 2012. It will thus become a research utility in the sense of "Software as a Service" (SaaS) and can be seen as our first step towards building a GFZ corporate cloud service.
Cluster synchronization transmission of different external signals in discrete uncertain network
NASA Astrophysics Data System (ADS)
Li, Chengren; Lü, Ling; Chen, Liansong; Hong, Yixuan; Zhou, Shuang; Yang, Yiming
2018-07-01
We study the cluster synchronization transmission of different external signals in a discrete uncertain network. Based on the Lyapunov theorem, the network controller and the identification law of the uncertain adjustment parameter are designed, and they are used efficiently to achieve cluster synchronization and to identify the uncertain adjustment parameter. In our technical scheme, the network nodes in each cluster and the transmitted external signals can be different, and uncertain parameters are allowed in the network. In particular, the clustering topology, the number of clusters, and the number of nodes in each cluster can be chosen freely.
Using Cluster Analysis to Examine Husband-Wife Decision Making
ERIC Educational Resources Information Center
Bonds-Raacke, Jennifer M.
2006-01-01
Cluster analysis has a rich history in many disciplines and although cluster analysis has been used in clinical psychology to identify types of disorders, its use in other areas of psychology has been less popular. The purpose of the current experiments was to use cluster analysis to investigate husband-wife decision making. Cluster analysis was…
Planck intermediate results. XLIII. Spectral energy distribution of dust in clusters of galaxies
NASA Astrophysics Data System (ADS)
Planck Collaboration; Adam, R.; Ade, P. A. R.; Aghanim, N.; Ashdown, M.; Aumont, J.; Baccigalupi, C.; Banday, A. J.; Barreiro, R. B.; Bartolo, N.; Battaner, E.; Benabed, K.; Benoit-Lévy, A.; Bersanelli, M.; Bielewicz, P.; Bikmaev, I.; Bonaldi, A.; Bond, J. R.; Borrill, J.; Bouchet, F. R.; Burenin, R.; Burigana, C.; Calabrese, E.; Cardoso, J.-F.; Catalano, A.; Chiang, H. C.; Christensen, P. R.; Churazov, E.; Colombo, L. P. L.; Combet, C.; Comis, B.; Couchot, F.; Crill, B. P.; Curto, A.; Cuttaia, F.; Danese, L.; Davis, R. J.; de Bernardis, P.; de Rosa, A.; de Zotti, G.; Delabrouille, J.; Désert, F.-X.; Diego, J. M.; Dole, H.; Doré, O.; Douspis, M.; Ducout, A.; Dupac, X.; Elsner, F.; Enßlin, T. A.; Finelli, F.; Forni, O.; Frailis, M.; Fraisse, A. A.; Franceschi, E.; Galeotta, S.; Ganga, K.; Génova-Santos, R. T.; Giard, M.; Giraud-Héraud, Y.; Gjerløw, E.; González-Nuevo, J.; Górski, K. M.; Gregorio, A.; Gruppuso, A.; Gudmundsson, J. E.; Hansen, F. K.; Harrison, D. L.; Hernández-Monteagudo, C.; Herranz, D.; Hildebrandt, S. R.; Hivon, E.; Hobson, M.; Hornstrup, A.; Hovest, W.; Hurier, G.; Jaffe, A. H.; Jaffe, T. R.; Jones, W. C.; Keihänen, E.; Keskitalo, R.; Khamitov, I.; Kisner, T. S.; Kneissl, R.; Knoche, J.; Kunz, M.; Kurki-Suonio, H.; Lagache, G.; Lähteenmäki, A.; Lamarre, J.-M.; Lasenby, A.; Lattanzi, M.; Lawrence, C. R.; Leonardi, R.; Levrier, F.; Liguori, M.; Lilje, P. B.; Linden-Vørnle, M.; López-Caniego, M.; Macías-Pérez, J. F.; Maffei, B.; Maggio, G.; Mandolesi, N.; Mangilli, A.; Maris, M.; Martin, P. G.; Martínez-González, E.; Masi, S.; Matarrese, S.; Melchiorri, A.; Mennella, A.; Migliaccio, M.; Miville-Deschênes, M.-A.; Moneti, A.; Montier, L.; Morgante, G.; Mortlock, D.; Munshi, D.; Murphy, J. A.; Naselsky, P.; Nati, F.; Natoli, P.; Nørgaard-Nielsen, H. U.; Novikov, D.; Novikov, I.; Oxborrow, C. 
A.; Pagano, L.; Pajot, F.; Paoletti, D.; Pasian, F.; Perdereau, O.; Perotto, L.; Pettorino, V.; Piacentini, F.; Piat, M.; Plaszczynski, S.; Pointecouteau, E.; Polenta, G.; Ponthieu, N.; Pratt, G. W.; Prunet, S.; Puget, J.-L.; Rachen, J. P.; Rebolo, R.; Reinecke, M.; Remazeilles, M.; Renault, C.; Renzi, A.; Ristorcelli, I.; Rocha, G.; Rosset, C.; Rossetti, M.; Roudier, G.; Rubiño-Martín, J. A.; Rusholme, B.; Santos, D.; Savelainen, M.; Savini, G.; Scott, D.; Stolyarov, V.; Stompor, R.; Sudiwala, R.; Sunyaev, R.; Sutton, D.; Suur-Uski, A.-S.; Sygnet, J.-F.; Tauber, J. A.; Terenzi, L.; Toffolatti, L.; Tomasi, M.; Tristram, M.; Tucci, M.; Valenziano, L.; Valiviita, J.; Van Tent, F.; Vielva, P.; Villa, F.; Wade, L. A.; Wehus, I. K.; Yvon, D.; Zacchei, A.; Zonca, A.
2016-12-01
Although infrared (IR) overall dust emission from clusters of galaxies has been statistically detected using data from the Infrared Astronomical Satellite (IRAS), it has not been possible to sample the spectral energy distribution (SED) of this emission over its peak, and thus to break the degeneracy between dust temperature and mass. By complementing the IRAS spectral coverage with Planck satellite data from 100 to 857 GHz, we provide new constraints on the IR spectrum of thermal dust emission in clusters of galaxies. We achieve this by using a stacking approach for a sample of several hundred objects from the Planck cluster sample. This procedure averages out fluctuations from the IR sky, allowing us to reach a significant detection of the faint cluster contribution. We also use the large frequency range probed by Planck, together with component-separation techniques, to remove the contamination from both cosmic microwave background anisotropies and the thermal Sunyaev-Zeldovich effect (tSZ) signal, which dominate at ν ≤ 353 GHz. By excluding dominant spurious signals or systematic effects, averaged detections are reported at frequencies 353 GHz ≤ ν ≤ 5000 GHz. We confirm the presence of dust in clusters of galaxies at low and intermediate redshifts, yielding an SED with a shape similar to that of the Milky Way. Planck's resolution does not allow us to investigate the detailed spatial distribution of this emission (e.g. whether it comes from intergalactic dust or simply the dust content of the cluster galaxies), but the radial distribution of the emission appears to follow that of the stacked SZ signal, and thus the extent of the clusters. The recovered SED allows us to constrain the dust mass responsible for the signal and its temperature.
Planck intermediate results: XLIII. Spectral energy distribution of dust in clusters of galaxies
Adam, R.; Ade, P. A. R.; Aghanim, N.; ...
2016-12-12
Although infrared (IR) overall dust emission from clusters of galaxies has been statistically detected using data from the Infrared Astronomical Satellite (IRAS), it has not been possible to sample the spectral energy distribution (SED) of this emission over its peak, and thus to break the degeneracy between dust temperature and mass. By complementing the IRAS spectral coverage with Planck satellite data from 100 to 857 GHz, we provide in this paper new constraints on the IR spectrum of thermal dust emission in clusters of galaxies. We achieve this by using a stacking approach for a sample of several hundred objects from the Planck cluster sample. This procedure averages out fluctuations from the IR sky, allowing us to reach a significant detection of the faint cluster contribution. We also use the large frequency range probed by Planck, together with component-separation techniques, to remove the contamination from both cosmic microwave background anisotropies and the thermal Sunyaev-Zeldovich effect (tSZ) signal, which dominate at ν ≤ 353 GHz. By excluding dominant spurious signals or systematic effects, averaged detections are reported at frequencies 353 GHz ≤ ν ≤ 5000 GHz. We confirm the presence of dust in clusters of galaxies at low and intermediate redshifts, yielding an SED with a shape similar to that of the Milky Way. Planck’s resolution does not allow us to investigate the detailed spatial distribution of this emission (e.g. whether it comes from intergalactic dust or simply the dust content of the cluster galaxies), but the radial distribution of the emission appears to follow that of the stacked SZ signal, and thus the extent of the clusters. Finally, the recovered SED allows us to constrain the dust mass responsible for the signal and its temperature.
Advanced multivariate analysis to assess remediation of hydrocarbons in soils.
Lin, Deborah S; Taylor, Peter; Tibbett, Mark
2014-10-01
Accurate monitoring of degradation levels in soils is essential in order to understand and achieve complete degradation of petroleum hydrocarbons in contaminated soils. We aimed to develop the use of multivariate methods for the monitoring of biodegradation of diesel in soils and to determine if diesel contaminated soils could be remediated to a chemical composition similar to that of an uncontaminated soil. An incubation experiment was set up with three contrasting soil types. Each soil was exposed to diesel at varying stages of degradation and then analysed for key hydrocarbons throughout 161 days of incubation. Hydrocarbon distributions were analysed by Principal Coordinate Analysis and similar samples grouped by cluster analysis. Variation and differences between samples were determined using permutational multivariate analysis of variance. It was found that all soils followed trajectories approaching the chemical composition of the unpolluted soil. Some contaminated soils were no longer significantly different to that of uncontaminated soil after 161 days of incubation. The use of cluster analysis allows the assignment of a percentage chemical similarity of a diesel contaminated soil to an uncontaminated soil sample. This will aid in the monitoring of hydrocarbon contaminated sites and the establishment of potential endpoints for successful remediation.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Popescu, Bogdan; Hanson, M. M.
2010-04-10
We present Monte Carlo models of open stellar clusters with the purpose of mapping out the behavior of integrated colors with mass and age. Our cluster simulation package allows for stochastic variations in the stellar mass function to evaluate variations in integrated cluster properties. We find that UBVK colors from our simulations are consistent with simple stellar population (SSP) models, provided the cluster mass is large, M_cluster ≥ 10^6 M_sun. Below this mass, our simulations show two significant effects. First, the mean value of the distribution of integrated colors moves away from the SSP predictions and is less red, in the first 10^7 to 10^8 years in UBV colors, and for all ages in (V - K). Second, the 1σ dispersion of observed colors increases significantly with lower cluster mass. We attribute the former to the reduced number of red luminous stars in most of the lower mass clusters and the latter to the increased stochastic effect of a few of these stars on lower mass clusters. This latter point was always assumed to occur, but we now provide the first public code able to quantify this effect. We are completing a more extensive database of magnitudes and colors as a function of stellar cluster age and mass that will allow the determination of the correlation coefficients among different bands, and improve estimates of cluster age and mass from integrated photometry.
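The stochastic sampling of the stellar mass function described above can be sketched in a few lines. This is not the authors' code: the Salpeter slope of 2.35, the 0.1 to 100 solar-mass limits, the stopping rule, and all names are assumptions chosen only to show why low-mass clusters scatter more than high-mass ones.

```python
import random

def sample_cluster(target_mass, rng, alpha=2.35, m_lo=0.1, m_hi=100.0):
    """Draw stellar masses from dN/dm proportional to m**(-alpha).

    Stars are added until their summed mass reaches target_mass
    (all masses in solar units). Uses inverse-transform sampling of the
    truncated power law.
    """
    a = 1.0 - alpha
    lo, hi = m_lo ** a, m_hi ** a
    masses, total = [], 0.0
    while total < target_mass:
        u = rng.random()
        m = (lo + u * (hi - lo)) ** (1.0 / a)
        masses.append(m)
        total += m
    return masses

rng = random.Random(1)
# Hypothetical experiment: the most massive star in ten realisations of a
# 1000 solar-mass cluster differs from run to run.
heaviest = [max(sample_cluster(1000.0, rng)) for _ in range(10)]
print([round(m, 1) for m in heaviest])
```

Because a handful of luminous massive stars dominate the integrated light, the run-to-run differences in the heaviest star translate into the colour dispersion the abstract reports for low-mass clusters.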
Bestgen, Sebastian; Fuhr, Olaf; Breitung, Ben; Kiran Chakravadhanula, Venkata Sei; Guthausen, Gisela; Hennrich, Frank; Yu, Wen; Kappes, Manfred M; Roesky, Peter W; Fenske, Dieter
2017-03-01
With the aim to synthesize soluble cluster molecules, the silver salt of (4-(tert-butyl)phenyl)methanethiol [AgSCH2C6H4tBu] was applied as a suitable precursor for the formation of a nanoscale silver sulfide cluster. In the presence of 1,6-(diphenylphosphino)hexane (dpph), the 115 nuclear silver cluster [Ag115S34(SCH2C6H4tBu)47(dpph)6] was obtained. The molecular structure of this compound was elucidated by single crystal X-ray analysis and fully characterized by spectroscopic techniques. In contrast to most of the previously published cluster compounds with more than a hundred heavy atoms, this nanoscale inorganic molecule is soluble in organic solvents, which allowed a comprehensive investigation in solution by UV-Vis spectroscopy and one- and two-dimensional NMR spectroscopy including 31P/109Ag-HSQC and DOSY experiments. These are the first heteronuclear NMR investigations on coinage metal chalcogenides. They give some first insight into the behavior of nanoscale silver sulfide clusters in solution. Additionally, molecular weight determinations were performed by 2D analytical ultracentrifugation and HR-TEM investigations confirm the presence of size-homogeneous nanoparticles present in solution.
An intermetallic Au24Ag20 superatom nanocluster stabilized by labile ligands.
Wang, Yu; Su, Haifeng; Xu, Chaofa; Li, Gang; Gell, Lars; Lin, Shuichao; Tang, Zichao; Häkkinen, Hannu; Zheng, Nanfeng
2015-04-08
An intermetallic nanocluster containing 44 metal atoms, Au24Ag20(2-SPy)4(PhC≡C)20Cl2, was successfully synthesized and structurally characterized by single-crystal analysis and density functional theory computations. The 44 metal atoms in the cluster are arranged as a concentric three-shell Au12@Ag20@Au12 Keplerate structure having a high symmetry. For the first time, the co-presence of three different types of anionic ligands (i.e., phenylalkynyl, 2-pyridylthiolate, and chloride) was revealed on the surface of metal nanoclusters. Similar to thiolates, alkynyls bind linearly to surface Au atoms using their σ-bonds, leading to the formation of two types of surface staple units (PhC≡C-Au-L, L = PhC≡C(-) or 2-pyridylthiolate) on the cluster. The co-presence of three different surface ligands allows the site-specific surface and functional modification of the cluster. The lability of PhC≡C(-) ligands on the cluster was demonstrated, making it possible to keep the metal core intact while removing partial surface capping. Moreover, it was found that ligand exchange on the cluster occurs easily to offer various derivatives with the same metal core but different surface functionality and thus different solubility.
Hargreaves, James R; Fearon, Elizabeth; Davey, Calum; Phillips, Andrew; Cambiano, Valentina; Cowan, Frances M
2016-01-05
Pragmatic cluster-randomised trials should seek to make unbiased estimates of effect and be reported according to CONSORT principles, and the study population should be representative of the target population. This is challenging when conducting trials amongst 'hidden' populations without a sample frame. We describe a pair-matched cluster-randomised trial of a combination HIV-prevention intervention to reduce the proportion of female sex workers (FSW) with a detectable HIV viral load in Zimbabwe, recruiting via respondent driven sampling (RDS). We will cross-sectionally survey approximately 200 FSW at baseline and at endline to characterise each of 14 sites. RDS is a variant of chain referral sampling and has been adapted to approximate random sampling. Primary analysis will use the 'RDS-2' method to estimate cluster summaries and will adapt Hayes and Moulton's '2-step' method to adjust effect estimates for individual-level confounders and further adjust for cluster baseline prevalence. We will adapt CONSORT to accommodate RDS. In the absence of observable refusal rates, we will compare the recruitment process between matched pairs. We will need to investigate whether cluster-specific recruitment or the intervention itself affects the accuracy of the RDS estimation process, potentially causing differential biases. To do this, we will calculate RDS-diagnostic statistics for each cluster at each time point and compare these statistics within matched pairs and time points. Sensitivity analyses will assess the impact of potential biases arising from assumptions made by the RDS-2 estimation. We are not aware of any other completed pragmatic cluster RCTs that are recruiting participants using RDS. Our statistical design and analysis approach seeks to transparently document participant recruitment and allow an assessment of the representativeness of the study to the target population, a key aspect of pragmatic trials. 
The challenges we have faced in the design of this trial are likely to be shared in other contexts aiming to serve the needs of legally and/or socially marginalised populations for which no sampling frame exists and especially when the social networks of participants are both the target of intervention and the means of recruitment. The trial was registered at Pan African Clinical Trials Registry (PACTR201312000722390) on 9 December 2013.
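To make the cluster-summary step concrete, the sketch below assumes that 'RDS-2' refers to the inverse-degree-weighted (Volz-Heckathorn, RDS-II) estimator, which weights each respondent by the reciprocal of her reported network size. The function name and the toy data are hypothetical, and the trial's actual analysis code is not shown in the abstract.

```python
def rds2_proportion(outcomes, degrees):
    """Inverse-degree-weighted estimate of a proportion from an RDS sample.

    outcomes: 1 if the respondent has the outcome (e.g. detectable viral
              load), else 0.
    degrees:  reported personal network sizes, all strictly positive.
    """
    if not outcomes or len(outcomes) != len(degrees):
        raise ValueError("outcomes and degrees must be non-empty and equal in length")
    if any(d <= 0 for d in degrees):
        raise ValueError("degrees must be positive")
    weights = [1.0 / d for d in degrees]
    return sum(w * y for w, y in zip(weights, outcomes)) / sum(weights)

# Hypothetical site: well-connected respondents are over-sampled by chain
# referral, so their outcomes are down-weighted.
estimate = rds2_proportion([1, 1, 0, 0, 0], [20, 10, 5, 5, 2])
print(round(estimate, 3))
```

When every respondent reports the same degree, the estimate reduces to the ordinary sample proportion.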
Membership and Coronal Activity in the NGC 2232 and Cr 140 Open Clusters
NASA Technical Reports Server (NTRS)
Patten, Brian M.; Oliversen, Ronald J. (Technical Monitor)
2001-01-01
This is the second annual performance report for our grant "Membership and Coronal Activity in the NGC 2232 and Cr 140 Open Clusters." We propose to identify X-ray sources and extract net source counts in 8 archival ROSAT HRI images in the regions of the NGC 2232 and Cr 140 open clusters. These X-ray data will be combined with ground-based photometry and spectroscopy in order to identify G, K, and early-M type cluster members. At present, no members later than approximately F5 are known for either cluster. With ages of approximately 25 Myr and at a distance of just 320 - 360 pc, the combined late-type membership of the NGC 2232 and Cr 140 clusters will yield an almost unique sample of solar-type stars in the post-T Tauri/pre-main sequence phase of evolution. These stars will be used to assess the level and dispersion in coronal activity levels, as part of a probe of the importance of magnetic braking and the level of magnetic dynamo activity, for solar-type stars just before they reach the ZAMS. Over the past year we have successfully acquired all of the ground-based data necessary to support the analysis of the archival ROSAT X-ray data in the regions around both of these clusters. By the end of 2001 we expect to have completed the reduction and analysis of the ground-based photometry and spectroscopy and will begin the integration of these data with the ROSAT X-ray data. A certain amount of pressure to complete the work on NGC 2232 is coming from the SIRTF project, as this cluster may be a key component to a circumstellar disk evolution GTO program. We are only too happy to try to help and have worked to speed the analysis as much as possible. The primary activity to be undertaken in the next few months is the integration of the ground-based photometry and spectroscopy with the archival ROSAT X-ray data and then writing the paper summarizing our results.
The most time consuming portion of this next phase is, of course, seeing the paper through publication in a peer-reviewed journal. Therefore, we have requested a no-cost extension to the grant to allow us to bring this project to a conclusion.
Managing distance and covariate information with point-based clustering.
Whigham, Peter A; de Graaf, Brandon; Srivastava, Rashmi; Glue, Paul
2016-09-01
Geographic perspectives of disease and the human condition often involve point-based observations and questions of clustering or dispersion within a spatial context. These problems involve a finite set of point observations and are constrained by a larger, but finite, set of locations where the observations could occur. Developing a rigorous method for pattern analysis in this context requires handling spatial covariates, a method for constrained finite spatial clustering, and addressing bias in geographic distance measures. An approach, based on Ripley's K and applied to the problem of clustering with deliberate self-harm (DSH), is presented. Point-based Monte-Carlo simulation of Ripley's K, accounting for socio-economic deprivation and sources of distance measurement bias, was developed to estimate clustering of DSH at a range of spatial scales. A rotated Minkowski L1 distance metric allowed variation in physical distance and clustering to be assessed. Self-harm data were derived from an audit of 2 years' emergency hospital presentations (n = 136) in a New Zealand town (population ~50,000). The study area was defined by residential (housing) land parcels representing a finite set of possible point addresses. Area-based deprivation was spatially correlated. Accounting for deprivation and distance bias showed evidence for clustering of DSH for spatial scales up to 500 m with a one-sided 95 % CI, suggesting that social contagion may be present for this urban cohort. Many problems involve finite locations in geographic space that require estimates of distance-based clustering at many scales. A Monte-Carlo approach to Ripley's K, incorporating covariates and models for distance bias, is crucial when assessing health-related clustering. The case study showed that social network structure defined at the neighbourhood level may account for aspects of neighbourhood clustering of DSH.
Accounting for covariate measures that exhibit spatial clustering, such as deprivation, are crucial when assessing point-based clustering.
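The Monte-Carlo treatment of Ripley's K described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the uncorrected (no edge correction) K estimator, and the choice to handle the covariate by resampling within deprivation strata are all assumptions made for illustration. Points are redrawn from the finite set of candidate locations, and the observed K is compared with a one-sided upper envelope.

```python
import math
import random
from collections import defaultdict


def rotated_l1(p, q, theta=0.0):
    """L1 (Manhattan) distance measured along axes rotated by theta radians."""
    dx, dy = q[0] - p[0], q[1] - p[1]
    c, s = math.cos(theta), math.sin(theta)
    return abs(c * dx + s * dy) + abs(-s * dx + c * dy)


def ripley_k(points, r, area, theta=0.0):
    """Naive Ripley's K at scale r (no edge correction; an assumption here)."""
    n = len(points)
    close = sum(
        1
        for i in range(n)
        for j in range(n)
        if i != j and rotated_l1(points[i], points[j], theta) <= r
    )
    return area * close / (n * (n - 1))


def mc_upper_envelope(candidates, strata_counts, r, area,
                      n_sim=199, level=0.95, theta=0.0, seed=0):
    """One-sided upper envelope of K under random placement on candidates.

    candidates: iterable of (x, y, stratum) for every possible location.
    strata_counts: {stratum: number of observed points in that stratum};
    resampling within strata is a hypothetical stand-in for the covariate
    adjustment mentioned in the abstract.
    """
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for x, y, stratum in candidates:
        by_stratum[stratum].append((x, y))
    sims = []
    for _ in range(n_sim):
        pts = []
        for stratum, count in strata_counts.items():
            pts.extend(rng.sample(by_stratum[stratum], count))
        sims.append(ripley_k(pts, r, area, theta))
    sims.sort()
    return sims[math.ceil(level * n_sim) - 1]
```

An observed K(r) above the returned envelope would be read as evidence of clustering at scale r, beyond what the covariate strata alone produce. Repeating the call over a range of r values and rotation angles gives the multi-scale picture the abstract refers to.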
An Analysis of Rich Cluster Redshift Survey Data for Large Scale Structure Studies
NASA Astrophysics Data System (ADS)
Slinglend, K.; Batuski, D.; Haase, S.; Hill, J.
1994-12-01
The results from the COBE satellite show the existence of structure on scales on the order of 10% or more of the horizon scale of the universe. Rich clusters of galaxies from Abell's catalog show evidence of structure on scales of 100 Mpc and may hold the promise of confirming structure on the scale of the COBE result. However, many Abell clusters have zero or only one measured redshift, so present knowledge of their three dimensional distribution has quite large uncertainties. The shortage of measured redshifts for these clusters may also mask a problem of projection effects corrupting the membership counts for the clusters. Our approach in this effort has been to use the MX multifiber spectrometer on the Steward 2.3m to measure redshifts of at least ten galaxies in each of 80 Abell cluster fields with richness class R>= 1 and mag10 <= 16.8 (estimated z<= 0.12) and zero or one measured redshifts. This work will result in a deeper, more complete (and reliable) sample of positions of rich clusters. Our primary intent for the sample is for two-point correlation and other studies of the large scale structure traced by these clusters in an effort to constrain theoretical models for structure formation. We are also obtaining enough redshifts per cluster so that a much better sample of reliable cluster velocity dispersions will be available for other studies of cluster properties. To date, we have collected such data for 64 clusters, and for most of them, we have seven or more cluster members with redshifts, allowing for reliable velocity dispersion calculations. Velocity histograms and stripe density plots for several interesting cluster fields are presented, along with summary tables of cluster redshift results. 
Also, with 10 or more redshifts in most of our cluster fields (30′ square, just about an `Abell diameter' at z ~ 0.1) we have investigated the extent of projection effects within the Abell catalog in an effort to quantify and understand how this may affect the Abell sample.
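As a small illustration of the velocity dispersion calculation mentioned in this abstract, the sketch below converts member redshifts to rest-frame line-of-sight velocities and takes their sample standard deviation. The function name is hypothetical, and the plain standard deviation is an assumption; the abstract does not say which estimator the authors used, and robust estimators are often preferred for small samples.

```python
import statistics

C_KM_S = 299792.458  # speed of light in km/s


def velocity_dispersion(redshifts):
    """Return (mean redshift, line-of-sight velocity dispersion in km/s).

    Velocities are taken in the cluster rest frame,
    v_i = c (z_i - z_mean) / (1 + z_mean), and the dispersion is the
    sample standard deviation (n - 1 in the denominator).
    """
    z_mean = statistics.fmean(redshifts)
    velocities = [C_KM_S * (z - z_mean) / (1.0 + z_mean) for z in redshifts]
    return z_mean, statistics.stdev(velocities)
```

With only a handful of member redshifts per cluster the resulting dispersion carries a large uncertainty, which is consistent with the abstract's emphasis on obtaining seven or more members per cluster.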
Young massive star clusters in the era of HST and integral field spectroscopy
NASA Astrophysics Data System (ADS)
Zeidler, Peter; Nota, Antonella; Sabbi, Elena; Grebel, Eva K.; Pasquali, Anna
2018-01-01
With an age of 1–2 Myr at a distance of 4 kpc and a total stellar mass of 3.7×10^4 M⊙, Westerlund 2 (Wd2) is one of the most massive young star clusters in the Milky Way. We present a detailed analysis of its prominent pre-main-sequence population using the data of a high-resolution multi-band survey in the optical and near-infrared with the Hubble Space Telescope (HST), in combination with our spectroscopic survey, observed with the VLT/MUSE integral field unit. With our derived high-resolution extinction map of the region, which is absolutely essential given the dominating presence of gas and dust, we derived the spatial dependence of the mass function and quantified the degree of mass segregation down to 0.65 M⊙ with a completeness level better than 50%. Studying the radial dependence of the mass function of Wd2 and quantifying the degree of mass segregation in this young massive star cluster showed that it consists of two sub-clumps, namely the main cluster and the northern clump. From the MUSE data, we can extract individual stellar spectra and spectral energy distributions of the stars, based on the astrometry provided by our high-resolution HST photometric catalog. These data will provide us with an almost complete spectral classification of a young massive star cluster down to 1.0 M⊙. The combination of the MUSE data with 3 more years of approved HST data will allow us to obtain, for the first time, the 3D motions of the stars with an accuracy of 1-2 km s^-1 and to determine the stellar velocity dispersion in order to study the fate of Wd2. This information is of great importance for adjusting the initial conditions in cluster evolution models in order to connect these young massive star clusters and the old globular cluster population.
Additionally, the combination of the photometric and spectroscopic datasets allows us to study the stars and their feedback onto the surrounding HII region simultaneously, as well as peculiar objects such as the massive, eclipsing Wolf-Rayet binary, WR20a or a possible Herbig-Haro object in the northern clump.
Blue intensity matters for cell cycle profiling in fluorescence DAPI-stained images.
Ferro, Anabela; Mestre, Tânia; Carneiro, Patrícia; Sahumbaiev, Ivan; Seruca, Raquel; Sanches, João M
2017-05-01
In the past decades, there has been remarkable progress in the understanding of the molecular mechanisms of the cell cycle. This has been possible largely due to a better conceptualization of the cycle itself, but also as a consequence of technological advances. Herein, we propose a new fluorescence image-based framework targeted at the identification and segmentation of stained nuclei with the purpose of determining DNA content in distinct cell cycle stages. The method is based on discriminative features, such as total intensity and area, retrieved from in situ stained nuclei by fluorescence microscopy, allowing the determination of the cell cycle phase of both single cells and sub-populations of cells. The analysis framework was built on a modified k-means clustering strategy and refined with a Gaussian mixture model classifier, which enabled the definition of highly accurate classification clusters corresponding to G1, S and G2 phases. Using the information retrieved from area and fluorescence total intensity, the modified k-means (k=3) cluster imaging framework classified 64.7% of the imaged nuclei as being at G1 phase, 12.0% at G2 phase and 23.2% at S phase. Performance of the imaging framework was ascertained with normal murine mammary gland cells constitutively expressing the Fucci2 technology, exhibiting an overall sensitivity of 94.0%. Further, the results indicate that the imaging framework has a robust capacity both to assign a given DAPI-stained nucleus to its correct cell cycle phase and to determine, with very high probability, true negatives. Importantly, this novel imaging approach is a non-disruptive method that allows an integrative and simultaneous quantitative analysis of molecular and morphological parameters, thus offering the possibility of cell cycle profiling in cytological and histological samples.
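The two-stage idea in this abstract (k-means with k=3, then refinement with a Gaussian mixture) can be sketched on a single feature. This is an illustrative assumption, not the paper's method: the abstract describes a modified k-means on area and total intensity, whereas the sketch below uses plain Lloyd's k-means on one intensity value per nucleus, quantile initialisation, and a few EM steps for a one-dimensional mixture. All function names are hypothetical.

```python
import math


def kmeans_1d(values, k=3, n_iter=50):
    """Plain Lloyd's k-means on scalars, initialised at sorted quantiles."""
    srt = sorted(values)
    centers = [srt[int((i + 0.5) * len(srt) / k)] for i in range(k)]

    def assign(cs):
        return [min(range(k), key=lambda c: abs(v - cs[c])) for v in values]

    for _ in range(n_iter):
        labels = assign(centers)
        new = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new.append(sum(members) / len(members) if members else centers[c])
        if new == centers:
            break
        centers = new
    return centers, assign(centers)


def _responsibilities(values, weights, means, variances):
    """E-step: posterior probability of each component for each value."""
    resp = []
    for v in values:
        dens = [
            w * math.exp(-((v - m) ** 2) / (2 * s2)) / math.sqrt(2 * math.pi * s2)
            for w, m, s2 in zip(weights, means, variances)
        ]
        total = sum(dens) or 1e-300  # guard against underflow to zero
        resp.append([d / total for d in dens])
    return resp


def gmm_refine(values, labels, k=3, n_iter=20, var_floor=1e-6):
    """Refine a hard clustering with EM for a 1D Gaussian mixture.

    Assumes every initial cluster has at least one member.
    """
    n = len(values)
    weights, means, variances = [], [], []
    for c in range(k):
        members = [v for v, lab in zip(values, labels) if lab == c]
        mu = sum(members) / len(members)
        var = sum((v - mu) ** 2 for v in members) / len(members)
        weights.append(len(members) / n)
        means.append(mu)
        variances.append(max(var, var_floor))
    for _ in range(n_iter):
        resp = _responsibilities(values, weights, means, variances)
        for c in range(k):  # M-step
            nk = sum(r[c] for r in resp)
            if nk == 0:
                continue
            weights[c] = nk / n
            means[c] = sum(r[c] * v for r, v in zip(resp, values)) / nk
            var = sum(r[c] * (v - means[c]) ** 2 for r, v in zip(resp, values)) / nk
            variances[c] = max(var, var_floor)
    resp = _responsibilities(values, weights, means, variances)
    refined = [max(range(k), key=lambda c: r[c]) for r in resp]
    return means, refined
```

Because the initial centers come from sorted quantiles, the component indices are ordered by intensity, so under the usual reading of DNA content index 0 would correspond to G1, 1 to S, and 2 to G2. The mixture step mainly matters near the boundaries between phases, where soft responsibilities can reassign nuclei that k-means placed by distance alone.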
McCann, Cooper; Repasky, Kevin S.; Morin, Mikindra; ...
2017-05-23
Hyperspectral image analysis has benefited from an array of methods that take advantage of the increased spectral depth compared to multispectral sensors; however, the focus of these developments has been on supervised classification methods. Lack of a priori knowledge regarding land cover characteristics can make unsupervised classification methods preferable under certain circumstances. An unsupervised classification technique is presented in this paper that utilizes physically relevant basis functions to model the reflectance spectra. These fit parameters used to generate the basis functions allow clustering based on spectral characteristics rather than spectral channels and provide both noise and data reduction. Histogram splitting of the fit parameters is then used as a means of producing an unsupervised classification. Unlike current unsupervised classification techniques that rely primarily on Euclidean distance measures to determine similarity, the unsupervised classification technique uses the natural splitting of the fit parameters associated with the basis functions creating clusters that are similar in terms of physical parameters. The data set used in this work utilizes the publicly available data collected at Indian Pines, Indiana. This data set provides reference data allowing for comparisons of the efficacy of different unsupervised data analysis. The unsupervised histogram splitting technique presented in this paper is shown to be better than the standard unsupervised ISODATA clustering technique with an overall accuracy of 34.3/19.0% before merging and 40.9/39.2% after merging. Finally, this improvement is also seen as an improvement of kappa before/after merging of 24.8/30.5 for the histogram splitting technique compared to 15.8/28.5 for ISODATA.
DOE Office of Scientific and Technical Information (OSTI.GOV)
McCann, Cooper; Repasky, Kevin S.; Morin, Mikindra
Hyperspectral image analysis has benefited from an array of methods that take advantage of the increased spectral depth compared to multispectral sensors; however, the focus of these developments has been on supervised classification methods. Lack of a priori knowledge regarding land cover characteristics can make unsupervised classification methods preferable under certain circumstances. An unsupervised classification technique is presented in this paper that utilizes physically relevant basis functions to model the reflectance spectra. These fit parameters used to generate the basis functions allow clustering based on spectral characteristics rather than spectral channels and provide both noise and data reduction. Histogram splitting of the fit parameters is then used as a means of producing an unsupervised classification. Unlike current unsupervised classification techniques that rely primarily on Euclidean distance measures to determine similarity, the unsupervised classification technique uses the natural splitting of the fit parameters associated with the basis functions creating clusters that are similar in terms of physical parameters. The data set used in this work utilizes the publicly available data collected at Indian Pines, Indiana. This data set provides reference data allowing for comparisons of the efficacy of different unsupervised data analysis. The unsupervised histogram splitting technique presented in this paper is shown to be better than the standard unsupervised ISODATA clustering technique with an overall accuracy of 34.3/19.0% before merging and 40.9/39.2% after merging. Finally, this improvement is also seen as an improvement of kappa before/after merging of 24.8/30.5 for the histogram splitting technique compared to 15.8/28.5 for ISODATA.
MASGOMAS project: building a bona-fide catalog of massive star cluster candidates
NASA Astrophysics Data System (ADS)
Herrero, Artemio; Rübke, Klaus; Ramírez Alegría, Sebastián; Garcia, Miriam; Marín-Franch, Antonio
2017-11-01
MASGOMAS (MAssive Stars in Galactic Obscured MAssive clusterS) is a project aiming at discovering OB stars in Galactic, dust enshrouded, star-forming massive clusters (Marín-Franch et al. 2009, A&A 502, 559). The project has gone through different phases of increasing automatization, that have allowed us to discover massive clusters like MASGOMAS-1 (Ramírez Alegría et al. 2012, A&A 541, A75) (with M~20,000 M⊙).
Belianinov, Alex; Panchapakesan, G.; Lin, Wenzhi; ...
2014-12-02
Atomic level spatial variability of electronic structure in Fe-based superconductor FeTe0.55Se0.45 (Tc = 15 K) is explored using current-imaging tunneling-spectroscopy. Multivariate statistical analysis of the data differentiates regions of dissimilar electronic behavior that can be identified with the segregation of chalcogen atoms, as well as boundaries between terminations and near neighbor interactions. Subsequent clustering analysis allows identification of the spatial localization of these dissimilar regions. Similar statistical analysis of modeled calculated density of states of chemically inhomogeneous FeTe1−xSex structures further confirms that the two types of chalcogens, i.e., Te and Se, can be identified by their electronic signature and differentiated by their local chemical environment. This approach allows detailed chemical discrimination of the scanning tunneling microscopy data including separation of atomic identities, proximity, and local configuration effects and can be universally applicable to chemically and electronically inhomogeneous surfaces.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Belianinov, Alex, E-mail: belianinova@ornl.gov; Ganesh, Panchapakesan; Lin, Wenzhi
2014-12-01
Atomic level spatial variability of electronic structure in Fe-based superconductor FeTe0.55Se0.45 (Tc = 15 K) is explored using current-imaging tunneling-spectroscopy. Multivariate statistical analysis of the data differentiates regions of dissimilar electronic behavior that can be identified with the segregation of chalcogen atoms, as well as boundaries between terminations and near neighbor interactions. Subsequent clustering analysis allows identification of the spatial localization of these dissimilar regions. Similar statistical analysis of modeled calculated density of states of chemically inhomogeneous FeTe1−xSex structures further confirms that the two types of chalcogens, i.e., Te and Se, can be identified by their electronic signature and differentiated by their local chemical environment. This approach allows detailed chemical discrimination of the scanning tunneling microscopy data including separation of atomic identities, proximity, and local configuration effects and can be universally applicable to chemically and electronically inhomogeneous surfaces.
Aoki, Shuichiro; Murata, Hiroshi; Fujino, Yuri; Matsuura, Masato; Miki, Atsuya; Tanito, Masaki; Mizoue, Shiro; Mori, Kazuhiko; Suzuki, Katsuyoshi; Yamashita, Takehiro; Kashiwagi, Kenji; Hirasawa, Kazunori; Shoji, Nobuyuki; Asaoka, Ryo
2017-12-01
To investigate the usefulness of the Octopus (Haag-Streit) EyeSuite's cluster trend analysis in glaucoma. Ten visual fields (VFs) with the Humphrey Field Analyzer (Carl Zeiss Meditec), spanning 7.7 years on average, were obtained from 728 eyes of 475 primary open angle glaucoma patients. Mean total deviation (mTD) trend analysis and EyeSuite's cluster trend analysis were performed on various series of VFs (from 1st to 10th: VF1-10 to 6th to 10th: VF6-10). The results of the cluster-based trend analysis, based on different lengths of VF series, were compared against mTD trend analysis. Cluster-based trend analysis and mTD trend analysis results were significantly associated in all clusters and with all lengths of VF series. Between 21.2% and 45.9% (depending on VF series length and location) of clusters were deemed to progress when the mTD trend analysis suggested no progression. On the other hand, 4.8% of eyes were observed to progress using the mTD trend analysis when cluster trend analysis suggested no progression in any two (or more) clusters. Whole field trend analysis can miss local VF progression. Cluster trend analysis appears as robust as mTD trend analysis and useful to assess both sectorial and whole field progression. Cluster-based trend analyses, in particular the definition of two or more progressing clusters, may help clinicians to detect glaucomatous progression in a timelier manner than using a whole field trend analysis, without significantly compromising specificity.
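The contrast between whole-field and cluster-based trend analysis in this abstract can be made concrete with a minimal sketch. It is not the EyeSuite algorithm: the function names, the use of an ordinary least-squares slope, and the slope cutoff of -0.5 dB/year are hypothetical choices for illustration, whereas the real software applies its own significance criteria. Only the "two or more progressing clusters" rule is taken from the abstract.

```python
def ols_slope(times, values):
    """Ordinary least-squares slope of values against times."""
    n = len(times)
    t_mean = sum(times) / n
    v_mean = sum(values) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (v - v_mean) for t, v in zip(times, values))
    return sxy / sxx


def progressing_clusters(times, cluster_series, slope_cutoff=-0.5):
    """Names of clusters whose mean deviation declines faster than the cutoff."""
    return [
        name
        for name, series in cluster_series.items()
        if ols_slope(times, series) < slope_cutoff
    ]


def eye_progresses(times, cluster_series, min_clusters=2, slope_cutoff=-0.5):
    """Cluster rule: flag the eye when at least min_clusters clusters progress."""
    flagged = progressing_clusters(times, cluster_series, slope_cutoff)
    return len(flagged) >= min_clusters


def whole_field_slope(times, cluster_series):
    """Slope of the field-wide mean, here the unweighted mean over clusters."""
    series = list(cluster_series.values())
    mean_series = [sum(vals) / len(vals) for vals in zip(*series)]
    return ols_slope(times, mean_series)
```

When two clusters decline steeply and the rest stay flat, the whole-field slope is diluted by the stable regions and can remain above the cutoff while the cluster rule still flags the eye, which is the situation the abstract describes as local progression missed by whole-field analysis.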
NASA Astrophysics Data System (ADS)
Sørensen, L. K.; Fleig, T.; Olsen, J.
2009-08-01
Aimed at obtaining complete and highly accurate potential energy surfaces for molecules containing heavy elements, we present a new general-order coupled cluster method which can be applied in the framework of the spin-free Dirac formalism. As an initial application we present a systematic study of electron correlation and relativistic effects on the spectroscopic and electric properties of the LiCs molecule in its electronic ground state. In particular, we closely investigate the importance of excitations higher than coupled cluster doubles, spin-free and spin-dependent relativistic effects and the correlation of outer-core electrons on the equilibrium bond length, the harmonic vibrational frequency, the dissociation energy, the dipole moment and the static electric dipole polarizability. We demonstrate that our new implementation allows for highly accurate calculations not only in the bonding region but also along the complete potential curve. The quality of our results is demonstrated by a vibrational analysis where an almost complete set of vibrational levels has been calculated accurately.
A parsimonious characterization of change in global age-specific and total fertility rates
2018-01-01
This study aims to understand trends in global fertility from 1950-2010 through the analysis of age-specific fertility rates. This approach incorporates both the overall level, as when the total fertility rate is modeled, and different patterns of age-specific fertility to examine the relationship between changes in age-specific fertility and fertility decline. Singular value decomposition is used to capture the variation in age-specific fertility curves while reducing the number of dimensions, allowing curves to be described nearly fully with three parameters. Regional patterns and trends over time are evident in parameter values, suggesting this method provides a useful tool for considering fertility decline globally. The second and third parameters were analyzed using model-based clustering to examine patterns of age-specific fertility over time and place; four clusters were obtained. A country’s demographic transition can be traced through time by membership in the different clusters, and regional patterns in the trajectories through time and with fertility decline are identified. PMID:29377899
NASA Astrophysics Data System (ADS)
Straus, D. M.
2007-12-01
The probability distribution (pdf) of errors is followed in identical twin studies using the COLA T63 AGCM, integrated with observed SST for 15 recent winters. 30 integrations per winter (for 15 winters) are available with initial errors that are extremely small. The evolution of the pdf is tested for multi-modality, and the results interpreted in terms of clusters / regimes found in: (a) the set of 15x30 integrations mentioned, and (b) a larger ensemble of 55x15 integrations made with the same GCM using the same SSTs. The mapping of pdf evolution and clusters is also carried out for each winter separately, using the clusters found in the 55-member ensemble for the same winter alone. This technique yields information on the change in regimes caused by different boundary forcing (Straus and Molteni, 2004; Straus, Corti and Molteni, 2006). Analysis of the growing errors in terms of baroclinic and barotropic components allows for interpretation of the corresponding instabilities.
What drives the formation of massive stars and clusters?
NASA Astrophysics Data System (ADS)
Ochsendorf, Bram; Meixner, Margaret; Roman-Duval, Julia; Evans, Neal J., II; Rahman, Mubdi; Zinnecker, Hans; Nayak, Omnarayani; Bally, John; Jones, Olivia C.; Indebetouw, Remy
2018-01-01
Galaxy-wide surveys allow us to study star formation in unprecedented ways. In this talk, I will discuss our analysis of the Large Magellanic Cloud (LMC) and the Milky Way, and illustrate how studying both the large and small scale structure of galaxies is critical in addressing the question: what drives the formation of massive stars and clusters? I will show that ‘turbulence-regulated’ star formation models do not reproduce massive star formation properties of GMCs in the LMC and Milky Way: this suggests that theory currently does not capture the full complexity of star formation on small scales. I will also report on the discovery of a massive star forming complex in the LMC, which in many ways manifests itself as an embedded twin of 30 Doradus: this may shed light on the formation of R136 and 'Super Star Clusters' in general. Finally, I will highlight what we can expect in the next years in the field of star formation with large-scale sky surveys, ALMA, and our JWST-GTO program.
Biocompatibility of cluster-assembled nanostructured TiO2 with primary and cancer cells.
Carbone, Roberta; Marangi, Ida; Zanardi, Andrea; Giorgetti, Luca; Chierici, Elisabetta; Berlanda, Giuseppe; Podestà, Alessandro; Fiorentini, Francesca; Bongiorno, Gero; Piseri, Paolo; Pelicci, Pier Giuseppe; Milani, Paolo
2006-06-01
We have characterized the biocompatibility of nanostructured TiO2 films produced by the deposition of a supersonic beam of TiOx clusters. Physical analysis shows that these films possess, at the nanoscale, a granularity and porosity mimicking those of typical extracellular matrix structures and adsorption properties that could allow surface functionalization with different macromolecules such as DNA, proteins, and peptides. To explore the biocompatibility of this novel nanostructured surface, different cancer and primary cells were analyzed in terms of morphological appearance (by bright field microscopy and immunofluorescence) and growth properties, with the aim to evaluate cluster-assembled TiO2 films as substrates for cell-based and tissue-based applications. Our results strongly suggest that this new biomaterial supports normal growth and adhesion of primary and cancer cells with no need for coating with ECM proteins; we thus propose this new material as an optimal substrate for different applications in cell-based assays, biosensors or microfabricated medical devices.
Performance Analysis Tool for HPC and Big Data Applications on Scientific Clusters
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yoo, Wucherl; Koo, Michelle; Cao, Yu
Big data is prevalent in HPC computing. Many HPC projects rely on complex workflows to analyze terabytes or petabytes of data. These workflows often require running over thousands of CPU cores and performing simultaneous data accesses, data movements, and computation. It is challenging to analyze the performance involving terabytes or petabytes of workflow data or measurement data of the executions, from complex workflows over a large number of nodes and multiple parallel task executions. To help identify performance bottlenecks or debug the performance issues in large-scale scientific applications and scientific clusters, we have developed a performance analysis framework, using state-of-the-art open-source big data processing tools. Our tool can ingest system logs and application performance measurements to extract key performance features, and apply the most sophisticated statistical tools and data mining methods on the performance data. It utilizes an efficient data processing engine to allow users to interactively analyze a large amount of different types of logs and measurements. To illustrate the functionality of the big data analysis framework, we conduct case studies on the workflows from an astronomy project known as the Palomar Transient Factory (PTF) and the job logs from the genome analysis scientific cluster. Our study processed many terabytes of system logs and application performance measurements collected on the HPC systems at NERSC. The implementation of our tool is generic enough to be used for analyzing the performance of other HPC systems and Big Data workflows.
Cloud Computing for Pharmacometrics: Using AWS, NONMEM, PsN, Grid Engine, and Sonic
Sanduja, S; Jewell, P; Aron, E; Pharai, N
2015-01-01
Cloud computing allows pharmacometricians to access advanced hardware, network, and security resources available to expedite analysis and reporting. Cloud-based computing environments are available at a fraction of the time and effort when compared to traditional local datacenter-based solutions. This tutorial explains how to get started with building your own personal cloud computer cluster using Amazon Web Services (AWS), NONMEM, PsN, Grid Engine, and Sonic. PMID:26451333
Cloud Computing for Pharmacometrics: Using AWS, NONMEM, PsN, Grid Engine, and Sonic.
Sanduja, S; Jewell, P; Aron, E; Pharai, N
2015-09-01
Cloud computing allows pharmacometricians to access advanced hardware, network, and security resources available to expedite analysis and reporting. Cloud-based computing environments are available at a fraction of the time and effort when compared to traditional local datacenter-based solutions. This tutorial explains how to get started with building your own personal cloud computer cluster using Amazon Web Services (AWS), NONMEM, PsN, Grid Engine, and Sonic.
Ko, Yi-An; Mukherjee, Bhramar; Smith, Jennifer A; Kardia, Sharon L R; Allison, Matthew; Diez Roux, Ana V
2016-11-01
There has been an increased interest in identifying gene-environment interaction (G × E) in the context of multiple environmental exposures. Most G × E studies analyze one exposure at a time, but we are exposed to multiple exposures in reality. Efficient analysis strategies for complex G × E with multiple environmental factors in a single model are still lacking. Using the data from the Multiethnic Study of Atherosclerosis, we illustrate a two-step approach for modeling G × E with multiple environmental factors. First, we utilize common clustering and classification strategies (e.g., k-means, latent class analysis, classification and regression trees, Bayesian clustering using Dirichlet Process) to define subgroups corresponding to distinct environmental exposure profiles. Second, we illustrate the use of an additive main effects and multiplicative interaction model, instead of the conventional saturated interaction model using product terms of factors, to study G × E with the data-driven exposure subgroups defined in the first step. We demonstrate useful analytical approaches to translate multiple environmental exposures into one summary class. These tools not only allow researchers to consider several environmental exposures in G × E analysis but also provide some insight into how genes modify the effect of a comprehensive exposure profile instead of examining effect modification for each exposure in isolation.
Classification of different degrees of adiposity in sedentary rats.
Leopoldo, A S; Lima-Leopoldo, A P; Nascimento, A F; Luvizotto, R A M; Sugizaki, M M; Campos, D H S; da Silva, D C T; Padovani, C R; Cicogna, A C
2016-01-01
In experimental studies, several parameters, such as body weight, body mass index, adiposity index, and dual-energy X-ray absorptiometry, have commonly been used to demonstrate increased adiposity and investigate the mechanisms underlying obesity and sedentary lifestyles. However, these investigations have not classified the degree of adiposity nor defined adiposity categories for rats, such as normal, overweight, and obese. The aim of the study was to characterize the degree of adiposity in rats fed a high-fat diet using cluster analysis and to create adiposity intervals in an experimental model of obesity. Thirty-day-old male Wistar rats were fed a normal (n=41) or a high-fat (n=43) diet for 15 weeks. Obesity was defined based on the adiposity index; and the degree of adiposity was evaluated using cluster analysis. Cluster analysis allowed the rats to be classified into two groups (overweight and obese). The obese group displayed significantly higher total body fat and a higher adiposity index compared with those of the overweight group. No differences in systolic blood pressure or nonesterified fatty acid, glucose, total cholesterol, or triglyceride levels were observed between the obese and overweight groups. The adiposity index of the obese group was positively correlated with final body weight, total body fat, and leptin levels. Despite the classification of sedentary rats into overweight and obese groups, it was not possible to identify differences in the comorbidities between the two groups.
Enrichment Clusters: A Practical Plan for Real-World, Student-Driven Learning.
ERIC Educational Resources Information Center
Renzulli, Joseph S.; Gentry, Marcia; Reis, Sally M.
This guidebook provides a rationale and guidelines for implementing a student-driven learning approach using enrichment clusters. Enrichment clusters allow students who share a common interest to meet each week to produce a product, performance, or targeted service based on that common interest. Chapter 1 discusses different models of learning.…
Kéchichian, Razmig; Valette, Sébastien; Desvignes, Michel; Prost, Rémy
2013-11-01
We derive shortest-path constraints from graph models of structure adjacency relations and introduce them in a joint centroidal Voronoi image clustering and Graph Cut multiobject semiautomatic segmentation framework. The vicinity prior model thus defined is a piecewise-constant model incurring multiple levels of penalization capturing the spatial configuration of structures in multiobject segmentation. Qualitative and quantitative analyses and comparison with a Potts prior-based approach and our previous contribution on synthetic, simulated, and real medical images show that the vicinity prior allows for the correct segmentation of distinct structures having identical intensity profiles and improves the precision of segmentation boundary placement while being fairly robust to clustering resolution. The clustering approach we take to simplify images prior to segmentation strikes a good balance between boundary adaptivity and cluster compactness criteria, and allows the trade-off between them to be controlled. Compared with a direct application of segmentation on voxels, the clustering step improves the overall runtime and memory footprint of the segmentation process by up to an order of magnitude without compromising the quality of the result.
NASA Technical Reports Server (NTRS)
Reese, Erik D.; Mroczkowski, Tony; Menanteau, Felipe; Hilton, Matt; Sievers, Jonathan; Aguirre, Paula; Appel, John William; Baker, Andrew J.; Bond, J. Richard; Das, Sudeep;
2011-01-01
We present follow-up observations with the Sunyaev-Zel'dovich Array (SZA) of optically-confirmed galaxy clusters found in the equatorial survey region of the Atacama Cosmology Telescope (ACT): ACT-CL J0022-0036, ACT-CL J2051+0057, and ACT-CL J2337+0016. ACT-CL J0022-0036 is a newly-discovered, massive (10^15 M⊙), high-redshift (z=0.81) cluster revealed by ACT through the Sunyaev-Zel'dovich effect (SZE). Deep, targeted observations with the SZA allow us to probe a broader range of cluster spatial scales, better disentangle cluster decrements from radio point source emission, and derive more robust integrated SZE flux and mass estimates than we can with ACT data alone. For the two clusters we detect with the SZA we compute integrated SZE signal and derive masses from the SZA data only. ACT-CL J2337+0016, also known as Abell 2631, has archival Chandra data that allow an additional X-ray-based mass estimate. Optical richness is also used to estimate cluster masses and shows good agreement with the SZE and X-ray-based estimates. Based on the point sources detected by the SZA in these three cluster fields and an extrapolation to ACT's frequency, we estimate that point sources could be contaminating the SZE decrement at the ≲ 20% level for some fraction of clusters.
NASA Technical Reports Server (NTRS)
Reese, Erik; Mroczkowski, Tony; Menateau, Felipe; Hilton, Matt; Sievers, Jonathan; Aguirre, Paula; Appel, John William; Baker, Andrew J.; Bond, J. Richard; Das, Sudeep;
2011-01-01
We present follow-up observations with the Sunyaev-Zel'dovich Array (SZA) of optically-confirmed galaxy clusters found in the equatorial survey region of the Atacama Cosmology Telescope (ACT): ACT-CL J0022-0036, ACT-CL J2051+0057, and ACT-CL J2337+0016. ACT-CL J0022-0036 is a newly-discovered, massive (≈ 10^15 M⊙), high-redshift (z = 0.81) cluster revealed by ACT through the Sunyaev-Zel'dovich effect (SZE). Deep, targeted observations with the SZA allow us to probe a broader range of cluster spatial scales, better disentangle cluster decrements from radio point source emission, and derive more robust integrated SZE flux and mass estimates than we can with ACT data alone. For the two clusters we detect with the SZA we compute integrated SZE signal and derive masses from the SZA data only. ACT-CL J2337+0016, also known as Abell 2631, has archival Chandra data that allow an additional X-ray-based mass estimate. Optical richness is also used to estimate cluster masses and shows good agreement with the SZE and X-ray-based estimates. Based on the point sources detected by the SZA in these three cluster fields and an extrapolation to ACT's frequency, we estimate that point sources could be contaminating the SZE decrement at the ≲ 20% level for some fraction of clusters.
Automated extraction and analysis of rock discontinuity characteristics from 3D point clouds
NASA Astrophysics Data System (ADS)
Bianchetti, Matteo; Villa, Alberto; Agliardi, Federico; Crosta, Giovanni B.
2016-04-01
A reliable characterization of fractured rock masses requires an exhaustive geometrical description of discontinuities, including orientation, spacing, and size. These are required to describe discontinuum rock mass structure, perform Discrete Fracture Network and DEM modelling, or provide input for rock mass classification or equivalent continuum estimate of rock mass properties. Although several advanced methodologies have been developed in the last decades, a complete characterization of discontinuity geometry in practice is still challenging, due to scale-dependent variability of fracture patterns and difficult accessibility to large outcrops. Recent advances in remote survey techniques, such as terrestrial laser scanning and digital photogrammetry, allow a fast and accurate acquisition of dense 3D point clouds, which promoted the development of several semi-automatic approaches to extract discontinuity features. Nevertheless, these often need user supervision on algorithm parameters which can be difficult to assess. To overcome this problem, we developed an original Matlab tool, allowing fast, fully automatic extraction and analysis of discontinuity features with no requirements on point cloud accuracy, density and homogeneity. The tool consists of a set of algorithms which: (i) process raw 3D point clouds, (ii) automatically characterize discontinuity sets, (iii) identify individual discontinuity surfaces, and (iv) analyse their spacing and persistence. The tool operates in either a supervised or unsupervised mode, starting from an automatic preliminary exploratory data analysis. The identification and geometrical characterization of discontinuity features is divided into steps. First, coplanar surfaces are identified in the whole point cloud using K-Nearest Neighbor and Principal Component Analysis algorithms optimized on point cloud accuracy and specified typical facet size. Then, discontinuity set orientation is calculated using Kernel Density Estimation and principal vector similarity criteria. Poles to points are assigned to individual discontinuity objects using easy custom vector clustering and Jaccard distance approaches, and each object is segmented into planar clusters using an improved version of the DBSCAN algorithm. Modal set orientations are then recomputed by cluster-based orientation statistics to avoid the effects of biases related to cluster size and density heterogeneity of the point cloud. Finally, spacing values are measured between individual discontinuity clusters along scanlines parallel to modal pole vectors, whereas individual feature size (persistence) is measured using 3D convex hull bounding boxes. Spacing and size are provided both as raw population data and as summary statistics. The tool is optimized for parallel computing on 64-bit systems, and a Graphical User Interface (GUI) has been developed to manage data processing and provide several outputs, including reclassified point clouds, tables, plots, derived fracture intensity parameters, and export to modelling software tools. We present test applications performed both on synthetic 3D data (simple 3D solids) and real case studies, validating the results with existing geomechanical datasets.
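The first step described above (finding locally coplanar points with K-Nearest Neighbor and Principal Component Analysis) can be illustrated with a minimal sketch. This is not the authors' Matlab tool: the function name `estimate_normals`, the neighbourhood size `k`, and the brute-force neighbour search are assumptions chosen for illustration, and Python is used instead of Matlab.

```python
import numpy as np

def estimate_normals(points, k=8):
    """Estimate one unit normal per point from the PCA of its k nearest neighbours.

    Hypothetical helper for illustration; not part of the tool described above.
    """
    points = np.asarray(points, dtype=float)
    # Brute-force pairwise squared distances (only suitable for small clouds).
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=2)
    neighbours = np.argsort(d2, axis=1)[:, :k]  # each row includes the point itself
    normals = np.empty_like(points)
    for i, idx in enumerate(neighbours):
        local = points[idx] - points[idx].mean(axis=0)
        # eigh returns eigenvalues in ascending order, so column 0 is the
        # direction of least variance, i.e. the local plane normal.
        _, vecs = np.linalg.eigh(local.T @ local)
        normals[i] = vecs[:, 0]
    return normals

# Synthetic facet lying in the plane z = 0.
rng = np.random.default_rng(0)
xy = rng.uniform(-1.0, 1.0, size=(50, 2))
plane = np.column_stack([xy, np.zeros(50)])
normals = estimate_normals(plane)
```

For this flat synthetic facet every estimated normal is parallel to the z axis (up to sign). On real point clouds the neighbourhood size would have to be tuned to the facet size and noise level, as the abstract indicates.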
40 CFR 272.1150 - State authorization.
Code of Federal Regulations, 2010 CFR
2010-07-01
... FR 48608) and RCRA Cluster III authorization effective June 24, 1991 (see 56 FR 18517). (b) Michigan... intent to allow such action in a Federal Register notice granting Michigan authorization and RCRA Cluster...
Stefurak, Tres; Calhoun, Georgia B
2007-01-01
The current study sought to explore subtypes of adolescents within a sample of female juvenile offenders. Using the Millon Adolescent Clinical Inventory with 101 female juvenile offenders, a two-step cluster analysis was performed, beginning with a Ward's method hierarchical cluster analysis followed by a K-Means iterative partitioning cluster analysis. The results suggest an optimal three-cluster solution, with cluster profiles leading to the following group labels: Externalizing Problems, Depressed/Interpersonally Ambivalent, and Anxious Prosocial. Analyses along the factors of age, race, offense typology and offense chronicity were conducted to further understand the nature of the clusters found. Only the effect for race was significant, with the Anxious Prosocial and Depressed/Interpersonally Ambivalent clusters appearing disproportionately comprised of African American girls. To establish external validity, clusters were compared across scales of the Behavioral Assessment System for Children - Self Report of Personality, and corroborative distinctions between clusters were found.
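The two-step procedure named above (Ward's hierarchical clustering followed by K-Means partitioning) is commonly implemented by seeding K-Means with the Ward cluster means. The sketch below assumes that seeding strategy, which the abstract does not spell out, and uses synthetic two-dimensional scores rather than inventory data; `two_step_cluster` is a hypothetical name.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.cluster.vq import kmeans2

def two_step_cluster(X, n_clusters):
    """Ward's hierarchical clustering, then k-means started from the Ward means."""
    X = np.asarray(X, dtype=float)
    # Step 1: build the Ward tree and cut it into n_clusters groups (labels 1..n).
    tree = linkage(X, method="ward")
    ward_labels = fcluster(tree, t=n_clusters, criterion="maxclust")
    # Step 2: the Ward group means become the initial k-means centroids.
    seeds = np.array([X[ward_labels == c].mean(axis=0)
                      for c in range(1, n_clusters + 1)])
    centroids, labels = kmeans2(X, seeds, minit="matrix")
    return labels, centroids

# Synthetic stand-in data: three well-separated groups of 30 profiles each.
rng = np.random.default_rng(1)
centres = np.array([[0.0, 0.0], [6.0, 0.0], [0.0, 6.0]])
scores = np.vstack([c + 0.4 * rng.standard_normal((30, 2)) for c in centres])
labels, centroids = two_step_cluster(scores, n_clusters=3)
```

The hierarchical step supplies a data-driven starting point, so the K-Means refinement does not depend on a random initialization.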
Marangi, M; Cantacessi, C; Sparagano, O A E; Camarda, A; Giangaspero, A
2014-12-01
In order to investigate the genetic relationships between Dermanyssus gallinae (Metastigmata: Dermanyssidae) (de Geer) isolates from poultry farms in Italy and other European countries, phylogenetic analysis was performed using a portion of the cytochrome c oxidase subunit 1 (cox1) gene of the mitochondrial DNA and the internal transcribed spacers (ITS1+5.8S+ITS2) of the ribosomal DNA. A total of 360 cox1 sequences and 360 ITS+ sequences were obtained from mites collected on 24 different poultry farms in 10 different regions of Northern and Southern Italy. Phylogenetic analysis of the cox1 sequences resulted in the clustering of two groups (A and B), whereas phylogenetic analysis of the ITS+ resulted in largely unresolved clusters. Knowledge of the genetic make-up of mite populations within countries, together with comparative analyses of D. gallinae isolates from different countries, will provide better understanding of the population dynamics of D. gallinae. This will also allow the identification of genetic markers of emerging acaricide resistance and the development of alternative strategies for the prevention and treatment of infestations. © 2014 The Royal Entomological Society.
Entropy generation across Earth's collisionless bow shock.
Parks, G K; Lee, E; McCarthy, M; Goldstein, M; Fu, S Y; Cao, J B; Canu, P; Lin, N; Wilber, M; Dandouras, I; Réme, H; Fazakerley, A
2012-02-10
Earth's bow shock is a collisionless shock wave but entropy has never been directly measured across it. The plasma experiments on Cluster and Double Star measure 3D plasma distributions upstream and downstream of the bow shock allowing calculation of Boltzmann's entropy function H and his famous H theorem, dH/dt≤0. The collisionless Boltzmann (Vlasov) equation predicts that the total entropy does not change if the distribution function across the shock becomes nonthermal, but it allows changes in the entropy density. Here, we present the first direct measurements of entropy density changes across Earth's bow shock and show that the results generally support the model of the Vlasov analysis. These observations are a starting point for a more sophisticated analysis that includes 3D computer modeling of collisionless shocks with input from observed particles, waves, and turbulences.
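For reference, the quantities discussed above can be written out in standard kinetic-theory notation. The symbols f (distribution function), h (H density), s (entropy density) and S (total entropy) are notation assumed here and are not taken from the paper itself.

```latex
\begin{align}
  h(\mathbf{x},t) &= \int f(\mathbf{x},\mathbf{v},t)\,\ln f(\mathbf{x},\mathbf{v},t)\,\mathrm{d}^3v,
  & H(t) &= \int h(\mathbf{x},t)\,\mathrm{d}^3x, \\
  s(\mathbf{x},t) &= -k_B\,h(\mathbf{x},t),
  & S(t) &= -k_B\,H(t), \\
  \frac{\mathrm{d}H}{\mathrm{d}t} &\le 0 \quad \text{(H theorem, collisional Boltzmann equation)},
  & \frac{\mathrm{d}S}{\mathrm{d}t} &= 0 \quad \text{(Vlasov equation)}.
\end{align}
```

Under the Vlasov equation the total S is conserved, while the local density s can still change from upstream to downstream, which is the quantity the measurements address.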
[Cluster analysis in biomedical researches].
Akopov, A S; Moskovtsev, A A; Dolenko, S A; Savina, G D
2013-01-01
Cluster analysis is one of the most popular methods for the analysis of multi-parameter data. Cluster analysis reveals the internal structure of the data by grouping separate observations according to their degree of similarity. The review defines the basic concepts of cluster analysis and discusses the most popular clustering algorithms: k-means, hierarchical algorithms, and Kohonen network algorithms. Examples of the use of these algorithms in biomedical research are given.
Russi, Luigi; Marconi, Gianpiero; Sharbel, Timothy F.; Veronesi, Fabio; Albertini, Emidio
2015-01-01
Poa pratensis L. is a forage and turf grass species well adapted to a wide range of mesic to moist habitats. Due to its genome complexity, little is known regarding the evolution, genome composition and intraspecific phylogenetic relationships of this species. In the present study we investigated the morphological and genetic diversity of 33 P. pratensis accessions from 23 different countries using both nuclear and chloroplast molecular markers as well as flow cytometry of somatic tissues. The aim was to shed light on the genetic diversity and phylogenetic relationships of the collection, which includes both cultivated and wild materials. Morphological characterization showed that the most relevant traits able to distinguish cultivated from wild forms were spring growth habit and leaf colour. The genome size analysis revealed high variability both within and between accessions in both wild and cultivated materials. The sequence analysis of the trnL-F chloroplast region revealed a low polymorphism level that could be the result of the complex mode of reproduction of this species. In addition, a strong reduction of chloroplast SSR variability was detected in cultivated materials, where only two alleles were conserved out of the four present in wild accessions. In contrast, at the nuclear level, high variability exists in the collection, where the analysis of 11 SSR loci allowed the detection of a total of 91 different alleles. A Bayesian analysis performed on nuclear SSR data revealed that the studied materials belong to two main clusters. While wild materials are equally represented in both clusters, the domesticated forms mostly belong to cluster P2, which is characterized by lower genetic diversity compared to cluster P1. In the Neighbour Joining tree no clear distinction was found between accessions, with the exception of those from China and Mongolia, which were clearly separated from all the others. PMID:25893249
NASA Astrophysics Data System (ADS)
Spina, L.; Randich, S.; Magrini, L.; Jeffries, R. D.; Friel, E. D.; Sacco, G. G.; Pancino, E.; Bonito, R.; Bravi, L.; Franciosini, E.; Klutsch, A.; Montes, D.; Gilmore, G.; Vallenari, A.; Bensby, T.; Bragaglia, A.; Flaccomio, E.; Koposov, S. E.; Korn, A. J.; Lanzafame, A. C.; Smiljanic, R.; Bayo, A.; Carraro, G.; Casey, A. R.; Costado, M. T.; Damiani, F.; Donati, P.; Frasca, A.; Hourihane, A.; Jofré, P.; Lewis, J.; Lind, K.; Monaco, L.; Morbidelli, L.; Prisinzano, L.; Sousa, S. G.; Worley, C. C.; Zaggia, S.
2017-05-01
Context. The radial metallicity distribution in the Galactic thin disc represents a crucial constraint for modelling disc formation and evolution. Open star clusters allow us to derive both the radial metallicity distribution and its evolution over time. Aims: In this paper we perform the first investigation of the present-day radial metallicity distribution based on [Fe/H] determinations in late type members of pre-main-sequence clusters. Because of their youth, these clusters are therefore essential for tracing the current interstellar medium metallicity. Methods: We used the products of the Gaia-ESO Survey analysis of 12 young regions (age < 100 Myr), covering Galactocentric distances from 6.67 to 8.70 kpc. For the first time, we derived the metal content of star forming regions farther than 500 pc from the Sun. Median metallicities were determined through samples of reliable cluster members. For ten clusters the membership analysis is discussed in the present paper, while for the other two clusters (i.e. Chamaeleon I and Gamma Velorum) we adopted the members identified in our previous works. Results: All the pre-main-sequence clusters considered in this paper have close-to-solar or slightly sub-solar metallicities. The radial metallicity distribution traced by these clusters is almost flat, with the innermost star forming regions having [Fe/H] values that are 0.10-0.15 dex lower than the majority of the older clusters located at similar Galactocentric radii. Conclusions: This homogeneous study of the present-day radial metallicity distribution in the Galactic thin disc favours models that predict a flattening of the radial gradient over time. On the other hand, the decrease of the average [Fe/H] at young ages is not easily explained by the models. Our results reveal a complex interplay of several processes (e.g. star formation activity, initial mass function, supernova yields, gas flows) that controlled the recent evolution of the Milky Way.
Based on observations made with the ESO/VLT, at Paranal Observatory, under program 188.B-3002 (The Gaia-ESO Public Spectroscopic Survey). Full Table 1 is only available at the CDS via anonymous ftp to http://cdsarc.u-strasbg.fr (http://130.79.128.5) or via http://cdsarc.u-strasbg.fr/viz-bin/qcat?J/A+A/601/A70
Moreira, Naiara Ferraz; da Veiga, Gloria Valeria; Santaliestra-Pasías, Alba María; Androutsos, Odysseas; Cuenca-García, Magdalena; de Oliveira, Alessandra Silva Dias; Pereira, Rosangela Alves; de Moraes, Anelise Bezerra de Vasconcelos; Van den Bussche, Karen; Censi, Laura; González-Gross, Marcela; Cañada, David; Gottrand, Frederic; Kafatos, Anthony; Marcos, Ascensión; Widhalm, Kurt; Mólnar, Dénes; Moreno, Luis Alberto
2018-01-01
The objective of this study was to identify clustering patterns of four energy balance-related behaviors (EBRB): television (TV) watching, moderate and vigorous physical activity (MVPA), consumption of fruits and vegetables (F&V), and consumption of sugar-sweetened beverages (SSB), among European and Brazilian adolescents. EBRB associations with different body fat composition indicators were then evaluated. Participants included adolescents from eight European countries in the HELENA (Healthy Lifestyle in Europe by Nutrition in Adolescents) study (n = 2,057, 53.8% female; age: 12.5-17.5 years) and from the metropolitan region of Rio de Janeiro/Brazil in the ELANA study (the Adolescent Nutritional Assessment Longitudinal Study) (n = 968, 53.2% female; age: 13.5-19 years). EBRB data allowed for sex- and study-specific clusters. Associations were estimated by ANOVA and odds ratios. Five clustering patterns were identified. Four similar clusters were identified for each sex and study. Among boys, the differing cluster was characterized by high F&V consumption in the HELENA study and by high TV watching and high MVPA time in the ELANA study. Among girls, the differing clusters were characterized by high F&V consumption in both studies and, additionally, high SSB consumption in the ELANA study. Regression analysis showed that clusters characterized by high SSB consumption in European boys; high TV watching, and high TV watching plus high MVPA in Brazilian boys; and high MVPA, and high SSB and F&V consumption in Brazilian girls, were positively associated with different body fat composition indicators. Common clusters were observed in adolescents from Europe and Brazil; however, no cluster was identified as being completely healthy or unhealthy. Each cluster seems to impact on body composition indicators, depending on the group. Public health actions should aim to promote adequate practices of EBRB. Copyright © 2017. Published by Elsevier Ltd.
NASA Astrophysics Data System (ADS)
Tran, T. J.; Bruening, J. M.; Bunn, A. G.; Salzer, M. W.; Weiss, S. B.
2015-12-01
Great Basin bristlecone pine (Pinus longaeva) is a useful climate proxy because of the species' long lifespan (up to 5000 years) and the climatic sensitivity of its annually-resolved rings. Past studies have shown that growth of individual trees can be limited by temperature, soil moisture, or a combination of the two depending on biophysical setting at the scale of tens of meters. We extend recent research suggesting that trees vary in their growth response depending on their position on the landscape to analyze how growth patterns vary over time. We used hierarchical cluster analysis to examine the growth of 52 bristlecone pine trees near the treeline of Mount Washington, Nevada, USA. We classified growth of individual trees over the instrumental climate record into one of two possible scenarios: trees belonging to a temperature-sensitive cluster and trees belonging to a precipitation-sensitive cluster. The number of trees in the precipitation-sensitive cluster outnumbered the number of trees in the temperature-sensitive cluster, with trees in colder locations belonging to the temperature-sensitive cluster. When we separated the temporal range into two sections (1895-1949 and 1950-2002) spanning the length of the instrumental climate record, we found that most of the 52 trees remained loyal to their cluster membership (e.g., trees in the temperature-sensitive cluster in 1895-1949 were also in the temperature sensitive cluster in 1950-2002), though not without exception. Of those trees that do not remain consistent in cluster membership, the majority changed from temperature-sensitive to precipitation-sensitive as time progressed. This could signal a switch from temperature limitation to water limitation with warming climate. We speculate that topographic complexity in high mountain environments like Mount Washington might allow for climate refugia where growth response could remain constant over the Holocene.
Discovery of a large-scale clumpy structure around the Lynx supercluster at z~ 1.27
NASA Astrophysics Data System (ADS)
Nakata, Fumiaki; Kodama, Tadayuki; Shimasaku, Kazuhiro; Doi, Mamoru; Furusawa, Hisanori; Hamabe, Masaru; Kimura, Masahiko; Komiyama, Yutaka; Miyazaki, Satoshi; Okamura, Sadanori; Ouchi, Masami; Sekiguchi, Maki; Ueda, Yoshihiro; Yagi, Masafumi; Yasuda, Naoki
2005-03-01
We report the discovery of a probable large-scale structure composed of many galaxy clumps around the known twin clusters at z = 1.26 and 1.27 in the Lynx region. Our analysis is based on deep, panoramic, and multicolour imaging, 26.4 × 24.1 arcmin^2 in VRi'z' bands with the Suprime-Cam on the 8.2-m Subaru telescope. This unique, deep and wide-field imaging data set allows us for the first time to map out the galaxy distribution in the highest-redshift supercluster known. We apply a photometric redshift technique to extract plausible cluster members at z ~ 1.27 down to i' = 26.15 (5σ) corresponding to ~M* + 2.5 at this redshift. From the two-dimensional distribution of these photometrically selected galaxies, we newly identify seven candidates of galaxy groups or clusters where the surface density of red galaxies is significantly high (>5σ), in addition to the two known clusters. These candidates show clear red colour-magnitude sequences consistent with a passive evolution model, which suggests the existence of additional high-density regions around the Lynx superclusters.
On the lithium dip in the metal poor open cluster NGC 2243
NASA Astrophysics Data System (ADS)
François, P.; Pasquini, L.; Biazzo, K.; Bonifacio, P.; Palsa, R.
2014-05-01
Lithium is a key element for studying the mixing mechanisms operating in stellar interiors. It can also be used to probe the chemical evolution of the Galaxy and the Big Bang nucleosynthesis. Measuring the abundance of lithium in stars belonging to Open Clusters (hereafter OC) allows a detailed comparison with stellar evolutionary models. NGC 2243 is particularly interesting thanks to its relatively low metallicity ([Fe/H] = -0.54 ± 0.10 dex). We performed a detailed analysis of high-resolution spectra obtained with the multi-object facility FLAMES at the VLT 8.2 m telescope. Lithium abundance has been measured in 27 stars. We found a Li dip center of 1.06 M⊙, which is significantly smaller than that observed in solar metallicity and metal-rich clusters. This finding confirms and strengthens the conclusion that the mass of the stars in the Li dip strongly depends on stellar metallicity. The mean Li abundance of the cluster is log n(Li) = 2.70 dex, which is substantially higher than that observed in 47 Tuc. We derived an iron abundance of [Fe/H] = -0.54 ± 0.10 dex for NGC 2243, in agreement (within the errors) with previous findings.
Penalized unsupervised learning with outliers
Witten, Daniela M.
2013-01-01
We consider the problem of performing unsupervised learning in the presence of outliers – that is, observations that do not come from the same distribution as the rest of the data. It is known that in this setting, standard approaches for unsupervised learning can yield unsatisfactory results. For instance, in the presence of severe outliers, K-means clustering will often assign each outlier to its own cluster, or alternatively may yield distorted clusters in order to accommodate the outliers. In this paper, we take a new approach to extending existing unsupervised learning techniques to accommodate outliers. Our approach is an extension of a recent proposal for outlier detection in the regression setting. We allow each observation to take on an “error” term, and we penalize the errors using a group lasso penalty in order to encourage most of the observations’ errors to exactly equal zero. We show that this approach can be used in order to develop extensions of K-means clustering and principal components analysis that result in accurate outlier detection, as well as improved performance in the presence of outliers. These methods are illustrated in a simulation study and on two gene expression data sets, and connections with M-estimation are explored. PMID:23875057
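The idea of giving each observation an "error" term with a group lasso penalty can be sketched for K-means as an alternating scheme. The code below is an illustration under stated assumptions, not the paper's implementation: the objective ½ Σ‖x_i − e_i − μ_c(i)‖² + λ Σ‖e_i‖₂, the penalty value `lam`, the fixed iteration counts, and the names `lloyd` and `penalized_kmeans` are all choices made here. With centroids fixed, each error row has a closed-form update by group soft-thresholding of the residual.

```python
import numpy as np

def lloyd(X, centroids, n_iter=20):
    """Plain k-means (Lloyd) iterations from the given centroids (updated in place)."""
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for k in range(len(centroids)):
            if np.any(labels == k):
                centroids[k] = X[labels == k].mean(axis=0)
    return centroids, labels

def penalized_kmeans(X, init_centroids, lam, n_outer=10):
    """Alternate k-means on X - E with group soft-thresholding of the rows of E."""
    X = np.asarray(X, dtype=float)
    E = np.zeros_like(X)                      # one error vector per observation
    centroids = np.array(init_centroids, dtype=float)
    for _ in range(n_outer):
        centroids, labels = lloyd(X - E, centroids)
        R = X - centroids[labels]             # residuals to assigned centroids
        norms = np.linalg.norm(R, axis=1, keepdims=True)
        # Rows with residual norm <= lam get exactly zero error (not an outlier).
        shrink = np.maximum(0.0, 1.0 - lam / np.maximum(norms, 1e-12))
        E = shrink * R
    return labels, centroids, E

# Two tight synthetic groups plus one gross outlier (row 40).
rng = np.random.default_rng(0)
group_a = 0.3 * rng.standard_normal((20, 2))
group_b = np.array([10.0, 0.0]) + 0.3 * rng.standard_normal((20, 2))
X = np.vstack([group_a, group_b, [[5.0, 40.0]]])
labels, centroids, E = penalized_kmeans(X, init_centroids=X[[0, 20]], lam=5.0)
outliers = np.flatnonzero(np.linalg.norm(E, axis=1) > 0)
```

Observations whose error row stays exactly zero are treated as regular points; rows with a non-zero error are flagged as outliers. In this synthetic example only the planted point at row 40 is flagged.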
Resche-Rigon, Matthieu; White, Ian R
2018-06-01
In multilevel settings such as individual participant data meta-analysis, a variable is 'systematically missing' if it is wholly missing in some clusters and 'sporadically missing' if it is partly missing in some clusters. Previously proposed methods to impute incomplete multilevel data handle either systematically or sporadically missing data, but frequently both patterns are observed. We describe a new multiple imputation by chained equations (MICE) algorithm for multilevel data with arbitrary patterns of systematically and sporadically missing variables. The algorithm is described for multilevel normal data but can easily be extended for other variable types. We first propose two methods for imputing a single incomplete variable: an extension of an existing method and a new two-stage method which conveniently allows for heteroscedastic data. We then discuss the difficulties of imputing missing values in several variables in multilevel data using MICE, and show that even the simplest joint multilevel model implies conditional models which involve cluster means and heteroscedasticity. However, a simulation study finds that the proposed methods can be successfully combined in a multilevel MICE procedure, even when cluster means are not included in the imputation models.
Cosmological parameter estimation from CMB and X-ray cluster after Planck
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hu, Jian-Wei; Cai, Rong-Gen; Guo, Zong-Kuan
We investigate constraints on cosmological parameters in three 8-parameter models with the summed neutrino mass as a free parameter, by a joint analysis of CCCP X-ray cluster data, the newly released Planck CMB data as well as some external data sets including baryon acoustic oscillation measurements from the 6dFGS, SDSS DR7 and BOSS DR9 surveys, and the Hubble Space Telescope H_0 measurement. We find that the combined data strongly favor non-zero neutrino masses at more than 3σ confidence level in these non-vanilla models. Allowing the CMB lensing amplitude A_L to vary, we find A_L > 1 at 3σ confidence level. For dark energy with a constant equation of state w, we obtain w < −1 at 3σ confidence level. The estimate of the matter power spectrum amplitude σ_8 is discrepant with the Planck value at 2σ confidence level, which reflects some tension between X-ray cluster data and Planck data in these non-vanilla models. The tension can be alleviated by adding a 9% systematic shift in the cluster mass function.
Genome sequencing and secondary metabolism of the postharvest pathogen Penicillium griseofulvum.
Banani, Houda; Marcet-Houben, Marina; Ballester, Ana-Rosa; Abbruscato, Pamela; González-Candelas, Luis; Gabaldón, Toni; Spadaro, Davide
2016-01-05
Penicillium griseofulvum is associated in stored apples with blue mould, the most important postharvest disease of pome fruit. This pathogen can simultaneously produce both detrimental and beneficial secondary metabolites (SM). In order to gain insight into SM synthesis in P. griseofulvum in vitro and during disease development on apple, we sequenced the genome of P. griseofulvum strain PG3 and analysed important SM clusters. PG3 genome sequence (29.3 Mb) shows that P. griseofulvum branched off after the divergence of P. oxalicum but before the divergence of P. chrysogenum. Genome-wide analysis of P. griseofulvum revealed putative gene clusters for patulin, griseofulvin and roquefortine C biosynthesis. Furthermore, we quantified the SM production in vitro and on apples during the course of infection. The expression kinetics of key genes of SM produced in infected apple were examined. We found additional SM clusters, including those potentially responsible for the synthesis of penicillin, yanuthone D, cyclopiazonic acid and we predicted a cluster putatively responsible for the synthesis of chanoclavine I. These findings provide relevant information to understand the molecular basis of SM biosynthesis in P. griseofulvum, to allow further research directed to the overexpression or blocking the synthesis of specific SM.
Merging history of three bimodal clusters
NASA Astrophysics Data System (ADS)
Maurogordato, S.; Sauvageot, J. L.; Bourdin, H.; Cappi, A.; Benoist, C.; Ferrari, C.; Mars, G.; Houairi, K.
2011-01-01
We present a combined X-ray and optical analysis of three bimodal galaxy clusters selected as merging candidates at z ~ 0.1. These targets are part of MUSIC (MUlti-Wavelength Sample of Interacting Clusters), which is a general project designed to study the physics of merging clusters by means of multi-wavelength observations. Observations include spectro-imaging with XMM-Newton EPIC camera, multi-object spectroscopy (260 new redshifts), and wide-field imaging at the ESO 3.6 m and 2.2 m telescopes. We build a global picture of these clusters using X-ray luminosity and temperature maps together with galaxy density and velocity distributions. Idealized numerical simulations were used to constrain the merging scenario for each system. We show that A2933 is very likely an equal-mass advanced pre-merger ~200 Myr before the core collapse, while A2440 and A2384 are post-merger systems (~450 Myr and ~1.5 Gyr after core collapse, respectively). In the case of A2384, we detect a spectacular filament of galaxies and gas spreading over more than 1 h^-1 Mpc, which we infer to have been stripped during the previous collision. The analysis of the MUSIC sample allows us to outline some general properties of merging clusters: a strong luminosity segregation of galaxies in recent post-mergers; the existence of preferential axes - corresponding to the merging directions - along which the BCGs and structures on various scales are aligned; the concomitance, in most major merger cases, of secondary merging or accretion events, with groups infalling onto the main cluster, and in some cases the evidence of previous merging episodes in one of the main components. These results are in good agreement with the hierarchical scenario of structure formation, in which clusters are expected to form by successive merging events, and matter is accreted along large-scale filaments.
Based on data obtained with the European Southern Observatory, Chile (programs 072.A-0595, 075.A-0264, and 079.A-0425). Tables 5-7 are only available in electronic form at the CDS via anonymous ftp to cdsarc.u-strasbg.fr (130.79.128.5) or via http://cdsarc.u-strasbg.fr/viz-bin/qcat?J/A+A/525/A79
Abualhaj, Bedor; Weng, Guoyang; Ong, Melissa; Attarwala, Ali Asgar; Molina, Flavia; Büsing, Karen; Glatting, Gerhard
2017-01-01
Dynamic [18F]fluoro-ethyl-L-tyrosine positron emission tomography ([18F]FET-PET) is used to identify tumor lesions for radiotherapy treatment planning, to differentiate glioma recurrence from radiation necrosis and to classify glioma grade. To segment different regions in the brain, k-means cluster analysis can be used. The main disadvantage of k-means is that the number of clusters must be pre-defined. In this study, we therefore compared different cluster validity indices for automated and reproducible determination of the optimal number of clusters based on the dynamic PET data. The k-means algorithm was applied to dynamic [18F]FET-PET images of 8 patients. Akaike information criterion (AIC), WB, I, modified Dunn's and Silhouette indices were compared on their ability to determine the optimal number of clusters based on requirements for an adequate cluster validity index. To check the reproducibility of k-means, the coefficients of variation (CVs) of the objective function values (OFVs; sum of squared Euclidean distances within each cluster) were calculated using 100 random centroid initialization replications (RCI100) for 2 to 50 clusters. k-means was performed independently on three neighboring slices containing tumor for each patient to investigate the stability of the optimal number of clusters within them. To check the independence of the validity indices on the number of voxels, cluster analysis was applied after duplication of a slice selected from each patient. CVs of index values were calculated at the optimal number of clusters using RCI100 to investigate the reproducibility of the validity indices. To check if the indices have a single extremum, visual inspection was performed on the replication with minimum OFV from RCI100. The maximum CV of OFVs was 2.7 × 10^-2 from all patients. The optimal number of clusters given by modified Dunn's and Silhouette indices was 2 or 3, leading to a very poor segmentation. WB and I indices suggested in median 5 [range 4-6] and 4 [range 3-6] clusters, respectively. For WB, I, modified Dunn's and Silhouette validity indices the suggested optimal number of clusters was not affected by the number of voxels. The maximum coefficients of variation of the WB, I, modified Dunn's, and Silhouette validity indices were 3 × 10^-2, 1, 2 × 10^-1, and 3 × 10^-3, respectively. WB-index showed a single global maximum, whereas the other indices showed also local extrema. From the investigated cluster validity indices, the WB-index is best suited for automated determination of the optimal number of clusters for [18F]FET-PET brain images for the investigated image reconstruction algorithm and the used scanner: it yields meaningful results allowing better differentiation of tissues with higher number of clusters, it is simple, reproducible and has a unique global minimum. © 2016 American Association of Physicists in Medicine.
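The reproducibility check described above (coefficient of variation of the k-means objective function value over 100 random centroid initializations) can be sketched as follows. The random data standing in for voxel time-activity curves, the choice k = 5, the iteration count, and the helper name `kmeans_ofv` are assumptions for illustration only.

```python
import numpy as np

def kmeans_ofv(X, k, rng, n_iter=50):
    """Run k-means once from random data-point seeds; return the within-cluster
    sum of squared Euclidean distances (the objective function value)."""
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # keep the old centroid if a cluster empties
                centroids[j] = X[labels == j].mean(axis=0)
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).sum()

rng = np.random.default_rng(0)
X = rng.standard_normal((300, 4))  # synthetic stand-in for voxel curves
ofvs = np.array([kmeans_ofv(X, k=5, rng=rng) for _ in range(100)])
cv = ofvs.std(ddof=1) / ofvs.mean()  # coefficient of variation of the OFV
print(f"CV of OFV over 100 initializations: {cv:.3e}")
```

A small CV indicates that different random initializations converge to solutions of similar quality; repeating the loop for each candidate number of clusters gives the per-k reproducibility profile.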
Workplace cluster of Bell’s palsy in Lima, Peru
2014-01-01
Background: We report on a workplace cluster of Bell’s palsy that occurred within a four-month period in 2011 among employees of a three-story office building in Lima, Peru and our investigation to determine the etiology and associated risk factors. Findings: An outbreak investigation was conducted to identify possible common infectious or environmental exposures and included patient interviews, reviews of medical records, an epidemiologic survey, serological analysis for IgM and IgG antibodies to putative Bell’s palsy-inducing pathogens, and an environmental exposure assessment of the office building. Three cases of Bell’s palsy were reported among 65 at-risk employees (attack rate 4.6%). Although two patients had underlying risk factors, there was no clear association or common identifiable risk factor among all cases. Serologic analysis showed no evidence of recent infections, and air and water sample measures of all known chemical or neurotoxins were below maximum allowable concentrations for exposure. Conclusions: An infection spread among workplace employees could not be excluded as a potential cause of this cluster; however, it was unlikely to be a pathogen commonly associated with individual cases of Bell’s palsy. Although a specific etiology was not identified among all cases, we believe this methodology will aid future outbreak investigations of Bell’s palsy and a better understanding of its etiology. While environmental assessments may be useful in their ability to ascertain the cause of clusters of Bell’s palsy, future investigations should prioritize focus on common infectious etiology. PMID:24885256
Galaxy CloudMan: delivering cloud compute clusters.
Afgan, Enis; Baker, Dannon; Coraor, Nate; Chapman, Brad; Nekrutenko, Anton; Taylor, James
2010-12-21
Widespread adoption of high-throughput sequencing has greatly increased the scale and sophistication of computational infrastructure needed to perform genomic research. An alternative to building and maintaining local infrastructure is "cloud computing", which, in principle, offers on demand access to flexible computational infrastructure. However, cloud computing resources are not yet suitable for immediate "as is" use by experimental biologists. We present a cloud resource management system that makes it possible for individual researchers to compose and control an arbitrarily sized compute cluster on Amazon's EC2 cloud infrastructure without any informatics requirements. Within this system, an entire suite of biological tools packaged by the NERC Bio-Linux team (http://nebc.nerc.ac.uk/tools/bio-linux) is available for immediate consumption. The provided solution makes it possible, using only a web browser, to create a completely configured compute cluster ready to perform analysis in less than five minutes. Moreover, we provide an automated method for building custom deployments of cloud resources. This approach promotes reproducibility of results and, if desired, allows individuals and labs to add or customize an otherwise available cloud system to better meet their needs. The expected knowledge and associated effort with deploying a compute cluster in the Amazon EC2 cloud is not trivial. The solution presented in this paper eliminates these barriers, making it possible for researchers to deploy exactly the amount of computing power they need, combined with a wealth of existing analysis software, to handle the ongoing data deluge.
Galaxy CloudMan: delivering cloud compute clusters
2010-01-01
Background Widespread adoption of high-throughput sequencing has greatly increased the scale and sophistication of computational infrastructure needed to perform genomic research. An alternative to building and maintaining local infrastructure is “cloud computing”, which, in principle, offers on demand access to flexible computational infrastructure. However, cloud computing resources are not yet suitable for immediate “as is” use by experimental biologists. Results We present a cloud resource management system that makes it possible for individual researchers to compose and control an arbitrarily sized compute cluster on Amazon’s EC2 cloud infrastructure without any informatics requirements. Within this system, an entire suite of biological tools packaged by the NERC Bio-Linux team (http://nebc.nerc.ac.uk/tools/bio-linux) is available for immediate consumption. The provided solution makes it possible, using only a web browser, to create a completely configured compute cluster ready to perform analysis in less than five minutes. Moreover, we provide an automated method for building custom deployments of cloud resources. This approach promotes reproducibility of results and, if desired, allows individuals and labs to add or customize an otherwise available cloud system to better meet their needs. Conclusions The expected knowledge and associated effort with deploying a compute cluster in the Amazon EC2 cloud is not trivial. The solution presented in this paper eliminates these barriers, making it possible for researchers to deploy exactly the amount of computing power they need, combined with a wealth of existing analysis software, to handle the ongoing data deluge. PMID:21210983
Bocquet, S.; Saro, A.; Mohr, J. J.; ...
2015-01-30
Here, we present a velocity-dispersion-based mass calibration of the South Pole Telescope Sunyaev-Zel'dovich effect survey (SPT-SZ) galaxy cluster sample. Using a homogeneously selected sample of 100 cluster candidates from 720 deg² of the survey along with 63 velocity dispersion (σ_v) and 16 X-ray Y_X measurements of sample clusters, we simultaneously calibrate the mass-observable relation and constrain cosmological parameters. Our method accounts for cluster selection, cosmological sensitivity, and uncertainties in the mass calibrators. The calibrations using σ_v and Y_X are consistent at the 0.6σ level, with the σ_v calibration preferring ~16% higher masses. We use the full SPTCL data set (SZ clusters+σ_v+Y_X) to measure σ_8(Ω_m/0.27)^0.3 = 0.809 ± 0.036 within a flat ΛCDM model. The SPT cluster abundance is lower than preferred by either the WMAP9 or Planck+WMAP9 polarization (WP) data, but assuming that the sum of the neutrino masses is Σm_ν = 0.06 eV, we find the data sets to be consistent at the 1.0σ level for WMAP9 and 1.5σ for Planck+WP. Allowing for larger Σm_ν further reconciles the results. When we combine the SPTCL and Planck+WP data sets with information from baryon acoustic oscillations and Type Ia supernovae, the preferred cluster masses are 1.9σ higher than the Y_X calibration and 0.8σ higher than the σ_v calibration. Given the scale of these shifts (~44% and ~23% in mass, respectively), we execute a goodness-of-fit test; it reveals no tension, indicating that the best-fit model provides an adequate description of the data. Using the multi-probe data set, we measure Ω_m = 0.299 ± 0.009 and σ_8 = 0.829 ± 0.011. Within a νCDM model we find Σm_ν = 0.148 ± 0.081 eV. We present a consistency test of the cosmic growth rate using SPT clusters.
Allowing both the growth index γ and the dark energy equation-of-state parameter w to vary, we find γ = 0.73 ± 0.28 and w = -1.007 ± 0.065, demonstrating that the expansion and the growth histories are consistent with a ΛCDM universe (γ = 0.55; w = -1).
NASA Astrophysics Data System (ADS)
Bocquet, S.; Saro, A.; Mohr, J. J.; Aird, K. A.; Ashby, M. L. N.; Bautz, M.; Bayliss, M.; Bazin, G.; Benson, B. A.; Bleem, L. E.; Brodwin, M.; Carlstrom, J. E.; Chang, C. L.; Chiu, I.; Cho, H. M.; Clocchiatti, A.; Crawford, T. M.; Crites, A. T.; Desai, S.; de Haan, T.; Dietrich, J. P.; Dobbs, M. A.; Foley, R. J.; Forman, W. R.; Gangkofner, D.; George, E. M.; Gladders, M. D.; Gonzalez, A. H.; Halverson, N. W.; Hennig, C.; Hlavacek-Larrondo, J.; Holder, G. P.; Holzapfel, W. L.; Hrubes, J. D.; Jones, C.; Keisler, R.; Knox, L.; Lee, A. T.; Leitch, E. M.; Liu, J.; Lueker, M.; Luong-Van, D.; Marrone, D. P.; McDonald, M.; McMahon, J. J.; Meyer, S. S.; Mocanu, L.; Murray, S. S.; Padin, S.; Pryke, C.; Reichardt, C. L.; Rest, A.; Ruel, J.; Ruhl, J. E.; Saliwanchik, B. R.; Sayre, J. T.; Schaffer, K. K.; Shirokoff, E.; Spieler, H. G.; Stalder, B.; Stanford, S. A.; Staniszewski, Z.; Stark, A. A.; Story, K.; Stubbs, C. W.; Vanderlinde, K.; Vieira, J. D.; Vikhlinin, A.; Williamson, R.; Zahn, O.; Zenteno, A.
2015-02-01
We present a velocity-dispersion-based mass calibration of the South Pole Telescope Sunyaev-Zel'dovich effect survey (SPT-SZ) galaxy cluster sample. Using a homogeneously selected sample of 100 cluster candidates from 720 deg² of the survey along with 63 velocity dispersion (σ_v) and 16 X-ray Y_X measurements of sample clusters, we simultaneously calibrate the mass-observable relation and constrain cosmological parameters. Our method accounts for cluster selection, cosmological sensitivity, and uncertainties in the mass calibrators. The calibrations using σ_v and Y_X are consistent at the 0.6σ level, with the σ_v calibration preferring ~16% higher masses. We use the full SPTCL data set (SZ clusters+σ_v+Y_X) to measure σ_8(Ω_m/0.27)^0.3 = 0.809 ± 0.036 within a flat ΛCDM model. The SPT cluster abundance is lower than preferred by either the WMAP9 or Planck+WMAP9 polarization (WP) data, but assuming that the sum of the neutrino masses is Σm_ν = 0.06 eV, we find the data sets to be consistent at the 1.0σ level for WMAP9 and 1.5σ for Planck+WP. Allowing for larger Σm_ν further reconciles the results. When we combine the SPTCL and Planck+WP data sets with information from baryon acoustic oscillations and Type Ia supernovae, the preferred cluster masses are 1.9σ higher than the Y_X calibration and 0.8σ higher than the σ_v calibration. Given the scale of these shifts (~44% and ~23% in mass, respectively), we execute a goodness-of-fit test; it reveals no tension, indicating that the best-fit model provides an adequate description of the data. Using the multi-probe data set, we measure Ω_m = 0.299 ± 0.009 and σ_8 = 0.829 ± 0.011. Within a νCDM model we find Σm_ν = 0.148 ± 0.081 eV. We present a consistency test of the cosmic growth rate using SPT clusters.
Allowing both the growth index γ and the dark energy equation-of-state parameter w to vary, we find γ = 0.73 ± 0.28 and w = -1.007 ± 0.065, demonstrating that the expansion and the growth histories are consistent with a ΛCDM universe (γ = 0.55; w = -1).
Clustering the Orion B giant molecular cloud based on its molecular emission
NASA Astrophysics Data System (ADS)
Bron, Emeric; Daudon, Chloé; Pety, Jérôme; Levrier, François; Gerin, Maryvonne; Gratier, Pierre; Orkisz, Jan H.; Guzman, Viviana; Bardeau, Sébastien; Goicoechea, Javier R.; Liszt, Harvey; Öberg, Karin; Peretto, Nicolas; Sievers, Albrecht; Tremblin, Pascal
2018-02-01
Context. Previous attempts at segmenting molecular line maps of molecular clouds have focused on using position-position-velocity data cubes of a single molecular line to separate the spatial components of the cloud. In contrast, wide field spectral imaging over a large spectral bandwidth in the (sub)mm domain now allows one to combine multiple molecular tracers to understand the different physical and chemical phases that constitute giant molecular clouds (GMCs). Aims: We aim at using multiple tracers (sensitive to different physical processes and conditions) to segment a molecular cloud into physically/chemically similar regions (rather than spatially connected components), thus disentangling the different physical/chemical phases present in the cloud. Methods: We use a machine learning clustering method, namely the Meanshift algorithm, to cluster pixels with similar molecular emission, ignoring spatial information. Clusters are defined around each maximum of the multidimensional probability density function (PDF) of the line integrated intensities. Simple radiative transfer models were used to interpret the astrophysical information uncovered by the clustering analysis. Results: A clustering analysis based only on the J = 1-0 lines of three isotopologues of CO proves sufficient to reveal distinct density/column density regimes (n_H ~ 100 cm⁻³, ~500 cm⁻³, and >1000 cm⁻³), closely related to the usual definitions of diffuse, translucent and high-column-density regions. Adding two UV-sensitive tracers, the J = 1-0 line of HCO+ and the N = 1-0 line of CN, allows us to distinguish two clearly distinct chemical regimes, characteristic of UV-illuminated and UV-shielded gas. The UV-illuminated regime shows overbright HCO+ and CN emission, which we relate to a photochemical enrichment effect. We also find a tail of high CN/HCO+ intensity ratio in UV-illuminated regions.
Finer distinctions in density classes (n_H ~ 7 × 10³ cm⁻³, ~4 × 10⁴ cm⁻³) for the densest regions are also identified, likely related to the higher critical density of the CN and HCO+ (1-0) lines. These distinctions are only possible because the high-density regions are spatially resolved. Conclusions: Molecules are versatile tracers of GMCs because their line intensities bear the signature of the physics and chemistry at play in the gas. The association of simultaneous multi-line, wide-field mapping and powerful machine learning methods such as the Meanshift clustering algorithm reveals how to decode the complex information available in these molecular tracers. Data products associated with this paper are available at the CDS via anonymous ftp to http://cdsarc.u-strasbg.fr (http://130.79.128.5) or via http://cdsarc.u-strasbg.fr/viz-bin/qcat?J/A+A/610/A12 and at http://www.iram.fr/~pety/ORION-B
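The Meanshift idea described in the abstract above (each sample climbs to a local maximum of the estimated density, and samples reaching the same maximum share a cluster) can be illustrated with a toy sketch. This is not the authors' implementation: it assumes one-dimensional data and a flat kernel, and the function name `mean_shift_1d`, the `bandwidth` value and the sample values are all hypothetical.

```python
def mean_shift_1d(values, bandwidth, tol=1e-6, max_iter=200):
    """Toy flat-kernel mean shift on 1-D data; returns (modes, labels)."""
    modes = []   # one representative position per detected density maximum
    labels = []  # index into `modes` for every input value
    for v in values:
        x = float(v)
        for _ in range(max_iter):
            # Average all samples inside the window centred on x.
            window = [u for u in values if abs(u - x) <= bandwidth]
            new_x = sum(window) / len(window)
            if abs(new_x - x) < tol:
                break
            x = new_x
        # Merge with an existing mode if the converged point is close to it.
        for i, m in enumerate(modes):
            if abs(m - x) <= bandwidth / 2:
                labels.append(i)
                break
        else:
            modes.append(x)
            labels.append(len(modes) - 1)
    return modes, labels


# Hypothetical integrated intensities with two obvious groups.
intensities = [1.0, 1.2, 0.9, 5.0, 5.1, 4.9]
modes, labels = mean_shift_1d(intensities, bandwidth=1.0)
```

The number of clusters is not fixed in advance; it follows from the bandwidth, which controls how finely the density estimate resolves separate maxima.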
Spatial analysis of dengue fever in Guangdong Province, China, 2001-2006.
Liu, Chunxiao; Liu, Qiyong; Lin, Hualiang; Xin, Benqiang; Nie, Jun
2014-01-01
Guangdong Province is the area most seriously affected by dengue fever in China. In this study, we describe the spatial distribution of dengue fever in Guangdong Province from 2001 to 2006 with the objective of informing priority areas for public health planning and resource allocation. Annualized incidence at a county level was calculated and mapped to show crude incidence, excess hazard, and spatial smoothed incidence. Geographic information system-based spatial scan statistics was conducted to detect the spatial distribution pattern of dengue fever incidence at the county level. Spatial scan cluster analyses suggested that counties around Guangzhou City and Chaoshan Region were at increased risk for dengue fever (P < .01). Some spatial clusters of dengue fever were found in Guangdong Province, which allowed intervention measures to be targeted for maximum effect.
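The abstract above does not say which probability model the scan statistic used. As an illustration, the sketch below assumes the widely used Poisson form of the spatial scan statistic, in which each candidate zone is scored by a log-likelihood ratio comparing observed and expected cases inside and outside the zone. The function names and the numbers are hypothetical and are not data from the study.

```python
import math


def poisson_llr(cases_in, expected_in, total_cases):
    """Log-likelihood ratio of a candidate zone under an assumed Poisson model.

    expected_in is the null-hypothesis expected count inside the zone,
    scaled so that the expected counts over the whole map sum to total_cases.
    """
    c, e, n = cases_in, expected_in, total_cases
    if c <= e:
        return 0.0  # only zones with excess risk are of interest
    inside = c * math.log(c / e)
    outside = (n - c) * math.log((n - c) / (n - e)) if n > c else 0.0
    return inside + outside


def most_likely_cluster(zones, total_cases):
    """Pick the (name, cases, expected) zone with the highest ratio."""
    return max(zones, key=lambda z: poisson_llr(z[1], z[2], total_cases))


# Hypothetical candidate zones: (name, observed cases, expected cases).
zones = [("A", 30, 10.0), ("B", 12, 10.0), ("C", 5, 10.0)]
best = most_likely_cluster(zones, total_cases=100)
```

Significance is normally assessed by Monte Carlo replication of the maximum ratio under the null hypothesis, which this sketch omits.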
Methods to estimate lightning activity using WWLLN and RS data
NASA Astrophysics Data System (ADS)
Baranovskiy, Nikolay V.; Belikova, Marina Yu.; Karanina, Svetlana Yu.; Karanin, Andrey V.; Glebova, Alena V.
2017-11-01
The aim of the work is to develop a comprehensive method for assessing thunderstorm activity using WWLLN and RS data. Lightning discharges must be grouped in order to solve practical problems of lightning protection and lightning-caused forest fire danger, as well as climatology problems that rely on information about the spatial and temporal characteristics of thunderstorms. For grouping lightning discharges, it is proposed to use clustering algorithms. The region covering Timiryazevskiy forestry (Tomsk region, borders (55.93 - 56.86)x(83.94 - 85.07)) was selected for the computational experiment. We used the data on lightning discharges registered by the WWLLN network in this region on July 23, 2014. A total of 273 lightning discharges were sampled. The relatively small number of discharges allowed us to analyze the clustering solutions visually.
CORS BAADE-WESSELINK DISTANCE TO THE LMC NGC 1866 BLUE POPULOUS CLUSTER
DOE Office of Scientific and Technical Information (OSTI.GOV)
Molinaro, R.; Ripepi, V.; Marconi, M.
2012-03-20
We used optical, near-infrared photometry, and radial velocity data for a sample of 11 Cepheids belonging to the young LMC blue populous cluster NGC 1866 to estimate their radii and distances on the basis of the CORS Baade-Wesselink method. This technique, based on an accurate calibration of surface brightness as a function of (U - B), (V - K) colors, allows us to estimate, simultaneously, the linear radius and the angular diameter of Cepheid variables, and consequently to derive their distance. A rigorous error estimate on radii and distances was derived by using Monte Carlo simulations. Our analysis gives a distance modulus for NGC 1866 of 18.51 ± 0.03 mag, which is in agreement with several independent results.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fasolato, C.; Center for Life Nanoscience@Sapienza, Istituto Italiano di Tecnologia, Rome; Domenici, F., E-mail: fabiodomenici@gmail.com
2015-06-23
The coherent oscillations of the surface electron gas, known as surface plasmons, in metal nanostructures can give rise to the localization of intense electromagnetic fields at the metal-dielectric interface. These strong fields are exploited in surface enhanced spectroscopies, such as Surface Enhanced Raman Scattering (SERS), for the detection and characterization of molecules at very low concentration. Still, the implementation of SERS-based biosensors requires a high level of reproducibility, combined with cheap and simple fabrication methods. SERS substrates based on self-assembled aggregates of commercial metallic nanoparticles (NPs) can meet all of these requirements. Following this line, we report on a combined micro-Raman and Atomic Force Microscopy (AFM) analysis of the SERS efficiency of micrometric silver NP aggregates (enhancement factors up to 10⁹) obtained by self-assembly. Despite the intrinsically disordered nature of these NP clusters, we were able to derive some general rules relating the specific aggregate morphology to its plasmonic response. We found strong evidence of cooperative effects among the NPs within the cluster, namely a clear dependence of the SERS efficiency on both the cluster area (basically linear) and the number of stacked NP layers. A cooperative action among the superimposed layers has also been shown by electromagnetic simulations performed on simplified nanostructures consisting of stacked planes of ordered NPs. Given the clear potential of these disordered self-assembled clusters, in terms of both easy fabrication and signal enhancement, we developed a specific nanofabrication protocol, based on electron beam lithography and molecular functionalization, that allowed fine control of the NP assemblies into designed shapes with fixed area and height. In particular, we fabricated 2D ordered arrays of disordered clusters, choosing gold NPs owing to their high stability.
AFM measurements confirmed the regularity in spacing and size (i.e. area and layer number) of the aggregates. Preliminary SERS measurements confirm the high signal enhancement and demonstrate quite good reproducibility over a large number of aggregates within a 100×100 μm² 2D super-structure. The availability of such a multisensor could allow a careful statistical analysis of the SERS response, thus leading to a reliable quantitative estimate of the presence of relevant molecular species even at ultra-low concentration.
Gambling, games of skill and human ecology: a pilot study by a multidimensional analysis approach.
Valera, Luca; Giuliani, Alessandro; Gizzi, Alessio; Tartaglia, Francesco; Tambone, Vittoradolfo
2015-01-01
The present pilot study aims at analyzing the human activity of playing in the light of an indicator of human ecology (HE). We highlighted the four essential anthropological dimensions (FEAD), starting from the analysis of questionnaires administered to actual gamers. The coherence between theoretical construct and observational data is a remarkable proof-of-concept of the possibility of establishing an experimentally motivated link between a philosophical construct (coming from Huizinga's Homo ludens definition) and actual gamers' motivation pattern. The starting hypothesis is that the activity of playing becomes ecological (and thus not harmful) when it achieves harmony between the FEAD, thus realizing HE; conversely, it risks creating some form of addiction when it destroys the FEAD balance. We analyzed the data by means of variable clustering (oblique principal components) so as to verify experimentally the existence of the hypothesized dimensions. The subsequent projection of statistical units (gamers) on the orthogonal space spanned by principal components allowed us to generate a meaningful, albeit preliminary, clustering of gamer profiles.
Milanović, Vesna; Osimani, Andrea; Pasquini, Marina; Aquilanti, Lucia; Garofalo, Cristiana; Taccari, Manuela; Cardinali, Federica; Riolo, Paola; Clementi, Francesca
2016-06-16
This study was aimed at investigating the occurrence of 11 transferable antibiotic resistance (AR) genes [erm(A), erm(B), erm(C), vanA, vanB, tet(M), tet(O), tet(S), tet(K), mecA, blaZ] in 11 species of marketed edible insects (small crickets powder, small crickets, locusts, mealworm larvae, giant waterbugs, black ants, winged termite alates, rhino beetles, mole crickets, silkworm pupae, and black scorpions) in order to provide a first baseline for risk assessment. Among the AR genes under study, tet(K) occurred with the highest frequency, followed by erm(B), tet(S) and blaZ. High variability was seen among the samples in terms of the occurrence of different AR determinants. Cluster Analysis and Principal Coordinates Analysis allowed the 11 samples to be grouped in two main clusters, one including all but one of the samples produced in Thailand and the other including those produced in the Netherlands. Copyright © 2016 Elsevier B.V. All rights reserved.
Investigation of spacial clustering of rare diseases: childhood malignancies in North Humberside.
Alexander, F; Cartwright, R; McKinney, P A; Ricketts, T J
1990-03-01
The aims of the study were (1) to test for uniformity of distribution of childhood leukaemias and other malignancies; and (2) to consider the aetiological implications of unusual distributions. A test for spacial clustering was applied using a method which allows for unequal distribution of the population at risk and avoids using census data to provide population denominators. When clustering was identified, four possible aetiological links which had already been suggested to the Leukaemia Research Fund Centre were examined in a local area. The study was carried out in the Yorkshire Health Region in the north of England. 144 children under 15 years of age with a diagnosis of malignant disease known to the Yorkshire Regional Childhood Tumour Registry between 1974 and 1986 were included in the analysis. Of these 53 had leukaemias and nine had lymphomas. Significant localised clustering was found in North Humberside, though not in the whole of the Yorkshire Health Region. A number of clustered cases were identified, some of whom were in a post code sector, Hull 10, to the west of Kingston-upon-Hull, about which concern had been expressed since 1985. There was however no evidence that disease clustering was confined to this area. Four previously suggested hypotheses about causation in this particular area were examined but the results were negative or inconclusive. The identification of spacial clustering must be seen as only the first step in a series of investigations; it can only rarely lead to aetiological conclusions by itself, but it can motivate and target other investigations.
NASA Technical Reports Server (NTRS)
Carvalho, L. M. V.; Rickenbach, T.
1999-01-01
Satellite infrared (IR) and visible (VIS) images from the Tropical Ocean Global Atmosphere - Coupled Ocean Atmosphere Response Experiment (TOGA-COARE) experiment are investigated through the use of Clustering Analysis. The clusters are obtained from the values of IR and VIS counts and the local variance for both channels. The clustering procedure is based on the standardized histogram of each variable obtained from 179 pairs of images. A new approach to classify high clouds using only IR and the clustering technique is proposed. This method allows the separation of the enhanced convection in two main classes: convective tops, more closely related to the most active core of the storm, and convective systems, which produce regions of merged, thick anvil clouds. The resulting classification of different portions of cloudiness is compared to the radar reflectivity field for intensive events. Convective Systems and Convective Tops are followed during their life cycle using the IR clustering method. The areal coverage of precipitation and features related to convective and stratiform rain is obtained from the radar for each stage of the evolving Mesoscale Convective Systems (MCS). In order to compare the IR clustering method with a simple threshold technique, two IR thresholds (Tir) were used to identify different portions of cloudiness, Tir=240K which roughly defines the extent of all cloudiness associated with the MCS, and Tir=220K which indicates the presence of deep convection. It is shown that the IR clustering technique can be used as a simple alternative to identify the actual portion of convective and stratiform rainfall.
Noor, Sina Ibne; Dietz, Steffen; Heidtmann, Hella; Boone, Christopher D.; McKenna, Robert; Deitmer, Joachim W.; Becker, Holger M.
2015-01-01
Proton-coupled monocarboxylate transporters (MCTs) mediate the exchange of high energy metabolites like lactate between different cells and tissues. We have reported previously that carbonic anhydrase II augments transport activity of MCT1 and MCT4 by a noncatalytic mechanism, while leaving transport activity of MCT2 unaltered. In the present study, we combined electrophysiological measurements in Xenopus oocytes and pulldown experiments to analyze the direct interaction between carbonic anhydrase II (CAII) and MCT1, MCT2, and MCT4, respectively. Transport activity of MCT2-WT, which lacks a putative CAII-binding site, is not augmented by CAII. However, introduction of a CAII-binding site into the C terminus of MCT2 resulted in CAII-mediated facilitation of MCT2 transport activity. Interestingly, introduction of three glutamic acid residues alone was not sufficient to establish a direct interaction between MCT2 and CAII, but the cluster had to be arranged in a fashion that allowed access to the binding moiety in CAII. We further demonstrate that functional interaction between MCT4 and CAII requires direct binding of the enzyme to the acidic cluster 431EEE in the C terminus of MCT4 in a similar fashion as previously shown for binding of CAII to the cluster 489EEE in the C terminus of MCT1. In CAII, binding to MCT1 and MCT4 is mediated by a histidine residue at position 64. Taken together, our results suggest that facilitation of MCT transport activity by CAII requires direct binding between histidine 64 in CAII and a cluster of glutamic acid residues in the C terminus of the transporter that has to be positioned in surroundings that allow access to CAII. PMID:25561737
Berthias, F; Feketeová, L; Abdoul-Carime, H; Calvo, F; Farizon, B; Farizon, M; Märk, T D
2018-06-22
Velocity distributions of neutral water molecules evaporated after collision induced dissociation (CID) of protonated water clusters H+(H2O)n≤10 were measured using the combined correlated ion and neutral fragment time-of-flight (COINTOF) and velocity map imaging (VMI) techniques. As observed previously, all measured velocity distributions exhibit two contributions, with a low velocity part identified by statistical molecular dynamics (SMD) simulations as events obeying the Maxwell-Boltzmann statistics and a high velocity contribution corresponding to non-ergodic events in which energy redistribution is incomplete. In contrast to earlier studies, where the evaporation of a single molecule was probed, the present study is concerned with events involving the evaporation of up to five water molecules. In particular, we discuss here in detail the cases of two and three evaporated molecules. Evaporation of several water molecules after CID can be interpreted in general as a sequential evaporation process. In addition to the SMD calculations, a Monte Carlo (MC) based simulation was developed allowing the reconstruction of the velocity distribution produced by the evaporation of m molecules from H+(H2O)n≤10 cluster ions using the measured velocity distributions for singly evaporated molecules as the input. The observed broadening of the low-velocity part of the distributions for the evaporation of two and three molecules, as compared to the width for the evaporation of a single molecule, results from the cumulative recoil velocity of the successive ion residues as well as the intrinsically broader distributions for progressively smaller parent clusters. Further MC simulations were carried out assuming that a certain proportion of non-ergodic events is responsible for the first evaporation in such a sequential evaporation series, which allows the entire velocity distribution to be modeled.
Large scale analysis of the mutational landscape in HT-SELEX improves aptamer discovery
Hoinka, Jan; Berezhnoy, Alexey; Dao, Phuong; Sauna, Zuben E.; Gilboa, Eli; Przytycka, Teresa M.
2015-01-01
High-Throughput (HT) SELEX combines SELEX (Systematic Evolution of Ligands by EXponential Enrichment), a method for aptamer discovery, with massively parallel sequencing technologies. This emerging technology provides data for a global analysis of the selection process and for simultaneous discovery of a large number of candidates but currently lacks dedicated computational approaches for their analysis. To close this gap, we developed novel in-silico methods to analyze HT-SELEX data and utilized them to study the emergence of polymerase errors during HT-SELEX. Rather than considering these errors as a nuisance, we demonstrated their utility for guiding aptamer discovery. Our approach builds on two main advancements in aptamer analysis: AptaMut, a novel technique allowing for the identification of polymerase errors conferring an improved binding affinity relative to the ‘parent’ sequence, and AptaCluster, an aptamer clustering algorithm which is, to the best of our knowledge, the only currently available tool capable of efficiently clustering entire aptamer pools. We applied these methods to an HT-SELEX experiment developing aptamers against Interleukin 10 receptor alpha chain (IL-10RA) and experimentally confirmed our predictions, thus validating our computational methods. PMID:25870409
NASA Astrophysics Data System (ADS)
Gligor, M.; Ausloos, M.
2007-05-01
The statistical distances between countries, calculated for various moving average time windows, are mapped into the ultrametric subdominant space as in classical Minimal Spanning Tree methods. The Moving Average Minimal Length Path (MAMLP) algorithm allows a decoupling of fluctuations with respect to the mass center of the system from the movement of the mass center itself. A Hamiltonian representation given by a factor graph is used and plays the role of cost function. The present analysis pertains to 11 macroeconomic (ME) indicators, namely the GDP (x1), Final Consumption Expenditure (x2), Gross Capital Formation (x3), Net Exports (x4), Consumer Price Index (y1), Rates of Interest of the Central Banks (y2), Labour Force (z1), Unemployment (z2), GDP/hour worked (z3), GDP/capita (w1) and Gini coefficient (w2). The target group of countries is composed of 15 EU countries, data taken between 1995 and 2004. By two different methods (the Bipartite Factor Graph Analysis and the Correlation Matrix Eigensystem Analysis) it is found that the strongly correlated countries with respect to the macroeconomic indicators fluctuations can be partitioned into stable clusters.
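The Minimal Spanning Tree step mentioned in the abstract above can be sketched as follows. This is a generic Prim-style construction on a symmetric distance matrix, not the MAMLP algorithm itself; the function name and the three-country distance matrix are hypothetical.

```python
def minimum_spanning_tree(dist):
    """Prim's algorithm on a symmetric distance matrix.

    Returns a list of (i, j, weight) edges, in the order they are added,
    starting from node 0.
    """
    n = len(dist)
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        # Cheapest edge that connects the current tree to a new node.
        w, i, j = min(
            (dist[i][j], i, j)
            for i in in_tree
            for j in range(n)
            if j not in in_tree
        )
        edges.append((i, j, w))
        in_tree.add(j)
    return edges


# Hypothetical statistical distances between three countries.
distances = [
    [0, 1, 4],
    [1, 0, 2],
    [4, 2, 0],
]
tree = minimum_spanning_tree(distances)
```

The subdominant ultrametric distance between two nodes is then the largest edge weight on the tree path joining them, which is what turns the tree into a hierarchy of clusters.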
Sbaraini, Nicolau; Andreis, Fábio C; Thompson, Claudia E; Guedes, Rafael L M; Junges, Ângela; Campos, Thais; Staats, Charley C; Vainstein, Marilene H; Ribeiro de Vasconcelos, Ana T; Schrank, Augusto
2017-01-01
The emergence of new microbial pathogens can result in destructive outbreaks, since their hosts have limited resistance and pathogens may be excessively aggressive. Described as the major ecological incident of the twentieth century, Dutch elm disease, caused by ascomycete fungi from the Ophiostoma genus, has caused a significant decline in elm tree populations (Ulmus sp.) in North America and Europe. Genome sequencing of the two main causative agents of Dutch elm disease (Ophiostoma ulmi and Ophiostoma novo-ulmi), along with closely related species with different lifestyles, allows for unique comparisons to be made to identify how pathogens and virulence determinants have emerged. Among several established virulence determinants, secondary metabolites (SMs) have been suggested to play significant roles during phytopathogen infection. Interestingly, the secondary metabolism of Dutch elm pathogens remains almost unexplored, and little is known about how SM biosynthetic genes are organized in these species. To better understand the metabolic potential of O. ulmi and O. novo-ulmi, we performed a deep survey and description of SM biosynthetic gene clusters (BGCs) in these species and assessed their conservation among eight species from the Ophiostomataceae family. Among 19 identified BGCs, a fujikurin-like gene cluster (OpPKS8) was unique to Dutch elm pathogens. Phylogenetic analysis revealed that orthologs for this gene cluster are widespread among phytopathogens and plant-associated fungi, suggesting that OpPKS8 may have been horizontally acquired by the Ophiostoma genus. Moreover, the detailed identification of several BGCs paves the way for future in-depth research and supports the potential impact of secondary metabolism on the lifestyle of the Ophiostoma genus.
The ALICE Software Release Validation cluster
NASA Astrophysics Data System (ADS)
Berzano, D.; Krzewicki, M.
2015-12-01
One of the most important steps of the software lifecycle is Quality Assurance: this process comprises both automatic tests and manual reviews, all of which must pass before the software is approved for production. Some tests, such as source code static analysis, are executed on a single dedicated service: in High Energy Physics, a full simulation and reconstruction chain on a distributed computing environment, backed with a sample “golden” dataset, is also necessary for the quality sign off. The ALICE experiment uses dedicated and virtualized computing infrastructures for the Release Validation in order not to taint the production environment (i.e. CVMFS and the Grid) with non-validated software and validation jobs: the ALICE Release Validation cluster is a disposable virtual cluster appliance based on CernVM and the Virtual Analysis Facility, capable of deploying on demand, and with a single command, a dedicated virtual HTCondor cluster with an automatically scalable number of virtual workers on any cloud supporting the standard EC2 interface. Input and output data are externally stored on EOS, and a dedicated CVMFS service is used to provide the software to be validated. We will show how the Release Validation Cluster deployment and disposal are completely transparent for the Release Manager, who simply triggers the validation from the ALICE build system's web interface. CernVM 3, based entirely on CVMFS, makes it possible to boot any point-in-time snapshot of the operating system: we will show how this allows us to certify each ALICE software release for an exact CernVM snapshot, addressing the problem of Long Term Data Preservation by ensuring a consistent environment for software execution and data reprocessing in the future.
Wagner, Philippe; Merlo, Juan
2016-01-01
Multilevel data occurs frequently in many research areas like health services research and epidemiology. A suitable way to analyze such data is through the use of multilevel regression models (MLRM). MLRM incorporate cluster‐specific random effects which allow one to partition the total individual variance into between‐cluster variation and between‐individual variation. Statistically, MLRM account for the dependency of the data within clusters and provide correct estimates of uncertainty around regression coefficients. Substantively, the magnitude of the effect of clustering provides a measure of the General Contextual Effect (GCE). When outcomes are binary, the GCE can also be quantified by measures of heterogeneity like the Median Odds Ratio (MOR) calculated from a multilevel logistic regression model. Time‐to‐event outcomes within a multilevel structure occur commonly in epidemiological and medical research. However, the Median Hazard Ratio (MHR) that corresponds to the MOR in multilevel (i.e., ‘frailty’) Cox proportional hazards regression is rarely used. Analogously to the MOR, the MHR is the median relative change in the hazard of the occurrence of the outcome when comparing identical subjects from two randomly selected different clusters that are ordered by risk. We illustrate the application and interpretation of the MHR in a case study analyzing the hazard of mortality in patients hospitalized for acute myocardial infarction at hospitals in Ontario, Canada. We provide R code for computing the MHR. The MHR is a useful and intuitive measure for expressing cluster heterogeneity in the outcome and, thereby, estimating general contextual effects in multilevel survival analysis. © 2016 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd. PMID:27885709
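As a rough illustration of how such a median ratio is obtained from a fitted model, the sketch below uses the closed form commonly quoted for the MOR, exp(sqrt(2σ²) · Φ⁻¹(0.75)), and assumes it carries over to the MHR when the cluster-level random effects of the Cox model are normally distributed with variance σ². The function name and the variance values are hypothetical; this is not the R code referred to in the abstract.

```python
from math import exp, sqrt
from statistics import NormalDist


def median_ratio(cluster_variance):
    """Median odds/hazard ratio from the random-effect variance.

    Assumes normally distributed cluster-level random effects on the
    log-odds or log-hazard scale (an assumption of this sketch).
    """
    if cluster_variance < 0:
        raise ValueError("variance must be non-negative")
    # 75th percentile of the standard normal, about 0.6745.
    z75 = NormalDist().inv_cdf(0.75)
    return exp(sqrt(2.0 * cluster_variance) * z75)


# Hypothetical between-hospital variances on the log-hazard scale.
for variance in (0.0, 0.05, 1.0):
    print(variance, round(median_ratio(variance), 3))
```

A value of 1 corresponds to no between-cluster heterogeneity; larger values mean that two otherwise identical subjects from randomly chosen clusters typically differ more in their hazard.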
CODEX weak lensing: concentration of galaxy clusters at z ~ 0.5
Cibirka, N.; Cypriano, E. S.; Brimioulle, F.; ...
2017-03-04
Here, we present a stacked weak-lensing analysis of 27 richness selected galaxy clusters at 0.40 ≤ z ≤ 0.62 in the COnstrain Dark Energy with X-ray galaxy clusters (CODEX) survey. The fields were observed in five bands with the Canada–France–Hawaii Telescope (CFHT). We measure the stacked surface mass density profile with a 14σ significance in the radial range $0.1 < R\,[\mathrm{Mpc}\,h^{-1}] < 2.5$. The profile is well described by the halo model, with the main halo term following a Navarro–Frenk–White (NFW) profile and including the off-centring effect. We select the background sample using a conservative colour–magnitude method to reduce the potential systematic errors and contamination by cluster member galaxies. We perform a Bayesian analysis for the stacked profile and constrain the best-fitting NFW parameters $M_{200c} = 6.6^{+1.0}_{-0.8} \times 10^{14}\,h^{-1}\,M_\odot$ and $c_{200c} = 3.7^{+0.7}_{-0.6}$. The off-centring effect was modelled based on previous observational results found for redMaPPer Sloan Digital Sky Survey clusters. Our constraints on $M_{200c}$ and $c_{200c}$ allow us to investigate the consistency with numerical predictions and select a concentration–mass relation to describe the high richness CODEX sample. Comparing our best-fitting values for $M_{200c}$ and $c_{200c}$ with other observational surveys at different redshifts, we find no evidence for evolution in the concentration–mass relation, though it could be mitigated by particular selection functions. Similar to previous studies investigating the X-ray luminosity–mass relation, our data suggest a lower evolution than expected from self-similarity.
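For orientation, the two fitted parameters can be related through the standard NFW definitions. The block below is the textbook form, not a reproduction of the paper's halo model; the symbols $\rho_s$, $r_s$, $r_{200c}$ and $\rho_\mathrm{crit}(z)$ are introduced here for illustration.

```latex
\begin{align}
  \rho(r) &= \frac{\rho_s}{(r/r_s)\,\left(1 + r/r_s\right)^{2}}, \\
  M_{200c} &= \frac{4\pi}{3}\, 200\, \rho_\mathrm{crit}(z)\, r_{200c}^{3}, \\
  c_{200c} &= \frac{r_{200c}}{r_s}.
\end{align}
```

Under these definitions, a mass and a concentration together fix the scale radius and the characteristic density of the main halo term.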
Caroleo, Mariarita; Primerano, Amedeo; Rania, Marianna; Aloi, Matteo; Pugliese, Valentina; Magliocco, Fabio; Fazia, Gilda; Filippo, Andrea; Sinopoli, Flora; Ricchio, Marco; Arturi, Franco; Jimenez-Murcia, Susana; Fernandez-Aranda, Fernando; De Fazio, Pasquale; Segura-Garcia, Cristina
2018-02-01
Considering that specific genetic profiles, psychopathological conditions and neurobiological systems underlie human behaviours, the phenotypic differentiation of obese patients according to eating behaviours should be investigated. The aim of this study was to classify obese patients according to their eating behaviours and to compare these clusters in regard to psychopathology, personality traits, neurocognitive patterns and genetic profiles. A total of 201 obese outpatients seeking weight reduction treatment underwent a dietetic visit, psychological and psychiatric assessment and genotyping for SCL6A2 polymorphisms. Eating behaviours were clustered through two-step cluster analysis, and these clusters were subsequently compared. Two groups emerged: cluster 1 contained patients with predominantly prandial hyperphagia, social eating, an increased frequency of the long allele of the 5-HTTLPR and low scores in all tests; and cluster 2 included patients with more emotionally related eating behaviours (emotional eating, grazing, binge eating, night eating, post-dinner eating, craving for carbohydrates), dysfunctional personality traits, neurocognitive impairment, affective disorders and increased frequencies of the short (S) allele and the S/S genotype. Aside from binge eating, dysfunctional eating behaviours were useful symptoms to identify two different phenotypes of obese patients from a comprehensive set of parameters (genetic, clinical, personality and neuropsychology) in this sample. Grazing and emotional eating were the most important predictors for classifying obese patients, followed by binge eating. This clustering overcomes the idea that 'binging' is the predominant altered eating behaviour, and could help physicians other than psychiatrists to identify whether an obese patient has an eating disorder. 
Finally, recognising different types of obesity may not only allow a more comprehensive understanding of this illness, but also make it possible to tailor patient-specific treatment pathways. Copyright © 2017 Elsevier Masson SAS. All rights reserved.
Early dynamical evolution of substructured stellar clusters
NASA Astrophysics Data System (ADS)
Dorval, Julien; Boily, Christian
2015-08-01
It is now widely accepted that stellar clusters form with a high level of substructure (Kuhn et al. 2014, Bate 2009), inherited from the molecular cloud and the star formation process. Evidence from observations and simulations also indicates that the stars in such young clusters form a subvirial system (Kirk et al. 2007, Maschberger et al. 2010). The subsequent dynamical evolution can cause important mass loss, ejecting a large part of the birth population into the field. It can also imprint the stellar population and still be inferred from observations of evolved clusters. N-body simulations allow a better understanding of these early twists and turns, given realistic initial conditions. Nowadays, substructured, clumpy young clusters are usually obtained through pseudo-fractal growth (Goodwin et al. 2004) and velocity inheritance. Such models are visually realistic and very useful; however, they are somewhat artificial in their velocity distribution. I introduce a new way to create clumpy initial conditions through a "Hubble expansion" which naturally produces clumps that are self-consistent velocity-wise. A velocity distribution analysis shows the new method produces realistic models, consistent with the dynamical state of the newly created cores in hydrodynamic simulations of cluster formation (Klessen & Burkert 2000). I use these initial conditions to investigate the dynamical evolution of young subvirial clusters, up to 80000 stars. I find an overall soft evolution, with hierarchical merging leading to a high level of mass segregation. I investigate the influence of the mass function on the fate of the cluster, specifically on the amount of mass loss induced by the early violent relaxation. Using a new binary detection algorithm, I also find a strong processing of the native binary population.
NASA Astrophysics Data System (ADS)
Wright, D. J.; Raad, M.; Hoel, E.; Park, M.; Mollenkopf, A.; Trujillo, R.
2016-12-01
Introduced is a new approach for processing spatiotemporal big data by leveraging distributed analytics and storage. A suite of temporally-aware analysis tools summarizes data nearby or within variable windows, aggregates points (e.g., for various sensor observations or vessel positions), reconstructs time-enabled points into tracks (e.g., for mapping and visualizing storm tracks), joins features (e.g., to find associations between features based on attributes, spatial relationships, temporal relationships or all three simultaneously), calculates point densities, finds hot spots (e.g., in species distributions), and creates space-time slices and cubes (e.g., in microweather applications with temperature, humidity, and pressure, or within human mobility studies). These "feature geo analytics" tools run in both batch and streaming spatial analysis mode as distributed computations across a cluster of servers on typical "big" data sets, where static data exist in traditional geospatial formats (e.g., shapefile) locally on a disk or file share, attached as static spatiotemporal big data stores, or streamed in near-real-time. In other words, the approach registers large datasets or data stores with ArcGIS Server, then distributes analysis across a cluster of machines for parallel processing. Several brief use cases will be highlighted based on a 16-node server cluster at 14 Gb RAM per node, allowing, for example, the buffering of over 8 million points or thousands of polygons in 1 minute. The approach is "hybrid" in that ArcGIS Server integrates open-source big data frameworks such as Apache Hadoop and Apache Spark on the cluster in order to run the analytics. In addition, the user may devise and connect custom open-source interfaces and tools developed in Python or Python Notebooks; the common denominator being the familiar REST API.
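The idea of aggregating points into space-time slices and cubes can be sketched generically. The snippet below is a plain-Python illustration, not the ArcGIS Server or Spark interface; the function name, the cell size, the time step and the sample points are all hypothetical.

```python
from collections import Counter
from math import floor


def space_time_cube(points, cell_size, time_step):
    """Count points per (x-cell, y-cell, time-slice) bin.

    points: iterable of (x, y, t) tuples in consistent units.
    """
    cube = Counter()
    for x, y, t in points:
        key = (floor(x / cell_size), floor(y / cell_size), floor(t / time_step))
        cube[key] += 1
    return cube


# Hypothetical sensor observations: (x, y, seconds since start).
observations = [
    (0.5, 0.5, 0),
    (0.7, 0.2, 30),
    (1.5, 0.5, 10),
    (0.1, 0.9, 70),
]
cube = space_time_cube(observations, cell_size=1.0, time_step=60)
```

Because each point maps to its bin independently, this kind of aggregation splits naturally across worker nodes, with the per-node counters summed at the end.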
NASA Astrophysics Data System (ADS)
Biazzo, K.; Pasquini, L.; Girardi, L.; Frasca, A.; da Silva, L.; Setiawan, J.; Marilli, E.; Hatzes, A. P.; Catalano, S.
2007-12-01
Aims: We test our capability of deriving stellar physical parameters of giant stars by analysing a sample of field stars and the well studied open cluster IC 4651 with different spectroscopic methods. Methods: The use of a technique based on line-depth ratios (LDRs) allows us to determine with high precision the effective temperature of the stars and to compare the results with those obtained with a classical LTE abundance analysis. Results: (i) For the field stars we find that the temperatures derived by means of the LDR method are in excellent agreement with those found by the spectral synthesis. This result is extremely encouraging because it shows that spectra can be used to firmly derive population characteristics (e.g., mass and age) of the observed stars. (ii) For the IC 4651 stars we use the determined effective temperature to derive the following results. a) The reddening E(B-V) of the cluster is 0.12±0.02, largely independent of the color-temperature calibration used. b) The age of the cluster is 1.2±0.2 Gyr. c) The typical mass of the analysed giant stars is 2.0±0.2 M⊙. Moreover, we find a systematic difference of about 0.2 dex in log g between spectroscopic and evolutionary values. Conclusions: We conclude that, in spite of known limitations, a classical spectroscopic analysis of giant stars may indeed result in very reliable stellar parameters. We caution that the quality of the agreement, on the other hand, depends on the details of the adopted spectroscopic analysis. Based on observations collected at the ESO telescopes at the Paranal and La Silla Observatories, Chile.
EXPLORING FUNCTIONAL CONNECTIVITY IN FMRI VIA CLUSTERING.
Venkataraman, Archana; Van Dijk, Koene R A; Buckner, Randy L; Golland, Polina
2009-04-01
In this paper we investigate the use of data driven clustering methods for functional connectivity analysis in fMRI. In particular, we consider the K-Means and Spectral Clustering algorithms as alternatives to the commonly used Seed-Based Analysis. To enable clustering of the entire brain volume, we use the Nyström Method to approximate the necessary spectral decompositions. We apply K-Means, Spectral Clustering and Seed-Based Analysis to resting-state fMRI data collected from 45 healthy young adults. Without placing any a priori constraints, both clustering methods yield partitions that are associated with brain systems previously identified via Seed-Based Analysis. Our empirical results suggest that clustering provides a valuable tool for functional connectivity analysis.
Fourier Decomposition and Properties of the Variable Stars in the Globular Cluster NGC 4833
NASA Astrophysics Data System (ADS)
Reed, Hunter M.; Pajkos, Michael A.; Murphy, Brian W.; Darragh, Andrew
2016-01-01
Globular clusters provide an ideal setting to study stellar evolution of stars of similar composition and age. RR Lyrae stars found in globular clusters have a variety of uses in probing the physical characteristics of the stellar population itself and its evolution. Building upon our previous study, we focus on the RR Lyrae stars in the globular cluster NGC 4833. From March through June 2014, we used the Southeastern Association for Research in Astronomy 0.6-meter telescope located at CTIO to collect nearly 1,500 images of NGC 4833 in the B, V, R, and I bands. Using difference image analysis we identified 40 variable stars. Of these, 20 were RR Lyrae stars with 10 being of type RR0, 7 of type RR1, and 3 of type RR2. Additionally, 6 SX Phe, 5 eclipsing binaries, and 9 long period variables were identified. The average periods of the RR0, RR1, and RR2 type variables were 0.69597 days, 0.39547 days, and 0.30654 days, respectively. The periods of the RR Lyrae stars and the ratio N1/(N0+N1) of 0.41 are indicative of an Oosterhoff Type II cluster. The observations of the RR Lyrae stars were of very high quality and phase coverage, allowing us to perform Fourier decomposition of their light curves. From this Fourier decomposition we were able to determine the physical characteristics of the RR Lyrae stars. We found the mean iron abundance to be [Fe/H]JKZW = -1.87 ± 0.06, the mean apparent V-magnitude of the RR0 and RR1 type variables to be VRR = 15.51 ± 0.11, a mean absolute V-magnitude of MV = 0.636 ± 0.053, and effective temperatures for the RR0 and RR1 stars of log10Teff = 3.797 and log10Teff = 3.855, respectively. The multi-band photometry allowed us to determine the reddening of the cluster, E(B-V) = 0.342 ± 0.021, which resulted in a distance of D(kpc) = 5.91 ± 0.31 to NGC 4833.
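The quoted counts and ratio can be checked with a few lines of arithmetic. The sketch assumes that N0 and N1 denote the numbers of RR0 and RR1 stars, which the abstract does not state explicitly; the variable names are illustrative only.

```python
# Counts of variable stars as reported in the abstract.
rr_lyrae = {"RR0": 10, "RR1": 7, "RR2": 3}
other_variables = {"SX Phe": 6, "eclipsing binaries": 5, "long period": 9}


def rr1_fraction(n0, n1):
    """Fraction of first-overtone pulsators, N1 / (N0 + N1)."""
    return n1 / (n0 + n1)


total_rr_lyrae = sum(rr_lyrae.values())
total_variables = total_rr_lyrae + sum(other_variables.values())
ratio = rr1_fraction(rr_lyrae["RR0"], rr_lyrae["RR1"])
print(total_rr_lyrae, total_variables, round(ratio, 2))
```

With these inputs the RR Lyrae subtotal and the overall total match the 20 and 40 stars stated in the text, and 7/17 rounds to the reported 0.41.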
Jamali, Mojdeh; Ebrahimi, Mohammad-Ali; Karimipour, Morteza; Shams-Ghahfarokhi, Masoomeh; Dinparast-Djadid, Navid; Kalantari, Sanaz; Pilehvar-Soltanahmadi, Yones; Amani, Akram; Razzaghi-Abyaneh, Mehdi
2012-01-01
In the present study, 193 Aspergillus strains were isolated from a total of 100 soil samples of pistachio orchards, all of which were identified as Aspergillus flavus, the most abundant species of Aspergillus section Flavi in the environment. Approximately 59%, 81%, and 61% of the isolates were capable of producing aflatoxins (AFs), cyclopiazonic acid (CPA), and sclerotia, respectively. The isolates were classified into four chemotypes (I to IV) based on the ability to produce AFs and CPA. The resulting dendrogram of random amplified polymorphic DNA (RAPD) analysis of 24 selected A. flavus isolates demonstrated the formation of two separate clusters. Cluster 1 contained both aflatoxigenic and non-aflatoxigenic isolates (17 isolates), whereas cluster 2 comprised only aflatoxigenic isolates (7 isolates). All the isolates of cluster 2 produced significantly higher levels of AFs than those of cluster 1, and the isolates that produced both AFB(1) and AFB(2) were found only in cluster 2. RAPD genotyping allowed the differentiation of A. flavus from Aspergillus parasiticus, a closely related species within section Flavi. The present study provides, for the first time, information on the distribution and genetic diversity of A. flavus populations in soils of pistachio orchards, ranging from nontoxigenic to highly toxigenic populations able to produce hazardous amounts of AFB(1) and CPA. These fungi, whether toxigenic or non-toxigenic, should be considered as potential threats for agriculture and public health.
Abanyie, F; Harvey, R R; Harris, J R; Wiegand, R E; Gaul, L; Desvignes-Kendrick, M; Irvin, K; Williams, I; Hall, R L; Herwaldt, B; Gray, E B; Qvarnstrom, Y; Wise, M E; Cantu, V; Cantey, P T; Bosch, S; DA Silva, A J; Fields, A; Bishop, H; Wellman, A; Beal, J; Wilson, N; Fiore, A E; Tauxe, R; Lance, S; Slutsker, L; Parise, M
2015-12-01
The 2013 multistate outbreaks contributed to the largest annual number of reported US cases of cyclosporiasis since 1997. In this paper we focus on investigations in Texas. We defined an outbreak-associated case as laboratory-confirmed cyclosporiasis in a person with illness onset between 1 June and 31 August 2013, with no history of international travel in the previous 14 days. Epidemiological, environmental, and traceback investigations were conducted. Of the 631 cases reported in the multistate outbreaks, Texas reported the greatest number of cases, 270 (43%). More than 70 clusters were identified in Texas, four of which were further investigated. One restaurant-associated cluster of 25 case-patients was selected for a case-control study. Consumption of cilantro was most strongly associated with illness on meal date-matched analysis (matched odds ratio 19·8, 95% confidence interval 4·0-∞). All case-patients in the other three clusters investigated also ate cilantro. Traceback investigations converged on three suppliers in Puebla, Mexico. Cilantro was the vehicle of infection in the four clusters investigated; the temporal association of these clusters with the large overall increase in cyclosporiasis cases in Texas suggests cilantro was the vehicle of infection for many other cases. However, the paucity of epidemiological and traceback information does not allow for a conclusive determination; moreover, molecular epidemiological tools for cyclosporiasis that could provide more definitive linkage between case clusters are needed.
Balzer, Laura B; Zheng, Wenjing; van der Laan, Mark J; Petersen, Maya L
2018-01-01
We often seek to estimate the impact of an exposure naturally occurring or randomly assigned at the cluster-level. For example, the literature on neighborhood determinants of health continues to grow. Likewise, community randomized trials are applied to learn about real-world implementation, sustainability, and population effects of interventions with proven individual-level efficacy. In these settings, individual-level outcomes are correlated due to shared cluster-level factors, including the exposure, as well as social or biological interactions between individuals. To flexibly and efficiently estimate the effect of a cluster-level exposure, we present two targeted maximum likelihood estimators (TMLEs). The first TMLE is developed under a non-parametric causal model, which allows for arbitrary interactions between individuals within a cluster. These interactions include direct transmission of the outcome (i.e. contagion) and influence of one individual's covariates on another's outcome (i.e. covariate interference). The second TMLE is developed under a causal sub-model assuming the cluster-level and individual-specific covariates are sufficient to control for confounding. Simulations compare the alternative estimators and illustrate the potential gains from pairing individual-level risk factors and outcomes during estimation, while avoiding unwarranted assumptions. Our results suggest that estimation under the sub-model can result in bias and misleading inference in an observational setting. Incorporating working assumptions during estimation is more robust than assuming they hold in the underlying causal model. We illustrate our approach with an application to HIV prevention and treatment.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Adam, R.; Ade, P. A. R.; Aghanim, N.
Although infrared (IR) overall dust emission from clusters of galaxies has been statistically detected using data from the Infrared Astronomical Satellite (IRAS), it has not been possible to sample the spectral energy distribution (SED) of this emission over its peak, and thus to break the degeneracy between dust temperature and mass. By complementing the IRAS spectral coverage with Planck satellite data from 100 to 857 GHz, we provide in this paper new constraints on the IR spectrum of thermal dust emission in clusters of galaxies. We achieve this by using a stacking approach for a sample of several hundred objects from the Planck cluster sample. This procedure averages out fluctuations from the IR sky, allowing us to reach a significant detection of the faint cluster contribution. We also use the large frequency range probed by Planck, together with component-separation techniques, to remove the contamination from both cosmic microwave background anisotropies and the thermal Sunyaev-Zeldovich effect (tSZ) signal, which dominate at ν ≤ 353 GHz. By excluding dominant spurious signals or systematic effects, averaged detections are reported at frequencies 353 GHz ≤ ν ≤ 5000 GHz. We confirm the presence of dust in clusters of galaxies at low and intermediate redshifts, yielding an SED with a shape similar to that of the Milky Way. Planck’s resolution does not allow us to investigate the detailed spatial distribution of this emission (e.g. whether it comes from intergalactic dust or simply the dust content of the cluster galaxies), but the radial distribution of the emission appears to follow that of the stacked SZ signal, and thus the extent of the clusters. Finally, the recovered SED allows us to constrain the dust mass responsible for the signal and its temperature.
González Aracil, J; Ruiz Pérez, I; Aviñó Rico, M J; Hernández Aguado, I
1999-01-01
To measure the usefulness of multiple correspondence analysis (MCA) and cluster analysis applied to the epidemiological research of HIV infection. The specific aims are to explore the relationships between the different variables that characterize the users of the AIDS Information and Prevention Center (CIPS) and to identify clusters of characteristics which, in terms of attendance at these centers, could be considered similar. The clinical histories of the CIPS in the Valencian region of Spain were used as the data source. The target population was intravenous drug users (IDUs) attending these centers between 1987 and 1994 (n = 6211). Information about socio-demographic and HIV type I infection-related variables (drug use and sexual behaviour) was collected by means of a semistructured questionnaire. An MCA was carried out to obtain a group of quantitative factors that were used in a cluster analysis. An HIV type I prevalence of 44.8% was found. Five factors were detected by MCA that explain 51.14% of the total variability, of which sex, age and the usual sexual partner were the variables best explained. Cluster analysis made it possible to describe 5 different subgroups of CIPS users according to their socio-demographic characteristics, risk behaviours and serologic status. Categories 1 and 2 deserve particular attention, as they capture the serologic status and the most relevant characteristics of HIV infection. Category 1 contains users with a negative serology, characterized by being mainly single adolescent men with a low educational level; they stated that they have no steady sexual partner, do not share syringes and have been intravenous drug users for between 3 and 10 years. They mainly come from the city of Alicante. Category 2 contains mainly people who are HIV positive and older. They also share syringes and have been intravenous drug users for a longer time; they have a higher education level and most of them come from the city of Valencia.
The proposed method of analysis was able to characterise the CIPS users, identifying those socio-demographic variables and risk behaviours that are more related to the serologic status. The applicability of these techniques to epidemiologic studies of HIV type I infection is discussed.
Wu, Jianlan; Tang, Zhoufei; Gong, Zhihao; Cao, Jianshu; Mukamel, Shaul
2015-04-02
The energy absorbed in a light-harvesting protein complex is often transferred collectively through aggregated chromophore clusters. For population evolution of chromophores, the time-integrated effective rate matrix allows us to construct quantum kinetic clusters quantitatively and determine the reduced cluster-cluster transfer rates systematically, thus defining a minimal model of energy-transfer kinetics. For Fenna-Matthews-Olson (FMO) and light-harvesting complex II (LHCII) monomers, quantum Markovian kinetics of clusters can accurately reproduce the overall energy-transfer process on long time scales. The dominant energy-transfer pathways are identified in the picture of aggregated clusters. The chromophores distributed extensively in various clusters can assist a fast and long-range energy transfer.
ClusterViz: A Cytoscape APP for Cluster Analysis of Biological Network.
Wang, Jianxin; Zhong, Jiancheng; Chen, Gang; Li, Min; Wu, Fang-xiang; Pan, Yi
2015-01-01
Cluster analysis of biological networks is one of the most important approaches for identifying functional modules and predicting protein functions. Furthermore, visualization of clustering results is crucial to uncover the structure of biological networks. In this paper, ClusterViz, an APP of Cytoscape 3 for cluster analysis and visualization, has been developed. In order to reduce complexity and enable extendibility for ClusterViz, we designed the architecture of ClusterViz based on the framework of Open Services Gateway Initiative. According to the architecture, the implementation of ClusterViz is partitioned into three modules: the interface of ClusterViz, the clustering algorithms, and visualization and export. ClusterViz facilitates the comparison of the results of different algorithms for further related analysis. Three commonly used clustering algorithms, FAG-EC, EAGLE and MCODE, are included in the current version. Because the clustering-algorithms module adopts an abstract algorithm interface, more clustering algorithms can be added in the future. To illustrate the usability of ClusterViz, we provided three examples with detailed steps from important scientific articles, which show that our tool has helped several research teams do their research work on the mechanism of the biological networks.
Weighted Key Player Problem for Social Network Analysis
2011-03-01
the degree of the actor, the number of adjacent neighbors, to determine its centrality value. Introduced in its current form by Freeman, a node’s...identifying individuals who are key in a number of contexts. This chapter developed the WKPP-Pos measure that allows for the inclusion of actor and...Techniques were developed for using the p-median to find optimal solutions to the WKPP-Pos measure and for using hierarchical clustering as a
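Degree centrality as described in the excerpt (the number of adjacent neighbors of an actor) can be sketched as follows. The normalization by n - 1 is the commonly used Freeman form and is an assumption here, since the excerpt is truncated; the function name and the toy network are hypothetical, and the WKPP-Pos measure itself is not reproduced.

```python
def degree_centrality(edges):
    """Normalized degree centrality for an undirected network.

    edges: iterable of (a, b) pairs; self-loops are ignored.
    Returns {node: degree / (n - 1)}.
    """
    neighbors = {}
    for a, b in edges:
        if a == b:
            continue
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}


# Hypothetical star-shaped network: actor "a" is tied to everyone else.
centrality = degree_centrality([("a", "b"), ("a", "c"), ("a", "d")])
```

In the star example the hub reaches the maximum value of 1, while each peripheral actor has a single neighbor out of three possible.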
Prediction of tautomer ratios by embedded-cluster integral equation theory
NASA Astrophysics Data System (ADS)
Kast, Stefan M.; Heil, Jochen; Güssregen, Stefan; Schmidt, K. Friedemann
2010-04-01
The "embedded cluster reference interaction site model" (EC-RISM) approach combines statistical-mechanical integral equation theory and quantum-chemical calculations for predicting thermodynamic data for chemical reactions in solution. The electronic structure of the solute is determined self-consistently with the structure of the solvent that is described by 3D RISM integral equation theory. The continuous solvent-site distribution is mapped onto a set of discrete background charges ("embedded cluster") that represent an additional contribution to the molecular Hamiltonian. The EC-RISM analysis of the SAMPL2 challenge set of tautomers proceeds in three stages. Firstly, the group of compounds for which quantitative experimental free energy data was provided was taken to determine appropriate levels of quantum-chemical theory for geometry optimization and free energy prediction. Secondly, the resulting workflow was applied to the full set, allowing for chemical interpretations of the results. Thirdly, disclosure of experimental data for parts of the compounds facilitated a detailed analysis of methodical issues and suggestions for future improvements of the model. Without specifically adjusting parameters, the EC-RISM model yields the smallest value of the root mean square error for the first set (0.6 kcal mol⁻¹) as well as for the full set of quantitative reaction data (2.0 kcal mol⁻¹) among the SAMPL2 participants.
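To connect the reported free energy errors to tautomer ratios, the sketch below uses the standard equilibrium relation K = exp(-ΔG / RT). It assumes ΔG is the reaction free energy for A → B in kcal/mol and a temperature of 298.15 K; the function names are hypothetical and nothing here reproduces the EC-RISM workflow itself.

```python
from math import exp

R_KCAL = 1.987204e-3  # gas constant in kcal / (mol K)


def tautomer_ratio(delta_g_kcal, temperature_k=298.15):
    """Equilibrium ratio [B]/[A] for A -> B with free energy delta_g_kcal."""
    return exp(-delta_g_kcal / (R_KCAL * temperature_k))


def ratio_error_factor(free_energy_error_kcal, temperature_k=298.15):
    """Multiplicative uncertainty in the ratio for a given free energy error."""
    return exp(abs(free_energy_error_kcal) / (R_KCAL * temperature_k))


# Errors of 0.6 and 2.0 kcal/mol, as quoted for the two data sets.
for error in (0.6, 2.0):
    print(error, round(ratio_error_factor(error), 1))
```

Read this way, an error of about 1.36 kcal/mol at room temperature corresponds to one order of magnitude in the predicted ratio, which gives a feel for what the quoted errors mean in practice.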
A simple algorithm for the identification of clinical COPD phenotypes.
Burgel, Pierre-Régis; Paillasseur, Jean-Louis; Janssens, Wim; Piquet, Jacques; Ter Riet, Gerben; Garcia-Aymerich, Judith; Cosio, Borja; Bakke, Per; Puhan, Milo A; Langhammer, Arnulf; Alfageme, Inmaculada; Almagro, Pere; Ancochea, Julio; Celli, Bartolome R; Casanova, Ciro; de-Torres, Juan P; Decramer, Marc; Echazarreta, Andrés; Esteban, Cristobal; Gomez Punter, Rosa Mar; Han, MeiLan K; Johannessen, Ane; Kaiser, Bernhard; Lamprecht, Bernd; Lange, Peter; Leivseth, Linda; Marin, Jose M; Martin, Francis; Martinez-Camblor, Pablo; Miravitlles, Marc; Oga, Toru; Sofia Ramírez, Ana; Sin, Don D; Sobradillo, Patricia; Soler-Cataluña, Juan J; Turner, Alice M; Verdu Rivera, Francisco Javier; Soriano, Joan B; Roche, Nicolas
2017-11-01
This study aimed to identify simple rules for allocating chronic obstructive pulmonary disease (COPD) patients to clinical phenotypes identified by cluster analyses. Data from 2409 COPD patients of French/Belgian COPD cohorts were analysed using cluster analysis resulting in the identification of subgroups, for which clinical relevance was determined by comparing 3-year all-cause mortality. Classification and regression trees (CARTs) were used to develop an algorithm for allocating patients to these subgroups. This algorithm was tested in 3651 patients from the COPD Cohorts Collaborative International Assessment (3CIA) initiative. Cluster analysis identified five subgroups of COPD patients with different clinical characteristics (especially regarding severity of respiratory disease and the presence of cardiovascular comorbidities and diabetes). The CART-based algorithm indicated that the variables relevant for patient grouping differed markedly between patients with isolated respiratory disease (FEV1, dyspnoea grade) and those with multi-morbidity (dyspnoea grade, age, FEV1 and body mass index). Application of this algorithm to the 3CIA cohorts confirmed that it identified subgroups of patients with different clinical characteristics, mortality rates (median, from 4% to 27%) and age at death (median, from 68 to 76 years). A simple algorithm, integrating respiratory characteristics and comorbidities, allowed the identification of clinically relevant COPD phenotypes. Copyright ©ERS 2017.
A statistical study of EMIC waves observed by Cluster. 1. Wave properties. EMIC Wave Properties
Allen, R. C.; Zhang, J. -C.; Kistler, L. M.; ...
2015-07-23
Electromagnetic ion cyclotron (EMIC) waves are an important mechanism for particle energization and losses inside the magnetosphere. In order to better understand the effects of these waves on particle dynamics, detailed information about the occurrence rate, wave power, ellipticity, normal angle, energy propagation angle distributions, and local plasma parameters is required. Previous statistical studies have used in situ observations to investigate the distribution of these parameters in the magnetic local time versus L-shell (MLT-L) frame within a limited magnetic latitude (MLAT) range. In our study, we present a statistical analysis of EMIC wave properties using 10 years (2001–2010) of data from Cluster, totaling 25,431 min of wave activity. Due to the polar orbit of Cluster, we are able to investigate EMIC waves at all MLATs and MLTs. This allows us to further investigate the MLAT dependence of various wave properties inside different MLT sectors and further explore the effects of Shabansky orbits on EMIC wave generation and propagation. The statistical analysis is presented in two papers. Our paper focuses on the wave occurrence distribution as well as the distribution of wave properties. The companion paper focuses on local plasma parameters during wave observations as well as wave generation proxies.
Size and shape variations of the bony components of sperm whale cochleae.
Schnitzler, Joseph G; Frédérich, Bruno; Früchtnicht, Sven; Schaffeld, Tobias; Baltzer, Johannes; Ruser, Andreas; Siebert, Ursula
2017-04-25
Several mass strandings of sperm whales occurred in the North Sea during January and February 2016. Twelve animals were necropsied and sampled around 48 h after their discovery on German coasts of Schleswig Holstein. The present study aims to explore the morphological variation of the primary sensory organ of sperm whales, the left and right auditory system, using high-resolution computerised tomography imaging. We performed a quantitative analysis of size and shape of cochleae using landmark-based geometric morphometrics to reveal inter-individual anatomical variations. A hierarchical cluster analysis based on thirty-one external morphometric characters classified these 12 individuals in two stranding clusters. A relative amount of shape variation could be attributable to geographical differences among stranding locations and clusters. Our geometric data allowed the discrimination of distinct bachelor schools among sperm whales that stranded on German coasts. We argue that the cochleae are individually shaped, varying greatly in dimensions and that the intra-specific variation observed in the morphology of the cochleae may partially reflect their affiliation to their bachelor school. There are increasing concerns about the impact of noise on cetaceans and describing the auditory periphery of odontocetes is a key conservation issue to further assess the effect of noise pollution.
CyTOF workflow: differential discovery in high-throughput high-dimensional cytometry datasets
Nowicka, Malgorzata; Krieg, Carsten; Weber, Lukas M.; Hartmann, Felix J.; Guglietta, Silvia; Becher, Burkhard; Levesque, Mitchell P.; Robinson, Mark D.
2017-01-01
High dimensional mass and flow cytometry (HDCyto) experiments have become a method of choice for high throughput interrogation and characterization of cell populations. Here, we present an R-based pipeline for differential analyses of HDCyto data, largely based on Bioconductor packages. We computationally define cell populations using FlowSOM clustering, and facilitate an optional but reproducible strategy for manual merging of algorithm-generated clusters. Our workflow offers different analysis paths, including association of cell type abundance with a phenotype or changes in signaling markers within specific subpopulations, or differential analyses of aggregated signals. Importantly, the differential analyses we show are based on regression frameworks where the HDCyto data is the response; thus, we are able to model arbitrary experimental designs, such as those with batch effects, paired designs and so on. In particular, we apply generalized linear mixed models to analyses of cell population abundance or cell-population-specific analyses of signaling markers, allowing overdispersion in cell count or aggregated signals across samples to be appropriately modeled. To support the formal statistical analyses, we encourage exploratory data analysis at every step, including quality control (e.g. multi-dimensional scaling plots), reporting of clustering results (dimensionality reduction, heatmaps with dendrograms) and differential analyses (e.g. plots of aggregated signals). PMID:28663787
Molecular subtyping of bladder cancer using Kohonen self-organizing maps
Borkowska, Edyta M; Kruk, Andrzej; Jedrzejczyk, Adam; Rozniecki, Marek; Jablonowski, Zbigniew; Traczyk, Magdalena; Constantinou, Maria; Banaszkiewicz, Monika; Pietrusinski, Michal; Sosnowski, Marek; Hamdy, Freddie C; Peter, Stefan; Catto, James WF; Kaluzewski, Bogdan
2014-01-01
Kohonen self-organizing maps (SOMs) are unsupervised Artificial Neural Networks (ANNs) that are good for low-density data visualization. They easily deal with complex and nonlinear relationships between variables. We evaluated molecular events that characterize high- and low-grade BC pathways in the tumors from 104 patients. We compared the ability of statistical clustering with a SOM to stratify tumors according to the risk of progression to more advanced disease. In univariable analysis, tumor stage (log rank P = 0.006) and grade (P < 0.001), HPV DNA (P < 0.004), Chromosome 9 loss (P = 0.04) and the A148T polymorphism (rs 3731249) in CDKN2A (P = 0.02) were associated with progression. Multivariable analysis of these parameters identified that tumor grade (Cox regression, P = 0.001, OR 2.9 (95% CI 1.6–5.2)) and the presence of HPV DNA (P = 0.017, OR 3.8 (95% CI 1.3–11.4)) were the only independent predictors of progression. Unsupervised hierarchical clustering grouped the tumors into discrete branches but did not stratify according to progression free survival (log rank P = 0.39). These genetic variables were presented to SOM input neurons. SOMs are suitable for complex data integration, allow easy visualization of outcomes, and may stratify BC progression more robustly than hierarchical clustering. PMID:25142434
On the design and analysis of clinical trials with correlated outcomes
Follmann, Dean; Proschan, Michael
2014-01-01
SUMMARY The convention in clinical trials is to regard outcomes as independently distributed, but in some situations they may be correlated. For example, in infectious diseases, correlation may be induced if participants have contact with a common infectious source, or share hygienic tips that prevent infection. This paper discusses the design and analysis of randomized clinical trials that allow arbitrary correlation among all randomized volunteers. This perspective generalizes the traditional perspective of strata, where patients are exchangeable within strata, and independent across strata. For theoretical work, we focus on the test of no treatment effect μ1 − μ0 = 0 when the n dimensional vector of outcomes follows a Gaussian distribution with known n × n covariance matrix Σ, where the half randomized to treatment (placebo) have mean response μ1 (μ0). We show how the new test corresponds to familiar tests in simple situations for independent, exchangeable, paired, and clustered data. We also discuss the design of trials where Σ is known before or during randomization of patients and evaluate randomization schemes based on such knowledge. We provide two complex examples to illustrate the method, one for a study of 23 family clusters with cardiomyopathy, the other where the malaria attack rates vary within households and clusters of households in a Malian village. PMID:25111420
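The test of no treatment effect described above can be illustrated with a generalized least squares (GLS) z-statistic. This is a minimal sketch under the assumption that the test is the standard GLS test for the model Y = μ0·1 + (μ1 − μ0)·z + ε with ε ~ N(0, Σ) and Σ known; the authors' exact construction may differ, and the names `solve` and `gls_treatment_z` are hypothetical.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def gls_treatment_z(y, treated, sigma):
    """Return (estimate of mu1 - mu0, z-statistic) for known covariance sigma.

    Assumed model (not taken from the paper): y = mu0 + delta * z + noise,
    noise ~ N(0, sigma), with z = 1 for treatment and 0 for placebo.
    """
    ones = [1.0] * len(y)
    z = [1.0 if t else 0.0 for t in treated]
    si_ones = solve(sigma, ones)   # Sigma^{-1} 1
    si_z = solve(sigma, z)         # Sigma^{-1} z
    a11, a1z, azz = dot(ones, si_ones), dot(ones, si_z), dot(z, si_z)
    b1, bz = dot(si_ones, y), dot(si_z, y)
    det = a11 * azz - a1z ** 2     # determinant of X' Sigma^{-1} X
    delta = (a11 * bz - a1z * b1) / det
    var_delta = a11 / det          # (2,2) entry of (X' Sigma^{-1} X)^{-1}
    return delta, delta / math.sqrt(var_delta)
```

With independent outcomes (Σ equal to the identity), the estimate reduces to the difference in group means and the statistic to the familiar two-sample z-test, which is consistent with the abstract's remark that the test matches familiar tests in simple situations.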
Ammerlaan, Judy W; van Os-Medendorp, Harmieke; de Boer-Nijhof, Nienke; Maat, Bertha; Scholtus, Lieske; Kruize, Aike A; Bijlsma, Johannes W J; Geenen, Rinie
2017-03-01
The aim of this study was to investigate preferences and needs regarding the structure and content of a person-centered online self-management support intervention for patients with a rheumatic disease. A four-step procedure, consisting of online focus group interviews, consensus meetings with patient representatives, a card sorting task and hierarchical cluster analysis, was used to identify the preferences and needs. Preferences concerning the structure involved 1) suitability to individual needs and questions, 2) fit to the life stage, 3) creating the opportunity to share experiences and be in contact with others, 4) having an expert patient as trainer, 5) allowing for doing the training at one's own pace and 6) offering a brief intervention. Hierarchical cluster analysis of 55 content needs comprised eleven clusters: 1) treatment knowledge, 2) societal procedures, 3) physical activity, 4) psychological distress, 5) self-efficacy, 6) provider, 7) fluctuations, 8) dealing with rheumatic disease, 9) communication, 10) intimate relationship, and 11) having children. A comprehensive assessment of preferences and needs in patients with a rheumatic disease is expected to contribute to motivation, adherence to and outcome of self-management-support programs. The overview of preferences and needs can be used to build an online self-management intervention. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Sánchez-Salcedo, Eva M; Tassotti, Michele; Del Rio, Daniele; Hernández, Francisca; Martínez, Juan José; Mena, Pedro
2016-12-01
This study reports the (poly)phenolic fingerprinting and chemometric discrimination of leaves of eight mulberry clones from Morus alba and Morus nigra cultivated in Spain. UHPLC-MS(n) (Ultra High Performance Liquid Chromatography-Mass Spectrometry) high-throughput analysis allowed the tentative identification of a total of 31 compounds. The phenolic profile of mulberry leaf was characterized by the presence of a high number of flavonol derivatives, mainly glycosylated forms of quercetin and kaempferol. Caffeoylquinic acids, simple phenolic acids, and some organic acids were also detected. Seven compounds were identified for the first time in mulberry leaves. The chemometric analysis (cluster analysis and principal component analysis) of the chromatographic data allowed the characterization of the different mulberry clones and served to explain the great intraspecific variability in mulberry secondary metabolism. This screening of the complete phenolic profile of mulberry leaves can assist the increasing interest for purposes related to quality control, germplasm screening, and bioactivity evaluation. Copyright © 2016 Elsevier Ltd. All rights reserved.
Cluster size dependence of high-order harmonic generation
NASA Astrophysics Data System (ADS)
Tao, Y.; Hagmeijer, R.; Bastiaens, H. M. J.; Goh, S. J.; van der Slot, P. J. M.; Biedron, S. G.; Milton, S. V.; Boller, K.-J.
2017-08-01
We investigate high-order harmonic generation (HHG) from noble gas clusters in a supersonic gas jet. To identify the contribution of harmonic generation from clusters versus that from gas monomers, we measure the high-order harmonic output over a broad range of the total atomic number density in the jet (from 3 × 10^16 to 3 × 10^18 cm^-3) at two different reservoir temperatures (303 and 363 K). For the first time in the evaluation of the harmonic yield in such measurements, the variation of the liquid mass fraction, g, versus pressure and temperature is taken into consideration, which we determine, reliably and consistently, to be below 20% within our range of experimental parameters. By comparing the measured harmonic yield from a thin jet with the calculated corresponding yield from monomers alone, we find an increased emission of the harmonics when the average cluster size is less than 3000. Using g, under the assumption that the emission from monomers and clusters add up coherently, we calculate the ratio of the average single-atom response of an atom within a cluster to that of a monomer and find an enhancement of around 100 for very small average cluster size (∼200). We do not find any dependence of the cut-off frequency on the composition of the cluster jet. This implies that HHG in clusters is based on electrons that return to their parent ions and not to neighboring ions in the cluster. To fully employ the enhanced average single-atom response found for small average cluster sizes (∼200), the nozzle producing the cluster jet must provide a large liquid mass fraction at these small cluster sizes for increasing the harmonic yield. Moreover, cluster jets may allow for quasi-phase matching, as the higher mass of clusters allows for a higher density contrast in spatially structuring the nonlinear medium.
PCA/HEXTE Observations of Coma and A2319
NASA Technical Reports Server (NTRS)
Rephaeli, Yoel
1998-01-01
The Coma cluster was observed in 1996 for 90 ks by the PCA and HEXTE instruments aboard the RXTE satellite, the first simultaneous, pointing measurement of Coma in the broad, 2-250 keV, energy band. The high sensitivity achieved during this long observation allows precise determination of the spectrum. Our analysis of the measurements clearly indicates that in addition to the main thermal emission from hot intracluster gas at kT=7.5 keV, a second spectral component is required to best-fit the data. If thermal, it can be described with a temperature of 4.7 keV contributing about 20% of the total flux. The additional spectral component can also be described by a power-law, possibly due to Compton scattering of relativistic electrons by the CMB. This interpretation is based on the diffuse radio synchrotron emission, which has a spectral index of 2.34, within the range allowed by fits to the RXTE spectral data. A Compton origin of the measured nonthermal component would imply that the volume-averaged magnetic field in the central region of Coma is B =0.2 micro-Gauss, a value deduced directly from the radio and X-ray measurements (and thus free of the usual assumption of energy equipartition). Barring the presence of unknown systematic errors in the RXTE source or background measurements, our spectral analysis yields considerable evidence for Compton X-ray emission in the Coma cluster.
Network based approaches reveal clustering in protein point patterns
NASA Astrophysics Data System (ADS)
Parker, Joshua; Barr, Valarie; Aldridge, Joshua; Samelson, Lawrence E.; Losert, Wolfgang
2014-03-01
Recent advances in super-resolution imaging have allowed for the sub-diffraction measurement of the spatial location of proteins on the surfaces of T-cells. The challenge is to connect these complex point patterns to the internal processes and interactions, both protein-protein and protein-membrane. We begin analyzing these patterns by forming a geometric network amongst the proteins and looking at network measures, such as the degree distribution. This allows us to compare experimentally observed patterns to models. Specifically, we find that the experimental patterns differ from heterogeneous Poisson processes, highlighting an internal clustering structure. Further work will be to compare our results to simulated protein-protein interactions to determine clustering mechanisms.
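The geometric-network idea above can be sketched as follows. This is an illustrative sketch only: the linking rule (connect points closer than a fixed radius), the radius value, and the synthetic point patterns are assumptions, and a uniform random pattern stands in for a Poisson baseline rather than the heterogeneous Poisson model used in the study.

```python
import math
import random
from collections import Counter


def degree_distribution(points, radius):
    """Link every pair of points closer than `radius`; return {degree: count}."""
    degrees = [0] * len(points)
    for i, (xi, yi) in enumerate(points):
        for j in range(i + 1, len(points)):
            xj, yj = points[j]
            if math.hypot(xi - xj, yi - yj) < radius:
                degrees[i] += 1
                degrees[j] += 1
    return Counter(degrees)


def mean_degree(distribution):
    """Average node degree from a {degree: count} mapping."""
    total = sum(distribution.values())
    return sum(k * n for k, n in distribution.items()) / total


rng = random.Random(0)

# Baseline: 400 points placed uniformly at random in the unit square.
uniform = [(rng.random(), rng.random()) for _ in range(400)]

# Clustered pattern: 40 centres, each with 10 tightly scattered points.
clustered = []
for _ in range(40):
    cx, cy = rng.random(), rng.random()
    clustered += [(cx + rng.gauss(0, 0.01), cy + rng.gauss(0, 0.01))
                  for _ in range(10)]

uniform_mean = mean_degree(degree_distribution(uniform, 0.03))
clustered_mean = mean_degree(degree_distribution(clustered, 0.03))
```

At the same point count and radius, the clustered pattern yields a much higher mean degree than the uniform one, which is the kind of difference a degree-distribution comparison is meant to expose.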
acdc – Automated Contamination Detection and Confidence estimation for single-cell genome data
Lux, Markus; Kruger, Jan; Rinke, Christian; ...
2016-12-20
A major obstacle in single-cell sequencing is sample contamination with foreign DNA. To guarantee clean genome assemblies and to prevent the introduction of contamination into public databases, considerable quality control efforts are put into post-sequencing analysis. Contamination screening generally relies on reference-based methods such as database alignment or marker gene search, which limits the set of detectable contaminants to organisms with closely related reference species. As genomic coverage in the tree of life is highly fragmented, there is an urgent need for a reference-free methodology for contaminant identification in sequence data. We present acdc, a tool specifically developed to aid the quality control process of genomic sequence data. By combining supervised and unsupervised methods, it reliably detects both known and de novo contaminants. First, 16S rRNA gene prediction and the inclusion of ultrafast exact alignment techniques allow sequence classification using existing knowledge from databases. Second, reference-free inspection is enabled by the use of state-of-the-art machine learning techniques that include fast, non-linear dimensionality reduction of oligonucleotide signatures and subsequent clustering algorithms that automatically estimate the number of clusters. The latter also enables the removal of any contaminant, yielding a clean sample. Furthermore, given the data complexity and the ill-posedness of clustering, acdc employs bootstrapping techniques to provide statistically profound confidence values. Tested on a large number of samples from diverse sequencing projects, our software is able to quickly and accurately identify contamination. Results are displayed in an interactive user interface. Acdc can be run from the web as well as a dedicated command line application, which allows easy integration into large sequencing project analysis workflows. Acdc can reliably detect contamination in single-cell genome data. In addition to database-driven detection, it complements existing tools by its unsupervised techniques, which allow for the detection of de novo contaminants. Our contribution has the potential to drastically reduce the amount of resources put into these processes, particularly in the context of limited availability of reference species. As single-cell genome data continues to grow rapidly, acdc adds to the toolkit of crucial quality assurance tools.
Testing the accuracy of clustering redshifts with simulations
NASA Astrophysics Data System (ADS)
Scottez, V.; Benoit-Lévy, A.; Coupon, J.; Ilbert, O.; Mellier, Y.
2018-03-01
We explore the accuracy of clustering-based redshift inference within the MICE2 simulation. This method uses the spatial clustering of galaxies between a spectroscopic reference sample and an unknown sample. This study gives an estimate of the reachable accuracy of this method. First, we discuss the requirements for the number of objects in the two samples, confirming that this method does not require a representative spectroscopic sample for calibration. In the context of the next generation of cosmological surveys, we estimated that the density of the Quasi Stellar Objects in BOSS allows us to reach 0.2 per cent accuracy in the mean redshift. Secondly, we estimate individual redshifts for galaxies in the densest regions of colour space (~30 per cent of the galaxies) without using the photometric redshifts procedure. The advantage of this procedure is threefold. It allows: (i) the use of cluster-zs for any field in astronomy, (ii) the possibility to combine photo-zs and cluster-zs to get an improved redshift estimation, (iii) the use of cluster-z to define tomographic bins for weak lensing. Finally, we explore this last option and build five cluster-z selected tomographic bins from redshift 0.2 to 1. We found a bias on the mean redshift estimate of 0.002 per bin. We conclude that cluster-z could be used as a primary redshift estimator by the next generation of cosmological surveys.
A census of variability in globular cluster M 68 (NGC 4590)
NASA Astrophysics Data System (ADS)
Kains, N.; Arellano Ferro, A.; Figuera Jaimes, R.; Bramich, D. M.; Skottfelt, J.; Jørgensen, U. G.; Tsapras, Y.; Street, R. A.; Browne, P.; Dominik, M.; Horne, K.; Hundertmark, M.; Ipatov, S.; Snodgrass, C.; Steele, I. A.; Lcogt/Robonet Consortium; Alsubai, K. A.; Bozza, V.; Calchi Novati, S.; Ciceri, S.; D'Ago, G.; Galianni, P.; Gu, S.-H.; Harpsøe, K.; Hinse, T. C.; Juncher, D.; Korhonen, H.; Mancini, L.; Popovas, A.; Rabus, M.; Rahvar, S.; Southworth, J.; Surdej, J.; Vilela, C.; Wang, X.-B.; Wertz, O.; Mindstep Consortium
2015-06-01
Aims: We analyse 20 nights of CCD observations in the V and I bands of the globular cluster M 68 (NGC 4590) and use them to detect variable objects. We also obtained electron-multiplying CCD (EMCCD) observations for this cluster in order to explore its core with unprecedented spatial resolution from the ground. Methods: We reduced our data using difference image analysis to achieve the best possible photometry in the crowded field of the cluster. In doing so, we show that when dealing with identical networked telescopes, a reference image from any telescope may be used to reduce data from any other telescope, which facilitates the analysis significantly. We then used our light curves to estimate the properties of the RR Lyrae (RRL) stars in M 68 through Fourier decomposition and empirical relations. The variable star properties then allowed us to derive the cluster's metallicity and distance. Results: M 68 had 45 previously confirmed variables, including 42 RRL and 2 SX Phoenicis (SX Phe) stars. In this paper we determine new periods and search for new variables, especially in the core of the cluster where our method performs particularly well. We detect 4 additional SX Phe stars and confirm the variability of another star, bringing the total number of confirmed variable stars in this cluster to 50. We also used archival data stretching back to 1951 to derive period changes for some of the single-mode RRL stars, and analyse the significant number of double-mode RRL stars in M 68. Furthermore, we find evidence for double-mode pulsation in one of the SX Phe stars in this cluster. 
Using the different classes of variables, we derived values for the metallicity of the cluster of [Fe/H] = -2.07 ± 0.06 on the ZW scale, or -2.20 ± 0.10 on the UVES scale, and found true distance moduli μ0 = 15.00 ± 0.11 mag (using RR0 stars), 15.00 ± 0.05 mag (using RR1 stars), 14.97 ± 0.11 mag (using SX Phe stars), and 15.00 ± 0.07 mag (using the M_V-[Fe/H] relation for RRL stars), corresponding to physical distances of 10.00 ± 0.49, 9.99 ± 0.21, 9.84 ± 0.50, and 10.00 ± 0.30 kpc, respectively. Thanks to the first use of difference image analysis on time-series observations of M 68, we are now confident that we have a complete census of the RRL stars in this cluster. The full Table 2 is only available at the CDS via anonymous ftp to http://cdsarc.u-strasbg.fr (ftp://130.79.128.5) or via http://cdsarc.u-strasbg.fr/viz-bin/qcat?J/A+A/578/A128
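The conversion from true distance modulus to physical distance quoted above follows the standard relation μ0 = 5 log10(d / 10 pc). The sketch below applies that relation with first-order error propagation; the function names are hypothetical, and small differences from the quoted distances are expected because the abstract reports rounded moduli.

```python
import math


def distance_kpc(mu0):
    """Distance in kpc from a true distance modulus mu0 = 5 log10(d / 10 pc)."""
    return 10 ** (mu0 / 5 + 1) / 1000


def distance_error_kpc(mu0, sigma_mu):
    """First-order propagation: sigma_d = d * ln(10) / 5 * sigma_mu."""
    return distance_kpc(mu0) * math.log(10) / 5 * sigma_mu


# Moduli and uncertainties as quoted in the abstract (mag).
for label, mu0, sigma in [("RR0", 15.00, 0.11), ("RR1", 15.00, 0.05),
                          ("SX Phe", 14.97, 0.11), ("M_V-[Fe/H]", 15.00, 0.07)]:
    d = distance_kpc(mu0)
    err = distance_error_kpc(mu0, sigma)
    print(f"{label}: {d:.2f} +/- {err:.2f} kpc")
```

A modulus of exactly 15.00 mag corresponds to 10 kpc, which matches the RR0 and M_V-[Fe/H] distances reported in the abstract.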
NASA Astrophysics Data System (ADS)
Ganeev, Rashid A.
The use of nanoparticles for efficient conversion of the wavelength of ultrashort laser pulses toward the deep UV spectral range through harmonic generation is an attractive application of cluster-containing plasmas. Note that earlier observations of HHG in nanoparticles were limited to exotic gas clusters formed during fast cooling of atomic flow from gas jets [1-4]. It is difficult to define the structure of such clusters and the ratio between nanoparticles and atoms/ions in the gas flow. The characterization of gas phase cluster production has been improved using sophisticated techniques (e.g., control of nanoparticle mass and spatial distribution, see the review [5]). In the meantime, plasma nanoparticle HHG has demonstrated some advantages over gas cluster HHG [6]. The application of commercially available nanopowders allowed for precisely defining the sizes and structure of these clusters in the plume. The laser ablation technique made possible the predictable manipulation of plasma characteristics, which led to the creation of laser plumes containing mainly nanoparticles with known spatial structure. The latter allows the application of such plumes in nonlinear optics, X-ray emission of clusters, deposition of nanoparticles with fixed parameters on substrates for the semiconductor industry, production of nanostructured and nanocomposite films, etc.
Community detection using Kernel Spectral Clustering with memory
NASA Astrophysics Data System (ADS)
Langone, Rocco; Suykens, Johan A. K.
2013-02-01
This work is related to the problem of community detection in dynamic scenarios, which arises, for instance, in the segmentation of moving objects, clustering of telephone traffic data, time-series micro-array data, etc. A desirable feature of a clustering model that has to capture the evolution of communities over time is the temporal smoothness between clusters in successive time-steps. In this way the model is able to track the long-term trend while at the same time smoothing out short-term variation due to noise. We use the Kernel Spectral Clustering with Memory effect (MKSC), which allows cluster memberships of new nodes to be predicted via out-of-sample extension and has a proper model selection scheme. It is based on a constrained optimization formulation typical of Least Squares Support Vector Machines (LS-SVM), where the objective function is designed to explicitly incorporate temporal smoothness as valid prior knowledge. The latter, in fact, allows the model to cluster the current data well and to be consistent with the recent history. Here we propose a generalization of the MKSC model with an arbitrary memory, not only one time-step in the past. The experiments conducted on toy problems confirm our expectations: the more memory we add to the model, the smoother over time are the clustering results. We also compare with the Evolutionary Spectral Clustering (ESC) algorithm, which is a state-of-the-art method, and we obtain comparable or better results.
NASA Astrophysics Data System (ADS)
Poppe, Sam; Barette, Florian; Smets, Benoît; Benbakkar, Mhammed; Kervyn, Matthieu
2016-04-01
The Virunga Volcanic Province (VVP) is situated within the western branch of the East-African Rift. The geochemistry and petrology of its volcanic products have been studied extensively in a fragmented manner. They represent a unique collection of silica-undersaturated, ultra-alkaline and ultra-potassic compositions, displaying marked geochemical variations over the area occupied by the VVP. We present a novel spatially-explicit database of existing whole-rock geochemical analyses of the VVP volcanics, compiled from international publications, (post-)colonial scientific reports and PhD theses. In the database, a total of 703 geochemical analyses of whole-rock samples collected from the 1950s until recently have been characterised with a geographical location, eruption source location, analytical results and uncertainty estimates for each of these categories. Comparative box plots and Kruskal-Wallis H tests on subsets of analyses with contrasting ages or analytical methods suggest that the overall database accuracy is consistent. We demonstrate how statistical techniques such as Principal Component Analysis (PCA) and subsequent cluster analysis allow the identification of clusters of samples with similar major-element compositions. The spatial patterns represented by the contrasting clusters show that both the historically active volcanoes represent compositional clusters which can be identified based on their contrasted silica and alkali contents. Furthermore, two sample clusters are interpreted to represent the most primitive, deep magma source within the VVP, different from the shallow magma reservoirs that feed the eight dominant large volcanoes. The samples from these two clusters systematically originate from locations which 1. are distal compared to the eight large volcanoes and 2. mostly coincide with the surface expressions of rift faults or NE-SW-oriented inherited Precambrian structures which were reactivated during rifting.
The lava from the Mugogo eruption of 1957 belongs to these primitive clusters and is the only one known to have erupted outside the current rift valley in historical times. We thus infer there is a distributed hazard of vent opening susceptibility additional to the susceptibility associated with the main Virunga edifices. This study suggests that the statistical analysis of such a geochemical database may help to understand complex volcanic plumbing systems and the spatial distribution of volcanic hazards in active and poorly known volcanic areas such as the Virunga Volcanic Province.
On the Interaction of the PKS B1358-113 Radio Galaxy with the A1836 Cluster
Stawarz, L.; Szostek, A.; Cheung, C. C.; ...
2014-10-07
In this study, we present the analysis of multifrequency data gathered for the Fanaroff-Riley type-II (FR II) radio galaxy PKS B1358-113, hosted in the brightest cluster galaxy in the center of A1836. The galaxy harbors one of the most massive black holes known to date, and our analysis of the acquired optical data reveals that this black hole is only weakly active, with a mass accretion rate $\dot{M}_{\rm acc} \sim 2 \times 10^{-4} \, \dot{M}_{\rm Edd} \sim 0.02 \, M_{\odot}$ yr$^{-1}$. Based on analysis of new Chandra and XMM-Newton X-ray observations and archival radio data, and assuming the well-established model for the evolution of FR II radio galaxies, we derive the preferred range for the jet kinetic luminosity $L_{\rm j} \sim (1\text{--}6) \times 10^{-3} \, L_{\rm Edd} \sim (0.5\text{--}3) \times 10^{45}$ erg s$^{-1}$. This is above the values implied by various scaling relations proposed for radio sources in galaxy clusters, being instead very close to the maximum jet power allowed for the given accretion rate. We also constrain the radio source lifetime as $\tau_{\rm j} \sim 40\text{--}70$ Myr, meaning the total amount of deposited jet energy $E_{\rm tot} \sim (2\text{--}8) \times 10^{60}$ erg. We argue that approximately half of this energy goes into shock heating of the surrounding thermal gas, and the remaining 50% is deposited into the internal energy of the jet cavity. The detailed analysis of the X-ray data provides indication for the presence of a bow shock driven by the expanding radio lobes into the A1836 cluster environment. We derive the corresponding shock Mach number in the range $\mathcal{M}_{\rm sh} \sim 2\text{--}4$, which is one of the highest claimed for clusters or groups of galaxies. This, together with the recently growing evidence that powerful FR II radio galaxies may not be uncommon in the centers of clusters at higher redshifts, supports the idea that jet-induced shock heating may indeed play an important role in shaping the properties of clusters, galaxy groups, and galaxies in formation. In this context, we speculate on a possible bias against detecting stronger jet-driven shocks in poorer environments, resulting from inefficient electron heating at the shock front, combined with a relatively long electron-ion temperature equilibration timescale.
Astronomy Fun with Mobile Devices
NASA Astrophysics Data System (ADS)
Pilachowski, Catherine A.; Morris, Frank
2016-01-01
Those mobile devices your students bring to class can do more than tweet and text. Engage your students with these web-based astronomy learning tools that allow students to manipulate astronomical data to learn important concepts. The tools are HTML5, CSS3, JavaScript-based applications that provide access to the content on iPad and Android tablets. With "Three Color" students can combine monochrome astronomical images taken through different color filters or in different wavelength regions into a single color image. "Star Clusters" allows students to compare images of clusters with a pre-defined template of colors and sizes to compare clusters of different ages. An adaptation of Travis Rector's "NovaSearch" allows students to examine images of the central regions of the Andromeda Galaxy to find novae and to measure the time over which the nova fades away. New additions to our suite of applications allow students to estimate the surface temperatures of exoplanets and the probability of life elsewhere in the Universe. Further information and access to these web-based tools are available at www.astro.indiana.edu/ala/.
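The idea behind combining three monochrome images into one color image can be sketched in a few lines. This is a hypothetical illustration, not the implementation of the "Three Color" tool: it assumes each input is a 2D list of pixel intensities of identical shape and uses a simple linear stretch per channel; the function names are invented for this example.

```python
def scale_channel(image):
    """Linearly stretch a 2D list of intensities to integers in 0..255."""
    lo = min(min(row) for row in image)
    hi = max(max(row) for row in image)
    span = (hi - lo) or 1.0  # avoid division by zero for a flat image
    return [[round(255 * (v - lo) / span) for v in row] for row in image]


def combine_three_color(red, green, blue):
    """Stack three same-sized monochrome images into rows of (R, G, B) pixels."""
    r, g, b = (scale_channel(channel) for channel in (red, green, blue))
    return [list(zip(r_row, g_row, b_row))
            for r_row, g_row, b_row in zip(r, g, b)]


# Tiny 2x2 example: each filter image is stretched independently.
red = [[0, 10], [10, 0]]
green = [[1, 1], [1, 3]]
blue = [[2, 4], [2, 4]]
rgb = combine_three_color(red, green, blue)
```

Stretching each channel independently is a design choice that makes faint filters visible, at the cost of not preserving the true relative brightness between filters.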