Quantum annealing for combinatorial clustering
NASA Astrophysics Data System (ADS)
Kumar, Vaibhaw; Bass, Gideon; Tomlin, Casey; Dulny, Joseph
2018-02-01
Clustering is a powerful machine learning technique that groups "similar" data points based on their characteristics. Many clustering algorithms work by approximating the minimization of an objective function, namely the sum of within-cluster distances between points. The straightforward approach involves examining all possible assignments of points to each of the clusters. This approach guarantees the solution will be a global minimum; however, the number of possible assignments scales quickly with the number of data points and becomes computationally intractable even for very small datasets. To circumvent this issue, cost-function minima are found using popular local-search-based heuristics such as k-means and hierarchical clustering. Due to their greedy nature, such techniques do not guarantee that a global minimum will be found and can lead to sub-optimal clustering assignments. Other classes of global-search-based techniques, such as simulated annealing, tabu search, and genetic algorithms, may offer better-quality results but can be too time-consuming to implement. In this work, we describe how quantum annealing can be used to carry out clustering. We map the clustering objective to a quadratic binary optimization problem and discuss two clustering algorithms which are then implemented on commercially available quantum annealing hardware, as well as on a purely classical solver, "qbsolv." The first algorithm assigns N data points to K clusters, and the second can be used to perform binary clustering in a hierarchical manner. We present our results in the form of benchmarks against well-known k-means clustering and discuss the advantages and disadvantages of the proposed techniques.
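The mapping from a clustering objective to a quadratic binary optimization problem can be illustrated concretely. The sketch below is a generic one-hot QUBO encoding, not the paper's exact formulation: each point i gets K binary variables x[i][c], within-cluster pairwise squared distances form the objective, and a hand-chosen penalty enforces that each point is assigned to exactly one cluster.

```python
import itertools

def clustering_qubo(points, k, penalty=10.0):
    """Build a QUBO for assigning n points to k clusters.

    Binary variable (i, c) = 1 means point i belongs to cluster c.
    Objective: sum of within-cluster pairwise squared distances, plus
    a penalty enforcing one cluster per point.  Returns Q as a dict
    mapping ((i, c), (j, d)) -> coefficient.
    """
    n = len(points)
    Q = {}

    def add(u, v, w):
        key = (u, v) if u <= v else (v, u)
        Q[key] = Q.get(key, 0.0) + w

    # Within-cluster distance terms: d(i, j) * x[i][c] * x[j][c]
    for i, j in itertools.combinations(range(n), 2):
        dist = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
        for c in range(k):
            add((i, c), (j, c), dist)

    # One-hot constraint per point: penalty * (sum_c x[i][c] - 1)^2
    for i in range(n):
        for c in range(k):
            add((i, c), (i, c), -penalty)          # linear term
        for c, d in itertools.combinations(range(k), 2):
            add((i, c), (i, d), 2.0 * penalty)     # cross term

    return Q

def qubo_energy(Q, assignment):
    """Evaluate the QUBO energy for a {variable: 0/1} assignment."""
    return sum(w * assignment[u] * assignment[v] for (u, v), w in Q.items())
```

The resulting Q dictionary is in the sparse upper-triangular form that annealing front-ends (such as qbsolv) typically accept; the penalty strength must dominate the distance terms for the one-hot constraint to hold in low-energy solutions.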
Hierarchical clustering of EMD based interest points for road sign detection
NASA Astrophysics Data System (ADS)
Khan, Jesmin; Bhuiyan, Sharif; Adhami, Reza
2014-04-01
This paper presents an automatic road traffic sign detection and recognition system based on hierarchical clustering of interest points and joint transform correlation. The proposed algorithm consists of the following three stages: interest-point detection, clustering of those points, and similarity search. At the first stage, discriminative, rotation- and scale-invariant interest points are selected from the image edges based on the 1-D empirical mode decomposition (EMD). We propose a two-step unsupervised clustering technique, which is adaptive and based on two criteria. In this context, the detected points are initially clustered based on stable local features related to brightness and color, which are extracted using a Gabor filter. Then the points belonging to each partition are reclustered, depending on the dispersion of the points in the initial cluster, using a position feature. This two-step hierarchical clustering yields the possible candidate road signs, or regions of interest (ROIs). Finally, a fringe-adjusted joint transform correlation (JTC) technique is used for matching unknown signs against the known reference road signs stored in the database. The presented framework provides a novel way to detect a road sign in natural scenes, and the results demonstrate the efficacy of the proposed technique, which yields a very low false-hit rate.
Determining the Number of Clusters in a Data Set Without Graphical Interpretation
NASA Technical Reports Server (NTRS)
Aguirre, Nathan S.; Davies, Misty D.
2011-01-01
Cluster analysis is a data mining technique that is meant to simplify the process of classifying data points. The basic clustering process requires an input of data points and the number of clusters wanted. The clustering algorithm then picks C starting points for the clusters, which can be either random spatial points or random data points. It then assigns each data point to the nearest C point, where "nearest" usually means Euclidean distance, but some algorithms use another criterion. The next step is determining whether the clustering arrangement thus found is within a certain tolerance. If it falls within this tolerance, the process ends. Otherwise, the C points are adjusted based on how many data points are in each cluster, and the steps repeat until the algorithm converges.
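The loop described above is essentially Lloyd's k-means algorithm. A minimal pure-Python sketch (random data-point seeds, Euclidean distance, convergence when the centers stop moving) looks like this:

```python
import random

def kmeans(points, k, tol=1e-6, max_iter=100, seed=0):
    """Basic k-means: seed with random data points, then alternate
    assignment and center-update steps until movement falls below tol."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(max_iter):
        # Assign each point to the nearest center (squared Euclidean distance).
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            clusters[i].append(p)
        # Recompute each center as the mean of its assigned points.
        new_centers = [
            tuple(sum(coords) / len(cl) for coords in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
        shift = max(sum((a - b) ** 2 for a, b in zip(c0, c1))
                    for c0, c1 in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:   # centers stopped moving: converged
            break
    return centers, clusters
```

This converges to a local minimum of the within-cluster distance objective, which is exactly why the result depends on the starting C points.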
Monitoring of dispersed smoke-plume layers by determining locations of the data-point clusters
NASA Astrophysics Data System (ADS)
Kovalev, Vladimir; Wold, Cyle; Petkov, Alexander; Min Hao, Wei
2018-04-01
A modified technique for processing the signals recorded by a zenith-directed lidar operating in a smoke-polluted atmosphere is discussed. The technique is based on simple transformations of the lidar backscatter signal and the determination of the spatial location of data-point clusters. The technique allows more reliable detection of the location of dispersed smoke layering. Examples of typical results obtained with lidar in a smoke-polluted atmosphere are presented.
Fractal Clustering and Knowledge-driven Validation Assessment for Gene Expression Profiling.
Wang, Lu-Yong; Balasubramanian, Ammaiappan; Chakraborty, Amit; Comaniciu, Dorin
2005-01-01
DNA microarray experiments generate a substantial amount of information about global gene expression. Gene expression profiles can be represented as points in multi-dimensional space. It is essential to identify relevant groups of genes in biomedical research. Clustering is helpful in pattern recognition in gene expression profiles. A number of clustering techniques have been introduced. However, these traditional methods mainly rely on shape-based assumptions or some distance metric to cluster the points in multi-dimensional linear Euclidean space. Their results show poor consistency with the functional annotation of genes in previous validation studies. From a different perspective, we propose a fractal clustering method that clusters genes using the intrinsic (fractal) dimension from modern geometry. This method clusters points in such a way that points in the same cluster are more self-affine among themselves than they are to points in other clusters. We assess this method using annotation-based validation assessment for gene clusters. It shows that this method is superior to other traditional methods in identifying functionally related gene groups.
Efficient clustering aggregation based on data fragments.
Wu, Ou; Hu, Weiming; Maybank, Stephen J; Zhu, Mingliang; Li, Bing
2012-06-01
Clustering aggregation, also known as clustering ensembles, has emerged as a powerful technique for combining different clustering results to obtain a single, better clustering. Existing clustering aggregation algorithms are applied directly to data points, in what is referred to as the point-based approach. These algorithms are inefficient if the number of data points is large. We define an efficient approach for clustering aggregation based on data fragments. In this fragment-based approach, a data fragment is any subset of the data that is not split by any of the clustering results. To establish the theoretical bases of the proposed approach, we prove that clustering aggregation can be performed directly on data fragments under two widely used goodness measures for clustering aggregation taken from the literature. Three new clustering aggregation algorithms are described. The experimental results obtained using several public data sets show that the new algorithms have lower computational complexity than three well-known existing point-based clustering aggregation algorithms (Agglomerative, Furthest, and LocalSearch); nevertheless, the new algorithms do not sacrifice accuracy.
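The fragment construction itself is simple: two points fall in the same fragment exactly when every input clustering gives them the same label. A minimal sketch (the function name is illustrative, not from the paper):

```python
from collections import defaultdict

def data_fragments(labelings):
    """Split points into fragments: maximal groups of points never
    separated by any input clustering.  `labelings` is a list of label
    lists, one per clustering, all over the same points.  Returns a
    list of fragments, each a list of point indices."""
    n = len(labelings[0])
    groups = defaultdict(list)
    for i in range(n):
        # The signature of point i is its label under every clustering;
        # points sharing a signature are never split apart.
        signature = tuple(labels[i] for labels in labelings)
        groups[signature].append(i)
    return list(groups.values())
```

Aggregation can then operate on the (typically far fewer) fragments instead of the raw points, which is the source of the efficiency gain the abstract describes.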
Visualizing nD Point Clouds as Topological Landscape Profiles to Guide Local Data Analysis
DOE Office of Scientific and Technical Information (OSTI.GOV)
Oesterling, Patrick; Heine, Christian; Weber, Gunther H.
2012-05-04
Analyzing high-dimensional point clouds is a classical challenge in visual analytics. Traditional techniques, such as projections or axis-based techniques, suffer from projection artifacts, occlusion, and visual complexity. We propose to split data analysis into two parts to address these shortcomings. First, a structural overview phase abstracts data by its density distribution. This phase performs topological analysis to support accurate and non-overlapping presentation of the high-dimensional cluster structure as a topological landscape profile. Utilizing a landscape metaphor, it presents clusters and their nesting as hills whose height, width, and shape reflect cluster coherence, size, and stability, respectively. A second local analysis phase utilizes this global structural knowledge to select individual clusters or point sets for further, localized data analysis. Focusing on structural entities significantly reduces visual clutter in established geometric visualizations and permits a clearer, more thorough data analysis. In conclusion, this analysis complements the global topological perspective and enables the user to study subspaces or geometric properties, such as shape.
A technique for conducting point pattern analysis of cluster plot stem-maps
C.W. Woodall; J.M. Graham
2004-01-01
Point pattern analysis of forest inventory stem-maps may aid interpretation and inventory estimation of forest attributes. To evaluate the techniques and benefits of conducting point pattern analysis of forest inventory stem-maps, Ripley's K(t) was calculated for simulated tree spatial distributions and for over 600 USDA Forest Service Forest...
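Ripley's K(t) has a simple naive estimator, sketched below; note that serious stem-map analyses additionally require edge corrections, which are omitted here:

```python
def ripleys_k(points, t, area):
    """Naive estimator of Ripley's K(t) for a 2-D point pattern:
    K(t) = (area / n^2) * (number of ordered pairs within distance t).
    Edge-correction weights are deliberately omitted in this sketch."""
    n = len(points)
    pairs = 0
    for i in range(n):
        for j in range(n):
            if i != j:
                dx = points[i][0] - points[j][0]
                dy = points[i][1] - points[j][1]
                if dx * dx + dy * dy <= t * t:
                    pairs += 1
    return area * pairs / (n * n)
```

Under complete spatial randomness K(t) is close to pi*t^2, so values above that baseline indicate clustering of stems at scale t, and values below it indicate regularity.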
Network module detection: Affinity search technique with the multi-node topological overlap measure
Li, Ai; Horvath, Steve
2009-01-01
Background Many clustering procedures only allow the user to input a pairwise dissimilarity or distance measure between objects. We propose a clustering method that can input a multi-point dissimilarity measure d(i1, i2, ..., iP) where the number of points P can be larger than 2. The work is motivated by gene network analysis where clusters correspond to modules of highly interconnected nodes. Here, we define modules as clusters of network nodes with high multi-node topological overlap. The topological overlap measure is a robust measure of interconnectedness which is based on shared network neighbors. In previous work, we have shown that the multi-node topological overlap measure yields biologically meaningful results when used as input of network neighborhood analysis. Findings We adapt network neighborhood analysis for the use of module detection. We propose the Module Affinity Search Technique (MAST), which is a generalized version of the Cluster Affinity Search Technique (CAST). MAST can accommodate a multi-node dissimilarity measure. Clusters grow around user-defined or automatically chosen seeds (e.g. hub nodes). We propose both local and global cluster growth stopping rules. We use several simulations and a gene co-expression network application to argue that the MAST approach leads to biologically meaningful results. We compare MAST with hierarchical clustering and partitioning around medoid clustering. Conclusion Our flexible module detection method is implemented in the MTOM software which can be downloaded from the following webpage: http://www.genetics.ucla.edu/labs/horvath/MTOM/ PMID:19619323
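For reference, the pairwise (two-node) topological overlap measure from the network literature can be written compactly; the multi-node generalization that is the paper's actual contribution is not reproduced here:

```python
def topological_overlap(adj, i, j):
    """Pairwise topological overlap for an unweighted network given as
    a 0/1 adjacency matrix (list of rows).  High overlap means i and j
    share many neighbors relative to the smaller of their degrees."""
    n = len(adj)
    # Number of neighbors shared by i and j.
    shared = sum(adj[i][u] * adj[u][j] for u in range(n) if u != i and u != j)
    k_i = sum(adj[i])
    k_j = sum(adj[j])
    return (shared + adj[i][j]) / (min(k_i, k_j) + 1 - adj[i][j])
```

Values near 1 mean the two nodes sit in the same tightly interconnected neighborhood, which is the property module-detection procedures such as MAST exploit when growing clusters around seed nodes.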
Scalable Prediction of Energy Consumption using Incremental Time Series Clustering
DOE Office of Scientific and Technical Information (OSTI.GOV)
Simmhan, Yogesh; Noor, Muhammad Usman
2013-10-09
Time series datasets are a canonical form of high-velocity Big Data, often generated by pervasive sensors such as those found in smart infrastructure. Performing predictive analytics on time series data can be computationally complex and requires approximation techniques. In this paper, we motivate this problem using a real application from the smart grid domain. We propose an incremental clustering technique, along with a novel affinity score for determining cluster similarity, which help reduce the prediction error for cumulative time series within a cluster. We evaluate this technique, along with optimizations, using real datasets from smart meters, totaling ~700,000 data points, and show the efficacy of our techniques in improving the prediction error of time series data within polynomial time.
Determining the Optimal Number of Clusters with the Clustergram
NASA Technical Reports Server (NTRS)
Fluegemann, Joseph K.; Davies, Misty D.; Aguirre, Nathan D.
2011-01-01
Cluster analysis aids research in many different fields, from business to biology to aerospace. It consists of using statistical techniques to group objects in large sets of data into meaningful classes. However, this process of ordering data points presents much uncertainty because it involves several steps, many of which are subject to researcher judgment as well as inconsistencies depending on the specific data type and research goals. These steps include the method used to cluster the data, the variables on which the cluster analysis will operate, the number of resulting clusters, and parts of the interpretation process. In most cases, the number of clusters must be guessed or estimated before employing the clustering method. Many remedies have been proposed, but none is unassailable, and certainly not for all data types. Thus, current research into better techniques for determining the number of clusters is generally confined to demonstrating that the new technique outperforms other methods for several disparate data types. Our research makes use of a new cluster-number-determination technique based on the clustergram: a graph that shows how the number of objects in the cluster and the cluster mean (the ordinate) change with the number of clusters (the abscissa). We use the features of the clustergram to make the best determination of the cluster number.
On the clustering of multidimensional pictorial data
NASA Technical Reports Server (NTRS)
Bryant, J. D. (Principal Investigator)
1979-01-01
Obvious approaches to reducing the cost (in computer resources) of applying current clustering techniques to the problem of remote sensing are discussed. The use of spatial information in finding fields and in classifying mixture pixels is examined, and the AMOEBA clustering program is described. Internally a pattern-recognition program, from without AMOEBA appears to be an unsupervised clustering program. It is fast and automatic. No choices (such as arbitrary thresholds to set split/combine sequences) need be made. The problem of finding the number of clusters is solved automatically. At the conclusion of the program, all points in the scene are classified; however, a provision is included for a reject classification of some points which, within the theoretical framework, cannot rationally be assigned to any cluster.
Spitzer Imaging of Planck-Herschel Dusty Proto-Clusters at z=2-3
NASA Astrophysics Data System (ADS)
Cooray, Asantha; Ma, Jingzhe; Greenslade, Joshua; Kubo, Mariko; Nayyeri, Hooshang; Clements, David; Cheng, Tai-An
2018-05-01
We have recently introduced a new proto-cluster selection technique combining Herschel/SPIRE imaging data and the Planck/HFI all-sky survey point source catalog. These sources are identified as Planck point sources coinciding with clumps of Herschel source over-densities whose far-IR colors are comparable to z=0 ULIRGs redshifted to z=2 to 3. The selection is sensitive to dusty starbursts and obscured QSOs, and we have recovered a couple of the known proto-clusters and close to 30 new proto-clusters. The candidate proto-clusters selected with this technique have far-IR flux densities several times higher than those that are optically selected, such as through LBG selection, implying that the member galaxies are in a special phase of heightened dusty starburst and dusty QSO activity. This far-IR luminous phase may be short but is likely a necessary piece for understanding the whole stellar mass assembly history of clusters. Moreover, our proto-clusters are missed in optical selections, suggesting that optically selected proto-clusters alone do not provide adequate statistics, and a comparison of the far-IR and optically selected clusters may reveal the importance of dusty stellar mass assembly. Here, we propose IRAC observations of six of the highest-priority new proto-clusters, to establish the validity of the technique and to determine the total stellar mass through SED models. For a modest observing time, the science program will have a substantial impact on an emerging science topic in cosmology, with implications for observations with JWST and WFIRST to understand mass assembly in the universe.
Techniques and computations for mapping plot clusters that straddle stand boundaries
Charles T. Scott; William A. Bechtold
1995-01-01
Many regional (extensive) forest surveys use clusters of subplots or prism points to reduce survey costs. Two common methods of handling clusters that straddle stand boundaries entail: (1) moving all subplots into a single forest cover type, or (2) "averaging" data across multiple conditions without regard to the boundaries. These methods result in biased...
Oxygen diffusion in alpha-Al2O3. Ph.D. Thesis
NASA Technical Reports Server (NTRS)
Cawley, J. D.; Halloran, J. W.; Cooper, A. R.
1984-01-01
Oxygen self-diffusion coefficients were determined in single-crystal alpha-Al2O3 using the gas exchange technique. The samples were semi-infinite slabs cut from five different boules with varying background impurities. The diffusion direction was parallel to the c-axis. The tracer profiles were determined by two techniques, single-spectrum proton activation and secondary ion mass spectrometry (SIMS). SIMS proved to be the more useful tool. The determined diffusion coefficients, which were insensitive to impurity levels and oxygen partial pressure, could be described by D = 0.00151 exp(-572 kJ/RT) sq m/s. The insensitivities are discussed in terms of point defect clustering. Two independent models are consistent with the findings: the first considers the clusters as immobile point-defect traps which buffer changes in the defect chemistry; the second considers clusters to be mobile and oxygen diffusion to be intrinsic behavior, with the mechanism for oxygen transport involving neutral clusters of Schottky quintuplets.
Predicting the points of interaction of small molecules in the NF-κB pathway
2011-01-01
Background The similarity property principle has been used extensively in drug discovery to identify small compounds that interact with specific drug targets. Here we show it can be applied to identify the interactions of small molecules within the NF-κB signalling pathway. Results Clusters that contain compounds with a predominant interaction within the pathway were created, which were then used to predict the interaction of compounds not included in the clustering analysis. Conclusions The technique successfully predicted the points of interactions of compounds that are known to interact with the NF-κB pathway. The method was also shown to be successful when compounds for which the interaction points were unknown were included in the clustering analysis. PMID:21342508
Cosmological Constraints from Galaxy Clustering and the Mass-to-number Ratio of Galaxy Clusters
NASA Astrophysics Data System (ADS)
Tinker, Jeremy L.; Sheldon, Erin S.; Wechsler, Risa H.; Becker, Matthew R.; Rozo, Eduardo; Zu, Ying; Weinberg, David H.; Zehavi, Idit; Blanton, Michael R.; Busha, Michael T.; Koester, Benjamin P.
2012-01-01
We place constraints on the average density (Ωm) and clustering amplitude (σ8) of matter using a combination of two measurements from the Sloan Digital Sky Survey: the galaxy two-point correlation function, wp(rp), and the mass-to-galaxy-number ratio within galaxy clusters, M/N, analogous to cluster M/L ratios. Our wp(rp) measurements are obtained from DR7 while the sample of clusters is the maxBCG sample, with cluster masses derived from weak gravitational lensing. We construct nonlinear galaxy bias models using the Halo Occupation Distribution (HOD) to fit both wp(rp) and M/N for different cosmological parameters. HOD models that match the same two-point clustering predict different numbers of galaxies in massive halos when Ωm or σ8 is varied, thereby breaking the degeneracy between cosmology and bias. We demonstrate that this technique yields constraints that are consistent and competitive with current results from cluster abundance studies, without the use of abundance information. Using wp(rp) and M/N alone, we find Ωm^0.5 σ8 = 0.465 ± 0.026, with individual constraints of Ωm = 0.29 ± 0.03 and σ8 = 0.85 ± 0.06. Combined with current cosmic microwave background data, these constraints are Ωm = 0.290 ± 0.016 and σ8 = 0.826 ± 0.020. All errors are 1σ. The systematic uncertainties to which the M/N technique is most sensitive are the amplitude of the bias function of dark matter halos and the possibility of redshift evolution between the SDSS Main sample and the maxBCG cluster sample. Our derived constraints are insensitive to the current level of uncertainties in the halo mass function and in the mass-richness relation of clusters and its scatter, making the M/N technique complementary to cluster abundances as a method for constraining cosmology with future galaxy surveys.
The JCMT Gould Belt Survey: Dense Core Clusters in Orion B
NASA Astrophysics Data System (ADS)
Kirk, H.; Johnstone, D.; Di Francesco, J.; Lane, J.; Buckle, J.; Berry, D. S.; Broekhoven-Fiene, H.; Currie, M. J.; Fich, M.; Hatchell, J.; Jenness, T.; Mottram, J. C.; Nutter, D.; Pattle, K.; Pineda, J. E.; Quinn, C.; Salji, C.; Tisi, S.; Hogerheijde, M. R.; Ward-Thompson, D.; The JCMT Gould Belt Survey Team
2016-04-01
The James Clerk Maxwell Telescope Gould Belt Legacy Survey obtained SCUBA-2 observations of dense cores within three sub-regions of Orion B: LDN 1622, NGC 2023/2024, and NGC 2068/2071, all of which contain clusters of cores. We present an analysis of the clustering properties of these cores, including the two-point correlation function and Cartwright’s Q parameter. We identify individual clusters of dense cores across all three regions using a minimal spanning tree technique, and find that in each cluster, the most massive cores tend to be centrally located. We also apply the independent M-Σ technique and find a strong correlation between core mass and the local surface density of cores. These two lines of evidence jointly suggest that some amount of mass segregation in clusters has happened already at the dense core stage.
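Minimal-spanning-tree cluster extraction can be sketched generically: build the MST over pairwise distances, delete the longest edges, and read off the connected components. This is the common textbook procedure, not necessarily the survey's exact implementation:

```python
import math

def mst_clusters(points, n_clusters):
    """Cluster points by building a minimum spanning tree (Prim's
    algorithm) and removing its n_clusters-1 longest edges; the
    remaining connected components are the clusters."""
    n = len(points)
    dist = lambda i, j: math.dist(points[i], points[j])
    # Prim's algorithm: grow the tree from node 0, always adding the
    # cheapest edge that reaches a node not yet in the tree.
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        i, j = min(((i, j) for i in in_tree for j in range(n) if j not in in_tree),
                   key=lambda e: dist(*e))
        edges.append((dist(i, j), i, j))
        in_tree.add(j)
    # Keep all but the longest n_clusters-1 edges, then collect the
    # connected components with a small union-find.
    edges.sort()
    keep = edges[: n - n_clusters]
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for _, i, j in keep:
        parent[find(i)] = find(j)
    comps = {}
    for x in range(n):
        comps.setdefault(find(x), []).append(x)
    return list(comps.values())
```

Cutting the longest MST edges separates groups whose internal spacing is small compared to the gaps between groups, which matches the intuitive notion of a cluster of dense cores.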
Clustering analysis for muon tomography data elaboration in the Muon Portal project
NASA Astrophysics Data System (ADS)
Bandieramonte, M.; Antonuccio-Delogu, V.; Becciani, U.; Costa, A.; La Rocca, P.; Massimino, P.; Petta, C.; Pistagna, C.; Riggi, F.; Riggi, S.; Sciacca, E.; Vitello, F.
2015-05-01
Clustering analysis is a multivariate data analysis technique that gathers statistical data units into groups so as to minimize the logical distance within each group and maximize the distance between different groups. In these proceedings, the authors present a novel approach to muon-tomography data analysis based on clustering algorithms. As a case study we present the Muon Portal project, which aims to build and operate a dedicated particle detector for the inspection of harbor containers to hinder the smuggling of nuclear materials. Clustering techniques, working directly on the scattering points, help to detect the presence of suspicious items inside the container, acting, as will be shown, as a filter for a preliminary analysis of the data.
Fast distributed large-pixel-count hologram computation using a GPU cluster.
Pan, Yuechao; Xu, Xuewu; Liang, Xinan
2013-09-10
Large-pixel-count holograms are an essential part of large-size holographic three-dimensional (3D) displays, but the generation of such holograms is computationally demanding. To address this issue, we have built a graphics processing unit (GPU) cluster with 32.5 Tflop/s computing power and implemented distributed hologram computation on it with speed-improvement techniques such as shared memory on the GPU, GPU-level adaptive load balancing, and node-level load distribution. Using these speed-improvement techniques on the GPU cluster, we have achieved a 71.4-fold computation speed increase for 186M-pixel holograms. Furthermore, we have used the approaches of diffraction limits and subdivision of holograms to overcome the GPU memory limit in computing large-pixel-count holograms. 745M-pixel and 1.80G-pixel holograms were computed in 343 and 3326 s, respectively, for more than 2 million object points with RGB colors. Color 3D objects with 1.02M points were successfully reconstructed from a 186M-pixel hologram computed in 8.82 s with all three of the above speed-improvement techniques. It is shown that distributed hologram computation using a GPU cluster is a promising approach to increasing the computation speed of large-pixel-count holograms for large-size holographic displays.
A Fast Projection-Based Algorithm for Clustering Big Data.
Wu, Yun; He, Zhiquan; Lin, Hao; Zheng, Yufei; Zhang, Jingfen; Xu, Dong
2018-06-07
With the fast development of various techniques, more and more data have been accumulated with the unique properties of large size (tall) and high dimension (wide). The era of big data is coming. How to understand and discover new knowledge from these data has attracted more and more scholars' attention and has become the most important task in data mining. As one of the most important techniques in data mining, clustering analysis, a kind of unsupervised learning, groups a data set into clusters that are meaningful, useful, or both. Thus, the technique has played a very important role in knowledge discovery in big data. However, when facing large, high-dimensional data, most current clustering methods exhibit poor computational efficiency and a high requirement for computational resources, which prevents us from clarifying the intrinsic properties of, and discovering new knowledge behind, the data. Based on this consideration, we developed a powerful clustering method called MUFOLD-CL. The principle of the method is to project the data points onto the centroid, and then to measure the similarity between any two points by calculating their projections on the centroid. The proposed method achieves linear time complexity with respect to the sample size. Comparison with the K-Means method on very large data showed that our method produces better accuracy and requires less computational time, demonstrating that MUFOLD-CL can serve as a valuable tool, or at least play a complementary role to other existing methods, for big data clustering. Further comparisons with state-of-the-art clustering methods on smaller datasets showed that our method was fastest and achieved comparable accuracy. For the convenience of most scholars, a free software package was constructed.
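The projection idea can be illustrated with a rough sketch. This is an assumption-laden toy version: the gap threshold and the single 1-D grouping sweep are inventions for illustration, not MUFOLD-CL's actual procedure, but they show why a projection-based pass can stay linear in the sample size (apart from the sort).

```python
def centroid_projections(points):
    """Scalar projection of each point onto the direction of the
    global centroid, reducing each point to a single number."""
    n, d = len(points), len(points[0])
    centroid = [sum(p[k] for p in points) / n for k in range(d)]
    norm = sum(c * c for c in centroid) ** 0.5
    return [sum(p[k] * centroid[k] for k in range(d)) / norm for p in points]

def group_by_projection(points, gap):
    """One sweep over the sorted projections: start a new cluster
    whenever consecutive projections differ by more than `gap`.
    Returns clusters as lists of point indices."""
    proj = centroid_projections(points)
    order = sorted(range(len(points)), key=lambda i: proj[i])
    clusters, current = [], [order[0]]
    for prev, i in zip(order, order[1:]):
        if proj[i] - proj[prev] > gap:
            clusters.append(current)
            current = []
        current.append(i)
    clusters.append(current)
    return clusters
```

Because similarity is judged on a single projected coordinate, points that differ only in directions orthogonal to the centroid can collide; any real method of this kind needs additional projections or refinement passes to resolve such collisions.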
A method to detect progression of glaucoma using the multifocal visual evoked potential technique
Wangsupadilok, Boonchai; Kanadani, Fabio N.; Grippo, Tomas M.; Liebmann, Jeffrey M.; Ritch, Robert; Hood, Donald C.
2010-01-01
Purpose To describe a method for monitoring progression of glaucoma using the multifocal visual evoked potential (mfVEP) technique. Methods Eighty-seven patients diagnosed with open-angle glaucoma were divided into two groups. Group I comprised 43 patients who had a repeat mfVEP test within 50 days (mean 0.9 ± 0.5 months), and group II comprised 44 patients who had a repeat test after at least 6 months (mean 20.7 ± 9.7 months). Monocular mfVEPs were obtained using a 60-sector pattern-reversal dartboard display. Monocular and interocular analyses were performed. Data from the two visits were compared. The total number of abnormal test points with P < 5% within the visual field (total score) and the number of abnormal test points within a cluster (cluster size) were calculated. Data for group I provided a measure of test-retest variability independent of disease progression. Data for group II provided a possible measure of progression. Results The difference in the total scores for group II between visit 1 and visit 2 was significant for both the interocular and monocular comparisons (P < 0.05), as was the difference in cluster size for the interocular comparison (P < 0.05). Group I did not show a significant change in either total score or cluster size. Conclusion The change in the total score and cluster size over time provides a possible method for assessing progression of glaucoma with the mfVEP technique. PMID:18830654
Electronic levels and charge distribution near the interface of nickel
NASA Technical Reports Server (NTRS)
Waber, J. T.
1982-01-01
The energy levels in clusters of nickel atoms were investigated by means of a series of cluster calculations using both the multiple-scattering technique and a computational technique (designated SSO) which avoids the muffin-tin approximation. The point-group symmetry of the cluster has a significant effect on the energy of levels that are nominally unoccupied. This influences the electron transfer process during chemisorption. The SSO technique permits the approaching atom or molecule, plus a small number of nickel atoms, to be treated as a cluster. Specifically, molecular levels become more negative in the O atom, as well as in a CO molecule, as the metal atoms are approached. Thus, electron transfer from the nickel and bond formation are facilitated. This result is of importance in understanding chemisorption and catalytic processes.
Improved Test Planning and Analysis Through the Use of Advanced Statistical Methods
NASA Technical Reports Server (NTRS)
Green, Lawrence L.; Maxwell, Katherine A.; Glass, David E.; Vaughn, Wallace L.; Barger, Weston; Cook, Mylan
2016-01-01
The goal of this work is, through computational simulations, to provide statistically-based evidence to convince the testing community that a distributed testing approach is superior to a clustered testing approach for most situations. For clustered testing, numerous, repeated test points are acquired at a limited number of test conditions. For distributed testing, only one or a few test points are requested at many different conditions. The statistical techniques of Analysis of Variance (ANOVA), Design of Experiments (DOE) and Response Surface Methods (RSM) are applied to enable distributed test planning, data analysis and test augmentation. The D-Optimal class of DOE is used to plan an optimally efficient single- and multi-factor test. The resulting simulated test data are analyzed via ANOVA and a parametric model is constructed using RSM. Finally, ANOVA can be used to plan a second round of testing to augment the existing data set with new data points. The use of these techniques is demonstrated through several illustrative examples. To date, many thousands of comparisons have been performed and the results strongly support the conclusion that the distributed testing approach outperforms the clustered testing approach.
Alomari, Yazan M.; MdZin, Reena Rahayu
2015-01-01
Analysis of whole-slide tissue for digital pathology images has been clinically approved to provide a second opinion to pathologists. Localization of focus points from Ki-67-stained histopathology whole-slide tissue microscopic images is considered the first step in the process of proliferation rate estimation. Pathologists use eye pooling or eagle-view techniques to localize the highly stained, cell-concentrated regions, called focus-point regions, from the whole slide under the microscope. This procedure leads to high inter-observer variability, is time-consuming and tedious, and can produce inaccurate findings. The localization of focus-point regions can be addressed as a clustering problem. This paper aims to automate the localization of focus-point regions from whole-slide images using the random patch probabilistic density (RPPD) method. Unlike other clustering methods, the RPPD method can adaptively localize focus-point regions without predetermining the number of clusters. The proposed method was compared with the k-means and fuzzy c-means clustering methods, and achieved good performance when the results were evaluated by three expert pathologists: an average false-positive rate of 0.84% for the focus-point region localization error. Moreover, when RPPD was used to localize tissue from whole-slide images, 228 whole-slide images were tested and 97.3% localization accuracy was achieved. PMID:25793010
DOE Office of Scientific and Technical Information (OSTI.GOV)
A parallelization of the k-means++ seed selection algorithm on three distinct hardware platforms: GPU, multicore CPU, and multithreaded architecture. K-means++ was developed by David Arthur and Sergei Vassilvitskii in 2007 as an extension of the k-means data clustering technique. These algorithms allow people to cluster multidimensional data by attempting to minimize the mean distance of data points within a cluster. K-means++ improved upon traditional k-means by using a more intelligent approach to selecting the initial seeds for the clustering process. While k-means++ has become a popular alternative to traditional k-means clustering, little work has been done to parallelize this technique. We have developed original C++ code for parallelizing the algorithm on three unique hardware architectures: GPU using NVidia's CUDA/Thrust framework, multicore CPU using OpenMP, and the Cray XMT multithreaded architecture. By parallelizing the process for these platforms, we are able to perform k-means++ clustering much more quickly than it could be done before.
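The serial seeding step that this work parallelizes picks the first center uniformly at random and each subsequent center with probability proportional to the squared distance to the nearest center already chosen. A minimal NumPy sketch (illustrative only, not the authors' C++/CUDA/OpenMP/XMT code; the two-blob toy data are an assumption):

```python
import numpy as np

def kmeanspp_seeds(data, k, rng):
    """Select k initial centers: the first uniformly at random, each later one
    with probability proportional to the squared distance to the nearest
    center chosen so far (the D^2 weighting of Arthur & Vassilvitskii)."""
    n = data.shape[0]
    centers = [data[rng.integers(n)]]
    for _ in range(k - 1):
        d2 = np.min([np.sum((data - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(data[rng.choice(n, p=d2 / d2.sum())])
    return np.array(centers)

rng = np.random.default_rng(0)
data = np.vstack([rng.normal(0, 0.1, (50, 2)), rng.normal(5, 0.1, (50, 2))])
seeds = kmeanspp_seeds(data, 2, rng)
# With two tight, well-separated blobs, the D^2 weighting makes it very
# likely that the two seeds land in different blobs.
```

The distance computation in the inner loop is exactly the part that parallelizes naturally across data points on the three platforms discussed.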
Huang, Rao; Lo, Li-Ta; Wen, Yuhua; Voter, Arthur F; Perez, Danny
2017-10-21
Modern molecular-dynamics-based techniques are extremely powerful to investigate the dynamical evolution of materials. With the increase in sophistication of the simulation techniques and the ubiquity of massively parallel computing platforms, atomistic simulations now generate very large amounts of data, which have to be carefully analyzed in order to reveal key features of the underlying trajectories, including the nature and characteristics of the relevant reaction pathways. We show that clustering algorithms, such as the Perron Cluster Cluster Analysis, can provide reduced representations that greatly facilitate the interpretation of complex trajectories. To illustrate this point, clustering tools are used to identify the key kinetic steps in complex accelerated molecular dynamics trajectories exhibiting shape fluctuations in Pt nanoclusters. This analysis provides an easily interpretable coarse representation of the reaction pathways in terms of a handful of clusters, in contrast to the raw trajectory that contains thousands of unique states and tens of thousands of transitions.
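The core idea behind Perron Cluster Cluster Analysis is that metastable groups of states show up in the slow eigenvectors of the transition matrix: eigenvalues near 1 form the "Perron cluster", and the sign structure of the corresponding eigenvectors separates the metastable sets. A toy illustration (a minimal sketch, not the analysis pipeline used in the paper; the 6-state, two-block transition matrix is an assumption):

```python
import numpy as np

# Toy transition matrix: two metastable blocks {0,1,2} and {3,4,5},
# with rare transitions between them.
T = np.full((6, 6), 0.002)
for block in ([0, 1, 2], [3, 4, 5]):
    for i in block:
        for j in block:
            T[i, j] = 1.0
T /= T.sum(axis=1, keepdims=True)  # make rows stochastic

# The second-largest eigenvalue sits just below 1; the sign of its
# eigenvector splits the states into the two metastable groups.
vals, vecs = np.linalg.eig(T)
order = np.argsort(-vals.real)
second = vecs[:, order[1]].real
labels = (second > 0).astype(int)
print(labels)  # states 0-2 in one group, states 3-5 in the other
```

On a real accelerated-MD trajectory the same spectral structure collapses thousands of visited states into a handful of interpretable clusters.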
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pal, Ranjan; Chelmis, Charalampos; Aman, Saima
The advent of smart meters and advanced communication infrastructures catalyzes numerous smart grid applications such as dynamic demand response, and paves the way to solve challenging research problems in sustainable energy consumption. The space of solution possibilities is restricted primarily by the huge amount of generated data, which requires considerable computational resources and efficient algorithms. To overcome this Big Data challenge, data clustering techniques have been proposed. Current approaches, however, do not scale in the face of the "increasing dimensionality" problem, where a cluster point is represented by the entire customer consumption time series. To overcome this, we first rethink the way cluster points are created and designed, and then design an efficient online clustering technique for demand response (DR) in order to analyze high-volume, high-dimensional energy consumption time series data at scale, and on the fly. Our online algorithm is randomized in nature, and provides optimal performance guarantees in a computationally efficient manner. Unlike prior work, we (i) study the consumption properties of the whole population simultaneously rather than developing individual models for each customer separately, claiming it to be a 'killer' approach that breaks the "curse of dimensionality" in online time series clustering, and (ii) provide tight performance guarantees in theory to validate our approach. Our insights are driven by the field of sociology, where collective behavior often emerges as the result of individual patterns and lifestyles.
Gibert, Karina; García-Rudolph, Alejandro; García-Molina, Alberto; Roig-Rovira, Teresa; Bernabeu, Montse; Tormos, José María
2008-01-01
The objective was to develop a classificatory tool to identify different populations of patients with Traumatic Brain Injury based on the characteristics of deficit and response to treatment. A KDD framework was used: first, descriptive statistics of every variable, data cleaning, and selection of relevant variables; then the data were mined using a generalization of Clustering Based on Rules (CIBR), a hybrid AI and Statistics technique which combines inductive learning (AI) and clustering (Statistics). A prior Knowledge Base (KB) is considered to properly bias the clustering; semantic constraints implied by the KB hold in the final clusters, guaranteeing interpretability of the results. A generalization (Exogenous Clustering Based on Rules, ECIBR) is presented, which allows the KB to be defined in terms of variables that are not themselves used in the clustering process, for greater flexibility. Several tools, such as the class panel graph, are introduced in the methodology to assist final interpretation. A set of 5 classes was recommended by the system, and interpretation permitted the labeling of profiles. From the medical point of view, the composition of the classes corresponds well with different patterns of increasing response to rehabilitation treatment. All the patients who were initially assessable form a single group. Severely impaired patients are subdivided into four profiles with clearly distinct response patterns; particularly interesting is the partial-response profile, in which patients could not improve executive functions. Meaningful classes were obtained and, from a semantic point of view, the results were noticeably improved with respect to classical clustering, supporting our view that hybrid AI and Statistics techniques are more powerful for KDD than pure ones.
Riemannian multi-manifold modeling and clustering in brain networks
NASA Astrophysics Data System (ADS)
Slavakis, Konstantinos; Salsabilian, Shiva; Wack, David S.; Muldoon, Sarah F.; Baidoo-Williams, Henry E.; Vettel, Jean M.; Cieslak, Matthew; Grafton, Scott T.
2017-08-01
This paper introduces Riemannian multi-manifold modeling in the context of brain-network analytics: brain-network time series yield features which are modeled as points lying in or close to a union of a finite number of submanifolds within a known Riemannian manifold. Distinguishing disparate time series thus amounts to clustering multiple Riemannian submanifolds. To this end, two feature-generation schemes for brain-network time series are put forth. The first is motivated by Granger-causality arguments and uses an auto-regressive moving average model to map low-rank linear vector subspaces, spanned by column vectors of appropriately defined observability matrices, to points on the Grassmann manifold. The second utilizes (non-linear) dependencies among network nodes by introducing kernel-based partial correlations to generate points in the manifold of positive-definite matrices. Based on recently developed research on clustering Riemannian submanifolds, an algorithm is provided for distinguishing time series based on their Riemannian-geometry properties. Numerical tests on time series, synthetically generated from real brain-network structural connectivity matrices, reveal that the proposed scheme outperforms classical and state-of-the-art techniques in clustering brain-network states/structures.
A Cluster-then-label Semi-supervised Learning Approach for Pathology Image Classification.
Peikari, Mohammad; Salama, Sherine; Nofech-Mozes, Sharon; Martel, Anne L
2018-05-08
Completely labeled pathology datasets are often challenging and time-consuming to obtain. Semi-supervised learning (SSL) methods are able to learn from fewer labeled data points with the help of a large number of unlabeled data points. In this paper, we investigated the possibility of using clustering analysis to identify the underlying structure of the data space for SSL. A cluster-then-label method was proposed to identify high-density regions in the data space which were then used to help a supervised SVM in finding the decision boundary. We have compared our method with other supervised and semi-supervised state-of-the-art techniques using two different classification tasks applied to breast pathology datasets. We found that compared with other state-of-the-art supervised and semi-supervised methods, our SSL method is able to improve classification performance when a limited number of labeled data instances are made available. We also showed that it is important to examine the underlying distribution of the data space before applying SSL techniques to ensure semi-supervised learning assumptions are not violated by the data.
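A stripped-down version of the cluster-then-label idea can be sketched as follows. Note the simplifications: this propagates majority labels through plain k-means clusters, whereas the paper identifies high-density regions and trains an SVM on them; the toy blobs and the deterministic initialization are assumptions for illustration.

```python
import numpy as np

def kmeans(data, k, iters=50):
    """Plain Lloyd k-means with deterministic initialization (first k points)."""
    centers = data[:k].copy()
    for _ in range(iters):
        d = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
        assign = d.argmin(axis=1)
        for j in range(k):
            if np.any(assign == j):
                centers[j] = data[assign == j].mean(axis=0)
    return assign

def cluster_then_label(data, known, k):
    """known: dict {point index: class label} for the few labeled points.
    Cluster all points, then give every cluster the majority label of the
    labeled points inside it (-1 if a cluster contains none)."""
    assign = kmeans(data, k)
    out = np.full(len(data), -1, dtype=int)
    for j in range(k):
        members = np.where(assign == j)[0]
        labs = [known[i] for i in members if i in known]
        if labs:
            out[members] = max(set(labs), key=labs.count)
    return out

rng = np.random.default_rng(1)
data = np.empty((40, 2))
data[0::2] = rng.normal(0, 0.2, (20, 2))  # interleave the two blobs so the
data[1::2] = rng.normal(5, 0.2, (20, 2))  # first two points seed both of them
known = {0: 0, 1: 1}  # only two labeled points
pred = cluster_then_label(data, known, k=2)
```

With well-separated clusters, two labeled points suffice to label all forty, which is the SSL leverage the abstract describes.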
ERIC Educational Resources Information Center
Meulman, Jacqueline J.; Verboon, Peter
1993-01-01
Points of view analysis, as a way to deal with individual differences in multidimensional scaling, was largely supplanted by the weighted Euclidean model. It is argued that the approach deserves new attention, especially as a technique to analyze group differences. A streamlined and integrated process is proposed. (SLD)
Spatial patterns in vegetation fires in the Indian region.
Vadrevu, Krishna Prasad; Badarinath, K V S; Anuradha, Eaturu
2008-12-01
In this study, we used fire count datasets derived from the Along Track Scanning Radiometer (ATSR) satellite to characterize spatial patterns in fire occurrences across highly diverse geographical, vegetation and topographic gradients in the Indian region. To characterize the spatial patterns of fire occurrences, observed fire point patterns were tested against the hypothesis of complete spatial randomness (CSR) using three different techniques: quadrat analysis, nearest neighbor analysis and Ripley's K function. The hierarchical nearest neighbor technique was used to depict the 'hotspots' of fire incidents. Of the different states, the highest fire counts were recorded in Madhya Pradesh (14.77%), followed by Gujarat (10.86%), Maharashtra (9.92%), Mizoram (7.66%) and Jharkhand (6.41%). With respect to the vegetation categories, the highest number of fires was recorded in agricultural regions (40.26%), followed by tropical moist deciduous vegetation (12.72%), dry deciduous vegetation (11.40%), abandoned slash-and-burn secondary forests (9.04%), tropical montane forests (8.07%) and others. Analysis of fire counts by elevation and slope suggested that the maximum number of fires occurred in low- and medium-elevation types and in very-low- to low-slope categories. Results from the three spatial techniques suggested a clustered pattern in fire events compared to CSR. Most importantly, results from Ripley's K statistic suggested that fire events are highly clustered at a lag distance of 125 miles. The hierarchical nearest neighbor clustering technique identified significant clusters of fire 'hotspots' in different states in northeast and central India. The implications of these results for fire management and mitigation are discussed. This study also highlights the potential of spatial point pattern statistics in environmental monitoring and assessment studies, with special reference to fire events in the Indian region.
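Ripley's K function compares the number of point pairs within distance r to the expectation under CSR, where K(r) = πr²; values well above πr² indicate clustering at that scale. A naive estimate without edge correction (an illustrative sketch; the tight synthetic cluster in the unit square is an assumption):

```python
import numpy as np

def ripley_k(points, r, area):
    """Naive Ripley's K estimate (no edge correction):
    K(r) = area / n^2 * (number of ordered pairs closer than r)."""
    n = len(points)
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    close = (d < r).sum() - n  # subtract the n zero self-distances
    return area * close / n**2

# A tightly clustered pattern in the unit square: K(r) far exceeds the
# CSR expectation pi*r^2, signalling clustering at that lag distance.
rng = np.random.default_rng(0)
clustered = 0.5 + rng.normal(0, 0.01, (100, 2))
r = 0.1
print(ripley_k(clustered, r, area=1.0), np.pi * r**2)
```

Production analyses add an edge correction (e.g. Ripley's isotropic correction), since pairs near the study-area boundary are otherwise undercounted.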
Calderón, Félix; Barros, David; Bueno, José María; Coterón, José Miguel; Fernández, Esther; Gamo, Francisco Javier; Lavandera, José Luís; León, María Luisa; Macdonald, Simon J F; Mallo, Araceli; Manzano, Pilar; Porras, Esther; Fiandor, José María; Castro, Julia
2011-10-13
In 2010, GlaxoSmithKline published the structures of 13533 chemical starting points for antimalarial lead identification. By using an agglomerative structural clustering technique followed by computational filters such as antimalarial activity, physicochemical properties, and dissimilarity to known antimalarial structures, we have identified 47 starting points for lead optimization. Their structures are provided. We invite potential collaborators to work with us to discover new clinical candidates.
Novel laser communications transceiver with internal gimbal-less pointing and tracking
NASA Astrophysics Data System (ADS)
Chalfant, Charles H., III; Orlando, Fred J., Jr.; Gregory, Jeff T.; Sulham, Clifford; O'Neal, Chad B.; Taylor, Geoffrey W.; Craig, Douglas M.; Foshee, James J.; Lovett, J. Timothy
2002-12-01
This paper describes a novel laser communications transceiver for use in multi-platform satellite networks or clusters that provides an internal pointing and tracking technique, allowing static mounting of the transceiver subsystems and minimal use of mechanical stabilization techniques. This eliminates the need for the large, power-hungry mechanical gimbals that are required for laser cross-link pointing, acquisition and tracking. The miniature transceiver is designed for the pointing accuracies required for satellite cross-link distances of 500 to 5000 meters. Specifically, the designs target the Air Force Research Lab's TechSat21 Program, although alternative transceiver configurations can provide much greater link distances and serve other satellite systems. The receiver and transmitter are connected via fiber optic cabling to a separate electronics subsystem containing the optoelectronics PCBs, thereby eliminating active optoelectronic elements from the transceiver's mechanical housing. The internal acquisition and tracking capability is provided by an advanced micro-electro-mechanical system (MEMS) and an optical design that provides a specific field-of-view based on the satellite cluster's interface specifications. The acquisition and tracking control electronics will utilize conventional closed-loop tracking techniques. The link optical power budget and optoelectronics designs allow the use of transmitter sources with output powers of near 100 mW. The transceiver will provide data rates of up to 2.5 Gbps and operate at either 1310 nm or 1550 nm. In addition to space-based satellite-to-satellite cross-links, we plan to develop a broad range of applications including air-to-air communications between highly mobile airborne platforms and terrestrial fixed point-to-point communications.
Internal Cluster Validation on Earthquake Data in the Province of Bengkulu
NASA Astrophysics Data System (ADS)
Rini, D. S.; Novianti, P.; Fransiska, H.
2018-04-01
The k-means method is an algorithm that partitions n objects into k clusters based on their attributes, where k < n. The algorithm has a known deficiency: before it is executed, the k initial points are chosen randomly, so the resulting clustering can differ from run to run, and a poor random initialization yields a less-than-optimal clustering. Cluster validation is a technique to determine the optimum number of clusters without prior information about the data. There are two types of cluster validation: internal and external. This study aims to examine and apply several internal cluster validation indices, including the Calinski-Harabasz (CH) Index, Silhouette (S) Index, Davies-Bouldin (DB) Index, Dunn (D) Index, and S-Dbw Index, to earthquake data from Bengkulu Province. The internal validation results show that the CH, S, and S-Dbw indices yield an optimum of k = 2, the DB index k = 6, and the D index k = 15. The optimum clustering (k = 6) based on the DB index gives good results for clustering earthquakes in Bengkulu Province.
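As an example of an internal validation index, the Calinski-Harabasz index can be computed directly from any candidate partition. A minimal sketch, assuming a toy 1-D dataset (higher CH values indicate a partition with tighter, better-separated clusters):

```python
import numpy as np

def calinski_harabasz(data, labels):
    """CH index: between-cluster dispersion over within-cluster dispersion,
    scaled by degrees of freedom; higher is better."""
    n, k = len(data), len(set(labels))
    grand = data.mean(axis=0)
    b = w = 0.0
    for c in set(labels):
        pts = data[labels == c]
        b += len(pts) * np.sum((pts.mean(axis=0) - grand) ** 2)
        w += np.sum((pts - pts.mean(axis=0)) ** 2)
    return (b / (k - 1)) / (w / (n - k))

# Two well-separated 1-D groups: the correct 2-cluster partition scores
# far higher than an arbitrary 3-cluster partition.
data = np.array([[0.0], [0.1], [0.2], [10.0], [10.1], [10.2]])
good = np.array([0, 0, 0, 1, 1, 1])
bad = np.array([0, 0, 1, 1, 2, 2])
print(calinski_harabasz(data, good), calinski_harabasz(data, bad))
```

Scanning such an index over k = 2, 3, 4, ... and taking the best score is exactly how the optimum k is selected in studies like this one.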
NASA Astrophysics Data System (ADS)
Di, Nur Faraidah Muhammad; Satari, Siti Zanariah
2017-05-01
Outlier detection in linear data sets has been studied extensively, but only a small amount of work has been done on outlier detection in circular data. In this study, we propose multiple-outlier detection in circular regression models based on a clustering algorithm. Clustering techniques rely on a distance measure to define the distance between data points. Here, we introduce a similarity distance based on Euclidean distance for the circular model and obtain a cluster tree using the single linkage clustering algorithm. Then, a stopping rule for the cluster tree, based on the mean direction and circular standard deviation of the tree height, is proposed. We classify cluster groups that exceed the stopping rule as potential outliers. Our aim is to demonstrate the effectiveness of the proposed algorithms with the similarity distances in detecting the outliers. The proposed methods are found to perform well and to be applicable to circular regression models.
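The circular quantities used in the stopping rule, the mean direction and the circular standard deviation, must respect the wrap-around at 2π, as must any angular distance. An illustrative sketch (this does not reproduce the paper's exact similarity distance; the 350°/10° example is an assumption):

```python
import numpy as np

def circular_stats(theta):
    """Mean direction and circular standard deviation of angles (radians).
    R is the mean resultant length; sd = sqrt(-2 ln R)."""
    C, S = np.cos(theta).sum(), np.sin(theta).sum()
    R = np.hypot(C, S) / len(theta)
    return np.arctan2(S, C), np.sqrt(-2 * np.log(R))

def circular_distance(a, b):
    """Shortest angular separation between two angles, in [0, pi]."""
    d = np.abs(a - b) % (2 * np.pi)
    return np.minimum(d, 2 * np.pi - d)

# Angles straddling the wrap-around: the arithmetic mean of 350 and 10
# degrees would wrongly be 180 degrees; the circular mean is 0.
theta = np.radians([350.0, 10.0])
mean_dir, sd = circular_stats(theta)
print(np.degrees(mean_dir))  # ~0, not 180
```

This wrap-around behavior is precisely why linear outlier-detection machinery cannot be applied to circular data unchanged.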
Automatic detection of erythemato-squamous diseases using k-means clustering.
Ubeyli, Elif Derya; Doğdu, Erdoğan
2010-04-01
A new approach based on the implementation of k-means clustering is presented for automated detection of erythemato-squamous diseases. The purpose of clustering techniques is to find a structure for the given data by finding similarities between data according to data characteristics. The studied domain contained records of patients with known diagnosis. The k-means clustering algorithm's task was to classify the data points, in this case the patients with attribute data, to one of the five clusters. The algorithm was used to detect the five erythemato-squamous diseases when 33 features defining five disease indications were used. The purpose is to determine an optimum classification scheme for this problem. The present research demonstrated that the features well represent the erythemato-squamous diseases and the k-means clustering algorithm's task achieved high classification accuracies for only five erythemato-squamous diseases.
NASA Astrophysics Data System (ADS)
Dekkers, M. J.; Heslop, D.; Herrero-Bervera, E.; Acton, G.; Krasa, D.
2014-12-01
Ocean Drilling Program (ODP)/Integrated ODP (IODP) Hole 1256D (6°44.1' N, 91°56.1' W) on the Cocos Plate occurs in 15.2 Ma oceanic crust generated by superfast seafloor spreading. Presently, it is the only drill hole that has sampled all three oceanic crust layers in a tectonically undisturbed setting. Here we interpret down-hole trends in several rock-magnetic parameters with fuzzy c-means cluster analysis, a multivariate statistical technique. The parameters include the magnetization ratio, the coercivity ratio, the coercive force, the low-field susceptibility, and the Curie temperature. Their combined multivariate analysis allows the effects of magmatic and hydrothermal processes to be evaluated. The optimal number of clusters, a key point in the analysis because there is no a priori information on this, was determined through a combination of approaches: by calculating several cluster validity indices, by testing for coherent cluster distributions on non-linear-map plots, and importantly by testing for stability of the cluster solution from all possible starting points. Here, we consider a solution robust if the cluster allocation is independent of the starting configuration. The five-cluster solution appeared to be robust. Three clusters are distinguished in the extrusive segment of the Hole that express increasing hydrothermal alteration of the lavas. The sheeted dike and gabbro portions are characterized by two clusters, both with higher coercivities than in the lava samples. Extensive alteration, however, can obliterate magnetic property differences between lavas, dikes, and gabbros. The imprint of thermochemical alteration on the iron-titanium oxides is only partially related to the porosity of the rocks. All clusters display rock-magnetic characteristics in line with a stable natural remanent magnetization (NRM). This implies that the entire sampled sequence of ocean crust can contribute to marine magnetic anomalies.
Determination of the absolute paleointensity with thermal techniques is not straightforward because of the propensity of oxyexsolution during laboratory heating and/or the presence of intergrowths. The upper part of the extrusive sequence, the granoblastic portion of the dikes, and moderately altered gabbros may contain a comparatively uncontaminated thermoremanent magnetization.
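Fuzzy c-means, unlike hard k-means, assigns each sample a membership degree in every cluster, which is what lets down-hole samples sit partway between alteration regimes. A minimal sketch of the standard algorithm (not the authors' implementation; the 1-D toy data, fuzzifier m = 2, and deterministic initialization are assumptions):

```python
import numpy as np

def fuzzy_cmeans(data, c, m=2.0, iters=100):
    """Basic fuzzy c-means: memberships u[i, j] in [0, 1] sum to 1 over
    clusters j; centers are membership-weighted means. Deterministic init."""
    centers = data[:c].astype(float).copy()
    for _ in range(iters):
        d = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
        d = np.maximum(d, 1e-12)  # guard against zero distances
        # u[i, j] = 1 / sum_k (d[i, j] / d[i, k])^(2 / (m - 1))
        u = 1.0 / np.sum((d[:, :, None] / d[:, None, :]) ** (2 / (m - 1)), axis=2)
        centers = (u.T ** m @ data) / (u.T ** m).sum(axis=1, keepdims=True)
    return u, centers

# Two clear 1-D groups, ordered so the first two points seed both groups.
data = np.array([[0.0], [10.0], [0.1], [10.1], [0.2], [10.2]])
u, centers = fuzzy_cmeans(data, c=2)
```

Cluster validity indices such as those mentioned in the abstract are then computed over the membership matrix `u` to choose the number of clusters.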
Bradshaw, Peter L.; Colville, Jonathan F.; Linder, H. Peter
2015-01-01
We used a very large dataset (>40% of all species) from the endemic-rich Cape Floristic Region (CFR) to explore the impact of different weighting techniques, coefficients to calculate similarity among the cells, and clustering approaches on biogeographical regionalisation. The results were used to revise the biogeographical subdivision of the CFR. We show that weighted data (down-weighting widespread species), similarity calculated using Kulczinsky’s second measure, and clustering using UPGMA resulted in the optimal classification. This maximized the number of endemic species, the number of centres recognized, and operational geographic units assigned to centres of endemism (CoEs). We developed a dendrogram branch order cut-off (BOC) method to locate the optimal cut-off points on the dendrogram to define candidate clusters. Kulczinsky’s second measure dendrograms were combined using consensus, identifying areas of conflict which could be due to biotic element overlap or transitional areas. Post-clustering GIS manipulation substantially enhanced the endemic composition and geographic size of candidate CoEs. Although there was broad spatial congruence with previous phytogeographic studies, our techniques allowed for the recovery of additional phytogeographic detail not previously described for the CFR. PMID:26147438
A scoping review of spatial cluster analysis techniques for point-event data.
Fritz, Charles E; Schuurman, Nadine; Robertson, Colin; Lear, Scott
2013-05-01
Spatial cluster analysis is a uniquely interdisciplinary endeavour, and so it is important to communicate and disseminate ideas, innovations, best practices and challenges across practitioners, applied epidemiology researchers and spatial statisticians. In this research we conducted a scoping review to systematically search peer-reviewed journal databases for research that has employed spatial cluster analysis methods on individual-level, address location, or x and y coordinate derived data. To illustrate the thematic issues raised by our results, methods were tested using a dataset where known clusters existed. Point pattern methods, spatial clustering and cluster detection tests, and a locally weighted spatial regression model were most commonly used for individual-level, address location data (n = 29). The spatial scan statistic was the most popular method for address location data (n = 19). Six themes were identified relating to the application of spatial cluster analysis methods and subsequent analyses, which we recommend researchers to consider; exploratory analysis, visualization, spatial resolution, aetiology, scale and spatial weights. It is our intention that researchers seeking direction for using spatial cluster analysis methods, consider the caveats and strengths of each approach, but also explore the numerous other methods available for this type of analysis. Applied spatial epidemiology researchers and practitioners should give special consideration to applying multiple tests to a dataset. Future research should focus on developing frameworks for selecting appropriate methods and the corresponding spatial weighting schemes.
Mo, Yun; Zhang, Zhongzhao; Meng, Weixiao; Ma, Lin; Wang, Yao
2014-01-01
Indoor positioning systems based on the fingerprint method are widely used due to the large number of existing devices with a wide range of coverage. However, extensive positioning regions with a massive fingerprint database may cause high computational complexity and error margins; therefore, clustering methods are widely applied as a solution. Traditional clustering methods in positioning systems, however, can only measure the similarity of the Received Signal Strength without being concerned with the continuity of physical coordinates. Besides, outages of access points can result in asymmetric matching problems which severely affect the fine positioning procedure. To solve these issues, in this paper we propose a positioning system based on the Spatial Division Clustering (SDC) method for clustering the fingerprint dataset subject to physical distance constraints. With the Genetic Algorithm and Support Vector Machine techniques, SDC can achieve higher coarse positioning accuracy than traditional clustering algorithms. In terms of fine localization, based on the Kernel Principal Component Analysis method, the proposed positioning system outperforms its counterparts based on other feature extraction methods in low dimensionality. Apart from balancing the online matching computational burden, the new positioning system exhibits advantageous performance in radio map clustering, and also shows better robustness and adaptability with respect to the asymmetric matching problem. PMID:24451470
Uncertainties in the cluster-cluster correlation function
NASA Astrophysics Data System (ADS)
Ling, E. N.; Frenk, C. S.; Barrow, J. D.
1986-12-01
The bootstrap resampling technique is applied to estimate sampling errors and significance levels of the two-point correlation functions determined for a subset of the CfA redshift survey of galaxies and a redshift sample of 104 Abell clusters. The angular correlation function for a sample of 1664 Abell clusters is also calculated. The standard errors in xi(r) for the Abell data are found to be considerably larger than quoted 'Poisson errors'. The best estimate for the ratio of the correlation length of Abell clusters (richness class R greater than or equal to 1, distance class D less than or equal to 4) to that of CfA galaxies is 4.2 + 1.4 or - 1.0 (68 percentile error). The enhancement of cluster clustering over galaxy clustering is statistically significant in the presence of resampling errors. The uncertainties found do not include the effects of possible systematic biases in the galaxy and cluster catalogs and could be regarded as lower bounds on the true uncertainty range.
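The bootstrap technique used above estimates sampling errors by resampling the data with replacement and recomputing the statistic on each replicate. A minimal sketch for the simplest case, the standard error of a sample mean (the toy Gaussian sample is an assumption; the paper applies the same idea to the two-point correlation function ξ(r)):

```python
import numpy as np

def bootstrap_se(x, stat=np.mean, n_boot=2000, seed=0):
    """Standard error of a statistic by bootstrap resampling: draw samples
    with replacement and take the spread of the recomputed statistic."""
    rng = np.random.default_rng(seed)
    reps = [stat(rng.choice(x, size=len(x), replace=True)) for _ in range(n_boot)]
    return np.std(reps)

# For the sample mean, the bootstrap SE should approximate the classical
# s / sqrt(n), without assuming any error model for the data.
rng = np.random.default_rng(42)
x = rng.normal(0, 1, 200)
print(bootstrap_se(x), x.std(ddof=1) / np.sqrt(len(x)))
```

The appeal for clustering statistics is exactly the point made in the abstract: unlike Poisson errors, the bootstrap makes no independence assumption about the points, so it captures the extra variance that clustered samples carry.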
Communication: Finite size correction in periodic coupled cluster theory calculations of solids.
Liao, Ke; Grüneis, Andreas
2016-10-14
We present a method to correct for finite size errors in coupled cluster theory calculations of solids. The outlined technique shares similarities with electronic structure factor interpolation methods used in quantum Monte Carlo calculations. However, our approach does not require the calculation of density matrices. Furthermore we show that the proposed finite size corrections achieve chemical accuracy in the convergence of second-order Møller-Plesset perturbation and coupled cluster singles and doubles correlation energies per atom for insulating solids with two atomic unit cells using 2 × 2 × 2 and 3 × 3 × 3 k-point meshes only.
Studies in the X-Ray Emission of Clusters of Galaxies and Other Topics
NASA Technical Reports Server (NTRS)
Vrtilek, Jan; Thronson, Harley (Technical Monitor)
2001-01-01
The paper discusses the following: (1) X-ray study of groups of galaxies with Chandra and XMM. (2) X-ray properties of point sources in Chandra deep fields. (3) Study of cluster substructure using wavelet techniques. (4) Combined study of galaxy clusters with X-ray and the S-Z effect. Groups of galaxies are the fundamental building blocks of large scale structure in the Universe. X-ray study of the intragroup medium offers a powerful approach to addressing some of the major questions that still remain about almost all aspects of groups: their ages, origins, importance of composition of various galaxy types, relations to clusters, and origin and enrichment of the intragroup gas. Long exposures with Chandra have opened new opportunities for the study of X-ray background. The presence of substructure within clusters of galaxies has substantial implications for our understanding of cluster evolution as well as fundamental questions in cosmology.
NASA Technical Reports Server (NTRS)
Fomenkova, M. N.
1997-01-01
The computer-intensive project consisted of the analysis and synthesis of existing data on composition of comet Halley dust particles. The main objective was to obtain a complete inventory of sulfur containing compounds in the comet Halley dust by building upon the existing classification of organic and inorganic compounds and applying a variety of statistical techniques for cluster and cross-correlational analyses. A student hired for this project wrote and tested the software to perform cluster analysis. The following tasks were carried out: (1) selecting the data from existing database for the proposed project; (2) finding access to a standard library of statistical routines for cluster analysis; (3) reformatting the data as necessary for input into the library routines; (4) performing cluster analysis and constructing hierarchical cluster trees using three methods to define the proximity of clusters; (5) presenting the output results in different formats to facilitate the interpretation of the obtained cluster trees; (6) selecting groups of data points common for all three trees as stable clusters. We have also considered the chemistry of sulfur in inorganic compounds.
A search for novae in M 31 globular clusters
NASA Astrophysics Data System (ADS)
Ciardullo, Robin; Tamblyn, Peter; Phillips, A. C.
1990-10-01
By combining a local sky-fitting algorithm with a Fourier point-spread-function matching technique, nova outbursts have been searched for inside 54 of the globular clusters contained on the Ciardullo et al. (1987 and 1990) H-alpha survey frames of M 31. Over a mean effective survey time of about 2.0 years, no cluster exhibited a magnitude increase indicative of a nova explosion. If the cataclysmic variables (CVs) contained within globular clusters are similar to those found in the field, then these data imply that the overdensity of CVs within globulars is at least several times less than that of the high-luminosity X-ray sources. If tidal capture is responsible for the high density of hard binaries within globulars, then the probability of capturing condensed objects inside globular clusters may depend strongly on the mass of the remnant.
Regional SAR Image Segmentation Based on Fuzzy Clustering with Gamma Mixture Model
NASA Astrophysics Data System (ADS)
Li, X. L.; Zhao, Q. H.; Li, Y.
2017-09-01
Most stochastic fuzzy clustering algorithms are pixel-based and cannot effectively overcome the inherent speckle noise in SAR images. To deal with this problem, a regional SAR image segmentation algorithm based on fuzzy clustering with a Gamma mixture model is proposed in this paper. First, generating points are initialized randomly on the image, and the image domain is divided into sub-regions using a Voronoi tessellation; each sub-region is regarded as a homogeneous area in which all pixels share the same cluster label. Then, the probability of a pixel is assumed to follow a Gamma mixture model with parameters corresponding to the cluster to which the pixel belongs; the negative logarithm of this probability serves as the dissimilarity measure between the pixel and the cluster. The regional dissimilarity measure of a sub-region is defined as the sum of the measures of its pixels. Furthermore, the Markov Random Field (MRF) model is extended from the pixel level to the Voronoi sub-regions, and the regional objective function is established within the framework of fuzzy clustering. The optimal segmentation is obtained by solving for the model parameters and generating points. Finally, the effectiveness of the proposed algorithm is demonstrated by qualitative and quantitative analysis of segmentation results on simulated and real SAR images.
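The dissimilarity measure described in this abstract can be made concrete. The sketch below (an illustration under stated assumptions, not the authors' code) computes the negative log-density of a Gamma(shape, scale) distribution as the pixel-to-cluster dissimilarity and sums it over a sub-region:

```python
import math

def gamma_neg_log_pdf(x, shape, scale):
    """Negative log-density of a Gamma(shape, scale) distribution:
    the dissimilarity between a pixel intensity x and a cluster whose
    Gamma parameters are (shape, scale)."""
    return (math.lgamma(shape) + shape * math.log(scale)
            - (shape - 1) * math.log(x) + x / scale)

def region_dissimilarity(pixels, shape, scale):
    """Regional dissimilarity: sum of per-pixel measures over a sub-region."""
    return sum(gamma_neg_log_pdf(x, shape, scale) for x in pixels)
```

A pixel near the distribution's mode yields a small dissimilarity, so sub-regions whose intensities match a cluster's Gamma parameters are cheap to assign to that cluster.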
Method and system for data clustering for very large databases
NASA Technical Reports Server (NTRS)
Livny, Miron (Inventor); Zhang, Tian (Inventor); Ramakrishnan, Raghu (Inventor)
1998-01-01
Multi-dimensional data contained in very large databases is efficiently and accurately clustered to determine patterns therein and extract useful information from such patterns. Conventional computer processors may be used which have limited memory capacity and conventional operating speed, allowing massive data sets to be processed in a reasonable time and with reasonable computer resources. The clustering process is organized using a clustering feature tree structure wherein each clustering feature comprises the number of data points in the cluster, the linear sum of the data points in the cluster, and the square sum of the data points in the cluster. A dense region of data points is treated collectively as a single cluster, and points in sparsely occupied regions can be treated as outliers and removed from the clustering feature tree. The clustering can be carried out continuously with new data points being received and processed, and with the clustering feature tree being restructured as necessary to accommodate the information from the newly received data points.
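The clustering-feature triple named in this abstract (number of points, linear sum, square sum) can be sketched in a few lines; this illustrates the summary statistics only, not the patented tree implementation:

```python
class ClusteringFeature:
    """A cluster summary: point count N, per-dimension linear sum LS,
    and square sum SS. Two features merge by component-wise addition,
    so a dense region can be absorbed into one node without storing
    its individual points."""
    def __init__(self, dim):
        self.n = 0
        self.ls = [0.0] * dim   # per-dimension linear sum
        self.ss = 0.0           # sum of squared norms

    def add(self, point):
        self.n += 1
        for i, x in enumerate(point):
            self.ls[i] += x
        self.ss += sum(x * x for x in point)

    def merge(self, other):
        self.n += other.n
        self.ls = [a + b for a, b in zip(self.ls, other.ls)]
        self.ss += other.ss

    def centroid(self):
        return [s / self.n for s in self.ls]

    def radius(self):
        # RMS distance of points to the centroid, recoverable from
        # (N, LS, SS) alone: E||x||^2 - ||centroid||^2
        c2 = sum(c * c for c in self.centroid())
        return max(0.0, self.ss / self.n - c2) ** 0.5
```

Because centroid and radius are derivable from the triple, new data points can update a feature in constant memory, which is what lets massive data sets fit in limited memory.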
Semisupervised Clustering by Iterative Partition and Regression with Neuroscience Applications
Qian, Guoqi; Wu, Yuehua; Ferrari, Davide; Qiao, Puxue; Hollande, Frédéric
2016-01-01
Regression clustering is a hybrid of unsupervised and supervised statistical learning and data mining, found in a wide range of applications including artificial intelligence and neuroscience. It performs unsupervised learning when it clusters the data according to their respective unobserved regression hyperplanes, and supervised learning when it fits regression hyperplanes to the corresponding data clusters. Applying regression clustering in practice requires a means of determining the underlying number of clusters in the data, finding the cluster label of each data point, and estimating the regression coefficients of the model. In this paper, we review the estimation and selection issues in regression clustering with regard to least squares and robust statistical methods. We also provide a model-selection-based technique to determine the number of regression clusters underlying the data, and develop a computing procedure for regression clustering estimation and selection. Finally, simulation studies are presented for assessing the procedure, together with an analysis of a real data set on RGB cell marking in neuroscience to illustrate and interpret the method. PMID:27212939
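The iterative partition-and-regression loop this abstract describes can be sketched for simple lines: assign each point to the line with the smallest residual, then refit each line by least squares. The function names are illustrative, and the paper's robust estimators and model selection are omitted:

```python
def fit_line(pts):
    """Ordinary least squares for y = a + b*x on a list of (x, y) pairs."""
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    b = sxy / sxx if sxx else 0.0
    return my - b * mx, b

def regression_cluster(pts, lines, iters=20):
    """Iterative partition-and-regression: assign each point to the line
    with the smallest squared residual, then refit each line to its group."""
    for _ in range(iters):
        groups = [[] for _ in lines]
        for x, y in pts:
            r = [(y - (a + b * x)) ** 2 for a, b in lines]
            groups[r.index(min(r))].append((x, y))
        lines = [fit_line(g) if g else l for g, l in zip(groups, lines)]
    return lines, groups
```

On data drawn from two well-separated lines, a few iterations recover both regression hyperplanes and the cluster label of every point.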
NASA Technical Reports Server (NTRS)
Barnes, J.; Dekel, A.; Efstathiou, G.; Frenk, C. S.
1985-01-01
The cluster correlation function ξ_c(r) is compared with the particle correlation function ξ(r) in cosmological N-body simulations with a wide range of initial conditions. The experiments include scale-free initial conditions, pancake models with a coherence length in the initial density field, and hybrid models. Three N-body techniques and two cluster-finding algorithms are used. In scale-free models with white noise initial conditions, ξ_c and ξ are essentially identical. In scale-free models with more power on large scales, it is found that the amplitude of ξ_c increases with cluster richness; in this case the clusters give a biased estimate of the particle correlations. In the pancake and hybrid models (with n = 0 or 1), ξ_c is steeper than ξ, but the cluster correlation length exceeds that of the points by less than a factor of 2, independent of cluster richness. Thus the high amplitude of ξ_c found in studies of rich clusters of galaxies is inconsistent with white noise and pancake models and may indicate a primordial fluctuation spectrum with substantial power on large scales.
On three-dimensional misorientation spaces.
Krakow, Robert; Bennett, Robbie J; Johnstone, Duncan N; Vukmanovic, Zoja; Solano-Alvarez, Wilberth; Lainé, Steven J; Einsle, Joshua F; Midgley, Paul A; Rae, Catherine M F; Hielscher, Ralf
2017-10-01
Determining the local orientation of crystals in engineering and geological materials has become routine with the advent of modern crystallographic mapping techniques. These techniques enable many thousands of orientation measurements to be made, directing attention towards how such orientation data are best studied. Here, we provide a guide to the visualization of misorientation data in three-dimensional vector spaces, reduced by crystal symmetry, to reveal crystallographic orientation relationships. Domains for all point group symmetries are presented and an analysis methodology is developed and applied to identify crystallographic relationships, indicated by clusters in the misorientation space, in examples from materials science and geology. This analysis aids the determination of active deformation mechanisms and evaluation of cluster centres and spread enables more accurate description of transformation processes supporting arguments regarding provenance.
A Survey on the Taxonomy of Cluster-Based Routing Protocols for Homogeneous Wireless Sensor Networks
Naeimi, Soroush; Ghafghazi, Hamidreza; Chow, Chee-Onn; Ishii, Hiroshi
2012-01-01
The past few years have witnessed increased interest among researchers in cluster-based protocols for homogeneous networks because of their better scalability and higher energy efficiency than other routing protocols. Given the limited capabilities of sensor nodes in terms of energy resources, processing and communication range, the cluster-based protocols should be compatible with these constraints in either the setup state or steady data transmission state. With focus on these constraints, we classify routing protocols according to their objectives and methods towards addressing the shortcomings of clustering process on each stage of cluster head selection, cluster formation, data aggregation and data communication. We summarize the techniques and methods used in these categories, while the weakness and strength of each protocol is pointed out in details. Furthermore, taxonomy of the protocols in each phase is given to provide a deeper understanding of current clustering approaches. Ultimately based on the existing research, a summary of the issues and solutions of the attributes and characteristics of clustering approaches and some open research areas in cluster-based routing protocols that can be further pursued are provided. PMID:22969350
Clustering by soft-constraint affinity propagation: applications to gene-expression data.
Leone, Michele; Sumedha; Weigt, Martin
2007-10-15
Similarity-measure-based clustering is a crucial problem appearing throughout scientific data analysis. Recently, a powerful new algorithm called Affinity Propagation (AP), based on message-passing techniques, was proposed by Frey and Dueck (2007a). In AP, each cluster is identified by a common exemplar to which all other data points of the same cluster refer, and exemplars have to refer to themselves. Despite its proven power, AP in its present form suffers from a number of drawbacks. The hard constraint of having exactly one exemplar per cluster restricts AP to classes of regularly shaped clusters and leads to suboptimal performance, e.g. in analyzing gene-expression data. This limitation can be overcome by relaxing the AP hard constraints. A new parameter controls the importance of the constraints compared to the aim of maximizing the overall similarity, and allows one to interpolate between the simple case, where each data point selects its closest neighbor as an exemplar, and the original AP. The resulting soft-constraint affinity propagation (SCAP) becomes more informative and accurate and leads to more stable clustering. Even though a new a priori free parameter is introduced, the overall dependence of the algorithm on external tuning is reduced, as robustness is increased and an optimal strategy for parameter selection emerges more naturally. SCAP is tested on biological benchmark data, including in particular microarray data related to various cancer types. We show that the algorithm efficiently unveils the hierarchical cluster structure present in the data sets. Furthermore, it allows the extraction of sparse gene-expression signatures for each cluster.
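The "simple case" this abstract mentions, where each data point selects its closest neighbor as an exemplar, can be sketched directly: clusters are then the connected components of the nearest-neighbor pointer graph. This is illustrative code for that limiting case only, not the SCAP message-passing algorithm itself:

```python
def nearest_neighbor_clusters(points):
    """One SCAP limit: each point refers to its nearest neighbour as
    exemplar; clusters are connected components of that pointer graph,
    found with a small union-find structure."""
    n = len(points)
    def d2(i, j):
        return sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
    parent = list(range(n))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i
    for i in range(n):
        j = min((k for k in range(n) if k != i), key=lambda k: d2(i, k))
        parent[find(i)] = find(j)          # union i with its nearest neighbour
    labels = [find(i) for i in range(n)]
    relabel = {r: c for c, r in enumerate(dict.fromkeys(labels))}
    return [relabel[r] for r in labels]
```

Well-separated groups stay separate because no point's nearest neighbor crosses the gap between groups.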
On application of image analysis and natural language processing for music search
NASA Astrophysics Data System (ADS)
Gwardys, Grzegorz
2013-10-01
In this paper, I investigate the problem of finding the most similar music tracks using techniques popular in Natural Language Processing, such as TF-IDF and LDA. I defined a document as a music track. Each music track is transformed into a spectrogram; thanks to that, I can use well-known techniques to get words from images. I used the SURF operator to detect characteristic points and a novel approach for their description. Standard k-means was used for clustering. Clustering here is identical to dictionary building, so afterwards I can transform spectrograms into text documents and perform TF-IDF and LDA. Finally, I can make a query in the obtained vector space. The research was done on 16 music tracks for training and 336 for testing, split into four categories: Hiphop, Jazz, Metal and Pop. Although the technique used is completely unsupervised, the results are satisfactory and encourage further research.
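Once each spectrogram has been quantized into a bag of visual words, the TF-IDF step is standard. The sketch below is illustrative only; the paper's SURF detection and LDA stages are omitted:

```python
import math
from collections import Counter

def tf_idf(docs):
    """TF-IDF vectors for documents given as lists of (visual) words.
    TF is term frequency within a document; IDF is log(N / document
    frequency) across the corpus."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))        # document frequency: one count per doc
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        total = len(doc)
        vectors.append({w: (c / total) * math.log(n / df[w])
                        for w, c in tf.items()})
    return vectors
```

A word that appears in every track gets weight zero, so only distinctive spectrogram patterns contribute to the query's vector-space similarity.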
NASA Technical Reports Server (NTRS)
Donahue, Megan; Scharf, Caleb A.; Mack, Jennifer; Lee, Y. Paul; Postman, Marc; Rosait, Piero; Dickinson, Mark; Voit, G. Mark; Stocke, John T.
2002-01-01
We present and analyze the optical and X-ray catalogs of moderate-redshift cluster candidates from the ROSAT Optical X-Ray Survey, or ROXS. The survey covers the sky area contained in the fields of view of 23 deep archival ROSAT PSPC pointings, 4.8 square degrees. The cross-correlated cluster catalogs were constructed by comparing two independent catalogs extracted from the optical and X-ray bandpasses, using a matched-filter technique for the optical data and a wavelet technique for the X-ray data. We cross-identified cluster candidates in each catalog. As reported in Paper I, the matched-filter technique found optical counterparts for at least 60% (26 out of 43) of the X-ray cluster candidates; the estimated redshifts from the matched-filter algorithm agree with at least 7 of 11 spectroscopic confirmations (Δz ≤ 0.10). The matched-filter technique, with an imaging sensitivity of mI ≈ 23, identified approximately 3 times the number of candidates (155 candidates, 142 with a detection confidence >3σ) found in the X-ray survey of nearly the same area. There are 57 X-ray candidates, 43 of which are unobscured by scattered light or bright stars in the optical images. Twenty-six of these have fairly secure optical counterparts. We find that the matched-filter algorithm, when applied to images with galaxy flux sensitivities of mI ≈ 23, is fairly well-matched to discovering z ≲ 1 clusters detected by wavelets in ROSAT PSPC exposures of 8000-60,000 s. The difference in the spurious fractions between the optical and X-ray catalogs (30% and 10%, respectively) cannot account for the difference in source number. In Paper I, we compared the optical and X-ray cluster luminosity functions and found that they are consistent if the relationship between X-ray and optical luminosities is steep. Here, in Paper II, we present the cluster catalogs and a numerical simulation of the ROXS.
We also present color-magnitude plots for several of the cluster candidates, and examine the prominence of the red sequence in each. We find that the X-ray clusters in our survey do not all have a prominent red sequence. We conclude that while the red sequence may be a distinct feature in the color-magnitude plots for virialized massive clusters, it may be less distinct in lower mass clusters of galaxies at even moderate redshifts. Multiple, complementary methods of selecting and defining clusters may be essential, particularly at high redshift where all methods start to run into completeness limits, incomplete understanding of physical evolution, and projection effects.
A global optimization perspective on molecular clusters.
Marques, J M C; Pereira, F B; Llanio-Trujillo, J L; Abreu, P E; Albertí, M; Aguilar, A; Pirani, F; Bartolomei, M
2017-04-28
Although there is a long history behind the idea of chemical structure, this is a key concept that continues to challenge chemists. Chemical structure is fundamental to understanding most of the properties of matter, and its knowledge for complex systems requires the use of state-of-the-art techniques, either experimental or theoretical. From the theoretical viewpoint, one needs to establish the interaction potential among the atoms or molecules of the system, which contains all the information regarding the energy landscape, and employ optimization algorithms to discover the relevant stationary points. In particular, global optimization methods are of major importance in searching for the low-energy structures of molecular aggregates. We review the application of global optimization techniques to several molecular clusters; some new results are also reported. Emphasis is given to evolutionary algorithms and their application in the study of the microsolvation of alkali-metal and Ca2+ ions with various types of solvents. This article is part of the themed issue 'Theoretical and computational studies of non-equilibrium and non-statistical dynamics in the gas phase, in the condensed phase and at interfaces'. © 2017 The Author(s).
Ardila-Rey, Jorge Alfredo; Rojas-Moreno, Mónica Victoria; Martínez-Tarifa, Juan Manuel; Robles, Guillermo
2014-02-19
Partial discharge (PD) detection is a standardized technique to qualify electrical insulation in machines and power cables. Several techniques that analyze the waveform of the pulses have been proposed to discriminate noise from PD activity. Among them, spectral power ratio representation shows great flexibility in the separation of the sources of PD. Mapping spectral power ratios in two-dimensional plots leads to clusters of points which group pulses with similar characteristics. The position in the map depends on the nature of the partial discharge, the setup and the frequency response of the sensors. If these clusters are clearly separated, the subsequent task of identifying the source of the discharge is straightforward so the distance between clusters can be a figure of merit to suggest the best option for PD recognition. In this paper, two inductive sensors with different frequency responses to pulsed signals, a high frequency current transformer and an inductive loop sensor, are analyzed to test their performance in detecting and separating the sources of partial discharges.
Xu, Xin; Huang, Zhenhua; Graves, Daniel; Pedrycz, Witold
2014-12-01
In order to deal with the sequential decision problems with large or continuous state spaces, feature representation and function approximation have been a major research topic in reinforcement learning (RL). In this paper, a clustering-based graph Laplacian framework is presented for feature representation and value function approximation (VFA) in RL. By making use of clustering-based techniques, that is, K-means clustering or fuzzy C-means clustering, a graph Laplacian is constructed by subsampling in Markov decision processes (MDPs) with continuous state spaces. The basis functions for VFA can be automatically generated from spectral analysis of the graph Laplacian. The clustering-based graph Laplacian is integrated with a class of approximation policy iteration algorithms called representation policy iteration (RPI) for RL in MDPs with continuous state spaces. Simulation and experimental results show that, compared with previous RPI methods, the proposed approach needs fewer sample points to compute an efficient set of basis functions and the learning control performance can be improved for a variety of parameter settings.
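The construction this abstract describes can be sketched with NumPy: build a k-nearest-neighbor graph over (sub)sampled states, form the unnormalized Laplacian L = D − W, and take its smoothest eigenvectors as basis functions for value function approximation. The clustering-based subsampling step is elided, and `laplacian_basis` is an illustrative name:

```python
import numpy as np

def laplacian_basis(states, k=3, sigma=1.0, n_basis=4):
    """Basis functions for VFA: eigenvectors of the graph Laplacian
    L = D - W built on a k-nearest-neighbour graph over sampled states,
    with Gaussian edge weights."""
    states = np.asarray(states, dtype=float)
    n = len(states)
    d = np.linalg.norm(states[:, None, :] - states[None, :, :], axis=2)
    w = np.zeros((n, n))
    for i in range(n):
        for j in np.argsort(d[i])[1:k + 1]:       # k nearest neighbours, skip self
            w[i, j] = w[j, i] = np.exp(-d[i, j] ** 2 / (2 * sigma ** 2))
    lap = np.diag(w.sum(axis=1)) - w              # unnormalised Laplacian
    vals, vecs = np.linalg.eigh(lap)
    return vecs[:, :n_basis]                      # smoothest eigenvectors
```

The returned columns are orthonormal and ordered from smoothest to most oscillatory over the graph, which is what makes them useful features for approximating a smooth value function.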
Detection of moving clusters by a method of cinematic pairs
NASA Astrophysics Data System (ADS)
Khodjachikh, M. F.; Romanovsky, E. A.
2000-01-01
An algorithm for identifying pairs of stars with common motion is proposed and implemented. The basic source is the HIPPARCOS catalogue. From concentrations of kinematic pairs, three previously unknown moving clusters are revealed in the constellations 1) Phe, 2) Cae, and 3) Hor, along with the well-known cluster in 4) UMa. Using an original technique, the cluster members, 87 stars in all, are identified. The coordinates of the clusters' convergent points α, δ (in degrees), space velocities (in km/s), and ages (in 10^6 yr) from isochrone fitting are: 1) 51, -29, 19.0, 500, 5/6; 2) 104, -32, 23.7, 300, 9/12; 3) 119, -27, 22.3, 100, 9/22; 4) 303, -31, 16.7, 500, 16/8, respectively. The numerator of each fraction is the number of stars identified as cluster members; the denominator is the number of probable members (with unknown radial velocities). A preliminary qualitative analysis of the clusters' spatial structure is carried out in view of their dynamical evolution.
Business Planning in the Light of Neuro-fuzzy and Predictive Forecasting
NASA Astrophysics Data System (ADS)
Chakrabarti, Prasun; Basu, Jayanta Kumar; Kim, Tai-Hoon
In this paper we have pointed out gain sensing based on forecasting techniques. We have presented an idea of neural-based gain forecasting. Testing of the sequence of gain patterns is also verified using statistical analysis of fuzzy value assignment. The paper also suggests realization of a stable gain condition using K-means clustering from data mining. A new concept of 3D-based gain sensing has been pointed out. The paper also discusses what type of trend analysis can be observed for probabilistic gain prediction.
Query by example video based on fuzzy c-means initialized by fixed clustering center
NASA Astrophysics Data System (ADS)
Hou, Sujuan; Zhou, Shangbo; Siddique, Muhammad Abubakar
2012-04-01
Currently, the high complexity of video contents has posed the following major challenges for fast retrieval: (1) efficient similarity measurements, and (2) efficient indexing on compact representations. A video-retrieval strategy based on fuzzy c-means (FCM) is presented for querying by example. Initially, the query video is segmented and represented by a set of shots; each shot is represented by a key frame, and video processing techniques are then used to find visual cues to represent the key frame. Next, because the FCM algorithm is sensitive to its initialization, we initialized the cluster centers with the shots of the query video so that users could achieve appropriate convergence. After an FCM cluster was initialized by the query video, each shot of the query video was considered a benchmark point in that cluster, and each shot in the database was given a class label. The similarity between a database shot and the benchmark point with the same class label can be transformed into the distance between them. Finally, the similarity between the query video and a video in the database was transformed into the number of similar shots. Our experimental results demonstrated the performance of the proposed approach.
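The key idea above, running fuzzy c-means from fixed, query-supplied cluster centers rather than random ones, can be sketched as follows. This is a generic FCM update loop under that initialization scheme, not the paper's full retrieval pipeline:

```python
import numpy as np

def fcm_fixed_init(data, init_centers, m=2.0, iters=50, eps=1e-12):
    """Fuzzy c-means with cluster centres initialised from given points
    (e.g. features of the query video's shots). m is the fuzzifier;
    eps guards against division by zero when a point sits on a centre."""
    centers = np.asarray(init_centers, dtype=float)
    data = np.asarray(data, dtype=float)
    for _ in range(iters):
        # distances from every point to every centre
        d = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2) + eps
        # standard FCM membership: u_ic ∝ d_ic^(-2/(m-1)), rows sum to 1
        inv = d ** (-2.0 / (m - 1.0))
        u = inv / inv.sum(axis=1, keepdims=True)
        # centre update: membership-weighted mean
        um = u ** m
        centers = (um.T @ data) / um.sum(axis=0)[:, None]
    return centers, u
```

Seeding the centers with the query shots biases convergence toward a partition aligned with the query, which is exactly why the paper prefers it over random initialization.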
Fast max-margin clustering for unsupervised word sense disambiguation in biomedical texts
Duan, Weisi; Song, Min; Yates, Alexander
2009-01-01
Background We aim to solve the problem of determining word senses for ambiguous biomedical terms with minimal human effort. Methods We build a fully automated system for Word Sense Disambiguation by designing a system that does not require manually-constructed external resources or manually-labeled training examples except for a single ambiguous word. The system uses a novel and efficient graph-based algorithm to cluster words into groups that have the same meaning. Our algorithm follows the principle of finding a maximum margin between clusters, determining a split of the data that maximizes the minimum distance between pairs of data points belonging to two different clusters. Results On a test set of 21 ambiguous keywords from PubMed abstracts, our system has an average accuracy of 78%, outperforming a state-of-the-art unsupervised system by 2% and a baseline technique by 23%. On a standard data set from the National Library of Medicine, our system outperforms the baseline by 6% and comes within 5% of the accuracy of a supervised system. Conclusion Our system is a novel, state-of-the-art technique for efficiently finding word sense clusters, and does not require training data or human effort for each new word to be disambiguated. PMID:19344480
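For a binary split, maximizing the minimum distance between points in different clusters is equivalent to cutting the largest edge of a minimum spanning tree. The sketch below illustrates that split criterion on raw points; the paper's graph-based algorithm over word contexts is more involved:

```python
def max_margin_split(points):
    """Binary split maximising the minimum distance between points in
    different clusters: build an MST with Prim's algorithm, then cut
    its largest edge. Returns (labels, size of the margin)."""
    n = len(points)
    def dist(i, j):
        return sum((a - b) ** 2 for a, b in zip(points[i], points[j])) ** 0.5
    # Prim's algorithm: grow the tree from point 0
    in_tree = {0}
    best = {i: (dist(0, i), 0) for i in range(1, n)}
    edges = []
    while len(in_tree) < n:
        i = min(best, key=lambda k: best[k][0])
        d, j = best.pop(i)
        edges.append((d, i, j))
        in_tree.add(i)
        for k in best:
            if dist(i, k) < best[k][0]:
                best[k] = (dist(i, k), i)
    # remove the largest edge, then label the component containing one endpoint
    edges.sort()
    cut_d, a, b = edges[-1]
    adj = {i: set() for i in range(n)}
    for d, i, j in edges[:-1]:
        adj[i].add(j); adj[j].add(i)
    labels = [0] * n
    stack, seen = [a], {a}
    while stack:
        u = stack.pop()
        labels[u] = 1
        stack.extend(v for v in adj[u] - seen)
        seen |= adj[u]
    return labels, cut_d
```

The returned margin is the smallest distance any pair of points in different clusters can have, which is what the max-margin principle maximizes for a two-way split.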
Techniques for spatio-temporal analysis of vegetation fires in the topical belt of Africa
DOE Office of Scientific and Technical Information (OSTI.GOV)
Brivio, P.A.; Ober, G.; Koffi, B.
1995-12-31
Biomass burning of forests and savannas is a phenomenon of continental or even global proportions, capable of causing large-scale environmental changes. Satellite observations, in particular NOAA-AVHRR GAC data, are the only source of information allowing one to document burning patterns at regional and continental scale and over long periods of time. This paper presents some techniques, such as clustering and rose-diagrams, useful in the spatio-temporal analysis of satellite-derived fire maps to characterize the evolution of spatial patterns of vegetation fires at regional scale. An automatic clustering approach is presented which enables one to describe and parameterize the spatial distribution of fire patterns at different scales. The problem of the geographical distribution of vegetation fires with respect to some location of interest, point or line, is also considered. In particular, rose-diagrams are used to relate fire patterns to some reference point, such as experimental sites of tropospheric chemistry measurements. Different temporal data sets in the tropical belt of Africa, covering both Northern and Southern Hemisphere dry seasons, were analyzed using these techniques and showed very promising results when compared with data from rain chemistry studies at different sampling sites in the equatorial forest.
Big Data Analytics for Demand Response: Clustering Over Space and Time
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chelmis, Charalampos; Kolte, Jahanvi; Prasanna, Viktor K.
The pervasive deployment of advanced sensing infrastructure in Cyber-Physical systems, such as the Smart Grid, has resulted in an unprecedented data explosion. Such data exhibit both large volumes and high velocity, two of the three pillars of Big Data, and have a time-series notion, as datasets in this context typically consist of successive measurements made over a time interval. Time-series data can be valuable for data mining and analytics tasks such as identifying the "right" customers among a diverse population to target for Demand Response programs. However, time series are challenging to mine due to their high dimensionality. In this paper, we motivate this problem using a real application from the smart grid domain. We explore novel representations of time-series data for Big Data analytics, and propose a clustering technique for determining a natural segmentation of customers and identification of temporal consumption patterns. Our method is generalizable to large-scale, real-world scenarios, without making any assumptions about the data. We evaluate our technique using real datasets from smart meters, totaling approximately 18,200,000 data points, and show its efficacy in efficiently detecting the optimal number of clusters.
Quantifying Biomass from Point Clouds by Connecting Representations of Ecosystem Structure
NASA Astrophysics Data System (ADS)
Hendryx, S. M.; Barron-Gafford, G.
2017-12-01
Quantifying terrestrial ecosystem biomass is an essential part of monitoring carbon stocks and fluxes within the global carbon cycle and optimizing natural resource management. Point cloud data such as from lidar and structure from motion can be effective for quantifying biomass over large areas, but significant challenges remain in developing effective models that allow for such predictions. Inference models that estimate biomass from point clouds are established in many environments, yet, are often scale-dependent, needing to be fitted and applied at the same spatial scale and grid size at which they were developed. Furthermore, training such models typically requires large in situ datasets that are often prohibitively costly or time-consuming to obtain. We present here a scale- and sensor-invariant framework for efficiently estimating biomass from point clouds. Central to this framework, we present a new algorithm, assignPointsToExistingClusters, that has been developed for finding matches between in situ data and clusters in remotely-sensed point clouds. The algorithm can be used for assessing canopy segmentation accuracy and for training and validating machine learning models for predicting biophysical variables. We demonstrate the algorithm's efficacy by using it to train a random forest model of above ground biomass in a shrubland environment in Southern Arizona. We show that by learning a nonlinear function to estimate biomass from segmented canopy features we can reduce error, especially in the presence of inaccurate clusterings, when compared to a traditional, deterministic technique to estimate biomass from remotely measured canopies. Our random forest on cluster features model extends established methods of training random forest regressions to predict biomass of subplots but requires significantly less training data and is scale invariant. 
The random forest on cluster features model reduced mean absolute error, when evaluated on all test data in leave-one-out cross-validation, by 40.6% from deterministic mesquite allometry and 35.9% from the inferred ecosystem-state allometric function. Our framework should allow for the inference of biomass more efficiently than common subplot methods and more accurately than individual tree segmentation methods in densely vegetated environments.
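The abstract does not give the internals of assignPointsToExistingClusters; as a rough illustration of the matching problem it addresses, the following minimal sketch (the names, coordinates, and max_dist cutoff are hypothetical, not from the paper) pairs in-situ points with the nearest remotely-sensed cluster centroid:

```python
import math

def assign_points_to_clusters(points, centroids, max_dist=5.0):
    """Match each in-situ point to the nearest cluster centroid,
    returning None for points with no centroid within max_dist."""
    assignments = []
    for p in points:
        best, best_d = None, max_dist
        for label, c in centroids.items():
            d = math.dist(p, c)
            if d < best_d:
                best, best_d = label, d
        assignments.append(best)
    return assignments

centroids = {"shrub_a": (0.0, 0.0), "shrub_b": (10.0, 0.0)}
field_points = [(0.5, 0.2), (9.4, -0.3), (50.0, 50.0)]
print(assign_points_to_clusters(field_points, centroids))
# ['shrub_a', 'shrub_b', None]
```

Unmatched points (None) could then flag segmentation errors, which is one of the uses the abstract describes.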
Clustering analysis of moving target signatures
NASA Astrophysics Data System (ADS)
Martone, Anthony; Ranney, Kenneth; Innocenti, Roberto
2010-04-01
Previously, we developed a moving target indication (MTI) processing approach to detect and track slow-moving targets inside buildings, which successfully detected moving targets (MTs) from data collected by a low-frequency, ultra-wideband radar. Our MTI algorithms include change detection, automatic target detection (ATD), clustering, and tracking. The MTI algorithms can be implemented in a real-time or near-real-time system; however, a person-in-the-loop is needed to select input parameters for the clustering algorithm. Specifically, the number of clusters to input into the cluster algorithm is unknown and requires manual selection. A critical need exists to automate all aspects of the MTI processing formulation. In this paper, we investigate two techniques that automatically determine the number of clusters: the adaptive knee-point (KP) algorithm and the recursive pixel finding (RPF) algorithm. The KP algorithm is based on a well-known heuristic approach for determining the number of clusters. The RPF algorithm is analogous to the pixel-labeling procedure used in image processing. Both algorithms are used to analyze the false alarm and detection rates of three operational scenarios of personnel walking inside wood and cinderblock buildings.
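The knee-point heuristic that the KP algorithm builds on can be illustrated generically: given within-cluster cost as a function of the number of clusters k, pick the k whose point lies farthest from the chord joining the curve's endpoints. This is a common textbook formulation, not necessarily the authors' exact adaptive variant:

```python
def knee_point(costs):
    """Return the cluster count k at the 'knee' of a decreasing cost
    curve: the point farthest from the line joining the endpoints.
    costs[i] is the clustering cost for k = i + 1."""
    n = len(costs)
    x1, y1, x2, y2 = 0, costs[0], n - 1, costs[-1]
    denom = ((y2 - y1) ** 2 + (x2 - x1) ** 2) ** 0.5
    best_i, best_d = 0, -1.0
    for i, c in enumerate(costs):
        # perpendicular distance from (i, c) to the endpoint chord
        d = abs((y2 - y1) * i - (x2 - x1) * c + x2 * y1 - y2 * x1) / denom
        if d > best_d:
            best_i, best_d = i, d
    return best_i + 1  # convert 0-based index to cluster count k

costs = [100.0, 40.0, 15.0, 12.0, 11.0, 10.5]  # cost for k = 1..6
print(knee_point(costs))  # 3
```

The curve flattens sharply after k = 3, so the heuristic selects three clusters without a person-in-the-loop.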
ERIC Educational Resources Information Center
Kalil, Ariel; Ziol-Guest, Kathleen M.; Coley, Rebekah Levine
2005-01-01
Based on adolescent mothers' reports, longitudinal patterns of involvement of young, unmarried biological fathers (n=77) in teenage-mother families were examined using cluster-analytic techniques. Approximately one third of fathers maintained high levels of involvement over time, another third demonstrated low involvement at both time points, and…
Chen, Bin; Kim, Hyunmi; Keasler, Samuel J; Nellas, Ricky B
2008-04-03
The aggregation-volume-bias Monte Carlo based simulation technique, which has led to our recent success in vapor-liquid nucleation research, was extended to the study of crystal nucleation processes. In contrast to conventional bulk-phase techniques, this method deals with crystal nucleation events in cluster systems. This approach was applied to the crystal nucleation of Lennard-Jonesium under a wide range of undercooling conditions from 35% to 13% below the triple point. It was found that crystal nucleation in these model clusters proceeds initially via a vapor-liquid like aggregation followed by the formation of crystals inside the aggregates. The separation of these two stages of nucleation is distinct except at deeper undercooling conditions where the crystal nucleation barrier was found to diminish. The simulation results obtained for these two nucleation steps are separately compared to the classical nucleation theory (CNT). For the vapor-liquid nucleation step, the CNT was shown to provide a reasonable description of the critical cluster size but overestimate the barrier heights, consistent with previous simulation studies. In contrast, for the crystal nucleation step, nearly perfect agreement with the barrier heights was found between the simulations and the CNT. For the critical cluster size, the comparison is more difficult as the simulation data were found to be sensitive to the definition of the solid cluster, but a stringent criterion and lower undercooling conditions generally lead to results closer to the CNT predictions. Additional simulations at undercooling conditions of 40% or above indicate a nearly barrierless transition from the liquid to crystalline-like structure for sufficiently large clusters, which leads to further departure of the barrier height predicted by the CNT from the simulation data for the aggregation step.
This is consistent with the latest experimental results on argon that show an unusually large underestimation of the nucleation rate by the CNT toward deep undercooling conditions.
Patient identification using a near-infrared laser scanner
NASA Astrophysics Data System (ADS)
Manit, Jirapong; Bremer, Christina; Schweikard, Achim; Ernst, Floris
2017-03-01
We propose a new biometric approach where the tissue thickness of a person's forehead is used as a biometric feature. Given that the spatial registration of two 3D laser scans of the same human face usually produces a low error value, the principle of point cloud registration and its error metric can be applied to human classification techniques. However, by only considering the spatial error, it is not possible to reliably verify a person's identity. We propose to use a novel near-infrared laser-based head tracking system to determine an additional feature, the tissue thickness, and include this in the error metric. Using MRI as a ground truth, data from the foreheads of 30 subjects was collected, from which a 4D reference point cloud was created for each subject. The measurements from the near-infrared system were registered with all reference point clouds using the ICP algorithm. Afterwards, the spatial and tissue thickness errors were extracted, forming a 2D feature space. For all subjects, the lowest feature distance resulted from the registration of a measurement and the reference point cloud of the same person. The combined registration error features yielded two clusters in the feature space, one from the same subject and another from the other subjects. When only the tissue thickness error was considered, these clusters were less distinct but still present. These findings could help to raise safety standards for head and neck cancer patients and lay the foundation for a future human identification technique.
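As a toy illustration of the two-feature idea (spatial residual plus tissue-thickness residual), the sketch below scores a measurement against 4D reference clouds. Registration itself (ICP) is assumed already done, and the equal weighting of the two error components, like all names and data here, is an assumption for illustration only:

```python
import math

def registration_errors(measurement, reference):
    """Mean spatial and tissue-thickness residuals of a measurement
    (list of (x, y, z, thickness) points) against a reference 4D
    point cloud, after registration has already been performed."""
    spatial, tissue = 0.0, 0.0
    for (x, y, z, t) in measurement:
        # nearest reference point in 3D space
        nx, ny, nz, nt = min(reference,
                             key=lambda r: math.dist((x, y, z), r[:3]))
        spatial += math.dist((x, y, z), (nx, ny, nz))
        tissue += abs(t - nt)
    n = len(measurement)
    return spatial / n, tissue / n

def identify(measurement, references):
    """Pick the subject whose reference cloud minimizes the combined
    (spatial + thickness) error; equal weighting is an assumption."""
    def score(name):
        s, t = registration_errors(measurement, references[name])
        return s + t
    return min(references, key=score)

# Two subjects with identical forehead geometry but different tissue
# thickness: the thickness feature is what separates them.
references = {"subj_a": [(0, 0, 0, 1.0), (1, 0, 0, 1.2)],
              "subj_b": [(0, 0, 0, 3.0), (1, 0, 0, 3.4)]}
measurement = [(0.05, 0.0, 0.0, 1.05)]
print(identify(measurement, references))  # subj_a
```

This mirrors the paper's observation that spatial error alone cannot separate similar faces, whereas the added thickness dimension can.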
Delpla, Ianis; Florea, Mihai; Pelletier, Geneviève; Rodriguez, Manuel J
2018-06-04
Trihalomethanes (THMs) and Haloacetic Acids (HAAs) are the main groups detected in drinking water and are consequently strictly regulated. However, the increasing quantity of data for disinfection byproducts (DBPs) produced from research projects and regulatory programs remains largely unexploited, despite a great potential for its use in optimizing drinking water quality monitoring to meet specific objectives. In this work, we developed a procedure to optimize locations and periods for DBPs monitoring based on a set of monitoring scenarios using the cluster analysis technique. The optimization procedure used a robust set of spatio-temporal monitoring results on DBPs (THMs and HAAs) generated from intensive sampling campaigns conducted in a residential sector of a water distribution system. Results show that cluster analysis allows for the classification of water quality in different groups of THMs and HAAs according to their similarities, and the identification of locations presenting water quality concerns. By using cluster analysis with different monitoring objectives, this work provides a set of monitoring solutions and a comparison between various monitoring scenarios for decision-making purposes. Finally, it was demonstrated that the data from intensive monitoring of free chlorine residual and water temperature as DBP proxy parameters, when processed using cluster analysis, could also help identify the optimal sampling points and periods for regulatory THMs and HAAs monitoring. Copyright © 2018 Elsevier Ltd. All rights reserved.
Galaxy cluster center detection methods with weak lensing
NASA Astrophysics Data System (ADS)
Simet, Melanie
The precise location of galaxy cluster centers is a persistent problem in weak lensing mass estimates and in interpretations of clusters in a cosmological context. In this work, we test methods of centroid determination from weak lensing data and examine the effects of such self-calibration on the measured masses. Drawing on lensing data from the Sloan Digital Sky Survey Stripe 82, a 275 square degree region of coadded data in the Southern Galactic Cap, together with a catalog of MaxBCG clusters, we show that halo substructure as well as shape noise and stochasticity in galaxy positions limit the precision of such a self-calibration (in the context of Stripe 82, to ~500 h⁻¹ kpc or larger) and bias the mass estimates around these points to a level that is likely unacceptable for the purposes of making cosmological measurements. We also project the usefulness of this technique in future surveys.
Impact of Neutrinos on Dark Matter Halo Environment
NASA Astrophysics Data System (ADS)
Court, Travis; Villaescusa-Navarro, Francisco
2018-01-01
The spatial clustering of galaxies is commonly used to infer the shape of the matter power spectrum and therefore to place constraints on the value of the cosmological parameters. In order to extract the maximum information from galaxy surveys it is required to provide accurate theoretical predictions. The first step to model galaxy clustering is to understand the spatial distribution of the structures where they reside: dark matter halos. I will show that the clustering of halos does not depend only on mass, but on other quantities like local matter overdensity. I will point out that halo clustering is also sensitive to the local overdensity of the cosmic neutrino background. I will show that splitting halos according to neutrino overdensity induces a very large scale-dependent bias, an effect that may lead to a new technique to constrain the sum of the neutrino masses.
Mai, Xiaofeng; Liu, Jie; Wu, Xiong; Zhang, Qun; Guo, Changjian; Yang, Yanfu; Li, Zhaohui
2017-02-06
A Stokes-space modulation format classification (MFC) technique is proposed for coherent optical receivers by using a non-iterative clustering algorithm. In the clustering algorithm, two simple parameters are calculated to help find the density peaks of the data points in Stokes space and no iteration is required. Correct MFC can be realized in numerical simulations among PM-QPSK, PM-8QAM, PM-16QAM, PM-32QAM and PM-64QAM signals within practical optical signal-to-noise ratio (OSNR) ranges. The performance of the proposed MFC algorithm is also compared with that of other schemes based on clustering algorithms. The simulation results show that good classification performance can be achieved using the proposed MFC scheme with moderate time complexity. Proof-of-concept experiments are finally implemented to demonstrate MFC among PM-QPSK/16QAM/64QAM signals, which confirm the feasibility of our proposed MFC scheme.
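The "two simple parameters" per point, computed without iteration, resemble the local density and distance-to-higher-density pair of generic density-peak clustering. The sketch below illustrates that generic scheme (it is not the authors' exact parameterization, and the 2D points, cutoff d_c, and data are invented for illustration):

```python
import math

def density_peaks(points, d_c, n_peaks):
    """Generic non-iterative density-peak search: rho = neighbour
    count within d_c, delta = distance to the nearest point of higher
    density; candidate cluster centers maximize rho * delta."""
    n = len(points)
    d = [[math.dist(points[i], points[j]) for j in range(n)]
         for i in range(n)]
    rho = [sum(1 for j in range(n) if j != i and d[i][j] < d_c)
           for i in range(n)]
    delta = []
    for i in range(n):
        higher = [d[i][j] for j in range(n) if rho[j] > rho[i]]
        delta.append(min(higher) if higher else max(d[i]))
    ranked = sorted(range(n), key=lambda i: rho[i] * delta[i],
                    reverse=True)
    return ranked[:n_peaks]

# Two blobs standing in for two constellation clusters:
pts = [(0, 0), (0.3, 0), (-0.3, 0), (0, 0.3), (0, -0.3),
       (5, 5), (5.3, 5), (5, 5.3)]
print(density_peaks(pts, d_c=0.5, n_peaks=2))  # one peak per blob
```

Because both parameters come from a single pass over pairwise distances, no iterative refinement of centers is needed, which matches the non-iterative property highlighted in the abstract.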
Monitoring of Building Construction by 4D Change Detection Using Multi-temporal SAR Images
NASA Astrophysics Data System (ADS)
Yang, C. H.; Pang, Y.; Soergel, U.
2017-05-01
Monitoring urban changes is important for city management, urban planning, updating of cadastral maps, etc. In contrast to conventional field surveys, which are usually expensive and slow, remote sensing techniques are fast and cost-effective alternatives. Spaceborne synthetic aperture radar (SAR) sensors provide radar images captured rapidly over vast areas at fine spatiotemporal resolution. In addition, the active microwave sensors are capable of day-and-night vision and independent of weather conditions. These advantages make multi-temporal SAR images suitable for scene monitoring. Persistent scatterer interferometry (PSI) detects and analyses PS points, which are characterized by strong, stable, and coherent radar signals throughout a SAR image sequence and can be regarded as substructures of buildings in built-up cities. Attributes of PS points, for example, deformation velocities, are derived and used for further analysis. Based on PSI, a 4D change detection technique has been developed to detect disappearance and emergence of PS points (3D) at specific times (1D). In this paper, we apply this 4D technique to the centre of Berlin, Germany, to investigate its feasibility and application for construction monitoring. The aims of the three case studies are to monitor construction progress, business districts, and single buildings, respectively. The disappearing and emerging substructures of the buildings are successfully recognized along with their occurrence times. The changed substructures are then clustered into single construction segments based on DBSCAN clustering and α-shape outlining for object-based analysis. Compared with the ground truth, these spatiotemporal results have proven able to provide more detailed information for construction monitoring.
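The DBSCAN step used to group changed PS points into construction segments can be sketched generically (the eps and min_pts values and the toy coordinates below are arbitrary, not taken from the Berlin case studies):

```python
import math
from collections import deque

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: clusters grow from core points having at least
    min_pts neighbours (self included) within eps; -1 labels noise."""
    n = len(points)
    def neighbours(i):
        return [j for j in range(n)
                if math.dist(points[i], points[j]) <= eps]
    labels = [None] * n
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        nbrs = neighbours(i)
        if len(nbrs) < min_pts:
            labels[i] = -1                 # provisional noise
            continue
        labels[i] = cluster
        queue = deque(nbrs)
        while queue:
            j = queue.popleft()
            if labels[j] == -1:
                labels[j] = cluster        # noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nj = neighbours(j)
            if len(nj) >= min_pts:         # j is a core point: keep growing
                queue.extend(nj)
        cluster += 1
    return labels

pts = [(0, 0), (0.5, 0), (1, 0), (10, 0), (10.5, 0), (11, 0), (50, 50)]
print(dbscan(pts, eps=1.5, min_pts=3))  # [0, 0, 0, 1, 1, 1, -1]
```

Isolated changed points fall out as noise (-1), while dense groups of changed substructures form the single construction segments that the α-shape step would then outline.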
Phelps, G.A.
2008-01-01
This report describes some simple spatial statistical methods to explore the relationships of scattered points to geologic or other features, represented by points, lines, or areas. It also describes statistical methods to search for linear trends and clustered patterns within the scattered point data. Scattered points are often contained within irregularly shaped study areas, necessitating the use of methods largely unexplored in the point pattern literature. The methods take advantage of the power of modern GIS toolkits to numerically approximate the null hypothesis of randomly located data within an irregular study area. Observed distributions can then be compared with the null distribution of a set of randomly located points. The methods are non-parametric and are applicable to irregularly shaped study areas. Patterns within the point data are examined by comparing the distribution of the orientation of the set of vectors defined by each pair of points within the data with the equivalent distribution for a random set of points within the study area. A simple model is proposed to describe linear or clustered structure within scattered data. A scattered data set of damage to pavement and pipes, recorded after the 1989 Loma Prieta earthquake, is used as an example to demonstrate the analytical techniques. The damage is found to be preferentially located nearer a set of mapped lineaments than randomly scattered damage, suggesting range-front faulting along the base of the Santa Cruz Mountains is related to both the earthquake damage and the mapped lineaments. The damage also exhibits two non-random patterns: a single cluster of damage centered in the town of Los Gatos, California, and a linear alignment of damage along the range front of the Santa Cruz Mountains, California. The linear alignment of damage is strongest between 45° and 50° northwest. This agrees well with the mean trend of the mapped lineaments, measured as 49° northwest.
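The core idea, comparing the orientations of all point-pair vectors against those of randomly located points, can be sketched as a small Monte Carlo test. This toy version draws the null points from a bounding box rather than the irregular GIS-defined study area the report uses, and all data are invented:

```python
import math
import random

def pair_orientations(points):
    """Azimuths (0-180 degrees) of the vectors defined by every
    pair of points in the data set."""
    az = []
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            dx = points[j][0] - points[i][0]
            dy = points[j][1] - points[i][1]
            az.append(math.degrees(math.atan2(dy, dx)) % 180.0)
    return az

def orientation_excess(points, lo, hi, n_sim=200, seed=1):
    """Fraction of observed pair azimuths falling in [lo, hi) minus
    the mean fraction for randomly located points (the null model).
    A strongly positive value suggests a linear alignment."""
    def frac(pts):
        az = pair_orientations(pts)
        return sum(lo <= a < hi for a in az) / len(az)
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    rng = random.Random(seed)
    null = 0.0
    for _ in range(n_sim):
        rand = [(rng.uniform(min(xs), max(xs)),
                 rng.uniform(min(ys), max(ys))) for _ in points]
        null += frac(rand)
    return frac(points) - null / n_sim

line_pts = [(i, i) for i in range(8)]        # points aligned at 45 degrees
print(orientation_excess(line_pts, 40, 50))  # strongly positive: alignment
```

For points strewn along a 45-degree line, the observed fraction in the 40-50 degree band far exceeds the random-null fraction, the same signal the report detects along the Santa Cruz range front.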
Significant locations in auxiliary data as seeds for typical use cases of point clustering
NASA Astrophysics Data System (ADS)
Kröger, Johannes
2018-05-01
Random greedy clustering and grid-based clustering are highly sensitive to their initial parameters. When used for point data clustering in maps they often change the apparent distribution of the underlying data. We propose a process that uses precomputed weighted seed points for the initialization of clusters, for example from local maxima in population density data. Exemplary results from the clustering of a dataset of petrol stations are presented.
Rigid-Cluster Models of Conformational Transitions in Macromolecular Machines and Assemblies
Kim, Moon K.; Jernigan, Robert L.; Chirikjian, Gregory S.
2005-01-01
We present a rigid-body-based technique (called rigid-cluster elastic network interpolation) to generate feasible transition pathways between two distinct conformations of a macromolecular assembly. Many biological molecules and assemblies consist of domains which act more or less as rigid bodies during large conformational changes. These collective motions are thought to be strongly related with the functions of a system. This fact encourages us to simply model a macromolecule or assembly as a set of rigid bodies which are interconnected with distance constraints. In previous articles, we developed coarse-grained elastic network interpolation (ENI) in which, for example, only Cα atoms are selected as representatives in each residue of a protein. We interpolate distance differences of two conformations in ENI by using a simple quadratic cost function, and the feasible conformations are generated without steric conflicts. Rigid-cluster interpolation is an extension of the ENI method with rigid-clusters replacing point masses. Now the intermediate conformations in an anharmonic pathway can be determined by the translational and rotational displacements of large clusters in such a way that distance constraints are observed. We present the derivation of the rigid-cluster model and apply it to a variety of macromolecular assemblies. Rigid-cluster ENI is then modified for a hybrid model represented by a mixture of rigid clusters and point masses. Simulation results show that both rigid-cluster and hybrid ENI methods generate sterically feasible pathways of large systems in a very short time. For example, the HK97 virus capsid is an icosahedral symmetric assembly composed of 60 identical asymmetric units. Its original Hessian matrix size for a Cα coarse-grained model is >(300,000)². However, it reduces to (84)² when we apply the rigid-cluster model with icosahedral symmetry constraints.
The computational cost of the interpolation no longer scales heavily with the size of structures; instead, it depends strongly on the minimal number of rigid clusters into which the system can be decomposed. PMID:15833998
PID techniques: Alternatives to RICH Methods
DOE Office of Scientific and Technical Information (OSTI.GOV)
Vavra, J.; /SLAC
2011-03-01
In this review article we discuss the recent progress in PID techniques other than the RICH methods. In particular we mention the recent progress in the Transition Radiation Detector (TRD), dE/dx cluster counting, and Time Of Flight (TOF) techniques. The TRD technique is mature and has been tried in many hadron colliders. It needs space, though: about 20 cm of detector radial space for every factor of 10 in the π/e rejection power, and this tends to make such detectors large. Although the cluster counting technique is an old idea, it was never tried in a real physics experiment. Recently, there have been efforts to revive it for the SuperB experiment using He-based gases and waveform digitizing electronics. A factor of almost 2 improvement, compared to the classical dE/dx performance, is possible in principle. However, the complexity of the data analysis will be substantial. The TOF technique is well established, but the introduction of new fast MCP-PMT and G-APD detectors creates new possibilities. It seems that resolutions below 20-30 ps may be possible at some point in the future with relatively small systems, and perhaps this could be pushed down to 10-15 ps with very small systems, assuming that one can solve many systematic issues. However, the cost, rate limitation, aging and cross-talk in multi-anode devices at high BW are problems. There are several groups working on these issues, so progress is likely. Table 6 summarizes the author's opinion of pros and cons of various detectors presented in this paper based on their operational capabilities. We refer the reader to Ref. 40 for discussion of other more general limits from the PID point of view.
A Fast Implementation of the ISOCLUS Algorithm
NASA Technical Reports Server (NTRS)
Memarsadeghi, Nargess; Mount, David M.; Netanyahu, Nathan S.; LeMoigne, Jacqueline
2003-01-01
Unsupervised clustering is a fundamental building block in numerous image processing applications. One of the most popular and widely used clustering schemes for remote sensing applications is the ISOCLUS algorithm, which is based on the ISODATA method. The algorithm is given a set of n data points in d-dimensional space, an integer k indicating the initial number of clusters, and a number of additional parameters. The general goal is to compute the coordinates of a set of cluster centers in d-space, such that those centers minimize the mean squared distance from each data point to its nearest center. This clustering algorithm is similar to another well-known clustering method, called k-means. One significant feature of ISOCLUS over k-means is that the actual number of clusters reported might be fewer or more than the number supplied as part of the input. The algorithm uses different heuristics to determine whether to merge or split clusters. As ISOCLUS can run very slowly, particularly on large data sets, there has been a growing interest in the remote sensing community in computing it efficiently. We have developed a faster implementation of the ISOCLUS algorithm. Our improvement is based on a recent acceleration to the k-means algorithm of Kanungo, et al. They showed that, by using a kd-tree data structure for storing the data, it is possible to reduce the running time of k-means. We have adapted this method for the ISOCLUS algorithm, and we show that it is possible to achieve essentially the same results as ISOCLUS on large data sets, but with significantly lower running times. This adaptation involves computing a number of cluster statistics that are needed for ISOCLUS but not for k-means. Both the k-means and ISOCLUS algorithms are based on iterative schemes, in which nearest neighbors are calculated until some convergence criterion is satisfied. Each iteration requires that the nearest center for each data point be computed.
Naively, this requires O(kn) time, where k denotes the current number of centers. Traditional techniques for accelerating nearest neighbor searching involve storing the k centers in a data structure. However, because of the iterative nature of the algorithm, this data structure would need to be rebuilt with each new iteration. Our approach is to store the data points in a kd-tree data structure. The assignment of points to nearest neighbors is carried out by a filtering process, which successively eliminates centers that cannot possibly be the nearest neighbor for a given region of space. This algorithm is significantly faster, because large groups of data points can be assigned to their nearest center in a single operation. Preliminary results on a number of real Landsat datasets show that our revised ISOCLUS-like scheme runs about twice as fast.
Ardila-Rey, Jorge Alfredo; Rojas-Moreno, Mónica Victoria; Martínez-Tarifa, Juan Manuel; Robles, Guillermo
2014-01-01
Partial discharge (PD) detection is a standardized technique to qualify electrical insulation in machines and power cables. Several techniques that analyze the waveform of the pulses have been proposed to discriminate noise from PD activity. Among them, spectral power ratio representation shows great flexibility in the separation of the sources of PD. Mapping spectral power ratios in two-dimensional plots leads to clusters of points which group pulses with similar characteristics. The position in the map depends on the nature of the partial discharge, the setup and the frequency response of the sensors. If these clusters are clearly separated, the subsequent task of identifying the source of the discharge is straightforward so the distance between clusters can be a figure of merit to suggest the best option for PD recognition. In this paper, two inductive sensors with different frequency responses to pulsed signals, a high frequency current transformer and an inductive loop sensor, are analyzed to test their performance in detecting and separating the sources of partial discharges. PMID:24556674
Multiwavelength study of Chandra X-ray sources in the Antennae
NASA Astrophysics Data System (ADS)
Clark, D. M.; Eikenberry, S. S.; Brandl, B. R.; Wilson, J. C.; Carson, J. C.; Henderson, C. P.; Hayward, T. L.; Barry, D. J.; Ptak, A. F.; Colbert, E. J. M.
2011-01-01
We use Wide-field InfraRed Camera (WIRC) infrared (IR) images of the Antennae (NGC 4038/4039) together with the extensive catalogue of 120 X-ray point sources to search for counterpart candidates. Using our proven frame-tie technique, we find 38 X-ray sources with IR counterparts, almost doubling the number of IR counterparts to X-ray sources that we first identified. In our photometric analysis, we consider the 35 IR counterparts that are confirmed star clusters. We show that the clusters with X-ray sources tend to be brighter, Ks ≈ 16 mag, with (J-Ks) = 1.1 mag. We then use archival Hubble Space Telescope (HST) images of the Antennae to search for optical counterparts to the X-ray point sources. We employ our previous IR-to-X-ray frame-tie as an intermediary to establish a precise optical-to-X-ray frame-tie with <0.6 arcsec rms positional uncertainty. Due to the high optical source density near the X-ray sources, we determine that we cannot reliably identify counterparts. Comparing the HST positions to the 35 identified IR star cluster counterparts, we find optical matches for 27 of these sources. Using Bruzual-Charlot spectral evolutionary models, we find that most clusters associated with an X-ray source are massive and young, ~10⁶ yr.
Statistical Analysis of Large Scale Structure by the Discrete Wavelet Transform
NASA Astrophysics Data System (ADS)
Pando, Jesus
1997-10-01
The discrete wavelet transform (DWT) is developed as a general statistical tool for the study of large scale structures (LSS) in astrophysics. The DWT is used in all aspects of structure identification including cluster analysis, spectrum and two-point correlation studies, scale-scale correlation analysis and to measure deviations from Gaussian behavior. The techniques developed are demonstrated on 'academic' signals, on simulated models of the Lymanα (Lyα) forests, and on observational data of the Lyα forests. This technique can detect clustering in the Ly-α clouds where traditional techniques such as the two-point correlation function have failed. The position and strength of these clusters in both real and simulated data is determined and it is shown that clusters exist on scales as large as at least 20 h⁻¹ Mpc at significance levels of 2-4 σ. Furthermore, it is found that the strength distribution of the clusters can be used to distinguish between real data and simulated samples even where other traditional methods have failed to detect differences. Second, a method for measuring the power spectrum of a density field using the DWT is developed. All common features determined by the usual Fourier power spectrum can be calculated by the DWT. These features, such as the index of a power law or typical scales, can be detected even when the samples are geometrically complex, the samples are incomplete, or the mean density on larger scales is not known (the infrared uncertainty). Using this method the spectra of Ly-α forests in both simulated and real samples is calculated. Third, a method for measuring hierarchical clustering is introduced. Because hierarchical evolution is characterized by a set of rules of how larger dark matter halos are formed by the merging of smaller halos, scale-scale correlations of the density field should be one of the most sensitive quantities in determining the merging history.
We show that these correlations can be completely determined by the correlations between discrete wavelet coefficients on adjacent scales and at nearly the same spatial position, C_{j,j+1}^{2·2}. Scale-scale correlations on two samples of the QSO Ly-α forests absorption spectra are computed. Lastly, higher order statistics are developed to detect deviations from Gaussian behavior. These higher order statistics are necessary to fully characterize the Ly-α forests because the usual 2nd order statistics, such as the two-point correlation function or power spectrum, give inconclusive results. It is shown how this technique takes advantage of the locality of the DWT to circumvent the central limit theorem. A non-Gaussian spectrum is defined and this spectrum reveals not only the magnitude, but the scales of non-Gaussianity. When applied to simulated and observational samples of the Ly-α clouds, it is found that different popular models of structure formation have different spectra while two, independent observational data sets, have the same spectra. Moreover, the non-Gaussian spectra of real data sets are significantly different from the spectra of various possible random samples. (Abstract shortened by UMI.)
Analyzing coastal environments by means of functional data analysis
NASA Astrophysics Data System (ADS)
Sierra, Carlos; Flor-Blanco, Germán; Ordoñez, Celestino; Flor, Germán; Gallego, José R.
2017-07-01
Here we used Functional Data Analysis (FDA) to examine particle-size distributions (PSDs) in a beach/shallow marine sedimentary environment in Gijón Bay (NW Spain). The work involved both Functional Principal Components Analysis (FPCA) and Functional Cluster Analysis (FCA). The grain size of the sand samples was characterized by means of laser dispersion spectroscopy. Within this framework, FPCA was used as a dimension reduction technique to explore and uncover patterns in grain-size frequency curves. This procedure proved useful to describe variability in the structure of the data set. Moreover, an alternative approach, FCA, was applied to identify clusters and to interpret their spatial distribution. Results obtained with this latter technique were compared with those obtained by means of two vector approaches that combine PCA with CA (Cluster Analysis). The first method, the point density function (PDF), was employed after adapting a log-normal distribution to each PSD and summarizing each of the density functions by its mean, sorting, skewness and kurtosis. The second applied a centered log-ratio (clr) transformation to the original data. PCA was then applied to the transformed data, and finally CA to the retained principal component scores. The study revealed functional data analysis, specifically FPCA and FCA, as a suitable alternative with considerable advantages over traditional vector analysis techniques in sedimentary geology studies.
The composite sequential clustering technique for analysis of multispectral scanner data
NASA Technical Reports Server (NTRS)
Su, M. Y.
1972-01-01
The clustering technique consists of two parts: (1) a sequential statistical clustering which is essentially a sequential variance analysis, and (2) a generalized K-means clustering. In this composite clustering technique, the output of (1) is a set of initial clusters which are input to (2) for further improvement by an iterative scheme. This unsupervised composite technique was employed for automatic classification of two sets of remote multispectral earth resource observations. The classification accuracy by the unsupervised technique is found to be comparable to that by traditional supervised maximum likelihood classification techniques. The mathematical algorithms for the composite sequential clustering program and a detailed computer program description with job setup are given.
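The second stage, generalized K-means refinement of the initial clusters produced by the sequential pass, can be sketched as ordinary iterative center refinement seeded with those clusters' centers. This is a generic illustration (the data and two-cluster seeding are invented), not the 1972 program itself:

```python
import math

def kmeans_refine(points, centers, iters=20):
    """Iteratively refine initial cluster centers (e.g. the output of
    a sequential variance-analysis pass): assign each point to its
    nearest center, recompute centers as group means, repeat until
    the centers stop moving."""
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            k = min(range(len(centers)),
                    key=lambda c: math.dist(p, centers[c]))
            groups[k].append(p)
        new = [tuple(sum(coord) / len(g) for coord in zip(*g)) if g else c
               for g, c in zip(groups, centers)]
        if new == centers:   # converged
            break
        centers = new
    return centers

pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
       (9.0, 9.0), (9.0, 10.0), (10.0, 9.0)]
print(kmeans_refine(pts, [(0.0, 0.0), (9.0, 9.0)]))
# centers converge near (1/3, 1/3) and (28/3, 28/3)
```

Seeding with the output of stage (1) rather than random centers is what makes the composite scheme less dependent on initialization, which matches the abstract's description of feeding the sequential clusters into the iterative scheme.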
Motion estimation accuracy for visible-light/gamma-ray imaging fusion for portable portal monitoring
NASA Astrophysics Data System (ADS)
Karnowski, Thomas P.; Cunningham, Mark F.; Goddard, James S.; Cheriyadat, Anil M.; Hornback, Donald E.; Fabris, Lorenzo; Kerekes, Ryan A.; Ziock, Klaus-Peter; Gee, Timothy F.
2010-01-01
The use of radiation sensors as portal monitors is increasing due to heightened concerns over the smuggling of fissile material. Portable systems that can detect significant quantities of fissile material that might be present in vehicular traffic are of particular interest. We have constructed a prototype rapid-deployment gamma-ray imaging portal monitor that uses machine vision and gamma-ray imaging to monitor multiple lanes of traffic. Vehicles are detected and tracked by using point detection and optical flow methods as implemented in the OpenCV software library. Points are clustered together, but imperfections in the detected points and tracks cause errors in the accuracy of the vehicle position estimates. The resulting errors cause a "blurring" effect in the gamma image of the vehicle. To minimize these errors, we have compared a variety of motion estimation techniques including an estimate using the median of the clustered points, a "best-track" filtering algorithm, and a constant velocity motion estimation model. The accuracy of these methods is quantified by the root-mean-square differences in the times at which vehicles cross the gamma-ray image pixel boundaries, compared with a manually verified ground-truth measurement.
Privacy protection versus cluster detection in spatial epidemiology.
Olson, Karen L; Grannis, Shaun J; Mandl, Kenneth D
2006-11-01
Patient data that includes precise locations can reveal patients' identities, whereas data aggregated into administrative regions may preserve privacy and confidentiality. We investigated the effect of varying degrees of address precision (exact latitude and longitude vs the center points of zip code or census tracts) on detection of spatial clusters of cases. We simulated disease outbreaks by adding supplementary spatially clustered emergency department visits to authentic hospital emergency department syndromic surveillance data. We identified clusters with a spatial scan statistic and evaluated detection rate and accuracy. More clusters were identified, and clusters were more accurately detected, when exact locations were used. That is, these clusters contained at least half of the simulated points and involved few additional emergency department visits. These results were especially apparent when the synthetic clustered points crossed administrative boundaries and fell into multiple zip code or census tracts. The spatial cluster detection algorithm performed better when addresses were analyzed as exact locations than when they were analyzed as center points of zip code or census tracts, particularly when the clustered points crossed administrative boundaries. Use of precise addresses offers improved performance, but this practice must be weighed against privacy concerns in the establishment of public health data exchange policies.
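A toy numerical sketch of this trade-off, assuming a crude fixed-radius scan window as a stand-in for the spatial scan statistic and synthetic coordinates rather than real surveillance data. The outbreak is placed so that it straddles several coarse "administrative" cells, the situation in which the abstract reports the largest degradation.

```python
import numpy as np

rng = np.random.default_rng(2)

background = rng.uniform(0, 10, (300, 2))          # baseline ED visits
outbreak = rng.normal([6.0, 6.0], 0.3, (30, 2))    # tight synthetic cluster
points = np.vstack([background, outbreak])

def scan_score(pts, radius=0.5):
    """Crude stand-in for a scan statistic: slide a fixed-radius circle
    over every point and compare the densest window to the median window."""
    d2 = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(-1)
    counts = (d2 <= radius ** 2).sum(1)
    return float(counts.max() / np.median(counts))

def snap_to_centroids(pts, cell=2.0):
    """Replace exact coordinates with the centre points of coarse cells,
    mimicking aggregation to administrative regions."""
    return (np.floor(pts / cell) + 0.5) * cell

exact_score = scan_score(points)
snapped_score = scan_score(snap_to_centroids(points))
```

With exact coordinates the densest window stands far above the background, while snapping to cell centroids splits the cluster across four cells and buries it in the per-cell baseline counts.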
An extended affinity propagation clustering method based on different data density types.
Zhao, XiuLi; Xu, WeiXiang
2015-01-01
The affinity propagation (AP) algorithm, a novel clustering method, does not require users to specify initial cluster centers in advance; it regards all data points equally as potential exemplars (cluster centers) and forms clusters purely according to the degree of similarity among the data points. In many cases, however, areas of different density exist within the same data set, meaning that the data are not distributed homogeneously. In such situations the AP algorithm cannot group the data points into ideal clusters. In this paper, we propose an extended AP clustering algorithm to deal with this problem. Our method has two steps: first, the data set is partitioned into several data density types according to the nearest-neighbor distance of each data point; then the AP clustering method is used to group the data points into clusters within each density type separately. Two experiments were carried out to evaluate the performance of our algorithm: one uses an artificial data set and the other a real seismic data set. The experimental results show that our algorithm recovers groups more accurately than both OPTICS and the AP clustering algorithm itself.
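A minimal numpy sketch of the two-step idea, with a bare-bones affinity propagation in the style of the standard responsibility/availability message passing. The data, damping factor, and density threshold are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def affinity_propagation(X, damping=0.7, iters=200):
    """Bare-bones AP: message passing with the preference (self-similarity)
    set to the median similarity."""
    n = len(X)
    S = -((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # similarities
    S[np.arange(n), np.arange(n)] = np.median(S)          # preferences
    R = np.zeros((n, n))
    A = np.zeros((n, n))
    for _ in range(iters):
        AS = A + S                                        # responsibilities
        idx = AS.argmax(1)
        first = AS[np.arange(n), idx]
        AS[np.arange(n), idx] = -np.inf
        second = AS.max(1)
        Rnew = S - first[:, None]
        Rnew[np.arange(n), idx] = S[np.arange(n), idx] - second
        R = damping * R + (1 - damping) * Rnew
        Rp = np.maximum(R, 0)                             # availabilities
        Rp[np.arange(n), np.arange(n)] = R.diagonal()
        Anew = np.minimum(0, Rp.sum(0)[None, :] - Rp)
        Anew[np.arange(n), np.arange(n)] = np.maximum(R, 0).sum(0) - np.maximum(R.diagonal(), 0)
        A = damping * A + (1 - damping) * Anew
    diag = (A + R).diagonal()
    exemplars = np.where(diag > 0)[0]
    if len(exemplars) == 0:
        exemplars = np.array([int(diag.argmax())])
    labels = exemplars[S[:, exemplars].argmax(1)]
    labels[exemplars] = exemplars
    return labels

rng = np.random.default_rng(0)
dense = rng.normal(0.0, 0.2, (50, 2))      # high-density region
sparse = rng.normal(8.0, 1.5, (20, 2))     # low-density region
X = np.vstack([dense, sparse])

# Step 1: split into density types via nearest-neighbour distances.
d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
np.fill_diagonal(d, np.inf)
nn = d.min(1)
dense_mask = nn < nn.mean()                # illustrative two-type threshold

# Step 2: run AP separately within each density type.
all_labels = np.empty(len(X), dtype=int)
offset = 0
for mask in (dense_mask, ~dense_mask):
    labs = affinity_propagation(X[mask])
    remap = {e: j for j, e in enumerate(sorted(set(labs.tolist())))}
    all_labels[mask] = [remap[l] + offset for l in labs]
    offset += len(remap)
```

Because each density type is clustered on its own, the exemplar preference adapts to the local similarity scale instead of being dominated by the dense region, which is the failure mode the extended method targets.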
Comparison of 3D point clouds produced by LIDAR and UAV photoscan in the Rochefort cave (Belgium)
NASA Astrophysics Data System (ADS)
Watlet, Arnaud; Triantafyllou, Antoine; Kaufmann, Olivier; Le Mouelic, Stéphane
2016-04-01
Amongst today's techniques able to produce 3D point clouds, LIDAR and UAV (Unmanned Aerial Vehicle) photogrammetry are probably the most commonly used. Both methods have their own advantages and limitations. LIDAR scans create high-resolution, high-precision 3D point clouds, but such methods are generally costly, especially for sporadic surveys. Compared to LIDAR, UAVs (e.g. drones) are cheap and flexible to use in different kinds of environments. Moreover, the photogrammetric processing of digital images taken with UAVs has become easier with the rise of many affordable software packages (e.g. Agisoft, PhotoModeler3D, VisualSFM). We present here a challenging study made at the Rochefort Cave Laboratory (South Belgium) comprising surface and underground surveys. The site is located in the Belgian Variscan fold-and-thrust belt, a region that shows many karstic networks within Devonian limestone units. A LIDAR scan was acquired in the main chamber of the cave (~ 15000 m³) to produce a 3D point cloud of its inner walls and to infer geological beds and structures. Even though using the LIDAR instrument was not entirely practical in such a caving environment, the collected data showed remarkable precision when checked against the geometry of a few control points. We also performed another challenging survey of the same cave chamber by building a 3D point cloud through photogrammetry of DSLR camera pictures taken from the ground together with UAV pictures. The aim was to compare both techniques in terms of (i) implementation of data acquisition and processing, (ii) quality of the resulting 3D point clouds (point density, field vs. cloud recovery, and point precision), and (iii) their application for geological purposes. Through the Rochefort case study, the main conclusions are that the LIDAR technique provides higher-density point clouds with slightly higher precision than the photogrammetry method.
However, 3D data modeled by photogrammetry provide visible-light spectral information for each modeled voxel and interpolated vertex, which can be a useful attribute for clustering during data treatment. We illustrate such applications at the Rochefort cave by using both sources of 3D information to quantify the orientation of inaccessible geological structures (e.g. faults, tectonic and gravitational joints, and sediment bedding), cluster these structures using color information gathered from the UAV's 3D point cloud, and compare these data to structural data surveyed in the field. An additional drone photoscan was also conducted in the surface sinkhole giving access to the surveyed underground cavity, to investigate connections between geological bodies.
An unsupervised classification technique for multispectral remote sensing data.
NASA Technical Reports Server (NTRS)
Su, M. Y.; Cummings, R. E.
1973-01-01
Description of a two-part clustering technique consisting of (a) a sequential statistical clustering, which is essentially a sequential variance analysis, and (b) a generalized K-means clustering. In this composite clustering technique, the output of (a) is a set of initial clusters which are input to (b) for further improvement by an iterative scheme. This unsupervised composite technique was employed for automatic classification of two sets of remote multispectral earth resource observations. The classification accuracy by the unsupervised technique is found to be comparable to that by traditional supervised maximum-likelihood classification techniques.
The application of automatic recognition techniques in the Apollo 9 SO-65 experiment
NASA Technical Reports Server (NTRS)
Macdonald, R. B.
1970-01-01
A synoptic feature analysis is reported on Apollo 9 remote earth surface photographs that uses the methods of statistical pattern recognition to classify density points and clusters in digitally converted optical data. A computer-derived map of a geological test site indicates that geological features of the range are separable, but that specific rock types are not identifiable.
NASA Astrophysics Data System (ADS)
Salimi, F.; Ristovski, Z.; Mazaheri, M.; Laiman, R.; Crilley, L. R.; He, C.; Clifford, S.; Morawska, L.
2014-06-01
Long-term measurements of particle number size distribution (PNSD) produce a very large number of observations and their analysis requires an efficient approach in order to produce results in the least possible time and with maximum accuracy. Clustering techniques are a family of sophisticated methods which have been recently employed to analyse PNSD data; however, very little information is available comparing the performance of different clustering techniques on PNSD data. This study aims to apply several clustering techniques (i.e. K-means, PAM, CLARA and SOM) to PNSD data, in order to identify and apply the optimum technique to PNSD data measured at 25 sites across Brisbane, Australia. A new method, based on the Generalised Additive Model (GAM) with a basis of penalised B-splines, was proposed to parameterise the PNSD data and the temporal weight of each cluster was also estimated using the GAM. In addition, each cluster was associated with its possible source based on the results of this parameterisation, together with the characteristics of each cluster. The performances of four clustering techniques were compared using the Dunn index and silhouette width validation values and the K-means technique was found to have the highest performance, with five clusters being the optimum. Therefore, five clusters were found within the data using the K-means technique. The diurnal occurrence of each cluster was used together with other air quality parameters, temporal trends and the physical properties of each cluster, in order to attribute each cluster to its source and origin. The five clusters were attributed to three major sources and origins, including regional background particles, photochemically induced nucleated particles and vehicle generated particles. Overall, clustering was found to be an effective technique for attributing each particle size spectrum to its source and the GAM was suitable to parameterise the PNSD data. 
These two techniques can help researchers immensely in analysing PNSD data for characterisation and source apportionment purposes.
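The model-selection step above, scoring candidate K-means partitions with the average silhouette width, can be sketched as follows. The data are synthetic stand-ins for parameterised size-distribution observations, not real PNSD measurements, and the cluster counts and seeds are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)

# Three compact groups in a 2-D feature space (illustrative only).
X = np.vstack([rng.normal(c, 0.5, (30, 2)) for c in ([0, 0], [10, 0], [0, 10])])
D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))

def kmeans(X, k, seed):
    r = np.random.default_rng(seed)
    centers = X[r.choice(len(X), k, replace=False)]
    for _ in range(50):
        labels = ((X[:, None] - centers[None]) ** 2).sum(-1).argmin(1)
        centers = np.array([X[labels == j].mean(0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
    return labels

def mean_silhouette(labels):
    """Average silhouette width: (b - a) / max(a, b) per point."""
    if len(np.unique(labels)) < 2:
        return -1.0
    s = []
    for i in range(len(X)):
        same = labels == labels[i]
        same[i] = False
        a = D[i, same].mean() if same.any() else 0.0
        b = min(D[i, labels == k].mean() for k in np.unique(labels) if k != labels[i])
        s.append((b - a) / max(a, b))
    return float(np.mean(s))

# Best-of-5 restarts per candidate k, scored by silhouette width.
scores = {k: max(mean_silhouette(kmeans(X, k, seed)) for seed in range(5))
          for k in (2, 3, 4, 5)}
best_k = max(scores, key=scores.get)
```

On clearly separated groups the silhouette peaks at the true cluster count, which is the role the Dunn index and silhouette width play in the study above.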
NASA Astrophysics Data System (ADS)
Salimi, F.; Ristovski, Z.; Mazaheri, M.; Laiman, R.; Crilley, L. R.; He, C.; Clifford, S.; Morawska, L.
2014-11-01
Long-term measurements of particle number size distribution (PNSD) produce a very large number of observations and their analysis requires an efficient approach in order to produce results in the least possible time and with maximum accuracy. Clustering techniques are a family of sophisticated methods that have been recently employed to analyse PNSD data; however, very little information is available comparing the performance of different clustering techniques on PNSD data. This study aims to apply several clustering techniques (i.e. K means, PAM, CLARA and SOM) to PNSD data, in order to identify and apply the optimum technique to PNSD data measured at 25 sites across Brisbane, Australia. A new method, based on the Generalised Additive Model (GAM) with a basis of penalised B-splines, was proposed to parameterise the PNSD data and the temporal weight of each cluster was also estimated using the GAM. In addition, each cluster was associated with its possible source based on the results of this parameterisation, together with the characteristics of each cluster. The performances of four clustering techniques were compared using the Dunn index and Silhouette width validation values and the K means technique was found to have the highest performance, with five clusters being the optimum. Therefore, five clusters were found within the data using the K means technique. The diurnal occurrence of each cluster was used together with other air quality parameters, temporal trends and the physical properties of each cluster, in order to attribute each cluster to its source and origin. The five clusters were attributed to three major sources and origins, including regional background particles, photochemically induced nucleated particles and vehicle generated particles. Overall, clustering was found to be an effective technique for attributing each particle size spectrum to its source and the GAM was suitable to parameterise the PNSD data. 
These two techniques can help researchers immensely in analysing PNSD data for characterisation and source apportionment purposes.
NASA Astrophysics Data System (ADS)
Suri, Veenu; Meyer, Michael; Greenbaum, Alexandra Z.; Bell, Cameron; Beichman, Charles; Gordon, Karl D.; Greene, Thomas P.; Hodapp, K.; Horner, Scott; Johnstone, Doug; Leisenring, Jarron; Manara, Carlos; Mann, Rita; Misselt, K.; Raileanu, Roberta; Rieke, Marcia; Roellig, Thomas
2018-01-01
We describe observations of the embedded young cluster associated with the HII region NGC 2024 planned as part of the guaranteed time observing program for the James Webb Space Telescope with the NIRCam (Near Infrared Camera) instrument. Our goal is to obtain a census of the cluster down to 2 Jupiter masses, viewed through 10-20 magnitudes of extinction, using multi-band filter photometry, both broadband filters and intermediate band filters that are expected to be sensitive to temperature and surface gravity. The cluster contains several bright point sources as well as extended emission due to reflected light, thermal emission from warm dust, and nebular line emission. We first developed techniques to better understand which point sources would saturate in our target fields when viewed through several JWST NIRCam filters. Using images of the field taken with the WISE satellite in filters W1 and W2, as well as 2MASS (J and H) bands, we devised an algorithm that takes the K-band magnitudes of point sources in the field, and the known saturation limits of several NIRCam filters, to estimate the impact of the extended emission on survey sensitivity. We provide an overview of our anticipated results, detecting the low mass end of the IMF as well as planetary mass objects likely liberated through dynamical interactions.
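The magnitude-based saturation screening can be illustrated in a few lines of Python. The K-band magnitudes and per-filter limits below are placeholders, not real NIRCam saturation limits.

```python
import numpy as np

# Hypothetical K-band magnitudes for point sources in the target field
# (e.g. matched from 2MASS); values are illustrative only.
k_mag = np.array([4.2, 7.8, 9.5, 12.1, 15.3])

# Assumed per-filter saturation limits, expressed as the faintest K-band
# magnitude that would still saturate the detector. These numbers are
# placeholders, not real NIRCam limits.
sat_limit_k = {"F115W": 10.0, "F200W": 9.0, "F360M": 6.5}

# Indices of sources flagged as saturating in each filter.
saturating = {f: np.where(k_mag < lim)[0] for f, lim in sat_limit_k.items()}
n_saturating = {f: len(idx) for f, idx in saturating.items()}
```

Such a table of flagged sources is what would feed the downstream estimate of how saturated stars and extended emission reduce the usable survey area.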
Clustering cancer gene expression data by projective clustering ensemble
Yu, Xianxue; Yu, Guoxian
2017-01-01
Gene expression data analysis has paramount implications for gene treatments, cancer diagnosis and other domains. Clustering is an important and promising tool for analyzing gene expression data. Gene expression data are often characterized by a large number of genes but a limited number of samples, so various projective clustering techniques and ensemble techniques have been suggested to combat these challenges. However, it is rather challenging to synergize these two kinds of techniques so as to avoid the curse of dimensionality and boost the performance of gene expression data clustering. In this paper, we employ a projective clustering ensemble (PCE) to integrate the advantages of projective clustering and ensemble clustering, and to avoid the dilemma of combining multiple projective clusterings. Our experimental results on publicly available cancer gene expression data show that PCE can improve the quality of clustering gene expression data by at least 4.5% (on average) compared with other related techniques, including dimensionality-reduction-based single clustering and ensemble approaches. The empirical study demonstrates that, to further boost the performance of clustering cancer gene expression data, it is necessary and promising to synergize projective clustering with ensemble clustering. PCE can serve as an effective alternative technique for clustering gene expression data. PMID:28234920
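A minimal sketch of the projective-clustering-ensemble idea (not the authors' PCE algorithm): k-means is run on random gene subspaces, the runs are combined in a co-association matrix, and a consensus partition is read off its thresholded connected components. All data and parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)

# Synthetic "expression matrix": 40 samples x 60 genes, two sample classes
# separated on 30 informative genes, the rest pure noise.
n_per, n_genes, n_inf = 20, 60, 30
X = rng.normal(0, 1, (2 * n_per, n_genes))
X[n_per:, :n_inf] += 3.0

def kmeans(X, k, rng, iters=50):
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = ((X[:, None] - centers[None]) ** 2).sum(-1).argmin(1)
        centers = np.array([X[labels == j].mean(0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
    return labels

# Ensemble of projective clusterings: k-means on random gene subspaces.
n_runs = 15
co = np.zeros((len(X), len(X)))
for _ in range(n_runs):
    genes = rng.choice(n_genes, 15, replace=False)   # random subspace
    lab = kmeans(X[:, genes], 2, rng)
    co += lab[:, None] == lab[None, :]
co /= n_runs                                         # co-association matrix

# Consensus: connected components of the thresholded co-association graph.
adj = co > 0.6
consensus = -np.ones(len(X), dtype=int)
cid = 0
for i in range(len(X)):
    if consensus[i] < 0:
        stack = [i]
        while stack:
            j = stack.pop()
            if consensus[j] < 0:
                consensus[j] = cid
                stack.extend(np.where(adj[j])[0].tolist())
        cid += 1
```

Averaging over subspaces lets pairs of samples that co-cluster in most projections dominate the consensus, so an occasional bad subspace (one that drew mostly noise genes) does not corrupt the final partition.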
NASA Astrophysics Data System (ADS)
Abdullah, Mohamed H.; Wilson, Gillian; Klypin, Anatoly
2018-07-01
We introduce GalWeight, a new technique for assigning galaxy cluster membership. This technique is specifically designed to simultaneously maximize the number of bona fide cluster members while minimizing the number of contaminating interlopers. The GalWeight technique can be applied to both massive galaxy clusters and poor galaxy groups. Moreover, it is effective in identifying members in both the virial and infall regions with high efficiency. We apply the GalWeight technique to MDPL2 and Bolshoi N-body simulations, and find that it is >98% accurate in correctly assigning cluster membership. We show that GalWeight compares very favorably against four well-known existing cluster membership techniques (shifting gapper, den Hartog, caustic, SIM). We also apply the GalWeight technique to a sample of 12 Abell clusters (including the Coma cluster) using observations from the Sloan Digital Sky Survey. We conclude by discussing GalWeight’s potential for other astrophysical applications.
Inference from clustering with application to gene-expression microarrays.
Dougherty, Edward R; Barrera, Junior; Brun, Marcel; Kim, Seungchan; Cesar, Roberto M; Chen, Yidong; Bittner, Michael; Trent, Jeffrey M
2002-01-01
There are many algorithms to cluster sample data points based on nearness or a similarity measure. Often the implication is that points in different clusters come from different underlying classes, whereas those in the same cluster come from the same class. Stochastically, the underlying classes represent different random processes. The inference is that clusters represent a partition of the sample points according to the process to which they belong. This paper discusses a model-based clustering toolbox that evaluates cluster accuracy. Each random process is modeled as its mean plus independent noise, sample points are generated, the points are clustered, and the clustering error is the number of points clustered incorrectly according to the generating random processes. Various clustering algorithms are evaluated based on process variance and the key issue of the rate at which algorithmic performance improves with increasing numbers of experimental replications. The model means can be selected by hand to test the separability of expected types of biological expression patterns. Alternatively, the model can be seeded by real data to test the expected precision of that output or the extent of improvement in precision that replication could provide. In the latter case, a clustering algorithm is used to form clusters, and the model is seeded with the means and variances of these clusters. Other algorithms are then tested relative to the seeding algorithm. Results are averaged over various seeds. Output includes error tables and graphs, confusion matrices, principal-component plots, and validation measures. Five algorithms are studied in detail: K-means, fuzzy C-means, self-organizing maps, and hierarchical clustering based on Euclidean distance and on correlation. The toolbox is applied to gene-expression clustering based on cDNA microarrays using real data. Expression profile graphics are generated and error analysis is displayed within the context of these profile graphics. 
A large amount of generated output is available over the web.
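The toolbox's error measure can be sketched directly: sample from two mean-plus-noise processes, cluster, and count the misclustered points under the best cluster-to-process label permutation. The means, noise level, and seeding scheme below are illustrative assumptions.

```python
import numpy as np
from itertools import permutations

rng = np.random.default_rng(5)

# Two model "expression profiles" (process means) plus independent noise.
means = np.array([[0.0, 0.0, 1.0, 1.0],
                  [1.0, 1.0, 0.0, 0.0]])
truth = np.repeat([0, 1], 25)
X = means[truth] + rng.normal(0, 0.3, (50, 4))

def kmeans(X, k, iters=50):
    # Seed with one point from each generating process to keep the sketch
    # deterministic; real runs would use several random restarts.
    centers = X[[0, 25]].copy()
    for _ in range(iters):
        labels = ((X[:, None] - centers[None]) ** 2).sum(-1).argmin(1)
        centers = np.array([X[labels == j].mean(0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
    return labels

labels = kmeans(X, 2)

# Clustering error: misclustered points under the best label permutation.
error = min(int((np.array(perm)[labels] != truth).sum())
            for perm in permutations(range(2)))
```

Repeating this over replications and noise levels gives exactly the error-versus-variance curves the toolbox tabulates for each algorithm.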
Nagwani, Naresh Kumar; Deo, Shirish V
2014-01-01
Understanding the compressive strength of concrete is important for activities such as construction scheduling, prestressing operations, proportioning new mixtures, and quality assurance. Regression techniques are most widely used for prediction tasks, where the relationship between the independent variables and the dependent (predicted) variable is identified. The accuracy of regression-based prediction can be improved if clustering is used along with regression, since clustering ensures more accurate curve fitting between the dependent and independent variables. In this work, a cluster-regression technique is applied to estimate the compressive strength of concrete, and a novel state-of-the-art method is proposed for predicting it. The objective of this work is to demonstrate that clustering along with regression yields smaller prediction errors when estimating concrete compressive strength. The proposed technique consists of two major stages: in the first stage, clustering is used to group concrete data with similar characteristics, and in the second stage regression techniques are applied over these clusters (groups) to predict the compressive strength within each cluster. Experiments show that clustering combined with regression gives minimum errors for predicting the compressive strength of concrete; furthermore, the fuzzy C-means clustering algorithm performs better than the K-means algorithm. PMID:25374939
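A compact sketch of the two-stage cluster-then-regress idea on illustrative synthetic data (not the paper's concrete dataset): a single global line fits a two-regime relationship poorly, while per-cluster lines fit each regime well.

```python
import numpy as np

rng = np.random.default_rng(6)

# Synthetic mix data: water/cement ratio -> strength, with two regimes
# (e.g. different aggregate types) that one line cannot fit well.
x1 = rng.uniform(0.3, 0.6, 60); y1 = 120 - 90 * x1 + rng.normal(0, 2, 60)
x2 = rng.uniform(0.3, 0.6, 60); y2 = 50 - 30 * x2 + rng.normal(0, 2, 60)
x = np.concatenate([x1, x2]); y = np.concatenate([y1, y2])
pts = np.column_stack([x, y])

def rmse(pred, obs):
    return float(np.sqrt(np.mean((pred - obs) ** 2)))

# Stage 0 baseline: one global regression line.
b1, b0 = np.polyfit(x, y, 1)
err_global = rmse(b1 * x + b0, y)

# Stage 1: 2-means on (x, y), seeded with one point from each regime.
centers = pts[[0, 60]].copy()
for _ in range(50):
    labels = ((pts[:, None] - centers[None]) ** 2).sum(-1).argmin(1)
    centers = np.array([pts[labels == j].mean(0) if np.any(labels == j)
                        else centers[j] for j in range(2)])

# Stage 2: one regression line per cluster.
preds = np.empty_like(y)
for j in range(2):
    m = labels == j
    c1, c0 = np.polyfit(x[m], y[m], 1)
    preds[m] = c1 * x[m] + c0
err_cluster = rmse(preds, y)
```

The per-cluster error collapses to the noise level while the global fit retains the between-regime bias, which is the effect the paper exploits.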
Privacy Protection Versus Cluster Detection in Spatial Epidemiology
Olson, Karen L.; Grannis, Shaun J.; Mandl, Kenneth D.
2006-01-01
Objectives. Patient data that includes precise locations can reveal patients’ identities, whereas data aggregated into administrative regions may preserve privacy and confidentiality. We investigated the effect of varying degrees of address precision (exact latitude and longitude vs the center points of zip code or census tracts) on detection of spatial clusters of cases. Methods. We simulated disease outbreaks by adding supplementary spatially clustered emergency department visits to authentic hospital emergency department syndromic surveillance data. We identified clusters with a spatial scan statistic and evaluated detection rate and accuracy. Results. More clusters were identified, and clusters were more accurately detected, when exact locations were used. That is, these clusters contained at least half of the simulated points and involved few additional emergency department visits. These results were especially apparent when the synthetic clustered points crossed administrative boundaries and fell into multiple zip code or census tracts. Conclusions. The spatial cluster detection algorithm performed better when addresses were analyzed as exact locations than when they were analyzed as center points of zip code or census tracts, particularly when the clustered points crossed administrative boundaries. Use of precise addresses offers improved performance, but this practice must be weighed against privacy concerns in the establishment of public health data exchange policies. PMID:17018828
Traveling-cluster approximation for uncorrelated amorphous systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sen, A.K.; Mills, R.; Kaplan, T.
1984-11-15
We have developed a formalism for including cluster effects in the one-electron Green's function for a positionally disordered (liquid or amorphous) system without any correlation among the scattering sites. This method is an extension of the technique known as the traveling-cluster approximation (TCA) originally obtained and applied to a substitutional alloy by Mills and Ratanavararaksa. We have also proved the appropriate fixed-point theorem, which guarantees, for a bounded local potential, that the self-consistent equations always converge upon iteration to a unique, Herglotz solution. To our knowledge, this is the only analytic theory for considering cluster effects. Furthermore, we have performed some computer calculations in the pair TCA, for the model case of delta-function potentials on a one-dimensional random chain. These results have been compared with "exact calculations" (which, in principle, take into account all cluster effects) and with the coherent-potential approximation (CPA), which is the single-site TCA. The density of states for the pair TCA clearly shows some improvement over the CPA and yet, apparently, the pair approximation distorts some of the features of the exact results.
NASA Astrophysics Data System (ADS)
Mostafa, Mostafa E.
2005-10-01
The present study shows that reconstructing the reduced stress tensor (RST) from the measurable fault-slip data (FSD) and the immeasurable shear stress magnitudes (SSM) is a typical iteration problem. The result of the direct inversion of FSD presented by Angelier [1990. Geophysical Journal International 103, 363-376] is taken as a starting point (zero-step iteration) in which all SSM are assigned a constant value (λ = √3/2). By iteration, the SSM and RST update each other until they converge to fixed values. Angelier [1990. Geophysical Journal International 103, 363-376] designed the function upsilon (υ) and two estimators, relative upsilon (RUP) and ANG, to express the divergence between the measured and calculated shear stresses. Plotting the RUP of individual faults at successive iteration steps shows that they tend to zero (simulated data) or to fixed values (real data) at a rate depending on the orientation and homogeneity of the data. FSD of related origin tend to aggregate in clusters. Plots of the estimator ANG versus RUP show that, by iteration, labeled data points are disposed in clusters about a straight line. These two new plots form the basis of a technique for separating FSD into homogeneous clusters.
NASA Astrophysics Data System (ADS)
Sun, Jiajia; Li, Yaoguo
2017-02-01
Joint inversion that simultaneously inverts multiple geophysical data sets to recover a common Earth model is increasingly being applied to exploration problems. Petrophysical data can serve as an effective constraint to link different physical property models in such inversions. There are two challenges, among others, associated with the petrophysical approach to joint inversion. One is related to the multimodality of petrophysical data because there often exist more than one relationship between different physical properties in a region of study. The other challenge arises from the fact that petrophysical relationships have different characteristics and can exhibit point, linear, quadratic, or exponential forms in a crossplot. The fuzzy c-means (FCM) clustering technique is effective in tackling the first challenge and has been applied successfully. We focus on the second challenge in this paper and develop a joint inversion method based on variations of the FCM clustering technique. To account for the specific shapes of petrophysical relationships, we introduce several different fuzzy clustering algorithms that are capable of handling different shapes of petrophysical relationships. We present two synthetic and one field data examples and demonstrate that, by choosing appropriate distance measures for the clustering component in the joint inversion algorithm, the proposed joint inversion method provides an effective means of handling common petrophysical situations we encounter in practice. The jointly inverted models have both enhanced structural similarity and increased petrophysical correlation, and better represent the subsurface in the spatial domain and the parameter domain of physical properties.
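A plain fuzzy c-means with the standard Euclidean distance can be sketched as below; in the joint-inversion setting, the squared-distance term is what would be swapped for point, linear, quadratic, or exponential distance measures. Data and parameters are illustrative.

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, iters=100, seed=0):
    """Plain fuzzy c-means with Euclidean distance. Other petrophysical
    distance measures would replace the `d2` computation below."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(1, keepdims=True)                    # fuzzy memberships
    for _ in range(iters):
        W = U ** m
        centers = (W.T @ X) / W.sum(0)[:, None]     # membership-weighted means
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1) + 1e-12
        inv = d2 ** (-1.0 / (m - 1))                # standard FCM update
        U = inv / inv.sum(1, keepdims=True)
    return centers, U

rng = np.random.default_rng(9)
# Illustrative two-mode "crossplot" of physical properties.
X = np.vstack([rng.normal(0, 0.4, (30, 2)), rng.normal(5, 0.4, (30, 2))])
centers, U = fuzzy_c_means(X, c=2)
hard = U.argmax(1)                                  # hard assignment if needed
```

Because memberships are soft, a point lying between two petrophysical trends is shared between them rather than forced into one, which is what makes the FCM term a usable coupling constraint in the inversion objective.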
DOE Office of Scientific and Technical Information (OSTI.GOV)
Castellanos, Sergio; Ekstrom, Kai E.; Autruffe, Antoine
2016-05-01
In recent years, high-performance multicrystalline silicon (HPMC-Si) has emerged as an attractive alternative to traditional ingot-based multicrystalline silicon (mc-Si), with a similar cost structure but improved cell performance. Herein, we evaluate the gettering response of traditional mc-Si and HPMC-Si. Microanalytical techniques demonstrate that HPMC-Si and mc-Si share similar lifetime-limiting defect types but have different relative concentrations and distributions. HPMC-Si shows a substantial lifetime improvement after P-gettering compared with mc-Si, chiefly because of lower area fraction of dislocation-rich clusters. In both materials, the dislocation clusters and grain boundaries were associated with relatively higher interstitial iron point-defect concentrations after diffusion, which is suggestive of dissolving metal-impurity precipitates. The relatively fewer dislocation clusters in HPMC-Si are shown to exhibit similar characteristics to those found in mc-Si. Given similar governing principles, a proxy to determine relative recombination activity of dislocation clusters developed for mc-Si is successfully transferred to HPMC-Si.
Automated extraction and analysis of rock discontinuity characteristics from 3D point clouds
NASA Astrophysics Data System (ADS)
Bianchetti, Matteo; Villa, Alberto; Agliardi, Federico; Crosta, Giovanni B.
2016-04-01
A reliable characterization of fractured rock masses requires an exhaustive geometrical description of discontinuities, including orientation, spacing, and size. These are required to describe discontinuum rock mass structure, perform Discrete Fracture Network and DEM modelling, or provide input for rock mass classification or equivalent continuum estimate of rock mass properties. Although several advanced methodologies have been developed in the last decades, a complete characterization of discontinuity geometry in practice is still challenging, due to scale-dependent variability of fracture patterns and difficult accessibility to large outcrops. Recent advances in remote survey techniques, such as terrestrial laser scanning and digital photogrammetry, allow a fast and accurate acquisition of dense 3D point clouds, which promoted the development of several semi-automatic approaches to extract discontinuity features. Nevertheless, these often need user supervision on algorithm parameters which can be difficult to assess. To overcome this problem, we developed an original Matlab tool, allowing fast, fully automatic extraction and analysis of discontinuity features with no requirements on point cloud accuracy, density and homogeneity. The tool consists of a set of algorithms which: (i) process raw 3D point clouds, (ii) automatically characterize discontinuity sets, (iii) identify individual discontinuity surfaces, and (iv) analyse their spacing and persistence. The tool operates in either a supervised or unsupervised mode, starting from an automatic preliminary exploration data analysis. The identification and geometrical characterization of discontinuity features is divided in steps. First, coplanar surfaces are identified in the whole point cloud using K-Nearest Neighbor and Principal Component Analysis algorithms optimized on point cloud accuracy and specified typical facet size. 
Then, discontinuity set orientation is calculated using Kernel Density Estimation and principal vector similarity criteria. Poles to points are assigned to individual discontinuity objects using custom vector-clustering and Jaccard-distance approaches, and each object is segmented into planar clusters using an improved version of the DBSCAN algorithm. Modal set orientations are then recomputed by cluster-based orientation statistics to avoid the effects of biases related to cluster size and density heterogeneity of the point cloud. Finally, spacing values are measured between individual discontinuity clusters along scanlines parallel to modal pole vectors, whereas individual feature size (persistence) is measured using 3D convex hull bounding boxes. Spacing and size are provided both as raw population data and as summary statistics. The tool is optimized for parallel computing on 64-bit systems, and a Graphical User Interface (GUI) has been developed to manage data processing and provide several outputs, including reclassified point clouds, tables, plots, derived fracture-intensity parameters, and export to modelling software tools. We present test applications performed both on synthetic 3D data (simple 3D solids) and real case studies, validating the results with existing geomechanical datasets.
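The first step of such a workflow, per-point planarity analysis via K-Nearest-Neighbor PCA, can be sketched on a synthetic two-plane cloud; the set-splitting rule at the end is a crude stand-in for the kernel-density and DBSCAN stages, and all data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)

# Synthetic cloud: two planar "discontinuity" patches with small roughness.
n = 200
xy = rng.uniform(0, 1, (n, 2))
plane_a = np.column_stack([xy, 0.02 * rng.normal(size=n)])                 # z ~ 0
plane_b = np.column_stack([xy[:, 0], 0.02 * rng.normal(size=n), xy[:, 1]])  # y ~ 0
cloud = np.vstack([plane_a, plane_b])

def knn_normals(P, k=12):
    """Per-point unit normal: eigenvector of the smallest eigenvalue of the
    covariance of the k nearest neighbours (PCA plane fit)."""
    d2 = ((P[:, None, :] - P[None, :, :]) ** 2).sum(-1)
    normals = np.empty_like(P)
    for i in range(len(P)):
        nb = P[np.argsort(d2[i])[:k]]
        cov = np.cov((nb - nb.mean(0)).T)
        w, v = np.linalg.eigh(cov)      # eigenvalues ascending
        normals[i] = v[:, 0]
    return normals

normals = knn_normals(cloud)

# Crude set assignment by dominant pole-vector axis (stand-in for the
# kernel-density / DBSCAN clustering of the full tool).
set_id = np.abs(normals).argmax(1)
```

Points deep inside each patch receive near-axis normals and cluster cleanly into two discontinuity sets; only points near the intersection line, whose neighborhoods mix both planes, get ambiguous normals, which is exactly what the density-based clustering stages are there to handle.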
Unsupervised classification of earth resources data.
NASA Technical Reports Server (NTRS)
Su, M. Y.; Jayroe, R. R., Jr.; Cummings, R. E.
1972-01-01
A new clustering technique is presented. It consists of two parts: (a) a sequential statistical clustering, which is essentially a sequential variance analysis, and (b) a generalized K-means clustering. In this composite clustering technique, the output of (a) is a set of initial clusters which are input to (b) for further improvement by an iterative scheme. This unsupervised composite technique was employed for automatic classification of two sets of remote multispectral earth resource observations. The classification accuracy of the unsupervised technique is found to be comparable to that of the existing supervised maximum likelihood classification technique.
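The two-pass composite scheme described above can be sketched in a few lines of Python. This is a minimal illustration on made-up 2-D "spectral" points, with a simple distance threshold standing in for the sequential variance analysis; the threshold value and data are assumptions for the example:

```python
def sequential_seed(points, threshold):
    """Pass (a): sequential clustering -- start a new cluster whenever a
    point lies farther than `threshold` from every existing centre."""
    centres = []
    for p in points:
        if all(sum((a - b) ** 2 for a, b in zip(p, c)) > threshold ** 2
               for c in centres):
            centres.append(list(p))
    return centres

def kmeans(points, centres, iters=20):
    """Pass (b): generalized K-means refinement of the seed centres."""
    for _ in range(iters):
        groups = [[] for _ in centres]
        for p in points:
            i = min(range(len(centres)),
                    key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centres[i])))
            groups[i].append(p)
        centres = [[sum(col) / len(g) for col in zip(*g)] if g else c
                   for g, c in zip(groups, centres)]
    return centres

# Two well-separated groups; the seed pass finds the cluster count itself.
data = [(0.1, 0.2), (0.0, 0.0), (0.2, 0.1), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
centres = kmeans(data, sequential_seed(data, threshold=2.0))
```

The seed pass fixes the number of clusters from the data, so K never has to be specified up front.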
Discrete range clustering using Monte Carlo methods
NASA Technical Reports Server (NTRS)
Chatterji, G. B.; Sridhar, B.
1993-01-01
For automatic obstacle avoidance guidance during rotorcraft low altitude flight, a reliable model of the nearby environment is needed. Such a model may be constructed by applying surface fitting techniques to the dense range map obtained by active sensing using radars. However, for covertness, passive sensing techniques using electro-optic sensors are desirable. As opposed to the dense range map obtained via active sensing, passive sensing algorithms produce reliable range estimates only at sparse locations; therefore, surface fitting techniques that fill the gaps in the range measurements are not directly applicable. Both for automatic guidance and for a display aiding the pilot, these discrete ranges need to be grouped into sets which correspond to objects in the nearby environment. The focus of this paper is on using Monte Carlo methods for clustering range points into meaningful groups. One of the aims of the paper is to explore whether simulated annealing methods offer a significant advantage over the basic Monte Carlo method for this class of problems. We compare three different approaches and present application results of these algorithms on a laboratory image sequence and a helicopter flight sequence.
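The Metropolis-style search the paper compares against can be sketched as simulated annealing over cluster assignments, minimizing the sum of within-cluster squared distances. The toy range points, cooling schedule, and parameter values below are illustrative assumptions, not the paper's settings:

```python
import math
import random

def sa_cluster(points, k, steps=3000, t0=1.0, seed=0):
    """Simulated annealing over assignments: propose moving one point to
    another cluster, accept with the Metropolis rule, cool geometrically."""
    rng = random.Random(seed)
    d2 = lambda p, q: sum((a - b) ** 2 for a, b in zip(p, q))
    def cost(assign):
        total = 0.0
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            for i in range(len(members)):
                for j in range(i + 1, len(members)):
                    total += d2(members[i], members[j])
        return total
    assign = [rng.randrange(k) for _ in points]
    best, best_cost = assign[:], cost(assign)
    cur_cost, t = best_cost, t0
    for _ in range(steps):
        i, c = rng.randrange(len(points)), rng.randrange(k)
        old = assign[i]
        if c == old:
            continue
        assign[i] = c
        new_cost = cost(assign)
        if new_cost < cur_cost or rng.random() < math.exp((cur_cost - new_cost) / t):
            cur_cost = new_cost
            if cur_cost < best_cost:
                best, best_cost = assign[:], cur_cost
        else:
            assign[i] = old          # reject the move
        t *= 0.999                   # geometric cooling
    return best

# Hypothetical sparse range points forming two distinct objects.
ranges = [(0, 0), (0.2, 0.1), (0.1, 0.3), (4, 4), (4.1, 3.9), (3.9, 4.2)]
assign = sa_cluster(ranges, k=2)
```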
Hussain, Mahbub; Ahmed, Syed Munaf; Abderrahman, Walid
2008-01-01
A multivariate statistical technique, cluster analysis, was used to assess the logged surface water quality at an irrigation project at Al-Fadhley, Eastern Province, Saudi Arabia. The principal idea behind using the technique was to utilize all available hydrochemical variables in the quality assessment, including trace elements and other ions which are not considered in conventional techniques for water quality assessment such as Stiff and Piper diagrams. Furthermore, the area belongs to an irrigation project where water contamination associated with the use of fertilizers, insecticides and pesticides is expected. This quality assessment study was carried out on a total of 34 surface/logged water samples. To gain greater insight into the seasonal variation of water quality, 17 samples were collected in each of the summer and winter seasons. The collected samples were analyzed for a total of 23 water quality parameters including pH, TDS, conductivity, alkalinity, sulfate, chloride, bicarbonate, nitrate, phosphate, bromide, fluoride, calcium, magnesium, sodium, potassium, arsenic, boron, copper, cobalt, iron, lithium, manganese, molybdenum, nickel, selenium, mercury and zinc. Cluster analysis in both Q and R modes was used. Q-mode analysis resulted in three distinct water types for both the summer and winter seasons. Q-mode analysis also showed the spatial as well as temporal variation in water quality. R-mode cluster analysis led to the conclusion that there are two major sources of contamination for the surface/shallow groundwater in the area: fertilizers, micronutrients, pesticides, and insecticides used in agricultural activities, and non-point natural sources.
Scalable Parallel Density-based Clustering and Applications
NASA Astrophysics Data System (ADS)
Patwary, Mostofa Ali
2014-04-01
Recently, density-based clustering algorithms (DBSCAN and OPTICS) have received significant attention from the scientific community due to their unique capability of discovering arbitrarily shaped clusters and eliminating noise. These algorithms have several applications which require high performance computing, including finding halos and subhalos (clusters) in massive cosmology data in astrophysics, analyzing satellite images, X-ray crystallography, and anomaly detection. However, parallelization of these algorithms is extremely challenging, as they exhibit an inherently sequential data access order and unbalanced workloads, resulting in low parallel efficiency. To break the data access sequentiality and to achieve high parallelism, we develop new parallel algorithms, both for DBSCAN and OPTICS, designed using graph algorithmic techniques. For example, our parallel DBSCAN algorithm exploits the similarities between DBSCAN and computing connected components. Using datasets containing up to a billion floating point numbers, we show that our parallel density-based clustering algorithms significantly outperform the existing algorithms, achieving speedups up to 27.5 on 40 cores on a shared memory architecture and speedups up to 5,765 using 8,192 cores on a distributed memory architecture. In our experiments, we found that, while achieving this scalability, our algorithms produce clustering results of comparable quality to the classical algorithms.
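The DBSCAN-as-connected-components view mentioned above can be sketched with a sequential union-find; parallel versions essentially perform these unions concurrently. Data and parameters below are toy assumptions:

```python
def dbscan_union_find(points, eps, min_pts):
    """DBSCAN phrased as connected components: union epsilon-neighbouring
    core points, then attach border points; leftovers are noise (-1)."""
    n = len(points)
    d2 = lambda i, j: sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
    nbrs = [[j for j in range(n) if j != i and d2(i, j) <= eps * eps]
            for i in range(n)]
    core = [len(nbrs[i]) + 1 >= min_pts for i in range(n)]  # point counts itself

    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x
    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    for i in range(n):
        if core[i]:
            for j in nbrs[i]:
                if core[j]:
                    union(i, j)             # merge neighbouring core points
    for i in range(n):
        if not core[i]:
            for j in nbrs[i]:
                if core[j]:
                    union(i, j)             # attach border point to one core
                    break
    return [find(i) if core[i] or any(core[j] for j in nbrs[i]) else -1
            for i in range(n)]

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (50, 50)]
labels = dbscan_union_find(pts, eps=1.5, min_pts=2)
```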
A nonparametric clustering technique which estimates the number of clusters
NASA Technical Reports Server (NTRS)
Ramey, D. B.
1983-01-01
In applications of cluster analysis, one usually needs to determine the number of clusters, K, and the assignment of observations to each cluster. A clustering technique based on recursive application of a multivariate test of bimodality which automatically estimates both K and the cluster assignments is presented.
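A minimal sketch of the recursive idea, in 1-D: split a cluster with 2-means and keep the split only when a separation statistic suggests bimodality. The crude gap-versus-spread criterion below is a stand-in for the multivariate bimodality test of the paper, and the data and thresholds are invented:

```python
def two_means_1d(xs, iters=25):
    """Plain 2-means on a list of scalars."""
    c0, c1 = min(xs), max(xs)
    for _ in range(iters):
        g0 = [x for x in xs if abs(x - c0) <= abs(x - c1)]
        g1 = [x for x in xs if abs(x - c0) > abs(x - c1)]
        if g0: c0 = sum(g0) / len(g0)
        if g1: c1 = sum(g1) / len(g1)
    return g0, g1, c0, c1

def recursive_split(xs, min_size=3, sep=2.0):
    """Recursively bisect; accept a split only when the gap between the
    sub-cluster means exceeds `sep` times their pooled spread."""
    if len(xs) < 2 * min_size:
        return [xs]
    g0, g1, c0, c1 = two_means_1d(xs)
    if not g0 or not g1:
        return [xs]
    spread = (sum((x - c0) ** 2 for x in g0) + sum((x - c1) ** 2 for x in g1))
    spread = (spread / len(xs)) ** 0.5
    if abs(c1 - c0) > sep * max(spread, 1e-12):
        return recursive_split(g0, min_size, sep) + recursive_split(g1, min_size, sep)
    return [xs]

# Three modes; both K and the assignments come out of the recursion.
data = [0.0, 0.1, 0.2, 0.3, 5.0, 5.1, 5.2, 9.9, 10.0, 10.1]
clusters = recursive_split(data)
```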
NASA Technical Reports Server (NTRS)
Smedes, H. W.; Linnerud, H. J.; Woolaver, L. B.; Su, M. Y.; Jayroe, R. R.
1972-01-01
Two clustering techniques were used for terrain mapping by computer of test sites in Yellowstone National Park. One test was made with multispectral scanner data using a composite technique which consists of (1) a strictly sequential statistical clustering which is a sequential variance analysis, and (2) a generalized K-means clustering. In this composite technique, the output of (1) is a first approximation of the cluster centers. This is the input to (2) which consists of steps to improve the determination of cluster centers by iterative procedures. Another test was made using the three emulsion layers of color-infrared aerial film as a three-band spectrometer. Relative film densities were analyzed using a simple clustering technique in three-color space. Important advantages of the clustering technique over conventional supervised computer programs are (1) human intervention, preparation time, and manipulation of data are reduced, (2) the computer map, gives unbiased indication of where best to select the reference ground control data, (3) use of easy to obtain inexpensive film, and (4) the geometric distortions can be easily rectified by simple standard photogrammetric techniques.
Performance analysis of clustering techniques over microarray data: A case study
NASA Astrophysics Data System (ADS)
Dash, Rasmita; Misra, Bijan Bihari
2018-03-01
Handling big data is one of the major issues in the field of statistical data analysis, and cluster analysis plays a vital role in dealing with such large-scale data. There are many clustering techniques, each with a different cluster analysis approach, but which approach suits a particular dataset is difficult to predict. To deal with this problem, a grading approach is introduced over many clustering techniques to identify a stable technique. Because the grading approach depends on the characteristics of the dataset as well as on the validity indices, a two-stage grading approach is implemented. In this study the grading approach is applied to five clustering techniques: hybrid swarm based clustering (HSC), k-means, partitioning around medoids (PAM), vector quantization (VQ) and agglomerative nesting (AGNES). The experimentation is conducted over five microarray datasets with seven validity indices. The finding of the grading approach, that a clustering technique is significant, is also established by the Nemenyi post-hoc statistical test.
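Grading clustering outputs rests on validity indices; a common one is the mean silhouette width, sketched here on toy points. The data and the choice of index are assumptions for illustration, not the seven indices of the study:

```python
def silhouette(points, labels):
    """Mean silhouette width: (b - a) / max(a, b) averaged over points,
    where a is the mean intra-cluster distance and b the mean distance
    to the nearest other cluster."""
    d = lambda p, q: sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5
    score = 0.0
    for i, p in enumerate(points):
        same = [d(p, q) for j, q in enumerate(points)
                if j != i and labels[j] == labels[i]]
        a = sum(same) / len(same) if same else 0.0
        b = min(sum(d(p, q) for j, q in enumerate(points) if labels[j] == L) /
                sum(1 for j in range(len(points)) if labels[j] == L)
                for L in set(labels) if L != labels[i])
        score += (b - a) / max(a, b)
    return score / len(points)

pts = [(0, 0), (0, 1), (1, 0), (8, 8), (8, 9), (9, 8)]
good = silhouette(pts, [0, 0, 0, 1, 1, 1])   # matches the true structure
bad = silhouette(pts, [0, 1, 0, 1, 0, 1])    # ignores it
```

A grading scheme would aggregate several such indices over several datasets before ranking the techniques.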
Towards Effective Clustering Techniques for the Analysis of Electric Power Grids
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hogan, Emilie A.; Cotilla Sanchez, Jose E.; Halappanavar, Mahantesh
2013-11-30
Clustering is an important data analysis technique with numerous applications in the analysis of electric power grids. Standard clustering techniques are oblivious to the rich structural and dynamic information available for power grids. Therefore, by exploiting the inherent topological and electrical structure in the power grid data, we propose new methods for clustering with applications to model reduction, locational marginal pricing, phasor measurement unit (PMU or synchrophasor) placement, and power system protection. We focus our attention on model reduction for analysis based on time-series information from synchrophasor measurement devices, and spectral techniques for clustering. By comparing different clustering techniques on two instances of realistic power grids we show that the solutions are related and therefore one could leverage that relationship for a computational advantage. Thus, by contrasting different clustering techniques we make a case for exploiting structure inherent in the data with implications for several domains including power systems.
A new method to unveil embedded stellar clusters
NASA Astrophysics Data System (ADS)
Lombardi, Marco; Lada, Charles J.; Alves, João
2017-11-01
In this paper we present a novel method to identify and characterize stellar clusters deeply embedded in a dark molecular cloud. The method is based on measuring stellar surface density in wide-field infrared images using star counting techniques. It takes advantage of the differing H-band luminosity functions (HLFs) of field stars and young stellar populations and is able to statistically associate each star in an image with either the background stellar population or a young stellar population projected on or near the cloud. Moreover, the technique corrects for the effects of differential extinction toward each individual star. We have tested this method against simulations as well as observations. In particular, we have applied the method to 2MASS point sources observed in the Orion A and B complexes, and the results obtained compare very well with those obtained from deep Spitzer and Chandra observations, where the presence of infrared excess or X-ray emission directly determines membership status for every star. Additionally, our method also identifies unobscured clusters, and a low-resolution version of the Orion stellar surface density map clearly shows the relatively unobscured and diffuse OB 1a and 1b sub-groups and provides useful insights on their spatial distribution.
Banerjee, Arindam; Ghosh, Joydeep
2004-05-01
Competitive learning mechanisms for clustering, in general, suffer from poor performance for very high-dimensional (>1000) data because of "curse of dimensionality" effects. In applications such as document clustering, it is customary to normalize the high-dimensional input vectors to unit length, and it is sometimes also desirable to obtain balanced clusters, i.e., clusters of comparable sizes. The spherical kmeans (spkmeans) algorithm, which normalizes the cluster centers as well as the inputs, has been successfully used to cluster normalized text documents in 2000+ dimensional space. Unfortunately, like regular kmeans and its soft expectation-maximization-based version, spkmeans tends to generate extremely imbalanced clusters in high-dimensional spaces when the desired number of clusters is large (tens or more). This paper first shows that the spkmeans algorithm can be derived from a certain maximum likelihood formulation using a mixture of von Mises-Fisher distributions as the generative model, and in fact, it can be considered as a batch-mode version of (normalized) competitive learning. The proposed generative model is then adapted in a principled way to yield three frequency-sensitive competitive learning variants that are applicable to static data and produce high-quality and well-balanced clusters for high-dimensional data. Like kmeans, each iteration is linear in the number of data points and in the number of clusters for all three algorithms. A frequency-sensitive algorithm to cluster streaming data is also proposed. Experimental results on clustering of high-dimensional text data sets are provided to show the effectiveness and applicability of the proposed techniques. Index Terms: balanced clustering, expectation maximization (EM), frequency-sensitive competitive learning (FSCL), high-dimensional clustering, kmeans, normalized data, scalable clustering, streaming data, text clustering.
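The spkmeans step described above (unit-normalized inputs and centres, cosine-similarity assignment) can be sketched as follows. The toy term vectors and naive seeding are assumptions for the example; the paper's frequency-sensitive variants further penalize large clusters during assignment:

```python
def normalize(v):
    n = sum(x * x for x in v) ** 0.5
    return [x / n for x in v]

def spkmeans(docs, k, iters=20):
    """Spherical k-means: unit-normalized inputs, cosine-similarity
    assignment, and cluster centres re-normalized after each mean."""
    X = [normalize(d) for d in docs]
    centres = [X[i] for i in range(k)]   # naive seeding, for the sketch only
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in X:
            i = max(range(k),
                    key=lambda i: sum(a * b for a, b in zip(x, centres[i])))
            groups[i].append(x)
        centres = [normalize([sum(col) for col in zip(*g)]) if g else c
                   for g, c in zip(groups, centres)]
    labels = [max(range(k),
                  key=lambda i: sum(a * b for a, b in zip(x, centres[i])))
              for x in X]
    return centres, labels

# Toy term-frequency vectors pointing in two topical directions.
docs = [(5, 1, 0), (4, 1, 0), (6, 2, 0), (0, 1, 5), (0, 2, 6), (0, 1, 4)]
centres, labels = spkmeans(docs, k=2)
```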
Analysis of plasmaspheric plumes: CLUSTER and IMAGE observations and numerical simulations
NASA Technical Reports Server (NTRS)
Darouzet, Fabien; DeKeyser, Johan; Decreau, Pierrette; Gallagher, Dennis; Pierrard, Viviane; Lemaire, Joseph; Dandouras, Iannis; Matsui, Hiroshi; Dunlop, Malcolm; Andre, Mats
2005-01-01
Plasmaspheric plumes have been routinely observed by CLUSTER and IMAGE. The CLUSTER mission provides high time resolution four-point measurements of the plasmasphere near perigee. Total electron density profiles can be derived from the plasma frequency and/or from the spacecraft potential (note that the electron spectrometer is usually not operating inside the plasmasphere); ion velocity is also measured onboard these satellites (but ion density is not reliable because of instrumental limitations). The EUV imager onboard the IMAGE spacecraft provides global images of the plasmasphere with a spatial resolution of 0.1 RE every 10 minutes; such images acquired near apogee from high above the pole show the geometry of plasmaspheric plumes, their evolution and motion. We present coordinated observations for 3 plume events and compare CLUSTER in-situ data (panel A) with global images of the plasmasphere obtained from IMAGE (panel B), and with numerical simulations for the formation of plumes based on a model that includes the interchange instability mechanism (panel C). In particular, we study the geometry and the orientation of plasmaspheric plumes by using a four-point analysis method, the spatial gradient. We also compare several aspects of their motion as determined by different methods: (i) inner and outer plume boundary velocity calculated from time delays of this boundary observed by the wave experiment WHISPER on the four spacecraft, (ii) ion velocity derived from the ion spectrometer CIS onboard CLUSTER, (iii) drift velocity measured by the electron drift instrument EDI onboard CLUSTER and (iv) global velocity determined from successive EUV images. These different techniques consistently indicate that plasmaspheric plumes rotate around the Earth, with their foot fully co-rotating, but with their tip rotating slower and moving farther out.
Using Clustering to Establish Climate Regimes from PCM Output
NASA Technical Reports Server (NTRS)
Oglesby, Robert; Arnold, James E. (Technical Monitor); Hoffman, Forrest; Hargrove, W. W.; Erickson, D.
2002-01-01
A multivariate statistical clustering technique, based on the k-means algorithm of Hartigan, has been used to extract patterns of climatological significance from 200 years of general circulation model (GCM) output. Originally developed and implemented on a Beowulf-style parallel computer constructed by Hoffman and Hargrove from surplus commodity desktop PCs, the high performance parallel clustering algorithm was previously applied to the derivation of ecoregions from map stacks of 9 and 25 geophysical conditions or variables for the conterminous U.S. at a resolution of 1 sq km. Now applied both across space and through time, the clustering technique yields temporally-varying climate regimes predicted by transient runs of the Parallel Climate Model (PCM). Using a business-as-usual (BAU) scenario and clustering four fields of significance to the global water cycle (surface temperature, precipitation, soil moisture, and snow depth) from 1871 through 2098, the authors' analysis shows an increase in the spatial area occupied by the cluster or climate regime which typifies desert regions (i.e., an increase in desertification) and a decrease in the spatial area occupied by the climate regime typifying winter-time high latitude permafrost regions. The patterns of cluster changes have been analyzed to understand the predicted variability in the water cycle on global and continental scales. In addition, representative climate regimes were determined by taking three 10-year averages of the fields 100 years apart for northern hemisphere winter (December, January, and February) and summer (June, July, and August). The result is global maps of typical seasonal climate regimes for 100 years in the past, for the present, and for 100 years into the future.
Using three-dimensional data or phase space representations of these climate regimes (i.e., the cluster centroids), the authors demonstrate the portion of this phase space occupied by the land surface at all points in space and time. Any single spot on the globe will exist in one of these climate regimes at any single point in time. By incrementing time, that same spot will trace out a trajectory or orbit between and among these climate regimes (or atmospheric states) in phase (or state) space. When a geographic region enters a state it never previously visited, a climatic change is said to have occurred. Tracing out the entire trajectory of a single spot on the globe yields a 'manifold' in state space representing the shape of its predicted climate occupancy. This sort of analysis enables a researcher to more easily grasp the multivariate behavior of the climate system.
Vajda, Szilárd; Rangoni, Yves; Cecotti, Hubert
2015-01-01
For training supervised classifiers to recognize different patterns, large data collections with accurate labels are necessary. In this paper, we propose a generic, semi-automatic labeling technique for large handwritten character collections. In order to speed up the creation of a large scale ground truth, the method combines unsupervised clustering and minimal expert knowledge. To exploit the potential discriminant complementarities across features, each character is projected into five different feature spaces. After clustering the images in each feature space, the human expert labels the cluster centers. Each data point inherits the label of its cluster's center. A majority (or unanimity) vote decides the label of each character image. The amount of human involvement (labeling) is strictly controlled by the number of clusters produced by the chosen clustering approach. To test the efficiency of the proposed approach, we have compared and evaluated three state-of-the-art clustering methods (k-means, self-organizing maps, and growing neural gas) on the MNIST digit data set and a Lampung Indonesian character data set. Considering a k-nn classifier, we show that manually labeling only 1.3% (MNIST) and 3.2% (Lampung) of the training data provides the same range of performance as a completely labeled data set would. PMID:25870463
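The label-the-centres-and-vote idea can be sketched as below, with two hypothetical feature views, k-means standing in for the three clustering methods compared in the paper, and a stand-in `expert` function simulating the human labeler:

```python
from collections import Counter

def kmeans(X, k, iters=15):
    """Plain k-means returning the cluster centres."""
    C = [list(X[i]) for i in range(k)]
    for _ in range(iters):
        G = [[] for _ in range(k)]
        for x in X:
            G[min(range(k),
                  key=lambda i: sum((a - b) ** 2 for a, b in zip(x, C[i])))].append(x)
        C = [[sum(col) / len(g) for col in zip(*g)] if g else c
             for g, c in zip(G, C)]
    return C

def label_by_cluster_centres(views, k, expert):
    """Cluster each feature view, ask the expert only for the k centre
    labels, let every point inherit its centre's label, then take a
    majority vote across views."""
    votes = [[] for _ in range(len(views[0]))]
    for X in views:
        C = kmeans(X, k)
        centre_labels = [expert(c) for c in C]   # minimal human involvement
        for i, x in enumerate(X):
            j = min(range(k),
                    key=lambda j: sum((a - b) ** 2 for a, b in zip(x, C[j])))
            votes[i].append(centre_labels[j])
    return [Counter(v).most_common(1)[0][0] for v in votes]

# Two hypothetical feature spaces for the same 6 items (classes "a" / "b").
view1 = [(0, 0), (1, 0), (0, 1), (9, 9), (8, 9), (9, 8)]
view2 = [(0, 5), (1, 5), (0, 6), (7, 0), (8, 0), (7, 1)]
expert = lambda c: "a" if c[0] < 4 else "b"       # simulated human oracle
labels = label_by_cluster_centres([view1, view2], k=2, expert=expert)
```

The expert is consulted k times per view rather than once per data point, which is what bounds the manual labeling effort.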
Lavenn, Christophe; Albrieux, Florian; Tuel, Alain; Demessence, Aude
2014-03-15
Research interest in ultra-small gold thiolate clusters has risen in recent years because of the challenge they pose of bringing together the properties of nanoscience and the well-defined materials of molecular chemistry. Here, a new atomically well-defined Au10 gold nanocluster surrounded by ten 4-aminothiophenolate ligands is reported. Its synthesis followed conditions similar to those reported for the elaboration of Au144(SR)60, but because the reactivity of thiophenol ligands differs from that of alkanethiol derivatives, smaller Au10 clusters were formed. Different techniques, such as ESI-MS, elemental analysis, XRD, TGA, XPS and UV-vis-NIR experiments, were carried out to determine the Au10(SPh-pNH2)10 formula. Photoemission experiments reveal that the Au10 clusters are weakly luminescent, as opposed to the amino-based ultra-small gold clusters. This observation indicates that the emission of gold thiolate clusters is highly dependent on both the structure of the gold core and the type of ligands at the surface. In addition, ultra-small amino-functionalized clusters offer the opportunity for extended work on self-assembling networks or deposition on substrates for nanotechnology or catalytic applications. Copyright © 2013 Elsevier Inc. All rights reserved.
Liao, Minlei; Li, Yunfeng; Kianifard, Farid; Obi, Engels; Arcona, Stephen
2016-03-02
Cluster analysis (CA) is a frequently used applied statistical technique that helps to reveal hidden structures and "clusters" found in large data sets. However, this method has not been widely used in large healthcare claims databases where the distribution of expenditure data is commonly severely skewed. The purpose of this study was to identify cost change patterns of patients with end-stage renal disease (ESRD) who initiated hemodialysis (HD) by applying different clustering methods. A retrospective, cross-sectional, observational study was conducted using the Truven Health MarketScan® Research Databases. Patients aged ≥18 years with ≥2 ESRD diagnoses who initiated HD between 2008 and 2010 were included. The K-means CA method and hierarchical CA with various linkage methods were applied to all-cause costs within baseline (12-months pre-HD) and follow-up periods (12-months post-HD) to identify clusters. Demographic, clinical, and cost information was extracted from both periods, and then examined by cluster. A total of 18,380 patients were identified. Meaningful all-cause cost clusters were generated using K-means CA and hierarchical CA with either flexible beta or Ward's methods. Based on cluster sample sizes and change of cost patterns, the K-means CA method and 4 clusters were selected: Cluster 1: Average to High (n = 113); Cluster 2: Very High to High (n = 89); Cluster 3: Average to Average (n = 16,624); and Cluster 4: Increasing Costs, High at Both Points (n = 1554). Median cost changes in the 12-month pre-HD and post-HD periods increased from $185,070 to $884,605 for Cluster 1 (Average to High), decreased from $910,930 to $157,997 for Cluster 2 (Very High to High), were relatively stable and remained low from $15,168 to $13,026 for Cluster 3 (Average to Average), and increased from $57,909 to $193,140 for Cluster 4 (Increasing Costs, High at Both Points).
Relatively stable costs after starting HD were associated with more stable comorbidity index scores across the pre- and post-HD periods, while increasing costs were associated with more sharply increasing comorbidity scores. The K-means CA method appeared to be the most appropriate for healthcare claims data with highly skewed cost information when taking into account both the change of cost patterns and the sample size in the smallest cluster.
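The hierarchical alternative compared against K-means can be sketched as bottom-up agglomeration. Average linkage is used here as a simple stand-in for the Ward's and flexible-beta linkages of the study, and the (pre-HD, post-HD) cost pairs are invented toy values:

```python
def agglomerative(points, k):
    """Bottom-up hierarchical clustering with average linkage: repeatedly
    merge the two clusters with the smallest mean pairwise distance."""
    clusters = [[p] for p in points]
    d = lambda p, q: sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5
    def dist(A, B):
        pair = [d(p, q) for p in A for q in B]
        return sum(pair) / len(pair)        # average linkage
    while len(clusters) > k:
        i, j = min(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Hypothetical (pre-HD, post-HD) cost pairs, in $1000s: a large stable
# group, a "low to high" pair, and one "high to low" outlier.
costs = [(15, 13), (16, 14), (14, 12), (185, 884), (190, 880), (910, 158)]
groups = agglomerative(costs, k=3)
```

With highly skewed data the singleton survives as its own cluster, which is one reason the study weighed the size of the smallest cluster when choosing a method.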
Variations in the magnetopause current layer
NASA Astrophysics Data System (ADS)
Laakso, H. E.; Middleton, H. R.
2017-12-01
We use multi-point observations from the Cluster spacecraft to investigate the variations in the magnetopause current layer. With the help of the curlometer technique one can determine the magnetopause current and its variability. Most of the time the magnetopause is moving back and forth, so during any given pass the current layer is crossed several times. We use such crossings to investigate the characteristics of the current layer as the solar wind pressure varies (and the magnetopause moves accordingly). In addition, we take advantage of the ambient electron measurements from the EDI experiment, which have been calibrated against the PEACE electron spectrometer data. These data can be used to detect fast variations of 1 keV electrons at a resolution of 1-100 ms. Overall, Cluster observations are highly complementary to the MMS observations due to the polar orbit of the Cluster spacecraft, which provides fast vertical profiles of the magnetopause current layer.
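The curlometer idea can be sketched as a linear-gradient estimate of curl B from four spacecraft: the field differences across the tetrahedron determine the gradient tensor exactly, and Ampère's law gives the current density. The synthetic positions and field below are a consistency check, not flight data, and the real technique includes quality measures (e.g. div B) omitted here:

```python
MU0 = 4e-7 * 3.141592653589793   # vacuum permeability, T*m/A

def inv3(M):
    """Inverse of a 3x3 matrix via the adjugate formula."""
    a, b, c = M[0]; d, e, f = M[1]; g, h, i = M[2]
    det = a * (e*i - f*h) - b * (d*i - f*g) + c * (d*h - e*g)
    return [[(e*i - f*h) / det, (c*h - b*i) / det, (b*f - c*e) / det],
            [(f*g - d*i) / det, (a*i - c*g) / det, (c*d - a*f) / det],
            [(d*h - e*g) / det, (b*g - a*h) / det, (a*e - b*d) / det]]

def curlometer(positions, fields):
    """Four-point estimate: B_i - B_0 = G (r_i - r_0) is exactly determined
    by the tetrahedron; J = curl B / mu0 from the antisymmetric part of G."""
    r0, B0 = positions[0], fields[0]
    dR = [[positions[i][j] - r0[j] for j in range(3)] for i in (1, 2, 3)]
    dB = [[fields[i][j] - B0[j] for j in range(3)] for i in (1, 2, 3)]
    Rinv = inv3(dR)
    # dB = dR * G^T  =>  G^T = dR^{-1} * dB, with G[i][j] = dB_i/dx_j
    GT = [[sum(Rinv[i][k] * dB[k][j] for k in range(3)) for j in range(3)]
          for i in range(3)]
    G = [[GT[j][i] for j in range(3)] for i in range(3)]
    curl = (G[2][1] - G[1][2], G[0][2] - G[2][0], G[1][0] - G[0][1])
    return tuple(c / MU0 for c in curl)

# Synthetic check: the field B = (0, 0, y) has curl B = (1, 0, 0).
pos = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
B = [(0, 0, 0), (0, 0, 0), (0, 0, 1), (0, 0, 0)]
J = curlometer(pos, B)   # expect (1/mu0, 0, 0)
```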
A PSF-based approach to Kepler/K2 data - II. Exoplanet candidates in Praesepe (M 44)
NASA Astrophysics Data System (ADS)
Libralato, M.; Nardiello, D.; Bedin, L. R.; Borsato, L.; Granata, V.; Malavolta, L.; Piotto, G.; Ochner, P.; Cunial, A.; Nascimbeni, V.
2016-12-01
In this work, we keep pushing K2 data to a high photometric precision, close to that of the Kepler main mission, using a point-spread function (PSF)-based, neighbour-subtraction technique, which also overcomes the dilution effects in crowded environments. We analyse the open cluster M 44 (NGC 2632), observed during the K2 Campaign 5, and extract light curves of stars imaged on module 14, where most of the cluster lies. We present two candidate exoplanets hosted by cluster members and five hosted by field stars. As a by-product of our investigation, we find 1680 eclipsing binaries and variable stars, 1071 of which are new discoveries. Among them, we report the presence of a heartbeat binary star. Together with this work, we release to the community a catalogue of the variable stars and candidate exoplanets found, as well as all our raw and detrended light curves.
Small Au clusters on a defective MgO(1 0 0) surface
NASA Astrophysics Data System (ADS)
Barcaro, Giovanni; Fortunelli, Alessandro
2008-05-01
The lowest energy structures of small Au clusters adsorbed on a defective MgO(1 0 0) surface are singled out via the basin hopping (BH) approach: at each Monte Carlo step a new configuration is generated, and if exp[-ΔE/kT] > rndm, where rndm is a random number (Metropolis criterion), the new configuration is accepted, otherwise the old configuration is kept, and the process is iterated. For each size we performed 3-5 BH runs, each one composed of 20-25 Monte Carlo steps, using a value of 0.5 eV as kT in the Metropolis criterion. Previous experience [13-15] shows that this is sufficient to single out the global minimum for adsorbed clusters of this size, and that the BH approach is more efficient as a global optimization algorithm than other techniques such as simulated annealing [18]. The MgO support was described via an (Mg12O12) cluster embedded in an array of ±2.0 a.u. point charges and repulsive pseudopotentials on the positive charges in direct contact with the cluster (see Ref. [15] for more details on the method). The atoms of the oxide cluster and the point charges were located at the lattice positions of the MgO rock-salt bulk structure using the experimental lattice constant of 4.208 Å.
Several energy quantities are used to analyse the adsorbed clusters: (i) the adhesion energy (E_adh), evaluated by subtracting the energy of the oxide surface and of the metal cluster, both frozen in their interacting configuration, from the value of the total energy of the system, and by taking the absolute value; (ii) the binding energy of the metal cluster (E_bnd), evaluated by subtracting the energy of the isolated metal atoms from the total energy of the metal cluster in its interacting configuration, and by taking the absolute value; (iii) the metal cluster distortion energy (E_dist), which corresponds to the difference between the energy of the metal cluster in the configuration interacting with the surface minus the energy of the cluster in its lowest-energy gas-phase configuration (a positive quantity); (iv) the oxide distortion energy (ΔE_ox), evaluated by subtracting the energy of the relaxed isolated defected oxide from the energy of the isolated defected oxide in the interacting configuration; and (v) the total binding energy (E_tot), which is the sum of the binding energy of the metal cluster and the adhesion energy, minus the oxide distortion energy (E_tot = E_bnd + E_adh - ΔE_ox). Note that the total binding energy of gas-phase clusters in their global minima can be obtained by summing E_bnd + E_dist.
Combining satellite photographs and raster lidar data for channel connectivity in tidal marshes.
NASA Astrophysics Data System (ADS)
Li, Zhi; Hodges, Ben
2017-04-01
High resolution airborne lidar is capable of providing topographic detail down to the 1 x 1 m scale or finer over large tidal marshes of a river delta. Such data sets can be challenging to develop and ground-truth due to the inherent complexities of the environment, the relatively small changes in elevation throughout a marsh, and practical difficulties in accessing the variety of flooded, dry, and muddy regions. Standard lidar point-cloud processing techniques (as typically applied in large lidar data collection programs) have a tendency to mis-identify narrow channels and water connectivity in a marsh, which makes it difficult to directly use such data for modeling marsh flows. Unfortunately, it is not always practical, or even possible, to access the point cloud and re-analyze the raw lidar data when discrepancies have been found in a raster work product. Faced with this problem in preparing a model of the Trinity River delta (Texas, USA), we developed an approach that integrates analysis of a lidar-based raster with satellite images. Our primary goal was to identify the clear land/water boundaries needed to identify channelization in the available rasterized lidar data. The channel extraction method uses pixelized satellite photographs that are stretched/distorted with image-processing techniques to match identifiable control features in both the lidar and photographic data sets. A k-means clustering algorithm was applied to cluster pixels based on their colors, which is effective in separating land and water in a satellite photograph. The clustered image was matched to the lidar data such that the combination shows the channel network. In effect, we are able to exploit the fact that the satellite photograph is of higher resolution than the lidar data, and thus provides connectivity in the clustering at a finer scale.
The principal limitation of the method arises where the satellite image and the lidar suffer from similar problems. For example, vegetation overhanging a narrow channel might show up as higher-elevation land in the lidar data and also as a non-water cluster color in the satellite photo.
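The colour-based land/water separation step can be sketched with a plain k-means over RGB pixels. The pixel values are invented stand-ins for muddy land and dark water, and the naive seeding is an assumption; real imagery would need more careful initialization:

```python
def kmeans_labels(X, k, iters=20):
    """Plain k-means; returns the cluster label of each input vector."""
    C = [list(X[i]) for i in range(k)]
    for _ in range(iters):
        G = [[] for _ in range(k)]
        for x in X:
            G[min(range(k),
                  key=lambda i: sum((a - b) ** 2 for a, b in zip(x, C[i])))].append(x)
        C = [[sum(col) / len(g) for col in zip(*g)] if g else c
             for g, c in zip(G, C)]
    return [min(range(k),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(x, C[i])))
            for x in X]

# Hypothetical RGB pixels: muddy-brown "land" vs dark-blue "water".
pixels = [(120, 100, 60), (130, 110, 70), (115, 95, 55),
          (20, 40, 90), (25, 45, 100), (15, 35, 85)]
labels = kmeans_labels(pixels, k=2)
```

The binary label image is then registered against the lidar raster to recover the channel network.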
Managing distance and covariate information with point-based clustering.
Whigham, Peter A; de Graaf, Brandon; Srivastava, Rashmi; Glue, Paul
2016-09-01
Geographic perspectives of disease and the human condition often involve point-based observations and questions of clustering or dispersion within a spatial context. These problems involve a finite set of point observations and are constrained by a larger, but finite, set of locations where the observations could occur. Developing a rigorous method for pattern analysis in this context requires handling spatial covariates, a method for constrained finite spatial clustering, and addressing bias in geographic distance measures. An approach based on Ripley's K, applied to the problem of clustering of deliberate self-harm (DSH), is presented. Point-based Monte-Carlo simulation of Ripley's K, accounting for socio-economic deprivation and sources of distance measurement bias, was developed to estimate clustering of DSH at a range of spatial scales. A rotated Minkowski L1 distance metric allowed variation in physical distance and clustering to be assessed. Self-harm data was derived from an audit of 2 years' emergency hospital presentations (n = 136) in a New Zealand town (population ~50,000). The study area was defined by residential (housing) land parcels representing a finite set of possible point addresses. Area-based deprivation was spatially correlated. Accounting for deprivation and distance bias showed evidence for clustering of DSH at spatial scales up to 500 m with a one-sided 95% CI, suggesting that social contagion may be present for this urban cohort. Many problems involve finite locations in geographic space that require estimates of distance-based clustering at many scales. A Monte-Carlo approach to Ripley's K, incorporating covariates and models for distance bias, is crucial when assessing health-related clustering. The case study showed that social network structure defined at the neighbourhood level may account for aspects of neighbourhood clustering of DSH.
Accounting for covariate measures that exhibit spatial clustering, such as deprivation, is crucial when assessing point-based clustering.
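The Monte-Carlo envelope test described in this abstract can be illustrated in a few lines. The sketch below uses synthetic candidate locations (not the study's DSH data), a naive K estimator, and no covariate or distance-bias adjustment; the null model resamples addresses from the finite candidate set, as the paper's constrained setting requires.

```python
import itertools
import math
import random

def ripley_k(points, r, area):
    """Naive Ripley's K estimate at scale r for a planar point pattern."""
    n = len(points)
    close = sum(1 for p, q in itertools.permutations(points, 2)
                if math.dist(p, q) <= r)
    return area * close / (n * (n - 1))

random.seed(0)
# Finite candidate locations (standing in for residential parcels).
candidates = [(random.uniform(0, 10), random.uniform(0, 10))
              for _ in range(200)]
observed = candidates[:20]   # pretend these are the case addresses
r, area = 2.0, 100.0
k_obs = ripley_k(observed, r, area)

# Monte-Carlo envelope: resample 20 addresses from the candidate set,
# so the null model respects the finite set of possible locations.
sims = sorted(ripley_k(random.sample(candidates, 20), r, area)
              for _ in range(99))
upper_95 = sims[94]          # one-sided 95% envelope
clustered = k_obs > upper_95
print(clustered)
```

A production analysis would evaluate K over a grid of scales and weight the null samples by covariates such as deprivation, which this sketch omits.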
Lalonde, Michel; Wells, R Glenn; Birnie, David; Ruddy, Terrence D; Wassenaar, Richard
2014-07-01
Phase analysis of single photon emission computed tomography (SPECT) radionuclide angiography (RNA) has been investigated for its potential to predict the outcome of cardiac resynchronization therapy (CRT). However, phase analysis may be limited in its potential for predicting CRT outcome, as valuable information may be lost by assuming that time-activity curves (TACs) follow a simple sinusoidal shape. A new method, cluster analysis, is proposed, which directly evaluates the TACs and may lead to a better understanding of dyssynchrony patterns and CRT outcome. Cluster analysis algorithms were developed and optimized to maximize their ability to predict CRT response. Forty-nine patients (N = 27 with ischemic etiology) received a SPECT RNA scan as well as positron emission tomography (PET) perfusion and viability scans prior to undergoing CRT. A semiautomated algorithm sampled the left ventricle wall to produce 568 TACs from SPECT RNA data. The TACs were then subjected to two different cluster analysis techniques, K-means and normal average, in which several input metrics were varied to determine the optimal settings for the prediction of CRT outcome. Each TAC was assigned to a cluster group based on the comparison criteria, and global and segmental cluster sizes and scores were used as measures of dyssynchrony and to predict response to CRT. A repeated random twofold cross-validation technique was used to train and validate the cluster algorithm. Receiver operating characteristic (ROC) analysis was used to calculate the area under the curve (AUC) and compare results to those obtained with SPECT RNA phase analysis and PET scar size analysis methods. Using the normal average cluster analysis approach, the septal wall produced statistically significant results for predicting CRT response in the ischemic population (ROC AUC = 0.73; p < 0.05 vs. equal chance ROC AUC = 0.50), with an optimal operating point of 71% sensitivity and 60% specificity.
Cluster analysis results were similar to SPECT RNA phase analysis (ROC AUC = 0.78, p = 0.73 vs. cluster AUC; sensitivity/specificity = 59%/89%) and PET scar size analysis (ROC AUC = 0.73, p = 1.0 vs. cluster AUC; sensitivity/specificity = 76%/67%). A SPECT RNA cluster analysis algorithm was developed for the prediction of CRT outcome. Cluster analysis produced results equivalent to those obtained from Fourier and scar analysis.
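The ROC AUC summary used throughout this abstract has a simple rank interpretation: the probability that a randomly chosen responder scores higher than a randomly chosen non-responder (the Mann-Whitney statistic). A minimal sketch on synthetic dyssynchrony scores, not the study's patient data:

```python
# ROC AUC as the Mann-Whitney statistic on two score lists.
def roc_auc(pos_scores, neg_scores):
    """Probability a positive case outranks a negative one (ties count 0.5)."""
    wins = sum((p > n) + 0.5 * (p == n)
               for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))

responders = [0.9, 0.8, 0.7, 0.6, 0.4]       # synthetic dyssynchrony scores
non_responders = [0.5, 0.3, 0.2, 0.35, 0.1]
print(roc_auc(responders, non_responders))
```

With these made-up scores one pair is misordered (0.4 vs. 0.5), giving an AUC of 24/25 = 0.96; an AUC of 0.5 corresponds to the "equal chance" baseline quoted in the abstract.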
Acoustical sensing of cardiomyocyte cluster beating
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tymchenko, Nina; Kunze, Angelika; Dahlenborg, Kerstin
2013-06-14
Highlights: •An example of the application of QCM-D to live cell studies. •Detection of human pluripotent stem cell-derived cardiomyocyte cluster beating. •Clusters were studied in a thin liquid film and in a large liquid volume. •The QCM-D beating profile provides an individual fingerprint of the hPS-CMCs. -- Abstract: Spontaneously beating human pluripotent stem cell-derived cardiomyocyte clusters (CMCs) represent an excellent in vitro tool for studies of human cardiomyocyte function and for pharmacological cardiac safety assessment. Such testing typically requires highly trained operators, precision plating, or large cell quantities, and there is a demand for real-time, label-free monitoring of small cell quantities, especially rare cells and tissue-like structures. Array formats based on sensing of electrical or optical properties of cells are being developed and are in use by the pharmaceutical industry. A potential alternative to these techniques is the quartz crystal microbalance with dissipation monitoring (QCM-D) technique, an acoustic surface-sensitive technique that measures changes in mass and viscoelastic properties close to the sensor surface (from nm to μm). An increasing number of studies have successfully applied QCM-D to monitor properties of cells and cellular processes. In the present study, we show that spontaneous beating of CMCs on QCM-D sensors can be clearly detected, both in the frequency and in the dissipation signals. Beating rates in the range of 66–168 bpm were detected and confirmed by simultaneous light microscopy. The QCM-D beating profile was found to provide individual fingerprints of the hPS-CMCs. The presented results point towards acoustical assays for evaluating cardiotoxicity.
Alaulamie, Arwa A; Baral, Susil; Johnson, Samuel C; Richardson, Hugh H
2017-01-01
An optical nanothermometer technique based on laser trapping, moving, and targeted attachment of an erbium oxide nanoparticle cluster is developed to measure local temperature. The authors apply this new nanoscale temperature measurement technique (with resolution limited by the size of the nanoparticles) to measure the temperature of vapor nucleation in water. Vapor nucleation is observed after superheating water above the boiling point for degassed and nondegassed water. The average nucleation temperature for water without gas is 560 K, but this temperature is lowered by 100 K when gas is introduced into the water. The authors are able to measure the temperature inside the bubble during bubble formation and find that it spikes to over 1000 K because the heat source (optically heated nanorods) is no longer in contact with liquid water and heat dissipation is greatly reduced. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Comparative assessment of bone pose estimation using Point Cluster Technique and OpenSim.
Lathrop, Rebecca L; Chaudhari, Ajit M W; Siston, Robert A
2011-11-01
Estimating the position of the bones from optical motion capture data is a challenge associated with human movement analysis. Bone pose estimation techniques such as the Point Cluster Technique (PCT) and simulations of movement through software packages such as OpenSim are used to minimize soft tissue artifact and estimate skeletal position; however, using different methods for analysis may produce differing kinematic results which could lead to differences in clinical interpretation such as a misclassification of normal or pathological gait. This study evaluated the differences present in knee joint kinematics as a result of calculating joint angles using various techniques. We calculated knee joint kinematics from experimental gait data using the standard PCT, the least squares approach in OpenSim applied to experimental marker data, and the least squares approach in OpenSim applied to the results of the PCT algorithm. Maximum and resultant RMS differences in knee angles were calculated between all techniques. We observed differences in flexion/extension, varus/valgus, and internal/external rotation angles between all approaches. The largest differences were between the PCT results and all results calculated using OpenSim. The RMS differences averaged nearly 5° for flexion/extension angles with maximum differences exceeding 15°. Average RMS differences were relatively small (< 1.08°) between results calculated within OpenSim, suggesting that the choice of marker weighting is not critical to the results of the least squares inverse kinematics calculations. The largest difference between techniques appeared to be a constant offset between the PCT and all OpenSim results, which may be due to differences in the definition of anatomical reference frames, scaling of musculoskeletal models, and/or placement of virtual markers within OpenSim. 
Different methods for data analysis can produce largely different kinematic results, which could lead to the misclassification of normal or pathological gait. Improved techniques to allow non-uniform scaling of generic models to more accurately reflect subject-specific bone geometries and anatomical reference frames may reduce differences between bone pose estimation techniques and allow for comparison across gait analysis platforms.
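The RMS and maximum differences used above to compare knee-angle series are straightforward to compute. A minimal sketch with made-up flexion angles (not the study's gait data; real series would span a full gait cycle):

```python
import math

def rms_difference(a, b):
    """Root-mean-square difference between two equal-length angle series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

# Synthetic flexion/extension angles (degrees) from two hypothetical methods.
pct_flexion = [10.0, 25.0, 40.0, 30.0, 15.0]
opensim_flexion = [12.0, 28.0, 44.0, 33.0, 17.0]

rms = rms_difference(pct_flexion, opensim_flexion)
max_diff = max(abs(x - y) for x, y in zip(pct_flexion, opensim_flexion))
print(round(rms, 2), max_diff)   # RMS ≈ 2.9°, maximum difference 4.0°
```

Note that a constant offset between methods, such as the one the study attributes to reference-frame definitions, inflates the RMS without changing the shape of the curves; subtracting the mean difference before comparing would isolate shape disagreement.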
Wolf, Antje; Kirschner, Karl N
2013-02-01
With improvements in computer speed and algorithm efficiency, MD simulations are sampling larger numbers of molecular and biomolecular conformations. Being able to qualitatively and quantitatively sift these conformations into meaningful groups is a difficult and important task, especially when considering the structure-activity paradigm. Here we present a study that combines two popular techniques, principal component (PC) analysis and clustering, for revealing major conformational changes that occur in molecular dynamics (MD) simulations. Specifically, we explored how clustering different PC subspaces affects the resulting clusters versus clustering the complete trajectory data. As a case example, we used the trajectory data from an explicitly solvated simulation of a bacterial L11·23S ribosomal subdomain, which is a target of thiopeptide antibiotics. Clustering was performed, using K-means and average-linkage algorithms, on data involving the first two to the first five PC subspace dimensions. For the average-linkage algorithm we found that data-point membership, cluster shape, and cluster size depended on the selected PC subspace data. In contrast, K-means provided very consistent results regardless of the selected subspace. Since we present results on a single model system, generalization concerning the clustering of different PC subspaces of other molecular systems is currently premature. However, our hope is that this study illustrates a) the complexities in selecting the appropriate clustering algorithm, b) the complexities in interpreting and validating their results, and c) that combining PC analysis with subsequent clustering can yield valuable dynamic and conformational information.
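The PC-subspace clustering idea can be sketched compactly. The toy code below extracts the leading principal component by power iteration on the covariance matrix and splits the projected samples in two; the synthetic 3-D "conformations" and the crude mean-threshold split (standing in for K-means with k = 2 on the 1-D subspace) are illustrative assumptions, not the paper's trajectory data or algorithms.

```python
import math
import random

random.seed(1)
# Two toy "conformational states" in 3-D, separated along one axis.
data = [[random.gauss(0, 0.1), random.gauss(0, 0.1), random.gauss(c, 0.1)]
        for c in (0.0,) * 25 + (3.0,) * 25]
dim = 3

mean = [sum(row[d] for row in data) / len(data) for d in range(dim)]
centered = [[row[d] - mean[d] for d in range(dim)] for row in data]
cov = [[sum(r[i] * r[j] for r in centered) / len(data) for j in range(dim)]
       for i in range(dim)]

# Power iteration: leading eigenvector of the covariance matrix (PC1).
v = [1.0, 1.0, 1.0]
for _ in range(100):
    w = [sum(cov[i][j] * v[j] for j in range(dim)) for i in range(dim)]
    norm = math.sqrt(sum(x * x for x in w))
    v = [x / norm for x in w]

# PC1 scores, then a crude two-way split at the mean score.
scores = [sum(r[d] * v[d] for d in range(dim)) for r in centered]
split = sum(scores) / len(scores)
labels = [int(s > split) for s in scores]
print(labels.count(0), labels.count(1))
```

On this data the two states are perfectly recovered from the one-dimensional PC1 subspace; the paper's point is that for some algorithms the outcome changes as more PC dimensions are retained.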
Fast ground filtering for TLS data via Scanline Density Analysis
NASA Astrophysics Data System (ADS)
Che, Erzhuo; Olsen, Michael J.
2017-07-01
Terrestrial Laser Scanning (TLS) efficiently collects 3D information based on lidar (light detection and ranging) technology. TLS has been widely used in topographic mapping, engineering surveying, forestry, industrial facilities, cultural heritage, and so on. Ground filtering is a common procedure in lidar data processing, which separates the point cloud data into ground points and non-ground points. Effective ground filtering is helpful for subsequent procedures such as segmentation, classification, and modeling. Numerous ground filtering algorithms have been developed for Airborne Laser Scanning (ALS) data. However, many of these are error prone when applied to TLS data because of its different angle of view and highly variable resolution. Further, many ground filtering techniques are limited in challenging topography and have difficulty coping with features such as short vegetation and steep slopes. Lastly, due to the large size of point cloud data, operations such as data traversal, multiple iterations, and neighbor searching significantly affect the computational efficiency. In order to overcome these challenges, we present an efficient ground filtering method for TLS data via Scanline Density Analysis, which is very fast because it exploits the grid structure used to store TLS data. The process first separates the ground candidates, density features, and unidentified points based on an analysis of point density within each scanline. Second, a region growth using the scan pattern is performed to cluster the ground candidates and further refine the ground points (clusters). In the experiment, the effectiveness, parameter robustness, and efficiency of the proposed method are demonstrated with datasets collected from an urban scene and a natural scene, respectively.
Galaxy Distribution in Clusters of Galaxies
NASA Astrophysics Data System (ADS)
Okamoto, T.; Yachi, S.; Habe, A.
The beta-discrepancy has been pointed out in comparisons of optical and X-ray observations of clusters of galaxies. To examine the physical reason for the beta-discrepancy, we use an N-body simulation containing two components: dark particles and galaxies, the latter identified using an adaptive-linking friends-of-friends technique at a certain redshift. The gas component is not included here, since the gas distribution follows the dark matter distribution in dark halos (Julio F. Navarro, Carlos S. Frenk and Simon D. M. White 1995). We find that the galaxy distribution follows the dark matter distribution, and therefore the beta-discrepancy does not exist; this result is consistent with the interpretation of the beta-discrepancy by Bahcall and Lubin (1994), which was based on recent observations.
Simulation studies for surfaces and materials strength
NASA Technical Reports Server (NTRS)
Halicioglu, Timur
1987-01-01
A realistic potential energy function comprising angle-dependent terms was employed to describe the potential surface of the N+O2 system. The potential energy parameters were obtained from high-level ab initio results using a nonlinear fitting procedure. It was shown that the potential function is able to reproduce a large number of points on the potential surface with a small rms deviation. A literature survey was conducted to analyze the status of current small-cluster research. This survey proved quite useful for understanding the existing relationship between the theoretical and experimental investigative techniques employed by different researchers. Additionally, the important role played by computer simulation in small-cluster research was documented.
NASA Astrophysics Data System (ADS)
Kumar, Rakesh; Chandrawat, Rajesh Kumar; Garg, B. P.; Joshi, Varun
2017-07-01
Opening a new firm or branch with the desired performance is closely related to the facility location problem. Along the same lines, when locating new ambulances and firehouses, a government seeks to minimize the average emergency response time for all residents of a city, so finding the best location is a major practical challenge. Problems of this type are known as facility location problems, and many algorithms have been developed to handle them. In this paper, we review five algorithms that have been applied to facility location problems, and we also present the significance of clustering in such problems. First we compare the Fuzzy c-means (FCM) clustering algorithm with the alternating heuristic (AH) algorithm, and then with Particle Swarm Optimization (PSO) algorithms using different types of distance functions. The data were clustered with the help of FCM, and then the median model and the min-max model were applied to the clustered data. After finding optimized locations using these algorithms, we computed the distance from each optimized location to the demand points with different distance techniques and compared the results. Finally, we design a general example to validate the feasibility of the five algorithms for facility location optimization and to assess their advantages and drawbacks.
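The fuzzy c-means step that the paper applies before its location models can be sketched minimally. The toy 1-D demand points, the restriction to two clusters, and the deterministic min/max initialization are all illustrative assumptions, not the paper's setup.

```python
def fuzzy_cmeans_1d(points, m=2.0, iters=40):
    """Two-cluster fuzzy c-means on 1-D data with a min/max initialization."""
    centers = [min(points), max(points)]
    u = [[0.0, 0.0] for _ in points]
    for _ in range(iters):
        # Membership update: inverse-distance ratios raised to 2/(m-1).
        for i, x in enumerate(points):
            d = [max(abs(x - ctr), 1e-12) for ctr in centers]
            for j in range(2):
                u[i][j] = 1.0 / sum((d[j] / d[k]) ** (2.0 / (m - 1.0))
                                    for k in range(2))
        # Centre update: membership-weighted mean of all points.
        for j in range(2):
            w = [ui[j] ** m for ui in u]
            centers[j] = sum(wi * x for wi, x in zip(w, points)) / sum(w)
    return sorted(centers), u

demand = [1.0, 1.2, 0.8, 9.0, 9.3, 8.7]   # two obvious demand clusters
centers, u = fuzzy_cmeans_1d(demand)
print([round(c, 1) for c in centers])
```

In a facility-location pipeline the resulting cluster centres (here near 1 and 9) would serve as candidate facility sites, which the median or min-max models then refine against the chosen distance function.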
Mammographic images segmentation based on chaotic map clustering algorithm
2014-01-01
Background This work investigates the applicability of a novel clustering approach to the segmentation of digital mammographic images. The chaotic map clustering algorithm is used to group together similar subsets of image pixels, resulting in a medically meaningful partition of the mammogram. Methods The image is divided into pixel subsets characterized by a set of conveniently chosen features, and each of the corresponding points in the feature space is associated with a map. A mutual coupling strength between the maps, depending on the associated distance between feature space points, is subsequently introduced. On the system of maps, the simulated evolution through chaotic dynamics leads to its natural partitioning, which corresponds to a particular segmentation scheme of the initial mammographic image. Results The system provides a high recognition rate for small mass lesions (about 94% correctly segmented inside the breast) and reproduces the shape of regions with denser micro-calcifications in about 2/3 of the cases, while being less effective at identifying larger mass lesions. Conclusions We can summarize our analysis by asserting that, due to the particularities of mammographic images, the chaotic map clustering algorithm should not be used as the sole method of segmentation. Rather, the joint use of this method along with other segmentation techniques could successfully increase the segmentation performance and provide extra information for subsequent analysis stages such as the classification of the segmented ROI. PMID:24666766
NASA Astrophysics Data System (ADS)
Syakur, M. A.; Khotimah, B. K.; Rochman, E. M. S.; Satoto, B. D.
2018-04-01
Clustering is a data mining technique used to analyse data that are numerous and highly varied. Clustering is the process of grouping data into clusters so that each cluster contains data that are as similar as possible to each other and as different as possible from the objects of other clusters. Indonesian SMEs have a variety of customers, but they lack a mapping of these customers and therefore do not know which customers are loyal. Customer mapping is a grouping by customer profile that facilitates analysis and SME policy in the production of goods, especially batik sales. We use a combination of the K-means method with the elbow method to make K-means more efficient and effective in processing large amounts of data. K-means clustering is a local optimization method that is sensitive to the choice of the initial cluster centres: a poor choice of starting centres causes the K-means algorithm to produce high errors and poor cluster results. The K-means algorithm also has difficulty determining the best number of clusters, so the elbow method is used to search for the best number of clusters for K-means. The results show that the elbow method can produce the same best number of clusters K for different amounts of data; this number then serves as the default for the characteristic process in the case study. Evaluating K-means by SSE over 500 batik-visitor records produced the best clusters: the SSE curve shows a sharp decrease at K = 3, so K = 3 is taken as the cut-off point and the best number of clusters.
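The elbow heuristic described above can be sketched as follows. The 1-D toy data, the deterministic initialization, and the tiny cluster count are illustrative assumptions, not the paper's batik-customer records; the idea is only that the SSE drops sharply up to the true number of groups and flattens after it.

```python
def kmeans_sse(points, k, iters=50):
    """1-D k-means (deterministic spread init); returns the final SSE."""
    pts = sorted(points)
    if k == 1:
        centers = [sum(pts) / len(pts)]
    else:
        centers = [pts[(i * (len(pts) - 1)) // (k - 1)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in pts:
            nearest = min(range(k), key=lambda j: (x - centers[j]) ** 2)
            groups[nearest].append(x)
        centers = [sum(g) / len(g) if g else centers[j]
                   for j, g in enumerate(groups)]
    return sum(min((x - c) ** 2 for c in centers) for x in pts)

# Toy data with three well-separated groups: the elbow should sit at K = 3.
data = [0.0, 0.2, 0.4, 5.0, 5.2, 5.4, 10.0, 10.2, 10.4]
sse = {k: kmeans_sse(data, k) for k in range(1, 6)}
print({k: round(s, 2) for k, s in sse.items()})
```

Plotting SSE against K, the drop from K = 2 to K = 3 dwarfs the drop from K = 3 to K = 4, which is the "sharp decrease then flattening" the abstract uses to pick its cut-off.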
Sun, Liping; Luo, Yonglong; Ding, Xintao; Zhang, Ji
2014-01-01
An important component of a spatial clustering algorithm is the distance measure between sample points in object space. In this paper, the traditional Euclidean distance measure is replaced with an obstacle distance measure for spatial clustering under obstacle constraints. Firstly, we present a path searching algorithm to approximate the obstacle distance between two points while dealing with obstacles and facilitators. Taking obstacle distance as the similarity metric, we subsequently propose the artificial immune clustering with obstacle entity (AICOE) algorithm for clustering spatial point data in the presence of obstacles and facilitators. Finally, the paper presents a comparative analysis of the AICOE algorithm and classical clustering algorithms. Our clustering model based on an artificial immune system is also applied to a public facility location problem in order to establish the practical applicability of our approach. By using the clonal selection principle and updating the cluster centers based on the elite antibodies, the AICOE algorithm is able to achieve the global optimum and a better clustering effect.
IDENTIFICATION OF MEMBERS IN THE CENTRAL AND OUTER REGIONS OF GALAXY CLUSTERS
DOE Office of Scientific and Technical Information (OSTI.GOV)
Serra, Ana Laura; Diaferio, Antonaldo, E-mail: serra@ph.unito.it
2013-05-10
The caustic technique measures the mass of galaxy clusters in both their virial and infall regions and, as a byproduct, yields the list of cluster galaxy members. Here we use 100 galaxy clusters with mass M_200 >= 10^14 h^-1 M_Sun extracted from a cosmological N-body simulation of a ΛCDM universe to test the ability of the caustic technique to identify the cluster galaxy members. We identify the true three-dimensional members as the gravitationally bound galaxies. The caustic technique uses the caustic location in the redshift diagram to separate the cluster members from the interlopers. We apply the technique to mock catalogs containing 1000 galaxies in a field of view 12 h^-1 Mpc on a side at the cluster location. On average, this sample size roughly corresponds to 180 real galaxy members within 3r_200, similar to recent redshift surveys of cluster regions. The caustic technique yields a completeness, the fraction of identified true members, f_c = 0.95 ± 0.03, within 3r_200. The contamination, the fraction of interlopers in the observed catalog of members, increases from f_i = 0.020^{+0.046}_{-0.015} at r_200 to f_i = 0.08^{+0.11}_{-0.05} at 3r_200. No other technique for the identification of the members of a galaxy cluster provides such large completeness and small contamination at these large radii. The caustic technique assumes spherical symmetry, and the asphericity of the cluster is responsible for most of the spread of the completeness and the contamination. By applying the technique to an approximately spherical system obtained by stacking the individual clusters, the spreads decrease by at least a factor of two.
We finally estimate the cluster mass within 3r_200 after removing the interlopers: for individual clusters, the mass estimated with the virial theorem is unbiased and within 30% of the actual mass; this spread decreases to less than 10% for the spherically symmetric stacked cluster.
NASA Technical Reports Server (NTRS)
Hada, M.; Saganti, P. B.; Gersey, B.; Wilkins, R.; Cucinotta, F. A.; Wu, H.
2007-01-01
Most of the reported studies of break point distributions on chromosomes damaged by radiation exposure were carried out with the G-banding technique or were based on the relative lengths of the broken chromosomal fragments. However, these techniques lack accuracy in comparison with the later-developed multicolor banding in situ hybridization (mBAND) technique, which is generally used for the analysis of intrachromosomal aberrations such as inversions. Using mBAND, we studied chromosome aberrations in human epithelial cells exposed in vitro to low or high dose rate gamma rays in Houston, low dose rate secondary neutrons at Los Alamos National Laboratory, and high dose rate 600 MeV/u Fe ions at NASA Space Radiation Laboratory. Detailed analysis of the inversion types revealed that all three radiation types induced a low incidence of simple inversions. Half of the inversions observed after neutron or Fe ion exposure, and the majority of inversions in gamma-irradiated samples, were accompanied by other types of intrachromosomal aberrations. In addition, neutrons and Fe ions induced a significant fraction of inversions that involved complex rearrangements of both inter- and intrachromosomal exchanges. We further compared the distribution of break points on chromosome 3 for the three radiation types. The break points were found to be randomly distributed on chromosome 3 after neutron or Fe ion exposure, whereas a non-random distribution with clustered break points was observed for gamma rays. The break point distribution may serve as a potential fingerprint of high-LET radiation exposure.
An adaptive clustering algorithm for image matching based on corner feature
NASA Astrophysics Data System (ADS)
Wang, Zhe; Dong, Min; Mu, Xiaomin; Wang, Song
2018-04-01
Traditional image matching algorithms struggle to balance speed and accuracy; to solve this problem, an adaptive clustering algorithm for image matching based on corner features is proposed in this paper. The method is based on the similarity of the vectors formed by matching point pairs, and adaptive clustering is performed on the matching point pairs. Harris corner detection is carried out first to extract the feature points of the reference image and the sensed image, and the feature points of the two images are initially matched using the Normalized Cross Correlation (NCC) function. Then, using the improved algorithm proposed in this paper, the matching results are clustered to reduce ineffective operations and improve matching speed and robustness. Finally, the Random Sample Consensus (RANSAC) algorithm is applied to the matched points after clustering. The experimental results show that the proposed algorithm can effectively eliminate most wrong matching points while retaining the correct ones, improving the accuracy of RANSAC matching and reducing the computational load of the whole matching process.
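The NCC score used for the initial corner matching can be sketched on flattened patches. The patch values below are synthetic; a real pipeline would evaluate this score between windows around Harris corners and then follow with the clustering and RANSAC stages the abstract describes.

```python
import math

def ncc(a, b):
    """Normalized cross correlation between two equal-size patches
    (flattened to lists); 1.0 means identical up to gain/offset changes."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    da = [x - ma for x in a]
    db = [y - mb for y in b]
    num = sum(x * y for x, y in zip(da, db))
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return num / den

patch = [10, 20, 30, 40, 50, 60, 70, 80, 90]   # a 3x3 patch, flattened
brighter = [v * 2 + 5 for v in patch]          # same pattern, new gain/offset
reversed_patch = patch[::-1]
print(round(ncc(patch, brighter), 3), round(ncc(patch, reversed_patch), 3))
```

Because the score subtracts the means and normalizes by the standard deviations, the brightened copy still scores exactly 1.0, which is why NCC is a common first-pass matcher under illumination changes.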
A curvature-based weighted fuzzy c-means algorithm for point clouds de-noising
NASA Astrophysics Data System (ADS)
Cui, Xin; Li, Shipeng; Yan, Xiutian; He, Xinhua
2018-04-01
In order to remove noise from three-dimensional scattered point clouds and smooth the data without damaging sharp geometric features, a novel algorithm is proposed in this paper. A feature-preserving weight is added to the fuzzy c-means algorithm, yielding a curvature-weighted fuzzy c-means clustering algorithm. Firstly, large-scale outliers are removed based on the statistics of neighbouring points within radius r. Then, the algorithm estimates the curvature of the point cloud data using a paraboloid fitting method and calculates the curvature feature value. Finally, the proposed clustering algorithm is used to calculate the weighted cluster centres, and these centres are taken as the new points. The experimental results show that this approach handles noise of different scales and intensities in point clouds with high precision while preserving features, and that it is robust to different noise models.
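The first stage, removing large-scale outliers by counting neighbours within radius r, can be sketched as follows. Toy 2-D points stand in for a 3-D scan, and the radius and neighbour threshold are illustrative choices, not the paper's parameters.

```python
import math

def remove_outliers(cloud, r=1.0, min_neighbors=2):
    """Keep points whose r-radius neighbourhood holds enough other points."""
    kept = []
    for i, p in enumerate(cloud):
        neighbors = sum(1 for j, q in enumerate(cloud)
                        if i != j and math.dist(p, q) <= r)
        if neighbors >= min_neighbors:
            kept.append(p)
    return kept

cloud = [(0.0, 0.0), (0.3, 0.1), (0.1, 0.4), (0.5, 0.2),   # dense surface
         (9.0, 9.0)]                                        # isolated outlier
clean = remove_outliers(cloud)
print(len(clean))
```

This brute-force scan is quadratic in the number of points; a k-d tree or voxel grid would be used on real scans before the curvature-weighted clustering stage.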
NASA Astrophysics Data System (ADS)
Singh, Jagadish; Taura, Joel John
2014-06-01
This paper studies the motion of an infinitesimal mass in the framework of the restricted three-body problem (R3BP) under the assumption that the primaries of the system are radiating oblate spheroids enclosed by a circular cluster of material points. It examines the effects of radiation, of oblateness up to J4 of the primaries, and of the potential created by the circular cluster on the linear stability of the libration points of the infinitesimal mass. The libration points are found to be stable for 0 < μ < μ_c and unstable for μ_c ≤ μ ≤ 1/2, where μ_c is the critical mass value, which depends on terms involving the parameters that characterize the oblateness, the radiation forces, and the circular cluster of material points. The oblateness up to J4 of the primaries and the gravitational potential from the circular cluster of material points have stabilizing propensities, while the radiation of the primaries and the oblateness up to J2 of the primaries have destabilizing tendencies. The combined effect of these perturbations on the stability of the triangular libration points is stabilizing.
Goal Profiles, Mental Toughness and its Influence on Performance Outcomes among Wushu Athletes
Roy, Jolly
2007-01-01
This study examined the association between goal orientations and mental toughness and their influence on performance outcomes in competition. Wushu athletes (n = 40) competing in intervarsity championships in Malaysia completed the Task and Ego Orientation in Sport Questionnaire (TEOSQ) and the Psychological Performance Inventory (PPI). Using cluster analysis techniques, including hierarchical methods and the non-hierarchical k-means method, to examine goal profiles, a three-cluster solution emerged: cluster 1 - high task and moderate ego (HT/ME), cluster 2 - moderate task and low ego (MT/LE), and cluster 3 - moderate task and moderate ego (MT/ME). Analysis of the fundamental areas of mental toughness based on goal profiles revealed that athletes in cluster 1 scored significantly higher on negative energy control than athletes in cluster 2. Further, athletes in cluster 1 also scored significantly higher on positive energy control than athletes in cluster 3. A chi-square (χ2) test revealed no significant differences among athletes with different goal profiles on performance outcomes in the competition. However, significant differences were observed between medallists and non-medallists in self-confidence (p = 0.001) and negative energy control (p = 0.042). Medallists scored significantly higher on self-confidence (mean = 21.82 ± 2.72) and negative energy control (mean = 19.59 ± 2.32) than the non-medallists (self-confidence mean = 18.76 ± 2.49; negative energy control mean = 18.14 ± 1.91). Key points Mental toughness can be influenced by certain goal profile combinations. Athletes with successful performance outcomes (medallists) displayed greater mental toughness. PMID:24198700
[Applying the clustering technique for characterising maintenance outsourcing].
Cruz, Antonio M; Usaquén-Perilla, Sandra P; Vanegas-Pabón, Nidia N; Lopera, Carolina
2010-06-01
This study used clustering techniques to characterise companies providing maintenance services to health institutions. The study analysed seven pilot areas' equipment inventory (264 medical devices). Clustering techniques were applied using 26 variables; response time (RT), operation duration (OD), availability and turnaround time (TAT) were amongst the most significant ones. The average biomedical equipment obsolescence value was 0.78. Four service provider clusters were identified: clusters 1 and 3 had better performance and lower TAT, RT and DR values (56% of the providers, coded O, L, C, B, I, S, H, F and G, had 1 to 4 day TAT values:
From the Cluster Temperature Function to the Mass Function at Low Z
NASA Technical Reports Server (NTRS)
Mushotzky, Richard (Technical Monitor); Markevitch, Maxim
2004-01-01
This XMM project consisted of three observations of the nearby, hot galaxy cluster Triangulum Australis, one of the cluster center and two offsets. The goal was to measure the radial gas temperature profile out to large radii and derive the total gravitating mass within the radius of average mass overdensity 500. The central pointing also provides data for a detailed two-dimensional gas temperature map of this interesting cluster. We have analyzed all three observations. The derivation of the temperature map using the central pointing is complete, and the paper is soon to be submitted. During the course of this study and of the analysis of archival XMM cluster observations, it became apparent that the commonly used XMM background flare screening techniques are often not accurate enough for studies of the cluster outer regions. The information on the cluster's total masses is contained at large off-center distances, and it is precisely the temperatures for those low-brightness regions that are most affected by the detector background anomalies. In particular, our two offset observations of the Triangulum have been contaminated by the background flares ("bad cosmic weather") to a degree where they could not be used for accurate spectral analysis. This forced us to expand the scope of our project. We needed to devise a more accurate method of screening and modeling the background flares, and to evaluate the uncertainty of the XMM background modeling. To do this, we have analyzed a large number of archival EPIC blank-field and closed-cover observations. As a result, we have derived stricter background screening criteria. It also turned out that mild flares affecting EPIC-pn can be modeled with an adequate accuracy. Such modeling has been used to derive our Triangulum temperature map. The results of our XMM background analysis, including the modeling recipes, are presented in a paper which is in final preparation and will be submitted soon. 
It will be useful not only for our future analysis but for other XMM cluster observations as well.
Using Machine Learning Techniques in the Analysis of Oceanographic Data
NASA Astrophysics Data System (ADS)
Falcinelli, K. E.; Abuomar, S.
2017-12-01
Acoustic Doppler Current Profilers (ADCPs) are oceanographic tools capable of collecting large amounts of current profile data. Using unsupervised machine learning techniques such as principal component analysis, fuzzy c-means clustering, and self-organizing maps, patterns and trends in an ADCP dataset are found. Cluster validity algorithms such as visual assessment of cluster tendency and clustering index are used to determine the optimal number of clusters in the ADCP dataset. These techniques prove to be useful in analysis of ADCP data and demonstrate potential for future use in other oceanographic applications.
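Of the techniques listed in this abstract, fuzzy c-means is compact enough to sketch directly. The following is a minimal NumPy implementation, not the authors' code; the toy two-blob data, the cluster count, and the fuzzifier m = 2 are illustrative assumptions:

```python
import numpy as np

def fuzzy_cmeans(X, c, m=2.0, n_iter=100, seed=0):
    """Plain NumPy fuzzy c-means: returns cluster centers and the
    membership matrix U (n_samples x c, each row sums to 1)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    U = rng.random((n, c))
    U /= U.sum(axis=1, keepdims=True)          # valid initial memberships
    for _ in range(n_iter):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        # squared distances to each center, small floor to avoid /0
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1) + 1e-12
        # standard membership update for fuzzifier m
        inv = d2 ** (-1.0 / (m - 1))
        U = inv / inv.sum(axis=1, keepdims=True)
    return centers, U

# two well-separated synthetic blobs standing in for ADCP feature vectors
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(5, 0.3, (50, 2))])
centers, U = fuzzy_cmeans(X, c=2)
```

Unlike k-means, each point keeps a graded membership in every cluster, which is what makes the method attractive for noisy current-profile data.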
Method for discovering relationships in data by dynamic quantum clustering
Weinstein, Marvin; Horn, David
2017-05-09
Data clustering is provided according to a dynamical framework based on quantum mechanical time evolution of states corresponding to data points. To expedite computations, we can approximate the time-dependent Hamiltonian formalism by a truncated calculation within a set of Gaussian wave-functions (coherent states) centered around the original points. This allows for analytic evaluation of the time evolution of all such states, opening up the possibility of exploration of relationships among data-points through observation of varying dynamical-distances among points and convergence of points into clusters. This formalism may be further supplemented by preprocessing, such as dimensional reduction through singular value decomposition and/or feature filtering.
Method for discovering relationships in data by dynamic quantum clustering
Weinstein, Marvin; Horn, David
2014-10-28
Data clustering is provided according to a dynamical framework based on quantum mechanical time evolution of states corresponding to data points. To expedite computations, we can approximate the time-dependent Hamiltonian formalism by a truncated calculation within a set of Gaussian wave-functions (coherent states) centered around the original points. This allows for analytic evaluation of the time evolution of all such states, opening up the possibility of exploration of relationships among data-points through observation of varying dynamical-distances among points and convergence of points into clusters. This formalism may be further supplemented by preprocessing, such as dimensional reduction through singular value decomposition and/or feature filtering.
Pellegrini, Michael; Zoghi, Maryam; Jaberzadeh, Shapour
2018-01-12
Cluster analysis and other subgrouping techniques have risen in popularity in recent years in non-invasive brain stimulation research in an attempt to investigate inter-individual variability - the question of why some individuals respond to non-invasive brain stimulation protocols as traditionally expected and others do not. Cluster analysis and subgrouping techniques have been used to categorise individuals, based on their response patterns, as responders or non-responders. There is, however, a lack of consensus and consistency on the most appropriate technique to use. This systematic review aimed to provide a systematic summary of the cluster analysis and subgrouping techniques used to date and to suggest recommendations moving forward. Twenty studies were included that utilised subgrouping techniques, seven of which additionally utilised cluster analysis techniques. The results of this systematic review indicate that statistical cluster analysis techniques are effective in identifying subgroups of individuals based on response patterns to non-invasive brain stimulation. This systematic review also reports a lack of consensus amongst researchers on the most effective subgrouping technique and on the criteria used to determine whether an individual is categorised as a responder or a non-responder. This systematic review provides a step-by-step guide to carrying out statistical cluster analyses and subgrouping techniques, providing a framework for analysis when developing further insights into the contributing factors of inter-individual variability in response to non-invasive brain stimulation.
2011-01-01
Background: The Prospective Space-Time scan statistic (PST) is widely used for the evaluation of space-time clusters of point event data. Usually a window of cylindrical shape is employed, with a circular or elliptical base in the space domain. Recently, the concept of the Minimum Spanning Tree (MST) was applied to specify the set of potential clusters, through the Density-Equalizing Euclidean MST (DEEMST) method, for the detection of arbitrarily shaped clusters. The original map is cartogram-transformed such that the control points are spread uniformly. That method is quite effective, but the cartogram construction is computationally expensive and complicated. Results: A fast method for the detection and inference of space-time disease clusters in point data sets is presented, the Voronoi Based Scan (VBScan). A Voronoi diagram is built for points representing population individuals (cases and controls). The number of Voronoi cell boundaries intercepted by the line segment joining two case points defines the Voronoi distance between those points. That distance is used to approximate the density of the heterogeneous population and to build the Voronoi distance MST linking the cases. The successive removal of edges from the Voronoi distance MST generates sub-trees which are the potential space-time clusters. Finally, those clusters are evaluated through the scan statistic. Monte Carlo replications of the original data are used to evaluate the significance of the clusters. An application to dengue fever in a small Brazilian city is presented. Conclusions: The ability to promptly detect space-time clusters of disease outbreaks, when the number of individuals is large, was shown to be feasible, due to the reduced computational load of VBScan. Instead of changing the map, VBScan modifies the metric used to define the distance between cases, without requiring the cartogram construction.
Numerical simulations showed that VBScan has higher power of detection, sensitivity and positive predictive value than the elliptic PST. Furthermore, as VBScan also incorporates topological information from the point neighborhood structure, in addition to the usual geometric information, it is more robust than purely geometric methods such as the elliptic scan. Those advantages were illustrated in a real setting for dengue fever space-time clusters. PMID:21513556
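The MST step shared by DEEMST and VBScan can be sketched with SciPy: build an MST over the case points and cut its heaviest edges to obtain candidate clusters. For brevity this sketch uses plain Euclidean edge weights and a fixed cluster count, whereas VBScan would weight edges by the Voronoi cell-crossing distance and evaluate candidate sub-trees with the scan statistic:

```python
import numpy as np
from scipy.spatial.distance import squareform, pdist
from scipy.sparse.csgraph import minimum_spanning_tree, connected_components

def mst_clusters(points, n_clusters):
    """Cut the (n_clusters - 1) heaviest MST edges; the surviving connected
    components are the candidate clusters."""
    D = squareform(pdist(points))
    mst = minimum_spanning_tree(D).toarray()
    edges = np.argwhere(mst > 0)               # row-major edge list
    weights = mst[mst > 0]                     # matching weights, same order
    for i in np.argsort(weights)[-(n_clusters - 1):]:
        r, c = edges[i]
        mst[r, c] = 0                          # remove a heaviest edge
    n_comp, labels = connected_components(mst > 0, directed=False)
    return labels

# two toy case blobs; the single bridge edge is the heaviest MST edge
rng = np.random.default_rng(2)
pts = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(4, 0.3, (20, 2))])
labels = mst_clusters(pts, 2)
```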
Synchronization of world economic activity
NASA Astrophysics Data System (ADS)
Groth, Andreas; Ghil, Michael
2017-12-01
Common dynamical properties of business cycle fluctuations are studied in a sample of more than 100 countries that represent economic regions from all around the world. We apply the methodology of multivariate singular spectrum analysis (M-SSA) to identify oscillatory modes and to detect whether these modes are shared by clusters of phase- and frequency-locked oscillators. An extension of the M-SSA approach is introduced to help analyze structural changes in the cluster configuration of synchronization. With this novel technique, we are able to identify a common mode of business cycle activity across our sample, and thus point to the existence of a world business cycle. Superimposed on this mode, we further identify several major events that have markedly influenced the landscape of world economic activity in the postwar era.
NASA Astrophysics Data System (ADS)
Al-Mousa, Amjed A.
Thin films are essential constituents of modern electronic devices and have a multitude of applications in such devices. The impact of the surface morphology of thin films on the device characteristics where these films are used has generated substantial attention to advanced film characterization techniques. In this work, we present a new approach to characterize surface nanostructures of thin films by focusing on isolating nanostructures and extracting quantitative information, such as the shape and size of the structures. This methodology is applicable to any Scanning Probe Microscopy (SPM) data, such as Atomic Force Microscopy (AFM) data which we are presenting here. The methodology starts by compensating the AFM data for some specific classes of measurement artifacts. After that, the methodology employs two distinct techniques. The first, which we call the overlay technique, proceeds by systematically processing the raster data that constitute the scanning probe image in both vertical and horizontal directions. It then proceeds by classifying points in each direction separately. Finally, the results from both the horizontal and the vertical subsets are overlaid, where a final decision on each surface point is made. The second technique, based on fuzzy logic, relies on a Fuzzy Inference Engine (FIE) to classify the surface points. Once classified, these points are clustered into surface structures. The latter technique also includes a mechanism which can consistently distinguish crowded surfaces from those with sparsely distributed structures and then tune the fuzzy technique system uniquely for that surface. Both techniques have been applied to characterize organic semiconductor thin films of pentacene on different substrates. Also, we present a case study to demonstrate the effectiveness of our methodology to identify quantitatively particle sizes of two specimens of gold nanoparticles of different nominal dimensions dispersed on a mica surface. 
A comparison with other techniques, such as thresholding, watershed and edge detection, is presented next. Finally, we present a systematic study of the fuzzy logic technique by experimenting with synthetic data. These results are discussed and compared, along with the challenges of the two techniques.
NASA Astrophysics Data System (ADS)
Bassier, M.; Bonduel, M.; Van Genechten, B.; Vergauwen, M.
2017-11-01
Point cloud segmentation is a crucial step in scene understanding and interpretation. The goal is to decompose the initial data into sets of workable clusters with similar properties. Additionally, it is a key aspect in the automated procedure from point cloud data to BIM. Current approaches typically segment only a single type of primitive, such as planes or cylinders. Also, current algorithms suffer from oversegmenting the data and are often sensor or scene dependent. In this work, a method is presented to automatically segment large unstructured point clouds of buildings. More specifically, the segmentation is formulated as a graph optimisation problem. First, the data is oversegmented with a greedy octree-based region growing method. The growing is conditioned on the segmentation of planes as well as smooth surfaces. Next, the candidate clusters are represented by a Conditional Random Field, after which the most likely configuration of candidate clusters is computed given a set of local and contextual features. The experiments show that the method is a fast and reliable framework for unstructured point cloud segmentation. Processing speeds of up to 40,000 points per second are recorded for the region growing. Additionally, the recall and precision of the graph clustering are approximately 80%. Overall, oversegmentation is reduced by nearly 22% by clustering the data. These clusters will be classified and used as a basis for the reconstruction of BIM models.
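A toy version of the region-growing stage may help fix ideas. The sketch below grows smooth regions over a k-nearest-neighbour graph using normals estimated by local SVD; the paper's actual method is octree-based and feeds a Conditional Random Field, neither of which is reproduced here, and k and the 10° smoothness threshold are arbitrary choices:

```python
import numpy as np
from scipy.spatial import cKDTree

def region_grow(points, k=10, angle_thresh_deg=10.0):
    """Greedy region growing: neighbours join a region when their
    estimated surface normals are nearly parallel (planes only)."""
    tree = cKDTree(points)
    _, knn = tree.query(points, k=k + 1)       # knn[i][0] is the point itself
    normals = np.empty_like(points)
    for i in range(len(points)):
        nbrs = points[knn[i]] - points[knn[i]].mean(axis=0)
        # normal = right singular vector of the smallest singular value
        normals[i] = np.linalg.svd(nbrs)[2][-1]
    cos_t = np.cos(np.radians(angle_thresh_deg))
    labels = -np.ones(len(points), dtype=int)
    region = 0
    for seed in range(len(points)):
        if labels[seed] != -1:
            continue
        stack = [seed]
        labels[seed] = region
        while stack:                            # flood fill over the kNN graph
            i = stack.pop()
            for j in knn[i][1:]:
                if labels[j] == -1 and abs(normals[i] @ normals[j]) > cos_t:
                    labels[j] = region
                    stack.append(j)
        region += 1
    return labels

# two perpendicular synthetic planes, well separated in x
rng = np.random.default_rng(0)
a = np.column_stack([rng.random(200), rng.random(200), np.zeros(200)])
b = np.column_stack([rng.random(200), np.zeros(200), rng.random(200)]) + [2, 0, 0]
labels = region_grow(np.vstack([a, b]))
```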
K, Punith; K, Lalitha; G, Suman; Bs, Pradeep; Kumar K, Jayanth
2008-07-01
Is the LQAS technique better than the cluster sampling technique, in terms of resources, for evaluating immunization coverage in an urban area? The objective was to assess and compare lot quality assurance sampling (LQAS) against cluster sampling in the evaluation of primary immunization coverage. A population-based cross-sectional study was carried out in the areas under the Mathikere Urban Health Center among children aged 12 to 23 months, with sample sizes of 220 for cluster sampling and 76 for LQAS; percentages, proportions and the chi-square test were used for analysis. (1) Using cluster sampling, the percentages of completely immunized, partially immunized and unimmunized children were 84.09%, 14.09% and 1.82%, respectively; with LQAS they were 92.11%, 6.58% and 1.31%. (2) Immunization coverage levels as evaluated by the cluster sampling technique were not statistically different from the coverage values obtained by LQAS. Considering the time and resources required, LQAS was found to be the better technique for evaluating primary immunization coverage in an urban area.
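The resource advantage of LQAS comes from its small-sample accept/reject rule, whose operating characteristics follow from the binomial distribution. A sketch of that calculation; the sample size, decision value, and coverage thresholds below are illustrative, not those of the study:

```python
from math import comb

def lqas_risks(n, d, p_hi, p_lo):
    """Operating characteristics of the LQAS rule 'accept the lot if at
    most d unvaccinated children are found among n sampled'.
    p_hi: unacceptably high true proportion unvaccinated,
    p_lo: acceptable low proportion. Returns the two error risks."""
    def binom_cdf(k, n, p):
        return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))
    accept_bad = binom_cdf(d, n, p_hi)        # chance of accepting a bad lot
    reject_good = 1 - binom_cdf(d, n, p_lo)   # chance of rejecting a good lot
    return accept_bad, reject_good

# e.g. sample n = 19 children, decision value d = 3,
# "bad" = 50% unvaccinated, "good" = 20% unvaccinated
a, b = lqas_risks(19, 3, 0.5, 0.2)
```

Tightening d trades one risk against the other, which is how LQAS sample sizes are tuned before fieldwork.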
The computational core and fixed point organization in Boolean networks
NASA Astrophysics Data System (ADS)
Correale, L.; Leone, M.; Pagnani, A.; Weigt, M.; Zecchina, R.
2006-03-01
In this paper, we analyse large random Boolean networks in terms of a constraint satisfaction problem. We first develop an algorithmic scheme which allows us to prune simple logical cascades and underdetermined variables, returning thereby the computational core of the network. Second, we apply the cavity method to analyse the number and organization of fixed points. We find in particular a phase transition between an easy and a complex regulatory phase, the latter being characterized by the existence of an exponential number of macroscopically separated fixed point clusters. The different techniques developed are reinterpreted as algorithms for the analysis of single Boolean networks, and they are applied in the analysis of and in silico experiments on the gene regulatory networks of baker's yeast (Saccharomyces cerevisiae) and the segment-polarity genes of the fruitfly Drosophila melanogaster.
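For intuition, the fixed points of a small random Boolean network can be found by brute force; the exponential cost of this scan is precisely why the paper's pruning to a computational core matters. A sketch with arbitrary size n and in-degree k:

```python
import itertools
import random

def random_boolean_network(n, k, seed=0):
    """Each node reads k random inputs through a random truth table;
    returns the synchronous update map."""
    rng = random.Random(seed)
    inputs = [rng.sample(range(n), k) for _ in range(n)]
    tables = [[rng.randint(0, 1) for _ in range(2 ** k)] for _ in range(n)]
    def step(state):
        return tuple(
            tables[i][sum(state[j] << b for b, j in enumerate(inputs[i]))]
            for i in range(n))
    return step

def fixed_points(step, n):
    # exhaustive scan of all 2^n states; feasible only for small n
    return [s for s in itertools.product((0, 1), repeat=n) if step(s) == s]

step = random_boolean_network(n=10, k=2)
fps = fixed_points(step, 10)
```

The cavity method in the paper estimates the number and organization of such fixed points without ever enumerating the state space.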
NASA Astrophysics Data System (ADS)
Lyakh, Dmitry I.
2018-03-01
A novel reduced-scaling, general-order coupled-cluster approach is formulated by exploiting hierarchical representations of many-body tensors, combined with the recently suggested formalism of scale-adaptive tensor algebra. Inspired by the hierarchical techniques from the renormalisation group approach, H/H²-matrix algebra and the fast multipole method, the computational scaling reduction in our formalism is achieved via coarsening of quantum many-body interactions at larger interaction scales, thus imposing a hierarchical structure on the many-body tensors of coupled-cluster theory. In our approach, the interaction scale can be defined on any appropriate Euclidean domain (spatial domain, momentum-space domain, energy domain, etc.). We show that the hierarchically resolved many-body tensors can reduce the storage requirements to O(N), where N is the number of simulated quantum particles. Subsequently, we prove that any connected many-body diagram consisting of a finite number of arbitrary-order tensors, e.g. an arbitrary coupled-cluster diagram, can be evaluated in O(N log N) floating-point operations. On top of that, we suggest an additional approximation to further reduce the computational complexity of higher-order coupled-cluster equations, i.e. equations involving higher than double excitations, which would otherwise introduce a large prefactor into the formal O(N log N) scaling.
3D reconstruction from non-uniform point clouds via local hierarchical clustering
NASA Astrophysics Data System (ADS)
Yang, Jiaqi; Li, Ruibo; Xiao, Yang; Cao, Zhiguo
2017-07-01
Raw scanned 3D point clouds are usually irregularly distributed due to essential shortcomings of laser sensors, which poses a great challenge for high-quality 3D surface reconstruction. This paper tackles this problem by proposing a local hierarchical clustering (LHC) method to improve the consistency of point distribution. Specifically, LHC consists of two steps: 1) adaptive octree-based decomposition of 3D space, and 2) hierarchical clustering. The former aims at reducing the computational complexity and the latter transforms the non-uniform point set into a uniform one. Experimental results on real-world scanned point clouds validate the effectiveness of our method from both qualitative and quantitative aspects.
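The clustering step can be approximated with an off-the-shelf hierarchical clusterer: merge points closer than a spacing threshold and keep one centroid per cluster, thinning oversampled patches while leaving sparse regions untouched. This sketch omits the paper's adaptive octree decomposition, and the spacing threshold is an assumed parameter:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def uniformize(points, spacing):
    """Replace each tight cluster of raw scan points by its centroid,
    using single-linkage hierarchical clustering cut at `spacing`."""
    Z = linkage(points, method='single')
    labels = fcluster(Z, t=spacing, criterion='distance')
    return np.array([points[labels == l].mean(axis=0)
                     for l in np.unique(labels)])

# a densely oversampled spot plus two isolated points
rng = np.random.default_rng(0)
dense = rng.normal(0, 0.01, (200, 3))
sparse = np.array([[1.0, 0, 0], [2.0, 0, 0]])
out = uniformize(np.vstack([dense, sparse]), spacing=0.2)   # 202 -> 3 points
```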
On the estimation of the current density in space plasmas: Multi- versus single-point techniques
NASA Astrophysics Data System (ADS)
Perri, Silvia; Valentini, Francesco; Sorriso-Valvo, Luca; Reda, Antonio; Malara, Francesco
2017-06-01
Thanks to multi-spacecraft missions, it has recently become possible to directly estimate the current density in space plasmas by using magnetic field time series from four satellites flying in a quasi-perfect tetrahedron configuration. The technique developed, commonly called the "curlometer", permits a good estimation of the current density when the magnetic field time series vary linearly in space. This approximation is generally valid for small spacecraft separations. The recent space missions Cluster and Magnetospheric Multiscale (MMS) have provided high resolution measurements with inter-spacecraft separations up to 100 km and 10 km, respectively. The former scale corresponds to the proton gyroradius/ion skin depth in "typical" solar wind conditions, while the latter to sub-proton scales. However, some works have highlighted an underestimation of the current density via the curlometer technique with respect to the current computed directly from the velocity distribution functions, measured at sub-proton scale resolution with MMS. In this paper we explore the limits of the curlometer technique by studying synthetic data sets associated with a cluster of four artificial satellites allowed to fly in a static turbulent field, spanning a wide range of relative separations. This study tries to address the relative importance of measuring plasma moments at very high resolution from a single spacecraft with respect to multi-spacecraft missions in the evaluation of the current density.
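A minimal curlometer-style estimate can be written down directly: fit a linear field B(r) = B0 + G·r to the four measurements and take the curl of the fitted gradient tensor G. The classical technique is usually expressed with reciprocal vectors; a least-squares fit, used here for brevity, gives the same answer for an exactly linear field:

```python
import numpy as np

MU0 = 4e-7 * np.pi  # vacuum permeability, SI

def curlometer(positions, B):
    """Estimate J = (curl B) / mu0 from four-point measurements.
    positions, B: (4, 3) arrays in SI units. Fits B(r) = B0 + G r by
    least squares; G[i, j] approximates dB_j / dx_i."""
    r = positions - positions.mean(axis=0)
    A = np.hstack([np.ones((4, 1)), r])           # [1, x, y, z] design matrix
    coef, *_ = np.linalg.lstsq(A, B, rcond=None)  # rows: B0, then gradients
    G = coef[1:]
    curl = np.array([G[1, 2] - G[2, 1],
                     G[2, 0] - G[0, 2],
                     G[0, 1] - G[1, 0]])
    return curl / MU0

# synthetic linear field B = (0, c*x, 0), whose curl is (0, 0, c)
pos = np.array([[0, 0, 0], [1.0, 0, 0], [0, 1.0, 0], [0, 0, 1.0]])
c = 1e-9
B = np.stack([np.zeros(4), c * pos[:, 0], np.zeros(4)], axis=1)
J = curlometer(pos, B)
```

For a truly linear field the estimate is exact; the underestimation discussed in the abstract appears once the field varies nonlinearly between the spacecraft.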
Application of dynamic topic models to toxicogenomics data.
Lee, Mikyung; Liu, Zhichao; Huang, Ruili; Tong, Weida
2016-10-06
All biological processes are inherently dynamic. Biological systems evolve transiently or sustainably according to sequential time points after perturbation by environmental insults, drugs and chemicals. Investigating the temporal behavior of molecular events has been an important subject for understanding the underlying mechanisms governing the biological system in response to, for example, drug treatment. The intrinsic complexity of time series data requires appropriate computational algorithms for data interpretation. In this study, we propose, for the first time, the application of dynamic topic models (DTM) for analyzing time-series gene expression data. A large time-series toxicogenomics dataset was studied. It contains over 3144 microarrays of gene expression data corresponding to rat livers treated with 131 compounds (most are drugs) at two doses (control and high dose) in a repeated schedule containing four separate time points (4-, 8-, 15- and 29-day). We analyzed, with DTM, the topics (each consisting of a set of genes) and their biological interpretations over these four time points. We identified hidden patterns embedded in these time-series gene expression profiles. From the topic distribution for each compound-time condition, a number of drugs were successfully clustered by their shared mode of action, such as PPARɑ agonists and COX inhibitors. The biological meaning underlying each topic was interpreted using diverse sources of information, such as functional analysis of the pathways and therapeutic uses of the drugs. Additionally, we found that sample clusters produced by DTM are much more coherent in terms of functional categories when compared to traditional clustering algorithms. We demonstrated that DTM, a text mining technique, can be a powerful computational approach for clustering time-series gene expression profiles with the probabilistic representation of their dynamic features along sequential time frames.
The method offers an alternative way for uncovering hidden patterns embedded in time series gene expression profiles to gain enhanced understanding of dynamic behavior of gene regulation in the biological system.
A Novel Hybrid Intelligent Indoor Location Method for Mobile Devices by Zones Using Wi-Fi Signals
Castañón–Puga, Manuel; Salazar, Abby Stephanie; Aguilar, Leocundo; Gaxiola-Pacheco, Carelia; Licea, Guillermo
2015-01-01
The increasing use of mobile devices in indoor spaces brings challenges to location methods. This work presents a hybrid intelligent method based on data mining and Type-2 fuzzy logic to locate mobile devices in an indoor space by zones using Wi-Fi signals from selected access points (APs). This approach takes advantage of wireless local area networks (WLANs) over other types of architectures and implements the complete method in a mobile application using the developed tools. Besides, the proposed approach is validated by experimental data obtained from case studies and the cross-validation technique. For the purpose of generating the fuzzy rules that conform to the Takagi–Sugeno fuzzy system structure, a semi-supervised data mining technique called subtractive clustering is used. This algorithm finds centers of clusters from the radius map given by the collected signals from APs. Measurements of Wi-Fi signals can be noisy due to several factors mentioned in this work, so this method proposed the use of Type-2 fuzzy logic for modeling and dealing with such uncertain information. PMID:26633417
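Subtractive clustering itself is compact enough to sketch. Each sample is scored by the density of its neighbourhood; the densest point becomes a center, nearby potentials are suppressed, and the next center is picked from what remains. This simplified version stops after a fixed number of centers instead of using Chiu's accept/reject thresholds, and the toy data stands in for the Wi-Fi signal radius map:

```python
import numpy as np

def subtractive_clustering(X, ra=1.0, rb=None, n_centers=2):
    """Chiu-style subtractive clustering on samples X (n x d).
    ra: neighbourhood radius for the density potential;
    rb: suppression radius (default 1.5 * ra)."""
    if rb is None:
        rb = 1.5 * ra
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    P = np.exp(-4.0 * d2 / ra ** 2).sum(axis=1)        # density potentials
    centers = []
    for _ in range(n_centers):
        k = int(P.argmax())                            # densest remaining point
        centers.append(X[k])
        # suppress potentials near the chosen center
        P = P - P[k] * np.exp(-4.0 * d2[:, k] / rb ** 2)
    return np.array(centers)

# two toy signal-space blobs
rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 0.2, (40, 2)), rng.normal(3, 0.2, (40, 2))])
centers = subtractive_clustering(X, ra=1.0, n_centers=2)
```

Each recovered center would then seed one Takagi-Sugeno fuzzy rule in the pipeline the abstract describes.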
Melo, Armindo; Pinto, Edgar; Aguiar, Ana; Mansilha, Catarina; Pinho, Olívia; Ferreira, Isabel M P L V O
2012-07-01
A monitoring program of nitrate, nitrite, potassium, sodium, and pesticides was carried out in water samples from an intensive horticulture area in a vulnerable zone in the north of Portugal. Eight collection points were selected and water was analyzed in five sampling campaigns over 1 year. Chemometric techniques, such as cluster analysis, principal component analysis (PCA), and discriminant analysis, were used in order to understand the impact of intensive horticulture practices on groundwater from dug and drilled wells and to study variations in the hydrochemistry of groundwater. PCA performed on the pesticide data matrix yielded seven significant PCs explaining 77.67% of the data variance. Although PCA rendered considerable data reduction, it could not clearly group and distinguish the sample types. However, a visible differentiation between the water samples was obtained. Cluster and discriminant analysis grouped the eight collection points into three clusters of similar characteristics pertaining to water contamination, indicating that it is necessary to improve the use of water, fertilizers, and pesticides. Inorganic fertilizers such as potassium nitrate were suspected to be the most important factor in nitrate contamination, since a highly significant Pearson correlation (r = 0.691, P < 0.01) was obtained between groundwater nitrate and potassium contents. Water from dug wells is especially prone to contamination from the growers' and their closer neighbors' practices. Water from drilled wells is also contaminated by more distant practices.
Cross-correlating the γ-ray Sky with Catalogs of Galaxy Clusters
NASA Astrophysics Data System (ADS)
Branchini, Enzo; Camera, Stefano; Cuoco, Alessandro; Fornengo, Nicolao; Regis, Marco; Viel, Matteo; Xia, Jun-Qing
2017-01-01
We report the detection of a cross-correlation signal between Fermi Large Area Telescope diffuse γ-ray maps and catalogs of clusters. In our analysis, we considered three different catalogs: WHL12, redMaPPer, and PlanckSZ. They all show a positive correlation with different amplitudes, related to the average mass of the objects in each catalog, which also sets the catalog bias. The signal detection is confirmed by the results of a stacking analysis. The cross-correlation signal extends to rather large angular scales, around 1°, that correspond, at the typical redshift of the clusters in these catalogs, to a few to tens of megaparsecs, i.e., the typical scale length of the large-scale structures in the universe. Most likely this signal is contributed by the cumulative emission from active galactic nuclei (AGNs) associated with the filamentary structures that converge toward the high peaks of the matter density field in which galaxy clusters reside. In addition, our analysis reveals the presence of a second component, more compact in size and compatible with a point-like emission from within individual clusters. At present, we cannot distinguish between the two most likely interpretations for such a signal, i.e., whether it is produced by AGNs inside clusters or whether it is diffuse γ-ray emission from the intracluster medium. We argue that the latter, intriguing, hypothesis might be tested by applying this technique to a low-redshift, large-mass cluster sample.
Information extraction from dynamic PS-InSAR time series using machine learning
NASA Astrophysics Data System (ADS)
van de Kerkhof, B.; Pankratius, V.; Chang, L.; van Swol, R.; Hanssen, R. F.
2017-12-01
Due to the increasing number of SAR satellites, with shorter repeat intervals and higher resolutions, SAR data volumes are exploding. Time series analyses of SAR data, i.e. Persistent Scatterer (PS) InSAR, enable the deformation monitoring of the built environment at an unprecedented scale, with hundreds of scatterers per km2, updated weekly. Potential hazards, e.g. due to failure of aging infrastructure, can be detected at an early stage. Yet, this requires the operational data processing of billions of measurement points, over hundreds of epochs, updating this data set dynamically as new data come in, and testing whether points (start to) behave in an anomalous way. Moreover, the quality of PS-InSAR measurements is ambiguous and heterogeneous, which yields false positives and false negatives. Such analyses are numerically challenging. Here we extract relevant information from PS-InSAR time series using machine learning algorithms. We cluster (group together) time series with similar behaviour, even though they may not be spatially close, such that the results can be used for further analysis. First we reduce the dimensionality of the dataset in order to be able to cluster the data, since applying clustering techniques to high-dimensional datasets often yields unsatisfactory results. Our approach is to apply t-distributed Stochastic Neighbor Embedding (t-SNE), a machine learning algorithm for dimensionality reduction of high-dimensional data to a 2D or 3D map, and to cluster this result using Density-Based Spatial Clustering of Applications with Noise (DBSCAN). The results show that we are able to detect and cluster time series with similar behaviour, which is the starting point for more extensive analysis into the underlying driving mechanisms. The results of the methods are compared to conventional hypothesis testing as well as to a Self-Organising Map (SOM) approach.
Hypothesis testing is robust and takes the stochastic nature of the observations into account, but is time consuming. Therefore, we apply our machine learning approach in succession with the hypothesis testing approach, in order to benefit both from the reduced computation time of the machine learning approach and from the robust quality metrics of hypothesis testing. We acknowledge support from NASA AIST grant NNX15AG84G (PI V. Pankratius)
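The second stage of the pipeline described above, density-based clustering of the low-dimensional embedding, can be sketched as follows. This is a minimal, illustrative DBSCAN implementation, assuming the time series have already been reduced to 2-D points (e.g. by t-SNE); it is not the authors' production code, and the parameter values are placeholders.

```python
import numpy as np

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: grow clusters from core points; -1 marks noise."""
    n = len(points)
    # pairwise Euclidean distances (fine for small n; use a KD-tree at scale)
    d = np.linalg.norm(points[:, None] - points[None, :], axis=2)
    neighbors = [np.flatnonzero(d[i] <= eps) for i in range(n)]
    labels = np.full(n, -1)
    visited = np.zeros(n, dtype=bool)
    cluster = 0
    for i in range(n):
        if visited[i] or len(neighbors[i]) < min_pts:
            continue                          # not an unvisited core point
        stack = [i]                           # expand a new cluster from i
        while stack:
            j = stack.pop()
            if visited[j]:
                continue
            visited[j] = True
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:  # only core points grow the cluster
                stack.extend(neighbors[j])
        cluster += 1
    return labels
```

Points reachable from no core point keep the label -1, matching DBSCAN's notion of noise.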
RRW: repeated random walks on genome-scale protein networks for local cluster discovery
Macropol, Kathy; Can, Tolga; Singh, Ambuj K
2009-01-01
Background: We propose an efficient and biologically sensitive algorithm based on repeated random walks (RRW) for discovering functional modules, e.g., complexes and pathways, within large-scale protein networks. Compared to existing cluster identification techniques, RRW implicitly makes use of network topology, edge weights, and long range interactions between proteins. Results: We apply the proposed technique on a functional network of yeast genes and accurately identify statistically significant clusters of proteins. We validate the biological significance of the results using known complexes in the MIPS complex catalogue database and well-characterized biological processes. We find that 90% of the created clusters have the majority of their catalogued proteins belonging to the same MIPS complex, and about 80% have the majority of their proteins involved in the same biological process. We compare our method to various other clustering techniques, such as the Markov Clustering Algorithm (MCL), and find a significant improvement in the RRW clusters' precision and accuracy values. Conclusion: RRW, which is a technique that exploits the topology of the network, is more precise and robust in finding local clusters. In addition, it has the added flexibility of being able to find multi-functional proteins by allowing overlapping clusters. PMID:19740439
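The core idea behind repeated random walks — a walk that keeps restarting at a seed protein, so that network proximity translates into visiting probability — can be sketched as follows. This is a toy dense-matrix version; the function name, the restart probability, and the convergence tolerance are illustrative, not taken from the paper.

```python
import numpy as np

def random_walk_with_restart(adj, seed, restart=0.3, tol=1e-10):
    """Stationary visiting probabilities of a walk that restarts at `seed`.
    Assumes an undirected, connected graph (no zero-degree nodes)."""
    # column-normalize the (weighted) adjacency into a transition matrix
    w = adj / adj.sum(axis=0, keepdims=True)
    p = np.zeros(len(adj))
    p[seed] = 1.0
    r = p.copy()                                   # restart distribution
    while True:
        nxt = (1.0 - restart) * (w @ p) + restart * r
        if np.abs(nxt - p).sum() < tol:
            return nxt
        p = nxt
```

Nodes topologically close to the seed (e.g. members of the same dense subgraph) end up with higher stationary probability, which is what lets the walk delineate a local cluster.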
NASA Astrophysics Data System (ADS)
Arif, Shafaq; Rafique, M. Shahid; Saleemi, Farhat; Sagheer, Riffat; Naab, Fabian; Toader, Ovidiu; Mahmood, Arshad; Rashid, Rashad; Mahmood, Mazhar
2015-09-01
Ion implantation is a useful technique to modify surface properties of polymers without altering their bulk properties. The objective of this work is to explore the effects of 400 keV C+ ion implantation on PMMA at different fluences ranging from 5 × 10¹³ to 5 × 10¹⁵ ions/cm². The surface topographical examination of irradiated samples has been performed using an Atomic Force Microscope (AFM). The structural and chemical modifications in implanted PMMA are examined by Raman and Fourier Transform Infrared (FTIR) spectroscopy, respectively. The effects of carbon ion implantation on the optical properties of PMMA are investigated by UV-Visible spectroscopy. The modifications in electrical conductivity have been measured using a four-point probe technique. AFM images reveal a decrease in surface roughness of PMMA with an increase in ion fluence from 5 × 10¹⁴ to 5 × 10¹⁵ ions/cm². The existence of amorphization and sp²-carbon clusterization has been confirmed by Raman and FTIR spectroscopic analysis. The UV-Visible data show a prominent red shift in the absorption edge as a function of ion fluence. This shift displays a continuous reduction in optical band gap (from 3.13 to 0.66 eV) due to the formation of carbon clusters. Moreover, the size of the carbon clusters and the photoconductivity are found to increase with increasing ion fluence. The ion-induced carbonaceous clusters are believed to be responsible for an increase in electrical conductivity of PMMA from (2.14 ± 0.06) × 10⁻¹⁰ (Ω-cm)⁻¹ (pristine) to (0.32 ± 0.01) × 10⁻⁵ (Ω-cm)⁻¹ (irradiated sample).
LOD-based clustering techniques for efficient large-scale terrain storage and visualization
NASA Astrophysics Data System (ADS)
Bao, Xiaohong; Pajarola, Renato
2003-05-01
Large multi-resolution terrain data sets are usually stored out-of-core. To visualize terrain data at interactive frame rates, the data needs to be organized on disk, loaded into main memory part by part, then rendered efficiently. Many main-memory algorithms have been proposed for efficient vertex selection and mesh construction. Organization of terrain data on disk is quite difficult because the error, the triangulation dependency and the spatial location of each vertex all need to be considered. Previous terrain clustering algorithms did not consider the per-vertex approximation error of individual terrain data sets. Therefore, the vertex sequences on disk are exactly the same for any terrain. In this paper, we propose a novel clustering algorithm which introduces the level-of-detail (LOD) information to terrain data organization to map multi-resolution terrain data to external memory. In our approach the LOD parameters of the terrain elevation points are reflected during clustering. The experiments show that dynamic loading and paging of terrain data at varying LOD is very efficient and minimizes page faults. Additionally, the preprocessing of this algorithm is very fast and operates out-of-core.
Coarse Point Cloud Registration by EGI Matching of Voxel Clusters
NASA Astrophysics Data System (ADS)
Wang, Jinhu; Lindenbergh, Roderik; Shen, Yueqian; Menenti, Massimo
2016-06-01
Laser scanning samples the surface geometry of objects efficiently and records versatile information as point clouds. However, often more scans are required to fully cover a scene. Therefore, a registration step is required that transforms the different scans into a common coordinate system. The registration of point clouds is usually conducted in two steps, i.e. coarse registration followed by fine registration. In this study an automatic marker-free coarse registration method for pair-wise scans is presented. First the two input point clouds are re-sampled as voxels and dimensionality features of the voxels are determined by principal component analysis (PCA). Then voxel cells with the same dimensionality are clustered. Next, the Extended Gaussian Image (EGI) descriptors of those voxel clusters are constructed using the significant eigenvectors of each voxel in the cluster. Correspondences between clusters in source and target data are obtained according to the similarity between their EGI descriptors. The random sampling consensus (RANSAC) algorithm is employed to remove outlying correspondences until a coarse alignment is obtained. If necessary, a fine registration is performed in a final step. This new method is illustrated on scan data sampling two indoor scenarios. The results of the tests are evaluated by computing the point-to-point distance between the two input point clouds. The two tests presented resulted in mean distances of 7.6 mm and 9.5 mm, respectively, which are adequate for fine registration.
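The per-voxel dimensionality feature from the PCA step can be sketched as follows, using the common linearity/planarity/scatter measures derived from the sorted covariance eigenvalues. This is a generic formulation, not necessarily the exact measures used by the authors.

```python
import numpy as np

def voxel_dimensionality(points):
    """Classify a voxel's point set as linear (1D), planar (2D) or
    volumetric (3D) from the eigenvalues of its covariance matrix."""
    centered = points - points.mean(axis=0)
    cov = centered.T @ centered / len(points)
    ev = np.sort(np.linalg.eigvalsh(cov))[::-1]   # λ1 ≥ λ2 ≥ λ3
    ev = np.clip(ev, 0.0, None)                   # guard tiny negative eigenvalues
    s1, s2, s3 = np.sqrt(ev)
    linearity = (s1 - s2) / s1                    # one dominant direction
    planarity = (s2 - s3) / s1                    # two dominant directions
    scatter = s3 / s1                             # no dominant direction
    return ["1D", "2D", "3D"][int(np.argmax([linearity, planarity, scatter]))]
```

Voxels sampled from a cable or edge come out "1D", from a wall or floor "2D", and from vegetation-like scatter "3D"; clustering then groups voxels of equal dimensionality, as the abstract describes.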
Computer program documentation: ISOCLS iterative self-organizing clustering program, program C094
NASA Technical Reports Server (NTRS)
Minter, R. T. (Principal Investigator)
1972-01-01
The author has identified the following significant results. This program implements an algorithm which, ideally, sorts a given set of multivariate data points into similar groups or clusters. The program is intended for use in the evaluation of multispectral scanner data; however, the algorithm could be used for other data types as well. The user may specify a set of initial estimated cluster means to begin the procedure, or he may begin with the assumption that all the data belongs to one cluster. The procedure is initialized by assigning each data point to the nearest (in absolute distance) cluster mean. If no initial cluster means were input, all of the data is assigned to cluster 1. The means and standard deviations are calculated for each cluster.
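One pass of the assignment-and-statistics loop described above might look like this in modern terms. This is an illustrative sketch, not a transcription of program C094; a full ISOCLS run also splits and merges clusters based on the computed statistics.

```python
import numpy as np

def isocls_step(data, means):
    """One ISOCLS-style pass: assign each point to the nearest cluster mean
    by absolute (city-block) distance, then recompute means and deviations.
    Assumes every cluster receives at least one point."""
    # absolute distance from each point to each current mean
    dist = np.abs(data[:, None, :] - means[None, :, :]).sum(axis=2)
    assign = dist.argmin(axis=1)
    new_means = np.array([data[assign == k].mean(axis=0)
                          for k in range(len(means))])
    stds = np.array([data[assign == k].std(axis=0)
                     for k in range(len(means))])
    return assign, new_means, stds
```

Starting from a single all-encompassing cluster, as the program allows, simply corresponds to `means` holding one row, the global mean of the data.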
NASA Astrophysics Data System (ADS)
Kel'manov, A. V.; Motkova, A. V.
2018-01-01
A strongly NP-hard problem of partitioning a finite set of points of Euclidean space into two clusters is considered. The solution criterion is the minimum of the sum (over both clusters) of weighted sums of squared distances from the elements of each cluster to its geometric center. The weights of the sums are equal to the cardinalities of the desired clusters. The center of one cluster is given as input, while the center of the other is unknown and is determined as the point of space equal to the mean of the cluster elements. A version of the problem is analyzed in which the cardinalities of the clusters are given as input. A polynomial-time 2-approximation algorithm for solving the problem is constructed.
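In symbols, a plausible reading of the criterion (notation ours: $\mathcal{Y}$ is the input set, $c$ the given center, and the weights are the cluster cardinalities):

```latex
\min_{\mathcal{C} \subseteq \mathcal{Y}}\;
|\mathcal{C}| \sum_{y \in \mathcal{C}} \bigl\| y - \bar{y}(\mathcal{C}) \bigr\|^2
\;+\;
|\mathcal{Y} \setminus \mathcal{C}| \sum_{y \in \mathcal{Y} \setminus \mathcal{C}} \| y - c \|^2,
\qquad
\bar{y}(\mathcal{C}) = \frac{1}{|\mathcal{C}|} \sum_{y \in \mathcal{C}} y .
```

In the analyzed version, $|\mathcal{C}|$ is additionally fixed as part of the input.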
Efficient Skeletonization of Volumetric Objects.
Zhou, Yong; Toga, Arthur W
1999-07-01
Skeletonization promises to become a powerful tool for compact shape description, path planning, and other applications. However, current techniques can seldom efficiently process real, complicated 3D data sets, such as MRI and CT data of human organs. In this paper, we present an efficient voxel-coding based algorithm for skeletonization of 3D voxelized objects. The skeletons are interpreted as connected centerlines, consisting of sequences of medial points of consecutive clusters. These centerlines are initially extracted as paths of voxels, followed by medial point replacement, refinement, smoothing, and connection operations. The voxel-coding techniques have been proposed for each of these operations in a uniform and systematic fashion. In addition to preserving basic connectivity and centeredness, the algorithm is characterized by straightforward computation, no sensitivity to object boundary complexity, explicit extraction of ready-to-parameterize and branch-controlled skeletons, and efficient object hole detection. These issues are rarely discussed in traditional methods. A range of 3D medical MRI and CT data sets were used for testing the algorithm, demonstrating its utility.
Multiscale visual quality assessment for cluster analysis with self-organizing maps
NASA Astrophysics Data System (ADS)
Bernard, Jürgen; von Landesberger, Tatiana; Bremm, Sebastian; Schreck, Tobias
2011-01-01
Cluster analysis is an important data mining technique for analyzing large amounts of data, reducing many objects to a limited number of clusters. Cluster visualization techniques aim at supporting the user in better understanding the characteristics and relationships among the found clusters. While promising approaches to visual cluster analysis already exist, these usually fall short of incorporating the quality of the obtained clustering results. However, due to the nature of the clustering process, quality is an important aspect, as for most practical data sets, typically many different clusterings are possible. Being aware of clustering quality is important to judge the expressiveness of a given cluster visualization, or to adjust the clustering process with refined parameters, among others. In this work, we present an encompassing suite of visual tools for quality assessment of an important visual clustering algorithm, namely, the Self-Organizing Map (SOM) technique. We define, measure, and visualize the notion of SOM cluster quality along a hierarchy of cluster abstractions. The quality abstractions range from simple scalar-valued quality scores up to the structural comparison of a given SOM clustering with the output of additional supportive clustering methods. The suite of methods allows the user to assess the SOM quality on the appropriate abstraction level, and arrive at improved clustering results. We implement our tools in an integrated system, apply it to experimental data sets, and show its applicability.
K, Punith; K, Lalitha; G, Suman; BS, Pradeep; Kumar K, Jayanth
2008-01-01
Research Question: Is the LQAS technique better than the cluster sampling technique in terms of resources to evaluate the immunization coverage in an urban area? Objective: To assess and compare lot quality assurance sampling against cluster sampling in the evaluation of primary immunization coverage. Study Design: Population-based cross-sectional study. Study Setting: Areas under Mathikere Urban Health Center. Study Subjects: Children aged 12 months to 23 months. Sample Size: 220 in cluster sampling, 76 in lot quality assurance sampling. Statistical Analysis: Percentages and proportions, chi-square test. Results: (1) Using cluster sampling, the percentages of completely immunized, partially immunized and unimmunized children were 84.09%, 14.09% and 1.82%, respectively. With lot quality assurance sampling, they were 92.11%, 6.58% and 1.31%, respectively. (2) Immunization coverage levels as evaluated by the cluster sampling technique were not statistically different from the coverage values obtained by the lot quality assurance sampling technique. Considering the time and resources required, it was found that lot quality assurance sampling is a better technique for evaluating primary immunization coverage in an urban area. PMID:19876474
Unsupervised classification of remote multispectral sensing data
NASA Technical Reports Server (NTRS)
Su, M. Y.
1972-01-01
The new unsupervised classification technique for classifying multispectral remote sensing data, which can be either from the multispectral scanner or digitized color-separation aerial photographs, consists of two parts: (a) a sequential statistical clustering which is a one-pass sequential variance analysis and (b) a generalized K-means clustering. In this composite clustering technique, the output of (a) is a set of initial clusters which are input to (b) for further improvement by an iterative scheme. Applications of the technique using an IBM-7094 computer on multispectral data sets over Purdue's Flight Line C-1 and the Yellowstone National Park test site have been accomplished. Comparisons between the classification maps by the unsupervised technique and the supervised maximum likelihood technique indicate that the classification accuracies are in agreement.
Clustering by reordering of similarity and Laplacian matrices: Application to galaxy clusters
NASA Astrophysics Data System (ADS)
Mahmoud, E.; Shoukry, A.; Takey, A.
2018-04-01
Similarity metrics, kernels and similarity-based algorithms have gained much attention due to their increasing applications in information retrieval, data mining, pattern recognition and machine learning. Similarity graphs are often adopted as the underlying representation of similarity matrices and are at the origin of known clustering algorithms such as spectral clustering. Similarity matrices offer the advantage of working in object-object (two-dimensional) space, where visualization of cluster similarities is available, instead of object-features (multi-dimensional) space. In this paper, sparse ɛ-similarity graphs are constructed and decomposed into strong components using appropriate methods such as the Dulmage-Mendelsohn permutation (DMperm) and/or Reverse Cuthill-McKee (RCM) algorithms. The obtained strong components correspond to groups (clusters) in the input (feature) space. The parameter ɛi is estimated locally, at each data point i, from a corresponding narrow range of the number of nearest neighbors. Although more advanced clustering techniques are available, our method has the advantages of simplicity, lower complexity and direct visualization of the cluster similarities in a two-dimensional space. Also, no prior information about the number of clusters is needed. We conducted our experiments on two- and three-dimensional, small- and large-sized synthetic datasets as well as on a real astronomical dataset. The results are verified graphically and analyzed using gap statistics over a range of neighbors to verify the robustness of the algorithm and the stability of the results. Combining the proposed algorithm with gap statistics provides a promising tool for solving clustering problems. An astronomical application is conducted for confirming the existence of 45 galaxy clusters around the X-ray positions of galaxy clusters in the redshift range [0.1..0.8].
We re-estimate the photometric redshifts of the identified galaxy clusters and obtain acceptable values compared to published spectroscopic redshifts with a 0.029 standard deviation of their differences.
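The graph-decomposition idea above can be sketched in a simplified form. The paper estimates ɛi per point and works on sparse matrices with DMperm/RCM; the sketch below uses a single global ɛ and a plain depth-first search, exploiting the fact that for a symmetric similarity graph the strong components coincide with the connected components.

```python
import numpy as np

def epsilon_graph_clusters(x, eps):
    """Clusters = connected components of the ɛ-similarity graph:
    points i, j are linked when their distance is at most eps."""
    n = len(x)
    d = np.linalg.norm(x[:, None] - x[None, :], axis=2)
    adj = d <= eps                 # symmetric adjacency matrix
    labels = np.full(n, -1)
    cluster = 0
    for i in range(n):
        if labels[i] != -1:
            continue
        stack = [i]                # DFS over the similarity graph
        while stack:
            j = stack.pop()
            if labels[j] != -1:
                continue
            labels[j] = cluster
            stack.extend(np.flatnonzero(adj[j] & (labels == -1)))
        cluster += 1
    return labels
```

Reordering the similarity matrix by these labels produces the block-diagonal structure that makes cluster similarities directly visible in two dimensions.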
Molecular dynamical simulations of melting Al nanoparticles using a reaxff reactive force field
NASA Astrophysics Data System (ADS)
Liu, Junpeng; Wang, Mengjun; Liu, Pingan
2018-06-01
Molecular dynamics simulations were performed to study thermal properties and melting points of Al nanoparticles by using a reactive force field under canonical (NVT) ensembles. Al nanoparticles (particle size 2–4 nm) were considered in the simulations. A combination of structural and thermodynamic parameters such as the Lindemann index, heat capacities, potential energy and radial-distribution functions was employed to determine melting points. We used an annealing technique to obtain the initial Al nanoparticle model. Comparison was made between ReaxFF results and other simulation results. We found that the ReaxFF force field describes Al cluster melting behavior reasonably well. A linear relationship between particle size and melting point was found. After validating the ReaxFF force field, more attention was paid to the thermal properties of Al nanoparticles with different defect concentrations. 4 nm Al nanoparticles with different defect concentrations (5%–20%) were considered in this paper. Our results revealed that the melting points are insensitive to defect concentration at a given particle size. The extra storage energy of Al nanoparticles is proportional to the nanoparticles' defect concentration when the defect concentration is 5%–15%, while the particle with a 20% defect concentration behaves similarly to the cluster with a 10% defect concentration. After melting, the extra energy of all nanoparticles decreases sharply, and the extra storage energy is nearly zero at 600 K. The centro-symmetry parameter analysis shows the structural evolution of the different models during melting processes.
VizieR Online Data Catalog: Abell 315 spectroscopic dataset (Biviano+, 2017)
NASA Astrophysics Data System (ADS)
Biviano, A.; Popesso, P.; Dietrich, J. P.; Zhang, Y.-Y.; Erfanianfar, G.; Romaniello, M.; Sartoris, B.
2017-03-01
Abell 315 was observed at the European Southern Observatory (ESO) Very Large Telescope (VLT) with the VIsible MultiObject Spectrograph (VIMOS). The VIMOS data were acquired using 8 separate pointings, plus 2 additional pointings required to provide the needed redundancy within the central region and to cover the gaps between the VIMOS quadrants. Catalog of galaxies with redshifts in the region of the cluster Abell 315, with flags indicating whether these galaxies are members of the cluster, members of substructures within the cluster, and with probabilities for the cluster members to belong to the main cluster structure. (1 data file).
Electrical Load Profile Analysis Using Clustering Techniques
NASA Astrophysics Data System (ADS)
Damayanti, R.; Abdullah, A. G.; Purnama, W.; Nandiyanto, A. B. D.
2017-03-01
Data mining is one of the data processing techniques to collect information from a set of stored data. Every day the consumption of electricity load is recorded by the electric company, usually at intervals of 15 or 30 minutes. This paper uses a clustering technique, which is one of the data mining techniques, to analyse the electrical load profiles during 2014. Three clustering methods were compared, namely K-Means (KM), Fuzzy C-Means (FCM), and K-Harmonic Means (KHM). The result shows that KHM is the most appropriate method to classify the electrical load profiles. The optimum number of clusters is determined using the Davies-Bouldin index. By grouping the load profiles, demand-variation analysis and estimation of energy losses for groups of load profiles with similar patterns can be done. From the groups of electric load profiles, the cluster load factor and a range of cluster loss factors can be determined, which helps to find the range of coefficient values for estimating energy losses without performing load flow studies.
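The Davies-Bouldin index used above to pick the optimum number of clusters can be sketched from its standard definition (variable names are ours; the paper does not give an implementation):

```python
import numpy as np

def davies_bouldin(data, labels, centroids):
    """Davies-Bouldin index: the average, over clusters, of the worst-case
    ratio (scatter_i + scatter_j) / separation_ij. Lower is better."""
    k = len(centroids)
    # mean within-cluster scatter for each cluster
    s = np.array([np.linalg.norm(data[labels == i] - centroids[i], axis=1).mean()
                  for i in range(k)])
    worst = []
    for i in range(k):
        ratios = [(s[i] + s[j]) / np.linalg.norm(centroids[i] - centroids[j])
                  for j in range(k) if j != i]
        worst.append(max(ratios))
    return float(np.mean(worst))
```

Running the clustering for a range of k and keeping the k with the smallest index is the usual selection rule.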
Domain and network aggregation of CdTe quantum rods within Langmuir Blodgett monolayers
NASA Astrophysics Data System (ADS)
Zimnitsky, Dmitry; Xu, Jun; Lin, Zhiqun; Tsukruk, Vladimir V.
2008-05-01
Control over the organization of quantum rods was demonstrated by changing the surface area at the air-liquid interface by means of the Langmuir-Blodgett (LB) technique. The LB isotherm of CdTe quantum rods capped with a mixture of alkylphosphines shows a transition point in the liquid-solid state, which is caused by the inter-rod reorganization. As we observed, at low surface pressure the quantum rods are assembled into round-shaped aggregates composed of a monolayer of nanorods packed in limited-size clusters with random orientation. The increase of the surface pressure leads to the rearrangement of these aggregates into elongated bundles composed of uniformly oriented nanorod clusters. Further compression results in denser packing of nanorods aggregates and in the transformation of monolayered domains into a continuous network of locally ordered quantum rods.
Intersection Detection Based on Qualitative Spatial Reasoning on Stopping Point Clusters
NASA Astrophysics Data System (ADS)
Zourlidou, S.; Sester, M.
2016-06-01
The purpose of this research is to propose and test a method for detecting intersections by analysing collectively acquired trajectories of moving vehicles. Instead of solely relying on the geometric features of the trajectories, such as heading changes, which may indicate turning points and consequently intersections, we extract semantic features of the trajectories in the form of sequences of stops and moves. Under this spatiotemporal prism, the extracted semantic information, which indicates where vehicles stop, can reveal important locations, such as junctions. The advantage of the proposed approach in comparison with existing turning-point oriented approaches is that it can detect intersections even when not all the crossing road segments are sampled and therefore no turning points are observed in the trajectories. The challenge with this approach is twofold: first, not all vehicles stop at the same location, so the stop location is blurred along the direction of the road; second, as a consequence, nearby junctions can induce similar stop locations. As a first step, a density-based clustering is applied on the layer of stop observations and clusters of stop events are found. Representative points of the clusters are determined (one per cluster) and in a last step the existence of an intersection is clarified based on spatial relational cluster reasoning, with which less informative geospatial clusters, in terms of whether a junction exists and where its centre lies, are transformed into more informative ones. Relational reasoning criteria, based on the relative orientation of the clusters with respect to their adjacent ones, are discussed for making sense of the relation that connects them, and finally for forming groups of stop events that belong to the same junction.
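After the density-based clustering step, one representative stop location per cluster has to be extracted. The paper does not specify how; one natural, illustrative choice is the medoid, sketched below (names and the medoid choice are ours).

```python
import numpy as np

def cluster_representatives(points, labels):
    """One representative stop location per cluster: the medoid, i.e. the
    member minimizing its total distance to the other members."""
    reps = {}
    for c in set(labels) - {-1}:          # -1 = noise, no representative
        members = points[labels == c]
        d = np.linalg.norm(members[:, None] - members[None, :], axis=2)
        reps[c] = members[d.sum(axis=1).argmin()]
    return reps
```

Unlike the centroid, the medoid is always an actually observed stop, which keeps the representative on the road network.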
Melting phenomena: effect of composition for 55-atom Ag-Pd bimetallic clusters.
Cheng, Daojian; Wang, Wenchuan; Huang, Shiping
2008-05-14
Understanding the composition effect on the melting processes of bimetallic clusters is important for their applications. Here, we report the relationship between the melting point and the metal composition for the 55-atom icosahedral Ag-Pd bimetallic clusters by canonical Monte Carlo simulations, using the second-moment approximation of the tight-binding potentials (TB-SMA) for the metal-metal interactions. Abnormal melting phenomena for the systems of interest are found. Our simulation results reveal that the dependence of the melting point on the composition is not a monotonic change, but experiences three different stages. The melting temperatures of the Ag-Pd bimetallic clusters increase monotonically with the concentration of the Ag atoms first. Then, they reach a plateau presenting almost a constant value. Finally, they decrease sharply at a specific composition. The main reason for this change can be explained in terms of the relative stability of the Ag-Pd bimetallic clusters at different compositions. The results suggest that the more stable the cluster, the higher the melting point for the 55-atom icosahedral Ag-Pd bimetallic clusters at different compositions.
A fast learning method for large scale and multi-class samples of SVM
NASA Astrophysics Data System (ADS)
Fan, Yu; Guo, Huiming
2017-06-01
A fast learning method for multi-class SVM (Support Vector Machine) classification based on a binary tree is presented, to address the low learning efficiency of SVMs when processing large-scale multi-class samples. A bottom-up method is adopted to build the binary-tree hierarchy; given the resulting hierarchy, a sub-classifier learns from the samples corresponding to each node. During learning, several class clusters are generated by a first clustering of the training samples. Central points are extracted from those class clusters that contain only one type of sample. For clusters that contain two types of samples, the cluster numbers of their positive and negative samples are set according to their degree of mixture, a secondary clustering is undertaken, and central points are then extracted from the resulting sub-clusters. Sub-classifiers are obtained by learning from the reduced sample set formed by the extracted central points. Simulation experiments show that this fast learning method, based on multi-level clustering, maintains high classification accuracy while greatly reducing the number of samples and effectively improving learning efficiency.
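The central-point extraction that shrinks the training set can be sketched with a plain k-means pass. This is a simplification of the paper's scheme: a single-level clustering with first-k initialization for determinism (k-means++ or random initialization is more usual), not the multi-level procedure itself.

```python
import numpy as np

def reduce_by_centroids(samples, n_clusters, iters=20):
    """Shrink a sample set to n_clusters central points via plain k-means,
    so a downstream (SVM) learner sees far fewer training points."""
    # deterministic init with the first n_clusters samples (illustrative)
    centers = samples[:n_clusters].astype(float).copy()
    for _ in range(iters):
        d = np.linalg.norm(samples[:, None] - centers[None, :], axis=2)
        assign = d.argmin(axis=1)
        for k in range(n_clusters):
            if np.any(assign == k):   # leave an empty cluster's center in place
                centers[k] = samples[assign == k].mean(axis=0)
    return centers
```

Training on the returned centers instead of the raw samples is what cuts the SVM's learning cost in this family of methods.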
Key-Node-Separated Graph Clustering and Layouts for Human Relationship Graph Visualization.
Itoh, Takayuki; Klein, Karsten
2015-01-01
Many graph-drawing methods apply node-clustering techniques based on the density of edges to find tightly connected subgraphs and then hierarchically visualize the clustered graphs. However, users may want to focus on important nodes and their connections to groups of other nodes for some applications. For this purpose, it is effective to separately visualize the key nodes detected based on adjacency and attributes of the nodes. This article presents a graph visualization technique for attribute-embedded graphs that applies a graph-clustering algorithm that accounts for the combination of connections and attributes. The graph clustering step divides the nodes according to the commonality of connected nodes and similarity of feature value vectors. It then calculates the distances between arbitrary pairs of clusters according to the number of connecting edges and the similarity of feature value vectors and finally places the clusters based on the distances. Consequently, the technique separates important nodes that have connections to multiple large clusters and improves the visibility of such nodes' connections. To test this technique, this article presents examples with human relationship graph datasets, including a coauthorship and Twitter communication network dataset.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Cho, Nancy L., E-mail: nlcho@partners.org; Lin, Chi-Iou; Du, Jinyan
Highlights: • Kinome profiling is a novel technique for identifying activated kinases in human cancers. • Src activity is increased in invasive thyroid cancers. • Inhibition of Src activity decreased proliferation and invasion in vitro. • Further investigation of Src targeted therapies in thyroid cancer is warranted. -- Abstract: Background: Novel therapies are needed for the treatment of invasive thyroid cancers. Aberrant activation of tyrosine kinases plays an important role in thyroid oncogenesis. Because current targeted therapies are biased toward a small subset of tyrosine kinases, we conducted a study to reveal novel therapeutic targets for thyroid cancer using a bead-based, high-throughput system. Methods: Thyroid tumors and matched normal tissues were harvested from twenty-six patients in the operating room. Protein lysates were analyzed using the Luminex immunosandwich, a bead-based kinase phosphorylation assay. Data was analyzed using GenePattern 3.0 software and clustered according to histology, demographic factors, and tumor status regarding capsular invasion, size, lymphovascular invasion, and extrathyroidal extension. Survival and invasion assays were performed to determine the effect of Src inhibition in papillary thyroid cancer (PTC) cells. Results: Tyrosine kinome profiling demonstrated upregulation of nine tyrosine kinases in tumors relative to matched normal thyroid tissue: EGFR, PTK6, BTK, HCK, ABL1, TNK1, GRB2, ERK, and SRC. Supervised clustering of well-differentiated tumors by histology, gender, age, or size did not reveal significant differences in tyrosine kinase activity. However, supervised clustering by the presence of invasive disease showed increased Src activity in invasive tumors relative to non-invasive tumors (60% v. 0%, p < 0.05).
In vitro, we found that Src inhibition in PTC cells decreased cell invasion and proliferation. Conclusion: Global kinome analysis enables the discovery of novel targets for thyroid cancer therapy. Further investigation of Src targeted therapy for advanced thyroid cancer is warranted.
Computational Studies on the Anharmonic Dynamics of Molecular Clusters
NASA Astrophysics Data System (ADS)
Mancini, John S.
Molecular nanoclusters present ideal systems to probe the physical forces and dynamics that drive the behavior of larger bulk systems. At the nanocluster limit the first instances of several phenomena can be observed, including the breaking of hydrogen and molecular bonds. Advancements in experimental and theoretical techniques have made it possible to explore these phenomena in great detail. The most fruitful of these studies have used both experimental and theoretical techniques to leverage the strengths of the two approaches. This dissertation seeks to explore several important phenomena of molecular clusters using new and existing theoretical methodologies. Three specific systems are considered: hydrogen chloride clusters, mixed water and hydrogen chloride clusters, and the first cluster in which hydrogen chloride autoionization occurs. The focus of these studies remains as close as possible to experimentally observable phenomena, with the intention of validating, simulating and expanding on experimental work. Specifically, the properties of interest are those related to the vibrational ground- and excited-state dynamics of these systems. Studies are performed using full- and reduced-dimensional potential energy surfaces alongside advanced quantum mechanical methods including diffusion Monte Carlo, vibrational configuration interaction theory and quasi-classical molecular dynamics. The insights gained from these studies are many and varied. A new on-the-fly ab initio method for studying molecular clusters is validated for (HCl)1–6. A landmark study of the dissociation energy and predissociation mechanism of (HCl)3 is reported. The ground states of mixed (HCl)n(H2O)m are found to be highly delocalized across multiple stationary-point configurations. Furthermore, it is identified that the consideration of this delocalization is required in vibrational excited-state calculations to achieve agreement with experimental measurements.
Finally, the theoretical infrared spectrum for the first case of HCl ionization in (H2O)m clusters, H+(H2O)3Cl−, is reported. The calculation indicates that the ionized cluster's spectrum is much more complex than any previous harmonic prediction, with a large number of the system's infrared-active peaks resulting from overtones of lower-frequency molecular motions.
Song, Min; Yu, Hwanjo; Han, Wook-Shin
2011-11-24
Protein-protein interaction (PPI) extraction has been a focal point of much biomedical research and many database curation tools. Both active learning (AL) and semi-supervised SVMs (SSL) have recently been applied to extract PPIs automatically. In this paper, we explore combining AL with SSL to improve the performance of the PPI extraction task. We propose a novel PPI extraction technique called PPISpotter, which combines deterministic-annealing-based SSL with an AL technique to extract protein-protein interactions. In addition, we extract a comprehensive set of features from MEDLINE records by natural language processing (NLP) techniques, which further improves the SVM classifiers. In our feature selection technique, syntactic, semantic, and lexical properties of text are incorporated, which boosts system performance significantly. By conducting experiments with three different PPI corpora, we show that PPISpotter is superior in precision, recall, and F-measure to other techniques incorporated into semi-supervised SVMs, such as random sampling, clustering, and transductive SVMs. Our system is a novel, state-of-the-art technique for efficiently extracting protein-protein interaction pairs.
Performance Analysis of Entropy Methods on K Means in Clustering Process
NASA Astrophysics Data System (ADS)
Dicky Syahputra Lubis, Mhd.; Mawengkang, Herman; Suwilo, Saib
2017-12-01
K-means is a non-hierarchical data clustering method that attempts to partition existing data into one or more clusters/groups. The method partitions the data so that points with the same characteristics are grouped into the same cluster and points with different characteristics are grouped into other clusters. The purpose of this clustering is to minimize the objective function set for the clustering process, which generally means minimizing variation within each cluster and maximizing variation between clusters. However, the method has two main disadvantages: the number k is often not known beforehand, and a randomly chosen starting point may place two initial centroids too close to each other. Therefore, the entropy method is used to determine the starting points for K-means; this method can assign a weight to each member of a set of alternatives and support a decision among them. Entropy is able to measure the discriminating power within a multitude of data sets: under the entropy criterion, the attribute with the greatest variation in values receives the highest weight. The entropy method thus helps the K-means process by determining the starting points that are usually chosen at random, so the clustering converges faster, with fewer iterations than the standard K-means process. Using the postoperative-patient dataset from the UCI Machine Learning Repository, with only 12 records as a worked example, the entropy-based method reaches the desired final result in just two iterations.
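A minimal sketch of this kind of entropy-based seeding (a hypothetical reading of the procedure, not the authors' exact algorithm): feature weights come from the standard entropy weight method, and the k highest-scoring points are taken as initial centroids.

```python
import numpy as np

def entropy_weights(X):
    """Entropy weight method: features with more variation get higher weight."""
    n = X.shape[0]
    P = X / X.sum(axis=0)                      # column-normalised proportions
    with np.errstate(divide="ignore", invalid="ignore"):
        plogp = np.where(P > 0, P * np.log(P), 0.0)  # treat 0*log(0) as 0
    e = -plogp.sum(axis=0) / np.log(n)         # per-feature entropy in [0, 1]
    d = 1.0 - e                                # degree of diversification
    return d / d.sum()

def entropy_init_centroids(X, k):
    """Pick the k points with the highest entropy-weighted scores as seeds."""
    w = entropy_weights(X)
    scores = X @ w
    idx = np.argsort(scores)[::-1][:k]
    return X[idx]
```

A constant feature carries maximal entropy and therefore zero weight, so the seeds are chosen along the directions where the data actually vary.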
NASA Astrophysics Data System (ADS)
Banegas, Frederic; Michelucci, Dominique; Roelens, Marc; Jaeger, Marc
1999-05-01
We present a robust method for automatically constructing an ellipsoidal skeleton (e-skeleton) from a set of 3D points taken from NMR or TDM images. To ensure robustness and accuracy, all points of the objects are taken into account, including the inner ones, unlike existing techniques. This skeleton is especially useful for object characterization, for comparisons between measurements and as a basis for deformable models. It also provides a good initial guess for surface reconstruction algorithms. As output of the entire process, we obtain an analytical description of the chosen entity that is semantically zoomable (local features only, or reconstructed surfaces) at any level of detail (LOD) through control of the discretization step, in voxel or polygon format. This capability allows us to handle objects at interactive frame rates once the e-skeleton is computed. Each e-skeleton is stored as a multiscale CSG implicit tree.
Random Walk Quantum Clustering Algorithm Based on Space
NASA Astrophysics Data System (ADS)
Xiao, Shufen; Dong, Yumin; Ma, Hongyang
2018-01-01
In the random quantum walk, a quantum simulation of the classical walk, data points interact when selecting the appropriate walk strategy by taking advantage of quantum entanglement; thus, the results obtained with the quantum walk differ from those of the classical walk. A new quantum walk clustering algorithm based on space is proposed by applying the quantum walk to clustering analysis. In this algorithm, data points are viewed as walking participants, and similar data points are clustered using the walk function in the pay-off matrix according to a certain rule. The walk process is simplified by implementing a space-combining rule. The proposed algorithm is validated by a simulation test and shown to be superior to existing clustering algorithms, namely K-means, PCA + K-means, and LDA-Km. The effects of some of the parameters in the proposed algorithm on its performance are also analyzed and discussed, and specific suggestions are provided.
NASA Astrophysics Data System (ADS)
Kapustin, P.; Svetukhin, V.; Tikhonchev, M.
2017-06-01
Atomic displacement cascade simulations near symmetric tilt grain boundaries (GBs) in hexagonal close-packed zirconium were considered in this paper, followed by analysis of the resulting defect structures. Four symmetrical tilt GBs - ∑14?, ∑14? with the axis of rotation [0 0 0 1] and ∑32?, ∑32? with the axis of rotation ? - were considered. The molecular dynamics method was used to simulate the atomic displacement cascades. The point defects produced in a cascade were found to accumulate near the GB plane, which acts as an obstacle to the spread of the cascade. The clustering of the point defects produced in the cascades was also analysed. Clusters of both types were represented mainly by single point defects; at the same time, vacancies formed clusters of large size (more than 20 vacancies per cluster), while self-interstitial atom clusters were small.
Automatic pole-like object modeling via 3D part-based analysis of point cloud
NASA Astrophysics Data System (ADS)
He, Liu; Yang, Haoxiang; Huang, Yuchun
2016-10-01
Pole-like objects, including trees, lampposts and traffic signs, are an indispensable part of urban infrastructure. With the advance of vehicle-based laser scanning (VLS), massive point clouds of roadside urban areas are now applied in 3D digital city modeling. Based on the property that different pole-like objects have various canopy parts but similar trunk parts, this paper proposes a 3D part-based shape analysis to robustly extract, identify and model pole-like objects. The proposed method includes 3D clustering and recognition of trunks, voxel growing and part-based 3D modeling. After preprocessing, each trunk center is identified as a point that has a local density peak and the largest minimum inter-cluster distance. Starting from the trunk centers, the remaining points are iteratively assigned to the cluster of their nearest higher-density point. To eliminate noisy points, cluster borders are refined by trimming boundary outliers. Candidate trunks are then extracted from the clustering results by shape analysis in three orthogonal planes. Voxel growing recovers the complete pole-like objects regardless of overlap. Finally, the trunk, branch and crown parts are analyzed to obtain seven feature parameters, which are used to model the three parts respectively and assemble a single part-based 3D model. The proposed method is tested on a VLS-based point cloud of Wuhan University, China, which includes many kinds of trees, lampposts and other pole-like posts under different occlusions and overlaps. Experimental results show that the proposed method can extract the exact attributes and model the roadside pole-like objects efficiently.
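The trunk-centre rule above (a local density peak with the largest minimum distance to any denser point, with remaining points joining their nearest denser neighbour) follows the general density-peak clustering scheme; a minimal sketch, where the cutoff `dc`, the seeding score and the toy data are assumptions:

```python
import numpy as np

def density_peaks(X, dc, k):
    """Density-peak seeding in the spirit of the paper's trunk detection:
    centres have high local density and a large minimum distance to any
    denser point; remaining points join their nearest denser neighbour."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    rho = (d < dc).sum(axis=1) - 1             # local density (cutoff kernel)
    n = len(X)
    delta = np.zeros(n)
    parent = np.full(n, -1)
    order = np.argsort(rho)[::-1]              # decreasing density
    for i, p in enumerate(order):
        if i == 0:
            delta[p] = d[p].max()              # densest point: max distance
        else:
            denser = order[:i]
            j = denser[np.argmin(d[p, denser])]
            delta[p] = d[p, j]
            parent[p] = j
    # seeds: highest rho * delta (the densest point is assumed among them)
    centres = np.argsort(rho * delta)[::-1][:k]
    labels = np.full(n, -1)
    for c_id, c in enumerate(centres):
        labels[c] = c_id
    for p in order:                            # assign in decreasing density
        if labels[p] == -1:
            labels[p] = labels[parent[p]]
    return labels, centres
```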
DOT National Transportation Integrated Search
2016-09-01
We consider the problem of subspace clustering: given points that lie on or near the union of many low-dimensional linear subspaces, recover the subspaces. To this end, one first identifies sets of points close to the same subspace and uses the sets ...
DOE Office of Scientific and Technical Information (OSTI.GOV)
Branchini, Enzo; Camera, Stefano; Cuoco, Alessandro
In this article, we report the detection of a cross-correlation signal between Fermi Large Area Telescope diffuse γ-ray maps and catalogs of clusters. In our analysis, we considered three different catalogs: WHL12, redMaPPer, and PlanckSZ. They all show a positive correlation with different amplitudes, related to the average mass of the objects in each catalog, which also sets the catalog bias. The signal detection is confirmed by the results of a stacking analysis. The cross-correlation signal extends to rather large angular scales, around 1°, that correspond, at the typical redshift of the clusters in these catalogs, to a few to tens of megaparsecs, i.e., the typical scale-length of the large-scale structures in the universe. Most likely this signal is contributed by the cumulative emission from active galactic nuclei (AGNs) associated with the filamentary structures that converge toward the high peaks of the matter density field in which galaxy clusters reside. In addition, our analysis reveals the presence of a second component, more compact in size and compatible with a point-like emission from within individual clusters. At present, we cannot distinguish between the two most likely interpretations for such a signal, i.e., whether it is produced by AGNs inside clusters or if it is a diffuse γ-ray emission from the intracluster medium. Lastly, we argue that this latter, intriguing, hypothesis might be tested by applying this technique to a low-redshift large-mass cluster sample.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Branchini, Enzo; Camera, Stefano; Cuoco, Alessandro
We report the detection of a cross-correlation signal between Fermi Large Area Telescope diffuse γ-ray maps and catalogs of clusters. In our analysis, we considered three different catalogs: WHL12, redMaPPer, and PlanckSZ. They all show a positive correlation with different amplitudes, related to the average mass of the objects in each catalog, which also sets the catalog bias. The signal detection is confirmed by the results of a stacking analysis. The cross-correlation signal extends to rather large angular scales, around 1°, that correspond, at the typical redshift of the clusters in these catalogs, to a few to tens of megaparsecs, i.e., the typical scale-length of the large-scale structures in the universe. Most likely this signal is contributed by the cumulative emission from active galactic nuclei (AGNs) associated with the filamentary structures that converge toward the high peaks of the matter density field in which galaxy clusters reside. In addition, our analysis reveals the presence of a second component, more compact in size and compatible with a point-like emission from within individual clusters. At present, we cannot distinguish between the two most likely interpretations for such a signal, i.e., whether it is produced by AGNs inside clusters or if it is a diffuse γ-ray emission from the intracluster medium. We argue that this latter, intriguing, hypothesis might be tested by applying this technique to a low-redshift large-mass cluster sample.
An algorithm for spatial hierarchy clustering
NASA Technical Reports Server (NTRS)
Dejesusparada, N. (Principal Investigator); Velasco, F. R. D.
1981-01-01
A method for utilizing both spectral and spatial redundancy in compacting and preclassifying images is presented. In multispectral satellite images, a high correlation exists between neighboring image points, which tend to occupy dense and restricted regions of the feature space. The image is divided into windows of the same size, within which clustering is performed. The classes obtained in several neighboring windows are then clustered, and the process is repeated successively until a single region corresponding to the whole image is obtained. With this algorithm only a few points are considered in each clustering, thus reducing computational effort. The method is illustrated as applied to LANDSAT images.
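The window-then-merge idea can be sketched in two levels, assuming per-window k-means and one final clustering of the window centroids (a simplification of the successive neighbour-window merging described above; all names and parameters are illustrative):

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain Lloyd's k-means; empty clusters keep their previous centre."""
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        lab = np.argmin(np.linalg.norm(X[:, None] - C[None], axis=2), axis=1)
        C = np.array([X[lab == j].mean(axis=0) if (lab == j).any() else C[j]
                      for j in range(k)])
    return C, lab

def windowed_cluster(img, win, k_local, k_global):
    """Cluster pixels inside each window, then cluster the window-level
    class centroids, so only a few points enter the final clustering."""
    H, W, B = img.shape
    centroids = []
    for i in range(0, H, win):
        for j in range(0, W, win):
            block = img[i:i+win, j:j+win].reshape(-1, B)
            C, _ = kmeans(block, k_local)
            centroids.append(C)
    centroids = np.vstack(centroids)
    C_final, _ = kmeans(centroids, k_global)
    return C_final
```

The final clustering sees one centroid per window class rather than every pixel, which is the source of the computational saving the abstract describes.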
[Visual field progression in glaucoma: cluster analysis].
Bresson-Dumont, H; Hatton, J; Foucher, J; Fonteneau, M
2012-11-01
Visual field progression analysis is one of the key points in glaucoma monitoring, but distinguishing true progression from random fluctuation is sometimes difficult. Several different algorithms exist, but there is no real consensus on how to detect visual field progression. Trend analysis of global indices (MD, sLV) may miss localized deficits or be affected by media opacities. Conversely, point-by-point analysis makes progression difficult to differentiate from physiological variability, particularly when the sensitivity of a point is already low. The goal of our study was to analyse visual field progression with the EyeSuite™ Octopus Perimetry Clusters algorithm in patients with no significant changes in global indices and no worsening on pointwise linear regression analysis. We analyzed the visual fields of 162 eyes (100 patients: 58 women, 42 men; average age 66.8 ± 10.91 years) with ocular hypertension or glaucoma. For inclusion, at least six reliable visual fields per eye were required, and the trend analysis (EyeSuite™ Perimetry) of visual field global indices (MD and sLV) could show no significant progression. The analysis of changes in cluster mode was then performed. In a second step, eyes with statistically significant worsening of at least one of their clusters were analyzed point-by-point with the Octopus Field Analysis (OFA). Fifty-four eyes (33.33%) had significant worsening in some clusters while their global indices remained stable over time. Glaucoma was more advanced in this group than in the stable group (MD 6.41 dB vs. 2.87 dB); however, 64.82% (35/54) of the eyes in which the clusters progressed had no statistically significant change on trend analysis by pointwise linear regression. Most software algorithms for analyzing visual field progression are essentially trend analyses of global indices, or point-by-point linear regression. This study shows the potential role of cluster trend analysis.
However, for best results, it is preferable to compare the analyses of several tests in combination with morphologic examination. Copyright © 2012 Elsevier Masson SAS. All rights reserved.
Structure and stability of small Li2+(X2Σg+)-Xen (n = 1-6) clusters
NASA Astrophysics Data System (ADS)
Saidi, Sameh; Ghanmi, Chedli; Berriche, Hamid
2014-04-01
We have studied the structure and stability of the Li2+(X2Σg+)Xen (n = 1-6) clusters for special symmetry groups. The potential energy surfaces of these clusters are described using an accurate ab initio approach based on a non-empirical pseudopotential, a parameterized l-dependent polarization potential, and analytic potential forms for the Li+Xe and Xe-Xe interactions. The pseudopotential technique reduces the number of active electrons of the Li2+(X2Σg+)Xen (n = 1-6) clusters to a single electron, the Li valence electron. The core-core interactions for Li+Xe are included using an accurate CCSD(T) potential fitted with the analytical form of Tang and Toennies. For the Xe-Xe interaction we have used the Lennard-Jones (LJ 6-12) analytical form. The potential energy surfaces of the Li2+(X2Σg+)Xen (n = 1-6) clusters are computed with the Li2+(X2Σg+) alkali dimer held fixed at its equilibrium distance. They are used to extract information on the stability of the Li2+(X2Σg+)Xen (n = 1-6) clusters. For each n, the stability of the different isomers is examined by comparing their potential energy surfaces. Moreover, we have determined the quantum energies (D0), the zero-point energies (ZPE) and the ZPE%. To the best of our knowledge, no experimental or theoretical work exists for the Li2+(X2Σg+)Xen (n = 1-6) clusters; our results are presented for the first time.
CROSS-CORRELATING THE γ-RAY SKY WITH CATALOGS OF GALAXY CLUSTERS
Branchini, Enzo; Camera, Stefano; Cuoco, Alessandro; ...
2017-01-18
In this article, we report the detection of a cross-correlation signal between Fermi Large Area Telescope diffuse γ-ray maps and catalogs of clusters. In our analysis, we considered three different catalogs: WHL12, redMaPPer, and PlanckSZ. They all show a positive correlation with different amplitudes, related to the average mass of the objects in each catalog, which also sets the catalog bias. The signal detection is confirmed by the results of a stacking analysis. The cross-correlation signal extends to rather large angular scales, around 1°, that correspond, at the typical redshift of the clusters in these catalogs, to a few to tens of megaparsecs, i.e., the typical scale-length of the large-scale structures in the universe. Most likely this signal is contributed by the cumulative emission from active galactic nuclei (AGNs) associated with the filamentary structures that converge toward the high peaks of the matter density field in which galaxy clusters reside. In addition, our analysis reveals the presence of a second component, more compact in size and compatible with a point-like emission from within individual clusters. At present, we cannot distinguish between the two most likely interpretations for such a signal, i.e., whether it is produced by AGNs inside clusters or if it is a diffuse γ-ray emission from the intracluster medium. Lastly, we argue that this latter, intriguing, hypothesis might be tested by applying this technique to a low-redshift large-mass cluster sample.
Dust-enshrouded super star-clusters
NASA Astrophysics Data System (ADS)
Sauvage, Marc; Plante, Stéphanie
2003-04-01
With the advent of either sensitive space-borne infrared cameras or their high-resolution ground-based siblings, we are uncovering a new category of star clusters: the dust-enshrouded super-star clusters. These manifest themselves only beyond a few microns, as their shroud of dust is able to block all light emitted by the stars themselves. Here we present our results on the spectacular cluster in SBS 0335-052, a very metal-poor galaxy. We also point to the growing number of galaxy analogs to SBS 0335-052, revealing the possibility that these clusters signal a major mode of star formation in starbursts. We conclude by listing a number of open points these clusters raise, in particular with respect to high-redshift counterparts.
Summary Diagrams for Coupled Hydrodynamic-Ecosystem Model Skill Assessment
2009-01-01
reference point have the smallest unbiased RMSD value (Fig. 3). It would appear that the cluster of model points closest to the reference point may...total RMSD values. This is particularly the case for phytoplankton absorption (Fig. 3B) where the cluster of points closest to the reference...pattern statistics and the bias (difference of mean values) each magnitude of the total Root-Mean-Square Difference (RMSD). An alternative skill score and
Analysis of the mutations induced by conazole fungicides in vivo.
Ross, Jeffrey A; Leavitt, Sharon A
2010-05-01
The mouse liver tumorigenic conazole fungicides triadimefon and propiconazole have previously been shown to be in vivo mouse liver mutagens in the Big Blue transgenic mutation assay when administered in feed at tumorigenic doses, whereas the non-tumorigenic conazole myclobutanil was not mutagenic. DNA sequencing of the mutants recovered from each treatment group as well as from animals receiving control diet was conducted to gain additional insight into the mode of action by which tumorigenic conazoles induce mutations. Relative dinucleotide mutabilities (RDMs) were calculated for each possible dinucleotide in each treatment group and then examined by multivariate statistical analysis techniques. Unsupervised hierarchical clustering analysis of RDM values segregated the two independent control groups together, along with the non-tumorigen myclobutanil. The two tumorigenic conazoles clustered together in a distinct grouping. Partitioning around medoids of RDM values into two clusters also grouped triadimefon and propiconazole together in one cluster and the two control groups and myclobutanil together in a second cluster. Principal component analysis of these results identifies two components that account for 88.3% of the variability in the points. Taken together, these results are consistent with the hypothesis that propiconazole- and triadimefon-induced mutations do not represent clonal expansion of background mutations and support the hypothesis that they arise from the accumulation of reactive electrophilic metabolic intermediates within the liver in vivo.
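The unsupervised hierarchical clustering step can be illustrated with a plain average-linkage implementation; the points below are placeholders, not the study's RDM values:

```python
import numpy as np

def agglomerative(X, k):
    """Plain average-linkage agglomerative clustering: repeatedly merge the
    two groups with the smallest mean pairwise distance until k remain."""
    clusters = [[i] for i in range(len(X))]
    while len(clusters) > k:
        best, pair = np.inf, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # average pairwise distance between the two groups
                d = np.mean([np.linalg.norm(X[i] - X[j])
                             for i in clusters[a] for j in clusters[b]])
                if d < best:
                    best, pair = d, (a, b)
        a, b = pair
        clusters[a] += clusters.pop(b)
    return clusters
```

On profiles like those in the study, the two controls and the non-tumorigen would be expected to land in one group and the two tumorigens in the other.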
From solid solution to cluster formation of Fe and Cr in α-Zr
NASA Astrophysics Data System (ADS)
Burr, P. A.; Wenman, M. R.; Gault, B.; Moody, M. P.; Ivermark, M.; Rushton, M. J. D.; Preuss, M.; Edwards, L.; Grimes, R. W.
2015-12-01
To understand the mechanisms by which the re-solution of Fe and Cr additions increases the corrosion rate of irradiated Zr alloys, the solubility and clustering of Fe and Cr in model binary Zr alloys were investigated using a combination of experimental and modelling techniques: atom probe tomography (APT), x-ray diffraction (XRD), thermoelectric power (TEP) and density functional theory (DFT). Cr occupies both interstitial and substitutional sites in the α-Zr lattice; Fe favours interstitial sites, and a low-symmetry site that was not previously modelled is found to be the most favourable for Fe. Lattice expansion as a function of Fe and Cr content in the α-Zr matrix deviates from Vegard's law and is strongly anisotropic for Fe additions, expanding the c-axis while contracting the a-axis. The matrix content of solutes cannot be reliably estimated from lattice parameter measurements; instead, a combination of TEP and APT was employed. Defect clusters form at higher solution concentrations and induce a smaller lattice strain than the dilute defects. In the presence of a Zr vacancy, all two-atom clusters are more soluble than individual point defects, and as many as four Fe or three Cr atoms can be accommodated in a single Zr vacancy. The Zr vacancy is critical for the increased apparent solubility of defect clusters; the implications for irradiation-induced microstructure changes in Zr alloys are discussed.
Automatic microseismic event picking via unsupervised machine learning
NASA Astrophysics Data System (ADS)
Chen, Yangkang
2018-01-01
Effective and efficient arrival picking plays an important role in microseismic and earthquake data processing and imaging. Widely used arrival-picking algorithms based on the short-term-average/long-term-average (STA/LTA) ratio suffer from sensitivity to moderate-to-strong random ambient noise. To make the state-of-the-art arrival-picking approaches effective, microseismic data need to be pre-processed first, for example by removing a sufficient amount of noise, and only then analysed by arrival pickers. To overcome the noise issue in arrival picking for weak microseismic or earthquake events, I leverage machine learning techniques to help recognize seismic waveforms in microseismic or earthquake data. Because supervised machine learning algorithms depend on large volumes of well-designed training data, I utilize an unsupervised machine learning algorithm to cluster the time samples into two groups, that is, waveform points and non-waveform points. The fuzzy clustering algorithm has been demonstrated to be effective for this purpose. A group of synthetic and real microseismic and earthquake data sets with different levels of complexity shows that the proposed method is much more robust than the state-of-the-art STA/LTA method in picking microseismic events, even in the case of moderately strong background noise.
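A minimal sketch of the fuzzy-clustering step: standard fuzzy c-means on sample amplitudes splits a toy trace into waveform and non-waveform groups. The trace, the parameters, and the use of raw amplitude as the single feature are illustrative assumptions, not the paper's setup:

```python
import numpy as np

def fuzzy_cmeans(X, c=2, m=2.0, iters=50, seed=0):
    """Fuzzy c-means: every sample gets a membership in every cluster."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)          # memberships sum to 1 per sample
    for _ in range(iters):
        W = U ** m
        centres = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None] - centres[None], axis=2) + 1e-12
        inv = d ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)
    return U, centres

# toy trace: near-zero noise samples with a 10-sample "event" burst
trace = np.concatenate([np.full(50, 0.1), np.full(10, 5.0), np.full(40, 0.1)])
U, centres = fuzzy_cmeans(np.abs(trace)[:, None])
event = np.argmax(centres.ravel())             # cluster with the larger centre
labels = U.argmax(axis=1) == event             # True = waveform point
```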
Ding, Jiarui; Shah, Sohrab; Condon, Anne
2016-01-01
Motivation: Many biological data processing problems can be formalized as clustering problems to partition data points into sensible and biologically interpretable groups. Results: This article introduces densityCut, a novel density-based clustering algorithm, which is both time- and space-efficient and proceeds as follows: densityCut first roughly estimates the densities of data points from a K-nearest-neighbour graph and then refines the densities via a random walk. A cluster consists of points falling into the basin of attraction of an estimated mode of the underlying density function. A post-processing step merges clusters and generates a hierarchical cluster tree. The number of clusters is selected from the most stable clustering in the hierarchical cluster tree. Experimental results on ten synthetic benchmark datasets and two microarray gene expression datasets demonstrate that densityCut performs better than state-of-the-art algorithms for clustering biological datasets. For applications, we focus on recent cancer mutation clustering and single-cell data analyses, namely clustering variant allele frequencies of somatic mutations to reveal the clonal architectures of individual tumours, clustering single-cell gene expression data to uncover cell population compositions, and clustering single-cell mass cytometry data to detect communities of cells of the same functional states or types. densityCut performs better than competing algorithms and is scalable to large datasets. Availability and Implementation: Data and the densityCut R package are available from https://bitbucket.org/jerry00/densitycut_dev. Contact: condon@cs.ubc.ca or sshah@bccrc.ca or jiaruid@cs.ubc.ca Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27153661
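The first two densityCut stages (rough KNN density, random-walk refinement) can be sketched as follows; the restart-walk formulation and parameter values are assumptions, not the package's exact algorithm:

```python
import numpy as np

def knn_density(X, K=3, alpha=0.85, steps=30):
    """Rough KNN density estimate refined by a restart random walk on the
    K-nearest-neighbour graph."""
    d = np.linalg.norm(X[:, None] - X[None], axis=2)
    np.fill_diagonal(d, np.inf)                # exclude self-distances
    knn = np.argsort(d, axis=1)[:, :K]
    rho0 = 1.0 / np.take_along_axis(d, knn, axis=1).mean(axis=1)
    rho0 /= rho0.sum()                         # initial density estimate
    # row-stochastic transition matrix over the KNN graph
    P = np.zeros_like(d)
    for i, nbrs in enumerate(knn):
        P[i, nbrs] = 1.0 / K
    rho = rho0.copy()
    for _ in range(steps):                     # walk smooths the densities
        rho = alpha * (P.T @ rho) + (1 - alpha) * rho0
    return rho
```

Points in dense neighbourhoods keep receiving mass from their neighbours, while isolated points do not, which sharpens the contrast between modes and background.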
Correlation Functions in Two-Dimensional Critical Systems with Conformal Symmetry
NASA Astrophysics Data System (ADS)
Flores, Steven Miguel
This thesis presents a study of certain conformal field theory (CFT) correlation functions that describe physical observables in conform ally invariant two-dimensional critical systems. These are typically continuum limits of critical lattice models in a domain within the complex plane and with a boundary. Certain clusters, called
Effect of Coulomb Correlation on the Magnetic Properties of Mn Clusters.
Huang, Chengxi; Zhou, Jian; Deng, Kaiming; Kan, Erjun; Jena, Puru
2018-05-03
In spite of decades of research, a fundamental understanding of the unusual magnetic behavior of small Mn clusters remains a challenge. Experiments show that Mn2 is antiferromagnetic while small clusters containing up to five Mn atoms are ferromagnetic with magnetic moments of 5 μB/atom and become ferrimagnetic as they grow further. Theoretical studies based on density functional theory (DFT), however, find Mn2 to be ferromagnetic, with ferrimagnetic order setting in at different sizes that depend upon the computational methods used. While quantum chemical techniques correctly account for the antiferromagnetic ground state of Mn2, they are computationally too demanding to treat larger clusters, making it difficult to understand the evolution of magnetism. These studies clearly point to the importance of correlation and the need to find ways to treat it effectively for larger clusters and nanostructures. Here, we show that the DFT+U method can be used to account for strong correlation. We determine the on-site Coulomb correlation, Hubbard U, self-consistently by using linear response theory and study its effect on the magnetic coupling of Mn clusters containing up to five atoms. With a calculated U value of 4.8 eV, we show that the ground state of Mn2 is antiferromagnetic with a Mn-Mn distance of 3.34 Å, which agrees well with the electron spin resonance experiment. Equally important, we show that on-site Coulomb correlation also plays an important role in the evolution of magnetic coupling in larger clusters, as the results differ significantly from standard DFT calculations. We conclude that for a proper understanding of the magnetism of Mn nanostructures (clusters, chains, and layers) one must take into account the effect of strong correlation.
Locality-Aware CTA Clustering For Modern GPUs
DOE Office of Scientific and Technical Information (OSTI.GOV)
Li, Ang; Song, Shuaiwen; Liu, Weifeng
2017-04-08
In this paper, we proposed a novel clustering technique for tapping into the performance potential of a largely ignored type of locality: inter-CTA locality. We first demonstrated the capability of the existing GPU hardware to exploit such locality, both spatially and temporally, on the L1 or L1/Tex unified cache. To verify the potential of this locality, we quantified its existence in a broad spectrum of applications and discussed its sources of origin. Based on these insights, we proposed the concept of CTA-Clustering and its associated software techniques. Finally, we evaluated these techniques on all modern generations of NVIDIA GPU architectures. The experimental results showed that our proposed clustering techniques could significantly improve on-chip cache performance.
Variance-Based Cluster Selection Criteria in a K-Means Framework for One-Mode Dissimilarity Data.
Vera, J Fernando; Macías, Rodrigo
2017-06-01
One of the main problems in cluster analysis is that of determining the number of groups in the data. In general, the approach taken depends on the cluster method used. For K-means, some of the most widely employed criteria are formulated in terms of the decomposition of the total point scatter, regarding a two-mode data set of N points in p dimensions, which are optimally arranged into K classes. This paper addresses the formulation of criteria to determine the number of clusters, in the general situation in which the available information for clustering is a one-mode [Formula: see text] dissimilarity matrix describing the objects. In this framework, p and the coordinates of points are usually unknown, and the application of criteria originally formulated for two-mode data sets is dependent on their possible reformulation in the one-mode situation. The decomposition of the variability of the clustered objects is proposed in terms of the corresponding block-shaped partition of the dissimilarity matrix. Within-block and between-block dispersion values for the partitioned dissimilarity matrix are derived, and variance-based criteria are subsequently formulated in order to determine the number of groups in the data. A Monte Carlo experiment was carried out to study the performance of the proposed criteria. For simulated clustered points in p dimensions, greater efficiency in recovering the number of clusters is obtained when the criteria are calculated from the related Euclidean distances instead of the known two-mode data set, in general, for unequal-sized clusters and for low dimensionality situations. For simulated dissimilarity data sets, the proposed criteria always outperform the results obtained when these criteria are calculated from their original formulation, using dissimilarities instead of distances.
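For squared-Euclidean dissimilarities, the within-cluster scatter can be computed from the dissimilarity blocks alone via W_k = Σ_{i,j∈k} D_ij² / (2 n_k), which is the kind of identity the one-mode reformulation relies on; a sketch (the function name and test data are illustrative):

```python
import numpy as np

def dispersion_decomposition(D, labels):
    """Within/between dispersion computed directly from a one-mode
    dissimilarity matrix D, with no point coordinates, using
    W_k = sum_{i,j in k} D_ij^2 / (2 n_k) for the squared-Euclidean case."""
    D2 = D ** 2
    N = len(D)
    total = D2.sum() / (2 * N)                 # total point scatter
    within = 0.0
    for k in np.unique(labels):
        idx = np.where(labels == k)[0]
        within += D2[np.ix_(idx, idx)].sum() / (2 * len(idx))
    return within, total - within              # (within, between)
```

For distances derived from actual coordinates, these values match the usual centroid-based within- and between-cluster sums of squares, so two-mode criteria can be evaluated in the one-mode setting.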
Bolin, Jocelyn H; Edwards, Julianne M; Finch, W Holmes; Cassady, Jerrell C
2014-01-01
Although traditional clustering methods (e.g., K-means) have been shown to be useful in the social sciences, it is often difficult for such methods to handle situations where clusters in the population overlap or are ambiguous. Fuzzy clustering, a method already recognized in many disciplines, provides a more flexible alternative to these traditional clustering methods. Fuzzy clustering differs from other traditional clustering methods in that it allows a case to belong to multiple clusters simultaneously. Unfortunately, fuzzy clustering techniques remain relatively unused in the social and behavioral sciences. The purpose of this paper is to introduce fuzzy clustering to these audiences, who are currently relatively unfamiliar with the technique. In order to demonstrate the advantages associated with this method, cluster solutions of a common perfectionism measure were created using both fuzzy clustering and K-means clustering, and the results compared. Results of these analyses reveal that different cluster solutions are found by the two methods, and that the similarity between the different clustering solutions depends on the amount of cluster overlap allowed for in fuzzy clustering. PMID:24795683
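The graded memberships described above can be illustrated with a minimal fuzzy c-means sketch. This is the textbook algorithm, not the specific implementation the paper used; parameter names (fuzzifier `m`, iteration count) are illustrative:

```python
import math
import random

def fuzzy_c_means(points, c, m=2.0, iters=50, seed=0):
    """Minimal fuzzy c-means. Returns (u, centers), where u[i][k] is the
    membership of point i in cluster k; each row of u sums to 1, so a
    point can belong to several clusters simultaneously."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # random initial memberships, normalized per point
    u = [[rng.random() for _ in range(c)] for _ in range(n)]
    u = [[v / sum(row) for v in row] for row in u]
    centers = [[0.0] * dim for _ in range(c)]
    for _ in range(iters):
        # centers are membership-weighted means
        for k in range(c):
            w = [u[i][k] ** m for i in range(n)]
            tot = sum(w)
            centers[k] = [sum(w[i] * points[i][d] for i in range(n)) / tot
                          for d in range(dim)]
        # memberships from relative inverse distances to the centers
        for i in range(n):
            d = [math.dist(points[i], centers[k]) or 1e-12 for k in range(c)]
            for k in range(c):
                u[i][k] = 1.0 / sum((d[k] / d[j]) ** (2.0 / (m - 1.0))
                                    for j in range(c))
    return u, centers
```

With the fuzzifier m → 1 the memberships harden toward a K-means-like assignment; larger m allows more overlap, which is exactly the knob the comparison in the abstract turns on.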
Robles, Guillermo; Fresno, José Manuel; Martínez-Tarifa, Juan Manuel; Ardila-Rey, Jorge Alfredo; Parrado-Hernández, Emilio
2018-03-01
The measurement of partial discharge (PD) signals in the radio frequency (RF) range has gained popularity among utilities and specialized monitoring companies in recent years. Unfortunately, on most occasions the data are hidden by noise and coupled interference that hinder their interpretation and render them useless, especially in acquisition systems in the ultra high frequency (UHF) band, where the signals of interest are weak. This paper focuses on a method that uses a selective spectral signal characterization to represent each signal, whether a type of partial discharge or interference/noise, by the power contained in the most representative frequency bands. The technique can be considered a dimensionality reduction problem in which all the energy information contained in the frequency components is condensed into a reduced number of UHF or high frequency (HF) and very high frequency (VHF) bands. In general, dimensionality reduction methods make the interpretation of results difficult because the inherent physical nature of the signal is lost in the process. The proposed selective spectral characterization is a preprocessing tool that facilitates the subsequent main processing. The starting point is a clustering of signals that could form the core of a PD monitoring system. The dimensionality reduction technique should therefore discover the frequency bands that best enhance the affinity between signals in the same cluster and the differences between signals in different clusters. This is done by maximizing the minimum Mahalanobis distance between clusters using particle swarm optimization (PSO). The tool is tested with three sets of experimental signals to demonstrate its capabilities in separating noise and PDs with low signal-to-noise ratio and in separating different types of partial discharges measured in the UHF and HF/VHF bands.
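The objective at the heart of this scheme can be sketched as a fitness function: the smallest pairwise Mahalanobis distance between cluster centroids over the selected bands. For brevity this sketch uses a pooled diagonal covariance and a random search over band subsets as stand-ins for the full covariance and the PSO of the paper:

```python
import itertools
import random

def pooled_var(clusters, dims):
    """Pooled within-cluster variance per selected feature (band)."""
    var = []
    for d in dims:
        num, den = 0.0, 0
        for pts in clusters:
            mu = sum(p[d] for p in pts) / len(pts)
            num += sum((p[d] - mu) ** 2 for p in pts)
            den += len(pts) - 1
        var.append(max(num / den if den else 1.0, 1e-12))
    return var

def min_mahalanobis(clusters, dims):
    """Fitness: minimum pairwise Mahalanobis distance between centroids,
    under a shared diagonal covariance (a simplification)."""
    var = pooled_var(clusters, dims)
    mus = [[sum(p[d] for p in pts) / len(pts) for d in dims]
           for pts in clusters]
    return min(sum((x - y) ** 2 / v for x, y, v in zip(a, b, var)) ** 0.5
               for a, b in itertools.combinations(mus, 2))

def select_bands(clusters, n_bands, k, trials=200, seed=1):
    """Random-search stand-in for PSO: pick k of n_bands maximizing the
    minimum inter-cluster separation."""
    rng = random.Random(seed)
    best_dims, best_fit = None, -1.0
    for _ in range(trials):
        dims = tuple(sorted(rng.sample(range(n_bands), k)))
        fit = min_mahalanobis(clusters, dims)
        if fit > best_fit:
            best_dims, best_fit = dims, fit
    return best_dims, best_fit
```

A proper PSO would replace `select_bands` with a particle population over continuous band-edge positions, but the fitness being maximized is the same.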
Suicide Clusters: A Review of Risk Factors and Mechanisms
ERIC Educational Resources Information Center
Haw, Camilla; Hawton, Keith; Niedzwiedz, Claire; Platt, Steve
2013-01-01
Suicide clusters, although uncommon, cause great concern in the communities in which they occur. We searched the world literature on suicide clusters and describe the risk factors and proposed psychological mechanisms underlying the spatio-temporal clustering of suicides (point clusters). Potential risk factors include male gender, being an…
NASA Technical Reports Server (NTRS)
Reese, Erik D.; Mroczkowski, Tony; Menanteau, Felipe; Hilton, Matt; Sievers, Jonathan; Aguirre, Paula; Appel, John William; Baker, Andrew J.; Bond, J. Richard; Das, Sudeep;
2011-01-01
We present follow-up observations with the Sunyaev-Zel'dovich Array (SZA) of optically-confirmed galaxy clusters found in the equatorial survey region of the Atacama Cosmology Telescope (ACT): ACT-CL J0022-0036, ACT-CL J2051+0057, and ACT-CL J2337+0016. ACT-CL J0022-0036 is a newly-discovered, massive (~10^15 M_sun), high-redshift (z = 0.81) cluster revealed by ACT through the Sunyaev-Zel'dovich effect (SZE). Deep, targeted observations with the SZA allow us to probe a broader range of cluster spatial scales, better disentangle cluster decrements from radio point source emission, and derive more robust integrated SZE flux and mass estimates than we can with ACT data alone. For the two clusters we detect with the SZA we compute integrated SZE signal and derive masses from the SZA data only. ACT-CL J2337+0016, also known as Abell 2631, has archival Chandra data that allow an additional X-ray-based mass estimate. Optical richness is also used to estimate cluster masses and shows good agreement with the SZE and X-ray-based estimates. Based on the point sources detected by the SZA in these three cluster fields and an extrapolation to ACT's frequency, we estimate that point sources could be contaminating the SZE decrement at the <~ 20% level for some fraction of clusters.
The cluster-cluster correlation function. [of galaxies
NASA Technical Reports Server (NTRS)
Postman, M.; Geller, M. J.; Huchra, J. P.
1986-01-01
The clustering properties of the Abell and Zwicky cluster catalogs are studied using the two-point angular and spatial correlation functions. The catalogs are divided into eight subsamples to determine the dependence of the correlation function on distance, richness, and the method of cluster identification. It is found that the Corona Borealis supercluster contributes significant power to the spatial correlation function of the Abell cluster sample with distance class four or less. The distance-limited catalog of 152 Abell clusters, which is not greatly affected by a single system, has a spatial correlation function consistent with the power law ξ(r) = 300 r^(-1.8). In both the distance class four or less and the distance-limited samples, the signal in the spatial correlation function is a power law detectable out to 60 h^-1 Mpc. The amplitude of ξ(r) for clusters of richness class two is about three times that for richness class one clusters. The two-point spatial correlation function is sensitive to the use of estimated redshifts.
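The underlying pair counting can be illustrated with the simple "natural" estimator ξ = DD/RR − 1 (normalized for catalog sizes); the paper's estimators are more careful about edge corrections, but the sketch shows the mechanics:

```python
import math

def pair_counts(points, edges):
    """Histogram of pairwise separations into the bins defined by edges."""
    counts = [0] * (len(edges) - 1)
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            r = math.dist(points[i], points[j])
            for k in range(len(counts)):
                if edges[k] <= r < edges[k + 1]:
                    counts[k] += 1
                    break
    return counts

def xi_natural(data, randoms, edges):
    """Natural estimator xi(r) = norm * DD/RR - 1 per separation bin,
    with norm correcting for unequal catalog sizes."""
    dd = pair_counts(data, edges)
    rr = pair_counts(randoms, edges)
    nd, nr = len(data), len(randoms)
    norm = (nr * (nr - 1)) / (nd * (nd - 1))
    return [norm * d / r - 1 if r else float("nan")
            for d, r in zip(dd, rr)]
```

In practice the random catalog must share the survey geometry and selection function of the data, and estimators such as Landy-Szalay are preferred for their smaller variance.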
Study of point- and cluster-defects in radiation-damaged silicon
NASA Astrophysics Data System (ADS)
Donegani, Elena M.; Fretwurst, Eckhart; Garutti, Erika; Klanner, Robert; Lindstroem, Gunnar; Pintilie, Ioana; Radu, Roxana; Schwandt, Joern
2018-08-01
Non-ionising energy loss of radiation produces point defects and defect clusters in silicon, which result in a significant degradation of sensor performance. In this contribution, results from TSC (Thermally Stimulated Current) defect spectroscopy are presented for silicon pad diodes irradiated by electrons to fluences of a few 10^14 cm^-2 at energies between 3.5 and 27 MeV, for isochronal annealing between 80 and 280 °C. A method based on SRH (Shockley-Read-Hall) statistics is introduced, which assumes that the ionisation energy of the defects in a cluster depends on the fraction of occupied traps. The difference in ionisation energy between an isolated point defect and a fully occupied cluster, ΔEa, is extracted from the TSC data. For the VOi (vacancy-oxygen interstitial) defect ΔEa = 0 is found, which confirms that it is a point defect and validates the method for point defects. For clusters made of deep acceptors, the ΔEa values for different defects are determined after annealing at 80 °C as a function of electron energy, and for the irradiation with 15 MeV electrons as a function of annealing temperature. For the irradiation with 3.5 MeV electrons the value ΔEa = 0 is found, whereas for electron energies of 6-27 MeV, ΔEa > 0. This agrees with the expected threshold of about 5 MeV for cluster formation by electrons. The ΔEa values determined as a function of annealing temperature show that the annealing rate differs between defects. A naive diffusion model is used to estimate the temperature dependencies of the diffusion of the defects in the clusters.
Clustervision: Visual Supervision of Unsupervised Clustering.
Kwon, Bum Chul; Eysenbach, Ben; Verma, Janu; Ng, Kenney; De Filippi, Christopher; Stewart, Walter F; Perer, Adam
2018-01-01
Clustering, the process of grouping similar items into distinct partitions, is a common type of unsupervised machine learning that can be useful for summarizing and aggregating complex multi-dimensional data. However, data can be clustered in many ways, and there exists a large body of algorithms designed to reveal different patterns. While having access to a wide variety of algorithms is helpful, in practice it is quite difficult for data scientists to choose and parameterize algorithms to get clustering results relevant to their dataset and analytical tasks. To alleviate this problem, we built Clustervision, a visual analytics tool that helps ensure data scientists find the right clustering among the large number of techniques and parameters available. Our system clusters data using a variety of clustering techniques and parameters and then ranks the clustering results using five quality metrics. In addition, users can guide the system to produce more relevant results by providing task-relevant constraints on the data. Our visual user interface allows users to find high-quality clustering results, explore the clusters using several coordinated visualization techniques, and select the cluster result that best suits their task. We demonstrate this novel approach through a case study with a team of researchers in the medical domain and show that our system empowers users to choose an effective representation of their complex data.
Concept mapping and network analysis: an analytic approach to measure ties among constructs.
Goldman, Alyssa W; Kane, Mary
2014-12-01
Group concept mapping is a mixed-methods approach that helps a group visually represent its ideas on a topic of interest through a series of related maps. The maps and additional graphics are useful for planning, evaluation and theory development. Group concept maps are typically described, interpreted and utilized through points, clusters and distances, and the implications of these features for understanding how constructs relate to one another. This paper focuses on the application of network analysis to group concept mapping to quantify the strength and directionality of relationships among clusters. The authors outline the steps of this analysis and illustrate its practical use through an organizational strategic planning example. Additional benefits of this analysis for evaluation projects are also discussed, supporting the overall utility of this supplemental technique to the standard concept mapping methodology.
Advanced analysis of forest fire clustering
NASA Astrophysics Data System (ADS)
Kanevski, Mikhail; Pereira, Mario; Golay, Jean
2017-04-01
Analysis of point pattern clustering is an important topic in spatial statistics and for many applications: biodiversity, epidemiology, natural hazards, geomarketing, etc. Several fundamental approaches are used to quantify spatial data clustering, based on topological, statistical and fractal measures. In the present research, the recently introduced multi-point Morisita index (mMI) is applied to study the spatial clustering of forest fires in Portugal. The data set consists of more than 30,000 fire events covering the period from 1975 to 2013. The distribution of forest fires is very complex and highly variable in space. mMI is a multi-point extension of the classical two-point Morisita index. In essence, mMI is estimated by covering the region under study with a grid and computing how many times more likely it is that m points selected at random will be from the same grid cell than would be the case for a completely random Poisson process. By changing the number (and hence size) of the grid cells, mMI characterizes the scaling properties of spatial clustering. From mMI, the intrinsic (fractal) dimension of the point distribution can be estimated as well. In this study, the mMI of forest fires is compared with the mMI of random patterns (RPs) generated within the validity domain defined as the forest area of Portugal. It turns out that the forest fires are highly clustered inside the validity domain in comparison with the RPs. Moreover, they demonstrate different scaling properties at different spatial scales. The results of the mMI analysis are also compared with those of fractal measures of clustering: the box-counting and sandbox-counting approaches. References: Golay J., Kanevski M., Vega Orozco C., Leuenberger M. (2014). The multipoint Morisita index for the analysis of spatial patterns. Physica A, 406, 191-202. Golay J., Kanevski M. (2015). A new estimator of intrinsic dimension based on the multipoint Morisita index. Pattern Recognition, 48, 4070-4081.
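The grid-based estimate can be sketched directly. Following the m-point generalization of Golay and Kanevski (2014), I_m = Q^(m-1) Σ_i n_i(n_i-1)…(n_i-m+1) / [N(N-1)…(N-m+1)], where Q is the number of grid cells and n_i the count in cell i; unit-square coordinates are assumed here for simplicity:

```python
def multipoint_morisita(points, q, m=2):
    """Multi-point Morisita index I_m for a 2D point pattern with
    coordinates in [0, 1)^2, on a q x q grid of cells."""
    counts = {}
    for x, y in points:
        cell = (min(int(x * q), q - 1), min(int(y * q), q - 1))
        counts[cell] = counts.get(cell, 0) + 1

    def falling(n, k):
        """Falling factorial n * (n-1) * ... * (n-k+1)."""
        p = 1
        for t in range(k):
            p *= n - t
        return p

    Q, N = q * q, len(points)
    num = sum(falling(n, m) for n in counts.values())
    return (Q ** (m - 1)) * num / falling(N, m)
```

Recomputing I_m while varying q probes the scaling behaviour: for a Poisson pattern I_m stays near 1, while clustered patterns show I_m growing with cell refinement, which is what the fire data exhibit relative to the random patterns.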
A Bayesian cluster analysis method for single-molecule localization microscopy data.
Griffié, Juliette; Shannon, Michael; Bromley, Claire L; Boelen, Lies; Burn, Garth L; Williamson, David J; Heard, Nicholas A; Cope, Andrew P; Owen, Dylan M; Rubin-Delanchy, Patrick
2016-12-01
Cell function is regulated by the spatiotemporal organization of the signaling machinery, and a key facet of this is molecular clustering. Here, we present a protocol for the analysis of clustering in data generated by 2D single-molecule localization microscopy (SMLM)-for example, photoactivated localization microscopy (PALM) or stochastic optical reconstruction microscopy (STORM). Three features of such data can cause standard cluster analysis approaches to be ineffective: (i) the data take the form of a list of points rather than a pixel array; (ii) there is a non-negligible unclustered background density of points that must be accounted for; and (iii) each localization has an associated uncertainty in regard to its position. These issues are overcome using a Bayesian, model-based approach. Many possible cluster configurations are proposed and scored against a generative model, which assumes Gaussian clusters overlaid on a completely spatially random (CSR) background, before every point is scrambled by its localization precision. We present the process of generating simulated and experimental data that are suitable to our algorithm, the analysis itself, and the extraction and interpretation of key cluster descriptors such as the number of clusters, cluster radii and the number of localizations per cluster. Variations in these descriptors can be interpreted as arising from changes in the organization of the cellular nanoarchitecture. The protocol requires no specific programming ability, and the processing time for one data set, typically containing 30 regions of interest, is ∼18 h; user input takes ∼1 h.
Analysis of Spectral-type A/B Stars in Five Open Clusters
NASA Astrophysics Data System (ADS)
Wilhelm, Ronald J.; Rafuil Islam, M.
2014-01-01
We have obtained low-resolution (R = 1000) spectroscopy of N = 68 spectral-type A/B stars in five nearby open star clusters using the McDonald Observatory 2.1 m telescope. The sample of blue stars in the various clusters was selected to test our new technique for determining interstellar reddening and distances in areas where interstellar reddening is high. We use a Bayesian approach to find the posterior distribution for Teff, log g and [Fe/H] from a combination of reddened photometric colors and spectroscopic line strengths. We will present calibration results for this technique using open cluster star data with known reddening and distances. Preliminary results suggest our technique can produce both reddening and distance determinations to within 10% of cluster values. Our technique opens the possibility of determining distances for blue stars at low Galactic latitudes, where extinction can be large and differential. We will also compare our stellar parameter determinations to previously reported MK spectral classifications and discuss the probability that some of our stars are not members of their reported clusters.
NASA Astrophysics Data System (ADS)
Kel'manov, A. V.; Khandeev, V. I.
2016-02-01
The strongly NP-hard problem of partitioning a finite set of points in Euclidean space into two clusters of given sizes (cardinalities), minimizing the sum (over both clusters) of the intracluster sums of squared distances from the elements of the clusters to their centers, is considered. It is assumed that the center of one of the sought clusters is given at a desired (arbitrary) point of space (without loss of generality, at the origin), while the center of the other is unknown and is determined as the mean value over all elements of that cluster. It is shown that, unless P = NP, there is no fully polynomial-time approximation scheme for this problem in general, while such a scheme is presented for the case of fixed space dimension.
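The objective can be made concrete by solving a tiny instance exactly. The brute force below enumerates every size-m cluster A (center fixed at the origin) and scores it together with the complement B (center = mean of B); it is exponential in N and only illustrates the problem statement, not the approximation scheme of the paper:

```python
from itertools import combinations

def best_partition(points, m):
    """Exact solution for small instances: choose cluster A of size m
    (0 < m < len(points)) with center fixed at the origin; cluster B is
    the rest, with center at its mean. Minimizes the summed intracluster
    squared distances to the two centers."""
    n, dim = len(points), len(points[0])
    best_cost, best_A = float("inf"), None
    for A in combinations(range(n), m):
        Aset = set(A)
        B = [i for i in range(n) if i not in Aset]
        # cluster A: squared distances to the origin
        cost = sum(sum(c * c for c in points[i]) for i in A)
        # cluster B: squared distances to its own mean
        mu = [sum(points[i][d] for i in B) / len(B) for d in range(dim)]
        cost += sum(sum((points[i][d] - mu[d]) ** 2 for d in range(dim))
                    for i in B)
        if cost < best_cost:
            best_cost, best_A = cost, A
    return best_A, best_cost
```

The asymmetry between the two centers (one fixed, one free) is exactly what distinguishes this problem from ordinary 2-means.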
A 3D clustering approach for point clouds to detect and quantify changes at a rock glacier front
NASA Astrophysics Data System (ADS)
Micheletti, Natan; Tonini, Marj; Lane, Stuart N.
2016-04-01
Terrestrial Laser Scanners (TLS) are extensively used in geomorphology to remotely sense landforms and surfaces of any type and to derive digital elevation models (DEMs). Modern devices are able to collect many millions of points, so that working on the resulting dataset is often troublesome in terms of computational effort. Indeed, it is not unusual for raw point clouds to be filtered prior to DEM creation, so that only a subset of points is retained and the interpolation process becomes less of a burden. Whilst this procedure is in many cases necessary, it entails a considerable loss of valuable information. First, and even without eliminating points, the common interpolation of points to a regular grid causes a loss of potentially useful detail. Second, it inevitably causes the transition from 3D information to 2.5D data, where each (x,y) pair must have a unique z-value. Vector-based DEMs (e.g. triangulated irregular networks) partially mitigate these issues, but still require a set of parameters to be set and impose a considerable burden in terms of calculation and storage. For these reasons, being able to perform geomorphological research directly on point clouds would be profitable. Here, we propose an approach to identify erosion and deposition patterns on a very active rock glacier front in the Swiss Alps to monitor sediment dynamics. The general aim is to set up a semiautomatic method to isolate mass movements using 3D feature identification directly from LiDAR data. An ultra-long-range LiDAR RIEGL VZ-6000 scanner was employed to acquire point clouds during three consecutive summers. In order to isolate single clusters of erosion and deposition, we applied the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm, previously employed successfully by Tonini and Abellan (2014) in a similar case for rockfall detection.
DBSCAN requires two input parameters that strongly influence the number, shape and size of the detected clusters: (i) the minimum number of points within (ii) a maximum distance of each core point. Under this condition, seed points are said to be density-reachable from a core point delimiting a cluster around it. A chain of intermediate seed points can connect contiguous clusters, allowing clusters of arbitrary shape to be defined. The novelty of the proposed approach consists in the implementation of a 3D DBSCAN module, in which each point is identified by its xyz-coordinates and the density of points within a sphere is considered. This allows volumetric features to be detected with higher accuracy, depending only on the actual sampling resolution. The approach is truly 3D and exploits all TLS measurements without the need for interpolation or data reduction. Using this method, enhanced geomorphological activity was observed during the summer of 2015 with respect to the previous two years. We attribute this result to the exceptionally high temperatures of that summer, which we deem responsible for accelerating the melting process at the rock glacier front and probably also for increasing creep velocities. References: Tonini, M. and Abellan, A. (2014). Rockfall detection from terrestrial LiDAR point clouds: A clustering approach using R. Journal of Spatial Information Science, 8, pp. 95-110. Hennig, C. Package fpc: Flexible procedures for clustering. https://cran.r-project.org/web/packages/fpc/index.html, 2015. Accessed 2016-01-12.
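A minimal 3D DBSCAN over xyz-coordinates, with density evaluated inside a sphere of radius eps, can be sketched as follows. This is an O(n²) toy version; production use on millions of TLS points would rely on spatial indexing (e.g. the R package fpc cited above, or a k-d tree):

```python
import math

def dbscan3d(points, eps, min_pts):
    """Labels each 3D point with a cluster id (0, 1, ...) or -1 for noise.
    A point is a core point if at least min_pts points (itself included)
    lie within the sphere of radius eps around it."""
    n = len(points)

    def region(i):
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    labels = [None] * n
    cid = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        nb = region(i)
        if len(nb) < min_pts:
            labels[i] = -1            # provisionally noise
            continue
        labels[i] = cid               # start a new cluster from core point i
        seeds = [j for j in nb if j != i]
        while seeds:
            j = seeds.pop()
            if labels[j] == -1:
                labels[j] = cid       # noise reached from a core: border point
                continue
            if labels[j] is not None:
                continue
            labels[j] = cid
            nbj = region(j)
            if len(nbj) >= min_pts:   # j is itself a core point: expand
                seeds.extend(k for k in nbj
                             if labels[k] is None or labels[k] == -1)
        cid += 1
    return labels
```

Tuning eps to the actual sampling resolution and min_pts to the expected density of real surface change is the sensitive step the abstract alludes to.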
DOE Office of Scientific and Technical Information (OSTI.GOV)
Holmlid, Leif, E-mail: holmlid@chem.gu.se; Kotzias, Bernhard
Ultra-dense hydrogen H(0), with its typical H-H bond distance of 2.3 pm, is superfluid at room temperature, as expected for quantum fluids. It also shows a Meissner effect at room temperature, which indicates that a transition point to a non-superfluid state should exist above room temperature. This transition point is given by the disappearance of the superfluid long-chain clusters H2N(0). The transition point is now measured for several metal carrier surfaces at 405-725 K, using both ultra-dense protium p(0) and deuterium D(0). Clusters of ordinary Rydberg matter H(l), as well as small symmetric clusters H4(0) and H3(0) (which do not give a superfluid or superconductive phase), all still exist on the surface at high temperature. This shows directly that desorption or diffusion processes do not remove the long superfluid H2N(0) clusters. The two ultra-dense forms p(0) and D(0) have different transition temperatures under otherwise identical conditions. The transition point for p(0) is higher in temperature, which is unexpected.
Focus-based filtering + clustering technique for power-law networks with small world phenomenon
NASA Astrophysics Data System (ADS)
Boutin, François; Thièvre, Jérôme; Hascoët, Mountaz
2006-01-01
Realistic interaction networks usually present two main properties: a power-law degree distribution and small-world behavior. Few nodes are linked to many nodes, and adjacent nodes are likely to share common neighbors. Moreover, the graph structure usually presents a dense core that is difficult to explore with classical filtering and clustering techniques. In this paper, we propose a new filtering technique that accounts for a user focus. This technique extracts a tree-like graph that also has a power-law degree distribution and small-world behavior. The resulting structure is easily drawn with classical force-directed drawing algorithms. It is also quickly clustered and displayed as a multi-level silhouette tree (MuSi-Tree) from any user focus. We built a new graph filtering + clustering + drawing API and report a case study.
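The tree-extraction step can be illustrated with a plain breadth-first spanning tree rooted at the focus node. The paper's filter additionally preserves the power-law and small-world properties of the original graph, which this generic sketch does not attempt:

```python
from collections import deque

def focus_tree(adj, focus):
    """Spanning tree of the component containing `focus`, via BFS.
    adj: dict mapping node -> list of neighbors.
    Returns parent: node -> its tree parent (None for the focus)."""
    parent = {focus: None}
    q = deque([focus])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in parent:
                parent[v] = u     # first discovery defines the tree edge
                q.append(v)
    return parent
```

Rooting the extraction at the focus naturally organizes the dense core into levels of increasing distance from the user's point of interest, which is what makes the result amenable to force-directed drawing and hierarchical (MuSi-Tree-style) display.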
Wavelet transform analysis of the small-scale X-ray structure of the cluster Abell 1367
NASA Technical Reports Server (NTRS)
Grebeney, S. A.; Forman, W.; Jones, C.; Murray, S.
1995-01-01
We have developed a new technique based on a wavelet transform analysis to quantify the small-scale (less than a few arcminutes) X-ray structure of clusters of galaxies. We apply this technique to the ROSAT position sensitive proportional counter (PSPC) and Einstein high-resolution imager (HRI) images of the central region of the cluster Abell 1367 to detect sources embedded within the diffuse intracluster medium. In addition to detecting sources and determining their fluxes and positions, we show that the wavelet analysis allows a characterization of the source extents. In particular, the wavelet scale at which a given source achieves a maximum signal-to-noise ratio in the wavelet images provides an estimate of the angular extent of the source. Accounting for the widely varying point response of the ROSAT PSPC as a function of off-axis angle requires a quantitative measurement of the source size and a comparison to a calibration derived from the analysis of a Deep Survey image. We therefore assumed that each source could be described as an isotropic two-dimensional Gaussian and used the wavelet amplitudes, at different scales, to determine the equivalent Gaussian full width at half-maximum (FWHM), and its uncertainty, for each source. In our analysis of the ROSAT PSPC image, we detect 31 X-ray sources above the diffuse cluster emission (within a radius of 24 arcmin), 16 of which are apparently associated with cluster galaxies and two with serendipitous background quasars. We find that the angular extents of 11 sources exceed the nominal width of the PSPC point-spread function. Four of these extended sources were previously detected by Bechtold et al. (1983) as 1 sec scale features using the Einstein HRI. The same wavelet analysis technique was applied to the Einstein HRI image. We detect 28 sources in the HRI image, of which nine are extended. Eight of the extended sources correspond to sources previously detected by Bechtold et al.
Overall, using both the PSPC and the HRI observations, we detect 16 extended features, of which nine have galaxies coincident with the X-ray-measured positions (within the positional error circles). These extended sources have luminosities in the range (3-30) x 10^40 erg/s and gas masses of approximately (1-30) x 10^9 solar masses, if the X-rays are of thermal origin. We confirm the presence of extended features in A1367 first reported by Bechtold et al. (1983). The nature of these systems remains uncertain. The luminosities are large if the emission is attributed to single galaxies, and several of the extended features have no associated galaxy counterparts. The extended features may be associated with galaxy groups, as suggested by Canizares, Fabbiano, & Trinchieri (1987), although the number required is large.
Le point sur les amas de galaxies
NASA Astrophysics Data System (ADS)
Pierre, M.
Clusters of galaxies: a review. After briefly describing the three main components of clusters of galaxies (dark matter, gas and galaxies), we present clusters from a theoretical viewpoint: they are the most massive equilibrium entities known in the universe. Consequently, clusters of galaxies play a key role in any cosmological study and are thus essential for our global understanding of the universe. In the general introduction, we outline this fundamental aspect, showing how the study of clusters can help to constrain the various cosmological scenarios. Once this cosmological framework is set, the following chapters present a detailed analysis of cluster properties and of their cosmic evolution as observed in different wavebands, mainly in the optical (galaxies), X-ray (gas) and radio (gas and particles) ranges. We shall see that the detailed study of a cluster is conditioned by the study of the interactions between its different components; this is the necessary step to ultimately derive the fundamental quantity which is the cluster mass. This will be the occasion for an excursion into extremely varied physical processes, such as the multi-phase nature of the intra-cluster medium, lensing phenomena, starbursts and morphological evolution in cluster galaxies, or the interaction between the intra-cluster plasma and the relativistic particles accelerated during cluster mergers. For each waveband, we simply outline the dedicated observing and analysis techniques, which are of special interest in the case of space observations. Finally, we present several ambitious projects for the next generation of observatories, as well as their expected impact on the study of clusters of galaxies. (The original record repeats this abstract in French; it is merged here.)
Analyzing survival curves at a fixed point in time for paired and clustered right-censored data
Su, Pei-Fang; Chi, Yunchan; Lee, Chun-Yi; Shyr, Yu; Liao, Yi-De
2018-01-01
In clinical trials, information about certain time points may be of interest in making decisions about treatment effectiveness. Rather than comparing entire survival curves, researchers can focus the comparison on fixed time points that may have clinical utility for patients. For two independent samples of right-censored data, Klein et al. (2007) compared survival probabilities at a fixed time point by studying a number of tests based on transformations of the Kaplan-Meier estimators of the survival function. However, to compare the survival probabilities at a fixed time point for paired right-censored data or clustered right-censored data, their approach must be modified. In this paper, we extend the statistics to accommodate possible within-pair correlation and within-cluster correlation, respectively. We use simulation studies to present comparative results. Finally, we illustrate the implementation of these methods using two real data sets. PMID:29456280
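All of these tests rest on evaluating the Kaplan-Meier estimator at the fixed time t0. A minimal product-limit sketch for a single sample (ignoring the pairing and clustering corrections that are the paper's contribution) looks like this:

```python
def km_survival_at(times, events, t0):
    """Kaplan-Meier estimate of S(t0) from right-censored data.
    times[i]: observed time; events[i]: 1 = event (death), 0 = censored."""
    order = sorted(range(len(times)), key=lambda i: times[i])
    at_risk = len(times)
    s = 1.0
    i = 0
    while i < len(order):
        t = times[order[i]]
        if t > t0:
            break
        # pool all observations tied at time t
        deaths = 0
        j = i
        while j < len(order) and times[order[j]] == t:
            deaths += events[order[j]]
            j += 1
        s *= 1.0 - deaths / at_risk   # product-limit factor at this time
        at_risk -= j - i              # deaths and censorings leave risk set
        i = j
    return s
```

The fixed-time tests then compare a transformation (e.g. log or complementary log-log) of two such estimates, with a variance term (Greenwood's formula in the independent case) that the paper adjusts for within-pair and within-cluster correlation.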
NASA Astrophysics Data System (ADS)
Antoni, R.; Passard, C.; Perot, B.; Guillaumin, F.; Mazy, C.; Batifol, M.; Grassi, G.
2018-07-01
AREVA NC is preparing to process, characterize and compact old used-fuel metallic waste stored at the La Hague reprocessing plant in view of its future storage ("Haute Activité Oxyde", HAO project). For a large part of this historical waste, packaging is planned in CSD-C canisters ("Colis Standard de Déchets Compactés") in the ACC hulls-and-nozzles compaction facility ("Atelier de Compactage des Coques et embouts"). This paper presents a new method to take into account the possible presence of fissile material clusters, which may have a significant impact on the active neutron interrogation (Differential Die-away Technique, DDT) measurement of the CSD-C canisters in the industrial neutron measurement station "P2-2". A matrix effect correction has already been investigated to predict the prompt fission neutron calibration coefficient (which provides the fissile mass) from an internal "drum flux monitor" signal, provided during the active measurement by a boron-coated proportional counter located in the measurement cavity, and from a "drum transmission signal" recorded in passive mode by the detection blocks in the presence of an AmBe point source in the measurement cell. Up to now, the relationship between the calibration coefficient and these signals was obtained from a factorial design that did not consider the potential occurrence of fissile material clusters. The interrogating neutron self-shielding in these clusters was treated separately and resulted in a penalty coefficient larger than 20% to prevent an underestimation of the fissile mass within the drum. In this work, we show that incorporating a new parameter in the factorial design, representing the fissile mass fraction in these clusters, provides an alternative to the penalty coefficient. This new approach does not degrade the uncertainty of the original prediction, which was calculated without taking into consideration the possible presence of clusters.
Consequently, the accuracy of the fissile mass assessment is improved by this new method, which should be extended to similar DDT measurement stations for larger drums that also use an internal monitor for matrix effect correction.
NASA Technical Reports Server (NTRS)
Parada, N. D. J. (Principal Investigator); Cappelletti, C. A.
1982-01-01
A stratification oriented to crop area and yield estimation problems was performed using an algorithm of clustering. The variables used were a set of agroclimatological characteristics measured in each one of the 232 municipalities of the State of Rio Grande do Sul, Brazil. A nonhierarchical cluster analysis was used and the pseudo F-statistics criterion was implemented for determining the "cut point" in the number of strata.
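The pseudo F-statistic used above as the "cut point" criterion (also known as the Calinski-Harabasz index) compares between-stratum to within-stratum dispersion; the number of strata that maximizes it is taken as the cut point. A minimal sketch, with illustrative names, assuming points are given as coordinate tuples:

```python
def pseudo_f(points, labels):
    """Calinski-Harabasz pseudo F-statistic for a clustering of n-D points.

    F = (SSB / (k - 1)) / (SSW / (n - k)), where SSB/SSW are the between-
    and within-cluster sums of squared distances. Larger is better.
    """
    n, k = len(points), len(set(labels))
    dim = len(points[0])
    grand = [sum(p[j] for p in points) / n for j in range(dim)]
    centroids = {}
    for c in set(labels):
        members = [p for p, l in zip(points, labels) if l == c]
        centroids[c] = [sum(p[j] for p in members) / len(members) for j in range(dim)]
    # one SSB term per point is equivalent to weighting each centroid by its size
    ssb = sum(sum((centroids[l][j] - grand[j]) ** 2 for j in range(dim)) for l in labels)
    ssw = sum(sum((p[j] - centroids[l][j]) ** 2 for j in range(dim))
              for p, l in zip(points, labels))
    return (ssb / (k - 1)) / (ssw / (n - k))
```

In practice one runs the clustering for a range of strata counts and picks the count where this statistic peaks.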
A deeper look at the X-ray point source population of NGC 4472
NASA Astrophysics Data System (ADS)
Joseph, T. D.; Maccarone, T. J.; Kraft, R. P.; Sivakoff, G. R.
2017-10-01
In this paper we discuss the X-ray point source population of NGC 4472, an elliptical galaxy in the Virgo cluster. We used recent deep Chandra data combined with archival Chandra data to obtain a 380 ks exposure time. We find 238 X-ray point sources within 3.7 arcmin of the galaxy centre, with a completeness flux F_X(0.5-2 keV) = 6.3 × 10^-16 erg s^-1 cm^-2. Most of these sources are expected to be low-mass X-ray binaries. We find that, using data from a single galaxy which is both complete and has a large number of objects (˜100) below 10^38 erg s^-1, the X-ray luminosity function is well fitted with a single power-law model. By cross-matching our X-ray data with both space-based and ground-based optical data for NGC 4472, we find that 80 of the 238 sources are in globular clusters. We compare the red and blue globular cluster subpopulations and find red clusters are nearly six times more likely to host an X-ray source than blue clusters. We show that there is evidence that these two subpopulations have significantly different X-ray luminosity distributions. Source catalogues for all X-ray point sources, as well as any corresponding optical data for globular cluster sources, are also presented here.
Boll, Daniel T; Marin, Daniele; Redmon, Grace M; Zink, Stephen I; Merkle, Elmar M
2010-04-01
The purpose of our study was to evaluate whether two-point Dixon MRI using a 2D decomposition technique facilitates metabolite differentiation between lipids and iron in standardized in vitro liver phantoms with in vivo patient validation and allows semiquantitative in vitro assessment of metabolites associated with steatosis, iron overload, and combined disease. The acrylamide-based phantoms were made to reproduce the T1- and T2-weighted MRI appearances of physiologic hepatic parenchyma and hepatic steatosis-iron overload by the admixture of triglycerides and ferumoxides. Combined disease was simulated using joint admixtures of triglycerides and ferumoxides at various concentrations. For phantom validation, 30 patients were included, of whom 10 had steatosis, 10 had iron overload, and 10 had no liver disease. For MRI an in-phase/opposed-phase T1-weighted sequence with TR/TE(opposed-phase)/TE(in-phase) of 4.19/1.25/2.46 was used. Fat/water series were obtained by Dixon-based algorithms. In-phase and opposed-phase and fat/water ratios were calculated. Statistical cluster analysis assessed ratio pairs of physiologic liver, steatosis, iron overload, and combined disease in 2D metabolite discrimination plots. Statistical assessment proved that metabolite decomposition in phantoms simulating steatosis (1.77|0.22; in-phase/opposed-phase|fat/water ratios), iron overload (0.75|0.21), and healthy control subjects (1.09|0.05) formed three clusters with distinct ratio pairs. Patient validation for hepatic steatosis (3.29|0.51), iron overload (0.56|0.41), and normal control subjects (0.99|0.05) confirmed this clustering (p < 0.001). One-dimensional analysis assessing in vitro combined disease only with in-phase/opposed-phase ratios would have failed to characterize metabolites. The 2D analysis plotting in-phase/opposed-phase and fat/water ratios (2.16|0.59) provided accurate semiquantitative metabolite decomposition (p < 0.001). 
MR Dixon imaging facilitates metabolite decomposition of intrahepatic lipids and iron using in vitro phantoms with in vivo patient validation. The proposed decomposition technique identified distinct in-phase/opposed-phase and fat/water ratios for in vitro steatosis, iron overload, and combined disease.
GENIE(++): A Multi-Block Structured Grid System
NASA Technical Reports Server (NTRS)
Williams, Tonya; Nadenthiran, Naren; Thornburg, Hugh; Soni, Bharat K.
1996-01-01
The computer code GENIE++ is a continuously evolving grid system containing a multitude of proven geometry/grid techniques. The generation process in GENIE++ is based on an earlier version. The process uses several techniques either separately or in combination to quickly and economically generate sculptured geometry descriptions and grids for arbitrary geometries. The computational mesh is formed by using an appropriate algebraic method. Grid clustering is accomplished with either exponential or hyperbolic tangent routines which allow the user to specify a desired point distribution. Grid smoothing can be accomplished by using an elliptic solver with proper forcing functions. B-spline and Non-Uniform Rational B-spline (NURBS) algorithms are used for surface definition and redistribution. The built-in sculptured geometry definition with desired distribution of points, automatic Bezier curve/surface generation for interior boundaries/surfaces, and surface redistribution is based on NURBS. Weighted Lagrange/Hermite transfinite interpolation methods, interactive geometry/grid manipulation modules, and on-line graphical visualization of the generation process are salient features of this system which result in significant time savings for a given geometry/grid application.
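A hyperbolic tangent clustering routine of the kind mentioned above can be sketched as a one-sided stretching function that packs points toward one boundary. This is a common form in grid generation, but the exact formula GENIE++ uses is not given here, so the specific expression and parameter name are assumptions:

```python
import math

def tanh_distribution(n, beta):
    """One-sided hyperbolic-tangent point distribution on [0, 1].

    Returns n monotonically increasing points clustered near s = 0;
    beta > 0 controls the clustering strength (larger = tighter packing).
    """
    pts = []
    for i in range(n):
        eta = i / (n - 1)                      # uniform parameter in [0, 1]
        s = 1.0 + math.tanh(beta * (eta - 1.0)) / math.tanh(beta)
        pts.append(s)
    return pts
```

Evaluating the same uniform parameter through an exponential stretch instead gives the alternative clustering family the abstract mentions; in both cases the user-facing control is a single stretching parameter.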
A possibilistic approach to clustering
NASA Technical Reports Server (NTRS)
Krishnapuram, Raghu; Keller, James M.
1993-01-01
Fuzzy clustering has been shown to be advantageous over crisp (or traditional) clustering methods in that total commitment of a vector to a given class is not required at each iteration in image pattern recognition. Recently fuzzy clustering methods have shown a spectacular ability to detect not only hypervolume clusters, but also clusters which are actually 'thin shells', i.e., curves and surfaces. Most analytic fuzzy clustering approaches are derived from the 'Fuzzy C-Means' (FCM) algorithm. The FCM uses the probabilistic constraint that the memberships of a data point across classes sum to one. This constraint was used to generate the membership update equations for an iterative algorithm. Recently, we cast the clustering problem into the framework of possibility theory using an approach in which the resulting partition of the data can be interpreted as a possibilistic partition, and the membership values may be interpreted as degrees of possibility of the points belonging to the classes. We show the ability of this approach to detect linear and quartic curves in the presence of considerable noise.
Three-dimensional tracking for efficient fire fighting in complex situations
NASA Astrophysics Data System (ADS)
Akhloufi, Moulay; Rossi, Lucile
2009-05-01
Each year, hundreds of millions of hectares of forest burn, causing human and economic losses. For efficient fire fighting, personnel on the ground need tools for predicting fire front propagation. In this work, we present a new technique for automatically tracking fire spread in three-dimensional space. The proposed approach uses a stereo system to extract a 3D shape from fire images. A new segmentation technique is proposed that permits the extraction of fire regions in complex unstructured scenes. It works in the visible spectrum and combines information extracted from the YUV and RGB color spaces. Unlike other techniques, our algorithm does not require previous knowledge about the scene. The resulting fire regions are classified into different homogeneous zones using clustering techniques. Contours are then extracted and a feature detection algorithm is used to detect interest points like local maxima and corners. Extracted points from stereo images are then used to compute the 3D shape of the fire front. The resulting data permit reconstruction of the fire volume. The final model is used to compute important spatial and temporal fire characteristics such as spread dynamics, local orientation, and heading direction. Tests conducted on the ground show the efficiency of the proposed scheme. This scheme is being integrated with a fire spread mathematical model in order to predict and anticipate fire behaviour during fire fighting. Also of interest to fire-fighters is the proposed automatic segmentation technique, which can be used for early detection of fire in complex scenes.
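The paper's exact YUV/RGB decision rules are not reproduced above, but the flavor of such a combined color-space test can be sketched. The conversion below is the standard BT.601 one; the fire rule itself (red-dominant ordering plus a brightness threshold) and its threshold value are assumptions for illustration only:

```python
def rgb_to_yuv(r, g, b):
    """BT.601 RGB -> YUV conversion (components in [0, 255])."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = 0.492 * (b - y)
    v = 0.877 * (r - y)
    return y, u, v

def is_fire_pixel(r, g, b, y_min=128.0):
    """Hypothetical fire test combining both color spaces: flame pixels are
    red-dominant (R > G > B), bright (Y above a threshold), and have a
    positive V component (red exceeds luminance)."""
    y, u, v = rgb_to_yuv(r, g, b)
    return r > g > b and y > y_min and v > 0
```

A real detector would follow this per-pixel test with the clustering step the abstract describes, grouping the surviving pixels into homogeneous fire zones.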
The importance of topographically corrected null models for analyzing ecological point processes.
McDowall, Philip; Lynch, Heather J
2017-07-01
Analyses of point process patterns and related techniques (e.g., MaxEnt) make use of the expected number of occurrences per unit area and second-order statistics based on the distance between occurrences. Ecologists working with point process data often assume that points exist on a two-dimensional x-y plane or within a three-dimensional volume, when in fact many observed point patterns are generated on a two-dimensional surface existing within three-dimensional space. For many surfaces, however, such as the topography of landscapes, the projection from the surface to the x-y plane preserves neither area nor distance. As such, when these point patterns are implicitly projected to and analyzed in the x-y plane, our expectations of the point pattern's statistical properties may not be met. When used in hypothesis testing, we find that the failure to account for the topography of the generating surface may bias statistical tests that incorrectly identify clustering and, furthermore, may bias coefficients in inhomogeneous point process models that incorporate slope as a covariate. We demonstrate the circumstances under which this bias is significant, and present simple methods that allow point processes to be simulated with corrections for topography. These point patterns can then be used to generate "topographically corrected" null models against which observed point processes can be compared. © 2017 by the Ecological Society of America.
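The area distortion at the heart of the argument above can be made concrete: projecting a surface z = f(x, y) onto the x-y plane shrinks areas by the local surface-element factor, so a homogeneous process on the surface looks artificially clustered on steep slopes when analyzed in the plane. A minimal sketch of that correction factor (function name is illustrative):

```python
import math

def surface_area_factor(fx, fy):
    """Ratio of true surface area to projected planar area for z = f(x, y),
    given the partial derivatives fx = df/dx and fy = df/dy at a point:
    sqrt(1 + fx^2 + fy^2). A topographically corrected null model weights
    the planar intensity of simulated points by this factor."""
    return math.sqrt(1.0 + fx * fx + fy * fy)
```

On flat terrain the factor is 1 and the planar analysis is unbiased; on a 45-degree slope in one direction it is sqrt(2), i.e., the plane under-represents that patch's area by about 41%.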
Fully convolutional network with cluster for semantic segmentation
NASA Astrophysics Data System (ADS)
Ma, Xiao; Chen, Zhongbi; Zhang, Jianlin
2018-04-01
At present, image semantic segmentation has been an active research topic for scientists in the fields of computer vision and artificial intelligence. In particular, extensive research on deep neural networks for image recognition has greatly promoted the development of semantic segmentation. This paper puts forward a method based on a fully convolutional network combined with the k-means clustering algorithm. The clustering step uses the image's low-level features and initializes the cluster centers from a super-pixel segmentation; within each cluster region, the set of points with high reliability is used to correct the set of points with low reliability, which are likely to have been misclassified. This method refines the segmentation of the target contour and improves the accuracy of the image segmentation.
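The clustering step above is standard Lloyd k-means with non-random initialization. A minimal sketch, where the supplied starting centers stand in for the super-pixel means the abstract describes (variable names are illustrative):

```python
def kmeans(points, centers, iters=50):
    """Plain Lloyd k-means on n-D coordinate tuples, starting from
    user-supplied centers (e.g. per-super-pixel feature means)."""
    centers = [list(c) for c in centers]
    for _ in range(iters):
        # assignment step: each point joins its nearest center
        groups = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)),
                    key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centers[i])))
            groups[j].append(p)
        # update step: each center moves to its group's mean
        for j, g in enumerate(groups):
            if g:
                centers[j] = [sum(v) / len(g) for v in zip(*g)]
    return centers
```

Seeding from super-pixel means rather than random points makes the result deterministic and starts the iteration close to perceptually coherent regions, which is the design choice the paper relies on.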
Huang, Hung-Jin; Mandelbaum, Rachel; Freeman, Peter E.; ...
2017-11-23
We study the orientations of satellite galaxies in redMaPPer clusters constructed from the Sloan Digital Sky Survey at 0.1 < z < 0.35 to determine whether there is any preferential tendency for satellites to point radially towards cluster centres. Here, we analyse the satellite alignment (SA) signal based on three shape measurement methods (re-Gaussianization, de Vaucouleurs, and isophotal shapes), which trace galaxy light profiles at different radii. The measured SA signal depends on these shape measurement methods. We detect the strongest SA signal in isophotal shapes, followed by de Vaucouleurs shapes. While no net SA signal is detected using re-Gaussianization shapes across the entire sample, the observed SA signal reaches a statistically significant level when limiting to a subsample of higher luminosity satellites. We further investigate the impact of noise, systematics, and real physical isophotal twisting effects in the comparison between the SA signal detected via different shape measurement methods. Unlike previous studies, which only consider the dependence of SA on a few parameters, here we explore a total of 17 galaxy and cluster properties, using a statistical model averaging technique to naturally account for parameter correlations and identify significant SA predictors. We find that the measured SA signal is strongest for satellites with the following characteristics: higher luminosity, smaller distance to the cluster centre, rounder in shape, higher bulge fraction, and distributed preferentially along the major axis directions of their centrals. Finally, we provide physical explanations for the identified dependences and discuss the connection to theories of SA.
OPEN CLUSTERS AS PROBES OF THE GALACTIC MAGNETIC FIELD. I. CLUSTER PROPERTIES
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hoq, Sadia; Clemens, D. P., E-mail: shoq@bu.edu, E-mail: clemens@bu.edu
2015-10-15
Stars in open clusters are powerful probes of the intervening Galactic magnetic field via background starlight polarimetry because they provide constraints on the magnetic field distances. We use 2MASS photometric data for a sample of 31 clusters in the outer Galaxy for which near-IR polarimetric data were obtained to determine the cluster distances, ages, and reddenings via fitting theoretical isochrones to cluster color–magnitude diagrams. The fitting approach uses an objective χ² minimization technique to derive the cluster properties and their uncertainties. We found the ages, distances, and reddenings for 24 of the clusters, and the distances and reddenings for 6 additional clusters that were either sparse or faint in the near-IR. The derived ranges of log(age), distance, and E(B−V) were 7.25–9.63, ∼670–6160 pc, and 0.02–1.46 mag, respectively. The distance uncertainties ranged from ∼8% to 20%. The derived parameters were compared to previous studies, and most cluster parameters agree within our uncertainties. To test the accuracy of the fitting technique, synthetic clusters with 50, 100, or 200 cluster members and a wide range of ages were fit. These tests recovered the input parameters within their uncertainties for more than 90% of the individual synthetic cluster parameters. These results indicate that the fitting technique likely provides reliable estimates of cluster properties. The distances derived will be used in an upcoming study of the Galactic magnetic field in the outer Galaxy.
The applicability and effectiveness of cluster analysis
NASA Technical Reports Server (NTRS)
Ingram, D. S.; Actkinson, A. L.
1973-01-01
An insight into the characteristics which determine the performance of a clustering algorithm is presented. In order for the techniques which are examined to accurately cluster data, two conditions must be simultaneously satisfied. First, the data must have a particular structure; second, the parameters chosen for the clustering algorithm must be correct. By examining the structure of the data from the C1 flight line, it is clear that no single set of parameters can be used to accurately cluster all the different crops. The effectiveness of either a noniterative or iterative clustering algorithm to accurately cluster data representative of the C1 flight line is questionable. Thus extensive a priori knowledge is required in order to use cluster analysis in its present form for applications like assisting in the definition of field boundaries and evaluating the homogeneity of a field. New or modified techniques are necessary for clustering to be a reliable tool.
Chemodynamical Clustering Applied to APOGEE Data: Rediscovering Globular Clusters
NASA Astrophysics Data System (ADS)
Chen, Boquan; D’Onghia, Elena; Pardy, Stephen A.; Pasquali, Anna; Bertelli Motta, Clio; Hanlon, Bret; Grebel, Eva K.
2018-06-01
We have developed a novel technique based on a clustering algorithm that searches for kinematically and chemically clustered stars in the APOGEE DR12 Cannon data. As compared to classical chemical tagging, the kinematic information included in our methodology allows us to identify stars that are members of known globular clusters with greater confidence. We apply our algorithm to the entire APOGEE catalog of 150,615 stars whose chemical abundances are derived by the Cannon. Our methodology found anticorrelations between the elements Al and Mg, Na and O, and C and N previously identified in the optical spectra in globular clusters, even though we omit these elements in our algorithm. Our algorithm identifies globular clusters without a priori knowledge of their locations in the sky. Thus, not only does this technique promise to discover new globular clusters, but it also allows us to identify candidate streams of kinematically and chemically clustered stars in the Milky Way.
Multi-Scale Voxel Segmentation for Terrestrial Lidar Data within Marshes
NASA Astrophysics Data System (ADS)
Nguyen, C. T.; Starek, M. J.; Tissot, P.; Gibeaut, J. C.
2016-12-01
The resilience of marshes to a rising sea is dependent on their elevation response. Terrestrial laser scanning (TLS) is a detailed topographic approach for accurate, dense surface measurement with high potential for monitoring of marsh surface elevation response. The dense point cloud provides a 3D representation of the surface, which includes both terrain and non-terrain objects. Extraction of topographic information requires filtering of the data into like-groups or classes; therefore, methods must be incorporated to identify structure in the data prior to creation of an end product. A voxel representation of three-dimensional space provides quantitative visualization and analysis for pattern recognition. The objectives of this study are threefold: 1) apply a multi-scale voxel approach to effectively extract geometric features from the TLS point cloud data, 2) investigate the utility of K-means and Self-Organizing Map (SOM) clustering algorithms for segmentation, and 3) utilize a variety of validity indices to measure the quality of the result. TLS data were collected at a marsh site along the central Texas Gulf Coast using a Riegl VZ-400 TLS. The site consists of both exposed and vegetated surface regions. To characterize the structure of the point cloud, octree segmentation is applied to create a tree data structure of voxels containing the points. The flexibility of voxels in size and point density makes this algorithm a promising candidate to locally extract statistical and geometric features of the terrain, including surface normal and curvature. The characteristics of the voxel itself, such as volume and point density, are also computed and assigned to each point, as are laser pulse characteristics. The features extracted from the voxelization are then used as input for clustering of the points using the K-means and SOM clustering algorithms. The optimal number of clusters is then determined based on evaluation of cluster separability criteria.
Results for different combinations of the feature space vector and differences between K-means and SOM clustering will be presented. The developed method provides a novel approach for compressing TLS scene complexity in marshes, such as for vegetation biomass studies or erosion monitoring.
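The voxelization step above reduces, at its simplest, to binning points into cubes and deriving per-voxel statistics (point density, centroid, and from there normals and curvature). A flat uniform-grid sketch of that first step, with the octree's adaptive subdivision omitted (function and key names are illustrative):

```python
from collections import defaultdict

def voxelize(points, size):
    """Bin 3-D points into cubic voxels of edge length `size`.

    Returns {voxel_index: {"count": n, "centroid": (cx, cy, cz)}}; the
    count is a point-density proxy and the centroid feeds later geometric
    feature estimates (e.g. plane fits for surface normals).
    """
    bins = defaultdict(list)
    for x, y, z in points:
        key = (int(x // size), int(y // size), int(z // size))
        bins[key].append((x, y, z))
    features = {}
    for key, pts in bins.items():
        n = len(pts)
        cx = sum(p[0] for p in pts) / n
        cy = sum(p[1] for p in pts) / n
        cz = sum(p[2] for p in pts) / n
        features[key] = {"count": n, "centroid": (cx, cy, cz)}
    return features
```

Running this at several voxel sizes and concatenating the per-point features is one simple way to realize the "multi-scale" aspect before handing the feature vectors to K-means or a SOM.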
Profiling Local Optima in K-Means Clustering: Developing a Diagnostic Technique
ERIC Educational Resources Information Center
Steinley, Douglas
2006-01-01
Using the cluster generation procedure proposed by D. Steinley and R. Henson (2005), the author investigated the performance of K-means clustering under the following scenarios: (a) different probabilities of cluster overlap; (b) different types of cluster overlap; (c) varying samples sizes, clusters, and dimensions; (d) different multivariate…
ERIC Educational Resources Information Center
Firdausiah Mansur, Andi Besse; Yusof, Norazah
2013-01-01
Clustering on Social Learning Networks has not been explored widely, especially when the network focuses on an e-learning system. Conventional methods are not really suitable for e-learning data. SNA requires content analysis, which involves human intervention and needs to be carried out manually. Some of the previous clustering techniques need…
Tang, Haijing; Wang, Siye; Zhang, Yanjun
2013-01-01
Clustering has become a common trend in very long instruction word (VLIW) architectures to solve the problems of area, energy consumption, and design complexity. Register-file-connected clustered (RFCC) VLIW architecture uses the mechanism of a global register file to accomplish inter-cluster data communications, thus eliminating the performance and energy consumption penalty caused by explicit inter-cluster data move operations in traditional bus-connected clustered (BCC) VLIW architecture. However, the limited number of access ports to the global register file has become an issue which must be well addressed; otherwise performance and energy consumption suffer. In this paper, we present compiler optimization techniques for an RFCC VLIW architecture called Lily, which is designed for encryption systems. These techniques aim at optimizing performance and energy consumption for the Lily architecture through appropriate manipulation of the code generation process to maintain better management of accesses to the global register file. All the techniques have been implemented and evaluated. The results show that our techniques can significantly reduce the performance and energy consumption penalty due to the access port limitation of the global register file. PMID:23970841
Active learning for semi-supervised clustering based on locally linear propagation reconstruction.
Chang, Chin-Chun; Lin, Po-Yi
2015-03-01
The success of semi-supervised clustering relies on the effectiveness of side information. To get effective side information, a new active learner learning pairwise constraints known as must-link and cannot-link constraints is proposed in this paper. Three novel techniques are developed for learning effective pairwise constraints. The first technique is used to identify samples less important to cluster structures. This technique makes use of a kernel version of locally linear embedding for manifold learning. Samples neither important to locally linear propagation reconstructions of other samples nor on flat patches in the learned manifold are regarded as unimportant samples. The second is a novel criterion for query selection. This criterion considers not only the importance of a sample to expanding the space coverage of the learned samples but also the expected number of queries needed to learn the sample. To facilitate semi-supervised clustering, the third technique yields inferred must-links for passing information about flat patches in the learned manifold to semi-supervised clustering algorithms. Experimental results have shown that the learned pairwise constraints can capture the underlying cluster structures and proven the feasibility of the proposed approach. Copyright © 2014 Elsevier Ltd. All rights reserved.
The use of nuclear microprobe techniques to study the chemistry of lacustrine sediments and particles
NASA Astrophysics Data System (ADS)
Grime, G. W.; Davison, W.
1993-05-01
The Oxford SPM has been used in two novel studies of lake chemistry: (a) The distribution of dissolved iron in sediment pore waters close to the sediment/water interface has been measured using the novel technique of diffusive equilibration in a thin film (DET). In this technique, which has a spatial resolution of < 1 mm, much finer than that of competing techniques (1 cm), a thin layer of polyacrylamide gel is inserted into the sediment and, after rapid equilibration with the pore water, the gel is dried and fixed. The distribution of trace elements can then be measured using μ-beam PIXE. Preliminary results have shown for the first time a subsurface maximum of Fe consistent with current theories of Fe dynamics. This paper presents some results obtained using the technique and discusses the limits on resolution and sensitivity. (b) Individual suspended lake particles (predominantly iron oxides and sulphides) have been analysed using point μ-beam RBS and PIXE. Of particular interest in this study is the oxidation state of iron-rich particles, so RBS with a 1 μm beam was used to determine the Fe:O stoichiometry of single particles. The particles were filtered from a depth of 14 m in Esthwaite Water in the English Lake District and handled in anoxic conditions until evacuation in the SPM sample chamber. Two distinct compositions of iron oxide were determined in clusters of about 5 μm diameter. Analysis by PIXE revealed that FeS was uniformly distributed in the particulate material and that it also contained elevated levels of Cu and Zn. This study was the first to demonstrate directly that discrete clusters of iron oxides are present in the black particulate material which is commonly considered to comprise iron sulphides.
van Haaften, Rachel I M; Luceri, Cristina; van Erk, Arie; Evelo, Chris T A
2009-06-01
Omics technology used for large-scale measurements of gene expression is rapidly evolving. This work points out the need for extensive bioinformatics analysis for array quality assessment before and after gene expression clustering and pathway analysis. A study focused on the effect of red wine polyphenols on rat colon mucosa was used to test the impact of quality control and normalisation steps on the biological conclusions. The integration of data visualization, pathway analysis and clustering revealed an artifact problem that was solved with an adapted normalisation. We propose a possible point-by-point standard analysis procedure, based on a combination of clustering and data visualization, for the analysis of microarray data.
Hybrid Tracking Algorithm Improvements and Cluster Analysis Methods.
1982-02-26
(UPGMA), and Ward's method. Ling's papers describe a (k,r) clustering method. Each of these methods has individual characteristics which make them...Reference 7), UPGMA is probably the most frequently used clustering strategy. UPGMA tries to group new points into an existing cluster by using an
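UPGMA's "group average" strategy, named in the fragment above, can be sketched as a minimal agglomeration loop: at each step the two closest clusters merge, and the distance from the merged cluster to any other is the size-weighted mean of its parts' distances. The data layout and naming scheme here are illustrative, not from the report:

```python
def upgma(d):
    """Minimal UPGMA over a dict {frozenset({a, b}): distance} of pairwise
    distances between named leaves. Returns the merge order as a list of
    (cluster_a, cluster_b, height); merged cluster ids join names with '+'."""
    sizes = {name: 1 for pair in d for name in pair}
    dist = dict(d)
    merges = []
    while len(sizes) > 1:
        pair = min(dist, key=dist.get)      # closest pair of active clusters
        a, b = sorted(pair)
        merges.append((a, b, dist[pair]))
        new = a + "+" + b
        for c in sizes:
            if c in (a, b):
                continue
            dac = dist.pop(frozenset({a, c}))
            dbc = dist.pop(frozenset({b, c}))
            # group-average update, weighted by cluster sizes
            dist[frozenset({new, c})] = (sizes[a] * dac + sizes[b] * dbc) / (sizes[a] + sizes[b])
        del dist[pair]
        sizes[new] = sizes.pop(a) + sizes.pop(b)
    return merges
```

The size weighting is what distinguishes UPGMA from its unweighted cousin WPGMA, where the two parent distances are averaged with equal weight regardless of cluster size.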
Normal versus High Tension Glaucoma: A Comparison of Functional and Structural Defects
Thonginnetra, Oraorn; Greenstein, Vivienne C.; Chu, David; Liebmann, Jeffrey M.; Ritch, Robert; Hood, Donald C.
2009-01-01
Purpose To compare visual field defects obtained with both multifocal visual evoked potential (mfVEP) and Humphrey visual field (HVF) techniques to topographic optic disc measurements in patients with normal tension glaucoma (NTG) and high tension glaucoma (HTG). Methods We studied 32 patients with NTG and 32 with HTG. All patients had reliable 24-2 HVFs with a mean deviation (MD) of −10 dB or better, a glaucomatous optic disc and an abnormal HVF in at least one eye. Multifocal VEPs were obtained from each eye and probability plots created. The mfVEP and HVF probability plots were divided into a central 10-degree (radius) and an outer arcuate subfield in both superior and inferior hemifields. Cluster analyses and counts of abnormal points were performed in each subfield. Optic disc images were obtained with the Heidelberg Retina Tomograph III (HRT III). Eleven stereometric parameters were calculated. Moorfields regression analysis (MRA) and the glaucoma probability score (GPS) were performed. Results There were no significant differences in MD and PSD values between NTG and HTG eyes. However, NTG eyes had a higher percentage of abnormal test points and clusters of abnormal points in the central subfields on both mfVEP and HVF than HTG eyes. For HRT III, there were no significant differences in the 11 stereometric parameters or in the MRA and GPS analyses of the optic disc images. Conclusions The visual field data suggest more localized and central defects for NTG than HTG. PMID:19223786
Focusing cosmic telescopes: systematics of strong lens modeling
NASA Astrophysics Data System (ADS)
Johnson, Traci Lin; Sharon, Keren
2018-01-01
The use of strong gravitational lensing by galaxy clusters has become a popular method for studying the high-redshift universe. While diverse in computational methods, lens modeling techniques have established means for determining statistical errors on cluster masses and magnifications. However, the systematic errors have yet to be quantified, arising from the number of constraints, availability of spectroscopic redshifts, and various types of image configurations. I will be presenting my dissertation work on quantifying systematic errors in parametric strong lensing techniques. I have participated in the Hubble Frontier Fields lens model comparison project, using simulated clusters to compare the accuracy of various modeling techniques. I have extended this project to understanding how changing the quantity of constraints affects the mass and magnification. I will also present my recent work extending these studies to clusters in the Outer Rim Simulation. These clusters are typical of the clusters found in wide-field surveys in mass and lensing cross-section. They have fewer constraints than the HFF clusters and are thus more susceptible to systematic errors. With the wealth of strong lensing clusters discovered in surveys such as SDSS, SPT, and DES, and in the future LSST, this work will be influential in guiding lens modeling efforts and follow-up spectroscopic campaigns.
Comparison of Clustering Techniques for Residential Energy Behavior using Smart Meter Data
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jin, Ling; Lee, Doris; Sim, Alex
Current practice in whole time series clustering of residential meter data focuses on aggregated or subsampled load data at the customer level, which ignores day-to-day differences within customers. This information is critical to determine each customer’s suitability to various demand side management strategies that support intelligent power grids and smart energy management. Clustering daily load shapes provides fine-grained information on customer attributes and sources of variation for subsequent models and customer segmentation. In this paper, we apply 11 clustering methods to daily residential meter data. We evaluate their parameter settings and suitability based on 6 generic performance metrics and post-checking of resulting clusters. Finally, we recommend suitable techniques and parameters based on the goal of discovering diverse daily load patterns among residential customers. To the authors’ knowledge, this paper is the first robust comparative review of clustering techniques applied to daily residential load shape time series in the power systems’ literature.
Coastline complexity: A parameter for functional classification of coastal environments
Bartley, J.D.; Buddemeier, R.W.; Bennett, D.A.
2001-01-01
To understand the role of the world's coastal zone (CZ) in global biogeochemical fluxes (particularly those of carbon, nitrogen, phosphorus, and sediments) we must generalise from a limited number of observations associated with a few well-studied coastal systems to the global scale. Global generalisation must be based on globally available data and on robust techniques for classification and upscaling. These requirements impose severe constraints on the set of variables that can be used to extract information about local CZ functions such as advective and metabolic fluxes, and differences resulting from changes in biotic communities. Coastal complexity (plan-view tortuosity of the coastline) is a potentially useful parameter, since it interacts strongly with both marine and terrestrial forcing functions to determine coastal energy regimes and water residence times, and since 'open' vs. 'sheltered' categories are important components of most coastal habitat classification schemes. This study employs the World Vector Shoreline (WVS) dataset, originally developed at a scale of 1:250 000. Coastline complexity measures are generated using a modification of the Angle Measurement Technique (AMT), in which the basic measurement is the angle between two lines of specified length drawn from a selected point to the closest points of intersection with the coastline. Repetition of these measurements for different lengths at the same point yields a distribution of angles descriptive of the extent and scale of complexity in the vicinity of that point; repetition of the process at different points on the coast provides a basis for comparing both the extent and the characteristic scale of coastline variation along different reaches of the coast. The coast of northwestern Mexico (Baja California and the Gulf of California) was used as a case study for initial development and testing of the method. 
The characteristic angle distribution plots generated by the AMT analysis were clustered using LOICZVIEW, a high dimensionality clustering routine developed for large-scale coastal classification studies. The results show distinctive differences in coastal environments that have the potential for interpretation in terms of both biotic and hydrogeochemical environments, and that can be related to the resolution limits and uncertainties of the shoreline data used. These objective, quantitative measures of coastal complexity as a function of scale can be further developed and combined with other data sets to provide a key component of functional classification of coastal environments. © 2001 Elsevier Science B.V. All rights reserved.
An algol program for dissimilarity analysis: a divisive-omnithetic clustering technique
Tipper, J.C.
1979-01-01
Clustering techniques are properly used to generate hypotheses about patterns in data. Of the hierarchical techniques, those which are divisive and omnithetic possess many theoretically optimal properties. One such method, dissimilarity analysis, is implemented here in ALGOL 60 and determined to be computationally competitive with most other methods. © 1979.
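The divisive idea behind dissimilarity analysis can be sketched in a few lines of Python rather than ALGOL 60. The splinter-group loop below follows the classic Macnaughton-Smith style procedure and is an illustrative stand-in, not Tipper's implementation; the function name and stopping rule are our choices:

```python
import numpy as np

def splinter_split(d):
    """One divisive step in the spirit of dissimilarity analysis: seed a
    splinter group with the most dissimilar point, then move over points
    that are on average closer to the splinter group than to the rest."""
    n = d.shape[0]
    main = set(range(n))
    seed = int(np.argmax(d.sum(axis=1)))   # most dissimilar point overall
    splinter = {seed}
    main.remove(seed)
    moved = True
    while moved and len(main) > 1:
        moved = False
        for i in sorted(main):
            to_splinter = np.mean([d[i, j] for j in splinter])
            to_main = np.mean([d[i, j] for j in main if j != i])
            if to_splinter < to_main:      # i sides with the splinter group
                main.remove(i)
                splinter.add(i)
                moved = True
    return sorted(splinter), sorted(main)

# two well-separated 1-D groups
x = np.array([0.0, 0.1, 0.2, 10.0, 10.1, 10.2])
d = np.abs(x[:, None] - x[None, :])
print(splinter_split(d))                   # splits the data into the two groups
```

Applied recursively to each resulting group, this yields the divisive hierarchy the abstract describes.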
Automated interpretation of 3D laserscanned point clouds for plant organ segmentation.
Wahabzada, Mirwaes; Paulus, Stefan; Kersting, Kristian; Mahlein, Anne-Katrin
2015-08-08
Plant organ segmentation from 3D point clouds is a relevant task for plant phenotyping and plant growth observation. Automated solutions are required to increase the efficiency of recent high-throughput plant phenotyping pipelines. However, plant geometrical properties vary with time, among observation scales and different plant types. The main objective of the present research is to develop a fully automated, fast and reliable data-driven approach for plant organ segmentation. The automated segmentation of plant organs using unsupervised clustering methods is crucial in cases where the goal is to get fast insights into the data, or where labeled data is unavailable or costly to obtain. For this we propose and compare data-driven approaches that are easy to realize and make the use of standard algorithms possible. Since normalized histograms, acquired from 3D point clouds, can be seen as samples from a probability simplex, we propose to map the data from the simplex space into Euclidean space using Aitchison's log-ratio transformation, or into the positive quadrant of the unit sphere using the square root transformation. This, in turn, paves the way to a wide range of commonly used analysis techniques that are based on measuring the similarities between data points using Euclidean distance. We investigate the performance of the resulting approaches in the practical context of grouping 3D point clouds and demonstrate empirically that they lead to clustering results with high accuracy for monocotyledonous and dicotyledonous plant species with diverse shoot architecture. An automated segmentation of 3D point clouds is demonstrated in the present work. Within seconds, first insights into plant data can be derived - even from non-labelled data. This approach is applicable to different plant species with high accuracy. 
The analysis cascade can be implemented in future high-throughput phenotyping scenarios and will support the evaluation of the performance of different plant genotypes exposed to stress or in different environmental scenarios.
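The two simplex-to-Euclidean mappings described above are compact enough to sketch in NumPy. This is a hedged illustration of the general transforms, not the authors' pipeline; the small eps guard for empty histogram bins is our addition:

```python
import numpy as np

def sqrt_transform(h):
    """Map a normalized histogram (a point on the probability simplex)
    onto the positive quadrant of the unit sphere."""
    h = np.asarray(h, dtype=float)
    return np.sqrt(h / h.sum())

def clr_transform(h, eps=1e-9):
    """Aitchison's centered log-ratio transform: map a composition from
    the simplex into ordinary Euclidean space (eps guards empty bins)."""
    h = np.asarray(h, dtype=float) + eps
    h = h / h.sum()
    g = np.exp(np.log(h).mean())          # geometric mean of the components
    return np.log(h / g)

hist = [0.5, 0.3, 0.2]
s = sqrt_transform(hist)
print(np.isclose(s @ s, 1.0))             # → True: lies on the unit sphere
print(np.isclose(clr_transform(hist).sum(), 0.0))  # → True: clr coords sum to 0
```

After either mapping, ordinary Euclidean-distance tools (k-means, hierarchical clustering) apply directly to the transformed histograms.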
Structured background grids for generation of unstructured grids by advancing front method
NASA Technical Reports Server (NTRS)
Pirzadeh, Shahyar
1991-01-01
A new method of background grid construction is introduced for generation of unstructured tetrahedral grids using the advancing-front technique. Unlike the conventional triangular/tetrahedral background grids which are difficult to construct and usually inadequate in performance, the new method exploits the simplicity of uniform Cartesian meshes and provides grids of better quality. The approach is analogous to solving a steady-state heat conduction problem with discrete heat sources. The spacing parameters of grid points are distributed over the nodes of a Cartesian background grid by interpolating from a few prescribed sources and solving a Poisson equation. To increase the control over the grid point distribution, a directional clustering approach is used. The new method is convenient to use and provides better grid quality and flexibility. Sample results are presented to demonstrate the power of the method.
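The heat-conduction analogy lends itself to a compact sketch: clamp the prescribed spacing sources on a uniform Cartesian mesh and relax a Laplace problem so spacing diffuses smoothly between them. Grid size, source positions, and the iteration count below are illustrative choices, not values from the paper, and the periodic boundaries implied by np.roll are a simplification:

```python
import numpy as np

n = 33
spacing = np.full((n, n), np.nan)
spacing[5, 5], spacing[28, 28] = 0.1, 1.0      # two prescribed spacing sources
fixed = ~np.isnan(spacing)
spacing[~fixed] = spacing[fixed].mean()        # flat initial guess

for _ in range(2000):                          # Jacobi relaxation of Laplace's equation
    avg = 0.25 * (np.roll(spacing, 1, 0) + np.roll(spacing, -1, 0) +
                  np.roll(spacing, 1, 1) + np.roll(spacing, -1, 1))
    spacing = np.where(fixed, spacing, avg)    # sources stay clamped

# maximum principle: interpolated spacing stays between the source values
print(spacing.min() >= 0.1 - 1e-9, spacing.max() <= 1.0 + 1e-9)  # → True True
```

Interpolating the relaxed field at any query point then gives the local element size the advancing front should use there.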
NASA Technical Reports Server (NTRS)
Shih, T. I.-P.; Roelke, R. J.; Steinthorsson, E.
1991-01-01
In order to study numerically details of the flow and heat transfer within coolant passages of turbine blades, a method must first be developed to generate grid systems within the very complicated geometries involved. In this study, a grid generation package was developed that is capable of generating the required grid systems. The package developed is based on an algebraic grid generation technique that permits the user considerable control over how grid points are to be distributed in a very explicit way. These controls include orthogonality of grid lines next to boundary surfaces and ability to cluster about arbitrary points, lines, and surfaces. This paper describes that grid generation package and shows how it can be used to generate grid systems within complicated-shaped coolant passages via an example.
Mayer-cluster expansion of instanton partition functions and thermodynamic Bethe ansatz
NASA Astrophysics Data System (ADS)
Meneghelli, Carlo; Yang, Gang
2014-05-01
In [19] Nekrasov and Shatashvili pointed out that the N = 2 instanton partition function in a special limit of the Ω-deformation parameters is characterized by certain thermodynamic Bethe ansatz (TBA) like equations. In this work we present an explicit derivation of this fact as well as generalizations to quiver gauge theories. To do so we combine various techniques like the iterated Mayer expansion, the method of expansion by regions, and the path integral tricks for non-perturbative summation. The TBA equations derived entirely within gauge theory have been proposed to encode the spectrum of a large class of quantum integrable systems. We hope that the derivation presented in this paper elucidates further this completely new point of view on the origin, as well as on the structure, of TBA equations in integrable models.
NASA Astrophysics Data System (ADS)
Niwase, Hiroaki; Takada, Naoki; Araki, Hiromitsu; Maeda, Yuki; Fujiwara, Masato; Nakayama, Hirotaka; Kakue, Takashi; Shimobaba, Tomoyoshi; Ito, Tomoyoshi
2016-09-01
Parallel calculations of large-pixel-count computer-generated holograms (CGHs) are suitable for multiple-graphics processing unit (multi-GPU) cluster systems. However, it is not easy for a multi-GPU cluster system to accomplish fast CGH calculations when CGH transfers between PCs are required. In these cases, the CGH transfer between the PCs becomes a bottleneck. Usually, this problem occurs only in multi-GPU cluster systems with a single spatial light modulator. To overcome this problem, we propose a simple method using the InfiniBand network. The computational speed of the proposed method using 13 GPUs (NVIDIA GeForce GTX TITAN X) was more than 3000 times faster than that of a CPU (Intel Core i7 4770) when the number of three-dimensional (3-D) object points exceeded 20,480. In practice, we achieved ~40 tera floating point operations per second (TFLOPS) when the number of 3-D object points exceeded 40,960. Our proposed method was able to reconstruct a real-time movie of a 3-D object comprising 95,949 points.
Evaluating Mixture Modeling for Clustering: Recommendations and Cautions
ERIC Educational Resources Information Center
Steinley, Douglas; Brusco, Michael J.
2011-01-01
This article provides a large-scale investigation into several of the properties of mixture-model clustering techniques (also referred to as latent class cluster analysis, latent profile analysis, model-based clustering, probabilistic clustering, Bayesian classification, unsupervised learning, and finite mixture models; see Vermunt & Magdison,…
Improving clustering with metabolic pathway data.
Milone, Diego H; Stegmayer, Georgina; López, Mariana; Kamenetzky, Laura; Carrari, Fernando
2014-04-10
It is a common practice in bioinformatics to validate each group returned by a clustering algorithm through manual analysis, according to a-priori biological knowledge. This procedure helps finding functionally related patterns to propose hypotheses for their behavior and the biological processes involved. Therefore, this knowledge is used only as a second step, after data are just clustered according to their expression patterns. Thus, it could be very useful to be able to improve the clustering of biological data by incorporating prior knowledge into the cluster formation itself, in order to enhance the biological value of the clusters. A novel training algorithm for clustering is presented, which evaluates the biological internal connections of the data points while the clusters are being formed. Within this training algorithm, the calculation of distances among data points and neurons centroids includes a new term based on information from well-known metabolic pathways. The standard self-organizing map (SOM) training versus the biologically-inspired SOM (bSOM) training were tested with two real data sets of transcripts and metabolites from Solanum lycopersicum and Arabidopsis thaliana species. Classical data mining validation measures were used to evaluate the clustering solutions obtained by both algorithms. Moreover, a new measure that takes into account the biological connectivity of the clusters was applied. The results of bSOM show important improvements in the convergence and performance for the proposed clustering method in comparison to standard SOM training, in particular, from the application point of view. Analyses of the clusters obtained with bSOM indicate that including biological information during training can certainly increase the biological value of the clusters found with the proposed method. 
It is worth highlighting that this has effectively improved the results, which can simplify their further analysis. The algorithm is available as a web-demo at http://fich.unl.edu.ar/sinc/web-demo/bsom-lite/. The source code and the data sets supporting the results of this article are available at http://sourceforge.net/projects/sourcesinc/files/bsom.
Microwave Heating of Metal Powder Clusters
NASA Astrophysics Data System (ADS)
Rybakov, K. I.; Semenov, V. E.; Volkovskaya, I. I.
2018-01-01
The results of simulating the rapid microwave heating of spherical clusters of metal particles to the melting point are reported. In the simulation, the cluster is subjected to a plane electromagnetic wave. The cluster size is comparable to the wavelength; the perturbations of the field inside the cluster are accounted for within an effective medium approximation. It is shown that the time of heating in vacuum to the melting point does not exceed 1 s when the electric field strength in the incident wave is about 2 kV/cm at a frequency of 24 GHz or 5 kV/cm at a frequency of 2.45 GHz. The obtained results demonstrate the feasibility of using rapid microwave heating for the spheroidization of metal particles with an objective to produce high-quality powders for additive manufacturing technologies.
Clustering: An Interactive Technique to Enhance Learning in Biology.
ERIC Educational Resources Information Center
Ambron, Joanna
1988-01-01
Explains an interdisciplinary approach to biology and writing which increases students' mastery of vocabulary, scientific concepts, creativity, and expression. Describes modifications of the clustering technique used to summarize lectures, integrate reading and understand textbook material. (RT)
Constructing the L2-Graph for Robust Subspace Learning and Subspace Clustering.
Peng, Xi; Yu, Zhiding; Yi, Zhang; Tang, Huajin
2017-04-01
Under the framework of graph-based learning, the key to robust subspace clustering and subspace learning is to obtain a good similarity graph that eliminates the effects of errors and retains only connections between the data points from the same subspace (i.e., intrasubspace data points). Recent works achieve good performance by modeling errors into their objective functions to remove the errors from the inputs. However, these approaches face the limitations that the structure of errors should be known a priori and a complex convex problem must be solved. In this paper, we present a novel method to eliminate the effects of the errors from the projection space (representation) rather than from the input space. We first prove that l1-, l2-, l∞-, and nuclear-norm-based linear projection spaces share the property of intrasubspace projection dominance, i.e., the coefficients over intrasubspace data points are larger than those over intersubspace data points. Based on this property, we introduce a method to construct a sparse similarity graph, called the L2-graph. The subspace clustering and subspace learning algorithms are developed upon the L2-graph. We conduct comprehensive experiments on subspace learning, image clustering, and motion segmentation and consider several quantitative benchmarks: classification/clustering accuracy, normalized mutual information, and running time. Results show that the L2-graph outperforms many state-of-the-art methods in our experiments, including the L1-graph, low-rank representation (LRR) and latent LRR, least square regression, sparse subspace clustering, and locally linear representation.
Unsupervised color image segmentation using a lattice algebra clustering technique
NASA Astrophysics Data System (ADS)
Urcid, Gonzalo; Ritter, Gerhard X.
2011-08-01
In this paper we introduce a lattice algebra clustering technique for segmenting digital images in the Red-Green-Blue (RGB) color space. The proposed technique is a two-step procedure. Given an input color image, the first step determines the finite set of its extreme pixel vectors within the color cube by means of the scaled min-W and max-M lattice auto-associative memory matrices, including the minimum and maximum vector bounds. In the second step, maximal rectangular boxes enclosing each extreme color pixel are found using the Chebyshev distance between color pixels; afterwards, clustering is performed by assigning each image pixel to its corresponding maximal box. The two steps in our proposed method are completely unsupervised or autonomous. Illustrative examples are provided to demonstrate the color segmentation results, including a brief numerical comparison with two other non-maximal variations of the same clustering technique.
A Fast Implementation of the ISODATA Clustering Algorithm
NASA Technical Reports Server (NTRS)
Memarsadeghi, Nargess; Mount, David M.; Netanyahu, Nathan S.; LeMoigne, Jacqueline
2005-01-01
Clustering is central to many image processing and remote sensing applications. ISODATA is one of the most popular and widely used clustering methods in geoscience applications, but it can run slowly, particularly with large data sets. We present a more efficient approach to ISODATA clustering, which achieves better running times by storing the points in a kd-tree and through a modification of the way in which the algorithm estimates the dispersion of each cluster. We also present an approximate version of the algorithm which allows the user to further improve the running time, at the expense of lower fidelity in computing the nearest cluster center to each point. We provide both theoretical and empirical justification that our modified approach produces clusterings that are very similar to those produced by the standard ISODATA approach. We also provide empirical studies on both synthetic data and remotely sensed Landsat and MODIS images that show that our approach has significantly lower running times.
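A toy illustration of why a spatial index pays off in the assignment step: the paper stores the data points themselves in a kd-tree and filters candidate centers, but the same nearest-center query speedup can be shown more briefly by indexing the cluster centers with SciPy's cKDTree (a simplification, not the authors' exact scheme):

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
points = rng.normal(size=(100000, 2))          # data points to assign
centers = rng.normal(scale=3.0, size=(16, 2))  # current cluster centers

# brute force: distance from every point to every center, O(n*k) evaluations
brute = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(-1).argmin(1)

# kd-tree: build the index on the centers once, then batch-query nearest center
tree = cKDTree(centers)
_, fast = tree.query(points)

print(np.array_equal(brute, fast))  # → True: identical assignments
```

Both routes give the same assignment; the tree simply avoids most of the distance evaluations, which is where the running-time savings in the abstract come from.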
A Fast Implementation of the ISODATA Clustering Algorithm
NASA Technical Reports Server (NTRS)
Memarsadeghi, Nargess; Le Moigne, Jacqueline; Mount, David M.; Netanyahu, Nathan S.
2007-01-01
Clustering is central to many image processing and remote sensing applications. ISODATA is one of the most popular and widely used clustering methods in geoscience applications, but it can run slowly, particularly with large data sets. We present a more efficient approach to ISODATA clustering, which achieves better running times by storing the points in a kd-tree and through a modification of the way in which the algorithm estimates the dispersion of each cluster. We also present an approximate version of the algorithm which allows the user to further improve the running time, at the expense of lower fidelity in computing the nearest cluster center to each point. We provide both theoretical and empirical justification that our modified approach produces clusterings that are very similar to those produced by the standard ISODATA approach. We also provide empirical studies on both synthetic data and remotely sensed Landsat and MODIS images that show that our approach has significantly lower running times.
Zhang, Junfeng; Chen, Wei; Gao, Mingyi; Shen, Gangxiang
2017-10-30
In this work, we proposed two k-means-clustering-based algorithms to mitigate the fiber nonlinearity for 64-quadrature amplitude modulation (64-QAM) signals: the training-sequence-assisted k-means algorithm and the blind k-means algorithm. We experimentally demonstrated the proposed k-means-clustering-based fiber nonlinearity mitigation techniques in a 75-Gb/s 64-QAM coherent optical communication system. The proposed algorithms have reduced clustering complexity and low data redundancy; they are able to quickly find appropriate initial centroids and to correctly select the centroids of the clusters, obtaining the global optimal solutions for large k values. We measured the bit-error-ratio (BER) performance of the 64-QAM signal with different powers launched into the 50-km single-mode fiber; the proposed techniques can greatly mitigate the signal impairments caused by amplified spontaneous emission noise and fiber Kerr nonlinearity and improve the BER performance.
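The core loop of the blind variant can be sketched in NumPy alone. The experiment above used 64-QAM over 50 km of fiber; the QPSK constellation, noise level, and ideal-point initialization here are illustrative assumptions that keep the sketch short:

```python
import numpy as np

rng = np.random.default_rng(1)

# Ideal QPSK constellation (stand-in for the 64-QAM grid of the experiment)
ideal = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j])

# Simulated received symbols: transmitted points plus complex Gaussian noise
tx = rng.choice(ideal, size=2000)
rx = tx + 0.15 * (rng.normal(size=2000) + 1j * rng.normal(size=2000))

# Blind k-means: seed the centroids at the ideal constellation, then run
# Lloyd iterations so each centroid tracks its noise-distorted cluster
centroids = ideal.astype(complex)
for _ in range(10):
    labels = np.abs(rx[:, None] - centroids[None, :]).argmin(1)
    centroids = np.array([rx[labels == k].mean() for k in range(len(centroids))])

# Decisions: each received symbol maps to the ideal point of its cluster
symbol_errors = np.mean(ideal[labels] != tx)
print(symbol_errors)  # → 0.0 at this noise level
```

In a real link the centroids drift away from the ideal grid under Kerr nonlinearity; letting them follow the received clusters is precisely what recovers the impaired decisions.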
Concerted hydrogen atom exchange between three HF molecules
NASA Technical Reports Server (NTRS)
Komornicki, Andrew; Dixon, David A.; Taylor, Peter R.
1992-01-01
We have investigated the termolecular reaction involving concerted hydrogen exchange between three HF molecules, with particular emphasis on the effects of correlation at the various stationary points along the reaction. Using an extended basis, we have located the geometries of the stable hydrogen-bonded trimer, which is of C3h symmetry, and the transition state for hydrogen exchange, which is of D3h symmetry. The energies of the exchange reaction were then evaluated at the correlated level, using a large atomic natural orbital basis and correlating all valence electrons. Several correlation treatments were used, namely, configuration interaction with single and double excitations, coupled-pair functional, and coupled-cluster methods. We are thus able to measure the effect of accounting for size-extensivity. Zero-point corrections to the correlated level energetics were determined using analytic second derivative techniques at the SCF level. Our best calculations, which include the effects of connected triple excitations in the coupled-cluster procedure, indicate that the trimer is bound by 9 ± 1 kcal/mol relative to three separate monomers, in excellent agreement with previous estimates. The barrier to concerted hydrogen exchange is 15 kcal/mol above the trimer, or only 4.7 kcal/mol above three separated monomers. Thus the barrier to hydrogen exchange between HF molecules via this termolecular process is very low.
Ages of Extragalactic Intermediate-Age Star Clusters
NASA Technical Reports Server (NTRS)
Flower, P. J.
1983-01-01
A dating technique for faint, distant star clusters observable in the local group of galaxies with the space telescope is discussed. Color-magnitude diagrams of Magellanic Cloud clusters are mentioned along with the metallicity of star clusters.
NASA Astrophysics Data System (ADS)
Balazs, A. C.; Johnson, K. H.
1982-01-01
Electronic structures have been calculated for 5-, 6-, and 10-atom Pt clusters, as well as for a Pt(PH3)4 coordination complex, using the self-consistent-field X-alpha scattered-wave (SCF-Xα-SW) molecular-orbital technique. The 10-atom cluster models the local geometry of a flat, unreconstructed Pt(100) surface, while the 5- and 6-atom clusters show features of a stepped Pt surface. Pt(PH3)4 resembles the chemically similar homogeneous catalyst Pt(PPh3)4. Common to all these coordinatively unsaturated complexes are orbitals lying near or coinciding with the highest occupied molecular orbital ("Fermi level") which show pronounced d lobes pointing directly into the vacuum. Under the hypothesis that these molecular orbitals are mainly responsible for the chemical activities of the above species, one can account for the relative similarities and differences in catalytic activity and selectivity displayed by unreconstructed Pt(100) surfaces, stepped Pt surfaces or particles, and isolated Pt(PPh3)4 coordination complexes. The relevance of these findings to catalyst-support interactions is also discussed. Finally, relativistic corrections to the electronic structures are calculated and their implications on catalytic properties discussed.
Estimation of dew point temperature using neuro-fuzzy and neural network techniques
NASA Astrophysics Data System (ADS)
Kisi, Ozgur; Kim, Sungwon; Shiri, Jalal
2013-11-01
This study investigates the ability of two different artificial neural network (ANN) models, the generalized regression neural networks model (GRNNM) and the Kohonen self-organizing feature maps neural networks model (KSOFM), and two different adaptive neural fuzzy inference system (ANFIS) models, the ANFIS model with sub-clustering identification (ANFIS-SC) and the ANFIS model with grid partitioning identification (ANFIS-GP), for estimating daily dew point temperature. The climatic data consisted of 8 years of daily records of air temperature, sunshine hours, wind speed, saturation vapor pressure, relative humidity, and dew point temperature from three weather stations, Daegu, Pohang, and Ulsan, in South Korea. The estimates of the ANN and ANFIS models were compared according to three different statistics: root mean square errors, mean absolute errors, and determination coefficient. Comparison results revealed that the ANFIS-SC, ANFIS-GP, and GRNNM models showed almost the same accuracy and performed better than the KSOFM model. Results also indicated that sunshine hours, wind speed, and saturation vapor pressure have little effect on dew point temperature. It was found that the dew point temperature could be successfully estimated using the Tmean and RH variables alone.
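For orientation, the quantity being estimated has a classical closed form: the Magnus approximation gives dew point directly from air temperature and relative humidity. This formula and its constants are a textbook baseline added here for context, not part of the paper's ANN/ANFIS models:

```python
import math

def dew_point(t_air_c, rh_percent):
    """Dew point (deg C) from air temperature (deg C) and relative humidity (%)
    via the Magnus approximation; a and b are the common fit constants for
    water over roughly -40..50 deg C."""
    a, b = 17.625, 243.04
    gamma = math.log(rh_percent / 100.0) + a * t_air_c / (b + t_air_c)
    return b * gamma / (a - gamma)

print(round(dew_point(20.0, 100.0), 1))  # → 20.0 (saturated air: dew point = air temperature)
print(round(dew_point(25.0, 60.0), 1))   # drier air: dew point well below air temperature
```

The dependence on only temperature and humidity in this formula is consistent with the paper's finding that Tmean and RH dominate the data-driven estimates.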
Why do gallium clusters have a higher melting point than the bulk?
Chacko, S; Joshi, Kavita; Kanhere, D G; Blundell, S A
2004-04-02
Density functional molecular dynamical simulations have been performed on Ga17 and Ga13 clusters to understand the recently observed higher-than-bulk melting temperatures in small gallium clusters [Phys. Rev. Lett. 91, 215508 (2003)].
Jothi, R; Mohanty, Sraban Kumar; Ojha, Aparajita
2016-04-01
Gene expression data clustering is an important biological process in DNA microarray analysis. Although there have been many clustering algorithms for gene expression analysis, finding a suitable and effective clustering algorithm is always a challenging problem due to the heterogeneous nature of gene profiles. Minimum Spanning Tree (MST) based clustering algorithms have been successfully employed to detect clusters of varying shapes and sizes. This paper proposes a novel clustering algorithm using eigenanalysis on a Minimum Spanning Tree based neighborhood graph (E-MST). As the MST of a set of points reflects the similarity of the points with their neighborhood, the proposed algorithm employs a similarity graph obtained from k′ rounds of MST (the k′-MST neighborhood graph). By studying the spectral properties of the similarity matrix obtained from the k′-MST graph, the proposed algorithm achieves improved clustering results. We demonstrate the efficacy of the proposed algorithm on 12 gene expression datasets. Experimental results show that the proposed algorithm performs better than the standard clustering algorithms. Copyright © 2016 Elsevier Ltd. All rights reserved.
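The MST intuition, that cutting the longest tree edges yields clusters of arbitrary shape and size, can be sketched with SciPy. This is a plain single-MST cut; the E-MST algorithm instead builds a k′-MST neighborhood graph and clusters via eigenanalysis of its similarity matrix, so treat this as the underlying idea rather than the paper's method:

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree, connected_components
from scipy.spatial.distance import pdist, squareform

def mst_clusters(points, n_clusters):
    """Cluster by building the MST of the complete distance graph and
    deleting the n_clusters-1 longest edges."""
    d = squareform(pdist(points))
    mst = minimum_spanning_tree(d).toarray()
    edges = np.argwhere(mst > 0)           # edge endpoints, row-major order
    weights = mst[mst > 0]                 # matching edge weights
    if n_clusters > 1:
        for i in np.argsort(weights)[-(n_clusters - 1):]:
            mst[tuple(edges[i])] = 0       # cut a longest edge
    _, labels = connected_components(mst, directed=False)
    return labels

rng = np.random.default_rng(2)
pts = np.vstack([rng.normal(0, 0.1, (20, 2)), rng.normal(5, 0.1, (20, 2))])
labels = mst_clusters(pts, 2)
print(len(set(labels[:20])), len(set(labels[20:])))  # → 1 1
```

Because the cut is purely topological, the recovered groups need not be convex or globular, which is the property that makes MST-based methods attractive for heterogeneous gene profiles.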
Image Registration Algorithm Based on Parallax Constraint and Clustering Analysis
NASA Astrophysics Data System (ADS)
Wang, Zhe; Dong, Min; Mu, Xiaomin; Wang, Song
2018-01-01
To resolve the problem of slow computation speed and low matching accuracy in image registration, a new image registration algorithm based on a parallax constraint and clustering analysis is proposed. Firstly, the Harris corner detection algorithm is used to extract the feature points of the two images. Secondly, the Normalized Cross Correlation (NCC) function is used to perform approximate matching of the feature points, yielding the initial feature pairs. Then, according to the parallax constraint condition, the initial feature pairs are preprocessed by the K-means clustering algorithm to remove the feature point pairs with obvious errors in the approximate matching process. Finally, the Random Sample Consensus (RANSAC) algorithm is adopted to optimize the feature points and obtain the final feature point matching result, realizing fast and accurate image registration. The experimental results show that the proposed image registration algorithm can improve the accuracy of the image matching while ensuring the real-time performance of the algorithm.
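A hedged NumPy sketch of the parallax-constraint filtering stage described above (the corner detection, NCC matching, and final RANSAC steps are omitted; the function name, k = 2, and the "keep the dominant cluster" rule are our illustrative simplifications, not the paper's exact settings):

```python
import numpy as np

def filter_matches_by_parallax(pts1, pts2, k=2, n_iter=20, seed=0):
    """Cluster the displacement (parallax) vectors of tentative matches
    with a tiny k-means, then keep only the matches whose parallax falls
    in the most populated cluster."""
    disp = pts2 - pts1                                   # per-match parallax vector
    rng = np.random.default_rng(seed)
    centers = disp[rng.choice(len(disp), size=k, replace=False)]
    for _ in range(n_iter):                              # plain Lloyd iterations
        labels = ((disp[:, None] - centers[None]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = disp[labels == j].mean(axis=0)
    return labels == np.bincount(labels, minlength=k).argmax()

rng = np.random.default_rng(1)
good = rng.normal([5.0, 0.0], 0.05, (40, 2))             # consistent parallax
bad = rng.normal([-15.0, 10.0], 0.05, (10, 2))           # mismatched pairs
pts1 = rng.normal(0, 1, (50, 2))
pts2 = pts1 + np.vstack([good, bad])
keep = filter_matches_by_parallax(pts1, pts2)
print(keep.sum())  # → 40: only the consistent matches survive
```

Pre-filtering like this shrinks the outlier fraction that RANSAC must cope with, which is where the speed and accuracy gains claimed in the abstract come from.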
NASA Astrophysics Data System (ADS)
Chen, Xin; Liu, Li; Zhou, Sida; Yue, Zhenjiang
2016-09-01
Reduced order models (ROMs) based on snapshots of CFD high-fidelity simulations have received great attention recently due to their capability of capturing the features of complex geometries and flow configurations. To improve the efficiency and precision of ROMs, it is indispensable to add extra sampling points to the initial snapshots, since the number of sampling points needed to achieve an adequately accurate ROM is generally unknown in advance, while a large number of initial sampling points reduces the parsimony of the ROMs. A fuzzy-clustering-based adding-point strategy is proposed, in which the fuzzy clustering acts as an indicator of the regions where the precision of the ROM is relatively low. The proposed method is applied to construct ROMs for benchmark mathematical examples and a numerical example of hypersonic aerothermodynamics prediction for a typical control surface. The proposed method achieves a 34.5% improvement in efficiency over the estimated mean squared error prediction algorithm while showing the same level of prediction accuracy.
Evaluation of null-point detection methods on simulation data
NASA Astrophysics Data System (ADS)
Olshevsky, Vyacheslav; Fu, Huishan; Vaivads, Andris; Khotyaintsev, Yuri; Lapenta, Giovanni; Markidis, Stefano
2014-05-01
We model the measurements of artificial spacecraft that resemble the configuration of CLUSTER propagating through a particle-in-cell simulation of turbulent magnetic reconnection. The simulation domain contains multiple isolated X-type null-points, but the majority are O-type null-points. Simulations show that current pinches surrounded by twisted fields, analogous to laboratory pinches, form along the sequences of O-type nulls. In the simulation, magnetic reconnection is mainly driven by the kinking of the pinches, at spatial scales of several ion inertial lengths. We compute the locations of magnetic null-points and detect their type. When the satellites are separated by fractions of an ion inertial length, as is the case for CLUSTER, they are able to locate both the isolated null-points and the pinches. We apply the method to real CLUSTER data and speculate on how common pinches are in the magnetosphere, and whether they play a dominant role in the dissipation of magnetic energy.
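In the plane, the type of a null can be read off the Jacobian of the field at the null; the sketch below states the standard eigenvalue criterion (multi-spacecraft methods estimate this Jacobian from the four-point CLUSTER measurements, a step not shown here).

```python
import numpy as np

def null_type_2d(jac):
    """Classify a planar magnetic null from the field Jacobian dB_i/dx_j
    at the null. For a divergence-free planar field (zero trace) the
    eigenvalues are +/-sqrt(-det): real for det < 0 (X-type, crossing
    field lines), imaginary for det > 0 (O-type, circulating field)."""
    return "X" if np.linalg.det(jac) < 0 else "O"

# Hyperbolic field B = (x, -y) -> X-type; rotational B = (-y, x) -> O-type
```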
NASA Astrophysics Data System (ADS)
Irving, D. H.; Rasheed, M.; O'Doherty, N.
2010-12-01
The efficient storage, retrieval and interactive use of subsurface data present great challenges in geodata management. Data volumes are typically massive, complex and poorly indexed, with inadequate metadata. Derived geomodels and interpretations are often tightly bound in application-centric and proprietary formats; open standards for long-term stewardship are poorly developed. Consequently, current data storage is a combination of: complex Logical Data Models (LDMs) based on file storage formats; 2D GIS tree-based indexing of spatial data; and translations of serialised memory-based storage techniques into disk-based storage. Whilst adequate for working at the mesoscale over short timeframes, these approaches all possess technical and operational shortcomings: data model complexity; anisotropy of access; scalability to large and complex datasets; and weak implementation and integration of metadata. High-performance hardware such as parallelised storage and Relational Database Management Systems (RDBMSs) have long been exploited in many solutions, but the underlying data structure must provide commensurate efficiencies to allow multi-user, multi-application and near-realtime data interaction. We present an open Spatially-Registered Data Structure (SRDS) built on a Massively Parallel Processing (MPP) database architecture implemented by an ANSI SQL 2008 compliant RDBMS. We propose an LDM comprising a 3D Earth model that is decomposed such that each increasing Level of Detail (LoD) is achieved by recursively halving the bin size until it is less than the error in each spatial dimension for that data point. The value of an attribute at that point is stored as a property of that point and at that LoD. It is key to the numerical efficiency of the SRDS that it is underpinned by a power-of-two relationship, thus precluding the need for computationally intensive floating point arithmetic.
Our approach employed a tightly clustered MPP array with small clusters of storage, processors and memory communicating over a high-speed network interconnect. This is a shared-nothing architecture where resources are managed within each cluster, unlike most other RDBMSs. Data are accessed on this architecture by their primary index values, which utilise a hashing algorithm for point-to-point access. The hashing algorithm's main role is the efficient distribution of data across the clusters based on the primary index. In this study we used 3D seismic volumes, 2D seismic profiles and borehole logs to demonstrate application in both (x,y,TWT)- and (x,y,z)-space. In the SRDS the primary index is a composite column index of (x,y), avoiding the time-consuming full table scans invoked in tree-based systems. This means that data access is isotropic. A query for data in a specified spatial range permits retrieval recursively by point-to-point queries within each nested LoD, yielding true linear performance up to the Petabyte scale, with hardware scaling presenting the primary limiting factor. Our architecture and LDM promote: realtime interaction with massive data volumes; streaming of result sets and server-rendered 2D/3D imagery; rigorous workflow control and auditing; and in-database algorithms run directly against data as an HPC cloud service.
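The power-of-two bin refinement can be illustrated with integer arithmetic only. This sketch is an interpretation of the description above, not the SRDS code; function names and the integer-unit assumption are ours.

```python
def lod_for_error(extent, err):
    """Smallest level of detail at which the bin size (extent / 2**lod)
    drops below the positional error. Integer doubling only; no floating
    point is needed when extent and err share integer units."""
    lod, bins = 0, 1
    while extent >= err * bins:      # i.e. extent / bins >= err
        lod += 1
        bins <<= 1
    return lod

def bin_index(coord, extent, lod):
    """Bin index of an integer coordinate at a given LoD, via bit shift."""
    return (coord << lod) // extent
```

For a 1024 m survey extent and a 10 m positional error, seven halvings give 8 m bins, the first size inside the error bar.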
Gooya, Ali; Lekadir, Karim; Alba, Xenia; Swift, Andrew J; Wild, Jim M; Frangi, Alejandro F
2015-01-01
Construction of Statistical Shape Models (SSMs) from arbitrary point sets is a challenging problem due to significant shape variation and the lack of explicit point correspondence across the training data set. In medical imaging, point sets can generally represent different shape classes that span healthy and pathological exemplars. In such cases, the constructed SSM may not generalize well, largely because the probability density function (pdf) of the point sets deviates from the underlying assumption of Gaussian statistics. To this end, we propose a generative model for unsupervised learning of the pdf of point sets as a mixture of distinctive classes. A Variational Bayesian (VB) method is proposed for making joint inferences on the labels of point sets and the principal modes of variation in each cluster. The method provides a flexible framework to handle point sets with no explicit point-to-point correspondences. We also show that by maximizing the marginalized likelihood of the model, the optimal number of clusters of point sets can be determined. We illustrate this work in the context of understanding the anatomical phenotype of the left and right ventricles of the heart. To this end, we use a database containing hearts of healthy subjects, patients with Pulmonary Hypertension (PH), and patients with Hypertrophic Cardiomyopathy (HCM). We demonstrate that our method can outperform traditional PCA in both generalization and specificity measures.
On the extended stellar structure around NGC 288
NASA Astrophysics Data System (ADS)
Piatti, Andrés E.
2018-01-01
We report observational evidence of an extra-tidal clumpy structure around NGC 288, from homogeneous coverage of a large area with the Panoramic Survey Telescope and Rapid Response System (Pan-STARRS) PS1 data base. The extra-tidal star population has been disentangled from that of the Milky Way (MW) field by using a cleaning technique that successfully reproduces the stellar density, luminosity function and colour distributions of MW field stars. We have produced the cluster stellar density radial profile and a stellar density map from independent approaches, and found the results to be in excellent agreement - the feature extends up to 3.5 times further than the cluster tidal radius. Previous works based on shallower photometric data sets have speculated on the existence of several long tidal tails, similar to those found in Pal 5. The present outcome shows that NGC 288 could hardly have such tails, and instead favours the notion that interactions with the MW tidal field have been a relatively inefficient process for stripping stars off the cluster. These results point to the need for a renewed overall study of the external regions of Galactic globular clusters (GGCs) in order to reliably characterize them. It will then be possible to investigate whether there is any connection between detected tidal tails, extra-tidal stellar populations and extended diffuse halo-like structures, and the dynamical histories of GGCs in the Galaxy.
Close-packed floating clusters: granular hydrodynamics beyond the freezing point?
Meerson, Baruch; Pöschel, Thorsten; Bromberg, Yaron
2003-07-11
Monodisperse granular flows often develop regions with hexagonal close packing of particles. We investigate this effect in a system of inelastic hard spheres driven from below by a "thermal" plate. Molecular dynamics simulations show, in a wide range of parameters, a close-packed cluster supported by a low-density region. Surprisingly, the steady-state density profile, including the close-packed cluster part, is well described by a variant of Navier-Stokes granular hydrodynamics (NSGH). We suggest a simple explanation for the success of NSGH beyond the freezing point.
LENR BEC Clusters on and below Wires through Cavitation and Related Techniques
NASA Astrophysics Data System (ADS)
Stringham, Roger; Stringham, Julie
2011-03-01
During the last two years I have been working on BEC cluster densities deposited just under the surface of wires, using cavitation and other techniques. If I can get the concentration high enough before the clusters dissipate, then in addition to cold-fusion-related excess heat (and other effects, including helium-4 formation) I anticipate that it may be possible to initiate transient forms of superconductivity at room temperature.
NASA Astrophysics Data System (ADS)
Prabhu, A.; Babu, S. B.; Dolado, J. S.; Gimel, J.-C.
2014-07-01
We present a novel simulation technique derived from Brownian cluster dynamics, previously used to study isotropic colloidal aggregation. It now implements the classical Kern-Frenkel potential to describe patchy interactions between particles. The technique gives access to the static properties, dynamics and kinetics of the system, even far from equilibrium. Particle thermal motions are modeled using billions of independent small random translations and rotations, constrained by excluded volume and connectivity. Applied to a single polymer chain, the algorithm leads to correct static and dynamic properties in the framework where hydrodynamic interactions are ignored. By varying the patch angles, various local chain flexibilities can be obtained. We have used this new algorithm to model step-growth polymerization under various solvent qualities. The polymerization reaction is modeled by irreversible aggregation between patches, while an isotropic finite square-well potential is superimposed to mimic the solvent quality. In bad solvent conditions, a competition occurs between phase separation (due to the isotropic interaction) and polymerization (due to the patches). Surprisingly, an arrested network with a very peculiar structure appears, made of strands and nodes. Strands gather a few stretched chains that dip into entangled globular nodes; these nodes act as reticulation points between the strands. The system is kinetically driven and we observe a trapped, arrested structure. This demonstrates one of the strengths of the new simulation technique: it can give valuable insight into mechanisms that could be involved in the formation of stranded gels.
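The Kern-Frenkel interaction named above is compact enough to state directly. This is the textbook single-patch form, a square well gated by patch orientation on both particles, with illustrative parameter values; it is not the authors' code.

```python
import numpy as np

def kern_frenkel(r_i, r_j, n_i, n_j, sigma=1.0, lam=1.5, eps=1.0, cos_delta=0.9):
    """Kern-Frenkel pair energy for single-patch particles: a square well
    of depth eps and range lam*sigma that acts only when the centre-to-
    centre line passes through the patch on BOTH particles."""
    rij = r_j - r_i
    r = np.linalg.norm(rij)
    if r < sigma:
        return np.inf                    # hard-core overlap
    if r >= lam * sigma:
        return 0.0                       # outside the well
    u = rij / r
    patch_i = np.dot(u, n_i) >= cos_delta    # patch on i faces j
    patch_j = np.dot(-u, n_j) >= cos_delta   # patch on j faces i
    return -eps if (patch_i and patch_j) else 0.0
```

Narrowing `cos_delta` toward 1 shrinks the patch and stiffens the bonded chain, which is how the patch angle controls local flexibility.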
DICON: interactive visual analysis of multidimensional clusters.
Cao, Nan; Gotz, David; Sun, Jimeng; Qu, Huamin
2011-12-01
Clustering as a fundamental data analysis technique has been widely used in many analytic applications. However, it is often difficult for users to understand and evaluate multidimensional clustering results, especially the quality of clusters and their semantics. For large and complex data, high-level statistical information about the clusters is often needed for users to evaluate cluster quality while a detailed display of multidimensional attributes of the data is necessary to understand the meaning of clusters. In this paper, we introduce DICON, an icon-based cluster visualization that embeds statistical information into a multi-attribute display to facilitate cluster interpretation, evaluation, and comparison. We design a treemap-like icon to represent a multidimensional cluster, and the quality of the cluster can be conveniently evaluated with the embedded statistical information. We further develop a novel layout algorithm which can generate similar icons for similar clusters, making comparisons of clusters easier. User interaction and clutter reduction are integrated into the system to help users more effectively analyze and refine clustering results for large datasets. We demonstrate the power of DICON through a user study and a case study in the healthcare domain. Our evaluation shows the benefits of the technique, especially in support of complex multidimensional cluster analysis. © 2011 IEEE
Segmentation and clustering as complementary sources of information
NASA Astrophysics Data System (ADS)
Dale, Michael B.; Allison, Lloyd; Dale, Patricia E. R.
2007-03-01
This paper examines the effects of using a segmentation method to identify change-points or edges in vegetation. It imposes coherence (spatial or temporal) in place of unconstrained clustering. The segmentation method involves change-point detection along a sequence of observations, so that each cluster formed is composed of adjacent samples; this is a form of constrained clustering. The protocol fits one or more models, one for each section identified, and the quality of each is assessed using a minimum message length criterion, which provides a rational basis for selecting an appropriate model. Although segmentation is less efficient than clustering, it provides additional information because it incorporates textural similarity as well as homogeneity. In addition, it can be useful in determining the various scales of variation that may apply to the data, providing a general method of small-scale pattern analysis.
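Constrained clustering by change-point detection can be made concrete with a small dynamic program. Note the cost here is plain within-segment squared error with a fixed number of segments, whereas the paper selects the model by minimum message length; the sketch only shows the "clusters of adjacent samples" constraint.

```python
import numpy as np

def segment_sequence(x, k):
    """Split sequence x into k contiguous segments minimising the total
    within-segment sum of squared deviations (each segment = one cluster
    of adjacent samples). Returns the interior change-point indices."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    s1 = np.concatenate([[0.0], np.cumsum(x)])      # prefix sums give
    s2 = np.concatenate([[0.0], np.cumsum(x * x)])  # O(1) segment cost
    def cost(i, j):                                 # SSE of x[i:j]
        m = j - i
        return s2[j] - s2[i] - (s1[j] - s1[i]) ** 2 / m
    D = np.full((k + 1, n + 1), np.inf)
    back = np.zeros((k + 1, n + 1), dtype=int)
    D[0, 0] = 0.0
    for seg in range(1, k + 1):
        for j in range(seg, n + 1):
            for i in range(seg - 1, j):
                c = D[seg - 1, i] + cost(i, j)
                if c < D[seg, j]:
                    D[seg, j], back[seg, j] = c, i
    cuts, j = [], n
    for seg in range(k, 0, -1):                     # walk back through
        j = back[seg, j]                            # the optimal splits
        if j > 0:
            cuts.append(int(j))
    return sorted(cuts)
```

Unconstrained k-means on the same data could assign distant samples to one cluster; the DP cannot, which is exactly the coherence the paper trades efficiency for.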
Application of Artificial Intelligence For Euler Solutions Clustering
NASA Astrophysics Data System (ADS)
Mikhailov, V.; Galdeano, A.; Diament, M.; Gvishiani, A.; Agayan, S.; Bogoutdinov, Sh.; Graeva, E.; Sailhac, P.
Results of Euler deconvolution strongly depend on the selection of viable solutions. Synthetic calculations using multiple causative sources show that Euler solutions cluster in the vicinity of causative bodies even when they do not group densely about the perimeter of the bodies. We have developed a clustering technique to serve as a tool for selecting appropriate solutions. The method RODIN, employed in this study, is based on artificial intelligence and was originally designed for classification problems on large data sets. It is based on a geometrical approach to studying object concentration in a finite metric space of any dimension. The method uses a formal definition of a cluster and includes free parameters that facilitate the search for clusters with given properties. Tests on synthetic and real data showed that the clustering technique outlines causative bodies more accurately than other methods of discriminating Euler solutions. In complicated field cases, such as the magnetic field in the Gulf of Saint Malo region (Brittany, France), the method provides geologically insightful solutions. Other advantages of applying the clustering method are: - Clusters provide solutions associated with particular bodies or parts of bodies, permitting the analysis of different clusters of Euler solutions separately. This may allow computation of average parameters for individual causative bodies. - Those measurements of the anomalous field that yield clusters also form dense clusters themselves. The application of the clustering technique thus outlines areas where the influence of different causative sources is more prominent. This allows one to focus on areas for reinterpretation, using different window sizes, structural indices and so on.
Clustering approaches to identifying gene expression patterns from DNA microarray data.
Do, Jin Hwan; Choi, Dong-Kug
2008-04-30
Analysis methods are essential for making sense of the large amounts of gene expression data produced by microarrays. In this review we focus on clustering techniques. The biological rationale for this approach is the fact that many co-expressed genes are co-regulated, and identifying co-expressed genes could aid in the functional annotation of novel genes, de novo identification of transcription factor binding sites, and elucidation of complex biological pathways. Co-expressed genes are usually identified in microarray experiments by clustering techniques. There are many such methods, and the results obtained even for the same dataset may vary considerably depending on the algorithms and dissimilarity metrics used, as well as on user-selectable parameters such as the desired number of clusters and initial values. Therefore, biologists who want to interpret microarray data should be aware of the weaknesses and strengths of the clustering methods used. In this review, we survey the basic principles of clustering DNA microarray data, from crisp clustering algorithms such as hierarchical clustering, K-means and self-organizing maps, to complex clustering algorithms like fuzzy clustering.
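One of the crisp methods the review surveys, hierarchical clustering under correlation distance, fits in a few lines. This is a naive average-linkage sketch for small matrices (it assumes no gene has a constant profile); production analyses would use an optimized library implementation.

```python
import numpy as np

def correlation_cluster(expr, n_clusters):
    """Naive average-linkage agglomerative clustering of expression
    profiles under correlation distance (1 - Pearson r), the usual
    choice for grouping co-expressed genes. O(n^3), small inputs only."""
    n = len(expr)
    z = expr - expr.mean(axis=1, keepdims=True)
    z /= np.linalg.norm(z, axis=1, keepdims=True)
    dist = 1.0 - z @ z.T                 # 1 - Pearson correlation
    clusters = [[i] for i in range(n)]
    while len(clusters) > n_clusters:
        best, pair = np.inf, None
        for a in range(len(clusters)):   # closest pair under average link
            for b in range(a + 1, len(clusters)):
                d = np.mean([dist[i, j] for i in clusters[a] for j in clusters[b]])
                if d < best:
                    best, pair = d, (a, b)
        a, b = pair
        clusters[a] += clusters.pop(b)
    labels = np.empty(n, dtype=int)
    for c, members in enumerate(clusters):
        labels[members] = c
    return labels
```

Correlation distance groups genes whose profiles rise and fall together regardless of absolute expression level, which is why it dominates co-expression studies over Euclidean distance.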
The quantitative analysis of silicon carbide surface smoothing by Ar and Xe cluster ions
NASA Astrophysics Data System (ADS)
Ieshkin, A. E.; Kireev, D. S.; Ermakov, Yu. A.; Trifonov, A. S.; Presnov, D. E.; Garshev, A. V.; Anufriev, Yu. V.; Prokhorova, I. G.; Krupenin, V. A.; Chernysh, V. S.
2018-04-01
The gas cluster ion beam (GCIB) technique was used for smoothing of a silicon carbide crystal surface. The effects of processing with two inert cluster ion species, argon and xenon, were quantitatively compared. While argon is a standard element for GCIB, results for xenon clusters have not previously been reported. Scanning probe microscopy and high-resolution transmission electron microscopy were used to analyze the surface roughness and the quality of the surface crystal layer. Gas cluster ion beam processing smooths the surface relief down to an average roughness of about 1 nm for both elements. Xenon was shown to be the more effective working gas: the sputtering rate for xenon clusters is 2.5 times higher than for argon at the same beam energy. High-resolution transmission electron microscopy analysis of the surface defect layer gives thicknesses of 7 ± 2 nm and 8 ± 2 nm for treatment with argon and xenon clusters, respectively.
Atomistic simulation of damage accumulation and amorphization in Ge
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gomez-Selles, Jose L., E-mail: joseluis.gomezselles@imdea.org; Martin-Bragado, Ignacio; Claverie, Alain
2015-02-07
Damage accumulation and amorphization mechanisms under ion implantation in Ge are studied using Kinetic Monte Carlo and Binary Collision Approximation techniques. These mechanisms are investigated through the different stages of damage accumulation taking place during implantation: from point defect generation and cluster formation up to full amorphization of Ge layers. We propose a damage concentration amorphization threshold for Ge of ∼1.3 × 10²² cm⁻³, which is independent of the implantation conditions. Recombination energy barriers depending on amorphous pocket sizes are provided. This leads to an explanation of the reported distinct behavior of the damage generated by different ions. We have also observed that the dissolution of clusters plays an important role at relatively high temperatures and fluences. The model is able to explain and predict different damage generation regimes, the amount of generated damage, and the extension of amorphous layers in Ge for different ions and implantation conditions.
Predicting thunderstorm evolution using ground-based lightning detection networks
NASA Technical Reports Server (NTRS)
Goodman, Steven J.
1990-01-01
Lightning measurements acquired principally by a ground-based network of magnetic direction finders are used to diagnose and predict the existence, temporal evolution, and decay of thunderstorms over a wide range of space and time scales extending over four orders of magnitude. The non-linear growth and decay of thunderstorms and their accompanying cloud-to-ground lightning activity is described by the three parameter logistic growth model. The growth rate is shown to be a function of the storm size and duration, and the limiting value of the total lightning activity is related to the available energy in the environment. A new technique is described for removing systematic bearing errors from direction finder data where radar echoes are used to constrain site error correction and optimization (best point estimate) algorithms. A nearest neighbor pattern recognition algorithm is employed to cluster the discrete lightning discharges into storm cells and the advantages and limitations of different clustering strategies for storm identification and tracking are examined.
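The three-parameter logistic model mentioned above is N(t) = L / (1 + exp(-k (t - t0))). If the limiting value L (set by the available environmental energy) is taken as known, the other two parameters follow from a simple linearisation; this is a sketch of the model, not the paper's fitting procedure, and the synthetic numbers are invented.

```python
import numpy as np

def fit_logistic_rate(t, N, L):
    """Recover growth rate k and midpoint t0 of the three-parameter
    logistic N(t) = L / (1 + exp(-k (t - t0))), taking the limiting
    value L as known: log(N / (L - N)) = k t - k t0 is linear in t,
    so a least-squares line fit suffices."""
    y = np.log(N / (L - N))
    k, b = np.polyfit(t, y, 1)
    return k, -b / k

# Synthetic "storm": cloud-to-ground flash count saturating at L = 100
t = np.linspace(0.0, 10.0, 50)
N = 100.0 / (1.0 + np.exp(-1.5 * (t - 4.0)))
k, t0 = fit_logistic_rate(t, N, 100.0)   # recovers k = 1.5, t0 = 4.0
```

In practice L itself must be estimated jointly with k and t0 (a nonlinear fit), since the storm's energy ceiling is not observed directly.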
Clustering and Network Analysis of Reverse Phase Protein Array Data.
Byron, Adam
2017-01-01
Molecular profiling of proteins and phosphoproteins using a reverse phase protein array (RPPA) platform, with a panel of target-specific antibodies, enables the parallel, quantitative proteomic analysis of many biological samples in a microarray format. Hence, RPPA analysis can generate a high volume of multidimensional data that must be effectively interrogated and interpreted. A range of computational techniques for data mining can be applied to detect and explore data structure and to form functional predictions from large datasets. Here, two approaches for the computational analysis of RPPA data are detailed: the identification of similar patterns of protein expression by hierarchical cluster analysis and the modeling of protein interactions and signaling relationships by network analysis. The protocols use freely available, cross-platform software, are easy to implement, and do not require any programming expertise. Serving as data-driven starting points for further in-depth analysis, validation, and biological experimentation, these and related bioinformatic approaches can accelerate the functional interpretation of RPPA data.
Changes of Water Hydrogen Bond Network with Different Externalities
Zhao, Lin; Ma, Kai; Yang, Zi
2015-01-01
It is crucial to uncover the mystery of water clusters and structural motifs to gain insight into the abundant anomalies associated with water. In this context, the analysis of influencing factors is an alternative way to shed light on the nature of water clusters. Water structure has been tentatively explained within different frameworks of structural models. Based on a comprehensive analysis and summary of studies on the response of water to four externalities (i.e., temperature, pressure, solutes and external fields), the changing trends of water structure and a deduced intrinsic structural motif are put forward in this work. The variations in the physicochemical and biological effects of water induced by each externality are also discussed to emphasize the role of water in our daily life. On this basis, the underlying problems that need further study are formulated by pointing out the limitations of current study techniques and outlining prominent studies that have appeared recently. PMID:25884333
Cluster geometry and survival probability in systems driven by reaction diffusion dynamics
NASA Astrophysics Data System (ADS)
Windus, Alastair; Jensen, Henrik J.
2008-11-01
We consider a reaction-diffusion model incorporating the reactions A→∅, A→2A and 2A→3A. Depending on the relative rates for sexual and asexual reproduction of the quantity A, the model exhibits either a continuous or first-order absorbing phase transition to an extinct state. A tricritical point separates the two phase lines. While we comment on this critical behaviour, the main focus of the paper is on the geometry of the population clusters that form. We observe the different cluster structures that arise at criticality for the three different types of critical behaviour and show that there exists a linear relationship for the survival probability against initial cluster size at the tricritical point only.
NASA Astrophysics Data System (ADS)
Sangadji, Iriansyah; Arvio, Yozika; Indrianto
2018-03-01
To understand, with reasonable accuracy, patterns of change in value movements that vary dynamically over a given period, a tool based on sound technical working principles or a specific analytical method is required; this determines the validity of the system's output. Subtractive clustering is based on the density (potential) of data points in a space of variables. Its basic concept is to determine the regions of a variable space that have high potential relative to the surrounding points. The result presented in this paper is a segmentation of behavior patterns based on quantity value movement; it shows the number of clusters formed and which of them have many members.
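The density-based selection the abstract describes can be sketched in the style of Chiu's subtractive clustering: rate every point by a potential (a density estimate over its neighbours), promote the best point to a centre, subtract its influence, and repeat. Parameter values below are conventional defaults, not the paper's settings.

```python
import numpy as np

def subtractive_clustering(X, ra=1.0, eps=0.15):
    """Subtractive clustering sketch: the highest-potential point becomes
    a centre, its influence is squashed, and the process repeats until
    the best remaining potential falls below eps times the first one.
    ra (neighbourhood radius) and eps are tuning knobs."""
    alpha = 4.0 / ra ** 2
    beta = 4.0 / (1.5 * ra) ** 2          # wider "squash" radius rb = 1.5 ra
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    P = np.exp(-alpha * d2).sum(axis=1)   # potential of every point
    first = P.max()
    centres = []
    while P.max() >= eps * first:
        c = P.argmax()
        centres.append(X[c])
        P = P - P[c] * np.exp(-beta * d2[c])   # squash around new centre
    return np.array(centres)
```

Unlike K-means, the number of clusters is not fixed in advance: it emerges from the data density and the stopping threshold, which is the property the abstract relies on.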
Observations and Interpretation of Magnetofluid Turbulence at Small Scales
NASA Technical Reports Server (NTRS)
Goldstein, Melvyn L.; Sahraoui, Fouad
2011-01-01
High time resolution magnetic field measurements from the four Cluster spacecraft have revealed new features of the properties of magnetofluid turbulence at small spatial scales; perhaps even revealing the approach to the dissipation regime at scales close to the electron inertial length. Various analysis techniques and theoretical ideas have been put forward to account for the properties of those measurements. The talk will describe the current state of observations and theory, and will point out on-going and planned research that will further our understanding of how magnetofluid turbulence dissipates. The observations and theories are directly germane to studies being planned as part of NASA's forthcoming Magnetospheric Multiscale Mission.
Localization--the revolution in consumer markets.
Rigby, Darrell K; Vishwanath, Vijay
2006-04-01
Standardization has been a powerful strategy in consumer markets, but it's reached the point of diminishing returns. And diversity is not the only chink in standardization's armor: Attempts to build stores in the remaining attractive locations often meet fierce resistance from community activists. From California to Florida to New Jersey, neighborhoods are passing ordinances that dictate the sizes and even architectural styles of new shops. Building more of the same--long the cornerstone of retailer growth--seems to be tapped out as a strategy. Of course, a company can't customize every element of its business in every location. Strategists have begun to use clustering techniques to simplify and smooth out decision making and to focus their efforts on the relatively small number of variables that usually drive the bulk of consumer purchases. The customization-by-clusters approach, which began as a strategy for grocery stores in 1995, has since proven effective in drugstores, department stores, mass merchants, big-box retailers, restaurants, apparel companies, and a variety of consumer goods manufacturers. Clustering sorts things into groups, so that the associations are strong between members of the same cluster and weak between members of different clusters. In fact, by centralizing data-intensive and scale-sensitive functions (such as store design, merchandise assortment, buying, and supply chain management), localization liberates store personnel to do what they do best: Test innovative solutions to local challenges and forge strong bonds with communities. Ultimately, all companies serving consumers will face the challenge of local customization. We are advancing to a world where the strategies of the most successful businesses will be as diverse as the communities they serve.
NASA Astrophysics Data System (ADS)
Onaka-Masada, Ayumi; Nakai, Toshiro; Okuyama, Ryosuke; Okuda, Hidehiko; Kadono, Takeshi; Hirose, Ryo; Koga, Yoshihiro; Kurita, Kazunari; Sueoka, Koji
2018-02-01
The effect of oxygen (O) concentration on the Fe gettering capability in a carbon-cluster (C3H5) ion-implanted region was investigated by comparing a Czochralski (CZ)-grown silicon substrate and an epitaxial growth layer. A high Fe gettering efficiency in a carbon-cluster ion-implanted epitaxial growth layer, which has a low-oxygen region, was observed by deep-level transient spectroscopy (DLTS) and secondary ion mass spectroscopy (SIMS). It was demonstrated that the amount of gettered Fe in the epitaxial growth layer is approximately two times higher than that in the CZ-grown silicon substrate. Furthermore, by measuring the cathodoluminescence, the number of intrinsic point defects induced by carbon-cluster ion implantation was found to differ between the CZ-grown silicon substrate and the epitaxial growth layer. It is suggested that Fe gettering by carbon-cluster ion implantation proceeds via point defect clusters, and that O in the carbon-cluster ion-implanted region affects the formation of gettering sinks for Fe.
Dynamical age differences among coeval star clusters as revealed by blue stragglers.
Ferraro, F R; Lanzoni, B; Dalessandro, E; Beccari, G; Pasquato, M; Miocchi, P; Rood, R T; Sigurdsson, S; Sills, A; Vesperini, E; Mapelli, M; Contreras, R; Sanna, N; Mucciarelli, A
2012-12-20
Globular star clusters that formed at the same cosmic time may have evolved rather differently from the dynamical point of view (because that evolution depends on the internal environment) through a variety of processes that tend progressively to segregate stars more massive than the average towards the cluster centre. Therefore clusters with the same chronological age may have reached quite different stages of their dynamical history (that is, they may have different 'dynamical ages'). Blue straggler stars have masses greater than those at the turn-off point on the main sequence and therefore must be the result of either a collision or a mass-transfer event. Because they are among the most massive and luminous objects in old clusters, they can be used as test particles with which to probe dynamical evolution. Here we report that globular clusters can be grouped into a few distinct families on the basis of the radial distribution of blue stragglers. This grouping corresponds well to an effective ranking of the dynamical stage reached by stellar systems, thereby permitting a direct measure of the cluster dynamical age purely from observed properties.
MSL: A Measure to Evaluate Three-dimensional Patterns in Gene Expression Data
Gutiérrez-Avilés, David; Rubio-Escudero, Cristina
2015-01-01
Microarray technology is widely used in biological research environments due to its ability to monitor RNA concentration levels. The analysis of the generated data represents a computational challenge due to its characteristics. Clustering techniques are widely applied to create groups of genes that exhibit similar behavior. Biclustering relaxes the grouping constraints, allowing genes to be evaluated under only a subset of the conditions. Triclustering arises in the analysis of longitudinal experiments in which the genes are evaluated under certain conditions at several time points. These triclusters reveal hidden information in the form of behavior patterns from temporal microarray experiments, relating subsets of genes, experimental conditions, and time points. We present an evaluation measure for triclusters called the Multi Slope Measure, based on the similarity among the angles of the slopes of the profiles formed by the genes, conditions, and time points of the tricluster. PMID:26124630
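The slope-angle idea can be shown on plain profiles: turn each consecutive-point slope into an angle with arctan and compare angles pairwise. This is a deliberate one-dimensional simplification for illustration, not the published three-dimensional measure.

```python
import numpy as np

def slope_angle_similarity(profiles):
    """Mean absolute difference between the slope angles of every pair
    of profiles: each profile's consecutive-point slopes become angles
    via arctan, so parallel patterns score 0 (most similar)."""
    angles = np.arctan(np.diff(profiles, axis=1))  # one angle per time step
    n = len(profiles)
    diffs = [np.abs(angles[i] - angles[j]).mean()
             for i in range(n) for j in range(i + 1, n)]
    return float(np.mean(diffs))
```

Working with angles rather than raw slopes bounds each contribution to (-pi/2, pi/2), so a single steep step cannot dominate the score.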
Simulation study into the identification of nuclear materials in cargo containers using cosmic rays
NASA Astrophysics Data System (ADS)
Blackwell, T. B.; Kudryavtsev, V. A.
2015-04-01
Muon tomography is a new imaging technique that can be used to detect high-Z materials. Monte Carlo simulations of muon scattering in different types of target material are presented. The dependence of the detector's capability to identify high-Z targets on spatial resolution has been studied. Muon tracks are reconstructed using a basic point of closest approach (PoCA) algorithm. In this article we report the development of a secondary analysis algorithm that is applied to the reconstructed PoCA points. This algorithm efficiently identifies clusters of voxels with high average scattering angles to flag 'areas of interest' within the inspected volume. Using this approach, the effects of other parameters, such as the distance between detectors and the number of detectors per set, on material identification are also presented. Finally, false-positive and false-negative rates for detecting shielded HEU in realistic scenarios with low-Z clutter are presented.
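The PoCA reconstruction itself is standard closest-approach geometry between the incoming and outgoing track lines; a minimal sketch (the paper's secondary voxel-clustering step is not shown):

```python
import numpy as np

def poca(p1, d1, p2, d2):
    """Midpoint of the shortest segment between two lines p1 + t*d1 and
    p2 + s*d2 (incoming/outgoing muon tracks), plus the angle between
    them. Parallel tracks fall back to p1 with zero angle."""
    d1 = d1 / np.linalg.norm(d1)
    d2 = d2 / np.linalg.norm(d2)
    w0 = p1 - p2
    b = np.dot(d1, d2)                    # cos(angle between tracks)
    d = np.dot(d1, w0)
    e = np.dot(d2, w0)
    denom = 1.0 - b * b
    if denom < 1e-12:
        return p1, 0.0                    # parallel: no unique PoCA
    t = (b * e - d) / denom               # closest parameter on line 1
    s = (e - b * d) / denom               # closest parameter on line 2
    mid = 0.5 * ((p1 + t * d1) + (p2 + s * d2))
    return mid, np.arccos(np.clip(b, -1.0, 1.0))
```

Large scattering angles concentrated at the PoCA points inside a voxel are the signature the clustering step then searches for.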
Min-max hyperellipsoidal clustering for anomaly detection in network security.
Sarasamma, Suseela T; Zhu, Qiuming A
2006-08-01
A novel hyperellipsoidal clustering technique is presented for an intrusion-detection system in network security. Hyperellipsoidal clusters toward maximum intracluster similarity and minimum intercluster similarity are generated from training data sets. The novelty of the technique lies in the fact that the parameters needed to construct higher order data models in general multivariate Gaussian functions are incrementally derived from the data sets using accretive processes. The technique is implemented in a feedforward neural network that uses a Gaussian radial basis function as the model generator. An evaluation based on the inclusiveness and exclusiveness of samples with respect to specific criteria is applied to accretively learn the output clusters of the neural network. One significant advantage of this is its ability to detect individual anomaly types that are hard to detect with other anomaly-detection schemes. Applying this technique, several feature subsets of the tcptrace network-connection records that give above 95% detection at false-positive rates below 5% were identified.
Classification of Two Class Motor Imagery Tasks Using Hybrid GA-PSO Based K-Means Clustering.
Suraj; Tiwari, Purnendu; Ghosh, Subhojit; Sinha, Rakesh Kumar
2015-01-01
Transferring the brain-computer interface (BCI) from laboratory conditions to real-world applications requires the BCI to operate asynchronously, without any time constraint. The high level of dynamism in the electroencephalogram (EEG) signal motivates the use of evolutionary algorithms (EAs). Motivated by these two facts, in this work a hybrid GA-PSO based K-means clustering technique has been used to distinguish two-class motor imagery (MI) tasks. The proposed hybrid GA-PSO based K-means clustering is found to outperform genetic algorithm (GA) and particle swarm optimization (PSO) based K-means clustering techniques in terms of both accuracy and execution time. The lower execution time of the hybrid GA-PSO technique makes it suitable for real-time BCI application. Time-frequency representation (TFR) techniques have been used to extract the features of the signal under investigation. TFR-based features are extracted, and the feature vector is formed relying on the concept of event-related synchronization (ERS) and desynchronization (ERD). PMID:25972896
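The abstract does not detail the hybrid GA-PSO search itself, so the sketch below shows only the Lloyd's-iteration K-means core that such a hybrid would wrap and refine (the evolutionary layer replaces the sensitivity to initial centers). All names are illustrative assumptions:

```python
# Plain Lloyd's K-means: assign each point to its nearest center, then
# recompute centers as cluster means, until the centers stop moving.

def kmeans(points, centers, iters=100):
    clusters = [[] for _ in centers]
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            # squared Euclidean distance to each current center
            i = min(range(len(centers)),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            clusters[i].append(p)
        new_centers = [
            [sum(col) / len(cl) for col in zip(*cl)] if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centers == centers:   # converged
            break
        centers = new_centers
    return centers, clusters
```

A GA or PSO wrapper would treat candidate center sets as individuals/particles and use the within-cluster distance sum from this routine as the fitness to minimize.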
Genome Engineering and Modification Toward Synthetic Biology for the Production of Antibiotics.
Zou, Xuan; Wang, Lianrong; Li, Zhiqiang; Luo, Jie; Wang, Yunfu; Deng, Zixin; Du, Shiming; Chen, Shi
2018-01-01
Antibiotic production is often governed by large gene clusters composed of genes related to antibiotic scaffold synthesis, tailoring, regulation, and resistance. With the expansion of genome sequencing, a considerable number of antibiotic gene clusters have been isolated and characterized. Emerging genome engineering techniques make more efficient engineering of antibiotics possible. In addition to genomic editing, multiple synthetic biology approaches have been developed for the exploration and improvement of antibiotic natural products. Here, we review progress in the development of the genome editing techniques used to engineer new antibiotics, focusing on three aspects of genome engineering: direct cloning of large genomic fragments, genome engineering of gene clusters, and regulation of gene cluster expression. This review will not only summarize the current uses of genomic engineering techniques for cloning and assembly of antibiotic gene clusters or for altering antibiotic synthetic pathways but will also provide perspectives on the future directions of rebuilding biological systems for the design of novel antibiotics. © 2017 Wiley Periodicals, Inc.
Improvements in Ionized Cluster-Beam Deposition
NASA Technical Reports Server (NTRS)
Fitzgerald, D. J.; Compton, L. E.; Pawlik, E. V.
1986-01-01
Lower temperatures result in higher purity and fewer equipment problems. In cluster-beam deposition, clusters of atoms are formed by an adiabatic-expansion nozzle; with proper nozzle design, the expanding vapor cools sufficiently to become supersaturated and form clusters of the material to be deposited. The clusters are ionized, accelerated in an electric field, and then impacted on a substrate, where films form. The improved cluster-beam technique is useful for deposition of refractory metals.
TopMaker: A Technique for Automatic Multi-Block Topology Generation Using the Medial Axis
NASA Technical Reports Server (NTRS)
Heidmann, James D. (Technical Monitor); Rigby, David L.
2004-01-01
A two-dimensional multi-block topology generation technique has been developed. Very general configurations are addressable by the technique. A configuration is defined by a collection of non-intersecting closed curves, which will be referred to as loops. More than a single loop implies that holes exist in the domain, which poses no problem. This technique requires only the medial vertices and the touch points that define each vertex. From the information about the medial vertices, the connectivity between medial vertices is generated. The physical shape of the medial edge is not required. By applying a few simple rules to each medial edge, the multiblock topology is generated with no user intervention required. The resulting topologies contain only the level of complexity dictated by the configurations. Grid lines remain attached to the boundary except at sharp concave turns where a change in index family is introduced as would be desired. Keeping grid lines attached to the boundary is especially important in the area of computational fluid dynamics where highly clustered grids are used near no-slip boundaries. This technique is simple and robust and can easily be incorporated into the overall grid generation process.
Communication: A simplified coupled-cluster Lagrangian for polarizable embedding.
Krause, Katharina; Klopper, Wim
2016-01-28
A simplified coupled-cluster Lagrangian, which is linear in the Lagrangian multipliers, is proposed for the coupled-cluster treatment of a quantum mechanical system in a polarizable environment. In the simplified approach, the amplitude equations are decoupled from the Lagrangian multipliers and the energy obtained from the projected coupled-cluster equation corresponds to a stationary point of the Lagrangian.
Communication: A simplified coupled-cluster Lagrangian for polarizable embedding
DOE Office of Scientific and Technical Information (OSTI.GOV)
Krause, Katharina; Klopper, Wim, E-mail: klopper@kit.edu
A simplified coupled-cluster Lagrangian, which is linear in the Lagrangian multipliers, is proposed for the coupled-cluster treatment of a quantum mechanical system in a polarizable environment. In the simplified approach, the amplitude equations are decoupled from the Lagrangian multipliers and the energy obtained from the projected coupled-cluster equation corresponds to a stationary point of the Lagrangian.
Robust MST-Based Clustering Algorithm.
Liu, Qidong; Zhang, Ruisheng; Zhao, Zhili; Wang, Zhenghai; Jiao, Mengyao; Wang, Guangjing
2018-06-01
Minimax similarity stresses the connectedness of points via mediating elements rather than favoring high mutual similarity. This grouping principle yields superior clustering results when mining arbitrarily shaped clusters in data. However, it is not robust against noise and outliers in the data. There are two main problems with the grouping principle: first, a single object that is far away from all other objects defines a separate cluster, and second, two connected clusters would be regarded as two parts of one cluster. In order to solve these problems, we propose a robust minimum spanning tree (MST)-based clustering algorithm in this letter. First, we separate the connected objects by applying a density-based coarsening phase, resulting in a low-rank matrix in which each element denotes a supernode formed by combining a set of nodes. Then a greedy method is presented to partition those supernodes by working on the low-rank matrix. Instead of removing the longest edges from the MST, our algorithm groups the data set based on minimax similarity. Finally, the assignment of all data points can be achieved through their corresponding supernodes. Experimental results on many synthetic and real-world data sets show that our algorithm consistently outperforms the compared clustering algorithms.
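The minimax distance underlying this grouping principle is the smallest achievable "largest hop" over any path connecting two points. For a small dense distance matrix it can be computed with a Floyd-Warshall-style closure; this is a minimal sketch of the distance itself, not the paper's supernode algorithm:

```python
# Minimax (path-bottleneck) distance closure over a symmetric distance matrix.
# m[i][j] ends up as the minimum over all i->j paths of the path's largest edge.

def minimax_distances(d):
    n = len(d)
    m = [row[:] for row in d]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                # routing i -> k -> j costs the larger of the two legs;
                # keep whichever routing has the smaller maximum hop
                m[i][j] = min(m[i][j], max(m[i][k], m[k][j]))
    return m
```

Two points joined by a chain of close neighbors thus get a small minimax distance even if their direct distance is large, which is exactly why a single far-away outlier (no small hops to anything) splits off as its own cluster.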
Active constrained clustering by examining spectral Eigenvectors
NASA Technical Reports Server (NTRS)
Wagstaff, Kiri L.; desJardins, Marie; Xu, Qianjun
2005-01-01
This work focuses on the active selection of pairwise constraints for spectral clustering. We develop and analyze a technique for Active Constrained Clustering by Examining Spectral eigenvectorS (ACCESS) derived from a similarity matrix.
1994-09-30
relational versus object oriented DBMS, knowledge discovery, data models, metadata, data filtering, clustering techniques, and synthetic data. A secondary...The first was the investigation of AI/ES applications (knowledge discovery, data mining, and clustering). Here CAST collaborated with Dr. Fred Petry...knowledge discovery system based on clustering techniques; implemented an on-line data browser to the DBMS; completed preliminary efforts to apply object
Multi-point Measurements of Relativistic Electrons in the Magnetosphere
NASA Astrophysics Data System (ADS)
Li, X.; Selesnick, R.; Baker, D. N.; Blake, J. B.; Schiller, Q.; Blum, L. W.; Zhao, H.; Jaynes, A. N.; Kanekal, S.
2014-12-01
We take advantage of five different DC electric field measurements in the plasma sheet available from the EFW double-probe experiment, the EDI electron drift instrument, the CODIF and HIA ion spectrometers, and the PEACE electron spectrometer on the four Cluster spacecraft. The calibrated observations of the three spectrometers are used to determine the proton and electron velocity moments. The velocity moments can be used to estimate the proton and electron drift velocity and furthermore the DC electric field, assuming that the electron and proton velocity perpendicular to the magnetic field is dominated by the ExB drift motion. Naturally, when ions and electrons do not perform a proper drift motion, which can happen in the plasma sheet, the estimated DC electric field from ion and electron motion is not correct. However, surprisingly often the DC electric fields estimated from electron and ion motions are identical, suggesting that this field is the real DC electric field around the measurement point. As the measurement techniques are so different, it is quite plausible that when two different measurements yield the same DC electric field, it is the correct field. All five measurements of the DC electric field are usually not simultaneously available, especially on Cluster 2, where CODIF and HIA are not operational, or on Cluster 4, where EDI is off. In this presentation we investigate the DC electric field in various transient plasma sheet events, such as dipolarization events and bursty bulk flows (BBFs), and how the five measurements agree or disagree. Several important issues are considered, e.g., (1) what kinds of DC electric fields exist in such events and what are their spatial scales, (2) do electrons and ions perform ExB drift motions in these events, and (3) how well have the instruments been calibrated.
Higher order correlations of IRAS galaxies
NASA Technical Reports Server (NTRS)
Meiksin, Avery; Szapudi, Istvan; Szalay, Alexander
1992-01-01
The higher order irreducible angular correlation functions are derived up to the eight-point function, for a sample of 4654 IRAS galaxies, flux-limited at 1.2 Jy in the 60 microns band. The correlations are generally found to be somewhat weaker than those for the optically selected galaxies, consistent with the visual impression of looser clusters in the IRAS sample. It is found that the N-point correlation functions can be expressed as the symmetric sum of products of N - 1 two-point functions, although the correlations above the four-point function are consistent with zero. The coefficients are consistent with the hierarchical clustering scenario as modeled by Hamilton and by Schaeffer.
Kannan, Vijay Christopher; Hodgson, Nicole; Lau, Andrew; Goodin, Kate; Dugas, Andrea Freyer; LoVecchio, Frank
2016-11-01
We seek to use a novel layered-surveillance approach to localize influenza clusters within an acute care population. The first layer of this system is a syndromic surveillance screen to guide rapid polymerase chain reaction testing. The second layer is geolocalization and cluster analysis of these patients. We posit that any identified clusters could represent at-risk populations who could serve as high-yield targets for preventive medical interventions. This was a prospective observational surveillance study. Patients were screened with a previously derived clinical decision guideline that has a 90% sensitivity and 30% specificity for influenza. Patients received points for the following signs and symptoms within the past 7 days: cough (2 points), headache (1 point), subjective fever (1 point), and documented fever at triage (temperature >38°C [100.4°F]) (1 point). Patients scoring 3 points or higher were indicated for influenza testing. Patients were tested with Xpert Flu (Cepheid, Sunnyvale, CA), a rapid polymerase chain reaction test. Positive results were mapped with ArcGIS (ESRI, Redlands, CA) and analyzed with kernel density estimation to create heat maps. There were 1,360 patients tested with Xpert Flu with retrievable addresses within the greater Phoenix metro area. One hundred sixty-seven (12%) of them tested positive for influenza A and 23 (2%) tested positive for influenza B. The influenza A virus exhibited a clear cluster pattern within this patient population. The densest cluster was located in an approximately 1-square-mile region southeast of our hospital. Our layered-surveillance approach was effective in localizing a cluster of influenza A outbreak. This region may house a high-yield target population for public health intervention. Further collaborative efforts will be made between our hospital and the Maricopa County Department of Public Health to perform a series of community vaccination events before the next influenza season. 
We hope these efforts will ultimately serve to reduce the burden of this disease on our patient population, and that this system will serve as a framework for future investigations locating at-risk populations. Copyright © 2016 American College of Emergency Physicians. Published by Elsevier Inc. All rights reserved.
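The first surveillance layer above is a simple point score: cough within the past 7 days scores 2 points; headache, subjective fever, and documented triage fever (temperature >38°C) score 1 point each; a total of 3 or more indicates PCR testing. A minimal sketch with illustrative parameter names:

```python
# Point-based influenza screening rule quoted in the abstract.
# Returns (test_indicated, score).

def influenza_screen(cough, headache, subjective_fever, triage_temp_c):
    score = (2 * bool(cough)            # cough: 2 points
             + bool(headache)           # headache: 1 point
             + bool(subjective_fever)   # subjective fever: 1 point
             + (triage_temp_c > 38.0))  # documented fever at triage: 1 point
    return score >= 3, score
```

For example, cough plus headache alone already reaches the 3-point threshold, consistent with the rule's high sensitivity (90%) at low specificity (30%).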
Mining the National Career Assessment Examination Result Using Clustering Algorithm
NASA Astrophysics Data System (ADS)
Pagudpud, M. V.; Palaoag, T. T.; Padirayon, L. M.
2018-03-01
Education is an essential process today which elicits authorities to discover and establish innovative strategies for educational improvement. This study applied data mining using clustering technique for knowledge extraction from the National Career Assessment Examination (NCAE) result in the Division of Quirino. The NCAE is an examination given to all grade 9 students in the Philippines to assess their aptitudes in the different domains. Clustering the students is helpful in identifying students’ learning considerations. With the use of the RapidMiner tool, clustering algorithms such as Density-Based Spatial Clustering of Applications with Noise (DBSCAN), k-means, k-medoid, expectation maximization clustering, and support vector clustering algorithms were analyzed. The silhouette indexes of the said clustering algorithms were compared, and the result showed that the k-means algorithm with k = 3 and silhouette index equal to 0.196 is the most appropriate clustering algorithm to group the students. Three groups were formed having 477 students in the determined group (cluster 0), 310 proficient students (cluster 1) and 396 developing students (cluster 2). The data mining technique used in this study is essential in extracting useful information from the NCAE result to better understand the abilities of students which in turn is a good basis for adopting teaching strategies.
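The study ranks clustering algorithms by silhouette index. A minimal sketch of the underlying per-sample silhouette s = (b − a) / max(a, b), averaged over all samples, for 1-D data and hard labels (illustrative only, not the RapidMiner implementation):

```python
# Mean silhouette coefficient: a = mean distance to the sample's own cluster,
# b = mean distance to the nearest other cluster.

def silhouette(points, labels):
    def dist(a, b):
        return abs(a - b)
    total = 0.0
    for i, p in enumerate(points):
        same = [q for q, l in zip(points, labels) if l == labels[i]]
        a = sum(dist(p, q) for q in same) / (len(same) - 1) if len(same) > 1 else 0.0
        b = min(
            sum(dist(p, q) for q in other) / len(other)
            for lab in set(labels) if lab != labels[i]
            for other in [[q for q, l in zip(points, labels) if l == lab]]
        )
        total += (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return total / len(points)
```

Values near 1 indicate compact, well-separated clusters; the reported best index of 0.196 for k = 3 suggests the NCAE score groups overlap considerably.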
[Utilization of Big Data in Medicine and Future Outlook].
Kinosada, Yasutomi; Uematsu, Machiko; Fujiwara, Takuya
2016-03-01
"Big data" is a new buzzword. The point is not to be dazzled by the volume of data, but rather to analyze it, and convert it into insights, innovations, and business value. There are also real differences between conventional analytics and big data. In this article, we show some results of big data analysis using open DPC (Diagnosis Procedure Combination) data in areas of the central part of JAPAN: Toyama, Ishikawa, Fukui, Nagano, Gifu, Aichi, Shizuoka, and Mie Prefectures. These 8 prefectures contain 51 medical administration areas called the second medical area. By applying big data analysis techniques such as k-means, hierarchical clustering, and self-organizing maps to DPC data, we can visualize the disease structure and detect similarities or variations among the 51 second medical areas. The combination of a big data analysis technique and open DPC data is a very powerful method to depict real figures on patient distribution in Japan.
Mapping forest vegetation with ERTS-1 MSS data and automatic data processing techniques
NASA Technical Reports Server (NTRS)
Messmore, J.; Copeland, G. E.; Levy, G. F.
1975-01-01
This study was undertaken with the intent of elucidating the forest mapping capabilities of ERTS-1 MSS data when analyzed with the aid of LARS' automatic data processing techniques. The site for this investigation was the Great Dismal Swamp, a 210,000 acre wilderness area located on the Middle Atlantic coastal plain. Due to inadequate ground truth information on the distribution of vegetation within the swamp, an unsupervised classification scheme was utilized. Initially pictureprints, resembling low resolution photographs, were generated in each of the four ERTS-1 channels. Data found within rectangular training fields was then clustered into 13 spectral groups and defined statistically. Using a maximum likelihood classification scheme, the unknown data points were subsequently classified into one of the designated training classes. Training field data was classified with a high degree of accuracy (greater than 95%), and progress is being made towards identifying the mapped spectral classes.
Japanese migration in contemporary Japan: economic segmentation and interprefectural migration.
Fukurai, H
1991-01-01
This paper examines the economic segmentation model in explaining 1985-86 Japanese interregional migration. The analysis takes advantage of statistical graphic techniques to illustrate the following substantive issues of interregional migration: (1) to examine whether economic segmentation significantly influences Japanese regional migration and (2) to explain socioeconomic characteristics of prefectures for both in- and out-migration. Analytic techniques include a latent structural equation (LISREL) methodology and statistical residual mapping. The residual dispersion patterns, for instance, suggest the extent to which socioeconomic and geopolitical variables explain migration differences by showing unique clusters of unexplained residuals. The analysis further points out that extraneous factors such as high residential land values, significant commuting populations, and regional-specific cultures and traditions need to be incorporated in the economic segmentation model in order to assess the extent of the model's reliability in explaining the pattern of interprefectural migration.
Towards semi-automatic rock mass discontinuity orientation and set analysis from 3D point clouds
NASA Astrophysics Data System (ADS)
Guo, Jiateng; Liu, Shanjun; Zhang, Peina; Wu, Lixin; Zhou, Wenhui; Yu, Yinan
2017-06-01
Obtaining accurate information on rock mass discontinuities for deformation analysis and the evaluation of rock mass stability is important. Obtaining measurements for high and steep zones with the traditional compass method is difficult. Photogrammetry, three-dimensional (3D) laser scanning and other remote sensing methods have gradually become mainstream methods. In this study, a method that is based on a 3D point cloud is proposed to semi-automatically extract rock mass structural plane information. The original data are pre-treated prior to segmentation by removing outlier points. The next step is to segment the point cloud into different point subsets. Various parameters, such as the normal, dip/direction and dip, can be calculated for each point subset after obtaining the equation of the best fit plane for the relevant point subset. A cluster analysis (a point subset that satisfies some conditions and thus forms a cluster) is performed based on the normal vectors by introducing the firefly algorithm (FA) and the fuzzy c-means (FCM) algorithm. Finally, clusters that belong to the same discontinuity sets are merged and coloured for visualization purposes. A prototype system is developed based on this method to extract the points of the rock discontinuity from a 3D point cloud. A comparison with existing software shows that this method is feasible. This method can provide a reference for rock mechanics, 3D geological modelling and other related fields.
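The clustering stage above groups per-subset normal vectors with fuzzy c-means; the sketch below shows a plain FCM loop seeded at random rather than by the firefly algorithm, whose details the abstract does not give. Names and the random seeding are illustrative assumptions:

```python
import random

def fcm(points, c, m=2.0, iters=50, seed=0):
    """Fuzzy c-means on a list of feature vectors; returns (centers, memberships)."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # random row-stochastic membership matrix: u[i][k] = degree of point i in cluster k
    u = [[rng.random() for _ in range(c)] for _ in range(n)]
    u = [[v / sum(row) for v in row] for row in u]
    centers = []
    for _ in range(iters):
        # centers: membership-weighted means
        centers = []
        for k in range(c):
            w = [u[i][k] ** m for i in range(n)]
            centers.append([sum(wi * p[d] for wi, p in zip(w, points)) / sum(w)
                            for d in range(dim)])
        # memberships: inverse-distance update
        for i, p in enumerate(points):
            d2 = [max(sum((a - b) ** 2 for a, b in zip(p, ctr)), 1e-12)
                  for ctr in centers]
            for k in range(c):
                u[i][k] = 1.0 / sum((d2[k] / d2[j]) ** (1.0 / (m - 1))
                                    for j in range(c))
    return centers, u
```

Hardening the memberships (taking each point's highest-degree cluster) then gives the discontinuity-set labels that are merged and colored for visualization.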
Method of identifying clusters representing statistical dependencies in multivariate data
NASA Technical Reports Server (NTRS)
Borucki, W. J.; Card, D. H.; Lyle, G. C.
1975-01-01
Approach is first to cluster and then to compute spatial boundaries for resulting clusters. Next step is to compute, from set of Monte Carlo samples obtained from scrambled data, estimates of probabilities of obtaining at least as many points within boundaries as were actually observed in original data.
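The significance step outlined above can be sketched as a permutation test: scramble the data, re-count how many scrambled points fall inside the cluster boundary, and estimate the probability of seeing at least the observed count by chance. The coordinate-shuffling scramble and the boundary predicate are illustrative assumptions:

```python
import random

def cluster_p_value(points, inside, observed, n_trials=1000, seed=1):
    """points: list of (x, y); inside: boundary predicate; observed: real count."""
    rng = random.Random(seed)
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    hits = 0
    for _ in range(n_trials):
        rng.shuffle(ys)  # scramble: destroy any x-y dependence
        count = sum(1 for x, y in zip(xs, ys) if inside(x, y))
        if count >= observed:
            hits += 1
    return hits / n_trials  # Monte Carlo estimate of the exceedance probability
```

A small returned value means the observed concentration of points inside the boundary is unlikely under scrambled (independent) data, i.e., the cluster reflects a real statistical dependency.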
Within-Cluster and Across-Cluster Matching with Observational Multilevel Data
ERIC Educational Resources Information Center
Kim, Jee-Seon; Steiner, Peter M.; Hall, Courtney; Thoemmes, Felix
2013-01-01
When randomized experiments cannot be conducted in practice, propensity score (PS) techniques for matching treated and control units are frequently used for estimating causal treatment effects from observational data. Despite the popularity of PS techniques, they are not yet well studied for matching multilevel data where selection into treatment…
NASA Technical Reports Server (NTRS)
Carvalho, L. M. V.; Rickenbach, T.
1999-01-01
Satellite infrared (IR) and visible (VIS) images from the Tropical Ocean Global Atmosphere - Coupled Ocean Atmosphere Response Experiment (TOGA-COARE) experiment are investigated through the use of Clustering Analysis. The clusters are obtained from the values of IR and VIS counts and the local variance for both channels. The clustering procedure is based on the standardized histogram of each variable obtained from 179 pairs of images. A new approach to classify high clouds using only IR and the clustering technique is proposed. This method allows the separation of the enhanced convection in two main classes: convective tops, more closely related to the most active core of the storm, and convective systems, which produce regions of merged, thick anvil clouds. The resulting classification of different portions of cloudiness is compared to the radar reflectivity field for intensive events. Convective Systems and Convective Tops are followed during their life cycle using the IR clustering method. The areal coverage of precipitation and features related to convective and stratiform rain is obtained from the radar for each stage of the evolving Mesoscale Convective Systems (MCS). In order to compare the IR clustering method with a simple threshold technique, two IR thresholds (Tir) were used to identify different portions of cloudiness, Tir=240K which roughly defines the extent of all cloudiness associated with the MCS, and Tir=220K which indicates the presence of deep convection. It is shown that the IR clustering technique can be used as a simple alternative to identify the actual portion of convective and stratiform rainfall.
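The simple threshold technique used for comparison reduces to two cutoffs on the IR brightness temperature; a minimal sketch (temperatures in kelvin, class names illustrative):

```python
# Two-threshold IR classification: colder cloud tops are higher and more
# convective. Tir <= 220 K marks deep convection; Tir <= 240 K marks the
# extent of all MCS-associated cloudiness.

def classify_ir(tir_kelvin):
    if tir_kelvin <= 220.0:
        return "deep convection"
    if tir_kelvin <= 240.0:
        return "MCS cloudiness"
    return "clear / low cloud"
```

The clustering method described above refines this by adding VIS counts and local variance, splitting the cold-cloud class into convective tops versus thick anvil.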
NASA Technical Reports Server (NTRS)
Eigen, D. J.; Fromm, F. R.; Northouse, R. A.
1974-01-01
A new clustering algorithm is presented that is based on dimensional information. The algorithm includes an inherent feature selection criterion, which is discussed. Further, a heuristic method for choosing the proper number of intervals for a frequency distribution histogram, a feature necessary for the algorithm, is presented. The algorithm, although usable as a stand-alone clustering technique, is then utilized as a global approximator. Local clustering techniques and configuration of a global-local scheme are discussed, and finally the complete global-local and feature selector configuration is shown in application to a real-time adaptive classification scheme for the analysis of remote sensed multispectral scanner data.
THE ENTIRE VIRIAL RADIUS OF THE FOSSIL CLUSTER RX J1159+5531. I. GAS PROPERTIES
DOE Office of Scientific and Technical Information (OSTI.GOV)
Su, Yuanyuan; Buote, David; Gastaldello, Fabio
2015-06-01
Previous analysis of the fossil-group/cluster RX J1159+5531 with X-ray observations from a central Chandra pointing and an offset-north Suzaku pointing indicates a radial intracluster medium (ICM) entropy profile at the virial radius (R_vir) consistent with predictions from gravity-only cosmological simulations, in contrast to other cool-core clusters. To examine the generality of these results, we present three new Suzaku observations that, in conjunction with the north pointing, provide complete azimuthal coverage out to R_vir. With two new Chandra ACIS-I observations overlapping the north Suzaku pointing, we have resolved ≳50% of the cosmic X-ray background there. We present radial profiles of the ICM density, temperature, entropy, and pressure obtained for each of the four directions. We measure only modest azimuthal scatter in the ICM properties at R_200 between the Suzaku pointings: 7.6% in temperature and 8.6% in density, while the systematic errors can be significant. The temperature scatter, in particular, is lower than that measured at R_200 for a small number of other clusters observed with Suzaku. These azimuthal measurements verify that RX J1159+5531 is a regular, highly relaxed system. The well-behaved entropy profiles we have measured for RX J1159+5531 disfavor the weakening of the accretion shock as an explanation of the entropy flattening found in other cool-core clusters but are consistent with other explanations such as gas clumping, electron-ion non-equilibrium, non-thermal pressure support, and cosmic-ray acceleration. Finally, we note that the large-scale galaxy density distribution of RX J1159+5531 seems to have little impact on its gas properties near R_vir.
a Super Voxel-Based Riemannian Graph for Multi Scale Segmentation of LIDAR Point Clouds
NASA Astrophysics Data System (ADS)
Li, Minglei
2018-04-01
Automatically segmenting LiDAR points into respective independent partitions has become a topic of great importance in photogrammetry, remote sensing and computer vision. In this paper, we cast the problem of point cloud segmentation as a graph optimization problem by constructing a Riemannian graph. The scale space of the observed scene is explored by an octree-based over-segmentation with different depths. The over-segmentation produces many super voxels which respect the structure of the scene and are used as nodes of the graph. The Kruskal coordinates are used to compute edge weights that are proportional to the geodesic distance between nodes. Then we compute the edge-weight matrix in which the elements reflect the sectional curvatures associated with the geodesic paths between super voxel nodes on the scene surface. The final segmentation results are generated by clustering similar super voxels and cutting off the weak edges in the graph. The performance of this method was evaluated on LiDAR point clouds for both indoor and outdoor scenes. Additionally, extensive comparisons show that our algorithm outperforms state-of-the-art techniques on many metrics.
Infrared Multiple Photon Dissociation Spectroscopy Of Metal Cluster-Adducts
NASA Astrophysics Data System (ADS)
Cox, D. M.; Kaldor, A.; Zakin, M. R.
1987-01-01
Recent development of the laser vaporization technique combined with mass-selective detection has made possible new studies of the fundamental chemical and physical properties of unsupported transition metal clusters as a function of the number of constituent atoms. A variety of experimental techniques have been developed in our laboratory to measure ionization threshold energies, magnetic moments, and gas phase reactivity of clusters. However, studies have so far been unable to determine the cluster structure or the chemical state of chemisorbed species on gas phase clusters. The application of infrared multiple photon dissociation (IRMPD) to obtain the IR absorption properties of metal cluster-adsorbate species in a molecular beam is described here. Specifically, using a high power, pulsed CO2 laser as the infrared source, the IRMPD spectrum for methanol chemisorbed on small iron clusters is measured as a function of the number of both iron atoms and methanol molecules in the complex for different methanol isotopes. Both the feasibility and potential utility of IRMPD for characterizing metal cluster-adsorbate interactions are demonstrated. The method is generally applicable to any cluster or cluster-adsorbate system, dependent only upon the availability of appropriate high power infrared sources.
NASA Technical Reports Server (NTRS)
Chapman, G. M. (Principal Investigator); Carnes, J. G.
1981-01-01
Several techniques which use clusters generated by a new clustering algorithm, CLASSY, are proposed as alternatives to random sampling to obtain greater precision in crop proportion estimation: (1) the Proportional Allocation/Relative Count Estimator (PA/RCE) uses proportional allocation of dots to clusters on the basis of cluster size and a relative-count cluster-level estimate; (2) the Proportional Allocation/Bayes Estimator (PA/BE) uses proportional allocation of dots to clusters and a Bayesian cluster-level estimate; and (3) the Bayes Sequential Allocation/Bayesian Estimator (BSA/BE) uses sequential allocation of dots to clusters and a Bayesian cluster-level estimate. Clustering is an effective method for making proportion estimates. It is estimated that, to obtain the same precision with random sampling as obtained by the proportional sampling of 50 dots with an unbiased estimator, samples of 85 or 166 would need to be taken if dot sets with AI labels (integrated procedure) or ground truth labels, respectively, were input. Dot reallocation provides dot sets that are unbiased. It is recommended that these proportion estimation techniques be maintained, particularly the PA/BE, because it provides the greatest precision.
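The proportional allocation step shared by the first two estimators can be sketched as follows (the cluster sizes and budget are hypothetical numbers, not from the study): each cluster receives a share of the dot-labeling budget proportional to its size, with largest-remainder rounding so the counts sum exactly to the budget.

```python
# Sketch of proportional allocation of labeling dots to clusters by size.

def proportional_allocation(cluster_sizes, budget):
    total = sum(cluster_sizes)
    raw = [budget * s / total for s in cluster_sizes]
    alloc = [int(r) for r in raw]          # floor of each ideal share
    # hand leftover dots to the largest fractional remainders
    order = sorted(range(len(raw)), key=lambda i: raw[i] - alloc[i], reverse=True)
    for i in order[: budget - sum(alloc)]:
        alloc[i] += 1
    return alloc

# Three clusters covering 500, 300, and 200 pixels; 50 dots to allocate:
alloc = proportional_allocation([500, 300, 200], budget=50)
```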
Effective structural descriptors for natural and engineered radioactive waste confinement barriers
NASA Astrophysics Data System (ADS)
Lemmens, Laurent; Rogiers, Bart; De Craen, Mieke; Laloy, Eric; Jacques, Diederik; Huysmans, Marijke; Swennen, Rudy; Urai, Janos L.; Desbois, Guillaume
2017-04-01
The microstructure of a radioactive waste confinement barrier strongly influences its flow and transport properties. Numerical flow and transport simulations for these porous media at the pore scale therefore require input data that describe the microstructure as accurately as possible. To date, no imaging method can resolve all heterogeneities within important radioactive waste confinement barrier materials such as hardened cement paste and natural clays at the micro scale (nm-cm). Therefore, it is necessary to merge information from different 2D and 3D imaging methods using porous media reconstruction techniques. To qualitatively compare the results of different reconstruction techniques, visual inspection might suffice. To quantitatively compare training-image based algorithms, Tan et al. (2014) proposed an algorithm using an analysis of distance. However, the ranking of the algorithms depends on the choice of the structural descriptor, in their case multiple-point or cluster-based histograms. We present here preliminary work in which we review different structural descriptors and test their effectiveness for capturing the main structural characteristics of radioactive waste confinement barrier materials, in order to determine the descriptors to use in the analysis of distance. The investigated descriptors are particle size distributions, surface area distributions, two-point probability functions, multiple-point histograms, linear functions, and two-point cluster functions. The descriptor testing consists of stochastically generating realizations from a reference image using the simulated annealing optimization procedure introduced by Karsanina et al. (2015). This procedure basically minimizes the differences between pre-specified descriptor values associated with the training image and the image being produced. The most efficient descriptor set can therefore be identified by comparing the image generation quality among the tested descriptor combinations.
The assessment of the quality of the simulations will be made by combining all considered descriptors. Once the set of the most efficient descriptors is determined, they can be used in the analysis of distance, to rank different reconstruction algorithms in a more objective way in future work. Karsanina MV, Gerke KM, Skvortsova EB, Mallants D (2015) Universal Spatial Correlation Functions for Describing and Reconstructing Soil Microstructure. PLoS ONE 10(5): e0126515. doi:10.1371/journal.pone.0126515 Tan, Xiaojin, Pejman Tahmasebi, and Jef Caers. "Comparing training-image based algorithms using an analysis of distance." Mathematical Geosciences 46.2 (2014): 149-169.
Study of hot flow anomalies using Cluster multi-spacecraft measurements
NASA Astrophysics Data System (ADS)
Facskó, G.; Trotignon, J. G.; Dandouras, I.; Lucek, E. A.; Daly, P. W.
2010-02-01
Hot flow anomalies (HFAs) were first discovered in the early 1980s at the bow shock of the Earth. In the 1990s these features were studied, observed, and simulated very intensively, and many new missions (Cluster, THEMIS, Cassini, and Venus Express) have focused attention on this phenomenon again. Many basic features and the HFA formation mechanism were clarified observationally and using hybrid simulation techniques. In this paper we review previous observational, theoretical, and simulation results in the field of HFAs and summarize observations performed at Earth, Mars, Venus, and Saturn by different space missions to give the reader an overview. Cluster multi-spacecraft measurements have provided more observed HFA events and finer, more sophisticated methods to understand them better. In this study, HFAs were studied using observations of the Cluster magnetometer and the Cluster plasma detector aboard the four Cluster spacecraft. Energetic particle measurements (28.2-68.9 keV) were also used to detect and select HFAs. We studied several specific features of tangential discontinuities generating HFAs on the basis of Cluster measurements in the periods February-April 2003, December 2005-April 2006, and January-April 2007, when the separation of the spacecraft was large and the Cluster fleet reached the bow shock. We have confirmed the condition for forming HFAs, namely that the solar wind speed is higher than average. This condition was also confirmed by simultaneous ACE magnetic field and solar wind plasma observations at the L1 point, 1.4 million km upstream of the Earth. The measured and calculated features of HFA events were compared with the results of different previous hybrid simulations. During the whole spring season of 2003, the solar wind speed was higher than average. Here we checked whether the higher solar wind speed is a real condition of HFA formation also in 2006 and 2007. Finally, we give an outlook and suggest several desirable directions for further research on HFAs using measurements from Cluster, THEMIS, the upcoming Cross Scale, and other space missions.
Machine-learned cluster identification in high-dimensional data.
Ultsch, Alfred; Lötsch, Jörn
2017-02-01
High-dimensional biomedical data are frequently clustered to identify subgroup structures pointing at distinct disease subtypes. It is crucial that the cluster algorithm used works correctly. However, by imposing a predefined shape on the clusters, classical algorithms occasionally suggest a cluster structure in homogeneously distributed data or assign data points to incorrect clusters. We analyzed whether this can be avoided by using emergent self-organizing feature maps (ESOM). Data sets with different degrees of complexity were submitted to ESOM analysis with large numbers of neurons, using an interactive R-based bioinformatics tool. On top of the trained ESOM, the distance structure in the high-dimensional feature space was visualized in the form of a so-called U-matrix. Clustering results were compared with those provided by classical common cluster algorithms including single linkage, Ward, and k-means. Ward clustering imposed cluster structures on cluster-less "golf ball", "cuboid" and "S-shaped" data sets that contained no structure at all (random data). Ward clustering also imposed structures on permuted real-world data sets. By contrast, the ESOM/U-matrix approach correctly found that these data contain no cluster structure. Moreover, ESOM/U-matrix was correct in identifying clusters in biomedical data truly containing subgroups. It was always correct in cluster structure identification in further canonical artificial data. Using intentionally simple data sets, it is shown that popular clustering algorithms typically used for biomedical data sets may fail to cluster data correctly, suggesting that they are also likely to perform erroneously on high-dimensional biomedical data. The present analyses emphasized that generally established classical hierarchical clustering algorithms carry a considerable tendency to produce erroneous results.
By contrast, unsupervised machine-learned analysis of cluster structures, applied using the ESOM/U-matrix method, is a viable, unbiased method to identify true clusters in the high-dimensional space of complex data. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.
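The failure mode discussed in this abstract is easy to reproduce: a partitional algorithm such as k-means always returns exactly k groups, even on homogeneously distributed data containing no true clusters, so the partition by itself is no evidence of real subgroup structure. The toy k-means below is a simplified stand-in for the classical algorithms the study evaluated.

```python
import random

# k-means on uniformly distributed 2-D points: the algorithm still
# partitions the data into k "clusters" although no structure exists.

def kmeans(points, k, iters=50, seed=0):
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    groups = [[] for _ in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda c: (p[0] - centers[c][0]) ** 2
                                + (p[1] - centers[c][1]) ** 2)
            groups[j].append(p)
        centers = [
            (sum(p[0] for p in g) / len(g), sum(p[1] for p in g) / len(g))
            if g else centers[i]
            for i, g in enumerate(groups)
        ]
    return groups

rng = random.Random(42)
uniform = [(rng.random(), rng.random()) for _ in range(300)]  # cluster-less data
groups = kmeans(uniform, k=3)
# Three "clusters" are reported despite the data containing no structure,
# which is why a separate validation step (such as the U-matrix) is needed.
```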
Kim, Sang-Hee; Byun, Youngsoon
Symptom clusters must be identified in patients with high-grade brain cancers for effective symptom management during cancer-related therapy. The aims of this study were to identify symptom clusters in patients with high-grade brain cancers and to determine the relationship of each cluster with the performance status and quality of life (QOL) during concurrent chemoradiotherapy (CCRT). Symptoms were assessed using the Memorial Symptom Assessment Scale, and the performance status was evaluated using the Karnofsky Performance Scale. Quality of life was assessed using the Functional Assessment of Cancer Therapy-General. This prospective longitudinal survey was conducted before CCRT and at 2 to 3 weeks and 4 to 6 weeks after the initiation of CCRT. A total of 51 patients with newly diagnosed primary malignant brain cancer were included. Six symptom clusters were identified, and 2 symptom clusters were present at each time point (i.e., "negative emotion" and "neurocognitive" clusters before CCRT, "negative emotion and decreased vitality" and "gastrointestinal and decreased sensory" clusters at 2-3 weeks, and "body image and decreased vitality" and "gastrointestinal" clusters at 4-6 weeks). The symptom clusters at each time point demonstrated a significant relationship with the performance status or QOL. Differences were observed in symptom clusters in patients with high-grade brain cancers during CCRT. In addition, the symptom clusters were correlated with the performance status and QOL of patients, and these effects could change during CCRT. The results of this study will provide suggestions for interventions to treat or prevent symptom clusters in patients with high-grade brain cancer during CCRT.
The Measurement of Sulfur Oxidation Products and Their Role in Homogeneous Nucleation
NASA Technical Reports Server (NTRS)
Eisele, F. L.
1999-01-01
An improved version of a transverse ion source was developed which uses selected-ion chemical ionization mass spectrometry techniques inside of a particle nucleation flow tube. These techniques are unique in that the chemical ionization is done inside the flow tube, rather than requiring removal of the compounds and clusters of interest, which are lost on first contact with any surfaces. The transverse source is also unique because it allows the ion reaction time to be varied over more than an order of magnitude, which in turn makes possible the separation of ion-induced cluster growth from the charging of preexisting molecular clusters. As a result of combining these unique capabilities, the first-ever measurements of prenucleation molecular clusters were performed. These clusters are the intermediate stage of growth in the gas-to-particle conversion process. This new technique provides a means of observing clusters containing 2, 3, 4, ... and up to about 8 sulfuric acid molecules, where the critical cluster size under these measurement conditions was about 4 or 5. Thus, the nucleation process can now be directly observed, and even growth beyond the critical cluster size can be investigated. The details of this investigation are discussed in a recently submitted paper, which is included as Appendix A. Measurements of the diffusion coefficient of sulfuric acid and of sulfuric acid clustered with a water molecule have also been performed. These measurements are discussed in more detail in another recently submitted paper, which is included as Appendix B. The empirical results discussed in both of these papers provide a critical test of present nucleation theories. They also provide new hope for resolving many of the huge discrepancies between field observations and model predictions of particle nucleation.
The second part of the research conducted under this project was directed towards the development of new chemical ionization techniques for measuring sulfur oxidation products.
HORN-6 special-purpose clustered computing system for electroholography.
Ichihashi, Yasuyuki; Nakayama, Hirotaka; Ito, Tomoyoshi; Masuda, Nobuyuki; Shimobaba, Tomoyoshi; Shiraki, Atsushi; Sugie, Takashige
2009-08-03
We developed the HORN-6 special-purpose computer for holography. We designed and constructed the HORN-6 board to handle an object image composed of one million points and constructed a cluster system composed of 16 HORN-6 boards. Using this HORN-6 cluster system, we succeeded in creating a computer-generated hologram of a three-dimensional image composed of 1,000,000 points at a rate of 1 frame per second, and a computer-generated hologram of an image composed of 100,000 points at a rate of 10 frames per second, which is near video rate, when the size of a computer-generated hologram is 1,920 x 1,080. The calculation speed is approximately 4,600 times faster than that of a personal computer with an Intel 3.4-GHz Pentium 4 CPU.
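The computation the HORN-6 boards accelerate is, at its core, a point-source hologram sum: every hologram pixel accumulates a cosine term from every object point, so the cost scales as (object points) x (hologram pixels), which is exactly why dedicated hardware is needed at 1,920 x 1,080. The sketch below shows the basic sum under illustrative geometry and wavelength values (not the HORN-6 implementation, which uses fixed-point recurrence formulas in hardware).

```python
import math

# Naive point-source computer-generated-hologram (CGH) sum.
# Each pixel value is the superposition of cos(k * r) contributions
# from all object points, with r the point-to-pixel distance.

def cgh(points, width, height, pitch=8e-6, wavelength=633e-9):
    k = 2 * math.pi / wavelength
    holo = [[0.0] * width for _ in range(height)]
    for yj in range(height):
        for xj in range(width):
            for (x, y, z, amp) in points:       # object points (x, y, z, amplitude)
                dx = xj * pitch - x
                dy = yj * pitch - y
                r = math.sqrt(dx * dx + dy * dy + z * z)
                holo[yj][xj] += amp * math.cos(k * r)
    return holo

# One object point, a tiny 8 x 8 hologram for illustration:
holo = cgh([(0.0, 0.0, 0.1, 1.0)], width=8, height=8)
```

Scaling this inner triple loop to one million points and two million pixels is what motivates the ~4,600x speedup reported for the clustered HORN-6 system.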
Testing prediction methods: Earthquake clustering versus the Poisson model
Michael, A.J.
1997-01-01
Testing earthquake prediction methods requires statistical techniques that compare observed success to random chance. One technique is to produce simulated earthquake catalogs and measure the relative success of predicting real and simulated earthquakes. The accuracy of these tests depends on the validity of the statistical model used to simulate the earthquakes. This study tests the effect of clustering in the statistical earthquake model on the results. Three simulation models were used to produce significance levels for a VLF earthquake prediction method. As the degree of simulated clustering increases, the statistical significance drops. Hence, the use of a seismicity model with insufficient clustering can lead to overly optimistic results. A successful method must pass the statistical tests with a model that fully replicates the observed clustering. However, a method can be rejected based on tests with a model that contains insufficient clustering. U.S. copyright. Published in 1997 by the American Geophysical Union.
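The simulation test described above can be sketched as follows: the significance of a prediction method is the fraction of synthetic catalogs that score at least as well as the real one. The synthetic catalogs below are Poisson in time; per the abstract's point, a realistic test would replace `poisson_catalog` with a simulator that reproduces the observed clustering. All numbers are illustrative.

```python
import random

# Monte Carlo significance test of an earthquake prediction method.

def poisson_catalog(rate, duration, rng):
    """Event times from a homogeneous Poisson process (i.i.d. exponential gaps)."""
    t, times = 0.0, []
    while True:
        t += rng.expovariate(rate)
        if t > duration:
            return times
        times.append(t)

def score(catalog, alarm_windows):
    """Number of events falling inside any alarm window."""
    return sum(any(a <= t <= b for a, b in alarm_windows) for t in catalog)

rng = random.Random(1)
alarms = [(10, 20), (40, 50)]                        # prediction alarm intervals
observed = score([12, 15, 44, 70], alarms)           # real-catalog score
sims = [score(poisson_catalog(0.04, 100, rng), alarms) for _ in range(1000)]
p_value = sum(s >= observed for s in sims) / len(sims)
```

A small p-value suggests the method beats chance under this catalog model; a clustered simulator would generally yield a larger, more honest p-value, which is the paper's point.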
Dynamic multifactor clustering of financial networks
NASA Astrophysics Data System (ADS)
Ross, Gordon J.
2014-02-01
We investigate the tendency for financial instruments to form clusters when there are multiple factors influencing the correlation structure. Specifically, we consider a stock portfolio which contains companies from different industrial sectors, located in several different countries. Both sector membership and geography combine to create a complex clustering structure where companies seem to first be divided based on sector, with geographical subclusters emerging within each industrial sector. We argue that standard techniques for detecting overlapping clusters and communities are not able to capture this type of structure and show how robust regression techniques can instead be used to remove the influence of both sector and geography from the correlation matrix separately. Our analysis reveals that prior to the 2008 financial crisis, companies did not tend to form clusters based on geography. This changed immediately following the crisis, with geography becoming a more important determinant of clustering structure.
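The idea of removing one factor's influence before clustering can be illustrated with a toy example: regressing each stock's returns on sector dummies and keeping the residuals is equivalent to subtracting the sector-mean return at each time step. The paper uses robust regression; ordinary group demeaning is shown here only as the simplest stand-in, and the tickers and returns are made up.

```python
# Remove the sector effect from a panel of returns by per-sector demeaning,
# leaving residuals whose correlations reflect the remaining (e.g. geographic)
# clustering structure.

def remove_sector_effect(returns, sectors):
    """returns: {stock: [r_t, ...]}; sectors: {stock: sector_name}."""
    stocks = list(returns)
    T = len(returns[stocks[0]])
    residuals = {s: [] for s in stocks}
    for t in range(T):
        by_sector = {}
        for s in stocks:
            by_sector.setdefault(sectors[s], []).append(returns[s][t])
        means = {sec: sum(v) / len(v) for sec, v in by_sector.items()}
        for s in stocks:
            residuals[s].append(returns[s][t] - means[sectors[s]])
    return residuals

rets = {"A": [1.0, 2.0], "B": [3.0, 4.0], "C": [0.0, 0.0]}
secs = {"A": "tech", "B": "tech", "C": "energy"}
res = remove_sector_effect(rets, secs)
```

After demeaning, A and B no longer co-move through their shared sector mean, so any residual correlation between them would have to come from another factor, such as geography.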
Correcting for deformation in skin-based marker systems.
Alexander, E J; Andriacchi, T P
2001-03-01
A new technique is described that reduces error due to skin movement artifact in the opto-electronic measurement of in vivo skeletal motion. This work builds on a previously described point cluster technique marker set and estimation algorithm by extending the transformation equations to the general deformation case using a set of activity-dependent deformation models. Skin deformation during activities of daily living is modeled as consisting of a functional form defined over the observation interval (the deformation model) plus additive noise (modeling error). The method is described as an interval deformation technique. The method was tested using simulation trials with systematic and random components of deformation error introduced into marker position vectors. The technique was found to substantially outperform methods that require rigid-body assumptions. The method was tested in vivo on a patient fitted with an external fixation device (Ilizarov). Simultaneous measurements from markers placed on the Ilizarov device (fixed to bone) were compared to measurements derived from skin-based markers. The interval deformation technique reduced the errors in limb segment pose estimate by 33 and 25% compared to the classic rigid-body technique for position and orientation, respectively. This newly developed method has demonstrated that, by accounting for the changing shape of the limb segment, a substantial improvement in the estimates of in vivo skeletal movement can be achieved.
Parrish, Robert M; Burns, Lori A; Smith, Daniel G A; Simmonett, Andrew C; DePrince, A Eugene; Hohenstein, Edward G; Bozkaya, Uğur; Sokolov, Alexander Yu; Di Remigio, Roberto; Richard, Ryan M; Gonthier, Jérôme F; James, Andrew M; McAlexander, Harley R; Kumar, Ashutosh; Saitow, Masaaki; Wang, Xiao; Pritchard, Benjamin P; Verma, Prakash; Schaefer, Henry F; Patkowski, Konrad; King, Rollin A; Valeev, Edward F; Evangelista, Francesco A; Turney, Justin M; Crawford, T Daniel; Sherrill, C David
2017-07-11
Psi4 is an ab initio electronic structure program providing methods such as Hartree-Fock, density functional theory, configuration interaction, and coupled-cluster theory. The 1.1 release represents a major update meant to automate complex tasks, such as geometry optimization using complete-basis-set extrapolation or focal-point methods. Conversion of the top-level code to a Python module means that Psi4 can now be used in complex workflows alongside other Python tools. Several new features have been added with the aid of libraries providing easy access to techniques such as density fitting, Cholesky decomposition, and Laplace denominators. The build system has been completely rewritten to simplify interoperability with independent, reusable software components for quantum chemistry. Finally, a wide range of new theoretical methods and analyses have been added to the code base, including functional-group and open-shell symmetry adapted perturbation theory, density-fitted coupled cluster with frozen natural orbitals, orbital-optimized perturbation and coupled-cluster methods (e.g., OO-MP2 and OO-LCCD), density-fitted multiconfigurational self-consistent field, density cumulant functional theory, algebraic-diagrammatic construction excited states, improvements to the geometry optimizer, and the "X2C" approach to relativistic corrections, among many other improvements.
Vision based obstacle detection and grouping for helicopter guidance
NASA Technical Reports Server (NTRS)
Sridhar, Banavar; Chatterji, Gano
1993-01-01
Electro-optical sensors can be used to compute range to objects in the flight path of a helicopter. The computation is based on the optical flow/motion at different points in the image. The motion algorithms provide a sparse set of ranges to discrete features in the image sequence as a function of azimuth and elevation. For obstacle avoidance guidance and display purposes, this discrete set of ranges, numbering from a few hundred to several thousand, needs to be grouped into sets which correspond to objects in the real world. This paper presents a new method for object segmentation based on clustering the sparse range information provided by motion algorithms together with the spatial relations provided by the static image. The range values are initially grouped into clusters based on depth. Subsequently, the clusters are modified by using the K-means algorithm in the inertial horizontal plane and minimum spanning tree algorithms in the image plane. The object grouping allows interpolation within a group and enables the creation of dense range maps. Researchers in robotics have used densely scanned sequences of laser range images to build three-dimensional representations of the outside world. Thus, modeling techniques developed for dense range images can be extended to sparse range images. The paper presents object segmentation results for a sequence of flight images.
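The initial depth grouping described above can be sketched simply: sorted range values are split wherever the gap between consecutive depths exceeds a threshold, giving seed clusters that the K-means and minimum-spanning-tree steps would then refine. The range values and gap threshold below are illustrative.

```python
# Group sparse range measurements into depth clusters by splitting at
# large gaps between consecutive sorted values.

def group_by_depth(ranges, gap):
    ordered = sorted(ranges)
    clusters, current = [], [ordered[0]]
    for r in ordered[1:]:
        if r - current[-1] > gap:      # large jump in depth -> new object
            clusters.append(current)
            current = [r]
        else:
            current.append(r)
    clusters.append(current)
    return clusters

# Ranges (in meters) to features on three objects at different depths:
clusters = group_by_depth([102, 100, 105, 500, 510, 2000], gap=50)
# -> [[100, 102, 105], [500, 510], [2000]]
```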
Clustering Categorical Data Using Community Detection Techniques
2017-01-01
With the advent of the k-modes algorithm, the toolbox for clustering categorical data has an efficient tool that scales linearly in the number of data items. However, random initialization of cluster centers in k-modes makes it hard to reach a good clustering without resorting to many trials. Recently proposed methods for better initialization are deterministic and reduce the clustering cost considerably. The various initialization methods differ in how the heuristic chooses the set of initial centers. In this paper, we address the clustering problem for categorical data from the perspective of community detection. Instead of initializing k modes and running several iterations, our scheme, CD-Clustering, builds an unweighted graph and detects highly cohesive groups of nodes using a fast community detection technique. The top k communities by size define the k modes. Evaluation on ten real categorical datasets shows that our method outperforms the existing initialization methods for k-modes in terms of accuracy, precision, and recall in most of the cases. PMID:29430249
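The scheme can be sketched under simplifying assumptions: connect two data items when they share enough attribute values, find cohesive groups (plain connected components stand in here for the fast community detection step the paper uses), and take the column-wise mode of each of the top-k groups as the k-modes initial centers. The data, the sharing threshold, and the function name are all hypothetical.

```python
from itertools import combinations

# Community-based initialization of k-modes centers (simplified sketch).

def initial_modes(data, k, min_shared=2):
    n = len(data)
    adj = {i: set() for i in range(n)}
    for i, j in combinations(range(n), 2):
        if sum(a == b for a, b in zip(data[i], data[j])) >= min_shared:
            adj[i].add(j)
            adj[j].add(i)
    # connected components via BFS (stand-in for community detection)
    seen, comps = set(), []
    for s in range(n):
        if s in seen:
            continue
        comp, queue = [], [s]
        seen.add(s)
        while queue:
            u = queue.pop()
            comp.append(u)
            for v in adj[u] - seen:
                seen.add(v)
                queue.append(v)
        comps.append(comp)
    comps.sort(key=len, reverse=True)
    modes = []
    for comp in comps[:k]:                 # column-wise mode of each top group
        cols = zip(*(data[i] for i in comp))
        modes.append(tuple(max(sorted(set(c)), key=list(c).count) for c in cols))
    return modes

data = [("red", "small", "round"), ("red", "small", "square"),
        ("blue", "large", "round"), ("blue", "large", "square")]
modes = initial_modes(data, k=2)
```

Items sharing at least two attribute values form two groups here, so the two initial modes agree with the groups on the shared attributes.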
NASA Astrophysics Data System (ADS)
Wang, Gang; Wu, Nanhua; Chen, Jionghua; Wang, Jinjian; Shao, Jingling; Zhu, Xiaolei; Lu, Xiaohua; Guo, Lucun
2016-11-01
The thermodynamic and kinetic behaviors of gold nanoparticles confined between two-layer graphene nanosheets (two-layer GNSs) are examined and investigated during heating and cooling processes via the molecular dynamics (MD) simulation technique. An EAM potential is applied to represent the gold-gold interactions, while a Lennard-Jones (L-J) potential is used to describe the gold-GNS interactions. The MD melting temperature of 1345 K for bulk gold is close to the experimental value (1337 K), confirming that the EAM potential used to describe gold-gold interactions is reliable. On the other hand, the melting temperatures of gold clusters supported on a graphite bilayer are corrected to the corresponding experimental values by adjusting the εAu-C value. Therefore, the subsequent results from the current work are reliable. The gold nanoparticles confined within two-layer GNSs exhibit face-centered cubic structures, similar to those of free gold clusters and bulk gold. The melting points, heats of fusion, and heat capacities of the confined gold nanoparticles are predicted based on the plots of total energy against temperature. The density distribution perpendicular to the GNS suggests that the freezing of confined gold nanoparticles starts from the outermost layers. The confined gold clusters exhibit a layering phenomenon even in the liquid state. The order-disorder transition in each layer is an essential structural characteristic of the freezing phase transition of the confined gold clusters. Additionally, some vital kinetic data are obtained in terms of classical nucleation theory.
Sensitivity evaluation of dynamic speckle activity measurements using clustering methods.
Etchepareborda, Pablo; Federico, Alejandro; Kaufmann, Guillermo H
2010-07-01
We evaluate and compare the use of competitive neural networks, self-organizing maps, the expectation-maximization algorithm, K-means, and fuzzy C-means techniques as partitional clustering methods when the sensitivity of the activity measurement of dynamic speckle images needs to be improved. The temporal history of the acquired intensity generated by each pixel is analyzed in a wavelet decomposition framework, and it is shown that the mean energy of its corresponding wavelet coefficients provides a suitable feature space for clustering purposes. The sensitivity obtained by using the evaluated clustering techniques is also compared with the well-known methods of Konishi-Fujii, weighted generalized differences, and wavelet entropy. The performance of the partitional clustering approach is evaluated using simulated dynamic speckle patterns and also experimental data.
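The feature extraction step can be sketched as follows: each pixel's intensity history is decomposed with a wavelet transform (a Haar decomposition is used here for simplicity; the paper does not commit to this choice) and the mean energy of the detail coefficients at each level becomes that pixel's feature vector for clustering.

```python
# Mean energy of Haar wavelet detail coefficients, per decomposition level,
# computed from one pixel's temporal intensity history.

def haar_mean_energies(signal):
    energies = []
    approx = list(signal)
    while len(approx) >= 2:
        half = len(approx) // 2
        details = [(approx[2 * i] - approx[2 * i + 1]) / 2 for i in range(half)]
        approx = [(approx[2 * i] + approx[2 * i + 1]) / 2 for i in range(half)]
        energies.append(sum(d * d for d in details) / len(details))
    return energies

# A rapidly fluctuating history concentrates energy in the finest level:
feat = haar_mean_energies([1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0])
```

High-activity pixels produce large fine-scale energies, so the resulting feature vectors separate active from inactive regions before any clustering is applied.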
Photometry Using Kepler "Superstamps" of Open Clusters NGC 6791 & NGC 6819
NASA Astrophysics Data System (ADS)
Kuehn, Charles A.; Drury, Jason A.; Bellamy, Beau R.; Stello, Dennis; Bedding, Timothy R.; Reed, Mike; Quick, Breanna
2015-09-01
The Kepler space telescope has proven to be a gold mine for the study of variable stars. Usually, Kepler only reads out a handful of pixels around each pre-selected target star, omitting a large number of stars in the Kepler field. Fortunately, for the open clusters NGC 6791 and NGC 6819, Kepler also read out larger "superstamps" which contained complete images of the central region of each cluster. These cluster images can be used to study additional stars in the open clusters that were not originally on Kepler's target list. We discuss our work on using two photometric techniques to analyze these superstamps and present sample results from this project to demonstrate the value of this technique for a wide variety of variable stars.
NASA Technical Reports Server (NTRS)
Spruce, Joe
2001-01-01
Yellowstone National Park (YNP) contains a diversity of land cover. YNP managers need site-specific land cover maps, which may be produced more effectively using high-resolution hyperspectral imagery. ISODATA clustering techniques have aided operational multispectral image classification and may benefit certain hyperspectral data applications if optimally applied. In response, a study was performed for an area in northeast YNP using 11 select bands of low-altitude AVIRIS data calibrated to ground reflectance. These data were subjected to ISODATA clustering and Maximum Likelihood Classification techniques to produce a moderately detailed land cover map. The latter has good apparent overall agreement with field surveys and aerial photo interpretation.
A compressed sensing method with analytical results for lidar feature classification
NASA Astrophysics Data System (ADS)
Allen, Josef D.; Yuan, Jiangbo; Liu, Xiuwen; Rahmes, Mark
2011-04-01
We present an innovative way to autonomously classify LiDAR points into bare earth, building, vegetation, and other categories. One desirable product of LiDAR data is the automatic classification of the points in the scene. Our algorithm automatically classifies scene points using compressed sensing methods via Orthogonal Matching Pursuit algorithms, utilizing a generalized K-means clustering algorithm to extract buildings and foliage from a Digital Surface Model (DSM). This technology reduces manual editing while being cost effective for large-scale automated global scene modeling. Quantitative analyses are provided using Receiver Operating Characteristic (ROC) curves to show probability of detection and false alarm for buildings vs. vegetation classification. Histograms are shown with sample size metrics. Our inpainting algorithms then fill the voids where buildings and vegetation were removed, utilizing Computational Fluid Dynamics (CFD) techniques and Partial Differential Equations (PDE) to create an accurate Digital Terrain Model (DTM) [6]. Inpainting preserves building height contour consistency and edge sharpness of identified inpainted regions. Qualitative results illustrate other benefits such as Terrain Inpainting's unique ability to minimize or eliminate undesirable terrain data artifacts.
Chapter 7. Cloning and analysis of natural product pathways.
Gust, Bertolt
2009-01-01
The identification of gene clusters of natural products has led to an enormous wealth of information about their biosynthesis and its regulation, and about self-resistance mechanisms. Well-established routine techniques are now available for the cloning and sequencing of gene clusters. The subsequent functional analysis of the complex biosynthetic machinery requires efficient genetic tools for manipulation. Until recently, techniques for the introduction of defined changes into Streptomyces chromosomes were very time-consuming. In particular, manipulation of large DNA fragments has been challenging due to the absence of suitable restriction sites for restriction- and ligation-based techniques. The homologous recombination approach called recombineering (referred to as Red/ET-mediated recombination in this chapter) has greatly facilitated targeted genetic modifications of complex biosynthetic pathways from actinomycetes by eliminating many of the time-consuming and labor-intensive steps. This chapter describes techniques for the cloning and identification of biosynthetic gene clusters, for the generation of gene replacements within such clusters, for the construction of integrative library clones and their expression in heterologous hosts, and for the assembly of entire biosynthetic gene clusters from the inserts of individual library clones. A systematic approach toward insertional mutation of a complete Streptomyces genome is shown by the use of an in vitro transposon mutagenesis procedure.
Oluwadare, Oluwatosin; Cheng, Jianlin
2017-11-14
With the development of chromosomal conformation capturing techniques, particularly the Hi-C technique, the study of the spatial conformation of a genome is becoming an important topic in bioinformatics and computational biology. The Hi-C technique can generate genome-wide chromosomal interaction (contact) data, which can be used to investigate the higher-level organization of chromosomes, such as Topologically Associated Domains (TADs), i.e., locally packed chromosome regions bound together by intra-chromosomal contacts. The identification of the TADs for a genome is useful for studying gene regulation, genomic interaction, and genome function. Here, we formulate the TAD identification problem as an unsupervised machine learning (clustering) problem, and develop a new TAD identification method called ClusterTAD. We introduce a novel method to represent chromosomal contacts as features to be used by the clustering algorithm. Our results show that ClusterTAD can accurately predict the TADs on simulated Hi-C data. Our method is also largely complementary and consistent with existing methods on the real Hi-C datasets of two mouse cells. The validation with chromatin immunoprecipitation sequencing (ChIP-Seq) data shows that the domain boundaries identified by ClusterTAD have a high enrichment of CTCF binding sites, promoter-related marks, and enhancer-related histone modifications. As ClusterTAD is based on a proven clustering approach, it opens a new avenue to apply the large array of clustering methods developed in the machine learning field to the TAD identification problem. The source code, the results, and the TADs generated for the simulated and real Hi-C datasets are available at: https://github.com/BDM-Lab/ClusterTAD
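The core idea, representing each chromosome bin by its row of the Hi-C contact matrix and grouping similar bins, can be sketched in a hedged, simplified form (this is not the authors' exact algorithm): consecutive bins are merged into a candidate domain while their contact rows stay similar, here measured by cosine similarity, and a drop in similarity marks a domain boundary.

```python
import math

# Toy TAD segmentation: each bin's feature vector is its contact-matrix row;
# consecutive bins with similar rows are merged into one candidate domain.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def segment_tads(contacts, threshold=0.9):
    tads, start = [], 0
    for i in range(1, len(contacts)):
        if cosine(contacts[i - 1], contacts[i]) < threshold:
            tads.append((start, i - 1))     # similarity drop -> domain boundary
            start = i
    tads.append((start, len(contacts) - 1))
    return tads

# Block-diagonal toy contact matrix: bins 0-1 interact, bins 2-3 interact.
m = [[5, 4, 0, 0],
     [4, 5, 0, 0],
     [0, 0, 5, 4],
     [0, 0, 4, 5]]
tads = segment_tads(m)
# -> [(0, 1), (2, 3)]
```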
Hedgehog bases for A_n cluster polylogarithms and an application to six-point amplitudes
Parker, Daniel E.; Scherlis, Adam; Spradlin, Marcus; ...
2015-11-20
Multi-loop scattering amplitudes in N=4 Yang-Mills theory possess cluster algebra structure. In order to develop a computational framework which exploits this connection, we show how to construct bases of Goncharov polylogarithm functions, at any weight, whose symbol alphabet consists of cluster coordinates on the A_n cluster algebra. As a result, using such a basis we present a new expression for the 2-loop 6-particle NMHV amplitude which makes some of its cluster structure manifest.
The Use of Cluster Analysis in Typological Research on Community College Students
ERIC Educational Resources Information Center
Bahr, Peter Riley; Bielby, Rob; House, Emily
2011-01-01
One useful and increasingly popular method of classifying students is known commonly as cluster analysis. The variety of techniques that comprise the cluster analytic family are intended to sort observations (for example, students) within a data set into subsets (clusters) that share similar characteristics and differ in meaningful ways from other…
Cluster functions and scattering amplitudes for six and seven points
Harrington, Thomas; Spradlin, Marcus
2017-07-05
Scattering amplitudes in planar super-Yang-Mills theory satisfy several basic physical and mathematical constraints, including physical constraints on their branch cut structure and various empirically discovered connections to the mathematics of cluster algebras. The power of the bootstrap program for amplitudes is inversely proportional to the size of the intersection between these physical and mathematical constraints: ideally we would like a list of constraints which determine scattering amplitudes uniquely. Here, we explore this intersection quantitatively for two-loop six- and seven-point amplitudes by providing a complete taxonomy of the Gr(4, 6) and Gr(4, 7) cluster polylogarithm functions of [15] at weight 4.
Gholami, Mohammad; Brennan, Robert W
2016-01-06
In this paper, we investigate alternative distributed clustering techniques for wireless sensor node tracking in an industrial environment. The research builds on extant work on wireless sensor node clustering by reporting on: (1) the development of a novel distributed management approach for tracking mobile nodes in an industrial wireless sensor network; and (2) an objective comparison of alternative cluster management approaches for wireless sensor networks. To perform this comparison, we focus on two main clustering approaches proposed in the literature: pre-defined clusters and ad hoc clusters. These approaches are compared in the context of their reconfigurability: more specifically, we investigate the trade-off between the cost and the effectiveness of competing strategies aimed at adapting to changes in the sensing environment. To support this work, we introduce three new metrics: a cost/efficiency measure, a performance measure, and a resource consumption measure. The results of our experiments show that ad hoc clusters adapt more readily to changes in the sensing environment, but this higher level of adaptability is at the cost of overall efficiency.
Symmetry, Hopf bifurcation, and the emergence of cluster solutions in time delayed neural networks.
Wang, Zhen; Campbell, Sue Ann
2017-11-01
We consider networks of N identical oscillators with time-delayed, global circulant coupling, modeled by a system of delay differential equations with Z_N symmetry. We first study the existence of Hopf bifurcations induced by the coupling time delay and then use symmetric Hopf bifurcation theory to determine how these bifurcations lead to different patterns of symmetric cluster oscillations. We apply our results to a case study: a network of FitzHugh-Nagumo neurons with diffusive coupling. For this model, we derive the asymptotic stability, global asymptotic stability, absolute instability, and stability switches of the equilibrium point in the plane of the coupling time delay (τ) and the excitability parameter (a). We investigate the patterns of cluster oscillations induced by the time delay and determine the direction and stability of the bifurcating periodic orbits by employing the multiple-timescales method and normal form theory. We find that in the region where stability switching occurs, the dynamics of the system can be switched from the equilibrium point to any symmetric cluster oscillation, and back to the equilibrium point, as the time delay is increased.
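The class of models considered above can be written schematically; the form below is a generic Z_N-symmetric, time-delayed circulant coupling (the functions and weights are placeholders, not the authors' exact equations):

```latex
\dot{u}_j(t) = f\bigl(u_j(t)\bigr)
  + \sum_{k=0}^{N-1} c_k \left[ u_{(j+k) \bmod N}(t-\tau) - u_j(t) \right],
\qquad j = 0, 1, \dots, N-1,
```

where the circulant weights c_k make the system equivariant under the cyclic group Z_N, which is the symmetry exploited in the Hopf bifurcation analysis.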
Branching points in the low-temperature dipolar hard sphere fluid
NASA Astrophysics Data System (ADS)
Rovigatti, Lorenzo; Kantorovich, Sofia; Ivanov, Alexey O.; Tavares, José Maria; Sciortino, Francesco
2013-10-01
In this contribution, we investigate the low-temperature, low-density behaviour of dipolar hard-sphere (DHS) particles, i.e., hard spheres with dipoles embedded in their centre. We aim at describing the DHS fluid in terms of a network of chains and rings (the fundamental clusters) held together by branching points (defects) of different nature. We first introduce a systematic way of classifying inter-cluster connections according to their topology, and then employ this classification to analyse the geometric and thermodynamic properties of each class of defects, as extracted from state-of-the-art equilibrium Monte Carlo simulations. By computing the average density and energetic cost of each defect class, we find that the relevant contribution to inter-cluster interactions is indeed provided by (rare) three-way junctions and by four-way junctions arising from parallel or anti-parallel locally linear aggregates. All other (numerous) defects are either intra-cluster or associated with low cluster-cluster interaction energies, suggesting that these defects do not play a significant part in the thermodynamic description of the self-assembly processes of dipolar hard spheres.
A cluster merging method for time series microarray with production values.
Chira, Camelia; Sedano, Javier; Camara, Monica; Prieto, Carlos; Villar, Jose R; Corchado, Emilio
2014-09-01
A challenging task in time-course microarray data analysis is to cluster genes meaningfully by combining the information provided by multiple replicates covering the same key time points. This paper proposes a novel cluster merging method to accomplish this goal, obtaining groups of highly correlated genes. The main idea behind the proposed method is to generate a clustering starting from groups created based on individual temporal series (representing different biological replicates measured at the same time points) and to merge them by taking into account the frequency with which two genes are assembled together in each clustering. The gene groups at the level of individual time series are generated using several shape-based clustering methods. This study is focused on a real-world time series microarray task with the aim of finding co-expressed genes related to the production and growth of a certain bacteria. The shape-based clustering methods used at the level of individual time series rely on identifying similar gene expression patterns over time which, in some models, are further matched to the pattern of production/growth. The proposed cluster merging method is able to produce meaningful gene groups which can be naturally ranked by the level of agreement on the clustering among individual time series. The list of clusters and genes is further sorted based on the information correlation coefficient and new problem-specific relevant measures. Computational experiments and results of the cluster merging method are analyzed from a biological perspective and further compared with the clustering generated based on the mean value of time series and the same shape-based algorithm.
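The merging step described above can be sketched concretely: count how often each pair of genes is co-clustered across the per-replicate clusterings, then merge genes whose co-assembly frequency passes a threshold. This is a hedged illustration of the co-occurrence idea with a simple union-find, not the authors' code; the data and threshold are invented.

```python
from itertools import combinations

def merge_by_cooccurrence(clusterings, threshold=0.5):
    """Merge genes that share a cluster in >= threshold fraction of replicates."""
    n = len(clusterings[0])                 # number of genes
    parent = list(range(n))                 # union-find forest
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x
    for i, j in combinations(range(n), 2):
        freq = sum(c[i] == c[j] for c in clusterings) / len(clusterings)
        if freq >= threshold:
            parent[find(i)] = find(j)       # merge the two groups
    return [find(i) for i in range(n)]

# Three replicates, 5 genes: genes 0-1 always co-cluster,
# genes 3-4 co-cluster in 2 of 3 replicates, gene 2 drifts.
reps = [[0, 0, 1, 2, 2],
        [0, 0, 0, 1, 1],
        [1, 1, 0, 2, 0]]
merged = merge_by_cooccurrence(reps, threshold=2/3)
print(merged)
```

Ranking the merged groups by their internal co-occurrence frequency would then give the "level of agreement" ordering the abstract mentions.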
Intelligent Traffic Quantification System
NASA Astrophysics Data System (ADS)
Mohanty, Anita; Bhanja, Urmila; Mahapatra, Sudipta
2017-08-01
Currently, traffic monitoring and control is a major issue in almost all cities worldwide. The vehicular ad hoc network (VANET) technique is an efficient tool for mitigating this problem. Usually, different types of on-board sensors are installed in vehicles to generate messages characterized by different vehicle parameters. In this work, an intelligent system based on a fuzzy clustering technique is developed to reduce the number of individual messages by extracting the important features from the messages of a vehicle. The proposed fuzzy clustering technique therefore reduces the traffic load of the network, and it also quantifies and reduces congestion.
Hybrid Clustering-GWO-NARX neural network technique in predicting stock price
NASA Astrophysics Data System (ADS)
Das, Debashish; Safa Sadiq, Ali; Mirjalili, Seyedali; Noraziah, A.
2017-09-01
Prediction of stock prices is one of the most challenging tasks due to the nonlinear nature of stock data. Although numerous attempts have been made to predict stock prices by applying various techniques, the predicted price is not always accurate and the error rate can be high. Consequently, this paper endeavours to determine an efficient stock prediction strategy by implementing a combined method of the Grey Wolf Optimizer (GWO), clustering, and the Nonlinear Autoregressive Exogenous (NARX) technique. The study uses stock data from prominent stock markets, i.e., the New York Stock Exchange (NYSE) and NASDAQ, and from emerging stock markets, i.e., the Malaysian stock market (Bursa Malaysia) and the Dhaka Stock Exchange (DSE). It applies the k-means clustering algorithm to determine the most promising cluster, then MGWO is used to determine the classification rate, and finally the stock price is predicted by applying the NARX neural network algorithm. The prediction performance gained through experimentation is compared and assessed to guide investors in making investment decisions. The results of this hybrid Clustering-GWO-NARX neural network technique are promising, showing nearly precise predictions and an improved error rate. In future work, we intend to study the effect of various factors on stock price movement and the selection of parameters, to investigate the influence of positive or negative company news on stock price movement, and to predict stock indices.
A revised moving cluster distance to the Pleiades open cluster
NASA Astrophysics Data System (ADS)
Galli, P. A. B.; Moraux, E.; Bouy, H.; Bouvier, J.; Olivares, J.; Teixeira, R.
2017-02-01
Context. The distance to the Pleiades open cluster has been extensively debated in the literature over several decades. Although different methods point to a discrepancy in the trigonometric parallaxes produced by the Hipparcos mission, the number of individual stars with known distances is still small compared to the number of cluster members to help solve this problem. Aims: We provide a new distance estimate for the Pleiades based on the moving cluster method, which will be useful to further discuss the so-called Pleiades distance controversy and compare it with the very precise parallaxes from the Gaia space mission. Methods: We apply a refurbished implementation of the convergent point search method to an updated census of Pleiades stars to calculate the convergent point position of the cluster from stellar proper motions. Then, we derive individual parallaxes for 64 cluster members using radial velocities compiled from the literature, and approximate parallaxes for another 1146 stars based on the spatial velocity of the cluster. This represents the largest sample of Pleiades stars with individual distances to date. Results: The parallaxes derived in this work are in good agreement with previous results obtained in different studies (excluding Hipparcos) for individual stars in the cluster. We report a mean parallax of 7.44 ± 0.08 mas and distance of pc that is consistent with the weighted mean of 135.0 ± 0.6 pc obtained from the non-Hipparcos results in the literature. Conclusions: Our result for the distance to the Pleiades open cluster is not consistent with the Hipparcos catalog, but favors the recent and more precise distance determination of 136.2 ± 1.2 pc obtained from Very Long Baseline Interferometry observations. It is also in good agreement with the mean distance of 133 ± 5 pc obtained from the first trigonometric parallaxes delivered by the Gaia satellite for the brightest cluster members in common with our sample. 
Full Table B.2 is only available at the CDS via anonymous ftp to http://cdsarc.u-strasbg.fr (http://130.79.128.5) or via http://cdsarc.u-strasbg.fr/viz-bin/qcat?J/A+A/598/A48
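The moving-cluster method used above derives individual parallaxes from proper motions and radial velocities via the classical convergent-point relation π = A μ / (v_r tan λ), where μ is the proper motion, v_r the radial velocity, λ the angular distance to the convergent point, and A ≈ 4.74 converts units. The numbers below are purely illustrative, not values from the paper.

```python
import math

A = 4.74047  # km/s per (AU/yr): unit conversion factor for mas/yr and pc

def moving_cluster_parallax(mu_mas_yr, v_rad_kms, lambda_deg):
    """Classical moving-cluster parallax in mas: pi = A * mu / (v_r * tan(lambda))."""
    return A * mu_mas_yr / (v_rad_kms * math.tan(math.radians(lambda_deg)))

# Illustrative star: proper motion 50 mas/yr, radial velocity 5.9 km/s,
# located 80 degrees from the cluster's convergent point.
pi_mas = moving_cluster_parallax(50.0, 5.9, 80.0)
print(round(pi_mas, 2), "mas ->", round(1000.0 / pi_mas, 1), "pc")
```

In practice the convergent-point position itself must first be estimated from the ensemble of proper motions, which is the harder part of the method.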
Cluster Cooperation in Wireless-Powered Sensor Networks: Modeling and Performance Analysis.
Zhang, Chao; Zhang, Pengcheng; Zhang, Weizhan
2017-09-27
A wireless-powered sensor network (WPSN) consisting of one hybrid access point (HAP), a near cluster, and a corresponding far cluster is investigated in this paper. The sensors are wireless-powered and transmit information by consuming energy harvested from the signal emitted by the HAP. Sensors are able both to harvest energy and to store it. We propose that if sensors in the near cluster do not have their own information to transmit, they can act as relays and help the sensors in the far cluster forward information to the HAP in an amplify-and-forward (AF) manner. We use a finite Markov chain to model the dynamic variation of the relay battery and give a general analytical model for a WPSN with cluster cooperation. Through this model, we deduce a closed-form expression for the outage probability as the metric of this network. Finally, simulation results validate the design rationale of this paper and the correctness of the theoretical analysis, and show how the parameters affect system performance. Moreover, the outage probability of sensors in the far cluster can be drastically reduced without sacrificing the performance of sensors in the near cluster if the transmit power of the HAP is fairly high. Furthermore, in terms of the outage performance of the far cluster, the proposed scheme significantly outperforms the direct transmission scheme without cooperation.
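The battery modeling idea described above can be sketched numerically: treat the relay's battery level as a finite Markov chain and read an outage proxy off its stationary distribution. The state space and transition probabilities below are invented for illustration, not the paper's model.

```python
def stationary(P, iters=500):
    """Stationary distribution of a row-stochastic matrix P via power iteration."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

# Three battery levels (empty, half, full); each row sums to 1.
# Upward moves model energy harvesting, downward moves model relaying cost.
P = [[0.4, 0.6, 0.0],   # empty: harvest up to half with prob. 0.6
     [0.3, 0.4, 0.3],   # half:  relay (down) 0.3, harvest (up) 0.3
     [0.0, 0.5, 0.5]]   # full:  relay down to half with prob. 0.5
pi = stationary(P)
outage = pi[0]          # the relay cannot forward while its battery is empty
print([round(p, 3) for p in pi], "outage proxy ~", round(outage, 3))
```

The actual analysis couples this chain with the wireless channel statistics to obtain the closed-form outage probability; the stationary battery distribution is only one ingredient.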
Contemporary machine learning: techniques for practitioners in the physical sciences
NASA Astrophysics Data System (ADS)
Spears, Brian
2017-10-01
Machine learning is the science of using computers to find relationships in data without explicitly knowing or programming those relationships in advance. Often without realizing it, we employ machine learning every day as we use our phones or drive our cars. Over the last few years, machine learning has found increasingly broad application in the physical sciences. This most often involves building a model relationship between a dependent, measurable output and an associated set of controllable, but complicated, independent inputs. The methods are applicable both to experimental observations and to databases of simulated output from large, detailed numerical simulations. In this tutorial, we will present an overview of current tools and techniques in machine learning - a jumping-off point for researchers interested in using machine learning to advance their work. We will discuss supervised learning techniques for modeling complicated functions, beginning with familiar regression schemes, then advancing to more sophisticated decision trees, modern neural networks, and deep learning methods. Next, we will cover unsupervised learning and techniques for reducing the dimensionality of input spaces and for clustering data. We'll show example applications from both magnetic and inertial confinement fusion. Along the way, we will describe methods for practitioners to help ensure that their models generalize from their training data to as-yet-unseen test data. We will finally point out some limitations to modern machine learning and speculate on some ways that practitioners from the physical sciences may be particularly suited to help. This work was performed by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344.
Greenway, Kyle T.; LeGresley, Eric B.; Pinto, B. Mario
2013-01-01
Neuraminidase inhibitors are the main pharmaceutical agents employed for treatments of influenza infections. The neuraminidase structures typically exhibit a 150-cavity, an exposed pocket that is adjacent to the catalytic site. This site offers promising additional contact points for improving potency of existing pharmaceuticals, as well as generating entirely new candidate inhibitors. Several inhibitors based on known compounds and designed to interact with 150-cavity residues have been reported. However, the dynamics of any of these inhibitors remains unstudied and their viability remains unknown. This work reports the outcome of long-term, all-atom molecular dynamics simulations of four such inhibitors, along with three standard inhibitors for comparison. Each is studied in complex with four representative neuraminidase structures, which are also simulated in the absence of ligands for comparison, resulting in a total simulation time of 9.6 µs. Our results demonstrate that standard inhibitors characteristically reduce the mobility of these dynamic proteins, while the 150-binders do not, instead giving rise to many unique conformations. We further describe an improved RMSD-based clustering technique that isolates these conformations – the structures of which are provided to facilitate future molecular docking studies – and reveals their interdependence. We find that this approach confers many advantages over previously described techniques, and the implications for rational drug design are discussed. PMID:23544106
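RMSD-based clustering of trajectory frames, as used above, can be illustrated on toy data: compute the root-mean-square deviation between conformations and group frames whose RMSD to a cluster "leader" falls under a cutoff. This is a generic leader-clustering sketch (no structural superposition, synthetic 3-atom coordinates), not the authors' improved technique.

```python
import math

def rmsd(a, b):
    """RMSD between two equal-length coordinate sets of (x, y, z) tuples.
    No superposition step here, for brevity; real MD analysis aligns first."""
    n = len(a)
    return math.sqrt(sum((p[k] - q[k]) ** 2
                         for p, q in zip(a, b) for k in range(3)) / n)

def leader_cluster(frames, cutoff):
    """Assign each frame to the first cluster leader within the RMSD cutoff."""
    leaders, labels = [], []
    for f in frames:
        for ci, lead in enumerate(leaders):
            if rmsd(f, lead) < cutoff:
                labels.append(ci)
                break
        else:
            leaders.append(f)               # frame starts a new cluster
            labels.append(len(leaders) - 1)
    return labels

frames = [
    [(0, 0, 0), (1, 0, 0), (2, 0, 0)],      # conformation A
    [(0, 0, 0), (1, 0, 0), (2, 0.1, 0)],    # slight variant of A
    [(0, 0, 0), (0, 1, 0), (0, 2, 0)],      # distinct conformation B
]
labels = leader_cluster(frames, cutoff=0.5)
print(labels)   # -> [0, 0, 1]
```

Leader clustering is order-dependent and cutoff-sensitive, which is part of what more refined schemes (such as the one in the abstract) aim to improve on.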
Clustering of financial time series with application to index and enhanced index tracking portfolio
NASA Astrophysics Data System (ADS)
Dose, Christian; Cincotti, Silvano
2005-09-01
A stochastic-optimization technique based on time series cluster analysis is described for index tracking and enhanced index tracking problems. Our methodology solves the problem in two steps, i.e., by first selecting a subset of stocks and then setting the weight of each stock as a result of an optimization process (asset allocation). Present formulation takes into account constraints on the number of stocks and on the fraction of capital invested in each of them, whilst not including transaction costs. Computational results based on clustering selection are compared to those of random techniques and show the importance of clustering in noise reduction and robust forecasting applications, in particular for enhanced index tracking.
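The stock-selection step described above can be sketched with correlation-based grouping: keep one representative per group of highly correlated return series, and leave the subsequent weight optimization to a separate step. The return series and the 0.9 threshold below are synthetic, for illustration only.

```python
def corr(x, y):
    """Pearson correlation of two equal-length return series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x) ** 0.5
    vy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (vx * vy)

def select_representatives(series, min_corr=0.9):
    """Greedy selection: keep a stock only if it is not highly correlated
    with any stock already chosen as a representative."""
    reps = []
    for i, s in enumerate(series):
        if all(corr(s, series[r]) < min_corr for r in reps):
            reps.append(i)
    return reps

returns = [
    [0.010, -0.020, 0.030, 0.000],    # stock 0
    [0.011, -0.019, 0.029, 0.001],    # stock 1: nearly identical to stock 0
    [-0.020, 0.010, -0.010, 0.020],   # stock 2: different behaviour
]
reps = select_representatives(returns)
print(reps)   # -> [0, 2]
```

The second stage of the method in the abstract would then optimize the capital weights over the selected subset, subject to cardinality and holding constraints.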
Security and Correctness Analysis on Privacy-Preserving k-Means Clustering Schemes
NASA Astrophysics Data System (ADS)
Su, Chunhua; Bao, Feng; Zhou, Jianying; Takagi, Tsuyoshi; Sakurai, Kouichi
Due to the fast development of the Internet and related IT technologies, it has become increasingly easy to access large amounts of data. k-means clustering is a powerful and frequently used technique in data mining, and many research papers on privacy-preserving k-means clustering have been published. In this paper, we analyze existing privacy-preserving k-means clustering schemes based on cryptographic techniques. We show that those schemes can cause privacy breaches and cannot output correct results due to faults in their protocol constructions. Furthermore, we analyze our own proposal as an option for mitigating these problems, although it still incurs a breach of intermediate information during the computation.
Sideloading - Ingestion of Large Point Clouds Into the Apache Spark Big Data Engine
NASA Astrophysics Data System (ADS)
Boehm, J.; Liu, K.; Alis, C.
2016-06-01
In the geospatial domain we have now reached the point where the data volumes we handle have clearly grown beyond the capacity of most desktop computers. This is particularly true in the area of point cloud processing. It is therefore natural to explore established big data frameworks for big geospatial data. The very first hurdle is the import of geospatial data into big data frameworks, commonly referred to as data ingestion. Geospatial data is typically encoded in specialised binary file formats, which are not natively supported by existing big data frameworks. Instead, such file formats are supported by software libraries that are restricted to single-CPU execution. We present an approach that allows the use of existing point cloud file format libraries on the Apache Spark big data framework. We demonstrate the ingestion of large volumes of point cloud data into a compute cluster. The approach uses a map function to distribute the data ingestion across the nodes of a cluster. We test the capabilities of the proposed method by loading billions of points into a commodity-hardware compute cluster, and we discuss the implications for scalability and performance. The performance is benchmarked against an existing native Apache Spark data import implementation.
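The map-based ingestion pattern described above amounts to shipping *file paths* to workers and letting each worker invoke the point-cloud reader locally. The sketch below mimics that shape with a plain `map()` over a toy binary format (the file layout and names are invented); in actual Spark the map would be something like `sc.parallelize(paths).flatMap(read_points)`, with a real reader library on each node.

```python
import os
import struct
import tempfile

def write_fake_cloud(path, points):
    """Write packed little-endian (x, y, z) float64 triples (toy format)."""
    with open(path, "wb") as f:
        for p in points:
            f.write(struct.pack("<3d", *p))

def read_points(path):
    """Worker-side reader: parse one file into (x, y, z) tuples."""
    pts = []
    with open(path, "rb") as f:
        while chunk := f.read(24):          # 3 doubles = 24 bytes per point
            pts.append(struct.unpack("<3d", chunk))
    return pts

tmp = tempfile.mkdtemp()
paths = []
for i in range(3):
    p = os.path.join(tmp, f"tile_{i}.bin")
    write_fake_cloud(p, [(i, i, i), (i + 0.5, 0, 0)])
    paths.append(p)

# The "ingestion": a map over paths, trivially parallelisable per file.
clouds = list(map(read_points, paths))
total = sum(len(c) for c in clouds)
print(total, "points ingested from", len(paths), "tiles")
```

Because each task only receives a path string, the expensive binary parsing runs where the executor lives, which is exactly what makes the pattern scale across cluster nodes.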
NASA Astrophysics Data System (ADS)
Husser, Tim-Oliver; Kamann, Sebastian; Dreizler, Stefan; Wendt, Martin; Wulff, Nina; Bacon, Roland; Wisotzki, Lutz; Brinchmann, Jarle; Weilbacher, Peter M.; Roth, Martin M.; Monreal-Ibero, Ana
2016-04-01
Aims: We demonstrate the high multiplex advantage of crowded-field 3D spectroscopy with the new integral field spectrograph MUSE by means of a spectroscopic analysis of more than 12 000 individual stars in the globular cluster NGC 6397. Methods: The stars are deblended with a point spread function fitting technique, using a photometric reference catalogue from HST as prior, including relative positions and brightnesses. This catalogue is also used for a first analysis of the extracted spectra, followed by an automatic in-depth analysis via a full-spectrum fitting method based on a large grid of PHOENIX spectra. Results: We analysed the largest sample so far available for a single globular cluster of 18 932 spectra from 12 307 stars in NGC 6397. We derived a mean radial velocity of v_rad = 17.84 ± 0.07 km s^-1 and a mean metallicity of [Fe/H] = -2.120 ± 0.002, with the latter seemingly varying with temperature for stars on the red giant branch (RGB). We determine T_eff and [Fe/H] from the spectra, and log g from HST photometry. This is the first very comprehensive Hertzsprung-Russell diagram (HRD) for a globular cluster based on the analysis of several thousands of stellar spectra, ranging from the main sequence to the tip of the RGB. Furthermore, two interesting objects were identified; one is a post-AGB star and the other is a possible millisecond-pulsar companion. Data products are available at http://muse-vlt.eu/science. Based on observations obtained at the Very Large Telescope (VLT) of the European Southern Observatory, Paranal, Chile (ESO Programme ID 60.A-9100(C)).
Melting and glass transition for Ni clusters.
Teng, Yuyong; Zeng, Xianghua; Zhang, Haiyan; Sun, Deyan
2007-03-08
The melting of Ni_N clusters (N = 29, 50-150) has been investigated by using molecular dynamics (MD) simulations with a quantum-corrected Sutton-Chen (Q-SC) many-body potential. Surface melting for Ni_147, direct melting for Ni_79, and a glass transition for Ni_29 have been found, with melting points of 540, 680, and 940 K, respectively. The melting temperatures are not only size-dependent but also reflect structural symmetry: among clusters of neighbouring sizes, the cluster with higher symmetry has the higher melting point. From the reciprocal slopes of the caloric curves, the specific heats are obtained as 4.1 k_B per atom for the liquid and 3.1 k_B per atom for the solid; these values are not influenced by the cluster size apart from the transition region. The calculated results also show that the latent heat of fusion is the dominant effect on the melting temperatures (T_m), and the relationship between S and L is given.
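Extracting a specific heat from a caloric curve, as done above, is just the slope dE/dT of energy versus temperature on one branch; working with E/k_B expresses the result directly in units of k_B per atom. The data below are synthetic, generated to have a known slope, not values from the paper.

```python
def slope(xs, ys):
    """Least-squares slope of y versus x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

# Synthetic solid-branch caloric curve: E/(N k_B) rising linearly with T,
# so the fitted slope is the specific heat in units of k_B per atom.
temps = [300, 350, 400, 450, 500]            # K
energy_per_kB = [3.1 * t for t in temps]     # E/(N k_B) on the solid branch
c_solid = slope(temps, energy_per_kB)        # in k_B per atom
print(round(c_solid, 2))                     # -> 3.1
```

On real simulation data the curve kinks at the transition, so the solid and liquid branches are fitted separately and the jump between them gives the latent heat.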
Possibilistic clustering for shape recognition
NASA Technical Reports Server (NTRS)
Keller, James M.; Krishnapuram, Raghu
1993-01-01
Clustering methods have been used extensively in computer vision and pattern recognition. Fuzzy clustering has been shown to be advantageous over crisp (or traditional) clustering in that total commitment of a vector to a given class is not required at each iteration. Recently fuzzy clustering methods have shown spectacular ability to detect not only hypervolume clusters, but also clusters which are actually 'thin shells', i.e., curves and surfaces. Most analytic fuzzy clustering approaches are derived from Bezdek's Fuzzy C-Means (FCM) algorithm. The FCM uses the probabilistic constraint that the memberships of a data point across classes sum to one. This constraint was used to generate the membership update equations for an iterative algorithm. Unfortunately, the memberships resulting from FCM and its derivatives do not correspond to the intuitive concept of degree of belonging, and moreover, the algorithms have considerable trouble in noisy environments. Recently, the clustering problem was cast into the framework of possibility theory. Our approach was radically different from the existing clustering methods in that the resulting partition of the data can be interpreted as a possibilistic partition, and the membership values may be interpreted as degrees of possibility of the points belonging to the classes. An appropriate objective function whose minimum will characterize a good possibilistic partition of the data was constructed, and the membership and prototype update equations from necessary conditions for minimization of our criterion function were derived. The ability of this approach to detect linear and quartic curves in the presence of considerable noise is shown.
Possibilistic clustering for shape recognition
NASA Technical Reports Server (NTRS)
Keller, James M.; Krishnapuram, Raghu
1992-01-01
Clustering methods have been used extensively in computer vision and pattern recognition. Fuzzy clustering has been shown to be advantageous over crisp (or traditional) clustering in that total commitment of a vector to a given class is not required at each iteration. Recently fuzzy clustering methods have shown spectacular ability to detect not only hypervolume clusters, but also clusters which are actually 'thin shells', i.e., curves and surfaces. Most analytic fuzzy clustering approaches are derived from Bezdek's Fuzzy C-Means (FCM) algorithm. The FCM uses the probabilistic constraint that the memberships of a data point across classes sum to one. This constraint was used to generate the membership update equations for an iterative algorithm. Unfortunately, the memberships resulting from FCM and its derivatives do not correspond to the intuitive concept of degree of belonging, and moreover, the algorithms have considerable trouble in noisy environments. Recently, we cast the clustering problem into the framework of possibility theory. Our approach was radically different from the existing clustering methods in that the resulting partition of the data can be interpreted as a possibilistic partition, and the membership values may be interpreted as degrees of possibility of the points belonging to the classes. We constructed an appropriate objective function whose minimum will characterize a good possibilistic partition of the data, and we derived the membership and prototype update equations from necessary conditions for minimization of our criterion function. In this paper, we show the ability of this approach to detect linear and quartic curves in the presence of considerable noise.
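The key difference between possibilistic and probabilistic memberships described above is easy to demonstrate: in a possibilistic partition the memberships of a point need not sum to one across clusters, so a noise point far from every prototype gets low possibility everywhere. The update formula below is the standard Krishnapuram-Keller-style possibilistic membership on toy 1-D data; the prototypes and the bandwidth η are invented for illustration.

```python
def pcm_membership(x, prototype, eta, m=2.0):
    """Possibilistic membership: u = 1 / (1 + (d^2 / eta)^(1/(m-1)))."""
    d2 = (x - prototype) ** 2
    return 1.0 / (1.0 + (d2 / eta) ** (1.0 / (m - 1.0)))

prototypes = [0.0, 10.0]   # two cluster prototypes on the real line
eta = 1.0                  # cluster "bandwidth" parameter
for x in [0.1, 9.8, 5.0]:  # near cluster 0, near cluster 1, noise-like point
    u = [round(pcm_membership(x, p, eta), 3) for p in prototypes]
    print(x, u)
```

For the point at 5.0 both memberships are small, and their sum is well below one; under the FCM probabilistic constraint the same point would be forced to split a full unit of membership between the two clusters, which is exactly the behaviour the possibilistic formulation avoids.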
NASA Astrophysics Data System (ADS)
Clark, D. M.; Eikenberry, S. S.; Brandl, B. R.; Wilson, J. C.; Carson, J. C.; Henderson, C. P.; Hayward, T. L.; Barry, D. J.; Ptak, A. F.; Colbert, E. J. M.
2008-05-01
We use the previously identified 15 infrared star cluster counterparts to X-ray point sources in the interacting galaxies NGC 4038/4039 (the Antennae) to study the relationship between total cluster mass and X-ray binary number. This significant population of X-ray/IR associations allows us to perform, for the first time, a statistical study of X-ray point sources and their environments. We define a quantity, η, relating the fraction of X-ray sources per unit mass as a function of cluster mass in the Antennae. We compute cluster mass by fitting spectral evolutionary models to K_s luminosity. Considering that this method depends on cluster age, we use four different age distributions to explore the effects of cluster age on the value of η and find it varies by less than a factor of 4. We find a mean value of η for these different distributions of η = 1.7 × 10^-8 M_⊙^-1 with σ_η = 1.2 × 10^-8 M_⊙^-1. Performing a χ^2 test, we demonstrate that η could exhibit a positive slope, but that this depends on the assumed distribution of cluster ages. While the estimated uncertainties in η are factors of a few, we believe this is the first estimate of this quantity made to "order of magnitude" accuracy. We also compare our findings to theoretical models of open and globular cluster evolution, incorporating the X-ray binary fraction per cluster.
MSClique: Multiple Structure Discovery through the Maximum Weighted Clique Problem.
Sanroma, Gerard; Penate-Sanchez, Adrian; Alquézar, René; Serratosa, Francesc; Moreno-Noguer, Francesc; Andrade-Cetto, Juan; González Ballester, Miguel Ángel
2016-01-01
We present a novel approach for feature correspondence and multiple structure discovery in computer vision. In contrast to existing methods, we exploit the fact that point-sets on the same structure usually lie close to each other, thus forming clusters in the image. Given a pair of input images, we initially extract interest points and build hierarchical representations by agglomerative clustering. We use the maximum weighted clique problem to find the set of corresponding clusters with the maximum number of inliers representing the multiple structures at the correct scales. Our method is parameter-free and only needs two sets of points along with their tentative correspondences, thus being extremely easy to use. We demonstrate the effectiveness of our method in multiple-structure fitting experiments on both publicly available and in-house datasets. As shown in the experiments, our approach finds a higher number of structures containing fewer outliers compared to state-of-the-art methods.
NASA Astrophysics Data System (ADS)
Riggi, S.; Antonuccio-Delogu, V.; Bandieramonte, M.; Becciani, U.; Costa, A.; La Rocca, P.; Massimino, P.; Petta, C.; Pistagna, C.; Riggi, F.; Sciacca, E.; Vitello, F.
2013-11-01
Muon tomographic visualization techniques try to reconstruct a 3D image as close as possible to the real localization of the objects being probed. Statistical algorithms under test for the reconstruction of muon tomographic images in the Muon Portal Project are discussed here. Autocorrelation analysis and clustering algorithms have been employed within the context of methods based on the Point Of Closest Approach (POCA) reconstruction tool. An iterative method based on the log-likelihood approach was also implemented. Relative merits of all such methods are discussed, with reference to full GEANT4 simulations of different scenarios, incorporating medium and high-Z objects inside a container.
ERIC Educational Resources Information Center
Pangaribuan, Tagor; Manik, Sondang
2018-01-01
This research was conducted at SMA HKBP 1 Tarutung, North Sumatra, on the test results of class XI² and XI² students after they received treatment in the teaching of recount-text writing using the buzz group and clustering techniques. The average score (X) was 67.7, and in the buzz group the average score (X) was 77.2, and in…
Constrained spectral clustering under a local proximity structure assumption
NASA Technical Reports Server (NTRS)
Wagstaff, Kiri; Xu, Qianjun; des Jardins, Marie
2005-01-01
This work focuses on incorporating pairwise constraints into a spectral clustering algorithm. A new constrained spectral clustering method is proposed, as well as an active constraint acquisition technique and a heuristic for parameter selection. We demonstrate that our constrained spectral clustering method, CSC, works well when the data exhibits what we term local proximity structure.
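One common way of incorporating pairwise constraints into a spectral method, sketched here as an assumption (the paper's CSC algorithm may differ in detail), is to override the affinity matrix with must-link/cannot-link entries before computing the Laplacian embedding:

```python
import numpy as np

def constrained_spectral_embed(X, sigma, must_link=(), cannot_link=(), k=2):
    """Spectral embedding with pairwise constraints folded into the affinity:
    must-link pairs get affinity 1, cannot-link pairs get affinity 0."""
    d2 = np.linalg.norm(X[:, None] - X[None, :], axis=-1) ** 2
    A = np.exp(-d2 / (2.0 * sigma ** 2))     # Gaussian affinity
    for i, j in must_link:
        A[i, j] = A[j, i] = 1.0
    for i, j in cannot_link:
        A[i, j] = A[j, i] = 0.0
    np.fill_diagonal(A, 0.0)
    D = A.sum(axis=1)
    # symmetric normalised Laplacian  L = I - D^{-1/2} A D^{-1/2}
    L = np.eye(len(X)) - A / np.sqrt(D[:, None] * D[None, :])
    vals, vecs = np.linalg.eigh(L)           # ascending eigenvalues
    return vecs[:, :k]                       # embed with the k smallest
```

A standard clustering step (e.g. k-means) on the rows of the returned embedding then yields the constrained partition.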
Learner Typologies Development Using OIndex and Data Mining Based Clustering Techniques
ERIC Educational Resources Information Center
Luan, Jing
2004-01-01
This explorative data mining project used a distance-based clustering algorithm to study 3 indicators of student behavioral data, called the OIndex, and stabilized at a 6-cluster scenario following an exhaustive explorative study of 4-, 5-, and 6-cluster scenarios produced by the K-Means and TwoStep algorithms. Using principles in data mining, the study…
ERIC Educational Resources Information Center
Bortz, Richard F.
To prepare learning materials for health careers programs at the secondary level, the developmental phase of two curriculum projects--the Health Occupations Cluster Curriculum Project and Health-Care Aide Curriculum Project--utilized a model which incorporated a key factor analysis technique. Entitled "A Comprehensive Careers Cluster Curriculum…
Evidence that Clouds of keV Hydrogen Ion Clusters Bounce Elastically from a Solid Surface
NASA Technical Reports Server (NTRS)
Lewis, R. A.; Martin, James J.; Chakrabarti, Suman; Rodgers, Stephen L. (Technical Monitor)
2002-01-01
The behavior of hydrogen ion clusters is tested by an inject/hold/extract technique in a Penning-Malmberg trap. The timing pattern of the extraction signals is consistent with the clusters bouncing elastically from a detector several times. The ion clusters behave more like an elastic fluid than a beam of ions.
An ensemble framework for clustering protein-protein interaction networks.
Asur, Sitaram; Ucar, Duygu; Parthasarathy, Srinivasan
2007-07-01
Protein-Protein Interaction (PPI) networks are believed to be important sources of information related to biological processes and complex metabolic functions of the cell. The presence of biologically relevant functional modules in these networks has been theorized by many researchers. However, the application of traditional clustering algorithms for extracting these modules has not been successful, largely due to the presence of noisy false positive interactions as well as specific topological challenges in the network. In this article, we propose an ensemble clustering framework to address this problem. For base clustering, we introduce two topology-based distance metrics to counteract the effects of noise. We develop a PCA-based consensus clustering technique, designed to reduce the dimensionality of the consensus problem and yield informative clusters. We also develop a soft consensus clustering variant to assign multifaceted proteins to multiple functional groups. We conduct an empirical evaluation of different consensus techniques using topology-based, information theoretic and domain-specific validation metrics and show that our approaches can provide significant benefits over other state-of-the-art approaches. Our analysis of the consensus clusters obtained demonstrates that ensemble clustering can (a) produce improved biologically significant functional groupings; and (b) facilitate soft clustering by discovering multiple functional associations for proteins. Supplementary data are available at Bioinformatics online.
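The consensus step that underlies such ensemble frameworks is often built on a co-association matrix, which can be illustrated in a few lines of NumPy. The function below is a generic sketch, not the authors' PCA-based implementation, which would additionally reduce the dimensionality of this matrix before the final clustering:

```python
import numpy as np

def coassociation(labelings):
    """Co-association (consensus) matrix: the fraction of base clusterings
    that put each pair of items in the same cluster."""
    L = np.asarray(labelings)                 # shape (n_clusterings, n_items)
    return (L[:, :, None] == L[:, None, :]).mean(axis=0)
```

Soft consensus variants can then read off multiple strong associations per item directly from the rows of this matrix instead of forcing a single assignment.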
Procedure of Partitioning Data Into Number of Data Sets or Data Group - A Review
NASA Astrophysics Data System (ADS)
Kim, Tai-Hoon
The goal of clustering is to decompose a dataset into groups of similar items based on an objective function. Several well-established algorithms exist for data clustering. The objective of these algorithms is to divide the data points of the feature space into a number of groups (or classes) so that a predefined set of criteria is satisfied. This article presents a comparative study of the effectiveness and efficiency of traditional data clustering algorithms. To evaluate the performance of the clustering algorithms, the Minkowski score is used here for different data sets.
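As a sketch of the evaluation metric named above, one common formulation of the Minkowski score compares the pairwise co-membership matrices of two partitions; this definition is an assumption for illustration, not taken from the article itself:

```python
import numpy as np

def minkowski_score(labels_true, labels_pred):
    """Minkowski score between two partitions via their pairwise
    co-membership matrices T and S:  ||T - S|| / ||T||  (0 = identical)."""
    t = np.asarray(labels_true)
    s = np.asarray(labels_pred)
    T = (t[:, None] == t[None, :]).astype(float)
    S = (s[:, None] == s[None, :]).astype(float)
    np.fill_diagonal(T, 0.0)                 # ignore trivial self-pairs
    np.fill_diagonal(S, 0.0)
    return np.linalg.norm(T - S) / np.linalg.norm(T)
```

Because the score is computed from co-membership rather than raw labels, it is invariant to permutations of the cluster labels.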
DOE Office of Scientific and Technical Information (OSTI.GOV)
Torres, M. B., E-mail: begonia@ubu.es; Vega, A.; Balbás, L. C.
2014-05-07
Recently, Ar physisorption was used as a structural probe for the location of the Ti dopant atom in aluminium cluster cations, Al_nTi^+ [Lang et al., J. Am. Soc. Mass Spectrom. 22, 1508 (2011)]. Experimentally, the absence of Ar complexes for n > n_c determines the cluster size at which the Ti atom becomes located inside an Al cage. To elucidate the decisive factors for the formation of endohedral Al_nTi^+, the experimentalists proposed detailed computational studies as indispensable. In this work, we investigated, using density functional theory, the structural and electronic properties of singly titanium-doped cationic clusters, Al_nTi^+ (n = 16-21), as well as the adsorption of an Ar atom on them. The first endohedrally doped cluster, with Ti encapsulated in an fcc-like cage skeleton, appears at n_c = 21, the critical number consistent with the exohedral-endohedral transition observed experimentally. At this critical size the non-crystalline icosahedral growth pattern of the pure aluminium clusters, with the Ti atom on the surface, changes into an endohedral fcc-like pattern. The map of structural isomers, relative energy differences, second energy differences, and structural parameters were determined and analyzed. Moreover, we show that the critical size depends on the net charge of the cluster, being different for the cationic clusters (n_c = 21) and their neutral counterparts (n_c = 20). For the Al_nTi^+·Ar complexes with n < 21, the preferred Ar adsorption site is on top of the exohedral Ti atom, with adsorption energy in very good agreement with the experimental value. For n = 21, in contrast, Ar adsorption occurs on top of an Al atom with very low adsorption energy. For all sizes the geometry of the Al_nTi^+ clusters remains unaltered in the Ar-cluster complexes.
This indicates that Ar adsorption does not influence the cluster structure, supporting the experimental technique used. For n_c = 21, the smallest endohedrally Ti-doped cationic cluster, the Ar binding energy decreases drastically while the Ar-cluster distance increases substantially, pointing to Ar physisorption, as assumed by the experimentalists. Calculated Ar adsorption energies agree well with available experimental binding energies.
Rain volume estimation over areas using satellite and radar data
NASA Technical Reports Server (NTRS)
Doneaud, A. A.; Vonderhaar, T. H.
1985-01-01
The feasibility of rain volume estimation over fixed and floating areas was investigated using rapid-scan satellite data following a technique recently developed with radar data, called the Area Time Integral (ATI) technique. The radar and rapid-scan GOES satellite data were collected during the Cooperative Convective Precipitation Experiment (CCOPE) and the North Dakota Cloud Modification Project (NDCMP). Six multicell clusters and cells have been analyzed to date. A two-cycle oscillation emphasizing the multicell character of the clusters is demonstrated. Three clusters were selected on each day, 12 June and 2 July. The 12 June clusters occurred during the daytime, while the 2 July clusters occurred during the nighttime. A total of 86 time steps of radar images and 79 time steps of satellite images were analyzed, with approximately 12-min intervals between radar scans on average.
Bottom-up strategies for the assembling of magnetic systems using nanoclusters
NASA Astrophysics Data System (ADS)
Dupuis, V.; Hillion, A.; Robert, A.; Loiselet, O.; Khadra, G.; Capiod, P.; Albin, C.; Boisron, O.; Le Roy, D.; Bardotti, L.; Tournus, F.; Tamion, A.
2018-05-01
In the frame of the 20th Anniversary of the Journal of Nanoparticle Research (JNR), our aim is to start from the historical context 20 years ago and to give some recent results and perspectives concerning nanomagnets prepared from clusters preformed in the gas phase using the low-energy cluster beam deposition (LECBD) technique. In this paper, we focus our attention on the typical case of Co clusters embedded in various matrices, used to study interface magnetic anisotropy and magnetic interactions as a function of volume concentration, and on current results and perspectives through two examples of binary metallic 3d-5d TM (namely CoPt and FeAu) cluster assemblies, which illustrate size-related and nanoalloy phenomena in the magnetic properties of well-defined mass-selected clusters. The structural and magnetic properties of these cluster assemblies were investigated using various experimental techniques that include high-resolution transmission electron microscopy (HRTEM), superconducting quantum interference device (SQUID) magnetometry, and synchrotron techniques such as extended X-ray absorption fine structure (EXAFS) and X-ray magnetic circular dichroism (XMCD). Depending on the chemical nature of both NPs and matrix, we observe different magnetic responses compared to their bulk counterparts. In particular, we show how finite-size effects (size reduction) enhance the magnetic moment and how specific relaxation in nanoalloys can impact the magnetic anisotropy.
NASA Technical Reports Server (NTRS)
Wolf, S. F.; Lipschutz, M. E.
1993-01-01
Multivariate statistical analysis techniques (linear discriminant analysis and logistic regression) can provide powerful discrimination tools which are generally unfamiliar to the planetary science community. Fall parameters were used to identify a group of 17 H chondrites (Cluster 1) that were part of a co-orbital stream which intersected Earth's orbit in May during 1855-1895 and can be distinguished from all other H chondrite falls. Using multivariate statistical techniques, it was demonstrated that by a totally different criterion, labile trace element contents - hence thermal histories - 13 Cluster 1 meteorites are distinguishable from 45 non-Cluster 1 H chondrites. Here, we focus upon the principles of multivariate statistical techniques and illustrate their application using non-meteoritic and meteoritic examples.
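As an illustration of the discrimination tools named above, a minimal two-class Fisher linear discriminant can be written directly from the class means and the pooled within-class scatter. This is a generic sketch, not the meteorite analysis itself; all names and data are illustrative:

```python
import numpy as np

def fisher_lda_direction(X0, X1):
    """Two-class Fisher discriminant: w = Sw^-1 (m1 - m0), where Sw is the
    pooled within-class scatter matrix."""
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    Sw = (np.cov(X0, rowvar=False) * (len(X0) - 1)
          + np.cov(X1, rowvar=False) * (len(X1) - 1))
    w = np.linalg.solve(Sw, m1 - m0)
    return w / np.linalg.norm(w)

def classify(x, w, X0, X1):
    """Assign x to the class (0 or 1) whose projected mean is nearer."""
    p = x @ w
    return int(abs(p - X1.mean(axis=0) @ w) < abs(p - X0.mean(axis=0) @ w))
```

Projecting onto a single discriminant direction is what makes such tools interpretable: a sample's position along w summarizes all measured variables at once.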
Old, L.; Wojtak, R.; Pearce, F. R.; ...
2017-12-20
With the advent of wide-field cosmological surveys, we are approaching samples of hundreds of thousands of galaxy clusters. While such large numbers will help reduce statistical uncertainties, the control of systematics in cluster masses is crucial. Here we examine the effects of an important source of systematic uncertainty in galaxy-based cluster mass estimation techniques: the presence of significant dynamical substructure. Dynamical substructure manifests as dynamically distinct subgroups in phase-space, indicating an 'unrelaxed' state. This issue affects around a quarter of clusters in a generally selected sample. We employ a set of mock clusters whose masses have been measured homogeneously with commonly used galaxy-based mass estimation techniques (kinematic, richness, caustic, radial methods). We use these to study how the relation between observationally estimated and true cluster mass depends on the presence of substructure, as identified by various popular diagnostics. We find that the scatter for an ensemble of clusters does not increase dramatically for clusters with dynamical substructure. However, we find a systematic bias for all methods, such that clusters with significant substructure have higher measured masses than their relaxed counterparts. This bias depends on cluster mass: the most massive clusters are largely unaffected by the presence of significant substructure, but masses are significantly overestimated for lower mass clusters, by ~10 per cent at 10^14 M_⊙ and ≳20 per cent for ≲10^13.5 M_⊙. The use of cluster samples with different levels of substructure can therefore bias certain cosmological parameters up to a level comparable to the typical uncertainties in current cosmological studies.
42 CFR 405.503 - Determining customary charges.
Code of Federal Regulations, 2010 CFR
2010-10-01
... exceptional charges on the high side. A significant clustering of charges in the vicinity of the median amount might indicate that a point of such clustering should be taken as the physician's or other person's...
Desorption Induced by KEV Molecular and Cluster Projectiles.
NASA Astrophysics Data System (ADS)
Blain, Matthew Glenn
1990-01-01
A new experimental method has been developed for studying negative secondary ion (SI) emission from solid surfaces bombarded by polyatomic primary ions of 5 to 30 keV. The method is based on the time-of-flight (TOF) analysis of primary ions which are produced either by 252Cf fission-fragment-induced desorption or by extraction from a liquid metal ion source, and then accelerated into a field-free region. The primary ions included organic monomer, dimer, and fragment ions of coronene and phenylalanine, (CsI)_nCs^+ cluster ions, and Au_n^+ cluster ions. Secondary electrons, emitted from a target surface upon primary ion impact, are used to identify which primary ion has hit the surface. An event-by-event coincidence counting technique allows several secondary ion TOF spectra, correlated to several different primary ions, to be acquired simultaneously. Negative SI yields from organic (phenylalanine and dinitrostilbene), CsI, and Au surfaces have been measured for a number of different mono- and polyatomic primary ions. The results show, for example, yields ranging from 1 to 10% for phenylalanine (M-H)^-, 1 to 10% for I^-, and 1 to 5% for Au^-, with Cs_2I^+ and Cs_3I_2^+ clusters as projectiles. Yields for the same surfaces using Cs^+ primary ions are much less than 1%, indicating that SI yields are enhanced with clusters. A yield enhancement occurs when the SI yield per atom of a polyatomic projectile is greater than the SI yield of its monoatomic equivalent at the same velocity. Thus, a (M-H)^- yield increase of a factor of 50 when phenylalanine is bombarded with Cs_3I_2^+ instead of Cs^+ represents a yield enhancement factor of 10. For the projectiles and samples studied, it was observed that the heavier the mass of the constituents of a projectile, the larger the enhancement effects, and that the largest yield enhancements (with CsI and Au_n projectiles) occur for the organic target, phenylalanine.
One possible explanation for the larger enhancements with organics, namely a thermal spike process, appears unlikely. Experiments with high and low melting point isomers of dinitrostilbene, bombarded with Cs_2I^+ and Cs^+ projectiles, showed larger Cs_2I^+ yield enhancements for the high melting point isomer.
The merging cluster Abell 1758: an optical and dynamical view
NASA Astrophysics Data System (ADS)
Monteiro-Oliveira, Rogerio; Serra Cypriano, Eduardo; Machado, Rubens; Lima Neto, Gastao B.
2015-08-01
The galaxy cluster Abell 1758-North (z = 0.28) is a binary system composed of the sub-structures NW and NE. It is thought to be a post-merger cluster because of the observed detachment between the NE BCG and the corresponding X-ray-emitting hot gas clump, in a scenario very similar to the famous Bullet Cluster. On the other hand, the projected position of the NW BCG coincides with the local hot gas peak. This system has been targeted previously by several studies, using multiple wavelengths and techniques, but there is still no clear picture of the scenario that could have caused this unusual configuration. To help solve this complex puzzle we add some pieces: first, we use deep B, Rc and z' Subaru images to perform both weak-lensing shear and magnification analyses of A1758 (including the South component, which is not interacting with A1758-North), modeling each sub-clump as an NFW profile in order to constrain the masses and their centre positions through MCMC methods; the second piece is a dynamical analysis using radial velocities available in the literature (143) plus new Gemini-GMOS/N measurements (68 new redshifts). From weak lensing we find that the independent shear and magnification mass determinations are in excellent agreement, and by combining both we could reduce the mass error bar by ~30% compared to shear alone. Combining these two weak-lensing probes, we find that the positions of both northern BCGs are consistent with the mass centres within 2σ, and that the NE hot gas peak is offset from the respective mass peak (M200 = 5.5 × 10^14 M⊙) with very high significance. The most massive structure is NW (M200 = 7.95 × 10^14 M⊙), where we observe no detachment between gas, dark matter and BCG. We have calculated a low line-of-sight velocity difference (<300 km/s) between A1758 NW and NE.
We have combined this with the projected velocity of 1600 km/s estimated by a previous X-ray analysis (David & Kempner 2004) and obtain a small angle between the plane of collision and the sky (<40 degrees). Dynamical modeling shows that the point of closest approach took place 0.55 Gyr ago, pointing to Abell 1758-North being a young merging cluster.
Visualizing statistical significance of disease clusters using cartograms.
Kronenfeld, Barry J; Wong, David W S
2017-05-15
Health officials and epidemiological researchers often use maps of disease rates to identify potential disease clusters. Because these maps exaggerate the prominence of low-density districts and hide potential clusters in urban (high-density) areas, many researchers have used density-equalizing maps (cartograms) as a basis for epidemiological mapping. However, existing guidelines for the visual assessment of statistical uncertainty are lacking. To address this shortcoming, we develop techniques for the visual determination of the statistical significance of clusters spanning one or more districts on a cartogram. We developed the techniques within a geovisual analytics framework that does not rely on automated significance testing, and can therefore facilitate visual analysis to detect clusters that automated techniques might miss. On a cartogram of the at-risk population, the statistical significance of a disease cluster can be determined from the rate, area and shape of the cluster under standard hypothesis-testing scenarios. We develop formulae to determine, for a given rate, the area required for statistical significance of a priori and a posteriori designated regions under certain test assumptions. Uniquely, our approach enables dynamic inference for aggregate regions formed by combining individual districts. The method is implemented in interactive tools that provide choropleth mapping, automated legend construction and dynamic search tools to facilitate cluster detection and assessment of the validity of tested assumptions. A case study of leukemia incidence analysis in California demonstrates the ability to visually distinguish between statistically significant and insignificant regions. The proposed geovisual analytics approach enables intuitive visual assessment of the statistical significance of arbitrarily defined regions on a cartogram.
Our research prompts a broader discussion of the role of geovisual exploratory analyses in disease mapping and the appropriate framework for visually assessing the statistical significance of spatial clusters.
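Under the standard hypothesis-testing scenario the abstract alludes to, the significance of an a priori designated region can be sketched with a Poisson model: the expected case count scales with the region's at-risk population (its area on the cartogram), and significance follows from the upper tail. This is an illustrative sketch under assumed test conditions, not the authors' formulae:

```python
import math

def poisson_p_value(observed, expected):
    """Upper-tail p-value P(X >= observed) for X ~ Poisson(expected)."""
    # sum P(X = k) for k < observed, then take the complement
    cdf = sum(math.exp(-expected) * expected ** k / math.factorial(k)
              for k in range(observed))
    return 1.0 - cdf

def cases_needed(expected, alpha=0.05):
    """Smallest observed count that is significant at level alpha."""
    k = 0
    while poisson_p_value(k, expected) > alpha:
        k += 1
    return k
```

Inverting this relation (fixing the rate and solving for the population, i.e. the cartogram area) gives the kind of "area required for significance" quantity the abstract describes.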
Clustering P-Wave Receiver Functions To Constrain Subsurface Seismic Structure
NASA Astrophysics Data System (ADS)
Chai, C.; Larmat, C. S.; Maceira, M.; Ammon, C. J.; He, R.; Zhang, H.
2017-12-01
The acquisition of high-quality data from permanent and temporary dense seismic networks provides the opportunity to apply statistical and machine learning techniques to a broad range of geophysical observations. Lekic and Romanowicz (2011) used clustering analysis on tomographic velocity models of the western United States to perform tectonic regionalization, and the velocity-profile clusters agree well with known geomorphic provinces. A complementary and somewhat less restrictive approach is to apply cluster analysis directly to geophysical observations. In this presentation, we apply clustering analysis to teleseismic P-wave receiver functions (RFs), continuing the efforts of Larmat et al. (2015) and Maceira et al. (2015). These earlier studies validated the approach with surface waves and stacked EARS RFs from the USArray stations. In this study, we experiment with both the K-means and hierarchical clustering algorithms. We also test different distance metrics defined in the vector space of RFs, following Lekic and Romanowicz (2011). We cluster data from two distinct data sets. The first, corresponding to the western US, was generated by smoothing/interpolation of the receiver-function wavefield (Chai et al. 2015). Spatial coherence and agreement with geologic region increase with this simpler, spatially smoothed set of observations. The second data set is composed of RFs for more than 800 stations of the China Digital Seismic Network (CSN). Preliminary results show a first-order agreement between clusters and tectonic regions, and each regional cluster includes a distinct Ps arrival, which probably reflects differences in crustal thickness. Regionalization remains an important step to characterize a model prior to application of full waveform and/or stochastic imaging techniques because of the computational expense of these types of studies.
Machine learning techniques can provide valuable information that can be used to design and characterize formal geophysical inversion, providing information on spatial variability in the subsurface geology.
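As a sketch of the agglomerative side of such an analysis, a minimal single-linkage clustering over feature vectors (e.g. sampled RF waveforms) can be written as a union-find over sorted pairwise distances. This is illustrative only; the study's own distance metrics in the RF vector space may differ:

```python
import numpy as np

def single_linkage(X, k):
    """Single-linkage agglomerative clustering: repeatedly merge the closest
    pair of clusters until k clusters remain."""
    n = len(X)
    parent = list(range(n))

    def find(i):                              # union-find with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    d = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    pairs = sorted((d[i, j], i, j) for i in range(n) for j in range(i + 1, n))
    clusters = n
    for _, i, j in pairs:
        if clusters == k:
            break
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj                   # merge the two clusters
            clusters -= 1
    roots = [find(i) for i in range(n)]
    _, labels = np.unique(roots, return_inverse=True)
    return labels
```

Swapping the Euclidean distance for another metric on the RF vectors changes only the matrix `d`, which is how different distance definitions can be compared within the same framework.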
Application of multivariable statistical techniques in plant-wide WWTP control strategies analysis.
Flores, X; Comas, J; Roda, I R; Jiménez, L; Gernaey, K V
2007-01-01
The main objective of this paper is to present the application of selected multivariable statistical techniques in plant-wide wastewater treatment plant (WWTP) control strategies analysis. In this study, cluster analysis (CA), principal component analysis/factor analysis (PCA/FA) and discriminant analysis (DA) are applied to the evaluation matrix data set obtained by simulation of several control strategies applied to the plant-wide IWA Benchmark Simulation Model No 2 (BSM2). These techniques make it possible i) to determine natural groups or clusters of control strategies with a similar behaviour, ii) to find and interpret hidden, complex and causal relation features in the data set and iii) to identify important discriminant variables within the groups found by the cluster analysis. This study illustrates the usefulness of multivariable statistical techniques for both analysis and interpretation of complex multicriteria data sets and allows an improved use of information for the effective evaluation of control strategies.
Segmentation of suspicious objects in an x-ray image using automated region filling approach
NASA Astrophysics Data System (ADS)
Fu, Kenneth; Guest, Clark; Das, Pankaj
2009-08-01
To accommodate the flow of commerce, cargo inspection systems require a high probability of detection and a low false-alarm rate while still maintaining a minimum scan speed. Since objects of interest (high atomic-number metals) will often be heavily shielded to avoid detection, any detection algorithm must be able to identify such objects despite the shielding. Since pixels of a shielded object have a greater opacity than the shielding, we use a clustering method to classify objects in the image by pixel intensity level. We then look within each intensity-level region for sub-clusters of pixels with greater opacity than the surrounding region. A region containing an object has an enclosed-contour region (a hole) inside of it. We apply a region-filling technique to fill in the hole, which represents a shielded object of potential interest. One method for region filling is seed-growing, which puts a "seed" starting point in the hole area and uses a selected structural element to fill out that region. However, automatic seed-point selection is a hard problem; it requires additional information to decide whether a pixel is within an enclosed region. Here, we propose a simple, robust method for region filling that avoids the problem of seed-point selection. In our approach, we calculate the gradients Gx and Gy at each pixel of a binary image, fill in 1s along each row between every pair of points x1 < x2 with Gx(x1, y) = -1 and Gx(x2, y) = 1, and do the same in the y-direction. The intersection of the two results is the filled region. We give a detailed discussion of our algorithm, discuss the strengths this method has over other methods, and show results of using our method.
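The gradient-pair filling rule described above can be sketched directly in NumPy (an illustrative reconstruction from the description, not the authors' code): fill the zero-runs of each row that lie between a falling edge and a later rising edge, do the same for columns, and intersect the two fills.

```python
import numpy as np

def _fill_runs(line):
    """Fill the 0-runs of a 1-D line lying between a falling (1 -> 0)
    and a later rising (0 -> 1) edge of its gradient."""
    g = np.diff(line.astype(int))            # +1 rising edge, -1 falling edge
    filled = np.zeros(len(line), dtype=bool)
    falls = np.where(g == -1)[0]
    rises = np.where(g == 1)[0]
    for f in falls:
        nxt = rises[rises > f]
        if nxt.size:                         # a later rising edge closes the gap
            filled[f + 1: nxt[0] + 1] = True
    return filled

def fill_holes(binary):
    """Intersect row-wise and column-wise fills, then OR with the image."""
    b = np.asarray(binary).astype(bool)
    fx = np.array([_fill_runs(row) for row in b])
    fy = np.array([_fill_runs(col) for col in b.T]).T
    return b | (fx & fy)
```

The intersection step is what keeps gaps between two separate objects in the same row (or column) from being filled: only pixels enclosed in both directions survive.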
Parallel Density-Based Clustering for Discovery of Ionospheric Phenomena
NASA Astrophysics Data System (ADS)
Pankratius, V.; Gowanlock, M.; Blair, D. M.
2015-12-01
Ionospheric total electron content maps derived from global networks of dual-frequency GPS receivers can reveal a plethora of ionospheric features in real-time and are key to space weather studies and natural hazard monitoring. However, growing data volumes from expanding sensor networks are making manual exploratory studies challenging. As the community is heading towards Big Data ionospheric science, automation and Computer-Aided Discovery become indispensable tools for scientists. One problem of machine learning methods is that they require domain-specific adaptations in order to be effective and useful for scientists. Addressing this problem, our Computer-Aided Discovery approach allows scientists to express various physical models as well as perturbation ranges for parameters. The search space is explored through an automated system and parallel processing of batched workloads, which finds corresponding matches and similarities in empirical data. We discuss density-based clustering as a particular method we employ in this process. Specifically, we adapt Density-Based Spatial Clustering of Applications with Noise (DBSCAN). This algorithm groups geospatial data points based on density. Clusters of points can be of arbitrary shape, and the number of clusters is not predetermined by the algorithm; only two input parameters need to be specified: (1) a distance threshold, (2) a minimum number of points within that threshold. We discuss an implementation of DBSCAN for batched workloads that is amenable to parallelization on manycore architectures such as Intel's Xeon Phi accelerator with 60+ general-purpose cores. This manycore parallelization can cluster large volumes of ionospheric total electronic content data quickly. Potential applications for cluster detection include the visualization, tracing, and examination of traveling ionospheric disturbances or other propagating phenomena. Acknowledgments. We acknowledge support from NSF ACI-1442997 (PI V. Pankratius).
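A minimal version of the DBSCAN procedure described above, taking exactly the two stated parameters, can be sketched in pure NumPy (the paper's manycore implementation batches and parallelizes this; the sketch below is illustrative, with quadratic-memory neighbor search):

```python
import numpy as np

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: eps is the distance threshold, min_pts the minimum
    number of points within eps (self included). Returns labels; -1 = noise."""
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    dist = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
    neighbors = [np.where(dist[i] <= eps)[0] for i in range(n)]
    labels = np.full(n, -1)
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or len(neighbors[i]) < min_pts:
            continue                          # already labelled, or not core
        labels[i] = cluster                   # start a new cluster at core i
        queue = list(neighbors[i])
        while queue:                          # expand the cluster outward
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster
                if len(neighbors[j]) >= min_pts:
                    queue.extend(neighbors[j])  # j is core: keep expanding
        cluster += 1
    return labels
```

Because clusters grow by density-reachability rather than toward fixed centroids, the recovered clusters can take arbitrary shapes, matching the requirements for tracing traveling ionospheric disturbances.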
NASA Astrophysics Data System (ADS)
Aaronson, M.; Mould, J.; Huchra, J.; Sullivan, W. T., III; Schommer, R. A.; Bothun, G. D.
1980-07-01
Infrared magnitudes and 21 cm H I velocity widths are presented for galaxies in the Pegasus I cluster (V ≈ 4000 km s^-1), the Cancer cluster (V ≈ 4500 km s^-1), the cluster Zwicky 1400.4+0949 (Z74-23) (V ≈ 6000 km s^-1), and the Perseus supercluster (V ≈ 5500 km s^-1). The data are used to determine redshift-independent distances from which values of the Hubble ratio can be derived. With a zero point based solely on the Sandage-Tammann distances to M31 and M33, the following results are obtained (zero-point error excluded): Pegasus I: r = 42 ± 4 Mpc, V/r = 91 ± 8 km s^-1 Mpc^-1; Cancer: r = 49 ± 6 Mpc, V/r = 89 ± 11 km s^-1 Mpc^-1; Z74-23: r = 61 ± 4 Mpc, V/r = 96 ± 7 km s^-1 Mpc^-1; Perseus supercluster: r = 53 ± 2 Mpc, V/r = 104 ± 6 km s^-1 Mpc^-1. The closely similar value of the Hubble ratio found in the four independent samples suggests that the zero-point calibration in the IR/H I technique does not depend on environment. The difference between the mean of these Hubble ratios, V/r = 95 ± 4 km s^-1 Mpc^-1, and that measured for Virgo in Paper II, V/r = 65 ± 4 km s^-1 Mpc^-1, is significant at a formal level of 5σ. The simplest explanation of the discrepancy is to postulate a Local Group component of motion in the direction of Virgo. The resulting velocity perturbation is ΔV = 480 ± 75 km s^-1. This value agrees well with recent observations of a dipole term in the 3 K microwave background, the only other anisotropy test for which a detection significance of 5σ or more is claimed. We are thus led to a preliminary estimate for the value of the Hubble constant of H0 = 95 ± 4 km s^-1 Mpc^-1. If a zero point based on de Vaucouleurs's distances to M31 and M33 is adopted instead, all distances decrease, and the Hubble constant increases by a similar amount. A variety of possible systematic errors which might affect the present conclusions are investigated, but we can find none that are relevant.
In particular, because the galaxy samples are chosen from a cluster population which is generally all at the same distance, Malmquist bias does not occur. In fact, two of the clusters (Pegasus I and Z74-23) are sampled in both magnitude and velocity width to a level as deep as Virgo itself. Other observational data related to the value of H0 are examined, as are a number of previously used anisotropy tests, including color-luminosity relations, brightest cluster member(s), central surface brightnesses, and supernovae. We find that some of these tests support the present results, while contrary evidence is currently weak. A model in which Virgo gravitationally retards the Hubble flow of galaxies within the Local Supercluster provides a natural interpretation of our findings. A range of 1.5-3 in local density contrast then leads to a value of the density parameter Ω ≍ 0.7-0.2. The deceleration parameter q0 is then 0.35-0.1 for a simple Friedmann-type expanding universe.
Network based approaches reveal clustering in protein point patterns
NASA Astrophysics Data System (ADS)
Parker, Joshua; Barr, Valarie; Aldridge, Joshua; Samelson, Lawrence E.; Losert, Wolfgang
2014-03-01
Recent advances in super-resolution imaging have allowed for the sub-diffraction measurement of the spatial location of proteins on the surfaces of T-cells. The challenge is to connect these complex point patterns to the internal processes and interactions, both protein-protein and protein-membrane. We begin analyzing these patterns by forming a geometric network amongst the proteins and looking at network measures, such as the degree distribution. This allows us to compare experimentally observed patterns to models. Specifically, we find that the experimental patterns differ from heterogeneous Poisson processes, highlighting an internal clustering structure. Future work will compare our results to simulated protein-protein interactions to determine clustering mechanisms.
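The comparison described above can be sketched with a toy geometric network: points closer than a cutoff radius are linked, and the mean degree of a clustered pattern is compared with that of a uniform (Poisson-like) one. The point counts, cluster scale, and radius below are illustrative, not values from the paper.

```python
import math
import random

def geometric_degrees(points, radius):
    """Degree of each point in the geometric network linking pairs closer
    than the cutoff radius."""
    n = len(points)
    deg = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(points[i], points[j]) < radius:
                deg[i] += 1
                deg[j] += 1
    return deg

random.seed(0)
# Homogeneous (uniform, Poisson-like) pattern in the unit square
uniform = [(random.random(), random.random()) for _ in range(300)]
# Clustered pattern: children scattered tightly around a few parent centres
parents = [(random.random(), random.random()) for _ in range(10)]
clustered = [(px + random.gauss(0, 0.02), py + random.gauss(0, 0.02))
             for px, py in random.choices(parents, k=300)]

r = 0.05
mean_uniform = sum(geometric_degrees(uniform, r)) / 300
mean_clustered = sum(geometric_degrees(clustered, r)) / 300
```

A clustered pattern has a markedly heavier-tailed degree distribution at the same density, which is the kind of signature the authors use to distinguish experimental patterns from homogeneous null models.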
Hierarchical Solution of the Traveling Salesman Problem with Random Dyadic Tilings
NASA Astrophysics Data System (ADS)
Kalmár-Nagy, Tamás; Bak, Bendegúz Dezső
We propose a hierarchical heuristic approach for solving the Traveling Salesman Problem (TSP) in the unit square. The points are partitioned with a random dyadic tiling and clusters are formed by the points located in the same tile. Each cluster is represented by its geometrical barycenter and a “coarse” TSP solution is calculated for these barycenters. Midpoints are placed at the middle of each edge in the coarse solution. Near-optimal (or optimal) minimum tours are computed for each cluster. The tours are concatenated using the midpoints yielding a solution for the original TSP. The method is tested on random TSPs (independent, identically distributed points in the unit square) up to 10,000 points as well as on a popular benchmark problem (att532 — coordinates of 532 American cities). Our solutions are 8-13% longer than the optimal ones. We also present an optimization algorithm for the partitioning to improve our solutions. This algorithm further reduces the solution errors (by several percent using 1000 iteration steps). The numerical experiments demonstrate the viability of the approach.
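The coarse-then-fine idea can be sketched as follows, with two simplifications that should be read as assumptions: a fixed k × k grid stands in for the random dyadic tiling, a greedy nearest-neighbour tour stands in for the near-optimal solvers, and the midpoint-based concatenation is reduced to plain concatenation in barycentre-tour order.

```python
import math
import random

def tour_length(points, order):
    """Total length of the closed tour visiting points in the given order."""
    return sum(math.dist(points[order[i]], points[order[(i + 1) % len(order)]])
               for i in range(len(order)))

def nearest_neighbour_tour(points):
    """Greedy nearest-neighbour tour; a stand-in for the near-optimal solvers."""
    unvisited = set(range(1, len(points)))
    order = [0]
    while unvisited:
        last = points[order[-1]]
        nxt = min(unvisited, key=lambda i: math.dist(last, points[i]))
        order.append(nxt)
        unvisited.remove(nxt)
    return order

def hierarchical_tsp(points, k=4):
    """Bin points into k x k tiles, tour the tile barycentres ('coarse' TSP),
    then tour each tile's points and concatenate in barycentre-tour order."""
    tiles = {}
    for idx, (x, y) in enumerate(points):
        key = (min(int(x * k), k - 1), min(int(y * k), k - 1))
        tiles.setdefault(key, []).append(idx)
    keys = list(tiles)
    centres = []
    for key in keys:
        xs = [points[i][0] for i in tiles[key]]
        ys = [points[i][1] for i in tiles[key]]
        centres.append((sum(xs) / len(xs), sum(ys) / len(ys)))
    coarse = nearest_neighbour_tour(centres)
    tour = []
    for ci in coarse:
        members = tiles[keys[ci]]
        local = nearest_neighbour_tour([points[i] for i in members])
        tour.extend(members[j] for j in local)
    return tour

random.seed(1)
pts = [(random.random(), random.random()) for _ in range(200)]
tour = hierarchical_tsp(pts)
```

Even this crude sketch beats an arbitrary-order tour by a wide margin, which illustrates why the hierarchical decomposition pays off before any partition optimization is applied.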
NASA Astrophysics Data System (ADS)
Walz, Michael; Leckebusch, Gregor C.
2016-04-01
Extratropical wind storms pose one of the most dangerous and loss intensive natural hazards for Europe. However, due to only 50 years of high quality observational data, it is difficult to assess the statistical uncertainty of these sparse events just based on observations. Over the last decade seasonal ensemble forecasts have become indispensable in quantifying the uncertainty of weather prediction on seasonal timescales. In this study seasonal forecasts are used in a climatological context: By making use of the up to 51 ensemble members, a broad and physically consistent statistical base can be created. This base can then be used to assess the statistical uncertainty of extreme wind storm occurrence more accurately. In order to determine the statistical uncertainty of storms with different paths of progression, a probabilistic clustering approach using regression mixture models is used to objectively assign storm tracks (either based on core pressure or on extreme wind speeds) to different clusters. The advantage of this technique is that the entire lifetime of a storm is considered for the clustering algorithm. Quadratic curves are found to describe the storm tracks most accurately. Three main clusters (diagonal, horizontal or vertical progression of the storm track) can be identified, each of which has its own particular features. Basic storm features like average velocity and duration are calculated and compared for each cluster. The main benefit of this clustering technique, however, is to evaluate whether the clusters show different degrees of uncertainty, e.g. more (less) spread for tracks approaching Europe horizontally (diagonally). This statistical uncertainty is compared for different seasonal forecast products.
Standard Giant Branches in the Washington Photometric System
NASA Technical Reports Server (NTRS)
Geisler, Doug; Sarajedini, Ata
1998-01-01
We have obtained CCD photometry in the Washington system C, T(sub 1) filters for some 850,000 objects associated with 10 Galactic globular clusters and 2 old open clusters. These clusters have well-known metal abundances, spanning a metallicity range of 2.5 dex from [Fe/H] approx. -2.25 to +0.25 at a spacing of approx. 0.2 dex. Two independent observations were obtained for each cluster, and internal checks, as well as external comparisons with existing photoelectric photometry, indicate that the final colors and magnitudes have overall uncertainties of 0.03 mag. Analogous to the method employed by Da Costa and Armandroff for V, I photometry, we then proceed to construct standard (M(sub T(sub 1)), (C - T(sub 1))(sub 0)) giant branches for these clusters adopting the Lee et al. distance scale, using some 350 stars per globular cluster to define the giant branch. We then determine the metallicity sensitivity of the (C - T(sub 1))(sub 0) color at a given M(sub T(sub 1)) value. The Washington system technique is found to have three times the metallicity sensitivity of the V, I technique. At M(sub T(sub 1)) = -2 (about a magnitude below the tip of the giant branch, roughly equivalent to M(sub I) = -3), the giant branches of 47 Tuc and M15 are separated by 1.16 magnitudes in (C - T(sub 1))(sub 0) and only 0.38 magnitudes in (V - I)(sub 0). Thus, for a given photometric accuracy, metallicities can be determined three times more precisely with the Washington technique. We find that a linear relationship between (C - T(sub 1))(sub 0) (at M(sub T(sub 1)) = -2) and metallicity exists over the full metallicity range, with an rms of only 0.04 dex. We also derive metallicity calibrations for M(sub T(sub 1)) = -2.5 and -1.5, as well as for two other metallicity scales. The Washington technique retains almost the same metallicity sensitivity at faint magnitudes, and indeed the standard giant branches are still well separated even below the horizontal branch.
The photometry is used to set upper limits in the range 0.03 - 0.09 dex for any intrinsic metallicity dispersion in the calibrating clusters. The calibrations are applicable to objects with ages approx. greater than 5 Gyr - any age effects are small or negligible for such objects. This new technique is found to have many advantages over the old two-color diagram technique for deriving metallicities from Washington photometry. In addition to only requiring 2 filters instead of 3 or 4, the new technique is generally much less sensitive to reddening and photometric errors, and the metallicity sensitivity is many times higher. The new technique is especially advantageous for metal-poor objects. The five metal-poor clusters determined by Geisler et al., using the old technique, to be much more metal-poor than previous indications, yield metallicities using the new technique which are in excellent agreement with the Zinn scale.
CMOS: Efficient Clustered Data Monitoring in Sensor Networks
Min, Jun-Ki
2013-01-01
Tiny and smart sensors enable applications that access a network of hundreds or thousands of sensors. Thus, recently, many researchers have paid attention to wireless sensor networks (WSNs). The limitation of energy is critical since most sensors are battery-powered and it is very difficult to replace batteries in cases that sensor networks are utilized outdoors. Data transmission between sensor nodes needs more energy than computation in a sensor node. In order to reduce the energy consumption of sensors, we present an approximate data gathering technique, called CMOS, based on the Kalman filter. The goal of CMOS is to efficiently obtain the sensor readings within a certain error bound. In our approach, spatially close sensors are grouped as a cluster. Since a cluster header generates approximate readings of member nodes, a user query can be answered efficiently using the cluster headers. In addition, we suggest an energy efficient clustering method to distribute the energy consumption of cluster headers. Our simulation results with synthetic data demonstrate the efficiency and accuracy of our proposed technique. PMID:24459444
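The core mechanism, transmitting only when a shared prediction drifts outside the error bound, can be sketched as follows. Note the simplification: the paper's Kalman-filter predictor is replaced here by a last-value hold, and the threshold and synthetic signal are illustrative only.

```python
def approximate_gathering(readings, eps):
    """Dead-band reporting: the sensor transmits a fresh value only when the
    header's current estimate (the last transmitted value) differs from the
    new reading by more than eps, so every header-side estimate stays within
    eps of the corresponding true reading."""
    estimate = readings[0]          # initial transmission
    transmitted = 1
    estimates = [estimate]
    for z in readings[1:]:
        if abs(z - estimate) > eps:
            estimate = z            # transmit the exact reading
            transmitted += 1
        estimates.append(estimate)
    return estimates, transmitted

# Slowly drifting synthetic temperature signal
signal = [20.0 + 0.05 * t for t in range(100)]
estimates, sent = approximate_gathering(signal, eps=0.5)
```

The energy saving comes from `sent` being far below the number of readings while the approximation error stays bounded by `eps`; a Kalman predictor, as in CMOS, tracks trends and so transmits even less often than this last-value-hold sketch.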
Using cluster analysis for medical resource decision making.
Dilts, D; Khamalah, J; Plotkin, A
1995-01-01
Escalating costs of health care delivery have in the recent past often made the health care industry investigate, adapt, and apply those management techniques relating to budgeting, resource control, and forecasting that have long been used in the manufacturing sector. A strategy that has contributed much in this direction is the definition and classification of a hospital's output into "products" or groups of patients that impose similar resource or cost demands on the hospital. Existing classification schemes have frequently employed cluster analysis in generating these groupings. Unfortunately, the myriad articles and books on clustering and classification contain few formalized selection methodologies for choosing a technique for solving a particular problem, hence they often leave the novice investigator at a loss. This paper reviews the literature on clustering, particularly as it has been applied in the medical resource-utilization domain, addresses the critical choices facing an investigator in the medical field using cluster analysis, and offers suggestions (using the example of clustering low-vision patients) for how such choices can be made.
Identifying irregularly shaped crime hot-spots using a multiobjective evolutionary algorithm
NASA Astrophysics Data System (ADS)
Wu, Xiaolan; Grubesic, Tony H.
2010-12-01
Spatial cluster detection techniques are widely used in criminology, geography, epidemiology, and other fields. In particular, spatial scan statistics are popular and efficient techniques for detecting areas of elevated crime or disease events. The majority of spatial scan approaches attempt to delineate geographic zones by evaluating the significance of clusters using likelihood ratio statistics tested with the Poisson distribution. While this can be effective, many scan statistics give preference to circular clusters, diminishing their ability to identify elongated and/or irregular shaped clusters. Although adjusting the shape of the scan window can mitigate some of these problems, both the significance of irregular clusters and their spatial structure must be accounted for in a meaningful way. This paper utilizes a multiobjective evolutionary algorithm to find clusters with maximum significance while quantitatively tracking their geographic structure. Crime data for the city of Cincinnati are utilized to demonstrate the advantages of the new approach and highlight its benefits versus more traditional scan statistics.
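The likelihood-ratio evaluation at the heart of such scan statistics can be sketched for a single candidate zone under the Poisson model (the standard Kulldorff form; the case counts below are made up for illustration):

```python
import math

def poisson_llr(cases_in, expected_in, total_cases, total_expected):
    """Log-likelihood ratio for a candidate zone under the Poisson model:
    compares the observed/expected rate inside the zone with the rate
    outside, returning 0 for zones whose rate is not elevated."""
    c, e = cases_in, expected_in
    C, E = total_cases, total_expected
    if c / e <= C / E:                       # not elevated: uninteresting
        return 0.0
    return (c * math.log(c / e)
            + (C - c) * math.log((C - c) / (E - e))
            - C * math.log(C / E))

hot = poisson_llr(30, 10, 100, 100)          # zone with an elevated rate
flat = poisson_llr(10, 10, 100, 100)         # zone at the baseline rate
```

A multiobjective search such as the one proposed here would maximize this statistic jointly with a geometric-compactness objective over candidate zones, rather than restricting the zones to circles.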
A formal concept analysis approach to consensus clustering of multi-experiment expression data
2014-01-01
Background Presently, with the increasing number and complexity of available gene expression datasets, the combination of data from multiple microarray studies addressing a similar biological question is gaining importance. The analysis and integration of multiple datasets are expected to yield more reliable and robust results since they are based on a larger number of samples and the effects of the individual study-specific biases are diminished. This is supported by recent studies suggesting that important biological signals are often preserved or enhanced by multiple experiments. An approach to combining data from different experiments is the aggregation of their clusterings into a consensus or representative clustering solution which increases the confidence in the common features of all the datasets and reveals the important differences among them. Results We propose a novel generic consensus clustering technique that applies a Formal Concept Analysis (FCA) approach for the consolidation and analysis of clustering solutions derived from several microarray datasets. These datasets are initially divided into groups of related experiments with respect to a predefined criterion. Subsequently, a consensus clustering algorithm is applied to each group resulting in a clustering solution per group. These solutions are pooled together and further analysed by employing FCA, which allows extracting valuable insights from the data and generating a gene partition over all the experiments. In order to validate the FCA-enhanced approach two consensus clustering algorithms are adapted to incorporate the FCA analysis. Their performance is evaluated on gene expression data from a multi-experiment study examining the global cell-cycle control of fission yeast.
The FCA results derived from both methods demonstrate that, although both algorithms optimize different clustering characteristics, FCA is able to overcome and diminish these differences and preserve some relevant biological signals. Conclusions The proposed FCA-enhanced consensus clustering technique is a general approach to the combination of clustering algorithms with FCA for deriving clustering solutions from multiple gene expression matrices. The experimental results presented herein demonstrate that it is a robust data integration technique able to produce a good-quality clustering solution that is representative for the whole set of expression matrices. PMID:24885407
Well Conditioned Formulations for Open Surface Scattering
2008-08-01
region on the negative real half of the complex plane and tend to cluster about a few points. With few exceptions, the eigenvalues have converged...a relatively small region on the negative real half of the complex plane and they tend to cluster about a few points. We were surprised, however, to...theory and the results from a numerical implementation. We also discuss how a 2D extension of the Poincaré-Bertrand identity could be used to develop an
Automatic Clustering Using Multi-objective Particle Swarm and Simulated Annealing
Abubaker, Ahmad; Baharum, Adam; Alrefaei, Mahmoud
2015-01-01
This paper puts forward a new automatic clustering algorithm based on Multi-Objective Particle Swarm Optimization and Simulated Annealing, “MOPSOSA”. The proposed algorithm is capable of automatic clustering which is appropriate for partitioning datasets to a suitable number of clusters. MOPSOSA combines the features of the multi-objective based particle swarm optimization (PSO) and the Multi-Objective Simulated Annealing (MOSA). Three cluster validity indices were optimized simultaneously to establish the suitable number of clusters and the appropriate clustering for a dataset. The first cluster validity index is centred on Euclidean distance, the second on the point symmetry distance, and the last cluster validity index is based on short distance. A number of algorithms have been compared with the MOPSOSA algorithm in resolving clustering problems by determining the actual number of clusters and optimal clustering. Computational experiments were carried out to study fourteen artificial and five real life datasets. PMID:26132309
ERIC Educational Resources Information Center
Zettergren, Peter
2007-01-01
A modern clustering technique was applied to age-10 and age-13 sociometric data with the purpose of identifying longitudinally stable peer status clusters. The study included 445 girls from a Swedish longitudinal study. The identified temporally stable clusters of rejected, popular, and average girls were essentially larger than corresponding…
NASA Astrophysics Data System (ADS)
Chen, Xiangping; Duan, Haiming; Cao, Biaobing; Long, Mengqiu
2018-03-01
The high-temperature first-principles molecular dynamics method used to obtain the low-energy configurations of clusters [L. L. Wang and D. D. Johnson, PRB 75, 235405 (2007)] is extended to a considerably larger temperature range by combination with the quenching technique. Our results show that there are strong correlations between the probability of obtaining the ground-state structure and the temperature: higher probabilities are obtained at relatively low temperatures (corresponding to the pre-melting temperature range). Details of the structural correlation with temperature are investigated by taking the Pt13 cluster as an example, which suggests a quite efficient method to obtain the lowest-energy geometries of metal clusters.
NASA Astrophysics Data System (ADS)
Wei, Shiqing; Castleman, A. W., Jr.
1994-02-01
Laser-based time-of-flight mass spectrometer systems fitted with reflectrons are valuable tools for investigating cluster dynamics, reactions, spectroscopy, and structures. Utilizing reflectron time-of-flight mass spectrometer techniques, both decay fractions and kinetic energy releases of metastable cluster ions can be measured with high precision. By applying related theoretical models, the desired thermochemical values of metastable species can be deduced, which are otherwise very difficult to obtain. Several examples are discussed, with attention focused on ammonia as a test case for hydrogen-bonded systems, and xenon for weaker van der Waals clusters. A brief overview of applications to investigating solvation effects on reactions and structures, delayed electron transfer, and ionization through intracluster Penning ionization is also given.
Patterns in the English language: phonological networks, percolation and assembly models
NASA Astrophysics Data System (ADS)
Stella, Massimo; Brede, Markus
2015-05-01
In this paper we provide a quantitative framework for the study of phonological networks (PNs) for the English language by carrying out principled comparisons to null models, either based on site percolation, randomization techniques, or network growth models. In contrast to previous work, we mainly focus on null models that reproduce lower order characteristics of the empirical data. We find that artificial networks matching connectivity properties of the English PN are exceedingly rare: this leads to the hypothesis that the word repertoire might have been assembled over time by preferentially introducing new words which are small modifications of old words. Our null models are able to explain the ‘power-law-like’ part of the degree distributions and generally retrieve qualitative features of the PN such as high clustering, high assortativity coefficient and small-world characteristics. However, the detailed comparison to expectations from null models also points out significant differences, suggesting the presence of additional constraints in word assembly. Key constraints we identify are the avoidance of large degrees, the avoidance of triadic closure and the avoidance of large non-percolating clusters.
Statistical analysis of dispersion relations in turbulent solar wind fluctuations using Cluster data
NASA Astrophysics Data System (ADS)
Perschke, C.; Narita, Y.
2012-12-01
Multi-spacecraft measurements enable us to resolve three-dimensional spatial structures without assuming Taylor's frozen-in-flow hypothesis. This is very useful for studying the frequency-wave vector diagram in solar wind turbulence through direct determination of three-dimensional wave vectors. The existence and evolution of dispersion relations and their role in fully developed plasma turbulence have been drawing the attention of physicists, in particular the question of whether solar wind turbulence represents the kinetic Alfvén or whistler mode as the carrier of spectral energy among different scales through wave-wave interactions. We investigate solar wind intervals of Cluster data for various flow velocities with a high-resolution wave vector analysis method, the Multi-point Signal Resonator technique, at a tetrahedral separation of about 100 km. Magnetic field data and ion data are used to determine the frequency-wave vector diagrams in the co-moving frame of the solar wind. We find primarily perpendicular wave vectors in solar wind turbulence, which justifies the earlier discussions about kinetic Alfvén or whistler waves. The frequency-wave vector diagrams confirm (a) wave vector anisotropy and (b) scattering in frequencies.
Lopez-Meyer, Paulo; Schuckers, Stephanie; Makeyev, Oleksandr; Fontana, Juan M; Sazonov, Edward
2012-09-01
The number of distinct foods consumed in a meal is of significant clinical concern in the study of obesity and other eating disorders. This paper proposes the use of information contained in chewing and swallowing sequences for meal segmentation by food types. Data collected from experiments of 17 volunteers were analyzed using two different clustering techniques. First, an unsupervised clustering technique, Affinity Propagation (AP), was used to automatically identify the number of segments within a meal. Second, performance of the unsupervised AP method was compared to a supervised learning approach based on Agglomerative Hierarchical Clustering (AHC). While the AP method was able to obtain 90% accuracy in predicting the number of food items, the AHC achieved an accuracy >95%. Experimental results suggest that the proposed models of automatic meal segmentation may be utilized as part of an integral application for objective Monitoring of Ingestive Behavior in free living conditions.
Multifocal visual evoked potential and automated perimetry abnormalities in strabismic amblyopes.
Greenstein, Vivienne C; Eggers, Howard M; Hood, Donald C
2008-02-01
To compare visual field abnormalities obtained with standard automated perimetry (SAP) to those obtained with the multifocal visual evoked potential (mfVEP) technique in strabismic amblyopes. Humphrey 24-2 visual fields (HVF) and mfVEPs were obtained from each eye of 12 strabismic amblyopes. For the mfVEP, amplitudes and latencies were analyzed and probability plots were derived. Multifocal VEP and HVF hemifields were abnormal if they had clusters of two or more contiguous points at p < 0.01, or three or more contiguous points at p < 0.05 with at least one at p < 0.01. An eye was abnormal if it had an abnormal hemifield. On SAP, amblyopic eyes had significantly higher foveal thresholds (p = 0.003) and lower mean deviation values (p = 0.005) than fellow eyes. For the mfVEP, 11 amblyopic and 6 fellow eyes were abnormal. Of the 11 amblyopic eyes, 6 were abnormal on SAP. The deficits extended from the center to mid periphery. Monocular mfVEP latencies were significantly decreased for amblyopic eyes compared with control eyes (p < 0.0002). Both techniques revealed deficits in visual function across the visual field in strabismic amblyopes, but the mfVEP revealed deficits in fellow eyes and in more amblyopic eyes. In addition, mfVEP response latencies for amblyopic eyes were shorter than normal.
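The cluster criterion quoted above is simple to state programmatically; a sketch on a one-dimensional run of probability values follows (the real analysis applies contiguity on the 2-D visual-field grid, which this toy linear version ignores):

```python
def hemifield_abnormal(p_values):
    """Flag a hemifield as abnormal if it contains either two or more
    contiguous points at p < 0.01, or three or more contiguous points at
    p < 0.05 with at least one of them at p < 0.01."""
    run = []                               # current run of contiguous p < 0.05 points
    for p in list(p_values) + [1.0]:       # trailing sentinel flushes the last run
        if p < 0.05:
            run.append(p)
            continue
        if len(run) >= 3 and any(q < 0.01 for q in run):
            return True
        if any(run[i] < 0.01 and run[i + 1] < 0.01 for i in range(len(run) - 1)):
            return True
        run = []
    return False
```

Requiring contiguity in this way suppresses isolated significant points, which would otherwise flag hemifields by chance when many locations are tested at once.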
Clustering and flow around a sphere moving into a grain cloud.
Seguin, A; Lefebvre-Lepot, A; Faure, S; Gondret, P
2016-06-01
A bidimensional simulation of a sphere moving at constant velocity into a cloud of smaller spherical grains, far from any boundaries and without gravity, is presented with a non-smooth contact dynamics method. A dense granular "cluster" zone builds progressively around the moving sphere until a stationary regime appears with a constant upstream cluster size. The key point is that the upstream cluster size increases with the initial solid fraction, but the cluster packing fraction takes an approximately constant value independent of it. Although the upstream cluster size around the moving sphere diverges when the initial solid fraction approaches a critical value, the drag force exerted by the grains on the sphere does not. The detailed analysis of the local strain rate and local stress fields in the non-parallel granular flow inside the cluster allows us to extract the local invariants of the two tensors: dilation rate, shear rate, pressure, and shear stress. Despite different spatial variations of these invariants, the local friction coefficient μ appears to depend only on the local inertial number I, as well as the local solid fraction, which means that a local rheology does exist in the present non-parallel flow. Notably, the spatial variations of I inside the cluster do not depend on the sphere velocity and explore only a small range around the value one.
Song, Weiran; Wang, Hui; Maguire, Paul; Nibouche, Omar
2018-06-07
Partial Least Squares Discriminant Analysis (PLS-DA) is one of the most effective multivariate analysis methods for spectral data analysis, which extracts latent variables and uses them to predict responses. In particular, it is an effective method for handling high-dimensional and collinear spectral data. However, PLS-DA does not explicitly address data multimodality, i.e., within-class multimodal distribution of data. In this paper, we present a novel method termed nearest clusters based PLS-DA (NCPLS-DA) for addressing the multimodality and nonlinearity issues explicitly and improving the performance of PLS-DA on spectral data classification. The new method applies hierarchical clustering to divide samples into clusters and calculates the corresponding centre of every cluster. For a given query point, only clusters whose centres are nearest to such a query point are used for PLS-DA. Such a method can provide a simple and effective tool for separating multimodal and nonlinear classes into clusters which are locally linear and unimodal. Experimental results on 17 datasets, including 12 UCI and 5 spectral datasets, show that NCPLS-DA can outperform 4 baseline methods, namely, PLS-DA, kernel PLS-DA, local PLS-DA and k-NN, achieving the highest classification accuracy most of the time.
Evaluating effective pair and multisite interactions for Ni-Mo system
NASA Astrophysics Data System (ADS)
Banerjee, Rumu H.; Arya, A.; Banerjee, S.
2018-04-01
The cluster expansion (CE) method was used to calculate the energies of various Ni-Mo phases. Clusters comprising a few nearest neighbours can describe any phase of the Ni-Mo system given a suitable choice of effective pair and multisite interaction parameters (ECIs). The ECIs were evaluated in the present study by fitting the ground-state energies obtained from first-principles calculations. The ECIs evaluated for the Ni-Mo system were mostly pair clusters, followed by triplet and quadruplet clusters, with cluster diameters in the range 2.54-10.20 Å. The ECI values diminished for multi-body (triplet and quadruplet) clusters compared with 2-point or pair clusters, indicating good convergence of the CE model. With these ECIs, the predicted energies of all the Ni-Mo structures across the Mo concentration range 0-100 at% were obtained. The quantitative difference between the energies calculated by the CE approach and first principles is very small (< 0.026 meV/atom). The appreciable values of the 2-point ECIs up to the 4th nearest neighbour reveal that two-body interactions are dominant in the Ni-Mo system. These ECIs are compared with reported values of composition-dependent effective pair interactions evaluated by first principles as well as by the Monte Carlo method.
Critical behavior of the contact process on small-world networks
NASA Astrophysics Data System (ADS)
Ferreira, Ronan S.; Ferreira, Silvio C.
2013-11-01
We investigate the role of clustering on the critical behavior of the contact process (CP) on small-world networks using the Watts-Strogatz (WS) network model with an edge rewiring probability p. The critical point is well predicted by a homogeneous cluster approximation in the limit of vanishing clustering (p → 1). The critical exponents and dimensionless moment ratios of the CP are in agreement with those predicted by the mean-field theory for any p > 0. This independence from the network clustering shows that the small-world property is a sufficient condition for the mean-field theory to correctly predict the universality of the model. Moreover, we compare the CP dynamics on WS networks with rewiring probability p = 1 and on random regular networks and show that the weak heterogeneity of the WS network slightly changes the critical point but does not alter other critical quantities of the model.
Accurate Grid-based Clustering Algorithm with Diagonal Grid Searching and Merging
NASA Astrophysics Data System (ADS)
Liu, Feng; Ye, Chengcheng; Zhu, Erzhou
2017-09-01
Due to the advent of big data, data mining technology has attracted more and more attention. As an important data analysis method, grid clustering is fast but has relatively low accuracy. This paper presents an improved clustering algorithm combining grid and density parameters. The algorithm first divides the data space into valid meshes and invalid meshes through grid parameters. Secondly, starting from the first point of the diagonal of the grids, the algorithm merges the valid meshes in the "horizontal right, vertical down" direction. Furthermore, through boundary grid processing, the invalid grids are searched and merged when the adjacent left, above, and diagonal-direction grids are all valid ones. By doing this, the accuracy of clustering is improved. The experimental results show that the proposed algorithm is accurate and relatively fast when compared with some popularly used algorithms.
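A stripped-down sketch of the grid/density idea follows: a uniform mesh over the unit square, a minimum-count density test for cell validity, and merging of orthogonally adjacent valid cells. The paper's diagonal-direction search order and boundary-grid handling are omitted here, and the grid size and density threshold are illustrative.

```python
def grid_cluster(points, grid=10, min_pts=3):
    """Bin points into a grid x grid mesh, keep cells holding at least
    min_pts points ('valid' cells), and flood-fill orthogonally adjacent
    valid cells into clusters."""
    cells = {}
    for i, (x, y) in enumerate(points):
        key = (min(int(x * grid), grid - 1), min(int(y * grid), grid - 1))
        cells.setdefault(key, []).append(i)
    valid = {k for k, members in cells.items() if len(members) >= min_pts}
    labels = {}
    n_clusters = 0
    for start in sorted(valid):
        if start in labels:
            continue
        stack = [start]                  # flood-fill one block of valid cells
        labels[start] = n_clusters
        while stack:
            cx, cy = stack.pop()
            for nb in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                if nb in valid and nb not in labels:
                    labels[nb] = n_clusters
                    stack.append(nb)
        n_clusters += 1
    point_labels = {i: labels[k]
                    for k, members in cells.items() if k in valid
                    for i in members}
    return point_labels, n_clusters

blob_a = [(0.11, 0.11), (0.12, 0.13), (0.14, 0.12), (0.13, 0.15)]
blob_b = [(0.81, 0.82), (0.83, 0.81), (0.82, 0.84), (0.84, 0.83)]
labels, k = grid_cluster(blob_a + blob_b)
```

Because the merge step touches cells rather than points, the cost scales with the number of occupied cells, which is what makes grid clustering fast; the paper's contribution is recovering accuracy at the cluster boundaries.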
High-dimensional cluster analysis with the Masked EM Algorithm
Kadir, Shabnam N.; Goodman, Dan F. M.; Harris, Kenneth D.
2014-01-01
Cluster analysis faces two problems in high dimensions: first, the “curse of dimensionality” that can lead to overfitting and poor generalization performance; and second, the sheer time taken for conventional algorithms to process large amounts of high-dimensional data. We describe a solution to these problems, designed for the application of “spike sorting” for next-generation high channel-count neural probes. In this problem, only a small subset of features provide information about the cluster membership of any one data vector, but this informative feature subset is not the same for all data points, rendering classical feature selection ineffective. We introduce a “Masked EM” algorithm that allows accurate and time-efficient clustering of up to millions of points in thousands of dimensions. We demonstrate its applicability to synthetic data, and to real-world high-channel-count spike sorting data. PMID:25149694
NASA Astrophysics Data System (ADS)
Jha, S. K.; Brockman, R. A.; Hoffman, R. M.; Sinha, V.; Pilchak, A. L.; Porter, W. J.; Buchanan, D. J.; Larsen, J. M.; John, R.
2018-05-01
Principal component analysis and fuzzy c-means clustering algorithms were applied to slip-induced strain and geometric metric data in an attempt to discover unique microstructural configurations and their frequencies of occurrence in statistically representative instantiations of a titanium alloy microstructure. Grain-averaged fatigue indicator parameters were calculated for the same instantiation. The fatigue indicator parameters strongly correlated with the spatial location of the microstructural configurations in the principal components space. The fuzzy c-means clustering method identified clusters of data that varied in terms of their average fatigue indicator parameters. Furthermore, the number of points in each cluster was inversely correlated to the average fatigue indicator parameter. This analysis demonstrates that data-driven methods have significant potential for providing unbiased determination of unique microstructural configurations and their frequencies of occurrence in a given volume from the point of view of strain localization and fatigue crack initiation.
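The two-stage analysis, PCA followed by fuzzy c-means, can be sketched with a plain NumPy implementation. This is a generic textbook version (fuzziness exponent m = 2, random initialisation); the numbers of components and clusters used in the study are not assumed here.

```python
import numpy as np

def pca(X, n_components):
    """Project centred data onto its leading principal components via SVD."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def fuzzy_c_means(X, c, m=2.0, n_iter=100, seed=0):
    """Plain fuzzy c-means: returns soft memberships U (n x c) and centres."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        Um = U ** m
        centres = (Um.T @ X) / Um.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centres[None], axis=2) + 1e-12
        # Standard FCM membership update: u_ij = 1 / sum_k (d_ij/d_ik)^(2/(m-1))
        U = 1.0 / (d ** (2 / (m - 1))
                   * np.sum(d ** (-2 / (m - 1)), axis=1, keepdims=True))
    return U, centres
```

In the paper's setting, the rows of `X` would be the slip-induced strain and geometric metrics per microstructural configuration, and cluster-averaged fatigue indicator parameters would then be computed per membership-weighted group.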
Nonlocal screening effects on core-level photoemission spectra investigated by large-cluster models
DOE Office of Scientific and Technical Information (OSTI.GOV)
Okada, K.; Kotani, A.
1995-08-15
The copper 2p core-level x-ray photoemission spectrum in CuO2 plane systems is calculated by means of large-cluster models to investigate in detail the nonlocal screening effects, which were pointed out by van Veenendaal et al. [Phys. Rev. B 47, 11462 (1993)]. Calculating the hole distributions for the initial and final states of photoemission, we show that the atomic coordination in a cluster strongly affects accessible final states. Accordingly, we point out that the interpretation for Cu3O10 given by van Veenendaal et al. is not always general. Moreover, it is shown that the spectrum can be remarkably affected by whether or not the O 2pπ orbits are taken into account in the calculations. We also introduce a Hartree-Fock approximation in order to treat much larger-cluster models.
Effect of denoising on supervised lung parenchymal clusters
NASA Astrophysics Data System (ADS)
Jayamani, Padmapriya; Raghunath, Sushravya; Rajagopalan, Srinivasan; Karwoski, Ronald A.; Bartholmai, Brian J.; Robb, Richard A.
2012-03-01
Denoising is a critical preconditioning step for quantitative analysis of medical images. Despite promises for more consistent diagnosis, denoising techniques are seldom explored in clinical settings. While this may be attributed to the esoteric nature of the parameter-sensitive algorithms, the lack of quantitative measures of their efficacy in enhancing clinical decision making is a primary cause of physician apathy. This paper addresses this issue by exploring the effect of denoising on the integrity of supervised lung parenchymal clusters. Multiple Volumes of Interest (VOIs) were selected across multiple high-resolution CT scans to represent samples of different patterns (normal, emphysema, ground glass, honeycombing and reticular). The VOIs were labeled through consensus of four radiologists. The original datasets were filtered by multiple denoising techniques (median filtering, anisotropic diffusion, bilateral filtering and non-local means) and the corresponding filtered VOIs were extracted. A plurality of cluster indices based on multiple histogram-based pair-wise similarity measures were used to assess the quality of supervised clusters in the original and filtered space. The resultant rank orders were analyzed using the Borda criteria to find the denoising-similarity measure combination that has the best cluster quality. Our exhaustive analysis reveals (a) for a number of similarity measures, the cluster quality is inferior in the filtered space; and (b) for measures that benefit from denoising, a simple median filtering outperforms non-local means and bilateral filtering. Our study suggests the need to judiciously choose, if required, a denoising technique that does not deteriorate the integrity of supervised clusters.
The energy landscape of glassy dynamics on the amorphous hafnium diboride surface
NASA Astrophysics Data System (ADS)
Nguyen, Duc; Mallek, Justin; Cloud, Andrew N.; Abelson, John R.; Girolami, Gregory S.; Lyding, Joseph; Gruebele, Martin
2014-11-01
Direct visualization of the dynamics of structural glasses and amorphous solids on the sub-nanometer scale provides rich information unavailable from bulk or conventional single molecule techniques. We study the surface of hafnium diboride, a conductive ultrahigh temperature ceramic material that can be grown in amorphous films. Our scanning tunneling movies have a second-to-hour dynamic range and single-point current measurements extend that to the millisecond-to-minute time scale. On the a-HfB2 glass surface, two-state hopping of 1-2 nm diameter cooperatively rearranging regions or "clusters" occurs on time scales from sub-milliseconds to hours. We characterize individual clusters in detail through high-resolution (<0.5 nm) imaging, scanning tunneling spectroscopy and voltage modulation, ruling out individual atoms, diffusing adsorbates, or pinned charges as the origin of the observed two-state hopping. Smaller clusters are more likely to hop, larger ones are more likely to be immobile. HfB2 has a very high bulk glass transition temperature Tg, and we observe no three-state hopping or sequential two-state hopping previously seen on lower Tg glass surfaces. The electronic density of states of clusters does not change when they hop up or down, allowing us to calibrate an accurate relative z-axis scale. By directly measuring and histogramming single cluster vertical displacements, we can reconstruct the local free energy landscape of individual clusters, complete with activation barrier height, a reaction coordinate in nanometers, and the shape of the free energy landscape basins between which hopping occurs. The experimental images are consistent with the compact shape of α-relaxors predicted by random first order transition theory, whereas the rapid hopping rate, even taking less confined motion at the surface into account, is consistent with β-relaxations. We propose how "mixed" features can show up in the surface dynamics of glasses.
Réal, Florent; Ordejón, Belén; Vallet, Valérie; Flament, Jean-Pierre; Schamps, Joël
2009-11-21
New ab initio embedded-cluster calculations devoted to simulating the electronic spectroscopy of Bi(3+) impurities in Y(2)O(3) sesquioxide for substitutions in either S(6) or C(2) cationic sites have been carried out taking special care of the quality of the environment. A considerable quantitative improvement with respect to previous studies [F. Real et al. J. Chem. Phys. 125, 174709 (2006); F. Real et al. J. Chem. Phys. 127, 104705 (2007)] is brought by using environments of the impurities obtained via supercell techniques that allow the whole (pseudo) crystal to relax (WCR geometries) instead of environments obtained from local relaxation of the first coordination shell only (FSR geometries) within the embedded cluster approach, as was done previously. In particular the uniform 0.4 eV discrepancy of absorption energies found previously with FSR environments disappears completely when the new WCR environments of the impurities are employed. Moreover emission energies and hence Stokes shifts are in much better agreement with experiment. These decisive improvements are mainly due to a lowering of the local point-group symmetry (S(6)-->C(3) and C(2)-->C(1)) when relaxing the geometry of the emitting (lowest) triplet state. This symmetry lowering was not observed in FSR embedded cluster relaxations because the crystal field of the embedding frozen at the genuine pure crystal positions seems to be a more important driving force than the interactions within the cluster, thus constraining the overall symmetry of the system. Variations of the doping rate are found to have negligible influence on the spectra. In conclusion, the use of WCR environments may be crucial to render the structural distortions occurring in a doped crystal and it may help to significantly improve the embedded-cluster methodology to reach the quantitative accuracy necessary to interpret and predict luminescence properties of doped materials of this type.
Light clusters and pasta phases in warm and dense nuclear matter
NASA Astrophysics Data System (ADS)
Avancini, Sidney S.; Ferreira, Márcio; Pais, Helena; Providência, Constança; Röpke, Gerd
2017-04-01
The pasta phases are calculated for warm stellar matter in a framework of relativistic mean-field models, including the possibility of light cluster formation. Results from three different semiclassical approaches are compared with a quantum statistical calculation. Light clusters are considered as point-like particles, and their abundances are determined from the minimization of the free energy. The couplings of the light clusters to mesons are determined from experimental chemical equilibrium constants and many-body quantum statistical calculations. The effect of these light clusters on the chemical potentials is also discussed. It is shown that, by including heavy clusters, light clusters are present up to larger nucleonic densities, although with smaller mass fractions.
NASA Astrophysics Data System (ADS)
Radu, R.; Pintilie, I.; Nistor, L. C.; Fretwurst, E.; Lindstroem, G.; Makarenko, L. F.
2015-04-01
This work focuses on the generation, time evolution, and impact on electrical performance of radiation-induced active defects in silicon diodes. n-type silicon diodes were irradiated with electrons with energies ranging from 1.5 MeV to 27 MeV. It is shown that the formation of small clusters starts already after irradiation with a high fluence of 1.5 MeV electrons. An increase of the introduction rates of both point defects and small clusters with increasing energy is seen, showing saturation for electron energies above ˜15 MeV. The changes in the leakage current at low irradiation fluence values proved to be determined by the change in the configuration of the tri-vacancy (V3). Similar to V3, other cluster-related defects show bistability, indicating that they might be associated with larger vacancy clusters. The change of the space charge density with irradiation and with annealing time after irradiation is fully described by accounting for the radiation-induced trapping centers. High-resolution electron microscopy investigations correlated with the annealing experiments revealed changes in the spatial structure of the defects. Furthermore, it is shown that while the generation of point defects is well described by the classical Non-Ionizing Energy Loss (NIEL), the formation of small defect clusters is better described by the "effective NIEL" using results from molecular dynamics simulations.
A phase field model for segregation and precipitation induced by irradiation in alloys
NASA Astrophysics Data System (ADS)
Badillo, A.; Bellon, P.; Averback, R. S.
2015-04-01
A phase field model is introduced to model the evolution of multicomponent alloys under irradiation, including radiation-induced segregation and precipitation. The thermodynamic and kinetic components of this model are derived using a mean-field model. The mobility coefficient and the contribution of chemical heterogeneity to free energy are rescaled by the cell size used in the phase field model, yielding microstructural evolutions that are independent of the cell size. A new treatment is proposed for point defect clusters, using a mixed discrete-continuous approach to capture the stochastic character of defect cluster production in displacement cascades, while retaining the efficient modeling of the fate of these clusters using diffusion equations. The model is tested on unary and binary alloy systems using two-dimensional simulations. In a unary system, the evolution of point defects under irradiation is studied in the presence of defect clusters, either pre-existing ones or those created by irradiation, and compared with rate theory calculations. Binary alloys with zero and positive heats of mixing are then studied to investigate the effect of point defect clustering on radiation-induced segregation and precipitation in undersaturated solid solutions. Lastly, irradiation conditions and alloy parameters leading to irradiation-induced homogeneous precipitation are investigated. The results are discussed in the context of experimental results reported for Ni-Si and Al-Zn undersaturated solid solutions subjected to irradiation.
Thermodynamics and proton activities of protic ionic liquids with quantum cluster equilibrium theory
NASA Astrophysics Data System (ADS)
Ingenmey, Johannes; von Domaros, Michael; Perlt, Eva; Verevkin, Sergey P.; Kirchner, Barbara
2018-05-01
We applied the binary Quantum Cluster Equilibrium (bQCE) method to a number of alkylammonium-based protic ionic liquids in order to predict boiling points, vaporization enthalpies, and proton activities. The theory combines statistical thermodynamics of van-der-Waals-type clusters with ab initio quantum chemistry and yields the partition functions (and associated thermodynamic potentials) of binary mixtures over a wide range of thermodynamic phase points. Unlike conventional cluster approaches that are limited to the prediction of thermodynamic properties, dissociation reactions can be effortlessly included into the bQCE formalism, giving access to ionicities, as well. The method is open to quantum chemical methods at any level of theory, but combination with low-cost composite density functional theory methods and the proposed systematic approach to generate cluster sets provides a computationally inexpensive and mostly parameter-free way to predict such properties at good-to-excellent accuracy. Boiling points can be predicted within an accuracy of 50 K, reaching excellent accuracy for ethylammonium nitrate. Vaporization enthalpies are predicted within an accuracy of 20 kJ mol-1 and can be systematically interpreted on a molecular level. We present the first theoretical approach to predict proton activities in protic ionic liquids, with results fitting well into the experimentally observed correlation. Furthermore, enthalpies of vaporization were measured experimentally for some alkylammonium nitrates and an excellent linear correlation with vaporization enthalpies of their respective parent amines is observed.
Intra-cluster Globular Clusters in a Simulated Galaxy Cluster
NASA Astrophysics Data System (ADS)
Ramos-Almendares, Felipe; Abadi, Mario; Muriel, Hernán; Coenda, Valeria
2018-01-01
Using a cosmological dark matter simulation of a galaxy-cluster halo, we follow the temporal evolution of its globular cluster population. To mimic the red and blue globular cluster populations, we select at high redshift (z∼ 1) two sets of particles from individual galactic halos constrained by the fact that, at redshift z = 0, they have density profiles similar to observed ones. At redshift z = 0, approximately 60% of our selected globular clusters were removed from their original halos building up the intra-cluster globular cluster population, while the remaining 40% are still gravitationally bound to their original galactic halos. As the blue population is more extended than the red one, the intra-cluster globular cluster population is dominated by blue globular clusters, with a relative fraction that grows from 60% at redshift z = 0 up to 83% for redshift z∼ 2. In agreement with observational results for the Virgo galaxy cluster, the blue intra-cluster globular cluster population is more spatially extended than the red one, pointing to a tidally disrupted origin.
The Application of Clustering Techniques to Citation Data. Research Reports Series B No. 6.
ERIC Educational Resources Information Center
Arms, William Y.; Arms, Caroline
This report describes research carried out as part of the Design of Information Systems in the Social Sciences (DISISS) project. Cluster analysis techniques were applied to a machine readable file of bibliographic data in the form of cited journal titles in order to identify groupings which could be used to structure bibliographic files. Practical…
NASA Astrophysics Data System (ADS)
Sahiner, Berkman; Gurcan, Metin N.; Chan, Heang-Ping; Hadjiiski, Lubomir M.; Petrick, Nicholas; Helvie, Mark A.
2002-05-01
We are developing new techniques to improve the accuracy of computerized microcalcification detection by using the joint two-view information on craniocaudal (CC) and mediolateral-oblique (MLO) views. After cluster candidates were detected using a single-view detection technique, candidates on CC and MLO views were paired using their radial distances from the nipple. Object pairs were classified with a joint two-view classifier that used the similarity of objects in a pair. Each cluster candidate was also classified as a true microcalcification cluster or a false-positive (FP) using its single-view features. The outputs of these two classifiers were fused. A data set of 38 pairs of mammograms from our database was used to train the new detection technique. The independent test set consisted of 77 pairs of mammograms from the University of South Florida public database. At a per-film sensitivity of 70%, the FP rates were 0.17 and 0.27 with the fusion and single-view detection methods, respectively. Our results indicate that correspondence of cluster candidates on two different views provides valuable additional information for distinguishing false from true microcalcification clusters.
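The two-view scheme, pairing candidates by radial distance from the nipple and then fusing classifier outputs, can be sketched generically. The tolerance `tol_mm` and the weighted-average fusion rule below are illustrative assumptions; the abstract does not specify the actual matching criterion or fusion rule.

```python
def pair_candidates(cc_radii, mlo_radii, tol_mm):
    """Pair cluster candidates detected on the CC and MLO views by the
    similarity of their radial distances from the nipple (mm).
    Returns index pairs (i, j) of candidates considered correspondent."""
    return [(i, j)
            for i, r_cc in enumerate(cc_radii)
            for j, r_mlo in enumerate(mlo_radii)
            if abs(r_cc - r_mlo) <= tol_mm]

def fuse_scores(single_view_score, two_view_score, w=0.5):
    """Late fusion of the single-view and joint two-view classifier
    outputs; a weighted average is one simple choice of fusion rule."""
    return w * single_view_score + (1 - w) * two_view_score
```

A candidate that pairs with a similar object on the other view would receive a high `two_view_score`, boosting its fused score relative to unpaired false positives.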
Modulation aware cluster size optimisation in wireless sensor networks
NASA Astrophysics Data System (ADS)
Sriram Naik, M.; Kumar, Vinay
2017-07-01
Wireless sensor networks (WSNs) play a great role because of their numerous advantages to mankind. The main challenge with WSNs is energy efficiency. In this paper, we focus on energy minimisation with the help of cluster size optimisation, taking into account modulation effects when the nodes are not able to communicate using a baseband communication technique. Cluster size optimisation is an important technique to improve the performance of WSNs: it provides improvements in energy efficiency, network scalability, network lifetime and latency. We propose an analytical expression for cluster size optimisation using the traditional sensing model of nodes for a square sensing field, with consideration of modulation effects. Energy minimisation can be achieved by changing the modulation scheme (BPSK, QPSK, 16-QAM, 64-QAM, etc.), so we consider the effect of different modulation techniques on cluster formation. The nodes in the sensing field are randomly and uniformly deployed. It is also observed that placing the base station at the centre of the sensing field allows only a small number of modulation schemes to work in an energy-efficient manner, whereas placing it at a corner of the sensing field allows a large number of modulation schemes to do so.
Density-cluster NMA: A new protein decomposition technique for coarse-grained normal mode analysis.
Demerdash, Omar N A; Mitchell, Julie C
2012-07-01
Normal mode analysis has emerged as a useful technique for investigating protein motions on long time scales. This is largely due to the advent of coarse-graining techniques, particularly Hooke's Law-based potentials and the rotational-translational blocking (RTB) method for reducing the size of the force-constant matrix, the Hessian. Here we present a new method for domain decomposition for use in RTB that is based on hierarchical clustering of atomic density gradients, which we call Density-Cluster RTB (DCRTB). The method reduces the number of degrees of freedom by 85-90% compared with the standard blocking approaches. We compared the normal modes from DCRTB against standard RTB using 1-4 residues in sequence in a single block, with good agreement between the two methods. We also show that Density-Cluster RTB and standard RTB perform well in capturing the experimentally determined direction of conformational change. Significantly, we report superior correlation of DCRTB with B-factors compared with 1-4 residue per block RTB. Finally, we show significant reduction in computational cost for Density-Cluster RTB that is nearly 100-fold for many examples. Copyright © 2012 Wiley Periodicals, Inc.
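The blocking step, grouping atoms into rigid blocks by hierarchical clustering, can be illustrated with a naive agglomerative pass. This is a stand-in: the paper clusters atomic density gradients, while the sketch below clusters raw coordinates with single linkage in an O(n³) loop, adequate only for a demo.

```python
import numpy as np

def agglomerative_blocks(coords, n_blocks):
    """Naive single-linkage agglomerative clustering of atom coordinates
    into rigid blocks -- a stand-in for the paper's hierarchical
    clustering of atomic density gradients (not reimplemented here)."""
    clusters = [[i] for i in range(len(coords))]
    while len(clusters) > n_blocks:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between closest members.
                d = min(np.linalg.norm(coords[i] - coords[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters.pop(b))
    labels = np.empty(len(coords), dtype=int)
    for cid, members in enumerate(clusters):
        labels[members] = cid
    return labels
```

Each resulting block would then contribute six rigid-body degrees of freedom to the RTB-reduced Hessian, in place of three per atom.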
NASA Astrophysics Data System (ADS)
van Rossum, Anne C.; Lin, Hai Xiang; Dubbeldam, Johan; van der Herik, H. Jaap
2018-04-01
In machine vision, typical heuristic methods to extract parameterized objects out of raw data points are the Hough transform and RANSAC. Bayesian models carry the promise of optimally extracting such parameterized objects given a correct definition of the model and the type of noise at hand. One category of solvers for Bayesian models are Markov chain Monte Carlo (MCMC) methods. Naive implementations of MCMC methods suffer from slow convergence in machine vision due to the complexity of the parameter space. Toward this end, blocked Gibbs and split-merge samplers have been developed that assign multiple data points to clusters at once. In this paper we introduce a new split-merge sampler, the triadic split-merge sampler, that performs steps between two and three randomly chosen clusters. This has two advantages. First, it reduces the asymmetry between the split and merge steps. Second, it is able to propose a new cluster that is composed of data points from two different clusters. Both advantages speed up convergence, which we demonstrate on a line extraction problem. We show that the triadic split-merge sampler outperforms the conventional split-merge sampler. Although this new MCMC sampler is demonstrated in a machine vision context, its applications extend to the very general domain of statistical inference.
Successful ageing: A study of the literature using citation network analysis.
Kusumastuti, Sasmita; Derks, Marloes G M; Tellier, Siri; Di Nucci, Ezio; Lund, Rikke; Mortensen, Erik Lykke; Westendorp, Rudi G J
2016-11-01
Ageing is accompanied by an increased risk of disease and a loss of functioning on several bodily and mental domains and some argue that maintaining health and functioning is essential for a successful old age. Paradoxically, studies have shown that overall wellbeing follows a curvilinear pattern with the lowest point at middle age but increases thereafter up to very old age. To shed further light on this paradox, we reviewed the existing literature on how scholars define successful ageing and how they weigh the contribution of health and functioning to define success. We performed a novel, hypothesis-free and quantitative analysis of citation networks exploring the literature on successful ageing that exists in the Web of Science Core Collection Database using the CitNetExplorer software. Outcomes were visualized using timeline-based citation patterns. The clusters and sub-clusters of citation networks identified were starting points for in-depth qualitative analysis. Within the literature from 1902 through 2015, two distinct citation networks were identified. The first cluster had 1146 publications and 3946 citation links. It focused on successful ageing from the perspective of older persons themselves. Analysis of the various sub-clusters emphasized the importance of coping strategies, psycho-social engagement, and cultural differences. The second cluster had 609 publications and 1682 citation links and viewed successful ageing based on the objective measurements as determined by researchers. Subsequent sub-clustering analysis pointed to different domains of functioning and various ways of assessment. In the current literature two mutually exclusive concepts of successful ageing are circulating that depend on whether the individual himself or an outsider judges the situation. These different points of view help to explain the disability paradox, as successful ageing lies in the eyes of the beholder. Copyright © 2016 The Authors. 
Published by Elsevier Ireland Ltd. All rights reserved.
Adaptive Water Sampling based on Unsupervised Clustering
NASA Astrophysics Data System (ADS)
Py, F.; Ryan, J.; Rajan, K.; Sherman, A.; Bird, L.; Fox, M.; Long, D.
2007-12-01
Autonomous Underwater Vehicles (AUVs) are widely used for oceanographic surveys, during which data is collected from a number of on-board sensors. Engineers and scientists at MBARI have extended this approach by developing a water sampler specially for the AUV, which can sample a specific patch of water at a specific time. The sampler, named the Gulper, captures 2 liters of seawater in less than 2 seconds on a 21" MBARI Odyssey AUV. Each sample chamber of the Gulper is filled with seawater through a one-way valve, which protrudes through the fairing of the AUV. This new kind of device raises a new problem: when to trigger the Gulper autonomously? For example, scientists interested in studying the mobilization and transport of shelf sediments would like to detect intermediate nepheloid layers (INLs). To be able to detect this phenomenon we need to extract a model based on AUV sensors that can detect this feature in situ. The formation of such a model is not obvious, as identification of this feature is generally based on data from multiple sensors. We have developed an unsupervised data clustering technique to extract the different features, which are then used for on-board classification and triggering of the Gulper. We use a three-phase approach: 1) Use data from past missions to learn the different classes of data from sensor inputs; the clustering algorithm extracts the set of features that can be distinguished within this large data set. 2) Scientists on shore then identify these features and point out which correspond to those of interest (e.g. nepheloid layer, upwelling material, etc.). 3) Embed the corresponding classifier into the AUV control system to indicate the most probable feature of the water depending on sensory input. The triggering algorithm looks at this result and triggers the Gulper if the classifier indicates that we are within the feature of interest with a predetermined threshold of confidence.
We have deployed this method of online classification and sampling based on AUV depth and HOBI Labs Hydroscat-2 sensor data. Using approximately 20,000 data samples, the clustering algorithm generated 14 clusters, with one identified as corresponding to a nepheloid layer. We demonstrate that such a technique can be used to reliably and efficiently sample water based on multiple sources of data in real time.
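The three-phase approach can be sketched end to end: cluster archived sensor data offline, let a scientist label the cluster of interest, then trigger on board when a new sensor vector falls close enough to that cluster. The k-means clustering and the distance-threshold confidence rule below are illustrative stand-ins for the (unspecified) algorithms actually deployed.

```python
import numpy as np

def kmeans(X, k, n_iter=50):
    """Plain Lloyd's k-means (first k points as initial centres --
    adequate for a demo, not for production use)."""
    centres = X[:k].copy()
    for _ in range(n_iter):
        labels = np.argmin(
            np.linalg.norm(X[:, None] - centres[None], axis=2), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centres[j] = X[labels == j].mean(axis=0)
    return centres, labels

def should_trigger(sensor_vec, centres, target_cluster, max_dist):
    """On-board rule: fire the sampler when the current sensor vector is
    assigned to the cluster of interest (e.g. the nepheloid layer) and
    lies within max_dist of its centre -- a simple stand-in for the
    paper's probabilistic confidence threshold."""
    d = np.linalg.norm(sensor_vec - centres, axis=1)
    return bool(d.argmin() == target_cluster and d.min() <= max_dist)
```

The `max_dist` guard matters for the third assertion-style case: a reading far from every learned cluster should not fire the sampler even if one cluster is nominally nearest.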
On the analysis of large data sets
NASA Astrophysics Data System (ADS)
Ruch, Gerald T., Jr.
We present a set of tools and techniques for performing detailed comparisons between computational models with high dimensional parameter spaces and large sets of archival data. By combining a principal component analysis of a large grid of samples from the model with an artificial neural network, we create a powerful data visualization tool as well as a way to robustly recover physical parameters from a large set of experimental data. Our techniques are applied in the context of circumstellar disks, the likely sites of planetary formation. An analysis is performed applying the two layer approximation of Chiang et al. (2001) and Dullemond et al. (2001) to the archive created by the Spitzer Space Telescope Cores to Disks Legacy program. We find two populations of disk sources. The first population is characterized by the lack of a puffed-up inner rim while the second population appears to contain an inner rim which casts a shadow across the disk. The first population also exhibits a trend of increasing spectral index while the second population exhibits a decreasing trend in the strength of the 20 μm silicate emission feature. We also present images of the giant molecular cloud W3 obtained with the Infrared Array Camera (IRAC) and the Multiband Imaging Photometer (MIPS) on board the Spitzer Space Telescope. The images encompass the star forming regions W3 Main, W3(OH), and a region that we refer to as the Central Cluster which encloses the emission nebula IC 1795. We present a star count analysis of the point sources detected in W3. The star count analysis shows that the stellar population of the Central Cluster, when compared to that in the background, contains an overdensity of sources. The Central Cluster also contains an excess of sources with colors consistent with Class II Young Stellar Objects (YSOs). An analysis of the color-color diagrams also reveals a large number of Class II YSOs in the Central Cluster.
Our results suggest that an earlier epoch of star formation created the Central Cluster, created a cavity, and triggered the active star formation in the W3 Main and W3(OH) regions. We also detect a new outflow and its candidate exciting star.
Yohannan, Jithin; He, Bing; Wang, Jiangxia; Greene, Gregory; Schein, Yvette; Mkocha, Harran; Munoz, Beatriz; Quinn, Thomas C.; Gaydos, Charlotte; West, Sheila K.
2014-01-01
Purpose. To detect spatial clustering of households with Chlamydia trachomatis infection (CI) and active trachoma (AT) over time in villages undergoing mass treatment with azithromycin (MDA). Methods. We obtained global positioning system (GPS) coordinates for all households in four villages in Kongwa District, Tanzania. Every 6 months for a period of 42 months, our team examined all children under 10 for AT, and tested for CI with ocular swabbing and Amplicor. Villages underwent four rounds of annual MDA. We classified households as having ≥1 child with CI (or AT) or having 0 children with CI (or AT). We calculated the difference in the K function between households with and without CI or AT to detect clustering at each time point. Results. Between 918 and 991 households were included over the 42 months of this analysis. At baseline, 306 households (32.59%) had ≥1 child with CI, which declined to 73 households (7.50%) at 42 months. We observed borderline clustering of households with CI at 12 months after one round of MDA and statistically significant clustering with growing cluster sizes between 18 and 24 months after two rounds of MDA. Clusters diminished in size at 30 months after three rounds of MDA. Active trachoma did not cluster at any time point. Conclusions. This study demonstrates that CI clusters after multiple rounds of MDA. Clusters of infection may increase in size if the annual antibiotic pressure is removed. The absence of growth after the three rounds suggests the start of control of transmission. PMID:24906862
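The difference-in-K-function statistic can be sketched with a naive, edge-correction-free estimate of Ripley's K. This illustrates the statistic only; the study's estimator and its significance testing (which require edge correction and Monte Carlo envelopes) are not reproduced.

```python
import numpy as np

def ripley_k(points, r, area):
    """Naive Ripley's K estimate (no edge correction):
    K(r) = area * (# ordered point pairs within distance r) / (n * (n - 1))."""
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    d = np.linalg.norm(pts[:, None] - pts[None], axis=2)
    within = (d <= r).sum() - n          # drop the n self-pairs on the diagonal
    return area * within / (n * (n - 1))

def k_difference(cases, controls, r, area):
    """D(r) = K_cases(r) - K_controls(r): positive values suggest case
    households (e.g. those with infection) are more spatially clustered
    than comparison households at scale r."""
    return ripley_k(cases, r, area) - ripley_k(controls, r, area)
```

Evaluating D(r) at each survey round, as the study does every 6 months, traces how infection clusters grow or shrink between MDA rounds.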
Logo image clustering based on advanced statistics
NASA Astrophysics Data System (ADS)
Wei, Yi; Kamel, Mohamed; He, Yiwei
2007-11-01
In recent years, there has been growing interest in research on image content description techniques. Among these, image clustering is one of the most frequently discussed topics. Similar to image recognition, image clustering is also a high-level representation technique; however, it focuses on coarse categorization rather than accurate recognition. Based on the wavelet transform (WT) and advanced statistics, the authors propose a novel approach that divides variously shaped logo images into groups according to the external boundary of each logo image. Experimental results show that the presented method is accurate, fast and insensitive to defects.
Visual analytics of large multidimensional data using variable binned scatter plots
NASA Astrophysics Data System (ADS)
Hao, Ming C.; Dayal, Umeshwar; Sharma, Ratnesh K.; Keim, Daniel A.; Janetzko, Halldór
2010-01-01
The scatter plot is a well-known method of visualizing pairs of continuous variables, and multidimensional data can be depicted in a scatter plot matrix. Scatter plots are intuitive and easy to use, but often have a high degree of overlap which may occlude a significant portion of the data. In this paper, we propose variable binned scatter plots to allow the visualization of large amounts of data without overlapping. The basic idea is to use a non-uniform (variable) binning of the x and y dimensions and to plot all the data points that fall within each bin into corresponding squares. Further, we map a third attribute to color for visualizing clusters. Analysts are able to interact with individual data points for record-level information. We have applied these techniques to real-world problems in credit card fraud and data center energy consumption, to visualize their data distributions and cause-effect relationships among multiple attributes. A comparison of our methods with two recent well-known variants of the scatter plot is included.
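The core mechanism, non-uniform bin edges per axis with per-bin aggregation, can be sketched with quantile binning. Quantile edges are one plausible choice of variable binning (dense regions get narrow bins, sparse regions wide ones); the authors' exact binning rule is not assumed here.

```python
import numpy as np

def variable_binned_counts(x, y, n_bins):
    """Bin a 2-D point set using non-uniform (quantile) bin edges on each
    axis and return per-bin counts, ready to draw as colored squares.
    Every bin then holds roughly the same number of points per axis,
    so no region of the plot is fully occluded."""
    qs = np.linspace(0, 1, n_bins + 1)
    x_edges = np.quantile(x, qs)
    y_edges = np.quantile(y, qs)
    counts, _, _ = np.histogram2d(x, y, bins=[x_edges, y_edges])
    return counts, x_edges, y_edges
```

A third attribute could be aggregated per bin in the same pass (e.g. mean fraud score per square) and mapped to color, as the paper describes.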
Fast, axis-agnostic, dynamically summarized storage and retrieval for mass spectrometry data.
Handy, Kyle; Rosen, Jebediah; Gillan, André; Smith, Rob
2017-01-01
Mass spectrometry, a popular technique for elucidating the molecular contents of experimental samples, creates data sets composed of millions of three-dimensional (m/z, retention time, intensity) data points that correspond to the types and quantities of analyzed molecules. Open and commercial MS data formats are arranged by retention time, creating latency when accessing data across multiple m/z values. Existing MS storage and retrieval methods have been developed to overcome the limitations of retention-time-based data formats, but do not provide features such as dynamic summarization or storage and retrieval of point metadata (such as signal cluster membership), precluding efficient viewing applications and certain data-processing approaches. This manuscript describes MzTree, a spatial database designed to provide real-time storage and retrieval of dynamically summarized standard and augmented MS data, with fast performance in both the m/z and RT directions. Performance is reported on real data, with comparisons against related published retrieval systems.
The HST Frontier Field MACS 1149.5+2223: Flanking Observations for Intracluster Light
NASA Astrophysics Data System (ADS)
Gonzalez, Anthony
2017-08-01
We propose a 6 orbit WFC3/IR imaging program targeting the environs of the HST Frontier Field cluster MACS 1149.5+2223 to obtain a comprehensive view of the intracluster stellar population in a massive galaxy cluster. WFC3/IR enables a vast improvement over ground-based studies in mapping emission from diffuse stellar populations. Our proposed observations are designed to build upon the existing investment in the Frontier Fields to conduct a new, more complete census of the intracluster light (ICL) extending out to 750 kpc. The requested observations are constructed to span the gap between the primary and parallel HFF pointings, detecting ICL to a surface brightness of 29.5 mag per square arcsec in F160W (equivalent to 31.5 mag per square arcsec in V-band). This depth is sufficient to trace the radial ICL profile out to 750 kpc from the BCG. These data will also yield a high-fidelity calibration of the background sky level, enabling two-dimensional mapping of the distribution and color of intracluster light down to 27 mag per square arcsec in F160W. From these maps we will quantify spatial variation in the ratio of the stellar baryons to the ICM, establishing whether the observed low scatter in the global ratio masks underlying smaller scale inhomogeneities due to astrophysical processes in the cluster. The requested observations further serve as a pilot program, enabling future similar analyses with the full ensemble of HFF clusters, and developing techniques that will be required for such low surface brightness programs with upcoming facilities including Euclid and WFIRST.
Dynamical Modeling of NGC 6397: Simulated HST Imaging
NASA Astrophysics Data System (ADS)
Dull, J. D.; Cohn, H. N.; Lugger, P. M.; Slavin, S. D.; Murphy, B. W.
1994-12-01
The proximity of NGC 6397 (2.2 kpc) provides an ideal opportunity to test current dynamical models for globular clusters with the HST Wide Field/Planetary Camera (WFPC2). We have used a Monte Carlo algorithm to generate ensembles of simulated Planetary Camera (PC) U-band images of NGC 6397 from evolving, multi-mass Fokker-Planck models. These images, which are based on the post-repair HST PC point-spread function, are used to develop and test analysis methods for recovering structural information from actual HST imaging. We have considered a range of exposure times up to 2.4 x 10^4 s, based on our proposed HST Cycle 5 observations. Our Fokker-Planck models include energy input from dynamically formed binaries. We have adopted a 20-group mass spectrum extending from 0.16 to 1.4 M_sun. We use theoretical luminosity functions for red giants and main-sequence stars. Horizontal branch stars, blue stragglers, white dwarfs, and cataclysmic variables are also included. Simulated images are generated for cluster models both at maximal core collapse and at a post-collapse bounce. We are carrying out stellar photometry on these images using "DAOPHOT-assisted aperture photometry" software that we have developed. We are testing several techniques for analyzing the resulting star counts to determine the underlying cluster structure, including parametric model fits and nonparametric density estimation methods. Our simulated images also allow us to investigate the accuracy and completeness of methods for carrying out stellar photometry in HST Planetary Camera images of dense cluster cores.
Lee, Sanghwa; Lee, Seung Ho; Paulson, Bjorn; Lee, Jae-Chul; Kim, Jun Ki
2018-06-20
The development of size-selective and non-destructive detection techniques for nanosized biomarkers is motivated by many applications, including the study of living cells and diagnostics. We present an approach for Raman signal enhancement on biocompatible sensing chips based on surface-enhanced Raman spectroscopy (SERS). A sensing chip was fabricated by forming a ZnO-based nanorod structure so that Raman enhancement occurred at gaps of several tens to several hundreds of nanometers. The coffee-ring effect was eliminated by introducing porous ZnO nanorods for the bio-liquid sample. A peculiarity of this approach is that the gold sputtered on the ZnO nanorods initially grows at their heads, forming clusters, as confirmed by secondary electron microscopy. This clustering was verified by finite element analysis to be the main factor in the enhancement of localized surface plasmon resonance (LSPR). The clustering behavior and the ability to adjust the size of the nanorods enabled the signal acquisition points to be refined using confocal Raman spectroscopy, which could be applied directly to the sensor chip following the optimization process in this experiment. Using common cancer cell lines, it was demonstrated that cell growth was high on these gold-clad ZnO nanorod-based surface-enhanced Raman substrates. The porosity of the sensing chip, the improved structure for signal enhancement, and the cell assay make these gold-coated ZnO nanorod substrates promising biosensing chips with excellent potential for detecting nanometric biomarkers secreted by cells. Copyright © 2018 Elsevier B.V. All rights reserved.
AMADEUS—The acoustic neutrino detection test system of the ANTARES deep-sea neutrino telescope
NASA Astrophysics Data System (ADS)
Aguilar, J. A.; Al Samarai, I.; Albert, A.; Anghinolfi, M.; Anton, G.; Anvar, S.; Ardid, M.; Assis Jesus, A. C.; Astraatmadja, T.; Aubert, J.-J.; Auer, R.; Barbarito, E.; Baret, B.; Basa, S.; Bazzotti, M.; Bertin, V.; Biagi, S.; Bigongiari, C.; Bou-Cabo, M.; Bouwhuis, M. C.; Brown, A.; Brunner, J.; Busto, J.; Camarena, F.; Capone, A.; Cârloganu, C.; Carminati, G.; Carr, J.; Cassano, B.; Castorina, E.; Cavasinni, V.; Cecchini, S.; Ceres, A.; Charvis, Ph.; Chiarusi, T.; Chon Sen, N.; Circella, M.; Coniglione, R.; Costantini, H.; Cottini, N.; Coyle, P.; Curtil, C.; de Bonis, G.; Decowski, M. P.; Dekeyser, I.; Deschamps, A.; Distefano, C.; Donzaud, C.; Dornic, D.; Drouhin, D.; Eberl, T.; Emanuele, U.; Ernenwein, J.-P.; Escoffier, S.; Fehr, F.; Fiorello, C.; Flaminio, V.; Fritsch, U.; Fuda, J.-L.; Gay, P.; Giacomelli, G.; Gómez-González, J. P.; Graf, K.; Guillard, G.; Halladjian, G.; Hallewell, G.; van Haren, H.; Heijboer, A. J.; Heine, E.; Hello, Y.; Hernández-Rey, J. J.; Herold, B.; Hößl, J.; de Jong, M.; Kalantar-Nayestanaki, N.; Kalekin, O.; Kappes, A.; Katz, U.; Keller, P.; Kooijman, P.; Kopper, C.; Kouchner, A.; Kretschmer, W.; Lahmann, R.; Lamare, P.; Lambard, G.; Larosa, G.; Laschinsky, H.; Le Provost, H.; Lefèvre, D.; Lelaizant, G.; Lim, G.; Lo Presti, D.; Loehner, H.; Loucatos, S.; Louis, F.; Lucarelli, F.; Mangano, S.; Marcelin, M.; Margiotta, A.; Martinez-Mora, J. A.; Mazure, A.; Mongelli, M.; Montaruli, T.; Morganti, M.; Moscoso, L.; Motz, H.; Naumann, C.; Neff, M.; Ostasch, R.; Palioselitis, D.; Păvălaş, G. E.; Payre, P.; Petrovic, J.; Picot-Clemente, N.; Picq, C.; Popa, V.; Pradier, T.; Presani, E.; Racca, C.; Radu, A.; Reed, C.; Riccobene, G.; Richardt, C.; Rujoiu, M.; Ruppi, M.; Russo, G. V.; Salesa, F.; Sapienza, P.; Schöck, F.; Schuller, J.-P.; Shanidze, R.; Simeone, F.; Spurio, M.; Steijger, J. J. 
M.; Stolarczyk, Th.; Taiuti, M.; Tamburini, C.; Tasca, L.; Toscano, S.; Vallage, B.; van Elewyck, V.; Vannoni, G.; Vecchi, M.; Vernin, P.; Wijnker, G.; de Wolf, E.; Yepes, H.; Zaborov, D.; Zornoza, J. D.; Zúñiga, J.
2011-01-01
The AMADEUS (ANTARES Modules for the Acoustic Detection Under the Sea) system described in this article aims to investigate techniques for the acoustic detection of neutrinos in the deep sea. It is integrated into the ANTARES neutrino telescope in the Mediterranean Sea. Its acoustic sensors, installed at water depths between 2050 and 2300 m, employ piezo-electric elements for the broad-band recording of signals with frequencies ranging up to 125 kHz. The typical sensitivity of the sensors is around -145 dB re 1 V/μPa (including preamplifier). Completed in May 2008, AMADEUS consists of six “acoustic clusters”, each comprising six acoustic sensors arranged at distances of roughly 1 m from each other. Two vertical mechanical structures (so-called lines) of the ANTARES detector host three acoustic clusters each. Spacings between the clusters range from 14.5 to 340 m. Each cluster contains custom-designed electronics boards to amplify and digitise the acoustic signals from the sensors. An on-shore computer cluster is used to process and filter the data stream and store the selected events. The daily volume of recorded data is about 10 GB. The system operates continuously and automatically, requiring little human intervention. AMADEUS allows for extensive studies of both transient signals and ambient noise in the deep sea, as well as signal correlations on several length scales and localisation of acoustic point sources. Thus the system is excellently suited to assess the background conditions for the measurement of the bipolar pulses expected to originate from neutrino interactions.
Reconstruction of cluster masses using particle based lensing
NASA Astrophysics Data System (ADS)
Deb, Sanghamitra
Clusters of galaxies are among the richest astrophysical data systems, but to truly understand these systems, we need a detailed study of the relationship between observables and the underlying cluster dark matter distribution. Gravitational lensing is the most direct probe of dark matter, but many mass reconstruction techniques assume that cluster light traces mass, or combine different lensing signals in an ad hoc way. In this talk, we will describe "Particle Based Lensing" (PBL), a new method for cluster mass reconstruction that avoids many of the pitfalls of previous techniques. PBL optimally combines lensing information of varying signal-to-noise, and makes no assumptions about the relationship between mass and light. We will describe mass reconstructions in three very different, but very illuminating, cluster systems: the "Bullet Cluster" (1E 0657-56), A901/902 and A1689. The "Bullet Cluster" is a system of merging clusters made famous by the first unambiguous lensing detection of dark matter. A901/902 is a multi-cluster system with four peaks, and provides an ideal laboratory for studying cluster interaction. We are particularly interested in measuring and correlating the dark matter clump ellipticities. A1689 is one of the richest clusters known, and has significant substructure at the core. It is also our first exercise in optimally combining weak and strong gravitational lensing in a cluster reconstruction. We find that the dark matter distribution is significantly clumpier than indicated by X-ray maps of the gas. We conclude by discussing various potential applications of PBL to existing and future data.
Tracing Large Scale Structure with a Redshift Survey of Rich Clusters of Galaxies
NASA Astrophysics Data System (ADS)
Batuski, D.; Slinglend, K.; Haase, S.; Hill, J. M.
1993-12-01
Rich clusters of galaxies from Abell's catalog show evidence of structure on scales of 100 Mpc and hold promise of confirming the existence of structure in the more immediate universe on scales corresponding to COBE results (i.e., on the order of 10% or more of the horizon size of the universe). However, most Abell clusters do not as yet have measured redshifts (or, in the case of most low redshift clusters, have only one or two galaxies measured), so present knowledge of their three dimensional distribution has quite large uncertainties. The shortage of measured redshifts for these clusters may also mask a problem of projection effects corrupting the membership counts for the clusters, perhaps even to the point of spurious identifications of some of the clusters themselves. Our approach in this effort has been to use the MX multifiber spectrometer to measure redshifts of at least ten galaxies in each of about 80 Abell cluster fields with richness class R>= 1 and mag10 <= 16.8. This work will result in a somewhat deeper, much more complete (and reliable) sample of positions of rich clusters. Our primary use for the sample is for two-point correlation and other studies of the large scale structure traced by these clusters. We are also obtaining enough redshifts per cluster so that a much better sample of reliable cluster velocity dispersions will be available for other studies of cluster properties. To date, we have collected such data for 40 clusters, and for most of them, we have seven or more cluster members with redshifts, allowing for reliable velocity dispersion calculations. Velocity histograms for several interesting cluster fields are presented, along with summary tables of cluster redshift results. 
Also, with 10 or more redshifts in most of our cluster fields (30′ square, just about an 'Abell diameter' at z ~ 0.1), we have investigated the extent of projection effects within the Abell catalog in an effort to quantify and understand how these may affect the Abell sample.
Dipnall, Joanna F; Pasco, Julie A; Berk, Michael; Williams, Lana J; Dodd, Seetal; Jacka, Felice N; Meyer, Denny
2016-01-01
Depression is commonly comorbid with many other somatic diseases and symptoms. Identification of individuals in clusters with comorbid symptoms may reveal new pathophysiological mechanisms and treatment targets. The aim of this research was to combine machine-learning (ML) algorithms with traditional regression techniques, utilising self-reported medical symptoms to identify and describe clusters of individuals with increased rates of depression in a large cross-sectional, community-based population epidemiological study. A multi-staged methodology utilising ML and traditional statistical techniques was applied to the community-based population of the National Health and Nutrition Examination Study (2009-2010) (N = 3,922). A Self-Organising Map (SOM) ML algorithm, combined with hierarchical clustering, was used to create participant clusters based on 68 medical symptoms. Binary logistic regression, controlling for sociodemographic confounders, was then used to identify the key clusters of participants with higher levels of depression (PHQ-9 ≥ 10, n = 377). Finally, a Multiple Additive Regression Tree boosted ML algorithm was run to identify the important medical symptoms for each key cluster within 17 broad categories: heart, liver, thyroid, respiratory, diabetes, arthritis, fractures and osteoporosis, skeletal pain, blood pressure, blood transfusion, cholesterol, vision, hearing, psoriasis, weight, bowels and urinary. Five clusters of participants, based on medical symptoms, were identified as having significantly increased rates of depression compared to the cluster with the lowest rate: odds ratios ranged from 2.24 (95% CI 1.56, 3.24) to 6.33 (95% CI 1.67, 24.02). The ML boosted regression algorithm identified three key medical condition categories as being significantly more common in these clusters: bowel, pain and urinary symptoms. Bowel-related symptoms were found to dominate the relative importance of symptoms within the five key clusters.
This methodology shows promise for the identification of conditions in general populations and supports the current focus on the potential importance of bowel symptoms and the gut in mental health research.
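The clustering stage of such a pipeline can be sketched with SciPy. For brevity this sketch applies Ward hierarchical clustering directly to synthetic binary symptom profiles, omitting the SOM stage (which in the paper supplies prototype vectors to the hierarchical step), and then compares depression rates across the resulting clusters. All data, symptom probabilities and depression rates below are synthetic assumptions.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(1)
n = 200
# Synthetic stand-in for binary medical-symptom indicators: two latent
# profiles with different symptom probabilities and depression rates
profile = rng.integers(0, 2, n)
p_sym = np.where(profile[:, None] == 1, 0.8, 0.1)
symptoms = (rng.random((n, 20)) < p_sym).astype(float)

# Ward hierarchical clustering of the symptom vectors (in the paper this
# step runs on SOM prototypes rather than on raw participants)
labels = fcluster(linkage(symptoms, method="ward"), t=2, criterion="maxclust")

# Depression indicator correlated with the latent profile
depressed = rng.random(n) < np.where(profile == 1, 0.45, 0.10)
rates = [depressed[labels == k].mean() for k in (1, 2)]
```

The per-cluster depression rates differ markedly between the two recovered clusters, mirroring the paper's strategy of first clustering on symptoms and only afterwards relating clusters to depression.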
Detection of Anomalies in Hydrometric Data Using Artificial Intelligence Techniques
NASA Astrophysics Data System (ADS)
Lauzon, N.; Lence, B. J.
2002-12-01
This work focuses on the detection of anomalies in hydrometric data sequences, such as 1) outliers, which are individual data having statistical properties that differ from those of the overall population; 2) shifts, which are sudden changes over time in the statistical properties of the historical records of data; and 3) trends, which are systematic changes over time in the statistical properties. For the purpose of the design and management of water resources systems, it is important to be aware of these anomalies in hydrometric data, for they can induce a bias in the estimation of water quantity and quality parameters. These anomalies may be viewed as specific patterns affecting the data, and therefore pattern recognition techniques can be used for identifying them. However, the number of possible patterns is very large for each type of anomaly and consequently large computing capacities are required to account for all possibilities using the standard statistical techniques, such as cluster analysis. Artificial intelligence techniques, such as the Kohonen neural network and fuzzy c-means, are clustering techniques commonly used for pattern recognition in several areas of engineering and have recently begun to be used for the analysis of natural systems. They require much less computing capacity than the standard statistical techniques, and therefore are well suited for the identification of outliers, shifts and trends in hydrometric data. This work constitutes a preliminary study, using synthetic data representing hydrometric data that can be found in Canada. The analysis of the results obtained shows that the Kohonen neural network and fuzzy c-means are reasonably successful in identifying anomalies. This work also addresses the problem of uncertainties inherent to the calibration procedures that fit the clusters to the possible patterns for both the Kohonen neural network and fuzzy c-means. 
Indeed, for the same database, different sets of clusters can be established with these calibration procedures. A simple method for analyzing uncertainties associated with the Kohonen neural network and fuzzy c-means is developed here. The method combines the results from several sets of clusters, either from the Kohonen neural network or fuzzy c-means, so as to provide an overall diagnosis as to the identification of outliers, shifts and trends. The results indicate an improvement in the performance for identifying anomalies when the method of combining cluster sets is used, compared with when only one cluster set is used.
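Fuzzy c-means, one of the two clustering techniques discussed above, can be sketched directly from its standard update equations. The code below is a generic NumPy implementation run on synthetic two-regime data with one gross outlier, not the authors' calibration procedure; the idea that a point with weak maximum membership is an outlier candidate is a common heuristic, assumed here for illustration.

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, n_iter=100, seed=0):
    """Plain fuzzy c-means: returns cluster centers and the soft membership
    matrix u, where u[i, j] is the degree to which point i belongs to cluster j."""
    rng = np.random.default_rng(seed)
    u = rng.random((len(X), c))
    u /= u.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        w = u ** m                                    # fuzzified memberships
        centers = (w.T @ X) / w.sum(axis=0)[:, None]  # membership-weighted means
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        u = 1.0 / d ** (2.0 / (m - 1.0))              # inverse-distance update
        u /= u.sum(axis=1, keepdims=True)
    return centers, u

# Two regimes of synthetic "hydrometric" values plus one gross outlier
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0.0, 0.3, (50, 1)),
               rng.normal(5.0, 0.3, (50, 1)),
               [[20.0]]])
centers, u = fuzzy_c_means(X, c=2)
max_u = u.max(axis=1)   # points with weak maximum membership are outlier candidates
```

Points belonging clearly to one regime have a maximum membership near 1, while the outlier, far from both centers, is split between them; thresholding `max_u` is one simple way such anomalies can be flagged.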
Site-Specific Biomolecule Labeling with Gold Clusters
Ackerson, Christopher J.; Powell, Richard D.; Hainfeld, James F.
2013-01-01
Site-specific labeling of biomolecules in vitro with gold clusters can enhance the information content of electron cryomicroscopy experiments. This chapter provides a practical overview of well-established techniques for forming biomolecule/gold cluster conjugates. Three bioconjugation chemistries are covered: Linker-mediated bioconjugation, direct gold–biomolecule bonding, and coordination-mediated bonding of nickel(II) nitrilotriacetic acid (NTA)-derivatized gold clusters to polyhistidine (His)-tagged proteins. PMID:20887859
Dorsey, Shannon; Kerns, Suzanne E U; Lucid, Leah; Pullmann, Michael D; Harrison, Julie P; Berliner, Lucy; Thompson, Kelly; Deblinger, Esther
2018-01-24
Workplace-based clinical supervision as an implementation strategy to support evidence-based treatment (EBT) in public mental health has received limited research attention. A commonly provided infrastructure support, it may offer a relatively cost-neutral implementation strategy for organizations. However, research has not objectively examined workplace-based supervision of EBT and specifically how it might differ from EBT supervision provided in efficacy and effectiveness trials. Data come from a descriptive study of supervision in the context of a state-funded EBT implementation effort. Verbal interactions from audio recordings of 438 supervision sessions between 28 supervisors and 70 clinicians from 17 public mental health organizations (in 23 offices) were objectively coded for presence and intensity coverage of 29 supervision strategies (16 content and 13 technique items), duration, and temporal focus. Random effects mixed models estimated proportion of variance in content and techniques attributable to the supervisor and clinician levels. Interrater reliability among coders was excellent. EBT cases averaged 12.4 min of supervision per session. Intensity of coverage for EBT content varied, with some discussed frequently at medium or high intensity (exposure) and others infrequently discussed or discussed only at low intensity (behavior management; assigning/reviewing client homework). Other than fidelity assessment, supervision techniques common in treatment trials (e.g., reviewing actual practice, behavioral rehearsal) were used rarely or primarily at low intensity. In general, EBT content clustered more at the clinician level; different techniques clustered at either the clinician or supervisor level. Workplace-based clinical supervision may be a feasible implementation strategy for supporting EBT implementation, yet it differs from supervision in treatment trials. Time allotted per case is limited, compressing time for EBT coverage. 
Techniques that involve observation of clinician skills are rarely used. Workplace-based supervision content appears to be tailored to individual clinicians and driven to some degree by the individual supervisor. Our findings point to areas for intervention to enhance the potential of workplace-based supervision for implementation effectiveness. NCT01800266, Clinical Trials, retrospectively registered (for this descriptive study; registration prior to any intervention [part of a phase II RCT; this manuscript covers only the phase I descriptive results]).
Kent, Peter; Stochkendahl, Mette Jensen; Christensen, Henrik Wulff; Kongsted, Alice
2015-01-01
Recognition of homogeneous subgroups of patients can usefully improve prediction of their outcomes and the targeting of treatment. There are a number of research approaches that have been used to recognise homogeneity in such subgroups and to test their implications. One approach is to use statistical clustering techniques, such as Cluster Analysis or Latent Class Analysis, to detect latent relationships between patient characteristics. Influential patient characteristics can come from diverse domains of health, such as pain, activity limitation, physical impairment, social role participation, psychological factors, biomarkers and imaging. However, such 'whole person' research may result in data-driven subgroups that are complex, difficult to interpret and challenging to recognise clinically. This paper describes a novel approach to applying statistical clustering techniques that may improve the clinical interpretability of derived subgroups and reduce sample size requirements. This approach involves clustering in two sequential stages. The first stage involves clustering within health domains and therefore requires creating as many clustering models as there are health domains in the available data. This first stage produces scoring patterns within each domain. The second stage involves clustering using the scoring patterns from each health domain (from the first stage) to identify subgroups across all domains. We illustrate this using chest pain data from the baseline presentation of 580 patients. The new two-stage clustering resulted in two subgroups that approximated the classic textbook descriptions of musculoskeletal chest pain and atypical angina chest pain. The traditional single-stage clustering resulted in five clusters that were also clinically recognisable but displayed less distinct differences. In this paper, a new approach to using clustering techniques to identify clinically useful subgroups of patients is suggested. 
Research designs, statistical methods and outcome metrics suitable for testing the clinical usefulness of such subgroups are also described. This approach has potential benefits but requires broad testing, in multiple patient samples, to determine its clinical value. The usefulness of the approach is likely to be context-specific, depending on the characteristics of the available data and the research question being asked of it.
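The two-stage idea can be sketched as follows: stage one clusters within each health domain and emits a scoring pattern (here simply one-hot within-domain labels); stage two clusters those patterns into final subgroups. The sketch uses synthetic two-domain data and k-means in place of the paper's Cluster Analysis / Latent Class Analysis, so the domains, means and cluster counts are all illustrative assumptions.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

rng = np.random.default_rng(3)
n = 300
group = rng.integers(0, 2, n)                  # latent "true" subgroup
# Two synthetic health domains with group-dependent means
pain  = rng.normal(group[:, None] * 3.0, 1.0, (n, 3))
psych = rng.normal(group[:, None] * -2.0, 1.0, (n, 4))

def domain_scores(X, k, seed):
    """Stage 1: cluster within a single health domain; return the scoring
    pattern (here simply a one-hot encoding of the within-domain label)."""
    _, lab = kmeans2(X, k, minit="++", seed=seed)
    return np.eye(k)[lab]

# Stage 1: one clustering model per domain
scores = np.hstack([domain_scores(pain, 2, seed=10),
                    domain_scores(psych, 2, seed=11)])
# Stage 2: cluster the per-domain scoring patterns into final subgroups
_, final = kmeans2(scores, 2, minit="++", seed=12)
```

Because stage two sees only low-dimensional per-domain patterns rather than all raw variables, the final subgroups are easier to describe ("high pain pattern plus low psych pattern"), which is the interpretability gain the paper argues for.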
A Comparison of Two Approaches to Beta-Flexible Clustering.
ERIC Educational Resources Information Center
Belbin, Lee; And Others
1992-01-01
A method for hierarchical agglomerative polythetic (multivariate) clustering, based on the unweighted pair-group method using arithmetic averages (UPGMA), is compared with the original beta-flexible technique, a weighted average method. Reasons the flexible UPGMA strategy is recommended are discussed, focusing on the ability to recover cluster structure over…
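Both variants are instances of the Lance-Williams update, in which the distance from any cluster k to a newly merged cluster i∪j is d(k, i∪j) = a_i d(k,i) + a_j d(k,j) + β d(i,j). In the group-size-weighted (UPGMA-style) flexible variant, a_i = (1-β) n_i/(n_i+n_j), whereas the original weighted beta-flexible method uses a_i = a_j = (1-β)/2. The sketch below is an illustrative NumPy implementation of the UPGMA-style variant under these assumptions, not the authors' code.

```python
import numpy as np

def flexible_upgma(D, beta=-0.25):
    """Agglomerative clustering with the Lance-Williams flexible update
        d(k, i+j) = a_i d(k,i) + a_j d(k,j) + beta d(i,j),
    where a_i = (1 - beta) * n_i / (n_i + n_j), i.e. group sizes enter as in
    UPGMA (the original weighted beta-flexible method uses a_i = (1-beta)/2)."""
    D = D.astype(float).copy()
    np.fill_diagonal(D, np.inf)
    sizes = [1] * len(D)
    active = list(range(len(D)))
    merges = []                      # (cluster i, cluster j, merge distance)
    while len(active) > 1:
        sub = D[np.ix_(active, active)]
        fi, fj = np.unravel_index(np.argmin(sub), sub.shape)
        i, j = active[fi], active[fj]
        dij = D[i, j]
        ni, nj = sizes[i], sizes[j]
        ai, aj = (1 - beta) * ni / (ni + nj), (1 - beta) * nj / (ni + nj)
        for k in active:
            if k not in (i, j):      # Lance-Williams update of merged cluster
                D[i, k] = D[k, i] = ai * D[i, k] + aj * D[j, k] + beta * dij
        sizes[i] = ni + nj
        active.remove(j)
        merges.append((i, j, dij))
    return merges

pts = np.array([0.0, 1.0, 10.0, 11.0])
D = np.abs(pts[:, None] - pts[None, :])
merges = flexible_upgma(D, beta=-0.1)
```

On this four-point example the two tight pairs merge first at distance 1, and a mildly negative β (space-dilating) pushes the final between-group merge slightly below the plain average distance.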
Identification of piecewise affine systems based on fuzzy PCA-guided robust clustering technique
NASA Astrophysics Data System (ADS)
Khanmirza, Esmaeel; Nazarahari, Milad; Mousavi, Alireza
2016-12-01
Hybrid systems are a class of dynamical systems whose behavior is based on the interaction between discrete and continuous dynamical behaviors. Since a general method for the analysis of hybrid systems is not available, some researchers have focused on specific types of hybrid systems. Piecewise affine (PWA) systems are one subset of hybrid systems. The identification of PWA systems includes the estimation of the parameters of the affine subsystems and the coefficients of the hyperplanes defining the partition of the state-input domain. In this paper, we propose a PWA identification approach based on a modified clustering technique. By using a fuzzy PCA-guided robust k-means clustering algorithm along with neighborhood outlier detection, the two main drawbacks of the well-known clustering algorithms, namely poor initialization and the presence of outliers, are eliminated. Furthermore, this modified clustering technique enables us to determine the number of subsystems without any prior knowledge about the system. In addition, exploiting the structure of the state-input domain, that is, considering the time sequence of input-output pairs, provides a more efficient clustering algorithm, which is the other novelty of this work. Finally, the proposed algorithm has been evaluated through parameter identification of an IGV servo actuator. Simulations together with experimental analysis have demonstrated the effectiveness of the proposed method.
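The paper's fuzzy PCA-guided robust k-means is not reproduced here, but the two drawbacks it targets, poor initialization and outliers, can be illustrated with a much simpler stand-in: k-nearest-neighbour outlier trimming followed by k-means++-initialised clustering. All data, thresholds and parameters below are illustrative assumptions, not the authors' algorithm.

```python
import numpy as np
from scipy.cluster.vq import kmeans2
from scipy.spatial.distance import cdist

rng = np.random.default_rng(4)
# Data drawn from two modes (two "subsystems"), plus a few gross outliers
X = np.vstack([rng.normal(0.0, 0.5, (60, 2)),
               rng.normal(4.0, 0.5, (60, 2)),
               rng.uniform(-10.0, 14.0, (5, 2))])

# Neighbourhood outlier detection: score each point by the mean distance
# to its 5 nearest neighbours, then drop the most isolated points
d = cdist(X, X)
knn_score = np.sort(d, axis=1)[:, 1:6].mean(axis=1)
keep = knn_score < np.percentile(knn_score, 95)

# k-means with k-means++ initialisation on the cleaned data
centers, labels = kmeans2(X[keep], 2, minit="++", seed=5)
```

Without the trimming step, a single far-away point can drag a center off the true mode; with it, the recovered centers sit on the two modes, which is the qualitative behaviour the robust-clustering modification aims for.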
Hafen, G M; Hurst, C; Yearwood, J; Smith, J; Dzalilov, Z; Robinson, P J
2008-10-05
Cystic fibrosis is the most common fatal genetic disorder in the Caucasian population. Scoring systems for the assessment of cystic fibrosis disease severity have been used for almost 50 years without being adapted to the milder phenotype of the disease in the 21st century. The aim of the current project is to develop a new scoring system using a database and employing various statistical tools. This study protocol reports the development of the statistical tools needed to create such a scoring system. The evaluation is based on the cystic fibrosis database of the cohort at the Royal Children's Hospital in Melbourne. Initially, unsupervised clustering of all data records was performed using a range of clustering algorithms, in particular incremental clustering algorithms. The clusters obtained were characterised using rules from decision trees, and the results were examined by clinicians. In order to obtain a clearer definition of classes, expert opinion of each individual's clinical severity was sought. After data preparation, including expert opinion of each individual's clinical severity on a 3-point scale (mild, moderate and severe disease), two multivariate techniques were used throughout the analysis to establish a method with better success in feature selection and model derivation: Canonical Analysis of Principal Coordinates (CAP) and Linear Discriminant Analysis (LDA). A 3-step procedure was performed: (1) selection of features, (2) extraction of 5 severity classes from the 3 classes defined by expert opinion, and (3) establishment of calibration datasets.
(1) Feature selection: CAP has a more effective "modelling" focus than DA. (2) Extraction of 5 severity classes: after variables were identified as important in discriminating contiguous CF severity groups on the 3-point scale (mild/moderate and moderate/severe), a Discriminant Function (DF) was used to determine the new groups: mild, intermediate moderate, moderate, intermediate severe and severe disease. (3) The generated confusion tables showed a misclassification rate of 19.1% for males and 16.5% for females, with a majority of misallocations into adjacent severity classes, particularly for males. Our preliminary data show that using CAP for feature selection and linear DA to derive the actual model in a CF database might be helpful in developing a scoring system. However, there are several limitations; in particular, more data entry points are needed to finalize a score, and the statistical tools have to be further refined and validated by re-running the statistical methods on the larger dataset.
Clustering properties of g-selected galaxies at z ~ 0.8
Favole, Ginevra; Comparat, Johan; Prada, Francisco; ...
2016-06-21
In current and future large redshift surveys, such as the Sloan Digital Sky Survey IV extended Baryon Oscillation Spectroscopic Survey (SDSS-IV/eBOSS) or the Dark Energy Spectroscopic Instrument (DESI), emission-line galaxies (ELGs) will be used to probe cosmological models by mapping the large-scale structure of the Universe in the redshift range 0.6 < z < 1.7. We explore the halo-galaxy connection with current data by measuring three clustering properties of g-selected ELGs as matter tracers in the redshift range 0.6 < z < 1: (i) the redshift-space two-point correlation function using spectroscopic redshifts from the BOSS ELG sample and VIPERS; (ii) the angular two-point correlation function on the footprint of the CFHT-LS; (iii) the galaxy-galaxy lensing signal around the ELGs using the CFHTLenS. Furthermore, we interpret these observations by mapping them on to the latest high-resolution MultiDark Planck N-body simulation, using a novel (Sub)Halo Abundance Matching technique that accounts for the ELG incompleteness. ELGs at z ~ 0.8 live in haloes of (1 ± 0.5) × 10^12 h^-1 M⊙, and 22.5 ± 2.5 per cent of them are satellites belonging to a larger halo. The halo occupation distribution of ELGs indicates that we are sampling the galaxies in which stars form in the most efficient way, according to their stellar-to-halo mass ratio.
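Rank-order (sub)halo abundance matching can be sketched in toy form: sort haloes by mass and observed galaxies by luminosity, then match by rank, with incompleteness mimicked here by randomly downsampling the galaxy sample. All numbers below are arbitrary toy values (not the paper's catalogues), and the random downsampling is only a crude nod to the incompleteness modelling the paper actually performs.

```python
import numpy as np

rng = np.random.default_rng(6)
# Toy catalogues: (sub)halo masses and galaxy luminosities (arbitrary units)
halo_mass = np.sort(rng.lognormal(mean=27.6, sigma=1.0, size=1000))[::-1]
luminosity = rng.lognormal(mean=23.0, sigma=0.8, size=1000)

# Incompleteness: only a fraction of galaxies make it into the sample
completeness = 0.6                      # toy value, not the paper's
observed = rng.random(1000) < completeness

# Rank-order matching: brightest surviving galaxy -> most massive halo, etc.
gal = np.sort(luminosity[observed])[::-1]
matched_halo_mass = halo_mass[: gal.size]
```

The matching is monotonic by construction: brighter galaxies are placed in more massive haloes, which is the core assumption that lets simulated halo catalogues reproduce observed galaxy clustering.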
FT-IR microspectroscopy in rapid identification of bacteria in pure and mixed culture
NASA Astrophysics Data System (ADS)
Fontoura, Inglid; Belo, Ricardo; Sakane, Kumiko; Cardoso, Maria Angélica Gargione; Khouri, Sônia; Uehara, Mituo; Raniero, Leandro; Martin, Airton A.
2010-02-01
In recent years, FT-IR microspectroscopy has been developed for microbiological analysis and applied successfully to pure cultures of microorganisms to rapidly identify strains of bacteria, yeasts and fungi. The investigation and characterization of mixed cultures of microorganisms is also of growing importance, especially in hospitals, where polymicrobial infections are common. In this work, the rapid identification of bacteria in pure and mixed cultures was studied. The bacteria were obtained from the Oswaldo Cruz Institute culture collection in Brazil. Escherichia coli ATCC 10799 and Staphylococcus aureus ATCC 14456 were analyzed; three inoculations were examined in triplicate: Escherichia coli, Staphylococcus aureus, and a mixed culture of the two. The inoculations were prepared according to McFarland 0.5, incubated at 37 °C for 6 hours, diluted in saline, placed on the CaF2 window, and stored for one hour at 50 °C to obtain a thin film. The measurements were performed with a Spectrum Spotlight 400 (Perkin-Elmer) instrument in the range of 4000-900 cm^-1, with 32 scans, using a transmittance technique in point and image modes. The data were processed (baseline correction, normalization, and calculation of the first derivative followed by 9-point smoothing with a Savitzky-Golay algorithm), and a cluster analysis was performed using Ward's algorithm; excellent discrimination between pure and mixed cultures was obtained. Our preliminary results indicate that FT-IR microspectroscopy associated with cluster analysis can be used to discriminate between pure and mixed cultures.
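The preprocessing-plus-clustering pipeline described above (normalisation, 9-point Savitzky-Golay first derivative, Ward's cluster analysis) can be sketched on synthetic spectra. The band positions, band width, noise level and polynomial order below are assumptions made for illustration; only the 9-point window, first derivative and Ward linkage come from the abstract.

```python
import numpy as np
from scipy.signal import savgol_filter
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(7)
wavenumber = np.linspace(900, 4000, 500)      # cm^-1

def spectrum(band_center):
    """One noisy mock absorption spectrum with a single Gaussian band."""
    band = np.exp(-((wavenumber - band_center) / 120.0) ** 2)
    return band + rng.normal(0.0, 0.02, wavenumber.size)

# Three replicates each of two mock cultures with different band positions
spectra = np.array([spectrum(c) for c in (1650, 1650, 1650, 2900, 2900, 2900)])

# Normalisation and 9-point Savitzky-Golay first derivative
spectra /= np.linalg.norm(spectra, axis=1, keepdims=True)
deriv = savgol_filter(spectra, window_length=9, polyorder=3, deriv=1, axis=1)

# Ward's hierarchical cluster analysis of the derivative spectra
labels = fcluster(linkage(deriv, method="ward"), t=2, criterion="maxclust")
```

Cutting the Ward dendrogram at two clusters cleanly separates the replicates of the two mock cultures, which is the kind of discrimination the abstract reports between pure and mixed cultures.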
Voxel- and Graph-Based Point Cloud Segmentation of 3D Scenes Using Perceptual Grouping Laws
NASA Astrophysics Data System (ADS)
Xu, Y.; Hoegner, L.; Tuttas, S.; Stilla, U.
2017-05-01
Segmentation is the fundamental step for recognizing and extracting objects from point clouds of 3D scenes. In this paper, we present a strategy for point cloud segmentation using a voxel structure and graph-based clustering with perceptual grouping laws, which allows a learning-free, completely automatic, but parametric solution for segmenting 3D point clouds. More precisely, two segmentation methods utilizing voxel and supervoxel structures are reported and tested. The voxel-based data structure increases the efficiency and robustness of the segmentation process, suppressing the negative effects of noise, outliers, and uneven point densities. The clustering of voxels and supervoxels is carried out using graph theory on the basis of local contextual information, whereas conventional clustering algorithms commonly use merely pairwise information. By the use of perceptual laws, our method conducts the segmentation in a purely geometric way, avoiding the use of RGB color and intensity information, so that it can be applied to more general applications. Experiments using different datasets have demonstrated that our proposed methods achieve good results, especially for complex scenes and nonplanar object surfaces. Quantitative comparisons between our methods and other representative segmentation methods also confirm the effectiveness and efficiency of our proposals.
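The voxelization step that such methods start from, i.e. grouping a point cloud into a regular voxel grid before any graph construction, can be sketched as follows (the voxel size and dictionary-based grouping are illustrative choices, not the authors' implementation):

```python
import numpy as np

def voxelize(points, voxel_size):
    """Group a 3D point cloud into a regular voxel grid: returns a dict
    mapping integer voxel indices (ix, iy, iz) to lists of point indices."""
    keys = np.floor(points / voxel_size).astype(int)
    voxels = {}
    for idx, key in enumerate(map(tuple, keys)):
        voxels.setdefault(key, []).append(idx)
    return voxels

pts = np.array([[0.10, 0.20, 0.00],
                [0.15, 0.25, 0.05],   # falls in the same voxel as point 0
                [2.00, 2.00, 2.00]])
vox = voxelize(pts, voxel_size=0.5)
print(len(vox))  # → 2
```

An adjacency graph over the occupied voxel keys (6- or 26-connectivity) would then serve as the input to the graph-based clustering stage.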
NASA Technical Reports Server (NTRS)
Wilson, Gillian; Demarco, Ricardo; Muzzin, Adam; Yee, H.K.C.; Lacy, Mark; Surace, Jason; Gilbank, David; Blindert, Kris; Hoekstra, Henk; Majumdar, Subhabrata;
2008-01-01
The Spitzer Adaptation of the Red-sequence Cluster Survey (SpARCS) is a z'-passband imaging survey, consisting of deep (z' approx. 24 AB) observations made from both hemispheres using the CFHT 3.6m and CTIO 4m telescopes. The survey was designed with the primary aim of detecting galaxy clusters at z > 1. In tandem with pre-existing 3.6 micron observations from the Spitzer Space Telescope SWIRE Legacy Survey, SpARCS detects clusters using an infrared adaptation of the two-filter red-sequence cluster technique. The total effective area of the SpARCS cluster survey is 41.9 sq deg. In this paper, we provide an overview of the 13.6 sq deg Southern CTIO/MOSAICII observations. The 28.3 sq deg Northern CFHT/MegaCam observations are summarized in a companion paper by Muzzin et al. (2008a). In this paper, we also report spectroscopic confirmation of SpARCS J003550-431224, a very rich galaxy cluster at z = 1.335, discovered in the ELAIS-S1 field. To date, this is the highest spectroscopically confirmed redshift for a galaxy cluster discovered using the red-sequence technique. Based on nine confirmed members, SpARCS J003550-431224 has a preliminary velocity dispersion of 1050+/-230 km/s. With its proven capability for efficient cluster detection, SpARCS is a demonstration that we have entered an era of large, homogeneously-selected z > 1 cluster surveys.
Zhang, Xianming; Lohmann, Rainer; Dassuncao, Clifton; Hu, Xindi C.; Weber, Andrea K.; Vecitis, Chad D.; Sunderland, Elsie M.
2017-01-01
Exposure to poly- and perfluoroalkyl substances (PFASs) has been associated with adverse health effects in humans and wildlife. Understanding pollution sources is essential for environmental regulation, but source attribution for PFASs has been confounded by limited information on industrial releases and rapid changes in chemical production. Here we use principal component analysis (PCA), hierarchical clustering, and geospatial analysis to understand source contributions to 14 PFASs measured across 37 sites in the Northeastern United States in 2014. PFASs are significantly elevated in urban areas compared to rural sites, except for perfluorobutane sulfonate (PFBS), N-methyl perfluorooctanesulfonamidoacetic acid (N-MeFOSAA), perfluoroundecanate (PFUnDA), and perfluorododecanate (PFDoDA). The highest PFAS concentrations across sites were for perfluorooctanate (PFOA, 56 ng L−1) and perfluorohexane sulfonate (PFHxS, 43 ng L−1), and PFOS levels are lower than earlier measurements of U.S. surface waters. PCA and cluster analysis indicate three main statistical groupings of PFASs. Geospatial analysis of watersheds reveals that the first component/cluster originates from a mixture of contemporary point sources such as airports and textile mills. Atmospheric sources from the waste sector are consistent with the second component, and the metal smelting industry plausibly explains the third component. We find this source-attribution technique is effective for better understanding PFAS sources in urban areas. PMID:28217711
A clustering algorithm for sample data based on environmental pollution characteristics
NASA Astrophysics Data System (ADS)
Chen, Mei; Wang, Pengfei; Chen, Qiang; Wu, Jiadong; Chen, Xiaoyun
2015-04-01
Environmental pollution has become an issue of serious international concern in recent years. Among receptor-oriented pollution models, CMB, PMF, UNMIX, and PCA are widely used for source apportionment. To improve the accuracy of source apportionment and to classify the sample data for these models, this study proposes an easy-to-use, high-dimensional EPC algorithm that not only organizes all of the sample data into different groups according to similarities in pollution characteristics, such as pollution sources and concentrations, but also simultaneously detects outliers. The main clustering process consists of selecting the first unlabelled point as a cluster centre, assigning each data point in the sample dataset to its most similar cluster centre according to both a user-defined threshold and the value of the similarity function in each iteration, and finally refining the clusters with a method similar to k-means. The validity and accuracy of the algorithm are tested using both real and synthetic datasets. The results show that the EPC algorithm is practical and effective for appropriately classifying sample data for source apportionment models and is helpful for better understanding and interpreting pollution sources.
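The clustering pass described in the abstract (first unlabelled point becomes a centre, threshold-based assignment, then a k-means-like refinement) can be sketched as below. Using Euclidean distance as a stand-in for the paper's similarity function, and a single refinement pass, are simplifying assumptions:

```python
import numpy as np

def epc_cluster(data, threshold):
    """Sketch of a threshold-based clustering pass: take the first
    unlabelled point as a new cluster centre, assign each point to its
    nearest centre if it lies within `threshold`, otherwise open a new
    cluster; then refine the centres k-means style."""
    centers = []
    labels = np.full(len(data), -1)
    for i, x in enumerate(data):
        if centers:
            d = np.linalg.norm(np.asarray(centers) - x, axis=1)
            j = int(np.argmin(d))
            if d[j] <= threshold:
                labels[i] = j
                continue
        centers.append(x.copy())           # x starts a new cluster
        labels[i] = len(centers) - 1
    for j in range(len(centers)):          # one k-means-style refinement
        centers[j] = data[labels == j].mean(axis=0)
    return labels, np.asarray(centers)

data = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
labels, centers = epc_cluster(data, threshold=1.0)
print(labels)  # → [0 0 1 1]
```

Points whose distance to every centre exceeds the threshold simply open singleton clusters, which is one simple way such a scheme can flag outliers.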
1993-06-18
...rule rather than the exception. In the Standardized Aquatic Microcosm and the Mixed Flask Culture (MFC) microcosms, multivariate analysis and clustering methods... experiments using two microcosm protocols. We use nonmetric clustering, a multivariate pattern recognition technique developed by Matthews and Hearne (1991
ERIC Educational Resources Information Center
Sinaga, Megawati
2017-01-01
The objective of this experimental study was to investigate the effect of the Roundtable and Clustering teaching techniques and of students' personal traits on students' achievement in descriptive writing. The students in grade IX of SMP Negeri 2 Pancurbatu in the 2016/2017 academic year were chosen as the population of this research. The…
‘Parabolic’ trapped modes and steered Dirac cones in platonic crystals
McPhedran, R. C.; Movchan, A. B.; Movchan, N. V.; Brun, M.; Smith, M. J. A.
2015-01-01
This paper discusses the properties of flexural waves governed by the biharmonic operator, and propagating in a thin plate pinned at doubly periodic sets of points. The emphases are on the design of dispersion surfaces having the Dirac cone topology, and on the related topic of trapped modes in plates for a finite set (cluster) of pinned points. The Dirac cone topologies we exhibit have at least two cones touching at a point in the reciprocal lattice, augmented by another band passing through the point. We show that these Dirac cones can be steered along symmetry lines in the Brillouin zone by varying the aspect ratio of rectangular lattices of pins, and that, as the cones are moved, the involved band surfaces tilt. We link Dirac points with a parabolic profile in their neighbourhood, and the characteristic of this parabolic profile decides the direction of propagation of the trapped mode in finite clusters. PMID:27547089
NASA Astrophysics Data System (ADS)
D'Alessandro, A.; Mangano, G.; D'Anna, G.; Luzio, D.; Selvaggi, G.
2011-12-01
On September 6th, 2002, northern Sicily was hit by a strong earthquake (MW 5.9). In the following six months over a thousand aftershocks were located in the same area. On December 7th, 2009, the INGV OBSLab deployed an OBS/H near the epicentral area of the main shock at a depth of 1500 m. The submarine station was recovered after 233 days. During the eight months of the experiment the OBS/H recorded about 250 small-magnitude events of clear local origin. In order to identify seismic events generated by the same tectonic structure, we applied a clustering technique based on the similarity of the waveforms. The similarity matrix was constructed using the maximum of the normalized cross-covariance function. To identify the multiplets, we used an agglomerative hierarchical clustering algorithm with the nearest-neighbor strategy. The results are summarized in the dendrogram of Fig. 1. The partitions were obtained by "cutting" the dendrogram at a distance level of 0.3. In this way we identified 9 multiplets and some doublets and triplets. Fig. 2 shows multiplet 1 as an example. The events of this cluster have a high level of similarity; 25 of the 31 micro-events are characterized by a similarity greater than 0.9. To locate the micro-earthquakes recorded by the OBS/H, a single-station location technique was implemented and applied. Some multiplets have clouds of hypocenters overlapping each other. These clusters, indistinguishable without the application of a waveform clustering technique, show differences in the waveforms that must be attributed to differences in the focal mechanisms which generated them.
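The workflow described above (similarity matrix from the maximum of the normalized cross-covariance, nearest-neighbour agglomeration, dendrogram cut at 0.3) can be sketched with synthetic traces; the waveforms and noise level below are invented for illustration:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def max_norm_xcorr(a, b):
    """Maximum of the normalized cross-covariance between two traces."""
    a = a - a.mean()
    b = b - b.mean()
    cc = np.correlate(a, b, mode='full')
    return cc.max() / (np.linalg.norm(a) * np.linalg.norm(b))

rng = np.random.default_rng(1)
t = np.linspace(0, 1, 200)
base1 = np.sin(2 * np.pi * 5 * t) * np.exp(-3 * t)   # "source 1" waveform
base2 = np.sin(2 * np.pi * 11 * t) * np.exp(-5 * t)  # "source 2" waveform
traces = [base1 + 0.05 * rng.standard_normal(t.size) for _ in range(3)] + \
         [base2 + 0.05 * rng.standard_normal(t.size) for _ in range(3)]

# distance = 1 - similarity, built pairwise from the waveform correlations
n = len(traces)
dist = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        dist[i, j] = dist[j, i] = 1.0 - max_norm_xcorr(traces[i], traces[j])

# nearest-neighbour (single-linkage) agglomeration, dendrogram cut at 0.3
Z = linkage(squareform(dist), method='single')
labels = fcluster(Z, t=0.3, criterion='distance')
print(labels)
```

Traces sharing a source waveform end up within 0.03 or so of each other in this distance, well inside the 0.3 cut, while cross-family distances stay far above it.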
Wagner, Martin G; Hatt, Charles R; Dunkerley, David A P; Bodart, Lindsay E; Raval, Amish N; Speidel, Michael A
2018-04-16
Transcatheter aortic valve replacement (TAVR) is a minimally invasive procedure in which a prosthetic heart valve is placed and expanded within a defective aortic valve. The device placement is commonly performed using two-dimensional (2D) fluoroscopic imaging. Within this work, we propose a novel technique to track the motion and deformation of the prosthetic valve in three dimensions based on biplane fluoroscopic image sequences. The tracking approach uses a parameterized point cloud model of the valve stent which can undergo rigid three-dimensional (3D) transformation and different modes of expansion. Rigid elements of the model are individually rotated and translated in three dimensions to approximate the motions of the stent. Tracking is performed using an iterative 2D-3D registration procedure which estimates the model parameters by minimizing the mean-squared image values at the positions of the forward-projected model points. Additionally, an initialization technique is proposed, which locates clusters of salient features to determine the initial position and orientation of the model. The proposed algorithms were evaluated based on simulations using a digital 4D CT phantom as well as experimentally acquired images of a prosthetic valve inside a chest phantom with anatomical background features. The target registration error was 0.12 ± 0.04 mm in the simulations and 0.64 ± 0.09 mm in the experimental data. The proposed algorithm could be used to generate 3D visualization of the prosthetic valve from two projections. In combination with soft-tissue sensitive-imaging techniques like transesophageal echocardiography, this technique could enable 3D image guidance during TAVR procedures. © 2018 American Association of Physicists in Medicine.
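The iterative 2D-3D registration idea, estimating pose parameters by optimizing the image values at forward-projected model points, can be sketched with an analytic stand-in for the fluoroscopic images. The ring model, the Gaussian-blob "image", the shift-only pose, and the generic Nelder-Mead optimizer are all illustrative assumptions, not the authors' algorithm:

```python
import numpy as np
from scipy.optimize import minimize

# toy model: stent as a ring of points (hypothetical geometry)
theta = np.linspace(0, 2 * np.pi, 24, endpoint=False)
model = np.stack([np.cos(theta), np.sin(theta), np.zeros_like(theta)], axis=1)

TRUE_SHIFT = np.array([0.4, -0.2, 0.3])   # pose we try to recover

# analytic stand-in for the biplane images: dark Gaussian blobs at the
# device's true projected positions in each view
blobs_a = (model + TRUE_SHIFT)[:, [0, 1]]   # camera A records (x, y)
blobs_b = (model + TRUE_SHIFT)[:, [1, 2]]   # camera B records (y, z)

def image_value(q, blobs):
    # "pixel" value at query points q: sum of dark blobs (more negative = darker)
    d2 = ((q[:, None, :] - blobs[None, :, :]) ** 2).sum(axis=-1)
    return -np.exp(-d2 / 0.5).sum(axis=1)

def cost(shift):
    # forward-project the shifted model and average image values at its points
    pts = model + shift
    return (image_value(pts[:, [0, 1]], blobs_a).mean()
            + image_value(pts[:, [1, 2]], blobs_b).mean())

res = minimize(cost, x0=np.zeros(3), method='Nelder-Mead')
print(np.round(res.x, 2))
```

Since the device appears dark in both views, minimizing the mean projected image value drives the model onto the imaged stent; the real method additionally parameterizes rigid rotation and expansion modes per stent element.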
Mazarakioti, Eleni C; Poole, Katye M; Cunha-Silva, Luis; Christou, George; Stamatatos, Theocharis C
2014-08-14
The first use of the flexible Schiff base ligand N-salicylidene-2-aminocyclohexanol in metal cluster chemistry has afforded a new family of Ln7 clusters with ideal D3h point group symmetry and metal-centered trigonal prismatic topology; solid-state and solution studies revealed SMM and photoluminescence behaviors.
Mechanisms for the clustering of inertial particles in the inertial range of isotropic turbulence
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bragg, Andrew D.; Ireland, Peter J.; Collins, Lance R.
2015-08-27
In this study, we consider the physical mechanism for the clustering of inertial particles in the inertial range of isotropic turbulence. We analyze the exact, but unclosed, equation governing the radial distribution function (RDF) and compare the mechanisms it describes for clustering in the dissipation and inertial ranges. We demonstrate that in the limit St_r << 1, where St_r is the Stokes number based on the eddy turnover time scale at separation r, the clustering in the inertial range can be understood to be due to the preferential sampling of the coarse-grained fluid velocity gradient tensor at that scale. When St_r ≳ O(1) this mechanism gives way to a nonlocal clustering mechanism. These findings reveal that the clustering mechanisms in the inertial range are analogous to the mechanisms that we identified for the dissipation regime. Further, we discuss the similarities and differences between the clustering mechanisms we identify in the inertial range and the "sweep-stick" mechanism developed by Coleman and Vassilicos. We show that the idea that inertial particles are swept along with acceleration stagnation points is only approximately true because there always exists a finite difference between the velocity of the acceleration stagnation points and the local fluid velocity. This relative velocity is sufficient to allow particles to traverse the average distance between the stagnation points within the correlation time scale of the acceleration field. We also show that the stick part of the mechanism is only valid for St_r << 1 in the inertial range. We emphasize that our clustering mechanism provides the more fundamental explanation since it, unlike the sweep-stick mechanism, is able to explain clustering in arbitrary spatially correlated velocity fields.
We then consider the closed, model equation for the RDF given in Zaichik and Alipchenkov and use this, together with the results from our analysis, to predict the analytic form of the RDF in the inertial range for St_r << 1, which, unlike that in the dissipation range, is not scale invariant. Finally, the results are in good agreement with direct numerical simulations, provided the separations are well within the inertial range.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Slepian, Zachary; Slosar, Anze; Eisenstein, Daniel J.
We search for a galaxy clustering bias due to a modulation of galaxy number with the baryon-dark matter relative velocity resulting from recombination-era physics. We find no detected signal and place the constraint bv < 0.01 on the relative velocity bias for the CMASS galaxies. This bias is an important potential systematic of Baryon Acoustic Oscillation (BAO) method measurements of the cosmic distance scale using the 2-point clustering. Our limit on the relative velocity bias indicates a systematic shift of no more than 0.3% rms in the distance scale inferred from the BAO feature in the BOSS 2-point clustering, well below the 1% statistical error of this measurement. In conclusion, this constraint is the most stringent currently available and has important implications for the ability of upcoming large-scale structure surveys such as DESI to self-protect against the relative velocity as a possible systematic.
NASA Astrophysics Data System (ADS)
Slepian, Zachary; Eisenstein, Daniel J.; Blazek, Jonathan A.; Brownstein, Joel R.; Chuang, Chia-Hsun; Gil-Marín, Héctor; Ho, Shirley; Kitaura, Francisco-Shu; McEwen, Joseph E.; Percival, Will J.; Ross, Ashley J.; Rossi, Graziano; Seo, Hee-Jong; Slosar, Anže; Vargas-Magaña, Mariana
2018-02-01
We search for a galaxy clustering bias due to a modulation of galaxy number with the baryon-dark matter relative velocity resulting from recombination-era physics. We find no detected signal and place the constraint bv < 0.01 on the relative velocity bias for the CMASS galaxies. This bias is an important potential systematic of baryon acoustic oscillation (BAO) method measurements of the cosmic distance scale using the two-point clustering. Our limit on the relative velocity bias indicates a systematic shift of no more than 0.3 per cent rms in the distance scale inferred from the BAO feature in the BOSS two-point clustering, well below the 1 per cent statistical error of this measurement. This constraint is the most stringent currently available and has important implications for the ability of upcoming large-scale structure surveys such as the Dark Energy Spectroscopic Instrument (DESI) to self-protect against the relative velocity as a possible systematic.
NASA Astrophysics Data System (ADS)
Takizawa, Kenji; Tezduyar, Tayfun E.; Boben, Joseph; Kostov, Nikolay; Boswell, Cody; Buscher, Austin
2013-12-01
To increase aerodynamic performance, the geometric porosity of a ringsail spacecraft parachute canopy is sometimes increased, beyond the "rings" and "sails" with hundreds of "ring gaps" and "sail slits." This creates extra computational challenges for fluid-structure interaction (FSI) modeling of clusters of such parachutes, beyond those created by the lightness of the canopy structure, geometric complexities of hundreds of gaps and slits, and the contact between the parachutes of the cluster. In FSI computation of parachutes with such "modified geometric porosity," the flow through the "windows" created by the removal of the panels and the wider gaps created by the removal of the sails cannot be accurately modeled with the Homogenized Modeling of Geometric Porosity (HMGP), which was introduced to deal with the hundreds of gaps and slits. The flow needs to be actually resolved. All these computational challenges need to be addressed simultaneously in FSI modeling of clusters of spacecraft parachutes with modified geometric porosity. The core numerical technology is the Stabilized Space-Time FSI (SSTFSI) technique, and the contact between the parachutes is handled with the Surface-Edge-Node Contact Tracking (SENCT) technique. In the computations reported here, in addition to the SSTFSI and SENCT techniques and HMGP, we use the special techniques we have developed for removing the numerical spinning component of the parachute motion and for restoring the mesh integrity without a remesh. We present results for 2- and 3-parachute clusters with two different payload models.
SOTXTSTREAM: Density-based self-organizing clustering of text streams.
Bryant, Avory C; Cios, Krzysztof J
2017-01-01
A streaming data clustering algorithm is presented, building upon the density-based self-organizing stream clustering algorithm SOSTREAM. Many density-based clustering algorithms are limited by their inability to identify clusters with heterogeneous density. SOSTREAM addresses this limitation through the use of local (nearest-neighbor-based) density determinations. Additionally, many stream clustering algorithms use a two-phase clustering approach: in the first phase, a micro-clustering solution is maintained online, while in the second phase, the micro-clustering solution is clustered offline to produce a macro solution. By performing self-organization techniques on micro-clusters in the online phase, SOSTREAM is able to maintain a macro clustering solution in a single phase. Leveraging concepts from SOSTREAM, a new density-based self-organizing text stream clustering algorithm, SOTXTSTREAM, is presented that addresses several shortcomings of SOSTREAM. Gains in clustering performance of this new algorithm are demonstrated on several real-world text stream datasets.
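The local, nearest-neighbor-based density determination that such algorithms rely on can be sketched as an inverse mean k-nearest-neighbor distance; the choice of k and the inverse-mean form are illustrative, not SOSTREAM's exact definition:

```python
import numpy as np
from scipy.spatial import cKDTree

def local_density(points, k=5):
    """Nearest-neighbor density estimate: inverse of the mean distance
    to the k nearest neighbors (the query returns the point itself in
    column 0, which is dropped)."""
    tree = cKDTree(points)
    dists, _ = tree.query(points, k=k + 1)
    return 1.0 / dists[:, 1:].mean(axis=1)

rng = np.random.default_rng(2)
dense = rng.normal(0.0, 0.1, size=(50, 2))    # tight cluster
sparse = rng.normal(5.0, 1.0, size=(50, 2))   # loose cluster
dens = local_density(np.vstack([dense, sparse]))
print(dens[:50].mean() > dens[50:].mean())  # → True
```

Because the estimate is relative to each point's own neighborhood, both the tight and the loose cluster can be recognized as clusters, which is exactly the heterogeneous-density case that global-threshold methods miss.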
Jácome, Gabriel; Valarezo, Carla; Yoo, Changkyoo
2018-03-30
Pollution and the eutrophication process are increasing in lake Yahuarcocha and constant water quality monitoring is essential for a better understanding of the patterns occurring in this ecosystem. In this study, key sensor locations were determined using spatial and temporal analyses combined with geographical information systems (GIS) to assess the influence of weather features, anthropogenic activities, and other non-point pollution sources. A water quality monitoring network was established to obtain data on 14 physicochemical and microbiological parameters at each of seven sample sites over a period of 13 months. A spatial and temporal statistical approach using pattern recognition techniques, such as cluster analysis (CA) and discriminant analysis (DA), was employed to classify and identify the most important water quality parameters in the lake. The original monitoring network was reduced to four optimal sensor locations based on a fuzzy overlay of the interpolations of concentration variations of the most important parameters.
Coherent Structure Detection using Persistent Homology and other Topological Tools
NASA Astrophysics Data System (ADS)
Smith, Spencer; Roberts, Eric; Sindi, Suzanne; Mitchell, Kevin
2017-11-01
For non-autonomous, aperiodic fluid flows, coherent structures help organize the dynamics, much as invariant manifolds and periodic orbits do for autonomous or periodic systems. The prevalence of such flows in nature and industry has motivated many successful techniques for defining and detecting coherent structures. However, often these approaches require very fine trajectory data to reconstruct velocity fields and compute Cauchy-Green-tensor-related quantities. We use topological techniques to help detect coherent trajectory sets in relatively sparse 2D advection problems. More specifically, we have developed a homotopy-based algorithm, the ensemble-based topological entropy calculation (E-tec), which assigns to each edge in an initial triangulation of advected points a topologically forced lower bound on its future stretching rate. The triangulation and its weighted edges allow us to analyze flows using persistent homology. This topological data analysis tool detects clusters and loops in the triangulation that are robust in the presence of noise and in this case correspond to coherent trajectory sets.
A study and evaluation of image analysis techniques applied to remotely sensed data
NASA Technical Reports Server (NTRS)
Atkinson, R. J.; Dasarathy, B. V.; Lybanon, M.; Ramapriyan, H. K.
1976-01-01
An analysis of phenomena causing nonlinearities in the transformation from Landsat multispectral scanner coordinates to ground coordinates is presented. Experimental results comparing rms errors at ground control points indicated a slight improvement when a nonlinear (8-parameter) transformation was used instead of an affine (6-parameter) transformation. Using a preliminary ground truth map of a test site in Alabama covering the Mobile Bay area and six Landsat images of the same scene, several classification methods were assessed. A methodology was developed for automatic change detection using classification/cluster maps. A coding scheme was employed for generation of change depiction maps indicating specific types of changes. Inter- and intraseasonal data of the Mobile Bay test area were compared to illustrate the method. A beginning was made in the study of data compression by applying a Karhunen-Loeve transform technique to a small section of the test data set. The second part of the report provides a formal documentation of the several programs developed for the analysis and assessments presented.
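The affine (6-parameter) versus nonlinear (8-parameter) comparison can be reproduced in miniature by least-squares fitting both models to ground-control points and comparing rms residuals. The synthetic distortion coefficients below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
# synthetic ground-control points: image (row, col) coordinates
rc = rng.uniform(0, 1000, size=(20, 2))
r, c = rc[:, 0], rc[:, 1]

# "true" ground coordinates with a small nonlinear (cross-term) distortion
ground = np.stack([2.0 * c + 0.3 * r + 10 + 1e-4 * r * c,
                   -0.2 * c + 1.8 * r + 5 - 5e-5 * r * c], axis=1)

def fit_rms(design):
    # least-squares fit of ground coords to the design matrix; rms residual
    coef, *_ = np.linalg.lstsq(design, ground, rcond=None)
    return np.sqrt(np.mean((design @ coef - ground) ** 2))

ones = np.ones_like(r)
affine = np.stack([ones, r, c], axis=1)           # 6 parameters (3 per axis)
bilinear = np.stack([ones, r, c, r * c], axis=1)  # 8 parameters (4 per axis)
print(fit_rms(affine) > fit_rms(bilinear))  # → True
```

The extra r*c cross-term absorbs the nonlinearity exactly here, so the 8-parameter fit drives the control-point rms essentially to zero, mirroring the slight improvement the study observed with real Landsat control points.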
Safner, T.; Miller, M.P.; McRae, B.H.; Fortin, M.-J.; Manel, S.
2011-01-01
Recently, techniques available for identifying clusters of individuals or boundaries between clusters using genetic data from natural populations have expanded rapidly. Consequently, there is a need to evaluate these different techniques. We used spatially-explicit simulation models to compare three spatial Bayesian clustering programs and two edge detection methods. Spatially-structured populations were simulated where a continuous population was subdivided by barriers. We evaluated the ability of each method to correctly identify boundary locations while varying: (i) time after divergence, (ii) strength of isolation by distance, (iii) level of genetic diversity, and (iv) amount of gene flow across barriers. To further evaluate the methods' effectiveness to detect genetic clusters in natural populations, we used previously published data on North American pumas and a European shrub. Our results show that with simulated and empirical data, the Bayesian spatial clustering algorithms outperformed direct edge detection methods. All methods incorrectly detected boundaries in the presence of strong patterns of isolation by distance. Based on this finding, we support the application of Bayesian spatial clustering algorithms for boundary detection in empirical datasets, with necessary tests for the influence of isolation by distance. © 2011 by the authors; licensee MDPI, Basel, Switzerland.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Stello, Dennis; Huber, Daniel; Bedding, Timothy R.
Studying star clusters offers significant advances in stellar astrophysics due to the combined power of having many stars with essentially the same distance, age, and initial composition. This makes clusters excellent test benches for the verification of stellar evolution theory. To fully exploit this potential, it is vital that the star sample is uncontaminated by stars that are not members of the cluster. Techniques for determining cluster membership therefore play a key role in the investigation of clusters. We present results on three clusters in the Kepler field of view based on a newly established technique that uses asteroseismology to identify fore- or background stars in the field, which demonstrates advantages over classical methods such as kinematic and photometric measurements. Four previously identified seismic non-members in NGC 6819 are confirmed in this study, and three additional non-members are found: two in NGC 6819 and one in NGC 6791. We further highlight which stars are, or might be, affected by blending, which needs to be taken into account when analyzing these Kepler data.
Structure of overheated metal clusters: MD simulation study
DOE Office of Scientific and Technical Information (OSTI.GOV)
Vorontsov, Alexander
2015-08-17
The structure of overheated metal clusters formed in the condensation process was studied by computer simulation techniques. It was found that clusters larger than several tens of atoms have three layers: a core part, an intermediate dense-packing layer, and a gas-like shell with low density. The change of the size and structure of these layers with the variation of internal energy and cluster size is discussed.
Cancer detection based on Raman spectra super-paramagnetic clustering
NASA Astrophysics Data System (ADS)
González-Solís, José Luis; Guizar-Ruiz, Juan Ignacio; Martínez-Espinosa, Juan Carlos; Martínez-Zerega, Brenda Esmeralda; Juárez-López, Héctor Alfonso; Vargas-Rodríguez, Héctor; Gallegos-Infante, Luis Armando; González-Silva, Ricardo Armando; Espinoza-Padilla, Pedro Basilio; Palomares-Anda, Pascual
2016-08-01
The clustering of Raman spectra of serum samples is analyzed using the super-paramagnetic clustering technique based on the Potts spin model. We investigated the clustering of biochemical networks by using Raman data that define edge lengths in the network, where the interactions are functions of the individual band intensities of the Raman spectra. For this study, we used two groups of 58 and 102 control Raman spectra and 160, 150, and 42 Raman spectra of serum samples from breast cancer, cervical cancer, and leukemia patients, respectively. The spectra were collected from patients at different hospitals in Mexico. Using the super-paramagnetic clustering technique, we identified the most natural and compact clusters, allowing us to discriminate between control and cancer patients. Of special interest was the leukemia case, where the nearly hierarchical structure observed allowed identification of each patient's leukemia type. The goal of this study is to apply a model from statistical physics, super-paramagnetic clustering, to find natural clusters that allow us to design a cancer detection method. To the best of our knowledge, this is the first report of preliminary results evaluating the usefulness of super-paramagnetic clustering in spectroscopy, where it is used for the classification of spectra.
Convalescing Cluster Configuration Using a Superlative Framework
Sabitha, R.; Karthik, S.
2015-01-01
Competent data mining methods are vital to discover knowledge from the databases built as a result of the enormous growth of data. Various data mining techniques are applied to obtain knowledge from these databases. Data clustering is one such descriptive data mining technique, which guides the partitioning of data objects into disjoint segments. K-means is a versatile algorithm among the various approaches used in data clustering, but the algorithm and its diverse adaptations suffer certain performance problems. To overcome these issues, a superlative algorithm is proposed in this paper to perform data clustering. The specific features of the proposed algorithm are discretizing the dataset, thereby improving the accuracy of clustering, and adopting the binary search initialization method to generate cluster centroids. The generated centroids are fed as input to the K-means approach, which iteratively segments the data objects into their respective clusters. The clustered results are measured for accuracy and validity. Experiments conducted on datasets from the UC Irvine Machine Learning Repository show that the accuracy and validity measures are higher than those of two other approaches, namely simple K-means and the Binary Search method. Thus, the proposed approach shows that the discretization process improves the efficacy of descriptive data mining tasks. PMID:26543895
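The overall pipeline (initial centroids chosen to span the data, then iterative K-means refinement) can be sketched as follows. The initializer below is a simplified stand-in for the paper's binary search method, and the discretization step is omitted:

```python
import numpy as np

def init_centroids(data, k):
    """Stand-in initializer (assumption, not the paper's binary search):
    sort points by distance from the overall mean and pick k evenly
    spaced ones, so the initial centroids span the data's spread."""
    order = np.argsort(np.linalg.norm(data - data.mean(axis=0), axis=1))
    picks = order[np.linspace(0, len(data) - 1, k).astype(int)]
    return data[picks].astype(float)

def kmeans(data, k, iters=20):
    centroids = init_centroids(data, k)
    for _ in range(iters):
        # assign each point to its nearest centroid
        labels = np.argmin(np.linalg.norm(
            data[:, None, :] - centroids[None, :, :], axis=2), axis=1)
        for j in range(k):  # recompute centroids from their members
            if np.any(labels == j):
                centroids[j] = data[labels == j].mean(axis=0)
    return labels, centroids

rng = np.random.default_rng(4)
data = np.vstack([rng.normal(m, 0.2, size=(30, 2)) for m in (0.0, 3.0)])
labels, centroids = kmeans(data, k=2)
print(np.bincount(labels))
```

A data-spanning initialization avoids the classic failure mode of random seeding, where two initial centroids land in the same dense region and split one true cluster.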
VizieR Online Data Catalog: HST astro-photometric analysis of NGC5139. III. (Bellini+, 2017)
NASA Astrophysics Data System (ADS)
Bellini, A.; Milone, A. P.; Anderson, J.; Marino, A. F.; Piotto, G.; van der Marel, R. P.; Bedin, L. R.; King, I. R.
2018-03-01
The results presented here are the product of a massive effort, and represent a continuation of what we published in Bellini+ (2010, J/AJ/140/631). Paper I of this series (Bellini+ 2017, J/ApJ/842/6) describes the photometric techniques we adopted and applied to 650 individual exposures in 26 different bands. The photometry has been corrected for differential reddening and zero-point spatial variations in Bellini+ (2017ApJ...842....7B, Paper II). In this paper, we analyze the CMDs and the so-called "chromosome" maps (Milone+ 2017MNRAS.464.3636M) of the MS of the cluster, and finally identify at least 15 distinct stellar populations. (1 data file).
Goodacre, R; Hiom, S J; Cheeseman, S L; Murdoch, D; Weightman, A J; Wade, W G
1996-02-01
Curie-point pyrolysis mass spectra were obtained from 29 oral asaccharolytic Eubacterium strains and 6 abscess isolates previously identified as Peptostreptococcus heliotrinreducens. Pyrolysis mass spectrometry (PyMS) with cluster analysis was able to clarify the taxonomic position of this group of organisms. Artificial neural networks (ANNs) were then trained by supervised learning (with the back-propagation algorithm) to recognize the strains from their pyrolysis mass spectra; all Eubacterium strains were correctly identified, and the abscess isolates were identified as unnamed Eubacterium taxon C2 and were distinct from the type strain of P. heliotrinreducens. These results demonstrate that the combination of PyMS and ANNs provides a rapid and accurate identification technique.
NASA Technical Reports Server (NTRS)
McDowell, Mark (Inventor); Glasgow, Thomas K. (Inventor)
1999-01-01
A system and a method for measuring three-dimensional velocities at a plurality of points in a fluid, employing at least two cameras positioned approximately perpendicular to one another. The cameras are calibrated to accurately map image coordinates to the world coordinate system. The two-dimensional views of the cameras are recorded for image processing and centroid coordinate determination. Any overlapping particle clusters are decomposed into constituent centroids. The tracer particles are tracked on a two-dimensional basis and then stereo matched to obtain three-dimensional locations of the particles as a function of time, so that velocities can be measured therefrom. The stereo imaging velocimetry technique of the present invention provides a full-field, quantitative, three-dimensional map of any optically transparent fluid that is seeded with tracer particles.
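The stereo-matching step, combining two roughly perpendicular 2D views into 3D particle positions, can be sketched with an idealized toy model. This assumes ideal orthographic cameras sharing a common y axis; it is not the patent's actual calibration or matching pipeline:

```python
# Sketch: camera 1 sees (x, y), camera 2 (perpendicular) sees (z, y).
# Particles are matched across views on the shared y coordinate and
# merged into (x, y, z). Idealized assumption, not the patented method.
def stereo_match(view_xy, view_zy, tol=1e-3):
    """Merge two perpendicular 2D views into 3D points by matching
    detections whose y coordinates agree within tol."""
    points = []
    for x, y1 in view_xy:
        for z, y2 in view_zy:
            if abs(y1 - y2) < tol:          # same particle in both views
                points.append((x, 0.5 * (y1 + y2), z))
    return points

pts = stereo_match([(1.0, 2.0), (3.0, 4.0)], [(5.0, 2.0), (6.0, 4.0)])
# two particles recovered at (1.0, 2.0, 5.0) and (3.0, 4.0, 6.0)
```

Tracking such matched 3D positions across frames and differencing by the frame interval would then yield the velocity field the abstract describes.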
On the shape and orientation control of an orbiting shallow spherical shell structure
NASA Technical Reports Server (NTRS)
Bainum, P. M.; Reddy, A. S. S. R.
1982-01-01
The dynamics of orbiting shallow flexible spherical shell structures under the influence of control actuators were studied. Control laws are developed to provide both attitude and shape control of the structure. The elastic modal frequencies of the fundamental and lower modes are closely grouped due to the effect of the shell curvature. The shell is gravity stabilized by a spring-loaded dumbbell-type damper attached at its apex. Control laws are developed based on pole clustering techniques. Savings in fuel consumption can be realized by using the hybrid shell-dumbbell system together with point actuators. It is shown that instability may result if the orbital and first-order gravity-gradient effects are not included in the plant prior to control law design.
An integrated approach to assess heavy metal source apportionment in peri-urban agricultural soils.
Huang, Ying; Li, Tingqiang; Wu, Chengxian; He, Zhenli; Japenga, Jan; Deng, Meihua; Yang, Xiaoe
2015-12-15
Three techniques (isotope ratio analysis, GIS mapping, and multivariate statistical analysis) were integrated to assess heavy metal pollution and source apportionment in peri-urban agricultural soils. The soils in the study area were moderately polluted with cadmium (Cd) and mercury (Hg) and lightly polluted with lead (Pb) and chromium (Cr). GIS mapping suggested that the Cd pollution originates from point sources, whereas Hg, Pb, and Cr could be traced back to both point and non-point sources. Principal component analysis (PCA) indicated that aluminum (Al), manganese (Mn), and nickel (Ni) were mainly inherited from natural sources, while Hg, Pb, and Cd were associated with two different kinds of anthropogenic sources. Cluster analysis (CA) further identified fertilizers, waste water, industrial solid wastes, road dust, and atmospheric deposition as potential sources. Based on isotope ratio analysis (IRA), organic fertilizers and road dust accounted for 74-100% and 0-24% of the total Hg input, while road dust and solid wastes contributed 0-80% and 19-100% of the Pb input. This study provides a reliable approach for heavy metal source apportionment in this particular peri-urban area, with clear potential for future application in other regions.
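The PCA step, grouping elements whose concentrations co-vary and therefore plausibly share a source, can be sketched on synthetic data. The element groupings and the two-factor data below are illustrative assumptions, not the study's measurements:

```python
# Sketch: PCA on column-standardized "metal concentration" data, so
# that elements loading on the same component suggest a common source.
# The two latent factors (lithogenic vs. anthropogenic) are simulated.
import numpy as np

def pca_loadings(X, n_components=2):
    """Return the leading component loadings of standardized X and
    the fraction of variance each component explains."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # right singular vectors of the standardized matrix are the
    # principal directions; singular values give explained variance
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    return Vt[:n_components], explained[:n_components]

rng = np.random.default_rng(1)
natural = rng.normal(size=(50, 1))     # shared "natural source" factor
anthropo = rng.normal(size=(50, 1))    # shared "anthropogenic" factor
# columns 0-2 stand in for Al, Mn, Ni; columns 3-5 for Hg, Pb, Cd
X = np.hstack([natural + 0.1 * rng.normal(size=(50, 3)),
               anthropo + 0.1 * rng.normal(size=(50, 3))])
loadings, explained = pca_loadings(X)
```

With this structure the first two components absorb most of the variance, and each element group loads heavily on only one of them, which is the pattern the abstract interprets as natural versus anthropogenic origin.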
Elastic K-means using posterior probability.
Zheng, Aihua; Jiang, Bo; Li, Yan; Zhang, Xuehan; Ding, Chris
2017-01-01
The widely used K-means clustering is a hard clustering algorithm. Here we propose an Elastic K-means clustering model (EKM) using posterior probability, with a soft capability whereby each data point can belong to multiple clusters fractionally, and we show the benefit of the proposed Elastic K-means. Furthermore, in many applications, pairwise relations (graph information) are available in addition to vector attribute information; we therefore integrate EKM with Normalized Cut graph clustering into a single clustering formulation. Finally, we provide several matrix inequalities that are useful for matrix formulations of learning models. Based on these results, we prove the correctness and convergence of the EKM algorithms. Experimental results on six benchmark datasets demonstrate the effectiveness of the proposed EKM and its integrated model.
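The key idea, replacing K-means' hard assignments with posterior probabilities so points belong to clusters fractionally, can be illustrated by a generic soft K-means. This is a sketch in the spirit of EKM, not the paper's exact objective or update rules:

```python
# Sketch: soft K-means. Responsibilities P[i, j] are posterior-style
# probabilities exp(-beta * d^2) normalized per point; centroids are
# responsibility-weighted means. Generic soft clustering, not EKM itself.
import numpy as np

def soft_kmeans(X, k, beta=5.0, iters=50):
    # deterministic init: spread centroids between data min and max
    centroids = np.linspace(X.min(axis=0), X.max(axis=0), k)
    for _ in range(iters):
        d2 = ((X[:, None, :] - centroids[None, :, :])**2).sum(axis=2)
        logits = -beta * d2
        logits -= logits.max(axis=1, keepdims=True)   # numerical stability
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)             # rows sum to 1
        # fractional (soft) memberships drive the centroid update
        centroids = (P.T @ X) / P.sum(axis=0)[:, None]
    return P, centroids

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(-2, 0.2, (15, 2)), rng.normal(2, 0.2, (15, 2))])
P, C = soft_kmeans(X, 2)
```

As beta grows, the responsibilities sharpen toward 0/1 and the procedure approaches hard K-means, which is one way to see soft clustering as a relaxation of the hard model the abstract starts from.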
Melting of isolated tin nanoparticles
Bachels; Guntherodt; Schafer
2000-08-07
The melting of isolated neutral tin cluster distributions with mean sizes of about 500 atoms has been investigated in a molecular beam experiment by calorimetrically measuring the clusters' formation energies as a function of their internal temperature. For this purpose, the temperature of the clusters' internal degrees of freedom was adjusted by means of the temperature of the cluster source's nozzle. The melting point of the investigated tin clusters was found to be lowered by 125 K, and the latent heat of fusion per atom reduced by 35%, compared with bulk tin. The melting behavior of the isolated tin clusters is discussed with respect to the occurrence of surface premelting.
MOCCA code for star cluster simulation: comparison with optical observations using COCOA
NASA Astrophysics Data System (ADS)
Askar, Abbas; Giersz, Mirek; Pych, Wojciech; Olech, Arkadiusz; Hypki, Arkadiusz
2016-02-01
We introduce and present preliminary results from COCOA (Cluster simulatiOn Comparison with ObservAtions) code for a star cluster after 12 Gyr of evolution simulated using the MOCCA code. The COCOA code is being developed to quickly compare results of numerical simulations of star clusters with observational data. We use COCOA to obtain parameters of the projected cluster model. For comparison, a FITS file of the projected cluster was provided to observers so that they could use their observational methods and techniques to obtain cluster parameters. The results show that the similarity of cluster parameters obtained through numerical simulations and observations depends significantly on the quality of observational data and photometric accuracy.
ERIC Educational Resources Information Center
Huang, Francis L.; Cornell, Dewey G.
2016-01-01
Advances in multilevel modeling techniques now make it possible to investigate the psychometric properties of instruments using clustered data. Factor models that overlook the clustering effect can lead to underestimated standard errors, incorrect parameter estimates, and model fit indices. In addition, factor structures may differ depending on…
Hierarchical Spatio-temporal Visual Analysis of Cluster Evolution in Electrocorticography Data
Murugesan, Sugeerth; Bouchard, Kristofer; Chang, Edward; ...
2016-10-02
Here, we present ECoG ClusterFlow, a novel interactive visual analysis tool for the exploration of high-resolution Electrocorticography (ECoG) data. Our system detects and visualizes dynamic high-level structures, such as communities, using the time-varying spatial connectivity network derived from the high-resolution ECoG data. ECoG ClusterFlow provides a multi-scale visualization of the spatio-temporal patterns underlying the time-varying communities using two views: 1) an overview summarizing the evolution of clusters over time and 2) a hierarchical glyph-based technique that uses data aggregation and small multiples techniques to visualize the propagation of clusters in their spatial domain. ECoG ClusterFlow makes it possible 1) to compare the spatio-temporal evolution patterns across various time intervals, 2) to compare the temporal information at varying levels of granularity, and 3) to investigate the evolution of spatial patterns without occluding the spatial context information. Lastly, we present case studies done in collaboration with neuroscientists on our team for both simulated and real epileptic seizure data aimed at evaluating the effectiveness of our approach.
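A core ingredient of cluster-evolution views like this is linking the clusters found at one time step to those at the next. A common approach, sketched here as an assumption (the abstract does not state ECoG ClusterFlow's actual matching rule), is maximum Jaccard overlap:

```python
# Sketch: link each cluster at time t to its best-overlapping
# predecessor at time t-1, using the Jaccard index over member sets.
# Unlinked clusters represent births; unmatched predecessors, deaths.
def jaccard(a, b):
    return len(a & b) / len(a | b)

def match_clusters(prev, curr, threshold=0.3):
    """Map each current cluster index to the previous cluster it most
    overlaps with, if the overlap meets the threshold."""
    links = {}
    for j, cluster in enumerate(curr):
        best = max(range(len(prev)), key=lambda i: jaccard(prev[i], cluster))
        if jaccard(prev[best], cluster) >= threshold:
            links[j] = best
    return links

t0 = [{1, 2, 3}, {4, 5}]
t1 = [{1, 2}, {4, 5, 6}, {7, 8}]
links = match_clusters(t0, t1)   # cluster {7, 8} is a new birth
```

Chaining such links across all time steps yields exactly the kind of cluster "flow" that the overview and glyph views described above can then render.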