Multiscale visual quality assessment for cluster analysis with self-organizing maps
NASA Astrophysics Data System (ADS)
Bernard, Jürgen; von Landesberger, Tatiana; Bremm, Sebastian; Schreck, Tobias
2011-01-01
Cluster analysis is an important data mining technique for analyzing large amounts of data, reducing many objects to a limited number of clusters. Cluster visualization techniques aim at supporting the user in better understanding the characteristics and relationships among the found clusters. While promising approaches to visual cluster analysis already exist, these usually fall short of incorporating the quality of the obtained clustering results. However, due to the nature of the clustering process, quality plays an important role, as for most practical data sets, typically many different clusterings are possible. Being aware of clustering quality is important to judge the expressiveness of a given cluster visualization, or to adjust the clustering process with refined parameters, among others. In this work, we present an encompassing suite of visual tools for quality assessment of an important visual cluster algorithm, namely, the Self-Organizing Map (SOM) technique. We define, measure, and visualize the notion of SOM cluster quality along a hierarchy of cluster abstractions. The quality abstractions range from simple scalar-valued quality scores up to the structural comparison of a given SOM clustering with the output of additional supportive clustering methods. The suite of methods allows the user to assess the SOM quality on the appropriate abstraction level, and arrive at improved clustering results. We implement our tools in an integrated system, apply it to experimental data sets, and show its applicability.
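As an illustration of what a simple scalar-valued SOM quality score can look like, the sketch below computes the quantization error: the mean distance from each sample to its best-matching unit. The abstract does not say which scores the authors use, so the choice of quantization error, the function name, and the toy data are all assumptions made for this example.

```python
import math

def quantization_error(data, codebook):
    """Mean Euclidean distance from each sample to its best-matching unit.

    `codebook` holds the SOM prototype vectors; both arguments are
    hypothetical toy inputs, not data from the paper.
    """
    total = 0.0
    for sample in data:
        # The best-matching unit is the prototype closest to the sample.
        total += min(math.dist(sample, unit) for unit in codebook)
    return total / len(data)

# Toy data: three 2-D samples and a SOM with two prototype vectors.
data = [(0.0, 0.0), (1.0, 0.0), (4.0, 4.0)]
codebook = [(0.0, 0.0), (4.0, 3.0)]
qe = quantization_error(data, codebook)
print(round(qe, 4))
```

A lower value means the prototypes represent the data more closely. A score like this says nothing about topology preservation, which is one reason a hierarchy of quality abstractions is useful.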
Integrating Data Clustering and Visualization for the Analysis of 3D Gene Expression Data
DOE Office of Scientific and Technical Information (OSTI.GOV)
Data Analysis and Visualization; International Research Training Group "Visualization of Large and Unstructured Data Sets," University of Kaiserslautern, Germany; Computational Research Division, Lawrence Berkeley National Laboratory, One Cyclotron Road, Berkeley, CA 94720, USA
2008-05-12
The recent development of methods for extracting precise measurements of spatial gene expression patterns from three-dimensional (3D) image data opens the way for new analyses of the complex gene regulatory networks controlling animal development. We present an integrated visualization and analysis framework that supports user-guided data clustering to aid exploration of these new complex datasets. The interplay of data visualization and clustering-based data classification leads to improved visualization and enables a more detailed analysis than previously possible. We discuss (i) integration of data clustering and visualization into one framework; (ii) application of data clustering to 3D gene expression data; (iii) evaluation of the number of clusters k in the context of 3D gene expression clustering; and (iv) improvement of overall analysis quality via dedicated post-processing of clustering results based on visualization. We discuss the use of this framework to objectively define spatial pattern boundaries and temporal profiles of genes and to analyze how mRNA patterns are controlled by their regulatory transcription factors.
Level-2 Milestone 4797: Early Users on Max, Sequoia Visualization Cluster
DOE Office of Scientific and Technical Information (OSTI.GOV)
Cupps, Kim C.
This report documents the fact that an early user has run successfully on Max, the Sequoia visualization cluster, ASC L2 milestone 4797: Early Users on Sequoia Visualization System (Max), due December 31, 2013. The Max visualization and data analysis cluster will provide Sequoia users with compute cycles and an interactive option for data exploration and analysis. The system will be integrated in the first quarter of FY14 and the system is expected to be moved to the classified network by the second quarter of FY14. The goal of this milestone is to have early users running their visualization and data analysis work on the Max cluster on the classified network.
Murray, Nicholas P; Hunfalvay, Melissa
2017-02-01
Considerable research has documented that successful performance in interceptive tasks (such as return of serve in tennis) is based on the performers' capability to capture appropriate anticipatory information prior to the flight path of the approaching object. Athletes of higher skill tend to fixate on different locations in the playing environment prior to initiation of a skill than their lesser skilled counterparts. The purpose of this study was to examine visual search behaviour strategies of elite (world ranked) tennis players and non-ranked competitive tennis players (n = 43) utilising cluster analysis. The results of hierarchical (Ward's method) and nonhierarchical (k-means) cluster analyses revealed three different clusters. The clustering method distinguished visual behaviour of high-, middle- and low-ranked players. Specifically, high-ranked players demonstrated longer mean fixation duration and lower variation of visual search than middle- and low-ranked players. In conclusion, the results demonstrated that cluster analysis is a useful tool for detecting and analysing the areas of interest for use in experimental analysis of expertise and for distinguishing visual search variables among participants.
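The nonhierarchical (k-means) step mentioned above can be sketched in a few lines. This is a minimal one-dimensional version of Lloyd's algorithm with fixed starting centroids. The fixation durations are invented for illustration and are not data from the study, and the function name is hypothetical.

```python
def kmeans_1d(values, centroids, iterations=20):
    """Lloyd's k-means on scalar values with user-supplied start centroids."""
    centroids = [float(c) for c in centroids]

    def nearest(v):
        # Index of the centroid closest to value v.
        return min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))

    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for v in values:
            groups[nearest(v)].append(v)
        # Recompute each centroid as its group mean (keep it if the group is empty).
        updated = [sum(g) / len(g) if g else c for g, c in zip(groups, centroids)]
        if updated == centroids:
            break  # converged: assignments can no longer change
        centroids = updated
    labels = [nearest(v) for v in values]
    return centroids, labels

# Invented mean fixation durations in milliseconds (not from the study).
durations = [180, 200, 220, 300, 320, 340, 450, 470, 490]
centroids, labels = kmeans_1d(durations, centroids=[200, 300, 400])
print(centroids, labels)
```

In practice the study's variables are multivariate and the number of clusters would first be suggested by the hierarchical analysis; this sketch only shows the assignment and update loop.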
Visualizing nD Point Clouds as Topological Landscape Profiles to Guide Local Data Analysis
DOE Office of Scientific and Technical Information (OSTI.GOV)
Oesterling, Patrick; Heine, Christian; Weber, Gunther H.
2012-05-04
Analyzing high-dimensional point clouds is a classical challenge in visual analytics. Traditional techniques, such as projections or axis-based techniques, suffer from projection artifacts, occlusion, and visual complexity. We propose to split data analysis into two parts to address these shortcomings. First, a structural overview phase abstracts data by its density distribution. This phase performs topological analysis to support accurate and non-overlapping presentation of the high-dimensional cluster structure as a topological landscape profile. Utilizing a landscape metaphor, it presents clusters and their nesting as hills whose height, width, and shape reflect cluster coherence, size, and stability, respectively. A second local analysis phase utilizes this global structural knowledge to select individual clusters or point sets for further, localized data analysis. Focusing on structural entities significantly reduces visual clutter in established geometric visualizations and permits a clearer, more thorough data analysis. In conclusion, this analysis complements the global topological perspective and enables the user to study subspaces or geometric properties, such as shape.
Interactive visual exploration and refinement of cluster assignments.
Kern, Michael; Lex, Alexander; Gehlenborg, Nils; Johnson, Chris R
2017-09-12
With ever-increasing amounts of data produced in biology research, scientists are in need of efficient data analysis methods. Cluster analysis, combined with visualization of the results, is one such method that can be used to make sense of large data volumes. At the same time, cluster analysis is known to be imperfect and depends on the choice of algorithms, parameters, and distance measures. Most clustering algorithms don't properly account for ambiguity in the source data, as records are often assigned to discrete clusters, even if an assignment is unclear. While there are metrics and visualization techniques that allow analysts to compare clusterings or to judge cluster quality, there is no comprehensive method that allows analysts to evaluate, compare, and refine cluster assignments based on the source data, derived scores, and contextual data. In this paper, we introduce a method that explicitly visualizes the quality of cluster assignments, allows comparisons of clustering results and enables analysts to manually curate and refine cluster assignments. Our methods are applicable to matrix data clustered with partitional, hierarchical, and fuzzy clustering algorithms. Furthermore, we enable analysts to explore clustering results in context of other data, for example, to observe whether a clustering of genomic data results in a meaningful differentiation in phenotypes. Our methods are integrated into Caleydo StratomeX, a popular, web-based, disease subtype analysis tool. We show in a usage scenario that our approach can reveal ambiguities in cluster assignments and produce improved clusterings that better differentiate genotypes and phenotypes.
ClusterViz: A Cytoscape APP for Cluster Analysis of Biological Network.
Wang, Jianxin; Zhong, Jiancheng; Chen, Gang; Li, Min; Wu, Fang-xiang; Pan, Yi
2015-01-01
Cluster analysis of biological networks is one of the most important approaches for identifying functional modules and predicting protein functions. Furthermore, visualization of clustering results is crucial to uncover the structure of biological networks. In this paper, ClusterViz, an APP of Cytoscape 3 for cluster analysis and visualization, has been developed. In order to reduce complexity and enable extendibility for ClusterViz, we designed the architecture of ClusterViz based on the framework of Open Services Gateway Initiative. According to the architecture, the implementation of ClusterViz is partitioned into three modules: the ClusterViz interface, the clustering algorithms, and visualization and export. ClusterViz facilitates the comparison of the results of different algorithms for further related analysis. Three commonly used clustering algorithms, FAG-EC, EAGLE and MCODE, are included in the current version. Because the clustering-algorithm module adopts an abstract algorithm interface, more clustering algorithms can be included in the future. To illustrate the usability of ClusterViz, we provide three examples with detailed steps from important scientific articles, which show that our tool has helped several research teams do their research work on the mechanism of the biological networks.
Visualizing Confidence in Cluster-Based Ensemble Weather Forecast Analyses.
Kumpf, Alexander; Tost, Bianca; Baumgart, Marlene; Riemer, Michael; Westermann, Rudiger; Rautenhaus, Marc
2018-01-01
In meteorology, cluster analysis is frequently used to determine representative trends in ensemble weather predictions in a selected spatio-temporal region, e.g., to reduce a set of ensemble members to simplify and improve their analysis. Identified clusters (i.e., groups of similar members), however, can be very sensitive to small changes of the selected region, so that clustering results can be misleading and bias subsequent analyses. In this article, we, a team of visualization scientists and meteorologists, deliver visual analytics solutions to analyze the sensitivity of clustering results with respect to changes of a selected region. We propose an interactive visual interface that enables simultaneous visualization of a) the variation in composition of identified clusters (i.e., their robustness), b) the variability in cluster membership for individual ensemble members, and c) the uncertainty in the spatial locations of identified trends. We demonstrate that our solution shows meteorologists how representative a clustering result is, and with respect to which changes in the selected region it becomes unstable. Furthermore, our solution helps to identify those ensemble members which stably belong to a given cluster and can thus be considered similar. In a real-world application case we show how our approach is used to analyze the clustering behavior of different regions in a forecast of "Tropical Cyclone Karl", guiding the user towards the cluster robustness information required for subsequent ensemble analysis.
A Web-Based Interactive Platform for Co-Clustering Spatio-Temporal Data
NASA Astrophysics Data System (ADS)
Wu, X.; Poorthuis, A.; Zurita-Milla, R.; Kraak, M.-J.
2017-09-01
Since current studies on clustering analysis mainly focus on exploring spatial or temporal patterns separately, a co-clustering algorithm is utilized in this study to enable the concurrent analysis of spatio-temporal patterns. To allow users to adopt and adapt the algorithm for their own analysis, it is integrated within the server side of an interactive web-based platform. The client side of the platform, running within any modern browser, is a graphical user interface (GUI) with multiple linked visualizations that facilitates the understanding, exploration and interpretation of the raw dataset and co-clustering results. Users can also upload their own datasets and adjust clustering parameters within the platform. To illustrate the use of this platform, an annual temperature dataset from 28 weather stations over 20 years in the Netherlands is used. After the dataset is loaded, it is visualized in a set of linked visualizations: a geographical map, a timeline and a heatmap. This aids the user in understanding the nature of their dataset and in selecting appropriate co-clustering parameters. Once the dataset is processed by the co-clustering algorithm, the results are visualized in small multiples, a heatmap and a timeline to provide various views for better understanding and further interpretation. Since the visualization and analysis are integrated in a seamless platform, the user can explore different sets of co-clustering parameters and instantly view the results in order to do iterative, exploratory data analysis. As such, this interactive web-based platform allows users to analyze spatio-temporal data using the co-clustering method and helps them understand the results through multiple linked visualizations.
[Visual field progression in glaucoma: cluster analysis].
Bresson-Dumont, H; Hatton, J; Foucher, J; Fonteneau, M
2012-11-01
Visual field progression analysis is one of the key points in glaucoma monitoring, but distinction between true progression and random fluctuation is sometimes difficult. There are several different algorithms but no real consensus for detecting visual field progression. The trend analysis of global indices (MD, sLV) may miss localized deficits or be affected by media opacities. Conversely, point-by-point analysis makes progression difficult to differentiate from physiological variability, particularly when the sensitivity of a point is already low. The goal of our study was to analyse visual field progression with the EyeSuite™ Octopus Perimetry Clusters algorithm in patients showing neither significant changes in global indices nor worsening on pointwise linear regression analysis. We analyzed the visual fields of 162 eyes (100 patients - 58 women, 42 men, average age 66.8 ± 10.91 years) with ocular hypertension or glaucoma. For inclusion, at least six reliable visual fields per eye were required, and the trend analysis (EyeSuite™ Perimetry) of visual field global indices (MD and sLV) had to show no significant progression. The analysis of changes in cluster mode was then performed. In a second step, eyes with statistically significant worsening of at least one of their clusters were analyzed point-by-point with the Octopus Field Analysis (OFA). Fifty-four eyes (33.33%) had a significant worsening in some clusters, while their global indices remained stable over time. In this group of patients, more advanced glaucoma was present than in the stable group (MD 6.41 dB vs. 2.87 dB); 64.82% (35/54) of those eyes in which the clusters progressed, however, had no statistically significant change in the trend analysis by pointwise linear regression. Most software algorithms for analyzing visual field progression are essentially trend analyses of global indices, or point-by-point linear regression. This study shows the potential role of cluster trend analysis. However, for best results, it is preferable to compare the analyses of several tests in combination with a morphologic exam. Copyright © 2012 Elsevier Masson SAS. All rights reserved.
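The idea behind a cluster trend can be sketched as follows: average the defect values of the test points belonging to one cluster at each visit, then fit an ordinary least-squares line over time. This is only an illustration of the general technique. It is not the EyeSuite algorithm, the numbers are invented, and the convention that a larger defect in dB means a worse field is an assumption of the example.

```python
def ols_slope(times, values):
    """Ordinary least-squares slope of values regressed on times."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    sxx = sum((t - mean_t) ** 2 for t in times)
    return sxy / sxx

# Invented defect values (dB) for two test points of one cluster,
# measured at six yearly visits. Assumption: higher defect = worse.
years = [0, 1, 2, 3, 4, 5]
cluster_points = [
    [2.0, 2.5, 3.0, 3.5, 4.0, 4.5],
    [1.0, 1.5, 2.0, 2.5, 3.0, 3.5],
]

# Cluster series: mean defect across the cluster's points at each visit.
cluster_mean = [sum(visit) / len(visit) for visit in zip(*cluster_points)]
slope = ols_slope(years, cluster_mean)
print(slope)  # dB per year for the cluster as a whole
```

Averaging over a cluster reduces the point-wise noise that makes single-point regression hard to interpret, while still being more local than a global index. A real analysis would also test whether the slope differs significantly from zero.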
SOMFlow: Guided Exploratory Cluster Analysis with Self-Organizing Maps and Analytic Provenance.
Sacha, Dominik; Kraus, Matthias; Bernard, Jurgen; Behrisch, Michael; Schreck, Tobias; Asano, Yuki; Keim, Daniel A
2018-01-01
Clustering is a core building block for data analysis, aiming to extract otherwise hidden structures and relations from raw datasets, such as particular groups that can be effectively related, compared, and interpreted. A plethora of visual-interactive cluster analysis techniques has been proposed to date; however, arriving at useful clusterings often requires several rounds of user interactions to fine-tune the data preprocessing and algorithms. We present a multi-stage Visual Analytics (VA) approach for iterative cluster refinement together with an implementation (SOMFlow) that uses Self-Organizing Maps (SOM) to analyze time series data. It supports exploration by offering the analyst a visual platform to analyze intermediate results, adapt the underlying computations, iteratively partition the data, and reflect on previous analytical activities. The history of previous decisions is explicitly visualized within a flow graph, allowing the analyst to compare earlier cluster refinements and to explore relations. We further leverage quality and interestingness measures to guide the analyst in the discovery of useful patterns, relations, and data partitions. We conducted two pair analytics experiments together with a subject matter expert in speech intonation research to demonstrate that the approach is effective for interactive data analysis, supporting enhanced understanding of clustering results as well as the interactive process itself.
Leung, S C; Fung, W K; Wong, K H
1999-01-01
The relative bit density variation graphs of 207 specimen credit cards processed by 12 encoding machines were examined first visually, and then classified by means of hierarchical cluster analysis. Twenty-nine credit cards being treated as 'questioned' samples were tested by way of cluster analysis against 'controls' derived from known encoders. It was found that hierarchical cluster analysis provided a high accuracy of identification with all 29 'questioned' samples classified correctly. On the other hand, although visual comparison of jitter graphs was less discriminating, it was nevertheless capable of giving a reasonably accurate result.
Hierarchical Spatio-temporal Visual Analysis of Cluster Evolution in Electrocorticography Data
Murugesan, Sugeerth; Bouchard, Kristofer; Chang, Edward; ...
2016-10-02
Here, we present ECoG ClusterFlow, a novel interactive visual analysis tool for the exploration of high-resolution Electrocorticography (ECoG) data. Our system detects and visualizes dynamic high-level structures, such as communities, using the time-varying spatial connectivity network derived from the high-resolution ECoG data. ECoG ClusterFlow provides a multi-scale visualization of the spatio-temporal patterns underlying the time-varying communities using two views: 1) an overview summarizing the evolution of clusters over time and 2) a hierarchical glyph-based technique that uses data aggregation and small multiples techniques to visualize the propagation of clusters in their spatial domain. ECoG ClusterFlow makes it possible 1) to compare the spatio-temporal evolution patterns across various time intervals, 2) to compare the temporal information at varying levels of granularity, and 3) to investigate the evolution of spatial patterns without occluding the spatial context information. Lastly, we present case studies done in collaboration with neuroscientists on our team for both simulated and real epileptic seizure data aimed at evaluating the effectiveness of our approach.
fluff: exploratory analysis and visualization of high-throughput sequencing data
Georgiou, Georgios
2016-01-01
Summary. In this article we describe fluff, a software package that allows for simple exploration, clustering and visualization of high-throughput sequencing data mapped to a reference genome. The package contains three command-line tools to generate publication-quality figures in an uncomplicated manner using sensible defaults. Genome-wide data can be aggregated, clustered and visualized in a heatmap, according to different clustering methods. This includes a predefined setting to identify dynamic clusters between different conditions or developmental stages. Alternatively, clustered data can be visualized in a bandplot. Finally, fluff includes a tool to generate genomic profiles. As command-line tools, the fluff programs can easily be integrated into standard analysis pipelines. The installation is straightforward and documentation is available at http://fluff.readthedocs.org. Availability. fluff is implemented in Python and runs on Linux. The source code is freely available for download at https://github.com/simonvh/fluff. PMID:27547532
2015-01-01
Background: Though cluster analysis has become a routine analytic task for bioinformatics research, it is still arduous for researchers to assess the quality of a clustering result. To select the best clustering method and its parameters for a dataset, researchers have to run multiple clustering algorithms and compare them. However, such a comparison task with multiple clustering results is cognitively demanding and laborious. Results: In this paper, we present XCluSim, a visual analytics tool that enables users to interactively compare multiple clustering results based on the Visual Information Seeking Mantra. We build a taxonomy for categorizing existing techniques of clustering results visualization in terms of the Gestalt principles of grouping. Using the taxonomy, we choose the most appropriate interactive visualizations for presenting individual clustering results from different types of clustering algorithms. The efficacy of XCluSim is shown through case studies with a bioinformatician. Conclusions: Compared to other relevant tools, XCluSim enables users to compare multiple clustering results in a more scalable manner. Moreover, XCluSim supports diverse clustering algorithms and dedicated visualizations and interactions for different types of clustering results, allowing more effective exploration of details on demand. Through case studies with a bioinformatics researcher, we received positive feedback on the functionalities of XCluSim, including its ability to help identify stably clustered items across multiple clustering results. PMID:26328893
Visualizing statistical significance of disease clusters using cartograms.
Kronenfeld, Barry J; Wong, David W S
2017-05-15
Health officials and epidemiological researchers often use maps of disease rates to identify potential disease clusters. Because these maps exaggerate the prominence of low-density districts and hide potential clusters in urban (high-density) areas, many researchers have used density-equalizing maps (cartograms) as a basis for epidemiological mapping. However, no guidelines exist for visual assessment of statistical uncertainty. To address this shortcoming, we develop techniques for visual determination of statistical significance of clusters spanning one or more districts on a cartogram. We developed the techniques within a geovisual analytics framework that does not rely on automated significance testing, and can therefore facilitate visual analysis to detect clusters that automated techniques might miss. On a cartogram of the at-risk population, the statistical significance of a disease cluster can be determined from the rate, area and shape of the cluster under standard hypothesis testing scenarios. We develop formulae to determine, for a given rate, the area required for statistical significance of a priori and a posteriori designated regions under certain test assumptions. Uniquely, our approach enables dynamic inference of aggregate regions formed by combining individual districts. The method is implemented in interactive tools that provide choropleth mapping, automated legend construction and dynamic search tools to facilitate cluster detection and assessment of the validity of tested assumptions. A case study of leukemia incidence analysis in California demonstrates the ability to visually distinguish between statistically significant and insignificant regions. The proposed geovisual analytics approach enables intuitive visual assessment of statistical significance of arbitrarily defined regions on a cartogram. Our research prompts a broader discussion of the role of geovisual exploratory analyses in disease mapping and the appropriate framework for visually assessing the statistical significance of spatial clusters.
Murugesan, Sugeerth; Bouchard, Kristofer; Chang, Edward; ...
2017-06-06
There exists a need for effective and easy-to-use software tools supporting the analysis of complex Electrocorticography (ECoG) data. Understanding how epileptic seizures develop or identifying diagnostic indicators for neurological diseases requires the in-depth analysis of neural activity data from ECoG. Such data is multi-scale and is of high spatio-temporal resolution. Comprehensive analysis of this data should be supported by interactive visual analysis methods that allow a scientist to understand functional patterns at varying levels of granularity and comprehend its time-varying behavior. We introduce a novel multi-scale visual analysis system, ECoG ClusterFlow, for the detailed exploration of ECoG data. Our system detects and visualizes dynamic high-level structures, such as communities, derived from the time-varying connectivity network. The system supports two major views: 1) an overview summarizing the evolution of clusters over time and 2) an electrode view using hierarchical glyph-based design to visualize the propagation of clusters in their spatial, anatomical context. We present case studies that were performed in collaboration with neuroscientists and neurosurgeons using simulated and recorded epileptic seizure data to demonstrate our system's effectiveness. ECoG ClusterFlow supports the comparison of spatio-temporal patterns for specific time intervals and allows a user to utilize various clustering algorithms. Neuroscientists can identify the site of seizure genesis and its spatial progression during the various stages of a seizure. Our system serves as a fast and powerful means for the generation of preliminary hypotheses that can be used as a basis for subsequent application of rigorous statistical methods, with the ultimate goal being the clinical treatment of epileptogenic zones.
Interactive visual exploration and analysis of origin-destination data
NASA Astrophysics Data System (ADS)
Ding, Linfang; Meng, Liqiu; Yang, Jian; Krisp, Jukka M.
2018-05-01
In this paper, we propose a visual analytics approach for the exploration of spatiotemporal interaction patterns of massive origin-destination data. Firstly, we visually query the movement database for data at certain time windows. Secondly, we conduct interactive clustering to allow the users to select input variables/features (e.g., origins, destinations, distance, and duration) and to adjust clustering parameters (e.g. distance threshold). The agglomerative hierarchical clustering method is applied for the multivariate clustering of the origin-destination data. Thirdly, we design a parallel coordinates plot for visualizing the precomputed clusters and for further exploration of interesting clusters. Finally, we propose a gradient line rendering technique to show the spatial and directional distribution of origin-destination clusters on a map view. We implement the visual analytics approach in a web-based interactive environment and apply it to real-world floating car data from Shanghai. The experiment results show the origin/destination hotspots and their spatial interaction patterns. They also demonstrate the effectiveness of our proposed approach.
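The clustering step described above, agglomerative hierarchical clustering with a user-adjustable distance threshold, can be sketched on a handful of origin-destination records. The abstract does not state the linkage criterion or the feature scaling, so single linkage on raw (origin x, origin y, destination x, destination y) tuples is an assumption of this example, and the trips are invented.

```python
import math

def agglomerative(points, threshold):
    """Single-linkage agglomerative clustering, stopped at a distance threshold.

    Returns clusters as lists of indices into `points`.
    """
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = linkage(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > threshold:
            break  # the closest pair is too far apart: stop merging
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Invented trips as (origin_x, origin_y, dest_x, dest_y) in arbitrary units.
trips = [
    (0, 0, 10, 10),
    (1, 0, 10, 11),
    (0, 1, 11, 10),
    (20, 20, 0, 0),
    (21, 20, 1, 0),
]
clusters = agglomerative(trips, threshold=2.0)
print(clusters)
```

Raising the threshold merges more trips into fewer, coarser flows, which is the kind of parameter an interactive interface would expose. This naive version recomputes all pairwise linkages on every merge and would be too slow for massive data; it only shows the logic.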
InCHlib - interactive cluster heatmap for web applications.
Skuta, Ctibor; Bartůněk, Petr; Svozil, Daniel
2014-12-01
Hierarchical clustering is an exploratory data analysis method that reveals the groups (clusters) of similar objects. The result of the hierarchical clustering is a tree structure called a dendrogram that shows the arrangement of individual clusters. To investigate the row/column hierarchical cluster structure of a data matrix, a visualization tool called 'cluster heatmap' is commonly employed. In the cluster heatmap, the data matrix is displayed as a heatmap, a 2-dimensional array in which the colour of each element corresponds to its value. The rows/columns of the matrix are ordered such that similar rows/columns are near each other. The ordering is given by the dendrogram which is displayed on the side of the heatmap. We developed InCHlib (Interactive Cluster Heatmap Library), a highly interactive and lightweight JavaScript library for cluster heatmap visualization and exploration. InCHlib enables the user to select individual or clustered heatmap rows, to zoom in and out of clusters or to flexibly modify heatmap appearance. The cluster heatmap can be augmented with additional metadata displayed in a different colour scale. In addition, to further enhance the visualization, the cluster heatmap can be interconnected with external data sources or analysis tools. Data clustering and the preparation of the input file for InCHlib is facilitated by the Python utility script inchlib_clust. The cluster heatmap is one of the most popular visualizations of large chemical and biomedical data sets originating, e.g., in high-throughput screening, genomics or transcriptomics experiments. The presented JavaScript library InCHlib is a client-side solution for cluster heatmap exploration. InCHlib can be easily deployed into any modern web application and configured to cooperate with external tools and data sources. Though InCHlib is primarily intended for the analysis of chemical or biological data, it is a versatile tool whose application domain is not limited to the life sciences only.
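How a dendrogram determines the row order of a cluster heatmap can be shown with a small sketch: traverse the tree from left to right and list the leaves in the order they are visited. The nested-tuple tree representation, the function name, and the toy matrix are assumptions for illustration; this is not InCHlib's or inchlib_clust's data format.

```python
def leaf_order(node):
    """Left-to-right leaf order of a dendrogram given as nested 2-tuples.

    A leaf is an int (a row index); an inner node is a (left, right) pair.
    """
    if isinstance(node, int):
        return [node]
    left, right = node
    return leaf_order(left) + leaf_order(right)

# Toy data matrix: rows 0 and 2 are similar, rows 1 and 3 are similar.
matrix = [
    [0.90, 0.80],
    [0.10, 0.20],
    [0.85, 0.75],
    [0.15, 0.25],
]

# Hypothetical dendrogram as a clustering step might produce it:
# rows 0 and 2 merge first, rows 1 and 3 merge first, then both groups join.
dendrogram = ((0, 2), (1, 3))

order = leaf_order(dendrogram)
ordered_rows = [matrix[i] for i in order]
print(order)
```

Drawing `ordered_rows` instead of `matrix` places similar rows next to each other, which is what makes the block structure visible in the heatmap. Column ordering works the same way with a second dendrogram.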
Ferles, Christos; Beaufort, William-Scott; Ferle, Vanessa
2017-01-01
The present study devises mapping methodologies and projection techniques that visualize and demonstrate biological sequence data clustering results. The Sequence Data Density Display (SDDD) and Sequence Likelihood Projection (SLP) visualizations represent the input symbolic sequences in a lower-dimensional space in such a way that the clusters and relations of data elements are depicted graphically. Both operate in combination with the Self-Organizing Hidden Markov Model Map (SOHMMM). The resulting unified framework is able to analyze raw sequence data automatically and directly. This analysis is carried out with little or even no prior information or domain knowledge.
Aoki, Shuichiro; Murata, Hiroshi; Fujino, Yuri; Matsuura, Masato; Miki, Atsuya; Tanito, Masaki; Mizoue, Shiro; Mori, Kazuhiko; Suzuki, Katsuyoshi; Yamashita, Takehiro; Kashiwagi, Kenji; Hirasawa, Kazunori; Shoji, Nobuyuki; Asaoka, Ryo
2017-12-01
To investigate the usefulness of the Octopus (Haag-Streit) EyeSuite's cluster trend analysis in glaucoma. Ten visual fields (VFs) with the Humphrey Field Analyzer (Carl Zeiss Meditec), spanning 7.7 years on average, were obtained from 728 eyes of 475 primary open angle glaucoma patients. Mean total deviation (mTD) trend analysis and EyeSuite's cluster trend analysis were performed on various series of VFs (from 1st to 10th: VF1-10 to 6th to 10th: VF6-10). The results of the cluster-based trend analysis, based on different lengths of VF series, were compared against mTD trend analysis. Cluster-based trend analysis and mTD trend analysis results were significantly associated in all clusters and with all lengths of VF series. Between 21.2% and 45.9% (depending on VF series length and location) of clusters were deemed to progress when the mTD trend analysis suggested no progression. On the other hand, 4.8% of eyes were observed to progress using the mTD trend analysis when cluster trend analysis suggested no progression in any two (or more) clusters. Whole field trend analysis can miss local VF progression. Cluster trend analysis appears as robust as mTD trend analysis and useful to assess both sectorial and whole field progression. Cluster-based trend analyses, in particular the definition of two or more progressing clusters, may help clinicians to detect glaucomatous progression in a timelier manner than using a whole field trend analysis, without significantly compromising specificity. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2017. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
van Haaften, Rachel I M; Luceri, Cristina; van Erk, Arie; Evelo, Chris T A
2009-06-01
Omics technology used for large-scale measurements of gene expression is rapidly evolving. This work points out the need for extensive bioinformatics analyses for array quality assessment before and after gene expression clustering and pathway analysis. A study focused on the effect of red wine polyphenols on rat colon mucosa was used to test the impact of quality control and normalisation steps on the biological conclusions. The integration of data visualization, pathway analysis and clustering revealed an artifact problem that was solved with an adapted normalisation. We propose a possible point to point standard analysis procedure, based on a combination of clustering and data visualization for the analysis of microarray data.
DICON: interactive visual analysis of multidimensional clusters.
Cao, Nan; Gotz, David; Sun, Jimeng; Qu, Huamin
2011-12-01
Clustering as a fundamental data analysis technique has been widely used in many analytic applications. However, it is often difficult for users to understand and evaluate multidimensional clustering results, especially the quality of clusters and their semantics. For large and complex data, high-level statistical information about the clusters is often needed for users to evaluate cluster quality while a detailed display of multidimensional attributes of the data is necessary to understand the meaning of clusters. In this paper, we introduce DICON, an icon-based cluster visualization that embeds statistical information into a multi-attribute display to facilitate cluster interpretation, evaluation, and comparison. We design a treemap-like icon to represent a multidimensional cluster, and the quality of the cluster can be conveniently evaluated with the embedded statistical information. We further develop a novel layout algorithm which can generate similar icons for similar clusters, making comparisons of clusters easier. User interaction and clutter reduction are integrated into the system to help users more effectively analyze and refine clustering results for large datasets. We demonstrate the power of DICON through a user study and a case study in the healthcare domain. Our evaluation shows the benefits of the technique, especially in support of complex multidimensional cluster analysis. © 2011 IEEE
An Information-Theoretic-Cluster Visualization for Self-Organizing Maps.
Brito da Silva, Leonardo Enzo; Wunsch, Donald C
2018-06-01
Improved data visualization will be a significant tool to enhance cluster analysis. In this paper, an information-theoretic-based method for cluster visualization using self-organizing maps (SOMs) is presented. The information-theoretic visualization (IT-vis) has the same structure as the unified distance matrix, but instead of depicting Euclidean distances between adjacent neurons, it displays the similarity between the distributions associated with adjacent neurons. Each SOM neuron has an associated subset of the data set whose cardinality controls the granularity of the IT-vis and with which the first- and second-order statistics are computed and used to estimate their probability density functions. These are used to calculate the similarity measure, based on Renyi's quadratic cross entropy and cross information potential (CIP). The introduced visualizations combine the low computational cost and kernel estimation properties of the representative CIP and the data structure representation of a single-linkage-based grouping algorithm to generate an enhanced SOM-based visualization. The visual quality of the IT-vis is assessed by comparing it with other visualization methods for several real-world and synthetic benchmark data sets. Thus, this paper also contains a significant literature survey. The experiments demonstrate the IT-vis cluster revealing capabilities, in which cluster boundaries are sharply captured. Additionally, the information-theoretic visualizations are used to perform clustering of the SOM. Compared with other methods, IT-vis of large SOMs yielded the best results in this paper, for which the quality of the final partitions was evaluated using external validity indices.
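To make the similarity measure concrete, the sketch below evaluates the cross information potential and Renyi's quadratic cross entropy for two densities. It assumes, for simplicity, that each neuron's data subset is summarised by a univariate Gaussian (mean and variance); in that case the integral of the product of the two densities has the closed form of a Gaussian in the difference of the means with the variances added. The function names are hypothetical and the multivariate estimator used in the paper is not reproduced here.

```python
import math


def cross_information_potential(mu1, var1, mu2, var2):
    """Integral of the product of two 1-D Gaussian densities."""
    var = var1 + var2
    diff = mu1 - mu2
    return math.exp(-0.5 * diff * diff / var) / math.sqrt(2.0 * math.pi * var)


def renyi_quadratic_cross_entropy(mu1, var1, mu2, var2):
    """Negative log of the cross information potential."""
    return -math.log(cross_information_potential(mu1, var1, mu2, var2))


# Two neighbouring neurons with similar data versus two with distant data.
near = cross_information_potential(0.0, 1.0, 0.5, 1.0)
far = cross_information_potential(0.0, 1.0, 5.0, 1.0)
```

A larger potential (smaller cross entropy) indicates more similar distributions, so plotting this value between adjacent neurons plays the role that Euclidean distance plays in a unified distance matrix.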
Chen, Jin; Roth, Robert E; Naito, Adam T; Lengerich, Eugene J; MacEachren, Alan M
2008-01-01
Background Kulldorff's spatial scan statistic and its software implementation – SaTScan – are widely used for detecting and evaluating geographic clusters. However, two issues make using the method and interpreting its results non-trivial: (1) the method lacks cartographic support for understanding the clusters in geographic context and (2) results from the method are sensitive to parameter choices related to cluster scaling (abbreviated as scaling parameters), but the system provides no direct support for making these choices. We employ both established and novel geovisual analytics methods to address these issues and to enhance the interpretation of SaTScan results. We demonstrate our geovisual analytics approach in a case study analysis of cervical cancer mortality in the U.S. Results We address the first issue by providing an interactive visual interface to support the interpretation of SaTScan results. Our research to address the second issue prompted a broader discussion about the sensitivity of SaTScan results to parameter choices. Sensitivity has two components: (1) the method can identify clusters that, while being statistically significant, have heterogeneous contents comprised of both high-risk and low-risk locations and (2) the method can identify clusters that are unstable in location and size as the spatial scan scaling parameter is varied. To investigate cluster result stability, we conducted multiple SaTScan runs with systematically selected parameters. The results, when scanning a large spatial dataset (e.g., U.S. data aggregated by county), demonstrate that no single spatial scan scaling value is known to be optimal to identify clusters that exist at different scales; instead, multiple scans that vary the parameters are necessary. We introduce a novel method of measuring and visualizing reliability that facilitates identification of homogeneous clusters that are stable across analysis scales. 
Finally, we propose a logical approach to proceed through the analysis of SaTScan results. Conclusion The geovisual analytics approach described in this manuscript facilitates the interpretation of spatial cluster detection methods by providing cartographic representation of SaTScan results and by providing visualization methods and tools that support selection of SaTScan parameters. Our methods distinguish between heterogeneous and homogeneous clusters and assess the stability of clusters across analytic scales. Method We analyzed the cervical cancer mortality data for the United States aggregated by county between 2000 and 2004. We ran SaTScan on the dataset fifty times with different parameter choices. Our geovisual analytics approach couples SaTScan with our visual analytic platform, allowing users to interactively explore and compare SaTScan results produced by different parameter choices. The Standardized Mortality Ratio and reliability scores are visualized for all the counties to identify stable, homogeneous clusters. We evaluated our analysis result by comparing it to that produced by other independent techniques including the Empirical Bayes Smoothing and Kafadar spatial smoother methods. The geovisual analytics approach introduced here is developed and implemented in our Java-based Visual Inquiry Toolkit. PMID:18992163
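The two quantities visualized per county can be sketched as follows. The Standardized Mortality Ratio is the usual ratio of observed to expected deaths. The reliability score shown here is only an assumed stand-in, namely the fraction of scan runs in which a county falls inside a reported cluster; the abstract does not give the authors' exact definition, and all names and data below are hypothetical.

```python
def standardized_mortality_ratio(observed_deaths, expected_deaths):
    """Observed deaths divided by the deaths expected from reference rates."""
    if expected_deaths <= 0:
        raise ValueError("expected deaths must be positive")
    return observed_deaths / expected_deaths


def reliability_scores(runs, counties):
    """Fraction of runs in which each county lies inside a reported cluster.

    `runs` is a list of sets, one per scan run, holding the county
    identifiers covered by that run's clusters (an assumed data layout).
    """
    if not runs:
        raise ValueError("at least one run is required")
    return {c: sum(c in run for run in runs) / len(runs) for c in counties}


# Hypothetical output of four scan runs with different scaling parameters.
runs = [{"A", "B"}, {"A"}, {"A", "C"}, {"A", "B"}]
scores = reliability_scores(runs, ["A", "B", "C", "D"])
smr = standardized_mortality_ratio(30, 20.0)
```

Under this assumed definition, a county with a score near 1 is reported in a cluster regardless of the scaling parameter, while a low score flags membership that depends on the parameter choice.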
DOE Office of Scientific and Technical Information (OSTI.GOV)
Murugesan, Sugeerth; Bouchard, Kristofer; Chang, Edward
There exists a need for effective and easy-to-use software tools supporting the analysis of complex Electrocorticography (ECoG) data. Understanding how epileptic seizures develop or identifying diagnostic indicators for neurological diseases requires the in-depth analysis of neural activity data from ECoG. Such data is multi-scale and is of high spatio-temporal resolution. Comprehensive analysis of this data should be supported by interactive visual analysis methods that allow a scientist to understand functional patterns at varying levels of granularity and comprehend its time-varying behavior. We introduce a novel multi-scale visual analysis system, ECoG ClusterFlow, for the detailed exploration of ECoG data. Our system detects and visualizes dynamic high-level structures, such as communities, derived from the time-varying connectivity network. The system supports two major views: 1) an overview summarizing the evolution of clusters over time and 2) an electrode view using hierarchical glyph-based design to visualize the propagation of clusters in their spatial, anatomical context. We present case studies that were performed in collaboration with neuroscientists and neurosurgeons using simulated and recorded epileptic seizure data to demonstrate our system's effectiveness. ECoG ClusterFlow supports the comparison of spatio-temporal patterns for specific time intervals and allows a user to utilize various clustering algorithms. Neuroscientists can identify the site of seizure genesis and its spatial progression during the various stages of a seizure. Our system serves as a fast and powerful means for the generation of preliminary hypotheses that can be used as a basis for subsequent application of rigorous statistical methods, with the ultimate goal being the clinical treatment of epileptogenic zones.
CytoCluster: A Cytoscape Plugin for Cluster Analysis and Visualization of Biological Networks.
Li, Min; Li, Dongyan; Tang, Yu; Wu, Fangxiang; Wang, Jianxin
2017-08-31
Nowadays, cluster analysis of biological networks has become one of the most important approaches to identifying functional modules as well as predicting protein complexes and network biomarkers. Furthermore, the visualization of clustering results is crucial to display the structure of biological networks. Here we present CytoCluster, a cytoscape plugin integrating six clustering algorithms, HC-PIN (Hierarchical Clustering algorithm in Protein Interaction Networks), OH-PIN (identifying Overlapping and Hierarchical modules in Protein Interaction Networks), IPCA (Identifying Protein Complex Algorithm), ClusterONE (Clustering with Overlapping Neighborhood Expansion), DCU (Detecting Complexes based on Uncertain graph model), IPC-MCE (Identifying Protein Complexes based on Maximal Complex Extension), and BinGO (the Biological networks Gene Ontology) function. Users can select different clustering algorithms according to their requirements. The main function of these six clustering algorithms is to detect protein complexes or functional modules. In addition, BinGO is used to determine which Gene Ontology (GO) categories are statistically overrepresented in a set of genes or a subgraph of a biological network. CytoCluster can be easily expanded, so that more clustering algorithms and functions can be added to this plugin. Since it was created in July 2013, CytoCluster has been downloaded more than 9700 times in the Cytoscape App store and has already been applied to the analysis of different biological networks. CytoCluster is available from http://apps.cytoscape.org/apps/cytocluster. PMID:28858211
Network visualization of conformational sampling during molecular dynamics simulation.
Ahlstrom, Logan S; Baker, Joseph Lee; Ehrlich, Kent; Campbell, Zachary T; Patel, Sunita; Vorontsov, Ivan I; Tama, Florence; Miyashita, Osamu
2013-11-01
Effective data reduction methods are necessary for uncovering the inherent conformational relationships present in large molecular dynamics (MD) trajectories. Clustering algorithms provide a means to interpret the conformational sampling of molecules during simulation by grouping trajectory snapshots into a few subgroups, or clusters, but the relationships between the individual clusters may not be readily understood. Here we show that network analysis can be used to visualize the dominant conformational states explored during simulation as well as the connectivity between them, providing a more coherent description of conformational space than traditional clustering techniques alone. We compare the results of network visualization against 11 clustering algorithms and principal component conformer plots. Several MD simulations of proteins undergoing different conformational changes demonstrate the effectiveness of networks in reaching functional conclusions. Copyright © 2013 Elsevier Inc. All rights reserved.
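One simple way to turn cluster assignments into such a network is sketched below: nodes are clusters weighted by their population, and directed edges count how often consecutive trajectory frames move from one cluster to another. This construction, the function names, and the example labels are assumptions for illustration; the abstract does not specify how the authors define nodes and edges.

```python
from collections import Counter


def node_populations(labels):
    """Number of trajectory frames assigned to each cluster."""
    return Counter(labels)


def transition_network(labels):
    """Count transitions between different clusters in consecutive frames."""
    edges = Counter()
    for current, following in zip(labels, labels[1:]):
        if current != following:  # ignore frames that stay in the same cluster
            edges[(current, following)] += 1
    return edges


# Hypothetical per-frame cluster labels from a clustering run.
frame_labels = ["A", "A", "B", "B", "A", "C", "C", "B"]
nodes = node_populations(frame_labels)
edges = transition_network(frame_labels)
```

The resulting counts can be passed to any graph-drawing tool, with node size taken from the populations and edge width from the transition counts.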
Using Machine Learning Techniques in the Analysis of Oceanographic Data
NASA Astrophysics Data System (ADS)
Falcinelli, K. E.; Abuomar, S.
2017-12-01
Acoustic Doppler Current Profilers (ADCPs) are oceanographic tools capable of collecting large amounts of current profile data. Using unsupervised machine learning techniques such as principal component analysis, fuzzy c-means clustering, and self-organizing maps, patterns and trends in an ADCP dataset are found. Cluster validity algorithms such as visual assessment of cluster tendency and clustering index are used to determine the optimal number of clusters in the ADCP dataset. These techniques prove to be useful in analysis of ADCP data and demonstrate potential for future use in other oceanographic applications.
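For readers unfamiliar with fuzzy c-means, the sketch below shows the standard alternating updates of memberships and cluster centres on one-dimensional data. It is a minimal illustration with made-up values, a fixed iteration count, and user-supplied starting centres (all assumptions); it is not the authors' processing pipeline for ADCP profiles.

```python
def fuzzy_c_means_1d(points, centers, m=2.0, iterations=50):
    """Alternate membership and centre updates for 1-D fuzzy c-means."""
    centers = list(centers)
    power = 2.0 / (m - 1.0)  # exponent from the fuzzifier m > 1
    memberships = []
    for _ in range(iterations):
        memberships = []
        for x in points:
            dists = [abs(x - c) for c in centers]
            if 0.0 in dists:
                # Point sits exactly on a centre: assign it fully there.
                row = [1.0 if d == 0.0 else 0.0 for d in dists]
                total = sum(row)
                row = [u / total for u in row]
            else:
                # u_i = 1 / sum_j (d_i / d_j)^(2 / (m - 1))
                row = [1.0 / sum((d_i / d_j) ** power for d_j in dists)
                       for d_i in dists]
            memberships.append(row)
        for k in range(len(centers)):
            # Centre is the mean of the points weighted by membership^m.
            weights = [row[k] ** m for row in memberships]
            centers[k] = sum(w * x for w, x in zip(weights, points)) / sum(weights)
    return centers, memberships


# Hypothetical 1-D values forming two loose groups.
points = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
centers, memberships = fuzzy_c_means_1d(points, [0.0, 6.0])
```

Unlike hard clustering, each point keeps a degree of membership in every cluster, and each row of `memberships` sums to one.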
EventThread: Visual Summarization and Stage Analysis of Event Sequence Data.
Guo, Shunan; Xu, Ke; Zhao, Rongwen; Gotz, David; Zha, Hongyuan; Cao, Nan
2018-01-01
Event sequence data such as electronic health records, a person's academic records, or car service records, are ordered series of events which have occurred over a period of time. Analyzing collections of event sequences can reveal common or semantically important sequential patterns. For example, event sequence analysis might reveal frequently used care plans for treating a disease, typical publishing patterns of professors, and the patterns of service that result in a well-maintained car. It is challenging, however, to visually explore large numbers of event sequences, or sequences with large numbers of event types. Existing methods focus on extracting explicitly matching patterns of events using statistical analysis to create stages of event progression over time. However, these methods fail to capture latent clusters of similar but not identical evolutions of event sequences. In this paper, we introduce a novel visualization system named EventThread which clusters event sequences into threads based on tensor analysis and visualizes the latent stage categories and evolution patterns by interactively grouping the threads by similarity into time-specific clusters. We demonstrate the effectiveness of EventThread through usage scenarios in three different application domains and via interviews with an expert user.
Visual verification and analysis of cluster detection for molecular dynamics.
Grottel, Sebastian; Reina, Guido; Vrabec, Jadran; Ertl, Thomas
2007-01-01
A current research topic in molecular thermodynamics is the condensation of vapor to liquid and the investigation of this process at the molecular level. Condensation is found in many physical phenomena, e.g. the formation of atmospheric clouds or the processes inside steam turbines, where a detailed knowledge of the dynamics of condensation processes will help to optimize energy efficiency and avoid problems with droplets of macroscopic size. The key properties of these processes are the nucleation rate and the critical cluster size. For the calculation of these properties it is essential to make use of a meaningful definition of molecular clusters, which is currently not a fully resolved issue. In this paper a framework capable of interactively visualizing molecular datasets of such nucleation simulations is presented, with an emphasis on the detected molecular clusters. To check the quality of the results of the cluster detection, our framework introduces the concept of flow groups to highlight potential cluster evolution over time which is not detected by the employed algorithm. To confirm the findings of the visual analysis, we coupled the rendering view with a schematic view of the clusters' evolution. This allows users to rapidly assess the quality of the molecular cluster detection algorithm and to identify locations in the simulation data, in space as well as in time, where the cluster detection fails. Thus, thermodynamics researchers can eliminate weaknesses in their cluster detection algorithms. Several examples for the effective and efficient usage of our tool are presented.
clusterProfiler: an R package for comparing biological themes among gene clusters.
Yu, Guangchuang; Wang, Li-Gen; Han, Yanyan; He, Qing-Yu
2012-05-01
Increasing quantitative data generated from transcriptomics and proteomics require integrative strategies for analysis. Here, we present an R package, clusterProfiler that automates the process of biological-term classification and the enrichment analysis of gene clusters. The analysis module and visualization module were combined into a reusable workflow. Currently, clusterProfiler supports three species, including humans, mice, and yeast. Methods provided in this package can be easily extended to other species and ontologies. The clusterProfiler package is released under Artistic-2.0 License within Bioconductor project. The source code and vignette are freely available at http://bioconductor.org/packages/release/bioc/html/clusterProfiler.html.
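Enrichment analysis of a gene cluster is commonly framed as a hypergeometric test: given how many genes in the background carry a term, how surprising is the number of annotated genes found in the cluster? The Python sketch below computes that upper-tail probability. It is an illustration of the general statistic with hypothetical counts and a hypothetical function name, not a reproduction of the package's R interface or of its multiple-testing correction.

```python
from math import comb


def enrichment_p_value(population, annotated, cluster_size, hits):
    """P(X >= hits) when drawing cluster_size genes without replacement.

    population   -- number of genes in the background set
    annotated    -- background genes carrying the term of interest
    cluster_size -- number of genes in the cluster
    hits         -- cluster genes carrying the term
    """
    upper = min(annotated, cluster_size)
    total = comb(population, cluster_size)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, cluster_size - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Hypothetical counts: 20 background genes, 5 annotated, cluster of 4, 2 hits.
p = enrichment_p_value(20, 5, 4, 2)
```

In practice one such p-value is computed per term and per cluster, which is why a correction for multiple comparisons is normally applied afterwards.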
Recognizing patterns of visual field loss using unsupervised machine learning
NASA Astrophysics Data System (ADS)
Yousefi, Siamak; Goldbaum, Michael H.; Zangwill, Linda M.; Medeiros, Felipe A.; Bowd, Christopher
2014-03-01
Glaucoma is a potentially blinding optic neuropathy that results in a decrease in visual sensitivity. Visual field abnormalities (decreased visual sensitivity on psychophysical tests) are the primary means of glaucoma diagnosis. One form of visual field testing is Frequency Doubling Technology (FDT), which tests sensitivity at 52 points within the visual field. Like other psychophysical tests used in clinical practice, FDT results yield specific patterns of defect indicative of the disease. We used a Gaussian Mixture Model with Expectation Maximization (GEM; EM is used to estimate the model parameters) to automatically separate FDT data into clusters of normal and abnormal eyes. Principal component analysis (PCA) was used to decompose each cluster into different axes (patterns). FDT measurements were obtained from 1,190 eyes with normal FDT results and 786 eyes with abnormal (i.e., glaucomatous) FDT results, recruited from a university-based, longitudinal, multi-center, clinical study on glaucoma. The GEM input was the 52-point FDT threshold sensitivities for all eyes. The optimal GEM model separated the FDT fields into 3 clusters. Cluster 1 contained 94% normal fields (94% specificity), and clusters 2 and 3 combined contained 77% abnormal fields (77% sensitivity). For clusters 1, 2 and 3, the optimal numbers of PCA-identified axes were 2, 2 and 5, respectively. GEM with PCA successfully separated FDT fields from healthy and glaucoma eyes and identified familiar glaucomatous patterns of loss.
Coronal Mass Ejection Data Clustering and Visualization of Decision Trees
NASA Astrophysics Data System (ADS)
Ma, Ruizhe; Angryk, Rafal A.; Riley, Pete; Filali Boubrahimi, Soukaina
2018-05-01
Coronal mass ejections (CMEs) can be categorized as either “magnetic clouds” (MCs) or non-MCs. Features such as a large magnetic field, low plasma-beta, and low proton temperature suggest that a CME event is also an MC event; however, so far there is neither a definitive method nor an automatic process to distinguish the two. Human labeling is time-consuming, and results can fluctuate owing to the imprecise definition of such events. In this study, we approach the problem of MC and non-MC distinction from a time series data analysis perspective and show how clustering can shed some light on this problem. Although many algorithms exist for traditional data clustering in the Euclidean space, they are not well suited for time series data. Problems such as inadequate distance measure, inaccurate cluster center description, and lack of intuitive cluster representations need to be addressed for effective time series clustering. Our data analysis in this work is twofold: clustering and visualization. For clustering we compared the results from the popular hierarchical agglomerative clustering technique to a distance density clustering heuristic we developed previously for time series data clustering. In both cases, dynamic time warping is used as the similarity measure. For classification as well as visualization, we use decision trees to aggregate single-dimensional clustering results to form a multidimensional time series decision tree, with averaged time series to present each decision. In this study, we achieved modest accuracy and, more importantly, an intuitive interpretation of how different parameters contribute to an MC event.
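Dynamic time warping, the similarity measure named above, can be sketched with the classic dynamic-programming recurrence. The version below uses the absolute difference as the local cost and no warping-window constraint; both choices, and the example series, are assumptions, since the abstract does not state the variant used.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance with absolute-difference local cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] holds the best alignment cost of a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # repeat b[j-1] against a new point of a
                cost[i][j - 1],      # repeat a[i-1] against a new point of b
                cost[i - 1][j - 1],  # advance both series
            )
    return cost[n][m]


# The second series lingers on the value 2; warping absorbs the delay.
same_shape = dtw_distance([1, 2, 3], [1, 2, 2, 3])
shifted_level = dtw_distance([0, 0], [1, 1])
```

Because the alignment may stretch or compress time, two series with the same shape but different timing get a small distance, which is the property that makes the measure attractive for clustering time series.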
Fung, David C Y; Wilkins, Marc R; Hart, David; Hong, Seok-Hee
2010-07-01
The force-directed layout is commonly used in computer-generated visualizations of protein-protein interaction networks. While it is good for providing a visual outline of the protein complexes and their interactions, it has two limitations when used as a visual analysis method. The first is poor reproducibility. Repeated running of the algorithm does not necessarily generate the same layout, therefore, demanding cognitive readaptation on the investigator's part. The second limitation is that it does not explicitly display complementary biological information, e.g. Gene Ontology, other than the protein names or gene symbols. Here, we present an alternative layout called the clustered circular layout. Using the human DNA replication protein-protein interaction network as a case study, we compared the two network layouts for their merits and limitations in supporting visual analysis.
Clustering of samples and variables with mixed-type data
Edelmann, Dominic; Kopp-Schneider, Annette
2017-01-01
Analysis of data measured on different scales is a relevant challenge. Biomedical studies often focus on high-throughput datasets of, e.g., quantitative measurements. However, the need for integration of other features possibly measured on different scales, e.g. clinical or cytogenetic factors, becomes increasingly important. The analysis results (e.g. a selection of relevant genes) are then visualized, while adding further information, like clinical factors, on top. However, a more integrative approach is desirable, where all available data are analyzed jointly, and where also in the visualization different data sources are combined in a more natural way. Here we specifically target integrative visualization and present a heatmap-style graphic display. To this end, we develop and explore methods for clustering mixed-type data, with special focus on clustering variables. Clustering of variables does not receive as much attention in the literature as does clustering of samples. We extend the variables clustering methodology by two new approaches, one based on the combination of different association measures and the other on distance correlation. With simulation studies we evaluate and compare different clustering strategies. Applying specific methods for mixed-type data proves to be comparable and in many cases beneficial as compared to standard approaches applied to corresponding quantitative or binarized data. Our two novel approaches for mixed-type variables show similar or better performance than the existing methods ClustOfVar and bias-corrected mutual information. Further, in contrast to ClustOfVar, our methods provide dissimilarity matrices, which is an advantage, especially for the purpose of visualization. Real data examples aim to give an impression of various kinds of potential applications for the integrative heatmap and other graphical displays based on dissimilarity matrices. 
We demonstrate that the presented integrative heatmap provides more information than common data displays about the relationship among variables and samples. The described clustering and visualization methods are implemented in our R package CluMix available from https://cran.r-project.org/web/packages/CluMix. PMID:29182671
NeatMap--non-clustering heat map alternatives in R.
Rajaram, Satwik; Oono, Yoshi
2010-01-22
The clustered heat map is the most popular means of visualizing genomic data. It compactly displays a large amount of data in an intuitive format that facilitates the detection of hidden structures and relations in the data. However, it is hampered by its use of cluster analysis which does not always respect the intrinsic relations in the data, often requiring non-standardized reordering of rows/columns to be performed post-clustering. This sometimes leads to uninformative and/or misleading conclusions. Often it is more informative to use dimension-reduction algorithms (such as Principal Component Analysis and Multi-Dimensional Scaling) which respect the topology inherent in the data. Yet, despite their proven utility in the analysis of biological data, they are not as widely used. This is at least partially due to the lack of user-friendly visualization methods with the visceral impact of the heat map. NeatMap is an R package designed to meet this need. NeatMap offers a variety of novel plots (in 2 and 3 dimensions) to be used in conjunction with these dimension-reduction techniques. Like the heat map, but unlike traditional displays of such results, it allows the entire dataset to be displayed while visualizing relations between elements. It also allows superimposition of cluster analysis results for mutual validation. NeatMap is shown to be more informative than the traditional heat map with the help of two well-known microarray datasets. NeatMap thus preserves many of the strengths of the clustered heat map while addressing some of its deficiencies. It is hoped that NeatMap will spur the adoption of non-clustering dimension-reduction algorithms.
Assessment of the vision-specific quality of life using clustered visual field in glaucoma patients.
Sawada, Hideko; Yoshino, Takaiko; Fukuchi, Takeo; Abe, Haruki
2014-02-01
To investigate the significance of vision-specific quality of life (QOL) in glaucoma patients based on the location of visual field defects. We examined 336 eyes of 168 patients. The 25-item National Eye Institute Visual Function Questionnaire was used to evaluate patients' QOL. Visual field testing was performed using the Humphrey Field Analyzer; the visual field was divided into 10 clusters. We defined the eye with better mean deviation as the better eye and the fellow eye as the worse eye. A single linear regression analysis was applied to assess the significance of the relationship between QOL and the clustered visual field. The strongest correlation was observed in the lower paracentral visual field in the better eye. The lower peripheral visual field in the better eye also showed a good correlation. Correlation coefficients in the better eye were generally higher than those in the worse eye. For driving, the upper temporal visual field in the better eye was the most strongly correlated (r=0.509). For role limitation and peripheral vision, the lower peripheral visual field in the better eye had the highest correlation coefficients at 0.459 and 0.425, respectively. Overall, clusters in the lower hemifield in the better eye were more strongly correlated with QOL than those in the worse eye. In particular, the lower paracentral visual field in the better eye was correlated most strongly of all. Driving, however, strongly correlated with the upper hemifield in the better eye.
Ugulu, Ilker; Aydin, Halil
2016-01-01
We propose an approach to clustering and visualization of students' cognitive structural models. We use the self-organizing map (SOM) combined with Ward's clustering to conduct cluster analysis. In the study carried out on 100 subjects, a conceptual understanding test consisting of open-ended questions was used as a data collection tool. The results of analyses indicated that students constructed the aliveness concept by associating it predominantly with humans. Motion appeared as the term most frequently associated with the aliveness concept. The results suggest that the aliveness concept has been constructed using anthropocentric and animistic cognitive structures. In the next step, we used the data obtained from the conceptual understanding test to train the SOM. Consequently, we propose a visualization method for the cognitive structure of the aliveness concept. PMID:26819579
Nguyen, Hien D; Ullmann, Jeremy F P; McLachlan, Geoffrey J; Voleti, Venkatakaushik; Li, Wenze; Hillman, Elizabeth M C; Reutens, David C; Janke, Andrew L
2018-02-01
Calcium is a ubiquitous messenger in neural signaling events. An increasing number of techniques are enabling visualization of neurological activity in animal models via luminescent proteins that bind to calcium ions. These techniques generate large volumes of spatially correlated time series. A model-based functional data analysis methodology via Gaussian mixtures is proposed for the clustering of data from such visualizations. The methodology is theoretically justified and a computationally efficient approach to estimation is suggested. An example analysis of a zebrafish imaging experiment is presented.
Visual Pattern Analysis in Histopathology Images Using Bag of Features
NASA Astrophysics Data System (ADS)
Cruz-Roa, Angel; Caicedo, Juan C.; González, Fabio A.
This paper presents a framework to analyse visual patterns in a collection of medical images in a two-stage procedure. First, a set of representative visual patterns from the image collection is obtained by constructing a visual-word dictionary under a bag-of-features approach. Second, an analysis of the relationships between visual patterns and semantic concepts in the image collection is performed. The most important visual patterns for each semantic concept are identified using correlation analysis. A matrix visualization of the structure and organization of the image collection is generated using a cluster analysis. The experimental evaluation was conducted on a histopathology image collection, and the results showed clear relationships between visual patterns and semantic concepts that are, in addition, easy to interpret and understand.
Visual cluster analysis and pattern recognition methods
Osbourn, Gordon Cecil; Martinez, Rubel Francisco
2001-01-01
A method of clustering using a novel template to define a region of influence. Using neighboring approximation methods, computation times can be significantly reduced. The template and method are applicable to, and improve, pattern recognition techniques.
Bae, Hyoung Won; Rho, Seungsoo; Lee, Hye Sun; Lee, Naeun; Hong, Samin; Seong, Gong Je; Sung, Kyung Rim; Kim, Chan Yun
2014-04-29
To classify medically treated open-angle glaucoma (OAG) by the pattern of progression using hierarchical cluster analysis, and to determine OAG progression characteristics by comparing clusters. Ninety-five eyes of 95 OAG patients who received medical treatment, and who had undergone visual field (VF) testing at least once per year for 5 or more years, were included. OAG was classified into subgroups using hierarchical cluster analysis based on the following five variables: baseline mean deviation (MD), baseline visual field index (VFI), MD slope, VFI slope, and Glaucoma Progression Analysis (GPA) printout. After that, other parameters were compared between clusters. Two clusters were made after a hierarchical cluster analysis. Cluster 1 showed -4.06 ± 2.43 dB baseline MD, 92.58% ± 6.27% baseline VFI, -0.28 ± 0.38 dB per year MD slope, -0.52% ± 0.81% per year VFI slope, and all "no progression" cases in GPA printout, whereas cluster 2 showed -8.68 ± 3.81 baseline MD, 77.54 ± 12.98 baseline VFI, -0.72 ± 0.55 MD slope, -2.22 ± 1.89 VFI slope, and seven "possible" and four "likely" progression cases in GPA printout. There were no significant differences in age, sex, mean IOP, central corneal thickness, and axial length between clusters. However, cluster 2 included more high-tension glaucoma patients and used a significantly greater number of antiglaucoma eye drops than cluster 1. Hierarchical cluster analysis of progression patterns divided OAG into slow and fast progression groups, evidenced by assessing the parameters of glaucomatous progression in VF testing. In the fast progression group, the prevalence of high-tension glaucoma was greater and the number of antiglaucoma medications administered was higher than in the slow progression group. Copyright 2014 The Association for Research in Vision and Ophthalmology, Inc.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Springmeyer, R R; Brugger, E; Cook, R
The Data group provides data analysis and visualization support to its customers. This consists primarily of the development and support of VisIt, a data analysis and visualization tool. Support ranges from answering questions about the tool and providing classes on how to use it to performing data analysis and visualization for customers. The Information Management and Graphics Group supports and develops tools that enhance our ability to access, display, and understand large, complex data sets. Activities include applying visualization software for large scale data exploration; running video production labs on two networks; supporting graphics libraries and tools for end users; maintaining PowerWalls and assorted other displays; and developing software for searching and managing scientific data. Researchers in the Center for Applied Scientific Computing (CASC) work on various projects including the development of visualization techniques for large scale data exploration that are funded by the ASC program, among others. The researchers also have LDRD projects and collaborations with other lab researchers, academia, and industry. The IMG group is located in the Terascale Simulation Facility, home to Dawn, Atlas, BGL, and others, which includes both classified and unclassified visualization theaters, a visualization computer floor and deployment workshop, and video production labs. We continued to provide the traditional graphics group consulting and video production support. We maintained five PowerWalls and many other displays. We deployed a 576-node Opteron/IB cluster with 72 TB of memory providing a visualization production server on our classified network. We continue to support a 128-node Opteron/IB cluster providing a visualization production server for our unclassified systems and an older 256-node Opteron/IB cluster for the classified systems, as well as several smaller clusters to drive the PowerWalls.
The visualization production systems include NFS servers to provide dedicated storage for data analysis and visualization. The ASC projects have delivered new versions of visualization and scientific data management tools to end users and continue to refine them. VisIt had 4 releases during the past year, ending with VisIt 2.0. We released version 2.4 of Hopper, a Java application for managing and transferring files. This release included a graphical disk usage view which works on all types of connections and an aggregated copy feature for transferring massive datasets quickly and efficiently to HPSS. We continue to use and develop Blockbuster and Telepath. Both the VisIt and IMG teams were engaged in a variety of movie production efforts during the past year in addition to the development tasks.
Cohen, Mitchell J; Grossman, Adam D; Morabito, Diane; Knudson, M Margaret; Butte, Atul J; Manley, Geoffrey T
2010-01-01
Advances in technology have made extensive monitoring of patient physiology the standard of care in intensive care units (ICUs). While many systems exist to compile these data, there has been no systematic multivariate analysis and categorization across patient physiological data. The sheer volume and complexity of these data make pattern recognition or identification of patient state difficult. Hierarchical cluster analysis allows visualization of high dimensional data and enables pattern recognition and identification of physiologic patient states. We hypothesized that processing of multivariate data using hierarchical clustering techniques would allow identification of otherwise hidden patient physiologic patterns that would be predictive of outcome. Multivariate physiologic and ventilator data were collected continuously using a multimodal bioinformatics system in the surgical ICU at San Francisco General Hospital. These data were incorporated with non-continuous data and stored on a server in the ICU. A hierarchical clustering algorithm grouped each minute of data into 1 of 10 clusters. Clusters were correlated with outcome measures including incidence of infection, multiple organ failure (MOF), and mortality. We identified 10 clusters, which we defined as distinct patient states. While patients transitioned between states, they spent significant amounts of time in each. Clusters were enriched for our outcome measures: 2 of the 10 states were enriched for infection, 6 of 10 were enriched for MOF, and 3 of 10 were enriched for death. Further analysis of correlations between pairs of variables within each cluster reveals significant differences in physiology between clusters. Here we show for the first time the feasibility of clustering physiological measurements to identify clinically relevant patient states after trauma. 
These results demonstrate that hierarchical clustering techniques can be useful for visualizing complex multivariate data and may provide new insights for the care of critically injured patients.
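As an illustration of the idea of grouping each minute of data into a fixed number of clusters, here is a minimal sketch of agglomerative hierarchical clustering in plain Python. The abstract does not state the linkage method or the variables used, so average linkage, Euclidean distance, the function name `average_linkage`, and the two-variable toy observations are all assumptions made for this example; the study itself cut the hierarchy at 10 clusters on far richer data.

```python
def average_linkage(points, n_clusters):
    """Merge the two closest clusters until n_clusters remain.

    Returns a list of clusters, each a sorted list of point indices.
    Naive implementation for illustration only (not efficient).
    """
    def dist(a, b):
        # Euclidean distance between two observations.
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    def linkage(c1, c2):
        # Average pairwise distance between members of the two clusters.
        total = sum(dist(points[i], points[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        pairs = (
            (i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        a, b = min(pairs, key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]  # safe: b > a, so index a is unaffected
    return clusters


# Hypothetical per-minute observations: (heart rate, mean arterial pressure).
minutes = [(70, 90), (72, 88), (71, 91), (120, 60), (118, 62), (95, 75)]
states = average_linkage(minutes, 3)
```

Each resulting cluster plays the role of a candidate "patient state"; in practice the variables would need to be put on comparable scales before distances are computed.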
Visual cluster analysis and pattern recognition template and methods
Osbourn, Gordon Cecil; Martinez, Rubel Francisco
1999-01-01
A method of clustering using a novel template to define a region of influence. Using neighboring approximation methods, computation times can be significantly reduced. The template and method are applicable to, and improve, pattern recognition techniques.
Multivariate Analysis of the Visual Information Processing of Numbers
ERIC Educational Resources Information Center
Levine, David M.
1977-01-01
Nonmetric multidimensional scaling and hierarchical clustering procedures are applied to a confusion matrix of numerals. Two dimensions were interpreted: straight versus curved, and locus of curvature. Four major clusters of numerals were developed. (Author/JKS)
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wilson, Andrew; Haass, Michael; Rintoul, Mark Daniel
GazeAppraise advances the state of the art of gaze pattern analysis using methods that simultaneously analyze spatial and temporal characteristics of gaze patterns. GazeAppraise enables novel research in visual perception and cognition; for example, using shape features as distinguishing elements to assess individual differences in visual search strategy. Given a set of point-to-point gaze sequences, hereafter referred to as scanpaths, the method constructs multiple descriptive features for each scanpath. Once the scanpath features have been calculated, they are used to form a multidimensional vector representing each scanpath, and cluster analysis is performed on the set of vectors from all scanpaths. An additional benefit of this method is the identification of causal or correlated characteristics of the stimuli, subjects, and visual task through statistical analysis of descriptive metadata distributions within and across clusters.
GUASOM Analysis of the ALHAMBRA Survey
NASA Astrophysics Data System (ADS)
Garabato, Daniel; Manteiga, Minia; Dafonte, Carlos; Álvarez, Marco A.
2017-10-01
GUASOM is a data mining tool designed for knowledge discovery in large astronomical spectrophotometric archives, developed in the framework of the Gaia DPAC (Data Processing and Analysis Consortium). Our tool is based on a type of unsupervised-learning artificial neural network called the self-organizing map (SOM). SOMs permit the grouping and visualization of large amounts of data for which there is no a priori knowledge, and hence they are very useful for analyzing the huge amount of information present in modern spectrophotometric surveys. SOMs are used to organize the information into clusters of objects, as homogeneous as possible according to their spectral energy distributions, and to project them onto a 2D grid where the data structure can be visualized. Each cluster has a representative, called a prototype, which is a virtual pattern that best represents or resembles the set of input patterns belonging to that cluster. Prototypes make it easier to determine the physical nature and properties of the objects populating each cluster. Our algorithm has been tested on the ALHAMBRA survey spectrophotometric observations; here we present our results concerning the survey segmentation, visualization of the data structure, separation between types of objects (stars and galaxies), data homogeneity of neurons, cluster prototypes, redshift distribution and crossmatch with other databases (Simbad).
Yang, Guang; Raschke, Felix; Barrick, Thomas R; Howe, Franklyn A
2015-09-01
To investigate whether nonlinear dimensionality reduction improves unsupervised classification of ¹H MRS brain tumor data compared with a linear method. In vivo single-voxel ¹H magnetic resonance spectroscopy (55 patients) and ¹H magnetic resonance spectroscopy imaging (MRSI) (29 patients) data were acquired from histopathologically diagnosed gliomas. Data reduction using Laplacian eigenmaps (LE) or independent component analysis (ICA) was followed by k-means clustering or agglomerative hierarchical clustering (AHC) for unsupervised learning to assess tumor grade and for tissue type segmentation of MRSI data. An accuracy of 93% in classification of glioma grade II and grade IV, with 100% accuracy in distinguishing tumor and normal spectra, was obtained by LE with unsupervised clustering, but not with the combination of k-means and ICA. With ¹H MRSI data, LE provided a more linear distribution of data for cluster analysis and better cluster stability than ICA. LE combined with k-means or AHC provided 91% accuracy for classifying tumor grade and 100% accuracy for identifying normal tissue voxels. Color-coded visualization of normal brain, tumor core, and infiltration regions was achieved with LE combined with AHC. The LE method is promising for unsupervised clustering to separate brain and tumor tissue with automated color-coding for visualization of ¹H MRSI data after cluster analysis. © 2014 Wiley Periodicals, Inc.
Convex Clustering: An Attractive Alternative to Hierarchical Clustering
Chen, Gary K.; Chi, Eric C.; Ranola, John Michael O.; Lange, Kenneth
2015-01-01
The primary goal in cluster analysis is to discover natural groupings of objects. The field of cluster analysis is crowded with diverse methods that make special assumptions about data and address different scientific aims. Despite its shortcomings in accuracy, hierarchical clustering is the dominant clustering method in bioinformatics. Biologists find the trees constructed by hierarchical clustering visually appealing and in tune with their evolutionary perspective. Hierarchical clustering operates on multiple scales simultaneously. This is essential, for instance, in transcriptome data, where one may be interested in making qualitative inferences about how lower-order relationships like gene modules lead to higher-order relationships like pathways or biological processes. The recently developed method of convex clustering preserves the visual appeal of hierarchical clustering while ameliorating its propensity to make false inferences in the presence of outliers and noise. The solution paths generated by convex clustering reveal relationships between clusters that are hidden by static methods such as k-means clustering. The current paper derives and tests a novel proximal distance algorithm for minimizing the objective function of convex clustering. The algorithm separates parameters, accommodates missing data, and supports prior information on relationships. Our program CONVEXCLUSTER incorporating the algorithm is implemented on ATI and nVidia graphics processing units (GPUs) for maximal speed. Several biological examples illustrate the strengths of convex clustering and the ability of the proximal distance algorithm to handle high-dimensional problems. CONVEXCLUSTER can be freely downloaded from the UCLA Human Genetics web site at http://www.genetics.ucla.edu/software/ PMID:25965340
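For orientation, the objective function of convex clustering is commonly written in the literature in the form below. The notation is an assumption made here and is not taken from the abstract: $x_i$ are the data points, $u_i$ the cluster centroid assigned to point $i$, $w_{ij} \ge 0$ pairwise weights, and $\gamma \ge 0$ a tuning parameter.

```latex
\min_{u_1,\dots,u_n}\;
  \frac{1}{2}\sum_{i=1}^{n} \lVert x_i - u_i \rVert_2^{2}
  \;+\;
  \gamma \sum_{i<j} w_{ij}\, \lVert u_i - u_j \rVert
```

At $\gamma = 0$ every point is its own centroid ($u_i = x_i$); as $\gamma$ grows, the penalty term pulls centroids together, and points $i$ and $j$ are considered to share a cluster once $u_i = u_j$. Tracing the minimizer over a range of $\gamma$ values gives the solution path mentioned in the abstract.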
Convex clustering: an attractive alternative to hierarchical clustering.
Chen, Gary K; Chi, Eric C; Ranola, John Michael O; Lange, Kenneth
2015-05-01
The primary goal in cluster analysis is to discover natural groupings of objects. The field of cluster analysis is crowded with diverse methods that make special assumptions about data and address different scientific aims. Despite its shortcomings in accuracy, hierarchical clustering is the dominant clustering method in bioinformatics. Biologists find the trees constructed by hierarchical clustering visually appealing and in tune with their evolutionary perspective. Hierarchical clustering operates on multiple scales simultaneously. This is essential, for instance, in transcriptome data, where one may be interested in making qualitative inferences about how lower-order relationships like gene modules lead to higher-order relationships like pathways or biological processes. The recently developed method of convex clustering preserves the visual appeal of hierarchical clustering while ameliorating its propensity to make false inferences in the presence of outliers and noise. The solution paths generated by convex clustering reveal relationships between clusters that are hidden by static methods such as k-means clustering. The current paper derives and tests a novel proximal distance algorithm for minimizing the objective function of convex clustering. The algorithm separates parameters, accommodates missing data, and supports prior information on relationships. Our program CONVEXCLUSTER incorporating the algorithm is implemented on ATI and nVidia graphics processing units (GPUs) for maximal speed. Several biological examples illustrate the strengths of convex clustering and the ability of the proximal distance algorithm to handle high-dimensional problems. CONVEXCLUSTER can be freely downloaded from the UCLA Human Genetics web site at http://www.genetics.ucla.edu/software/.
NASA Astrophysics Data System (ADS)
Lespinats, Sylvain; Pinker-Domenig, Katja; Wengert, Georg; Houben, Ivo; Lobbes, Marc; Stadlbauer, Andreas; Meyer-Bäse, Anke
2016-05-01
Glioma-derived cancer stem cells (GSCs) are tumor-initiating cells and may be refractory to radiation and chemotherapy and thus have important implications for tumor biology and therapeutics. The analysis and interpretation of large proteomic data sets require the development of new data mining and visualization approaches. Traditional techniques are insufficient to interpret and visualize the resulting experimental data. The emphasis of this paper lies in the application of novel approaches for visualization, clustering and projection representation to unveil hidden data structures relevant for the accurate interpretation of biological experiments. These qualitative and quantitative methods are applied to the proteomic analysis of data sets derived from the GSCs. The achieved clustering and visualization results provide a more detailed insight into the protein-level fold changes and putative upstream regulators for the GSCs. However, the extracted molecular information is insufficient for classifying GSCs and for paving the way to improved therapeutics for the heterogeneous glioma.
Igloo-Plot: a tool for visualization of multidimensional datasets.
Kuntal, Bhusan K; Ghosh, Tarini Shankar; Mande, Sharmila S
2014-01-01
Advances in science and technology have resulted in an exponential growth of multivariate (or multi-dimensional) datasets, which are being generated in various research areas, especially in the biological sciences. Visualization and analysis of such data (with the objective of uncovering the hidden patterns therein) is an important and challenging task. We present a tool, called Igloo-Plot, for efficient visualization of multidimensional datasets. The tool addresses some of the key limitations of contemporary multivariate visualization and analysis tools. The visualization layout facilitates easy identification not only of clusters of data points having similar feature compositions, but also of the 'marker features' specific to each of these clusters. The applicability of the various functionalities implemented herein is demonstrated using several well studied multi-dimensional datasets. Igloo-Plot is expected to be a valuable resource for researchers working in multivariate data mining studies. Igloo-Plot is available for download from: http://metagenomics.atc.tcs.com/IglooPlot/. Copyright © 2014 Elsevier Inc. All rights reserved.
Detection of Functional Change Using Cluster Trend Analysis in Glaucoma.
Gardiner, Stuart K; Mansberger, Steven L; Demirel, Shaban
2017-05-01
Global analyses using mean deviation (MD) assess visual field progression, but can miss localized changes. Pointwise analyses are more sensitive to localized progression, but more variable so require confirmation. This study assessed whether cluster trend analysis, averaging information across subsets of locations, could improve progression detection. A total of 133 test-retest eyes were tested 7 to 10 times. Rates of change and P values were calculated for possible re-orderings of these series to generate global analysis ("MD worsening faster than x dB/y with P < y"), pointwise and cluster analyses ("n locations [or clusters] worsening faster than x dB/y with P < y") with specificity exactly 95%. These criteria were applied to 505 eyes tested over a mean of 10.5 years, to find how soon each detected "deterioration," and compared using survival models. This was repeated including two subsequent visual fields to determine whether "deterioration" was confirmed. The best global criterion detected deterioration in 25% of eyes in 5.0 years (95% confidence interval [CI], 4.7-5.3 years), compared with 4.8 years (95% CI, 4.2-5.1) for the best cluster analysis criterion, and 4.1 years (95% CI, 4.0-4.5) for the best pointwise criterion. However, for pointwise analysis, only 38% of these changes were confirmed, compared with 61% for clusters and 76% for MD. The time until 25% of eyes showed subsequently confirmed deterioration was 6.3 years (95% CI, 6.0-7.2) for global, 6.3 years (95% CI, 6.0-7.0) for pointwise, and 6.0 years (95% CI, 5.3-6.6) for cluster analyses. Although the specificity is still suboptimal, cluster trend analysis detects subsequently confirmed deterioration sooner than either global or pointwise analyses.
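The core step of a cluster trend analysis, averaging pointwise values within each cluster of locations and then regressing that average on time, can be sketched as follows. This is a simplified illustration only: the function names, the toy four-location field, and the two clusters are hypothetical, and the P-value calculation and the specificity-matched criteria used in the study are not reproduced here.

```python
def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def cluster_slopes(years, fields, clusters):
    """Rate of change (dB per year) of the mean value within each cluster.

    years    : test dates in years since baseline
    fields   : one list of pointwise values (dB) per visit
    clusters : mapping of cluster name -> list of location indices
    """
    slopes = {}
    for name, idx in clusters.items():
        # Average across the cluster's locations at every visit.
        series = [sum(visit[i] for i in idx) / len(idx) for visit in fields]
        slopes[name] = ols_slope(years, series)
    return slopes


# Toy series: the "upper" cluster loses 1 dB per year, "lower" is stable.
years = [0.0, 1.0, 2.0, 3.0]
fields = [
    [0.0, 0.0, 0.0, 0.0],
    [-1.0, -1.0, 0.0, 0.0],
    [-2.0, -2.0, 0.0, 0.0],
    [-3.0, -3.0, 0.0, 0.0],
]
clusters = {"upper": [0, 1], "lower": [2, 3]}
slopes = cluster_slopes(years, fields, clusters)
```

A criterion of the form "n clusters worsening faster than x dB/y" would then be applied to these slopes together with their P values.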
Visual hallucinatory syndromes and the anatomy of the visual brain.
Santhouse, A M; Howard, R J; ffytche, D H
2000-10-01
We have set out to identify phenomenological correlates of cerebral functional architecture within Charles Bonnet syndrome (CBS) hallucinations by looking for associations between specific hallucination categories. Thirty-four CBS patients were examined with a structured interview/questionnaire to establish the presence of 28 different pathological visual experiences. Associations between categories of pathological experience were investigated by an exploratory factor analysis. Twelve of the pathological experiences partitioned into three segregated syndromic clusters. The first cluster consisted of hallucinations of extended landscape scenes and small figures in costumes with hats; the second, hallucinations of grotesque, disembodied and distorted faces with prominent eyes and teeth; and the third, visual perseveration and delayed palinopsia. The three visual psycho-syndromes mirror the segregation of hierarchical visual pathways into streams and suggest a novel theoretical framework for future research into the pathophysiology of neuropsychiatric syndromes.
Market basket analysis visualization on a spherical surface
NASA Astrophysics Data System (ADS)
Hao, Ming C.; Hsu, Meichun; Dayal, Umeshwar; Wei, Shu F.; Sprenger, Thomas; Holenstein, Thomas
2001-05-01
This paper discusses the visualization of the relationships in e-commerce transactions. To date, many practical research projects have shown the usefulness of a physics-based mass-spring technique to lay out data items with close relationships on a graph. We describe a market basket analysis visualization system using this technique. The system (1) integrates a physics-based engine into a visual data mining platform; (2) uses a 3D spherical surface to visualize the cluster of related data items; and (3) for large volumes of transactions, uses hidden structures to unclutter the display. Several examples of market basket analysis are also provided.
Visual cluster analysis and pattern recognition template and methods
Osbourn, G.C.; Martinez, R.F.
1999-05-04
A method of clustering using a novel template to define a region of influence is disclosed. Using neighboring approximation methods, computation times can be significantly reduced. The template and method are applicable to, and improve, pattern recognition techniques. 30 figs.
Molecular Eigensolution Symmetry Analysis and Fine Structure
Harter, William G.; Mitchell, Justin C.
2013-01-01
Spectra of high-symmetry molecules contain fine and superfine level cluster structure related to J-tunneling between hills and valleys on rovibronic energy surfaces (RES). Such graphic visualizations help disentangle multi-level dynamics, selection rules, and state mixing effects including widespread violation of nuclear spin symmetry species. A review of RES analysis compares it to that of potential energy surfaces (PES) used in Born–Oppenheimer approximations. Both take advantage of adiabatic coupling in order to visualize Hamiltonian eigensolutions. RES of symmetric and D2 asymmetric top rank-2-tensor Hamiltonians are compared with Oh spherical top rank-4-tensor fine-structure clusters of 6-fold and 8-fold tunneling multiplets. Then extreme 12-fold and 24-fold multiplets are analyzed by RES plots of higher rank tensor Hamiltonians. Such extreme clustering is rare in fundamental bands but prevalent in hot bands, and analysis of its superfine structure requires more efficient labeling and a more powerful group theory. This is introduced using elementary examples involving two groups of order-6 (C6 and D3~C3v), then applied to families of Oh clusters in SF6 spectra and to extreme clusters. PMID:23344041
pong: fast analysis and visualization of latent clusters in population genetic data.
Behr, Aaron A; Liu, Katherine Z; Liu-Fang, Gracie; Nakka, Priyanka; Ramachandran, Sohini
2016-09-15
A series of methods in population genetics use multilocus genotype data to assign individuals membership in latent clusters. These methods belong to a broad class of mixed-membership models, such as latent Dirichlet allocation used to analyze text corpora. Inference from mixed-membership models can produce different output matrices when repeatedly applied to the same inputs, and the number of latent clusters is a parameter that is often varied in the analysis pipeline. For these reasons, quantifying, visualizing, and annotating the output from mixed-membership models are bottlenecks for investigators across multiple disciplines from ecology to text data mining. We introduce pong, a network-graphical approach for analyzing and visualizing membership in latent clusters with a native interactive D3.js visualization. pong leverages efficient algorithms for solving the Assignment Problem to dramatically reduce runtime while increasing accuracy compared with other methods that process output from mixed-membership models. We apply pong to 225 705 unlinked genome-wide single-nucleotide variants from 2426 unrelated individuals in the 1000 Genomes Project, and identify previously overlooked aspects of global human population structure. We show that pong outpaces current solutions by more than an order of magnitude in runtime while providing a customizable and interactive visualization of population structure that is more accurate than those produced by current tools. pong is freely available and can be installed using the Python package management system pip. pong's source code is available at https://github.com/abehr/pong. Contact: aaron_behr@alumni.brown.edu or sramachandran@brown.edu. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
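To make the role of the Assignment Problem concrete: two runs of a mixed-membership model may recover the same clusters under different column labels, and matching the columns of one membership matrix to those of another is a one-to-one assignment that maximizes total similarity. The sketch below is not pong's implementation. It uses a brute-force search over permutations (feasible only for a handful of clusters, where efficient assignment algorithms would be used in practice), and the similarity measure, the function names, and the toy matrices are assumptions chosen for illustration.

```python
from itertools import permutations


def overlap(col_a, col_b):
    """Similarity of two clusters: sum of element-wise minimum membership."""
    return sum(min(a, b) for a, b in zip(col_a, col_b))


def align_clusters(q_ref, q_run):
    """Return perm such that column perm[k] of q_run matches column k of q_ref.

    q_ref and q_run are membership matrices with one row per individual
    and one column per cluster. Brute force: small numbers of clusters only.
    """
    k = len(q_ref[0])
    cols_ref = list(zip(*q_ref))
    cols_run = list(zip(*q_run))
    best_perm, best_score = None, float("-inf")
    for perm in permutations(range(k)):
        score = sum(overlap(cols_ref[i], cols_run[perm[i]]) for i in range(k))
        if score > best_score:
            best_perm, best_score = perm, score
    return best_perm


# Two runs that found the same structure but labelled the clusters differently.
run_1 = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.1], [0.0, 0.2, 0.8]]
run_2 = [[0.0, 0.9, 0.1], [0.1, 0.1, 0.8], [0.8, 0.0, 0.2]]
perm = align_clusters(run_1, run_2)
aligned = [[row[j] for j in perm] for row in run_2]
```

After reordering, the columns of `aligned` correspond to the columns of `run_1`, so the two runs can be compared or drawn with consistent colors.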
Roguev, Assen; Ryan, Colm J; Xu, Jiewei; Colson, Isabelle; Hartsuiker, Edgar; Krogan, Nevan
2018-02-01
This protocol describes computational analysis of genetic interaction screens, ranging from data capture (plate imaging) to downstream analyses. Plate imaging approaches using both digital camera and office flatbed scanners are included, along with a protocol for the extraction of colony size measurements from the resulting images. A commonly used genetic interaction scoring method, calculation of the S-score, is discussed. These methods require minimal computer skills, but some familiarity with MATLAB and Linux/Unix is a plus. Finally, an outline for using clustering and visualization software for analysis of resulting data sets is provided. © 2018 Cold Spring Harbor Laboratory Press.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sanfilippo, Antonio P.; Chikkagoudar, Satish
We describe an approach to analyzing trade data which uses clustering to detect similarities across shipping manifest records, classification to evaluate clustering results and categorize new unseen shipping data records, and visual analytics to support situation awareness in dynamic decision making, in order to monitor and warn against the movement of radiological threat materials through search, analysis and forecasting capabilities. The evaluation of clustering results through classification and systematic inspection of the clusters shows that the clusters have strong semantic cohesion and offer novel ways to detect transactions related to nuclear smuggling.
Liston, Adam D; De Munck, Jan C; Hamandi, Khalid; Laufs, Helmut; Ossenblok, Pauly; Duncan, John S; Lemieux, Louis
2006-07-01
Simultaneous acquisition of EEG and fMRI data enables the investigation of the hemodynamic correlates of interictal epileptiform discharges (IEDs) during the resting state in patients with epilepsy. This paper addresses two issues: (1) the semi-automation of IED classification in statistical modelling for fMRI analysis and (2) the improvement of IED detection to increase experimental fMRI efficiency. For patients with multiple IED generators, sensitivity to IED-correlated BOLD signal changes can be improved when the fMRI analysis model distinguishes between IEDs of differing morphology and field. In an attempt to reduce the subjectivity of visual IED classification, we implemented a semi-automated system, based on the spatio-temporal clustering of EEG events. We illustrate the technique's usefulness using EEG-fMRI data from a subject with focal epilepsy in whom 202 IEDs were visually identified and then clustered semi-automatically into four clusters. Each cluster of IEDs was modelled separately for the purpose of fMRI analysis. This revealed IED-correlated BOLD activations in distinct regions corresponding to three different IED categories. In a second step, Signal Space Projection (SSP) was used to project the scalp EEG onto the dipoles corresponding to each IED cluster. This resulted in 123 previously unrecognised IEDs, the inclusion of which, in the General Linear Model (GLM), increased the experimental efficiency as reflected by significant BOLD activations. We have also shown that the detection of extra IEDs is robust in the face of fluctuations in the set of visually detected IEDs. We conclude that automated IED classification can result in more objective fMRI models of IEDs and significantly increased sensitivity.
Katwal, Santosh B; Gore, John C; Marois, Rene; Rogers, Baxter P
2013-09-01
We present novel graph-based visualizations of self-organizing maps for unsupervised functional magnetic resonance imaging (fMRI) analysis. A self-organizing map is an artificial neural network model that transforms high-dimensional data into a low-dimensional (often a 2-D) map using unsupervised learning. However, a postprocessing scheme is necessary to correctly interpret similarity between neighboring node prototypes (feature vectors) on the output map and delineate clusters and features of interest in the data. In this paper, we used graph-based visualizations to capture fMRI data features based upon 1) the distribution of data across the receptive fields of the prototypes (density-based connectivity); and 2) temporal similarities (correlations) between the prototypes (correlation-based connectivity). We applied this approach to identify task-related brain areas in an fMRI reaction time experiment involving a visuo-manual response task, and we correlated the time-to-peak of the fMRI responses in these areas with reaction time. Visualization of self-organizing maps outperformed independent component analysis and voxelwise univariate linear regression analysis in identifying and classifying relevant brain regions. We conclude that the graph-based visualizations of self-organizing maps help in advanced visualization of cluster boundaries in fMRI data enabling the separation of regions with small differences in the timings of their brain responses.
Kovalska, M P; Bürki, E; Schoetzau, A; Orguel, S F; Orguel, S; Grieshaber, M C
2011-04-01
The distinction of real progression from test variability in visual field (VF) series may be based on clinical judgment, on trend analysis based on follow-up of test parameters over time, or on identification of a significant change related to the mean of baseline exams (event analysis). The aim of this study was to compare a new population-based method (Octopus field analysis, OFA) with classic regression analyses and clinical judgment for detecting glaucomatous VF changes. A total of 240 VF series from 240 patients with at least 9 consecutive examinations available were included in this study. They were independently classified by two experienced investigators. The results of this classification served as a reference for comparison for the following statistical tests: (a) t-test global, (b) r-test global, (c) regression analysis of 10 VF clusters and (d) point-wise linear regression analysis. 32.5 % of the VF series were classified as progressive by the investigators. The sensitivity and specificity were 89.7 % and 92.0 % for the r-test, and 73.1 % and 93.8 % for the t-test, respectively. In the point-wise linear regression analysis, the specificity was comparable (89.5 % versus 92 %), but the sensitivity was clearly lower than in the r-test (22.4 % versus 89.7 %) at a significance level of p = 0.01. A regression analysis for the 10 VF clusters showed a markedly higher sensitivity for the r-test (37.7 %) than the t-test (14.1 %) at a similar specificity (88.3 % versus 93.8 %) for a significant trend (p = 0.005). In regard to the cluster distribution, the paracentral clusters and the superior nasal hemifield progressed most frequently. The population-based regression analysis seems to be superior to the trend analysis in detecting VF progression in glaucoma, and may eliminate the drawbacks of the event analysis.
Further, it may assist the clinician in the evaluation of VF series and may allow better visualization of the correlation between function and structure owing to VF clusters. © Georg Thieme Verlag KG Stuttgart · New York.
Bennett, Robert M; Russell, Jon; Cappelleri, Joseph C; Bushmakin, Andrew G; Zlateva, Gergana; Sadosky, Alesia
2010-06-28
The purpose of this study was to determine whether some of the clinical features of fibromyalgia (FM) that patients would like to see improved aggregate into definable clusters. Seven hundred and eighty-eight patients with clinically confirmed FM and baseline pain ≥40 mm on a 100 mm visual analogue scale ranked 5 FM clinical features that the subjects would most like to see improved after treatment (one for each priority quintile) from a list of 20 developed during focus groups. For each subject, clinical features were transformed into vectors with rankings assigned values 1-5 (lowest to highest ranking). Logistic analysis was used to create a distance matrix and hierarchical cluster analysis was applied to identify cluster structure. The frequency of cluster selection was determined, and cluster importance was ranked using cluster scores derived from rankings of the clinical features. Multidimensional scaling was used to visualize and conceptualize cluster relationships. Six clinical feature clusters were identified and named based on their key characteristics. In order of selection frequency, the clusters were Pain (90%; 4 clinical features), Fatigue (89%; 4 clinical features), Domestic (42%; 4 clinical features), Impairment (29%; 3 functions), Affective (21%; 3 clinical features), and Social (9%; 2 functional). The "Pain Cluster" was ranked of greatest importance by 54% of subjects, followed by Fatigue, which was given the highest ranking by 28% of subjects. Multidimensional scaling mapped these clusters to two dimensions: Status (bounded by Physical and Emotional domains), and Setting (bounded by Individual and Group interactions). Common clinical features of FM could be grouped into 6 clusters (Pain, Fatigue, Domestic, Impairment, Affective, and Social) based on patient perception of relevance to treatment.
Furthermore, these 6 clusters could be charted in the 2 dimensions of Status and Setting, thus providing a unique perspective for interpretation of FM symptomatology.
Somatotyping using 3D anthropometry: a cluster analysis.
Olds, Tim; Daniell, Nathan; Petkov, John; David Stewart, Arthur
2013-01-01
Somatotyping is the quantification of human body shape, independent of body size. Hitherto, somatotyping (including the most popular method, the Heath-Carter system) has been based on subjective visual ratings, sometimes supported by surface anthropometry. This study used data derived from three-dimensional (3D) whole-body scans as inputs for cluster analysis to objectively derive clusters of similar body shapes. Twenty-nine dimensions normalised for body size were measured on a purposive sample of 301 adults aged 17-56 years who had been scanned using a Vitus Smart laser scanner. K-means Cluster Analysis with v-fold cross-validation was used to determine shape clusters. Three male and three female clusters emerged, and were visualised using those scans closest to the cluster centroid and a caricature defined by doubling the difference between the average scan and the cluster centroid. The male clusters were decidedly endomorphic (high fatness), ectomorphic (high linearity), and endo-mesomorphic (a mixture of fatness and muscularity). The female clusters were clearly endomorphic, ectomorphic, and the ecto-mesomorphic (a mixture of linearity and muscularity). An objective shape quantification procedure combining 3D scanning and cluster analysis yielded shape clusters strikingly similar to traditional somatotyping.
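The caricature construction described above (doubling the difference between the average scan and the cluster centroid) can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `caricature`, the `factor` parameter and the sample numbers are hypothetical and are not taken from the study.

```python
def caricature(average, centroid, factor=2.0):
    """Exaggerate a cluster centroid relative to the overall average shape.

    With factor=2.0 the difference between centroid and average is doubled,
    which is one reading of the caricature described in the abstract.
    """
    if len(average) != len(centroid):
        raise ValueError("average and centroid must have the same length")
    return [a + factor * (c - a) for a, c in zip(average, centroid)]


# Hypothetical size-normalised dimensions (illustrative values only).
average_scan = [1.00, 0.50, 0.25]
cluster_centroid = [1.10, 0.45, 0.25]
exaggerated = caricature(average_scan, cluster_centroid)
```

Dimensions on which the centroid matches the average are left unchanged, so only the features that distinguish the cluster are exaggerated.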
Modified multidimensional scaling approach to analyze financial markets.
Yin, Yi; Shang, Pengjian
2014-06-01
Detrended cross-correlation coefficient (σDCCA) and dynamic time warping (DTW) are introduced as the dissimilarity measures, respectively, while multidimensional scaling (MDS) is employed to translate the dissimilarities between daily price returns of 24 stock markets. We first propose MDS based on σDCCA dissimilarity and MDS based on DTW dissimilarity creatively, while MDS based on Euclidean dissimilarity is also employed to provide a reference for comparisons. We apply these methods in order to further visualize the clustering between stock markets. Moreover, we decide to confront MDS with an alternative visualization method, "Unweighed Average" clustering method, for comparison. The MDS analysis and "Unweighed Average" clustering method are employed based on the same dissimilarity. Through the results, we find that MDS gives us a more intuitive mapping for observing stable or emerging clusters of stock markets with similar behavior, while the MDS analysis based on σDCCA dissimilarity can provide more clear, detailed, and accurate information on the classification of the stock markets than the MDS analysis based on Euclidean dissimilarity. The MDS analysis based on DTW dissimilarity indicates more knowledge about the correlations between stock markets particularly and interestingly. Meanwhile, it reflects more abundant results on the clustering of stock markets and is much more intensive than the MDS analysis based on Euclidean dissimilarity. In addition, the graphs, originated from applying MDS methods based on σDCCA dissimilarity and DTW dissimilarity, may also guide the construction of multivariate econometric models.
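To make the DTW-based dissimilarity concrete, here is a minimal sketch of a textbook dynamic time warping distance and the pairwise matrix that an MDS routine would take as input. It assumes an absolute-difference local cost and no warping window, which may differ from the authors' settings; the function names and the sample return series are hypothetical.

```python
def dtw_distance(x, y):
    """Dynamic time warping distance between two numeric sequences.

    Uses absolute difference as the local cost and the classic
    three-neighbour recurrence (match, insertion, deletion).
    """
    n, m = len(x), len(y)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning x[:i] with y[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(x[i - 1] - y[j - 1])
            cost[i][j] = local + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    return cost[n][m]


def dissimilarity_matrix(series):
    """Symmetric matrix of pairwise DTW distances, suitable as MDS input."""
    k = len(series)
    d = [[0.0] * k for _ in range(k)]
    for a in range(k):
        for b in range(a + 1, k):
            d[a][b] = d[b][a] = dtw_distance(series[a], series[b])
    return d


# Hypothetical daily returns for three markets (illustrative values only).
returns = [
    [0.0, 1.0, 2.0, 1.0],
    [0.0, 0.0, 1.0, 2.0, 1.0],  # same shape as the first, delayed by one step
    [2.0, 0.0, 2.0, 0.0],
]
matrix = dissimilarity_matrix(returns)
```

The first two series differ only by a one-step delay, so their DTW distance is zero, whereas a point-by-point Euclidean comparison would treat them as different.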
Karmonik, Christof; Fung, Steve H; Dulay, M; Verma, A; Grossman, Robert G
2013-01-01
Graph-theoretical analysis algorithms have been used for identifying subnetworks in the human brain during the Default Mode State. Here, these methods are expanded to determine the interaction of the sensory and the motor subnetworks during the performance of an approach-avoidance paradigm utilizing the correlation strength between the signal intensity time courses as measure of synchrony. From functional magnetic resonance imaging (fMRI) data of 9 healthy volunteers, two signal time courses, one from the primary visual cortex (sensory input) and one from the motor cortex (motor output) were identified and a correlation difference map was calculated. Graph networks were created from this map and visualized with spring-embedded layouts and 3D layouts in the original anatomical space. Functional clusters in these networks were identified with the MCODE clustering algorithm. Interactions between the sensory sub-network and the motor sub-network were quantified through the interaction strengths of these clusters. The percentages of interactions involving the visual cortex ranged from 85 % to 18 % and the motor cortex ranged from 40 % to 9 %. Other regions with high interactions were: frontal cortex (19 ± 18 %), insula (17 ± 22 %), cuneus (16 ± 15 %), supplementary motor area (SMA, 11 ± 18 %) and subcortical regions (11 ± 10 %). Interactions between motor cortex, SMA and visual cortex accounted for 12 %, between visual cortex and cuneus for 8 % and between motor cortex, SMA and cuneus for 6 % of all interactions. These quantitative findings are supported by the visual impressions from the 2D and 3D network layouts.
Goodpaster, Aaron M.; Kennedy, Michael A.
2015-01-01
Currently, no standard metrics are used to quantify cluster separation in PCA or PLS-DA scores plots for metabonomics studies or to determine if cluster separation is statistically significant. Lack of such measures makes it virtually impossible to compare independent or inter-laboratory studies and can lead to confusion in the metabonomics literature when authors putatively identify metabolites distinguishing classes of samples based on visual and qualitative inspection of scores plots that exhibit marginal separation. While previous papers have addressed quantification of cluster separation in PCA scores plots, none have advocated routine use of a quantitative measure of separation that is supported by a standard and rigorous assessment of whether or not the cluster separation is statistically significant. Here quantification and statistical significance of separation of group centroids in PCA and PLS-DA scores plots are considered. The Mahalanobis distance is used to quantify the distance between group centroids, and the two-sample Hotelling's T2 test is computed for the data, related to an F-statistic, and then an F-test is applied to determine if the cluster separation is statistically significant. We demonstrate the value of this approach using four datasets containing various degrees of separation, ranging from groups that had no apparent visual cluster separation to groups that had no visual cluster overlap. Widespread adoption of such concrete metrics to quantify and evaluate the statistical significance of PCA and PLS-DA cluster separation would help standardize reporting of metabonomics data. PMID:26246647
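The chain described above (Mahalanobis distance between group centroids, two-sample Hotelling's T2, conversion to an F-statistic) can be sketched for a two-component scores plot. The code assumes the standard textbook formulas with a pooled covariance matrix and is restricted to two dimensions so that it needs no external libraries; the function names and sample scores are hypothetical and are not the authors' implementation.

```python
def mean_vector(points):
    """Centroid of a list of 2-D points."""
    n = len(points)
    return [sum(p[k] for p in points) / n for k in range(2)]


def scatter_matrix(points, mean):
    """Sum of outer products of deviations from the mean (2x2)."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for p in points:
        d = [p[0] - mean[0], p[1] - mean[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s


def hotelling_two_sample(group1, group2):
    """Return (mahalanobis_distance, t_squared, f_statistic, (df1, df2))."""
    n1, n2, p = len(group1), len(group2), 2
    m1, m2 = mean_vector(group1), mean_vector(group2)
    s1, s2 = scatter_matrix(group1, m1), scatter_matrix(group2, m2)
    # Pooled covariance: combined scatter divided by n1 + n2 - 2.
    pooled = [[(s1[i][j] + s2[i][j]) / (n1 + n2 - 2) for j in range(2)] for i in range(2)]
    det = pooled[0][0] * pooled[1][1] - pooled[0][1] * pooled[1][0]
    if det == 0:
        raise ValueError("pooled covariance matrix is singular")
    inv = [[pooled[1][1] / det, -pooled[0][1] / det],
           [-pooled[1][0] / det, pooled[0][0] / det]]
    diff = [m1[0] - m2[0], m1[1] - m2[1]]
    d_squared = sum(diff[i] * inv[i][j] * diff[j] for i in range(2) for j in range(2))
    t_squared = (n1 * n2) / (n1 + n2) * d_squared
    f_stat = (n1 + n2 - p - 1) / (p * (n1 + n2 - 2)) * t_squared
    return d_squared ** 0.5, t_squared, f_stat, (p, n1 + n2 - p - 1)


# Hypothetical scores on two components for two sample classes.
class_a = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
class_b = [(3.0, 3.0), (4.0, 3.0), (3.0, 4.0), (4.0, 4.0)]
distance, t_squared, f_stat, dof = hotelling_two_sample(class_a, class_b)
```

The F-statistic would then be compared with the F distribution on the returned degrees of freedom, for instance with `scipy.stats.f.sf(f_stat, *dof)` if SciPy is available.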
Visual reconciliation of alternative similarity spaces in climate modeling
J Poco; A Dasgupta; Y Wei; William Hargrove; C.R. Schwalm; D.N. Huntzinger; R Cook; E Bertini; C.T. Silva
2015-01-01
Visual data analysis often requires grouping of data objects based on their similarity. In many application domains researchers use algorithms and techniques like clustering and multidimensional scaling to extract groupings from data. While extracting these groups using a single similarity criteria is relatively straightforward, comparing alternative criteria poses...
Using cluster analysis to organize and explore regional GPS velocities
Simpson, Robert W.; Thatcher, Wayne; Savage, James C.
2012-01-01
Cluster analysis offers a simple visual exploratory tool for the initial investigation of regional Global Positioning System (GPS) velocity observations, which are providing increasingly precise mappings of actively deforming continental lithosphere. The deformation fields from dense regional GPS networks can often be concisely described in terms of relatively coherent blocks bounded by active faults, although the choice of blocks, their number and size, can be subjective and is often guided by the distribution of known faults. To illustrate our method, we apply cluster analysis to GPS velocities from the San Francisco Bay Region, California, to search for spatially coherent patterns of deformation, including evidence of block-like behavior. The clustering process identifies four robust groupings of velocities that we identify with four crustal blocks. Although the analysis uses no prior geologic information other than the GPS velocities, the cluster/block boundaries track three major faults, both locked and creeping.
Visual Field Map Clusters in Macaque Extrastriate Visual Cortex
Kolster, Hauke; Mandeville, Joseph B.; Arsenault, John T.; Ekstrom, Leeland B.; Wald, Lawrence L.; Vanduffel, Wim
2009-01-01
The macaque visual cortex contains more than 30 different functional visual areas, yet surprisingly little is known about the underlying organizational principles that structure its components into a complete ‘visual’ unit. A recent model of visual cortical organization in humans suggests that visual field maps are organized as clusters. Clusters minimize axonal connections between individual field maps that represent common visual percepts, with different clusters thought to carry out different functions. Experimental support for this hypothesis, however, is lacking in macaques, leaving open the question of whether it is unique to humans or a more general model for primate vision. Here we show, using high-resolution BOLD fMRI data in the awake monkey at 7 Tesla, that area MT/V5 and its neighbors are organized as a cluster with a common foveal representation and a circular eccentricity map. This novel view on the functional topography of area MT/V5 and satellites indicates that field map clusters are evolutionarily preserved and may be a fundamental organizational principle of the old world primate visual cortex. PMID:19474330
Spatial pattern recognition of seismic events in South West Colombia
NASA Astrophysics Data System (ADS)
Benítez, Hernán D.; Flórez, Juan F.; Duque, Diana P.; Benavides, Alberto; Lucía Baquero, Olga; Quintero, Jiber
2013-09-01
Recognition of seismogenic zones in geographical regions supports seismic hazard studies. This recognition is usually based on visual, qualitative and subjective analysis of data. Spatial pattern recognition provides a well founded means to obtain relevant information from large amounts of data. The purpose of this work is to identify and classify spatial patterns in instrumental data of the South West Colombian seismic database. In this research, clustering tendency analysis validates whether seismic database possesses a clustering structure. A non-supervised fuzzy clustering algorithm creates groups of seismic events. Given the sensitivity of fuzzy clustering algorithms to centroid initial positions, we proposed a methodology to initialize centroids that generates stable partitions with respect to centroid initialization. As a result of this work, a public software tool provides the user with the routines developed for clustering methodology. The analysis of the seismogenic zones obtained reveals meaningful spatial patterns in South-West Colombia. The clustering analysis provides a quantitative location and dispersion of seismogenic zones that facilitates seismological interpretations of seismic activities in South West Colombia.
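The sensitivity to initial centroids mentioned above is easier to see with the update rules written out. The sketch below is the standard fuzzy c-means iteration on 2-D points, with the initial centroids passed in explicitly; it does not reproduce the initialization methodology proposed in the paper, and the function name, the fuzzifier default `m=2.0` and the sample coordinates are assumptions.

```python
def fuzzy_c_means(points, centroids, m=2.0, iterations=50):
    """Standard fuzzy c-means on 2-D points, starting from the given centroids.

    Returns (centroids, memberships), where memberships[i][k] is the degree
    to which point i belongs to cluster k (each row sums to 1).
    """
    centroids = [list(c) for c in centroids]
    memberships = []
    power = 2.0 / (m - 1.0)
    for _ in range(iterations):
        # Membership update: inverse-distance weighting with fuzzifier m.
        memberships = []
        for p in points:
            dists = [((p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2) ** 0.5 for c in centroids]
            if 0.0 in dists:
                # Point coincides with a centroid: assign it fully to that cluster.
                row = [1.0 if d == 0.0 else 0.0 for d in dists]
                total = sum(row)
                row = [r / total for r in row]
            else:
                row = [1.0 / sum((dk / dj) ** power for dj in dists) for dk in dists]
            memberships.append(row)
        # Centroid update: mean of the points weighted by membership ** m.
        for k in range(len(centroids)):
            weights = [row[k] ** m for row in memberships]
            total = sum(weights)
            if total == 0:
                continue  # no point has any weight in this cluster; keep the old centroid
            centroids[k] = [
                sum(w * p[0] for w, p in zip(weights, points)) / total,
                sum(w * p[1] for w, p in zip(weights, points)) / total,
            ]
    return centroids, memberships


# Hypothetical epicentre coordinates forming two groups (illustrative values only).
events = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
final_centroids, memberships = fuzzy_c_means(events, [(0.0, 0.0), (10.0, 10.0)])
```

Because the starting centroids are an explicit argument, the same data can be re-run from different starting positions to check whether the resulting partition is stable.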
A Statistical Physics Perspective to Understand Social Visual Attention in Autism Spectrum Disorder.
Liberati, Alessio; Fadda, Roberta; Doneddu, Giuseppe; Congiu, Sara; Javarone, Marco A; Striano, Tricia; Chessa, Alessandro
2017-08-01
This study investigated social visual attention in children with Autism Spectrum Disorder (ASD) and with typical development (TD) in the light of Brockmann and Geisel's model of visual attention. The probability distribution of gaze movements and clustering of gaze points, registered with eye-tracking technology, was studied during a free visual exploration of a gaze stimulus. A data-driven analysis of the distribution of eye movements was chosen to overcome any possible methodological problems related to the subjective expectations of the experimenters about the informative contents of the image in addition to a computational model to simulate group differences. Analysis of the eye-tracking data indicated that the scanpaths of children with TD and ASD were characterized by eye movements geometrically equivalent to Lévy flights. Children with ASD showed a higher frequency of long saccadic amplitudes compared with controls. A clustering analysis revealed a greater dispersion of eye movements for these children. Modeling of the results indicated higher values of the model parameter modulating the dispersion of eye movements for children with ASD. Together, the experimental results and the model point to a greater dispersion of gaze points in ASD.
Barton, Brian; Brewer, Alyssa A.
2017-01-01
The cortical hierarchy of the human visual system has been shown to be organized around retinal spatial coordinates throughout much of low- and mid-level visual processing. These regions contain visual field maps (VFMs) that each follows the organization of the retina, with neighboring aspects of the visual field processed in neighboring cortical locations. On a larger, macrostructural scale, groups of such sensory cortical field maps (CFMs) in both the visual and auditory systems are organized into roughly circular cloverleaf clusters. CFMs within clusters tend to share properties such as receptive field distribution, cortical magnification, and processing specialization. Here we use fMRI and population receptive field (pRF) modeling to investigate the extent of VFM and cluster organization with an examination of higher-level visual processing in temporal cortex and compare these measurements to mid-level visual processing in dorsal occipital cortex. In human temporal cortex, the posterior superior temporal sulcus (pSTS) has been implicated in various neuroimaging studies as subserving higher-order vision, including face processing, biological motion perception, and multimodal audiovisual integration. In human dorsal occipital cortex, the transverse occipital sulcus (TOS) contains the V3A/B cluster, which comprises two VFMs subserving mid-level motion perception and visuospatial attention. For the first time, we present the organization of VFMs in pSTS in a cloverleaf cluster. This pSTS cluster contains four VFMs bilaterally: pSTS-1:4. We characterize these pSTS VFMs as relatively small at ∼125 mm2 with relatively large pRF sizes of ∼2–8° of visual angle across the central 10° of the visual field. V3A and V3B are ∼230 mm2 in surface area, with pRF sizes here similarly ∼1–8° of visual angle across the same region. 
In addition, cortical magnification measurements show that a larger extent of the pSTS VFM surface areas are devoted to the peripheral visual field than those in the V3A/B cluster. Reliability measurements of VFMs in pSTS and V3A/B reveal that these cloverleaf clusters are remarkably consistent and functionally differentiable. Our findings add to the growing number of measurements of widespread sensory CFMs organized into cloverleaf clusters, indicating that CFMs and cloverleaf clusters may both be fundamental organizing principles in cortical sensory processing. PMID:28293182
Patterns of victimization between and within peer clusters in a high school social network.
Swartz, Kristin; Reyns, Bradford W; Wilcox, Pamela; Dunham, Jessica R
2012-01-01
This study presents a descriptive analysis of patterns of violent victimization between and within the various cohesive clusters of peers comprising a sample of more than 500 9th-12th grade students from one high school. Social network analysis techniques provide a visualization of the overall friendship network structure and allow for the examination of variation in victimization across the various peer clusters within the larger network. Social relationships among clusters with varying levels of victimization are also illustrated so as to provide a sense of possible spatial clustering or diffusion of victimization across proximal peer clusters. Additionally, to provide a sense of the sorts of peer clusters that support (or do not support) victimization, characteristics of clusters at both the high and low ends of the victimization scale are discussed. Finally, several of the peer clusters at both the high and low ends of the victimization continuum are "unpacked", allowing examination of within-network individual-level differences in victimization for these select clusters.
Ranked centroid projection: a data visualization approach with self-organizing maps.
Yen, G G; Wu, Z
2008-02-01
The self-organizing map (SOM) is an efficient tool for visualizing high-dimensional data. In this paper, the clustering and visualization capabilities of the SOM, especially in the analysis of textual data, i.e., document collections, are reviewed and further developed. A novel clustering and visualization approach based on the SOM is proposed for the task of text mining. The proposed approach first transforms the document space into a multidimensional vector space by means of document encoding. Afterwards, a growing hierarchical SOM (GHSOM) is trained and used as a baseline structure to automatically produce maps with various levels of detail. Following the GHSOM training, the new projection method, namely the ranked centroid projection (RCP), is applied to project the input vectors to a hierarchy of 2-D output maps. The RCP is used as a data analysis tool as well as a direct interface to the data. In a set of simulations, the proposed approach is applied to an illustrative data set and two real-world scientific document collections to demonstrate its applicability.
NASA Astrophysics Data System (ADS)
Masuda, Nobuyuki; Sugie, Takashige; Ito, Tomoyoshi; Tanaka, Shinjiro; Hamada, Yu; Satake, Shin-ichi; Kunugi, Tomoaki; Sato, Kazuho
2010-12-01
We have designed a PC cluster system with special purpose computer boards for visualization of fluid flow using digital holographic particle tracking velocimetry (DHPTV). In this board, there is a Field Programmable Gate Array (FPGA) chip in which is installed a pipeline for calculating the intensity of an object from a hologram by fast Fourier transform (FFT). This cluster system can create 1024 reconstructed images from a 1024×1024-grid hologram in 0.77 s. It is expected that this system will contribute to the analysis of fluid flow using DHPTV.
Fernandez, Nicolas F.; Gundersen, Gregory W.; Rahman, Adeeb; Grimes, Mark L.; Rikova, Klarisa; Hornbeck, Peter; Ma’ayan, Avi
2017-01-01
Most tools developed to visualize hierarchically clustered heatmaps generate static images. Clustergrammer is a web-based visualization tool with interactive features such as: zooming, panning, filtering, reordering, sharing, performing enrichment analysis, and providing dynamic gene annotations. Clustergrammer can be used to generate shareable interactive visualizations by uploading a data table to a web-site, or by embedding Clustergrammer in Jupyter Notebooks. The Clustergrammer core libraries can also be used as a toolkit by developers to generate visualizations within their own applications. Clustergrammer is demonstrated using gene expression data from the cancer cell line encyclopedia (CCLE), original post-translational modification data collected from lung cancer cells lines by a mass spectrometry approach, and original cytometry by time of flight (CyTOF) single-cell proteomics data from blood. Clustergrammer enables producing interactive web based visualizations for the analysis of diverse biological data. PMID:28994825
Engels, Michael F M; Gibbs, Alan C; Jaeger, Edward P; Verbinnen, Danny; Lobanov, Victor S; Agrafiotis, Dimitris K
2006-01-01
We report on the structural comparison of the corporate collections of Johnson & Johnson Pharmaceutical Research & Development (JNJPRD) and 3-Dimensional Pharmaceuticals (3DP), performed in the context of the recent acquisition of 3DP by JNJPRD. The main objective of the study was to assess the druglikeness of the 3DP library and the extent to which it enriched the chemical diversity of the JNJPRD corporate collection. The two databases, at the time of acquisition, collectively contained more than 1.1 million compounds with a clearly defined structural description. The analysis was based on a clustering approach and aimed at providing an intuitive quantitative estimate and visual representation of this enrichment. A novel hierarchical clustering algorithm called divisive k-means was employed in combination with Kelley's cluster-level selection method to partition the combined data set into clusters, and the diversity contribution of each library was evaluated as a function of the relative occupancy of these clusters. Typical 3DP chemotypes enriching the diversity of the JNJPRD collection were catalogued and visualized using a modified maximum common substructure algorithm. The joint collection of JNJPRD and 3DP compounds was also compared to other databases of known medicinally active or druglike compounds. The potential of the methodology for the analysis of very large chemical databases is discussed.
Multi-class ERP-based BCI data analysis using a discriminant space self-organizing map.
Onishi, Akinari; Natsume, Kiyohisa
2014-01-01
Emotional or non-emotional image stimuli have recently been applied to event-related potential (ERP) based brain computer interfaces (BCI). Though the classification performance is over 80% in a single trial, the discrimination between those ERPs has not been considered. In this research we tried to clarify the discriminability of four-class ERP-based BCI target data elicited by desk, seal, spider images and letter intensifications. A conventional self-organizing map (SOM) and a newly proposed discriminant space SOM (ds-SOM) were applied, and then the discriminabilities were visualized. We also classified all pairs of those ERPs by stepwise linear discriminant analysis (SWLDA) and verified the visualization of discriminabilities. As a result, the ds-SOM showed an understandable visualization of the data with a shorter computational time than the traditional SOM. We also confirmed the clear boundary between the letter cluster and the other clusters. The result was coherent with the classification performances by SWLDA. The method might be helpful not only for developing a new BCI paradigm, but also for big data analysis.
Alam, Zaid; Peddinti, Gopal
2017-01-01
The advent of the polypharmacology paradigm in drug discovery calls for novel chemoinformatic tools for analyzing compounds’ multi-targeting activities. Such tools should provide an intuitive representation of the chemical space through capturing and visualizing underlying patterns of compound similarities linked to their polypharmacological effects. Most of the existing compound-centric chemoinformatics tools lack interactive options and user interfaces that are critical for the real-time needs of chemical biologists carrying out compound screening experiments. Toward that end, we introduce C-SPADE, an open-source exploratory web-tool for interactive analysis and visualization of drug profiling assays (biochemical, cell-based or cell-free) using compound-centric similarity clustering. C-SPADE allows the users to visually map the chemical diversity of a screening panel, explore investigational compounds in terms of their similarity to the screening panel, perform polypharmacological analyses and guide drug-target interaction predictions. C-SPADE requires only the raw drug profiling data as input, and it automatically retrieves the structural information and constructs the compound clusters in real-time, thereby reducing the time required for manual analysis in drug development or repurposing applications. The web-tool provides a customizable visual workspace that can either be downloaded as a figure or Newick tree file or shared as a hyperlink with other users. C-SPADE is freely available at http://cspade.fimm.fi/. PMID:28472495
An Analysis of Category Management of Service Contracts
2017-12-01
management teams a way to make informed, data-driven decisions. Data-driven decisions derived from clustering not only align with Category...savings. Furthermore, this methodology provides a data-driven visualization to inform sound business decisions on potential Category Management ...Category Management initiatives. The Maptitude software will allow future research to collect data and develop visualizations to inform Category
Beluga whale, Delphinapterus leucas, vocalizations from the Churchill River, Manitoba, Canada.
Chmelnitsky, Elly G; Ferguson, Steven H
2012-06-01
Classification of animal vocalizations is often done by a human observer using aural and visual analysis but more efficient, automated methods have also been utilized to reduce bias and increase reproducibility. Beluga whale, Delphinapterus leucas, calls were described from recordings collected in the summers of 2006-2008, in the Churchill River, Manitoba. Calls (n=706) were classified based on aural and visual analysis, and call characteristics were measured; calls were separated into 453 whistles (64.2%; 22 types), 183 pulsed/noisy calls (25.9%; 15 types), and 70 combined calls (9.9%; seven types). Measured parameters varied within each call type but less variation existed in pulsed and noisy call types and some combined call types than in whistles. A more efficient and repeatable hierarchical clustering method was applied to 200 randomly chosen whistles using six call characteristics as variables; twelve groups were identified. Call characteristics varied less in cluster analysis groups than in whistle types described by visual and aural analysis and results were similar to the whistle contours described. This study provided the first description of beluga calls in Hudson Bay and using two methods provides more robust interpretations and an assessment of appropriate methods for future studies.
A scoping review of spatial cluster analysis techniques for point-event data.
Fritz, Charles E; Schuurman, Nadine; Robertson, Colin; Lear, Scott
2013-05-01
Spatial cluster analysis is a uniquely interdisciplinary endeavour, and so it is important to communicate and disseminate ideas, innovations, best practices and challenges across practitioners, applied epidemiology researchers and spatial statisticians. In this research we conducted a scoping review to systematically search peer-reviewed journal databases for research that has employed spatial cluster analysis methods on individual-level, address location, or x and y coordinate derived data. To illustrate the thematic issues raised by our results, methods were tested using a dataset where known clusters existed. Point pattern methods, spatial clustering and cluster detection tests, and a locally weighted spatial regression model were most commonly used for individual-level, address location data (n = 29). The spatial scan statistic was the most popular method for address location data (n = 19). Six themes were identified relating to the application of spatial cluster analysis methods and subsequent analyses, which we recommend researchers to consider; exploratory analysis, visualization, spatial resolution, aetiology, scale and spatial weights. It is our intention that researchers seeking direction for using spatial cluster analysis methods, consider the caveats and strengths of each approach, but also explore the numerous other methods available for this type of analysis. Applied spatial epidemiology researchers and practitioners should give special consideration to applying multiple tests to a dataset. Future research should focus on developing frameworks for selecting appropriate methods and the corresponding spatial weighting schemes.
Computational gene expression profiling under salt stress reveals patterns of co-expression
Sanchita; Sharma, Ashok
2016-01-01
Plants respond differently to environmental conditions. Among various abiotic stresses, salt stress is a condition where excess salt in soil causes inhibition of plant growth. To understand the response of plants to the stress conditions, identification of the responsible genes is required. Clustering is a data mining technique used to group the genes with similar expression. The genes of a cluster show similar expression and function. We applied clustering algorithms on gene expression data of Solanum tuberosum showing differential expression in Capsicum annuum under salt stress. The clusters that were common to multiple algorithms were taken further for analysis. Principal component analysis (PCA) further validated the findings of other cluster algorithms by visualizing their clusters in three-dimensional space. Functional annotation results revealed that most of the genes were involved in stress-related responses. Our findings suggest that these algorithms may be helpful in the prediction of the function of co-expressed genes. PMID:26981411
A relational structure of voluntary visual-attention abilities
Skogsberg, KatieAnn; Grabowecky, Marcia; Wilt, Joshua; Revelle, William; Iordanescu, Lucica; Suzuki, Satoru
2015-01-01
Many studies have examined attention mechanisms involved in specific behavioral tasks (e.g., search, tracking, distractor inhibition). However, relatively little is known about the relationships among those attention mechanisms. Is there a fundamental attention faculty that makes a person superior or inferior at most types of attention tasks, or do relatively independent processes mediate different attention skills? We focused on individual differences in voluntary visual-attention abilities using a battery of eleven representative tasks. An application of parallel analysis, hierarchical-cluster analysis, and multidimensional scaling to the inter-task correlation matrix revealed four functional clusters, representing spatiotemporal attention, global attention, transient attention, and sustained attention, organized along two dimensions, one contrasting spatiotemporal and global attention and the other contrasting transient and sustained attention. Comparison with the neuroscience literature suggests that the spatiotemporal-global dimension corresponds to the dorsal frontoparietal circuit and the transient-sustained dimension corresponds to the ventral frontoparietal circuit, with distinct sub-regions mediating the separate clusters within each dimension. We also obtained highly specific patterns of gender difference, and of deficits for college students with elevated ADHD traits. These group differences suggest that different mechanisms of voluntary visual attention can be selectively strengthened or weakened based on genetic, experiential, and/or pathological factors. PMID:25867505
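The hierarchical-cluster step applied to an inter-task correlation matrix can be sketched as follows. This is only a minimal illustration, not the authors' procedure: the distance 1 - r, the average-linkage rule, and the four-task matrix are all assumptions introduced here, and the parallel analysis and multidimensional scaling steps are not covered.

```python
from itertools import combinations


def average_linkage(corr, n_clusters):
    """Agglomerative clustering of tasks from a correlation matrix.

    Distance between two tasks is taken as 1 - r (an assumption made for
    this sketch). Clusters are merged by average linkage until n_clusters
    remain.
    """
    clusters = [[i] for i in range(len(corr))]

    def dist(a, b):
        # Mean pairwise distance between the members of clusters a and b.
        return sum(1.0 - corr[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters and merge them.
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda p: dist(clusters[p[0]], clusters[p[1]]))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return sorted(sorted(c) for c in clusters)


# Hypothetical 4-task correlation matrix: tasks 0-1 and tasks 2-3 correlate.
corr = [
    [1.0, 0.8, 0.1, 0.2],
    [0.8, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.7],
    [0.2, 0.1, 0.7, 1.0],
]
groups = average_linkage(corr, 2)
print(groups)
```

With a real battery, the matrix would hold the observed correlations between task scores, and the number of clusters would be chosen from the data rather than fixed in advance.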
Scalable Visual Analytics of Massive Textual Datasets
DOE Office of Scientific and Technical Information (OSTI.GOV)
Krishnan, Manoj Kumar; Bohn, Shawn J.; Cowley, Wendy E.
2007-04-01
This paper describes the first scalable implementation of a text processing engine used in visual analytics tools. These tools aid information analysts in interacting with and understanding large textual information content through visual interfaces. By developing a parallel implementation of the text processing engine, we enabled visual analytics tools to exploit cluster architectures and handle massive datasets. The paper describes key elements of our parallelization approach and demonstrates virtually linear scaling when processing multi-gigabyte data sets such as PubMed. This approach enables interactive analysis of large datasets beyond the capabilities of existing state-of-the-art visual analytics tools.
Chang, Cheng; Xu, Kaikun; Guo, Chaoping; Wang, Jinxia; Yan, Qi; Zhang, Jian; He, Fuchu; Zhu, Yunping
2018-05-22
Compared with the numerous software tools developed for identification and quantification of -omics data, there remains a lack of suitable tools for both downstream analysis and data visualization. To help researchers better understand the biological meanings in their -omics data, we present an easy-to-use tool, named PANDA-view, for both statistical analysis and visualization of quantitative proteomics data and other -omics data. PANDA-view contains various kinds of analysis methods such as normalization, missing value imputation, statistical tests, clustering and principal component analysis, as well as the most commonly used data visualization methods including an interactive volcano plot. Additionally, it provides user-friendly interfaces for protein-peptide-spectrum representation of the quantitative proteomics data. PANDA-view is freely available at https://sourceforge.net/projects/panda-view/. Contact: 1987ccpacer@163.com and zhuyunping@gmail.com. Supplementary data are available at Bioinformatics online.
Alexander, Nathan; Woetzel, Nils; Meiler, Jens
2011-02-01
Clustering algorithms are used as data analysis tools in a wide variety of applications in biology. Clustering has become especially important in protein structure prediction and virtual high throughput screening methods. In protein structure prediction, clustering is used to structure the conformational space of thousands of protein models. In virtual high throughput screening, databases with millions of drug-like molecules are organized by structural similarity, e.g. common scaffolds. The tree-like dendrogram structure obtained from hierarchical clustering can provide a qualitative overview of the results, which is important for focusing detailed analysis. However, in practice it is difficult to relate specific components of the dendrogram directly back to the objects of which it is comprised and to display all desired information within the two dimensions of the dendrogram. The current work presents a hierarchical agglomerative clustering method termed bcl::Cluster. bcl::Cluster utilizes the PyMOL Molecular Graphics System to graphically depict dendrograms in three dimensions. This allows simultaneous display of relevant biological molecules as well as additional information about the clusters and the members comprising them.
Lin, Nan; Jiang, Junhai; Guo, Shicheng; Xiong, Momiao
2015-01-01
Due to advances in sensor technology, the growing volume of medical image data makes it possible to visualize anatomical changes in biological tissues. As a consequence, medical images have the potential to enhance the diagnosis of disease, the prediction of clinical outcomes and the characterization of disease progression. At the same time, the growing data dimensions pose great methodological and computational challenges for the representation and selection of features in image cluster analysis. To address these challenges, we first extend functional principal component analysis (FPCA) from one dimension to two dimensions to fully capture the spatial variation of the image signals. The image signals contain a large number of redundant features which provide no additional information for clustering analysis. The widely used methods for removing the irrelevant features are sparse clustering algorithms using a lasso-type penalty to select the features. However, the accuracy of clustering using a lasso-type penalty depends on the selection of the penalty parameters and the threshold value, which in practice are difficult to determine. Recently, randomized algorithms have received a great deal of attention in big data analysis. This paper presents a randomized algorithm for accurate feature selection in image clustering analysis. The proposed method is applied to both the liver and kidney cancer histology image data from the TCGA database. The results demonstrate that the randomized feature selection method coupled with functional principal component analysis substantially outperforms the current sparse clustering algorithms in image cluster analysis. PMID:26196383
A Multilevel Gamma-Clustering Layout Algorithm for Visualization of Biological Networks
Hruz, Tomas; Lucas, Christoph; Laule, Oliver; Zimmermann, Philip
2013-01-01
Visualization of large complex networks has become an indispensable part of systems biology, where organisms need to be considered as one complex system. The visualization of the corresponding network is challenging due to the size and density of edges. In many cases, the use of standard visualization algorithms can lead to high running times and poorly readable visualizations due to many edge crossings. We suggest an approach that analyzes the structure of the graph first and then generates a new graph which contains specific semantic symbols for regular substructures like dense clusters. We propose a multilevel gamma-clustering layout visualization algorithm (MLGA) which proceeds in three subsequent steps: (i) a multilevel γ-clustering is used to identify the structure of the underlying network, (ii) the network is transformed to a tree, and (iii) finally, the resulting tree which shows the network structure is drawn using a variation of a force-directed algorithm. The algorithm has a potential to visualize very large networks because it uses modern clustering heuristics which are optimized for large graphs. Moreover, most of the edges are removed from the visual representation which allows keeping the overview over complex graphs with dense subgraphs. PMID:23864855
Aura in Cluster Headache: A Cross-Sectional Study.
de Coo, Ilse F; Wilbrink, Leopoldine A; Ie, Gaby D; Haan, Joost; Ferrari, Michel D
2018-06-22
Aura symptoms have been reported in up to 23% of cluster headache patients, but it is not known whether clinical characteristics are different in participants with and without aura. Using validated web-based questionnaires we assessed the presence and characteristics of attack-related aura and other clinical features in 629 subjects available for analysis from an initial cohort of 756 cluster headache subjects. Participants who screened positive for aura were contacted by telephone for confirmation of the ICHD-III criteria for aura. Typical aura symptoms before or during cluster headache attacks were found in 44/629 participants (7.0%) mainly involving visual symptoms (61.4%). Except for lower alcohol consumption and higher prevalence of frontal pain in participants with aura, no differences in clinical characteristics were found compared with participants without aura. At least 7.0% of the participants with cluster headache in our large cohort reported typical aura symptoms, which most often involved visual symptoms. No major clinical differences were found between participants with and without aura. © 2018 The Authors. Headache: The Journal of Head and Face Pain published by Wiley Periodicals, Inc. on behalf of American Headache Society.
ERIC Educational Resources Information Center
Kaya, Deniz
2017-01-01
The purpose of the study is to perform a less-dimensional thorough visualization process for the purpose of determining the images of the students on the concept of angle. The Ward clustering analysis combined with Self-Organizing Neural Network Map (SOM) has been used for the dimension process. The Conceptual Understanding Tool, which consisted…
Comparative analysis on the selection of number of clusters in community detection
NASA Astrophysics Data System (ADS)
Kawamoto, Tatsuro; Kabashima, Yoshiyuki
2018-02-01
We conduct a comparative analysis on various estimates of the number of clusters in community detection. An exhaustive comparison requires testing of all possible combinations of frameworks, algorithms, and assessment criteria. In this paper we focus on the framework based on a stochastic block model, and investigate the performance of greedy algorithms, statistical inference, and spectral methods. For the assessment criteria, we consider modularity, map equation, Bethe free energy, prediction errors, and isolated eigenvalues. From the analysis, the tendency of overfit and underfit that the assessment criteria and algorithms have becomes apparent. In addition, we propose that the alluvial diagram is a suitable tool to visualize statistical inference results and can be useful to determine the number of clusters.
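Of the assessment criteria listed, modularity is the simplest to illustrate. The sketch below scores a few candidate partitions of a toy graph and keeps the number of clusters with the highest score. The graph, the candidate partitions, and the function name are hypothetical, and the code uses the standard Newman-Girvan definition rather than the stochastic-block-model framework the paper studies.

```python
def modularity(edges, labels):
    """Newman-Girvan modularity of a partition of an undirected graph.

    edges  : list of (u, v) pairs without self-loops or duplicates
    labels : dict mapping node -> community id
    """
    m = len(edges)
    internal = {}   # number of edges with both endpoints in community c
    degree = {}     # summed degree of the nodes in community c
    for u, v in edges:
        cu, cv = labels[u], labels[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree.items())


# Toy graph: two triangles joined by a single edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]

# Hypothetical candidate partitions, keyed by their number of clusters.
candidates = {
    1: {0: 0, 1: 0, 2: 0, 3: 0, 4: 0, 5: 0},
    2: {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1},
    3: {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 2},
}
scores = {k: modularity(edges, lab) for k, lab in candidates.items()}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

A single criterion such as this one can overfit or underfit, which is the tendency the comparison in the abstract is meant to expose.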
A framework for graph-based synthesis, analysis, and visualization of HPC cluster job data.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mayo, Jackson R.; Kegelmeyer, W. Philip, Jr.; Wong, Matthew H.
The monitoring and system analysis of high performance computing (HPC) clusters is of increasing importance to the HPC community. Analysis of HPC job data can be used to characterize system usage and diagnose and examine failure modes and their effects. This analysis is not straightforward, however, due to the complex relationships that exist between jobs. These relationships are based on a number of factors, including shared compute nodes between jobs, proximity of jobs in time, etc. Graph-based techniques represent an approach that is particularly well suited to this problem, and provide an effective technique for discovering important relationships in job queuing and execution data. The efficacy of these techniques is rooted in the use of a semantic graph as a knowledge representation tool. In a semantic graph, job data, represented in a combination of numerical and textual forms, can be flexibly processed into edges, with corresponding weights, expressing relationships between jobs, nodes, users, and other relevant entities. This graph-based representation permits formal manipulation by a number of analysis algorithms. This report presents a methodology and software implementation that leverages semantic graph-based techniques for the system-level monitoring and analysis of HPC clusters based on job queuing and execution data. Ontology development and graph synthesis are discussed with respect to the domain of HPC job data. The framework developed automates the synthesis of graphs from a database of job information. It also provides a front end, enabling visualization of the synthesized graphs. Additionally, an analysis engine is incorporated that provides performance analysis, graph-based clustering, and failure prediction capabilities for HPC systems.
Generic Space Science Visualization in 2D/3D using SDDAS
NASA Astrophysics Data System (ADS)
Mukherjee, J.; Murphy, Z. B.; Gonzalez, C. A.; Muller, M.; Ybarra, S.
2017-12-01
The Southwest Data Display and Analysis System (SDDAS) is a flexible multi-mission / multi-instrument software system intended to support space physics data analysis, and has been in active development for over 20 years. For the Magnetospheric Multi-Scale (MMS), Juno, Cluster, and Mars Express missions, we have modified these generic tools for visualizing data in two and three dimensions. The SDDAS software is open source and makes use of various other open source packages, including VTK and Qwt. The software offers interactive plotting as well as a Python and Lua module to modify the data before plotting. In theory, by writing a Lua or Python module to read the data, any data could be used. Currently, the software can natively read data in IDFS, CEF, CDF, FITS, SEG-Y, ASCII, and XLS formats. We have integrated the software with other Python packages such as SPICE and SpacePy. Included with the visualization software is a database application and other utilities for managing data that can retrieve data from the Cluster Active Archive and Space Physics Data Facility at Goddard, as well as other local archives. Line plots, spectrograms, geographic plots, volume plots, and strip charts are just some of the types of plots one can generate with SDDAS. Furthermore, due to the design, output is not limited strictly to visualization, as SDDAS can also be used to generate stand-alone IDL or Python visualization code. Lastly, SDDAS has been successfully used as a backend for several web-based analysis systems.
Bruno, Andrew E.; Ruby, Amanda M.; Luft, Joseph R.; Grant, Thomas D.; Seetharaman, Jayaraman; Montelione, Gaetano T.; Hunt, John F.; Snell, Edward H.
2014-01-01
Many bioscience fields employ high-throughput methods to screen multiple biochemical conditions. The analysis of these becomes tedious without a degree of automation. Crystallization, a rate limiting step in biological X-ray crystallography, is one of these fields. Screening of multiple potential crystallization conditions (cocktails) is the most effective method of probing a protein's phase diagram and guiding crystallization, but the interpretation of results can be time-consuming. To aid this empirical approach a cocktail distance coefficient was developed to quantitatively compare macromolecule crystallization conditions and outcome. These coefficients were evaluated against an existing similarity metric developed for crystallization, the C6 metric, using both virtual crystallization screens and by comparison of two related 1,536-cocktail high-throughput crystallization screens. Hierarchical clustering was employed to visualize one of these screens and the crystallization results from an exopolyphosphatase-related protein from Bacteroides fragilis (BfR192) overlaid on this clustering. This demonstrated a strong correlation between certain chemically related clusters and crystal lead conditions. While this analysis was not used to guide the initial crystallization optimization, it led to the re-evaluation of unexplained peaks in the electron density map of the protein and to the insertion and correct placement of sodium, potassium and phosphate atoms in the structure. With these in place, the resulting structure of the putative active site demonstrated features consistent with active sites of other phosphatases which are involved in binding the phosphoryl moieties of nucleotide triphosphates. The new distance coefficient, CDcoeff, appears to be robust in this application, and coupled with hierarchical clustering and the overlay of crystallization outcome, reveals information of biological relevance. While tested with a single example, the potential applications related to crystallography appear promising and the distance coefficient, clustering, and hierarchical visualization of results undoubtedly have applications in wider fields. PMID:24971458
Serial and semantic encoding of lists of words in schizophrenia patients with visual hallucinations.
Brébion, Gildas; Ohlsen, Ruth I; Pilowsky, Lyn S; David, Anthony S
2011-03-30
Previous research has suggested that visual hallucinations in schizophrenia are associated with abnormal salience of visual mental images. Since visual imagery is used as a mnemonic strategy to learn lists of words, increased visual imagery might impede the other commonly used strategies of serial and semantic encoding. We had previously published data on the serial and semantic strategies implemented by patients when learning lists of concrete words with different levels of semantic organisation (Brébion et al., 2004). In this paper we present a re-analysis of these data, aiming at investigating the associations between learning strategies and visual hallucinations. Results show that the patients with visual hallucinations presented less serial clustering in the non-organisable list than the other patients. In the semantically organisable list with typical instances, they presented both less serial and less semantic clustering than the other patients. Thus, patients with visual hallucinations demonstrate reduced use of serial and semantic encoding in the lists made up of fairly familiar concrete words, which enable the formation of mental images. Although these results are preliminary, we propose that this different processing of the lists stems from the abnormal salience of the mental images such patients experience from the word stimuli. Copyright © 2010 Elsevier Ireland Ltd. All rights reserved.
Clustervision: Visual Supervision of Unsupervised Clustering.
Kwon, Bum Chul; Eysenbach, Ben; Verma, Janu; Ng, Kenney; De Filippi, Christopher; Stewart, Walter F; Perer, Adam
2018-01-01
Clustering, the process of grouping together similar items into distinct partitions, is a common type of unsupervised machine learning that can be useful for summarizing and aggregating complex multi-dimensional data. However, data can be clustered in many ways, and there exists a large body of algorithms designed to reveal different patterns. While having access to a wide variety of algorithms is helpful, in practice, it is quite difficult for data scientists to choose and parameterize algorithms to get the clustering results relevant for their dataset and analytical tasks. To alleviate this problem, we built Clustervision, a visual analytics tool that helps ensure data scientists find the right clustering among the large number of techniques and parameters available. Our system clusters data using a variety of clustering techniques and parameters and then ranks clustering results utilizing five quality metrics. In addition, users can guide the system to produce more relevant results by providing task-relevant constraints on the data. Our visual user interface allows users to find high quality clustering results, explore the clusters using several coordinated visualization techniques, and select the cluster result that best suits their task. We demonstrate this novel approach using a case study with a team of researchers in the medical domain and showcase that our system empowers users to choose an effective representation of their complex data.
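The idea of ranking candidate clusterings by a quality metric can be sketched in a few lines. The abstract does not name the five metrics Clustervision uses, so the mean silhouette coefficient below is an assumption chosen for illustration, and the points and candidate labellings are made up.

```python
from math import dist


def silhouette(points, labels):
    """Mean silhouette coefficient of a labelling (higher is better).

    Assumes at least two distinct labels; singleton clusters score 0.
    """
    scores = []
    for i, p in enumerate(points):
        same = [dist(p, q) for j, q in enumerate(points)
                if j != i and labels[j] == labels[i]]
        if not same:
            scores.append(0.0)
            continue
        a = sum(same) / len(same)
        # Mean distance to every other cluster; keep the nearest one.
        b = min(
            sum(dist(p, q) for j, q in enumerate(points) if labels[j] == c)
            / labels.count(c)
            for c in set(labels) if c != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Hypothetical data: two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]

# Hypothetical candidate results, e.g. from different algorithms or settings.
candidates = {
    "two_groups": [0, 0, 0, 1, 1, 1],
    "three_groups": [0, 0, 1, 2, 2, 2],
    "mixed": [0, 1, 0, 1, 0, 1],
}
ranking = sorted(candidates,
                 key=lambda name: silhouette(points, candidates[name]),
                 reverse=True)
print(ranking)
```

A tool that combines several metrics would need some way to reconcile rankings that disagree; that step is outside this sketch.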
Attention Dysfunction Subtypes of Developmental Dyslexia
Lewandowska, Monika; Milner, Rafał; Ganc, Małgorzata; Włodarczyk, Elżbieta; Skarżyński, Henryk
2014-01-01
Background: Previous studies indicate that many different aspects of attention are impaired in children diagnosed with developmental dyslexia (DD). The objective of the present study was to identify cognitive profiles of DD on the basis of attentional test performance. Material/Methods: 78 children with DD (30 girls, 48 boys, mean age of 12 years ±8 months) and 32 age- and sex-matched non-dyslexic children (14 girls, 18 boys) were examined using a battery of standardized tests of reading, phonological and attentional processes (alertness, covert shift of attention, divided attention, inhibition, flexibility, vigilance, and visual search). Cluster analysis was used to identify subtypes of DD. Results: Dyslexic children showed deficits in alertness, covert shift of attention, divided attention, flexibility, and visual search. Three different subtypes of DD were identified, each characterized by poorer performance on the reading, phonological awareness, and visual search tasks. Additionally, children in cluster no. 1 displayed deficits in flexibility and divided attention. In contrast to non-dyslexic children, cluster no. 2 performed more poorly in tasks involving alertness, covert shift of attention, divided attention, and vigilance. Cluster no. 3 showed impaired covert shift of attention. Conclusions: These results indicate different patterns of attentional impairments in dyslexic children. Remediation programs should address the individual child’s deficit profile. PMID:25387479
Monitoring of changes in cluster structures in water under AC magnetic field
NASA Astrophysics Data System (ADS)
Usanov, A. D.; Ulyanov, S. S.; Ilyukhina, N. S.; Usanov, D. A.
2016-01-01
A fundamental possibility of visualizing cluster structures formed in distilled water by an optical method based on the analysis of dynamic speckle structures is demonstrated. It is shown for the first time that, in contrast to the existing concepts, water clusters can be rather large (up to 200 μm in size), and their lifetime is several tens of seconds. These clusters are found to have an internal spatially inhomogeneous structure, constantly changing in time. The properties of magnetized and non-magnetized water are found to differ significantly. In particular, the number of clusters formed in magnetized water is several times larger than that formed in the same volume of non-magnetized water.
Alerts Analysis and Visualization in Network-based Intrusion Detection Systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yang, Dr. Li
2010-08-01
The alerts produced by network-based intrusion detection systems, e.g. Snort, can be difficult for network administrators to efficiently review and respond to due to the enormous number of alerts generated in a short time frame. This work describes how the visualization of raw IDS alert data assists network administrators in understanding the current state of a network and quickens the process of reviewing and responding to intrusion attempts. The project presented in this work consists of three primary components. The first component provides a visual mapping of the network topology that allows the end-user to easily browse clustered alerts. The second component is based on the flocking behavior of birds such that birds tend to follow other birds with similar behaviors. This component allows the end-user to see the clustering process and provides an efficient means for reviewing alert data. The third component discovers and visualizes patterns of multistage attacks by profiling the attacker's behaviors.
NASA Astrophysics Data System (ADS)
Hamprecht, Fred A.; Peter, Christine; Daura, Xavier; Thiel, Walter; van Gunsteren, Wilfred F.
2001-02-01
We propose an approach for summarizing the output of long simulations of complex systems, affording a rapid overview and interpretation. First, multidimensional scaling techniques are used in conjunction with dimension reduction methods to obtain a low-dimensional representation of the configuration space explored by the system. A nonparametric estimate of the density of states in this subspace is then obtained using kernel methods. The free energy surface is calculated from that density, and the configurations produced in the simulation are then clustered according to the topography of that surface, such that all configurations belonging to one local free energy minimum form one class. This topographical cluster analysis is performed using basin spanning trees which we introduce as subgraphs of Delaunay triangulations. Free energy surfaces obtained in dimensions lower than four can be visualized directly using iso-contours and -surfaces. Basin spanning trees also afford a glimpse of higher-dimensional topographies. The procedure is illustrated using molecular dynamics simulations on the reversible folding of peptide analoga. Finally, we emphasize the intimate relation of density estimation techniques to modern enhanced sampling algorithms.
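The step from an estimated density to a free energy surface, and from that surface to clusters, can be sketched in one dimension. The free energy is taken as F(x) = -kT ln ρ(x), defined up to an additive constant. Everything else here is a simplification introduced for illustration: a Gaussian kernel, a fixed bandwidth, a regular grid, and a plain downhill walk in place of the basin spanning trees on Delaunay triangulations that the abstract describes.

```python
from math import exp, log, pi, sqrt


def kde(samples, x, bandwidth):
    """Gaussian kernel density estimate at x."""
    norm = len(samples) * bandwidth * sqrt(2 * pi)
    return sum(exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples) / norm


def free_energy(samples, grid, bandwidth, kT=1.0):
    """F(x) = -kT ln rho(x), evaluated on a grid of coordinate values."""
    return [-kT * log(kde(samples, x, bandwidth)) for x in grid]


def basin_of(index, energy):
    """Walk downhill on the grid until a local minimum is reached."""
    while True:
        neighbours = [k for k in (index - 1, index + 1)
                      if 0 <= k < len(energy)]
        lowest = min(neighbours, key=lambda k: energy[k])
        if energy[lowest] >= energy[index]:
            return index
        index = lowest


# Hypothetical 1-D configurations sampled around two states.
samples = [-0.2, 0.0, 0.1, 0.2, 4.8, 5.0, 5.1]
grid = [i * 0.1 - 1.0 for i in range(71)]          # covers -1.0 .. 6.0
energy = free_energy(samples, grid, bandwidth=0.5)

# Assign each configuration to the minimum reached from its nearest grid point.
basins = []
for s in samples:
    start = min(range(len(grid)), key=lambda k: abs(grid[k] - s))
    basins.append(basin_of(start, energy))
print(basins)
```

All configurations that end in the same minimum form one class, which mirrors the clustering criterion stated in the abstract.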
Reduced response cluster size in early visual areas explains the acuity deficit in amblyopia.
Huang, Yufeng; Feng, Lixia; Zhou, Yifeng
2017-05-03
Focal visual stimulation typically results in the activation of a large portion of the early visual cortex. This spread of activity is attributed to long-range lateral interactions. Such long-range interactions may serve to stabilize a visual representation or to simply modulate incoming signals, and any associated dysfunction in long-range activation may reduce sensitivity to visual information in conditions such as amblyopia. We sought to measure the dispersion of cortical activity following local visual stimulation in a group of patients with amblyopia and matched normal controls. Twenty adult anisometropic amblyopes and 10 normal controls participated in this study. Using multifocal stimulation, we simultaneously measured cluster sizes in response to multiple stimulation points in the visual field. We found that the functional MRI (fMRI) response cluster size corresponding to the fellow eye was significantly larger than that corresponding to the amblyopic eye, and that the fMRI response cluster size at the two more central retinotopic locations correlated with the amblyopic acuity deficit. Our results suggest that the amblyopic visual cortex has diminished long-range communication, as evidenced by significantly smaller clusters of activity as measured with fMRI. These results have important implications for models of amblyopia and approaches to treatment.
A new metaphor for projection-based visual analysis and data exploration
NASA Astrophysics Data System (ADS)
Schreck, Tobias; Panse, Christian
2007-01-01
In many important application domains such as Business and Finance, Process Monitoring, and Security, huge and quickly increasing volumes of complex data are collected. Strong efforts are underway developing automatic and interactive analysis tools for mining useful information from these data repositories. Many data analysis algorithms require an appropriate definition of similarity (or distance) between data instances to allow meaningful clustering, classification, and retrieval, among other analysis tasks. Projection-based data visualization is highly interesting (a) for visual discrimination analysis of a data set within a given similarity definition, and (b) for comparative analysis of similarity characteristics of a given data set represented by different similarity definitions. We introduce an intuitive and effective novel approach for projection-based similarity visualization for interactive discrimination analysis, data exploration, and visual evaluation of metric space effectiveness. The approach is based on the convex hull metaphor for visually aggregating sets of points in projected space, and it can be used with a variety of different projection techniques. The effectiveness of the approach is demonstrated by application on two well-known data sets. Statistical evidence supporting the validity of the hull metaphor is presented. We advocate the hull-based approach over the standard symbol-based approach to projection visualization, as it allows a more effective perception of similarity relationships and class distribution characteristics.
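The hull metaphor amounts to replacing each class of projected points with the polygon that encloses it. A minimal sketch follows, using Andrew's monotone chain algorithm for the hull; the choice of algorithm, the class names, and the 2-D coordinates are assumptions for illustration, since the abstract does not specify how the hulls are computed.

```python
def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); positive for a left turn.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    def half(seq):
        chain = []
        for p in seq:
            # Drop points that would make a right turn or lie on a line.
            while len(chain) >= 2 and cross(chain[-2], chain[-1], p) <= 0:
                chain.pop()
            chain.append(p)
        return chain[:-1]       # last point is the start of the other half

    return half(pts) + half(reversed(pts))


# Hypothetical projected coordinates, grouped by class label.
projected = {
    "class_a": [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)],
    "class_b": [(5, 5), (6, 5), (5.5, 6)],
}
hulls = {name: convex_hull(pts) for name, pts in projected.items()}
print(hulls["class_a"])
```

Drawing one polygon per class instead of one symbol per point makes overlap between classes visible as overlap between polygons.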
Knowledge Management for Command and Control
2004-06-01
interfaces relies on rich visual and conceptual understanding of what is sketched, rather than the pattern-recognition technologies that most systems use...recognizers) required by other approaches. • The underlying conceptual representations that nuSketch uses enable it to serve as a front end to knowledge...constructing enemy-intent hypotheses via mixed visual and conceptual analogies. II.C. Multi-ViewPoint Clustering Analysis (MVP-CA) technology To
Zhang, Jiang; Liu, Qi; Chen, Huafu; Yuan, Zhen; Huang, Jin; Deng, Lihua; Lu, Fengmei; Zhang, Junpeng; Wang, Yuqing; Wang, Mingwen; Chen, Liangyin
2015-01-01
Clustering analysis methods have been widely applied to identifying the functional brain networks of a multitask paradigm. However, the previously used clustering analysis techniques are computationally expensive and thus impractical for clinical applications. In this study a novel method, called SOM-SAPC, that combines self-organizing mapping (SOM) and supervised affinity propagation clustering (SAPC) is proposed and implemented to identify the motor execution (ME) and motor imagery (MI) networks. In SOM-SAPC, SOM is first applied to the fMRI data and SAPC is then used to cluster the patterns of functional networks. As a result, SOM-SAPC is able to significantly reduce the computational cost of brain network analysis. Simulation and clinical tests involving ME and MI were conducted based on SOM-SAPC, and the analysis results indicated that functional brain networks were clearly identified with different response patterns and reduced computational cost. In particular, three activation clusters were clearly revealed, which include parts of the visual, ME and MI functional networks. These findings validated that SOM-SAPC is an effective and robust method for analyzing multitask fMRI data.
Model-based clustering for RNA-seq data.
Si, Yaqing; Liu, Peng; Li, Pinghua; Brutnell, Thomas P
2014-01-15
RNA-seq technology has been widely adopted as an attractive alternative to microarray-based methods to study global gene expression. However, robust statistical tools to analyze these complex datasets are still lacking. By grouping genes with similar expression profiles across treatments, cluster analysis provides insight into gene functions and networks, and hence is an important technique for RNA-seq data analysis. In this manuscript, we derive clustering algorithms based on appropriate probability models for RNA-seq data. An expectation-maximization algorithm and another two stochastic versions of expectation-maximization algorithms are described. In addition, a strategy for initialization based on likelihood is proposed to improve the clustering algorithms. Moreover, we present a model-based hybrid-hierarchical clustering method to generate a tree structure that allows visualization of relationships among clusters as well as flexibility of choosing the number of clusters. Results from both simulation studies and analysis of a maize RNA-seq dataset show that our proposed methods provide better clustering results than alternative methods such as the K-means algorithm and hierarchical clustering methods that are not based on probability models. An R package, MBCluster.Seq, has been developed to implement our proposed algorithms. This R package provides fast computation and is publicly available at http://www.r-project.org
Comparative Microbial Modules Resource: Generation and Visualization of Multi-species Biclusters
Bate, Ashley; Eichenberger, Patrick; Bonneau, Richard
2011-01-01
The increasing abundance of large-scale, high-throughput datasets for many closely related organisms provides opportunities for comparative analysis via the simultaneous biclustering of datasets from multiple species. These analyses require a reformulation of how to organize multi-species datasets and visualize comparative genomics data analyses results. Recently, we developed a method, multi-species cMonkey, which integrates heterogeneous high-throughput datatypes from multiple species to identify conserved regulatory modules. Here we present an integrated data visualization system, built upon the Gaggle, enabling exploration of our method's results (available at http://meatwad.bio.nyu.edu/cmmr.html). The system can also be used to explore other comparative genomics datasets and outputs from other data analysis procedures – results from other multiple-species clustering programs or from independent clustering of different single-species datasets. We provide an example use of our system for two bacteria, Escherichia coli and Salmonella Typhimurium. We illustrate the use of our system by exploring conserved biclusters involved in nitrogen metabolism, uncovering a putative function for yjjI, a currently uncharacterized gene that we predict to be involved in nitrogen assimilation. PMID:22144874
Comparative microbial modules resource: generation and visualization of multi-species biclusters.
Kacmarczyk, Thadeous; Waltman, Peter; Bate, Ashley; Eichenberger, Patrick; Bonneau, Richard
2011-12-01
The increasing abundance of large-scale, high-throughput datasets for many closely related organisms provides opportunities for comparative analysis via the simultaneous biclustering of datasets from multiple species. These analyses require a reformulation of how to organize multi-species datasets and visualize comparative genomics data analyses results. Recently, we developed a method, multi-species cMonkey, which integrates heterogeneous high-throughput datatypes from multiple species to identify conserved regulatory modules. Here we present an integrated data visualization system, built upon the Gaggle, enabling exploration of our method's results (available at http://meatwad.bio.nyu.edu/cmmr.html). The system can also be used to explore other comparative genomics datasets and outputs from other data analysis procedures - results from other multiple-species clustering programs or from independent clustering of different single-species datasets. We provide an example use of our system for two bacteria, Escherichia coli and Salmonella Typhimurium. We illustrate the use of our system by exploring conserved biclusters involved in nitrogen metabolism, uncovering a putative function for yjjI, a currently uncharacterized gene that we predict to be involved in nitrogen assimilation. © 2011 Kacmarczyk et al.
A scheme for racquet sports video analysis with the combination of audio-visual information
NASA Astrophysics Data System (ADS)
Xing, Liyuan; Ye, Qixiang; Zhang, Weigang; Huang, Qingming; Yu, Hua
2005-07-01
As a very important category in sports video, racquet sports video, e.g. table tennis, tennis and badminton, has been paid little attention in the past years. Considering the characteristics of this kind of sports video, we propose a new scheme for structure indexing and highlight generating based on the combination of audio and visual information. Firstly, a supervised classification method is employed to detect important audio symbols including impact (ball hit), audience cheers, commentator speech, etc. Meanwhile an unsupervised algorithm is proposed to group video shots into various clusters. Then, by taking advantage of temporal relationship between audio and visual signals, we can specify the scene clusters with semantic labels including rally scenes and break scenes. Thirdly, a refinement procedure is developed to reduce false rally scenes by further audio analysis. Finally, an exciting model is proposed to rank the detected rally scenes from which many exciting video clips such as game (match) points can be correctly retrieved. Experiments on two types of representative racquet sports video, table tennis video and tennis video, demonstrate encouraging results.
Visualizing the Structure of Medical Informatics Using Term Co-Occurrence Analysis.
ERIC Educational Resources Information Center
Morris, Theodore Allan
2000-01-01
Examines the structure of medical informatics and the relationship between biomedicine and information science and information technology. Uses co-occurrence analysis of subject headings assigned to items indexed for MEDLINE as well as multidimensional scaling to show seven to eight broad multidisciplinary subject clusters. (Contains 28…
Advanced Cyber Attack Modeling Analysis and Visualization
2010-03-01
Graph Analysis Network Web Logs Netflow Data TCP Dump Data System Logs Detect Protect Security Management What-If Figure 8. TVA attack graphs for...Clustered Graphs,” in Proceedings of the Symposium on Graph Drawing, September 1996. [25] K. Lakkaraju, W. Yurcik, A. Lee, “NVisionIP: NetFlow
Ye, Weimin; Robbins, R. T.
2004-01-01
Hierarchical cluster analysis based on female morphometric character means, including body length, distance from vulva opening to anterior end, head width, odontostyle length, esophagus length, body width, tail length, and tail width, was used to examine the morphometric relationships and create dendrograms for (i) 62 populations belonging to 9 Longidorus species from Arkansas, (ii) 137 published Longidorus species, and (iii) 137 published Longidorus species plus 86 populations of 16 Longidorus species from Arkansas and various other locations by using JMP 4.02 software (SAS Institute, Cary, NC). Cluster analysis dendrograms visually illustrated the grouping and morphometric relationships of the species and populations. The analysis provided a computerized statistical approach that helps to identify and distinguish species, indicates morphometric relationships among species, and assists with new species diagnosis. Preliminary species identification can be accomplished by running cluster analysis for unknown species together with the data matrix of known published Longidorus species. PMID:19262809
Hadjithomas, Michalis; Chen, I-Min A.; Chu, Ken; ...
2016-11-29
Secondary metabolites produced by microbes have diverse biological functions, which makes them a great potential source of biotechnologically relevant compounds with antimicrobial, anti-cancer and other activities. The proteins needed to synthesize these natural products are often encoded by clusters of co-located genes called biosynthetic gene clusters (BCs). In order to advance the exploration of microbial secondary metabolism, we developed the largest publicly available database of experimentally verified and predicted BCs, the Integrated Microbial Genomes Atlas of Biosynthetic gene Clusters (IMG-ABC) (https://img.jgi.doe.gov/abc/). Here, we describe an update of IMG-ABC, which includes ClusterScout, a tool for targeted identification of custom biosynthetic gene clusters across 40 000 isolate microbial genomes, and a new search capability to query more than 700 000 BCs from isolate genomes for clusters with similar Pfam composition. Additional features enable fast exploration and analysis of BCs through two new interactive visualization features, a BC function heatmap and a BC similarity network graph. These new tools and features add to the value of IMG-ABC's vast body of BC data, facilitating their in-depth analysis and accelerating secondary metabolite discovery.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hadjithomas, Michalis; Chen, I-Min A.; Chu, Ken
Secondary metabolites produced by microbes have diverse biological functions, which makes them a great potential source of biotechnologically relevant compounds with antimicrobial, anti-cancer and other activities. The proteins needed to synthesize these natural products are often encoded by clusters of co-located genes called biosynthetic gene clusters (BCs). In order to advance the exploration of microbial secondary metabolism, we developed the largest publicly available database of experimentally verified and predicted BCs, the Integrated Microbial Genomes Atlas of Biosynthetic gene Clusters (IMG-ABC) (https://img.jgi.doe.gov/abc/). Here, we describe an update of IMG-ABC, which includes ClusterScout, a tool for targeted identification of custom biosynthetic gene clusters across 40 000 isolate microbial genomes, and a new search capability to query more than 700 000 BCs from isolate genomes for clusters with similar Pfam composition. Additional features enable fast exploration and analysis of BCs through two new interactive visualization features, a BC function heatmap and a BC similarity network graph. These new tools and features add to the value of IMG-ABC's vast body of BC data, facilitating their in-depth analysis and accelerating secondary metabolite discovery.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Krause, Josua; Dasgupta, Aritra; Fekete, Jean-Daniel
Dealing with the curse of dimensionality is a key challenge in high-dimensional data visualization. We present SeekAView to address three main gaps in the existing research literature. First, automated methods like dimensionality reduction or clustering suffer from a lack of transparency in letting analysts interact with their outputs in real-time to suit their exploration strategies. The results often suffer from a lack of interpretability, especially for domain experts not trained in statistics and machine learning. Second, exploratory visualization techniques like scatter plots or parallel coordinates suffer from a lack of visual scalability: it is difficult to present a coherent overview of interesting combinations of dimensions. Third, the existing techniques do not provide a flexible workflow that allows for multiple perspectives into the analysis process by automatically detecting and suggesting potentially interesting subspaces. In SeekAView we address these issues using suggestion based visual exploration of interesting patterns for building and refining multidimensional subspaces. Compared to the state-of-the-art in subspace search and visualization methods, we achieve higher transparency in showing not only the results of the algorithms, but also interesting dimensions calibrated against different metrics. We integrate a visually scalable design space with an iterative workflow guiding the analysts by choosing the starting points and letting them slice and dice through the data to find interesting subspaces and detect correlations, clusters, and outliers. We present two usage scenarios for demonstrating how SeekAView can be applied in real-world data analysis scenarios.
Comparative analysis and visualization of multiple collinear genomes
2012-01-01
Background Genome browsers are a common tool used by biologists to visualize genomic features including genes, polymorphisms, and many others. However, existing genome browsers and visualization tools are not well-suited to perform meaningful comparative analysis among a large number of genomes. With the increasing quantity and availability of genomic data, there is an increased burden to provide useful visualization and analysis tools for comparison of multiple collinear genomes such as the large panels of model organisms which are the basis for much of the current genetic research. Results We have developed a novel web-based tool for visualizing and analyzing multiple collinear genomes. Our tool illustrates genome-sequence similarity through a mosaic of intervals representing local phylogeny, subspecific origin, and haplotype identity. Comparative analysis is facilitated through reordering and clustering of tracks, which can vary throughout the genome. In addition, we provide local phylogenetic trees as an alternate visualization to assess local variations. Conclusions Unlike previous genome browsers and viewers, ours allows for simultaneous and comparative analysis. Our browser provides intuitive selection and interactive navigation about features of interest. Dynamic visualizations adjust to scale and data content making analysis at variable resolutions and of multiple data sets more informative. We demonstrate our genome browser for an extensive set of genomic data sets composed of almost 200 distinct mouse laboratory strains. PMID:22536897
Zhu, Bin; Liu, Jinlin; Fu, Yang; Zhang, Bo; Mao, Ying
2018-04-02
Viral hepatitis, as one of the most serious notifiable infectious diseases in China, takes a heavy toll on the infected and causes a severe economic burden to society, yet few studies have systematically explored the spatio-temporal epidemiology of viral hepatitis in China. This study aims to explore, visualize and compare the epidemiologic trends and spatial changing patterns of different types of viral hepatitis (A, B, C, E and unspecified, based on the classification of CDC) at the provincial level in China. The growth rates of incidence are used and converted to box plots to visualize the epidemiologic trends, with the linear trend being tested by the chi-square linear-by-linear association test. Two complementary spatial cluster methods are used to explore the overall agglomeration level and identify spatial clusters: spatial autocorrelation analysis (measured by global and local Moran's I) and space-time scan analysis. Based on the spatial autocorrelation analysis, the hotspots of hepatitis A remained relatively stable and gradually shrank, with Yunnan and Sichuan successively moving out of the high-high (HH) cluster area. The HH clustering feature of hepatitis B in China gradually disappeared with time. However, the HH cluster area of hepatitis C has gradually moved towards the west, while for hepatitis E, the provincial units around the Yangtze River Delta region have been revealing HH cluster features since 2005. The space-time scan analysis also indicates the distinct spatial changing patterns of different types of viral hepatitis in China. We conclude that there is no one-size-fits-all plan for the prevention and control of viral hepatitis in all the provincial units. An effective response requires a package of coordinated actions, which should vary across localities regarding the spatial-temporal epidemic dynamics of each type of virus and the specific conditions of each provincial unit.
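For readers unfamiliar with the global Moran's I statistic mentioned in this abstract, the following minimal sketch computes it from its textbook definition, I = (n / W) * sum_ij w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2, where W is the sum of all spatial weights. The four-region chain adjacency and the incidence values are hypothetical illustration data, not figures from the study, and the study's actual weighting scheme is not stated in the abstract.

```python
def morans_i(values, weights):
    """Global Moran's I for values x_i and a spatial weight matrix w_ij."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_sum = sum(sum(row) for row in weights)
    # Cross-products of deviations, weighted by spatial adjacency
    num = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / w_sum) * (num / den)


# Hypothetical: four regions in a row, each adjacent to its immediate neighbours
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
clustered = [10, 10, 0, 0]    # similar values sit next to each other
alternating = [10, 0, 10, 0]  # neighbours always differ

i_clustered = morans_i(clustered, chain)
i_alternating = morans_i(alternating, chain)
```

With these inputs the clustered pattern yields a positive value (1/3) and the alternating pattern yields -1, which matches the usual reading of the statistic: positive values indicate that similar incidence levels agglomerate in space, negative values that neighbours tend to differ.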
Key-Node-Separated Graph Clustering and Layouts for Human Relationship Graph Visualization.
Itoh, Takayuki; Klein, Karsten
2015-01-01
Many graph-drawing methods apply node-clustering techniques based on the density of edges to find tightly connected subgraphs and then hierarchically visualize the clustered graphs. However, users may want to focus on important nodes and their connections to groups of other nodes for some applications. For this purpose, it is effective to separately visualize the key nodes detected based on adjacency and attributes of the nodes. This article presents a graph visualization technique for attribute-embedded graphs that applies a graph-clustering algorithm that accounts for the combination of connections and attributes. The graph clustering step divides the nodes according to the commonality of connected nodes and similarity of feature value vectors. It then calculates the distances between arbitrary pairs of clusters according to the number of connecting edges and the similarity of feature value vectors and finally places the clusters based on the distances. Consequently, the technique separates important nodes that have connections to multiple large clusters and improves the visibility of such nodes' connections. To test this technique, this article presents examples with human relationship graph datasets, including a coauthorship and Twitter communication network dataset.
Hebbian self-organizing integrate-and-fire networks for data clustering.
Landis, Florian; Ott, Thomas; Stoop, Ruedi
2010-01-01
We propose a Hebbian learning-based data clustering algorithm using spiking neurons. The algorithm is capable of distinguishing between clusters and noisy background data and finds an arbitrary number of clusters of arbitrary shape. These properties render the approach particularly useful for visual scene segmentation into arbitrarily shaped homogeneous regions. We present several application examples, and in order to highlight the advantages and the weaknesses of our method, we systematically compare the results with those from standard methods such as the k-means and Ward's linkage clustering. The analysis demonstrates that not only the clustering ability of the proposed algorithm is more powerful than those of the two concurrent methods, the time complexity of the method is also more modest than that of its generally used strongest competitor.
MONGKIE: an integrated tool for network analysis and visualization for multi-omics data.
Jang, Yeongjun; Yu, Namhee; Seo, Jihae; Kim, Sun; Lee, Sanghyuk
2016-03-18
Network-based integrative analysis is a powerful technique for extracting biological insights from multilayered omics data such as somatic mutations, copy number variations, and gene expression data. However, integrated analysis of multi-omics data is quite complicated and can hardly be done in an automated way. Thus, a powerful interactive visual mining tool supporting diverse analysis algorithms for identification of driver genes and regulatory modules is much needed. Here, we present a software platform that integrates network visualization with omics data analysis tools seamlessly. The visualization unit supports various options for displaying multi-omics data as well as unique network models for describing sophisticated biological networks such as complex biomolecular reactions. In addition, we implemented diverse in-house algorithms for network analysis including network clustering and over-representation analysis. Novel functions include facile definition and optimized visualization of subgroups, comparison of a series of data sets in an identical network by data-to-visual mapping and subsequent overlaying function, and management of custom interaction networks. Utility of MONGKIE for network-based visual data mining of multi-omics data was demonstrated by analysis of the TCGA glioblastoma data. MONGKIE was developed in Java based on the NetBeans plugin architecture, thus being OS-independent with intrinsic support of module extension by third-party developers. We believe that MONGKIE would be a valuable addition to network analysis software by supporting many unique features and visualization options, especially for analysing multi-omics data sets in cancer and other diseases.
Goldbaum, Michael H; Jang, Gil-Jin; Bowd, Chris; Hao, Jiucang; Zangwill, Linda M; Liebmann, Jeffrey; Girkin, Christopher; Jung, Tzyy-Ping; Weinreb, Robert N; Sample, Pamela A
2009-12-01
To determine if the patterns uncovered with variational Bayesian-independent component analysis-mixture model (VIM) applied to a large set of normal and glaucomatous fields obtained with the Swedish Interactive Thresholding Algorithm (SITA) are distinct, recognizable, and useful for modeling the severity of the field loss. SITA fields were obtained with the Humphrey Visual Field Analyzer (Carl Zeiss Meditec, Inc, Dublin, California) on 1,146 normal eyes and 939 glaucoma eyes from subjects followed by the Diagnostic Innovations in Glaucoma Study and the African Descent and Glaucoma Evaluation Study. VIM modifies independent component analysis (ICA) to develop separate sets of ICA axes in the cluster of normal fields and the 2 clusters of abnormal fields. Of 360 models, the model with the best separation of normal and glaucomatous fields was chosen for creating the maximally independent axes. Grayscale displays of fields generated by VIM on each axis were compared. SITA fields most closely associated with each axis and displayed in grayscale were evaluated for consistency of pattern at all severities. The best VIM model had 3 clusters. Cluster 1 (1,193) was mostly normal (1,089, 95% specificity) and had 2 axes. Cluster 2 (596) contained mildly abnormal fields (513) and 2 axes; cluster 3 (323) held mostly moderately to severely abnormal fields (322) and 5 axes. Sensitivity for clusters 2 and 3 combined was 88.9%. The VIM-generated field patterns differed from each other and resembled glaucomatous defects (eg, nasal step, arcuate, temporal wedge). SITA fields assigned to an axis resembled each other and the VIM-generated patterns for that axis. Pattern severity increased in the positive direction of each axis by expansion or deepening of the axis pattern. VIM worked well on SITA fields, separating them into distinctly different yet recognizable patterns of glaucomatous field defects. 
The axis and pattern properties make VIM a good candidate as a preliminary process for detecting progression.
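The specificity and sensitivity figures quoted in this abstract can be reproduced from the counts it reports. The short check below assumes that the parenthesized counts for clusters 2 and 3 (513 and 322) are glaucoma eyes and that the 1,089 fields in cluster 1 are normal eyes. That reading is an interpretation of the abstract, not something it states explicitly, and the variable names are illustrative.

```python
# Counts taken from the abstract
normal_eyes = 1146
glaucoma_eyes = 939
normal_in_cluster1 = 1089    # cluster 1, "mostly normal"
abnormal_in_cluster2 = 513   # assumed to be glaucoma eyes
abnormal_in_cluster3 = 322   # assumed to be glaucoma eyes

# Specificity: share of normal eyes assigned to the normal cluster
specificity = normal_in_cluster1 / normal_eyes

# Sensitivity: share of glaucoma eyes assigned to either abnormal cluster
sensitivity = (abnormal_in_cluster2 + abnormal_in_cluster3) / glaucoma_eyes

print(f"specificity = {specificity:.1%}")  # 95.0%
print(f"sensitivity = {sensitivity:.1%}")  # 88.9%
```

Under these assumptions both values agree with the 95% specificity and 88.9% sensitivity given in the abstract.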
Wen, Haiguang; Shi, Junxing; Chen, Wei; Liu, Zhongming
2018-02-28
The brain represents visual objects with topographic cortical patterns. To address how distributed visual representations enable object categorization, we established predictive encoding models based on a deep residual network, and trained them to predict cortical responses to natural movies. Using this predictive model, we mapped human cortical representations to 64,000 visual objects from 80 categories with high throughput and accuracy. Such representations covered both the ventral and dorsal pathways, reflected multiple levels of object features, and preserved semantic relationships between categories. In the entire visual cortex, object representations were organized into three clusters of categories: biological objects, non-biological objects, and background scenes. In a finer scale specific to each cluster, object representations revealed sub-clusters for further categorization. Such hierarchical clustering of category representations was mostly contributed by cortical representations of object features from middle to high levels. In summary, this study demonstrates a useful computational strategy to characterize the cortical organization and representations of visual features for rapid categorization.
CRAVAT is an easy to use web-based tool for analysis of cancer variants (missense, nonsense, in-frame indel, frameshift indel, splice site). CRAVAT provides scores and a variety of annotations that assist in identification of important variants. Results are provided in an interactive, highly graphical webpage and include annotated 3D structure visualization. CRAVAT is also available for local or cloud-based installation as a Docker container. MuPIT provides 3D visualization of mutation clusters and functional annotation and is now integrated with CRAVAT.
imDEV: a graphical user interface to R multivariate analysis tools in Microsoft Excel.
Grapov, Dmitry; Newman, John W
2012-09-01
Interactive modules for Data Exploration and Visualization (imDEV) is a Microsoft Excel spreadsheet embedded application providing an integrated environment for the analysis of omics data through a user-friendly interface. Individual modules enable interactive and dynamic analyses of large data by interfacing R's multivariate statistics and highly customizable visualizations with the spreadsheet environment, aiding robust inferences and generating information-rich data visualizations. This tool provides access to multiple comparisons with false discovery correction, hierarchical clustering, principal and independent component analyses, partial least squares regression and discriminant analysis, through an intuitive interface for creating high-quality two- and three-dimensional visualizations including scatter plot matrices, distribution plots, dendrograms, heat maps, biplots, trellis biplots and correlation networks. Freely available for download at http://sourceforge.net/projects/imdev/. Implemented in R and VBA and supported by Microsoft Excel (2003, 2007 and 2010).
Wang, Yi; Coleman-Derr, Devin; Chen, Guoping; Gu, Yong Q
2015-07-01
Genome wide analysis of orthologous clusters is an important component of comparative genomics studies. Identifying the overlap among orthologous clusters can enable us to elucidate the function and evolution of proteins across multiple species. Here, we report a web platform named OrthoVenn that is useful for genome wide comparisons and visualization of orthologous clusters. OrthoVenn provides coverage of vertebrates, metazoa, protists, fungi, plants and bacteria for the comparison of orthologous clusters and also supports uploading of customized protein sequences from user-defined species. An interactive Venn diagram, summary counts, and functional summaries of the disjunction and intersection of clusters shared between species are displayed as part of the OrthoVenn result. OrthoVenn also includes in-depth views of the clusters using various sequence analysis tools. Furthermore, OrthoVenn identifies orthologous clusters of single copy genes and allows for a customized search of clusters of specific genes through key words or BLAST. OrthoVenn is an efficient and user-friendly web server freely accessible at http://probes.pw.usda.gov/OrthoVenn or http://aegilops.wheat.ucdavis.edu/OrthoVenn. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
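The disjunction and intersection counts shown in a Venn diagram of orthologous clusters reduce to set operations. The sketch below is not OrthoVenn's implementation or API. It assumes each species is represented by a set of cluster identifiers, and the function name `venn_regions`, the species names, and the cluster IDs are all hypothetical.

```python
def venn_regions(clusters_by_species):
    """Map each combination of species to the clusters found in exactly those species."""
    # First record, for every cluster, which species contain it
    membership = {}
    for species, clusters in clusters_by_species.items():
        for cluster_id in clusters:
            membership.setdefault(cluster_id, set()).add(species)
    # Then group clusters by their exact species combination (one Venn region each)
    regions = {}
    for cluster_id, species_set in membership.items():
        regions.setdefault(frozenset(species_set), set()).add(cluster_id)
    return regions


# Hypothetical orthologous cluster assignments for three species
clusters_by_species = {
    "species_A": {"c1", "c2", "c3"},
    "species_B": {"c2", "c3", "c4"},
    "species_C": {"c3", "c5"},
}
regions = venn_regions(clusters_by_species)

# Summary counts per region, as a Venn diagram would display them
counts = {tuple(sorted(combo)): len(ids) for combo, ids in regions.items()}
```

Here `c3` falls in the region shared by all three species, `c2` in the region shared only by A and B, and the remaining clusters are species-specific. Grouping by exact membership, rather than intersecting raw sets pairwise, ensures each cluster is counted in one region only.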
Mwangi, Benson; Soares, Jair C; Hasan, Khader M
2014-10-30
Neuroimaging machine learning studies have largely utilized supervised algorithms - meaning they require both neuroimaging scan data and corresponding target variables (e.g. healthy vs. diseased) to be successfully 'trained' for a prediction task. Noticeably, this approach may not be optimal or possible when the global structure of the data is not well known and the researcher does not have an a priori model to fit the data. We set out to investigate the utility of an unsupervised machine learning technique; t-distributed stochastic neighbour embedding (t-SNE) in identifying 'unseen' sample population patterns that may exist in high-dimensional neuroimaging data. Multimodal neuroimaging scans from 92 healthy subjects were pre-processed using atlas-based methods, integrated and input into the t-SNE algorithm. Patterns and clusters discovered by the algorithm were visualized using a 2D scatter plot and further analyzed using the K-means clustering algorithm. t-SNE was evaluated against classical principal component analysis. Remarkably, based on unlabelled multimodal scan data, t-SNE separated study subjects into two very distinct clusters which corresponded to subjects' gender labels (cluster silhouette index value=0.79). The resulting clusters were used to develop an unsupervised minimum distance clustering model which identified 93.5% of subjects' gender. Notably, from a neuropsychiatric perspective this method may allow discovery of data-driven disease phenotypes or sub-types of treatment responders. Copyright © 2014 Elsevier B.V. All rights reserved.
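The cluster silhouette index reported in this abstract measures how well separated a clustering is. For each point, a is the mean distance to the other members of its own cluster, b is the smallest mean distance to any other cluster, and the point's score is (b - a) / max(a, b). The index is the mean over all points. The sketch below applies this textbook definition to hypothetical 2D embedding coordinates. It assumes Euclidean distance and at least two clusters, and it is not the study's code.

```python
import math


def silhouette_index(points, labels):
    """Mean silhouette score over all points (Euclidean distance, >= 2 clusters)."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)

    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b: mean distance to the nearest other cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical 2D embedding with two well-separated groups
embedding = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_index(embedding, [0, 0, 1, 1])  # labels match the groups
bad = silhouette_index(embedding, [0, 1, 0, 1])   # labels cut across the groups
```

Values close to 1 indicate compact, well-separated clusters, values near 0 indicate overlapping clusters, and negative values suggest points assigned to the wrong cluster. In this toy case `good` is about 0.90 and `bad` is negative.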
ETE: a python Environment for Tree Exploration.
Huerta-Cepas, Jaime; Dopazo, Joaquín; Gabaldón, Toni
2010-01-13
Many bioinformatics analyses, ranging from gene clustering to phylogenetics, produce hierarchical trees as their main result. These are used to represent the relationships among different biological entities, thus facilitating their analysis and interpretation. A number of standalone programs are available that focus on tree visualization or that perform specific analyses on them. However, such applications are rarely suitable for large-scale surveys, in which a higher level of automation is required. Currently, many genome-wide analyses rely on tree-like data representation and hence there is a growing need for scalable tools to handle tree structures at large scale. Here we present the Environment for Tree Exploration (ETE), a python programming toolkit that assists in the automated manipulation, analysis and visualization of hierarchical trees. ETE libraries provide a broad set of tree handling options as well as specific methods to analyze phylogenetic and clustering trees. Among other features, ETE allows for the independent analysis of tree partitions, has support for the extended newick format, provides an integrated node annotation system and permits to link trees to external data such as multiple sequence alignments or numerical arrays. In addition, ETE implements a number of built-in analytical tools, including phylogeny-based orthology prediction and cluster validation techniques. Finally, ETE's programmable tree drawing engine can be used to automate the graphical rendering of trees with customized node-specific visualizations. ETE provides a complete set of methods to manipulate tree data structures that extends current functionality in other bioinformatic toolkits of a more general purpose. ETE is free software and can be downloaded from http://ete.cgenomics.org.
ETE: a python Environment for Tree Exploration
2010-01-01
Background Many bioinformatics analyses, ranging from gene clustering to phylogenetics, produce hierarchical trees as their main result. These are used to represent the relationships among different biological entities, thus facilitating their analysis and interpretation. A number of standalone programs are available that focus on tree visualization or that perform specific analyses on them. However, such applications are rarely suitable for large-scale surveys, in which a higher level of automation is required. Currently, many genome-wide analyses rely on tree-like data representation and hence there is a growing need for scalable tools to handle tree structures at large scale. Results Here we present the Environment for Tree Exploration (ETE), a python programming toolkit that assists in the automated manipulation, analysis and visualization of hierarchical trees. ETE libraries provide a broad set of tree handling options as well as specific methods to analyze phylogenetic and clustering trees. Among other features, ETE allows for the independent analysis of tree partitions, has support for the extended newick format, provides an integrated node annotation system and permits to link trees to external data such as multiple sequence alignments or numerical arrays. In addition, ETE implements a number of built-in analytical tools, including phylogeny-based orthology prediction and cluster validation techniques. Finally, ETE's programmable tree drawing engine can be used to automate the graphical rendering of trees with customized node-specific visualizations. Conclusions ETE provides a complete set of methods to manipulate tree data structures that extends current functionality in other bioinformatic toolkits of a more general purpose. ETE is free software and can be downloaded from http://ete.cgenomics.org. PMID:20070885
A web portal for hydrodynamical, cosmological simulations
NASA Astrophysics Data System (ADS)
Ragagnin, A.; Dolag, K.; Biffi, V.; Cadolle Bel, M.; Hammer, N. J.; Krukau, A.; Petkova, M.; Steinborn, D.
2017-07-01
This article describes a data centre hosting a web portal for accessing and sharing the output of large, cosmological, hydro-dynamical simulations with a broad scientific community. It also allows users to receive related scientific data products by directly processing the raw simulation data on a remote computing cluster. The data centre has a multi-layer structure: a web portal, a job control layer, a computing cluster and an HPC storage system. The outer layer enables users to choose an object from the simulations. Objects can be selected by visually inspecting 2D maps of the simulation data, by performing complex compound queries or graphically by plotting arbitrary combinations of properties. The user can run analysis tools on a chosen object. These services allow users to run analysis tools on the raw simulation data. The job control layer is responsible for handling and performing the analysis jobs, which are executed on a computing cluster. The innermost layer is formed by an HPC storage system which hosts the large, raw simulation data. The following services are available for the users: (I) CLUSTERINSPECT visualizes properties of member galaxies of a selected galaxy cluster; (II) SIMCUT returns the raw data of a sub-volume around a selected object from a simulation, containing all the original, hydro-dynamical quantities; (III) SMAC creates idealized 2D maps of various physical quantities and observables of a selected object; (IV) PHOX generates virtual X-ray observations with specifications of various current and upcoming instruments.
Directional virtual backbone based data aggregation scheme for Wireless Visual Sensor Networks.
Zhang, Jing; Liu, Shi-Jian; Tsai, Pei-Wei; Zou, Fu-Min; Ji, Xiao-Rong
2018-01-01
Data gathering is a fundamental task in Wireless Visual Sensor Networks (WVSNs). Directional antennas and visual data make WVSNs more complex than conventional Wireless Sensor Networks (WSNs). The virtual backbone is a technique for constructing clusters; the version associated with the aggregation operation is also referred to as the virtual backbone tree. Most of the existing literature focuses on the efficiency gained by constructing clusters, and existing methods generally neglect local-balance problems. To fill this gap, a Directional Virtual Backbone based Data Aggregation Scheme (DVBDAS) for WVSNs is proposed in this paper. In addition, a measure called the energy consumption density is proposed for evaluating the adequacy of results in cluster-based construction problems. Moreover, a directional virtual backbone construction scheme is proposed that takes the local-balance factor into account. Furthermore, the associated network coding mechanism is utilized to construct DVBDAS. Finally, both a theoretical analysis of the proposed DVBDAS and simulations are given to evaluate its performance. The experimental results show that the proposed DVBDAS achieves higher performance than the existing methods in terms of both energy preservation and network lifetime extension.
Ji, Shuiwang
2013-07-11
The structured organization of cells in the brain plays a key role in its functional efficiency. This delicate organization is the consequence of unique molecular identity of each cell gradually established by precise spatiotemporal gene expression control during development. Currently, studies on the molecular-structural association are beginning to reveal how the spatiotemporal gene expression patterns are related to cellular differentiation and structural development. In this article, we aim at a global, data-driven study of the relationship between gene expressions and neuroanatomy in the developing mouse brain. To enable visual explorations of the high-dimensional data, we map the in situ hybridization gene expression data to a two-dimensional space by preserving both the global and the local structures. Our results show that the developing brain anatomy is largely preserved in the reduced gene expression space. To provide a quantitative analysis, we cluster the reduced data into groups and measure the consistency with neuroanatomy at multiple levels. Our results show that the clusters in the low-dimensional space are more consistent with neuroanatomy than those in the original space. Gene expression patterns and developing brain anatomy are closely related. Dimensionality reduction and visual exploration facilitate the study of this relationship.
Lew, Timothy F; Vul, Edward
2015-01-01
People seem to compute the ensemble statistics of objects and use this information to support the recall of individual objects in visual working memory. However, there are many different ways that hierarchical structure might be encoded. We examined the format of structured memories by asking subjects to recall the locations of objects arranged in different spatial clustering structures. Consistent with previous investigations of structured visual memory, subjects recalled objects biased toward the center of their clusters. Subjects also recalled locations more accurately when they were arranged in fewer clusters containing more objects, suggesting that subjects used the clustering structure of objects to aid recall. Furthermore, subjects had more difficulty recalling larger relative distances, consistent with subjects encoding the positions of objects relative to clusters and recalling them with magnitude-proportional (Weber) noise. Our results suggest that clustering improved the fidelity of recall by biasing the recall of locations toward cluster centers to compensate for uncertainty and by reducing the magnitude of encoded relative distances.
BioTextQuest: a web-based biomedical text mining suite for concept discovery.
Papanikolaou, Nikolas; Pafilis, Evangelos; Nikolaou, Stavros; Ouzounis, Christos A; Iliopoulos, Ioannis; Promponas, Vasilis J
2011-12-01
Summary: BioTextQuest combines automated discovery of significant terms in article clusters with structured knowledge annotation, via Named Entity Recognition services, offering interactive user-friendly visualization. The terms labeling each document cluster are shown as a tag cloud and semantically annotated according to biological entity, and a list of document titles enables users to simultaneously compare the terms and documents of each cluster, facilitating concept association and hypothesis generation. BioTextQuest allows customization of analysis parameters, e.g. clustering/stemming algorithms, exclusion of documents/significant terms, to better match the biological question addressed. Availability: http://biotextquest.biol.ucy.ac.cy Contact: vprobon@ucy.ac.cy; iliopj@med.uoc.gr Supplementary information: Supplementary data are available at Bioinformatics online.
Visualization techniques for computer network defense
NASA Astrophysics Data System (ADS)
Beaver, Justin M.; Steed, Chad A.; Patton, Robert M.; Cui, Xiaohui; Schultz, Matthew
2011-06-01
Effective visual analysis of computer network defense (CND) information is challenging due to the volume and complexity of both the raw and analyzed network data. A typical CND is comprised of multiple niche intrusion detection tools, each of which performs network data analysis and produces a unique alerting output. The state-of-the-practice in the situational awareness of CND data is the prevalent use of custom-developed scripts by Information Technology (IT) professionals to retrieve, organize, and understand potential threat events. We propose a new visual analytics framework, called the Oak Ridge Cyber Analytics (ORCA) system, for CND data that allows an operator to interact with all detection tool outputs simultaneously. Aggregated alert events are presented in multiple coordinated views with timeline, cluster, and swarm model analysis displays. These displays are complemented with both supervised and semi-supervised machine learning classifiers. The intent of the visual analytics framework is to improve CND situational awareness, to enable an analyst to quickly navigate and analyze thousands of detected events, and to combine sophisticated data analysis techniques with interactive visualization such that patterns of anomalous activities may be more easily identified and investigated.
Gomez Baquero, David; Koppel, Kadri; Chambers, Delores; Hołda, Karolina; Głogowski, Robert; Chambers, Edgar
2018-05-23
Sensory analysis of pet foods has been emerging as an important field of study for the pet food industry over the last few decades. Few studies have been conducted on understanding the pet owners’ perception of pet foods. The objective of this study is to gain a deeper understanding of how dog owners in different consumer segments perceive the visual characteristics of dry dog foods. A total of 120 consumers evaluated the appearance of 30 dry dog food samples with varying visual characteristics. The consumers rated the acceptance of the samples and associated each one with a list of positive and negative beliefs. Cluster Analysis, ANOVA and Correspondence Analysis were used to analyze the consumer responses. The acceptability of the appearance of dry dog foods was affected by the number of different kibbles present, color(s), shape(s), and size(s) of the kibbles in the product. Three consumer clusters were identified. Consumers gave the highest ratings to single-kibble samples of medium sizes, traditional shapes, and brown colors. Participants disliked extra-small or extra-large kibble sizes, shapes with high-dimensional contrast, and kibbles of light brown color. These findings can help dry dog food manufacturers to meet consumers’ needs with increasing benefits to the pet food and commodity industries.
Slow-Learner, Average, and Gifted Third Graders: Strategy Analysis and Training for Learning
ERIC Educational Resources Information Center
Friedrich, Douglas
1974-01-01
Experimentally induced rehearsal and clustering strategies facilitated the performance of slow-learner, average, and gifted third graders on a visual short-term memory task. Self-pacing was superior to experimenter pacing of successive object presentation. (Author)
RSAT 2015: Regulatory Sequence Analysis Tools
Medina-Rivera, Alejandra; Defrance, Matthieu; Sand, Olivier; Herrmann, Carl; Castro-Mondragon, Jaime A.; Delerce, Jeremy; Jaeger, Sébastien; Blanchet, Christophe; Vincens, Pierre; Caron, Christophe; Staines, Daniel M.; Contreras-Moreira, Bruno; Artufel, Marie; Charbonnier-Khamvongsa, Lucie; Hernandez, Céline; Thieffry, Denis; Thomas-Chollier, Morgane; van Helden, Jacques
2015-01-01
RSAT (Regulatory Sequence Analysis Tools) is a modular software suite for the analysis of cis-regulatory elements in genome sequences. Its main applications are (i) motif discovery, appropriate to genome-wide data sets like ChIP-seq, (ii) transcription factor binding motif analysis (quality assessment, comparisons and clustering), (iii) comparative genomics and (iv) analysis of regulatory variations. Nine new programs have been added to the 43 described in the 2011 NAR Web Software Issue, including a tool to extract sequences from a list of coordinates (fetch-sequences from UCSC), novel programs dedicated to the analysis of regulatory variants from GWAS or population genomics (retrieve-variation-seq and variation-scan), a program to cluster motifs and visualize the similarities as trees (matrix-clustering). To deal with the drastic increase of sequenced genomes, RSAT public sites have been reorganized into taxon-specific servers. The suite is well-documented with tutorials and published protocols. The software suite is available through Web sites, SOAP/WSDL Web services, virtual machines and stand-alone programs at http://www.rsat.eu/. PMID:25904632
Toyoda, Hiromitsu; Takahashi, Shinji; Hoshino, Masatoshi; Takayama, Kazushi; Iseki, Kazumichi; Sasaoka, Ryuichi; Tsujio, Tadao; Yasuda, Hiroyuki; Sasaki, Takeharu; Kanematsu, Fumiaki; Kono, Hiroshi; Nakamura, Hiroaki
2017-09-23
This study demonstrated four distinct patterns in the course of back pain after osteoporotic vertebral fracture (OVF). Greater angular instability in the first 6 months after the baseline was one factor affecting back pain after OVF. Understanding the natural course of symptomatic acute OVF is important in deciding the optimal treatment strategy. We used latent class analysis to classify the course of back pain after OVF and identify the risk factors associated with persistent pain. This multicenter cohort study included 218 consecutive patients with ≤ 2-week-old OVFs who were enrolled at 11 institutions. Dynamic x-rays and back pain assessment with a visual analog scale (VAS) were obtained at enrollment and at 1-, 3-, and 6-month follow-ups. The VAS scores were used to characterize patient groups, using hierarchical cluster analysis. VAS scores for 128 patients were used for hierarchical cluster analysis. Analysis yielded four clusters representing different patterns of back pain progression. Cluster 1 patients (50.8%) had stable, mild pain. Cluster 2 patients (21.1%) started with moderate pain and progressed quickly to very low pain. Patients in cluster 3 (10.9%) had moderate pain that initially improved but worsened after 3 months. Cluster 4 patients (17.2%) had persistent severe pain. Patients in cluster 4 showed significantly higher baseline pain intensity, a higher degree of angular instability, and a higher number of previous OVFs, and tended to lack regular exercise. In contrast, patients in cluster 2 had significantly lower baseline VAS and less angular instability. We identified four distinct groups of OVF patients with different patterns of back pain progression. Understanding the course of back pain after OVF may help in its management and contribute to future treatment trials.
Hadjithomas, Michalis; Chen, I-Min A; Chu, Ken; Huang, Jinghua; Ratner, Anna; Palaniappan, Krishna; Andersen, Evan; Markowitz, Victor; Kyrpides, Nikos C; Ivanova, Natalia N
2017-01-04
Secondary metabolites produced by microbes have diverse biological functions, which makes them a great potential source of biotechnologically relevant compounds with antimicrobial, anti-cancer and other activities. The proteins needed to synthesize these natural products are often encoded by clusters of co-located genes called biosynthetic gene clusters (BCs). In order to advance the exploration of microbial secondary metabolism, we developed the largest publicly available database of experimentally verified and predicted BCs, the Integrated Microbial Genomes Atlas of Biosynthetic gene Clusters (IMG-ABC) (https://img.jgi.doe.gov/abc/). Here, we describe an update of IMG-ABC, which includes ClusterScout, a tool for targeted identification of custom biosynthetic gene clusters across 40 000 isolate microbial genomes, and a new search capability to query more than 700 000 BCs from isolate genomes for clusters with similar Pfam composition. Additional features enable fast exploration and analysis of BCs through two new interactive visualization features, a BC function heatmap and a BC similarity network graph. These new tools and features add to the value of IMG-ABC's vast body of BC data, facilitating their in-depth analysis and accelerating secondary metabolite discovery. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Hsu, Chi-Lin; Chou, Chih-Hsuan; Huang, Shih-Chuan; Lin, Chia-Yi; Lin, Meng-Ying; Tung, Chun-Che; Lin, Chun-Yen; Lai, Ivan Pochou; Zou, Yan-Fang; Youngson, Neil A; Lin, Shau-Ping; Yang, Chang-Hao; Chen, Shih-Kuo; Gau, Susan Shur-Fen; Huang, Hsien-Sung
2018-03-15
Visual system development is light-experience dependent, which strongly implicates epigenetic mechanisms in light-regulated maturation. Among many epigenetic processes, genomic imprinting is an epigenetic mechanism through which monoallelic gene expression occurs in a parent-of-origin-specific manner. It is unknown if genomic imprinting contributes to visual system development. We profiled the transcriptome and imprintome during critical periods of mouse visual system development under normal- and dark-rearing conditions using B6/CAST F1 hybrid mice. We identified experience-regulated, isoform-specific and brain-region-specific imprinted genes. We also found imprinted microRNAs were predominantly clustered into the Dlk1-Dio3 imprinted locus with light experience affecting some imprinted miRNA expression. Our findings provide the first comprehensive analysis of light-experience regulation of the transcriptome and imprintome during critical periods of visual system development. Our results may contribute to therapeutic strategies for visual impairments and circadian rhythm disorders resulting from a dysfunctional imprintome.
Time-Hierarchical Clustering and Visualization of Weather Forecast Ensembles.
Ferstl, Florian; Kanzler, Mathias; Rautenhaus, Marc; Westermann, Rudiger
2017-01-01
We propose a new approach for analyzing the temporal growth of the uncertainty in ensembles of weather forecasts which are started from perturbed but similar initial conditions. As an alternative to traditional approaches in meteorology, which use juxtaposition and animation of spaghetti plots of iso-contours, we make use of contour clustering and provide means to encode forecast dynamics and spread in one single visualization. Based on a given ensemble clustering in a specified time window, we merge clusters in time-reversed order to indicate when and where forecast trajectories start to diverge. We present and compare different visualizations of the resulting time-hierarchical grouping, including space-time surfaces built by connecting cluster representatives over time, and stacked contour variability plots. We demonstrate the effectiveness of our visual encodings with forecast examples of the European Centre for Medium-Range Weather Forecasts, which convey the evolution of specific features in the data as well as the temporally increasing spatial variability.
Chou, A; Burke, J
1999-05-01
DNA sequence clustering has become a valuable method in support of gene discovery and gene expression analysis. Our interest lies in leveraging the sequence diversity within clusters of expressed sequence tags (ESTs) to model gene structure for the study of gene variants that arise from, among other things, alternative mRNA splicing, polymorphism, and divergence after gene duplication, fusion, and translocation events. In previous work, CRAW was developed to discover gene variants from assembled clusters of ESTs. Most importantly, novel gene features (the differing units between gene variants, for example alternative exons, polymorphisms, transposable elements, etc.) that are specialized to tissue, disease, population, or developmental states can be identified when these tools collate DNA source information with gene variant discrimination. While the goal is complete automation of novel feature and gene variant detection, current methods are far from perfect, and hence the development of effective tools for visualization and exploratory data analysis is of paramount importance in the process of sifting through candidate genes and validating targets. We present CRAWview, a Java-based visualization extension to CRAW. Features that vary between gene forms are displayed using an automatically generated color-coded index. The reporting format of CRAWview gives a brief, high-level summary report to display overlap and divergence within clusters of sequences as well as the ability to 'drill down' and see detailed information concerning regions of interest. Additionally, the alignment viewing and editing capabilities of CRAWview make it possible to interactively correct frame-shifts and otherwise edit cluster assemblies. We have implemented CRAWview as a Java application across Windows NT/95 and UNIX platforms. A beta version of CRAWview will be freely available to academic users from Pangea Systems (http://www.pangeasystems.com).
ERIC Educational Resources Information Center
Torrens, Paul M.; Griffin, William A.
2013-01-01
The authors describe an observational and analytic methodology for recording and interpreting dynamic microprocesses that occur during social interaction, making use of space--time data collection techniques, spatial-statistical analysis, and visualization. The scheme has three investigative foci: Structure, Activity Composition, and Clustering.…
Li, Jieyue; Newberg, Justin Y; Uhlén, Mathias; Lundberg, Emma; Murphy, Robert F
2012-01-01
The Human Protein Atlas contains immunofluorescence images showing subcellular locations for thousands of proteins. These are currently annotated by visual inspection. In this paper, we describe automated approaches to analyze the images and their use to improve annotation. We began by training classifiers to recognize the annotated patterns. By ranking proteins according to the confidence of the classifier, we generated a list of proteins that were strong candidates for reexamination. In parallel, we applied hierarchical clustering to group proteins and identified proteins whose annotations were inconsistent with the remainder of the proteins in their cluster. These proteins were reexamined by the original annotators, and a significant fraction had their annotations changed. The results demonstrate that automated approaches can provide an important complement to visual annotation.
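The cluster-based consistency check described above can be sketched as follows. The abstract does not state the exact criterion, so this example assumes the simplest reading: an item is flagged when its annotation differs from the most common annotation in its cluster. The function name `flag_inconsistent` and the toy protein identifiers are hypothetical.

```python
from collections import Counter, defaultdict


def flag_inconsistent(cluster_of, annotation_of):
    """Return items whose annotation differs from their cluster's majority label.

    cluster_of: dict mapping item -> cluster id
    annotation_of: dict mapping item -> annotation label
    Ties are broken by whichever label Counter encounters first (an assumption).
    """
    members = defaultdict(list)
    for item, cluster in cluster_of.items():
        members[cluster].append(item)

    flagged = []
    for items in members.values():
        counts = Counter(annotation_of[item] for item in items)
        majority_label, _ = counts.most_common(1)[0]
        flagged.extend(item for item in items if annotation_of[item] != majority_label)
    return sorted(flagged)


# Toy data: P3 sits in a cluster dominated by "nucleus" annotations.
clusters = {"P1": 0, "P2": 0, "P3": 0, "P4": 1, "P5": 1, "P6": 1}
labels = {"P1": "nucleus", "P2": "nucleus", "P3": "cytoplasm",
          "P4": "golgi", "P5": "golgi", "P6": "golgi"}
print(flag_inconsistent(clusters, labels))
```

The flagged list is only a set of candidates for manual reexamination, which matches the workflow in the abstract where the original annotators make the final decision.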
High-resolution Self-Organizing Maps for advanced visualization and dimension reduction.
Saraswati, Ayu; Nguyen, Van Tuc; Hagenbuchner, Markus; Tsoi, Ah Chung
2018-05-04
Kohonen's Self Organizing feature Map (SOM) provides an effective way to project high dimensional input features onto a low dimensional display space while preserving the topological relationships among the input features. Recent advances in algorithms that take advantage of modern computing hardware introduced the concept of high resolution SOMs (HRSOMs). This paper investigates the capabilities and applicability of the HRSOM as a visualization tool for cluster analysis and its suitability to serve as a pre-processor in ensemble learning models. The evaluation is conducted on a number of established benchmarks and real-world learning problems, namely, the policeman benchmark, two web spam detection problems, a network intrusion detection problem, and a malware detection problem. It is found that the visualization resulting from an HRSOM provides new insights concerning these learning problems. It is furthermore shown empirically that broad benefits from the use of HRSOMs in both clustering and classification problems can be expected. Copyright © 2018 Elsevier Ltd. All rights reserved.
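As an illustration of the projection step, here is a minimal sketch of a standard (low-resolution) SOM in plain Python. It is not the HRSOM implementation from the paper; the function names, the linear decay schedules and the default parameters are assumptions chosen for brevity.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid coordinate whose weight vector is closest to x."""
    return min(weights, key=lambda node: sum((w - v) ** 2 for w, v in zip(weights[node], x)))


def train_som(data, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a rows x cols SOM on data (list of equal-length numeric sequences)."""
    rng = random.Random(seed)
    dim = len(data[0])
    sigma0 = max(rows, cols) / 2.0
    # One weight vector per grid node, initialised uniformly in [0, 1).
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # shuffled pass over the data
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                    # learning rate decays linearly
            sigma = max(sigma0 * (1.0 - frac), 0.5)    # neighbourhood radius shrinks
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma * sigma))  # Gaussian neighbourhood
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])
            step += 1
    return weights


# Two toy groups in the unit square; each point is projected to a grid coordinate.
points = [(0.05, 0.1), (0.1, 0.05), (0.0, 0.0), (0.9, 0.95), (0.95, 0.9), (1.0, 1.0)]
som = train_som(points, rows=3, cols=3)
for p in points:
    print(p, "->", best_matching_unit(som, p))
```

Because neighbouring grid nodes are updated together, inputs that are close in feature space tend to map to nearby grid coordinates, which is the topology-preserving property the abstract refers to.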
Hierarchical Aligned Cluster Analysis for Temporal Clustering of Human Motion.
Zhou, Feng; De la Torre, Fernando; Hodgins, Jessica K
2013-03-01
Temporal segmentation of human motion into plausible motion primitives is central to understanding and building computational models of human motion. Several issues contribute to the challenge of discovering motion primitives: the exponential nature of all possible movement combinations, the variability in the temporal scale of human actions, and the complexity of representing articulated motion. We pose the problem of learning motion primitives as one of temporal clustering, and derive an unsupervised hierarchical bottom-up framework called hierarchical aligned cluster analysis (HACA). HACA finds a partition of a given multidimensional time series into m disjoint segments such that each segment belongs to one of k clusters. HACA combines kernel k-means with the generalized dynamic time alignment kernel to cluster time series data. Moreover, it provides a natural framework to find a low-dimensional embedding for time series. HACA is efficiently optimized with a coordinate descent strategy and dynamic programming. Experimental results on motion capture and video data demonstrate the effectiveness of HACA for segmenting complex motions and as a visualization tool. We also compare the performance of HACA to state-of-the-art algorithms for temporal clustering on data of a honey bee dance. The HACA code is available online.
Visualizing and Clustering Protein Similarity Networks: Sequences, Structures, and Functions.
Mai, Te-Lun; Hu, Geng-Ming; Chen, Chi-Ming
2016-07-01
Research in the recent decade has demonstrated the usefulness of protein network knowledge in furthering the study of molecular evolution of proteins, understanding the robustness of cells to perturbation, and annotating new protein functions. In this study, we aimed to provide a general clustering approach to visualize the sequence-structure-function relationship of protein networks, and investigate possible causes for inconsistency in the protein classifications based on sequences, structures, and functions. Such visualization of protein networks could facilitate our understanding of the overall relationship among proteins and help researchers comprehend various protein databases. As a demonstration, we clustered 1437 enzymes by their sequences and structures using the minimum span clustering (MSC) method. The general structure of this protein network was delineated at two clustering resolutions, and the second level MSC clustering was found to be highly similar to existing enzyme classifications. The clusterings of these enzymes based on sequence, structure, and function information are consistent with one another. For proteases, the Jaccard's similarity coefficient is 0.86 between sequence and function classifications, 0.82 between sequence and structure classifications, and 0.78 between structure and function classifications. From our clustering results, we discussed possible examples of divergent evolution and convergent evolution of enzymes. Our clustering approach provides a panoramic view of the sequence-structure-function network of proteins, helps visualize the relation between related proteins intuitively, and is useful in predicting the structure and function of newly determined protein sequences.
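The agreement between two classifications of the same items can be quantified with a Jaccard coefficient in several ways. The abstract does not say which variant was used, so the sketch below assumes the common pair-counting form: the number of item pairs grouped together in both classifications, divided by the number of pairs grouped together in at least one. The function name `pair_jaccard` is hypothetical.

```python
from itertools import combinations


def pair_jaccard(labels_a, labels_b):
    """Pair-counting Jaccard coefficient between two labelings of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same items")
    both = only_a = only_b = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a and same_b:
            both += 1      # pair co-clustered in both classifications
        elif same_a:
            only_a += 1    # co-clustered only in the first
        elif same_b:
            only_b += 1    # co-clustered only in the second
    denom = both + only_a + only_b
    # If no pair is co-clustered anywhere, the labelings agree trivially.
    return both / denom if denom else 1.0


by_sequence = [0, 0, 0, 1, 1]
by_function = [0, 0, 1, 1, 1]
print(pair_jaccard(by_sequence, by_function))
```

The pairwise loop is quadratic in the number of items, which is acceptable for a data set on the order of the 1437 enzymes mentioned above; a contingency-table formulation would scale better.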
VAAPA: a web platform for visualization and analysis of alternative polyadenylation.
Guan, Jinting; Fu, Jingyi; Wu, Mingcheng; Chen, Longteng; Ji, Guoli; Quinn Li, Qingshun; Wu, Xiaohui
2015-02-01
Polyadenylation [poly(A)] is an essential process during the maturation of most mRNAs in eukaryotes. Alternative polyadenylation (APA) as an important layer of gene expression regulation has been increasingly recognized in various species. Here, a web platform for visualization and analysis of alternative polyadenylation (VAAPA) was developed. This platform can visualize the distribution of poly(A) sites and poly(A) clusters of a gene or a section of a chromosome. It can also highlight genes with switched APA sites among different conditions. VAAPA is an easy-to-use web-based tool that provides functions of poly(A) site query, data uploading, downloading, and APA sites visualization. It was designed in a multi-tier architecture and developed based on Smart GWT (Google Web Toolkit) using Java as the development language. VAAPA will be a valuable addition to the community for the comprehensive study of APA, not only by making the high quality poly(A) site data more accessible, but also by providing users with numerous valuable functions for poly(A) site analysis and visualization. Copyright © 2014 Elsevier Ltd. All rights reserved.
Towards a Web-Enabled Geovisualization and Analytics Platform for the Energy and Water Nexus
NASA Astrophysics Data System (ADS)
Sanyal, J.; Chandola, V.; Sorokine, A.; Allen, M.; Berres, A.; Pang, H.; Karthik, R.; Nugent, P.; McManamay, R.; Stewart, R.; Bhaduri, B. L.
2017-12-01
Interactive data analytics are playing an increasingly vital role in the generation of new, critical insights regarding the complex dynamics of the energy/water nexus (EWN) and its interactions with climate variability and change. Integration of impacts, adaptation, and vulnerability (IAV) science with emerging, and increasingly critical, data science capabilities offers a promising potential to meet the needs of the EWN community. To enable the exploration of pertinent research questions, a web-based geospatial visualization platform is being built that integrates a data analysis toolbox with advanced data fusion and data visualization capabilities to create a knowledge discovery framework for the EWN. The system, when fully built out, will offer several geospatial visualization capabilities, including statistical visual analytics, clustering, principal-component analysis, and dynamic time warping. It will also support uncertainty visualization, the exploration of data provenance, and machine learning discoveries in order to render diverse types of geospatial data and facilitate interactive analysis. Key components in the system architecture include NASA's WebWorldWind, the Globus toolkit, PostgreSQL, and other custom-built software modules.
imDEV: a graphical user interface to R multivariate analysis tools in Microsoft Excel
Grapov, Dmitry; Newman, John W.
2012-01-01
Summary: Interactive modules for Data Exploration and Visualization (imDEV) is a Microsoft Excel spreadsheet embedded application providing an integrated environment for the analysis of omics data through a user-friendly interface. Individual modules enable interactive and dynamic analyses of large data by interfacing R's multivariate statistics and highly customizable visualizations with the spreadsheet environment, aiding robust inferences and generating information-rich data visualizations. This tool provides access to multiple comparisons with false discovery correction, hierarchical clustering, principal and independent component analyses, partial least squares regression and discriminant analysis, through an intuitive interface for creating high-quality two- and three-dimensional visualizations including scatter plot matrices, distribution plots, dendrograms, heat maps, biplots, trellis biplots and correlation networks. Availability and implementation: Freely available for download at http://sourceforge.net/projects/imdev/. Implemented in R and VBA and supported by Microsoft Excel (2003, 2007 and 2010). Contact: John.Newman@ars.usda.gov Supplementary Information: Installation instructions, tutorials and user manual are available at http://sourceforge.net/projects/imdev/. PMID:22815358
2013-01-01
Background: The structured organization of cells in the brain plays a key role in its functional efficiency. This delicate organization is the consequence of unique molecular identity of each cell gradually established by precise spatiotemporal gene expression control during development. Currently, studies on the molecular-structural association are beginning to reveal how the spatiotemporal gene expression patterns are related to cellular differentiation and structural development. Results: In this article, we aim at a global, data-driven study of the relationship between gene expressions and neuroanatomy in the developing mouse brain. To enable visual explorations of the high-dimensional data, we map the in situ hybridization gene expression data to a two-dimensional space by preserving both the global and the local structures. Our results show that the developing brain anatomy is largely preserved in the reduced gene expression space. To provide a quantitative analysis, we cluster the reduced data into groups and measure the consistency with neuroanatomy at multiple levels. Our results show that the clusters in the low-dimensional space are more consistent with neuroanatomy than those in the original space. Conclusions: Gene expression patterns and developing brain anatomy are closely related. Dimensionality reduction and visual exploration facilitate the study of this relationship. PMID:23845024
The Cosmic Skidmark: witnessing galaxy transformation at z = 0.19
NASA Astrophysics Data System (ADS)
Murphy, David N. A.
2015-02-01
We present an early-look analysis of the "Cosmic Skidmark". Discovered following visual inspection of the Geach, Murphy & Bower (2011) SDSS Stripe 82 cluster catalogue generated by ORCA (an automated cluster algorithm searching for red-sequences; Murphy, Geach & Bower 2012), this z = 0.19 1.4L* galaxy appears to have been caught in the rare act of transformation while accreting onto an estimated 10^13-10^14 h^-1 M⊙-mass galaxy group. SDSS spectroscopy reveals clear signatures of star formation whilst deep optical imaging reveals a pronounced 50 kpc cometary tail. Pending completion of our ALMA Cycle 2 and IFU observations, we show here preliminary analysis of this target.
NASA Astrophysics Data System (ADS)
DiNuzzo, Mauro; Mascali, Daniele; Moraschi, Marta; Bussu, Giorgia; Maraviglia, Bruno; Mangia, Silvia; Giove, Federico
2017-02-01
Time-domain analysis of blood-oxygenation level-dependent (BOLD) signals allows the identification of clusters of voxels responding to photic stimulation in primary visual cortex (V1). However, the characterization of information encoding into temporal properties of the BOLD signals of an activated cluster is poorly investigated. Here, we used Shannon entropy to determine spatial and temporal information encoding in the BOLD signal within the most strongly activated area of the human visual cortex during a hemifield photic stimulation. We determined the distribution profile of BOLD signals during epochs at rest and under stimulation within small (19-121 voxels) clusters designed to include only voxels driven by the stimulus as highly and uniformly as possible. We found consistent and significant increases (2-4% on average) in temporal information entropy during activation in contralateral but not ipsilateral V1, which was mirrored by an expected loss of spatial information entropy. These opposite changes coexisted with increases in both spatial and temporal mutual information (i.e. dependence) in contralateral V1. Thus, we showed that the first cortical stage of visual processing is characterized by a specific spatiotemporal rearrangement of intracluster BOLD responses. Our results indicate that while in the space domain BOLD maps may be incapable of capturing the functional specialization of small neuronal populations due to relatively low spatial resolution, some information encoding may still be revealed in the temporal domain by an increase of temporal information entropy.
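As a rough illustration of the entropy measure used above, the following sketch estimates the Shannon entropy of a signal from a histogram of its amplitudes. The binning scheme, the default of 8 bins, and the function name `shannon_entropy` are assumptions made for this example; the abstract does not describe how the authors estimated the BOLD signal distributions.

```python
import math


def shannon_entropy(samples, n_bins=8):
    """Shannon entropy (in bits) of the amplitude histogram of `samples`.

    Assumption: equal-width bins between the minimum and maximum sample;
    this is one simple estimator, not necessarily the study's method.
    """
    lo, hi = min(samples), max(samples)
    if hi == lo:
        return 0.0  # a constant signal carries no information
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in samples:
        # Clamp so the maximum sample falls into the last bin.
        k = min(int((x - lo) / width), n_bins - 1)
        counts[k] += 1
    n = len(samples)
    return -sum(c / n * math.log2(c / n) for c in counts if c)


if __name__ == "__main__":
    rest = [0.0, 0.1, 0.0, 0.1, 0.0, 0.1]
    task = [0.0, 0.4, 0.9, 0.2, 0.7, 1.0]
    print(shannon_entropy(rest), shannon_entropy(task))
```

Comparing the value computed over rest epochs with the value computed over stimulation epochs gives the kind of relative entropy change the abstract reports.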
Visualization Techniques for Computer Network Defense
DOE Office of Scientific and Technical Information (OSTI.GOV)
Beaver, Justin M; Steed, Chad A; Patton, Robert M
2011-01-01
Effective visual analysis of computer network defense (CND) information is challenging due to the volume and complexity of both the raw and analyzed network data. A typical CND is comprised of multiple niche intrusion detection tools, each of which performs network data analysis and produces a unique alerting output. The state-of-the-practice in the situational awareness of CND data is the prevalent use of custom-developed scripts by Information Technology (IT) professionals to retrieve, organize, and understand potential threat events. We propose a new visual analytics framework, called the Oak Ridge Cyber Analytics (ORCA) system, for CND data that allows an operator to interact with all detection tool outputs simultaneously. Aggregated alert events are presented in multiple coordinated views with timeline, cluster, and swarm model analysis displays. These displays are complemented with both supervised and semi-supervised machine learning classifiers. The intent of the visual analytics framework is to improve CND situational awareness, to enable an analyst to quickly navigate and analyze thousands of detected events, and to combine sophisticated data analysis techniques with interactive visualization such that patterns of anomalous activities may be more easily identified and investigated.
A Unified Air-Sea Visualization System: Survey on Gridding Structures
NASA Technical Reports Server (NTRS)
Anand, Harsh; Moorhead, Robert
1995-01-01
The goal is to develop a Unified Air-Sea Visualization System (UASVS) to enable the rapid fusion of observational, archival, and model data for verification and analysis. To design and develop UASVS, modelers were polled to determine the gridding structures and visualization systems used, and their needs with respect to visual analysis. A basic UASVS requirement is to allow a modeler to explore multiple data sets within a single environment, or to interpolate multiple datasets onto one unified grid. According to this survey, the UASVS should be able to visualize 3D scalar/vector fields; render isosurfaces; visualize arbitrary slices of the 3D data; visualize data defined on spectral element grids with the minimum number of interpolation stages; render contours; produce 3D vector plots and streamlines; provide unified visualization of satellite images, observations and model output overlays; display the visualization on a projection of the user's choice; implement functions so the user can derive diagnostic values; animate the data to see the time-evolution; animate ocean and atmosphere at different rates; store the record of cursor movement, smooth the path, and animate a window around the moving path; repeatedly start and stop the visual time-stepping; generate VHS tape animations; work on a variety of workstations; and allow visualization across clusters of workstations and scalable high performance computer systems.
Pienaar, A E; Barhorst, R; Twisk, J W R
2014-05-01
Perceptual-motor skills contribute to a variety of basic learning skills associated with normal academic success. This study aimed to determine the relationship between academic performance and perceptual-motor skills in first grade South African learners and whether low SES (socio-economic status) school type plays a role in such a relationship. This cross-sectional study of the baseline measurements of the NW-CHILD longitudinal study included a stratified random sample of first grade learners (n = 812; 418 boys and 394 girls), with a mean age of 6.78 years ± 0.49 living in the North West Province (NW) of South Africa. The Beery-Buktenica Developmental Test of Visual-Motor Integration-4 (VMI) was used to assess visual-motor integration, visual perception and hand control while the Bruininks-Oseretsky Test of Motor Proficiency, short form (BOT2-SF) assessed overall motor proficiency. Academic performance in math, reading and writing was assessed with the Mastery of Basic Learning Areas Questionnaire. Linear mixed models analysis was performed with SPSS to determine possible differences between the different VMI and BOT2-SF standard scores in different math, reading and writing mastery categories ranging from no mastery to outstanding mastery. A multinomial multilevel logistic regression analysis was performed to assess the relationship between a clustered score of academic performance and the different determinants. A strong relationship was established between academic performance and VMI, visual perception, hand control and motor proficiency with a significant relationship between a clustered academic performance score, visual-motor integration and visual perception. A negative association was established between low SES school type and academic performance, with a common perceptual motor foundation shared by all basic learning areas. 
Visual-motor integration, visual perception, hand control and motor proficiency are closely related to basic academic skills required in the first formal school year, especially among learners in low SES type schools. © 2013 John Wiley & Sons Ltd.
Oberle, Michael; Wohlwend, Nadia; Jonas, Daniel; Maurer, Florian P; Jost, Geraldine; Tschudin-Sutter, Sarah; Vranckx, Katleen; Egli, Adrian
2016-01-01
The technical, biological, and inter-center reproducibility of matrix-assisted laser desorption ionization-time of flight mass spectrometry (MALDI TOF MS) typing data has not yet been explored. The aim of this study is to compare typing data from multiple centers employing bioinformatics using bacterial strains from two past outbreaks and non-related strains. Participants received twelve extended spectrum betalactamase-producing E. coli isolates and followed the same standard operating procedure (SOP) including a full-protein extraction protocol. All laboratories provided visually read spectra via flexAnalysis (Bruker, Germany). Raw data from each laboratory allowed calculating the technical and biological reproducibility between centers using BioNumerics (Applied Maths NV, Belgium). Technical and biological reproducibility ranged between 96.8-99.4% and 47.6-94.4%, respectively. The inter-center reproducibility showed a comparable clustering among identical isolates. Principal component analysis indicated a higher tendency to cluster within the same center. Therefore, we used a discriminant analysis, which completely separated the clusters. Next, we defined a reference center and performed a statistical analysis to identify specific peaks to identify the outbreak clusters. Finally, we used a classifier algorithm and a linear support vector machine on the determined peaks as classifier. A validation showed that within the set of the reference center, the identification of the cluster was 100% correct with a large contrast between the score with the correct cluster and the next best scoring cluster. Based on the sufficient technical and biological reproducibility of MALDI-TOF MS based spectra, detection of specific clusters is possible from spectra obtained from different centers. However, we believe that a shared SOP and a bioinformatics approach are required to make the analysis robust and reliable.
Zhang, Wen-Ran
2003-01-01
Bipolar logic, bipolar sets, and equilibrium relations are proposed for bipolar cognitive mapping and visualization in online analytical processing (OLAP) and online analytical mining (OLAM). As cognitive models, cognitive maps (CMs) hold great potential for clustering and visualization. Due to the lack of a formal mathematical basis, however, CM-based OLAP and OLAM have not gained popularity. Compared with existing approaches, bipolar cognitive mapping has a number of advantages. First, bipolar CMs are formal logical models as well as cognitive models. Second, equilibrium relations (with polarized reflexivity, symmetry, and transitivity), as bipolar generalizations and fusions of equivalence relations, provide a theoretical basis for bipolar visualization and coordination. Third, an equilibrium relation or CM induces bipolar partitions that distinguish disjoint coalition subsets not involved in any conflict, disjoint coalition subsets involved in a conflict, disjoint conflict subsets, and disjoint harmony subsets. Finally, equilibrium energy analysis leads to harmony and stability measures for strategic decision and multiagent coordination. Thus, this work bridges a gap for CM-based clustering and visualization in OLAP and OLAM. Basic ideas are illustrated with example CMs in international relations.
Molecular subtyping of bladder cancer using Kohonen self-organizing maps
Borkowska, Edyta M; Kruk, Andrzej; Jedrzejczyk, Adam; Rozniecki, Marek; Jablonowski, Zbigniew; Traczyk, Magdalena; Constantinou, Maria; Banaszkiewicz, Monika; Pietrusinski, Michal; Sosnowski, Marek; Hamdy, Freddie C; Peter, Stefan; Catto, James WF; Kaluzewski, Bogdan
2014-01-01
Kohonen self-organizing maps (SOMs) are unsupervised Artificial Neural Networks (ANNs) that are good for low-density data visualization. They easily deal with complex and nonlinear relationships between variables. We evaluated molecular events that characterize high- and low-grade BC pathways in the tumors from 104 patients. We compared the ability of statistical clustering with a SOM to stratify tumors according to the risk of progression to more advanced disease. In univariable analysis, tumor stage (log rank P = 0.006) and grade (P < 0.001), HPV DNA (P < 0.004), Chromosome 9 loss (P = 0.04) and the A148T polymorphism (rs 3731249) in CDKN2A (P = 0.02) were associated with progression. Multivariable analysis of these parameters identified that tumor grade (Cox regression, P = 0.001, OR 2.9 (95% CI 1.6–5.2)) and the presence of HPV DNA (P = 0.017, OR 3.8 (95% CI 1.3–11.4)) were the only independent predictors of progression. Unsupervised hierarchical clustering grouped the tumors into discrete branches but did not stratify according to progression free survival (log rank P = 0.39). These genetic variables were presented to SOM input neurons. SOMs are suitable for complex data integration, allow easy visualization of outcomes, and may stratify BC progression more robustly than hierarchical clustering. PMID:25142434
RSAT 2015: Regulatory Sequence Analysis Tools.
Medina-Rivera, Alejandra; Defrance, Matthieu; Sand, Olivier; Herrmann, Carl; Castro-Mondragon, Jaime A; Delerce, Jeremy; Jaeger, Sébastien; Blanchet, Christophe; Vincens, Pierre; Caron, Christophe; Staines, Daniel M; Contreras-Moreira, Bruno; Artufel, Marie; Charbonnier-Khamvongsa, Lucie; Hernandez, Céline; Thieffry, Denis; Thomas-Chollier, Morgane; van Helden, Jacques
2015-07-01
RSAT (Regulatory Sequence Analysis Tools) is a modular software suite for the analysis of cis-regulatory elements in genome sequences. Its main applications are (i) motif discovery, appropriate to genome-wide data sets like ChIP-seq, (ii) transcription factor binding motif analysis (quality assessment, comparisons and clustering), (iii) comparative genomics and (iv) analysis of regulatory variations. Nine new programs have been added to the 43 described in the 2011 NAR Web Software Issue, including a tool to extract sequences from a list of coordinates (fetch-sequences from UCSC), novel programs dedicated to the analysis of regulatory variants from GWAS or population genomics (retrieve-variation-seq and variation-scan), a program to cluster motifs and visualize the similarities as trees (matrix-clustering). To deal with the drastic increase of sequenced genomes, RSAT public sites have been reorganized into taxon-specific servers. The suite is well-documented with tutorials and published protocols. The software suite is available through Web sites, SOAP/WSDL Web services, virtual machines and stand-alone programs at http://www.rsat.eu/. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
Duque, Ricardo E
2012-04-01
Flow cytometric analysis of cell suspensions involves the sequential 'registration' of intrinsic and extrinsic parameters of thousands of cells in list mode files. Thus, it is almost irresistible to describe phenomena in numerical terms or by 'ratios' that have the appearance of 'accuracy' due to the presence of numbers obtained from thousands of cells. The concepts involved in the detection and characterization of B cell lymphoproliferative processes are revisited in this paper by identifying parameters that, when analyzed appropriately, are both necessary and sufficient. The neoplastic process (cluster) can be visualized easily because the parameters that distinguish it form a cluster in multidimensional space that is unique and distinguishable from neighboring clusters that are not of diagnostic interest but serve to provide a background. For B cell neoplasia it is operationally necessary to identify the multidimensional space occupied by a cluster whose kappa:lambda ratio is 100:0 or 0:100. Thus, the concept of kappa:lambda ratio is without meaning and would not detect B cell neoplasia in an unacceptably high number of cases.
Specialized Computer Systems for Environment Visualization
NASA Astrophysics Data System (ADS)
Al-Oraiqat, Anas M.; Bashkov, Evgeniy A.; Zori, Sergii A.
2018-06-01
The need for real-time image generation of landscapes arises in various fields as part of tasks solved by virtual and augmented reality systems, as well as geographic information systems. Such systems provide opportunities for collecting, storing, analyzing and graphically visualizing geographic data. Algorithmic and hardware-software tools for increasing the realism and efficiency of environment visualization in 3D visualization systems are proposed. This paper discusses a modified path tracing algorithm with a two-level hierarchy of bounding volumes and intersection tests against axis-aligned bounding boxes. The proposed algorithm eliminates branching and is therefore better suited to implementation on multi-threaded CPUs and GPUs. A modified ROAM algorithm is used for high-quality visualization of reliefs and landscapes. The algorithm is implemented on parallel systems: clusters and Compute Unified Device Architecture (CUDA) networks. Results show that the implementation on MPI clusters is more efficient than Graphics Processing Unit/Graphics Processing Clusters and allows real-time synthesis. The organization and algorithms of the parallel GPU system for 3D pseudo stereo image/video synthesis are proposed. After analyzing the feasibility of implementing each stage on a parallel GPU architecture, 3D pseudo stereo synthesis is performed. An experimental prototype of a specialized hardware-software system for 3D pseudo stereo imaging and video was developed on the CPU/GPU. The experimental results show that the proposed adaptation of 3D pseudo stereo imaging to the architecture of GPU systems is efficient. It also accelerates the computational procedures of 3D pseudo-stereo synthesis for the anaglyph and anamorphic formats of the 3D stereo frame without performing optimization procedures. The acceleration is on average 11 and 54 times for test GPUs.
A ground truth based comparative study on clustering of gene expression data.
Zhu, Yitan; Wang, Zuyi; Miller, David J; Clarke, Robert; Xuan, Jianhua; Hoffman, Eric P; Wang, Yue
2008-05-01
Given the variety of available clustering methods for gene expression data analysis, it is important to develop an appropriate and rigorous validation scheme to assess the performance and limitations of the most widely used clustering algorithms. In this paper, we present a ground truth based comparative study on the functionality, accuracy, and stability of five data clustering methods, namely hierarchical clustering, K-means clustering, self-organizing maps, standard finite normal mixture fitting, and a caBIG toolkit (VIsual Statistical Data Analyzer--VISDA), tested on sample clustering of seven published microarray gene expression datasets and one synthetic dataset. We examined the performance of these algorithms in both data-sufficient and data-insufficient cases using quantitative performance measures, including cluster number detection accuracy and mean and standard deviation of partition accuracy. The experimental results showed that VISDA, an interactive coarse-to-fine maximum likelihood fitting algorithm, is a solid performer on most of the datasets, while K-means clustering and self-organizing maps optimized by the mean squared compactness criterion generally produce more stable solutions than the other methods.
Determining the trophic guilds of fishes and macroinvertebrates in a seagrass food web
Luczkovich, J.J.; Ward, G.P.; Johnson, J.C.; Christian, R.R.; Baird, D.; Neckles, H.; Rizzo, W.M.
2002-01-01
We established trophic guilds of macroinvertebrate and fish taxa using correspondence analysis and a hierarchical clustering strategy for a seagrass food web in winter in the northeastern Gulf of Mexico. To create the diet matrix, we characterized the trophic linkages of macroinvertebrate and fish taxa present in Halodule wrightii seagrass habitat areas within the St. Marks National Wildlife Refuge (Florida) using binary data, combining dietary links obtained from relevant literature for macroinvertebrates with stomach analysis of common fishes collected during January and February of 1994. Hierarchical average-linkage cluster analysis of the 73 taxa of fishes and macroinvertebrates in the diet matrix yielded 14 clusters with diet similarity ≥ 0.60. We then used correspondence analysis with three factors to jointly plot the coordinates of the consumers (identified by cluster membership) and of the 33 food sources. Correspondence analysis served as a visualization tool for assigning each taxon to one of eight trophic guilds: herbivores, detritivores, suspension feeders, omnivores, molluscivores, meiobenthos consumers, macrobenthos consumers, and piscivores. These trophic groups, cross-classified with major taxonomic groups, were further used to develop consumer compartments in a network analysis model of carbon flow in this seagrass ecosystem. The method presented here should greatly improve the development of future network models of food webs by providing an objective procedure for aggregating trophic groups.
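The clustering step described above, average-linkage agglomeration of a binary diet matrix cut at a similarity threshold, can be sketched as follows. This is a minimal illustration under stated assumptions: the abstract does not name the similarity coefficient, so Jaccard similarity is assumed, and the taxa, food sources, and function names are all hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets of food-source labels (assumed metric)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def average_linkage(diets, threshold=0.60):
    """Merge clusters until no pair has mean pairwise similarity >= threshold.

    `diets` maps a taxon name to the set of food sources it consumes,
    i.e. one row of a binary diet matrix.
    """
    clusters = [[name] for name in diets]

    def similarity(c1, c2):
        # Average linkage: mean similarity over all cross-cluster pairs.
        pairs = [(x, y) for x in c1 for y in c2]
        return sum(jaccard(diets[x], diets[y]) for x, y in pairs) / len(pairs)

    while len(clusters) > 1:
        best, bi, bj = max(
            (similarity(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if best < threshold:
            break
        merged = clusters.pop(bj)  # bj > bi, so index bi stays valid
        clusters[bi].extend(merged)
    return clusters


if __name__ == "__main__":
    # Hypothetical taxa and food sources, for illustration only.
    diets = {
        "grazer_1": {"seagrass", "epiphytes"},
        "grazer_2": {"seagrass", "epiphytes"},
        "predator": {"small_fish"},
    }
    print(average_linkage(diets))
```

The quadratic search over cluster pairs is acceptable for a matrix of about 73 taxa; larger data sets would call for a library implementation.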
NASA Astrophysics Data System (ADS)
Forbes, Angus; Villegas, Javier; Almryde, Kyle R.; Plante, Elena
2014-03-01
In this paper, we present a novel application, 3D+Time Brain View, for the stereoscopic visualization of functional Magnetic Resonance Imaging (fMRI) data gathered from participants exposed to unfamiliar spoken languages. An analysis technique based on Independent Component Analysis (ICA) is used to identify statistically significant clusters of brain activity and their changes over time during different testing sessions. That is, our system illustrates the temporal evolution of participants' brain activity as they are introduced to a foreign language through displaying these clusters as they change over time. The raw fMRI data is presented as a stereoscopic pair in an immersive environment utilizing passive stereo rendering. The clusters are presented using a ray casting technique for volume rendering. Our system incorporates the temporal information and the results of the ICA into the stereoscopic 3D rendering, making it easier for domain experts to explore and analyze the data.
Dinkel, Philipp Johannes; Willmes, Klaus; Krinzinger, Helga; Konrad, Kerstin; Koten Jr, Jan Willem
2013-01-01
FMRI-studies are mostly based on a group study approach, either analyzing one group or comparing multiple groups, or on approaches that correlate brain activation with clinically relevant criteria or behavioral measures. In this study we investigate the potential of fMRI-techniques focusing on individual differences in brain activation within a test-retest reliability context. We employ a single-case analysis approach, which contrasts dyscalculic children with a control group of typically developing children. In a second step, a support-vector machine analysis and cluster analysis techniques served to investigate similarities in multivariate brain activation patterns. Children were confronted with a non-symbolic number comparison and a non-symbolic exact calculation task during fMRI acquisition. Conventional second level group comparison analysis only showed small differences around the angular gyrus bilaterally and the left parieto-occipital sulcus. Analyses based on single-case statistical procedures revealed that developmental dyscalculia is characterized by individual differences predominantly in visual processing areas. Dyscalculic children seemed to compensate for relative under-activation in the primary visual cortex through an upregulation in higher visual areas. However, overlap in deviant activation was low for the dyscalculic children, indicating that developmental dyscalculia is a disorder characterized by heterogeneous brain activation differences. Using support vector machine analysis and cluster analysis, we tried to group dyscalculic and typically developing children according to brain activation. Fronto-parietal systems seem to qualify for a distinction between the two groups. However, this was only effective when reliable brain activations of both tasks were employed simultaneously. 
Results suggest that deficits in number representation in the visual-parietal cortex get compensated for through finger related aspects of number representation in fronto-parietal cortex. We conclude that dyscalculic children show large individual differences in brain activation patterns. Nonetheless, the majority of dyscalculic children can be differentiated from controls employing brain activation patterns when appropriate methods are used. PMID:24349547
Gardeux, Vincent; David, Fabrice P. A.; Shajkofci, Adrian; Schwalie, Petra C.; Deplancke, Bart
2017-01-01
Motivation: Single-cell RNA-sequencing (scRNA-seq) allows whole transcriptome profiling of thousands of individual cells, enabling the molecular exploration of tissues at the cellular level. Such analytical capacity is of great interest to many research groups in the world, yet these groups often lack the expertise to handle complex scRNA-seq datasets. Results: We developed a fully integrated, web-based platform aimed at the complete analysis of scRNA-seq data post genome alignment: from the parsing, filtering and normalization of the input count data files, to the visual representation of the data, identification of cell clusters, differentially expressed genes (including cluster-specific marker genes), and functional gene set enrichment. This Automated Single-cell Analysis Pipeline (ASAP) combines a wide range of commonly used algorithms with sophisticated visualization tools. Compared with existing scRNA-seq analysis platforms, researchers (including those lacking computational expertise) are able to interact with the data in a straightforward fashion and in real time. Furthermore, given the overlap between scRNA-seq and bulk RNA-seq analysis workflows, ASAP should conceptually be broadly applicable to any RNA-seq dataset. As a validation, we demonstrate how we can use ASAP to simply reproduce the results from a single-cell study of 91 mouse cells involving five distinct cell types. Availability and implementation: The tool is freely available at asap.epfl.ch and R/Python scripts are available at github.com/DeplanckeLab/ASAP. Contact: bart.deplancke@epfl.ch Supplementary information: Supplementary data are available at Bioinformatics online. PMID:28541377
A novel unsupervised spike sorting algorithm for intracranial EEG.
Yadav, R; Shah, A K; Loeb, J A; Swamy, M N S; Agarwal, R
2011-01-01
This paper presents a novel, unsupervised spike classification algorithm for intracranial EEG. The method combines template matching and principal component analysis (PCA) for building a dynamic patient-specific codebook without a priori knowledge of the spike waveforms. The problem of misclassification due to overlapping classes is resolved by identifying similar classes in the codebook using hierarchical clustering. Cluster quality is visually assessed by projecting inter- and intra-clusters onto a 3D plot. Intracranial EEG from 5 patients was utilized to optimize the algorithm. The resulting codebook retains 82.1% of the detected spikes in non-overlapping and disjoint clusters. Initial results suggest a definite role of this method for both rapid review and quantitation of interictal spikes that could enhance both clinical treatment and research studies on epileptic patients.
Liao, Fuyuan; Jan, Yih-Kuen
2012-06-01
This paper presents a recurrence network approach for the analysis of skin blood flow dynamics in response to loading pressure. Recurrence is a fundamental property of many dynamical systems, which can be explored in phase spaces constructed from observational time series. A visualization tool of recurrence analysis called recurrence plot (RP) has been proved to be highly effective to detect transitions in the dynamics of the system. However, it was found that delay embedding can produce spurious structures in RPs. Network-based concepts have been applied for the analysis of nonlinear time series recently. We demonstrate that time series with different types of dynamics exhibit distinct global clustering coefficients and distributions of local clustering coefficients and that the global clustering coefficient is robust to the embedding parameters. We applied the approach to study skin blood flow oscillations (BFO) response to loading pressure. The results showed that global clustering coefficients of BFO significantly decreased in response to loading pressure (p<0.01). Moreover, surrogate tests indicated that such a decrease was associated with a loss of nonlinearity of BFO. Our results suggest that the recurrence network approach can practically quantify the nonlinear dynamics of BFO.
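The recurrence network idea above can be sketched in a few lines: embed the time series, link states that lie within a distance eps of each other, and summarize the resulting graph by a clustering coefficient. The sketch below rests on several assumptions not stated in the abstract: the maximum norm is used for distances, the global clustering coefficient is taken as the mean of the local coefficients, and all function names and parameter defaults are illustrative.

```python
from itertools import combinations


def embed(series, dim=2, delay=1):
    """Delay-embed a scalar series into `dim`-dimensional state vectors."""
    n = len(series) - (dim - 1) * delay
    return [tuple(series[i + k * delay] for k in range(dim)) for i in range(n)]


def recurrence_network(states, eps):
    """Adjacency sets: two states are linked if closer than eps (max norm)."""
    adj = [set() for _ in states]
    for i, j in combinations(range(len(states)), 2):
        if max(abs(a - b) for a, b in zip(states[i], states[j])) < eps:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def global_clustering(adj):
    """Mean local clustering coefficient; nodes of degree < 2 contribute 0."""
    if not adj:
        return 0.0
    total = 0.0
    for nbrs in adj:
        k = len(nbrs)
        if k < 2:
            continue
        # Count links among the neighbours of this node.
        links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


if __name__ == "__main__":
    series = [0.0, 0.5, 1.0, 0.5, 0.0, 0.5, 1.0, 0.5, 0.0]
    adj = recurrence_network(embed(series), eps=0.6)
    print(global_clustering(adj))
```

Self-loops are excluded by construction, which matches the usual practice of dropping the main diagonal of the recurrence matrix before treating it as an adjacency matrix.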
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rizzi, Silvio; Hereld, Mark; Insley, Joseph
In this work we perform in-situ visualization of molecular dynamics simulations, which can help scientists to visualize simulation output on-the-fly, without incurring storage overheads. We present a case study to couple LAMMPS, the large-scale molecular dynamics simulation code with vl3, our parallel framework for large-scale visualization and analysis. Our motivation is to identify effective approaches for covisualization and exploration of large-scale atomistic simulations at interactive frame rates. We propose a system of coupled libraries and describe its architecture, with an implementation that runs on GPU-based clusters. We present the results of strong and weak scalability experiments, as well as future research avenues based on our results.
Heyers, Dominik; Manns, Martina; Luksch, Harald; Güntürkün, Onur; Mouritsen, Henrik
2007-09-26
The magnetic compass of migratory birds has been suggested to be light-dependent. Retinal cryptochrome-expressing neurons and a forebrain region, "Cluster N", show high neuronal activity when night-migratory songbirds perform magnetic compass orientation. By combining neuronal tracing with behavioral experiments leading to sensory-driven gene expression of the neuronal activity marker ZENK during magnetic compass orientation, we demonstrate a functional neuronal connection between the retinal neurons and Cluster N via the visual thalamus. Thus, the two areas of the central nervous system being most active during magnetic compass orientation are part of an ascending visual processing stream, the thalamofugal pathway. Furthermore, Cluster N seems to be a specialized part of the visual wulst. These findings strongly support the hypothesis that migratory birds use their visual system to perceive the reference compass direction of the geomagnetic field and that migratory birds "see" the reference compass direction provided by the geomagnetic field.
Integrated Efforts for Analysis of Geophysical Measurements and Models.
1997-09-26
This contract supported investigations of integrated applications of physics, ephemerides...REGIONS AND GPS DATA VALIDATIONS 20 2.5 PL-SCINDA: VISUALIZATION AND ANALYSIS TECHNIQUES 22 2.5.1 View Controls 23 2.5.2 Map Selection...and IR data, about cloudy pixels. Clustering and maximum likelihood classification algorithms categorize up to four cloud layers into stratiform or
Danaci, Hasan Fehmi; Cetin-Atalay, Rengul; Atalay, Volkan
2018-03-26
Visualizing large-scale data produced by high-throughput experiments as a biological graph leads to better understanding and analysis. This study describes a customized force-directed layout algorithm, EClerize, for biological graphs that represent pathways in which the nodes are associated with Enzyme Commission (EC) attributes. Nodes with the same EC class number are treated as members of the same cluster. Positions of nodes are then determined based on both the biological similarity and the connection structure. EClerize minimizes the intra-cluster distance, that is, the distance between nodes of the same EC cluster, and maximizes the inter-cluster distance, that is, the distance between two distinct EC clusters. EClerize is tested on a number of biological pathways, and the improvement over the original algorithm is presented. EClerize is available as a plug-in for Cytoscape (http://apps.cytoscape.org/apps/eclerize).
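The two quantities that the layout trades off can be measured on any finished layout. The sketch below is only an illustration under stated assumptions: the function name, the dictionary inputs, and the use of cluster centroids for the inter-cluster distance are hypothetical choices, not part of EClerize or the Cytoscape API.

```python
import itertools
import math


def intra_inter_distances(positions, clusters):
    """Return (mean intra-cluster distance, mean inter-cluster centroid distance).

    positions: node -> (x, y) layout coordinates (hypothetical input format)
    clusters:  node -> cluster label, e.g. an EC class number
    """
    members = {}
    for node, label in clusters.items():
        members.setdefault(label, []).append(positions[node])

    # Distances between every pair of nodes that share a cluster.
    intra = [
        math.dist(p, q)
        for pts in members.values()
        for p, q in itertools.combinations(pts, 2)
    ]

    # Distances between the centroids of every pair of distinct clusters.
    centroids = [
        (sum(x for x, _ in pts) / len(pts), sum(y for _, y in pts) / len(pts))
        for pts in members.values()
    ]
    inter = [math.dist(a, b) for a, b in itertools.combinations(centroids, 2)]

    def mean(values):
        return sum(values) / len(values) if values else 0.0

    return mean(intra), mean(inter)
```

A layout that groups nodes by cluster should show a small first value relative to the second.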
Cluster: A New Application for Spatial Analysis of Pixelated Data for Epiphytotics.
Nelson, Scot C; Corcoja, Iulian; Pethybridge, Sarah J
2017-12-01
Spatial analysis of epiphytotics is essential to develop and test hypotheses about pathogen ecology, disease dynamics, and to optimize plant disease management strategies. Data collection for spatial analysis requires substantial investment in time to depict patterns in various frames and hierarchies. We developed a new approach for spatial analysis of pixelated data in digital imagery and incorporated the method in a stand-alone desktop application called Cluster. The user isolates target entities (clusters) by designating up to 24 pixel colors as nontargets and moves a threshold slider to visualize the targets. The app calculates the percent area occupied by targeted pixels, identifies the centroids of targeted clusters, and computes the relative compass angle of orientation for each cluster. Users can deselect anomalous clusters manually and/or automatically by specifying a size threshold value to exclude smaller targets from the analysis. Up to 1,000 stochastic simulations randomly place the centroids of each cluster in ranked order of size (largest to smallest) within each matrix while preserving their calculated angles of orientation for the long axes. A two-tailed probability t test compares the mean inter-cluster distances for the observed versus the values derived from randomly simulated maps. This is the basis for statistical testing of the null hypothesis that the clusters are randomly distributed within the frame of interest. These frames can assume any shape, from natural (e.g., leaf) to arbitrary (e.g., a rectangular or polygonal field). Cluster summarizes normalized attributes of clusters, including pixel number, axis length, axis width, compass orientation, and the length/width ratio, available to the user as a downloadable spreadsheet. Each simulated map may be saved as an image and inspected. 
Provided examples demonstrate the utility of Cluster to analyze patterns at various spatial scales in plant pathology and ecology and highlight the limitations, trade-offs, and considerations for the sensitivities of variables and the biological interpretations of results. The Cluster app is available as a free download for Apple computers at iTunes, with a link to a user guide website.
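The randomization idea described in this abstract can be sketched as follows. This is a simplified illustration, not the Cluster app's procedure: it places centroids uniformly in a rectangular frame, ignores cluster size ranking and orientation angles, and uses a Monte Carlo two-sided p-value in place of the t test mentioned above. All function and parameter names are assumptions.

```python
import itertools
import math
import random


def mean_pairwise_distance(points):
    """Mean Euclidean distance over all pairs of points."""
    pairs = list(itertools.combinations(points, 2))
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)


def randomization_test(centroids, width, height, n_sim=1000, seed=0):
    """Compare the observed mean inter-centroid distance with random placements.

    Returns (observed mean distance, two-sided Monte Carlo p-value) under the
    null hypothesis that centroids are uniformly distributed in the frame.
    """
    rng = random.Random(seed)
    observed = mean_pairwise_distance(centroids)
    simulated = []
    for _ in range(n_sim):
        pts = [(rng.uniform(0, width), rng.uniform(0, height)) for _ in centroids]
        simulated.append(mean_pairwise_distance(pts))
    n_low = sum(s <= observed for s in simulated)
    n_high = sum(s >= observed for s in simulated)
    # Add-one correction keeps the estimated p-value away from exactly zero.
    p_value = min(1.0, 2 * (min(n_low, n_high) + 1) / (n_sim + 1))
    return observed, p_value
```

A small p-value suggests the observed centroids are either more aggregated or more dispersed than random placement would produce.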
Diao, K; Farmani, R; Fu, G; Astaraie-Imani, M; Ward, S; Butler, D
2014-01-01
Large water distribution systems (WDSs) are networks with both topological and behavioural complexity. It is therefore usually difficult to identify the key features of the system's properties, and subsequently all the critical components within the system for a given purpose of design or control. One way to address this is to visualize the network structure and the interactions between components more explicitly by dividing a WDS into a number of clusters (subsystems). Accordingly, this paper introduces a clustering strategy that decomposes WDSs into clusters with stronger internal connections than external connections. The detected cluster layout is very similar to the community structure of the served urban area. As WDSs may expand along with urban development in a community-by-community manner, the correspondingly formed distribution clusters may reveal some crucial configurations of WDSs. For verification, the method is applied to identify all the critical links during firefighting for the vulnerability analysis of a real-world WDS. Moreover, both the most critical pipes and clusters are addressed, given the consequences of pipe failure. Compared with the enumeration method, the method used in this study identifies the same group of the most critical components, and provides similar criticality prioritizations of them in a more computationally efficient manner.
NASA Astrophysics Data System (ADS)
Huang, W.; Campredon, R.; Abrao, J. J.; Bernat, M.; Latouche, C.
1994-06-01
In the last decade, the Atlantic coast of south-eastern Brazil has been affected by increasing deforestation and anthropogenic effluents. Sediments in the coastal lagoons have recorded the process of such environmental change. Thirty-seven sediment samples from three cores in Piratininga Lagoon, Rio de Janeiro, were analyzed for their major components and minor element concentrations in order to examine geochemical characteristics and the depositional environment and to investigate the variation of heavy metals of environmental concern. Two multivariate analysis methods, principal component analysis and cluster analysis, were performed on the analytical data set to help visualize the sample clusters and the element associations. On the whole, the sediment samples from each core are similar and the sample clusters corresponding to the three cores are clearly separated, as a result of the different conditions of sedimentation. Some changes in the depositional environment are recognized using the results of multivariate analysis. The enrichment of Pb, Cu, and Zn in the upper parts of cores is in agreement with increasing anthropogenic influx (pollution).
Jayashree, B; Rajgopal, S; Hoisington, D; Prasanth, V P; Chandra, S
2008-09-24
Structure is a widely used software tool to investigate population genetic structure with multi-locus genotyping data. The software uses an iterative algorithm to group individuals into "K" clusters, representing possibly K genetically distinct subpopulations. The serial implementation of this programme is processor-intensive even with small datasets. We describe an implementation of the program within a parallel framework. Speedup was achieved by running different replicates and values of K on each node of the cluster. A web-based user-oriented GUI has been implemented in PHP, through which the user can specify input parameters for the programme. The number of processors to be used can be specified in the background command. A web-based visualization tool "Visualstruct", written in PHP (with embedded HTML and JavaScript), allows for the graphical display of population clusters output from Structure, where each individual may be visualized as a line segment with K colors defining its possible genomic composition with respect to the K genetic sub-populations. The advantage over available programs is in the increased number of individuals that can be visualized. The analyses of real datasets indicate a speedup of up to four, when comparing the speed of execution on clusters of eight processors with the speed of execution on one desktop. The software package is freely available to interested users upon request.
Zhang, Zhi-Guo; Song, Chang-Heng; Zhang, Fang-Zhen; Chen, Yan-Jing; Xiang, Li-Hua; Xiao, Gary Guishan; Ju, Da-Hong
2016-06-01
Rhizoma Dioscoreae extract (RDE) exhibits a protective effect on alveolar bone loss in ovariectomized (OVX) rats. The aim of this study was to predict the pathways or targets that are regulated by RDE, by re‑assessing our previously reported data and conducting a protein‑protein interaction (PPI) network analysis. In total, 383 differentially expressed genes (≥3‑fold) between alveolar bone samples from the RDE and OVX group rats were identified, and a PPI network was constructed based on these genes. Furthermore, four molecular clusters (A‑D) in the PPI network with the smallest P‑values were detected by the molecular complex detection (MCODE) algorithm. Using the Database for Annotation, Visualization and Integrated Discovery (DAVID) and Ingenuity Pathway Analysis (IPA) tools, two molecular clusters (A and B) were enriched for biological process in Gene Ontology (GO). Only cluster A was associated with biological pathways in the IPA database. GO and pathway analysis results showed that cluster A, associated with cell cycle regulation, was the most important molecular cluster in the PPI network. In addition, cyclin‑dependent kinase 1 (CDK1) may be a key molecule achieving the cell‑cycle‑regulatory function of cluster A. From the PPI network analysis, it was predicted that delayed cell cycle progression in excessive alveolar bone remodeling via downregulation of CDK1 may be another mechanism underlying the anti‑osteopenic effect of RDE on alveolar bone.
Verbal and Visual Memory Impairments in Bipolar I and II Disorder.
Ha, Tae Hyon; Kim, Ji Sun; Chang, Jae Seung; Oh, Sung Hee; Her, Ju Young; Cho, Hyun Sang; Park, Tae Sung; Shin, Soon Young; Ha, Kyooseob
2012-12-01
To compare verbal and visual memory performances between patients with bipolar I disorder (BD I) and patients with bipolar II disorder (BD II) and to determine whether memory deficits were mediated by impaired organizational strategies. Performances on the Korean-California Verbal Learning Test (K-CVLT) and the Rey-Osterrieth Complex Figure Test (ROCF) in 37 patients with BD I, 46 patients with BD II and 42 healthy subjects were compared. Mediating effects of impaired organization strategies on poor delayed recall were tested by comparing direct and mediated models using multiple regression analysis. Both patient groups recalled fewer words and figure components and showed lower Semantic Clustering compared to controls. Verbal memory impairment was partly mediated by difficulties in Semantic Clustering in both subtypes, whereas the mediating effect of Organization deficit on the visual memory impairment was present only in BD I. In all mediated models, group differences in delayed recall remained significant. Our findings suggest that memory impairment may be one of the fundamental cognitive deficits in bipolar disorders and that executive dysfunctions can exert an additional influence on memory impairments.
Dynamic analysis and pattern visualization of forest fires.
Lopes, António M; Tenreiro Machado, J A
2014-01-01
This paper analyses forest fires from the perspective of dynamical systems. Forest fires exhibit complex correlations in size, space and time, revealing features often present in complex systems, such as the absence of a characteristic length-scale, or the emergence of long range correlations and persistent memory. This study addresses a public domain forest fires catalogue, containing information on events for Portugal, during the period from 1980 up to 2012. The data is analysed on an annual basis, modelling the occurrences as sequences of Dirac impulses with amplitude proportional to the burnt area. First, we consider mutual information to correlate annual patterns. We use visualization trees, generated by hierarchical clustering algorithms, in order to compare and to extract relationships among the data. Second, we adopt the Multidimensional Scaling (MDS) visualization tool. MDS generates maps where each object corresponds to a point. Objects that are perceived to be similar to each other are placed on the map forming clusters. The results are analysed in order to extract relationships among the data and to identify forest fire patterns. PMID:25137393
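To make the MDS step concrete, here is a minimal sketch of classical (Torgerson) MDS, which turns a matrix of pairwise dissimilarities into point coordinates. The abstract does not say which MDS variant or dissimilarity measure was used, so the classical variant, the function name, and the NumPy implementation are assumptions made for illustration.

```python
import numpy as np


def classical_mds(dist, n_components=2):
    """Embed objects as points whose Euclidean distances approximate `dist`.

    dist: symmetric (n, n) array of pairwise dissimilarities.
    Returns an (n, n_components) array of coordinates.
    """
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # Double-centred squared distances give the Gram matrix of the points.
    gram = -0.5 * centering @ (d ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:n_components]
    # Negative eigenvalues (non-Euclidean input) are clipped to zero.
    top_vals = np.clip(eigvals[order], 0.0, None)
    return eigvecs[:, order] * np.sqrt(top_vals)
```

Objects with small mutual dissimilarities end up close together on the resulting map, which is what makes clusters visible by eye.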
Treelink: data integration, clustering and visualization of phylogenetic trees.
Allende, Christian; Sohn, Erik; Little, Cedric
2015-12-29
Phylogenetic trees are central to a wide range of biological studies. In many of these studies, tree nodes need to be associated with a variety of attributes. For example, in studies concerned with viral relationships, tree nodes are associated with epidemiological information, such as location, age and subtype. Gene trees used in comparative genomics are usually linked with taxonomic information, such as functional annotations and events. A wide variety of tree visualization and annotation tools have been developed in the past; however, none of them are intended for an integrative and comparative analysis. Treelink is platform-independent software for linking datasets and sequence files to phylogenetic trees. The application allows an automated integration of datasets to trees for operations such as classifying a tree based on a field or showing the distribution of selected data attributes in branches and leaves. Genomic and proteomic sequences can also be linked to the tree and extracted from internal and external nodes. A novel clustering algorithm to simplify trees and display the most divergent clades was also developed, where validation can be achieved using the data integration and classification function. Integrated geographical information allows ancestral character reconstruction for phylogeographic plotting based on parsimony and likelihood algorithms. Our software can successfully integrate phylogenetic trees with different data sources, and perform operations to differentiate and visualize those differences within a tree. File support includes the most popular formats such as newick and csv. Exporting visualizations as images, cluster outputs and genomic sequences is supported. Treelink is available as a web and desktop application at http://www.treelinkapp.com.
Automated atlas-based clustering of white matter fiber tracts from DTMRI.
Maddah, Mahnaz; Mewes, Andrea U J; Haker, Steven; Grimson, W Eric L; Warfield, Simon K
2005-01-01
A new framework is presented for clustering fiber tracts into anatomically known bundles. This work is motivated by medical applications in which variation analysis of known bundles of fiber tracts in the human brain is desired. To include the anatomical knowledge in the clustering, we invoke an atlas of fiber tracts, labeled by the number of bundles of interest. In this work, we construct such an atlas and use it to cluster all fiber tracts in the white matter. To build the atlas, we start with a set of labeled ROIs specified by an expert and extract the fiber tracts initiating from each ROI. Affine registration is used to project the extracted fiber tracts of each subject to the atlas, whereas their B-spline representation is used to efficiently compare them to the fiber tracts in the atlas and assign cluster labels. Expert visual inspection of the result confirms that the proposed method is very promising and efficient in clustering of the known bundles of fiber tracts.
Serrano, M G; Camargo, E P; Teixeira, M M
1999-01-01
The random amplification of polymorphic DNA was used for easy, quick and sensitive assessment of genetic polymorphism within Phytomonas to discriminate isolates and determine genetic relationships within the genus. We examined 48 Phytomonas spp., 31 isolates from plants and 17 from insects, from different geographic regions. Topology of the dendrogram based on randomly amplified polymorphic DNA fingerprints segregated the Phytomonas spp. into 5 main clusters, despite the high genetic variability within this genus. Similar clustering could also be obtained by both visual and cross-hybridization analysis of randomly amplified synapomorphic DNA fragments. There was some concordance between the genetic relationship of isolates and their plant tissue tropism. Moreover, Phytomonas spp. from plants and insects were grouped according to geographic origin, thus revealing a complex structure of this taxon comprising several clusters of very closely related organisms.
González-Calabozo, Jose M; Valverde-Albacete, Francisco J; Peláez-Moreno, Carmen
2016-09-15
Gene Expression Data (GED) analysis poses a great challenge to the scientific community that can be framed into the Knowledge Discovery in Databases (KDD) and Data Mining (DM) paradigm. Biclustering has emerged as the machine learning method of choice to solve this task, but its unsupervised nature makes result assessment problematic. This is often addressed by means of Gene Set Enrichment Analysis (GSEA). We put forward a framework in which GED analysis is understood as an Exploratory Data Analysis (EDA) process where we provide support for continuous human interaction with data aiming at improving the step of hypothesis abduction and assessment. We focus on the adaptation to human cognition of data interpretation and visualization of the output of EDA. First, we give a proper theoretical background to bi-clustering using Lattice Theory and provide a set of analysis tools revolving around [Formula: see text]-Formal Concept Analysis ([Formula: see text]-FCA), a lattice-theoretic unsupervised learning technique for real-valued matrices. By using different kinds of cost structures to quantify expression we obtain different sequences of hierarchical bi-clusterings for gene under- and over-expression using thresholds. Consequently, we provide a method with interleaved analysis steps and visualization devices so that the sequences of lattices for a particular experiment summarize the researcher's vision of the data. This also allows us to define measures of persistence and robustness of biclusters to assess them. Second, the resulting biclusters are used to index external omics databases, for instance Gene Ontology (GO), thus offering a new way of accessing publicly available resources. This provides different flavors of gene set enrichment against which to assess the biclusters, by obtaining their p-values according to the terminology of those resources. We illustrate the exploration procedure on a real data example confirming results previously published.
The GED analysis problem gets transformed into the exploration of a sequence of lattices enabling the visualization of the hierarchical structure of the biclusters with a certain degree of granularity. The ability of FCA-based bi-clustering methods to index external databases such as GO allows us to obtain a quality measure of the biclusters, to observe the evolution of a gene throughout the different biclusters it appears in, and to look for relevant biclusters, by observing their genes and their persistence, in order to infer, for instance, hypotheses on their function.
Curtis, Andrew J
2008-08-22
Background An epidemic may exhibit different spatial patterns with a change in geographic scale, with each scale having different conduits and impediments to disease spread. Mapping disease at each of these scales often reveals different cluster patterns. This paper will consider this change of geographic scale in an analysis of yellow fever deaths for New Orleans in 1878. Global clustering for the whole city will be followed by a focus on the French Quarter, then clusters of that area, and finally street-level patterns of a single cluster. The three-dimensional visualization capabilities of a GIS will be used as part of a cluster creation process that incorporates physical buildings in calculating mortality-to-mortality distance. Including nativity of the deceased will also capture cultural connection. Results Twenty-two yellow fever clusters were identified for the French Quarter. These generally mirror the results of other global cluster and density surfaces created for the entire epidemic in New Orleans. However, the addition of building-distance and of a disease-specific time frame between deaths reveals that disease spread contains a cultural component. Same-nativity mortality clusters emerge in a similar time frame irrespective of proximity. Italian nativity mortalities were far more densely grouped than any of the other cohorts. A final examination of mortalities for one of the nativity clusters reveals that further sub-division is present, and that this pattern would only be revealed at this scale (street level) of investigation. Conclusion Disease spread in an epidemic is complex, resulting from a combination of geographic distance, geographic distance with specific connection to the built environment, disease-specific time frame between deaths, impediments such as herd immunity, and social or cultural connection. This research has shown that cultural connection may be more important than simple proximity, which in turn might mean traditional quarantine measures should be re-evaluated. PMID:18721469
Mayday - integrative analytics for expression data
2010-01-01
Background DNA microarrays have become the standard method for large-scale analyses of gene expression and epigenomics. The increasing complexity and inherent noisiness of the generated data make visual data exploration ever more important. Fast deployment of new methods as well as a combination of predefined, easy to apply methods with programmer's access to the data are important requirements for any analysis framework. Mayday is an open source platform with emphasis on visual data exploration and analysis. Many built-in methods for clustering, machine learning and classification are provided for dissecting complex datasets. Plugins can easily be written to extend Mayday's functionality in a large number of ways. As a Java program, Mayday is platform-independent and can be used as a Java WebStart application without any installation. Mayday can import data from several file formats; database connectivity is included for efficient data organization. Numerous interactive visualization tools, including box plots, profile plots, principal component plots and a heatmap, are available; they can be enhanced with metadata and exported as publication-quality vector files. Results We have rewritten large parts of Mayday's core to make it more efficient and ready for future developments. Among the large number of new plugins are an automated processing framework, dynamic filtering, new and efficient clustering methods, a machine learning module and database connectivity. Extensive manual data analysis can be done using an inbuilt R terminal and an integrated SQL querying interface. Our visualization framework has become more powerful; new plot types have been added and existing plots improved. Conclusions We present a major extension of Mayday, a very versatile open-source framework for efficient microarray data analysis designed for biologists and bioinformaticians. Most everyday tasks are already covered.
The large number of available plugins, as well as the extension possibilities using compiled plugins and ad-hoc scripting, allows Mayday to be adapted rapidly even to very specialized data exploration tasks. Mayday is available at http://microarray-analysis.org. PMID:20214778
Recognizing different tissues in human fetal femur cartilage by label-free Raman microspectroscopy
NASA Astrophysics Data System (ADS)
Kunstar, Aliz; Leijten, Jeroen; van Leuveren, Stefan; Hilderink, Janneke; Otto, Cees; van Blitterswijk, Clemens A.; Karperien, Marcel; van Apeldoorn, Aart A.
2012-11-01
Traditionally, the composition of bone and cartilage is determined by standard histological methods. We used Raman microscopy, which provides a molecular "fingerprint" of the investigated sample, to detect differences between the zones in human fetal femur cartilage without the need for additional staining or labeling. Raman area scans were made from the (pre)articular cartilage, resting, proliferative, and hypertrophic zones of growth plate and endochondral bone within human fetal femora. Multivariate data analysis was performed on Raman spectral datasets to construct cluster images with corresponding cluster averages. Cluster analysis resulted in detection of individual chondrocyte spectra that could be separated from cartilage extracellular matrix (ECM) spectra and was verified by comparing cluster images with intensity-based Raman images for the deoxyribonucleic acid/ribonucleic acid (DNA/RNA) band. Specific dendrograms were created using Ward's clustering method, and principal component analysis (PCA) was performed with the separated and averaged Raman spectra of cells and ECM of all measured zones. Overall (dis)similarities between measured zones were effectively visualized on the dendrograms and main spectral differences were revealed by PCA allowing for label-free detection of individual cartilaginous zones and for label-free evaluation of proper cartilaginous matrix formation for future tissue engineering and clinical purposes.
Healthcare experiences of women with visual impairment.
Sharts-Hopko, Nancy C; Smeltzer, Suzanne; Ott, Barbara B; Zimmerman, Vanessa; Duffin, Janice
2010-01-01
This investigation was a secondary analysis of focus group transcripts to address the question of how women with low vision or blindness have experienced healthcare. Secondary analysis of qualitative data was performed on transcripts from 2 focus groups. These focus groups were conducted at an agency serving visually impaired people in Philadelphia. The 2 focus groups included 7 and 11 women, respectively, with low vision or blindness who had been part of an original study on reaching hard-to-reach women with disabilities. Content analysis for the identification of thematic clusters was performed on transcriptions of the focus group data. Findings are consistent with existing research on the health needs of women with disabilities but add specific understanding related to visual impairment. Six thematic categories were identified: health professionals' awareness, information access, healthcare access, isolation, the need for self-advocacy, and perception by others. Secondary analysis of qualitative data affords in-depth understanding of a particular subset of participants within a larger study. Clinical nurse specialists and other health professionals need to increase their sensitivity to the challenges faced by women with visual impairment, and plan and provide care accordingly. Health professions students need to be prepared to interact with people who are visually impaired and healthcare settings need to respond to their needs.
Eguchi, Akihiro; Neymotin, Samuel A.; Stringer, Simon M.
2014-01-01
Although many computational models have been proposed to explain orientation maps in primary visual cortex (V1), it is not yet known how similar clusters of color-selective neurons in macaque V1/V2 are connected and develop. In this work, we address the problem of understanding the cortical processing of color information with a possible mechanism of the development of the patchy distribution of color selectivity via computational modeling. Each color input is decomposed into a red, green, and blue representation and transmitted to the visual cortex via a simulated optic nerve in a luminance channel and red–green and blue–yellow opponent color channels. Our model of the early visual system consists of multiple topographically-arranged layers of excitatory and inhibitory neurons, with sparse intra-layer connectivity and feed-forward connectivity between layers. Layers are arranged based on anatomy of early visual pathways, and include a retina, lateral geniculate nucleus, and layered neocortex. Each neuron in the V1 output layer makes synaptic connections to neighboring neurons and receives the three types of signals in the different channels from the corresponding photoreceptor position. Synaptic weights are randomized and learned using spike-timing-dependent plasticity (STDP). After training with natural images, the neurons display heightened sensitivity to specific colors. Information-theoretic analysis reveals mutual information between particular stimuli and responses, and that the information reaches a maximum with fewer neurons in the higher layers, indicating that estimations of the input colors can be done using the output of fewer cells in the later stages of cortical processing. In addition, cells with similar color receptive fields form clusters. Analysis of spiking activity reveals increased firing synchrony between neurons when particular color inputs are presented or removed (ON-cell/OFF-cell). PMID:24659956
Data Mining in Earth System Science (DMESS 2011)
Forrest M. Hoffman; J. Walter Larson; Richard Tran Mills; Bhorn-Gustaf Brooks; Auroop R. Ganguly; William Hargrove; et al
2011-01-01
From field-scale measurements to global climate simulations and remote sensing, the growing body of very large and long time series Earth science data are increasingly difficult to analyze, visualize, and interpret. Data mining, information theoretic, and machine learning techniques, such as cluster analysis, singular value decomposition, block entropy, Fourier and...
Cheng, Gong; Lu, Quan; Ma, Ling; Zhang, Guocai; Xu, Liang; Zhou, Zongshan
2017-01-01
Recently, Docker technology has received increasing attention throughout the bioinformatics community. However, its implementation has not yet been mastered by most biologists; accordingly, its application in biological research has been limited. In order to popularize this technology in the field of bioinformatics and to promote the use of publicly available bioinformatics tools, such as Dockerfiles and Images from communities, government sources, and private owners in the Docker Hub Registry and other Docker-based resources, we introduce here a complete and accurate bioinformatics workflow based on Docker. The present workflow enables analysis and visualization of pan-genomes and biosynthetic gene clusters of bacteria. This provides a new solution for bioinformatics mining of big data from various publicly available biological databases. The present step-by-step guide creates an integrative workflow through a Dockerfile to allow researchers to build their own Image and run Container easily. PMID:29204317
Self-assembly of high-nuclearity lanthanide-based nanoclusters for potential bioimaging applications
NASA Astrophysics Data System (ADS)
Yang, Xiaoping; Wang, Shiqing; Schipper, Desmond; Zhang, Lijie; Li, Zongping; Huang, Shaoming; Yuan, Daqiang; Chen, Zhongning; Gnanam, Annie J.; Hall, Justin W.; King, Tyler L.; Que, Emily; Dieye, Yakhya; Vadivelu, Jamuna; Brown, Katherine A.; Jones, Richard A.
2016-05-01
Two series of Cd-Ln and Ni-Ln clusters [Ln8Cd24L12(OAc)44(48)Cl4(0)] and [Ln8Ni6L6(OAc)24(EtOH)6(H2O)2] were constructed using a flexible ligand. The Cd-Ln clusters exhibit interesting nano-drum-like structures which allow direct visualization by TEM. Luminex MicroPlex Microspheres loaded with the Cd-Sm cluster were visualized using epifluorescence microscopy. Cytotoxicity studies on A549 and AGS cancer cell lines showed that the materials have mild to moderate cytotoxicity. Electronic supplementary information (ESI) available: Full experimental and characterization details for 1-5. CCDC 1007468, 1007469 and 1007472-1007474. For ESI and crystallographic data in CIF or other electronic format see DOI: 10.1039/c6nr00642f
magHD: a new approach to multi-dimensional data storage, analysis, display and exploitation
NASA Astrophysics Data System (ADS)
Angleraud, Christophe
2014-06-01
The ever-increasing amount of data and processing capabilities, following the well-known Moore's law, is challenging the way scientists and engineers currently exploit large datasets. Scientific visualization tools, although quite powerful, are often too generic and provide abstract views of phenomena, thus preventing cross-disciplinary fertilization. On the other hand, Geographic Information Systems allow visually appealing maps to be built, but these often become very confusing as more layers are added. Moreover, the introduction of time as a fourth analysis dimension, to allow analysis of time-dependent phenomena such as meteorological or climate models, is encouraging real-time data exploration techniques that allow spatial-temporal points of interest to be detected by integration of moving images by the human brain. Magellium has been involved in high performance image processing chains for satellite image processing as well as scientific signal analysis and geographic information management since its creation in 2003. We believe that recent work on big data, GPU and peer-to-peer collaborative processing can open a new breakthrough in data analysis and display that will serve many new applications in collaborative scientific computing, environment mapping and understanding. The magHD (for Magellium Hyper-Dimension) project aims at developing software solutions that will bring highly interactive tools for complex dataset analysis and exploration to commodity hardware, targeting small to medium scale clusters with expansion capabilities to large cloud based clusters.
[Bibliometric analysis of current glaucoma research based on Pubmed database].
Huang, Wen-bin; Wang, Wei; Zhou, Min-wen; Chen, Shi-da; Zhang, Xiu-lan
2013-11-01
To survey the distribution pattern and subject domain knowledge of worldwide glaucoma research based on the literature in the PubMed database. Literature on glaucoma published from 2007 to 2011 was identified in the PubMed database. The analytic items of an article included published year, country, language, author, and journal. After core MeSH terms had been characterized by BICOMS, the co-occurrence matrix was built. Cluster analysis was performed with SPSS 20.0. A visualized network was then drawn using Ucinet 6.0. In total, 6427 articles were included; the number of annual articles changed slightly between 2007 and 2011. The United States, England, Germany, Australia, and France together accounted for 77.63% of articles. There were 52 high-frequency subjects, and hot topics were clustered into the following 10 categories: (1) Pathology of optic disc and nerve fibers and OCT application, (2) Methods of visual field (VF) and visual function examination, (3) Glaucoma drug medications, (4) Pathology and physiology of primary open angle glaucoma (POAG) including VF and intraocular pressure (IOP), (5) Glaucoma surgery, (6) Gene research related to POAG, (7) Glaucoma disease pathology and animal models, (8) Ocular hypertension (OHT) induced complications and corneal changes, (9) Etiology of congenital glaucoma and complications, (10) Etiology and epidemiology of glaucoma. The visualized domain knowledge mapping was successfully built. The pathology of optic disc and nerve fibers, medications, and surgery were well developed. Study of IOP and visual field was in the core domain, with an important link to etiology, diagnosis, and therapy. Research on glaucomatous genes, disease pathology models, congenital glaucoma, etiology and epidemiology was not well developed and has considerable room for growth.
The distribution pattern and subject domain knowledge of worldwide glaucoma research in the recent five years were shown by using bibliometric analysis. Western developed countries play a leading role in the field of glaucoma research; the international influence of related research in China needs to be strengthened.
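The co-occurrence step described in this abstract can be illustrated with a minimal sketch. It only assumes that each article is reduced to a set of index terms; the three records and their terms below are hypothetical, and the sketch does not reproduce BICOMS, SPSS, or Ucinet behavior.

```python
from collections import Counter
from itertools import combinations

# Hypothetical records: each article is represented by its set of index terms.
articles = [
    {"Glaucoma", "Intraocular Pressure", "Visual Fields"},
    {"Glaucoma", "Intraocular Pressure"},
    {"Glaucoma", "Optic Disk", "Visual Fields"},
]


def cooccurrence(records):
    """Count how many records contain each unordered pair of terms."""
    counts = Counter()
    for terms in records:
        # Sorting makes (a, b) a canonical key, so each pair is counted once.
        for a, b in combinations(sorted(terms), 2):
            counts[(a, b)] += 1
    return counts


pairs = cooccurrence(articles)
for (a, b), n in sorted(pairs.items()):
    print(f"{a} / {b}: {n}")
```

The resulting pair counts are the off-diagonal entries of a symmetric term-by-term matrix, which is the kind of input a clustering or network-drawing tool would then consume.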
Towards a New Generation of Time-Series Visualization Tools in the ESA Heliophysics Science Archives
NASA Astrophysics Data System (ADS)
Perez, H.; Martinez, B.; Cook, J. P.; Herment, D.; Fernandez, M.; De Teodoro, P.; Arnaud, M.; Middleton, H. R.; Osuna, P.; Arviset, C.
2017-12-01
During the last decades, a varied set of Heliophysics missions has allowed the scientific community to gain a better knowledge of the solar atmosphere and activity. The remote sensing images of missions such as SOHO have paved the way for Helio-based spatial data visualization software such as JHelioViewer/Helioviewer. On the other hand, the huge amount of in-situ measurements provided by other missions such as Cluster provides a wide base for plot visualization software whose reach is still far from being fully exploited. The Heliophysics Science Archives within the ESAC Science Data Center (ESDC) already provide a first generation of tools for time-series visualization focusing on each mission's needs: visualization of quicklook plots, cross-calibration time series, pre-generated/on-demand multi-plot stacks (Cluster), basic plot zoom in/out options (Ulysses) and easy navigation through the plots in time (Ulysses, Cluster, ISS-Solaces). However, the needs are evolving, and the scientists involved in new missions require plotting of multi-variable data, heat map stacks, interactive synchronization and axis variable selection, among other improvements. The new Heliophysics archives (such as Solar Orbiter) and the evolution of existing ones (Cluster) intend to address these new challenges. This paper provides an overview of the different approaches for visualizing time-series followed within the ESA Heliophysics Archives and their foreseen evolution.
Molecular subtyping of bladder cancer using Kohonen self-organizing maps.
Borkowska, Edyta M; Kruk, Andrzej; Jedrzejczyk, Adam; Rozniecki, Marek; Jablonowski, Zbigniew; Traczyk, Magdalena; Constantinou, Maria; Banaszkiewicz, Monika; Pietrusinski, Michal; Sosnowski, Marek; Hamdy, Freddie C; Peter, Stefan; Catto, James W F; Kaluzewski, Bogdan
2014-10-01
Kohonen self-organizing maps (SOMs) are unsupervised Artificial Neural Networks (ANNs) that are good for low-density data visualization. They easily deal with complex and nonlinear relationships between variables. We evaluated molecular events that characterize high- and low-grade BC pathways in the tumors from 104 patients. We compared the ability of statistical clustering with a SOM to stratify tumors according to the risk of progression to more advanced disease. In univariable analysis, tumor stage (log rank P = 0.006) and grade (P < 0.001), HPV DNA (P < 0.004), Chromosome 9 loss (P = 0.04) and the A148T polymorphism (rs 3731249) in CDKN2A (P = 0.02) were associated with progression. Multivariable analysis of these parameters identified that tumor grade (Cox regression, P = 0.001, OR 2.9 (95% CI 1.6-5.2)) and the presence of HPV DNA (P = 0.017, OR 3.8 (95% CI 1.3-11.4)) were the only independent predictors of progression. Unsupervised hierarchical clustering grouped the tumors into discrete branches but did not stratify according to progression free survival (log rank P = 0.39). These genetic variables were presented to SOM input neurons. SOMs are suitable for complex data integration, allow easy visualization of outcomes, and may stratify BC progression more robustly than hierarchical clustering. © 2014 The Authors. Cancer Medicine published by John Wiley & Sons Ltd.
Determining the trophic guilds of fishes and macroinvertebrates in a seagrass food web
Luczkovich, J.J.; Ward, G.P.; Johnson, J.C.; Christian, R.R.; Baird, D.; Neckles, H.; Rizzo, W.M.
2002-01-01
We established trophic guilds of macroinvertebrate and fish taxa using correspondence analysis and a hierarchical clustering strategy for a seagrass food web in winter in the northeastern Gulf of Mexico. To create the diet matrix, we characterized the trophic linkages of macroinvertebrate and fish taxa present in Halodule wrightii seagrass habitat areas within the St. Marks National Wildlife Refuge (Florida) using binary data, combining dietary links obtained from relevant literature for macroinvertebrates with stomach analysis of common fishes collected during January and February of 1994. Hierarchical average-linkage cluster analysis of the 73 taxa of fishes and macroinvertebrates in the diet matrix yielded 14 clusters with diet similarity greater than or equal to 0.60. We then used correspondence analysis with three factors to jointly plot the coordinates of the consumers (identified by cluster membership) and of the 33 food sources. Correspondence analysis served as a visualization tool for assigning each taxon to one of eight trophic guilds: herbivores, detritivores, suspension feeders, omnivores, molluscivores, meiobenthos consumers, macrobenthos consumers, and piscivores. These trophic groups, cross-classified with major taxonomic groups, were further used to develop consumer compartments in a network analysis model of carbon flow in this seagrass ecosystem. The method presented here should greatly improve the development of future network models of food webs by providing an objective procedure for aggregating trophic groups.
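Average-linkage clustering of a binary diet matrix with a similarity cutoff, as described in this abstract, might be sketched as follows. The abstract does not name the similarity measure, so Jaccard similarity is an assumption here; the consumers, food sources, and the function names `jaccard_distance` and `average_linkage` are hypothetical, and a similarity threshold of 0.60 is expressed as a distance threshold of 0.40.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """1 - |a & b| / |a | b| for two sets of food sources (assumed measure)."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def average_linkage(items, threshold):
    """Merge clusters while the closest pair has mean distance <= threshold."""
    clusters = [[name] for name in items]

    def mean_dist(c1, c2):
        total = sum(jaccard_distance(items[x], items[y]) for x in c1 for y in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Find the pair of clusters with the smallest average pairwise distance.
        (i, j), d = min(
            (((i, j), mean_dist(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda pair: pair[1],
        )
        if d > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


# Hypothetical binary diets: each consumer maps to the set of items it eats.
diets = {
    "consumer_A": {"algae", "seagrass", "detritus"},
    "consumer_B": {"algae", "seagrass", "detritus", "epiphytes"},
    "consumer_C": {"shrimp", "small_fish"},
    "consumer_D": {"epiphytes", "shrimp", "small_fish"},
    "consumer_E": {"algae", "small_fish"},
}

# Similarity >= 0.60 corresponds to distance <= 0.40.
guilds = average_linkage(diets, threshold=0.40)
print(guilds)
```

With these toy diets, A and B merge (distance 0.25), C and D merge (distance 1/3), and E stays alone because its average distance to either group exceeds 0.40.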
Carpenter, Joanne S; Robillard, Rébecca; Lee, Rico S C; Hermens, Daniel F; Naismith, Sharon L; White, Django; Whitwell, Bradley; Scott, Elizabeth M; Hickie, Ian B
2015-01-01
Although early-stage affective disorders are associated with both cognitive dysfunction and sleep-wake disruptions, relationships between these factors have not been specifically examined in young adults. Sleep and circadian rhythm disturbances in those with affective disorders are considerably heterogeneous, and may not relate to cognitive dysfunction in a simple linear fashion. This study aimed to characterise profiles of sleep and circadian disturbance in young people with affective disorders and examine associations between these profiles and cognitive performance. Actigraphy monitoring was completed in 152 young people (16-30 years; 66% female) with primary diagnoses of affective disorders, and 69 healthy controls (18-30 years; 57% female). Patients also underwent detailed neuropsychological assessment. Actigraphy data were processed to estimate both sleep and circadian parameters. Overall neuropsychological performance in patients was poor on tasks relating to mental flexibility and visual memory. Two hierarchical cluster analyses identified three distinct patient groups based on sleep variables and three based on circadian variables. Sleep clusters included a 'long sleep' cluster, a 'disrupted sleep' cluster, and a 'delayed and disrupted sleep' cluster. Circadian clusters included a 'strong circadian' cluster, a 'weak circadian' cluster, and a 'delayed circadian' cluster. Medication use differed between clusters. The 'long sleep' cluster displayed significantly worse visual memory performance compared to the 'disrupted sleep' cluster. No other cognitive functions differed between clusters. These results highlight the heterogeneity of sleep and circadian profiles in young people with affective disorders, and provide preliminary evidence in support of a relationship between sleep and visual memory, which may be mediated by use of antipsychotic medication. 
These findings have implications for the personalisation of treatments and improvement of functioning in young adults early in the course of affective illness.
Machine-learned cluster identification in high-dimensional data.
Ultsch, Alfred; Lötsch, Jörn
2017-02-01
High-dimensional biomedical data are frequently clustered to identify subgroup structures pointing at distinct disease subtypes. It is crucial that the cluster algorithm used works correctly. However, by imposing a predefined shape on the clusters, classical algorithms occasionally suggest a cluster structure in homogeneously distributed data or assign data points to incorrect clusters. We analyzed whether this can be avoided by using emergent self-organizing feature maps (ESOM). Data sets with different degrees of complexity were submitted to ESOM analysis with large numbers of neurons, using an interactive R-based bioinformatics tool. On top of the trained ESOM the distance structure in the high dimensional feature space was visualized in the form of a so-called U-matrix. Clustering results were compared with those provided by classical common cluster algorithms including single linkage, Ward and k-means. Ward clustering imposed cluster structures on cluster-less "golf ball", "cuboid" and "S-shaped" data sets that contained no structure at all (random data). Ward clustering also imposed structures on permuted real world data sets. By contrast, the ESOM/U-matrix approach correctly found that these data contain no cluster structure. Moreover, ESOM/U-matrix was correct in identifying clusters in biomedical data truly containing subgroups. It was always correct in cluster structure identification in further canonical artificial data. Using intentionally simple data sets, it is shown that popular clustering algorithms typically used for biomedical data sets may fail to cluster data correctly, suggesting that they are also likely to perform erroneously on high dimensional biomedical data. The present analyses emphasized that generally established classical hierarchical clustering algorithms carry a considerable tendency to produce erroneous results. 
By contrast, unsupervised machine-learned analysis of cluster structures, applied using the ESOM/U-matrix method, is a viable, unbiased method to identify true clusters in the high-dimensional space of complex data. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.
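The U-matrix idea mentioned in this abstract can be illustrated with a simplified sketch: each map neuron gets a "height" equal to the mean distance between its prototype vector and those of its grid neighbors, so ridges of large heights separate regions of similar prototypes. The sketch assumes an already trained map on a plain rectangular grid with 4-neighborhoods; the toy weights and the function name `u_matrix` are hypothetical, and this is not the R tool used by the authors.

```python
import math


def u_matrix(weights):
    """Mean distance from each neuron's prototype to its 4-connected neighbors.

    weights[r][c] is the prototype vector of the neuron at grid cell (r, c).
    """
    rows, cols = len(weights), len(weights[0])
    heights = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            dists = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    dists.append(math.dist(weights[r][c], weights[nr][nc]))
            # A 1x1 map has no neighbors; report height 0 in that case.
            heights[r][c] = sum(dists) / len(dists) if dists else 0.0
    return heights


# Hypothetical trained 2x4 map: left half and right half hold different prototypes.
low, high = (0.0, 0.0), (5.0, 5.0)
toy_map = [
    [low, low, high, high],
    [low, low, high, high],
]

heights = u_matrix(toy_map)
for row in heights:
    print(["%.2f" % h for h in row])
```

In this toy map the two middle columns receive nonzero heights and the outer columns receive zero, which is the "wall between valleys" pattern that a U-matrix display relies on to reveal cluster boundaries.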
NASA Astrophysics Data System (ADS)
Susilo; Setyaningsih, M.
2018-01-01
Solanum melongena (eggplant) is one of the diverse members of the Solanum family that is grown and widely spread in Indonesia and widely used by the community. This research explored the genetic diversity of four local Indonesian eggplant species, namely leunca, tekokak, gelatik and kopek, by using RAPD (Random Amplified Polymorphic DNA). The samples were obtained from the Agricultural Technology Assessment Institute (BPTP) Bogor, Indonesia. The observed data took the form of DNA profiles of the Solanum melongena plants, which were analyzed descriptively and quantitatively. Thirty DNA bands (28 polymorphic and 2 monomorphic) were successfully scored by using four primers (OPF-01, OPF-02, OPF-03, and OPF-04). The primers used were able to amplify all four eggplant samples. PCR-RAPD visualization produced bands of 300-1500 bp. The cluster analysis showed the existence of three clusters (A, B, and C). Cluster A (coefficient of 49%) consisted of gelatik, cluster B (coefficient of 65%) consisted of TPU (Kopek) and TK (Tekokak), and cluster C (coefficient of 55%) consisted of LC (Leunca). These results indicated that the closest proximity is found in samples of TK (Tekokak) and TPU (Kopek).
Matsen IV, Frederick A.; Evans, Steven N.
2013-01-01
Principal components analysis (PCA) and hierarchical clustering are two of the most heavily used techniques for analyzing the differences between nucleic acid sequence samples taken from a given environment. They have led to many insights regarding the structure of microbial communities. We have developed two new complementary methods that leverage how this microbial community data sits on a phylogenetic tree. Edge principal components analysis enables the detection of important differences between samples that contain closely related taxa. Each principal component axis is a collection of signed weights on the edges of the phylogenetic tree, and these weights are easily visualized by a suitable thickening and coloring of the edges. Squash clustering outputs a (rooted) clustering tree in which each internal node corresponds to an appropriate “average” of the original samples at the leaves below the node. Moreover, the length of an edge is a suitably defined distance between the averaged samples associated with the two incident nodes, rather than the less interpretable average of distances produced by UPGMA, the most widely used hierarchical clustering method in this context. We present these methods and illustrate their use with data from the human microbiome. PMID:23505415
Zackay, Arie; Steinhoff, Christine
2010-12-15
Exploration of DNA methylation and its impact on various regulatory mechanisms has become a very active field of research. Simultaneously there is an arising need for tools to process and analyse the data together with statistical investigation and visualisation. MethVisual is a new application that enables exploratory analysis and intuitive visualization of DNA methylation data as is typically generated by bisulfite sequencing. The package allows the import of DNA methylation sequences, aligns them and performs quality control comparison. It comprises basic analysis steps such as lollipop visualization, co-occurrence display of methylation of neighbouring and distant CpG sites, summary statistics on methylation status, clustering and correspondence analysis. The package has been developed for methylation data but can be also used for other data types for which binary coding can be inferred. The application of the package, as well as a comparison to existing DNA methylation analysis tools and its workflow based on two datasets is presented in this paper. The R package MethVisual offers various analysis procedures for data that can be binarized, in particular for bisulfite sequenced methylation data. R/Bioconductor has become one of the most important environments for statistical analysis of various types of biological and medical data. Therefore, any data analysis within R that allows the integration of various data types as provided from different technological platforms is convenient. It is the first and so far the only specific package for DNA methylation analysis, in particular for bisulfite sequenced data available in R/Bioconductor environment. The package is available for free at http://methvisual.molgen.mpg.de/ and from the Bioconductor Consortium http://www.bioconductor.org. PMID:21159174
Wang, Li Kun; Heng, Paul Wan Sia; Liew, Celine Valeria
2015-04-01
Bottom spray fluid-bed coating is a common technique for coating multiparticulates. Under the quality-by-design framework, particle recirculation within the partition column is one of the main variability sources affecting particle coating and coat uniformity. However, the occurrence and mechanism of particle recirculation within the partition column of the coater are not well understood. The purpose of this study was to visualize and define particle recirculation within the partition column. Based on different combinations of partition gap setting, air accelerator insert diameter, and particle size fraction, particle movements within the partition column were captured using a high-speed video camera. The particle recirculation probability and voidage information were mapped using a visiometric process analyzer. High-speed images showed that particles contributing to the recirculation phenomenon were behaving as clustered colonies. Fluid dynamics analysis indicated that particle recirculation within the partition column may be attributed to the combined effect of cluster formation and drag reduction. Both visiometric process analysis and particle coating experiments showed that smaller particles had greater propensity toward cluster formation than larger particles. The influence of cluster formation on coating performance and possible solutions to cluster formation were further discussed. © 2014 Wiley Periodicals, Inc. and the American Pharmacists Association.
Local matrix learning in clustering and applications for manifold visualization.
Arnonkijpanich, Banchar; Hasenfuss, Alexander; Hammer, Barbara
2010-05-01
Electronic data sets are increasing rapidly with respect to both the size of the data sets and data resolution, i.e. dimensionality, such that adequate data inspection and data visualization have become central issues of data mining. In this article, we present an extension of classical clustering schemes by local matrix adaptation, which allows a better representation of data by means of clusters with an arbitrary spherical shape. Unlike previous proposals, the method is derived from a global cost function. The focus of this article is to demonstrate the applicability of this matrix clustering scheme to low-dimensional data embedding for data inspection. The proposed method is based on matrix learning for neural gas and manifold charting. This provides an explicit mapping of a given high-dimensional data space to low dimensionality. We demonstrate the usefulness of this method for data inspection and manifold visualization. 2009 Elsevier Ltd. All rights reserved.
On-road driving impairments and associated cognitive deficits after stroke.
Devos, Hannes; Tant, Mark; Akinwuntan, Abiodun E
2014-01-01
Little is known about the critical on-road driving skills that get affected after a stroke. The purpose of this study was to investigate the key on-road driving impairments and their associated cognitive deficits after a stroke. A second aim was to investigate if lateralization of stroke impacts results of the cognitive and on-road driving tests. In this cross-sectional study, 99 participants with a first-ever stroke who were actively driving prior to stroke underwent a cognitive battery and a standardized road test that evaluated 13 specific on-road driving skills. These on-road driving skills were mapped onto an existing, theoretical framework that categorized the on-road items into hierarchic clusters of operational, tactical, visuo-integrative, and mixed driving skills. The total score on the road test and the on-road decision, made by a certified fitness-to-drive expert, decided the main outcome. The critical on-road driving skills predicting the on-road decision were identified using logistic regression analysis. Linear regression analysis was employed to determine the cognitive impairments leading to poor total on-road scores. Analyses were repeated for right- and left-sided strokes. In all, 37 persons scored poorly on the road test. These participants performed worse in all hierarchic clusters of on-road driving. Performances on the operational cluster and the visuo-integrative cluster best predicted on-road decisions (R(2) = 0.60). 'Lane changing' and 'understanding, insight, and quality of traffic participation' were the critical skill deficits leading to poor performance on the road test (R(2) = 0.65). Divided attention was the main determinant of on-road scores in the total group (R(2) = 0.06). Participants with right-sided stroke performed worse on visual field, visual neglect, visual scanning, visuo-constructive skills, and divided attention compared with those with left-sided stroke. 
Divided attention was the main determinant of total on-road scores in the right-sided stroke group (R(2) = 0.10). A combination of visual scanning, speed of processing, and executive dysfunction yielded the best model to predict on-road scores in left-sided strokes (R(2) = 0.46). Poor performance in the road test after stroke is determined by critical operational and visuo-integrative driving impairments. Specific and different driving evaluation and training programs are needed for right- and left-sided strokes. © 2014 S. Karger AG, Basel.
Wu, Zhichao; Medeiros, Felipe A
2018-03-20
Visual field testing is an important endpoint in glaucoma clinical trials, and the testing paradigm used can have a significant impact on the sample size requirements. To investigate this, this study included 353 eyes of 247 glaucoma patients seen over a 3-year period to extract real-world visual field rates of change and variability estimates to provide sample size estimates from computer simulations. The clinical trial scenario assumed that a new treatment was added to one of two groups that were both under routine clinical care, with various treatment effects examined. Three different visual field testing paradigms were evaluated: a) evenly spaced testing, b) United Kingdom Glaucoma Treatment Study (UKGTS) follow-up scheme, which adds clustered tests at the beginning and end of follow-up in addition to evenly spaced testing, and c) clustered testing paradigm, with clusters of tests at the beginning and end of the trial period and two intermediary visits. The sample size requirements were reduced by 17-19% and 39-40% using the UKGTS and clustered testing paradigms, respectively, when compared to the evenly spaced approach. These findings highlight how the clustered testing paradigm can substantially reduce sample size requirements and improve the feasibility of future glaucoma clinical trials.
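One intuition for why clustering tests at the start and end of follow-up helps can be sketched with the textbook variance of an ordinary least-squares slope, Var(slope) = sigma^2 / sum((t - mean(t))^2). This is a simplification, not the simulation used in the study: it assumes independent measurement noise only, ignores between-patient differences in true rates of change, and uses a hypothetical 24-month schedule with 10 tests, with clustered tests treated as taken at the same time point.

```python
def slope_variance(times, noise_sd=1.0):
    """Variance of the OLS slope for tests at the given times (i.i.d. noise)."""
    mean_t = sum(times) / len(times)
    sxx = sum((t - mean_t) ** 2 for t in times)
    return noise_sd ** 2 / sxx


# Hypothetical schedules, both with 10 tests over 24 months.
n_tests, months = 10, 24
evenly_spaced = [months * i / (n_tests - 1) for i in range(n_tests)]
# Four tests at baseline, two intermediary visits, four tests at the end.
clustered = [0] * 4 + [8, 16] + [months] * 4

ratio = slope_variance(clustered) / slope_variance(evenly_spaced)
print(f"slope variance, clustered relative to evenly spaced: {ratio:.3f}")
```

Under these assumptions the clustered schedule estimates each eye's rate of change with lower variance, because the tests sit farther from the mean follow-up time. The size of the effect in a real trial depends on factors this sketch leaves out, so the ratio here should not be read as a reproduction of the reported 17-19% and 39-40% reductions.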
Computationally Efficient Clustering of Audio-Visual Meeting Data
NASA Astrophysics Data System (ADS)
Hung, Hayley; Friedland, Gerald; Yeo, Chuohao
This chapter presents novel computationally efficient algorithms to extract semantically meaningful acoustic and visual events related to each of the participants in a group discussion using the example of business meeting recordings. The recording setup involves relatively few audio-visual sensors, comprising a limited number of cameras and microphones. We first demonstrate computationally efficient algorithms that can identify who spoke and when, a problem in speech processing known as speaker diarization. We also extract visual activity features efficiently from MPEG4 video by taking advantage of the processing that was already done for video compression. Then, we present a method of associating the audio-visual data together so that the content of each participant can be managed individually. The methods presented in this article can be used as a principal component that enables many higher-level semantic analysis tasks needed in search, retrieval, and navigation.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ahrens, James P; Patchett, John M; Lo, Li - Ta
2011-01-24
This report provides documentation for the completion of the Los Alamos portion of the ASC Level II 'Visualization on the Supercomputing Platform' milestone. This ASC Level II milestone is a joint milestone between Sandia National Laboratory and Los Alamos National Laboratory. The milestone text is shown in Figure 1 with the Los Alamos portions highlighted in boldfaced text. Visualization and analysis of petascale data is limited by several factors which must be addressed as ACES delivers the Cielo platform. Two primary difficulties are: (1) Performance of interactive rendering, which is the most computationally intensive portion of the visualization process. For terascale platforms, commodity clusters with graphics processors (GPUs) have been used for interactive rendering. For petascale platforms, visualization and rendering may be able to run efficiently on the supercomputer platform itself. (2) I/O bandwidth, which limits how much information can be written to disk. If we simply analyze the sparse information that is saved to disk we miss the opportunity to analyze the rich information produced every timestep by the simulation. For the first issue, we are pursuing in-situ analysis, in which simulations are coupled directly with analysis libraries at runtime. This milestone will evaluate the visualization and rendering performance of current and next generation supercomputers in contrast to GPU-based visualization clusters, and evaluate the performance of common analysis libraries coupled with the simulation that analyze and write data to disk during a running simulation. This milestone will explore, evaluate and advance the maturity level of these technologies and their applicability to problems of interest to the ASC program. In conclusion, we improved CPU-based rendering performance by a factor of 2-10 times on our tests. In addition, we evaluated CPU and GPU-based rendering performance. 
We encourage production visualization experts to consider using CPU-based rendering solutions when it is appropriate. For example, on remote supercomputers CPU-based rendering can offer a means of viewing data without having to offload the data or geometry onto a GPU-based visualization system. In terms of comparative performance of the CPU and GPU we believe that further optimizations of the performance of both CPU- and GPU-based rendering are possible. The simulation community is currently confronting this reality as they work to port their simulations to different hardware architectures. What is interesting about CPU rendering of massive datasets is that for the past two decades GPU performance has significantly outperformed CPU-based systems. Based on our advancements, evaluations and explorations we believe that CPU-based rendering has returned as one viable option for the visualization of massive datasets.
Kastberger, G; Kranner, G
2000-02-01
Viscovery SOMine is a software tool for advanced analysis and monitoring of numerical data sets. It was developed for professional use in business, industry, and science and to support dependency analysis, deviation detection, unsupervised clustering, nonlinear regression, data association, pattern recognition, and animated monitoring. Based on the concept of self-organizing maps (SOMs), it employs a robust variant of unsupervised neural networks--namely, Kohonen's Batch-SOM, which is further enhanced with a new scaling technique for speeding up the learning process. This tool provides a powerful means by which to analyze complex data sets without prior statistical knowledge. The data representation contained in the trained SOM is systematically converted to be used in a spectrum of visualization techniques, such as evaluating dependencies between components, investigating geometric properties of the data distribution, searching for clusters, or monitoring new data. We have used this software tool to analyze and visualize multiple influences of the ocellar system on free-flight behavior in giant honeybees. Occlusion of ocelli will affect orienting reactivities in relation to flight target, level of disturbance, and position of the bee in the flight chamber; it will induce phototaxis and make orienting imprecise and dependent on motivational settings. Ocelli permit the adjustment of orienting strategies to environmental demands by enforcing abilities such as centering or flight kinetics and by providing independent control of posture and flight course.
Multifocal visual evoked potentials for early glaucoma detection.
Weizer, Jennifer S; Musch, David C; Niziol, Leslie M; Khan, Naheed W
2012-07-01
To compare multifocal visual evoked potentials (mfVEP) with other detection methods in early open-angle glaucoma. Ten patients with suspected glaucoma and 5 with early open-angle glaucoma underwent mfVEP, standard automated perimetry (SAP), short-wave automated perimetry, frequency-doubling technology perimetry, and nerve fiber layer optical coherence tomography. Nineteen healthy control subjects underwent mfVEP and SAP for comparison. Comparisons between groups involving continuous variables were made using independent t tests; for categorical variables, Fisher's exact test was used. Monocular mfVEP cluster defects were associated with an increased SAP pattern standard deviation (P = .0195). Visual fields that showed interocular mfVEP cluster defects were more likely to also show superior quadrant nerve fiber layer thinning by OCT (P = .0152). Multifocal visual evoked potential cluster defects are associated with a functional and an anatomic measure that both relate to glaucomatous optic neuropathy. Copyright 2012, SLACK Incorporated.
Liu, Chao; Abu-Jamous, Basel; Brattico, Elvira; Nandi, Asoke K
2017-03-01
In the past decades, neuroimaging of humans has gained a position of status within neuroscience, and data-driven approaches and functional connectivity analyses of functional magnetic resonance imaging (fMRI) data are increasingly favored to depict the complex architecture of human brains. However, the reliability of these findings is jeopardized by too many analysis methods and sometimes too few samples used, which leads to discord among researchers. We propose a tunable consensus clustering paradigm that aims at overcoming the clustering methods selection problem as well as reliability issues in neuroimaging by means of first applying several analysis methods (three in this study) on multiple datasets and then integrating the clustering results. To validate the method, we applied it to a complex fMRI experiment involving affective processing of hundreds of music clips. We found that brain structures related to visual, reward, and auditory processing have intrinsic spatial patterns of coherent neuroactivity during affective processing. The comparisons between the results obtained from our method and those from each individual clustering algorithm demonstrate that our paradigm has notable advantages over traditional single clustering algorithms in being able to evidence robust connectivity patterns even with complex neuroimaging data involving a variety of stimuli and affective evaluations of them. The consensus clustering method is implemented in the R package "UNCLES" available on http://cran.r-project.org/web/packages/UNCLES/index.html .
Toward semantic-based retrieval of visual information: a model-based approach
NASA Astrophysics Data System (ADS)
Park, Youngchoon; Golshani, Forouzan; Panchanathan, Sethuraman
2002-07-01
This paper centers on the problem of automated visual content classification. To enable classification-based image or visual object retrieval, we propose a new image representation scheme called visual context descriptor (VCD), a multidimensional vector in which each element represents the frequency of a unique visual property of an image or a region. VCD utilizes the predetermined quality dimensions (i.e., types of features and quantization level) and semantic model templates mined a priori. Not only observed visual cues, but also contextually relevant visual features are proportionally incorporated in VCD. Contextual relevance of a visual cue to a semantic class is determined by using correlation analysis of ground truth samples. Such co-occurrence analysis of visual cues requires transformation of a real-valued visual feature vector (e.g., color histogram, Gabor texture, etc.) into a discrete event (e.g., terms in text). Good features to track, rule of thirds, iterative k-means clustering and TSVQ are involved in the transformation of feature vectors into unified symbolic representations called visual terms. Similarity-based visual cue frequency estimation is also proposed and used for ensuring the correctness of model learning and matching, since sparseness of sample data causes unstable frequency estimates for visual cues. The proposed method naturally allows integration of heterogeneous visual, temporal, or spatial cues in a single classification or matching framework, and can be easily integrated into a semantic knowledge base such as a thesaurus or an ontology. Robust semantic visual model template creation and object-based image retrieval are demonstrated based on the proposed content description scheme.
Peckys, Diana B; de Jonge, Niels
2011-04-13
The intracellular uptake of 30 nm diameter gold nanoparticles (Au-NPs) was studied at the nanoscale in pristine eukaryotic cells. Live COS-7 cells were maintained in a microfluidic chamber and imaged using scanning transmission electron microscopy. A quantitative image analysis showed that Au-NPs bound to the membranes of vesicles, possibly lysosomes, and occupied 67% of the available surface area. The vesicles accumulated to form a micrometer-sized cluster after 24 h of incubation. Two clusters were analyzed and found to consist of 117 ± 9 and 164 ± 4 NP-filled vesicles.
Oberle, Michael; Wohlwend, Nadia; Jonas, Daniel; Maurer, Florian P.; Jost, Geraldine; Tschudin-Sutter, Sarah; Vranckx, Katleen; Egli, Adrian
2016-01-01
Background The technical, biological, and inter-center reproducibility of matrix-assisted laser desorption ionization-time of flight mass spectrometry (MALDI TOF MS) typing data has not yet been explored. The aim of this study is to compare typing data from multiple centers employing bioinformatics using bacterial strains from two past outbreaks and non-related strains. Material/Methods Participants received twelve extended spectrum betalactamase-producing E. coli isolates and followed the same standard operating procedure (SOP) including a full-protein extraction protocol. All laboratories provided visually read spectra via flexAnalysis (Bruker, Germany). Raw data from each laboratory allowed calculating the technical and biological reproducibility between centers using BioNumerics (Applied Maths NV, Belgium). Results Technical and biological reproducibility ranged between 96.8–99.4% and 47.6–94.4%, respectively. The inter-center reproducibility showed a comparable clustering among identical isolates. Principal component analysis indicated a higher tendency to cluster within the same center. Therefore, we used a discriminant analysis, which completely separated the clusters. Next, we defined a reference center and performed a statistical analysis to identify specific peaks to identify the outbreak clusters. Finally, we used a classifier algorithm and a linear support vector machine on the determined peaks as classifier. A validation showed that within the set of the reference center, the identification of the cluster was 100% correct with a large contrast between the score with the correct cluster and the next best scoring cluster. Conclusions Based on the sufficient technical and biological reproducibility of MALDI-TOF MS based spectra, detection of specific clusters is possible from spectra obtained from different centers. However, we believe that a shared SOP and a bioinformatics approach are required to make the analysis robust and reliable. PMID:27798637
Classification of posture maintenance data with fuzzy clustering algorithms
NASA Technical Reports Server (NTRS)
Bezdek, James C.
1992-01-01
Sensory inputs from the visual, vestibular, and proprioceptive systems are integrated by the central nervous system to maintain postural equilibrium. Sustained exposure to microgravity causes neurosensory adaptation during spaceflight, which results in decreased postural stability until readaptation occurs upon return to the terrestrial environment. Data which simulate sensory inputs under various sensory organization test (SOT) conditions were collected in conjunction with Johnson Space Center postural control studies using a tilt-translation device (TTD). The University of West Florida applied the fuzzy c-means (FCM) clustering algorithms to this data with a view towards identifying various states and stages of subjects experiencing such changes. Feature analysis, time step analysis, pooling data, response of the subjects, and the algorithms used are discussed.
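For readers unfamiliar with fuzzy c-means, the following is a minimal sketch of the textbook algorithm on one-dimensional data. It is not the implementation used in the report: the function name `fuzzy_c_means`, the fuzzifier default `m=2.0`, the random initialisation, and the stopping tolerance are all assumptions made for illustration.

```python
import random


def fuzzy_c_means(points, c=2, m=2.0, iters=100, tol=1e-6, seed=0):
    """Textbook fuzzy c-means on 1-D data (illustrative sketch only).

    Returns (centers, memberships), where memberships[i][j] is the degree
    to which points[i] belongs to cluster j; each row sums to 1.
    """
    rng = random.Random(seed)
    # Assumption: start from random memberships, normalised per point.
    u = []
    for _ in points:
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = [0.0] * c
    for _ in range(iters):
        # Center update: mean of the points weighted by membership^m.
        new_centers = []
        for j in range(c):
            weights = [u[i][j] ** m for i in range(len(points))]
            weighted_sum = sum(w * x for w, x in zip(weights, points))
            new_centers.append(weighted_sum / sum(weights))

        # Membership update: inversely related to distance^(2/(m-1)).
        for i, x in enumerate(points):
            dists = [abs(x - cj) for cj in new_centers]
            if any(d == 0 for d in dists):
                # Point coincides with a center: give it full membership there.
                hits = [1.0 if d == 0 else 0.0 for d in dists]
                total = sum(hits)
                u[i] = [h / total for h in hits]
            else:
                inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
                total = sum(inv)
                u[i] = [v / total for v in inv]

        shift = max(abs(a - b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, u
```

Unlike hard k-means, every sample keeps a graded membership in every cluster, which is one reason the method is attractive for data in which subjects pass gradually between states.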
On three-dimensional misorientation spaces.
Krakow, Robert; Bennett, Robbie J; Johnstone, Duncan N; Vukmanovic, Zoja; Solano-Alvarez, Wilberth; Lainé, Steven J; Einsle, Joshua F; Midgley, Paul A; Rae, Catherine M F; Hielscher, Ralf
2017-10-01
Determining the local orientation of crystals in engineering and geological materials has become routine with the advent of modern crystallographic mapping techniques. These techniques enable many thousands of orientation measurements to be made, directing attention towards how such orientation data are best studied. Here, we provide a guide to the visualization of misorientation data in three-dimensional vector spaces, reduced by crystal symmetry, to reveal crystallographic orientation relationships. Domains for all point group symmetries are presented and an analysis methodology is developed and applied to identify crystallographic relationships, indicated by clusters in the misorientation space, in examples from materials science and geology. This analysis aids the determination of active deformation mechanisms and evaluation of cluster centres and spread enables more accurate description of transformation processes supporting arguments regarding provenance.
Changing the paradigm: messages for hand hygiene education and audit from cluster analysis.
Gould, D J; Navaie, D; Purssell, E; Drey, N S; Creedon, S
2018-04-01
Hand hygiene is considered to be the foremost infection prevention measure. How healthcare workers accept and make sense of the hand hygiene message is likely to contribute to the success and sustainability of initiatives to improve performance, which is often poor. A survey of nurses in critical care units in three National Health Service trusts in England was undertaken to explore opinions about hand hygiene, use of alcohol hand rubs, audit with performance feedback, and other key hand-hygiene-related issues. Data were analysed descriptively and subjected to cluster analysis. Three main clusters of opinion were visualized, each forming a significant group: positive attitudes, pragmatism and scepticism. A smaller cluster suggested possible guilt about ability to perform hand hygiene. Cluster analysis identified previously unsuspected constellations of beliefs about hand hygiene that offer a plausible explanation for behaviour. Healthcare workers might respond to education and audit differently according to these beliefs. Those holding predominantly positive opinions might comply with hand hygiene policy and perform well as infection prevention link nurses and champions. Those holding pragmatic attitudes are likely to respond favourably to the need for professional behaviour and need to protect themselves from infection. Greater persuasion may be needed to encourage those who are sceptical about the importance of hand hygiene to comply with guidelines. Interventions to increase compliance should be sufficiently broad in scope to tackle different beliefs. Alternatively, cluster analysis of hand hygiene beliefs could be used to identify the most effective educational and monitoring strategies for a particular clinical setting. Copyright © 2017 The Healthcare Infection Society. Published by Elsevier Ltd. All rights reserved.
Concept mapping and network analysis: an analytic approach to measure ties among constructs.
Goldman, Alyssa W; Kane, Mary
2014-12-01
Group concept mapping is a mixed-methods approach that helps a group visually represent its ideas on a topic of interest through a series of related maps. The maps and additional graphics are useful for planning, evaluation and theory development. Group concept maps are typically described, interpreted and utilized through points, clusters and distances, and the implications of these features in understanding how constructs relate to one another. This paper focuses on the application of network analysis to group concept mapping to quantify the strength and directionality of relationships among clusters. The authors outline the steps of this analysis, and illustrate its practical use through an organizational strategic planning example. Additional benefits of this analysis to evaluation projects are also discussed, supporting the overall utility of this supplemental technique to the standard concept mapping methodology. Copyright © 2014 Elsevier Ltd. All rights reserved.
[Automatic Sleep Stage Classification Based on an Improved K-means Clustering Algorithm].
Xiao, Shuyuan; Wang, Bei; Zhang, Jian; Zhang, Qunfeng; Zou, Junzhong
2016-10-01
Sleep stage scoring is a hotspot in the field of medicine and neuroscience. Visual inspection of sleep is laborious and the results may differ between clinicians. Automatic sleep stage classification algorithms can be used to reduce the manual workload. However, there are still limitations when they encounter complicated and changeable clinical cases. The purpose of this paper is to develop an automatic sleep staging algorithm based on the characteristics of actual sleep data. In the proposed improved K-means clustering algorithm, points were selected as the initial centers by using a concept of density to avoid the randomness of the original K-means algorithm. Meanwhile, the cluster centers were updated according to the 'Three-Sigma Rule' during the iteration to abate the influence of the outliers. The proposed method was tested and analyzed on the overnight sleep data of healthy persons and of patients with sleep disorders after continuous positive airway pressure (CPAP) treatment. The automatic sleep stage classification results were compared with the visual inspection by qualified clinicians and the averaged accuracy reached 76%. With the analysis of morphological diversity of sleep data, it was proved that the proposed improved K-means algorithm was feasible and valid for clinical practice.
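The abstract does not spell out how the density-based initialisation or the three-sigma update are defined, so the sketch below is only one plausible reading, on one-dimensional data. The helper names (`density_init`, `three_sigma_mean`, `kmeans_3sigma`), the neighbour-count notion of density, and the `radius` parameter are hypothetical and are not taken from the paper.

```python
from statistics import mean, pstdev


def density_init(points, k, radius):
    """Hypothetical density-based seeding: take the points with the most
    neighbours within `radius`, skipping any that lie within 2*radius of an
    already chosen center. May return fewer than k centers."""
    density = [sum(1 for q in points if abs(p - q) <= radius) for p in points]
    order = sorted(range(len(points)), key=lambda i: -density[i])
    centers = []
    for i in order:
        if all(abs(points[i] - c) > 2 * radius for c in centers):
            centers.append(points[i])
        if len(centers) == k:
            break
    return centers


def three_sigma_mean(values):
    """Mean of the values lying within three standard deviations of the
    plain mean, so that extreme outliers do not drag the center."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return mu
    kept = [v for v in values if abs(v - mu) <= 3 * sigma]
    return mean(kept)


def kmeans_3sigma(points, k, radius, iters=50):
    """K-means loop using the two assumed modifications above."""
    centers = density_init(points, k, radius)
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda j: abs(p - centers[j]))
            clusters[nearest].append(p)
        new_centers = [
            three_sigma_mean(members) if members else centers[j]
            for j, members in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers
```

Because the seeding is deterministic, repeated runs on the same recording give the same centers, which is the property the abstract attributes to the density-based start. Note that a three-sigma filter can only reject a point when the cluster has more than ten members, since a single value among n cannot deviate from the mean by more than (n-1)/sqrt(n) standard deviations.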
The Angular Power Spectrum of BATSE 3B Gamma-Ray Bursts
NASA Technical Reports Server (NTRS)
Tegmark, Max; Hartmann, Dieter H.; Briggs, Michael S.; Meegan, Charles A.
1996-01-01
We compute the angular power spectrum C_l from the BATSE 3B catalog of 1122 gamma-ray bursts and find no evidence for clustering on any scale. These constraints bridge the entire range from small scales (which probe source clustering and burst repetition) to the largest scales (which constrain possible anisotropies from the Galactic halo or from nearby cosmological large-scale structures). We develop an analysis technique that takes the angular position errors into account. For specific clustering or repetition models, strong upper limits can be obtained down to scales l ≈ 30, corresponding to a couple of degrees on the sky. The minimum-variance burst weighting that we employ is visualized graphically as an all-sky map in which each burst is smeared out by an amount corresponding to its position uncertainty. We also present separate bandpass-filtered sky maps for the quadrupole term and for the multipole ranges l = 3-10 and l = 11-30, so that the fluctuations on different angular scales can be inspected separately for visual features such as localized 'hot spots' or structures aligned with the Galactic plane. These filtered maps reveal no apparent deviations from isotropy.
Clustering of food and activity preferences in primary school children.
Rodenburg, Gerda; Oenema, Anke; Pasma, Marleen; Kremers, Stef P J; van de Mheen, Dike
2013-01-01
This study examined clustering of food and activity preferences in Dutch primary school children. It also explored whether the preference clusters are associated with child and parental background characteristics and with parenting practices. Data were used from 1480 parent-child dyads participating in the IVO Nutrition and Physical Activity Child cohort (INPACT). Children aged 8-11 years reported their preferences for food (e.g. fruit and sweet snacks) and activities (e.g. biking and watching television) at school with a newly developed visual instrument designed for primary school children. Parents completed a questionnaire at home. Principal component analysis was used to identify preference clusters. Backward regression analyses were used to examine the relationship between child and parental characteristics and cluster scores. We found (1) a clustering of preferences for unhealthy foods and unhealthy drinks, (2) a clustering of preferences for various physical activity behaviours, and (3) a clustering of preferences for unhealthy drinks and sedentary behaviour. Boys had a higher cluster score than girls on all three preference clusters. In addition, physical activity-related parenting practices were negatively related to unhealthy preference clusters and positively to the physical-activity-preference cluster. The next step is to relate our preference clusters to child dietary and activity behaviours, with special attention to gender differences. This may help in the development of interventions aimed at improving children's food and activity preferences. Copyright © 2012 Elsevier Ltd. All rights reserved.
Sanchez Sorzano, Carlos Oscar; Alvarez-Cabrera, Ana Lucia; Kazemi, Mohsen; Carazo, Jose María; Jonić, Slavica
2016-04-26
Single-particle electron microscopy (EM) has been shown to be very powerful for studying structures and associated conformational changes of macromolecular complexes. In the context of analyzing conformational changes of complexes, distinct EM density maps obtained by image analysis and three-dimensional (3D) reconstruction are usually analyzed in 3D for interpretation of structural differences. However, graphic visualization of these differences based on a quantitative analysis of elastic transformations (deformations) among density maps has not been done yet due to a lack of appropriate methods. Here, we present an approach that allows such visualization. This approach is based on statistical analysis of distances among elastically aligned pairs of EM maps (one map is deformed to fit the other map), and results in visualizing EM maps as points in a lower-dimensional distance space. The distances among points in the new space can be analyzed in terms of clusters or trajectories of points related to potential conformational changes. The results of the method are shown with synthetic and experimental EM maps at different resolutions. Copyright © 2016 Biophysical Society. Published by Elsevier Inc. All rights reserved.
SpectralNET – an application for spectral graph analysis and visualization
Forman, Joshua J; Clemons, Paul A; Schreiber, Stuart L; Haggarty, Stephen J
2005-01-01
Background Graph theory provides a computational framework for modeling a variety of datasets including those emerging from genomics, proteomics, and chemical genetics. Networks of genes, proteins, small molecules, or other objects of study can be represented as graphs of nodes (vertices) and interactions (edges) that can carry different weights. SpectralNET is a flexible application for analyzing and visualizing these biological and chemical networks. Results Available both as a standalone .NET executable and as an ASP.NET web application, SpectralNET was designed specifically with the analysis of graph-theoretic metrics in mind, a computational task not easily accessible using currently available applications. Users can choose either to upload a network for analysis using a variety of input formats, or to have SpectralNET generate an idealized random network for comparison to a real-world dataset. Whichever graph-generation method is used, SpectralNET displays detailed information about each connected component of the graph, including graphs of degree distribution, clustering coefficient by degree, and average distance by degree. In addition, extensive information about the selected vertex is shown, including degree, clustering coefficient, various distance metrics, and the corresponding components of the adjacency, Laplacian, and normalized Laplacian eigenvectors. SpectralNET also displays several graph visualizations, including a linear dimensionality reduction for uploaded datasets (Principal Components Analysis) and a non-linear dimensionality reduction that provides an elegant view of global graph structure (Laplacian eigenvectors). Conclusion SpectralNET provides an easily accessible means of analyzing graph-theoretic metrics for data modeling and dimensionality reduction. SpectralNET is publicly available as both a .NET application and an ASP.NET web application from . Source code is available upon request. PMID:16236170
SpectralNET--an application for spectral graph analysis and visualization.
Forman, Joshua J; Clemons, Paul A; Schreiber, Stuart L; Haggarty, Stephen J
2005-10-19
Graph theory provides a computational framework for modeling a variety of datasets including those emerging from genomics, proteomics, and chemical genetics. Networks of genes, proteins, small molecules, or other objects of study can be represented as graphs of nodes (vertices) and interactions (edges) that can carry different weights. SpectralNET is a flexible application for analyzing and visualizing these biological and chemical networks. Available both as a standalone .NET executable and as an ASP.NET web application, SpectralNET was designed specifically with the analysis of graph-theoretic metrics in mind, a computational task not easily accessible using currently available applications. Users can choose either to upload a network for analysis using a variety of input formats, or to have SpectralNET generate an idealized random network for comparison to a real-world dataset. Whichever graph-generation method is used, SpectralNET displays detailed information about each connected component of the graph, including graphs of degree distribution, clustering coefficient by degree, and average distance by degree. In addition, extensive information about the selected vertex is shown, including degree, clustering coefficient, various distance metrics, and the corresponding components of the adjacency, Laplacian, and normalized Laplacian eigenvectors. SpectralNET also displays several graph visualizations, including a linear dimensionality reduction for uploaded datasets (Principal Components Analysis) and a non-linear dimensionality reduction that provides an elegant view of global graph structure (Laplacian eigenvectors). SpectralNET provides an easily accessible means of analyzing graph-theoretic metrics for data modeling and dimensionality reduction. SpectralNET is publicly available as both a .NET application and an ASP.NET web application from http://chembank.broad.harvard.edu/resources/. Source code is available upon request.
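Two of the per-vertex quantities mentioned above, the clustering coefficient and the graph Laplacian, can be illustrated with a short sketch. This is not SpectralNET code (SpectralNET is a .NET application); the function names and the dictionary-of-sets graph representation are assumptions chosen for brevity, and the definitions are the standard ones for an unweighted, undirected graph.

```python
from itertools import combinations


def local_clustering(adj):
    """Local clustering coefficient of every vertex.

    adj maps each vertex to the set of its neighbours (undirected graph,
    no self-loops). For a vertex of degree k >= 2 the coefficient is the
    number of edges among its neighbours divided by k*(k-1)/2; vertices
    of degree below 2 are assigned 0.0 by convention here.
    """
    coeff = {}
    for v, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            coeff[v] = 0.0
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        coeff[v] = 2.0 * links / (k * (k - 1))
    return coeff


def laplacian(adj, order):
    """Unnormalised Laplacian L = D - A as a list of rows, with vertices
    arranged in the given order."""
    return [
        [len(adj[u]) if u == v else (-1 if v in adj[u] else 0) for v in order]
        for u in order
    ]
```

For a triangle a-b-c with a pendant vertex d attached to c, the coefficients are 1.0 for a and b, 1/3 for c, and 0.0 for d, and every row of the Laplacian sums to zero. The eigenvectors of this matrix are what a Laplacian-eigenvector embedding is built from.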
Computing and visualizing time-varying merge trees for high-dimensional data
DOE Office of Scientific and Technical Information (OSTI.GOV)
Oesterling, Patrick; Heine, Christian; Weber, Gunther H.
2017-06-03
We introduce a new method that identifies and tracks features in arbitrary dimensions using the merge tree -- a structure for identifying topological features based on thresholding in scalar fields. This method analyzes the evolution of features of the function by tracking changes in the merge tree and relates features by matching subtrees between consecutive time steps. Using the time-varying merge tree, we present a structural visualization of the changing function that illustrates both features and their temporal evolution. We demonstrate the utility of our approach by applying it to temporal cluster analysis of high-dimensional point clouds.
Metsalu, Tauno; Vilo, Jaak
2015-01-01
The Principal Component Analysis (PCA) is a widely used method of reducing the dimensionality of high-dimensional data, often followed by visualizing two of the components on the scatterplot. Although widely used, the method is lacking an easy-to-use web interface that scientists with little programming skills could use to make plots of their own data. The same applies to creating heatmaps: it is possible to add conditional formatting for Excel cells to show colored heatmaps, but for more advanced features such as clustering and experimental annotations, more sophisticated analysis tools have to be used. We present a web tool called ClustVis that aims to have an intuitive user interface. Users can upload data from a simple delimited text file that can be created in a spreadsheet program. It is possible to modify data processing methods and the final appearance of the PCA and heatmap plots by using drop-down menus, text boxes, sliders etc. Appropriate defaults are given to reduce the time needed by the user to specify input parameters. As an output, users can download PCA plot and heatmap in one of the preferred file formats. This web server is freely available at http://biit.cs.ut.ee/clustvis/. PMID:25969447
Analysis of cytokine release assay data using machine learning approaches.
Xiong, Feiyu; Janko, Marco; Walker, Mindi; Makropoulos, Dorie; Weinstock, Daniel; Kam, Moshe; Hrebien, Leonid
2014-10-01
The possible onset of Cytokine Release Syndrome (CRS) is an important consideration in the development of monoclonal antibody (mAb) therapeutics. In this study, several machine learning approaches are used to analyze CRS data. The analyzed data come from a human blood in vitro assay which was used to assess the potential of mAb-based therapeutics to produce cytokine release similar to that induced by Anti-CD28 superagonistic (Anti-CD28 SA) mAbs. The data contain 7 mAbs and two negative controls, a total of 423 samples coming from 44 donors. Three (3) machine learning approaches were applied in combination to observations obtained from that assay, namely (i) Hierarchical Cluster Analysis (HCA); (ii) Principal Component Analysis (PCA) followed by K-means clustering; and (iii) Decision Tree Classification (DTC). All three approaches were able to identify the treatment that caused the most severe cytokine response. HCA was able to provide information about the expected number of clusters in the data. PCA coupled with K-means clustering allowed classification of treatments sample by sample, and visualizing clusters of treatments. DTC models showed the relative importance of various cytokines such as IFN-γ, TNF-α and IL-10 to CRS. The use of these approaches in tandem provides better selection of parameters for one method based on outcomes from another, and an overall improved analysis of the data through complementary approaches. Moreover, the DTC analysis showed in addition that IL-17 may be correlated with CRS reactions, although this correlation has not yet been corroborated in the literature. Copyright © 2014 Elsevier B.V. All rights reserved.
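The K-means step of approach (ii) can be sketched in a few lines. This is only an illustration of Lloyd's algorithm, not the authors' pipeline: the PCA projection is assumed to have been done already, the `scores` values are made up, and seeding the centroids with the first k points is a simplification.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    # Assumption: the first k points serve as initial centroids.
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # assignments stable, so converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


# Hypothetical two-component PCA scores for four samples.
scores = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
labels, centroids = kmeans(scores, k=2)
print(labels, centroids)
```

In practice the number of clusters k would be chosen from another method's output, as the abstract suggests for HCA.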
Magnetic assembly of 3D cell clusters: visualizing the formation of an engineered tissue.
Ghosh, S; Kumar, S R P; Puri, I K; Elankumaran, S
2016-02-01
Contactless magnetic assembly of cells into 3D clusters has been proposed as a novel means for 3D tissue culture that eliminates the need for artificial scaffolds. However, thus far its efficacy has only been studied by comparing expression levels of generic proteins. Here, it has been evaluated by visualizing the evolution of cell clusters assembled by magnetic forces, to examine their resemblance to in vivo tissues. Cells were labeled with magnetic nanoparticles, then assembled into 3D clusters using magnetic force. Scanning electron microscopy was used to image intercellular interactions and morphological features of the clusters. When cells were held together by magnetic forces for a single day, they formed intercellular contacts through extracellular fibers. These kept the clusters intact once the magnetic forces were removed, thus serving the primary function of scaffolds. The cells self-organized into constructs consistent with the corresponding tissues in vivo. Epithelial cells formed sheets while fibroblasts formed spheroids and exhibited position-dependent morphological heterogeneity. Cells on the periphery of a cluster were flattened while those within were spheroidal, a well-known characteristic of connective tissues in vivo. Cells assembled by magnetic forces presented visual features representative of their in vivo states but largely absent in monolayers. This established the efficacy of contactless assembly as a means to fabricate in vitro tissue models. © 2016 John Wiley & Sons Ltd.
NASA Astrophysics Data System (ADS)
Wheeler, K. I.; Levia, D. F., Jr.; Hudson, J. E.
2017-12-01
As trees undergo autumnal processes such as resorption, senescence, and leaf abscission, the dissolved organic matter (DOM) contribution of leaf litter leachate to streams changes. However, little research has investigated how the fluorescent DOM (FDOM) changes throughout the autumn and how this differs inter- and intraspecifically. Two of the major impacts of global climate change on forested ecosystems include altering phenology and causing forest community species and subspecies composition restructuring. We examined changes in FDOM in leachate from American beech (Fagus grandifolia Ehrh.) leaves in Maryland, Rhode Island, Vermont, and North Carolina and yellow poplar (Liriodendron tulipifera L.) leaves from Maryland throughout three different phenophases: green, senescing, and freshly abscissed. Beech leaves from Maryland and Rhode Island have previously been identified as belonging to the same distinct genetic cluster and beech trees from Vermont and the study site in North Carolina from the other. FDOM in samples was characterized using excitation-emission matrices (EEMs) and a six-component parallel factor analysis (PARAFAC) model was created to identify components. Self-organizing maps (SOMs) were used to visualize variation and patterns in the PARAFAC component proportions of the leachate samples. Phenophase and species had the greatest influence on determining where a sample mapped on the SOM when compared to genetic clusters and geographic origin. Throughout senescence, FDOM from all the trees transitioned from more protein-like components to more humic-like ones. Percent greenness of the sampled leaves and the proportion of the tyrosine-like component 1 were found to significantly differ between the two genetic beech clusters. This suggests possible differences in photosynthesis and resorption between the two genetic clusters of beech. 
The use of SOMs to visualize differences in patterns of senescence between the different species and genetic populations proved to be useful in ways that other multivariate analysis techniques lack.
Rapid Assessment of Visual Impairment in Urban Population of Delhi, India
Gupta, Noopur; Vashist, Praveen; Malhotra, Sumit; Senjam, Suraj Singh; Misra, Vasundhara; Bhardwaj, Amit
2015-01-01
Purpose: To determine the prevalence, causes and associated demographic factors related to visual impairment amongst the urban population of New Delhi, India. Methods: A population-based, cross-sectional study was conducted in East Delhi district using cluster random sampling methodology. This Rapid Assessment of Visual Impairment (RAVI) survey involved examination of all individuals aged 40 years and above in 24 randomly selected clusters of the district. Visual acuity (VA) assessment and comprehensive ocular examination were done during the door-to-door survey. A questionnaire was used to collect personal and demographic information of the study population. Blindness and visual impairment were defined as presenting VA <3/60 and <6/18 in the better eye, respectively. Descriptive statistics were computed along with multivariable logistic regression analysis to determine associated factors for visual impairment. Results: Of 2421 subjects enumerated, 2331 (96.3%) were available for ophthalmic examination. Among those examined, 49.3% were males. The prevalence of visual impairment (VI) in the study population was 11.4% (95% C.I. 10.1, 12.7) and that of blindness was 1.2% (95% C.I. 0.8, 1.6). Uncorrected refractive error was the leading cause of VI, accounting for 53.4% of all VI, followed by cataract (33.8%). With multivariable logistic regression, the odds of having VI increased with age (OR = 24.6 [95% C.I.: 14.9, 40.7]; p < 0.001). Illiterate participants were more likely to have VI (OR = 1.5 [95% C.I.: 1.1, 2.1]) when compared to educated participants. Conclusions: The first implementation of the RAVI methodology in a North Indian population revealed that the burden of visual impairment is considerable in this region despite availability of adequate eye care facilities. Awareness generation and simple interventions like cataract surgery and provision of spectacles will help to eliminate the major causes of blindness and visual impairment in this region. PMID:25915659
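As a rough plausibility check on the reported intervals, a simple Wald (normal-approximation) confidence interval can be computed from the prevalence and the number examined. The abstract does not state how its intervals were derived, and a cluster-sampled survey would normally adjust for the design effect, so this sketch is an assumption rather than the study's procedure.

```python
import math


def wald_ci(p, n, z=1.96):
    """Normal-approximation confidence interval for a proportion p with sample size n."""
    se = math.sqrt(p * (1.0 - p) / n)  # standard error, ignoring any design effect
    return p - z * se, p + z * se


# Figures from the abstract: 11.4% visual impairment among 2331 examined.
low, high = wald_ci(0.114, 2331)
print(f"{100 * low:.1f}% to {100 * high:.1f}%")  # prints "10.1% to 12.7%"
```

The same call with p = 0.012 gives roughly 0.8% to 1.6%, in line with the interval reported for blindness.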
Orientation selectivity and the functional clustering of synaptic inputs in primary visual cortex
Wilson, Daniel E.; Whitney, David E.; Scholl, Benjamin; Fitzpatrick, David
2016-01-01
The majority of neurons in primary visual cortex are tuned for stimulus orientation, but the factors that account for the range of orientation selectivities exhibited by cortical neurons remain unclear. To address this issue, we used in vivo 2-photon calcium imaging to characterize the orientation tuning and spatial arrangement of synaptic inputs to the dendritic spines of individual pyramidal neurons in layer 2/3 of ferret visual cortex. The summed synaptic input to individual neurons reliably predicted the neuron’s orientation preference, but did not account for differences in orientation selectivity among neurons. These differences reflected a robust input-output nonlinearity that could not be explained by spike threshold alone, and was strongly correlated with the spatial clustering of co-tuned synaptic inputs within the dendritic field. Dendritic branches with more co-tuned synaptic clusters exhibited greater rates of local dendritic calcium events supporting a prominent role for functional clustering of synaptic inputs in dendritic nonlinearities that shape orientation selectivity. PMID:27294510
Data-driven cluster reinforcement and visualization in sparsely-matched self-organizing maps.
Manukyan, Narine; Eppstein, Margaret J; Rizzo, Donna M
2012-05-01
A self-organizing map (SOM) is a self-organized projection of high-dimensional data onto a typically 2-dimensional (2-D) feature map, wherein vector similarity is implicitly translated into topological closeness in the 2-D projection. However, when there are more neurons than input patterns, it can be challenging to interpret the results, due to diffuse cluster boundaries and limitations of current methods for displaying interneuron distances. In this brief, we introduce a new cluster reinforcement (CR) phase for sparsely-matched SOMs. The CR phase amplifies within-cluster similarity in an unsupervised, data-driven manner. Discontinuities in the resulting map correspond to between-cluster distances and are stored in a boundary (B) matrix. We describe a new hierarchical visualization of cluster boundaries displayed directly on feature maps, which requires no further clustering beyond what was implicitly accomplished during self-organization in SOM training. We use a synthetic benchmark problem and previously published microbial community profile data to demonstrate the benefits of the proposed methods.
The capacity limitations of orientation summary statistics
Attarha, Mouna; Moore, Cathleen M.
2015-01-01
The simultaneous–sequential method was used to test the processing capacity of establishing mean orientation summaries. Four clusters of oriented Gabor patches were presented in the peripheral visual field. One of the clusters had a mean orientation that was tilted either left or right while the mean orientations of the other three clusters were roughly vertical. All four clusters were presented at the same time in the simultaneous condition whereas the clusters appeared in temporal subsets of two in the sequential condition. Performance was lower when the means of all four clusters had to be processed concurrently than when only two had to be processed in the same amount of time. The advantage for establishing fewer summaries at a given time indicates that the processing of mean orientation engages limited-capacity processes (Experiment 1). This limitation cannot be attributed to crowding, low target-distractor discriminability, or a limited-capacity comparison process (Experiments 2 and 3). In contrast to the limitations of establishing multiple summary representations, establishing a single summary representation unfolds without interference (Experiment 4). When interpreted in the context of recent work on the capacity of summary statistics, these findings encourage reevaluation of the view that early visual perception consists of summary statistic representations that unfold independently across multiple areas of the visual field. PMID:25810160
Zhang, Y J; Zhou, D H; Bai, Z P; Xue, F X
2018-02-10
Objective: To quantitatively analyze the current status and development trends of land use regression (LUR) models in ambient air pollution studies. Methods: Relevant literature from the PubMed database before June 30, 2017 was analyzed, using the Bibliographic Items Co-occurrence Matrix Builder (BICOMB 2.0). Keyword co-occurrence networks, cluster mapping and timeline mapping were generated, using the CiteSpace 5.1.R5 software. Relevant literature identified in three Chinese databases was also reviewed. Results: Four hundred sixty-four relevant papers were retrieved from the PubMed database. The number of papers published showed an annual increase, in line with the growing trend of the index. Most papers were published in the journal Environmental Health Perspectives. Results from the co-word cluster analysis identified five clusters: cluster #0 consisted of birth cohort studies related to the health effects of prenatal exposure to air pollution; cluster #1 referred to land use regression modeling and exposure assessment; cluster #2 was related to the epidemiology of traffic exposure; cluster #3 dealt with the exposure to ultrafine particles and related health effects; cluster #4 described the exposure to black carbon and related health effects. Data from timeline mapping indicated that clusters #0 and #1 were the main research areas while clusters #3 and #4 were the emerging hot areas of research. Ninety-four relevant papers were retrieved from the Chinese databases, with most of them related to studies on modeling. Conclusion: In order to better assess the health-related risks of ambient air pollution, and to best inform preventative public health intervention policies, application of LUR models to environmental epidemiology studies in China should be encouraged.
NASA Astrophysics Data System (ADS)
Takuma, Takehisa; Masugi, Masao
2009-03-01
This paper presents an approach to the assessment of IP-network traffic in terms of the time variation of self-similarity. To get a comprehensive view in analyzing the degree of long-range dependence (LRD) of IP-network traffic, we use a hierarchical clustering scheme, which provides a way to classify high-dimensional data with a tree-like structure. Also, in the LRD-based analysis, we employ detrended fluctuation analysis (DFA), which is applicable to the analysis of long-range power-law correlations or LRD in non-stationary time-series signals. Based on sequential measurements of IP-network traffic at two locations, this paper derives corresponding values for the LRD-related parameter α that reflects the degree of LRD of measured data. In performing the hierarchical clustering scheme, we use three parameters: the α value, average throughput, and the proportion of network traffic that exceeds 80% of network bandwidth for each measured data set. We visually confirm that the traffic data can be classified in accordance with the network traffic properties, so that the combined depiction of the LRD and other factors gives an effective assessment of network conditions at different times.
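To make the role of α concrete, the following is a minimal first-order DFA sketch: integrate the mean-removed series, detrend it linearly within windows of each scale, and take the slope of log fluctuation against log scale. The window scales, the synthetic white-noise input and the helper names are assumptions for illustration and are not taken from the paper. For uncorrelated noise α should come out near 0.5, while values above 0.5 indicate long-range dependence.

```python
import math
import random


def linear_fit(xs, ys):
    """Ordinary least-squares line through (xs, ys); returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def dfa_alpha(series, scales):
    """Estimate the DFA scaling exponent alpha using linear detrending."""
    mean = sum(series) / len(series)
    # Step 1: integrated profile of the mean-removed series.
    profile, total = [], 0.0
    for value in series:
        total += value - mean
        profile.append(total)
    log_scale, log_fluct = [], []
    for n in scales:
        xs = list(range(n))
        squared_residuals, count = 0.0, 0
        # Step 2: fit and remove a linear trend in each non-overlapping window.
        for start in range(0, len(profile) - n + 1, n):
            window = profile[start:start + n]
            slope, intercept = linear_fit(xs, window)
            squared_residuals += sum(
                (y - (slope * x + intercept)) ** 2 for x, y in zip(xs, window)
            )
            count += n
        # Step 3: root-mean-square fluctuation F(n), stored as log F(n).
        log_scale.append(math.log(n))
        log_fluct.append(0.5 * math.log(squared_residuals / count))
    # Step 4: alpha is the slope of log F(n) versus log n.
    alpha, _ = linear_fit(log_scale, log_fluct)
    return alpha


random.seed(1)
white_noise = [random.gauss(0.0, 1.0) for _ in range(4096)]
alpha = dfa_alpha(white_noise, scales=[16, 32, 64, 128, 256])
print(f"alpha = {alpha:.2f}")
```

Each measured traffic trace would yield one such α, which could then be combined with throughput figures as a feature vector for hierarchical clustering.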
On three-dimensional misorientation spaces
Bennett, Robbie J.; Vukmanovic, Zoja; Solano-Alvarez, Wilberth; Lainé, Steven J.; Einsle, Joshua F.; Midgley, Paul A.; Rae, Catherine M. F.; Hielscher, Ralf
2017-01-01
Determining the local orientation of crystals in engineering and geological materials has become routine with the advent of modern crystallographic mapping techniques. These techniques enable many thousands of orientation measurements to be made, directing attention towards how such orientation data are best studied. Here, we provide a guide to the visualization of misorientation data in three-dimensional vector spaces, reduced by crystal symmetry, to reveal crystallographic orientation relationships. Domains for all point group symmetries are presented and an analysis methodology is developed and applied to identify crystallographic relationships, indicated by clusters in the misorientation space, in examples from materials science and geology. This analysis aids the determination of active deformation mechanisms and evaluation of cluster centres and spread enables more accurate description of transformation processes supporting arguments regarding provenance. PMID:29118660
Pre-crash scenarios at road junctions: A clustering method for car crash data.
Nitsche, Philippe; Thomas, Pete; Stuetz, Rainer; Welsh, Ruth
2017-10-01
Given the recent advancements in autonomous driving functions, one of the main challenges is safe and efficient operation in complex traffic situations such as road junctions. There is a need for comprehensive testing, either in virtual simulation environments or on real-world test tracks. This paper presents a novel data analysis method including the preparation, analysis and visualization of car crash data, to identify the critical pre-crash scenarios at T- and four-legged junctions as a basis for testing the safety of automated driving systems. The presented method employs k-medoids to cluster historical junction crash data into distinct partitions and then applies the association rules algorithm to each cluster to specify the driving scenarios in more detail. The dataset used consists of 1056 junction crashes in the UK, which were exported from the in-depth "On-the-Spot" database. The study resulted in thirteen crash clusters for T-junctions, and six crash clusters for crossroads. Association rules revealed common crash characteristics, which were the basis for the scenario descriptions. The results support existing findings on road junction accidents and provide benchmark situations for safety performance tests in order to reduce the possible number of parameter combinations. Copyright © 2017 Elsevier Ltd. All rights reserved.
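One reason k-medoids suits crash records is that it only needs a pairwise dissimilarity, so categorical attributes can be compared directly. The sketch below uses a simple alternating (assign, then re-pick medoid) scheme with Hamming distance. The attribute values, the distance choice and the first-k initialisation are illustrative assumptions, not details from the paper, and the first k records are assumed to be distinct.

```python
def hamming(a, b):
    """Number of attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def assign(items, medoids, dist):
    """Map each medoid index to the indices of the records closest to it."""
    clusters = {m: [] for m in medoids}
    for i, item in enumerate(items):
        nearest = min(medoids, key=lambda m: dist(item, items[m]))
        clusters[nearest].append(i)
    return clusters


def k_medoids(items, k, dist=hamming, max_iter=100):
    """Alternating k-medoids; returns {medoid index: member indices}."""
    medoids = list(range(k))  # assumption: first k records as initial medoids
    for _ in range(max_iter):
        clusters = assign(items, medoids, dist)
        updated = []
        for medoid, members in clusters.items():
            if not members:
                updated.append(medoid)
                continue
            # New medoid: the member with the smallest total distance to the rest.
            updated.append(
                min(members, key=lambda c: sum(dist(items[c], items[j]) for j in members))
            )
        if sorted(updated) == sorted(medoids):
            break
        medoids = updated
    return assign(items, medoids, dist)


# Hypothetical records: (manoeuvre, light condition, road surface).
records = [
    ("turning", "daylight", "dry"),
    ("crossing", "dark", "wet"),
    ("turning", "daylight", "wet"),
    ("crossing", "dark", "dry"),
    ("turning", "dusk", "dry"),
    ("crossing", "dark", "wet"),
]
clusters = k_medoids(records, k=2)
print(clusters)  # prints {0: [0, 2, 4], 1: [1, 3, 5]}
```

Because each medoid is an actual record, it can be read directly as a representative scenario for its cluster.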
A low complexity visualization tool that helps to perform complex systems analysis
NASA Astrophysics Data System (ADS)
Beiró, M. G.; Alvarez-Hamelin, J. I.; Busch, J. R.
2008-12-01
In this paper, we present an extension of large network visualization (LaNet-vi), a tool to visualize large scale networks using the k-core decomposition. One of the new features is how vertices compute their angular position. While in the previous version it is done using shell clusters, in this version we use the angular coordinate of vertices in higher k-shells, and arrange the highest shell according to a cliques decomposition. The time complexity goes from O(n√n) to O(n) upon bounds on a heavy-tailed degree distribution. The tool also performs a k-core-connectivity analysis, highlighting vertices that are not k-connected; e.g. this property is useful to measure robustness or quality of service (QoS) capabilities in communication networks. Finally, the current version of LaNet-vi can draw labels and all the edges using transparencies, yielding an accurate visualization. Based on the obtained figure, it is possible to distinguish different sources and types of complex networks at a glance, in a sort of 'network iris-print'.
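For readers unfamiliar with the k-core decomposition that the tool builds on, here is a small peeling sketch: repeatedly remove a vertex of minimum remaining degree and record the largest such degree seen so far as its core number. It is written for clarity and runs in quadratic time; it is not LaNet-vi's implementation, and the example graph is made up.

```python
from collections import defaultdict


def core_numbers(edges):
    """Return {vertex: core number} for an undirected simple graph."""
    adjacency = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adjacency[u].add(v)
            adjacency[v].add(u)
    degree = {v: len(neighbours) for v, neighbours in adjacency.items()}
    remaining = set(adjacency)
    core = {}
    k = 0
    while remaining:
        # Peel off a vertex with the smallest remaining degree.
        v = min(remaining, key=degree.__getitem__)
        k = max(k, degree[v])  # core numbers never decrease during peeling
        core[v] = k
        remaining.remove(v)
        for w in adjacency[v]:
            if w in remaining:
                degree[w] -= 1
    return core


# A triangle (1, 2, 3) with one pendant vertex 4 attached to vertex 3.
cores = core_numbers([(1, 2), (2, 3), (1, 3), (3, 4)])
print(cores)
```

Here the triangle forms the 2-core and the pendant vertex sits in the 1-shell, which is the kind of layering the visualization arranges radially.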
NASA Astrophysics Data System (ADS)
Krumholz, Mark R.; Adamo, Angela; Fumagalli, Michele; Wofford, Aida; Calzetti, Daniela; Lee, Janice C.; Whitmore, Bradley C.; Bright, Stacey N.; Grasha, Kathryn; Gouliermis, Dimitrios A.; Kim, Hwihyun; Nair, Preethi; Ryon, Jenna E.; Smith, Linda J.; Thilker, David; Ubeda, Leonardo; Zackrisson, Erik
2015-10-01
We investigate a novel Bayesian analysis method, based on the Stochastically Lighting Up Galaxies (slug) code, to derive the masses, ages, and extinctions of star clusters from integrated light photometry. Unlike many analysis methods, slug correctly accounts for incomplete initial mass function (IMF) sampling, and returns full posterior probability distributions rather than simply probability maxima. We apply our technique to 621 visually confirmed clusters in two nearby galaxies, NGC 628 and NGC 7793, that are part of the Legacy Extragalactic UV Survey (LEGUS). LEGUS provides Hubble Space Telescope photometry in the NUV, U, B, V, and I bands. We analyze the sensitivity of the derived cluster properties to choices of prior probability distribution, evolutionary tracks, IMF, metallicity, treatment of nebular emission, and extinction curve. We find that slug's results for individual clusters are insensitive to most of these choices, but that the posterior probability distributions we derive are often quite broad, and sometimes multi-peaked and quite sensitive to the choice of priors. In contrast, the properties of the cluster population as a whole are relatively robust against all of these choices. We also compare our results from slug to those derived with a conventional non-stochastic fitting code, Yggdrasil. We show that slug's stochastic models are generally a better fit to the observations than the deterministic ones used by Yggdrasil. However, the overall properties of the cluster populations recovered by both codes are qualitatively similar.
Exploiting visual search theory to infer social interactions
NASA Astrophysics Data System (ADS)
Rota, Paolo; Dang-Nguyen, Duc-Tien; Conci, Nicola; Sebe, Nicu
2013-03-01
In this paper we propose a new method to infer human social interactions using typical techniques adopted in the literature for visual search and information retrieval. The main piece of information we use to discriminate among different types of interactions is provided by proxemics cues acquired by a tracker, and used to distinguish between intentional and casual interactions. The proxemics information has been acquired through the analysis of two different metrics: on the one hand we observe the current distance between subjects, and on the other hand we measure the O-space synergy between subjects. The obtained values are taken at every time step over a temporal sliding window, and processed in the Discrete Fourier Transform (DFT) domain. The features are eventually merged into a single array, and clustered using the K-means algorithm. The clusters are reorganized using a second larger temporal window into a Bag of Words framework, so as to build the feature vector that will feed the SVM classifier.
MetaABC--an integrated metagenomics platform for data adjustment, binning and clustering.
Su, Chien-Hao; Hsu, Ming-Tsung; Wang, Tse-Yi; Chiang, Sufeng; Cheng, Jen-Hao; Weng, Francis C; Kao, Cheng-Yan; Wang, Daryi; Tsai, Huai-Kuang
2011-08-15
MetaABC is a metagenomic platform that integrates several binning tools coupled with methods for removing artifacts, analyzing unassigned reads and controlling sampling biases. It allows users to arrive at a better interpretation via series of distinct combinations of analysis tools. After execution, MetaABC provides outputs in various visual formats such as tables, pie and bar charts as well as clustering result diagrams. MetaABC source code and documentation are available at http://bits2.iis.sinica.edu.tw/MetaABC/. Contact: dywang@gate.sinica.edu.tw; hktsai@iis.sinica.edu.tw. Supplementary data are available at Bioinformatics online.
Wu, Desheng; Song, Yu; Xie, Kefan; Zhang, Baofeng
2018-04-25
Chemical accidents are major causes of environmental losses and have been debated due to their potential threat to human beings and the environment. Compared with a single statistical analysis, co-word analysis of chemical accidents illustrates significant traits at various levels and presents the data as a visual network. This study utilizes a co-word analysis of the keywords extracted from Web-crawled texts of environmental loss-related chemical accidents and uses Pearson's correlation coefficient to examine the internal attributes. To visualize the keywords of the accidents, this study carries out a multidimensional scaling analysis applying PROXSCAL and centrality identification. The research results show that an enormous environmental cost is exacted, especially given the expected environmental loss-related chemical accidents with geographical features. Meanwhile, each event often brings more than one environmental impact. A large number of chemical substances are released in the form of solid, liquid, and gas, leading to serious results. Eight clusters that represent the traits of these accidents are formed, including "leakage," "poisoning," "explosion," "pipeline crack," "river pollution," "dust pollution," "emission," and "industrial effluent." "Explosion" and "gas" possess a strong correlation with "poisoning," located at the center of the visualization map.
Clustering Module in OLAP for Horticultural Crops using SpagoBI
NASA Astrophysics Data System (ADS)
Putri, D.; Sitanggang, I. S.
2017-03-01
Horticultural crops data are organized by the Ministry of Agriculture, Republic of Indonesia. The data are presented annually in tabular form and result in a large data set. This makes it difficult for users to obtain summaries of horticultural crops data. This study aims to develop a clustering module in the SOLAP system for the distribution of horticultural crops in Indonesia and to visualize the results of clustering on a map using SpagoBI. The algorithm used for clustering is K-Means. Horticultural crops data include vegetables, ornamental plants, medicinal plants, and fruits from 2000 to 2013. The clustering module displays clustering results of horticultural crops in the form of text and tables on SpagoBI. This module can also visualize the distribution of horticultural crops in the form of a map on an HTML page. The application is expected to help users easily obtain summaries of the horticultural crops distribution data and its clusters. The summaries and clusters can help stakeholders determine potential areas in Indonesia for horticultural crops.
Spatio-temporal Analysis for New York State SPARCS Data
Chen, Xin; Wang, Yu; Schoenfeld, Elinor; Saltz, Mary; Saltz, Joel; Wang, Fusheng
2017-01-01
Increased accessibility of health data provides unique opportunities to discover spatio-temporal patterns of diseases. For example, New York State SPARCS (Statewide Planning and Research Cooperative System) data collects patient level detail on patient demographics, diagnoses, services, and charges for each hospital inpatient stay and outpatient visit. Such data also provides home addresses for each patient. This paper presents our preliminary work on spatial, temporal, and spatial-temporal analysis of disease patterns for New York State using SPARCS data. We analyzed spatial distribution patterns of typical diseases at ZIP code level. We performed temporal analysis of common diseases based on 12 years’ historical data. We then compared the spatial variations for diseases with different levels of clustering tendency, and studied the evolution history of such spatial patterns. Case studies based on asthma demonstrated that the discovered spatial clusters are consistent with prior studies. We visualized our spatial-temporal patterns as animations through videos. PMID:28815148
Popkirov, Stoyan; Jungilligens, Johannes; Schlegel, Uwe; Wellmer, Jörg
2018-06-01
Dissociative seizures are a common and often elusive differential diagnosis in epilepsy centers. Considering their high prevalence, long diagnostic delays, and disappointing rates of treatment response, scientific research dedicated to dissociative seizures is surprisingly scarce. In order to chart the scientific landscape of dissociative seizures and to visualize thematic clusters and trends in research, a comprehensive bibliometric analysis was performed. The Web of Science database was examined to identify relevant English language documents from the last half-century. A total of 1751 documents with titles referring to dissociative seizures were identified. Automated textual analysis of all titles and abstracts revealed that research clusters around three major topics: differential diagnosis in epilepsy centers, management and treatment, and psychopathology. Time analysis of term networks revealed that the focus of clinical research has moved from diagnostic procedures to treatment approaches. Furthermore, interest within etiological research is shifting from an emphasis on early life trauma and personality traits to the role of anxiety and emotion regulation. With respect to individual contributing authors, a relatively small network of prolific scientists with a remarkable degree of collaboration emerges. By mapping relevant publications, it becomes evident that dissociative seizures still represent a subject mostly within the realm of neurology and epileptology, with a tendency to settle in the latter domain. This analysis sheds light on an important niche subject and highlights trends in research focus and output. Copyright © 2018 Elsevier Inc. All rights reserved.
StreamExplorer: A Multi-Stage System for Visually Exploring Events in Social Streams.
Wu, Yingcai; Chen, Zhutian; Sun, Guodao; Xie, Xiao; Cao, Nan; Liu, Shixia; Cui, Weiwei
2017-10-18
Analyzing social streams is important for many applications, such as crisis management. However, the considerable diversity, increasing volume, and high dynamics of social streams of large events continue to be significant challenges that must be overcome to ensure effective exploration. We propose a novel framework by which to handle complex social streams on a budget PC. This framework features two components: 1) an online method to detect important time periods (i.e., subevents), and 2) a tailored GPU-assisted Self-Organizing Map (SOM) method, which clusters the tweets of subevents stably and efficiently. Based on the framework, we present StreamExplorer to facilitate the visual analysis, tracking, and comparison of a social stream at three levels. At a macroscopic level, StreamExplorer uses a new glyph-based timeline visualization, which presents a quick multi-faceted overview of the ebb and flow of a social stream. At a mesoscopic level, a map visualization is employed to visually summarize the social stream from either a topical or geographical aspect. At a microscopic level, users can employ interactive lenses to visually examine and explore the social stream from different perspectives. Two case studies and a task-based evaluation are used to demonstrate the effectiveness and usefulness of StreamExplorer.
Visualization of Unsteady Computational Fluid Dynamics
NASA Technical Reports Server (NTRS)
Haimes, Robert
1997-01-01
The current compute environment that most researchers are using for the calculation of 3D unsteady Computational Fluid Dynamic (CFD) results is a super-computer class machine. The Massively Parallel Processors (MPPs) such as the 160 node IBM SP2 at NAS and clusters of workstations acting as a single MPP (like NAS's SGI Power-Challenge array and the J90 cluster) provide the required computation bandwidth for CFD calculations of transient problems. If we follow the traditional computational analysis steps for CFD (and we wish to construct an interactive visualizer) we need to be aware of the following: (1) Disk space requirements. A single snapshot must contain at least the values (primitive variables) stored at the appropriate locations within the mesh. For most simple 3D Euler solvers that means 5 floating point words. Navier-Stokes solutions with turbulence models may contain 7 state-variables. (2) Disk speed vs. computational speed. The time required to read the complete solution of a saved time frame from disk is now longer than the compute time for a set number of iterations from an explicit solver. Depending on the hardware and solver, an iteration of an implicit code may also take less time than reading the solution from disk. If one examines the performance improvements in the last decade or two, it is easy to see that depending on disk performance (vs. CPU improvement) may not be the best method for enhancing interactivity. (3) Cluster and Parallel Machine I/O problems. Disk access time is much worse within current parallel machines and clusters of workstations that are acting in concert to solve a single problem. In this case we are not trying to read the volume of data, but are running the solver and the solver outputs the solution. These traditional network interfaces must be used for the file system. (4) Numerics of particle traces. 
Most visualization tools can work upon a single snapshot of the data but some visualization tools for transient problems require dealing with time.
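The disk space point in item (1) can be made concrete with a little arithmetic. The following is a minimal sketch, not taken from the report: the mesh size of one million nodes and the 4-byte word size are assumptions chosen only for illustration, and the function name `snapshot_bytes` is hypothetical.

```python
def snapshot_bytes(num_nodes: int, words_per_node: int, bytes_per_word: int = 4) -> int:
    """Raw size of one saved time frame: one set of state variables per mesh node."""
    return num_nodes * words_per_node * bytes_per_word


if __name__ == "__main__":
    nodes = 1_000_000  # assumed mesh size, for illustration only
    euler = snapshot_bytes(nodes, 5)          # 5 words per node (simple 3D Euler solver)
    navier_stokes = snapshot_bytes(nodes, 7)  # 7 state variables (Navier-Stokes with turbulence model)
    print(f"Euler snapshot:         {euler / 1e6:.1f} MB")
    print(f"Navier-Stokes snapshot: {navier_stokes / 1e6:.1f} MB")
    # An unsteady run saves many such frames, so total storage grows linearly with the frame count.
    print(f"1000 Euler frames:      {1000 * euler / 1e9:.1f} GB")
```

Under these assumptions the per-frame cost looks modest, but it is multiplied by every saved time step, which is why reading frames back from disk can dominate an interactive session.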
Visualization of unsteady computational fluid dynamics
NASA Astrophysics Data System (ADS)
Haimes, Robert
1994-11-01
A brief summary of the computer environment used for calculating three dimensional unsteady Computational Fluid Dynamic (CFD) results is presented. This environment requires a supercomputer, as well as massively parallel processors (MPPs) and clusters of workstations acting as a single MPP (by concurrently working on the same task), to provide the required computational bandwidth for CFD calculations of transient problems. The cluster of reduced instruction set computers (RISC) is a recent development based on the low cost and high performance that workstation vendors provide. The cluster, with the proper software, can act as a multiple instruction/multiple data (MIMD) machine. A new set of software tools is being designed specifically to address visualizing 3D unsteady CFD results in these environments. Three user's manuals for the parallel version of Visual3, pV3, revision 1.00 make up the bulk of this report.
Visualization of unsteady computational fluid dynamics
NASA Technical Reports Server (NTRS)
Haimes, Robert
1994-01-01
A brief summary of the computer environment used for calculating three dimensional unsteady Computational Fluid Dynamic (CFD) results is presented. This environment requires a supercomputer, as well as massively parallel processors (MPPs) and clusters of workstations acting as a single MPP (by concurrently working on the same task), to provide the required computational bandwidth for CFD calculations of transient problems. The cluster of reduced instruction set computers (RISC) is a recent development based on the low cost and high performance that workstation vendors provide. The cluster, with the proper software, can act as a multiple instruction/multiple data (MIMD) machine. A new set of software tools is being designed specifically to address visualizing 3D unsteady CFD results in these environments. Three user's manuals for the parallel version of Visual3, pV3, revision 1.00 make up the bulk of this report.
Bobkov, Iu G; Machula, A I; Morozov, Iu I; Dvalishvili, E G
1987-11-01
Evoked visual potentials in associated, parietal and second somatosensory zones of the neocortex were analysed in trained cats using implanted electrodes. The influence of bemethyl on the structure of behavioral reactions was analysed using theoretical methods of perceptual images, particularly the method of cluster analysis. Bemethyl was shown to increase the level of interaction between the functional elements of the system, leading to a more stable resolution of problems facing the system, as compared to the initial state.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Heidelberg, S T; Fitzgerald, K J; Richmond, G H
2006-01-24
There has been substantial development of the Lustre parallel filesystem prior to the configuration described below for this milestone. The initial Lustre filesystems that were deployed were directly connected to the cluster interconnect, i.e. Quadrics Elan3. That is, the clients (OSSes) and Meta-data Servers (MDS) were all directly connected to the cluster's internal high speed interconnect. This configuration serves a single cluster very well, but does not provide sharing of the filesystem among clusters. LLNL funded the development of high-efficiency "portals router" code by CFS (the company that develops Lustre) to enable us to move the Lustre servers to a GigE-connected network configuration, thus making it possible to connect to the servers from several clusters. With portals routing available, here is what changes: (1) another storage-only cluster is deployed to front the Lustre storage devices (these become the Lustre OSSes and MDS), (2) this "Lustre cluster" is attached via GigE connections to a large GigE switch/router cloud, (3) a small number of compute-cluster nodes are designated as "gateway" or "portal router" nodes, and (4) the portals router nodes are GigE-connected to the switch/router cloud. The Lustre configuration is then changed to reflect the new network paths. A typical example of this is a compute cluster and a related visualization cluster: the compute cluster produces the data (writes it to the Lustre filesystem), and the visualization cluster consumes some of the data (reads it from the Lustre filesystem). This process can be expanded by aggregating several collections of Lustre backend storage resources into one or more "centralized" Lustre filesystems, and then arranging to have several "client" clusters mount these centralized filesystems. The "client clusters" can be any combination of compute, visualization, archiving, or other types of cluster. 
This milestone demonstrates the operation and performance of a scaled-down version of such a large, centralized, shared Lustre filesystem concept.
Implementation of visual data mining for unsteady blood flow field in an aortic aneurysm.
Morizawa, Seiichiro; Shimoyama, Koji; Obayashi, Shigeru; Funamoto, Kenichi; Hayase, Toshiyuki
2011-12-01
This study was performed to determine the relations between the features of wall shear stress and aneurysm rupture. For this purpose, visual data mining was performed in unsteady blood flow simulation data for an aortic aneurysm. The time-series data of wall shear stress given at each grid point were converted to spatial and temporal indices, and the grid points were sorted using a self-organizing map based on the similarity of these indices. Next, the results of cluster analysis were mapped onto the real space of the aortic aneurysm to specify the regions that may lead to aneurysm rupture. With reference to previous reports regarding aneurysm rupture, the visual data mining suggested specific hemodynamic features that cause aneurysm rupture.
ERIC Educational Resources Information Center
Hay-McCutcheon, Marcia J.; Hyams, Adriana; Yang, Xin; Parton, Jason; Panasiuk, Brianna; Ondocsin, Sarah; James, Mary Margaret; Scogin, Forrest
2017-01-01
Purpose: The purpose of this preliminary study was to explore the associations among hearing loss, physical health, and visual memory in adults living in rural areas, urban clusters, and an urban city in west Central Alabama. Method: Two hundred ninety-seven adults (182 women, 115 men) from rural areas, urban clusters, and an urban city of west…
A reference guide for tree analysis and visualization
2010-01-01
The quantities of data obtained by the new high-throughput technologies, such as microarrays or ChIP-Chip arrays, and the large-scale OMICS-approaches, such as genomics, proteomics and transcriptomics, are becoming vast. Sequencing technologies are becoming cheaper and easier to use and, thus, large-scale evolutionary studies towards the origins of life for all species and their evolution are becoming more and more challenging. Databases holding information about how data are related and how they are hierarchically organized expand rapidly. Clustering analysis is becoming more and more difficult to apply to very large amounts of data since the results of these algorithms cannot be efficiently visualized. Most of the available visualization tools that are able to represent such hierarchies project data in 2D and often lack the necessary user friendliness and interactivity. For example, the current phylogenetic tree visualization tools are not able to display large-scale trees with more than a few thousand nodes in an easy-to-understand way. In this study, we review tools that are currently available for the visualization of biological trees and analysis, mainly developed during the last decade. We describe the uniform and standard computer readable formats to represent tree hierarchies and we comment on the functionality and the limitations of these tools. We also discuss how these tools can be developed further and should become integrated with various data sources. Here we focus on freely available software that offers to the users various tree-representation methodologies for biological data analysis. PMID:20175922
ERIC Educational Resources Information Center
Campbell, Daniel J.; Shic, Frederick; Macari, Suzanne; Chawarska, Katarzyna
2014-01-01
Variability in attention towards direct gaze and child-directed speech may contribute to heterogeneity of clinical presentation in toddlers with autism spectrum disorders (ASD). To evaluate this hypothesis, we clustered sixty-five 20-month-old toddlers with ASD based on their visual responses to dyadic cues for engagement, identifying three…
Multi-Spacecraft Analysis with Generic Visualization Tools
NASA Astrophysics Data System (ADS)
Mukherjee, J.; Vela, L.; Gonzalez, C.; Jeffers, S.
2010-12-01
To handle the needs of scientists today and in the future, software tools are going to have to take better advantage of the currently available hardware. Specifically, computing power, memory, and disk space have become cheaper, while bandwidth has become more expensive due to the explosion of online applications. To overcome these limitations, we have enhanced our Southwest Data Display and Analysis System (SDDAS) to take better advantage of the hardware by utilizing threads and data caching. Furthermore, the system was enhanced to support a framework for adding data formats and data visualization methods without costly rewrites. Visualization tools can speed analysis of many common scientific tasks and we will present a suite of tools that encompass the entire process of retrieving data from multiple data stores to common visualizations of the data. The goals for the end user are ease of use and interactivity with the data and the resulting plots. The data can be simultaneously plotted in a variety of formats and/or time and spatial resolutions. The software will allow one to slice and separate data to achieve other visualizations. Furthermore, one can interact with the data using the GUI or through an embedded language based on the Lua scripting language. The data presented will be primarily from the Cluster and Mars Express missions; however, the tools are data type agnostic and can be used for virtually any type of data.
Visual exploration of high-dimensional data through subspace analysis and dynamic projections
Liu, S.; Wang, B.; Thiagarajan, J. J.; ...
2015-06-01
Here, we introduce a novel interactive framework for visualizing and exploring high-dimensional datasets based on subspace analysis and dynamic projections. We assume the high-dimensional dataset can be represented by a mixture of low-dimensional linear subspaces with mixed dimensions, and provide a method to reliably estimate the intrinsic dimension and linear basis of each subspace extracted from the subspace clustering. Subsequently, we use these bases to define unique 2D linear projections as viewpoints from which to visualize the data. To understand the relationships among the different projections and to discover hidden patterns, we connect these projections through dynamic projections that create smooth animated transitions between pairs of projections. We introduce the view transition graph, which provides flexible navigation among these projections to facilitate an intuitive exploration. Finally, we provide detailed comparisons with related systems, and use real-world examples to demonstrate the novelty and usability of our proposed framework.
Visual Exploration of High-Dimensional Data through Subspace Analysis and Dynamic Projections
DOE Office of Scientific and Technical Information (OSTI.GOV)
Liu, S.; Wang, B.; Thiagarajan, Jayaraman J.
2015-06-01
We introduce a novel interactive framework for visualizing and exploring high-dimensional datasets based on subspace analysis and dynamic projections. We assume the high-dimensional dataset can be represented by a mixture of low-dimensional linear subspaces with mixed dimensions, and provide a method to reliably estimate the intrinsic dimension and linear basis of each subspace extracted from the subspace clustering. Subsequently, we use these bases to define unique 2D linear projections as viewpoints from which to visualize the data. To understand the relationships among the different projections and to discover hidden patterns, we connect these projections through dynamic projections that create smooth animated transitions between pairs of projections. We introduce the view transition graph, which provides flexible navigation among these projections to facilitate an intuitive exploration. Finally, we provide detailed comparisons with related systems, and use real-world examples to demonstrate the novelty and usability of our proposed framework.
NASA Astrophysics Data System (ADS)
Rogowitz, Bernice E.; Rabenhorst, David A.; Gerth, John A.; Kalin, Edward B.
1996-04-01
This paper describes a set of visual techniques, based on principles of human perception and cognition, which can help users analyze and develop intuitions about tabular data. Collections of tabular data are widely available, including, for example, multivariate time series data, customer satisfaction data, stock market performance data, multivariate profiles of companies and individuals, and scientific measurements. In our approach, we show how visual cues can help users perform a number of data mining tasks, including identifying correlations and interaction effects, finding clusters and understanding the semantics of cluster membership, identifying anomalies and outliers, and discovering multivariate relationships among variables. These cues are derived from psychological studies on perceptual organization, visual search, perceptual scaling, and color perception. These visual techniques are presented as a complement to the statistical and algorithmic methods more commonly associated with these tasks, and provide an interactive interface for the human analyst.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ruebel, Oliver
2009-11-20
Knowledge discovery from large and complex collections of today's scientific datasets is a challenging task. With the ability to measure and simulate more processes at increasingly finer spatial and temporal scales, the increasing number of data dimensions and data objects is presenting tremendous challenges for data analysis and effective data exploration methods and tools. Researchers are overwhelmed with data and standard tools are often insufficient to enable effective data analysis and knowledge discovery. The main objective of this thesis is to provide important new capabilities to accelerate scientific knowledge discovery from large, complex, and multivariate scientific data. The research covered in this thesis addresses these scientific challenges using a combination of scientific visualization, information visualization, automated data analysis, and other enabling technologies, such as efficient data management. The effectiveness of the proposed analysis methods is demonstrated via applications in two distinct scientific research fields, namely developmental biology and high-energy physics. Advances in microscopy, image analysis, and embryo registration enable for the first time measurement of gene expression at cellular resolution for entire organisms. Analysis of high-dimensional spatial gene expression datasets is a challenging task. By integrating data clustering and visualization, analysis of complex, time-varying, spatial gene expression patterns and their formation becomes possible. The analysis framework MATLAB and the visualization have been integrated, making advanced analysis tools accessible to biologists and enabling bioinformatic researchers to directly integrate their analysis with the visualization. Laser wakefield particle accelerators (LWFAs) promise to be a new compact source of high-energy particles and radiation, with wide applications ranging from medicine to physics. 
To gain insight into the complex physical processes of particle acceleration, physicists model LWFAs computationally. The datasets produced by LWFA simulations are (i) extremely large, (ii) of varying spatial and temporal resolution, (iii) heterogeneous, and (iv) high-dimensional, making analysis and knowledge discovery from complex LWFA simulation data a challenging task. To address these challenges this thesis describes the integration of the visualization system VisIt and the state-of-the-art index/query system FastBit, enabling interactive visual exploration of extremely large three-dimensional particle datasets. Researchers are especially interested in beams of high-energy particles formed during the course of a simulation. This thesis describes novel methods for automatic detection and analysis of particle beams enabling a more accurate and efficient data analysis process. By integrating these automated analysis methods with visualization, this research enables more accurate, efficient, and effective analysis of LWFA simulation data than previously possible.
SEURAT: visual analytics for the integrated analysis of microarray data.
Gribov, Alexander; Sill, Martin; Lück, Sonja; Rücker, Frank; Döhner, Konstanze; Bullinger, Lars; Benner, Axel; Unwin, Antony
2010-06-03
In translational cancer research, gene expression data is collected together with clinical data and genomic data arising from other chip based high throughput technologies. Software tools for the joint analysis of such high dimensional data sets together with clinical data are required. We have developed an open source software tool which provides interactive visualization capability for the integrated analysis of high-dimensional gene expression data together with associated clinical data, array CGH data and SNP array data. The different data types are organized by a comprehensive data manager. Interactive tools are provided for all graphics: heatmaps, dendrograms, barcharts, histograms, eventcharts and a chromosome browser, which displays genetic variations along the genome. All graphics are dynamic and fully linked so that any object selected in a graphic will be highlighted in all other graphics. For exploratory data analysis the software provides unsupervised data analytics like clustering, seriation algorithms and biclustering algorithms. The SEURAT software meets the growing needs of researchers to perform joint analysis of gene expression, genomical and clinical data.
PDBFlex: exploring flexibility in protein structures
Hrabe, Thomas; Li, Zhanwen; Sedova, Mayya; Rotkiewicz, Piotr; Jaroszewski, Lukasz; Godzik, Adam
2016-01-01
The PDBFlex database, available freely and with no login requirements at http://pdbflex.org, provides information on flexibility of protein structures as revealed by the analysis of variations between depositions of different structural models of the same protein in the Protein Data Bank (PDB). PDBFlex collects information on all instances of such depositions, identifying them by a 95% sequence identity threshold, performs analysis of their structural differences and clusters them according to their structural similarities for easy analysis. The PDBFlex contains tools and viewers enabling in-depth examination of structural variability including: 2D-scaling visualization of RMSD distances between structures of the same protein, graphs of average local RMSD in the aligned structures of protein chains, graphical presentation of differences in secondary structure and observed structural disorder (unresolved residues), difference distance maps between all sets of coordinates and 3D views of individual structures and simulated transitions between different conformations, the latter displayed using JSMol visualization software. PMID:26615193
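The RMSD distances mentioned in the abstract can be illustrated with a short calculation. This is a minimal sketch under stated assumptions, not PDBFlex's actual pipeline: it assumes the two structures are already superimposed and list the same atoms in the same order, and the function names `rmsd` and `pairwise_rmsd` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """Root-mean-square deviation between two pre-aligned coordinate sets."""
    if len(a) != len(b) or not a:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    squared = sum(
        (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
        for (x1, y1, z1), (x2, y2, z2) in zip(a, b)
    )
    return math.sqrt(squared / len(a))


def pairwise_rmsd(models: Sequence[Sequence[Coord]]) -> List[List[float]]:
    """Symmetric matrix of RMSD values between every pair of models."""
    n = len(models)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i][j] = matrix[j][i] = rmsd(models[i], models[j])
    return matrix


if __name__ == "__main__":
    # Three toy two-atom "structures"; the values are illustrative only.
    m0 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    m1 = [(0.0, 1.0, 0.0), (1.0, 1.0, 0.0)]
    m2 = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)]
    for row in pairwise_rmsd([m0, m1, m2]):
        print(row)
```

A distance matrix of this kind is the usual input to a 2D scaling layout or a clustering step, which is how it connects to the visualizations the abstract lists.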
Brownian model of transcriptome evolution and phylogenetic network visualization between tissues.
Gu, Xun; Ruan, Hang; Su, Zhixi; Zou, Yangyun
2017-09-01
While phylogenetic analysis of transcriptomes of the same tissue is usually congruent with the species tree, controversy emerges when multiple tissues are included, that is, whether species from the same tissue are clustered together, or different tissues from the same species are clustered together. Recent studies have suggested that a phylogenetic network approach may shed some light on our understanding of multi-tissue transcriptome evolution; yet the underlying evolutionary mechanism remains unclear. In this paper we develop a Brownian-based model of transcriptome evolution under the phylogenetic network that can statistically distinguish between the patterns of species-clustering and tissue-clustering. Our model can be used as a null hypothesis (neutral transcriptome evolution) for testing any correlation in tissue evolution, can be applied to cancer transcriptome evolution to study whether two tumors of an individual appeared independently or via metastasis, and can be useful to detect convergent evolution at the transcriptional level. Copyright © 2017. Published by Elsevier Inc.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Glagolev, Mikhail K.; Vasilevskaya, Valentina V.; Khokhlov, Alexei R.
The impact of mixture composition on self-organization in concentrated solutions of stiff helical and flexible macromolecules was studied by means of molecular dynamics simulation. The macromolecules were composed of identical amphiphilic monomer units but a fraction f of macromolecules had stiff helical backbones and the remaining chains were flexible. In poor solvents the compacted flexible macromolecules coexist with bundles or filament clusters of a few intertwined stiff helical macromolecules. An increase in the relative content f of helical macromolecules leads to an increase in the length of helical clusters, to alignment of clusters with each other, and then to liquid-crystalline-like ordering along a single direction. The formation of filament clusters causes segregation of helical and flexible macromolecules and the alignment of the filaments induces effective liquid-like ordering of flexible macromolecules. Visual analysis and calculation of an order parameter relaying the anisotropy of diffraction allow us to conclude that the transition from the disordered to the liquid-crystalline state proceeds sharply at a relatively low content of stiff components.
Mansour, Ahmad M; Hamade, Haya; Ghaddar, Ayman; Mokadem, Ahmad Samih; El Hajj Ali, Mohamad; Awwad, Shady
2012-01-01
To present the visual outcomes and ocular sequelae of victims of cluster bombs. This retrospective, multicenter case series of ocular injury due to cluster bombs was conducted for 3 years after the war in South Lebanon (July 2006). Data were gathered from the reports to the Information Management System for Mine Action. There were 308 victims of cluster bombs; 36 individuals were killed, of whom 2 received ocular lacerations, and 272 individuals were injured, with 18 receiving ocular injury. These 18 surviving individuals were assessed by the authors. Ocular injury occurred in 6.5% (20/308) of cluster bomb victims. Trauma to multiple organs occurred in 12 of 18 cases (67%) with ocular injury. Ocular findings included corneal or scleral lacerations (16 eyes), corneal foreign bodies (9 eyes), corneal decompensation (2 eyes), ruptured cataract (6 eyes), and intravitreal foreign bodies (10 eyes). The corneas of one patient had extreme attenuation of the endothelium. Ocular injury occurred in 6.5% of cluster bomb victims and 67% of the patients with ocular injury sustained trauma to multiple organs. Visual morbidity in civilians is an additional reason for a global ban on the use of cluster bombs.
NASA Astrophysics Data System (ADS)
Kumar, Raj; Sharma, Vishal
2017-03-01
The present research is focused on the analysis of writing inks using destructive UV-Vis spectroscopy (dissolution of ink by the solvent) and non-destructive diffuse reflectance UV-Vis-NIR spectroscopy along with chemometrics. Fifty-seven samples of blue ballpoint pen inks were analyzed under optimum conditions to determine the differences in spectral features of inks among the same and different manufacturers. Normalization was performed on the spectroscopic data before chemometric analysis. Principal Component Analysis (PCA) and K-means cluster analysis were used on the data to ascertain whether the blue ballpoint pen inks could be differentiated by their UV-Vis/UV-Vis-NIR spectra. The discriminating power is calculated by qualitative analysis by the visual comparison of the spectra (absorbance peaks), produced by the destructive and non-destructive methods. In the latter two methods, the pairwise comparison is made by incorporating the clustering method. It is found that the chemometric method provides better discriminating power (98.72% and 99.46%, in destructive and non-destructive, respectively) in comparison to the qualitative analysis (69.67%).
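The pairwise comparison behind a discriminating power figure can be sketched as follows. This is a hedged illustration, not the authors' procedure: it assumes the common forensic definition (discriminated pairs divided by all possible pairs) and treats two samples as discriminated when a clustering step assigns them different labels. The function names and the toy labels are hypothetical.

```python
from itertools import combinations
from typing import Hashable, Sequence


def total_pairs(n: int) -> int:
    """Number of distinct sample pairs among n samples: n * (n - 1) / 2."""
    return n * (n - 1) // 2


def discriminating_power(labels: Sequence[Hashable]) -> float:
    """Fraction of sample pairs whose cluster labels differ (assumed definition)."""
    if len(labels) < 2:
        raise ValueError("need at least two samples to form a pair")
    discriminated = sum(1 for a, b in combinations(labels, 2) if a != b)
    return discriminated / total_pairs(len(labels))


if __name__ == "__main__":
    # Fifty-seven ink samples, as in the abstract, give this many possible pairs.
    print("pairs among 57 samples:", total_pairs(57))
    # Toy cluster assignment for five samples; not data from the study.
    toy_labels = ["A", "A", "B", "C", "C"]
    print(f"toy discriminating power: {discriminating_power(toy_labels):.2%}")
```

The point of the sketch is only that the percentage depends on how many pairs end up in different groups, so a finer clustering of the spectra raises the figure.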
Visualization analysis of author collaborations in schizophrenia research.
Wu, Ying; Duan, Zhiguang
2015-02-19
Schizophrenia is a serious mental illness that levies a heavy medical toll and cost burden throughout the world. Scientific collaborations are necessary for progress in psychiatric research. However, there have been few publications on scientific collaborations in schizophrenia. The aim of this study was to investigate the extent of author collaborations in schizophrenia research. This study used 58,107 records on schizophrenia from 2003 to 2012 which were downloaded from Science Citation Index Expanded (SCI Expanded) via Web of Science. CiteSpace III, an information visualization and analysis software, was used to make a visual analysis. Collaborative author networks within the field of schizophrenia were determined using published documents. We found that external author collaboration networks were more scattered while potential author collaboration networks were more compact. Results from hierarchical clustering analysis showed that the main collaborative field was genetic research in schizophrenia. Based on the results, authors belonging to different institutions and in different countries should be encouraged to collaborate in schizophrenia research. This will help researchers focus their studies on key issues, and allow each other to offer reasonable suggestions for making policies and providing scientific evidence to effectively diagnose, prevent, and cure schizophrenia.
Regression analysis for LED color detection of visual-MIMO system
NASA Astrophysics Data System (ADS)
Banik, Partha Pratim; Saha, Rappy; Kim, Ki-Doo
2018-04-01
Color detection from a light emitting diode (LED) array using a smartphone camera is very difficult in a visual multiple-input multiple-output (visual-MIMO) system. In this paper, we propose a method to determine the LED color using a smartphone camera by applying regression analysis. We employ a multivariate regression model to identify the LED color. After taking a picture of an LED array, we select the LED array region, and detect the LED using an image processing algorithm. We then apply the k-means clustering algorithm to determine the number of potential colors for feature extraction of each LED. Finally, we apply the multivariate regression model to predict the color of the transmitted LEDs. In this paper, we show our results for three types of environmental light condition: room environmental light, low environmental light (560 lux), and strong environmental light (2450 lux). We compare the results of our proposed algorithm using the training and test R-Square (%) values and the percentage of closeness between transmitted and predicted colors, and we also report the number of distorted test data points obtained from a distortion bar graph in the CIE1931 color space.
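The k-means step named in the abstract can be illustrated with a plain Lloyd's iteration over RGB pixel values. This is a minimal sketch, not the authors' implementation: the pixel values, the fixed starting centers, and the function name `kmeans` are all assumptions made for illustration, and the abstract does not say how the clustering output feeds the regression model.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[float, ...]


def squared_distance(p: Sequence[float], q: Sequence[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points: Sequence[Pixel], centers: Sequence[Pixel], iterations: int = 10) -> List[Pixel]:
    """Lloyd's algorithm: assign each point to its nearest center, then move centers to group means."""
    current = [tuple(float(v) for v in c) for c in centers]
    for _ in range(iterations):
        groups: List[List[Pixel]] = [[] for _ in current]
        for p in points:
            nearest = min(range(len(current)), key=lambda i: squared_distance(p, current[i]))
            groups[nearest].append(p)
        # An empty group keeps its previous center instead of dividing by zero.
        current = [
            tuple(sum(channel) / len(group) for channel in zip(*group)) if group else center
            for group, center in zip(groups, current)
        ]
    return current


if __name__ == "__main__":
    # Toy RGB samples from a hypothetical LED region: two reddish and two bluish pixels.
    pixels = [(250, 0, 0), (240, 10, 0), (0, 0, 250), (10, 0, 240)]
    start = [(255, 0, 0), (0, 0, 255)]  # assumed initial centers
    for center in kmeans(pixels, start):
        print(center)
```

The resulting centers summarize the dominant colors in the region; a downstream model could take such summaries as features, which is one plausible reading of the feature extraction step.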
Bayesian Nonparametric Ordination for the Analysis of Microbial Communities.
Ren, Boyu; Bacallado, Sergio; Favaro, Stefano; Holmes, Susan; Trippa, Lorenzo
2017-01-01
Human microbiome studies use sequencing technologies to measure the abundance of bacterial species or Operational Taxonomic Units (OTUs) in samples of biological material. Typically the data are organized in contingency tables with OTU counts across heterogeneous biological samples. In the microbial ecology community, ordination methods are frequently used to investigate latent factors or clusters that capture and describe variations of OTU counts across biological samples. It remains important to evaluate how uncertainty in estimates of each biological sample's microbial distribution propagates to ordination analyses, including visualization of clusters and projections of biological samples on low dimensional spaces. We propose a Bayesian analysis for dependent distributions to endow frequently used ordinations with estimates of uncertainty. A Bayesian nonparametric prior for dependent normalized random measures is constructed, which is marginally equivalent to the normalized generalized Gamma process, a well-known prior for nonparametric analyses. In our prior, the dependence and similarity between microbial distributions is represented by latent factors that concentrate in a low dimensional space. We use a shrinkage prior to tune the dimensionality of the latent factors. The resulting posterior samples of model parameters can be used to evaluate uncertainty in analyses routinely applied in microbiome studies. Specifically, by combining them with multivariate data analysis techniques we can visualize credible regions in ecological ordination plots. The characteristics of the proposed model are illustrated through a simulation study and applications in two microbiome datasets.
Creating a Parallel Version of VisIt for Microsoft Windows
DOE Office of Scientific and Technical Information (OSTI.GOV)
Whitlock, B J; Biagas, K S; Rawson, P L
2011-12-07
VisIt is a popular, free interactive parallel visualization and analysis tool for scientific data. Users can quickly generate visualizations from their data, animate them through time, manipulate them, and save the resulting images or movies for presentations. VisIt was designed from the ground up to work on many scales of computers from modest desktops up to massively parallel clusters. VisIt is comprised of a set of cooperating programs. All programs can be run locally or in client/server mode in which some run locally and some run remotely on compute clusters. The VisIt program most able to harness today's computing power is the VisIt compute engine. The compute engine is responsible for reading simulation data from disk, processing it, and sending results or images back to the VisIt viewer program. In a parallel environment, the compute engine runs several processes, coordinating using the Message Passing Interface (MPI) library. Each MPI process reads some subset of the scientific data and filters the data in various ways to create useful visualizations. By using MPI, VisIt has been able to scale well into the thousands of processors on large computers such as dawn and graph at LLNL. The advent of multicore CPUs has made parallelism the 'new' way to achieve increasing performance. With today's computers having at least 2 cores and in many cases up to 8 and beyond, it is more important than ever to deploy parallel software that can use that computing power not only on clusters but also on the desktop. We have created a parallel version of VisIt for Windows that uses Microsoft's MPI implementation (MSMPI) to process data in parallel on the Windows desktop as well as on a Windows HPC cluster running Microsoft Windows Server 2008. Initial desktop parallel support for Windows was deployed in VisIt 2.4.0. Windows HPC cluster support has been completed and will appear in the VisIt 2.5.0 release. We plan to continue supporting parallel VisIt on Windows so our users will be able to take full advantage of their multicore resources.
Analysis of Patent Databases Using VxInsight
DOE Office of Scientific and Technical Information (OSTI.GOV)
BOYACK,KEVIN W.; WYLIE,BRIAN N.; DAVIDSON,GEORGE S.
2000-12-12
We present the application of a new knowledge visualization tool, VxInsight, to the mapping and analysis of patent databases. Patent data are mined and placed in a database, relationships between the patents are identified, primarily using the citation and classification structures, then the patents are clustered using a proprietary force-directed placement algorithm. Related patents cluster together to produce a 3-D landscape view of the tens of thousands of patents. The user can navigate the landscape by zooming into or out of regions of interest. Querying the underlying database places a colored marker on each patent matching the query. Automatically generated labels, showing landscape content, update continually upon zooming. Optionally, citation links between patents may be shown on the landscape. The combination of these features enables powerful analyses of patent databases.
NASA Astrophysics Data System (ADS)
Böhm, J.; Bredif, M.; Gierlinger, T.; Krämer, M.; Lindenberg, R.; Liu, K.; Michel, F.; Sirmacek, B.
2016-06-01
Current 3D data capturing as implemented on for example airborne or mobile laser scanning systems is able to efficiently sample the surface of a city by billions of unselective points during one working day. What is still difficult is to extract and visualize meaningful information hidden in these point clouds with the same efficiency. This is where the FP7 IQmulus project enters the scene. IQmulus is an interactive facility for processing and visualizing big spatial data. In this study the potential of IQmulus is demonstrated on a laser mobile mapping point cloud of 1 billion points sampling ~ 10 km of street environment in Toulouse, France. After the data is uploaded to the IQmulus Hadoop Distributed File System, a workflow is defined by the user consisting of retiling the data followed by a PCA driven local dimensionality analysis, which runs efficiently on the IQmulus cloud facility using a Spark implementation. Points scattering in 3 directions are clustered in the tree class, and are separated next into individual trees. Five hours of processing at the 12 node computing cluster results in the automatic identification of 4000+ urban trees. Visualization of the results in the IQmulus fat client helps users to appreciate the results, and developers to identify remaining flaws in the processing workflow.
Visualizing Time-Varying Distribution Data in EOS Application
NASA Technical Reports Server (NTRS)
Shen, Han-Wei
2004-01-01
In this research, we have developed several novel visualization methods for spatial probability density function data. Our focus has been on 2D spatial datasets, where each pixel is a random variable, and has multiple samples which are the results of experiments on that random variable. We developed novel clustering algorithms as a means to reduce the information contained in these datasets; and investigated different ways of interpreting and clustering the data.
Access and visualization using clusters and other parallel computers
NASA Technical Reports Server (NTRS)
Katz, Daniel S.; Bergou, Attila; Berriman, Bruce; Block, Gary; Collier, Jim; Curkendall, Dave; Good, John; Husman, Laura; Jacob, Joe; Laity, Anastasia;
2003-01-01
JPL's Parallel Applications Technologies Group has been exploring the issues of data access and visualization of very large data sets over the past 10 or so years. This work has used a number of types of parallel computers, and today includes the use of commodity clusters. This talk will highlight some of the applications and tools we have developed, including how they use parallel computing resources, and specifically how we are using modern clusters. Our applications focus on NASA's needs; thus our data sets are usually related to Earth and Space Science, including data delivered from instruments in space, and data produced by telescopes on the ground.
Visual target modulation of functional connectivity networks revealed by self-organizing group ICA.
van de Ven, Vincent; Bledowski, Christoph; Prvulovic, David; Goebel, Rainer; Formisano, Elia; Di Salle, Francesco; Linden, David E J; Esposito, Fabrizio
2008-12-01
We applied a data-driven analysis based on self-organizing group independent component analysis (sogICA) to fMRI data from a three-stimulus visual oddball task. SogICA is particularly suited to the investigation of the underlying functional connectivity and does not rely on a predefined model of the experiment, which overcomes some of the limitations of hypothesis-driven analysis. Unlike most previous applications of ICA in functional imaging, our approach allows the analysis of the data at the group level, which is of particular interest in high order cognitive studies. SogICA is based on the hierarchical clustering of spatially similar independent components, derived from single subject decompositions. We identified four main clusters of components, centered on the posterior cingulate, bilateral insula, bilateral prefrontal cortex, and right posterior parietal and prefrontal cortex, consistently across all participants. Post hoc comparison of time courses revealed that insula, prefrontal cortex and right fronto-parietal components showed higher activity for targets than for distractors. Activation for distractors was higher in the posterior cingulate cortex, where deactivation was observed for targets. While our results conform to previous neuroimaging studies, they also complement conventional results by showing functional connectivity networks with unique contributions to the task that were consistent across subjects. SogICA can thus be used to probe functional networks of active cognitive tasks at the group-level and can provide additional insights to generate new hypotheses for further study. Copyright 2007 Wiley-Liss, Inc.
Successful ageing: A study of the literature using citation network analysis.
Kusumastuti, Sasmita; Derks, Marloes G M; Tellier, Siri; Di Nucci, Ezio; Lund, Rikke; Mortensen, Erik Lykke; Westendorp, Rudi G J
2016-11-01
Ageing is accompanied by an increased risk of disease and a loss of functioning on several bodily and mental domains and some argue that maintaining health and functioning is essential for a successful old age. Paradoxically, studies have shown that overall wellbeing follows a curvilinear pattern with the lowest point at middle age but increases thereafter up to very old age. To shed further light on this paradox, we reviewed the existing literature on how scholars define successful ageing and how they weigh the contribution of health and functioning to define success. We performed a novel, hypothesis-free and quantitative analysis of citation networks exploring the literature on successful ageing that exists in the Web of Science Core Collection Database using the CitNetExplorer software. Outcomes were visualized using timeline-based citation patterns. The clusters and sub-clusters of citation networks identified were starting points for in-depth qualitative analysis. Within the literature from 1902 through 2015, two distinct citation networks were identified. The first cluster had 1146 publications and 3946 citation links. It focused on successful ageing from the perspective of older persons themselves. Analysis of the various sub-clusters emphasized the importance of coping strategies, psycho-social engagement, and cultural differences. The second cluster had 609 publications and 1682 citation links and viewed successful ageing based on the objective measurements as determined by researchers. Subsequent sub-clustering analysis pointed to different domains of functioning and various ways of assessment. In the current literature two mutually exclusive concepts of successful ageing are circulating that depend on whether the individual himself or an outsider judges the situation. These different points of view help to explain the disability paradox, as successful ageing lies in the eyes of the beholder. Copyright © 2016 The Authors. 
Published by Elsevier Ireland Ltd. All rights reserved.
Bhavnani, Suresh K.; Chen, Tianlong; Ayyaswamy, Archana; Visweswaran, Shyam; Bellala, Gowtham; Divekar, Rohit; Bassler, Kevin E.
2017-01-01
A primary goal of precision medicine is to identify patient subgroups based on their characteristics (e.g., comorbidities or genes) with the goal of designing more targeted interventions. While network visualization methods such as Fruchterman-Reingold have been used to successfully identify such patient subgroups in small to medium sized data sets, they often fail to reveal comprehensible visual patterns in large and dense networks despite having significant clustering. We therefore developed an algorithm called ExplodeLayout, which exploits the existence of significant clusters in bipartite networks to automatically “explode” a traditional network layout with the goal of separating overlapping clusters, while at the same time preserving key network topological properties that are critical for the comprehension of patient subgroups. We demonstrate the utility of ExplodeLayout by visualizing a large dataset extracted from Medicare consisting of readmitted hip-fracture patients and their comorbidities, demonstrate its statistically significant improvement over a traditional layout algorithm, and discuss how the resulting network visualization enabled clinicians to infer mechanisms precipitating hospital readmission in specific patient subgroups. PMID:28815099
Muscle ischaemia associated with NXP2 autoantibodies: a severe subtype of juvenile dermatomyositis.
Aouizerate, Jessie; De Antonio, Marie; Bader-Meunier, Brigitte; Barnerias, Christine; Bodemer, Christine; Isapof, Arnaud; Quartier, Pierre; Melki, Isabelle; Charuel, Jean-Luc; Bassez, Guillaume; Desguerre, Isabelle; Gherardi, Romain K; Authier, François-Jérôme; Gitiaux, Cyril
2018-05-01
Myositis-specific autoantibodies (MSAs) are increasingly used to delineate distinct subgroups of JDM. The aim of our study was to explore without a priori hypotheses whether MSAs are associated with distinct clinical-pathological changes and severity in a monocentric JDM cohort. Clinical, biological and histological findings from 23 JDM patients were assessed. Twenty-six histopathological parameters were subjected to multivariate analysis. Autoantibodies included anti-NXP2 (9/23), anti-TIF1γ (4/23), anti-MDA5 (2/23), no MSAs (8/23). Multivariate analysis yielded two histopathological clusters. Cluster 1 (n = 11) showed a more severe and ischaemic pattern than cluster 2 (n = 12) assessed by: total score severity ⩾20 (100.0% vs 25.0%); visual analogic score ⩾6 (100.0% vs 25.0%); the vascular domain score >1 (100.0% vs 41.7%); microinfarcts (100% vs 58.3%); ischaemic myofibrillary loss (focal punched-out vacuoles) (90.9% vs 25%); and obvious capillary loss (81.8% vs 16.7%). Compared with cluster 2, patients in cluster 1 had strikingly more often anti-NXP2 antibodies (7/11 vs 2/12), more pronounced muscle weakness, more gastrointestinal involvement and required more aggressive treatment. Furthermore, patients with anti-NXP2 antibodies, mostly assigned to the first cluster, also displayed more severe muscular disease, requiring more aggressive treatment and having a lower remission rate during the follow-up period. Marked muscle ischaemic involvement and the presence of anti-NXP2 autoantibodies are associated with more severe forms of JDM.
Normal versus High Tension Glaucoma: A Comparison of Functional and Structural Defects
Thonginnetra, Oraorn; Greenstein, Vivienne C.; Chu, David; Liebmann, Jeffrey M.; Ritch, Robert; Hood, Donald C.
2009-01-01
Purpose To compare visual field defects obtained with both multifocal visual evoked potential (mfVEP) and Humphrey visual field (HVF) techniques to topographic optic disc measurements in patients with normal tension glaucoma (NTG) and high tension glaucoma (HTG). Methods We studied 32 patients with NTG and 32 with HTG. All patients had reliable 24-2 HVFs with a mean deviation (MD) of −10 dB or better, a glaucomatous optic disc and an abnormal HVF in at least one eye. Multifocal VEPs were obtained from each eye and probability plots created. The mfVEP and HVF probability plots were divided into a central 10-degree (radius) and an outer arcuate subfield in both superior and inferior hemifields. Cluster analyses and counts of abnormal points were performed in each subfield. Optic disc images were obtained with the Heidelberg Retina Tomograph III (HRT III). Eleven stereometric parameters were calculated. Moorfields regression analysis (MRA) and the glaucoma probability score (GPS) were performed. Results There were no significant differences in MD and PSD values between NTG and HTG eyes. However, NTG eyes had a higher percentage of abnormal test points and clusters of abnormal points in the central subfields on both mfVEP and HVF than HTG eyes. For HRT III, there were no significant differences in the 11 stereometric parameters or in the MRA and GPS analyses of the optic disc images. Conclusions The visual field data suggest more localized and central defects for NTG than HTG. PMID:19223786
Paladino, Simona; Lebreton, Stéphanie; Lelek, Mickaël; Riccio, Patrizia; De Nicola, Sergio; Zimmer, Christophe
2017-01-01
Spatio-temporal compartmentalization of membrane proteins is critical for the regulation of diverse vital functions in eukaryotic cells. It was previously shown that, at the apical surface of polarized MDCK cells, glycosylphosphatidylinositol (GPI)-anchored proteins (GPI-APs) are organized in small cholesterol-independent clusters of single GPI-AP species (homoclusters), which are required for the formation of larger cholesterol-dependent clusters formed by multiple GPI-AP species (heteroclusters). This clustered organization is crucial for the biological activities of GPI-APs; hence, understanding the spatio-temporal properties of their membrane organization is of fundamental importance. Here, by using direct stochastic optical reconstruction microscopy coupled to pair correlation analysis (pc-STORM), we were able to visualize and measure the size of these clusters. Specifically, we show that they are non-randomly distributed and have an average size of 67 nm. We also demonstrated that polarized MDCK and non-polarized CHO cells have similar cluster distribution and size, but different sensitivity to cholesterol depletion. Finally, we derived a model that allowed a quantitative characterization of the cluster organization of GPI-APs at the apical surface of polarized MDCK cells for the first time. Experimental FRET (fluorescence resonance energy transfer)/FLIM (fluorescence-lifetime imaging microscopy) data were correlated to the theoretical predictions of the model. PMID:29046391
Solano, Rubén; Gómez-Barroso, Diana; Simón, Fernando; Lafuente, Sarah; Simón, Pere; Rius, Cristina; Gorrindo, Pilar; Toledo, Diana; Caylà, Joan A
2014-05-01
A retrospective, space-time study of whooping cough cases reported to the Public Health Agency of Barcelona, Spain between the years 2000 and 2011 is presented. It is based on 633 individual whooping cough cases and the 2006 population census from the Spanish National Statistics Institute, stratified by age and sex at the census tract level. Cluster identification was attempted using a space-time scan statistic assuming a Poisson distribution and restricting temporal extent to 7 days and spatial distance to 500 m. Statistical calculations were performed with Stata 11 and SaTScan and mapping was performed with ArcGIS 10.0. Only clusters showing statistical significance (P <0.05) were mapped. The most likely cluster identified included five census tracts located in three neighbourhoods in central Barcelona during the week from 17 to 23 August 2011. This cluster included five cases compared with the expected level of 0.0021 (relative risk = 2436, P <0.001). In addition, 11 secondary significant space-time clusters were detected, occurring at different times and locations. Spatial statistics appear useful for complementing epidemiological surveillance systems by visualizing excess numbers of cases in space and time, thereby increasing the possibility of identifying outbreaks not reported by the surveillance system.
TSCAN: Pseudo-time reconstruction and evaluation in single-cell RNA-seq analysis
Ji, Zhicheng; Ji, Hongkai
2016-01-01
When analyzing single-cell RNA-seq data, constructing a pseudo-temporal path to order cells based on the gradual transition of their transcriptomes is a useful way to study gene expression dynamics in a heterogeneous cell population. Currently, a limited number of computational tools are available for this task, and quantitative methods for comparing different tools are lacking. Tools for Single Cell Analysis (TSCAN) is a software tool developed to better support in silico pseudo-Time reconstruction in Single-Cell RNA-seq ANalysis. TSCAN uses a cluster-based minimum spanning tree (MST) approach to order cells. Cells are first grouped into clusters and an MST is then constructed to connect cluster centers. Pseudo-time is obtained by projecting each cell onto the tree, and the ordered sequence of cells can be used to study dynamic changes of gene expression along the pseudo-time. Clustering cells before MST construction reduces the complexity of the tree space. This often leads to improved cell ordering. It also allows users to conveniently adjust the ordering based on prior knowledge. TSCAN has a graphical user interface (GUI) to support data visualization and user interaction. Furthermore, quantitative measures are developed to objectively evaluate and compare different pseudo-time reconstruction methods. TSCAN is available at https://github.com/zji90/TSCAN and as a Bioconductor package. PMID:27179027
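The cluster-then-tree idea in the abstract above can be sketched in a few lines. This is only an illustrative sketch in plain Python, not TSCAN's implementation or API: the function names (`mst_edges`, `pseudotime`), the use of Euclidean distance, and the simplification to a single linear path of cluster centers are all assumptions made for the example.

```python
import math

def mst_edges(centers):
    """Prim's algorithm on the complete Euclidean graph over cluster centers.

    Returns the tree as a list of (i, j) index pairs, grown from center 0.
    """
    in_tree = {0}
    edges = []
    while len(in_tree) < len(centers):
        best = None  # (distance, tree node, new node)
        for i in in_tree:
            for j in range(len(centers)):
                if j in in_tree:
                    continue
                d = math.dist(centers[i], centers[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        edges.append((i, j))
        in_tree.add(j)
    return edges

def pseudotime(cell, path):
    """Project a cell onto a polyline of ordered cluster centers.

    Hypothetical simplification: the cell is assigned to its nearest
    segment, and its pseudo-time is the arc length from the path start
    to the foot of the projection.
    """
    best = None   # (distance to segment, arc length at projection)
    offset = 0.0  # arc length accumulated over earlier segments
    for a, b in zip(path, path[1:]):
        ab = [y - x for x, y in zip(a, b)]
        ap = [y - x for x, y in zip(a, cell)]
        seg_len = math.dist(a, b)
        t = 0.0
        if seg_len > 0:
            t = sum(u * v for u, v in zip(ap, ab)) / (seg_len ** 2)
            t = max(0.0, min(1.0, t))  # clamp to the segment
        foot = [x + t * v for x, v in zip(a, ab)]
        d = math.dist(cell, foot)
        if best is None or d < best[0]:
            best = (d, offset + t * seg_len)
        offset += seg_len
    return best[1]

centers = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
tree = mst_edges(centers)
early = pseudotime((0.5, 0.3), centers)
late = pseudotime((1.5, -0.2), centers)
```

Building the tree over a handful of cluster centers rather than over every cell is what keeps the tree space small, which is the point the abstract makes about clustering before MST construction.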
TSCAN: Pseudo-time reconstruction and evaluation in single-cell RNA-seq analysis.
Ji, Zhicheng; Ji, Hongkai
2016-07-27
When analyzing single-cell RNA-seq data, constructing a pseudo-temporal path to order cells based on the gradual transition of their transcriptomes is a useful way to study gene expression dynamics in a heterogeneous cell population. Currently, a limited number of computational tools are available for this task, and quantitative methods for comparing different tools are lacking. Tools for Single Cell Analysis (TSCAN) is a software tool developed to better support in silico pseudo-Time reconstruction in Single-Cell RNA-seq ANalysis. TSCAN uses a cluster-based minimum spanning tree (MST) approach to order cells. Cells are first grouped into clusters and an MST is then constructed to connect cluster centers. Pseudo-time is obtained by projecting each cell onto the tree, and the ordered sequence of cells can be used to study dynamic changes of gene expression along the pseudo-time. Clustering cells before MST construction reduces the complexity of the tree space. This often leads to improved cell ordering. It also allows users to conveniently adjust the ordering based on prior knowledge. TSCAN has a graphical user interface (GUI) to support data visualization and user interaction. Furthermore, quantitative measures are developed to objectively evaluate and compare different pseudo-time reconstruction methods. TSCAN is available at https://github.com/zji90/TSCAN and as a Bioconductor package. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Brehmer, Matthew; Ingram, Stephen; Stray, Jonathan; Munzner, Tamara
2014-12-01
For an investigative journalist, a large collection of documents obtained from a Freedom of Information Act request or a leak is both a blessing and a curse: such material may contain multiple newsworthy stories, but it can be difficult and time consuming to find relevant documents. Standard text search is useful, but even if the search target is known it may not be possible to formulate an effective query. In addition, summarization is an important non-search task. We present Overview, an application for the systematic analysis of large document collections based on document clustering, visualization, and tagging. This work contributes to the small set of design studies which evaluate a visualization system "in the wild", and we report on six case studies where Overview was voluntarily used by self-initiated journalists to produce published stories. We find that the frequently-used language of "exploring" a document collection is both too vague and too narrow to capture how journalists actually used our application. Our iterative process, including multiple rounds of deployment and observations of real world usage, led to a much more specific characterization of tasks. We analyze and justify the visual encoding and interaction techniques used in Overview's design with respect to our final task abstractions, and propose generalizable lessons for visualization design methodology.
Clustering analysis of line indices for LAMOST spectra with AstroStat
NASA Astrophysics Data System (ADS)
Chen, Shu-Xin; Sun, Wei-Min; Yan, Qi
2018-06-01
The application of data mining in astronomical surveys, such as the Large Sky Area Multi-Object Fiber Spectroscopic Telescope (LAMOST) survey, provides an effective approach to automatically analyze a large amount of complex survey data. Unsupervised clustering could help astronomers find the associations and outliers in a big data set. In this paper, we employ the k-means method to perform clustering for the line index of LAMOST spectra with the powerful software AstroStat. Implementing the line index approach for analyzing astronomical spectra is an effective way to extract spectral features for low resolution spectra, which can represent the main spectral characteristics of stars. A total of 144 340 line indices for A type stars is analyzed through calculating their intra and inter distances between pairs of stars. For intra distance, we use the definition of Mahalanobis distance to explore the degree of clustering for each class, while for outlier detection, we define a local outlier factor for each spectrum. AstroStat furnishes a set of visualization tools for illustrating the analysis results. Checking the spectra detected as outliers, we find that most of them are problematic data and only a few correspond to rare astronomical objects. We show two examples of these outliers, a spectrum with abnormal continuum and a spectrum with emission lines. Our work demonstrates that line index clustering is a good method for examining data quality and identifying rare objects.
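For readers unfamiliar with the k-means step mentioned in the abstract above, a minimal sketch follows. It is a generic Lloyd-style k-means in plain Python, not AstroStat's interface; the function name `kmeans`, the caller-supplied initial centers, and the toy two-feature points standing in for line-index vectors are assumptions for illustration only.

```python
import math

def kmeans(points, centers, iters=20):
    """Generic Lloyd iteration: assign points to the nearest center,
    then move each center to the mean of its members.

    `centers` holds the initial guesses (supplied by the caller here so
    the example is deterministic). Returns (labels, centers).
    """
    centers = [tuple(c) for c in centers]
    labels = []
    for _ in range(iters):
        # Assignment step: index of the nearest center for every point.
        labels = [
            min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
            for p in points
        ]
        # Update step: mean of each cluster; an empty cluster keeps its center.
        new_centers = []
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centers.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centers.append(centers[k])
        if new_centers == centers:  # converged: assignments are stable
            break
        centers = new_centers
    return labels, centers

# Toy stand-in for line-index feature vectors (values are made up).
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
labels, centers = kmeans(points, centers=[points[0], points[2]])
```

In a real survey-scale analysis the initialization and the choice of the number of clusters matter a great deal; this sketch fixes both by hand.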
Visualizing Dataflow Graphs of Deep Learning Models in TensorFlow.
Wongsuphasawat, Kanit; Smilkov, Daniel; Wexler, James; Wilson, Jimbo; Mane, Dandelion; Fritz, Doug; Krishnan, Dilip; Viegas, Fernanda B; Wattenberg, Martin
2018-01-01
We present a design study of the TensorFlow Graph Visualizer, part of the TensorFlow machine intelligence platform. This tool helps users understand complex machine learning architectures by visualizing their underlying dataflow graphs. The tool works by applying a series of graph transformations that enable standard layout techniques to produce a legible interactive diagram. To declutter the graph, we decouple non-critical nodes from the layout. To provide an overview, we build a clustered graph using the hierarchical structure annotated in the source code. To support exploration of nested structure on demand, we perform edge bundling to enable stable and responsive cluster expansion. Finally, we detect and highlight repeated structures to emphasize a model's modular composition. To demonstrate the utility of the visualizer, we describe example usage scenarios and report user feedback. Overall, users find the visualizer useful for understanding, debugging, and sharing the structures of their models.
Alerts Visualization and Clustering in Network-based Intrusion Detection
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yang, Dr. Li; Gasior, Wade C; Dasireddy, Swetha
2010-04-01
Today's intrusion detection systems, when deployed on a busy network, overload the network with a huge number of alerts. Producing this much raw information makes them less effective. We propose a system that takes both raw data and Snort alerts to visualize and analyze possible intrusions in a network. We then present two models for the visualization of clustered alerts. Our first model gives the network administrator the logical topology of the network and detailed information on each node, including its associated alerts and connections. The second model, a flocking model, presents the network administrator with a visual representation of IDS data in which each alert is represented in a different color and the alerts with maximum similarity move together. This helps the network administrator detect various kinds of intrusions by visualizing the alert patterns.
NASA Astrophysics Data System (ADS)
Pham, T. D.
2016-12-01
Recurrence plots display binary texture of time series from dynamical systems with single dots and line structures. Using fuzzy recurrence plots, recurrences of the phase-space states can be visualized as grayscale texture, which is more informative for pattern analysis. The proposed method replaces the crucial similarity threshold required by symmetrical recurrence plots with the number of cluster centers, where the estimate of the latter parameter is less critical than the estimate of the former.
Costa, Patrício Soares; Santos, Nadine Correia; Cunha, Pedro; Cotter, Jorge; Sousa, Nuno
2013-01-01
The main focus of this study was to illustrate the applicability of multiple correspondence analysis (MCA) in detecting and representing underlying structures in large datasets used to investigate cognitive ageing. Principal component analysis (PCA) was used to obtain main cognitive dimensions, and MCA was used to detect and explore relationships between cognitive, clinical, physical, and lifestyle variables. Two PCA dimensions were identified (general cognition/executive function and memory), and two MCA dimensions were retained. Poorer cognitive performance was associated with older age, fewer years of schooling, unhealthier lifestyle indicators, and presence of pathology. The first MCA dimension indicated the clustering of general/executive function and lifestyle indicators and education, while the second association was between memory and clinical parameters and age. The clustering analysis with object scores method was used to identify groups sharing similar characteristics. The weaker cognitive clusters in terms of memory and executive function comprised individuals with characteristics contributing to a higher MCA dimensional mean score (age, less education, and presence of indicators of unhealthier lifestyle habits and/or clinical pathologies). MCA provided a powerful tool to explore complex ageing data, covering multiple and diverse variables, showing if a relationship exists and how variables are related, and offering statistical results that can be seen both analytically and visually.
Thinking graphically: Connecting vision and cognition during graph comprehension.
Ratwani, Raj M; Trafton, J Gregory; Boehm-Davis, Deborah A
2008-03-01
Task analytic theories of graph comprehension account for the perceptual and conceptual processes required to extract specific information from graphs. Comparatively, the processes underlying information integration have received less attention. We propose a new framework for information integration that highlights visual integration and cognitive integration. During visual integration, pattern recognition processes are used to form visual clusters of information; these visual clusters are then used to reason about the graph during cognitive integration. In 3 experiments, the processes required to extract specific information and to integrate information were examined by collecting verbal protocol and eye movement data. Results supported the task analytic theories for specific information extraction and the processes of visual and cognitive integration for integrative questions. Further, the integrative processes scaled up as graph complexity increased, highlighting the importance of these processes for integration in more complex graphs. Finally, based on this framework, design principles to improve both visual and cognitive integration are described. PsycINFO Database Record (c) 2008 APA, all rights reserved
Granatum: a graphical single-cell RNA-Seq analysis pipeline for genomics scientists.
Zhu, Xun; Wolfgruber, Thomas K; Tasato, Austin; Arisdakessian, Cédric; Garmire, David G; Garmire, Lana X
2017-12-05
Single-cell RNA sequencing (scRNA-Seq) is an increasingly popular platform to study heterogeneity at the single-cell level. Computational methods to process scRNA-Seq data are not very accessible to bench scientists as they require a significant amount of bioinformatic skills. We have developed Granatum, a web-based scRNA-Seq analysis pipeline to make analysis more broadly accessible to researchers. Without a single line of programming code, users can click through the pipeline, setting parameters and visualizing results via the interactive graphical interface. Granatum conveniently walks users through various steps of scRNA-Seq analysis. It has a comprehensive list of modules, including plate merging and batch-effect removal, outlier-sample removal, gene-expression normalization, imputation, gene filtering, cell clustering, differential gene expression analysis, pathway/ontology enrichment analysis, protein network interaction visualization, and pseudo-time cell series construction. Granatum enables broad adoption of scRNA-Seq technology by empowering bench scientists with an easy-to-use graphical interface for scRNA-Seq data analysis. The package is freely available for research use at http://garmiregroup.org/granatum/app.
NASA Astrophysics Data System (ADS)
Amirnasr, Elham
It is widely recognized that nonwoven basis weight non-uniformity affects various properties of nonwovens. However, few studies can be found on this topic. The development of uniformity definition and measurement methods and the study of their impact on various web properties such as filtration properties and air permeability would be beneficial both in industrial applications and in academia. They can be utilized as a quality control tool and would provide insights about nonwoven behaviors that cannot be solely explained by average values. Therefore, to quantify nonwoven web basis weight uniformity, we set out to develop an optical analytical tool. The quadrant method and clustering analysis were utilized in an image analysis scheme to help define "uniformity" and its spatial variation. Implementing the quadrant method in an image analysis system allows the establishment of a uniformity index that can be used to quantify the degree of uniformity. Clustering analysis was also modified and verified using uniform and random simulated images with known parameters. The number of clusters and cluster properties such as cluster size, membership, and density were determined. We also utilized this new measurement method to evaluate the uniformity of nonwovens produced with different processes and investigated the impact of uniformity on filtration and permeability. The results show that the uniformity index computed from the quadrant method covers a good range of non-uniformity of nonwoven webs. Clustering analysis was also applied to reference nonwovens with known visual uniformity. From the clustering analysis results, cluster size is a promising uniformity parameter: non-uniform nonwovens were shown to have larger cluster sizes than uniform nonwovens. An attempt was made to find a relationship between web properties and the uniformity index (as a web characteristic). 
To achieve this, filtration properties, air permeability, solidity, and uniformity index of meltblown and spunbond samples were measured. Results of the filtration tests show some deviation between theoretical and experimental filtration efficiency when different types of fiber diameter are considered. This deviation can occur due to variation in basis weight non-uniformity, so an appropriate theory is required to predict the variation of filtration efficiency with respect to the non-uniformity of nonwoven filter media. The results of the air permeability tests showed that the uniformity index determined by the quadrant method and the measured properties are related. In other words, air permeability decreases as the uniformity index of the nonwoven web increases.
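The abstract does not give the formula for the uniformity index, so the following is only a rough sketch of a quadrant-style measure under stated assumptions: the image is a 2D grid of pixel intensities standing in for local basis weight, it is split into equal cells, and the index is taken to be one minus the coefficient of variation of the cell means. The function names and the index definition are hypothetical, not the author's.

```python
from statistics import mean, pstdev

def cell_means(image, rows, cols):
    """Split a 2D list of pixel intensities into rows x cols cells and
    return the mean intensity of each cell."""
    h, w = len(image), len(image[0])
    means = []
    for r in range(rows):
        for c in range(cols):
            r0, r1 = r * h // rows, (r + 1) * h // rows
            c0, c1 = c * w // cols, (c + 1) * w // cols
            cell = [image[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            means.append(mean(cell))
    return means

def uniformity_index(image, rows=2, cols=2):
    """Hypothetical index (an assumption, not the thesis formula):
    1 - coefficient of variation of the cell means.
    1.0 means all cells are equally dense; lower means less uniform."""
    m = cell_means(image, rows, cols)
    overall = mean(m)
    if overall == 0:
        return 0.0
    return 1.0 - pstdev(m) / overall

# Two synthetic 8x8 "webs": one even, one with all mass on the left half.
uniform_web = [[10] * 8 for _ in range(8)]
patchy_web = [[20] * 4 + [0] * 4 for _ in range(8)]
```

Both synthetic webs have the same average intensity, which is the point of such an index: it separates samples that an average value alone cannot.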
Data Mining Technologies Inspired from Visual Principle
NASA Astrophysics Data System (ADS)
Xu, Zongben
In this talk we review recent work by our group on data mining (DM) technologies derived from simulating visual principles. By viewing a DM problem as a cognition problem and treating a data set as an image with a light point located at each datum position, we developed a series of highly efficient algorithms for clustering, classification, and regression that mimic visual principles. In pattern recognition, human eyes seem to possess a singular aptitude for grouping objects and finding important structure efficiently. Thus, a DM algorithm simulating the visual system may solve some basic problems in DM research. From this point of view, we proposed a new approach for data clustering by modeling the blurring effect of lateral retinal interconnections based on scale space theory. In this approach, as the data image blurs, smaller light blobs merge into larger ones until the whole image becomes one light blob at a low enough level of resolution. By identifying each blob with a cluster, the blurring process then generates a family of clusterings along the hierarchy. The proposed approach provides unique solutions to many long-standing problems in clustering, such as cluster validity and sensitivity to initialization. We extended this approach to classification and regression problems by jointly employing Weber's law in physiology and the cell response classification facts. The resultant classification and regression algorithms are proven to be very efficient and solve the problems of model selection and applicability to very large data sets in DM technologies. We finally applied a similar idea to the difficult parameter setting problem in support vector machines (SVM). 
Viewing the parameter setting problem as a recognition problem of choosing a visual scale at which the global and local structures of a data set can be preserved, and the difference between the two structures be maximized in the feature space, we derived a direct parameter setting formula for the Gaussian SVM. The simulations and applications show that the suggested formula significantly outperforms the known model selection methods in terms of efficiency and precision.
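The blob-merging idea in this abstract can be illustrated with a small one-dimensional sketch. It assumes that blurring is modeled by summing Gaussian bumps of width sigma centered on the data points and that a "blob" is a local maximum of the blurred profile; the grid bounds, point values, and function names are illustrative choices, not taken from the talk.

```python
import math

def blurred_profile(points, grid, sigma):
    """Sum of Gaussian bumps of width sigma, one per data point,
    evaluated on the grid (the 'data image' at scale sigma)."""
    return [
        sum(math.exp(-((g - p) ** 2) / (2 * sigma ** 2)) for p in points)
        for g in grid
    ]

def count_blobs(points, sigma, lo=0.0, hi=10.0, steps=1001):
    """Count local maxima of the blurred profile; each maximum is
    treated as one blob, i.e. one cluster at this scale."""
    grid = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    d = blurred_profile(points, grid, sigma)
    return sum(
        1 for i in range(1, steps - 1)
        if d[i] > d[i - 1] and d[i] >= d[i + 1]
    )

# Seven points in three visible groups.
points = [1.0, 1.2, 1.4, 5.0, 5.2, 8.8, 9.0]
scales = [0.05, 0.3, 5.0]
blob_counts = [count_blobs(points, s) for s in scales]
```

At the finest scale every point is its own blob, and at the coarsest scale the whole data set is a single blob; the intermediate scales trace out the hierarchy of clusterings the abstract describes.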
FloVis: Leveraging Visualization to Protect Sensitive Network Infrastructure
2010-11-01
words, we are clustering the hourly web surfing patterns of users on a small private network. The data in this case is filtered NetFlow records...Entity-based NetFlow Visualization Utility for Identifying Intrusive Behavior. In Goodall et al. (eds.), Mathematics and Visualization (Proceedings
Millstone: software for multiplex microbial genome analysis and engineering
DOE Office of Scientific and Technical Information (OSTI.GOV)
Goodman, Daniel B.; Kuznetsov, Gleb; Lajoie, Marc J.
Inexpensive DNA sequencing and advances in genome editing have made computational analysis a major rate-limiting step in adaptive laboratory evolution and microbial genome engineering. Here, we describe Millstone, a web-based platform that automates genotype comparison and visualization for projects with up to hundreds of genomic samples. To enable iterative genome engineering, Millstone allows users to design oligonucleotide libraries and create successive versions of reference genomes. Millstone is open source and easily deployable to a cloud platform, local cluster, or desktop, making it a scalable solution for any lab.
Millstone: software for multiplex microbial genome analysis and engineering.
Goodman, Daniel B; Kuznetsov, Gleb; Lajoie, Marc J; Ahern, Brian W; Napolitano, Michael G; Chen, Kevin Y; Chen, Changping; Church, George M
2017-05-25
Inexpensive DNA sequencing and advances in genome editing have made computational analysis a major rate-limiting step in adaptive laboratory evolution and microbial genome engineering. We describe Millstone, a web-based platform that automates genotype comparison and visualization for projects with up to hundreds of genomic samples. To enable iterative genome engineering, Millstone allows users to design oligonucleotide libraries and create successive versions of reference genomes. Millstone is open source and easily deployable to a cloud platform, local cluster, or desktop, making it a scalable solution for any lab.
NASA Astrophysics Data System (ADS)
Soltanian-Zadeh, Hamid; Windham, Joe P.; Peck, Donald J.
1997-04-01
This paper presents development and performance evaluation of an MRI feature space method. The method is useful for: identification of tissue types; segmentation of tissues; and quantitative measurements on tissues, to obtain information that can be used in decision making (diagnosis, treatment planning, and evaluation of treatment). The steps of the work accomplished are as follows: (1) Four T2-weighted and two T1-weighted images (before and after injection of Gadolinium) were acquired for ten tumor patients. (2) Images were analyzed by two image analysts according to the following algorithm. The intracranial brain tissues were segmented from the scalp and background. The additive noise was suppressed using a multi-dimensional non-linear edge-preserving filter which preserves partial volume information on average. Image nonuniformities were corrected using a modified lowpass filtering approach. The resulting images were used to generate and visualize an optimal feature space. Cluster centers were identified on the feature space. Then images were segmented into normal tissues and different zones of the tumor. (3) Biopsy samples were extracted from each patient and were subsequently analyzed by the pathology laboratory. (4) Image analysis results were compared to each other and to the biopsy results. Pre- and post-surgery feature spaces were also compared. The proposed algorithm made it possible to visualize the MRI feature space and to segment the image. In all cases, the operators were able to find clusters for normal and abnormal tissues. Also, clusters for different zones of the tumor were found. Based on the clusters marked for each zone, the method successfully segmented the image into normal tissues (white matter, gray matter, and CSF) and different zones of the lesion (tumor, cyst, edema, radiation necrosis, necrotic core, and infiltrated tumor). The results agreed with those obtained from the biopsy samples. 
Comparison of pre- to post-surgery and radiation feature spaces confirmed that the tumor was not present in the second study but radiation necrosis was generated as a result of radiation.
Maljovec, D.; Liu, S.; Wang, B.; ...
2015-07-14
Here, dynamic probabilistic risk assessment (DPRA) methodologies couple system simulator codes (e.g., RELAP and MELCOR) with simulation controller codes (e.g., RAVEN and ADAPT). Whereas system simulator codes model system dynamics deterministically, simulation controller codes introduce both deterministic (e.g., system control logic and operating procedures) and stochastic (e.g., component failures and parameter uncertainties) elements into the simulation. Typically, a DPRA is performed by sampling values of a set of parameters and simulating the system behavior for that specific set of parameter values. For complex systems, a major challenge in using DPRA methodologies is to analyze the large number of scenarios generated, where clustering techniques are typically employed to better organize and interpret the data. In this paper, we focus on the analysis of two nuclear simulation datasets that are part of the risk-informed safety margin characterization (RISMC) boiling water reactor (BWR) station blackout (SBO) case study. We provide the domain experts a software tool that encodes traditional and topological clustering techniques within an interactive analysis and visualization environment, for understanding the structures of such high-dimensional nuclear simulation datasets. We demonstrate through our case study that both types of clustering techniques complement each other for enhanced structural understanding of the data.
Mansour, Ahmad M.; Hamade, Haya; Ghaddar, Ayman; Mokadem, Ahmad Samih; El Hajj Ali, Mohamad; Awwad, Shady
2012-01-01
Purpose: To present the visual outcomes and ocular sequelae of victims of cluster bombs. Materials and Methods: This retrospective, multicenter case series of ocular injury due to cluster bombs was conducted for 3 years after the war in South Lebanon (July 2006). Data were gathered from the reports to the Information Management System for Mine Action. Results: There were 308 victims of cluster bombs; 36 individuals were killed, of whom 2 received ocular lacerations, and 272 individuals were injured, with 18 receiving ocular injury. These 18 surviving individuals were assessed by the authors. Ocular injury occurred in 6.5% (20/308) of cluster bomb victims. Trauma to multiple organs occurred in 12 of 18 cases (67%) with ocular injury. Ocular findings included corneal or scleral lacerations (16 eyes), corneal foreign bodies (9 eyes), corneal decompensation (2 eyes), ruptured cataract (6 eyes), and intravitreal foreign bodies (10 eyes). The corneas of one patient had extreme attenuation of the endothelium. Conclusions: Ocular injury occurred in 6.5% of cluster bomb victims and 67% of the patients with ocular injury sustained trauma to multiple organs. Visual morbidity in civilians is an additional reason for a global ban on the use of cluster bombs. PMID:22346132
Silva, J Padmaka; Gunathunga, M W; Jayasinghe, S
2016-01-01
The burden of noncommunicable diseases (NCDs) and certain behavioral risk factors related to NCDs (unhealthy behaviors) are becoming more common. This survey aims to map out such common unhealthy behaviors among all men 35 to 50 years old in a Medical Officer of Health area in the Western Province of Sri Lanka using a geographical information system (GIS) and an interviewer-administered questionnaire by visiting all households in the study area. Data were analyzed with ARC GIS and SPSS software. Geographical areas where men with unhealthy behaviors cluster together (clusters) were identified and visually and statistically related to locations of schools, places of religious worship, and factories in the area. It was revealed that clusters of unhealthy behaviors are mostly seen in areas with lower population density. Smoking and alcohol use cluster in estate areas occupied by Tamils. In this way, GIS mapping could be used to identify and reduce the burden of NCDs by visualizing clusters and how certain locations affect their spread. PMID:26489433
Silva, J Padmaka
2016-01-01
The burden of noncommunicable diseases (NCDs) and certain behavioral risk factors related to NCDs (unhealthy behaviors) are becoming more common. This survey aims to map out such common unhealthy behaviors among all men 35 to 50 years old in a Medical Officer of Health area in the Western Province of Sri Lanka using a geographical information system (GIS) and an interviewer-administered questionnaire by visiting all households in the study area. Data were analyzed with ARC GIS and SPSS software. Geographical areas where men with unhealthy behaviors cluster together (clusters) were identified and visually and statistically related to locations of schools, places of religious worship, and factories in the area. It was revealed that clusters of unhealthy behaviors are mostly seen in areas with lower population density. Smoking and alcohol use cluster in estate areas occupied by Tamils. In this way, GIS mapping could be used to identify and reduce the burden of NCDs by visualizing clusters and how certain locations affect their spread. © 2015 APJPH.
The Tehran Eye Study: research design and eye examination protocol
Hashemi, Hassan; Fotouhi, Akbar; Mohammad, Kazem
2003-01-01
Background: Visual impairment has a profound impact on society. The majority of visually impaired people live in developing countries, and since most disorders leading to visual impairment are preventable or curable, their control is a priority in these countries. Considering the complicated epidemiology of visual impairment and the wide variety of factors involved, region-specific intervention strategies are required for every community. Therefore, providing appropriate data is one of the first steps in these communities, as it is in Iran. The objectives of this study are to describe the prevalence and causes of visual impairment in the population of Tehran city; the prevalence of refractive errors, lens opacity, ocular hypertension, and color blindness in this population, and also the familial aggregation of refractive errors, lens opacity, ocular hypertension, and color blindness within the study sample. Methods/Design: Through a population-based, cross-sectional study, a total of 5300 Tehran citizens will be selected from 160 clusters using a stratified cluster random sampling strategy. The eligible people will be enumerated through a door-to-door household survey in the selected clusters and will be invited. All participants will be transferred to a clinic for measurements of uncorrected, best corrected and presenting visual acuity; manifest, subjective and cycloplegic refraction; color vision test; Goldmann applanation tonometry; examination of the external eye, anterior segment, media, and fundus; and an interview about demographic characteristics and history of eye diseases, eye trauma, diabetes mellitus, high blood pressure, and ophthalmologic care. The study design and eye examination protocol are described. Conclusion: We expect that findings from the TES will show the status of visual problems and their causes in the community. This study can highlight the people who should be targeted by visual impairment prevention programs. PMID:12859794
The Tehran Eye Study: research design and eye examination protocol.
Hashemi, Hassan; Fotouhi, Akbar; Mohammad, Kazem
2003-07-15
Visual impairment has a profound impact on society. The majority of visually impaired people live in developing countries, and since most disorders leading to visual impairment are preventable or curable, their control is a priority in these countries. Considering the complicated epidemiology of visual impairment and the wide variety of factors involved, region-specific intervention strategies are required for every community. Therefore, providing appropriate data is one of the first steps in these communities, as it is in Iran. The objectives of this study are to describe the prevalence and causes of visual impairment in the population of Tehran city; the prevalence of refractive errors, lens opacity, ocular hypertension, and color blindness in this population, and also the familial aggregation of refractive errors, lens opacity, ocular hypertension, and color blindness within the study sample. Through a population-based, cross-sectional study, a total of 5300 Tehran citizens will be selected from 160 clusters using a stratified cluster random sampling strategy. The eligible people will be enumerated through a door-to-door household survey in the selected clusters and will be invited. All participants will be transferred to a clinic for measurements of uncorrected, best corrected and presenting visual acuity; manifest, subjective and cycloplegic refraction; color vision test; Goldmann applanation tonometry; examination of the external eye, anterior segment, media, and fundus; and an interview about demographic characteristics and history of eye diseases, eye trauma, diabetes mellitus, high blood pressure, and ophthalmologic care. The study design and eye examination protocol are described. We expect that findings from the TES will show the status of visual problems and their causes in the community. This study can highlight the people who should be targeted by visual impairment prevention programs.
EnsembleGraph: Interactive Visual Analysis of Spatial-Temporal Behavior for Ensemble Simulation Data
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shu, Qingya; Guo, Hanqi; Che, Limei
We present a novel visualization framework, EnsembleGraph, for analyzing ensemble simulation data, in order to help scientists understand behavior similarities between ensemble members over space and time. A graph-based representation is used to visualize individual spatiotemporal regions with similar behaviors, which are extracted by hierarchical clustering algorithms. A user interface with multiple linked views is provided, which enables users to explore, locate, and compare regions that have similar behaviors between ensemble members; users can then investigate and analyze the selected regions in detail. The driving application of this paper is the study of regional emission influences on tropospheric ozone, based on ensemble simulations conducted with different anthropogenic emission absences using the MOZART-4 (model of ozone and related tracers, version 4) model. We demonstrate the effectiveness of our method by visualizing the MOZART-4 ensemble simulation data and evaluating the relative regional emission influences on tropospheric ozone concentrations. Positive feedback from domain experts and two case studies demonstrate the efficiency of our method.
Coherent Image Layout using an Adaptive Visual Vocabulary
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dillard, Scott E.; Henry, Michael J.; Bohn, Shawn J.
When querying a huge image database containing millions of images, the result of the query may still contain many thousands of images that need to be presented to the user. We consider the problem of arranging such a large set of images into a visually coherent layout, one that places similar images next to each other. Image similarity is determined using a bag-of-features model, and the layout is constructed from a hierarchical clustering of the image set by mapping an in-order traversal of the hierarchy tree into a space-filling curve. This layout method provides strong locality guarantees so we are able to quantitatively evaluate performance using standard image retrieval benchmarks. Performance of the bag-of-features method is best when the vocabulary is learned on the image set being clustered. Because learning a large, discriminative vocabulary is a computationally demanding task, we present a novel method for efficiently adapting a generic visual vocabulary to a particular dataset. We evaluate our clustering and vocabulary adaptation methods on a variety of image datasets and show that adapting a generic vocabulary to a particular set of images improves performance on both hierarchical clustering and image retrieval tasks.
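The layout step described here (in-order traversal of the hierarchy tree mapped onto a space-filling curve) can be sketched as follows. The abstract does not say which curve is used, so a Hilbert curve is assumed; the nested-tuple tree stands in for the output of a real hierarchical clustering, and all names are hypothetical.

```python
def leaves_in_order(tree):
    """Yield the leaves of a nested-tuple tree from left to right.
    Inner nodes are tuples of children; anything else is a leaf."""
    if isinstance(tree, tuple):
        for child in tree:
            yield from leaves_in_order(child)
    else:
        yield tree

def hilbert_d2xy(order, d):
    """Map position d along a Hilbert curve to (x, y) on a
    2**order x 2**order grid (standard iterative conversion)."""
    n = 1 << order
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

def layout(tree, order):
    """Assign each leaf a grid cell by walking the curve in leaf order."""
    return {leaf: hilbert_d2xy(order, i)
            for i, leaf in enumerate(leaves_in_order(tree))}

# Toy hierarchy: siblings are the most similar images.
toy_tree = ((("a", "b"), ("c", "d")), (("e", "f"), ("g", "h")))
cells = layout(toy_tree, order=2)
```

Because consecutive positions on a Hilbert curve are always neighboring grid cells, leaves that are adjacent in the traversal, and therefore close in the hierarchy, end up next to each other in the layout.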
M-Isomap: Orthogonal Constrained Marginal Isomap for Nonlinear Dimensionality Reduction.
Zhang, Zhao; Chow, Tommy W S; Zhao, Mingbo
2013-02-01
Isomap is a well-known nonlinear dimensionality reduction (DR) method, aiming at preserving geodesic distances of all similarity pairs for delivering highly nonlinear manifolds. Isomap is efficient in visualizing synthetic data sets, but it usually delivers unsatisfactory results in benchmark cases. This paper incorporates the pairwise constraints into Isomap and proposes a marginal Isomap (M-Isomap) for manifold learning. The pairwise Cannot-Link and Must-Link constraints are used to specify the types of neighborhoods. M-Isomap computes the shortest path distances over constrained neighborhood graphs and guides the nonlinear DR through separating the interclass neighbors. As a result, large margins between both inter- and intraclass clusters are delivered and enhanced compactness of intracluster points is achieved at the same time. The validity of M-Isomap is examined by extensive simulations over synthetic, University of California, Irvine, and benchmark real Olivetti Research Library, YALE, and CMU Pose, Illumination, and Expression databases. The data visualization and clustering power of M-Isomap are compared with those of six related DR methods. The visualization results show that M-Isomap is able to deliver more separate clusters. Clustering evaluations also demonstrate that M-Isomap delivers comparable or even better results than some state-of-the-art DR algorithms.
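The geodesic-distance step that Isomap relies on can be sketched as shortest paths over a k-nearest-neighbor graph. How M-Isomap actually uses its constraints is not specified in the abstract; in the sketch below, Cannot-Link pairs are simply denied a direct edge, which is an assumption made for illustration only. The final embedding step (multidimensional scaling) is omitted.

```python
import math

def geodesic_distances(points, k, cannot_link=frozenset()):
    """Approximate geodesic distances: connect each point to its k
    nearest neighbors, then run Floyd-Warshall shortest paths.
    cannot_link is a set of index pairs that may not share an edge
    (a hypothetical simplification of constrained neighborhoods)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    g = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        g[i][i] = 0.0
        nearest = sorted((j for j in range(n) if j != i),
                         key=lambda j: dist[i][j])[:k]
        for j in nearest:
            if (i, j) in cannot_link or (j, i) in cannot_link:
                continue
            g[i][j] = g[j][i] = dist[i][j]
    for m in range(n):
        for i in range(n):
            for j in range(n):
                if g[i][m] + g[m][j] < g[i][j]:
                    g[i][j] = g[i][m] + g[m][j]
    return g

# Five points along an L-shaped path.
path = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
plain = geodesic_distances(path, k=2)
constrained = geodesic_distances(path, k=2, cannot_link={(0, 1)})
```

On the L-shaped path, the graph distance between the two end points follows the bend and is longer than the straight-line distance; forbidding the edge between points 0 and 1 forces their shortest path through point 2, which pushes them apart in the resulting distance matrix.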
dendextend: an R package for visualizing, adjusting and comparing trees of hierarchical clustering
2015-01-01
Summary: dendextend is an R package for creating and comparing visually appealing tree diagrams. dendextend provides utility functions for manipulating dendrogram objects (their color, shape and content) as well as several advanced methods for comparing trees to one another (both statistically and visually). As such, dendextend offers a flexible framework for enhancing R's rich ecosystem of packages for performing hierarchical clustering of items. Availability and implementation: The dendextend R package (including detailed introductory vignettes) is available under the GPL-2 Open Source license and is freely available to download from CRAN at: (http://cran.r-project.org/package=dendextend) Contact: Tal.Galili@math.tau.ac.il PMID:26209431
Unsupervised Structure Detection in Biomedical Data.
Vogt, Julia E
2015-01-01
A major challenge in computational biology is to find simple representations of high-dimensional data that best reveal the underlying structure. In this work, we present an intuitive and easy-to-implement method based on ranked neighborhood comparisons that detects structure in unsupervised data. The method is based on ordering objects in terms of similarity and on the mutual overlap of nearest neighbors. This basic framework was originally introduced in the field of social network analysis to detect actor communities. We demonstrate that the same ideas can successfully be applied to biomedical data sets in order to reveal complex underlying structure. The algorithm is very efficient and works on distance data directly without requiring a vectorial embedding of data. Comprehensive experiments demonstrate the validity of this approach. Comparisons with state-of-the-art clustering methods show that the presented method outperforms hierarchical methods as well as density based clustering methods and model-based clustering. A further advantage of the method is that it simultaneously provides a visualization of the data. Especially in biomedical applications, the visualization of data can be used as a first pre-processing step when analyzing real world data sets to get an intuition of the underlying data structure. We apply this model to synthetic data as well as to various biomedical data sets which demonstrate the high quality and usefulness of the inferred structure.
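The core idea of comparing ranked neighborhoods on distance data can be illustrated with a short sketch. It assumes that the "mutual overlap of nearest neighbors" is scored as the Jaccard overlap of k-nearest-neighbor sets, with each object counted in its own neighborhood; that scoring rule and the function names are assumptions, not the paper's exact definition.

```python
def knn_sets(dist, k):
    """For each object, the set of its k nearest neighbors by distance,
    plus the object itself. Works on a distance matrix directly, so no
    vector embedding of the data is needed."""
    n = len(dist)
    return [
        set(sorted((j for j in range(n) if j != i),
                   key=lambda j: dist[i][j])[:k]) | {i}
        for i in range(n)
    ]

def neighborhood_overlap(dist, k):
    """Pairwise Jaccard overlap of neighbor sets (assumed scoring rule).
    High overlap suggests two objects belong to the same structure."""
    nbrs = knn_sets(dist, k)
    n = len(dist)
    return [
        [len(nbrs[i] & nbrs[j]) / len(nbrs[i] | nbrs[j]) for j in range(n)]
        for i in range(n)
    ]

# Distance matrix for two well-separated groups on a line.
xs = [0, 1, 2, 10, 11, 12]
dist_matrix = [[abs(a - b) for b in xs] for a in xs]
overlap = neighborhood_overlap(dist_matrix, k=2)
```

In this toy case objects within a group share their entire neighborhood and objects from different groups share nothing, so the overlap matrix is block-structured; plotting such a matrix is one simple way to obtain the kind of visualization the abstract mentions.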
Visualization of complex DNA damage along accelerated ions tracks
NASA Astrophysics Data System (ADS)
Kulikova, Elena; Boreyko, Alla; Bulanova, Tatiana; Ježková, Lucie; Zadneprianetc, Mariia; Smirnova, Elena
2018-04-01
The most deleterious DNA lesions induced by ionizing radiation are clustered DNA double-strand breaks (DSB). Clustered or complex DNA damage is a combination of a few simple lesions (single-strand breaks, base damage etc.) within one or two DNA helix turns. It is known that the yield of complex DNA lesions increases with increasing linear energy transfer (LET) of radiation. For investigation of the induction and repair of complex DNA lesions, human fibroblasts were irradiated with high-LET 15N ions (LET = 183.3 keV/μm, E = 13 MeV/n) and low-LET 60Co γ-rays (LET ≈ 0.3 keV/μm) radiation. DNA DSBs (γH2AX and 53BP1) and base damage (OGG1) markers were visualized by immunofluorescence staining and high-resolution microscopy. The obtained results showed slower repair kinetics of induced DSBs in cells irradiated with accelerated ions compared to 60Co γ-rays, indicating induction of more complex DNA damage. Confirming previous assumptions, detailed 3D analysis of γH2AX/53BP1 foci in 15N ion tracks revealed a more complicated structure of the foci in contrast to γ-rays. It was shown that the proteins 53BP1 and OGG1, involved in repair of DNA DSBs and modified bases, respectively, were colocalized in tracks of 15N ions and thus represented clustered DNA DSBs.
Accessing and visualizing scientific spatiotemporal data
NASA Technical Reports Server (NTRS)
Katz, Daniel S.; Bergou, Attila; Berriman, G. Bruce; Block, Gary L.; Collier, Jim; Curkendall, David W.; Good, John; Husman, Laura; Jacob, Joseph C.; Laity, Anastasia;
2004-01-01
This paper discusses work done by JPL's Parallel Applications Technologies Group in helping scientists access and visualize very large data sets through the use of multiple computing resources, such as parallel supercomputers, clusters, and grids.
Smith, Jennifer L; Sivasubramaniam, Selvaraj; Rabiu, Mansur M; Kyari, Fatima; Solomon, Anthony W; Gilbert, Clare
2015-01-01
The distribution of trachoma in Nigeria is spatially heterogeneous, with large-scale trends observed across the country and more local variation within areas. Relative contributions of individual and cluster-level risk factors to the geographic distribution of disease remain largely unknown. The primary aim of this analysis is to assess the relationship between climatic factors and trachomatous trichiasis (TT) and/or corneal opacity (CO) due to trachoma in Nigeria, while accounting for the effects of individual risk factors and spatial correlation. In addition, we explore the relative importance of variation in the risk of trichiasis and/or corneal opacity (TT/CO) at different levels. Data from the 2007 National Blindness and Visual Impairment Survey were used for this analysis, which included a nationally representative sample of adults aged 40 years and above. Complete data were available from 304 clusters selected using a multi-stage stratified cluster-random sampling strategy. All participants (13,543 individuals) were interviewed and examined by an ophthalmologist for the presence or absence of TT and CO. In addition to field-collected data, remotely sensed climatic data were extracted for each cluster and used to fit Bayesian hierarchical logistic models to disease outcome. The risk of TT/CO was associated with factors at both the individual and cluster levels, with approximately 14% of the total variation attributed to the cluster level. Beyond established individual risk factors (age, gender and occupation), there was strong evidence that environmental/climatic factors at the cluster-level (lower precipitation, higher land surface temperature, higher mean annual temperature and rural classification) were also associated with a greater risk of TT/CO. 
This study establishes the importance of large-scale risk factors in the geographical distribution of TT/CO in Nigeria, supporting anecdotal evidence that environmental conditions are associated with increased risk in this context and highlighting their potential use in improving estimates of disease burden at large scales.
Transformation and model choice for RNA-seq co-expression analysis.
Rau, Andrea; Maugis-Rabusseau, Cathy
2018-05-01
Although a large number of clustering algorithms have been proposed to identify groups of co-expressed genes from microarray data, the question of if and how such methods may be applied to RNA sequencing (RNA-seq) data remains unaddressed. In this work, we investigate the use of data transformations in conjunction with Gaussian mixture models for RNA-seq co-expression analyses, as well as a penalized model selection criterion to select both an appropriate transformation and number of clusters present in the data. This approach has the advantage of accounting for per-cluster correlation structures among samples, which can be strong in RNA-seq data. In addition, it provides a rigorous statistical framework for parameter estimation, an objective assessment of data transformations and number of clusters and the possibility of performing diagnostic checks on the quality and homogeneity of the identified clusters. We analyze four varied RNA-seq data sets to illustrate the use of transformations and model selection in conjunction with Gaussian mixture models. Finally, we propose a Bioconductor package coseq (co-expression of RNA-seq data) to facilitate implementation and visualization of the recommended RNA-seq co-expression analyses.
Egocentric daily activity recognition via multitask clustering.
Yan, Yan; Ricci, Elisa; Liu, Gaowen; Sebe, Nicu
2015-10-01
Recognizing human activities from videos is a fundamental research problem in computer vision. Recently, there has been a growing interest in analyzing human behavior from data collected with wearable cameras. First-person cameras continuously record several hours of their wearers' life. To cope with this vast amount of unlabeled and heterogeneous data, novel algorithmic solutions are required. In this paper, we propose a multitask clustering framework for activity of daily living analysis from visual data gathered from wearable cameras. Our intuition is that, even if the data are not annotated, it is possible to exploit the fact that the tasks of recognizing everyday activities of multiple individuals are related, since typically people perform the same actions in similar environments (e.g., people working in an office often read and write documents). In our framework, rather than clustering data from different users separately, we propose to look for clustering partitions which are coherent among related tasks. In particular, two novel multitask clustering algorithms, derived from a common optimization problem, are introduced. Our experimental evaluation, conducted both on synthetic data and on publicly available first-person vision data sets, shows that the proposed approach outperforms several single-task and multitask learning methods.
Alcohol consumption and visual impairment in a rural Northern Chinese population.
Li, Zhijian; Xu, Keke; Wu, Shubin; Sun, Ying; Song, Zhen; Jin, Di; Liu, Ping
2014-12-01
To investigate alcohol drinking status and the association between drinking patterns and visual impairment in an adult population in northern China. Cluster sampling was used to select samples. The protocol consisted of an interview, pilot study, visual acuity (VA) testing and a clinical examination. Visual impairment was defined as presenting VA worse than 20/60 in any eye. Drinking patterns included drinking quantity (standard drinks per week) and frequency (drinking days in the past week). Information on alcohol consumption was obtained from 8445 subjects, 963 (11.4%) of whom reported consuming alcohol. In multivariate analysis, alcohol consumption was significantly associated with older age (p < 0.001), male sex (p < 0.001), and higher education level (p < 0.01). Heavy intake (>14 drinks/week) was associated with higher odds of visual impairment. However, moderate intake (>1-14 drinks/week) was significantly associated with lower odds (adjusted odds ratio, OR, 0.7, 95% confidence interval, CI, 0.5-1.0) of visual impairment (p = 0.03). Higher drinking frequency was significantly associated with higher odds of visual impairment. Multivariate analysis showed that older age, male sex, and higher education level were associated with visual impairment among current drinkers. Age- and sex-adjusted ORs for the association of cataract and alcohol intake showed that higher alcohol consumption was not significantly associated with an increased prevalence of cataract (OR 1.2, 95% CI 0.4-3.6), whereas light and moderate alcohol consumption appeared to reduce incidence of cataract. Drinking patterns were associated with visual impairment. Heavy intake had negative effects on distance vision; meanwhile, moderate intake had a positive effect on distance vision.
Intuitive color-based visualization of multimedia content as large graphs
NASA Astrophysics Data System (ADS)
Delest, Maylis; Don, Anthony; Benois-Pineau, Jenny
2004-06-01
Data visualization techniques are penetrating in various technological areas. In the field of multimedia such as information search and retrieval in multimedia archives, or digital media production and post-production, data visualization methodologies based on large graphs give an exciting alternative to conventional storyboard visualization. In this paper we develop a new approach to visualization of multimedia (video) documents based both on large graph clustering and preliminary video segmenting and indexing.
NASA Astrophysics Data System (ADS)
Hu, Ran; Wan, Jiamin; Kim, Yongman; Tokunaga, Tetsu K.
2017-08-01
How the wettability of pore surfaces affects supercritical (sc) CO2 capillary trapping in geologic carbon sequestration (GCS) is not well understood, and available evidence appears inconsistent. Using a high-pressure micromodel-microscopy system with image analysis, we studied the impact of wettability on scCO2 capillary trapping during short-term brine flooding (80 s, 8-667 pore volumes). Experiments on brine displacing scCO2 were conducted at 8.5 MPa and 45°C in water-wet (static contact angle θ = 20° ± 8°) and intermediate-wet (θ = 94° ± 13°) homogeneous micromodels under four different flow rates (capillary number Ca ranging from 9 × 10⁻⁶ to 8 × 10⁻⁴) with a total of eight conditions (four replicates for each). Brine invasion processes were recorded and statistical analysis was performed for over 2000 images of scCO2 saturations, and scCO2 cluster characteristics. The trapped scCO2 saturation under intermediate-wet conditions is 15% higher than under water-wet conditions under the slowest flow rate (Ca ≈ 9 × 10⁻⁶). Based on the visualization and scCO2 cluster analysis, we show that the scCO2 trapping process in our micromodels is governed by bypass trapping that is enhanced by the larger contact angle. Smaller contact angles enhance cooperative pore filling and widen brine fingers (or channels), leading to smaller volumes of scCO2 being bypassed. Increased flow rates suppress this wettability effect.
FonaDyn - A system for real-time analysis of the electroglottogram, over the voice range
NASA Astrophysics Data System (ADS)
Ternström, Sten; Johansson, Dennis; Selamtzis, Andreas
2018-01-01
From soft to loud and low to high, the mechanisms of human voice have many degrees of freedom, making it difficult to assess phonation from the acoustic signal alone. FonaDyn is a research tool that combines acoustics with electroglottography (EGG). It characterizes and visualizes in real time the dynamics of EGG waveforms, using statistical clustering of the cycle-synchronous EGG Fourier components, and their sample entropy. The prevalence and stability of different EGG waveshapes are mapped as colored regions into a so-called voice range profile, without needing pre-defined thresholds or categories. With appropriately 'trained' clusters, FonaDyn can classify and map voice regimes. This is of potential scientific, clinical and pedagogical interest.
Method and apparatus for offloading compute resources to a flash co-processing appliance
Tzelnic, Percy; Faibish, Sorin; Gupta, Uday K.; Bent, John; Grider, Gary Alan; Chen, Hsing -bung
2015-10-13
Solid-State Drive (SSD) burst buffer nodes are interposed into a parallel supercomputing cluster to enable fast burst checkpoint of cluster memory to or from nearby interconnected solid-state storage with asynchronous migration between the burst buffer nodes and slower more distant disk storage. The SSD nodes also perform tasks offloaded from the compute nodes or associated with the checkpoint data. For example, the data for the next job is preloaded in the SSD node and uploaded very quickly to the respective compute node just before the next job starts. During a job, the SSD nodes perform fast visualization and statistical analysis upon the checkpoint data. The SSD nodes can also perform data reduction and encryption of the checkpoint data.
deFUME: Dynamic exploration of functional metagenomic sequencing data.
van der Helm, Eric; Geertz-Hansen, Henrik Marcus; Genee, Hans Jasper; Malla, Sailesh; Sommer, Morten Otto Alexander
2015-07-31
Functional metagenomic selections represent a powerful technique that is widely applied for identification of novel genes from complex metagenomic sources. However, whereas hundreds to thousands of clones can be easily generated and sequenced over a few days of experiments, analyzing the data is time consuming and constitutes a major bottleneck for experimental researchers in the field. Here we present the deFUME web server, an easy-to-use web-based interface for processing, annotation and visualization of functional metagenomics sequencing data, tailored to meet the requirements of non-bioinformaticians. The web-server integrates multiple analysis steps into one single workflow: read assembly, open reading frame prediction, and annotation with BLAST, InterPro and GO classifiers. Analysis results are visualized in an online dynamic web-interface. The deFUME webserver provides a fast track from raw sequence to a comprehensive visual data overview that facilitates effortless inspection of gene function, clustering and distribution. The webserver is available at cbs.dtu.dk/services/deFUME/ and the source code is distributed at github.com/EvdH0/deFUME.
NASA Astrophysics Data System (ADS)
Morrison, S. M.; Downs, R. T.; Golden, J. J.; Pires, A.; Fox, P. A.; Ma, X.; Zednik, S.; Eleish, A.; Prabhu, A.; Hummer, D. R.; Liu, C.; Meyer, M.; Ralph, J.; Hystad, G.; Hazen, R. M.
2016-12-01
We have developed a comprehensive database of copper (Cu) mineral characteristics. These data include crystallographic, paragenetic, chemical, locality, age, structural complexity, and physical property information for the 689 Cu mineral species approved by the International Mineralogical Association (rruff.info/ima). Synthesis of this large, varied dataset allows for in-depth exploration of statistical trends and visualization techniques. With social network analysis (SNA) and cluster analysis of minerals, we create sociograms and chord diagrams. SNA visualizations illustrate the relationships and connectivity between mineral species, which often form cliques associated with rock type and/or geochemistry. Using mineral ecology statistics, we analyze mineral-locality frequency distribution and predict the number of missing mineral species, visualized with accumulation curves. By assembly of 2-dimensional KLEE diagrams of co-existing elements in minerals, we illustrate geochemical trends within a mineral system. To explore mineral age and chemical oxidation state, we create skyline diagrams and compare trends with varying chemistry. These trends illustrate mineral redox changes through geologic time and correlate with significant geologic occurrences, such as the Great Oxidation Event (GOE) or Wilson Cycles.
GeneXplorer: an interactive web application for microarray data visualization and analysis.
Rees, Christian A; Demeter, Janos; Matese, John C; Botstein, David; Sherlock, Gavin
2004-10-01
When publishing large-scale microarray datasets, it is of great value to create supplemental websites where either the full data, or selected subsets corresponding to figures within the paper, can be browsed. We set out to create a CGI application containing many of the features of some of the existing standalone software for the visualization of clustered microarray data. We present GeneXplorer, a web application for interactive microarray data visualization and analysis in a web environment. GeneXplorer allows users to browse a microarray dataset in an intuitive fashion. It provides simple access to microarray data over the Internet and uses only HTML and JavaScript to display graphic and annotation information. It provides radar and zoom views of the data, allows display of the nearest neighbors to a gene expression vector based on their Pearson correlations and provides the ability to search gene annotation fields. The software is released under the permissive MIT Open Source license, and the complete documentation and the entire source code are freely available for download from CPAN http://search.cpan.org/dist/Microarray-GeneXplorer/.
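The nearest-neighbor display described above ranks genes by the Pearson correlation of their expression vectors. The following is a minimal Python sketch of that idea, not GeneXplorer's own code (the abstract says the tool is a CGI application distributed via CPAN); the function names and the toy data are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # constant profile: correlation is undefined, treated as 0 here
    return cov / (sx * sy)

def nearest_neighbors(profiles, query, k=2):
    """Return the k genes most correlated with `query`, best first."""
    scores = [(pearson(profiles[query], vec), gene)
              for gene, vec in profiles.items() if gene != query]
    scores.sort(reverse=True)
    return [(gene, round(r, 3)) for r, gene in scores[:k]]

# Hypothetical toy data: gene name -> expression across four arrays.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.1],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [1.0, 1.5, 1.0, 1.5],
}
print(nearest_neighbors(profiles, "geneA"))
```

In this toy set, geneB tracks geneA almost perfectly and ranks first, while the anti-correlated geneC falls to the bottom of the ranking.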
MotionFlow: Visual Abstraction and Aggregation of Sequential Patterns in Human Motion Tracking Data.
Jang, Sujin; Elmqvist, Niklas; Ramani, Karthik
2016-01-01
Pattern analysis of human motions, which is useful in many research areas, requires understanding and comparison of different styles of motion patterns. However, working with human motion tracking data to support such analysis poses great challenges. In this paper, we propose MotionFlow, a visual analytics system that provides an effective overview of various motion patterns based on an interactive flow visualization. This visualization formulates a motion sequence as transitions between static poses, and aggregates these sequences into a tree diagram to construct a set of motion patterns. The system also allows the users to directly reflect the context of data and their perception of pose similarities in generating representative pose states. We provide local and global controls over the partition-based clustering process. To support the users in organizing unstructured motion data into pattern groups, we designed a set of interactions that enables searching for similar motion sequences from the data, detailed exploration of data subsets, and creating and modifying the group of motion patterns. To evaluate the usability of MotionFlow, we conducted a user study with six researchers with expertise in gesture-based interaction design. They used MotionFlow to explore and organize unstructured motion tracking data. Results show that the researchers were able to easily learn how to use MotionFlow, and the system effectively supported their pattern analysis activities, including leveraging their perception and domain knowledge.
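The aggregation step described above, in which motion sequences are treated as transitions between pose states and merged into a tree, can be illustrated with a prefix tree. This is only a sketch under assumptions: the abstract does not specify MotionFlow's data structure, and the pose labels and function name below are hypothetical.

```python
def build_pattern_tree(sequences):
    """Merge pose-state sequences into a prefix tree with visit counts."""
    root = {"count": 0, "children": {}}
    for seq in sequences:
        node = root
        node["count"] += 1
        for pose in seq:
            # Sequences sharing a prefix of pose states share the same branch.
            node = node["children"].setdefault(pose, {"count": 0, "children": {}})
            node["count"] += 1
    return root

# Hypothetical pose-state sequences, e.g. produced by clustering tracked poses.
sequences = [
    ["stand", "raise", "wave"],
    ["stand", "raise", "point"],
    ["stand", "crouch"],
]
tree = build_pattern_tree(sequences)
print(tree["children"]["stand"]["children"]["raise"]["count"])
```

The counts on each node give the number of sequences passing through that transition, which is the kind of quantity a flow visualization could map to branch width.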
Kumar, Raj; Sharma, Vishal
2017-03-15
The present research is focused on the analysis of writing inks using destructive UV-Vis spectroscopy (dissolution of ink by the solvent) and non-destructive diffuse reflectance UV-Vis-NIR spectroscopy along with Chemometrics. Fifty seven samples of blue ballpoint pen inks were analyzed under optimum conditions to determine the differences in spectral features of inks among same and different manufacturers. Normalization was performed on the spectroscopic data before chemometric analysis. Principal Component Analysis (PCA) and K-mean cluster analysis were used on the data to ascertain whether the blue ballpoint pen inks could be differentiated by their UV-Vis/UV-Vis NIR spectra. The discriminating power is calculated by qualitative analysis by the visual comparison of the spectra (absorbance peaks), produced by the destructive and non-destructive methods. In the latter two methods, the pairwise comparison is made by incorporating the clustering method. It is found that chemometric method provides better discriminating power (98.72% and 99.46%, in destructive and non-destructive, respectively) in comparison to the qualitative analysis (69.67%). Copyright © 2016 Elsevier B.V. All rights reserved.
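The abstract does not state how discriminating power was computed. A common forensic definition, assumed here, is the share of sample pairs that the method tells apart; with 57 samples there are 57 × 56 / 2 = 1596 such pairs. The Python sketch below applies that assumed definition to hypothetical cluster labels.

```python
from itertools import combinations

def discriminating_power(labels):
    """Percentage of sample pairs placed in different clusters.

    Assumed definition: a pair counts as discriminated when its two
    samples carry different cluster labels.
    """
    pairs = list(combinations(range(len(labels)), 2))
    discriminated = sum(1 for i, j in pairs if labels[i] != labels[j])
    return 100.0 * discriminated / len(pairs)

# Hypothetical cluster labels for six ink samples.
labels = [0, 0, 1, 2, 2, 3]
print(round(discriminating_power(labels), 2))
```

Here two of the fifteen pairs share a cluster, so thirteen pairs are discriminated.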
Kibinge, Nelson; Ono, Naoaki; Horie, Masafumi; Sato, Tetsuo; Sugiura, Tadao; Altaf-Ul-Amin, Md; Saito, Akira; Kanaya, Shigehiko
2016-06-01
Conventionally, workflows examining transcription regulation networks from gene expression data involve distinct analytical steps. There is a need for pipelines that unify data mining and inference deduction into a singular framework to enhance interpretation and hypotheses generation. We propose a workflow that merges network construction with gene expression data mining focusing on regulation processes in the context of transcription factor driven gene regulation. The pipeline implements pathway-based modularization of expression profiles into functional units to improve biological interpretation. The integrated workflow was implemented as a web application software (TransReguloNet) with functions that enable pathway visualization and comparison of transcription factor activity between sample conditions defined in the experimental design. The pipeline merges differential expression, network construction, pathway-based abstraction, clustering and visualization. The framework was applied in analysis of actual expression datasets related to lung, breast and prostate cancer. Copyright © 2016 Elsevier Inc. All rights reserved.
Matching multiple rigid domain decompositions of proteins
Flynn, Emily; Streinu, Ileana
2017-01-01
We describe efficient methods for consistently coloring and visualizing collections of rigid cluster decompositions obtained from variations of a protein structure, and lay the foundation for more complex setups that may involve different computational and experimental methods. The focus here is on three biological applications: the conceptually simpler problems of visualizing results of dilution and mutation analyses, and the more complex task of matching decompositions of multiple NMR models of the same protein. Implemented into the KINARI web server application, the improved visualization techniques give useful information about protein folding cores, help examining the effect of mutations on protein flexibility and function, and provide insights into the structural motions of PDB proteins solved with solution NMR. These tools have been developed with the goal of improving and validating rigidity analysis as a credible coarse-grained model capturing essential information about a protein’s slow motions near the native state. PMID:28141528
Visual analysis of immiscible displacement processes in porous media under ultrasound effect
NASA Astrophysics Data System (ADS)
Naderi, Khosrow; Babadagli, Tayfun
2011-05-01
The effect of sonic waves, in particular, ultrasonic radiation, on immiscible displacement in porous media and enhanced oil recovery has been of interest for more than five decades. Attempts were made to investigate the effect through core scale experimental or theoretical models. Visual experiments are useful to scrutinize the reason for improved oil recovery under acoustic waves of different frequency but are not abundant in literature. In this paper, we report observations and analyses as to the effects of ultrasonic energy on immiscible displacement and interaction of the fluid matrix visually in porous media through two-dimensional (2D) sand pack experiments. 2D glass bead models with different wettabilities were saturated with different viscosity oils and water was injected into the models. The experiments were conducted with and without ultrasound. Dynamic water injection experiments were preferred as they had both viscous and capillary forces in effect. The displacement patterns were evaluated both in terms of their shape, size, and the interface characteristics quantitatively and qualitatively to account for the effects of ultrasonic waves on the displacement and the reason for increased oil production under this type of sonic wave. More compact clusters were observed when ultrasonic energy was present in water-wet systems. In the oil-wet cases, more oil was produced after breakthrough when ultrasound was applied and no compact clusters were formed in contrast to the water-wet cases.
SEURAT: Visual analytics for the integrated analysis of microarray data
2010-01-01
Background In translational cancer research, gene expression data is collected together with clinical data and genomic data arising from other chip based high throughput technologies. Software tools for the joint analysis of such high dimensional data sets together with clinical data are required. Results We have developed an open source software tool which provides interactive visualization capability for the integrated analysis of high-dimensional gene expression data together with associated clinical data, array CGH data and SNP array data. The different data types are organized by a comprehensive data manager. Interactive tools are provided for all graphics: heatmaps, dendrograms, barcharts, histograms, eventcharts and a chromosome browser, which displays genetic variations along the genome. All graphics are dynamic and fully linked so that any object selected in a graphic will be highlighted in all other graphics. For exploratory data analysis the software provides unsupervised data analytics like clustering, seriation algorithms and biclustering algorithms. Conclusions The SEURAT software meets the growing needs of researchers to perform joint analysis of gene expression, genomical and clinical data. PMID:20525257
Diffusion maps, clustering and fuzzy Markov modeling in peptide folding transitions
NASA Astrophysics Data System (ADS)
Nedialkova, Lilia V.; Amat, Miguel A.; Kevrekidis, Ioannis G.; Hummer, Gerhard
2014-09-01
Using the helix-coil transitions of alanine pentapeptide as an illustrative example, we demonstrate the use of diffusion maps in the analysis of molecular dynamics simulation trajectories. Diffusion maps and other nonlinear data-mining techniques provide powerful tools to visualize the distribution of structures in conformation space. The resulting low-dimensional representations help in partitioning conformation space, and in constructing Markov state models that capture the conformational dynamics. In an initial step, we use diffusion maps to reduce the dimensionality of the conformational dynamics of Ala5. The resulting pretreated data are then used in a clustering step. The identified clusters show excellent overlap with clusters obtained previously by using the backbone dihedral angles as input, with small—but nontrivial—differences reflecting torsional degrees of freedom ignored in the earlier approach. We then construct a Markov state model describing the conformational dynamics in terms of a discrete-time random walk between the clusters. We show that by combining fuzzy C-means clustering with a transition-based assignment of states, we can construct robust Markov state models. This state-assignment procedure suppresses short-time memory effects that result from the non-Markovianity of the dynamics projected onto the space of clusters. In a comparison with previous work, we demonstrate how manifold learning techniques may complement and enhance informed intuition commonly used to construct reduced descriptions of the dynamics in molecular conformation space.
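The Markov state model step described above can be sketched as counting transitions between cluster labels along a trajectory and normalizing each row. The Python example below uses hard state assignments for simplicity; it does not implement the fuzzy C-means or transition-based assignment that the abstract describes, and the trajectory is hypothetical.

```python
def transition_matrix(traj, n_states, lag=1):
    """Row-stochastic transition matrix estimated from a discrete trajectory.

    traj holds one cluster index per frame; lag (>= 1) is the number of
    frames between the start and end state of each counted transition.
    """
    counts = [[0] * n_states for _ in range(n_states)]
    for a, b in zip(traj[:-lag], traj[lag:]):
        counts[a][b] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # States that are never left get an all-zero row in this sketch.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix

# Hypothetical cluster assignments along a short trajectory.
traj = [0, 0, 1, 1, 0, 2, 2, 2, 0, 0]
T = transition_matrix(traj, n_states=3)
for row in T:
    print([round(p, 3) for p in row])
```

Each row gives the estimated probabilities of moving from one cluster to every other cluster after one lag time, which is the discrete-time random walk between clusters mentioned in the abstract.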
ICM: a web server for integrated clustering of multi-dimensional biomedical data.
He, Song; He, Haochen; Xu, Wenjian; Huang, Xin; Jiang, Shuai; Li, Fei; He, Fuchu; Bo, Xiaochen
2016-07-08
Large-scale efforts for parallel acquisition of multi-omics profiling continue to generate extensive amounts of multi-dimensional biomedical data. Thus, integrated clustering of multiple types of omics data is essential for developing individual-based treatments and precision medicine. However, while rapid progress has been made, methods for integrated clustering lack an intuitive web interface that supports biomedical researchers without sufficient programming skills. Here, we present a web tool, named Integrated Clustering of Multi-dimensional biomedical data (ICM), that provides an interface from which to fuse, cluster and visualize multi-dimensional biomedical data and knowledge. With ICM, users can explore the heterogeneity of a disease or a biological process by identifying subgroups of patients. The results obtained can then be interactively modified by using an intuitive user interface. Researchers can also exchange the results from ICM with collaborators via a web link containing a Project ID number that will directly pull up the analysis results being shared. ICM also supports incremental clustering that allows users to add new sample data into the data of a previous study to obtain a clustering result. Currently, the ICM web server is available with no login requirement and at no cost at http://biotech.bmi.ac.cn/icm/. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Diffusion maps, clustering and fuzzy Markov modeling in peptide folding transitions
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nedialkova, Lilia V.; Amat, Miguel A.; Kevrekidis, Ioannis G.; Hummer, Gerhard
Using the helix-coil transitions of alanine pentapeptide as an illustrative example, we demonstrate the use of diffusion maps in the analysis of molecular dynamics simulation trajectories. Diffusion maps and other nonlinear data-mining techniques provide powerful tools to visualize the distribution of structures in conformation space. The resulting low-dimensional representations help in partitioning conformation space, and in constructing Markov state models that capture the conformational dynamics. In an initial step, we use diffusion maps to reduce the dimensionality of the conformational dynamics of Ala5. The resulting pretreated data are then used in a clustering step. The identified clusters show excellent overlap with clusters obtained previously by using the backbone dihedral angles as input, with small—but nontrivial—differences reflecting torsional degrees of freedom ignored in the earlier approach. We then construct a Markov state model describing the conformational dynamics in terms of a discrete-time random walk between the clusters. We show that by combining fuzzy C-means clustering with a transition-based assignment of states, we can construct robust Markov state models. This state-assignment procedure suppresses short-time memory effects that result from the non-Markovianity of the dynamics projected onto the space of clusters. In a comparison with previous work, we demonstrate how manifold learning techniques may complement and enhance informed intuition commonly used to construct reduced descriptions of the dynamics in molecular conformation space.
Paladino, Simona; Lebreton, Stéphanie; Lelek, Mickaël; Riccio, Patrizia; De Nicola, Sergio; Zimmer, Christophe; Zurzolo, Chiara
2017-12-01
Spatio-temporal compartmentalization of membrane proteins is critical for the regulation of diverse vital functions in eukaryotic cells. It was previously shown that, at the apical surface of polarized MDCK cells, glycosylphosphatidylinositol (GPI)-anchored proteins (GPI-APs) are organized in small cholesterol-independent clusters of single GPI-AP species (homoclusters), which are required for the formation of larger cholesterol-dependent clusters formed by multiple GPI-AP species (heteroclusters). This clustered organization is crucial for the biological activities of GPI-APs; hence, understanding the spatio-temporal properties of their membrane organization is of fundamental importance. Here, by using direct stochastic optical reconstruction microscopy coupled to pair correlation analysis (pc-STORM), we were able to visualize and measure the size of these clusters. Specifically, we show that they are non-randomly distributed and have an average size of 67 nm. We also demonstrated that polarized MDCK and non-polarized CHO cells have similar cluster distribution and size, but different sensitivity to cholesterol depletion. Finally, we derived a model that allowed a quantitative characterization of the cluster organization of GPI-APs at the apical surface of polarized MDCK cells for the first time. Experimental FRET (fluorescence resonance energy transfer)/FLIM (fluorescence-lifetime imaging microscopy) data were correlated to the theoretical predictions of the model. © 2017 The Author(s).
Diffusion maps, clustering and fuzzy Markov modeling in peptide folding transitions
Nedialkova, Lilia V.; Amat, Miguel A.; Kevrekidis, Ioannis G.; Hummer, Gerhard
2014-01-01
Using the helix-coil transitions of alanine pentapeptide as an illustrative example, we demonstrate the use of diffusion maps in the analysis of molecular dynamics simulation trajectories. Diffusion maps and other nonlinear data-mining techniques provide powerful tools to visualize the distribution of structures in conformation space. The resulting low-dimensional representations help in partitioning conformation space, and in constructing Markov state models that capture the conformational dynamics. In an initial step, we use diffusion maps to reduce the dimensionality of the conformational dynamics of Ala5. The resulting pretreated data are then used in a clustering step. The identified clusters show excellent overlap with clusters obtained previously by using the backbone dihedral angles as input, with small—but nontrivial—differences reflecting torsional degrees of freedom ignored in the earlier approach. We then construct a Markov state model describing the conformational dynamics in terms of a discrete-time random walk between the clusters. We show that by combining fuzzy C-means clustering with a transition-based assignment of states, we can construct robust Markov state models. This state-assignment procedure suppresses short-time memory effects that result from the non-Markovianity of the dynamics projected onto the space of clusters. In a comparison with previous work, we demonstrate how manifold learning techniques may complement and enhance informed intuition commonly used to construct reduced descriptions of the dynamics in molecular conformation space. PMID:25240340
Robust fiber clustering of cerebral fiber bundles in white matter
NASA Astrophysics Data System (ADS)
Yao, Xufeng; Wang, Yongxiong; Zhuang, Songlin
2014-11-01
Diffusion tensor imaging fiber tracking (DTI-FT) has been widely accepted in the diagnosis and treatment of brain diseases. During the rendering pipeline of specific fiber tracts, the image noise and low resolution of DTI would lead to false propagations. In this paper, we propose a robust fiber clustering (FC) approach to diminish false fibers from one fiber tract. Our algorithm consists of three steps. Firstly, the optimized fiber assignment continuous tracking (FACT) is implemented to reconstruct one fiber tract; and then each curved fiber in the fiber tract is mapped to a point by kernel principal component analysis (KPCA); finally, the point clouds of fiber tract are clustered by hierarchical clustering which could distinguish false fibers from true fibers in one tract. In our experiment, the corticospinal tract (CST) in one case of human data in vivo was used to validate our method. Our method showed reliable capability in decreasing the false fibers in one tract. In conclusion, our method could effectively optimize the visualization of fiber bundles and would help a lot in the field of fiber evaluation.
Bayesian nonparametric clustering in phylogenetics: modeling antigenic evolution in influenza.
Cybis, Gabriela B; Sinsheimer, Janet S; Bedford, Trevor; Rambaut, Andrew; Lemey, Philippe; Suchard, Marc A
2018-01-30
Influenza is responsible for up to 500,000 deaths every year, and antigenic variability represents much of its epidemiological burden. To visualize antigenic differences across many viral strains, antigenic cartography methods use multidimensional scaling on binding assay data to map influenza antigenicity onto a low-dimensional space. Analysis of such assay data ideally leads to natural clustering of influenza strains of similar antigenicity that correlate with sequence evolution. To understand the dynamics of these antigenic groups, we present a framework that jointly models genetic and antigenic evolution by combining multidimensional scaling of binding assay data, Bayesian phylogenetic machinery and nonparametric clustering methods. We propose a phylogenetic Chinese restaurant process that extends the current process to incorporate the phylogenetic dependency structure between strains in the modeling of antigenic clusters. With this method, we are able to use the genetic information to better understand the evolution of antigenicity throughout epidemics, as shown in applications of this model to H1N1 influenza. Copyright © 2017 John Wiley & Sons, Ltd.
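For readers unfamiliar with the process that the abstract extends, the standard Chinese restaurant process can be sketched in a few lines of Python. This is the ordinary, exchangeable version only; it does not include the phylogenetic dependency structure proposed in the paper, and the concentration parameter `alpha` and the seed are illustrative choices.

```python
import random

def chinese_restaurant_process(n, alpha, seed=0):
    """Sample a random partition of n items from a standard CRP.

    Each new item joins existing cluster k with probability proportional
    to that cluster's size, or opens a new cluster with probability
    proportional to alpha (alpha > 0).
    """
    rng = random.Random(seed)
    assignments = []
    table_sizes = []
    for _ in range(n):
        weights = table_sizes + [alpha]  # last entry stands for a new cluster
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(table_sizes):
            table_sizes.append(1)
        else:
            table_sizes[k] += 1
        assignments.append(k)
    return assignments, table_sizes

assignments, sizes = chinese_restaurant_process(n=20, alpha=1.0)
print(assignments)
print(sizes)
```

The number of clusters is not fixed in advance and grows slowly with the number of items, which is what makes the process attractive as a nonparametric clustering prior.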
Wang, Jin; Sun, Xiangping; Nahavandi, Saeid; Kouzani, Abbas; Wu, Yuchuan; She, Mary
2014-11-01
Biomedical time series clustering that automatically groups a collection of time series according to their internal similarity is of importance for medical record management and inspection such as bio-signals archiving and retrieval. In this paper, a novel framework that automatically groups a set of unlabelled multichannel biomedical time series according to their internal structural similarity is proposed. Specifically, we treat a multichannel biomedical time series as a document and extract local segments from the time series as words. We extend a topic model, i.e., the Hierarchical probabilistic Latent Semantic Analysis (H-pLSA), which was originally developed for visual motion analysis to cluster a set of unlabelled multichannel time series. The H-pLSA models each channel of the multichannel time series using a local pLSA in the first layer. The topics learned in the local pLSA are then fed to a global pLSA in the second layer to discover the categories of multichannel time series. Experiments on a dataset extracted from multichannel Electrocardiography (ECG) signals demonstrate that the proposed method performs better than previous state-of-the-art approaches and is relatively robust to the variations of parameters including length of local segments and dictionary size. Although the experimental evaluation used the multichannel ECG signals in a biometric scenario, the proposed algorithm is a universal framework for multichannel biomedical time series clustering according to their structural similarity, which has many applications in biomedical time series management. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
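The "time series as document, local segments as words" idea described above can be illustrated for a single channel: slide a window over the signal, map each segment to its nearest dictionary entry, and count word occurrences. The Python sketch below covers only this encoding step, not the H-pLSA topic model; the dictionary, signal, and function names are hypothetical, and the abstract does not say how the dictionary is learned.

```python
from collections import Counter

def segments(series, length, step=1):
    """Extract local segments (sliding windows) from one channel."""
    return [series[i:i + length]
            for i in range(0, len(series) - length + 1, step)]

def nearest_word(segment, dictionary):
    """Index of the dictionary entry closest in squared Euclidean distance."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(dictionary)), key=lambda w: dist(segment, dictionary[w]))

def bag_of_words(series, dictionary, length, step=1):
    """Word-count histogram representing one channel as a 'document'."""
    counts = Counter(nearest_word(s, dictionary)
                     for s in segments(series, length, step))
    return [counts.get(w, 0) for w in range(len(dictionary))]

# Hypothetical dictionary of three length-3 prototype segments.
dictionary = [[0, 0, 0], [0, 1, 0], [1, 1, 1]]
channel = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1]
print(bag_of_words(channel, dictionary, length=3))
```

The segment length and dictionary size used here correspond to the two parameters that the abstract reports the method to be relatively robust to.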
Automated X-Ray Diffraction of Irradiated Materials
Rodman, John; Lin, Yuewei; Sprouster, David; ...
2017-10-26
Synchrotron-based X-ray diffraction (XRD) and small-angle X-ray scattering (SAXS) characterization techniques used on unirradiated and irradiated reactor pressure vessel steels yield large amounts of data. Machine learning techniques, including PCA, offer a novel method of analyzing and visualizing these large data sets in order to determine the effects of chemistry and irradiation conditions on the formation of radiation induced precipitates. In order to run analysis on these data sets, preprocessing must be carried out to convert the data to a usable format and mask the 2-D detector images to account for experimental variations. Once the data has been preprocessed, it can be organized and visualized using principal component analysis (PCA), multi-dimensional scaling, and k-means clustering. In conclusion, from these techniques, it is shown that sample chemistry has a notable effect on the formation of the radiation induced precipitates in reactor pressure vessel steels.
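The k-means step named in this abstract can be sketched in a few lines. This is a generic Lloyd-style k-means, not the authors' pipeline; the two-dimensional `scores` below are made-up stand-ins for low-dimensional (e.g. PCA) coordinates of preprocessed samples, and the farthest-point initialisation is an assumption chosen to keep the example deterministic.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=100):
    """Generic k-means on a list of tuples (illustrative sketch)."""
    # Farthest-point initialisation: start from the first point, then
    # repeatedly add the point farthest from all centers chosen so far.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        )
    labels = []
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center.
        labels = [
            min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points
        ]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return labels, centers


# Hypothetical low-dimensional scores for six samples (not real data).
scores = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centers = kmeans(scores, k=2)
print(labels)
```

In practice the number of clusters and the input coordinates would come from the preprocessing and dimensionality reduction described in the abstract.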
Frontal–Occipital Connectivity During Visual Search
Pantazatos, Spiro P.; Yanagihara, Ted K.; Zhang, Xian; Meitzler, Thomas
2012-01-01
Abstract Although expectation- and attention-related interactions between ventral and medial prefrontal cortex and stimulus category-selective visual regions have been identified during visual detection and discrimination, it is not known if similar neural mechanisms apply to other tasks such as visual search. The current work tested the hypothesis that high-level frontal regions, previously implicated in expectation and visual imagery of object categories, interact with visual regions associated with object recognition during visual search. Using functional magnetic resonance imaging, subjects searched for a specific object that varied in size and location within a complex natural scene. A model-free, spatial-independent component analysis isolated multiple task-related components, one of which included visual cortex, as well as a cluster within ventromedial prefrontal cortex (vmPFC), consistent with the engagement of both top-down and bottom-up processes. Analyses of psychophysiological interactions showed increased functional connectivity between vmPFC and object-sensitive lateral occipital cortex (LOC), and results from dynamic causal modeling and Bayesian Model Selection suggested bidirectional connections between vmPFC and LOC that were positively modulated by the task. Using image-guided diffusion-tensor imaging, functionally seeded, probabilistic white-matter tracts between vmPFC and LOC, which presumably underlie this effective interconnectivity, were also observed. These connectivity findings extend previous models of visual search processes to include specific frontal–occipital neuronal interactions during a natural and complex search task. PMID:22708993
A Study on Regional Frequency Analysis using Artificial Neural Network - the Sumjin River Basin
NASA Astrophysics Data System (ADS)
Jeong, C.; Ahn, J.; Ahn, H.; Heo, J. H.
2017-12-01
Regional frequency analysis uses the regional concept to compensate for a shortcoming of at-site frequency analysis, namely a lack of sample size. Regional rainfall quantiles depend on the identification of hydrologically homogeneous regions; hence, regional classification based on the assumption of hydrological homogeneity is very important. For regional clustering of rainfall, multidimensional variables and factors related to geographical features and meteorological characteristics are considered, such as mean annual precipitation, number of days with precipitation in a year, and average maximum daily precipitation in a month. The Self-Organizing Feature Map method, an unsupervised artificial neural network algorithm, handles N-dimensional and nonlinear problems and presents its results simply, as a data visualization technique. In this study, for the Sumjin river basin in South Korea, cluster analysis was performed with the SOM method using high-dimensional geographical features and meteorological factors as input data. Then, to evaluate the homogeneity of the resulting regions, the L-moment based discordancy and heterogeneity measures were used. Rainfall quantiles were estimated with the index flood method, which is one of the regional rainfall frequency analysis methods. The clustering obtained with the SOM method and the consequent variation in rainfall quantiles were analyzed. This research was supported by a grant (2017-MPSS31-001) from the Supporting Technology Development Program for Disaster Management funded by the Ministry of Public Safety and Security (MPSS) of the Korean government.
Electrofacies analysis for coal lithotype profiling based on high-resolution wireline log data
NASA Astrophysics Data System (ADS)
Roslin, A.; Esterle, J. S.
2016-06-01
The traditional approach to coal lithotype analysis is based on a visual characterisation of coal in core, mine or outcrop exposures. As not all wells are fully cored, the petroleum and coal mining industries increasingly use geophysical wireline logs for lithology interpretation. This study demonstrates a method for interpreting coal lithotypes from geophysical wireline logs, and in particular discriminating between bright or banded, and dull coal at similar densities to a decimetre level. The study explores the optimum combination of geophysical log suites for training the coal electrofacies interpretation, using neural network conception, and then propagating the results to wells with fewer wireline data. This approach is objective and has a recordable reproducibility and rule set. In addition to conventional gamma ray and density logs, laterolog resistivity, microresistivity and PEF data were used in the study. Array resistivity data from a compact micro imager (CMI tool) were processed into a single microresistivity curve and integrated with the conventional resistivity data in the cluster analysis. Microresistivity data were included in the analysis to test the hypothesis that the improved vertical resolution of the microresistivity curve can enhance the accuracy of the clustering analysis. The addition of the PEF log allowed discrimination between low density bright to banded coal electrofacies and low density inertinite-rich dull electrofacies. The results of the clustering analysis were validated statistically, and the electrofacies results were compared to manually derived coal lithotype logs.
GATE: software for the analysis and visualization of high-dimensional time series expression data.
MacArthur, Ben D; Lachmann, Alexander; Lemischka, Ihor R; Ma'ayan, Avi
2010-01-01
We present Grid Analysis of Time series Expression (GATE), an integrated computational software platform for the analysis and visualization of high-dimensional biomolecular time series. GATE uses a correlation-based clustering algorithm to arrange molecular time series on a two-dimensional hexagonal array and dynamically colors individual hexagons according to the expression level of the molecular component to which they are assigned, to create animated movies of systems-level molecular regulatory dynamics. In order to infer potential regulatory control mechanisms from patterns of correlation, GATE also allows interactive interrogation of movies against a wide variety of prior knowledge datasets. GATE movies can be paused and are interactive, allowing users to reconstruct networks and perform functional enrichment analyses. Movies created with GATE can be saved in Flash format and can be inserted directly into PDF manuscript files as interactive figures. GATE is available for download and is free for academic use from http://amp.pharm.mssm.edu/maayan-lab/gate.htm
High Performance Molecular Visualization: In-Situ and Parallel Rendering with EGL.
Stone, John E; Messmer, Peter; Sisneros, Robert; Schulten, Klaus
2016-05-01
Large scale molecular dynamics simulations produce terabytes of data that is impractical to transfer to remote facilities. It is therefore necessary to perform visualization tasks in-situ as the data are generated, or by running interactive remote visualization sessions and batch analyses co-located with direct access to high performance storage systems. A significant challenge for deploying visualization software within clouds, clusters, and supercomputers involves the operating system software required to initialize and manage graphics acceleration hardware. Recently, it has become possible for applications to use the Embedded-system Graphics Library (EGL) to eliminate the requirement for windowing system software on compute nodes, thereby eliminating a significant obstacle to broader use of high performance visualization applications. We outline the potential benefits of this approach in the context of visualization applications used in the cloud, on commodity clusters, and supercomputers. We discuss the implementation of EGL support in VMD, a widely used molecular visualization application, and we outline benefits of the approach for molecular visualization tasks on petascale computers, clouds, and remote visualization servers. We then provide a brief evaluation of the use of EGL in VMD, with tests using developmental graphics drivers on conventional workstations and on Amazon EC2 G2 GPU-accelerated cloud instance types. We expect that the techniques described here will be of broad benefit to many other visualization applications.
High Performance Molecular Visualization: In-Situ and Parallel Rendering with EGL
Stone, John E.; Messmer, Peter; Sisneros, Robert; Schulten, Klaus
2016-01-01
Large scale molecular dynamics simulations produce terabytes of data that is impractical to transfer to remote facilities. It is therefore necessary to perform visualization tasks in-situ as the data are generated, or by running interactive remote visualization sessions and batch analyses co-located with direct access to high performance storage systems. A significant challenge for deploying visualization software within clouds, clusters, and supercomputers involves the operating system software required to initialize and manage graphics acceleration hardware. Recently, it has become possible for applications to use the Embedded-system Graphics Library (EGL) to eliminate the requirement for windowing system software on compute nodes, thereby eliminating a significant obstacle to broader use of high performance visualization applications. We outline the potential benefits of this approach in the context of visualization applications used in the cloud, on commodity clusters, and supercomputers. We discuss the implementation of EGL support in VMD, a widely used molecular visualization application, and we outline benefits of the approach for molecular visualization tasks on petascale computers, clouds, and remote visualization servers. We then provide a brief evaluation of the use of EGL in VMD, with tests using developmental graphics drivers on conventional workstations and on Amazon EC2 G2 GPU-accelerated cloud instance types. We expect that the techniques described here will be of broad benefit to many other visualization applications. PMID:27747137
Yousefi, Siamak; Balasubramanian, Madhusudhanan; Goldbaum, Michael H; Medeiros, Felipe A; Zangwill, Linda M; Weinreb, Robert N; Liebmann, Jeffrey M; Girkin, Christopher A; Bowd, Christopher
2016-05-01
To validate Gaussian mixture-model with expectation maximization (GEM) and variational Bayesian independent component analysis mixture-models (VIM) for detecting glaucomatous progression along visual field (VF) defect patterns (GEM-progression of patterns (POP) and VIM-POP). To compare GEM-POP and VIM-POP with other methods. GEM and VIM models separated cross-sectional abnormal VFs from 859 eyes and normal VFs from 1117 eyes into abnormal and normal clusters. Clusters were decomposed into independent axes. The confidence limit (CL) of stability was established for each axis with a set of 84 stable eyes. Sensitivity for detecting progression was assessed in a sample of 83 eyes with known progressive glaucomatous optic neuropathy (PGON). Eyes were classified as progressed if any defect pattern progressed beyond the CL of stability. Performance of GEM-POP and VIM-POP was compared to point-wise linear regression (PLR), permutation analysis of PLR (PoPLR), and linear regression (LR) of mean deviation (MD), and visual field index (VFI). Sensitivity and specificity for detecting glaucomatous VFs were 89.9% and 93.8%, respectively, for GEM and 93.0% and 97.0%, respectively, for VIM. Receiver operating characteristic (ROC) curve areas for classifying progressed eyes were 0.82 for VIM-POP, 0.86 for GEM-POP, 0.81 for PoPLR, 0.69 for LR of MD, and 0.76 for LR of VFI. GEM-POP was significantly more sensitive to PGON than PoPLR and linear regression of MD and VFI in our sample, while providing localized progression information. Detection of glaucomatous progression can be improved by assessing longitudinal changes in localized patterns of glaucomatous defect identified by unsupervised machine learning.
Safety of Spectacles for Children's Vision: A Cluster-Randomized Controlled Trial.
Ma, Xiaochen; Congdon, Nathan; Yi, Hongmei; Zhou, Zhongqiang; Pang, Xiaopeng; Meltzer, Mirjam E; Shi, Yaojiang; He, Mingguang; Liu, Yizhi; Rozelle, Scott
2015-11-01
To study safety of children's glasses in rural China, where fear that glasses harm vision is an important barrier for families and policy makers. Exploratory analysis from a cluster-randomized, investigator-masked, controlled trial. Among primary schools (n = 252) in western China, children were randomized by school to 1 of 3 interventions: free glasses provided in class, vouchers for free glasses at a local facility, or glasses prescriptions only (Control group). The main outcome of this analysis is uncorrected visual acuity after 8 months, adjusted for baseline acuity. Among 19 934 children randomly selected for screening, 5852 myopic (spherical equivalent refractive error ≤-0.5 diopters) eyes of 3001 children (14.7%, mean age 10.5 years) had VA ≤6/12 without glasses correctable to >6/12 with glasses, and were eligible. Among these, 1903 (32.5%), 1798 (30.7%), and 2151 (36.8%) were randomized to Control, Voucher, and Free Glasses, respectively. Intention-to-treat analyses were performed on all 1831 (96.2%), 1699 (94.5%), and 2007 (93.3%) eyes of children with follow-up in Control, Voucher, and Free Glasses groups. Final visual acuity for eyes of children in the treatment groups (Free Glasses and Voucher) was significantly better than for Control children, adjusting only for baseline visual acuity (difference of 0.023 logMAR units [0.23 vision chart lines, 95% CI: 0.03, 0.43]) or for other baseline factors as well (0.025 logMAR units [0.25 lines, 95% CI 0.04, 0.45]). We found no evidence that spectacles promote decline in uncorrected vision with aging among children. Copyright © 2015 Elsevier Inc. All rights reserved.
Clustering of Synoptic Pattern over the Korean Peninsula from Meteorological Models
NASA Astrophysics Data System (ADS)
Kim, Jinah; Heo, Kiyoung; Choi, Jungwoon; Jung, Sanghoon
2017-04-01
Numerical modeling data in meteorology and ocean science are one example of big geographic data sources. The properties of such data, including volume, variety, and dynamic aspects, pose new challenges for geographic visualization and for visual geoanalytics based on big data analysis with machine learning methods. A combination of algorithmic and visual approaches that make sense of large volumes of various types of spatiotemporal data is required to gain knowledge about complex phenomena. The east coast of Korea suffers property damage and human casualties due to abnormal high waves (swell-like high waves). Previous research attributed these waves to local meteorological conditions over the East Sea of the Korean Peninsula and proposed three kinds of pressure patterns that generate them. However, those three patterns cannot describe all pressure patterns that generate abnormal high waves. In our study, we propose an unsupervised machine learning method for pattern clustering and apply it to classify the patterns under which abnormal high waves occurred, using reanalysis data from a numerical meteorological model from 2000 to 2015 and historical records of accidents caused by abnormal high waves. About 25,000 patterns of the spatial distribution of sea surface pressure are clustered into 30 patterns, which are then classified into seasonal sea level pressure patterns based on the meteorological characteristics of the Korean Peninsula. Moreover, to determine the representative patterns under which abnormal high waves occur, we classified the winter season pressure patterns again using historical accident cases. In this work, we clustered synoptic patterns over the Korean Peninsula in meteorological modeling reanalysis data, and we could understand seasonal variation by identifying the occurrence of the clustered synoptic patterns.
In future work, we will identify the relationship with wave modeling data for a better understanding of abnormal high waves, and we will develop a pattern decision system to predict abnormal high waves in advance. This research was a part of the projects titled "Development of Korea Operational Oceanographic System (KOOS), Phase 2" and "Investigation of Large Swell Waves and Rip currents and Development of The Disaster Response System," funded by the Ministry of Oceans & Fisheries Korea (Grant PM59691 and PM59240).
Zhang, Xiaohua Douglas; Yang, Xiting Cindy; Chung, Namjin; Gates, Adam; Stec, Erica; Kunapuli, Priya; Holder, Dan J; Ferrer, Marc; Espeseth, Amy S
2006-04-01
RNA interference (RNAi) high-throughput screening (HTS) experiments carried out using large (>5000 short interfering [si]RNA) libraries generate a huge amount of data. In order to use these data to identify the most effective siRNAs tested, it is critical to adopt and develop appropriate statistical methods. To address the questions in hit selection of RNAi HTS, we proposed a quartile-based method which is robust to outliers, true hits and nonsymmetrical data. We compared it with the more traditional tests, mean +/- k standard deviation (SD) and median +/- 3 median of absolute deviation (MAD). The results suggested that the quartile-based method selected more hits than mean +/- k SD under the same preset error rate. The number of hits selected by median +/- k MAD was close to that by the quartile-based method. Further analysis suggested that the quartile-based method had the greatest power in detecting true hits, especially weak or moderate true hits. Our investigation also suggested that platewise analysis (determining effective siRNAs on a plate-by-plate basis) can adjust for systematic errors in different plates, while an experimentwise analysis, in which effective siRNAs are identified in an analysis of the entire experiment, cannot. However, experimentwise analysis may detect a cluster of true positive hits placed together in one or several plates, while platewise analysis may not. To display hit selection results, we designed a specific figure called a plate-well series plot. We thus suggest the following strategy for hit selection in RNAi HTS experiments. First, choose the quartile-based method, or median +/- k MAD, for identifying effective siRNAs. Second, perform the chosen method experimentwise on transformed/normalized data, such as percentage inhibition, to check the possibility of hit clusters. If a cluster of selected hits are observed, repeat the analysis based on untransformed data to determine whether the cluster is due to an artifact in the data. 
If no clusters of hits are observed, select hits by performing platewise analysis on transformed data. Third, adopt the plate-well series plot to visualize both the data and the hit selection results, as well as to check for artifacts.
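The median +/- k MAD rule and the platewise strategy described in this abstract can be illustrated as follows. This is a minimal sketch under stated assumptions: the plate names and well values are invented, `k = 3` is an arbitrary choice, the raw (unscaled) MAD is used, and the paper's quartile-based method is not reproduced because the abstract does not give its exact form.

```python
import statistics


def mad_hits(values, k=3.0):
    """Return indices of values outside median +/- k * MAD (illustrative)."""
    med = statistics.median(values)
    # MAD: median of the absolute deviations from the median.
    mad = statistics.median([abs(v - med) for v in values])
    lower, upper = med - k * mad, med + k * mad
    return [i for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical normalized readouts for two plates (not real screening data).
plates = {
    "plate1": [1.0, 2.0, 2.0, 3.0, 3.0, 3.0, 4.0, 4.0, 5.0, 30.0],
    "plate2": [10.0, 11.0, 12.0, 13.0, 14.0, -20.0],
}

# Platewise analysis: thresholds are computed separately for each plate,
# so a plate-level shift in signal does not mask or create hits.
platewise_hits = {name: mad_hits(wells) for name, wells in plates.items()}
print(platewise_hits)
```

An experimentwise analysis would instead pool all wells before computing the median and MAD, which is the variant the abstract suggests for checking whether hits cluster on particular plates.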
ERIC Educational Resources Information Center
Williams, Carrick C.; Pollatsek, Alexander; Cave, Kyle R.; Stroud, Michael J.
2009-01-01
In 2 experiments, eye movements were examined during searches in which elements were grouped into four 9-item clusters. The target (a red or blue "T") was known in advance, and each cluster contained different numbers of target-color elements. Rather than color composition of a cluster invariantly guiding the order of search though…
Neuroimaging Evidence of a Bilateral Representation for Visually Presented Numbers.
Grotheer, Mareike; Herrmann, Karl-Heinz; Kovács, Gyula
2016-01-06
The clustered architecture of the brain for different visual stimulus categories is one of the most fascinating topics in the cognitive neurosciences. Interestingly, recent research suggests the existence of additional regions for newly acquired stimuli such as letters (letter form area; LFA; Thesen et al., 2012) and numbers (visual number form area; NFA; Shum et al., 2013). However, neuroimaging methods thus far have failed to visualize the NFA in healthy participants, likely due to fMRI signal dropout caused by the air/bone interface of the petrous bone (Shum et al., 2013). In the current study, we combined a 64-channel head coil with high spatial resolution, localized shimming, and liberal smoothing, thereby decreasing the signal dropout and increasing the temporal signal-to-noise ratio in the neighborhood of the NFA. We presented subjects with numbers, letters, false numbers, false letters, objects and their Fourier randomized versions. A group analysis showed significant activations in the inferior temporal gyrus at the previously proposed location of the NFA. Crucially, we found the NFA to be present in both hemispheres. Further, we could identify the NFA on the single-subject level in most of our participants. A detailed analysis of the response profile of the NFA in two separate experiments confirmed the whole-brain results since responses to numbers were significantly higher than to any other presented stimulus in both hemispheres. Our results show for the first time the existence and stimulus selectivity of the NFA in the healthy human brain. This fMRI study shows for the first time a cluster of neurons selective for visually presented numbers in healthy human adults. This visual number form area (NFA) was found in both hemispheres. Crucially, numbers have gained importance for humans too recently for neuronal specialization to be established by evolution. 
Therefore, investigations of this region will greatly advance our understanding of learning and plasticity in the brain. In addition, these results will aid our knowledge regarding related neurological illnesses (e.g., dyscalculia). To overcome the fMRI signal dropout in the neighborhood of the NFA, we combined high spatial resolution with liberal smoothing. We believe that this approach will be useful to the broad neuroimaging community. Copyright © 2016 the authors 0270-6474/16/360088-10$15.00/0.
Qutaish, Mohammed Q.; Sullivant, Kristin E.; Burden-Gulley, Susan M.; Lu, Hong; Roy, Debashish; Wang, Jing; Basilion, James P.; Brady-Kalnay, Susann M.; Wilson, David L.
2012-01-01
Purpose The goals of this study were to create cryo-imaging methods to quantify characteristics (size, dispersal, and blood vessel density) of mouse orthotopic models of glioblastoma multiforme (GBM) and to enable studies of tumor biology, targeted imaging agents, and theranostic nanoparticles. Procedures Green fluorescent protein-labeled, human glioma LN-229 cells were implanted into mouse brain. At 20–38 days, cryo-imaging gave whole brain, 4-GB, 3D microscopic images of bright field anatomy, including vasculature, and fluorescent tumor. Image analysis/visualization methods were developed. Results Vessel visualization and segmentation methods successfully enabled analyses. The main tumor mass volume, the number of dispersed clusters, the number of cells/cluster, and the percent dispersed volume all increase with age of the tumor. Histograms of dispersal distance give a mean and median of 63 and 56 μm, respectively, averaged over all brains. Dispersal distance tends to increase with age of the tumors. Dispersal tends to occur along blood vessels. Blood vessel density did not appear to increase in and around the tumor with this cell line. Conclusion Cryo-imaging and software allow, for the first time, 3D, whole brain, microscopic characterization of a tumor from a particular cell line. LN-229 exhibits considerable dispersal along blood vessels, a characteristic of human tumors that limits treatment success. PMID:22125093
StrAuto: automation and parallelization of STRUCTURE analysis.
Chhatre, Vikram E; Emerson, Kevin J
2017-03-24
Population structure inference using the software STRUCTURE has become an integral part of population genetic studies covering a broad spectrum of taxa including humans. The ever-expanding size of genetic data sets poses computational challenges for this analysis. Although at least one tool currently implements parallel computing to reduce computational overload of this analysis, it does not fully automate the use of replicate STRUCTURE analysis runs required for downstream inference of optimal K. There is a pressing need for a tool that can deploy population structure analysis on high performance computing clusters. We present an updated version of the popular Python program StrAuto, to streamline population structure analysis using parallel computing. StrAuto implements a pipeline that combines STRUCTURE analysis with the Evanno ΔK analysis and visualization of results using STRUCTURE HARVESTER. Using benchmarking tests, we demonstrate that StrAuto significantly reduces the computational time needed to perform iterative STRUCTURE analysis by distributing runs over two or more processors. StrAuto is the first tool to integrate STRUCTURE analysis with post-processing using a pipeline approach in addition to implementing parallel computation - a set up ideal for deployment on computing clusters. StrAuto is distributed under the GNU GPL (General Public License) and available to download from http://strauto.popgen.org.
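To show why replicate runs at each K are needed, here is a minimal sketch of one common way to compute the Evanno ΔK statistic from replicate log-likelihoods: the absolute second difference of the mean log-likelihood across successive K, divided by the standard deviation of the replicates at K. The log-likelihood values are invented, the use of the sample standard deviation is an assumption, and this is not code from StrAuto or STRUCTURE HARVESTER.

```python
import statistics


def evanno_delta_k(lnp_by_k):
    """Compute Delta K for every K that has both neighbours K-1 and K+1.

    lnp_by_k maps K to a list of replicate log-likelihoods (illustrative).
    """
    means = {k: statistics.mean(v) for k, v in lnp_by_k.items()}
    sds = {k: statistics.stdev(v) for k, v in lnp_by_k.items()}
    delta_k = {}
    for k in sorted(lnp_by_k):
        if k - 1 in means and k + 1 in means and sds[k] > 0:
            # Absolute second-order rate of change of the mean log-likelihood.
            second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
            delta_k[k] = second_diff / sds[k]
    return delta_k


# Hypothetical replicate log-likelihoods for K = 1..4 (three runs each).
lnp = {
    1: [-1001.0, -1000.0, -999.0],
    2: [-801.0, -800.0, -799.0],
    3: [-781.0, -780.0, -779.0],
    4: [-771.0, -770.0, -769.0],
}
delta_k = evanno_delta_k(lnp)
best_k = max(delta_k, key=delta_k.get)
print(delta_k, "best K:", best_k)
```

ΔK is undefined for the smallest and largest K tested, and for any K whose replicates have zero variance, which is why the sketch skips those cases.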
Lin, Sheng-Hsiang; Liu, Chih-Min; Liu, Yu-Li; Fann, Cathy Shen-Jang; Hsiao, Po-Chang; Wu, Jer-Yuarn; Hung, Shuen-Iu; Chen, Chun-Houh; Wu, Han-Ming; Jou, Yuh-Shan; Liu, Shi K.; Hwang, Tzung J.; Hsieh, Ming H.; Chang, Chien-Ching; Yang, Wei-Chih; Lin, Jin-Jia; Chou, Frank Huang-Chih; Faraone, Stephen V.; Tsuang, Ming T.; Hwu, Hai-Gwo; Chen, Wei J.
2009-01-01
Chromosome 6p is one of the most commonly implicated regions in the genome-wide linkage scans of schizophrenia, whereas further association studies for markers in this region were inconsistent likely due to heterogeneity. This study aimed to identify more homogeneous subgroups of families for fine mapping on regions around markers D6S296 and D6S309 (both in 6p24.3) as well as D6S274 (in 6p22.3) by means of similarity in neurocognitive functioning. A total of 160 families of patients with schizophrenia comprising at least two affected siblings who had data for 8 neurocognitive test variables of the Continuous Performance Test (CPT) and the Wisconsin Card Sorting Test (WCST) were subjected to cluster analysis with data visualization using the test scores of both affected siblings. Family clusters derived were then used separately in family-based association tests for 64 single nucleotide polymorphisms covering the region of 6p24.3 and 6p22.3. Three clusters were derived from the family-based clustering, with deficit cluster 1 representing deficit on the CPT, deficit cluster 2 representing deficit on both the CPT and the WCST, and a third cluster of non-deficit. After adjustment using false discovery rate for multiple testing, SNP rs13873 and haplotype rs1225934-rs13873 on BMP6-TXNDC5 genes were significantly associated with schizophrenia for the deficit cluster 1 but not for the deficit cluster 2 or non-deficit cluster. Our results provide further evidence that the BMP6-TXNDC5 locus on 6p24.3 may play a role in the selective impairments on sustained attention of schizophrenia. PMID:19694819
DiCarlo, James J.; Zecchina, Riccardo; Zoccolan, Davide
2013-01-01
The anterior inferotemporal cortex (IT) is the highest stage along the hierarchy of visual areas that, in primates, processes visual objects. Although several lines of evidence suggest that IT primarily represents visual shape information, some recent studies have argued that neuronal ensembles in IT code the semantic membership of visual objects (i.e., represent conceptual classes such as animate and inanimate objects). In this study, we investigated to what extent semantic, rather than purely visual information, is represented in IT by performing a multivariate analysis of IT responses to a set of visual objects. By relying on a variety of machine-learning approaches (including a cutting-edge clustering algorithm that has been recently developed in the domain of statistical physics), we found that, in most instances, IT representation of visual objects is accounted for by their similarity at the level of shape or, more surprisingly, low-level visual properties. Only in a few cases we observed IT representations of semantic classes that were not explainable by the visual similarity of their members. Overall, these findings reassert the primary function of IT as a conveyor of explicit visual shape information, and reveal that low-level visual properties are represented in IT to a greater extent than previously appreciated. In addition, our work demonstrates how combining a variety of state-of-the-art multivariate approaches, and carefully estimating the contribution of shape similarity to the representation of object categories, can substantially advance our understanding of neuronal coding of visual objects in cortex. PMID:23950700
A method to detect progression of glaucoma using the multifocal visual evoked potential technique
Wangsupadilok, Boonchai; Kanadani, Fabio N.; Grippo, Tomas M.; Liebmann, Jeffrey M.; Ritch, Robert; Hood, Donald C.
2010-01-01
Purpose To describe a method for monitoring progression of glaucoma using the multifocal visual evoked potential (mfVEP) technique. Methods Eighty-seven patients diagnosed with open-angle glaucoma were divided into two groups. Group I, comprised 43 patients who had a repeat mfVEP test within 50 days (mean 0.9 ± 0.5 months), and group II, 44 patients who had a repeat test after at least 6 months (mean 20.7 ± 9.7 months). Monocular mfVEPs were obtained using a 60-sector pattern reversal dartboard display. Monocular and interocular analyses were performed. Data from the two visits were compared. The total number of abnormal test points with P < 5% within the visual field (total scores) and number of abnormal test points within a cluster (cluster size) were calculated. Data for group I provided a measure of test–retest variability independent of disease progression. Data for group II provided a possible measure of progression. Results The difference in the total scores for group II between visit 1 and visit 2 for the interocular and monocular comparison was significant (P < 0.05) as was the difference in cluster size for the interocular comparison (P < 0.05). Group I did not show a significant change in either total score or cluster size. Conclusion The change in the total score and cluster size over time provides a possible method for assessing progression of glaucoma with the mfVEP technique. PMID:18830654
CoryneRegNet 4.0 – A reference database for corynebacterial gene regulatory networks
Baumbach, Jan
2007-01-01
Background Detailed information on DNA-binding transcription factors (the key players in the regulation of gene expression) and on transcriptional regulatory interactions of microorganisms deduced from literature-derived knowledge, computer predictions and global DNA microarray hybridization experiments, has opened the way for the genome-wide analysis of transcriptional regulatory networks. The large-scale reconstruction of these networks allows the in silico analysis of cell behavior in response to changing environmental conditions. We previously published CoryneRegNet, an ontology-based data warehouse of corynebacterial transcription factors and regulatory networks. Initially, it was designed to provide methods for the analysis and visualization of the gene regulatory network of Corynebacterium glutamicum. Results Now we introduce CoryneRegNet release 4.0, which integrates data on the gene regulatory networks of 4 corynebacteria, 2 mycobacteria and the model organism Escherichia coli K12. As in previous versions, CoryneRegNet provides a web-based user interface to access the database content, to allow various queries, and to support the reconstruction, analysis and visualization of regulatory networks at different hierarchical levels. In this article, we present the further improved database content of CoryneRegNet along with novel analysis features. The network visualization feature GraphVis now allows the inter-species comparison of reconstructed gene regulatory networks and the projection of gene expression levels onto those networks. Therefore, we added stimulon data directly into the database, but also provide Web Service access to the DNA microarray analysis platform EMMA. Additionally, CoryneRegNet now provides a SOAP based Web Service server, which can easily be consumed by other bioinformatics software systems.
Stimulons (imported from the database, or uploaded by the user) can be analyzed in the context of known transcriptional regulatory networks to predict putative contradictions or further gene regulatory interactions. Furthermore, it integrates protein clusters by means of heuristically solving the weighted graph cluster editing problem. In addition, it provides Web Service based access to up to date gene annotation data from GenDB. Conclusion The release 4.0 of CoryneRegNet is a comprehensive system for the integrated analysis of procaryotic gene regulatory networks. It is a versatile systems biology platform to support the efficient and large-scale analysis of transcriptional regulation of gene expression in microorganisms. It is publicly available at . PMID:17986320
Storyline Visualizations of Eye Tracking of Movie Viewing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Balint, John T.; Arendt, Dustin L.; Blaha, Leslie M.
Storyline visualizations offer an approach that promises to capture the spatio-temporal characteristics of individual observers and simultaneously illustrate emerging group behaviors. We develop a visual analytics approach to parsing, aligning, and clustering fixation sequences from eye tracking data. Visualization of the results captures the similarities and differences across a group of observers performing a common task. We apply our storyline approach to visualize gaze patterns of people watching dynamic movie clips. Storylines mitigate some of the shortcomings of existent spatio-temporal visualization techniques and, importantly, continue to highlight individual observer behavioral dynamics.
COGNAT: a web server for comparative analysis of genomic neighborhoods.
Klimchuk, Olesya I; Konovalov, Kirill A; Perekhvatov, Vadim V; Skulachev, Konstantin V; Dibrova, Daria V; Mulkidjanian, Armen Y
2017-11-22
In prokaryotic genomes, functionally coupled genes can be organized in conserved gene clusters enabling their coordinated regulation. Such clusters could contain one or several operons, which are groups of co-transcribed genes. Those genes that evolved from a common ancestral gene by speciation (i.e. orthologs) are expected to have similar genomic neighborhoods in different organisms, whereas those copies of the gene that are responsible for dissimilar functions (i.e. paralogs) could be found in dissimilar genomic contexts. Comparative analysis of genomic neighborhoods facilitates the prediction of co-regulated genes and helps to discern different functions in large protein families. We intended, building on the attribution of gene sequences to the clusters of orthologous groups of proteins (COGs), to provide a method for visualization and comparative analysis of genomic neighborhoods of evolutionary related genes, as well as a respective web server. Here we introduce the COmparative Gene Neighborhoods Analysis Tool (COGNAT), a web server for comparative analysis of genomic neighborhoods. The tool is based on the COG database, as well as the Pfam protein families database. As an example, we show the utility of COGNAT in identifying a new type of membrane protein complex that is formed by paralog(s) of one of the membrane subunits of the NADH:quinone oxidoreductase of type 1 (COG1009) and a cytoplasmic protein of unknown function (COG3002). This article was reviewed by Drs. Igor Zhulin, Uri Gophna and Igor Rogozin.
A Web-Based Platform for Visualizing Spatiotemporal Dynamics of Big Taxi Data
NASA Astrophysics Data System (ADS)
Xiong, H.; Chen, L.; Gui, Z.
2017-09-01
With more and more vehicles equipped with the Global Positioning System (GPS), access to large-scale taxi trajectory data has become increasingly easy. Taxis are valuable sensors, and information associated with taxi trajectories can provide unprecedented insight into many aspects of city life. But analysing these data presents many challenges. Visualization of taxi data is an efficient way to represent its distributions and structures and to reveal hidden patterns in the data. However, most of the existing visualization systems have some shortcomings. On the one hand, the passenger loading status and speed information cannot be expressed. On the other hand, a single visualization form limits the information presentation. In view of these problems, this paper designs and implements a visualization system in which we use colour and shape to indicate passenger loading status and speed information and integrate various forms of taxi visualization. The main work is as follows: 1. Pre-processing the taxi data and storing it in a MongoDB database. 2. Visualization of hotspots for taxi pickup points. Using the DBSCAN clustering algorithm, we cluster the extracted taxi passenger pickup locations to produce passenger hotspots. 3. Visualizing the dynamics of taxi trajectories using interactive animation. We use a thinning algorithm to reduce the amount of data and design a preloading strategy to load the data smoothly. Colour and shape are used to visualize the taxi trajectory data.
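The hotspot step in item 2 can be illustrated with a minimal DBSCAN sketch. The abstract names the algorithm but gives no parameters, so the `eps` and `min_pts` values, the function name `dbscan`, and the toy coordinates below are all assumptions; the points are treated as planar (already projected), which the paper may or may not do.

```python
from math import hypot


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    unvisited, noise = None, -1
    labels = [unvisited] * len(points)

    def neighbors(i):
        # Indices of all points within eps of point i (includes i itself).
        xi, yi = points[i]
        return [j for j, (xj, yj) in enumerate(points)
                if hypot(xi - xj, yi - yj) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not unvisited:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = noise          # may later become a border point
            continue
        cluster += 1                   # point i is a core point: new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == noise:
                labels[j] = cluster    # noise reachable from a core point
            if labels[j] is not unvisited:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # j is also core: keep expanding
    return labels


# Hypothetical pickup locations: two dense groups and one isolated pickup.
pickups = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
           (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),
           (9.0, 0.0)]
hotspot_labels = dbscan(pickups, eps=0.2, min_pts=3)
```

Each non-negative label corresponds to one candidate hotspot; the isolated pickup is labelled as noise and would not be drawn as a hotspot.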
NASA Technical Reports Server (NTRS)
Heinemann, K.; Poppa, H.
1975-01-01
Direct evidence is reported for the simultaneous occurrence of Ostwald ripening and short-distance cluster mobility during annealing of discontinuous metal films on clean amorphous substrates. The annealing characteristics of very thin particulate deposits of silver on amorphized clean surfaces of single crystalline thin graphite substrates were studied by in-situ transmission electron microscopy (TEM) under controlled environmental conditions (residual gas pressure of 10 to the minus 9th power torr) in the temperature range from 25 to 450 C. Sputter cleaning of the substrate surface, metal deposition, and annealing were monitored by TEM observation. Pseudostereographic presentation of micrographs in different annealing stages, the observation of the annealing behavior at cast shadow edges, and measurements with an electronic image analyzing system were employed to aid the visual perception and the analysis of changes in deposit structure recorded during annealing. Slow Ostwald ripening was found to occur in the entire temperature range, but the overriding surface transport mechanism was short-distance cluster mobility.
Saliency detection algorithm based on LSC-RC
NASA Astrophysics Data System (ADS)
Wu, Wei; Tian, Weiye; Wang, Ding; Luo, Xin; Wu, Yingfei; Zhang, Yu
2018-02-01
The salient region of an image is its most important region, the one that attracts human visual attention and response. Preferentially allocating computing resources for image analysis and synthesis to the salient region is of great value for improving region detection. As a preprocessing step for other tasks in the image processing field, saliency detection is widely applied in image retrieval and image segmentation. Among these approaches, the super-pixel segmentation saliency detection algorithm based on linear spectral clustering (LSC) has achieved good results. The saliency detection algorithm proposed in this paper improves on the regional contrast (RC) method by replacing the region-formation step of the latter with super-pixel blocks produced by linear spectral clustering. After combination with the latest deep learning method, the accuracy of salient region detection improves considerably. Finally, the superiority and feasibility of the super-pixel segmentation detection algorithm based on linear spectral clustering are demonstrated by comparative tests.
The Effect of Longer-Term and Exclusive Breastfeeding Promotion on Visual Outcome in Adolescence
Owen, Christopher G.; Oken, Emily; Rudnicka, Alicja R.; Patel, Rita; Thompson, Jennifer; Rifas-Shiman, Sheryl L.; Vilchuck, Konstatin; Bogdanovich, Natalia; Hameza, Mikhail; Kramer, Michael S.; Martin, Richard M.
2018-01-01
Purpose Breastfeeding may influence early visual development. We examined whether an intervention to promote increased duration and exclusivity of breastfeeding improves visual outcomes at 16 years of age. Methods Follow-up of a cluster-randomized trial in 31 Belarusian maternity hospitals/polyclinics randomized to receive a breastfeeding promotion intervention, or usual care, where 46% vs. 3% were exclusively breastfed at 3 months respectively. Low vision in either eye was defined as unaided logMAR vision of 0.3 or worse (equivalent to Snellen 20/40) and was used as the primary outcome. Open-field autorefraction in a subset (n = 963) suggested that 84% of those with low vision were myopic. Primary analysis was based on modified intention-to-treat, accounting for clustering within hospitals/clinics. Observational analyses also examined the effect of breastfeeding duration and exclusivity, as well as other sociodemographic and environmental determinants of low vision. Results A total of 13,392 of 17,046 (79%) participants were followed up at 16 years. Low vision prevalence was 19.6% (95% confidence interval [CI]: 17.5, 22.0%) in the experimental group versus 21.6% (19.5, 23.8%) in the control group. Cluster-adjusted odds ratio (OR) of low vision associated with the intervention was 0.92 (95% CI: 0.73, 1.16); 0.88 (95% CI: 0.74, 1.05) after adjustment for parental and early life factors. In observational analyses, breastfeeding duration and exclusivity had no significant effect on low vision. However, maternal age at birth (OR: 1.13, 95% CI: 1.07, 1.14/5-year increase) and urban versus rural residence were associated with increased risk of low vision. Lower parental education and number of older siblings were associated with a lower risk of low vision; boys had lower risk compared with girls (0.64, 95% CI: 0.59, 0.70).
Conclusions Exclusive breastfeeding promotion had no significant effect on visual outcomes in this study, but other environmental factors showed strong associations. (ClinicalTrials.gov number, NCT01561612.) PMID:29860453
MASPECTRAS: a platform for management and analysis of proteomics LC-MS/MS data
Hartler, Jürgen; Thallinger, Gerhard G; Stocker, Gernot; Sturn, Alexander; Burkard, Thomas R; Körner, Erik; Rader, Robert; Schmidt, Andreas; Mechtler, Karl; Trajanoski, Zlatko
2007-01-01
Background Advances in proteomics technologies have led to a rapid increase in the number, size and rate at which datasets are generated. Managing and extracting valuable information from such datasets requires the use of data management platforms and computational approaches. Results We have developed the MAss SPECTRometry Analysis System (MASPECTRAS), a platform for management and analysis of proteomics LC-MS/MS data. MASPECTRAS is based on the Proteome Experimental Data Repository (PEDRo) relational database schema and follows the guidelines of the Proteomics Standards Initiative (PSI). Analysis modules include: 1) import and parsing of the results from the search engines SEQUEST, Mascot, Spectrum Mill, X! Tandem, and OMSSA; 2) peptide validation; 3) clustering of proteins based on Markov Clustering and multiple alignments; and 4) quantification using the Automated Statistical Analysis of Protein Abundance Ratios algorithm (ASAPRatio). The system provides customizable data retrieval and visualization tools, as well as export to PRoteomics IDEntifications public repository (PRIDE). MASPECTRAS is freely available at Conclusion Given the unique features and the flexibility due to the use of standard software technology, our platform represents a significant advance and could be of great interest to the proteomics community. PMID:17567892
Methods to estimate lightning activity using WWLLN and RS data
NASA Astrophysics Data System (ADS)
Baranovskiy, Nikolay V.; Belikova, Marina Yu.; Karanina, Svetlana Yu.; Karanin, Andrey V.; Glebova, Alena V.
2017-11-01
The aim of the work is to develop a comprehensive method for assessing thunderstorm activity using WWLLN and RS data. Lightning discharges need to be grouped in order to solve practical problems of lightning protection and lightning-caused forest fire danger, as well as climatology problems that use information on the spatial and temporal characteristics of thunderstorms. For grouping lightning discharges, it is proposed to use clustering algorithms. The region covering Timiryazevskiy forestry (Tomsk region, borders (55.93 - 56.86)x(83.94 - 85.07)) was selected for the computational experiment. We used the data on lightning discharges registered by the WWLLN network in this region on July 23, 2014. A total of 273 lightning discharges were sampled. The relatively small number of discharges allowed a visual analysis of the solutions obtained during clustering.
Detection and tracking of gas plumes in LWIR hyperspectral video sequence data
NASA Astrophysics Data System (ADS)
Gerhart, Torin; Sunu, Justin; Lieu, Lauren; Merkurjev, Ekaterina; Chang, Jen-Mei; Gilles, Jérôme; Bertozzi, Andrea L.
2013-05-01
Automated detection of chemical plumes presents a segmentation challenge. The segmentation problem for gas plumes is difficult due to the diffusive nature of the cloud. The advantage of considering hyperspectral images in the gas plume detection problem over the conventional RGB imagery is the presence of non-visual data, allowing for a richer representation of information. In this paper we present an effective method of visualizing hyperspectral video sequences containing chemical plumes and investigate the effectiveness of segmentation techniques on these post-processed videos. Our approach uses a combination of dimension reduction and histogram equalization to prepare the hyperspectral videos for segmentation. First, Principal Components Analysis (PCA) is used to reduce the dimension of the entire video sequence. This is done by projecting each pixel onto the first few Principal Components resulting in a type of spectral filter. Next, a Midway method for histogram equalization is used. These methods redistribute the intensity values in order to reduce flicker between frames. This properly prepares these high-dimensional video sequences for more traditional segmentation techniques. We compare the ability of various clustering techniques to properly segment the chemical plume. These include K-means, spectral clustering, and the Ginzburg-Landau functional.
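The PCA projection step described above can be sketched in a few lines. This is only an illustration under stated assumptions: each row stands for the spectrum of one pixel, only the first principal component is extracted, and it is found by power iteration on the covariance matrix; the function name `first_principal_component` and the toy spectra are hypothetical, and the paper's actual implementation is not described in the abstract.

```python
def first_principal_component(rows, iters=200):
    """Return (unit direction, per-row scores) for the leading component.

    rows: list of equal-length lists, one spectrum per pixel.
    Assumes the data are not all identical (non-zero variance).
    """
    n, d = len(rows), len(rows[0])
    mean = [sum(r[k] for r in rows) / n for k in range(d)]
    centered = [[r[k] - mean[k] for k in range(d)] for r in rows]
    # Sample covariance matrix of the spectral bands.
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    # Power iteration; the uneven start vector avoids the most obvious
    # cases of starting orthogonal to the leading eigenvector.
    v = [1.0 + k for k in range(d)]
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    # Projecting each centered spectrum onto v acts as a spectral filter.
    scores = [sum(c[k] * v[k] for k in range(d)) for c in centered]
    return v, scores


# Hypothetical 3-band spectra for four pixels; the third band is constant.
spectra = [[0.0, 0.0, 1.0], [1.0, 2.0, 1.0], [2.0, 4.0, 1.0], [3.0, 6.0, 1.0]]
direction, scores = first_principal_component(spectra)
```

The `scores` list is the one-channel image that a later segmentation step (for example K-means) would operate on; keeping the first few components instead of one would require deflation or a full eigendecomposition.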
Visual Reconciliation of Alternative Similarity Spaces in Climate Modeling.
Poco, Jorge; Dasgupta, Aritra; Wei, Yaxing; Hargrove, William; Schwalm, Christopher R; Huntzinger, Deborah N; Cook, Robert; Bertini, Enrico; Silva, Claudio T
2014-12-01
Visual data analysis often requires grouping of data objects based on their similarity. In many application domains researchers use algorithms and techniques like clustering and multidimensional scaling to extract groupings from data. While extracting these groups using a single similarity criterion is relatively straightforward, comparing alternative criteria poses additional challenges. In this paper we define visual reconciliation as the problem of reconciling multiple alternative similarity spaces through visualization and interaction. We derive this problem from our work on model comparison in climate science where climate modelers are faced with the challenge of making sense of alternative ways to describe their models: one through the output they generate, another through the large set of properties that describe them. Ideally, they want to understand whether groups of models with similar spatio-temporal behaviors share similar sets of criteria or, conversely, whether similar criteria lead to similar behaviors. We propose a visual analytics solution based on linked views that addresses this problem by allowing the user to dynamically create, modify and observe the interaction among groupings, thereby making the potential explanations apparent. We present case studies that demonstrate the usefulness of our technique in the area of climate science.
MemAxes: Visualization and Analytics for Characterizing Complex Memory Performance Behaviors.
Gimenez, Alfredo; Gamblin, Todd; Jusufi, Ilir; Bhatele, Abhinav; Schulz, Martin; Bremer, Peer-Timo; Hamann, Bernd
2018-07-01
Memory performance is often a major bottleneck for high-performance computing (HPC) applications. Deepening memory hierarchies, complex memory management, and non-uniform access times have made memory performance behavior difficult to characterize, and users require novel, sophisticated tools to analyze and optimize this aspect of their codes. Existing tools target only specific factors of memory performance, such as hardware layout, allocations, or access instructions. However, today's tools do not suffice to characterize the complex relationships between these factors. Further, they require advanced expertise to be used effectively. We present MemAxes, a tool based on a novel approach for analytic-driven visualization of memory performance data. MemAxes uniquely allows users to analyze the different aspects related to memory performance by providing multiple visual contexts for a centralized dataset. We define mappings of sampled memory access data to new and existing visual metaphors, each of which enables a user to perform different analysis tasks. We present methods to guide user interaction by scoring subsets of the data based on known performance problems. This scoring is used to provide visual cues and automatically extract clusters of interest. We designed MemAxes in collaboration with experts in HPC and demonstrate its effectiveness in case studies.
Clusters in irregular areas and lattices.
Wieczorek, William F; Delmerico, Alan M; Rogerson, Peter A; Wong, David W S
2012-01-01
Geographic areas of different sizes and shapes of polygons that represent counts or rate data are often encountered in social, economic, health, and other information. Often political or census boundaries are used to define these areas because the information is available only for those geographies. Therefore, these types of boundaries are frequently used to define neighborhoods in spatial analyses using geographic information systems and related approaches such as multilevel models. When point data can be geocoded, it is possible to examine the impact of polygon shape on spatial statistical properties, such as clustering. We utilized point data (alcohol outlets) to examine the issue of polygon shape and size on visualization and statistical properties. The point data were allocated to regular lattices (hexagons and squares) and census areas for zip-code tabulation areas and tracts. The number of units in the lattices was set to be similar to the number of tract and zip-code areas. A spatial clustering statistic and visualization were used to assess the impact of polygon shape for zip- and tract-sized units. Results showed substantial similarities and notable differences across shape and size. The specific circumstances of a spatial analysis that aggregates points to polygons will determine the size and shape of the areal units to be used. The irregular polygons of census units may reflect underlying characteristics that could be missed by large regular lattices. Future research to examine the potential for using a combination of irregular polygons and regular lattices would be useful.
A New MI-Based Visualization Aided Validation Index for Mining Big Longitudinal Web Trial Data
Zhang, Zhaoyang; Fang, Hua; Wang, Honggang
2016-01-01
Web-delivered clinical trials generate big complex data. To help untangle the heterogeneity of treatment effects, unsupervised learning methods have been widely applied. However, identifying valid patterns is a priority but challenging issue for these methods. This paper, built upon our previous research on multiple imputation (MI)-based fuzzy clustering and validation, proposes a new MI-based Visualization-aided validation index (MIVOOS) to determine the optimal number of clusters for big incomplete longitudinal Web-trial data with inflated zeros. Different from a recently developed fuzzy clustering validation index, MIVOOS uses more suitable overlap and separation measures for Web-trial data and, unlike the widely used Xie and Beni (XB) index, does not depend on the choice of fuzzifier. Through optimizing the view angles of 3-D projections using Sammon mapping, the optimal 2-D projection-guided MIVOOS is obtained to better visualize and verify the patterns in conjunction with trajectory patterns. Compared with XB and VOS, our newly proposed MIVOOS shows its robustness in validating big Web-trial data under different missing data mechanisms using real and simulated Web-trial data. PMID:27482473
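To make the remark about fuzzifier dependence concrete, the sketch below computes the Xie and Beni index in its commonly used generalized form, where the fuzzifier m appears as the exponent on the memberships (the original formulation fixes m = 2). MIVOOS itself is not reproduced here, since the abstract does not define it; the function name `xie_beni` and the toy data are assumptions for illustration.

```python
def xie_beni(points, centers, memberships, m=2.0):
    """Xie-Beni validity index; lower values indicate a better partition.

    points:      list of feature vectors
    centers:     list of cluster centre vectors
    memberships: memberships[i][j] is the degree to which point j
                 belongs to cluster i
    m:           fuzzifier used as the exponent on the memberships
    """
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Membership-weighted within-cluster scatter (compactness).
    compactness = sum(
        (memberships[i][j] ** m) * sqdist(points[j], centers[i])
        for i in range(len(centers))
        for j in range(len(points))
    )
    # Smallest squared distance between two distinct centres (separation).
    separation = min(
        sqdist(centers[i], centers[k])
        for i in range(len(centers))
        for k in range(i + 1, len(centers))
    )
    return compactness / (len(points) * separation)


# Hypothetical one-dimensional data with two well-separated groups.
points = [[0.0], [0.2], [10.0], [10.2]]
centers = [[0.1], [10.1]]
crisp = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
fuzzy = [[0.9, 0.9, 0.1, 0.1], [0.1, 0.1, 0.9, 0.9]]

xb_crisp = xie_beni(points, centers, crisp)
xb_low_m = xie_beni(points, centers, fuzzy, m=1.5)
xb_high_m = xie_beni(points, centers, fuzzy, m=3.0)
```

With the same points, centres, and memberships, changing m alone changes the index value, which is the dependence on the fuzzifier that the abstract refers to.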
NASA Astrophysics Data System (ADS)
Furnell, Kate E.; Collins, Chris A.; Kelvin, Lee S.; Clerc, Nicolas; Baldry, Ivan K.; Finoguenov, Alexis; Erfanianfar, Ghazaleh; Comparat, Johan; Schneider, Donald P.
2018-04-01
We present a sample of 329 low to intermediate redshift (0.05 < z < 0.3) brightest cluster galaxies (BCGs) in X-ray selected clusters from the SPectroscopic IDentification of eRosita Sources (SPIDERS) survey, a spectroscopic survey within Sloan Digital Sky Survey-IV (SDSS-IV). We define our BCGs by simultaneous consideration of legacy X-ray data from ROSAT, maximum likelihood outputs from an optical cluster-finder algorithm and visual inspection. Using SDSS imaging data, we fit Sérsic profiles to our BCGs in three bands (g, r, i) with SIGMA, a GALFIT-based software wrapper. We examine the reliability of our fits by running our pipeline on ~10^4 PSF-convolved model profiles injected into 8 random cluster fields; we then use the results of this analysis to create a robust subsample of 198 BCGs. We outline three cluster properties of interest: overall cluster X-ray luminosity (L_X), cluster richness as estimated by REDMAPPER (λ) and cluster halo mass (M_200), which is estimated via velocity dispersion. In general, there are significant correlations with BCG stellar mass between all three environmental properties, but no significant trends arise with either Sérsic index or effective radius. There is no major environmental dependence on the strength of the relation between effective radius and BCG stellar mass. Stellar mass therefore arises as the most important factor governing BCG morphology. Our results indicate that our sample consists of a large number of relaxed, mature clusters containing broadly homogeneous BCGs up to z ~ 0.3, suggesting that there is little evidence for much ongoing structural evolution for BCGs in these systems.
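For readers unfamiliar with the Sérsic profile mentioned above, the following sketch evaluates its standard form, I(R) = I_e exp(-b_n [(R/R_e)^(1/n) - 1]), where R_e is the effective radius and n the Sérsic index. The constant b_n is approximated here as 2n - 1/3, a common approximation that is an assumption of this sketch rather than what SIGMA or GALFIT compute internally; the function name and example values are hypothetical.

```python
from math import exp


def sersic_intensity(r, r_e, i_e, n):
    """Surface brightness of a Sersic profile at radius r.

    r_e: effective (half-light) radius
    i_e: intensity at r_e
    n:   Sersic index (n = 4 is the de Vaucouleurs profile)
    """
    # Approximation for b_n; adequate for n of order 1 and above.
    b_n = 2.0 * n - 1.0 / 3.0
    return i_e * exp(-b_n * ((r / r_e) ** (1.0 / n) - 1.0))


# By construction the profile passes through i_e at the effective radius.
at_effective_radius = sersic_intensity(5.0, r_e=5.0, i_e=2.0, n=4.0)
further_out = sersic_intensity(10.0, r_e=5.0, i_e=2.0, n=4.0)
```

A fit such as the one described in the abstract adjusts n, R_e, and I_e (after PSF convolution) until the model matches the observed image in each band.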
Wilkinson, Krista M.; McIlvane, William J.
2013-01-01
Augmentative and alternative communication (AAC) systems often supplement oral communication of individuals with intellectual and communication disabilities. Research with nondisabled preschoolers has demonstrated that two visual perceptual factors influence speed and/or accuracy of finding a target - the internal color and spatial organization of symbols. Twelve participants with Down syndrome and 12 with ASD underwent two search tasks. In one, the symbols were clustered by internal color; in the other the identical symbols had no arrangement cue. Visual search was superior in participants with ASD compared to those with Down syndrome. In both groups, responses were significantly faster when the symbols were clustered by internal color. Construction of aided AAC displays may benefit from attention to their physical/perceptual features. PMID:24245729
Evaporation-driven clustering of microscale pillars and lamellae
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kim, Tae-Hong; Kim, Jungchul; Kim, Ho-Young, E-mail: hyk@snu.ac.kr
As a liquid film covering an array of micro- or nanoscale pillars or lamellae evaporates, its meniscus pulls the elastic patterns together because of capillary effects, leading to clustering of the slender microstructures. While this elastocapillary coalescence may imply various useful applications, it is detrimental to a semiconductor manufacturing process called spin drying, where a liquid film rinses patterned wafers until drying. To understand the transient mechanism underlying such self-organization during and after liquid evaporation, we visualize the clustering dynamics of polymer micropatterns. Our visualization experiments reveal that the patterns clumped during liquid evaporation can be re-separated when completely dried in some cases. This restoration behavior is explained by considering adhesion energy of the patterns as well as capillary forces, which leads to a regime map to predict whether permanent stiction would occur. This work not only extends our understanding of micropattern stiction, but also suggests a novel path to control and prevent pattern clustering.
Nonlinear dimensionality reduction methods for synthetic biology biobricks' visualization.
Yang, Jiaoyun; Wang, Haipeng; Ding, Huitong; An, Ning; Alterovitz, Gil
2017-01-19
Visualizing data by dimensionality reduction is an important strategy in Bioinformatics, which could help to discover hidden data properties and detect data quality issues, e.g. data noise, inappropriately labeled data, etc. As crowdsourcing-based synthetic biology databases face similar data quality issues, we propose to visualize biobricks to tackle them. However, existing dimensionality reduction methods could not be directly applied to biobricks datasets. Hence, we use normalized edit distance to enhance dimensionality reduction methods, including Isomap and Laplacian Eigenmaps. By extracting biobricks from the synthetic biology database Registry of Standard Biological Parts, six combinations of various types of biobricks are tested. The visualization graphs illustrate discriminated biobricks and inappropriately labeled biobricks. The K-means clustering algorithm is adopted to quantify the reduction results. The average clustering accuracies for Isomap and Laplacian Eigenmaps are 0.857 and 0.844, respectively. In addition, Laplacian Eigenmaps is 5 times faster than Isomap, and its visualization graph is more concentrated for discriminating biobricks. By combining normalized edit distance with Isomap and Laplacian Eigenmaps, synthetic biology biobricks are successfully visualized in two-dimensional space. Various types of biobricks could be discriminated and inappropriately labeled biobricks could be determined, which could help to assess the quality of crowdsourcing-based synthetic biology databases and to select biobricks.
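As an illustration of the distance that feeds the dimensionality reduction, the sketch below computes a normalized edit distance between two sequences. The abstract does not say which normalization is used, so dividing the Levenshtein distance by the length of the longer sequence is an assumption, as are the function name and the example sequences.

```python
def normalized_edit_distance(a, b):
    """Levenshtein distance divided by the longer length, in [0, 1]."""
    if not a and not b:
        return 0.0
    # Classic dynamic programme, keeping only the previous row.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution or match
            ))
        previous = current
    return previous[-1] / max(len(a), len(b))


# Hypothetical short sequences standing in for biobrick DNA strings.
d_close = normalized_edit_distance("ACGT", "ACGA")
d_far = normalized_edit_distance("AAAA", "TTTT")
```

A pairwise matrix of such distances can replace the Euclidean distances that methods such as Isomap normally start from, which is how sequences of different lengths become comparable.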
Visualizing human communication in business process simulations
NASA Astrophysics Data System (ADS)
Groehn, Matti; Jalkanen, Janne; Haho, Paeivi; Nieminen, Marko; Smeds, Riitta
1999-03-01
In this paper a description of business process simulation is given. A crucial part of the simulation of business processes is the analysis of social contacts between the participants. We introduce a tool to collect log data and show how this log data can be effectively analyzed using two different kinds of methods: discussion flow charts and self-organizing maps. Discussion flow charts revealed the communication patterns, and self-organizing maps are a very effective way of clustering the participants into development groups.
SpatialEpiApp: A Shiny web application for the analysis of spatial and spatio-temporal disease data.
Moraga, Paula
2017-11-01
In recent years, public health surveillance has been facilitated by the existence of several packages implementing statistical methods for the analysis of spatial and spatio-temporal disease data. However, these methods are still inaccessible to many researchers who lack the programming skills needed to use the required software effectively. In this paper we present SpatialEpiApp, a Shiny web application that integrates two of the most common approaches in health surveillance: disease mapping and detection of clusters. SpatialEpiApp is easy to use and does not require any programming knowledge. Given information about the cases, population and optionally covariates for each of the areas and dates of study, the application allows users to fit Bayesian models to obtain disease risk estimates and their uncertainty by using R-INLA, and to detect disease clusters by using SaTScan. The application allows user interaction and the creation of interactive data visualizations and reports showing the analyses performed. Copyright © 2017 Elsevier Ltd. All rights reserved.
Statistical and clustering analysis for disturbances: A case study of voltage dips in wind farms
Garcia-Sanchez, Tania; Gomez-Lazaro, Emilio; Muljadi, Eduard; ...
2016-01-28
This study proposes and evaluates an alternative statistical methodology to analyze a large number of voltage dips. For a given voltage dip, a set of lengths is first identified to characterize the root mean square (rms) voltage evolution along the disturbance, deduced from partial linearized time intervals and trajectories. Principal component analysis and K-means clustering processes are then applied to identify rms-voltage patterns and propose a reduced number of representative rms-voltage profiles from the linearized trajectories. This reduced group of averaged rms-voltage profiles enables the representation of a large number of disturbances and offers a visual and graphical representation of their evolution along the events, aspects that were not previously considered in other contributions. The complete process is evaluated on real voltage dips collected in intensive field-measurement campaigns carried out in a wind farm in Spain over several years. The results are included in this paper.
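The clustering stage of such a pipeline can be sketched with a plain Lloyd-style K-means that groups rms-voltage profiles and returns one averaged profile per group. This is a sketch under assumptions: the profiles are taken as already reduced to a few features, the initial centres are simply the first k profiles, and the function name and per-unit values are hypothetical rather than taken from the study.

```python
def kmeans(profiles, k, iters=50):
    """Lloyd's algorithm; returns (labels, centres).

    profiles: list of equal-length feature vectors
    k:        number of clusters; the first k profiles seed the centres
    """
    centres = [list(p) for p in profiles[:k]]
    labels = [0] * len(profiles)
    for _ in range(iters):
        # Assignment step: nearest centre by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((x - y) ** 2
                                  for x, y in zip(p, centres[c])))
            for p in profiles
        ]
        # Update step: each centre becomes the mean of its members.
        new_centres = []
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:
                new_centres.append(
                    [sum(col) / len(members) for col in zip(*members)])
            else:
                new_centres.append(centres[c])  # keep an empty cluster's centre
        if new_centres == centres:
            break
        centres = new_centres
    return labels, centres


# Hypothetical per-unit rms-voltage profiles: shallow dips and deep dips.
dips = [[1.0, 0.90, 1.0], [1.0, 0.20, 1.0],
        [1.0, 0.85, 1.0], [1.0, 0.25, 1.0]]
labels, representative_profiles = kmeans(dips, k=2)
```

Each entry of `representative_profiles` plays the role of one averaged rms-voltage profile; a more careful version would use several random initializations and choose k with a validity criterion.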
Rigbolt, Kristoffer T. G.; Vanselow, Jens T.; Blagoev, Blagoy
2011-01-01
Recent technological advances have made it possible to identify and quantify thousands of proteins in a single proteomics experiment. As a result of these developments, the analysis of data has become the bottleneck of proteomics experiments. To provide the proteomics community with a user-friendly platform for comprehensive analysis, inspection and visualization of quantitative proteomics data we developed the Graphical Proteomics Data Explorer (GProX). The program requires no special bioinformatics training, as all functions of GProX are accessible within its user-friendly graphical interface, which will be intuitive to most users. Basic features facilitate the uncomplicated management and organization of large data sets and complex experimental setups as well as the inspection and graphical plotting of quantitative data. These are complemented by readily available high-level analysis options such as database querying, clustering based on abundance ratios, feature enrichment tests for e.g. GO terms and pathway analysis tools. A number of plotting options for visualization of quantitative proteomics data are available and most analysis functions in GProX create customizable high quality graphical displays in both vector and bitmap formats. The generic import requirements allow data originating from essentially all mass spectrometry platforms, quantitation strategies and software to be analyzed in the program. GProX represents a powerful approach to proteomics data analysis providing proteomics experimenters with a toolbox for bioinformatics analysis of quantitative proteomics data. The program is released as open-source and can be freely downloaded from the project webpage at http://gprox.sourceforge.net. PMID:21602510
Enhanced HMAX model with feedforward feature learning for multiclass categorization.
Li, Yinlin; Wu, Wei; Zhang, Bo; Li, Fengfu
2015-01-01
In recent years, interdisciplinary research between neuroscience and computer vision has promoted development in both fields. Many biologically inspired visual models have been proposed, and among them, the Hierarchical Max-pooling model (HMAX) is a feedforward model mimicking the structures and functions of V1 to the posterior inferotemporal (PIT) layer of the primate visual cortex, which can generate a series of position- and scale-invariant features. However, it could be improved with attention modulation and memory processing, which are two important properties of the primate visual cortex. Thus, in this paper, based on recent biological research on the primate visual cortex, we mimic the first 100-150 ms of visual cognition to enhance the HMAX model, focusing mainly on the unsupervised feedforward feature learning process. The main modifications are as follows: (1) To mimic the attention modulation mechanism of the V1 layer, a bottom-up saliency map is computed in the S1 layer of the HMAX model, which can support the initial feature extraction for memory processing; (2) To mimic the learning, clustering and short-term memory to long-term memory conversion abilities of V2 and IT, an unsupervised iterative clustering method is used to learn clusters with multiscale middle-level patches, which are taken as long-term memory; (3) Inspired by the multiple feature encoding mode of the primate visual cortex, information including color, orientation, and spatial position is encoded progressively in different layers of the HMAX model. By adding a softmax layer at the top of the model, multiclass categorization experiments can be conducted, and the results on Caltech101 show that the enhanced model with a smaller memory size exhibits higher accuracy than the original HMAX model, and can also achieve better accuracy than other unsupervised feature learning methods in the multiclass categorization task.
Superresolution Imaging of Human Cytomegalovirus vMIA Localization in Sub-Mitochondrial Compartments
Bhuvanendran, Shivaprasad; Salka, Kyle; Rainey, Kristin; Sreetama, Sen Chandra; Williams, Elizabeth; Leeker, Margretha; Prasad, Vidhya; Boyd, Jonathan; Patterson, George H.; Jaiswal, Jyoti K.; Colberg-Poley, Anamaris M.
2014-01-01
The human cytomegalovirus (HCMV) viral mitochondria-localized inhibitor of apoptosis (vMIA) protein traffics to mitochondria-associated membranes (MAM), where the endoplasmic reticulum (ER) contacts the outer mitochondrial membrane (OMM). vMIA association with the MAM has not been visualized by imaging. Here, we have visualized this by using a combination of confocal and superresolution imaging. Deconvolution of confocal microscopy images shows that vMIA localizes away from the mitochondrial matrix at the mitochondria-ER interface. By gated stimulated emission depletion (GSTED) imaging, we show that along this interface vMIA is distributed in clusters. Through multicolor, multifocal structured illumination microscopy (MSIM), we find that vMIA clusters localize away from MitoTracker Red, indicating its OMM localization. GSTED and MSIM imaging show that vMIA exists in clusters of ~100–150 nm, which is consistent with the cluster size determined by Photoactivated Localization Microscopy (PALM). With these diverse superresolution approaches, we have imaged the clustered distribution of vMIA at the OMM adjacent to the ER. Our findings directly compare the relative advantages of each of these superresolution imaging modalities for imaging components of the MAM and sub-mitochondrial compartments. These studies establish the ability of superresolution imaging to provide valuable insight into viral protein location, particularly in the sub-mitochondrial compartments, and into their clustered organization. PMID:24721787
VisBricks: multiform visualization of large, inhomogeneous data.
Lex, Alexander; Schulz, Hans-Jörg; Streit, Marc; Partl, Christian; Schmalstieg, Dieter
2011-12-01
Large volumes of real-world data often exhibit inhomogeneities: vertically in the form of correlated or independent dimensions and horizontally in the form of clustered or scattered data items. In essence, these inhomogeneities form the patterns in the data that researchers are trying to find and understand. Sophisticated statistical methods are available to reveal these patterns; however, the visualization of their outcomes is mostly still performed in a one-view-fits-all manner. In contrast, our novel visualization approach, VisBricks, acknowledges the inhomogeneity of the data and the need for different visualizations that suit the individual characteristics of the different data subsets. The overall visualization of the entire data set is patched together from smaller visualizations: there is one VisBrick for each cluster in each group of interdependent dimensions. Whereas the total impression of all VisBricks together gives a comprehensive high-level overview of the different groups of data, each VisBrick independently shows the details of the group of data it represents. State-of-the-art brushing and visual linking between all VisBricks furthermore allow the comparison of the groupings and the distribution of data items among them. In this paper, we introduce the VisBricks visualization concept, discuss its design rationale and implementation, and demonstrate its usefulness by applying it to a use case from the field of biomedicine. © 2011 IEEE
Visual deprivation alters dendritic bundle architecture in layer 4 of rat visual cortex.
Gabbott, P L; Stewart, M G
2012-04-05
The effect of visual deprivation followed by light exposure on the tangential organisation of dendritic bundles passing through layer 4 of the rat visual cortex was studied quantitatively in the light microscope. Four groups of animals were investigated: (I) rats reared in an environment illuminated normally--group 52 dL; (II) rats reared in the dark until 21 days postnatum (DPN) and subsequently light exposed for 31 days--group 21/31; (III) rats dark reared until 52 DPN and then subsequently light exposed for 3 days--group 3 dL; and (IV) rats totally dark reared until 52 DPN--group 52 dD. Each group contained five animals. Semithin 0.5-1-μm thick resin-embedded sections were collected from tangential sampling levels through the middle of layer 4 in area 17 and stained with Toluidine Blue. These sections were used to quantitatively analyse the composition and distribution of dendritic clusters in the tangential plane. The key result of this study indicates a significant reduction in the mean number of medium- and small-sized dendritic profiles (diameter less than 2 μm) contributing to clusters in layer 4 of groups 3 dL and 52 dD compared with group 21/31. No differences were detected in the mean number of large-sized dendritic profiles composing a bundle in these experimental groups. Moreover, the mean number of clusters and their tangential distribution in layer 4 did not vary significantly between all four groups. Finally, the clustering parameters were not significantly different between group 21/31 and the normally reared group 52 dL. This study demonstrates, for the first time, that extended periods of dark rearing followed by light exposure can alter the morphological composition of dendritic bundles in thalamorecipient layer 4 of rat visual cortex. 
Because these changes occur in the primary region of thalamocortical input, they may underlie specific alterations in the processing of visual information both cortically and subcortically during periods of dark rearing and light exposure. Copyright © 2012 IBRO. Published by Elsevier Ltd. All rights reserved.
Patterns of Individual Variation in Visual Pathway Structure and Function in the Sighted and Blind
Datta, Ritobrato; Benson, Noah C.; Prasad, Sashank; Jacobson, Samuel G.; Cideciyan, Artur V.; Bridge, Holly; Watkins, Kate E.; Butt, Omar H.; Dain, Aleksandra S.; Brandes, Lauren; Gennatas, Efstathios D.
2016-01-01
Many structural and functional brain alterations accompany blindness, with substantial individual variation in these effects. In normally sighted people, there is correlated individual variation in some visual pathway structures. Here we examined if the changes in brain anatomy produced by blindness alter the patterns of anatomical variation found in the sighted. We derived eight measures of central visual pathway anatomy from a structural image of the brain from 59 sighted and 53 blind people. These measures showed highly significant differences in mean size between the sighted and blind cohorts. When we examined the measurements across individuals within each group we found three clusters of correlated variation, with V1 surface area and pericalcarine volume linked, and independent of the thickness of V1 cortex. These two clusters were in turn relatively independent of the volumes of the optic chiasm and lateral geniculate nucleus. This same pattern of variation in visual pathway anatomy was found in the sighted and the blind. Anatomical changes within these clusters were graded by the timing of onset of blindness, with those subjects with a post-natal onset of blindness having alterations in brain anatomy that were intermediate to those seen in the sighted and congenitally blind. Many of the blind and sighted subjects also contributed functional MRI measures of cross-modal responses within visual cortex, and a diffusion tensor imaging measure of fractional anisotropy within the optic radiations and the splenium of the corpus callosum. We again found group differences between the blind and sighted in these measures. The previously identified clusters of anatomical variation were also found to be differentially related to these additional measures: across subjects, V1 cortical thickness was related to cross-modal activation, and the volume of the optic chiasm and lateral geniculate was related to fractional anisotropy in the visual pathway. 
Our findings show that several of the structural and functional effects of blindness may be reduced to a smaller set of dimensions. It also seems that the changes in the brain that accompany blindness are on a continuum with normal variation found in the sighted. PMID:27812129
Visual perception of ADHD children with sensory processing disorder.
Jung, Hyerim; Woo, Young Jae; Kang, Je Wook; Choi, Yeon Woo; Kim, Kyeong Mi
2014-04-01
The aim of the present study was to investigate the difference in visual perception between ADHD children with and without sensory processing disorder, and the relationship between sensory processing and visual perception in children with ADHD. Participants were 47 outpatients, aged 6-8 years, diagnosed with ADHD. After excluding those who met exclusion criteria, 38 subjects were divided into two groups, ADHD children with and without sensory processing disorder (SPD), using the SSP reported by their parents; the subjects then completed the K-DTVP-2. Spearman correlation analysis was run to determine the relationship between sensory processing and visual perception, and the Mann-Whitney U test was conducted to compare the K-DTVP-2 scores of the two groups. The ADHD children with SPD performed worse than ADHD children without SPD on 3 quotients of the K-DTVP-2. The GVP score of the K-DTVP-2 was related to the Movement Sensitivity section (r=0.368*) and the Low Energy/Weak section of the SSP (r=0.369*). The results of the present study suggest that among children with ADHD, visual perception is lower in those children with co-morbid SPD. Also, visual perception may be related to sensory processing, especially in the reactions of vestibular and proprioceptive senses. Regarding academic performance, it is necessary to consider how sensory processing issues affect visual perception in children with ADHD.
An Unmanned Aerial Vehicle Cluster Network Cruise System for Monitor
NASA Astrophysics Data System (ADS)
Jiang, Jirong; Tao, Jinpeng; Xin, Guipeng
2018-06-01
The existing maritime cruising system mainly uses manned motorboats to monitor the quality of coastal water and to patrol and maintain navigation-aiding facilities, which leads to high energy consumption, a small cruise range for monitoring, insufficient information control and low visualization. In recent years, the application of UAS in the maritime field has alleviated these problems to some extent. The cluster-based unmanned network monitoring cruise system designed in this project uses a floating, self-powered launching platform for small UAVs as a carrier, applies the idea of clustering, and combines the strong controllability of the multi-rotor UAV with its capability to carry customized modules, constituting an unmanned, visualized and normalized monitoring cruise network that realizes the functions of maritime cruising, maintenance of navigation aids and monitoring of coastal water quality.
A local search for a graph clustering problem
NASA Astrophysics Data System (ADS)
Navrotskaya, Anna; Il'ev, Victor
2016-10-01
In clustering problems one has to partition a given set of objects (a data set) into subsets (called clusters), taking into consideration only the similarity of the objects. One of the most visual formalizations of clustering is graph clustering, that is, grouping the vertices of a graph into clusters taking into consideration the edge structure of the graph, whose vertices are objects and whose edges represent similarities between the objects. In the graph k-clustering problem the number of clusters does not exceed k and the goal is to minimize the number of edges between clusters plus the number of missing edges within clusters. This problem is NP-hard for any k ≥ 2. We propose a polynomial-time (2k-1)-approximation algorithm for graph k-clustering. Then we apply a local search procedure to the feasible solution found by this algorithm and carry out an experimental study of the resulting heuristics.
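The objective described in this abstract can be illustrated with a small sketch. It is not the authors' approximation algorithm: the function names, the single-vertex move rule and the example graph are assumptions chosen only to show how the cost (edges between clusters plus missing edges within clusters) is counted and how a simple local search might reduce it.

```python
from itertools import combinations


def clustering_cost(n, edges, labels):
    """Count edges between clusters plus missing edges within clusters."""
    edge_set = {frozenset(e) for e in edges}
    cost = 0
    for u, v in combinations(range(n), 2):
        same_cluster = labels[u] == labels[v]
        has_edge = frozenset((u, v)) in edge_set
        # A pair is "wrong" if it is joined but separated, or grouped but not joined.
        if same_cluster != has_edge:
            cost += 1
    return cost


def local_search(n, edges, labels, k):
    """Hypothetical move rule: relocate one vertex at a time while the cost drops."""
    labels = list(labels)
    best = clustering_cost(n, edges, labels)
    improved = True
    while improved:
        improved = False
        for v in range(n):
            keep = labels[v]
            for c in range(k):
                if c == keep:
                    continue
                labels[v] = c
                cost = clustering_cost(n, edges, labels)
                if cost < best:
                    best, keep, improved = cost, c, True
            labels[v] = keep  # settle on the best label found for v
    return labels, best


# Two triangles joined by a single edge (illustrative data only).
example_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
start = [0, 0, 0, 0, 1, 1]
final_labels, final_cost = local_search(6, example_edges, start, k=2)
```

In this toy graph the only unavoidable error is the bridge edge between the two triangles, so the search ends with one cluster per triangle.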
Running VisIt Software on the Peregrine System | High-Performance Computing
kilobyte range. VisIt features a robust remote visualization capability. VisIt can be started on a local machine and used to visualize data on a remote compute cluster. The remote machine must be able to send VisIt module must be loaded as part of this process. To enable remote visualization the 'module load
Preparation and Luminescence Thermochromism of Tetranuclear Copper(I)-Pyridine-Iodide Clusters
ERIC Educational Resources Information Center
Parmeggiani, Fabio; Sacchetti, Alessandro
2012-01-01
A simple and straightforward synthesis of a tetranuclear copper(I)-pyridine-iodide cluster is described as a laboratory experiment for advanced inorganic chemistry undergraduate students. The product is used to demonstrate the fascinating and visually impressive phenomenon of luminescence thermochromism: exposed to long-wave UV light, the…
Clustering by reordering of similarity and Laplacian matrices: Application to galaxy clusters
NASA Astrophysics Data System (ADS)
Mahmoud, E.; Shoukry, A.; Takey, A.
2018-04-01
Similarity metrics, kernels and similarity-based algorithms have gained much attention due to their increasing applications in information retrieval, data mining, pattern recognition and machine learning. Similarity graphs are often adopted as the underlying representation of similarity matrices and are at the origin of known clustering algorithms such as spectral clustering. Similarity matrices offer the advantage of working in object-object (two-dimensional) space, where visualization of cluster similarities is available, instead of object-features (multi-dimensional) space. In this paper, sparse ɛ-similarity graphs are constructed and decomposed into strong components using appropriate methods such as the Dulmage-Mendelsohn permutation (DMperm) and/or Reverse Cuthill-McKee (RCM) algorithms. The obtained strong components correspond to groups (clusters) in the input (feature) space. The parameter ɛi is estimated locally, at each data point i, from a corresponding narrow range of the number of nearest neighbors. Although more advanced clustering techniques are available, our method has the advantages of simplicity, better complexity and direct visualization of the cluster similarities in a two-dimensional space. Also, no prior information about the number of clusters is needed. We conducted our experiments on two- and three-dimensional synthetic datasets of small and large size as well as on a real astronomical dataset. The results are verified graphically and analyzed using gap statistics over a range of neighbors to verify the robustness of the algorithm and the stability of the results. Combining the proposed algorithm with gap statistics provides a promising tool for solving clustering problems. An astronomical application is conducted for confirming the existence of 45 galaxy clusters around the X-ray positions of galaxy clusters in the redshift range [0.1..0.8]. 
We re-estimate the photometric redshifts of the identified galaxy clusters and obtain acceptable values compared to published spectroscopic redshifts with a 0.029 standard deviation of their differences.
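A simplified sketch of the idea of reading clusters off an ɛ-similarity graph is given below. It makes two assumptions that differ from the abstract: a single global ɛ replaces the locally estimated ɛi, and a plain breadth-first search for connected components stands in for the DMperm/RCM reordering. All names and the sample points are hypothetical.

```python
import math
from collections import deque


def epsilon_graph(points, eps):
    """Adjacency lists of the graph linking points at distance <= eps."""
    n = len(points)
    adjacency = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(points[i], points[j]) <= eps:
                adjacency[i].append(j)
                adjacency[j].append(i)
    return adjacency


def component_labels(adjacency):
    """Label each vertex with the index of its connected component (BFS)."""
    labels = [-1] * len(adjacency)
    current = 0
    for start in range(len(adjacency)):
        if labels[start] != -1:
            continue
        labels[start] = current
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for w in adjacency[u]:
                if labels[w] == -1:
                    labels[w] = current
                    queue.append(w)
        current += 1
    return labels


# Two well-separated groups of 2-D points (illustrative data only).
sample_points = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (5.0, 5.0), (5.5, 5.0)]
sample_labels = component_labels(epsilon_graph(sample_points, eps=0.6))
```

Because the graph is undirected, its strong components coincide with its connected components, which is why the number of clusters does not have to be supplied in advance.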
Toyz: A framework for scientific analysis of large datasets and astronomical images
NASA Astrophysics Data System (ADS)
Moolekamp, F.; Mamajek, E.
2015-11-01
As the size of images and data products derived from astronomical data continues to increase, new tools are needed to visualize and interact with that data in a meaningful way. Motivated by our own astronomical images taken with the Dark Energy Camera (DECam) we present Toyz, an open source Python package for viewing and analyzing images and data stored on a remote server or cluster. Users connect to the Toyz web application via a web browser, making it a convenient tool for students to visualize and interact with astronomical data without having to install any software on their local machines. In addition it provides researchers with an easy-to-use tool that allows them to browse the files on a server and quickly view very large images (>2 Gb) taken with DECam and other cameras with a large FOV and create their own visualization tools that can be added on as extensions to the default Toyz framework.
Visualized analysis of mixed numeric and categorical data via extended self-organizing map.
Hsu, Chung-Chian; Lin, Shu-Han
2012-01-01
Many real-world datasets are of mixed types, having numeric and categorical attributes. Even though difficult, analyzing mixed-type datasets is important. In this paper, we propose an extended self-organizing map (SOM), called MixSOM, which utilizes a data structure distance hierarchy to facilitate the handling of numeric and categorical values in a direct, unified manner. Moreover, the extended model regularizes the prototype distance between neighboring neurons in proportion to their map distance so that structures of the clusters can be portrayed better on the map. Extensive experiments on several synthetic and real-world datasets are conducted to demonstrate the capability of the model and to compare MixSOM with several existing models including Kohonen's SOM, the generalized SOM and visualization-induced SOM. The results show that MixSOM is superior to the other models in reflecting the structure of the mixed-type data and facilitates further analysis of the data such as exploration at various levels of granularity.
Acidity in DMSO from the embedded cluster integral equation quantum solvation model.
Heil, Jochen; Tomazic, Daniel; Egbers, Simon; Kast, Stefan M
2014-04-01
The embedded cluster reference interaction site model (EC-RISM) is applied to the prediction of acidity constants of organic molecules in dimethyl sulfoxide (DMSO) solution. EC-RISM is based on a self-consistent treatment of the solute's electronic structure and the solvent's structure by coupling quantum-chemical calculations with three-dimensional (3D) RISM integral equation theory. We compare available DMSO force fields with reference calculations obtained using the polarizable continuum model (PCM). The results are evaluated statistically using two different approaches to eliminating the proton contribution: a linear regression model and an analysis of pK(a) shifts for compound pairs. Suitable levels of theory for the integral equation methodology are benchmarked. The results are further analyzed and illustrated by visualizing solvent site distribution functions and comparing them with an aqueous environment.
Wang, Wei; Song, Wei-Guo; Liu, Shi-Xing; Zhang, Yong-Ming; Zheng, Hong-Yang; Tian, Wei
2011-04-01
An improved cloud detection method that combines K-means clustering with a multi-spectral threshold approach is described. On the basis of landmark spectrum analysis, MODIS data are first divided into two major classes by the K-means method. The first class includes clouds, smoke and snow, and the second class includes vegetation, water and land. A multi-spectral threshold detection is then applied to the first class to eliminate interference such as smoke and snow. The method is tested with MODIS data acquired at different times and over different underlying surface conditions. Visual inspection of the results showed that the algorithm can effectively detect small areas of cloud pixels and exclude the interference of the underlying surface, which provides a good foundation for subsequent fire detection.
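The two-stage scheme in this abstract (a two-class K-means split followed by a threshold test on the bright class) might be sketched as follows. The band values, the single reflectance feature, the temperature cutoff of 270 and every name here are illustrative assumptions; they are not the MODIS bands or thresholds used by the authors.

```python
def kmeans_two_classes(values, iterations=20):
    """Plain 1-D K-means with k=2, initialised at the minimum and maximum."""
    centers = [min(values), max(values)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value joins its nearest center.
        labels = [
            0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            for v in values
        ]
        # Update step: each center moves to the mean of its members.
        for c in (0, 1):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centers[c] = sum(members) / len(members)
    return labels, centers


def cloud_mask(reflectance, temperature, max_cloud_temperature=270.0):
    """Bright-class pixels that also pass a hypothetical temperature test."""
    labels, _ = kmeans_two_classes(reflectance)
    return [
        lab == 1 and t < max_cloud_temperature
        for lab, t in zip(labels, temperature)
    ]


# Made-up pixel values: three dark surface pixels, three bright pixels.
sample_reflectance = [0.05, 0.08, 0.10, 0.60, 0.70, 0.65]
sample_temperature = [290.0, 288.0, 292.0, 250.0, 285.0, 255.0]
sample_mask = cloud_mask(sample_reflectance, sample_temperature)
```

The clustering step only separates bright from dark pixels; the second test is what removes bright pixels that are not cloud, which mirrors the division of labour described above.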
[Epidemiological survey of visual impairment in Funing County, Jiangsu].
Yang, M; Zhang, J F; Zhu, R R; Kang, L H; Qin, B; Guan, H J
2017-07-11
Objective: To investigate the prevalence of visual impairment and factors associated with visual impairment among people aged 50 years and above in Funing County, Jiangsu Province. Methods: Cross-sectional study. Random cluster sampling was used to select individuals aged ≥50 years in 30 clusters, and 5 947 individuals received visual acuity testing and eye examination. Stata 13.0 software was used to analyze the data. Multivariate logistic regression was used to detect possible factors of visual impairment such as age, gender and education. Statistical significance was defined as P < 0.05. Results: A total of 6 145 persons aged 50 years and above were enumerated, and 5 947 (96.8%) participants were examined. Based on the World Health Organization (WHO) visual impairment classification and presenting visual acuity, 138 persons were diagnosed with blindness and 1 405 persons with low vision. The prevalence of blindness and low vision was 2.32% and 23.63%, respectively, and the prevalence of visual impairment was 25.95%. Based on the WHO visual impairment classification and best-corrected visual acuity, 92 persons were diagnosed with blindness and 383 persons with low vision. The prevalence of blindness and low vision was 1.55% and 6.44%, respectively, and the prevalence of visual impairment was 7.99%. For both presenting visual acuity and best-corrected visual acuity, the prevalence of blindness and low vision was higher in older people, females and less educated persons. Cataract (46.63%) was the leading cause of blindness. Uncorrected refractive error (36.51%) was also a main cause of visual impairment. Conclusion: The prevalence of visual impairment is higher in older people, females and less educated persons in Funing County, Jiangsu Province. Cataract is still the leading cause of visual impairment. (Chin J Ophthalmol, 2017, 53: 502-508).
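The prevalence figures in this abstract follow directly from the reported counts, as the short check below shows. The counts are taken from the abstract; the helper name `percent` and the choice of the examined population as the denominator are assumptions made for this illustration.

```python
ENUMERATED = 6145  # persons aged 50 years and above who were enumerated
EXAMINED = 5947    # persons who received visual acuity testing


def percent(count, total=EXAMINED, digits=2):
    """Share of `count` in `total`, expressed as a rounded percentage."""
    return round(100 * count / total, digits)


response_rate = percent(EXAMINED, ENUMERATED, digits=1)

# Presenting visual acuity.
presenting = {
    "blindness": percent(138),
    "low_vision": percent(1405),
    "visual_impairment": percent(138 + 1405),
}

# Best-corrected visual acuity.
best_corrected = {
    "blindness": percent(92),
    "low_vision": percent(383),
    "visual_impairment": percent(92 + 383),
}
```

Visual impairment is the sum of blindness and low vision in each case, so the combined percentages are computed from the summed counts rather than by adding rounded percentages.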
Audiovisual Delay as a Novel Cue to Visual Distance.
Jaekl, Philip; Seidlitz, Jakob; Harris, Laurence R; Tadin, Duje
2015-01-01
For audiovisual sensory events, sound arrives with a delay relative to light that increases with event distance. It is unknown, however, whether humans can use these ubiquitous sound delays as an information source for distance computation. Here, we tested the hypothesis that audiovisual delays can both bias and improve human perceptual distance discrimination, such that visual stimuli paired with auditory delays are perceived as more distant and are thereby an ordinal distance cue. In two experiments, participants judged the relative distance of two repetitively displayed three-dimensional dot clusters, both presented with sounds of varying delays. In the first experiment, dot clusters presented with a sound delay were judged to be more distant than dot clusters paired with equivalent sound leads. In the second experiment, we confirmed that the presence of a sound delay was sufficient to cause stimuli to appear as more distant. Additionally, we found that ecologically congruent pairing of more distant events with a sound delay resulted in an increase in the precision of distance judgments. A control experiment determined that the sound delay duration influencing these distance judgments was not detectable, thereby eliminating decision-level influence. In sum, we present evidence that audiovisual delays can be an ordinal cue to visual distance.
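The physical relationship behind this cue (sound lags light by an amount that grows with distance) can be made concrete with a small calculation. The sketch assumes a speed of sound of about 343 m/s, a typical value for air at room temperature; the function name and the sample distances are hypothetical and are not taken from the study.

```python
SPEED_OF_SOUND_M_S = 343.0          # assumed: air at roughly 20 °C
SPEED_OF_LIGHT_M_S = 299_792_458.0  # defined value in vacuum


def audiovisual_delay_s(distance_m):
    """Seconds by which sound arrives after light from an event at distance_m."""
    return distance_m / SPEED_OF_SOUND_M_S - distance_m / SPEED_OF_LIGHT_M_S


# Delay in milliseconds for a few example distances in metres.
delays_ms = {d: 1000 * audiovisual_delay_s(d) for d in (1.0, 10.0, 34.3)}
```

The light travel time is negligible at these scales, so the delay is close to 3 ms per metre and increases monotonically with distance, which is what lets it act as an ordinal cue.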
A Multiple-Label Guided Clustering Algorithm for Historical Document Dating and Localization.
He, Sheng; Samara, Petros; Burgers, Jan; Schomaker, Lambert
2016-11-01
It is of essential importance for historians to know the date and place of origin of the documents they study. It would be a huge advancement for historical scholars if it would be possible to automatically estimate the geographical and temporal provenance of a handwritten document by inferring them from the handwriting style of such a document. We propose a multiple-label guided clustering algorithm to discover the correlations between the concrete low-level visual elements in historical documents and abstract labels, such as date and location. First, a novel descriptor, called histogram of orientations of handwritten strokes, is proposed to extract and describe the visual elements, which is built on a scale-invariant polar-feature space. In addition, the multi-label self-organizing map (MLSOM) is proposed to discover the correlations between the low-level visual elements and their labels in a single framework. Our proposed MLSOM can be used to predict the labels directly. Moreover, the MLSOM can also be considered as a pre-structured clustering method to build a codebook, which contains more discriminative information on date and geography. The experimental results on the medieval paleographic scale data set demonstrate that our method achieves state-of-the-art results.
Equalizer: a scalable parallel rendering framework.
Eilemann, Stefan; Makhinya, Maxim; Pajarola, Renato
2009-01-01
Continuing improvements in CPU and GPU performance as well as increasing multi-core processor and cluster-based parallelism demand flexible and scalable parallel rendering solutions that can exploit multipipe hardware-accelerated graphics. In fact, to achieve interactive visualization, scalable rendering systems are essential to cope with the rapid growth of data sets. However, parallel rendering systems are non-trivial to develop and often only application-specific implementations have been proposed. The task of developing a scalable parallel rendering framework is even more difficult if it should be generic enough to support various types of data and visualization applications, and at the same time work efficiently on a cluster with distributed graphics cards. In this paper we introduce a novel system called Equalizer, a toolkit for scalable parallel rendering based on OpenGL which provides an application programming interface (API) to develop scalable graphics applications for a wide range of systems, from large distributed visualization clusters and multi-processor multipipe graphics systems to single-processor single-pipe desktop machines. We describe the system architecture and the basic API, discuss its advantages over previous approaches, and present example configurations and usage scenarios as well as scalability results.
A multimodal detection model of dolphins to estimate abundance validated by field experiments.
Akamatsu, Tomonari; Ura, Tamaki; Sugimatsu, Harumi; Bahl, Rajendar; Behera, Sandeep; Panda, Sudarsan; Khan, Muntaz; Kar, S K; Kar, C S; Kimura, Satoko; Sasaki-Yamamoto, Yukiko
2013-09-01
Abundance estimation of marine mammals requires matching detections of an animal or a group of animals made by two independent means. A multimodal detection model using visual and acoustic cues (surfacing and phonation) that enables abundance estimation of dolphins is proposed. The method does not require a specific time window to match the cues of both means when applying the mark-recapture method. The proposed model was evaluated using data obtained in field observations of Ganges River dolphins and Irrawaddy dolphins, as examples of dispersed and condensed distributions of animals, respectively. The acoustic detection probability was approximately 80%, 20% higher than that of visual detection for both species, regardless of the distribution of the animals in the present study sites. The abundance estimates of Ganges River dolphins and Irrawaddy dolphins agreed fairly well with the numbers reported in previous monitoring studies. The detection probability of a single animal was smaller than that of larger clusters, as predicted by the model and confirmed by field data. However, dense groups of Irrawaddy dolphins showed differences in the cluster sizes observed by visual and acoustic methods. The lower detection probability of single clusters of this species seemed to be caused by its clumped distribution.
You Can Touch This! Bringing HST images to life as 3-D models
NASA Astrophysics Data System (ADS)
Christian, Carol A.; Nota, A.; Grice, N. A.; Sabbi, E.; Shaheen, N.; Greenfield, P.; Hurst, A.; Kane, S.; Rao, R.; Dutterer, J.; de Mink, S. E.
2014-01-01
We present the very first results of an innovative process to transform Hubble images into tactile 3-D models of astronomical objects. We have created a very new, unique tool for understanding astronomical phenomena, especially designed to make astronomy accessible to visually impaired children and adults. From the multicolor images of stellar clusters, we construct 3-D computer models that are digitally sliced into layers, each featuring touchable patterning and Braille characters, and are printed on a 3-D printer. The slices are then fitted together, so that the user can explore the structure of the cluster environment with their fingertips, slice-by-slice, analogous to a visual fly-through. Students will be able to identify and spatially locate the different components of these complex astronomical objects, namely gas, dust and stars, and will learn about the formation and composition of stellar clusters. The primary audiences for the 3D models are middle school and high school blind students and, secondarily, blind adults. However, we believe that the final materials will address a broad range of individuals with varied and multi-sensory learning styles, and will be interesting and visually appealing to the public at large.
Intuitive visual impressions (cogs) for identifying clusters of diversity within potato species
USDA-ARS's Scientific Manuscript database
One of the basic research activities of genebanks is to partition stocks into groups that facilitate the efficient preservation and evaluation of the full range of useful phenotype diversity. We sought to test the usefulness of making of infra-specific groups by replicated rapid visual intuitive imp...
A Tyrosine-Hydroxylase Characterization of Dopaminergic Neurons in the Honey Bee Brain
Tedjakumala, Stevanus R.; Rouquette, Jacques; Boizeau, Marie-Laure; Mesce, Karen A.; Hotier, Lucie; Massou, Isabelle; Giurfa, Martin
2017-01-01
Dopamine (DA) plays a fundamental role in insect behavior as it acts both as a general modulator of behavior and as a value system in associative learning where it mediates the reinforcing properties of unconditioned stimuli (US). Here we aimed at characterizing the dopaminergic neurons in the central nervous system of the honey bee, an insect that serves as an established model for the study of learning and memory. We used tyrosine hydroxylase (TH) immunoreactivity (ir) to ensure that the neurons detected synthesize DA endogenously. We found three main dopaminergic clusters, C1–C3, which had been previously described; the C1 cluster is located in a small region adjacent to the esophagus (ES) and the antennal lobe (AL); the C2 cluster is situated above the C1 cluster, between the AL and the vertical lobe (VL) of the mushroom body (MB); the C3 cluster is located below the calyces (CA) of the MB. In addition, we found a novel dopaminergic cluster, C4, located above the dorsomedial border of the lobula, which innervates the visual neuropils of the bee brain. Additional smaller processes and clusters were found and are described. The profuse dopaminergic innervation of the entire bee brain and the specific connectivity of DA neurons, with visual, olfactory and gustatory circuits, provide a foundation for a deeper understanding of how these sensory modules are modulated by DA, and the DA-dependent value-based associations that occur during associative learning. PMID:28740466
Devís-Devís, José; Lizandra, Jorge; Valencia-Peris, Alexandra; Pérez-Gimeno, Esther; García-Massò, Xavier; Peiró-Velert, Carmen
2017-01-01
This study examined longitudinal changes in physical activity, sedentary behavior and body mass index in adolescents, specifically their migrations towards a different weight cluster. A cohort of 755 adolescents participated in a three-year study. A clustering Self-Organized Maps Analysis was performed to visualize changes in subjects' characteristics between the first and second assessment, and how adolescents were grouped. Also a classification tree was used to identify the behavioral characteristics of the groups that changed their weight cluster. Results indicated that boys were more active and less sedentary than girls. Boys were especially keen to technological-based activities while girls preferred social-based activities. A moderate competing effect between sedentary behaviors and physical activities was observed, especially in girls. Overweight and obesity were negatively associated with physical activity, although a small group of overweight/obese adolescents showed a positive relationship with vigorous physical activity. Cluster migrations indicated that 22.66% of adolescents changed their weight cluster to a lower category and none of them moved in the opposite direction. The behavioral characteristics of these adolescents did not support the hypothesis that the change to a lower weight cluster was a consequence of an increase in time devoted to physical activity or a decrease in time spent on sedentary behavior. Physical activity and sedentary behavior do not exert a substantial effect on overweight and obesity. Therefore, there are other ways of changing to a lower-weight status in adolescents apart from those in which physical activity and sedentary behavior are involved. PMID:28636644
NASA Astrophysics Data System (ADS)
Kelkar, Kshitija; Gray, Meghan E.; Aragón-Salamanca, Alfonso; Rudnick, Gregory; Milvang-Jensen, Bo; Jablonka, Pascale; Schrabback, Tim
2017-08-01
With the aim of understanding the effect of the environment on the star formation history and morphological transformation of galaxies, we present a detailed analysis of the colour, morphology and internal structure of cluster and field galaxies at 0.4 ≤ z ≤ 0.8. We use the Hubble Space Telescope data for over 500 galaxies from the ESO Distant Cluster Survey to quantify how the galaxies' light distributions deviate from symmetric smooth profiles. We visually inspect the galaxies' images to identify the likely causes for such deviations. We find that the residual flux fraction (RFF), which measures the fractional contribution to the galaxy light of the residuals left after subtracting a symmetric and smooth model, is very sensitive to the degree of structural disturbance but not the causes of such disturbance. On the other hand, the asymmetry of these residuals (Ares) is more sensitive to the causes of the disturbance, with merging galaxies having the highest values of Ares. Using these quantitative parameters, we find that, at a fixed morphology, cluster and field galaxies show statistically similar degrees of disturbance. However, there is a higher fraction of symmetric and passive spirals in the cluster than in the field. These galaxies have smoother light distributions than their star-forming counterparts. We also find that while almost all field and cluster S0s appear undisturbed, there is a relatively small population of star-forming S0s in clusters but not in the field. These findings are consistent with relatively gentle environmental processes acting on galaxies infalling on to clusters.
NASA Astrophysics Data System (ADS)
Tadross, A. L.
2005-12-01
The main physical parameters (the cluster center, distance, radius, age, reddening, and visual absorption) have been re-estimated and improved for the open cluster NGC 7086. The metal abundance, galactic distances, membership richness, luminosity function, mass function, and the total mass of NGC 7086 have been examined for the first time here using the Monet et al. (2003) catalog.
Kohatsu, Soh; Yamamoto, Daisuke
2015-03-06
The courtship ritual of male Drosophila represents an innate behaviour that is initiated by female-derived sensory stimuli. Here we report that moving light spots can induce courtship-like following pursuit in tethered wild-type male flies provided the fly is primed by optogenetic stimulation of specific dsx-expressing neuronal clusters in the lateral protocerebrum (LPR). Namely, stimulation of the pC1 neuronal cluster initiates unilateral wing extension and vibration of both sides, whereas stimulation of the pC2l cluster initiates only contralateral wing displays. In addition, stimulation of pC2l but not pC1 neurons induced abdominal bending and proboscis extension. Ca²⁺ imaging of the pC1 cluster revealed periodic Ca²⁺ rises, each corresponding to a turn of the male fly during courtship. In contrast, group-reared fru mutant males exhibit light spot-induced courtship pursuit without optogenetic priming. Ca²⁺ imaging revealed enhanced responses of LPR neurons to visual stimuli in the mutants, suggesting a neural correlate of the light spot-induced courtship behaviour.
Crossmaps: Visualization of overlapping relationships in collections of journal papers
Morris, Steven A.; Yen, Gary G.
2004-01-01
A crossmapping technique is introduced for visualizing multiple and overlapping relations among entity types in collections of journal articles. Groups of entities from two entity types are crossplotted to show correspondence of relations. For example, author collaboration groups are plotted on the x axis against groups of papers (research fronts) on the y axis. At the intersection of each author group/research front pair, a circular symbol is plotted whose size is proportional to the number of times that authors in the group appear as authors in papers in the research front. Entity groups are found by agglomerative hierarchical clustering using conventional similarity measures. Crossmaps comprise a simple technique that is particularly suited to showing overlap in relations among entity groups. Particularly useful crossmaps are: research fronts against base reference clusters, research fronts against author collaboration groups, and research fronts against term co-occurrence clusters. When exploring the knowledge domain of a collection of journal papers, it is useful to have several crossmaps of different entity pairs, complemented by research front timelines and base reference cluster timelines. PMID:14762168
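The cell values of a crossmap, as described above, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `crossmap_counts`, the toy papers, and the group labels are invented here, and the groups are given by hand rather than found by the agglomerative hierarchical clustering the abstract mentions.

```python
from collections import Counter

def crossmap_counts(papers, author_group, paper_front):
    """Count author appearances per (author group, research front) cell.

    papers       -- dict: paper id -> list of author names
    author_group -- dict: author name -> author collaboration group label
    paper_front  -- dict: paper id -> research front label
    """
    counts = Counter()
    for paper_id, authors in papers.items():
        front = paper_front[paper_id]
        for author in authors:
            # One appearance of a group member on a paper in this front
            counts[(author_group[author], front)] += 1
    return counts

# Hypothetical toy data (not from the paper)
papers = {"p1": ["A", "B"], "p2": ["A", "C"], "p3": ["C"]}
author_group = {"A": "G1", "B": "G1", "C": "G2"}
paper_front = {"p1": "F1", "p2": "F1", "p3": "F2"}

counts = crossmap_counts(papers, author_group, paper_front)
for (group, front), n in sorted(counts.items()):
    print(group, front, n)
```

Each count would then set the size of the circular symbol plotted at that group/front intersection.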
Božičević, Alen; Dobrzyński, Maciej; De Bie, Hans; Gafner, Frank; Garo, Eliane; Hamburger, Matthias
2017-12-05
The technological development of LC-MS instrumentation has led to significant improvements of performance and sensitivity, enabling high-throughput analysis of complex samples, such as plant extracts. Most software suites allow preprocessing of LC-MS chromatograms to obtain comprehensive information on single constituents. However, more advanced processing needs, such as the systematic and unbiased comparative metabolite profiling of large numbers of complex LC-MS chromatograms, remain a challenge. Currently, users have to rely on different tools to perform such data analyses. We developed a two-step protocol comprising a comparative metabolite profiling tool integrated in ACD/MS Workbook Suite, and a web platform developed in R language designed for clustering and visualization of chromatographic data. Initially, all relevant chromatographic and spectroscopic data (retention time, molecular ions with the respective ion abundance, and sample names) are automatically extracted and assembled in an Excel spreadsheet. The file is then loaded into an online web application that includes various statistical algorithms and provides the user with tools to compare and visualize the results in intuitive 2D heatmaps. We applied this workflow to LC-ESIMS profiles obtained from 69 honey samples. Within a few hours of calculation with a standard PC, honey samples were preprocessed and organized in clusters based on their metabolite profile similarities, thereby highlighting the common metabolite patterns and distributions among samples. Implementation in the ACD/Laboratories software package enables subsequent integration of other analytical data, and in silico prediction tools for modern drug discovery.
Spatio-Temporal Metabolite Profiling of the Barley Germination Process by MALDI MS Imaging
Gorzolka, Karin; Kölling, Jan; Nattkemper, Tim W.; Niehaus, Karsten
2016-01-01
MALDI mass spectrometry imaging was performed to localize metabolites during the first seven days of barley germination. Up to 100 mass signals were detected of which 85 signals were identified as 48 different metabolites with highly tissue-specific localizations. Oligosaccharides were observed in the endosperm and in parts of the developed embryo. Lipids in the endosperm co-localized in dependency on their fatty acid compositions with changes in the distributions of diacyl phosphatidylcholines during germination. 26 potentially antifungal hordatines were detected in the embryo with tissue-specific localizations of their glycosylated, hydroxylated, and O-methylated derivatives. In order to reveal spatio-temporal patterns in local metabolite compositions, multiple MSI data sets from a time series were analyzed in one batch. This requires a new preprocessing strategy to achieve comparability between data sets as well as a new strategy for unsupervised clustering. The resulting spatial segmentation for each time point sample is visualized in an interactive cluster map and enables simultaneous interactive exploration of all time points. Using this new analysis approach and visualization tool, germination-dependent developments of metabolite patterns with single MS position accuracy were discovered. This is the first study that presents metabolite profiling of a cereal's germination process over time by MALDI MSI with the identification of a large number of peaks of agronomically and industrially important compounds such as oligosaccharides, lipids and antifungal agents. Their detailed localization as well as the MS cluster analyses for on-tissue metabolite profile mapping revealed important information for the understanding of the germination process, which is of high scientific interest. PMID:26938880
Schroeder, David; Korsakov, Fedor; Knipe, Carissa Mai-Ping; Thorson, Lauren; Ellingson, Arin M; Nuckley, David; Carlis, John; Keefe, Daniel F
2014-12-01
In biomechanics studies, researchers collect, via experiments or simulations, datasets with hundreds or thousands of trials, each describing the same type of motion (e.g., a neck flexion-extension exercise) but under different conditions (e.g., different patients, different disease states, pre- and post-treatment). Analyzing similarities and differences across all of the trials in these collections is a major challenge. Visualizing a single trial at a time does not work, and the typical alternative of juxtaposing multiple trials in a single visual display leads to complex, difficult-to-interpret visualizations. We address this problem via a new strategy that organizes the analysis around motion trends rather than trials. This new strategy matches the cognitive approach that scientists would like to take when analyzing motion collections. We introduce several technical innovations making trend-centric motion visualization possible. First, an algorithm detects a motion collection's trends via time-dependent clustering. Second, a 2D graphical technique visualizes how trials leave and join trends. Third, a 3D graphical technique, using a median 3D motion plus a visual variance indicator, visualizes the biomechanics of the set of trials within each trend. These innovations are combined to create an interactive exploratory visualization tool, which we designed through an iterative process in collaboration with both domain scientists and a traditionally-trained graphic designer. We report on insights generated during this design process and demonstrate the tool's effectiveness via a validation study with synthetic data and feedback from expert musculoskeletal biomechanics researchers who used the tool to analyze the effects of disc degeneration on human spinal kinematics.
NASA Technical Reports Server (NTRS)
Lawrence, Charles; Putt, Charles W.
1997-01-01
The Visual Computing Environment (VCE) is a NASA Lewis Research Center project to develop a framework for intercomponent and multidisciplinary computational simulations. Many current engineering analysis codes simulate various aspects of aircraft engine operation. For example, existing computational fluid dynamics (CFD) codes can model the airflow through individual engine components such as the inlet, compressor, combustor, turbine, or nozzle. Currently, these codes are run in isolation, making intercomponent and complete system simulations very difficult to perform. In addition, management and utilization of these engineering codes for coupled component simulations is a complex, laborious task, requiring substantial experience and effort. To facilitate multicomponent aircraft engine analysis, the CFD Research Corporation (CFDRC) is developing the VCE system. This system, which is part of NASA's Numerical Propulsion Simulation System (NPSS) program, can couple various engineering disciplines, such as CFD, structural analysis, and thermal analysis. The objectives of VCE are to (1) develop a visual computing environment for controlling the execution of individual simulation codes that are running in parallel and are distributed on heterogeneous host machines in a networked environment, (2) develop numerical coupling algorithms for interchanging boundary conditions between codes with arbitrary grid matching and different levels of dimensionality, (3) provide a graphical interface for simulation setup and control, and (4) provide tools for online visualization and plotting. VCE was designed to provide a distributed, object-oriented environment. Mechanisms are provided for creating and manipulating objects, such as grids, boundary conditions, and solution data. This environment includes parallel virtual machine (PVM) for distributed processing. 
Users can interactively select and couple any set of codes that have been modified to run in a parallel distributed fashion on a cluster of heterogeneous workstations. A scripting facility allows users to dictate the sequence of events that make up the particular simulation.
New atlas of open star clusters
NASA Astrophysics Data System (ADS)
Seleznev, Anton F.; Avvakumova, Ekaterina; Kulesh, Maxim; Filina, Julia; Tsaregorodtseva, Polina; Kvashnina, Alvira
2017-11-01
Due to numerous new discoveries of open star clusters in the last two decades, astronomers need an easy-to-use resource to get visual information on the relative position of clusters in the sky. Therefore we propose a new atlas of open star clusters. It is based on a table compiled from the largest modern cluster catalogues. The atlas shows the positions and sizes of 3291 clusters and associations, and consists of two parts. The first contains 108 maps of 12 by 12 degrees with an overlapping of 2 degrees in three strips along the Galactic equator. The second one is an online web application, which shows a square field of an arbitrary size, either in equatorial coordinates or in galactic coordinates by request. The atlas is proposed for the sampling of clusters and cluster stars for further investigation. Another use is the identification of clusters among overdensities in stellar density maps or among stellar groups in images of the sky.
OGLE II Eclipsing Binaries In The LMC: Analysis With Class
NASA Astrophysics Data System (ADS)
Devinney, Edward J.; Prsa, A.; Guinan, E. F.; DeGeorge, M.
2011-01-01
The Eclipsing Binaries (EBs) via Artificial Intelligence (EBAI) Project is applying machine learning techniques to elucidate the nature of EBs. Previously, Prsa, et al. applied artificial neural networks (ANNs) trained on physically-realistic Wilson-Devinney models to solve the light curves of the 1882 detached EBs in the LMC discovered by the OGLE II Project (Wyrzykowski, et al.) fully automatically, bypassing the need for manually-derived starting solutions. A curious result is the non-monotonic distribution of the temperature ratio parameter T2/T1, featuring a subsidiary peak noted previously by Mazeh, et al. in an independent analysis using the EBOP EB solution code (Tamuz, et al.). To explore this and to gain a fuller understanding of the multivariate EBAI LMC observational plus solutions data, we have employed automatic clustering and advanced visualization (CAV) techniques. Clustering the OGLE II data aggregates objects that are similar with respect to many parameter dimensions. Measures of similarity could include, for example, the multidimensional Euclidean distance between data objects, although other measures may be appropriate. Applying clustering, we find good evidence that the T2/T1 subsidiary peak is due to evolved binaries, in support of Mazeh et al.'s speculation. Further, clustering suggests that the LMC detached EBs occupying the main sequence region belong to two distinct classes. Also identified as a separate cluster in the multivariate data are stars having a Period-I band relation. Derekas et al. had previously found a Period-K band relation for LMC EBs discovered by the MACHO Project (Alcock, et al.). We suggest such CAV techniques will prove increasingly useful for understanding the large, multivariate datasets increasingly being produced in astronomy. We are grateful for the support of this research from NSF/RUI Grant AST-05-75042 f.
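The multidimensional Euclidean distance mentioned above as a similarity measure can be illustrated with a small Python sketch. The parameter values (an orbital period and a T2/T1 ratio per object) are invented for the example, and the z-score standardization step is an assumption added so that parameters on different scales contribute comparably; the abstract does not say how the data were scaled or which clustering algorithm was used.

```python
import math
from statistics import mean, pstdev

def standardize(rows):
    """Z-score each column so no single parameter dominates the distance."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in rows]

def euclidean(a, b):
    """Euclidean distance between two equal-length parameter vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical objects: (period in days, T2/T1)
objects = [[1.0, 0.95], [1.2, 0.97], [5.0, 0.60], [5.5, 0.62]]
z = standardize(objects)

# Index of the most similar other object for each object
nearest = [
    min((j for j in range(len(z)) if j != i), key=lambda j: euclidean(z[i], z[j]))
    for i in range(len(z))
]
print(nearest)
```

A clustering method would group objects whose mutual distances are small; here the first two and the last two objects pair up.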
Early dynamical evolution of substructured stellar clusters
NASA Astrophysics Data System (ADS)
Dorval, Julien; Boily, Christian
2015-08-01
It is now widely accepted that stellar clusters form with a high level of substructure (Kuhn et al. 2014, Bate 2009), inherited from the molecular cloud and the star formation process. Evidence from observations and simulations also indicates that the stars in such young clusters form a subvirial system (Kirk et al. 2007, Maschberger et al. 2010). The subsequent dynamical evolution can cause important mass loss, ejecting a large part of the birth population into the field. It can also imprint the stellar population and still be inferred from observations of evolved clusters. N-body simulations allow a better understanding of these early twists and turns, given realistic initial conditions. Nowadays, substructured, clumpy young clusters are usually obtained through pseudo-fractal growth (Goodwin et al. 2004) and velocity inheritance. Such models are visually realistic and very useful; however, they are somewhat artificial in their velocity distribution. I introduce a new way to create clumpy initial conditions through a "Hubble expansion" which naturally produces self-consistent clumps, velocity-wise. A velocity distribution analysis shows the new method produces realistic models, consistent with the dynamical state of the newly created cores in hydrodynamic simulations of cluster formation (Klessen & Burkert 2000). I use these initial conditions to investigate the dynamical evolution of young subvirial clusters, up to 80000 stars. I find an overall soft evolution, with hierarchical merging leading to a high level of mass segregation. I investigate the influence of the mass function on the fate of the cluster, specifically on the amount of mass loss induced by the early violent relaxation. Using a new binary detection algorithm, I also find a strong processing of the native binary population.
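The abstract does not give the recipe behind the "Hubble expansion" initial conditions. As a loose sketch only, one can assume that stars are placed uniformly in a sphere and given purely radial velocities proportional to their position, v = H0 * r. The function name, the parameter `h0`, and the uniform-sphere starting point are all assumptions for illustration; the gravitational evolution during which clumps would form is not modelled here.

```python
import random

def hubble_initial_conditions(n_stars, radius=1.0, h0=1.0, seed=0):
    """Return (position, velocity) pairs with a Hubble-like velocity field.

    Assumed setup: uniform sphere of given radius, velocity v = h0 * r.
    """
    rng = random.Random(seed)
    stars = []
    while len(stars) < n_stars:
        # Rejection sampling: draw in the enclosing cube, keep points in the sphere
        pos = [rng.uniform(-radius, radius) for _ in range(3)]
        if sum(c * c for c in pos) <= radius * radius:
            vel = [h0 * c for c in pos]  # radial velocity proportional to distance
            stars.append((pos, vel))
    return stars

stars = hubble_initial_conditions(100, radius=1.0, h0=0.5)
print(len(stars), stars[0])
```

In an actual N-body study these positions and velocities would be handed to an integrator; that step is outside the scope of this sketch.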
Hadjisolomou, Ekaterini; Stefanidis, Konstantinos; Papatheodorou, George; Papastergiadou, Evanthia
2018-03-19
During the last decades, Mediterranean freshwater ecosystems, especially lakes, have been under severe pressure due to increasing eutrophication and water quality deterioration. In this article, we compared the effectiveness of different data analysis methods by assessing the contribution of environmental parameters to eutrophication processes. For this purpose, principal components analysis (PCA), cluster analysis, and a self-organizing map (SOM) were applied, using water quality data from two transboundary lakes of North Greece. SOM is considered as an advanced and powerful data analysis tool because of its ability to represent complex and nonlinear relationships among multivariate data sets. The results of PCA and cluster analysis agreed with the SOM results, although the latter provided more information because of the visualization abilities regarding the parameters' relationships. Besides nutrients that were found to be a key factor for controlling chlorophyll-a (Chl-a), water temperature was related positively with algal production, while the Secchi disk depth parameter was found to be highly important and negatively related to eutrophic conditions. In general, the SOM results were more specific and allowed direct associations between the water quality variables. Our work showed that SOMs can be used effectively in limnological studies to produce robust and interpretable results, aiding scientists and managers to cope with environmental problems such as eutrophication.
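For readers unfamiliar with how a SOM is trained, the following is a minimal one-dimensional SOM in plain Python. It is a generic textbook-style sketch, not the configuration used in the study: the map size, the linearly decaying learning rate and neighbourhood width, and the toy two-variable data (values scaled to [0, 1]) are all assumptions for illustration.

```python
import math
import random

def best_matching_unit(weights, x):
    """Index of the unit whose weight vector is closest to sample x."""
    return min(
        range(len(weights)),
        key=lambda u: sum((wk - xk) ** 2 for wk, xk in zip(weights[u], x)),
    )

def train_som(data, n_units=4, epochs=50, lr0=0.5, sigma0=1.0, seed=0):
    """Train a 1-D SOM; returns the list of unit weight vectors."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_units)]
    n_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # shuffled pass over the data
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)                 # decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 1e-3    # shrinking neighbourhood
            bmu = best_matching_unit(weights, x)
            for u, w in enumerate(weights):
                # Gaussian neighbourhood on the 1-D map grid
                h = math.exp(-((u - bmu) ** 2) / (2.0 * sigma ** 2))
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])
            step += 1
    return weights

# Hypothetical scaled samples, e.g. (nutrient level, Chl-a)
data = [[0.0, 0.1], [0.1, 0.0], [0.9, 1.0], [1.0, 0.9]]
weights = train_som(data)
bmus = [best_matching_unit(weights, x) for x in data]
print(weights, bmus)
```

After training, samples that map to the same or neighbouring units are similar in the input space, which is what makes the map useful for visual inspection of variable relationships.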
Pérez-Rico, Consuelo; Rodríguez-González, Natividad; Arévalo-Serrano, Juan; Blanco, Román
2012-08-01
Dysthyroid optic neuropathy is the most serious, although infrequent (8-10 %), complication in Graves' orbitopathy (GO). It is known that early stages of compressive optic neuropathy may produce reversible visual field defects, suggesting axoplasmic stasis rather than ganglion cell death. This observational, cross-sectional, case-control study assessed 34 consecutive patients (65 eyes) with Graves' hyperthyroidism and longstanding GO and 31 age-matched control subjects. The patients' multifocal visual evoked potentials (mfVEP) were compared to their clinical and psychophysical (standard automated perimetry [SAP]) and structural (optical coherence tomography [OCT]) diagnostic test data. Abnormal cluster defects were found in 12.3 % and 3.1 % of eyes on the interocular and monocular amplitude analysis mfVEP probability plots, respectively. In addition, mfVEP latency delays were found in 13.8 % and 20 % of eyes on the interocular and monocular analysis probability plots, respectively. Interestingly, 19 % of patients with GO had ocular hypertension, and a strong correlation between intraocular pressure measured at upgaze and mfVEP latency was found. MfVEP amplitudes and visual acuity were significantly related to each other (P < 0.05), but not to the latency delays. However, relationships between the interocular or monocular mfVEP amplitude and latency analyses and SAP indices or OCT data were not statistically significant. One-third of our patients with GO showed changes in the mfVEP, indicating significant subclinical optic nerve dysfunction. In this sense, the mfVEP may be a useful diagnostic tool in the clinic for early diagnosis and monitoring of optic nerve function abnormalities in patients with GO.
Analyzing gene expression time-courses based on multi-resolution shape mixture model.
Li, Ying; He, Ye; Zhang, Yu
2016-11-01
Biological processes are dynamic molecular processes that unfold over time. Time-course gene expression experiments provide opportunities to explore patterns of gene expression change over time and to understand the dynamic behavior of gene expression, which is crucial for studying the development and progression of biology and disease. Analysis of gene expression time-course profiles has not been fully exploited so far and remains a challenging problem. We propose a novel shape-based mixture model clustering method for gene expression time-course profiles to explore significant gene groups. Based on multi-resolution fractal features and a mixture clustering model, we propose a multi-resolution shape mixture model algorithm. Multi-resolution fractal features are computed by wavelet decomposition, which explores patterns of change over time in gene expression at different resolutions. Our proposed multi-resolution shape mixture model algorithm is a probabilistic framework that offers a more natural and robust way of clustering time-course gene expression. We assessed the performance of our proposed algorithm using yeast time-course gene expression profiles, compared with several popular clustering methods for gene expression profiles. The grouped genes identified by the different methods are evaluated by enrichment analysis of biological pathways and known protein-protein interactions from experimental evidence. The grouped genes identified by our proposed algorithm have stronger biological significance. A novel multi-resolution shape mixture model algorithm based on multi-resolution fractal features is proposed. Our proposed model provides a new perspective and an alternative tool for visualization and analysis of time-course gene expression profiles. The R and Matlab programs are available upon request. Copyright © 2016 Elsevier Inc. All rights reserved.
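To make the idea of viewing a time course at several resolutions concrete, here is a small Haar wavelet decomposition in plain Python. The choice of the Haar wavelet and the toy expression profile are assumptions for illustration; the abstract does not state which wavelet was used, and the fractal features and mixture model built on top of the decomposition are not reproduced here.

```python
import math

def haar_step(signal):
    """One level of the orthonormal Haar transform (even-length input)."""
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail

def haar_decompose(signal, levels):
    """Return the coarsest approximation and the detail coefficients per level.

    The signal length must be divisible by 2 ** levels.
    """
    details = []
    approx = list(signal)
    for _ in range(levels):
        approx, d = haar_step(approx)
        details.append(d)  # details[0] is the finest resolution
    return approx, details

# Hypothetical expression profile sampled at 8 time points
profile = [1.0, 1.0, 2.0, 2.0, 4.0, 4.0, 8.0, 8.0]
approx, details = haar_decompose(profile, 2)
print(approx)
print(details)
```

The detail coefficients at each level summarize how the profile changes at that time scale, which is the kind of per-resolution shape information a clustering method could then use.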
Regression analysis on the variation in efficiency frontiers for prevention stage of HIV/AIDS.
Kamae, Maki S; Kamae, Isao; Cohen, Joshua T; Neumann, Peter J
2011-01-01
To investigate how the cost effectiveness of preventing HIV/AIDS varies across possible efficiency frontiers (EFs) by taking into account potentially relevant external factors, such as prevention stage, and how the EFs can be characterized using regression analysis given uncertainty of the QALY-cost estimates. We reviewed cost-effectiveness estimates for the prevention and treatment of HIV/AIDS published from 2002-2007 and catalogued in the Tufts Medical Center Cost-Effectiveness Analysis (CEA) Registry. We constructed efficiency frontier (EF) curves by plotting QALYs against costs, using methods used by the Institute for Quality and Efficiency in Health Care (IQWiG) in Germany. We stratified the QALY-cost ratios by prevention stage, country of study, and payer perspective, and estimated EF equations using log and square-root models. A total of 53 QALY-cost ratios were identified for HIV/AIDS in the Tufts CEA Registry. Plotted ratios stratified by prevention stage were visually grouped into a cluster consisting of primary/secondary prevention measures and a cluster consisting of tertiary measures. Correlation coefficients for each cluster were statistically significant. For each cluster, we derived two EF equations - one based on the log model, and one based on the square-root model. Our findings indicate that stratification of HIV/AIDS interventions by prevention stage can yield distinct EFs, and that the correlation and regression analyses are useful for parametrically characterizing EF equations. Our study has certain limitations, such as the small number of included articles and the potential for study populations to be non-representative of countries of interest. Nonetheless, our approach could help develop a deeper appreciation of cost effectiveness beyond the deterministic approach developed by IQWiG.
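The abstract names log and square-root models for the efficiency frontier without giving their functional form. One plausible reading, assumed here purely for illustration, is QALY = a + b·log(cost) and QALY = a + b·sqrt(cost), each fitted by ordinary least squares on the transformed cost. The data points and function names below are invented.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return a, b

def fit_frontier(costs, qalys, transform):
    """Fit QALY = a + b * transform(cost) (assumed model form)."""
    return fit_line([transform(c) for c in costs], qalys)

# Hypothetical QALY-cost pairs, constructed so that QALY = 0.1 * sqrt(cost)
costs = [100.0, 400.0, 900.0, 1600.0]
qalys = [1.0, 2.0, 3.0, 4.0]

a_sqrt, b_sqrt = fit_frontier(costs, qalys, math.sqrt)
a_log, b_log = fit_frontier(costs, qalys, math.log)
print(a_sqrt, b_sqrt)
print(a_log, b_log)
```

Fitting each stratum (for example, primary/secondary versus tertiary prevention) separately would give one pair of equations per cluster, in the spirit of the analysis described above.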
NASA Astrophysics Data System (ADS)
Wright, D. J.; Raad, M.; Hoel, E.; Park, M.; Mollenkopf, A.; Trujillo, R.
2016-12-01
Introduced is a new approach for processing spatiotemporal big data by leveraging distributed analytics and storage. A suite of temporally-aware analysis tools summarizes data nearby or within variable windows, aggregates points (e.g., for various sensor observations or vessel positions), reconstructs time-enabled points into tracks (e.g., for mapping and visualizing storm tracks), joins features (e.g., to find associations between features based on attributes, spatial relationships, temporal relationships or all three simultaneously), calculates point densities, finds hot spots (e.g., in species distributions), and creates space-time slices and cubes (e.g., in microweather applications with temperature, humidity, and pressure, or within human mobility studies). These "feature geo analytics" tools run in both batch and streaming spatial analysis mode as distributed computations across a cluster of servers on typical "big" data sets, where static data exist in traditional geospatial formats (e.g., shapefile) locally on a disk or file share, attached as static spatiotemporal big data stores, or streamed in near-real-time. In other words, the approach registers large datasets or data stores with ArcGIS Server, then distributes analysis across a cluster of machines for parallel processing. Several brief use cases will be highlighted based on a 16-node server cluster at 14 Gb RAM per node, allowing, for example, the buffering of over 8 million points or thousands of polygons in 1 minute. The approach is "hybrid" in that ArcGIS Server integrates open-source big data frameworks such as Apache Hadoop and Apache Spark on the cluster in order to run the analytics. In addition, the user may devise and connect custom open-source interfaces and tools developed in Python or Python Notebooks; the common denominator being the familiar REST API.
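The space-time cube aggregation mentioned above can be illustrated on a single machine with a short Python sketch. This is not the ArcGIS Server API and involves no distributed processing; the function name, cell size, time step, and sample points are assumptions chosen only to show the binning idea.

```python
from collections import Counter

def space_time_cube(points, cell_size, time_step):
    """Count points per (x cell, y cell, time slice).

    points -- iterable of (x, y, t) tuples in consistent units
    """
    cube = Counter()
    for x, y, t in points:
        key = (int(x // cell_size), int(y // cell_size), int(t // time_step))
        cube[key] += 1
    return cube

# Hypothetical observations: (x, y, seconds since start)
points = [(0.5, 0.5, 10), (0.7, 0.2, 50), (1.5, 0.5, 10), (0.1, 0.9, 70)]
cube = space_time_cube(points, cell_size=1.0, time_step=60)
for key, n in sorted(cube.items()):
    print(key, n)
```

In a distributed setting the same binning key could serve as the grouping key for a parallel aggregation, though how the described system partitions the work is not stated in the abstract.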
Fast EEG spike detection via eigenvalue analysis and clustering of spatial amplitude distribution
NASA Astrophysics Data System (ADS)
Fukami, Tadanori; Shimada, Takamasa; Ishikawa, Bunnoshin
2018-06-01
Objective. In the current study, we tested a proposed method for fast spike detection in electroencephalography (EEG). Approach. We performed eigenvalue analysis in two-dimensional space spanned by gradients calculated from two neighboring samples to detect high-amplitude negative peaks. We extracted the spike candidates by imposing restrictions on parameters regarding spike shape and eigenvalues reflecting detection characteristics of individual medical doctors. We subsequently performed clustering, classifying detected peaks by considering the amplitude distribution at 19 scalp electrodes. Clusters with a small number of candidates were excluded. We then defined a score for eliminating spike candidates for which the pattern of detected electrodes differed from the overall pattern in a cluster. Spikes were detected by setting the score threshold. Main results. Based on visual inspection by a psychiatrist experienced in EEG, we evaluated the proposed method using two statistical measures of precision and recall with respect to detection performance. We found that precision and recall exhibited a trade-off relationship. The average recall value was 0.708 in eight subjects with the score threshold that maximized the F-measure, with 58.6 ± 36.2 spikes per subject. Under this condition, the average precision was 0.390, corresponding to a false positive rate 2.09 times higher than the true positive rate. Analysis of the required processing time revealed that, using a general-purpose computer, our method could be used to perform spike detection in 12.1% of the recording time. The process of narrowing down spike candidates based on shape occupied most of the processing time. Significance. Although the average recall value was comparable with that of other studies, the proposed method significantly shortened the processing time.
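The trade-off between precision and recall, and the choice of the score threshold that maximizes the F-measure, can be sketched as follows. The candidate scores, the expert labels, and the convention that a candidate is accepted when its score is at or above the threshold are assumptions for illustration; the abstract does not define the score's direction or scale.

```python
def precision_recall_f(tp, fp, fn):
    """Precision, recall and F-measure (harmonic mean) from counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f = 2 * precision * recall / total if total else 0.0
    return precision, recall, f

def best_threshold(scores, labels, thresholds):
    """Return (threshold, precision, recall, F) with the highest F-measure."""
    best = None
    for t in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and not y)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y)
        p, r, f = precision_recall_f(tp, fp, fn)
        if best is None or f > best[3]:
            best = (t, p, r, f)
    return best

# Hypothetical candidate scores and expert labels (True = real spike)
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [True, True, False, True, False, False]
best = best_threshold(scores, labels, [0.2, 0.5, 0.75])
print(best)
```

Lowering the threshold accepts more candidates, which raises recall at the cost of precision; the F-measure gives a single number for picking a compromise.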
Chounchaisithi, Napa; Santiwong, Busayarat; Sutthavong, Sirikarn; Asvanit, Pompun
2014-02-01
Disclosing agents have a long history of use as an aid in children's tooth brushing instruction. However, their benefit when used to improve self-performed tooth brushing ability without any tooth brushing instruction has not been investigated. To evaluate the effect of disclosed plaque visualization on improving the self-performed, tooth brushing ability of primary school children. A cluster-randomized, crossover study was conducted in Nakhon Nayok province, Thailand. A total of 122 second-grade schoolchildren, aged 8-10 years old, from 12 schools were randomly divided into 2 groups. The first group was assigned to brush with disclosed plaque visualization, while the other group brushed without disclosed plaque visualization. One month later the groups switched procedures. Tooth brushing ability was evaluated by the subjects' reduction in patient hygiene performance (PHP) scores. The data were analyzed using repeated-measures analysis of variance, with significance set at p<0.05. Disclosed plaque visualization had a significant effect on improving the children's self-performed, tooth brushing ability in all areas of the mouth (p<0.001), particularly for anterior teeth, mandibular teeth, buccal surfaces, and areas adjacent to the gingival margin (p<0.001). Disclosed plaque visualization is a viable technique to improve children's self-performed tooth brushing ability, and could be used in school-based oral health promotion programs.
NASA Astrophysics Data System (ADS)
Liu, Jiangang; Tian, Jie
2007-03-01
The present study combined Independent Component Analysis (ICA) and low-resolution brain electromagnetic tomography (LORETA) to identify the spatial distribution and time course of single-trial EEG differences between neural responses to emotional and neutral stimuli. Single-trial multichannel (129-sensor) EEG records were collected from 21 healthy, right-handed subjects viewing emotional (pleasant/unpleasant) and neutral pictures selected from the International Affective Picture System (IAPS). For each subject, the single-trial EEG records for each type of emotional picture were concatenated with those for the neutral pictures, and a three-step analysis was applied to each concatenated record in the same way. First, ICA was performed to decompose each concatenated single-trial EEG record into temporally independent and spatially fixed components, namely independent components (ICs). The ICs associated with artifacts were isolated. Second, a clustering analysis grouped, across subjects, temporally and spatially similar ICs into the same clusters, in which a nonparametric permutation test on the Global Field Power (GFP) of the IC projection scalp maps identified temporal segments that differed significantly between each emotional condition and the neutral condition. Third, the brain regions accounting for those significant segments were localized spatially with LORETA analysis. In each cluster, a voxel-by-voxel randomization test identified brain regions that differed significantly between each emotional condition and the neutral condition. Compared to the neutral pictures, both types of emotional pictures elicited activation in the visual, temporal, ventromedial and dorsomedial prefrontal cortex and anterior cingulate gyrus. In addition, the pleasant pictures activated the left middle prefrontal cortex and the posterior precuneus, while the unpleasant pictures activated the right orbitofrontal cortex, posterior cingulate gyrus and somatosensory region.
Our results were consistent with other functional imaging studies, while also revealing the temporal dynamics of emotional processing in specific brain structures with high temporal resolution.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Han, Chang W.; Iddir, Hakim; Uzun, Alper
To address the challenge of fast, direct atomic-scale visualization of the diffusion of atoms and clusters on surfaces, we used aberration-corrected scanning transmission electron microscopy (STEM) with high scan speeds (as little as ~0.1 s per frame) to visualize the diffusion of (1) a heavy atom (Ir) on the surface of a support consisting of light atoms, MgO(100), and (2) an Ir3 cluster on MgO(110). Sequential Z-contrast images elucidate the diffusion mechanisms, including the hopping of Ir1 and the rotational migration of Ir3 as two Ir atoms remain anchored to the surface. Density functional theory (DFT) calculations provided estimates of the diffusion energy barriers and binding energies of the iridium species to the surfaces. The results show how the combination of fast-scan STEM and DFT calculations allows real-time visualization and fundamental understanding of surface diffusion phenomena pertaining to supported catalysts and other materials.
Personal sleep pattern visualization using sequence-based kernel self-organizing map on sound data.
Wu, Hongle; Kato, Takafumi; Yamada, Tomomi; Numao, Masayuki; Fukui, Ken-Ichi
2017-07-01
We propose a method to discover sleep patterns via clustering of sound events recorded during sleep. The proposed method extends the conventional self-organizing map algorithm by kernelization and sequence-based technologies to obtain a fine-grained map that visualizes the distribution and changes of sleep-related events. We introduced features widely applied in sound processing and popular kernel functions to the proposed method to evaluate and compare performance. The proposed method provides a new aspect of sleep monitoring because the results demonstrate that sound events can be directly correlated to an individual's sleep patterns. In addition, by visualizing the transition of cluster dynamics, sleep-related sound events were found to relate to the various stages of sleep. Therefore, these results empirically warrant future study into the assessment of personal sleep quality using sound data.
ND²AV: N-dimensional data analysis and visualization analysis for the National Ignition Campaign
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bremer, Peer-Timo; Maljovec, Dan; Saha, Avishek; ...
2015-07-01
One of the biggest challenges in high-energy physics is to analyze a complex mix of experimental and simulation data to gain new insights into the underlying physics. Currently, this analysis relies primarily on the intuition of trained experts often using nothing more sophisticated than default scatter plots. Many advanced analysis techniques are not easily accessible to scientists and not flexible enough to explore the potentially interesting hypotheses in an intuitive manner. Furthermore, results from individual techniques are often difficult to integrate, leading to a confusing patchwork of analysis snippets too cumbersome for data exploration. This paper presents a case study on how a combination of techniques from statistics, machine learning, topology, and visualization can have a significant impact in the field of inertial confinement fusion. We present the $$\mathrm{ND}^2\mathrm{AV}$$: N-dimensional data analysis and visualization framework, a user-friendly tool aimed at exploiting the intuition and current workflow of the target users. The system integrates traditional analysis approaches such as dimension reduction and clustering with state-of-the-art techniques such as neighborhood graphs and topological analysis, and custom capabilities such as defining combined metrics on the fly. All components are linked into an interactive environment that enables an intuitive exploration of a wide variety of hypotheses while relating the results to concepts familiar to the users, such as scatter plots. $$\mathrm{ND}^2\mathrm{AV}$$ uses a modular design providing easy extensibility and customization for different applications. $$\mathrm{ND}^2\mathrm{AV}$$ is being actively used in the National Ignition Campaign and has already led to a number of unexpected discoveries.
Spatiotemporal analysis of indigenous and imported dengue fever cases in Guangdong province, China.
Li, Zhongjie; Yin, Wenwu; Clements, Archie; Williams, Gail; Lai, Shengjie; Zhou, Hang; Zhao, Dan; Guo, Yansha; Zhang, Yonghui; Wang, Jinfeng; Hu, Wenbiao; Yang, Weizhong
2012-06-12
Dengue fever has been a major public health concern in China since it re-emerged in Guangdong province in 1978. This study aimed to explore spatiotemporal characteristics of dengue fever cases for both indigenous and imported cases during recent years in Guangdong province, so as to identify high-risk areas of the province and thereby help plan resource allocation for dengue interventions. Notifiable cases of dengue fever were collected from all 123 counties of Guangdong province from 2005 to 2010. Descriptive temporal and spatial analyses were conducted, including plotting of seasonal distribution of cases, and creating choropleth maps of cumulative incidence by county. The space-time scan statistic was used to determine space-time clusters of dengue fever cases at the county level, and a geographical information system was used to visualize the location of the clusters. Analyses were stratified by imported and indigenous origin. 1658 dengue fever cases were recorded in Guangdong province during the study period, including 94 imported cases and 1564 indigenous cases. Both imported and indigenous cases occurred more frequently in autumn. The areas affected by the indigenous and imported cases presented a geographically expanding trend over the study period. The results showed that the most likely cluster of imported cases (relative risk = 7.52, p < 0.001) and indigenous cases (relative risk = 153.56, p < 0.001) occurred in the Pearl River Delta Area; while a secondary cluster of indigenous cases occurred in one district of the Chao Shan Area (relative risk = 471.25, p < 0.001). This study demonstrated that the geographic range of imported and indigenous dengue fever cases has expanded over recent years, and cases were significantly clustered in two heavily urbanised areas of Guangdong province. This provides the foundation for further investigation of risk factors and interventions in these high-risk areas.
BioImageXD: an open, general-purpose and high-throughput image-processing platform.
Kankaanpää, Pasi; Paavolainen, Lassi; Tiitta, Silja; Karjalainen, Mikko; Päivärinne, Joacim; Nieminen, Jonna; Marjomäki, Varpu; Heino, Jyrki; White, Daniel J
2012-06-28
BioImageXD puts open-source computer science tools for three-dimensional visualization and analysis into the hands of all researchers, through a user-friendly graphical interface tuned to the needs of biologists. BioImageXD has no restrictive licenses or undisclosed algorithms and enables publication of precise, reproducible and modifiable workflows. It allows simple construction of processing pipelines and should enable biologists to perform challenging analyses of complex processes. We demonstrate its performance in a study of integrin clustering in response to selected inhibitors.
Broca’s area network in language function: a pooling-data connectivity study
Bernal, Byron; Ardila, Alfredo; Rosselli, Monica
2015-01-01
Background and Objective: Modern neuroimaging developments have demonstrated that cognitive functions correlate with brain networks rather than specific areas. The purpose of this paper was to analyze the connectivity of Broca’s area based on language tasks. Methods: A connectivity modeling study was performed by pooling data of Broca’s activation in language tasks. Fifty-seven papers that included 883 subjects in 84 experiments were analyzed. Analysis of Likelihood Estimates of pooled data was utilized to generate the map; thresholds at p < 0.01 were corrected for multiple comparisons and false discovery rate. Resulting images were co-registered into MNI standard space. Results: A network consisting of 16 clusters of activation was obtained. Main clusters were located in the frontal operculum, left posterior temporal region, supplementary motor area, and the parietal lobe. Less common clusters were seen in the sub-cortical structures including the left thalamus, left putamen, secondary visual areas, and the right cerebellum. Conclusion: Broca’s area-44-related networks involved in language processing were demonstrated utilizing a pooling-data connectivity study. Significance, interpretation, and limitations of the results are discussed. PMID:26074842
Sheffler, Will; Baker, David
2009-01-01
We present a novel method called RosettaHoles for visual and quantitative assessment of underpacking in the protein core. RosettaHoles generates a set of spherical cavity balls that fill the empty volume between atoms in the protein interior. For visualization, the cavity balls are aggregated into contiguous overlapping clusters and small cavities are discarded, leaving an uncluttered representation of the unfilled regions of space in a structure. For quantitative analysis, the cavity ball data are used to estimate the probability of observing a given cavity in a high-resolution crystal structure. RosettaHoles provides excellent discrimination between real and computationally generated structures, is predictive of incorrect regions in models, identifies problematic structures in the Protein Data Bank, and promises to be a useful validation tool for newly solved experimental structures. PMID:19177366
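The visualization step described above (aggregating cavity balls into contiguous overlapping clusters and discarding small ones) can be illustrated with a union-find sketch. This is not the RosettaHoles code: the function name, the overlap criterion (center distance less than the sum of radii), and the use of a ball count as the "small cavity" cutoff are all assumptions made for illustration.

```python
import math


def cluster_overlapping_balls(balls, min_size=2):
    """Group balls (x, y, z, r) into connected clusters of overlapping spheres.

    Assumption: two balls overlap when the distance between their centers is
    less than the sum of their radii. Clusters with fewer than min_size balls
    are discarded. Returns sorted lists of ball indices.
    """
    parent = list(range(len(balls)))

    def find(i):
        # Follow parent links to the cluster root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # O(n^2) pairwise check; a spatial grid would be needed for large inputs.
    for i in range(len(balls)):
        xi, yi, zi, ri = balls[i]
        for j in range(i + 1, len(balls)):
            xj, yj, zj, rj = balls[j]
            if math.dist((xi, yi, zi), (xj, yj, zj)) < ri + rj:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(balls)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(c for c in clusters.values() if len(c) >= min_size)
```

Overlap is transitive here, so a chain of balls forms one cavity even when its two ends do not touch each other directly.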
Nirmalan, P K; Thulasiraj, R D; Maneksha, V; Rahmathullah, R; Ramakrishnan, R; Padmavathi, A; Munoz, S R; Ellwein, L B
2002-01-01
Aims: To assess the prevalence of vision impairment, blindness, and cataract surgery and to evaluate visual acuity outcomes after cataract surgery in a south Indian population. Methods: Cluster sampling was used to randomly select a cross sectional sample of people ≥50 years of age living in the Tirunelveli district of south India. Eligible subjects in 28 clusters were enumerated through a door to door household survey. Visual acuity measurements and ocular examinations were performed at a selected site within each of the clusters in early 2000. The principal cause of visual impairment was identified for eyes with presenting visual acuity <6/18. Independent replicate testing for quality assurance monitoring was performed in subjects with reduced vision and in a sample of those with normal vision for six of the study clusters. Results: A total of 5795 people in 3986 households were enumerated and 5411 (93.37%) were examined. The prevalence of presenting and best corrected visual acuity ≥6/18 in both eyes was 59.4% and 75.7%, respectively. Presenting vision <6/60 in both eyes (the definition of blindness in India) was found in 11.0%, and in 4.6% with best correction. Presenting blindness was associated with older age, female sex, and illiteracy. Cataract was the principal cause of blindness in at least one eye in 70.6% of blind people. The prevalence of cataract surgery was 11.8%—with an estimated 56.5% of the cataract blind already operated on. Surgical coverage was inversely associated with illiteracy and with female sex in rural areas. Within the cataract operated sample, 31.7% had presenting visual acuity ≥6/18 in both eyes and 11.8% were <6/60; 40% were bilaterally operated on, with 63% pseudophakic. Presenting vision was <6/60 in 40.7% of aphakic eyes and in 5.1% of pseudophakic eyes; with best correction the percentages were 17.6% and 3.7%, respectively. 
Refractive error, including uncorrected aphakia, was the main cause of visual impairment in cataract operated eyes. Vision <6/18 was associated with cataract surgery in government, as opposed to that in non-governmental/private facilities. Age, sex, literacy, and area of residence were not predictors of visual outcomes. Conclusion: Treatable blindness, particularly that associated with cataract and refractive error, remains a significant problem among older adults in south Indian populations, especially in females, the illiterate, and those living in rural areas. Further study is needed to better understand why a significant proportion of the cataract blind are not taking advantage of free of charge eye care services offered by the Aravind Eye Hospital and others in the district. While continuing to increase cataract surgical volume to reduce blindness, emphasis must also be placed on improving postoperative visual acuity outcomes. PMID:11973242
Gilbert, Clare E; Shah, S P; Jadoon, M Z; Bourne, R; Dineen, B; Khan, M A; Johnson, G J; Khan, M D
2008-01-05
To explore the association between blindness and deprivation in a nationally representative sample of adults in Pakistan. Cross sectional population based survey. 221 rural and urban clusters selected randomly throughout Pakistan. Nationally representative sample of 16 507 adults aged 30 or above (95.3% response rate). Associations between visual impairment and poverty assessed by a cluster level deprivation index and a household level poverty indicator; prevalence and causes of blindness; measures of the rate of uptake and quality of eye care services. 561 blind participants (<3/60 in the better eye) were identified during the survey. Clusters in urban Sindh province were the most affluent, whereas rural areas in Balochistan were the poorest. The prevalence of blindness in adults living in affluent clusters was 2.2%, compared with 3.7% in medium clusters and 3.9% in poor clusters (P<0.001 for affluent v poor). The highest prevalence of blindness was found in rural Balochistan (5.2%). The prevalence of total blindness (bilateral no light perception) was more than three times higher in poor clusters than in affluent clusters (0.24% v 0.07%, P<0.001). The prevalences of blindness caused by cataract, glaucoma, and corneal opacity were lower in affluent clusters and households. Reflecting access to eye care services, cataract surgical coverage was higher in affluent clusters (80.6%) than in medium (76.8%) and poor areas (75.1%). Intraocular lens implantation rates were significantly lower in participants from poorer households. 10.2% of adults living in affluent clusters presented to the examination station wearing spectacles, compared with 6.7% in medium clusters and 4.4% in poor cluster areas. Spectacle coverage in affluent areas was more than double that in poor clusters (23.5% v 11.1%, P<0.001). Blindness is associated with poverty in Pakistan; lower access to eye care services was one contributory factor. 
To reduce blindness, strategies targeting poor people will be needed. These interventions may have an impact on deprivation in Pakistan.
Marvel, Skylar W; To, Kimberly; Grimm, Fabian A; Wright, Fred A; Rusyn, Ivan; Reif, David M
2018-03-05
Drawing integrated conclusions from diverse source data requires synthesis across multiple types of information. The ToxPi (Toxicological Prioritization Index) is an analytical framework that was developed to enable integration of multiple sources of evidence by transforming data into integrated, visual profiles. Methodological improvements have advanced ToxPi and expanded its applicability, necessitating a new, consolidated software platform to provide functionality, while preserving flexibility for future updates. We detail the implementation of a new graphical user interface for ToxPi that provides interactive visualization, analysis, reporting, and portability. The interface is deployed as a stand-alone, platform-independent Java application, with a modular design to accommodate inclusion of future analytics. The new ToxPi interface introduces several features, from flexible data import formats (including legacy formats that permit backward compatibility) to similarity-based clustering to options for high-resolution graphical output. We present the new ToxPi interface for dynamic exploration, visualization, and sharing of integrated data models. The ToxPi interface is freely available as a single compressed download that includes the main Java executable, all libraries, example data files, and a complete user manual from http://toxpi.org.
NASA Astrophysics Data System (ADS)
Ozturk, D.; Chaudhary, A.; Votava, P.; Kotfila, C.
2016-12-01
Jointly developed by Kitware and NASA Ames, GeoNotebook is an open source tool designed to give the maximum amount of flexibility to analysts, while dramatically simplifying the process of exploring geospatially indexed datasets. Packages like Fiona (backed by GDAL), Shapely, Descartes, Geopandas, and PySAL provide a stack of technologies for reading, transforming, and analyzing geospatial data. Combined with the Jupyter notebook and libraries like matplotlib/Basemap, it is possible to generate detailed geospatial visualizations. Unfortunately, the visualizations generated are either static or do not perform well for very large datasets. Also, this setup requires a great deal of boilerplate code to create and maintain. Other extensions exist to remedy these problems, but they provide a separate map for each input cell and do not support map interactions that feed back into the Python environment. To support interactive data exploration and visualization on large datasets, we have developed an extension to the Jupyter notebook that provides a single dynamic map that can be managed from the Python environment, and that can communicate back with a server which can perform operations like data subsetting on a cloud-based cluster.
Real-time interactive tractography analysis for multimodal brain visualization tool: MultiXplore
NASA Astrophysics Data System (ADS)
Bakhshmand, Saeed M.; de Ribaupierre, Sandrine; Eagleson, Roy
2017-03-01
Most debilitating neurological disorders can have anatomical origins. Yet unlike other body organs, the anatomy alone cannot easily provide an understanding of brain functionality. In fact, addressing the challenge of linking structural and functional connectivity remains at the frontier of neuroscience. Aggregating multimodal neuroimaging datasets may be critical for developing theories that span brain functionality, global neuroanatomy and internal microstructures. Functional magnetic resonance imaging (fMRI) and diffusion tensor imaging (DTI) are two of the main techniques employed to investigate the brain under normal and pathological conditions. fMRI records the blood oxygenation level of the grey matter (GM), whereas DTI is able to reveal the underlying structure of the white matter (WM). Global brain activity is assumed to be an integration of GM functional hubs and the WM neural pathways that serve to connect them. In this study we developed and evaluated a two-phase algorithm. This algorithm is employed in a 3D interactive connectivity visualization framework and helps to accelerate clustering of virtual neural pathways. In this paper, we detail an algorithm that makes use of an index-based membership array formed for a whole-brain tractography file and the corresponding parcellated brain atlas. Next, we demonstrate the efficiency of the algorithm by measuring the time required to extract a variety of fiber clusters, chosen to represent the full range of output file sizes the algorithm is likely to generate. The proposed algorithm facilitates real-time visual inspection of neuroimaging data to further the discovery of structure-function relationships in brain networks.
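One way to read the "index-based membership array" idea is as a precomputed index from atlas regions to the fibers passing through them, so that extracting a fiber cluster becomes a lookup rather than a scan of the whole tractography file. The sketch below is an assumption-laden illustration of that reading, not the MultiXplore algorithm: the data layout (fibers as lists of integer voxel coordinates, the atlas as a voxel-to-label dictionary) and both function names are hypothetical.

```python
from collections import defaultdict


def build_membership(fibers, atlas):
    """Map each atlas label to the set of fiber indices that pass through it.

    fibers: list of fibers, each a list of integer voxel coordinates (i, j, k).
    atlas:  dict from voxel coordinate to region label; voxels absent from the
            dict are treated as unlabeled.
    """
    membership = defaultdict(set)
    for idx, fiber in enumerate(fibers):
        for voxel in fiber:
            label = atlas.get(voxel)
            if label is not None:
                membership[label].add(idx)
    return membership


def fibers_between(membership, region_a, region_b):
    """Indices of fibers that touch both regions, found by set intersection."""
    in_a = membership.get(region_a, set())
    in_b = membership.get(region_b, set())
    return sorted(in_a & in_b)
```

The index is built once per tractography file and atlas pair; each subsequent query costs time proportional to the smaller of the two fiber sets, which is what makes interactive use plausible.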
Topographic Independent Component Analysis reveals random scrambling of orientation in visual space
Martinez-Garcia, Marina; Martinez, Luis M.; Malo, Jesús
2017-01-01
Neurons in the primary visual cortex (V1) in humans and other species are edge filters organized in orientation maps. In these maps, neurons with similar orientation preference are clustered together in iso-orientation domains. These maps have two fundamental properties: (1) retinotopy, i.e. correspondence between displacements at the image space and displacements at the cortical surface, and (2) a trade-off between good coverage of the visual field with all orientations and continuity of iso-orientation domains in the cortical space. There is an active debate on the origin of these locally continuous maps. While most of the existing descriptions take purely geometric/mechanistic approaches which disregard the network function, a clear exception to this trend in the literature is the original approach of Hyvärinen and Hoyer based on infomax and Topographic Independent Component Analysis (TICA). Although TICA successfully addresses a number of other properties of V1 simple and complex cells, in this work we question the validity of the orientation maps obtained from TICA. We argue that the maps predicted by TICA can be analyzed in the retinal space, and when doing so, it is apparent that they lack the required continuity and retinotopy. Here we show that in the orientation maps reported in the TICA literature it is easy to find examples of violation of the continuity between similarly tuned mechanisms in the retinal space, which suggests a random scrambling incompatible with the maps in primates. The new experiments in the retinal space presented here confirm this guess: TICA basis vectors actually follow a random salt-and-pepper organization back in the image space. Therefore, the interesting clusters found in the TICA topology cannot be interpreted as the actual cortical orientation maps found in cats, primates or humans. In conclusion, Topographic ICA does not reproduce cortical orientation maps. PMID:28640816
Mactaggart, Islay; Polack, Sarah; Murthy, Gvs; Kuper, Hannah
2018-06-01
To estimate the prevalence and correlates of visual impairment in Mahabubnagar district, Telangana, India. Fifty-one clusters of 80 people (all ages) were sampled with probability proportionate to size. Households within clusters were selected through compact segment sampling. Visual acuity (VA) was measured with a tumbling "E" chart. An Ophthalmic Assistant or Vision Technician examined people with VA<6/12 in either eye. Other impairments (hearing, physical) were clinically assessed and self-reported functional difficulties measured using the Washington Group Extended Set. People with visual impairment and age-sex matched controls with normal vision were interviewed about poverty, employment and education. 4,125 people were enumerated and 3,574 screened (86.6%). The prevalence of visual impairment (VA<6/12) was 8.0% (95% CI = 6.9-9.4%) and that of blindness was 0.4% (95% CI = 0.2-0.9%), and both increased rapidly with age. Uncorrected refractive error was the leading cause of visual impairment, and cataract the leading cause of blindness. Cataract surgical coverage (proportion of all cataracts that had received surgery) was relatively low (41% of people at VA<6/60), while the post-surgery outcomes were good (81% of operated eyes had presenting VA≥6/18). Among the 287 people with visual impairment, 15% had a moderate/severe physical impairment or epilepsy and 25% had a moderate/severe hearing impairment. Self-reported difficulties in vision were relatively closely related to visual acuity. People with visual impairment were more likely to be in the poorest quartile (OR = 1.9, 95% CI = 1.0-3.4) or unemployed (OR = 5.0, 95% CI = 2.2-10.0), compared to controls. Visual impairment was common in Mahabubnagar district, was mostly avoidable, and was correlated with poverty markers.
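Sampling clusters "with probability proportionate to size" is commonly done with a systematic scheme over cumulative population counts. The sketch below shows that common scheme as an illustration only; the abstract does not say which PPS procedure the survey used, and the function name and parameters are hypothetical.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def pps_systematic_sample(sizes, n_clusters, rng=None):
    """Pick n_clusters unit indices with probability proportional to size.

    sizes: population of each candidate unit (e.g. village), in list order.
    Uses a random start and a fixed sampling interval over the cumulative
    population. A unit larger than the interval can be picked more than once.
    """
    rng = rng or random.Random()
    cumulative = list(accumulate(sizes))
    interval = cumulative[-1] / n_clusters
    start = rng.random() * interval
    picks = []
    for k in range(n_clusters):
        point = start + k * interval
        # Unit i covers the half-open range [cumulative[i-1], cumulative[i]).
        index = bisect_right(cumulative, point)
        picks.append(min(index, len(sizes) - 1))  # guard against float round-off
    return picks
```

Within each selected cluster, a fixed number of people (80 in the study above) is then enrolled, so that every individual in the sampling frame has roughly the same overall chance of selection.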
Visualization of protein interaction networks: problems and solutions
2013-01-01
Background: Visualization concerns the representation of data visually and is an important task in scientific research. Protein-protein interactions (PPI) are discovered using either wet lab techniques, such as mass spectrometry, or in silico prediction tools, resulting in large collections of interactions stored in specialized databases. The set of all interactions of an organism forms a protein-protein interaction network (PIN) and is an important tool for studying the behaviour of the cell machinery. Since graphic representation of PINs may highlight important substructures, e.g. protein complexes, visualization is increasingly used to study the underlying graph structure of PINs. Although graphs are well-known data structures, several open problems remain in PIN visualization: the high number of nodes and connections, the heterogeneity of nodes (proteins) and edges (interactions), and the possibility of annotating proteins and interactions with biological information extracted from ontologies (e.g. Gene Ontology), which enriches the PINs with semantic information but complicates their visualization. Methods: In recent years many software tools for the visualization of PINs have been developed. Initially designed for visualization only, some of them have subsequently been enriched with new functions for PPI data management and PIN analysis. The paper analyzes the main software tools for PIN visualization considering four main criteria: (i) technology, i.e. availability/license of the software and supported OS (Operating System) platforms; (ii) interoperability, i.e. ability to import/export networks in various formats, ability to export data in a graphic format, extensibility of the system, e.g. through plug-ins; (iii) visualization, i.e. supported layout and rendering algorithms and availability of parallel implementation; (iv) analysis, i.e. availability of network analysis functions, such as clustering or mining of the graph, and the possibility to interact with external databases. Results: Currently, many tools are available and it is not easy for users to choose among them. Some tools offer sophisticated 2D and 3D network visualization with many layout algorithms; other tools are more data-oriented and support integration of interaction data coming from different sources and data annotation. Finally, some specialized tools are dedicated to the analysis of pathways and cellular processes and are oriented toward systems biology studies, where the dynamic aspects of the processes being studied are central. Conclusion: A current trend is the deployment of open, extensible visualization tools (e.g. Cytoscape) that may be incrementally enriched by the interactomics community with novel and more powerful functions for PIN analysis through the development of plug-ins. Another emerging trend is the efficient, parallel implementation of the visualization engine, which may provide high interactivity and near real-time response, as in NAViGaTOR. From a technological point of view, open-source, free and extensible tools like Cytoscape offer long-term sustainability owing to their large developer and user communities, and provide great flexibility since new functions are continuously added by the developer community through new plug-ins, whereas the emerging parallel, often closed-source tools like NAViGaTOR can offer near real-time response even in the analysis of very large PINs. PMID:23368786
Evaluating Combinations of Ranked Lists and Visualizations of Inter-Document Similarity.
ERIC Educational Resources Information Center
Allan, James; Leuski, Anton; Swan, Russell; Byrd, Donald
2001-01-01
Considers how ideas from document clustering can be used to improve retrieval accuracy of ranked lists in interactive systems and how to evaluate system effectiveness. Describes a TREC (Text Retrieval Conference) study that constructed and evaluated systems that present the user with ranked lists and a visualization of inter-document similarities.…
Liem, David Alexandre; Murali, Sanjana; Sigdel, Dibakar; Shi, Yu; Wang, Xuan; Shen, Jiaming; Choi, Howard; Caufield, J Harry; Wang, Wei; Ping, Peipei; Han, Jiawei
2018-05-18
Extracellular matrix (ECM) proteins have been shown to play important roles regulating multiple biological processes in an array of organ systems, including the cardiovascular system. By using a novel bioinformatics text-mining tool, we studied six categories of cardiovascular disease (CVD), namely ischemic heart disease (IHD), cardiomyopathies (CM), cerebrovascular accident (CVA), congenital heart disease (CHD), arrhythmias (ARR), and valve disease (VD), anticipating novel ECM protein-disease and protein-protein relationships hidden within vast quantities of textual data. We conducted a phrase-mining analysis, delineating the relationships of 709 ECM proteins with the six groups of CVDs reported in 1,099,254 abstracts. The technology pipeline known as Context-aware Semantic Online Analytical Processing (CaseOLAP) was applied to semantically rank the association of proteins to each and all six CVDs, performing analyses to quantify each protein-disease relationship. We performed principal component analysis and hierarchical clustering of the data, where each protein is visualized as a six dimensional vector. We found that ECM proteins display variable degrees of association with the six CVDs; certain CVDs share groups of associated proteins whereas others have divergent protein associations. We identified 82 ECM proteins sharing associations with all six CVDs. Our bioinformatics analysis ascribed distinct ECM pathways (via Reactome) from this subset of proteins, namely insulin-like growth factor regulation and interleukin-4 and interleukin-13 signaling, suggesting their contribution to the pathogenesis of all six CVDs. Finally, we performed hierarchical clustering analysis and identified protein clusters associated with a targeted CVD; analyses revealed unexpected insights underlying ECM-pathogenesis of CVDs.
Towards semi-automatic rock mass discontinuity orientation and set analysis from 3D point clouds
NASA Astrophysics Data System (ADS)
Guo, Jiateng; Liu, Shanjun; Zhang, Peina; Wu, Lixin; Zhou, Wenhui; Yu, Yinan
2017-06-01
Obtaining accurate information on rock mass discontinuities for deformation analysis and the evaluation of rock mass stability is important. Obtaining measurements for high and steep zones with the traditional compass method is difficult. Photogrammetry, three-dimensional (3D) laser scanning and other remote sensing methods have gradually become mainstream methods. In this study, a method that is based on a 3D point cloud is proposed to semi-automatically extract rock mass structural plane information. The original data are pre-treated prior to segmentation by removing outlier points. The next step is to segment the point cloud into different point subsets. Various parameters, such as the normal, dip/direction and dip, can be calculated for each point subset after obtaining the equation of the best fit plane for the relevant point subset. A cluster analysis (a point subset that satisfies some conditions and thus forms a cluster) is performed based on the normal vectors by introducing the firefly algorithm (FA) and the fuzzy c-means (FCM) algorithm. Finally, clusters that belong to the same discontinuity sets are merged and coloured for visualization purposes. A prototype system is developed based on this method to extract the points of the rock discontinuity from a 3D point cloud. A comparison with existing software shows that this method is feasible. This method can provide a reference for rock mechanics, 3D geological modelling and other related fields.
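The plane-fitting step described in this abstract (best-fit plane per point subset, then normal, dip direction and dip) can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes coordinates are given as x = east, y = north, z = up, uses an SVD-based least-squares fit, and the function name `fit_plane_orientation` is hypothetical.

```python
import numpy as np

def fit_plane_orientation(points):
    """Fit a least-squares plane to an (N, 3) array of points.

    Assumes x = east, y = north, z = up (an assumption of this sketch).
    Returns (unit_normal, dip_deg, dip_direction_deg).
    """
    pts = np.asarray(points, dtype=float)
    centroid = pts.mean(axis=0)
    # The right singular vector with the smallest singular value
    # is the normal of the best-fit plane through the centroid.
    _, _, vt = np.linalg.svd(pts - centroid)
    normal = vt[-1]
    if normal[2] < 0:
        normal = -normal  # orient the normal upward
    # Dip is the angle between the plane and the horizontal.
    dip = np.degrees(np.arccos(np.clip(normal[2], -1.0, 1.0)))
    # The horizontal part of the upward normal points down-dip;
    # the azimuth is measured clockwise from north.
    dip_direction = np.degrees(np.arctan2(normal[0], normal[1])) % 360.0
    return normal, dip, dip_direction

# Synthetic patch dipping 45 degrees toward the east (z = -x).
xs, ys = np.meshgrid(np.arange(5.0), np.arange(5.0))
patch = np.column_stack([xs.ravel(), ys.ravel(), -xs.ravel()])
normal, dip, dip_direction = fit_plane_orientation(patch)
```

For a horizontal plane the dip direction is undefined, so a caller would need to treat near-zero dips separately. The resulting unit normals are the kind of input on which the clustering stage (FA and FCM in the abstract) would operate.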
OGLE Collection of Star Clusters. New Objects in the Outskirts of the Large Magellanic Cloud
NASA Astrophysics Data System (ADS)
Sitek, M.; Szymański, M. K.; Skowron, D. M.; Udalski, A.; Kostrzewa-Rutkowska, Z.; Skowron, J.; Karczmarek, P.; Cieślar, M.; Wyrzykowski, Ł.; Kozłowski, S.; Pietrukowicz, P.; Soszyński, I.; Mróz, P.; Pawlak, M.; Poleski, R.; Ulaczyk, K.
2016-09-01
The Magellanic System (MS), consisting of the Large Magellanic Cloud (LMC), the Small Magellanic Cloud (SMC) and the Magellanic Bridge (MBR), contains a diverse sample of star clusters. Their spatial distribution, ages and chemical abundances may provide important information about the history of formation of the whole System. We use deep photometric maps derived from the images collected during the fourth phase of the Optical Gravitational Lensing Experiment (OGLE-IV) to construct the most complete catalog of star clusters in the Large Magellanic Cloud using homogeneous photometric data. In this paper we present the collection of star clusters found in the area of about 225 square degrees in the outer regions of the LMC. Our sample contains 679 visually identified star cluster candidates, 226 of which were not listed in any of the previously published catalogs. The new clusters are mainly young small open clusters or clusters similar to associations.
Spatio-temporal analysis of small-area intestinal parasites infections in Ghana.
Osei, F B; Stein, A
2017-09-22
Intestinal parasite infection is a major public health burden in low and middle-income countries. In Ghana, it is amongst the top five morbidities. In order to optimize scarce resources, reliable information on its geographical distribution is needed to guide periodic mass drug administration to populations at high risk. We analyzed district-level morbidities of intestinal parasites between 2010 and 2014 using exploratory spatial analysis and geostatistics. We found a significantly positive Moran's Index of spatial autocorrelation for each year, suggesting that adjoining districts have similar risk levels. Using local Moran's Index, we found high-high clusters extending towards the Guinea and Sudan Savannah ecological zones, whereas low-low clusters extended within the semi-deciduous forest and transitional ecological zones. Variograms indicated that local and regional scale risk factors modulate the variation of intestinal parasites. Poisson kriging maps showed a smoothed, spatially varying distribution of intestinal parasite risk. These results emphasize the need for a follow-up investigation into the exact determining factors modulating the observed patterns. The findings also underscore the potential of exploratory spatial analysis and geostatistics as tools for visualizing the spatial distribution of small-area intestinal worm infections.
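The global Moran's Index mentioned in this abstract can be illustrated with a short sketch. It uses the standard formula I = (n / S0) · (zᵀWz) / (zᵀz), where z holds the mean-centred values and S0 is the sum of all spatial weights. The four-district chain and its binary adjacency weights are made-up toy data, not the Ghana data set, and the function name `morans_i` is hypothetical.

```python
import numpy as np

def morans_i(values, weights):
    """Global Moran's I for values observed on n areas.

    weights is an (n, n) spatial weight matrix with a zero diagonal;
    here a binary adjacency matrix is assumed.
    """
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    z = x - x.mean()          # deviations from the mean
    s0 = w.sum()              # sum of all weights
    return (len(x) / s0) * (z @ w @ z) / (z @ z)

# Toy example: four districts in a row, each adjacent to its neighbours.
adjacency = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])
smooth = morans_i([1, 2, 3, 4], adjacency)       # similar neighbours
alternating = morans_i([1, 4, 1, 4], adjacency)  # dissimilar neighbours
```

A positive value (the smooth gradient) indicates that adjoining areas have similar levels, while a negative value (the alternating pattern) indicates dissimilar neighbours. Judging significance, as the abstract reports, would additionally require a permutation or analytical test, which this sketch omits.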
TrackMate: An open and extensible platform for single-particle tracking.
Tinevez, Jean-Yves; Perry, Nick; Schindelin, Johannes; Hoopes, Genevieve M; Reynolds, Gregory D; Laplantine, Emmanuel; Bednarek, Sebastian Y; Shorte, Spencer L; Eliceiri, Kevin W
2017-02-15
We present TrackMate, an open source Fiji plugin for the automated, semi-automated, and manual tracking of single-particles. It offers a versatile and modular solution that works out of the box for end users, through a simple and intuitive user interface. It is also easily scriptable and adaptable, operating equally well on 1D over time, 2D over time, 3D over time, or other single and multi-channel image variants. TrackMate provides several visualization and analysis tools that aid in assessing the relevance of results. The utility of TrackMate is further enhanced through its ability to be readily customized to meet specific tracking problems. TrackMate is an extensible platform where developers can easily write their own detection, particle linking, visualization or analysis algorithms within the TrackMate environment. This evolving framework provides researchers with the opportunity to quickly develop and optimize new algorithms based on existing TrackMate modules without the need of having to write de novo user interfaces, including visualization, analysis and exporting tools. The current capabilities of TrackMate are presented in the context of three different biological problems. First, we perform Caenorhabditis-elegans lineage analysis to assess how light-induced damage during imaging impairs its early development. Our TrackMate-based lineage analysis indicates the lack of a cell-specific light-sensitive mechanism. Second, we investigate the recruitment of NEMO (NF-κB essential modulator) clusters in fibroblasts after stimulation by the cytokine IL-1 and show that photodamage can generate artifacts in the shape of TrackMate characterized movements that confuse motility analysis. Finally, we validate the use of TrackMate for quantitative lifetime analysis of clathrin-mediated endocytosis in plant cells. Copyright © 2016 The Author(s). Published by Elsevier Inc. All rights reserved.
Borra, Elena; Visco-Comandini, Federica; Averbeck, Bruno B.
2017-01-01
The statistical structure of intrinsic parietal and parieto-frontal connectivity in monkeys was studied through hierarchical cluster analysis. Based on their inputs, parietal and frontal areas were grouped into different clusters, including a variable number of areas that in most instances occupied contiguous architectonic fields. Connectivity tended to be stronger locally: that is, within areas of the same cluster. Distant frontal and parietal areas were targeted through connections that in most instances were reciprocal and often of different strength. These connections linked parietal and frontal clusters formed by areas sharing basic functional properties. This led to five different medio-laterally oriented pillar domains spanning the entire extent of the parieto-frontal system, in the posterior parietal, anterior parietal, cingulate, frontal, and prefrontal cortex. Different information processing streams could be identified thanks to inter-domain connectivity. These streams encode fast hand reaching and its control, complex visuomotor action spaces, hand grasping, action/intention recognition, oculomotor intention and visual attention, behavioral goals and strategies, and reward and decision value outcome. Most of these streams converge on the cingulate domain, the main hub of the system. All of them are embedded within a larger eye–hand coordination network, from which they can be selectively set in motion by task demands. PMID:28275714
2017-12-08
Release Date: March 10, 2010 - Distant galaxy clusters mysteriously stream at a million miles per hour along a path roughly centered on the southern constellations Centaurus and Hydra. A new study led by Alexander Kashlinsky at NASA's Goddard Space Flight Center in Greenbelt, Md., tracks this collective motion -- dubbed the "dark flow" -- to twice the distance originally reported, out to more than 2.5 billion light-years. Abell 1689, redshift 0.181. Credit: NASA/Goddard Space Flight Center/Scientific Visualization Studio/ESA/L. Bradley/JHU To learn more go to: www.nasa.gov/centers/goddard/news/releases/2010/10-023.html To see other visualizations related to this story go to: svs.gsfc.nasa.gov/goto?10580
Comparison of memory thresholds for planar qudit geometries
NASA Astrophysics Data System (ADS)
Marks, Jacob; Jochym-O'Connor, Tomas; Gheorghiu, Vlad
2017-11-01
We introduce and analyze a new type of decoding algorithm called general color clustering, based on renormalization group methods, to be used in qudit color codes. The performance of this decoder is analyzed under a generalized bit-flip error model, and is used to obtain the first memory threshold estimates for qudit 6-6-6 color codes. The proposed decoder is compared with similar decoding schemes for qudit surface codes as well as the current leading qubit decoders for both sets of codes. We find that, as with surface codes, clustering performs sub-optimally for qubit color codes, giving a threshold of 5.6% compared to the 8.0% obtained through surface projection decoding methods. However, the threshold rate increases by up to 112% for large qudit dimensions, plateauing around 11.9%. All the analysis is performed using QTop, a new open-source software for simulating and visualizing topological quantum error correcting codes.
A superior edge preserving filter with a systematic analysis
NASA Technical Reports Server (NTRS)
Holladay, Kenneth W.; Rickman, Doug
1991-01-01
A new, adaptive, edge preserving filter for use in image processing is presented. It had superior performance when compared to other filters. Termed the contiguous K-average, it aggregates pixels by examining all pixels contiguous to an existing cluster and adding the pixel closest to the mean of the existing cluster. The process is iterated until K pixels have been accumulated. Rather than simply comparing the visual results of processing with this operator to those of other filters, some approaches were developed which allow quantitative evaluation of how well a filter performs. Particular attention is given to the standard deviation of noise within a feature and the stability of imagery under iterative processing. Demonstrations illustrate the performance of several filters in discriminating against noise and retaining edges, the effect of filtering as a preprocessing step, and the utility of the contiguous K-average filter when used with remote sensing data.
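The aggregation rule described in this abstract can be sketched in a few lines. This is an illustrative reading of the description and not the authors' code. Three details are assumptions of the sketch: each cluster is seeded with the pixel being filtered, contiguity means 4-connectivity, and the output value is the mean of the K accumulated pixels. The function name `contiguous_k_average` is hypothetical.

```python
import numpy as np

def contiguous_k_average(image, k):
    """Edge-preserving smoothing by growing a k-pixel cluster per pixel."""
    img = np.asarray(image, dtype=float)
    rows, cols = img.shape
    out = np.empty_like(img)
    steps = ((-1, 0), (1, 0), (0, -1), (0, 1))  # 4-connectivity (assumption)
    for r0 in range(rows):
        for c0 in range(cols):
            cluster = {(r0, c0)}      # seed with the pixel itself (assumption)
            total = img[r0, c0]
            frontier = set()          # pixels contiguous to the cluster
            newest = (r0, c0)
            while len(cluster) < k:
                r, c = newest
                for dr, dc in steps:
                    nb = (r + dr, c + dc)
                    inside = 0 <= nb[0] < rows and 0 <= nb[1] < cols
                    if inside and nb not in cluster:
                        frontier.add(nb)
                if not frontier:
                    break             # image holds fewer than k pixels
                mean = total / len(cluster)
                # Add the contiguous pixel whose value is closest to the mean.
                newest = min(frontier, key=lambda p: abs(img[p] - mean))
                frontier.remove(newest)
                cluster.add(newest)
                total += img[newest]
            out[r0, c0] = total / len(cluster)
    return out

# A sharp step edge survives unchanged; an isolated spike is damped.
step = np.array([[0.0, 0.0, 10.0, 10.0]] * 4)
step_out = contiguous_k_average(step, 4)
spike = np.zeros((3, 3))
spike[1, 1] = 8.0
spike_out = contiguous_k_average(spike, 4)
```

Because each cluster grows toward similar-valued neighbours, clusters on either side of the step never cross it, which is the edge-preserving behaviour the abstract describes. The per-pixel growth makes this sketch slow on large images; it is meant only to show the rule.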
NASA Astrophysics Data System (ADS)
Park, Sang Cheol; Zheng, Bin; Wang, Xiao-Hui; Gur, David
2008-03-01
Digital breast tomosynthesis (DBT) has emerged as a promising imaging modality for screening mammography. However, visually detecting micro-calcification clusters depicted on DBT images is a difficult task. Computer-aided detection (CAD) schemes for detecting micro-calcification clusters depicted on mammograms can achieve high performance and the use of CAD results can assist radiologists in detecting subtle micro-calcification clusters. In this study, we compared the performance of an available 2D based CAD scheme with one that includes a new grouping and scoring method when applied to both projection and reconstructed DBT images. We selected a dataset involving 96 DBT examinations acquired on 45 women. Each DBT image set included 11 low dose projection images and a varying number of reconstructed image slices ranging from 18 to 87. In this dataset 20 true-positive micro-calcification clusters were visually detected on the projection images and 40 were visually detected on the reconstructed images, respectively. We first applied the CAD scheme that was previously developed in our laboratory to the DBT dataset. We then tested a new grouping method that defines an independent cluster by grouping the same cluster detected on different projection or reconstructed images. We then compared four scoring methods to assess the CAD performance. The maximum sensitivity level observed for the different grouping and scoring methods were 70% and 88% for the projection and reconstructed images with a maximum false-positive rate of 4.0 and 15.9 per examination, respectively. 
This preliminary study demonstrates that (1) among the maximum, the minimum or the average CAD generated scores, using the maximum score of the grouped cluster regions achieved the highest performance level, (2) the histogram based scoring method is reasonably effective in reducing false-positive detections on the projection images but the overall CAD sensitivity is lower due to lower signal-to-noise ratio, and (3) CAD achieved higher sensitivity and higher false-positive rate (per examination) on the reconstructed images. We concluded that without changing the detection threshold or performing pre-filtering to possibly increase detection sensitivity, current CAD schemes developed and optimized for 2D mammograms perform relatively poorly and need to be re-optimized using DBT datasets and new grouping and scoring methods need to be incorporated into the schemes if these are to be used on the DBT examinations.
Hanzen, Gineke; van Nispen, Ruth M A; van der Putten, Annette A J; Waninge, Aly
2017-02-01
The available opinions regarding participation do not appear to be applicable to adults with visual and severe or profound intellectual disabilities (VSPID). Because a clear definition and operationalization are lacking, it is difficult for support professionals to give meaning to participation for adults with VSPID. The purpose of the present study was to develop a definition and operationalization of the concept of participation of adults with VSPID. Parents or family members, professionals, and experts participated in an online concept mapping procedure. This procedure includes generating statements, clustering them, and rating their importance. The data were analyzed quantitatively using multidimensional scaling and qualitatively with triangulation. A total of 53 participants generated 319 statements of which 125 were clustered and rated. The final cluster map of the statements contained seven clusters: (1) Experience and discover; (2) Inclusion; (3) Involvement; (4) Leisure and recreation; (5) Communication and being understood; (6) Social relations; and (7) Self-management and autonomy. The average importance rating of the statements varied from 6.49 to 8.95. A definition of participation of this population was developed which included these seven clusters. The combination of the developed definition, the clusters, and the statements in these clusters, derived from the perceptions of parents or family members, professionals, and experts, can be employed to operationalize the construct of participation of adults with VSPID. This operationalization supports professionals in their ability to give meaning to participation in these adults. Future research will focus on using the operationalization as a checklist of participation for adults with VSPID. Copyright © 2016 Elsevier Ltd. All rights reserved.
Clustering document fragments using background color and texture information
NASA Astrophysics Data System (ADS)
Chanda, Sukalpa; Franke, Katrin; Pal, Umapada
2012-01-01
Forensic analysis of questioned documents can be highly data intensive. A forensic expert might need to analyze a heap of document fragments, and in such cases, to ensure reliability, he/she should focus only on the relevant evidence hidden in those fragments. Retrieving relevant documents requires finding similar document fragments. One way of obtaining such similar documents is to use the physical characteristics of a document fragment, such as color and texture. In this article we propose an automatic scheme to retrieve similar document fragments based on the visual appearance of the document paper and its texture. Multispectral color characteristics using biologically inspired color differentiation techniques are implemented here. This is done by projecting document color characteristics to Lab color space. Gabor filter-based texture analysis is used to identify document texture. Document fragments from the same source are expected to have similar color and texture. For clustering similar document fragments of our test dataset we use a Self Organizing Map (SOM) of dimension 5×5, where the document color and texture information are used as features. We obtained an encouraging accuracy of 97.17% from 1063 test images.
Korkmaz, Selcuk; Zararsiz, Gokmen; Goksuluk, Dincer
2015-01-01
Virtual screening is an important step in the early phase of the drug discovery process. Since there are thousands of compounds, this step should be both fast and effective in order to distinguish drug-like and nondrug-like molecules. Statistical machine learning methods are widely used in drug discovery studies for classification purposes. Here, we aim to develop a new tool, which can classify molecules as drug-like and nondrug-like based on various machine learning methods, including discriminant, tree-based, kernel-based, ensemble and other algorithms. To construct this tool, the performances of twenty-three different machine learning algorithms were first compared using ten different measures; the ten best performing algorithms were then selected based on principal component and hierarchical cluster analysis results. Besides classification, this application can also create heat maps and dendrograms for visual inspection of the molecules through hierarchical cluster analysis. Moreover, users can connect to the PubChem database to download molecular information and to create two-dimensional structures of compounds. This application is freely available through www.biosoft.hacettepe.edu.tr/MLViS/. PMID:25928885
On the analysis of large data sets
NASA Astrophysics Data System (ADS)
Ruch, Gerald T., Jr.
We present a set of tools and techniques for performing detailed comparisons between computational models with high dimensional parameter spaces and large sets of archival data. By combining a principal component analysis of a large grid of samples from the model with an artificial neural network, we create a powerful data visualization tool as well as a way to robustly recover physical parameters from a large set of experimental data. Our techniques are applied in the context of circumstellar disks, the likely sites of planetary formation. An analysis is performed applying the two layer approximation of Chiang et al. (2001) and Dullemond et al. (2001) to the archive created by the Spitzer Space Telescope Cores to Disks Legacy program. We find two populations of disk sources. The first population is characterized by the lack of a puffed up inner rim while the second population appears to contain an inner rim which casts a shadow across the disk. The first population also exhibits a trend of increasing spectral index while the second population exhibits a decreasing trend in the strength of the 20 μm silicate emission feature. We also present images of the giant molecular cloud W3 obtained with the Infrared Array Camera (IRAC) and the Multiband Imaging Photometer (MIPS) on board the Spitzer Space Telescope. The images encompass the star forming regions W3 Main, W3(OH), and a region that we refer to as the Central Cluster which encloses the emission nebula IC 1795. We present a star count analysis of the point sources detected in W3. The star count analysis shows that the stellar population of the Central Cluster, when compared to that in the background, contains an overdensity of sources. The Central Cluster also contains an excess of sources with colors consistent with Class II Young Stellar Objects (YSOs). An analysis of the color-color diagrams also reveals a large number of Class II YSOs in the Central Cluster. 
Our results suggest that an earlier epoch of star formation created the Central Cluster, created a cavity, and triggered the active star formation in the W3 Main and W3(OH) regions. We also detect a new outflow and its candidate exciting star.
Hubble Space Telescope,Spitzer Space Telescope
2018-01-11
This image showcases both the visible and infrared visualizations of the Orion Nebula. This view from a movie sequence looks down the 'valley' leading to the star cluster at the far end. The left side of the image shows the visible-light visualization, which fades to the infrared-light visualization on the right. These two contrasting models derive from observations by the Hubble and Spitzer space telescopes. An animation is available at https://photojournal.jpl.nasa.gov/catalog/PIA22089
Jin, Lingmin; Sun, Jinbo; Xu, Ziliang; Yang, Xuejuan; Liu, Peng; Qin, Wei
2018-02-01
To use a promising analytical method, namely intersubject synchronisation (ISS), to evaluate the brain activity associated with the instant effects of acupuncture and compare the findings with traditional general linear model (GLM) methods. 30 healthy volunteers were recruited for this study. Block-designed manual acupuncture stimuli were delivered at SP6, and de qi sensations were measured after acupuncture stimulation. All subjects underwent functional MRI (fMRI) scanning during the acupuncture stimuli. The fMRI data were separately analysed by ISS and traditional GLM methods. All subjects experienced de qi sensations. ISS analysis showed that the regions activated during acupuncture stimulation at SP6 were mainly divided into five clusters based on the time courses. The time courses of clusters 1 and 2 were in line with the acupuncture stimulation pattern, and the active regions were mainly involved in the sensorimotor system and salience network. Clusters 3, 4 and 5 displayed an almost contrary time course relative to the stimulation pattern. The brain regions activated included the default mode network, descending pain modulation pathway and visual cortices. GLM analysis indicated that the brain responses associated with the instant effects of acupuncture were largely implicated in sensory and motor processing and sensory integration. The ISS analysis considered the sustained effect of acupuncture and uncovered additional information not shown by GLM analysis. We suggest that ISS may be a suitable approach to investigate the brain responses associated with the instant effects of acupuncture. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
Al-Samarrai, Taha H.; Zhang, Ningxin; Lamont, Iain L.; Martin, Lois; Kolbe, John; Wilsher, Margaret; Morris, Arthur J.; Schmid, Jan
2000-01-01
We describe here a method for computer-assisted fingerprinting of Pseudomonas aeruginosa. In this method, DNA is digested with SalI, and bands with molecular sizes of ≥9.7 kb are visually scored after electrophoresis on agarose gels. Pattern scores are entered into a Microsoft Excel database. In scoring, the number of bands within each of a set of molecular size ranges is scored, rather than the absolute molecular size of each band, substantially enhancing the speed and reproducibility of the method, while eliminating the need for using expensive gel scanning equipment and software. Pattern scores are used to generate matrices of genetic distance values, which can be visualized in neighbor-joining trees. The method reliably distinguishes two epidemiologically unrelated isolates in 99.3% of all comparisons. The genetic relationships between isolates observed with the method were consistent with those obtained by analysis of two P. aeruginosa genes, indicating that it provides valid estimates of genetic divergence between isolates. Using the method, respiratory tract isolates from cystic fibrosis patients in Green Lane Hospital in Auckland, New Zealand, were shown to be genetically less diverse than epidemiologically unrelated isolates from other patients. This finding was not due to the existence of clusters of related strains specialized toward colonization of the respiratory tract and thus was indicative of transmission between patients. Analysis of multiple isolates from individual cystic fibrosis patients suggested that up to five separate clusters of genetically related strains may simultaneously be present in a patient. The method described should significantly enhance our ability to investigate the epidemiology of P. aeruginosa. PMID:11101578
Analysis of neoplastic lesions in magnetic resonance imaging using self-organizing maps.
Mei, Paulo Afonso; de Carvalho Carneiro, Cleyton; Fraser, Stephen J; Min, Li Li; Reis, Fabiano
2015-12-15
To provide an improved method for the identification and analysis of brain tumors in MRI scans using a semi-automated computational approach that has the potential to provide a more objective, precise and quantitatively rigorous analysis compared to human visual analysis. The Self-Organizing Map (SOM) is an unsupervised, exploratory data analysis tool, which can automatically partition an image into self-similar regions or clusters, based on measures of similarity. It can be used to partition brain tissue on MR images into such regions without prior knowledge. We used SOM to analyze T1, T2 and FLAIR acquisitions from two MRI machines in our service from 14 patients with brain tumors confirmed by biopsies--three lymphomas, six glioblastomas, one meningioma, one ganglioglioma, two oligoastrocytomas and one astrocytoma. The SOM software was used to analyze the data from the three image acquisitions from each patient and generated a self-organized map for each containing 25 clusters. Damaged tissue was separated from the normal tissue using the SOM technique. Furthermore, in some cases it allowed different areas within the tumor, such as edema/peritumoral infiltration and necrosis, to be separated. In lesions with less precise boundaries in FLAIR, the estimated damaged tissue area in the resulting map appears bigger. Our results showed that SOM has the potential to be a powerful MR imaging analysis technique for the assessment of brain tumors. Copyright © 2015. Published by Elsevier B.V.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lee, Hyun Jung; McDonnell, Kevin T.; Zelenyuk, Alla
2014-03-01
Although the Euclidean distance does well in measuring data distances within high-dimensional clusters, it does poorly when it comes to gauging inter-cluster distances. This significantly impacts the quality of global, low-dimensional space embedding procedures such as the popular multi-dimensional scaling (MDS) where one can often observe non-intuitive layouts. We were inspired by the perceptual processes evoked in the method of parallel coordinates which enables users to visually aggregate the data by the patterns the polylines exhibit across the dimension axes. We call the path of such a polyline its structure and suggest a metric that captures this structure directly in high-dimensional space. This allows us to better gauge the distances of spatially distant data constellations and so achieve data aggregations in MDS plots that are more cognizant of existing high-dimensional structure similarities. Our MDS plots also exhibit similar visual relationships as the method of parallel coordinates which is often used alongside to visualize the high-dimensional data in raw form. We then cast our metric into a bi-scale framework which distinguishes far-distances from near-distances. The coarser scale uses the structural similarity metric to separate data aggregates obtained by prior classification or clustering, while the finer scale employs the appropriate Euclidean distance.
Abualhaj, Bedor; Weng, Guoyang; Ong, Melissa; Attarwala, Ali Asgar; Molina, Flavia; Büsing, Karen; Glatting, Gerhard
2017-01-01
Dynamic [18F]fluoro-ethyl-L-tyrosine positron emission tomography ([18F]FET-PET) is used to identify tumor lesions for radiotherapy treatment planning, to differentiate glioma recurrence from radiation necrosis and to classify glioma grade. To segment different regions in the brain, k-means cluster analysis can be used. The main disadvantage of k-means is that the number of clusters must be pre-defined. In this study, we therefore compared different cluster validity indices for automated and reproducible determination of the optimal number of clusters based on the dynamic PET data. The k-means algorithm was applied to dynamic [18F]FET-PET images of 8 patients. Akaike information criterion (AIC), WB, I, modified Dunn's and Silhouette indices were compared on their ability to determine the optimal number of clusters based on requirements for an adequate cluster validity index. To check the reproducibility of k-means, the coefficients of variation (CVs) of the objective function values (OFVs; sum of squared Euclidean distances within each cluster) were calculated using 100 random centroid initialization replications (RCI100) for 2 to 50 clusters. k-means was performed independently on three neighboring slices containing tumor for each patient to investigate the stability of the optimal number of clusters within them. To check the independence of the validity indices on the number of voxels, cluster analysis was applied after duplication of a slice selected from each patient. CVs of index values were calculated at the optimal number of clusters using RCI100 to investigate the reproducibility of the validity indices. To check if the indices have a single extremum, visual inspection was performed on the replication with minimum OFV from RCI100. The maximum CV of OFVs was 2.7 × 10⁻² from all patients. The optimal number of clusters given by modified Dunn's and Silhouette indices was 2 or 3 leading to a very poor segmentation.
WB and I indices suggested a median of 5 (range 4-6) and 4 (range 3-6) clusters, respectively. For WB, I, modified Dunn's and Silhouette validity indices the suggested optimal number of clusters was not affected by the number of the voxels. The maximum coefficients of variation of WB, I, modified Dunn's, and Silhouette validity indices were 3 × 10⁻², 1, 2 × 10⁻¹ and 3 × 10⁻³, respectively. WB-index showed a single global maximum, whereas the other indices also showed local extrema. From the investigated cluster validity indices, the WB-index is best suited for automated determination of the optimal number of clusters for [18F]FET-PET brain images for the investigated image reconstruction algorithm and the used scanner: it yields meaningful results allowing better differentiation of tissues with higher number of clusters, it is simple, reproducible and has a unique global minimum. © 2016 American Association of Physicists in Medicine.
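The reproducibility check described above (coefficient of variation of the k-means objective function value over 100 random centroid initializations) can be sketched in a few lines. This is an illustrative sketch only: the 2D points are synthetic stand-ins for voxel time-activity data, and the function names and the use of the sample standard deviation are assumptions.

```python
import random
import statistics


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(c) / len(points) for c in zip(*points))


def kmeans_ofv(points, k, rng, max_iter=100):
    """Run Lloyd's k-means from random centroids and return the objective
    function value (sum of squared distances to the nearest centroid)."""
    centroids = rng.sample(points, k)
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster became empty.
        updated = [mean_point(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)


def ofv_cv(points, k, replications=100, seed=0):
    """Coefficient of variation of the OFV across random initializations."""
    rng = random.Random(seed)
    ofvs = [kmeans_ofv(points, k, rng) for _ in range(replications)]
    return statistics.stdev(ofvs) / statistics.mean(ofvs), ofvs


# Synthetic data: three hypothetical tissue classes as 2D blobs.
data_rng = random.Random(42)
points = [
    (data_rng.gauss(cx, 0.1), data_rng.gauss(cy, 0.1))
    for cx, cy in [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
    for _ in range(30)
]
cv, ofvs = ofv_cv(points, k=3)
```

A CV close to zero indicates that the random starts converge to solutions of nearly equal quality; repeating the call for each candidate `k` would mirror the 2 to 50 cluster sweep in the study.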
Yasuda, Akihito; Onuki, Yoshinori; Obata, Yasuko; Takayama, Kozo
2015-01-01
The "quality by design" concept in pharmaceutical formulation development requires the establishment of a science-based rationale and design space. In this article, we integrate thin-plate spline (TPS) interpolation, Kohonen's self-organizing map (SOM) and a Bayesian network (BN) to visualize the latent structure underlying causal factors and pharmaceutical responses. As a model pharmaceutical product, theophylline tablets were prepared using a standard formulation. We measured the tensile strength and disintegration time as response variables and the compressibility, cohesion and dispersibility of the pretableting blend as latent variables. We predicted these variables quantitatively using nonlinear TPS, generated a large amount of data on pretableting blends and tablets and clustered these data into several clusters using a SOM. Our results show that we are able to predict the experimental values of the latent and response variables with a high degree of accuracy and are able to classify the tablet data into several distinct clusters. In addition, to visualize the latent structure between the causal and latent factors and the response variables, we applied a BN method to the SOM clustering results. We found that despite having inserted latent variables between the causal factors and response variables, their relation is equivalent to the results for the SOM clustering, and thus we are able to explain the underlying latent structure. Consequently, this technique provides a better understanding of the relationships between causal factors and pharmaceutical responses in theophylline tablet formulation.
Schroeder, David; Korsakov, Fedor; Knipe, Carissa Mai-Ping; Thorson, Lauren; Ellingson, Arin M.; Nuckley, David; Carlis, John; Keefe, Daniel F
2017-01-01
In biomechanics studies, researchers collect, via experiments or simulations, datasets with hundreds or thousands of trials, each describing the same type of motion (e.g., a neck flexion-extension exercise) but under different conditions (e.g., different patients, different disease states, pre- and post-treatment). Analyzing similarities and differences across all of the trials in these collections is a major challenge. Visualizing a single trial at a time does not work, and the typical alternative of juxtaposing multiple trials in a single visual display leads to complex, difficult-to-interpret visualizations. We address this problem via a new strategy that organizes the analysis around motion trends rather than trials. This new strategy matches the cognitive approach that scientists would like to take when analyzing motion collections. We introduce several technical innovations making trend-centric motion visualization possible. First, an algorithm detects a motion collection’s trends via time-dependent clustering. Second, a 2D graphical technique visualizes how trials leave and join trends. Third, a 3D graphical technique, using a median 3D motion plus a visual variance indicator, visualizes the biomechanics of the set of trials within each trend. These innovations are combined to create an interactive exploratory visualization tool, which we designed through an iterative process in collaboration with both domain scientists and a traditionally-trained graphic designer. We report on insights generated during this design process and demonstrate the tool’s effectiveness via a validation study with synthetic data and feedback from expert musculoskeletal biomechanics researchers who used the tool to analyze the effects of disc degeneration on human spinal kinematics. PMID:26356978
Radom, Marcin; Rybarczyk, Agnieszka; Szawulak, Bartlomiej; Andrzejewski, Hubert; Chabelski, Piotr; Kozak, Adam; Formanowicz, Piotr
2017-12-01
Model development and its analysis is a fundamental step in systems biology. The theory of Petri nets offers a tool for such a task. Since the rapid development of computer science, a variety of tools for Petri nets emerged, offering various analytical algorithms. From this follows a problem of using different programs to analyse a single model. Many file formats and different representations of results make the analysis much harder. Especially for larger nets the ability to visualize the results in a proper form provides a huge help in the understanding of their significance. We present a new tool for Petri nets development and analysis called Holmes. Our program contains algorithms for model analysis based on different types of Petri nets, e.g. invariant generator, Maximum Common Transitions (MCT) sets and cluster modules, simulation algorithms or knockout analysis tools. A very important feature is the ability to visualize the results of almost all analytical modules. The integration of such modules into one graphical environment allows a researcher to fully devote his or her time to the model building and analysis. Available at http://www.cs.put.poznan.pl/mradom/Holmes/holmes.html. piotr@cs.put.poznan.pl. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com
The effect of billboard design specifications on driving: A pilot study.
Marciano, Hadas; Setter, Pe'erly
2017-07-01
Decades of research on the effects of advertising billboards on road accident rates, driver performance, and driver visual scanning behavior, has produced no conclusive findings. We suggest that road safety researchers should shift their focus and attempt to identify the billboard characteristics that are most distracting to drivers. This line of research may produce concrete guidelines for permissible billboards that would be likely to reduce the influence of the billboards on road safety. The current study is a first step towards this end. A pool of 161 photos of real advertising billboards was used as stimuli within a triple task paradigm designed to simulate certain components of driving. Each trial consisted of one ongoing tracking task accompanied by two additional concurrent tasks: (1) billboard observation task; and (2) circle color change identification task. Five clusters of billboards, identified by conducting a cluster analysis of their graphic content, were used as a within variable in one-way ANOVAs conducted on performance level data collected from the multiple tasks. Cluster 5, labeled Loaded Billboards, yielded significantly deteriorated performance on the tracking task. Cluster 4, labeled Graphical Billboards, yielded deteriorated performance primarily on the color change identification task. Cluster 3, labeled Minimal Billboards, had no effect on any of these tasks. We strongly recommend that these clusters be systematically explored in experiments involving additional real driving settings, such as driving simulators and field studies. This will enable validation of the current results and help incorporate them into real driving situations. Copyright © 2017. Published by Elsevier Ltd.
LOD-based clustering techniques for efficient large-scale terrain storage and visualization
NASA Astrophysics Data System (ADS)
Bao, Xiaohong; Pajarola, Renato
2003-05-01
Large multi-resolution terrain data sets are usually stored out-of-core. To visualize terrain data at interactive frame rates, the data needs to be organized on disk, loaded into main memory part by part, then rendered efficiently. Many main-memory algorithms have been proposed for efficient vertex selection and mesh construction. Organization of terrain data on disk is quite difficult because the error, the triangulation dependency and the spatial location of each vertex all need to be considered. Previous terrain clustering algorithms did not consider the per-vertex approximation error of individual terrain data sets. Therefore, the vertex sequences on disk are exactly the same for any terrain. In this paper, we propose a novel clustering algorithm which introduces the level-of-detail (LOD) information to terrain data organization to map multi-resolution terrain data to external memory. In our approach the LOD parameters of the terrain elevation points are reflected during clustering. The experiments show that dynamic loading and paging of terrain data at varying LOD is very efficient and minimizes page faults. Additionally, the preprocessing of this algorithm is very fast and works from out-of-core.
Three-dimensional reconstruction of clustered microcalcifications from two digitized mammograms
NASA Astrophysics Data System (ADS)
Stotzka, Rainer; Mueller, Tim O.; Epper, Wolfgang; Gemmeke, Hartmut
1998-06-01
X-ray mammography is one of the most significant diagnostic methods in early detection of breast cancer. Usually two X-ray images from different angles are taken from each mamma to make even overlapping structures visible. X-ray mammography has a very high spatial resolution and can show microcalcifications of 50-200 microns in size. Clusters of microcalcifications are one of the most important and often the only indicator for malignant tumors. These calcifications are in some cases extremely difficult to detect. Computer assisted diagnosis of digitized mammograms may improve detection and interpretation of microcalcifications and lead to more reliable diagnostic findings. We build a low-cost mammography workstation to detect and classify clusters of microcalcifications and tissue densities automatically. New in this approach is the estimation of the 3D formation of segmented microcalcifications and its visualization which will put additional diagnostic information at the radiologist's disposal. The real problem using only two or three projections for reconstruction is the big loss of volume information. Therefore the arrangement of a cluster is estimated using only the positions of segmented microcalcifications. The arrangement of microcalcifications is visualized to the physician by rotation.
deepTools2: a next generation web server for deep-sequencing data analysis.
Ramírez, Fidel; Ryan, Devon P; Grüning, Björn; Bhardwaj, Vivek; Kilpert, Fabian; Richter, Andreas S; Heyne, Steffen; Dündar, Friederike; Manke, Thomas
2016-07-08
We present an update to our Galaxy-based web server for processing and visualizing deeply sequenced data. Its core tool set, deepTools, allows users to perform complete bioinformatic workflows ranging from quality controls and normalizations of aligned reads to integrative analyses, including clustering and visualization approaches. Since we first described our deepTools Galaxy server in 2014, we have implemented new solutions for many requests from the community and our users. Here, we introduce significant enhancements and new tools to further improve data visualization and interpretation. deepTools continues to be open to all users and freely available as a web service at deeptools.ie-freiburg.mpg.de. The new deepTools2 suite can be easily deployed within any Galaxy framework via the toolshed repository, and we also provide source code for command line usage under Linux and Mac OS X. A public and documented API for access to deepTools functionality is also available. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Mapping white-matter functional organization at rest and during naturalistic visual perception.
Marussich, Lauren; Lu, Kun-Han; Wen, Haiguang; Liu, Zhongming
2017-02-01
Despite the wide applications of functional magnetic resonance imaging (fMRI) to mapping brain activation and connectivity in cortical gray matter, it has rarely been utilized to study white-matter functions. In this study, we investigated the spatiotemporal characteristics of fMRI data within the white matter acquired from humans both in the resting state and while watching a naturalistic movie. By using independent component analysis and hierarchical clustering, resting-state fMRI data in the white matter were de-noised and decomposed into spatially independent components, which were further assembled into hierarchically organized axonal fiber bundles. Interestingly, such components were partly reorganized during natural vision. Relative to resting state, the visual task specifically induced a stronger degree of temporal coherence within the optic radiations, as well as significant correlations between the optic radiations and multiple cortical visual networks. Therefore, fMRI contains rich functional information about the activity and connectivity within white matter at rest and during tasks, challenging the conventional practice of taking white-matter signals as noise or artifacts. Copyright © 2016 Elsevier Inc. All rights reserved.
Yokoyama, Eiji; Uchimura, Masako
2007-11-01
Ninety-five enterohemorrhagic Escherichia coli serovar O157 strains, including 30 strains isolated from 13 intrafamily outbreaks and 14 strains isolated from 3 mass outbreaks, were studied by pulsed-field gel electrophoresis (PFGE) and variable number of tandem repeats (VNTR) typing, and the resulting data were subjected to cluster analysis. Cluster analysis of the VNTR typing data revealed that 57 (60.0%) of 95 strains, including all epidemiologically linked strains, formed clusters with at least 95% similarity. Cluster analysis of the PFGE patterns revealed that 67 (70.5%) of 95 strains, including all but 1 of the epidemiologically linked strains, formed clusters with 90% similarity. The number of epidemiologically unlinked strains forming clusters was significantly less by VNTR cluster analysis than by PFGE cluster analysis. The congruence value between PFGE and VNTR cluster analysis was low and did not show an obvious correlation. With two-step cluster analysis, the number of clustered epidemiologically unlinked strains by PFGE cluster analysis that were divided by subsequent VNTR cluster analysis was significantly higher than the number by VNTR cluster analysis that were divided by subsequent PFGE cluster analysis. These results indicate that VNTR cluster analysis is more efficient than PFGE cluster analysis as an epidemiological tool to trace the transmission of enterohemorrhagic E. coli O157.
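Grouping strains into clusters at a fixed similarity cut-off, as reported above, can be illustrated with a small single-linkage sketch. The abstract does not specify the similarity coefficient or linkage rule, so both are assumptions here: similarity is taken as the fraction of loci with identical repeat numbers, and the strain profiles are hypothetical.

```python
def similarity(a, b):
    """Assumed measure: fraction of VNTR loci with identical repeat numbers."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def single_linkage_clusters(profiles, threshold):
    """Merge strains whose pairwise similarity reaches the threshold
    (single linkage, implemented with a union-find structure)."""
    names = list(profiles)
    parent = {n: n for n in names}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path compression
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if similarity(profiles[a], profiles[b]) >= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical repeat-number profiles over 20 loci.
base = [5] * 20
profiles = {
    "strain_1": list(base),
    "strain_2": [6] + base[1:],                    # differs at 1 locus (95% similar)
    "strain_3": base[:10] + [7] * 5 + base[15:],   # differs at 5 loci (75% similar)
}
clusters = single_linkage_clusters(profiles, threshold=0.95)
```

With these made-up profiles, `strain_1` and `strain_2` fall into one cluster at the 95% cut-off and `strain_3` stays alone.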
VAMPS: a website for visualization and analysis of microbial population structures.
Huse, Susan M; Mark Welch, David B; Voorhis, Andy; Shipunova, Anna; Morrison, Hilary G; Eren, A Murat; Sogin, Mitchell L
2014-02-05
The advent of next-generation DNA sequencing platforms has revolutionized molecular microbial ecology by making the detailed analysis of complex communities over time and space a tractable research pursuit for small research groups. However, the ability to generate 10⁵-10⁸ reads with relative ease brings with it many downstream complications. Beyond the computational resources and skills needed to process and analyze data, it is difficult to compare datasets in an intuitive and interactive manner that leads to hypothesis generation and testing. We developed the free web service VAMPS (Visualization and Analysis of Microbial Population Structures, http://vamps.mbl.edu) to address these challenges and to facilitate research by individuals or collaborating groups working on projects with large-scale sequencing data. Users can upload marker gene sequences and associated metadata; reads are quality filtered and assigned to both taxonomic structures and to taxonomy-independent clusters. A simple point-and-click interface allows users to select for analysis any combination of their own or their collaborators' private data and data from public projects, filter these by their choice of taxonomic and/or abundance criteria, and then explore these data using a wide range of analytic methods and visualizations. Each result is extensively hyperlinked to other analysis and visualization options, promoting data exploration and leading to a greater understanding of data relationships. VAMPS allows researchers using marker gene sequence data to analyze the diversity of microbial communities and the relationships between communities, to explore these analyses in an intuitive visual context, and to download data, results, and images for publication. 
VAMPS obviates the need for individual research groups to make the considerable investment in computational infrastructure and bioinformatic support otherwise necessary to process, analyze, and interpret massive amounts of next-generation sequence data. Any web-capable device can be used to upload, process, explore, and extract data and results from VAMPS. VAMPS encourages researchers to share sequence and metadata, and fosters collaboration between researchers of disparate biomes who recognize common patterns in shared data.
Pellicer-Chenoll, Maite; Garcia-Massó, Xavier; Morales, Jose; Serra-Añó, Pilar; Solana-Tramunt, Mònica; González, Luis-Millán; Toca-Herrera, José-Luis
2015-06-01
The relationship among physical activity, physical fitness and academic achievement in adolescents has been widely studied; however, controversy concerning this topic persists. According to the available literature, the methods used thus far to analyse the relationship between these variables have mostly been traditional linear analyses. The aim of this study was to perform a visual analysis of this relationship with self-organizing maps and to monitor the subjects' evolution during the 4 years of secondary school. Four hundred and forty-four students participated in the study. The physical activity and physical fitness of the participants were measured, and the participants' grade point averages were obtained from the five participant institutions. Four main clusters representing two primary student profiles with few differences between boys and girls were observed. The clustering demonstrated that students with higher energy expenditure and better physical fitness exhibited lower body mass index (BMI) and higher academic performance, whereas those adolescents with lower energy expenditure exhibited worse physical fitness, higher BMI and lower academic performance. With respect to the evolution of the students during the 4 years, ∼25% of the students originally clustered in a negative profile moved to a positive profile, and there was no movement in the opposite direction. © The Author 2015. Published by Oxford University Press. All rights reserved. For permissions, please email: journals.permissions@oup.com.
Receptor-driven, multimodal mapping of the human amygdala.
Kedo, Olga; Zilles, Karl; Palomero-Gallagher, Nicola; Schleicher, Axel; Mohlberg, Hartmut; Bludau, Sebastian; Amunts, Katrin
2018-05-01
The human amygdala consists of subdivisions contributing to various functions. However, principles of structural organization at the cellular and molecular level are not well understood. Thus, we re-analyzed the cytoarchitecture of the amygdala and generated cytoarchitectonic probabilistic maps of ten subdivisions in stereotaxic space based on novel workflows and mapping tools. This parcellation was then used as a basis for analyzing the receptor expression for 15 receptor types. Receptor fingerprints, i.e., the characteristic balance between densities of all receptor types, were generated in each subdivision to comprehensively visualize differences and similarities in receptor architecture between the subdivisions. Fingerprints of the central and medial nuclei and the anterior amygdaloid area were highly similar. Fingerprints of the lateral, basolateral and basomedial nuclei were also similar to each other, while those of the remaining nuclei were distinct in shape. Similarities were further investigated by a hierarchical cluster analysis: a two-cluster solution subdivided the phylogenetically older part (central, medial nuclei, anterior amygdaloid area) from the remaining parts of the amygdala. A more fine-grained three-cluster solution replicated our previous parcellation including a laterobasal, superficial and centromedial group. Furthermore, it helped to better characterize the paralaminar nucleus with a molecular organization in-between the laterobasal and the superficial group. The multimodal cyto- and receptor-architectonic analysis of the human amygdala provides new insights into its microstructural organization, intersubject variability, localization in stereotaxic space and principles of receptor-based neurochemical differences.
Janssens, Thomas; Orban, Guy A.
2014-01-01
The retinotopic organization of macaque occipitotemporal cortex rostral to area V4 and caudorostral to the recently described middle temporal (MT) cluster of the monkey (Kolster et al., 2009) is not well established. The proposed number of areas within this region varies from one to four, underscoring the ambiguity concerning the functional organization in this region of extrastriate cortex. We used phase-encoded retinotopic functional MRI mapping methods to reveal the functional topography of this cortical domain. Polar-angle maps showed one complete hemifield representation bordering area V4 anteriorly, split into dorsal and ventral counterparts corresponding to the lower and upper visual field quadrants, respectively. The location of this hemifield representation corresponds to area V4A. More rostroventrally, we identified three other complete hemifield representations. Two of these correspond to the dorsal and the ventral posterior inferotemporal areas (PITd and PITv, respectively) as identified in the Felleman and Van Essen (1991) scheme. The third representation has been tentatively named dorsal occipitotemporal area (OTd). Areas V4A, PITd, PITv, and OTd share a central visual field representation, similar to the areas constituting the MT cluster. Furthermore, they vary widely in size and represent the complete contralateral visual field. Functionally, these four areas show little motion sensitivity, unlike those of the MT cluster, and two of them, OTd and PITd, displayed pronounced two-dimensional shape sensitivity. In general, these results suggest that retinotopically organized tissue extends farther into rostral occipitotemporal cortex of the monkey than generally assumed. PMID:25080580
Weiser, Armin A; Thöns, Christian; Filter, Matthias; Falenski, Alexander; Appel, Bernd; Käsbohrer, Annemarie
2016-01-01
FoodChain-Lab is modular open-source software for trace-back and trace-forward analysis in food-borne disease outbreak investigations. Development of FoodChain-Lab has been driven by a need for appropriate software in several food-related outbreaks in Germany since 2011. The software allows integrated data management, data linkage, enrichment and visualization as well as interactive supply chain analyses. Identification of possible outbreak sources or vehicles is facilitated by calculation of tracing scores for food-handling stations (companies or persons) and food products under investigation. The software also supports consideration of station-specific cross-contamination, analysis of geographical relationships, and topological clustering of the tracing network structure. FoodChain-Lab has been applied successfully in previous outbreak investigations, for example during the 2011 EHEC outbreak and the 2013/14 European hepatitis A outbreak. The software is most useful in complex, multi-area outbreak investigations where epidemiological evidence may be insufficient to discriminate between multiple implicated food products. The automated analysis and visualization components would be of greater value if trading information on food ingredients and compound products was more easily available. PMID:26985673
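The idea of a tracing score for a station can be sketched as a reachability computation on the delivery network. The abstract does not define the score, so the formula below is an assumption for illustration: the weighted share of case-linked stations that can be reached by following deliveries forward from a given station. The station names, deliveries and weights are hypothetical, and nothing here reflects the actual FoodChain-Lab code.

```python
from collections import deque


def reachable(graph, start):
    """All stations reachable from start by following deliveries forward."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def tracing_score(graph, station, case_weights):
    """Assumed score: weighted fraction of case-linked stations downstream."""
    total = sum(case_weights.values())
    if total == 0:
        return 0.0
    downstream = reachable(graph, station)
    return sum(w for s, w in case_weights.items() if s in downstream) / total


# Hypothetical supply chain: supplier -> list of recipients.
deliveries = {
    "farm": ["wholesaler"],
    "wholesaler": ["shop_1", "shop_2"],
    "importer": ["shop_3"],
}
# Hypothetical outbreak weights for stations linked to cases.
case_weights = {"shop_1": 1.0, "shop_2": 1.0, "shop_3": 1.0}

scores = {s: tracing_score(deliveries, s, case_weights) for s in ("farm", "importer")}
```

In this toy network the farm scores higher than the importer because two of the three case-linked shops lie downstream of it.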
EvoluCode: Evolutionary Barcodes as a Unifying Framework for Multilevel Evolutionary Data.
Linard, Benjamin; Nguyen, Ngoc Hoan; Prosdocimi, Francisco; Poch, Olivier; Thompson, Julie D
2012-01-01
Evolutionary systems biology aims to uncover the general trends and principles governing the evolution of biological networks. An essential part of this process is the reconstruction and analysis of the evolutionary histories of these complex, dynamic networks. Unfortunately, the methodologies for representing and exploiting such complex evolutionary histories in large scale studies are currently limited. Here, we propose a new formalism, called EvoluCode (Evolutionary barCode), which allows the integration of different evolutionary parameters (eg, sequence conservation, orthology, synteny …) in a unifying format and facilitates the multilevel analysis and visualization of complex evolutionary histories at the genome scale. The advantages of the approach are demonstrated by constructing barcodes representing the evolution of the complete human proteome. Two large-scale studies are then described: (i) the mapping and visualization of the barcodes on the human chromosomes and (ii) automatic clustering of the barcodes to highlight protein subsets sharing similar evolutionary histories and their functional analysis. The methodologies developed here open the way to the efficient application of other data mining and knowledge extraction techniques in evolutionary systems biology studies. A database containing all EvoluCode data is available at: http://lbgi.igbmc.fr/barcodes.
Relating interesting quantitative time series patterns with text events and text features
NASA Astrophysics Data System (ADS)
Wanner, Franz; Schreck, Tobias; Jentner, Wolfgang; Sharalieva, Lyubka; Keim, Daniel A.
2013-12-01
In many application areas, the key to successful data analysis is the integrated analysis of heterogeneous data. One example is the financial domain, where time-dependent, high-frequency quantitative data (e.g., trading volume and price information) and textual data (e.g., economic and political news reports) need to be considered jointly. Data analysis tools need to support an integrated analysis, which allows studying the relationships between textual news documents and quantitative properties of the stock market price series. In this paper, we describe a workflow and tool that allows a flexible formation of hypotheses about text features and their combinations, which reflect quantitative phenomena observed in stock data. To support such an analysis, we combine the analysis steps of frequent quantitative and text-oriented data using an existing a-priori method. First, based on heuristics, we extract interesting intervals and patterns in large time series data. The visual analysis supports the analyst in exploring parameter combinations and their results. The identified time series patterns are then input for the second analysis step, in which all identified intervals of interest are analyzed for frequent patterns co-occurring with financial news. An a-priori method supports the discovery of such sequential temporal patterns. Then, various text features, such as the degree of sentence nesting, noun phrase complexity, and vocabulary richness, are extracted from the news to obtain meta patterns. Meta patterns are defined by a specific combination of text features which significantly differ from the text features of the remaining news data. Our approach combines a portfolio of visualization and analysis techniques, including time-, cluster- and sequence visualization and analysis functionality. We provide two case studies, showing the effectiveness of our combined quantitative and textual analysis workflow. The workflow can also be generalized to other application domains such as data analysis of smart grids, cyber-physical systems or the security of critical infrastructure, where the data consists of a combination of quantitative and textual time series data.
Analysis of model output and science data in the Virtual Model Repository (VMR).
NASA Astrophysics Data System (ADS)
De Zeeuw, D.; Ridley, A. J.
2014-12-01
Big scientific data includes not only large repositories of data from scientific platforms like satellites and ground observation, but also the vast output of numerical models. The Virtual Model Repository (VMR) provides scientific analysis and visualization tools for many numerical models of the Earth-Sun system. Individual runs can be analyzed in the VMR and compared to relevant data through relevant metadata, but larger collections of runs can also now be studied and statistics generated on the accuracy and tendencies of model output. The vast model repository at the CCMC, with over 1000 simulations of the Earth's magnetosphere, was used to look at overall trends in accuracy when compared to satellites such as GOES, Geotail, and Cluster. Methodology for this analysis as well as case studies will be presented.
Abe, Takashi; Hamano, Yuta; Ikemura, Toshimichi
2014-01-01
A strategy of evolutionary studies that can compare vast numbers of genome sequences is becoming increasingly important with the remarkable progress of high-throughput DNA sequencing methods. We previously established a sequence alignment-free clustering method "BLSOM" for di-, tri-, and tetranucleotide compositions in genome sequences, which can characterize sequence characteristics (genome signatures) of a wide range of species. In the present study, we generated BLSOMs for tetra- and pentanucleotide compositions in approximately one million sequence fragments derived from 101 eukaryotes, for which almost complete genome sequences were available. BLSOM recognized phylotype-specific characteristics (e.g., key combinations of oligonucleotide frequencies) in the genome sequences, permitting phylotype-specific clustering of the sequences without any information regarding the species. In our detailed examination of 12 Drosophila species, the correlation between their phylogenetic classification and the classification on the BLSOMs was observed to visualize oligonucleotides diagnostic for species-specific clustering.
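The oligonucleotide composition that serves as clustering input can be illustrated with a minimal sketch. It only computes a tetranucleotide frequency vector for one sequence fragment; it does not implement BLSOM itself. The function name `kmer_frequencies` and the choice to skip windows containing non-ACGT symbols are assumptions made for illustration, not details taken from the abstract.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(sequence: str, k: int = 4) -> dict[str, float]:
    """Relative frequency of every A/C/G/T k-mer in one sequence fragment."""
    seq = sequence.upper()
    valid = set("ACGT")
    # Slide a window of length k over the fragment, one base at a time.
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= valid  # assumption: skip windows with N etc.
    )
    total = sum(counts.values())
    # Emit all 4**k k-mers so every fragment yields a vector of equal length.
    all_kmers = ("".join(p) for p in product("ACGT", repeat=k))
    return {kmer: (counts[kmer] / total if total else 0.0) for kmer in all_kmers}


if __name__ == "__main__":
    freqs = kmer_frequencies("ACGTACGT", k=4)
    print(len(freqs), freqs["ACGT"])
```

Each fragment then becomes a fixed-length vector (256 entries for k = 4, 1024 for k = 5) that a self-organizing map could take as input.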
A Fine-Scale Functional Logic to Convergence from Retina to Thalamus.
Liang, Liang; Fratzl, Alex; Goldey, Glenn; Ramesh, Rohan N; Sugden, Arthur U; Morgan, Josh L; Chen, Chinfei; Andermann, Mark L
2018-05-31
Numerous well-defined classes of retinal ganglion cells innervate the thalamus to guide image-forming vision, yet the rules governing their convergence and divergence remain unknown. Using two-photon calcium imaging in awake mouse thalamus, we observed a functional arrangement of retinal ganglion cell axonal boutons in which coarse-scale retinotopic ordering gives way to fine-scale organization based on shared preferences for other visual features. Specifically, at the ∼6 μm scale, clusters of boutons from different axons often showed similar preferences for either one or multiple features, including axis and direction of motion, spatial frequency, and changes in luminance. Conversely, individual axons could "de-multiplex" information channels by participating in multiple, functionally distinct bouton clusters. Finally, ultrastructural analyses demonstrated that retinal axonal boutons in a local cluster often target the same dendritic domain. These data suggest that functionally specific convergence and divergence of retinal axons may impart diverse, robust, and often novel feature selectivity to visual thalamus. Copyright © 2018 Elsevier Inc. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shum, Andrew D.; Parkinson, Dilworth Y.; Xiao, Xianghui
The performance of polymer-electrolyte fuel cells is heavily dependent on proper management of liquid water. One particular reason is that liquid water can collect in the gas diffusion layers (GDLs) blocking the reactant flow to the catalyst layer. This results in increased mass-transport losses. At higher temperatures, evaporation of water becomes a dominant water-removal mechanism and specifically phase-change-induced (PCI) flow is present due to thermal gradients. This study used synchrotron based micro X-ray computed tomography (CT) to visualize and quantify the water distribution within gas diffusion layers subject to a thermal gradient. Plotting saturation as a function of through-plane distance quantitatively shows water redistribution, where water evaporates at hotter locations and condenses in colder locations. The morphology of the 2 GDLs on the micro-scale, as well as evaporating water clusters, are resolved, indicating that the GDL voids are slightly prolate, whereas water clusters are oblate. From the mean radii of water distributions and visual inspection, it is observed that larger water clusters evaporate faster than smaller ones.
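A saturation profile of the kind described above could be computed from a segmented tomogram roughly as follows. This is a sketch under stated assumptions: the voxel labels (`SOLID`, `AIR`, `WATER`), the `volume[z][y][x]` layout with z as the through-plane axis, and the function name are hypothetical, and saturation is taken to mean the fraction of pore voxels occupied by liquid water.

```python
SOLID, AIR, WATER = 0, 1, 2  # hypothetical voxel labels after segmentation


def saturation_profile(volume):
    """Return one saturation value per through-plane slice.

    `volume[z][y][x]` holds a voxel label; z is assumed to be the
    through-plane axis. Saturation = water voxels / (water + air voxels).
    """
    profile = []
    for slice_ in volume:
        labels = [label for row in slice_ for label in row]
        water = labels.count(WATER)
        pore = water + labels.count(AIR)
        # A slice with no pore space gets 0.0 rather than a division error.
        profile.append(water / pore if pore else 0.0)
    return profile


if __name__ == "__main__":
    toy = [
        [[SOLID, AIR], [WATER, WATER]],  # slice 0: 2 of 3 pore voxels wet
        [[AIR, AIR], [SOLID, WATER]],    # slice 1: 1 of 3 pore voxels wet
    ]
    print(saturation_profile(toy))
```

Comparing such profiles before and after a thermal gradient is applied would show where saturation falls (evaporation) and where it rises (condensation).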
Performance Measurement, Visualization and Modeling of Parallel and Distributed Programs
NASA Technical Reports Server (NTRS)
Yan, Jerry C.; Sarukkai, Sekhar R.; Mehra, Pankaj; Lum, Henry, Jr. (Technical Monitor)
1994-01-01
This paper presents a methodology for debugging the performance of message-passing programs on both tightly coupled and loosely coupled distributed-memory machines. The AIMS (Automated Instrumentation and Monitoring System) toolkit, a suite of software tools for measurement and analysis of performance, is introduced and its application illustrated using several benchmark programs drawn from the field of computational fluid dynamics. AIMS includes (i) Xinstrument, a powerful source-code instrumentor, which supports both Fortran77 and C as well as a number of different message-passing libraries including Intel's NX, Thinking Machines' CMMD, and PVM; (ii) Monitor, a library of timestamping and trace-collection routines that run on supercomputers (such as Intel's iPSC/860, Delta, and Paragon and Thinking Machines' CM5) as well as on networks of workstations (including Convex Cluster and SparcStations connected by a LAN); (iii) Visualization Kernel, a trace-animation facility that supports source-code clickback, simultaneous visualization of computation and communication patterns, as well as analysis of data movements; (iv) Statistics Kernel, an advanced profiling facility that associates a variety of performance data with various syntactic components of a parallel program; (v) Index Kernel, a diagnostic tool that helps pinpoint performance bottlenecks through the use of abstract indices; (vi) Modeling Kernel, a facility for automated modeling of message-passing programs that supports both simulation-based and analytical approaches to performance prediction and scalability analysis; (vii) Intrusion Compensator, a utility for recovering true performance from observed performance by removing the overheads of monitoring and their effects on the communication pattern of the program; and (viii) Compatibility Tools, which convert AIMS-generated traces into formats used by other performance-visualization tools, such as ParaGraph, Pablo, and certain AVS/Explorer modules.
A bibliometric and visual analysis of global geo-ontology research
NASA Astrophysics Data System (ADS)
Li, Lin; Liu, Yu; Zhu, Haihong; Ying, Shen; Luo, Qinyao; Luo, Heng; Kuai, Xi; Xia, Hui; Shen, Hang
2017-02-01
In this paper, the results of a bibliometric and visual analysis of geo-ontology research articles collected from the Web of Science (WOS) database between 1999 and 2014 are presented. The numbers of national institutions and published papers are visualized and a global research heat map is drawn, illustrating an overview of global geo-ontology research. In addition, we present a chord diagram of countries and perform a visual cluster analysis of a knowledge co-citation network of references, disclosing potential academic communities and identifying key points, main research areas, and future research trends. The International Journal of Geographical Information Science, Progress in Human Geography, and Computers & Geosciences are the most active journals. The USA makes the largest contributions to geo-ontology research by virtue of its highest numbers of independent and collaborative papers, and its dominance was also confirmed in the country chord diagram. The majority of institutions are in the USA, Western Europe, and Eastern Asia. Wuhan University, University of Munster, and the Chinese Academy of Sciences are notable geo-ontology institutions. Keywords such as "Semantic Web," "GIS," and "space" have attracted a great deal of attention. "Semantic granularity in ontology-driven geographic information systems," "Ontologies in support of activities in geographical space," and "A translation approach to portable ontology specifications" have the highest cited centrality. Geographical space, computer-human interaction, and ontology cognition are the three main research areas of geo-ontology. The semantic mismatch between the producers and users of ontology data, as well as error propagation in interdisciplinary and cross-linguistic data reuse, need to be solved. In addition, the development of geo-ontology modeling primitives based on OWL (Web Ontology Language) and methods to automatically rework data in the Semantic Web are needed. Furthermore, the topological relations between geographical entities still require further study.
Davies-Thompson, Jodie; Johnston, Samantha; Tashakkor, Yashar; Pancaroglu, Raika; Barton, Jason J S
2016-08-01
Visual words and faces activate similar networks but with complementary hemispheric asymmetries, faces being lateralized to the right and words to the left. A recent theory proposes that this reflects developmental competition between visual word and face processing. We investigated whether this results in an inverse correlation between the degree of lateralization of visual word and face activation in the fusiform gyri. 26 literate right-handed healthy adults underwent functional MRI with face and word localizers. We derived lateralization indices for cluster size and peak responses for word and face activity in left and right fusiform gyri, and correlated these across subjects. A secondary analysis examined all face- and word-selective voxels in the inferior occipitotemporal cortex. No negative correlations were found. There were positive correlations for the peak MR response between word and face activity within the left hemisphere, and between word activity in the left visual word form area and face activity in the right fusiform face area. The face lateralization index was positively rather than negatively correlated with the word index. In summary, we do not find a complementary relationship between visual word and face lateralization across subjects. The significance of the positive correlations is unclear: some may reflect the influences of general factors such as attention, but others may point to other factors that influence lateralization of function. Copyright © 2016 Elsevier B.V. All rights reserved.
Visualizing frequent patterns in large multivariate time series
NASA Astrophysics Data System (ADS)
Hao, M.; Marwah, M.; Janetzko, H.; Sharma, R.; Keim, D. A.; Dayal, U.; Patnaik, D.; Ramakrishnan, N.
2011-01-01
The detection of previously unknown, frequently occurring patterns in time series, often called motifs, has been recognized as an important task. However, it is difficult to discover and visualize these motifs as their numbers increase, especially in large multivariate time series. To find frequent motifs, we use several temporal data mining and event encoding techniques to cluster and convert a multivariate time series to a sequence of events. Then we quantify the efficiency of the discovered motifs by linking them with a performance metric. To visualize frequent patterns in a large time series with potentially hundreds of nested motifs on a single display, we introduce three novel visual analytics methods: (1) motif layout, using colored rectangles for visualizing the occurrences and hierarchical relationships of motifs in a multivariate time series, (2) motif distortion, for enlarging or shrinking motifs as appropriate for easy analysis and (3) motif merging, to combine a number of identical adjacent motif instances without cluttering the display. Analysts can interactively optimize the degree of distortion and merging to get the best possible view. A specific motif (e.g., the most efficient or least efficient motif) can be quickly detected from a large time series for further investigation. We have applied these methods to two real-world data sets: data center cooling and oil well production. The results provide important new insights into the recurring patterns.
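The motif-merging idea, combining identical adjacent motif instances into one display element, can be sketched as a single pass over instances sorted by start position. The `MotifInstance` record, its half-open `[start, end)` convention, and the rule that "adjacent" means one instance ends exactly where the next begins are all assumptions for illustration; the abstract does not specify the authors' data structures.

```python
from typing import NamedTuple


class MotifInstance(NamedTuple):
    motif_id: str
    start: int      # inclusive start index in the event sequence
    end: int        # exclusive end index
    count: int = 1  # number of original occurrences folded into this entry


def merge_adjacent(instances: list[MotifInstance]) -> list[MotifInstance]:
    """Fold runs of back-to-back instances of the same motif into one entry."""
    merged: list[MotifInstance] = []
    for inst in sorted(instances, key=lambda m: m.start):
        last = merged[-1] if merged else None
        if last and last.motif_id == inst.motif_id and last.end == inst.start:
            # Same motif, no gap: extend the previous entry instead of adding one.
            merged[-1] = MotifInstance(
                last.motif_id, last.start, inst.end, last.count + inst.count
            )
        else:
            merged.append(inst)
    return merged


if __name__ == "__main__":
    found = [
        MotifInstance("A", 0, 3),
        MotifInstance("A", 3, 6),
        MotifInstance("B", 6, 8),
        MotifInstance("A", 8, 11),
    ]
    for m in merge_adjacent(found):
        print(m)
```

Keeping a `count` on each merged entry lets a display draw one rectangle while still reporting how many occurrences it stands for.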
Marmamula, Srinivas; Keeffe, Jill E; Rao, Gullapalli N
2009-01-01
To investigate the prevalence of uncorrected refractive errors, presbyopia and spectacle coverage in subjects aged 15-50 years using rapid assessment methodology in the Mahabubnagar district of Andhra Pradesh, India. A population-based cross-sectional study was conducted using cluster random sampling to enumerate 3,300 subjects from 55 clusters. Unaided, aided and pinhole visual acuity was assessed using a LogMAR chart at a distance of 4 meters. Near vision was assessed using an N notation chart. Uncorrected refractive error was defined as presenting visual acuity worse than 6/12 but improving to 6/12 or better on using a pinhole. Presbyopia was defined as binocular near vision worse than N8 in subjects aged more than 35 years with binocular distance visual acuity of 6/12 or better. Of the 3,300 subjects enumerated from 55 clusters, 3,203 (97%) subjects were available for examination. Of these, 1,496 (46.7%) were females and 930 (29%) were ≥ 40 years. Age and gender adjusted prevalence of uncorrected refractive errors causing visual impairment in the better eye was 2.7% (95% CI, 2.1-3.2%). Presbyopia was present in 690 (63.7%, 95% CI, 60.8-66.6%) subjects aged over 35 years. Spectacle coverage for refractive error was 29% and for presbyopia it was 19%. There is a large unmet need for refractive correction in this area in India. Rapid assessment methods are an effective means of assessing the need for services and the impact of models of care.
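As a worked illustration of how a proportion and its 95% confidence interval relate, the sketch below applies a simple normal-approximation (Wald) interval to the presbyopia figure. Two assumptions are made here and are not stated in the abstract: the denominator of about 1,083 subjects aged over 35 is back-calculated from 690 / 0.637, and the Wald formula is used without any adjustment for the cluster sampling design, so this is not necessarily the method the authors applied.

```python
import math


def wald_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float, float]:
    """Proportion with a normal-approximation confidence interval."""
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p, p - half_width, p + half_width


if __name__ == "__main__":
    # n = 1083 is an assumed, back-calculated denominator (690 / 0.637).
    p, low, high = wald_ci(690, 1083)
    print(f"{100 * p:.1f}% (95% CI, {100 * low:.1f}-{100 * high:.1f}%)")
```

Under these assumptions the result rounds to 63.7% with an interval of 60.8-66.6%, which agrees with the reported figures; a design-adjusted analysis would generally give a somewhat wider interval.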