unsupervised partition method: Topics by Science.gov

Sample records for unsupervised partition method

Unsupervised hierarchical partitioning of hyperspectral images: application to marine algae identification

NASA Astrophysics Data System (ADS)

Chen, B.; Chehdi, K.; De Oliveria, E.; Cariou, C.; Charbonnier, B.

2015-10-01

In this paper a new unsupervised top-down hierarchical classification method to partition airborne hyperspectral images is proposed. The unsupervised approach is preferred because the difficulty of accessing areas and the human and financial resources required to obtain ground truth data are serious handicaps, especially over large areas that can be covered by airborne or satellite images. The developed classification approach allows i) a successive partitioning of data into several levels or partitions in which the main classes are first identified, ii) an automatic estimation of the number of classes at each level without any end-user help, iii) a nonsystematic subdivision of all classes of a partition Pj to form a partition Pj+1, and iv) a stable partitioning result of the same data set from one run of the method to another. The proposed approach was validated on synthetic and real hyperspectral images related to the identification of several marine algae species. In addition to highly accurate and consistent results (correct classification rate over 99%), this approach is completely unsupervised. At each level, it estimates the optimal number of classes and the final partition without any end-user intervention.
Semi-supervised clustering for parcellating brain regions based on resting state fMRI data

NASA Astrophysics Data System (ADS)

Cheng, Hewei; Fan, Yong

2014-03-01

Many unsupervised clustering techniques have been adopted for parcellating brain regions of interest into functionally homogeneous subregions based on resting state fMRI data. However, the unsupervised clustering techniques are not able to take advantage of existing knowledge of the functional neuroanatomy readily available from studies of cytoarchitectonic parcellation or meta-analysis of the literature. In this study, we propose a semi-supervised clustering method for parcellating the amygdala into functionally homogeneous subregions based on resting state fMRI data. In particular, the semi-supervised clustering is implemented under the framework of graph partitioning, and adopts prior information and spatial consistency constraints to obtain a spatially contiguous parcellation result. The graph partitioning problem is solved using an efficient algorithm similar to the well-known weighted kernel k-means algorithm. Our method has been validated for parcellating the amygdala into 3 subregions based on resting state fMRI data of 28 subjects. The experimental results have demonstrated that the proposed method is more robust than unsupervised clustering and able to parcellate the amygdala into centromedial, laterobasal, and superficial parts with improved functional homogeneity compared with the cytoarchitectonic parcellation result. The validity of the parcellation results is also supported by distinctive functional and structural connectivity patterns of the subregions and high consistency between coactivation patterns derived from a meta-analysis and functional connectivity patterns of corresponding subregions.
Comparative study of feature selection with ensemble learning using SOM variants

NASA Astrophysics Data System (ADS)

Filali, Ameni; Jlassi, Chiraz; Arous, Najet

2017-03-01

Ensemble learning has succeeded in improving stability and clustering accuracy, but its runtime prohibits it from scaling up to real-world applications. This study deals with the problem of selecting a subset of the most pertinent features for every cluster from a dataset. The proposed method is another extension of the Random Forests approach, using self-organizing map (SOM) variants on unlabeled data, that estimates the out-of-bag feature importance from a set of partitions. Every partition is created using a different bootstrap sample and a random subset of the features. Then, we show that the internal estimates used to measure variable pertinence in Random Forests are also applicable to feature selection in unsupervised learning. This approach aims at dimensionality reduction, visualization and cluster characterization at the same time. Hence, we provide empirical results on nineteen benchmark data sets indicating that RFS can lead to significant improvement in terms of clustering accuracy, over several state-of-the-art unsupervised methods, with a very limited subset of features. The approach shows promise for application to very broad domains.
Unsupervised segmentation of MRI knees using image partition forests

NASA Astrophysics Data System (ADS)

Marčan, Marija; Voiculescu, Irina

2016-03-01

Nowadays many people are affected by arthritis, a condition of the joints with limited prevention measures, but with various options of treatment the most radical of which is surgical. In order for surgery to be successful, it can make use of careful analysis of patient-based models generated from medical images, usually by manual segmentation. In this work we show how to automate the segmentation of a crucial and complex joint -- the knee. To achieve this goal we rely on our novel way of representing a 3D voxel volume as a hierarchical structure of partitions which we have named Image Partition Forest (IPF). The IPF contains several partition layers of increasing coarseness, with partitions nested across layers in the form of adjacency graphs. On the basis of a set of properties (size, mean intensity, coordinates) of each node in the IPF we classify nodes into different features. Values indicating whether or not any particular node belongs to the femur or tibia are assigned through node filtering and node-based region growing. So far we have evaluated our method on 15 MRI knee images. Our unsupervised segmentation compared against a hand-segmented gold standard has achieved an average Dice similarity coefficient of 0.95 for femur and 0.93 for tibia, and an average symmetric surface distance of 0.98 mm for femur and 0.73 mm for tibia. The paper also discusses ways to introduce stricter morphological and spatial conditioning in the bone labelling process.
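The Dice similarity coefficient reported above measures the overlap between an automatic segmentation and a reference segmentation as twice the size of their intersection divided by the sum of their sizes. The sketch below is only an illustration of that standard formula; the function name `dice_coefficient` and the toy voxel sets are assumptions made for the example and are not taken from the paper.

```python
def dice_coefficient(predicted, reference):
    """Dice similarity between two collections of voxel coordinates."""
    predicted, reference = set(predicted), set(reference)
    if not predicted and not reference:
        return 1.0  # both masks empty: treat as perfect agreement
    overlap = len(predicted & reference)
    return 2.0 * overlap / (len(predicted) + len(reference))


# Toy 2D example with hypothetical labels (not data from the paper).
auto_mask = {(0, 0), (0, 1), (1, 0), (1, 1)}
manual_mask = {(0, 1), (1, 0), (1, 1), (2, 1)}
score = dice_coefficient(auto_mask, manual_mask)
print(score)
```

A value of 1 means the two masks coincide exactly, and a value of 0 means they share no voxel.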
Nonequilibrium thermodynamics of restricted Boltzmann machines.

PubMed

Salazar, Domingos S P

2017-08-01

In this work, we analyze the nonequilibrium thermodynamics of a class of neural networks known as restricted Boltzmann machines (RBMs) in the context of unsupervised learning. We show how the network is described as a discrete Markov process and how the detailed balance condition and the Maxwell-Boltzmann equilibrium distribution are sufficient conditions for a complete thermodynamics description, including nonequilibrium fluctuation theorems. Numerical simulations in a fully trained RBM are performed and the heat exchange fluctuation theorem is verified with excellent agreement to the theory. We observe how the contrastive divergence functional, mostly used in unsupervised learning of RBMs, is closely related to nonequilibrium thermodynamic quantities. We also use the framework to interpret the estimation of the partition function of RBMs with the annealed importance sampling method from a thermodynamics standpoint. Finally, we argue that unsupervised learning of RBMs is equivalent to a work protocol in a system driven by the laws of thermodynamics in the absence of labeled data.
Generalized Wishart Mixtures for Unsupervised Classification of PolSAR Data

NASA Astrophysics Data System (ADS)

Li, Lan; Chen, Erxue; Li, Zengyuan

2013-01-01

This paper presents an unsupervised clustering algorithm based upon the expectation maximization (EM) algorithm for finite mixture modelling, using the complex Wishart probability density function (PDF) for the probabilities. The mixture model makes it possible to consider heterogeneous thematic classes which could not be well fitted by the unimodal Wishart distribution. To make the calculation fast and robust, we use the recently proposed generalized gamma distribution (GΓD) for the single polarization intensity data to make the initial partition. Then we use the Wishart probability density function for the corresponding sample covariance matrix to calculate the posterior class probabilities for each pixel. The posterior class probabilities are used for the prior probability estimates of each class and as weights for all class parameter updates. The proposed method is evaluated and compared with the Wishart H-Alpha-A classification. Preliminary results show that the proposed method has better performance.
A mesh partitioning algorithm for preserving spatial locality in arbitrary geometries

DOE Office of Scientific and Technical Information (OSTI.GOV)

Nivarti, Girish V., E-mail: g.nivarti@alumni.ubc.ca; Salehi, M. Mahdi; Bushe, W. Kendal

2015-01-15

Highlights: • An algorithm for partitioning computational meshes is proposed. • The Morton order space-filling curve is modified to achieve improved locality. • A spatial locality metric is defined to compare results with existing approaches. • Results indicate improved performance of the algorithm in complex geometries. -- Abstract: A space-filling curve (SFC) is a proximity preserving linear mapping of any multi-dimensional space and is widely used as a clustering tool. Equi-sized partitioning of an SFC ignores the loss in clustering quality that occurs due to inaccuracies in the mapping. Often, this results in poor locality within partitions, especially for the conceptually simple Morton order curves. We present a heuristic that improves partition locality in arbitrary geometries by slicing a Morton order curve at points where spatial locality is sacrificed. In addition, we develop algorithms that evenly distribute points to the extent possible while maintaining spatial locality. A metric is defined to estimate relative inter-partition contact as an indicator of communication in parallel computing architectures. Domain partitioning tests have been conducted on geometries relevant to turbulent reactive flow simulations. The results obtained highlight the performance of our method as an unsupervised and computationally inexpensive domain partitioning tool.
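For orientation, the baseline that the abstract starts from, equal-sized partitioning of a Morton order curve, can be sketched as follows. This is not the authors' locality-improving heuristic; it shows only the plain Morton (Z-order) mapping and an even split of the ordered points. The function names `interleave_bits` and `morton_partition` and the 4x4 grid are assumptions made for the example.

```python
def interleave_bits(x, y, bits=16):
    """Morton (Z-order) code: x occupies even bit positions, y odd ones."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code


def morton_partition(points, n_parts):
    """Sort 2D integer points along the Morton curve and cut into even chunks."""
    ordered = sorted(points, key=lambda p: interleave_bits(p[0], p[1]))
    size, extra = divmod(len(ordered), n_parts)
    parts, start = [], 0
    for k in range(n_parts):
        end = start + size + (1 if k < extra else 0)  # spread the remainder
        parts.append(ordered[start:end])
        start = end
    return parts


# Hypothetical 4x4 grid of cells split into four partitions.
grid = [(x, y) for x in range(4) for y in range(4)]
parts = morton_partition(grid, 4)
for part in parts:
    print(part)
```

On a full square grid each chunk is a compact 2x2 block. In irregular geometries the curve can jump between distant cells inside one chunk, which is the loss of locality the abstract describes.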
Semisupervised Clustering by Iterative Partition and Regression with Neuroscience Applications

PubMed Central

Qian, Guoqi; Wu, Yuehua; Ferrari, Davide; Qiao, Puxue; Hollande, Frédéric

2016-01-01

Regression clustering is a statistical learning and data mining method that mixes unsupervised and supervised learning and is found in a wide range of applications, including artificial intelligence and neuroscience. It performs unsupervised learning when it clusters the data according to their respective unobserved regression hyperplanes. The method also performs supervised learning when it fits regression hyperplanes to the corresponding data clusters. Applying regression clustering in practice requires means of determining the underlying number of clusters in the data, finding the cluster label of each data point, and estimating the regression coefficients of the model. In this paper, we review the estimation and selection issues in regression clustering with regard to the least squares and robust statistical methods. We also provide a model selection based technique to determine the number of regression clusters underlying the data. We further develop a computing procedure for regression clustering estimation and selection. Finally, simulation studies are presented for assessing the procedure, together with an analysis of a real data set on RGB cell marking in neuroscience to illustrate and interpret the method. PMID:27212939
Unsupervised Spatial, Temporal and Relational Models for Social Processes

DTIC Science & Technology

2012-02-01

Andrej Mrvar. A partitioning approach to structural balance. Social Networks, 18(2):149–168, 1996. [37] Thi V. Duong, Hung H. Bui, Dinh Q. Phung, and...partitioning provided by Doreian and Mrvar [36], who demonstrate that there was increasing evidence over time that this...foursome was a genuine group. Doreian and Mrvar used a block modeling approach optimizing structural balance, a measure of cohesion incorporating
A Hybrid Supervised/Unsupervised Machine Learning Approach to Solar Flare Prediction

NASA Astrophysics Data System (ADS)

Benvenuto, Federico; Piana, Michele; Campi, Cristina; Massone, Anna Maria

2018-01-01

This paper introduces a novel method for flare forecasting, combining prediction accuracy with the ability to identify the most relevant predictive variables. This result is obtained by means of a two-step approach: first, a supervised regularization method for regression, namely, LASSO is applied, where a sparsity-enhancing penalty term allows the identification of the significance with which each data feature contributes to the prediction; then, an unsupervised fuzzy clustering technique for classification, namely, Fuzzy C-Means, is applied, where the regression outcome is partitioned through the minimization of a cost function and without focusing on the optimization of a specific skill score. This approach is therefore hybrid, since it combines supervised and unsupervised learning; realizes classification in an automatic, skill-score-independent way; and provides effective prediction performance even in the case of imbalanced data sets. Its prediction power is verified against NOAA Space Weather Prediction Center data, using data in the range between 1996 August and 2010 December as the test set and data in the range between 1988 December and 1996 June as the training set. To validate the method, we computed several skill scores typically utilized in flare prediction and compared the values provided by the hybrid approach with the ones provided by several standard (non-hybrid) machine learning methods. The results showed that the hybrid approach performs classification better than all other supervised methods and with an effectiveness comparable to that of clustering methods; but, in addition, it provides a reliable ranking of the weights with which the data properties contribute to the forecast.
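The second step described above, partitioning a one-dimensional regression outcome with Fuzzy C-Means, can be sketched in a few lines. This is a generic textbook Fuzzy C-Means on scalar values, not the authors' implementation; the function names, the fuzzifier `m=2.0`, the fixed iteration count, the initial centers and the toy scores are all assumptions made for the example.

```python
def fcm_memberships(values, centers, m=2.0):
    """Membership of each value in each cluster; every row sums to 1."""
    rows = []
    for v in values:
        dists = [abs(v - c) for c in centers]
        if min(dists) == 0.0:
            # Value sits exactly on a center: give it full membership there.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            rows.append([h / sum(hits) for h in hits])
        else:
            inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
            rows.append([w / sum(inv) for w in inv])
    return rows


def fuzzy_c_means_1d(values, centers, m=2.0, iterations=50):
    """Alternate membership and center updates for a fixed number of steps."""
    centers = list(centers)
    for _ in range(iterations):
        u = fcm_memberships(values, centers, m)
        centers = [
            sum(row[k] ** m * v for row, v in zip(u, values))
            / sum(row[k] ** m for row in u)
            for k in range(len(centers))
        ]
    return centers, fcm_memberships(values, centers, m)


# Hypothetical regression outputs (e.g. flare scores), not data from the paper.
scores = [0.05, 0.10, 0.15, 0.80, 0.90, 0.95]
centers, memberships = fuzzy_c_means_1d(scores, centers=[0.0, 1.0])
labels = [row.index(max(row)) for row in memberships]
print(centers, labels)
```

Assigning each score to its highest-membership cluster yields a two-class partition without choosing a threshold by hand, which is the sense in which the classification step is independent of any particular skill score.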
Hyperspectral image segmentation using a cooperative nonparametric approach

NASA Astrophysics Data System (ADS)

Taher, Akar; Chehdi, Kacem; Cariou, Claude

2013-10-01

In this paper a new unsupervised nonparametric cooperative and adaptive hyperspectral image segmentation approach is presented. The hyperspectral images are partitioned band by band in parallel, and intermediate classification results are evaluated and fused to get the final segmentation result. Two unsupervised nonparametric segmentation methods are used in parallel cooperation, namely the Fuzzy C-means (FCM) method and the Linde-Buzo-Gray (LBG) algorithm, to segment each band of the image. The originality of the approach relies firstly on its local adaptation to the type of regions in an image (textured, non-textured), and secondly on the introduction of several levels of evaluation and validation of intermediate segmentation results before obtaining the final partitioning of the image. For the management of similar or conflicting results issued from the two classification methods, we gradually introduced various assessment steps that exploit the information of each spectral band and its adjacent bands, and finally the information of all the spectral bands. In our approach, the detected textured and non-textured regions are treated separately from the feature extraction step up to the final classification results. This approach was first evaluated on a large number of monocomponent images constructed from the Brodatz album. Then it was evaluated on two real applications using, respectively, a multispectral image for cedar tree detection in the region of Baabdat (Lebanon) and a hyperspectral image for identification of invasive and non-invasive vegetation in the region of Cieza (Spain). The correct classification rate (CCR) for the first application is over 97%, and for the second application the average correct classification rate (ACCR) is over 99%.
IMMAN: free software for information theory-based chemometric analysis.

PubMed

Urias, Ricardo W Pino; Barigye, Stephen J; Marrero-Ponce, Yovani; García-Jacas, César R; Valdes-Martiní, José R; Perez-Gimenez, Facundo

2015-05-01

The features and theoretical background of a new and free computational program for chemometric analysis denominated IMMAN (acronym for Information theory-based CheMoMetrics ANalysis) are presented. This is multi-platform software developed in the Java programming language, designed with a remarkably user-friendly graphical interface for the computation of a collection of information-theoretic functions adapted for rank-based unsupervised and supervised feature selection tasks. A total of 20 feature selection parameters are presented, with the unsupervised and supervised frameworks represented by 10 approaches in each case. Several information-theoretic parameters traditionally used as molecular descriptors (MDs) are adapted for use as unsupervised rank-based feature selection methods. On the other hand, a generalization scheme for the previously defined differential Shannon's entropy is discussed, as well as the introduction of Jeffreys information measure for supervised feature selection. Moreover, well-known information-theoretic feature selection parameters, such as information gain, gain ratio, and symmetrical uncertainty, are incorporated into the IMMAN software ( http://mobiosd-hub.com/imman-soft/ ), following an equal-interval discretization approach. IMMAN offers data pre-processing functionalities, such as missing values processing, dataset partitioning, and browsing. Moreover, single parameter or ensemble (multi-criteria) ranking options are provided. Consequently, this software is suitable for tasks like dimensionality reduction, feature ranking, as well as comparative diversity analysis of data matrices. Simple examples of applications performed with this program are presented. A comparative study between IMMAN and WEKA feature selection tools using the Arcene dataset was performed, demonstrating similar behavior. In addition, it is revealed that the use of IMMAN unsupervised feature selection methods improves the performance of both IMMAN and WEKA supervised algorithms. Graphic representation for Shannon's distribution of MD calculating software.
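To make one of the named parameters concrete, the sketch below computes information gain for a single numeric feature after equal-interval discretization. It follows the standard textbook definitions (class entropy minus the bin-weighted conditional entropy) and is written in Python for brevity; it is not IMMAN's Java code. All function names, the choice of two bins and the toy data are assumptions made for the example.

```python
import math
from collections import Counter


def shannon_entropy(labels):
    """Shannon entropy (in bits) of a list of discrete labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def equal_interval_bins(values, n_bins):
    """Map each value to one of n_bins equal-width intervals over [min, max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def information_gain(values, classes, n_bins=2):
    """Class entropy minus the entropy remaining after binning the feature."""
    bins = equal_interval_bins(values, n_bins)
    total = len(classes)
    conditional = 0.0
    for b in set(bins):
        subset = [c for v, c in zip(bins, classes) if v == b]
        conditional += len(subset) / total * shannon_entropy(subset)
    return shannon_entropy(classes) - conditional


# Hypothetical feature that separates two classes perfectly.
feature = [0.1, 0.2, 0.3, 0.8, 0.9, 1.0]
classes = ["a", "a", "a", "b", "b", "b"]
gain = information_gain(feature, classes)
print(gain)
```

Ranking features by this value, highest first, gives a simple supervised rank-based selection of the kind the abstract lists.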
Ocean surface partitioning strategies using ocean colour remote Sensing: A review

NASA Astrophysics Data System (ADS)

Krug, Lilian Anne; Platt, Trevor; Sathyendranath, Shubha; Barbosa, Ana B.

2017-06-01

The ocean surface is organized into regions with distinct properties reflecting the complexity of interactions between environmental forcing and biological responses. The delineation of these functional units, each with unique, homogeneous properties and underlying ecosystem structure and dynamics, can be defined as ocean surface partitioning. The main purposes and applications of ocean partitioning include the evaluation of particular marine environments; generation of more accurate satellite ocean colour products; assimilation of data into biogeochemical and climate models; and establishment of ecosystem-based management practices. This paper reviews the diverse approaches implemented for partitioning the ocean surface into functional units using ocean colour remote sensing (OCRS) data, including their purposes, criteria, methods and scales. OCRS offers a synoptic, high spatial-temporal resolution, multi-decadal coverage of bio-optical properties, relevant to the applications and value of ocean surface partitioning. In combination with other biotic and/or abiotic data, OCRS-derived data (e.g., chlorophyll-a, optical properties) provide a broad and varied source of information that can be analysed using different delineation methods, ranging from subjective, expert-based approaches to unsupervised learning approaches (e.g., cluster, fuzzy and empirical orthogonal function analyses). Partition schemes are applied at global to mesoscale spatial coverage, with static (time-invariant) or dynamic (time-varying) representations. A case study, the highly heterogeneous area off SW Iberian Peninsula (NE Atlantic), illustrates how the selection of spatial coverage and temporal representation affects the discrimination of distinct environmental drivers of phytoplankton variability. Advances in operational oceanography and in the subject area of satellite ocean colour, including development of new sensors, algorithms and products, are among the potential benefits from extended use, scope and applications of ocean surface partitioning using OCRS.
Information-Based Approach to Unsupervised Machine Learning

DTIC Science & Technology

2013-06-19

Leibler, R. A. (1951). On information and sufficiency. Annals of Mathematical Statistics, 22, 79–86. Minka, T. P. (2000). Old and new matrix algebra use ...and Arabie, P. Comparing partitions. Journal of Classification, 2(1):193–218, 1985. Kullback, S. and Leibler, R. A. On information and sufficiency...the test input density to a linear combination of class-wise input distributions under the Kullback-Leibler (KL) divergence (Kullback
Semi-supervised clustering methods

PubMed Central

Bair, Eric

2013-01-01

Cluster analysis methods seek to partition a data set into homogeneous subgroups. Cluster analysis is useful in a wide variety of applications, including document processing and modern genetics. Conventional clustering methods are unsupervised, meaning that there is no outcome variable nor is anything known about the relationship between the observations in the data set. In many situations, however, information about the clusters is available in addition to the values of the features. For example, the cluster labels of some observations may be known, or certain observations may be known to belong to the same cluster. In other cases, one may wish to identify clusters that are associated with a particular outcome variable. This review describes several clustering algorithms (known as "semi-supervised clustering" methods) that can be applied in these situations. The majority of these methods are modifications of the popular k-means clustering method, and several of them will be described in detail. A brief description of some other semi-supervised clustering algorithms is also provided. PMID:24729830
Semi-supervised clustering methods.

PubMed

Bair, Eric

2013-01-01

Cluster analysis methods seek to partition a data set into homogeneous subgroups. Cluster analysis is useful in a wide variety of applications, including document processing and modern genetics. Conventional clustering methods are unsupervised, meaning that there is no outcome variable nor is anything known about the relationship between the observations in the data set. In many situations, however, information about the clusters is available in addition to the values of the features. For example, the cluster labels of some observations may be known, or certain observations may be known to belong to the same cluster. In other cases, one may wish to identify clusters that are associated with a particular outcome variable. This review describes several clustering algorithms (known as "semi-supervised clustering" methods) that can be applied in these situations. The majority of these methods are modifications of the popular k-means clustering method, and several of them will be described in detail. A brief description of some other semi-supervised clustering algorithms is also provided.
An unsupervised hierarchical dynamic self-organizing approach to cancer class discovery and marker gene identification in microarray data.

PubMed

Hsu, Arthur L; Tang, Sen-Lin; Halgamuge, Saman K

2003-11-01

Current Self-Organizing Maps (SOMs) approaches to gene expression pattern clustering require the user to predefine the number of clusters likely to be expected. Hierarchical clustering methods used in this area do not provide unique partitioning of data. We describe an unsupervised dynamic hierarchical self-organizing approach, which suggests an appropriate number of clusters, to perform class discovery and marker gene identification in microarray data. In the process of class discovery, the proposed algorithm identifies corresponding sets of predictor genes that best distinguish one class from other classes. The approach integrates merits of hierarchical clustering with robustness against noise known from self-organizing approaches. The proposed algorithm applied to DNA microarray data sets of two types of cancers has demonstrated its ability to produce the most suitable number of clusters. Further, the corresponding marker genes identified through the unsupervised algorithm also have a strong biological relationship to the specific cancer class. The algorithm tested on leukemia microarray data, which contains three leukemia types, was able to determine three major and one minor cluster. Prediction models built for the four clusters indicate that the prediction strength for the smaller cluster is generally low, therefore labelled as uncertain cluster. Further analysis shows that the uncertain cluster can be subdivided further, and the subdivisions are related to two of the original clusters. Another test performed using colon cancer microarray data has automatically derived two clusters, which is consistent with the number of classes in data (cancerous and normal). JAVA software of dynamic SOM tree algorithm is available upon request for academic use. A comparison of rectangular and hexagonal topologies for GSOM is available from http://www.mame.mu.oz.au/mechatronics/journalinfo/Hsu2003supp.pdf
Unsupervised Deep Hashing With Pseudo Labels for Scalable Image Retrieval.

PubMed

Zhang, Haofeng; Liu, Li; Long, Yang; Shao, Ling

2018-04-01

In order to achieve efficient similarity searching, hash functions are designed to encode images into low-dimensional binary codes with the constraint that similar features will have a short distance in the projected Hamming space. Recently, deep learning-based methods have become more popular, and outperform traditional non-deep methods. However, without label information, most state-of-the-art unsupervised deep hashing (DH) algorithms suffer from severe performance degradation in unsupervised scenarios. One of the main reasons is that the ad-hoc encoding process cannot properly capture the visual feature distribution. In this paper, we propose a novel unsupervised framework that has two main contributions: 1) we convert the unsupervised DH model into a supervised one by discovering pseudo labels; 2) the framework unifies likelihood maximization, mutual information maximization, and quantization error minimization so that the pseudo labels can maximally preserve the distribution of visual features. Extensive experiments on three popular data sets demonstrate the advantages of the proposed method, which leads to significant performance improvement over the state-of-the-art unsupervised hashing algorithms.
Leveraging unsupervised training sets for multi-scale compartmentalization in renal pathology

NASA Astrophysics Data System (ADS)

Lutnick, Brendon; Tomaszewski, John E.; Sarder, Pinaki

2017-03-01

Clinical pathology relies on manual compartmentalization and quantification of biological structures, which is time consuming and often error-prone. Application of computer vision segmentation algorithms to histopathological image analysis, in contrast, can offer fast, reproducible, and accurate quantitative analysis to aid pathologists. Algorithms tunable to different biologically relevant structures can allow accurate, precise, and reproducible estimates of disease states. In this direction, we have developed a fast, unsupervised computational method for simultaneously separating all biologically relevant structures from histopathological images at multiple scales. Segmentation is achieved by solving an energy optimization problem. Representing the image as a graph, nodes (pixels) are grouped by minimizing a Potts model Hamiltonian, adopted from theoretical physics, modeling interacting electron spins. Pixel relationships (modeled as edges) are used to update the energy of the partitioned graph. By iteratively improving the clustering, the optimal number of segments is revealed. To reduce computational time, the graph is simplified using a Cantor pairing function to intelligently reduce the number of included nodes. The classified nodes are then used to train a multiclass support vector machine to apply the segmentation over the full image. Accurate segmentations of images with as many as 10^6 pixels can be completed in only 5 sec, allowing for attainable multi-scale visualization. To establish clinical potential, we employed our method in renal biopsies to quantitatively visualize, for the first time, scale-variant compartments of heterogeneous intra- and extraglomerular structures simultaneously. Implications of the utility of our method extend to fields such as oncology, genomics, and non-biological problems.
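The Cantor pairing function mentioned above maps a pair of non-negative integers to a single unique non-negative integer, and the mapping can be inverted. The abstract does not say how the pairing is applied to reduce the graph, so the sketch below shows only the function itself and its inverse; the names `cantor_pair` and `cantor_unpair` and the sample values are assumptions made for the example.

```python
import math


def cantor_pair(a, b):
    """Map two non-negative integers to one unique non-negative integer."""
    return (a + b) * (a + b + 1) // 2 + b


def cantor_unpair(z):
    """Invert cantor_pair, recovering the original (a, b)."""
    w = (math.isqrt(8 * z + 1) - 1) // 2  # index of the diagonal a + b
    t = w * (w + 1) // 2                  # first code on that diagonal
    b = z - t
    return w - b, b


# Hypothetical use: collapse a pair of quantized pixel values into one key.
key = cantor_pair(3, 4)
print(key, cantor_unpair(key))
```

Because distinct pairs always receive distinct keys, identical pairs can be grouped by a single integer lookup, which is one plausible way such a function could shrink the set of nodes to process.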
Principal component analysis-based unsupervised feature extraction applied to in silico drug discovery for posttraumatic stress disorder-mediated heart disease.

PubMed

Taguchi, Y-h; Iwadate, Mitsuo; Umeyama, Hideaki

2015-04-30

Feature extraction (FE) is difficult, particularly if there are more features than samples, as small sample numbers often result in biased outcomes or overfitting. Furthermore, multiple sample classes often complicate FE because evaluating performance, which is usual in supervised FE, is generally harder than in the two-class problem. Developing sample classification independent unsupervised methods would solve many of these problems. Two principal component analysis (PCA)-based FE methods were tested as sample classification independent unsupervised FE methods: variational Bayes PCA (VBPCA), which was extended to perform unsupervised FE, and conventional PCA (CPCA)-based unsupervised FE. VBPCA- and CPCA-based unsupervised FE both performed well when applied to simulated data and to a posttraumatic stress disorder (PTSD)-mediated heart disease data set that had multiple categorical class observations in mRNA/microRNA expression of stressed mouse heart. A critical set of PTSD miRNAs/mRNAs was identified that show aberrant expression between treatment and control samples, and significant, negative correlation with one another. Moreover, greater stability and biological feasibility than conventional supervised FE was also demonstrated. Based on the results obtained, in silico drug discovery was performed as translational validation of the methods. Our two proposed unsupervised FE methods (CPCA- and VBPCA-based) worked well on simulated data, and outperformed two conventional supervised FE methods on a real data set. Thus, these two methods have suggested equivalence for FE on categorical multiclass data sets, with potential translational utility for in silico drug discovery.

Novel Histogram Based Unsupervised Classification Technique to Determine Natural Classes From Biophysically Relevant Fit Parameters to Hyperspectral Data

DOE PAGES

McCann, Cooper; Repasky, Kevin S.; Morin, Mikindra; ...

2017-05-23

Hyperspectral image analysis has benefited from an array of methods that take advantage of the increased spectral depth compared to multispectral sensors; however, the focus of these developments has been on supervised classification methods. Lack of a priori knowledge regarding land cover characteristics can make unsupervised classification methods preferable under certain circumstances. An unsupervised classification technique is presented in this paper that utilizes physically relevant basis functions to model the reflectance spectra. These fit parameters used to generate the basis functions allow clustering based on spectral characteristics rather than spectral channels and provide both noise and data reduction. Histogram splitting of the fit parameters is then used as a means of producing an unsupervised classification. Unlike current unsupervised classification techniques that rely primarily on Euclidean distance measures to determine similarity, the unsupervised classification technique uses the natural splitting of the fit parameters associated with the basis functions, creating clusters that are similar in terms of physical parameters. The data set used in this work utilizes the publicly available data collected at Indian Pines, Indiana. This data set provides reference data allowing for comparisons of the efficacy of different unsupervised data analyses. The unsupervised histogram splitting technique presented in this paper is shown to be better than the standard unsupervised ISODATA clustering technique with an overall accuracy of 34.3/19.0% before merging and 40.9/39.2% after merging. Finally, this improvement is also seen as an improvement of kappa before/after merging of 24.8/30.5 for the histogram splitting technique compared to 15.8/28.5 for ISODATA.
Unsupervised learning on scientific ocean drilling datasets from the South China Sea

NASA Astrophysics Data System (ADS)

Tse, Kevin C.; Chiu, Hon-Chim; Tsang, Man-Yin; Li, Yiliang; Lam, Edmund Y.

2018-06-01

Unsupervised learning methods were applied to explore data patterns in multivariate geophysical datasets collected from ocean floor sediment core samples coming from scientific ocean drilling in the South China Sea. Compared to studies on similar datasets, but using supervised learning methods which are designed to make predictions based on sample training data, unsupervised learning methods require no a priori information and focus only on the input data. In this study, popular unsupervised learning methods including K-means, self-organizing maps, hierarchical clustering and random forest were coupled with different distance metrics to form exploratory data clusters. The resulting data clusters were externally validated with lithologic units and geologic time scales assigned to the datasets by conventional methods. Compact and connected data clusters displayed varying degrees of correspondence with existing classification by lithologic units and geologic time scales. K-means and self-organizing maps were observed to perform better with lithologic units while random forest corresponded best with geologic time scales. This study sets a pioneering example of how unsupervised machine learning methods can be used as an automatic processing tool for the increasingly high volume of scientific ocean drilling data.
A Novel Unsupervised Segmentation Quality Evaluation Method for Remote Sensing Images

PubMed Central

Tang, Yunwei; Jing, Linhai; Ding, Haifeng

2017-01-01

The segmentation of a high spatial resolution remote sensing image is a critical step in geographic object-based image analysis (GEOBIA). Evaluating the performance of segmentation without ground truth data, i.e., unsupervised evaluation, is important for the comparison of segmentation algorithms and the automatic selection of optimal parameters. This unsupervised strategy currently faces several challenges in practice, such as difficulties in designing effective indicators and limitations of the spectral values in the feature representation. This study proposes a novel unsupervised evaluation method to quantitatively measure the quality of segmentation results to overcome these problems. In this method, multiple spectral and spatial features of images are first extracted simultaneously and then integrated into a feature set to improve the quality of the feature representation of ground objects. The indicators designed for spatial stratified heterogeneity and spatial autocorrelation are included to estimate the properties of the segments in this integrated feature set. These two indicators are then combined into a global assessment metric as the final quality score. The trade-offs of the combined indicators are accounted for using a strategy based on the Mahalanobis distance, which can be exhibited geometrically. The method is tested on two segmentation algorithms and three testing images. The proposed method is compared with two existing unsupervised methods and a supervised method to confirm its capabilities. Through comparison and visual analysis, the results verified the effectiveness of the proposed method and demonstrated the reliability and improvements of this method with respect to other methods. PMID:29064416
Color normalization of histology slides using graph regularized sparse NMF

NASA Astrophysics Data System (ADS)

Sha, Lingdao; Schonfeld, Dan; Sethi, Amit

2017-03-01

Computer based automatic medical image processing and quantification are becoming popular in digital pathology. However, preparation of histology slides can vary widely due to differences in staining equipment, procedures and reagents, which can reduce the accuracy of algorithms that analyze their color and texture information. To reduce the unwanted color variations, various supervised and unsupervised color normalization methods have been proposed. Compared with supervised color normalization methods, unsupervised color normalization methods have the advantages of time and cost efficiency and universal applicability. Most of the unsupervised color normalization methods for histology are based on stain separation. Because stain concentration cannot be negative and different parts of the tissue absorb different stains, nonnegative matrix factorization (NMF), and in particular its sparse version (SNMF), are good candidates for stain separation. However, most of the existing unsupervised color normalization methods, such as PCA, ICA, NMF and SNMF, fail to consider important information about the sparse manifolds that image pixels occupy, which could potentially result in loss of texture information during color normalization. Manifold learning methods like Graph Laplacian have proven to be very effective in interpreting high-dimensional data. In this paper, we propose a novel unsupervised stain separation method called graph regularized sparse nonnegative matrix factorization (GSNMF). By considering the sparse prior of stain concentration together with manifold information from high-dimensional image data, our method shows better performance in stain color deconvolution than existing unsupervised color deconvolution methods, especially in keeping connected texture information. To utilize the texture information, we construct a nearest neighbor graph between pixels within a spatial area of an image based on their distances using a heat kernel in lαβ space. The representation of a pixel in the stain density space is constrained to follow the feature distance of the pixel to pixels in the neighborhood graph. Utilizing the color matrix transfer method with the stain concentrations found using our GSNMF method, the color normalization performance was also better than that of existing methods.
Multispectral and Panchromatic used Enhancement Resolution and Study Effective Enhancement on Supervised and Unsupervised Classification Land – Cover

NASA Astrophysics Data System (ADS)

Salman, S. S.; Abbas, W. A.

2018-05-01

The goal of this study is to analyze resolution enhancement and its effect on classification methods that use spectral band information. The study introduces a method to enhance the resolution of Landsat 8 imagery by combining the 30-meter spectral bands with the 15-meter panchromatic band (band 8), because multispectral imagery is important for extracting land cover. Classification methods are used to classify several land covers recorded in OLI-8 imagery. Data mining methods can be classified as either supervised or unsupervised. In supervised methods there is a particular predefined target, meaning that the algorithm learns which values of the target are associated with which values of the predictor sample. The K-nearest neighbors and maximum likelihood algorithms are examined in this work as supervised methods. In unsupervised methods, on the other hand, no sample is identified as the target; the algorithm searches for structure and patterns among all the variables. The fuzzy C-means clustering method is used here as the unsupervised method. The NDVI vegetation index is used to compare the results of the classification methods; the percentage of dense vegetation obtained with the maximum likelihood method gives the best results.
Teacher and learner: Supervised and unsupervised learning in communities.

PubMed

Shafto, Michael G; Seifert, Colleen M

2015-01-01

How far can teaching methods go to enhance learning? Optimal methods of teaching have been considered in research on supervised and unsupervised learning. Locally optimal methods are usually hybrids of teaching and self-directed approaches. The costs and benefits of specific methods have been shown to depend on the structure of the learning task, the learners, the teachers, and the environment.
Unsupervised classification of multivariate geostatistical data: Two algorithms

NASA Astrophysics Data System (ADS)

Romary, Thomas; Ors, Fabien; Rivoirard, Jacques; Deraisme, Jacques

2015-12-01

With the increasing development of remote sensing platforms and the evolution of sampling facilities in the mining and oil industries, spatial datasets are becoming increasingly large, include a growing number of variables and cover wider and wider areas. Therefore, it is often necessary to split the domain of study to account for radically different behaviors of the natural phenomenon over the domain and to simplify the subsequent modeling step. The definition of these areas can be seen as a problem of unsupervised classification, or clustering, where we try to divide the domain into homogeneous domains with respect to the values taken by the variables at hand. The application of classical clustering methods, designed for independent observations, does not ensure the spatial coherence of the resulting classes. Image segmentation methods, based on e.g. Markov random fields, are not adapted to irregularly sampled data. Other existing approaches, based on mixtures of Gaussian random functions estimated via the expectation-maximization algorithm, are limited to reasonable sample sizes and a small number of variables. In this work, we propose two algorithms based on adaptations of classical algorithms to multivariate geostatistical data. Both algorithms are model free and can handle large volumes of multivariate, irregularly spaced data. The first one proceeds by agglomerative hierarchical clustering. The spatial coherence is ensured by a proximity condition imposed for two clusters to merge. This proximity condition relies on a graph organizing the data in the coordinates space. The hierarchical algorithm can then be seen as a graph-partitioning algorithm. Following this interpretation, a spatial version of the spectral clustering algorithm is also proposed. The performances of both algorithms are assessed on toy examples and a mining dataset.
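The proximity condition described above, where two clusters may merge only if they are linked in a graph built on the coordinates, can be sketched in a few lines of Python. This is an illustrative simplification, not the authors' algorithm: the radius-based graph, the single variable per sample, the mean-difference merge criterion, and the function name `spatial_agglomerative` are all assumptions.

```python
import math


def spatial_agglomerative(coords, values, radius, n_clusters):
    """Agglomerative clustering that only merges graph-adjacent clusters."""
    n = len(coords)
    clusters = {i: [i] for i in range(n)}
    # Proximity graph (assumed rule): samples closer than `radius` are linked.
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= radius:
                adj[i].add(j)
                adj[j].add(i)

    def mean(c):
        return sum(values[k] for k in clusters[c]) / len(clusters[c])

    while len(clusters) > n_clusters:
        best = None
        for a in clusters:
            for b in adj[a]:
                if a < b:
                    d = abs(mean(a) - mean(b))
                    if best is None or d < best[0]:
                        best = (d, a, b)
        if best is None:
            break  # no adjacent clusters left to merge
        _, a, b = best
        clusters[a].extend(clusters.pop(b))
        # Neighbours of the absorbed cluster become neighbours of the merged one.
        for c in adj.pop(b):
            adj[c].discard(b)
            if c != a:
                adj[c].add(a)
                adj[a].add(c)

    labels = [0] * n
    for lab, members in enumerate(clusters.values()):
        for k in members:
            labels[k] = lab
    return labels


# Toy data: sample 3 is closest in value to sample 0 but spatially far away.
coords = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0)]
values = [1.0, 1.1, 5.0, 1.05, 5.1]
labels = spatial_agglomerative(coords, values, radius=1.5, n_clusters=4)
```

In the toy data, samples 0 and 3 have the most similar values, yet they are never merged because no edge of the graph connects them; samples 0 and 1 merge instead, which is the spatial coherence the abstract refers to.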
A consensus embedding approach for segmentation of high resolution in vivo prostate magnetic resonance imagery

NASA Astrophysics Data System (ADS)

Viswanath, Satish; Rosen, Mark; Madabhushi, Anant

2008-03-01

Current techniques for localization of prostatic adenocarcinoma (CaP) via blinded trans-rectal ultrasound biopsy are associated with a high false negative detection rate. While high resolution endorectal in vivo Magnetic Resonance (MR) prostate imaging has been shown to have improved contrast and resolution for CaP detection over ultrasound, similarity in intensity characteristics between benign and cancerous regions on MR images contributes to a high false positive detection rate. In this paper, we present a novel unsupervised segmentation method that employs manifold learning via consensus schemes for detection of cancerous regions from high resolution 1.5 Tesla (T) endorectal in vivo prostate MRI. A significant contribution of this paper is a method to combine multiple weak, lower-dimensional representations of high dimensional feature data in a way analogous to classifier ensemble schemes, and hence create a stable and accurate reduced dimensional representation. After correcting for MR image intensity artifacts, such as bias field inhomogeneity and intensity non-standardness, our algorithm extracts over 350 3D texture features at every spatial location in the MR scene at multiple scales and orientations. Non-linear dimensionality reduction schemes such as Locally Linear Embedding (LLE) and Graph Embedding (GE) are employed to create multiple low dimensional data representations of this high dimensional texture feature space. Our novel consensus embedding method is used to average object adjacencies from within the multiple low dimensional projections so that class relationships are preserved. Unsupervised consensus clustering is then used to partition the objects in this consensus embedding space into distinct classes. Quantitative evaluation on 18 1.5 T prostate MR datasets against corresponding histology obtained from the multi-site ACRIN trials shows a sensitivity of 92.65% and a specificity of 82.06%, which suggests that our method is successfully able to detect suspicious regions in the prostate.
A comparative analysis of pixel- and object-based detection of landslides from very high-resolution images

NASA Astrophysics Data System (ADS)

Keyport, Ren N.; Oommen, Thomas; Martha, Tapas R.; Sajinkumar, K. S.; Gierke, John S.

2018-02-01

A comparative analysis of landslides detected by pixel-based and object-oriented analysis (OOA) methods was performed using very high-resolution (VHR) remotely sensed aerial images for San Juan La Laguna, Guatemala, which witnessed widespread devastation during Hurricane Stan in 2005. A 3-band orthophoto of 0.5 m spatial resolution together with a field-based inventory of 115 landslides was used for the analysis. A binary reference was assigned with a zero value for landslide and unity for non-landslide pixels. The pixel-based analysis was performed using unsupervised classification, which resulted in 11 different trial classes. Detection of landslides using OOA includes 2-step K-means clustering to eliminate regions based on brightness, followed by elimination of false positives using object properties such as rectangular fit, compactness, length/width ratio, mean difference of objects, and slope angle. Both overall accuracy and F-score for OOA methods outperformed pixel-based unsupervised classification methods in both landslide and non-landslide classes. The overall accuracy for OOA and pixel-based unsupervised classification was 96.5% and 94.3%, respectively, whereas the best F-scores for landslide identification for OOA and pixel-based unsupervised methods were 84.3% and 77.9%, respectively. Results indicate that the OOA is able to identify the majority of landslides with few false positives when compared to pixel-based unsupervised classification.
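For readers unfamiliar with the two scores quoted above, the following Python sketch shows how overall accuracy and the F-score are commonly computed from a binary reference and a binary prediction. It follows the abstract's coding (zero for landslide, unity for non-landslide); the function name and the sample pixel lists are hypothetical, and the standard F1 definition (harmonic mean of precision and recall) is assumed since the abstract does not state which variant was used.

```python
def accuracy_and_f_score(reference, predicted, positive=0):
    """Return (overall accuracy, F-score) for the class coded as `positive`."""
    pairs = list(zip(reference, predicted))
    tp = sum(r == positive and p == positive for r, p in pairs)
    fp = sum(r != positive and p == positive for r, p in pairs)
    fn = sum(r == positive and p != positive for r, p in pairs)
    accuracy = sum(r == p for r, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return accuracy, 0.0
    return accuracy, 2 * precision * recall / (precision + recall)


# Hypothetical pixels: 0 = landslide, 1 = non-landslide.
reference = [0, 0, 0, 1, 1, 1, 1, 1]
predicted = [0, 0, 1, 0, 1, 1, 1, 1]
acc, f_landslide = accuracy_and_f_score(reference, predicted, positive=0)
```

Passing `positive=1` gives the F-score for the non-landslide class, which is how a per-class comparison like the one in the abstract could be produced.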
Characterizing Heterogeneity within Head and Neck Lesions Using Cluster Analysis of Multi-Parametric MRI Data

PubMed Central

Borri, Marco; Schmidt, Maria A.; Powell, Ceri; Koh, Dow-Mu; Riddell, Angela M.; Partridge, Mike; Bhide, Shreerang A.; Nutting, Christopher M.; Harrington, Kevin J.; Newbold, Katie L.; Leach, Martin O.

2015-01-01

Purpose: To describe a methodology, based on cluster analysis, to partition multi-parametric functional imaging data into groups (or clusters) of similar functional characteristics, with the aim of characterizing functional heterogeneity within head and neck tumour volumes. To evaluate the performance of the proposed approach on a set of longitudinal MRI data, analysing the evolution of the obtained sub-sets with treatment. Material and Methods: The cluster analysis workflow was applied to a combination of dynamic contrast-enhanced and diffusion-weighted imaging MRI data from a cohort of squamous cell carcinoma of the head and neck patients. Cumulative distributions of voxels, containing pre and post-treatment data and including both primary tumours and lymph nodes, were partitioned into k clusters (k = 2, 3 or 4). Principal component analysis and cluster validation were employed to investigate data composition and to independently determine the optimal number of clusters. The evolution of the resulting sub-regions with induction chemotherapy treatment was assessed relative to the number of clusters. Results: The clustering algorithm was able to separate clusters which significantly reduced in voxel number following induction chemotherapy from clusters with a non-significant reduction. Partitioning with the optimal number of clusters (k = 4), determined with cluster validation, produced the best separation between reducing and non-reducing clusters. Conclusion: The proposed methodology was able to identify tumour sub-regions with distinct functional properties, independently separating clusters which were affected differently by treatment. This work demonstrates that unsupervised cluster analysis, with no prior knowledge of the data, can be employed to provide a multi-parametric characterization of functional heterogeneity within tumour volumes. PMID:26398888
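The workflow of partitioning voxels into k clusters for k = 2, 3 or 4 and then choosing k by cluster validation can be sketched as follows. The abstract does not name the clustering algorithm or the validation index, so this sketch assumes plain k-means with a deterministic farthest-point initialisation and the mean silhouette as the validation score; all function names and the toy points are hypothetical.

```python
import math


def kmeans(points, k, n_iter=50):
    """Plain k-means with deterministic farthest-point initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


def silhouette(points, labels):
    """Mean silhouette width; singletons and degenerate cases score 0."""
    scores = []
    for i, p in enumerate(points):
        by_cluster = {}
        for j, q in enumerate(points):
            if j != i:
                by_cluster.setdefault(labels[j], []).append(math.dist(p, q))
        own = by_cluster.pop(labels[i], None)
        if not own or not by_cluster:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)  # mean distance within own cluster
        b = min(sum(d) / len(d) for d in by_cluster.values())  # nearest other cluster
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


def best_k(points, candidates=(2, 3, 4)):
    """Pick the candidate k whose partition has the highest mean silhouette."""
    return max(candidates, key=lambda k: silhouette(points, kmeans(points, k)))


# Toy "voxels" with two hypothetical functional parameters each.
voxels = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (20, 0), (20, 1), (21, 0)]
chosen = best_k(voxels)
```

On real multi-parametric data the parameters would normally be standardised first, since k-means and the silhouette both depend on the distance scale.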
Detection of Food Intake from Swallowing Sequences by Supervised and Unsupervised Methods

PubMed Central

Lopez-Meyer, Paulo; Makeyev, Oleksandr; Schuckers, Stephanie; Melanson, Edward L.; Neuman, Michael R.; Sazonov, Edward

2010-01-01

Studies of food intake and ingestive behavior in free-living conditions most often rely on self-reporting-based methods that can be highly inaccurate. Methods of Monitoring of Ingestive Behavior (MIB) rely on objective measures derived from chewing and swallowing sequences and thus can be used for unbiased study of food intake with free-living conditions. Our previous study demonstrated accurate detection of food intake in simple models relying on observation of both chewing and swallowing. This article investigates methods that achieve comparable accuracy of food intake detection using only the time series of swallows and thus eliminating the need for the chewing sensor. The classification is performed for each individual swallow rather than for previously used time slices and thus will lead to higher accuracy in mass prediction models relying on counts of swallows. Performance of a group model based on a supervised method (SVM) is compared to performance of individual models based on an unsupervised method (K-means) with results indicating better performance of the unsupervised, self-adapting method. Overall, the results demonstrate that highly accurate detection of intake of foods with substantially different physical properties is possible by an unsupervised system that relies on the information provided by the swallowing alone. PMID:20352335
Unsupervised frequency-recognition method of SSVEPs using a filter bank implementation of binary subband CCA.

PubMed

Rabiul Islam, Md; Khademul Islam Molla, Md; Nakanishi, Masaki; Tanaka, Toshihisa

2017-04-01

Recently developed effective methods for detecting commands in steady-state visual evoked potential (SSVEP)-based brain-computer interfaces (BCIs) need calibration for visual stimuli, which costs more time and causes more fatigue prior to use as the number of commands increases. This paper develops a novel unsupervised method based on canonical correlation analysis (CCA) for accurate detection of stimulus frequency. A novel unsupervised technique termed binary subband CCA (BsCCA) is implemented in a multiband approach to enhance the frequency recognition performance of SSVEP. In BsCCA, two subbands are used and a CCA-based correlation coefficient is computed for the individual subbands. In addition, a reduced set of artificial reference signals is used to calculate CCA for the second subband. The SSVEP under analysis is decomposed into multiple subbands and BsCCA is implemented for each one. Then, the overall recognition score is determined by a weighted sum of the canonical correlation coefficients obtained from each band. A 12-class SSVEP dataset (frequency range: 9.25-14.75 Hz with an interval of 0.5 Hz) for ten healthy subjects is used to evaluate the performance of the proposed method. The results suggest that BsCCA significantly improves the performance of SSVEP-based BCI compared to the state-of-the-art methods. The proposed method is an unsupervised approach with an averaged information transfer rate (ITR) of 77.04 bits/min across 10 subjects. The maximum individual ITR is 107.55 bits/min for the 12-class SSVEP dataset, whereas ITRs of 69.29 and 69.44 bits/min are achieved with CCA and NCCA, respectively. The statistical test shows that the proposed unsupervised method significantly improves the performance of the SSVEP-based BCI. It can be used in real-world applications.
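The final scoring step, a weighted sum of per-band canonical correlation coefficients followed by picking the best-scoring stimulus frequency, can be sketched in Python. The CCA computation itself is not shown; the coefficient values, the weights, and the function name below are made up for illustration and are not taken from the paper.

```python
def recognize_frequency(band_correlations, weights):
    """Combine per-band correlations into one score per candidate frequency.

    band_correlations maps each candidate stimulus frequency (Hz) to a list of
    correlation coefficients, one per subband; weights has one entry per subband.
    """
    scores = {
        freq: sum(w * rho for w, rho in zip(weights, rhos))
        for freq, rhos in band_correlations.items()
    }
    # The recognised frequency is the candidate with the largest combined score.
    return max(scores, key=scores.get), scores


# Hypothetical coefficients for three candidate frequencies and three subbands.
correlations = {
    9.25: [0.20, 0.10, 0.05],
    9.75: [0.60, 0.40, 0.20],
    10.25: [0.30, 0.50, 0.10],
}
weights = [1.0, 0.5, 0.25]  # assumed weights that favour the lower subbands
best, scores = recognize_frequency(correlations, weights)
```

How the weights are chosen is a design decision the abstract leaves open; decreasing weights for higher subbands are only one plausible option.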
Automated glioblastoma segmentation based on a multiparametric structured unsupervised classification.

PubMed

Juan-Albarracín, Javier; Fuster-Garcia, Elies; Manjón, José V; Robles, Montserrat; Aparici, F; Martí-Bonmatí, L; García-Gómez, Juan M

2015-01-01

Automatic brain tumour segmentation has become a key component for the future of brain tumour treatment. Currently, most brain tumour segmentation approaches arise from the supervised learning standpoint, which requires a labelled training dataset from which to infer the models of the classes. The performance of these models is directly determined by the size and quality of the training corpus, whose retrieval becomes a tedious and time-consuming task. On the other hand, unsupervised approaches avoid these limitations but often do not reach results comparable to those of the supervised methods. In this sense, we propose an automated unsupervised method for brain tumour segmentation based on anatomical Magnetic Resonance (MR) images. Four unsupervised classification algorithms, grouped by their structured or non-structured condition, were evaluated within our pipeline. Considering the non-structured algorithms, we evaluated K-means, Fuzzy K-means and Gaussian Mixture Model (GMM), whereas as the structured classification algorithm we evaluated Gaussian Hidden Markov Random Field (GHMRF). An automated postprocess based on a statistical approach supported by tissue probability maps is proposed to automatically identify the tumour classes after the segmentations. We evaluated our brain tumour segmentation method with the public BRAin Tumor Segmentation (BRATS) 2013 Test and Leaderboard datasets. Our approach based on the GMM model improves the results obtained by most of the supervised methods evaluated with the Leaderboard set and reaches the second position in the ranking. Our variant based on the GHMRF achieves the first position in the Test ranking of the unsupervised approaches and the seventh position in the general Test ranking, which confirms the method as a viable alternative for brain tumour segmentation.
Unsupervised automated high throughput phenotyping of RNAi time-lapse movies.

PubMed

Failmezger, Henrik; Fröhlich, Holger; Tresch, Achim

2013-10-04

Gene perturbation experiments in combination with fluorescence time-lapse cell imaging are a powerful tool in reverse genetics. High content applications require tools for the automated processing of the large amounts of data. These tools include in general several image processing steps, the extraction of morphological descriptors, and the grouping of cells into phenotype classes according to their descriptors. This phenotyping can be applied in a supervised or an unsupervised manner. Unsupervised methods are suitable for the discovery of formerly unknown phenotypes, which are expected to occur in high-throughput RNAi time-lapse screens. We developed an unsupervised phenotyping approach based on Hidden Markov Models (HMMs) with multivariate Gaussian emissions for the detection of knockdown-specific phenotypes in RNAi time-lapse movies. The automated detection of abnormal cell morphologies allows us to assign a phenotypic fingerprint to each gene knockdown. By applying our method to the Mitocheck database, we show that a phenotypic fingerprint is indicative of a gene's function. Our fully unsupervised HMM-based phenotyping is able to automatically identify cell morphologies that are specific for a certain knockdown. Beyond the identification of genes whose knockdown affects cell morphology, phenotypic fingerprints can be used to find modules of functionally related genes.
Object-oriented feature-tracking algorithms for SAR images of the marginal ice zone

NASA Technical Reports Server (NTRS)

Daida, Jason; Samadani, Ramin; Vesecky, John F.

1990-01-01

An unsupervised method that chooses and applies the most appropriate tracking algorithm from among different sea-ice tracking algorithms is reported. In contrast to current unsupervised methods, this method chooses and applies an algorithm by partially examining a sequential image pair to draw inferences about what was examined. Based on these inferences the reported method subsequently chooses which algorithm to apply to specific areas of the image pair where that algorithm should work best.
A Novel Unsupervised Adaptive Learning Method for Long-Term Electromyography (EMG) Pattern Recognition

PubMed Central

Huang, Qi; Yang, Dapeng; Jiang, Li; Zhang, Huajie; Liu, Hong; Kotani, Kiyoshi

2017-01-01

In the long term, a variety of interfering factors degrade the performance of pattern recognition-based myoelectric control methods. This paper proposes an adaptive learning method with low computational cost to mitigate the effect in unsupervised adaptive learning scenarios. We present a particle adaptive classifier (PAC), constructed from a particle adaptive learning strategy and a universal incremental least square support vector classifier (LS-SVC). We compared PAC performance with incremental support vector classifier (ISVC) and non-adapting SVC (NSVC) in a long-term pattern recognition task in both unsupervised and supervised adaptive learning scenarios. Retraining time cost and recognition accuracy were compared by validating the classification performance on both simulated and realistic long-term EMG data. The classification results of realistic long-term EMG data showed that the PAC significantly decreased the performance degradation in unsupervised adaptive learning scenarios compared with NSVC (9.03% ± 2.23%, p < 0.05) and ISVC (13.38% ± 2.62%, p = 0.001), and reduced the retraining time cost compared with ISVC (2 ms per updating cycle vs. 50 ms per updating cycle). PMID:28608824
Supervised and Unsupervised Aspect Category Detection for Sentiment Analysis with Co-occurrence Data.

PubMed

Schouten, Kim; van der Weijde, Onne; Frasincar, Flavius; Dekker, Rommert

2018-04-01

Using online consumer reviews as electronic word of mouth to assist purchase-decision making has become increasingly popular. The Web provides an extensive source of consumer reviews, but one can hardly read all reviews to obtain a fair evaluation of a product or service. A text processing framework that can summarize reviews would therefore be desirable. A subtask to be performed by such a framework would be to find the general aspect categories addressed in review sentences, for which this paper presents two methods. In contrast to most existing approaches, the first method presented is an unsupervised method that applies association rule mining on co-occurrence frequency data obtained from a corpus to find these aspect categories. While not on par with state-of-the-art supervised methods, the proposed unsupervised method performs better than several simple baselines, a similar but supervised method, and a supervised baseline, with an F1-score of 67%. The second method is a supervised variant that outperforms existing methods with an F1-score of 84%.

Evaluating unsupervised methods to size and classify suspended particles using digital in-line holography

USGS Publications Warehouse

Davies, Emlyn J.; Buscombe, Daniel D.; Graham, George W.; Nimmo-Smith, W. Alex M.

2015-01-01

Substantial information can be gained from digital in-line holography of marine particles, eliminating depth-of-field and focusing errors associated with standard lens-based imaging methods. However, for the technique to reach its full potential in oceanographic research, fully unsupervised (automated) methods are required for focusing, segmentation, sizing and classification of particles. These computational challenges are the subject of this paper, in which we draw upon data collected using a variety of holographic systems developed at Plymouth University, UK, from a significant range of particle types, sizes and shapes. A new method for noise reduction in reconstructed planes is found to be successful in aiding particle segmentation and sizing. The performance of an automated routine for deriving particle characteristics (and subsequent size distributions) is evaluated against equivalent size metrics obtained by a trained operative measuring grain axes on screen. The unsupervised method is found to be reliable, despite some errors resulting from over-segmentation of particles. A simple unsupervised particle classification system is developed, and is capable of successfully differentiating sand grains, bubbles and diatoms from within the surf-zone. Avoiding miscounting bubbles and biological particles as sand grains enables more accurate estimates of sand concentrations, and is especially important in deployments of particle monitoring instrumentation in aerated water. Perhaps the greatest potential for further development in the computational aspects of particle holography is in the area of unsupervised particle classification. The simple method proposed here provides a foundation upon which further development could lead to reliable identification of more complex particle populations, such as those containing phytoplankton, zooplankton, flocculated cohesive sediments and oil droplets.
Learning from label proportions in brain-computer interfaces: Online unsupervised learning with guarantees

PubMed Central

Verhoeven, Thibault; Schmid, Konstantin; Müller, Klaus-Robert; Tangermann, Michael; Kindermans, Pieter-Jan

2017-01-01

Objective. Using traditional approaches, a brain-computer interface (BCI) requires the collection of calibration data for new subjects prior to online use. Calibration time can be reduced or eliminated e.g., by subject-to-subject transfer of a pre-trained classifier or unsupervised adaptive classification methods which learn from scratch and adapt over time. While such heuristics work well in practice, none of them can provide theoretical guarantees. Our objective is to modify an event-related potential (ERP) paradigm to work in unison with the machine learning decoder, and thus to achieve a reliable unsupervised calibrationless decoding with a guarantee to recover the true class means. Method. We introduce learning from label proportions (LLP) to the BCI community as a new unsupervised, and easy-to-implement classification approach for ERP-based BCIs. The LLP estimates the mean target and non-target responses based on known proportions of these two classes in different groups of the data. We present a visual ERP speller to meet the requirements of LLP. For evaluation, we ran simulations on artificially created data sets and conducted an online BCI study with 13 subjects performing a copy-spelling task. Results. Theoretical considerations show that LLP is guaranteed to minimize the loss function similar to a corresponding supervised classifier. LLP performed well in simulations and in the online application, where 84.5% of characters were spelled correctly on average without prior calibration. Significance. The continuously adapting LLP classifier is the first unsupervised decoder for ERP BCIs guaranteed to find the optimal decoder. This makes it an ideal solution to avoid tedious calibration sessions. Additionally, LLP works on complementary principles compared to existing unsupervised methods, opening the door for their further enhancement when combined with LLP. PMID:28407016
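The mean-recovery idea in the abstract above can be illustrated with a small sketch. It assumes exactly two groups of epochs with known, different target fractions, so that each group mean is a known mixture of the two class means and a 2x2 linear system can be solved for them. The function name, the fractions and the toy feature vectors are hypothetical and are not taken from the paper.

```python
import numpy as np

def class_means_from_proportions(group_means, target_fractions):
    """Solve m_g = p_g * mu_target + (1 - p_g) * mu_nontarget for both class means.

    group_means: array of shape (2, n_features), the average response of each group.
    target_fractions: the known fraction of target stimuli in each group (must differ).
    """
    m = np.asarray(group_means, dtype=float)
    p = np.asarray(target_fractions, dtype=float)
    mixing = np.column_stack([p, 1.0 - p])   # 2x2 mixing matrix, one row per group
    mu_target, mu_nontarget = np.linalg.solve(mixing, m)
    return mu_target, mu_nontarget

# Toy check with made-up class means and made-up proportions (assumptions).
true_target = np.array([2.0, -1.0])
true_nontarget = np.array([0.0, 0.5])
fractions = [0.5, 0.25]
group_means = [f * true_target + (1 - f) * true_nontarget for f in fractions]

est_target, est_nontarget = class_means_from_proportions(group_means, fractions)
print(est_target, est_nontarget)
```

With noisy data the group means are only estimates, so the recovered class means improve as more unlabeled epochs are averaged into each group.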
Comparing supervised and unsupervised multiresolution segmentation approaches for extracting buildings from very high resolution imagery.

PubMed

Belgiu, Mariana; Drăguţ, Lucian

2014-10-01

Although multiresolution segmentation (MRS) is a powerful technique for dealing with very high resolution imagery, some of the image objects that it generates do not match the geometries of the target objects, which reduces the classification accuracy. MRS can, however, be guided to produce results that approach the desired object geometry using either supervised or unsupervised approaches. Although some studies have suggested that a supervised approach is preferable, there has been no comparative evaluation of these two approaches. Therefore, in this study, we have compared supervised and unsupervised approaches to MRS. One supervised and two unsupervised segmentation methods were tested on three areas using QuickBird and WorldView-2 satellite imagery. The results were assessed using both segmentation evaluation methods and an accuracy assessment of the resulting building classifications. Thus, differences in the geometries of the image objects and in the potential to achieve satisfactory thematic accuracies were evaluated. The two approaches yielded remarkably similar classification results, with overall accuracies ranging from 82% to 86%. The performance of one of the unsupervised methods was unexpectedly similar to that of the supervised method; they identified almost identical scale parameters as being optimal for segmenting buildings, resulting in very similar geometries for the resulting image objects. The second unsupervised method produced very different image objects from the supervised method, but their classification accuracies were still very similar. The latter result was unexpected because, contrary to previously published findings, it suggests a high degree of independence between the segmentation results and classification accuracy. The results of this study have two important implications. The first is that object-based image analysis can be automated without sacrificing classification accuracy, and the second is that the previously accepted idea that classification is dependent on segmentation is challenged by our unexpected results, casting doubt on the value of pursuing 'optimal segmentation'. Our results rather suggest that as long as under-segmentation remains at acceptable levels, imperfections in segmentation can be ruled out, so that a high level of classification accuracy can still be achieved.
Automated Glioblastoma Segmentation Based on a Multiparametric Structured Unsupervised Classification

PubMed Central

Juan-Albarracín, Javier; Fuster-Garcia, Elies; Manjón, José V.; Robles, Montserrat; Aparici, F.; Martí-Bonmatí, L.; García-Gómez, Juan M.

2015-01-01

Automatic brain tumour segmentation has become a key component for the future of brain tumour treatment. Currently, most brain tumour segmentation approaches arise from the supervised learning standpoint, which requires a labelled training dataset from which to infer the models of the classes. The performance of these models is directly determined by the size and quality of the training corpus, whose retrieval becomes a tedious and time-consuming task. On the other hand, unsupervised approaches avoid these limitations but often do not reach results comparable to those of the supervised methods. In this sense, we propose an automated unsupervised method for brain tumour segmentation based on anatomical Magnetic Resonance (MR) images. Four unsupervised classification algorithms, grouped by their structured or non-structured condition, were evaluated within our pipeline. As non-structured algorithms we evaluated K-means, Fuzzy K-means and Gaussian Mixture Model (GMM), whereas as a structured classification algorithm we evaluated Gaussian Hidden Markov Random Field (GHMRF). An automated postprocess based on a statistical approach supported by tissue probability maps is proposed to automatically identify the tumour classes after the segmentations. We evaluated our brain tumour segmentation method with the public BRAin Tumor Segmentation (BRATS) 2013 Test and Leaderboard datasets. Our approach based on the GMM model improves the results obtained by most of the supervised methods evaluated with the Leaderboard set and reaches the second position in the ranking. Our variant based on the GHMRF achieves the first position in the Test ranking of the unsupervised approaches and the seventh position in the general Test ranking, which confirms the method as a viable alternative for brain tumour segmentation. PMID:25978453
Automated 3D renal segmentation based on image partitioning

NASA Astrophysics Data System (ADS)

Yeghiazaryan, Varduhi; Voiculescu, Irina D.

2016-03-01

Despite several decades of research into segmentation techniques, automated medical image segmentation is barely usable in a clinical context, and even then at a vast expense of user time. This paper illustrates unsupervised organ segmentation through the use of a novel automated labelling approximation algorithm followed by a hypersurface front propagation method. The approximation stage relies on a pre-computed image partition forest obtained directly from CT scan data. We have implemented all procedures to operate directly on 3D volumes, rather than slice-by-slice, because our algorithms are dimensionality-independent. The results show segmentations which identify kidneys, but the approach can easily be extrapolated to other body parts. Quantitative analysis of our automated segmentation compared against hand-segmented gold standards indicates an average Dice similarity coefficient of 90%. Results were obtained over volumes of CT data with 9 kidneys, computing both volume-based similarity measures (such as the Dice and Jaccard coefficients, true positive volume fraction) and size-based measures (such as the relative volume difference). The analysis considered both healthy and diseased kidneys, although extreme pathological cases were excluded from the overall count. Such cases are difficult to segment both manually and automatically due to the large amplitude of Hounsfield unit distribution in the scan, and the wide spread of the tumorous tissue inside the abdomen. In the case of kidneys that have maintained their shape, the similarity range lies around the values obtained for inter-operator variability. Whilst the procedure is fully automated, our tools also provide a light level of manual editing.
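The volume-based and size-based measures named in the abstract above have standard definitions, which the following sketch computes for two binary 3D masks. The function name and the toy volumes are illustrative assumptions, and the sign convention for the relative volume difference (automatic minus reference, divided by reference) is one common choice rather than necessarily the one used by the authors.

```python
import numpy as np

def overlap_scores(pred, truth):
    """Return (dice, jaccard, relative_volume_difference) for two boolean masks."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    inter = np.logical_and(pred, truth).sum()
    union = np.logical_or(pred, truth).sum()
    total = pred.sum() + truth.sum()
    dice = 2.0 * inter / total if total else 1.0        # 2|A∩B| / (|A| + |B|)
    jaccard = inter / union if union else 1.0            # |A∩B| / |A∪B|
    rvd = (pred.sum() - truth.sum()) / truth.sum() if truth.sum() else float("nan")
    return dice, jaccard, rvd

# Toy 3D volumes: the reference has 8 voxels, the automatic mask has 12 and covers all 8.
truth = np.zeros((4, 4, 4), dtype=bool)
truth[1:3, 1:3, 1:3] = True
pred = np.zeros((4, 4, 4), dtype=bool)
pred[1:3, 1:3, 1:4] = True

dice, jaccard, rvd = overlap_scores(pred, truth)
print(dice, jaccard, rvd)
```

Because the masks are compared voxel by voxel, the same function works unchanged on 2D slices or 3D volumes.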
Unsupervised universal steganalyzer for high-dimensional steganalytic features

NASA Astrophysics Data System (ADS)

Hou, Xiaodan; Zhang, Tao

2016-11-01

Research into developing steganalytic features has been highly successful. These features are extremely powerful when applied to supervised binary classification problems. However, they are incompatible with unsupervised universal steganalysis because the unsupervised method cannot distinguish embedding distortion from varying levels of noise caused by cover variation. This study attempts to alleviate the problem by introducing similarity retrieval of image statistical properties (SRISP), with the specific aim of mitigating the effect of cover variation on the existing steganalytic features. First, cover images with some statistical properties similar to those of a given test image are retrieved from a cover database to establish an aided sample set. Then, unsupervised outlier detection is performed on a test set composed of the given test image and its aided sample set to determine the type (cover or stego) of the given test image. Our proposed framework, called SRISP-aided unsupervised outlier detection, requires no training. Thus, it does not suffer from model mismatch. Compared with prior unsupervised outlier detectors that do not consider SRISP, the proposed framework not only retains the universality but also exhibits superior performance when applied to high-dimensional steganalytic features.
Feature Extraction Using an Unsupervised Neural Network

DTIC Science & Technology

1991-05-03

with this neural network is given and its connection to exploratory projection pursuit methods is established.
An Unsupervised Method for Uncovering Morphological Chains (Open Access, Publisher’s Version)

DTIC Science & Technology

2015-03-08

Consortium. Marco Baroni, Johannes Matiasek, and Harald Trost. 2002. Unsupervised discovery of morphologically related words based on orthographic and...Better word representations with recursive neural networks for morphology. In CoNLL, Sofia, Bulgaria. Mohamed Maamouri, Ann Bies, Hubert Jin, and Tim
Unsupervised frequency-recognition method of SSVEPs using a filter bank implementation of binary subband CCA

NASA Astrophysics Data System (ADS)

Rabiul Islam, Md; Khademul Islam Molla, Md; Nakanishi, Masaki; Tanaka, Toshihisa

2017-04-01

Objective. Recently developed effective methods for detecting commands of steady-state visual evoked potential (SSVEP)-based brain-computer interfaces (BCIs) need calibration for visual stimuli, which costs more time and causes more fatigue prior to use as the number of commands increases. This paper develops a novel unsupervised method based on canonical correlation analysis (CCA) for accurate detection of stimulus frequency. Approach. A novel unsupervised technique termed binary subband CCA (BsCCA) is implemented in a multiband approach to enhance the frequency recognition performance of SSVEP. In BsCCA, two subbands are used and a CCA-based correlation coefficient is computed for the individual subbands. In addition, a reduced set of artificial reference signals is used to calculate CCA for the second subband. The SSVEP under analysis is decomposed into multiple subbands and the BsCCA is implemented for each one. Then, the overall recognition score is determined by a weighted sum of the canonical correlation coefficients obtained from each band. Main results. A 12-class SSVEP dataset (frequency range: 9.25-14.75 Hz with an interval of 0.5 Hz) for ten healthy subjects is used to evaluate the performance of the proposed method. The results suggest that BsCCA significantly improves the performance of SSVEP-based BCI compared to the state-of-the-art methods. The proposed method is an unsupervised approach with an averaged information transfer rate (ITR) of 77.04 bits/min across 10 subjects. The maximum individual ITR is 107.55 bits/min for the 12-class SSVEP dataset, whereas ITRs of 69.29 and 69.44 bits/min are achieved with CCA and NCCA respectively. Significance. The statistical test shows that the proposed unsupervised method significantly improves the performance of the SSVEP-based BCI. It is usable in real-world applications.
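As background for the abstract above, here is a minimal sketch of plain CCA-based frequency recognition, the baseline that BsCCA builds on. It is not the BsCCA method itself: there is no subband decomposition and no weighted sum across bands. The sampling rate, window length, number of channels, number of harmonics and the simulated signal are all assumptions made for illustration; only the 12 candidate frequencies (9.25 to 14.75 Hz in 0.5 Hz steps) come from the abstract.

```python
import numpy as np

def max_canonical_corr(X, Y):
    """Largest canonical correlation between two multichannel signals (rows = samples)."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    qx, _ = np.linalg.qr(X)   # orthonormal basis of the column space of X
    qy, _ = np.linalg.qr(Y)
    s = np.linalg.svd(qx.T @ qy, compute_uv=False)
    return float(min(s[0], 1.0))

def reference_signals(freq, n_samples, fs, n_harmonics=2):
    """Sine/cosine references at the stimulus frequency and its harmonics."""
    t = np.arange(n_samples) / fs
    cols = []
    for h in range(1, n_harmonics + 1):
        cols.append(np.sin(2 * np.pi * h * freq * t))
        cols.append(np.cos(2 * np.pi * h * freq * t))
    return np.column_stack(cols)

def detect_frequency(eeg, candidate_freqs, fs):
    """Pick the candidate whose references correlate best with the EEG segment."""
    scores = [max_canonical_corr(eeg, reference_signals(f, eeg.shape[0], fs))
              for f in candidate_freqs]
    return candidate_freqs[int(np.argmax(scores))], scores

# Simulated 4-channel, 2 s segment with an 11.25 Hz response buried in noise (assumed values).
rng = np.random.default_rng(0)
fs, n_samples = 250, 500
candidates = np.arange(9.25, 15.0, 0.5)
t = np.arange(n_samples) / fs
phases = rng.uniform(0, 2 * np.pi, size=4)
eeg = np.sin(2 * np.pi * 11.25 * t[:, None] + phases) + rng.normal(size=(n_samples, 4))

best, scores = detect_frequency(eeg, candidates, fs)
print(best)
```

A filter-bank variant would run the same scoring on band-pass filtered copies of the segment and combine the per-band correlations with weights, which is the part this sketch leaves out.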
Down-Regulation of Olfactory Receptors in Response to Traumatic Brain Injury Promotes Risk for Alzheimers Disease

DTIC Science & Technology

2015-12-01

group assignment of samples in unsupervised hierarchical clustering by the Unweighted Pair-Group Method using Arithmetic averages (UPGMA) based on...log2 transformed MAS5.0 signal values; probe set clustering was performed by the UPGMA method using Cosine correlation as the similarity metric. For...differentially-regulated genes identified were subjected to unsupervised hierarchical clustering analysis using the UPGMA algorithm with cosine correlation as
An unsupervised method for quantifying the behavior of paired animals

NASA Astrophysics Data System (ADS)

Klibaite, Ugne; Berman, Gordon J.; Cande, Jessica; Stern, David L.; Shaevitz, Joshua W.

2017-02-01

Behaviors involving the interaction of multiple individuals are complex and frequently crucial for an animal’s survival. These interactions, ranging across sensory modalities, length scales, and time scales, are often subtle and difficult to characterize. Contextual effects on the frequency of behaviors become even more difficult to quantify when physical interaction between animals interferes with conventional data analysis, e.g. due to visual occlusion. We introduce a method for quantifying behavior in fruit fly interaction that combines high-throughput video acquisition and tracking of individuals with recent unsupervised methods for capturing an animal’s entire behavioral repertoire. We find behavioral differences between solitary flies and those paired with an individual of the opposite sex, identifying specific behaviors that are affected by social and spatial context. Our pipeline allows for a comprehensive description of the interaction between two individuals using unsupervised machine learning methods, and will be used to answer questions about the depth of complexity and variance in fruit fly courtship.
Unsupervised discovery of information structure in biomedical documents.

PubMed

Kiela, Douwe; Guo, Yufan; Stenius, Ulla; Korhonen, Anna

2015-04-01

Information structure (IS) analysis is a text mining technique, which classifies text in biomedical articles into categories that capture different types of information, such as objectives, methods, results and conclusions of research. It is a highly useful technique that can support a range of Biomedical Text Mining tasks and can help readers of biomedical literature find information of interest faster, accelerating the highly time-consuming process of literature review. Several approaches to IS analysis have been presented in the past, with promising results in real-world biomedical tasks. However, all existing approaches, even weakly supervised ones, require several hundreds of hand-annotated training sentences specific to the domain in question. Because biomedicine is subject to considerable domain variation, such annotations are expensive to obtain. This makes the application of IS analysis across biomedical domains difficult. In this article, we investigate an unsupervised approach to IS analysis and evaluate the performance of several unsupervised methods on a large corpus of biomedical abstracts collected from PubMed. Our best unsupervised algorithm (multilevel-weighted graph clustering algorithm) performs very well on the task, obtaining over 0.70 F scores for most IS categories when applied to well-known IS schemes. This level of performance is close to that of lightly supervised IS methods and has proven sufficient to aid a range of practical tasks. Thus, using an unsupervised approach, IS could be applied to support a wide range of tasks across sub-domains of biomedicine. We also demonstrate that unsupervised learning brings novel insights into IS of biomedical literature and discovers information categories that are not present in any of the existing IS schemes. The annotated corpus and software are available at http://www.cl.cam.ac.uk/∼dk427/bio14info.html. © The Author 2014. Published by Oxford University Press. All rights reserved. 
Manifold Learning in MR spectroscopy using nonlinear dimensionality reduction and unsupervised clustering.

PubMed

Yang, Guang; Raschke, Felix; Barrick, Thomas R; Howe, Franklyn A

2015-09-01

To investigate whether nonlinear dimensionality reduction improves unsupervised classification of ¹H MRS brain tumor data compared with a linear method. In vivo single-voxel ¹H magnetic resonance spectroscopy (55 patients) and ¹H magnetic resonance spectroscopy imaging (MRSI) (29 patients) data were acquired from histopathologically diagnosed gliomas. Data reduction using Laplacian eigenmaps (LE) or independent component analysis (ICA) was followed by k-means clustering or agglomerative hierarchical clustering (AHC) for unsupervised learning to assess tumor grade and for tissue type segmentation of MRSI data. An accuracy of 93% in classification of glioma grade II and grade IV, with 100% accuracy in distinguishing tumor and normal spectra, was obtained by LE with unsupervised clustering, but not with the combination of k-means and ICA. With ¹H MRSI data, LE provided a more linear distribution of data for cluster analysis and better cluster stability than ICA. LE combined with k-means or AHC provided 91% accuracy for classifying tumor grade and 100% accuracy for identifying normal tissue voxels. Color-coded visualization of normal brain, tumor core, and infiltration regions was achieved with LE combined with AHC. The LE method is promising for unsupervised clustering to separate brain and tumor tissue with automated color-coding for visualization of ¹H MRSI data after cluster analysis. © 2014 Wiley Periodicals, Inc.
Wavelet-based unsupervised learning method for electrocardiogram suppression in surface electromyograms.

PubMed

Niegowski, Maciej; Zivanovic, Miroslav

2016-03-01

We present a novel approach aimed at removing electrocardiogram (ECG) perturbation from single-channel surface electromyogram (EMG) recordings by means of unsupervised learning of wavelet-based intensity images. The general idea is to combine the suitability of certain wavelet decomposition bases which provide sparse electrocardiogram time-frequency representations, with the capacity of non-negative matrix factorization (NMF) for extracting patterns from images. In order to overcome convergence problems which often arise in NMF-related applications, we design a novel robust initialization strategy which ensures proper signal decomposition in a wide range of ECG contamination levels. Moreover, the method can be readily used because no a priori knowledge or parameter adjustment is needed. The proposed method was evaluated on real surface EMG signals against two state-of-the-art unsupervised learning algorithms and a singular spectrum analysis based method. The results, expressed in terms of high-to-low energy ratio, normalized median frequency, spectral power difference and normalized average rectified value, suggest that the proposed method enables better ECG-EMG separation quality than the reference methods. Copyright © 2015 IPEM. Published by Elsevier Ltd. All rights reserved.
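The pattern-extraction step mentioned in the abstract above relies on non-negative matrix factorization. The sketch below shows plain NMF with the classic multiplicative updates for the Frobenius-norm objective, applied to a random non-negative matrix standing in for a wavelet intensity image. It uses random initialization, not the robust initialization strategy the authors describe, and all names, sizes and iteration counts are assumptions.

```python
import numpy as np

def nmf(V, rank, n_iter=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix V (n x m) as W @ H with W, H >= 0."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    errors = []
    for _ in range(n_iter):
        # Multiplicative updates keep every entry non-negative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
        errors.append(np.linalg.norm(V - W @ H))
    return W, H, errors

# Stand-in "intensity image": an exactly rank-2 non-negative matrix (assumed toy data).
rng = np.random.default_rng(1)
V = rng.random((30, 2)) @ rng.random((2, 40))

W, H, errors = nmf(V, rank=2)
print(errors[0], errors[-1])
```

In a separation setting, each column of W paired with its row of H gives one component image, and components attributed to the ECG would be removed before reconstructing the EMG.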
Clustervision: Visual Supervision of Unsupervised Clustering.

PubMed

Kwon, Bum Chul; Eysenbach, Ben; Verma, Janu; Ng, Kenney; De Filippi, Christopher; Stewart, Walter F; Perer, Adam

2018-01-01

Clustering, the process of grouping together similar items into distinct partitions, is a common type of unsupervised machine learning that can be useful for summarizing and aggregating complex multi-dimensional data. However, data can be clustered in many ways, and there exists a large body of algorithms designed to reveal different patterns. While having access to a wide variety of algorithms is helpful, in practice, it is quite difficult for data scientists to choose and parameterize algorithms to get the clustering results relevant for their dataset and analytical tasks. To alleviate this problem, we built Clustervision, a visual analytics tool that helps ensure data scientists find the right clustering among the large number of techniques and parameters available. Our system clusters data using a variety of clustering techniques and parameters and then ranks clustering results utilizing five quality metrics. In addition, users can guide the system to produce more relevant results by providing task-relevant constraints on the data. Our visual user interface allows users to find high quality clustering results, explore the clusters using several coordinated visualization techniques, and select the cluster result that best suits their task. We demonstrate this novel approach using a case study with a team of researchers in the medical domain and showcase that our system empowers users to choose an effective representation of their complex data.
Down-Regulation of Olfactory Receptors in Response to Traumatic Brain Injury Promotes Risk for Alzheimer’s Disease

DTIC Science & Technology

2013-10-01

correct group assignment of samples in unsupervised hierarchical clustering by the Unweighted Pair-Group Method using Arithmetic averages (UPGMA) based on...centering of log2 transformed MAS5.0 signal values; probe set clustering was performed by the UPGMA method using Cosine correlation as the similarity met...A) The 108 differentially-regulated genes identified were subjected to unsupervised hierarchical clustering analysis using the UPGMA algorithm with
Learning from label proportions in brain-computer interfaces: Online unsupervised learning with guarantees.

PubMed

Hübner, David; Verhoeven, Thibault; Schmid, Konstantin; Müller, Klaus-Robert; Tangermann, Michael; Kindermans, Pieter-Jan

2017-01-01

Using traditional approaches, a brain-computer interface (BCI) requires the collection of calibration data for new subjects prior to online use. Calibration time can be reduced or eliminated e.g., by subject-to-subject transfer of a pre-trained classifier or unsupervised adaptive classification methods which learn from scratch and adapt over time. While such heuristics work well in practice, none of them can provide theoretical guarantees. Our objective is to modify an event-related potential (ERP) paradigm to work in unison with the machine learning decoder, and thus to achieve a reliable unsupervised calibrationless decoding with a guarantee to recover the true class means. We introduce learning from label proportions (LLP) to the BCI community as a new unsupervised, and easy-to-implement classification approach for ERP-based BCIs. The LLP estimates the mean target and non-target responses based on known proportions of these two classes in different groups of the data. We present a visual ERP speller to meet the requirements of LLP. For evaluation, we ran simulations on artificially created data sets and conducted an online BCI study with 13 subjects performing a copy-spelling task. Theoretical considerations show that LLP is guaranteed to minimize the loss function similar to a corresponding supervised classifier. LLP performed well in simulations and in the online application, where 84.5% of characters were spelled correctly on average without prior calibration. The continuously adapting LLP classifier is the first unsupervised decoder for ERP BCIs guaranteed to find the optimal decoder. This makes it an ideal solution to avoid tedious calibration sessions. Additionally, LLP works on complementary principles compared to existing unsupervised methods, opening the door for their further enhancement when combined with LLP.
Statistical Significance for Hierarchical Clustering

PubMed Central

Kimes, Patrick K.; Liu, Yufeng; Hayes, D. Neil; Marron, J. S.

2017-01-01

Summary. Cluster analysis has proved to be an invaluable tool for the exploratory and unsupervised analysis of high dimensional datasets. Among methods for clustering, hierarchical approaches have enjoyed substantial popularity in genomics and other fields for their ability to simultaneously uncover multiple layers of clustering structure. A critical and challenging question in cluster analysis is whether the identified clusters represent important underlying structure or are artifacts of natural sampling variation. Few approaches have been proposed for addressing this problem in the context of hierarchical clustering, for which the problem is further complicated by the natural tree structure of the partition, and the multiplicity of tests required to parse the layers of nested clusters. In this paper, we propose a Monte Carlo based approach for testing statistical significance in hierarchical clustering which addresses these issues. The approach is implemented as a sequential testing procedure guaranteeing control of the family-wise error rate. Theoretical justification is provided for our approach, and its power to detect true clustering structure is illustrated through several simulation studies and applications to two cancer gene expression datasets. PMID:28099990
Predicting protein complexes using a supervised learning method combined with local structural information.

PubMed

Dong, Yadong; Sun, Yongqi; Qin, Chao

2018-01-01

The existing protein complex detection methods can be broadly divided into two categories: unsupervised and supervised learning methods. Most of the unsupervised learning methods assume that protein complexes are in dense regions of protein-protein interaction (PPI) networks even though many true complexes are not dense subgraphs. Supervised learning methods utilize the informative properties of known complexes; they often extract features from existing complexes and then use the features to train a classification model. The trained model is used to guide the search process for new complexes. However, insufficient extracted features, noise in the PPI data and the incompleteness of complex data make the classification model imprecise. Consequently, the classification model is not sufficient for guiding the detection of complexes. Therefore, we propose a new robust score function that combines the classification model with local structural information. Based on the score function, we provide a search method that works both forwards and backwards. The results from experiments on six benchmark PPI datasets and three protein complex datasets show that our approach can achieve better performance compared with the state-of-the-art supervised, semi-supervised and unsupervised methods for protein complex detection, occasionally significantly outperforming such methods.
Galaxy morphology - An unsupervised machine learning approach

NASA Astrophysics Data System (ADS)

Schutter, A.; Shamir, L.

2015-09-01

Structural properties possess valuable information about the formation and evolution of galaxies, and are important for understanding the past, present, and future universe. Here we use unsupervised machine learning methodology to analyze a network of similarities between galaxy morphological types, and automatically deduce a morphological sequence of galaxies. Application of the method to the EFIGI catalog shows that the morphological scheme produced by the algorithm is largely in agreement with the De Vaucouleurs system, demonstrating the ability of computer vision and machine learning methods to automatically profile galaxy morphological sequences. The unsupervised analysis method is based on comprehensive computer vision techniques that compute the visual similarities between the different morphological types. Rather than relying on human cognition, the proposed system deduces the similarities between sets of galaxy images in an automatic manner, and is therefore not limited by the number of galaxies being analyzed. The source code of the method is publicly available, and the protocol of the experiment is included in the paper so that the experiment can be replicated, and the method can be used to analyze user-defined datasets of galaxy images.

The impact of initialization procedures on unsupervised unmixing of hyperspectral imagery using the constrained positive matrix factorization

NASA Astrophysics Data System (ADS)

Masalmah, Yahya M.; Vélez-Reyes, Miguel

2007-04-01

The authors proposed in previous papers the use of the constrained Positive Matrix Factorization (cPMF) to perform unsupervised unmixing of hyperspectral imagery. Two iterative algorithms were proposed to compute the cPMF based on the Gauss-Seidel and penalty approaches to solve optimization problems. Results presented in previous papers have shown the potential of the proposed method to perform unsupervised unmixing in HYPERION and AVIRIS imagery. The performance of iterative methods is highly dependent on the initialization scheme. Good initialization schemes can improve convergence speed, whether or not a global minimum is found, and whether or not spectra with physical relevance are retrieved as endmembers. In this paper, different initializations using random selection, longest norm pixels, and standard endmembers selection routines are studied and compared using simulated and real data.
Mathematical morphology for automated analysis of remotely sensed objects in radar images

NASA Technical Reports Server (NTRS)

Daida, Jason M.; Vesecky, John F.

1991-01-01

A symbiosis of pyramidal segmentation and morphological transformation is described. The pyramidal segmentation portion of the symbiosis has resulted in a low (2.6 percent) misclassification error rate for a one-look simulation. Other simulations indicate lower error rates (1.8 percent for a four-look image). The morphological transformation portion has resulted in meaningful partitions with a minimal loss of fractal boundary information. An unpublished version of Thicken, suitable for watershed transformations of fractal objects, is also presented. It is demonstrated that the proposed symbiosis works with SAR (synthetic aperture radar) images: in this case, a four-look Seasat image of sea ice. It is concluded that the symbiotic forms of both segmentation and morphological transformation seem well suited for unsupervised geophysical analysis.
Unsupervised iterative detection of land mines in highly cluttered environments.

PubMed

Batman, Sinan; Goutsias, John

2003-01-01

An unsupervised iterative scheme is proposed for land mine detection in heavily cluttered scenes. This scheme is based on iterating hybrid multispectral filters that consist of a decorrelating linear transform coupled with a nonlinear morphological detector. Detections extracted from the first pass are used to improve results in subsequent iterations. The procedure stops after a predetermined number of iterations. The proposed scheme addresses several weaknesses associated with previous adaptations of morphological approaches to land mine detection. Improvement in detection performance, robustness with respect to clutter inhomogeneities, a completely unsupervised operation, and computational efficiency are the main highlights of the method. Experimental results reveal excellent performance.
Improving zero-training brain-computer interfaces by mixing model estimators

NASA Astrophysics Data System (ADS)

Verhoeven, T.; Hübner, D.; Tangermann, M.; Müller, K. R.; Dambre, J.; Kindermans, P. J.

2017-06-01

Objective. Brain-computer interfaces (BCI) based on event-related potentials (ERP) incorporate a decoder to classify recorded brain signals and subsequently select a control signal that drives a computer application. Standard supervised BCI decoders require a tedious calibration procedure prior to every session. Several unsupervised classification methods have been proposed that tune the decoder during actual use and as such omit this calibration. Each of these methods has its own strengths and weaknesses. Our aim is to improve overall accuracy of ERP-based BCIs without calibration. Approach. We consider two approaches for unsupervised classification of ERP signals. Learning from label proportions (LLP) was recently shown to be guaranteed to converge to a supervised decoder when enough data is available. In contrast, the formerly proposed expectation maximization (EM) based decoding for ERP-BCI does not have this guarantee. However, while this decoder has high variance due to random initialization of its parameters, it obtains a higher accuracy faster than LLP when the initialization is good. We introduce a method to optimally combine these two unsupervised decoding methods, letting one method’s strengths compensate for the weaknesses of the other and vice versa. The new method is compared to the aforementioned methods in a resimulation of an experiment with a visual speller. Main results. Analysis of the experimental results shows that the new method exceeds the performance of the previous unsupervised classification approaches in terms of ERP classification accuracy and symbol selection accuracy during the spelling experiment. Furthermore, the method shows less dependency on random initialization of model parameters and is consequently more reliable. Significance. Improving the accuracy and subsequent reliability of calibrationless BCIs makes these systems more appealing for frequent use.
Accuracy of latent-variable estimation in Bayesian semi-supervised learning.

PubMed

Yamazaki, Keisuke

2015-09-01

Hierarchical probabilistic models, such as Gaussian mixture models, are widely used for unsupervised learning tasks. These models consist of observable and latent variables, which represent the observable data and the underlying data-generation process, respectively. Unsupervised learning tasks, such as cluster analysis, are regarded as estimations of latent variables based on the observable ones. The estimation of latent variables in semi-supervised learning, where some labels are observed, will be more precise than that in unsupervised, and one of the concerns is to clarify the effect of the labeled data. However, there has not been sufficient theoretical analysis of the accuracy of the estimation of latent variables. In a previous study, a distribution-based error function was formulated, and its asymptotic form was calculated for unsupervised learning with generative models. It has been shown that, for the estimation of latent variables, the Bayes method is more accurate than the maximum-likelihood method. The present paper reveals the asymptotic forms of the error function in Bayesian semi-supervised learning for both discriminative and generative models. The results show that the generative model, which uses all of the given data, performs better when the model is well specified.
Segmentation of fluorescence microscopy cell images using unsupervised mining.

PubMed

Du, Xian; Dua, Sumeet

2010-05-28

The accurate measurement of cell and nuclei contours is critical for the sensitive and specific detection of changes in normal cells in several medical informatics disciplines. Within microscopy, this task is facilitated using fluorescence cell stains, and segmentation is often the first step in such approaches. Due to the complex nature of cell issues and problems inherent to microscopy, unsupervised mining approaches of clustering can be incorporated in the segmentation of cells. In this study, we have developed and evaluated the performance of multiple unsupervised data mining techniques in cell image segmentation. We adapt four distinctive, yet complementary, methods for unsupervised learning, including those based on k-means clustering, EM, Otsu's threshold, and GMAC. Validation measures are defined, and the performance of the techniques is evaluated both quantitatively and qualitatively using synthetic and recently published real data. Experimental results demonstrate that k-means, Otsu's threshold, and GMAC perform similarly, and have more precise segmentation results than EM. We report that EM has higher recall values and lower precision results from under-segmentation due to its Gaussian model assumption. We also demonstrate that these methods need spatial information to segment complex real cell images with a high degree of efficacy, as expected in many medical informatics applications.
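Of the four methods compared in this abstract, Otsu's threshold is the most compact to write down. The following is a minimal sketch of the textbook method, not the authors' adapted version: it assumes integer gray levels in the range 0 to 255 and a flat list of pixel values, and the toy image is hypothetical.

```python
def otsu_threshold(values, levels=256):
    """Return the gray level t that maximizes the between-class variance.

    Pixels with value <= t form the background class, pixels with value > t the foreground.
    """
    hist = [0] * levels
    for v in values:
        hist[v] += 1
    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels at or below the candidate threshold
    sum0 = 0  # sum of gray levels at or below the candidate threshold
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue  # lower class is still empty
        w1 = total - w0
        if w1 == 0:
            break     # upper class is empty from here on
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        # Between-class variance, up to a constant factor of 1 / total**2.
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

# Toy "image": four dark background pixels and four bright stained pixels.
image = [10, 12, 11, 13, 200, 205, 198, 202]
t = otsu_threshold(image)
mask = [v > t for v in image]  # True marks foreground
```

Because the threshold is chosen from the intensity histogram alone, the sketch uses no spatial information, which is the limitation the abstract points out for complex real cell images.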
Spectral gene set enrichment (SGSE).

PubMed

Frost, H Robert; Li, Zhigang; Moore, Jason H

2015-03-03

Gene set testing is typically performed in a supervised context to quantify the association between groups of genes and a clinical phenotype. In many cases, however, a gene set-based interpretation of genomic data is desired in the absence of a phenotype variable. Although methods exist for unsupervised gene set testing, they predominantly compute enrichment relative to clusters of the genomic variables with performance strongly dependent on the clustering algorithm and number of clusters. We propose a novel method, spectral gene set enrichment (SGSE), for unsupervised competitive testing of the association between gene sets and empirical data sources. SGSE first computes the statistical association between gene sets and principal components (PCs) using our principal component gene set enrichment (PCGSE) method. The overall statistical association between each gene set and the spectral structure of the data is then computed by combining the PC-level p-values using the weighted Z-method with weights set to the PC variance scaled by Tracy-Widom test p-values. Using simulated data, we show that the SGSE algorithm can accurately recover spectral features from noisy data. To illustrate the utility of our method on real data, we demonstrate the superior performance of the SGSE method relative to standard cluster-based techniques for testing the association between MSigDB gene sets and the variance structure of microarray gene expression data. Unsupervised gene set testing can provide important information about the biological signal held in high-dimensional genomic data sets. Because it uses the association between gene sets and sample PCs to generate a measure of unsupervised enrichment, the SGSE method is independent of cluster or network creation algorithms and, most importantly, is able to utilize the statistical significance of PC eigenvalues to ignore elements of the data most likely to represent noise.
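The combination step described in this abstract, the weighted Z-method, can be sketched in a few lines. This is the generic Stouffer-style formula for one-sided p-values, not the SGSE package itself; the function name is hypothetical, and the weights are simply passed in, since the abstract does not spell out how the PC variances are scaled by the Tracy-Widom p-values.

```python
from statistics import NormalDist

def weighted_z(p_values, weights):
    """Combine one-sided p-values (each strictly between 0 and 1) with the weighted Z-method."""
    std_normal = NormalDist()
    # Convert each p-value to a standard-normal score; small p gives large z.
    z_scores = [std_normal.inv_cdf(1.0 - p) for p in p_values]
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    denominator = sum(w * w for w in weights) ** 0.5
    # Map the combined score back to a one-sided p-value.
    return 1.0 - std_normal.cdf(numerator / denominator)

# Hypothetical PC-level p-values for one gene set, with illustrative weights.
pc_p_values = [0.01, 0.20, 0.65]
pc_weights = [0.6, 0.3, 0.0]  # a weight of zero drops a PC judged to be noise
combined = weighted_z(pc_p_values, pc_weights)
```

A weight of zero removes a component from the combination entirely, which mirrors the abstract's point that PCs most likely to represent noise can be ignored.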
Supervised segmentation of microelectrode recording artifacts using power spectral density.

PubMed

Bakstein, Eduard; Schneider, Jakub; Sieger, Tomas; Novak, Daniel; Wild, Jiri; Jech, Robert

2015-08-01

Appropriate detection of clean signal segments in extracellular microelectrode recordings (MER) is vital for maintaining high signal-to-noise ratio in MER studies. Existing alternatives to manual signal inspection are based on unsupervised change-point detection. We present a method of supervised MER artifact classification, based on power spectral density (PSD) and evaluate its performance on a database of 95 labelled MER signals. The proposed method yielded test-set accuracy of 90%, which was close to the accuracy of annotation (94%). The unsupervised methods achieved accuracy of about 77% on both training and testing data.
Misclassification Errors in Unsupervised Classification Methods. Comparison Based on the Simulation of Targeted Proteomics Data

PubMed Central

Andreev, Victor P; Gillespie, Brenda W; Helfand, Brian T; Merion, Robert M

2016-01-01

Unsupervised classification methods are gaining acceptance in omics studies of complex common diseases, which are often vaguely defined and are likely collections of disease subtypes. Unsupervised classification based on the molecular signatures identified in omics studies has the potential to reflect molecular mechanisms of the subtypes of the disease and to lead to more targeted and successful interventions for the identified subtypes. Multiple classification algorithms exist but none is ideal for all types of data. Importantly, there are no established methods to estimate sample size in unsupervised classification (unlike power analysis in hypothesis testing). Therefore, we developed a simulation approach allowing comparison of misclassification errors and estimating the required sample size for a given effect size, number, and correlation matrix of the differentially abundant proteins in targeted proteomics studies. All the experiments were performed in silico. The simulated data imitated the data expected from the study of the plasma of patients with lower urinary tract dysfunction with the aptamer proteomics assay Somascan (SomaLogic Inc, Boulder, CO), which targeted 1129 proteins, including 330 involved in inflammation, 180 in stress response, 80 in aging, etc. Three popular clustering methods (hierarchical, k-means, and k-medoids) were compared. K-means clustering performed much better for the simulated data than the other two methods and enabled classification with misclassification error below 5% in the simulated cohort of 100 patients based on the molecular signatures of 40 differentially abundant proteins (effect size 1.5) from among the 1129-protein panel. PMID:27524871
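Measuring misclassification error for a clustering method has one subtlety: cluster labels are arbitrary, so they must be matched to the true classes before errors are counted. The abstract does not say how the authors did this; the sketch below assumes the common convention of taking the best one-to-one relabeling, found by brute force, and the labels are toy values.

```python
from itertools import permutations

def misclassification_error(true_labels, cluster_labels):
    """Smallest fraction of mismatches over all one-to-one relabelings of the clusters."""
    classes = sorted(set(true_labels))
    clusters = sorted(set(cluster_labels))
    if len(clusters) > len(classes):
        raise ValueError("this sketch assumes no more clusters than true classes")
    n = len(true_labels)
    best = n
    # Try every assignment of distinct class names to the cluster ids.
    for perm in permutations(classes, len(clusters)):
        mapping = dict(zip(clusters, perm))
        errors = sum(mapping[c] != t for t, c in zip(true_labels, cluster_labels))
        best = min(best, errors)
    return best / n

# Toy example: six simulated patients, two true subtypes, two clusters.
truth = ["A", "A", "A", "B", "B", "B"]
clusters = [1, 1, 0, 0, 0, 0]
err = misclassification_error(truth, clusters)  # one patient of six is misplaced
```

The brute-force search grows factorially with the number of clusters, so it is only practical for a handful of subtypes; an assignment solver would be the usual replacement for larger problems.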
An Efficient Optimization Method for Solving Unsupervised Data Classification Problems.

PubMed

Shabanzadeh, Parvaneh; Yusof, Rubiyah

2015-01-01

Unsupervised data classification (or clustering) analysis is one of the most useful tools and a descriptive task in data mining that seeks to classify homogeneous groups of objects based on similarity and is used in many medical disciplines and various applications. In general, there is no single algorithm that is suitable for all types of data, conditions, and applications. Each algorithm has its own advantages, limitations, and deficiencies. Hence, research into novel and effective approaches for unsupervised data classification is still active. In this paper a heuristic algorithm, the Biogeography-Based Optimization (BBO) algorithm, which is inspired by the natural biogeographic distribution of different species, was adapted for data clustering problems by modifying its main operators. Similar to other population-based algorithms, the BBO algorithm starts with an initial population of candidate solutions to an optimization problem and an objective function that is calculated for them. To evaluate the performance of the proposed algorithm, an assessment was carried out on six medical and real-life datasets, and the results were compared with eight well-known and recent unsupervised data classification algorithms. Numerical results demonstrate that the proposed evolutionary optimization algorithm is efficient for unsupervised data classification.
Unsupervised detection of salt marsh platforms: a topographic method

NASA Astrophysics Data System (ADS)

Goodwin, Guillaume C. H.; Mudd, Simon M.; Clubb, Fiona J.

2018-03-01

Salt marshes filter pollutants, protect coastlines against storm surges, and sequester carbon, yet are under threat from sea level rise and anthropogenic modification. The sustained existence of the salt marsh ecosystem depends on the topographic evolution of marsh platforms. Quantifying marsh platform topography is vital for improving the management of these valuable landscapes. The determination of platform boundaries currently relies on supervised classification methods requiring near-infrared data to detect vegetation, or demands labour-intensive field surveys and digitisation. We propose a novel, unsupervised method to reproducibly isolate salt marsh scarps and platforms from a digital elevation model (DEM), referred to as Topographic Identification of Platforms (TIP). Field observations and numerical models show that salt marshes mature into subhorizontal platforms delineated by subvertical scarps. Based on this premise, we identify scarps as lines of local maxima on a slope raster, then fill landmasses from the scarps upward, thus isolating mature marsh platforms. We test the TIP method using lidar-derived DEMs from six salt marshes in England with varying tidal ranges and geometries, for which topographic platforms were manually isolated from tidal flats. Agreement between manual and unsupervised classification exceeds 94 % for DEM resolutions of 1 m, with all but one site maintaining an accuracy superior to 90 % for resolutions up to 3 m. For resolutions of 1 m, platforms detected with the TIP method are comparable in surface area to digitised platforms and have similar elevation distributions. We also find that our method allows for the accurate detection of local block failures as small as 3 times the DEM resolution. 
Detailed inspection reveals that although tidal creeks were digitised as part of the marsh platform, unsupervised classification categorises them as part of the tidal flat, causing an increase in false negatives and overall platform perimeter. This suggests our method may benefit from combination with existing creek detection algorithms. Fallen blocks and high tidal flat portions, associated with potential pioneer zones, can also lead to differences between our method and supervised mapping. Although pioneer zones prove difficult to classify using a topographic method, we suggest that these transition areas should be considered when analysing erosion and accretion processes, particularly in the case of incipient marsh platforms. Ultimately, we have shown that unsupervised classification of marsh platforms from high-resolution topography is possible and sufficient to monitor and analyse topographic evolution.
Audio-based, unsupervised machine learning reveals cyclic changes in earthquake mechanisms in the Geysers geothermal field, California

NASA Astrophysics Data System (ADS)

Holtzman, B. K.; Paté, A.; Paisley, J.; Waldhauser, F.; Repetto, D.; Boschi, L.

2017-12-01

The earthquake process reflects complex interactions of stress, fracture and frictional properties. New machine learning methods reveal patterns in time-dependent spectral properties of seismic signals and enable identification of changes in faulting processes. Our methods are based closely on those developed for music information retrieval and voice recognition, using the spectrogram instead of the waveform directly. Unsupervised learning involves identification of patterns based on differences among signals without any additional information provided to the algorithm. Clustering of 46,000 earthquakes of $0.3
Knowledge-Based Topic Model for Unsupervised Object Discovery and Localization.

PubMed

Niu, Zhenxing; Hua, Gang; Wang, Le; Gao, Xinbo

Unsupervised object discovery and localization aims to discover the dominant object classes and localize all object instances in a given image collection without any supervision. Previous work has attempted to tackle this problem with vanilla topic models, such as latent Dirichlet allocation (LDA). However, in those methods no prior knowledge for the given image collection is exploited to facilitate object discovery. On the other hand, the topic models used in those methods suffer from the topic coherence issue: some inferred topics do not have a clear meaning, which limits the final performance of object discovery. In this paper, prior knowledge in the form of so-called must-links is exploited from Web images on the Internet. Furthermore, a novel knowledge-based topic model, called LDA with mixture of Dirichlet trees, is proposed to incorporate the must-links into topic modeling for object discovery. In particular, to better deal with the polysemy of visual words, the must-link is redefined so that one must-link constrains only one or some topic(s) instead of all topics, which leads to significantly improved topic coherence. Moreover, the must-links are built and grouped with respect to specific object classes; thus the must-links in our approach are semantic-specific, which allows discriminative prior knowledge from Web images to be exploited more efficiently. Extensive experiments validated the efficiency of our proposed approach on several data sets. It is shown that our method significantly improves topic coherence and outperforms the unsupervised methods for object discovery and localization.
In addition, compared with discriminative methods, the naturally existing object classes in the given image collection can be subtly discovered, which makes our approach well suited for realistic applications of unsupervised object discovery.
Classification of ROTSE Variable Stars using Machine Learning

NASA Astrophysics Data System (ADS)

Wozniak, P. R.; Akerlof, C.; Amrose, S.; Brumby, S.; Casperson, D.; Gisler, G.; Kehoe, R.; Lee, B.; Marshall, S.; McGowan, K. E.; McKay, T.; Perkins, S.; Priedhorsky, W.; Rykoff, E.; Smith, D. A.; Theiler, J.; Vestrand, W. T.; Wren, J.; ROTSE Collaboration

2001-12-01

We evaluate several Machine Learning algorithms as potential tools for automated classification of variable stars. Using the ROTSE sample of ~1800 variables from a pilot study of 5% of the whole sky, we compare the effectiveness of a supervised technique (Support Vector Machines, SVM) versus unsupervised methods (K-means and Autoclass). There are 8 types of variables in the sample: RR Lyr AB, RR Lyr C, Delta Scuti, Cepheids, detached eclipsing binaries, contact binaries, Miras and LPVs. Preliminary results suggest a very high ( ~95%) efficiency of SVM in isolating a few best defined classes against the rest of the sample, and good accuracy ( ~70-75%) for all classes considered simultaneously. This includes some degeneracies, irreducible with the information at hand. Supervised methods naturally outperform unsupervised methods, in terms of final error rate, but unsupervised methods offer many advantages for large sets of unlabeled data. Therefore, both types of methods should be considered as promising tools for mining vast variability surveys. We project that there are more than 30,000 periodic variables in the ROTSE-I data base covering the entire local sky between V=10 and 15.5 mag. This sample size is already stretching the time capabilities of human analysts.
An Unsupervised kNN Method to Systematically Detect Changes in Protein Localization in High-Throughput Microscopy Images.

PubMed

Lu, Alex Xijie; Moses, Alan M

2016-01-01

Despite the importance of characterizing genes that exhibit subcellular localization changes between conditions in proteome-wide imaging experiments, many recent studies still rely upon manual evaluation to assess the results of high-throughput imaging experiments. We describe and demonstrate an unsupervised k-nearest neighbours method for the detection of localization changes. Compared to previous classification-based supervised change detection methods, our method is much simpler and faster, and operates directly on the feature space to overcome limitations in needing to manually curate training sets that may not generalize well between screens. In addition, the output of our method is flexible in its utility, generating both a quantitatively ranked list of localization changes that permit user-defined cut-offs, and a vector for each gene describing feature-wise direction and magnitude of localization changes. We demonstrate that our method is effective at the detection of localization changes using the Δrpd3 perturbation in Saccharomyces cerevisiae, where we capture 71.4% of previously known changes within the top 10% of ranked genes, and find at least four new localization changes within the top 1% of ranked genes. The results of our analysis indicate that simple unsupervised methods may be able to identify localization changes in images without laborious manual image labelling steps.
Unsupervised Cryo-EM Data Clustering through Adaptively Constrained K-Means Algorithm

PubMed Central

Xu, Yaofang; Wu, Jiayi; Yin, Chang-Cheng; Mao, Youdong

2016-01-01

In single-particle cryo-electron microscopy (cryo-EM), the K-means clustering algorithm is widely used in unsupervised 2D classification of projection images of biological macromolecules. 3D ab initio reconstruction requires accurate unsupervised classification in order to separate molecular projections of distinct orientations. Due to background noise in single-particle images and uncertainty of molecular orientations, the traditional K-means clustering algorithm may classify images into wrong classes and produce classes with a large variation in membership. Overcoming these limitations requires further development of clustering algorithms for cryo-EM data analysis. We propose a novel unsupervised data clustering method building upon the traditional K-means algorithm. By introducing an adaptive constraint term in the objective function, our algorithm not only avoids a large variation in class sizes but also produces more accurate data clustering. Applications of this approach to both simulated and experimental cryo-EM data demonstrate that our algorithm is a significantly improved alternative to the traditional K-means algorithm in single-particle cryo-EM analysis. PMID:27959895
Unsupervised Cryo-EM Data Clustering through Adaptively Constrained K-Means Algorithm.

PubMed

Xu, Yaofang; Wu, Jiayi; Yin, Chang-Cheng; Mao, Youdong

2016-01-01

In single-particle cryo-electron microscopy (cryo-EM), the K-means clustering algorithm is widely used in unsupervised 2D classification of projection images of biological macromolecules. 3D ab initio reconstruction requires accurate unsupervised classification in order to separate molecular projections of distinct orientations. Due to background noise in single-particle images and uncertainty of molecular orientations, the traditional K-means clustering algorithm may classify images into wrong classes and produce classes with a large variation in membership. Overcoming these limitations requires further development of clustering algorithms for cryo-EM data analysis. We propose a novel unsupervised data clustering method building upon the traditional K-means algorithm. By introducing an adaptive constraint term in the objective function, our algorithm not only avoids a large variation in class sizes but also produces more accurate data clustering. Applications of this approach to both simulated and experimental cryo-EM data demonstrate that our algorithm is a significantly improved alternative to the traditional K-means algorithm in single-particle cryo-EM analysis.
Subspace K-means clustering.

PubMed

Timmerman, Marieke E; Ceulemans, Eva; De Roover, Kim; Van Leeuwen, Karla

2013-12-01

To achieve an insightful clustering of multivariate data, we propose subspace K-means. Its central idea is to model the centroids and cluster residuals in reduced spaces, which allows for dealing with a wide range of cluster types and yields rich interpretations of the clusters. We review the existing related clustering methods, including deterministic, stochastic, and unsupervised learning approaches. To evaluate subspace K-means, we performed a comparative simulation study, in which we manipulated the overlap of subspaces, the between-cluster variance, and the error variance. The study shows that the subspace K-means algorithm is sensitive to local minima but that the problem can be reasonably dealt with by using partitions of various cluster procedures as a starting point for the algorithm. Subspace K-means performs very well in recovering the true clustering across all conditions considered and appears to be superior to its competitor methods: K-means, reduced K-means, factorial K-means, mixtures of factor analyzers (MFA), and MCLUST. The best competitor method, MFA, showed a performance similar to that of subspace K-means in easy conditions but deteriorated in more difficult ones. Using data from a study on parental behavior, we show that subspace K-means analysis provides a rich insight into the cluster characteristics, in terms of both the relative positions of the clusters (via the centroids) and the shape of the clusters (via the within-cluster residuals).
Sparse alignment for robust tensor learning.

PubMed

Lai, Zhihui; Wong, Wai Keung; Xu, Yong; Zhao, Cairong; Sun, Mingming

2014-10-01

Multilinear/tensor extensions of manifold learning based algorithms have been widely used in computer vision and pattern recognition. This paper first provides a systematic analysis of the multilinear extensions for the most popular methods by using alignment techniques, thereby obtaining a general tensor alignment framework. From this framework, it is easy to show that the manifold learning based tensor learning methods are intrinsically different from the alignment techniques. Based on the alignment framework, a robust tensor learning method called sparse tensor alignment (STA) is then proposed for unsupervised tensor feature extraction. Different from the existing tensor learning methods, L1- and L2-norms are introduced to enhance the robustness in the alignment step of the STA. The advantage of the proposed technique is that the difficulty in selecting the size of the local neighborhood can be avoided in the manifold learning based tensor feature extraction algorithms. Although STA is an unsupervised learning method, the sparsity encodes the discriminative information in the alignment step and provides the robustness of STA. Extensive experiments on the well-known image databases as well as action and hand gesture databases by encoding object images as tensors demonstrate that the proposed STA algorithm gives the most competitive performance when compared with the tensor-based unsupervised learning methods.
A comparative evaluation of supervised and unsupervised representation learning approaches for anaplastic medulloblastoma differentiation

NASA Astrophysics Data System (ADS)

Cruz-Roa, Angel; Arevalo, John; Basavanhally, Ajay; Madabhushi, Anant; González, Fabio

2015-01-01

Learning data representations directly from the data itself is an approach that has shown great success in different pattern recognition problems, outperforming state-of-the-art feature extraction schemes for different tasks in computer vision, speech recognition and natural language processing. Representation learning applies unsupervised and supervised machine learning methods to large amounts of data to find building blocks that better represent the information in it. Digitized histopathology images represent a very good testbed for representation learning since they involve large amounts of highly complex visual data. This paper presents a comparative evaluation of different supervised and unsupervised representation learning architectures to specifically address open questions on which type of learning architecture (deep or shallow) and which type of learning (unsupervised or supervised) is optimal. In this paper we limit ourselves to addressing these questions in the context of distinguishing between anaplastic and non-anaplastic medulloblastomas from routine haematoxylin and eosin stained images. The unsupervised approaches evaluated were sparse autoencoders and topographic reconstruct independent component analysis, and the supervised approach was convolutional neural networks. Experimental results show that shallow architectures with more neurons are better than deeper architectures without taking into account local space invariances and that topographic constraints provide useful invariant features in scale and rotations for efficient tumor differentiation.

Wavelet-based Gaussian-mixture hidden Markov model for the detection of multistage seizure dynamics: A proof-of-concept study

PubMed Central

2011-01-01

Background Epilepsy is a common neurological disorder characterized by recurrent electrophysiological activities, known as seizures. Without the appropriate detection strategies, these seizure episodes can dramatically affect the quality of life for those afflicted. The rationale of this study is to develop an unsupervised algorithm for the detection of seizure states so that it may be implemented along with potential intervention strategies. Methods A hidden Markov model (HMM) was developed to interpret the state transitions of the in vitro rat hippocampal slice local field potentials (LFPs) during seizure episodes. It can be used to estimate the probability of state transitions and the corresponding characteristics of each state. Wavelet features were clustered and used to differentiate the electrophysiological characteristics at each corresponding HMM state. Using an unsupervised training method, the HMM and the clustering parameters were obtained simultaneously. The HMM states were then assigned to the electrophysiological data using an expert-guided technique. Minimum redundancy maximum relevance (mRMR) analysis and the Akaike Information Criterion (AICc) were applied to reduce the effect of over-fitting. The sensitivity, specificity and optimality index of chronic seizure detection were compared for various HMM topologies. The ability to distinguish early and late tonic firing patterns prior to chronic seizures was also evaluated. Results Significant improvement in state detection performance was achieved when additional wavelet coefficient rates of change information were used as features. The final HMM topology obtained using mRMR and AICc was able to detect non-ictal (interictal), early and late tonic firing, chronic seizures and postictal activities. A mean sensitivity of 95.7%, mean specificity of 98.9% and optimality index of 0.995 in the detection of chronic seizures were achieved. The detection of early and late tonic firing was validated with experimental intracellular electrical recordings of seizures. Conclusions The HMM implementation of a seizure dynamics detector is an improvement over existing approaches using visual detection and complexity measures. The subjectivity involved in partitioning the observed data prior to training can be eliminated. It can also decipher the probabilities of seizure state transitions using the magnitude and rate of change wavelet information of the LFPs. PMID:21504608
Infrared vehicle recognition using unsupervised feature learning based on K-feature

NASA Astrophysics Data System (ADS)

Lin, Jin; Tan, Yihua; Xia, Haijiao; Tian, Jinwen

2018-02-01

Because of the complex battlefield environment, it is difficult to establish a complete knowledge base in practical applications of vehicle recognition algorithms. Infrared vehicle recognition plays an important role in remote sensing but remains difficult and challenging. In this paper we propose a new unsupervised feature learning method based on K-feature to recognize vehicles in infrared images. First, a saliency-based target detection algorithm is applied to the initial image. Then, the K-feature, which is generated by a K-means clustering algorithm that learns a visual dictionary from a large number of unlabeled samples, is calculated to suppress false alarms and improve accuracy. Finally, the vehicle recognition result is completed by post-processing. Extensive experiments demonstrate that the proposed method achieves satisfactory recognition effectiveness and robustness for vehicle recognition in infrared images under complex backgrounds, and that it also improves reliability.
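The dictionary-learning step mentioned in this abstract can be illustrated with a plain K-means routine. This is only a minimal sketch, not the authors' K-feature implementation: the function name `kmeans`, the choice of the first k samples as initial centres, and the fixed iteration count are all assumptions made for illustration.

```python
def kmeans(points, k, iterations=20):
    """Lloyd's algorithm on unlabeled feature vectors (illustrative sketch).

    Assumption: the first k points serve as initial centres so that the
    result is deterministic; real systems usually use random or k-means++
    initialisation.
    """
    centres = [list(p) for p in points[:k]]
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centre.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centres[c])),
            )
            groups[nearest].append(p)
        # Update step: move each centre to the mean of its group.
        for j, group in enumerate(groups):
            if group:  # keep the old centre if a group is empty
                centres[j] = [sum(col) / len(group) for col in zip(*group)]
    return centres


# Hypothetical 2-D descriptors; the learned centres act as "visual words".
samples = [(0, 0), (10, 10), (0, 1), (10, 11)]
dictionary = kmeans(samples, k=2)
```

The centres returned here play the role of the visual dictionary; how the paper turns them into the final K-feature is not stated in the abstract.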
Automated classifications of topography from DEMs by an unsupervised nested-means algorithm and a three-part geometric signature

NASA Astrophysics Data System (ADS)

Iwahashi, Junko; Pike, Richard J.

2007-05-01

An iterative procedure that implements the classification of continuous topography as a problem in digital image-processing automatically divides an area into categories of surface form; three taxonomic criteria-slope gradient, local convexity, and surface texture-are calculated from a square-grid digital elevation model (DEM). The sequence of programmed operations combines twofold-partitioned maps of the three variables converted to greyscale images, using the mean of each variable as the dividing threshold. To subdivide increasingly subtle topography, grid cells sloping at less than mean gradient of the input DEM are classified by designating mean values of successively lower-sloping subsets of the study area (nested means) as taxonomic thresholds, thereby increasing the number of output categories from the minimum 8 to 12 or 16. Program output is exemplified by 16 topographic types for the world at 1-km spatial resolution (SRTM30 data), the Japanese Islands at 270 m, and part of Hokkaido at 55 m. Because the procedure is unsupervised and reflects frequency distributions of the input variables rather than pre-set criteria, the resulting classes are undefined and must be calibrated empirically by subsequent analysis. Maps of the example classifications reflect physiographic regions, geological structure, and landform as well as slope materials and processes; fine-textured terrain categories tend to correlate with erosional topography or older surfaces, coarse-textured classes with areas of little dissection. In Japan the resulting classes approximate landform types mapped from airphoto analysis, while in the Americas they create map patterns resembling Hammond's terrain types or surface-form classes; SRTM30 output for the United States compares favorably with Fenneman's physical divisions. Experiments are suggested for further developing the method; the Arc/Info AML and the map of terrain classes for the world are available as online downloads.
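The nested-means idea in this abstract can be sketched in a few lines. This is a hedged illustration, not the authors' Arc/Info AML: the function name `nested_means_classes`, the class numbering, and the choice to recompute the convexity and texture means on each lower-sloping subset are assumptions. Each cell is a hypothetical `(slope, convexity, texture)` tuple.

```python
from statistics import fmean


def nested_means_classes(cells, levels=1):
    """Label cells with 8, 12 or 16 classes for levels = 1, 2 or 3.

    At each level the mean slope of the remaining cells is the threshold.
    Steeper cells get one of four labels from the convexity/texture split;
    gentler cells are passed to the next level (or labeled at the last one).
    """
    labels = [None] * len(cells)
    remaining = list(range(len(cells)))
    for level in range(levels):
        if not remaining:
            break
        slope_mean = fmean(cells[i][0] for i in remaining)
        convexity_mean = fmean(cells[i][1] for i in remaining)
        texture_mean = fmean(cells[i][2] for i in remaining)
        is_last = level == levels - 1
        gentler = []
        for i in remaining:
            slope, convexity, texture = cells[i]
            # Twofold partition of two variables gives four combinations (0..3).
            quadrant = 2 * (convexity > convexity_mean) + (texture > texture_mean)
            if slope > slope_mean:
                labels[i] = 4 * level + quadrant
            elif is_last:
                labels[i] = 4 * (level + 1) + quadrant
            else:
                gentler.append(i)  # re-threshold on the lower-sloping subset
        remaining = gentler
    return labels


# Four hypothetical grid cells: two steep, two gentle.
demo = [(10, 1, 1), (10, 0, 0), (1, 1, 1), (1, 0, 0)]
demo_labels = nested_means_classes(demo, levels=1)
```

With `levels=1` every cell is compared against the three global means, giving the minimum of 8 categories; each additional level splits only the gentler cells again, which is why the class count grows by 4 per level.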
Automated classifications of topography from DEMs by an unsupervised nested-means algorithm and a three-part geometric signature

USGS Publications Warehouse

Iwahashi, J.; Pike, R.J.

2007-01-01

An iterative procedure that implements the classification of continuous topography as a problem in digital image-processing automatically divides an area into categories of surface form; three taxonomic criteria-slope gradient, local convexity, and surface texture-are calculated from a square-grid digital elevation model (DEM). The sequence of programmed operations combines twofold-partitioned maps of the three variables converted to greyscale images, using the mean of each variable as the dividing threshold. To subdivide increasingly subtle topography, grid cells sloping at less than mean gradient of the input DEM are classified by designating mean values of successively lower-sloping subsets of the study area (nested means) as taxonomic thresholds, thereby increasing the number of output categories from the minimum 8 to 12 or 16. Program output is exemplified by 16 topographic types for the world at 1-km spatial resolution (SRTM30 data), the Japanese Islands at 270 m, and part of Hokkaido at 55 m. Because the procedure is unsupervised and reflects frequency distributions of the input variables rather than pre-set criteria, the resulting classes are undefined and must be calibrated empirically by subsequent analysis. Maps of the example classifications reflect physiographic regions, geological structure, and landform as well as slope materials and processes; fine-textured terrain categories tend to correlate with erosional topography or older surfaces, coarse-textured classes with areas of little dissection. In Japan the resulting classes approximate landform types mapped from airphoto analysis, while in the Americas they create map patterns resembling Hammond's terrain types or surface-form classes; SRTM30 output for the United States compares favorably with Fenneman's physical divisions. Experiments are suggested for further developing the method; the Arc/Info AML and the map of terrain classes for the world are available as online downloads. © 2006 Elsevier B.V. All rights reserved.
Automated and unsupervised detection of malarial parasites in microscopic images.

PubMed

Purwar, Yashasvi; Shah, Sirish L; Clarke, Gwen; Almugairi, Areej; Muehlenbachs, Atis

2011-12-13

Malaria is a serious infectious disease. According to the World Health Organization, it is responsible for nearly one million deaths each year. There are various techniques to diagnose malaria, of which manual microscopy is considered to be the gold standard. However, due to the number of steps required in manual assessment, this diagnostic method is time consuming (leading to late diagnosis) and prone to human error (leading to erroneous diagnosis), even in experienced hands. The focus of this study is to develop a robust, unsupervised and sensitive malaria screening technique with low material cost and one that has an advantage over other techniques in that it minimizes human reliance and is, therefore, more consistent in applying diagnostic criteria. A method based on digital image processing of Giemsa-stained thin smear images is developed to facilitate the diagnostic process. The diagnosis procedure is divided into two parts: enumeration and identification. The image-based method presented here is designed to automate the process of enumeration and identification, with the main advantage being its ability to carry out the diagnosis in an unsupervised manner and yet have high sensitivity, thus reducing cases of false negatives. The image-based method is tested on more than 500 images from two independent laboratories. The aim is to distinguish between positive and negative cases of malaria using thin smear blood slide images. Due to the unsupervised nature of the method, it requires minimal human intervention, thus speeding up the whole process of diagnosis. Overall sensitivity to capture cases of malaria is 100% and specificity ranges from 50-88% for all species of malaria parasites. The image-based screening method will speed up the whole process of diagnosis and is more advantageous than laboratory procedures that are prone to errors and where pathological expertise is minimal. Further, this method provides a consistent and robust way of generating the parasite clearance curves.
[On the partition of acupuncture academic schools].

PubMed

Yang, Pengyan; Luo, Xi; Xia, Youbing

2016-05-01

Nowadays extensive attention has been paid to research on acupuncture academic schools; however, a widely accepted method of partitioning acupuncture academic schools is still needed. In this paper, the historical methods of partitioning acupuncture academic schools are reviewed, and three typical methods, "partition of five schools", "partition of eighteen schools" and "two-stage based partition", are summarized. After a deep analysis of the advantages and disadvantages of these three methods, a new method of partitioning acupuncture academic schools, called "three-stage based partition", is proposed. In this method, after the overall acupuncture academic schools are divided into an ancient stage, a modern stage and a contemporary stage, each school is divided into its sub-school category. It is believed that this method of partition can not only remedy the weaknesses of current methods, but also explore a new model of inheritance and development from a different perspective through the differentiation and interaction of acupuncture academic schools at the three stages.
Characterizing Heterogeneity within Head and Neck Lesions Using Cluster Analysis of Multi-Parametric MRI Data.

PubMed

Borri, Marco; Schmidt, Maria A; Powell, Ceri; Koh, Dow-Mu; Riddell, Angela M; Partridge, Mike; Bhide, Shreerang A; Nutting, Christopher M; Harrington, Kevin J; Newbold, Katie L; Leach, Martin O

2015-01-01

To describe a methodology, based on cluster analysis, to partition multi-parametric functional imaging data into groups (or clusters) of similar functional characteristics, with the aim of characterizing functional heterogeneity within head and neck tumour volumes. To evaluate the performance of the proposed approach on a set of longitudinal MRI data, analysing the evolution of the obtained sub-sets with treatment. The cluster analysis workflow was applied to a combination of dynamic contrast-enhanced and diffusion-weighted imaging MRI data from a cohort of squamous cell carcinoma of the head and neck patients. Cumulative distributions of voxels, containing pre and post-treatment data and including both primary tumours and lymph nodes, were partitioned into k clusters (k = 2, 3 or 4). Principal component analysis and cluster validation were employed to investigate data composition and to independently determine the optimal number of clusters. The evolution of the resulting sub-regions with induction chemotherapy treatment was assessed relative to the number of clusters. The clustering algorithm was able to separate clusters which significantly reduced in voxel number following induction chemotherapy from clusters with a non-significant reduction. Partitioning with the optimal number of clusters (k = 4), determined with cluster validation, produced the best separation between reducing and non-reducing clusters. The proposed methodology was able to identify tumour sub-regions with distinct functional properties, independently separating clusters which were affected differently by treatment. This work demonstrates that unsupervised cluster analysis, with no prior knowledge of the data, can be employed to provide a multi-parametric characterization of functional heterogeneity within tumour volumes.
Unsupervised classification of variable stars

NASA Astrophysics Data System (ADS)

Valenzuela, Lucas; Pichara, Karim

2018-03-01

During the past 10 years, a considerable amount of effort has been made to develop algorithms for automatic classification of variable stars. That has been primarily achieved by applying machine learning methods to photometric data sets where objects are represented as light curves. Classifiers require training sets to learn the underlying patterns that allow the separation among classes. Unfortunately, building training sets is an expensive process that demands a lot of human effort. Every time data come from new surveys, the only available training instances are the ones that have a cross-match with previously labelled objects, consequently generating insufficient training sets compared with the large amounts of unlabelled sources. In this work, we present an algorithm that performs unsupervised classification of variable stars, relying only on the similarity among light curves. We tackle the unsupervised classification problem by proposing an untraditional approach. Instead of trying to match classes of stars with clusters found by a clustering algorithm, we propose a query-based method where astronomers can find groups of variable stars ranked by similarity. We also develop a fast similarity function specific for light curves, based on a novel data structure that allows scaling the search over the entire data set of unlabelled objects. Experiments show that our unsupervised model achieves high accuracy in the classification of different types of variable stars and that the proposed algorithm scales up to massive amounts of light curves.
Automatically identifying health- and clinical-related content in wikipedia.

PubMed

Liu, Feifan; Moosavinasab, Soheil; Agarwal, Shashank; Bennett, Andrew S; Yu, Hong

2013-01-01

Physicians are increasingly using the Internet for finding medical information related to patient care. Wikipedia is a valuable online medical resource to be integrated into existing clinical question answering (QA) systems. On the other hand, Wikipedia contains a full spectrum of world's knowledge and therefore comprises a large partition of non-health-related content, which makes disambiguation more challenging and consequently leads to large overhead for existing systems to effectively filter irrelevant information. To overcome this, we have developed both unsupervised and supervised approaches to identify health-related articles as well as clinically relevant articles. Furthermore, we explored novel features by extracting health related hierarchy from the Wikipedia category network, from which a variety of features were derived and evaluated. Our experiments show promising results and also demonstrate that employing the category hierarchy can effectively improve the system performance.
Natural-Annotation-based Unsupervised Construction of Korean-Chinese Domain Dictionary

NASA Astrophysics Data System (ADS)

Liu, Wuying; Wang, Lin

2018-03-01

The large-scale bilingual parallel resource is significant to statistical learning and deep learning in natural language processing. This paper addresses the automatic construction issue of the Korean-Chinese domain dictionary, and presents a novel unsupervised construction method based on the natural annotation in the raw corpus. We firstly extract all Korean-Chinese word pairs from Korean texts according to natural annotations, secondly transform the traditional Chinese characters into the simplified ones, and finally distill out a bilingual domain dictionary after retrieving the simplified Chinese words in an extra Chinese domain dictionary. The experimental results show that our method can automatically build multiple Korean-Chinese domain dictionaries efficiently.
Shadow detection and removal in RGB VHR images for land use unsupervised classification

NASA Astrophysics Data System (ADS)

Movia, A.; Beinat, A.; Crosilla, F.

2016-09-01

Nowadays, high resolution aerial images are widely available thanks to the diffusion of advanced technologies such as UAVs (Unmanned Aerial Vehicles) and new satellite missions. Although these developments offer new opportunities for accurate land use analysis and change detection, cloud and terrain shadows actually limit benefits and possibilities of modern sensors. Focusing on the problem of shadow detection and removal in VHR color images, the paper proposes new solutions and analyses how they can enhance common unsupervised classification procedures for identifying land use classes related to the CO2 absorption. To this aim, an improved fully automatic procedure has been developed for detecting image shadows using exclusively RGB color information, and avoiding user interaction. Results show a significant accuracy enhancement with respect to similar methods using RGB based indexes. Furthermore, novel solutions derived from Procrustes analysis have been applied to remove shadows and restore brightness in the images. In particular, two methods implementing the so called "anisotropic Procrustes" and the "not-centered oblique Procrustes" algorithms have been developed and compared with the linear correlation correction method based on the Cholesky decomposition. To assess how shadow removal can enhance unsupervised classifications, results obtained with classical methods such as k-means, maximum likelihood, and self-organizing maps, have been compared to each other and with a supervised clustering procedure.
Parental Monitoring, Negotiated Unsupervised Time, and Parental Trust: The Role of Perceived Parenting Practices in Adolescent Health Risk Behaviors

PubMed Central

BORAWSKI, ELAINE A.; IEVERS-LANDIS, CAROLYN E.; LOVEGREEN, LOREN D.; TRAPL, ERIKA S.

2010-01-01

Purpose To compare two different parenting practices (parental monitoring and negotiated unsupervised time) and perceived parental trust in the reporting of health risk behaviors among adolescents. Methods Data were derived from 692 adolescents in 9th and 10th grades (X̄ = 15.7 years) enrolled in health education classes in six urban high schools. Students completed a self-administered paper-based survey that assessed adolescents’ perceptions of the degree to which their parents monitor their whereabouts, permit them to negotiate unsupervised time with their friends, and trust them to make decisions. Using gender-specific multivariate logistic regression analyses, we examined the relative importance of parental monitoring, negotiated unsupervised time with peers, and parental trust in predicting reported sexual activity, sex-related protective actions (e.g., condom use, carrying protection) and substance use (alcohol, tobacco, and marijuana). Results For males and females, increased negotiated unsupervised time was strongly associated with increased risk behavior (e.g., sexual activity, alcohol and marijuana use) but also sex-related protective actions. In males, high parental monitoring was associated with less alcohol use and consistent condom use. Parental monitoring had no effect on female behavior. Perceived parental trust served as a protective factor against sexual activity, tobacco, and marijuana use in females, and alcohol use in males. Conclusions Although monitoring is an important practice for parents of older adolescents, managing their behavior through negotiation of unsupervised time may have mixed results leading to increased experimentation with sexuality and substances, but perhaps in a more responsible way. Trust established between an adolescent female and her parents continues to be a strong deterrent for risky behaviors but appears to have little effect on behaviors of adolescent males. PMID:12890596
Unsupervised Fault Diagnosis of a Gear Transmission Chain Using a Deep Belief Network

PubMed Central

He, Jun; Yang, Shixi; Gan, Chunbiao

2017-01-01

Artificial intelligence (AI) techniques, which can effectively analyze massive amounts of fault data and automatically provide accurate diagnosis results, have been widely applied to fault diagnosis of rotating machinery. Conventional AI methods are applied using features selected by a human operator, which are manually extracted based on diagnostic techniques and field expertise. However, developing robust features for each diagnostic purpose is often labour-intensive and time-consuming, and the features extracted for one specific task may be unsuitable for others. In this paper, a novel AI method based on a deep belief network (DBN) is proposed for the unsupervised fault diagnosis of a gear transmission chain, and the genetic algorithm is used to optimize the structural parameters of the network. Compared to the conventional AI methods, the proposed method can adaptively exploit robust features related to the faults by unsupervised feature learning, thus requires less prior knowledge about signal processing techniques and diagnostic expertise. Besides, it is more powerful at modelling complex structured data. The effectiveness of the proposed method is validated using datasets from rolling bearings and gearbox. To show the superiority of the proposed method, its performance is compared with two well-known classifiers, i.e., back propagation neural network (BPNN) and support vector machine (SVM). The fault classification accuracies are 99.26% for rolling bearings and 100% for gearbox when using the proposed method, which are much higher than that of the other two methods. PMID:28677638
Unsupervised Fault Diagnosis of a Gear Transmission Chain Using a Deep Belief Network.

PubMed

He, Jun; Yang, Shixi; Gan, Chunbiao

2017-07-04

Artificial intelligence (AI) techniques, which can effectively analyze massive amounts of fault data and automatically provide accurate diagnosis results, have been widely applied to fault diagnosis of rotating machinery. Conventional AI methods are applied using features selected by a human operator, which are manually extracted based on diagnostic techniques and field expertise. However, developing robust features for each diagnostic purpose is often labour-intensive and time-consuming, and the features extracted for one specific task may be unsuitable for others. In this paper, a novel AI method based on a deep belief network (DBN) is proposed for the unsupervised fault diagnosis of a gear transmission chain, and the genetic algorithm is used to optimize the structural parameters of the network. Compared to the conventional AI methods, the proposed method can adaptively exploit robust features related to the faults by unsupervised feature learning, thus requires less prior knowledge about signal processing techniques and diagnostic expertise. Besides, it is more powerful at modelling complex structured data. The effectiveness of the proposed method is validated using datasets from rolling bearings and gearbox. To show the superiority of the proposed method, its performance is compared with two well-known classifiers, i.e., back propagation neural network (BPNN) and support vector machine (SVM). The fault classification accuracies are 99.26% for rolling bearings and 100% for gearbox when using the proposed method, which are much higher than that of the other two methods.
An unsupervised approach for measuring myocardial perfusion in MR image sequences

NASA Astrophysics Data System (ADS)

Discher, Antoine; Rougon, Nicolas; Preteux, Francoise

2005-08-01

Quantitatively assessing myocardial perfusion is a key issue for the diagnosis, therapeutic planning and patient follow-up of cardio-vascular diseases. To this end, perfusion MRI (p-MRI) has emerged as a valuable clinical investigation tool thanks to its ability of dynamically imaging the first pass of a contrast bolus in the framework of stress/rest exams. However, reliable techniques for automatically computing regional first pass curves from 2D short-axis cardiac p-MRI sequences remain to be elaborated. We address this problem and develop an unsupervised four-step approach comprising: (i) a coarse spatio-temporal segmentation step, allowing to automatically detect a region of interest for the heart over the whole sequence, and to select a reference frame with maximal myocardium contrast; (ii) a model-based variational segmentation step of the reference frame, yielding a bi-ventricular partition of the heart into left ventricle, right ventricle and myocardium components; (iii) a respiratory/cardiac motion artifacts compensation step using a novel region-driven intensity-based non rigid registration technique, allowing to elastically propagate the reference bi-ventricular segmentation over the whole sequence; (iv) a measurement step, delivering first-pass curves over each region of a segmental model of the myocardium. The performance of this approach is assessed over a database of 15 normal and pathological subjects, and compared with perfusion measurements delivered by a MRI manufacturer software package based on manual delineations by a medical expert.
Performance analysis of unsupervised optimal fuzzy clustering algorithm for MRI brain tumor segmentation.

PubMed

Blessy, S A Praylin Selva; Sulochana, C Helen

2015-01-01

Segmentation of brain tumor from Magnetic Resonance Imaging (MRI) becomes very complicated due to the structural complexities of human brain and the presence of intensity inhomogeneities. To propose a method that effectively segments brain tumor from MR images and to evaluate the performance of unsupervised optimal fuzzy clustering (UOFC) algorithm for segmentation of brain tumor from MR images. Segmentation is done by preprocessing the MR image to standardize intensity inhomogeneities followed by feature extraction, feature fusion and clustering. Different validation measures are used to evaluate the performance of the proposed method using different clustering algorithms. The proposed method using UOFC algorithm produces high sensitivity (96%) and low specificity (4%) compared to other clustering methods. Validation results clearly show that the proposed method with UOFC algorithm effectively segments brain tumor from MR images.
Unsupervised learning of natural languages

PubMed Central

Solan, Zach; Horn, David; Ruppin, Eytan; Edelman, Shimon

2005-01-01

We address the problem, fundamental to linguistics, bioinformatics, and certain other disciplines, of using corpora of raw symbolic sequential data to infer underlying rules that govern their production. Given a corpus of strings (such as text, transcribed speech, chromosome or protein sequence data, sheet music, etc.), our unsupervised algorithm recursively distills from it hierarchically structured patterns. The adios (automatic distillation of structure) algorithm relies on a statistical method for pattern extraction and on structured generalization, two processes that have been implicated in language acquisition. It has been evaluated on artificial context-free grammars with thousands of rules, on natural languages as diverse as English and Chinese, and on protein data correlating sequence with function. This unsupervised algorithm is capable of learning complex syntax, generating grammatical novel sentences, and proving useful in other fields that call for structure discovery from raw data, such as bioinformatics. PMID:16087885
Unsupervised learning of natural languages.

PubMed

Solan, Zach; Horn, David; Ruppin, Eytan; Edelman, Shimon

2005-08-16

We address the problem, fundamental to linguistics, bioinformatics, and certain other disciplines, of using corpora of raw symbolic sequential data to infer underlying rules that govern their production. Given a corpus of strings (such as text, transcribed speech, chromosome or protein sequence data, sheet music, etc.), our unsupervised algorithm recursively distills from it hierarchically structured patterns. The adios (automatic distillation of structure) algorithm relies on a statistical method for pattern extraction and on structured generalization, two processes that have been implicated in language acquisition. It has been evaluated on artificial context-free grammars with thousands of rules, on natural languages as diverse as English and Chinese, and on protein data correlating sequence with function. This unsupervised algorithm is capable of learning complex syntax, generating grammatical novel sentences, and proving useful in other fields that call for structure discovery from raw data, such as bioinformatics.
Automated age-related macular degeneration classification in OCT using unsupervised feature learning

NASA Astrophysics Data System (ADS)

Venhuizen, Freerk G.; van Ginneken, Bram; Bloemen, Bart; van Grinsven, Mark J. J. P.; Philipsen, Rick; Hoyng, Carel; Theelen, Thomas; Sánchez, Clara I.

2015-03-01

Age-related Macular Degeneration (AMD) is a common eye disorder with high prevalence in elderly people. The disease mainly affects the central part of the retina, and could ultimately lead to permanent vision loss. Optical Coherence Tomography (OCT) is becoming the standard imaging modality in diagnosis of AMD and the assessment of its progression. However, the evaluation of the obtained volumetric scan is time consuming and expensive, and the signs of early AMD are easy to miss. In this paper we propose a classification method to automatically distinguish AMD patients from healthy subjects with high accuracy. The method is based on an unsupervised feature learning approach, and processes the complete image without the need for an accurate pre-segmentation of the retina. The method can be divided into two steps: an unsupervised clustering stage that extracts a set of small descriptive image patches from the training data, and a supervised training stage that uses these patches to create a patch occurrence histogram for every image on which a random forest classifier is trained. Experiments using 384 volume scans show that the proposed method is capable of identifying AMD patients with high accuracy, obtaining an area under the Receiver Operating Curve of 0.984. Our method allows for a quick and reliable assessment of the presence of AMD pathology in OCT volume scans without the need for accurate layer segmentation algorithms.
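The patch occurrence histogram described in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the function names `nearest_centre` and `patch_histogram`, the squared Euclidean distance, and the normalisation to relative frequencies are all choices made here. The centres are assumed to come from an earlier unsupervised clustering stage, and the random forest stage is not shown.

```python
def nearest_centre(patch, centres):
    """Index of the cluster centre closest to a flattened patch."""
    return min(
        range(len(centres)),
        key=lambda k: sum((p - c) ** 2 for p, c in zip(patch, centres[k])),
    )


def patch_histogram(patches, centres):
    """Relative frequency of each descriptive patch (centre) in one image."""
    counts = [0] * len(centres)
    for patch in patches:
        counts[nearest_centre(patch, centres)] += 1
    total = len(patches)
    # Normalise so images with different numbers of patches are comparable.
    return [count / total for count in counts] if total else counts


# Hypothetical two-word dictionary and four patches from one image.
centres = [[0, 0], [10, 10]]
patches = [[1, 0], [9, 9], [0, 1], [11, 10]]
features = patch_histogram(patches, centres)
```

Each image is thus reduced to a fixed-length feature vector regardless of its size, which is what makes it usable as input to a standard supervised classifier.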
Exploring supervised and unsupervised methods to detect topics in biomedical text

PubMed Central

Lee, Minsuk; Wang, Weiqing; Yu, Hong

2006-01-01

Background: Topic detection is a task that automatically identifies topics (e.g., "biochemistry" and "protein structure") in scientific articles based on information content. Topic detection will benefit many other natural language processing tasks including information retrieval, text summarization and question answering; and is a necessary step towards the building of an information system that provides an efficient way for biologists to seek information from an ocean of literature. Results: We have explored the methods of Topic Spotting, a task of text categorization that applies the supervised machine-learning technique naïve Bayes to automatically assign a document to one or more predefined topics; and Topic Clustering, which applies unsupervised hierarchical clustering algorithms to aggregate documents into clusters such that each cluster represents a topic. We have applied our methods to detect topics of more than fifteen thousand articles that represent over sixteen thousand entries in the Online Mendelian Inheritance in Man (OMIM) database. We have explored bag of words as the features. Additionally, we have explored semantic features; namely, the Medical Subject Headings (MeSH) that are assigned to the MEDLINE records, and the Unified Medical Language System (UMLS) semantic types that correspond to the MeSH terms, in addition to bag of words, to facilitate the tasks of topic detection. Our results indicate that incorporating the MeSH terms and the UMLS semantic types as additional features enhances the performance of topic detection, and that naïve Bayes has the highest accuracy, 66.4%, for predicting the topic of an OMIM article as one of the total twenty-five topics. Conclusion: Our results indicate that the supervised topic spotting methods outperformed the unsupervised topic clustering; on the other hand, the unsupervised topic clustering methods have the advantages of being robust and applicable in real world settings. PMID:16539745
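The Topic Spotting idea described above (naïve Bayes over bag-of-words features) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy corpus, the function names `train_naive_bayes` and `predict_topic`, and the use of add-one smoothing are all assumptions made for illustration, and no MeSH or UMLS features are modeled.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs, labels):
    """Fit a multinomial naive Bayes model on tokenized documents."""
    class_counts = Counter(labels)          # topic -> number of documents
    word_counts = defaultdict(Counter)      # topic -> word -> count
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict_topic(model, tokens):
    """Return the topic with the highest log posterior for one document."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for tok in tokens:
            if tok not in vocab:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing avoids zero probabilities
            p = (word_counts[label][tok] + 1) / (total_words + len(vocab))
            score += math.log(p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy corpus (not OMIM data)
docs = [
    ["protein", "folding", "structure"],
    ["enzyme", "kinetics", "substrate"],
    ["protein", "domain", "structure"],
    ["enzyme", "catalysis", "substrate"],
]
labels = ["protein structure", "biochemistry", "protein structure", "biochemistry"]
model = train_naive_bayes(docs, labels)
print(predict_topic(model, ["protein", "structure", "helix"]))
```

Additional semantic features such as MeSH terms could, under the same assumptions, simply be appended to each document's token list before training.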

Unsupervised Scalable Statistical Method for Identifying Influential Users in Online Social Networks.

PubMed

Azcorra, A; Chiroque, L F; Cuevas, R; Fernández Anta, A; Laniado, H; Lillo, R E; Romo, J; Sguera, C

2018-05-03

Billions of users interact intensively every day via Online Social Networks (OSNs) such as Facebook, Twitter, or Google+. This makes OSNs an invaluable source of information, and channel of actuation, for sectors like advertising, marketing, or politics. To get the most out of OSNs, analysts need to identify influential users that can be leveraged for promoting products, distributing messages, or improving the image of companies. In this report we propose a new unsupervised method, Massive Unsupervised Outlier Detection (MUOD), based on outlier detection, for providing support in the identification of influential users. MUOD is scalable, and can hence be used in large OSNs. Moreover, it labels the outliers as shape, magnitude, or amplitude outliers, depending on their features. This allows classifying the outlier users into multiple different classes, which are likely to include different types of influential users. Applied to a subset of roughly 400 million Google+ users, MUOD automatically identified and discriminated sets of outlier users that present features associated with different definitions of influential users, such as capacity to attract engagement, capacity to attract a large number of followers, or high infection capacity.
Remote photoplethysmography system for unsupervised monitoring regional anesthesia effectiveness

NASA Astrophysics Data System (ADS)

Rubins, U.; Miscuks, A.; Marcinkevics, Z.; Lange, M.

2017-12-01

Determining the level of regional anesthesia (RA) is vitally important to both the anesthesiologist and the surgeon; knowing the RA level can also protect the patient and reduce the time of surgery. The level of RA is normally detected with a simple subjective method (a sensitivity test) or with complicated quantitative methods (thermography, neuromyography, etc.), but there is not yet a standardized method for objective RA detection and evaluation. In this study, an advanced remote photoplethysmography imaging (rPPG) system for unsupervised monitoring of human palm RA is demonstrated. The rPPG system comprises a compact video camera with a green optical filter, a surgical lamp as a light source, and a computer with custom-developed software. The algorithm, implemented in Matlab, recognizes the palm and two dermatomes (Medial and Ulnar innervation) and calculates the perfusion map and perfusion changes in real time to detect the effect of RA. Seven patients (aged 18-80 years) undergoing hand surgery received peripheral nerve brachial plexus blocks during the measurements. Clinical experiments showed that our rPPG system is able to perform unsupervised monitoring of RA.
Unsupervised Feature Learning With Winner-Takes-All Based STDP

PubMed Central

Ferré, Paul; Mamalet, Franck; Thorpe, Simon J.

2018-01-01

We present a novel strategy for unsupervised feature learning in image applications inspired by the Spike-Timing-Dependent-Plasticity (STDP) biological learning rule. We show equivalence between rank order coding Leaky-Integrate-and-Fire neurons and ReLU artificial neurons when applied to non-temporal data. We apply this to images using rank-order coding, which allows us to perform a full network simulation with a single feed-forward pass using GPU hardware. Next we introduce a binary STDP learning rule compatible with training on batches of images. Two mechanisms to stabilize the training are also presented: a Winner-Takes-All (WTA) framework, which selects the most relevant patches to learn from along the spatial dimensions, and a simple feature-wise normalization as a homeostatic process. This learning process allows us to train multi-layer architectures of convolutional sparse features. We apply our method to extract features from the MNIST, ETH80, CIFAR-10, and STL-10 datasets and show that these features are relevant for classification. We finally compare these results with several other state-of-the-art unsupervised learning methods. PMID:29674961
A single-layer network unsupervised feature learning method for white matter hyperintensity segmentation

NASA Astrophysics Data System (ADS)

Vijverberg, Koen; Ghafoorian, Mohsen; van Uden, Inge W. M.; de Leeuw, Frank-Erik; Platel, Bram; Heskes, Tom

2016-03-01

Cerebral small vessel disease (SVD) is a disorder frequently found among elderly people and is associated with deterioration in cognitive performance, parkinsonism, and motor and mood impairments. White matter hyperintensities (WMH), as well as lacunes, microbleeds and subcortical brain atrophy, are part of the spectrum of image findings related to SVD. Accurate segmentation of WMHs is important for prognosis and diagnosis of multiple neurological disorders such as MS and SVD. Almost all of the published (semi-)automated WMH detection models employ multiple complex hand-crafted features, which require in-depth domain knowledge. In this paper we propose to apply a single-layer network unsupervised feature learning (USFL) method that avoids hand-crafted features and instead automatically learns a more efficient set of features. Experimental results show that a computer aided detection system with a USFL system outperforms a hand-crafted approach. Moreover, since the two feature sets have complementary properties, a hybrid system that makes use of both hand-crafted and unsupervised learned features shows a significant performance boost compared to each system separately, getting close to the performance of an independent human expert.
Unsupervised learning of discriminative edge measures for vehicle matching between nonoverlapping cameras.

PubMed

Shan, Ying; Sawhney, Harpreet S; Kumar, Rakesh

2008-04-01

This paper proposes a novel unsupervised algorithm for learning discriminative features in the context of matching road vehicles between two non-overlapping cameras. The matching problem is formulated as a same-different classification problem, which aims to compute the probability of vehicle images from two distinct cameras being from the same vehicle or different vehicle(s). We employ a novel measurement vector that consists of three independent edge-based measures and their associated robust measures computed from a pair of aligned vehicle edge maps. The weight of each measure is determined by an unsupervised learning algorithm that optimally separates the same-different classes in the combined measurement space. This is achieved with a weak classification algorithm that automatically collects representative samples from same-different classes, followed by a more discriminative classifier based on Fisher's Linear Discriminants and Gibbs Sampling. The robustness of the match measures and the use of unsupervised discriminant analysis in the classification ensure that the proposed method performs consistently in the presence of missing/false features, temporally and spatially changing illumination conditions, and systematic misalignment caused by different camera configurations. Extensive experiments based on real data of over 200 vehicles at different times of day demonstrate promising results.
Penalized unsupervised learning with outliers

PubMed Central

Witten, Daniela M.

2013-01-01

We consider the problem of performing unsupervised learning in the presence of outliers – that is, observations that do not come from the same distribution as the rest of the data. It is known that in this setting, standard approaches for unsupervised learning can yield unsatisfactory results. For instance, in the presence of severe outliers, K-means clustering will often assign each outlier to its own cluster, or alternatively may yield distorted clusters in order to accommodate the outliers. In this paper, we take a new approach to extending existing unsupervised learning techniques to accommodate outliers. Our approach is an extension of a recent proposal for outlier detection in the regression setting. We allow each observation to take on an “error” term, and we penalize the errors using a group lasso penalty in order to encourage most of the observations’ errors to exactly equal zero. We show that this approach can be used in order to develop extensions of K-means clustering and principal components analysis that result in accurate outlier detection, as well as improved performance in the presence of outliers. These methods are illustrated in a simulation study and on two gene expression data sets, and connections with M-estimation are explored. PMID:23875057
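The idea of giving each observation an "error" term and penalizing it with a group lasso penalty can be sketched for K-means as follows. This is a minimal illustration under stated assumptions, not the paper's exact algorithm: it assumes the objective sum_i ||x_i - e_i - mu_c(i)||^2 + lam * sum_i ||e_i||_2, alternates ordinary K-means steps with a group soft-thresholding update of the errors, and uses hypothetical toy data, a hand-picked `lam`, and fixed initial centers.

```python
import math


def _dist(p, q):
    """Euclidean distance between two points given as sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def soft_threshold(residual, lam):
    """Group soft-thresholding: minimizes ||r - e||^2 + lam * ||e||_2 over e."""
    norm = math.sqrt(sum(r * r for r in residual))
    if norm <= lam / 2:
        return [0.0] * len(residual)  # small residuals get an exactly-zero error
    scale = 1.0 - lam / (2.0 * norm)
    return [scale * r for r in residual]


def penalized_kmeans(points, centers, lam, n_iter=20):
    """Alternate K-means steps with a per-observation error update."""
    centers = [list(c) for c in centers]
    errors = [[0.0] * len(p) for p in points]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Work with the "cleaned" observations x_i - e_i
        cleaned = [[x - e for x, e in zip(p, err)] for p, err in zip(points, errors)]
        # Assignment step: nearest center for each cleaned observation
        labels = [
            min(range(len(centers)), key=lambda k: _dist(c, centers[k]))
            for c in cleaned
        ]
        # Centroid step: mean of the cleaned members (empty clusters keep their center)
        for k in range(len(centers)):
            members = [c for c, lab in zip(cleaned, labels) if lab == k]
            if members:
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
        # Error step: shrink each residual x_i - mu_c(i) toward zero
        errors = [
            soft_threshold([x - m for x, m in zip(p, centers[lab])], lam)
            for p, lab in zip(points, labels)
        ]
    return labels, centers, errors


# Hypothetical toy data: two tight clusters plus one severe outlier (last point)
points = [
    (0.0, 0.0), (0.2, 0.1), (-0.1, 0.2),
    (5.0, 5.0), (5.1, 4.9), (4.9, 5.2),
    (20.0, 20.0),
]
labels, centers, errors = penalized_kmeans(points, [(0.0, 0.0), (5.0, 5.0)], lam=6.0)
outliers = [i for i, e in enumerate(errors) if any(v != 0.0 for v in e)]
print(outliers, centers)
```

Observations whose error term ends up nonzero are the ones flagged as outliers; in this sketch a larger `lam` flags fewer points, and as `lam` grows the procedure reduces to ordinary K-means.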
Comparisons of non-Gaussian statistical models in DNA methylation analysis.

PubMed

Ma, Zhanyu; Teschendorff, Andrew E; Yu, Hong; Taghia, Jalil; Guo, Jun

2014-06-16

As a key regulatory mechanism of gene expression, DNA methylation patterns are widely altered in many complex genetic diseases, including cancer. DNA methylation is naturally quantified by bounded support data; therefore, it is non-Gaussian distributed. In order to capture such properties, we introduce some non-Gaussian statistical models to perform dimension reduction on DNA methylation data. Afterwards, non-Gaussian statistical model-based unsupervised clustering strategies are applied to cluster the data. Comparisons and analysis of different dimension reduction strategies and unsupervised clustering methods are presented. Experimental results show that the non-Gaussian statistical model-based methods are superior to the conventional Gaussian distribution-based method. They are meaningful tools for DNA methylation analysis. Moreover, among several non-Gaussian methods, the one that captures the bounded nature of DNA methylation data reveals the best clustering performance.
Comparisons of Non-Gaussian Statistical Models in DNA Methylation Analysis

PubMed Central

Ma, Zhanyu; Teschendorff, Andrew E.; Yu, Hong; Taghia, Jalil; Guo, Jun

2014-01-01

As a key regulatory mechanism of gene expression, DNA methylation patterns are widely altered in many complex genetic diseases, including cancer. DNA methylation is naturally quantified by bounded support data; therefore, it is non-Gaussian distributed. In order to capture such properties, we introduce some non-Gaussian statistical models to perform dimension reduction on DNA methylation data. Afterwards, non-Gaussian statistical model-based unsupervised clustering strategies are applied to cluster the data. Comparisons and analysis of different dimension reduction strategies and unsupervised clustering methods are presented. Experimental results show that the non-Gaussian statistical model-based methods are superior to the conventional Gaussian distribution-based method. They are meaningful tools for DNA methylation analysis. Moreover, among several non-Gaussian methods, the one that captures the bounded nature of DNA methylation data reveals the best clustering performance. PMID:24937687
GO-PCA: An Unsupervised Method to Explore Gene Expression Data Using Prior Knowledge

PubMed Central

Wagner, Florian

2015-01-01

Method: Genome-wide expression profiling is a widely used approach for characterizing heterogeneous populations of cells, tissues, biopsies, or other biological specimens. The exploratory analysis of such data typically relies on generic unsupervised methods, e.g. principal component analysis (PCA) or hierarchical clustering. However, generic methods fail to exploit prior knowledge about the molecular functions of genes. Here, I introduce GO-PCA, an unsupervised method that combines PCA with nonparametric GO enrichment analysis, in order to systematically search for sets of genes that are both strongly correlated and closely functionally related. These gene sets are then used to automatically generate expression signatures with functional labels, which collectively aim to provide a readily interpretable representation of biologically relevant similarities and differences. The robustness of the results obtained can be assessed by bootstrapping. Results: I first applied GO-PCA to datasets containing diverse hematopoietic cell types from human and mouse, respectively. In both cases, GO-PCA generated a small number of signatures that represented the majority of lineages present, and whose labels reflected their respective biological characteristics. I then applied GO-PCA to human glioblastoma (GBM) data, and recovered signatures associated with four out of five previously defined GBM subtypes. My results demonstrate that GO-PCA is a powerful and versatile exploratory method that reduces an expression matrix containing thousands of genes to a much smaller set of interpretable signatures. In this way, GO-PCA aims to facilitate hypothesis generation, design of further analyses, and functional comparisons across datasets. PMID:26575370
Unsupervised change detection in a particular vegetation land cover type using spectral angle mapper

NASA Astrophysics Data System (ADS)

Renza, Diego; Martinez, Estibaliz; Molina, Iñigo; Ballesteros L., Dora M.

2017-04-01

This paper presents a new unsupervised change detection methodology for multispectral images applied to specific land covers. The proposed method involves comparing each image against a reference spectrum, where the reference spectrum is obtained from the spectral signature of the type of land cover to be detected. The method has been tested using multispectral images (SPOT5) of the community of Madrid (Spain), and multispectral images (Quickbird) of an area over Indonesia that was impacted by the December 26, 2004 tsunami; here, the tests have focused on the detection of changes in vegetation. The image comparison is obtained by applying the Spectral Angle Mapper between the reference spectrum and each multitemporal image. Then, a threshold is applied to produce a single image of change, which corresponds to the vegetation zones. The results for each multitemporal image are combined through an exclusive or (XOR) operation that selects vegetation zones that have changed over time. Finally, the derived results were compared against a supervised method based on classification with the Support Vector Machine. Furthermore, the NDVI-differencing and the Spectral Angle Mapper techniques were selected as unsupervised methods for comparison purposes. The main novelty of the method consists in the detection of changes in a specific land cover type (vegetation); therefore, for comparison purposes, the best scenario is to compare it with methods that aim to detect changes in a specific land cover type (vegetation). This is the main reason for selecting the NDVI-based method and the post-classification method (SVM implemented in a standard software tool). To evaluate the improvements obtained by using a reference spectrum vector, the results are compared with the basic-SAM method. For the SPOT5 image, the overall accuracy was 99.36% and the κ index was 90.11%; for the Quickbird image, the overall accuracy was 97.5% and the κ index was 82.16%. Finally, the precision results of the method are comparable to those of a supervised method, supported by low detection of false positives and false negatives, along with a high overall accuracy and a high kappa index. On the other hand, the execution times were comparable to those of unsupervised methods of low computational load.
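The pipeline described in this abstract (spectral angle against a reference spectrum, thresholding, then XOR across dates) can be sketched as below. This is an illustrative reading of the abstract, not the authors' code: the three-band pixel values, the reference spectrum, the `max_angle` threshold, and the function names are hypothetical.

```python
import math


def spectral_angle(pixel, reference):
    """Angle (radians) between a pixel spectrum and the reference spectrum."""
    dot = sum(p * r for p, r in zip(pixel, reference))
    norm_p = math.sqrt(sum(p * p for p in pixel))
    norm_r = math.sqrt(sum(r * r for r in reference))
    if norm_p == 0 or norm_r == 0:
        return math.pi / 2  # treat an all-zero spectrum as dissimilar
    cos_angle = max(-1.0, min(1.0, dot / (norm_p * norm_r)))  # guard rounding
    return math.acos(cos_angle)


def cover_mask(image, reference, max_angle):
    """True where a pixel is spectrally close to the reference land cover."""
    return [[spectral_angle(px, reference) <= max_angle for px in row] for row in image]


def change_map(image_t1, image_t2, reference, max_angle):
    """XOR of the two masks: True where the land cover appeared or disappeared."""
    mask1 = cover_mask(image_t1, reference, max_angle)
    mask2 = cover_mask(image_t2, reference, max_angle)
    return [[a != b for a, b in zip(r1, r2)] for r1, r2 in zip(mask1, mask2)]


# Hypothetical 3-band spectra (green, red, near-infrared reflectance)
reference = (0.08, 0.05, 0.50)   # assumed vegetation signature
veg = (0.10, 0.06, 0.55)
soil = (0.20, 0.25, 0.30)

image_t1 = [[veg, veg], [soil, veg]]
image_t2 = [[veg, soil], [soil, soil]]
print(change_map(image_t1, image_t2, reference, max_angle=0.1))
```

Because the spectral angle depends only on the direction of the spectrum vector, the mask is insensitive to uniform brightness scaling of a pixel, which is one commonly cited reason for using it with multitemporal imagery.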
An unsupervised method for estimating the global horizontal irradiance from photovoltaic power measurements

NASA Astrophysics Data System (ADS)

Nespoli, Lorenzo; Medici, Vasco

2017-12-01

In this paper, we present a method to determine the global horizontal irradiance (GHI) from the power measurements of one or more PV systems, located in the same neighborhood. The method is completely unsupervised and is based on a physical model of a PV plant. The precise assessment of solar irradiance is pivotal for the forecast of the electric power generated by photovoltaic (PV) plants. However, on-ground measurements are expensive and are generally not performed for small and medium-sized PV plants. Satellite-based services represent a valid alternative to on site measurements, but their space-time resolution is limited. Results from two case studies located in Switzerland are presented. The performance of the proposed method at assessing GHI is compared with that of free and commercial satellite services. Our results show that the presented method is generally better than satellite-based services, especially at high temporal resolutions.
Numerical solution of the nonlinear Schrodinger equation by feedforward neural networks

NASA Astrophysics Data System (ADS)

Shirvany, Yazdan; Hayati, Mohsen; Moradian, Rostam

2008-12-01

We present a method to solve boundary value problems using artificial neural networks (ANN). A trial solution of the differential equation is written as a feed-forward neural network containing adjustable parameters (the weights and biases). From the differential equation and its boundary conditions we prepare the energy function, which is used in the back-propagation method with a momentum term to update the network parameters. We improved the energy function of the ANN, which is derived from the Schrodinger equation and the boundary conditions. With this improved energy function we can use an unsupervised training method in the ANN to solve the equation. Unsupervised training aims to minimize a non-negative energy function. We used the ANN method to solve the Schrodinger equation for a few quantum systems. Eigenfunctions and energy eigenvalues are calculated. Our numerical results are in agreement with the corresponding analytical solutions and show the efficiency of the ANN method for solving eigenvalue problems.
Multilayer Extreme Learning Machine With Subnetwork Nodes for Representation Learning.

PubMed

Yang, Yimin; Wu, Q M Jonathan

2016-11-01

The extreme learning machine (ELM), which was originally proposed for "generalized" single-hidden layer feedforward neural networks, provides efficient unified learning solutions for the applications of clustering, regression, and classification. It presents competitive accuracy with superb efficiency in many applications. However, ELM with subnetwork nodes architecture has not attracted much research attention. Recently, many methods have been proposed for supervised/unsupervised dimension reduction or representation learning, but these methods normally only work for one type of problem. This paper studies the general architecture of multilayer ELM (ML-ELM) with subnetwork nodes, showing that 1) the proposed method provides a representation learning platform with unsupervised/supervised and compressed/sparse representation learning and 2) on ten image datasets and 16 classification datasets, the proposed ML-ELM with subnetwork nodes performs competitively with or much better than other conventional feature learning methods.
VHR satellite multitemporal data to extract cultural landscape changes in the roman site of Grumentum

NASA Astrophysics Data System (ADS)

Masini, Nicola; Lasaponara, Rosa

2013-04-01

This paper deals with the use of a VHR satellite multitemporal data set to extract cultural landscape changes in the Roman site of Grumentum. Grumentum is an ancient town, 50 km south of Potenza, located near the Roman road of Via Herculea, which connected Venusia, in the northeast of Basilicata, with Heraclea on the Ionian coast. The first settlement dates back to the 6th century BC. It was resettled by the Romans in the 3rd century BC. Its urban fabric, which evidences a long history from the Republican age to late Antiquity (III BC-V AD), is composed of the typical urban pattern of cardi and decumani. Its excavated ruins include a large amphitheatre, a theatre, the thermae, the Forum and some temples. Many techniques are now available to capture and record differences between two or more images. In this paper we apply the two main approaches: (i) unsupervised and (ii) supervised change detection methods. Unsupervised change detection methods are generally based on the transformation of the two multispectral images into a single-band or multiband image, which is further analyzed to identify changes. They generally comprise three basic steps: (i) preprocessing, (ii) a pixel-by-pixel comparison, and (iii) identification of changes according to their magnitude and direction (positive/negative). The separation between changed and unchanged classes is then obtained from the magnitude of the resulting spectral change vectors by means of empirical or theoretically well-founded approaches. Supervised change detection methods are generally based on supervised classification methods, which require a suitable training set for the learning process of the classifiers. They likewise comprise three basic steps: (i) preprocessing, (ii) supervised classification of the single dates or of the map obtained as the difference of two dates, and (iii) identification of changes according to their magnitude and direction (positive/negative). These algorithms therefore require preliminary knowledge in order (i) to generate representative parameters for each class of interest and (ii) to carry out the training stage. Advantages and disadvantages of the supervised and unsupervised approaches are discussed. Finally, results from the satellite multitemporal dataset were integrated with aerial photos from a historical archive in order to expand the time window of the investigation and capture landscape changes that occurred from the Agrarian Reform, in the 1950s, up to today.
Maximizing capture of gene co-expression relationships through pre-clustering of input expression samples: an Arabidopsis case study.

PubMed

Feltus, F Alex; Ficklin, Stephen P; Gibson, Scott M; Smith, Melissa C

2013-06-05

In genomics, highly relevant gene interaction (co-expression) networks have been constructed by finding significant pair-wise correlations between genes in expression datasets. These networks are then mined to elucidate biological function at the polygenic level. In some cases networks may be constructed from input samples that measure gene expression under a variety of different conditions, such as for different genotypes, environments, disease states and tissues. When large sets of samples are obtained from public repositories it is often unmanageable to associate samples into condition-specific groups, and combining samples from various conditions has a negative effect on network size. A fixed significance threshold is often applied, also limiting the size of the final network. Therefore, we propose pre-clustering of input expression samples to approximate condition-specific grouping of samples and individual network construction of each group as a means for dynamic significance thresholding. The net effect is increased sensitivity, thus maximizing the total co-expression relationships in the final co-expression network compendium. A total of 86 Arabidopsis thaliana co-expression networks were constructed after k-means partitioning of 7,105 publicly available ATH1 Affymetrix microarray samples. We term each pre-sorted network a Gene Interaction Layer (GIL). Random Matrix Theory (RMT), an un-supervised thresholding method, was used to threshold each of the 86 networks independently, effectively providing a dynamic (non-global) threshold for the network. The overall gene count across all GILs reached 19,588 genes (94.7% measured gene coverage) and 558,022 unique co-expression relationships. In comparison, network construction without pre-sorting of input samples yielded only 3,297 genes (15.9%) and 129,134 relationships in the global network. Here we show that pre-clustering of microarray samples helps approximate condition-specific networks and allows for dynamic thresholding using un-supervised methods. Because RMT ensures only highly significant interactions are kept, the GIL compendium consists of 558,022 unique high quality A. thaliana co-expression relationships across almost all of the measurable genes on the ATH1 array. For A. thaliana, these networks represent the largest compendium to date of significant gene co-expression relationships, and are a means to explore complex pathway, polygenic, and pleiotropic relationships for this focal model plant. The networks can be explored at sysbio.genome.clemson.edu. Finally, this method is applicable to any large expression profile collection for any organism and is best suited where a knowledge-independent network construction method is desired.
Maximizing capture of gene co-expression relationships through pre-clustering of input expression samples: an Arabidopsis case study

PubMed Central

2013-01-01

Background: In genomics, highly relevant gene interaction (co-expression) networks have been constructed by finding significant pair-wise correlations between genes in expression datasets. These networks are then mined to elucidate biological function at the polygenic level. In some cases networks may be constructed from input samples that measure gene expression under a variety of different conditions, such as for different genotypes, environments, disease states and tissues. When large sets of samples are obtained from public repositories it is often unmanageable to associate samples into condition-specific groups, and combining samples from various conditions has a negative effect on network size. A fixed significance threshold is often applied, also limiting the size of the final network. Therefore, we propose pre-clustering of input expression samples to approximate condition-specific grouping of samples and individual network construction of each group as a means for dynamic significance thresholding. The net effect is increased sensitivity, thus maximizing the total co-expression relationships in the final co-expression network compendium. Results: A total of 86 Arabidopsis thaliana co-expression networks were constructed after k-means partitioning of 7,105 publicly available ATH1 Affymetrix microarray samples. We term each pre-sorted network a Gene Interaction Layer (GIL). Random Matrix Theory (RMT), an un-supervised thresholding method, was used to threshold each of the 86 networks independently, effectively providing a dynamic (non-global) threshold for the network. The overall gene count across all GILs reached 19,588 genes (94.7% measured gene coverage) and 558,022 unique co-expression relationships. In comparison, network construction without pre-sorting of input samples yielded only 3,297 genes (15.9%) and 129,134 relationships in the global network. Conclusions: Here we show that pre-clustering of microarray samples helps approximate condition-specific networks and allows for dynamic thresholding using un-supervised methods. Because RMT ensures only highly significant interactions are kept, the GIL compendium consists of 558,022 unique high quality A. thaliana co-expression relationships across almost all of the measurable genes on the ATH1 array. For A. thaliana, these networks represent the largest compendium to date of significant gene co-expression relationships, and are a means to explore complex pathway, polygenic, and pleiotropic relationships for this focal model plant. The networks can be explored at sysbio.genome.clemson.edu. Finally, this method is applicable to any large expression profile collection for any organism and is best suited where a knowledge-independent network construction method is desired. PMID:23738693
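Why pooling samples from different conditions can shrink a co-expression network, and why building one network per sample group can recover relationships, is easy to see on a toy example. The sketch below is a simplification under stated assumptions: the sample groups are given directly rather than found by k-means, a fixed Pearson cutoff stands in for RMT thresholding, and the gene names and expression values are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def coexpression_edges(expr, sample_idx, cutoff):
    """Gene pairs whose |Pearson r| over the chosen samples reaches the cutoff."""
    edges = set()
    for g1, g2 in combinations(sorted(expr), 2):
        x = [expr[g1][i] for i in sample_idx]
        y = [expr[g2][i] for i in sample_idx]
        if abs(pearson(x, y)) >= cutoff:
            edges.add((g1, g2))
    return edges


def layered_network(expr, groups, cutoff):
    """Union of the edges found separately in each pre-clustered sample group."""
    edges = set()
    for sample_idx in groups:
        edges |= coexpression_edges(expr, sample_idx, cutoff)
    return edges


# Hypothetical expression values: geneA and geneB are positively correlated in
# samples 0-3 and negatively correlated in samples 4-7, so pooling cancels out.
expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0, 8.0, 6.0, 4.0, 2.0],
    "geneC": [5.0, 1.0, 4.0, 2.0, 3.0, 5.0, 1.0, 4.0],
}
groups = [[0, 1, 2, 3], [4, 5, 6, 7]]  # assumed output of a pre-clustering step

print("global:", coexpression_edges(expr, list(range(8)), cutoff=0.9))
print("layered:", layered_network(expr, groups, cutoff=0.9))
```

In this toy case the pooled network has no edges, while the per-group construction recovers the geneA-geneB relationship, which mirrors the sensitivity argument made in the abstract.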
Prediction task guided representation learning of medical codes in EHR.

PubMed

Cui, Liwen; Xie, Xiaolei; Shen, Zuojun

2018-06-18

There have been rapidly growing applications using machine learning models for predictive analytics in Electronic Health Records (EHR) to improve the quality of hospital services and the efficiency of healthcare resource utilization. A fundamental and crucial step in developing such models is to convert medical codes in EHR to feature vectors. These medical codes are used to represent diagnoses or procedures. Their vector representations have a tremendous impact on the performance of machine learning models. Recently, some researchers have utilized representation learning methods from Natural Language Processing (NLP) to learn vector representations of medical codes. However, most previous approaches are unsupervised, i.e. the generation of medical code vectors is independent from prediction tasks. Thus, the obtained feature vectors may be inappropriate for a specific prediction task. Moreover, unsupervised methods often require a lot of samples to obtain reliable results, but most practical problems have very limited patient samples. In this paper, we develop a new method called Prediction Task Guided Health Record Aggregation (PTGHRA), which aggregates health records guided by prediction tasks, to construct training corpus for various representation learning models. Compared with unsupervised approaches, representation learning models integrated with PTGHRA yield a significant improvement in predictive capability of generated medical code vectors, especially for limited training samples. Copyright © 2018. Published by Elsevier Inc.
An evaluation of unsupervised and supervised learning algorithms for clustering landscape types in the United States

USGS Publications Warehouse

Wendel, Jochen; Buttenfield, Barbara P.; Stanislawski, Larry V.

2016-01-01

Knowledge of landscape type can inform cartographic generalization of hydrographic features, because landscape characteristics provide an important geographic context that affects variation in channel geometry, flow pattern, and network configuration. Landscape types are characterized by expansive spatial gradients that lack abrupt changes between adjacent classes, and by a limited number of outliers that might confound classification. The US Geological Survey (USGS) is exploring methods to automate generalization of features in the National Hydrography Dataset (NHD), to associate specific sequences of processing operations and parameters with specific landscape characteristics, thus obviating manual selection of a unique processing strategy for every NHD watershed unit. A chronology of methods to delineate physiographic regions for the United States is described, including a recent maximum likelihood classification based on seven input variables. This research compares unsupervised and supervised algorithms applied to these seven input variables, to evaluate and possibly refine the recent classification. Evaluation metrics for unsupervised methods include the Davies–Bouldin index, the Silhouette index, and the Dunn index as well as quantization and topographic error metrics. Cross validation and misclassification rate analysis are used to evaluate supervised classification methods. The paper reports the comparative analysis and its impact on the selection of landscape regions. The compared solutions show problems in areas of high landscape diversity. There is some indication that additional input variables, additional classes, or more sophisticated methods can refine the existing classification.
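Two of the internal validity indices named in this abstract can be illustrated with a minimal sketch. It uses the textbook definitions of the Silhouette and Davies-Bouldin indices with Euclidean distance; the function names and the toy points are hypothetical and are not taken from the paper, which does not describe its implementation.

```python
import math

def euclid(a, b):
    """Euclidean distance between two equal-length coordinate tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def silhouette(points, labels):
    """Mean silhouette width over all points (range -1..1, higher is better)."""
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, grouped by cluster label.
        by_cluster = {}
        for j, q in enumerate(points):
            if i != j:
                by_cluster.setdefault(labels[j], []).append(euclid(p, q))
        own = by_cluster.pop(labels[i], None)
        if not own or not by_cluster:
            scores.append(0.0)  # singleton cluster or only one cluster
            continue
        a = sum(own) / len(own)                                  # cohesion
        b = min(sum(d) / len(d) for d in by_cluster.values())    # separation
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

def davies_bouldin(points, labels):
    """Davies-Bouldin index (lower is better); assumes distinct centroids."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    centroids = {lab: [sum(c) / len(ps) for c in zip(*ps)]
                 for lab, ps in clusters.items()}
    scatter = {lab: sum(euclid(p, centroids[lab]) for p in ps) / len(ps)
               for lab, ps in clusters.items()}
    keys = list(clusters)
    worst = [max((scatter[i] + scatter[j]) / euclid(centroids[i], centroids[j])
                 for j in keys if j != i) for i in keys]
    return sum(worst) / len(keys)

# Toy data (hypothetical): two compact, well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
good = [0, 0, 0, 1, 1, 1]   # labels that follow the groups
bad = [0, 1, 0, 1, 0, 1]    # labels that cut across the groups
print(silhouette(points, good), silhouette(points, bad))
print(davies_bouldin(points, good), davies_bouldin(points, bad))
```

A partition that follows the natural groups should score higher on the Silhouette index and lower on the Davies-Bouldin index than one that cuts across them.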
Methods for automatic detection of artifacts in microelectrode recordings.

PubMed

Bakštein, Eduard; Sieger, Tomáš; Wild, Jiří; Novák, Daniel; Schneider, Jakub; Vostatek, Pavel; Urgošík, Dušan; Jech, Robert

2017-10-01

Extracellular microelectrode recording (MER) is a prominent technique for studies of extracellular single-unit neuronal activity. In order to achieve robust results in more complex analysis pipelines, it is necessary to have high quality input data with few artifacts. We show that noise (mainly electromagnetic interference and motion artifacts) may affect more than 25% of the recording length in a clinical MER database. We present several methods for automatic detection of noise in MER signals, based on (i) unsupervised detection of stationary segments, (ii) large peaks in the power spectral density, and (iii) a classifier based on multiple time- and frequency-domain features. We evaluate the proposed methods on a manually annotated database of 5735 ten-second MER signals from 58 Parkinson's disease patients. The existing methods for artifact detection in single-channel MER that have been rigorously tested are based on unsupervised change-point detection. We show on an extensive real MER database that the presented techniques are better suited for the task of artifact identification and achieve much better results. The best-performing classifiers (bagging and decision tree) achieved artifact classification accuracy of up to 89% on an unseen test set and outperformed the unsupervised techniques by 5-10%. This was close to the level of agreement among raters using manual annotation (93.5%). We conclude that the proposed methods are suitable for automatic MER denoising and may help in the efficient elimination of undesirable signal artifacts. Copyright © 2017 Elsevier B.V. All rights reserved.
GO-PCA: An Unsupervised Method to Explore Gene Expression Data Using Prior Knowledge.

PubMed

Wagner, Florian

2015-01-01

Genome-wide expression profiling is a widely used approach for characterizing heterogeneous populations of cells, tissues, biopsies, or other biological specimens. The exploratory analysis of such data typically relies on generic unsupervised methods, e.g. principal component analysis (PCA) or hierarchical clustering. However, generic methods fail to exploit prior knowledge about the molecular functions of genes. Here, I introduce GO-PCA, an unsupervised method that combines PCA with nonparametric GO enrichment analysis, in order to systematically search for sets of genes that are both strongly correlated and closely functionally related. These gene sets are then used to automatically generate expression signatures with functional labels, which collectively aim to provide a readily interpretable representation of biologically relevant similarities and differences. The robustness of the results obtained can be assessed by bootstrapping. I first applied GO-PCA to datasets containing diverse hematopoietic cell types from human and mouse, respectively. In both cases, GO-PCA generated a small number of signatures that represented the majority of lineages present, and whose labels reflected their respective biological characteristics. I then applied GO-PCA to human glioblastoma (GBM) data, and recovered signatures associated with four out of five previously defined GBM subtypes. My results demonstrate that GO-PCA is a powerful and versatile exploratory method that reduces an expression matrix containing thousands of genes to a much smaller set of interpretable signatures. In this way, GO-PCA aims to facilitate hypothesis generation, design of further analyses, and functional comparisons across datasets.

Accuracy of un-supervised versus provider-supervised self-administered HIV testing in Uganda: A randomized implementation trial.

PubMed

Asiimwe, Stephen; Oloya, James; Song, Xiao; Whalen, Christopher C

2014-12-01

Unsupervised HIV self-testing (HST) has potential to increase knowledge of HIV status; however, its accuracy is unknown. To estimate the accuracy of unsupervised HST in field settings in Uganda, we performed a non-blinded, randomized controlled, non-inferiority trial of unsupervised compared with supervised HST among selected high HIV risk fisherfolk (22.1 % HIV prevalence) in three fishing villages in Uganda between July and September 2013. The study enrolled 246 participants and randomized them in a 1:1 ratio to unsupervised HST or provider-supervised HST. In an intent-to-treat analysis, the HST sensitivity was 90 % in the unsupervised arm and 100 % among the provider-supervised, yielding a difference of -10 % (90 % CI -21, 1 %); non-inferiority was not shown. In a per protocol analysis, the difference in sensitivity was -5.6 % (90 % CI -14.4, 3.3 %) and did show non-inferiority. We conclude that unsupervised HST is feasible in rural Africa and may be non-inferior to provider-supervised HST.
Unsupervised fuzzy segmentation of 3D magnetic resonance brain images

NASA Astrophysics Data System (ADS)

Velthuizen, Robert P.; Hall, Lawrence O.; Clarke, Laurence P.; Bensaid, Amine M.; Arrington, J. A.; Silbiger, Martin L.

1993-07-01

Unsupervised fuzzy methods are proposed for segmentation of 3D Magnetic Resonance images of the brain. Fuzzy c-means (FCM) has shown promising results for segmentation of single slices. FCM has been investigated for volume segmentations, both by combining results of single slices and by segmenting the full volume. Different strategies and initializations have been tried. In particular, two approaches have been used: (1) a method by which, iteratively, the furthest sample is split off to form a new cluster center, and (2) the traditional FCM in which the membership grade matrix is initialized in some way. Results have been compared with volume segmentations by k-means and with two supervised methods, k-nearest neighbors and region growing. Results of individual segmentations are presented as well as comparisons on the application of the different methods to a number of tumor patient data sets.
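The "traditional FCM" variant mentioned in this abstract, in which the membership grade matrix is initialized and then refined, can be sketched as follows. This is a minimal illustration of the standard fuzzy c-means update equations on one-dimensional values, assuming a fuzzifier m = 2 and a random initial membership matrix; the function name, parameters, and toy intensities are hypothetical, and the code is not the authors' implementation.

```python
import random

def fuzzy_c_means(data, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means on a list of scalars.

    Returns (centers, u) where u[k][i] is the membership of sample k
    in cluster i and every row of u sums to 1.
    """
    rng = random.Random(seed)
    n = len(data)
    # Random initial membership matrix, rows normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        s = sum(row)
        u.append([r / s for r in row])
    centers = [0.0] * c
    for _ in range(iters):
        # Centers are means weighted by membership raised to the power m.
        for i in range(c):
            w = [u[k][i] ** m for k in range(n)]
            centers[i] = sum(wk * x for wk, x in zip(w, data)) / sum(w)
        # Memberships are proportional to distance ** (-2 / (m - 1)).
        shift = 0.0
        for k, x in enumerate(data):
            d = [abs(x - v) for v in centers]
            if min(d) == 0.0:
                # Sample coincides with a center: give it full membership there.
                new = [1.0 if dist == 0.0 else 0.0 for dist in d]
            else:
                new = [dist ** (-2.0 / (m - 1.0)) for dist in d]
            s = sum(new)
            new = [v / s for v in new]
            shift = max(shift, max(abs(a - b) for a, b in zip(new, u[k])))
            u[k] = new
        if shift < tol:   # memberships have stopped changing
            break
    return centers, u

# Toy intensities (hypothetical): two groups of voxel values.
data = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
centers, u = fuzzy_c_means(data, c=2)
print(sorted(centers))
```

Unlike k-means, each sample keeps a graded membership in every cluster, which is why the method is often considered for tissue boundaries where voxels contain mixed signal.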
Unsupervised and self-mapping category formation and semantic object recognition for mobile robot vision used in an actual environment

NASA Astrophysics Data System (ADS)

Madokoro, H.; Tsukada, M.; Sato, K.

2013-07-01

This paper presents an unsupervised learning-based object category formation and recognition method for mobile robot vision. Our method has the following features: detection of feature points and description of features using a scale-invariant feature transform (SIFT), selection of target feature points using one class support vector machines (OC-SVMs), generation of visual words using self-organizing maps (SOMs), formation of labels using adaptive resonance theory 2 (ART-2), and creation and classification of categories on a category map of counter propagation networks (CPNs) for visualizing spatial relations between categories. Classification results of dynamic images using time-series images obtained using two different-size robots and according to movements respectively demonstrate that our method can visualize spatial relations of categories while maintaining time-series characteristics. Moreover, we emphasize the effectiveness of our method for category formation of appearance changes of objects.
Feature Selection for Ridge Regression with Provable Guarantees.

PubMed

Paul, Saurabh; Drineas, Petros

2016-04-01

We introduce single-set spectral sparsification as a deterministic sampling-based feature selection technique for regularized least-squares classification, which is the classification analog to ridge regression. The method is unsupervised and gives worst-case guarantees of the generalization power of the classification function after feature selection with respect to the classification function obtained using all features. We also introduce leverage-score sampling as an unsupervised randomized feature selection method for ridge regression. We provide risk bounds for both single-set spectral sparsification and leverage-score sampling on ridge regression in the fixed design setting and show that the risk in the sampled space is comparable to the risk in the full-feature space. We perform experiments on synthetic and real-world data sets (a subset of the TechTC-300 data sets) to support our theory. Experimental results indicate that the proposed methods perform better than the existing feature selection methods.
Signature extension: An approach to operational multispectral surveys

NASA Technical Reports Server (NTRS)

Nalepka, R. F.; Morgenstern, J. P.

1973-01-01

Two data processing techniques were suggested as applicable to the large area survey problem. One approach was to use unsupervised classification (clustering) techniques. Investigation of this method showed that since the method did nothing to reduce the signal variability, the use of this method would be very time consuming and possibly inaccurate as well. The conclusion is that unsupervised classification techniques of themselves are not a solution to the large area survey problem. The other method investigated was the use of signature extension techniques. Such techniques function by normalizing the data to some reference condition. Thus signatures from an isolated area could be used to process large quantities of data. In this manner, ground information requirements and computer training are minimized. Several signature extension techniques were tested. The best of these allowed signatures to be extended between data sets collected four days and 80 miles apart with an average accuracy of better than 90%.
Use of Binary Partition Tree and energy minimization for object-based classification of urban land cover

NASA Astrophysics Data System (ADS)

Li, Mengmeng; Bijker, Wietske; Stein, Alfred

2015-04-01

Two main challenges are faced when classifying urban land cover from very high resolution satellite images: obtaining an optimal image segmentation and distinguishing buildings from other man-made objects. For optimal segmentation, this work proposes a hierarchical representation of an image by means of a Binary Partition Tree (BPT) and an unsupervised evaluation of image segmentations by energy minimization. For building extraction, we apply fuzzy sets to create a fuzzy landscape of shadows which in turn involves a two-step procedure. The first step is a preliminary image classification at a fine segmentation level to generate vegetation and shadow information. The second step models the directional relationship between building and shadow objects to extract building information at the optimal segmentation level. We conducted the experiments on two datasets of Pléiades images from Wuhan City, China. To demonstrate its performance, the proposed classification is compared at the optimal segmentation level with Maximum Likelihood Classification and Support Vector Machine classification. The results show that the proposed classification produced the highest overall accuracies and kappa coefficients, and the smallest over-classification and under-classification geometric errors. We conclude first that integrating BPT with energy minimization offers an effective means for image segmentation. Second, we conclude that the directional relationship between building and shadow objects represented by a fuzzy landscape is important for building extraction.
Identifying influential individuals on intensive care units: using cluster analysis to explore culture.

PubMed

Fong, Allan; Clark, Lindsey; Cheng, Tianyi; Franklin, Ella; Fernandez, Nicole; Ratwani, Raj; Parker, Sarah Henrickson

2017-07-01

The objective of this paper is to identify attribute patterns of influential individuals in intensive care units using unsupervised cluster analysis. Despite the acknowledgement that culture of an organisation is critical to improving patient safety, specific methods to shift culture have not been explicitly identified. A social network analysis survey was conducted and an unsupervised cluster analysis was used. A total of 100 surveys were gathered. Unsupervised cluster analysis was used to group individuals with similar dimensions highlighting three general genres of influencers: well-rounded, knowledge and relational. Culture is created locally by individual influencers. Cluster analysis is an effective way to identify common characteristics among members of an intensive care unit team that are noted as highly influential by their peers. To change culture, identifying and then integrating the influencers in intervention development and dissemination may create more sustainable and effective culture change. Additional studies are ongoing to test the effectiveness of utilising these influencers to disseminate patient safety interventions. This study offers an approach that can be helpful in both identifying and understanding influential team members and may be an important aspect of developing methods to change organisational culture. © 2017 John Wiley & Sons Ltd.
Unsupervised quality estimation model for English to German translation and its application in extensive supervised evaluation.

PubMed

Han, Aaron L-F; Wong, Derek F; Chao, Lidia S; He, Liangye; Lu, Yi

2014-01-01

With the rapid development of machine translation (MT), MT evaluation has become very important for telling us in a timely manner whether an MT system is making progress. Conventional MT evaluation methods tend to calculate the similarity between hypothesis translations offered by automatic translation systems and reference translations offered by professional translators. There are several weaknesses in existing evaluation metrics. Firstly, the designed factors are not comprehensive, which results in a language-bias problem: the metrics perform well on some particular language pairs but poorly on others. Secondly, they tend to use either no linguistic features or too many; using no linguistic features draws a lot of criticism from linguists, while using too many makes the model hard to reproduce. Thirdly, the employed reference translations are very expensive and sometimes not available in practice. In this paper, the authors propose an unsupervised MT evaluation metric that uses a universal part-of-speech tagset and does not rely on reference translations. The authors also explore the performance of the designed metric on traditional supervised evaluation tasks. Both the supervised and unsupervised experiments show that the designed methods yield higher correlation scores with human judgments.
Application of diffusion maps to identify human factors of self-reported anomalies in aviation.

PubMed

Andrzejczak, Chris; Karwowski, Waldemar; Mikusinski, Piotr

2012-01-01

A study was conducted to investigate which factors lead pilots to submit voluntary anomaly reports regarding their flight performance. Diffusion Maps (DM) were selected as the method of choice for performing dimensionality reduction on text records for this study. Diffusion Maps have seen successful use in other domains such as image classification and pattern recognition. High-dimensionality data in the form of narrative text reports from the NASA Aviation Safety Reporting System (ASRS) were clustered and categorized by way of dimensionality reduction. Supervised analyses were performed to create a baseline document clustering system. Dimensionality reduction techniques identified concepts or keywords within records, and allowed the creation of a framework for an unsupervised document classification system. The unsupervised clustering algorithm performed similarly to the supervised methods outlined in the study. The dimensionality reduction was performed on 100 of the most commonly occurring words within 126,000 text records describing commercial aviation incidents. This study demonstrates that unsupervised machine clustering and organization of incident reports is possible based on unbiased inputs. Findings from this study reinforced traditional views on what factors contribute to civil aviation anomalies; however, new associations between previously unrelated factors and conditions were also found.
Clustering and visualizing similarity networks of membrane proteins.

PubMed

Hu, Geng-Ming; Mai, Te-Lun; Chen, Chi-Ming

2015-08-01

We proposed a fast and unsupervised clustering method, minimum span clustering (MSC), for analyzing the sequence-structure-function relationship of biological networks, and demonstrated its validity in clustering the sequence/structure similarity networks (SSN) of 682 membrane protein (MP) chains. The MSC clustering of MPs based on their sequence information was found to be consistent with their tertiary structures and functions. For the largest seven clusters predicted by MSC, the consistency in chain function within the same cluster is found to be 100%. From analyzing the edge distribution of SSN for MPs, we found a characteristic threshold distance for the boundary between clusters, over which SSN of MPs could be properly clustered by an unsupervised sparsification of the network distance matrix. The clustering results of MPs from both MSC and the unsupervised sparsification methods are consistent with each other, and have high intracluster similarity and low intercluster similarity in sequence, structure, and function. Our study showed a strong sequence-structure-function relationship of MPs. We discussed evidence of convergent evolution of MPs and suggested applications in finding structural similarities and predicting biological functions of MP chains based on their sequence information. © 2015 Wiley Periodicals, Inc.
Unsupervised method for automatic construction of a disease dictionary from a large free text collection.

PubMed

Xu, Rong; Supekar, Kaustubh; Morgan, Alex; Das, Amar; Garber, Alan

2008-11-06

Concept specific lexicons (e.g. diseases, drugs, anatomy) are a critical source of background knowledge for many medical language-processing systems. However, the rapid pace of biomedical research and the lack of constraints on usage ensure that such dictionaries are incomplete. Focusing on disease terminology, we have developed an automated, unsupervised, iterative pattern learning approach for constructing a comprehensive medical dictionary of disease terms from randomized clinical trial (RCT) abstracts, and we compared different ranking methods for automatically extracting contextual patterns and concept terms. When used to identify disease concepts from 100 randomly chosen, manually annotated clinical abstracts, our disease dictionary shows significant performance improvement (F1 increased by 35-88%) over available, manually created disease terminologies.
Unsupervised Method for Automatic Construction of a Disease Dictionary from a Large Free Text Collection

PubMed Central

Xu, Rong; Supekar, Kaustubh; Morgan, Alex; Das, Amar; Garber, Alan

2008-01-01

Concept specific lexicons (e.g. diseases, drugs, anatomy) are a critical source of background knowledge for many medical language-processing systems. However, the rapid pace of biomedical research and the lack of constraints on usage ensure that such dictionaries are incomplete. Focusing on disease terminology, we have developed an automated, unsupervised, iterative pattern learning approach for constructing a comprehensive medical dictionary of disease terms from randomized clinical trial (RCT) abstracts, and we compared different ranking methods for automatically extracting contextual patterns and concept terms. When used to identify disease concepts from 100 randomly chosen, manually annotated clinical abstracts, our disease dictionary shows significant performance improvement (F1 increased by 35–88%) over available, manually created disease terminologies. PMID:18999169
Unsupervised daily routine and activity discovery in smart homes.

PubMed

Jie Yin; Qing Zhang; Karunanithi, Mohan

2015-08-01

The ability to accurately recognize daily activities of residents is a core premise of smart homes to assist with remote health monitoring. Most of the existing methods rely on a supervised model trained from a preselected and manually labeled set of activities, which are often time-consuming and costly to obtain in practice. In contrast, this paper presents an unsupervised method for discovering daily routines and activities for smart home residents. Our proposed method first uses a Markov chain to model a resident's locomotion patterns at different times of day and discover clusters of daily routines at the macro level. For each routine cluster, it then drills down to further discover room-level activities at the micro level. The automatic identification of daily routines and activities is useful for understanding indicators of functional decline of elderly people and suggesting timely interventions.
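The first step described in this abstract, modelling locomotion with a Markov chain, can be illustrated by estimating first-order transition probabilities from a sequence of room visits. This is a minimal sketch under that assumption only; the function name and the room sequence are hypothetical, and the clustering of routines that the paper builds on top of the chain is not shown.

```python
from collections import Counter, defaultdict

def transition_matrix(sequence):
    """Maximum-likelihood first-order transition probabilities.

    Returns a nested dict P where P[a][b] is the estimated probability
    of moving to room b given that the resident is currently in room a.
    """
    counts = defaultdict(Counter)
    for current, nxt in zip(sequence, sequence[1:]):
        counts[current][nxt] += 1
    return {
        a: {b: n / sum(followers.values()) for b, n in followers.items()}
        for a, followers in counts.items()
    }

# Hypothetical morning trace from room-level motion sensors.
trace = ["bedroom", "kitchen", "living", "kitchen",
         "bedroom", "kitchen", "living"]
P = transition_matrix(trace)
for room, row in P.items():
    print(room, row)
```

Estimating one such matrix per time-of-day window would give a set of vectors that could then be compared or clustered; how the paper does this is not stated in the abstract.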
FRaC: a feature-modeling approach for semi-supervised and unsupervised anomaly detection.

PubMed

Noto, Keith; Brodley, Carla; Slonim, Donna

2012-01-01

Anomaly detection involves identifying rare data instances (anomalies) that come from a different class or distribution than the majority (which are simply called "normal" instances). Given a training set of only normal data, the semi-supervised anomaly detection task is to identify anomalies in the future. Good solutions to this task have applications in fraud and intrusion detection. The unsupervised anomaly detection task is different: Given unlabeled, mostly-normal data, identify the anomalies among them. Many real-world machine learning tasks, including many fraud and intrusion detection tasks, are unsupervised because it is impractical (or impossible) to verify all of the training data. We recently presented FRaC, a new approach for semi-supervised anomaly detection. FRaC is based on using normal instances to build an ensemble of feature models, and then identifying instances that disagree with those models as anomalous. In this paper, we investigate the behavior of FRaC experimentally and explain why FRaC is so successful. We also show that FRaC is a superior approach for the unsupervised as well as the semi-supervised anomaly detection task, compared to well-known state-of-the-art anomaly detection methods, LOF and one-class support vector machines, and to an existing feature-modeling approach.
FRaC: a feature-modeling approach for semi-supervised and unsupervised anomaly detection

PubMed Central

Brodley, Carla; Slonim, Donna

2011-01-01

Anomaly detection involves identifying rare data instances (anomalies) that come from a different class or distribution than the majority (which are simply called “normal” instances). Given a training set of only normal data, the semi-supervised anomaly detection task is to identify anomalies in the future. Good solutions to this task have applications in fraud and intrusion detection. The unsupervised anomaly detection task is different: Given unlabeled, mostly-normal data, identify the anomalies among them. Many real-world machine learning tasks, including many fraud and intrusion detection tasks, are unsupervised because it is impractical (or impossible) to verify all of the training data. We recently presented FRaC, a new approach for semi-supervised anomaly detection. FRaC is based on using normal instances to build an ensemble of feature models, and then identifying instances that disagree with those models as anomalous. In this paper, we investigate the behavior of FRaC experimentally and explain why FRaC is so successful. We also show that FRaC is a superior approach for the unsupervised as well as the semi-supervised anomaly detection task, compared to well-known state-of-the-art anomaly detection methods, LOF and one-class support vector machines, and to an existing feature-modeling approach. PMID:22639542
ARK: Aggregation of Reads by K-Means for Estimation of Bacterial Community Composition.

PubMed

Koslicki, David; Chatterjee, Saikat; Shahrivar, Damon; Walker, Alan W; Francis, Suzanna C; Fraser, Louise J; Vehkaperä, Mikko; Lan, Yueheng; Corander, Jukka

2015-01-01

Estimation of bacterial community composition from high-throughput sequenced 16S rRNA gene amplicons is a key task in microbial ecology. Since the sequence data from each sample typically consist of a large number of reads and are adversely impacted by different levels of biological and technical noise, accurate analysis of such large datasets is challenging. There has been a recent surge of interest in using compressed sensing inspired and convex-optimization based methods to solve the estimation problem for bacterial community composition. These methods typically rely on summarizing the sequence data by frequencies of low-order k-mers and matching this information statistically with a taxonomically structured database. Here we show that the accuracy of the resulting community composition estimates can be substantially improved by aggregating the reads from a sample with an unsupervised machine learning approach prior to the estimation phase. The aggregation of reads is a pre-processing approach where we use a standard K-means clustering algorithm that partitions a large set of reads into subsets with reasonable computational cost to provide several vectors of first order statistics instead of only single statistical summarization in terms of k-mer frequencies. The output of the clustering is then processed further to obtain the final estimate for each sample. The resulting method is called Aggregation of Reads by K-means (ARK), and it is based on a statistical argument via mixture density formulation. ARK is found to improve the fidelity and robustness of several recently introduced methods, with only a modest increase in computational complexity. An open source, platform-independent implementation of the method in the Julia programming language is freely available at https://github.com/dkoslicki/ARK. A Matlab implementation is available at http://www.ee.kth.se/ctsoftware.
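The idea of replacing a single k-mer frequency summary with several per-cluster summaries can be sketched as below. The sketch assumes cluster labels are already available (in ARK they come from K-means) and shows only the k-mer counting and the per-cluster aggregation; the function names, the choice k = 2, and the toy reads are hypothetical and do not reproduce the published Julia or Matlab implementations.

```python
from itertools import product

def kmer_frequencies(read, k=2, alphabet="ACGT"):
    """Relative frequency of every k-mer over the alphabet, in lexicographic order."""
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(read) - k + 1):
        window = read[i:i + k]
        if window in counts:      # skip windows with ambiguous bases such as N
            counts[window] += 1
            total += 1
    return [counts[w] / total if total else 0.0 for w in kmers]

def cluster_summaries(vectors, labels):
    """Per-cluster (weight, mean vector): weight is the fraction of reads in the cluster."""
    groups = {}
    for vec, lab in zip(vectors, labels):
        groups.setdefault(lab, []).append(vec)
    n = len(vectors)
    return {
        lab: (len(vecs) / n, [sum(col) / len(vecs) for col in zip(*vecs)])
        for lab, vecs in groups.items()
    }

# Toy reads (hypothetical) with labels assumed to come from a clustering step.
reads = ["ACGT", "ACGT", "TTTT"]
labels = [0, 0, 1]
vectors = [kmer_frequencies(r) for r in reads]
summaries = cluster_summaries(vectors, labels)
print(summaries[0][0], summaries[1][0])
```

Each cluster's mean vector can then be passed to the downstream composition estimator and the per-cluster estimates combined using the cluster weights, which is the role the abstract assigns to the aggregation step.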
An Improved Unsupervised Image Segmentation Evaluation Approach Based on Under- and Over-Segmentation Aware

NASA Astrophysics Data System (ADS)

Su, Tengfei

2018-04-01

In this paper, an unsupervised evaluation scheme for remote sensing image segmentation is developed. Based on a method called under- and over-segmentation aware (UOA), the new approach overcomes a defect in the estimation of over-segmentation error (OSE). Two cases of this error-prone defect are listed, and edge strength is employed to devise a solution to this issue. Two subsets of high resolution remote sensing images were used to test the proposed algorithm, and the experimental results indicate its superior performance, which is attributed to its improved OSE detection model.
A SOFTWARE PACKAGE FOR UNSUPERVISED PATTERN RECOGNITION AND SYNOPTIC REPRESENTATION OF RESULTS: APPLICATION TO VOLCANIC TREMOR DATA OF MT ETNA

NASA Astrophysics Data System (ADS)

Langer, H. K.; Falsaperla, S. M.; Behncke, B.; Messina, A.; Spampinato, S.

2009-12-01

Artificial Intelligence (AI) has found broad applications in volcano observatories worldwide with the aim of reducing volcanic hazard. The need to process ever larger quantities of data indeed makes AI techniques appealing for monitoring purposes. Tools based on Artificial Neural Networks and Support Vector Machines have proved to be particularly successful in the classification of seismic events and volcanic tremor changes heralding eruptive activity, such as paroxysmal explosions and lava fountaining at Stromboli and Mt Etna, Italy (e.g., Falsaperla et al., 1996; Langer et al., 2009). Moving on from the excellent results obtained from these applications, we present KKAnalysis, MATLAB-based software that combines several unsupervised pattern classification methods, exploiting routines of the SOM Toolbox 2 for MATLAB (http://www.cis.hut.fi/projects/somtoolbox). KKAnalysis is based on Self Organizing Maps (SOM) and clustering methods consisting of K-Means, Fuzzy C-Means, and a scheme based on a metric accounting for correlation between components of the feature vector. We show examples of applications of this tool to volcanic tremor data recorded at Mt Etna between 2007 and 2009. This time span - during which Strombolian explosions, 7 episodes of lava fountaining and effusive activity occurred - is particularly interesting, as it encompassed different states of volcanic activity (i.e., non-eruptive, eruptive according to different styles) for the unsupervised classifier to identify, highlighting their development in time. Even subtle changes in the signal characteristics allow the unsupervised classifier to recognize features belonging to the different classes and stages of volcanic activity. A convenient color-coded representation shows the temporal development of the different classes of signal, making this method extremely helpful for monitoring purposes and surveillance.
Although developed for volcanic tremor classification, KKAnalysis is generally applicable to any type of physical or chemical pattern, provided that feature vectors are given in numerical form. References: Falsaperla, S., S. Graziani, G. Nunnari, and S. Spampinato (1996). Automatic classification of volcanic earthquakes by using multi-layered neural networks. Natural Hazards, 13, 205-228. Langer, H., S. Falsaperla, M. Masotti, R. Campanini, S. Spampinato, and A. Messina (2008). Synopsis of supervised and unsupervised pattern classification techniques applied to volcanic tremor data at Mt Etna, Italy. Geophys. J. Int., doi:10.1111/j.1365-246X.2009.04179.x.
Integrative analysis of gene expression and DNA methylation using unsupervised feature extraction for detecting candidate cancer biomarkers.

PubMed

Moon, Myungjin; Nakai, Kenta

2018-04-01

Currently, cancer biomarker discovery is one of the important research topics worldwide. In particular, detecting significant genes related to cancer is an important task for early diagnosis and treatment of cancer. Conventional studies mostly focus on genes that are differentially expressed in different states of cancer; however, noise in gene expression datasets and insufficient information in limited datasets impede precise analysis of novel candidate biomarkers. In this study, we propose an integrative analysis of gene expression and DNA methylation using normalization and unsupervised feature extractions to identify candidate biomarkers of cancer using renal cell carcinoma RNA-seq datasets. Gene expression and DNA methylation datasets are normalized by Box-Cox transformation and integrated into a one-dimensional dataset that retains the major characteristics of the original datasets by unsupervised feature extraction methods, and differentially expressed genes are selected from the integrated dataset. Use of the integrated dataset demonstrated improved performance as compared with conventional approaches that utilize gene expression or DNA methylation datasets alone. Validation based on the literature showed that a considerable number of top-ranked genes from the integrated dataset have known relationships with cancer, implying that novel candidate biomarkers can also be acquired from the proposed analysis method. Furthermore, we expect that the proposed method can be expanded for applications involving various types of multi-omics datasets.
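The Box-Cox normalization step named in this abstract follows a standard formula: (x^λ - 1)/λ for λ ≠ 0 and ln(x) for λ = 0, defined for strictly positive x. The sketch below applies it for a given λ; the function name and sample values are hypothetical, and the abstract does not say how λ was chosen (it is commonly fitted by maximum likelihood), so λ is left as an explicit argument here.

```python
import math

def box_cox(values, lam):
    """Box-Cox power transform of strictly positive values for a fixed lambda."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(v) for v in values]          # limiting case lambda -> 0
    return [(v ** lam - 1.0) / lam for v in values]

# Hypothetical right-skewed expression values.
expression = [1.0, 4.0, 9.0, 100.0]
print(box_cox(expression, 0.5))   # square-root-like compression
print(box_cox(expression, 0))     # natural log
```

Values that contain zeros, as raw counts often do, would need an offset before the transform; whether and how the authors handled this is not stated in the abstract.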
Automated unsupervised multi-parametric classification of adipose tissue depots in skeletal muscle

PubMed Central

Valentinitsch, Alexander; Karampinos, Dimitrios C.; Alizai, Hamza; Subburaj, Karupppasamy; Kumar, Deepak; Link, Thomas M.; Majumdar, Sharmila

2012-01-01

Purpose To introduce and validate an automated unsupervised multi-parametric method for segmentation of the subcutaneous fat and muscle regions in order to determine subcutaneous adipose tissue (SAT) and intermuscular adipose tissue (IMAT) areas based on data from a quantitative chemical shift-based water-fat separation approach. Materials and Methods Unsupervised standard k-means clustering was employed to define sets of similar features (k = 2) within the whole multi-modal image after the water-fat separation. The automated image processing chain was composed of three primary stages including tissue, muscle and bone region segmentation. The algorithm was applied on calf and thigh datasets to compute SAT and IMAT areas and was compared to a manual segmentation. Results The IMAT area using the automatic segmentation had excellent agreement with the IMAT area using the manual segmentation for all the cases in the thigh (R2: 0.96) and for cases with up to moderate IMAT area in the calf (R2: 0.92). The group with the highest grade of muscle fat infiltration in the calf had the highest error in the inner SAT contour calculation. Conclusion The proposed multi-parametric segmentation approach combined with quantitative water-fat imaging provides an accurate and reliable method for an automated calculation of the SAT and IMAT areas reducing considerably the total post-processing time. PMID:23097409
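The k-means step (k = 2) mentioned in the Materials and Methods can be sketched as below. This is an illustrative Lloyd-style implementation under stated assumptions, not the authors' processing chain: the two-component feature vectors (imagined here as water and fat signal fractions per voxel), the initialisation from the first k points, and the function name are all hypothetical.

```python
import math

def kmeans(points, k=2, max_iters=100):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    # Assumption: deterministic initialisation from the first k points.
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_iters):
        # Assignment step: nearest centroid by Euclidean distance.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids

# Hypothetical (water, fat) feature vectors for six voxels.
voxels = [(0.9, 0.1), (0.8, 0.2), (0.1, 0.9), (0.2, 0.8), (0.85, 0.15), (0.15, 0.95)]
labels, centroids = kmeans(voxels, k=2)
```

With k = 2 the two clusters separate water-dominant from fat-dominant voxels in this toy data; which cluster receives label 0 depends on the initialisation.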

Incrementally learning objects by touch: online discriminative and generative models for tactile-based recognition.

PubMed

Soh, Harold; Demiris, Yiannis

2014-01-01

Human beings not only possess the remarkable ability to distinguish objects through tactile feedback but are further able to improve upon recognition competence through experience. In this work, we explore tactile-based object recognition with learners capable of incremental learning. Using the sparse online infinite Echo-State Gaussian process (OIESGP), we propose and compare two novel discriminative and generative tactile learners that produce probability distributions over objects during object grasping/palpation. To enable iterative improvement, our online methods incorporate training samples as they become available. We also describe incremental unsupervised learning mechanisms, based on novelty scores and extreme value theory, when teacher labels are not available. We present experimental results for both supervised and unsupervised learning tasks using the iCub humanoid, with tactile sensors on its five-fingered anthropomorphic hand, and 10 different object classes. Our classifiers perform comparably to state-of-the-art methods (C4.5 and SVM classifiers) and findings indicate that tactile signals are highly relevant for making accurate object classifications. We also show that accurate "early" classifications are possible using only 20-30 percent of the grasp sequence. For unsupervised learning, our methods generate high quality clusterings relative to the widely-used sequential k-means and self-organising map (SOM), and we present analyses into the differences between the approaches.
Unsupervised Transfer Learning via Multi-Scale Convolutional Sparse Coding for Biomedical Applications

PubMed Central

Chang, Hang; Han, Ju; Zhong, Cheng; Snijders, Antoine M.; Mao, Jian-Hua

2017-01-01

The capabilities of (I) learning transferable knowledge across domains; and (II) fine-tuning the pre-learned base knowledge towards tasks with considerably smaller data scale are extremely important. Many of the existing transfer learning techniques are supervised approaches, among which deep learning has the demonstrated power of learning domain transferrable knowledge with large scale network trained on massive amounts of labeled data. However, in many biomedical tasks, both the data and the corresponding label can be very limited, where the unsupervised transfer learning capability is urgently needed. In this paper, we proposed a novel multi-scale convolutional sparse coding (MSCSC) method, that (I) automatically learns filter banks at different scales in a joint fashion with enforced scale-specificity of learned patterns; and (II) provides an unsupervised solution for learning transferable base knowledge and fine-tuning it towards target tasks. Extensive experimental evaluation of MSCSC demonstrates the effectiveness of the proposed MSCSC in both regular and transfer learning tasks in various biomedical domains. PMID:28129148
An unsupervised method for summarizing egocentric sport videos

NASA Astrophysics Data System (ADS)

Habibi Aghdam, Hamed; Jahani Heravi, Elnaz; Puig, Domenec

2015-12-01

People are increasingly interested in recording their sport activities using head-worn or hand-held cameras. Such videos, called egocentric sport videos, have different motion and appearance patterns from life-logging videos. While a life-logging video can be defined in terms of well-defined human-object interactions, it is not trivial to describe egocentric sport videos using well-defined activities. For this reason, summarizing egocentric sport videos based on human-object interaction might fail to produce meaningful results. In this paper, we propose an unsupervised method for summarizing egocentric videos by identifying the key-frames of the video. Our method utilizes both appearance and motion information, and it automatically finds the number of key-frames. Our blind user study on the new dataset collected from YouTube shows that in 93.5% of cases, the users choose the proposed method as their first video summary choice. In addition, our method is within the top 2 choices of the users in 99% of studies.
Unsupervised learning toward brain imaging data analysis: cigarette craving and resistance related neuronal activations from functional magnetic resonance imaging data analysis

NASA Astrophysics Data System (ADS)

Kim, Dong-Youl; Lee, Jong-Hwan

2014-05-01

Data-driven unsupervised learning such as independent component analysis has been gainfully applied to blood-oxygenation-level-dependent (BOLD) functional magnetic resonance imaging (fMRI) data, compared to a model-based general linear model (GLM). This is due to the ability of this unsupervised learning method to extract meaningful neuronal activity from the BOLD signal, which is a mixture of neuronal signals and confounding non-neuronal artifacts such as head motion and physiological artifacts. In this study, we support this claim by identifying neuronal underpinnings of cigarette craving and cigarette resistance. The fMRI data were acquired from heavy cigarette smokers (n = 14) while they alternately watched images with and without cigarette smoking. During acquisition of two fMRI runs, they were asked to crave when they watched cigarette smoking images or to resist the urge to smoke. Data-driven approaches, namely a group independent component analysis (GICA) method based on temporal concatenation (TC-GICA) and TC-GICA with an extension of iterative dual-regression (TC-GICA-iDR), were applied to the data. From the results, cigarette craving and cigarette resistance related neuronal activations were identified in the visual area and superior frontal areas, respectively, with greater statistical significance from the TC-GICA-iDR method than from the TC-GICA method. On the other hand, the neuronal activity levels in many of these regions were not statistically different between cigarette craving and cigarette resistance with the GLM method, due to potentially aberrant BOLD signals.
Towards an unsupervised device for the diagnosis of childhood pneumonia in low resource settings: automatic segmentation of respiratory sounds.

PubMed

Sola, J; Braun, F; Muntane, E; Verjus, C; Bertschi, M; Hugon, F; Manzano, S; Benissa, M; Gervaix, A

2016-08-01

Pneumonia remains the leading cause of mortality worldwide in children under the age of five, with 1.4 million deaths every year. Unfortunately, in low resource settings, very limited diagnostic support aids are provided to point-of-care practitioners. The current UNICEF/WHO case management algorithm relies on the use of a chronometer to manually count breath rates in pediatric patients: there is thus a major need for more sophisticated tools to diagnose pneumonia that increase the sensitivity and specificity of breath-rate-based algorithms. These tools should be low cost and adapted to practitioners with limited training. In this work, a novel concept of an unsupervised tool for the diagnosis of childhood pneumonia is presented. The concept relies on the automated analysis of respiratory sounds as recorded by a point-of-care electronic stethoscope. By identifying the presence of auscultation sounds at different chest locations, this diagnostic tool is intended to estimate a pneumonia likelihood score. After presenting the overall architecture of an algorithm to estimate pneumonia scores, the importance of a robust unsupervised method to identify inspiratory and expiratory phases of a respiratory cycle is highlighted. Based on data from an on-going study involving pediatric pneumonia patients, a first algorithm to segment respiratory sounds is suggested. The unsupervised algorithm relies on a Mel-frequency filter bank, a two-step Gaussian Mixture Model (GMM) description of data, and a final Hidden Markov Model (HMM) interpretation of inspiratory-expiratory sequences. Finally, illustrative results on the first recruited patients are provided. The presented algorithm opens the doors to a new family of unsupervised respiratory sound analyzers that could improve future versions of case management algorithms for the diagnosis of pneumonia in low resource settings.
Unsupervised Online Classifier in Sleep Scoring for Sleep Deprivation Studies

PubMed Central

Libourel, Paul-Antoine; Corneyllie, Alexandra; Luppi, Pierre-Hervé; Chouvet, Guy; Gervasoni, Damien

2015-01-01

Study Objective: This study was designed to evaluate an unsupervised adaptive algorithm for real-time detection of sleep and wake states in rodents. Design: We designed a Bayesian classifier that automatically extracts electroencephalogram (EEG) and electromyogram (EMG) features and categorizes non-overlapping 5-s epochs into one of the three major sleep and wake states without any human supervision. This sleep-scoring algorithm is coupled online with a new device to perform selective paradoxical sleep deprivation (PSD). Settings: Controlled laboratory settings for chronic polygraphic sleep recordings and selective PSD. Participants: Ten adult Sprague-Dawley rats instrumented for chronic polysomnographic recordings. Measurements: The performance of the algorithm is evaluated by comparison with the score obtained by a human expert reader. Online detection of PS is then validated with a PSD protocol with a duration of 72 hours. Results: Our algorithm gave a high concordance with human scoring with an average κ coefficient > 70%. Notably, the specificity to detect PS reached 92%. Selective PSD using real-time detection of PS strongly reduced PS amounts, leaving only brief PS bouts necessary for the detection of PS in EEG and EMG signals (4.7 ± 0.7% over 72 h, versus 8.9 ± 0.5% in baseline), and was followed by a significant PS rebound (23.3 ± 3.3% over 150 minutes). Conclusions: Our fully unsupervised data-driven algorithm overcomes some limitations of the other automated methods such as the selection of representative descriptors or threshold settings. When used online and coupled with our sleep deprivation device, it represents a better option for selective PSD than other methods like the tedious gentle handling or the platform method. Citation: Libourel PA, Corneyllie A, Luppi PH, Chouvet G, Gervasoni D. Unsupervised online classifier in sleep scoring for sleep deprivation studies. SLEEP 2015;38(5):815–828. PMID:25325478
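The κ coefficient used above to compare algorithm and human scoring is commonly Cohen's kappa; the abstract does not say which variant was used, so the sketch below is an assumption. The function name and the state labels ("W", "S", "P") are placeholders, not taken from the study.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: chance-corrected agreement between two scorers."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of epochs given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if the two scorers labelled independently.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[s] * count_b[s] for s in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both scorers used one identical label throughout
    return (observed - expected) / (1.0 - expected)

# Hypothetical epoch-by-epoch scores from an algorithm and a human reader.
algorithm = ["W", "W", "S", "S"]
human = ["W", "S", "S", "S"]
kappa = cohen_kappa(algorithm, human)
```

Kappa discounts the agreement that two scorers would reach by chance, which matters when one state dominates the recording.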
A novel unsupervised spike sorting algorithm for intracranial EEG.

PubMed

Yadav, R; Shah, A K; Loeb, J A; Swamy, M N S; Agarwal, R

2011-01-01

This paper presents a novel, unsupervised spike classification algorithm for intracranial EEG. The method combines template matching and principal component analysis (PCA) for building a dynamic patient-specific codebook without a priori knowledge of the spike waveforms. The problem of misclassification due to overlapping classes is resolved by identifying similar classes in the codebook using hierarchical clustering. Cluster quality is visually assessed by projecting inter- and intra-clusters onto a 3D plot. Intracranial EEG from 5 patients was utilized to optimize the algorithm. The resulting codebook retains 82.1% of the detected spikes in non-overlapping and disjoint clusters. Initial results suggest a definite role of this method for both rapid review and quantitation of interictal spikes that could enhance both clinical treatment and research studies on epileptic patients.
Unsupervised segmentation of lung fields in chest radiographs using multiresolution fractal feature vector and deformable models.

PubMed

Lee, Wen-Li; Chang, Koyin; Hsieh, Kai-Sheng

2016-09-01

Segmenting lung fields in a chest radiograph is essential for automatically analyzing an image. We present an unsupervised method based on multiresolution fractal feature vector. The feature vector characterizes the lung field region effectively. A fuzzy c-means clustering algorithm is then applied to obtain a satisfactory initial contour. The final contour is obtained by deformable models. The results show the feasibility and high performance of the proposed method. Furthermore, based on the segmentation of lung fields, the cardiothoracic ratio (CTR) can be measured. The CTR is a simple index for evaluating cardiac hypertrophy. After identifying a suspicious symptom based on the estimated CTR, a physician can suggest that the patient undergoes additional extensive tests before a treatment plan is finalized.
Unsupervised classification of operator workload from brain signals.

PubMed

Schultze-Kraft, Matthias; Dähne, Sven; Gugler, Manfred; Curio, Gabriel; Blankertz, Benjamin

2016-06-01

In this study we aimed for the classification of operator workload as it is expected in many real-life workplace environments. We explored brain-signal based workload predictors that differ with respect to the level of label information required for training, including entirely unsupervised approaches. Subjects executed a task on a touch screen that required continuous effort of visual and motor processing with alternating difficulty. We first employed classical approaches for workload state classification that operate on the sensor space of EEG and compared those to the performance of three state-of-the-art spatial filtering methods: common spatial patterns (CSPs) analysis, which requires binary label information; source power co-modulation (SPoC) analysis, which uses the subjects' error rate as a target function; and canonical SPoC (cSPoC) analysis, which solely makes use of cross-frequency power correlations induced by different states of workload and thus represents an unsupervised approach. Finally, we investigated the effects of fusing brain signals and peripheral physiological measures (PPMs) and examined the added value for improving classification performance. Mean classification accuracies of 94%, 92% and 82% were achieved with CSP, SPoC, cSPoC, respectively. These methods outperformed the approaches that did not use spatial filtering and they extracted physiologically plausible components. The performance of the unsupervised cSPoC is significantly increased by augmenting it with PPM features. Our analyses ensured that the signal sources used for classification were of cortical origin and not contaminated with artifacts. Our findings show that workload states can be successfully differentiated from brain signals, even when less and less information from the experimental paradigm is used, thus paving the way for real-world applications in which label information may be noisy or entirely unavailable.
Unsupervised classification of operator workload from brain signals

NASA Astrophysics Data System (ADS)

Schultze-Kraft, Matthias; Dähne, Sven; Gugler, Manfred; Curio, Gabriel; Blankertz, Benjamin

2016-06-01

Objective. In this study we aimed for the classification of operator workload as it is expected in many real-life workplace environments. We explored brain-signal based workload predictors that differ with respect to the level of label information required for training, including entirely unsupervised approaches. Approach. Subjects executed a task on a touch screen that required continuous effort of visual and motor processing with alternating difficulty. We first employed classical approaches for workload state classification that operate on the sensor space of EEG and compared those to the performance of three state-of-the-art spatial filtering methods: common spatial patterns (CSPs) analysis, which requires binary label information; source power co-modulation (SPoC) analysis, which uses the subjects’ error rate as a target function; and canonical SPoC (cSPoC) analysis, which solely makes use of cross-frequency power correlations induced by different states of workload and thus represents an unsupervised approach. Finally, we investigated the effects of fusing brain signals and peripheral physiological measures (PPMs) and examined the added value for improving classification performance. Main results. Mean classification accuracies of 94%, 92% and 82% were achieved with CSP, SPoC, cSPoC, respectively. These methods outperformed the approaches that did not use spatial filtering and they extracted physiologically plausible components. The performance of the unsupervised cSPoC is significantly increased by augmenting it with PPM features. Significance. Our analyses ensured that the signal sources used for classification were of cortical origin and not contaminated with artifacts. Our findings show that workload states can be successfully differentiated from brain signals, even when less and less information from the experimental paradigm is used, thus paving the way for real-world applications in which label information may be noisy or entirely unavailable.
Active Learning with Rationales for Identifying Operationally Significant Anomalies in Aviation

NASA Technical Reports Server (NTRS)

Sharma, Manali; Das, Kamalika; Bilgic, Mustafa; Matthews, Bryan; Nielsen, David Lynn; Oza, Nikunj C.

2016-01-01

A major focus of the commercial aviation community is discovery of unknown safety events in flight operations data. Data-driven unsupervised anomaly detection methods are better at capturing unknown safety events compared to rule-based methods which only look for known violations. However, not all statistical anomalies that are discovered by these unsupervised anomaly detection methods are operationally significant (e.g., represent a safety concern). Subject Matter Experts (SMEs) have to spend significant time reviewing these statistical anomalies individually to identify a few operationally significant ones. In this paper we propose an active learning algorithm that incorporates SME feedback in the form of rationales to build a classifier that can distinguish between uninteresting and operationally significant anomalies. Experimental evaluation on real aviation data shows that our approach improves detection of operationally significant events by as much as 75% compared to the state-of-the-art. The learnt classifier also generalizes well to additional validation data sets.
Unsupervised laparoscopic appendicectomy by surgical trainees is safe and time-effective.

PubMed

Wong, Kenneth; Duncan, Tristram; Pearson, Andrew

2007-07-01

Open appendicectomy is the traditional standard treatment for appendicitis. Laparoscopic appendicectomy is perceived as a procedure with greater potential for complications and longer operative times. This paper examines the hypothesis that unsupervised laparoscopic appendicectomy by surgical trainees is a safe and time-effective valid alternative. Medical records, operating theatre records and histopathology reports of all patients undergoing laparoscopic and open appendicectomy over a 15-month period in two hospitals within an area health service were retrospectively reviewed. Data were analysed to compare patient features, pathology findings, operative times, complications, readmissions and mortality between laparoscopic and open groups and between unsupervised surgical trainee operators versus consultant surgeon operators. A total of 143 laparoscopic and 222 open appendicectomies were reviewed. Unsupervised trainees performed 64% of the laparoscopic appendicectomies and 55% of the open appendicectomies. There were no significant differences in complication rates, readmissions, mortality and length of stay between laparoscopic and open appendicectomy groups or between trainee and consultant surgeon operators. Conversion rates (laparoscopic to open approach) were similar for trainees and consultants. Unsupervised senior surgical trainees did not take significantly longer to perform laparoscopic appendicectomy when compared to unsupervised trainee-performed open appendicectomy. Unsupervised laparoscopic appendicectomy by surgical trainees is safe and time-effective.
Partitioning Strategy Using Static Analysis Techniques

NASA Astrophysics Data System (ADS)

Seo, Yongjin; Soo Kim, Hyeon

2016-08-01

Flight software is software used in satellites' on-board computers. It has requirements such as real time and reliability. The IMA architecture is used to satisfy these requirements. The IMA architecture has the concept of partitions and this affected the configuration of flight software. That is, situations occurred in which software that had been loaded on one system was divided into many partitions when being loaded. For new issues, existing studies use experience based partitioning methods. However, these methods have a problem that they cannot be reused. In this respect, this paper proposes a partitioning method that is reusable and consistent.
Brain Network Regional Synchrony Analysis in Deafness

PubMed Central

Xu, Lei; Liang, Mao-Jin

2018-01-01

Deafness, the most common auditory disease, has greatly affected people for a long time. The major treatment for deafness is cochlear implantation (CI). However, till today, there is still a lack of objective and precise indicator serving as evaluation of the effectiveness of the cochlear implantation. The goal of this EEG-based study is to effectively distinguish CI children from those prelingual deafened children without cochlear implantation. The proposed method is based on the functional connectivity analysis, which focuses on the brain network regional synchrony. Specifically, we compute the functional connectivity between each channel pair first. Then, we quantify the brain network synchrony among regions of interests (ROIs), where both intraregional synchrony and interregional synchrony are computed. And finally the synchrony values are concatenated to form the feature vector for the SVM classifier. What is more, we develop a new ROI partition method of 128-channel EEG recording system. That is, both the existing ROI partition method and the proposed ROI partition method are used in the experiments. Compared with the existing EEG signal classification methods, our proposed method has achieved significant improvements as large as 87.20% and 86.30% when the existing ROI partition method and the proposed ROI partition method are used, respectively. It further demonstrates that the new ROI partition method is comparable to the existing ROI partition method. PMID:29854776
A Novel Coarsening Method for Scalable and Efficient Mesh Generation

DOE Office of Scientific and Technical Information (OSTI.GOV)

Yoo, A; Hysom, D; Gunney, B

2010-12-02

In this paper, we propose a novel mesh coarsening method called the brick coarsening method. The proposed method can be used in conjunction with any graph partitioner and scales to very large meshes. This method reduces the problem space by decomposing the original mesh into fixed-size blocks of nodes called bricks, layered in a similar way to conventional brick laying, and then assigning each node of the original mesh to the appropriate brick. Our experiments indicate that the proposed method scales to very large meshes while allowing a simple RCB partitioner to produce higher-quality partitions with significantly fewer edge cuts. Our results further indicate that the proposed brick-coarsening method allows more complicated partitioners like PT-Scotch to scale to very large problem sizes while still maintaining good partitioning performance with a relatively good edge-cut metric. Graph partitioning is an important problem that has many scientific and engineering applications in such areas as VLSI design, scientific computing, and resource management. Given a graph G = (V,E), where V is the set of vertices and E is the set of edges, the (k-way) graph partitioning problem is to partition the vertices of the graph (V) into k disjoint groups such that each group contains a roughly equal number of vertices and the number of edges connecting vertices in different groups is minimized. Graph partitioning plays a key role in large scientific computing, especially in mesh-based computations, as it is used as a tool to minimize the volume of communication and to ensure well-balanced load across computing nodes. The impact of graph partitioning on the reduction of communication can be easily seen, for example, in different iterative methods to solve a sparse system of linear equations. 
Here, a graph partitioning technique is applied to the matrix, which is basically a graph in which each edge is a non-zero entry in the matrix, to allocate groups of vertices to processors in such a way that many of matrix-vector multiplication can be performed locally on each processor and hence to minimize communication. Furthermore, a good graph partitioning scheme ensures the equal amount of computation performed on each processor. Graph partitioning is a well known NP-complete problem, and thus the most commonly used graph partitioning algorithms employ some forms of heuristics. These algorithms vary in terms of their complexity, partition generation time, and the quality of partitions, and they tend to trade off these factors. A significant challenge we are currently facing at the Lawrence Livermore National Laboratory is how to partition very large meshes on massive-size distributed memory machines like IBM BlueGene/P, where scalability becomes a big issue. For example, we have found that the ParMetis, a very popular graph partitioning tool, can only scale to 16K processors. An ideal graph partitioning method on such an environment should be fast and scale to very large meshes, while producing high quality partitions. This is an extremely challenging task, as to scale to that level, the partitioning algorithm should be simple and be able to produce partitions that minimize inter-processor communications and balance the load imposed on the processors. Our goals in this work are two-fold: (1) To develop a new scalable graph partitioning method with good load balancing and communication reduction capability. (2) To study the performance of the proposed partitioning method on very large parallel machines using actual data sets and compare the performance to that of existing methods. The proposed method achieves the desired scalability by reducing the mesh size. 
For this, it coarsens an input mesh into a smaller mesh by coalescing the vertices and edges of the original mesh into a set of mega-vertices and mega-edges. A new coarsening method called the brick algorithm is developed in this research. In the brick algorithm, the zones in a given mesh are first grouped into fixed-size blocks called bricks. These bricks are then laid in a way similar to the conventional brick-laying technique, which reduces the number of neighboring blocks each block needs to communicate with. Contributions of this research are as follows: (1) We have developed a novel method that scales to a really large problem size while producing high quality mesh partitions; (2) We measured the performance and scalability of the proposed method on a machine of massive size using a set of actual large complex data sets, where we have scaled to a mesh with 110 million zones using our method. To the best of our knowledge, this is the largest complex mesh that a partitioning method has been successfully applied to; and (3) We have shown that the proposed method can reduce the number of edge cuts by as much as 65%.
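The two quantities that the k-way graph partitioning problem trades off, edge cut and load balance, can be made concrete with a small sketch. It only evaluates a given partition; it does not implement the brick coarsening method or any partitioner, and the function names and the toy graph are hypothetical.

```python
from collections import Counter

def edge_cut(edges, part):
    """Number of edges whose endpoints lie in different groups."""
    return sum(1 for u, v in edges if part[u] != part[v])

def imbalance(part, k):
    """Largest group size divided by the ideal size |V| / k (1.0 is perfect)."""
    sizes = Counter(part.values())
    ideal = len(part) / k
    return max(sizes.get(group, 0) for group in range(k)) / ideal

# Toy graph: a 4-cycle 0-1-2-3-0, split into k = 2 groups in two ways.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
contiguous = {0: 0, 1: 0, 2: 1, 3: 1}   # neighbours kept together
alternating = {0: 0, 1: 1, 2: 0, 3: 1}  # every edge crosses groups

cut_contiguous = edge_cut(edges, contiguous)
cut_alternating = edge_cut(edges, alternating)
balance = imbalance(contiguous, 2)
```

Both partitions are perfectly balanced, but the contiguous one cuts half as many edges, which corresponds to less inter-processor communication in a mesh-based computation.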
Using preoperative unsupervised cluster analysis of chronic rhinosinusitis to inform patient decision and endoscopic sinus surgery outcome.

PubMed

Adnane, Choaib; Adouly, Taoufik; Khallouk, Amine; Rouadi, Sami; Abada, Redallah; Roubal, Mohamed; Mahtar, Mohamed

2017-02-01

The purpose of this study is to use unsupervised cluster methodology to identify phenotype and mucosal eosinophilia endotype subgroups of patients with medical refractory chronic rhinosinusitis (CRS), and evaluate the difference in quality of life (QOL) outcomes after endoscopic sinus surgery (ESS) between these clusters for better surgical case selection. A prospective cohort study included 131 patients with medical refractory CRS who elected ESS. The Sino-Nasal Outcome Test (SNOT-22) was used to evaluate QOL before and 12 months after surgery. Unsupervised two-step clustering method was performed. One hundred and thirteen subjects were retained in this study: 46 patients with CRS without nasal polyps and 67 patients with nasal polyps. Nasal polyps, gender, mucosal eosinophilia profile, and prior sinus surgery were the most discriminating factors in the generated clusters. Three clusters were identified. A significant clinical improvement was observed in all clusters 12 months after surgery with a reduction of SNOT-22 scores. There was a significant difference in QOL outcomes between clusters; cluster 1 had the worst QOL improvement after FESS in comparison with the other clusters 2 and 3. All patients in cluster 1 presented CRSwNP with the highest mucosal eosinophilia endotype. Clustering method is able to classify CRS phenotypes and endotypes with different associated surgical outcomes.
Unsupervised Categorization in a Sample of Children with Autism Spectrum Disorders

ERIC Educational Resources Information Center

Edwards, Darren J.; Perlman, Amotz; Reed, Phil

2012-01-01

Studies of supervised categorization have demonstrated limited categorization performance in participants with autism spectrum disorders (ASD); however, little research has been conducted regarding unsupervised categorization in this population. This study explored unsupervised categorization using two stimulus sets that differed in their…
Unsupervised parameter optimization for automated retention time alignment of severely shifted gas chromatographic data using the piecewise alignment algorithm.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Pierce, Karisa M.; Wright, Bob W.; Synovec, Robert E.

2007-02-02

First, simulated chromatographic separations with declining retention time precision were used to study the performance of the piecewise retention time alignment algorithm and to demonstrate an unsupervised parameter optimization method. The average correlation coefficient between the first chromatogram and every other chromatogram in the data set was used to optimize the alignment parameters. This correlation method does not require a training set, so it is unsupervised and automated. This frees the user from needing to provide class information and makes the alignment algorithm more generally applicable to classifying completely unknown data sets. For a data set of simulated chromatograms where the average chromatographic peak was shifted past two neighboring peaks between runs, the average correlation coefficient of the raw data was 0.46 ± 0.25. After automated, optimized piecewise alignment, the average correlation coefficient was 0.93 ± 0.02. Additionally, a relative shift metric and principal component analysis (PCA) were used to independently quantify and categorize the alignment performance, respectively. The relative shift metric was defined as four times the standard deviation of a given peak’s retention time in all of the chromatograms, divided by the peak-width-at-base. The raw simulated data sets that were studied contained peaks with average relative shifts ranging between 0.3 and 3.0. Second, a “real” data set of gasoline separations was gathered using three different GC methods to induce severe retention time shifting. In these gasoline separations, retention time precision improved ~8 fold following alignment. Finally, piecewise alignment and the unsupervised correlation optimization method were applied to severely shifted GC separations of reformate distillation fractions. The effect of piecewise alignment on peak heights and peak areas is also reported. 
Piecewise alignment either did not change the peak height, or caused it to slightly decrease. The average relative difference in peak height after piecewise alignment was –0.20%. Piecewise alignment caused the peak areas to either stay the same, slightly increase, or slightly decrease. The average absolute relative difference in area after piecewise alignment was 0.15%.« less
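The two quantities this abstract defines in words can be sketched in a few lines. The following is only an illustration, not the authors' code: the function names are hypothetical, Pearson correlation is assumed for the "correlation coefficient", and the sample standard deviation is assumed because the abstract does not say which estimator was used.

```python
from statistics import mean, stdev


def pearson(x, y):
    """Pearson correlation between two equally long signals."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den


def average_correlation(chromatograms):
    """Mean correlation between the first chromatogram and every other one.

    This is the unsupervised objective described in the abstract: alignment
    parameters that raise this value are preferred, with no class labels.
    """
    reference = chromatograms[0]
    return mean(pearson(reference, c) for c in chromatograms[1:])


def relative_shift(retention_times, peak_width_at_base):
    """Four times the standard deviation of one peak's retention time
    across all runs, divided by the peak width at base.
    Assumption: sample standard deviation (statistics.stdev)."""
    return 4 * stdev(retention_times) / peak_width_at_base
```

A relative shift near 1 then means the run-to-run spread of a peak is comparable to its own width, which matches the range of 0.3 to 3.0 quoted for the simulated data.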
Spatiotemporal information during unsupervised learning enhances viewpoint invariant object recognition

PubMed Central

Tian, Moqian; Grill-Spector, Kalanit

2015-01-01

Recognizing objects is difficult because it requires both linking views of an object that can be different and distinguishing objects with similar appearance. Interestingly, people can learn to recognize objects across views in an unsupervised way, without feedback, just from the natural viewing statistics. However, there is intense debate regarding what information during unsupervised learning is used to link among object views. Specifically, researchers argue whether temporal proximity, motion, or spatiotemporal continuity among object views during unsupervised learning is beneficial. Here, we untangled the role of each of these factors in unsupervised learning of novel three-dimensional (3-D) objects. We found that after unsupervised training with 24 object views spanning a 180° view space, participants showed significant improvement in their ability to recognize 3-D objects across rotation. Surprisingly, unsupervised learning with spatiotemporal continuity or motion information offered no advantage over training with temporal proximity. However, we discovered that when participants were trained with just a third of the views spanning the same view space, unsupervised learning via spatiotemporal continuity yielded significantly better recognition performance on novel views than learning via temporal proximity. These results suggest that while it is possible to obtain view-invariant recognition just from observing many views of an object presented in temporal proximity, spatiotemporal information enhances performance by producing representations with broader view tuning than learning via temporal association. Our findings have important implications for theories of object recognition and for the development of computational algorithms that learn from examples. PMID:26024454
Unsupervised Learning Through Randomized Algorithms for High-Volume High-Velocity Data (ULTRA-HV).

DOE Office of Scientific and Technical Information (OSTI.GOV)

Pinar, Ali; Kolda, Tamara G.; Carlberg, Kevin Thomas

Through long-term investments in computing, algorithms, facilities, and instrumentation, DOE is an established leader in massive-scale, high-fidelity simulations, as well as science-leading experimentation. In both cases, DOE is generating more data than it can analyze and the problem is intensifying quickly. The need for advanced algorithms that can automatically convert the abundance of data into a wealth of useful information by discovering hidden structures is well recognized. Such efforts however, are hindered by the massive volume of the data and its high velocity. Here, the challenge is developing unsupervised learning methods to discover hidden structure in high-volume, high-velocity data.

Full-body gestures and movements recognition: user descriptive and unsupervised learning approaches in GDL classifier

NASA Astrophysics Data System (ADS)

Hachaj, Tomasz; Ogiela, Marek R.

2014-09-01

Gesture Description Language (GDL) is a classifier that enables syntactic description and real-time recognition of full-body gestures and movements. Gestures are described in a dedicated computer language named Gesture Description Language script (GDLs). In this paper we introduce new GDLs formalisms that enable recognition of selected classes of movement trajectories. The second novelty is a new unsupervised learning method that makes it possible to generate GDLs descriptions automatically. We have carried out an initial evaluation of both proposed extensions of GDL and obtained very promising results. Both the novel methodology and the evaluation results are described in this paper.
Hierarchical Adaptive Means (HAM) clustering for hardware-efficient, unsupervised and real-time spike sorting.

PubMed

Paraskevopoulou, Sivylla E; Wu, Di; Eftekhar, Amir; Constandinou, Timothy G

2014-09-30

This work presents a novel unsupervised algorithm for real-time adaptive clustering of neural spike data (spike sorting). The proposed Hierarchical Adaptive Means (HAM) clustering method combines centroid-based clustering with hierarchical cluster connectivity to classify incoming spikes using groups of clusters. The paper describes how the proposed method adaptively tracks the incoming spike data without requiring any past history, iteration or training, and how it autonomously determines the number of spike classes. Its performance (classification accuracy) has been tested using multiple datasets (both simulated and recorded), achieving near-identical accuracy compared to k-means (using 10 iterations and provided with the number of spike classes). Its robustness when applied to different feature extraction methods has also been demonstrated by achieving classification accuracies above 80% across multiple datasets. Last but crucially, its low complexity, which has been quantified through both memory and computation requirements, makes this method hugely attractive for future hardware implementation. Copyright © 2014 Elsevier B.V. All rights reserved.
Method for chemical amplification based on fluid partitioning in an immiscible liquid

DOEpatents

Anderson, Brian L.; Colston, Bill W.; Elkin, Christopher J.

2015-06-02

A system for nucleic acid amplification of a sample comprises partitioning the sample into partitioned sections and performing PCR on the partitioned sections of the sample. Another embodiment of the invention provides a system for nucleic acid amplification and detection of a sample comprising partitioning the sample into partitioned sections, performing PCR on the partitioned sections of the sample, and detecting and analyzing the partitioned sections of the sample.
The prediction of blood-tissue partitions, water-skin partitions and skin permeation for agrochemicals.

PubMed

Abraham, Michael H; Gola, Joelle M R; Ibrahim, Adam; Acree, William E; Liu, Xiangli

2014-07-01

There is considerable interest in the blood-tissue distribution of agrochemicals, and a number of researchers have developed experimental methods for in vitro distribution. These methods involve the determination of saline-blood and saline-tissue partitions; not only are they indirect, but they do not yield the required in vivo distribution. The authors set out equations for gas-tissue and blood-tissue distribution, for partition from water into skin and for permeation from water through human skin. Together with Abraham descriptors for the agrochemicals, these equations can be used to predict values for all of these processes. The present predictions compare favourably with experimental in vivo blood-tissue distribution where available. The predictions require no more than simple arithmetic. The present method represents a much easier and much more economic way of estimating blood-tissue partitions than the method that uses saline-blood and saline-tissue partitions. It has the added advantages of yielding the required in vivo partitions and being easily extended to the prediction of partition of agrochemicals from water into skin and permeation from water through skin. © 2013 Society of Chemical Industry.
Predicting category intuitiveness with the rational model, the simplicity model, and the generalized context model.

PubMed

Pothos, Emmanuel M; Bailey, Todd M

2009-07-01

Naïve observers typically perceive some groupings for a set of stimuli as more intuitive than others. The problem of predicting category intuitiveness has been historically considered the remit of models of unsupervised categorization. In contrast, this article develops a measure of category intuitiveness from one of the most widely supported models of supervised categorization, the generalized context model (GCM). Considering different category assignments for a set of instances, the authors asked how well the GCM can predict the classification of each instance on the basis of all the other instances. The category assignment that results in the smallest prediction error is interpreted as the most intuitive for the GCM; the authors refer to this way of applying the GCM as "unsupervised GCM." The authors systematically compared predictions of category intuitiveness from the unsupervised GCM and two models of unsupervised categorization: the simplicity model and the rational model. The unsupervised GCM compared favorably with the simplicity model and the rational model. This success of the unsupervised GCM illustrates that the distinction between supervised and unsupervised categorization may need to be reconsidered. However, no model emerged as clearly superior, indicating that there is more work to be done in understanding and modeling category intuitiveness.
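The leave-one-out idea behind the "unsupervised GCM" can be illustrated with a minimal sketch. It is not the authors' implementation: the exponential similarity over Euclidean distance is the usual GCM form, but the sensitivity value `c`, the error measure (one minus the probability assigned to an item's own category), and all function names are assumptions made here for illustration.

```python
import math


def gcm_loo_error(points, labels, c=1.0):
    """Leave-one-out prediction error of a GCM-style classifier.

    Each item is classified from all the other items; the probability of a
    category is the summed similarity to its members divided by the summed
    similarity to all other items. Assumed error: 1 - P(own category).
    """
    error = 0.0
    for i, (p, own) in enumerate(zip(points, labels)):
        sim_own = sim_all = 0.0
        for j, (q, other) in enumerate(zip(points, labels)):
            if i == j:
                continue  # the item itself is held out
            s = math.exp(-c * math.dist(p, q))
            sim_all += s
            if other == own:
                sim_own += s
        error += 1.0 - sim_own / sim_all
    return error


def most_intuitive(points, candidate_labelings, c=1.0):
    """Pick the candidate category assignment with the smallest error."""
    return min(candidate_labelings, key=lambda lab: gcm_loo_error(points, lab, c))
```

Note that a labeling that places every item in a single category trivially scores zero under this error measure, so the sketch assumes the candidates being compared have the same number of categories.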
A strategy to load balancing for non-connectivity MapReduce job

NASA Astrophysics Data System (ADS)

Zhou, Huaping; Liu, Guangzong; Gui, Haixia

2017-09-01

MapReduce is widely used as a distributed programming model for large-scale, complex datasets. The original hash partitioning function in MapReduce often causes data skew when the data distribution is uneven. To solve this imbalance in data partitioning, we propose a strategy that changes the remaining partitioning index when data is skewed. In the Map phase, we count the amount of data that will be distributed to each reducer; the Job Tracker then monitors the global partitioning information and dynamically modifies the original partitioning function according to the data skew model. In the next partitioning process, the Partitioner can thus redirect the partition indices that would cause data skew to other reducers with less load, eventually balancing the load of each node. Finally, we experimentally compare our method with existing methods on both synthetic and real datasets. The experimental results show that our strategy can solve the problem of data skew with better stability and efficiency than the Hash method and the Sampling method for non-connectivity MapReduce tasks.
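The general idea of redirecting skew-causing partitions to less loaded reducers can be sketched as follows. This is a simplified, single-process illustration and not the paper's algorithm: the greedy rule, the `tolerance` threshold, and the function names are assumptions, and no Hadoop API is used.

```python
import zlib


def hash_partition(key, num_reducers):
    """Default-style partitioning: a deterministic hash modulo reducer count."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def balanced_assignment(key_counts, num_reducers, tolerance=1.2):
    """Assign keys to reducers, overriding the hash index when it would skew.

    key_counts: records per key, as counted during the Map phase.
    A key keeps its hash index unless that would push the reducer above
    tolerance * (average load); it is then sent to the least loaded reducer.
    """
    limit = tolerance * sum(key_counts.values()) / num_reducers
    loads = [0] * num_reducers
    assignment = {}
    # Place the heaviest keys first so large groups are settled early.
    for key, count in sorted(key_counts.items(), key=lambda kv: -kv[1]):
        target = hash_partition(key, num_reducers)
        if loads[target] + count > limit:
            target = min(range(num_reducers), key=loads.__getitem__)
        assignment[key] = target
        loads[target] += count
    return assignment, loads
```

Calling `balanced_assignment({"a": 10, "b": 10}, 2, tolerance=1.0)` places the two keys on different reducers even if their hash indices collide.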
One-Channel Surface Electromyography Decomposition for Muscle Force Estimation.

PubMed

Sun, Wentao; Zhu, Jinying; Jiang, Yinlai; Yokoi, Hiroshi; Huang, Qiang

2018-01-01

Estimating muscle force by surface electromyography (sEMG) is a non-invasive and flexible way to diagnose biomechanical diseases and control assistive devices such as prosthetic hands. To estimate muscle force using sEMG, a supervised method is commonly adopted. This requires simultaneous recording of sEMG signals and muscle force measured by additional devices to tune the variables involved. However, recording the muscle force of the lost limb of an amputee is challenging, and the supervised method has limitations in this regard. Although the unsupervised method does not require muscle force recording, it suffers from low accuracy due to a lack of reference data. To achieve accurate and easy estimation of muscle force by the unsupervised method, we propose a decomposition of one-channel sEMG signals into constituent motor unit action potentials (MUAPs) in two steps: (1) learning an orthogonal basis of sEMG signals through reconstruction independent component analysis; (2) extracting spike-like MUAPs from the basis vectors. Nine healthy subjects were recruited to evaluate the accuracy of the proposed approach in estimating muscle force of the biceps brachii. The results demonstrated that the proposed approach based on decomposed MUAPs explains more than 80% of the muscle force variability recorded at an arbitrary force level, while the conventional amplitude-based approach explains only 62.3% of this variability. With the proposed approach, we were also able to achieve grip force control of a prosthetic hand, which is one of the most important clinical applications of the unsupervised method. Experiments on two trans-radial amputees indicated that the proposed approach improves the performance of the prosthetic hand in grasping everyday objects.
The influence of unsupervised time on elementary school children at high risk for inattention and problem behaviors.

PubMed

Na, Kyoung-Sae; Lee, Soyoung Irene; Hong, Hyun Ju; Oh, Myoung-Ja; Bahn, Geon Ho; Ha, Kyunghee; Shin, Yun Mi; Song, Jungeun; Park, Eun Jin; Yoo, Heejung; Kim, Hyunsoo; Kyung, Yun-Mi

2014-06-01

In the last few decades, changing socioeconomic and family structures have increasingly left children alone without adult supervision. Carefully prepared and limited periods of unsupervised time are not harmful for children. However, long unsupervised periods have harmful effects, particularly for those children at high risk for inattention and problem behaviors. In this study, we examined the influence of unsupervised time on behavior problems by studying a sample of elementary school children at high risk for inattention and problem behaviors. The study analyzed data from the Children's Mental Health Promotion Project, which was conducted in collaboration with education, government, and mental health professionals. The child behavior checklist (CBCL) was administered to assess problem behaviors among first- and fourth-grade children. Multivariate logistic regression analysis was used to evaluate the influence of unsupervised time on children's behavior. A total of 3,270 elementary school children (1,340 first-graders and 1,930 fourth-graders) were available for this study; 1,876 of the 3,270 children (57.4%) reportedly spent a significant amount of time unsupervised during the day. Unsupervised time exceeding 2 h per day increased the risk of delinquency, aggressive behaviors, and somatic complaints, as well as externalizing and internalizing problems. Carefully planned afterschool programming and care should be provided to children at high risk for inattention and problem behaviors. Also, a more comprehensive approach is needed to identify the possible mechanisms by which unsupervised time aggravates behavior problems in children predisposed to these behaviors. Copyright © 2013 Elsevier Ltd. All rights reserved.
Improved Estimation of Cardiac Function Parameters Using a Combination of Independent Automated Segmentation Results in Cardiovascular Magnetic Resonance Imaging.

PubMed

Lebenberg, Jessica; Lalande, Alain; Clarysse, Patrick; Buvat, Irene; Casta, Christopher; Cochet, Alexandre; Constantinidès, Constantin; Cousty, Jean; de Cesare, Alain; Jehan-Besson, Stephanie; Lefort, Muriel; Najman, Laurent; Roullot, Elodie; Sarry, Laurent; Tilmant, Christophe; Frouin, Frederique; Garreau, Mireille

2015-01-01

This work aimed at combining different segmentation approaches to produce a robust and accurate segmentation result. Three to five segmentation results of the left ventricle were combined using the STAPLE algorithm and the reliability of the resulting segmentation was evaluated in comparison with the result of each individual segmentation method. This comparison was performed using a supervised approach based on a reference method. Then, we used an unsupervised statistical evaluation, the extended Regression Without Truth (eRWT) that ranks different methods according to their accuracy in estimating a specific biomarker in a population. The segmentation accuracy was evaluated by estimating six cardiac function parameters resulting from the left ventricle contour delineation using a public cardiac cine MRI database. Eight different segmentation methods, including three expert delineations and five automated methods, were considered, and sixteen combinations of the automated methods using STAPLE were investigated. The supervised and unsupervised evaluations demonstrated that in most cases, STAPLE results provided better estimates than individual automated segmentation methods. Overall, combining different automated segmentation methods improved the reliability of the segmentation result compared to that obtained using an individual method and could achieve the accuracy of an expert.
Improved Estimation of Cardiac Function Parameters Using a Combination of Independent Automated Segmentation Results in Cardiovascular Magnetic Resonance Imaging

PubMed Central

Lebenberg, Jessica; Lalande, Alain; Clarysse, Patrick; Buvat, Irene; Casta, Christopher; Cochet, Alexandre; Constantinidès, Constantin; Cousty, Jean; de Cesare, Alain; Jehan-Besson, Stephanie; Lefort, Muriel; Najman, Laurent; Roullot, Elodie; Sarry, Laurent; Tilmant, Christophe

2015-01-01

This work aimed at combining different segmentation approaches to produce a robust and accurate segmentation result. Three to five segmentation results of the left ventricle were combined using the STAPLE algorithm and the reliability of the resulting segmentation was evaluated in comparison with the result of each individual segmentation method. This comparison was performed using a supervised approach based on a reference method. Then, we used an unsupervised statistical evaluation, the extended Regression Without Truth (eRWT) that ranks different methods according to their accuracy in estimating a specific biomarker in a population. The segmentation accuracy was evaluated by estimating six cardiac function parameters resulting from the left ventricle contour delineation using a public cardiac cine MRI database. Eight different segmentation methods, including three expert delineations and five automated methods, were considered, and sixteen combinations of the automated methods using STAPLE were investigated. The supervised and unsupervised evaluations demonstrated that in most cases, STAPLE results provided better estimates than individual automated segmentation methods. Overall, combining different automated segmentation methods improved the reliability of the segmentation result compared to that obtained using an individual method and could achieve the accuracy of an expert. PMID:26287691
Named Entity Recognition in Chinese Clinical Text Using Deep Neural Network.

PubMed

Wu, Yonghui; Jiang, Min; Lei, Jianbo; Xu, Hua

2015-01-01

Rapid growth in electronic health record (EHR) use has led to an unprecedented expansion of available clinical data in electronic formats. However, much of the important healthcare information is locked in narrative documents. Therefore, Natural Language Processing (NLP) technologies, e.g., Named Entity Recognition (NER), which identifies the boundaries and types of entities, have been extensively studied to unlock important clinical information in free text. In this study, we investigated a novel deep learning method to recognize clinical entities in Chinese clinical documents using a minimal feature engineering approach. We developed a deep neural network (DNN) to generate word embeddings from a large unlabeled corpus through unsupervised learning and another DNN for the NER task. The experimental results showed that the DNN with word embeddings trained from the large unlabeled corpus outperformed the state-of-the-art CRF model in the minimal feature engineering setting, achieving the highest F1-score of 0.9280. Further analysis showed that word embeddings derived through unsupervised learning from a large unlabeled corpus remarkably improved on the DNN with randomized embeddings, demonstrating the usefulness of unsupervised feature learning.
Partitioning of Alkali Metal Salts and Boric Acid from Aqueous Phase into the Polyamide Active Layers of Reverse Osmosis Membranes.

PubMed

Wang, Jingbo; Kingsbury, Ryan S; Perry, Lamar A; Coronell, Orlando

2017-02-21

The partition coefficient of solutes into the polyamide active layer of reverse osmosis (RO) membranes is one of the three membrane properties (together with solute diffusion coefficient and active layer thickness) that determine solute permeation. However, no well-established method exists to measure solute partition coefficients into polyamide active layers. Further, the few studies that measured partition coefficients for inorganic salts report values significantly higher than one (∼3-8), which is contrary to expectations from Donnan theory and the observed high rejection of salts. As such, we developed a benchtop method to determine solute partition coefficients into the polyamide active layers of RO membranes. The method uses a quartz crystal microbalance (QCM) to measure the change in the mass of the active layer caused by the uptake of the partitioned solutes. The method was evaluated using several inorganic salts (alkali metal salts of chloride) and a weak acid of common concern in water desalination (boric acid). All partition coefficients were found to be lower than 1, in general agreement with expectations from Donnan theory. Results reported in this study advance the fundamental understanding of contaminant transport through RO membranes, and can be used in future studies to decouple the contributions of contaminant partitioning and diffusion to contaminant permeation.
Domain decomposition by the advancing-partition method for parallel unstructured grid generation

NASA Technical Reports Server (NTRS)

Banihashemi, legal representative, Soheila (Inventor); Pirzadeh, Shahyar Z. (Inventor)

2012-01-01

In a method for domain decomposition for generating unstructured grids, a surface mesh is generated for a spatial domain. A location of a partition plane dividing the domain into two sections is determined. Triangular faces on the surface mesh that intersect the partition plane are identified. A partition grid of tetrahedral cells, dividing the domain into two sub-domains, is generated using a marching process in which a front comprises only faces of new cells which intersect the partition plane. The partition grid is generated until no active faces remain on the front. Triangular faces on each side of the partition plane are collected into two separate subsets. Each subset of triangular faces is renumbered locally and a local/global mapping is created for each sub-domain. A volume grid is generated for each sub-domain. The partition grid and volume grids are then merged using the local-global mapping.
Methods and Systems for Authorizing an Effector Command in an Integrated Modular Environment

NASA Technical Reports Server (NTRS)

Sunderland, Dean E. (Inventor); Ahrendt, Terry J. (Inventor); Moore, Tim (Inventor)

2013-01-01

Methods and systems are provided for authorizing a command of an integrated modular environment in which a plurality of partitions control actions of a plurality of effectors. A first identifier, a second identifier, and a third identifier are determined. The first identifier identifies a first partition of the plurality of partitions from which the command originated. The second identifier identifies a first effector of the plurality of effectors for which the command is intended. The third identifier identifies a second partition of the plurality of partitions that is responsible for controlling the first effector. The first identifier and the third identifier are compared to determine whether the first partition is the same as the second partition for authorization of the command.
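The three-identifier comparison described above reduces to a small lookup and an equality check. The sketch below is only an illustration of that logic; the function name, the dictionary-based ownership table, and the partition and effector names are hypothetical and do not come from the patent.

```python
def is_authorized(origin_partition, target_effector, controller_of):
    """Authorize a command only if it came from the controlling partition.

    origin_partition: first identifier (partition that issued the command).
    target_effector:  second identifier (effector the command is meant for).
    controller_of:    hypothetical table mapping each effector to the
                      partition responsible for it (yields the third identifier).
    """
    responsible_partition = controller_of.get(target_effector)
    # Unknown effectors are rejected rather than matched by accident.
    return responsible_partition is not None and origin_partition == responsible_partition


# Hypothetical configuration for illustration only.
controller_of = {"elevator_actuator": "flight_control", "cabin_fan": "environmental"}
print(is_authorized("flight_control", "elevator_actuator", controller_of))  # True
print(is_authorized("environmental", "elevator_actuator", controller_of))   # False
```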
Unsupervised change detection of multispectral images based on spatial constraint chi-squared transform and Markov random field model

NASA Astrophysics Data System (ADS)

Shi, Aiye; Wang, Chao; Shen, Shaohong; Huang, Fengchen; Ma, Zhenli

2016-10-01

Chi-squared transform (CST), as a statistical method, can describe the difference degree between vectors. The CST-based methods operate directly on information stored in the difference image and are simple and effective methods for detecting changes in remotely sensed images that have been registered and aligned. However, the technique does not take spatial information into consideration, which leads to much noise in the result of change detection. An improved unsupervised change detection method is proposed based on spatial constraint CST (SCCST) in combination with a Markov random field (MRF) model. First, the mean and variance matrix of the difference image of bitemporal images are estimated by an iterative trimming method. In each iteration, spatial information is injected to reduce scattered changed points (also known as "salt and pepper" noise). To determine the key parameter confidence level in the SCCST method, a pseudotraining dataset is constructed to estimate the optimal value. Then, the result of SCCST, as an initial solution of change detection, is further improved by the MRF model. The experiments on simulated and real multitemporal and multispectral images indicate that the proposed method performs well in comprehensive indices compared with other methods.
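For orientation, the chi-squared transform in its standard form can be written as below. The notation is assumed here rather than taken from the paper: X_j is the difference-image vector of pixel j over p spectral bands, M and Σ are the mean vector and covariance matrix of the difference image, and α is set by the chosen confidence level. The spatial constraint and the MRF refinement that the paper adds are not shown.

```latex
% Standard chi-squared transform of the difference image (assumed notation).
Y_j = (X_j - M)^{\mathsf{T}} \, \Sigma^{-1} \, (X_j - M),
\qquad Y_j \sim \chi^2(p) \ \text{for unchanged pixels},

% Pixel j is labelled "changed" when Y_j exceeds the chi-squared quantile
% at confidence level 1 - \alpha:
Y_j > \chi^2_{1-\alpha}(p).
```

Because changed pixels bias the estimates of M and Σ, an iterative trimming step such as the one the abstract mentions re-estimates them from the pixels currently labelled unchanged.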
Domain Decomposition By the Advancing-Partition Method for Parallel Unstructured Grid Generation

NASA Technical Reports Server (NTRS)

Pirzadeh, Shahyar Z.; Zagaris, George

2009-01-01

A new method of domain decomposition has been developed for generating unstructured grids in subdomains either sequentially or using multiple computers in parallel. Domain decomposition is a crucial and challenging step for parallel grid generation. Prior methods are generally based on auxiliary, complex, and computationally intensive operations for defining partition interfaces and usually produce grids of lower quality than those generated in single domains. The new technique, referred to as "Advancing Partition," is based on the Advancing-Front method, which partitions a domain as part of the volume mesh generation in a consistent and "natural" way. The benefits of this approach are: 1) the process of domain decomposition is highly automated, 2) partitioning of domain does not compromise the quality of the generated grids, and 3) the computational overhead for domain decomposition is minimal. The new method has been implemented in NASA's unstructured grid generation code VGRID.
Domain Decomposition By the Advancing-Partition Method

NASA Technical Reports Server (NTRS)

Pirzadeh, Shahyar Z.

2008-01-01

A new method of domain decomposition has been developed for generating unstructured grids in subdomains either sequentially or using multiple computers in parallel. Domain decomposition is a crucial and challenging step for parallel grid generation. Prior methods are generally based on auxiliary, complex, and computationally intensive operations for defining partition interfaces and usually produce grids of lower quality than those generated in single domains. The new technique, referred to as "Advancing Partition," is based on the Advancing-Front method, which partitions a domain as part of the volume mesh generation in a consistent and "natural" way. The benefits of this approach are: 1) the process of domain decomposition is highly automated, 2) partitioning of domain does not compromise the quality of the generated grids, and 3) the computational overhead for domain decomposition is minimal. The new method has been implemented in NASA's unstructured grid generation code VGRID.
Discovering motion primitives for unsupervised grouping and one-shot learning of human actions, gestures, and expressions.

PubMed

Yang, Yang; Saleemi, Imran; Shah, Mubarak

2013-07-01

This paper proposes a novel representation of articulated human actions, gestures, and facial expressions. The main goals of the proposed approach are: 1) to enable recognition using very few examples, i.e., one- or k-shot learning, and 2) meaningful organization of unlabeled datasets by unsupervised clustering. Our proposed representation is obtained by automatically discovering high-level subactions or motion primitives, by hierarchical clustering of observed optical flow in four-dimensional, spatial, and motion flow space. The completely unsupervised proposed method, in contrast to state-of-the-art representations like bag of video words, provides a meaningful representation conducive to visual interpretation and textual labeling. Each primitive action depicts an atomic subaction, like directional motion of a limb or the torso, and is represented by a mixture of four-dimensional Gaussian distributions. For one-shot and k-shot learning, the sequence of primitive labels discovered in a test video is labeled using KL divergence, and can then be represented as a string and matched against similar strings of training videos. The same sequence can also be collapsed into a histogram of primitives or be used to learn a Hidden Markov model to represent classes. We have performed extensive experiments on recognition by one- and k-shot learning as well as unsupervised action clustering on six human action and gesture datasets, a composite dataset, and a database of facial expressions. These experiments confirm the validity and discriminative nature of the proposed representation.
SAR image segmentation using skeleton-based fuzzy clustering

NASA Astrophysics Data System (ADS)

Cao, Yun Yi; Chen, Yan Qiu

2003-06-01

SAR image segmentation can be converted to a clustering problem in which pixels or small patches are grouped together based on local feature information. In this paper, we present a novel framework for segmentation. The segmentation goal is achieved by unsupervised clustering of characteristic descriptors extracted from local patches. A mixture model of the characteristic descriptor, which combines intensity and texture features, is investigated. The unsupervised algorithm is derived from the recently proposed Skeleton-Based Data Labeling method. Skeletons are constructed as prototypes of clusters to represent arbitrary latent structures in image data. Segmentation using Skeleton-Based Fuzzy Clustering is able to detect the types of surfaces appearing in SAR images automatically, without any user input.

Construction and Analysis of Multi-Rate Partitioned Runge-Kutta Methods

DTIC Science & Technology

2012-06-01

Master’s thesis by Patrick R. Mugg, June 2012. Thesis Advisor: Francis Giraldo. Second Reader: Hong... The most widely known and used procedure for analyzing stability is the Von Neumann method, such that Von Neumann’s stability analysis looks at...
Unsupervised self-care predicts conduct problems: The moderating roles of hostile aggression and gender.

PubMed

Atherton, Olivia E; Schofield, Thomas J; Sitka, Angela; Conger, Rand D; Robins, Richard W

2016-04-01

Despite widespread speculation about the detrimental effect of unsupervised self-care on adolescent outcomes, little is known about which children are particularly prone to problem behaviors when left at home without adult supervision. The present research used data from a longitudinal study of 674 Mexican-origin children residing in the United States to examine the prospective effect of unsupervised self-care on conduct problems, and the moderating roles of hostile aggression and gender. Results showed that unsupervised self-care was related to increases over time in conduct problems such as lying, stealing, and bullying. However, unsupervised self-care only led to conduct problems for boys and for children with an aggressive temperament. The main and interactive effects held for both mother-reported and observational-rated hostile aggression and after controlling for potential confounds. Copyright © 2016 The Foundation for Professionals in Services for Adolescents. Published by Elsevier Ltd. All rights reserved.
New approach to canonical partition functions computation in Nf=2 lattice QCD at finite baryon density

NASA Astrophysics Data System (ADS)

Bornyakov, V. G.; Boyda, D. L.; Goy, V. A.; Molochkov, A. V.; Nakamura, Atsushi; Nikolaev, A. A.; Zakharov, V. I.

2017-05-01

We propose and test a new approach to the computation of canonical partition functions in lattice QCD at finite density. We suggest a procedure in a few steps. We first compute numerically the quark number density for imaginary chemical potential iμ_qI. Then we restore the grand canonical partition function for imaginary chemical potential using the fitting procedure for the quark number density. Finally we compute the canonical partition functions using high precision numerical Fourier transformation. Additionally we compute the canonical partition functions using the known method of the hopping parameter expansion and compare results obtained by the two methods in the deconfining as well as in the confining phases. The agreement between the two methods indicates the validity of the new method. Our numerical results are obtained in two flavor lattice QCD with clover improved Wilson fermions.
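The three steps in this abstract rest on textbook relations between the grand canonical and canonical ensembles. The block below writes them out as a sketch; the symbols Z_GC, Z_C, θ, T and V are notation assumed here, not taken from the abstract.

```latex
% Quark number density as a derivative of the grand canonical partition function
n_q = \frac{T}{V}\,\frac{\partial \ln Z_{GC}}{\partial \mu_q}

% Fugacity expansion at imaginary chemical potential, \mu_q = i\mu_{qI}, \theta = \mu_{qI}/T
Z_{GC}(\theta, T, V) = \sum_{n=-\infty}^{\infty} Z_C(n, T, V)\, e^{i n \theta}

% Canonical partition functions recovered as Fourier coefficients
Z_C(n, T, V) = \int_{0}^{2\pi} \frac{d\theta}{2\pi}\, e^{-i n \theta}\, Z_{GC}(\theta, T, V)
```

Read in this order, a fit of the density fixes ln Z_GC up to a constant by integration over θ, and the last line is the Fourier transformation that the abstract says must be done at high numerical precision.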
Visualization and unsupervised predictive clustering of high-dimensional multimodal neuroimaging data.

PubMed

Mwangi, Benson; Soares, Jair C; Hasan, Khader M

2014-10-30

Neuroimaging machine learning studies have largely utilized supervised algorithms - meaning they require both neuroimaging scan data and corresponding target variables (e.g. healthy vs. diseased) to be successfully 'trained' for a prediction task. Noticeably, this approach may not be optimal or possible when the global structure of the data is not well known and the researcher does not have an a priori model to fit the data. We set out to investigate the utility of an unsupervised machine learning technique; t-distributed stochastic neighbour embedding (t-SNE) in identifying 'unseen' sample population patterns that may exist in high-dimensional neuroimaging data. Multimodal neuroimaging scans from 92 healthy subjects were pre-processed using atlas-based methods, integrated and input into the t-SNE algorithm. Patterns and clusters discovered by the algorithm were visualized using a 2D scatter plot and further analyzed using the K-means clustering algorithm. t-SNE was evaluated against classical principal component analysis. Remarkably, based on unlabelled multimodal scan data, t-SNE separated study subjects into two very distinct clusters which corresponded to subjects' gender labels (cluster silhouette index value=0.79). The resulting clusters were used to develop an unsupervised minimum distance clustering model which identified 93.5% of subjects' gender. Notably, from a neuropsychiatric perspective this method may allow discovery of data-driven disease phenotypes or sub-types of treatment responders. Copyright © 2014 Elsevier B.V. All rights reserved.
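The "minimum distance clustering model" mentioned above can be read as nearest-centroid assignment: each cluster is summarized by its mean, and a new sample goes to the closest mean. The sketch below illustrates that reading only. The function names and toy 2D points are hypothetical, and the abstract does not confirm that this is exactly the authors' model.

```python
from math import dist


def cluster_centroids(points, labels):
    """Return {label: centroid} where each centroid is the mean of its cluster."""
    groups = {}
    for point, label in zip(points, labels):
        groups.setdefault(label, []).append(point)
    return {
        label: tuple(sum(coord) / len(members) for coord in zip(*members))
        for label, members in groups.items()
    }


def assign_to_nearest(point, centroids):
    """Assign a point to the label of the closest centroid (Euclidean distance)."""
    return min(centroids, key=lambda label: dist(point, centroids[label]))


# Hypothetical 2D embedding coordinates with two discovered clusters.
embedded = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
cluster_ids = ["a", "a", "b", "b"]
centroids = cluster_centroids(embedded, cluster_ids)
```

In practice the cluster labels would come from a clustering step such as K-means on the embedded coordinates. Here they are given by hand to keep the example short.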
Unsupervised segmentation of low-contrast multichannel images: discrimination of tissue components in microscopic images of unstained specimens

NASA Astrophysics Data System (ADS)

Kopriva, Ivica; Popović Hadžija, Marijana; Hadžija, Mirko; Aralica, Gorana

2015-06-01

Low-contrast images, such as color microscopic images of unstained histological specimens, are composed of objects with highly correlated spectral profiles. Such images are very hard to segment. Here, we present a method that nonlinearly maps low-contrast color image into an image with an increased number of non-physical channels and a decreased correlation between spectral profiles. The method is a proof-of-concept validated on the unsupervised segmentation of color images of unstained specimens, in which case the tissue components appear colorless when viewed under the light microscope. Specimens of human hepatocellular carcinoma, human liver with metastasis from colon and gastric cancer and mouse fatty liver were used for validation. The average correlation between the spectral profiles of the tissue components was greater than 0.9985, and the worst case correlation was greater than 0.9997. The proposed method can potentially be applied to the segmentation of low-contrast multichannel images with high spatial resolution that arise in other imaging modalities.
A new local-global approach for classification.

PubMed

Peres, R T; Pedreira, C E

2010-09-01

In this paper, we propose a new local-global pattern classification scheme that combines supervised and unsupervised approaches, taking advantage of both local and global environments. We understand as global methods the ones concerned with constructing a model for the whole problem space using the totality of the available observations. Local methods focus on subregions of the space, possibly using an appropriately selected subset of the sample. In the proposed method, the sample is first divided into local cells by using an unsupervised Vector Quantization algorithm, the LBG (Linde-Buzo-Gray). In a second stage, the generated assemblage of much easier problems is locally solved with a scheme inspired by Bayes' rule. Four classification methods were implemented for comparison purposes with the proposed scheme: Learning Vector Quantization (LVQ); Feedforward Neural Networks; Support Vector Machine (SVM) and k-Nearest Neighbors. These four methods and the proposed scheme were applied to eleven datasets: two controlled experiments plus nine publicly available datasets from the UCI repository. The proposed method has shown quite competitive performance when compared to these classical and widely used classifiers. Our method is simple to understand and implement and is based on very intuitive concepts. Copyright 2010 Elsevier Ltd. All rights reserved.
Parametric embedding for class visualization.

PubMed

Iwata, Tomoharu; Saito, Kazumi; Ueda, Naonori; Stromsten, Sean; Griffiths, Thomas L; Tenenbaum, Joshua B

2007-09-01

We propose a new method, parametric embedding (PE), that embeds objects with the class structure into a low-dimensional visualization space. PE takes as input a set of class conditional probabilities for given data points and tries to preserve the structure in an embedding space by minimizing a sum of Kullback-Leibler divergences, under the assumption that samples are generated by a gaussian mixture with equal covariances in the embedding space. PE has many potential uses depending on the source of the input data, providing insight into the classifier's behavior in supervised, semisupervised, and unsupervised settings. The PE algorithm has a computational advantage over conventional embedding methods based on pairwise object relations since its complexity scales with the product of the number of objects and the number of classes. We demonstrate PE by visualizing supervised categorization of Web pages, semisupervised categorization of digits, and the relations of words and latent topics found by an unsupervised algorithm, latent Dirichlet allocation.
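The objective described in this abstract can be written out as follows. This is a sketch under stated assumptions: r_n denotes the embedding coordinate of object x_n, φ_c the center of class c in the embedding space, and the shared covariance is taken to be the identity for simplicity. These symbols are notation introduced here, not quoted from the abstract.

```latex
% Class posterior in the embedding space under an equal-covariance Gaussian mixture
p(c \mid r_n) = \frac{p(c)\,\exp\!\left(-\tfrac{1}{2}\lVert r_n - \phi_c \rVert^2\right)}
                     {\sum_{l} p(l)\,\exp\!\left(-\tfrac{1}{2}\lVert r_n - \phi_l \rVert^2\right)}

% Sum of Kullback-Leibler divergences minimized over \{r_n\} and \{\phi_c\}
E = \sum_{n} \sum_{c} p(c \mid x_n)\, \log \frac{p(c \mid x_n)}{p(c \mid r_n)}
```

Each term of E involves one object and one class, which is consistent with the stated complexity that scales with the product of the number of objects and the number of classes.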
An Unsupervised Approach for Extraction of Blood Vessels from Fundus Images.

PubMed

Dash, Jyotiprava; Bhoi, Nilamani

2018-04-26

Pathological disorders may happen due to small changes in retinal blood vessels which may later turn into blindness. Hence, the accurate segmentation of blood vessels is becoming a challenging task for pathological analysis. This paper offers an unsupervised recursive method for extraction of blood vessels from ophthalmoscope images. First, a vessel-enhanced image is generated with the help of gamma correction and contrast-limited adaptive histogram equalization (CLAHE). Next, the vessels are extracted iteratively by applying an adaptive thresholding technique. Finally, a vessel-segmented image is produced by applying a morphological cleaning operation. Evaluations are carried out on the publicly available digital retinal images for vessel extraction (DRIVE) and Child Heart And Health Study in England (CHASE_DB1) databases using nine different measurements. The proposed method achieves average accuracies of 0.957 and 0.952 on the DRIVE and CHASE_DB1 databases respectively.
Application of unsupervised pattern recognition approaches for exploration of rare earth elements in Se-Chahun iron ore, central Iran

NASA Astrophysics Data System (ADS)

Sarparandeh, Mohammadali; Hezarkhani, Ardeshir

2017-12-01

The use of efficient methods for data processing has always been of interest to researchers in the field of earth sciences. Pattern recognition techniques are appropriate methods for high-dimensional data such as geochemical data. Evaluation of the geochemical distribution of rare earth elements (REEs) requires the use of such methods. In particular, the multivariate nature of REE data makes them a good target for numerical analysis. The main subject of this paper is application of unsupervised pattern recognition approaches in evaluating geochemical distribution of REEs in the Kiruna type magnetite-apatite deposit of Se-Chahun. For this purpose, 42 bulk lithology samples were collected from the Se-Chahun iron ore deposit. In this study, 14 rare earth elements were measured with inductively coupled plasma mass spectrometry (ICP-MS). Pattern recognition makes it possible to evaluate the relations between the samples based on all these 14 features, simultaneously. In addition to providing easy solutions, discovery of the hidden information and relations of data samples is the advantage of these methods. Therefore, four clustering methods (unsupervised pattern recognition) - including a modified basic sequential algorithmic scheme (MBSAS), hierarchical (agglomerative) clustering, k-means clustering and self-organizing map (SOM) - were applied and results were evaluated using the silhouette criterion. Samples were clustered in four types. Finally, the results of this study were validated with geological facts and analysis results from, for example, scanning electron microscopy (SEM), X-ray diffraction (XRD), ICP-MS and optical mineralogy. The results of the k-means clustering and SOM methods have the best matches with reality, with experimental studies of samples and with field surveys. Since only the rare earth elements are used in this division, a good agreement of the results with lithology is considerable. 
It is concluded that the combination of the proposed methods and geological studies leads to finding some hidden information, and this approach has the best results compared to using only one of them.
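The silhouette criterion used above to compare the four clustering methods can be computed directly from pairwise distances. The following is a minimal sketch of the standard definition with Euclidean distance. The function name and toy data are illustrative and are not the authors' code or data.

```python
from math import dist


def silhouette_score(points, labels):
    """Mean silhouette value over all points; needs at least two clusters."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(dist(point, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist(point, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Toy data: two well-separated pairs of points.
samples = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_score(samples, [0, 0, 1, 1])  # labels follow the geometry
bad = silhouette_score(samples, [0, 1, 0, 1])   # labels cut across it
```

Values near 1 indicate compact, well-separated clusters, and negative values suggest points assigned to the wrong cluster. This is why the score can rank alternative clusterings of the same samples.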
Video mining using combinations of unsupervised and supervised learning techniques

NASA Astrophysics Data System (ADS)

Divakaran, Ajay; Miyahara, Koji; Peker, Kadir A.; Radhakrishnan, Regunathan; Xiong, Ziyou

2003-12-01

We discuss the meaning and significance of the video mining problem, and present our work on some aspects of video mining. A simple definition of video mining is unsupervised discovery of patterns in audio-visual content. Such purely unsupervised discovery is readily applicable to video surveillance as well as to consumer video browsing applications. We interpret video mining as content-adaptive or "blind" content processing, in which the first stage is content characterization and the second stage is event discovery based on the characterization obtained in stage 1. We discuss the target applications and find that purely unsupervised approaches are too computationally complex to be implemented on our product platform. We then describe various combinations of unsupervised and supervised learning techniques that help discover patterns that are useful to the end-user of the application. We target consumer video browsing applications such as commercial message detection, sports highlights extraction, etc. We employ both audio and video features. We find that supervised audio classification combined with unsupervised unusual event discovery enables accurate supervised detection of desired events. Our techniques are computationally simple and robust to common variations in production styles, etc.
Systematic exploration of unsupervised methods for mapping behavior

NASA Astrophysics Data System (ADS)

Todd, Jeremy G.; Kain, Jamey S.; de Bivort, Benjamin L.

2017-02-01

To fully understand the mechanisms giving rise to behavior, we need to be able to precisely measure it. When coupled with large behavioral data sets, unsupervised clustering methods offer the potential of unbiased mapping of behavioral spaces. However, unsupervised techniques to map behavioral spaces are in their infancy, and there have been few systematic considerations of all the methodological options. We compared the performance of seven distinct mapping methods in clustering a wavelet-transformed data set consisting of the x- and y-positions of the six legs of individual flies. Legs were automatically tracked by small pieces of fluorescent dye, while the fly was tethered and walking on an air-suspended ball. We find that there is considerable variation in the performance of these mapping methods, and that better performance is attained when clustering is done in higher dimensional spaces (which are otherwise less preferable because they are hard to visualize). High dimensionality means that some algorithms, including the non-parametric watershed cluster assignment algorithm, cannot be used. We developed an alternative watershed algorithm which can be used in high-dimensional spaces when a probability density estimate can be computed directly. With these tools in hand, we examined the behavioral space of fly leg postural dynamics and locomotion. We find a striking division of behavior into modes involving the fore legs and modes involving the hind legs, with few direct transitions between them. By computing behavioral clusters using the data from all flies simultaneously, we show that this division appears to be common to all flies. We also identify individual-to-individual differences in behavior and behavioral transitions. Lastly, we suggest a computational pipeline that can achieve satisfactory levels of performance without the taxing computational demands of a systematic combinatorial approach.
On the robustness of EC-PC spike detection method for online neural recording.

PubMed

Zhou, Yin; Wu, Tong; Rastegarnia, Amir; Guan, Cuntai; Keefer, Edward; Yang, Zhi

2014-09-30

Online spike detection is an important step to compress neural data and perform real-time neural information decoding. An unsupervised, automatic, yet robust signal processing method is strongly desired, so that it can support a wide range of applications. We have developed a novel spike detection algorithm called "exponential component-polynomial component" (EC-PC) spike detection. We firstly evaluate the robustness of the EC-PC spike detector under different firing rates and SNRs. Secondly, we show that the detection precision can be quantitatively derived without requiring additional user input parameters. We have realized the algorithm (including training) in a 0.13 μm CMOS chip, where an unsupervised, nonparametric operation has been demonstrated. Both simulated data and real data are used to evaluate the method under different firing rates (FRs) and SNRs. The results show that the EC-PC spike detector is the most robust in comparison with some popular detectors. Moreover, the EC-PC detector can track changes in the background noise due to the ability to re-estimate the neural data distribution. Both real and synthesized data have been used for testing the proposed algorithm in comparison with other methods, including the absolute thresholding detector (AT), median absolute deviation detector (MAD), nonlinear energy operator detector (NEO), and continuous wavelet detector (CWD). Comparative testing results reveal that the EC-PC detection algorithm performs better than the other algorithms regardless of recording conditions. The EC-PC spike detector can be considered an unsupervised and robust online spike detection method. It is also suitable for hardware implementation. Copyright © 2014 Elsevier B.V. All rights reserved.
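For context, one of the baselines named above, the median absolute deviation (MAD) detector, can be sketched in a few lines. This is a common textbook form and not the EC-PC algorithm. The multiplier k = 4 and the constant 0.6745 (which converts a MAD into a standard-deviation estimate under Gaussian noise) are conventional choices assumed here, and the trace is made up.

```python
from statistics import median


def mad_threshold(signal, k=4.0):
    """Threshold of k robust standard deviations estimated from the MAD."""
    center = median(signal)
    mad = median([abs(x - center) for x in signal])
    sigma = mad / 0.6745  # robust noise estimate, assuming Gaussian noise
    return k * sigma


def detect_spikes(signal, k=4.0):
    """Indices of samples whose deviation from the median exceeds the threshold."""
    center = median(signal)
    threshold = mad_threshold(signal, k)
    return [i for i, x in enumerate(signal) if abs(x - center) > threshold]


# Hypothetical trace: low-amplitude noise with one large deflection at index 5.
trace = [0.1, -0.2, 0.0, 0.15, -0.1, 5.0, 0.05, -0.15, 0.2, -0.05]
spike_indices = detect_spikes(trace)
```

The sketch flags every supra-threshold sample rather than one peak per spike, and it uses a fixed threshold over the whole trace. Tracking changes in background noise, as the abstract describes for EC-PC, would require re-estimating the threshold over time.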
Robust Arm and Hand Tracking by Unsupervised Context Learning

PubMed Central

Spruyt, Vincent; Ledda, Alessandro; Philips, Wilfried

2014-01-01

Hand tracking in video is an increasingly popular research field due to the rise of novel human-computer interaction methods. However, robust and real-time hand tracking in unconstrained environments remains a challenging task due to the high number of degrees of freedom and the non-rigid character of the human hand. In this paper, we propose an unsupervised method to automatically learn the context in which a hand is embedded. This context includes the arm and any other object that coherently moves along with the hand. We introduce two novel methods to incorporate this context information into a probabilistic tracking framework, and introduce a simple yet effective solution to estimate the position of the arm. Finally, we show that our method greatly increases robustness against occlusion and cluttered background, without degrading tracking performance if no contextual information is available. The proposed real-time algorithm is shown to outperform the current state-of-the-art by evaluating it on three publicly available video datasets. Furthermore, a novel dataset is created and made publicly available for the research community. PMID:25004155
A Granular Self-Organizing Map for Clustering and Gene Selection in Microarray Data.

PubMed

Ray, Shubhra Sankar; Ganivada, Avatharam; Pal, Sankar K

2016-09-01

A new granular self-organizing map (GSOM) is developed by integrating the concept of a fuzzy rough set with the SOM. While training the GSOM, the weights of a winning neuron and the neighborhood neurons are updated through a modified learning procedure. The neighborhood is newly defined using the fuzzy rough sets. The clusters (granules) evolved by the GSOM are presented to a decision table as its decision classes. Based on the decision table, a method of gene selection is developed. The effectiveness of the GSOM is shown in both clustering samples and developing an unsupervised fuzzy rough feature selection (UFRFS) method for gene selection in microarray data. While the superior results of the GSOM, as compared with the related clustering methods, are provided in terms of β -index, DB-index, Dunn-index, and fuzzy rough entropy, the genes selected by the UFRFS are not only better in terms of classification accuracy and a feature evaluation index, but also statistically more significant than the related unsupervised methods. The C-codes of the GSOM and UFRFS are available online at http://avatharamg.webs.com/software-code.
A novel partitioning method for block-structured adaptive meshes

DOE Office of Scientific and Technical Information (OSTI.GOV)

Fu, Lin, E-mail: lin.fu@tum.de; Litvinov, Sergej, E-mail: sergej.litvinov@aer.mw.tum.de; Hu, Xiangyu Y., E-mail: xiangyu.hu@tum.de

We propose a novel partitioning method for block-structured adaptive meshes utilizing the meshless Lagrangian particle concept. With the observation that an optimum partitioning has high analogy to the relaxation of a multi-phase fluid to steady state, physically motivated model equations are developed to characterize the background mesh topology and are solved by multi-phase smoothed-particle hydrodynamics. In contrast to well established partitioning approaches, all optimization objectives are implicitly incorporated and achieved during the particle relaxation to stationary state. Distinct partitioning sub-domains are represented by colored particles and separated by a sharp interface with a surface tension model. In order to obtain the particle relaxation, special viscous and skin friction models, coupled with a tailored time integration algorithm are proposed. Numerical experiments show that the present method has several important properties: generation of approximately equal-sized partitions without dependence on the mesh-element type, optimized interface communication between distinct partitioning sub-domains, continuous domain decomposition which is physically localized and implicitly incremental. Therefore it is particularly suitable for load-balancing of high-performance CFD simulations.
A novel partitioning method for block-structured adaptive meshes

NASA Astrophysics Data System (ADS)

Fu, Lin; Litvinov, Sergej; Hu, Xiangyu Y.; Adams, Nikolaus A.

2017-07-01

We propose a novel partitioning method for block-structured adaptive meshes utilizing the meshless Lagrangian particle concept. With the observation that an optimum partitioning has high analogy to the relaxation of a multi-phase fluid to steady state, physically motivated model equations are developed to characterize the background mesh topology and are solved by multi-phase smoothed-particle hydrodynamics. In contrast to well established partitioning approaches, all optimization objectives are implicitly incorporated and achieved during the particle relaxation to stationary state. Distinct partitioning sub-domains are represented by colored particles and separated by a sharp interface with a surface tension model. In order to obtain the particle relaxation, special viscous and skin friction models, coupled with a tailored time integration algorithm are proposed. Numerical experiments show that the present method has several important properties: generation of approximately equal-sized partitions without dependence on the mesh-element type, optimized interface communication between distinct partitioning sub-domains, continuous domain decomposition which is physically localized and implicitly incremental. Therefore it is particularly suitable for load-balancing of high-performance CFD simulations.
Supervised and Unsupervised Self-Testing for HIV in High- and Low-Risk Populations: A Systematic Review

PubMed Central

Pant Pai, Nitika; Sharma, Jigyasa; Shivkumar, Sushmita; Pillay, Sabrina; Vadnais, Caroline; Joseph, Lawrence; Dheda, Keertan; Peeling, Rosanna W.

2013-01-01

Background Stigma, discrimination, lack of privacy, and long waiting times partly explain why six out of ten individuals living with HIV do not access facility-based testing. By circumventing these barriers, self-testing offers potential for more people to know their sero-status. Recent approval of an in-home HIV self test in the US has sparked self-testing initiatives, yet data on acceptability, feasibility, and linkages to care are limited. We systematically reviewed evidence on supervised (self-testing and counselling aided by a health care professional) and unsupervised (performed by self-tester with access to phone/internet counselling) self-testing strategies. Methods and Findings Seven databases (Medline [via PubMed], Biosis, PsycINFO, Cinahl, African Medicus, LILACS, and EMBASE) and conference abstracts of six major HIV/sexually transmitted infections conferences were searched from 1st January 2000–30th October 2012. 1,221 citations were identified and 21 studies included for review. Seven studies evaluated an unsupervised strategy and 14 evaluated a supervised strategy. For both strategies, data on acceptability (range: 74%–96%), preference (range: 61%–91%), and partner self-testing (range: 80%–97%) were high. A high specificity (range: 99.8%–100%) was observed for both strategies, while a lower sensitivity was reported in the unsupervised (range: 92.9%–100%; one study) versus supervised (range: 97.4%–97.9%; three studies) strategy. Regarding feasibility of linkage to counselling and care, 96% (n = 102/106) of individuals testing positive for HIV stated they would seek post-test counselling (unsupervised strategy, one study). No extreme adverse events were noted. The majority of data (n = 11,019/12,402 individuals, 89%) were from high-income settings and 71% (n = 15/21) of studies were cross-sectional in design, thus limiting our analysis. 
Conclusions Both supervised and unsupervised testing strategies were highly acceptable, preferred, and more likely to result in partner self-testing. However, no studies evaluated post-test linkage with counselling and treatment outcomes and reporting quality was poor. Thus, controlled trials of high quality from diverse settings are warranted to confirm and extend these findings. Please see later in the article for the Editors' Summary PMID:23565066
A primitive study on unsupervised anomaly detection with an autoencoder in emergency head CT volumes

NASA Astrophysics Data System (ADS)

Sato, Daisuke; Hanaoka, Shouhei; Nomura, Yukihiro; Takenaga, Tomomi; Miki, Soichiro; Yoshikawa, Takeharu; Hayashi, Naoto; Abe, Osamu

2018-02-01

Purpose: The target disorders of emergency head CT are wide-ranging. Therefore, people working in an emergency department desire a computer-aided detection system for general disorders. In this study, we proposed an unsupervised anomaly detection method in emergency head CT using an autoencoder and evaluated the anomaly detection performance of our method in emergency head CT. Methods: We used a 3D convolutional autoencoder (3D-CAE), which contains 11 layers in the convolution block and 6 layers in the deconvolution block. In the training phase, we trained the 3D-CAE using 10,000 3D patches extracted from 50 normal cases. In the test phase, we calculated abnormalities of each voxel in 38 emergency head CT volumes (22 abnormal cases and 16 normal cases) for evaluation and evaluated the likelihood of lesion existence. Results: Our method achieved a sensitivity of 68% and a specificity of 88%, with an area under the receiver operating characteristic curve of 0.87. This shows that the method has moderate accuracy in distinguishing normal CT cases from abnormal ones. Conclusion: Our method has potential for anomaly detection in emergency head CT.
The Partition of Multi-Resolution LOD Based on Qtm

NASA Astrophysics Data System (ADS)

Hou, M.-L.; Xing, H.-Q.; Zhao, X.-S.; Chen, J.

2011-08-01

The partition hierarchy of the Quaternary Triangular Mesh (QTM) determines the accuracy of spatial analysis and applications based on QTM. In order to resolve the problem that the partition hierarchy of QTM is limited by the level of the computer hardware, a new method of Multi-Resolution LOD (Level of Details) based on QTM is discussed in this paper. This method makes the resolution of the cells vary with the viewpoint position by partitioning the cells of QTM and selecting the particular area according to the viewpoint; by dealing with the cracks caused by different subdivisions, it satisfies the requirement of unlimited local partitioning.
An unsupervised MVA method to compare specific regions in human breast tumor tissue samples using ToF-SIMS.

PubMed

Bluestein, Blake M; Morrish, Fionnuala; Graham, Daniel J; Guenthoer, Jamie; Hockenbery, David; Porter, Peggy L; Gamble, Lara J

2016-03-21

Imaging time-of-flight secondary ion mass spectrometry (ToF-SIMS) and principal component analysis (PCA) were used to investigate two sets of pre- and post-chemotherapy human breast tumor tissue sections to characterize lipids associated with tumor metabolic flexibility and response to treatment. The micron spatial resolution imaging capability of ToF-SIMS provides a powerful approach to attain spatially-resolved molecular and cellular data from cancerous tissues not available with conventional imaging techniques. Three ca. 1 mm² areas per tissue section were analyzed by stitching together 200 μm × 200 μm raster area scans. A method to isolate and analyze specific tissue regions of interest by utilizing PCA of ToF-SIMS images is presented, which allowed separation of cellularized areas from stromal areas. These PCA-generated regions of interest were then used as masks to reconstruct representative spectra from specifically stromal or cellular regions. The advantage of this unsupervised selection method is a reduction in scatter in the spectral PCA results when compared to analyzing all tissue areas or analyzing areas highlighted by a pathologist. Utilizing this method, stromal and cellular regions of breast tissue biopsies taken pre- versus post-chemotherapy demonstrate chemical separation using negatively-charged ion species. In this sample set, the cellular regions were predominantly all cancer cells. Fatty acids (i.e. palmitic, oleic, and stearic), monoacylglycerols, diacylglycerols and vitamin E profiles were distinctively different between the pre- and post-therapy tissues. These results validate a new unsupervised method to isolate and interpret biochemically distinct regions in cancer tissues using imaging ToF-SIMS data. In addition, the method developed here can provide a framework to compare a variety of tissue samples using imaging ToF-SIMS, especially where there is section-to-section variability that makes it difficult to use a serial hematoxylin and eosin (H&E) stained section to direct the SIMS analysis.

Unsupervised background-constrained tank segmentation of infrared images in complex background based on the Otsu method.

PubMed

Zhou, Yulong; Gao, Min; Fang, Dan; Zhang, Baoquan

2016-01-01

In an effort to implement fast and effective tank segmentation from infrared images in complex background, the threshold of the maximum between-class variance method (i.e., the Otsu method) is analyzed and the working mechanism of the Otsu method is discussed. Subsequently, a fast and effective method for tank segmentation from infrared images in complex background is proposed based on the Otsu method via constraining the complex background of the image. Considering the complexity of background, the original image is firstly divided into three classes of target region, middle background and lower background via maximizing the sum of their between-class variances. Then, the unsupervised background constraint is implemented based on the within-class variance of target region and hence the original image can be simplified. Finally, the Otsu method is applied to simplified image for threshold selection. Experimental results on a variety of tank infrared images (880 × 480 pixels) in complex background demonstrate that the proposed method enjoys better segmentation performance and even could be comparative with the manual segmentation in segmented results. In addition, its average running time is only 9.22 ms, implying the new method with good performance in real time processing.
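The classical two-class Otsu step that this method builds on picks the gray level that maximizes the between-class variance of the histogram. The sketch below shows only that basic step. The three-class division and the background constraint described in the abstract are not implemented, and the function name and toy histogram are illustrative.

```python
def otsu_threshold(hist):
    """Return t maximizing between-class variance; levels <= t form class 0.

    hist[i] is the number of pixels with gray level i.
    """
    total = sum(hist)
    weighted_total = sum(level * count for level, count in enumerate(hist))

    w0 = 0          # pixel count in class 0
    sum0 = 0.0      # sum of gray levels in class 0
    best_t, best_var = 0, -1.0
    for t, count in enumerate(hist):
        w0 += count
        sum0 += t * count
        w1 = total - w0
        if w0 == 0:
            continue  # class 0 still empty
        if w1 == 0:
            break     # class 1 empty from here on
        mu0 = sum0 / w0
        mu1 = (weighted_total - sum0) / w1
        # Unnormalized between-class variance; same maximizer as the normalized form.
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


# Toy 8-level histogram with a dark mode (levels 0-1) and a bright mode (levels 6-7).
histogram = [10, 10, 0, 0, 0, 0, 10, 10]
threshold = otsu_threshold(histogram)
```

When several thresholds give the same variance, as in the empty gap of this toy histogram, the sketch keeps the first one. A single pass over the histogram is all that is needed, which is part of why Otsu-based methods are attractive for real-time use.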
Probability density function learning by unsupervised neurons.

PubMed

Fiori, S

2001-10-01

In a recent work, we introduced the concept of pseudo-polynomial adaptive activation function neuron (FAN) and presented an unsupervised information-theoretic learning theory for such structure. The learning model is based on entropy optimization and provides a way of learning probability distributions from incomplete data. The aim of the present paper is to illustrate some theoretical features of the FAN neuron, to extend its learning theory to asymmetrical density function approximation, and to provide an analytical and numerical comparison with other known density function estimation methods, with special emphasis to the universal approximation ability. The paper also provides a survey of PDF learning from incomplete data, as well as results of several experiments performed on real-world problems and signals.
Unsupervised Outlier Profile Analysis

PubMed Central

Ghosh, Debashis; Li, Song

2014-01-01

In much of the analysis of high-throughput genomic data, “interesting” genes have been selected based on assessment of differential expression between two groups or generalizations thereof. Most of the literature focuses on changes in mean expression or the entire distribution. In this article, we explore the use of C(α) tests, which have been applied in other genomic data settings. Their use for the outlier expression problem, in particular with continuous data, is problematic but nevertheless motivates new statistics that give an unsupervised analog to previously developed outlier profile analysis approaches. Some simulation studies are used to evaluate the proposal. A bivariate extension is described that can accommodate data from two platforms on matched samples. The proposed methods are applied to data from a prostate cancer study. PMID:25452686
Dominant partition method. [based on a wave function formalism]

NASA Technical Reports Server (NTRS)

Dixon, R. M.; Redish, E. F.

1979-01-01

By use of the L'Huillier, Redish, and Tandy (LRT) wave function formalism, a partially connected method, the dominant partition method (DPM) is developed for obtaining few body reductions of the many body problem in the LRT and Bencze, Redish, and Sloan (BRS) formalisms. The DPM maps the many body problem to a fewer body one by using the criterion that the truncated formalism must be such that consistency with the full Schroedinger equation is preserved. The DPM is based on a class of new forms for the irreducible cluster potential, which is introduced in the LRT formalism. Connectivity is maintained with respect to all partitions containing a given partition, which is referred to as the dominant partition. Degrees of freedom corresponding to the breakup of one or more of the clusters of the dominant partition are treated in a disconnected manner. This approach for simplifying the complicated BRS equations is appropriate for physical problems where a few body reaction mechanism prevails.
Analysis of Partitioned Methods for the Biot System

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bukac, Martina; Layton, William; Moraiti, Marina

2015-02-18

In this work, we present a comprehensive study of several partitioned methods for the coupling of flow and mechanics. We derive energy estimates for each method for the fully-discrete problem. We write the obtained stability conditions in terms of a key control parameter defined as a ratio of the coupling strength and the speed of propagation. Depending on the parameters in the problem, we give the choice of the partitioned method which allows the largest time step. (C) 2015 Wiley Periodicals, Inc.
Quasi-Supervised Scoring of Human Sleep in Polysomnograms Using Augmented Input Variables

PubMed Central

Yaghouby, Farid; Sunderam, Sridhar

2015-01-01

The limitations of manual sleep scoring make computerized methods highly desirable. Scoring errors can arise from human rater uncertainty or inter-rater variability. Sleep scoring algorithms either come as supervised classifiers that need scored samples of each state to be trained, or as unsupervised classifiers that use heuristics or structural clues in unscored data to define states. We propose a quasi-supervised classifier that models observations in an unsupervised manner but mimics a human rater wherever training scores are available. EEG, EMG, and EOG features were extracted in 30 s epochs from human-scored polysomnograms recorded from 42 healthy human subjects (18 to 79 years) and archived in an anonymized, publicly accessible database. Hypnograms were modified so that: 1. Some states are scored but not others; 2. Samples of all states are scored but not for transitional epochs; and 3. Two raters with 67% agreement are simulated. A framework for quasi-supervised classification was devised in which unsupervised statistical models (specifically Gaussian mixtures and hidden Markov models) are estimated from unlabeled training data, but the training samples are augmented with variables whose values depend on available scores. Classifiers were fitted to signal features incorporating partial scores, and used to predict scores for complete recordings. Performance was assessed using Cohen's κ statistic. The quasi-supervised classifier performed significantly better than an unsupervised model and sometimes as well as a completely supervised model despite receiving only partial scores. The quasi-supervised algorithm addresses the need for classifiers that mimic scoring patterns of human raters while compensating for their limitations. PMID:25679475
Quasi-supervised scoring of human sleep in polysomnograms using augmented input variables.

PubMed

Yaghouby, Farid; Sunderam, Sridhar

2015-04-01

The limitations of manual sleep scoring make computerized methods highly desirable. Scoring errors can arise from human rater uncertainty or inter-rater variability. Sleep scoring algorithms either come as supervised classifiers that need scored samples of each state to be trained, or as unsupervised classifiers that use heuristics or structural clues in unscored data to define states. We propose a quasi-supervised classifier that models observations in an unsupervised manner but mimics a human rater wherever training scores are available. EEG, EMG, and EOG features were extracted in 30 s epochs from human-scored polysomnograms recorded from 42 healthy human subjects (18-79 years) and archived in an anonymized, publicly accessible database. Hypnograms were modified so that: 1. Some states are scored but not others; 2. Samples of all states are scored but not for transitional epochs; and 3. Two raters with 67% agreement are simulated. A framework for quasi-supervised classification was devised in which unsupervised statistical models (specifically Gaussian mixtures and hidden Markov models) are estimated from unlabeled training data, but the training samples are augmented with variables whose values depend on available scores. Classifiers were fitted to signal features incorporating partial scores, and used to predict scores for complete recordings. Performance was assessed using Cohen's κ statistic. The quasi-supervised classifier performed significantly better than an unsupervised model and sometimes as well as a completely supervised model despite receiving only partial scores. The quasi-supervised algorithm addresses the need for classifiers that mimic scoring patterns of human raters while compensating for their limitations. Copyright © 2015 Elsevier Ltd. All rights reserved.
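The abstract above assesses agreement between predicted and human scores with Cohen's kappa, which corrects the observed agreement for the agreement expected by chance. A minimal sketch follows, assuming two equal-length sequences of per-epoch stage labels; the function name `cohens_kappa` and the example labels are hypothetical and are not taken from the study.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(rater_a)
    # Observed agreement: fraction of epochs with identical labels
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal label frequencies
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical scores for four epochs (W = wake, N = NREM sleep)
human = ["W", "W", "N", "N"]
model = ["W", "N", "N", "N"]
kappa = cohens_kappa(human, model)  # observed 0.75, expected 0.5 -> 0.5
```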
Comparing methods for partitioning a decade of carbon dioxide and water vapor fluxes in a temperate forest

Treesearch

Benjamin N. Sulman; Daniel Tyler Roman; Todd M. Scanlon; Lixin Wang; Kimberly A. Novick

2016-01-01

The eddy covariance (EC) method is routinely used to measure net ecosystem fluxes of carbon dioxide (CO2) and evapotranspiration (ET) in terrestrial ecosystems. It is often desirable to partition CO2 flux into gross primary production (GPP) and ecosystem respiration (RE), and to partition ET into evaporation and...
An unsupervised technique for optimal feature selection in attribute profiles for spectral-spatial classification of hyperspectral images

NASA Astrophysics Data System (ADS)

Bhardwaj, Kaushal; Patra, Swarnajyoti

2018-04-01

Inclusion of spatial information along with spectral features plays a significant role in the classification of remote sensing images. Attribute profiles have already proved their ability to represent spatial information. In order to incorporate proper spatial information, multiple attributes are required, and for each attribute large profiles need to be constructed by varying the filter parameter values within a wide range. Thus, the constructed profiles that represent the spectral-spatial information of a hyperspectral image have a huge dimension, which leads to the Hughes phenomenon and increases the computational burden. To mitigate these problems, this work presents an unsupervised feature selection technique that selects a subset of filtered images from the constructed high dimensional multi-attribute profile which are sufficiently informative to discriminate well among classes. In this regard, the proposed technique exploits genetic algorithms (GAs). The fitness function of the GAs is defined in an unsupervised way with the help of mutual information. The effectiveness of the proposed technique is assessed using a one-against-all support vector machine classifier. The experiments conducted on three hyperspectral data sets show the robustness of the proposed method in terms of computation time and classification accuracy.
Discrete Wavelet Transform-Based Whole-Spectral and Subspectral Analysis for Improved Brain Tumor Clustering Using Single Voxel MR Spectroscopy.

PubMed

Yang, Guang; Nawaz, Tahir; Barrick, Thomas R; Howe, Franklyn A; Slabaugh, Greg

2015-12-01

Many approaches have been considered for automatic grading of brain tumors by means of pattern recognition with magnetic resonance spectroscopy (MRS). Providing an improved technique which can assist clinicians in accurately identifying brain tumor grades is our main objective. The proposed technique, which is based on the discrete wavelet transform (DWT) of whole-spectral or subspectral information of key metabolites, combined with unsupervised learning, inspects the separability of the extracted wavelet features from the MRS signal to aid the clustering. In total, we included 134 short echo time single voxel MRS spectra (SV MRS) in our study that cover normal controls, low grade and high grade tumors. The combination of DWT-based whole-spectral or subspectral analysis and unsupervised clustering achieved an overall clustering accuracy of 94.8% and a balanced error rate of 7.8%. To the best of our knowledge, it is the first study using DWT combined with unsupervised learning to cluster brain SV MRS. Instead of dimensionality reduction on SV MRS or feature selection using model fitting, our study provides an alternative method of extracting features to obtain promising clustering results.
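The abstract above reports a balanced error rate alongside overall accuracy. As a hedged illustration, the balanced error rate is commonly computed as the mean of the per-class error rates, so that small classes weigh as much as large ones. The sketch below assumes that each cluster has already been mapped to a class label (the abstract does not say how this was done); the function name and the toy labels are hypothetical.

```python
def balanced_error_rate(true_labels, predicted_labels):
    """Mean of per-class error rates (each class counts equally)."""
    classes = sorted(set(true_labels))
    per_class_error = []
    for c in classes:
        # Indices of the samples that truly belong to class c
        idx = [i for i, t in enumerate(true_labels) if t == c]
        wrong = sum(predicted_labels[i] != c for i in idx)
        per_class_error.append(wrong / len(idx))
    return sum(per_class_error) / len(classes)


# Hypothetical example: class "a" has 1 of 4 wrong, class "b" has 1 of 2 wrong
truth = ["a", "a", "a", "a", "b", "b"]
pred = ["a", "a", "a", "b", "b", "a"]
ber = balanced_error_rate(truth, pred)  # (0.25 + 0.5) / 2 = 0.375
```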
Significant Scales in Community Structure

NASA Astrophysics Data System (ADS)

Traag, V. A.; Krings, G.; van Dooren, P.

2013-10-01

Many complex networks show signs of modular structure, uncovered by community detection. Although many methods succeed in revealing various partitions, it remains difficult to detect at what scale some partition is significant. This problem shows foremost in multi-resolution methods. We here introduce an efficient method for scanning for resolutions in one such method. Additionally, we introduce the notion of "significance" of a partition, based on subgraph probabilities. Significance is independent of the exact method used, so could also be applied in other methods, and can be interpreted as the gain in encoding a graph by making use of a partition. Using significance, we can determine "good" resolution parameters, which we demonstrate on benchmark networks. Moreover, optimizing significance itself also shows excellent performance. We demonstrate our method on voting data from the European Parliament. Our analysis suggests the European Parliament has become increasingly ideologically divided and that nationality plays no role.
Predicate Oriented Pattern Analysis for Biomedical Knowledge Discovery

PubMed Central

Shen, Feichen; Liu, Hongfang; Sohn, Sunghwan; Larson, David W.; Lee, Yugyung

2017-01-01

In the current biomedical data movement, numerous efforts have been made to convert and normalize a large amount of traditional structured and unstructured data (e.g., EHRs, reports) to semi-structured data (e.g., RDF, OWL). With the increasing amount of semi-structured data coming into the biomedical community, data integration and knowledge discovery from heterogeneous domains have become an important research problem. At the application level, detection of related concepts among medical ontologies is an important goal of life science research. It is even more crucial to figure out how different concepts are related within a single ontology or across multiple ontologies by analysing predicates in different knowledge bases. However, the world today is one of information explosion, and it is extremely difficult for biomedical researchers to find existing or potential predicates to perform linking among cross domain concepts without any support from schema pattern analysis. Therefore, there is a need for a mechanism to do predicate oriented pattern analysis to partition heterogeneous ontologies into closer small topics and do query generation to discover cross domain knowledge from each topic. In this paper, we present such a model, which performs predicate oriented pattern analysis based on the predicates' close relationships and generates a similarity matrix. Based on this similarity matrix, we apply an innovative unsupervised learning algorithm to partition large data sets into smaller and closer topics and generate meaningful queries to fully discover knowledge over a set of interlinked data sources. We have implemented a prototype system named BmQGen and evaluated the proposed model with a colorectal surgical cohort from the Mayo Clinic. PMID:28983419
Mastication Evaluation With Unsupervised Learning: Using an Inertial Sensor-Based System.

PubMed

Lucena, Caroline Vieira; Lacerda, Marcelo; Caldas, Rafael; De Lima Neto, Fernando Buarque; Rativa, Diego

2018-01-01

There is a direct relationship between the prevalence of musculoskeletal disorders of the temporomandibular joint and orofacial disorders. A well-elaborated analysis of the jaw movements provides relevant information for healthcare professionals to conclude their diagnosis. Different approaches have been explored to track jaw movements so that mastication analysis becomes less subjective; however, all methods are still highly subjective, and the quality of the assessments depends heavily on the experience of the health professional. In this paper, an accurate and non-invasive method based on a commercial low-cost inertial sensor (MPU6050) to measure jaw movements is proposed. The jaw-movement feature values are compared to those obtained with clinical analysis, showing no statistically significant difference between the two methods. Moreover, we propose to use unsupervised approaches to cluster mastication patterns of healthy subjects and simulated patients with facial trauma. Two techniques were used in this paper to instantiate the method: Kohonen's Self-Organizing Maps and K-Means Clustering. Both algorithms show excellent performance in processing jaw-movement data, with encouraging results and the potential to provide a full assessment of the masticatory function. The proposed method can be applied in real time, providing relevant dynamic information for health-care professionals.
GPU implementation of the simplex identification via split augmented Lagrangian

NASA Astrophysics Data System (ADS)

Sevilla, Jorge; Nascimento, José M. P.

2015-10-01

Hyperspectral imaging can be used for object detection and for discriminating between different objects based on their spectral characteristics. One of the main problems of hyperspectral data analysis is the presence of mixed pixels, due to the low spatial resolution of such images. This means that several spectrally pure signatures (endmembers) are combined into the same mixed pixel. Linear spectral unmixing follows an unsupervised approach which aims at inferring pure spectral signatures and their material fractions at each pixel of the scene. The huge data volumes acquired by such sensors put stringent requirements on processing and unmixing methods. This paper proposes an efficient implementation of an unsupervised linear unmixing method on GPUs using CUDA. The method finds the smallest simplex by solving a sequence of nonsmooth convex subproblems using variable splitting to obtain a constraint formulation, and then applying an augmented Lagrangian technique. The parallel implementation of SISAL presented in this work exploits the GPU architecture at low level, using shared memory and coalesced accesses to memory. The results presented herein indicate that the GPU implementation can significantly accelerate the method's execution over big datasets while maintaining the method's accuracy.
Above-Water Reflectance for the Evaluation of Adjacency Effects in Earth Observation Data: Initial Results and Methods Comparison for Near-Coastal Waters in the Western Channel, UK

NASA Astrophysics Data System (ADS)

Martinez Vicente, V.; Simis, S. G. H.; Alegre, R.; Land, P. E.; Groom, S. B.

2013-09-01

Un-supervised hyperspectral remote-sensing reflectance data (<15 km from the shore) were collected from a moving research vessel. Two different processing methods were compared. The results were similar to concurrent Aqua-MODIS and Suomi-NPP-VIIRS satellite data.
Improving Unstructured Mesh Partitions for Multiple Criteria Using Mesh Adjacencies

DOE PAGES

Smith, Cameron W.; Rasquin, Michel; Ibanez, Dan; ...

2018-02-13

The scalability of unstructured mesh based applications depends on partitioning methods that quickly balance the computational work while reducing communication costs. Zhou et al. [SIAM J. Sci. Comput., 32 (2010), pp. 3201-3227; J. Supercomput., 59 (2012), pp. 1218-1228] demonstrated the combination of (hyper)graph methods with vertex and element partition improvement for PHASTA CFD scaling to hundreds of thousands of processes. Our work generalizes partition improvement to support balancing combinations of all the mesh entity dimensions (vertices, edges, faces, regions) in partitions with imbalances exceeding 70%. Improvement results are then presented for multiple entity dimensions on up to one million processes on meshes with over 12 billion tetrahedral elements.
Improving Unstructured Mesh Partitions for Multiple Criteria Using Mesh Adjacencies

DOE Office of Scientific and Technical Information (OSTI.GOV)

Smith, Cameron W.; Rasquin, Michel; Ibanez, Dan

The scalability of unstructured mesh based applications depends on partitioning methods that quickly balance the computational work while reducing communication costs. Zhou et al. [SIAM J. Sci. Comput., 32 (2010), pp. 3201-3227; J. Supercomput., 59 (2012), pp. 1218-1228] demonstrated the combination of (hyper)graph methods with vertex and element partition improvement for PHASTA CFD scaling to hundreds of thousands of processes. Our work generalizes partition improvement to support balancing combinations of all the mesh entity dimensions (vertices, edges, faces, regions) in partitions with imbalances exceeding 70%. Improvement results are then presented for multiple entity dimensions on up to one million processes on meshes with over 12 billion tetrahedral elements.
Integrating dynamic stopping, transfer learning and language models in an adaptive zero-training ERP speller.

PubMed

Kindermans, Pieter-Jan; Tangermann, Michael; Müller, Klaus-Robert; Schrauwen, Benjamin

2014-06-01

Most BCIs have to undergo a calibration session in which data is recorded to train decoders with machine learning. Only recently zero-training methods have become a subject of study. This work proposes a probabilistic framework for BCI applications which exploit event-related potentials (ERPs). For the example of a visual P300 speller we show how the framework harvests the structure suitable to solve the decoding task by (a) transfer learning, (b) unsupervised adaptation, (c) language model and (d) dynamic stopping. A simulation study compares the proposed probabilistic zero framework (using transfer learning and task structure) to a state-of-the-art supervised model on n = 22 subjects. The individual influence of the involved components (a)-(d) is investigated. Without any need for a calibration session, the probabilistic zero-training framework with inter-subject transfer learning shows excellent performance, competitive with a state-of-the-art supervised method using calibration. Its decoding quality is carried mainly by the effect of transfer learning in combination with continuous unsupervised adaptation. A high-performing zero-training BCI is within reach for one of the most popular BCI paradigms: ERP spelling. Recording calibration data for a supervised BCI would require valuable time which is lost for spelling. The time spent on calibration would allow a novel user to spell 29 symbols with our unsupervised approach. It could be of use for various clinical and non-clinical ERP-applications of BCI.
Integrating dynamic stopping, transfer learning and language models in an adaptive zero-training ERP speller

NASA Astrophysics Data System (ADS)

Kindermans, Pieter-Jan; Tangermann, Michael; Müller, Klaus-Robert; Schrauwen, Benjamin

2014-06-01

Objective. Most BCIs have to undergo a calibration session in which data is recorded to train decoders with machine learning. Only recently zero-training methods have become a subject of study. This work proposes a probabilistic framework for BCI applications which exploit event-related potentials (ERPs). For the example of a visual P300 speller we show how the framework harvests the structure suitable to solve the decoding task by (a) transfer learning, (b) unsupervised adaptation, (c) language model and (d) dynamic stopping. Approach. A simulation study compares the proposed probabilistic zero framework (using transfer learning and task structure) to a state-of-the-art supervised model on n = 22 subjects. The individual influence of the involved components (a)-(d) is investigated. Main results. Without any need for a calibration session, the probabilistic zero-training framework with inter-subject transfer learning shows excellent performance, competitive with a state-of-the-art supervised method using calibration. Its decoding quality is carried mainly by the effect of transfer learning in combination with continuous unsupervised adaptation. Significance. A high-performing zero-training BCI is within reach for one of the most popular BCI paradigms: ERP spelling. Recording calibration data for a supervised BCI would require valuable time which is lost for spelling. The time spent on calibration would allow a novel user to spell 29 symbols with our unsupervised approach. It could be of use for various clinical and non-clinical ERP-applications of BCI.
Classification and analysis of the Rudaki's Area

NASA Astrophysics Data System (ADS)

Zambon, F.; De sanctis, M.; Capaccioni, F.; Filacchione, G.; Carli, C.; Ammannito, E.; Frigeri, A.

2011-12-01

During the first two MESSENGER flybys, the Mercury Dual Imaging System (MDIS) mapped 90% of Mercury's surface. An effective way to study the different terrains on planetary surfaces is to apply classification methods. These are based on clustering algorithms and can be divided into two categories: unsupervised and supervised. Unsupervised classifiers do not require analyst feedback; the algorithm automatically organizes pixel values into classes. In the supervised method, instead, the analyst must choose the "training areas" that define the pixel values of a given class. We applied an unsupervised classifier, ISODATA, to the WAC filter images of the Rudaki's area, where several kinds of terrain have been identified showing differences in albedo, topography and crater density. The ISODATA classifier divides this region into four classes: 1) shadow regions, 2) rough regions, 3) smooth plane, 4) highest reflectance area. ISODATA cannot distinguish the high albedo regions from the highly reflective illuminated edges of the craters; however, the algorithm identifies four classes that can be considered different units, mainly on the basis of their reflectances at the various wavelengths. It is not possible, however, to extrapolate compositional information because of the absence of clear spectral features. An additional analysis was made using ISODATA to choose the "training areas" for further supervised classifications. This approach would allow, for example, the edges of the craters to be separated more accurately from the high reflectance areas, and the low reflectance regions from the shadow areas.

Overlapped Partitioning for Ensemble Classifiers of P300-Based Brain-Computer Interfaces

PubMed Central

Onishi, Akinari; Natsume, Kiyohisa

2014-01-01

A P300-based brain-computer interface (BCI) enables a wide range of people to control devices that improve their quality of life. Ensemble classifiers with naive partitioning were recently applied to the P300-based BCI and these classification performances were assessed. However, they were usually trained on a large amount of training data (e.g., 15300). In this study, we evaluated ensemble linear discriminant analysis (LDA) classifiers with a newly proposed overlapped partitioning method using 900 training data. In addition, the classification performances of the ensemble classifier with naive partitioning and a single LDA classifier were compared. One of three conditions for dimension reduction was applied: the stepwise method, principal component analysis (PCA), or none. The results show that an ensemble stepwise LDA (SWLDA) classifier with overlapped partitioning achieved a better performance than the commonly used single SWLDA classifier and an ensemble SWLDA classifier with naive partitioning. This result implies that the performance of the SWLDA is improved by overlapped partitioning and the ensemble classifier with overlapped partitioning requires less training data than that with naive partitioning. This study contributes towards reducing the required amount of training data and achieving better classification performance. PMID:24695550
Overlapped partitioning for ensemble classifiers of P300-based brain-computer interfaces.

PubMed

Onishi, Akinari; Natsume, Kiyohisa

2014-01-01

A P300-based brain-computer interface (BCI) enables a wide range of people to control devices that improve their quality of life. Ensemble classifiers with naive partitioning were recently applied to the P300-based BCI and these classification performances were assessed. However, they were usually trained on a large amount of training data (e.g., 15300). In this study, we evaluated ensemble linear discriminant analysis (LDA) classifiers with a newly proposed overlapped partitioning method using 900 training data. In addition, the classification performances of the ensemble classifier with naive partitioning and a single LDA classifier were compared. One of three conditions for dimension reduction was applied: the stepwise method, principal component analysis (PCA), or none. The results show that an ensemble stepwise LDA (SWLDA) classifier with overlapped partitioning achieved a better performance than the commonly used single SWLDA classifier and an ensemble SWLDA classifier with naive partitioning. This result implies that the performance of the SWLDA is improved by overlapped partitioning and the ensemble classifier with overlapped partitioning requires less training data than that with naive partitioning. This study contributes towards reducing the required amount of training data and achieving better classification performance.
Using Optimisation Techniques to Granulise Rough Set Partitions

NASA Astrophysics Data System (ADS)

Crossingham, Bodie; Marwala, Tshilidzi

2007-11-01

This paper presents an approach to optimise rough set partition sizes using various optimisation techniques. Three optimisation techniques are implemented to perform the granularisation process, namely, genetic algorithm (GA), hill climbing (HC) and simulated annealing (SA). These optimisation methods maximise the classification accuracy of the rough sets. The proposed rough set partition method is tested on a set of demographic properties of individuals obtained from the South African antenatal survey. The three techniques are compared in terms of their computational time, accuracy and number of rules produced when applied to the Human Immunodeficiency Virus (HIV) data set. The results of the optimised methods are compared to a well-known non-optimised discretisation method, equal-width-bin partitioning (EWB). The accuracies achieved after optimising the partitions using GA, HC and SA are 66.89%, 65.84% and 65.48% respectively, compared to the accuracy of EWB of 59.86%. In addition to providing the plausibilities of the estimated HIV status, the rough sets also provide the linguistic rules describing how the demographic parameters drive the risk of HIV.
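The baseline named in the abstract above, equal-width-bin partitioning (EWB), cuts the range of an attribute into bins of identical width. A minimal sketch follows; the function name `equal_width_bins` and the example values are hypothetical, and the optimised partitions found by GA, HC and SA in the paper are not reproduced here.

```python
def equal_width_bins(values, n_bins):
    """Assign each value the index (0 .. n_bins - 1) of its equal-width bin."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)  # constant attribute: a single bin
    width = (hi - lo) / n_bins
    # The maximum value would fall just past the last bin, so clamp it
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


# Hypothetical attribute values split into four bins of width 5
bins = equal_width_bins([0, 5, 10, 15, 20], n_bins=4)  # [0, 1, 2, 3, 3]
```

Because the cut points depend only on the minimum and maximum, EWB ignores how the values are distributed, which is one reason optimised cut points can give higher accuracy.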
Filtering large-scale event collections using a combination of supervised and unsupervised learning for event trigger classification.

PubMed

Mehryary, Farrokh; Kaewphan, Suwisa; Hakala, Kai; Ginter, Filip

2016-01-01

Biomedical event extraction is one of the key tasks in biomedical text mining, supporting various applications such as database curation and hypothesis generation. Several systems, some of which have been applied at a large scale, have been introduced to solve this task. Past studies have shown that the identification of the phrases describing biological processes, also known as trigger detection, is a crucial part of event extraction, and notable overall performance gains can be obtained by solely focusing on this sub-task. In this paper we propose a novel approach for filtering falsely identified triggers from large-scale event databases, thus improving the quality of knowledge extraction. Our method relies on state-of-the-art word embeddings, event statistics gathered from the whole biomedical literature, and both supervised and unsupervised machine learning techniques. We focus on EVEX, an event database covering the whole PubMed and PubMed Central Open Access literature containing more than 40 million extracted events. The most frequent EVEX trigger words are hierarchically clustered, and the resulting cluster tree is pruned to identify words that can never act as triggers regardless of their context. For rarely occurring trigger words we introduce a supervised approach trained on the combination of trigger word classification produced by the unsupervised clustering method and manual annotation. The method is evaluated on the official test set of BioNLP Shared Task on Event Extraction. The evaluation shows that the method can be used to improve the performance of the state-of-the-art event extraction systems. This successful effort also translates into removing 1,338,075 potentially incorrect events from EVEX, thus greatly improving the quality of the data. The method is not solely bound to the EVEX resource and can thus be used to improve the quality of any event extraction system or database.
The data and source code for this work are available at: http://bionlp-www.utu.fi/trigger-clustering/.
Classify epithelium-stroma in histopathological images based on deep transferable network.

PubMed

Yu, X; Zheng, H; Liu, C; Huang, Y; Ding, X

2018-04-20

Recently, deep learning methods have received more attention in histopathological image analysis. However, traditional deep learning methods assume that training data and test data have the same distributions, which causes certain limitations in real-world histopathological applications. Moreover, it is costly to recollect a large amount of labeled histology data to train a new neural network for each specified image acquisition procedure, even for similar tasks. In this paper, unsupervised domain adaptation is introduced into a typical deep convolutional neural network (CNN) model to reduce the need for repeated labeling. The unsupervised domain adaptation is implemented by adding two regularisation terms, namely feature-based adaptation and entropy minimisation, to the objective function of a widely used CNN model, AlexNet. Three independent public epithelium-stroma datasets were used to verify the proposed method. The experimental results have demonstrated that in epithelium-stroma classification, the proposed method can achieve better performance than the commonly used deep learning methods and some existing deep domain adaptation methods. Therefore, the proposed method can be considered a better option for real-world applications of histopathological image analysis because there is no requirement for recollection of large-scale labeled data for every specified domain. © 2018 The Authors Journal of Microscopy © 2018 Royal Microscopical Society.
Automated segmentation of white matter fiber bundles using diffusion tensor imaging data and a new density based clustering algorithm.

PubMed

Kamali, Tahereh; Stashuk, Daniel

2016-10-01

Robust and accurate segmentation of brain white matter (WM) fiber bundles assists in diagnosing and assessing progression or remission of neuropsychiatric diseases such as schizophrenia, autism and depression. Supervised segmentation methods are infeasible in most applications since generating gold standards is too costly. Hence, there is a growing interest in designing unsupervised methods. However, most conventional unsupervised methods require the number of clusters to be known in advance, which is not possible in most applications. The purpose of this study is to design an unsupervised segmentation algorithm for brain white matter fiber bundles which can automatically segment fiber bundles using intrinsic diffusion tensor imaging data information without considering any prior information or assumption about data distributions. Here, a new density based clustering algorithm called neighborhood distance entropy consistency (NDEC) is proposed, which discovers natural clusters within data by simultaneously utilizing both local and global density information. The performance of NDEC is compared with other state of the art clustering algorithms including chameleon, spectral clustering, DBSCAN and k-means using Johns Hopkins University publicly available diffusion tensor imaging data. The performance of NDEC and the other employed clustering algorithms was evaluated using the dice ratio as an external evaluation criterion and the density based clustering validation (DBCV) index as an internal evaluation metric. Across all employed clustering algorithms, NDEC obtained the highest average dice ratio (0.94) and DBCV value (0.71). NDEC can find clusters with arbitrary shapes and densities and consequently can be used for WM fiber bundle segmentation where there is no distinct boundary between various bundles. NDEC may also be used as an effective tool in other pattern recognition and medical diagnostic systems in which discovering natural clusters within data is a necessity.
Copyright © 2016 Elsevier B.V. All rights reserved.
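The external criterion named in the abstract above, the Dice ratio, can be illustrated with a minimal Python sketch. It assumes each bundle is represented as a set of element ids (for example fiber or voxel indices); the function name and the toy ids are hypothetical and are not taken from the study.

```python
def dice_ratio(segmented, reference):
    """Dice ratio 2|A ∩ B| / (|A| + |B|) between two collections of ids."""
    segmented, reference = set(segmented), set(reference)
    if not segmented and not reference:
        return 1.0  # convention chosen here: two empty sets agree perfectly
    overlap = len(segmented & reference)
    return 2.0 * overlap / (len(segmented) + len(reference))


# Hypothetical toy ids, not data from the study.
predicted_bundle = {1, 2, 3, 4, 5}
reference_bundle = {2, 3, 4, 5, 6, 7}
score = dice_ratio(predicted_bundle, reference_bundle)  # 2 * 4 / (5 + 6)
```

A value of 1 means the two sets coincide and 0 means they share no element, so an average of 0.94 indicates close agreement with the reference bundles.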
Insights into quasar UV spectra using unsupervised clustering analysis

NASA Astrophysics Data System (ADS)

Tammour, A.; Gallagher, S. C.; Daley, M.; Richards, G. T.

2016-06-01

Machine learning techniques can provide powerful tools to detect patterns in multidimensional parameter space. We use K-means - a simple yet powerful unsupervised clustering algorithm which picks out structure in unlabelled data - to study a sample of quasar UV spectra from the Quasar Catalog of the 10th Data Release of the Sloan Digital Sky Survey (SDSS-DR10) of Paris et al. Detecting patterns in large data sets helps us gain insights into the physical conditions and processes giving rise to the observed properties of quasars. We use K-means to find clusters in the parameter space of the equivalent width (EW), the blue- and red-half-width at half-maximum (HWHM) of the Mg II 2800 Å line, the C IV 1549 Å line, and the C III] 1908 Å blend in samples of broad absorption line (BAL) and non-BAL quasars at redshift 1.6-2.1. Using this method, we successfully recover correlations well-known in the UV regime such as the anti-correlation between the EW and blueshift of the C IV emission line and the shape of the ionizing spectral energy distribution (SED) probed by the strength of He II and the Si III]/C III] ratio. We find this to be particularly evident when the properties of C III] are used to find the clusters, while those of Mg II proved to be less strongly correlated with the properties of the other lines in the spectra such as the width of C IV or the Si III]/C III] ratio. We conclude that unsupervised clustering methods (such as K-means) are powerful methods for finding 'natural' binning boundaries in multidimensional data sets and discuss caveats and future work.
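As a rough illustration of the K-means procedure referred to above, the following pure-Python sketch alternates the usual assignment and update steps on a three-feature parameter space. The feature rows are invented toy values standing in for (EW, blue HWHM, red HWHM); the function name, the random initialisation and the stopping rule are assumptions of this sketch, not details reported by the authors.

```python
import random


def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd-style K-means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    # Start from k distinct observations chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(range(k),
                key=lambda c: sum((x - m) ** 2 for x, m in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical rows of (EW, blue HWHM, red HWHM) in arbitrary units.
toy_lines = [
    (1.0, 1.0, 1.0), (1.2, 0.9, 1.1), (0.8, 1.1, 0.9),
    (5.0, 5.0, 5.0), (5.2, 4.9, 5.1), (4.8, 5.1, 4.9),
]
labels, centroids = kmeans(toy_lines, k=2)
```

In practice the features would normally be standardised first, since K-means relies on Euclidean distance and is sensitive to the relative scale of each axis.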
Unsupervised online classifier in sleep scoring for sleep deprivation studies.

PubMed

Libourel, Paul-Antoine; Corneyllie, Alexandra; Luppi, Pierre-Hervé; Chouvet, Guy; Gervasoni, Damien

2015-05-01

This study was designed to evaluate an unsupervised adaptive algorithm for real-time detection of sleep and wake states in rodents. We designed a Bayesian classifier that automatically extracts electroencephalogram (EEG) and electromyogram (EMG) features and categorizes non-overlapping 5-s epochs into one of the three major sleep and wake states without any human supervision. This sleep-scoring algorithm is coupled online with a new device to perform selective paradoxical sleep deprivation (PSD). Controlled laboratory settings for chronic polygraphic sleep recordings and selective PSD. Ten adult Sprague-Dawley rats instrumented for chronic polysomnographic recordings. The performance of the algorithm is evaluated by comparison with the score obtained by a human expert reader. Online detection of PS is then validated with a PSD protocol with duration of 72 hours. Our algorithm gave a high concordance with human scoring with an average κ coefficient > 70%. Notably, the specificity to detect PS reached 92%. Selective PSD using real-time detection of PS strongly reduced PS amounts, leaving only brief PS bouts necessary for the detection of PS in EEG and EMG signals (4.7 ± 0.7% over 72 h, versus 8.9 ± 0.5% in baseline), and was followed by a significant PS rebound (23.3 ± 3.3% over 150 minutes). Our fully unsupervised data-driven algorithm overcomes some limitations of the other automated methods such as the selection of representative descriptors or threshold settings. When used online and coupled with our sleep deprivation device, it represents a better option for selective PSD than other methods like the tedious gentle handling or the platform method. © 2015 Associated Professional Sleep Societies, LLC.
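The concordance measure quoted above, the κ coefficient, corrects raw agreement for the agreement expected by chance. A minimal sketch of Cohen's κ for two scorers follows; the epoch labels (W, SWS, PS) and the toy sequences are hypothetical and only illustrate the arithmetic.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa between two equally long label sequences."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of epochs scored identically.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each scorer's marginal label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[s] * count_b[s] for s in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both scorers used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical 5-s epoch scores, not data from the study.
algorithm = ["W", "W", "SWS", "SWS", "PS", "PS", "W", "SWS"]
expert = ["W", "W", "SWS", "SWS", "PS", "SWS", "W", "SWS"]
kappa = cohen_kappa(algorithm, expert)  # (7/8 - 23/64) / (1 - 23/64) = 33/41
```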
On the unsupervised analysis of domain-specific Chinese texts

PubMed Central

Deng, Ke; Bol, Peter K.; Li, Kate J.; Liu, Jun S.

2016-01-01

With the growing availability of digitized text data both publicly and privately, there is a great need for effective computational tools to automatically extract information from texts. Because the Chinese language differs most significantly from alphabet-based languages in not specifying word boundaries, most existing Chinese text-mining methods require a prespecified vocabulary and/or a large relevant training corpus, which may not be available in some applications. We introduce an unsupervised method, top-down word discovery and segmentation (TopWORDS), for simultaneously discovering and segmenting words and phrases from large volumes of unstructured Chinese texts, and propose ways to order discovered words and conduct higher-level context analyses. TopWORDS is particularly useful for mining online and domain-specific texts where the underlying vocabulary is unknown or the texts of interest differ significantly from available training corpora. When outputs from TopWORDS are fed into context analysis tools such as topic modeling, word embedding, and association pattern finding, the results are as good as or better than those obtained using outputs of a supervised segmentation method. PMID:27185919
Unsupervised feature relevance analysis applied to improve ECG heartbeat clustering.

PubMed

Rodríguez-Sotelo, J L; Peluffo-Ordoñez, D; Cuesta-Frau, D; Castellanos-Domínguez, G

2012-10-01

The computer-assisted analysis of biomedical records has become an essential tool in clinical settings. However, current devices provide a growing amount of data that often exceeds the processing capacity of normal computers. As this amount of information rises, new demands for more efficient data extracting methods appear. This paper addresses the task of data mining in physiological records using a feature selection scheme. An unsupervised method based on relevance analysis is described. This scheme uses a least-squares optimization of the input feature matrix in a single iteration. The output of the algorithm is a feature weighting vector. The performance of the method was assessed using a heartbeat clustering test on real ECG records. The quantitative cluster validity measures yielded a correctly classified heartbeat rate of 98.69% (specificity), 85.88% (sensitivity) and 95.04% (general clustering performance), which is even higher than the performance achieved by other similar ECG clustering studies. The number of features was reduced on average from 100 to 18, and the temporal cost was 43% lower than in previous ECG clustering schemes. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.
Statistical approaches for studying the wave climate of crossing-sea states

NASA Astrophysics Data System (ADS)

Barbariol, Francesco; Portilla, Jesus; Benetazzo, Alvise; Cavaleri, Luigi; Sclavo, Mauro; Carniel, Sandro

2017-04-01

Surface waves are an important feature of the world's oceans and seas. Their role in the air-sea exchanges is well recognized, together with their effects on the upper ocean and lower atmosphere dynamics. Physical processes involving surface waves contribute to driving the Earth's climate that, while experiencing changes at global and regional scales, in turn affects the surface wave climate over the oceans. The assessment of the wave climate at specific locations of the ocean is fruitful for many research fields in marine and atmospheric sciences and also for the human activities in the marine environment. Very often, wind generated waves (wind-sea) and one or more swell systems occur simultaneously, depending on the complexity of the atmospheric conditions that force the waves. Therefore, a wave climate assessed from the statistical analysis of long time series of integral wave parameters can say little about the frequency of occurrence of the so-called crossing-seas, or about their features. Directional wave spectra carry such information but proper statistical methods to analyze them are needed. In this respect, in order to identify the crossing sea states within the spectral time series and to assess their frequency of occurrence we exploit two advanced statistical techniques. First, we apply the Spectral Partitioning, a well-established method based on a two-step partitioning of the spectrum that makes it possible to identify the individual wave systems and to compute their probability of occurrence in the frequency/direction space. Then, we use the Self-Organizing Maps, an unsupervised neural network algorithm that quantizes the time series by autonomously identifying an arbitrary (small) number of wave spectra representing the whole wave climate, each with its frequency of occurrence. This method has been previously applied to time series of wave parameters and for the first time is applied to directional wave spectra.
We analyze the wave climate of one of the most severe regions of the Mediterranean Sea, between north-west Sardinia and the Gulf of Lion, where quite often wave systems coming from different directions superpose. The time series for the analysis is taken from the ERA-Interim Reanalysis dataset, which provides global directional wave spectra at 1° resolution, starting from 1979 up to the present. Results from the two techniques are shown to be consistent, and their comparison points out the contribution that each technique can provide for a more detailed interpretation of the wave climate.
Best friends' interactions and substance use: The role of friend pressure and unsupervised co-deviancy.

PubMed

Tsakpinoglou, Florence; Poulin, François

2017-10-01

Best friends exert a substantial influence on rising alcohol and marijuana use during adolescence. Two mechanisms occurring within friendship - friend pressure and unsupervised co-deviancy - may partially capture the way friends influence one another. The current study aims to: (1) examine the psychometric properties of a new instrument designed to assess pressure from a youth's best friend and unsupervised co-deviancy; (2) investigate the relative contribution of these processes to alcohol and marijuana use; and (3) determine whether gender moderates these associations. Data were collected through self-report questionnaires completed by 294 Canadian youths (62% female) across two time points (ages 15-16). Principal component analysis yielded a two-factor solution corresponding to friend pressure and unsupervised co-deviancy. Logistic regressions subsequently showed that unsupervised co-deviancy was predictive of an increase in marijuana use one year later. Neither process predicted an increase in alcohol use. Results did not differ as a function of gender. Copyright © 2017 The Foundation for Professionals in Services for Adolescents. Published by Elsevier Ltd. All rights reserved.
Use of strainrange partitioning to predict high temperature low-cycle fatigue life. [of metallic materials

NASA Technical Reports Server (NTRS)

Hirschberg, M. H.; Halford, G. R.

1976-01-01

The fundamental concepts of the strainrange partitioning approach to high temperature, low-cycle fatigue are reviewed. Procedures are presented by which the partitioned strainrange versus life relationships for any material can be generated. Laboratory tests are suggested for further verifying the ability of the method of strainrange partitioning to predict life.
The neural network classification of false killer whale (Pseudorca crassidens) vocalizations.

PubMed

Murray, S O; Mercado, E; Roitblat, H L

1998-12-01

This study reports the use of unsupervised, self-organizing neural networks to categorize the repertoire of false killer whale vocalizations. Self-organizing networks are capable of detecting patterns in their input and partitioning those patterns into categories without requiring that the number or types of categories be predefined. The inputs for the neural networks were two-dimensional characterizations of false killer whale vocalizations, where each vocalization was characterized by a sequence of short-time measurements of duty cycle and peak frequency. The first neural network used competitive learning, where units in a competitive layer distributed themselves to recognize frequently presented input vectors. This network resulted in classes representing typical patterns in the vocalizations. The second network was a Kohonen feature map which organized the outputs topologically, providing a graphical organization of pattern relationships. The networks performed well as measured by (1) the average correlation between the input vectors and the weight vectors for each category, and (2) the ability of the networks to classify novel vocalizations. The techniques used in this study could easily be applied to other species and facilitate the development of objective, comprehensive repertoire models.
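The competitive-learning idea described above, in which units move toward the inputs they win, can be sketched as a simple winner-take-all update. This is a generic textbook-style illustration, not the authors' network: the two-element input vectors, the learning rate, the number of epochs and the function names are all assumptions made for the example.

```python
import random


def nearest_unit(weights, x):
    """Index of the unit whose weight vector is closest to x."""
    return min(range(len(weights)),
               key=lambda u: sum((xi - wi) ** 2 for xi, wi in zip(x, weights[u])))


def competitive_learning(inputs, n_units, lr=0.1, epochs=100, seed=0):
    """Winner-take-all learning: only the winning unit moves toward each input."""
    rng = random.Random(seed)
    # Initialise each unit on a distinct, randomly chosen input vector.
    weights = [list(x) for x in rng.sample(inputs, n_units)]
    order = list(inputs)
    for _ in range(epochs):
        rng.shuffle(order)  # present the inputs in a new random order each epoch
        for x in order:
            w = weights[nearest_unit(weights, x)]
            for d in range(len(w)):
                w[d] += lr * (x[d] - w[d])  # pull the winner toward the input
    return weights


# Hypothetical normalised (duty cycle, peak frequency) vectors, two rough groups.
toy_calls = [
    (0.1, 0.1), (0.2, 0.1), (0.1, 0.2),
    (0.9, 0.9), (0.8, 0.9), (0.9, 0.8),
]
weights = competitive_learning(toy_calls, n_units=2)
categories = [nearest_unit(weights, x) for x in toy_calls]
```

A novel vocalization would be categorised the same way, by passing its feature vector to `nearest_unit` with the learned weights.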
An unsupervised classification approach for analysis of Landsat data to monitor land reclamation in Belmont county, Ohio

NASA Technical Reports Server (NTRS)

Brumfield, J. O.; Bloemer, H. H. L.; Campbell, W. J.

1981-01-01

Two unsupervised classification procedures for analyzing Landsat data used to monitor land reclamation in a surface mining area in east central Ohio are compared for agreement with data collected from the corresponding locations on the ground. One procedure is based on a traditional unsupervised-clustering/maximum-likelihood algorithm sequence that assumes spectral groupings in the Landsat data in n-dimensional space; the other is based on a nontraditional unsupervised-clustering/canonical-transformation/clustering algorithm sequence that not only assumes spectral groupings in n-dimensional space but also includes an additional feature-extraction technique. It is found that the nontraditional procedure provides an appreciable improvement in spectral groupings and apparently increases the level of accuracy in the classification of land cover categories.
Transcriptomic-based effects monitoring for endocrine active chemicals: Assessing relative contribution of treated wastewater to downstream pollution

EPA Science Inventory

The present study investigated whether combining of targeted analytical chemistry methods with unsupervised, data-rich methodologies (i.e. transcriptomics) can be utilized to evaluate relative contributions of wastewater treatment plant (WWTP) effluents to biological effects. The...
Machine learning for neuroimaging with scikit-learn.

PubMed

Abraham, Alexandre; Pedregosa, Fabian; Eickenberg, Michael; Gervais, Philippe; Mueller, Andreas; Kossaifi, Jean; Gramfort, Alexandre; Thirion, Bertrand; Varoquaux, Gaël

2014-01-01

Statistical machine learning methods are increasingly used for neuroimaging data analysis. Their main virtue is their ability to model high-dimensional datasets, e.g., multivariate analysis of activation images or resting-state time series. Supervised learning is typically used in decoding or encoding settings to relate brain images to behavioral or clinical observations, while unsupervised learning can uncover hidden structures in sets of images (e.g., resting state functional MRI) or find sub-populations in large cohorts. By considering different functional neuroimaging applications, we illustrate how scikit-learn, a Python machine learning library, can be used to perform some key analysis steps. Scikit-learn contains a very large set of statistical learning algorithms, both supervised and unsupervised, and its application to neuroimaging data provides a versatile tool to study the brain.
Machine learning for neuroimaging with scikit-learn

PubMed Central

Abraham, Alexandre; Pedregosa, Fabian; Eickenberg, Michael; Gervais, Philippe; Mueller, Andreas; Kossaifi, Jean; Gramfort, Alexandre; Thirion, Bertrand; Varoquaux, Gaël

2014-01-01

Statistical machine learning methods are increasingly used for neuroimaging data analysis. Their main virtue is their ability to model high-dimensional datasets, e.g., multivariate analysis of activation images or resting-state time series. Supervised learning is typically used in decoding or encoding settings to relate brain images to behavioral or clinical observations, while unsupervised learning can uncover hidden structures in sets of images (e.g., resting state functional MRI) or find sub-populations in large cohorts. By considering different functional neuroimaging applications, we illustrate how scikit-learn, a Python machine learning library, can be used to perform some key analysis steps. Scikit-learn contains a very large set of statistical learning algorithms, both supervised and unsupervised, and its application to neuroimaging data provides a versatile tool to study the brain. PMID:24600388
Metric Learning for Hyperspectral Image Segmentation

NASA Technical Reports Server (NTRS)

Bue, Brian D.; Thompson, David R.; Gilmore, Martha S.; Castano, Rebecca

2011-01-01

We present a metric learning approach to improve the performance of unsupervised hyperspectral image segmentation. Unsupervised spatial segmentation can assist both user visualization and automatic recognition of surface features. Analysts can use spatially-continuous segments to decrease noise levels and/or localize feature boundaries. However, existing segmentation methods use task-agnostic measures of similarity. Here we learn task-specific similarity measures from training data, improving segment fidelity to classes of interest. Multiclass Linear Discriminant Analysis produces a linear transform that optimally separates a labeled set of training classes. This defines a distance metric that generalizes to new scenes, enabling graph-based segmentation that emphasizes key spectral features. We describe tests based on data from the Compact Reconnaissance Imaging Spectrometer (CRISM) in which learned metrics improve segment homogeneity with respect to mineralogical classes.
Perceptual approach for unsupervised digital color restoration of cinematographic archives

NASA Astrophysics Data System (ADS)

Chambah, Majed; Rizzi, Alessandro; Gatta, Carlo; Besserer, Bernard; Marini, Daniele

2003-01-01

The cinematographic archives represent an important part of our collective memory. We present in this paper some advances in automating the color fading restoration process, especially with regard to the automatic color correction technique. The proposed color correction method is based on the ACE model, an unsupervised color equalization algorithm based on a perceptual approach and inspired by some adaptation mechanisms of the human visual system, in particular lightness constancy and color constancy. A perceptual approach has some advantages, mainly its robustness and its local filtering properties, which lead to more effective results. The resulting technique is not just an application of ACE to movie images, but an enhancement of ACE principles to meet the requirements of the digital film restoration field. The presented preliminary results are satisfying and promising.

Analysis On Land Cover In Municipality Of Malang With Landsat 8 Image Through Unsupervised Classification

NASA Astrophysics Data System (ADS)

Nahari, R. V.; Alfita, R.

2018-01-01

Remote sensing technology has been widely used in geographic information systems in order to obtain data more quickly, accurately and affordably. One of the advantages of remote sensing imagery (satellite imagery) is that it can be used to analyze land cover and land use. Satellite image data used in this study were images from the Landsat 8 satellite combined with data from the Municipality of Malang government. The satellite image was taken in July 2016. The method used in this study was unsupervised classification. Based on the analysis of the satellite images and field observations, 29% of the land in the Municipality of Malang was plantation, 22% of the area was rice field, 12% was residential area, 10% was land with shrubs, and the remaining 2% was water (lake/reservoir). A shortcoming of the method was that 25% of the land in the area was unidentified because it was covered by cloud. It is expected that future researchers will include cloud removal processing to minimize the unidentified area.
Globally maximizing, locally minimizing: unsupervised discriminant projection with applications to face and palm biometrics.

PubMed

Yang, Jian; Zhang, David; Yang, Jing-Yu; Niu, Ben

2007-04-01

This paper develops an unsupervised discriminant projection (UDP) technique for dimensionality reduction of high-dimensional data in small sample size cases. UDP can be seen as a linear approximation of a multimanifolds-based learning framework which takes into account both the local and nonlocal quantities. UDP characterizes the local scatter as well as the nonlocal scatter, seeking to find a projection that simultaneously maximizes the nonlocal scatter and minimizes the local scatter. This characteristic makes UDP more intuitive and more powerful than the most up-to-date method, Locality Preserving Projection (LPP), which considers only the local scatter for clustering or classification tasks. The proposed method is applied to face and palm biometrics and is examined using the Yale, FERET, and AR face image databases and the PolyU palmprint database. The experimental results show that UDP consistently outperforms LPP and PCA and outperforms LDA when the training sample size per class is small. This demonstrates that UDP is a good choice for real-world biometrics applications.
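The trade-off described above, maximizing nonlocal scatter while minimizing local scatter, is commonly written as a ratio criterion. The LaTeX below is a sketch under assumed notation: x_1, ..., x_M are the training samples, H_{ij} is a hypothetical neighbor indicator (1 if x_i and x_j are neighbors, for example K-nearest neighbors, and 0 otherwise), and the normalization constant is a convention chosen here. The abstract itself does not give these definitions.

```latex
% Assumed notation: M samples x_i; H_{ij} = 1 for neighboring pairs, 0 otherwise.
\begin{align}
S_L &= \frac{1}{2M^2} \sum_{i=1}^{M} \sum_{j=1}^{M} H_{ij}\,(x_i - x_j)(x_i - x_j)^{T}
      && \text{(local scatter)} \\
S_N &= \frac{1}{2M^2} \sum_{i=1}^{M} \sum_{j=1}^{M} (1 - H_{ij})\,(x_i - x_j)(x_i - x_j)^{T}
      && \text{(nonlocal scatter)} \\
w^{*} &= \arg\max_{w} \frac{w^{T} S_N\, w}{w^{T} S_L\, w}
      && \text{(projection direction)}
\end{align}
```

Under this reading, the projection directions would be the leading generalized eigenvectors of the pair (S_N, S_L), in the same way that LDA uses the between-class and within-class scatter matrices.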
A neural-visualization IDS for honeynet data.

PubMed

Herrero, Álvaro; Zurutuza, Urko; Corchado, Emilio

2012-04-01

Neural intelligent systems can provide a visualization of the network traffic for security staff, in order to reduce the widely known high false-positive rate associated with misuse-based Intrusion Detection Systems (IDSs). Unlike previous work, this study proposes unsupervised neural models that generate an intuitive visualization of the captured traffic, rather than network statistics. These snapshots of network events are immensely useful for security personnel who monitor network behavior. The system is based on the use of different neural projection and unsupervised methods for the visual inspection of honeypot data, and may be seen as a complementary network security tool that sheds light on internal data structures through visual inspection of the traffic itself. Furthermore, it is intended to facilitate verification and assessment of Snort performance (a well-known and widely-used misuse-based IDS), through the visualization of attack patterns. Empirical verification and comparison of the proposed projection methods are performed in a real domain, where two different case studies are defined and analyzed.
Age and gender classification in the wild with unsupervised feature learning

NASA Astrophysics Data System (ADS)

Wan, Lihong; Huo, Hong; Fang, Tao

2017-03-01

Inspired by unsupervised feature learning (UFL) within the self-taught learning framework, we propose a method based on UFL, convolution representation, and part-based dimensionality reduction to handle facial age and gender classification, which are two challenging problems under unconstrained circumstances. First, UFL is introduced to learn selective receptive fields (filters) automatically by applying whitening transformation and spherical k-means on random patches collected from unlabeled data. The learning process is fast and has no hyperparameters to tune. Then, the input image is convolved with these filters to obtain filtering responses on which local contrast normalization is applied. Average pooling and feature concatenation are then used to form global face representation. Finally, linear discriminant analysis with part-based strategy is presented to reduce the dimensions of the global representation and to improve classification performances further. Experiments on three challenging databases, namely, Labeled faces in the wild, Gallagher group photos, and Adience, demonstrate the effectiveness of the proposed method relative to that of state-of-the-art approaches.
Unsupervised spike sorting based on discriminative subspace learning.

PubMed

Keshtkaran, Mohammad Reza; Yang, Zhi

2014-01-01

Spike sorting is a fundamental preprocessing step for many neuroscience studies which rely on the analysis of spike trains. In this paper, we present two unsupervised spike sorting algorithms based on discriminative subspace learning. The first algorithm simultaneously learns the discriminative feature subspace and performs clustering. It uses histogram of features in the most discriminative projection to detect the number of neurons. The second algorithm performs hierarchical divisive clustering that learns a discriminative 1-dimensional subspace for clustering in each level of the hierarchy until achieving almost unimodal distribution in the subspace. The algorithms are tested on synthetic and in-vivo data, and are compared against two widely used spike sorting methods. The comparative results demonstrate that our spike sorting methods can achieve substantially higher accuracy in lower dimensional feature space, and they are highly robust to noise. Moreover, they provide significantly better cluster separability in the learned subspace than in the subspace obtained by principal component analysis or wavelet transform.
MARTA GANs: Unsupervised Representation Learning for Remote Sensing Image Classification

NASA Astrophysics Data System (ADS)

Lin, Daoyu; Fu, Kun; Wang, Yang; Xu, Guangluan; Sun, Xian

2017-11-01

With the development of deep learning, supervised learning has frequently been adopted to classify remotely sensed images using convolutional networks (CNNs). However, due to the limited amount of labeled data available, supervised learning is often difficult to carry out. Therefore, we proposed an unsupervised model called multiple-layer feature-matching generative adversarial networks (MARTA GANs) to learn a representation using only unlabeled data. MARTA GANs consists of both a generative model $G$ and a discriminative model $D$. We treat $D$ as a feature extractor. To fit the complex properties of remote sensing data, we use a fusion layer to merge the mid-level and global features. $G$ can produce numerous images that are similar to the training data; therefore, $D$ can learn better representations of remotely sensed images using the training data provided by $G$. The classification results on two widely used remote sensing image databases show that the proposed method significantly improves the classification performance compared with other state-of-the-art methods.
Automatic identification of the number of food items in a meal using clustering techniques based on the monitoring of swallowing and chewing.

PubMed

Lopez-Meyer, Paulo; Schuckers, Stephanie; Makeyev, Oleksandr; Fontana, Juan M; Sazonov, Edward

2012-09-01

The number of distinct foods consumed in a meal is of significant clinical concern in the study of obesity and other eating disorders. This paper proposes the use of information contained in chewing and swallowing sequences for meal segmentation by food types. Data collected from experiments of 17 volunteers were analyzed using two different clustering techniques. First, an unsupervised clustering technique, Affinity Propagation (AP), was used to automatically identify the number of segments within a meal. Second, performance of the unsupervised AP method was compared to a supervised learning approach based on Agglomerative Hierarchical Clustering (AHC). While the AP method was able to obtain 90% accuracy in predicting the number of food items, the AHC achieved an accuracy >95%. Experimental results suggest that the proposed models of automatic meal segmentation may be utilized as part of an integral application for objective Monitoring of Ingestive Behavior in free living conditions.
Unsupervised Indoor Localization Based on Smartphone Sensors, iBeacon and Wi-Fi.

PubMed

Chen, Jing; Zhang, Yi; Xue, Wei

2018-04-28

In this paper, we propose UILoc, an unsupervised indoor localization scheme that uses a combination of smartphone sensors, iBeacons and Wi-Fi fingerprints for reliable and accurate indoor localization with zero labor cost. Firstly, compared with the fingerprint-based method, the UILoc system can build a fingerprint database automatically without any site survey and the database will be applied in the fingerprint localization algorithm. Secondly, since the initial position is vital to the system, UILoc will provide the basic location estimation through the pedestrian dead reckoning (PDR) method. To provide accurate initial localization, this paper proposes an initial localization module, a weighted fusion algorithm combined with a k-nearest neighbors (KNN) algorithm and a least squares algorithm. In UILoc, we have also designed a reliable model to reduce the landmark correction error. Experimental results show that the UILoc can provide accurate positioning, the average localization error is about 1.1 m in the steady state, and the maximum error is 2.77 m.
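The pedestrian dead reckoning (PDR) step mentioned above advances a position estimate by one detected step at a time. The sketch below shows only that generic update; the coordinate convention (heading measured clockwise from north, x pointing east, y pointing north), the fixed step length and the function name are assumptions of this example and are not taken from UILoc.

```python
import math


def pdr_update(position, step_length_m, heading_rad):
    """Advance an (x, y) position by one step along the given heading.

    Heading is measured clockwise from north; x points east, y points north.
    """
    x, y = position
    return (x + step_length_m * math.sin(heading_rad),
            y + step_length_m * math.cos(heading_rad))


# Hypothetical walk: one step north, then one step east, 0.7 m each.
position = (0.0, 0.0)
for heading in (0.0, math.pi / 2):
    position = pdr_update(position, 0.7, heading)
```

Because each update adds to the previous estimate, heading and step-length errors accumulate, which is why PDR is usually combined with absolute corrections such as landmarks or fingerprints.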
The effects of an unsupervised water exercise program on low back pain and sick leave among healthy pregnant women – A randomised controlled trial

PubMed Central

Tabor, Ann; Albert, Hanne; Rosthøj, Susanne; Damm, Peter; Hegaard, Hanne K.

2017-01-01

Background Low back pain is highly prevalent among pregnant women, but evidence of an effective treatment is still lacking. Supervised exercise–either land or water based–has shown benefits for low back pain, but no trial has investigated the effect of an unsupervised water exercise program on low back pain. We aimed to assess the effect of an unsupervised water exercise program on low back pain intensity and days spent on sick leave among healthy pregnant women. Methods In this randomised, controlled, parallel-group trial, 516 healthy pregnant women were randomly assigned to either unsupervised water exercise twice a week for a period of 12 weeks or standard prenatal care. Healthy pregnant women aged 18 years or older, with a single fetus and between 16–17 gestational weeks were eligible. The primary outcome was low back pain intensity measured by the Low Back Pain Rating scale at 32 weeks. The secondary outcomes were self-reported days spent on sick leave, disability due to low back pain (Roland Morris Disability Questionnaire) and self-rated general health (EQ-5D and EQ-VAS). Results Low back pain intensity was significantly lower in the water exercise group, with a score of 2.01 (95% CI 1.75–2.26) vs. 2.38 in the control group (95% CI 2.12–2.64) (mean difference = 0.38, 95% CI 0.02–0.74 p = 0.04). No difference was found in the number of days spent on sick leave (median 4 vs. 4, p = 0.83), disability due to low back pain or self-rated general health. There was a trend towards more women in the water exercise group reporting no low back pain at 32 weeks (21% vs. 14%, p = 0.07). Conclusions Unsupervised water exercise results in a statistically significant lower intensity of low back pain in healthy pregnant women, but the result was most likely not clinically significant. It did not affect the number of days on sick leave, disability due to low back pain or self-rated health. Trial registration ClinicalTrials.gov NCT02354430 PMID:28877165
Unsupervised ensemble ranking of terms in electronic health record notes based on their importance to patients.

PubMed

Chen, Jinying; Yu, Hong

2017-04-01

Allowing patients to access their own electronic health record (EHR) notes through online patient portals has the potential to improve patient-centered care. However, EHR notes contain abundant medical jargon that can be difficult for patients to comprehend. One way to help patients is to reduce information overload and help them focus on medical terms that matter most to them. Targeted education can then be developed to improve patient EHR comprehension and the quality of care. The aim of this work was to develop FIT (Finding Important Terms for patients), an unsupervised natural language processing (NLP) system that ranks medical terms in EHR notes based on their importance to patients. We built FIT on a new unsupervised ensemble ranking model derived from the biased random walk algorithm to combine heterogeneous information resources for ranking candidate terms from each EHR note. Specifically, FIT integrates four single views (rankers) for term importance: patient use of medical concepts, document-level term salience, word co-occurrence based term relatedness, and topic coherence. It also incorporates partial information of term importance as conveyed by terms' unfamiliarity levels and semantic types. We evaluated FIT on 90 expert-annotated EHR notes and used the four single-view rankers as baselines. In addition, we implemented three benchmark unsupervised ensemble ranking methods as strong baselines. FIT achieved 0.885 AUC-ROC for ranking candidate terms from EHR notes to identify important terms. When including term identification, the performance of FIT for identifying important terms from EHR notes was 0.813 AUC-ROC. Both performance scores significantly exceeded the corresponding scores from the four single rankers (P<0.001). FIT also outperformed the three ensemble rankers for most metrics. Its performance is relatively insensitive to its parameter. FIT can automatically identify EHR terms important to patients. 
It may help develop future interventions to improve quality of care. By using unsupervised learning as well as a robust and flexible framework for information fusion, FIT can be readily applied to other domains and applications.
Will an Unsupervised Self-Testing Strategy for HIV Work in Health Care Workers of South Africa? A Cross Sectional Pilot Feasibility Study

PubMed Central

Pant Pai, Nitika; Behlim, Tarannum; Abrahams, Lameze; Vadnais, Caroline; Shivkumar, Sushmita; Pillay, Sabrina; Binder, Anke; Deli-Houssein, Roni; Engel, Nora; Joseph, Lawrence; Dheda, Keertan

2013-01-01

Background In South Africa, stigma, discrimination, social visibility and fear of loss of confidentiality impede health facility-based HIV testing. With 50% of adults having ever tested for HIV in their lifetime, private, alternative testing options are urgently needed. Non-invasive, oral self-tests offer a potential for a confidential, unsupervised HIV self-testing option, but global data are limited. Methods A pilot cross-sectional study was conducted from January to June 2012 in health care workers based at the University of Cape Town, South Africa. An innovative, unsupervised self-testing strategy was evaluated for feasibility, defined as completion of the self-testing process (i.e., self-test conduct, interpretation and linkage). An oral point-of-care HIV test, Internet- and paper-based HIV self-test applications, and mobile phones were synergized to create an unsupervised strategy. Self-tests were additionally confirmed with rapid tests on site and laboratory tests. Of 270 health care workers approached (18 years and above, of unknown HIV status), 251 consented to participate. Findings Overall, about 91% of participants reported a positive experience with the strategy. Of 251 participants, 126 evaluated the Internet and 125 the paper-based application successfully; completion rate of 99.2%. All sero-positives were linked to treatment (completion rate: 100%; 95% CI, 66.0–100). About half of sero-negatives were offered counselling on mobile phones; completion rate: 44.6% (95% CI, 38.0–51.0). A majority of participants (78.1%) were females, aged 18–24 years (61.4%). Nine participants were found sero-positive after confirmatory tests (prevalence 3.6%; 95% CI, 1.8–6.9). Six of nine positive self-tests were accurately interpreted; sensitivity: 66.7% (95% CI, 30.9–91.0); specificity: 100% (95% CI, 98.1–100). Interpretation Our unsupervised self-testing strategy was feasible to operationalize in health care workers in South Africa. 
Linkages were successfully operationalized with mobile phones in all sero-positives and about half of the sero-negatives sought post-test counselling. Controlled trials and implementation research studies are needed before a scale-up is considered. PMID:24312185
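The point estimates quoted in this abstract follow directly from the stated counts. The short Python sketch below reproduces the sensitivity and prevalence figures from those counts; the function name `percentage` is an illustrative choice, and the confidence intervals are not recomputed because the abstract does not state which interval method was used.

```python
def percentage(numerator: int, denominator: int) -> float:
    """Return numerator/denominator expressed as a percentage."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return 100.0 * numerator / denominator


# Counts taken from the abstract above.
participants = 251            # health care workers who consented
confirmed_positive = 9        # sero-positive after confirmatory tests
self_read_positive = 6        # positive self-tests interpreted correctly

# Sensitivity: correctly interpreted positives / all confirmed positives.
sensitivity = percentage(self_read_positive, confirmed_positive)

# Prevalence: confirmed positives / all participants.
prevalence = percentage(confirmed_positive, participants)

print(f"sensitivity = {sensitivity:.1f}%")
print(f"prevalence  = {prevalence:.1f}%")
```

With only nine confirmed positives, each misread self-test moves the sensitivity estimate by about 11 percentage points, which is consistent with the wide interval reported.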
A Mobile App to Stabilize Daily Functional Activity of Breast Cancer Patients in Collaboration With the Physician: A Randomized Controlled Clinical Trial

PubMed Central

Egbring, Marco; Far, Elmira; Roos, Malgorzata; Dietrich, Michael; Brauchbar, Mathis; Kullak-Ublick, Gerd A

2016-01-01

Background The well-being of breast cancer patients and reporting of adverse events require close monitoring. Mobile apps allow continuous recording of disease- and medication-related symptoms in patients undergoing chemotherapy. Objective The aim of the study was to evaluate the effects of a mobile app on patient-reported daily functional activity in a supervised and unsupervised setting. Methods We conducted a randomized controlled study of 139 breast cancer patients undergoing chemotherapy. Patient status was self-measured using Eastern Cooperative Oncology Group scoring and Common Terminology Criteria for Adverse Events. Participants were randomly assigned to a control group, an unsupervised group that used a mobile app to record data, or a supervised group that used the app and reviewed data with a physician. Primary outcome variables were change in daily functional activity and symptoms over three outpatient visits. Results Functional activity scores declined in all groups from the first to second visit. However, from the second to third visit, only the supervised group improved, whereas the others continued to decline. Overall, the supervised group showed no significant difference from the first (median 90.85, IQR 30.67) to third visit (median 84.76, IQR 18.29, P=.72). Both app-using groups reported more distinct adverse events in the app than in the questionnaire (supervised: n=1033 vs n=656; unsupervised: n=852 vs n=823), although the unsupervised group reported more symptoms overall (n=4808) in the app than the supervised group (n=4463). Conclusions The mobile app was associated with stabilized daily functional activity when used under collaborative review. App-using participants could more frequently report adverse events, and those under supervision made fewer and more precise entries than unsupervised participants. 
Our findings suggest that patient well-being and awareness of chemotherapy adverse effects can be improved by using a mobile app in collaboration with the treating physician. ClinicalTrial ClinicalTrials.gov NCT02004496; https://clinicaltrials.gov/ct2/show/NCT02004496 (Archived by WebCite at http://www.webcitation.org/6k68FZHo2) PMID:27601354
An Unsupervised Change Detection Method Using Time-Series of PolSAR Images from Radarsat-2 and GaoFen-3.

PubMed

Liu, Wensong; Yang, Jie; Zhao, Jinqi; Shi, Hongtao; Yang, Le

2018-02-12

Traditional pixel-level unsupervised change detection methods can only detect changes between two acquisition times with the same sensor, and the results are easily affected by speckle noise. In this paper, a novel method is proposed to detect change based on time-series data from different sensors. Firstly, the overall difference image of the time-series PolSAR is calculated by omnibus test statistics, and difference images between any two images at different times are acquired by Rj test statistics. Secondly, the difference images are segmented with a Generalized Statistical Region Merging (GSRM) algorithm which can suppress the effect of speckle noise. A Generalized Gaussian Mixture Model (GGMM) is then used to obtain the time-series change detection maps in the final step of the proposed method. To verify the effectiveness of the proposed method, we carried out a change detection experiment using time-series PolSAR images acquired by Radarsat-2 and Gaofen-3 over the city of Wuhan, China. Results show that the proposed method can not only detect the time-series change from different sensors, but it can also better suppress the influence of speckle noise and improve the overall accuracy and Kappa coefficient.
Spectral Transfer Learning Using Information Geometry for a User-Independent Brain-Computer Interface

PubMed Central

Waytowich, Nicholas R.; Lawhern, Vernon J.; Bohannon, Addison W.; Ball, Kenneth R.; Lance, Brent J.

2016-01-01

Recent advances in signal processing and machine learning techniques have enabled the application of Brain-Computer Interface (BCI) technologies to fields such as medicine, industry, and recreation; however, BCIs still suffer from the requirement of frequent calibration sessions due to the intra- and inter-individual variability of brain-signals, which makes calibration suppression through transfer learning an area of increasing interest for the development of practical BCI systems. In this paper, we present an unsupervised transfer method (spectral transfer using information geometry, STIG), which ranks and combines unlabeled predictions from an ensemble of information geometry classifiers built on data from individual training subjects. The STIG method is validated in both off-line and real-time feedback analysis during a rapid serial visual presentation task (RSVP). For detection of single-trial, event-related potentials (ERPs), the proposed method can significantly outperform existing calibration-free techniques as well as outperform traditional within-subject calibration techniques when limited data is available. This method demonstrates that unsupervised transfer learning for single-trial detection in ERP-based BCIs can be achieved without the requirement of costly training data, representing a step-forward in the overall goal of achieving a practical user-independent BCI system. PMID:27713685
Novel Hyperspectral Anomaly Detection Methods Based on Unsupervised Nearest Regularized Subspace

NASA Astrophysics Data System (ADS)

Hou, Z.; Chen, Y.; Tan, K.; Du, P.

2018-04-01

Anomaly detection has been of great interest in hyperspectral imagery analysis. Most conventional anomaly detectors merely take advantage of spectral and spatial information within neighboring pixels. In this paper, two methods, the Unsupervised Nearest Regularized Subspace-based with Outlier Removal Anomaly Detector (UNRSORAD) and Local Summation UNRSORAD (LSUNRSORAD), are proposed. They are based on the concept that each background pixel can be approximately represented by its spatial neighborhood, while anomalies cannot. Using a dual window, each testing pixel is approximated by a linear combination of the surrounding data. The existence of outliers in the dual window will affect detection accuracy, so the proposed detectors remove outlier pixels that are significantly different from the majority of pixels. In order to make full use of the local spatial distribution information of the pixels neighboring the pixel under test, we adopt a local summation dual-window sliding strategy. The residual image is constituted by subtracting the predicted background from the original hyperspectral imagery, and anomalies can be detected in the residual image. Experimental results show that the proposed methods greatly improve detection accuracy compared with other traditional detection methods.
Unsupervised Learning and Pattern Recognition of Biological Data Structures with Density Functional Theory and Machine Learning.

PubMed

Chen, Chien-Chang; Juan, Hung-Hui; Tsai, Meng-Yuan; Lu, Henry Horng-Shing

2018-01-11

By introducing the methods of machine learning into the density functional theory, we made a detour for the construction of the most probable density function, which can be estimated by learning relevant features from the system of interest. Using the properties of the universal functional, the vital core of density functional theory, the most probable cluster numbers and the corresponding cluster boundaries in a system under study can be simultaneously and automatically determined, and the plausibility rests on the Hohenberg-Kohn theorems. For method validation and pragmatic applications, interdisciplinary problems from physical to biological systems were enumerated. The amalgamation of uncharged atomic clusters validated the unsupervised searching process of the cluster numbers, and the corresponding cluster boundaries were exhibited likewise. Highly accurate clustering results on Fisher's iris dataset showed the feasibility and the flexibility of the proposed scheme. Brain tumor detections from low-dimensional magnetic resonance imaging datasets and segmentations of high-dimensional neural network imageries in the Brainbow system were also used to inspect the method's practicality. The experimental results exhibit the successful connection between the physical theory and the machine learning methods and will benefit clinical diagnoses.
The relationship between unsupervised time after school and physical activity in adolescent girls

PubMed Central

Rushovich, Berenice R; Voorhees, Carolyn C; Davis, CE; Neumark-Sztainer, Dianne; Pfeiffer, Karin A; Elder, John P; Going, Scott; Marino, Vivian G

2006-01-01

Background Rising obesity and declining physical activity levels are of great concern because of the associated health risks. Many children are left unsupervised after the school day ends, but little is known about the association between unsupervised time and physical activity levels. This paper seeks to determine whether adolescent girls who are without adult supervision after school are more or less active than their peers who have a caregiver at home. Methods A random sample of girls from 36 middle schools at 6 field sites across the U.S. was selected during the fall of the 2002–2003 school year to participate in the baseline measurement activities of the Trial of Activity for Adolescent Girls (TAAG). Information was collected using six-day objectively measured physical activity, self-reported physical activity using a three-day recall, and socioeconomic and psychosocial measures. Complete information was available for 1422 out of a total of 1596 respondents. Categorical variables were analyzed using chi square and continuous variables were analyzed by t-tests. The four categories of time alone were compared using a mixed linear model controlling for clustering effects by study center. Results Girls who spent more time after school (≥2 hours per day, ≥2 days per week) without adult supervision were more active than those with adult supervision (p = 0.01). Girls alone for ≥2 hours after school, ≥2 days a week, on average accrue 7.55 minutes more moderate to vigorous physical activity (MVPA) per day than do girls who are supervised (95% confidence interval ([C.I]). These results adjusted for ethnicity, parent's education, participation in the free/reduced lunch program, neighborhood resources, or available transportation. Unsupervised girls (n = 279) did less homework (53.1% vs. 63.3%), spent less time riding in a car or bus (48.0% vs. 56.6%), talked on the phone more (35.5% vs. 21.1%), and watched more television (59.9% vs. 
52.6%) than supervised girls (n = 569). However, unsupervised girls also were more likely to be dancing (14.0% vs. 9.3%) and listening to music (20.8% vs. 12.0%) (p < .05). Conclusion Girls in an unsupervised environment engaged in fewer structured activities and did not immediately do their homework, but they were more likely to be physically active than supervised girls. These results may have implications for parents, school, and community agencies as to how to structure activities in order to encourage teenage girls to be more physically active. PMID:16879750
Air Traffic Sector Configuration Change Frequency

NASA Technical Reports Server (NTRS)

Chatterji, Gano Broto; Drew, Michael

2009-01-01

Several techniques for partitioning airspace have been developed in the literature. This paper examines whether a region of airspace created by such methods can be used with other days of traffic, and how many times a different partition is needed during the day. Both these aspects are examined for the Fort Worth Center airspace sectors. A Mixed Integer Linear Programming method is used with actual air traffic data of ten high-volume low-weather-delay days for creating sectors. Nine solutions were obtained for each two-hour period of the day by partitioning the center airspace into two through 18 sectors in steps of two sectors. Actual track-data were played back with the generated partitions for creating histograms of the traffic-counts. The best partition for each two-hour period was then identified based on the nine traffic-count distributions. Numbers of sectors in such partitions were analyzed to determine the number of times a different configuration is needed during the day. One to three partitions were selected for the 24-hour period, and traffic data from ten days were played back to test if the traffic-counts stayed below the threshold values associated with these partitions. Results show that these partitions are robust and can be used for longer durations than they were designed for.
UNCLES: method for the identification of genes differentially consistently co-expressed in a specific subset of datasets.

PubMed

Abu-Jamous, Basel; Fa, Rui; Roberts, David J; Nandi, Asoke K

2015-06-04

Collective analysis of the increasingly emerging gene expression datasets is required. The recently proposed binarisation of consensus partition matrices (Bi-CoPaM) method can combine clustering results from multiple datasets to identify the subsets of genes which are consistently co-expressed in all of the provided datasets in a tuneable manner. However, results validation and parameter setting are issues that complicate the design of such methods. Moreover, although it is a common practice to test methods by application to synthetic datasets, the mathematical models used to synthesise such datasets are usually based on approximations which may not always be sufficiently representative of real datasets. Here, we propose an unsupervised method for the unification of clustering results from multiple datasets using external specifications (UNCLES). This method has the ability to identify the subsets of genes consistently co-expressed in a subset of datasets while being poorly co-expressed in another subset of datasets, and to identify the subsets of genes consistently co-expressed in all given datasets. We also propose the M-N scatter plots validation technique and adopt it to set the parameters of UNCLES, such as the number of clusters, automatically. Additionally, we propose an approach for the synthesis of gene expression datasets using real data profiles in a way which combines the ground-truth-knowledge of synthetic data and the realistic expression values of real data, and therefore overcomes the problem of faithfulness of synthetic expression data modelling. By application to those datasets, we validate UNCLES while comparing it with other conventional clustering methods, and of particular relevance, biclustering methods. We further validate UNCLES by application to a set of 14 real genome-wide yeast datasets as it produces focused clusters that conform well to known biological facts. 
Furthermore, in-silico-based hypotheses regarding the function of a few previously unknown genes in those focused clusters are drawn. The UNCLES method, the M-N scatter plots technique, and the expression data synthesis approach will have wide application for the comprehensive analysis of genomic and other sources of multiple complex biological datasets. Moreover, the derived in-silico-based biological hypotheses represent subjects for future functional studies.

Interactogeneous: Disease Gene Prioritization Using Heterogeneous Networks and Full Topology Scores

PubMed Central

Gonçalves, Joana P.; Francisco, Alexandre P.; Moreau, Yves; Madeira, Sara C.

2012-01-01

Disease gene prioritization aims to suggest potential implications of genes in disease susceptibility. Often accomplished in a guilt-by-association scheme, promising candidates are sorted according to their relatedness to known disease genes. Network-based methods have been successfully exploiting this concept by capturing the interaction of genes or proteins into a score. Nonetheless, most current approaches yield at least some of the following limitations: (1) networks comprise only curated physical interactions leading to poor genome coverage and density, and bias toward a particular source; (2) scores focus on adjacencies (direct links) or the most direct paths (shortest paths) within a constrained neighborhood around the disease genes, ignoring potentially informative indirect paths; (3) global clustering is widely applied to partition the network in an unsupervised manner, attributing little importance to prior knowledge; (4) confidence weights and their contribution to edge differentiation and ranking reliability are often disregarded. We hypothesize that network-based prioritization related to local clustering on graphs and considering full topology of weighted gene association networks integrating heterogeneous sources should overcome the above challenges. We term such a strategy Interactogeneous. We conducted cross-validation tests to assess the impact of network sources, alternative path inclusion and confidence weights on the prioritization of putative genes for 29 diseases. Heat diffusion ranking proved the best prioritization method overall, increasing the gap to neighborhood and shortest paths scores mostly on single source networks. Heterogeneous associations consistently delivered superior performance over single source data across the majority of methods. Results on the contribution of confidence weights were inconclusive. 
Finally, the best Interactogeneous strategy, heat diffusion ranking and associations from the STRING database, was used to prioritize genes for Parkinson’s disease. This method effectively recovered known genes and uncovered interesting candidates which could be linked to pathogenic mechanisms of the disease. PMID:23185389
Evidential analysis of difference images for change detection of multitemporal remote sensing images

NASA Astrophysics Data System (ADS)

Chen, Yin; Peng, Lijuan; Cremers, Armin B.

2018-03-01

In this article, we develop two methods for unsupervised change detection in multitemporal remote sensing images based on Dempster-Shafer's theory of evidence (DST). In most unsupervised change detection methods, the probability of difference image is assumed to be characterized by mixture models, whose parameters are estimated by the expectation maximization (EM) method. However, the main drawback of the EM method is that it does not consider spatial contextual information, which may entail rather noisy detection results with numerous spurious alarms. To remedy this, we firstly develop an evidence theory based EM method (EEM) which incorporates spatial contextual information in EM by iteratively fusing the belief assignments of neighboring pixels to the central pixel. Secondly, an evidential labeling method in the sense of maximizing a posteriori probability (MAP) is proposed in order to further enhance the detection result. It first uses the parameters estimated by EEM to initialize the class labels of a difference image. Then it iteratively fuses class conditional information and spatial contextual information, and updates labels and class parameters. Finally it converges to a fixed state which gives the detection result. A simulated image set and two real remote sensing data sets are used to evaluate the two evidential change detection methods. Experimental results show that the new evidential methods are comparable to other prevalent methods in terms of total error rate.
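The fusion step mentioned in this abstract relies on Dempster's rule of combination. The Python sketch below shows the rule itself on a two-class frame (changed, unchanged); it is a minimal illustration and not the authors' EEM algorithm. The mass values for the central and neighboring pixel are invented for the example, and the function name `dempster_combine` is hypothetical.

```python
from itertools import product


def dempster_combine(m1: dict, m2: dict) -> dict:
    """Combine two mass functions (frozenset -> mass) with Dempster's rule."""
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            # Agreeing evidence accumulates on the intersection.
            combined[intersection] = combined.get(intersection, 0.0) + wa * wb
        else:
            # Disjoint focal sets contribute to the conflict mass.
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict; rule is undefined")
    # Renormalize by the non-conflicting mass.
    return {k: v / (1.0 - conflict) for k, v in combined.items()}


CHANGED = frozenset({"changed"})
UNCHANGED = frozenset({"unchanged"})
EITHER = CHANGED | UNCHANGED          # mass left on ignorance

# Illustrative belief assignments (assumed values, not from the paper).
center = {CHANGED: 0.6, UNCHANGED: 0.1, EITHER: 0.3}
neighbor = {CHANGED: 0.5, UNCHANGED: 0.2, EITHER: 0.3}

fused = dempster_combine(center, neighbor)
for focal_set, mass in fused.items():
    print(sorted(focal_set), round(mass, 4))
```

In this toy case the neighbor also leans toward "changed", so the fused mass on "changed" is higher than either source assigned on its own, which is the intuition behind using neighboring pixels as spatial context.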
Evolving bipartite authentication graph partitions

DOE PAGES

Pope, Aaron Scott; Tauritz, Daniel Remy; Kent, Alexander D.

2017-01-16

As large scale enterprise computer networks become more ubiquitous, finding the appropriate balance between user convenience and user access control is an increasingly challenging proposition. Suboptimal partitioning of users’ access and available services contributes to the vulnerability of enterprise networks. Previous edge-cut partitioning methods unduly restrict users’ access to network resources. This paper introduces a novel method of network partitioning superior to the current state-of-the-art which minimizes user impact by providing alternate avenues for access that reduce vulnerability. Networks are modeled as bipartite authentication access graphs and a multi-objective evolutionary algorithm is used to simultaneously minimize the size of large connected components while minimizing overall restrictions on network users. Lastly, results are presented on a real world data set that demonstrate the effectiveness of the introduced method compared to previous naive methods.
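One of the two objectives named in this abstract is the size of the connected components of a bipartite user-to-host authentication graph. The Python sketch below measures that quantity with a breadth-first search; it does not implement the evolutionary algorithm, and the user names, host names, and the function `component_sizes` are all hypothetical.

```python
from collections import defaultdict, deque


def component_sizes(edges):
    """Return connected-component sizes (largest first) of a bipartite graph.

    `edges` is an iterable of (user, host) authentication pairs.
    """
    adjacency = defaultdict(set)
    for user, host in edges:
        # Tag each node with its side so a user and host may share a name.
        u, h = ("user", user), ("host", host)
        adjacency[u].add(h)
        adjacency[h].add(u)

    seen = set()
    sizes = []
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            node = queue.popleft()
            size += 1
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        sizes.append(size)
    return sorted(sizes, reverse=True)


# Hypothetical authentication events.
edges = [("alice", "db1"), ("bob", "db1"), ("bob", "web1"), ("carol", "mail1")]
print(component_sizes(edges))

# Removing one access right splits the largest component.
restricted = [e for e in edges if e != ("bob", "db1")]
print(component_sizes(restricted))
```

Removing the single `("bob", "db1")` edge shrinks the largest component from four nodes to two, at the cost of one restriction on one user, which illustrates the trade-off the two objectives are meant to balance.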
Employing broadband spectra and cluster analysis to assess thermal defoliation of cotton

USDA-ARS's Scientific Manuscript database

Growers and field scouts need assistance in surveying cotton (Gossypium hirsutum L.) fields subjected to thermal defoliation to reap the benefits provided by this nonchemical defoliation method. A study was conducted to evaluate broadband spectral data and unsupervised classification as tools for s...
A Fast Implementation of the ISOCLUS Algorithm

NASA Technical Reports Server (NTRS)

Memarsadeghi, Nargess; Mount, David M.; Netanyahu, Nathan S.; LeMoigne, Jacqueline

2003-01-01

Unsupervised clustering is a fundamental tool in numerous image processing and remote sensing applications. For example, unsupervised clustering is often used to obtain vegetation maps of an area of interest. This approach is useful when reliable training data are either scarce or expensive, and when relatively little a priori information about the data is available. Unsupervised clustering methods play a significant role in the pursuit of unsupervised classification. One of the most popular and widely used clustering schemes for remote sensing applications is the ISOCLUS algorithm, which is based on the ISODATA method. The algorithm is given a set of n data points (or samples) in d-dimensional space, an integer k indicating the initial number of clusters, and a number of additional parameters. The general goal is to compute a set of cluster centers in d-space. Although there is no specific optimization criterion, the algorithm is similar in spirit to the well known k-means clustering method in which the objective is to minimize the average squared distance of each point to its nearest center, called the average distortion. One significant feature of ISOCLUS over k-means is that clusters may be merged or split, and so the final number of clusters may be different from the number k supplied as part of the input. This algorithm will be described later in this paper. The ISOCLUS algorithm can run very slowly, particularly on large data sets. Given its wide use in remote sensing, its efficient computation is an important goal. We have developed a fast implementation of the ISOCLUS algorithm. Our improvement is based on a recent acceleration to the k-means algorithm, the filtering algorithm, by Kanungo et al. They showed that, by storing the data in a kd-tree, it was possible to significantly reduce the running time of k-means. We have adapted this method for the ISOCLUS algorithm. 
For technical reasons, which are explained later, it is necessary to make a minor modification to the ISOCLUS specification. We provide empirical evidence, on both synthetic and Landsat image data sets, that our algorithm's performance is essentially the same as that of ISOCLUS, but with significantly lower running times. We show that our algorithm runs from 3 to 30 times faster than a straightforward implementation of ISOCLUS. Our adaptation of the filtering algorithm involves the efficient computation of a number of cluster statistics that are needed for ISOCLUS, but not for k-means.
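The average distortion objective described in this abstract can be made concrete with a small Python sketch. It performs one brute-force k-means (Lloyd) update and reports the distortion before and after. This is only the plain k-means step under stated assumptions: it uses neither the kd-tree filtering algorithm nor the ISOCLUS merge and split rules, and the points, initial centers, and function names are invented for illustration.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centers):
    """Index of the nearest center for every point (brute force)."""
    return [
        min(range(len(centers)), key=lambda j: squared_distance(p, centers[j]))
        for p in points
    ]


def average_distortion(points, centers):
    """Mean squared distance from each point to its nearest center."""
    total = sum(min(squared_distance(p, c) for c in centers) for p in points)
    return total / len(points)


def update_centers(points, centers, labels):
    """Move each center to the centroid of its assigned points."""
    updated = []
    for j, old in enumerate(centers):
        members = [p for p, label in zip(points, labels) if label == j]
        if not members:
            updated.append(old)  # keep an empty cluster's center in place
            continue
        updated.append(
            tuple(sum(m[d] for m in members) / len(members) for d in range(len(old)))
        )
    return updated


# Toy data: two well-separated pairs of points in the plane.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centers = [(0.0, 0.0), (10.0, 10.0)]

before = average_distortion(points, centers)
labels = assign(points, centers)
centers = update_centers(points, centers, labels)
after = average_distortion(points, centers)

print(f"distortion before update: {before}")
print(f"distortion after update:  {after}")
```

The brute-force assignment costs one distance computation per point per center in every iteration; that is the cost the kd-tree based filtering approach mentioned in the abstract is designed to reduce.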
Supervised versus unsupervised categorization: two sides of the same coin?

PubMed

Pothos, Emmanuel M; Edwards, Darren J; Perlman, Amotz

2011-09-01

Supervised and unsupervised categorization have been studied in separate research traditions. A handful of studies have attempted to explore a possible convergence between the two. The present research builds on these studies, by comparing the unsupervised categorization results of Pothos et al. (2011; Pothos et al., 2008) with the results from two procedures of supervised categorization. In two experiments, we tested 375 participants with nine different stimulus sets and examined the relation between ease of learning of a classification, memory for a classification, and spontaneous preference for a classification. After taking into account the role of the number of category labels (clusters) in supervised learning, we found the three variables to be closely associated with each other. Our results provide encouragement for researchers seeking unified theoretical explanations for supervised and unsupervised categorization, but raise a range of challenging theoretical questions.
Selective Convolutional Descriptor Aggregation for Fine-Grained Image Retrieval.

PubMed

Wei, Xiu-Shen; Luo, Jian-Hao; Wu, Jianxin; Zhou, Zhi-Hua

2017-06-01

Deep convolutional neural network models pre-trained for the ImageNet classification task have been successfully adopted to tasks in other domains, such as texture description and object proposal generation, but these tasks require annotations for images in the new domain. In this paper, we focus on a novel and challenging task in the pure unsupervised setting: fine-grained image retrieval. Even with image labels, fine-grained images are difficult to classify, let alone the unsupervised retrieval task. We propose the selective convolutional descriptor aggregation (SCDA) method. The SCDA first localizes the main object in fine-grained images, a step that discards the noisy background and keeps useful deep descriptors. The selected descriptors are then aggregated and the dimensionality is reduced into a short feature vector using the best practices we found. The SCDA is unsupervised, using no image label or bounding box annotation. Experiments on six fine-grained data sets confirm the effectiveness of the SCDA for fine-grained image retrieval. Besides, visualization of the SCDA features shows that they correspond to visual attributes (even subtle ones), which might explain SCDA's high mean average precision in fine-grained retrieval. Moreover, on general image retrieval data sets, the SCDA achieves comparable retrieval results with the state-of-the-art general image retrieval approaches.
The Effects of 6 Months of Progressive High Effort Resistance Training Methods upon Strength, Body Composition, Function, and Wellbeing of Elderly Adults.

PubMed

Steele, James; Raubold, Kristin; Kemmler, Wolfgang; Fisher, James; Gentil, Paulo; Giessing, Jürgen

2017-01-01

The present study examined the progressive implementation of a high effort resistance training (RT) approach in older adults over 6 months and through a 6-month follow-up on strength, body composition, function, and wellbeing of older adults. Twenty-three older adults (aged 61 to 80 years) completed a 6-month supervised RT intervention applying progressive introduction of higher effort set end points. After completion of the intervention participants could choose to continue performing RT unsupervised until 6-month follow-up. Strength, body composition, function, and wellbeing all significantly improved over the intervention. Over the follow-up, body composition changes reverted to baseline values, strength was reduced though it remained significantly higher than baseline, and wellbeing outcomes were mostly maintained. Comparisons over the follow-up between those who did and those who did not continue with RT revealed no significant differences for changes in any outcome measure. Supervised RT employing progressive application of high effort set end points is well tolerated and effective in improving strength, body composition, function, and wellbeing in older adults. However, whether participants continued, or did not, with RT unsupervised at follow-up had no effect on outcomes perhaps due to reduced effort employed during unsupervised RT.
VizieR Online Data Catalog: Redshift reliability flags (VVDS data) (Jamal+, 2018)

NASA Astrophysics Data System (ADS)

Jamal, S.; Le Brun, V.; Le Fevre, O.; Vibert, D.; Schmitt, A.; Surace, C.; Copin, Y.; Garilli, B.; Moresco, M.; Pozzetti, L.

2017-09-01

The VIMOS VLT Deep Survey (Le Fevre et al. 2013A&A...559A..14L) is a combination of 3 i-band magnitude limited surveys: Wide (17.5<=iAB<=22.5; 8.6deg2), Deep (17.5<=iAB<=24; 0.6deg2) and Ultra-Deep (23<=iAB<=24.75; 512arcmin2), that produced a total of 35526 spectroscopic galaxy redshifts between 0 and 6.7 (22434 in Wide, 12051 in Deep and 1041 in UDeep). We supplement spectra of the VIMOS VLT Deep Survey (VVDS) with newly-defined redshift reliability flags obtained from clustering (unsupervised classification in Machine Learning) a set of descriptors from individual zPDFs. In this paper, we exploit a set of 24519 spectra from the VVDS database. After computing zPDFs for each individual spectrum, a set of (8) descriptors of the zPDF are extracted to build a feature matrix X (dimension = 24519 rows, 8 columns). Then, we use a clustering (unsupervised algorithms in Machine Learning) algorithm to partition the feature space into distinct clusters (5 clusters: C1,C2,C3,C4,C5), each depicting a different level of confidence to associate with the measured redshift zMAP (Maximum-A-Posteriori estimate that corresponds to the maximum of the redshift PDF). The clustering results (C1,C2,C3,C4,C5) reported in the table are those used in the paper (Jamal et al, 2017) to present the new methodology of automating the zspec reliability assessment. In particular, we would like to point out that they were obtained from first tests conducted on the VVDS spectroscopic data (end of 2016). Therefore, the table does not depict immutable results (on-going improvements). Future updates of the VVDS redshift reliability flags can be expected. (1 data file).
Life prediction of thermal-mechanical fatigue using strainrange partitioning

NASA Technical Reports Server (NTRS)

Halford, G. R.; Manson, S. S.

1975-01-01

This paper describes the applicability of the method of Strainrange Partitioning to the life prediction of thermal-mechanical strain-cycling fatigue. An in-phase test on 316 stainless steel is analyzed as an illustrative example. The observed life is in excellent agreement with the life predicted by the method using the recently proposed Step-Stress Method of experimental partitioning, the Interaction Damage Rule, and the life relationships determined at an isothermal temperature of 705 C. Implications of the present study are discussed relative to the general thermal fatigue problem.
Life prediction of thermal-mechanical fatigue using strain-range partitioning

NASA Technical Reports Server (NTRS)

Halford, G. R.; Manson, S. S.

1975-01-01

The applicability is described of the method of Strainrange Partitioning to the life prediction of thermal-mechanical strain-cycling fatigue. An in-phase test on 316 stainless steel is analyzed as an illustrative example. The observed life is in excellent agreement with the life predicted by the method using the recently proposed Step-Stress Method of experimental partitioning, the Interaction Damage Rule, and the life relationships determined at an isothermal temperature of 705 C. Implications of the study are discussed relative to the general thermal fatigue problem.
Choosing the best partition of the output from a large-scale simulation

DOE Office of Scientific and Technical Information (OSTI.GOV)

Challacombe, Chelsea Jordan; Casleton, Emily Michele

Data partitioning becomes necessary when a large-scale simulation produces more data than can be feasibly stored. The goal is to partition the data, typically so that every element belongs to one and only one partition, and store summary information about the partition, either a representative value plus an estimate of the error or a distribution. Once the partitions are determined and the summary information stored, the raw data is discarded. This process can be performed in situ, meaning while the simulation is running. When creating the partitions there are many decisions that researchers must make. For instance, how to determine when an adequate number of partitions has been created, how the partitions are created with respect to dividing the data, or how many variables should be considered simultaneously. In addition, decisions must be made for how to summarize the information within each partition. Because of the combinatorial number of possible ways to partition and summarize the data, a method of comparing the different possibilities will help guide researchers into choosing a good partitioning and summarization scheme for their application.
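The partition-then-summarize idea described above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' scheme: the function names, the choice of contiguous equal-size partitions, and the use of the mean plus a maximum absolute error as the stored summary are all hypothetical.

```python
from statistics import mean


def partition_and_summarize(values, n_partitions):
    """Split values into contiguous, non-overlapping partitions and keep
    only a representative value (the mean) and an error estimate (the
    maximum absolute deviation from that mean) for each partition."""
    if n_partitions < 1 or n_partitions > len(values):
        raise ValueError("n_partitions must be between 1 and len(values)")
    size, extra = divmod(len(values), n_partitions)
    summaries = []
    start = 0
    for i in range(n_partitions):
        # The first `extra` partitions get one additional element.
        stop = start + size + (1 if i < extra else 0)
        chunk = values[start:stop]
        rep = mean(chunk)
        err = max(abs(v - rep) for v in chunk)
        summaries.append({"start": start, "stop": stop, "rep": rep, "max_err": err})
        start = stop
    return summaries


def worst_error(summaries):
    """One possible score for comparing candidate partitionings."""
    return max(s["max_err"] for s in summaries)


if __name__ == "__main__":
    data = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
    for k in (1, 2):
        print(k, worst_error(partition_and_summarize(data, k)))
```

Comparing `worst_error` across candidate values of `n_partitions` is one simple way to decide when enough partitions have been created; other summaries (for example a fitted distribution) would need a different score.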
Mastication Evaluation With Unsupervised Learning: Using an Inertial Sensor-Based System

PubMed Central

Lucena, Caroline Vieira; Lacerda, Marcelo; Caldas, Rafael; De Lima Neto, Fernando Buarque

2018-01-01

There is a direct relationship between the prevalence of musculoskeletal disorders of the temporomandibular joint and orofacial disorders. A well-elaborated analysis of the jaw movements provides relevant information for healthcare professionals to conclude their diagnosis. Different approaches have been explored to track jaw movements such that the mastication analysis is getting less subjective; however, all methods are still highly subjective, and the quality of the assessments depends much on the experience of the health professional. In this paper, an accurate and non-invasive method based on a commercial low-cost inertial sensor (MPU6050) to measure jaw movements is proposed. The jaw-movement feature values are compared to those obtained with clinical analysis, showing no statistically significant difference between the two methods. Moreover, we propose to use unsupervised paradigm approaches to cluster mastication patterns of healthy subjects and simulated patients with facial trauma. Two techniques were used in this paper to instantiate the method: Kohonen’s Self-Organizing Maps and K-Means Clustering. Both algorithms perform excellently in processing jaw-movement data, showing encouraging results and potential to bring a full assessment of the masticatory function. The proposed method can be applied in real time, providing relevant dynamic information for health-care professionals. PMID:29651365
Rough-Fuzzy Clustering and Unsupervised Feature Selection for Wavelet Based MR Image Segmentation

PubMed Central

Maji, Pradipta; Roy, Shaswati

2015-01-01

Image segmentation is an indispensable process in the visualization of human tissues, particularly during clinical analysis of brain magnetic resonance (MR) images. For many human experts, manual segmentation is a difficult and time consuming task, which makes an automated brain MR image segmentation method desirable. In this regard, this paper presents a new segmentation method for brain MR images, integrating judiciously the merits of rough-fuzzy computing and multiresolution image analysis technique. The proposed method assumes that the major brain tissues, namely, gray matter, white matter, and cerebrospinal fluid from the MR images are considered to have different textural properties. The dyadic wavelet analysis is used to extract the scale-space feature vector for each pixel, while the rough-fuzzy clustering is used to address the uncertainty problem of brain MR image segmentation. An unsupervised feature selection method is introduced, based on maximum relevance-maximum significance criterion, to select relevant and significant textural features for segmentation problem, while the mathematical morphology based skull stripping preprocessing step is proposed to remove the non-cerebral tissues like skull. The performance of the proposed method, along with a comparison with related approaches, is demonstrated on a set of synthetic and real brain MR images using standard validity indices. PMID:25848961
Strainrange partitioning life predictions of the long time metal properties council creep-fatigue tests

NASA Technical Reports Server (NTRS)

Saltsman, J. F.; Halford, G. R.

1979-01-01

The method of strainrange partitioning is used to predict the cyclic lives of the Metal Properties Council's long time creep-fatigue interspersion tests of several steel alloys. Comparisons are made with predictions based upon the time- and cycle-fraction approach. The method of strainrange partitioning is shown to give consistently more accurate predictions of cyclic life than is given by the time- and cycle-fraction approach.
Implementation of spectral clustering with partitioning around medoids (PAM) algorithm on microarray data of carcinoma

NASA Astrophysics Data System (ADS)

Cahyaningrum, Rosalia D.; Bustamam, Alhadi; Siswantining, Titin

2017-03-01

Microarray technology has become one of the essential tools in life science for observing gene expression levels, including the expression of genes in people with carcinoma. Carcinoma is a cancer that forms in the epithelial tissue. These data can be analyzed, for example to identify hereditary gene expression and to build classifications that can be used to improve the diagnosis of carcinoma. Microarray data usually have such a large dimension that most methods require a long computing time to perform the grouping. Therefore, this study uses the spectral clustering method, which can work with any object to reduce the dimension. Spectral clustering is a method based on the spectral decomposition of a matrix that is represented in the form of a graph. After the data dimensions are reduced, the data are partitioned. One well-known partitioning method is Partitioning Around Medoids (PAM), which minimizes the objective function by iteratively exchanging non-medoid points with medoid points until convergence. The objective of this research is to implement spectral clustering and the PAM partitioning algorithm to obtain groups of 7457 genes with carcinoma based on their similarity values. The result of this study is two groups of genes with carcinoma.
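The PAM swap step mentioned above can be sketched in a few lines. This is a naive illustration, assuming a precomputed symmetric distance matrix and a simple deterministic initialisation (the first k objects); it does not reproduce the spectral dimension-reduction stage, and the function name `pam` and its parameters are hypothetical.

```python
def pam(dist, k, max_iter=100):
    """Naive Partitioning Around Medoids on a precomputed distance matrix.

    dist[i][j] is the distance between objects i and j. Returns the medoid
    indices, a cluster label per object, and the final total cost.
    """
    n = len(dist)
    medoids = list(range(k))  # assumption: start from the first k objects

    def cost(meds):
        # Objective: sum of distances from each object to its nearest medoid.
        return sum(min(dist[i][m] for m in meds) for i in range(n))

    best = cost(medoids)
    for _ in range(max_iter):
        improved = False
        for pos in range(k):
            for cand in range(n):
                if cand in medoids:
                    continue
                # Try swapping one medoid for one non-medoid object.
                trial = medoids[:pos] + [cand] + medoids[pos + 1:]
                c = cost(trial)
                if c < best:
                    medoids, best, improved = trial, c, True
        if not improved:
            break  # no swap lowers the objective: converged
    labels = [min(range(k), key=lambda j: dist[i][medoids[j]]) for i in range(n)]
    return medoids, labels, best


if __name__ == "__main__":
    values = [0, 1, 2, 10, 11, 12]
    d = [[abs(a - b) for b in values] for a in values]
    meds, labels, total = pam(d, k=2)
    print(sorted(values[m] for m in meds), labels, total)
```

Each pass recomputes the full objective for every candidate swap, which is quadratic per swap and only practical for small inputs; production implementations cache nearest and second-nearest medoid distances.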
Semi-supervised and unsupervised extreme learning machines.

PubMed

Huang, Gao; Song, Shiji; Gupta, Jatinder N D; Wu, Cheng

2014-12-01

Extreme learning machines (ELMs) have proven to be efficient and effective learning mechanisms for pattern classification and regression. However, ELMs are primarily applied to supervised learning problems. Only a few existing research papers have used ELMs to explore unlabeled data. In this paper, we extend ELMs for both semi-supervised and unsupervised tasks based on the manifold regularization, thus greatly expanding the applicability of ELMs. The key advantages of the proposed algorithms are as follows: 1) both the semi-supervised ELM (SS-ELM) and the unsupervised ELM (US-ELM) exhibit learning capability and computational efficiency of ELMs; 2) both algorithms naturally handle multiclass classification or multicluster clustering; and 3) both algorithms are inductive and can handle unseen data at test time directly. Moreover, it is shown in this paper that all the supervised, semi-supervised, and unsupervised ELMs can actually be put into a unified framework. This provides new perspectives for understanding the mechanism of random feature mapping, which is the key concept in ELM theory. Empirical study on a wide range of data sets demonstrates that the proposed algorithms are competitive with the state-of-the-art semi-supervised or unsupervised learning algorithms in terms of accuracy and efficiency.
Construction of exponentially fitted symplectic Runge-Kutta-Nyström methods from partitioned Runge-Kutta methods

NASA Astrophysics Data System (ADS)

Monovasilis, Theodore; Kalogiratou, Zacharoula; Simos, T. E.

2014-10-01

In this work we derive exponentially fitted symplectic Runge-Kutta-Nyström (RKN) methods from symplectic exponentially fitted partitioned Runge-Kutta (PRK) methods (for the approximate solution of general problems of this category see [18] - [40] and references therein). We construct RKN methods from PRK methods with up to five stages and fourth algebraic order.
Unsupervised Ontology Generation from Unstructured Text. CRESST Report 827

ERIC Educational Resources Information Center

Mousavi, Hamid; Kerr, Deirdre; Iseli, Markus R.

2013-01-01

Ontologies are a vital component of most knowledge acquisition systems, and recently there has been a huge demand for generating ontologies automatically since manual or supervised techniques are not scalable. In this paper, we introduce "OntoMiner", a rule-based, iterative method to extract and populate ontologies from unstructured or…

Evaluating unsupervised and supervised image classification methods for mapping cotton root rot

USDA-ARS's Scientific Manuscript database

Cotton root rot, caused by the soilborne fungus Phymatotrichopsis omnivora, is one of the most destructive plant diseases occurring throughout the southwestern United States. This disease has plagued the cotton industry for over a century, but effective practices for its control are still lacking. R...
Unsupervised MDP Value Selection for Automating ITS Capabilities

ERIC Educational Resources Information Center

Stamper, John; Barnes, Tiffany

2009-01-01

We seek to simplify the creation of intelligent tutors by using student data acquired from standard computer aided instruction (CAI) in conjunction with educational data mining methods to automatically generate adaptive hints. In our previous work, we have automatically generated hints for logic tutoring by constructing a Markov Decision Process…
Unsupervised chunking based on graph propagation from bilingual corpus.

PubMed

Zhu, Ling; Wong, Derek F; Chao, Lidia S

2014-01-01

This paper presents a novel approach for an unsupervised shallow parsing model trained on the unannotated Chinese text of a parallel Chinese-English corpus. In this approach, no information of the Chinese side is applied. The exploitation of graph-based label propagation for bilingual knowledge transfer, along with an application of using the projected labels as features in the unsupervised model, contributes to a better performance. The experimental comparisons with the state-of-the-art algorithms show that the proposed approach is able to achieve impressively higher accuracy in terms of F-score.
An unsupervised classification technique for multispectral remote sensing data.

NASA Technical Reports Server (NTRS)

Su, M. Y.; Cummings, R. E.

1973-01-01

Description of a two-part clustering technique consisting of (a) a sequential statistical clustering, which is essentially a sequential variance analysis, and (b) a generalized K-means clustering. In this composite clustering technique, the output of (a) is a set of initial clusters which are input to (b) for further improvement by an iterative scheme. This unsupervised composite technique was employed for automatic classification of two sets of remote multispectral earth resource observations. The classification accuracy by the unsupervised technique is found to be comparable to that by traditional supervised maximum-likelihood classification techniques.
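The second stage of the composite technique, in which initial clusters are improved iteratively, can be sketched with plain K-means. This is an assumption-laden illustration: the record does not spell out the "generalized" K-means or the sequential variance analysis, so hand-picked initial centers stand in for the output of stage (a), and the names `kmeans`, `points`, and `centers` are hypothetical.

```python
def kmeans(points, centers, max_iter=100):
    """Refine a set of initial cluster centers with standard K-means.

    points  : list of equal-length numeric tuples
    centers : initial centers, e.g. produced by an earlier clustering stage
    """
    centers = [tuple(c) for c in centers]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center
        # (squared Euclidean distance).
        new_labels = [
            min(
                range(len(centers)),
                key=lambda j: sum((p - c) ** 2 for p, c in zip(pt, centers[j])),
            )
            for pt in points
        ]
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(len(centers)):
            members = [pt for pt, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return centers, labels


if __name__ == "__main__":
    pts = [(0, 0), (0, 2), (10, 0), (10, 2)]
    rough_centers = [(1, 1), (8, 1)]  # stand-in for stage (a) output
    print(kmeans(pts, rough_centers))
```

Seeding K-means with centers from a cheaper first pass, rather than random points, is what makes the result repeatable from run to run.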
Unsupervised classification of earth resources data.

NASA Technical Reports Server (NTRS)

Su, M. Y.; Jayroe, R. R., Jr.; Cummings, R. E.

1972-01-01

A new clustering technique is presented. It consists of two parts: (a) a sequential statistical clustering which is essentially a sequential variance analysis and (b) a generalized K-means clustering. In this composite clustering technique, the output of (a) is a set of initial clusters which are input to (b) for further improvement by an iterative scheme. This unsupervised composite technique was employed for automatic classification of two sets of remote multispectral earth resource observations. The classification accuracy by the unsupervised technique is found to be comparable to that by the existing supervised maximum likelihood classification technique.
Hard exudates segmentation based on learned initial seeds and iterative graph cut.

PubMed

Kusakunniran, Worapan; Wu, Qiang; Ritthipravat, Panrasee; Zhang, Jian

2018-05-01

(Background and Objective): The occurrence of hard exudates is one of the early signs of diabetic retinopathy, which is one of the leading causes of blindness. Many patients with diabetic retinopathy lose their vision because of the late detection of the disease. Thus, this paper proposes a novel method of automatic hard exudates segmentation in retinal images. (Methods): The existing methods are based on either supervised or unsupervised learning techniques. In addition, the learned segmentation models may often cause miss-detection and/or fault-detection of hard exudates, due to the lack of rich characteristics, the intra-variations, and the similarity with other components in the retinal image. Thus, in this paper, the supervised learning based on the multilayer perceptron (MLP) is only used to identify initial seeds with high confidences to be hard exudates. Then, the segmentation is finalized by unsupervised learning based on the iterative graph cut (GC) using clusters of initial seeds. Also, in order to reduce color intra-variations of hard exudates in different retinal images, the color transfer (CT) is applied to normalize their color information, in the pre-processing step. (Results): The experiments and comparisons with the other existing methods are based on the two well-known datasets, e_ophtha EX and DIARETDB1. It can be seen that the proposed method outperforms the other existing methods in the literature, with the sensitivity in the pixel-level of 0.891 for the DIARETDB1 dataset and 0.564 for the e_ophtha EX dataset. The cross datasets validation where the training process is performed on one dataset and the testing process is performed on another dataset is also evaluated in this paper, in order to illustrate the robustness of the proposed method. (Conclusions): This newly proposed method integrates the supervised learning and unsupervised learning based techniques. It achieves improved performance when compared with the existing methods in the literature. The robustness of the proposed method for the scenario of cross datasets could enhance its practical usage. That is, the trained model could be more practical for unseen data in the real-world situation, especially when the capturing environments of training and testing images are not the same. Copyright © 2018 Elsevier B.V. All rights reserved.
Canonical partition functions: ideal quantum gases, interacting classical gases, and interacting quantum gases

NASA Astrophysics Data System (ADS)

Zhou, Chi-Chun; Dai, Wu-Sheng

2018-02-01

In statistical mechanics, for a system with a fixed number of particles, e.g. a finite-size system, strictly speaking, the thermodynamic quantity needs to be calculated in the canonical ensemble. Nevertheless, the calculation of the canonical partition function is difficult. In this paper, based on the mathematical theory of the symmetric function, we suggest a method for the calculation of the canonical partition function of ideal quantum gases, including ideal Bose, Fermi, and Gentile gases. Moreover, we express the canonical partition functions of interacting classical and quantum gases given by the classical and quantum cluster expansion methods in terms of the Bell polynomial in mathematics. The virial coefficients of ideal Bose, Fermi, and Gentile gases are calculated from the exact canonical partition function. The virial coefficients of interacting classical and quantum gases are calculated from the canonical partition function by using the expansion of the Bell polynomial, rather than calculated from the grand canonical potential.
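For ideal Bose and Fermi gases, one well-known consequence of the symmetric-function viewpoint is a recursion that builds the N-particle canonical partition function from the single-particle one: Z_N(β) = (1/N) Σ_{k=1..N} (±1)^(k+1) Z_1(kβ) Z_{N−k}(β), with Z_0 = 1, the plus sign for bosons and the minus sign for fermions (Newton's identities). The sketch below implements that standard recursion for a finite list of single-particle energies. It is not necessarily the formulation used by the authors, it does not cover Gentile statistics, and the function and parameter names are hypothetical.

```python
import math


def canonical_partition_functions(energies, beta, n_max, statistics="bose"):
    """Return [Z_0, Z_1, ..., Z_n_max] for an ideal quantum gas.

    energies   : single-particle energy levels (finite list)
    beta       : inverse temperature
    statistics : "bose" or "fermi"
    """
    if statistics not in ("bose", "fermi"):
        raise ValueError("statistics must be 'bose' or 'fermi'")
    sign = 1.0 if statistics == "bose" else -1.0
    # z1[k] is the single-particle partition function evaluated at k * beta,
    # i.e. the k-th power sum of the Boltzmann factors exp(-beta * e).
    z1 = [0.0] + [
        sum(math.exp(-k * beta * e) for e in energies) for k in range(1, n_max + 1)
    ]
    z = [1.0]  # Z_0 = 1
    for n in range(1, n_max + 1):
        total = sum(sign ** (k + 1) * z1[k] * z[n - k] for k in range(1, n + 1))
        z.append(total / n)
    return z


if __name__ == "__main__":
    levels = [0.0, 1.0]
    print(canonical_partition_functions(levels, 1.0, 3, "bose"))
    print(canonical_partition_functions(levels, 1.0, 3, "fermi"))
```

With two levels, the fermionic Z_3 comes out as zero up to rounding, which is a quick sanity check: three fermions cannot occupy two single-particle states.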
Unsupervised Structure Detection in Biomedical Data.

PubMed

Vogt, Julia E

2015-01-01

A major challenge in computational biology is to find simple representations of high-dimensional data that best reveal the underlying structure. In this work, we present an intuitive and easy-to-implement method based on ranked neighborhood comparisons that detects structure in unsupervised data. The method is based on ordering objects in terms of similarity and on the mutual overlap of nearest neighbors. This basic framework was originally introduced in the field of social network analysis to detect actor communities. We demonstrate that the same ideas can successfully be applied to biomedical data sets in order to reveal complex underlying structure. The algorithm is very efficient and works on distance data directly without requiring a vectorial embedding of data. Comprehensive experiments demonstrate the validity of this approach. Comparisons with state-of-the-art clustering methods show that the presented method outperforms hierarchical methods as well as density based clustering methods and model-based clustering. A further advantage of the method is that it simultaneously provides a visualization of the data. Especially in biomedical applications, the visualization of data can be used as a first pre-processing step when analyzing real world data sets to get an intuition of the underlying data structure. We apply this model to synthetic data as well as to various biomedical data sets which demonstrate the high quality and usefulness of the inferred structure.
Unsupervised semantic indoor scene classification for robot vision based on context of features using Gist and HSV-SIFT

NASA Astrophysics Data System (ADS)

Madokoro, H.; Yamanashi, A.; Sato, K.

2013-08-01

This paper presents an unsupervised scene classification method for actualizing semantic recognition of indoor scenes. Background and foreground features are respectively extracted using Gist and color scale-invariant feature transform (SIFT) as feature representations based on context. We used hue, saturation, and value SIFT (HSV-SIFT) because of its simple algorithm with low calculation costs. Our method creates bags of features for voting visual words created from both feature descriptors to a two-dimensional histogram. Moreover, our method generates labels as candidates of categories for time-series images while maintaining stability and plasticity together. Automatic labeling of category maps can be realized using labels created using adaptive resonance theory (ART) as teaching signals for counter propagation networks (CPNs). We evaluated our method for semantic scene classification using KTH's image database for robot localization (KTH-IDOL), which is popularly used for robot localization and navigation. The mean classification accuracies of Gist, gray SIFT, one class support vector machines (OC-SVM), position-invariant robust features (PIRF), and our method are, respectively, 39.7, 58.0, 56.0, 63.6, and 79.4%. The result of our method is 15.8% higher than that of PIRF. Moreover, we applied our method for fine classification using our original mobile robot. We obtained mean classification accuracy of 83.2% for six zones.
Automatic cloud coverage assessment of Formosat-2 image

NASA Astrophysics Data System (ADS)

Hsu, Kuo-Hsien

2011-11-01

The Formosat-2 satellite is equipped with a high-spatial-resolution (2 m ground sampling distance) remote sensing instrument. It has been operated on a daily-revisit mission orbit by the National Space Organization (NSPO) of Taiwan since May 21, 2004. NSPO also serves as one of the ground receiving stations for daily processing of the received Formosat-2 images. The current cloud coverage assessment of Formosat-2 images in the NSPO Image Processing System generally consists of two major steps. First, an unsupervised K-means method is used to automatically estimate the cloud statistic of a Formosat-2 image. Second, the cloud coverage of the Formosat-2 image is estimated by manual examination. A more accurate Automatic Cloud Coverage Assessment (ACCA) method would increase the efficiency of step 2 by providing a good prediction of the cloud statistic. In this paper, mainly based on the research results from Chang et al., Irish, and Gotoh, we propose a modified Formosat-2 ACCA method that includes pre-processing and post-processing analysis. For the pre-processing analysis, the cloud statistic is determined by using unsupervised K-means classification, Sobel's method, Otsu's method, re-examination of non-cloudy pixels, and a cross-band filter method. The box-counting fractal method is used as a post-processing tool to double-check the results of the pre-processing analysis and thereby increase the efficiency of manual examination.
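Of the pre-processing components listed above, Otsu's method is the easiest to show in isolation. The sketch below picks the grey-level threshold that maximizes the between-class variance of an intensity histogram. How the authors combine it with K-means, Sobel edges, and the cross-band filter is not described in the record, so treating pixels above the threshold as cloud candidates is purely an assumption here, as are the function names.

```python
def otsu_threshold(hist):
    """Return the bin index t that maximizes between-class variance.

    hist[i] is the number of pixels with grey level i. Pixels with level
    <= t form the dark class, pixels with level > t the bright class.
    """
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    w0 = 0          # pixel count of the dark class so far
    sum0 = 0        # sum of grey levels in the dark class so far
    best_t, best_var = 0, -1.0
    for t, h in enumerate(hist):
        w0 += h
        sum0 += t * h
        if w0 == 0:
            continue            # dark class still empty
        w1 = total - w0
        if w1 == 0:
            break               # bright class empty from here on
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        between = w0 * w1 * (mu0 - mu1) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t


def bright_fraction(hist, t):
    """Fraction of pixels above the threshold (assumed cloud candidates)."""
    return sum(hist[t + 1:]) / sum(hist)


if __name__ == "__main__":
    histogram = [5, 5, 0, 0, 0, 0, 5, 5]  # toy bimodal histogram
    t = otsu_threshold(histogram)
    print(t, bright_fraction(histogram, t))
```

On a clearly bimodal histogram any threshold in the empty gap separates the two modes equally well; this version returns the first such bin.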
Automated Classification of Thermal Infrared Spectra Using Self-organizing Maps

NASA Technical Reports Server (NTRS)

Roush, Ted L.; Hogan, Robert

2006-01-01

Existing and planned space missions to a variety of planetary and satellite surfaces produce an ever increasing volume of spectral data. Understanding the scientific informational content in this large data volume is a daunting task. Fortunately various statistical approaches are available to assess such data sets. Here we discuss an automated classification scheme based on Kohonen Self-organizing maps (SOM) we have developed. The SOM process produces an output layer where spectra having similar properties lie in close proximity to each other. One major effort is partitioning this output layer into appropriate regions. This is performed by defining closed regions based upon the strength of the boundaries between adjacent cells in the SOM output layer. We use the Davies-Bouldin index as a measure of the inter-class similarities and intra-class dissimilarities that determines the optimum partition of the output layer, and hence number of SOM clusters. This allows us to identify the natural number of clusters formed from the spectral data. Mineral spectral libraries prepared at Arizona State University (ASU) and Johns Hopkins University (JHU) are used to test and evaluate the classification scheme. We label the library sample spectra in a hierarchical scheme with class, subclass, and mineral group names. We use a portion of the spectra to train the SOM, i.e. produce the output layer, while the remaining spectra are used to test the SOM. The test spectra are presented to the SOM output layer and assigned membership to the appropriate cluster. We then evaluate these assignments to assess the scientific meaning and accuracy of the derived SOM classes as they relate to the labels. We demonstrate that unsupervised classification by SOMs can be a useful component in autonomous systems designed to identify mineral species from reflectance and emissivity spectra in the thermal IR.
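The Davies-Bouldin index used above to choose among candidate partitions can be computed as follows. This sketch uses the common definition (mean over clusters of the worst ratio of summed within-cluster scatter to centroid separation, lower is better) with Euclidean distances on raw points; applying it to SOM output cells, as the authors do, would need the SOM prototypes as input. The function name is hypothetical, and `math.dist` requires Python 3.8 or later.

```python
import math


def davies_bouldin(points, labels):
    """Davies-Bouldin index of a labelled partition; lower is better."""
    clusters = {}
    for pt, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(pt)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")
    # Centroid and mean distance-to-centroid (scatter) for every cluster.
    cents = {
        lab: tuple(sum(col) / len(m) for col in zip(*m))
        for lab, m in clusters.items()
    }
    scatter = {
        lab: sum(math.dist(p, cents[lab]) for p in m) / len(m)
        for lab, m in clusters.items()
    }
    labs = list(clusters)
    total = 0.0
    for i in labs:
        # Worst-case similarity of cluster i to any other cluster.
        # Note: coincident centroids would cause a division by zero.
        total += max(
            (scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
            for j in labs
            if j != i
        )
    return total / len(labs)


if __name__ == "__main__":
    pts = [(0, 0), (0, 2), (10, 0), (10, 2)]
    print(davies_bouldin(pts, [0, 0, 1, 1]))  # compact, well separated
    print(davies_bouldin(pts, [0, 1, 0, 1]))  # elongated, close together
```

Evaluating the index for each candidate partition and keeping the minimum is one way to pick the number of clusters without labels.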
Linear time relational prototype based learning.

PubMed

Gisbrecht, Andrej; Mokbel, Bassam; Schleif, Frank-Michael; Zhu, Xibin; Hammer, Barbara

2012-10-01

Prototype based learning offers an intuitive interface to inspect large quantities of electronic data in supervised or unsupervised settings. Recently, many techniques have been extended to data described by general dissimilarities rather than Euclidean vectors, so-called relational data settings. Unlike the Euclidean counterparts, the techniques have quadratic time complexity due to the underlying quadratic dissimilarity matrix. Thus, they are infeasible already for medium sized data sets. The contribution of this article is twofold: On the one hand we propose a novel supervised prototype based classification technique for dissimilarity data based on popular learning vector quantization (LVQ), on the other hand we transfer a linear time approximation technique, the Nyström approximation, to this algorithm and an unsupervised counterpart, the relational generative topographic mapping (GTM). This way, linear time and space methods result. We evaluate the techniques on three examples from the biomedical domain.
Bio-inspired computational heuristics to study Lane-Emden systems arising in astrophysics model.

PubMed

Ahmad, Iftikhar; Raja, Muhammad Asif Zahoor; Bilal, Muhammad; Ashraf, Farooq

2016-01-01

This study reports novel hybrid computational methods for the solutions of nonlinear singular Lane-Emden type differential equations arising in astrophysics models by exploiting the strength of unsupervised neural network models and stochastic optimization techniques. In the scheme, the neural network, a sub-part of the large field called soft computing, is exploited for modelling of the equation in an unsupervised manner. The proposed approximate solutions of the higher order ordinary differential equation are calculated with the weights of neural networks trained with a genetic algorithm, and pattern search hybridized with sequential quadratic programming for rapid local convergence. The results of the proposed solvers for the nonlinear singular systems are in good agreement with the standard solutions. Accuracy and convergence of the designed schemes are demonstrated by the results of statistical performance measures based on a sufficiently large number of independent runs.
Unsupervised pattern recognition methods in ciders profiling based on GCE voltammetric signals.

PubMed

Jakubowska, Małgorzata; Sordoń, Wanda; Ciepiela, Filip

2016-07-15

This work presents a complete methodology of distinguishing between different brands of cider and ageing degrees, based on voltammetric signals, utilizing dedicated data preprocessing procedures and unsupervised multivariate analysis. It was demonstrated that voltammograms recorded on glassy carbon electrode in Britton-Robinson buffer at pH 2 are reproducible for each brand. By application of clustering algorithms and principal component analysis visible homogenous clusters were obtained. Advanced signal processing strategy which included automatic baseline correction, interval scaling and continuous wavelet transform with dedicated mother wavelet, was a key step in the correct recognition of the objects. The results show that voltammetry combined with optimized univariate and multivariate data processing is a sufficient tool to distinguish between ciders from various brands and to evaluate their freshness. Copyright © 2016 Elsevier Ltd. All rights reserved.
Compliance with 14-day primaquine therapy for radical cure of vivax malaria--a randomized placebo-controlled trial comparing unsupervised with supervised treatment.

PubMed

Leslie, Toby; Rab, Mohammad Abdur; Ahmadzai, Hayat; Durrani, Naeem; Fayaz, Mohammad; Kolaczinski, Jan; Rowland, Mark

2004-03-01

The only available treatment that can eliminate the latent hypnozoite reservoir of vivax malaria is a 14 d course of primaquine (PQ). A potential problem with long-course chemotherapy is the issue of compliance after clinical symptoms have subsided. The present study, carried out at an Afghan refugee camp in Pakistan, between June 2000 and August 2001, compared 14 d treatment in supervised and unsupervised groups in which compliance was monitored by comparison of relapse rates. Clinical cases recruited by passive case detection were randomised by family to placebo, supervised, or unsupervised groups, and treated with chloroquine (25 mg/kg) over 3 days to eliminate erythrocytic stages. Individuals with glucose-6-phosphate dehydrogenase (G6PD) deficiency were excluded from the trial. Cases allocated to supervision were given directly observed treatment (0.25 mg PQ/kg body weight) once per day for 14 days. Cases allocated to the unsupervised group were provided with 14 PQ doses upon enrollment and strongly advised to complete the course. A total of 595 cases were enrolled. After 9 months of follow up PQ proved equally protective against further episodes of P. vivax in supervised (odds ratio 0.35, 95% CI 0.21-0.57) and unsupervised (odds ratio 0.37, 95% CI 0.23-0.59) groups as compared to placebo. All age groups on supervised or unsupervised treatment showed a similar degree of protection even though the risk of relapse decreased with age. The study showed that a presumed problem of poor compliance may be overcome with simple health messages even when the majority of individuals are illiterate and without formal education. Unsupervised treatment with 14-day PQ when combined with simple instruction can avert a significant amount of the morbidity associated with relapse in populations where G6PD deficiency is either absent or readily diagnosable.
True Zero-Training Brain-Computer Interfacing – An Online Study

PubMed Central

Kindermans, Pieter-Jan; Schreuder, Martijn; Schrauwen, Benjamin; Müller, Klaus-Robert; Tangermann, Michael

2014-01-01

Despite several approaches to realize subject-to-subject transfer of pre-trained classifiers, the full performance of a Brain-Computer Interface (BCI) for a novel user can only be reached by presenting the BCI system with data from the novel user. In typical state-of-the-art BCI systems with a supervised classifier, the labeled data is collected during a calibration recording, in which the user is asked to perform a specific task. Based on the known labels of this recording, the BCI's classifier can learn to decode the individual's brain signals. Unfortunately, this calibration recording consumes valuable time. Furthermore, it is unproductive with respect to the final BCI application, e.g. text entry. Therefore, the calibration period must be reduced to a minimum, which is especially important for patients with a limited concentration ability. The main contribution of this manuscript is an online study on unsupervised learning in an auditory event-related potential (ERP) paradigm. Our results demonstrate that the calibration recording can be bypassed by utilizing an unsupervised trained classifier, that is initialized randomly and updated during usage. Initially, the unsupervised classifier tends to make decoding mistakes, as the classifier might not have seen enough data to build a reliable model. Using a constant re-analysis of the previously spelled symbols, these initially misspelled symbols can be rectified posthoc when the classifier has learned to decode the signals. We compare the spelling performance of our unsupervised approach and of the unsupervised posthoc approach to the standard supervised calibration-based dogma for n = 10 healthy users. To assess the learning behavior of our approach, it is unsupervised trained from scratch three times per user. Even with the relatively low SNR of an auditory ERP paradigm, the results show that after a limited number of trials (30 trials), the unsupervised approach performs comparably to a classic supervised model. 
PMID:25068464
40 CFR 799.6756 - TSCA partition coefficient (n-octanol/water), generator column method.

Code of Federal Regulations, 2013 CFR

2013-07-01

... method, or any other reliable quantitative procedure must be used for those compounds that do not absorb... any other reliable quantitative method, aqueous solutions from the generator column enter a collecting... Solubilities and Octanol-Water Partition Coefficients of Hydrophobic Substances,” Journal of Research of the...
40 CFR 799.6756 - TSCA partition coefficient (n-octanol/water), generator column method.

Code of Federal Regulations, 2014 CFR

2014-07-01

... method, or any other reliable quantitative procedure must be used for those compounds that do not absorb... any other reliable quantitative method, aqueous solutions from the generator column enter a collecting... Solubilities and Octanol-Water Partition Coefficients of Hydrophobic Substances,” Journal of Research of the...
A comparison of neural network and fuzzy clustering techniques in segmenting magnetic resonance images of the brain.

PubMed

Hall, L O; Bensaid, A M; Clarke, L P; Velthuizen, R P; Silbiger, M S; Bezdek, J C

1992-01-01

Magnetic resonance (MR) brain section images are segmented and then synthetically colored to give visual representations of the original data with three approaches: the literal and approximate fuzzy c-means unsupervised clustering algorithms, and a supervised computational neural network. Initial clinical results are presented on normal volunteers and selected patients with brain tumors surrounded by edema. Supervised and unsupervised segmentation techniques provide broadly similar results. Unsupervised fuzzy algorithms were visually observed to show better segmentation when compared with raw image data for volunteer studies. For a more complex segmentation problem with tumor/edema or cerebrospinal fluid boundary, where the tissues have similar MR relaxation behavior, inconsistency in rating among experts was observed, with fuzzy c-means approaches being slightly preferred over feedforward cascade correlation results. Various facets of both approaches, such as supervised versus unsupervised learning, time complexity, and utility for the diagnostic process, are compared.
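The fuzzy c-means clustering named in the abstract above can be illustrated with a minimal sketch on one-dimensional intensities. This is the textbook (literal) update rule, not the authors' implementation; the function name, the fuzzifier m = 2, the fixed iteration count, and the sample values are all assumptions made for illustration.

```python
def fuzzy_c_means(values, centers, m=2.0, iterations=50):
    """Textbook fuzzy c-means on 1-D data (illustrative sketch only).

    values  : list of floats (e.g. pixel intensities)
    centers : initial guesses for the cluster centers
    m       : fuzzifier (> 1); larger values give softer memberships
    """
    centers = list(centers)
    memberships = []
    for _ in range(iterations):
        # Membership step: u_ik is proportional to d_ik ** (-2 / (m - 1)).
        memberships = []
        for x in values:
            dists = [abs(x - c) for c in centers]
            if 0.0 in dists:
                # The point sits exactly on a center: give it full membership there.
                row = [1.0 if d == 0.0 else 0.0 for d in dists]
                total = sum(row)
                row = [r / total for r in row]
            else:
                inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
                total = sum(inv)
                row = [v / total for v in inv]
            memberships.append(row)
        # Center step: weighted mean of the data with weights u_ik ** m.
        centers = [
            sum(u[k] ** m * x for u, x in zip(memberships, values))
            / sum(u[k] ** m for u in memberships)
            for k in range(len(centers))
        ]
    # memberships are those computed just before the final center update.
    return centers, memberships
```

Called on two well-separated groups of values, for example `fuzzy_c_means([0.9, 1.0, 1.1, 4.9, 5.0, 5.1], [0.0, 6.0])`, the centers settle near the two group means, and each row of memberships sums to one.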
Six weeks of unsupervised Nintendo Wii Fit gaming is effective at improving balance in independent older adults.

PubMed

Nicholson, Vaughan Patrick; McKean, Mark; Lowe, John; Fawcett, Christine; Burkett, Brendan

2015-01-01

To determine the effectiveness of unsupervised Nintendo Wii Fit balance training in older adults. Forty-one older adults were recruited from local retirement villages and educational settings to participate in a six-week two-group repeated measures study. The Wii group (n = 19, 75 ± 6 years) undertook 30 min of unsupervised Wii balance gaming three times per week in their retirement village while the comparison group (n = 22, 74 ± 5 years) continued with their usual exercise program. Participants' balance abilities were assessed pre- and postintervention. The Wii Fit group demonstrated significant improvements (P < .05) in timed up-and-go, left single-leg balance, lateral reach (left and right), and gait speed compared with the comparison group. Reported levels of enjoyment following game play increased during the study. Six weeks of unsupervised Wii balance training is an effective modality for improving balance in independent older adults.

Assessing the Linguistic Productivity of Unsupervised Deep Neural Networks

DOE Office of Scientific and Technical Information (OSTI.GOV)

Phillips, Lawrence A.; Hodas, Nathan O.

Increasingly, cognitive scientists have demonstrated interest in applying tools from deep learning. One use for deep learning is in language acquisition where it is useful to know if a linguistic phenomenon can be learned through domain-general means. To assess whether unsupervised deep learning is appropriate, we first pose a smaller question: Can unsupervised neural networks apply linguistic rules productively, using them in novel situations? We draw from the literature on determiner/noun productivity by training an unsupervised autoencoder network and measuring its ability to combine nouns with determiners. Our simple autoencoder creates combinations it has not previously encountered, displaying a degree of overlap similar to actual children. While this preliminary work does not provide conclusive evidence for productivity, it warrants further investigation with more complex models. Further, this work helps lay the foundations for future collaboration between the deep learning and cognitive science communities.
Cytotoxicity of Sargassum angustifolium Partitions against Breast and Cervical Cancer Cell Lines

PubMed Central

Vaseghi, Golnaz; Sharifi, Mohsen; Dana, Nasim; Ghasemi, Ahmad; Yegdaneh, Afsaneh

2018-01-01

Background: Marine organisms produce a variety of compounds with pharmacological activities including anticancer effects. This study attempts to determine the cytotoxicity of hexane (HEX), dichloromethane (DCM), and butanol (BUTOH) partitions of Sargassum angustifolium. Materials and Methods: S. angustifolium was collected from Bushehr, a Southwest coastline of Persian Gulf. The plant was extracted by maceration with methanol-ethyl acetate. The extract was evaporated under vacuum and partitioned by the Kupchan method to yield HEX, DCM, and BUTOH partitions. The cytotoxic activity of the extract (150, 450, and 900 μg/ml) was investigated against MCF-7 (breast cancer), HeLa (cervical cancer), and human umbilical vein endothelial cell lines by mitochondrial tetrazolium test assay after 72 h. Results: The survival of HeLa and MCF-7 cells decreased as the concentration of extracts increased from 150 μg/ml to 900 μg/ml. The median growth inhibitory concentration values of the HEX partition were 71 and 77 μg/ml, and those of the DCM partition were 36 and 88 μg/ml, against HeLa and MCF-7, respectively. The value for the BUTOH partition was 25 μg/ml against MCF-7. Conclusion: This study reveals that different partitions of S. angustifolium have cytotoxic activity against cancer cell lines. PMID:29657928
A Recursive Method for Calculating Certain Partition Functions.

ERIC Educational Resources Information Center

Woodrum, Luther; And Others

1978-01-01

Describes a simple recursive method for calculating the partition function and average energy of a system consisting of N electrons and L energy levels. Also, presents an efficient APL computer program to utilize the recursion relation. (Author/GA)
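A recursion of this kind can be sketched as follows. The abstract does not reproduce the authors' relation or their APL program, so this sketch makes its own assumptions: non-degenerate levels that each hold at most one electron (spin ignored), for which the standard relation is Z(n, l) = Z(n, l-1) + exp(-beta * e_l) * Z(n-1, l-1). The function names and the finite-difference step are hypothetical.

```python
import math
from itertools import combinations

def partition_function(energies, n_electrons, beta):
    """Canonical partition function for n_electrons on single-occupancy levels.

    z[n] holds Z(n, l) for the levels processed so far. Adding level l applies
    Z(n, l) = Z(n, l-1) + exp(-beta * e_l) * Z(n-1, l-1).
    """
    z = [1.0] + [0.0] * n_electrons
    for e in energies:
        w = math.exp(-beta * e)
        # Iterate n downward so z[n - 1] still refers to the previous level count.
        for n in range(n_electrons, 0, -1):
            z[n] += w * z[n - 1]
    return z[n_electrons]

def average_energy(energies, n_electrons, beta, h=1e-6):
    """Average energy <E> = -d ln Z / d beta, by a central finite difference."""
    zp = partition_function(energies, n_electrons, beta + h)
    zm = partition_function(energies, n_electrons, beta - h)
    return -(math.log(zp) - math.log(zm)) / (2 * h)

def brute_force_z(energies, n_electrons, beta):
    """Reference value: sum Boltzmann factors over every set of occupied levels."""
    return sum(math.exp(-beta * sum(c)) for c in combinations(energies, n_electrons))
```

The recursion costs on the order of N times L operations, whereas the brute-force sum enumerates every way of choosing N occupied levels out of L; the latter is included only as a check for small systems.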
An automated and objective method for age partitioning of reference intervals based on continuous centile curves.

PubMed

Yang, Qian; Lew, Hwee Yeong; Peh, Raymond Hock Huat; Metz, Michael Patrick; Loh, Tze Ping

2016-10-01

Reference intervals are the most commonly used decision support tool when interpreting quantitative laboratory results. They may require partitioning to better describe subpopulations that display significantly different reference values. Partitioning by age is particularly important for the paediatric population since there are marked physiological changes associated with growth and maturation. However, most partitioning methods are either technically complex or require prior knowledge of the underlying physiology/biological variation of the population. There is growing interest in the use of continuous centile curves, which provides seamless laboratory reference values as a child grows, as an alternative to rigidly described fixed reference intervals. However, the mathematical functions that describe these curves can be complex and may not be easily implemented in laboratory information systems. Hence, the use of fixed reference intervals is expected to continue for a foreseeable time. We developed a method that objectively proposes optimised age partitions and reference intervals for quantitative laboratory data (http://research.sph.nus.edu.sg/pp/ppResult.aspx), based on the sum of gradient that best describes the underlying distribution of the continuous centile curves. It is hoped that this method may improve the selection of age intervals for partitioning, which is receiving increasing attention in paediatric laboratory medicine. Copyright © 2016 Royal College of Pathologists of Australasia. Published by Elsevier B.V. All rights reserved.
Determination of air-loop volume and radon partition coefficient for measuring radon in water sample.

PubMed

Lee, Kil Yong; Burnett, William C

A simple method for the direct determination of the air-loop volume in a RAD7 system as well as the radon partition coefficient was developed allowing for an accurate measurement of the radon activity in any type of water. The air-loop volume may be measured directly using an external radon source and an empty bottle with a precisely measured volume. The partition coefficient and activity of radon in the water sample may then be determined via the RAD7 using the determined air-loop volume. Activity ratios instead of absolute activities were used to measure the air-loop volume and the radon partition coefficient. In order to verify this approach, we measured the radon partition coefficient in deionized water in the temperature range of 10-30 °C and compared the values to those calculated from the well-known Weigel equation. The results were within 5% variance throughout the temperature range. We also applied the approach for measurement of the radon partition coefficient in synthetic saline water (0-75 ppt salinity) as well as tap water. The radon activity of the tap water sample was determined by this method as well as the standard RAD-H2O and BigBottle RAD-H2O. The results have shown good agreement between this method and the standard methods.
Automatic partitioning of head CTA for enabling segmentation

NASA Astrophysics Data System (ADS)

Suryanarayanan, Srikanth; Mullick, Rakesh; Mallya, Yogish; Kamath, Vidya; Nagaraj, Nithin

2004-05-01

Radiologists perform a CT Angiography procedure to examine vascular structures and associated pathologies such as aneurysms. Volume rendering is used to exploit volumetric capabilities of CT that provides complete interactive 3-D visualization. However, bone forms an occluding structure and must be segmented out. The anatomical complexity of the head creates a major challenge in the segmentation of bone and vessel. An analysis of the head volume reveals varying spatial relationships between vessel and bone that can be separated into three sub-volumes: "proximal", "middle", and "distal". The "proximal" and "distal" sub-volumes contain good spatial separation between bone and vessel (carotid referenced here). Bone and vessel appear contiguous in the "middle" partition that remains the most challenging region for segmentation. The partition algorithm is used to automatically identify these partition locations so that different segmentation methods can be developed for each sub-volume. The partition locations are computed using bone, image entropy, and sinus profiles along with a rule-based method. The algorithm is validated on 21 cases (varying volume sizes, resolution, clinical sites, pathologies) using ground truth identified visually. The algorithm is also computationally efficient, processing a 500+ slice volume in 6 seconds (an impressive 0.01 seconds / slice) that makes it an attractive algorithm for pre-processing large volumes. The partition algorithm is integrated into the segmentation workflow. Fast and simple algorithms are implemented for processing the "proximal" and "distal" partitions. Complex methods are restricted to only the "middle" partition. The partition-enabled segmentation has been successfully tested and results are shown from multiple cases.
A Partitioning and Bounded Variable Algorithm for Linear Programming

ERIC Educational Resources Information Center

Sheskin, Theodore J.

2006-01-01

An interesting new partitioning and bounded variable algorithm (PBVA) is proposed for solving linear programming problems. The PBVA is a variant of the simplex algorithm which uses a modified form of the simplex method followed by the dual simplex method for bounded variables. In contrast to the two-phase method and the big M method, the PBVA does…
Emotional textile image classification based on cross-domain convolutional sparse autoencoders with feature selection

NASA Astrophysics Data System (ADS)

Li, Zuhe; Fan, Yangyu; Liu, Weihua; Yu, Zeqi; Wang, Fengqin

2017-01-01

We aim to apply sparse autoencoder-based unsupervised feature learning to emotional semantic analysis for textile images. To tackle the problem of limited training data, we present a cross-domain feature learning scheme for emotional textile image classification using convolutional autoencoders. We further propose a correlation-analysis-based feature selection method for the weights learned by sparse autoencoders to reduce the number of features extracted from large size images. First, we randomly collect image patches on an unlabeled image dataset in the source domain and learn local features with a sparse autoencoder. We then conduct feature selection according to the correlation between different weight vectors corresponding to the autoencoder's hidden units. We finally adopt a convolutional neural network including a pooling layer to obtain global feature activations of textile images in the target domain and send these global feature vectors into logistic regression models for emotional image classification. The cross-domain unsupervised feature learning method achieves 65% to 78% average accuracy in the cross-validation experiments corresponding to eight emotional categories and performs better than conventional methods. Feature selection can reduce the computational cost of global feature extraction by about 50% while improving classification performance.
Automatic segmentation of amyloid plaques in MR images using unsupervised SVM

PubMed Central

Iordanescu, Gheorghe; Venkatasubramanian, Palamadai N.; Wyrwicz, Alice M.

2011-01-01

Deposition of the β-amyloid peptide (Aβ) is an important pathological hallmark of Alzheimer’s disease (AD). However, reliable quantification of amyloid plaques in both human and animal brains remains a challenge. We present here a novel automatic plaque segmentation algorithm based on the intrinsic MR signal characteristics of plaques. This algorithm identifies plaque candidates in MR data by using watershed transform, which extracts regions with low intensities completely surrounded by higher intensity neighbors. These candidates are classified as plaque or non-plaque by an unsupervised learning method using features derived from the MR data intensity. The algorithm performance is validated by comparison with histology. We also demonstrate the algorithm’s ability to detect age-related changes in plaque load ex vivo in 5×FAD APP transgenic mice. To our knowledge, this work represents the first quantitative method for characterizing amyloid plaques in MRI data. The proposed method can be used to describe the spatio-temporal progression of amyloid deposition, which is necessary for understanding the evolution of plaque pathology in mouse models of AD and to evaluate the efficacy of emergent amyloid-targeting therapies in preclinical trials. PMID:22189675
Data Mining for Anomaly Detection

NASA Technical Reports Server (NTRS)

Biswas, Gautam; Mack, Daniel; Mylaraswamy, Dinkar; Bharadwaj, Raj

2013-01-01

The Vehicle Integrated Prognostics Reasoner (VIPR) program describes methods for enhanced diagnostics as well as a prognostic extension to current state of art Aircraft Diagnostic and Maintenance System (ADMS). VIPR introduced a new anomaly detection function for discovering previously undetected and undocumented situations, where there are clear deviations from nominal behavior. Once a baseline (nominal model of operations) is established, the detection and analysis is split between on-aircraft outlier generation and off-aircraft expert analysis to characterize and classify events that may not have been anticipated by individual system providers. Offline expert analysis is supported by data curation and data mining algorithms that can be applied in the contexts of supervised learning methods and unsupervised learning. In this report, we discuss efficient methods to implement the Kolmogorov complexity measure using compression algorithms, and run a systematic empirical analysis to determine the best compression measure. Our experiments established that the combination of the DZIP compression algorithm and CiDM distance measure provides the best results for capturing relevant properties of time series data encountered in aircraft operations. This combination was used as the basis for developing an unsupervised learning algorithm to define "nominal" flight segments using historical flight segments.
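The idea of approximating Kolmogorov complexity with a compressor can be sketched with the normalized compression distance (NCD). This is a stand-in, not the report's method: the abstract names a DZIP compressor and a CiDM distance whose definitions are not given here, so the sketch substitutes Python's standard zlib and the usual NCD formula, and the function names are hypothetical.

```python
import zlib

def compressed_size(data: bytes) -> int:
    """Length of the zlib-compressed data, used as a proxy for complexity."""
    return len(zlib.compress(data, 9))

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings.

    NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)),
    where C is the compressed size. Values near 0 suggest shared structure;
    values near 1 suggest little shared structure.
    """
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)
```

For time series, each segment would first have to be serialized to bytes (for example after quantizing the samples); how that is done strongly affects the distance and is left open here.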
Disambiguating ambiguous biomedical terms in biomedical narrative text: an unsupervised method.

PubMed

Liu, H; Lussier, Y A; Friedman, C

2001-08-01

With the growing use of Natural Language Processing (NLP) techniques for information extraction and concept indexing in the biomedical domain, a method that quickly and efficiently assigns the correct sense of an ambiguous biomedical term in a given context is needed concurrently. The current status of word sense disambiguation (WSD) in the biomedical domain is that handcrafted rules are used based on contextual material. The disadvantages of this approach are (i) generating WSD rules manually is a time-consuming and tedious task, (ii) maintenance of rule sets becomes increasingly difficult over time, and (iii) handcrafted rules are often incomplete and perform poorly in new domains comprised of specialized vocabularies and different genres of text. This paper presents a two-phase unsupervised method to build a WSD classifier for an ambiguous biomedical term W. The first phase automatically creates a sense-tagged corpus for W, and the second phase derives a classifier for W using the derived sense-tagged corpus as a training set. A formative experiment was performed, which demonstrated that classifiers trained on the derived sense-tagged corpora achieved an overall accuracy of about 97%, with greater than 90% accuracy for each individual ambiguous term.
Multilevel Green's function interpolation method for scattering from composite metallic and dielectric objects.

PubMed

Shi, Yan; Wang, Hao Gang; Li, Long; Chan, Chi Hou

2008-10-01

A multilevel Green's function interpolation method based on two kinds of multilevel partitioning schemes--the quasi-2D and the hybrid partitioning scheme--is proposed for analyzing electromagnetic scattering from objects comprising both conducting and dielectric parts. The problem is formulated using the surface integral equation for homogeneous dielectric and conducting bodies. A quasi-2D multilevel partitioning scheme is devised to improve the efficiency of the Green's function interpolation. In contrast to previous multilevel partitioning schemes, noncubic groups are introduced to discretize the whole EM structure in this quasi-2D multilevel partitioning scheme. Based on the detailed analysis of the dimension of the group in this partitioning scheme, a hybrid quasi-2D/3D multilevel partitioning scheme is proposed to effectively handle objects with fine local structures. Selection criteria for some key parameters relating to the interpolation technique are given. The proposed algorithm is ideal for the solution of problems involving objects such as missiles, microstrip antenna arrays, photonic bandgap structures, etc. Numerical examples are presented to show that CPU time is between O(N) and O(N log N) while the computer memory requirement is O(N).
Strainrange partitioning behavior of the nickel-base superalloys, Rene' 80 and IN 100

NASA Technical Reports Server (NTRS)

Halford, G. R.; Nachtigall, A. J.

1978-01-01

A study was made to assess the ability of the method of Strainrange Partitioning (SRP) to both correlate and predict high-temperature, low cycle fatigue lives of nickel base superalloys for gas turbine applications. The partitioned strainrange versus life relationships for uncoated Rene' 80 and cast IN 100 were also determined from the ductility normalized-Strainrange Partitioning equations. These were used to predict the cyclic lives of the baseline tests. The life predictability of the method was verified for cast IN 100 by applying the baseline results to the cyclic life prediction of a series of complex strain cycling tests with multiple hold periods at constant strain. It was concluded that the method of SRP can correlate and predict the cyclic lives of laboratory specimens of the nickel base superalloys evaluated in this program.
Calcic amphibole thermobarometry in metamorphic and igneous rocks: New calibrations based on plagioclase/amphibole Al-Si partitioning and amphibole/liquid Mg partitioning

NASA Astrophysics Data System (ADS)

Molina, J. F.; Moreno, J. A.; Castro, A.; Rodríguez, C.; Fershtater, G. B.

2015-09-01

Dependencies of plagioclase/amphibole Al-Si partitioning, $D_{\mathrm{Al/Si}}^{\mathrm{plg/amp}}$, and amphibole/liquid Mg partitioning, $D_{\mathrm{Mg}}^{\mathrm{amp/liq}}$, on temperature, pressure and phase compositions are investigated employing robust regression methods based on MM-estimators. A database with 92 amphibole-plagioclase pairs - temperature range: 650-1050 °C; amphibole compositional limits: > 0.02 apfu (23O) Ti and > 0.05 apfu Al - and 148 amphibole-glass pairs - temperature range: 800-1100 °C; amphibole compositional limit: CaM4/(CaM4 + NaM4) > 0.75 - compiled from experiments in the literature was used for the calculations (amphibole normalization scheme: 13-CNK method).
Regional Climate Modeling over the Marmara Region, Turkey, with Improved Land Cover Data

NASA Astrophysics Data System (ADS)

Sertel, E.; Robock, A.

2007-12-01

Land surface controls the partitioning of available energy at the surface between sensible and latent heat, and controls partitioning of available water between evaporation and runoff. Current land cover data available within the regional climate models such as Regional Atmospheric Modeling System (RAMS), the Fifth-Generation NCAR/Penn State Mesoscale Model (MM5) and Weather Research and Forecasting (WRF) was obtained from 1-km Advanced Very High Resolution Radiometer satellite images spanning April 1992 through March 1993 with an unsupervised classification technique. These data are not up-to-date and are not accurate for all regions and some land cover types such as urban areas. Here we introduce new, up-to-date and accurate land cover data for the Marmara Region, Turkey derived from Landsat Enhanced Thematic Mapper images into the WRF regional climate model. We used several image processing techniques to create accurate land cover data from Landsat images obtained between 2001 and 2005. First, all images were atmospherically and radiometrically corrected to minimize contamination effects of atmospheric particles and systematic errors. Then, geometric correction was performed for each image to eliminate geometric distortions and define images in a common coordinate system. Finally, unsupervised and supervised classification techniques were utilized to form the most accurate land cover data yet for the study area. Accuracy assessments of the classifications were performed using error matrix and kappa statistics to find the best classification results. Maximum likelihood classification method gave the most accurate results over the study area. We compared the new land cover data with the default WRF land cover data. WRF land cover data cannot represent urban areas in the cities of Istanbul, Izmit, and Bursa.
As an example, both original satellite images and new land cover data showed the expansion of urban areas into the Istanbul metropolitan area, but in the WRF land cover data only a limited area along the Bosporus is shown as urban. In addition, the new land cover data indicate that the northern part of Istanbul is covered by evergreen and deciduous forest (verified by ground truth data), but the WRF data indicate that most of this region is croplands. In the northern part of the Marmara Region, there is bare ground as a result of open mining activities and this class can be identified in our land cover data, whereas the WRF data indicated this region as woodland. We then used this new data set to conduct WRF simulations for one main and two nested domains, where the inner-most domain represents the Marmara Region with 3 km horizontal resolution. The vertical domain of both main and nested domains extends over 28 vertical levels. Initial and boundary conditions were obtained from National Centers for Environmental Prediction-Department of Energy Reanalysis II and the Noah model was selected as the land surface model. Two model simulations were conducted; one with available land cover data and one with the newly created land cover data. Using detailed meteorological station data within the study area, we find that the simulation with the new land cover data set produces better temperature and precipitation simulations for the region, showing the value of accurate land cover data and that changing land cover data can be an important influence on local climate change.
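The accuracy assessment mentioned above (error matrix and kappa statistic) can be sketched as follows. The abstract reports no matrix, so the function names are hypothetical and any numbers passed in are illustrative only; the formula is the standard Cohen's kappa, assuming rows are reference classes and columns are classified classes.

```python
def overall_accuracy(matrix):
    """Fraction of samples on the diagonal of a square error (confusion) matrix."""
    total = sum(sum(row) for row in matrix)
    return sum(matrix[i][i] for i in range(len(matrix))) / total

def kappa(matrix):
    """Cohen's kappa: agreement beyond what the row/column totals give by chance."""
    total = sum(sum(row) for row in matrix)
    observed = overall_accuracy(matrix)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    # Chance agreement expected from the marginal totals alone.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return (observed - expected) / (1 - expected)
```

With a made-up two-class matrix such as `[[45, 5], [10, 40]]`, the overall accuracy is 0.85, the chance agreement is 0.5, and kappa is therefore 0.7.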
A Novel Method for Discovering Fuzzy Sequential Patterns Using the Simple Fuzzy Partition Method.

ERIC Educational Resources Information Center

Chen, Ruey-Shun; Hu, Yi-Chung

2003-01-01

Discusses sequential patterns, data mining, knowledge acquisition, and fuzzy sequential patterns described by natural language. Proposes a fuzzy data mining technique to discover fuzzy sequential patterns by using the simple partition method which allows the linguistic interpretation of each fuzzy set to be easily obtained. (Author/LRW)
Finite element modeling of diffusion and partitioning in biological systems: the infinite composite medium problem.

PubMed

Missel, P J

2000-01-01

Four methods are proposed for modeling diffusion in heterogeneous media where diffusion and partition coefficients take on differing values in each subregion. The exercise was conducted to validate finite element modeling (FEM) procedures in anticipation of modeling drug diffusion with regional partitioning into ocular tissue, though the approach can be useful for other organs, or for modeling diffusion in laminate devices. Partitioning creates a discontinuous value in the dependent variable (concentration) at an intertissue boundary that is not easily handled by available general-purpose FEM codes, which allow for only one value at each node. The discontinuity is handled using a transformation on the dependent variable based upon the region-specific partition coefficient. Methods were evaluated by their ability to reproduce a known exact result, for the problem of the infinite composite medium (Crank, J. The Mathematics of Diffusion, 2nd ed. New York: Oxford University Press, 1975, pp. 38-39.). The most physically intuitive method is based upon the concept of chemical potential, which is continuous across an interphase boundary (method III). This method makes the equation of the dependent variable highly nonlinear. This can be linearized easily by a change of variables (method IV). Results are also given for a one-dimensional problem simulating bolus injection into the vitreous, predicting time disposition of drug in vitreous and retina.
Personalized Medicine in Veterans with Traumatic Brain Injuries

DTIC Science & Technology

2013-05-01

Pair-Group Method using Arithmetic averages (UPGMA) based on cosine correlation of row mean centered log2 signal values; this was the top 50%-tile...clustering was performed by the UPGMA method using Cosine correlation as the similarity metric. For comparative purposes, clustered heat maps included...non-mTBI cases were subjected to unsupervised hierarchical clustering analysis using the UPGMA algorithm with cosine correlation as the similarity
Personalized Medicine in Veterans with Traumatic Brain Injuries

DTIC Science & Technology

2014-07-01

9 control cases are subjected to unsupervised hierarchical clustering analysis using the UPGMA algorithm with cosine correlation as the similarity...in unsupervised hierarchical clustering by the Unweighted Pair-Group Method using Arithmetic averages (UPGMA) based on cosine correlation of row...of log2 transformed MAS5.0 signal values; probe set clustering was performed by the UPGMA method using Cosine correlation as the similarity
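The clustering step named in the two excerpts above (UPGMA on cosine distances of row mean centered log2 values) might be sketched as follows. This is a minimal pure-Python illustration, not the report's pipeline: the function names and the toy signal matrix are hypothetical, and the naive pair search is only practical for small inputs.

```python
import math

def center_rows(rows):
    """Subtract each row's own mean (row mean centering)."""
    return [[v - sum(r) / len(r) for v in r] for r in rows]

def cosine_distance(a, b):
    """1 minus the cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb)

def average_linkage(a, b, dist):
    """UPGMA distance: mean of all pairwise distances between two clusters."""
    total = sum(dist[(min(i, j), max(i, j))] for i in a for j in b)
    return total / (len(a) * len(b))

def upgma(vectors):
    """Return the merge history as (cluster_a, cluster_b, distance) tuples."""
    n = len(vectors)
    dist = {(i, j): cosine_distance(vectors[i], vectors[j])
            for i in range(n) for j in range(i + 1, n)}
    clusters = [(i,) for i in range(n)]
    merges = []
    while len(clusters) > 1:
        # Pick the closest pair of current clusters under average linkage.
        d, a, b = min((average_linkage(a, b, dist), a, b)
                      for k, a in enumerate(clusters)
                      for b in clusters[k + 1:])
        clusters = [c for c in clusters if c not in (a, b)]
        clusters.append(tuple(sorted(a + b)))
        merges.append((a, b, d))
    return merges

# Hypothetical signal values: rows are probe sets, columns are samples.
signal = [[8, 16, 32], [4, 8, 16], [32, 16, 8], [64, 32, 16]]
log2_rows = [[math.log2(v) for v in row] for row in signal]
merges = upgma(center_rows(log2_rows))
```

In this toy input, rows 0 and 1 rise across samples and rows 2 and 3 fall, so the two rising rows and the two falling rows are joined first and the two groups are merged last.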
Spectral Transfer Learning Using Information Geometry for a User-Independent Brain-Computer Interface

DOE PAGES

Waytowich, Nicholas R.; Lawhern, Vernon J.; Bohannon, Addison W.; ...

2016-09-22

Recent advances in signal processing and machine learning techniques have enabled the application of Brain-Computer Interface (BCI) technologies to fields such as medicine, industry, and recreation; however, BCIs still suffer from the requirement of frequent calibration sessions due to the intra- and inter-individual variability of brain signals, which makes calibration suppression through transfer learning an area of increasing interest for the development of practical BCI systems. In this paper, we present an unsupervised transfer method (spectral transfer using information geometry, STIG), which ranks and combines unlabeled predictions from an ensemble of information geometry classifiers built on data from individual training subjects. The STIG method is validated in both off-line and real-time feedback analysis during a rapid serial visual presentation task (RSVP). For detection of single-trial, event-related potentials (ERPs), the proposed method can significantly outperform existing calibration-free techniques as well as outperform traditional within-subject calibration techniques when limited data is available. Here, this method demonstrates that unsupervised transfer learning for single-trial detection in ERP-based BCIs can be achieved without the requirement of costly training data, representing a step forward in the overall goal of achieving a practical user-independent BCI system.

Maximum Margin Clustering of Hyperspectral Data

NASA Astrophysics Data System (ADS)

Niazmardi, S.; Safari, A.; Homayouni, S.

2013-09-01

In recent decades, large margin methods such as Support Vector Machines (SVMs) have been regarded as the state of the art among supervised learning methods for classification of hyperspectral data. However, the results of these algorithms depend mainly on the quality and quantity of the available training data. To tackle the problems associated with training data, researchers have put effort into extending the capability of large margin algorithms to unsupervised learning. One of the recently proposed algorithms is Maximum Margin Clustering (MMC). MMC is an unsupervised SVM algorithm that simultaneously estimates both the labels and the hyperplane parameters. Nevertheless, the optimization of the MMC algorithm is a non-convex problem. Most existing MMC methods rely on reformulating and relaxing the non-convex optimization problem as semi-definite programs (SDP), which are computationally very expensive and can handle only small data sets. Moreover, most of these algorithms perform two-class classification, which cannot be used for classification of remotely sensed data. In this paper, a new MMC algorithm is used that solves the original non-convex problem using the Alternative Optimization method. This algorithm is also extended to multi-class classification and its performance is evaluated. The results show that the proposed algorithm gives acceptable results for hyperspectral data clustering.
Spectral Transfer Learning Using Information Geometry for a User-Independent Brain-Computer Interface

DOE Office of Scientific and Technical Information (OSTI.GOV)

Waytowich, Nicholas R.; Lawhern, Vernon J.; Bohannon, Addison W.

Recent advances in signal processing and machine learning techniques have enabled the application of Brain-Computer Interface (BCI) technologies to fields such as medicine, industry, and recreation; however, BCIs still suffer from the requirement of frequent calibration sessions due to the intra- and inter-individual variability of brain signals, which makes calibration suppression through transfer learning an area of increasing interest for the development of practical BCI systems. In this paper, we present an unsupervised transfer method (spectral transfer using information geometry, STIG), which ranks and combines unlabeled predictions from an ensemble of information geometry classifiers built on data from individual training subjects. The STIG method is validated in both off-line and real-time feedback analysis during a rapid serial visual presentation task (RSVP). For detection of single-trial, event-related potentials (ERPs), the proposed method can significantly outperform existing calibration-free techniques as well as outperform traditional within-subject calibration techniques when limited data is available. Here, this method demonstrates that unsupervised transfer learning for single-trial detection in ERP-based BCIs can be achieved without the requirement of costly training data, representing a step forward in the overall goal of achieving a practical user-independent BCI system.
Unsupervised discovery of microbial population structure within metagenomes using nucleotide base composition

PubMed Central

Saeed, Isaam; Tang, Sen-Lin; Halgamuge, Saman K.

2012-01-01

An approach to infer the unknown microbial population structure within a metagenome is to cluster nucleotide sequences based on common patterns in base composition, otherwise referred to as binning. When functional roles are assigned to the identified populations, a deeper understanding of microbial communities can be attained, more so than gene-centric approaches that explore overall functionality. In this study, we propose an unsupervised, model-based binning method with two clustering tiers, which uses a novel transformation of the oligonucleotide frequency-derived error gradient and GC content to generate coarse groups at the first tier of clustering; and tetranucleotide frequency to refine these groups at the secondary clustering tier. The proposed method has a demonstrated improvement over PhyloPythia, S-GSOM, TACOA and TaxSOM on all three benchmarks that were used for evaluation in this study. The proposed method is then applied to a pyrosequenced metagenomic library of mud volcano sediment sampled in southwestern Taiwan, with the inferred population structure validated against complementary sequencing of 16S ribosomal RNA marker genes. Finally, the proposed method was further validated against four publicly available metagenomes, including a highly complex Antarctic whale-fall bone sample, which was previously assumed to be too complex for binning prior to functional analysis. PMID:22180538
Change detection and classification in brain MR images using change vector analysis.

PubMed

Simões, Rita; Slump, Cornelis

2011-01-01

The automatic detection of longitudinal changes in brain images is valuable in the assessment of disease evolution and treatment efficacy. Most existing change detection methods that are currently used in clinical research to monitor patients suffering from neurodegenerative diseases--such as Alzheimer's--focus on large-scale brain deformations. However, such patients often have other brain impairments, such as infarcts, white matter lesions and hemorrhages, which are typically overlooked by the deformation-based methods. Other unsupervised change detection algorithms have been proposed to detect tissue intensity changes. The outcome of these methods is typically a binary change map, which identifies changed brain regions. However, understanding what types of changes these regions underwent is likely to provide equally important information about lesion evolution. In this paper, we present an unsupervised 3D change detection method based on Change Vector Analysis. We compute and automatically threshold the Generalized Likelihood Ratio map to obtain a binary change map. Subsequently, we perform histogram-based clustering to classify the change vectors. We obtain a Kappa Index of 0.82 using various types of simulated lesions. The classification error is 2%. Finally, we are able to detect and discriminate both small changes and ventricle expansions in datasets from Mild Cognitive Impairment patients.
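The general idea of change vector analysis can be sketched in a few lines. The sketch below is a simplification under stated assumptions: it substitutes a fixed magnitude threshold for the automatically thresholded Generalized Likelihood Ratio map, and a sign-based label for the histogram-based clustering described in the abstract. The two-feature voxels, the threshold, and all function names are hypothetical.

```python
import math

def change_vector(before, after):
    """Per-voxel difference of feature vectors between two time points."""
    return tuple(a - b for b, a in zip(before, after))

def magnitude(vector):
    """Euclidean length of a change vector."""
    return math.sqrt(sum(x * x for x in vector))

def change_map(before_img, after_img, threshold):
    """Return (binary change map, change vectors) for two voxel lists."""
    vectors = [change_vector(b, a) for b, a in zip(before_img, after_img)]
    return [magnitude(v) > threshold for v in vectors], vectors

def direction_label(vector):
    """Illustrative stand-in for change classification: net intensity change."""
    return "brighter" if sum(vector) > 0 else "darker"

# Hypothetical voxels with two intensity features each (e.g. two contrasts).
before = [(100, 50), (100, 50), (100, 50)]
after = [(101, 50), (140, 80), (60, 20)]

changed, vectors = change_map(before, after, threshold=10)
labels = [direction_label(v) for v, c in zip(vectors, changed) if c]
```

The magnitude decides whether a voxel changed at all, while the direction of the vector carries the information about what kind of change occurred, which is the part a binary map alone discards.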
On the asymptotic improvement of supervised learning by utilizing additional unlabeled samples - Normal mixture density case

NASA Technical Reports Server (NTRS)

Shahshahani, Behzad M.; Landgrebe, David A.

1992-01-01

The effect of additional unlabeled samples in improving the supervised learning process is studied in this paper. Three learning processes, supervised, unsupervised, and combined supervised-unsupervised, are compared by studying the asymptotic behavior of the estimates obtained under each process. Upper and lower bounds on the asymptotic covariance matrices are derived. It is shown that under a normal mixture density assumption for the probability density function of the feature space, the combined supervised-unsupervised learning is always superior to the supervised learning in achieving better estimates. Experimental results are provided to verify the theoretical concepts.
Massively parallel unsupervised single-particle cryo-EM data clustering via statistical manifold learning

PubMed Central

Wu, Jiayi; Ma, Yong-Bei; Congdon, Charles; Brett, Bevin; Chen, Shuobing; Xu, Yaofang; Ouyang, Qi

2017-01-01

Structural heterogeneity in single-particle cryo-electron microscopy (cryo-EM) data represents a major challenge for high-resolution structure determination. Unsupervised classification may serve as the first step in the assessment of structural heterogeneity. However, traditional algorithms for unsupervised classification, such as K-means clustering and maximum likelihood optimization, may classify images into wrong classes with decreasing signal-to-noise-ratio (SNR) in the image data, yet demand increased computational costs. Overcoming these limitations requires further development of clustering algorithms for high-performance cryo-EM data processing. Here we introduce an unsupervised single-particle clustering algorithm derived from a statistical manifold learning framework called generative topographic mapping (GTM). We show that unsupervised GTM clustering improves classification accuracy by about 40% in the absence of input references for data with lower SNRs. Applications to several experimental datasets suggest that our algorithm can detect subtle structural differences among classes via a hierarchical clustering strategy. After code optimization over a high-performance computing (HPC) environment, our software implementation was able to generate thousands of reference-free class averages within hours in a massively parallel fashion, which allows a significant improvement on ab initio 3D reconstruction and assists in the computational purification of homogeneous datasets for high-resolution visualization. PMID:28786986
Massively parallel unsupervised single-particle cryo-EM data clustering via statistical manifold learning.

PubMed

Wu, Jiayi; Ma, Yong-Bei; Congdon, Charles; Brett, Bevin; Chen, Shuobing; Xu, Yaofang; Ouyang, Qi; Mao, Youdong

2017-01-01

Structural heterogeneity in single-particle cryo-electron microscopy (cryo-EM) data represents a major challenge for high-resolution structure determination. Unsupervised classification may serve as the first step in the assessment of structural heterogeneity. However, traditional algorithms for unsupervised classification, such as K-means clustering and maximum likelihood optimization, may classify images into wrong classes with decreasing signal-to-noise-ratio (SNR) in the image data, yet demand increased computational costs. Overcoming these limitations requires further development of clustering algorithms for high-performance cryo-EM data processing. Here we introduce an unsupervised single-particle clustering algorithm derived from a statistical manifold learning framework called generative topographic mapping (GTM). We show that unsupervised GTM clustering improves classification accuracy by about 40% in the absence of input references for data with lower SNRs. Applications to several experimental datasets suggest that our algorithm can detect subtle structural differences among classes via a hierarchical clustering strategy. After code optimization over a high-performance computing (HPC) environment, our software implementation was able to generate thousands of reference-free class averages within hours in a massively parallel fashion, which allows a significant improvement on ab initio 3D reconstruction and assists in the computational purification of homogeneous datasets for high-resolution visualization.
Automated processing of webcam images for phenological classification.

PubMed

Bothmann, Ludwig; Menzel, Annette; Menze, Bjoern H; Schunk, Christian; Kauermann, Göran

2017-01-01

Along with global climate change, there is increasing interest in its effect on phenological patterns such as the start and end of the growing season. Scientific digital webcams are used for this purpose, taking one or more images every day of the same natural scene, showing for example trees or grassland sites. To derive phenological patterns from the webcam images, regions of interest are manually defined on these images by an expert, and subsequently a time series of percentage greenness is derived and analyzed with respect to structural changes. While this standard approach leads to satisfactory results and allows the dates of phenological change points to be determined, it is associated with a considerable amount of manual work and is therefore constrained to a limited number of webcams. In particular, this prevents applying the phenological analysis to a large network of publicly accessible webcams in order to capture spatial phenological variation. To scale up the analysis to several hundreds or thousands of webcams, we propose and evaluate two automated alternatives for the definition of regions of interest, allowing for efficient analyses of webcam images. A semi-supervised approach selects pixels based on the correlation of the pixels' time series of percentage greenness with a few prototype pixels. An unsupervised approach clusters pixels based on scores of a singular value decomposition. We show for a scientific webcam that the resulting regions of interest are at least as informative as those chosen by an expert, with the advantage that no manual action is required. Additionally, we show that the methods can even be applied to publicly available webcams accessed via the internet, yielding interesting partitions of the analyzed images. Finally, we show that the methods are suitable for the intended big data applications by analyzing 13988 webcams from the AMOS database.
All developed methods are implemented in the statistical software package R and publicly available in the R package phenofun. Executable example code is provided as supplementary material.
Automated processing of webcam images for phenological classification

PubMed Central

Bothmann, Ludwig; Menzel, Annette; Menze, Bjoern H.; Schunk, Christian; Kauermann, Göran

2017-01-01

Along with global climate change, there is increasing interest in its effect on phenological patterns such as the start and end of the growing season. Scientific digital webcams are used for this purpose, taking one or more images every day of the same natural scene, showing for example trees or grassland sites. To derive phenological patterns from the webcam images, regions of interest are manually defined on these images by an expert, and subsequently a time series of percentage greenness is derived and analyzed with respect to structural changes. While this standard approach leads to satisfactory results and allows the dates of phenological change points to be determined, it is associated with a considerable amount of manual work and is therefore constrained to a limited number of webcams. In particular, this prevents applying the phenological analysis to a large network of publicly accessible webcams in order to capture spatial phenological variation. To scale up the analysis to several hundreds or thousands of webcams, we propose and evaluate two automated alternatives for the definition of regions of interest, allowing for efficient analyses of webcam images. A semi-supervised approach selects pixels based on the correlation of the pixels’ time series of percentage greenness with a few prototype pixels. An unsupervised approach clusters pixels based on scores of a singular value decomposition. We show for a scientific webcam that the resulting regions of interest are at least as informative as those chosen by an expert, with the advantage that no manual action is required. Additionally, we show that the methods can even be applied to publicly available webcams accessed via the internet, yielding interesting partitions of the analyzed images. Finally, we show that the methods are suitable for the intended big data applications by analyzing 13988 webcams from the AMOS database.
All developed methods are implemented in the statistical software package R and publicly available in the R package phenofun. Executable example code is provided as supplementary material. PMID:28235092
Empirical Analysis of Exploiting Review Helpfulness for Extractive Summarization of Online Reviews

ERIC Educational Resources Information Center

Xiong, Wenting; Litman, Diane

2014-01-01

We propose a novel unsupervised extractive approach for summarizing online reviews by exploiting review helpfulness ratings. In addition to using the helpfulness ratings for review-level filtering, we suggest using them as the supervision of a topic model for sentence-level content scoring. The proposed method is metadata-driven, requiring no…
Children Home Alone Unsupervised: Modeling Parental Decisions and Associated Factors in Botswana, Mexico, and Vietnam

ERIC Educational Resources Information Center

Ruiz-Casares, Monica; Heymann, Jody

2009-01-01

Objective: This paper examines different child care arrangements utilized by working families in countries undergoing major socio-economic transitions, with a focus on modeling parental decisions to leave children home alone. Method: The study interviewed 537 working caregivers attending government health clinics in Botswana, Mexico, and Vietnam.…
An Empirical Generative Framework for Computational Modeling of Language Acquisition

ERIC Educational Resources Information Center

Waterfall, Heidi R.; Sandbank, Ben; Onnis, Luca; Edelman, Shimon

2010-01-01

This paper reports progress in developing a computer model of language acquisition in the form of (1) a generative grammar that is (2) algorithmically learnable from realistic corpus data, (3) viable in its large-scale quantitative performance and (4) psychologically real. First, we describe new algorithmic methods for unsupervised learning of…
Unsupervised Anomaly Detection Based on Clustering and Multiple One-Class SVM

NASA Astrophysics Data System (ADS)

Song, Jungsuk; Takakura, Hiroki; Okabe, Yasuo; Kwon, Yongjin

Intrusion detection systems (IDSs) have played an important role as devices to defend our networks from cyber attacks. However, since they are unable to detect unknown attacks, i.e., 0-day attacks, the ultimate challenge in the intrusion detection field is how to identify such attacks exactly and in an automated manner. Over the past few years, several studies have addressed these problems through anomaly detection using unsupervised learning techniques such as clustering, one-class support vector machine (SVM), etc. Although these techniques enable one to construct intrusion detection models at low cost and effort, and are capable of detecting unforeseen attacks, they still have two main problems in intrusion detection: a low detection rate and a high false positive rate. In this paper, we propose a new anomaly detection method based on clustering and multiple one-class SVM in order to improve the detection rate while maintaining a low false positive rate. We evaluated our method using the KDD Cup 1999 data set. Evaluation results show that our approach outperforms the existing algorithms reported in the literature, especially in detection of unknown attacks.
High Throughput Multispectral Image Processing with Applications in Food Science.

PubMed

Tsakanikas, Panagiotis; Pavlidis, Dimitris; Nychas, George-John

2015-01-01

Recently, machine vision has been gaining attention in food science as well as in the food industry for food quality assessment and monitoring. Within the framework of implementing Process Analytical Technology (PAT) in the food industry, image processing can be used not only in estimation and even prediction of food quality but also in detection of adulteration. Towards these applications in food science, we present here a novel methodology for automated image analysis of several kinds of food products, e.g. meat, vanilla crème and table olives, so as to increase objectivity and data reproducibility and to provide low cost information extraction and faster quality assessment, without human intervention. The outcome of the image processing will be propagated to the downstream analysis. The developed multispectral image processing method is based on an unsupervised machine learning approach (Gaussian Mixture Models) and a novel unsupervised scheme of spectral band selection for segmentation process optimization. Through the evaluation we demonstrate its efficiency and robustness against the currently available semi-manual software, showing that the developed method is a high throughput approach appropriate for massive data extraction from food samples.
A Deep Convolutional Coupling Network for Change Detection Based on Heterogeneous Optical and Radar Images.

PubMed

Liu, Jia; Gong, Maoguo; Qin, Kai; Zhang, Puzhao

2018-03-01

We propose an unsupervised deep convolutional coupling network for change detection based on two heterogeneous images acquired by optical sensors and radars on different dates. Most existing change detection methods are based on homogeneous images. Due to the complementary properties of optical and radar sensors, there is an increasing interest in change detection based on heterogeneous images. The proposed network is symmetric with each side consisting of one convolutional layer and several coupling layers. The two input images connected with the two sides of the network, respectively, are transformed into a feature space where their feature representations become more consistent. In this feature space, the difference map is calculated, which then leads to the ultimate detection map by applying a thresholding algorithm. The network parameters are learned by optimizing a coupling function. The learning process is unsupervised, which is different from most existing change detection methods based on heterogeneous images. Experimental results on both homogeneous and heterogeneous images demonstrate the promising performance of the proposed network compared with several existing approaches.
Domain-Invariant Partial-Least-Squares Regression.

PubMed

Nikzad-Langerodi, Ramin; Zellinger, Werner; Lughofer, Edwin; Saminger-Platz, Susanne

2018-05-11

Multivariate calibration models often fail to extrapolate beyond the calibration samples because of changes associated with the instrumental response, environmental condition, or sample matrix. Most of the current methods used to adapt a source calibration model to a target domain exclusively apply to calibration transfer between similar analytical devices, while generic methods for calibration-model adaptation are largely missing. To fill this gap, we here introduce domain-invariant partial-least-squares (di-PLS) regression, which extends ordinary PLS by a domain regularizer in order to align the source and target distributions in the latent-variable space. We show that a domain-invariant weight vector can be derived in closed form, which allows the integration of (partially) labeled data from the source and target domains as well as entirely unlabeled data from the latter. We test our approach on a simulated data set where the aim is to desensitize a source calibration model to an unknown interfering agent in the target domain (i.e., unsupervised model adaptation). In addition, we demonstrate unsupervised, semisupervised, and supervised model adaptation by di-PLS on two real-world near-infrared (NIR) spectroscopic data sets.
Implementation of hybrid clustering based on partitioning around medoids algorithm and divisive analysis on human Papillomavirus DNA

NASA Astrophysics Data System (ADS)

Arimbi, Mentari Dian; Bustamam, Alhadi; Lestari, Dian

2017-03-01

Data clustering can be executed through partition or hierarchical methods for many types of data, including DNA sequences. The two kinds of clustering can be combined by running a partition algorithm at the first level and a hierarchical algorithm at the second level, which is called hybrid clustering. In the partition phase, popular methods such as PAM, K-means, or Fuzzy c-means could be applied. In this study we selected partitioning around medoids (PAM) for the partition stage. Following the partition algorithm, in the hierarchical stage we applied the divisive analysis algorithm (DIANA) in order to obtain more specific cluster and sub-cluster structures. The number of main clusters is determined using the Davies-Bouldin Index (DBI); we choose the number of clusters that minimizes the DBI value. In this work, we cluster 1252 HPV DNA sequences from GenBank. Characteristic extraction is performed first, followed by normalization and genetic distance calculation using Euclidean distance. We implemented the hybrid PAM and DIANA in the R open source programming tool. Using PAM in the first stage, we obtained 3 main clusters with an average DBI value of 0.979. After executing DIANA in the second stage, we obtained 4 sub-clusters for Cluster-1, 9 sub-clusters for Cluster-2 and 2 sub-clusters for Cluster-3, with DBI values of 0.972, 0.771, and 0.768 for the respective main clusters. Since the second stage produces lower DBI values than the first stage, we conclude that this hybrid approach can improve the accuracy of our clustering results.
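The Davies-Bouldin Index used above to choose the number of clusters can be sketched as follows. This is a minimal pure-Python version of the standard centroid-based definition with Euclidean distance, not the authors' R implementation; the abstract does not say whether medoids or centroids were used as cluster representatives, and the toy one-dimensional clusters are hypothetical.

```python
import math

def centroid(points):
    """Coordinate-wise mean of a list of equal-length tuples."""
    return tuple(sum(c) / len(points) for c in zip(*points))

def scatter(points, center):
    """Mean Euclidean distance from the cluster's points to its centre."""
    return sum(math.dist(p, center) for p in points) / len(points)

def davies_bouldin(clusters):
    """Average, over clusters, of the worst (S_i + S_j) / M_ij ratio."""
    centers = [centroid(c) for c in clusters]
    scatters = [scatter(c, m) for c, m in zip(clusters, centers)]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        total += max((scatters[i] + scatters[j])
                     / math.dist(centers[i], centers[j])
                     for j in range(k) if j != i)
    return total / k

# Two hypothetical partitions: compact clusters vs. overlapping ones.
tight = [[(0.0,), (2.0,)], [(10.0,), (12.0,)]]
loose = [[(0.0,), (6.0,)], [(4.0,), (12.0,)]]

dbi_tight = davies_bouldin(tight)
dbi_loose = davies_bouldin(loose)
```

Lower values indicate compact, well separated clusters, which is why a candidate number of clusters is preferred when it gives the smaller index.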
Scoring and staging systems using cox linear regression modeling and recursive partitioning.

PubMed

Lee, J W; Um, S H; Lee, J B; Mun, J; Cho, H

2006-01-01

Scoring and staging systems are used to determine the order and class of data according to predictors. Systems used for medical data, such as the Child-Turcotte-Pugh scoring and staging systems for ordering and classifying patients with liver disease, are often derived strictly from physicians' experience and intuition. We construct objective and data-based scoring/staging systems using statistical methods. We consider Cox linear regression modeling and recursive partitioning techniques for censored survival data. In particular, to obtain a target number of stages we propose cross-validation and amalgamation algorithms. We also propose an algorithm for constructing scoring and staging systems by integrating local Cox linear regression models into recursive partitioning, so that we can retain the merits of both methods such as superior predictive accuracy, ease of use, and detection of interactions between predictors. The staging system construction algorithms are compared by cross-validation evaluation of real data. The data-based cross-validation comparison shows that Cox linear regression modeling is somewhat better than recursive partitioning when there are only continuous predictors, while recursive partitioning is better when there are significant categorical predictors. The proposed local Cox linear recursive partitioning has better predictive accuracy than Cox linear modeling and simple recursive partitioning. This study indicates that integrating local linear modeling into recursive partitioning can significantly improve prediction accuracy in constructing scoring and staging systems.
Dyslexic Participants Show Intact Spontaneous Categorization Processes

ERIC Educational Resources Information Center

Nikolopoulos, Dimitris S.; Pothos, Emmanuel M.

2009-01-01

We examine the performance of dyslexic participants on an unsupervised categorization task against that of matched non-dyslexic control participants. Unsupervised categorization is a cognitive process critical for conceptual development. Existing research in dyslexia has emphasized perceptual tasks and supervised categorization tasks (for which…
The Use of Binary Search Trees in External Distribution Sorting.

ERIC Educational Resources Information Center

Cooper, David; Lynch, Michael F.

1984-01-01

Suggests new method of external distribution called tree partitioning that involves use of binary tree to split incoming file into successively smaller partitions for internal sorting. Number of disc accesses during a tree-partitioning sort were calculated in simulation using files extracted from British National Bibliography catalog files. (19…

A physics-motivated Centroidal Voronoi Particle domain decomposition method

DOE Office of Scientific and Technical Information (OSTI.GOV)

Fu, Lin; Hu, Xiangyu Y.; Adams, Nikolaus A.

2017-04-15

In this paper, we propose a novel domain decomposition method for large-scale simulations in continuum mechanics by merging the concepts of Centroidal Voronoi Tessellation (CVT) and Voronoi Particle dynamics (VP). The CVT is introduced to achieve a high-level compactness of the partitioning subdomains by the Lloyd algorithm which monotonically decreases the CVT energy. The number of computational elements between neighboring partitioning subdomains, which scales the communication effort for parallel simulations, is optimized implicitly as the generated partitioning subdomains are convex and simply connected with small aspect-ratios. Moreover, Voronoi Particle dynamics employing physical analogy with a tailored equation of state is developed, which relaxes the particle system towards the target partition with good load balance. Since the equilibrium is computed by an iterative approach, the partitioning subdomains exhibit locality and the incremental property. Numerical experiments reveal that the proposed Centroidal Voronoi Particle (CVP) based algorithm produces high-quality partitioning with high efficiency, independently of computational-element types. Thus it can be used for a wide range of applications in computational science and engineering.
A physics-motivated Centroidal Voronoi Particle domain decomposition method

NASA Astrophysics Data System (ADS)

Fu, Lin; Hu, Xiangyu Y.; Adams, Nikolaus A.

2017-04-01

In this paper, we propose a novel domain decomposition method for large-scale simulations in continuum mechanics by merging the concepts of Centroidal Voronoi Tessellation (CVT) and Voronoi Particle dynamics (VP). The CVT is introduced to achieve a high-level compactness of the partitioning subdomains by the Lloyd algorithm which monotonically decreases the CVT energy. The number of computational elements between neighboring partitioning subdomains, which scales the communication effort for parallel simulations, is optimized implicitly as the generated partitioning subdomains are convex and simply connected with small aspect-ratios. Moreover, Voronoi Particle dynamics employing physical analogy with a tailored equation of state is developed, which relaxes the particle system towards the target partition with good load balance. Since the equilibrium is computed by an iterative approach, the partitioning subdomains exhibit locality and the incremental property. Numerical experiments reveal that the proposed Centroidal Voronoi Particle (CVP) based algorithm produces high-quality partitioning with high efficiency, independently of computational-element types. Thus it can be used for a wide range of applications in computational science and engineering.
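The Lloyd iteration referred to in this abstract can be illustrated on a discrete point set. The sketch below covers only the CVT part: it assigns each computational element to its nearest generator and moves every generator to the centroid of its cell. It does not attempt the Voronoi Particle dynamics or the load balancing, and the grid of elements, the initial generators, and the function names are hypothetical.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def energy(points, generators):
    """Discrete CVT energy: squared distance to the nearest generator."""
    return sum(min(sq_dist(p, g) for g in generators) for p in points)

def lloyd_step(points, generators):
    """One Lloyd iteration: Voronoi assignment, then centroid update."""
    cells = [[] for _ in generators]
    for p in points:
        nearest = min(range(len(generators)),
                      key=lambda k: sq_dist(p, generators[k]))
        cells[nearest].append(p)
    updated = []
    for g, cell in zip(generators, cells):
        if cell:
            updated.append(tuple(sum(c) / len(cell) for c in zip(*cell)))
        else:
            updated.append(g)  # an empty cell keeps its generator
    return updated

# Hypothetical computational elements: a 10 x 10 grid on the unit square.
points = [(x / 9, y / 9) for x in range(10) for y in range(10)]
generators = [(0.1, 0.1), (0.2, 0.1), (0.9, 0.8)]

energies = [energy(points, generators)]
for _ in range(10):
    generators = lloyd_step(points, generators)
    energies.append(energy(points, generators))
```

Each step can only lower or keep the energy, which matches the monotone decrease stated in the abstract; the recorded `energies` list makes that easy to check.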
Dexamethasone exerts profound immunologic interference on treatment efficacy for recurrent glioblastoma

PubMed Central

Wong, E T; Lok, E; Gautam, S; Swanson, K D

2015-01-01

Background: Patients with recurrent glioblastoma have a poor outcome. Data from the phase III registration trial comparing tumour-treating alternating electric fields (TTFields) vs chemotherapy provided a unique opportunity to study dexamethasone effects on patient outcome unencumbered by the confounding immune and myeloablative side effects of chemotherapy. Methods: Using an unsupervised binary partitioning algorithm, we segregated both cohorts of the trial based on the dexamethasone dose that yielded the greatest statistical difference in overall survival (OS). The results were validated in a separate cohort treated in a single institution with TTFields and their T lymphocytes were correlated with OS. Results: Patients who used dexamethasone doses >4.1 mg per day had a significant reduction in OS when compared with those who used ⩽4.1 mg per day, 4.8 vs 11.0 months respectively (χ²=34.6, P<0.0001) in the TTField-treated cohort and 6.0 vs 8.9 months respectively (χ²=10.0, P<0.0015) in the chemotherapy-treated cohort. In a single institution validation cohort treated with TTFields, the median OS of patients who used dexamethasone >4.1 mg per day was 3.2 months, compared with 8.7 months for those who used ⩽4.1 mg per day (χ²=11.1, P=0.0009). There was a significant correlation between OS and T-lymphocyte counts. Conclusions: Dexamethasone exerted profound effects on both TTFields and chemotherapy efficacy resulting in lower patient OS. Therefore, global immunosuppression by dexamethasone likely interferes with immune functions that are necessary for the treatment of glioblastoma. PMID:26125449
Some simple guides to finding useful information in exploration geochemical data

USGS Publications Warehouse

Singer, D.A.; Kouda, R.

2001-01-01

Most regional geochemistry data reflect processes that can produce superfluous bits of noise and, perhaps, information about the mineralization process of interest. There are two end-member approaches to finding patterns in geochemical data: unsupervised learning and supervised learning. In unsupervised learning, data are processed and the geochemist is given the task of interpreting and identifying possible sources of any patterns. In supervised learning, data from known subgroups such as rock type, mineralized and nonmineralized, and types of mineralization are used to train the system which then is given unknown samples to classify into these subgroups. To locate patterns of interest, it is helpful to transform the data and to remove unwanted masking patterns. With trace elements use of a logarithmic transformation is recommended. In many situations, missing censored data can be estimated using multiple regression of other uncensored variables on the variable with censored values. In unsupervised learning, transformed values can be standardized, or normalized, to a Z-score by subtracting the subset's mean and dividing by its standard deviation. Subsets include any source of differences that might be related to processes unrelated to the target sought such as different laboratories, regional alteration, analytical procedures, or rock types. Normalization removes effects of different means and measurement scales as well as facilitates comparison of spatial patterns of elements. These adjustments remove effects of different subgroups and hopefully leave on the map the simple and uncluttered pattern(s) related to the mineralization only. Supervised learning methods, such as discriminant analysis and neural networks, offer the promise of consistent and, in certain situations, unbiased estimates of where mineralization might exist. These methods critically rely on being trained with data that encompasses all populations fairly and that can possibly fall into only the identified populations. © 2001 International Association for Mathematical Geology.
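The log transformation and per-subset Z-score described in this abstract can be sketched as follows. The concentrations and laboratory labels are invented for illustration, the helper name `zscore_by_subset` is hypothetical, and the sample standard deviation is an assumption (the abstract does not say which estimator is used).

```python
import math
from statistics import mean, stdev

def zscore_by_subset(values, subsets):
    """Log-transform each value, then standardize it against the mean and
    standard deviation of its own subset (e.g. laboratory or rock type)."""
    logs = [math.log10(v) for v in values]
    groups = {}
    for x, s in zip(logs, subsets):
        groups.setdefault(s, []).append(x)
    stats = {s: (mean(xs), stdev(xs)) for s, xs in groups.items()}
    return [(x - stats[s][0]) / stats[s][1] for x, s in zip(logs, subsets)]

# Hypothetical trace-element values (ppm): laboratory B reports values that
# are systematically ten times those of laboratory A.
cu_ppm = [12.0, 15.0, 20.0, 30.0, 120.0, 150.0, 200.0, 300.0]
lab = ["A", "A", "A", "A", "B", "B", "B", "B"]

z = zscore_by_subset(cu_ppm, lab)
```

Because the two hypothetical laboratories differ only by a constant factor, the log transform turns that factor into an offset and the per-subset standardization removes it, so matching samples receive the same Z-score.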
Classification of high-resolution multi-swath hyperspectral data using Landsat 8 surface reflectance data as a calibration target and a novel histogram based unsupervised classification technique to determine natural classes from biophysically relevant fit parameters

NASA Astrophysics Data System (ADS)

McCann, C.; Repasky, K. S.; Morin, M.; Lawrence, R. L.; Powell, S. L.

2016-12-01

Compact, cost-effective, flight-based hyperspectral imaging systems can provide scientifically relevant data over large areas for a variety of applications such as ecosystem studies, precision agriculture, and land management. To fully realize this capability, unsupervised classification techniques based on radiometrically-calibrated data that cluster based on biophysical similarity rather than simply spectral similarity are needed. An automated technique to produce high-resolution, large-area, radiometrically-calibrated hyperspectral data sets based on the Landsat surface reflectance data product as a calibration target was developed and applied to three subsequent years of data covering approximately 1850 hectares. The radiometrically-calibrated data allows inter-comparison of the temporal series. Advantages of the radiometric calibration technique include the need for minimal site access, no ancillary instrumentation, and automated processing. Fitting the reflectance spectra of each pixel using a set of biophysically relevant basis functions reduces the data from 80 spectral bands to 9 parameters providing noise reduction and data compression. Examination of histograms of these parameters allows for determination of natural splitting into biophysical similar clusters. This method creates clusters that are similar in terms of biophysical parameters, not simply spectral proximity. Furthermore, this method can be applied to other data sets, such as urban scenes, by developing other physically meaningful basis functions. The ability to use hyperspectral imaging for a variety of important applications requires the development of data processing techniques that can be automated. The radiometric-calibration combined with the histogram based unsupervised classification technique presented here provide one potential avenue for managing big-data associated with hyperspectral imaging.
Enhanced HMAX model with feedforward feature learning for multiclass categorization.

PubMed

Li, Yinlin; Wu, Wei; Zhang, Bo; Li, Fengfu

2015-01-01

In recent years, the interdisciplinary research between neuroscience and computer vision has promoted the development in both fields. Many biologically inspired visual models are proposed, and among them, the Hierarchical Max-pooling model (HMAX) is a feedforward model mimicking the structures and functions of V1 to posterior inferotemporal (PIT) layer of the primate visual cortex, which could generate a series of position- and scale-invariant features. However, it could be improved with attention modulation and memory processing, which are two important properties of the primate visual cortex. Thus, in this paper, based on recent biological research on the primate visual cortex, we still mimic the first 100-150 ms of visual cognition to enhance the HMAX model, which mainly focuses on the unsupervised feedforward feature learning process. The main modifications are as follows: (1) To mimic the attention modulation mechanism of V1 layer, a bottom-up saliency map is computed in the S1 layer of the HMAX model, which can support the initial feature extraction for memory processing; (2) To mimic the learning, clustering and short-term memory to long-term memory conversion abilities of V2 and IT, an unsupervised iterative clustering method is used to learn clusters with multiscale middle level patches, which are taken as long-term memory; (3) Inspired by the multiple feature encoding mode of the primate visual cortex, information including color, orientation, and spatial position are encoded in different layers of the HMAX model progressively. By adding a softmax layer at the top of the model, multiclass categorization experiments can be conducted, and the results on Caltech101 show that the enhanced model with a smaller memory size exhibits higher accuracy than the original HMAX model, and could also achieve better accuracy than other unsupervised feature learning methods in multiclass categorization task.
Semi-supervised classification tool for DubaiSat-2 multispectral imagery

NASA Astrophysics Data System (ADS)

Al-Mansoori, Saeed

2015-10-01

This paper presents a semi-supervised, pixel-based classification tool for multispectral satellite imagery. Few studies have demonstrated such an algorithm for multispectral images, especially when the image consists of 4 bands (Red, Green, Blue and Near Infrared) as in DubaiSat-2 satellite images. The proposed approach utilizes both unsupervised and supervised classification schemes sequentially to identify four classes in the image, namely, water bodies, vegetation, land (developed and undeveloped areas) and paved areas (i.e. roads). The unsupervised classification concept is applied to identify two classes, water bodies and vegetation, based on a well-known index that uses the distinct wavelengths of visible and near-infrared sunlight absorbed and reflected by plants; this index is called the "Normalized Difference Vegetation Index (NDVI)". Afterward, the supervised classification is performed by selecting homogeneous training samples for roads and land areas. Here, a precise selection of training samples plays a vital role in the classification accuracy. Post classification is finally performed to enhance the classification accuracy, where the classified image is sieved, clumped and filtered before producing the final output. Overall, the supervised classification approach produced higher accuracy than the unsupervised method. This paper shows some current preliminary research results which point out the effectiveness of the proposed technique in a virtual perspective.
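A minimal sketch of the NDVI step follows, using the standard definition NDVI = (NIR - Red) / (NIR + Red). The thresholds `water_max` and `vegetation_min`, the reflectance values, and the function names are assumptions made for illustration; the abstract does not state the thresholds the author used.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index for one pixel."""
    total = nir + red
    return 0.0 if total == 0 else (nir - red) / total

def label_pixel(nir, red, water_max=0.0, vegetation_min=0.3):
    """Unsupervised first pass: label water and vegetation from NDVI alone.
    Both thresholds are illustrative assumptions, not values from the paper."""
    value = ndvi(nir, red)
    if value < water_max:
        return "water"
    if value >= vegetation_min:
        return "vegetation"
    return "unlabeled"  # left for the later supervised stage (land, roads)

# Hypothetical (NIR, Red) reflectance pairs for three pixels.
pixels = [(0.50, 0.08), (0.02, 0.05), (0.30, 0.25)]
labels = [label_pixel(nir, red) for nir, red in pixels]
```

Pixels that fall between the two thresholds stay unlabeled here, which matches the sequential design in the abstract where land and paved areas are handled by the supervised stage.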
Discovering phases, phase transitions, and crossovers through unsupervised machine learning: A critical examination

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hu, Wenjian; Singh, Rajiv R. P.; Scalettar, Richard T.

Here, we apply unsupervised machine learning techniques, mainly principal component analysis (PCA), to compare and contrast the phase behavior and phase transitions in several classical spin models - the square and triangular-lattice Ising models, the Blume-Capel model, a highly degenerate biquadratic-exchange spin-one Ising (BSI) model, and the 2D XY model, and examine critically what machine learning is teaching us. We find that quantified principal components from PCA not only allow exploration of different phases and symmetry-breaking, but can distinguish phase transition types and locate critical points. We show that the corresponding weight vectors have a clear physical interpretation, which is particularly interesting in the frustrated models such as the triangular antiferromagnet, where they can point to incipient orders. Unlike the other well-studied models, the properties of the BSI model are less well known. Using both PCA and conventional Monte Carlo analysis, we demonstrate that the BSI model shows an absence of phase transition and macroscopic ground-state degeneracy. The failure to capture the 'charge' correlations (vorticity) in the BSI model (XY model) from raw spin configurations points to some of the limitations of PCA. Finally, we employ a nonlinear unsupervised machine learning procedure, the 'autoencoder method', and demonstrate that it too can be trained to capture phase transitions and critical points.
Discovering phases, phase transitions, and crossovers through unsupervised machine learning: A critical examination

DOE PAGES

Hu, Wenjian; Singh, Rajiv R. P.; Scalettar, Richard T.

2017-06-19

Here, we apply unsupervised machine learning techniques, mainly principal component analysis (PCA), to compare and contrast the phase behavior and phase transitions in several classical spin models - the square and triangular-lattice Ising models, the Blume-Capel model, a highly degenerate biquadratic-exchange spin-one Ising (BSI) model, and the 2D XY model, and examine critically what machine learning is teaching us. We find that quantified principal components from PCA not only allow exploration of different phases and symmetry-breaking, but can distinguish phase transition types and locate critical points. We show that the corresponding weight vectors have a clear physical interpretation, which is particularly interesting in the frustrated models such as the triangular antiferromagnet, where they can point to incipient orders. Unlike the other well-studied models, the properties of the BSI model are less well known. Using both PCA and conventional Monte Carlo analysis, we demonstrate that the BSI model shows an absence of phase transition and macroscopic ground-state degeneracy. The failure to capture the 'charge' correlations (vorticity) in the BSI model (XY model) from raw spin configurations points to some of the limitations of PCA. Finally, we employ a nonlinear unsupervised machine learning procedure, the 'autoencoder method', and demonstrate that it too can be trained to capture phase transitions and critical points.
Discovering phases, phase transitions, and crossovers through unsupervised machine learning: A critical examination

NASA Astrophysics Data System (ADS)

Hu, Wenjian; Singh, Rajiv R. P.; Scalettar, Richard T.

2017-06-01

We apply unsupervised machine learning techniques, mainly principal component analysis (PCA), to compare and contrast the phase behavior and phase transitions in several classical spin models (the square- and triangular-lattice Ising models, the Blume-Capel model, a highly degenerate biquadratic-exchange spin-1 Ising (BSI) model, and the two-dimensional XY model), and we examine critically what machine learning is teaching us. We find that quantified principal components from PCA not only allow the exploration of different phases and symmetry-breaking, but they can distinguish phase-transition types and locate critical points. We show that the corresponding weight vectors have a clear physical interpretation, which is particularly interesting in the frustrated models such as the triangular antiferromagnet, where they can point to incipient orders. Unlike the other well-studied models, the properties of the BSI model are less well known. Using both PCA and conventional Monte Carlo analysis, we demonstrate that the BSI model shows an absence of phase transition and macroscopic ground-state degeneracy. The failure to capture the "charge" correlations (vorticity) in the BSI model (XY model) from raw spin configurations points to some of the limitations of PCA. Finally, we employ a nonlinear unsupervised machine learning procedure, the "autoencoder method," and we demonstrate that it too can be trained to capture phase transitions and critical points.
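The idea that a PCA weight vector can carry physical meaning can be illustrated with a toy sketch. It assumes hand-made spin configurations (mostly aligned "ordered" samples and fully random "disordered" samples) rather than Monte Carlo samples of any model in the paper, and it finds only the leading principal component, by power iteration; `leading_component` is a hypothetical helper name.

```python
import random

def leading_component(samples, iterations=50, seed=0):
    """Leading PCA weight vector and per-sample scores, by power iteration
    on the centered data (equivalent to the top covariance eigenvector)."""
    rng = random.Random(seed)
    n, d = len(samples), len(samples[0])
    means = [sum(s[j] for s in samples) / n for j in range(d)]
    centered = [[s[j] - means[j] for j in range(d)] for s in samples]
    w = [rng.gauss(0.0, 1.0) for _ in range(d)]
    for _ in range(iterations):
        scores = [sum(x[j] * w[j] for j in range(d)) for x in centered]
        w = [sum(scores[i] * centered[i][j] for i in range(n)) for j in range(d)]
        norm = sum(v * v for v in w) ** 0.5
        w = [v / norm for v in w]
    scores = [sum(x[j] * w[j] for j in range(d)) for x in centered]
    return w, scores

# Toy configurations (assumption): 16 spins per sample, values +1 or -1.
rng = random.Random(1)
n_spins = 16
configs = []
for _ in range(200):  # "ordered": nearly all spins share one random sign
    sign = rng.choice([-1, 1])
    configs.append([sign if rng.random() > 0.05 else -sign for _ in range(n_spins)])
for _ in range(200):  # "disordered": every spin is independent
    configs.append([rng.choice([-1, 1]) for _ in range(n_spins)])

weights, scores = leading_component(configs)
magnetization = [sum(c) / n_spins for c in configs]
```

For this toy data the leading weight vector is expected to be close to uniform, so each sample's score should track its magnetization; that is the kind of physical reading of a weight vector the abstract refers to.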
Intelligent Fault Diagnosis of Rotary Machinery Based on Unsupervised Multiscale Representation Learning

NASA Astrophysics Data System (ADS)

Jiang, Guo-Qian; Xie, Ping; Wang, Xiao; Chen, Meng; He, Qun

2017-11-01

The performance of traditional vibration based fault diagnosis methods greatly depends on handcrafted features extracted using signal processing algorithms, which require significant amounts of domain knowledge and human labor, and do not generalize well to new diagnosis domains. Recently, unsupervised representation learning has provided a promising alternative to feature extraction in traditional fault diagnosis due to its superior learning ability from unlabeled data. Given that vibration signals usually contain multiple temporal structures, this paper proposes a multiscale representation learning (MSRL) framework to learn useful features directly from raw vibration signals, with the aim to capture rich and complementary fault pattern information at different scales. In our proposed approach, a coarse-grained procedure is first employed to obtain multiple scale signals from an original vibration signal. Then, sparse filtering, a newly developed unsupervised learning algorithm, is applied to automatically learn useful features from each scale signal, and the learned features at each scale are concatenated one by one to obtain multiscale representations. Finally, the multiscale representations are fed into a supervised classifier to achieve diagnosis results. Our proposed approach is evaluated using two different case studies: motor bearing and wind turbine gearbox fault diagnosis. Experimental results show that the proposed MSRL approach can take full advantage of the availability of unlabeled data to learn discriminative features and achieves better performance with higher accuracy and stability compared to the traditional approaches.
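The coarse-graining step can be sketched as below, under the assumption that it averages consecutive non-overlapping windows (the convention commonly used in multiscale analysis; the abstract does not spell out the exact procedure). The function names and the sample signal are hypothetical, and the sparse filtering and classifier stages are not shown.

```python
def coarse_grain(signal, scale):
    """Average consecutive, non-overlapping windows of length `scale`.
    Trailing samples that do not fill a whole window are dropped."""
    n_windows = len(signal) // scale
    return [
        sum(signal[i * scale:(i + 1) * scale]) / scale
        for i in range(n_windows)
    ]

def multiscale_signals(signal, scales=(1, 2, 3)):
    """One coarse-grained copy of the signal per scale factor."""
    return {scale: coarse_grain(signal, scale) for scale in scales}

# Hypothetical raw vibration samples.
vibration = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
by_scale = multiscale_signals(vibration)
```

Each scale yields a shorter signal, so features would be learned separately per scale and then concatenated, as the abstract describes.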
Unsupervised Spatial Event Detection in Targeted Domains with Applications to Civil Unrest Modeling

PubMed Central

Zhao, Liang; Chen, Feng; Dai, Jing; Hua, Ting; Lu, Chang-Tien; Ramakrishnan, Naren

2014-01-01

Twitter has become a popular data source as a surrogate for monitoring and detecting events. Targeted domains such as crime, election, and social unrest require the creation of algorithms capable of detecting events pertinent to these domains. Due to the unstructured language, short-length messages, dynamics, and heterogeneity typical of Twitter data streams, it is technically difficult and labor-intensive to develop and maintain supervised learning systems. We present a novel unsupervised approach for detecting spatial events in targeted domains and illustrate this approach using one specific domain, viz. civil unrest modeling. Given a targeted domain, we propose a dynamic query expansion algorithm to iteratively expand domain-related terms, and generate a tweet homogeneous graph. An anomaly identification method is utilized to detect spatial events over this graph by jointly maximizing local modularity and spatial scan statistics. Extensive experiments conducted in 10 Latin American countries demonstrate the effectiveness of the proposed approach. PMID:25350136
Report: Unsupervised identification of malaria parasites using computer vision.

PubMed

Khan, Najeed Ahmed; Pervaz, Hassan; Latif, Arsalan; Musharaff, Ayesha

2017-01-01

Malaria in humans is a serious and fatal tropical disease. This disease results from Anopheles mosquitoes that are infected by Plasmodium species. The clinical diagnosis of malaria based on the history, symptoms and clinical findings must always be confirmed by laboratory diagnosis. Laboratory diagnosis of malaria involves identification of the malaria parasite or its antigen/products in the blood of the patient. Manual diagnosis of the malaria parasite by pathologists has proven to be cumbersome. Therefore, there is a need for automatic, efficient and accurate identification of the malaria parasite. In this paper, we propose a computer vision based approach to identify the malaria parasite from light microscopy images. This research deals with the challenges involved in the automatic detection of malaria parasite tissues. Our proposed method is pixel-based. We used K-means clustering (an unsupervised approach) for the segmentation to identify malaria parasite tissues.
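Pixel-based K-means segmentation of the kind named in this abstract can be sketched on a toy example. The 4x4 grayscale image, the choice of k = 2, and the deterministic initialization are all assumptions for illustration; the paper's actual features, colour space and number of clusters are not given in the abstract.

```python
def kmeans_1d(values, k=2, iterations=20):
    """Plain K-means on scalar pixel intensities (requires k >= 2).
    Centers start evenly spaced between the minimum and maximum value."""
    lo, hi = min(values), max(values)
    centers = [lo + (hi - lo) * i / (k - 1) for i in range(k)]

    def nearest(v):
        return min(range(k), key=lambda c: abs(v - centers[c]))

    for _ in range(iterations):
        labels = [nearest(v) for v in values]
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = sum(members) / len(members)
    return [nearest(v) for v in values], centers

# Hypothetical grayscale patch: a dark blob on a bright background.
image = [
    [200, 198, 205, 201],
    [199,  60,  55, 203],
    [202,  58,  62, 200],
    [204, 201, 199, 197],
]
flat = [pixel for row in image for pixel in row]
labels, centers = kmeans_1d(flat, k=2)
mask = [labels[r * 4:(r + 1) * 4] for r in range(4)]
```

Cluster 0 collects the dark pixels and cluster 1 the bright background, so `mask` acts as a segmentation map with one label per pixel.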
Unsupervised Feature Learning for Heart Sounds Classification Using Autoencoder

NASA Astrophysics Data System (ADS)

Hu, Wei; Lv, Jiancheng; Liu, Dongbo; Chen, Yao

2018-04-01

Cardiovascular disease seriously threatens the health of many people. It is usually diagnosed during cardiac auscultation, which is a fast and efficient method of cardiovascular disease diagnosis. In recent years, deep learning approaches using unsupervised learning have made significant breakthroughs in many fields. However, to our knowledge, deep learning has not yet been used for heart sound classification. In this paper, we first use the average Shannon energy to extract the envelope of the heart sounds, then find the highest point of S1 to extract the cardiac cycle. We convert the time-domain signals of the cardiac cycle into spectrograms and apply principal component analysis whitening to reduce the dimensionality of the spectrogram. Finally, we apply a two-layer autoencoder to extract the features of the spectrogram. The experimental results demonstrate that the features from the autoencoder are suitable for heart sound classification.
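The envelope step can be sketched with the usual definition of average Shannon energy over a frame of N normalized samples, E = -(1/N) Σ x² log(x²). The synthetic signal, window length and hop size below are assumptions for illustration, and the later steps (S1 peak picking, spectrograms, PCA whitening, autoencoder) are not shown.

```python
import math

def average_shannon_energy(signal, window, hop):
    """Envelope from the average Shannon energy of overlapping frames.
    The signal is first normalized by its maximum absolute value."""
    peak = max(abs(x) for x in signal)
    normalized = [x / peak for x in signal]
    envelope = []
    for start in range(0, len(normalized) - window + 1, hop):
        frame = normalized[start:start + window]
        # Samples whose square is exactly zero contribute nothing (limit of x^2 log x^2).
        energy = -sum(x * x * math.log(x * x) for x in frame if x * x > 0.0) / window
        envelope.append(energy)
    return envelope

# Synthetic stand-in for a heart sound: a loud burst (samples 100-199)
# surrounded by low-amplitude background.
signal = [
    (1.0 if 100 <= i < 200 else 0.01) * math.sin(2 * math.pi * i / 20)
    for i in range(300)
]
envelope = average_shannon_energy(signal, window=20, hop=10)
peak_frame = max(range(len(envelope)), key=envelope.__getitem__)
```

The envelope is largest over the burst, so locating its maximum is a simple way to find a prominent sound component, analogous to picking the highest point of S1.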
Unsupervised analysis of small animal dynamic Cerenkov luminescence imaging

NASA Astrophysics Data System (ADS)

Spinelli, Antonello E.; Boschi, Federico

2011-12-01

Clustering analysis (CA) and principal component analysis (PCA) were applied to dynamic Cerenkov luminescence images (dCLI). In order to investigate the performance of the proposed approaches, two distinct dynamic data sets obtained by injecting mice with ³²P-ATP and ¹⁸F-FDG were acquired using the IVIS 200 optical imager. The k-means clustering algorithm was applied to dCLI and was implemented using Interactive Data Language (IDL) 8.1. We show that cluster analysis allows us to obtain good agreement between the clustered and the corresponding emission regions like the bladder, the liver, and the tumor. We also show a good correspondence between the time activity curves of the different regions obtained by using CA and manual region of interest analysis on dCLI and PCA images. We conclude that CA provides an automatic unsupervised method for the analysis of preclinical dynamic Cerenkov luminescence image data.
Unsupervised identification of cone photoreceptors in non-confocal adaptive optics scanning light ophthalmoscope images.

PubMed

Bergeles, Christos; Dubis, Adam M; Davidson, Benjamin; Kasilian, Melissa; Kalitzeos, Angelos; Carroll, Joseph; Dubra, Alfredo; Michaelides, Michel; Ourselin, Sebastien

2017-06-01

Precise measurements of photoreceptor numerosity and spatial arrangement are promising biomarkers for the early detection of retinal pathologies and may be valuable in the evaluation of retinal therapies. Adaptive optics scanning light ophthalmoscopy (AOSLO) is a method of imaging that corrects for aberrations of the eye to acquire high-resolution images that reveal the photoreceptor mosaic. These images are typically graded manually by experienced observers, obviating the robust, large-scale use of the technology. This paper addresses unsupervised automated detection of cones in non-confocal, split-detection AOSLO images. Our algorithm leverages the appearance of split-detection images to create a cone model that is used for classification. Results show that it compares favorably to the state-of-the-art, both for images of healthy retinas and for images from patients affected by Stargardt disease. The algorithm presented also compares well to manual annotation while excelling in speed.
Analysis of thematic mapper simulator data collected over eastern North Dakota

NASA Technical Reports Server (NTRS)

Anderson, J. E. (Principal Investigator)

1982-01-01

The results of the analysis of aircraft-acquired thematic mapper simulator (TMS) data, collected to investigate the utility of thematic mapper data in crop area and land cover estimates, are discussed. Results of the analysis indicate that the seven-channel TMS data are capable of delineating the 13 crop types included in the study to an overall pixel classification accuracy of 80.97% correct, with relative efficiencies for four crop types examined between 1.62 and 26.61. Both supervised and unsupervised spectral signature development techniques were evaluated. The unsupervised methods proved to be inferior (based on analysis of variance) for the majority of crop types considered. Given the ground truth data set used for spectral signature development as well as evaluation of performance, it is possible to demonstrate which signature development technique would produce the highest percent correct classification for each crop type.
Unsupervised color image segmentation using a lattice algebra clustering technique

NASA Astrophysics Data System (ADS)

Urcid, Gonzalo; Ritter, Gerhard X.

2011-08-01

In this paper we introduce a lattice algebra clustering technique for segmenting digital images in the Red-Green-Blue (RGB) color space. The proposed technique is a two-step procedure. Given an input color image, the first step determines the finite set of its extreme pixel vectors within the color cube by means of the scaled min-W and max-M lattice auto-associative memory matrices, including the minimum and maximum vector bounds. In the second step, maximal rectangular boxes enclosing each extreme color pixel are found using the Chebychev distance between color pixels; afterwards, clustering is performed by assigning each image pixel to its corresponding maximal box. The two steps in our proposed method are completely unsupervised or autonomous. Illustrative examples are provided to demonstrate the color segmentation results including a brief numerical comparison with two other non-maximal variations of the same clustering technique.
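The Chebychev (maximum-coordinate) distance between colour pixels can be shown in a short sketch. Assigning each pixel to its nearest extreme colour is a simplified stand-in for the paper's maximal-box assignment, and the extreme colours are simply given here rather than derived from lattice auto-associative memories; all names and values are hypothetical.

```python
def chebyshev(p, q):
    """Chebychev distance: the largest per-channel absolute difference."""
    return max(abs(a - b) for a, b in zip(p, q))

def assign_to_extremes(pixels, extremes):
    """Label each RGB pixel with the index of its nearest extreme colour
    under the Chebychev distance (simplified stand-in for box assignment)."""
    return [
        min(range(len(extremes)), key=lambda i: chebyshev(px, extremes[i]))
        for px in pixels
    ]

# Hypothetical extreme colours and pixels in RGB space.
extremes = [(255, 0, 0), (0, 255, 0), (0, 0, 255)]
pixels = [(250, 10, 5), (20, 240, 30), (10, 15, 200), (200, 60, 40)]
labels = assign_to_extremes(pixels, extremes)
```

Under this distance the set of points within radius r of a colour is an axis-aligned cube of side 2r, which is why it pairs naturally with rectangular boxes in the colour cube.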
Unsupervised feature learning for autonomous rock image classification

NASA Astrophysics Data System (ADS)

Shu, Lei; McIsaac, Kenneth; Osinski, Gordon R.; Francis, Raymond

2017-09-01

Autonomous rock image classification can enhance the capability of robots for geological detection and enlarge the scientific returns, both in investigation on Earth and planetary surface exploration on Mars. Since rock textural images are usually inhomogeneous and manually hand-crafting features is not always reliable, we propose an unsupervised feature learning method to autonomously learn the feature representation for rock images. In our tests, rock image classification using the learned features shows that the learned features can outperform manually selected features. Self-taught learning is also proposed to learn the feature representation from a large database of unlabelled rock images of mixed class. The learned features can then be used repeatedly for classification of any subclass. This takes advantage of the large dataset of unlabelled rock images and learns a general feature representation for many kinds of rocks. We show experimental results supporting the feasibility of self-taught learning on rock images.
Computational efficient unsupervised coastline detection from single-polarization 1-look SAR images of complex coastal environments

NASA Astrophysics Data System (ADS)

Garzelli, Andrea; Zoppetti, Claudia; Pinelli, Gianpaolo

2017-10-01

Coastline detection in synthetic aperture radar (SAR) images is crucial in many application fields, from coastal erosion monitoring to navigation, from damage assessment to security planning for port facilities. The backscattering difference between land and sea is not always documented in SAR imagery, due to the severe speckle noise, especially in 1-look data with high spatial resolution, high sea state, or complex coastal environments. This paper presents an unsupervised, computationally efficient solution to extract the coastline acquired by only one single-polarization 1-look SAR image. Extensive tests on Spotlight COSMO-SkyMed images of complex coastal environments and objective assessment demonstrate the validity of the proposed procedure which is compared to state-of-the-art methods through visual results and with an objective evaluation of the distance between the detected and the true coastline provided by regional authorities.

Data mining with unsupervised clustering using photonic micro-ring resonators

NASA Astrophysics Data System (ADS)

McAulay, Alastair D.

2013-09-01

Data is commonly moved through optical fiber in modern data centers and may be stored optically. We propose an optical method of data mining for future data centers to enhance performance. For example, in clustering, a form of unsupervised learning, we propose that parameters corresponding to information in a database are converted from analog values to frequencies, as in the brain's neurons, where similar data will have close frequencies. We describe the Wilson-Cowan model for oscillating neurons. In optics we implement the frequencies with micro ring resonators. Due to the influence of weak coupling, a group of resonators will form clusters of similar frequencies that will indicate the desired parameters having close relations. Fewer clusters are formed as clustering proceeds, which allows the creation of a tree showing topics of importance and their relationships in the database. The tree can be used for instance to target advertising and for planning.
Housing and sexual health among street-involved youth.

PubMed

Kumar, Maya M; Nisenbaum, Rosane; Barozzino, Tony; Sgro, Michael; Bonifacio, Herbert J; Maguire, Jonathon L

2015-10-01

Street-involved youth (SIY) carry a disproportionate burden of sexually transmitted diseases (STD). Studies among adults suggest that improving housing stability may be an effective primary prevention strategy for improving sexual health. Housing options available to SIY offer varying degrees of stability and adult supervision. This study investigated whether housing options offering more stability and adult supervision are associated with fewer STD and related risk behaviors among SIY. A cross-sectional study was performed using public health survey and laboratory data collected from Toronto SIY in 2010. Three exposure categories were defined a priori based on housing situation: (1) stable and supervised housing, (2) stable and unsupervised housing, and (3) unstable and unsupervised housing. Multivariate logistic regression was used to test the association between housing category and current or recent STD. Secondary analyses were performed using the following secondary outcomes: blood-borne infection, recent binge-drinking, and recent high-risk sexual behavior. The final analysis included 184 SIY. Of these, 28.8% had a current or recent STD. Housing situation was stable and supervised for 12.5%, stable and unsupervised for 46.2%, and unstable and unsupervised for 41.3%. Compared to stable and supervised housing, there was no significant association between current or recent STD among stable and unsupervised housing or unstable and unsupervised housing. There was no significant association between housing category and risk of blood-borne infection, binge-drinking, or high-risk sexual behavior. Although we did not demonstrate a significant association between stable and supervised housing and lower STD risk, our incorporation of both housing stability and adult supervision into a priori defined exposure groups may inform future studies of housing-related prevention strategies among SIY. Multi-modal interventions beyond housing alone may also be required to prevent sexual morbidity among these vulnerable youth.
Out-of-School Time and Adolescent Substance Use.

PubMed

Lee, Kenneth T H; Vandell, Deborah Lowe

2015-11-01

High levels of adolescent substance use are linked to lower academic achievement, reduced schooling, and delinquency. We assess four types of out-of-school time (OST) contexts--unsupervised time with peers, sports, organized activities, and paid employment--in relation to tobacco, alcohol, and marijuana use at the end of high school. Other research has examined these OST contexts in isolation, limiting efforts to disentangle potentially confounded relations. Longitudinal data from the National Institute of Child Health and Human Development Study of Early Child Care and Youth Development (N = 766) examined associations between different OST contexts during high school and substance use at the end of high school. Unsupervised time with peers increased the odds of tobacco, alcohol, and marijuana use, whereas sports increased the odds of alcohol use and decreased the odds of marijuana use. Paid employment increased the odds of tobacco and alcohol use. Unsupervised time with peers predicted increased amounts of tobacco, alcohol, and marijuana use, whereas sports predicted decreased amounts of tobacco and marijuana use and increased amounts of alcohol use at the end of high school. Although unsupervised time with peers, sports, and paid employment were differentially linked to the odds of substance use, only unsupervised time with peers and sports were significantly associated with the amounts of tobacco, alcohol, and marijuana use at the end of high school. These findings underscore the value of considering OST contexts in relation to strategies to promote adolescent health. Reducing unsupervised time with peers and increasing sports participation may have positive impacts on reducing substance use. Copyright © 2015 Society for Adolescent Health and Medicine. Published by Elsevier Inc. All rights reserved.
Partition functions for heterotic WZW conformal field theories

NASA Astrophysics Data System (ADS)

Gannon, Terry

1993-08-01

Thus far in the search for, and classification of, "physical" modular invariant partition functions $\sum N_{LR}\,\chi_L\,\chi_R^*$, the attention has been focused on the symmetric case where the holomorphic and anti-holomorphic sectors, and hence the characters $\chi_L$ and $\chi_R$, are associated with the same Kac-Moody algebras $\hat{g}_L = \hat{g}_R$ and levels $\kappa_L = \kappa_R$. In this paper we consider the more general possibility where $(\hat{g}_L, \kappa_L)$ may not equal $(\hat{g}_R, \kappa_R)$. We discuss which choices of algebras and levels may correspond to well-defined conformal field theories, we find the "smallest" such heterotic (i.e. asymmetric) partition functions, and we give a method, generalizing the Roberts-Terao-Warner lattice method, for explicitly constructing many other modular invariants. We conclude the paper by proving that this new lattice method will succeed in generating all the heterotic partition functions, for all choices of algebras and levels.
Task-specific image partitioning.

PubMed

Kim, Sungwoong; Nowozin, Sebastian; Kohli, Pushmeet; Yoo, Chang D

2013-02-01

Image partitioning is an important preprocessing step for many of the state-of-the-art algorithms used for performing high-level computer vision tasks. Typically, partitioning is conducted without regard to the task at hand. We propose a task-specific image partitioning framework to produce a region-based image representation that will lead to a higher task performance than that reached using any task-oblivious partitioning framework or the existing supervised partitioning frameworks, which are few in number. The proposed method partitions the image by means of correlation clustering, maximizing a linear discriminant function defined over a superpixel graph. The parameters of the discriminant function that define task-specific similarity/dissimilarity among superpixels are estimated based on structured support vector machine (S-SVM) using task-specific training data. The S-SVM learning leads to a better generalization ability while the construction of the superpixel graph used to define the discriminant function allows a rich set of features to be incorporated to improve discriminability and robustness. We evaluate the learned task-aware partitioning algorithms on three benchmark datasets. Results show that task-aware partitioning leads to better labeling performance than the partitioning computed by the state-of-the-art general-purpose and supervised partitioning algorithms. We believe that the task-specific image partitioning paradigm is widely applicable to improving performance in high-level image understanding tasks.
Partition coefficients of methylated DNA bases obtained from free energy calculations with molecular electron density derived atomic charges.

PubMed

Lara, A; Riquelme, M; Vöhringer-Martinez, E

2018-05-11

Partition coefficients serve in various areas, such as pharmacology and environmental sciences, to predict the hydrophobicity of different substances. Recently, they have also been used to address the accuracy of force fields for various organic compounds and specifically the methylated DNA bases. In this study, atomic charges were derived by different partitioning methods (Hirshfeld and Minimal Basis Iterative Stockholder) directly from the electron density obtained by electronic structure calculations in a vacuum, with an implicit solvation model or with explicit solvation taking the dynamics of the solute and the solvent into account. To test the ability of these charges to describe electrostatic interactions in force fields for condensed phases, the original atomic charges of the AMBER99 force field were replaced with the new atomic charges and combined with different solvent models to obtain the hydration and chloroform solvation free energies by molecular dynamics simulations. Chloroform-water partition coefficients derived from the obtained free energies were compared to experimental and previously reported values obtained with the GAFF or the AMBER-99 force field. The results show that good agreement with experimental data is obtained when the polarization of the electron density by the solvent has been taken into account, and when the energy needed to polarize the electron density of the solute has been considered in the transfer free energy. These results were further confirmed by hydration free energies of polar and aromatic amino acid side chain analogs. Comparison of the two partitioning methods, Hirshfeld-I and Minimal Basis Iterative Stockholder (MBIS), revealed some deficiencies in the Hirshfeld-I method related to the unstable isolated anionic nitrogen pro-atom used in the method.
Hydration free energies and partitioning coefficients obtained with atomic charges from the MBIS partitioning method accounting for polarization by the implicit solvation model are in good agreement with the experimental values. © 2018 Wiley Periodicals, Inc.
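The step from solvation free energies to a chloroform-water partition coefficient can be illustrated with the standard thermodynamic relation log10 P = -(ΔG_chloroform - ΔG_hydration) / (RT ln 10). The sketch below is only an illustration of that relation; the function name, the units (kJ/mol), and the default temperature of 298.15 K are assumptions and are not taken from the abstract.

```python
import math

R_KJ_PER_MOL_K = 8.314462618e-3  # molar gas constant in kJ/(mol K)

def log_partition_coefficient(dg_hydration, dg_chloroform, temperature=298.15):
    """Return log10 P (chloroform/water) from two solvation free energies.

    Both free energies are in kJ/mol (assumed units). A more favourable
    (more negative) chloroform solvation free energy gives log10 P > 0.
    """
    # Free energy of transferring the solute from water to chloroform.
    dg_transfer = dg_chloroform - dg_hydration
    return -dg_transfer / (R_KJ_PER_MOL_K * temperature * math.log(10))
```

At 298.15 K the factor RT ln 10 is roughly 5.7 kJ/mol, so each 5.7 kJ/mol of transfer free energy shifts log10 P by about one unit.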
Spectral (Finite) Volume Method for Conservation Laws on Unstructured Grids II: Extension to Two Dimensional Scalar Equation

NASA Technical Reports Server (NTRS)

Wang, Z. J.; Liu, Yen; Kwak, Dochan (Technical Monitor)

2002-01-01

The framework for constructing a high-order, conservative Spectral (Finite) Volume (SV) method is presented for two-dimensional scalar hyperbolic conservation laws on unstructured triangular grids. Each triangular grid cell forms a spectral volume (SV), and the SV is further subdivided into polygonal control volumes (CVs) to support high-order data reconstructions. Cell-averaged solutions from these CVs are used to reconstruct a high order polynomial approximation in the SV. Each CV is then updated independently with a Godunov-type finite volume method and a high-order Runge-Kutta time integration scheme. A universal reconstruction is obtained by partitioning all SVs in a geometrically similar manner. The convergence of the SV method is shown to depend on how an SV is partitioned. A criterion based on the Lebesgue constant has been developed and used successfully to determine the quality of various partitions. Symmetric, stable, and convergent linear, quadratic, and cubic SVs have been obtained, and many different types of partitions have been evaluated. The SV method is tested for both linear and non-linear model problems with and without discontinuities.
Ensemble Semi-supervised Frame-work for Brain Magnetic Resonance Imaging Tissue Segmentation.

PubMed

Azmi, Reza; Pishgoo, Boshra; Norozi, Narges; Yeganeh, Samira

2013-04-01

Tissue segmentation of brain magnetic resonance images (MRIs) is one of the most important parts of clinical diagnostic tools. Pixel classification methods, both supervised and unsupervised, have frequently been used for image segmentation. Supervised segmentation methods lead to high accuracy, but they need a large amount of labeled data, which is hard, expensive, and slow to obtain. Moreover, they cannot use unlabeled data to train classifiers. On the other hand, unsupervised segmentation methods have no prior knowledge and lead to a low level of performance. Semi-supervised learning, however, which uses a small amount of labeled data together with a large amount of unlabeled data, yields higher accuracy with less trouble. In this paper, we propose an ensemble semi-supervised frame-work for segmenting brain magnetic resonance imaging (MRI) tissues that uses the results of several semi-supervised classifiers simultaneously. Selecting appropriate classifiers has a significant role in the performance of this frame-work. Hence, in this paper, we present two semi-supervised algorithms, expectation filtering maximization and MCo_Training, which are improved versions of the semi-supervised methods expectation maximization and Co_Training and which increase segmentation accuracy. Afterward, we use these improved classifiers together with a graph-based semi-supervised classifier as components of the ensemble frame-work. Experimental results show that the segmentation performance of this approach is higher than that of both supervised methods and the individual semi-supervised classifiers.
Unsupervised Ensemble Anomaly Detection Using Time-Periodic Packet Sampling

NASA Astrophysics Data System (ADS)

Uchida, Masato; Nawata, Shuichi; Gu, Yu; Tsuru, Masato; Oie, Yuji

We propose an anomaly detection method for finding patterns in network traffic that do not conform to legitimate (i.e., normal) behavior. The proposed method trains a baseline model describing the normal behavior of network traffic without using manually labeled traffic data. The trained baseline model is used as the basis for comparison with the audit network traffic. This anomaly detection works in an unsupervised manner through the use of time-periodic packet sampling, which is used in a manner that differs from its intended purpose: the lossy nature of packet sampling is used to extract normal packets from the unlabeled original traffic data. Evaluation using actual traffic traces showed that the proposed method has false positive and false negative rates in the detection of anomalies regarding TCP SYN packets comparable to those of a conventional method that uses manually labeled traffic data to train the baseline model. Performance variation due to the probabilistic nature of sampled traffic data is mitigated by using ensemble anomaly detection that collectively exploits multiple baseline models in parallel. Alarm sensitivity is adjusted for the intended use by using maximum- and minimum-based anomaly detection that effectively takes advantage of the performance variations among the multiple baseline models. Testing using actual traffic traces showed that the proposed anomaly detection method performs as well as one using manually labeled traffic data and better than one using randomly sampled (unlabeled) traffic data.
Unsupervised malaria parasite detection based on phase spectrum.

PubMed

Fang, Yuming; Xiong, Wei; Lin, Weisi; Chen, Zhenzhong

2011-01-01

In this paper, we propose a novel method for malaria parasite detection based on phase spectrum. The method first obtains the amplitude spectrum and phase spectrum for blood smear images through Quaternion Fourier Transform (QFT). Then it gets the reconstructed image based on Inverse Quaternion Fourier transform (IQFT) on a constant amplitude spectrum and the original phase spectrum. The malaria parasite areas can be detected easily from the reconstructed blood smear images. Extensive experiments have demonstrated the effectiveness of this novel method.
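The reconstruction step described above (constant amplitude spectrum combined with the original phase spectrum) can be sketched in a simplified form. The sketch below substitutes an ordinary complex 2-D discrete Fourier transform on a single grayscale channel for the Quaternion Fourier Transform used in the paper, so it illustrates the idea only and is not the authors' method; the function names and the naive transform are assumptions made for the example, and the abstract does not specify any post-processing of the reconstructed image.

```python
import cmath
import math

def dft2(img, inverse=False):
    """Naive 2-D discrete Fourier transform; adequate only for tiny demo images."""
    rows, cols = len(img), len(img[0])
    sign = 1 if inverse else -1
    out = [[0j] * cols for _ in range(rows)]
    for u in range(rows):
        for v in range(cols):
            acc = 0j
            for r in range(rows):
                for c in range(cols):
                    angle = sign * 2 * math.pi * (u * r / rows + v * c / cols)
                    acc += img[r][c] * cmath.exp(1j * angle)
            # The inverse transform carries the 1/(rows*cols) normalisation.
            out[u][v] = acc / (rows * cols) if inverse else acc
    return out

def phase_only_reconstruction(img, eps=1e-12):
    """Keep the phase spectrum, set every amplitude to 1, and transform back."""
    spectrum = dft2(img)
    # Coefficients with (near-)zero amplitude have no defined phase; drop them.
    unit = [[z / abs(z) if abs(z) > eps else 0j for z in row] for row in spectrum]
    recon = dft2(unit, inverse=True)
    return [[abs(z) for z in row] for row in recon]
```

For an image containing a single bright pixel on a zero background, every Fourier amplitude is already equal, so the reconstruction returns that same single pixel; on real images the flattened amplitude suppresses smooth, repetitive background and emphasises localized structure.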
Iron Partitioning in Ferropericlase and Consequences for the Magma Ocean.

NASA Astrophysics Data System (ADS)

Braithwaite, J. W. H.; Stixrude, L. P.; Holmstrom, E.; Pinilla, C.

2016-12-01

The relative buoyancy of crystals and liquid is likely to exert a strong influence on the thermal and chemical evolution of the magma ocean. Theory indicates that liquids approach, but do not exceed, the density of iso-chemical crystals in the deep mantle. The partitioning of heavy elements, such as Fe, is therefore likely to control whether crystals sink or float. While some experimental results exist, our knowledge of silicate liquid-crystal element partitioning is still limited in the deep mantle. We have developed a method for computing Mg-Fe partitioning in such systems. We have focused initially on ferropericlase, as a relatively simple system where the buoyancy effects of Fe partitioning are likely to be large. The method is based on molecular dynamics driven by density functional theory (spin polarized, PBEsol+U). We compute the free energy of Mg for Fe substitution in simulations of liquid and B1 crystalline phases via adiabatic switching. We investigate the dependence of partitioning on pressure, temperature, and iron concentration. We find that the liquid is denser than the coexisting crystalline phase at all conditions studied. We also find that the high-spin to low-spin transition in the crystal and the liquid has an important influence on partitioning behavior.
An iterative network partition algorithm for accurate identification of dense network modules

PubMed Central

Sun, Siqi; Dong, Xinran; Fu, Yao; Tian, Weidong

2012-01-01

A key step in network analysis is to partition a complex network into dense modules. Currently, modularity is one of the most popular benefit functions used to partition network modules. However, recent studies suggested that it has an inherent limitation in detecting dense network modules. In this study, we observed that despite the limitation, modularity has the advantage of preserving the primary network structure of the undetected modules. Thus, we have developed a simple iterative Network Partition (iNP) algorithm to partition a network. The iNP algorithm provides a general framework in which any modularity-based algorithm can be implemented in the network partition step. Here, we tested iNP with three modularity-based algorithms: multi-step greedy (MSG), spectral clustering and Qcut. Compared with the original three methods, iNP achieved a significant improvement in the quality of network partition in a benchmark study with simulated networks, identified more modules with significantly better enrichment of functionally related genes in both yeast protein complex network and breast cancer gene co-expression network, and discovered more cancer-specific modules in the cancer gene co-expression network. As such, iNP should have a broad application as a general method to assist in the analysis of biological networks. PMID:22121225
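As background for the benefit function mentioned above, the following sketch computes the standard Newman-Girvan modularity of a given partition, Q = Σ_c [e_c/m - (d_c/2m)^2], where m is the number of edges, e_c the number of edges inside community c, and d_c the total degree of its nodes. It is not the iNP algorithm itself, whose steps the abstract does not spell out; the function name and the input format (an undirected, unweighted edge list without self-loops plus a node-to-community mapping) are assumptions for illustration.

```python
from collections import defaultdict

def modularity(edges, community_of):
    """Newman-Girvan modularity of a partition of an undirected, unweighted graph.

    edges        -- list of (u, v) pairs, each undirected edge listed once
    community_of -- dict mapping every node to its community label
    """
    m = len(edges)
    internal = defaultdict(int)  # edges with both endpoints in the community
    degree = defaultdict(int)    # sum of node degrees per community
    for u, v in edges:
        degree[community_of[u]] += 1
        degree[community_of[v]] += 1
        if community_of[u] == community_of[v]:
            internal[community_of[u]] += 1
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)

# Two triangles joined by a single bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
q_split = modularity(edges, split)  # 5/14, about 0.357
```

Placing all six nodes in one community gives Q = 0, so the two-triangle split scores higher, which is the behaviour a modularity-based partition step relies on.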
Partitioning of monomethylmercury between freshwater algae and water.

PubMed

Miles, C J; Moye, H A; Phlips, E J; Sargent, B

2001-11-01

Phytoplankton-water monomethylmercury (MeHg) partition constants (KpI) have been determined in the laboratory for two green algae Selenastrum capricornutum and Cosmarium botrytis, the blue-green algae Schizothrix calcicola, and the diatom Thalassiosira spp., algal species that are commonly found in natural surface waters. Two methods were used to determine KpI, the Freundlich isotherm method and the flow-through/dialysis bag method. Both methods yielded KpI values of about $10^{6.6}$ for S. capricornutum and were not significantly different. The KpI for the four algae studied were similar except for Schizothrix, which was significantly lower than S. capricornutum. The KpI for MeHg and S. capricornutum (exponential growth) was not significantly different in systems with predominantly MeHgOH or MeHgCl species. This is consistent with other studies that show metal speciation controls uptake kinetics, but the reactivity with intracellular components controls steady-state concentrations. Partitioning constants determined with exponential and stationary phase S. capricornutum cells at the same conditions were not significantly different, while the partitioning constant for exponential phase, phosphorus-limited cells was significantly lower, suggesting that P-limitation alters the ecophysiology of S. capricornutum sufficiently to impact partitioning, which may then ultimately affect mercury levels in higher trophic species.
A Multi-Objective Partition Method for Marine Sensor Networks Based on Degree of Event Correlation.

PubMed

Huang, Dongmei; Xu, Chenyixuan; Zhao, Danfeng; Song, Wei; He, Qi

2017-09-21

Existing marine sensor networks acquire data from sea areas that are geographically divided, and store the data independently in their affiliated sea area data centers. In the case of marine events across multiple sea areas, the current network structure needs to retrieve data from multiple data centers, and thus severely affects real-time decision making. In this study, in order to provide a fast data retrieval service for a marine sensor network, we use all the marine sensors as the vertices, establish the edge based on marine events, and abstract the marine sensor network as a graph. Then, we construct a multi-objective balanced partition method to partition the abstract graph into multiple regions and store them in the cloud computing platform. This method effectively increases the correlation of the sensors and decreases the retrieval cost. On this basis, an incremental optimization strategy is designed to dynamically optimize existing partitions when new sensors are added into the network. Experimental results show that the proposed method can achieve the optimal layout for distributed storage in the process of disaster data retrieval in the China Sea area, and effectively optimize the result of partitions when new buoys are deployed, which eventually will provide efficient data access service for marine events.
The oligonucleotide frequency derived error gradient and its application to the binning of metagenome fragments

PubMed Central

2009-01-01

Background: The characterisation, or binning, of metagenome fragments is an important first step to further downstream analysis of microbial consortia. Here, we propose a one-dimensional signature, OFDEG, derived from the oligonucleotide frequency profile of a DNA sequence, and show that it is possible to obtain a meaningful phylogenetic signal for relatively short DNA sequences. The one-dimensional signal is essentially a compact representation of higher dimensional feature spaces of greater complexity and is intended to improve on the tetranucleotide frequency feature space preferred by current compositional binning methods. Results: We compare the fidelity of OFDEG against tetranucleotide frequency in both an unsupervised and semi-supervised setting on simulated metagenome benchmark data. Four tests were conducted using assembler output of Arachne and phrap, and for each, performance was evaluated on contigs which are greater than or equal to 8 kbp in length and contigs which are composed of at least 10 reads. Using G-C content in conjunction with OFDEG gave an average accuracy of 96.75% (semi-supervised) and 95.19% (unsupervised), versus 94.25% (semi-supervised) and 82.35% (unsupervised) for tetranucleotide frequency. Conclusion: We have presented an observation of an alternative characteristic of DNA sequences. The proposed feature representation has proven to be more beneficial than the existing tetranucleotide frequency space to the metagenome binning problem. We do note, however, that our observation of OFDEG deserves further analysis and investigation. Unsupervised clustering revealed OFDEG related features performed better than standard tetranucleotide frequency in representing a relevant organism specific signal. Further improvement in binning accuracy is given by semi-supervised classification using OFDEG.
The emphasis on a feature-driven, bottom-up approach to the problem of binning reveals promising avenues for future development of techniques to characterise short environmental sequences without bias toward cultivable organisms. PMID:19958473
Advanced soft computing diagnosis method for tumour grading.

PubMed

Papageorgiou, E I; Spyridonos, P P; Stylios, C D; Ravazoula, P; Groumpos, P P; Nikiforidis, G N

2006-01-01

To develop an advanced diagnostic method for urinary bladder tumour grading. A novel soft computing modelling methodology based on the augmentation of fuzzy cognitive maps (FCMs) with the unsupervised active Hebbian learning (AHL) algorithm is applied. One hundred and twenty-eight cases of urinary bladder cancer were retrieved from the archives of the Department of Histopathology, University Hospital of Patras, Greece. All tumours had been characterized according to the classical World Health Organization (WHO) grading system. To design the FCM model for tumour grading, three expert histopathologists defined the main histopathological features (concepts) and their impact on grade characterization. The resulting FCM model consisted of nine concepts. Eight concepts represented the main histopathological features for tumour grading. The ninth concept represented the tumour grade. To increase the classification ability of the FCM model, the AHL algorithm was applied to adjust the weights of the FCM. The proposed FCM grading model achieved a classification accuracy of 72.5%, 74.42% and 95.55% for tumours of grades I, II and III, respectively. An advanced computerized method to support tumour grade diagnosis decision was proposed and developed. The novelty of the method is based on employing the soft computing method of FCMs to represent specialized knowledge on histopathology and on augmenting the FCMs' ability using an unsupervised learning algorithm, the AHL. The proposed method performs with reasonably high accuracy compared to other existing methods and at the same time meets the physicians' requirements for transparency and explicability.
Comparison of Source Partitioning Methods for CO2 and H2O Fluxes Based on High Frequency Eddy Covariance Data

NASA Astrophysics Data System (ADS)

Klosterhalfen, Anne; Moene, Arnold; Schmidt, Marius; Ney, Patrizia; Graf, Alexander

2017-04-01

Source partitioning of eddy covariance (EC) measurements of CO2 into respiration and photosynthesis is routinely used for a better understanding of the exchange of greenhouse gases, especially between terrestrial ecosystems and the atmosphere. The most frequently used methods are usually based either on relations of fluxes to environmental drivers or on chamber measurements. However, they often depend strongly on assumptions or invasive measurements and usually do not offer partitioning estimates for latent heat fluxes into evaporation and transpiration. SCANLON and SAHU (2008) and SCANLON and KUSTAS (2010) proposed a promising method to estimate the contributions of transpiration and evaporation using measured high frequency time series of CO2 and H2O fluxes - no extra instrumentation necessary. This method (SK10 in the following) is based on the spatial separation and relative strength of sources and sinks of CO2 and water vapor among the sub-canopy and canopy. Assuming that air from those sources and sinks is not yet perfectly mixed before reaching EC sensors, partitioning is estimated based on the separate application of the flux-variance similarity theory to the stomatal and non-stomatal components of the regarded fluxes, as well as on additional assumptions on stomatal water use efficiency (WUE). The CO2 partitioning method after THOMAS et al. (2008) (TH08 in the following) also follows the argument that the dissimilarities of sources and sinks in and below a canopy affect the relation between H2O and CO2 fluctuations. Instead of involving assumptions on WUE, TH08 directly screens their scattergram for signals of joint respiration and evaporation events and applies a conditional sampling methodology. In spite of their different main targets (H2O vs. CO2), both methods can yield partitioning estimates on both fluxes.
We therefore compare various sub-methods of SK10 and TH08 including own modifications (e.g., cluster analysis) to each other, to established source partitioning methods, and to chamber measurements at various agroecosystems. Further, profile measurements and a canopy-resolving Large Eddy Simulation model are used to test the assumptions involved in SK10. Scanlon, T.M., Kustas, W.P., 2010. Partitioning carbon dioxide and water vapor fluxes using correlation analysis. Agricultural and Forest Meteorology 150 (1), 89-99. Scanlon, T.M., Sahu, P., 2008. On the correlation structure of water vapor and carbon dioxide in the atmospheric surface layer: A basis for flux partitioning. Water Resources Research 44 (10), W10418, 15 pp. Thomas, C., Martin, J.G., Goeckede, M., Siqueira, M.B., Foken, T., Law, B.E., Loescher H.W., Katul, G., 2008. Estimating daytime subcanopy respiration from conditional sampling methods applied to multi-scalar high frequency turbulence time series. Agricultural and Forest Meteorology 148 (8-9), 1210-1229.
Maritime Search and Rescue via Multiple Coordinated UAS

DTIC Science & Technology

2016-01-01

partitioning method uses the underlying probability distribution assumptions to place that probability near the geometric center of the partitions. There...During partitioning the known locations are accommodated, but the unaccounted for objects are placed into geometrically unfavorable conditions. The...Zeitlin, A.D.: UAS Sense and Avoid Development - the Challenges of Technology, Standards, and Certification. Aerospace Sciences Meeting including
Cost efficient CFD simulations: Proper selection of domain partitioning strategies

NASA Astrophysics Data System (ADS)

Haddadi, Bahram; Jordan, Christian; Harasek, Michael

2017-10-01

Computational Fluid Dynamics (CFD) is one of the most powerful simulation methods, which is used for temporally and spatially resolved solutions of fluid flow, heat transfer, mass transfer, etc. One of the challenges of Computational Fluid Dynamics is the extreme hardware demand. Nowadays super-computers (e.g. High Performance Computing, HPC) featuring multiple CPU cores are applied for solving; the simulation domain is split into partitions for each core. Some of the different methods for partitioning are investigated in this paper. As a practical example, a new open source based solver was utilized for simulating packed bed adsorption, a common separation method within the field of thermal process engineering. Adsorption can, for example, be applied for the removal of trace gases from a gas stream or the production of pure gases such as hydrogen. For comparing the performance of the partitioning methods, a 60 million cell mesh for a packed bed of spherical adsorbents was created; one second of the adsorption process was simulated. Different partitioning methods available in OpenFOAM® (Scotch, Simple, and Hierarchical) have been used with different numbers of sub-domains. The effect of the different methods and number of processor cores on the simulation speedup and also energy consumption were investigated for two different hardware infrastructures (Vienna Scientific Clusters VSC 2 and VSC 3). As a general recommendation an optimum number of cells per processor core was calculated. Optimized simulation speed, lower energy consumption and consequently the cost effects are reported here.
Polymers as Reference Partitioning Phase: Polymer Calibration for an Analytically Operational Approach To Quantify Multimedia Phase Partitioning.

PubMed

Gilbert, Dorothea; Witt, Gesine; Smedes, Foppe; Mayer, Philipp

2016-06-07

Polymers are increasingly applied for the enrichment of hydrophobic organic chemicals (HOCs) from various types of samples and media in many analytical partitioning-based measuring techniques. We propose using polymers as a reference partitioning phase and introduce polymer-polymer partitioning as the basis for a deeper insight into partitioning differences of HOCs between polymers, calibrating analytical methods, and consistency checking of existing and calculation of new partition coefficients. Polymer-polymer partition coefficients were determined for polychlorinated biphenyls (PCBs), polycyclic aromatic hydrocarbons (PAHs), and organochlorine pesticides (OCPs) by equilibrating 13 silicones, including polydimethylsiloxane (PDMS) and low-density polyethylene (LDPE) in methanol-water solutions. Methanol as cosolvent ensured that all polymers reached equilibrium while its effect on the polymers' properties did not significantly affect silicone-silicone partition coefficients. However, we noticed minor cosolvent effects on determined polymer-polymer partition coefficients. Polymer-polymer partition coefficients near unity confirmed identical absorption capacities of several PDMS materials, whereas larger deviations from unity were indicated within the group of silicones and between silicones and LDPE. Uncertainty in polymer volume due to imprecise coating thickness or the presence of fillers was identified as the source of error for partition coefficients. New polymer-based (LDPE-lipid, PDMS-air) and multimedia partition coefficients (lipid-water, air-water) were calculated by applying the new concept of a polymer as reference partitioning phase and by using polymer-polymer partition coefficients as conversion factors. The present study encourages the use of polymer-polymer partition coefficients, recognizing that polymers can serve as a linking third phase for a quantitative understanding of equilibrium partitioning of HOCs between any two phases.

Enhancing data locality by using terminal propagation

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hendrickson, B.; Leland, R.; Van Driessche, R.

1995-12-31

Terminal propagation is a method developed in the circuit placement community for adding constraints to graph partitioning problems. This paper adapts and expands this idea, and applies it to the problem of partitioning data structures among the processors of a parallel computer. We show how the constraints in terminal propagation can be used to encourage partitions in which messages are communicated only between architecturally near processors. We then show how these constraints can be handled in two important partitioning algorithms, spectral bisection and multilevel-KL. We compare the quality of partitions generated by these algorithms to each other and to partitions generated by more familiar techniques.
A partitioning strategy for nonuniform problems on multiprocessors

NASA Technical Reports Server (NTRS)

Berger, M. J.; Bokhari, S.

1985-01-01

The partitioning of a problem on a domain with unequal work estimates in different subdomains is considered in a way that balances the work load across multiple processors. Such a problem arises for example in solving partial differential equations using an adaptive method that places extra grid points in certain subregions of the domain. A binary decomposition of the domain is used to partition it into rectangles requiring equal computational effort. The communication costs of mapping this partitioning onto different microprocessors (a mesh-connected array, a tree machine and a hypercube) are then studied. The communication cost expressions can be used to determine the optimal depth of the above partitioning.
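The binary decomposition described above can be sketched as a recursive bisection that alternates the cut direction and, at each level, picks the cut that best balances the summed work estimates of the two halves. This is a minimal illustration under stated assumptions, not the authors' implementation: the grid-of-work-estimates input, the function names, and the tie-breaking (first best cut wins) are all hypothetical choices.

```python
def rect_work(work, rect):
    """Total work estimate inside rect = (r0, r1, c0, c1), half-open bounds."""
    r0, r1, c0, c1 = rect
    return sum(work[r][c] for r in range(r0, r1) for c in range(c0, c1))

def binary_decomposition(work, rect, depth, cut_columns=True):
    """Split rect into at most 2**depth rectangles of roughly equal work."""
    r0, r1, c0, c1 = rect
    if depth == 0:
        return [rect]
    # Fall back to the other axis when the preferred one cannot be cut.
    if cut_columns and c1 - c0 < 2:
        cut_columns = False
    elif not cut_columns and r1 - r0 < 2:
        cut_columns = True
    lo, hi = (c0, c1) if cut_columns else (r0, r1)
    if hi - lo < 2:
        return [rect]  # a single cell cannot be split further

    def halves(k):
        if cut_columns:
            return (r0, r1, c0, k), (r0, r1, k, c1)
        return (r0, k, c0, c1), (k, r1, c0, c1)

    # Choose the cut position whose two halves have the most similar work.
    best = min(range(lo + 1, hi),
               key=lambda k: abs(rect_work(work, halves(k)[0])
                                 - rect_work(work, halves(k)[1])))
    first, second = halves(best)
    return (binary_decomposition(work, first, depth - 1, not cut_columns)
            + binary_decomposition(work, second, depth - 1, not cut_columns))
```

With depth d the sketch yields up to 2**d rectangles, one per processor; increasing the depth improves load balance at the price of more boundaries, which is the trade-off the communication cost expressions in the abstract are meant to quantify.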
A Fifth-order Symplectic Trigonometrically Fitted Partitioned Runge-Kutta Method

NASA Astrophysics Data System (ADS)

Kalogiratou, Z.; Monovasilis, Th.; Simos, T. E.

2007-09-01

Trigonometrically fitted symplectic Partitioned Runge Kutta (EFSPRK) methods for the numerical integration of Hamiltonian systems with oscillatory solutions are derived. These methods integrate exactly differential systems whose solutions can be expressed as linear combinations of the set of functions sin(wx), cos(wx), w∈R. We modify a fifth order symplectic PRK method with six stages so as to derive an exponentially fitted SPRK method. The methods are tested on the numerical integration of the two body problem.
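To illustrate what a symplectic partitioned Runge-Kutta scheme does for an oscillatory Hamiltonian system, the sketch below uses the second-order Stormer-Verlet method, which is the simplest widely known symplectic PRK method. It is not the fifth-order, six-stage trigonometrically fitted method of the paper, whose coefficients the abstract does not give; the function names and the harmonic-oscillator test problem are assumptions chosen for the example.

```python
def stormer_verlet(q, p, force, h, steps):
    """Integrate q' = p, p' = force(q) with the Stormer-Verlet scheme."""
    for _ in range(steps):
        p_half = p + 0.5 * h * force(q)       # half kick
        q = q + h * p_half                    # full drift
        p = p_half + 0.5 * h * force(q)       # half kick with updated position
    return q, p

def oscillator_energy(q, p, w=1.0):
    """Hamiltonian H = p^2/2 + w^2 q^2/2 of the harmonic oscillator."""
    return 0.5 * p * p + 0.5 * w * w * q * q

# Roughly one period of q'' = -q, starting from q = 1, p = 0.
q_end, p_end = stormer_verlet(1.0, 0.0, lambda q: -q, 0.01, 628)
energy_drift = abs(oscillator_energy(q_end, p_end) - 0.5)
```

The energy error of such a scheme stays bounded over long integrations instead of growing steadily, which is the property that trigonometric fitting then combines with exact integration of sin(wx) and cos(wx).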
Unsupervised progressive elastic band exercises for frail geriatric inpatients objectively monitored by new exercise-integrated technology-a feasibility trial with an embedded qualitative study.

PubMed

Rathleff, C R; Bandholm, T; Spaich, E G; Jorgensen, M; Andreasen, J

2017-01-01

Frailty is a serious condition frequently present in geriatric inpatients that potentially causes serious adverse events. Strength training is acknowledged as a means of preventing or delaying frailty and loss of function in these patients. However, limited hospital resources challenge the amount of supervised training, and unsupervised training could possibly supplement supervised training thereby increasing the total exercise dose during admission. A new valid and reliable technology, the BandCizer, objectively measures the exact training dosage performed. The purpose was to investigate feasibility and acceptability of an unsupervised progressive strength training intervention monitored by BandCizer for frail geriatric inpatients. This feasibility trial included 15 frail inpatients at a geriatric ward. At hospitalization, the patients were prescribed two elastic band exercises to be performed unsupervised once daily. A BandCizer Datalogger enabling measurement of the number of sets, repetitions, and time-under-tension was attached to the elastic band. The patients were instructed in performing strength training: 3 sets of 10 repetitions (10-12 repetition maximum (RM)) with a separation of 2-min pauses and a time-under-tension of 8 s. The feasibility criterion for the unsupervised progressive exercises was that 33% of the recommended number of sets would be performed by at least 30% of patients. In addition, patients and staff were interviewed about their experiences with the intervention. Four (27%) out of 15 patients completed 33% of the recommended number of sets. For the total sample, the average percent of performed sets was 23% and for those who actually trained (n = 12) 26%. Patients and staff expressed a general positive attitude towards the unsupervised training as an addition to the supervised training sessions. However, barriers were also described, especially constant interruptions.
Based on the predefined criterion for feasibility, the unsupervised training was not feasible, although the criterion was almost met. The patients and staff mainly expressed positive attitudes towards the unsupervised training. As even a small training dosage has been shown to improve the physical performance of geriatric inpatients, the proposed intervention might be relevant if the interruptions are decreased in future large-scale trials and if the adherence is increased. ClinicalTrials.gov: NCT02702557, February 29, 2016. Data Protection Agency: 2016-42, February 25, 2016. Ethics Committee: No registration needed, December 8, 2015 (e-mail correspondence).
Experimental Method Development for Estimating Solid-phase Diffusion Coefficients and Material/Air Partition Coefficients of SVOCs

EPA Science Inventory

The solid-phase diffusion coefficient (Dm) and material-air partition coefficient (Kma) are key parameters for characterizing the sources and transport of semivolatile organic compounds (SVOCs) in the indoor environment. In this work, a new experimental method was developed to es...
77 FR 46289 - Technical Corrections to Organizational Names, Addresses, and OMB Control Numbers

Federal Register 2010, 2011, 2012, 2013, 2014

2012-08-03

...]795.232 Inhalation and dermal pharmacokinetics of commercial hexane. * * * * * (c) * * * (2) * * * (i... to read as follows: Sec. 799.6755 TSCA partition coefficient (n-octanol/water), shake flask method... read as follows: Sec. 799.6756 TSCA partition coefficient (n-octanol/water), generator column method...
Surveillance system and method having an operating mode partitioned fault classification model

NASA Technical Reports Server (NTRS)

Bickford, Randall L. (Inventor)

2005-01-01

A system and method which partitions a parameter estimation model, a fault detection model, and a fault classification model for a process surveillance scheme into two or more coordinated submodels together providing improved diagnostic decision making for at least one determined operating mode of an asset.
Estimation of octanol/water partition coefficient and aqueous solubility of environmental chemicals using molecular fingerprints and machine learning methods

EPA Science Inventory

Octanol/water partition coefficient (logP) and aqueous solubility (logS) are two important parameters in pharmacology and toxicology studies, and experimental measurements are usually time-consuming and expensive. In the present research, novel methods are presented for the estim...
Hierarchical Aligned Cluster Analysis for Temporal Clustering of Human Motion.

PubMed

Zhou, Feng; De la Torre, Fernando; Hodgins, Jessica K

2013-03-01

Temporal segmentation of human motion into plausible motion primitives is central to understanding and building computational models of human motion. Several issues contribute to the challenge of discovering motion primitives: the exponential nature of all possible movement combinations, the variability in the temporal scale of human actions, and the complexity of representing articulated motion. We pose the problem of learning motion primitives as one of temporal clustering, and derive an unsupervised hierarchical bottom-up framework called hierarchical aligned cluster analysis (HACA). HACA finds a partition of a given multidimensional time series into m disjoint segments such that each segment belongs to one of k clusters. HACA combines kernel k-means with the generalized dynamic time alignment kernel to cluster time series data. Moreover, it provides a natural framework to find a low-dimensional embedding for time series. HACA is efficiently optimized with a coordinate descent strategy and dynamic programming. Experimental results on motion capture and video data demonstrate the effectiveness of HACA for segmenting complex motions and as a visualization tool. We also compare the performance of HACA to state-of-the-art algorithms for temporal clustering on data of a honey bee dance. The HACA code is available online.
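The alignment idea behind this record can be illustrated with a minimal sketch. It is not the generalized dynamic time alignment kernel that HACA uses and not the authors' code; it is the classic dynamic time warping (DTW) cost, computed by dynamic programming, and the function name dtw_distance is hypothetical.

```python
import numpy as np

def dtw_distance(x, y):
    """Classic DTW cost between two sequences of frames (illustrative only)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Treat 1-D input as a sequence of scalar frames.
    if x.ndim == 1:
        x = x[:, None]
    if y.ndim == 1:
        y = y[:, None]
    n, m = len(x), len(y)
    # cost[i, j] = best alignment cost of x[:i] against y[:j].
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(x[i - 1] - y[j - 1])
            # Extend the cheapest of: advance x only, advance y only, advance both.
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]
```

Two segments that differ only in speed, such as [0, 1, 2] and [0, 1, 1, 2], align at zero cost, which is why alignment-based similarities suit motion primitives of varying duration.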
Alexithymia and personality traits of patients with inflammatory bowel disease

PubMed Central

La Barbera, D.; Bonanno, B.; Rumeo, M. V.; Alabastro, V.; Frenda, M.; Massihnia, E.; Morgante, M. C.; Sideli, L.; Craxì, A.; Cappello, M.; Tumminello, M.; Miccichè, S.; Nastri, L.

2017-01-01

Psychological factors, specific lifestyles and environmental stressors may influence etiopathogenesis and evolution of chronic diseases. We investigate the association between Chronic Inflammatory Bowel Diseases (IBD) and psychological dimensions such as personality traits, defence mechanisms, and Alexithymia, i.e. deficits of emotional awareness with inability to give a name to emotional states. We analyzed a survey of 100 patients with IBD and a control group of 66 healthy individuals. The survey involved filling out clinical and anamnestic forms and administering five psychological tests. These were then analyzed by using a network representation of the system by considering it as a bipartite network in which elements of one set are the 166 individuals, while the elements of the other set are the outcome of the survey. We then run an unsupervised community detection algorithm providing a partition of the 166 participants into clusters. That allowed us to determine a statistically significant association between psychological factors and IBD. We find clusters of patients characterized by high neuroticism, alexithymia, impulsivity and severe physical conditions and being of female gender. We therefore hypothesize that in a population of alexithymic patients, females are inclined to develop psychosomatic diseases like IBD while males might eventually develop behavioral disorders. PMID:28150800
The impact of aerosol composition on the particle to gas partitioning of reactive mercury.

PubMed

Rutter, Andrew P; Schauer, James J

2007-06-01

A laboratory system was developed to study the gas-particle partitioning of reactive mercury (RM) as a function of aerosol composition in synthetic atmospheric particulate matter. The collection of RM was achieved by filter- and sorbent-based methods. Analyses of the RM collected on the filters and sorbents were performed using thermal extraction combined with cold vapor atomic fluorescence spectroscopy (CVAFS), allowing direct measurement of the RM load on the substrates. Laboratory measurements of the gas-particle partitioning coefficients of RM to atmospheric aerosol particles revealed a strong dependence on aerosol composition, with partitioning coefficients that varied by orders of magnitude depending on the composition of the particles. Particles of sodium nitrate and the chlorides of potassium and sodium had high partitioning coefficients, shifting the RM partitioning toward the particle phase, while ammonium sulfate, levoglucosan, and adipic acid caused the RM to partition toward the gas phase and, therefore, had partitioning coefficients that were lower by orders of magnitude.
A Dual Super-Element Domain Decomposition Approach for Parallel Nonlinear Finite Element Analysis

NASA Astrophysics Data System (ADS)

Jokhio, G. A.; Izzuddin, B. A.

2015-05-01

This article presents a new domain decomposition method for nonlinear finite element analysis introducing the concept of dual partition super-elements. The method extends ideas from the displacement frame method and is ideally suited for parallel nonlinear static/dynamic analysis of structural systems. In the new method, domain decomposition is realized by replacing one or more subdomains in a "parent system," each with a placeholder super-element, where the subdomains are processed separately as "child partitions," each wrapped by a dual super-element along the partition boundary. The analysis of the overall system, including the satisfaction of equilibrium and compatibility at all partition boundaries, is realized through direct communication between all pairs of placeholder and dual super-elements. The proposed method has particular advantages for matrix solution methods based on the frontal scheme, and can be readily implemented for existing finite element analysis programs to achieve parallelization on distributed memory systems with minimal intervention, thus overcoming memory bottlenecks typically faced in the analysis of large-scale problems. Several examples are presented in this article which demonstrate the computational benefits of the proposed parallel domain decomposition approach and its applicability to the nonlinear structural analysis of realistic structural systems.
Online and unsupervised face recognition for continuous video stream

NASA Astrophysics Data System (ADS)

Huo, Hongwen; Feng, Jufu

2009-10-01

We present a novel online face recognition approach for video stream in this paper. Our method includes two stages: pre-training and online training. In the pre-training phase, our method observes interactions, collects batches of input data, and attempts to estimate their distributions (Box-Cox transformation is adopted here to normalize rough estimates). In the online training phase, our method incrementally improves classifiers' knowledge of the face space and updates it continuously with incremental eigenspace analysis. The performance achieved by our method shows its great potential in video stream processing.
Partitioning of functional gene expression data using principal points.

PubMed

Kim, Jaehee; Kim, Haseong

2017-10-12

DNA microarrays offer motivation and hope for the simultaneous study of variations in multiple genes. Gene expression is a temporal process that allows variations in expression levels with a characterized gene function over a period of time. Temporal gene expression curves can be treated as functional data since they are considered as independent realizations of a stochastic process. This process requires appropriate models to identify patterns of gene functions. The partitioning of the functional data can find homogeneous subgroups of entities for the massive genes within the inherent biological networks. Therefore, it can be a useful technique for the analysis of time-course gene expression data. We propose a new self-consistent partitioning method of functional coefficients for individual expression profiles based on the orthonormal basis system. A principal points based functional partitioning method is proposed for time-course gene expression data. The method explores the relationship between genes using Legendre coefficients as principal points to extract the features of gene functions. Our proposed method provides high connectivity after clustering for simulated data and finds significant subsets of genes with increased connectivity. Our approach has the comparative advantages that fewer coefficients are used from the functional data and that the principal points used for partitioning are self-consistent. As real data applications, we find partitioned genes through the gene expressions in budding yeast data and Escherichia coli data. The proposed method benefits from the use of principal points, dimension reduction, and the choice of an orthogonal basis system, and provides appropriately connected genes in the resulting subsets. We illustrate our method by applying it to each set of cell-cycle-regulated time-course yeast genes and E. coli genes.
The proposed method is able to identify highly connected genes and to explore the complex dynamics of biological systems in functional genomics.
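A rough sketch of the general idea in this record, under stated assumptions: each expression profile is reduced to a few Legendre coefficients, and the coefficient vectors are then grouped with plain k-means, whose centers are a common sample estimate of principal points. This is not the authors' algorithm; the function names, the polynomial degree, and the farthest-point initialisation are illustrative choices.

```python
import numpy as np
from numpy.polynomial import legendre

def legendre_coefficients(times, profiles, degree=3):
    """Least-squares Legendre coefficients, one row per expression profile."""
    t = np.asarray(times, dtype=float)
    # Rescale time to [-1, 1], where Legendre polynomials are orthogonal.
    u = 2.0 * (t - t.min()) / (t.max() - t.min()) - 1.0
    coefs = legendre.legfit(u, np.asarray(profiles, dtype=float).T, degree)
    return coefs.T  # shape: (n_profiles, degree + 1)

def kmeans(points, k, n_iter=50):
    """Plain Lloyd iterations with a deterministic farthest-point start."""
    points = np.asarray(points, dtype=float)
    centers = [points[0]]
    for _ in range(1, k):
        # Distance of every point to its nearest already-chosen center.
        d = np.min([np.linalg.norm(points - c, axis=1) for c in centers], axis=0)
        centers.append(points[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None] - centers[None], axis=2)
        labels = dists.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = points[labels == c].mean(axis=0)
    # Final assignment against the converged centers.
    dists = np.linalg.norm(points[:, None] - centers[None], axis=2)
    return dists.argmin(axis=1), centers
```

Clustering in coefficient space rather than on raw time points is what keeps the number of features small, which matches the dimension-reduction advantage the abstract claims.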
Subtyping of Children with Developmental Dyslexia via Bootstrap Aggregated Clustering and the Gap Statistic: Comparison with the Double-Deficit Hypothesis

ERIC Educational Resources Information Center

King, Wayne M.; Giess, Sally A.; Lombardino, Linda J.

2007-01-01

Background: The marked degree of heterogeneity in persons with developmental dyslexia has motivated the investigation of possible subtypes. Attempts have proceeded both from theoretical models of reading and the application of unsupervised learning (clustering) methods. Previous cluster analyses of data obtained from persons with reading…
Sexual Behaviors of College Freshmen and the Need for University-Based Education

ERIC Educational Resources Information Center

Wyatt, Tammy; Oswalt, Sara

2014-01-01

Problem: College life offers several challenges for students, particularly freshmen who often find themselves in an unsupervised environment with multiple opportunities to engage in a variety of risk behaviors. The purpose of this study was to examine the sexual behaviors of college freshmen enrolled at a U.S. Hispanic Serving Institution. Method:…
Automatic partitioning of unstructured meshes for the parallel solution of problems in computational mechanics

NASA Technical Reports Server (NTRS)

Farhat, Charbel; Lesoinne, Michel

1993-01-01

Most of the recently proposed computational methods for solving partial differential equations on multiprocessor architectures stem from the 'divide and conquer' paradigm and involve some form of domain decomposition. For those methods which also require grids of points or patches of elements, it is often necessary to explicitly partition the underlying mesh, especially when working with local memory parallel processors. In this paper, a family of cost-effective algorithms for the automatic partitioning of arbitrary two- and three-dimensional finite element and finite difference meshes is presented and discussed in view of a domain decomposed solution procedure and parallel processing. The influence of the algorithmic aspects of a solution method (implicit/explicit computations), and the architectural specifics of a multiprocessor (SIMD/MIMD, startup/transmission time), on the design of a mesh partitioning algorithm are discussed. The impact of the partitioning strategy on load balancing, operation count, operator conditioning, rate of convergence and processor mapping is also addressed. Finally, the proposed mesh decomposition algorithms are demonstrated with realistic examples of finite element, finite volume, and finite difference meshes associated with the parallel solution of solid and fluid mechanics problems on the iPSC/2 and iPSC/860 multiprocessors.
dPCR: A Technology Review

PubMed Central

Quan, Phenix-Lan; Sauzade, Martin

2018-01-01

Digital Polymerase Chain Reaction (dPCR) is a novel method for the absolute quantification of target nucleic acids. Quantification by dPCR hinges on the fact that the random distribution of molecules in many partitions follows a Poisson distribution. Each partition acts as an individual PCR microreactor and partitions containing amplified target sequences are detected by fluorescence. The proportion of PCR-positive partitions suffices to determine the concentration of the target sequence without a need for calibration. Advances in microfluidics enabled the current revolution of digital quantification by providing efficient partitioning methods. In this review, we compare the fundamental concepts behind the quantification of nucleic acids by dPCR and quantitative real-time PCR (qPCR). We detail the underlying statistics of dPCR and explain how it defines its precision and performance metrics. We review the different microfluidic digital PCR formats, present their underlying physical principles, and analyze the technological evolution of dPCR platforms. We present the novel multiplexing strategies enabled by dPCR and examine how isothermal amplification could be an alternative to PCR in digital assays. Finally, we determine whether the theoretical advantages of dPCR over qPCR hold true by perusing studies that directly compare assays implemented with both methods. PMID:29677144
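The Poisson argument in this record can be made concrete with a short sketch. If copies are distributed at random, the fraction of empty partitions is exp(-lambda), so the mean number of copies per partition follows from the fraction of positive partitions as lambda = -ln(1 - p). The function name and the partition volume argument are assumptions for illustration; real instruments apply further corrections.

```python
import math

def dpcr_concentration(n_positive, n_total, partition_volume_ul):
    """Estimate target copies per microliter from a dPCR readout (sketch).

    Assumes copies are Poisson-distributed across equal-volume partitions.
    """
    if not 0 <= n_positive < n_total:
        # With every partition positive the estimate diverges.
        raise ValueError("need 0 <= n_positive < n_total")
    p = n_positive / n_total
    # P(empty partition) = exp(-lam), hence lam = -ln(1 - p).
    lam = -math.log(1.0 - p)
    return lam / partition_volume_ul
```

For example, with half of the partitions positive, lambda is ln 2 (about 0.69 copies per partition), not 0.5, because some positive partitions hold more than one copy.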
Prediction of distribution coefficient from structure. 1. Estimation method.

PubMed

Csizmadia, F; Tsantili-Kakoulidou, A; Panderi, I; Darvas, F

1997-07-01

A method has been developed for the estimation of the distribution coefficient (D), which considers the microspecies of a compound. D is calculated from the microscopic dissociation constants (microconstants), the partition coefficients of the microspecies, and the counterion concentration. A general equation for the calculation of D at a given pH is presented. The microconstants are calculated from the structure using Hammett and Taft equations. The partition coefficients of the ionic microspecies are predicted by empirical equations using the dissociation constants and the partition coefficient of the uncharged species, which are estimated from the structure by a Linear Free Energy Relationship method. The algorithm is implemented in a program module called PrologD.
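As a simplified illustration of how D depends on pH, the sketch below treats a monoprotic acid with a single pKa. It is not the general microspecies equation of this record: it ignores microconstants and the counterion concentration, and the function and parameter names are hypothetical.

```python
import math

def distribution_coefficient(p_neutral, p_ion, pka, ph):
    """D for a monoprotic acid HA from the partition coefficients of HA and A-.

    Simplified textbook case: one ionization step, no counterion effect.
    """
    # Ratio [A-]/[HA] in the aqueous phase (Henderson-Hasselbalch).
    ratio = 10.0 ** (ph - pka)
    # Weighted average of the two species' partition coefficients.
    return (p_neutral + p_ion * ratio) / (1.0 + ratio)

def log_d(p_neutral, p_ion, pka, ph):
    """Base-10 logarithm of the distribution coefficient."""
    return math.log10(distribution_coefficient(p_neutral, p_ion, pka, ph))
```

At pH equal to pKa and with a non-partitioning ion (p_ion = 0), D is half the partition coefficient of the neutral species. At pH far below pKa, D approaches that partition coefficient.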
Nonlinear phase noise tolerance for coherent optical systems using soft-decision-aided ML carrier phase estimation enhanced with constellation partitioning

NASA Astrophysics Data System (ADS)

Li, Yan; Wu, Mingwei; Du, Xinwei; Xu, Zhuoran; Gurusamy, Mohan; Yu, Changyuan; Kam, Pooi-Yuen

2018-02-01

A novel soft-decision-aided maximum likelihood (SDA-ML) carrier phase estimation method and its simplified version, the decision-aided and soft-decision-aided maximum likelihood (DA-SDA-ML) method, are tested in a nonlinear phase noise-dominant channel. The numerical performance results show that both the SDA-ML and DA-SDA-ML methods outperform the conventional DA-ML in systems with constant-amplitude modulation formats. In addition, modified algorithms based on constellation partitioning are proposed. With partitioning, the modified SDA-ML and DA-SDA-ML are shown to be useful for compensating the nonlinear phase noise in multi-level modulation systems.

"K"-Balance Partitioning: An Exact Method with Applications to Generalized Structural Balance and Other Psychological Contexts

ERIC Educational Resources Information Center

Brusco, Michael; Steinley, Douglas

2010-01-01

Structural balance theory (SBT) has maintained a venerable status in the psychological literature for more than 5 decades. One important problem pertaining to SBT is the approximation of structural or generalized balance via the partitioning of the vertices of a signed graph into "K" clusters. This "K"-balance partitioning problem also has more…
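The partitioning problem named in this record can be sketched for very small graphs. The code below counts the edges that violate generalized balance (negative edges inside a cluster, positive edges between clusters) and finds a best K-cluster labeling by exhaustive enumeration. Exhaustive search is a stand-in chosen for brevity, not the exact method of the article, and all names are hypothetical.

```python
from itertools import product

def imbalance(edges, labels):
    """Count signed edges inconsistent with a clustering.

    edges: iterable of (u, v, sign) with sign +1 or -1.
    labels: cluster label of each vertex, indexed by vertex number.
    """
    count = 0
    for u, v, sign in edges:
        same = labels[u] == labels[v]
        # Violations: a negative tie inside a cluster, a positive tie across clusters.
        if (sign < 0 and same) or (sign > 0 and not same):
            count += 1
    return count

def best_partition(edges, n_vertices, k):
    """Exhaustively search all k**n_vertices labelings (tiny graphs only)."""
    best = min(product(range(k), repeat=n_vertices),
               key=lambda labels: imbalance(edges, labels))
    return list(best), imbalance(edges, best)
```

For a four-vertex graph with positive ties 0-1 and 2-3 and negative ties 0-2 and 1-3, the labeling [0, 0, 1, 1] has zero violations, so the graph is perfectly 2-balanced.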
Study and modeling of the evolution of gas-liquid partitioning of hydrogen sulfide in model solutions simulating winemaking fermentations.

PubMed

Mouret, Jean-Roch; Sablayrolles, Jean-Marie; Farines, Vincent

2015-04-01

The knowledge of gas-liquid partitioning of aroma compounds during winemaking fermentation could allow optimization of fermentation management, maximizing concentrations of positive markers of aroma and minimizing formation of molecules, such as hydrogen sulfide (H2S), responsible for defects. In this study, the effect of the main fermentation parameters on the gas-liquid partition coefficients (Ki) of H2S was assessed. The Ki for this highly volatile sulfur compound was measured in water by an original semistatic method developed in this work for the determination of gas-liquid partitioning. This novel method was validated and then used to determine the Ki of H2S in synthetic media simulating must, fermenting musts at various steps of the fermentation process, and wine. Ki values were found to be mainly dependent on the temperature but also varied with the composition of the medium, especially with the glucose concentration. Finally, a model was developed to quantify the gas-liquid partitioning of H2S in synthetic media simulating must to wine. This model allowed a very accurate prediction of the partition coefficient of H2S: the difference between observed and predicted values never exceeded 4%.
Total strain version of strainrange partitioning for thermomechanical fatigue at low strains

NASA Technical Reports Server (NTRS)

Halford, G. R.; Saltsman, J. F.

1987-01-01

A new method is proposed for characterizing and predicting the thermal fatigue behavior of materials. The method is based on three innovations in characterizing high temperature material behavior: (1) the bithermal concept of fatigue testing; (2) advanced, nonlinear, cyclic constitutive models; and (3) the total strain version of traditional strainrange partitioning.
Hierarchical Gene Selection and Genetic Fuzzy System for Cancer Microarray Data Classification

PubMed Central

Nguyen, Thanh; Khosravi, Abbas; Creighton, Douglas; Nahavandi, Saeid

2015-01-01

This paper introduces a novel approach to gene selection based on a substantial modification of analytic hierarchy process (AHP). The modified AHP systematically integrates outcomes of individual filter methods to select the most informative genes for microarray classification. Five individual ranking methods including t-test, entropy, receiver operating characteristic (ROC) curve, Wilcoxon and signal to noise ratio are employed to rank genes. These ranked genes are then considered as inputs for the modified AHP. Additionally, a method that uses fuzzy standard additive model (FSAM) for cancer classification based on genes selected by AHP is also proposed in this paper. Traditional FSAM learning is a hybrid process comprising unsupervised structure learning and supervised parameter tuning. Genetic algorithm (GA) is incorporated between unsupervised and supervised training to optimize the number of fuzzy rules. The integration of GA enables FSAM to deal with the high-dimensional-low-sample nature of microarray data and thus enhance the efficiency of the classification. Experiments are carried out on numerous microarray datasets. Results demonstrate that AHP-based gene selection outperforms the single ranking methods. Furthermore, the AHP-FSAM combination shows high accuracy in microarray data classification compared to various competing classifiers. The proposed approach therefore is useful for medical practitioners and clinicians as a decision support system that can be implemented in real medical practice. PMID:25823003
Hierarchical gene selection and genetic fuzzy system for cancer microarray data classification.

PubMed

Nguyen, Thanh; Khosravi, Abbas; Creighton, Douglas; Nahavandi, Saeid

2015-01-01

This paper introduces a novel approach to gene selection based on a substantial modification of analytic hierarchy process (AHP). The modified AHP systematically integrates outcomes of individual filter methods to select the most informative genes for microarray classification. Five individual ranking methods including t-test, entropy, receiver operating characteristic (ROC) curve, Wilcoxon and signal to noise ratio are employed to rank genes. These ranked genes are then considered as inputs for the modified AHP. Additionally, a method that uses fuzzy standard additive model (FSAM) for cancer classification based on genes selected by AHP is also proposed in this paper. Traditional FSAM learning is a hybrid process comprising unsupervised structure learning and supervised parameter tuning. Genetic algorithm (GA) is incorporated between unsupervised and supervised training to optimize the number of fuzzy rules. The integration of GA enables FSAM to deal with the high-dimensional-low-sample nature of microarray data and thus enhance the efficiency of the classification. Experiments are carried out on numerous microarray datasets. Results demonstrate that AHP-based gene selection outperforms the single ranking methods. Furthermore, the AHP-FSAM combination shows high accuracy in microarray data classification compared to various competing classifiers. The proposed approach therefore is useful for medical practitioners and clinicians as a decision support system that can be implemented in real medical practice.
Modifications to the Patient Rule-Induction Method that utilize non-additive combinations of genetic and environmental effects to define partitions that predict ischemic heart disease.

PubMed

Dyson, Greg; Frikke-Schmidt, Ruth; Nordestgaard, Børge G; Tybjaerg-Hansen, Anne; Sing, Charles F

2009-05-01

This article extends the Patient Rule-Induction Method (PRIM) for modeling cumulative incidence of disease developed by Dyson et al. (Genet Epidemiol 31:515-527) to include the simultaneous consideration of non-additive combinations of predictor variables, a significance test of each combination, an adjustment for multiple testing and a confidence interval for the estimate of the cumulative incidence of disease in each partition. We employ the partitioning algorithm component of the Combinatorial Partitioning Method to construct combinations of predictors, permutation testing to assess the significance of each combination, theoretical arguments for incorporating a multiple testing adjustment and bootstrap resampling to produce the confidence intervals. An illustration of this revised PRIM utilizing a sample of 2,258 European male participants from the Copenhagen City Heart Study is presented that assesses the utility of genetic variants in predicting the presence of ischemic heart disease beyond the established risk factors.
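One ingredient named in this record, a bootstrap confidence interval for the cumulative incidence within a partition, can be sketched as follows. This is a generic percentile bootstrap on 0/1 outcome indicators, not the revised PRIM procedure itself; the function name, the number of resamples, and the seed are illustrative assumptions.

```python
import random

def bootstrap_incidence_ci(outcomes, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the proportion of cases (sketch).

    outcomes: list of 0/1 indicators for the individuals in one partition.
    """
    rng = random.Random(seed)
    n = len(outcomes)
    # Resample individuals with replacement and recompute the incidence.
    stats = sorted(sum(rng.choices(outcomes, k=n)) / n for _ in range(n_boot))
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper
```

With 30 cases among 100 individuals the interval brackets the observed incidence of 0.30. Its width shrinks as the partition grows, which is one reason small partitions are hard to interpret.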
Modifications to the Patient Rule-Induction Method that utilize non-additive combinations of genetic and environmental effects to define partitions that predict ischemic heart disease

PubMed Central

Dyson, Greg; Frikke-Schmidt, Ruth; Nordestgaard, Børge G.; Tybjærg-Hansen, Anne; Sing, Charles F.

2009-01-01

This paper extends the Patient Rule-Induction Method (PRIM) for modeling cumulative incidence of disease developed by Dyson et al. (2007) to include the simultaneous consideration of non-additive combinations of predictor variables, a significance test of each combination, an adjustment for multiple testing and a confidence interval for the estimate of the cumulative incidence of disease in each partition. We employ the partitioning algorithm component of the Combinatorial Partitioning Method (CPM) to construct combinations of predictors, permutation testing to assess the significance of each combination, theoretical arguments for incorporating a multiple testing adjustment and bootstrap resampling to produce the confidence intervals. An illustration of this revised PRIM utilizing a sample of 2258 European male participants from the Copenhagen City Heart Study is presented that assesses the utility of genetic variants in predicting the presence of ischemic heart disease beyond the established risk factors. PMID:19025787
Multiple Attribute Group Decision-Making Methods Based on Trapezoidal Fuzzy Two-Dimensional Linguistic Partitioned Bonferroni Mean Aggregation Operators.

PubMed

Yin, Kedong; Yang, Benshuo; Li, Xuemei

2018-01-24

In this paper, we investigate multiple attribute group decision making (MAGDM) problems where decision makers represent their evaluation of alternatives by trapezoidal fuzzy two-dimensional uncertain linguistic variables. To begin with, we introduce the definition, properties, expectation, and operational laws of trapezoidal fuzzy two-dimensional linguistic information. Then, to improve the accuracy of decision making in cases where the attributes are interrelated, we analyze the partition Bonferroni mean (PBM) operator in the trapezoidal fuzzy two-dimensional variable environment and develop two operators: the trapezoidal fuzzy two-dimensional linguistic partitioned Bonferroni mean (TF2DLPBM) aggregation operator and the trapezoidal fuzzy two-dimensional linguistic weighted partitioned Bonferroni mean (TF2DLWPBM) aggregation operator. Furthermore, we develop a novel method to solve MAGDM problems based on the TF2DLWPBM aggregation operator. Finally, a practical example is presented to illustrate the effectiveness of this method and to analyze the impact of different parameters on the results of decision-making.
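To show what a partitioned Bonferroni mean computes, here is a minimal sketch for crisp non-negative numbers. It follows a commonly cited form of the PBM operator: within each block of interrelated attributes, every value is combined with the average of the other values in the same block, and the block results are then averaged. It does not handle the trapezoidal fuzzy two-dimensional linguistic variables of the article, and the function name and argument layout are hypothetical.

```python
def partitioned_bonferroni_mean(values, partition, p=1.0, q=1.0):
    """PBM of crisp non-negative values (illustrative sketch).

    partition: list of blocks, each a list of indices into values;
    attributes in the same block are assumed to be interrelated.
    """
    total = 0.0
    for block in partition:
        if len(block) < 2:
            raise ValueError("this sketch needs at least two attributes per block")
        inner = 0.0
        for i in block:
            # Average of a_j**q over the other members of the same block.
            others = sum(values[j] ** q for j in block if j != i) / (len(block) - 1)
            inner += values[i] ** p * others
        total += (inner / len(block)) ** (1.0 / (p + q))
    # Blocks are treated as unrelated to each other, so they are simply averaged.
    return total / len(partition)
```

A quick sanity check is idempotency: if every attribute value equals c, the result is c for any partition and any p, q.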
Multiple Attribute Group Decision-Making Methods Based on Trapezoidal Fuzzy Two-Dimensional Linguistic Partitioned Bonferroni Mean Aggregation Operators

PubMed Central

Yin, Kedong; Yang, Benshuo

2018-01-01

In this paper, we investigate multiple attribute group decision making (MAGDM) problems where decision makers represent their evaluation of alternatives by trapezoidal fuzzy two-dimensional uncertain linguistic variables. To begin with, we introduce the definition, properties, expectation, and operational laws of trapezoidal fuzzy two-dimensional linguistic information. Then, to improve the accuracy of decision making in cases where the attributes are interrelated, we analyze the partition Bonferroni mean (PBM) operator in the trapezoidal fuzzy two-dimensional variable environment and develop two operators: the trapezoidal fuzzy two-dimensional linguistic partitioned Bonferroni mean (TF2DLPBM) aggregation operator and the trapezoidal fuzzy two-dimensional linguistic weighted partitioned Bonferroni mean (TF2DLWPBM) aggregation operator. Furthermore, we develop a novel method to solve MAGDM problems based on the TF2DLWPBM aggregation operator. Finally, a practical example is presented to illustrate the effectiveness of this method and to analyze the impact of different parameters on the results of decision-making. PMID:29364849
Field-gradient partitioning for fracture and frictional contact in the material point method [Fracture and frictional contact in material point method using damage-field gradients for velocity-field partitioning]

DOE Office of Scientific and Technical Information (OSTI.GOV)

Homel, Michael A.; Herbold, Eric B.

Contact and fracture in the material point method require grid-scale enrichment or partitioning of material into distinct velocity fields to allow for displacement or velocity discontinuities at a material interface. We present a new method in which a kernel-based damage field is constructed from the particle data. The gradient of this field is used to dynamically repartition the material into contact pairs at each node. Our approach avoids the need to construct and evolve explicit cracks or contact surfaces and is therefore well suited to problems involving complex 3-D fracture with crack branching and coalescence. A straightforward extension of this approach permits frictional ‘self-contact’ between surfaces that are initially part of a single velocity field, enabling more accurate simulation of granular flow, porous compaction, fragmentation, and comminution of brittle materials. Finally, numerical simulations of self-contact and dynamic crack propagation are presented to demonstrate the accuracy of the approach.
Field-gradient partitioning for fracture and frictional contact in the material point method [Fracture and frictional contact in material point method using damage-field gradients for velocity-field partitioning]

DOE PAGES

Homel, Michael A.; Herbold, Eric B.

2016-08-15

Contact and fracture in the material point method require grid-scale enrichment or partitioning of material into distinct velocity fields to allow for displacement or velocity discontinuities at a material interface. We present a new method in which a kernel-based damage field is constructed from the particle data. The gradient of this field is used to dynamically repartition the material into contact pairs at each node. Our approach avoids the need to construct and evolve explicit cracks or contact surfaces and is therefore well suited to problems involving complex 3-D fracture with crack branching and coalescence. A straightforward extension of this approach permits frictional ‘self-contact’ between surfaces that are initially part of a single velocity field, enabling more accurate simulation of granular flow, porous compaction, fragmentation, and comminution of brittle materials. Finally, numerical simulations of self-contact and dynamic crack propagation are presented to demonstrate the accuracy of the approach.
Many-body formalism for fermions: The partition function

NASA Astrophysics Data System (ADS)

Watson, D. K.

2017-09-01

The partition function, a fundamental tenet in statistical thermodynamics, contains in principle all thermodynamic information about a system. It encapsulates both microscopic information through the quantum energy levels and statistical information from the partitioning of the particles among the available energy levels. For identical particles, this statistical accounting is complicated by the symmetry requirements of the allowed quantum states. In particular, for Fermi systems, the enforcement of the Pauli principle is typically a numerically demanding task, responsible for much of the cost of the calculations. The interplay of these three elements—the structure of the many-body spectrum, the statistical partitioning of the N particles among the available levels, and the enforcement of the Pauli principle—drives the behavior of mesoscopic and macroscopic Fermi systems. In this paper, we develop an approach for the determination of the partition function, a numerically difficult task, for systems of strongly interacting identical fermions and apply it to a model system of harmonically confined, harmonically interacting fermions. This approach uses a recently introduced many-body method that is an extension of the symmetry-invariant perturbation method (SPT) originally developed for bosons. It uses group theory and graphical techniques to avoid the heavy computational demands of conventional many-body methods which typically scale exponentially with the number of particles. The SPT application of the Pauli principle is trivial to implement since it is done "on paper" by imposing restrictions on the normal-mode quantum numbers at first order in the perturbation. The method is applied through first order and represents an extension of the SPT method to excited states. 
Our method of determining the partition function and various thermodynamic quantities is accurate and efficient and has the potential to yield interesting insight into the role played by the Pauli principle and the influence of large degeneracies on the emergence of the thermodynamic behavior of large-N systems.
[Analytic methods for seed models with genotype x environment interactions].

PubMed

Zhu, J

1996-01-01

Genetic models with genotype effect (G) and genotype x environment interaction effect (GE) are proposed for analyzing generation means of seed quantitative traits in crops. The total genetic effect (G) is partitioned into seed direct genetic effect (G0), cytoplasm genetic effect (C), and maternal plant genetic effect (Gm). Seed direct genetic effect (G0) can be further partitioned into direct additive (A) and direct dominance (D) genetic components. Maternal genetic effect (Gm) can also be partitioned into maternal additive (Am) and maternal dominance (Dm) genetic components. The total genotype x environment interaction effect (GE) can also be partitioned into direct genetic by environment interaction effect (G0E), cytoplasm genetic by environment interaction effect (CE), and maternal genetic by environment interaction effect (GmE). G0E can be partitioned into direct additive by environment interaction (AE) and direct dominance by environment interaction (DE) genetic components. GmE can also be partitioned into maternal additive by environment interaction (AmE) and maternal dominance by environment interaction (DmE) genetic components. Partitions of genetic components are listed for parent, F1, F2 and backcrosses. A set of parents and their reciprocal F1 and F2 seeds is applicable for efficient analysis of seed quantitative traits. The MINQUE(0/1) method can be used for estimating variance and covariance components. Unbiased estimation of covariance components between two traits can also be obtained by the MINQUE(0/1) method. Random genetic effects in seed models are predictable by the Adjusted Unbiased Prediction (AUP) approach with the MINQUE(0/1) method. The jackknife procedure is suggested for estimation of sampling variances of estimated variance and covariance components and of predicted genetic effects, which can be further used in a t-test for parameters. 
Unbiasedness and efficiency for estimating variance components and predicting genetic effects are tested by Monte Carlo simulations.
Word sense disambiguation in the clinical domain: a comparison of knowledge-rich and knowledge-poor unsupervised methods

PubMed Central

Chasin, Rachel; Rumshisky, Anna; Uzuner, Ozlem; Szolovits, Peter

2014-01-01

Objective To evaluate state-of-the-art unsupervised methods on the word sense disambiguation (WSD) task in the clinical domain. In particular, to compare graph-based approaches relying on a clinical knowledge base with bottom-up topic-modeling-based approaches. We investigate several enhancements to the topic-modeling techniques that use domain-specific knowledge sources. Materials and methods The graph-based methods use variations of PageRank and distance-based similarity metrics, operating over the Unified Medical Language System (UMLS). Topic-modeling methods use unlabeled data from the Multiparameter Intelligent Monitoring in Intensive Care (MIMIC II) database to derive models for each ambiguous word. We investigate the impact of using different linguistic features for topic models, including UMLS-based and syntactic features. We use a sense-tagged clinical dataset from the Mayo Clinic for evaluation. Results The topic-modeling methods achieve 66.9% accuracy on a subset of the Mayo Clinic's data, while the graph-based methods only reach the 40–50% range, with a most-frequent-sense baseline of 56.5%. Features derived from the UMLS semantic type and concept hierarchies do not produce a gain over bag-of-words features in the topic models, but identifying phrases from UMLS and using syntax does help. Discussion Although topic models outperform graph-based methods, semantic features derived from the UMLS prove too noisy to improve performance beyond bag-of-words. Conclusions Topic modeling for WSD provides superior results in the clinical domain; however, integration of knowledge remains to be effectively exploited. PMID:24441986
Genetic Classification of Populations Using Supervised Learning

PubMed Central

Bridges, Michael; Heron, Elizabeth A.; O'Dushlaine, Colm; Segurado, Ricardo; Morris, Derek; Corvin, Aiden; Gill, Michael; Pinto, Carlos

2011-01-01

There are many instances in genetics in which we wish to determine whether two candidate populations are distinguishable on the basis of their genetic structure. Examples include populations which are geographically separated, case–control studies and quality control (when participants in a study have been genotyped at different laboratories). This latter application is of particular importance in the era of large scale genome wide association studies, when collections of individuals genotyped at different locations are being merged to provide increased power. The traditional method for detecting structure within a population is some form of exploratory technique such as principal components analysis. Such methods, which do not utilise our prior knowledge of the membership of the candidate populations, are termed unsupervised. Supervised methods, on the other hand, are able to utilise this prior knowledge when it is available. In this paper we demonstrate that in such cases modern supervised approaches are a more appropriate tool for detecting genetic differences between populations. We apply two such methods (neural networks and support vector machines) to the classification of three populations (two from Scotland and one from Bulgaria). The sensitivity exhibited by both these methods is considerably higher than that attained by principal components analysis and in fact comfortably exceeds a recently conjectured theoretical limit on the sensitivity of unsupervised methods. In particular, our methods can distinguish between the two Scottish populations, where principal components analysis cannot. We suggest, on the basis of our results, that a supervised learning approach should be the method of choice when classifying individuals into pre-defined populations, particularly in quality control for large scale genome wide association studies. PMID:21589856
Individualized Functional Parcellation of the Human Amygdala Using a Semi-supervised Clustering Method: A 7T Resting State fMRI Study.

PubMed

Zhang, Xianchang; Cheng, Hewei; Zuo, Zhentao; Zhou, Ke; Cong, Fei; Wang, Bo; Zhuo, Yan; Chen, Lin; Xue, Rong; Fan, Yong

2018-01-01

The amygdala plays an important role in emotional functions and its dysfunction is considered to be associated with multiple psychiatric disorders in humans. Cytoarchitectonic mapping has demonstrated that the human amygdala complex comprises several subregions. However, it's difficult to delineate boundaries of these subregions in vivo even if using state of the art high resolution structural MRI. Previous attempts to parcellate this small structure using unsupervised clustering methods based on resting state fMRI data suffered from the low spatial resolution of typical fMRI data, and it remains challenging for the unsupervised methods to define subregions of the amygdala in vivo. In this study, we developed a novel brain parcellation method to segment the human amygdala into spatially contiguous subregions based on 7T high resolution fMRI data. The parcellation was implemented using a semi-supervised spectral clustering (SSC) algorithm at an individual subject level. Under guidance of prior information derived from the Julich cytoarchitectonic atlas, our method clustered voxels of the amygdala into subregions according to similarity measures of their functional signals. As a result, three distinct amygdala subregions can be obtained in each hemisphere for every individual subject. Compared with the cytoarchitectonic atlas, our method achieved better performance in terms of subregional functional homogeneity. Validation experiments have also demonstrated that the amygdala subregions obtained by our method have distinctive, lateralized functional connectivity (FC) patterns. Our study has demonstrated that the semi-supervised brain parcellation method is a powerful tool for exploring amygdala subregional functions.
Exploiting Redundancy for Flexible Behavior: Unsupervised Learning in a Modular Sensorimotor Control Architecture

ERIC Educational Resources Information Center

Butz, Martin V.; Herbort, Oliver; Hoffmann, Joachim

2007-01-01

Autonomously developing organisms face several challenges when learning reaching movements. First, motor control is learned unsupervised or self-supervised. Second, knowledge of sensorimotor contingencies is acquired in contexts in which action consequences unfold in time. Third, motor redundancies must be resolved. To solve all 3 of these…
Bilingual Lexical Interactions in an Unsupervised Neural Network Model

ERIC Educational Resources Information Center

Zhao, Xiaowei; Li, Ping

2010-01-01

In this paper we present an unsupervised neural network model of bilingual lexical development and interaction. We focus on how the representational structures of the bilingual lexicons can emerge, develop, and interact with each other as a function of the learning history. The results show that: (1) distinct representations for the two lexicons…
Uncertain Henry's law constants compromise equilibrium partitioning calculations of atmospheric oxidation products

NASA Astrophysics Data System (ADS)

Wang, Chen; Yuan, Tiange; Wood, Stephen A.; Goss, Kai-Uwe; Li, Jingyi; Ying, Qi; Wania, Frank

2017-06-01

Gas-particle partitioning governs the distribution, removal, and transport of organic compounds in the atmosphere and the formation of secondary organic aerosol (SOA). The large variety of atmospheric species and their wide range of properties make predicting this partitioning equilibrium challenging. Here we expand on earlier work and predict gas-organic and gas-aqueous phase partitioning coefficients for 3414 atmospherically relevant molecules using COSMOtherm, SPARC Performs Automated Reasoning in Chemistry (SPARC), and poly-parameter linear free-energy relationships. The Master Chemical Mechanism generated the structures by oxidizing primary emitted volatile organic compounds. Predictions for gas-organic phase partitioning coefficients (KWIOM/G) by different methods are on average within 1 order of magnitude of each other, irrespective of the numbers of functional groups, except for predictions by COSMOtherm and SPARC for compounds with more than three functional groups, which have a slightly higher discrepancy. Discrepancies between predictions of gas-aqueous partitioning (KW/G) are much larger and increase with the number of functional groups in the molecule. In particular, COSMOtherm often predicts much lower KW/G for highly functionalized compounds than the other methods. While the quantum-chemistry-based COSMOtherm accounts for the influence of intra-molecular interactions on conformation, highly functionalized molecules likely fall outside of the applicability domain of the other techniques, which at least in part rely on empirical data for calibration. Further analysis suggests that atmospheric phase distribution calculations are sensitive to the partitioning coefficient estimation method, in particular to the estimated value of KW/G. The large uncertainty in KW/G predictions for highly functionalized organic compounds needs to be resolved to improve the quantitative treatment of SOA formation.
Programmable partitioning for high-performance coherence domains in a multiprocessor system

DOEpatents

Blumrich, Matthias A [Ridgefield, CT]; Salapura, Valentina [Chappaqua, NY]

2011-01-25

A multiprocessor computing system and a method of logically partitioning a multiprocessor computing system are disclosed. The multiprocessor computing system comprises a multitude of processing units, and a multitude of snoop units. Each of the processing units includes a local cache, and the snoop units are provided for supporting cache coherency in the multiprocessor system. Each of the snoop units is connected to a respective one of the processing units and to all of the other snoop units. The multiprocessor computing system further includes a partitioning system for using the snoop units to partition the multitude of processing units into a plurality of independent, memory-consistent, adjustable-size processing groups. Preferably, when the processor units are partitioned into these processing groups, the partitioning system also configures the snoop units to maintain cache coherency within each of said groups.

Random Partition Distribution Indexed by Pairwise Information

PubMed Central

Dahl, David B.; Day, Ryan; Tsai, Jerry W.

2017-01-01

We propose a random partition distribution indexed by pairwise similarity information such that partitions compatible with the similarities are given more probability. The use of pairwise similarities, in the form of distances, is common in some clustering algorithms (e.g., hierarchical clustering), but we show how to use this type of information to define a prior partition distribution for flexible Bayesian modeling. A defining feature of the distribution is that it allocates probability among partitions within a given number of subsets, but it does not shift probability among sets of partitions with different numbers of subsets. Our distribution places more probability on partitions that group similar items yet keeps the total probability of partitions with a given number of subsets constant. The distribution of the number of subsets (and its moments) is available in closed-form and is not a function of the similarities. Our formulation has an explicit probability mass function (with a tractable normalizing constant) so the full suite of MCMC methods may be used for posterior inference. We compare our distribution with several existing partition distributions, showing that our formulation has attractive properties. We provide three demonstrations to highlight the features and relative performance of our distribution. PMID:29276318
Tensor decomposition-based and principal-component-analysis-based unsupervised feature extraction applied to the gene expression and methylation profiles in the brains of social insects with multiple castes.

PubMed

Taguchi, Y-H

2018-05-08

Even though coexistence of multiple phenotypes sharing the same genomic background is interesting, it remains incompletely understood. Epigenomic profiles may represent key factors, with unknown contributions to the development of multiple phenotypes, and social-insect castes are a good model for elucidation of the underlying mechanisms. Nonetheless, previous studies have failed to identify genes associated with aberrant gene expression and methylation profiles because of the lack of suitable methodology that can address this problem properly. A recently proposed principal component analysis (PCA)-based and tensor decomposition (TD)-based unsupervised feature extraction (FE) can solve this problem because these two approaches can deal with gene expression and methylation profiles even when a small number of samples is available. PCA-based and TD-based unsupervised FE methods were applied to the analysis of gene expression and methylation profiles in the brains of two social insects, Polistes canadensis and Dinoponera quadriceps. Genes associated with differential expression and methylation between castes were identified, and analysis of enrichment of Gene Ontology terms confirmed reliability of the obtained sets of genes from the biological standpoint. Biologically relevant genes, shown to be associated with significant differential gene expression and methylation between castes, were identified here for the first time. The identification of these genes may help understand the mechanisms underlying epigenetic control of development of multiple phenotypes under the same genomic conditions.
Determination of partition coefficients using 1H NMR spectroscopy and time domain complete reduction to amplitude-frequency table (CRAFT) analysis.

PubMed

Soulsby, David; Chica, Jeryl A M

2017-08-01

We have developed a simple, direct and novel method for the determination of partition coefficients and partitioning behavior using 1H NMR spectroscopy combined with time domain complete reduction to amplitude-frequency tables (CRAFT). After partitioning into water and 1-octanol using standard methods, aliquots from each layer are directly analyzed using either proton or selective excitation NMR experiments. Signal amplitudes for each compound from each layer are then extracted directly from the time domain data in an automated fashion and analyzed using the CRAFT software. From these amplitudes, log P and log D7.4 values can be calculated directly. Phase, baseline and internal standard issues, which can be problematic when Fourier transformed data are used, are unimportant when using time domain data. Furthermore, analytes can contain impurities because only a single resonance is examined and need not be UV active. Using this approach, we examined a variety of pharmaceutically relevant compounds and determined partition coefficients that are in excellent agreement with literature values. To demonstrate the utility of this approach, we also examined salicylic acid in more detail demonstrating an aggregation effect as a function of sample loading and partition coefficient behavior as a function of pH value. This method provides a valuable addition to the medicinal chemist toolbox for determining these important constants. Copyright © 2017 John Wiley & Sons, Ltd.
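The final calculation step in the abstract above can be illustrated with a minimal sketch. It assumes that each signal amplitude is proportional to the compound's concentration in its layer and that both layers are measured under identical conditions. The function name and the sample amplitudes are hypothetical and are not taken from the paper or from the CRAFT software.

```python
import math


def log_partition_ratio(amp_octanol: float, amp_water: float) -> float:
    """Return log10 of the octanol/water amplitude ratio.

    Assumption: amplitudes are proportional to concentration in each layer,
    so their ratio stands in for the concentration ratio.
    """
    if amp_octanol <= 0 or amp_water <= 0:
        raise ValueError("amplitudes must be positive")
    return math.log10(amp_octanol / amp_water)


# Hypothetical amplitudes: the octanol-layer signal is 100 times the water-layer one.
print(log_partition_ratio(250.0, 2.5))
```

Under the same assumptions, applying the ratio to a sample whose aqueous layer is buffered at pH 7.4 would give the distribution coefficient at that pH rather than log P.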
Relationships of parental monitoring and emotion regulation with early adolescents' sexual behaviors.

PubMed

Hadley, Wendy; Houck, Christopher D; Barker, David; Senocak, Natali

2015-06-01

The purpose of this study was to examine the moderating influence of parental monitoring (e.g., unsupervised time with opposite sex peers) and adolescent emotional competence on sexual behaviors, among a sample of at-risk early adolescents. This study included 376 seventh-grade adolescents (age, 12-14 years) with behavioral or emotional difficulties. Questionnaires were completed on private laptop computers and assessed adolescent Emotional Competence (including Regulation and Negativity/Lability), Unsupervised Time, and a range of Sexual Behaviors. Generalized linear models were used to evaluate the independent and combined influence of Emotional Competency and Unsupervised Time on adolescent report of Sexual Behaviors. Analyses were stratified by gender to account for the notable gender differences in the targeted moderators and outcome variables. Findings indicated that more unsupervised time was a risk factor for all youth but was influenced by an adolescent's ability to regulate their emotions. Specifically, for males and females, poorer Emotion Regulation was associated with having engaged in a greater variety of Sexual Behaviors. However, lower Negativity/Lability and >1× per week Unsupervised Time were associated with a higher number of sexual behaviors among females only. Based on the findings of this study, a lack of parental supervision seems to be particularly problematic for both male and female adolescents with poor emotion regulation abilities. It may be important to impact both emotion regulation abilities and increase parental knowledge and skills associated with effective monitoring to reduce risk-taking for these youth.
Effects of an off-season conditioning program on the physical characteristics of adolescent rugby union players.

PubMed

Smart, Daniel J; Gill, Nicholas D

2013-03-01

The aims of the study were to determine if a supervised off-season conditioning program enhanced gains in physical characteristics compared with the same program performed in an unsupervised manner and to establish the persistence of the physical changes after a 6-month unsupervised competition period. Forty-four provincial representative adolescent rugby union players (age, mean ± SD, 15.3 ± 1.3 years) participated in a 15-week off-season conditioning program either under supervision from an experienced strength and conditioning coach or unsupervised. Measures of body composition, strength, vertical jump, speed, and anaerobic and aerobic running performance were taken before, immediately after, and 6 months after the conditioning. After the conditioning program, the supervised group had greater improvements in all strength measures than the unsupervised group, with small, moderate and large differences between the groups' changes for chin-ups (9.1%; ± 11.6%), bench-press (16.9%; ± 11.7%) and box-squat (50.4%; ± 20.9%) estimated 1RM respectively. Both groups showed trivial increases in mass; however, increases in fat free mass were small and trivial for supervised and unsupervised players respectively. Strength declined in the supervised group while the unsupervised group had small increases during the competition phase, resulting in only a small difference between the long-term changes in box-squat 1RM (15.9%; ± 13.2%). The supervised group had further small increases in fat free mass resulting in a small difference (2.4%; ± 2.7%) in the long-term changes. The postconditioning differences between the 2 groups may have been a result of increased adherence and the attainment of higher training loads during supervised training. The lack of differences in strength after the competition period indicates that supervision should be maintained to reduce substantial decrements in performance.
Information-theoretic indices usage for the prediction and calculation of octanol-water partition coefficient.

PubMed

Persona, Marek; Kutarov, Vladimir V; Kats, Boris M; Persona, Andrzej; Marczewska, Barbara

2007-01-01

The paper describes a new method for predicting the octanol-water partition coefficient, based on molecular graph theory. The results obtained using the new method are well correlated with experimental values. These results were compared with those obtained using ten other structure correlated methods. The comparison shows that graph theory can be very useful in structure correlation research.
Unsupervised Learning of Overlapping Image Components Using Divisive Input Modulation

PubMed Central

Spratling, M. W.; De Meyer, K.; Kompass, R.

2009-01-01

This paper demonstrates that nonnegative matrix factorisation is mathematically related to a class of neural networks that employ negative feedback as a mechanism of competition. This observation inspires a novel learning algorithm which we call Divisive Input Modulation (DIM). The proposed algorithm provides a mathematically simple and computationally efficient method for the unsupervised learning of image components, even in conditions where these elementary features overlap considerably. To test the proposed algorithm, a novel artificial task is introduced which is similar to the frequently-used bars problem but employs squares rather than bars to increase the degree of overlap between components. Using this task, we investigate how the proposed method performs on the parsing of artificial images composed of overlapping features, given the correct representation of the individual components; and secondly, we investigate how well it can learn the elementary components from artificial training images. We compare the performance of the proposed algorithm with its predecessors including variations on these algorithms that have produced state-of-the-art performance on the bars problem. The proposed algorithm is more successful than its predecessors in dealing with overlap and occlusion in the artificial task that has been used to assess performance. PMID:19424442
Robust extraction of basis functions for simultaneous and proportional myoelectric control via sparse non-negative matrix factorization

NASA Astrophysics Data System (ADS)

Lin, Chuang; Wang, Binghui; Jiang, Ning; Farina, Dario

2018-04-01

Objective. This paper proposes a novel simultaneous and proportional multiple degree of freedom (DOF) myoelectric control method for active prostheses. Approach. The approach is based on non-negative matrix factorization (NMF) of surface EMG signals with the inclusion of sparseness constraints. By applying a sparseness constraint to the control signal matrix, it is possible to extract the basis information from arbitrary movements (quasi-unsupervised approach) for multiple DOFs concurrently. Main Results. In online testing based on target hitting, able-bodied subjects reached a greater throughput (TP) when using sparse NMF (SNMF) than with classic NMF or with linear regression (LR). Accordingly, the completion time (CT) was shorter for SNMF than NMF or LR. The same observations were made in two patients with unilateral limb deficiencies. Significance. The addition of sparseness constraints to NMF allows for a quasi-unsupervised approach to myoelectric control with superior results with respect to previous methods for the simultaneous and proportional control of multi-DOF. The proposed factorization algorithm allows robust simultaneous and proportional control, is superior to previous supervised algorithms, and, because of minimal supervision, paves the way to online adaptation in myoelectric control.
Comparison of unsupervised classification methods for brain tumor segmentation using multi-parametric MRI.

PubMed

Sauwen, N; Acou, M; Van Cauter, S; Sima, D M; Veraart, J; Maes, F; Himmelreich, U; Achten, E; Van Huffel, S

2016-01-01

Tumor segmentation is a particularly challenging task in high-grade gliomas (HGGs), as they are among the most heterogeneous tumors in oncology. An accurate delineation of the lesion and its main subcomponents contributes to optimal treatment planning, prognosis and follow-up. Conventional MRI (cMRI) is the imaging modality of choice for manual segmentation, and is also considered in the vast majority of automated segmentation studies. Advanced MRI modalities such as perfusion-weighted imaging (PWI), diffusion-weighted imaging (DWI) and magnetic resonance spectroscopic imaging (MRSI) have already shown their added value in tumor tissue characterization, hence there have been recent suggestions of combining different MRI modalities into a multi-parametric MRI (MP-MRI) approach for brain tumor segmentation. In this paper, we compare the performance of several unsupervised classification methods for HGG segmentation based on MP-MRI data including cMRI, DWI, MRSI and PWI. Two independent MP-MRI datasets with a different acquisition protocol were available from different hospitals. We demonstrate that a hierarchical non-negative matrix factorization variant which was previously introduced for MP-MRI tumor segmentation gives the best performance in terms of mean Dice-scores for the pathologic tissue classes on both datasets.
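The Dice score used as the performance measure above can be sketched in a few lines. This is the standard overlap definition applied to two sets of voxel indices, not the authors' evaluation code. The function name, the empty-mask convention, and the sample indices are illustrative assumptions.

```python
def dice_score(predicted: set, reference: set) -> float:
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) between two sets of voxel indices."""
    if not predicted and not reference:
        # Convention chosen for this sketch: two empty masks agree perfectly.
        return 1.0
    return 2 * len(predicted & reference) / (len(predicted) + len(reference))


# Hypothetical voxel indices for one tissue class.
automated = {1, 2, 3, 4}
manual = {3, 4, 5, 6}
print(dice_score(automated, manual))
```

A mean Dice score over tissue classes would then be the average of this value computed per class.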
Automatic microseismic event picking via unsupervised machine learning

NASA Astrophysics Data System (ADS)

Chen, Yangkang

2018-01-01

Effective and efficient arrival picking plays an important role in microseismic and earthquake data processing and imaging. Widely used short-term-average long-term-average ratio (STA/LTA) based arrival picking algorithms suffer from sensitivity to moderate-to-strong random ambient noise. To make the state-of-the-art arrival picking approaches effective, microseismic data need to be first pre-processed, for example, by removing a sufficient amount of noise, and then analysed by arrival pickers. To overcome the noise issue in arrival picking for weak microseismic or earthquake events, I leverage machine learning techniques to help recognize seismic waveforms in microseismic or earthquake data. Because of the dependency of supervised machine learning algorithms on large volumes of well-designed training data, I utilize an unsupervised machine learning algorithm to help cluster the time samples into two groups, that is, waveform points and non-waveform points. The fuzzy clustering algorithm has been demonstrated to be effective for this purpose. A group of synthetic, real microseismic and earthquake data sets with different levels of complexity show that the proposed method is much more robust than the state-of-the-art STA/LTA method in picking microseismic events, even in the case of moderately strong background noise.
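For orientation, the STA/LTA baseline mentioned above can be sketched as follows. This is a generic energy-ratio picker, not the author's fuzzy clustering method. The window lengths, the choice to end both windows at the current sample, and the synthetic trace are assumptions made for illustration.

```python
def sta_lta(signal, n_sta, n_lta):
    """Ratio of short-term to long-term average energy at each sample.

    Both windows end at the current sample; positions without a full
    long-term window are reported as 0.0.
    """
    if not 0 < n_sta <= n_lta:
        raise ValueError("require 0 < n_sta <= n_lta")
    energy = [x * x for x in signal]
    # Prefix sums turn each window sum into a difference of two entries.
    prefix = [0.0]
    for e in energy:
        prefix.append(prefix[-1] + e)
    ratios = [0.0] * len(signal)
    for i in range(n_lta - 1, len(signal)):
        sta = (prefix[i + 1] - prefix[i + 1 - n_sta]) / n_sta
        lta = (prefix[i + 1] - prefix[i + 1 - n_lta]) / n_lta
        ratios[i] = sta / lta if lta > 0 else 0.0
    return ratios


# Hypothetical trace: low-level noise followed by a stronger arrival at index 20.
trace = [0.1] * 20 + [1.0] * 5
ratios = sta_lta(trace, n_sta=2, n_lta=10)
pick = max(range(len(ratios)), key=ratios.__getitem__)
print(pick)
```

In this sketch the ratio stays near 1 in the noise and rises sharply once the short window fills with the arrival, which is why stronger ambient noise, by raising both averages, makes the peak harder to distinguish.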
Self-Supervised Video Hashing With Hierarchical Binary Auto-Encoder.

PubMed

Song, Jingkuan; Zhang, Hanwang; Li, Xiangpeng; Gao, Lianli; Wang, Meng; Hong, Richang

2018-07-01

Existing video hash functions are built on three isolated stages: frame pooling, relaxed learning, and binarization, which have not adequately explored the temporal order of video frames in a joint binary optimization model, resulting in severe information loss. In this paper, we propose a novel unsupervised video hashing framework dubbed self-supervised video hashing (SSVH), which is able to capture the temporal nature of videos in an end-to-end learning to hash fashion. We specifically address two central problems: 1) how to design an encoder-decoder architecture to generate binary codes for videos and 2) how to equip the binary codes with the ability of accurate video retrieval. We design a hierarchical binary auto-encoder to model the temporal dependencies in videos with multiple granularities, and embed the videos into binary codes with fewer computations than the stacked architecture. Then, we encourage the binary codes to simultaneously reconstruct the visual content and neighborhood structure of the videos. Experiments on two real-world data sets show that our SSVH method can significantly outperform the state-of-the-art methods and achieve the current best performance on the task of unsupervised video retrieval.
Classification of neocortical interneurons using affinity propagation.

PubMed

Santana, Roberto; McGarry, Laura M; Bielza, Concha; Larrañaga, Pedro; Yuste, Rafael

2013-01-01

In spite of over a century of research on cortical circuits, it is still unknown how many classes of cortical neurons exist. In fact, neuronal classification is a difficult problem because it is unclear how to designate a neuronal cell class and what are the best characteristics to define them. Recently, unsupervised classifications using cluster analysis based on morphological, physiological, or molecular characteristics, have provided quantitative and unbiased identification of distinct neuronal subtypes, when applied to selected datasets. However, better and more robust classification methods are needed for increasingly complex and larger datasets. Here, we explored the use of affinity propagation, a recently developed unsupervised classification algorithm imported from machine learning, which gives a representative example or exemplar for each cluster. As a case study, we applied affinity propagation to a test dataset of 337 interneurons belonging to four subtypes, previously identified based on morphological and physiological characteristics. We found that affinity propagation correctly classified most of the neurons in a blind, non-supervised manner. Affinity propagation outperformed Ward's method, a current standard clustering approach, in classifying the neurons into 4 subtypes. Affinity propagation could therefore be used in future studies to validly classify neurons, as a first step to help reverse engineer neural circuits.
Geological applications of machine learning on hyperspectral remote sensing data

NASA Astrophysics Data System (ADS)

Tse, C. H.; Li, Yi-liang; Lam, Edmund Y.

2015-02-01

The CRISM imaging spectrometer orbiting Mars has been producing a vast amount of data in the visible to infrared wavelengths in the form of hyperspectral data cubes. These data, compared with those obtained from previous remote sensing techniques, yield an unprecedented level of detailed spectral resolution in addition to an ever increasing level of spatial information. A major challenge brought about by the data is the burden of processing and interpreting these datasets and extracting the relevant information from them. This research aims at approaching the challenge by exploring machine learning methods, especially unsupervised learning, to achieve cluster density estimation and classification, and ultimately devising an efficient means leading to identification of minerals. A set of software tools has been constructed in Python to access and experiment with CRISM hyperspectral cubes selected from two specific Mars locations. A machine learning pipeline is proposed and unsupervised learning methods were applied to pre-processed datasets. The resulting data clusters are compared with the published ASTER spectral library and browse data products from the Planetary Data System (PDS). The result demonstrated that this approach is capable of processing the huge amount of hyperspectral data and potentially providing guidance to scientists for more detailed studies.
Multi-Omics Factor Analysis-a framework for unsupervised integration of multi-omics data sets.

PubMed

Argelaguet, Ricard; Velten, Britta; Arnol, Damien; Dietrich, Sascha; Zenz, Thorsten; Marioni, John C; Buettner, Florian; Huber, Wolfgang; Stegle, Oliver

2018-06-20

Multi-omics studies promise the improved characterization of biological processes across molecular layers. However, methods for the unsupervised integration of the resulting heterogeneous data sets are lacking. We present Multi-Omics Factor Analysis (MOFA), a computational method for discovering the principal sources of variation in multi-omics data sets. MOFA infers a set of (hidden) factors that capture biological and technical sources of variability. It disentangles axes of heterogeneity that are shared across multiple modalities and those specific to individual data modalities. The learnt factors enable a variety of downstream analyses, including identification of sample subgroups, data imputation and the detection of outlier samples. We applied MOFA to a cohort of 200 patient samples of chronic lymphocytic leukaemia, profiled for somatic mutations, RNA expression, DNA methylation and ex vivo drug responses. MOFA identified major dimensions of disease heterogeneity, including immunoglobulin heavy-chain variable region status, trisomy of chromosome 12 and previously underappreciated drivers, such as response to oxidative stress. In a second application, we used MOFA to analyse single-cell multi-omics data, identifying coordinated transcriptional and epigenetic changes along cell differentiation. © 2018 The Authors. Published under the terms of the CC BY 4.0 license.
Self-Supervised Video Hashing With Hierarchical Binary Auto-Encoder

NASA Astrophysics Data System (ADS)

Song, Jingkuan; Zhang, Hanwang; Li, Xiangpeng; Gao, Lianli; Wang, Meng; Hong, Richang

2018-07-01

Existing video hash functions are built on three isolated stages: frame pooling, relaxed learning, and binarization, which have not adequately explored the temporal order of video frames in a joint binary optimization model, resulting in severe information loss. In this paper, we propose a novel unsupervised video hashing framework dubbed Self-Supervised Video Hashing (SSVH), that is able to capture the temporal nature of videos in an end-to-end learning-to-hash fashion. We specifically address two central problems: 1) how to design an encoder-decoder architecture to generate binary codes for videos; and 2) how to equip the binary codes with the ability of accurate video retrieval. We design a hierarchical binary autoencoder to model the temporal dependencies in videos with multiple granularities, and embed the videos into binary codes with less computations than the stacked architecture. Then, we encourage the binary codes to simultaneously reconstruct the visual content and neighborhood structure of the videos. Experiments on two real-world datasets (FCVID and YFCC) show that our SSVH method can significantly outperform the state-of-the-art methods and achieve the currently best performance on the task of unsupervised video retrieval.
Multi-Source Multi-Target Dictionary Learning for Prediction of Cognitive Decline.

PubMed

Zhang, Jie; Li, Qingyang; Caselli, Richard J; Thompson, Paul M; Ye, Jieping; Wang, Yalin

2017-06-01

Alzheimer's Disease (AD) is the most common type of dementia. Identifying correct biomarkers may determine pre-symptomatic AD subjects and enable early intervention. Recently, multi-task sparse feature learning has been successfully applied to many computer vision and biomedical informatics research problems. It aims to improve the generalization performance by exploiting the shared features among different tasks. However, most of the existing algorithms are formulated as supervised learning schemes, whose drawback is either insufficient feature numbers or missing label information. To address these challenges, we formulate an unsupervised framework for multi-task sparse feature learning based on a novel dictionary learning algorithm. To solve the unsupervised learning problem, we propose a two-stage Multi-Source Multi-Target Dictionary Learning (MMDL) algorithm. In stage 1, we propose a multi-source dictionary learning method to utilize the common and individual sparse features in different time slots. In stage 2, supported by a rigorous theoretical analysis, we develop a multi-task learning method to solve the missing label problem. Empirical studies on an N = 3970 longitudinal brain image data set, which involves 2 sources and 5 targets, demonstrate the improved prediction accuracy and speed efficiency of MMDL in comparison with other state-of-the-art algorithms.
Ensemble Semi-supervised Frame-work for Brain Magnetic Resonance Imaging Tissue Segmentation

PubMed Central

Azmi, Reza; Pishgoo, Boshra; Norozi, Narges; Yeganeh, Samira

2013-01-01

Tissue segmentation of brain magnetic resonance images (MRIs) is one of the most important parts of clinical diagnostic tools. Pixel classification methods, both supervised and unsupervised, have been frequently used in image segmentation. Supervised segmentation methods lead to high accuracy, but they need a large amount of labeled data, which is hard, expensive, and slow to obtain. Moreover, they cannot use unlabeled data to train classifiers. On the other hand, unsupervised segmentation methods have no prior knowledge and lead to a low level of performance. However, semi-supervised learning, which uses a few labeled data together with a large amount of unlabeled data, yields higher accuracy with less trouble. In this paper, we propose an ensemble semi-supervised framework for segmenting brain magnetic resonance imaging (MRI) tissues that uses the results of several semi-supervised classifiers simultaneously. Selecting appropriate classifiers has a significant role in the performance of this framework. Hence, in this paper, we present two semi-supervised algorithms, expectation filtering maximization and MCo_Training, that are improved versions of the semi-supervised methods expectation maximization and Co_Training and increase segmentation accuracy. Afterward, we use these improved classifiers together with a graph-based semi-supervised classifier as components of the ensemble framework. Experimental results show that the performance of segmentation in this approach is higher than both supervised methods and the individual semi-supervised classifiers. PMID:24098863
A novel method for the determination of adsorption partition coefficients of minor gases in a shale sample by headspace gas chromatography.

PubMed

Zhang, Chun-Yun; Hu, Hui-Chao; Chai, Xin-Sheng; Pan, Lei; Xiao, Xian-Ming

2013-10-04

A novel method has been developed for the determination of adsorption partition coefficient (Kd) of minor gases in shale. The method uses samples of two different sizes (masses) of the same material, from which the partition coefficient of the gas can be determined from two independent headspace gas chromatographic (HS-GC) measurements. The equilibrium for the model gas (ethane) was achieved in 5h at 120°C. The method also involves establishing an equation based on the Kd at higher equilibrium temperature, from which the Kd at lower temperature can be calculated. Although the HS-GC method requires some time and effort, it is simpler and quicker than the isothermal adsorption method that is in widespread use today. As a result, the method is simple and practical and can be a valuable tool for shale gas-related research and applications. Copyright © 2013 Elsevier B.V. All rights reserved.
Combining Unsupervised and Supervised Classification to Build User Models for Exploratory Learning Environments

ERIC Educational Resources Information Center

Amershi, Saleema; Conati, Cristina

2009-01-01

In this paper, we present a data-based user modeling framework that uses both unsupervised and supervised classification to build student models for exploratory learning environments. We apply the framework to build student models for two different learning environments and using two different data sources (logged interface and eye-tracking data).…
Unsupervised Discovery of Nonlinear Structure Using Contrastive Backpropagation

ERIC Educational Resources Information Center

Hinton, Geoffrey; Osindero, Simon; Welling, Max; Teh, Yee-Whye

2006-01-01

We describe a way of modeling high-dimensional data vectors by using an unsupervised, nonlinear, multilayer neural network in which the activity of each neuron-like unit makes an additive contribution to a global energy score that indicates how surprised the network is by the data vector. The connection weights that determine how the activity of…

Validation of Unsupervised Computer-Based Screening for Reading Disability in Greek Elementary Grades 3 and 4

ERIC Educational Resources Information Center

Protopapas, Athanassios; Skaloumbakas, Christos; Bali, Persefoni

2008-01-01

After reviewing past efforts related to computer-based reading disability (RD) assessment, we present a fully automated screening battery that evaluates critical skills relevant for RD diagnosis designed for unsupervised application in the Greek educational system. Psychometric validation in 301 children, 8-10 years old (grades 3 and 4; including…
New Parallel Algorithms for Landscape Evolution Model

NASA Astrophysics Data System (ADS)

Jin, Y.; Zhang, H.; Shi, Y.

2017-12-01

Most landscape evolution models (LEM) developed in the last two decades solve the diffusion equation to simulate the transportation of surface sediments. This numerical approach is difficult to parallelize due to the computation of drainage area for each node, which needs huge amount of communication if run in parallel. In order to overcome this difficulty, we developed two parallel algorithms for LEM with a stream net. One algorithm handles the partition of grid with traditional methods and applies an efficient global reduction algorithm to do the computation of drainage areas and transport rates for the stream net; the other algorithm is based on a new partition algorithm, which partitions the nodes in catchments between processes first, and then partitions the cells according to the partition of nodes. Both methods focus on decreasing communication between processes and take the advantage of massive computing techniques, and numerical experiments show that they are both adequate to handle large scale problems with millions of cells. We implemented the two algorithms in our program based on the widely used finite element library deal.II, so that it can be easily coupled with ASPECT.
Unsupervised classification of remote multispectral sensing data

NASA Technical Reports Server (NTRS)

Su, M. Y.

1972-01-01

The new unsupervised classification technique for classifying multispectral remote sensing data, which can be either from the multispectral scanner or digitized color-separation aerial photographs, consists of two parts: (a) a sequential statistical clustering which is a one-pass sequential variance analysis and (b) a generalized K-means clustering. In this composite clustering technique, the output of (a) is a set of initial clusters which are input to (b) for further improvement by an iterative scheme. Applications of the technique using an IBM-7094 computer on multispectral data sets over Purdue's Flight Line C-1 and the Yellowstone National Park test site have been accomplished. Comparisons between the classification maps by the unsupervised technique and the supervised maximum likelihood technique indicate that the classification accuracies are in agreement.
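The two-part structure described above (a one-pass sequential clustering that seeds an iterative K-means refinement) can be illustrated with a short sketch. It is an assumption-laden stand-in rather than the report's algorithm: part (a) is replaced here by a simple distance-threshold "leader" pass instead of the sequential variance analysis, and the `radius` parameter, the function names, and the sample pixel values are hypothetical.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def sequential_pass(points, radius):
    """Part (a) stand-in: one pass over the data.

    A point joins the nearest existing cluster if it lies within `radius`
    of that cluster's running mean; otherwise it starts a new cluster.
    """
    members, centers = [], []
    for p in points:
        if centers:
            j = min(range(len(centers)), key=lambda k: sq_dist(p, centers[k]))
            if sq_dist(p, centers[j]) <= radius ** 2:
                members[j].append(p)
                centers[j] = mean(members[j])
                continue
        members.append([p])
        centers.append(p)
    return centers


def kmeans_refine(points, centers, n_iter=20):
    """Part (b): Lloyd-style K-means iterations starting from the given centers."""
    centers = list(centers)
    for _ in range(n_iter):
        groups = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda k: sq_dist(p, centers[k]))
            groups[j].append(p)
        new_centers = [mean(g) if g else c for g, c in zip(groups, centers)]
        if new_centers == centers:
            break
        centers = new_centers
    labels = [min(range(len(centers)), key=lambda k: sq_dist(p, centers[k]))
              for p in points]
    return centers, labels


# Hypothetical two-band pixel values forming two spectral groups.
pixels = [(0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (0.1, 0.2), (5.2, 4.9), (4.9, 5.0)]
initial = sequential_pass(pixels, radius=1.0)
centers, labels = kmeans_refine(pixels, initial)
```

A useful property of this arrangement is that the number of clusters comes out of the first pass, so the K-means stage does not need K supplied in advance.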
Unsupervised learning in probabilistic neural networks with multi-state metal-oxide memristive synapses

NASA Astrophysics Data System (ADS)

Serb, Alexander; Bill, Johannes; Khiat, Ali; Berdan, Radu; Legenstein, Robert; Prodromakis, Themis

2016-09-01

In an increasingly data-rich world the need for developing computing systems that can not only process, but ideally also interpret big data is becoming continuously more pressing. Brain-inspired concepts have shown great promise towards addressing this need. Here we demonstrate unsupervised learning in a probabilistic neural network that utilizes metal-oxide memristive devices as multi-state synapses. Our approach can be exploited for processing unlabelled data and can adapt to time-varying clusters that underlie incoming data by supporting the capability of reversible unsupervised learning. The potential of this work is showcased through the demonstration of successful learning in the presence of corrupted input data and probabilistic neurons, thus paving the way towards robust big-data processors.
Classification of earth terrain using polarimetric synthetic aperture radar images

NASA Technical Reports Server (NTRS)

Lim, H. H.; Swartz, A. A.; Yueh, H. A.; Kong, J. A.; Shin, R. T.; Van Zyl, J. J.

1989-01-01

Supervised and unsupervised classification techniques are developed and used to classify the earth terrain components from SAR polarimetric images of San Francisco Bay and Traverse City, Michigan. The supervised techniques include the Bayes classifiers, normalized polarimetric classification, and simple feature classification using discriminants such as the absolute and normalized magnitude response of individual receiver channel returns and the phase difference between receiver channels. An algorithm is developed as an unsupervised technique which classifies terrain elements based on the relationship between the orientation angle and the handedness of the transmitting and receiving polarization states. It is found that supervised classification produces the best results when accurate classifier training data are used, while unsupervised classification may be applied when training data are not available.
Convex Regression with Interpretable Sharp Partitions

PubMed Central

Petersen, Ashley; Simon, Noah; Witten, Daniela

2016-01-01

We consider the problem of predicting an outcome variable on the basis of a small number of covariates, using an interpretable yet non-additive model. We propose convex regression with interpretable sharp partitions (CRISP) for this task. CRISP partitions the covariate space into blocks in a data-adaptive way, and fits a mean model within each block. Unlike other partitioning methods, CRISP is fit using a non-greedy approach by solving a convex optimization problem, resulting in low-variance fits. We explore the properties of CRISP, and evaluate its performance in a simulation study and on a housing price data set. PMID:27635120
Instructional Videos for Unsupervised Harvesting and Learning of Action Examples

DTIC Science & Technology

2014-11-03

collection of image or video annotations has been tackled in different ways, but most existing methods still require a human in the loop. The...the views of ARO and NSF. 7. REFERENCES [1] C.-C. Chang and C.-J. Lin. LIBSVM: A library for support vector machines. In ACM Transactions on...feature encoding methods. In BMVC, 2011. [3] J. Chen, Y. Cui, G. Ye, D. Liu, and S.-F. Chang. Event-driven semantic concept discovery by exploiting
Methods of Sparse Modeling and Dimensionality Reduction to Deal with Big Data

DTIC Science & Technology

2015-04-01

supervised learning (c). Our framework consists of two separate phases: (a) first find an initial space in an unsupervised manner; then (b) utilize label...model that can learn thousands of topics from a large set of documents and infer the topic mixture of each document, 2) a supervised dimension reduction...model that can learn thousands of topics from a large set of documents and infer the topic mixture of each document, (i) a method of supervised
Unsupervised user similarity mining in GSM sensor networks.

PubMed

Shad, Shafqat Ali; Chen, Enhong

2013-01-01

Mobility data has attracted researchers for the past few years because of its rich context and spatiotemporal nature, where this information can be used for potential applications like early warning systems, route prediction, traffic management, advertisement, social networking, and community finding. All the mentioned applications are based on mobility profile building and user trend analysis, where mobility profile building is done through significant places extraction, user's actual movement prediction, and context awareness. However, significant places extraction and user's actual movement prediction for mobility profile building are nontrivial tasks. In this paper, we present a user similarity mining-based methodology through user mobility profile building by using the semantic tagging information provided by the user and basic GSM network architecture properties, based on an unsupervised clustering approach. As the mobility information is in low-level raw form, our proposed methodology successfully converts it to high-level meaningful information by using the cell-Id location information rather than previously used location capturing methods like GPS, Infrared, and Wifi for profile mining and user similarity mining.
Robust Joint Graph Sparse Coding for Unsupervised Spectral Feature Selection.

PubMed

Zhu, Xiaofeng; Li, Xuelong; Zhang, Shichao; Ju, Chunhua; Wu, Xindong

2017-06-01

In this paper, we propose a new unsupervised spectral feature selection model by embedding a graph regularizer into the framework of joint sparse regression for preserving the local structures of data. To do this, we first extract the bases of training data by previous dictionary learning methods and, then, map original data into the basis space to generate their new representations, by proposing a novel joint graph sparse coding (JGSC) model. In JGSC, we first formulate its objective function by simultaneously taking subspace learning and joint sparse regression into account, then, design a new optimization solution to solve the resulting objective function, and further prove the convergence of the proposed solution. Furthermore, we extend JGSC to a robust JGSC (RJGSC) via replacing the least square loss function with a robust loss function, for achieving the same goals and also avoiding the impact of outliers. Finally, experimental results on real data sets showed that both JGSC and RJGSC outperformed the state-of-the-art algorithms in terms of k-nearest neighbor classification performance.
Transformer fault diagnosis using continuous sparse autoencoder.

PubMed

Wang, Lukun; Zhao, Xiaoying; Pei, Jiangnan; Tang, Gongyou

2016-01-01

This paper proposes a novel continuous sparse autoencoder (CSAE) which can be used in unsupervised feature learning. The CSAE adds Gaussian stochastic unit into activation function to extract features of nonlinear data. In this paper, CSAE is applied to solve the problem of transformer fault recognition. Firstly, based on dissolved gas analysis method, IEC three ratios are calculated by the concentrations of dissolved gases. Then IEC three ratios data is normalized to reduce data singularity and improve training speed. Secondly, deep belief network is established by two layers of CSAE and one layer of back propagation (BP) network. Thirdly, CSAE is adopted to unsupervised training and getting features. Then BP network is used for supervised training and getting transformer fault. Finally, the experimental data from IEC TC 10 dataset aims to illustrate the effectiveness of the presented approach. Comparative experiments clearly show that CSAE can extract features from the original data, and achieve a superior correct differentiation rate on transformer fault diagnosis.
Association between mapped vegetation and Quaternary geology on Santa Rosa Island, California

NASA Astrophysics Data System (ADS)

Cronkite-Ratcliff, C.; Corbett, S.; Schmidt, K. M.

2017-12-01

Vegetation and surficial geology are closely connected through the interface generally referred to as the critical zone. Not only do they influence each other, but they also provide clues into the effects of climate, topography, and hydrology on the earth's surface. This presentation describes quantitative analyses of the association between the recently compiled, independently generated vegetation and geologic map units on Santa Rosa Island, part of the Channel Islands National Park in Southern California. Santa Rosa Island was heavily grazed by sheep and cattle ranching for over one hundred years prior to its acquisition by the National Park Service. During this period, the island experienced significant erosion and reductions in the spatial extent and diversity of native plant species. Understanding the relationship between geology and vegetation is necessary for monitoring the recovery of native plant species, enhancing the viability of restoration sites, and understanding hydrologic conditions favorable for plant growth. Differences in grain size distribution and soil depth between geologic units support different plant communities through their influence on soil moisture, while differences in unit age reflect different degrees of pedogenic maturity. We find that unsupervised machine learning methods provide more informative insight into vegetation-geology associations than traditional measures such as Cramer's V and Goodman and Kruskal's lambda. Correspondence analysis shows that unique vegetation-geology patterns associated with beach/dune, grassland, hillslope/colluvial, and fluvial/wetland environments can be discerned from the data. By combining geology and vegetation with topographic variables, mixture models can be used to partition the landscape into multiple representative types, which can then be compared with conceptual models of plant growth and succession over different landforms. Using this collection of methods, we show various ways that Quaternary geology provides valuable information on the distribution of vegetation species in recovering ecosystems. Going forward, these analyses provide insights on favorable areas for natural and managed recovery of native vegetation species as well as criteria for future field sampling and monitoring.
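For reference, Cramer's V, one of the traditional association measures named in this abstract, can be computed from a vegetation-by-geology contingency table roughly as sketched below. The standard definition V = sqrt(chi2 / (n * min(r - 1, c - 1))) is used; the function name and the count tables are invented for illustration and are not data from the study. The sketch assumes every row and column total is positive.

```python
import math


def cramers_v(table):
    """Cramer's V for an r x c contingency table given as a list of rows of counts."""
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of the two map classifications.
            expected = row_tot[i] * col_tot[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(table), len(table[0])) - 1
    return math.sqrt(chi2 / (n * k))


# Hypothetical counts: rows are vegetation classes, columns are geologic units.
v_perfect = cramers_v([[30, 0], [0, 30]])    # each class confined to one unit
v_none = cramers_v([[10, 10], [10, 10]])     # classes spread evenly over units
```

A single number like this summarizes overall strength of association but says nothing about which vegetation classes pair with which units, which is one reason the abstract turns to correspondence analysis and mixture models.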
Semi-implicit time integration of atmospheric flows with characteristic-based flux partitioning

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ghosh, Debojyoti; Constantinescu, Emil M.

2016-06-23

This paper presents a characteristic-based flux partitioning for the semi-implicit time integration of atmospheric flows. Nonhydrostatic models require the solution of the compressible Euler equations. The acoustic time scale is significantly faster than the advective scale, yet it is typically not relevant to atmospheric and weather phenomena. The acoustic and advective components of the hyperbolic flux are separated in the characteristic space. High-order, conservative additive Runge-Kutta methods are applied to the partitioned equations so that the acoustic component is integrated in time implicitly with an unconditionally stable method, while the advective component is integrated explicitly. The time step of the overall algorithm is thus determined by the advective scale. Benchmark flow problems are used to demonstrate the accuracy, stability, and convergence of the proposed algorithm. The computational cost of the partitioned semi-implicit approach is compared with that of explicit time integration.
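The idea of treating the fast component implicitly and the slow component explicitly can be seen on a scalar toy problem. The sketch below is only an analogy: it uses first-order implicit-explicit (IMEX) Euler on y' = -lam_fast * y + slow_rhs(y), not the high-order additive Runge-Kutta schemes or the characteristic-based splitting of the Euler equations used in the paper, and all names and parameter values are hypothetical.

```python
def imex_euler_step(y, dt, lam_fast, slow_rhs):
    """One IMEX Euler step: slow term explicit, fast linear decay implicit.

    Solves y_new = y + dt * slow_rhs(y) - dt * lam_fast * y_new for y_new.
    """
    explicit_part = y + dt * slow_rhs(y)
    return explicit_part / (1.0 + dt * lam_fast)


def explicit_euler_step(y, dt, lam_fast, slow_rhs):
    """Fully explicit Euler step on the same right-hand side, for comparison."""
    return y + dt * (slow_rhs(y) - lam_fast * y)


def constant_forcing(y):
    # Slow term: constant forcing, so the steady state is 1 / lam_fast.
    return 1.0


lam_fast = 1000.0   # fast ("acoustic-like") decay rate
dt = 0.01           # step chosen for the slow scale: dt * lam_fast = 10

y_imex = y_expl = 1.0
for _ in range(50):
    y_imex = imex_euler_step(y_imex, dt, lam_fast, constant_forcing)
    y_expl = explicit_euler_step(y_expl, dt, lam_fast, constant_forcing)
```

With this step size the explicit update multiplies the error by about -9 per step and diverges, while the IMEX update contracts it by a factor of 11 and settles at the steady state 1 / lam_fast, which mirrors the abstract's point that the step is then limited by the slow scale only.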
Machine-learning phenotypic classification of bicuspid aortopathy.

PubMed

Wojnarski, Charles M; Roselli, Eric E; Idrees, Jay J; Zhu, Yuanjia; Carnes, Theresa A; Lowry, Ashley M; Collier, Patrick H; Griffin, Brian; Ehrlinger, John; Blackstone, Eugene H; Svensson, Lars G; Lytle, Bruce W

2018-02-01

Bicuspid aortic valves (BAV) are associated with incompletely characterized aortopathy. Our objectives were to identify distinct patterns of aortopathy using machine-learning methods and characterize their association with valve morphology and patient characteristics. We analyzed preoperative 3-dimensional computed tomography reconstructions for 656 patients with BAV undergoing ascending aorta surgery between January 2002 and January 2014. Unsupervised partitioning around medoids was used to cluster aortic dimensions. Group differences were identified using polytomous random forest analysis. Three distinct aneurysm phenotypes were identified: root (n = 83; 13%), with predominant dilatation at sinuses of Valsalva; ascending (n = 364; 55%), with supracoronary enlargement rarely extending past the brachiocephalic artery; and arch (n = 209; 32%), with aortic arch dilatation. The arch phenotype had the greatest association with right-noncoronary cusp fusion: 29%, versus 13% for ascending and 15% for root phenotypes (P < .0001). Severe valve regurgitation was most prevalent in root phenotype (57%), followed by ascending (34%) and arch phenotypes (25%; P < .0001). Aortic stenosis was most prevalent in arch phenotype (62%), followed by ascending (50%) and root phenotypes (28%; P < .0001). Patient age increased as the extent of aneurysm became more distal (root, 49 years; ascending, 53 years; arch, 57 years; P < .0001), and root phenotype was associated with greater male predominance compared with ascending and arch phenotypes (94%, 76%, and 70%, respectively; P < .0001). Phenotypes were visually recognizable with 94% accuracy. Three distinct phenotypes of bicuspid valve-associated aortopathy were identified using machine-learning methodology. Patient characteristics and valvular dysfunction vary by phenotype, suggesting that the location of aortic pathology may be related to the underlying pathophysiology of this disease. 
Copyright © 2017 The American Association for Thoracic Surgery. Published by Elsevier Inc. All rights reserved.
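As an illustration of the clustering step named in the preceding abstract, a swap-based partitioning-around-medoids (PAM) loop can be sketched as follows. This is a simplified sketch under stated assumptions, not the study's analysis: it uses a naive initialisation (the first k points), Euclidean distance, and made-up two-dimensional points in place of measured aortic dimensions; the function names are hypothetical.

```python
import math


def total_cost(points, medoids, dist):
    """Sum over all points of the distance to the nearest medoid."""
    return sum(min(dist(p, points[m]) for m in medoids) for p in points)


def pam(points, k, dist=math.dist):
    """Greedy swap phase of PAM: accept any medoid swap that lowers total cost."""
    medoids = list(range(k))  # naive initialisation: indices of the first k points
    best = total_cost(points, medoids, dist)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for cand in range(len(points)):
                if cand in medoids:
                    continue
                trial = medoids[:i] + [cand] + medoids[i + 1:]
                cost = total_cost(points, trial, dist)
                if cost < best:
                    medoids, best, improved = trial, cost, True
    labels = [min(range(k), key=lambda j: dist(p, points[medoids[j]]))
              for p in points]
    return medoids, labels


# Hypothetical measurements forming three well-separated groups.
points = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1),
          (5.0, 5.0), (5.1, 4.8), (4.9, 5.2),
          (9.0, 1.0), (9.2, 1.1), (8.9, 0.8)]
medoids, labels = pam(points, k=3)
```

Unlike K-means centers, each medoid is an actual observation, so every cluster comes with a representative case that can be inspected directly.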
Satellite altimetry in sea ice regions - detecting open water for estimating sea surface heights

NASA Astrophysics Data System (ADS)

Müller, Felix L.; Dettmering, Denise; Bosch, Wolfgang

2017-04-01

The Greenland Sea and the Fram Strait are transporting sea ice from the central Arctic Ocean southwards. They are covered by a dynamically changing sea ice layer with significant influences on the Earth's climate system. Between the sea ice there exist various sized open water areas known as leads, straight lined open water areas, and polynyas exhibiting a circular shape. Identifying these leads by satellite altimetry enables the extraction of sea surface height information. Analyzing the radar echoes, also called waveforms, provides information on the surface backscatter characteristics. For example, waveforms reflected by calm water have a very narrow and single-peaked shape. Waveforms reflected by sea ice show more variability due to diffuse scattering. Here we analyze altimeter waveforms from different conventional pulse-limited satellite altimeters to separate open water and sea ice waveforms. An unsupervised classification approach employing partitional clustering algorithms such as K-medoids and memory-based classification methods such as K-nearest neighbor is used. The classification is based on six parameters derived from the waveform's shape, for example the maximum power or the peak's width. The open-water detection is quantitatively compared to SAR images processed while accounting for sea ice motion. The classification results are used to derive information about the temporal evolution of sea ice extent and sea surface heights. They provide evidence of climate-change-relevant influences, for example Arctic sea level rise due to enhanced melting rates of Greenland's glaciers and an increasing fresh water influx into the Arctic Ocean. Additionally, the sea ice cover extent analyzed over a long time period provides an important indicator for a globally changing climate system.
A comparison of two methods for determining copper partitioning in oxidized sediments

USGS Publications Warehouse

Luoma, S.N.

1986-01-01

Model estimations of the proportion of Cu in oxidized sediments associated with extractable organic materials show some agreement with the proportion of Cu extracted from those sediments with ammonium hydroxide. Data were from 17 estuaries of widely differing sediment chemistry. The modelling and extraction methods agreed best where organic materials were present either in very high concentrations, relative to other sediment components, or in very low concentrations. In the range of component concentrations where the model predicted Cu should be distributed among a variety of components, agreement between the methods was poor. Both approaches indicated that Cu was predominantly partitioned to organic materials in some sediments, and predominantly partitioned to other components (most probably iron oxides and manganese oxides) in other sediments, and that these differences were related to the relative abundances of the specific components in the sediment. Although the results of the two methods of estimating Cu partitioning to organics correlated significantly among 24 stations from the 17 estuaries, the variability in the relationship suggested refinement of parameter values and verification of some important assumptions were essential to the further development of a reasonable model. © 1986.
Non-negative matrix factorisation methods for the spectral decomposition of MRS data from human brain tumours

PubMed Central

2012-01-01

Background In-vivo single voxel proton magnetic resonance spectroscopy (SV 1H-MRS), coupled with supervised pattern recognition (PR) methods, has been widely used in clinical studies of discrimination of brain tumour types and follow-up of patients bearing abnormal brain masses. SV 1H-MRS provides useful biochemical information about the metabolic state of tumours and can be performed at short (< 45 ms) or long (> 45 ms) echo time (TE), each with particular advantages. Short-TE spectra are more adequate for detecting lipids, while the long-TE provides a much flatter signal baseline in between peaks but also negative signals for metabolites such as lactate. Both, lipids and lactate, are respectively indicative of specific metabolic processes taking place. Ideally, the information provided by both TE should be of use for clinical purposes. In this study, we characterise the performance of a range of Non-negative Matrix Factorisation (NMF) methods in two respects: first, to derive sources correlated with the mean spectra of known tissue types (tumours and normal tissue); second, taking the best performing NMF method for source separation, we compare its accuracy for class assignment when using the mixing matrix directly as a basis for classification, as against using the method for dimensionality reduction (DR). For this, we used SV 1H-MRS data with positive and negative peaks, from a widely tested SV 1H-MRS human brain tumour database. Results The results reported in this paper reveal the advantage of using a recently described variant of NMF, namely Convex-NMF, as an unsupervised method of source extraction from SV1H-MRS. Most of the sources extracted in our experiments closely correspond to the mean spectra of some of the analysed tumour types. This similarity allows accurate diagnostic predictions to be made both in fully unsupervised mode and using Convex-NMF as a DR step previous to standard supervised classification. 
The obtained results are comparable to, or more accurate than those obtained with supervised techniques. Conclusions The unsupervised properties of Convex-NMF place this approach one step ahead of classical label-requiring supervised methods for the discrimination of brain tumour types, as it accounts for their increasingly recognised molecular subtype heterogeneity. The application of Convex-NMF in computer assisted decision support systems is expected to facilitate further improvements in the uptake of MRS-derived information by clinicians. PMID:22401579
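To illustrate the source-extraction idea behind the record above, here is a minimal sketch of plain NMF with Lee-Seung multiplicative updates. It is not the Convex-NMF variant the abstract evaluates: plain NMF needs non-negative input, whereas the abstract's data contain positive and negative peaks. The toy "spectra", the function name `nmf` and all parameter values are assumptions made for illustration only.

```python
import numpy as np

def nmf(V, rank, n_iter=500, seed=0, eps=1e-9):
    """Plain NMF via multiplicative updates: V ~ W @ H with W, H >= 0.

    Illustrative only; this is NOT Convex-NMF and requires V >= 0.
    """
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + eps   # mixing matrix (one row per spectrum)
    H = rng.random((rank, m)) + eps   # sources (one row per source)
    for _ in range(n_iter):
        # Each update keeps the factors non-negative by construction.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy data: 20 non-negative "spectra" mixed from 2 hidden sources.
rng = np.random.default_rng(1)
true_sources = rng.random((2, 50))
true_mixing = rng.random((20, 2))
V = true_mixing @ true_sources

W, H = nmf(V, rank=2)
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(f"relative reconstruction error: {rel_err:.4f}")
```

In this sketch the rows of `H` play the role of extracted sources, and the rows of `W` are the mixing coefficients that could feed a classifier or serve as a reduced representation.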
Parametric symplectic partitioned Runge-Kutta methods with energy-preserving properties for Hamiltonian systems

NASA Astrophysics Data System (ADS)

Wang, Dongling; Xiao, Aiguo; Li, Xueyang

2013-02-01

Based on W-transformation, some parametric symplectic partitioned Runge-Kutta (PRK) methods depending on a real parameter α are developed. For α=0, the corresponding methods become the usual PRK methods, including Radau IA-IA¯ and Lobatto IIIA-IIIB methods as examples. For any α≠0, the corresponding methods are symplectic and there exists a value α∗ such that energy is preserved in the numerical solution at each step. The existence of the parameter and the order of the numerical methods are discussed. Some numerical examples are presented to illustrate these results.
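As a small illustration of a symplectic partitioned scheme, the sketch below uses the Störmer-Verlet method, which is the two-stage Lobatto IIIA-IIIB pair. It is not one of the parametric methods of the record above and has no parameter α. The harmonic-oscillator test problem, step size and function name are assumptions. The energy error stays bounded over many steps but is not zero, which is the gap an energy-preserving parameter choice is meant to close.

```python
import numpy as np

def stormer_verlet(q, p, h, n_steps, grad_V):
    """Stormer-Verlet for a separable Hamiltonian H(q, p) = p**2 / 2 + V(q)."""
    qs, ps = [q], [p]
    for _ in range(n_steps):
        p_half = p - 0.5 * h * grad_V(q)   # half kick with the old position
        q = q + h * p_half                 # full drift
        p = p_half - 0.5 * h * grad_V(q)   # half kick with the new position
        qs.append(q)
        ps.append(p)
    return np.array(qs), np.array(ps)

# Harmonic oscillator V(q) = q**2 / 2, so grad_V(q) = q (assumed test problem).
qs, ps = stormer_verlet(q=1.0, p=0.0, h=0.1, n_steps=10_000, grad_V=lambda q: q)
energy = 0.5 * ps**2 + 0.5 * qs**2
max_drift = np.max(np.abs(energy - energy[0]))
print(f"max |E - E0| over 10000 steps: {max_drift:.2e}")
```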
Weak-value amplification and optimal parameter estimation in the presence of correlated noise

NASA Astrophysics Data System (ADS)

Sinclair, Josiah; Hallaji, Matin; Steinberg, Aephraim M.; Tollaksen, Jeff; Jordan, Andrew N.

2017-11-01

We analytically and numerically investigate the performance of weak-value amplification (WVA) and related parameter estimation methods in the presence of temporally correlated noise. WVA is a special instance of a general measurement strategy that involves sorting data into separate subsets based on the outcome of a second "partitioning" measurement. Using a simplified correlated noise model that can be analyzed exactly together with optimal statistical estimators, we compare WVA to a conventional measurement method. We find that WVA indeed yields a much lower variance of the parameter of interest than the conventional technique does, optimized in the absence of any partitioning measurements. In contrast, a statistically optimal analysis that employs partitioning measurements, incorporating all partitioned results and their known correlations, is found to yield an improvement—typically slight—over the noise reduction achieved by WVA. This result occurs because the simple WVA technique is not tailored to any specific noise environment and therefore does not make use of correlations between the different partitions. We also compare WVA to traditional background subtraction, a familiar technique where measurement outcomes are partitioned to eliminate unknown offsets or errors in calibration. Surprisingly, for the cases we consider, background subtraction turns out to be a special case of the optimal partitioning approach, possessing a similar typically slight advantage over WVA. These results give deeper insight into the role of partitioning measurements (with or without postselection) in enhancing measurement precision, which some have found puzzling. They also resolve previously made conflicting claims about the usefulness of weak-value amplification to precision measurement in the presence of correlated noise. 
We finish by presenting numerical results to model a more realistic laboratory situation of time-decaying correlations, showing that our conclusions hold for a wide range of statistical models.
Pelvic floor muscle exercises utilizing trunk stabilization for treating postpartum urinary incontinence: randomized controlled pilot trial of supervised versus unsupervised training.

PubMed

Kim, Eun-Young; Kim, Suhn-Yeop; Oh, Duck-Won

2012-02-01

To investigate the effect of supervised and unsupervised pelvic floor muscle exercises utilizing trunk stabilization for treating postpartum urinary incontinence and to compare the outcomes. Randomized, single-blind controlled study. Outpatient rehabilitation hospital. Eighteen subjects with postpartum urinary incontinence. Subjects were randomized to either a supervised training group with verbal instruction from a physiotherapist, or an unsupervised training group after undergoing a supervised demonstration session. Bristol Female Lower Urinary Tract Symptom questionnaire (urinary symptoms and quality of life) and vaginal function test (maximal vaginal squeeze pressure and holding time) using a perineometer. The change values for urinary symptoms (-27.22 ± 6.20 versus -18.22 ± 5.49), quality of life (-5.33 ± 2.96 versus -1.78 ± 3.93), total score (-32.56 ± 8.17 versus -20.00 ± 6.67), maximal vaginal squeeze pressure (18.96 ± 9.08 versus 2.67 ± 3.64 mmHg), and holding time (11.32 ± 3.17 versus 5.72 ± 2.29 seconds) were more improved in the supervised group than in the unsupervised group (P < 0.05). In the supervised group, significant differences were found for all variables between pre- and post-test values (P < 0.01), whereas the unsupervised group showed significant differences for urinary symptom score, total score and holding time between the pre- and post-test results (P < 0.05). These findings suggest that exercising the pelvic floor muscles by utilizing trunk stabilization under physiotherapist supervision may be beneficial for the management of postpartum urinary incontinence.

Detection of Tree Crowns Based on Reclassification Using Aerial Images and LIDAR Data

NASA Astrophysics Data System (ADS)

Talebi, S.; Zarea, A.; Sadeghian, S.; Arefi, H.

2013-09-01

Tree detection using aerial sensors has been a focus of many researchers over past decades in fields including remote sensing and photogrammetry. This paper aims to detect trees in complex city areas using aerial imagery and laser scanning data. Our methodology is a hierarchical unsupervised method consisting of some primitive operations. The method can be divided into three sections: the first section uses aerial imagery, and the second and third sections use laser scanner data. In the first section, a vegetation cover mask is created in both sunny and shadowed areas. In the second section, the Rate of Slope Change (RSC) is used to eliminate grasses. In the third section, a Digital Terrain Model (DTM) is obtained from LiDAR data. From the DTM and the Digital Surface Model (DSM) we obtain the Normalized Digital Surface Model (nDSM). Objects lower than a specific height are then eliminated. This yields three result layers, one from each section. Finally, a multiplication operation is used to obtain the final result layer, which is smoothed by morphological operations. The result layer was sent to WG III/4 for evaluation. The evaluation shows that our method ranks well compared with other participants' methods in ISPRS WG III/4 when assessed in terms of 5 indices: area-based completeness, area-based correctness, object-based completeness, object-based correctness and boundary RMS. Being unsupervised and automatic, the method can be improved and could be integrated with other methods to obtain the best results.
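The layer combination described in the record above can be sketched roughly as follows, assuming the three intermediate layers are already available as co-registered arrays. The function name, the argument names and the 2 m height threshold are hypothetical, and the final morphological smoothing step is omitted.

```python
import numpy as np

def tree_mask(vegetation, not_grass, dsm, dtm, min_height=2.0):
    """Combine three binary layers by element-wise multiplication.

    vegetation : 1 where the imagery-based vegetation mask is set
    not_grass  : 1 where the slope-change test did NOT flag grass
    dsm, dtm   : surface and terrain heights in metres
    min_height : hypothetical height cutoff, not taken from the abstract
    """
    ndsm = dsm - dtm                              # normalized DSM: height above ground
    tall = (ndsm >= min_height).astype(np.uint8)  # drop objects below the cutoff
    return vegetation.astype(np.uint8) * not_grass.astype(np.uint8) * tall

vegetation = np.array([[1, 1], [0, 1]])
not_grass = np.array([[1, 0], [1, 1]])
dsm = np.array([[10.0, 10.0], [10.0, 3.0]])
dtm = np.array([[2.0, 2.0], [2.0, 2.0]])

mask = tree_mask(vegetation, not_grass, dsm, dtm)
print(mask)
```

Only the top-left pixel passes all three tests in this toy input, so it is the only pixel set in the product.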
A novel method to augment extraction of mangiferin by application of microwave on three phase partitioning.

PubMed

Kulkarni, Vrushali M; Rathod, Virendra K

2015-06-01

This work reports a novel approach where three phase partitioning (TPP) was combined with microwave for extraction of mangiferin from leaves of Mangifera indica. Soxhlet extraction was used as reference method, which yielded 57 mg/g in 5 h. Under optimal conditions such as microwave irradiation time 5 min, ammonium sulphate concentration 40% w/v, power 272 W, solute to solvent ratio 1:20, slurry to t-butanol ratio 1:1, soaking time 5 min and duty cycle 50%, the mangiferin yield obtained was 54 mg/g by microwave assisted three phase partitioning extraction (MTPP). Thus the extraction method developed resulted in higher extraction yield in a shorter span, thereby making it an interesting alternative prior to down-stream processing.
A Novel Feature Extraction Method for Monitoring (Vehicular) Fuel Storage System Leaks

DTIC Science & Technology

2014-10-02

gives a continuous output of the DPDF with predefined partitions. Resolution of a DPDF is dependent on pre-determined signal range and number of... partitions within that range. Conceptually, proposed implementation is identical to the creation of a histogram with a moving data window given some...window. The crisp partitions within specified signal range act as “competing and possible” scenarios or alternatives where we impose a “winner takes all
The Refinement-Tree Partition for Parallel Solution of Partial Differential Equations

PubMed Central

Mitchell, William F.

1998-01-01

Dynamic load balancing is considered in the context of adaptive multilevel methods for partial differential equations on distributed memory multiprocessors. An approach that periodically repartitions the grid is taken. The important properties of a partitioning algorithm are presented and discussed in this context. A partitioning algorithm based on the refinement tree of the adaptive grid is presented and analyzed in terms of these properties. Theoretical and numerical results are given. PMID:28009355
A New Approach to Parallel Dynamic Partitioning for Adaptive Unstructured Meshes

NASA Technical Reports Server (NTRS)

Heber, Gerd; Biswas, Rupak; Gao, Guang R.

1999-01-01

Classical mesh partitioning algorithms were designed for rather static situations, and their straightforward application in a dynamical framework may lead to unsatisfactory results, e.g., excessive data migration among processors. Furthermore, special attention should be paid to their amenability to parallelization. In this paper, a novel parallel method for the dynamic partitioning of adaptive unstructured meshes is described. It is based on a linear representation of the mesh using self-avoiding walks.
Systems and methods to control multiple peripherals with a single-peripheral application code

DOEpatents

Ransom, Ray M.

2013-06-11

Methods and apparatus are provided for enhancing the BIOS of a hardware peripheral device to manage multiple peripheral devices simultaneously without modifying the application software of the peripheral device. The apparatus comprises a logic control unit and a memory in communication with the logic control unit. The memory is partitioned into a plurality of ranges, each range comprising one or more blocks of memory, one range being associated with each instance of the peripheral application and one range being reserved for storage of a data pointer related to each peripheral application of the plurality. The logic control unit is configured to operate multiple instances of the control application by duplicating one instance of the peripheral application for each peripheral device of the plurality and partitioning a memory device into partitions comprising one or more blocks of memory, one partition being associated with each instance of the peripheral application. The method then reserves a range of memory addresses for storage of a data pointer related to each peripheral device of the plurality, and initializes each of the plurality of peripheral devices.
Tensor Spectral Clustering for Partitioning Higher-order Network Structures

PubMed Central

Benson, Austin R.; Gleich, David F.; Leskovec, Jure

2016-01-01

Spectral graph theory-based methods represent an important class of tools for studying the structure of networks. Spectral methods are based on a first-order Markov chain derived from a random walk on the graph and thus they cannot take advantage of important higher-order network substructures such as triangles, cycles, and feed-forward loops. Here we propose a Tensor Spectral Clustering (TSC) algorithm that allows for modeling higher-order network structures in a graph partitioning framework. Our TSC algorithm allows the user to specify which higher-order network structures (cycles, feed-forward loops, etc.) should be preserved by the network clustering. Higher-order network structures of interest are represented using a tensor, which we then partition by developing a multilinear spectral method. Our framework can be applied to discovering layered flows in networks as well as graph anomaly detection, which we illustrate on synthetic networks. In directed networks, a higher-order structure of particular interest is the directed 3-cycle, which captures feedback loops in networks. We demonstrate that our TSC algorithm produces large partitions that cut fewer directed 3-cycles than standard spectral clustering algorithms. PMID:27812399
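For context, the standard first-order spectral partitioning that the record above contrasts with TSC can be sketched as a bisection on the sign of the Fiedler vector. This is not the tensor method itself. The use of the unnormalized Laplacian, the function name and the toy graph (two 4-node cliques joined by one edge) are assumptions made for illustration.

```python
import numpy as np

def spectral_bisection(A):
    """Split an undirected graph in two using the sign of the Fiedler vector.

    A is a symmetric adjacency matrix; the unnormalized Laplacian L = D - A
    is used here for simplicity.
    """
    degrees = A.sum(axis=1)
    L = np.diag(degrees) - A
    _, eigvecs = np.linalg.eigh(L)   # eigenvalues are returned in ascending order
    fiedler = eigvecs[:, 1]          # eigenvector of the second-smallest eigenvalue
    return fiedler >= 0

# Toy graph: nodes 0-3 and nodes 4-7 form cliques, linked by the edge (3, 4).
A = np.zeros((8, 8))
A[:4, :4] = 1
A[4:, 4:] = 1
np.fill_diagonal(A, 0)
A[3, 4] = A[4, 3] = 1

labels = spectral_bisection(A)
print(labels)
```

The overall sign of an eigenvector is arbitrary, so which clique is labelled `True` can differ between runs or platforms; only the grouping is meaningful.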
Hanging out with Which Friends? Friendship-Level Predictors of Unstructured and Unsupervised Socializing in Adolescence

ERIC Educational Resources Information Center

Siennick, Sonja E.; Osgood, D. Wayne

2012-01-01

Companions are central to explanations of the risky nature of unstructured and unsupervised socializing, yet we know little about whom adolescents are with when hanging out. We examine predictors of how often friendship dyads hang out via multilevel analyses of longitudinal friendship-level data on over 5,000 middle schoolers. Adolescents hang out…
Raster Data Partitioning for Supporting Distributed GIS Processing

NASA Astrophysics Data System (ADS)

Nguyen Thai, B.; Olasz, A.

2015-08-01

The big data concept has already had an impact on the geospatial sector. Several studies apply techniques that originated in computer science to the GIS processing of huge amounts of geospatial data. In other studies, geospatial data is regarded as having always been big data (Lee and Kang, 2015). Nevertheless, data acquisition methods have improved substantially, not only in the amount but also in the spectral, spatial and temporal resolution of raw data. A significant portion of big data is geospatial data, and the size of such data is growing rapidly, by at least 20% every year (Dasgupta, 2013). Of the increasing volume of raw data produced in different formats and representations and for different purposes, only the wealth of information derived from these data sets represents valuable results. However, computing capability and processing speed run into limitations, even when semi-automatic or automatic procedures are applied to complex geospatial data (Kristóf et al., 2014). Recently, distributed computing has reached many interdisciplinary areas of computer science, including remote sensing and geographic information processing. Cloud computing requires, even more, appropriate processing algorithms that can be distributed and can handle geospatial big data. The Map-Reduce programming model and distributed file systems have proven their capabilities for processing non-GIS big data. But it is sometimes inconvenient or inefficient to rewrite existing algorithms for the Map-Reduce programming model, and GIS data cannot be partitioned by line or by byte as text-based data can. Hence, we would like to find an alternative solution for data partitioning, data distribution and execution of existing algorithms without rewriting or with only minor modifications. This paper focuses on a technical overview of currently available distributed computing environments, as well as GIS data (raster data) partitioning, distribution and distributed processing of GIS algorithms. 
A proof-of-concept implementation has been made for raster data partitioning, distribution and processing. The first performance results have been compared against the commercial software ERDAS IMAGINE 2011 and 2014. Partitioning methods depend heavily on the application area; therefore, data partitioning may be considered a preprocessing step before processing services are applied to the data. As a proof of concept we have implemented a simple tile-based partitioning method that splits an image into smaller grids (NxM tiles) and compared its processing time with that of existing methods for NDVI calculation. The concept is demonstrated using our own open-source processing framework.
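A minimal sketch of the tile-based idea in the record above: split a raster into an N x M grid, compute NDVI = (NIR - Red) / (NIR + Red) per tile, and reassemble. The function names and the in-process loop are assumptions. The authors' framework and its distribution layer are not shown, and each tile here would correspond to one unit of work that could be sent to a separate worker.

```python
import numpy as np

def split_tiles(shape, n_rows, n_cols):
    """Yield (row_slice, col_slice) pairs covering a raster with n_rows x n_cols tiles."""
    r_edges = np.linspace(0, shape[0], n_rows + 1, dtype=int)
    c_edges = np.linspace(0, shape[1], n_cols + 1, dtype=int)
    for r0, r1 in zip(r_edges[:-1], r_edges[1:]):
        for c0, c1 in zip(c_edges[:-1], c_edges[1:]):
            yield slice(r0, r1), slice(c0, c1)

def ndvi(red, nir):
    """NDVI = (NIR - Red) / (NIR + Red); pixels with a zero denominator are set to 0."""
    red = red.astype(np.float64)
    nir = nir.astype(np.float64)
    denom = nir + red
    out = np.zeros_like(denom)
    np.divide(nir - red, denom, out=out, where=denom != 0)
    return out

def ndvi_tiled(red, nir, n_rows, n_cols):
    """Compute NDVI tile by tile and write each result back into place."""
    out = np.empty(red.shape, dtype=np.float64)
    for rs, cs in split_tiles(red.shape, n_rows, n_cols):
        out[rs, cs] = ndvi(red[rs, cs], nir[rs, cs])
    return out

rng = np.random.default_rng(0)
red = rng.integers(0, 1000, size=(7, 10), dtype=np.uint16)
nir = rng.integers(0, 1000, size=(7, 10), dtype=np.uint16)
print(np.allclose(ndvi_tiled(red, nir, 3, 4), ndvi(red, nir)))
```

NDVI is a per-pixel operation, so tiles need no overlap. Neighbourhood operations such as filters would need a halo of border pixels around each tile, which this sketch does not handle.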
Learning representation hierarchies by sharing visual features: a computational investigation of Persian character recognition with unsupervised deep learning.

PubMed

Sadeghi, Zahra; Testolin, Alberto

2017-08-01

In humans, efficient recognition of written symbols is thought to rely on a hierarchical processing system, where simple features are progressively combined into more abstract, high-level representations. Here, we present a computational model of Persian character recognition based on deep belief networks, where increasingly more complex visual features emerge in a completely unsupervised manner by fitting a hierarchical generative model to the sensory data. Crucially, high-level internal representations emerging from unsupervised deep learning can be easily read out by a linear classifier, achieving state-of-the-art recognition accuracy. Furthermore, we tested the hypothesis that handwritten digits and letters share many common visual features: A generative model that captures the statistical structure of the letters distribution should therefore also support the recognition of written digits. To this aim, deep networks trained on Persian letters were used to build high-level representations of Persian digits, which were indeed read out with high accuracy. Our simulations show that complex visual features, such as those mediating the identification of Persian symbols, can emerge from unsupervised learning in multilayered neural networks and can support knowledge transfer across related domains.
A Comparative Evaluation of Unsupervised Anomaly Detection Algorithms for Multivariate Data.

PubMed

Goldstein, Markus; Uchida, Seiichi

2016-01-01

Anomaly detection is the process of identifying unexpected items or events in datasets, which differ from the norm. In contrast to standard classification tasks, anomaly detection is often applied on unlabeled data, taking only the internal structure of the dataset into account. This challenge is known as unsupervised anomaly detection and is addressed in many practical applications, for example in network intrusion detection, fraud detection as well as in the life science and medical domain. Dozens of algorithms have been proposed in this area, but unfortunately the research community still lacks a comparative universal evaluation as well as common publicly available datasets. These shortcomings are addressed in this study, where 19 different unsupervised anomaly detection algorithms are evaluated on 10 different datasets from multiple application domains. By publishing the source code and the datasets, this paper aims to be a new well-founded basis for unsupervised anomaly detection research. Additionally, this evaluation reveals the strengths and weaknesses of the different approaches for the first time. Besides the anomaly detection performance, computational effort, the impact of parameter settings as well as the global/local anomaly detection behavior are outlined. In conclusion, we give advice on algorithm selection for typical real-world tasks.
Unsupervised categorization with individuals diagnosed as having moderate traumatic brain injury: Over-selective responding.

PubMed

Edwards, Darren J; Wood, Rodger

2016-01-01

This study explored over-selectivity (executive dysfunction) using a standard unsupervised categorization task. Over-selectivity has been demonstrated using supervised categorization procedures (where training is given); however, little has been done in the way of unsupervised categorization (without training). A standard unsupervised categorization task was used to assess levels of over-selectivity in a traumatic brain injury (TBI) population. Individuals with TBI were selected from the Tertiary Traumatic Brain Injury Clinic at Swansea University and were asked to categorize two-dimensional items (pictures on cards), into groups that they felt were most intuitive, and without any learning (feedback from experimenter). This was compared against categories made by a control group for the same task. The findings of this study demonstrate that individuals with TBI had deficits for both easy and difficult categorization sets, as indicated by a larger amount of one-dimensional sorting compared to control participants. Deficits were significantly greater for the easy condition. The implications of these findings are discussed in the context of over-selectivity, and the processes that underlie this deficit. Also, the implications for using this procedure as a screening measure for over-selectivity in TBI are discussed.
Effects of Supervised vs. Unsupervised Training Programs on Balance and Muscle Strength in Older Adults: A Systematic Review and Meta-Analysis.

PubMed

Lacroix, André; Hortobágyi, Tibor; Beurskens, Rainer; Granacher, Urs

2017-11-01

Balance and resistance training can improve healthy older adults' balance and muscle strength. Delivering such exercise programs at home without supervision may facilitate participation for older adults because they do not have to leave their homes. To date, no systematic literature analysis has been conducted to determine if supervision affects the effectiveness of these programs to improve healthy older adults' balance and muscle strength/power. The objective of this systematic review and meta-analysis was to quantify the effectiveness of supervised vs. unsupervised balance and/or resistance training programs on measures of balance and muscle strength/power in healthy older adults. In addition, the impact of supervision on training-induced adaptive processes was evaluated in the form of dose-response relationships by analyzing randomized controlled trials that compared supervised with unsupervised trials. A computerized systematic literature search was performed in the electronic databases PubMed, Web of Science, and SportDiscus to detect articles examining the role of supervision in balance and/or resistance training in older adults. The initially identified 6041 articles were systematically screened. Studies were included if they examined balance and/or resistance training in adults aged ≥65 years with no relevant diseases and registered at least one behavioral balance (e.g., time during single leg stance) and/or muscle strength/power outcome (e.g., time for 5-Times-Chair-Rise-Test). Finally, 11 studies were eligible for inclusion in this meta-analysis. Weighted mean standardized mean differences between subjects (SMDbs) of supervised vs. unsupervised balance/resistance training studies were calculated. The included studies were coded for the following variables: number of participants, sex, age, number and type of interventions, type of balance/strength tests, and change (%) from pre- to post-intervention values. 
Additionally, we coded training according to the following modalities: period, frequency, volume, modalities of supervision (i.e., number of supervised/unsupervised sessions within the supervised or unsupervised training groups, respectively). Heterogeneity was computed using I² and χ² statistics. The methodological quality of the included studies was evaluated using the Physiotherapy Evidence Database scale. Our analyses revealed that in older adults, supervised balance/resistance training was superior compared with unsupervised balance/resistance training in improving measures of static steady-state balance (mean SMDbs = 0.28, p = 0.39), dynamic steady-state balance (mean SMDbs = 0.35, p = 0.02), proactive balance (mean SMDbs = 0.24, p = 0.05), balance test batteries (mean SMDbs = 0.53, p = 0.02), and measures of muscle strength/power (mean SMDbs = 0.51, p = 0.04). Regarding the examined dose-response relationships, our analyses showed that a number of 10-29 additional supervised sessions in the supervised training groups compared with the unsupervised training groups resulted in the largest effects for static steady-state balance (mean SMDbs = 0.35), dynamic steady-state balance (mean SMDbs = 0.37), and muscle strength/power (mean SMDbs = 1.12). Further, ≥30 additional supervised sessions in the supervised training groups were needed to produce the largest effects on proactive balance (mean SMDbs = 0.30) and balance test batteries (mean SMDbs = 0.77). Effects in favor of supervised programs were larger for studies that did not include any supervised sessions in their unsupervised programs (mean SMDbs: 0.28-1.24) compared with studies that implemented a few supervised sessions in their unsupervised programs (e.g., three supervised sessions throughout the entire intervention program; SMDbs: -0.06 to 0.41). 
The present findings have to be interpreted with caution because of the low number of eligible studies and the moderate methodological quality of the included studies, which is indicated by a median Physiotherapy Evidence Database scale score of 5. Furthermore, we indirectly compared dose-response relationships across studies and not from single controlled studies. Our analyses suggest that supervised balance and/or resistance training improved measures of balance and muscle strength/power to a greater extent than unsupervised programs in older adults. Owing to the small number of available studies, we were unable to establish a clear dose-response relationship with regard to the impact of supervision. However, the positive effects of supervised training are particularly prominent when compared with completely unsupervised training programs. It is therefore recommended to include supervised sessions (i.e., two out of three sessions/week) in balance/resistance training programs to effectively improve balance and muscle strength/power in older adults.
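For readers unfamiliar with the I² heterogeneity statistic named in the record above, a small sketch of its usual definition from Cochran's Q is I² = 100% × max(0, (Q − df) / Q), with df = k − 1 for k studies. The function name and the Q values below are made up for illustration and are not results from the meta-analysis; only the study count of 11 comes from the abstract.

```python
def i_squared(q, n_studies):
    """Higgins' I^2 (in percent) from Cochran's Q and the number of studies."""
    df = n_studies - 1
    if q <= 0:
        return 0.0
    # Negative values are truncated to zero by convention.
    return max(0.0, (q - df) / q) * 100.0

# Hypothetical Q values for a meta-analysis of 11 studies (df = 10).
print(i_squared(20.0, 11))  # Q twice the degrees of freedom
print(i_squared(5.0, 11))   # Q below the degrees of freedom
```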
An Integrated approach to the Space Situational Awareness Problem

DTIC Science & Technology

2016-12-15

data coming from the sensors. We developed particle-based Gaussian Mixture Filters that are immune to the “curse of dimensionality”/“particle...depletion” problem inherent in particle filtering. This method maps the data assimilation/filtering problem into an unsupervised learning problem. Results...Gaussian Mixture Filters; particle depletion; Finite Set Statistics
Metrics for Systems Thinking in the Human Dimension

DTIC Science & Technology

2016-11-01

corpora of documents. 2 Methodology Overview We present a human-in-the-loop methodology that assists researchers and analysts by characterizing...supervised learning methods. Building on this foundation, we present an unsupervised, human-in-the-loop methodology that utilizes topic models to...the definition of strong systems thinking and in the interpretation of topics, but this is what makes the human-in-the-loop methodology so effective
Simultaneously Discovering and Localizing Common Objects in Wild Images.

PubMed

Wang, Zhenzhen; Yuan, Junsong

2018-09-01

Motivated by the recent success of supervised and weakly supervised common object discovery, in this paper, we move forward one step further to tackle common object discovery in a fully unsupervised way. Generally, object co-localization aims at simultaneously localizing objects of the same class across a group of images. Traditional object localization/detection usually trains specific object detectors which require bounding box annotations of object instances, or at least image-level labels to indicate the presence/absence of objects in an image. Given a collection of images without any annotations, our proposed fully unsupervised method is to simultaneously discover images that contain common objects and also localize common objects in corresponding images. Without requiring to know the total number of common objects, we formulate this unsupervised object discovery as a sub-graph mining problem from a weighted graph of object proposals, where nodes correspond to object proposals, and edges represent the similarities between neighbouring proposals. The positive images and common objects are jointly discovered by finding sub-graphs of strongly connected nodes, with each sub-graph capturing one object pattern. The optimization problem can be efficiently solved by our proposed maximal-flow-based algorithm. Instead of assuming that each image contains only one common object, our proposed solution can better address wild images where each image may contain multiple common objects or even no common object. Moreover, our proposed method can be easily tailored to the task of image retrieval in which the nodes correspond to the similarity between query and reference images. Extensive experiments on PASCAL VOC 2007 and Object Discovery data sets demonstrate that even without any supervision, our approach can discover/localize common objects of various classes in the presence of scale, view point, appearance variation, and partial occlusions. 
We also conduct broad experiments on image retrieval benchmarks, Holidays and Oxford5k data sets, to show that our proposed method, which considers both the similarity between query and reference images and also similarities among reference images, can help to improve the retrieval results significantly.
Mapping of rock types using a joint approach by combining the multivariate statistics, self-organizing map and Bayesian neural networks: an example from IODP 323 site

NASA Astrophysics Data System (ADS)

Karmakar, Mampi; Maiti, Saumen; Singh, Amrita; Ojha, Maheswar; Maity, Bhabani Sankar

2017-07-01

Modeling and classification of the subsurface lithology is very important to understand the evolution of the earth system. However, precise classification and mapping of lithology using a single framework are difficult due to the complexity and the nonlinearity of the problem driven by limited core sample information. Here, we implement a joint approach by combining the unsupervised and the supervised methods in a single framework for better classification and mapping of rock types. In the unsupervised method, we use the principal component analysis (PCA), K-means cluster analysis (K-means), dendrogram analysis, Fuzzy C-means (FCM) cluster analysis and self-organizing map (SOM). In the supervised method, we use the Bayesian neural networks (BNN) optimized by the Hybrid Monte Carlo (HMC) (BNN-HMC) and the scaled conjugate gradient (SCG) (BNN-SCG) techniques. We use P-wave velocity, density, neutron porosity, resistivity and gamma ray logs of the well U1343E of the Integrated Ocean Drilling Program (IODP) Expedition 323 in the Bering Sea slope region. While the SOM algorithm allows us to visualize the clustering results in spatial domain, the combined classification schemes (supervised and unsupervised) uncover the different patterns of lithology such as clayey-silt, diatom-silt and silty-clay from an un-cored section of the drilled hole. In addition, the BNN approach is capable of estimating uncertainty in the predictive modeling of three types of rocks over the entire lithology section at site U1343. Alternate succession of clayey-silt, diatom-silt and silty-clay may be representative of crustal inhomogeneity in general and thus could be a basis for detailed study related to the productivity of methane gas in the oceans worldwide. Moreover, at 530 m depth below seafloor (DSF), the transition from Pliocene to Pleistocene could be linked to lithological alternation between the clayey-silt and the diatom-silt. 
The present results could provide the basis for a detailed study to gain deeper insight into the Bering Sea's sediment deposition and sequence.
Machine learning in APOGEE. Unsupervised spectral classification with K-means

NASA Astrophysics Data System (ADS)

Garcia-Dias, Rafael; Allende Prieto, Carlos; Sánchez Almeida, Jorge; Ordovás-Pascual, Ignacio

2018-05-01

Context. The volume of data generated by astronomical surveys is growing rapidly. Traditional analysis techniques in spectroscopy either demand intensive human interaction or are computationally expensive. In this scenario, machine learning, and unsupervised clustering algorithms in particular, offer interesting alternatives. The Apache Point Observatory Galactic Evolution Experiment (APOGEE) offers a vast data set of near-infrared stellar spectra, which is perfect for testing such alternatives. Aims: Our research applies an unsupervised classification scheme based on K-means to the massive APOGEE data set. We explore whether the data are amenable to classification into discrete classes. Methods: We apply the K-means algorithm to 153 847 high resolution spectra (R ≈ 22 500). We discuss the main virtues and weaknesses of the algorithm, as well as our choice of parameters. Results: We show that a classification based on normalised spectra captures the variations in stellar atmospheric parameters, chemical abundances, and rotational velocity, among other factors. The algorithm is able to separate the bulge and halo populations, and distinguish dwarfs, sub-giants, RC, and RGB stars. However, a discrete classification in flux space does not result in a neat organisation in the parameters' space. Furthermore, the lack of obvious groups in flux space causes the results to be fairly sensitive to the initialisation, and disrupts the efficiency of commonly-used methods to select the optimal number of clusters. Our classification is publicly available, including extensive online material associated with the APOGEE Data Release 12 (DR12). Conclusions: Our description of the APOGEE database can help greatly with the identification of specific types of targets for various applications. We find a lack of obvious groups in flux space, and identify limitations of the K-means algorithm in dealing with this kind of data. 
Full Tables B.1-B.4 are only available at the CDS via anonymous ftp to http://cdsarc.u-strasbg.fr (http://130.79.128.5) or via http://cdsarc.u-strasbg.fr/viz-bin/qcat?J/A+A/612/A98
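The Methods step of the record above can be illustrated with a minimal sketch of plain K-means (Lloyd's algorithm). The three-pixel toy "spectra", the random seeding and the function name `kmeans` are assumptions made for illustration only; this is not the authors' pipeline, and real APOGEE spectra have thousands of pixels.

```python
import random

def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm. Returns (centroids, labels)."""
    rng = random.Random(seed)
    # Initialise centroids from k randomly chosen points (assumed scheme).
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(range(k),
                key=lambda c: sum((x - m) ** 2 for x, m in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels

# Toy normalised spectra: three with a shallow line, three with a deep line.
spectra = [
    [1.00, 0.90, 1.00], [0.98, 0.92, 1.00], [1.00, 0.88, 0.99],
    [1.00, 0.50, 1.00], [0.99, 0.52, 1.00], [1.00, 0.48, 0.98],
]
centroids, labels = kmeans(spectra, k=2)
```

The toy groups are well separated, so the seed does not matter here. The abstract reports the opposite situation for real flux data: without obvious groups, the result becomes sensitive to the initial centroids.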

Computational phenotype discovery using unsupervised feature learning over noisy, sparse, and irregular clinical data.

PubMed

Lasko, Thomas A; Denny, Joshua C; Levy, Mia A

2013-01-01

Inferring precise phenotypic patterns from population-scale clinical data is a core computational task in the development of precision, personalized medicine. The traditional approach uses supervised learning, in which an expert designates which patterns to look for (by specifying the learning task and the class labels), and where to look for them (by specifying the input variables). While appropriate for individual tasks, this approach scales poorly and misses the patterns that we don't think to look for. Unsupervised feature learning overcomes these limitations by identifying patterns (or features) that collectively form a compact and expressive representation of the source data, with no need for expert input or labeled examples. Its rising popularity is driven by new deep learning methods, which have produced high-profile successes on difficult standardized problems of object recognition in images. Here we introduce its use for phenotype discovery in clinical data. This use is challenging because the largest source of clinical data - Electronic Medical Records - typically contains noisy, sparse, and irregularly timed observations, rendering them poor substrates for deep learning methods. Our approach couples dirty clinical data to deep learning architecture via longitudinal probability densities inferred using Gaussian process regression. From episodic, longitudinal sequences of serum uric acid measurements in 4368 individuals we produced continuous phenotypic features that suggest multiple population subtypes, and that accurately distinguished (0.97 AUC) the uric-acid signatures of gout vs. acute leukemia despite not being optimized for the task. The unsupervised features were as accurate as gold-standard features engineered by an expert with complete knowledge of the domain, the classification task, and the class labels. Our findings demonstrate the potential for achieving computational phenotype discovery at population scale. 
We expect such data-driven phenotypes to expose unknown disease variants and subtypes and to provide rich targets for genetic association studies.
Computational Phenotype Discovery Using Unsupervised Feature Learning over Noisy, Sparse, and Irregular Clinical Data

PubMed Central

Lasko, Thomas A.; Denny, Joshua C.; Levy, Mia A.

2013-01-01

Inferring precise phenotypic patterns from population-scale clinical data is a core computational task in the development of precision, personalized medicine. The traditional approach uses supervised learning, in which an expert designates which patterns to look for (by specifying the learning task and the class labels), and where to look for them (by specifying the input variables). While appropriate for individual tasks, this approach scales poorly and misses the patterns that we don’t think to look for. Unsupervised feature learning overcomes these limitations by identifying patterns (or features) that collectively form a compact and expressive representation of the source data, with no need for expert input or labeled examples. Its rising popularity is driven by new deep learning methods, which have produced high-profile successes on difficult standardized problems of object recognition in images. Here we introduce its use for phenotype discovery in clinical data. This use is challenging because the largest source of clinical data – Electronic Medical Records – typically contains noisy, sparse, and irregularly timed observations, rendering them poor substrates for deep learning methods. Our approach couples dirty clinical data to deep learning architecture via longitudinal probability densities inferred using Gaussian process regression. From episodic, longitudinal sequences of serum uric acid measurements in 4368 individuals we produced continuous phenotypic features that suggest multiple population subtypes, and that accurately distinguished (0.97 AUC) the uric-acid signatures of gout vs. acute leukemia despite not being optimized for the task. The unsupervised features were as accurate as gold-standard features engineered by an expert with complete knowledge of the domain, the classification task, and the class labels. Our findings demonstrate the potential for achieving computational phenotype discovery at population scale. 
We expect such data-driven phenotypes to expose unknown disease variants and subtypes and to provide rich targets for genetic association studies. PMID:23826094
Improving Design Efficiency for Large-Scale Heterogeneous Circuits

NASA Astrophysics Data System (ADS)

Gregerson, Anthony

Despite increases in logic density, many Big Data applications must still be partitioned across multiple computing devices in order to meet their strict performance requirements. Among the most demanding of these applications is high-energy physics (HEP), which uses complex computing systems consisting of thousands of FPGAs and ASICs to process the sensor data created by experiments at particle accelerators such as the Large Hadron Collider (LHC). Designing such computing systems is challenging due to the scale of the systems, the exceptionally high-throughput and low-latency performance constraints that necessitate application-specific hardware implementations, the requirement that algorithms are efficiently partitioned across many devices, and the possible need to update the implemented algorithms during the lifetime of the system. In this work, we describe our research to develop flexible architectures for implementing such large-scale circuits on FPGAs. In particular, this work is motivated by (but not limited in scope to) high-energy physics algorithms for the Compact Muon Solenoid (CMS) experiment at the LHC. To make efficient use of logic resources in multi-FPGA systems, we introduce Multi-Personality Partitioning, a novel form of the graph partitioning problem, and present partitioning algorithms that can significantly improve resource utilization on heterogeneous devices while also reducing inter-chip connections. To reduce the high communication costs of Big Data applications, we also introduce Information-Aware Partitioning, a partitioning method that analyzes the data content of application-specific circuits, characterizes their entropy, and selects circuit partitions that enable efficient compression of data between chips. We employ our information-aware partitioning method to improve the performance of the hardware validation platform for evaluating new algorithms for the CMS experiment. 
Together, these research efforts help to improve the efficiency and decrease the cost of developing the large-scale, heterogeneous circuits needed to enable large-scale applications in high-energy physics and other important areas.
The majority are not performing home-exercises correctly two weeks after their initial instruction: an assessor-blinded study.

PubMed

Faber, Mathilde; Andersen, Malene H; Sevel, Claus; Thorborg, Kristian; Bandholm, Thomas; Rathleff, Michael

2015-01-01

Introduction. Time-under-tension (TUT) reflects time under load during strength training and is a proxy of the total exercise dose during strength training. The purpose of this study was to investigate if young participants are able to reproduce TUT and exercise form after two weeks of unsupervised exercises. Material and Methods. The study was an assessor-blinded intervention study with 29 participants. After an initial instruction, all participants were instructed to perform two weeks of home-based unsupervised shoulder abduction exercises three times per week with an elastic exercise band. The participants were instructed in performing an exercise with a predefined TUT (3 s concentric; 2 s isometric; 3 s eccentric; 2 s break) corresponding to a total of 240 s of TUT during three sets of 10 repetitions. After completing two weeks of unsupervised home exercises, they returned for a follow-up assessment of TUT and exercise form while performing the shoulder abduction exercise. A stretch sensor attached to the elastic band was used to measure TUT at baseline and follow-up. A physiotherapist used a pre-defined clinical observation protocol to determine if participants used the correct exercise form. Results. Fourteen of the 29 participants trained with the instructed TUT at follow-up (predefined target: 240 s ±8%). Thirteen of the 29 participants performed the shoulder abduction exercise with a correct exercise form. Seven of the 29 participants trained with the instructed TUT and exercise form at follow-up. Conclusion. The majority of participants did not use the instructed TUT and exercise form at follow-up after two weeks of unsupervised exercises. 
These findings emphasize the importance of clear and specific home exercise instructions if participants are to follow the given exercise prescription regarding TUT and exercise form, because too many or too few exercise stimuli relative to the initially prescribed amount of exercise will most likely lead to a misinterpretation of the actual effect of any given home exercise intervention.
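The prescribed dose in the record above can be checked with a short worked calculation. The sketch assumes that the 2 s break between repetitions does not count as time under tension, which is the reading under which the stated total of 240 s holds; the variable and function names are hypothetical.

```python
# Prescribed phases per repetition, in seconds (from the abstract).
CONCENTRIC_S, ISOMETRIC_S, ECCENTRIC_S = 3, 2, 3
SETS, REPS_PER_SET = 3, 10
TOLERANCE = 0.08  # predefined target window of +/- 8 %

# Assumption: the 2 s break is rest and is not counted as tension time.
tut_per_rep = CONCENTRIC_S + ISOMETRIC_S + ECCENTRIC_S   # 8 s
target_tut = tut_per_rep * SETS * REPS_PER_SET            # 240 s

lower = target_tut * (1 - TOLERANCE)   # about 220.8 s
upper = target_tut * (1 + TOLERANCE)   # about 259.2 s

def within_target(measured_tut_s):
    """True if a measured total TUT falls inside the +/- 8 % window."""
    return lower <= measured_tut_s <= upper
```

Under these assumptions a participant measured at 230 s would count as training with the instructed TUT, while one measured at 200 s would not.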
Examining unsupervised time with peers and the role of association with delinquent peers on adolescent smoking.

PubMed

Greene, Kathryn; Banerjee, Smita C

2009-04-01

This study explored the association between unsupervised time with peers and adolescent smoking behavior both directly and indirectly through interaction with delinquent peers, social expectancies about cigarette smoking, and cigarette offers from peers. A cross-sectional survey was used for the study and included 248 male and female middle school students. Results of structural equation modeling revealed that unsupervised time with peers is associated indirectly with adolescent smoking behavior through the mediation of association with delinquent peers, social expectancies about cigarette smoking, and cigarette offers from peers. Interventions designed to motivate adolescents without adult supervision to associate more with friends who engage in prosocial activities may eventually reduce adolescent smoking. Further implications for structured supervised time for students outside of school time are discussed.
A comparison of neural network and fuzzy clustering techniques in segmenting magnetic resonance images of the brain

NASA Technical Reports Server (NTRS)

Hall, Lawrence O.; Bensaid, Amine M.; Clarke, Laurence P.; Velthuizen, Robert P.; Silbiger, Martin S.; Bezdek, James C.

1992-01-01

Magnetic resonance (MR) brain section images are segmented and then synthetically colored to give visual representations of the original data with three approaches: the literal and approximate fuzzy c-means unsupervised clustering algorithms and a supervised computational neural network, a dynamic multilayered perceptron trained with the cascade correlation learning algorithm. Initial clinical results are presented on both normal volunteers and selected patients with brain tumors surrounded by edema. Supervised and unsupervised segmentation techniques provide broadly similar results. Unsupervised fuzzy algorithms were visually observed to show better segmentation when compared with raw image data for volunteer studies. However, for a more complex segmentation problem with tumor/edema or cerebrospinal fluid boundary, where the tissues have similar MR relaxation behavior, inconsistency in rating among experts was observed.
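For readers unfamiliar with fuzzy c-means, the standard (literal) update rules can be sketched as follows. This is a textbook formulation on one-dimensional toy intensities, not the implementation used in the study; the function name `fcm_1d`, the fuzzifier default m = 2 and the toy values are assumptions.

```python
def fcm_1d(values, centres, m=2.0, n_iter=100, eps=1e-12):
    """Standard fuzzy c-means on scalar data.

    Membership:  u[k][i] = 1 / sum_j (d_ki / d_kj) ** (2 / (m - 1))
    Centre:      v_i = sum_k u[k][i]**m * x_k / sum_k u[k][i]**m
    """
    centres = list(centres)
    c = len(centres)
    power = 2.0 / (m - 1.0)
    u = []
    for _ in range(n_iter):
        # Membership update; eps avoids division by zero at a centre.
        u = []
        for x in values:
            d = [abs(x - v) + eps for v in centres]
            u.append([1.0 / sum((d[i] / d[j]) ** power for j in range(c))
                      for i in range(c)])
        # Centre update: membership-weighted mean of the data.
        centres = [
            sum(u[k][i] ** m * values[k] for k in range(len(values)))
            / sum(u[k][i] ** m for k in range(len(values)))
            for i in range(c)
        ]
    return centres, u

# Toy pixel intensities from two "tissues" (hypothetical numbers).
intensities = [9.0, 10.0, 11.0, 49.0, 50.0, 51.0]
centres, memberships = fcm_1d(intensities, centres=[0.0, 60.0])
```

Unlike K-means, every pixel keeps a graded membership in every class, which is what makes the method attractive where tissues have similar MR relaxation behavior.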
Unsupervised nonlinear dimensionality reduction machine learning methods applied to multiparametric MRI in cerebral ischemia: preliminary results

NASA Astrophysics Data System (ADS)

Parekh, Vishwa S.; Jacobs, Jeremy R.; Jacobs, Michael A.

2014-03-01

The evaluation and treatment of acute cerebral ischemia requires a technique that can determine the total area of tissue at risk for infarction using diagnostic magnetic resonance imaging (MRI) sequences. Typical MRI data sets consist of T1- and T2-weighted imaging (T1WI, T2WI) along with advanced MRI parameters of diffusion-weighted imaging (DWI) and perfusion weighted imaging (PWI) methods. Each of these parameters has distinct radiological-pathological meaning. For example, DWI interrogates the movement of water in the tissue and PWI gives an estimate of the blood flow; both are critical measures during the evolution of stroke. In order to integrate these data and give an estimate of the tissue at risk or damaged, we have developed advanced machine learning methods based on unsupervised non-linear dimensionality reduction (NLDR) techniques. NLDR methods are a class of algorithms that use mathematically defined manifolds for statistical sampling of multidimensional classes to generate a discrimination rule of guaranteed statistical accuracy, and they can generate a two- or three-dimensional map, which represents the prominent structures of the data and provides an embedded image of meaningful low-dimensional structures hidden in their high-dimensional observations. In this manuscript, we develop NLDR methods on high dimensional MRI data sets of preclinical animals and clinical patients with stroke. On analyzing the performance of these methods, we observed that there was a high degree of similarity between multiparametric embedded images from NLDR methods and the ADC map and perfusion map. It was also observed that the embedded scattergram of abnormal (infarcted or at risk) tissue can be visualized and provides a mechanism for automatic methods to delineate potential stroke volumes and early tissue at risk.
Segmenting Continuous Motions with Hidden Semi-markov Models and Gaussian Processes

PubMed Central

Nakamura, Tomoaki; Nagai, Takayuki; Mochihashi, Daichi; Kobayashi, Ichiro; Asoh, Hideki; Kaneko, Masahide

2017-01-01

Humans divide perceived continuous information into segments to facilitate recognition. For example, humans can segment speech waves into recognizable morphemes. Analogously, continuous motions are segmented into recognizable unit actions. People can divide continuous information into segments without using explicit segment points. This capacity for unsupervised segmentation is also useful for robots, because it enables them to flexibly learn languages, gestures, and actions. In this paper, we propose a Gaussian process-hidden semi-Markov model (GP-HSMM) that can divide continuous time series data into segments in an unsupervised manner. Our proposed method consists of a generative model based on the hidden semi-Markov model (HSMM), the emission distributions of which are Gaussian processes (GPs). Continuous time series data is generated by connecting segments generated by the GP. Segmentation can be achieved by using forward filtering-backward sampling to estimate the model's parameters, including the lengths and classes of the segments. In an experiment using the CMU motion capture dataset, we tested GP-HSMM with motion capture data containing simple exercise motions; the results of this experiment showed that the proposed GP-HSMM was comparable with other methods. We also conducted an experiment using karate motion capture data, which is more complex than exercise motion capture data; in this experiment, the segmentation accuracy of GP-HSMM was 0.92, which outperformed other methods. PMID:29311889
Effect of UV-A and UV-B irradiation on the metabolic profile of aqueous humor in rabbits analyzed by 1H NMR spectroscopy.

PubMed

Tessem, May-Britt; Bathen, Tone F; Cejková, Jitka; Midelfart, Anna

2005-03-01

This study was conducted to investigate metabolic changes in aqueous humor from rabbit eyes exposed to either UV-A or -B radiation, by using (1)H nuclear magnetic resonance (NMR) spectroscopy and unsupervised pattern recognition methods. Both eyes of adult albino rabbits were irradiated with UV-A (366 nm, 0.589 J/cm(2)) or UV-B (312 nm, 1.667 J/cm(2)) radiation for 8 minutes, once a day for 5 days. Three days after the last irradiation, samples of aqueous humor were aspirated, and the metabolic profiles analyzed with (1)H NMR spectroscopy. The metabolic concentrations in the exposed and control materials were statistically analyzed and compared, with multivariate methods and one-way ANOVA. UV-B radiation caused statistically significant alterations of betaine, glucose, ascorbate, valine, isoleucine, and formate in the rabbit aqueous humor. By using principal component analysis, the UV-B-irradiated samples were clearly separated from the UV-A-irradiated samples and the control group. No significant metabolic changes were detected in UV-A-irradiated samples. This study demonstrates the potential of using unsupervised pattern recognition methods to extract valuable metabolic information from complex (1)H NMR spectra. UV-B irradiation of rabbit eyes led to significant metabolic changes in the aqueous humor detected 3 days after the last exposure.
Nonlinear projection methods for visualizing Barcode data and application on two data sets.

PubMed

Olteanu, Madalina; Nicolas, Violaine; Schaeffer, Brigitte; Denys, Christiane; Missoup, Alain-Didier; Kennis, Jan; Larédo, Catherine

2013-11-01

Developing tools for visualizing DNA sequences is an important issue in the Barcoding context. Visualizing Barcode data can be put in a purely statistical context, unsupervised learning. Clustering methods combined with projection methods have two closely linked objectives, visualizing and finding structure in the data. Multidimensional scaling (MDS) and Self-organizing maps (SOM) are unsupervised statistical tools for data visualization. Both algorithms map data onto a lower dimensional manifold: MDS looks for a projection that best preserves pairwise distances while SOM preserves the topology of the data. Both algorithms were initially developed for Euclidean data, and the conditions necessary for their proper use were not satisfied for Barcode data. We developed a workflow consisting of four steps: collapse data into distinct sequences; compute a dissimilarity matrix; run a modified version of SOM for dissimilarity matrices to structure the data and reduce dimensionality; project the results using MDS. This methodology was applied to Astraptes fulgerator and Hylomyscus, an African rodent with debated taxonomy. We obtained very good results for both data sets. The results were robust against unbalanced species. All the species in Astraptes were well displayed in very distinct groups in the various visualizations, except for LOHAMP and FABOV that were mixed up. For Hylomyscus, our findings were consistent with known species, confirmed the existence of four unnamed taxa and suggested the existence of potentially new species. © 2013 John Wiley & Sons Ltd.
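The first two steps of the workflow described above (collapsing to distinct sequences and computing a dissimilarity matrix) can be sketched as follows. The abstract does not say which dissimilarity was used, so the uncorrected p-distance on pre-aligned, equal-length sequences is an assumption, as are the function names and toy reads; the modified SOM and the MDS projection are not shown.

```python
def collapse(sequences):
    """Step 1: keep one copy of each distinct sequence, in input order."""
    return list(dict.fromkeys(sequences))

def p_distance(a, b):
    """Proportion of aligned sites at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)

def dissimilarity_matrix(sequences):
    """Step 2: symmetric matrix of pairwise dissimilarities."""
    n = len(sequences)
    return [[p_distance(sequences[i], sequences[j]) for j in range(n)]
            for i in range(n)]

# Hypothetical aligned barcode fragments; two of them are identical.
reads = ["ACGT", "ACGT", "ACGA", "TCGA"]
distinct = collapse(reads)              # ["ACGT", "ACGA", "TCGA"]
D = dissimilarity_matrix(distinct)
```

A matrix such as `D` is the kind of non-Euclidean input that a dissimilarity-based SOM and MDS can work with directly, without needing coordinates for the sequences.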
Unsupervised Learning for Monaural Source Separation Using Maximization–Minimization Algorithm with Time–Frequency Deconvolution

PubMed Central

Bouridane, Ahmed; Ling, Bingo Wing-Kuen

2018-01-01

This paper presents an unsupervised learning algorithm for sparse nonnegative matrix factor time–frequency deconvolution with optimized fractional β-divergence. The β-divergence is a group of cost functions parametrized by a single parameter β. The Itakura–Saito divergence, Kullback–Leibler divergence and Least Square distance are special cases that correspond to β=0, 1, 2, respectively. This paper presents a generalized algorithm that uses a flexible range of β that includes fractional values. It describes a maximization–minimization (MM) algorithm leading to the development of a fast convergence multiplicative update algorithm with guaranteed convergence. The proposed model operates in the time–frequency domain and decomposes an information-bearing matrix into two-dimensional deconvolution of factor matrices that represent the spectral dictionary and temporal codes. The deconvolution process has been optimized to yield sparse temporal codes through maximizing the likelihood of the observations. The paper also presents a method to estimate the fractional β value. The method is demonstrated on separating audio mixtures recorded from a single channel. The paper shows that the extraction of the spectral dictionary and temporal codes is significantly more efficient by using the proposed algorithm and subsequently leads to better source separation performance. Experimental tests and comparisons with other factorization methods have been conducted to verify its efficacy. PMID:29702629
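The β-divergence family mentioned in the record above can be written out for a pair of positive scalars. The sketch uses the commonly cited definition, under which β = 2 gives half the squared error; the function name is hypothetical and the code is not taken from the paper.

```python
import math

def beta_divergence(x, y, beta):
    """Scalar beta-divergence d_beta(x | y) for x, y > 0.

    beta = 0 -> Itakura-Saito, beta = 1 -> Kullback-Leibler,
    beta = 2 -> half the squared Euclidean distance.
    Fractional values of beta interpolate between these cases.
    """
    if beta == 0:
        ratio = x / y
        return ratio - math.log(ratio) - 1.0
    if beta == 1:
        return x * math.log(x / y) - x + y
    return (x ** beta + (beta - 1.0) * y ** beta
            - beta * x * y ** (beta - 1.0)) / (beta * (beta - 1.0))

# The three named special cases and one fractional value.
d_is = beta_divergence(3.0, 1.0, 0)      # Itakura-Saito
d_kl = beta_divergence(3.0, 1.0, 1)      # Kullback-Leibler
d_ls = beta_divergence(3.0, 1.0, 2)      # (3 - 1)**2 / 2
d_frac = beta_divergence(3.0, 1.0, 0.5)  # a fractional beta
```

In a factorization setting the divergence is summed over all time-frequency bins of the observed and reconstructed matrices; the two branches for β = 0 and β = 1 are the limits of the general expression.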
Application of classification methods for mapping Mercury's surface composition: analysis on Rudaki's Area

NASA Astrophysics Data System (ADS)

Zambon, F.; De Sanctis, M. C.; Capaccioni, F.; Filacchione, G.; Carli, C.; Ammanito, E.; Friggeri, A.

2011-10-01

During the first two MESSENGER flybys (14th January 2008 and 6th October 2008) the Mercury Dual Imaging System (MDIS) extended the coverage of Mercury's surface obtained by Mariner 10, and we now have images of about 90% of the surface [1]. MDIS is equipped with a Narrow Angle Camera (NAC) and a Wide Angle Camera (WAC). The NAC uses an off-axis reflective design with a 1.5° field of view (FOV) centered at 747 nm. The WAC has a refractive design with a 10.5° FOV and 12-position filters that cover a 395-1040 nm spectral range [2]. The color images can be used to infer information on the surface composition, and classification methods are an interesting technique for multispectral image analysis which can be applied to the study of planetary surfaces. Classification methods are based on clustering algorithms and can be divided into two categories: unsupervised and supervised. Unsupervised classifiers do not require analyst feedback; the algorithm automatically organizes pixel values into classes. In the supervised method, instead, the analyst must choose the "training areas" that define the pixel values of a given class [3]. Here we will describe the classification into different compositional units of the region near the Rudaki Crater on Mercury.
Searching Remote Homology with Spectral Clustering with Symmetry in Neighborhood Cluster Kernels

PubMed Central

Maulik, Ujjwal; Sarkar, Anasua

2013-01-01

Remote homology detection among proteins utilizing only the unlabelled sequences is a central problem in comparative genomics. The existing cluster kernel methods based on neighborhoods and profiles and the Markov clustering algorithms are currently the most popular methods for protein family recognition. The deviation from random walks with inflation or dependency on hard threshold in similarity measure in those methods requires an enhancement for homology detection among multi-domain proteins. We propose to combine spectral clustering with neighborhood kernels in Markov similarity for enhancing sensitivity in detecting homology independent of “recent” paralogs. The spectral clustering approach with new combined local alignment kernels more effectively exploits the unsupervised protein sequences globally reducing inter-cluster walks. When combined with the corrections based on modified symmetry based proximity norm deemphasizing outliers, the technique proposed in this article outperforms other state-of-the-art cluster kernels among all twelve implemented kernels. The comparison with the state-of-the-art string and mismatch kernels also shows the superior performance scores provided by the proposed kernels. A similar performance improvement is also found over an existing large dataset. Therefore the proposed spectral clustering framework over combined local alignment kernels with modified symmetry based correction achieves superior performance for unsupervised remote homolog detection even in multi-domain and promiscuous domain proteins from Genolevures database families with better biological relevance. Source code available upon request. Contact: sarkar@labri.fr. PMID:23457439
Searching remote homology with spectral clustering with symmetry in neighborhood cluster kernels.

PubMed

Maulik, Ujjwal; Sarkar, Anasua

2013-01-01

Remote homology detection among proteins utilizing only the unlabelled sequences is a central problem in comparative genomics. The existing cluster kernel methods based on neighborhoods and profiles and the Markov clustering algorithms are currently the most popular methods for protein family recognition. The deviation from random walks with inflation or dependency on hard threshold in similarity measure in those methods requires an enhancement for homology detection among multi-domain proteins. We propose to combine spectral clustering with neighborhood kernels in Markov similarity for enhancing sensitivity in detecting homology independent of "recent" paralogs. The spectral clustering approach with new combined local alignment kernels more effectively exploits the unsupervised protein sequences globally reducing inter-cluster walks. When combined with the corrections based on modified symmetry based proximity norm deemphasizing outliers, the technique proposed in this article outperforms other state-of-the-art cluster kernels among all twelve implemented kernels. The comparison with the state-of-the-art string and mismatch kernels also shows the superior performance scores provided by the proposed kernels. A similar performance improvement is also found over an existing large dataset. Therefore the proposed spectral clustering framework over combined local alignment kernels with modified symmetry based correction achieves superior performance for unsupervised remote homolog detection even in multi-domain and promiscuous domain proteins from Genolevures database families with better biological relevance. Source code available upon request. sarkar@labri.fr.
ANALYTICAL METHOD DEVELOPMENTS TO SUPPORT PARTITIONING INTERWELL TRACER TESTING

EPA Science Inventory

Partitioning Interwell Tracer Testing (PITT) uses alcohol tracer compounds in estimating subsurface contamination from non-polar pollutants. PITT uses the analysis of water samples for various alcohols as part of the overall measurement process. The water samples may contain many...
Methods for selecting fixed-effect models for heterogeneous codon evolution, with comments on their application to gene and genome data.

PubMed

Bao, Le; Gu, Hong; Dunn, Katherine A; Bielawski, Joseph P

2007-02-08

Models of codon evolution have proven useful for investigating the strength and direction of natural selection. In some cases, a priori biological knowledge has been used successfully to model heterogeneous evolutionary dynamics among codon sites. These are called fixed-effect models, and they require that all codon sites are assigned to one of several partitions which are permitted to have independent parameters for selection pressure, evolutionary rate, transition to transversion ratio or codon frequencies. For single gene analysis, partitions might be defined according to protein tertiary structure, and for multiple gene analysis partitions might be defined according to a gene's functional category. Given a set of related fixed-effect models, the task of selecting the model that best fits the data is not trivial. In this study, we implement a set of fixed-effect codon models which allow for different levels of heterogeneity among partitions in the substitution process. We describe strategies for selecting among these models by a backward elimination procedure, Akaike information criterion (AIC) or a corrected Akaike information criterion (AICc). We evaluate the performance of these model selection methods via a simulation study, and make several recommendations for real data analysis. Our simulation study indicates that the backward elimination procedure can provide a reliable method for model selection in this setting. We also demonstrate the utility of these models by application to a single-gene dataset partitioned according to tertiary structure (abalone sperm lysin), and a multi-gene dataset partitioned according to the functional category of the gene (flagellar-related proteins of Listeria). Fixed-effect models have advantages and disadvantages. Fixed-effect models are desirable when data partitions are known to exhibit significant heterogeneity or when a statistical test of such heterogeneity is desired. 
They have the disadvantage of requiring a priori knowledge for partitioning sites. We recommend: (i) selecting models by backward elimination rather than AIC or AICc, (ii) using a stringent cut-off, e.g., p = 0.0001, and (iii) conducting a sensitivity analysis of the results. With thoughtful application, fixed-effect codon models should provide a useful tool for large-scale multi-gene analyses.
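For readers unfamiliar with the two information criteria named in this abstract, the following is a minimal sketch of the standard AIC and AICc formulas. The model names, log-likelihoods, parameter counts, and the choice of the number of codon sites as the sample size are hypothetical values chosen for illustration; they are not taken from the paper.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def aicc(log_likelihood, n_params, n_obs):
    """AIC with the usual small-sample correction term 2k(k+1)/(n-k-1)."""
    denom = n_obs - n_params - 1
    if denom <= 0:
        raise ValueError("AICc requires n_obs > n_params + 1")
    return aic(log_likelihood, n_params) + (2 * n_params * (n_params + 1)) / denom


# Hypothetical candidates: name -> (maximized log-likelihood, number of free parameters).
candidates = {
    "M_shared": (-1250.0, 10),       # one parameter set for all partitions
    "M_partitioned": (-1240.0, 14),  # extra parameters per partition
}
n_sites = 300  # assumption: sample size taken as the number of codon sites

scores = {name: aicc(lnl, k, n_sites) for name, (lnl, k) in candidates.items()}
best = min(scores, key=scores.get)  # lower AICc is preferred
print(best, round(scores[best], 2))
```

Note that the abstract itself recommends backward elimination over AIC or AICc for this setting; the sketch only shows how the two criteria are computed and compared.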
Unsupervised neural spike sorting for high-density microelectrode arrays with convolutive independent component analysis.

PubMed

Leibig, Christian; Wachtler, Thomas; Zeck, Günther

2016-09-15

Unsupervised identification of action potentials in multi-channel extracellular recordings, in particular from high-density microelectrode arrays with thousands of sensors, is an unresolved problem. While independent component analysis (ICA) achieves rapid unsupervised sorting, it ignores the convolutive structure of extracellular data, thus limiting the unmixing to a subset of neurons. Here we present a spike sorting algorithm based on convolutive ICA (cICA) to retrieve a larger number of accurately sorted neurons than with instantaneous ICA while accounting for signal overlaps. Spike sorting was applied to datasets with varying signal-to-noise ratios (SNR: 3-12) and 27% spike overlaps, sampled at either 11.5 or 23 kHz on 4365 electrodes. We demonstrate how the instantaneity assumption in ICA-based algorithms has to be relaxed in order to improve the spike sorting performance for high-density microelectrode array recordings. Reformulating the convolutive mixture as an instantaneous mixture by modeling several delayed samples jointly is necessary to increase signal-to-noise ratio. Our results emphasize that different cICA algorithms are not equivalent. Spike sorting performance was assessed with ground-truth data generated from experimentally derived templates. The presented spike sorter was able to extract ≈90% of the true spike trains with an error rate below 2%. It was superior to two alternative (c)ICA methods (≈80% accurately sorted neurons) and comparable to supervised sorting. Our new algorithm represents a fast solution to overcome the current bottleneck in spike sorting of large datasets generated by simultaneous recording with thousands of electrodes.
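The idea of modeling several delayed samples jointly can be illustrated with a generic time-delay embedding, in which each channel is stacked with lagged copies of itself so that an instantaneous unmixing method can be applied to the enlarged data matrix. The sketch below shows only this generic embedding step; the function name `delay_embed`, its parameters, and the toy recording are assumptions for illustration and do not reproduce the authors' cICA algorithm.

```python
def delay_embed(recording, n_lags):
    """Stack each channel with its delayed copies.

    recording: list of channels, each a list of samples of equal length.
    n_lags:    number of rows produced per channel (lags 0 .. n_lags - 1).
    Returns a list of rows; column t of every row refers to the same time
    point, with the row for lag d holding the signal delayed by d samples.
    """
    n_samples = len(recording[0])
    if any(len(channel) != n_samples for channel in recording):
        raise ValueError("all channels must have the same length")
    if not 1 <= n_lags <= n_samples:
        raise ValueError("n_lags must be between 1 and the number of samples")

    width = n_samples - n_lags + 1  # samples for which every lag is available
    embedded = []
    for channel in recording:
        for lag in range(n_lags):
            start = n_lags - 1 - lag
            embedded.append(channel[start:start + width])
    return embedded


# Toy two-channel recording (hypothetical values).
toy = [[1, 2, 3, 4, 5], [10, 20, 30, 40, 50]]
for row in delay_embed(toy, n_lags=3):
    print(row)
```

With two channels and three lags, the embedded matrix has six rows and three columns; an instantaneous ICA routine would then be run on these rows instead of on the raw channels.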
Unsupervised Extraction of Diagnosis Codes from EMRs Using Knowledge-Based and Extractive Text Summarization Techniques

PubMed Central

Kavuluru, Ramakanth; Han, Sifei; Harris, Daniel

2017-01-01

Diagnosis codes are extracted from medical records for billing and reimbursement and for secondary uses such as quality control and cohort identification. In the US, these codes come from the standard terminology ICD-9-CM derived from the international classification of diseases (ICD). ICD-9 codes are generally extracted by trained human coders by reading all artifacts available in a patient’s medical record following specific coding guidelines. To assist coders in this manual process, this paper proposes an unsupervised ensemble approach to automatically extract ICD-9 diagnosis codes from textual narratives included in electronic medical records (EMRs). Earlier attempts at automatic extraction focused on individual documents such as radiology reports and discharge summaries. Here we use a more realistic dataset and extract ICD-9 codes from EMRs of 1000 inpatient visits at the University of Kentucky Medical Center. Using named entity recognition (NER), graph-based concept-mapping of medical concepts, and extractive text summarization techniques, we achieve an example-based average recall of 0.42 with average precision 0.47; compared with a baseline of using only NER, we notice a 12% improvement in recall with the graph-based approach and a 7% improvement in precision using the extractive text summarization approach. Although diagnosis codes are complex concepts often expressed in text with significant long-range non-local dependencies, our present work shows the potential of unsupervised methods in extracting a portion of codes. As such, our findings are especially relevant for code extraction tasks where obtaining large amounts of training data is difficult. PMID:28748227
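Example-based precision and recall, as reported in this abstract, are commonly computed per example (here, per visit) and then averaged. The following is a minimal sketch of that common definition; the function name, the convention of scoring an empty set as 0.0, and the toy code sets are assumptions for illustration and are not drawn from the paper's data or evaluation scripts.

```python
def example_based_scores(gold, predicted):
    """Return (average precision, average recall) over examples.

    gold, predicted: equal-length sequences of code collections, one per visit.
    Assumption: an empty predicted (or gold) set contributes 0.0 precision
    (or recall) for that visit.
    """
    if len(gold) != len(predicted) or not gold:
        raise ValueError("need the same non-zero number of gold and predicted sets")

    precisions, recalls = [], []
    for gold_codes, predicted_codes in zip(gold, predicted):
        g, p = set(gold_codes), set(predicted_codes)
        hits = len(g & p)  # codes that are both predicted and correct
        precisions.append(hits / len(p) if p else 0.0)
        recalls.append(hits / len(g) if g else 0.0)

    n = len(precisions)
    return sum(precisions) / n, sum(recalls) / n


# Toy data for two visits (illustrative code sets only).
gold = [{"428.0", "401.9"}, {"250.00"}]
predicted = [{"428.0", "486"}, {"250.00", "401.9"}]
avg_precision, avg_recall = example_based_scores(gold, predicted)
print(avg_precision, avg_recall)
```

In the toy data, each visit has one correct prediction out of two predicted codes, and the gold sets contain two codes and one code respectively.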
FNAS phase partitions

NASA Technical Reports Server (NTRS)

Vanalstine, James M.

1993-01-01

Project NAS8-36955 D.O. #100 initially involved the following tasks: (1) evaluation of various coatings' ability to control wall wetting and surface zeta potential expression; (2) testing various methods to mix and control the demixing of phase systems; and (3) videomicroscopic investigation of cell partition. Three complementary areas were identified for modification and extension of the original contract. They were: (1) identification of new supports for column cell partition; (2) electrokinetic detection of protein adsorption; and (3) emulsion studies related to bioseparations.
Reconstruction of a piecewise constant conductivity on a polygonal partition via shape optimization in EIT

NASA Astrophysics Data System (ADS)

Beretta, Elena; Micheletti, Stefano; Perotto, Simona; Santacesaria, Matteo

2018-01-01

In this paper, we develop a shape optimization-based algorithm for the electrical impedance tomography (EIT) problem of determining a piecewise constant conductivity on a polygonal partition from boundary measurements. The key tool is to use a distributed shape derivative of a suitable cost functional with respect to movements of the partition. Numerical simulations showing the robustness and accuracy of the method are presented for simulated test cases in two dimensions.
