Sample records for principal-components-based clustering method

  1. CLUSFAVOR 5.0: hierarchical cluster and principal-component analysis of microarray-based transcriptional profiles

    PubMed Central

    Peterson, Leif E

    2002-01-01

    CLUSFAVOR (CLUSter and Factor Analysis with Varimax Orthogonal Rotation) 5.0 is a Windows-based computer program for hierarchical cluster and principal-component analysis of microarray-based transcriptional profiles. CLUSFAVOR 5.0 standardizes input data; sorts data according to gene-specific coefficient of variation, standard deviation, average and total expression, and Shannon entropy; performs hierarchical cluster analysis using nearest-neighbor, unweighted pair-group method using arithmetic averages (UPGMA), or furthest-neighbor joining methods, and Euclidean, correlation, or jack-knife distances; and performs principal-component analysis. PMID:12184816

  2. A measure for objects clustering in principal component analysis biplot: A case study in inter-city buses maintenance cost data

    NASA Astrophysics Data System (ADS)

    Ginanjar, Irlandia; Pasaribu, Udjianna S.; Indratno, Sapto W.

    2017-03-01

    This article presents the application of the principal component analysis (PCA) biplot for the needs of data mining. This article aims to simplify and objectify the methods for objects clustering in PCA biplot. The novelty of this paper is to get a measure that can be used to objectify the objects clustering in PCA biplot. Orthonormal eigenvectors, which are the coefficients of a principal component model representing an association between principal components and initial variables. The existence of the association is a valid ground to objects clustering based on principal axes value, thus if m principal axes used in the PCA, then the objects can be classified into 2m clusters. The inter-city buses are clustered based on maintenance costs data by using two principal axes PCA biplot. The buses are clustered into four groups. The first group is the buses with high maintenance costs, especially for lube, and brake canvass. The second group is the buses with high maintenance costs, especially for tire, and filter. The third group is the buses with low maintenance costs, especially for lube, and brake canvass. The fourth group is buses with low maintenance costs, especially for tire, and filter.

  3. Improving Cluster Analysis with Automatic Variable Selection Based on Trees

    DTIC Science & Technology

    2014-12-01

    regression trees Daisy DISsimilAritY PAM partitioning around medoids PMA penalized multivariate analysis SPC sparse principal components UPGMA unweighted...unweighted pair-group average method ( UPGMA ). This method measures dissimilarities between all objects in two clusters and takes the average value

  4. PCA based clustering for brain tumor segmentation of T1w MRI images.

    PubMed

    Kaya, Irem Ersöz; Pehlivanlı, Ayça Çakmak; Sekizkardeş, Emine Gezmez; Ibrikci, Turgay

    2017-03-01

    Medical images are huge collections of information that are difficult to store and process consuming extensive computing time. Therefore, the reduction techniques are commonly used as a data pre-processing step to make the image data less complex so that a high-dimensional data can be identified by an appropriate low-dimensional representation. PCA is one of the most popular multivariate methods for data reduction. This paper is focused on T1-weighted MRI images clustering for brain tumor segmentation with dimension reduction by different common Principle Component Analysis (PCA) algorithms. Our primary aim is to present a comparison between different variations of PCA algorithms on MRIs for two cluster methods. Five most common PCA algorithms; namely the conventional PCA, Probabilistic Principal Component Analysis (PPCA), Expectation Maximization Based Principal Component Analysis (EM-PCA), Generalize Hebbian Algorithm (GHA), and Adaptive Principal Component Extraction (APEX) were applied to reduce dimensionality in advance of two clustering algorithms, K-Means and Fuzzy C-Means. In the study, the T1-weighted MRI images of the human brain with brain tumor were used for clustering. In addition to the original size of 512 lines and 512 pixels per line, three more different sizes, 256 × 256, 128 × 128 and 64 × 64, were included in the study to examine their effect on the methods. The obtained results were compared in terms of both the reconstruction errors and the Euclidean distance errors among the clustered images containing the same number of principle components. According to the findings, the PPCA obtained the best results among all others. Furthermore, the EM-PCA and the PPCA assisted K-Means algorithm to accomplish the best clustering performance in the majority as well as achieving significant results with both clustering algorithms for all size of T1w MRI images. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

  5. Evaluation of Low-Voltage Distribution Network Index Based on Improved Principal Component Analysis

    NASA Astrophysics Data System (ADS)

    Fan, Hanlu; Gao, Suzhou; Fan, Wenjie; Zhong, Yinfeng; Zhu, Lei

    2018-01-01

    In order to evaluate the development level of the low-voltage distribution network objectively and scientifically, chromatography analysis method is utilized to construct evaluation index model of low-voltage distribution network. Based on the analysis of principal component and the characteristic of logarithmic distribution of the index data, a logarithmic centralization method is adopted to improve the principal component analysis algorithm. The algorithm can decorrelate and reduce the dimensions of the evaluation model and the comprehensive score has a better dispersion degree. The clustering method is adopted to analyse the comprehensive score because the comprehensive score of the courts is concentrated. Then the stratification evaluation of the courts is realized. An example is given to verify the objectivity and scientificity of the evaluation method.

  6. Edge Principal Components and Squash Clustering: Using the Special Structure of Phylogenetic Placement Data for Sample Comparison

    PubMed Central

    Matsen IV, Frederick A.; Evans, Steven N.

    2013-01-01

    Principal components analysis (PCA) and hierarchical clustering are two of the most heavily used techniques for analyzing the differences between nucleic acid sequence samples taken from a given environment. They have led to many insights regarding the structure of microbial communities. We have developed two new complementary methods that leverage how this microbial community data sits on a phylogenetic tree. Edge principal components analysis enables the detection of important differences between samples that contain closely related taxa. Each principal component axis is a collection of signed weights on the edges of the phylogenetic tree, and these weights are easily visualized by a suitable thickening and coloring of the edges. Squash clustering outputs a (rooted) clustering tree in which each internal node corresponds to an appropriate “average” of the original samples at the leaves below the node. Moreover, the length of an edge is a suitably defined distance between the averaged samples associated with the two incident nodes, rather than the less interpretable average of distances produced by UPGMA, the most widely used hierarchical clustering method in this context. We present these methods and illustrate their use with data from the human microbiome. PMID:23505415

  7. Rapidly differentiating grape seeds from different sources based on characteristic fingerprints using direct analysis in real time coupled with time-of-flight mass spectrometry combined with chemometrics.

    PubMed

    Song, Yuqiao; Liao, Jie; Dong, Junxing; Chen, Li

    2015-09-01

    The seeds of grapevine (Vitis vinifera) are a byproduct of wine production. To examine the potential value of grape seeds, grape seeds from seven sources were subjected to fingerprinting using direct analysis in real time coupled with time-of-flight mass spectrometry combined with chemometrics. Firstly, we listed all reported components (56 components) from grape seeds and calculated the precise m/z values of the deprotonated ions [M-H](-) . Secondly, the experimental conditions were systematically optimized based on the peak areas of total ion chromatograms of the samples. Thirdly, the seven grape seed samples were examined using the optimized method. Information about 20 grape seed components was utilized to represent characteristic fingerprints. Finally, hierarchical clustering analysis and principal component analysis were performed to analyze the data. Grape seeds from seven different sources were classified into two clusters; hierarchical clustering analysis and principal component analysis yielded similar results. The results of this study lay the foundation for appropriate utilization and exploitation of grape seed samples. Due to the absence of complicated sample preparation methods and chromatographic separation, the method developed in this study represents one of the simplest and least time-consuming methods for grape seed fingerprinting. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  8. Principal Component Clustering Approach to Teaching Quality Discriminant Analysis

    ERIC Educational Resources Information Center

    Xian, Sidong; Xia, Haibo; Yin, Yubo; Zhai, Zhansheng; Shang, Yan

    2016-01-01

    Teaching quality is the lifeline of the higher education. Many universities have made some effective achievement about evaluating the teaching quality. In this paper, we establish the Students' evaluation of teaching (SET) discriminant analysis model and algorithm based on principal component clustering analysis. Additionally, we classify the SET…

  9. Automated cloud screening of AVHRR imagery using split-and-merge clustering

    NASA Technical Reports Server (NTRS)

    Gallaudet, Timothy C.; Simpson, James J.

    1991-01-01

    Previous methods to segment clouds from ocean in AVHRR imagery have shown varying degrees of success, with nighttime approaches being the most limited. An improved method of automatic image segmentation, the principal component transformation split-and-merge clustering (PCTSMC) algorithm, is presented and applied to cloud screening of both nighttime and daytime AVHRR data. The method combines spectral differencing, the principal component transformation, and split-and-merge clustering to sample objectively the natural classes in the data. This segmentation method is then augmented by supervised classification techniques to screen clouds from the imagery. Comparisons with other nighttime methods demonstrate its improved capability in this application. The sensitivity of the method to clustering parameters is presented; the results show that the method is insensitive to the split-and-merge thresholds.

  10. An Efficient Data Compression Model Based on Spatial Clustering and Principal Component Analysis in Wireless Sensor Networks.

    PubMed

    Yin, Yihang; Liu, Fengzheng; Zhou, Xiang; Li, Quanzhong

    2015-08-07

    Wireless sensor networks (WSNs) have been widely used to monitor the environment, and sensors in WSNs are usually power constrained. Because inner-node communication consumes most of the power, efficient data compression schemes are needed to reduce the data transmission to prolong the lifetime of WSNs. In this paper, we propose an efficient data compression model to aggregate data, which is based on spatial clustering and principal component analysis (PCA). First, sensors with a strong temporal-spatial correlation are grouped into one cluster for further processing with a novel similarity measure metric. Next, sensor data in one cluster are aggregated in the cluster head sensor node, and an efficient adaptive strategy is proposed for the selection of the cluster head to conserve energy. Finally, the proposed model applies principal component analysis with an error bound guarantee to compress the data and retain the definite variance at the same time. Computer simulations show that the proposed model can greatly reduce communication and obtain a lower mean square error than other PCA-based algorithms.

  11. Utilizing Hierarchical Clustering to improve Efficiency of Self-Organizing Feature Map to Identify Hydrological Homogeneous Regions

    NASA Astrophysics Data System (ADS)

    Farsadnia, Farhad; Ghahreman, Bijan

    2016-04-01

    Hydrologic homogeneous group identification is considered both fundamental and applied research in hydrology. Clustering methods are among conventional methods to assess the hydrological homogeneous regions. Recently, Self-Organizing feature Map (SOM) method has been applied in some studies. However, the main problem of this method is the interpretation on the output map of this approach. Therefore, SOM is used as input to other clustering algorithms. The aim of this study is to apply a two-level Self-Organizing feature map and Ward hierarchical clustering method to determine the hydrologic homogenous regions in North and Razavi Khorasan provinces. At first by principal component analysis, we reduced SOM input matrix dimension, then the SOM was used to form a two-dimensional features map. To determine homogeneous regions for flood frequency analysis, SOM output nodes were used as input into the Ward method. Generally, the regions identified by the clustering algorithms are not statistically homogeneous. Consequently, they have to be adjusted to improve their homogeneity. After adjustment of the homogeneity regions by L-moment tests, five hydrologic homogeneous regions were identified. Finally, adjusted regions were created by a two-level SOM and then the best regional distribution function and associated parameters were selected by the L-moment approach. The results showed that the combination of self-organizing maps and Ward hierarchical clustering by principal components as input is more effective than the hierarchical method, by principal components or standardized inputs to achieve hydrologic homogeneous regions.

  12. Functional Principal Component Analysis and Randomized Sparse Clustering Algorithm for Medical Image Analysis

    PubMed Central

    Lin, Nan; Jiang, Junhai; Guo, Shicheng; Xiong, Momiao

    2015-01-01

    Due to the advancement in sensor technology, the growing large medical image data have the ability to visualize the anatomical changes in biological tissues. As a consequence, the medical images have the potential to enhance the diagnosis of disease, the prediction of clinical outcomes and the characterization of disease progression. But in the meantime, the growing data dimensions pose great methodological and computational challenges for the representation and selection of features in image cluster analysis. To address these challenges, we first extend the functional principal component analysis (FPCA) from one dimension to two dimensions to fully capture the space variation of image the signals. The image signals contain a large number of redundant features which provide no additional information for clustering analysis. The widely used methods for removing the irrelevant features are sparse clustering algorithms using a lasso-type penalty to select the features. However, the accuracy of clustering using a lasso-type penalty depends on the selection of the penalty parameters and the threshold value. In practice, they are difficult to determine. Recently, randomized algorithms have received a great deal of attentions in big data analysis. This paper presents a randomized algorithm for accurate feature selection in image clustering analysis. The proposed method is applied to both the liver and kidney cancer histology image data from the TCGA database. The results demonstrate that the randomized feature selection method coupled with functional principal component analysis substantially outperforms the current sparse clustering algorithms in image cluster analysis. PMID:26196383

  13. Clustering of Variables for Mixed Data

    NASA Astrophysics Data System (ADS)

    Saracco, J.; Chavent, M.

    2016-05-01

    This chapter presents clustering of variables which aim is to lump together strongly related variables. The proposed approach works on a mixed data set, i.e. on a data set which contains numerical variables and categorical variables. Two algorithms of clustering of variables are described: a hierarchical clustering and a k-means type clustering. A brief description of PCAmix method (that is a principal component analysis for mixed data) is provided, since the calculus of the synthetic variables summarizing the obtained clusters of variables is based on this multivariate method. Finally, the R packages ClustOfVar and PCAmixdata are illustrated on real mixed data. The PCAmix and ClustOfVar approaches are first used for dimension reduction (step 1) before applying in step 2 a standard clustering method to obtain groups of individuals.

  14. Geochemical differentiation processes for arc magma of the Sengan volcanic cluster, Northeastern Japan, constrained from principal component analysis

    NASA Astrophysics Data System (ADS)

    Ueki, Kenta; Iwamori, Hikaru

    2017-10-01

    In this study, with a view of understanding the structure of high-dimensional geochemical data and discussing the chemical processes at work in the evolution of arc magmas, we employed principal component analysis (PCA) to evaluate the compositional variations of volcanic rocks from the Sengan volcanic cluster of the Northeastern Japan Arc. We analyzed the trace element compositions of various arc volcanic rocks, sampled from 17 different volcanoes in a volcanic cluster. The PCA results demonstrated that the first three principal components accounted for 86% of the geochemical variation in the magma of the Sengan region. Based on the relationships between the principal components and the major elements, the mass-balance relationships with respect to the contributions of minerals, the composition of plagioclase phenocrysts, geothermal gradient, and seismic velocity structure in the crust, the first, the second, and the third principal components appear to represent magma mixing, crystallizations of olivine/pyroxene, and crystallizations of plagioclase, respectively. These represented 59%, 20%, and 6%, respectively, of the variance in the entire compositional range, indicating that magma mixing accounted for the largest variance in the geochemical variation of the arc magma. Our result indicated that crustal processes dominate the geochemical variation of magma in the Sengan volcanic cluster.

  15. Functional principal component analysis of glomerular filtration rate curves after kidney transplant.

    PubMed

    Dong, Jianghu J; Wang, Liangliang; Gill, Jagbir; Cao, Jiguo

    2017-01-01

    This article is motivated by some longitudinal clinical data of kidney transplant recipients, where kidney function progression is recorded as the estimated glomerular filtration rates at multiple time points post kidney transplantation. We propose to use the functional principal component analysis method to explore the major source of variations of glomerular filtration rate curves. We find that the estimated functional principal component scores can be used to cluster glomerular filtration rate curves. Ordering functional principal component scores can detect abnormal glomerular filtration rate curves. Finally, functional principal component analysis can effectively estimate missing glomerular filtration rate values and predict future glomerular filtration rate values.

  16. Identification of Reliable Components in Multivariate Curve Resolution-Alternating Least Squares (MCR-ALS): a Data-Driven Approach across Metabolic Processes.

    PubMed

    Motegi, Hiromi; Tsuboi, Yuuri; Saga, Ayako; Kagami, Tomoko; Inoue, Maki; Toki, Hideaki; Minowa, Osamu; Noda, Tetsuo; Kikuchi, Jun

    2015-11-04

    There is an increasing need to use multivariate statistical methods for understanding biological functions, identifying the mechanisms of diseases, and exploring biomarkers. In addition to classical analyses such as hierarchical cluster analysis, principal component analysis, and partial least squares discriminant analysis, various multivariate strategies, including independent component analysis, non-negative matrix factorization, and multivariate curve resolution, have recently been proposed. However, determining the number of components is problematic. Despite the proposal of several different methods, no satisfactory approach has yet been reported. To resolve this problem, we implemented a new idea: classifying a component as "reliable" or "unreliable" based on the reproducibility of its appearance, regardless of the number of components in the calculation. Using the clustering method for classification, we applied this idea to multivariate curve resolution-alternating least squares (MCR-ALS). Comparisons between conventional and modified methods applied to proton nuclear magnetic resonance ((1)H-NMR) spectral datasets derived from known standard mixtures and biological mixtures (urine and feces of mice) revealed that more plausible results are obtained by the modified method. In particular, clusters containing little information were detected with reliability. This strategy, named "cluster-aided MCR-ALS," will facilitate the attainment of more reliable results in the metabolomics datasets.

  17. Spectral gene set enrichment (SGSE).

    PubMed

    Frost, H Robert; Li, Zhigang; Moore, Jason H

    2015-03-03

    Gene set testing is typically performed in a supervised context to quantify the association between groups of genes and a clinical phenotype. In many cases, however, a gene set-based interpretation of genomic data is desired in the absence of a phenotype variable. Although methods exist for unsupervised gene set testing, they predominantly compute enrichment relative to clusters of the genomic variables with performance strongly dependent on the clustering algorithm and number of clusters. We propose a novel method, spectral gene set enrichment (SGSE), for unsupervised competitive testing of the association between gene sets and empirical data sources. SGSE first computes the statistical association between gene sets and principal components (PCs) using our principal component gene set enrichment (PCGSE) method. The overall statistical association between each gene set and the spectral structure of the data is then computed by combining the PC-level p-values using the weighted Z-method with weights set to the PC variance scaled by Tracy-Widom test p-values. Using simulated data, we show that the SGSE algorithm can accurately recover spectral features from noisy data. To illustrate the utility of our method on real data, we demonstrate the superior performance of the SGSE method relative to standard cluster-based techniques for testing the association between MSigDB gene sets and the variance structure of microarray gene expression data. Unsupervised gene set testing can provide important information about the biological signal held in high-dimensional genomic data sets. Because it uses the association between gene sets and samples PCs to generate a measure of unsupervised enrichment, the SGSE method is independent of cluster or network creation algorithms and, most importantly, is able to utilize the statistical significance of PC eigenvalues to ignore elements of the data most likely to represent noise.

  18. Performance evaluation of PCA-based spike sorting algorithms.

    PubMed

    Adamos, Dimitrios A; Kosmidis, Efstratios K; Theophilidis, George

    2008-09-01

    Deciphering the electrical activity of individual neurons from multi-unit noisy recordings is critical for understanding complex neural systems. A widely used spike sorting algorithm is being evaluated for single-electrode nerve trunk recordings. The algorithm is based on principal component analysis (PCA) for spike feature extraction. In the neuroscience literature it is generally assumed that the use of the first two or most commonly three principal components is sufficient. We estimate the optimum PCA-based feature space by evaluating the algorithm's performance on simulated series of action potentials. A number of modifications are made to the open source nev2lkit software to enable systematic investigation of the parameter space. We introduce a new metric to define clustering error considering over-clustering more favorable than under-clustering as proposed by experimentalists for our data. Both the program patch and the metric are available online. Correlated and white Gaussian noise processes are superimposed to account for biological and artificial jitter in the recordings. We report that the employment of more than three principal components is in general beneficial for all noise cases considered. Finally, we apply our results to experimental data and verify that the sorting process with four principal components is in agreement with a panel of electrophysiology experts.

  19. Decomposing the Apoptosis Pathway Into Biologically Interpretable Principal Components

    PubMed Central

    Wang, Min; Kornblau, Steven M; Coombes, Kevin R

    2018-01-01

    Principal component analysis (PCA) is one of the most common techniques in the analysis of biological data sets, but applying PCA raises 2 challenges. First, one must determine the number of significant principal components (PCs). Second, because each PC is a linear combination of genes, it rarely has a biological interpretation. Existing methods to determine the number of PCs are either subjective or computationally extensive. We review several methods and describe a new R package, PCDimension, that implements additional methods, the most important being an algorithm that extends and automates a graphical Bayesian method. Using simulations, we compared the methods. Our newly automated procedure is competitive with the best methods when considering both accuracy and speed and is the most accurate when the number of objects is small compared with the number of attributes. We applied the method to a proteomics data set from patients with acute myeloid leukemia. Proteins in the apoptosis pathway could be explained using 6 PCs. By clustering the proteins in PC space, we were able to replace the PCs by 6 “biological components,” 3 of which could be immediately interpreted from the current literature. We expect this approach combining PCA with clustering to be widely applicable. PMID:29881252

  20. The use of principal component and cluster analysis to differentiate banana peel flours based on their starch and dietary fibre components.

    PubMed

    Ramli, Saifullah; Ismail, Noryati; Alkarkhi, Abbas Fadhl Mubarek; Easa, Azhar Mat

    2010-08-01

    Banana peel flour (BPF) prepared from green or ripe Cavendish and Dream banana fruits were assessed for their total starch (TS), digestible starch (DS), resistant starch (RS), total dietary fibre (TDF), soluble dietary fibre (SDF) and insoluble dietary fibre (IDF). Principal component analysis (PCA) identified that only 1 component was responsible for 93.74% of the total variance in the starch and dietary fibre components that differentiated ripe and green banana flours. Cluster analysis (CA) applied to similar data obtained two statistically significant clusters (green and ripe bananas) to indicate difference in behaviours according to the stages of ripeness based on starch and dietary fibre components. We concluded that the starch and dietary fibre components could be used to discriminate between flours prepared from peels obtained from fruits of different ripeness. The results were also suggestive of the potential of green and ripe BPF as functional ingredients in food.

  1. The Use of Principal Component and Cluster Analysis to Differentiate Banana Peel Flours Based on Their Starch and Dietary Fibre Components

    PubMed Central

    Ramli, Saifullah; Ismail, Noryati; Alkarkhi, Abbas Fadhl Mubarek; Easa, Azhar Mat

    2010-01-01

    Banana peel flour (BPF) prepared from green or ripe Cavendish and Dream banana fruits were assessed for their total starch (TS), digestible starch (DS), resistant starch (RS), total dietary fibre (TDF), soluble dietary fibre (SDF) and insoluble dietary fibre (IDF). Principal component analysis (PCA) identified that only 1 component was responsible for 93.74% of the total variance in the starch and dietary fibre components that differentiated ripe and green banana flours. Cluster analysis (CA) applied to similar data obtained two statistically significant clusters (green and ripe bananas) to indicate difference in behaviours according to the stages of ripeness based on starch and dietary fibre components. We concluded that the starch and dietary fibre components could be used to discriminate between flours prepared from peels obtained from fruits of different ripeness. The results were also suggestive of the potential of green and ripe BPF as functional ingredients in food. PMID:24575193

  2. A graph-Laplacian-based feature extraction algorithm for neural spike sorting.

    PubMed

    Ghanbari, Yasser; Spence, Larry; Papamichalis, Panos

    2009-01-01

    Analysis of extracellular neural spike recordings is highly dependent upon the accuracy of neural waveform classification, commonly referred to as spike sorting. Feature extraction is an important stage of this process because it can limit the quality of clustering which is performed in the feature space. This paper proposes a new feature extraction method (which we call Graph Laplacian Features, GLF) based on minimizing the graph Laplacian and maximizing the weighted variance. The algorithm is compared with Principal Components Analysis (PCA, the most commonly-used feature extraction method) using simulated neural data. The results show that the proposed algorithm produces more compact and well-separated clusters compared to PCA. As an added benefit, tentative cluster centers are output which can be used to initialize a subsequent clustering stage.

  3. A Dimensionality Reduction-Based Multi-Step Clustering Method for Robust Vessel Trajectory Analysis

    PubMed Central

    Liu, Jingxian; Wu, Kefeng

    2017-01-01

    The Shipboard Automatic Identification System (AIS) is crucial for navigation safety and maritime surveillance, data mining and pattern analysis of AIS information have attracted considerable attention in terms of both basic research and practical applications. Clustering of spatio-temporal AIS trajectories can be used to identify abnormal patterns and mine customary route data for transportation safety. Thus, the capacities of navigation safety and maritime traffic monitoring could be enhanced correspondingly. However, trajectory clustering is often sensitive to undesirable outliers and is essentially more complex compared with traditional point clustering. To overcome this limitation, a multi-step trajectory clustering method is proposed in this paper for robust AIS trajectory clustering. In particular, the Dynamic Time Warping (DTW), a similarity measurement method, is introduced in the first step to measure the distances between different trajectories. The calculated distances, inversely proportional to the similarities, constitute a distance matrix in the second step. Furthermore, as a widely-used dimensional reduction method, Principal Component Analysis (PCA) is exploited to decompose the obtained distance matrix. In particular, the top k principal components with above 95% accumulative contribution rate are extracted by PCA, and the number of the centers k is chosen. The k centers are found by the improved center automatically selection algorithm. In the last step, the improved center clustering algorithm with k clusters is implemented on the distance matrix to achieve the final AIS trajectory clustering results. In order to improve the accuracy of the proposed multi-step clustering algorithm, an automatic algorithm for choosing the k clusters is developed according to the similarity distance. Numerous experiments on realistic AIS trajectory datasets in the bridge area waterway and Mississippi River have been implemented to compare our proposed method with traditional spectral clustering and fast affinity propagation clustering. Experimental results have illustrated its superior performance in terms of quantitative and qualitative evaluations. PMID:28777353

  4. A Spatial Division Clustering Method and Low Dimensional Feature Extraction Technique Based Indoor Positioning System

    PubMed Central

    Mo, Yun; Zhang, Zhongzhao; Meng, Weixiao; Ma, Lin; Wang, Yao

    2014-01-01

    Indoor positioning systems based on the fingerprint method are widely used due to the large number of existing devices with a wide range of coverage. However, extensive positioning regions with a massive fingerprint database may cause high computational complexity and error margins, therefore clustering methods are widely applied as a solution. However, traditional clustering methods in positioning systems can only measure the similarity of the Received Signal Strength without being concerned with the continuity of physical coordinates. Besides, outage of access points could result in asymmetric matching problems which severely affect the fine positioning procedure. To solve these issues, in this paper we propose a positioning system based on the Spatial Division Clustering (SDC) method for clustering the fingerprint dataset subject to physical distance constraints. With the Genetic Algorithm and Support Vector Machine techniques, SDC can achieve higher coarse positioning accuracy than traditional clustering algorithms. In terms of fine localization, based on the Kernel Principal Component Analysis method, the proposed positioning system outperforms its counterparts based on other feature extraction methods in low dimensionality. Apart from balancing online matching computational burden, the new positioning system exhibits advantageous performance on radio map clustering, and also shows better robustness and adaptability in the asymmetric matching problem aspect. PMID:24451470

  5. Local Prediction Models on Mid-Atlantic Ridge MORB by Principal Component Regression

    NASA Astrophysics Data System (ADS)

    Ling, X.; Snow, J. E.; Chin, W.

    2017-12-01

    The isotopic compositions of the daughter isotopes of long-lived radioactive systems (Sr, Nd, Hf and Pb ) can be used to map the scale and history of mantle heterogeneities beneath mid-ocean ridges. Our goal is to relate the multidimensional structure in the existing isotopic dataset with an underlying physical reality of mantle sources. The numerical technique of Principal Component Analysis is useful to reduce the linear dependence of the data to a minimum set of orthogonal eigenvectors encapsulating the information contained (cf Agranier et al 2005). The dataset used for this study covers almost all the MORBs along mid-Atlantic Ridge (MAR), from 54oS to 77oN and 8.8oW to -46.7oW, including replicating the dataset of Agranier et al., 2005 published plus 53 basalt samples dredged and analyzed since then (data from PetDB). The principal components PC1 and PC2 account for 61.56% and 29.21%, respectively, of the total isotope ratios variability. The samples with similar compositions to HIMU and EM and DM are identified to better understand the PCs. PC1 and PC2 are accountable for HIMU and EM whereas PC2 has limited control over the DM source. PC3 is more strongly controlled by the depleted mantle source than PC2. What this means is that all three principal components have a high degree of significance relevant to the established mantle sources. We also tested the relationship between mantle heterogeneity and sample locality. K-means clustering algorithm is a type of unsupervised learning to find groups in the data based on feature similarity. The PC factor scores of each sample are clustered into three groups. Cluster one and three are alternating on the north and south MAR. Cluster two exhibits on 45.18oN to 0.79oN and -27.9oW to -30.40oW alternating with cluster one. The ridge has been preliminarily divided into 16 sections considering both the clusters and ridge segments. The principal component regression models the section based on 6 isotope ratios and PCs. The prediction residual is about 1-2km. It means that the combined 5 isotopes are a strong predictor of geographic location along the ridge, a slightly surprising result. PCR is a robust and powerful method for both visualizing and manipulating the multidimensional representation of isotope data.

  6. Assessment of sediment quality in the Mediterranean Sea-Boughrara lagoon exchange areas (southeastern Tunisia): GIS approach-based chemometric methods.

    PubMed

    Kharroubi, Adel; Gargouri, Dorra; Baati, Houda; Azri, Chafai

    2012-06-01

    Concentrations of selected heavy metals (Cd, Pb, Zn, Cu, Mn, and Fe) in surface sediments from 66 sites in both northern and eastern Mediterranean Sea-Boughrara lagoon exchange areas (southeastern Tunisia) were studied in order to understand current metal contamination due to the urbanization and economic development of nearby several coastal regions of the Gulf of Gabès. Multiple approaches were applied for the sediment quality assessment. These approaches were based on GIS coupled with chemometric methods (enrichment factors, geoaccumulation index, principal component analysis, and cluster analysis). Enrichment factors and principal component analysis revealed two distinct groups of metals. The first group corresponded to Fe and Mn derived from natural sources, and the second group contained Cd, Pb, Zn, and Cu originated from man-made sources. For these latter metals, cluster analysis showed two distinct distributions in the selected areas. They were attributed to temporal and spatial variations of contaminant sources input. The geoaccumulation index (I (geo)) values explained that only Cd, Pb, and Cu can be considered as moderate to extreme pollutants in the studied sediments.

  7. [Discrimination of varieties of brake fluid using visual-near infrared spectra].

    PubMed

    Jiang, Lu-lu; Tan, Li-hong; Qiu, Zheng-jun; Lu, Jiang-feng; He, Yong

    2008-06-01

    A new method was developed to fast discriminate brands of brake fluid by means of visual-near infrared spectroscopy. Five different brands of brake fluid were analyzed using a handheld near infrared spectrograph, manufactured by ASD Company, and 60 samples were gotten from each brand of brake fluid. The samples data were pretreated using average smoothing and standard normal variable method, and then analyzed using principal component analysis (PCA). A 2-dimensional plot was drawn based on the first and the second principal components, and the plot indicated that the clustering characteristic of different brake fluid is distinct. The foregoing 6 principal components were taken as input variable, and the band of brake fluid as output variable to build the discriminate model by stepwise discriminant analysis method. Two hundred twenty five samples selected randomly were used to create the model, and the rest 75 samples to verify the model. The result showed that the distinguishing rate was 94.67%, indicating that the method proposed in this paper has good performance in classification and discrimination. It provides a new way to fast discriminate different brands of brake fluid.

  8. A Data Analytics Approach to Discovering Unique Microstructural Configurations Susceptible to Fatigue

    NASA Astrophysics Data System (ADS)

    Jha, S. K.; Brockman, R. A.; Hoffman, R. M.; Sinha, V.; Pilchak, A. L.; Porter, W. J.; Buchanan, D. J.; Larsen, J. M.; John, R.

    2018-05-01

    Principal component analysis and fuzzy c-means clustering algorithms were applied to slip-induced strain and geometric metric data in an attempt to discover unique microstructural configurations and their frequencies of occurrence in statistically representative instantiations of a titanium alloy microstructure. Grain-averaged fatigue indicator parameters were calculated for the same instantiation. The fatigue indicator parameters strongly correlated with the spatial location of the microstructural configurations in the principal components space. The fuzzy c-means clustering method identified clusters of data that varied in terms of their average fatigue indicator parameters. Furthermore, the number of points in each cluster was inversely correlated to the average fatigue indicator parameter. This analysis demonstrates that data-driven methods have significant potential for providing unbiased determination of unique microstructural configurations and their frequencies of occurrence in a given volume from the point of view of strain localization and fatigue crack initiation.

  9. Analysis of indoor air pollutants checklist using environmetric technique for health risk assessment of sick building complaint in nonindustrial workplace

    PubMed Central

    Syazwan, AI; Rafee, B Mohd; Juahir, Hafizan; Azman, AZF; Nizar, AM; Izwyn, Z; Syahidatussyakirah, K; Muhaimin, AA; Yunos, MA Syafiq; Anita, AR; Hanafiah, J Muhamad; Shaharuddin, MS; Ibthisham, A Mohd; Hasmadi, I Mohd; Azhar, MN Mohamad; Azizan, HS; Zulfadhli, I; Othman, J; Rozalini, M; Kamarul, FT

    2012-01-01

    Purpose To analyze and characterize a multidisciplinary, integrated indoor air quality checklist for evaluating the health risk of building occupants in a nonindustrial workplace setting. Design A cross-sectional study based on a participatory occupational health program conducted by the National Institute of Occupational Safety and Health (Malaysia) and Universiti Putra Malaysia. Method A modified version of the indoor environmental checklist published by the Department of Occupational Health and Safety, based on the literature and discussion with occupational health and safety professionals, was used in the evaluation process. Summated scores were given according to the cluster analysis and principal component analysis in the characterization of risk. Environmetric techniques was used to classify the risk of variables in the checklist. Identification of the possible source of item pollutants was also evaluated from a semiquantitative approach. Result Hierarchical agglomerative cluster analysis resulted in the grouping of factorial components into three clusters (high complaint, moderate-high complaint, moderate complaint), which were further analyzed by discriminant analysis. From this, 15 major variables that influence indoor air quality were determined. Principal component analysis of each cluster revealed that the main factors influencing the high complaint group were fungal-related problems, chemical indoor dispersion, detergent, renovation, thermal comfort, and location of fresh air intake. The moderate-high complaint group showed significant high loading on ventilation, air filters, and smoking-related activities. The moderate complaint group showed high loading on dampness, odor, and thermal comfort. Conclusion This semiquantitative assessment, which graded risk from low to high based on the intensity of the problem, shows promising and reliable results. It should be used as an important tool in the preliminary assessment of indoor air quality and as a categorizing method for further IAQ investigations and complaints procedures. PMID:23055779

  10. A data fusion-based drought index

    NASA Astrophysics Data System (ADS)

    Azmi, Mohammad; Rüdiger, Christoph; Walker, Jeffrey P.

    2016-03-01

    Drought and water stress monitoring plays an important role in the management of water resources, especially during periods of extreme climate conditions. Here, a data fusion-based drought index (DFDI) has been developed and analyzed for three different locations of varying land use and climate regimes in Australia. The proposed index comprehensively considers all types of drought through a selection of indices and proxies associated with each drought type. In deriving the proposed index, weekly data from three different data sources (OzFlux Network, Asia-Pacific Water Monitor, and MODIS-Terra satellite) were employed to first derive commonly used individual standardized drought indices (SDIs), which were then grouped using an advanced clustering method. Next, three different multivariate methods (principal component analysis, factor analysis, and independent component analysis) were utilized to aggregate the SDIs located within each group. For the two clusters in which the grouped SDIs best reflected the water availability and vegetation conditions, the variables were aggregated based on an averaging between the standardized first principal components of the different multivariate methods. Then, considering those two aggregated indices as well as the classifications of months (dry/wet months and active/non-active months), the proposed DFDI was developed. Finally, the symbolic regression method was used to derive mathematical equations for the proposed DFDI. The results presented here show that the proposed index has revealed new aspects in water stress monitoring which previous indices were not able to, by simultaneously considering both hydrometeorological and ecological concepts to define the real water stress of the study areas.

  11. Applications of a Novel Clustering Approach Using Non-Negative Matrix Factorization to Environmental Research in Public Health

    PubMed Central

    Fogel, Paul; Gaston-Mathé, Yann; Hawkins, Douglas; Fogel, Fajwel; Luta, George; Young, S. Stanley

    2016-01-01

    Often data can be represented as a matrix, e.g., observations as rows and variables as columns, or as a doubly classified contingency table. Researchers may be interested in clustering the observations, the variables, or both. If the data is non-negative, then Non-negative Matrix Factorization (NMF) can be used to perform the clustering. By its nature, NMF-based clustering is focused on the large values. If the data is normalized by subtracting the row/column means, it becomes of mixed signs and the original NMF cannot be used. Our idea is to split and then concatenate the positive and negative parts of the matrix, after taking the absolute value of the negative elements. NMF applied to the concatenated data, which we call PosNegNMF, offers the advantages of the original NMF approach, while giving equal weight to large and small values. We use two public health datasets to illustrate the new method and compare it with alternative clustering methods, such as K-means and clustering methods based on the Singular Value Decomposition (SVD) or Principal Component Analysis (PCA). With the exception of situations where a reasonably accurate factorization can be achieved using the first SVD component, we recommend that the epidemiologists and environmental scientists use the new method to obtain clusters with improved quality and interpretability. PMID:27213413

  12. Applications of a Novel Clustering Approach Using Non-Negative Matrix Factorization to Environmental Research in Public Health.

    PubMed

    Fogel, Paul; Gaston-Mathé, Yann; Hawkins, Douglas; Fogel, Fajwel; Luta, George; Young, S Stanley

    2016-05-18

    Often data can be represented as a matrix, e.g., observations as rows and variables as columns, or as a doubly classified contingency table. Researchers may be interested in clustering the observations, the variables, or both. If the data is non-negative, then Non-negative Matrix Factorization (NMF) can be used to perform the clustering. By its nature, NMF-based clustering is focused on the large values. If the data is normalized by subtracting the row/column means, it becomes of mixed signs and the original NMF cannot be used. Our idea is to split and then concatenate the positive and negative parts of the matrix, after taking the absolute value of the negative elements. NMF applied to the concatenated data, which we call PosNegNMF, offers the advantages of the original NMF approach, while giving equal weight to large and small values. We use two public health datasets to illustrate the new method and compare it with alternative clustering methods, such as K-means and clustering methods based on the Singular Value Decomposition (SVD) or Principal Component Analysis (PCA). With the exception of situations where a reasonably accurate factorization can be achieved using the first SVD component, we recommend that the epidemiologists and environmental scientists use the new method to obtain clusters with improved quality and interpretability.

  13. Evaluation of methodologies for assessing the overall diet: dietary quality scores and dietary pattern analysis.

    PubMed

    Ocké, Marga C

    2013-05-01

    This paper aims to describe different approaches for studying the overall diet with advantages and limitations. Studies of the overall diet have emerged because the relationship between dietary intake and health is very complex with all kinds of interactions. These cannot be captured well by studying single dietary components. Three main approaches to study the overall diet can be distinguished. The first method is researcher-defined scores or indices of diet quality. These are usually based on guidelines for a healthy diet or on diets known to be healthy. The second approach, using principal component or cluster analysis, is driven by the underlying dietary data. In principal component analysis, scales are derived based on the underlying relationships between food groups, whereas in cluster analysis, subgroups of the population are created with people that cluster together based on their dietary intake. A third approach includes methods that are driven by a combination of biological pathways and the underlying dietary data. Reduced rank regression defines linear combinations of food intakes that maximally explain nutrient intakes or intermediate markers of disease. Decision tree analysis identifies subgroups of a population whose members share dietary characteristics that influence (intermediate markers of) disease. It is concluded that all approaches have advantages and limitations and essentially answer different questions. The third approach is still more in an exploration phase, but seems to have great potential with complementary value. More insight into the utility of conducting studies on the overall diet can be gained if more attention is given to methodological issues.

  14. Microglia Morphological Categorization in a Rat Model of Neuroinflammation by Hierarchical Cluster and Principal Components Analysis.

    PubMed

    Fernández-Arjona, María Del Mar; Grondona, Jesús M; Granados-Durán, Pablo; Fernández-Llebrez, Pedro; López-Ávalos, María D

    2017-01-01

    It is known that microglia morphology and function are closely related, but only few studies have objectively described different morphological subtypes. To address this issue, morphological parameters of microglial cells were analyzed in a rat model of aseptic neuroinflammation. After the injection of a single dose of the enzyme neuraminidase (NA) within the lateral ventricle (LV) an acute inflammatory process occurs. Sections from NA-injected animals and sham controls were immunolabeled with the microglial marker IBA1, which highlights ramifications and features of the cell shape. Using images obtained by section scanning, individual microglial cells were sampled from various regions (septofimbrial nucleus, hippocampus and hypothalamus) at different times post-injection (2, 4 and 12 h). Each cell yielded a set of 15 morphological parameters by means of image analysis software. Five initial parameters (including fractal measures) were statistically different in cells from NA-injected rats (most of them IL-1β positive, i.e., M1-state) compared to those from control animals (none of them IL-1β positive, i.e., surveillant state). However, additional multimodal parameters were revealed more suitable for hierarchical cluster analysis (HCA). This method pointed out the classification of microglia population in four clusters. Furthermore, a linear discriminant analysis (LDA) suggested three specific parameters to objectively classify any microglia by a decision tree. In addition, a principal components analysis (PCA) revealed two extra valuable variables that allowed to further classifying microglia in a total of eight sub-clusters or types. The spatio-temporal distribution of these different morphotypes in our rat inflammation model allowed to relate specific morphotypes with microglial activation status and brain location. An objective method for microglia classification based on morphological parameters is proposed. Main points Microglia undergo a quantifiable morphological change upon neuraminidase induced inflammation.Hierarchical cluster and principal components analysis allow morphological classification of microglia.Brain location of microglia is a relevant factor.

  15. Microglia Morphological Categorization in a Rat Model of Neuroinflammation by Hierarchical Cluster and Principal Components Analysis

    PubMed Central

    Fernández-Arjona, María del Mar; Grondona, Jesús M.; Granados-Durán, Pablo; Fernández-Llebrez, Pedro; López-Ávalos, María D.

    2017-01-01

    It is known that microglia morphology and function are closely related, but only few studies have objectively described different morphological subtypes. To address this issue, morphological parameters of microglial cells were analyzed in a rat model of aseptic neuroinflammation. After the injection of a single dose of the enzyme neuraminidase (NA) within the lateral ventricle (LV) an acute inflammatory process occurs. Sections from NA-injected animals and sham controls were immunolabeled with the microglial marker IBA1, which highlights ramifications and features of the cell shape. Using images obtained by section scanning, individual microglial cells were sampled from various regions (septofimbrial nucleus, hippocampus and hypothalamus) at different times post-injection (2, 4 and 12 h). Each cell yielded a set of 15 morphological parameters by means of image analysis software. Five initial parameters (including fractal measures) were statistically different in cells from NA-injected rats (most of them IL-1β positive, i.e., M1-state) compared to those from control animals (none of them IL-1β positive, i.e., surveillant state). However, additional multimodal parameters were revealed more suitable for hierarchical cluster analysis (HCA). This method pointed out the classification of microglia population in four clusters. Furthermore, a linear discriminant analysis (LDA) suggested three specific parameters to objectively classify any microglia by a decision tree. In addition, a principal components analysis (PCA) revealed two extra valuable variables that allowed to further classifying microglia in a total of eight sub-clusters or types. The spatio-temporal distribution of these different morphotypes in our rat inflammation model allowed to relate specific morphotypes with microglial activation status and brain location. An objective method for microglia classification based on morphological parameters is proposed. Main points Microglia undergo a quantifiable morphological change upon neuraminidase induced inflammation.Hierarchical cluster and principal components analysis allow morphological classification of microglia.Brain location of microglia is a relevant factor. PMID:28848398

  16. The fine-scale genetic structure and evolution of the Japanese population.

    PubMed

    Takeuchi, Fumihiko; Katsuya, Tomohiro; Kimura, Ryosuke; Nabika, Toru; Isomura, Minoru; Ohkubo, Takayoshi; Tabara, Yasuharu; Yamamoto, Ken; Yokota, Mitsuhiro; Liu, Xuanyao; Saw, Woei-Yuh; Mamatyusupu, Dolikun; Yang, Wenjun; Xu, Shuhua; Teo, Yik-Ying; Kato, Norihiro

    2017-01-01

    The contemporary Japanese populations largely consist of three genetically distinct groups-Hondo, Ryukyu and Ainu. By principal-component analysis, while the three groups can be clearly separated, the Hondo people, comprising 99% of the Japanese, form one almost indistinguishable cluster. To understand fine-scale genetic structure, we applied powerful haplotype-based statistical methods to genome-wide single nucleotide polymorphism data from 1600 Japanese individuals, sampled from eight distinct regions in Japan. We then combined the Japanese data with 26 other Asian populations data to analyze the shared ancestry and genetic differentiation. We found that the Japanese could be separated into nine genetic clusters in our dataset, showing a marked concordance with geography; and that major components of ancestry profile of Japanese were from the Korean and Han Chinese clusters. We also detected and dated admixture in the Japanese. While genetic differentiation between Ryukyu and Hondo was suggested to be caused in part by positive selection, genetic differentiation among the Hondo clusters appeared to result principally from genetic drift. Notably, in Asians, we found the possibility that positive selection accentuated genetic differentiation among distant populations but attenuated genetic differentiation among close populations. These findings are significant for studies of human evolution and medical genetics.

  17. A modified procedure for mixture-model clustering of regional geochemical data

    USGS Publications Warehouse

    Ellefsen, Karl J.; Smith, David B.; Horton, John D.

    2014-01-01

    A modified procedure is proposed for mixture-model clustering of regional-scale geochemical data. The key modification is the robust principal component transformation of the isometric log-ratio transforms of the element concentrations. This principal component transformation and the associated dimension reduction are applied before the data are clustered. The principal advantage of this modification is that it significantly improves the stability of the clustering. The principal disadvantage is that it requires subjective selection of the number of clusters and the number of principal components. To evaluate the efficacy of this modified procedure, it is applied to soil geochemical data that comprise 959 samples from the state of Colorado (USA) for which the concentrations of 44 elements are measured. The distributions of element concentrations that are derived from the mixture model and from the field samples are similar, indicating that the mixture model is a suitable representation of the transformed geochemical data. Each cluster and the associated distributions of the element concentrations are related to specific geologic and anthropogenic features. In this way, mixture model clustering facilitates interpretation of the regional geochemical data.

  18. Cluster and principal component analysis based on SSR markers of Amomum tsao-ko in Jinping County of Yunnan Province

    NASA Astrophysics Data System (ADS)

    Ma, Mengli; Lei, En; Meng, Hengling; Wang, Tiantao; Xie, Linyan; Shen, Dong; Xianwang, Zhou; Lu, Bingyue

    2017-08-01

    Amomum tsao-ko is a commercial plant that used for various purposes in medicinal and food industries. For the present investigation, 44 germplasm samples were collected from Jinping County of Yunnan Province. Clusters analysis and 2-dimensional principal component analysis (PCA) was used to represent the genetic relations among Amomum tsao-ko by using simple sequence repeat (SSR) markers. Clustering analysis clearly distinguished the samples groups. Two major clusters were formed; first (Cluster I) consisted of 34 individuals, the second (Cluster II) consisted of 10 individuals, Cluster I as the main group contained multiple sub-clusters. PCA also showed 2 groups: PCA Group 1 included 29 individuals, PCA Group 2 included 12 individuals, consistent with the results of cluster analysis. The purpose of the present investigation was to provide information on genetic relationship of Amomum tsao-ko germplasm resources in main producing areas, also provide a theoretical basis for the protection and utilization of Amomum tsao-ko resources.

  19. Principal Cluster Axes: A Projection Pursuit Index for the Preservation of Cluster Structures in the Presence of Data Reduction

    ERIC Educational Resources Information Center

    Steinley, Douglas; Brusco, Michael J.; Henson, Robert

    2012-01-01

    A measure of "clusterability" serves as the basis of a new methodology designed to preserve cluster structure in a reduced dimensional space. Similar to principal component analysis, which finds the direction of maximal variance in multivariate space, principal cluster axes find the direction of maximum clusterability in multivariate space.…

  20. Molecular reclassification of Crohn's disease: a cautionary note on population stratification.

    PubMed

    Maus, Bärbel; Jung, Camille; Mahachie John, Jestinah M; Hugot, Jean-Pierre; Génin, Emmanuelle; Van Steen, Kristel

    2013-01-01

    Complex human diseases commonly differ in their phenotypic characteristics, e.g., Crohn's disease (CD) patients are heterogeneous with regard to disease location and disease extent. The genetic susceptibility to Crohn's disease is widely acknowledged and has been demonstrated by identification of over 100 CD associated genetic loci. However, relating CD subphenotypes to disease susceptible loci has proven to be a difficult task. In this paper we discuss the use of cluster analysis on genetic markers to identify genetic-based subgroups while taking into account possible confounding by population stratification. We show that it is highly relevant to consider the confounding nature of population stratification in order to avoid that detected clusters are strongly related to population groups instead of disease-specific groups. Therefore, we explain the use of principal components to correct for population stratification while clustering affected individuals into genetic-based subgroups. The principal components are obtained using 30 ancestry informative markers (AIM), and the first two PCs are determined to discriminate between continental origins of the affected individuals. Genotypes on 51 CD associated single nucleotide polymorphisms (SNPs) are used to perform latent class analysis, hierarchical and Partitioning Around Medoids (PAM) cluster analysis within a sample of affected individuals with and without the use of principal components to adjust for population stratification. It is seen that without correction for population stratification clusters seem to be influenced by population stratification while with correction clusters are unrelated to continental origin of individuals.

  1. Molecular Reclassification of Crohn’s Disease: A Cautionary Note on Population Stratification

    PubMed Central

    Maus, Bärbel; Jung, Camille; Mahachie John, Jestinah M.; Hugot, Jean-Pierre; Génin, Emmanuelle; Van Steen, Kristel

    2013-01-01

    Complex human diseases commonly differ in their phenotypic characteristics, e.g., Crohn’s disease (CD) patients are heterogeneous with regard to disease location and disease extent. The genetic susceptibility to Crohn’s disease is widely acknowledged and has been demonstrated by identification of over 100 CD associated genetic loci. However, relating CD subphenotypes to disease susceptible loci has proven to be a difficult task. In this paper we discuss the use of cluster analysis on genetic markers to identify genetic-based subgroups while taking into account possible confounding by population stratification. We show that it is highly relevant to consider the confounding nature of population stratification in order to avoid that detected clusters are strongly related to population groups instead of disease-specific groups. Therefore, we explain the use of principal components to correct for population stratification while clustering affected individuals into genetic-based subgroups. The principal components are obtained using 30 ancestry informative markers (AIM), and the first two PCs are determined to discriminate between continental origins of the affected individuals. Genotypes on 51 CD associated single nucleotide polymorphisms (SNPs) are used to perform latent class analysis, hierarchical and Partitioning Around Medoids (PAM) cluster analysis within a sample of affected individuals with and without the use of principal components to adjust for population stratification. It is seen that without correction for population stratification clusters seem to be influenced by population stratification while with correction clusters are unrelated to continental origin of individuals. PMID:24147066

  2. Principal component analysis vs. self-organizing maps combined with hierarchical clustering for pattern recognition in volcano seismic spectra

    NASA Astrophysics Data System (ADS)

    Unglert, K.; Radić, V.; Jellinek, A. M.

    2016-06-01

    Variations in the spectral content of volcano seismicity related to changes in volcanic activity are commonly identified manually in spectrograms. However, long time series of monitoring data at volcano observatories require tools to facilitate automated and rapid processing. Techniques such as self-organizing maps (SOM) and principal component analysis (PCA) can help to quickly and automatically identify important patterns related to impending eruptions. For the first time, we evaluate the performance of SOM and PCA on synthetic volcano seismic spectra constructed from observations during two well-studied eruptions at Klauea Volcano, Hawai'i, that include features observed in many volcanic settings. In particular, our objective is to test which of the techniques can best retrieve a set of three spectral patterns that we used to compose a synthetic spectrogram. We find that, without a priori knowledge of the given set of patterns, neither SOM nor PCA can directly recover the spectra. We thus test hierarchical clustering, a commonly used method, to investigate whether clustering in the space of the principal components and on the SOM, respectively, can retrieve the known patterns. Our clustering method applied to the SOM fails to detect the correct number and shape of the known input spectra. In contrast, clustering of the data reconstructed by the first three PCA modes reproduces these patterns and their occurrence in time more consistently. This result suggests that PCA in combination with hierarchical clustering is a powerful practical tool for automated identification of characteristic patterns in volcano seismic spectra. Our results indicate that, in contrast to PCA, common clustering algorithms may not be ideal to group patterns on the SOM and that it is crucial to evaluate the performance of these tools on a control dataset prior to their application to real data.

  3. Cross-visit tumor sub-segmentation and registration with outlier rejection for dynamic contrast-enhanced MRI time series data.

    PubMed

    Buonaccorsi, G A; Rose, C J; O'Connor, J P B; Roberts, C; Watson, Y; Jackson, A; Jayson, G C; Parker, G J M

    2010-01-01

    Clinical trials of anti-angiogenic and vascular-disrupting agents often use biomarkers derived from DCE-MRI, typically reporting whole-tumor summary statistics and so overlooking spatial parameter variations caused by tissue heterogeneity. We present a data-driven segmentation method comprising tracer-kinetic model-driven registration for motion correction, conversion from MR signal intensity to contrast agent concentration for cross-visit normalization, iterative principal components analysis for imputation of missing data and dimensionality reduction, and statistical outlier detection using the minimum covariance determinant to obtain a robust Mahalanobis distance. After applying these techniques we cluster in the principal components space using k-means. We present results from a clinical trial of a VEGF inhibitor, using time-series data selected because of problems due to motion and outlier time series. We obtained spatially-contiguous clusters that map to regions with distinct microvascular characteristics. This methodology has the potential to uncover localized effects in trials using DCE-MRI-based biomarkers.

  4. Multi-angle backscatter classification and sub-bottom profiling for improved seafloor characterization

    NASA Astrophysics Data System (ADS)

    Alevizos, Evangelos; Snellen, Mirjam; Simons, Dick; Siemes, Kerstin; Greinert, Jens

    2018-06-01

    This study applies three classification methods exploiting the angular dependence of acoustic seafloor backscatter along with high resolution sub-bottom profiling for seafloor sediment characterization in the Eckernförde Bay, Baltic Sea Germany. This area is well suited for acoustic backscatter studies due to its shallowness, its smooth bathymetry and the presence of a wide range of sediment types. Backscatter data were acquired using a Seabeam1180 (180 kHz) multibeam echosounder and sub-bottom profiler data were recorded using a SES-2000 parametric sonar transmitting 6 and 12 kHz. The high density of seafloor soundings allowed extracting backscatter layers for five beam angles over a large part of the surveyed area. A Bayesian probability method was employed for sediment classification based on the backscatter variability at a single incidence angle, whereas Maximum Likelihood Classification (MLC) and Principal Components Analysis (PCA) were applied to the multi-angle layers. The Bayesian approach was used for identifying the optimum number of acoustic classes because cluster validation is carried out prior to class assignment and class outputs are ordinal categorical values. The method is based on the principle that backscatter values from a single incidence angle express a normal distribution for a particular sediment type. The resulting Bayesian classes were well correlated to median grain sizes and the percentage of coarse material. The MLC method uses angular response information from five layers of training areas extracted from the Bayesian classification map. The subsequent PCA analysis is based on the transformation of these five layers into two principal components that comprise most of the data variability. These principal components were clustered in five classes after running an external cluster validation test. In general both methods MLC and PCA, separated the various sediment types effectively, showing good agreement (kappa >0.7) with the Bayesian approach which also correlates well with ground truth data (r2 > 0.7). In addition, sub-bottom data were used in conjunction with the Bayesian classification results to characterize acoustic classes with respect to their geological and stratigraphic interpretation. The joined interpretation of seafloor and sub-seafloor data sets proved to be an efficient approach for a better understanding of seafloor backscatter patchiness and to discriminate acoustically similar classes in different geological/bathymetric settings.

  5. A dimension reduction strategy for improving the efficiency of computer-aided detection for CT colonography

    NASA Astrophysics Data System (ADS)

    Song, Bowen; Zhang, Guopeng; Wang, Huafeng; Zhu, Wei; Liang, Zhengrong

    2013-02-01

    Various types of features, e.g., geometric features, texture features, projection features etc., have been introduced for polyp detection and differentiation tasks via computer aided detection and diagnosis (CAD) for computed tomography colonography (CTC). Although these features together cover more information of the data, some of them are statistically highly-related to others, which made the feature set redundant and burdened the computation task of CAD. In this paper, we proposed a new dimension reduction method which combines hierarchical clustering and principal component analysis (PCA) for false positives (FPs) reduction task. First, we group all the features based on their similarity using hierarchical clustering, and then PCA is employed within each group. Different numbers of principal components are selected from each group to form the final feature set. Support vector machine is used to perform the classification. The results show that when three principal components were chosen from each group we can achieve an area under the curve of receiver operating characteristics of 0.905, which is as high as the original dataset. Meanwhile, the computation time is reduced by 70% and the feature set size is reduce by 77%. It can be concluded that the proposed method captures the most important information of the feature set and the classification accuracy is not affected after the dimension reduction. The result is promising and further investigation, such as automatically threshold setting, are worthwhile and are under progress.

  6. Analysis of indoor air pollutants checklist using environmetric technique for health risk assessment of sick building complaint in nonindustrial workplace.

    PubMed

    Syazwan, Ai; Rafee, B Mohd; Juahir, Hafizan; Azman, Azf; Nizar, Am; Izwyn, Z; Syahidatussyakirah, K; Muhaimin, Aa; Yunos, Ma Syafiq; Anita, Ar; Hanafiah, J Muhamad; Shaharuddin, Ms; Ibthisham, A Mohd; Hasmadi, I Mohd; Azhar, Mn Mohamad; Azizan, Hs; Zulfadhli, I; Othman, J; Rozalini, M; Kamarul, Ft

    2012-01-01

    To analyze and characterize a multidisciplinary, integrated indoor air quality checklist for evaluating the health risk of building occupants in a nonindustrial workplace setting. A cross-sectional study based on a participatory occupational health program conducted by the National Institute of Occupational Safety and Health (Malaysia) and Universiti Putra Malaysia. A modified version of the indoor environmental checklist published by the Department of Occupational Health and Safety, based on the literature and discussion with occupational health and safety professionals, was used in the evaluation process. Summated scores were given according to the cluster analysis and principal component analysis in the characterization of risk. Environmetric techniques was used to classify the risk of variables in the checklist. Identification of the possible source of item pollutants was also evaluated from a semiquantitative approach. Hierarchical agglomerative cluster analysis resulted in the grouping of factorial components into three clusters (high complaint, moderate-high complaint, moderate complaint), which were further analyzed by discriminant analysis. From this, 15 major variables that influence indoor air quality were determined. Principal component analysis of each cluster revealed that the main factors influencing the high complaint group were fungal-related problems, chemical indoor dispersion, detergent, renovation, thermal comfort, and location of fresh air intake. The moderate-high complaint group showed significant high loading on ventilation, air filters, and smoking-related activities. The moderate complaint group showed high loading on dampness, odor, and thermal comfort. This semiquantitative assessment, which graded risk from low to high based on the intensity of the problem, shows promising and reliable results. It should be used as an important tool in the preliminary assessment of indoor air quality and as a categorizing method for further IAQ investigations and complaints procedures.

  7. Heavy metal contamination of agricultural soils affected by mining activities around the Ganxi River in Chenzhou, Southern China.

    PubMed

    Ma, Li; Sun, Jing; Yang, Zhaoguang; Wang, Lin

    2015-12-01

    Heavy metal contamination attracted a wide spread attention due to their strong toxicity and persistence. The Ganxi River, located in Chenzhou City, Southern China, has been severely polluted by lead/zinc ore mining activities. This work investigated the heavy metal pollution in agricultural soils around the Ganxi River. The total concentrations of heavy metals were determined by inductively coupled plasma-mass spectrometry. The potential risk associated with the heavy metals in soil was assessed by Nemerow comprehensive index and potential ecological risk index. In both methods, the study area was rated as very high risk. Multivariate statistical methods including Pearson's correlation analysis, hierarchical cluster analysis, and principal component analysis were employed to evaluate the relationships between heavy metals, as well as the correlation between heavy metals and pH, to identify the metal sources. Three distinct clusters have been observed by hierarchical cluster analysis. In principal component analysis, a total of two components were extracted to explain over 90% of the total variance, both of which were associated with anthropogenic sources.

  8. Use of multivariate statistics to identify unreliable data obtained using CASA.

    PubMed

    Martínez, Luis Becerril; Crispín, Rubén Huerta; Mendoza, Maximino Méndez; Gallegos, Oswaldo Hernández; Martínez, Andrés Aragón

    2013-06-01

    In order to identify unreliable data in a dataset of motility parameters obtained from a pilot study acquired by a veterinarian with experience in boar semen handling, but without experience in the operation of a computer assisted sperm analysis (CASA) system, a multivariate graphical and statistical analysis was performed. Sixteen boar semen samples were aliquoted then incubated with varying concentrations of progesterone from 0 to 3.33 µg/ml and analyzed in a CASA system. After standardization of the data, Chernoff faces were pictured for each measurement, and a principal component analysis (PCA) was used to reduce the dimensionality and pre-process the data before hierarchical clustering. The first twelve individual measurements showed abnormal features when Chernoff faces were drawn. PCA revealed that principal components 1 and 2 explained 63.08% of the variance in the dataset. Values of principal components for each individual measurement of semen samples were mapped to identify differences among treatment or among boars. Twelve individual measurements presented low values of principal component 1. Confidence ellipses on the map of principal components showed no statistically significant effects for treatment or boar. Hierarchical clustering realized on two first principal components produced three clusters. Cluster 1 contained evaluations of the two first samples in each treatment, each one of a different boar. With the exception of one individual measurement, all other measurements in cluster 1 were the same as observed in abnormal Chernoff faces. Unreliable data in cluster 1 are probably related to the operator inexperience with a CASA system. These findings could be used to objectively evaluate the skill level of an operator of a CASA system. This may be particularly useful in the quality control of semen analysis using CASA systems.

  9. Principal Component and Linkage Analysis of Cardiovascular Risk Traits in the Norfolk Isolate

    PubMed Central

    Cox, Hannah C.; Bellis, Claire; Lea, Rod A.; Quinlan, Sharon; Hughes, Roger; Dyer, Thomas; Charlesworth, Jac; Blangero, John; Griffiths, Lyn R.

    2009-01-01

    Objective(s) An individual's risk of developing cardiovascular disease (CVD) is influenced by genetic factors. This study focussed on mapping genetic loci for CVD-risk traits in a unique population isolate derived from Norfolk Island. Methods This investigation focussed on 377 individuals descended from the population founders. Principal component analysis was used to extract orthogonal components from 11 cardiovascular risk traits. Multipoint variance component methods were used to assess genome-wide linkage using SOLAR to the derived factors. A total of 285 of the 377 related individuals were informative for linkage analysis. Results A total of 4 principal components accounting for 83% of the total variance were derived. Principal component 1 was loaded with body size indicators; principal component 2 with body size, cholesterol and triglyceride levels; principal component 3 with the blood pressures; and principal component 4 with LDL-cholesterol and total cholesterol levels. Suggestive evidence of linkage for principal component 2 (h2 = 0.35) was observed on chromosome 5q35 (LOD = 1.85; p = 0.0008). While peak regions on chromosome 10p11.2 (LOD = 1.27; p = 0.005) and 12q13 (LOD = 1.63; p = 0.003) were observed to segregate with principal components 1 (h2 = 0.33) and 4 (h2 = 0.42), respectively. Conclusion(s): This study investigated a number of CVD risk traits in a unique isolated population. Findings support the clustering of CVD risk traits and provide interesting evidence of a region on chromosome 5q35 segregating with weight, waist circumference, HDL-c and total triglyceride levels. PMID:19339786

  10. The fine-scale genetic structure and evolution of the Japanese population

    PubMed Central

    Katsuya, Tomohiro; Kimura, Ryosuke; Nabika, Toru; Isomura, Minoru; Ohkubo, Takayoshi; Tabara, Yasuharu; Yamamoto, Ken; Yokota, Mitsuhiro; Liu, Xuanyao; Saw, Woei-Yuh; Mamatyusupu, Dolikun; Yang, Wenjun; Xu, Shuhua

    2017-01-01

    The contemporary Japanese populations largely consist of three genetically distinct groups—Hondo, Ryukyu and Ainu. By principal-component analysis, while the three groups can be clearly separated, the Hondo people, comprising 99% of the Japanese, form one almost indistinguishable cluster. To understand fine-scale genetic structure, we applied powerful haplotype-based statistical methods to genome-wide single nucleotide polymorphism data from 1600 Japanese individuals, sampled from eight distinct regions in Japan. We then combined the Japanese data with 26 other Asian populations data to analyze the shared ancestry and genetic differentiation. We found that the Japanese could be separated into nine genetic clusters in our dataset, showing a marked concordance with geography; and that major components of ancestry profile of Japanese were from the Korean and Han Chinese clusters. We also detected and dated admixture in the Japanese. While genetic differentiation between Ryukyu and Hondo was suggested to be caused in part by positive selection, genetic differentiation among the Hondo clusters appeared to result principally from genetic drift. Notably, in Asians, we found the possibility that positive selection accentuated genetic differentiation among distant populations but attenuated genetic differentiation among close populations. These findings are significant for studies of human evolution and medical genetics. PMID:29091727

  11. A Network View on Psychiatric Disorders: Network Clusters of Symptoms as Elementary Syndromes of Psychopathology

    PubMed Central

    Goekoop, Rutger; Goekoop, Jaap G.

    2014-01-01

    Introduction The vast number of psychopathological syndromes that can be observed in clinical practice can be described in terms of a limited number of elementary syndromes that are differentially expressed. Previous attempts to identify elementary syndromes have shown limitations that have slowed progress in the taxonomy of psychiatric disorders. Aim To examine the ability of network community detection (NCD) to identify elementary syndromes of psychopathology and move beyond the limitations of current classification methods in psychiatry. Methods 192 patients with unselected mental disorders were tested on the Comprehensive Psychopathological Rating Scale (CPRS). Principal component analysis (PCA) was performed on the bootstrapped correlation matrix of symptom scores to extract the principal component structure (PCS). An undirected and weighted network graph was constructed from the same matrix. Network community structure (NCS) was optimized using a previously published technique. Results In the optimal network structure, network clusters showed a 89% match with principal components of psychopathology. Some 6 network clusters were found, including "DEPRESSION", "MANIA", “ANXIETY”, "PSYCHOSIS", "RETARDATION", and "BEHAVIORAL DISORGANIZATION". Network metrics were used to quantify the continuities between the elementary syndromes. Conclusion We present the first comprehensive network graph of psychopathology that is free from the biases of previous classifications: a ‘Psychopathology Web’. Clusters within this network represent elementary syndromes that are connected via a limited number of bridge symptoms. Many problems of previous classifications can be overcome by using a network approach to psychopathology. PMID:25427156

  12. [Principal component analysis and cluster analysis of inorganic elements in sea cucumber Apostichopus japonicus].

    PubMed

    Liu, Xiao-Fang; Xue, Chang-Hu; Wang, Yu-Ming; Li, Zhao-Jie; Xue, Yong; Xu, Jie

    2011-11-01

    The present study is to investigate the feasibility of multi-elements analysis in determination of the geographical origin of sea cucumber Apostichopus japonicus, and to make choice of the effective tracers in sea cucumber Apostichopus japonicus geographical origin assessment. The content of the elements such as Al, V, Cr, Mn, Fe, Co, Ni, Cu, Zn, As, Se, Mo, Cd, Hg and Pb in sea cucumber Apostichopus japonicus samples from seven places of geographical origin were determined by means of ICP-MS. The results were used for the development of elements database. Cluster analysis(CA) and principal component analysis (PCA) were applied to differentiate the sea cucumber Apostichopus japonicus geographical origin. Three principal components which accounted for over 89% of the total variance were extracted from the standardized data. The results of Q-type cluster analysis showed that the 26 samples could be clustered reasonably into five groups, the classification results were significantly associated with the marine distribution of the sea cucumber Apostichopus japonicus samples. The CA and PCA were the effective methods for elements analysis of sea cucumber Apostichopus japonicus samples. The content of the mineral elements in sea cucumber Apostichopus japonicus samples was good chemical descriptors for differentiating their geographical origins.

  13. Unsupervised spike sorting based on discriminative subspace learning.

    PubMed

    Keshtkaran, Mohammad Reza; Yang, Zhi

    2014-01-01

    Spike sorting is a fundamental preprocessing step for many neuroscience studies which rely on the analysis of spike trains. In this paper, we present two unsupervised spike sorting algorithms based on discriminative subspace learning. The first algorithm simultaneously learns the discriminative feature subspace and performs clustering. It uses histogram of features in the most discriminative projection to detect the number of neurons. The second algorithm performs hierarchical divisive clustering that learns a discriminative 1-dimensional subspace for clustering in each level of the hierarchy until achieving almost unimodal distribution in the subspace. The algorithms are tested on synthetic and in-vivo data, and are compared against two widely used spike sorting methods. The comparative results demonstrate that our spike sorting methods can achieve substantially higher accuracy in lower dimensional feature space, and they are highly robust to noise. Moreover, they provide significantly better cluster separability in the learned subspace than in the subspace obtained by principal component analysis or wavelet transform.

  14. Measuring the Indonesian provinces competitiveness by using PCA technique

    NASA Astrophysics Data System (ADS)

    Runita, Ditha; Fajriyah, Rohmatul

    2017-12-01

    Indonesia is a country which has vast teritoty. It has 34 provinces. Building local competitiveness is critical to enhance the long-term national competitiveness especially for a country as diverse as Indonesia. A competitive local government can attract and maintain successful firms and increase living standards for its inhabitants, because investment and skilled workers gravitate from uncompetitive regions to more competitive ones. Altough there are other methods to measuring competitiveness, but here we have demonstrated a simple method using principal component analysis (PCA). It can directly be applied to correlated, multivariate data. The analysis on Indonesian provinces provides 3 clusters based on the competitiveness measurement and the clusters are Bad, Good and Best perform provinces.

  15. [Discrimination of types of polyacrylamide based on near infrared spectroscopy coupled with least square support vector machine].

    PubMed

    Zhang, Hong-Guang; Yang, Qin-Min; Lu, Jian-Gang

    2014-04-01

    In this paper, a novel discriminant methodology based on near infrared spectroscopic analysis technique and least square support vector machine was proposed for rapid and nondestructive discrimination of different types of Polyacrylamide. The diffuse reflectance spectra of samples of Non-ionic Polyacrylamide, Anionic Polyacrylamide and Cationic Polyacrylamide were measured. Then principal component analysis method was applied to reduce the dimension of the spectral data and extract of the principal compnents. The first three principal components were used for cluster analysis of the three different types of Polyacrylamide. Then those principal components were also used as inputs of least square support vector machine model. The optimization of the parameters and the number of principal components used as inputs of least square support vector machine model was performed through cross validation based on grid search. 60 samples of each type of Polyacrylamide were collected. Thus a total of 180 samples were obtained. 135 samples, 45 samples for each type of Polyacrylamide, were randomly split into a training set to build calibration model and the rest 45 samples were used as test set to evaluate the performance of the developed model. In addition, 5 Cationic Polyacrylamide samples and 5 Anionic Polyacrylamide samples adulterated with different proportion of Non-ionic Polyacrylamide were also prepared to show the feasibilty of the proposed method to discriminate the adulterated Polyacrylamide samples. The prediction error threshold for each type of Polyacrylamide was determined by F statistical significance test method based on the prediction error of the training set of corresponding type of Polyacrylamide in cross validation. The discrimination accuracy of the built model was 100% for prediction of the test set. The prediction of the model for the 10 mixing samples was also presented, and all mixing samples were accurately discriminated as adulterated samples. The overall results demonstrate that the discrimination method proposed in the present paper can rapidly and nondestructively discriminate the different types of Polyacrylamide and the adulterated Polyacrylamide samples, and offered a new approach to discriminate the types of Polyacrylamide.

  16. Method for exploratory cluster analysis and visualisation of single-trial ERP ensembles.

    PubMed

    Williams, N J; Nasuto, S J; Saddy, J D

    2015-07-30

    The validity of ensemble averaging on event-related potential (ERP) data has been questioned, due to its assumption that the ERP is identical across trials. Thus, there is a need for preliminary testing for cluster structure in the data. We propose a complete pipeline for the cluster analysis of ERP data. To increase the signal-to-noise (SNR) ratio of the raw single-trials, we used a denoising method based on Empirical Mode Decomposition (EMD). Next, we used a bootstrap-based method to determine the number of clusters, through a measure called the Stability Index (SI). We then used a clustering algorithm based on a Genetic Algorithm (GA) to define initial cluster centroids for subsequent k-means clustering. Finally, we visualised the clustering results through a scheme based on Principal Component Analysis (PCA). After validating the pipeline on simulated data, we tested it on data from two experiments - a P300 speller paradigm on a single subject and a language processing study on 25 subjects. Results revealed evidence for the existence of 6 clusters in one experimental condition from the language processing study. Further, a two-way chi-square test revealed an influence of subject on cluster membership. Our analysis operates on denoised single-trials, the number of clusters are determined in a principled manner and the results are presented through an intuitive visualisation. Given the cluster structure in some experimental conditions, we suggest application of cluster analysis as a preliminary step before ensemble averaging. Copyright © 2015 Elsevier B.V. All rights reserved.

  17. Diversity in phenotypic and nutritional traits in vegetable amaranth (Amaranthus tricolor), a nutritionally underutilised crop.

    PubMed

    Shukla, Sudhir; Bhargava, Atul; Chatterjee, Avijeet; Pandey, Avinash Chandra; Mishra, Brij K

    2010-01-15

    Assessment of genetic diversity in a crop-breeding programme helps in the identification of diverse parental combinations to create segregating progenies with maximum genetic variability and facilitates introgression of desirable genes from diverse germplasm into the available genetic base. In the present study, 39 strains of vegetable amaranth (Amaranthus tricolor) were evaluated for eight morphological and seven quality traits for two test seasons to study the extent of genetic divergence among the strains. Multivariate analysis showed that the first four principal components contributed 67.55% of the variability. Cluster analysis grouped the strains into six clusters that displayed a wide range of diversity for most of the traits. Cluster analysis has proved to be an effective method in grouping strains that may facilitate effective management and utilisation in crop-breeding programmes. The diverse strains falling in different clusters were identified, which can be utilised in different hybridisation programmes to develop high-foliage-yielding varieties rich in nutritional components. Copyright (c) 2009 Society of Chemical Industry.

  18. [Research on spectra recognition method for cabbages and weeds based on PCA and SIMCA].

    PubMed

    Zu, Qin; Deng, Wei; Wang, Xiu; Zhao, Chun-Jiang

    2013-10-01

    In order to improve the accuracy and efficiency of weed identification, the difference of spectral reflectance was employed to distinguish between crops and weeds. Firstly, the different combinations of Savitzky-Golay (SG) convolutional derivation and multiplicative scattering correction (MSC) method were applied to preprocess the raw spectral data. Then the clustering analysis of various types of plants was completed by using principal component analysis (PCA) method, and the feature wavelengths which were sensitive for classifying various types of plants were extracted according to the corresponding loading plots of the optimal principal components in PCA results. Finally, setting the feature wavelengths as the input variables, the soft independent modeling of class analogy (SIMCA) classification method was used to identify the various types of plants. The experimental results of classifying cabbages and weeds showed that on the basis of the optimal pretreatment by a synthetic application of MSC and SG convolutional derivation with SG's parameters set as 1rd order derivation, 3th degree polynomial and 51 smoothing points, 23 feature wavelengths were extracted in accordance with the top three principal components in PCA results. When SIMCA method was used for classification while the previously selected 23 feature wavelengths were set as the input variables, the classification rates of the modeling set and the prediction set were respectively up to 98.6% and 100%.

  19. [Study on discrimination of varieties of fire resistive coating for steel structure based on near-infrared spectroscopy].

    PubMed

    Xue, Gang; Song, Wen-qi; Li, Shu-chao

    2015-01-01

    In order to achieve the rapid identification of fire resistive coating for steel structure of different brands in circulating, a new method for the fast discrimination of varieties of fire resistive coating for steel structure by means of near infrared spectroscopy was proposed. The raster scanning near infrared spectroscopy instrument and near infrared diffuse reflectance spectroscopy were applied to collect the spectral curve of different brands of fire resistive coating for steel structure and the spectral data were preprocessed with standard normal variate transformation(standard normal variate transformation, SNV) and Norris second derivative. The principal component analysis (principal component analysis, PCA)was used to near infrared spectra for cluster analysis. The analysis results showed that the cumulate reliabilities of PC1 to PC5 were 99. 791%. The 3-dimentional plot was drawn with the scores of PC1, PC2 and PC3 X 10, which appeared to provide the best clustering of the varieties of fire resistive coating for steel structure. A total of 150 fire resistive coating samples were divided into calibration set and validation set randomly, the calibration set had 125 samples with 25 samples of each variety, and the validation set had 25 samples with 5 samples of each variety. According to the principal component scores of unknown samples, Mahalanobis distance values between each variety and unknown samples were calculated to realize the discrimination of different varieties. The qualitative analysis model for external verification of unknown samples is a 10% recognition ration. The results demonstrated that this identification method can be used as a rapid, accurate method to identify the classification of fire resistive coating for steel structure and provide technical reference for market regulation.

  20. Chemometric expertise of the quality of groundwater sources for domestic use.

    PubMed

    Spanos, Thomas; Ene, Antoaneta; Simeonova, Pavlina

    2015-01-01

    In the present study 49 representative sites have been selected for the collection of water samples from central water supplies with different geographical locations in the region of Kavala, Northern Greece. Ten physicochemical parameters (pH, electric conductivity, nitrate, chloride, sodium, potassium, total alkalinity, total hardness, bicarbonate and calcium) were analyzed monthly, in the period from January 2010 to December 2010. Chemometric methods were used for monitoring data mining and interpretation (cluster analysis, principal components analysis and source apportioning by principal components regression). The clustering of the chemical indicators delivers two major clusters related to the water hardness and the mineral components (impacted by sea, bedrock and acidity factors). The sampling locations are separated into three major clusters corresponding to the spatial distribution of the sites - coastal, lowland and semi-mountainous. The principal components analysis reveals two latent factors responsible for the data structures, which are also an indication for the sources determining the groundwater quality of the region (conditionally named "mineral" factor and "water hardness" factor). By the apportionment approach it is shown what the contribution is of each of the identified sources to the formation of the total concentration of each one of the chemical parameters. The mean values of the studied physicochemical parameters were found to be within the limits given in the 98/83/EC Directive. The water samples are appropriate for human consumption. The results of this study provide an overview of the hydrogeological profile of water supply system for the studied area.

  1. A network view on psychiatric disorders: network clusters of symptoms as elementary syndromes of psychopathology.

    PubMed

    Goekoop, Rutger; Goekoop, Jaap G

    2014-01-01

    The vast number of psychopathological syndromes that can be observed in clinical practice can be described in terms of a limited number of elementary syndromes that are differentially expressed. Previous attempts to identify elementary syndromes have shown limitations that have slowed progress in the taxonomy of psychiatric disorders. To examine the ability of network community detection (NCD) to identify elementary syndromes of psychopathology and move beyond the limitations of current classification methods in psychiatry. 192 patients with unselected mental disorders were tested on the Comprehensive Psychopathological Rating Scale (CPRS). Principal component analysis (PCA) was performed on the bootstrapped correlation matrix of symptom scores to extract the principal component structure (PCS). An undirected and weighted network graph was constructed from the same matrix. Network community structure (NCS) was optimized using a previously published technique. In the optimal network structure, network clusters showed a 89% match with principal components of psychopathology. Some 6 network clusters were found, including "Depression", "Mania", "Anxiety", "Psychosis", "Retardation", and "Behavioral Disorganization". Network metrics were used to quantify the continuities between the elementary syndromes. We present the first comprehensive network graph of psychopathology that is free from the biases of previous classifications: a 'Psychopathology Web'. Clusters within this network represent elementary syndromes that are connected via a limited number of bridge symptoms. Many problems of previous classifications can be overcome by using a network approach to psychopathology.

  2. Cluster-based exposure variation analysis

    PubMed Central

    2013-01-01

    Background Static posture, repetitive movements and lack of physical variation are known risk factors for work-related musculoskeletal disorders, and thus needs to be properly assessed in occupational studies. The aims of this study were (i) to investigate the effectiveness of a conventional exposure variation analysis (EVA) in discriminating exposure time lines and (ii) to compare it with a new cluster-based method for analysis of exposure variation. Methods For this purpose, we simulated a repeated cyclic exposure varying within each cycle between “low” and “high” exposure levels in a “near” or “far” range, and with “low” or “high” velocities (exposure change rates). The duration of each cycle was also manipulated by selecting a “small” or “large” standard deviation of the cycle time. Theses parameters reflected three dimensions of exposure variation, i.e. range, frequency and temporal similarity. Each simulation trace included two realizations of 100 concatenated cycles with either low (ρ = 0.1), medium (ρ = 0.5) or high (ρ = 0.9) correlation between the realizations. These traces were analyzed by conventional EVA, and a novel cluster-based EVA (C-EVA). Principal component analysis (PCA) was applied on the marginal distributions of 1) the EVA of each of the realizations (univariate approach), 2) a combination of the EVA of both realizations (multivariate approach) and 3) C-EVA. The least number of principal components describing more than 90% of variability in each case was selected and the projection of marginal distributions along the selected principal component was calculated. A linear classifier was then applied to these projections to discriminate between the simulated exposure patterns, and the accuracy of classified realizations was determined. Results C-EVA classified exposures more correctly than univariate and multivariate EVA approaches; classification accuracy was 49%, 47% and 52% for EVA (univariate and multivariate), and C-EVA, respectively (p < 0.001). All three methods performed poorly in discriminating exposure patterns differing with respect to the variability in cycle time duration. Conclusion While C-EVA had a higher accuracy than conventional EVA, both failed to detect differences in temporal similarity. The data-driven optimality of data reduction and the capability of handling multiple exposure time lines in a single analysis are the advantages of the C-EVA. PMID:23557439

  3. Towards the identification of plant and animal binders on Australian stone knives.

    PubMed

    Blee, Alisa J; Walshe, Keryn; Pring, Allan; Quinton, Jamie S; Lenehan, Claire E

    2010-07-15

    There is limited information regarding the nature of plant and animal residues used as adhesives, fixatives and pigments found on Australian Aboriginal artefacts. This paper reports the use of FTIR in combination with the chemometric tools principal component analysis (PCA) and hierarchical clustering (HC) for the analysis and identification of Australian plant and animal fixatives on Australian stone artefacts. Ten different plant and animal residues were able to be discriminated from each other at a species level by combining FTIR spectroscopy with the chemometric data analysis methods, principal component analysis (PCA) and hierarchical clustering (HC). Application of this method to residues from three broken stone knives from the collections of the South Australian Museum indicated that two of the handles of knives were likely to have contained beeswax as the fixative whilst Spinifex resin was the probable binder on the third. Copyright 2010 Elsevier B.V. All rights reserved.

  4. Rosacea assessment by erythema index and principal component analysis segmentation maps

    NASA Astrophysics Data System (ADS)

    Kuzmina, Ilona; Rubins, Uldis; Saknite, Inga; Spigulis, Janis

    2017-12-01

    RGB images of rosacea were analyzed using segmentation maps of principal component analysis (PCA) and erythema index (EI). Areas of segmented clusters were compared to Clinician's Erythema Assessment (CEA) values given by two dermatologists. The results show that visible blood vessels are segmented more precisely on maps of the erythema index and the third principal component (PC3). In many cases, a distribution of clusters on EI and PC3 maps are very similar. Mean values of clusters' areas on these maps show a decrease of the area of blood vessels and erythema and an increase of lighter skin area after the therapy for the patients with diagnosis CEA = 2 on the first visit and CEA=1 on the second visit. This study shows that EI and PC3 maps are more useful than the maps of the first (PC1) and second (PC2) principal components for indicating vascular structures and erythema on the skin of rosacea patients and therapy monitoring.

  5. Multivariate Statistical Analysis of MSL APXS Bulk Geochemical Data

    NASA Astrophysics Data System (ADS)

    Hamilton, V. E.; Edwards, C. S.; Thompson, L. M.; Schmidt, M. E.

    2014-12-01

    We apply cluster and factor analyses to bulk chemical data of 130 soil and rock samples measured by the Alpha Particle X-ray Spectrometer (APXS) on the Mars Science Laboratory (MSL) rover Curiosity through sol 650. Multivariate approaches such as principal components analysis (PCA), cluster analysis, and factor analysis compliment more traditional approaches (e.g., Harker diagrams), with the advantage of simultaneously examining the relationships between multiple variables for large numbers of samples. Principal components analysis has been applied with success to APXS, Pancam, and Mössbauer data from the Mars Exploration Rovers. Factor analysis and cluster analysis have been applied with success to thermal infrared (TIR) spectral data of Mars. Cluster analyses group the input data by similarity, where there are a number of different methods for defining similarity (hierarchical, density, distribution, etc.). For example, without any assumptions about the chemical contributions of surface dust, preliminary hierarchical and K-means cluster analyses clearly distinguish the physically adjacent rock targets Windjana and Stephen as being distinctly different than lithologies observed prior to Curiosity's arrival at The Kimberley. In addition, they are separated from each other, consistent with chemical trends observed in variation diagrams but without requiring assumptions about chemical relationships. We will discuss the variation in cluster analysis results as a function of clustering method and pre-processing (e.g., log transformation, correction for dust cover) and implications for interpreting chemical data. Factor analysis shares some similarities with PCA, and examines the variability among observed components of a dataset so as to reveal variations attributable to unobserved components. Factor analysis has been used to extract the TIR spectra of components that are typically observed in mixtures and only rarely in isolation; there is the potential for similar results with data from APXS. These techniques offer new ways to understand the chemical relationships between the materials interrogated by Curiosity, and potentially their relation to materials observed by APXS instruments on other landed missions.

  6. Transforming Graph Data for Statistical Relational Learning

    DTIC Science & Technology

    2012-10-01

    Jordan, 2003), PLSA (Hofmann, 1999), ? Classification via RMN (Taskar et al., 2003) or SVM (Hasan, Chaoji, Salem , & Zaki, 2006) ? Hierarchical...dimensionality reduction methods such as Principal 407 Rossi, McDowell, Aha, & Neville Component Analysis (PCA), Principal Factor Analysis ( PFA ), and...clustering algorithm. Journal of the Royal Statistical Society. Series C, Applied statistics, 28, 100–108. Hasan, M. A., Chaoji, V., Salem , S., & Zaki, M

  7. SELF-ORGANIZING MAPS FOR INTEGRATED ASSESSMENT OF THE MID-ATLANTIC REGION

    EPA Science Inventory

    A. new method was developed to perform an environmental assessment for the
    Mid-Atlantic Region (MAR). This was a combination of the self-organizing map (SOM) neural network and principal component analysis (PCA). The method is capable of clustering ecosystems in terms of envi...

  8. Dynamic competitive probabilistic principal components analysis.

    PubMed

    López-Rubio, Ezequiel; Ortiz-DE-Lazcano-Lobato, Juan Miguel

    2009-04-01

    We present a new neural model which extends the classical competitive learning (CL) by performing a Probabilistic Principal Components Analysis (PPCA) at each neuron. The model also has the ability to learn the number of basis vectors required to represent the principal directions of each cluster, so it overcomes a drawback of most local PCA models, where the dimensionality of a cluster must be fixed a priori. Experimental results are presented to show the performance of the network with multispectral image data.

  9. Analysis of genetic diversity in banana cultivars (Musa cvs.) from the South of Oman using AFLP markers and classification by phylogenetic, hierarchical clustering and principal component analyses*

    PubMed Central

    Opara, Umezuruike Linus; Jacobson, Dan; Al-Saady, Nadiya Abubakar

    2010-01-01

    Banana is an important crop grown in Oman and there is a dearth of information on its genetic diversity to assist in crop breeding and improvement programs. This study employed amplified fragment length polymorphism (AFLP) to investigate the genetic variation in local banana cultivars from the southern region of Oman. Using 12 primer combinations, a total of 1094 bands were scored, of which 1012 were polymorphic. Eighty-two unique markers were identified, which revealed the distinct separation of the seven cultivars. The results obtained show that AFLP can be used to differentiate the banana cultivars. Further classification by phylogenetic, hierarchical clustering and principal component analyses showed significant differences between the clusters found with molecular markers and those clusters created by previous studies using morphological analysis. Based on the analytical results, a consensus dendrogram of the banana cultivars is presented. PMID:20443211

  10. Statistical classification of hydrogeologic regions in the fractured rock area of Maryland and parts of the District of Columbia, Virginia, West Virginia, Pennsylvania, and Delaware

    USGS Publications Warehouse

    Fleming, Brandon J.; LaMotte, Andrew E.; Sekellick, Andrew J.

    2013-01-01

    Hydrogeologic regions in the fractured rock area of Maryland were classified using geographic information system tools with principal components and cluster analyses. A study area consisting of the 8-digit Hydrologic Unit Code (HUC) watersheds with rivers that flow through the fractured rock area of Maryland and bounded by the Fall Line was further subdivided into 21,431 catchments from the National Hydrography Dataset Plus. The catchments were then used as a common hydrologic unit to compile relevant climatic, topographic, and geologic variables. A principal components analysis was performed on 10 input variables, and 4 principal components that accounted for 83 percent of the variability in the original data were identified. A subsequent cluster analysis grouped the catchments based on four principal component scores into six hydrogeologic regions. Two crystalline rock hydrogeologic regions, including large parts of the Washington, D.C. and Baltimore metropolitan regions that represent over 50 percent of the fractured rock area of Maryland, are distinguished by differences in recharge, Precipitation minus Potential Evapotranspiration, sand content in soils, and groundwater contributions to streams. This classification system will provide a georeferenced digital hydrogeologic framework for future investigations of groundwater availability in the fractured rock area of Maryland.

  11. Lightweight biometric detection system for human classification using pyroelectric infrared detectors.

    PubMed

    Burchett, John; Shankar, Mohan; Hamza, A Ben; Guenther, Bob D; Pitsianis, Nikos; Brady, David J

    2006-05-01

    We use pyroelectric detectors that are differential in nature to detect motion in humans by their heat emissions. Coded Fresnel lens arrays create boundaries that help to localize humans in space as well as to classify the nature of their motion. We design and implement a low-cost biometric tracking system by using off-the-shelf components. We demonstrate two classification methods by using data gathered from sensor clusters of dual-element pyroelectric detectors with coded Fresnel lens arrays. We propose two algorithms for person identification, a more generalized spectral clustering method and a more rigorous example that uses principal component regression to perform a blind classification.

  12. [Methods of a posteriori identification of food patterns in Brazilian children: a systematic review].

    PubMed

    Carvalho, Carolina Abreu de; Fonsêca, Poliana Cristina de Almeida; Nobre, Luciana Neri; Priore, Silvia Eloiza; Franceschini, Sylvia do Carmo Castro

    2016-01-01

    The objective of this study is to provide guidance for identifying dietary patterns using the a posteriori approach, and analyze the methodological aspects of the studies conducted in Brazil that identified the dietary patterns of children. Articles were selected from the Latin American and Caribbean Literature on Health Sciences, Scientific Electronic Library Online and Pubmed databases. The key words were: Dietary pattern; Food pattern; Principal Components Analysis; Factor analysis; Cluster analysis; Reduced rank regression. We included studies that identified dietary patterns of children using the a posteriori approach. Seven studies published between 2007 and 2014 were selected, six of which were cross-sectional and one cohort, Five studies used the food frequency questionnaire for dietary assessment; one used a 24-hour dietary recall and the other a food list. The method of exploratory approach used in most publications was principal components factor analysis, followed by cluster analysis. The sample size of the studies ranged from 232 to 4231, the values of the Kaiser-Meyer-Olkin test from 0.524 to 0.873, and Cronbach's alpha from 0.51 to 0.69. Few Brazilian studies identified dietary patterns of children using the a posteriori approach and principal components factor analysis was the technique most used.

  13. Clustering high dimensional data using RIA

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Aziz, Nazrina

    2015-05-15

    Clustering may simply represent a convenient method for organizing a large data set so that it can easily be understood and information can efficiently be retrieved. However, identifying cluster in high dimensionality data sets is a difficult task because of the curse of dimensionality. Another challenge in clustering is some traditional functions cannot capture the pattern dissimilarity among objects. In this article, we used an alternative dissimilarity measurement called Robust Influence Angle (RIA) in the partitioning method. RIA is developed using eigenstructure of the covariance matrix and robust principal component score. We notice that, it can obtain cluster easily andmore » hence avoid the curse of dimensionality. It is also manage to cluster large data sets with mixed numeric and categorical value.« less

  14. Identification among morphologically similar Argyreia (Convolvulaceae) based on leaf anatomy and phenetic analyses.

    PubMed

    Traiperm, Paweena; Chow, Janene; Nopun, Possathorn; Staples, G; Swangpol, Sasivimon C

    2017-12-01

    The genus Argyreia Lour. is one of the species-rich Asian genera in the family Convolvulaceae. Several species complexes were recognized in which taxon delimitation was imprecise, especially when examining herbarium materials without fully developed open flowers. The main goal of this study is to investigate and describe leaf anatomy for some morphologically similar Argyreia using epidermal peeling, leaf and petiole transverse sections, and scanning electron microscopy. Phenetic analyses including cluster analysis and principal component analysis were used to investigate the similarity of these morpho-types. Anatomical differences observed between the morpho-types include epidermal cell walls and the trichome types on the leaf epidermis. Additional differences in the leaf and petiole transverse sections include the epidermal cell shape of the adaxial leaf blade, the leaf margins, and the petiole transverse sectional outline. The phenogram from cluster analysis using the UPGMA method represented four groups with an R value of 0.87. Moreover, the important quantitative and qualitative leaf anatomical traits of the four groups were confirmed by the principal component analysis of the first two components. The results from phenetic analyses confirmed the anatomical differentiation between the morpho-types. Leaf anatomical features regarded as particularly informative for morpho-type differentiation can be used to supplement macro morphological identification.

  15. An efficient method to identify differentially expressed genes in microarray experiments

    PubMed Central

    Qin, Huaizhen; Feng, Tao; Harding, Scott A.; Tsai, Chung-Jui; Zhang, Shuanglin

    2013-01-01

    Motivation Microarray experiments typically analyze thousands to tens of thousands of genes from small numbers of biological replicates. The fact that genes are normally expressed in functionally relevant patterns suggests that gene-expression data can be stratified and clustered into relatively homogenous groups. Cluster-wise dimensionality reduction should make it feasible to improve screening power while minimizing information loss. Results We propose a powerful and computationally simple method for finding differentially expressed genes in small microarray experiments. The method incorporates a novel stratification-based tight clustering algorithm, principal component analysis and information pooling. Comprehensive simulations show that our method is substantially more powerful than the popular SAM and eBayes approaches. We applied the method to three real microarray datasets: one from a Populus nitrogen stress experiment with 3 biological replicates; and two from public microarray datasets of human cancers with 10 to 40 biological replicates. In all three analyses, our method proved more robust than the popular alternatives for identification of differentially expressed genes. Availability The C++ code to implement the proposed method is available upon request for academic use. PMID:18453554

  16. Reformulation of Traditional Chamomile Oil: Quality Controls and Fingerprint Presentation Based on Cluster Analysis of Attenuated Total Reflectance–Infrared Spectral Data

    PubMed Central

    Sakhteman, Amirhossein; Faridi, Pouya; Daneshamouz, Saeid; Akbarizadeh, Amin Reza; Borhani-Haghighi, Afshin; Mohagheghzadeh, Abdolali

    2017-01-01

    Herbal oils have been widely used in Iran as medicinal compounds dating back to thousands of years in Iran. Chamomile oil is widely used as an example of traditional oil. We remade chamomile oils and tried to modify it with current knowledge and facilities. Six types of oil (traditional and modified) were prepared. Microbial limit tests and physicochemical tests were performed on them. Also, principal component analysis, hierarchical cluster analysis, and partial least squares discriminant analysis were done on the spectral data of attenuated total reflectance–infrared in order to obtain insight based on classification pattern of the samples. The results show that we can use modified versions of the chamomile oils (modified Clevenger-type apparatus method and microwave method) with the same content of traditional ones and with less microbial contaminations and better physicochemical properties. PMID:28585466

  17. Reformulation of Traditional Chamomile Oil: Quality Controls and Fingerprint Presentation Based on Cluster Analysis of Attenuated Total Reflectance-Infrared Spectral Data.

    PubMed

    Zargaran, Arman; Sakhteman, Amirhossein; Faridi, Pouya; Daneshamouz, Saeid; Akbarizadeh, Amin Reza; Borhani-Haghighi, Afshin; Mohagheghzadeh, Abdolali

    2017-10-01

    Herbal oils have been widely used in Iran as medicinal compounds dating back to thousands of years in Iran. Chamomile oil is widely used as an example of traditional oil. We remade chamomile oils and tried to modify it with current knowledge and facilities. Six types of oil (traditional and modified) were prepared. Microbial limit tests and physicochemical tests were performed on them. Also, principal component analysis, hierarchical cluster analysis, and partial least squares discriminant analysis were done on the spectral data of attenuated total reflectance-infrared in order to obtain insight based on classification pattern of the samples. The results show that we can use modified versions of the chamomile oils (modified Clevenger-type apparatus method and microwave method) with the same content of traditional ones and with less microbial contaminations and better physicochemical properties.

  18. A Spacecraft Electrical Characteristics Multi-Label Classification Method Based on Off-Line FCM Clustering and On-Line WPSVM

    PubMed Central

    Li, Ke; Liu, Yi; Wang, Quanxin; Wu, Yalei; Song, Shimin; Sun, Yi; Liu, Tengchong; Wang, Jun; Li, Yang; Du, Shaoyi

    2015-01-01

    This paper proposes a novel multi-label classification method for resolving the spacecraft electrical characteristics problems which involve many unlabeled test data processing, high-dimensional features, long computing time and identification of slow rate. Firstly, both the fuzzy c-means (FCM) offline clustering and the principal component feature extraction algorithms are applied for the feature selection process. Secondly, the approximate weighted proximal support vector machine (WPSVM) online classification algorithms is used to reduce the feature dimension and further improve the rate of recognition for electrical characteristics spacecraft. Finally, the data capture contribution method by using thresholds is proposed to guarantee the validity and consistency of the data selection. The experimental results indicate that the method proposed can obtain better data features of the spacecraft electrical characteristics, improve the accuracy of identification and shorten the computing time effectively. PMID:26544549

  19. Fast clustering algorithm for large ECG data sets based on CS theory in combination with PCA and K-NN methods.

    PubMed

    Balouchestani, Mohammadreza; Krishnan, Sridhar

    2014-01-01

    Long-term recording of Electrocardiogram (ECG) signals plays an important role in health care systems for diagnostic and treatment purposes of heart diseases. Clustering and classification of collecting data are essential parts for detecting concealed information of P-QRS-T waves in the long-term ECG recording. Currently used algorithms do have their share of drawbacks: 1) clustering and classification cannot be done in real time; 2) they suffer from huge energy consumption and load of sampling. These drawbacks motivated us in developing novel optimized clustering algorithm which could easily scan large ECG datasets for establishing low power long-term ECG recording. In this paper, we present an advanced K-means clustering algorithm based on Compressed Sensing (CS) theory as a random sampling procedure. Then, two dimensionality reduction methods: Principal Component Analysis (PCA) and Linear Correlation Coefficient (LCC) followed by sorting the data using the K-Nearest Neighbours (K-NN) and Probabilistic Neural Network (PNN) classifiers are applied to the proposed algorithm. We show our algorithm based on PCA features in combination with K-NN classifier shows better performance than other methods. The proposed algorithm outperforms existing algorithms by increasing 11% classification accuracy. In addition, the proposed algorithm illustrates classification accuracy for K-NN and PNN classifiers, and a Receiver Operating Characteristics (ROC) area of 99.98%, 99.83%, and 99.75% respectively.

  20. Efficient evaluation of sampling quality of molecular dynamics simulations by clustering of dihedral torsion angles and Sammon mapping.

    PubMed

    Frickenhaus, Stephan; Kannan, Srinivasaraghavan; Zacharias, Martin

    2009-02-01

    A direct conformational clustering and mapping approach for peptide conformations based on backbone dihedral angles has been developed and applied to compare conformational sampling of Met-enkephalin using two molecular dynamics (MD) methods. Efficient clustering in dihedrals has been achieved by evaluating all combinations resulting from independent clustering of each dihedral angle distribution, thus resolving all conformational substates. In contrast, Cartesian clustering was unable to accurately distinguish between all substates. Projection of clusters on dihedral principal component (PCA) subspaces did not result in efficient separation of highly populated clusters. However, representation in a nonlinear metric by Sammon mapping was able to separate well the 48 highest populated clusters in just two dimensions. In addition, this approach also allowed us to visualize the transition frequencies between clusters efficiently. Significantly, higher transition frequencies between more distinct conformational substates were found for a recently developed biasing-potential replica exchange MD simulation method allowing faster sampling of possible substates compared to conventional MD simulations. Although the number of theoretically possible clusters grows exponentially with peptide length, in practice, the number of clusters is only limited by the sampling size (typically much smaller), and therefore the method is well suited also for large systems. The approach could be useful to rapidly and accurately evaluate conformational sampling during MD simulations, to compare different sampling strategies and eventually to detect kinetic bottlenecks in folding pathways.

  1. Critical Analysis of Cluster Models and Exchange-Correlation Functionals for Calculating Magnetic Shielding in Molecular Solids.

    PubMed

    Holmes, Sean T; Iuliucci, Robbie J; Mueller, Karl T; Dybowski, Cecil

    2015-11-10

    Calculations of the principal components of magnetic-shielding tensors in crystalline solids require the inclusion of the effects of lattice structure on the local electronic environment to obtain significant agreement with experimental NMR measurements. We assess periodic (GIPAW) and GIAO/symmetry-adapted cluster (SAC) models for computing magnetic-shielding tensors by calculations on a test set containing 72 insulating molecular solids, with a total of 393 principal components of chemical-shift tensors from 13C, 15N, 19F, and 31P sites. When clusters are carefully designed to represent the local solid-state environment and when periodic calculations include sufficient variability, both methods predict magnetic-shielding tensors that agree well with experimental chemical-shift values, demonstrating the correspondence of the two computational techniques. At the basis-set limit, we find that the small differences in the computed values have no statistical significance for three of the four nuclides considered. Subsequently, we explore the effects of additional DFT methods available only with the GIAO/cluster approach, particularly the use of hybrid-GGA functionals, meta-GGA functionals, and hybrid meta-GGA functionals that demonstrate improved agreement in calculations on symmetry-adapted clusters. We demonstrate that meta-GGA functionals improve computed NMR parameters over those obtained by GGA functionals in all cases, and that hybrid functionals improve computed results over the respective pure DFT functional for all nuclides except 15N.

  2. Self-aggregation in scaled principal component space

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ding, Chris H.Q.; He, Xiaofeng; Zha, Hongyuan

    2001-10-05

    Automatic grouping of voluminous data into meaningful structures is a challenging task frequently encountered in broad areas of science, engineering and information processing. These data clustering tasks are frequently performed in Euclidean space or a subspace chosen from principal component analysis (PCA). Here we describe a space obtained by a nonlinear scaling of PCA in which data objects self-aggregate automatically into clusters. Projection into this space gives sharp distinctions among clusters. Gene expression profiles of cancer tissue subtypes, Web hyperlink structure and Internet newsgroups are analyzed to illustrate interesting properties of the space.

  3. [A study of Boletus bicolor from different areas using Fourier transform infrared spectrometry].

    PubMed

    Zhou, Zai-Jin; Liu, Gang; Ren, Xian-Pei

    2010-04-01

    It is hard to differentiate the same species of wild growing mushrooms from different areas by macromorphological features. In this paper, Fourier transform infrared (FTIR) spectroscopy combined with principal component analysis was used to identify 58 samples of boletus bicolor from five different areas. Based on the fingerprint infrared spectrum of boletus bicolor samples, principal component analysis was conducted on 58 boletus bicolor spectra in the range of 1 350-750 cm(-1) using the statistical software SPSS 13.0. According to the result, the accumulated contributing ratio of the first three principal components accounts for 88.87%. They included almost all the information of samples. The two-dimensional projection plot using first and second principal component is a satisfactory clustering effect for the classification and discrimination of boletus bicolor. All boletus bicolor samples were divided into five groups with a classification accuracy of 98.3%. The study demonstrated that wild growing boletus bicolor at species level from different areas can be identified by FTIR spectra combined with principal components analysis.

  4. puma: a Bioconductor package for propagating uncertainty in microarray analysis.

    PubMed

    Pearson, Richard D; Liu, Xuejun; Sanguinetti, Guido; Milo, Marta; Lawrence, Neil D; Rattray, Magnus

    2009-07-09

    Most analyses of microarray data are based on point estimates of expression levels and ignore the uncertainty of such estimates. By determining uncertainties from Affymetrix GeneChip data and propagating these uncertainties to downstream analyses it has been shown that we can improve results of differential expression detection, principal component analysis and clustering. Previously, implementations of these uncertainty propagation methods have only been available as separate packages, written in different languages. Previous implementations have also suffered from being very costly to compute, and in the case of differential expression detection, have been limited in the experimental designs to which they can be applied. puma is a Bioconductor package incorporating a suite of analysis methods for use on Affymetrix GeneChip data. puma extends the differential expression detection methods of previous work from the 2-class case to the multi-factorial case. puma can be used to automatically create design and contrast matrices for typical experimental designs, which can be used both within the package itself but also in other Bioconductor packages. The implementation of differential expression detection methods has been parallelised leading to significant decreases in processing time on a range of computer architectures. puma incorporates the first R implementation of an uncertainty propagation version of principal component analysis, and an implementation of a clustering method based on uncertainty propagation. All of these techniques are brought together in a single, easy-to-use package with clear, task-based documentation. For the first time, the puma package makes a suite of uncertainty propagation methods available to a general audience. These methods can be used to improve results from more traditional analyses of microarray data. puma also offers improvements in terms of scope and speed of execution over previously available methods. puma is recommended for anyone working with the Affymetrix GeneChip platform for gene expression analysis and can also be applied more generally.

  5. The pre-image problem in kernel methods.

    PubMed

    Kwok, James Tin-yau; Tsang, Ivor Wai-hung

    2004-11-01

    In this paper, we address the problem of finding the pre-image of a feature vector in the feature space induced by a kernel. This is of central importance in some kernel applications, such as on using kernel principal component analysis (PCA) for image denoising. Unlike the traditional method which relies on nonlinear optimization, our proposed method directly finds the location of the pre-image based on distance constraints in the feature space. It is noniterative, involves only linear algebra and does not suffer from numerical instability or local minimum problems. Evaluations on performing kernel PCA and kernel clustering on the USPS data set show much improved performance.

  6. [A spatial adaptive algorithm for endmember extraction on multispectral remote sensing image].

    PubMed

    Zhu, Chang-Ming; Luo, Jian-Cheng; Shen, Zhan-Feng; Li, Jun-Li; Hu, Xiao-Dong

    2011-10-01

    Due to the problem that the convex cone analysis (CCA) method can only extract limited endmember in multispectral imagery, this paper proposed a new endmember extraction method by spatial adaptive spectral feature analysis in multispectral remote sensing image based on spatial clustering and imagery slice. Firstly, in order to remove spatial and spectral redundancies, the principal component analysis (PCA) algorithm was used for lowering the dimensions of the multispectral data. Secondly, iterative self-organizing data analysis technology algorithm (ISODATA) was used for image cluster through the similarity of the pixel spectral. And then, through clustering post process and litter clusters combination, we divided the whole image data into several blocks (tiles). Lastly, according to the complexity of image blocks' landscape and the feature of the scatter diagrams analysis, the authors can determine the number of endmembers. Then using hourglass algorithm extracts endmembers. Through the endmember extraction experiment on TM multispectral imagery, the experiment result showed that the method can extract endmember spectra form multispectral imagery effectively. What's more, the method resolved the problem of the amount of endmember limitation and improved accuracy of the endmember extraction. The method has provided a new way for multispectral image endmember extraction.

  7. Accounting for Non-Gaussian Sources of Spatial Correlation in Parametric Functional Magnetic Resonance Imaging Paradigms II: A Method to Obtain First-Level Analysis Residuals with Uniform and Gaussian Spatial Autocorrelation Function and Independent and Identically Distributed Time-Series.

    PubMed

    Gopinath, Kaundinya; Krishnamurthy, Venkatagiri; Lacey, Simon; Sathian, K

    2018-02-01

    In a recent study Eklund et al. have shown that cluster-wise family-wise error (FWE) rate-corrected inferences made in parametric statistical method-based functional magnetic resonance imaging (fMRI) studies over the past couple of decades may have been invalid, particularly for cluster defining thresholds less stringent than p < 0.001; principally because the spatial autocorrelation functions (sACFs) of fMRI data had been modeled incorrectly to follow a Gaussian form, whereas empirical data suggest otherwise. Hence, the residuals from general linear model (GLM)-based fMRI activation estimates in these studies may not have possessed a homogenously Gaussian sACF. Here we propose a method based on the assumption that heterogeneity and non-Gaussianity of the sACF of the first-level GLM analysis residuals, as well as temporal autocorrelations in the first-level voxel residual time-series, are caused by unmodeled MRI signal from neuronal and physiological processes as well as motion and other artifacts, which can be approximated by appropriate decompositions of the first-level residuals with principal component analysis (PCA), and removed. We show that application of this method yields GLM residuals with significantly reduced spatial correlation, nearly Gaussian sACF and uniform spatial smoothness across the brain, thereby allowing valid cluster-based FWE-corrected inferences based on assumption of Gaussian spatial noise. We further show that application of this method renders the voxel time-series of first-level GLM residuals independent, and identically distributed across time (which is a necessary condition for appropriate voxel-level GLM inference), without having to fit ad hoc stochastic colored noise models. Furthermore, the detection power of individual subject brain activation analysis is enhanced. This method will be especially useful for case studies, which rely on first-level GLM analysis inferences.

  8. The Potential of Multivariate Analysis in Assessing Students' Attitude to Curriculum Subjects

    ERIC Educational Resources Information Center

    Gaotlhobogwe, Michael; Laugharne, Janet; Durance, Isabelle

    2011-01-01

    Background: Understanding student attitudes to curriculum subjects is central to providing evidence-based options to policy makers in education. Purpose: We illustrate how quantitative approaches used in the social sciences and based on multivariate analysis (categorical Principal Components Analysis, Clustering Analysis and General Linear…

  9. Symptom Clusters in Advanced Cancer Patients: An Empirical Comparison of Statistical Methods and the Impact on Quality of Life.

    PubMed

    Dong, Skye T; Costa, Daniel S J; Butow, Phyllis N; Lovell, Melanie R; Agar, Meera; Velikova, Galina; Teckle, Paulos; Tong, Allison; Tebbutt, Niall C; Clarke, Stephen J; van der Hoek, Kim; King, Madeleine T; Fayers, Peter M

    2016-01-01

    Symptom clusters in advanced cancer can influence patient outcomes. There is large heterogeneity in the methods used to identify symptom clusters. To investigate the consistency of symptom cluster composition in advanced cancer patients using different statistical methodologies for all patients across five primary cancer sites, and to examine which clusters predict functional status, a global assessment of health and global quality of life. Principal component analysis and exploratory factor analysis (with different rotation and factor selection methods) and hierarchical cluster analysis (with different linkage and similarity measures) were used on a data set of 1562 advanced cancer patients who completed the European Organization for the Research and Treatment of Cancer Quality of Life Questionnaire-Core 30. Four clusters consistently formed for many of the methods and cancer sites: tense-worry-irritable-depressed (emotional cluster), fatigue-pain, nausea-vomiting, and concentration-memory (cognitive cluster). The emotional cluster was a stronger predictor of overall quality of life than the other clusters. Fatigue-pain was a stronger predictor of overall health than the other clusters. The cognitive cluster and fatigue-pain predicted physical functioning, role functioning, and social functioning. The four identified symptom clusters were consistent across statistical methods and cancer types, although there were some noteworthy differences. Statistical derivation of symptom clusters is in need of greater methodological guidance. A psychosocial pathway in the management of symptom clusters may improve quality of life. Biological mechanisms underpinning symptom clusters need to be delineated by future research. A framework for evidence-based screening, assessment, treatment, and follow-up of symptom clusters in advanced cancer is essential. Copyright © 2016 American Academy of Hospice and Palliative Medicine. Published by Elsevier Inc. All rights reserved.

  10. Multivariate analyses of salt stress and metabolite sensing in auto- and heterotroph Chenopodium cell suspensions.

    PubMed

    Wongchai, C; Chaidee, A; Pfeiffer, W

    2012-01-01

    Global warming increases plant salt stress via evaporation after irrigation, but how plant cells sense salt stress remains unknown. Here, we searched for correlation-based targets of salt stress sensing in Chenopodium rubrum cell suspension cultures. We proposed a linkage between the sensing of salt stress and the sensing of distinct metabolites. Consequently, we analysed various extracellular pH signals in autotroph and heterotroph cell suspensions. Our search included signals after 52 treatments: salt and osmotic stress, ion channel inhibitors (amiloride, quinidine), salt-sensing modulators (proline), amino acids, carboxylic acids and regulators (salicylic acid, 2,4-dichlorphenoxyacetic acid). Multivariate analyses revealed hirarchical clusters of signals and five principal components of extracellular proton flux. The principal component correlated with salt stress was an antagonism of γ-aminobutyric and salicylic acid, confirming involvement of acid-sensing ion channels (ASICs) in salt stress sensing. Proline, short non-substituted mono-carboxylic acids (C2-C6), lactic acid and amiloride characterised the four uncorrelated principal components of proton flux. The proline-associated principal component included an antagonism of 2,4-dichlorphenoxyacetic acid and a set of amino acids (hydrophobic, polar, acidic, basic). The five principal components captured 100% of variance of extracellular proton flux. Thus, a bias-free, functional high-throughput screening was established to extract new clusters of response elements and potential signalling pathways, and to serve as a core for quantitative meta-analysis in plant biology. The eigenvectors reorient research, associating proline with development instead of salt stress, and the proof of existence of multiple components of proton flux can help to resolve controversy about the acid growth theory. © 2011 German Botanical Society and The Royal Botanical Society of the Netherlands.

  11. Multi-Response Extraction Optimization Based on Anti-Oxidative Activity and Quality Evaluation by Main Indicator Ingredients Coupled with Chemometric Analysis on Thymus quinquecostatus Celak.

    PubMed

    Chang, Yan-Li; Shen, Meng; Ren, Xue-Yang; He, Ting; Wang, Le; Fan, Shu-Sheng; Wang, Xiu-Huan; Li, Xiao; Wang, Xiao-Ping; Chen, Xiao-Yi; Sui, Hong; She, Gai-Mei

    2018-04-19

    Thymus quinquecostatus Celak is a species of thyme in China and it used as condiment and herbal medicine for a long time. To set up the quality evaluation of T. quinquecostatus , the response surface methodology (RSM) based on its 2,2-Diphenyl-1-picrylhydrazyl (DPPH) radical scavenging activity was introduced to optimize the extraction condition, and the main indicator components were found through an UPLC-LTQ-Orbitrap MS n method. The ethanol concentration, solid-liquid ratio, and extraction time on optimum conditions were 42.32%, 1:17.51, and 1.8 h, respectively. 35 components having 12 phenolic acids and 23 flavonoids were unambiguously or tentatively identified both positive and negative modes to employ for the comprehensive analysis in the optimum anti-oxidative part. A simple, reliable, and sensitive HPLC method was performed for the multi-component quantitative analysis of T. quinquecostatus using six characteristic and principal phenolic acids and flavonoids as reference compounds. Furthermore, the chemometrics methods (principal components analysis (PCA) and hierarchical clustering analysis (HCA)) appraised the growing areas and harvest time of this herb closely relative to the quality-controlled. This study provided full-scale qualitative and quantitative information for the quality evaluation of T. quinquecostatus , which would be a valuable reference for further study and development of this herb and related laid the foundation of further study on its pharmacological efficacy.

  12. Automated spike sorting algorithm based on Laplacian eigenmaps and k-means clustering.

    PubMed

    Chah, E; Hok, V; Della-Chiesa, A; Miller, J J H; O'Mara, S M; Reilly, R B

    2011-02-01

    This study presents a new automatic spike sorting method based on feature extraction by Laplacian eigenmaps combined with k-means clustering. The performance of the proposed method was compared against previously reported algorithms such as principal component analysis (PCA) and amplitude-based feature extraction. Two types of classifier (namely k-means and classification expectation-maximization) were incorporated within the spike sorting algorithms, in order to find a suitable classifier for the feature sets. Simulated data sets and in-vivo tetrode multichannel recordings were employed to assess the performance of the spike sorting algorithms. The results show that the proposed algorithm yields significantly improved performance with mean sorting accuracy of 73% and sorting error of 10% compared to PCA which combined with k-means had a sorting accuracy of 58% and sorting error of 10%.A correction was made to this article on 22 February 2011. The spacing of the title was amended on the abstract page. No changes were made to the article PDF and the print version was unaffected.

  13. Measuring Health Information Dissemination and Identifying Target Interest Communities on Twitter: Methods Development and Case Study of the @SafetyMD Network.

    PubMed

    Kandadai, Venk; Yang, Haodong; Jiang, Ling; Yang, Christopher C; Fleisher, Linda; Winston, Flaura Koplin

    2016-05-05

    Little is known about the ability of individual stakeholder groups to achieve health information dissemination goals through Twitter. This study aimed to develop and apply methods for the systematic evaluation and optimization of health information dissemination by stakeholders through Twitter. Tweet content from 1790 followers of @SafetyMD (July-November 2012) was examined. User emphasis, a new indicator of Twitter information dissemination, was defined and applied to retweets across two levels of retweeters originating from @SafetyMD. User interest clusters were identified based on principal component analysis (PCA) and hierarchical cluster analysis (HCA) of a random sample of 170 followers. User emphasis of keywords remained across levels but decreased by 9.5 percentage points. PCA and HCA identified 12 statistically unique clusters of followers within the @SafetyMD Twitter network. This study is one of the first to develop methods for use by stakeholders to evaluate and optimize their use of Twitter to disseminate health information. Our new methods provide preliminary evidence that individual stakeholders can evaluate the effectiveness of health information dissemination and create content-specific clusters for more specific targeted messaging.

  14. Employment relations and global health: a typological study of world labor markets.

    PubMed

    Chung, Haejoo; Muntaner, Carles; Benach, Joan

    2010-01-01

    In this study, the authors investigate the global labor market and employment relations, which are central building blocks of the welfare state; the aim is to propose a global typology of labor markets to explain global inequalities in population health. Countries are categorized into core (21), semi-peripheral (42), and peripheral (71) countries, based on gross national product per capita (Atlas method). Labor market-related variables and factors are then used to generate clusters of countries with principal components and cluster analysis methods. The authors then examine the relationship between the resulting clusters and health outcomes. The clusters of countries are largely geographically defined, each cluster with similar historical background and developmental strategy. However, there are interesting exceptions, which warrant further elaboration. The relationship between health outcomes and clusters largely follows the authors' expectations (except for communicable diseases): more egalitarian labor institutions have better health outcomes. The world system, then, can be divided according to different types of labor markets that are predictive of population health outcomes at each level of economic development. As is the case for health and social policies, variability in labor market characteristics is likely to reflect, in part, the relative strength of a country's political actors.

  15. Visualization and unsupervised predictive clustering of high-dimensional multimodal neuroimaging data.

    PubMed

    Mwangi, Benson; Soares, Jair C; Hasan, Khader M

    2014-10-30

    Neuroimaging machine learning studies have largely utilized supervised algorithms - meaning they require both neuroimaging scan data and corresponding target variables (e.g. healthy vs. diseased) to be successfully 'trained' for a prediction task. Noticeably, this approach may not be optimal or possible when the global structure of the data is not well known and the researcher does not have an a priori model to fit the data. We set out to investigate the utility of an unsupervised machine learning technique; t-distributed stochastic neighbour embedding (t-SNE) in identifying 'unseen' sample population patterns that may exist in high-dimensional neuroimaging data. Multimodal neuroimaging scans from 92 healthy subjects were pre-processed using atlas-based methods, integrated and input into the t-SNE algorithm. Patterns and clusters discovered by the algorithm were visualized using a 2D scatter plot and further analyzed using the K-means clustering algorithm. t-SNE was evaluated against classical principal component analysis. Remarkably, based on unlabelled multimodal scan data, t-SNE separated study subjects into two very distinct clusters which corresponded to subjects' gender labels (cluster silhouette index value=0.79). The resulting clusters were used to develop an unsupervised minimum distance clustering model which identified 93.5% of subjects' gender. Notably, from a neuropsychiatric perspective this method may allow discovery of data-driven disease phenotypes or sub-types of treatment responders. Copyright © 2014 Elsevier B.V. All rights reserved.

  16. Fast, Exact Bootstrap Principal Component Analysis for p > 1 million

    PubMed Central

    Fisher, Aaron; Caffo, Brian; Schwartz, Brian; Zipunnikov, Vadim

    2015-01-01

    Many have suggested a bootstrap procedure for estimating the sampling variability of principal component analysis (PCA) results. However, when the number of measurements per subject (p) is much larger than the number of subjects (n), calculating and storing the leading principal components from each bootstrap sample can be computationally infeasible. To address this, we outline methods for fast, exact calculation of bootstrap principal components, eigenvalues, and scores. Our methods leverage the fact that all bootstrap samples occupy the same n-dimensional subspace as the original sample. As a result, all bootstrap principal components are limited to the same n-dimensional subspace and can be efficiently represented by their low dimensional coordinates in that subspace. Several uncertainty metrics can be computed solely based on the bootstrap distribution of these low dimensional coordinates, without calculating or storing the p-dimensional bootstrap components. Fast bootstrap PCA is applied to a dataset of sleep electroencephalogram recordings (p = 900, n = 392), and to a dataset of brain magnetic resonance images (MRIs) (p ≈ 3 million, n = 352). For the MRI dataset, our method allows for standard errors for the first 3 principal components based on 1000 bootstrap samples to be calculated on a standard laptop in 47 minutes, as opposed to approximately 4 days with standard methods. PMID:27616801

  17. Unsupervised pattern recognition methods in ciders profiling based on GCE voltammetric signals.

    PubMed

    Jakubowska, Małgorzata; Sordoń, Wanda; Ciepiela, Filip

    2016-07-15

    This work presents a complete methodology of distinguishing between different brands of cider and ageing degrees, based on voltammetric signals, utilizing dedicated data preprocessing procedures and unsupervised multivariate analysis. It was demonstrated that voltammograms recorded on glassy carbon electrode in Britton-Robinson buffer at pH 2 are reproducible for each brand. By application of clustering algorithms and principal component analysis visible homogenous clusters were obtained. Advanced signal processing strategy which included automatic baseline correction, interval scaling and continuous wavelet transform with dedicated mother wavelet, was a key step in the correct recognition of the objects. The results show that voltammetry combined with optimized univariate and multivariate data processing is a sufficient tool to distinguish between ciders from various brands and to evaluate their freshness. Copyright © 2016 Elsevier Ltd. All rights reserved.

  18. Free-energy landscape, principal component analysis, and structural clustering to identify representative conformations from molecular dynamics simulations: the myoglobin case.

    PubMed

    Papaleo, Elena; Mereghetti, Paolo; Fantucci, Piercarlo; Grandori, Rita; De Gioia, Luca

    2009-01-01

    Several molecular dynamics (MD) simulations were used to sample conformations in the neighborhood of the native structure of holo-myoglobin (holo-Mb), collecting trajectories spanning 0.22 micros at 300 K. Principal component (PCA) and free-energy landscape (FEL) analyses, integrated by cluster analysis, which was performed considering the position and structures of the individual helices of the globin fold, were carried out. The coherence between the different structural clusters and the basins of the FEL, together with the convergence of parameters derived by PCA indicates that an accurate description of the Mb conformational space around the native state was achieved by multiple MD trajectories spanning at least 0.14 micros. The integration of FEL, PCA, and structural clustering was shown to be a very useful approach to gain an overall view of the conformational landscape accessible to a protein and to identify representative protein substates. This method could be also used to investigate the conformational and dynamical properties of Mb apo-, mutant, or delete versions, in which greater conformational variability is expected and, therefore identification of representative substates from the simulations is relevant to disclose structure-function relationship.

  19. Pharmacophore modeling, docking, and principal component analysis based clustering: combined computer-assisted approaches to identify new inhibitors of the human rhinovirus coat protein.

    PubMed

    Steindl, Theodora M; Crump, Carolyn E; Hayden, Frederick G; Langer, Thierry

    2005-10-06

    The development and application of a sophisticated virtual screening and selection protocol to identify potential, novel inhibitors of the human rhinovirus coat protein employing various computer-assisted strategies are described. A large commercially available database of compounds was screened using a highly selective, structure-based pharmacophore model generated with the program Catalyst. A docking study and a principal component analysis were carried out within the software package Cerius and served to validate and further refine the obtained results. These combined efforts led to the selection of six candidate structures, for which in vitro anti-rhinoviral activity could be shown in a biological assay.

  20. Palate dimensions in six-year-old children with unilateral cleft lip and palate: a six-center study on dental casts.

    PubMed

    Koželj, Vesna; Vegnuti, Miljana; Drevenšek, Martina; Hortis-Dzierzbicka, Maria; Gonzalez-Landa, Gonzalo; Hanstein, Siiri; Klimova, Irena; Kobus, Kazimierz; Kobus-Zaleśna, Katarzyna; Semb, Gunvor; Shaw, Bill

    2012-11-01

    To compare palatal dimensions in 6-year-old children with unilateral cleft lip and palate (UCLP) treated by different protocols with those of noncleft children. Retrospective intercenter outcome study. Patients : Upper dental casts from 129 children with repaired UCLP and 30 controls were analyzed by the trigonometric method. Six European cleft centers. Main outcome measures : Sagittal, transverse, and vertical dimensions of the palate were observed. Palate variables were analyzed with descriptive methods and nonparametric tests. Regarding several various characteristics measured on a relatively small number of subjects, hierarchical, k-means clustering, and principal component analyses were used. Mean values of the observed dimensions for five cleft groups differed significantly from the control (p < .05). The group with one-stage closure of the cleft differed significantly from all other cleft groups in most variables (p < .05). Principal component analysis of all 159 cases identified three clusters with specific morphologic characteristics of the palate. A similar number of treated children were classified into each cluster, while all children without clefts were classified in the same cluster. The percentage of treated children from a particular group that fit this cluster ranged from 0% to 70% and increased with age at palatal closure and number of primary surgical procedures. At 6 years of age, children with stepwise repair and hard palate closure after the age of two more frequently result in palatal dimensions of noncleft control than children with earlier palatal closure and one-stage cleft repair.

  1. ClustVis: a web tool for visualizing clustering of multivariate data using Principal Component Analysis and heatmap

    PubMed Central

    Metsalu, Tauno; Vilo, Jaak

    2015-01-01

    The Principal Component Analysis (PCA) is a widely used method of reducing the dimensionality of high-dimensional data, often followed by visualizing two of the components on the scatterplot. Although widely used, the method is lacking an easy-to-use web interface that scientists with little programming skills could use to make plots of their own data. The same applies to creating heatmaps: it is possible to add conditional formatting for Excel cells to show colored heatmaps, but for more advanced features such as clustering and experimental annotations, more sophisticated analysis tools have to be used. We present a web tool called ClustVis that aims to have an intuitive user interface. Users can upload data from a simple delimited text file that can be created in a spreadsheet program. It is possible to modify data processing methods and the final appearance of the PCA and heatmap plots by using drop-down menus, text boxes, sliders etc. Appropriate defaults are given to reduce the time needed by the user to specify input parameters. As an output, users can download PCA plot and heatmap in one of the preferred file formats. This web server is freely available at http://biit.cs.ut.ee/clustvis/. PMID:25969447

  2. [Spatial distribution characteristics of the physical and chemical properties of water in the Kunes River after the supply of snowmelt during spring].

    PubMed

    Liu, Xiang; Guo, Ling-Peng; Zhang, Fei-Yun; Ma, Jie; Mu, Shu-Yong; Zhao, Xin; Li, Lan-Hai

    2015-02-01

    Eight physical and chemical indicators related to water quality were monitored from nineteen sampling sites along the Kunes River at the end of snowmelt season in spring. To investigate the spatial distribution characteristics of water physical and chemical properties, cluster analysis (CA), discriminant analysis (DA) and principal component analysis (PCA) are employed. The result of cluster analysis showed that the Kunes River could be divided into three reaches according to the similarities of water physical and chemical properties among sampling sites, representing the upstream, midstream and downstream of the river, respectively; The result of discriminant analysis demonstrated that the reliability of such a classification was high, and DO, Cl- and BOD5 were the significant indexes leading to this classification; Three principal components were extracted on the basis of the principal component analysis, in which accumulative variance contribution could reach 86.90%. The result of principal component analysis also indicated that water physical and chemical properties were mostly affected by EC, ORP, NO3(-) -N, NH4(+) -N, Cl- and BOD5. The sorted results of principal component scores in each sampling sites showed that the water quality was mainly influenced by DO in upstream, by pH in midstream, and by the rest of indicators in downstream. The order of comprehensive scores for principal components revealed that the water quality degraded from the upstream to downstream, i.e., the upstream had the best water quality, followed by the midstream, while the water quality at downstream was the worst. This result corresponded exactly to the three reaches classified using cluster analysis. Anthropogenic activity and the accumulation of pollutants along the river were probably the main reasons leading to this spatial difference.

  3. Common factor analysis versus principal component analysis: choice for symptom cluster research.

    PubMed

    Kim, Hee-Ju

    2008-03-01

    The purpose of this paper is to examine differences between two factor analytical methods and their relevance for symptom cluster research: common factor analysis (CFA) versus principal component analysis (PCA). Literature was critically reviewed to elucidate the differences between CFA and PCA. A secondary analysis (N = 84) was utilized to show the actual result differences from the two methods. CFA analyzes only the reliable common variance of data, while PCA analyzes all the variance of data. An underlying hypothetical process or construct is involved in CFA but not in PCA. PCA tends to increase factor loadings especially in a study with a small number of variables and/or low estimated communality. Thus, PCA is not appropriate for examining the structure of data. If the study purpose is to explain correlations among variables and to examine the structure of the data (this is usual for most cases in symptom cluster research), CFA provides a more accurate result. If the purpose of a study is to summarize data with a smaller number of variables, PCA is the choice. PCA can also be used as an initial step in CFA because it provides information regarding the maximum number and nature of factors. In using factor analysis for symptom cluster research, several issues need to be considered, including subjectivity of solution, sample size, symptom selection, and level of measure.

  4. MLViS: A Web Tool for Machine Learning-Based Virtual Screening in Early-Phase of Drug Discovery and Development

    PubMed Central

    Korkmaz, Selcuk; Zararsiz, Gokmen; Goksuluk, Dincer

    2015-01-01

    Virtual screening is an important step in early-phase of drug discovery process. Since there are thousands of compounds, this step should be both fast and effective in order to distinguish drug-like and nondrug-like molecules. Statistical machine learning methods are widely used in drug discovery studies for classification purpose. Here, we aim to develop a new tool, which can classify molecules as drug-like and nondrug-like based on various machine learning methods, including discriminant, tree-based, kernel-based, ensemble and other algorithms. To construct this tool, first, performances of twenty-three different machine learning algorithms are compared by ten different measures, then, ten best performing algorithms have been selected based on principal component and hierarchical cluster analysis results. Besides classification, this application has also ability to create heat map and dendrogram for visual inspection of the molecules through hierarchical cluster analysis. Moreover, users can connect the PubChem database to download molecular information and to create two-dimensional structures of compounds. This application is freely available through www.biosoft.hacettepe.edu.tr/MLViS/. PMID:25928885

  5. Quantitative assessment in thermal image segmentation for artistic objects

    NASA Astrophysics Data System (ADS)

    Yousefi, Bardia; Sfarra, Stefano; Maldague, Xavier P. V.

    2017-07-01

    The application of the thermal and infrared technology in different areas of research is considerably increasing. These applications involve Non-destructive Testing (NDT), Medical analysis (Computer Aid Diagnosis/Detection- CAD), Arts and Archaeology among many others. In the arts and archaeology field, infrared technology provides significant contributions in term of finding defects of possible impaired regions. This has been done through a wide range of different thermographic experiments and infrared methods. The proposed approach here focuses on application of some known factor analysis methods such as standard Non-Negative Matrix Factorization (NMF) optimized by gradient-descent-based multiplicative rules (SNMF1) and standard NMF optimized by Non-negative least squares (NNLS) active-set algorithm (SNMF2) and eigen decomposition approaches such as Principal Component Thermography (PCT), Candid Covariance-Free Incremental Principal Component Thermography (CCIPCT) to obtain the thermal features. On one hand, these methods are usually applied as preprocessing before clustering for the purpose of segmentation of possible defects. On the other hand, a wavelet based data fusion combines the data of each method with PCT to increase the accuracy of the algorithm. The quantitative assessment of these approaches indicates considerable segmentation along with the reasonable computational complexity. It shows the promising performance and demonstrated a confirmation for the outlined properties. In particular, a polychromatic wooden statue and a fresco were analyzed using the above mentioned methods and interesting results were obtained.

  6. A HIERARCHIAL STOCHASTIC MODEL OF LARGE SCALE ATMOSPHERIC CIRCULATION PATTERNS AND MULTIPLE STATION DAILY PRECIPITATION

    EPA Science Inventory

    A stochastic model of weather states and concurrent daily precipitation at multiple precipitation stations is described. our algorithms are invested for classification of daily weather states; k means, fuzzy clustering, principal components, and principal components coupled with ...

  7. Accounting for Non-Gaussian Sources of Spatial Correlation in Parametric Functional Magnetic Resonance Imaging Paradigms I: Revisiting Cluster-Based Inferences.

    PubMed

    Gopinath, Kaundinya; Krishnamurthy, Venkatagiri; Sathian, K

    2018-02-01

    In a recent study, Eklund et al. employed resting-state functional magnetic resonance imaging data as a surrogate for null functional magnetic resonance imaging (fMRI) datasets and posited that cluster-wise family-wise error (FWE) rate-corrected inferences made by using parametric statistical methods in fMRI studies over the past two decades may have been invalid, particularly for cluster defining thresholds less stringent than p < 0.001; this was principally because the spatial autocorrelation functions (sACF) of fMRI data had been modeled incorrectly to follow a Gaussian form, whereas empirical data suggested otherwise. Here, we show that accounting for non-Gaussian signal components such as those arising from resting-state neural activity as well as physiological responses and motion artifacts in the null fMRI datasets yields first- and second-level general linear model analysis residuals with nearly uniform and Gaussian sACF. Further comparison with nonparametric permutation tests indicates that cluster-based FWE corrected inferences made with Gaussian spatial noise approximations are valid.

  8. Genetic diversity studies in pea (Pisum sativum L.) using simple sequence repeat markers.

    PubMed

    Kumari, P; Basal, N; Singh, A K; Rai, V P; Srivastava, C P; Singh, P K

    2013-03-13

    The genetic diversity among 28 pea (Pisum sativum L.) genotypes was analyzed using 32 simple sequence repeat markers. A total of 44 polymorphic bands, with an average of 2.1 bands per primer, were obtained. The polymorphism information content ranged from 0.657 to 0.309 with an average of 0.493. The variation in genetic diversity among these cultivars ranged from 0.11 to 0.73. Cluster analysis based on Jaccard's similarity coefficient using the unweighted pair-group method with arithmetic mean (UPGMA) revealed 2 distinct clusters, I and II, comprising 6 and 22 genotypes, respectively. Cluster II was further differentiated into 2 subclusters, IIA and IIB, with 12 and 10 genotypes, respectively. Principal component (PC) analysis revealed results similar to those of UPGMA. The first, second, and third PCs contributed 21.6, 16.1, and 14.0% of the variation, respectively; cumulative variation of the first 3 PCs was 51.7%.

  9. Analysis of the principal component algorithm in phase-shifting interferometry.

    PubMed

    Vargas, J; Quiroga, J Antonio; Belenguer, T

    2011-06-15

    We recently presented a new asynchronous demodulation method for phase-sampling interferometry. The method is based in the principal component analysis (PCA) technique. In the former work, the PCA method was derived heuristically. In this work, we present an in-depth analysis of the PCA demodulation method.

  10. Considering Horn's Parallel Analysis from a Random Matrix Theory Point of View.

    PubMed

    Saccenti, Edoardo; Timmerman, Marieke E

    2017-03-01

    Horn's parallel analysis is a widely used method for assessing the number of principal components and common factors. We discuss the theoretical foundations of parallel analysis for principal components based on a covariance matrix by making use of arguments from random matrix theory. In particular, we show that (i) for the first component, parallel analysis is an inferential method equivalent to the Tracy-Widom test, (ii) its use to test high-order eigenvalues is equivalent to the use of the joint distribution of the eigenvalues, and thus should be discouraged, and (iii) a formal test for higher-order components can be obtained based on a Tracy-Widom approximation. We illustrate the performance of the two testing procedures using simulated data generated under both a principal component model and a common factors model. For the principal component model, the Tracy-Widom test performs consistently in all conditions, while parallel analysis shows unpredictable behavior for higher-order components. For the common factor model, including major and minor factors, both procedures are heuristic approaches, with variable performance. We conclude that the Tracy-Widom procedure is preferred over parallel analysis for statistically testing the number of principal components based on a covariance matrix.

  11. Ripening-dependent metabolic changes in the volatiles of pineapple (Ananas comosus (L.) Merr.) fruit: II. Multivariate statistical profiling of pineapple aroma compounds based on comprehensive two-dimensional gas chromatography-mass spectrometry.

    PubMed

    Steingass, Christof Björn; Jutzi, Manfred; Müller, Jenny; Carle, Reinhold; Schmarr, Hans-Georg

    2015-03-01

    Ripening-dependent changes of pineapple volatiles were studied in a nontargeted profiling analysis. Volatiles were isolated via headspace solid phase microextraction and analyzed by comprehensive 2D gas chromatography and mass spectrometry (HS-SPME-GC×GC-qMS). Profile patterns presented in the contour plots were evaluated applying image processing techniques and subsequent multivariate statistical data analysis. Statistical methods comprised unsupervised hierarchical cluster analysis (HCA) and principal component analysis (PCA) to classify the samples. Supervised partial least squares discriminant analysis (PLS-DA) and partial least squares (PLS) regression were applied to discriminate different ripening stages and describe the development of volatiles during postharvest storage, respectively. Hereby, substantial chemical markers allowing for class separation were revealed. The workflow permitted the rapid distinction between premature green-ripe pineapples and postharvest-ripened sea-freighted fruits. Volatile profiles of fully ripe air-freighted pineapples were similar to those of green-ripe fruits postharvest ripened for 6 days after simulated sea freight export, after PCA with only two principal components. However, PCA considering also the third principal component allowed differentiation between air-freighted fruits and the four progressing postharvest maturity stages of sea-freighted pineapples.

  12. Application of multi-scale wavelet entropy and multi-resolution Volterra models for climatic downscaling

    NASA Astrophysics Data System (ADS)

    Sehgal, V.; Lakhanpal, A.; Maheswaran, R.; Khosa, R.; Sridhar, Venkataramana

    2018-01-01

    This study proposes a wavelet-based multi-resolution modeling approach for statistical downscaling of GCM variables to mean monthly precipitation for five locations at Krishna Basin, India. Climatic dataset from NCEP is used for training the proposed models (Jan.'69 to Dec.'94) and are applied to corresponding CanCM4 GCM variables to simulate precipitation for the validation (Jan.'95-Dec.'05) and forecast (Jan.'06-Dec.'35) periods. The observed precipitation data is obtained from the India Meteorological Department (IMD) gridded precipitation product at 0.25 degree spatial resolution. This paper proposes a novel Multi-Scale Wavelet Entropy (MWE) based approach for clustering climatic variables into suitable clusters using k-means methodology. Principal Component Analysis (PCA) is used to obtain the representative Principal Components (PC) explaining 90-95% variance for each cluster. A multi-resolution non-linear approach combining Discrete Wavelet Transform (DWT) and Second Order Volterra (SoV) is used to model the representative PCs to obtain the downscaled precipitation for each downscaling location (W-P-SoV model). The results establish that wavelet-based multi-resolution SoV models perform significantly better compared to the traditional Multiple Linear Regression (MLR) and Artificial Neural Networks (ANN) based frameworks. It is observed that the proposed MWE-based clustering and subsequent PCA, helps reduce the dimensionality of the input climatic variables, while capturing more variability compared to stand-alone k-means (no MWE). The proposed models perform better in estimating the number of precipitation events during the non-monsoon periods whereas the models with clustering without MWE over-estimate the rainfall during the dry season.

  13. Detection and tracking of gas plumes in LWIR hyperspectral video sequence data

    NASA Astrophysics Data System (ADS)

    Gerhart, Torin; Sunu, Justin; Lieu, Lauren; Merkurjev, Ekaterina; Chang, Jen-Mei; Gilles, Jérôme; Bertozzi, Andrea L.

    2013-05-01

    Automated detection of chemical plumes presents a segmentation challenge. The segmentation problem for gas plumes is difficult due to the diffusive nature of the cloud. The advantage of considering hyperspectral images in the gas plume detection problem over the conventional RGB imagery is the presence of non-visual data, allowing for a richer representation of information. In this paper we present an effective method of visualizing hyperspectral video sequences containing chemical plumes and investigate the effectiveness of segmentation techniques on these post-processed videos. Our approach uses a combination of dimension reduction and histogram equalization to prepare the hyperspectral videos for segmentation. First, Principal Components Analysis (PCA) is used to reduce the dimension of the entire video sequence. This is done by projecting each pixel onto the first few Principal Components resulting in a type of spectral filter. Next, a Midway method for histogram equalization is used. These methods redistribute the intensity values in order to reduce icker between frames. This properly prepares these high-dimensional video sequences for more traditional segmentation techniques. We compare the ability of various clustering techniques to properly segment the chemical plume. These include K-means, spectral clustering, and the Ginzburg-Landau functional.

  14. Two worlds collide: Image analysis methods for quantifying structural variation in cluster molecular dynamics

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Steenbergen, K. G., E-mail: kgsteen@gmail.com; Gaston, N.

    2014-02-14

    Inspired by methods of remote sensing image analysis, we analyze structural variation in cluster molecular dynamics (MD) simulations through a unique application of the principal component analysis (PCA) and Pearson Correlation Coefficient (PCC). The PCA analysis characterizes the geometric shape of the cluster structure at each time step, yielding a detailed and quantitative measure of structural stability and variation at finite temperature. Our PCC analysis captures bond structure variation in MD, which can be used to both supplement the PCA analysis as well as compare bond patterns between different cluster sizes. Relying only on atomic position data, without requirement formore » a priori structural input, PCA and PCC can be used to analyze both classical and ab initio MD simulations for any cluster composition or electronic configuration. Taken together, these statistical tools represent powerful new techniques for quantitative structural characterization and isomer identification in cluster MD.« less

  15. Two worlds collide: image analysis methods for quantifying structural variation in cluster molecular dynamics.

    PubMed

    Steenbergen, K G; Gaston, N

    2014-02-14

    Inspired by methods of remote sensing image analysis, we analyze structural variation in cluster molecular dynamics (MD) simulations through a unique application of the principal component analysis (PCA) and Pearson Correlation Coefficient (PCC). The PCA analysis characterizes the geometric shape of the cluster structure at each time step, yielding a detailed and quantitative measure of structural stability and variation at finite temperature. Our PCC analysis captures bond structure variation in MD, which can be used to both supplement the PCA analysis as well as compare bond patterns between different cluster sizes. Relying only on atomic position data, without requirement for a priori structural input, PCA and PCC can be used to analyze both classical and ab initio MD simulations for any cluster composition or electronic configuration. Taken together, these statistical tools represent powerful new techniques for quantitative structural characterization and isomer identification in cluster MD.

  16. A novel unsupervised spike sorting algorithm for intracranial EEG.

    PubMed

    Yadav, R; Shah, A K; Loeb, J A; Swamy, M N S; Agarwal, R

    2011-01-01

    This paper presents a novel, unsupervised spike classification algorithm for intracranial EEG. The method combines template matching and principal component analysis (PCA) for building a dynamic patient-specific codebook without a priori knowledge of the spike waveforms. The problem of misclassification due to overlapping classes is resolved by identifying similar classes in the codebook using hierarchical clustering. Cluster quality is visually assessed by projecting inter- and intra-clusters onto a 3D plot. Intracranial EEG from 5 patients was utilized to optimize the algorithm. The resulting codebook retains 82.1% of the detected spikes in non-overlapping and disjoint clusters. Initial results suggest a definite role of this method for both rapid review and quantitation of interictal spikes that could enhance both clinical treatment and research studies on epileptic patients.

  17. Chemometrics-based Approach in Analysis of Arnicae flos

    PubMed Central

    Zheleva-Dimitrova, Dimitrina Zh.; Balabanova, Vessela; Gevrenova, Reneta; Doichinova, Irini; Vitkova, Antonina

    2015-01-01

    Introduction: Arnica montana flowers have a long history as herbal medicines for external use on injuries and rheumatic complaints. Objective: To investigate Arnicae flos of cultivated accessions from Bulgaria, Poland, Germany, Finland, and Pharmacy store for phenolic derivatives and sesquiterpene lactones (STLs). Materials and Methods: Samples of Arnica from nine origins were prepared by ultrasound-assisted extraction with 80% methanol for phenolic compounds analysis. Subsequent reverse-phase high-performance liquid chromatography (HPLC) separation of the analytes was performed using gradient elution and ultraviolet detection at 280 and 310 nm (phenolic acids), and 360 nm (flavonoids). Total STLs were determined in chloroform extracts by solid-phase extraction-HPLC at 225 nm. The HPLC generated chromatographic data were analyzed using principal component analysis (PCA) and hierarchical clustering (HC). Results: The highest total amount of phenolic acids was found in the sample from Botanical Garden at Joensuu University, Finland (2.36 mg/g dw). Astragalin, isoquercitrin, and isorhamnetin 3-glucoside were the main flavonol glycosides being present up to 3.37 mg/g (astragalin). Three well-defined clusters were distinguished by PCA and HC. Cluster C1 comprised of the German and Finnish accessions characterized by the highest content of flavonols. Cluster C2 included the Bulgarian and Polish samples presenting a low content of flavonoids. Cluster C3 consisted only of one sample from a pharmacy store. Conclusion: A validated HPLC method for simultaneous determination of phenolic acids, flavonoid glycosides, and aglycones in A. montana flowers was developed. The PCA loading plot showed that quercetin, kaempferol, and isorhamnetin can be used to distinguish different Arnica accessions. SUMMARY A principal component analysis (PCA) on 13 phenolic compounds and total amount of sesquiterpene lactones in Arnicae flos collection tended to cluster the studied 9 accessions into three main groups. The profiles obtained demonstrated that the samples from Germany and Finland are characterized by greater amounts of phenolic derivatives than the Bulgarian and Polish ones. The PCA loading plot showed that quercetin, kaemferol and isorhamnetin can be used to distinguish different arnica accessions. PMID:27013791

  18. Bioclimatic Classification of Northeast Asia for climate change response

    NASA Astrophysics Data System (ADS)

    Choi, Y.; Jeon, S. W.; Lim, C. H.

    2016-12-01

    As climate change has been getting worse, we should monitor the change of biodiversity, and distribution of species to handle the crisis and take advantage of climate change. The development of bioclimatic map which classifies land into homogenous zones by similar environment properties is the first step to establish a strategy. Statistically derived classifications of land provide useful spatial frameworks to support ecosystem research, monitoring and policy decisions. Many countries are trying to make this kind of map and actively utilize it to ecosystem conservation and management. However, the Northeast Asia including North Korea doesn't have detailed environmental information, and has not built environmental classification map. Therefore, this study presents a bioclimatic map of Northeast Asia based on statistical clustering of bioclimate data. Bioclim data ver1.4 which provided by WorldClim were considered for inclusion in a model. Eight of the most relevant climate variables were selected by correlation analysis, based on previous studies. Principal Components Analysis (PCA) was used to explain 86% of the variation into three independent dimensions, which were subsequently clustered using an ISODATA clustering. The bioclimatic zone of Northeast Asia could consist of 29, 35, and 50 zones. This bioclimatic map has a 30' resolution. To assess the accuracy, the correlation coefficient was calculated between the first principal component values of the classification variables and the vegetation index, Gross Primary Production (GPP). It shows about 0.5 Pearson correlation coefficient. This study constructed Northeast Asia bioclimatic map by statistical method with high resolution, but in order to better reflect the realities, the variety of climate variables should be considered. Also, further studies should do more quantitative and qualitative validation in various ways. Then, this could be used more effectively to support decision making on climate change adaptation.

  19. Going beyond Clustering in MD Trajectory Analysis: An Application to Villin Headpiece Folding

    PubMed Central

    Rajan, Aruna; Freddolino, Peter L.; Schulten, Klaus

    2010-01-01

    Recent advances in computing technology have enabled microsecond long all-atom molecular dynamics (MD) simulations of biological systems. Methods that can distill the salient features of such large trajectories are now urgently needed. Conventional clustering methods used to analyze MD trajectories suffer from various setbacks, namely (i) they are not data driven, (ii) they are unstable to noise and changes in cut-off parameters such as cluster radius and cluster number, and (iii) they do not reduce the dimensionality of the trajectories, and hence are unsuitable for finding collective coordinates. We advocate the application of principal component analysis (PCA) and a non-metric multidimensional scaling (nMDS) method to reduce MD trajectories and overcome the drawbacks of clustering. To illustrate the superiority of nMDS over other methods in reducing data and reproducing salient features, we analyze three complete villin headpiece folding trajectories. Our analysis suggests that the folding process of the villin headpiece is structurally heterogeneous. PMID:20419160

  20. Going beyond clustering in MD trajectory analysis: an application to villin headpiece folding.

    PubMed

    Rajan, Aruna; Freddolino, Peter L; Schulten, Klaus

    2010-04-15

    Recent advances in computing technology have enabled microsecond long all-atom molecular dynamics (MD) simulations of biological systems. Methods that can distill the salient features of such large trajectories are now urgently needed. Conventional clustering methods used to analyze MD trajectories suffer from various setbacks, namely (i) they are not data driven, (ii) they are unstable to noise and changes in cut-off parameters such as cluster radius and cluster number, and (iii) they do not reduce the dimensionality of the trajectories, and hence are unsuitable for finding collective coordinates. We advocate the application of principal component analysis (PCA) and a non-metric multidimensional scaling (nMDS) method to reduce MD trajectories and overcome the drawbacks of clustering. To illustrate the superiority of nMDS over other methods in reducing data and reproducing salient features, we analyze three complete villin headpiece folding trajectories. Our analysis suggests that the folding process of the villin headpiece is structurally heterogeneous.

  1. Taxonomic Identity of the Invasive Fruit Fly Pest, Bactrocera invadens: Concordance in Morphometry and DNA Barcoding

    PubMed Central

    Khamis, Fathiya M.; Masiga, Daniel K.; Mohamed, Samira A.; Salifu, Daisy; de Meyer, Marc; Ekesi, Sunday

    2012-01-01

    In 2003, a new fruit fly pest species was recorded for the first time in Kenya and has subsequently been found in 28 countries across tropical Africa. The insect was described as Bactrocera invadens, due to its rapid invasion of the African continent. In this study, the morphometry and DNA Barcoding of different populations of B. invadens distributed across the species range of tropical Africa and a sample from the pest's putative aboriginal home of Sri Lanka was investigated. Morphometry using wing veins and tibia length was used to separate B. invadens populations from other closely related Bactrocera species. The Principal component analysis yielded 15 components which correspond to the 15 morphometric measurements. The first two principal axes contributed to 90.7% of the total variance and showed partial separation of these populations. Canonical discriminant analysis indicated that only the first five canonical variates were statistically significant. The first two canonical variates contributed a total of 80.9% of the total variance clustering B. invadens with other members of the B. dorsalis complex while distinctly separating B. correcta, B. cucurbitae, B. oleae and B. zonata. The largest Mahalanobis squared distance (D2 = 122.9) was found to be between B. cucurbitae and B. zonata, while the lowest was observed between B. invadens populations against B. kandiensis (8.1) and against B. dorsalis s.s (11.4). Evolutionary history inferred by the Neighbor-Joining method clustered the Bactrocera species populations into four clusters. First cluster consisted of the B. dorsalis complex (B. invadens, B. kandiensis and B. dorsalis s. s.), branching from the same node while the second group was paraphyletic clades of B. correcta and B. zonata. The last two are monophyletic clades, consisting of B. cucurbitae and B. oleae, respectively. Principal component analysis using the genetic distances confirmed the clustering inferred by the NJ tree. PMID:23028649

  2. Determination, speciation and distribution of mercury in soil in the surroundings of a former chlor-alkali plant: assessment of sequential extraction procedure and analytical technique

    PubMed Central

    2013-01-01

    Background The paper presents the evaluation of soil contamination with total, water-available, mobile, semi-mobile and non-mobile Hg fractions in the surroundings of a former chlor-alkali plant in connection with several chemical soil characteristics. Principal Component Analysis and Cluster Analysis were used to evaluate the chemical composition variability of soil and factors influencing the fate of Hg in such areas. The sequential extraction EPA 3200-Method and the determination technique based on capacitively coupled microplasma optical emission spectrometry were checked. Results A case study was conducted in the Turda town, Romania. The results revealed a high contamination with Hg in the area of the former chlor-alkali plant and waste landfills, where soils were categorized as hazardous waste. The weight of the Hg fractions decreased in the order semi-mobile > non-mobile > mobile > water leachable. Principal Component Analysis revealed 7 factors describing chemical composition variability of soil, of which 3 attributed to Hg species. Total Hg, semi-mobile, non-mobile and mobile fractions were observed to have a strong influence, while the water leachable fraction a weak influence. The two-dimensional plot of PCs highlighted 3 groups of sites according to the Hg contamination factor. The statistical approach has shown that the Hg fate in soil is dependent on pH, content of organic matter, Ca, Fe, Mn, Cu and SO42- rather than natural components, such as aluminosilicates. Cluster analysis of soil characteristics revealed 3 clusters, one of which including Hg species. Soil contamination with Cu as sulfate and Zn as nitrate was also observed. Conclusions The approach based on speciation and statistical interpretation of data developed in this study could be useful in the investigation of other chlor-alkali contaminated areas. According to the Bland and Altman test the 3-step sequential extraction scheme is suitable for Hg speciation in soil, while the used determination method of Hg is appropriate. PMID:24252185

  3. Grouping of Bulgarian wines according to grape variety by using statistical methods

    NASA Astrophysics Data System (ADS)

    Milev, M.; Nikolova, Kr.; Ivanova, Ir.; Minkova, St.; Evtimov, T.; Krustev, St.

    2017-12-01

    68 different types of Bulgarian wines were studied in accordance with 9 optical parameters as follows: color parameters in XYZ and SIE Lab color systems, lightness, Hue angle, chroma, fluorescence intensity and emission wavelength. The main objective of this research is using hierarchical cluster analysis to evaluate the similarity and the distance between examined different types of Bulgarian wines and their grouping based on physical parameters. We have found that wines are grouped in clusters on the base of the degree of identity between them. There are two main clusters each one with two subclusters. The first one contains white wines and Sira, the second contains red wines and rose. The results from cluster analysis are presented graphically by a dendrogram. The other statistical technique used is factor analysis performed by the Method of Principal Components (PCA). The aim is to reduce the large number of variables to a few factors by grouping the correlated variables into one factor and subdividing the noncorrelated variables into different factors. Moreover the factor analysis provided the possibility to determine the parameters with the greatest influence over the distribution of samples in different clusters. In our study after the rotation of the factors with Varimax method the parameters were combined into two factors, which explain about 80 % of the total variation. The first one explains the 61.49% and correlates with color characteristics, the second one explains 18.34% from the variation and correlates with the parameters connected with fluorescence spectroscopy.

  4. Comprehensive quality assessment based specific chemical profiles for geographic and tissue variation in Gentiana rigescens using HPLC and FTIR method combined with principal component analysis

    NASA Astrophysics Data System (ADS)

    Li, Jie; Zhang, Ji; Zhao, Yan-Li; Huang, Heng-Yu; Wang, Yuan-Zhong

    2017-12-01

    Roots, stems, leaves and flowers of Longdan (Gentiana rigescens Franch. ex Hemsl) were collected from six geographic origins of Yunnan Province (n = 240) to implement the quality assessment based on contents of gentiopicroside, loganic acid, sweroside and swertiamarin and chemical profile using HPLC-DAD and FTIR method combined with principal component analysis (PCA). The content of gentiopicroside (major iridoid glycoside) was the highest in G. rigescens, regardless of tissue and geographic origin. The level of swertiamarin was the lowest, even unable to be detected in samples from Kunming and Qujing. Significant correlations (p < 0.05) between gentiopicroside, loganic acid, sweroside and swertiamarin were found at inter- or intra-tissues, which were highly depended on geographic origins, indicating the influence of environmental conditions on the conversion and transport of secondary metabolites in G. rigescens. Furthermore, samples were reasonably classified as three clusters along large producing areas where have similar climate conditions, characterized by carbohydrates, phenols, benzoates, terpenoids, aliphatic alcohols, aromatic hydrocarbons, and so forth. The present work provided global information on the chemical profile and contents of major iridoid glycosides in G. rigescens originated from six different origins, which is helpful for controlling quality of herbal medicines systematically.

  5. Comprehensive Quality Assessment Based Specific Chemical Profiles for Geographic and Tissue Variation in Gentiana rigescens Using HPLC and FTIR Method Combined with Principal Component Analysis

    PubMed Central

    Li, Jie; Zhang, Ji; Zhao, Yan-Li; Huang, Heng-Yu; Wang, Yuan-Zhong

    2017-01-01

    Roots, stems, leaves, and flowers of Longdan (Gentiana rigescens Franch. ex Hemsl) were collected from six geographic origins of Yunnan Province (n = 240) to implement the quality assessment based on contents of gentiopicroside, loganic acid, sweroside and swertiamarin and chemical profile using HPLC-DAD and FTIR method combined with principal component analysis (PCA). The content of gentiopicroside (major iridoid glycoside) was the highest in G. rigescens, regardless of tissue and geographic origin. The level of swertiamarin was the lowest, even unable to be detected in samples from Kunming and Qujing. Significant correlations (p < 0.05) between gentiopicroside, loganic acid, sweroside, and swertiamarin were found at inter- or intra-tissues, which were highly depended on geographic origins, indicating the influence of environmental conditions on the conversion and transport of secondary metabolites in G. rigescens. Furthermore, samples were reasonably classified as three clusters along large producing areas where have similar climate conditions, characterized by carbohydrates, phenols, benzoates, terpenoids, aliphatic alcohols, aromatic hydrocarbons, and so forth. The present work provided global information on the chemical profile and contents of major iridoid glycosides in G. rigescens originated from six different origins, which is helpful for controlling quality of herbal medicines systematically. PMID:29312929

  6. Characterization of Hatay honeys according to their multi-element analysis using ICP-OES combined with chemometrics.

    PubMed

    Yücel, Yasin; Sultanoğlu, Pınar

    2013-09-01

    Chemical characterisation has been carried out on 45 honey samples collected from Hatay region of Turkey. The concentrations of 17 elements were determined by inductively coupled plasma optical emission spectrometry (ICP-OES). Ca, K, Mg and Na were the most abundant elements, with mean contents of 219.38, 446.93, 49.06 and 95.91 mg kg(-1) respectively. The trace element mean contents ranged between 0.03 and 15.07 mg kg(-1). Chemometric methods such as principal component analysis (PCA) and cluster analysis (CA) techniques were applied to classify honey according to mineral content. The first most important principal component (PC) was strongly associated with the value of Al, B, Cd and Co. CA showed eight clusters corresponding to the eight botanical origins of honey. PCA explained 75.69% of the variance with the first six PC variables. Chemometric analysis of the analytical data allowed the accurate classification of the honey samples according to origin. Copyright © 2013 Elsevier Ltd. All rights reserved.

  7. Multi-trait Analysis of Agroclimate Variations During the Growing Season in East-Central Poland (1971-2005)

    NASA Astrophysics Data System (ADS)

    Radzka, Elżbieta; Rymuza, Katarzyna

    2015-04-01

    The work is based on meteorological data recorded by nine stations of the Institute of Meteorology and Water Management located in east-central Poland from 1971 to 2005. The region encompasses the North Podlasian Lowland and the South Podlasian Lowland. Average values of selected agroclimate indicators for the growing season were determined. Moreover, principal component analysis was conducted to indicate elements that exerted the greatest influence on the agroclimate. Also, cluster analysis was carried out to select stations with similar agroclimate. Ward method was used for clustering and the Euclidean distance was applied. Principal component analysis revealed that the agroclimate of east-central Poland was predominantly affected by climatic water balance, number of days of active plant growth, length of the farming period, and the average air temperature during the growing season (Apr-Sept). Based on the analysis, the region of east-central Poland was divided into two groups (areas) with different agroclimatic conditions. The first area comprized the following stations: Szepietowo and Białowieża located in the North Podlasian Lowland and Biała Podlaska situated in the northern part of the South Podlasian Lowland. This area was characterized by shorter farming periods and a lower average air temperature during the growing season. The other group included the remaining stations located in the western part of both the Lowlands which was warmer and where greater water deficits were recorded.

  8. Comprehensive analysis of Polygoni Multiflori Radix of different geographical origins using ultra-high-performance liquid chromatography fingerprints and multivariate chemometric methods.

    PubMed

    Sun, Li-Li; Wang, Meng; Zhang, Hui-Jie; Liu, Ya-Nan; Ren, Xiao-Liang; Deng, Yan-Ru; Qi, Ai-Di

    2018-01-01

    Polygoni Multiflori Radix (PMR) is increasingly being used not just as a traditional herbal medicine but also as a popular functional food. In this study, multivariate chemometric methods and mass spectrometry were combined to analyze the ultra-high-performance liquid chromatograph (UPLC) fingerprints of PMR from six different geographical origins. A chemometric strategy based on multivariate curve resolution-alternating least squares (MCR-ALS) and three classification methods is proposed to analyze the UPLC fingerprints obtained. Common chromatographic problems, including the background contribution, baseline contribution, and peak overlap, were handled by the established MCR-ALS model. A total of 22 components were resolved. Moreover, relative species concentrations were obtained from the MCR-ALS model, which was used for multivariate classification analysis. Principal component analysis (PCA) and Ward's method have been applied to classify 72 PMR samples from six different geographical regions. The PCA score plot showed that the PMR samples fell into four clusters, which related to the geographical location and climate of the source areas. The results were then corroborated by Ward's method. In addition, according to the variance-weighted distance between cluster centers obtained from Ward's method, five components were identified as the most significant variables (chemical markers) for cluster discrimination. A counter-propagation artificial neural network has been applied to confirm and predict the effects of chemical markers on different samples. Finally, the five chemical markers were identified by UPLC-quadrupole time-of-flight mass spectrometer. Components 3, 12, 16, 18, and 19 were identified as 2,3,5,4'-tetrahydroxy-stilbene-2-O-β-d-glucoside, emodin-8-O-β-d-glucopyranoside, emodin-8-O-(6'-O-acetyl)-β-d-glucopyranoside, emodin, and physcion, respectively. In conclusion, the proposed method can be applied for the comprehensive analysis of natural samples. Copyright © 2016. Published by Elsevier B.V.

  9. Geographical classification of Epimedium based on HPLC fingerprint analysis combined with multi-ingredients quantitative analysis.

    PubMed

    Xu, Ning; Zhou, Guofu; Li, Xiaojuan; Lu, Heng; Meng, Fanyun; Zhai, Huaqiang

    2017-05-01

    A reliable and comprehensive method for identifying the origin and assessing the quality of Epimedium has been developed. The method is based on analysis of HPLC fingerprints, combined with similarity analysis, hierarchical cluster analysis (HCA), principal component analysis (PCA) and multi-ingredient quantitative analysis. Nineteen batches of Epimedium, collected from different areas in the western regions of China, were used to establish the fingerprints and 18 peaks were selected for the analysis. Similarity analysis, HCA and PCA all classified the 19 areas into three groups. Simultaneous quantification of the five major bioactive ingredients in the Epimedium samples was also carried out to confirm the consistency of the quality tests. These methods were successfully used to identify the geographical origin of the Epimedium samples and to evaluate their quality. Copyright © 2016 John Wiley & Sons, Ltd.

  10. Cluster analysis of stress corrosion mechanisms for steel wires used in bridge cables through acoustic emission particle swarm optimization.

    PubMed

    Li, Dongsheng; Yang, Wei; Zhang, Wenyao

    2017-05-01

    Stress corrosion is the major failure type of bridge cable damage. The acoustic emission (AE) technique was applied to monitor the stress corrosion process of steel wires used in bridge cable structures. The damage evolution of stress corrosion in bridge cables was obtained according to the AE characteristic parameter figure. A particle swarm optimization cluster method was developed to determine the relationship between the AE signal and stress corrosion mechanisms. Results indicate that the main AE sources of stress corrosion in bridge cables included four types: passive film breakdown and detachment of the corrosion product, crack initiation, crack extension, and cable fracture. By analyzing different types of clustering data, the mean value of each damage pattern's AE characteristic parameters was determined. Different corrosion damage source AE waveforms and the peak frequency were extracted. AE particle swarm optimization cluster analysis based on principal component analysis was also proposed. This method can completely distinguish the four types of damage sources and simplifies the determination of the evolution process of corrosion damage and broken wire signals. Copyright © 2017. Published by Elsevier B.V.

  11. Optimal wavelength band clustering for multispectral iris recognition.

    PubMed

    Gong, Yazhuo; Zhang, David; Shi, Pengfei; Yan, Jingqi

    2012-07-01

    This work explores the possibility of clustering spectral wavelengths based on the maximum dissimilarity of iris textures. The eventual goal is to determine how many bands of spectral wavelengths will be enough for iris multispectral fusion and to find these bands that will provide higher performance of iris multispectral recognition. A multispectral acquisition system was first designed for imaging the iris at narrow spectral bands in the range of 420 to 940 nm. Next, a set of 60 human iris images that correspond to the right and left eyes of 30 different subjects were acquired for an analysis. Finally, we determined that 3 clusters were enough to represent the 10 feature bands of spectral wavelengths using the agglomerative clustering based on two-dimensional principal component analysis. The experimental results suggest (1) the number, center, and composition of clusters of spectral wavelengths and (2) the higher performance of iris multispectral recognition based on a three wavelengths-bands fusion.

  12. Raman spectroscopy differentiates between sensitive and resistant multiple myeloma cell lines

    NASA Astrophysics Data System (ADS)

    Franco, Domenico; Trusso, Sebastiano; Fazio, Enza; Allegra, Alessandro; Musolino, Caterina; Speciale, Antonio; Cimino, Francesco; Saija, Antonella; Neri, Fortunato; Nicolò, Marco S.; Guglielmino, Salvatore P. P.

    2017-12-01

    Current methods for identifying neoplastic cells and discerning them from their normal counterparts are often nonspecific and biologically perturbing. Here, we show that single-cell micro-Raman spectroscopy can be used to discriminate between resistant and sensitive multiple myeloma cell lines based on their highly reproducible biomolecular spectral signatures. In order to demonstrate robustness of the proposed approach, we used two different cell lines of multiple myeloma, namely MM.1S and U266B1, and their counterparts MM.1R and U266/BTZ-R subtypes, resistant to dexamethasone and bortezomib, respectively. Then, micro-Raman spectroscopy provides an easily accurate and noninvasive method for cancer detection for both research and clinical environments. Characteristic peaks, mostly due to different DNA/RNA ratio, nucleic acids, lipids and protein concentrations, allow for discerning the sensitive and resistant subtypes. We also explored principal component analysis (PCA) for resistant cell identification and classification. Sensitive and resistant cells form distinct clusters that can be defined using just two principal components. The identification of drug-resistant cells by confocal micro-Raman spectroscopy is thus proposed as a clinical tool to assess the development of resistance to glucocorticoids and proteasome inhibitors in myeloma cells.

  13. Sequential analysis of hydrochemical data for watershed characterization.

    PubMed

    Thyne, Geoffrey; Güler, Cüneyt; Poeter, Eileen

    2004-01-01

    A methodology for characterizing the hydrogeology of watersheds using hydrochemical data that combine statistical, geochemical, and spatial techniques is presented. Surface water and ground water base flow and spring runoff samples (180 total) from a single watershed are first classified using hierarchical cluster analysis. The statistical clusters are analyzed for spatial coherence confirming that the clusters have a geological basis corresponding to topographic flowpaths and showing that the fractured rock aquifer behaves as an equivalent porous medium on the watershed scale. Then principal component analysis (PCA) is used to determine the sources of variation between parameters. PCA analysis shows that the variations within the dataset are related to variations in calcium, magnesium, SO4, and HCO3, which are derived from natural weathering reactions, and pH, NO3, and chlorine, which indicate anthropogenic impact. PHREEQC modeling is used to quantitatively describe the natural hydrochemical evolution for the watershed and aid in discrimination of samples that have an anthropogenic component. Finally, the seasonal changes in the water chemistry of individual sites were analyzed to better characterize the spatial variability of vertical hydraulic conductivity. The integrated result provides a method to characterize the hydrogeology of the watershed that fully utilizes traditional data.

  14. Kinematic foot types in youth with equinovarus secondary to hemiplegia

    PubMed Central

    Krzak, Joseph J.; Corcos, Daniel M.; Damiano, Diane L.; Graf, Adam; Hedeker, Donald; Smith, Peter A.; Harris, Gerald F.

    2015-01-01

    Background Elevated kinematic variability of the foot and ankle segments exists during gait among individuals with equinovarus secondary to hemiplegic cerebral palsy (CP). Clinicians have previously addressed such variability by developing classification schemes to identify subgroups of individuals based on their kinematics. Objective To identify kinematic subgroups among youth with equinovarus secondary to CP using 3-dimensional multi-segment foot and ankle kinematics during locomotion as inputs for principal component analysis (PCA), and K-means cluster analysis. Methods In a single assessment session, multi-segment foot and ankle kinematics using the Milwaukee Foot Model (MFM) were collected in 24 children/adolescents with equinovarus and 20 typically developing children/adolescents. Results PCA was used as a data reduction technique on 40 variables. K-means cluster analysis was performed on the first six principal components (PCs) which accounted for 92% of the variance of the dataset. The PCs described the location and plane of involvement in the foot and ankle. Five distinct kinematic subgroups were identified using K-means clustering. Participants with equinovarus presented with variable involvement ranging from primary hindfoot or forefoot deviations to deformtiy that included both segments in multiple planes. Conclusion This study provides further evidence of the variability in foot characteristics associated with equinovarus secondary to hemiplegic CP. These findings would not have been detected using a single segment foot model. The identification of multiple kinematic subgroups with unique foot and ankle characteristics has the potential to improve treatment since similar patients within a subgroup are likely to benefit from the same intervention(s). PMID:25467429

  15. Recognizing different tissues in human fetal femur cartilage by label-free Raman microspectroscopy

    NASA Astrophysics Data System (ADS)

    Kunstar, Aliz; Leijten, Jeroen; van Leuveren, Stefan; Hilderink, Janneke; Otto, Cees; van Blitterswijk, Clemens A.; Karperien, Marcel; van Apeldoorn, Aart A.

    2012-11-01

    Traditionally, the composition of bone and cartilage is determined by standard histological methods. We used Raman microscopy, which provides a molecular "fingerprint" of the investigated sample, to detect differences between the zones in human fetal femur cartilage without the need for additional staining or labeling. Raman area scans were made from the (pre)articular cartilage, resting, proliferative, and hypertrophic zones of growth plate and endochondral bone within human fetal femora. Multivariate data analysis was performed on Raman spectral datasets to construct cluster images with corresponding cluster averages. Cluster analysis resulted in detection of individual chondrocyte spectra that could be separated from cartilage extracellular matrix (ECM) spectra and was verified by comparing cluster images with intensity-based Raman images for the deoxyribonucleic acid/ribonucleic acid (DNA/RNA) band. Specific dendrograms were created using Ward's clustering method, and principal component analysis (PCA) was performed with the separated and averaged Raman spectra of cells and ECM of all measured zones. Overall (dis)similarities between measured zones were effectively visualized on the dendrograms and main spectral differences were revealed by PCA allowing for label-free detection of individual cartilaginous zones and for label-free evaluation of proper cartilaginous matrix formation for future tissue engineering and clinical purposes.

  16. Extracting the regional common-mode component of GPS station position time series from dense continuous network

    NASA Astrophysics Data System (ADS)

    Tian, Yunfeng; Shen, Zheng-Kang

    2016-02-01

    We develop a spatial filtering method to remove random noise and extract the spatially correlated transients (i.e., common-mode component (CMC)) that deviate from zero mean over the span of detrended position time series of a continuous Global Positioning System (CGPS) network. The technique utilizes a weighting scheme that incorporates two factors—distances between neighboring sites and their correlations of long-term residual position time series. We use a grid search algorithm to find the optimal thresholds for deriving the CMC that minimizes the root-mean-square (RMS) of the filtered residual position time series. Comparing to the principal component analysis technique, our method achieves better (>13% on average) reduction of residual position scatters for the CGPS stations in western North America, eliminating regional transients of all spatial scales. It also has advantages in data manipulation: less intervention and applicable to a dense network of any spatial extent. Our method can also be used to detect CMC irrespective of its origins (i.e., tectonic or nontectonic), if such signals are of particular interests for further study. By varying the filtering distance range, the long-range CMC related to atmospheric disturbance can be filtered out, uncovering CMC associated with transient tectonic deformation. A correlation-based clustering algorithm is adopted to identify stations cluster that share the common regional transient characteristics.

  17. Water quality analysis of the Rapur area, Andhra Pradesh, South India using multivariate techniques

    NASA Astrophysics Data System (ADS)

    Nagaraju, A.; Sreedhar, Y.; Thejaswi, A.; Sayadi, Mohammad Hossein

    2017-10-01

    The groundwater samples from Rapur area were collected from different sites to evaluate the major ion chemistry. The large number of data can lead to difficulties in the integration, interpretation, and representation of the results. Two multivariate statistical methods, hierarchical cluster analysis (HCA) and factor analysis (FA), were applied to evaluate their usefulness to classify and identify geochemical processes controlling groundwater geochemistry. Four statistically significant clusters were obtained from 30 sampling stations. This has resulted two important clusters viz., cluster 1 (pH, Si, CO3, Mg, SO4, Ca, K, HCO3, alkalinity, Na, Na + K, Cl, and hardness) and cluster 2 (EC and TDS) which are released to the study area from different sources. The application of different multivariate statistical techniques, such as principal component analysis (PCA), assists in the interpretation of complex data matrices for a better understanding of water quality of a study area. From PCA, it is clear that the first factor (factor 1), accounted for 36.2% of the total variance, was high positive loading in EC, Mg, Cl, TDS, and hardness. Based on the PCA scores, four significant cluster groups of sampling locations were detected on the basis of similarity of their water quality.

  18. Detection of Fungus Infection on Petals of Rapeseed (Brassica napus L.) Using NIR Hyperspectral Imaging

    NASA Astrophysics Data System (ADS)

    Zhao, Yan-Ru; Yu, Ke-Qiang; Li, Xiaoli; He, Yong

    2016-12-01

    Infected petals are often regarded as the source for the spread of fungi Sclerotinia sclerotiorum in all growing process of rapeseed (Brassica napus L.) plants. This research aimed to detect fungal infection of rapeseed petals by applying hyperspectral imaging in the spectral region of 874-1734 nm coupled with chemometrics. Reflectance was extracted from regions of interest (ROIs) in the hyperspectral image of each sample. Firstly, principal component analysis (PCA) was applied to conduct a cluster analysis with the first several principal components (PCs). Then, two methods including X-loadings of PCA and random frog (RF) algorithm were used and compared for optimizing wavebands selection. Least squares-support vector machine (LS-SVM) methodology was employed to establish discriminative models based on the optimal and full wavebands. Finally, area under the receiver operating characteristics curve (AUC) was utilized to evaluate classification performance of these LS-SVM models. It was found that LS-SVM based on the combination of all optimal wavebands had the best performance with AUC of 0.929. These results were promising and demonstrated the potential of applying hyperspectral imaging in fungus infection detection on rapeseed petals.

  19. Preliminary Comparisons of the Information Content and Utility of TM Versus MSS Data

    NASA Technical Reports Server (NTRS)

    Markham, B. L.

    1984-01-01

    Comparisons were made between subscenes from the first TM scene acquired of the Washington, D.C. area and a MSS scene acquired approximately one year earlier. Three types of analyses were conducted to compare TM and MSS data: a water body analysis, a principal components analysis and a spectral clustering analysis. The water body analysis compared the capability of the TM to the MSS for detecting small uniform targets. Of the 59 ponds located on aerial photographs 34 (58%) were detected by the TM with six commission errors (15%) and 13 (22%) were detected by the MSS with three commission errors (19%). The smallest water body detected by the TM was 16 meters; the smallest detected by the MSS was 40 meters. For the principal components analysis, means and covariance matrices were calculated for each subscene, and principal components images generated and characterized. In the spectral clustering comparison each scene was independently clustered and the clusters were assigned to informational classes. The preliminary comparison indicated that TM data provides enhancements over MSS in terms of (1) small target detection and (2) data dimensionality (even with 4-band data). The extra dimension, partially resultant from TM band 1, appears useful for built-up/non-built-up area separation.

  20. HT-FRTC: a fast radiative transfer code using kernel regression

    NASA Astrophysics Data System (ADS)

    Thelen, Jean-Claude; Havemann, Stephan; Lewis, Warren

    2016-09-01

    The HT-FRTC is a principal component based fast radiative transfer code that can be used across the electromagnetic spectrum from the microwave through to the ultraviolet to calculate transmittance, radiance and flux spectra. The principal components cover the spectrum at a very high spectral resolution, which allows very fast line-by-line, hyperspectral and broadband simulations for satellite-based, airborne and ground-based sensors. The principal components are derived during a code training phase from line-by-line simulations for a diverse set of atmosphere and surface conditions. The derived principal components are sensor independent, i.e. no extra training is required to include additional sensors. During the training phase we also derive the predictors which are required by the fast radiative transfer code to determine the principal component scores from the monochromatic radiances (or fluxes, transmittances). These predictors are calculated for each training profile at a small number of frequencies, which are selected by a k-means cluster algorithm during the training phase. Until recently the predictors were calculated using a linear regression. However, during a recent rewrite of the code the linear regression was replaced by a Gaussian Process (GP) regression which resulted in a significant increase in accuracy when compared to the linear regression. The HT-FRTC has been trained with a large variety of gases, surface properties and scatterers. Rayleigh scattering as well as scattering by frozen/liquid clouds, hydrometeors and aerosols have all been included. The scattering phase function can be fully accounted for by an integrated line-by-line version of the Edwards-Slingo spherical harmonics radiation code or approximately by a modification to the extinction (Chou scaling).

  1. Delineation of Stenotrophomonas maltophilia isolates from cystic fibrosis patients by fatty acid methyl ester profiles and matrix-assisted laser desorption/ionization time-of-flight mass spectra using hierarchical cluster analysis and principal component analysis.

    PubMed

    Vidigal, Pedrina Gonçalves; Mosel, Frank; Koehling, Hedda Luise; Mueller, Karl Dieter; Buer, Jan; Rath, Peter Michael; Steinmann, Joerg

    2014-12-01

    Stenotrophomonas maltophilia is an opportunist multidrug-resistant pathogen that causes a wide range of nosocomial infections. Various cystic fibrosis (CF) centres have reported an increasing prevalence of S. maltophilia colonization/infection among patients with this disease. The purpose of this study was to assess specific fingerprints of S. maltophilia isolates from CF patients (n = 71) by investigating fatty acid methyl esters (FAMEs) through gas chromatography (GC) and highly abundant proteins by matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS), and to compare them with isolates obtained from intensive care unit (ICU) patients (n = 20) and the environment (n = 11). Principal component analysis (PCA) of GC-FAME patterns did not reveal a clustering corresponding to distinct CF, ICU or environmental types. Based on the peak area index, it was observed that S. maltophilia isolates from CF patients produced significantly higher amounts of fatty acids in comparison with ICU patients and the environmental isolates. Hierarchical cluster analysis (HCA) based on the MALDI-TOF MS peak profiles of S. maltophilia revealed the presence of five large clusters, suggesting a high phenotypic diversity. Although HCA of MALDI-TOF mass spectra did not result in distinct clusters predominantly composed of CF isolates, PCA revealed the presence of a distinct cluster composed of S. maltophilia isolates from CF patients. Our data suggest that S. maltophilia colonizing CF patients tend to modify not only their fatty acid patterns but also their protein patterns as a response to adaptation in the unfavourable environment of the CF lung. © 2014 The Authors.

  2. Long-term surface EMG monitoring using K-means clustering and compressive sensing

    NASA Astrophysics Data System (ADS)

    Balouchestani, Mohammadreza; Krishnan, Sridhar

    2015-05-01

    In this work, we present an advanced K-means clustering algorithm based on Compressed Sensing theory (CS) in combination with the K-Singular Value Decomposition (K-SVD) method for Clustering of long-term recording of surface Electromyography (sEMG) signals. The long-term monitoring of sEMG signals aims at recording of the electrical activity produced by muscles which are very useful procedure for treatment and diagnostic purposes as well as for detection of various pathologies. The proposed algorithm is examined for three scenarios of sEMG signals including healthy person (sEMG-Healthy), a patient with myopathy (sEMG-Myopathy), and a patient with neuropathy (sEMG-Neuropathr), respectively. The proposed algorithm can easily scan large sEMG datasets of long-term sEMG recording. We test the proposed algorithm with Principal Component Analysis (PCA) and Linear Correlation Coefficient (LCC) dimensionality reduction methods. Then, the output of the proposed algorithm is fed to K-Nearest Neighbours (K-NN) and Probabilistic Neural Network (PNN) classifiers in order to calclute the clustering performance. The proposed algorithm achieves a classification accuracy of 99.22%. This ability allows reducing 17% of Average Classification Error (ACE), 9% of Training Error (TE), and 18% of Root Mean Square Error (RMSE). The proposed algorithm also reduces 14% clustering energy consumption compared to the existing K-Means clustering algorithm.

  3. Wavelet decomposition based principal component analysis for face recognition using MATLAB

    NASA Astrophysics Data System (ADS)

    Sharma, Mahesh Kumar; Sharma, Shashikant; Leeprechanon, Nopbhorn; Ranjan, Aashish

    2016-03-01

    For the realization of face recognition systems in the static as well as in the real time frame, algorithms such as principal component analysis, independent component analysis, linear discriminate analysis, neural networks and genetic algorithms are used for decades. This paper discusses an approach which is a wavelet decomposition based principal component analysis for face recognition. Principal component analysis is chosen over other algorithms due to its relative simplicity, efficiency, and robustness features. The term face recognition stands for identifying a person from his facial gestures and having resemblance with factor analysis in some sense, i.e. extraction of the principal component of an image. Principal component analysis is subjected to some drawbacks, mainly the poor discriminatory power and the large computational load in finding eigenvectors, in particular. These drawbacks can be greatly reduced by combining both wavelet transform decomposition for feature extraction and principal component analysis for pattern representation and classification together, by analyzing the facial gestures into space and time domain, where, frequency and time are used interchangeably. From the experimental results, it is envisaged that this face recognition method has made a significant percentage improvement in recognition rate as well as having a better computational efficiency.

  4. Genetic Variability among Lucerne Cultivars Based on Biochemical (SDS-PAGE) and Morphological Markers

    NASA Astrophysics Data System (ADS)

    Farshadfar, M.; Farshadfar, E.

    The present research was conducted to determine the genetic variability of 18 Lucerne cultivars, based on morphological and biochemical markers. The traits studied were plant height, tiller number, biomass, dry yield, dry yield/biomass, dry leaf/dry yield, macro and micro elements, crude protein, dry matter, crude fiber and ash percentage and SDS- PAGE in seed and leaf samples. Field experiments included 18 plots of two meter rows. Data based on morphological, chemical and SDS-PAGE markers were analyzed using SPSSWIN soft ware and the multivariate statistical procedures: cluster analysis (UPGMA), principal component. Analysis of analysis of variance and mean comparison for morphological traits reflected significant differences among genotypes. Genotype 13 and 15 had the greatest values for most traits. The Genotypic Coefficient of Variation (GCV), Phenotypic Coefficient of Variation (PCV) and Heritability (Hb) parameters for different characters raged from 12.49 to 26.58% for PCV, hence the GCV ranged from 6.84 to 18.84%. The greatest value of Hb was 0.94 for stem number. Lucerne genotypes could be classified, based on morphological traits, into four clusters and 94% of the variance among the genotypes was explained by two PCAs: Based on chemical traits they were classified into five groups and 73.492% of variance was explained by four principal components: Dry matter, protein, fiber, P, K, Na, Mg and Zn had higher variance. Genotypes based on the SDS-PAGE patterns all genotypes were classified into three clusters. The greatest genetic distance was between cultivar 10 and others, therefore they would be suitable parent in a breeding program.

  5. A new approach for computing a flood vulnerability index using cluster analysis

    NASA Astrophysics Data System (ADS)

    Fernandez, Paulo; Mourato, Sandra; Moreira, Madalena; Pereira, Luísa

    2016-08-01

    A Flood Vulnerability Index (FloodVI) was developed using Principal Component Analysis (PCA) and a new aggregation method based on Cluster Analysis (CA). PCA simplifies a large number of variables into a few uncorrelated factors representing the social, economic, physical and environmental dimensions of vulnerability. CA groups areas that have the same characteristics in terms of vulnerability into vulnerability classes. The grouping of the areas determines their classification contrary to other aggregation methods in which the areas' classification determines their grouping. While other aggregation methods distribute the areas into classes, in an artificial manner, by imposing a certain probability for an area to belong to a certain class, as determined by the assumption that the aggregation measure used is normally distributed, CA does not constrain the distribution of the areas by the classes. FloodVI was designed at the neighbourhood level and was applied to the Portuguese municipality of Vila Nova de Gaia where several flood events have taken place in the recent past. The FloodVI sensitivity was assessed using three different aggregation methods: the sum of component scores, the first component score and the weighted sum of component scores. The results highlight the sensitivity of the FloodVI to different aggregation methods. Both sum of component scores and weighted sum of component scores have shown similar results. The first component score aggregation method classifies almost all areas as having medium vulnerability and finally the results obtained using the CA show a distinct differentiation of the vulnerability where hot spots can be clearly identified. The information provided by records of previous flood events corroborate the results obtained with CA, because the inundated areas with greater damages are those that are identified as high and very high vulnerability areas by CA. This supports the fact that CA provides a reliable FloodVI.

  6. Classification of LC columns based on the QSRR method and selectivity toward moclobemide and its metabolites.

    PubMed

    Plenis, Alina; Olędzka, Ilona; Bączek, Tomasz

    2013-05-05

    This paper focuses on a comparative study of the column classification system based on the quantitative structure-retention relationships (QSRR method) and column performance in real biomedical analysis. The assay was carried out for the LC separation of moclobemide and its metabolites in human plasma, using a set of 24 stationary phases. The QSRR models established for the studied stationary phases were compared with the column test performance results under two chemometric techniques - the principal component analysis (PCA) and the hierarchical clustering analysis (HCA). The study confirmed that the stationary phase classes found closely related by the QSRR approach yielded comparable separation for moclobemide and its metabolites. Therefore, the QSRR method could be considered supportive in the selection of a suitable column for the biomedical analysis offering the selection of similar or dissimilar columns with a relatively higher certainty. Copyright © 2013 Elsevier B.V. All rights reserved.

  7. Clustering analysis strategies for electron energy loss spectroscopy (EELS).

    PubMed

    Torruella, Pau; Estrader, Marta; López-Ortega, Alberto; Baró, Maria Dolors; Varela, Maria; Peiró, Francesca; Estradé, Sònia

    2018-02-01

    In this work, the use of cluster analysis algorithms, widely applied in the field of big data, is proposed to explore and analyze electron energy loss spectroscopy (EELS) data sets. Three different data clustering approaches have been tested both with simulated and experimental data from Fe 3 O 4 /Mn 3 O 4 core/shell nanoparticles. The first method consists on applying data clustering directly to the acquired spectra. A second approach is to analyze spectral variance with principal component analysis (PCA) within a given data cluster. Lastly, data clustering on PCA score maps is discussed. The advantages and requirements of each approach are studied. Results demonstrate how clustering is able to recover compositional and oxidation state information from EELS data with minimal user input, giving great prospects for its usage in EEL spectroscopy. Copyright © 2017 Elsevier B.V. All rights reserved.

  8. EMPCA and Cluster Analysis of Quasar Spectra: Construction and Application to Simulated Spectra

    NASA Astrophysics Data System (ADS)

    Marrs, Adam; Leighly, Karen; Wagner, Cassidy; Macinnis, Francis

    2017-01-01

    Quasars have complex spectra with emission lines influenced by many factors. Therefore, to fully describe the spectrum requires specification of a large number of parameters, such as line equivalent width, blueshift, and ratios. Principal Component Analysis (PCA) aims to construct eigenvectors-or principal components-from the data with the goal of finding a few key parameters that can be used to predict the rest of the spectrum fairly well. Analysis of simulated quasar spectra was used to verify and justify our modified application of PCA.We used a variant of PCA called Weighted Expectation Maximization PCA (EMPCA; Bailey 2012) along with k-means cluster analysis to analyze simulated quasar spectra. Our approach combines both analytical methods to address two known problems with classical PCA. EMPCA uses weights to account for uncertainty and missing points in the spectra. K-means groups similar spectra together to address the nonlinearity of quasar spectra, specifically variance in blueshifts and widths of the emission lines.In producing and analyzing simulations, we first tested the effects of varying equivalent widths and blueshifts on the derived principal components, and explored the differences between standard PCA and EMPCA. We also tested the effects of varying signal-to-noise ratio. Next we used the results of fits to composite quasar spectra (see accompanying poster by Wagner et al.) to construct a set of realistic simulated spectra, and subjected those spectra to the EMPCA /k-means analysis. We concluded that our approach was validated when we found that the mean spectra from our k-means clusters derived from PCA projection coefficients reproduced the trends observed in the composite spectra.Furthermore, our method needed only two eigenvectors to identify both sets of correlations used to construct the simulations, as well as indicating the linear and nonlinear segments. Comparing this to regular PCA, which can require a dozen or more components, or to direct spectral analysis that may need measurement of 20 fit parameters, shows why the dual application of these two techniques is such a powerful tool.

  9. Spatial analysis of pulmonary tuberculosis in Antananarivo Madagascar: tuberculosis-related knowledge, attitude and practice.

    PubMed

    Rakotosamimanana, Sitraka; Mandrosovololona, Vatsiharizandry; Rakotonirina, Julio; Ramamonjisoa, Joselyne; Ranjalahy, Justin Rasolofomanana; Randremanana, Rindra Vatosoa; Rakotomanana, Fanjasoa

    2014-01-01

    Tuberculosis infection may remain latent, but the disease is nevertheless a serious public health issue. Various epidemiological studies on pulmonary tuberculosis have considered the spatial component and taken it into account, revealing the tendency of this disease to cluster in particular locations. The aim was to assess the contribution of Knowledge Attitude and Practice (KAP) to the distribution of tuberculosis and to provide information for the improvement of the National Tuberculosis Program. We investigated the role of KAP to distribution patterns of pulmonary tuberculosis in Antananarivo. First, we performed spatial scanning of tuberculosis aggregation among permanent cases resident in Antananarivo Urban Township using the Kulldorff method, and then we carried out a quantitative study on KAP, involving TB patients. The KAP study in the population was based on qualitative methods with focus groups. The disease still clusters in the same districts identified in the previous study. The principal cluster covered 22 neighborhoods. Most of them are part of the first district. A secondary cluster was found, involving 18 neighborhoods in the sixth district and two neighborhoods in the fifth. The relative risk was respectively 1.7 (p<10-6) in the principal cluster and 1.6 (p<10-3) in the secondary cluster. Our study showed that more was known about TB symptoms than about the duration of the disease or free treatment. Knowledge about TB was limited to that acquired at school or from relatives with TB. The attitude and practices of patients and the population in general indicated that there is still a stigma attached to tuberculosis. This type of survey can be conducted in remote zones where the tuberculosis-related KAP of the TB patients and the general population is less known or not documented; the findings could be used to adapt control measures to the local particularities.

  10. Discriminating the Mineralogical Composition in Drill Cuttings Based on Absorption Spectra in the Terahertz Range.

    PubMed

    Miao, Xinyang; Li, Hao; Bao, Rima; Feng, Chengjing; Wu, Hang; Zhan, Honglei; Li, Yizhang; Zhao, Kun

    2017-02-01

    Understanding the geological units of a reservoir is essential to the development and management of the resource. In this paper, drill cuttings from several depths from an oilfield were studied using terahertz time domain spectroscopy (THz-TDS). Cluster analysis (CA) and principal component analysis (PCA) were employed to classify and analyze the cuttings. The cuttings were clearly classified based on CA and PCA methods, and the results were in agreement with the lithology. Moreover, calcite and dolomite have stronger absorption of a THz pulse than any other minerals, based on an analysis of the PC1 scores. Quantitative analyses of minor minerals were also realized by building a series of linear and non-linear models between contents and PC2 scores. The results prove THz technology to be a promising means for determining reservoir lithology as well as other properties, which will be a significant supplementary method in oil fields.

  11. Sensor fault diagnosis of aero-engine based on divided flight status.

    PubMed

    Zhao, Zhen; Zhang, Jun; Sun, Yigang; Liu, Zhexu

    2017-11-01

    Fault diagnosis and safety analysis of an aero-engine have attracted more and more attention in modern society, whose safety directly affects the flight safety of an aircraft. In this paper, the problem concerning sensor fault diagnosis is investigated for an aero-engine during the whole flight process. Considering that the aero-engine is always working in different status through the whole flight process, a flight status division-based sensor fault diagnosis method is presented to improve fault diagnosis precision for the aero-engine. First, aero-engine status is partitioned according to normal sensor data during the whole flight process through the clustering algorithm. Based on that, a diagnosis model is built for each status using the principal component analysis algorithm. Finally, the sensors are monitored using the built diagnosis models by identifying the aero-engine status. The simulation result illustrates the effectiveness of the proposed method.

  12. Sensor fault diagnosis of aero-engine based on divided flight status

    NASA Astrophysics Data System (ADS)

    Zhao, Zhen; Zhang, Jun; Sun, Yigang; Liu, Zhexu

    2017-11-01

    Fault diagnosis and safety analysis of an aero-engine have attracted more and more attention in modern society, whose safety directly affects the flight safety of an aircraft. In this paper, the problem concerning sensor fault diagnosis is investigated for an aero-engine during the whole flight process. Considering that the aero-engine is always working in different status through the whole flight process, a flight status division-based sensor fault diagnosis method is presented to improve fault diagnosis precision for the aero-engine. First, aero-engine status is partitioned according to normal sensor data during the whole flight process through the clustering algorithm. Based on that, a diagnosis model is built for each status using the principal component analysis algorithm. Finally, the sensors are monitored using the built diagnosis models by identifying the aero-engine status. The simulation result illustrates the effectiveness of the proposed method.

  13. Groundwater quality assessment of urban Bengaluru using multivariate statistical techniques

    NASA Astrophysics Data System (ADS)

    Gulgundi, Mohammad Shahid; Shetty, Amba

    2018-03-01

    Groundwater quality deterioration due to anthropogenic activities has become a subject of prime concern. The objective of the study was to assess the spatial and temporal variations in groundwater quality and to identify the sources in the western half of the Bengaluru city using multivariate statistical techniques. Water quality index rating was calculated for pre and post monsoon seasons to quantify overall water quality for human consumption. The post-monsoon samples show signs of poor quality in drinking purpose compared to pre-monsoon. Cluster analysis (CA), principal component analysis (PCA) and discriminant analysis (DA) were applied to the groundwater quality data measured on 14 parameters from 67 sites distributed across the city. Hierarchical cluster analysis (CA) grouped the 67 sampling stations into two groups, cluster 1 having high pollution and cluster 2 having lesser pollution. Discriminant analysis (DA) was applied to delineate the most meaningful parameters accounting for temporal and spatial variations in groundwater quality of the study area. Temporal DA identified pH as the most important parameter, which discriminates between water quality in the pre-monsoon and post-monsoon seasons and accounts for 72% seasonal assignation of cases. Spatial DA identified Mg, Cl and NO3 as the three most important parameters discriminating between two clusters and accounting for 89% spatial assignation of cases. Principal component analysis was applied to the dataset obtained from the two clusters, which evolved three factors in each cluster, explaining 85.4 and 84% of the total variance, respectively. Varifactors obtained from principal component analysis showed that groundwater quality variation is mainly explained by dissolution of minerals from rock water interactions in the aquifer, effect of anthropogenic activities and ion exchange processes in water.

  14. A comparison of autonomous techniques for multispectral image analysis and classification

    NASA Astrophysics Data System (ADS)

    Valdiviezo-N., Juan C.; Urcid, Gonzalo; Toxqui-Quitl, Carina; Padilla-Vivanco, Alfonso

    2012-10-01

    Multispectral imaging has given place to important applications related to classification and identification of objects from a scene. Because of multispectral instruments can be used to estimate the reflectance of materials in the scene, these techniques constitute fundamental tools for materials analysis and quality control. During the last years, a variety of algorithms has been developed to work with multispectral data, whose main purpose has been to perform the correct classification of the objects in the scene. The present study introduces a brief review of some classical as well as a novel technique that have been used for such purposes. The use of principal component analysis and K-means clustering techniques as important classification algorithms is here discussed. Moreover, a recent method based on the min-W and max-M lattice auto-associative memories, that was proposed for endmember determination in hyperspectral imagery, is introduced as a classification method. Besides a discussion of their mathematical foundation, we emphasize their main characteristics and the results achieved for two exemplar images conformed by objects similar in appearance, but spectrally different. The classification results state that the first components computed from principal component analysis can be used to highlight areas with different spectral characteristics. In addition, the use of lattice auto-associative memories provides good results for materials classification even in the cases where some spectral similarities appears in their spectral responses.

  15. Statistical interpretation of chromatic indicators in correlation to phytochemical profile of a sulfur dioxide-free mulberry (Morus nigra) wine submitted to non-thermal maturation processes.

    PubMed

    Tchabo, William; Ma, Yongkun; Kwaw, Emmanuel; Zhang, Haining; Xiao, Lulu; Apaliya, Maurice T

    2018-01-15

    The four different methods of color measurement of wine proposed by Boulton, Giusti, Glories and Commission International de l'Eclairage (CIE) were applied to assess the statistical relationship between the phytochemical profile and chromatic characteristics of sulfur dioxide-free mulberry (Morus nigra) wine submitted to non-thermal maturation processes. The alteration in chromatic properties and phenolic composition of non-thermal aged mulberry wine were examined, aided by the used of Pearson correlation, cluster and principal component analysis. The results revealed a positive effect of non-thermal processes on phytochemical families of wines. From Pearson correlation analysis relationships between chromatic indexes and flavonols as well as anthocyanins were established. Cluster analysis highlighted similarities between Boulton and Giusti parameters, as well as Glories and CIE parameters in the assessment of chromatic properties of wines. Finally, principal component analysis was able to discriminate wines subjected to different maturation techniques on the basis of their chromatic and phenolics characteristics. Copyright © 2017. Published by Elsevier Ltd.

  16. Micro-PIXE analysis of trace element concentrations of natural rubies from different locations in Myanmar

    NASA Astrophysics Data System (ADS)

    Sanchez, J. L.; Osipowicz, T.; Tang, S. M.; Tay, T. S.; Win, T. T.

    1997-07-01

    The trace element concentrations found in geological samples can shed light on the formation process. In the case of gemstones, which might be of artificial or natural origin, there is also considerable interest in the development of methods that provide identification of the origin of a sample. For rubies, trace element concentrations present in natural samples were shown previously to be significant indicators of the region of origin [S.M. Tang et al., Appl. Spectr. 42 (1988) 44, and 43 (1989) 219]. Here we report the results of micro-PIXE analyses of trace element (Ti, V, Cr, Fe, Cu and Ga) concentrations of a large set ( n = 130) of natural rough rubies from nine locations in Myanmar (Burma). The resulting concentrations are subjected to statistical analysis. Six of the nine groups form clusters when the data base is evaluated using tree clustering and principal component analysis.

  17. A cluster analysis method for identification of subpopulations of cells in flow cytometric list-mode arrays

    NASA Technical Reports Server (NTRS)

    Li, Z. K.

    1985-01-01

    A specialized program was developed for flow cytometric list-mode data using an heirarchical tree method for identifying and enumerating individual subpopulations, the method of principal components for a two-dimensional display of 6-parameter data array, and a standard sorting algorithm for characterizing subpopulations. The program was tested against a published data set subjected to cluster analysis and experimental data sets from controlled flow cytometry experiments using a Coulter Electronics EPICS V Cell Sorter. A version of the program in compiled BASIC is usable on a 16-bit microcomputer with the MS-DOS operating system. It is specialized for 6 parameters and up to 20,000 cells. Its two-dimensional display of Euclidean distances reveals clusters clearly, as does its 1-dimensional display. The identified subpopulations can, in suitable experiments, be related to functional subpopulations of cells.

  18. Analyzing coastal environments by means of functional data analysis

    NASA Astrophysics Data System (ADS)

    Sierra, Carlos; Flor-Blanco, Germán; Ordoñez, Celestino; Flor, Germán; Gallego, José R.

    2017-07-01

    Here we used Functional Data Analysis (FDA) to examine particle-size distributions (PSDs) in a beach/shallow marine sedimentary environment in Gijón Bay (NW Spain). The work involved both Functional Principal Components Analysis (FPCA) and Functional Cluster Analysis (FCA). The grainsize of the sand samples was characterized by means of laser dispersion spectroscopy. Within this framework, FPCA was used as a dimension reduction technique to explore and uncover patterns in grain-size frequency curves. This procedure proved useful to describe variability in the structure of the data set. Moreover, an alternative approach, FCA, was applied to identify clusters and to interpret their spatial distribution. Results obtained with this latter technique were compared with those obtained by means of two vector approaches that combine PCA with CA (Cluster Analysis). The first method, the point density function (PDF), was employed after adapting a log-normal distribution to each PSD and resuming each of the density functions by its mean, sorting, skewness and kurtosis. The second applied a centered-log-ratio (clr) to the original data. PCA was then applied to the transformed data, and finally CA to the retained principal component scores. The study revealed functional data analysis, specifically FPCA and FCA, as a suitable alternative with considerable advantages over traditional vector analysis techniques in sedimentary geology studies.

  19. Screening molecular associations with lipid membranes using natural abundance 13C cross-polarization magic-angle spinning NMR and principal component analysis.

    PubMed

    Middleton, David A; Hughes, Eleri; Madine, Jillian

    2004-08-11

    We describe an NMR approach for detecting the interactions between phospholipid membranes and proteins, peptides, or small molecules. First, 1H-13C dipolar coupling profiles are obtained from hydrated lipid samples at natural isotope abundance using cross-polarization magic-angle spinning NMR methods. Principal component analysis of dipolar coupling profiles for synthetic lipid membranes in the presence of a range of biologically active additives reveals clusters that relate to different modes of interaction of the additives with the lipid bilayer. Finally, by representing profiles from multiple samples in the form of contour plots, it is possible to reveal statistically significant changes in dipolar couplings, which reflect perturbations in the lipid molecules at the membrane surface or within the hydrophobic interior.

  20. Organic Food Market Segmentation in Lebanon

    NASA Astrophysics Data System (ADS)

    Tleis, Malak; Roma, Rocco; Callieris, Roberta

    2015-04-01

    Organic farming in Lebanon is not a new concept. It started with the efforts of the private sector more than a decade ago and is still present even with the limited agricultural production. The local market is quite developed in comparison to neighboring countries, depending mainly on imports. Few studies were addressed to organic consumption in Lebanon, were none of them dealt with organic consumers analysis. Therefore, our objectives were to identify the profiles of Lebanese organic consumer and non organic consumer and to propose appropriate marketing strategies for each segment of consumer with the final aim of developing the Lebanese organic market. A survey, based on the use of closed-ended questionnaire, was addressed to 400 consumers in the capital, Beirut, from the end of February till the end of March 2014. Data underwent descriptive analyses, principal component analyses (PCA) and cluster analyses (k-means method) through the statistical software SPSS. Four cluster were obtained based on psychographic characteristics and willingness to pay (WTP) for the principal organic products purchased. "Localists" and "Health conscious" clusters constituted the largest proportion of the selected sample, thus were the most critical to be addressed by specific marketing strategies emphasizing the combination of local and organic food and the healthy properties of organic products. "Rational" and "Irregular" cluster were relatively small groups, addressed by pricing and promotional strategies. This study showed a positive attitude among Lebanese consumer towards organic food, where egoistic motives are prevailing over altruistic motives. High prices of organic commodities and low trust in organic farming, remain a constraint to levitating organic consumption. The combined efforts of the public and the private sector are required to spread the knowledge about positive environmental payback of organic agriculture and for the promotion of locally produced organic goods.

  1. Network visualization of conformational sampling during molecular dynamics simulation.

    PubMed

    Ahlstrom, Logan S; Baker, Joseph Lee; Ehrlich, Kent; Campbell, Zachary T; Patel, Sunita; Vorontsov, Ivan I; Tama, Florence; Miyashita, Osamu

    2013-11-01

    Effective data reduction methods are necessary for uncovering the inherent conformational relationships present in large molecular dynamics (MD) trajectories. Clustering algorithms provide a means to interpret the conformational sampling of molecules during simulation by grouping trajectory snapshots into a few subgroups, or clusters, but the relationships between the individual clusters may not be readily understood. Here we show that network analysis can be used to visualize the dominant conformational states explored during simulation as well as the connectivity between them, providing a more coherent description of conformational space than traditional clustering techniques alone. We compare the results of network visualization against 11 clustering algorithms and principal component conformer plots. Several MD simulations of proteins undergoing different conformational changes demonstrate the effectiveness of networks in reaching functional conclusions. Copyright © 2013 Elsevier Inc. All rights reserved.

  2. [Seed vigor evaluation based on adversity resistance index of wheat seed germination under stress conditions.

    PubMed

    Chen, Lei Tai; Sun, Ai Qing; Yang, Min; Chen, Lu Lu; Ma, Xue Li; Li, Mei Ling; Yin, Yan Ping

    2016-09-01

    A total of 16 wheat cultivars were selected to detect seed vigor of different genotypes using standard germination test, seed germination test under stress conditions and field emergence test. The adversity resistance indices of seed vigor indices and field emergence percentage under different germination conditions were used as the indices to evaluate adversity resistance. Principal component analysis and cluster analysis were used for the comprehensive evaluation of seed vigor. Results showed that drought stress, artificial aging and cold soaking treatments affected seed vigor to some extent. The adversity resistance indices of the artificial aging and cold soaking tests were significantly positively correlated with the field emergence percentage, while the adversity resistance index of drought stress test had no significant correlation with the field emergence percentage. 16 wheat cultivars were classified as three groups based on the principal component analysis and cluster analysis. Yunong 949, Yumai 49-198, Luyuan 502, Zhengyumai 9987, Shimai 21, Shannong 23, and Shixin 828 belonged to high vigor seeds. Xunong 5, Yunong 982, Tangmai 8, Jimai 20, Jimai 22, Jinan 17, and Shannong 20 belonged to medium vigor seeds. The other two cultivars, Chang 4738 and Lunxuan 061, belonged to low vigor seeds.

  3. Feature extraction for ultrasonic sensor based defect detection in ceramic components

    NASA Astrophysics Data System (ADS)

    Kesharaju, Manasa; Nagarajah, Romesh

    2014-02-01

    High density silicon carbide materials are commonly used as the ceramic element of hard armour inserts used in traditional body armour systems to reduce their weight, while providing improved hardness, strength and elastic response to stress. Currently, armour ceramic tiles are inspected visually offline using an X-ray technique that is time consuming and very expensive. In addition, from X-rays multiple defects are also misinterpreted as single defects. Therefore, to address these problems the ultrasonic non-destructive approach is being investigated. Ultrasound based inspection would be far more cost effective and reliable as the methodology is applicable for on-line quality control including implementation of accept/reject criteria. This paper describes a recently developed methodology to detect, locate and classify various manufacturing defects in ceramic tiles using sub band coding of ultrasonic test signals. The wavelet transform is applied to the ultrasonic signal and wavelet coefficients in the different frequency bands are extracted and used as input features to an artificial neural network (ANN) for purposes of signal classification. Two different classifiers, using artificial neural networks (supervised) and clustering (un-supervised) are supplied with features selected using Principal Component Analysis(PCA) and their classification performance compared. This investigation establishes experimentally that Principal Component Analysis(PCA) can be effectively used as a feature selection method that provides superior results for classifying various defects in the context of ultrasonic inspection in comparison with the X-ray technique.

  4. Detecting transitions in protein dynamics using a recurrence quantification analysis based bootstrap method.

    PubMed

    Karain, Wael I

    2017-11-28

    Proteins undergo conformational transitions over different time scales. These transitions are closely intertwined with the protein's function. Numerous standard techniques such as principal component analysis are used to detect these transitions in molecular dynamics simulations. In this work, we add a new method that has the ability to detect transitions in dynamics based on the recurrences in the dynamical system. It combines bootstrapping and recurrence quantification analysis. We start from the assumption that a protein has a "baseline" recurrence structure over a given period of time. Any statistically significant deviation from this recurrence structure, as inferred from complexity measures provided by recurrence quantification analysis, is considered a transition in the dynamics of the protein. We apply this technique to a 132 ns long molecular dynamics simulation of the β-Lactamase Inhibitory Protein BLIP. We are able to detect conformational transitions in the nanosecond range in the recurrence dynamics of the BLIP protein during the simulation. The results compare favorably to those extracted using the principal component analysis technique. The recurrence quantification analysis based bootstrap technique is able to detect transitions between different dynamics states for a protein over different time scales. It is not limited to linear dynamics regimes, and can be generalized to any time scale. It also has the potential to be used to cluster frames in molecular dynamics trajectories according to the nature of their recurrence dynamics. One shortcoming for this method is the need to have large enough time windows to insure good statistical quality for the recurrence complexity measures needed to detect the transitions.

  5. The effect of pomegranate seed oil and grapeseed oil on cis-9, trans-11 CLA (rumenic acid), n-3 and n-6 fatty acids deposition in selected tissues of chickens.

    PubMed

    Białek, A; Białek, M; Lepionka, T; Kaszperuk, K; Banaszkiewicz, T; Tokarz, A

    2018-04-23

    The aim of this study was to determine whether diet modification with different doses of grapeseed oil or pomegranate seed oil will improve the nutritive value of poultry meat in terms of n-3 and n-6 fatty acids, as well as rumenic acid (cis-9, trans-11 conjugated linoleic acid) content in tissues diversified in lipid composition and roles in lipid metabolism. To evaluate the influence of applied diet modification comprehensively, two chemometric methods were used. Results of cluster analysis demonstrated that pomegranate seed oil modifies fatty acids profile in the most potent way, mainly by an increase in rumenic acid content. Principal component analysis showed that regardless of type of tissue first principal component is strongly associated with type of deposited fatty acid, while second principal component enables identification of place of deposition-type of tissue. Pomegranate seed oil seems to be a valuable feed additive in chickens' feeding. © 2018 Blackwell Verlag GmbH.

  6. Airborne electromagnetic data levelling using principal component analysis based on flight line difference

    NASA Astrophysics Data System (ADS)

    Zhang, Qiong; Peng, Cong; Lu, Yiming; Wang, Hao; Zhu, Kaiguang

    2018-04-01

    A novel technique is developed to level airborne geophysical data using principal component analysis based on flight line difference. In the paper, flight line difference is introduced to enhance the features of levelling error for airborne electromagnetic (AEM) data and improve the correlation between pseudo tie lines. Thus we conduct levelling to the flight line difference data instead of to the original AEM data directly. Pseudo tie lines are selected distributively cross profile direction, avoiding the anomalous regions. Since the levelling errors of selective pseudo tie lines show high correlations, principal component analysis is applied to extract the local levelling errors by low-order principal components reconstruction. Furthermore, we can obtain the levelling errors of original AEM data through inverse difference after spatial interpolation. This levelling method does not need to fly tie lines and design the levelling fitting function. The effectiveness of this method is demonstrated by the levelling results of survey data, comparing with the results from tie-line levelling and flight-line correlation levelling.

  7. Functional connectivity analysis of the neural bases of emotion regulation: A comparison of independent component method with density-based k-means clustering method.

    PubMed

    Zou, Ling; Guo, Qian; Xu, Yi; Yang, Biao; Jiao, Zhuqing; Xiang, Jianbo

    2016-04-29

    Functional magnetic resonance imaging (fMRI) is an important tool in neuroscience for assessing connectivity and interactions between distant areas of the brain. To find and characterize the coherent patterns of brain activity as a means of identifying brain systems for the cognitive reappraisal of the emotion task, both density-based k-means clustering and independent component analysis (ICA) methods can be applied to characterize the interactions between brain regions involved in cognitive reappraisal of emotion. Our results reveal that compared with the ICA method, the density-based k-means clustering method provides a higher sensitivity of polymerization. In addition, it is more sensitive to those relatively weak functional connection regions. Thus, the study concludes that in the process of receiving emotional stimuli, the relatively obvious activation areas are mainly distributed in the frontal lobe, cingulum and near the hypothalamus. Furthermore, density-based k-means clustering method creates a more reliable method for follow-up studies of brain functional connectivity.

  8. REGIONAL-SCALE WIND FIELD CLASSIFICATION EMPLOYING CLUSTER ANALYSIS

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Glascoe, L G; Glaser, R E; Chin, H S

    2004-06-17

    The classification of time-varying multivariate regional-scale wind fields at a specific location can assist event planning as well as consequence and risk analysis. Further, wind field classification involves data transformation and inference techniques that effectively characterize stochastic wind field variation. Such a classification scheme is potentially useful for addressing overall atmospheric transport uncertainty and meteorological parameter sensitivity issues. Different methods to classify wind fields over a location include the principal component analysis of wind data (e.g., Hardy and Walton, 1978) and the use of cluster analysis for wind data (e.g., Green et al., 1992; Kaufmann and Weber, 1996). The goalmore » of this study is to use a clustering method to classify the winds of a gridded data set, i.e, from meteorological simulations generated by a forecast model.« less

  9. Principal component reconstruction (PCR) for cine CBCT with motion learning from 2D fluoroscopy.

    PubMed

    Gao, Hao; Zhang, Yawei; Ren, Lei; Yin, Fang-Fang

    2018-01-01

    This work aims to generate cine CT images (i.e., 4D images with high-temporal resolution) based on a novel principal component reconstruction (PCR) technique with motion learning from 2D fluoroscopic training images. In the proposed PCR method, the matrix factorization is utilized as an explicit low-rank regularization of 4D images that are represented as a product of spatial principal components and temporal motion coefficients. The key hypothesis of PCR is that temporal coefficients from 4D images can be reasonably approximated by temporal coefficients learned from 2D fluoroscopic training projections. For this purpose, we can acquire fluoroscopic training projections for a few breathing periods at fixed gantry angles that are free from geometric distortion due to gantry rotation, that is, fluoroscopy-based motion learning. Such training projections can provide an effective characterization of the breathing motion. The temporal coefficients can be extracted from these training projections and used as priors for PCR, even though principal components from training projections are certainly not the same for these 4D images to be reconstructed. For this purpose, training data are synchronized with reconstruction data using identical real-time breathing position intervals for projection binning. In terms of image reconstruction, with a priori temporal coefficients, the data fidelity for PCR changes from nonlinear to linear, and consequently, the PCR method is robust and can be solved efficiently. PCR is formulated as a convex optimization problem with the sum of linear data fidelity with respect to spatial principal components and spatiotemporal total variation regularization imposed on 4D image phases. The solution algorithm of PCR is developed based on alternating direction method of multipliers. The implementation is fully parallelized on GPU with NVIDIA CUDA toolbox and each reconstruction takes about a few minutes. The proposed PCR method is validated and compared with a state-of-art method, that is, PICCS, using both simulation and experimental data with the on-board cone-beam CT setting. The results demonstrated the feasibility of PCR for cine CBCT and significantly improved reconstruction quality of PCR from PICCS for cine CBCT. With a priori estimated temporal motion coefficients using fluoroscopic training projections, the PCR method can accurately reconstruct spatial principal components, and then generate cine CT images as a product of temporal motion coefficients and spatial principal components. © 2017 American Association of Physicists in Medicine.

  10. Chemical characteristics combined with bioactivity for comprehensive evaluation of Panax ginseng C.A. Meyer in different ages and seasons based on HPLC-DAD and chemometric methods.

    PubMed

    Shan, Si-Ming; Luo, Jian-Guang; Huang, Fang; Kong, Ling-Yi

    2014-02-01

    Panax ginseng C.A. Meyer has been known as a valuable traditional Chinese medicines for thousands years of history. Ginsenosides, the main active constituents, exhibit prominent immunoregulation effect. The present study first describes a holistic method based on chemical characteristic and lymphocyte proliferative capacity to evaluate systematically the quality of P. ginseng in thirty samples from different seasons during 2-6 years. The HPLC fingerprints were evaluated using principle component analysis (PCA) and hierarchical clustering analysis (HCA). The spectrum-efficacy model between HPLC fingerprints and T-lymphocyte proliferative activities was investigated by principal component regression (PCR) and partial least squares (PLS). The results indicated that the growth of the ginsenosides could be grouped into three periods and from August of the fifth year, P. ginseng appeared significant lymphocyte proliferative capacity. Close correlation existed between the spectrum-efficacy relationship and ginsenosides Rb1, Ro, Rc, Rb2 and Re were the main contributive components to the lymphocyte proliferative capacity. This comprehensive strategy, providing reliable and adequate scientific evidence, could be applied to other TCMs to ameliorate their quality control. Copyright © 2013 Elsevier B.V. All rights reserved.

  11. Relating N2O emissions during biological nitrogen removal with operating conditions using multivariate statistical techniques.

    PubMed

    Vasilaki, V; Volcke, E I P; Nandi, A K; van Loosdrecht, M C M; Katsou, E

    2018-04-26

    Multivariate statistical analysis was applied to investigate the dependencies and underlying patterns between N 2 O emissions and online operational variables (dissolved oxygen and nitrogen component concentrations, temperature and influent flow-rate) during biological nitrogen removal from wastewater. The system under study was a full-scale reactor, for which hourly sensor data were available. The 15-month long monitoring campaign was divided into 10 sub-periods based on the profile of N 2 O emissions, using Binary Segmentation. The dependencies between operating variables and N 2 O emissions fluctuated according to Spearman's rank correlation. The correlation between N 2 O emissions and nitrite concentrations ranged between 0.51 and 0.78. Correlation >0.7 between N 2 O emissions and nitrate concentrations was observed at sub-periods with average temperature lower than 12 °C. Hierarchical k-means clustering and principal component analysis linked N 2 O emission peaks with precipitation events and ammonium concentrations higher than 2 mg/L, especially in sub-periods characterized by low N 2 O fluxes. Additionally, the highest ranges of measured N 2 O fluxes belonged to clusters corresponding with NO 3 -N concentration less than 1 mg/L in the upstream plug-flow reactor (middle of oxic zone), indicating slow nitrification rates. The results showed that the range of N 2 O emissions partially depends on the prior behavior of the system. The principal component analysis validated the findings from the clustering analysis and showed that ammonium, nitrate, nitrite and temperature explained a considerable percentage of the variance in the system for the majority of the sub-periods. The applied statistical methods, linked the different ranges of emissions with the system variables, provided insights on the effect of operating conditions on N 2 O emissions in each sub-period and can be integrated into N 2 O emissions data processing at wastewater treatment plants. Copyright © 2018. Published by Elsevier Ltd.

  12. Assessment of cluster yield components by image analysis.

    PubMed

    Diago, Maria P; Tardaguila, Javier; Aleixos, Nuria; Millan, Borja; Prats-Montalban, Jose M; Cubero, Sergio; Blasco, Jose

    2015-04-01

    Berry weight, berry number and cluster weight are key parameters for yield estimation for wine and tablegrape industry. Current yield prediction methods are destructive, labour-demanding and time-consuming. In this work, a new methodology, based on image analysis was developed to determine cluster yield components in a fast and inexpensive way. Clusters of seven different red varieties of grapevine (Vitis vinifera L.) were photographed under laboratory conditions and their cluster yield components manually determined after image acquisition. Two algorithms based on the Canny and the logarithmic image processing approaches were tested to find the contours of the berries in the images prior to berry detection performed by means of the Hough Transform. Results were obtained in two ways: by analysing either a single image of the cluster or using four images per cluster from different orientations. The best results (R(2) between 69% and 95% in berry detection and between 65% and 97% in cluster weight estimation) were achieved using four images and the Canny algorithm. The model's capability based on image analysis to predict berry weight was 84%. The new and low-cost methodology presented here enabled the assessment of cluster yield components, saving time and providing inexpensive information in comparison with current manual methods. © 2014 Society of Chemical Industry.

  13. Utility of Metabolomics toward Assessing the Metabolic Basis of Quality Traits in Apple Fruit with an Emphasis on Antioxidants

    PubMed Central

    Cuthbertson, Daniel; Andrews, Preston K.; Reganold, John P.; Davies, Neal M.; Lange, B. Markus

    2012-01-01

    A gas chromatography–mass spectrometry approach was employed to evaluate the use of metabolite patterns to differentiate fruit from six commercially grown apple cultivars harvested in 2008. Principal component analysis (PCA) of apple fruit peel and flesh data indicated that individual cultivar replicates clustered together and were separated from all other cultivar samples. An independent metabolomics investigation with fruit harvested in 2003 confirmed the separate clustering of fruit from different cultivars. Further evidence for cultivar separation was obtained using a hierarchical clustering analysis. An evaluation of PCA component loadings revealed specific metabolite classes that contributed the most to each principal component, whereas a correlation analysis demonstrated that specific metabolites correlate directly with quality traits such as antioxidant activity, total phenolics, and total anthocyanins, which are important parameters in the selection of breeding germplasm. These data sets lay the foundation for elucidating the metabolic basis of commercially important fruit quality traits. PMID:22881116

  14. Principal component analysis for the comparison of metabolic profiles from human rectal cancer biopsies and colorectal xenografts using high-resolution magic angle spinning 1H magnetic resonance spectroscopy

    PubMed Central

    Seierstad, Therese; Røe, Kathrine; Sitter, Beathe; Halgunset, Jostein; Flatmark, Kjersti; Ree, Anne H; Olsen, Dag Rune; Gribbestad, Ingrid S; Bathen, Tone F

    2008-01-01

    Background This study was conducted in order to elucidate metabolic differences between human rectal cancer biopsies and colorectal HT29, HCT116 and SW620 xenografts by using high-resolution magnetic angle spinning (MAS) magnetic resonance spectroscopy (MRS) and for determination of the most appropriate human rectal xenograft model for preclinical MR spectroscopy studies. A further aim was to investigate metabolic changes following irradiation of HT29 xenografts. Methods HR MAS MRS of tissue samples from xenografts and rectal biopsies were obtained with a Bruker Avance DRX600 spectrometer and analyzed using principal component analysis (PCA) and partial least square (PLS) regression analysis. Results and conclusion HR MAS MRS enabled assignment of 27 metabolites. Score plots from PCA of spin-echo and single-pulse spectra revealed separate clusters of the different xenografts and rectal biopsies, reflecting underlying differences in metabolite composition. The loading profile indicated that clustering was mainly based on differences in relative amounts of lipids, lactate and choline-containing compounds, with HT29 exhibiting the metabolic profile most similar to human rectal cancers tissue. Due to high necrotic fractions in the HT29 xenografts, radiation-induced changes were not detected when comparing spectra from untreated and irradiated HT29 xenografts. However, PLS calibration relating spectral data to the necrotic fraction revealed a significant correlation, indicating that necrotic fraction can be assessed from the MR spectra. PMID:18439252

  15. Research of seafloor topographic analyses for a staged mineral exploration

    NASA Astrophysics Data System (ADS)

    Ikeda, M.; Kadoshima, K.; Koizumi, Y.; Yamakawa, T.; Asakawa, E.; Sumi, T.; Kose, M.

    2016-12-01

    J-MARES (Research and Development Partnership for Next Generation Technology of Marine Resources Survey, JAPAN) has been designing a low-cost and high-efficiency exploration system for seafloor hydrothermal massive sulfide (SMS) deposits in "Cross-ministerial Strategic Innovation Promotion Program (SIP)" granted by the Cabinet Office, Government of Japan since 2014. We proposed the multi-stage approach, which is designed from the regional scaled to the detail scaled survey stages through semi-detail scaled, focusing a prospective area by seafloor topographic analyses. We applied this method to the area of more than 100km x 100km around Okinawa Trough, including some well-known mineralized deposits. In the regional scale survey, we assume survey areas are more than 100 km x 100km. Then the spatial resolution of topography data should be bigger than 100m. The 500 m resolution data which is interpolated into 250 m resolution was used for extracting depression and performing principal component analysis (PCA) by the wavelength obtained from frequency analysis. As the result, we have successfully extracted the areas having the topographic features quite similar to well-known mineralized deposits. In the semi-local survey stage, we use the topography data obtained by bathymetric survey using multi-narrow beam echo-sounder. The 30m-resolution data was used for extracting depression, relative-large mounds, hills, lineaments as fault, and also for performing frequency analysis. As the result, wavelength as principal component constituting in the target area was extracted by PCA of wavelength obtained from frequency analysis. Therefore, color image was composited by using the second principal component (PC2) to the forth principal component (PC4) in which the continuity of specific wavelength was observed, and consistent with extracted lineaments. In addition, well-known mineralized deposits were discriminated in the same clusters by using clustering from PC2 to PC4.We applied the results described above to a new area, and successfully extract the quite similar area in vicinity to one of the well-known mineralized deposits. So we are going to verify the extracted areas by using geophysical methods, such as vertical cable seismic and time-domain EM survey, developed in this SIP project.

  16. Assessment of mechanical properties of isolated bovine intervertebral discs from multi-parametric magnetic resonance imaging.

    PubMed

    Recuerda, Maximilien; Périé, Delphine; Gilbert, Guillaume; Beaudoin, Gilles

    2012-10-12

    The treatment planning of spine pathologies requires information on the rigidity and permeability of the intervertebral discs (IVDs). Magnetic resonance imaging (MRI) offers great potential as a sensitive and non-invasive technique for describing the mechanical properties of IVDs. However, the literature reported small correlation coefficients between mechanical properties and MRI parameters. Our hypothesis is that the compressive modulus and the permeability of the IVD can be predicted by a linear combination of MRI parameters. Sixty IVDs were harvested from bovine tails, and randomly separated in four groups (in-situ, digested-6h, digested-18h, digested-24h). Multi-parametric MRI acquisitions were used to quantify the relaxation times T1 and T2, the magnetization transfer ratio MTR, the apparent diffusion coefficient ADC and the fractional anisotropy FA. Unconfined compression, confined compression and direct permeability measurements were performed to quantify the compressive moduli and the hydraulic permeabilities. Differences between groups were evaluated from a one way ANOVA. Multi linear regressions were performed between dependent mechanical properties and independent MRI parameters to verify our hypothesis. A principal component analysis was used to convert the set of possibly correlated variables into a set of linearly uncorrelated variables. Agglomerative Hierarchical Clustering was performed on the 3 principal components. Multilinear regressions showed that 45 to 80% of the Young's modulus E, the aggregate modulus in absence of deformation HA0, the radial permeability kr and the axial permeability in absence of deformation k0 can be explained by the MRI parameters within both the nucleus pulposus and the annulus pulposus. The principal component analysis reduced our variables to two principal components with a cumulative variability of 52-65%, which increased to 70-82% when considering the third principal component. The dendograms showed a natural division into four clusters for the nucleus pulposus and into three or four clusters for the annulus fibrosus. The compressive moduli and the permeabilities of isolated IVDs can be assessed mostly by MT and diffusion sequences. However, the relationships have to be improved with the inclusion of MRI parameters more sensitive to IVD degeneration. Before the use of this technique to quantify the mechanical properties of IVDs in vivo on patients suffering from various diseases, the relationships have to be defined for each degeneration state of the tissue that mimics the pathology. Our MRI protocol associated to principal component analysis and agglomerative hierarchical clustering are promising tools to classify the degenerated intervertebral discs and further find biomarkers and predictive factors of the evolution of the pathologies.

  17. Change detection for synthetic aperture radar images based on pattern and intensity distinctiveness analysis

    NASA Astrophysics Data System (ADS)

    Wang, Xiao; Gao, Feng; Dong, Junyu; Qi, Qiang

    2018-04-01

    Synthetic aperture radar (SAR) image is independent on atmospheric conditions, and it is the ideal image source for change detection. Existing methods directly analysis all the regions in the speckle noise contaminated difference image. The performance of these methods is easily affected by small noisy regions. In this paper, we proposed a novel change detection framework for saliency-guided change detection based on pattern and intensity distinctiveness analysis. The saliency analysis step can remove small noisy regions, and therefore makes the proposed method more robust to the speckle noise. In the proposed method, the log-ratio operator is first utilized to obtain a difference image (DI). Then, the saliency detection method based on pattern and intensity distinctiveness analysis is utilized to obtain the changed region candidates. Finally, principal component analysis and k-means clustering are employed to analysis pixels in the changed region candidates. Thus, the final change map can be obtained by classifying these pixels into changed or unchanged class. The experiment results on two real SAR images datasets have demonstrated the effectiveness of the proposed method.

  18. Analysis and improvement measures of flight delay in China

    NASA Astrophysics Data System (ADS)

    Zang, Yuhang

    2017-03-01

    Firstly, this paper establishes the principal component regression model to analyze the data quantitatively, based on principal component analysis to get the three principal component factors of flight delays. Then the least square method is used to analyze the factors and obtained the regression equation expression by substitution, and then found that the main reason for flight delays is airlines, followed by weather and traffic. Aiming at the above problems, this paper improves the controllable aspects of traffic flow control. For reasons of traffic flow control, an adaptive genetic queuing model is established for the runway terminal area. This paper, establish optimization method that fifteen planes landed simultaneously on the three runway based on Beijing capital international airport, comparing the results with the existing FCFS algorithm, the superiority of the model is proved.

  19. Detection of micro solder balls using active thermography and probabilistic neural network

    NASA Astrophysics Data System (ADS)

    He, Zhenzhi; Wei, Li; Shao, Minghui; Lu, Xingning

    2017-03-01

    Micro solder ball/bump has been widely used in electronic packaging. It has been challenging to inspect these structures as the solder balls/bumps are often embedded between the component and substrates, especially in flip-chip packaging. In this paper, a detection method for micro solder ball/bump based on the active thermography and the probabilistic neural network is investigated. A VH680 infrared imager is used to capture the thermal image of the test vehicle, SFA10 packages. The temperature curves are processed using moving average technique to remove the peak noise. And the principal component analysis (PCA) is adopted to reconstruct the thermal images. The missed solder balls can be recognized explicitly in the second principal component image. Probabilistic neural network (PNN) is then established to identify the defective bump intelligently. The hot spots corresponding to the solder balls are segmented from the PCA reconstructed image, and statistic parameters are calculated. To characterize the thermal properties of solder bump quantitatively, three representative features are selected and used as the input vector in PNN clustering. The results show that the actual outputs and the expected outputs are consistent in identification of the missed solder balls, and all the bumps were recognized accurately, which demonstrates the viability of the PNN in effective defect inspection in high-density microelectronic packaging.

  20. Spatial assessment of water quality using chemometrics in the Pearl River Estuary, China

    NASA Astrophysics Data System (ADS)

    Wu, Meilin; Wang, Youshao; Dong, Junde; Sun, Fulin; Wang, Yutu; Hong, Yiguo

    2017-03-01

    A cruise was commissioned in the summer of 2009 to evaluate water quality in the Pearl River Estuary (PRE). Chemometrics such as Principal Component Analysis (PCA), Cluster analysis (CA) and Self-Organizing Map (SOM) were employed to identify anthropogenic and natural influences on estuary water quality. The scores of stations in the surface layer in the first principal component (PC1) were related to NH4-N, PO4-P, NO2-N, NO3-N, TP, and Chlorophyll a while salinity, turbidity, and SiO3-Si in the second principal component (PC2). Similarly, the scores of stations in the bottom layers in PC1 were related to PO4-P, NO2-N, NO3-N, and TP, while salinity, Chlorophyll a, NH4-N, and SiO3-Si in PC2. Results of the PCA identified the spatial distribution of the surface and bottom water quality, namely the Guangzhou urban reach, Middle reach, and Lower reach of the estuary. Both cluster analysis and PCA produced the similar results. Self-organizing map delineated the Guangzhou urban reach of the Pearl River that was mainly influenced by human activities. The middle and lower reaches of the PRE were mainly influenced by the waters in the South China Sea. The information extracted by PCA, CA, and SOM would be very useful to regional agencies in developing a strategy to carry out scientific plans for resource use based on marine system functions.

  1. Fast principal component analysis for stacking seismic data

    NASA Astrophysics Data System (ADS)

    Wu, Juan; Bai, Min

    2018-04-01

    Stacking seismic data plays an indispensable role in many steps of the seismic data processing and imaging workflow. Optimal stacking of seismic data can help mitigate seismic noise and enhance the principal components to a great extent. Traditional average-based seismic stacking methods cannot obtain optimal performance when the ambient noise is extremely strong. We propose a principal component analysis (PCA) algorithm for stacking seismic data without being sensitive to noise level. Considering the computational bottleneck of the classic PCA algorithm in processing massive seismic data, we propose an efficient PCA algorithm to make the proposed method readily applicable for industrial applications. Two numerically designed examples and one real seismic data are used to demonstrate the performance of the presented method.

  2. Fast dimension reduction and integrative clustering of multi-omics data using low-rank approximation: application to cancer molecular classification.

    PubMed

    Wu, Dingming; Wang, Dongfang; Zhang, Michael Q; Gu, Jin

    2015-12-01

    One major goal of large-scale cancer omics study is to identify molecular subtypes for more accurate cancer diagnoses and treatments. To deal with high-dimensional cancer multi-omics data, a promising strategy is to find an effective low-dimensional subspace of the original data and then cluster cancer samples in the reduced subspace. However, due to data-type diversity and big data volume, few methods can integrative and efficiently find the principal low-dimensional manifold of the high-dimensional cancer multi-omics data. In this study, we proposed a novel low-rank approximation based integrative probabilistic model to fast find the shared principal subspace across multiple data types: the convexity of the low-rank regularized likelihood function of the probabilistic model ensures efficient and stable model fitting. Candidate molecular subtypes can be identified by unsupervised clustering hundreds of cancer samples in the reduced low-dimensional subspace. On testing datasets, our method LRAcluster (low-rank approximation based multi-omics data clustering) runs much faster with better clustering performances than the existing method. Then, we applied LRAcluster on large-scale cancer multi-omics data from TCGA. The pan-cancer analysis results show that the cancers of different tissue origins are generally grouped as independent clusters, except squamous-like carcinomas. While the single cancer type analysis suggests that the omics data have different subtyping abilities for different cancer types. LRAcluster is a very useful method for fast dimension reduction and unsupervised clustering of large-scale multi-omics data. LRAcluster is implemented in R and freely available via http://bioinfo.au.tsinghua.edu.cn/software/lracluster/ .

  3. Principal Component Relaxation Mode Analysis of an All-Atom Molecular Dynamics Simulation of Human Lysozyme

    NASA Astrophysics Data System (ADS)

    Nagai, Toshiki; Mitsutake, Ayori; Takano, Hiroshi

    2013-02-01

    A new relaxation mode analysis method, which is referred to as the principal component relaxation mode analysis method, has been proposed to handle a large number of degrees of freedom of protein systems. In this method, principal component analysis is carried out first and then relaxation mode analysis is applied to a small number of principal components with large fluctuations. To reduce the contribution of fast relaxation modes in these principal components efficiently, we have also proposed a relaxation mode analysis method using multiple evolution times. The principal component relaxation mode analysis method using two evolution times has been applied to an all-atom molecular dynamics simulation of human lysozyme in aqueous solution. Slow relaxation modes and corresponding relaxation times have been appropriately estimated, demonstrating that the method is applicable to protein systems.

  4. Spike sorting based upon machine learning algorithms (SOMA).

    PubMed

    Horton, P M; Nicol, A U; Kendrick, K M; Feng, J F

    2007-02-15

    We have developed a spike sorting method, using a combination of various machine learning algorithms, to analyse electrophysiological data and automatically determine the number of sampled neurons from an individual electrode, and discriminate their activities. We discuss extensions to a standard unsupervised learning algorithm (Kohonen), as using a simple application of this technique would only identify a known number of clusters. Our extra techniques automatically identify the number of clusters within the dataset, and their sizes, thereby reducing the chance of misclassification. We also discuss a new pre-processing technique, which transforms the data into a higher dimensional feature space revealing separable clusters. Using principal component analysis (PCA) alone may not achieve this. Our new approach appends the features acquired using PCA with features describing the geometric shapes that constitute a spike waveform. To validate our new spike sorting approach, we have applied it to multi-electrode array datasets acquired from the rat olfactory bulb, and from the sheep infero-temporal cortex, and using simulated data. The SOMA sofware is available at http://www.sussex.ac.uk/Users/pmh20/spikes.

  5. Multivariate analysis of selected metals in tannery effluents and related soil.

    PubMed

    Tariq, Saadia R; Shah, Munir H; Shaheen, N; Khalique, A; Manzoor, S; Jaffar, M

    2005-06-30

    Effluent and relevant soil samples from 38 tanning units housed in Kasur, Pakistan, were obtained for metal analysis by flame atomic absorption spectrophotometric method. The levels of 12 metals, Na, Ca, K, Mg, Fe, Mn, Cr, Co, Cd, Ni, Pb and Zn were determined in the two media. The data were evaluated towards metal distribution and metal-to-metal correlations. The study evidenced enhanced levels of Cr (391, 16.7 mg/L) and Na (25,519, 9369 mg/L) in tannery effluents and relevant soil samples, respectively. The effluent versus soil trace metal content relationship confirmed that the effluent Cr was strongly correlated with soil Cr. For metal source identification the techniques of principal component analysis, and cluster analysis were applied. The principal component analysis yielded two factors for effluents: factor 1 (49.6% variance) showed significant loading for Ca, Fe, Mn, Cr, Cd, Ni, Pb and Zn, referring to a tanning related source for these metals, and factor 2 (12.6% variance) with higher loadings of Na, K, Mg and Co, was associated with the processes during the skin/hide treatment. Similarly, two factors with a cumulative variance of 34.8% were obtained for soil samples: factor 1 manifested the contribution from Mg, Mn, Co, Cd, Ni and Pb, which though soil-based is basically effluent-derived, while factor 2 was found associated with Na, K, Ca, Cr and Zn which referred to a tannery-based source. The dendograms obtained from cluster analysis, also support the observed results. The study exhibits a gross pollution of soils with Cr at levels far exceeding the stipulated safe limit laid down for tannery effluents.

  6. MO-DE-207B-03: Improved Cancer Classification Using Patient-Specific Biological Pathway Information Via Gene Expression Data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Young, M; Craft, D

    Purpose: To develop an efficient, pathway-based classification system using network biology statistics to assist in patient-specific response predictions to radiation and drug therapies across multiple cancer types. Methods: We developed PICS (Pathway Informed Classification System), a novel two-step cancer classification algorithm. In PICS, a matrix m of mRNA expression values for a patient cohort is collapsed into a matrix p of biological pathways. The entries of p, which we term pathway scores, are obtained from either principal component analysis (PCA), normal tissue centroid (NTC), or gene expression deviation (GED). The pathway score matrix is clustered using both k-means and hierarchicalmore » clustering, and a clustering is judged by how well it groups patients into distinct survival classes. The most effective pathway scoring/clustering combination, per clustering p-value, thus generates various ‘signatures’ for conventional and functional cancer classification. Results: PICS successfully regularized large dimension gene data, separated normal and cancerous tissues, and clustered a large patient cohort spanning six cancer types. Furthermore, PICS clustered patient cohorts into distinct, statistically-significant survival groups. For a suboptimally-debulked ovarian cancer set, the pathway-classified Kaplan-Meier survival curve (p = .00127) showed significant improvement over that of a prior gene expression-classified study (p = .0179). For a pancreatic cancer set, the pathway-classified Kaplan-Meier survival curve (p = .00141) showed significant improvement over that of a prior gene expression-classified study (p = .04). Pathway-based classification confirmed biomarkers for the pyrimidine, WNT-signaling, glycerophosphoglycerol, beta-alanine, and panthothenic acid pathways for ovarian cancer. Despite its robust nature, PICS requires significantly less run time than current pathway scoring methods. Conclusion: This work validates the PICS method to improve cancer classification using biological pathways. Patients are classified with greater specificity and physiological relevance as compared to current gene-specific approaches. Focus now moves to utilizing PICS for pan-cancer patient-specific treatment response prediction.« less

  7. Challenges in microarray class discovery: a comprehensive examination of normalization, gene selection and clustering

    PubMed Central

    2010-01-01

    Background Cluster analysis, and in particular hierarchical clustering, is widely used to extract information from gene expression data. The aim is to discover new classes, or sub-classes, of either individuals or genes. Performing a cluster analysis commonly involve decisions on how to; handle missing values, standardize the data and select genes. In addition, pre-processing, involving various types of filtration and normalization procedures, can have an effect on the ability to discover biologically relevant classes. Here we consider cluster analysis in a broad sense and perform a comprehensive evaluation that covers several aspects of cluster analyses, including normalization. Result We evaluated 2780 cluster analysis methods on seven publicly available 2-channel microarray data sets with common reference designs. Each cluster analysis method differed in data normalization (5 normalizations were considered), missing value imputation (2), standardization of data (2), gene selection (19) or clustering method (11). The cluster analyses are evaluated using known classes, such as cancer types, and the adjusted Rand index. The performances of the different analyses vary between the data sets and it is difficult to give general recommendations. However, normalization, gene selection and clustering method are all variables that have a significant impact on the performance. In particular, gene selection is important and it is generally necessary to include a relatively large number of genes in order to get good performance. Selecting genes with high standard deviation or using principal component analysis are shown to be the preferred gene selection methods. Hierarchical clustering using Ward's method, k-means clustering and Mclust are the clustering methods considered in this paper that achieves the highest adjusted Rand. Normalization can have a significant positive impact on the ability to cluster individuals, and there are indications that background correction is preferable, in particular if the gene selection is successful. However, this is an area that needs to be studied further in order to draw any general conclusions. Conclusions The choice of cluster analysis, and in particular gene selection, has a large impact on the ability to cluster individuals correctly based on expression profiles. Normalization has a positive effect, but the relative performance of different normalizations is an area that needs more research. In summary, although clustering, gene selection and normalization are considered standard methods in bioinformatics, our comprehensive analysis shows that selecting the right methods, and the right combinations of methods, is far from trivial and that much is still unexplored in what is considered to be the most basic analysis of genomic data. PMID:20937082

  8. Applications of Nonlinear Principal Components Analysis to Behavioral Data.

    ERIC Educational Resources Information Center

    Hicks, Marilyn Maginley

    1981-01-01

    An empirical investigation of the statistical procedure entitled nonlinear principal components analysis was conducted on a known equation and on measurement data in order to demonstrate the procedure and examine its potential usefulness. This method was suggested by R. Gnanadesikan and based on an early paper of Karl Pearson. (Author/AL)

  9. Stationary Wavelet-based Two-directional Two-dimensional Principal Component Analysis for EMG Signal Classification

    NASA Astrophysics Data System (ADS)

    Ji, Yi; Sun, Shanlin; Xie, Hong-Bo

    2017-06-01

    Discrete wavelet transform (WT) followed by principal component analysis (PCA) has been a powerful approach for the analysis of biomedical signals. Wavelet coefficients at various scales and channels were usually transformed into a one-dimensional array, causing issues such as the curse of dimensionality dilemma and small sample size problem. In addition, lack of time-shift invariance of WT coefficients can be modeled as noise and degrades the classifier performance. In this study, we present a stationary wavelet-based two-directional two-dimensional principal component analysis (SW2D2PCA) method for the efficient and effective extraction of essential feature information from signals. Time-invariant multi-scale matrices are constructed in the first step. The two-directional two-dimensional principal component analysis then operates on the multi-scale matrices to reduce the dimension, rather than vectors in conventional PCA. Results are presented from an experiment to classify eight hand motions using 4-channel electromyographic (EMG) signals recorded in healthy subjects and amputees, which illustrates the efficiency and effectiveness of the proposed method for biomedical signal analysis.

  10. Prediction of activation patterns preceding hallucinations in patients with schizophrenia using machine learning with structured sparsity.

    PubMed

    de Pierrefeu, Amicie; Fovet, Thomas; Hadj-Selem, Fouad; Löfstedt, Tommy; Ciuciu, Philippe; Lefebvre, Stephanie; Thomas, Pierre; Lopes, Renaud; Jardri, Renaud; Duchesnay, Edouard

    2018-04-01

    Despite significant progress in the field, the detection of fMRI signal changes during hallucinatory events remains difficult and time-consuming. This article first proposes a machine-learning algorithm to automatically identify resting-state fMRI periods that precede hallucinations versus periods that do not. When applied to whole-brain fMRI data, state-of-the-art classification methods, such as support vector machines (SVM), yield dense solutions that are difficult to interpret. We proposed to extend the existing sparse classification methods by taking the spatial structure of brain images into account with structured sparsity using the total variation penalty. Based on this approach, we obtained reliable classifying performances associated with interpretable predictive patterns, composed of two clearly identifiable clusters in speech-related brain regions. The variation in transition-to-hallucination functional patterns not only from one patient to another but also from one occurrence to the next (e.g., also depending on the sensory modalities involved) appeared to be the major difficulty when developing effective classifiers. Consequently, second, this article aimed to characterize the variability within the prehallucination patterns using an extension of principal component analysis with spatial constraints. The principal components (PCs) and the associated basis patterns shed light on the intrinsic structures of the variability present in the dataset. Such results are promising in the scope of innovative fMRI-guided therapy for drug-resistant hallucinations, such as fMRI-based neurofeedback. © 2018 Wiley Periodicals, Inc.

  11. Simultaneous analysis of 11 main active components in Cirsium setosum based on HPLC-ESI-MS/MS and combined with statistical methods.

    PubMed

    Sun, Qian; Chang, Lu; Ren, Yanping; Cao, Liang; Sun, Yingguang; Du, Yingfeng; Shi, Xiaowei; Wang, Qiao; Zhang, Lantong

    2012-11-01

    A novel method based on high-performance liquid chromatography coupled with electrospray ionization tandem mass spectrometry was developed for simultaneous determination of the 11 major active components including ten flavonoids and one phenolic acid in Cirsium setosum. Separation was performed on a reversed-phase C(18) column with gradient elution of methanol and 0.1‰ acetic acid (v/v). The identification and quantification of the analytes were achieved on a hybrid quadrupole linear ion trap mass spectrometer. Multiple-reaction monitoring scanning was employed for quantification with switching electrospray ion source polarity between positive and negative modes in a single run. Full validation of the assay was carried out including linearity, precision, accuracy, stability, limits of detection and quantification. The results demonstrated that the method developed was reliable, rapid, and specific. The 25 batches of C. setosum samples from different sources were first determined using the developed method and the total contents of 11 analytes ranged from 1717.460 to 23028.258 μg/g. Among them, the content of linarin was highest, and its mean value was 7340.967 μg/g. Principal component analysis and hierarchical clustering analysis were performed to differentiate and classify the samples, which is helpful for comprehensive evaluation of the quality of C. setosum. © 2012 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  12. Method of Real-Time Principal-Component Analysis

    NASA Technical Reports Server (NTRS)

    Duong, Tuan; Duong, Vu

    2005-01-01

    Dominant-element-based gradient descent and dynamic initial learning rate (DOGEDYN) is a method of sequential principal-component analysis (PCA) that is well suited for such applications as data compression and extraction of features from sets of data. In comparison with a prior method of gradient-descent-based sequential PCA, this method offers a greater rate of learning convergence. Like the prior method, DOGEDYN can be implemented in software. However, the main advantage of DOGEDYN over the prior method lies in the facts that it requires less computation and can be implemented in simpler hardware. It should be possible to implement DOGEDYN in compact, low-power, very-large-scale integrated (VLSI) circuitry that could process data in real time.

  13. Main differences between volatiles of sparkling and base wines accessed through comprehensive two dimensional gas chromatography with time-of-flight mass spectrometric detection and chemometric tools.

    PubMed

    Welke, Juliane Elisa; Zanus, Mauro; Lazzarotto, Marcelo; Pulgati, Fernando Hepp; Zini, Cláudia Alcaraz

    2014-12-01

    The main changes in the volatile profile of base wines and their corresponding sparkling wines produced by traditional method were evaluated and investigated for the first time using headspace solid-phase microextraction combined with comprehensive two-dimensional gas chromatography with time-of-flight mass spectrometry detection (GC×GC/TOFMS) and chemometric tools. Fisher ratios helped to find the 119 analytes that were responsible for the main differences between base and sparkling wines and principal component analysis explained 93.1% of the total variance related to the selected 78 compounds. It was also possible to observe five subclusters in base wines and four subclusters in sparkling wines samples through hierarchical cluster analysis, which seemed to have an organised distribution according to the regions where the wines came from. Twenty of the most important volatile compounds co-eluted with other components and separation of some of them was possible due to GC×GC/TOFMS performance. Copyright © 2014. Published by Elsevier Ltd.

  14. Three dimensional empirical mode decomposition analysis apparatus, method and article manufacture

    NASA Technical Reports Server (NTRS)

    Gloersen, Per (Inventor)

    2004-01-01

    An apparatus and method of analysis for three-dimensional (3D) physical phenomena. The physical phenomena may include any varying 3D phenomena such as time varying polar ice flows. A repesentation of the 3D phenomena is passed through a Hilbert transform to convert the data into complex form. A spatial variable is separated from the complex representation by producing a time based covariance matrix. The temporal parts of the principal components are produced by applying Singular Value Decomposition (SVD). Based on the rapidity with which the eigenvalues decay, the first 3-10 complex principal components (CPC) are selected for Empirical Mode Decomposition into intrinsic modes. The intrinsic modes produced are filtered in order to reconstruct the spatial part of the CPC. Finally, a filtered time series may be reconstructed from the first 3-10 filtered complex principal components.

  15. Chemometric analysis of multisensor hyperspectral images of precipitated atmospheric particulate matter.

    PubMed

    Ofner, Johannes; Kamilli, Katharina A; Eitenberger, Elisabeth; Friedbacher, Gernot; Lendl, Bernhard; Held, Andreas; Lohninger, Hans

    2015-09-15

    The chemometric analysis of multisensor hyperspectral data allows a comprehensive image-based analysis of precipitated atmospheric particles. Atmospheric particulate matter was precipitated on aluminum foils and analyzed by Raman microspectroscopy and subsequently by electron microscopy and energy dispersive X-ray spectroscopy. All obtained images were of the same spot of an area of 100 × 100 μm(2). The two hyperspectral data sets and the high-resolution scanning electron microscope images were fused into a combined multisensor hyperspectral data set. This multisensor data cube was analyzed using principal component analysis, hierarchical cluster analysis, k-means clustering, and vertex component analysis. The detailed chemometric analysis of the multisensor data allowed an extensive chemical interpretation of the precipitated particles, and their structure and composition led to a comprehensive understanding of atmospheric particulate matter.

  16. GDF15(MIC1) H6D Polymorphism Does Not Influence Cardiovascular Disease in a Latin American Population with Rheumatoid Arthritis

    PubMed Central

    Amaya-Amaya, Jenny; Rojas-Villarraga, Adriana; Molano-Gonzalez, Nicolas; Montoya-Sánchez, Laura; Nath, Swapan K.; Anaya, Juan-Manuel

    2015-01-01

    Objective. Rheumatoid arthritis (RA) is the most common autoimmune arthropathy worldwide. The increased prevalence of cardiovascular disease (CVD) in RA is not fully explained by classic risk factors. The aim of this study was to determine the influence of rs1058587 SNP within GDF15(MIC1) gene on the risk of CVD in a Colombian RA population. Methods. This was a cross-sectional analytical study in which 310 consecutive Colombian patients with RA and 228 age- and sex-matched controls were included and assessed for variables associated with CVD. The mixed cluster methodology based on multivariate descriptive methods such as principal components analysis and multiple correspondence analyses and regression tree (CART) predictive model were performed. Results. Of the 310 patients, 87.4% were women and CVD was reported in 69.5%. Significant differences concerning GDF15 polymorphism were not observed between patients and controls. Mean arterial pressure, current smoking, and some clusters were significantly associated with CVD. Conclusion. GDF15 (rs1058587) does not influence the development of CVD in the population studied. PMID:26090487

  17. A stable systemic risk ranking in China's banking sector: Based on principal component analysis

    NASA Astrophysics Data System (ADS)

    Fang, Libing; Xiao, Binqing; Yu, Honghai; You, Qixing

    2018-02-01

    In this paper, we compare five popular systemic risk rankings, and apply principal component analysis (PCA) model to provide a stable systemic risk ranking for the Chinese banking sector. Our empirical results indicate that five methods suggest vastly different systemic risk rankings for the same bank, while the combined systemic risk measure based on PCA provides a reliable ranking. Furthermore, according to factor loadings of the first component, PCA combined ranking is mainly based on fundamentals instead of market price data. We clearly find that price-based rankings are not as practical a method as fundamentals-based ones. This PCA combined ranking directly shows systemic risk contributions of each bank for banking supervision purpose and reminds banks to prevent and cope with the financial crisis in advance.

  18. Robust fiber clustering of cerebral fiber bundles in white matter

    NASA Astrophysics Data System (ADS)

    Yao, Xufeng; Wang, Yongxiong; Zhuang, Songlin

    2014-11-01

    Diffusion tensor imaging fiber tracking (DTI-FT) has been widely accepted in the diagnosis and treatment of brain diseases. During the rendering pipeline of specific fiber tracts, the image noise and low resolution of DTI would lead to false propagations. In this paper, we propose a robust fiber clustering (FC) approach to diminish false fibers from one fiber tract. Our algorithm consists of three steps. Firstly, the optimized fiber assignment continuous tracking (FACT) is implemented to reconstruct one fiber tract; and then each curved fiber in the fiber tract is mapped to a point by kernel principal component analysis (KPCA); finally, the point clouds of fiber tract are clustered by hierarchical clustering which could distinguish false fibers from true fibers in one tract. In our experiment, the corticospinal tract (CST) in one case of human data in vivo was used to validate our method. Our method showed reliable capability in decreasing the false fibers in one tract. In conclusion, our method could effectively optimize the visualization of fiber bundles and would help a lot in the field of fiber evaluation.

  19. Using principal component analysis and annual seasonal trend analysis to assess karst rocky desertification in southwestern China.

    PubMed

    Zhang, Zhiming; Ouyang, Zhiyun; Xiao, Yi; Xiao, Yang; Xu, Weihua

    2017-06-01

    Increasing exploitation of karst resources is causing severe environmental degradation because of the fragility and vulnerability of karst areas. By integrating principal component analysis (PCA) with annual seasonal trend analysis (ASTA), this study assessed karst rocky desertification (KRD) within a spatial context. We first produced fractional vegetation cover (FVC) data from a moderate-resolution imaging spectroradiometer normalized difference vegetation index using a dimidiate pixel model. Then, we generated three main components of the annual FVC data using PCA. Subsequently, we generated the slope image of the annual seasonal trends of FVC using median trend analysis. Finally, we combined the three PCA components and annual seasonal trends of FVC with the incidence of KRD for each type of carbonate rock to classify KRD into one of four categories based on K-means cluster analysis: high, moderate, low, and none. The results of accuracy assessments indicated that this combination approach produced greater accuracy and more reasonable KRD mapping than the average FVC based on the vegetation coverage standard. The KRD map for 2010 indicated that the total area of KRD was 78.76 × 10 3  km 2 , which constitutes about 4.06% of the eight southwest provinces of China. The largest KRD areas were found in Yunnan province. The combined PCA and ASTA approach was demonstrated to be an easily implemented, robust, and flexible method for the mapping and assessment of KRD, which can be used to enhance regional KRD management schemes or to address assessment of other environmental issues.

  20. [The principal components analysis--method to classify the statistical variables with applications in medicine].

    PubMed

    Dascălu, Cristina Gena; Antohe, Magda Ecaterina

    2009-01-01

    Based on the eigenvalues and the eigenvectors analysis, the principal component analysis has the purpose to identify the subspace of the main components from a set of parameters, which are enough to characterize the whole set of parameters. Interpreting the data for analysis as a cloud of points, we find through geometrical transformations the directions where the cloud's dispersion is maximal--the lines that pass through the cloud's center of weight and have a maximal density of points around them (by defining an appropriate criteria function and its minimization. This method can be successfully used in order to simplify the statistical analysis on questionnaires--because it helps us to select from a set of items only the most relevant ones, which cover the variations of the whole set of data. For instance, in the presented sample we started from a questionnaire with 28 items and, applying the principal component analysis we identified 7 principal components--or main items--fact that simplifies significantly the further data statistical analysis.

  1. Inequalities in neighbourhood socioeconomic characteristics: potential evidence-base for neighbourhood health planning

    PubMed Central

    Odoi, Agricola; Wray, Ron; Emo, Marion; Birch, Stephen; Hutchison, Brian; Eyles, John; Abernathy, Tom

    2005-01-01

    Background Population health planning aims to improve the health of the entire population and to reduce health inequities among population groups. Socioeconomic factors are increasingly being recognized as major determinants of many aspects of health and causes of health inequities. Knowledge of socioeconomic characteristics of neighbourhoods is necessary to identify their unique health needs and enhance identification of socioeconomically disadvantaged populations. Careful integration of this knowledge into health planning activities is necessary to ensure that health planning and service provision are tailored to unique neighbourhood population health needs. In this study, we identify unique neighbourhood socioeconomic characteristics and classify the neighbourhoods based on these characteristics. Principal components analysis (PCA) of 18 socioeconomic variables was used to identify the principal components explaining most of the variation in socioeconomic characteristics across the neighbourhoods. Cluster analysis was used to classify neighbourhoods based on their socioeconomic characteristics. Results Results of the PCA and cluster analysis were similar but the latter were more objective and easier to interpret. Five neighbourhood types with distinguishing socioeconomic and demographic characteristics were identified. The methodology provides a more complete picture of the neighbourhood socioeconomic characteristics than when a single variable (e.g. income) is used to classify neighbourhoods. Conclusion Cluster analysis is useful for generating neighbourhood population socioeconomic and demographic characteristics that can be useful in guiding neighbourhood health planning and service provision. This study is the first of a series of studies designed to investigate health inequalities at the neighbourhood level with a view to providing evidence-base for health planners, service providers and policy makers to help address health inequity issues at the neighbourhood level. Subsequent studies will investigate inequalities in health outcomes both within and across the neighbourhood types identified in the current study. PMID:16092969

  2. Determination of Arctic sea ice variability modes on interannual timescales via nonhierarchical clustering

    NASA Astrophysics Data System (ADS)

    Fučkar, Neven-Stjepan; Guemas, Virginie; Massonnet, François; Doblas-Reyes, Francisco

    2015-04-01

    Over the modern observational era, the northern hemisphere sea ice concentration, age and thickness have experienced a sharp long-term decline superimposed with strong internal variability. Hence, there is a crucial need to identify robust patterns of Arctic sea ice variability on interannual timescales and disentangle them from the long-term trend in noisy datasets. The principal component analysis (PCA) is a versatile and broadly used method for the study of climate variability. However, the PCA has several limiting aspects because it assumes that all modes of variability have symmetry between positive and negative phases, and suppresses nonlinearities by using a linear covariance matrix. Clustering methods offer an alternative set of dimension reduction tools that are more robust and capable of taking into account possible nonlinear characteristics of a climate field. Cluster analysis aggregates data into groups or clusters based on their distance, to simultaneously minimize the distance between data points in a given cluster and maximize the distance between the centers of the clusters. We extract modes of Arctic interannual sea-ice variability with nonhierarchical K-means cluster analysis and investigate the mechanisms leading to these modes. Our focus is on the sea ice thickness (SIT) as the base variable for clustering because SIT holds most of the climate memory for variability and predictability on interannual timescales. We primarily use global reconstructions of sea ice fields with a state-of-the-art ocean-sea-ice model, but we also verify the robustness of determined clusters in other Arctic sea ice datasets. Applied cluster analysis over the 1958-2013 period shows that the optimal number of detrended SIT clusters is K=3. Determined SIT cluster patterns and their time series of occurrence are rather similar between different seasons and months. Two opposite thermodynamic modes are characterized with prevailing negative or positive SIT anomalies over the Arctic basin. The intermediate mode, with negative anomalies centered on the East Siberian shelf and positive anomalies along the North American side of the basin, has predominately dynamic characteristics. The associated sea ice concentration (SIC) clusters vary more between different seasons and months, but the SIC patterns are physically framed by the SIT cluster patterns.

  3. The potential of clustering methods to define intersection test scenarios: Assessing real-life performance of AEB.

    PubMed

    Sander, Ulrich; Lubbe, Nils

    2018-04-01

    Intersection accidents are frequent and harmful. The accident types 'straight crossing path' (SCP), 'left turn across path - oncoming direction' (LTAP/OD), and 'left-turn across path - lateral direction' (LTAP/LD) represent around 95% of all intersection accidents and one-third of all police-reported car-to-car accidents in Germany. The European New Car Assessment Program (Euro NCAP) have announced that intersection scenarios will be included in their rating from 2020; however, how these scenarios are to be tested has not been defined. This study investigates whether clustering methods can be used to identify a small number of test scenarios sufficiently representative of the accident dataset to evaluate Intersection Automated Emergency Braking (AEB). Data from the German In-Depth Accident Study (GIDAS) and the GIDAS-based Pre-Crash Matrix (PCM) from 1999 to 2016, containing 784 SCP and 453 LTAP/OD accidents, were analyzed with principal component methods to identify variables that account for the relevant total variances of the sample. Three different methods for data clustering were applied to each of the accident types, two similarity-based approaches, namely Hierarchical Clustering (HC) and Partitioning Around Medoids (PAM), and the probability-based Latent Class Clustering (LCC). The optimum number of clusters was derived for HC and PAM with the silhouette method. The PAM algorithm was both initiated with random start medoid selection and medoids from HC. For LCC, the Bayesian Information Criterion (BIC) was used to determine the optimal number of clusters. Test scenarios were defined from optimal cluster medoids weighted by their real-life representation in GIDAS. The set of variables for clustering was further varied to investigate the influence of variable type and character. We quantified how accurately each cluster variation represents real-life AEB performance using pre-crash simulations with PCM data and a generic algorithm for AEB intervention. The usage of different sets of clustering variables resulted in substantially different numbers of clusters. The stability of the resulting clusters increased with prioritization of categorical over continuous variables. For each different set of cluster variables, a strong in-cluster variance of avoided versus non-avoided accidents for the specified Intersection AEB was present. The medoids did not predict the most common Intersection AEB behavior in each cluster. Despite thorough analysis using various cluster methods and variable sets, it was impossible to reduce the diversity of intersection accidents into a set of test scenarios without compromising the ability to predict real-life performance of Intersection AEB. Although this does not imply that other methods cannot succeed, it was observed that small changes in the definition of a scenario resulted in a different avoidance outcome. Therefore, we suggest using limited physical testing to validate more extensive virtual simulations to evaluate vehicle safety. Copyright © 2018 Elsevier Ltd. All rights reserved.

  4. Toward structure prediction of cyclic peptides.

    PubMed

    Yu, Hongtao; Lin, Yu-Shan

    2015-02-14

    Cyclic peptides are a promising class of molecules that can be used to target specific protein-protein interactions. A computational method to accurately predict their structures would substantially advance the development of cyclic peptides as modulators of protein-protein interactions. Here, we develop a computational method that integrates bias-exchange metadynamics simulations, a Boltzmann reweighting scheme, dihedral principal component analysis and a modified density peak-based cluster analysis to provide a converged structural description for cyclic peptides. Using this method, we evaluate the performance of a number of popular protein force fields on a model cyclic peptide. All the tested force fields seem to over-stabilize the α-helix and PPII/β regions in the Ramachandran plot, commonly populated by linear peptides and proteins. Our findings suggest that re-parameterization of a force field that well describes the full Ramachandran plot is necessary to accurately model cyclic peptides.

  5. Detection Method of TOXOPLASMA GONDII Tachyzoites

    NASA Astrophysics Data System (ADS)

    Eassa, Souzan; Bose, Chhanda; Alusta, Pierre; Tarasenko, Olga

    2011-06-01

    Tachyzoites are considered to be the most important stage of Toxoplasma gondii which causes toxoplasmosis. T. gondii is, an obligate intracellular parasite which infects a wide range of cells. The present study was designed to develop a method for an early detection of T. gondii tachyzoites. The method comprised of a binding assay which was analyzed using principal component and cluster analysis. Our data showed that glycoconjugates GC1, GC2, GC3 and GC10 exhibit a significantly higher binding affinity for T. gondii tachyzoites as compared to controls (T. gondii only, PAA only, GC 1, 2, 3, and 10 only).

  6. A rapid ATR-FTIR spectroscopic method for detection of sibutramine adulteration in tea and coffee based on hierarchical cluster and principal component analyses.

    PubMed

    Cebi, Nur; Yilmaz, Mustafa Tahsin; Sagdic, Osman

    2017-08-15

    Sibutramine may be illicitly included in herbal slimming foods and supplements marketed as "100% natural" to enhance weight loss. Considering public health and legal regulations, there is an urgent need for effective, rapid and reliable techniques to detect sibutramine in dietetic herbal foods, teas and dietary supplements. This research comprehensively explored, for the first time, detection of sibutramine in green tea, green coffee and mixed herbal tea using ATR-FTIR spectroscopic technique combined with chemometrics. Hierarchical cluster analysis and PCA principle component analysis techniques were employed in spectral range (2746-2656cm -1 ) for classification and discrimination through Euclidian distance and Ward's algorithm. Unadulterated and adulterated samples were classified and discriminated with respect to their sibutramine contents with perfect accuracy without any false prediction. The results suggest that existence of the active substance could be successfully determined at the levels in the range of 0.375-12mg in totally 1.75g of green tea, green coffee and mixed herbal tea by using FTIR-ATR technique combined with chemometrics. Copyright © 2017 Elsevier Ltd. All rights reserved.

  7. A precipitation regionalization and regime for Iran based on multivariate analysis

    NASA Astrophysics Data System (ADS)

    Raziei, Tayeb

    2018-02-01

    Monthly precipitation time series of 155 synoptic stations distributed over Iran, covering 1990-2014 time period, were used to identify areas with different precipitation time variability and regimes utilizing S-mode principal component analysis (PCA) and cluster analysis (CA) preceded by T-mode PCA, respectively. Taking into account the maximum loading values of the rotated components, the first approach revealed five sub-regions characterized by different precipitation time variability, while the second method delineated eight sub-regions featured with different precipitation regimes. The sub-regions identified by the two used methods, although partly overlapping, are different considering their areal extent and complement each other as they are useful for different purposes and applications. Northwestern Iran and the Caspian Sea area were found as the two most distinctive Iranian precipitation sub-regions considering both time variability and precipitation regime since they were well captured with relatively identical areas by the two used approaches. However, the areal extents of the other three sub-regions identified by the first approach were not coincident with the coverage of their counterpart sub-regions defined by the second approach. Results suggest that the precipitation sub-region identified by the two methods would not be necessarily the same, as the first method which accounts for the variance of the data grouped stations with similar temporal variability while the second one which considers a fixed climatology defined by the average over the period 1990-2014 clusters stations having a similar march of monthly precipitation.

  8. Pixel-level multisensor image fusion based on matrix completion and robust principal component analysis

    NASA Astrophysics Data System (ADS)

    Wang, Zhuozheng; Deller, J. R.; Fleet, Blair D.

    2016-01-01

    Acquired digital images are often corrupted by a lack of camera focus, faulty illumination, or missing data. An algorithm is presented for fusion of multiple corrupted images of a scene using the lifting wavelet transform. The method employs adaptive fusion arithmetic based on matrix completion and self-adaptive regional variance estimation. Characteristics of the wavelet coefficients are used to adaptively select fusion rules. Robust principal component analysis is applied to low-frequency image components, and regional variance estimation is applied to high-frequency components. Experiments reveal that the method is effective for multifocus, visible-light, and infrared image fusion. Compared with traditional algorithms, the new algorithm not only increases the amount of preserved information and clarity but also improves robustness.

  9. [Discrimination of varieties of borneol using terahertz spectra based on principal component analysis and support vector machine].

    PubMed

    Li, Wu; Hu, Bing; Wang, Ming-wei

    2014-12-01

    In the present paper, the terahertz time-domain spectroscopy (THz-TDS) identification model of borneol based on principal component analysis (PCA) and support vector machine (SVM) was established. As one Chinese common agent, borneol needs a rapid, simple and accurate detection and identification method for its different source and being easily confused in the pharmaceutical and trade links. In order to assure the quality of borneol product and guard the consumer's right, quickly, efficiently and correctly identifying borneol has significant meaning to the production and transaction of borneol. Terahertz time-domain spectroscopy is a new spectroscopy approach to characterize material using terahertz pulse. The absorption terahertz spectra of blumea camphor, borneol camphor and synthetic borneol were measured in the range of 0.2 to 2 THz with the transmission THz-TDS. The PCA scores of 2D plots (PC1 X PC2) and 3D plots (PC1 X PC2 X PC3) of three kinds of borneol samples were obtained through PCA analysis, and both of them have good clustering effect on the 3 different kinds of borneol. The value matrix of the first 10 principal components (PCs) was used to replace the original spectrum data, and the 60 samples of the three kinds of borneol were trained and then the unknown 60 samples were identified. Four kinds of support vector machine model of different kernel functions were set up in this way. Results show that the accuracy of identification and classification of SVM RBF kernel function for three kinds of borneol is 100%, and we selected the SVM with the radial basis kernel function to establish the borneol identification model, in addition, in the noisy case, the classification accuracy rates of four SVM kernel function are above 85%, and this indicates that SVM has strong generalization ability. This study shows that PCA with SVM method of borneol terahertz spectroscopy has good classification and identification effects, and provides a new method for species identification of borneol in Chinese medicine.

  10. HPLC-DAD-ESI-MS Analysis of Flavonoids from Leaves of Different Cultivars of Sweet Osmanthus.

    PubMed

    Wang, Yiguang; Fu, Jianxin; Zhang, Chao; Zhao, Hongbo

    2016-09-14

    Osmanthus fragrans Lour. has traditionally been a popular ornamental plant in China. In this study, ethanol extracts of the leaves of four cultivar groups of O. fragrans were analyzed by high-performance liquid chromatography coupled with diode array detection (HPLC-DAD) and high-performance liquid chromatography with electrospray ionization and mass spectrometry (HPLC-ESI-MS). The results suggest that variation in flavonoids among O. fragrans cultivars is quantitative, rather than qualitative. Fifteen components were detected and separated, among which, the structures of 11 flavonoids and two coumarins were identified or tentatively identified. According to principal component analysis (PCA) and hierarchical cluster analysis (HCA) based on the abundance of these components (expressed as rutin equivalents), 22 selected cultivars were classified into four clusters. The seven cultivars from Cluster III ('Xiaoye Sugui', 'Boye Jingui', 'Wuyi Dangui', 'Yingye Dangui', 'Danzhuang', 'Foding Zhu', and 'Tianxiang Taige'), which are enriched in rutin and total flavonoids, and 'Sijigui' from Cluster II which contained the highest amounts of kaempferol glycosides and apigenin 7-O-glucoside, could be selected as potential pharmaceutical resources. However, the chemotaxonomy in this paper does not correlate with the distribution of the existing cultivar groups, demonstrating that the distribution of flavonoids in O. fragrans leaves does not provide an effective means of classification for O. fragrans cultivars based on flower color.

  11. Study of atmospheric dynamics and pollution in the coastal area of English Channel using clustering technique

    NASA Astrophysics Data System (ADS)

    Sokolov, Anton; Dmitriev, Egor; Delbarre, Hervé; Augustin, Patrick; Gengembre, Cyril; Fourmenten, Marc

    2016-04-01

    The problem of atmospheric contamination by principal air pollutants was considered in the industrialized coastal region of English Channel in Dunkirk influenced by north European metropolitan areas. MESO-NH nested models were used for the simulation of the local atmospheric dynamics and the online calculation of Lagrangian backward trajectories with 15-minute temporal resolution and the horizontal resolution down to 500 m. The one-month mesoscale numerical simulation was coupled with local pollution measurements of volatile organic components, particulate matter, ozone, sulphur dioxide and nitrogen oxides. Principal atmospheric pathways were determined by clustering technique applied to backward trajectories simulated. Six clusters were obtained which describe local atmospheric dynamics, four winds blowing through the English Channel, one coming from the south, and the biggest cluster with small wind speeds. This last cluster includes mostly sea breeze events. The analysis of meteorological data and pollution measurements allows relating the principal atmospheric pathways with local air contamination events. It was shown that contamination events are mostly connected with a channelling of pollution from local sources and low-turbulent states of the local atmosphere.

  12. A New 4D Trajectory-Based Approach Unveils Abnormal LV Revolution Dynamics in Hypertrophic Cardiomyopathy

    PubMed Central

    Madeo, Andrea; Piras, Paolo; Re, Federica; Gabriele, Stefano; Nardinocchi, Paola; Teresi, Luciano; Torromeo, Concetta; Chialastri, Claudia; Schiariti, Michele; Giura, Geltrude; Evangelista, Antonietta; Dominici, Tania; Varano, Valerio; Zachara, Elisabetta; Puddu, Paolo Emilio

    2015-01-01

    The assessment of left ventricular shape changes during cardiac revolution may be a new step in clinical cardiology to ease early diagnosis and treatment. To quantify these changes, only point registration was adopted and neither Generalized Procrustes Analysis nor Principal Component Analysis were applied as we did previously to study a group of healthy subjects. Here, we extend to patients affected by hypertrophic cardiomyopathy the original approach and preliminarily include genotype positive/phenotype negative individuals to explore the potential that incumbent pathology might also be detected. Using 3D Speckle Tracking Echocardiography, we recorded left ventricular shape of 48 healthy subjects, 24 patients affected by hypertrophic cardiomyopathy and 3 genotype positive/phenotype negative individuals. We then applied Generalized Procrustes Analysis and Principal Component Analysis and inter-individual differences were cleaned by Parallel Transport performed on the tangent space, along the horizontal geodesic, between the per-subject consensuses and the grand mean. Endocardial and epicardial layers were evaluated separately, different from many ecocardiographic applications. Under a common Principal Component Analysis, we then evaluated left ventricle morphological changes (at both layers) explained by first Principal Component scores. Trajectories’ shape and orientation were investigated and contrasted. Logistic regression and Receiver Operating Characteristic curves were used to compare these morphometric indicators with traditional 3D Speckle Tracking Echocardiography global parameters. Geometric morphometrics indicators performed better than 3D Speckle Tracking Echocardiography global parameters in recognizing pathology both in systole and diastole. Genotype positive/phenotype negative individuals clustered with patients affected by hypertrophic cardiomyopathy during diastole, suggesting that incumbent pathology may indeed be foreseen by these methods. Left ventricle deformation in patients affected by hypertrophic cardiomyopathy compared to healthy subjects may be assessed by modern shape analysis better than by traditional 3D Speckle Tracking Echocardiography global parameters. Hypertrophic cardiomyopathy pathophysiology was unveiled in a new manner whereby also diastolic phase abnormalities are evident which is more difficult to investigate by traditional ecocardiographic techniques. PMID:25875818

  13. Design of an audio advertisement dataset

    NASA Astrophysics Data System (ADS)

    Fu, Yutao; Liu, Jihong; Zhang, Qi; Geng, Yuting

    2015-12-01

    Since more and more advertisements swarm into radios, it is necessary to establish an audio advertising dataset which could be used to analyze and classify the advertisement. A method of how to establish a complete audio advertising dataset is presented in this paper. The dataset is divided into four different kinds of advertisements. Each advertisement's sample is given in *.wav file format, and annotated with a txt file which contains its file name, sampling frequency, channel number, broadcasting time and its class. The classifying rationality of the advertisements in this dataset is proved by clustering the different advertisements based on Principal Component Analysis (PCA). The experimental results show that this audio advertisement dataset offers a reliable set of samples for correlative audio advertisement experimental studies.

  14. Characterizing Heterogeneity within Head and Neck Lesions Using Cluster Analysis of Multi-Parametric MRI Data

    PubMed Central

    Borri, Marco; Schmidt, Maria A.; Powell, Ceri; Koh, Dow-Mu; Riddell, Angela M.; Partridge, Mike; Bhide, Shreerang A.; Nutting, Christopher M.; Harrington, Kevin J.; Newbold, Katie L.; Leach, Martin O.

    2015-01-01

    Purpose To describe a methodology, based on cluster analysis, to partition multi-parametric functional imaging data into groups (or clusters) of similar functional characteristics, with the aim of characterizing functional heterogeneity within head and neck tumour volumes. To evaluate the performance of the proposed approach on a set of longitudinal MRI data, analysing the evolution of the obtained sub-sets with treatment. Material and Methods The cluster analysis workflow was applied to a combination of dynamic contrast-enhanced and diffusion-weighted imaging MRI data from a cohort of squamous cell carcinoma of the head and neck patients. Cumulative distributions of voxels, containing pre and post-treatment data and including both primary tumours and lymph nodes, were partitioned into k clusters (k = 2, 3 or 4). Principal component analysis and cluster validation were employed to investigate data composition and to independently determine the optimal number of clusters. The evolution of the resulting sub-regions with induction chemotherapy treatment was assessed relative to the number of clusters. Results The clustering algorithm was able to separate clusters which significantly reduced in voxel number following induction chemotherapy from clusters with a non-significant reduction. Partitioning with the optimal number of clusters (k = 4), determined with cluster validation, produced the best separation between reducing and non-reducing clusters. Conclusion The proposed methodology was able to identify tumour sub-regions with distinct functional properties, independently separating clusters which were affected differently by treatment. This work demonstrates that unsupervised cluster analysis, with no prior knowledge of the data, can be employed to provide a multi-parametric characterization of functional heterogeneity within tumour volumes. PMID:26398888

  15. Ecological characteristics of Simulium breeding sites in West Africa.

    PubMed

    Cheke, Robert A; Young, Stephen; Garms, Rolf

    2017-03-01

    Twenty-nine taxa of Simulium were identified amongst 527 collections of larvae and pupae from untreated rivers and streams in Liberia (362 collections in 1967-71 & 1989), Togo (125 in 1979-81), Benin (35 in 1979-81) and Ghana (5 in 1980-81). Presence or absence of associations between different taxa were used to group them into six clusters using Ward agglomerative hierarchical cluster analysis. Environmental data associated with the pre-imaginal habitats were then analysed in relation to the six clusters by one way ANOVA. The results revealed significant effects in determining the clusters of maximum river width (all P<0.001 unless stated otherwise), water temperature, dry bulb air temperature, relative humidity, altitude, type of water (on a range from trickle to large river), water level, slope, current, vegetation, light conditions, discharge, length of breeding area, environs, terrain, river bed type (P<0.01), and the supports to which the insects were attached (P<0.01). When four non-significant contributors (wet bulb temperature, river features, height of waterfall and depth) were excluded and the reduced data-set analysed by principal components analysis (PCA), the first two principal components (PCs) accounted for 87% of the variance, with geographical features dominant in PC1 and hydrological characteristics in PC2. The analyses also revealed the ecological characteristics of each taxon's pre-imaginal habitats, which are discussed with particular reference to members of the Simulium damnosum species complex, whose breeding site distributions were further analysed by canonical correspondence analysis (CCA), a method also applied to the data on non-vector species. Copyright © 2016 Elsevier B.V. All rights reserved.

  16. Assessing Genetic Structure in Common but Ecologically Distinct Carnivores: The Stone Marten and Red Fox.

    PubMed

    Basto, Mafalda P; Santos-Reis, Margarida; Simões, Luciana; Grilo, Clara; Cardoso, Luís; Cortes, Helder; Bruford, Michael W; Fernandes, Carlos

    2016-01-01

    The identification of populations and spatial genetic patterns is important for ecological and conservation research, and spatially explicit individual-based methods have been recognised as powerful tools in this context. Mammalian carnivores are intrinsically vulnerable to habitat fragmentation but not much is known about the genetic consequences of fragmentation in common species. Stone martens (Martes foina) and red foxes (Vulpes vulpes) share a widespread Palearctic distribution and are considered habitat generalists, but in the Iberian Peninsula stone martens tend to occur in higher quality habitats. We compared their genetic structure in Portugal to see if they are consistent with their differences in ecological plasticity, and also to illustrate an approach to explicitly delineate the spatial boundaries of consistently identified genetic units. We analysed microsatellite data using spatial Bayesian clustering methods (implemented in the software BAPS, GENELAND and TESS), a progressive partitioning approach and a multivariate technique (Spatial Principal Components Analysis-sPCA). Three consensus Bayesian clusters were identified for the stone marten. No consensus was achieved for the red fox, but one cluster was the most probable clustering solution. Progressive partitioning and sPCA suggested additional clusters in the stone marten but they were not consistent among methods and were geographically incoherent. The contrasting results between the two species are consistent with the literature reporting stricter ecological requirements of the stone marten in the Iberian Peninsula. The observed genetic structure in the stone marten may have been influenced by landscape features, particularly rivers, and fragmentation. We suggest that an approach based on a consensus clustering solution of multiple different algorithms may provide an objective and effective means to delineate potential boundaries of inferred subpopulations. sPCA and progressive partitioning offer further verification of possible population structure and may be useful for revealing cryptic spatial genetic patterns worth further investigation.

  17. Assessing Genetic Structure in Common but Ecologically Distinct Carnivores: The Stone Marten and Red Fox

    PubMed Central

    Basto, Mafalda P.; Santos-Reis, Margarida; Simões, Luciana; Grilo, Clara; Cardoso, Luís; Cortes, Helder; Bruford, Michael W.; Fernandes, Carlos

    2016-01-01

    The identification of populations and spatial genetic patterns is important for ecological and conservation research, and spatially explicit individual-based methods have been recognised as powerful tools in this context. Mammalian carnivores are intrinsically vulnerable to habitat fragmentation but not much is known about the genetic consequences of fragmentation in common species. Stone martens (Martes foina) and red foxes (Vulpes vulpes) share a widespread Palearctic distribution and are considered habitat generalists, but in the Iberian Peninsula stone martens tend to occur in higher quality habitats. We compared their genetic structure in Portugal to see if they are consistent with their differences in ecological plasticity, and also to illustrate an approach to explicitly delineate the spatial boundaries of consistently identified genetic units. We analysed microsatellite data using spatial Bayesian clustering methods (implemented in the software BAPS, GENELAND and TESS), a progressive partitioning approach and a multivariate technique (Spatial Principal Components Analysis-sPCA). Three consensus Bayesian clusters were identified for the stone marten. No consensus was achieved for the red fox, but one cluster was the most probable clustering solution. Progressive partitioning and sPCA suggested additional clusters in the stone marten but they were not consistent among methods and were geographically incoherent. The contrasting results between the two species are consistent with the literature reporting stricter ecological requirements of the stone marten in the Iberian Peninsula. The observed genetic structure in the stone marten may have been influenced by landscape features, particularly rivers, and fragmentation. We suggest that an approach based on a consensus clustering solution of multiple different algorithms may provide an objective and effective means to delineate potential boundaries of inferred subpopulations. sPCA and progressive partitioning offer further verification of possible population structure and may be useful for revealing cryptic spatial genetic patterns worth further investigation. PMID:26727497

  18. Unsupervised analysis of small animal dynamic Cerenkov luminescence imaging

    NASA Astrophysics Data System (ADS)

    Spinelli, Antonello E.; Boschi, Federico

    2011-12-01

    Clustering analysis (CA) and principal component analysis (PCA) were applied to dynamic Cerenkov luminescence images (dCLI). In order to investigate the performances of the proposed approaches, two distinct dynamic data sets obtained by injecting mice with 32P-ATP and 18F-FDG were acquired using the IVIS 200 optical imager. The k-means clustering algorithm has been applied to dCLI and was implemented using interactive data language 8.1. We show that cluster analysis allows us to obtain good agreement between the clustered and the corresponding emission regions like the bladder, the liver, and the tumor. We also show a good correspondence between the time activity curves of the different regions obtained by using CA and manual region of interest analysis on dCLIT and PCA images. We conclude that CA provides an automatic unsupervised method for the analysis of preclinical dynamic Cerenkov luminescence image data.

  19. Automated Classification and Analysis of Non-metallic Inclusion Data Sets

    NASA Astrophysics Data System (ADS)

    Abdulsalam, Mohammad; Zhang, Tongsheng; Tan, Jia; Webler, Bryan A.

    2018-05-01

    The aim of this study is to utilize principal component analysis (PCA), clustering methods, and correlation analysis to condense and examine large, multivariate data sets produced from automated analysis of non-metallic inclusions. Non-metallic inclusions play a major role in defining the properties of steel and their examination has been greatly aided by automated analysis in scanning electron microscopes equipped with energy dispersive X-ray spectroscopy. The methods were applied to analyze inclusions on two sets of samples: two laboratory-scale samples and four industrial samples from a near-finished 4140 alloy steel components with varying machinability. The laboratory samples had well-defined inclusions chemistries, composed of MgO-Al2O3-CaO, spinel (MgO-Al2O3), and calcium aluminate inclusions. The industrial samples contained MnS inclusions as well as (Ca,Mn)S + calcium aluminate oxide inclusions. PCA could be used to reduce inclusion chemistry variables to a 2D plot, which revealed inclusion chemistry groupings in the samples. Clustering methods were used to automatically classify inclusion chemistry measurements into groups, i.e., no user-defined rules were required.

  20. A Taxonomy-Based Approach to Shed Light on the Babel of Mathematical Models for Rice Simulation

    NASA Technical Reports Server (NTRS)

    Confalonieri, Roberto; Bregaglio, Simone; Adam, Myriam; Ruget, Francoise; Li, Tao; Hasegawa, Toshihiro; Yin, Xinyou; Zhu, Yan; Boote, Kenneth; Buis, Samuel; hide

    2016-01-01

    For most biophysical domains, differences in model structures are seldom quantified. Here, we used a taxonomy-based approach to characterise thirteen rice models. Classification keys and binary attributes for each key were identified, and models were categorised into five clusters using a binary similarity measure and the unweighted pair-group method with arithmetic mean. Principal component analysis was performed on model outputs at four sites. Results indicated that (i) differences in structure often resulted in similar predictions and (ii) similar structures can lead to large differences in model outputs. User subjectivity during calibration may have hidden expected relationships between model structure and behaviour. This explanation, if confirmed, highlights the need for shared protocols to reduce the degrees of freedom during calibration, and to limit, in turn, the risk that user subjectivity influences model performance.

  1. Taxonomic discrimination of higher plants by pyrolysis mass spectrometry.

    PubMed

    Kim, S W; Ban, S H; Chung, H J; Choi, D W; Choi, P S; Yoo, O J; Liu, J R

    2004-02-01

    Pyrolysis mass spectrometry (PyMS) is a rapid, simple, high-resolution analytical method based on thermal degradation of complex material in a vacuum and has been widely applied to the discrimination of closely related microbial strains. Leaf samples of six species and one variety of higher plants (Rosa multiflora, R. multiflora var. platyphylla, Sedum kamtschaticum, S. takesimense, S. sarmentosum, Hepatica insularis, and H. asiatica) were subjected to PyMS for spectral fingerprinting. Principal component analysis of PyMS data was not able to discriminate these plants in discrete clusters. However, canonical variate analysis of PyMS data separated these plants from one another. A hierarchical dendrogram based on canonical variate analysis was in agreement with the known taxonomy of the plants at the variety level. These results indicate that PyMS is able to discriminate higher plants based on taxonomic classification at the family, genus, species, and variety level.

  2. Research on potential user identification model for electric energy substitution

    NASA Astrophysics Data System (ADS)

    Xia, Huaijian; Chen, Meiling; Lin, Haiying; Yang, Shuo; Miao, Bo; Zhu, Xinzhi

    2018-01-01

    The implementation of energy substitution plays an important role in promoting the development of energy conservation and emission reduction in china. Energy service management platform of alternative energy users based on the data in the enterprise production value, product output, coal and other energy consumption as a potential evaluation index, using principal component analysis model to simplify the formation of characteristic index, comprehensive index contains the original variables, and using fuzzy clustering model for the same industry user’s flexible classification. The comprehensive index number and user clustering classification based on constructed particle optimization neural network classification model based on the user, user can replace electric potential prediction. The results of an example show that the model can effectively predict the potential of users’ energy potential.

  3. Experimental Researches on the Durability Indicators and the Physiological Comfort of Fabrics using the Principal Component Analysis (PCA) Method

    NASA Astrophysics Data System (ADS)

    Hristian, L.; Ostafe, M. M.; Manea, L. R.; Apostol, L. L.

    2017-06-01

    The work pursued the distribution of combed wool fabrics destined to manufacturing of external articles of clothing in terms of the values of durability and physiological comfort indices, using the mathematical model of Principal Component Analysis (PCA). Principal Components Analysis (PCA) applied in this study is a descriptive method of the multivariate analysis/multi-dimensional data, and aims to reduce, under control, the number of variables (columns) of the matrix data as much as possible to two or three. Therefore, based on the information about each group/assortment of fabrics, it is desired that, instead of nine inter-correlated variables, to have only two or three new variables called components. The PCA target is to extract the smallest number of components which recover the most of the total information contained in the initial data.

  4. Identification of Three Kinds of Citri Reticulatae Pericarpium Based on Deoxyribonucleic Acid Barcoding and High-performance Liquid Chromatography-diode Array Detection-electrospray Ionization/Mass Spectrometry/Mass Spectrometry Combined with Chemometric Analysis

    PubMed Central

    Yu, Xiaoxue; Zhang, Yafeng; Wang, Dongmei; Jiang, Lin; Xu, Xinjun

    2018-01-01

    Background: Citri Reticulatae Pericarpium is the dried mature pericarp of Citrus reticulata Blanco which can be divided into “Chenpi” and “Guangchenpi.” “Guangchenpi” is the genuine Chinese medicinal material in Xinhui, Guangdong province; based on the greatest quality and least amount, it is most expensive among others. Hesperidin is used as the marker to identify Citri Reticulatae Pericarpium described in the Chinese Pharmacopoeia 2010. However, both “Chenpi” and “Guangchenpi” contain hesperidin so that it is impossible to differentiate them by measuring hesperidin. Objective: Our study aims to develop an efficient and accurate method to separate and identify “Guangchenpi” from other Citri Reticulatae Pericarpium. Materials and Methods: The genomic deoxyribonucleic acid (DNA) of all the materials was extracted and then the internal transcribed spacer 2 was amplified, sequenced, aligned, and analyzed. The secondary structures were created in terms of the database and website established by Jörg Schultz et al. High-performance liquid chromatography-diode array detection-electrospray Ionization/mass spectrometry (HPLC-DAD-ESI-MS)/MS coupled with chemometric analysis was applied to compare the differences in chemical profiles of the three kinds of Citri Reticulatae Pericarpium. Results: A total of 22 samples were classified into three groups. The results of DNA barcoding were in accordance with principal component analysis and hierarchical cluster analysis. Eight compounds were deduced from HPLC-DAD-ESI-MS/MS. Conclusions: This method is a reliable and effective tool to differentiate the three Citri Reticulatae Pericarpium. SUMMARY The internal transcribed spacer 2 regions and the secondary structure among three kinds of Citri Reticulatae Pericarpium varied considerablyAll the 22 samples were analyzed by high-performance liquid chromatography (HPLC) to obtain the chemical profilesPrincipal component analysis and hierarchical cluster analysis were used in the chemometric analysisdeoxyribonucleic acid barcoding and HPLC-diode array detection-electrospray ionization/mass spectrometry/MS coupled with chemometric analysis provided an accurate and strong proof to identify these three herbs. Abbreviations used: CTAB: Hexadecyltrimethylammonium bromide, DNA: Deoxyribonucleic acid, ITS2: Internal transcribed spacer 2, PCR: Polymerase chain reaction. PMID:29576703

  5. Priority of VHS Development Based in Potential Area using Principal Component Analysis

    NASA Astrophysics Data System (ADS)

    Meirawan, D.; Ana, A.; Saripudin, S.

    2018-02-01

    The current condition of VHS is still inadequate in quality, quantity and relevance. The purpose of this research is to analyse the development of VHS based on the development of regional potential by using principal component analysis (PCA) in Bandung, Indonesia. This study used descriptive qualitative data analysis using the principle of secondary data reduction component. The method used is Principal Component Analysis (PCA) analysis with Minitab Statistics Software tool. The results of this study indicate the value of the lowest requirement is a priority of the construction of development VHS with a program of majors in accordance with the development of regional potential. Based on the PCA score found that the main priority in the development of VHS in Bandung is in Saguling, which has the lowest PCA value of 416.92 in area 1, Cihampelas with the lowest PCA value in region 2 and Padalarang with the lowest PCA value.

  6. Genetic diversity and relationship analysis of Gossypium arboreum accessions.

    PubMed

    Liu, F; Zhou, Z L; Wang, C Y; Wang, Y H; Cai, X Y; Wang, X X; Zhang, Z S; Wang, K B

    2015-11-19

    Simple sequence repeat techniques were used to identify the genetic diversity of 101 Gossypium arboreum accessions collected from India, Vietnam, and the southwest of China (Guizhou, Guangxi, and Yunnan provinces). Twenty-six pairs of SSR primers produced a total of 103 polymorphic loci with an average of 3.96 polymorphic loci per primer. The average of the effective number of alleles, Nei's gene diversity, and Shannon's information index were 0.59, 0.2835, and 0.4361, respectively. The diversity varied among different geographic regions. The result of principal component analysis was consistent with that of unweighted pair group method with arithmetic mean clustering analysis. The 101 G. arboreum accessions were clustered into 2 groups.

  7. Construction and comparison of gene co-expression networks shows complex plant immune responses

    PubMed Central

    López, Camilo; López-Kleine, Liliana

    2014-01-01

    Gene co-expression networks (GCNs) are graphic representations that depict the coordinated transcription of genes in response to certain stimuli. GCNs provide functional annotations of genes whose function is unknown and are further used in studies of translational functional genomics among species. In this work, a methodology for the reconstruction and comparison of GCNs is presented. This approach was applied using gene expression data that were obtained from immunity experiments in Arabidopsis thaliana, rice, soybean, tomato and cassava. After the evaluation of diverse similarity metrics for the GCN reconstruction, we recommended the mutual information coefficient measurement and a clustering coefficient-based method for similarity threshold selection. To compare GCNs, we proposed a multivariate approach based on the Principal Component Analysis (PCA). Branches of plant immunity that were exemplified by each experiment were analyzed in conjunction with the PCA results, suggesting both the robustness and the dynamic nature of the cellular responses. The dynamic of molecular plant responses produced networks with different characteristics that are differentiable using our methodology. The comparison of GCNs from plant pathosystems, showed that in response to similar pathogens plants could activate conserved signaling pathways. The results confirmed that the closeness of GCNs projected on the principal component space is an indicative of similarity among GCNs. This also can be used to understand global patterns of events triggered during plant immune responses. PMID:25320678

  8. Measurement of Scenic Spots Sustainable Capacity Based on PCA-Entropy TOPSIS: A Case Study from 30 Provinces, China

    PubMed Central

    Liang, Xuedong; Liu, Canmian; Li, Zhi

    2017-01-01

    In connection with the sustainable development of scenic spots, this paper, with consideration of resource conditions, economic benefits, auxiliary industry scale and ecological environment, establishes a comprehensive measurement model of the sustainable capacity of scenic spots; optimizes the index system by principal components analysis to extract principal components; assigns the weight of principal components by entropy method; analyzes the sustainable capacity of scenic spots in each province of China comprehensively in combination with TOPSIS method and finally puts forward suggestions aid decision-making. According to the study, this method provides an effective reference for the study of the sustainable development of scenic spots and is very significant for considering the sustainable development of scenic spots and auxiliary industries to establish specific and scientific countermeasures for improvement. PMID:29271947

  9. Measurement of Scenic Spots Sustainable Capacity Based on PCA-Entropy TOPSIS: A Case Study from 30 Provinces, China.

    PubMed

    Liang, Xuedong; Liu, Canmian; Li, Zhi

    2017-12-22

    In connection with the sustainable development of scenic spots, this paper, with consideration of resource conditions, economic benefits, auxiliary industry scale and ecological environment, establishes a comprehensive measurement model of the sustainable capacity of scenic spots; optimizes the index system by principal components analysis to extract principal components; assigns the weight of principal components by entropy method; analyzes the sustainable capacity of scenic spots in each province of China comprehensively in combination with TOPSIS method and finally puts forward suggestions aid decision-making. According to the study, this method provides an effective reference for the study of the sustainable development of scenic spots and is very significant for considering the sustainable development of scenic spots and auxiliary industries to establish specific and scientific countermeasures for improvement.

  10. Inferring Population Genetic Structure in Widely and Continuously Distributed Carnivores: The Stone Marten (Martes foina) as a Case Study

    PubMed Central

    Vergara, María; Basto, Mafalda P.; Madeira, María José; Gómez-Moliner, Benjamín J.; Santos-Reis, Margarida; Fernandes, Carlos; Ruiz-González, Aritz

    2015-01-01

    The stone marten is a widely distributed mustelid in the Palaearctic region that exhibits variable habitat preferences in different parts of its range. The species is a Holocene immigrant from southwest Asia which, according to fossil remains, followed the expansion of the Neolithic farming cultures into Europe and possibly colonized the Iberian Peninsula during the Early Neolithic (ca. 7,000 years BP). However, the population genetic structure and historical biogeography of this generalist carnivore remains essentially unknown. In this study we have combined mitochondrial DNA (mtDNA) sequencing (621 bp) and microsatellite genotyping (23 polymorphic markers) to infer the population genetic structure of the stone marten within the Iberian Peninsula. The mtDNA data revealed low haplotype and nucleotide diversities and a lack of phylogeographic structure, most likely due to a recent colonization of the Iberian Peninsula by a few mtDNA lineages during the Early Neolithic. The microsatellite data set was analysed with a) spatial and non-spatial Bayesian individual-based clustering (IBC) approaches (STRUCTURE, TESS, BAPS and GENELAND), and b) multivariate methods [discriminant analysis of principal components (DAPC) and spatial principal component analysis (sPCA)]. Additionally, because isolation by distance (IBD) is a common spatial genetic pattern in mobile and continuously distributed species and it may represent a challenge to the performance of the above methods, the microsatellite data set was tested for its presence. Overall, the genetic structure of the stone marten in the Iberian Peninsula was characterized by a NE-SW spatial pattern of IBD, and this may explain the observed disagreement between clustering solutions obtained by the different IBC methods. However, there was significant indication for contemporary genetic structuring, albeit weak, into at least three different subpopulations. The detected subdivision could be attributed to the influence of the rivers Ebro, Tagus and Guadiana, suggesting that main watercourses in the Iberian Peninsula may act as semi-permeable barriers to gene flow in stone martens. To our knowledge, this is the first phylogeographic and population genetic study of the species at a broad regional scale. We also wanted to make the case for the importance and benefits of using and comparing multiple different clustering and multivariate methods in spatial genetic analyses of mobile and continuously distributed species. PMID:26222680

  11. Inferring Population Genetic Structure in Widely and Continuously Distributed Carnivores: The Stone Marten (Martes foina) as a Case Study.

    PubMed

    Vergara, María; Basto, Mafalda P; Madeira, María José; Gómez-Moliner, Benjamín J; Santos-Reis, Margarida; Fernandes, Carlos; Ruiz-González, Aritz

    2015-01-01

    The stone marten is a widely distributed mustelid in the Palaearctic region that exhibits variable habitat preferences in different parts of its range. The species is a Holocene immigrant from southwest Asia which, according to fossil remains, followed the expansion of the Neolithic farming cultures into Europe and possibly colonized the Iberian Peninsula during the Early Neolithic (ca. 7,000 years BP). However, the population genetic structure and historical biogeography of this generalist carnivore remains essentially unknown. In this study we have combined mitochondrial DNA (mtDNA) sequencing (621 bp) and microsatellite genotyping (23 polymorphic markers) to infer the population genetic structure of the stone marten within the Iberian Peninsula. The mtDNA data revealed low haplotype and nucleotide diversities and a lack of phylogeographic structure, most likely due to a recent colonization of the Iberian Peninsula by a few mtDNA lineages during the Early Neolithic. The microsatellite data set was analysed with a) spatial and non-spatial Bayesian individual-based clustering (IBC) approaches (STRUCTURE, TESS, BAPS and GENELAND), and b) multivariate methods [discriminant analysis of principal components (DAPC) and spatial principal component analysis (sPCA)]. Additionally, because isolation by distance (IBD) is a common spatial genetic pattern in mobile and continuously distributed species and it may represent a challenge to the performance of the above methods, the microsatellite data set was tested for its presence. Overall, the genetic structure of the stone marten in the Iberian Peninsula was characterized by a NE-SW spatial pattern of IBD, and this may explain the observed disagreement between clustering solutions obtained by the different IBC methods. However, there was significant indication for contemporary genetic structuring, albeit weak, into at least three different subpopulations. The detected subdivision could be attributed to the influence of the rivers Ebro, Tagus and Guadiana, suggesting that main watercourses in the Iberian Peninsula may act as semi-permeable barriers to gene flow in stone martens. To our knowledge, this is the first phylogeographic and population genetic study of the species at a broad regional scale. We also wanted to make the case for the importance and benefits of using and comparing multiple different clustering and multivariate methods in spatial genetic analyses of mobile and continuously distributed species.

  12. Exploring relationships between Dairy Herd Improvement monitors of performance and the Transition Cow Index in Wisconsin dairy herds.

    PubMed

    Schultz, K K; Bennett, T B; Nordlund, K V; Döpfer, D; Cook, N B

    2016-09-01

    Transition cow management has been tracked via the Transition Cow Index (TCI; AgSource Cooperative Services, Verona, WI) since 2006. Transition Cow Index was developed to measure the difference between actual and predicted milk yield at first test day to evaluate the relative success of the transition period program. This project aimed to assess TCI in relation to all commonly used Dairy Herd Improvement (DHI) metrics available through AgSource Cooperative Services. Regression analysis was used to isolate variables that were relevant to TCI, and then principal components analysis and network analysis were used to determine the relative strength and relatedness among variables. Finally, cluster analysis was used to segregate herds based on similarity of relevant variables. The DHI data were obtained from 2,131 Wisconsin dairy herds with test-day mean ≥30 cows, which were tested ≥10 times throughout the 2014 calendar year. The original list of 940 DHI variables was reduced through expert-driven selection and regression analysis to 23 variables. The K-means cluster analysis produced 5 distinct clusters. Descriptive statistics were calculated for the 23 variables per cluster grouping. Using principal components analysis, cluster analysis, and network analysis, 4 parameters were isolated as most relevant to TCI; these were energy-corrected milk, 3 measures of intramammary infection (dry cow cure rate, linear somatic cell count score in primiparous cows, and new infection rate), peak ratio, and days in milk at peak milk production. These variables together with cow and newborn calf survival measures form a group of metrics that can be used to assist in the evaluation of overall transition period performance. Copyright © 2016 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.

  13. Classification and identification of Rhodobryum roseum Limpr. and its adulterants based on fourier-transform infrared spectroscopy (FTIR) and chemometrics.

    PubMed

    Cao, Zhen; Wang, Zhenjie; Shang, Zhonglin; Zhao, Jiancheng

    2017-01-01

    Fourier-transform infrared spectroscopy (FTIR) with the attenuated total reflectance technique was used to identify Rhodobryum roseum from its four adulterants. The FTIR spectra of six samples in the range from 4000 cm-1 to 600 cm-1 were obtained. The second-derivative transformation test was used to identify the small and nearby absorption peaks. A cluster analysis was performed to classify the spectra in a dendrogram based on the spectral similarity. Principal component analysis (PCA) was used to classify the species of six moss samples. A cluster analysis with PCA was used to identify different genera. However, some species of the same genus exhibited highly similar chemical components and FTIR spectra. Fourier self-deconvolution and discrete wavelet transform (DWT) were used to enhance the differences among the species with similar chemical components and FTIR spectra. Three scales were selected as the feature-extracting space in the DWT domain. The results show that FTIR spectroscopy with chemometrics is suitable for identifying Rhodobryum roseum and its adulterants.

  14. Corrected confidence bands for functional data using principal components.

    PubMed

    Goldsmith, J; Greven, S; Crainiceanu, C

    2013-03-01

    Functional principal components (FPC) analysis is widely used to decompose and express functional observations. Curve estimates implicitly condition on basis functions and other quantities derived from FPC decompositions; however these objects are unknown in practice. In this article, we propose a method for obtaining correct curve estimates by accounting for uncertainty in FPC decompositions. Additionally, pointwise and simultaneous confidence intervals that account for both model- and decomposition-based variability are constructed. Standard mixed model representations of functional expansions are used to construct curve estimates and variances conditional on a specific decomposition. Iterated expectation and variance formulas combine model-based conditional estimates across the distribution of decompositions. A bootstrap procedure is implemented to understand the uncertainty in principal component decomposition quantities. Our method compares favorably to competing approaches in simulation studies that include both densely and sparsely observed functions. We apply our method to sparse observations of CD4 cell counts and to dense white-matter tract profiles. Code for the analyses and simulations is publicly available, and our method is implemented in the R package refund on CRAN. Copyright © 2013, The International Biometric Society.

  15. Corrected Confidence Bands for Functional Data Using Principal Components

    PubMed Central

    Goldsmith, J.; Greven, S.; Crainiceanu, C.

    2014-01-01

    Functional principal components (FPC) analysis is widely used to decompose and express functional observations. Curve estimates implicitly condition on basis functions and other quantities derived from FPC decompositions; however these objects are unknown in practice. In this article, we propose a method for obtaining correct curve estimates by accounting for uncertainty in FPC decompositions. Additionally, pointwise and simultaneous confidence intervals that account for both model- and decomposition-based variability are constructed. Standard mixed model representations of functional expansions are used to construct curve estimates and variances conditional on a specific decomposition. Iterated expectation and variance formulas combine model-based conditional estimates across the distribution of decompositions. A bootstrap procedure is implemented to understand the uncertainty in principal component decomposition quantities. Our method compares favorably to competing approaches in simulation studies that include both densely and sparsely observed functions. We apply our method to sparse observations of CD4 cell counts and to dense white-matter tract profiles. Code for the analyses and simulations is publicly available, and our method is implemented in the R package refund on CRAN. PMID:23003003

  16. Approximation-based common principal component for feature extraction in multi-class brain-computer interfaces.

    PubMed

    Hoang, Tuan; Tran, Dat; Huang, Xu

    2013-01-01

    Common Spatial Pattern (CSP) is a state-of-the-art method for feature extraction in Brain-Computer Interface (BCI) systems. However it is designed for 2-class BCI classification problems. Current extensions of this method to multiple classes based on subspace union and covariance matrix similarity do not provide a high performance. This paper presents a new approach to solving multi-class BCI classification problems by forming a subspace resembled from original subspaces and the proposed method for this approach is called Approximation-based Common Principal Component (ACPC). We perform experiments on Dataset 2a used in BCI Competition IV to evaluate the proposed method. This dataset was designed for motor imagery classification with 4 classes. Preliminary experiments show that the proposed ACPC feature extraction method when combining with Support Vector Machines outperforms CSP-based feature extraction methods on the experimental dataset.

  17. Chromatographic and computational assessment of lipophilicity using sum of ranking differences and generalized pair-correlation.

    PubMed

    Andrić, Filip; Héberger, Károly

    2015-02-06

    Lipophilicity (logP) represents one of the most studied and most frequently used fundamental physicochemical properties. At present there are several possibilities for its quantitative expression and many of them stems from chromatographic experiments. Numerous attempts have been made to compare different computational methods, chromatographic methods vs. computational approaches, as well as chromatographic methods and direct shake-flask procedure without definite results or these findings are not accepted generally. In the present work numerous chromatographically derived lipophilicity measures in combination with diverse computational methods were ranked and clustered using the novel variable discrimination and ranking approaches based on the sum of ranking differences and the generalized pair correlation method. Available literature logP data measured on HILIC, and classical reversed-phase combining different classes of compounds have been compared with most frequently used multivariate data analysis techniques (principal component and hierarchical cluster analysis) as well as with the conclusions in the original sources. Chromatographic lipophilicity measures obtained under typical reversed-phase conditions outperform the majority of computationally estimated logPs. Oppositely, in the case of HILIC none of the many proposed chromatographic indices overcomes any of the computationally assessed logPs. Only two of them (logkmin and kmin) may be selected as recommended chromatographic lipophilicity measures. Both ranking approaches, sum of ranking differences and generalized pair correlation method, although based on different backgrounds, provides highly similar variable ordering and grouping leading to the same conclusions. Copyright © 2015. Published by Elsevier B.V.

  18. Biomolecular Characterization of Diazotrophs Isolated from the Tropical Soil in Malaysia

    PubMed Central

    Naher, Umme Aminun; Othman, Radziah; Latif, Mohammad Abdul; Panhwar, Qurban Ali; Amaddin, Puteri Aminatulhawa Megat; Shamsuddin, Zulkifli H

    2013-01-01

    This study was conducted to evaluate selected biomolecular characteristics of rice root-associated diazotrophs isolated from the Tanjong Karang rice irrigation project area of Malaysia. Soil and rice plant samples were collected from seven soil series belonging to order Inceptisol (USDA soil taxonomy). A total of 38 diazotrophs were isolated using a nitrogen-free medium. The biochemical properties of the isolated bacteria, such as nitrogenase activity, indoleacetic acid (IAA) production and sugar utilization, were measured. According to a cluster analysis of Jaccard’s similarity coefficients, the genetic similarities among the isolated diazotrophs ranged from 10% to 100%. A dendogram constructed using the unweighted pair-group method with arithmetic mean (UPGMA) showed that the isolated diazotrophs clustered into 12 groups. The genomic DNA rep-PCR data were subjected to a principal component analysis, and the first four principal components (PC) accounted for 52.46% of the total variation among the 38 diazotrophs. The 10 diazotrophs that tested highly positive in the acetylene reduction assay (ARA) were identified as Bacillus spp. (9 diazotrophs) and Burkholderia sp. (Sb16) using the partial 16S rRNA gene sequence analysis. In the analysis of the biochemical characteristics, three principal components were accounted for approximately 85% of the total variation among the identified diazotrophs. The examination of root colonization using scanning electron microscopy (SEM) and transmission electron microscopy (TEM) proved that two of the isolated diazotrophs (Sb16 and Sb26) were able to colonize the surface and interior of rice roots and fixed 22%–24% of the total tissue nitrogen from the atmosphere. In general, the tropical soils (Inceptisols) of the Tanjong Karang rice irrigation project area in Malaysia harbor a diverse group of diazotrophs that exhibit a large variation of biomolecular characteristics. PMID:23999588

  19. Craniometric relationships among medieval Central European populations: implications for Croat migration and expansion.

    PubMed

    Slaus, Mario; Tomicić, Zeljko; Uglesić, Ante; Jurić, Radomir

    2004-08-01

    To determine the ethnic composition of the early medieval Croats, the location from which they migrated to the east coast of the Adriatic, and to separate early medieval Croats from Bijelo brdo culture members, using principal components analysis and discriminant function analysis of craniometric data from Central and South-East European medieval archaeological sites. Mean male values for 8 cranial measurements from 39 European and 5 Iranian sites were analyzed by principal components analysis. Raw data for 17 cranial measurements for 103 female and 112 male skulls were used to develop discriminant functions. The scatter-plot of the analyzed sites on the first 2 principal components showed a pattern of intergroup relationships consistent with geographical and archaeological information not included in the data set. The first 2 principal components separated the sites into 4 distinct clusters: Avaroslav sites west of the Danube, Avaroslav sites east of the Danube, Bijelo brdo sites, and Polish sites. All early medieval Croat sites were located in the cluster of Polish sites. Two discriminant functions successfully differentiated between early medieval Croats and Bijelo brdo members. Overall accuracies were high -- 89.3% for males, and 97.1% for females. Early medieval Croats seem to be of Slavic ancestry, and at one time shared a common homeland with medieval Poles. Application of unstandardized discriminant function coefficients to unclassified crania from 18 sites showed an expansion of early medieval Croats into continental Croatia during the 10th to 13th century.

  20. Using Machine Learning Techniques in the Analysis of Oceanographic Data

    NASA Astrophysics Data System (ADS)

    Falcinelli, K. E.; Abuomar, S.

    2017-12-01

    Acoustic Doppler Current Profilers (ADCPs) are oceanographic tools capable of collecting large amounts of current profile data. Using unsupervised machine learning techniques such as principal component analysis, fuzzy c-means clustering, and self-organizing maps, patterns and trends in an ADCP dataset are found. Cluster validity algorithms such as visual assessment of cluster tendency and clustering index are used to determine the optimal number of clusters in the ADCP dataset. These techniques prove to be useful in analysis of ADCP data and demonstrate potential for future use in other oceanographic applications.

  1. Combining Mixture Components for Clustering*

    PubMed Central

    Baudry, Jean-Patrick; Raftery, Adrian E.; Celeux, Gilles; Lo, Kenneth; Gottardo, Raphaël

    2010-01-01

    Model-based clustering consists of fitting a mixture model to data and identifying each cluster with one of its components. Multivariate normal distributions are typically used. The number of clusters is usually determined from the data, often using BIC. In practice, however, individual clusters can be poorly fitted by Gaussian distributions, and in that case model-based clustering tends to represent one non-Gaussian cluster by a mixture of two or more Gaussian distributions. If the number of mixture components is interpreted as the number of clusters, this can lead to overestimation of the number of clusters. This is because BIC selects the number of mixture components needed to provide a good approximation to the density, rather than the number of clusters as such. We propose first selecting the total number of Gaussian mixture components, K, using BIC and then combining them hierarchically according to an entropy criterion. This yields a unique soft clustering for each number of clusters less than or equal to K. These clusterings can be compared on substantive grounds, and we also describe an automatic way of selecting the number of clusters via a piecewise linear regression fit to the rescaled entropy plot. We illustrate the method with simulated data and a flow cytometry dataset. Supplemental Materials are available on the journal Web site and described at the end of the paper. PMID:20953302

  2. Reconstruction Error and Principal Component Based Anomaly Detection in Hyperspectral Imagery

    DTIC Science & Technology

    2014-03-27

    2003), and (Jackson D. A., 1993). In 1933, Hotelling ( Hotelling , 1933), who coined the term ‘principal components,’ surmised that there was a...goodness of fit and multivariate quality control with the statistic Qi = (Xi(1×p) − X̂i(1×p) )(Xi(1×p) − X̂i(1×p) ) T (20) where, under the...sparsely targeted scenes through SNR or other methods. 5) Customize sorting and histogram construction methods in Multiple PCA to avoid redundancy

  3. Modified neural networks for rapid recovery of tokamak plasma parameters for real time control

    NASA Astrophysics Data System (ADS)

    Sengupta, A.; Ranjan, P.

    2002-07-01

    Two modified neural network techniques are used for the identification of the equilibrium plasma parameters of the Superconducting Steady State Tokamak I from external magnetic measurements. This is expected to ultimately assist in a real time plasma control. As different from the conventional network structure where a single network with the optimum number of processing elements calculates the outputs, a multinetwork system connected in parallel does the calculations here in one of the methods. This network is called the double neural network. The accuracy of the recovered parameters is clearly more than the conventional network. The other type of neural network used here is based on the statistical function parametrization combined with a neural network. The principal component transformation removes linear dependences from the measurements and a dimensional reduction process reduces the dimensionality of the input space. This reduced and transformed input set, rather than the entire set, is fed into the neural network input. This is known as the principal component transformation-based neural network. The accuracy of the recovered parameters in the latter type of modified network is found to be a further improvement over the accuracy of the double neural network. This result differs from that obtained in an earlier work where the double neural network showed better performance. The conventional network and the function parametrization methods have also been used for comparison. The conventional network has been used for an optimization of the set of magnetic diagnostics. The effective set of sensors, as assessed by this network, are compared with the principal component based network. Fault tolerance of the neural networks has been tested. The double neural network showed the maximum resistance to faults in the diagnostics, while the principal component based network performed poorly. Finally the processing times of the methods have been compared. The double network and the principal component network involve the minimum computation time, although the conventional network also performs well enough to be used in real time.

  4. Sensory characteristics and consumer preference for chicken meat in Guinea.

    PubMed

    Sow, T M A; Grongnet, J F

    2010-10-01

    This study identified the sensory characteristics and consumer preference for chicken meat in Guinea. Five chicken samples [live village chicken, live broiler, live spent laying hen, ready-to-cook broiler, and ready-to-cook broiler (imported)] bought from different locations were assessed by 10 trained panelists using 19 sensory attributes. The ANOVA results showed that 3 chicken appearance attributes (brown, yellow, and white), 5 chicken odor attributes (oily, intense, medicine smell, roasted, and mouth persistent), 3 chicken flavor attributes (sweet, bitter, and astringent), and 8 chicken texture attributes (firm, tender, juicy, chew, smooth, springy, hard, and fibrous) were significantly discriminating between the chicken samples (P<0.05). Principal component analysis of the sensory data showed that the first 2 principal components explained 84% of the sensory data variance. The principal component analysis results showed that the live village chicken, the live spent laying hen, and the ready-to-cook broiler (imported) were very well represented and clearly distinguished from the live broiler and the ready-to-cook broiler. One hundred twenty consumers expressed their preferences for the chicken samples using a 5-point Likert scale. The hierarchical cluster analysis of the preference data identified 4 homogenous consumer clusters. The hierarchical cluster analysis results showed that the live village chicken was the most preferred chicken sample, whereas the ready-to-cook broiler was the least preferred one. The partial least squares regression (PLSR) type 1 showed that 72% of the sensory data for the first 2 principal components explained 83% of the chicken preference. The PLSR1 identified that the sensory characteristics juicy, oily, sweet, hard, mouth persistent, and yellow were the most relevant sensory drivers of the Guinean chicken preference. The PLSR2 (with multiple responses) identified the relationship between the chicken samples, their sensory attributes, and the consumer clusters. Our results showed that there was not a chicken category that was exclusively preferred from the other chicken samples and therefore highlight the existence of place for development of all chicken categories in the local market.

  5. Identifying the regional-scale groundwater-surface water interaction on the Sanjiang Plain, Northeast China.

    PubMed

    Wang, Xihua; Zhang, Guangxin; Xu, Y Jun; Sun, Guangzhi

    2015-11-01

    Assessment on the interaction between groundwater and surface water (GW-SW) can generate information that is critical to regional water resource management, especially for regions that are highly dependent on groundwater resources for irrigation. This study investigated such interaction on China's Sanjiang Plain (10.9 × 10(4) km(2)) and produced results to assist sustainable regional water management for intensive agricultural activities. Methods of hierarchical cluster analysis (HCA), principal component analysis (PCA), and statistical analysis were used in this study. One hundred two water samplings (60 from shallow groundwater, 7 from deep groundwater, and 35 from surface water) were collected and grouped into three clusters and seven sub-clusters during the analyses. The PCA analysis identified four principal components of the interaction, which explained 85.9% variance of total database, attributed to the dissolution and evolution of gypsum, feldspar, and other natural minerals in the region that was affected by anthropic and geological (sedimentary rock mineral) activities. The analyses showed that surface water in the upper region of the Sanjiang Plain gained water from local shallow groundwater, indicating that the surface water in the upper region was relatively more resilient to withdrawal for usage, whereas in the middle region, there was only a weak interaction between shallow groundwater and surface water. In the lower region of the Sanjiang Plain, surface water lost water to shallow groundwater, indicating that the groundwater was vulnerable to pollution by pesticides and fertilizers from terrestrial sources.

  6. Principal component and clustering analysis on molecular dynamics data of the ribosomal L11·23S subdomain.

    PubMed

    Wolf, Antje; Kirschner, Karl N

    2013-02-01

    With improvements in computer speed and algorithm efficiency, MD simulations are sampling larger amounts of molecular and biomolecular conformations. Being able to qualitatively and quantitatively sift these conformations into meaningful groups is a difficult and important task, especially when considering the structure-activity paradigm. Here we present a study that combines two popular techniques, principal component (PC) analysis and clustering, for revealing major conformational changes that occur in molecular dynamics (MD) simulations. Specifically, we explored how clustering different PC subspaces effects the resulting clusters versus clustering the complete trajectory data. As a case example, we used the trajectory data from an explicitly solvated simulation of a bacteria's L11·23S ribosomal subdomain, which is a target of thiopeptide antibiotics. Clustering was performed, using K-means and average-linkage algorithms, on data involving the first two to the first five PC subspace dimensions. For the average-linkage algorithm we found that data-point membership, cluster shape, and cluster size depended on the selected PC subspace data. In contrast, K-means provided very consistent results regardless of the selected subspace. Since we present results on a single model system, generalization concerning the clustering of different PC subspaces of other molecular systems is currently premature. However, our hope is that this study illustrates a) the complexities in selecting the appropriate clustering algorithm, b) the complexities in interpreting and validating their results, and c) by combining PC analysis with subsequent clustering valuable dynamic and conformational information can be obtained.

  7. Pepper seed variety identification based on visible/near-infrared spectral technology

    NASA Astrophysics Data System (ADS)

    Li, Cuiling; Wang, Xiu; Meng, Zhijun; Fan, Pengfei; Cai, Jichen

    2016-11-01

    Pepper is a kind of important fruit vegetable, with the expansion of pepper hybrid planting area, detection of pepper seed purity is especially important. This research used visible/near infrared (VIS/NIR) spectral technology to detect the variety of single pepper seed, and chose hybrid pepper seeds "Zhuo Jiao NO.3", "Zhuo Jiao NO.4" and "Zhuo Jiao NO.5" as research sample. VIS/NIR spectral data of 80 "Zhuo Jiao NO.3", 80 "Zhuo Jiao NO.4" and 80 "Zhuo Jiao NO.5" pepper seeds were collected, and the original spectral data was pretreated with standard normal variable (SNV) transform, first derivative (FD), and Savitzky-Golay (SG) convolution smoothing methods. Principal component analysis (PCA) method was adopted to reduce the dimension of the spectral data and extract principal components, according to the distribution of the first principal component (PC1) along with the second principal component(PC2) in the twodimensional plane, similarly, the distribution of PC1 coupled with the third principal component(PC3), and the distribution of PC2 combined with PC3, distribution areas of three varieties of pepper seeds were divided in each twodimensional plane, and the discriminant accuracy of PCA was tested through observing the distribution area of samples' principal components in validation set. This study combined PCA and linear discriminant analysis (LDA) to identify single pepper seed varieties, results showed that with the FD preprocessing method, the discriminant accuracy of pepper seed varieties was 98% for validation set, it concludes that using VIS/NIR spectral technology is feasible for identification of single pepper seed varieties.

  8. Comparison of multianalyte proficiency test results by sum of ranking differences, principal component analysis, and hierarchical cluster analysis.

    PubMed

    Škrbić, Biljana; Héberger, Károly; Durišić-Mladenović, Nataša

    2013-10-01

    Sum of ranking differences (SRD) was applied for comparing multianalyte results obtained by several analytical methods used in one or in different laboratories, i.e., for ranking the overall performances of the methods (or laboratories) in simultaneous determination of the same set of analytes. The data sets for testing of the SRD applicability contained the results reported during one of the proficiency tests (PTs) organized by EU Reference Laboratory for Polycyclic Aromatic Hydrocarbons (EU-RL-PAH). In this way, the SRD was also tested as a discriminant method alternative to existing average performance scores used to compare mutlianalyte PT results. SRD should be used along with the z scores--the most commonly used PT performance statistics. SRD was further developed to handle the same rankings (ties) among laboratories. Two benchmark concentration series were selected as reference: (a) the assigned PAH concentrations (determined precisely beforehand by the EU-RL-PAH) and (b) the averages of all individual PAH concentrations determined by each laboratory. Ranking relative to the assigned values and also to the average (or median) values pointed to the laboratories with the most extreme results, as well as revealed groups of laboratories with similar overall performances. SRD reveals differences between methods or laboratories even if classical test(s) cannot. The ranking was validated using comparison of ranks by random numbers (a randomization test) and using seven folds cross-validation, which highlighted the similarities among the (methods used in) laboratories. Principal component analysis and hierarchical cluster analysis justified the findings based on SRD ranking/grouping. If the PAH-concentrations are row-scaled, (i.e., z scores are analyzed as input for ranking) SRD can still be used for checking the normality of errors. Moreover, cross-validation of SRD on z scores groups the laboratories similarly. The SRD technique is general in nature, i.e., it can be applied to any experimental problem in which multianalyte results obtained either by several analytical procedures, analysts, instruments, or laboratories need to be compared.

  9. A Dimensionally Reduced Clustering Methodology for Heterogeneous Occupational Medicine Data Mining.

    PubMed

    Saâdaoui, Foued; Bertrand, Pierre R; Boudet, Gil; Rouffiac, Karine; Dutheil, Frédéric; Chamoux, Alain

    2015-10-01

    Clustering is a set of techniques of the statistical learning aimed at finding structures of heterogeneous partitions grouping homogenous data called clusters. There are several fields in which clustering was successfully applied, such as medicine, biology, finance, economics, etc. In this paper, we introduce the notion of clustering in multifactorial data analysis problems. A case study is conducted for an occupational medicine problem with the purpose of analyzing patterns in a population of 813 individuals. To reduce the data set dimensionality, we base our approach on the Principal Component Analysis (PCA), which is the statistical tool most commonly used in factorial analysis. However, the problems in nature, especially in medicine, are often based on heterogeneous-type qualitative-quantitative measurements, whereas PCA only processes quantitative ones. Besides, qualitative data are originally unobservable quantitative responses that are usually binary-coded. Hence, we propose a new set of strategies allowing to simultaneously handle quantitative and qualitative data. The principle of this approach is to perform a projection of the qualitative variables on the subspaces spanned by quantitative ones. Subsequently, an optimal model is allocated to the resulting PCA-regressed subspaces.

  10. Patterns in longitudinal growth of refraction in Southern Chinese children: cluster and principal component analysis.

    PubMed

    Chen, Yanxian; Chang, Billy Heung Wing; Ding, Xiaohu; He, Mingguang

    2016-11-22

    In the present study we attempt to use hypothesis-independent analysis in investigating the patterns in refraction growth in Chinese children, and to explore the possible risk factors affecting the different components of progression, as defined by Principal Component Analysis (PCA). A total of 637 first-born twins in Guangzhou Twin Eye Study with 6-year annual visits (baseline age 7-15 years) were available in the analysis. Cluster 1 to 3 were classified after a partitioning clustering, representing stable, slow and fast progressing groups of refraction respectively. Baseline age and refraction, paternal refraction, maternal refraction and proportion of two myopic parents showed significant differences across the three groups. Three major components of progression were extracted using PCA: "Average refraction", "Acceleration" and the combination of "Myopia stabilization" and "Late onset of refraction progress". In regression models, younger children with more severe myopia were associated with larger "Acceleration". The risk factors of "Acceleration" included change of height and weight, near work, and parental myopia, while female gender, change of height and weight were associated with "Stabilization", and increased outdoor time was related to "Late onset of refraction progress". We therefore concluded that genetic and environmental risk factors have different impacts on patterns of refraction progression.

  11. Patterns in longitudinal growth of refraction in Southern Chinese children: cluster and principal component analysis

    PubMed Central

    Chen, Yanxian; Chang, Billy Heung Wing; Ding, Xiaohu; He, Mingguang

    2016-01-01

    In the present study we attempt to use hypothesis-independent analysis in investigating the patterns in refraction growth in Chinese children, and to explore the possible risk factors affecting the different components of progression, as defined by Principal Component Analysis (PCA). A total of 637 first-born twins in Guangzhou Twin Eye Study with 6-year annual visits (baseline age 7–15 years) were available in the analysis. Cluster 1 to 3 were classified after a partitioning clustering, representing stable, slow and fast progressing groups of refraction respectively. Baseline age and refraction, paternal refraction, maternal refraction and proportion of two myopic parents showed significant differences across the three groups. Three major components of progression were extracted using PCA: “Average refraction”, “Acceleration” and the combination of “Myopia stabilization” and “Late onset of refraction progress”. In regression models, younger children with more severe myopia were associated with larger “Acceleration”. The risk factors of “Acceleration” included change of height and weight, near work, and parental myopia, while female gender, change of height and weight were associated with “Stabilization”, and increased outdoor time was related to “Late onset of refraction progress”. We therefore concluded that genetic and environmental risk factors have different impacts on patterns of refraction progression. PMID:27874105

  12. Iris recognition based on robust principal component analysis

    NASA Astrophysics Data System (ADS)

    Karn, Pradeep; He, Xiao Hai; Yang, Shuai; Wu, Xiao Hong

    2014-11-01

    Iris images acquired under different conditions often suffer from blur, occlusion due to eyelids and eyelashes, specular reflection, and other artifacts. Existing iris recognition systems do not perform well on these types of images. To overcome these problems, we propose an iris recognition method based on robust principal component analysis. The proposed method decomposes all training images into a low-rank matrix and a sparse error matrix, where the low-rank matrix is used for feature extraction. The sparsity concentration index approach is then applied to validate the recognition result. Experimental results using CASIA V4 and IIT Delhi V1iris image databases showed that the proposed method achieved competitive performances in both recognition accuracy and computational efficiency.

  13. Hyperspectral remote sensing for advanced detection of early blight (Alternaria solani) disease in potato (Solanum tuberosum) plants

    NASA Astrophysics Data System (ADS)

    Atherton, Daniel

    Early detection of disease and insect infestation within crops and precise application of pesticides can help reduce potential production losses, reduce environmental risk, and reduce the cost of farming. The goal of this study was the advanced detection of early blight (Alternaria solani) in potato (Solanum tuberosum) plants using hyperspectral remote sensing data captured with a handheld spectroradiometer. Hyperspectral reflectance spectra were captured 10 times over five weeks from plants grown to the vegetative and tuber bulking growth stages. The spectra were analyzed using principal component analysis (PCA), spectral change (ratio) analysis, partial least squares (PLS), cluster analysis, and vegetative indices. PCA successfully distinguished more heavily diseased plants from healthy and minimally diseased plants using two principal components. Spectral change (ratio) analysis provided wavelengths (490-510, 640, 665-670, 690, 740-750, and 935 nm) most sensitive to early blight infection followed by ANOVA results indicating a highly significant difference (p < 0.0001) between disease rating group means. In the majority of the experiments, comparisons of diseased plants with healthy plants using Fisher's LSD revealed more heavily diseased plants were significantly different from healthy plants. PLS analysis demonstrated the feasibility of detecting early blight infected plants, finding four optimal factors for raw spectra with the predictor variation explained ranging from 93.4% to 94.6% and the response variation explained ranging from 42.7% to 64.7%. Cluster analysis successfully distinguished healthy plants from all diseased plants except for the most mildly diseased plants, showing clustering analysis was an effective method for detection of early blight. Analysis of the reflectance spectra using the simple ratio (SR) and the normalized difference vegetative index (NDVI) was effective at differentiating all diseased plants from healthy plants, except for the most mildly diseased plants. Of the analysis methods attempted, cluster analysis and vegetative indices were the most promising. The results show the potential of hyperspectral remote sensing for the detection of early blight in potato plants.

  14. An efficient classification method based on principal component and sparse representation.

    PubMed

    Zhai, Lin; Fu, Shujun; Zhang, Caiming; Liu, Yunxian; Wang, Lu; Liu, Guohua; Yang, Mingqiang

    2016-01-01

    As an important application in optical imaging, palmprint recognition is interfered by many unfavorable factors. An effective fusion of blockwise bi-directional two-dimensional principal component analysis and grouping sparse classification is presented. The dimension reduction and normalizing are implemented by the blockwise bi-directional two-dimensional principal component analysis for palmprint images to extract feature matrixes, which are assembled into an overcomplete dictionary in sparse classification. A subspace orthogonal matching pursuit algorithm is designed to solve the grouping sparse representation. Finally, the classification result is gained by comparing the residual between testing and reconstructed images. Experiments are carried out on a palmprint database, and the results show that this method has better robustness against position and illumination changes of palmprint images, and can get higher rate of palmprint recognition.

  15. [Quality evaluation of Artemisiae Argyi Folium based on fingerprint analysis and quantitative analysis of multicomponents].

    PubMed

    Guo, Long; Jiao, Qian; Zhang, Dan; Liu, Ai-Peng; Wang, Qian; Zheng, Yu-Guang

    2018-03-01

    Artemisiae Argyi Folium, the dried leaves of Artemisia argyi, has been widely used in traditional Chinese and folk medicines for treatment of hemorrhage, pain, and skin itch. Phytochemical studies indicated that volatile oil, organic acid and flavonoids were the main bioactive components in Artemisiae Argyi Folium. Compared to the volatile compounds, the research of nonvolatile compounds in Artemisiae Argyi Folium are limited. In the present study, an accurate and reliable fingerprint approach was developed using HPLC for quality control of Artemisiae Argyi Folium. A total of 10 common peaks were marked,and the similarity of all the Artemisiae Argyi Folium samples was above 0.940. The established fingerprint method could be used for quality control of Artemisiae Argyi Folium. Furthermore, an HPLC method was applied for simultaneous determination of seven bioactive compounds including five organic acids and two flavonoids in Artemisiae Argyi Folium and Artemisiae Lavandulaefoliae Folium samples. Moreover, chemometrics methods such as hierarchical clustering analysis and principal component analysis were performed to compare and discriminate the Artemisiae Argyi Folium and Artemisiae Lavandulaefoliae Folium based on the quantitative data of analytes. The results indicated that simultaneous quantification of multicomponents coupled with chemometrics analysis could be a well-acceptable strategy to identify and evaluate the quality of Artemisiae Argyi Folium. Copyright© by the Chinese Pharmaceutical Association.

  16. Multiple Imputation based Clustering Validation (MIV) for Big Longitudinal Trial Data with Missing Values in eHealth.

    PubMed

    Zhang, Zhaoyang; Fang, Hua; Wang, Honggang

    2016-06-01

    Web-delivered trials are an important component in eHealth services. These trials, mostly behavior-based, generate big heterogeneous data that are longitudinal, high dimensional with missing values. Unsupervised learning methods have been widely applied in this area, however, validating the optimal number of clusters has been challenging. Built upon our multiple imputation (MI) based fuzzy clustering, MIfuzzy, we proposed a new multiple imputation based validation (MIV) framework and corresponding MIV algorithms for clustering big longitudinal eHealth data with missing values, more generally for fuzzy-logic based clustering methods. Specifically, we detect the optimal number of clusters by auto-searching and -synthesizing a suite of MI-based validation methods and indices, including conventional (bootstrap or cross-validation based) and emerging (modularity-based) validation indices for general clustering methods as well as the specific one (Xie and Beni) for fuzzy clustering. The MIV performance was demonstrated on a big longitudinal dataset from a real web-delivered trial and using simulation. The results indicate MI-based Xie and Beni index for fuzzy-clustering are more appropriate for detecting the optimal number of clusters for such complex data. The MIV concept and algorithms could be easily adapted to different types of clustering that could process big incomplete longitudinal trial data in eHealth services.

  17. Multiple Imputation based Clustering Validation (MIV) for Big Longitudinal Trial Data with Missing Values in eHealth

    PubMed Central

    Zhang, Zhaoyang; Wang, Honggang

    2016-01-01

    Web-delivered trials are an important component in eHealth services. These trials, mostly behavior-based, generate big heterogeneous data that are longitudinal, high dimensional with missing values. Unsupervised learning methods have been widely applied in this area, however, validating the optimal number of clusters has been challenging. Built upon our multiple imputation (MI) based fuzzy clustering, MIfuzzy, we proposed a new multiple imputation based validation (MIV) framework and corresponding MIV algorithms for clustering big longitudinal eHealth data with missing values, more generally for fuzzy-logic based clustering methods. Specifically, we detect the optimal number of clusters by auto-searching and -synthesizing a suite of MI-based validation methods and indices, including conventional (bootstrap or cross-validation based) and emerging (modularity-based) validation indices for general clustering methods as well as the specific one (Xie and Beni) for fuzzy clustering. The MIV performance was demonstrated on a big longitudinal dataset from a real web-delivered trial and using simulation. The results indicate MI-based Xie and Beni index for fuzzy-clustering is more appropriate for detecting the optimal number of clusters for such complex data. The MIV concept and algorithms could be easily adapted to different types of clustering that could process big incomplete longitudinal trial data in eHealth services. PMID:27126063

  18. Quality Evaluation and Chemical Markers Screening of Salvia miltiorrhiza Bge. (Danshen) Based on HPLC Fingerprints and HPLC-MSn Coupled with Chemometrics.

    PubMed

    Liang, Wenyi; Chen, Wenjing; Wu, Lingfang; Li, Shi; Qi, Qi; Cui, Yaping; Liang, Linjin; Ye, Ting; Zhang, Lanzhen

    2017-03-17

    Danshen, the dried root of Salvia miltiorrhiza Bge., is a widely used commercially available herbal drug, and unstable quality of different samples is a current issue. This study focused on a comprehensive and systematic method combining fingerprints and chemical identification with chemometrics for discrimination and quality assessment of Danshen samples. Twenty-five samples were analyzed by HPLC-PAD and HPLC-MS n . Forty-nine components were identified and characteristic fragmentation regularities were summarized for further interpretation of bioactive components. Chemometric analysis was employed to differentiate samples and clarify the quality differences of Danshen including hierarchical cluster analysis, principal component analysis, and partial least squares discriminant analysis. Consistent results were that the samples were divided into three categories which reflected the difference in quality of Danshen samples. By analyzing the reasons for sample classification, it was revealed that the processing method had a more obvious impact on sample classification than the geographical origin, it induced the different content of bioactive compounds and finally lead to different qualities. Cryptotanshinone, trijuganone B, and 15,16-dihydrotanshinone I were screened out as markers to distinguish samples by different processing methods. The developed strategy could provide a reference for evaluation and discrimination of other traditional herbal medicines.

  19. An Updated Review of Meat Authenticity Methods and Applications.

    PubMed

    Vlachos, Antonios; Arvanitoyannis, Ioannis S; Tserkezou, Persefoni

    2016-05-18

    Adulteration of foods is a serious economic problem concerning most foodstuffs, and in particular meat products. Since high-priced meat demand premium prices, producers of meat-based products might be tempted to blend these products with lower cost meat. Moreover, the labeled meat contents may not be met. Both types of adulteration are difficult to detect and lead to deterioration of product quality. For the consumer, it is of outmost importance to guarantee both authenticity and compliance with product labeling. The purpose of this article is to review the state of the art of meat authenticity with analytical and immunochemical methods with the focus on the issue of geographic origin and sensory characteristics. This review is also intended to provide an overview of the various currently applied statistical analyses (multivariate analysis (MAV), such as principal component analysis, discriminant analysis, cluster analysis, etc.) and their effectiveness for meat authenticity.

  20. Application of chemometric analysis based on physicochemical and chromatographic data for the differentiation origin of plant protection products containing chlorpyrifos.

    PubMed

    Miszczyk, Marek; Płonka, Marlena; Bober, Katarzyna; Dołowy, Małgorzata; Pyka, Alina; Pszczolińska, Klaudia

    2015-01-01

    The aim of this study was to investigate the similarities and dissimilarities between the pesticide samples in form of emulsifiable concentrates (EC) formulation containing chlorpyrifos as active ingredient coming from different sources (i.e., shops and wholesales) and also belonging to various series. The results obtained by the Headspace Gas Chromatography-Mass Spectrometry method and also some selected physicochemical properties of examined pesticides including pH, density, stability, active ingredient and water content in pesticides tested were compared using two chemometric methods. Applicability of simple cluster analysis and also principal component analysis of obtained data in differentiation of examined plant protection products coming from different sources was confirmed. It would be advantageous in the routine control of originality and also in the detection of counterfeit pesticides, respectively, among commercially available pesticides containing chlorpyrifos as an active ingredient.

  1. A Simple and Effective Mass Spectrometric Approach to Identify the Adulteration of the Mediterranean Diet Component Extra-Virgin Olive Oil with Corn Oil.

    PubMed

    Di Girolamo, Francesco; Masotti, Andrea; Lante, Isabella; Scapaticci, Margherita; Calvano, Cosima Damiana; Zambonin, Carlo; Muraca, Maurizio; Putignani, Lorenza

    2015-09-01

    Extra virgin olive oil (EVOO) with its nutraceutical characteristics substantially contributes as a major nutrient to the health benefit of the Mediterranean diet. Unfortunately, the adulteration of EVOO with less expensive oils (e.g., peanut and corn oils), has become one of the biggest source of agricultural fraud in the European Union, with important health implications for consumers, mainly due to the introduction of seed oil-derived allergens causing, especially in children, severe food allergy phenomena. In this regard, revealing adulterations of EVOO is of fundamental importance for health care and prevention reasons, especially in children. To this aim, effective analytical methods to assess EVOO purity are necessary. Here, we propose a simple, rapid, robust and very sensitive method for non-specialized mass spectrometric laboratory, based on the matrix-assisted laser desorption/ionization mass spectrometry (MALDI-TOF MS) coupled to unsupervised hierarchical clustering (UHC), principal component (PCA) and Pearson's correlation analyses, to reveal corn oil (CO) adulterations in EVOO at very low levels (down to 0.5%).

  2. A Simple and Effective Mass Spectrometric Approach to Identify the Adulteration of the Mediterranean Diet Component Extra-Virgin Olive Oil with Corn Oil

    PubMed Central

    Di Girolamo, Francesco; Masotti, Andrea; Lante, Isabella; Scapaticci, Margherita; Calvano, Cosima Damiana; Zambonin, Carlo; Muraca, Maurizio; Putignani, Lorenza

    2015-01-01

    Extra virgin olive oil (EVOO) with its nutraceutical characteristics substantially contributes as a major nutrient to the health benefit of the Mediterranean diet. Unfortunately, the adulteration of EVOO with less expensive oils (e.g., peanut and corn oils), has become one of the biggest source of agricultural fraud in the European Union, with important health implications for consumers, mainly due to the introduction of seed oil-derived allergens causing, especially in children, severe food allergy phenomena. In this regard, revealing adulterations of EVOO is of fundamental importance for health care and prevention reasons, especially in children. To this aim, effective analytical methods to assess EVOO purity are necessary. Here, we propose a simple, rapid, robust and very sensitive method for non-specialized mass spectrometric laboratory, based on the matrix-assisted laser desorption/ionization mass spectrometry (MALDI-TOF MS) coupled to unsupervised hierarchical clustering (UHC), principal component (PCA) and Pearson’s correlation analyses, to reveal corn oil (CO) adulterations in EVOO at very low levels (down to 0.5%). PMID:26340625

  3. Clustering of metabolic and cardiovascular risk factors in the polycystic ovary syndrome: a principal component analysis.

    PubMed

    Stuckey, Bronwyn G A; Opie, Nicole; Cussons, Andrea J; Watts, Gerald F; Burke, Valerie

    2014-08-01

    Polycystic ovary syndrome (PCOS) is a prevalent condition with heterogeneity of clinical features and cardiovascular risk factors that implies multiple aetiological factors and possible outcomes. To reduce a set of correlated variables to a smaller number of uncorrelated and interpretable factors that may delineate subgroups within PCOS or suggest pathogenetic mechanisms. We used principal component analysis (PCA) to examine the endocrine and cardiometabolic variables associated with PCOS defined by the National Institutes of Health (NIH) criteria. Data were retrieved from the database of a single clinical endocrinologist. We included women with PCOS (N = 378) who were not taking the oral contraceptive pill or other sex hormones, lipid lowering medication, metformin or other medication that could influence the variables of interest. PCA was performed retaining those factors with eigenvalues of at least 1.0. Varimax rotation was used to produce interpretable factors. We identified three principal components. In component 1, the dominant variables were homeostatic model assessment (HOMA) index, body mass index (BMI), high density lipoprotein (HDL) cholesterol and sex hormone binding globulin (SHBG); in component 2, systolic blood pressure, low density lipoprotein (LDL) cholesterol and triglycerides; in component 3, total testosterone and LH/FSH ratio. These components explained 37%, 13% and 11% of the variance in the PCOS cohort respectively. Multiple correlated variables from patients with PCOS can be reduced to three uncorrelated components characterised by insulin resistance, dyslipidaemia/hypertension or hyperandrogenaemia. Clustering of risk factors is consistent with different pathogenetic pathways within PCOS and/or differing cardiometabolic outcomes. Copyright © 2014 Elsevier Inc. All rights reserved.

  4. Support vector machine based classification of fast Fourier transform spectroscopy of proteins

    NASA Astrophysics Data System (ADS)

    Lazarevic, Aleksandar; Pokrajac, Dragoljub; Marcano, Aristides; Melikechi, Noureddine

    2009-02-01

    Fast Fourier transform spectroscopy has proved to be a powerful method for study of the secondary structure of proteins since peak positions and their relative amplitude are affected by the number of hydrogen bridges that sustain this secondary structure. However, to our best knowledge, the method has not been used yet for identification of proteins within a complex matrix like a blood sample. The principal reason is the apparent similarity of protein infrared spectra with actual differences usually masked by the solvent contribution and other interactions. In this paper, we propose a novel machine learning based method that uses protein spectra for classification and identification of such proteins within a given sample. The proposed method uses principal component analysis (PCA) to identify most important linear combinations of original spectral components and then employs support vector machine (SVM) classification model applied on such identified combinations to categorize proteins into one of given groups. Our experiments have been performed on the set of four different proteins, namely: Bovine Serum Albumin, Leptin, Insulin-like Growth Factor 2 and Osteopontin. Our proposed method of applying principal component analysis along with support vector machines exhibits excellent classification accuracy when identifying proteins using their infrared spectra.

  5. Antioxidant capacity of cornelian cherry (Cornus mas L.) - comparison between permanganate reducing antioxidant capacity and other antioxidant methods.

    PubMed

    Popović, Boris M; Stajner, Dubravka; Slavko, Kevrešan; Sandra, Bijelić

    2012-09-15

    Ethanol extracts (80% in water) of 10 cornelian cherry (Cornus mas L.) genotypes were studied for antioxidant properties, using methods including DPPH(), ()NO, O(2)(-) and ()OH antiradical powers, FRAP, total phenolic and anthocyanin content (TPC and ACC) and also one relatively new, permanganate method (permanganate reducing antioxidant capacity-PRAC). Lipid peroxidation (LP) was also determined as an indicator of oxidative stress. The data from different procedures were compared and analysed by multivariate techniques (correlation matrix calculation and principal component analysis (PCA)). Significant positive correlations were obtained between TPC, ACC and DPPH(), ()NO, O(2)(-), and ()OH antiradical powers, and also between PRAC and TPC, ACC and FRAP. PCA found two major clusters of cornelian cherry, based on antiradical power, FRAP and PRAC and also on chemical composition. Chemometric evaluation showed close interdependence between PRAC method and FRAP and ACC. There was a huge variation between C. mas genotypes in terms of antioxidant activity. Copyright © 2012 Elsevier Ltd. All rights reserved.

  6. Spatial and temporal characterizations of water quality in Kuwait Bay.

    PubMed

    Al-Mutairi, N; Abahussain, A; El-Battay, A

    2014-06-15

    The spatial and temporal patterns of water quality in Kuwait Bay have been investigated using data from six stations between 2009 and 2011. The results showed that most of water quality parameters such as phosphorus (PO4), nitrate (NO3), dissolved oxygen (DO), and Total Suspended Solids (TSS) fluctuated over time and space. Based on Water Quality Index (WQI) data, six stations were significantly clustered into two main classes using cluster analysis, one group located in western side of the Bay, and other in eastern side. Three principal components are responsible for water quality variations in the Bay. The first component included DO and pH. The second included PO4, TSS and NO3, and the last component contained seawater temperature and turbidity. The spatial and temporal patterns of water quality in Kuwait Bay are mainly controlled by seasonal variations and discharges from point sources of pollution along Kuwait Bay's coast as well as from Shatt Al-Arab River. Copyright © 2014 Elsevier Ltd. All rights reserved.

  7. Variation of heavy metals in recent sediments from Piratininga Lagoon (Brazil): interpretation of geochemical data with the aid of multivariate analysis

    NASA Astrophysics Data System (ADS)

    Huang, W.; Campredon, R.; Abrao, J. J.; Bernat, M.; Latouche, C.

    1994-06-01

    In the last decade, the Atlantic coast of south-eastern Brazil has been affected by increasing deforestation and anthropogenic effluents. Sediments in the coastal lagoons have recorded the process of such environmental change. Thirty-seven sediment samples from three cores in Piratininga Lagoon, Rio de Janeiro, were analyzed for their major components and minor element concentrations in order to examine geochemical characteristics and the depositional environment and to investigate the variation of heavy metals of environmental concern. Two multivariate analysis methods, principal component analysis and cluster analysis, were performed on the analytical data set to help visualize the sample clusters and the element associations. On the whole, the sediment samples from each core are similar and the sample clusters corresponding to the three cores are clearly separated, as a result of the different conditions of sedimentation. Some changes in the depositional environment are recognized using the results of multivariate analysis. The enrichment of Pb, Cu, and Zn in the upper parts of cores is in agreement with increasing anthropogenic influx (pollution).

  8. Using Structural Equation Modeling To Fit Models Incorporating Principal Components.

    ERIC Educational Resources Information Center

    Dolan, Conor; Bechger, Timo; Molenaar, Peter

    1999-01-01

    Considers models incorporating principal components from the perspectives of structural-equation modeling. These models include the following: (1) the principal-component analysis of patterned matrices; (2) multiple analysis of variance based on principal components; and (3) multigroup principal-components analysis. Discusses fitting these models…

  9. Multivariate analysis of the volatile components in tobacco based on infrared-assisted extraction coupled to headspace solid-phase microextraction and gas chromatography-mass spectrometry.

    PubMed

    Yang, Yanqin; Pan, Yuanjiang; Zhou, Guojun; Chu, Guohai; Jiang, Jian; Yuan, Kailong; Xia, Qian; Cheng, Changhe

    2016-11-01

    A novel infrared-assisted extraction coupled to headspace solid-phase microextraction followed by gas chromatography with mass spectrometry method has been developed for the rapid determination of the volatile components in tobacco. The optimal extraction conditions for maximizing the extraction efficiency were as follows: 65 μm polydimethylsiloxane-divinylbenzene fiber, extraction time of 20 min, infrared power of 175 W, and distance between the infrared lamp and the headspace vial of 2 cm. Under the optimum conditions, 50 components were found to exist in all ten tobacco samples from different geographical origins. Compared with conventional water-bath heating and nonheating extraction methods, the extraction efficiency of infrared-assisted extraction was greatly improved. Furthermore, multivariate analysis including principal component analysis, hierarchical cluster analysis, and similarity analysis were performed to evaluate the chemical information of these samples and divided them into three classifications, including rich, moderate, and fresh flavors. The above-mentioned classification results were consistent with the sensory evaluation, which was pivotal and meaningful for tobacco discrimination. As a simple, fast, cost-effective, and highly efficient method, the infrared-assisted extraction coupled to headspace solid-phase microextraction technique is powerful and promising for distinguishing the geographical origins of the tobacco samples coupled to suitable chemometrics. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  10. Broadband terahertz time-domain spectroscopy of drugs-of-abuse and the use of principal component analysis.

    PubMed

    Burnett, Andrew D; Fan, Wenhui; Upadhya, Prashanth C; Cunningham, John E; Hargreaves, Michael D; Munshi, Tasnim; Edwards, Howell G M; Linfield, Edmund H; Davies, A Giles

    2009-08-01

    Terahertz frequency time-domain spectroscopy has been used to analyse a wide range of samples containing cocaine hydrochloride, heroin and ecstasy--common drugs-of-abuse. We investigated real-world samples seized by law enforcement agencies, together with pure drugs-of-abuse, and pure drugs-of-abuse systematically adulterated in the laboratory to emulate real-world samples. In order to investigate the feasibility of automatic spectral recognition of such illicit materials by terahertz spectroscopy, principal component analysis was employed to cluster spectra of similar compounds.

  11. The Technical and Biological Reproducibility of Matrix-Assisted Laser Desorption Ionization-Time of Flight Mass Spectrometry (MALDI-TOF MS) Based Typing: Employment of Bioinformatics in a Multicenter Study

    PubMed Central

    Oberle, Michael; Wohlwend, Nadia; Jonas, Daniel; Maurer, Florian P.; Jost, Geraldine; Tschudin-Sutter, Sarah; Vranckx, Katleen; Egli, Adrian

    2016-01-01

    Background The technical, biological, and inter-center reproducibility of matrix-assisted laser desorption ionization-time of flight mass spectrometry (MALDI TOF MS) typing data has not yet been explored. The aim of this study is to compare typing data from multiple centers employing bioinformatics using bacterial strains from two past outbreaks and non-related strains. Material/Methods Participants received twelve extended spectrum betalactamase-producing E. coli isolates and followed the same standard operating procedure (SOP) including a full-protein extraction protocol. All laboratories provided visually read spectra via flexAnalysis (Bruker, Germany). Raw data from each laboratory allowed calculating the technical and biological reproducibility between centers using BioNumerics (Applied Maths NV, Belgium). Results Technical and biological reproducibility ranged between 96.8–99.4% and 47.6–94.4%, respectively. The inter-center reproducibility showed a comparable clustering among identical isolates. Principal component analysis indicated a higher tendency to cluster within the same center. Therefore, we used a discriminant analysis, which completely separated the clusters. Next, we defined a reference center and performed a statistical analysis to identify specific peaks to identify the outbreak clusters. Finally, we used a classifier algorithm and a linear support vector machine on the determined peaks as classifier. A validation showed that within the set of the reference center, the identification of the cluster was 100% correct with a large contrast between the score with the correct cluster and the next best scoring cluster. Conclusions Based on the sufficient technical and biological reproducibility of MALDI-TOF MS based spectra, detection of specific clusters is possible from spectra obtained from different centers. However, we believe that a shared SOP and a bioinformatics approach are required to make the analysis robust and reliable. PMID:27798637

  12. Principal Component Analysis of Cerebellar Shape on MRI Separates SCA Types 2 and 6 into Two Archetypal Modes of Degeneration

    PubMed Central

    Jung, Brian C.; Choi, Soo I.; Du, Annie X.; Cuzzocreo, Jennifer L.; Geng, Zhuo Z.; Ying, Howard S.; Perlman, Susan L.; Toga, Arthur W.; Prince, Jerry L.

    2014-01-01

    Although “cerebellar ataxia” is often used in reference to a disease process, presumably there are different underlying pathogenetic mechanisms for different subtypes. Indeed, spinocerebellar ataxia (SCA) types 2 and 6 demonstrate complementary phenotypes, thus predicting a different anatomic pattern of degeneration. Here, we show that an unsupervised classification method, based on principal component analysis (PCA) of cerebellar shape characteristics, can be used to separate SCA2 and SCA6 into two classes, which may represent disease-specific archetypes. Patients with SCA2 (n=11) and SCA6 (n=7) were compared against controls (n=15) using PCA to classify cerebellar anatomic shape characteristics. Within the first three principal components, SCA2 and SCA6 differed from controls and from each other. In a secondary analysis, we studied five additional subjects and found that these patients were consistent with the previously defined archetypal clusters of clinical and anatomical characteristics. Secondary analysis of five subjects with related diagnoses showed that disease groups that were clinically and pathophysiologically similar also shared similar anatomic characteristics. Specifically, Archetype #1 consisted of SCA3 (n=1) and SCA2, suggesting that cerebellar syndromes accompanied by atrophy of the pons may be associated with a characteristic pattern of cerebellar neurodegeneration. In comparison, Archetype #2 was comprised of disease groups with pure cerebellar atrophy (episodic ataxia type 2 (n=1), idiopathic late-onset cerebellar ataxias (n=3), and SCA6). This suggests that cerebellar shape analysis could aid in discriminating between different pathologies. Our findings further suggest that magnetic resonance imaging is a promising imaging biomarker that could aid in the diagnosis and therapeutic management in patients with cerebellar syndromes. PMID:22258915

  13. Principal components derived from CSF inflammatory profiles predict outcome in survivors after severe traumatic brain injury.

    PubMed

    Kumar, Raj G; Rubin, Jonathan E; Berger, Rachel P; Kochanek, Patrick M; Wagner, Amy K

    2016-03-01

    Studies have characterized absolute levels of multiple inflammatory markers as significant risk factors for poor outcomes after traumatic brain injury (TBI). However, inflammatory marker concentrations are highly inter-related, and production of one may result in the production or regulation of another. Therefore, a more comprehensive characterization of the inflammatory response post-TBI should consider relative levels of markers in the inflammatory pathway. We used principal component analysis (PCA) as a dimension-reduction technique to characterize the sets of markers that contribute independently to variability in cerebrospinal (CSF) inflammatory profiles after TBI. Using PCA results, we defined groups (or clusters) of individuals (n=111) with similar patterns of acute CSF inflammation that were then evaluated in the context of outcome and other relevant CSF and serum biomarkers collected days 0-3 and 4-5 post-injury. We identified four significant principal components (PC1-PC4) for CSF inflammation from days 0-3, and PC1 accounted for the greatest (31%) percentage of variance. PC1 was characterized by relatively higher CSF sICAM-1, sFAS, IL-10, IL-6, sVCAM-1, IL-5, and IL-8 levels. Cluster analysis then defined two distinct clusters, such that individuals in cluster 1 had highly positive PC1 scores and relatively higher levels of CSF cortisol, progesterone, estradiol, testosterone, brain derived neurotrophic factor (BDNF), and S100b; this group also had higher serum cortisol and lower serum BDNF. Multinomial logistic regression analyses showed that individuals in cluster 1 had a 10.9 times increased likelihood of GOS scores of 2/3 vs. 4/5 at 6 months compared to cluster 2, after controlling for covariates. Cluster group did not discriminate between mortality compared to GOS scores of 4/5 after controlling for age and other covariates. Cluster groupings also did not discriminate mortality or 12 month outcomes in multivariate models. PCA and cluster analysis establish that a subset of CSF inflammatory markers measured in days 0-3 post-TBI may distinguish individuals with poor 6-month outcome, and future studies should prospectively validate these findings. PCA of inflammatory mediators after TBI could aid in prognostication and in identifying patient subgroups for therapeutic interventions. Copyright © 2015 Elsevier Inc. All rights reserved.

  14. Meteorology-induced variations in the spatial behavior of summer ozone pollution in Central California

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jin, Ling; Harley, Robert A.; Brown, Nancy J.

    Cluster analysis was applied to daily 8 h ozone maxima modeled for a summer season to characterize meteorology-induced variations in the spatial distribution of ozone. Principal component analysis is employed to form a reduced dimension set to describe and interpret ozone spatial patterns. The first three principal components (PCs) capture {approx}85% of total variance, with PC1 describing a general spatial trend, and PC2 and PC3 each describing a spatial contrast. Six clusters were identified for California's San Joaquin Valley (SJV) with two low, three moderate, and one high-ozone cluster. The moderate ozone clusters are distinguished by elevated ozone levels inmore » different parts of the valley: northern, western, and eastern, respectively. The SJV ozone clusters have stronger coupling with the San Francisco Bay area (SFB) than with the Sacramento Valley (SV). Variations in ozone spatial distributions induced by anthropogenic emission changes are small relative to the overall variations in ozone amomalies observed for the whole summer. Ozone regimes identified here are mostly determined by the direct and indirect meteorological effects. Existing measurement sites are sufficiently representative to capture ozone spatial patterns in the SFB and SV, but the western side of the SJV is under-sampled.« less

  15. Finger crease pattern recognition using Legendre moments and principal component analysis

    NASA Astrophysics Data System (ADS)

    Luo, Rongfang; Lin, Tusheng

    2007-03-01

    The finger joint lines defined as finger creases and its distribution can identify a person. In this paper, we propose a new finger crease pattern recognition method based on Legendre moments and principal component analysis (PCA). After obtaining the region of interest (ROI) for each finger image in the pre-processing stage, Legendre moments under Radon transform are applied to construct a moment feature matrix from the ROI, which greatly decreases the dimensionality of ROI and can represent principal components of the finger creases quite well. Then, an approach to finger crease pattern recognition is designed based on Karhunen-Loeve (K-L) transform. The method applies PCA to a moment feature matrix rather than the original image matrix to achieve the feature vector. The proposed method has been tested on a database of 824 images from 103 individuals using the nearest neighbor classifier. The accuracy up to 98.584% has been obtained when using 4 samples per class for training. The experimental results demonstrate that our proposed approach is feasible and effective in biometrics.

  16. Detection and discrimination of cotton foreign matter using push-broom based hyperspectral imaging: system design and capability.

    PubMed

    Jiang, Yu; Li, Changying

    2015-01-01

    Cotton quality, a major factor determining both cotton profitability and marketability, is affected by not only the overall quantity of but also the type of the foreign matter. Although current commercial instruments can measure the overall amount of the foreign matter, no instrument can differentiate various types of foreign matter. The goal of this study was to develop a hyperspectral imaging system to discriminate major types of foreign matter in cotton lint. A push-broom based hyperspectral imaging system with a custom-built multi-thread software was developed to acquire hyperspectral images of cotton fiber with 15 types of foreign matter commonly found in the U.S. cotton lint. A total of 450 (30 replicates for each foreign matter) foreign matter samples were cut into 1 by 1 cm2 pieces and imaged on the lint surface using reflectance mode in the spectral range from 400-1000 nm. The mean spectra of the foreign matter and lint were extracted from the user-defined region-of-interests in the hyperspectral images. The principal component analysis was performed on the mean spectra to reduce the feature dimension from the original 256 bands to the top 3 principal components. The score plots of the 3 principal components were used to examine clusterization patterns for classifying the foreign matter. These patterns were further validated by statistical tests. The experimental results showed that the mean spectra of all 15 types of cotton foreign matter were different from that of the lint. Nine types of cotton foreign matter formed distinct clusters in the score plots. Additionally, all of them were significantly different from each other at the significance level of 0.05 except brown leaf and bract. The developed hyperspectral imaging system is effective to detect and classify cotton foreign matter on the lint surface and has the potential to be implemented in commercial cotton classing offices.

  17. Recovery of a spectrum based on a compressive-sensing algorithm with weighted principal component analysis

    NASA Astrophysics Data System (ADS)

    Dafu, Shen; Leihong, Zhang; Dong, Liang; Bei, Li; Yi, Kang

    2017-07-01

    The purpose of this study is to improve the reconstruction precision and better copy the color of spectral image surfaces. A new spectral reflectance reconstruction algorithm based on an iterative threshold combined with weighted principal component space is presented in this paper, and the principal component with weighted visual features is the sparse basis. Different numbers of color cards are selected as the training samples, a multispectral image is the testing sample, and the color differences in the reconstructions are compared. The channel response value is obtained by a Mega Vision high-accuracy, multi-channel imaging system. The results show that spectral reconstruction based on weighted principal component space is superior in performance to that based on traditional principal component space. Therefore, the color difference obtained using the compressive-sensing algorithm with weighted principal component analysis is less than that obtained using the algorithm with traditional principal component analysis, and better reconstructed color consistency with human eye vision is achieved.

  18. Functional Interference Clusters in Cancer Patients With Bone Metastases: A Secondary Analysis of RTOG 9714

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chow, Edward, E-mail: Edward.Chow@sunnybrook.c; James, Jennifer; Barsevick, Andrea

    Purpose: To explore the relationships (clusters) among the functional interference items in the Brief Pain Inventory (BPI) in patients with bone metastases. Methods: Patients enrolled in the Radiation Therapy Oncology Group (RTOG) 9714 bone metastases study were eligible. Patients were assessed at baseline and 4, 8, and 12 weeks after randomization for the palliative radiotherapy with the BPI, which consists of seven functional items: general activity, mood, walking ability, normal work, relations with others, sleep, and enjoyment of life. Principal component analysis with varimax rotation was used to determine the clusters between the functional items at baseline and the follow-up.more » Cronbach's alpha was used to determine the consistency and reliability of each cluster at baseline and follow-up. Results: There were 448 male and 461 female patients, with a median age of 67 years. There were two functional interference clusters at baseline, which accounted for 71% of the total variance. The first cluster (physical interference) included normal work and walking ability, which accounted for 58% of the total variance. The second cluster (psychosocial interference) included relations with others and sleep, which accounted for 13% of the total variance. The Cronbach's alpha statistics were 0.83 and 0.80, respectively. The functional clusters changed at week 12 in responders but persisted through week 12 in nonresponders. Conclusion: Palliative radiotherapy is effective in reducing bone pain. Functional interference component clusters exist in patients treated for bone metastases. These clusters changed over time in this study, possibly attributable to treatment. Further research is needed to examine these effects.« less

  19. Wilderness ecology: virgin plant communities of the Boundary Waters Canoe Area.

    Treesearch

    Lewis F. Ohmann; Robert R. Ream

    1971-01-01

    Describes virgin plant communities in the Boundary Waters Canoe Area. Data from all vegetative components of 106 virgin upland stands were used to construct a community classification through a combination of agglomerative clustering and principal components analysis. Discusses the relation of communities to their environment and to past wildfires.

  20. Effect of sample stratification on dairy GWAS results

    PubMed Central

    2012-01-01

    Background Artificial insemination and genetic selection are major factors contributing to population stratification in dairy cattle. In this study, we analyzed the effect of sample stratification and the effect of stratification correction on results of a dairy genome-wide association study (GWAS). Three methods for stratification correction were used: the efficient mixed-model association expedited (EMMAX) method accounting for correlation among all individuals, a generalized least squares (GLS) method based on half-sib intraclass correlation, and a principal component analysis (PCA) approach. Results Historical pedigree data revealed that the 1,654 contemporary cows in the GWAS were all related when traced through approximately 10–15 generations of ancestors. Genome and phenotype stratifications had a striking overlap with the half-sib structure. A large elite half-sib family of cows contributed to the detection of favorable alleles that had low frequencies in the general population and high frequencies in the elite cows and contributed to the detection of X chromosome effects. All three methods for stratification correction reduced the number of significant effects. EMMAX method had the most severe reduction in the number of significant effects, and the PCA method using 20 principal components and GLS had similar significance levels. Removal of the elite cows from the analysis without using stratification correction removed many effects that were also removed by the three methods for stratification correction, indicating that stratification correction could have removed some true effects due to the elite cows. SNP effects with good consensus between different methods and effect size distributions from USDA’s Holstein genomic evaluation included the DGAT1-NIBP region of BTA14 for production traits, a SNP 45kb upstream from PIGY on BTA6 and two SNPs in NIBP on BTA14 for protein percentage. However, most of these consensus effects had similar frequencies in the elite and average cows. Conclusions Genetic selection and extensive use of artificial insemination contributed to overlapped genome, pedigree and phenotype stratifications. The presence of an elite cluster of cows was related to the detection of rare favorable alleles that had high frequencies in the elite cluster and low frequencies in the remaining cows. Methods for stratification correction could have removed some true effects associated with genetic selection. PMID:23039970

  1. Network-Based Detection and Classification of Seismovolcanic Tremors: Example From the Klyuchevskoy Volcanic Group in Kamchatka

    NASA Astrophysics Data System (ADS)

    Soubestre, Jean; Shapiro, Nikolai M.; Seydoux, Léonard; de Rosny, Julien; Droznin, Dmitry V.; Droznina, Svetlana Ya.; Senyukov, Sergey L.; Gordeev, Evgeniy I.

    2018-01-01

    We develop a network-based method for detecting and classifying seismovolcanic tremors. The proposed approach exploits the coherence of tremor signals across the network that is estimated from the array covariance matrix. The method is applied to four and a half years of continuous seismic data recorded by 19 permanent seismic stations in the vicinity of the Klyuchevskoy volcanic group in Kamchatka (Russia), where five volcanoes were erupting during the considered time period. We compute and analyze daily covariance matrices together with their eigenvalues and eigenvectors. As a first step, most coherent signals corresponding to dominating tremor sources are detected based on the width of the covariance matrix eigenvalues distribution. Thus, volcanic tremors of the two volcanoes known as most active during the considered period, Klyuchevskoy and Tolbachik, are efficiently detected. As a next step, we consider the daily array covariance matrix's first eigenvector. Our main hypothesis is that these eigenvectors represent the principal components of the daily seismic wavefield and, for days with tremor activity, characterize dominant tremor sources. Those daily first eigenvectors, which can be used as network-based fingerprints of tremor sources, are then grouped into clusters using correlation coefficient as a measure of the vector similarity. As a result, we identify seven clusters associated with different periods of activity of four volcanoes: Tolbachik, Klyuchevskoy, Shiveluch, and Kizimen. The developed method does not require a priori knowledge and is fully automatic; and the database of the network-based tremor fingerprints can be continuously enriched with newly available data.

  2. Analysis of cytokine release assay data using machine learning approaches.

    PubMed

    Xiong, Feiyu; Janko, Marco; Walker, Mindi; Makropoulos, Dorie; Weinstock, Daniel; Kam, Moshe; Hrebien, Leonid

    2014-10-01

    The possible onset of Cytokine Release Syndrome (CRS) is an important consideration in the development of monoclonal antibody (mAb) therapeutics. In this study, several machine learning approaches are used to analyze CRS data. The analyzed data come from a human blood in vitro assay which was used to assess the potential of mAb-based therapeutics to produce cytokine release similar to that induced by Anti-CD28 superagonistic (Anti-CD28 SA) mAbs. The data contain 7 mAbs and two negative controls, a total of 423 samples coming from 44 donors. Three (3) machine learning approaches were applied in combination to observations obtained from that assay, namely (i) Hierarchical Cluster Analysis (HCA); (ii) Principal Component Analysis (PCA) followed by K-means clustering; and (iii) Decision Tree Classification (DTC). All three approaches were able to identify the treatment that caused the most severe cytokine response. HCA was able to provide information about the expected number of clusters in the data. PCA coupled with K-means clustering allowed classification of treatments sample by sample, and visualizing clusters of treatments. DTC models showed the relative importance of various cytokines such as IFN-γ, TNF-α and IL-10 to CRS. The use of these approaches in tandem provides better selection of parameters for one method based on outcomes from another, and an overall improved analysis of the data through complementary approaches. Moreover, the DTC analysis showed in addition that IL-17 may be correlated with CRS reactions, although this correlation has not yet been corroborated in the literature. Copyright © 2014 Elsevier B.V. All rights reserved.

  3. Research on distributed heterogeneous data PCA algorithm based on cloud platform

    NASA Astrophysics Data System (ADS)

    Zhang, Jin; Huang, Gang

    2018-05-01

    Principal component analysis (PCA) of heterogeneous data sets can solve the problem that centralized data scalability is limited. In order to reduce the generation of intermediate data and error components of distributed heterogeneous data sets, a principal component analysis algorithm based on heterogeneous data sets under cloud platform is proposed. The algorithm performs eigenvalue processing by using Householder tridiagonalization and QR factorization to calculate the error component of the heterogeneous database associated with the public key to obtain the intermediate data set and the lost information. Experiments on distributed DBM heterogeneous datasets show that the model method has the feasibility and reliability in terms of execution time and accuracy.

  4. A Probabilistic Framework for Constructing Temporal Relations in Replica Exchange Molecular Trajectories.

    PubMed

    Chattopadhyay, Aditya; Zheng, Min; Waller, Mark Paul; Priyakumar, U Deva

    2018-05-23

    Knowledge of the structure and dynamics of biomolecules is essential for elucidating the underlying mechanisms of biological processes. Given the stochastic nature of many biological processes, like protein unfolding, it's almost impossible that two independent simulations will generate the exact same sequence of events, which makes direct analysis of simulations difficult. Statistical models like Markov Chains, transition networks etc. help in shedding some light on the mechanistic nature of such processes by predicting long-time dynamics of these systems from short simulations. However, such methods fall short in analyzing trajectories with partial or no temporal information, for example, replica exchange molecular dynamics or Monte Carlo simulations. In this work we propose a probabilistic algorithm, borrowing concepts from graph theory and machine learning, to extract reactive pathways from molecular trajectories in the absence of temporal data. A suitable vector representation was chosen to represent each frame in the macromolecular trajectory (as a series of interaction and conformational energies) and dimensionality reduction was performed using principal component analysis (PCA). The trajectory was then clustered using a density-based clustering algorithm, where each cluster represents a metastable state on the potential energy surface (PES) of the biomolecule under study. A graph was created with these clusters as nodes with the edges learnt using an iterative expectation maximization algorithm. The most reactive path is conceived as the widest path along this graph. We have tested our method on RNA hairpin unfolding trajectory in aqueous urea solution. Our method makes the understanding of the mechanism of unfolding in RNA hairpin molecule more tractable. As this method doesn't rely on temporal data it can be used to analyze trajectories from Monte Carlo sampling techniques and replica exchange molecular dynamics (REMD).

  5. A Nonlinear Model for Gene-Based Gene-Environment Interaction.

    PubMed

    Sa, Jian; Liu, Xu; He, Tao; Liu, Guifen; Cui, Yuehua

    2016-06-04

    A vast amount of literature has confirmed the role of gene-environment (G×E) interaction in the etiology of complex human diseases. Traditional methods are predominantly focused on the analysis of interaction between a single nucleotide polymorphism (SNP) and an environmental variable. Given that genes are the functional units, it is crucial to understand how gene effects (rather than single SNP effects) are influenced by an environmental variable to affect disease risk. Motivated by the increasing awareness of the power of gene-based association analysis over single variant based approach, in this work, we proposed a sparse principle component regression (sPCR) model to understand the gene-based G×E interaction effect on complex disease. We first extracted the sparse principal components for SNPs in a gene, then the effect of each principal component was modeled by a varying-coefficient (VC) model. The model can jointly model variants in a gene in which their effects are nonlinearly influenced by an environmental variable. In addition, the varying-coefficient sPCR (VC-sPCR) model has nice interpretation property since the sparsity on the principal component loadings can tell the relative importance of the corresponding SNPs in each component. We applied our method to a human birth weight dataset in Thai population. We analyzed 12,005 genes across 22 chromosomes and found one significant interaction effect using the Bonferroni correction method and one suggestive interaction. The model performance was further evaluated through simulation studies. Our model provides a system approach to evaluate gene-based G×E interaction.

  6. Impact of leather industries on fluoride dynamics in groundwater around a tannery cluster in South India.

    PubMed

    Sajil Kumar, P J

    2013-03-01

    The aim of this study was to investigate the controls of leather industries on fluoride contamination in and around a tannery cluster in Vaniyambadi. Hydrochemical analysis, mineral saturation indices and statistical methods were used to evaluate the intervening factors that controls the contamination processes. Fluoride in groundwater is exceeded the WHO guideline value (1.5 mg/L), in 62 % of the samples, mostly with Na-HCO3 and Na-Cl type of water. Results of the principal component analysis grouped Na, F, HCO3 and NO3 under component 1. This result was in agreement with the cross plot indicating high positive correlation between F and Na (r (2)  = 0.87), HCO3 (r (2)  = 0.84) and NO3 (r (2)  = 0.55). Fluorite (CaF2) and Halite (NaCl) was undersaturated, while calcite (CaCO3) was oversaturated for all the samples. This suggest more dissolution of F-rich minerals under the active supports of Na. Bivariate plots of Na versus Cl and Na + K versus HCO3 showed a combined origin of Na from tannery effluent as well as silicate weathering. Two major clusters, based on the Na, HCO3 and F concentration showed that groundwater is affected by tanneries and silicate weathering. Fluoride concentration in 38 % of samples (n = 5) have significantly affected by the high Na concentration from tanneries.

  7. Metal mixtures in urban and rural populations in the US: The Multi-Ethnic Study of Atherosclerosis and the Strong Heart Study

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pang, Yuanjie, E-mail: yuanjie.p@gmail.com

    Background: Natural and anthropogenic sources of metal exposure differ for urban and rural residents. We searched to identify patterns of metal mixtures which could suggest common environmental sources and/or metabolic pathways of different urinary metals, and compared metal-mixtures in two population-based studies from urban/sub-urban and rural/town areas in the US: the Multi-Ethnic Study of Atherosclerosis (MESA) and the Strong Heart Study (SHS). Methods: We studied a random sample of 308 White, Black, Chinese-American, and Hispanic participants in MESA (2000–2002) and 277 American Indian participants in SHS (1998–2003). We used principal component analysis (PCA), cluster analysis (CA), and linear discriminant analysismore » (LDA) to evaluate nine urinary metals (antimony [Sb], arsenic [As], cadmium [Cd], lead [Pb], molybdenum [Mo], selenium [Se], tungsten [W], uranium [U] and zinc [Zn]). For arsenic, we used the sum of inorganic and methylated species (∑As). Results: All nine urinary metals were higher in SHS compared to MESA participants. PCA and CA revealed the same patterns in SHS, suggesting 4 distinct principal components (PC) or clusters (∑As-U-W, Pb-Sb, Cd-Zn, Mo-Se). In MESA, CA showed 2 large clusters (∑As-Mo-Sb-U-W, Cd-Pb-Se-Zn), while PCA showed 4 PCs (Sb-U-W, Pb-Se-Zn, Cd-Mo, ∑As). LDA indicated that ∑As, U, W, and Zn were the most discriminant variables distinguishing MESA and SHS participants. Conclusions: In SHS, the ∑As-U-W cluster and PC might reflect groundwater contamination in rural areas, and the Cd-Zn cluster and PC could reflect common sources from meat products or metabolic interactions. Among the metals assayed, ∑As, U, W and Zn differed the most between MESA and SHS, possibly reflecting disproportionate exposure from drinking water and perhaps food in rural Native communities compared to urban communities around the US. - Highlights: • We identified and compared environmental sources of urinary metals in MESA and SHS. • ∑As-U-W in SHS may reflect groundwater contamination in rural areas. • Cd-Zn in SHS may reflect common sources from meat products or metabolic interaction. • ∑As, U, W, and Zn differed the most between MESA and SHS participants.« less

  8. Chemical indices and methods of multivariate statistics as a tool for odor classification.

    PubMed

    Mahlke, Ingo T; Thiesen, Peter H; Niemeyer, Bernd

    2007-04-01

    Industrial and agricultural off-gas streams are comprised of numerous volatile compounds, many of which have substantially different odorous properties. State-of-the-art waste-gas treatment includes the characterization of these molecules and is directed at, if possible, either the avoidance of such odorants during processing or the use of existing standardized air purification techniques like bioscrubbing or afterburning, which however, often show low efficiency under ecological and economical regards. Selective odor separation from the off-gas streams could ease many of these disadvantages but is not yet widely applicable. Thus, the aim of this paper is to identify possible model substances in selective odor separation research from 155 volatile molecules mainly originating from livestock facilities, fat refineries, and cocoa and coffee production by knowledge-based methods. All compounds are examined with regard to their structure and information-content using topological and information-theoretical indices. Resulting data are fitted in an observation matrix, and similarities between the substances are computed. Principal component analysis and k-means cluster analysis are conducted showing that clustering of indices data can depict odor information correlating well to molecular composition and molecular shape. Quantitative molecule describtion along with the application of such statistical means therefore provide a good classification tool of malodorant structure properties with no thermodynamic data needed. The approximate look-alike shape of odorous compounds within the clusters suggests a fair choice of possible model molecules.

  9. Advanced Treatment Monitoring for Olympic-Level Athletes Using Unsupervised Modeling Techniques

    PubMed Central

    Siedlik, Jacob A.; Bergeron, Charles; Cooper, Michael; Emmons, Russell; Moreau, William; Nabhan, Dustin; Gallagher, Philip; Vardiman, John P.

    2016-01-01

    Context Analysis of injury and illness data collected at large international competitions provides the US Olympic Committee and the national governing bodies for each sport with information to best prepare for future competitions. Research in which authors have evaluated medical contacts to provide the expected level of medical care and sports medicine services at international competitions is limited. Objective To analyze the medical-contact data for athletes, staff, and coaches who participated in the 2011 Pan American Games in Guadalajara, Mexico, using unsupervised modeling techniques to identify underlying treatment patterns. Design Descriptive epidemiology study. Setting Pan American Games. Patients or Other Participants A total of 618 US athletes (337 males, 281 females) participated in the 2011 Pan American Games. Main Outcome Measure(s) Medical data were recorded from the injury-evaluation and injury-treatment forms used by clinicians assigned to the central US Olympic Committee Sport Medicine Clinic and satellite locations during the operational 17-day period of the 2011 Pan American Games. We used principal components analysis and agglomerative clustering algorithms to identify and define grouped modalities. Lift statistics were calculated for within-cluster subgroups. Results Principal component analyses identified 3 components, accounting for 72.3% of the variability in datasets. Plots of the principal components showed that individual contacts focused on 4 treatment clusters: massage, paired manipulation and mobilization, soft tissue therapy, and general medical. Conclusions Unsupervised modeling techniques were useful for visualizing complex treatment data and provided insights for improved treatment modeling in athletes. Given its ability to detect clinically relevant treatment pairings in large datasets, unsupervised modeling should be considered a feasible option for future analyses of medical-contact data from international competitions. PMID:26794628

  10. Analysis of genetic diversity and population structure of oil palm (Elaeis guineensis) from China and Malaysia based on species-specific simple sequence repeat markers.

    PubMed

    Zhou, L X; Xiao, Y; Xia, W; Yang, Y D

    2015-12-08

    Genetic diversity and patterns of population structure of the 94 oil palm lines were investigated using species-specific simple sequence repeat (SSR) markers. We designed primers for 63 SSR loci based on their flanking sequences and conducted amplification in 94 oil palm DNA samples. The amplification result showed that a relatively high level of genetic diversity was observed between oil palm individuals according a set of 21 polymorphic microsatellite loci. The observed heterozygosity (Ho) was 0.3683 and 0.4035, with an average of 0.3859. The Ho value was a reliable determinant of the discriminatory power of the SSR primer combinations. The principal component analysis and unweighted pair-group method with arithmetic averaging cluster analysis showed the 94 oil palm lines were grouped into one cluster. These results demonstrated that the oil palm in Hainan Province of China and the germplasm introduced from Malaysia may be from the same source. The SSR protocol was effective and reliable for assessing the genetic diversity of oil palm. Knowledge of the genetic diversity and population structure will be crucial for establishing appropriate management stocks for this species.

  11. Application of principal component analysis to distinguish patients with schizophrenia from healthy controls based on fractional anisotropy measurements.

    PubMed

    Caprihan, A; Pearlson, G D; Calhoun, V D

    2008-08-15

    Principal component analysis (PCA) is often used to reduce the dimension of data before applying more sophisticated data analysis methods such as non-linear classification algorithms or independent component analysis. This practice is based on selecting components corresponding to the largest eigenvalues. If the ultimate goal is separation of data in two groups, then these set of components need not have the most discriminatory power. We measured the distance between two such populations using Mahalanobis distance and chose the eigenvectors to maximize it, a modified PCA method, which we call the discriminant PCA (DPCA). DPCA was applied to diffusion tensor-based fractional anisotropy images to distinguish age-matched schizophrenia subjects from healthy controls. The performance of the proposed method was evaluated by the one-leave-out method. We show that for this fractional anisotropy data set, the classification error with 60 components was close to the minimum error and that the Mahalanobis distance was twice as large with DPCA, than with PCA. Finally, by masking the discriminant function with the white matter tracts of the Johns Hopkins University atlas, we identified left superior longitudinal fasciculus as the tract which gave the least classification error. In addition, with six optimally chosen tracts the classification error was zero.

  12. Fourier Transform Infrared Spectroscopy (FTIR) and Multivariate Analysis for Identification of Different Vegetable Oils Used in Biodiesel Production

    PubMed Central

    Mueller, Daniela; Ferrão, Marco Flôres; Marder, Luciano; da Costa, Adilson Ben; de Cássia de Souza Schneider, Rosana

    2013-01-01

    The main objective of this study was to use infrared spectroscopy to identify vegetable oils used as raw material for biodiesel production and apply multivariate analysis to the data. Six different vegetable oil sources—canola, cotton, corn, palm, sunflower and soybeans—were used to produce biodiesel batches. The spectra were acquired by Fourier transform infrared spectroscopy using a universal attenuated total reflectance sensor (FTIR-UATR). For the multivariate analysis principal component analysis (PCA), hierarchical cluster analysis (HCA), interval principal component analysis (iPCA) and soft independent modeling of class analogy (SIMCA) were used. The results indicate that is possible to develop a methodology to identify vegetable oils used as raw material in the production of biodiesel by FTIR-UATR applying multivariate analysis. It was also observed that the iPCA found the best spectral range for separation of biodiesel batches using FTIR-UATR data, and with this result, the SIMCA method classified 100% of the soybean biodiesel samples. PMID:23539030

  13. Application of principal component analysis to ecodiversity assessment of postglacial landscape (on the example of Debnica Kaszubska commune, Middle Pomerania)

    NASA Astrophysics Data System (ADS)

    Wojciechowski, Adam

    2017-04-01

    In order to assess ecodiversity understood as a comprehensive natural landscape factor (Jedicke 2001), it is necessary to apply research methods which recognize the environment in a holistic way. Principal component analysis may be considered as one of such methods as it allows to distinguish the main factors determining landscape diversity on the one hand, and enables to discover regularities shaping the relationships between various elements of the environment under study on the other hand. The procedure adopted to assess ecodiversity with the use of principal component analysis involves: a) determining and selecting appropriate factors of the assessed environment qualities (hypsometric, geological, hydrographic, plant, and others); b) calculating the absolute value of individual qualities for the basic areas under analysis (e.g. river length, forest area, altitude differences, etc.); c) principal components analysis and obtaining factor maps (maps of selected components); d) generating a resultant, detailed map and isolating several classes of ecodiversity. An assessment of ecodiversity with the use of principal component analysis was conducted in the test area of 299,67 km2 in Debnica Kaszubska commune. The whole commune is situated in the Weichselian glaciation area of high hypsometric and morphological diversity as well as high geo- and biodiversity. The analysis was based on topographical maps of the commune area in scale 1:25000 and maps of forest habitats. Consequently, nine factors reflecting basic environment elements were calculated: maximum height (m), minimum height (m), average height (m), the length of watercourses (km), the area of water reservoirs (m2), total forest area (ha), coniferous forests habitats area (ha), deciduous forest habitats area (ha), alder habitats area (ha). The values for individual factors were analysed for 358 grid cells of 1 km2. Based on the principal components analysis, four major factors affecting commune ecodiversity were distinguished: hypsometric component (PC1), deciduous forest habitats component (PC2), river valleys and alder habitats component (PC3), and lakes component (PC4). The distinguished factors characterise natural qualities of postglacial area and reflect well the role of the four most important groups of environment components in shaping ecodiversity of the area under study. The map of ecodiversity of Debnica Kaszubska commune was created on the basis of the first four principal component scores and then five classes of diversity were isolated: very low, low, average, high and very high. As a result of the assessment, five commune regions of very high ecodiversity were separated. These regions are also very attractive for tourists and valuable in terms of their rich nature which include protected areas such as Slupia Valley Landscape Park. The suggested method of ecodiversity assessment with the use of principal component analysis may constitute an alternative methodological proposition to other research methods used so far. Literature Jedicke E., 2001. Biodiversität, Geodiversität, Ökodiversität. Kriterien zur Analyse der Landschaftsstruktur - ein konzeptioneller Diskussionsbeitrag. Naturschutz und Landschaftsplanung, 33(2/3), 59-68.

  14. Novel algorithm for simultaneous component detection and pseudo-molecular ion characterization in liquid chromatography-mass spectrometry.

    PubMed

    Zhang, Yufeng; Wang, Xiaoan; Wo, Siukwan; Ho, Hingman; Han, Quanbin; Fan, Xiaohui; Zuo, Zhong

    2015-01-01

    Resolving components and determining their pseudo-molecular ions (PMIs) are crucial steps in identifying complex herbal mixtures by liquid chromatography-mass spectrometry. To tackle such labor-intensive steps, we present here a novel algorithm for simultaneous detection of components and their PMIs. Our method consists of three steps: (1) obtaining a simplified dataset containing only mono-isotopic masses by removal of background noise and isotopic cluster ions based on the isotopic distribution model derived from all the reported natural compounds in dictionary of natural products; (2) stepwise resolving and removing all features of the highest abundant component from current simplified dataset and calculating PMI of each component according to an adduct-ion model, in which all non-fragment ions in a mass spectrum are considered as PMI plus one or several neutral species; (3) visual classification of detected components by principal component analysis (PCA) to exclude possible non-natural compounds (such as pharmaceutical excipients). This algorithm has been successfully applied to a standard mixture and three herbal extract/preparations. It indicated that our algorithm could detect components' features as a whole and report their PMI with an accuracy of more than 98%. Furthermore, components originated from excipients/contaminants could be easily separated from those natural components in the bi-plots of PCA. Copyright © 2014 Elsevier B.V. All rights reserved.

  15. Principal component analysis on a torus: Theory and application to protein dynamics.

    PubMed

    Sittel, Florian; Filk, Thomas; Stock, Gerhard

    2017-12-28

    A dimensionality reduction method for high-dimensional circular data is developed, which is based on a principal component analysis (PCA) of data points on a torus. Adopting a geometrical view of PCA, various distance measures on a torus are introduced and the associated problem of projecting data onto the principal subspaces is discussed. The main idea is that the (periodicity-induced) projection error can be minimized by transforming the data such that the maximal gap of the sampling is shifted to the periodic boundary. In a second step, the covariance matrix and its eigendecomposition can be computed in a standard manner. Adopting molecular dynamics simulations of two well-established biomolecular systems (Aib 9 and villin headpiece), the potential of the method to analyze the dynamics of backbone dihedral angles is demonstrated. The new approach allows for a robust and well-defined construction of metastable states and provides low-dimensional reaction coordinates that accurately describe the free energy landscape. Moreover, it offers a direct interpretation of covariances and principal components in terms of the angular variables. Apart from its application to PCA, the method of maximal gap shifting is general and can be applied to any other dimensionality reduction method for circular data.

  16. Principal component analysis on a torus: Theory and application to protein dynamics

    NASA Astrophysics Data System (ADS)

    Sittel, Florian; Filk, Thomas; Stock, Gerhard

    2017-12-01

    A dimensionality reduction method for high-dimensional circular data is developed, which is based on a principal component analysis (PCA) of data points on a torus. Adopting a geometrical view of PCA, various distance measures on a torus are introduced and the associated problem of projecting data onto the principal subspaces is discussed. The main idea is that the (periodicity-induced) projection error can be minimized by transforming the data such that the maximal gap of the sampling is shifted to the periodic boundary. In a second step, the covariance matrix and its eigendecomposition can be computed in a standard manner. Adopting molecular dynamics simulations of two well-established biomolecular systems (Aib9 and villin headpiece), the potential of the method to analyze the dynamics of backbone dihedral angles is demonstrated. The new approach allows for a robust and well-defined construction of metastable states and provides low-dimensional reaction coordinates that accurately describe the free energy landscape. Moreover, it offers a direct interpretation of covariances and principal components in terms of the angular variables. Apart from its application to PCA, the method of maximal gap shifting is general and can be applied to any other dimensionality reduction method for circular data.

  17. Damage detection methodology under variable load conditions based on strain field pattern recognition using FBGs, nonlinear principal component analysis, and clustering techniques

    NASA Astrophysics Data System (ADS)

    Sierra-Pérez, Julián; Torres-Arredondo, M.-A.; Alvarez-Montoya, Joham

    2018-01-01

    Structural health monitoring consists of using sensors integrated within structures together with algorithms to perform load monitoring, damage detection, damage location, damage size and severity, and prognosis. One possibility is to use strain sensors to infer structural integrity by comparing patterns in the strain field between the pristine and damaged conditions. In previous works, the authors have demonstrated that it is possible to detect small defects based on strain field pattern recognition by using robust machine learning techniques. They have focused on methodologies based on principal component analysis (PCA) and on the development of several unfolding and standardization techniques, which allow dealing with multiple load conditions. However, before a real implementation of this approach in engineering structures, changes in the strain field due to conditions different from damage occurrence need to be isolated. Since load conditions may vary in most engineering structures and promote significant changes in the strain field, it is necessary to implement novel techniques for uncoupling such changes from those produced by damage occurrence. A damage detection methodology based on optimal baseline selection (OBS) by means of clustering techniques is presented. The methodology includes the use of hierarchical nonlinear PCA as a nonlinear modeling technique in conjunction with Q and nonlinear-T 2 damage indices. The methodology is experimentally validated using strain measurements obtained by 32 fiber Bragg grating sensors bonded to an aluminum beam under dynamic bending loads and simultaneously submitted to variations in its pitch angle. The results demonstrated the capability of the methodology for clustering data according to 13 different load conditions (pitch angles), performing the OBS and detecting six different damages induced in a cumulative way. The proposed methodology showed a true positive rate of 100% and a false positive rate of 1.28% for a 99% of confidence.

  18. Stability of operational taxonomic units: an important but neglected property for analyzing microbial diversity.

    PubMed

    He, Yan; Caporaso, J Gregory; Jiang, Xiao-Tao; Sheng, Hua-Fang; Huse, Susan M; Rideout, Jai Ram; Edgar, Robert C; Kopylova, Evguenia; Walters, William A; Knight, Rob; Zhou, Hong-Wei

    2015-01-01

    The operational taxonomic unit (OTU) is widely used in microbial ecology. Reproducibility in microbial ecology research depends on the reliability of OTU-based 16S ribosomal subunit RNA (rRNA) analyses. Here, we report that many hierarchical and greedy clustering methods produce unstable OTUs, with membership that depends on the number of sequences clustered. If OTUs are regenerated with additional sequences or samples, sequences originally assigned to a given OTU can be split into different OTUs. Alternatively, sequences assigned to different OTUs can be merged into a single OTU. This OTU instability affects alpha-diversity analyses such as rarefaction curves, beta-diversity analyses such as distance-based ordination (for example, Principal Coordinate Analysis (PCoA)), and the identification of differentially represented OTUs. Our results show that the proportion of unstable OTUs varies for different clustering methods. We found that the closed-reference method is the only one that produces completely stable OTUs, with the caveat that sequences that do not match a pre-existing reference sequence collection are discarded. As a compromise to the factors listed above, we propose using an open-reference method to enhance OTU stability. This type of method clusters sequences against a database and includes unmatched sequences by clustering them via a relatively stable de novo clustering method. OTU stability is an important consideration when analyzing microbial diversity and is a feature that should be taken into account during the development of novel OTU clustering methods.

  19. Finding Planets in K2: A New Method of Cleaning the Data

    NASA Astrophysics Data System (ADS)

    Currie, Miles; Mullally, Fergal; Thompson, Susan E.

    2017-01-01

    We present a new method of removing systematic flux variations from K2 light curves by employing a pixel-level principal component analysis (PCA). This method decomposes the light curves into its principal components (eigenvectors), each with an associated eigenvalue, the value of which is correlated to how much influence the basis vector has on the shape of the light curve. This method assumes that the most influential basis vectors will correspond to the unwanted systematic variations in the light curve produced by K2’s constant motion. We correct the raw light curve by automatically fitting and removing the strongest principal components. The strongest principal components generally correspond to the flux variations that result from the motion of the star in the field of view. Our primary method of calculating the strongest principal components to correct for in the raw light curve estimates the noise by measuring the scatter in the light curve after using an algorithm for Savitsy-Golay detrending, which computes the combined photometric precision value (SG-CDPP value) used in classic Kepler. We calculate this value after correcting the raw light curve for each element in a list of cumulative sums of principal components so that we have as many noise estimate values as there are principal components. We then take the derivative of the list of SG-CDPP values and take the number of principal components that correlates to the point at which the derivative effectively goes to zero. This is the optimal number of principal components to exclude from the refitting of the light curve. We find that a pixel-level PCA is sufficient for cleaning unwanted systematic and natural noise from K2’s light curves. We present preliminary results and a basic comparison to other methods of reducing the noise from the flux variations.

  20. Chemical modeling of groundwater in the Banat Plain, southwestern Romania, with elevated As content and co-occurring species by combining diagrams and unsupervised multivariate statistical approaches.

    PubMed

    Butaciu, Sinziana; Senila, Marin; Sarbu, Costel; Ponta, Michaela; Tanaselia, Claudiu; Cadar, Oana; Roman, Marius; Radu, Emil; Sima, Mihaela; Frentiu, Tiberiu

    2017-04-01

    The study proposes a combined model based on diagrams (Gibbs, Piper, Stuyfzand Hydrogeochemical Classification System) and unsupervised statistical approaches (Cluster Analysis, Principal Component Analysis, Fuzzy Principal Component Analysis, Fuzzy Hierarchical Cross-Clustering) to describe natural enrichment of inorganic arsenic and co-occurring species in groundwater in the Banat Plain, southwestern Romania. Speciation of inorganic As (arsenite, arsenate), ion concentrations (Na + , K + , Ca 2+ , Mg 2+ , HCO 3 - , Cl - , F - , SO 4 2- , PO 4 3- , NO 3 - ), pH, redox potential, conductivity and total dissolved substances were performed. Classical diagrams provided the hydrochemical characterization, while statistical approaches were helpful to establish (i) the mechanism of naturally occurring of As and F - species and the anthropogenic one for NO 3 - , SO 4 2- , PO 4 3- and K + and (ii) classification of groundwater based on content of arsenic species. The HCO 3 - type of local groundwater and alkaline pH (8.31-8.49) were found to be responsible for the enrichment of arsenic species and occurrence of F - but by different paths. The PO 4 3- -AsO 4 3- ion exchange, water-rock interaction (silicates hydrolysis and desorption from clay) were associated to arsenate enrichment in the oxidizing aquifer. Fuzzy Hierarchical Cross-Clustering was the strongest tool for the rapid simultaneous classification of groundwaters as a function of arsenic content and hydrogeochemical characteristics. The approach indicated the Na + -F - -pH cluster as marker for groundwater with naturally elevated As and highlighted which parameters need to be monitored. A chemical conceptual model illustrating the natural and anthropogenic paths and enrichment of As and co-occurring species in the local groundwater supported by mineralogical analysis of rocks was established. Copyright © 2016 Elsevier Ltd. All rights reserved.

  1. Dynamic of consumer groups and response of commodity markets by principal component analysis

    NASA Astrophysics Data System (ADS)

    Nobi, Ashadun; Alam, Shafiqul; Lee, Jae Woo

    2017-09-01

    This study investigates financial states and group dynamics by applying principal component analysis to the cross-correlation coefficients of the daily returns of commodity futures. The eigenvalues of the cross-correlation matrix in the 6-month timeframe displays similar values during 2010-2011, but decline following 2012. A sharp drop in eigenvalue implies the significant change of the market state. Three commodity sectors, energy, metals and agriculture, are projected into two dimensional spaces consisting of two principal components (PC). We observe that they form three distinct clusters in relation to various sectors. However, commodities with distinct features have intermingled with one another and scattered during severe crises, such as the European sovereign debt crises. We observe the notable change of the position of two dimensional spaces of groups during financial crises. By considering the first principal component (PC1) within the 6-month moving timeframe, we observe that commodities of the same group change states in a similar pattern, and the change of states of one group can be used as a warning for other group.

  2. Subgroups Among Opiate Addicts

    ERIC Educational Resources Information Center

    Berzins, Juris I.; And Others

    1974-01-01

    The principal objective of the present investigation was to delineate homogeneous MMPI profile subgroups (types) through multivariate clustering procedures and to compare the derived (replicable) types on measures of the components of "sociopathy" as well as on other psychometric devices. (Author)

  3. Source Evaluation and Trace Metal Contamination in Benthic Sediments from Equatorial Ecosystems Using Multivariate Statistical Techniques

    PubMed Central

    Benson, Nsikak U.; Asuquo, Francis E.; Williams, Akan B.; Essien, Joseph P.; Ekong, Cyril I.; Akpabio, Otobong; Olajire, Abaas A.

    2016-01-01

    Trace metals (Cd, Cr, Cu, Ni and Pb) concentrations in benthic sediments were analyzed through multi-step fractionation scheme to assess the levels and sources of contamination in estuarine, riverine and freshwater ecosystems in Niger Delta (Nigeria). The degree of contamination was assessed using the individual contamination factors (ICF) and global contamination factor (GCF). Multivariate statistical approaches including principal component analysis (PCA), cluster analysis and correlation test were employed to evaluate the interrelationships and associated sources of contamination. The spatial distribution of metal concentrations followed the pattern Pb>Cu>Cr>Cd>Ni. Ecological risk index by ICF showed significant potential mobility and bioavailability for Cu, Cu and Ni. The ICF contamination trend in the benthic sediments at all studied sites was Cu>Cr>Ni>Cd>Pb. The principal component and agglomerative clustering analyses indicate that trace metals contamination in the ecosystems was influenced by multiple pollution sources. PMID:27257934

  4. Absorption spectroscopy and multi-angle scattering measurements in the visible spectral range for the geographic classification of Italian exravirgin olive oils

    NASA Astrophysics Data System (ADS)

    Mignani, Anna G.; Ciaccheri, Leonardo; Cimato, Antonio; Sani, Graziano; Smith, Peter R.

    2004-03-01

    Absorption spectroscopy and multi-angle scattering measurements in the visible spectral range are innovately used to analyze samples of extra virgin olive oils coming from selected areas of Tuscany, a famous Italian region for the production of extra virgin olive oil. The measured spectra are processed by means of the Principal Component Analysis method, so as to create a 3D map capable of clustering the Tuscan oils within the wider area of Italian extra virgin olive oils.

  5. Distributions of experimental protein structures on coarse-grained free energy landscapes

    PubMed Central

    Liu, Jie; Jernigan, Robert L.

    2015-01-01

    Predicting conformational changes of proteins is needed in order to fully comprehend functional mechanisms. With the large number of available structures in sets of related proteins, it is now possible to directly visualize the clusters of conformations and their conformational transitions through the use of principal component analysis. The most striking observation about the distributions of the structures along the principal components is their highly non-uniform distributions. In this work, we use principal component analysis of experimental structures of 50 diverse proteins to extract the most important directions of their motions, sample structures along these directions, and estimate their free energy landscapes by combining knowledge-based potentials and entropy computed from elastic network models. When these resulting motions are visualized upon their coarse-grained free energy landscapes, the basis for conformational pathways becomes readily apparent. Using three well-studied proteins, T4 lysozyme, serum albumin, and sarco-endoplasmic reticular Ca2+ adenosine triphosphatase (SERCA), as examples, we show that such free energy landscapes of conformational changes provide meaningful insights into the functional dynamics and suggest transition pathways between different conformational states. As a further example, we also show that Monte Carlo simulations on the coarse-grained landscape of HIV-1 protease can directly yield pathways for force-driven conformational changes. PMID:26723638

  6. Interpretation of data on the aggregate composition of typical chernozems under different land use by cluster and principal component analyses

    NASA Astrophysics Data System (ADS)

    Kholodov, V. A.; Yaroslavtseva, N. V.; Lazarev, V. I.; Frid, A. S.

    2016-09-01

    Cluster analysis and principal component analysis (PCA) have been used for the interpretation of dry sieving data. Chernozems from the treatments of long-term field experiments with different land-use patterns— annually mowed steppe, continuous potato culture, permanent black fallow, and untilled fallow since 1998 after permanent black fallow—have been used. Analysis of dry sieving data by PCA has shown that the treatments of untilled fallow after black fallow and annually mowed steppe differ most in the series considered; the content of dry aggregates of 10-7 mm makes the largest contribution to the distribution of objects along the first principal component. This fraction has been sieved in water and analyzed by PCA. In contrast to dry sieving data, the wet sieving data showed the closest mathematical distance between the treatment of untilled fallow after black fallow and the undisturbed treatment of annually mowed steppe, while the untilled fallow after black fallow and the permanent black fallow were the most distant treatments. Thus, it may be suggested that the water stability of structure is first restored after the removal of destructive anthropogenic load. However, the restoration of the distribution of structural separates to the parameters characteristic of native soils is a significantly longer process.

  7. Oral mucosal color changes as a clinical biomarker for cancer detection.

    PubMed

    Latini, Giuseppe; De Felice, Claudio; Barducci, Alessandro; Chitano, Giovanna; Pignatelli, Antonietta; Grimaldi, Luca; Tramacere, Francesco; Laurini, Ricardo; Andreassi, Maria Grazia; Portaluri, Maurizio

    2012-07-01

    Screening is a key tool for early cancer detection/prevention and potentially saves lives. Oral mucosal vascular aberrations and color changes have been reported in hereditary nonpolyposis colorectal cancer patients, possibly reflecting a subclinical extracellular matrix abnormality implicated in the general process of cancer development. Reasoning that physicochemical changes of a tissue should affect its optical properties, we investigated the diagnostic ability of oral mucosal color to identify patients with several types of cancer. A total of 67 patients with several histologically proven malignancies at different stages were enrolled along with a group of 60 healthy controls of comparable age and sex ratio. Oral mucosal color was measured in selected areas, and then univariate, cluster, and principal component analyses were carried out. Lower red and green and higher blue values were significantly associated with evidence of cancer (all P<0.0001), and efficiently discriminated patients from controls. The blue color coordinate showed significantly higher sensitivity and specificity (96.66±2.77 and 97.16±3.46%, respectively) compared with the red and green coordinates. Likewise, the second principal component coordinate of the red-green clusters discriminated patients from controls with 98.2% sensitivity and 95% specificity (cut-off criterion≤0.4547; P=0.0001). The scatterplots of the chrominances revealed the formation of two well separated clusters, separating cancer patients from controls with a 99.4% probability of correct classification. These findings highlight the ability of oral color to encode clinically relevant biophysical information. In the near future, this low-cost and noninvasive method may become a useful tool for early cancer detection.

  8. Genetic diversity among silkworm (Bombyx mori L., Lep., Bombycidae) germplasms revealed by microsatellites.

    PubMed

    Li, Muwang; Shen, Li; Xu, Anying; Miao, Xuexia; Hou, Chengxiang; Sun, Pingjiang; Zhang, Yuehua; Huang, Yongping

    2005-10-01

    To determine genetic relationships among strains of silkworm, Bombyx mori L., 31 strains with different origins, number of generations per year, number of molts per generation, and morphological characters were studied using simple sequence repeat (SSR) markers. Twenty-six primer pairs flanking microsatellite sequences in the silkworm genome were assayed. All were polymorphic and unambiguously separated silkworm strains from each other. A total of 188 alleles were detected with a mean value of 7.2 alleles/locus (range 2-17). The average heterozygosity value for each SSR locus ranged from 0 to 0.60, and the highest one was 0.96 (Fl0516 in 4013). The mean polymorphism index content (PIC) was 0.66 (range 0.12-0.89). Unweighted pair group method with arithmetic means (UPGMA) cluster analysis of Nei's genetic distance grouped silkworm strains based on their origin. Seven major ecotypic silkworm groups were analyzed. Principal components analysis (PCA) for SSR data support their UPGMA clustering. The results indicated that SSR markers are an efficient tool for fingerprinting cultivars and conducting genetic-diversity studies in the silkworm.

  9. Inference from clustering with application to gene-expression microarrays.

    PubMed

    Dougherty, Edward R; Barrera, Junior; Brun, Marcel; Kim, Seungchan; Cesar, Roberto M; Chen, Yidong; Bittner, Michael; Trent, Jeffrey M

    2002-01-01

    There are many algorithms to cluster sample data points based on nearness or a similarity measure. Often the implication is that points in different clusters come from different underlying classes, whereas those in the same cluster come from the same class. Stochastically, the underlying classes represent different random processes. The inference is that clusters represent a partition of the sample points according to which process they belong. This paper discusses a model-based clustering toolbox that evaluates cluster accuracy. Each random process is modeled as its mean plus independent noise, sample points are generated, the points are clustered, and the clustering error is the number of points clustered incorrectly according to the generating random processes. Various clustering algorithms are evaluated based on process variance and the key issue of the rate at which algorithmic performance improves with increasing numbers of experimental replications. The model means can be selected by hand to test the separability of expected types of biological expression patterns. Alternatively, the model can be seeded by real data to test the expected precision of that output or the extent of improvement in precision that replication could provide. In the latter case, a clustering algorithm is used to form clusters, and the model is seeded with the means and variances of these clusters. Other algorithms are then tested relative to the seeding algorithm. Results are averaged over various seeds. Output includes error tables and graphs, confusion matrices, principal-component plots, and validation measures. Five algorithms are studied in detail: K-means, fuzzy C-means, self-organizing maps, hierarchical Euclidean-distance-based and correlation-based clustering. The toolbox is applied to gene-expression clustering based on cDNA microarrays using real data. Expression profile graphics are generated and error analysis is displayed within the context of these profile graphics. A large amount of generated output is available over the web.

  10. Somatosensory nociceptive characteristics differentiate subgroups in people with chronic low back pain: a cluster analysis.

    PubMed

    Rabey, Martin; Slater, Helen; OʼSullivan, Peter; Beales, Darren; Smith, Anne

    2015-10-01

    The objectives of this study were to explore the existence of subgroups in a cohort with chronic low back pain (n = 294) based on the results of multimodal sensory testing and profile subgroups on demographic, psychological, lifestyle, and general health factors. Bedside (2-point discrimination, brush, vibration and pinprick perception, temporal summation on repeated monofilament stimulation) and laboratory (mechanical detection threshold, pressure, heat and cold pain thresholds, conditioned pain modulation) sensory testing were examined at wrist and lumbar sites. Data were entered into principal component analysis, and 5 component scores were entered into latent class analysis. Three clusters, with different sensory characteristics, were derived. Cluster 1 (31.9%) was characterised by average to high temperature and pressure pain sensitivity. Cluster 2 (52.0%) was characterised by average to high pressure pain sensitivity. Cluster 3 (16.0%) was characterised by low temperature and pressure pain sensitivity. Temporal summation occurred significantly more frequently in cluster 1. Subgroups were profiled on pain intensity, disability, depression, anxiety, stress, life events, fear avoidance, catastrophizing, perception of the low back region, comorbidities, body mass index, multiple pain sites, sleep, and activity levels. Clusters 1 and 2 had a significantly greater proportion of female participants and higher depression and sleep disturbance scores than cluster 3. The proportion of participants undertaking <300 minutes per week of moderate activity was significantly greater in cluster 1 than in clusters 2 and 3. Low back pain, therefore, does not appear to be homogeneous. Pain mechanisms relating to presentations of each subgroup were postulated. Future research may investigate prognoses and interventions tailored towards these subgroups.

  11. An Improved Cloud Classification Algorithm for China’s FY-2C Multi-Channel Images Using Artificial Neural Network

    PubMed Central

    Liu, Yu; Xia, Jun; Shi, Chun-Xiang; Hong, Yang

    2009-01-01

    The crowning objective of this research was to identify a better cloud classification method to upgrade the current window-based clustering algorithm used operationally for China’s first operational geostationary meteorological satellite FengYun-2C (FY-2C) data. First, the capabilities of six widely-used Artificial Neural Network (ANN) methods are analyzed, together with the comparison of two other methods: Principal Component Analysis (PCA) and a Support Vector Machine (SVM), using 2864 cloud samples manually collected by meteorologists in June, July, and August in 2007 from three FY-2C channel (IR1, 10.3–11.3 μm; IR2, 11.5–12.5 μm and WV 6.3–7.6 μm) imagery. The result shows that: (1) ANN approaches, in general, outperformed the PCA and the SVM given sufficient training samples and (2) among the six ANN networks, higher cloud classification accuracy was obtained with the Self-Organizing Map (SOM) and Probabilistic Neural Network (PNN). Second, to compare the ANN methods to the present FY-2C operational algorithm, this study implemented SOM, one of the best ANN network identified from this study, as an automated cloud classification system for the FY-2C multi-channel data. It shows that SOM method has improved the results greatly not only in pixel-level accuracy but also in cloud patch-level classification by more accurately identifying cloud types such as cumulonimbus, cirrus and clouds in high latitude. Findings of this study suggest that the ANN-based classifiers, in particular the SOM, can be potentially used as an improved Automated Cloud Classification Algorithm to upgrade the current window-based clustering method for the FY-2C operational products. PMID:22346714

  12. An Improved Cloud Classification Algorithm for China's FY-2C Multi-Channel Images Using Artificial Neural Network.

    PubMed

    Liu, Yu; Xia, Jun; Shi, Chun-Xiang; Hong, Yang

    2009-01-01

    The crowning objective of this research was to identify a better cloud classification method to upgrade the current window-based clustering algorithm used operationally for China's first operational geostationary meteorological satellite FengYun-2C (FY-2C) data. First, the capabilities of six widely-used Artificial Neural Network (ANN) methods are analyzed, together with the comparison of two other methods: Principal Component Analysis (PCA) and a Support Vector Machine (SVM), using 2864 cloud samples manually collected by meteorologists in June, July, and August in 2007 from three FY-2C channel (IR1, 10.3-11.3 μm; IR2, 11.5-12.5 μm and WV 6.3-7.6 μm) imagery. The result shows that: (1) ANN approaches, in general, outperformed the PCA and the SVM given sufficient training samples and (2) among the six ANN networks, higher cloud classification accuracy was obtained with the Self-Organizing Map (SOM) and Probabilistic Neural Network (PNN). Second, to compare the ANN methods to the present FY-2C operational algorithm, this study implemented SOM, one of the best ANN network identified from this study, as an automated cloud classification system for the FY-2C multi-channel data. It shows that SOM method has improved the results greatly not only in pixel-level accuracy but also in cloud patch-level classification by more accurately identifying cloud types such as cumulonimbus, cirrus and clouds in high latitude. Findings of this study suggest that the ANN-based classifiers, in particular the SOM, can be potentially used as an improved Automated Cloud Classification Algorithm to upgrade the current window-based clustering method for the FY-2C operational products.

  13. Principal component regression analysis with SPSS.

    PubMed

    Liu, R X; Kuang, J; Gong, Q; Hou, X L

    2003-06-01

    The paper introduces all indices of multicollinearity diagnoses, the basic principle of principal component regression and determination of 'best' equation method. The paper uses an example to describe how to do principal component regression analysis with SPSS 10.0: including all calculating processes of the principal component regression and all operations of linear regression, factor analysis, descriptives, compute variable and bivariate correlations procedures in SPSS 10.0. The principal component regression analysis can be used to overcome disturbance of the multicollinearity. The simplified, speeded up and accurate statistical effect is reached through the principal component regression analysis with SPSS.

  14. A hybrid spatial-spectral denoising method for infrared hyperspectral images using 2DPCA

    NASA Astrophysics Data System (ADS)

    Huang, Jun; Ma, Yong; Mei, Xiaoguang; Fan, Fan

    2016-11-01

    The traditional noise reduction methods for 3-D infrared hyperspectral images typically operate independently in either the spatial or spectral domain, and such methods overlook the relationship between the two domains. To address this issue, we propose a hybrid spatial-spectral method in this paper to link both domains. First, principal component analysis and bivariate wavelet shrinkage are performed in the 2-D spatial domain. Second, 2-D principal component analysis transformation is conducted in the 1-D spectral domain to separate the basic components from detail ones. The energy distribution of noise is unaffected by orthogonal transformation; therefore, the signal-to-noise ratio of each component is used as a criterion to determine whether a component should be protected from over-denoising or denoised with certain 1-D denoising methods. This study implements the 1-D wavelet shrinking threshold method based on Stein's unbiased risk estimator, and the quantitative results on publicly available datasets demonstrate that our method can improve denoising performance more effectively than other state-of-the-art methods can.

  15. Quality assessment of crude and processed Arecae semen based on colorimeter and HPLC combined with chemometrics methods.

    PubMed

    Sun, Meng; Yan, Donghui; Yang, Xiaolu; Xue, Xingyang; Zhou, Sujuan; Liang, Shengwang; Wang, Shumei; Meng, Jiang

    2017-05-01

    Raw Arecae Semen, the seed of Areca catechu L., as well as Arecae Semen Tostum and Arecae semen carbonisata are traditionally processed by stir-baking for subsequent use in a variety of clinical applications. These three Arecae semen types, important Chinese herbal drugs, have been used in China and other Asian countries for thousands of years. In this study, the sensory technologies of a colorimeter and sensitive validated high-performance liquid chromatography with diode array detection were employed to discriminate raw Arecae semen and its processed drugs. The color parameters of the samples were determined by a colorimeter instrument CR-410. Moreover, the fingerprints of the four alkaloids of arecaidine, guvacine, arecoline and guvacoline were surveyed by high-performance liquid chromatography. Subsequently, Student's t test, the analysis of variance, fingerprint similarity analysis, hierarchical cluster analysis, principal component analysis, factor analysis and Pearson's correlation test were performed for final data analysis. The results obtained demonstrated a significant color change characteristic for components in raw Arecae semen and its processed drugs. Crude and processed Arecae semen could be determined based on colorimetry and high-performance liquid chromatography with a diode array detector coupled with chemometrics methods for a comprehensive quality evaluation. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  16. The Relation between Factor Score Estimates, Image Scores, and Principal Component Scores

    ERIC Educational Resources Information Center

    Velicer, Wayne F.

    1976-01-01

    Investigates the relation between factor score estimates, principal component scores, and image scores. The three methods compared are maximum likelihood factor analysis, principal component analysis, and a variant of rescaled image analysis. (RC)

  17. A novel genome signature based on inter-nucleotide distances profiles for visualization of metagenomic data

    NASA Astrophysics Data System (ADS)

    Xie, Xian-Hua; Yu, Zu-Guo; Ma, Yuan-Lin; Han, Guo-Sheng; Anh, Vo

    2017-09-01

    There has been a growing interest in visualization of metagenomic data. The present study focuses on the visualization of metagenomic data using inter-nucleotide distances profile. We first convert the fragment sequences into inter-nucleotide distances profiles. Then we analyze these profiles by principal component analysis. Finally the principal components are used to obtain the 2-D scattered plot according to their source of species. We name our method as inter-nucleotide distances profiles (INP) method. Our method is evaluated on three benchmark data sets used in previous published papers. Our results demonstrate that the INP method is good, alternative and efficient for visualization of metagenomic data.

  18. Discrimination of multilocus sequence typing-based Campylobacter jejuni subgroups by MALDI-TOF mass spectrometry.

    PubMed

    Zautner, Andreas Erich; Masanta, Wycliffe Omurwa; Tareen, Abdul Malik; Weig, Michael; Lugert, Raimond; Groß, Uwe; Bader, Oliver

    2013-11-07

    Campylobacter jejuni, the most common bacterial pathogen causing gastroenteritis, shows a wide genetic diversity. Previously, we demonstrated by the combination of multi locus sequence typing (MLST)-based UPGMA-clustering and analysis of 16 genetic markers that twelve different C. jejuni subgroups can be distinguished. Among these are two prominent subgroups. The first subgroup contains the majority of hyperinvasive strains and is characterized by a dimeric form of the chemotaxis-receptor Tlp7(m+c). The second has an extended amino acid metabolism and is characterized by the presence of a periplasmic asparaginase (ansB) and gamma-glutamyl-transpeptidase (ggt). Phyloproteomic principal component analysis (PCA) hierarchical clustering of MALDI-TOF based intact cell mass spectrometry (ICMS) spectra was able to group particular C. jejuni subgroups of phylogenetic related isolates in distinct clusters. Especially the aforementioned Tlp7(m+c)(+) and ansB+/ ggt+ subgroups could be discriminated by PCA. Overlay of ICMS spectra of all isolates led to the identification of characteristic biomarker ions for these specific C. jejuni subgroups. Thus, mass peak shifts can be used to identify the C. jejuni subgroup with an extended amino acid metabolism. Although the PCA hierarchical clustering of ICMS-spectra groups the tested isolates into a different order as compared to MLST-based UPGMA-clustering, the isolates of the indicator-groups form predominantly coherent clusters. These clusters reflect phenotypic aspects better than phylogenetic clustering, indicating that the genes corresponding to the biomarker ions are phylogenetically coupled to the tested marker genes. Thus, PCA clustering could be an additional tool for analyzing the relatedness of bacterial isolates.

  19. Reduction in Acute Gastroenteritis among Military Trainees: Secondary Effects of a Hygiene-based Cluster-Randomized Trial for Skin and Soft Tissue Infection Prevention

    PubMed Central

    D’Onofrio, Michael J.; Schlett, Carey D.; Millar, Eugene V.; Cui, Tianyuan; Lanier, Jeffrey B.; Law, Natasha N.; Tribble, David R.; Ellis, Michael W.

    2018-01-01

    Military personnel in congregate settings are at increased risk for acute gastroenteritis.1,2 Personal hygiene (eg, frequent hand washing, hand sanitizers, etc.) remains a central strategy. A skin and soft tissue infection (SSTI) prevention trial was conducted among military trainees.3 Trainees were randomized to 1 of 3 groups with incrementally increasing education- and hygiene-based measures. The principal components were promotion of hand washing in addition to a once-weekly application of a chlorhexidine-based body wash. Herein, we report the trial’s impact on acute gastroenteritis. PMID:25695181

  20. Competency Based Teacher Education Component. Curriculum Methods and Materials, Elementary Mathematics and Social Studies.

    ERIC Educational Resources Information Center

    Woodworth, William D.

    Four mathematical/social studies module clusters are presented in an effort to develop proficiency in instruction and in inductive and deductive teaching procedures. Modules within the first cluster concern systems of numeration, set operations, numbers, measurement, geometry, mathematics, and reasoning. The second mathematical cluster presents…

  1. Data-driven mapping of hypoxia-related tumor heterogeneity using DCE-MRI and OE-MRI.

    PubMed

    Featherstone, Adam K; O'Connor, James P B; Little, Ross A; Watson, Yvonne; Cheung, Sue; Babur, Muhammad; Williams, Kaye J; Matthews, Julian C; Parker, Geoff J M

    2018-04-01

    Previous work has shown that combining dynamic contrast-enhanced (DCE)-MRI and oxygen-enhanced (OE)-MRI binary enhancement maps can identify tumor hypoxia. The current work proposes a novel, data-driven method for mapping tissue oxygenation and perfusion heterogeneity, based on clustering DCE/OE-MRI data. DCE-MRI and OE-MRI were performed on nine U87 (glioblastoma) and seven Calu6 (non-small cell lung cancer) murine xenograft tumors. Area under the curve and principal component analysis features were calculated and clustered separately using Gaussian mixture modelling. Evaluation metrics were calculated to determine the optimum feature set and cluster number. Outputs were quantitatively compared with a previous non data-driven approach. The optimum method located six robustly identifiable clusters in the data, yielding tumor region maps with spatially contiguous regions in a rim-core structure, suggesting a biological basis. Mean within-cluster enhancement curves showed physiologically distinct, intuitive kinetics of enhancement. Regions of DCE/OE-MRI enhancement mismatch were located, and voxel categorization agreed well with the previous non data-driven approach (Cohen's kappa = 0.61, proportional agreement = 0.75). The proposed method locates similar regions to the previous published method of binarization of DCE/OE-MRI enhancement, but renders a finer segmentation of intra-tumoral oxygenation and perfusion. This could aid in understanding the tumor microenvironment and its heterogeneity. Magn Reson Med 79:2236-2245, 2018. © 2017 The Authors Magnetic Resonance in Medicine published by Wiley Periodicals, Inc. on behalf of International Society for Magnetic Resonance in Medicine. This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited. © 2017 The Authors Magnetic Resonance in Medicine published by Wiley Periodicals, Inc. on behalf of International Society for Magnetic Resonance in Medicine.

  2. Efficient three-dimensional resist profile-driven source mask optimization optical proximity correction based on Abbe-principal component analysis and Sylvester equation

    NASA Astrophysics Data System (ADS)

    Lin, Pei-Chun; Yu, Chun-Chang; Chen, Charlie Chung-Ping

    2015-01-01

    As one of the critical stages of a very large scale integration fabrication process, postexposure bake (PEB) plays a crucial role in determining the final three-dimensional (3-D) profiles and lessening the standing wave effects. However, the full 3-D chemically amplified resist simulation is not widely adopted during the postlayout optimization due to the long run-time and huge memory usage. An efficient simulation method is proposed to simulate the PEB while considering standing wave effects and resolution enhancement techniques, such as source mask optimization and subresolution assist features based on the Sylvester equation and Abbe-principal component analysis method. Simulation results show that our algorithm is 20× faster than the conventional Gaussian convolution method.

  3. Sensory analysis of characterising flavours: evaluating tobacco product odours using an expert panel.

    PubMed

    Krüsemann, Erna J Z; Lasschuijt, Marlou P; de Graaf, C; de Wijk, René A; Punter, Pieter H; van Tiel, Loes; Cremers, Johannes W J M; van de Nobelen, Suzanne; Boesveldt, Sanne; Talhout, Reinskje

    2018-05-23

    Tobacco flavours are an important regulatory concept in several jurisdictions, for example in the USA, Canada and Europe. The European Tobacco Products Directive 2014/40/EU prohibits cigarettes and roll-your-own tobacco having a characterising flavour. This directive defines characterising flavour as 'a clearly noticeable smell or taste other than one of tobacco […]'. To distinguish between products with and without a characterising flavour, we trained an expert panel to identify characterising flavours by smelling. An expert panel (n=18) evaluated the smell of 20 tobacco products using self-defined odour attributes, following Quantitative Descriptive Analysis. The panel was trained during 14 attribute training, consensus training and performance monitoring sessions. Products were assessed during six test sessions. Principal component analysis, hierarchical clustering (four and six clusters) and Hotelling's T-tests (95% and 99% CIs) were used to determine differences and similarities between tobacco products based on odour attributes. The final attribute list contained 13 odour descriptors. Panel performance was sufficient after 14 training sessions. Products marketed as unflavoured that formed a cluster were considered reference products. A four-cluster method distinguished cherry-flavoured, vanilla-flavoured and menthol-flavoured products from reference products. Six clusters subdivided reference products into tobacco leaves, roll-your-own and commercial products. An expert panel was successfully trained to assess characterising odours in cigarettes and roll-your-own tobacco. This method could be applied to other product types such as e-cigarettes. Regulatory decisions on the choice of reference products and significance level are needed which directly influences the products being assessed as having a characterising odour. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.

  4. Assessing prescription drug abuse using functional principal component analysis (FPCA) of wastewater data.

    PubMed

    Salvatore, Stefania; Røislien, Jo; Baz-Lomba, Jose A; Bramness, Jørgen G

    2017-03-01

    Wastewater-based epidemiology is an alternative method for estimating the collective drug use in a community. We applied functional data analysis, a statistical framework developed for analysing curve data, to investigate weekly temporal patterns in wastewater measurements of three prescription drugs with known abuse potential: methadone, oxazepam and methylphenidate, comparing them to positive and negative control drugs. Sewage samples were collected in February 2014 from a wastewater treatment plant in Oslo, Norway. The weekly pattern of each drug was extracted by fitting of generalized additive models, using trigonometric functions to model the cyclic behaviour. From the weekly component, the main temporal features were then extracted using functional principal component analysis. Results are presented through the functional principal components (FPCs) and corresponding FPC scores. Clinically, the most important weekly feature of the wastewater-based epidemiology data was the second FPC, representing the difference between average midweek level and a peak during the weekend, representing possible recreational use of a drug in the weekend. Estimated scores on this FPC indicated recreational use of methylphenidate, with a high weekend peak, but not for methadone and oxazepam. The functional principal component analysis uncovered clinically important temporal features of the weekly patterns of the use of prescription drugs detected from wastewater analysis. This may be used as a post-marketing surveillance method to monitor prescription drugs with abuse potential. Copyright © 2016 John Wiley & Sons, Ltd. Copyright © 2016 John Wiley & Sons, Ltd.

  5. Performance-based measures associate with frailty in patients with end-stage liver disease

    PubMed Central

    Lai, Jennifer C.; Volk, Michael L; Strasburg, Debra; Alexander, Neil

    2016-01-01

    Background Physical frailty, as measured by the Fried Frailty Index, is increasingly recognized as a critical determinant of outcomes in cirrhotics. However, its utility is limited by the inclusion of self-reported components. We aimed to identify performance-based measures associated with frailty in patients with cirrhosis. Methods Cirrhotics ≥50 years underwent: 6-minute walk test (6MWT, cardiopulmonary endurance), chair stands in 30 seconds (muscle endurance), isometric knee extension (lower extremity strength), unipedal stance time (static balance), and maximal step length (dynamic balance/coordination). Linear regression associated each physical performance test with frailty. Principal components exploratory factor analysis evaluated the inter-relatedness of frailty and the 5 physical performance tests. Results Of forty cirrhotics, with a median age of 64 years and Model for End-stage Liver Disease (MELD) MELD of 12,10 (25%) were frail by Fried Frailty Index ≥3. Frail cirrhotics had poorer performance in 6MWT distance (231 vs. 338 meters), 30 second chair stands (7 vs. 10), isometric knee extension (86 vs. 122 Newton meters), and maximal step length (22 vs. 27 inches) [p≤0.02 for each]. Each physical performance test was significantly associated with frailty (p<0.01), even after adjustment for MELD or hepatic encephalopathy. Principal component factor analysis demonstrated substantial, but unique, clustering of each physical performance test to a single factor – frailty. Conclusion Frailty in cirrhosis is a multi-dimensional construct that is distinct from liver dysfunction and incorporates endurance, strength, and balance. Our data provide specific targets for prehabilitation interventions aimed at reducing frailty in cirrhotics in preparation for liver transplantation. PMID:27495749

  6. DCE-MRI defined subvolumes of a brain metastatic lesion by principle component analysis and fuzzy-c-means clustering for response assessment of radiation therapy

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Farjam, Reza; Tsien, Christina I.; Lawrence, Theodore S.

    Purpose: To develop a pharmacokinetic modelfree framework to analyze the dynamic contrast enhanced magnetic resonance imaging (DCE-MRI) data for assessment of response of brain metastases to radiation therapy. Methods: Twenty patients with 45 analyzable brain metastases had MRI scans prior to whole brain radiation therapy (WBRT) and at the end of the 2-week therapy. The volumetric DCE images covering the whole brain were acquired on a 3T scanner with approximately 5 s temporal resolution and a total scan time of about 3 min. DCE curves from all voxels of the 45 brain metastases were normalized and then temporally aligned. Amore » DCE matrix that is constructed from the aligned DCE curves of all voxels of the 45 lesions obtained prior to WBRT is processed by principal component analysis to generate the principal components (PCs). Then, the projection coefficient maps prior to and at the end of WBRT are created for each lesion. Next, a pattern recognition technique, based upon fuzzy-c-means clustering, is used to delineate the tumor subvolumes relating to the value of the significant projection coefficients. The relationship between changes in different tumor subvolumes and treatment response was evaluated to differentiate responsive from stable and progressive tumors. Performance of the PC-defined tumor subvolume was also evaluated by receiver operating characteristic (ROC) analysis in prediction of nonresponsive lesions and compared with physiological-defined tumor subvolumes. Results: The projection coefficient maps of the first three PCs contain almost all response-related information in DCE curves of brain metastases. The first projection coefficient, related to the area under DCE curves, is the major component to determine response while the third one has a complimentary role. In ROC analysis, the area under curve of 0.88 ± 0.05 and 0.86 ± 0.06 were achieved for the PC-defined and physiological-defined tumor subvolume in response assessment. Conclusions: The PC-defined subvolume of a brain metastasis could predict tumor response to therapy similar to the physiological-defined one, while the former is determined more rapidly for clinical decision-making support.« less

  7. DCE-MRI defined subvolumes of a brain metastatic lesion by principle component analysis and fuzzy-c-means clustering for response assessment of radiation therapy

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Farjam, Reza; Tsien, Christina I.; Lawrence, Theodore S.

    2014-01-15

    Purpose: To develop a pharmacokinetic modelfree framework to analyze the dynamic contrast enhanced magnetic resonance imaging (DCE-MRI) data for assessment of response of brain metastases to radiation therapy. Methods: Twenty patients with 45 analyzable brain metastases had MRI scans prior to whole brain radiation therapy (WBRT) and at the end of the 2-week therapy. The volumetric DCE images covering the whole brain were acquired on a 3T scanner with approximately 5 s temporal resolution and a total scan time of about 3 min. DCE curves from all voxels of the 45 brain metastases were normalized and then temporally aligned. Amore » DCE matrix that is constructed from the aligned DCE curves of all voxels of the 45 lesions obtained prior to WBRT is processed by principal component analysis to generate the principal components (PCs). Then, the projection coefficient maps prior to and at the end of WBRT are created for each lesion. Next, a pattern recognition technique, based upon fuzzy-c-means clustering, is used to delineate the tumor subvolumes relating to the value of the significant projection coefficients. The relationship between changes in different tumor subvolumes and treatment response was evaluated to differentiate responsive from stable and progressive tumors. Performance of the PC-defined tumor subvolume was also evaluated by receiver operating characteristic (ROC) analysis in prediction of nonresponsive lesions and compared with physiological-defined tumor subvolumes. Results: The projection coefficient maps of the first three PCs contain almost all response-related information in DCE curves of brain metastases. The first projection coefficient, related to the area under DCE curves, is the major component to determine response while the third one has a complimentary role. In ROC analysis, the area under curve of 0.88 ± 0.05 and 0.86 ± 0.06 were achieved for the PC-defined and physiological-defined tumor subvolume in response assessment. Conclusions: The PC-defined subvolume of a brain metastasis could predict tumor response to therapy similar to the physiological-defined one, while the former is determined more rapidly for clinical decision-making support.« less

  8. Performance Analysis of Hybrid Electric Vehicle over Different Driving Cycles

    NASA Astrophysics Data System (ADS)

    Panday, Aishwarya; Bansal, Hari Om

    2017-02-01

    Article aims to find the nature and response of a hybrid vehicle on various standard driving cycles. Road profile parameters play an important role in determining the fuel efficiency. Typical parameters of road profile can be reduced to a useful smaller set using principal component analysis and independent component analysis. Resultant data set obtained after size reduction may result in more appropriate and important parameter cluster. With reduced parameter set fuel economies over various driving cycles, are ranked using TOPSIS and VIKOR multi-criteria decision making methods. The ranking trend is then compared with the fuel economies achieved after driving the vehicle over respective roads. Control strategy responsible for power split is optimized using genetic algorithm. 1RC battery model and modified SOC estimation method are considered for the simulation and improved results compared with the default are obtained.

  9. [Vis-NIR spectroscopic pattern recognition combined with SG smoothing applied to breed screening of transgenic sugarcane].

    PubMed

    Liu, Gui-Song; Guo, Hao-Song; Pan, Tao; Wang, Ji-Hua; Cao, Gan

    2014-10-01

    Based on Savitzky-Golay (SG) smoothing screening, principal component analysis (PCA) combined with separately supervised linear discriminant analysis (LDA) and unsupervised hierarchical clustering analysis (HCA) were used for non-destructive visible and near-infrared (Vis-NIR) detection for breed screening of transgenic sugarcane. A random and stability-dependent framework of calibration, prediction, and validation was proposed. A total of 456 samples of sugarcane leaves planting in the elongating stage were collected from the field, which was composed of 306 transgenic (positive) samples containing Bt and Bar gene and 150 non-transgenic (negative) samples. A total of 156 samples (negative 50 and positive 106) were randomly selected as the validation set; the remaining samples (negative 100 and positive 200, a total of 300 samples) were used as the modeling set, and then the modeling set was subdivided into calibration (negative 50 and positive 100, a total of 150 samples) and prediction sets (negative 50 and positive 100, a total of 150 samples) for 50 times. The number of SG smoothing points was ex- panded, while some modes of higher derivative were removed because of small absolute value, and a total of 264 smoothing modes were used for screening. The pairwise combinations of first three principal components were used, and then the optimal combination of principal components was selected according to the model effect. Based on all divisions of calibration and prediction sets and all SG smoothing modes, the SG-PCA-LDA and SG-PCA-HCA models were established, the model parameters were optimized based on the average prediction effect for all divisions to produce modeling stability. Finally, the model validation was performed by validation set. With SG smoothing, the modeling accuracy and stability of PCA-LDA, PCA-HCA were signif- icantly improved. For the optimal SG-PCA-LDA model, the recognition rate of positive and negative validation samples were 94.3%, 96.0%; and were 92.5%, 98.0% for the optimal SG-PCA-LDA model, respectively. Vis-NIR spectro- scopic pattern recognition combined with SG smoothing could be used for accurate recognition of transgenic sugarcane leaves, and provided a convenient screening method for transgenic sugarcane breeding.

  10. Bayesian and Phylogenic Approaches for Studying Relationships among Table Olive Cultivars.

    PubMed

    Ben Ayed, Rayda; Ennouri, Karim; Ben Amar, Fathi; Moreau, Fabienne; Triki, Mohamed Ali; Rebai, Ahmed

    2017-08-01

    To enhance table olive tree authentication, relationship, and productivity, we consider the analysis of 18 worldwide table olive cultivars (Olea europaea L.) based on morphological, biological, and physicochemical markers analyzed by bioinformatic and biostatistic tools. Accordingly, we assess the relationships between the studied varieties, on the one hand, and the potential productivity-quantitative parameter links on the other hand. The bioinformatic analysis based on the graphical representation of the matrix of Euclidean distances, the principal components analysis, unweighted pair group method with arithmetic mean, and principal coordinate analysis (PCoA) revealed three major clusters which were not correlated with the geographic origin. The statistical analysis based on Kendall's and Spearman correlation coefficients suggests two highly significant associations with both fruit color and pollinization and the productivity character. These results are confirmed by the multiple linear regression prediction models. In fact, based on the coefficient of determination (R 2 ) value, the best model demonstrated the power of the pollinization on the tree productivity (R 2  = 0.846). Moreover, the derived directed acyclic graph showed that only two direct influences are detected: effect of tolerance on fruit and stone symmetry on side and effect of tolerance on stone form and oil content on the other side. This work provides better understanding of the diversity available in worldwide table olive cultivars and supplies an important contribution for olive breeding and authenticity.

  11. Origin of fecal contamination in waters from contrasted areas: stanols as Microbial Source Tracking markers.

    PubMed

    Derrien, M; Jardé, E; Gruau, G; Pourcher, A M; Gourmelon, M; Jadas-Hécart, A; Pierson Wickmann, A C

    2012-09-01

    Improving the microbiological quality of coastal and river waters relies on the development of reliable markers that are capable of determining sources of fecal pollution. Recently, a principal component analysis (PCA) method based on six stanol compounds (i.e. 5β-cholestan-3β-ol (coprostanol), 5β-cholestan-3α-ol (epicoprostanol), 24-methyl-5α-cholestan-3β-ol (campestanol), 24-ethyl-5α-cholestan-3β-ol (sitostanol), 24-ethyl-5β-cholestan-3β-ol (24-ethylcoprostanol) and 24-ethyl-5β-cholestan-3α-ol (24-ethylepicoprostanol)) was shown to be suitable for distinguishing between porcine and bovine feces. In this study, we tested if this PCA method, using the above six stanols, could be used as a tool in "Microbial Source Tracking (MST)" methods in water from areas of intensive agriculture where diffuse fecal contamination is often marked by the co-existence of human and animal sources. In particular, well-defined and stable clusters were found in PCA score plots clustering samples of "pure" human, bovine and porcine feces along with runoff and diluted waters in which the source of contamination is known. A good consistency was also observed between the source assignments made by the 6-stanol-based PCA method and the microbial markers for river waters contaminated by fecal matter of unknown origin. More generally, the tests conducted in this study argue for the addition of the PCA method based on six stanols in the MST toolbox to help identify fecal contamination sources. The data presented in this study show that this addition would improve the determination of fecal contamination sources when the contamination levels are low to moderate. Copyright © 2012 Elsevier Ltd. All rights reserved.

  12. Dietary patterns and socioeconomic position.

    PubMed

    Mullie, P; Clarys, P; Hulens, M; Vansant, G

    2010-03-01

    To test a socioeconomic hypothesis on three dietary patterns and to describe the relation between three commonly used methods to determine dietary patterns, namely Healthy Eating Index, Mediterranean Diet Score and principal component analysis. Cross-sectional design in 1852 military men. Using mailed questionnaires, the food consumption frequency was recorded. The correlation coefficients between the three dietary patterns varied between 0.43 and 0.62. The highest correlation was found between Healthy Eating Index and Healthy Dietary Pattern (principal components analysis). Cohen's kappa coefficient of agreement varied between 0.10 and 0.20. After age-adjustment, education and income remained associated with the most healthy dietary pattern. Even when both socioeconomic indicators were used together in one model, higher income and education were associated with higher scores for Healthy Eating Index, Mediterranean Diet Score and Healthy Dietary Pattern. The least healthy quintiles of dietary pattern as measured by the three methods were associated with a clustering of unhealthy behaviors, that is, smoking, low physical activity, highest intake of total fat and saturated fatty acids, and low intakes of fruits and vegetables. The three dietary patterns used indicated that the most healthy patterns were associated with a higher socioeconomic position, while lower patterns were associated with several unhealthy behaviors.

  13. A SAR and QSAR study of new artemisinin compounds with antimalarial activity.

    PubMed

    Santos, Cleydson Breno R; Vieira, Josinete B; Lobato, Cleison C; Hage-Melim, Lorane I S; Souto, Raimundo N P; Lima, Clarissa S; Costa, Elizabeth V M; Brasil, Davi S B; Macêdo, Williams Jorge C; Carvalho, José Carlos T

    2013-12-30

    The Hartree-Fock method and the 6-31G** basis set were employed to calculate the molecular properties of artemisinin and 20 derivatives with antimalarial activity. Maps of molecular electrostatic potential (MEPs) and molecular docking were used to investigate the interaction between ligands and the receptor (heme). Principal component analysis and hierarchical cluster analysis were employed to select the most important descriptors related to activity. The correlation between biological activity and molecular properties was obtained using the partial least squares and principal component regression methods. The regression PLS and PCR models built in this study were also used to predict the antimalarial activity of 30 new artemisinin compounds with unknown activity. The models obtained showed not only statistical significance but also predictive ability. The significant molecular descriptors related to the compounds with antimalarial activity were the hydration energy (HE), the charge on the O11 oxygen atom (QO11), the torsion angle O1-O2-Fe-N2 (D2) and the maximum rate of R/Sanderson Electronegativity (RTe+). These variables led to a physical and structural explanation of the molecular properties that should be selected for when designing new ligands to be used as antimalarial agents.

  14. Regionalization of precipitation characteristics in Iran's Lake Urmia basin

    NASA Astrophysics Data System (ADS)

    Fazel, Nasim; Berndtsson, Ronny; Uvo, Cintia Bertacchi; Madani, Kaveh; Kløve, Bjørn

    2018-04-01

    Lake Urmia in northwest Iran, once one of the largest hypersaline lakes in the world, has shrunk by almost 90% in area and 80% in volume during the last four decades. To improve the understanding of regional differences in water availability throughout the region and to refine the existing information on precipitation variability, this study investigated the spatial pattern of precipitation for the Lake Urmia basin. Daily rainfall time series from 122 precipitation stations with different record lengths were used to extract 15 statistical descriptors comprising 25th percentile, 75th percentile, and coefficient of variation for annual and seasonal total precipitation. Principal component analysis in association with cluster analysis identified three main homogeneous precipitation groups in the lake basin. The first sub-region (group 1) includes stations located in the center and southeast; the second sub-region (group 2) covers mostly northern and northeastern part of the basin, and the third sub-region (group 3) covers the western and southern edges of the basin. Results of principal component (PC) and clustering analyses showed that seasonal precipitation variation is the most important feature controlling the spatial pattern of precipitation in the lake basin. The 25th and 75th percentiles of winter and autumn are the most important variables controlling the spatial pattern of the first rotated principal component explaining about 32% of the total variance. Summer and spring precipitation variations are the most important variables in the second and third rotated principal components, respectively. Seasonal variation in precipitation amount and seasonality are explained by topography and influenced by the lake and westerly winds that are related to the strength of the North Atlantic Oscillation. Despite using incomplete time series with different lengths, the identified sub-regions are physically meaningful.

  15. Interpretable functional principal component analysis.

    PubMed

    Lin, Zhenhua; Wang, Liangliang; Cao, Jiguo

    2016-09-01

    Functional principal component analysis (FPCA) is a popular approach to explore major sources of variation in a sample of random curves. These major sources of variation are represented by functional principal components (FPCs). The intervals where the values of FPCs are significant are interpreted as where sample curves have major variations. However, these intervals are often hard for naïve users to identify, because of the vague definition of "significant values". In this article, we develop a novel penalty-based method to derive FPCs that are only nonzero precisely in the intervals where the values of FPCs are significant, whence the derived FPCs possess better interpretability than the FPCs derived from existing methods. To compute the proposed FPCs, we devise an efficient algorithm based on projection deflation techniques. We show that the proposed interpretable FPCs are strongly consistent and asymptotically normal under mild conditions. Simulation studies confirm that with a competitive performance in explaining variations of sample curves, the proposed FPCs are more interpretable than the traditional counterparts. This advantage is demonstrated by analyzing two real datasets, namely, electroencephalography data and Canadian weather data. © 2015, The International Biometric Society.

  16. Metal mixtures in urban and rural populations in the US: The Multi-Ethnic Study of Atherosclerosis and the Strong Heart Study.

    PubMed

    Pang, Yuanjie; Peng, Roger D; Jones, Miranda R; Francesconi, Kevin A; Goessler, Walter; Howard, Barbara V; Umans, Jason G; Best, Lyle G; Guallar, Eliseo; Post, Wendy S; Kaufman, Joel D; Vaidya, Dhananjay; Navas-Acien, Ana

    2016-05-01

    Natural and anthropogenic sources of metal exposure differ for urban and rural residents. We searched to identify patterns of metal mixtures which could suggest common environmental sources and/or metabolic pathways of different urinary metals, and compared metal-mixtures in two population-based studies from urban/sub-urban and rural/town areas in the US: the Multi-Ethnic Study of Atherosclerosis (MESA) and the Strong Heart Study (SHS). We studied a random sample of 308 White, Black, Chinese-American, and Hispanic participants in MESA (2000-2002) and 277 American Indian participants in SHS (1998-2003). We used principal component analysis (PCA), cluster analysis (CA), and linear discriminant analysis (LDA) to evaluate nine urinary metals (antimony [Sb], arsenic [As], cadmium [Cd], lead [Pb], molybdenum [Mo], selenium [Se], tungsten [W], uranium [U] and zinc [Zn]). For arsenic, we used the sum of inorganic and methylated species (∑As). All nine urinary metals were higher in SHS compared to MESA participants. PCA and CA revealed the same patterns in SHS, suggesting 4 distinct principal components (PC) or clusters (∑As-U-W, Pb-Sb, Cd-Zn, Mo-Se). In MESA, CA showed 2 large clusters (∑As-Mo-Sb-U-W, Cd-Pb-Se-Zn), while PCA showed 4 PCs (Sb-U-W, Pb-Se-Zn, Cd-Mo, ∑As). LDA indicated that ∑As, U, W, and Zn were the most discriminant variables distinguishing MESA and SHS participants. In SHS, the ∑As-U-W cluster and PC might reflect groundwater contamination in rural areas, and the Cd-Zn cluster and PC could reflect common sources from meat products or metabolic interactions. Among the metals assayed, ∑As, U, W and Zn differed the most between MESA and SHS, possibly reflecting disproportionate exposure from drinking water and perhaps food in rural Native communities compared to urban communities around the US. Copyright © 2016 Elsevier Inc. All rights reserved.

  17. Analysis of genetic relationships and identification of lily cultivars based on inter-simple sequence repeat markers.

    PubMed

    Cui, G F; Wu, L F; Wang, X N; Jia, W J; Duan, Q; Ma, L L; Jiang, Y L; Wang, J H

    2014-07-29

    Inter-simple sequence repeat (ISSR) markers were used to discriminate 62 lily cultivars of 5 hybrid series. Eight ISSR primers generated 104 bands in total, which all showed 100% polymorphism, and an average of 13 bands were amplified by each primer. Two software packages, POPGENE 1.32 and NTSYSpc 2.1, were used to analyze the data matrix. Our results showed that the observed number of alleles (NA), effective number of alleles (NE), Nei's genetic diversity (H), and Shannon's information index (I) were 1.9630, 1.4179, 0.2606, and 0.4080, respectively. The highest genetic similarity (0.9601) was observed between the Oriental x Trumpet and Oriental lilies, which indicated that the two hybrids had a close genetic relationship. An unweighted pair-group method with arithmetic means dendrogram showed that the 62 lily cultivars clustered into two discrete groups. The first group included the Oriental and OT cultivars, while the Asiatic, LA, and Longiflorum lilies were placed in the second cluster. The distribution of individuals in the principal component analysis was consistent with the clustering of the dendrogram. Fingerprints of all lily cultivars built from 8 primers could be separated completely. This study confirmed the effect and efficiency of ISSR identification in lily cultivars.

  18. An improved principal component analysis based region matching method for fringe direction estimation

    NASA Astrophysics Data System (ADS)

    He, A.; Quan, C.

    2018-04-01

    The principal component analysis (PCA) and region matching combined method is effective for fringe direction estimation. However, its mask construction algorithm for region matching fails in some circumstances, and the algorithm for conversion of orientation to direction in mask areas is computationally-heavy and non-optimized. We propose an improved PCA based region matching method for the fringe direction estimation, which includes an improved and robust mask construction scheme, and a fast and optimized orientation-direction conversion algorithm for the mask areas. Along with the estimated fringe direction map, filtered fringe pattern by automatic selective reconstruction modification and enhanced fast empirical mode decomposition (ASRm-EFEMD) is used for Hilbert spiral transform (HST) to demodulate the phase. Subsequently, windowed Fourier ridge (WFR) method is used for the refinement of the phase. The robustness and effectiveness of proposed method are demonstrated by both simulated and experimental fringe patterns.

  19. Improving 3d Spatial Queries Search: Newfangled Technique of Space Filling Curves in 3d City Modeling

    NASA Astrophysics Data System (ADS)

    Uznir, U.; Anton, F.; Suhaibah, A.; Rahman, A. A.; Mioc, D.

    2013-09-01

    The advantages of three dimensional (3D) city models can be seen in various applications including photogrammetry, urban and regional planning, computer games, etc.. They expand the visualization and analysis capabilities of Geographic Information Systems on cities, and they can be developed using web standards. However, these 3D city models consume much more storage compared to two dimensional (2D) spatial data. They involve extra geometrical and topological information together with semantic data. Without a proper spatial data clustering method and its corresponding spatial data access method, retrieving portions of and especially searching these 3D city models, will not be done optimally. Even though current developments are based on an open data model allotted by the Open Geospatial Consortium (OGC) called CityGML, its XML-based structure makes it challenging to cluster the 3D urban objects. In this research, we propose an opponent data constellation technique of space-filling curves (3D Hilbert curves) for 3D city model data representation. Unlike previous methods, that try to project 3D or n-dimensional data down to 2D or 3D using Principal Component Analysis (PCA) or Hilbert mappings, in this research, we extend the Hilbert space-filling curve to one higher dimension for 3D city model data implementations. The query performance was tested using a CityGML dataset of 1,000 building blocks and the results are presented in this paper. The advantages of implementing space-filling curves in 3D city modeling will improve data retrieval time by means of optimized 3D adjacency, nearest neighbor information and 3D indexing. The Hilbert mapping, which maps a subinterval of the [0, 1] interval to the corresponding portion of the d-dimensional Hilbert's curve, preserves the Lebesgue measure and is Lipschitz continuous. Depending on the applications, several alternatives are possible in order to cluster spatial data together in the third dimension compared to its clustering in 2D.

  20. Longitudinal Patterns of Glycemic Control and Blood Pressure in Pregnant Women with Type 1 Diabetes Mellitus: Phenotypes from Functional Data Analysis.

    PubMed

    Szczesniak, Rhonda D; Li, Dan; Duan, Leo L; Altaye, Mekibib; Miodovnik, Menachem; Khoury, Jane C

    2016-11-01

    Objective  To identify phenotypes of type 1 diabetes control and associations with maternal/neonatal characteristics based on blood pressure (BP), glucose, and insulin curves during gestation, using a novel functional data analysis approach that accounts for sparse longitudinal patterns of medical monitoring during pregnancy. Methods  We performed a retrospective longitudinal cohort study of women with type 1 diabetes whose BP, glucose, and insulin requirements were monitored throughout gestation as part of a program-project grant. Scores from sparse functional principal component analysis (fPCA) were used to classify gestational profiles according to the degree of control for each monitored measure. Phenotypes created using fPCA were compared with respect to maternal and neonatal characteristics and outcome. Results  Most of the gestational profile variation in the monitored measures was explained by the first principal component (82-94%). Profiles clustered into three subgroups of high, moderate, or low heterogeneity, relative to the overall mean response. Phenotypes were associated with baseline characteristics, longitudinal changes in glycohemoglobin A1 and weight, and to pregnancy-related outcomes. Conclusion  Three distinct longitudinal patterns of glucose, insulin, and BP control were found. By identifying these phenotypes, interventions can be targeted for subgroups at highest risk for compromised outcome, to optimize diabetes management during pregnancy. Thieme Medical Publishers 333 Seventh Avenue, New York, NY 10001, USA.

  1. Tuning the Voices of a Choir: Detecting Ecological Gradients in Time-Series Populations.

    PubMed

    Buras, Allan; van der Maaten-Theunissen, Marieke; van der Maaten, Ernst; Ahlgrimm, Svenja; Hermann, Philipp; Simard, Sonia; Heinrich, Ingo; Helle, Gerd; Unterseher, Martin; Schnittler, Martin; Eusemann, Pascal; Wilmking, Martin

    2016-01-01

    This paper introduces a new approach-the Principal Component Gradient Analysis (PCGA)-to detect ecological gradients in time-series populations, i.e. several time-series originating from different individuals of a population. Detection of ecological gradients is of particular importance when dealing with time-series from heterogeneous populations which express differing trends. PCGA makes use of polar coordinates of loadings from the first two axes obtained by principal component analysis (PCA) to define groups of similar trends. Based on the mean inter-series correlation (rbar) the gain of increasing a common underlying signal by PCGA groups is quantified using Monte Carlo Simulations. In terms of validation PCGA is compared to three other existing approaches. Focusing on dendrochronological examples, PCGA is shown to correctly determine population gradients and in particular cases to be advantageous over other considered methods. Furthermore, PCGA groups in each example allowed for enhancing the strength of a common underlying signal and comparably well as hierarchical cluster analysis. Our results indicate that PCGA potentially allows for a better understanding of mechanisms causing time-series population gradients as well as objectively enhancing the performance of climate transfer functions in dendroclimatology. While our examples highlight the relevance of PCGA to the field of dendrochronology, we believe that also other disciplines working with data of comparable structure may benefit from PCGA.

  2. Tuning the Voices of a Choir: Detecting Ecological Gradients in Time-Series Populations

    PubMed Central

    Buras, Allan; van der Maaten-Theunissen, Marieke; van der Maaten, Ernst; Ahlgrimm, Svenja; Hermann, Philipp; Simard, Sonia; Heinrich, Ingo; Helle, Gerd; Unterseher, Martin; Schnittler, Martin; Eusemann, Pascal; Wilmking, Martin

    2016-01-01

    This paper introduces a new approach–the Principal Component Gradient Analysis (PCGA)–to detect ecological gradients in time-series populations, i.e. several time-series originating from different individuals of a population. Detection of ecological gradients is of particular importance when dealing with time-series from heterogeneous populations which express differing trends. PCGA makes use of polar coordinates of loadings from the first two axes obtained by principal component analysis (PCA) to define groups of similar trends. Based on the mean inter-series correlation (rbar) the gain of increasing a common underlying signal by PCGA groups is quantified using Monte Carlo Simulations. In terms of validation PCGA is compared to three other existing approaches. Focusing on dendrochronological examples, PCGA is shown to correctly determine population gradients and in particular cases to be advantageous over other considered methods. Furthermore, PCGA groups in each example allowed for enhancing the strength of a common underlying signal and comparably well as hierarchical cluster analysis. Our results indicate that PCGA potentially allows for a better understanding of mechanisms causing time-series population gradients as well as objectively enhancing the performance of climate transfer functions in dendroclimatology. While our examples highlight the relevance of PCGA to the field of dendrochronology, we believe that also other disciplines working with data of comparable structure may benefit from PCGA. PMID:27467508

  3. Global population structure and adaptive evolution of aflatoxin-producing fungi

    USDA-ARS?s Scientific Manuscript database

    We employed interspecific principal component analyses for six different categories (geography, species, precipitation, temperature, aflatoxin chemotype profile, and mating type) and inferred maximum likelihood phylogenies for six combined loci, including two aflatoxin cluster regions (aflM/alfN and...

  4. Genetic differences in the two main groups of the Japanese population based on autosomal SNPs and haplotypes.

    PubMed

    Yamaguchi-Kabata, Yumi; Tsunoda, Tatsuhiko; Kumasaka, Natsuhiko; Takahashi, Atsushi; Hosono, Naoya; Kubo, Michiaki; Nakamura, Yusuke; Kamatani, Naoyuki

    2012-05-01

    Although the Japanese population has a rather low genetic diversity, we recently confirmed the presence of two main clusters (the Hondo and Ryukyu clusters) through principal component analysis of genome-wide single-nucleotide polymorphism (SNP) genotypes. Understanding the genetic differences between the two main clusters requires further genome-wide analyses based on a dense SNP set and comparison of haplotype frequencies. In the present study, we determined haplotypes for the Hondo cluster of the Japanese population by detecting SNP homozygotes with 388,591 autosomal SNPs from 18,379 individuals and estimated the haplotype frequencies. Haplotypes for the Ryukyu cluster were inferred by a statistical approach using the genotype data from 504 individuals. We then compared the haplotype frequencies between the Hondo and Ryukyu clusters. In most genomic regions, the haplotype frequencies in the Hondo and Ryukyu clusters were very similar. However, in addition to the human leukocyte antigen region on chromosome 6, other genomic regions (chromosomes 3, 4, 5, 7, 10 and 12) showed dissimilarities in haplotype frequency. These regions were enriched for genes involved in the immune system, cell-cell adhesion and the intracellular signaling cascade. These differentiated genomic regions between the Hondo and Ryukyu clusters are of interest because they (1) should be examined carefully in association studies and (2) likely contain genes responsible for morphological or physiological differences between the two groups.

  5. Real Time Intelligent Target Detection and Analysis with Machine Vision

    NASA Technical Reports Server (NTRS)

    Howard, Ayanna; Padgett, Curtis; Brown, Kenneth

    2000-01-01

    We present an algorithm for detecting a specified set of targets for an Automatic Target Recognition (ATR) application. ATR involves processing images for detecting, classifying, and tracking targets embedded in a background scene. We address the problem of discriminating between targets and nontarget objects in a scene by evaluating 40x40 image blocks belonging to an image. Each image block is first projected onto a set of templates specifically designed to separate images of targets embedded in a typical background scene from those background images without targets. These filters are found using directed principal component analysis which maximally separates the two groups. The projected images are then clustered into one of n classes based on a minimum distance to a set of n cluster prototypes. These cluster prototypes have previously been identified using a modified clustering algorithm based on prior sensed data. Each projected image pattern is then fed into the associated cluster's trained neural network for classification. A detailed description of our algorithm will be given in this paper. We outline our methodology for designing the templates, describe our modified clustering algorithm, and provide details on the neural network classifiers. Evaluation of the overall algorithm demonstrates that our detection rates approach 96% with a false positive rate of less than 0.03%.

  6. Hyperspectral Image Denoising Using a Nonlocal Spectral Spatial Principal Component Analysis

    NASA Astrophysics Data System (ADS)

    Li, D.; Xu, L.; Peng, J.; Ma, J.

    2018-04-01

    Hyperspectral images (HSIs) denoising is a critical research area in image processing duo to its importance in improving the quality of HSIs, which has a negative impact on object detection and classification and so on. In this paper, we develop a noise reduction method based on principal component analysis (PCA) for hyperspectral imagery, which is dependent on the assumption that the noise can be removed by selecting the leading principal components. The main contribution of paper is to introduce the spectral spatial structure and nonlocal similarity of the HSIs into the PCA denoising model. PCA with spectral spatial structure can exploit spectral correlation and spatial correlation of HSI by using 3D blocks instead of 2D patches. Nonlocal similarity means the similarity between the referenced pixel and other pixels in nonlocal area, where Mahalanobis distance algorithm is used to estimate the spatial spectral similarity by calculating the distance in 3D blocks. The proposed method is tested on both simulated and real hyperspectral images, the results demonstrate that the proposed method is superior to several other popular methods in HSI denoising.

  7. Kinematic foot types in youth with equinovarus secondary to hemiplegia.

    PubMed

    Krzak, Joseph J; Corcos, Daniel M; Damiano, Diane L; Graf, Adam; Hedeker, Donald; Smith, Peter A; Harris, Gerald F

    2015-02-01

    Elevated kinematic variability of the foot and ankle segments exists during gait among individuals with equinovarus secondary to hemiplegic cerebral palsy (CP). Clinicians have previously addressed such variability by developing classification schemes to identify subgroups of individuals based on their kinematics. To identify kinematic subgroups among youth with equinovarus secondary to CP using 3-dimensional multi-segment foot and ankle kinematics during locomotion as inputs for principal component analysis (PCA), and K-means cluster analysis. In a single assessment session, multi-segment foot and ankle kinematics using the Milwaukee Foot Model (MFM) were collected in 24 children/adolescents with equinovarus and 20 typically developing children/adolescents. PCA was used as a data reduction technique on 40 variables. K-means cluster analysis was performed on the first six principal components (PCs) which accounted for 92% of the variance of the dataset. The PCs described the location and plane of involvement in the foot and ankle. Five distinct kinematic subgroups were identified using K-means clustering. Participants with equinovarus presented with variable involvement ranging from primary hindfoot or forefoot deviations to deformtiy that included both segments in multiple planes. This study provides further evidence of the variability in foot characteristics associated with equinovarus secondary to hemiplegic CP. These findings would not have been detected using a single segment foot model. The identification of multiple kinematic subgroups with unique foot and ankle characteristics has the potential to improve treatment since similar patients within a subgroup are likely to benefit from the same intervention(s). Copyright © 2014 Elsevier B.V. All rights reserved.

  8. Radar fall detection using principal component analysis

    NASA Astrophysics Data System (ADS)

    Jokanovic, Branka; Amin, Moeness; Ahmad, Fauzia; Boashash, Boualem

    2016-05-01

    Falls are a major cause of fatal and nonfatal injuries in people aged 65 years and older. Radar has the potential to become one of the leading technologies for fall detection, thereby enabling the elderly to live independently. Existing techniques for fall detection using radar are based on manual feature extraction and require significant parameter tuning in order to provide successful detections. In this paper, we employ principal component analysis for fall detection, wherein eigen images of observed motions are employed for classification. Using real data, we demonstrate that the PCA based technique provides performance improvement over the conventional feature extraction methods.

  9. Incremental Principal Component Analysis Based Outlier Detection Methods for Spatiotemporal Data Streams

    NASA Astrophysics Data System (ADS)

    Bhushan, A.; Sharker, M. H.; Karimi, H. A.

    2015-07-01

    In this paper, we address outliers in spatiotemporal data streams obtained from sensors placed across geographically distributed locations. Outliers may appear in such sensor data due to various reasons such as instrumental error and environmental change. Real-time detection of these outliers is essential to prevent propagation of errors in subsequent analyses and results. Incremental Principal Component Analysis (IPCA) is one possible approach for detecting outliers in such type of spatiotemporal data streams. IPCA has been widely used in many real-time applications such as credit card fraud detection, pattern recognition, and image analysis. However, the suitability of applying IPCA for outlier detection in spatiotemporal data streams is unknown and needs to be investigated. To fill this research gap, this paper contributes by presenting two new IPCA-based outlier detection methods and performing a comparative analysis with the existing IPCA-based outlier detection methods to assess their suitability for spatiotemporal sensor data streams.

  10. Registration algorithm of point clouds based on multiscale normal features

    NASA Astrophysics Data System (ADS)

    Lu, Jun; Peng, Zhongtao; Su, Hang; Xia, GuiHua

    2015-01-01

    The point cloud registration technology for obtaining a three-dimensional digital model is widely applied in many areas. To improve the accuracy and speed of point cloud registration, a registration method based on multiscale normal vectors is proposed. The proposed registration method mainly includes three parts: the selection of key points, the calculation of feature descriptors, and the determining and optimization of correspondences. First, key points are selected from the point cloud based on the changes of magnitude of multiscale curvatures obtained by using principal components analysis. Then the feature descriptor of each key point is proposed, which consists of 21 elements based on multiscale normal vectors and curvatures. The correspondences in a pair of two point clouds are determined according to the descriptor's similarity of key points in the source point cloud and target point cloud. Correspondences are optimized by using a random sampling consistency algorithm and clustering technology. Finally, singular value decomposition is applied to optimized correspondences so that the rigid transformation matrix between two point clouds is obtained. Experimental results show that the proposed point cloud registration algorithm has a faster calculation speed, higher registration accuracy, and better antinoise performance.

  11. Proteomic analysis of cow, yak, buffalo, goat and camel milk whey proteins: quantitative differential expression patterns.

    PubMed

    Yang, Yongxin; Bu, Dengpan; Zhao, Xiaowei; Sun, Peng; Wang, Jiaqi; Zhou, Lingyun

    2013-04-05

    To aid in unraveling diverse genetic and biological unknowns, a proteomic approach was used to analyze the whey proteome in cow, yak, buffalo, goat, and camel milk based on the isobaric tag for relative and absolute quantification (iTRAQ) techniques. This analysis is the first to produce proteomic data for the milk from the above-mentioned animal species: 211 proteins have been identified and 113 proteins have been categorized according to molecular function, cellular components, and biological processes based on gene ontology annotation. The results of principal component analysis showed significant differences in proteomic patterns among goat, camel, cow, buffalo, and yak milk. Furthermore, 177 differentially expressed proteins were submitted to advanced hierarchical clustering. The resulting clustering pattern included three major sample clusters: (1) cow, buffalo, and yak milk; (2) goat, cow, buffalo, and yak milk; and (3) camel milk. Certain proteins were chosen as characterization traits for a given species: whey acidic protein and quinone oxidoreductase for camel milk, biglycan for goat milk, uncharacterized protein (Accession Number: F1MK50 ) for yak milk, clusterin for buffalo milk, and primary amine oxidase for cow milk. These results help reveal the quantitative milk whey proteome pattern for analyzed species. This provides information for evaluating adulteration of specific specie milk and may provide potential directions for application of specific milk protein production based on physiological differences among animal species.

  12. Essential oil variation among natural populations of Lavandula multifida L. (Lamiaceae).

    PubMed

    Chograni, Hnia; Zaouali, Yosr; Rajeb, Chayma; Boussaid, Mohamed

    2010-04-01

    Volatiles from twelve wild Tunisian populations of Lavandula multifida L. growing in different bioclimatic zones were assessed by GC (RI) and GC/MS. Thirty-six constituents, representing 83.48% of the total oil were identified. The major components at the species level were carvacrol (31.81%), beta-bisabolene (14.89%), and acrylic acid dodecyl ester (11.43%). These volatiles, together with alpha-pinene, were also the main compounds discriminating the populations. According to these dominant compounds, one chemotype was revealed, a carvacrol/beta-bisabolene/acrylic acid dodecyl ester chemotype. However, a significant variation among the populations was observed for the majority of the constituents. A high chemical-population structure, estimated both by principal component analysis (PCA) and unweighted pair group method with averaging (UPGMA) cluster analysis based on Euclidean distances, was observed. Both methods allowed separation of the populations in three groups defined rather by minor than by major compounds. The population groups were not strictly concordant with their bioclimatic or geographic location. Conservation strategies should concern all populations, because of their low size and their high level of destruction. Populations exhibiting particular compounds other than the major ones should be protected first.

  13. Hydrochemical and multivariate analysis of groundwater quality in the northwest of Sinai, Egypt.

    PubMed

    El-Shahat, M F; Sadek, M A; Salem, W M; Embaby, A A; Mohamed, F A

    2017-08-01

    The northwestern coast of Sinai is home to many economic activities and development programs, thus evaluation of the potentiality and vulnerability of water resources is important. The present work has been conducted on the groundwater resources of this area for describing the major features of groundwater quality and the principal factors that control salinity evolution. The major ionic content of 39 groundwater samples collected from the Quaternary aquifer shows high coefficients of variation reflecting asymmetry of aquifer recharge. The groundwater samples have been classified into four clusters (using hierarchical cluster analysis), these match the variety of total dissolvable solids, water types and ionic orders. The principal component analysis combined the ionic parameters of the studied groundwater samples into two principal components. The first represents about 56% of the whole sample variance reflecting a salinization due to evaporation, leaching, dissolution of marine salts and/or seawater intrusion. The second represents about 15.8% reflecting dilution with rain water and the El-Salam Canal. Most groundwater samples were not suitable for human consumption and about 41% are suitable for irrigation. However, all groundwater samples are suitable for cattle, about 69% and 15% are suitable for horses and poultry, respectively.

  14. Methods of Implementation of Evidence-Based Stroke Care in Europe: European Implementation Score Collaboration.

    PubMed

    Di Carlo, Antonio; Pezzella, Francesca Romana; Fraser, Alec; Bovis, Francesca; Baeza, Juan; McKevitt, Chris; Boaz, Annette; Heuschmann, Peter; Wolfe, Charles D A; Inzitari, Domenico

    2015-08-01

    Differences in stroke care and outcomes reported in Europe may reflect different degrees of implementation of evidence-based interventions. We evaluated strategies for implementing research evidence into stroke care in 10 European countries. A questionnaire was developed and administered through face-to-face interviews with key informants. Implementation strategies were investigated considering 3 levels (macro, meso, and micro, eg, policy, organization, patients/professionals) identified by the framing analysis, and different settings (primary, hospital, and specialist) of stroke care. Similarities and differences among countries were evaluated using the categorical principal components analysis. Implementation methods reported by ≥7 countries included nonmandatory policies, public financial incentives, continuing professional education, distribution of educational material, educational meetings and campaigns, guidelines, opinion leaders', and stroke patients associations' activities. Audits were present in 6 countries at national level; national and regional regulations in 4 countries. Private financial incentives, reminders, and educational outreach visits were reported only in 2 countries. At national level, the first principal component of categorical principal components analysis separated England, France, Scotland, and Sweden, all with positive object scores, from the other countries. Belgium and Lithuania obtained the lowest scores. At regional level, England, France, Germany, Italy, and Sweden had positive scores in the first principal component, whereas Belgium, Lithuania, Poland, and Scotland showed negative scores. Spain was in an intermediate position. We developed a novel method to assess different domains of implementation in stroke care. Clear variations were observed among European countries. The new tool may be used elsewhere for future contributions. © 2015 American Heart Association, Inc.

  15. Multi-wavelength HPLC fingerprints from complex substances: An exploratory chemometrics study of the Cassia seed example.

    PubMed

    Ni, Yongnian; Lai, Yanhua; Brandes, Sarina; Kokot, Serge

    2009-08-11

    Multi-wavelength fingerprints of Cassia seed, a traditional Chinese medicine (TCM), were collected by high-performance liquid chromatography (HPLC) at two wavelengths with the use of diode array detection. The two data sets of chromatograms were combined by the data fusion-based method. This data set of fingerprints was compared separately with the two data sets collected at each of the two wavelengths. It was demonstrated with the use of principal component analysis (PCA), that multi-wavelength fingerprints provided a much improved representation of the differences in the samples. Thereafter, the multi-wavelength fingerprint data set was submitted for classification to a suite of chemometrics methods viz. fuzzy clustering (FC), SIMCA and the rank ordering MCDM PROMETHEE and GAIA. Each method highlighted different properties of the data matrix according to the fingerprints from different types of Cassia seeds. In general, the PROMETHEE and GAIA MCDM methods provided the most comprehensive information for matching and discrimination of the fingerprints, and appeared to be best suited for quality assurance purposes for these and similar types of sample.

  16. Differentiation of aflatoxigenic and non-aflatoxigenic strains of Aspergilli by FT-IR spectroscopy.

    PubMed

    Atkinson, Curtis; Pechanova, Olga; Sparks, Darrell L; Brown, Ashli; Rodriguez, Jose M

    2014-01-01

    Fourier transform infrared spectroscopy (FT-IR) is a well-established and widely accepted methodology to identify and differentiate diverse microbial species. In this study, FT-IR was used to differentiate 20 strains of ubiquitous and agronomically important phytopathogens of Aspergillus flavus and Aspergillus parasiticus. By analyzing their spectral profiles via principal component and cluster analysis, differentiation was achieved between the aflatoxin-producing and nonproducing strains of both fungal species. This study thus indicates that FT-IR coupled to multivariate statistics can rapidly differentiate strains of Aspergilli based on their toxigenicity.

  17. Factor Analysis and Counseling Research

    ERIC Educational Resources Information Center

    Weiss, David J.

    1970-01-01

    Topics discussed include factor analysis versus cluster analysis, analysis of Q correlation matrices, ipsativity and factor analysis, and tests for the significance of a correlation matrix prior to application of factor analytic techniques. Techniques for factor extraction discussed include principal components, canonical factor analysis, alpha…

  18. Inferring the use of forelimb suspensory locomotion by extinct primate species via shape exploration of the ulna.

    PubMed

    Rein, Thomas R; Harvati, Katerina; Harrison, Terry

    2015-01-01

    Uncovering links between skeletal morphology and locomotor behavior is an essential component of paleobiology because it allows researchers to infer the locomotor repertoire of extinct species based on preserved fossils. In this study, we explored ulnar shape in anthropoid primates using 3D geometric morphometrics to discover novel aspects of shape variation that correspond to observed differences in the relative amount of forelimb suspensory locomotion performed by species. The ultimate goal of this research was to construct an accurate predictive model that can be applied to infer the significance of these behaviors. We studied ulnar shape variation in extant species using principal component analysis. Species mainly clustered into phylogenetic groups along the first two principal components. Upon closer examination, the results showed that the position of species within each major clade corresponded closely with the proportion of forelimb suspensory locomotion that they have been observed to perform in nature. We used principal component regression to construct a predictive model for the proportion of these behaviors that would be expected to occur in the locomotor repertoire of anthropoid primates. We then applied this regression analysis to Pliopithecus vindobonensis, a stem catarrhine from the Miocene of central Europe, and found strong evidence that this species was adapted to perform a proportion of forelimb suspensory locomotion similar to that observed in the extant woolly monkey, Lagothrix lagothricha. Copyright © 2014 Elsevier Ltd. All rights reserved.

  19. A SPECTRAL GRAPH APPROACH TO DISCOVERING GENETIC ANCESTRY1

    PubMed Central

    Lee, Ann B.; Luca, Diana; Roeder, Kathryn

    2010-01-01

    Mapping human genetic variation is fundamentally interesting in fields such as anthropology and forensic inference. At the same time, patterns of genetic diversity confound efforts to determine the genetic basis of complex disease. Due to technological advances, it is now possible to measure hundreds of thousands of genetic variants per individual across the genome. Principal component analysis (PCA) is routinely used to summarize the genetic similarity between subjects. The eigenvectors are interpreted as dimensions of ancestry. We build on this idea using a spectral graph approach. In the process we draw on connections between multidimensional scaling and spectral kernel methods. Our approach, based on a spectral embedding derived from the normalized Laplacian of a graph, can produce more meaningful delineation of ancestry than by using PCA. The method is stable to outliers and can more easily incorporate different similarity measures of genetic data than PCA. We illustrate a new algorithm for genetic clustering and association analysis on a large, genetically heterogeneous sample. PMID:20689656

  20. Characterization and Differentiation of Petroleum-Derived Products by E-Nose Fingerprints

    PubMed Central

    Ferreiro-González, Marta; Palma, Miguel; Ayuso, Jesús; Álvarez, José A.; Barroso, Carmelo G.

    2017-01-01

    Characterization of petroleum-derived products is an area of continuing importance in environmental science, mainly related to fuel spills. In this study, a non-separative analytical method based on E-Nose (Electronic Nose) is presented as a rapid alternative for the characterization of several different petroleum-derived products including gasoline, diesel, aromatic solvents, and ethanol samples, which were poured onto different surfaces (wood, cork, and cotton). The working conditions about the headspace generation were 145 °C and 10 min. Mass spectroscopic data (45–200 m/z) combined with chemometric tools such as hierarchical cluster analysis (HCA), later principal component analysis (PCA), and finally linear discriminant analysis (LDA) allowed for a full discrimination of the samples. A characteristic fingerprint for each product can be used for discrimination or identification. The E-Nose can be considered as a green technique, and it is rapid and easy to use in routine analysis, thus providing a good alternative to currently used methods. PMID:29113069

  1. Cortical hypometabolism and hypoperfusion in Parkinson's disease is extensive: probably even at early disease stages.

    PubMed

    Borghammer, Per; Chakravarty, Mallar; Jonsdottir, Kristjana Yr; Sato, Noriko; Matsuda, Hiroshi; Ito, Kengo; Arahata, Yutaka; Kato, Takashi; Gjedde, Albert

    2010-05-01

    Recent cerebral blood flow (CBF) and glucose consumption (CMRglc) studies of Parkinson's disease (PD) revealed conflicting results. Using simulated data, we previously demonstrated that the often-reported subcortical hypermetabolism in PD could be explained as an artifact of biased global mean (GM) normalization, and that low-magnitude, extensive cortical hypometabolism is best detected by alternative data-driven normalization methods. Thus, we hypothesized that PD is characterized by extensive cortical hypometabolism but no concurrent widespread subcortical hypermetabolism and tested it on three independent samples of PD patients. We compared SPECT CBF images of 32 early-stage and 33 late-stage PD patients with that of 60 matched controls. We also compared PET FDG images from 23 late-stage PD patients with that of 13 controls. Three different normalization methods were compared: (1) GM normalization, (2) cerebellum normalization, (3) reference cluster normalization (Yakushev et al.). We employed standard voxel-based statistics (fMRIstat) and principal component analysis (SSM). Additionally, we performed a meta-analysis of all quantitative CBF and CMRglc studies in the literature to investigate whether the global mean (GM) values in PD are decreased. Voxel-based analysis with GM normalization and the SSM method performed similarly, i.e., both detected decreases in small cortical clusters and concomitant increases in extensive subcortical regions. Cerebellum normalization revealed more widespread cortical decreases but no subcortical increase. In all comparisons, the Yakushev method detected nearly identical patterns of very extensive cortical hypometabolism. Lastly, the meta-analyses demonstrated that global CBF and CMRglc values are decreased in PD. Based on the results, we conclude that PD most likely has widespread cortical hypometabolism, even at early disease stages. In contrast, extensive subcortical hypermetabolism is probably not a feature of PD.

  2. Risk-Based Prioritization Method for the Classification of Groundwater Pollution from Hazardous Waste Landfills.

    PubMed

    Yang, Yu; Jiang, Yong-Hai; Lian, Xin-Ying; Xi, Bei-Dou; Ma, Zhi-Fei; Xu, Xiang-Jian; An, Da

    2016-12-01

    Hazardous waste landfill sites are a significant source of groundwater pollution. To ensure that these landfills with a significantly high risk of groundwater contamination are properly managed, a risk-based ranking method related to groundwater contamination is needed. In this research, a risk-based prioritization method for the classification of groundwater pollution from hazardous waste landfills was established. The method encompasses five phases, including risk pre-screening, indicator selection, characterization, classification and, lastly, validation. In the risk ranking index system employed here, 14 indicators involving hazardous waste landfills and migration in the vadose zone as well as aquifer were selected. The boundary of each indicator was determined by K-means cluster analysis and the weight of each indicator was calculated by principal component analysis. These methods were applied to 37 hazardous waste landfills in China. The result showed that the risk for groundwater contamination from hazardous waste landfills could be ranked into three classes from low to high risk. In all, 62.2 % of the hazardous waste landfill sites were classified in the low and medium risk classes. The process simulation method and standardized anomalies were used to validate the result of risk ranking; the results were consistent with the simulated results related to the characteristics of contamination. The risk ranking method was feasible, valid and can provide reference data related to risk management for groundwater contamination at hazardous waste landfill sites.

  3. Chemical Composition of Juniperus communis L. Cone Essential Oil and Its Variability among Wild Populations in Kosovo.

    PubMed

    Hajdari, Avni; Mustafa, Behxhet; Nebija, Dashnor; Miftari, Elheme; Quave, Cassandra L; Novak, Johannes

    2015-11-01

    Ripe cones of Juniperus communis L. (Cupressaceae) were collected from five wild populations in Kosovo, with the aim of investigating the chemical composition and natural variation of essential oils between and within wild populations. Ripe cones were collected, air dried, crushed, and the essential oils obtained by hydrodistillation. The essential-oil constituents were identified by GC-FID and GC/MS analyses. The yield of essential oil differed depending on the population origins and ranged from 0.4 to 3.8% (v/w, based on the dry weight). In total, 42 compounds were identified in the essential oils of all populations. The principal components of the cone-essential oils were α-pinene, followed by β-myrcene, sabinene, and D-limonene. Taking into consideration the yield and chemical composition, the essential oil originating from various collection sites in Kosovo fulfilled the minimum requirements for J. communis essential oils of the European Pharmacopoeia. Hierarchical cluster analysis (HCA) and principal component analysis (PCA) were used to determine the influence of the geographical variations on the essential-oil composition. These statistical analyses suggested that the clustering of populations was not related to their geographic location, but rather appeared to be linked to local selective forces acting on the chemotype diversity. Copyright © 2015 Verlag Helvetica Chimica Acta AG, Zürich.

  4. Analysis of PETT images in psychiatric disorders

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Brodie, J.D.; Gomez-Mont, F.; Volkow, N.D.

    1983-01-01

    A quantitative method is presented for studying the pattern of metabolic activity in a set of Positron Emission Transaxial Tomography (PETT) images. Using complex Fourier coefficients as a feature vector for each image, cluster, principal components, and discriminant function analyses are used to empirically describe metabolic differences between control subjects and patients with DSM III diagnosis for schizophrenia or endogenous depression. We also present data on the effects of neuroleptic treatment on the local cerebral metabolic rate of glucose utilization (LCMRGI) in a group of chronic schizophrenics using the region of interest approach. 15 references, 4 figures, 3 tables.

  5. The relationship between leadership, teamworking, structure, burnout and attitude to patients on acute psychiatric wards

    PubMed Central

    Nijman, Henk; Simpson, Alan; Jones, Julia

    2010-01-01

    Background Conflict (aggression, substance use, absconding, etc.) and containment (coerced medication, manual restraint, etc.) threaten the safety of patients and staff on psychiatric wards. Previous work has suggested that staff variables may be significant in explaining differences between wards in their rates of these behaviours, and that structure (ward organisation, rules and daily routines) might be the most critical of these. This paper describes the exploration of a large dataset to assess the relationship between structure and other staff variables. Methods A multivariate cross-sectional design was utilised. Data were collected from staff on 136 acute psychiatric wards in 26 NHS Trusts in England, measuring leadership, teamwork, structure, burnout and attitudes towards difficult patients. Relationships between these variables were explored through principal components analysis (PCA), structural equation modelling and cluster analysis. Results Principal components analysis resulted in the identification of each questionnaire as a separate factor, indicating that the selected instruments assessed a number of non-overlapping items relevant for ward functioning. Structural equation modelling suggested a linear model in which leadership influenced teamwork, teamwork structure; structure burnout; and burnout feelings about difficult patients. Finally, cluster analysis identified two significantly distinct groups of wards: the larger of which had particularly good leadership, teamwork, structure, attitudes towards patients and low burnout; and the second smaller proportion which was poor on all variables and high on burnout. The better functioning cluster of wards had significantly lower rates of containment events. Conclusion The overall performance of staff teams is associated with differing rates of containment on wards. Interventions to reduce rates of containment on wards may need to address staff issues at every level, from leadership through to staff attitudes. PMID:20082064

  6. Estimating multilevel logistic regression models when the number of clusters is low: a comparison of different statistical software procedures.

    PubMed

    Austin, Peter C

    2010-04-22

    Multilevel logistic regression models are increasingly being used to analyze clustered data in medical, public health, epidemiological, and educational research. Procedures for estimating the parameters of such models are available in many statistical software packages. There is currently little evidence on the minimum number of clusters necessary to reliably fit multilevel regression models. We conducted a Monte Carlo study to compare the performance of different statistical software procedures for estimating multilevel logistic regression models when the number of clusters was low. We examined procedures available in BUGS, HLM, R, SAS, and Stata. We found that there were qualitative differences in the performance of different software procedures for estimating multilevel logistic models when the number of clusters was low. Among the likelihood-based procedures, estimation methods based on adaptive Gauss-Hermite approximations to the likelihood (glmer in R and xtlogit in Stata) or adaptive Gaussian quadrature (Proc NLMIXED in SAS) tended to have superior performance for estimating variance components when the number of clusters was small, compared to software procedures based on penalized quasi-likelihood. However, only Bayesian estimation with BUGS allowed for accurate estimation of variance components when there were fewer than 10 clusters. For all statistical software procedures, estimation of variance components tended to be poor when there were only five subjects per cluster, regardless of the number of clusters.

  7. Exploring functional data analysis and wavelet principal component analysis on ecstasy (MDMA) wastewater data.

    PubMed

    Salvatore, Stefania; Bramness, Jørgen G; Røislien, Jo

    2016-07-12

    Wastewater-based epidemiology (WBE) is a novel approach in drug use epidemiology which aims to monitor the extent of use of various drugs in a community. In this study, we investigate functional principal component analysis (FPCA) as a tool for analysing WBE data and compare it to traditional principal component analysis (PCA) and to wavelet principal component analysis (WPCA) which is more flexible temporally. We analysed temporal wastewater data from 42 European cities collected daily over one week in March 2013. The main temporal features of ecstasy (MDMA) were extracted using FPCA using both Fourier and B-spline basis functions with three different smoothing parameters, along with PCA and WPCA with different mother wavelets and shrinkage rules. The stability of FPCA was explored through bootstrapping and analysis of sensitivity to missing data. The first three principal components (PCs), functional principal components (FPCs) and wavelet principal components (WPCs) explained 87.5-99.6 % of the temporal variation between cities, depending on the choice of basis and smoothing. The extracted temporal features from PCA, FPCA and WPCA were consistent. FPCA using Fourier basis and common-optimal smoothing was the most stable and least sensitive to missing data. FPCA is a flexible and analytically tractable method for analysing temporal changes in wastewater data, and is robust to missing data. WPCA did not reveal any rapid temporal changes in the data not captured by FPCA. Overall the results suggest FPCA with Fourier basis functions and common-optimal smoothing parameter as the most accurate approach when analysing WBE data.

  8. Superpixel-based spectral classification for the detection of head and neck cancer with hyperspectral imaging

    NASA Astrophysics Data System (ADS)

    Chung, Hyunkoo; Lu, Guolan; Tian, Zhiqiang; Wang, Dongsheng; Chen, Zhuo Georgia; Fei, Baowei

    2016-03-01

    Hyperspectral imaging (HSI) is an emerging imaging modality for medical applications. HSI acquires two dimensional images at various wavelengths. The combination of both spectral and spatial information provides quantitative information for cancer detection and diagnosis. This paper proposes using superpixels, principal component analysis (PCA), and support vector machine (SVM) to distinguish regions of tumor from healthy tissue. The classification method uses 2 principal components decomposed from hyperspectral images and obtains an average sensitivity of 93% and an average specificity of 85% for 11 mice. The hyperspectral imaging technology and classification method can have various applications in cancer research and management.

  9. Rapid and sensitive analysis of 27 underivatized free amino acids, dipeptides, and tripeptides in fruits of Siraitia grosvenorii Swingle using HILIC-UHPLC-QTRAP(®)/MS (2) combined with chemometrics methods.

    PubMed

    Zhou, Guisheng; Wang, Mengyue; Li, Yang; Peng, Ying; Li, Xiaobo

    2015-08-01

    In the present study, a new strategy based on chemical analysis and chemometrics methods was proposed for the comprehensive analysis and profiling of underivatized free amino acids (FAAs) and small peptides among various Luo-Han-Guo (LHG) samples. Firstly, the ultrasound-assisted extraction (UAE) parameters were optimized using Plackett-Burman (PB) screening and Box-Behnken designs (BBD), and the following optimal UAE conditions were obtained: ultrasound power of 280 W, extraction time of 43 min, and the solid-liquid ratio of 302 mL/g. Secondly, a rapid and sensitive analytical method was developed for simultaneous quantification of 24 FAAs and 3 active small peptides in LHG at trace levels using hydrophilic interaction ultra-performance liquid chromatography coupled with triple-quadrupole linear ion-trap tandem mass spectrometry (HILIC-UHPLC-QTRAP(®)/MS(2)). The analytical method was validated by matrix effects, linearity, LODs, LOQs, precision, repeatability, stability, and recovery. Thirdly, the proposed optimal UAE conditions and analytical methods were applied to measurement of LHG samples. It was shown that LHG was rich in essential amino acids, which were beneficial nutrient substances for human health. Finally, based on the contents of the 27 analytes, the chemometrics methods of unsupervised principal component analysis (PCA) and supervised counter propagation artificial neural network (CP-ANN) were applied to differentiate and classify the 40 batches of LHG samples from different cultivated forms, regions, and varieties. As a result, these samples were mainly clustered into three clusters, which illustrated the cultivating disparity among the samples. In summary, the presented strategy had potential for the investigation of edible plants and agricultural products containing FAAs and small peptides.

  10. Mapping brain activity in gradient-echo functional MRI using principal component analysis

    NASA Astrophysics Data System (ADS)

    Khosla, Deepak; Singh, Manbir; Don, Manuel

    1997-05-01

    The detection of sites of brain activation in functional MRI has been a topic of immense research interest and many technique shave been proposed to this end. Recently, principal component analysis (PCA) has been applied to extract the activated regions and their time course of activation. This method is based on the assumption that the activation is orthogonal to other signal variations such as brain motion, physiological oscillations and other uncorrelated noises. A distinct advantage of this method is that it does not require any knowledge of the time course of the true stimulus paradigm. This technique is well suited to EPI image sequences where the sampling rate is high enough to capture the effects of physiological oscillations. In this work, we propose and apply tow methods that are based on PCA to conventional gradient-echo images and investigate their usefulness as tools to extract reliable information on brain activation. The first method is a conventional technique where a single image sequence with alternating on and off stages is subject to a principal component analysis. The second method is a PCA-based approach called the common spatial factor analysis technique (CSF). As the name suggests, this method relies on common spatial factors between the above fMRI image sequence and a background fMRI. We have applied these methods to identify active brain ares during visual stimulation and motor tasks. The results from these methods are compared to those obtained by using the standard cross-correlation technique. We found good agreement in the areas identified as active across all three techniques. The results suggest that PCA and CSF methods have good potential in detecting the true stimulus correlated changes in the presence of other interfering signals.

  11. Discrimination of chicken seasonings and beef seasonings using electronic nose and sensory evaluation.

    PubMed

    Tian, Huaixiang; Li, Fenghua; Qin, Lan; Yu, Haiyan; Ma, Xia

    2014-11-01

    This study examines the feasibility of electronic nose as a method to discriminate chicken and beef seasonings and to predict sensory attributes. Sensory evaluation showed that 8 chicken seasonings and 4 beef seasonings could be well discriminated and classified based on 8 sensory attributes. The sensory attributes including chicken/beef, gamey, garlic, spicy, onion, soy sauce, retention, and overall aroma intensity were generated by a trained evaluation panel. Principal component analysis (PCA), discriminant factor analysis (DFA), and cluster analysis (CA) combined with electronic nose were used to discriminate seasoning samples based on the difference of the sensor response signals of chicken and beef seasonings. The correlation between sensory attributes and electronic nose sensors signal was established using partial least squares regression (PLSR) method. The results showed that the seasoning samples were all correctly classified by the electronic nose combined with PCA, DFA, and CA. The electronic nose gave good prediction results for all the sensory attributes with correlation coefficient (r) higher than 0.8. The work indicated that electronic nose is an effective method for discriminating different seasonings and predicting sensory attributes. © 2014 Institute of Food Technologists®

  12. Gambling, games of skill and human ecology: a pilot study by a multidimensional analysis approach.

    PubMed

    Valera, Luca; Giuliani, Alessandro; Gizzi, Alessio; Tartaglia, Francesco; Tambone, Vittoradolfo

    2015-01-01

    The present pilot study aims at analyzing the human activity of playing in the light of an indicator of human ecology (HE). We highlighted the four essential anthropological dimensions (FEAD), starting from the analysis of questionnaires administered to actual gamers. The coherence between theoretical construct and observational data is a remarkable proof-of-concept of the possibility of establishing an experimentally motivated link between a philosophical construct (coming from Huizinga's Homo ludens definition) and actual gamers' motivation pattern. The starting hypothesis is that the activity of playing becomes ecological (and thus not harmful) when it achieves the harmony between the FEAD, thus realizing HE; conversely, it becomes at risk of creating some form of addiction, when destroying FEAD balance. We analyzed the data by means of variable clustering (oblique principal components) so to experimentally verify the existence of the hypothesized dimensions. The subsequent projection of statistical units (gamers) on the orthogonal space spanned by principal components allowed us to generate a meaningful, albeit preliminary, clusterization of gamer profiles.

  13. Recursive approach of EEG-segment-based principal component analysis substantially reduces cryogenic pump artifacts in simultaneous EEG-fMRI data.

    PubMed

    Kim, Hyun-Chul; Yoo, Seung-Schik; Lee, Jong-Hwan

    2015-01-01

    Electroencephalography (EEG) data simultaneously acquired with functional magnetic resonance imaging (fMRI) data are preprocessed to remove gradient artifacts (GAs) and ballistocardiographic artifacts (BCAs). Nonetheless, these data, especially in the gamma frequency range, can be contaminated by residual artifacts produced by mechanical vibrations in the MRI system, in particular the cryogenic pump that compresses and transports the helium that chills the magnet (the helium-pump). However, few options are available for the removal of helium-pump artifacts. In this study, we propose a recursive approach of EEG-segment-based principal component analysis (rsPCA) that enables the removal of these helium-pump artifacts. Using the rsPCA method, feature vectors representing helium-pump artifacts were successfully extracted as eigenvectors, and the reconstructed signals of the feature vectors were subsequently removed. A test using simultaneous EEG-fMRI data acquired from left-hand (LH) and right-hand (RH) clenching tasks performed by volunteers found that the proposed rsPCA method substantially reduced helium-pump artifacts in the EEG data and significantly enhanced task-related gamma band activity levels (p=0.0038 and 0.0363 for LH and RH tasks, respectively) in EEG data that have had GAs and BCAs removed. The spatial patterns of the fMRI data were estimated using a hemodynamic response function (HRF) modeled from the estimated gamma band activity in a general linear model (GLM) framework. Active voxel clusters were identified in the post-/pre-central gyri of motor area, only from the rsPCA method (uncorrected p<0.001 for both LH/RH tasks). In addition, the superior temporal pole areas were consistently observed (uncorrected p<0.001 for the LH task and uncorrected p<0.05 for the RH task) in the spatial patterns of the HRF model for gamma band activity when the task paradigm and movement were also included in the GLM. Copyright © 2014 Elsevier Inc. All rights reserved.

  14. A novel combined approach of diffuse reflectance UV-Vis-NIR spectroscopy and multivariate analysis for non-destructive examination of blue ballpoint pen inks in forensic application

    NASA Astrophysics Data System (ADS)

    Kumar, Raj; Sharma, Vishal

    2017-03-01

    The present research is focused on the analysis of writing inks using destructive UV-Vis spectroscopy (dissolution of ink by the solvent) and non-destructive diffuse reflectance UV-Vis-NIR spectroscopy along with Chemometrics. Fifty seven samples of blue ballpoint pen inks were analyzed under optimum conditions to determine the differences in spectral features of inks among same and different manufacturers. Normalization was performed on the spectroscopic data before chemometric analysis. Principal Component Analysis (PCA) and K-mean cluster analysis were used on the data to ascertain whether the blue ballpoint pen inks could be differentiated by their UV-Vis/UV-Vis NIR spectra. The discriminating power is calculated by qualitative analysis by the visual comparison of the spectra (absorbance peaks), produced by the destructive and non-destructive methods. In the latter two methods, the pairwise comparison is made by incorporating the clustering method. It is found that chemometric method provides better discriminating power (98.72% and 99.46%, in destructive and non-destructive, respectively) in comparison to the qualitative analysis (69.67%).

  15. Automated X-Ray Diffraction of Irradiated Materials

    DOE PAGES

    Rodman, John; Lin, Yuewei; Sprouster, David; ...

    2017-10-26

    Synchrotron-based X-ray diffraction (XRD) and small-angle Xray scattering (SAXS) characterization techniques used on unirradiated and irradiated reactor pressure vessel steels yield large amounts of data. Machine learning techniques, including PCA, offer a novel method of analyzing and visualizing these large data sets in order to determine the effects of chemistry and irradiation conditions on the formation of radiation induced precipitates. In order to run analysis on these data sets, preprocessing must be carried out to convert the data to a usable format and mask the 2-D detector images to account for experimental variations. Once the data has been preprocessed, itmore » can be organized and visualized using principal component analysis (PCA), multi-dimensional scaling, and k-means clustering. In conclusion, from these techniques, it is shown that sample chemistry has a notable effect on the formation of the radiation induced precipitates in reactor pressure vessel steels.« less

  16. [Sources and potential risk of heavy metals in roadside soils of Xi' an City].

    PubMed

    Chen, Jing-hui; Lu, Xin-wei; Zhai, Meng

    2011-07-01

    Based on the X-Ray fluorescence spectroscopic measurement of heavy metals concentration in roadside soil samples from Xi' an City, and by the methods of principal component analysis, cluster analysis, and correlation analysis, this paper approached the possible sources of heavy metals in the roadside soils of the City. In the meantime, potential ecological risk index was used to assess the ecological risk of the heavy metals. In the roadside soils, the mean concentrations of Co, Cr, Cu, Mn, Ni, Pb, and Zn were higher than those of the Shaanxi soil background values. The As, Mn and Ni in roadside soils mainly came from natural source and transportation source, the Cu, Pb, and Zn mainly came from transportation source, and the Co and Cr mainly came from industry source. These heavy metals in the roadside soils belonged to medium pollution, and had medium potential ecological risk.

  17. Comparative Analysis of a Principal Component Analysis-Based and an Artificial Neural Network-Based Method for Baseline Removal.

    PubMed

    Carvajal, Roberto C; Arias, Luis E; Garces, Hugo O; Sbarbaro, Daniel G

    2016-04-01

    This work presents a non-parametric method based on a principal component analysis (PCA) and a parametric one based on artificial neural networks (ANN) to remove continuous baseline features from spectra. The non-parametric method estimates the baseline based on a set of sampled basis vectors obtained from PCA applied over a previously composed continuous spectra learning matrix. The parametric method, however, uses an ANN to filter out the baseline. Previous studies have demonstrated that this method is one of the most effective for baseline removal. The evaluation of both methods was carried out by using a synthetic database designed for benchmarking baseline removal algorithms, containing 100 synthetic composed spectra at different signal-to-baseline ratio (SBR), signal-to-noise ratio (SNR), and baseline slopes. In addition to deomonstrating the utility of the proposed methods and to compare them in a real application, a spectral data set measured from a flame radiation process was used. Several performance metrics such as correlation coefficient, chi-square value, and goodness-of-fit coefficient were calculated to quantify and compare both algorithms. Results demonstrate that the PCA-based method outperforms the one based on ANN both in terms of performance and simplicity. © The Author(s) 2016.

  18. Efficient principal component analysis for multivariate 3D voxel-based mapping of brain functional imaging data sets as applied to FDG-PET and normal aging.

    PubMed

    Zuendorf, Gerhard; Kerrouche, Nacer; Herholz, Karl; Baron, Jean-Claude

    2003-01-01

    Principal component analysis (PCA) is a well-known technique for reduction of dimensionality of functional imaging data. PCA can be looked at as the projection of the original images onto a new orthogonal coordinate system with lower dimensions. The new axes explain the variance in the images in decreasing order of importance, showing correlations between brain regions. We used an efficient, stable and analytical method to work out the PCA of Positron Emission Tomography (PET) images of 74 normal subjects using [(18)F]fluoro-2-deoxy-D-glucose (FDG) as a tracer. Principal components (PCs) and their relation to age effects were investigated. Correlations between the projections of the images on the new axes and the age of the subjects were carried out. The first two PCs could be identified as being the only PCs significantly correlated to age. The first principal component, which explained 10% of the data set variance, was reduced only in subjects of age 55 or older and was related to loss of signal in and adjacent to ventricles and basal cisterns, reflecting expected age-related brain atrophy with enlarging CSF spaces. The second principal component, which accounted for 8% of the total variance, had high loadings from prefrontal, posterior parietal and posterior cingulate cortices and showed the strongest correlation with age (r = -0.56), entirely consistent with previously documented age-related declines in brain glucose utilization. Thus, our method showed that the effect of aging on brain metabolism has at least two independent dimensions. This method should have widespread applications in multivariate analysis of brain functional images. Copyright 2002 Wiley-Liss, Inc.

  19. Nonlinear Principal Components Analysis: Introduction and Application

    ERIC Educational Resources Information Center

    Linting, Marielle; Meulman, Jacqueline J.; Groenen, Patrick J. F.; van der Koojj, Anita J.

    2007-01-01

    The authors provide a didactic treatment of nonlinear (categorical) principal components analysis (PCA). This method is the nonlinear equivalent of standard PCA and reduces the observed variables to a number of uncorrelated principal components. The most important advantages of nonlinear over linear PCA are that it incorporates nominal and ordinal…

  20. Variety identification of brown sugar using short-wave near infrared spectroscopy and multivariate calibration

    NASA Astrophysics Data System (ADS)

    Yang, Haiqing; Wu, Di; He, Yong

    2007-11-01

    Near-infrared spectroscopy (NIRS) with the characteristics of high speed, non-destructiveness, high precision and reliable detection data, etc. is a pollution-free, rapid, quantitative and qualitative analysis method. A new approach for variety discrimination of brown sugars using short-wave NIR spectroscopy (800-1050nm) was developed in this work. The relationship between the absorbance spectra and brown sugar varieties was established. The spectral data were compressed by the principal component analysis (PCA). The resulting features can be visualized in principal component (PC) space, which can lead to discovery of structures correlative with the different class of spectral samples. It appears to provide a reasonable variety clustering of brown sugars. The 2-D PCs plot obtained using the first two PCs can be used for the pattern recognition. Least-squares support vector machines (LS-SVM) was applied to solve the multivariate calibration problems in a relatively fast way. The work has shown that short-wave NIR spectroscopy technique is available for the brand identification of brown sugar, and LS-SVM has the better identification ability than PLS when the calibration set is small.

  1. Establishment of one-step approach to detoxification of hypertoxic aconite based on the evaluation of alkaloids contents and quality.

    PubMed

    Zhang, Ding-Kun; Han, Xue; Tan, Peng; Li, Rui-Yu; Niu, Ming; Zhang, Cong-En; Wang, Jia-Bo; Yang, Ming; Xiao, Xiao-He

    2017-01-01

    Aconite is a valuable drug and also a toxic material, which can be used only after detoxification processing. Although traditional processing methods can achieve detoxification effect as desired, there are some obvious drawbacks, including a significant loss of alkaloids and poor quality consistency. It is thus necessary to develop a new detoxification approach. In the present study, we designed a novel one-step detoxification approach by quickly drying fresh-cut aconite particles. In order to evaluate the technical advantages, the contents of mesaconitine, aconitine, hypaconitine, benzoylmesaconine, benzoylaconine, benzoylhypaconine, neoline, fuziline, songorine, and talatisamine were determined using HPLC and UHPLC/Q-TOF-MS. Multivariate analysis methods, such as Clustering analysis and Principle component analysis, were applied to determine the quality differences between samples. Our results showed that traditional processes could reduce toxicity as desired, but also led to more than 85.2% alkaloids loss. However, our novel one-step method was capable of achieving virtually the same detoxification effect, with only an approximately 30% alkaloids loss. Cluster analysis and Principal component analysis analyses suggested that Shengfupian and the novel products were significantly different from various traditional products. Acute toxicity testing showed that the novel products achieved a good detoxification effect, with its maximum tolerated dose being equivalent to 20 times of adult dosage. And cardiac effect testing also showed that the activity of the novel products was stronger than that of traditional products. Moreover, particles specification greatly improved the quality consistency of the novel products, which was immensely superior to the traditional products. These results would help guide the rational optimization of aconite processing technologies, providing better drugs for clinical treatment. Copyright © 2017 China Pharmaceutical University. Published by Elsevier B.V. All rights reserved.

  2. Multivariate proteomic profiling identifies novel accessory proteins of coated vesicles

    PubMed Central

    Antrobus, Robin; Hirst, Jennifer; Bhumbra, Gary S.; Kozik, Patrycja; Jackson, Lauren P.; Sahlender, Daniela A.

    2012-01-01

    Despite recent advances in mass spectrometry, proteomic characterization of transport vesicles remains challenging. Here, we describe a multivariate proteomics approach to analyzing clathrin-coated vesicles (CCVs) from HeLa cells. siRNA knockdown of coat components and different fractionation protocols were used to obtain modified coated vesicle-enriched fractions, which were compared by stable isotope labeling of amino acids in cell culture (SILAC)-based quantitative mass spectrometry. 10 datasets were combined through principal component analysis into a “profiling” cluster analysis. Overall, 136 CCV-associated proteins were predicted, including 36 new proteins. The method identified >93% of established CCV coat proteins and assigned >91% correctly to intracellular or endocytic CCVs. Furthermore, the profiling analysis extends to less well characterized types of coated vesicles, and we identify and characterize the first AP-4 accessory protein, which we have named tepsin. Finally, our data explain how sequestration of TACC3 in cytosolic clathrin cages causes the severe mitotic defects observed in auxilin-depleted cells. The profiling approach can be adapted to address related cell and systems biological questions. PMID:22472443

  3. Diversity and genetic stability in banana genotypes in a breeding program using inter simple sequence repeats (ISSR) markers.

    PubMed

    Silva, A V C; Nascimento, A L S; Vitória, M F; Rabbani, A R C; Soares, A N R; Lédo, A S

    2017-02-23

    Banana (Musa spp) is a fruit species frequently cultivated and consumed worldwide. Molecular markers are important for estimating genetic diversity in germplasm and between genotypes in breeding programs. The objective of this study was to analyze the genetic diversity of 21 banana genotypes (FHIA 23, PA42-44, Maçã, Pacovan Ken, Bucaneiro, YB42-47, Grand Naine, Tropical, FHIA 18, PA94-01, YB42-17, Enxerto, Japira, Pacovã, Prata-Anã, Maravilha, PV79-34, Caipira, Princesa, Garantida, and Thap Maeo), by using inter-simple sequence repeat (ISSR) markers. Material was generated from the banana breeding program of Embrapa Cassava & Fruits and evaluated at Embrapa Coastal Tablelands. The 12 primers used in this study generated 97.5% polymorphism. Four clusters were identified among the different genotypes studied, and the sum of the first two principal components was 48.91%. From the Unweighted Pair Group Method using Arithmetic averages (UPGMA) dendrogram, it was possible to identify two main clusters and subclusters. Two genotypes (Garantida and Thap Maeo) remained isolated from the others, both in the UPGMA clustering and in the principal cordinate analysis (PCoA). Using ISSR markers, we could analyze the genetic diversity of the studied material and state that these markers were efficient at detecting sufficient polymorphism to estimate the genetic variability in banana genotypes.

  4. A Quantum Chemical and Statistical Study of Phenolic Schiff Bases with Antioxidant Activity against DPPH Free Radical

    PubMed Central

    Anouar, El Hassane

    2014-01-01

    Phenolic Schiff bases are known as powerful antioxidants. To select the electronic, 2D and 3D descriptors responsible for the free radical scavenging ability of a series of 30 phenolic Schiff bases, a set of molecular descriptors were calculated by using B3P86 (Becke’s three parameter hybrid functional with Perdew 86 correlation functional) combined with 6-31 + G(d,p) basis set (i.e., at the B3P86/6-31 + G(d,p) level of theory). The chemometric methods, simple and multiple linear regressions (SLR and MLR), principal component analysis (PCA) and hierarchical cluster analysis (HCA) were employed to reduce the dimensionality and to investigate the relationship between the calculated descriptors and the antioxidant activity. The results showed that the antioxidant activity mainly depends on the first and second bond dissociation enthalpies of phenolic hydroxyl groups, the dipole moment and the hydrophobicity descriptors. The antioxidant activity is inversely proportional to the main descriptors. The selected descriptors discriminate the Schiff bases into active and inactive antioxidants. PMID:26784873

  5. [The application of the multidimensional statistical methods in the evaluation of the influence of atmospheric pollution on the population's health].

    PubMed

    Surzhikov, V D; Surzhikov, D V

    2014-01-01

    The search and measurement of causal relationships between exposure to air pollution and health state of the population is based on the system analysis and risk assessment to improve the quality of research. With this purpose there is applied the modern statistical analysis with the use of criteria of independence, principal component analysis and discriminate function analysis. As a result of analysis out of all atmospheric pollutants there were separated four main components: for diseases of the circulatory system main principal component is implied with concentrations of suspended solids, nitrogen dioxide, carbon monoxide, hydrogen fluoride, for the respiratory diseases the main c principal component is closely associated with suspended solids, sulfur dioxide and nitrogen dioxide, charcoal black. The discriminant function was shown to be used as a measure of the level of air pollution.

  6. Nuclear norm-based 2-DPCA for extracting features from images.

    PubMed

    Zhang, Fanlong; Yang, Jian; Qian, Jianjun; Xu, Yong

    2015-10-01

    The 2-D principal component analysis (2-DPCA) is a widely used method for image feature extraction. However, it can be equivalently implemented via image-row-based principal component analysis. This paper presents a structured 2-D method called nuclear norm-based 2-DPCA (N-2-DPCA), which uses a nuclear norm-based reconstruction error criterion. The nuclear norm is a matrix norm, which can provide a structured 2-D characterization for the reconstruction error image. The reconstruction error criterion is minimized by converting the nuclear norm-based optimization problem into a series of F-norm-based optimization problems. In addition, N-2-DPCA is extended to a bilateral projection-based N-2-DPCA (N-B2-DPCA). The virtue of N-B2-DPCA over N-2-DPCA is that an image can be represented with fewer coefficients. N-2-DPCA and N-B2-DPCA are applied to face recognition and reconstruction and evaluated using the Extended Yale B, CMU PIE, FRGC, and AR databases. Experimental results demonstrate the effectiveness of the proposed methods.

  7. An Efficient Method for Automatic Road Extraction Based on Multiple Features from LiDAR Data

    NASA Astrophysics Data System (ADS)

    Li, Y.; Hu, X.; Guan, H.; Liu, P.

    2016-06-01

    The road extraction in urban areas is difficult task due to the complicated patterns and many contextual objects. LiDAR data directly provides three dimensional (3D) points with less occlusions and smaller shadows. The elevation information and surface roughness are distinguishing features to separate roads. However, LiDAR data has some disadvantages are not beneficial to object extraction, such as the irregular distribution of point clouds and lack of clear edges of roads. For these problems, this paper proposes an automatic road centerlines extraction method which has three major steps: (1) road center point detection based on multiple feature spatial clustering for separating road points from ground points, (2) local principal component analysis with least squares fitting for extracting the primitives of road centerlines, and (3) hierarchical grouping for connecting primitives into complete roads network. Compared with MTH (consist of Mean shift algorithm, Tensor voting, and Hough transform) proposed in our previous article, this method greatly reduced the computational cost. To evaluate the proposed method, the Vaihingen data set, a benchmark testing data provided by ISPRS for "Urban Classification and 3D Building Reconstruction" project, was selected. The experimental results show that our method achieve the same performance by less time in road extraction using LiDAR data.

  8. Assessment of stem cell differentiation based on genome-wide expression profiles.

    PubMed

    Godoy, Patricio; Schmidt-Heck, Wolfgang; Hellwig, Birte; Nell, Patrick; Feuerborn, David; Rahnenführer, Jörg; Kattler, Kathrin; Walter, Jörn; Blüthgen, Nils; Hengstler, Jan G

    2018-07-05

    In recent years, protocols have been established to differentiate stem and precursor cells into more mature cell types. However, progress in this field has been hampered by difficulties to assess the differentiation status of stem cell-derived cells in an unbiased manner. Here, we present an analysis pipeline based on published data and methods to quantify the degree of differentiation and to identify transcriptional control factors explaining differences from the intended target cells or tissues. The pipeline requires RNA-Seq or gene array data of the stem cell starting population, derived 'mature' cells and primary target cells or tissue. It consists of a principal component analysis to represent global expression changes and to identify possible problems of the dataset that require special attention, such as: batch effects; clustering techniques to identify gene groups with similar features; over-representation analysis to characterize biological motifs and transcriptional control factors of the identified gene clusters; and metagenes as well as gene regulatory networks for quantitative cell-type assessment and identification of influential transcription factors. Possibilities and limitations of the analysis pipeline are illustrated using the example of human embryonic stem cell and human induced pluripotent cells to generate 'hepatocyte-like cells'. The pipeline quantifies the degree of incomplete differentiation as well as remaining stemness and identifies unwanted features, such as colon- and fibroblast-associated gene clusters that are absent in real hepatocytes but typically induced by currently available differentiation protocols. Finally, transcription factors responsible for incomplete and unwanted differentiation are identified. The proposed method is widely applicable and allows an unbiased and quantitative assessment of stem cell-derived cells.This article is part of the theme issue 'Designer human tissue: coming to a lab near you'. © 2018 The Author(s).

  9. Short communication: Discrimination between retail bovine milks with different fat contents using chemometrics and fatty acid profiling.

    PubMed

    Vargas-Bello-Pérez, Einar; Toro-Mujica, Paula; Enriquez-Hidalgo, Daniel; Fellenberg, María Angélica; Gómez-Cortés, Pilar

    2017-06-01

    We used a multivariate chemometric approach to differentiate or associate retail bovine milks with different fat contents and non-dairy beverages, using fatty acid profiles and statistical analysis. We collected samples of bovine milk (whole, semi-skim, and skim; n = 62) and non-dairy beverages (n = 27), and we analyzed them using gas-liquid chromatography. Principal component analysis of the fatty acid data yielded 3 significant principal components, which accounted for 72% of the total variance in the data set. Principal component 1 was related to saturated fatty acids (C4:0, C6:0, C8:0, C12:0, C14:0, C17:0, and C18:0) and monounsaturated fatty acids (C14:1 cis-9, C16:1 cis-9, C17:1 cis-9, and C18:1 trans-11); whole milk samples were clearly differentiated from the rest using this principal component. Principal component 2 differentiated semi-skim milk samples by n-3 fatty acid content (C20:3n-3, C20:5n-3, and C22:6n-3). Principal component 3 was related to C18:2 trans-9,trans-12 and C20:4n-6, and its lower scores were observed in skim milk and non-dairy beverages. A cluster analysis yielded 3 groups: group 1 consisted of only whole milk samples, group 2 was represented mainly by semi-skim milks, and group 3 included skim milk and non-dairy beverages. Overall, the present study showed that a multivariate chemometric approach is a useful tool for differentiating or associating retail bovine milks and non-dairy beverages using their fatty acid profile. Copyright © 2017 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.

  10. Evaluation of genetic diversity in jackfruit (Artocarpus heterophyllus Lam.) based on amplified fragment length polymorphism markers.

    PubMed

    Shyamalamma, S; Chandra, S B C; Hegde, M; Naryanswamy, P

    2008-07-22

    Artocarpus heterophyllus Lam., commonly called jackfruit, is a medium-sized evergreen tree that bears high yields of the largest known edible fruit. Yet, it has been little explored commercially due to wide variation in fruit quality. The genetic diversity and genetic relatedness of 50 jackfruit accessions were studied using amplified fragment length polymorphism markers. Of 16 primer pairs evaluated, eight were selected for screening of genotypes based on the number and quality of polymorphic fragments produced. These primer combinations produced 5976 bands, 1267 (22%) of which were polymorphic. Among the jackfruit accessions, the similarity coefficient ranged from 0.137 to 0.978; the accessions also shared a large number of monomorphic fragments (78%). Cluster analysis and principal component analysis grouped all jackfruit genotypes into three major clusters. Cluster I included the genotypes grown in a jackfruit region of Karnataka, called Tamaka, with very dry conditions; cluster II contained the genotypes collected from locations having medium to heavy rainfall in Karnataka; cluster III grouped the genotypes in distant locations with different environmental conditions. Strong coincidence of these amplified fragment length polymorphism-based groupings with geographical localities as well as morphological characters was observed. We found moderate genetic diversity in these jackfruit accessions. This information should be useful for tree breeding programs, as part of our effort to popularize jackfruit as a commercial crop.

  11. Multivariate Statistical Analysis: a tool for groundwater quality assessment in the hidrogeologic region of the Ring of Cenotes, Yucatan, Mexico.

    NASA Astrophysics Data System (ADS)

    Ye, M.; Pacheco Castro, R. B.; Pacheco Avila, J.; Cabrera Sansores, A.

    2014-12-01

    The karstic aquifer of Yucatan is a vulnerable and complex system. The first fifteen meters of this aquifer have been polluted, due to this the protection of this resource is important because is the only source of potable water of the entire State. Through the assessment of groundwater quality we can gain some knowledge about the main processes governing water chemistry as well as spatial patterns which are important to establish protection zones. In this work multivariate statistical techniques are used to assess the groundwater quality of the supply wells (30 to 40 meters deep) in the hidrogeologic region of the Ring of Cenotes, located in Yucatan, Mexico. Cluster analysis and principal component analysis are applied in groundwater chemistry data of the study area. Results of principal component analysis show that the main sources of variation in the data are due sea water intrusion and the interaction of the water with the carbonate rocks of the system and some pollution processes. The cluster analysis shows that the data can be divided in four clusters. The spatial distribution of the clusters seems to be random, but is consistent with sea water intrusion and pollution with nitrates. The overall results show that multivariate statistical analysis can be successfully applied in the groundwater quality assessment of this karstic aquifer.

  12. Q-mode versus R-mode principal component analysis for linear discriminant analysis (LDA)

    NASA Astrophysics Data System (ADS)

    Lee, Loong Chuen; Liong, Choong-Yeun; Jemain, Abdul Aziz

    2017-05-01

    Many literature apply Principal Component Analysis (PCA) as either preliminary visualization or variable con-struction methods or both. Focus of PCA can be on the samples (R-mode PCA) or variables (Q-mode PCA). Traditionally, R-mode PCA has been the usual approach to reduce high-dimensionality data before the application of Linear Discriminant Analysis (LDA), to solve classification problems. Output from PCA composed of two new matrices known as loadings and scores matrices. Each matrix can then be used to produce a plot, i.e. loadings plot aids identification of important variables whereas scores plot presents spatial distribution of samples on new axes that are also known as Principal Components (PCs). Fundamentally, the scores matrix always be the input variables for building classification model. A recent paper uses Q-mode PCA but the focus of analysis was not on the variables but instead on the samples. As a result, the authors have exchanged the use of both loadings and scores plots in which clustering of samples was studied using loadings plot whereas scores plot has been used to identify important manifest variables. Therefore, the aim of this study is to statistically validate the proposed practice. Evaluation is based on performance of external error obtained from LDA models according to number of PCs. On top of that, bootstrapping was also conducted to evaluate the external error of each of the LDA models. Results show that LDA models produced by PCs from R-mode PCA give logical performance and the matched external error are also unbiased whereas the ones produced with Q-mode PCA show the opposites. With that, we concluded that PCs produced from Q-mode is not statistically stable and thus should not be applied to problems of classifying samples, but variables. We hope this paper will provide some insights on the disputable issues.

  13. Influence of Different Drying Treatments and Extraction Solvents on the Metabolite Profile and Nitric Oxide Inhibitory Activity of Ajwa Dates.

    PubMed

    Abdul-Hamid, Nur Ashikin; Abas, Faridah; Ismail, Intan Safinar; Shaari, Khozirah; Lajis, Nordin H

    2015-11-01

    This study aimed to examine the variation in the metabolite profiles and nitric oxide (NO) inhibitory activity of Ajwa dates that were subjected to 2 drying treatments and different extraction solvents. (1)H NMR coupled with multivariate data analysis was employed. A Griess assay was used to determine the inhibition of the production of NO in RAW 264.7 cells treated with LPS and interferon-γ. The oven dried (OD) samples demonstrated the absence of asparagine and ascorbic acid as compared to the freeze dried (FD) dates. The principal component analysis showed distinct clusters between the OD and FD dates by the second principal component. In respect of extraction solvents, chloroform extracts can be distinguished by the absence of arginine, glycine and asparagine compared to the methanol and 50% methanol extracts. The chloroform extracts can be clearly distinguished from the methanol and 50% methanol extracts by first principal component. Meanwhile, the loading score plot of partial least squares analysis suggested that beta glucose, alpha glucose, choline, ascorbic acid and glycine were among the metabolites that were contributing to higher biological activity displayed by FD and methanol extracts of Ajwa. The results highlight an alternative method of metabolomics approach for determination of the metabolites that contribute to NO inhibitory activity. The association between metabolite profiles and nitric oxide (NO) inhibitory activity of the various extracts of Ajwa dates was evaluated by utilizing partial least squares (PLS) model. The validated PLS model can be employed to predict the NO inhibitory activity of new samples of date fruits based on their NMR spectra which was important for assessing fruit quality. The information gained might be used as guidance for quality control, nutritional values and as a basis for the preparation of any food supplements for human health that employs date palm fruit as the raw material. © 2015 Institute of Food Technologists®

  14. Multivariate analysis of molecular and morphological diversity in fig (Ficus carica L.)

    USDA-ARS?s Scientific Manuscript database

    Genetic polymorphism across 15 microsatellite loci among 194 fig accessions including Common, Smyrna, San Pedro, and Caprifig were analyzed using a cluster analysis (CA) and the principal components analysis (PCA). The collection was moderately variable with observed number of alleles per locus rang...

  15. Using Interactive Graphics to Teach Multivariate Data Analysis to Psychology Students

    ERIC Educational Resources Information Center

    Valero-Mora, Pedro M.; Ledesma, Ruben D.

    2011-01-01

    This paper discusses the use of interactive graphics to teach multivariate data analysis to Psychology students. Three techniques are explored through separate activities: parallel coordinates/boxplots; principal components/exploratory factor analysis; and cluster analysis. With interactive graphics, students may perform important parts of the…

  16. Fuzzy Subspace Clustering

    NASA Astrophysics Data System (ADS)

    Borgelt, Christian

    In clustering we often face the situation that only a subset of the available attributes is relevant for forming clusters, even though this may not be known beforehand. In such cases it is desirable to have a clustering algorithm that automatically weights attributes or even selects a proper subset. In this paper I study such an approach for fuzzy clustering, which is based on the idea to transfer an alternative to the fuzzifier (Klawonn and Höppner, What is fuzzy about fuzzy clustering? Understanding and improving the concept of the fuzzifier, In: Proc. 5th Int. Symp. on Intelligent Data Analysis, 254-264, Springer, Berlin, 2003) to attribute weighting fuzzy clustering (Keller and Klawonn, Int J Uncertain Fuzziness Knowl Based Syst 8:735-746, 2000). In addition, by reformulating Gustafson-Kessel fuzzy clustering, a scheme for weighting and selecting principal axes can be obtained. While in Borgelt (Feature weighting and feature selection in fuzzy clustering, In: Proc. 17th IEEE Int. Conf. on Fuzzy Systems, IEEE Press, Piscataway, NJ, 2008) I already presented such an approach for a global selection of attributes and principal axes, this paper extends it to a cluster-specific selection, thus arriving at a fuzzy subspace clustering algorithm (Parsons, Haque, and Liu, 2004).

  17. Orbit Clustering Based on Transfer Cost

    NASA Technical Reports Server (NTRS)

    Gustafson, Eric D.; Arrieta-Camacho, Juan J.; Petropoulos, Anastassios E.

    2013-01-01

    We propose using cluster analysis to perform quick screening for combinatorial global optimization problems. The key missing component currently preventing cluster analysis from use in this context is the lack of a useable metric function that defines the cost to transfer between two orbits. We study several proposed metrics and clustering algorithms, including k-means and the expectation maximization algorithm. We also show that proven heuristic methods such as the Q-law can be modified to work with cluster analysis.

  18. A stock market forecasting model combining two-directional two-dimensional principal component analysis and radial basis function neural network.

    PubMed

    Guo, Zhiqiang; Wang, Huaiqing; Yang, Jie; Miller, David J

    2015-01-01

    In this paper, we propose and implement a hybrid model combining two-directional two-dimensional principal component analysis ((2D)2PCA) and a Radial Basis Function Neural Network (RBFNN) to forecast stock market behavior. First, 36 stock market technical variables are selected as the input features, and a sliding window is used to obtain the input data of the model. Next, (2D)2PCA is utilized to reduce the dimension of the data and extract its intrinsic features. Finally, an RBFNN accepts the data processed by (2D)2PCA to forecast the next day's stock price or movement. The proposed model is used on the Shanghai stock market index, and the experiments show that the model achieves a good level of fitness. The proposed model is then compared with one that uses the traditional dimension reduction method principal component analysis (PCA) and independent component analysis (ICA). The empirical results show that the proposed model outperforms the PCA-based model, as well as alternative models based on ICA and on the multilayer perceptron.

  19. A Stock Market Forecasting Model Combining Two-Directional Two-Dimensional Principal Component Analysis and Radial Basis Function Neural Network

    PubMed Central

    Guo, Zhiqiang; Wang, Huaiqing; Yang, Jie; Miller, David J.

    2015-01-01

    In this paper, we propose and implement a hybrid model combining two-directional two-dimensional principal component analysis ((2D)2PCA) and a Radial Basis Function Neural Network (RBFNN) to forecast stock market behavior. First, 36 stock market technical variables are selected as the input features, and a sliding window is used to obtain the input data of the model. Next, (2D)2PCA is utilized to reduce the dimension of the data and extract its intrinsic features. Finally, an RBFNN accepts the data processed by (2D)2PCA to forecast the next day's stock price or movement. The proposed model is used on the Shanghai stock market index, and the experiments show that the model achieves a good level of fitness. The proposed model is then compared with one that uses the traditional dimension reduction method principal component analysis (PCA) and independent component analysis (ICA). The empirical results show that the proposed model outperforms the PCA-based model, as well as alternative models based on ICA and on the multilayer perceptron. PMID:25849483

  20. Conformational states and folding pathways of peptides revealed by principal-independent component analyses.

    PubMed

    Nguyen, Phuong H

    2007-05-15

    Principal component analysis is a powerful method for projecting multidimensional conformational space of peptides or proteins onto lower dimensional subspaces in which the main conformations are present, making it easier to reveal the structures of molecules from e.g. molecular dynamics simulation trajectories. However, the identification of all conformational states is still difficult if the subspaces consist of more than two dimensions. This is mainly due to the fact that the principal components are not independent with each other, and states in the subspaces cannot be visualized. In this work, we propose a simple and fast scheme that allows one to obtain all conformational states in the subspaces. The basic idea is that instead of directly identifying the states in the subspace spanned by principal components, we first transform this subspace into another subspace formed by components that are independent of one other. These independent components are obtained from the principal components by employing the independent component analysis method. Because of independence between components, all states in this new subspace are defined as all possible combinations of the states obtained from each single independent component. This makes the conformational analysis much simpler. We test the performance of the method by analyzing the conformations of the glycine tripeptide and the alanine hexapeptide. The analyses show that our method is simple and quickly reveal all conformational states in the subspaces. The folding pathways between the identified states of the alanine hexapeptide are analyzed and discussed in some detail. 2007 Wiley-Liss, Inc.

  1. Coarse Point Cloud Registration by Egi Matching of Voxel Clusters

    NASA Astrophysics Data System (ADS)

    Wang, Jinhu; Lindenbergh, Roderik; Shen, Yueqian; Menenti, Massimo

    2016-06-01

    Laser scanning samples the surface geometry of objects efficiently and records versatile information as point clouds. However, often more scans are required to fully cover a scene. Therefore, a registration step is required that transforms the different scans into a common coordinate system. The registration of point clouds is usually conducted in two steps, i.e. coarse registration followed by fine registration. In this study an automatic marker-free coarse registration method for pair-wise scans is presented. First the two input point clouds are re-sampled as voxels and dimensionality features of the voxels are determined by principal component analysis (PCA). Then voxel cells with the same dimensionality are clustered. Next, the Extended Gaussian Image (EGI) descriptor of those voxel clusters are constructed using significant eigenvectors of each voxel in the cluster. Correspondences between clusters in source and target data are obtained according to the similarity between their EGI descriptors. The random sampling consensus (RANSAC) algorithm is employed to remove outlying correspondences until a coarse alignment is obtained. If necessary, a fine registration is performed in a final step. This new method is illustrated on scan data sampling two indoor scenarios. The results of the tests are evaluated by computing the point to point distance between the two input point clouds. The presented two tests resulted in mean distances of 7.6 mm and 9.5 mm respectively, which are adequate for fine registration.

  2. Evaluation of different approaches for identifying optimal sites to predict mean hillslope soil moisture content

    NASA Astrophysics Data System (ADS)

    Liao, Kaihua; Zhou, Zhiwen; Lai, Xiaoming; Zhu, Qing; Feng, Huihui

    2017-04-01

    The identification of representative soil moisture sampling sites is important for the validation of remotely sensed mean soil moisture in a certain area and ground-based soil moisture measurements in catchment or hillslope hydrological studies. Numerous approaches have been developed to identify optimal sites for predicting mean soil moisture. Each method has certain advantages and disadvantages, but they have rarely been evaluated and compared. In our study, surface (0-20 cm) soil moisture data from January 2013 to March 2016 (a total of 43 sampling days) were collected at 77 sampling sites on a mixed land-use (tea and bamboo) hillslope in the hilly area of Taihu Lake Basin, China. A total of 10 methods (temporal stability (TS) analyses based on 2 indices, K-means clustering based on 6 kinds of inputs and 2 random sampling strategies) were evaluated for determining optimal sampling sites for mean soil moisture estimation. They were TS analyses based on the smallest index of temporal stability (ITS, a combination of the mean relative difference and standard deviation of relative difference (SDRD)) and based on the smallest SDRD, K-means clustering based on soil properties and terrain indices (EFs), repeated soil moisture measurements (Theta), EFs plus one-time soil moisture data (EFsTheta), and the principal components derived from EFs (EFs-PCA), Theta (Theta-PCA), and EFsTheta (EFsTheta-PCA), and global and stratified random sampling strategies. Results showed that the TS based on the smallest ITS was better (RMSE = 0.023 m3 m-3) than that based on the smallest SDRD (RMSE = 0.034 m3 m-3). The K-means clustering based on EFsTheta (-PCA) was better (RMSE <0.020 m3 m-3) than these based on EFs (-PCA) and Theta (-PCA). The sampling design stratified by the land use was more efficient than the global random method. Forty and 60 sampling sites are needed for stratified sampling and global sampling respectively to make their performances comparable to the best K-means method (EFsTheta-PCA). Overall, TS required only one site, but its accuracy was limited. The best K-means method required <8 sites and yielded high accuracy, but extra soil and terrain information is necessary when using this method. The stratified sampling strategy can only be used if no pre-knowledge about soil moisture variation is available. This information will help in selecting the optimal methods for estimation the area mean soil moisture.

  3. Constructivism in Practice: an Exploratory Study of Teaching Patterns and Student Motivation in Physics Classrooms in Finland, Germany and Switzerland

    NASA Astrophysics Data System (ADS)

    Beerenwinkel, Anne; von Arx, Matthias

    2017-04-01

    For the last three decades, moderate constructivism has become an increasingly prominent perspective in science education. Researchers have defined characteristics of constructivist-oriented science classrooms, but the implementation of such science teaching in daily classroom practice seems difficult. Against this background, we conducted a sub-study within the tri-national research project Quality of Instruction in Physics (QuIP) analysing 60 videotaped physics classes involving a large sample of students ( N = 1192) from Finland, Germany and Switzerland in order to investigate the kinds of constructivist components and teaching patterns that can be found in regular classrooms without any intervention. We applied a newly developed coding scheme to capture constructivist facets of science teaching and conducted principal component and cluster analyses to explore which components and patterns were most prominent in the classes observed. Two underlying components were found, resulting in two scales—Structured Knowledge Acquisition and Fostering Autonomy—which describe key aspects of constructivist teaching. Only the first scale was rather well established in the lessons investigated. Classes were clustered based on these scales. The analysis of the different clusters suggested that teaching physics in a structured way combined with fostering students' autonomy contributes to students' motivation. However, our regression models indicated that content knowledge is a more important predictor for students' motivation, and there was no homogeneous pattern for all gender- and country-specific subgroups investigated. The results are discussed in light of recent discussions on the feasibility of constructivism in practice.

  4. Least Principal Components Analysis (LPCA): An Alternative to Regression Analysis.

    ERIC Educational Resources Information Center

    Olson, Jeffery E.

    Often, all of the variables in a model are latent, random, or subject to measurement error, or there is not an obvious dependent variable. When any of these conditions exist, an appropriate method for estimating the linear relationships among the variables is Least Principal Components Analysis. Least Principal Components are robust, consistent,…

  5. Unparalleled sample treatment throughput for proteomics workflows relying on ultrasonic energy.

    PubMed

    Jorge, Susana; Araújo, J E; Pimentel-Santos, F M; Branco, Jaime C; Santos, Hugo M; Lodeiro, Carlos; Capelo, J L

    2018-02-01

    We report on the new microplate horn ultrasonic device as a powerful tool to speed proteomics workflows with unparalleled throughput. 96 complex proteomes were digested at the same time in 4min. Variables such as ultrasonication time, ultrasonication amplitude, and protein to enzyme ratio were optimized. The "classic" method relying on overnight protein digestion (12h) and the sonoreactor-based method were also employed for comparative purposes. We found the protein digestion efficiency homogeneously distributed in the entire microplate horn surface using the following conditions: 4min sonication time and 25% amplitude. Using this approach, patients with lymphoma and myeloma were classified using principal component analysis and a 2D gel-mass spectrometry based approach. Furthermore, we demonstrate the excellent performance by using MALDI-mass spectrometry based profiling as a fast way to classify patients with rheumatoid arthritis, systemic lupus erythematosus, and ankylosing spondylitis. Finally, the speed and simplicity of this method were demonstrated by clustering 90 patients with knee osteoarthritis disease (30), with a prosthesis (30, control group) and healthy individuals (30) with no history of joint disease. Overall, the new approach allows profiling a disease in just one week while allows to match the minimalism rules as outlined by Halls. Copyright © 2017 Elsevier B.V. All rights reserved.

  6. Assessment of metal pollution based on multivariate statistical modeling of 'hot spot' sediments from the Black Sea.

    PubMed

    Simeonov, V; Massart, D L; Andreev, G; Tsakovski, S

    2000-11-01

    The paper deals with application of different statistical methods like cluster and principal components analysis (PCA), partial least squares (PLSs) modeling. These approaches are an efficient tool in achieving better understanding about the contamination of two gulf regions in Black Sea. As objects of the study, a collection of marine sediment samples from Varna and Bourgas "hot spots" gulf areas are used. In the present case the use of cluster and PCA make it possible to separate three zones of the marine environment with different levels of pollution by interpretation of the sediment analysis (Bourgas gulf, Varna gulf and lake buffer zone). Further, the extraction of four latent factors offers a specific interpretation of the possible pollution sources and separates natural from anthropogenic factors, the latter originating from contamination by chemical, oil refinery and steel-work enterprises. Finally, the PLSs modeling gives a better opportunity in predicting contaminant concentration on tracer (or tracers) element as compared to the one-dimensional approach of the baseline models. The results of the study are important not only in local aspect as they allow quick response in finding solutions and decision making but also in broader sense as a useful environmetrical methodology.

  7. [Studies on the brand traceability of milk powder based on NIR spectroscopy technology].

    PubMed

    Guan, Xiao; Gu, Fang-Qing; Liu, Jing; Yang, Yong-Jian

    2013-10-01

    Brand traceability of several different kinds of milk powder was studied by combining near infrared spectroscopy diffuse reflectance mode with soft independent modeling of class analogy (SIMCA) in the present paper. The near infrared spectrum of 138 samples, including 54 Guangming milk powder samples, 43 Netherlands samples, and 33 Nestle samples and 8 Yili samples, were collected. After pretreatment of full spectrum data variables in training set, principal component analysis was performed, and the contribution rate of the cumulative variance of the first three principal components was about 99.07%. Milk powder principal component regression model based on SIMCA was established, and used to classify the milk powder samples in prediction sets. The results showed that the recognition rate of Guangming milk powder, Netherlands milk powder and Nestle milk powder was 78%, 75% and 100%, the rejection rate was 100%, 87%, and 88%, respectively. Therefore, the near infrared spectroscopy combined with SIMCA model can classify milk powder with high accuracy, and is a promising identification method of milk powder variety.

  8. Quality evaluation of Houttuynia cordata Thunb. by high performance liquid chromatography with photodiode-array detection (HPLC-DAD).

    PubMed

    Yang, Zhan-nan; Sun, Yi-ming; Luo, Shi-qiong; Chen, Jin-wu; Chen, Jin-wu; Yu, Zheng-wen; Sun, Min

    2014-03-01

    A new, validated method, developed for the simultaneous determination of 16 phenolics (chlorogenic acid, scopoletin, vitexin, rutin, afzelin, isoquercitrin, narirutin, kaempferitrin, quercitrin, quercetin, kaempferol, chrysosplenol D, vitexicarpin, 5-hydroxy-3,3',4',7-tetramethoxy flavonoids, 5-hydroxy-3,4',6,7-tetramethoxy flavonoids and kaempferol-3,7,4'-trimethyl ether) in Houttuynia cordata Thunb. was successfully applied to 35 batches of samples collected from different regions or at different times and their total antioxidant activities (TAAs) were investigated. The aim was to develop a quality control method to simultaneously determine the major active components in H. cordata. The HPLC-DAD method was performed using a reverse-phase C18 column with a gradient elution system (acetonitrile-methanol-water) and simultaneous detection at 345 nm. Linear behaviors of method for all the analytes were observed with linear regression relationship (r(2)>0.999) at the concentration ranges investigated. The recoveries of the 16 phenolics ranged from 98.93% to 101.26%. The samples analyzed were differentiated and classified based on the contents of the 16 characteristic compounds and the TAA using hierarchical clustering analysis (HCA) and principal component analysis (PCA). The results analyzed showed that similar chemical profiles and TAAs were divided into the same group. There was some evidence that active compounds, although they varied significantly, may possess uniform anti-oxidant activities and have potentially synergistic effects.

  9. Rapid quality assessment of Radix Aconiti Preparata using direct analysis in real time mass spectrometry.

    PubMed

    Zhu, Hongbin; Wang, Chunyan; Qi, Yao; Song, Fengrui; Liu, Zhiqiang; Liu, Shuying

    2012-11-08

    This study presents a novel and rapid method to identify chemical markers for the quality control of Radix Aconiti Preparata, a world widely used traditional herbal medicine. In the method, the samples with a fast extraction procedure were analyzed using direct analysis in real time mass spectrometry (DART MS) combined with multivariate data analysis. At present, the quality assessment approach of Radix Aconiti Preparata was based on the two processing methods recorded in Chinese Pharmacopoeia for the purpose of reducing the toxicity of Radix Aconiti and ensuring its clinical therapeutic efficacy. In order to ensure the safety and effectivity in clinical use, the processing degree of Radix Aconiti should be well controlled and assessed. In the paper, hierarchical cluster analysis and principal component analysis were performed to evaluate the DART MS data of Radix Aconiti Preparata samples in different processing times. The results showed that the well processed Radix Aconiti Preparata, unqualified processed and the raw Radix Aconiti could be clustered reasonably corresponding to their constituents. The loading plot shows that the main chemical markers having the most influence on the discrimination amongst the qualified and unqualified samples were mainly some monoester diterpenoid aconitines and diester diterpenoid aconitines, i.e. benzoylmesaconine, hypaconitine, mesaconitine, neoline, benzoylhypaconine, benzoylaconine, fuziline, aconitine and 10-OH-mesaconitine. The established DART MS approach in combination with multivariate data analysis provides a very flexible and reliable method for quality assessment of toxic herbal medicine. Copyright © 2012 Elsevier B.V. All rights reserved.

  10. Blood hyperviscosity identification with reflective spectroscopy of tongue tip based on principal component analysis combining artificial neural network.

    PubMed

    Liu, Ming; Zhao, Jing; Lu, XiaoZuo; Li, Gang; Wu, Taixia; Zhang, LiFu

    2018-05-10

    With spectral methods, noninvasive determination of blood hyperviscosity in vivo is very potential and meaningful in clinical diagnosis. In this study, 67 male subjects (41 health, and 26 hyperviscosity according to blood sample analysis results) participate. Reflectance spectra of subjects' tongue tips is measured, and a classification method bases on principal component analysis combined with artificial neural network model is built to identify hyperviscosity. Hold-out and Leave-one-out methods are used to avoid significant bias and lessen overfitting problem, which are widely accepted in the model validation. To measure the performance of the classification, sensitivity, specificity, accuracy and F-measure are calculated, respectively. The accuracies with 100 times Hold-out method and 67 times Leave-one-out method are 88.05% and 97.01%, respectively. Experimental results indicate that the built classification model has certain practical value and proves the feasibility of using spectroscopy to identify hyperviscosity by noninvasive determination.

  11. The Researches on Damage Detection Method for Truss Structures

    NASA Astrophysics Data System (ADS)

    Wang, Meng Hong; Cao, Xiao Nan

    2018-06-01

    This paper presents an effective method to detect damage in truss structures. Numerical simulation and experimental analysis were carried out on a damaged truss structure under instantaneous excitation. The ideal excitation point and appropriate hammering method were determined to extract time domain signals under two working conditions. The frequency response function and principal component analysis were used for data processing, and the angle between the frequency response function vectors was selected as a damage index to ascertain the location of a damaged bar in the truss structure. In the numerical simulation, the time domain signal of all nodes was extracted to determine the location of the damaged bar. In the experimental analysis, the time domain signal of a portion of the nodes was extracted on the basis of an optimal sensor placement method based on the node strain energy coefficient. The results of the numerical simulation and experimental analysis showed that the damage detection method based on the frequency response function and principal component analysis could locate the damaged bar accurately.

  12. Real-time detection of organic contamination events in water distribution systems by principal components analysis of ultraviolet spectral data.

    PubMed

    Zhang, Jian; Hou, Dibo; Wang, Ke; Huang, Pingjie; Zhang, Guangxin; Loáiciga, Hugo

    2017-05-01

    The detection of organic contaminants in water distribution systems is essential to protect public health from potential harmful compounds resulting from accidental spills or intentional releases. Existing methods for detecting organic contaminants are based on quantitative analyses such as chemical testing and gas/liquid chromatography, which are time- and reagent-consuming and involve costly maintenance. This study proposes a novel procedure based on discrete wavelet transform and principal component analysis for detecting organic contamination events from ultraviolet spectral data. Firstly, the spectrum of each observation is transformed using discrete wavelet with a coiflet mother wavelet to capture the abrupt change along the wavelength. Principal component analysis is then employed to approximate the spectra based on capture and fusion features. The significant value of Hotelling's T 2 statistics is calculated and used to detect outliers. An alarm of contamination event is triggered by sequential Bayesian analysis when the outliers appear continuously in several observations. The effectiveness of the proposed procedure is tested on-line using a pilot-scale setup and experimental data.

  13. Genetic diversity and relationships among different tomato varieties revealed by EST-SSR markers.

    PubMed

    Korir, N K; Diao, W; Tao, R; Li, X; Kayesh, E; Li, A; Zhen, W; Wang, S

    2014-01-08

    The genetic diversity and relationship of 42 tomato varieties sourced from different geographic regions was examined with EST-SSR markers. The genetic diversity was between 0.18 and 0.77, with a mean of 0.49; the polymorphic information content ranged from 0.17 to 0.74, with a mean of 0.45. This indicates a fairly high degree of diversity among these tomato varieties. Based on the cluster analysis using unweighted pair-group method with arithmetic average (UPGMA), all the tomato varieties fell into 5 groups, with no obvious geographical distribution characteristics despite their diverse sources. The principal component analysis (PCA) supported the clustering result; however, relationships among varieties were more complex in the PCA scatterplot than in the UPGMA dendrogram. This information about the genetic relationships between these tomato lines helps distinguish these 42 varieties and will be useful for tomato variety breeding and selection. We confirm that the EST-SSR marker system is useful for studying genetic diversity among tomato varieties. The high degree of polymorphism and the large number of bands obtained per assay shows that SSR is the most informative marker system for tomato genotyping for purposes of rights/protection and for the tomato industry in general. It is recommended that these varieties be subjected to identification using an SSR-based manual cultivar identification diagram strategy or other easy-to-use and referable methods so as to provide a complete set of information concerning genetic relationships and a readily usable means of identifying these varieties.

  14. Chemical Polymorphism of Essential Oils of Artemisia vulgaris Growing Wild in Lithuania.

    PubMed

    Judzentiene, Asta; Budiene, Jurga

    2018-02-01

    Compositional variability of mugwort (Artemisia vulgaris L.) essential oils has been investigated in the study. Plant material (over ground parts at full flowering stage) was collected from forty-four wild populations in Lithuania. The oils from aerial parts were obtained by hydrodistillation and analyzed by GC(FID) and GC/MS. In total, up to 111 components were determined in the oils. As the major constituents were found: sabinene, 1,8-cineole, artemisia ketone, both thujone isomers, camphor, cis-chrysanthenyl acetate, davanone and davanone B. The compositional data were subjected to statistical analysis. The application of PCA (Principal Component Analysis) and AHC (Agglomerative Hierarchical Clustering) allowed grouping the oils into six clusters. AHC permitted to distinguish an artemisia ketone chemotype, which, to the best of our knowledge, is very scarce. Additionally, two rare cis-chrysanthenyl acetate and sabinene oil types were determined for the plants growing in Lithuania. Besides, davanone was found for the first time as a principal component in mugwort oils. The performed study revealed significant chemical polymorphism of essential oils in mugwort plants native to Lithuania; it has expanded our chemotaxonomic knowledge both of A. vulgaris species and Artemisia genus. © 2018 Wiley-VHCA AG, Zurich, Switzerland.

  15. Objective sampling design in a highly heterogeneous landscape - characterizing environmental determinants of malaria vector distribution in French Guiana, in the Amazonian region.

    PubMed

    Roux, Emmanuel; Gaborit, Pascal; Romaña, Christine A; Girod, Romain; Dessay, Nadine; Dusfour, Isabelle

    2013-12-01

    Sampling design is a key issue when establishing species inventories and characterizing habitats within highly heterogeneous landscapes. Sampling efforts in such environments may be constrained and many field studies only rely on subjective and/or qualitative approaches to design collection strategy. The region of Cacao, in French Guiana, provides an excellent study site to understand the presence and abundance of Anopheles mosquitoes, their species dynamics and the transmission risk of malaria across various environments. We propose an objective methodology to define a stratified sampling design. Following thorough environmental characterization, a factorial analysis of mixed groups allows the data to be reduced and non-collinear principal components to be identified while balancing the influences of the different environmental factors. Such components defined new variables which could then be used in a robust k-means clustering procedure. Then, we identified five clusters that corresponded to our sampling strata and selected sampling sites in each stratum. We validated our method by comparing the species overlap of entomological collections from selected sites and the environmental similarities of the same sites. The Morisita index was significantly correlated (Pearson linear correlation) with environmental similarity based on i) the balanced environmental variable groups considered jointly (p = 0.001) and ii) land cover/use (p-value < 0.001). The Jaccard index was significantly correlated with land cover/use-based environmental similarity (p-value = 0.001). The results validate our sampling approach. Land cover/use maps (based on high spatial resolution satellite images) were shown to be particularly useful when studying the presence, density and diversity of Anopheles mosquitoes at local scales and in very heterogeneous landscapes.

  16. Objective sampling design in a highly heterogeneous landscape - characterizing environmental determinants of malaria vector distribution in French Guiana, in the Amazonian region

    PubMed Central

    2013-01-01

    Background Sampling design is a key issue when establishing species inventories and characterizing habitats within highly heterogeneous landscapes. Sampling efforts in such environments may be constrained and many field studies only rely on subjective and/or qualitative approaches to design collection strategy. The region of Cacao, in French Guiana, provides an excellent study site to understand the presence and abundance of Anopheles mosquitoes, their species dynamics and the transmission risk of malaria across various environments. We propose an objective methodology to define a stratified sampling design. Following thorough environmental characterization, a factorial analysis of mixed groups allows the data to be reduced and non-collinear principal components to be identified while balancing the influences of the different environmental factors. Such components defined new variables which could then be used in a robust k-means clustering procedure. Then, we identified five clusters that corresponded to our sampling strata and selected sampling sites in each stratum. Results We validated our method by comparing the species overlap of entomological collections from selected sites and the environmental similarities of the same sites. The Morisita index was significantly correlated (Pearson linear correlation) with environmental similarity based on i) the balanced environmental variable groups considered jointly (p = 0.001) and ii) land cover/use (p-value << 0.001). The Jaccard index was significantly correlated with land cover/use-based environmental similarity (p-value = 0.001). Conclusions The results validate our sampling approach. Land cover/use maps (based on high spatial resolution satellite images) were shown to be particularly useful when studying the presence, density and diversity of Anopheles mosquitoes at local scales and in very heterogeneous landscapes. PMID:24289184

  17. A taxonomy of health networks and systems: bringing order out of chaos.

    PubMed Central

    Bazzoli, G J; Shortell, S M; Dubbs, N; Chan, C; Kralovec, P

    1999-01-01

    OBJECTIVE: To use existing theory and data for empirical development of a taxonomy that identifies clusters of organizations sharing common strategic/structural features. DATA SOURCES: Data from the 1994 and 1995 American Hospital Association Annual Surveys, which provide extensive data on hospital involvement in hospital-led health networks and systems. STUDY DESIGN: Theories of organization behavior and industrial organization economics were used to identify three strategic/structural dimensions: differentiation, which refers to the number of different products/services along a healthcare continuum; integration, which refers to mechanisms used to achieve unity of effort across organizational components; and centralization, which relates to the extent to which activities take place at centralized versus dispersed locations. These dimensions were applied to three components of the health service/product continuum: hospital services, physician arrangements, and provider-based insurance activities. DATA EXTRACTION METHODS: We identified 295 health systems and 274 health networks across the United States in 1994, and 297 health systems and 306 health networks in 1995 using AHA data. Empirical measures aggregated individual hospital data to the health network and system level. PRINCIPAL FINDINGS: We identified a reliable, internally valid, and stable four-cluster solution for health networks and a five-cluster solution for health systems. We found that differentiation and centralization were particularly important in distinguishing unique clusters of organizations. High differentiation typically occurred with low centralization, which suggests that a broader scope of activity is more difficult to centrally coordinate. Integration was also important, but we found that health networks and systems typically engaged in both ownership-based and contractual-based integration or they were not integrated at all. CONCLUSIONS: Overall, we were able to classify approximately 70 percent of hospital-led health networks and 90 percent of hospital-led health systems into well-defined organizational clusters. Given the widespread perception that organizational change in healthcare has been chaotic, our research suggests that important and meaningful similarities exist across many evolving organizations. The resulting taxonomy provides a new lexicon for researchers, policymakers, and healthcare executives for characterizing key strategic and structural features of evolving organizations. The taxonomy also provides a framework for future inquiry about the relationships between organizational strategy, structure, and performance, and for assessing policy issues, such as Medicare Provider Sponsored Organizations, antitrust, and insurance regulation. Images Figure 2A Figure 2A Figure 2B Figure 2B PMID:10029504

  18. QSAR study of anthranilic acid sulfonamides as inhibitors of methionine aminopeptidase-2 using LS-SVM and GRNN based on principal components.

    PubMed

    Shahlaei, Mohsen; Sabet, Razieh; Ziari, Maryam Bahman; Moeinifard, Behzad; Fassihi, Afshin; Karbakhsh, Reza

    2010-10-01

    Quantitative relationships between molecular structure and methionine aminopeptidase-2 inhibitory activity of a series of cytotoxic anthranilic acid sulfonamide derivatives were discovered. We have demonstrated the detailed application of two efficient nonlinear methods for evaluation of quantitative structure-activity relationships of the studied compounds. Components produced by principal component analysis as input of developed nonlinear models were used. The performance of the developed models namely PC-GRNN and PC-LS-SVM were tested by several validation methods. The resulted PC-LS-SVM model had a high statistical quality (R(2)=0.91 and R(CV)(2)=0.81) for predicting the cytotoxic activity of the compounds. Comparison between predictability of PC-GRNN and PC-LS-SVM indicates that later method has higher ability to predict the activity of the studied molecules. Copyright (c) 2010 Elsevier Masson SAS. All rights reserved.

  19. Home on the Great River, part 3: An Integrated Habitat and Hydrology Index

    EPA Science Inventory

    The U.S. EPA’s Environmental Monitoring and Assessment Program sampled 395 sites in the Upper Mississippi, Lower Missouri and Ohio Rivers in 2004-2006 as part of an integrated assessment of ecological condition. Using principal components and cluster analyses, we developed fish ...

  20. Principal Component 2-D Long Short-Term Memory for Font Recognition on Single Chinese Characters.

    PubMed

    Tao, Dapeng; Lin, Xu; Jin, Lianwen; Li, Xuelong

    2016-03-01

    Chinese character font recognition (CCFR) has received increasing attention as the intelligent applications based on optical character recognition becomes popular. However, traditional CCFR systems do not handle noisy data effectively. By analyzing in detail the basic strokes of Chinese characters, we propose that font recognition on a single Chinese character is a sequence classification problem, which can be effectively solved by recurrent neural networks. For robust CCFR, we integrate a principal component convolution layer with the 2-D long short-term memory (2DLSTM) and develop principal component 2DLSTM (PC-2DLSTM) algorithm. PC-2DLSTM considers two aspects: 1) the principal component layer convolution operation helps remove the noise and get a rational and complete font information and 2) simultaneously, 2DLSTM deals with the long-range contextual processing along scan directions that can contribute to capture the contrast between character trajectory and background. Experiments using the frequently used CCFR dataset suggest the effectiveness of PC-2DLSTM compared with other state-of-the-art font recognition methods.

  1. [Determination and principal component analysis of mineral elements based on ICP-OES in Nitraria roborowskii fruits from different regions].

    PubMed

    Yuan, Yuan-Yuan; Zhou, Yu-Bi; Sun, Jing; Deng, Juan; Bai, Ying; Wang, Jie; Lu, Xue-Feng

    2017-06-01

    The content of elements in fifteen different regions of Nitraria roborowskii samples were determined by inductively coupled plasma-atomic emission spectrometry(ICP-OES), and its elemental characteristics were analyzed by principal component analysis. The results indicated that 18 mineral elements were detected in N. roborowskii of which V cannot be detected. In addition, contents of Na, K and Ca showed high concentration. Ti showed maximum content variance, while K is minimum. Four principal components were gained from the original data. The cumulative variance contribution rate is 81.542% and the variance contribution of the first principal component was 44.997%, indicating that Cr, Fe, P and Ca were the characteristic elements of N. roborowskii.Thus, the established method was simple, precise and can be used for determination of mineral elements in N.roborowskii Kom. fruits. The elemental distribution characteristics among N.roborowskii fruits are related to geographical origins which were clearly revealed by PCA. All the results will provide good basis for comprehensive utilization of N.roborowskii. Copyright© by the Chinese Pharmaceutical Association.

  2. Combination of multivariate curve resolution and multivariate classification techniques for comprehensive high-performance liquid chromatography-diode array absorbance detection fingerprints analysis of Salvia reuterana extracts.

    PubMed

    Hakimzadeh, Neda; Parastar, Hadi; Fattahi, Mohammad

    2014-01-24

    In this study, multivariate curve resolution (MCR) and multivariate classification methods are proposed to develop a new chemometric strategy for comprehensive analysis of high-performance liquid chromatography-diode array absorbance detection (HPLC-DAD) fingerprints of sixty Salvia reuterana samples from five different geographical regions. Different chromatographic problems occurred during HPLC-DAD analysis of S. reuterana samples, such as baseline/background contribution and noise, low signal-to-noise ratio (S/N), asymmetric peaks, elution time shifts, and peak overlap are handled using the proposed strategy. In this way, chromatographic fingerprints of sixty samples are properly segmented to ten common chromatographic regions using local rank analysis and then, the corresponding segments are column-wise augmented for subsequent MCR analysis. Extended multivariate curve resolution-alternating least squares (MCR-ALS) is used to obtain pure component profiles in each segment. In general, thirty-one chemical components were resolved using MCR-ALS in sixty S. reuterana samples and the lack of fit (LOF) values of MCR-ALS models were below 10.0% in all cases. Pure spectral profiles are considered for identification of chemical components by comparing their resolved spectra with the standard ones and twenty-four components out of thirty-one components were identified. Additionally, pure elution profiles are used to obtain relative concentrations of chemical components in different samples for multivariate classification analysis by principal component analysis (PCA) and k-nearest neighbors (kNN). Inspection of the PCA score plot (explaining 76.1% of variance accounted for three PCs) showed that S. reuterana samples belong to four clusters. The degree of class separation (DCS) which quantifies the distance separating clusters in relation to the scatter within each cluster is calculated for four clusters and it was in the range of 1.6-5.8. These results are then confirmed by kNN. In addition, according to the PCA loading plot and kNN dendrogram of thirty-one variables, five chemical constituents of luteolin-7-o-glucoside, salvianolic acid D, rosmarinic acid, lithospermic acid and trijuganone A are identified as the most important variables (i.e., chemical markers) for clusters discrimination. Finally, the effect of different chemical markers on samples differentiation is investigated using counter-propagation artificial neural network (CP-ANN) method. It is concluded that the proposed strategy can be successfully applied for comprehensive analysis of chromatographic fingerprints of complex natural samples. Copyright © 2013 Elsevier B.V. All rights reserved.

  3. ASCS online fault detection and isolation based on an improved MPCA

    NASA Astrophysics Data System (ADS)

    Peng, Jianxin; Liu, Haiou; Hu, Yuhui; Xi, Junqiang; Chen, Huiyan

    2014-09-01

    Multi-way principal component analysis (MPCA) has received considerable attention and been widely used in process monitoring. A traditional MPCA algorithm unfolds multiple batches of historical data into a two-dimensional matrix and cut the matrix along the time axis to form subspaces. However, low efficiency of subspaces and difficult fault isolation are the common disadvantages for the principal component model. This paper presents a new subspace construction method based on kernel density estimation function that can effectively reduce the storage amount of the subspace information. The MPCA model and the knowledge base are built based on the new subspace. Then, fault detection and isolation with the squared prediction error (SPE) statistic and the Hotelling ( T 2) statistic are also realized in process monitoring. When a fault occurs, fault isolation based on the SPE statistic is achieved by residual contribution analysis of different variables. For fault isolation of subspace based on the T 2 statistic, the relationship between the statistic indicator and state variables is constructed, and the constraint conditions are presented to check the validity of fault isolation. Then, to improve the robustness of fault isolation to unexpected disturbances, the statistic method is adopted to set the relation between single subspace and multiple subspaces to increase the corrective rate of fault isolation. Finally fault detection and isolation based on the improved MPCA is used to monitor the automatic shift control system (ASCS) to prove the correctness and effectiveness of the algorithm. The research proposes a new subspace construction method to reduce the required storage capacity and to prove the robustness of the principal component model, and sets the relationship between the state variables and fault detection indicators for fault isolation.

  4. Capturing farm diversity with hypothesis-based typologies: An innovative methodological framework for farming system typology development

    PubMed Central

    Alvarez, Stéphanie; Timler, Carl J.; Michalscheck, Mirja; Paas, Wim; Descheemaeker, Katrien; Tittonell, Pablo; Andersson, Jens A.; Groot, Jeroen C. J.

    2018-01-01

    Creating typologies is a way to summarize the large heterogeneity of smallholder farming systems into a few farm types. Various methods exist, commonly using statistical analysis, to create these typologies. We demonstrate that the methodological decisions on data collection, variable selection, data-reduction and clustering techniques can bear a large impact on the typology results. We illustrate the effects of analysing the diversity from different angles, using different typology objectives and different hypotheses, on typology creation by using an example from Zambia’s Eastern Province. Five separate typologies were created with principal component analysis (PCA) and hierarchical clustering analysis (HCA), based on three different expert-informed hypotheses. The greatest overlap between typologies was observed for the larger, wealthier farm types but for the remainder of the farms there were no clear overlaps between typologies. Based on these results, we argue that the typology development should be guided by a hypothesis on the local agriculture features and the drivers and mechanisms of differentiation among farming systems, such as biophysical and socio-economic conditions. That hypothesis is based both on the typology objective and on prior expert knowledge and theories of the farm diversity in the study area. We present a methodological framework that aims to integrate participatory and statistical methods for hypothesis-based typology construction. This is an iterative process whereby the results of the statistical analysis are compared with the reality of the target population as hypothesized by the local experts. Using a well-defined hypothesis and the presented methodological framework, which consolidates the hypothesis through local expert knowledge for the creation of typologies, warrants development of less subjective and more contextualized quantitative farm typologies. PMID:29763422

  5. Capturing farm diversity with hypothesis-based typologies: An innovative methodological framework for farming system typology development.

    PubMed

    Alvarez, Stéphanie; Timler, Carl J; Michalscheck, Mirja; Paas, Wim; Descheemaeker, Katrien; Tittonell, Pablo; Andersson, Jens A; Groot, Jeroen C J

    2018-01-01

    Creating typologies is a way to summarize the large heterogeneity of smallholder farming systems into a few farm types. Various methods exist, commonly using statistical analysis, to create these typologies. We demonstrate that the methodological decisions on data collection, variable selection, data-reduction and clustering techniques can bear a large impact on the typology results. We illustrate the effects of analysing the diversity from different angles, using different typology objectives and different hypotheses, on typology creation by using an example from Zambia's Eastern Province. Five separate typologies were created with principal component analysis (PCA) and hierarchical clustering analysis (HCA), based on three different expert-informed hypotheses. The greatest overlap between typologies was observed for the larger, wealthier farm types but for the remainder of the farms there were no clear overlaps between typologies. Based on these results, we argue that the typology development should be guided by a hypothesis on the local agriculture features and the drivers and mechanisms of differentiation among farming systems, such as biophysical and socio-economic conditions. That hypothesis is based both on the typology objective and on prior expert knowledge and theories of the farm diversity in the study area. We present a methodological framework that aims to integrate participatory and statistical methods for hypothesis-based typology construction. This is an iterative process whereby the results of the statistical analysis are compared with the reality of the target population as hypothesized by the local experts. Using a well-defined hypothesis and the presented methodological framework, which consolidates the hypothesis through local expert knowledge for the creation of typologies, warrants development of less subjective and more contextualized quantitative farm typologies.

  6. Profiling the different needs and expectations of patients for population-based medicine: a case study using segmentation analysis

    PubMed Central

    2012-01-01

    Background This study illustrates an evidence-based method for the segmentation analysis of patients that could greatly improve the approach to population-based medicine, by filling a gap in the empirical analysis of this topic. Segmentation facilitates individual patient care in the context of the culture, health status, and the health needs of the entire population to which that patient belongs. Because many health systems are engaged in developing better chronic care management initiatives, patient profiles are critical to understanding whether some patients can move toward effective self-management and can play a central role in determining their own care, which fosters a sense of responsibility for their own health. A review of the literature on patient segmentation provided the background for this research. Method First, we conducted a literature review on patient satisfaction and segmentation to build a survey. Then, we performed 3,461 surveys of outpatient services users. The key structures on which the subjects’ perception of outpatient services was based were extrapolated using principal component factor analysis with varimax rotation. After the factor analysis, segmentation was performed through cluster analysis to better analyze the influence of individual attitudes on the results. Results Four segments were identified through factor and cluster analysis: the “unpretentious,” the “informed and supported,” the “experts” and the “advanced” patients. Their policies and managerial implications are outlined. Conclusions With this research, we provide the following: – a method for profiling patients based on common patient satisfaction surveys that is easily replicable in all health systems and contexts; – a proposal for segments based on the results of a broad-based analysis conducted in the Italian National Health System (INHS). Segments represent profiles of patients requiring different strategies for delivering health services. Their knowledge and analysis might support an effort to build an effective population-based medicine approach. PMID:23256543

  7. Design of hybrid radial basis function neural networks (HRBFNNs) realized with the aid of hybridization of fuzzy clustering method (FCM) and polynomial neural networks (PNNs).

    PubMed

    Huang, Wei; Oh, Sung-Kwun; Pedrycz, Witold

    2014-12-01

    In this study, we propose Hybrid Radial Basis Function Neural Networks (HRBFNNs) realized with the aid of fuzzy clustering method (Fuzzy C-Means, FCM) and polynomial neural networks. Fuzzy clustering used to form information granulation is employed to overcome a possible curse of dimensionality, while the polynomial neural network is utilized to build local models. Furthermore, genetic algorithm (GA) is exploited here to optimize the essential design parameters of the model (including fuzzification coefficient, the number of input polynomial fuzzy neurons (PFNs), and a collection of the specific subset of input PFNs) of the network. To reduce dimensionality of the input space, principal component analysis (PCA) is considered as a sound preprocessing vehicle. The performance of the HRBFNNs is quantified through a series of experiments, in which we use several modeling benchmarks of different levels of complexity (different number of input variables and the number of available data). A comparative analysis reveals that the proposed HRBFNNs exhibit higher accuracy in comparison to the accuracy produced by some models reported previously in the literature. Copyright © 2014 Elsevier Ltd. All rights reserved.

  8. How do components of evidence-based psychological treatment cluster in practice? A survey and cluster analysis.

    PubMed

    Gifford, Elizabeth V; Tavakoli, Sara; Weingardt, Kenneth R; Finney, John W; Pierson, Heather M; Rosen, Craig S; Hagedorn, Hildi J; Cook, Joan M; Curran, Geoff M

    2012-01-01

    Evidence-based psychological treatments (EBPTs) are clusters of interventions, but it is unclear how providers actually implement these clusters in practice. A disaggregated measure of EBPTs was developed to characterize clinicians' component-level evidence-based practices and to examine relationships among these practices. Survey items captured components of evidence-based treatments based on treatment integrity measures. The Web-based survey was conducted with 75 U.S. Department of Veterans Affairs (VA) substance use disorder (SUD) practitioners and 149 non-VA community-based SUD practitioners. Clinician's self-designated treatment orientations were positively related to their endorsement of those EBPT components; however, clinicians used components from a variety of EBPTs. Hierarchical cluster analysis indicated that clinicians combined and organized interventions from cognitive-behavioral therapy, the community reinforcement approach, motivational interviewing, structured family and couples therapy, 12-step facilitation, and contingency management into clusters including empathy and support, treatment engagement and activation, abstinence initiation, and recovery maintenance. Understanding how clinicians use EBPT components may lead to improved evidence-based practice dissemination and implementation. Published by Elsevier Inc.

  9. Randomized subspace-based robust principal component analysis for hyperspectral anomaly detection

    NASA Astrophysics Data System (ADS)

    Sun, Weiwei; Yang, Gang; Li, Jialin; Zhang, Dianfa

    2018-01-01

    A randomized subspace-based robust principal component analysis (RSRPCA) method for anomaly detection in hyperspectral imagery (HSI) is proposed. The RSRPCA combines advantages of randomized column subspace and robust principal component analysis (RPCA). It assumes that the background has low-rank properties, and the anomalies are sparse and do not lie in the column subspace of the background. First, RSRPCA implements random sampling to sketch the original HSI dataset from columns and to construct a randomized column subspace of the background. Structured random projections are also adopted to sketch the HSI dataset from rows. Sketching from columns and rows could greatly reduce the computational requirements of RSRPCA. Second, the RSRPCA adopts the columnwise RPCA (CWRPCA) to eliminate negative effects of sampled anomaly pixels and that purifies the previous randomized column subspace by removing sampled anomaly columns. The CWRPCA decomposes the submatrix of the HSI data into a low-rank matrix (i.e., background component), a noisy matrix (i.e., noise component), and a sparse anomaly matrix (i.e., anomaly component) with only a small proportion of nonzero columns. The algorithm of inexact augmented Lagrange multiplier is utilized to optimize the CWRPCA problem and estimate the sparse matrix. Nonzero columns of the sparse anomaly matrix point to sampled anomaly columns in the submatrix. Third, all the pixels are projected onto the complemental subspace of the purified randomized column subspace of the background and the anomaly pixels in the original HSI data are finally exactly located. Several experiments on three real hyperspectral images are carefully designed to investigate the detection performance of RSRPCA, and the results are compared with four state-of-the-art methods. Experimental results show that the proposed RSRPCA outperforms four comparison methods both in detection performance and in computational time.

  10. Delineation of marine ecosystem zones in the northern Arabian Sea during winter

    NASA Astrophysics Data System (ADS)

    Shalin, Saleem; Samuelsen, Annette; Korosov, Anton; Menon, Nandini; Backeberg, Björn C.; Pettersson, Lasse H.

    2018-03-01

    The spatial and temporal variability of marine autotrophic abundance, expressed as chlorophyll concentration, is monitored from space and used to delineate the surface signature of marine ecosystem zones with distinct optical characteristics. An objective zoning method is presented and applied to satellite-derived Chlorophyll a (Chl a) data from the northern Arabian Sea (50-75° E and 15-30° N) during the winter months (November-March). Principal component analysis (PCA) and cluster analysis (CA) were used to statistically delineate the Chl a into zones with similar surface distribution patterns and temporal variability. The PCA identifies principal components of variability and the CA splits these into zones based on similar characteristics. Based on the temporal variability of the Chl a pattern within the study area, the statistical clustering revealed six distinct ecological zones. The obtained zones are related to the Longhurst provinces to evaluate how these compared to established ecological provinces. The Chl a variability within each zone was then compared with the variability of oceanic and atmospheric properties viz. mixed-layer depth (MLD), wind speed, sea-surface temperature (SST), photosynthetically active radiation (PAR), nitrate and dust optical thickness (DOT) as an indication of atmospheric input of iron to the ocean. The analysis showed that in all zones, peak values of Chl a coincided with low SST and deep MLD. The rate of decrease in SST and the deepening of MLD are observed to trigger the algae bloom events in the first four zones. Lagged cross-correlation analysis shows that peak Chl a follows peak MLD and SST minima. The MLD time lag is shorter than the SST lag by 8 days, indicating that the cool surface conditions might have enhanced mixing, leading to increased primary production in the study area. An analysis of monthly climatological nitrate values showed increased concentrations associated with the deepening of the mixed layer. The input of iron seems to be important in both the open-ocean and coastal areas of the northern and north-western parts of the northern Arabian Sea, where the seasonal variability of the Chl a pattern closely follows the variability of iron deposition.

  11. Data for Known Geothermal Resource Areas (KGRA) and Identified Hydrothermal Resource Areas (IHRA) in Southern Idaho and Southeastern Oregon

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Neupane, Ghanashyam; McLing, Travis; Mattson, Earl

    The presented database includes water chemistry data and structural rating values for various geothermal features used for performing principal component (PC) and cluster analyses work to identify promising KGRAs and IHRAs in southern Idaho and southeastern Oregon. A brief note on various KGRAs/IHRAs is also included herewith. Results of PC and cluster analyses are presented as a separate paper (Lindsey et al., 2017) that is, as of the time of this submission, in 'revision' status.

  12. RP-HPLC method using 6-aminoquinolyl-N-hydroxysuccinimidyl carbamate incorporated with normalization technique in principal component analysis to differentiate the bovine, porcine and fish gelatins.

    PubMed

    Azilawati, M I; Hashim, D M; Jamilah, B; Amin, I

    2015-04-01

    The amino acid compositions of bovine, porcine and fish gelatin were determined by amino acid analysis using 6-aminoquinolyl-N-hydroxysuccinimidyl carbamate as derivatization reagent. Sixteen amino acids were identified with similar spectral chromatograms. Data pre-treatment via centering and transformation of data by normalization were performed to provide data that are more suitable for analysis and easier to be interpreted. Principal component analysis (PCA) transformed the original data matrix into a number of principal components (PCs). Three principal components (PCs) described 96.5% of the total variance, and 2 PCs (91%) explained the highest variances. The PCA model demonstrated the relationships among amino acids in the correlation loadings plot to the group of gelatins in the scores plot. Fish gelatin was correlated to threonine, serine and methionine on the positive side of PC1; bovine gelatin was correlated to the non-polar side chains amino acids that were proline, hydroxyproline, leucine, isoleucine and valine on the negative side of PC1 and porcine gelatin was correlated to the polar side chains amino acids that were aspartate, glutamic acid, lysine and tyrosine on the negative side of PC2. Verification on the database using 12 samples from commercial products gelatin-based had confirmed the grouping patterns and the variables correlations. Therefore, this quantitative method is very useful as a screening method to determine gelatin from various sources. Copyright © 2014 Elsevier Ltd. All rights reserved.

  13. [In vitro transdermal delivery of the active fraction of xiangfusiwu decoction based on principal component analysis].

    PubMed

    Li, Zhen-Hao; Liu, Pei; Qian, Da-Wei; Li, Wei; Shang, Er-Xin; Duan, Jin-Ao

    2013-06-01

    The objective of the present study was to establish a method based on principal component analysis (PCA) for the study of transdermal delivery of multiple components in Chinese medicine, and to choose the best penetration enhancers for the active fraction of Xiangfusiwu decoction (BW) with this method. Improved Franz diffusion cells with isolated rat abdomen skins were carried out to experiment on the transdermal delivery of six active components, including ferulic acid, paeoniflorin, albiflorin, protopine, tetrahydropalmatine and tetrahydrocolumbamine. The concentrations of these components were determined by LC-MS/MS, then the total factor scores of the concentrations at different times were calculated using PCA and were employed instead of the concentrations to compute the cumulative amounts and steady fluxes, the latter of which were considered as the indexes for optimizing penetration enhancers. The results showed that compared to the control group, the steady fluxes of the other groups increased significantly and furthermore, 4% azone with 1% propylene glycol manifested the best effect. The six components could penetrate through skin well under the action of penetration enhancers. The method established in this study has been proved to be suitable for the study of transdermal delivery of multiple components, and it provided a scientific basis for preparation research of Xiangfusiwu decoction and moreover, it could be a reference for Chinese medicine research.

  14. Characterization of Type Ia Supernova Light Curves Using Principal Component Analysis of Sparse Functional Data

    NASA Astrophysics Data System (ADS)

    He, Shiyuan; Wang, Lifan; Huang, Jianhua Z.

    2018-04-01

    With growing data from ongoing and future supernova surveys, it is possible to empirically quantify the shapes of SNIa light curves in more detail, and to quantitatively relate the shape parameters with the intrinsic properties of SNIa. Building such relationships is critical in controlling systematic errors associated with supernova cosmology. Based on a collection of well-observed SNIa samples accumulated in the past years, we construct an empirical SNIa light curve model using a statistical method called the functional principal component analysis (FPCA) for sparse and irregularly sampled functional data. Using this method, the entire light curve of an SNIa is represented by a linear combination of principal component functions, and the SNIa is represented by a few numbers called “principal component scores.” These scores are used to establish relations between light curve shapes and physical quantities such as intrinsic color, interstellar dust reddening, spectral line strength, and spectral classes. These relations allow for descriptions of some critical physical quantities based purely on light curve shape parameters. Our study shows that some important spectral feature information is being encoded in the broad band light curves; for instance, we find that the light curve shapes are correlated with the velocity and velocity gradient of the Si II λ6355 line. This is important for supernova surveys (e.g., LSST and WFIRST). Moreover, the FPCA light curve model is used to construct the entire light curve shape, which in turn is used in a functional linear form to adjust intrinsic luminosity when fitting distance models.

  15. Metabolic Clustering Analysis as a Strategy for Compound Selection in the Drug Discovery Pipeline for Leishmaniasis.

    PubMed

    Armitage, Emily G; Godzien, Joanna; Peña, Imanol; López-Gonzálvez, Ángeles; Angulo, Santiago; Gradillas, Ana; Alonso-Herranz, Vanesa; Martín, Julio; Fiandor, Jose M; Barrett, Michael P; Gabarro, Raquel; Barbas, Coral

    2018-05-18

    A lack of viable hits, increasing resistance, and limited knowledge on mode of action is hindering drug discovery for many diseases. To optimize prioritization and accelerate the discovery process, a strategy to cluster compounds based on more than chemical structure is required. We show the power of metabolomics in comparing effects on metabolism of 28 different candidate treatments for Leishmaniasis (25 from the GSK Leishmania box, two analogues of Leishmania box series, and amphotericin B as a gold standard treatment), tested in the axenic amastigote form of Leishmania donovani. Capillary electrophoresis-mass spectrometry was applied to identify the metabolic profile of Leishmania donovani, and principal components analysis was used to cluster compounds on potential mode of action, offering a medium throughput screening approach in drug selection/prioritization. The comprehensive and sensitive nature of the data has also made detailed effects of each compound obtainable, providing a resource to assist in further mechanistic studies and prioritization of these compounds for the development of new antileishmanial drugs.

  16. A theoretical study in extracting the essential features and dynamics of molecular motions: Intrinsic geometry methods for PF(5) pseudorotations and statistical methods for argon clusters

    NASA Astrophysics Data System (ADS)

    Panahi, Nima S.

    We studied the problem of understanding and computing the essential features and dynamics of molecular motions through the development of two theories for two different systems. First, we studied the process of the Berry Pseudorotation of PF5 and the rotations it induces in the molecule through its natural and intrinsic geometric nature by setting it in the language of fiber bundles and graph theory. With these tools, we successfully extracted the essentials of the process' loops and induced rotations. The infinite number of pseudorotation loops were broken down into a small set of essential loops called "super loops", with their intrinsic properties and link to the physical movements of the molecule extensively studied. In addition, only the three "self-edge loops" generated any induced rotations, and then only a finite number of classes of them. Second, we studied applying the statistical methods of Principal Components Analysis (PCA) and Principal Coordinate Analysis (PCO) to capture only the most important changes in Argon clusters so as to reduce computational costs and graph the potential energy surface (PES) in three dimensions respectively. Both methods proved successful, but PCA was only partially successful since one will only see advantages for PES database systems much larger than those both currently being studied and those that can be computationally studied in the next few decades to come. In addition, PCA is only needed for the very rare case of a PES database that does not already include Hessian eigenvalues.

  17. AFLP-based genetic diversity assessment of commercially important tea germplasm in India.

    PubMed

    Sharma, R K; Negi, M S; Sharma, S; Bhardwaj, P; Kumar, R; Bhattachrya, E; Tripathi, S B; Vijayan, D; Baruah, A R; Das, S C; Bera, B; Rajkumar, R; Thomas, J; Sud, R K; Muraleedharan, N; Hazarika, M; Lakshmikumaran, M; Raina, S N; Ahuja, P S

    2010-08-01

    India has a large repository of important tea accessions and, therefore, plays a major role in improving production and quality of tea across the world. Using seven AFLP primer combinations, we analyzed 123 commercially important tea accessions representing major populations in India. The overall genetic similarity recorded was 51%. No significant differences were recorded in average genetic similarity among tea populations cultivated in various geographic regions (northwest 0.60, northeast and south both 0.59). UPGMA cluster analysis grouped the tea accessions according to geographic locations, with a bias toward China or Assam/Cambod types. Cluster analysis results were congruent with principal component analysis. Further, analysis of molecular variance detected a high level of genetic variation (85%) within and limited genetic variation (15%) among the populations, suggesting their origin from a similar genetic pool.

  18. Computational methods for evaluation of cell-based data assessment--Bioconductor.

    PubMed

    Le Meur, Nolwenn

    2013-02-01

    Recent advances in miniaturization and automation of technologies have enabled cell-based assay high-throughput screening, bringing along new challenges in data analysis. Automation, standardization, reproducibility have become requirements for qualitative research. The Bioconductor community has worked in that direction proposing several R packages to handle high-throughput data including flow cytometry (FCM) experiment. Altogether, these packages cover the main steps of a FCM analysis workflow, that is, data management, quality assessment, normalization, outlier detection, automated gating, cluster labeling, and feature extraction. Additionally, the open-source philosophy of R and Bioconductor, which offers room for new development, continuously drives research and improvement of theses analysis methods, especially in the field of clustering and data mining. This review presents the principal FCM packages currently available in R and Bioconductor, their advantages and their limits. Copyright © 2012 Elsevier Ltd. All rights reserved.

  19. Investigation of cell wall composition related to stem lodging resistance in wheat (Triticum aestivum L.) by FTIR spectroscopy.

    PubMed

    Wang, Jian; Zhu, Jinmao; Huang, RuZhu; Yang, YuSheng

    2012-07-01

    We explored the rapid qualitative analysis of wheat cultivars with good lodging resistances by Fourier transform infrared resonance (FTIR) spectroscopy and multivariate statistical analysis. FTIR imaging showing that wheat stem cell walls were mainly composed of cellulose, pectin, protein, and lignin. Principal components analysis (PCA) was used to eliminate multicollinearity among multiple peak absorptions. PCA revealed the developmental internodes of wheat stems could be distributed from low to high along the load of the second principal component, which was consistent with the corresponding bands of cellulose in the FTIR spectra of the cell walls. Furthermore, four distinct stem populations could also be identified by spectral features related to their corresponding mechanical properties via PCA and cluster analysis. Histochemical staining of four types of wheat stems with various abilities to resist lodging revealed that cellulose contributed more than lignin to the ability to resist lodging. These results strongly suggested that the main cell wall component responsible for these differences was cellulose. Therefore, the combination of multivariate analysis and FTIR could rapidly screen wheat cultivars with good lodging resistance. Furthermore, the application of these methods to a much wider range of cultivars of unknown mechanical properties promises to be of interest.

  20. Fine structure of the low-frequency spectra of heart rate and blood pressure

    PubMed Central

    Kuusela, Tom A; Kaila, Timo J; Kähönen, Mika

    2003-01-01

    Background The aim of this study was to explore the principal frequency components of the heart rate and blood pressure variability in the low frequency (LF) and very low frequency (VLF) band. The spectral composition of the R–R interval (RRI) and systolic arterial blood pressure (SAP) in the frequency range below 0.15 Hz were carefully analyzed using three different spectral methods: Fast Fourier transform (FFT), Wigner-Ville distribution (WVD), and autoregression (AR). All spectral methods were used to create time–frequency plots to uncover the principal spectral components that are least dependent on time. The accurate frequencies of these components were calculated from the pole decomposition of the AR spectral density after determining the optimal model order – the most crucial factor when using this method – with the help of FFT and WVD methods. Results Spectral analysis of the RRI and SAP of 12 healthy subjects revealed that there are always at least three spectral components below 0.15 Hz. The three principal frequency components are 0.026 ± 0.003 (mean ± SD) Hz, 0.076 ± 0.012 Hz, and 0.117 ± 0.016 Hz. These principal components vary only slightly over time. FFT-based coherence and phase-function analysis suggests that the second and third components are related to the baroreflex control of blood pressure, since the phase difference between SAP and RRI was negative and almost constant, whereas the origin of the first component is different since no clear SAP–RRI phase relationship was found. Conclusion The above data indicate that spontaneous fluctuations in heart rate and blood pressure within the standard low-frequency range of 0.04–0.15 Hz typically occur at two frequency components rather than only at one as widely believed, and these components are not harmonically related. This new observation in humans can help explain divergent results in the literature concerning spontaneous low-frequency oscillations. It also raises methodological and computational questions regarding the usability and validity of the low-frequency spectral band when estimating sympathetic activity and baroreflex gain. PMID:14552660

  1. Fine structure of the low-frequency spectra of heart rate and blood pressure.

    PubMed

    Kuusela, Tom A; Kaila, Timo J; Kähönen, Mika

    2003-10-13

    The aim of this study was to explore the principal frequency components of the heart rate and blood pressure variability in the low frequency (LF) and very low frequency (VLF) band. The spectral composition of the R-R interval (RRI) and systolic arterial blood pressure (SAP) in the frequency range below 0.15 Hz were carefully analyzed using three different spectral methods: Fast Fourier transform (FFT), Wigner-Ville distribution (WVD), and autoregression (AR). All spectral methods were used to create time-frequency plots to uncover the principal spectral components that are least dependent on time. The accurate frequencies of these components were calculated from the pole decomposition of the AR spectral density after determining the optimal model order--the most crucial factor when using this method--with the help of FFT and WVD methods. Spectral analysis of the RRI and SAP of 12 healthy subjects revealed that there are always at least three spectral components below 0.15 Hz. The three principal frequency components are 0.026 +/- 0.003 (mean +/- SD) Hz, 0.076 +/- 0.012 Hz, and 0.117 +/- 0.016 Hz. These principal components vary only slightly over time. FFT-based coherence and phase-function analysis suggests that the second and third components are related to the baroreflex control of blood pressure, since the phase difference between SAP and RRI was negative and almost constant, whereas the origin of the first component is different since no clear SAP-RRI phase relationship was found. The above data indicate that spontaneous fluctuations in heart rate and blood pressure within the standard low-frequency range of 0.04-0.15 Hz typically occur at two frequency components rather than only at one as widely believed, and these components are not harmonically related. This new observation in humans can help explain divergent results in the literature concerning spontaneous low-frequency oscillations. It also raises methodological and computational questions regarding the usability and validity of the low-frequency spectral band when estimating sympathetic activity and baroreflex gain.

  2. Genetic differentiation and geographical Relationship of Asian barley landraces using SSRs

    PubMed Central

    Naeem, Rehan; Dahleen, Lynn; Mirza, Bushra

    2011-01-01

    Genetic diversity in 403 morphologically distinct landraces of barley (Hordeum vulgare L. subsp. vulgare) originating from seven geographical zones of Asia was studied using simple sequence repeat (SSR) markers from regions of medium to high recombination in the barley genome. The seven polymorphic SSR markers representing each of the chromosomes chosen for the study revealed a high level of allelic diversity among the landraces. Genetic richness was highest in those from India, followed by Pakistan while it was lowest for Uzbekistan and Turkmenistan. Out of the 50 alleles detected, 15 were unique to a geographic region. Genetic diversity was highest for landraces from Pakistan (0.70 ± 0.06) and lowest for those from Uzbekistan (0.18 ± 0.17). Likewise, polymorphic information content (PIC) was highest for Pakistan (0.67 ± 0.06) and lowest for Uzbekistan (0.15 ± 0.17). Diversity among groups was 40% compared to 60% within groups. Principal component analysis clustered the barley landraces into three groups to predict their domestication patterns. In total 51.58% of the variation was explained by the first two principal components of the barley germplasm. Pakistan landraces were clustered separately from those of India, Iran, Nepal and Iraq, whereas those from Turkmenistan and Uzbekistan were clustered together into a separate group. PMID:21734828

  3. Early forest fire detection using principal component analysis of infrared video

    NASA Astrophysics Data System (ADS)

    Saghri, John A.; Radjabi, Ryan; Jacobs, John T.

    2011-09-01

    A land-based early forest fire detection scheme which exploits the infrared (IR) temporal signature of fire plume is described. Unlike common land-based and/or satellite-based techniques which rely on measurement and discrimination of fire plume directly from its infrared and/or visible reflectance imagery, this scheme is based on exploitation of fire plume temporal signature, i.e., temperature fluctuations over the observation period. The method is simple and relatively inexpensive to implement. The false alarm rate is expected to be lower that of the existing methods. Land-based infrared (IR) cameras are installed in a step-stare-mode configuration in potential fire-prone areas. The sequence of IR video frames from each camera is digitally processed to determine if there is a fire within camera's field of view (FOV). The process involves applying a principal component transformation (PCT) to each nonoverlapping sequence of video frames from the camera to produce a corresponding sequence of temporally-uncorrelated principal component (PC) images. Since pixels that form a fire plume exhibit statistically similar temporal variation (i.e., have a unique temporal signature), PCT conveniently renders the footprint/trace of the fire plume in low-order PC images. The PC image which best reveals the trace of the fire plume is then selected and spatially filtered via simple threshold and median filter operations to remove the background clutter, such as traces of moving tree branches due to wind.

  4. Assessment of technological level of stem cell research using principal component analysis.

    PubMed

    Do Cho, Sung; Hwan Hyun, Byung; Kim, Jae Kyeom

    2016-01-01

    In general, technological levels have been assessed based on specialist's opinion through the methods such as Delphi. But in such cases, results could be significantly biased per study design and individual expert. In this study, therefore scientific literatures and patents were selected by means of analytic indexes for statistic approach and technical assessment of stem cell fields. The analytic indexes, numbers and impact indexes of scientific literatures and patents, were weighted based on principal component analysis, and then, were summated into the single value. Technological obsolescence was calculated through the cited half-life of patents issued by the United States Patents and Trademark Office and was reflected in technological level assessment. As results, ranks of each nation's in reference to the technology level were rated by the proposed method. Furthermore we were able to evaluate strengthens and weaknesses thereof. Although our empirical research presents faithful results, in the further study, there is a need to compare the existing methods and the suggested method.

  5. Architectural measures of the cancellous bone of the mandibular condyle identified by principal components analysis.

    PubMed

    Giesen, E B W; Ding, M; Dalstra, M; van Eijden, T M G J

    2003-09-01

    As several morphological parameters of cancellous bone express more or less the same architectural measure, we applied principal components analysis to group these measures and correlated these to the mechanical properties. Cylindrical specimens (n = 24) were obtained in different orientations from embalmed mandibular condyles; the angle of the first principal direction and the axis of the specimen, expressing the orientation of the trabeculae, ranged from 10 degrees to 87 degrees. Morphological parameters were determined by a method based on Archimedes' principle and by micro-CT scanning, and the mechanical properties were obtained by mechanical testing. The principal components analysis was used to obtain a set of independent components to describe the morphology. This set was entered into linear regression analyses for explaining the variance in mechanical properties. The principal components analysis revealed four components: amount of bone, number of trabeculae, trabecular orientation, and miscellaneous. They accounted for about 90% of the variance in the morphological variables. The component loadings indicated that a higher amount of bone was primarily associated with more plate-like trabeculae, and not with more or thicker trabeculae. The trabecular orientation was most determinative (about 50%) in explaining stiffness, strength, and failure energy. The amount of bone was second most determinative and increased the explained variance to about 72%. These results suggest that trabecular orientation and amount of bone are important in explaining the anisotropic mechanical properties of the cancellous bone of the mandibular condyle.

  6. Dimension Reduction of Hyperspectral Data on Beowulf Clusters

    NASA Technical Reports Server (NTRS)

    El-Ghazawi, Tarek

    2000-01-01

    Traditional remote sensing instruments are multispectral, where observations are collected at a few different spectral bands. Recently, many hyperspectral instruments, that can collect observations at hundreds of bands, have been operation. Furthermore, there have been ongoing research efforts on ultraspectral instruments that can produce observations at thousands of spectral bands. While these remote sensing technology developments hold a great promise for new findings in the area of Earth and space science, they present many challenges. These include the need for faster processing of such increased data volumes, and methods for data reduction. Dimension Reduction is a spectral transformation, which is used widely in remote sensing, is the Principal Components Analysis (PCA). In light of the growing number of spectral channels of modern instruments, the paper reports on the development of a parallel PCA and its implementation on two Beowulf cluster configurations, on with fast Ethernet switch and the other is with a Myrinet interconnection.

  7. Authentication of monofloral Yemeni Sidr honey using ultraviolet spectroscopy and chemometric analysis.

    PubMed

    Roshan, Abdul-Rahman A; Gad, Haidy A; El-Ahmady, Sherweit H; Khanbash, Mohamed S; Abou-Shoer, Mohamed I; Al-Azizi, Mohamed M

    2013-08-14

    This work describes a simple model developed for the authentication of monofloral Yemeni Sidr honey using UV spectroscopy together with chemometric techniques of hierarchical cluster analysis (HCA), principal component analysis (PCA), and soft independent modeling of class analogy (SIMCA). The model was constructed using 13 genuine Sidr honey samples and challenged with 25 honey samples of different botanical origins. HCA and PCA were successfully able to present a preliminary clustering pattern to segregate the genuine Sidr samples from the lower priced local polyfloral and non-Sidr samples. The SIMCA model presented a clear demarcation of the samples and was used to identify genuine Sidr honey samples as well as detect admixture with lower priced polyfloral honey by detection limits >10%. The constructed model presents a simple and efficient method of analysis and may serve as a basis for the authentication of other honey types worldwide.

  8. Improved Statistical Fault Detection Technique and Application to Biological Phenomena Modeled by S-Systems.

    PubMed

    Mansouri, Majdi; Nounou, Mohamed N; Nounou, Hazem N

    2017-09-01

    In our previous work, we have demonstrated the effectiveness of the linear multiscale principal component analysis (PCA)-based moving window (MW)-generalized likelihood ratio test (GLRT) technique over the classical PCA and multiscale principal component analysis (MSPCA)-based GLRT methods. The developed fault detection algorithm provided optimal properties by maximizing the detection probability for a particular false alarm rate (FAR) with different values of windows, and however, most real systems are nonlinear, which make the linear PCA method not able to tackle the issue of non-linearity to a great extent. Thus, in this paper, first, we apply a nonlinear PCA to obtain an accurate principal component of a set of data and handle a wide range of nonlinearities using the kernel principal component analysis (KPCA) model. The KPCA is among the most popular nonlinear statistical methods. Second, we extend the MW-GLRT technique to one that utilizes exponential weights to residuals in the moving window (instead of equal weightage) as it might be able to further improve fault detection performance by reducing the FAR using exponentially weighed moving average (EWMA). The developed detection method, which is called EWMA-GLRT, provides improved properties, such as smaller missed detection and FARs and smaller average run length. The idea behind the developed EWMA-GLRT is to compute a new GLRT statistic that integrates current and previous data information in a decreasing exponential fashion giving more weight to the more recent data. This provides a more accurate estimation of the GLRT statistic and provides a stronger memory that will enable better decision making with respect to fault detection. Therefore, in this paper, a KPCA-based EWMA-GLRT method is developed and utilized in practice to improve fault detection in biological phenomena modeled by S-systems and to enhance monitoring process mean. The idea behind a KPCA-based EWMA-GLRT fault detection algorithm is to combine the advantages brought forward by the proposed EWMA-GLRT fault detection chart with the KPCA model. Thus, it is used to enhance fault detection of the Cad System in E. coli model through monitoring some of the key variables involved in this model such as enzymes, transport proteins, regulatory proteins, lysine, and cadaverine. The results demonstrate the effectiveness of the proposed KPCA-based EWMA-GLRT method over Q , GLRT, EWMA, Shewhart, and moving window-GLRT methods. The detection performance is assessed and evaluated in terms of FAR, missed detection rates, and average run length (ARL 1 ) values.

  9. Classifying Facial Actions

    PubMed Central

    Donato, Gianluca; Bartlett, Marian Stewart; Hager, Joseph C.; Ekman, Paul; Sejnowski, Terrence J.

    2010-01-01

    The Facial Action Coding System (FACS) [23] is an objective method for quantifying facial movement in terms of component actions. This system is widely used in behavioral investigations of emotion, cognitive processes, and social interaction. The coding is presently performed by highly trained human experts. This paper explores and compares techniques for automatically recognizing facial actions in sequences of images. These techniques include analysis of facial motion through estimation of optical flow; holistic spatial analysis, such as principal component analysis, independent component analysis, local feature analysis, and linear discriminant analysis; and methods based on the outputs of local filters, such as Gabor wavelet representations and local principal components. Performance of these systems is compared to naive and expert human subjects. Best performances were obtained using the Gabor wavelet representation and the independent component representation, both of which achieved 96 percent accuracy for classifying 12 facial actions of the upper and lower face. The results provide converging evidence for the importance of using local filters, high spatial frequencies, and statistical independence for classifying facial actions. PMID:21188284

  10. Authentication of commercial spices based on the similarities between gas chromatographic fingerprints.

    PubMed

    Matsushita, Takaya; Zhao, Jing Jing; Igura, Noriyuki; Shimoda, Mitsuya

    2018-06-01

    A simple and solvent-free method was developed for the authentication of commercial spices. The similarities between gas chromatographic fingerprints were measured using similarity indices and multivariate data analyses, as morphological differentiation between dried powders and small spice particles was challenging. The volatile compounds present in 11 spices (i.e. allspice, anise, black pepper, caraway, clove, coriander, cumin, dill, fennel, star anise, and white pepper) were extracted by headspace solid-phase microextraction, and analysed by gas chromatography-mass spectrometry. The largest 10 peaks were selected from each total ion chromatogram, and a total of 65 volatiles were tentatively identified. The similarity indices (i.e. the congruence coefficients) were calculated using the data matrices of the identified compound relative peak areas to differentiate between two sets of fingerprints. Where pairs of similar fingerprints produced high congruence coefficients (>0.80), distinctive volatile markers were employed to distinguish between these samples. In addition, hierarchical cluster analysis and principal component analysis were performed to visualise the similarity among fingerprints, and the analysed spices were grouped and characterised according to their distinctive major components. This method is suitable for screening unknown spices, and can therefore be employed to evaluate the quality and authenticity of various spices. © 2017 Society of Chemical Industry. © 2017 Society of Chemical Industry.

  11. Time-oriented hierarchical method for computation of principal components using subspace learning algorithm.

    PubMed

    Jankovic, Marko; Ogawa, Hidemitsu

    2004-10-01

    Principal Component Analysis (PCA) and Principal Subspace Analysis (PSA) are classic techniques in statistical data analysis, feature extraction and data compression. Given a set of multivariate measurements, PCA and PSA provide a smaller set of "basis vectors" with less redundancy, and a subspace spanned by them, respectively. Artificial neurons and neural networks have been shown to perform PSA and PCA when gradient ascent (descent) learning rules are used, which is related to the constrained maximization (minimization) of statistical objective functions. Due to their low complexity, such algorithms and their implementation in neural networks are potentially useful in cases of tracking slow changes of correlations in the input data or in updating eigenvectors with new samples. In this paper we propose PCA learning algorithm that is fully homogeneous with respect to neurons. The algorithm is obtained by modification of one of the most famous PSA learning algorithms--Subspace Learning Algorithm (SLA). Modification of the algorithm is based on Time-Oriented Hierarchical Method (TOHM). The method uses two distinct time scales. On a faster time scale PSA algorithm is responsible for the "behavior" of all output neurons. On a slower scale, output neurons will compete for fulfillment of their "own interests". On this scale, basis vectors in the principal subspace are rotated toward the principal eigenvectors. At the end of the paper it will be briefly analyzed how (or why) time-oriented hierarchical method can be used for transformation of any of the existing neural network PSA method, into PCA method.

  12. Multimorbidity Patterns in Older Adults: An Approach to the Complex Interrelationships Among Chronic Diseases.

    PubMed

    Mino-León, Dolores; Reyes-Morales, Hortensia; Doubova, Svetlana V; Pérez-Cuevas, Ricardo; Giraldo-Rodríguez, Liliana; Agudelo-Botero, Marcela

    2017-01-01

    There is a growing need for evidence based answers to multimorbidity, especially in primary care settings. The aim was estimate the prevalence and patterns of multimorbidity in a Mexican population of public health institution users ≥60 years old. Observational and multicenter study was carried out in four family medicine units in Mexico City; included older men and women who attended at least one consultation with their family doctor during 2013. The most common diseases were grouped into 11 domains. The observed and expected rates, as well as the prevalence ratios, were calculated for the pairs of the more common domains. Logistic regression models were developed to estimate the magnitude of the association. Cluster and principal components analyses were performed to identify multimorbidity patterns. Half of all of the patients who were ≥60 years old and treated by a family doctor had multimorbidity. The most common disease domains were hypertensive and endocrine diseases. The highest prevalence of multimorbidity concerned the renal domain. The domain pairs with the strongest associations were endocrine + renal and hypertension + cardiac. The cluster and principal components analyses revealed five consistent patterns of multimorbidity. The domains grouped into five patterns could establish the framework for developing treatment guides, deepen the knowledge of multimorbidity, develop strategies to prevent it, decrease its burden, and align health services to the care needs that doctors face in daily practice. Copyright © 2017 IMSS. Published by Elsevier Inc. All rights reserved.

  13. Reservoir zonation based on statistical analyses: A case study of the Nubian sandstone, Gulf of Suez, Egypt

    NASA Astrophysics Data System (ADS)

    El Sharawy, Mohamed S.; Gaafar, Gamal R.

    2016-12-01

    Both reservoir engineers and petrophysicists have been concerned about dividing a reservoir into zones for engineering and petrophysics purposes. Through decades, several techniques and approaches were introduced. Out of them, statistical reservoir zonation, stratigraphic modified Lorenz (SML) plot and the principal component and clustering analyses techniques were chosen to apply on the Nubian sandstone reservoir of Palaeozoic - Lower Cretaceous age, Gulf of Suez, Egypt, by using five adjacent wells. The studied reservoir consists mainly of sandstone with some intercalation of shale layers with varying thickness from one well to another. The permeability ranged from less than 1 md to more than 1000 md. The statistical reservoir zonation technique, depending on core permeability, indicated that the cored interval of the studied reservoir can be divided into two zones. Using reservoir properties such as porosity, bulk density, acoustic impedance and interval transit time indicated also two zones with an obvious variation in separation depth and zones continuity. The stratigraphic modified Lorenz (SML) plot indicated the presence of more than 9 flow units in the cored interval as well as a high degree of microscopic heterogeneity. On the other hand, principal component and cluster analyses, depending on well logging data (gamma ray, sonic, density and neutron), indicated that the whole reservoir can be divided at least into four electrofacies having a noticeable variation in reservoir quality, as correlated with the measured permeability. Furthermore, continuity or discontinuity of the reservoir zones can be determined using this analysis.

  14. Isomap transform for segmenting human body shapes.

    PubMed

    Cerveri, P; Sarro, K J; Marchente, M; Barros, R M L

    2011-09-01

    Segmentation of the 3D human body is a very challenging problem in applications exploiting volume capture data. Direct clustering in the Euclidean space is usually complex or even unsolvable. This paper presents an original method based on the Isomap (isometric feature mapping) transform of the volume data-set. The 3D articulated posture is mapped by Isomap in the pose of Da Vinci's Vitruvian man. The limbs are unrolled from each other and separated from the trunk and pelvis, and the topology of the human body shape is recovered. In such a configuration, Hoshen-Kopelman clustering applied to concentric spherical shells is used to automatically group points into the labelled principal curves. Shepard interpolation is utilised to back-map points of the principal curves into the original volume space. The experimental results performed on many different postures have proved the validity of the proposed method. Reliability of less than 2 cm and 3° in the location of the joint centres and direction axes of rotations has been obtained, respectively, which qualifies this procedure as a potential tool for markerless motion analysis.

  15. Extracting insights from the shape of complex data using topology

    PubMed Central

    Lum, P. Y.; Singh, G.; Lehman, A.; Ishkanov, T.; Vejdemo-Johansson, M.; Alagappan, M.; Carlsson, J.; Carlsson, G.

    2013-01-01

    This paper applies topological methods to study complex high dimensional data sets by extracting shapes (patterns) and obtaining insights about them. Our method combines the best features of existing standard methodologies such as principal component and cluster analyses to provide a geometric representation of complex data sets. Through this hybrid method, we often find subgroups in data sets that traditional methodologies fail to find. Our method also permits the analysis of individual data sets as well as the analysis of relationships between related data sets. We illustrate the use of our method by applying it to three very different kinds of data, namely gene expression from breast tumors, voting data from the United States House of Representatives and player performance data from the NBA, in each case finding stratifications of the data which are more refined than those produced by standard methods. PMID:23393618

  16. Extracting insights from the shape of complex data using topology.

    PubMed

    Lum, P Y; Singh, G; Lehman, A; Ishkanov, T; Vejdemo-Johansson, M; Alagappan, M; Carlsson, J; Carlsson, G

    2013-01-01

    This paper applies topological methods to study complex high dimensional data sets by extracting shapes (patterns) and obtaining insights about them. Our method combines the best features of existing standard methodologies such as principal component and cluster analyses to provide a geometric representation of complex data sets. Through this hybrid method, we often find subgroups in data sets that traditional methodologies fail to find. Our method also permits the analysis of individual data sets as well as the analysis of relationships between related data sets. We illustrate the use of our method by applying it to three very different kinds of data, namely gene expression from breast tumors, voting data from the United States House of Representatives and player performance data from the NBA, in each case finding stratifications of the data which are more refined than those produced by standard methods.

  17. Minimum number of measurements for evaluating soursop (Annona muricata L.) yield.

    PubMed

    Sánchez, C F B; Teodoro, P E; Londoño, S; Silva, L A; Peixoto, L A; Bhering, L L

    2017-05-31

    Repeatability studies on fruit species are of great importance to identify the minimum number of measurements necessary to accurately select superior genotypes. This study aimed to identify the most efficient method to estimate the repeatability coefficient (r) and predict the minimum number of measurements needed for a more accurate evaluation of soursop (Annona muricata L.) genotypes based on fruit yield. Sixteen measurements of fruit yield from 71 soursop genotypes were carried out between 2000 and 2016. In order to estimate r with the best accuracy, four procedures were used: analysis of variance, principal component analysis based on the correlation matrix, principal component analysis based on the phenotypic variance and covariance matrix, and structural analysis based on the correlation matrix. The minimum number of measurements needed to predict the actual value of individuals was estimated. Principal component analysis using the phenotypic variance and covariance matrix provided the most accurate estimates of both r and the number of measurements required for accurate evaluation of fruit yield in soursop. Our results indicate that selection of soursop genotypes with high fruit yield can be performed based on the third and fourth measurements in the early years and/or based on the eighth and ninth measurements at more advanced stages.

  18. Surface-enhanced Raman spectroscopy introduced into the International Standard Organization (ISO) regulations as an alternative method for detection and identification of pathogens in the food industry.

    PubMed

    Witkowska, Evelin; Korsak, Dorota; Kowalska, Aneta; Księżopolska-Gocalska, Monika; Niedziółka-Jönsson, Joanna; Roźniecka, Ewa; Michałowicz, Weronika; Albrycht, Paweł; Podrażka, Marta; Hołyst, Robert; Waluk, Jacek; Kamińska, Agnieszka

    2017-02-01

    We show that surface-enhanced Raman spectroscopy (SERS) coupled with principal component analysis (PCA) can serve as a fast, reliable, and easy method for detection and identification of food-borne bacteria, namely Salmonella spp., Listeria monocytogenes, and Cronobacter spp., in different types of food matrices (salmon, eggs, powdered infant formula milk, mixed herbs, respectively). The main aim of this work was to introduce the SERS technique into three ISO (6579:2002; 11290-1:1996/A1:2004; 22964:2006) standard procedures required for detection of these bacteria in food. Our study demonstrates that the SERS technique is effective in distinguishing very closely related bacteria within a genus grown on solid and liquid media. The advantages of the proposed ISO-SERS method for bacteria identification include simplicity and reduced time of analysis, from almost 144 h required by standard methods to 48 h for the SERS-based approach. Additionally, PCA allows one to perform statistical classification of studied bacteria and to identify the spectrum of an unknown sample. Calculated first and second principal components (PC-1, PC-2) account for 96, 98, and 90% of total variance in the spectra and enable one to identify the Salmonella spp., L. monocytogenes, and Cronobacter spp., respectively. Moreover, the presented study demonstrates the excellent possibility for simultaneous detection of analyzed food-borne bacteria in one sample test (98% of PC-1 and PC-2) with a goal of splitting the data set into three separated clusters corresponding to the three studied bacteria species. The studies described in this paper suggest that SERS represents an alternative to standard microorganism diagnostic procedures. Graphical Abstract New approach of the SERS strategy for detection and identification of food-borne bacteria, namely S. enterica, L. monocytogenes, and C. sakazakii in selected food matrices.

  19. The Technical and Biological Reproducibility of Matrix-Assisted Laser Desorption Ionization-Time of Flight Mass Spectrometry (MALDI-TOF MS) Based Typing: Employment of Bioinformatics in a Multicenter Study.

    PubMed

    Oberle, Michael; Wohlwend, Nadia; Jonas, Daniel; Maurer, Florian P; Jost, Geraldine; Tschudin-Sutter, Sarah; Vranckx, Katleen; Egli, Adrian

    2016-01-01

    The technical, biological, and inter-center reproducibility of matrix-assisted laser desorption ionization-time of flight mass spectrometry (MALDI TOF MS) typing data has not yet been explored. The aim of this study is to compare typing data from multiple centers employing bioinformatics using bacterial strains from two past outbreaks and non-related strains. Participants received twelve extended spectrum betalactamase-producing E. coli isolates and followed the same standard operating procedure (SOP) including a full-protein extraction protocol. All laboratories provided visually read spectra via flexAnalysis (Bruker, Germany). Raw data from each laboratory allowed calculating the technical and biological reproducibility between centers using BioNumerics (Applied Maths NV, Belgium). Technical and biological reproducibility ranged between 96.8-99.4% and 47.6-94.4%, respectively. The inter-center reproducibility showed a comparable clustering among identical isolates. Principal component analysis indicated a higher tendency to cluster within the same center. Therefore, we used a discriminant analysis, which completely separated the clusters. Next, we defined a reference center and performed a statistical analysis to identify specific peaks to identify the outbreak clusters. Finally, we used a classifier algorithm and a linear support vector machine on the determined peaks as classifier. A validation showed that within the set of the reference center, the identification of the cluster was 100% correct with a large contrast between the score with the correct cluster and the next best scoring cluster. Based on the sufficient technical and biological reproducibility of MALDI-TOF MS based spectra, detection of specific clusters is possible from spectra obtained from different centers. However, we believe that a shared SOP and a bioinformatics approach are required to make the analysis robust and reliable.

  20. Characterizing Heterogeneity within Head and Neck Lesions Using Cluster Analysis of Multi-Parametric MRI Data.

    PubMed

    Borri, Marco; Schmidt, Maria A; Powell, Ceri; Koh, Dow-Mu; Riddell, Angela M; Partridge, Mike; Bhide, Shreerang A; Nutting, Christopher M; Harrington, Kevin J; Newbold, Katie L; Leach, Martin O

    2015-01-01

    To describe a methodology, based on cluster analysis, to partition multi-parametric functional imaging data into groups (or clusters) of similar functional characteristics, with the aim of characterizing functional heterogeneity within head and neck tumour volumes. To evaluate the performance of the proposed approach on a set of longitudinal MRI data, analysing the evolution of the obtained sub-sets with treatment. The cluster analysis workflow was applied to a combination of dynamic contrast-enhanced and diffusion-weighted imaging MRI data from a cohort of squamous cell carcinoma of the head and neck patients. Cumulative distributions of voxels, containing pre and post-treatment data and including both primary tumours and lymph nodes, were partitioned into k clusters (k = 2, 3 or 4). Principal component analysis and cluster validation were employed to investigate data composition and to independently determine the optimal number of clusters. The evolution of the resulting sub-regions with induction chemotherapy treatment was assessed relative to the number of clusters. The clustering algorithm was able to separate clusters which significantly reduced in voxel number following induction chemotherapy from clusters with a non-significant reduction. Partitioning with the optimal number of clusters (k = 4), determined with cluster validation, produced the best separation between reducing and non-reducing clusters. The proposed methodology was able to identify tumour sub-regions with distinct functional properties, independently separating clusters which were affected differently by treatment. This work demonstrates that unsupervised cluster analysis, with no prior knowledge of the data, can be employed to provide a multi-parametric characterization of functional heterogeneity within tumour volumes.

  1. Shape characteristics of equilibrium and non-equilibrium fractal clusters.

    PubMed

    Mansfield, Marc L; Douglas, Jack F

    2013-07-28

    It is often difficult in practice to discriminate between equilibrium and non-equilibrium nanoparticle or colloidal-particle clusters that form through aggregation in gas or solution phases. Scattering studies often permit the determination of an apparent fractal dimension, but both equilibrium and non-equilibrium clusters in three dimensions frequently have fractal dimensions near 2, so that it is often not possible to discriminate on the basis of this geometrical property. A survey of the anisotropy of a wide variety of polymeric structures (linear and ring random and self-avoiding random walks, percolation clusters, lattice animals, diffusion-limited aggregates, and Eden clusters) based on the principal components of both the radius of gyration and electric polarizability tensor indicates, perhaps counter-intuitively, that self-similar equilibrium clusters tend to be intrinsically anisotropic at all sizes, while non-equilibrium processes such as diffusion-limited aggregation or Eden growth tend to be isotropic in the large-mass limit, providing a potential means of discriminating these clusters experimentally if anisotropy could be determined along with the fractal dimension. Equilibrium polymer structures, such as flexible polymer chains, are normally self-similar due to the existence of only a single relevant length scale, and are thus anisotropic at all length scales, while non-equilibrium polymer structures that grow irreversibly in time eventually become isotropic if there is no difference in the average growth rates in different directions. There is apparently no proof of these general trends and little theoretical insight into what controls the universal anisotropy in equilibrium polymer structures of various kinds. This is an obvious topic of theoretical investigation, as well as a matter of practical interest. To address this general problem, we consider two experimentally accessible ratios, one between the hydrodynamic and gyration radii, the other between the viscosity and hydrodynamic radii, as potential measures of shape anisotropy. We also find a strong correlation between anisotropy and effective fractal dimension. These observations should provide new practical methods for quantifying the nature of particle clustering in diverse contexts.

  2. Determination of the Characteristics and Classification of Near-Infrared Spectra of Patchouli Oil (Pogostemon Cablin Benth.) from Different Origin

    NASA Astrophysics Data System (ADS)

    Diego, M. C. R.; Purwanto, Y. A.; Sutrisno; Budiastra, I. W.

    2018-05-01

    Research related to the non-destructive method of near-infrared (NIR) spectroscopy in aromatic oil is still in development in Indonesia. The objectives of the study were to determine the characteristics of the near-infrared spectra of patchouli oil and classify it based on its origin. The samples were selected from seven different places in Indonesia (Bogor and Garut from West Java, Aceh, and Jambi from Sumatra and Konawe, Masamba and Kolaka from Sulawesi Island). The spectral data of patchouli oil was obtained by FT-NIR spectrometer at the wavelength of 1000-2500 nm, and after that, the samples were subjected to composition analysis using Gas Chromatography-Mass Spectrometry. The transmittance and absorbance spectra were analyzed and then principal component analysis (PCA) was carried out. Discriminant analysis (DA) of the principal component was developed to classify patchouli oil based on its origin. The result shows that the data of both spectra (transmittance and absorbance spectra) by the PC analysis give a similar result for discriminating the seven types of patchouli oil due to their distribution and behavior. The DA of the three principal component in both data processed spectra could classify patchouli oil accurately. This result exposed that NIR spectroscopy can be successfully used as a correct method to classify patchouli oil based on its origin.

  3. Non-negative Matrix Factorization and Co-clustering: A Promising Tool for Multi-tasks Bearing Fault Diagnosis

    NASA Astrophysics Data System (ADS)

    Shen, Fei; Chen, Chao; Yan, Ruqiang

    2017-05-01

    Classical bearing fault diagnosis methods, being designed according to one specific task, always pay attention to the effectiveness of extracted features and the final diagnostic performance. However, most of these approaches suffer from inefficiency when multiple tasks exist, especially in a real-time diagnostic scenario. A fault diagnosis method based on Non-negative Matrix Factorization (NMF) and Co-clustering strategy is proposed to overcome this limitation. Firstly, some high-dimensional matrixes are constructed using the Short-Time Fourier Transform (STFT) features, where the dimension of each matrix equals to the number of target tasks. Then, the NMF algorithm is carried out to obtain different components in each dimension direction through optimized matching, such as Euclidean distance and divergence distance. Finally, a Co-clustering technique based on information entropy is utilized to realize classification of each component. To verity the effectiveness of the proposed approach, a series of bearing data sets were analysed in this research. The tests indicated that although the diagnostic performance of single task is comparable to traditional clustering methods such as K-mean algorithm and Guassian Mixture Model, the accuracy and computational efficiency in multi-tasks fault diagnosis are improved.

  4. USING GRADIENTS IN LANDSCAPECHARACTER TO IDENTIFY RESPONSES TO NUTRIENTS AND OTHER STRESSORS IN GREAT LAKES COASTAL ECOSYSTEMS

    EPA Science Inventory

    Using GIS and coarse-scale, publicly-available data, the GLEI project has defined the landscape character of areas draining to 76d2 shoreline segments - the entire US portion of the Great Lakes basin. Using principal components and clustering analyses to discriminate among the se...

  5. The Cohort Model: Lessons Learned When Principals Collaborate

    ERIC Educational Resources Information Center

    Umekubo, Lisa A.; Chrispeels, Janet H.; Daly, Alan J.

    2015-01-01

    This study explored a formal structure, the cohort model that a decentralized district put in place over a decade ago. Schools were clustered into cohorts to facilitate professional development for leadership teams for all 44 schools within the district. Drawing upon Senge's components of organizational learning, we used a single case study design…

  6. Multivariate Analysis of Remains of Molluscan Foods Consumed by Latest Pleistocene and Holocene Humans in Nerja Cave, Málaga, Spain

    NASA Astrophysics Data System (ADS)

    Serrano, Francisco; Guerra-Merchán, Antonio; Lozano-Francisco, Carmen; Vera-Peláez, José Luis

    1997-09-01

    Nerja Cave is a karstic cavity used by humans from Late Paleolithic to post-Chalcolithic times. Remains of molluscan foods in the uppermost Pleistocene and Holocene sediments were studied with cluster analysis and principal components analysis, in both Qand Rmodes. The results from cluster analysis distinguished interval groups mainly in accordance with chronology and distinguished assemblages of species mainly according to habitat. Significant changes in the shellfish diet through time were revealed. In the Late Magdalenian, most molluscs consumed consisted of pulmonate gastropods and species from sandy sea bottoms. The Epipaleolithic diet was more varied and included species from rocky shorelines. From the Neolithic onward most molluscs consumed were from rocky shorelines. From the principal components analysis in Qmode, the first factor reflected mainly changes in the predominant capture environment, probably because of major paleogeographic changes. The second factor may reflect selective capture along rocky coastlines during certain times. The third factor correlated well with the sea-surface temperature curve in the western Mediterranean (Alboran Sea) during the late Quaternary.

  7. Determination of Key Flavor Components in Methylene Chloride Extracts from Processed Grapefruit Juice.

    PubMed

    Jella; Rouseff; Goodner; Widmer

    1998-01-19

    The relative correlation of 52 aroma and 5 taste components in commercial not-from-concentrate grapefruit juices with flavor panel preference was determined. Methylene chloride extracts of juice were analyzed using GC/MS with a DB-5 column. Nonvolatiles determined included limonin and naringin by HPLC, degrees Brix, total acids, and degrees Brix/acid ratio. Juice samples were classified into low, medium, or high categories, based on average taste panel preference scores (nine-point hedonic scale). Principal component analysis demonstrated that highest quality juices were tightly clustered. Discriminant analysis indicated that 82% of the samples could be identified in the correct preference category using only myrcene, beta-caryophyllene, linalool, nootkatone, and degrees Brix. Nootkatone alone was not strongly associated with preference scores. The most preferred juices were strongly associated with low myrcene, low linalool, and intermediate levels of beta-caryophyllene.

  8. Discrimination of healthy and osteoarthritic articular cartilage by Fourier transform infrared imaging and Fisher’s discriminant analysis

    PubMed Central

    Mao, Zhi-Hua; Yin, Jian-Hua; Zhang, Xue-Xi; Wang, Xiao; Xia, Yang

    2016-01-01

    Fourier transform infrared spectroscopic imaging (FTIRI) technique can be used to obtain the quantitative information of content and spatial distribution of principal components in cartilage by combining with chemometrics methods. In this study, FTIRI combining with principal component analysis (PCA) and Fisher’s discriminant analysis (FDA) was applied to identify the healthy and osteoarthritic (OA) articular cartilage samples. Ten 10-μm thick sections of canine cartilages were imaged at 6.25μm/pixel in FTIRI. The infrared spectra extracted from the FTIR images were imported into SPSS software for PCA and FDA. Based on the PCA result of 2 principal components, the healthy and OA cartilage samples were effectively discriminated by the FDA with high accuracy of 94% for the initial samples (training set) and cross validation, as well as 86.67% for the prediction group. The study showed that cartilage degeneration became gradually weak with the increase of the depth. FTIRI combined with chemometrics may become an effective method for distinguishing healthy and OA cartilages in future. PMID:26977354

  9. Chemical Composition and Crystal Morphology of Epicuticular Wax in Mature Fruits of 35 Pear (Pyrus spp.) Cultivars

    PubMed Central

    Wu, Xiao; Yin, Hao; Shi, Zebin; Chen, Yangyang; Qi, Kaijie; Qiao, Xin; Wang, Guoming; Cao, Peng; Zhang, Shaoling

    2018-01-01

    An evaluation of fruit wax components will provide us with valuable information for pear breeding and enhancing fruit quality. Here, we dissected the epicuticular wax concentration, composition and structure of mature fruits from 35 pear cultivars belonging to five different species and hybrid interspecies. A total of 146 epicuticular wax compounds were detected, and the wax composition and concentration varied dramatically among species, with the highest level of 1.53 mg/cm2 in Pyrus communis and the lowest level of 0.62 mg/cm2 in Pyrus pyrifolia. Field emission scanning electron microscopy (FESEM) analysis showed amorphous structures of the epicuticular wax crystals of different pear cultivars. Cluster analysis revealed that the Pyrus bretschneideri cultivars were grouped much closer to Pyrus pyrifolia and Pyrus ussuriensis, and the Pyrus sinkiangensis cultivars were clustered into a distant group. Based on the principal component analysis (PCA), the cultivars could be divided into three groups and five groups according to seven main classes of epicuticular wax compounds and 146 wax compounds, respectively. PMID:29875784

  10. A Smartphone Indoor Localization Algorithm Based on WLAN Location Fingerprinting with Feature Extraction and Clustering.

    PubMed

    Luo, Junhai; Fu, Liang

    2017-06-09

    With the development of communication technology, the demand for location-based services is growing rapidly. This paper presents an algorithm for indoor localization based on Received Signal Strength (RSS), which is collected from Access Points (APs). The proposed localization algorithm contains the offline information acquisition phase and online positioning phase. Firstly, the AP selection algorithm is reviewed and improved based on the stability of signals to remove useless AP; secondly, Kernel Principal Component Analysis (KPCA) is analyzed and used to remove the data redundancy and maintain useful characteristics for nonlinear feature extraction; thirdly, the Affinity Propagation Clustering (APC) algorithm utilizes RSS values to classify data samples and narrow the positioning range. In the online positioning phase, the classified data will be matched with the testing data to determine the position area, and the Maximum Likelihood (ML) estimate will be employed for precise positioning. Eventually, the proposed algorithm is implemented in a real-world environment for performance evaluation. Experimental results demonstrate that the proposed algorithm improves the accuracy and computational complexity.

  11. Liver DCE-MRI Registration in Manifold Space Based on Robust Principal Component Analysis.

    PubMed

    Feng, Qianjin; Zhou, Yujia; Li, Xueli; Mei, Yingjie; Lu, Zhentai; Zhang, Yu; Feng, Yanqiu; Liu, Yaqin; Yang, Wei; Chen, Wufan

    2016-09-29

    A technical challenge in the registration of dynamic contrast-enhanced magnetic resonance (DCE-MR) imaging in the liver is intensity variations caused by contrast agents. Such variations lead to the failure of the traditional intensity-based registration method. To address this problem, a manifold-based registration framework for liver DCE-MR time series is proposed. We assume that liver DCE-MR time series are located on a low-dimensional manifold and determine intrinsic similarities between frames. Based on the obtained manifold, the large deformation of two dissimilar images can be decomposed into a series of small deformations between adjacent images on the manifold through gradual deformation of each frame to the template image along the geodesic path. Furthermore, manifold construction is important in automating the selection of the template image, which is an approximation of the geodesic mean. Robust principal component analysis is performed to separate motion components from intensity changes induced by contrast agents; the components caused by motion are used to guide registration in eliminating the effect of contrast enhancement. Visual inspection and quantitative assessment are further performed on clinical dataset registration. Experiments show that the proposed method effectively reduces movements while preserving the topology of contrast-enhancing structures and provides improved registration performance.

  12. Description and typology of intensive Chios dairy sheep farms in Greece.

    PubMed

    Gelasakis, A I; Valergakis, G E; Arsenos, G; Banos, G

    2012-06-01

    The aim was to assess the intensified dairy sheep farming systems of the Chios breed in Greece, establishing a typology that may properly describe and characterize them. The study included the total of the 66 farms of the Chios sheep breeders' cooperative Macedonia. Data were collected using a structured direct questionnaire for in-depth interviews, including questions properly selected to obtain a general description of farm characteristics and overall management practices. A multivariate statistical analysis was used on the data to obtain the most appropriate typology. Initially, principal component analysis was used to produce uncorrelated variables (principal components), which would be used for the consecutive cluster analysis. The number of clusters was decided using hierarchical cluster analysis, whereas, the farms were allocated in 4 clusters using k-means cluster analysis. The identified clusters were described and afterward compared using one-way ANOVA or a chi-squared test. The main differences were evident on land availability and use, facility and equipment availability and type, expansion rates, and application of preventive flock health programs. In general, cluster 1 included newly established, intensive, well-equipped, specialized farms and cluster 2 included well-established farms with balanced sheep and feed/crop production. In cluster 3 were assigned small flock farms focusing more on arable crops than on sheep farming with a tendency to evolve toward cluster 2, whereas cluster 4 included farms representing a rather conservative form of Chios sheep breeding with low/intermediate inputs and choosing not to focus on feed/crop production. In the studied set of farms, 4 different farmer attitudes were evident: 1) farming disrupts sheep breeding; feed should be purchased and economies of scale will decrease costs (mainly cluster 1), 2) only exercise/pasture land is necessary; at least part of the feed (pasture) must be home-grown to decrease costs (clusters 1 and 4), 3) providing pasture to sheep is essential; on-farm feed production decreases costs (mainly cluster 3), and 4) large-scale farming (feed production and cash crops) does not disrupt sheep breeding; all feed must be produced on-farm to decrease costs (mainly cluster 3). Conducting a profitability analysis among different clusters, exploring and discovering the most beneficial levels of intensified management and capital investment should now be considered. Copyright © 2012 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.

  13. Factor structure and individual patterns of DSM-IV conduct disorder criteria in adolescent psychiatric inpatients.

    PubMed

    Janson, Harald; Kjelsberg, Ellen

    2006-01-01

    We investigated the factor structure of the DSM-IV conduct disorder (CD) diagnostic criteria and typical individual patterns of CD subscales in an adolescent inpatient population using detailed hospital records of a Norwegian nationwide sample of 1087 adolescent psychiatric inpatients scored for the 15 DSM-IV CD criteria. Varimax rotated principal components and full-information factor analyses of 12 CD criteria were carried out separately for boys and girls employing two methods. Standardized values on three subscales of CD criteria were subjected to Ward's method of hierarchical cluster analyses followed by k-means relocation employing a double cross-replication design. Similar factor structures emerged regardless of factoring method and gender. With the exception of Criteria 8 ("Fire setting") and 14 ("Run away from home") the factor loadings for both genders were in accordance with Loeber's tripartite model, with Aggression, Delinquency, and Rule Breaking factors largely corresponding to Loeber's overt, covert and authority conflict pathways. A five-cluster solution proved highly replicable and interpretable. One cluster gathered adolescents without CD, and the remaining four described groups with different conceptually meaningful constellations of CD criteria, which were not equally prevalent in each gender. Delinquency appeared in all symptomatic clusters. The cluster analytic results highlighted typical forms of expressions of conduct problems, and the fact that these forms may not be equally prevalent in girls and boys even while the underlying structure of conduct problems may be similar across genders. Future research should address the prediction of specific outcomes from CD criteria subscales or constellations.

  14. Dataset of Fourier transform-infrared coupled with chemometric analysis used to distinguish accessions of Garcinia mangostana L. in Peninsular Malaysia.

    PubMed

    Samsir, Sri A'jilah; Bunawan, Hamidun; Yen, Choong Chee; Noor, Normah Mohd

    2016-09-01

    In this dataset, we distinguish 15 accessions of Garcinia mangostana from Peninsular Malaysia using Fourier transform-infrared spectroscopy coupled with chemometric analysis. We found that the position and intensity of characteristic peaks at 3600-3100 cm(-) (1) in IR spectra allowed discrimination of G. mangostana from different locations. Further principal component analysis (PCA) of all the accessions suggests the two main clusters were formed: samples from Johor, Melaka, and Negeri Sembilan (South) were clustered together in one group while samples from Perak, Kedah, Penang, Selangor, Kelantan, and Terengganu (North and East Coast) were in another clustered group.

  15. A hybrid symplectic principal component analysis and central tendency measure method for detection of determinism in noisy time series with application to mechanomyography

    NASA Astrophysics Data System (ADS)

    Xie, Hong-Bo; Dokos, Socrates

    2013-06-01

    We present a hybrid symplectic geometry and central tendency measure (CTM) method for detection of determinism in noisy time series. CTM is effective for detecting determinism in short time series and has been applied in many areas of nonlinear analysis. However, its performance significantly degrades in the presence of strong noise. In order to circumvent this difficulty, we propose to use symplectic principal component analysis (SPCA), a new chaotic signal de-noising method, as the first step to recover the system dynamics. CTM is then applied to determine whether the time series arises from a stochastic process or has a deterministic component. Results from numerical experiments, ranging from six benchmark deterministic models to 1/f noise, suggest that the hybrid method can significantly improve detection of determinism in noisy time series by about 20 dB when the data are contaminated by Gaussian noise. Furthermore, we apply our algorithm to study the mechanomyographic (MMG) signals arising from contraction of human skeletal muscle. Results obtained from the hybrid symplectic principal component analysis and central tendency measure demonstrate that the skeletal muscle motor unit dynamics can indeed be deterministic, in agreement with previous studies. However, the conventional CTM method was not able to definitely detect the underlying deterministic dynamics. This result on MMG signal analysis is helpful in understanding neuromuscular control mechanisms and developing MMG-based engineering control applications.

  16. A hybrid symplectic principal component analysis and central tendency measure method for detection of determinism in noisy time series with application to mechanomyography.

    PubMed

    Xie, Hong-Bo; Dokos, Socrates

    2013-06-01

    We present a hybrid symplectic geometry and central tendency measure (CTM) method for detection of determinism in noisy time series. CTM is effective for detecting determinism in short time series and has been applied in many areas of nonlinear analysis. However, its performance significantly degrades in the presence of strong noise. In order to circumvent this difficulty, we propose to use symplectic principal component analysis (SPCA), a new chaotic signal de-noising method, as the first step to recover the system dynamics. CTM is then applied to determine whether the time series arises from a stochastic process or has a deterministic component. Results from numerical experiments, ranging from six benchmark deterministic models to 1/f noise, suggest that the hybrid method can significantly improve detection of determinism in noisy time series by about 20 dB when the data are contaminated by Gaussian noise. Furthermore, we apply our algorithm to study the mechanomyographic (MMG) signals arising from contraction of human skeletal muscle. Results obtained from the hybrid symplectic principal component analysis and central tendency measure demonstrate that the skeletal muscle motor unit dynamics can indeed be deterministic, in agreement with previous studies. However, the conventional CTM method was not able to definitely detect the underlying deterministic dynamics. This result on MMG signal analysis is helpful in understanding neuromuscular control mechanisms and developing MMG-based engineering control applications.

  17. Quality assessment of raw and processed Arctium lappa L. through multicomponent quantification, chromatographic fingerprint, and related chemometric analysis.

    PubMed

    Qin, Kunming; Wang, Bin; Li, Weidong; Cai, Hao; Chen, Danni; Liu, Xiao; Yin, Fangzhou; Cai, Baochang

    2015-05-01

    In traditional Chinese medicine, raw and processed herbs are used to treat different diseases. Suitable quality assessment methods are crucial for the discrimination between raw and processed herbs. The dried fruit of Arctium lappa L. and their processed products are widely used in traditional Chinese medicine, yet their therapeutic effects are different. In this study, a novel strategy using high-performance liquid chromatography and diode array detection coupled with multivariate statistical analysis to rapidly explore raw and processed Arctium lappa L. was proposed and validated. Four main components in a total of 30 batches of raw and processed Fructus Arctii samples were analyzed, and ten characteristic peaks were identified in the fingerprint common pattern. Furthermore, similarity evaluation, principal component analysis, and hierachical cluster analysis were applied to demonstrate the distinction. The results suggested that the relative amounts of the chemical components of raw and processed Fructus Arctii samples are different. This new method has been successfully applied to detect the raw and processed Fructus Arctii in marketed herbal medicinal products. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  18. [Research on discrimination of cabbage and weeds based on visible and near-infrared spectrum analysis].

    PubMed

    Zu, Qin; Zhao, Chun-Jiang; Deng, Wei; Wang, Xiu

    2013-05-01

    The automatic identification of weeds forms the basis for precision spraying of crops infest. The canopy spectral reflectance within the 350-2 500 nm band of two strains of cabbages and five kinds of weeds such as barnyard grass, setaria, crabgrass, goosegrass and pigweed was acquired by ASD spectrometer. According to the spectral curve characteristics, the data in different bands were compressed with different levels to improve the operation efficiency. Firstly, the spectrum was denoised in accordance with the different order of multiple scattering correction (MSC) method and Savitzky-Golay (SG) convolution smoothing method set by different parameters, then the model was built by combining the principal component analysis (PCA) method to extract principal components, finally all kinds of plants were classified by using the soft independent modeling of class analogy (SIMCA) taxonomy and the classification results were compared. The tests results indicate that after the pretreatment of the spectral data with the method of the combination of MSC and SG set with 3rd order, 5th degree polynomial, 21 smoothing points, and the top 10 principal components extraction using PCA as a classification model input variable, 100% correct classification rate was achieved, and it is able to identify cabbage and several kinds of common weeds quickly and nondestructively.

  19. Penalized unsupervised learning with outliers

    PubMed Central

    Witten, Daniela M.

    2013-01-01

    We consider the problem of performing unsupervised learning in the presence of outliers – that is, observations that do not come from the same distribution as the rest of the data. It is known that in this setting, standard approaches for unsupervised learning can yield unsatisfactory results. For instance, in the presence of severe outliers, K-means clustering will often assign each outlier to its own cluster, or alternatively may yield distorted clusters in order to accommodate the outliers. In this paper, we take a new approach to extending existing unsupervised learning techniques to accommodate outliers. Our approach is an extension of a recent proposal for outlier detection in the regression setting. We allow each observation to take on an “error” term, and we penalize the errors using a group lasso penalty in order to encourage most of the observations’ errors to exactly equal zero. We show that this approach can be used in order to develop extensions of K-means clustering and principal components analysis that result in accurate outlier detection, as well as improved performance in the presence of outliers. These methods are illustrated in a simulation study and on two gene expression data sets, and connections with M-estimation are explored. PMID:23875057

  20. Identifying natural and anthropogenic sources of metals in urban and rural soils using GIS-based data, PCA, and spatial interpolation

    PubMed Central

    Davis, Harley T.; Aelion, C. Marjorie; McDermott, Suzanne; Lawson, Andrew B.

    2009-01-01

    Determining sources of neurotoxic metals in rural and urban soils is important for mitigating human exposure. Surface soil from four areas with significant clusters of mental retardation and developmental delay (MR/DD) in children, and one control site were analyzed for nine metals and characterized by soil type, climate, ecological region, land use and industrial facilities using readily-available GIS-based data. Kriging, principal component analysis (PCA) and cluster analysis (CA) were used to identify commonalities of metal distribution. Three MR/DD areas (one rural and two urban) had similar soil types and significantly higher soil metal concentrations. PCA and CA results suggested that Ba, Be and Mn were consistently from natural sources; Pb and Hg from anthropogenic sources; and As, Cr, Cu, and Ni from both sources. Arsenic had low commonality estimates, was highly associated with a third PCA factor, and had a complex distribution, complicating mitigation strategies to minimize concentrations and exposures. PMID:19361902

  1. Discrimination of three Pegaga (Centella) varieties and determination of growth-lighting effects on metabolites content based on the chemometry of 1H nuclear magnetic resonance spectroscopy.

    PubMed

    H, Maulidiani; Khatib, Alfi; Shaari, Khozirah; Abas, Faridah; Shitan, Mahendran; Kneer, Ralf; Neto, Victor; Lajis, Nordin H

    2012-01-11

    The metabolites of three species of Apiaceae, also known as Pegaga, were analyzed utilizing (1)H NMR spectroscopy and multivariate data analysis. Principal component analysis (PCA) and hierarchical cluster analysis (HCA) resolved the species, Centella asiatica, Hydrocotyle bonariensis, and Hydrocotyle sibthorpioides, into three clusters. The saponins, asiaticoside and madecassoside, along with chlorogenic acids were the metabolites that contributed most to the separation. Furthermore, the effects of growth-lighting condition to metabolite contents were also investigated. The extracts of C. asiatica grown in full-day light exposure exhibited a stronger radical scavenging activity and contained more triterpenes (asiaticoside and madecassoside), flavonoids, and chlorogenic acids as compared to plants grown in 50% shade. This study established the potential of using a combination of (1)H NMR spectroscopy and multivariate data analyses in differentiating three closely related species and the effects of growth lighting, based on their metabolite contents and identification of the markers contributing to their differences.

  2. Selecting climate simulations for impact studies based on multivariate patterns of climate change.

    PubMed

    Mendlik, Thomas; Gobiet, Andreas

    In climate change impact research it is crucial to carefully select the meteorological input for impact models. We present a method for model selection that enables the user to shrink the ensemble to a few representative members, conserving the model spread and accounting for model similarity. This is done in three steps: First, using principal component analysis for a multitude of meteorological parameters, to find common patterns of climate change within the multi-model ensemble. Second, detecting model similarities with regard to these multivariate patterns using cluster analysis. And third, sampling models from each cluster, to generate a subset of representative simulations. We present an application based on the ENSEMBLES regional multi-model ensemble with the aim to provide input for a variety of climate impact studies. We find that the two most dominant patterns of climate change relate to temperature and humidity patterns. The ensemble can be reduced from 25 to 5 simulations while still maintaining its essential characteristics. Having such a representative subset of simulations reduces computational costs for climate impact modeling and enhances the quality of the ensemble at the same time, as it prevents double-counting of dependent simulations that would lead to biased statistics. The online version of this article (doi:10.1007/s10584-015-1582-0) contains supplementary material, which is available to authorized users.

  3. High Performance Parallel Architectures

    NASA Technical Reports Server (NTRS)

    El-Ghazawi, Tarek; Kaewpijit, Sinthop

    1998-01-01

    Traditional remote sensing instruments are multispectral, where observations are collected at a few different spectral bands. Recently, many hyperspectral instruments, that can collect observations at hundreds of bands, have been operational. Furthermore, there have been ongoing research efforts on ultraspectral instruments that can produce observations at thousands of spectral bands. While these remote sensing technology developments hold great promise for new findings in the area of Earth and space science, they present many challenges. These include the need for faster processing of such increased data volumes, and methods for data reduction. Dimension Reduction is a spectral transformation, aimed at concentrating the vital information and discarding redundant data. One such transformation, which is widely used in remote sensing, is the Principal Components Analysis (PCA). This report summarizes our progress on the development of a parallel PCA and its implementation on two Beowulf cluster configuration; one with fast Ethernet switch and the other with a Myrinet interconnection. Details of the implementation and performance results, for typical sets of multispectral and hyperspectral NASA remote sensing data, are presented and analyzed based on the algorithm requirements and the underlying machine configuration. It will be shown that the PCA application is quite challenging and hard to scale on Ethernet-based clusters. However, the measurements also show that a high- performance interconnection network, such as Myrinet, better matches the high communication demand of PCA and can lead to a more efficient PCA execution.

  4. A novel combined approach of diffuse reflectance UV-Vis-NIR spectroscopy and multivariate analysis for non-destructive examination of blue ballpoint pen inks in forensic application.

    PubMed

    Kumar, Raj; Sharma, Vishal

    2017-03-15

    The present research is focused on the analysis of writing inks using destructive UV-Vis spectroscopy (dissolution of ink by the solvent) and non-destructive diffuse reflectance UV-Vis-NIR spectroscopy along with Chemometrics. Fifty seven samples of blue ballpoint pen inks were analyzed under optimum conditions to determine the differences in spectral features of inks among same and different manufacturers. Normalization was performed on the spectroscopic data before chemometric analysis. Principal Component Analysis (PCA) and K-mean cluster analysis were used on the data to ascertain whether the blue ballpoint pen inks could be differentiated by their UV-Vis/UV-Vis NIR spectra. The discriminating power is calculated by qualitative analysis by the visual comparison of the spectra (absorbance peaks), produced by the destructive and non-destructive methods. In the latter two methods, the pairwise comparison is made by incorporating the clustering method. It is found that chemometric method provides better discriminating power (98.72% and 99.46%, in destructive and non-destructive, respectively) in comparison to the qualitative analysis (69.67%). Copyright © 2016 Elsevier B.V. All rights reserved.

  5. Comparison of dimensionality reduction methods to predict genomic breeding values for carcass traits in pigs.

    PubMed

    Azevedo, C F; Nascimento, M; Silva, F F; Resende, M D V; Lopes, P S; Guimarães, S E F; Glória, L S

    2015-10-09

    A significant contribution of molecular genetics is the direct use of DNA information to identify genetically superior individuals. With this approach, genome-wide selection (GWS) can be used for this purpose. GWS consists of analyzing a large number of single nucleotide polymorphism markers widely distributed in the genome; however, because the number of markers is much larger than the number of genotyped individuals, and such markers are highly correlated, special statistical methods are widely required. Among these methods, independent component regression, principal component regression, partial least squares, and partial principal components stand out. Thus, the aim of this study was to propose an application of the methods of dimensionality reduction to GWS of carcass traits in an F2 (Piau x commercial line) pig population. The results show similarities between the principal and the independent component methods and provided the most accurate genomic breeding estimates for most carcass traits in pigs.

  6. Symptom clustering and quality of life in patients with ovarian cancer undergoing chemotherapy.

    PubMed

    Nho, Ju-Hee; Reul Kim, Sung; Nam, Joo-Hyun

    2017-10-01

    The symptom clusters in patients with ovarian cancer undergoing chemotherapy have not been well evaluated. We investigated the symptom clusters and effects of symptom clusters on the quality of life of patients with ovarian cancer. We recruited 210 ovarian cancer patients being treated with chemotherapy and used a descriptive cross-sectional study design to collect information on their symptoms. To determine inter-relationships among symptoms, a principal component analysis with varimax rotation was performed based on the patient's symptoms (fatigue, pain, sleep disturbance, chemotherapy-induced peripheral neuropathy, anxiety, depression, and sexual dysfunction). All patients had experienced at least two domains of concurrent symptoms, and there were two types of symptom clusters. The first symptom cluster consisted of anxiety, depression, fatigue, and sleep disturbance symptoms, while the second symptom cluster consisted of pain and chemotherapy-induced peripheral neuropathy symptoms. Our subgroup cluster analysis showed that ovarian cancer patients with higher-scoring symptoms had significantly poorer quality of life in both symptom cluster 1 and 2 subgroups, with subgroup-specific patterns. The symptom clusters were different depending on age, age at disease onset, disease duration, recurrence, and performance status of patients with ovarian cancer. In addition, ovarian cancer patients experienced different symptom clusters according to cancer stage. The current study demonstrated that there is a specific pattern of symptom clusters, and symptom clusters negatively influence the quality of life in patients with ovarian cancer. Identifying symptom clusters of ovarian cancer patients may have clinical implications in improving symptom management. Copyright © 2017 Elsevier Ltd. All rights reserved.

  7. Characterisation of nucleosides and nucleobases in Mactra veneriformis by high performance liquid chromatography coupled with diode array detector-mass spectrometry (HPLC-DAD-MS).

    PubMed

    Liu, Rui; Ji, Jing; Wang, Lingchong; Chen, Shiyong; Guo, Sheng; Wu, Hao

    2012-11-15

    Mactra veneriformis has been used as sea food and traditional Chinese medicine (TCM) for thousands of years in China. In the present study, a high performance liquid chromatograph coupled with photodiode array detector and electrospray ionisation-mass spectrometer (HPLC-DAD-ESI-MS) method was established for detection of the nucleosides and nucleobases in M. veneriformis from four aquaticultural area of Jiangsu during different harvest time of one year. The validated method was successfully applied to identifying 10 nucleosides and nucleobases in 48 M. veneriformis samples. Quantitative analysis showed that nucleosides and nucleobases are rich in all M. veneriformis samples. However, their contents vary in different areas and harvest times. Principal component analysis (PCA) was used to classify the 48 samples based on the contents of the nucleosides and nucleobases. As a result, the samples could be mainly clustered into four groups, which was similar as aquaticultural areas classification. Based on the results, present method might be applicable for the quality control of M. veneriformis, or even other marine shellfish aquiculture and their products, and the quality of M. veneriformis might be more related with aquaticultural areas. Copyright © 2012 Elsevier Ltd. All rights reserved.

  8. Wing morphometrics as a possible tool for the diagnosis of the Ceratitis fasciventris, C. anonae, C. rosa complex (Diptera, Tephritidae).

    PubMed

    Van Cann, Joannes; Virgilio, Massimiliano; Jordaens, Kurt; De Meyer, Marc

    2015-01-01

    Previous attempts to resolve the Ceratitis FAR complex (Ceratitis fasciventris, Ceratitis anonae, Ceratitis rosa, Diptera, Tephritidae) showed contrasting results and revealed the occurrence of five microsatellite genotypic clusters (A, F1, F2, R1, R2). In this paper we explore the potential of wing morphometrics for the diagnosis of FAR morphospecies and genotypic clusters. We considered a set of 227 specimens previously morphologically identified and genotyped at 16 microsatellite loci. Seventeen wing landmarks and 6 wing band areas were used for morphometric analyses. Permutational multivariate analysis of variance detected significant differences both across morphospecies and genotypic clusters (for both males and females). Unconstrained and constrained ordinations did not properly resolve groups corresponding to morphospecies or genotypic clusters. However, posterior group membership probabilities (PGMPs) of the Discriminant Analysis of Principal Components (DAPC) allowed the consistent identification of a relevant proportion of specimens (but with performances differing across morphospecies and genotypic clusters). This study suggests that wing morphometrics and PGMPs might represent a possible tool for the diagnosis of species within the FAR complex. Here, we propose a tentative diagnostic method and provide a first reference library of morphometric measures that might be used for the identification of additional and unidentified FAR specimens.

  9. Confocal Raman imaging for cancer cell classification

    NASA Astrophysics Data System (ADS)

    Mathieu, Evelien; Van Dorpe, Pol; Stakenborg, Tim; Liu, Chengxun; Lagae, Liesbet

    2014-05-01

    We propose confocal Raman imaging as a label-free single cell characterization method that can be used as an alternative for conventional cell identification techniques that typically require labels, long incubation times and complex sample preparation. In this study it is investigated whether cancer and blood cells can be distinguished based on their Raman spectra. 2D Raman scans are recorded of 114 single cells, i.e. 60 breast (MCF-7), 5 cervix (HeLa) and 39 prostate (LNCaP) cancer cells and 10 monocytes (from healthy donors). For each cell an average spectrum is calculated and principal component analysis is performed on all average cell spectra. The main features of these principal components indicate that the information for cell identification based on Raman spectra mainly comes from the fatty acid composition in the cell. Based on the second and third principal component, blood cells could be distinguished from cancer cells; and prostate cancer cells could be distinguished from breast and cervix cancer cells. However, it was not possible to distinguish breast and cervix cancer cells. The results obtained in this study, demonstrate the potential of confocal Raman imaging for cell type classification and identification purposes.

  10. How Many Separable Sources? Model Selection In Independent Components Analysis

    PubMed Central

    Woods, Roger P.; Hansen, Lars Kai; Strother, Stephen

    2015-01-01

    Unlike mixtures consisting solely of non-Gaussian sources, mixtures including two or more Gaussian components cannot be separated using standard independent components analysis methods that are based on higher order statistics and independent observations. The mixed Independent Components Analysis/Principal Components Analysis (mixed ICA/PCA) model described here accommodates one or more Gaussian components in the independent components analysis model and uses principal components analysis to characterize contributions from this inseparable Gaussian subspace. Information theory can then be used to select from among potential model categories with differing numbers of Gaussian components. Based on simulation studies, the assumptions and approximations underlying the Akaike Information Criterion do not hold in this setting, even with a very large number of observations. Cross-validation is a suitable, though computationally intensive alternative for model selection. Application of the algorithm is illustrated using Fisher's iris data set and Howells' craniometric data set. Mixed ICA/PCA is of potential interest in any field of scientific investigation where the authenticity of blindly separated non-Gaussian sources might otherwise be questionable. Failure of the Akaike Information Criterion in model selection also has relevance in traditional independent components analysis where all sources are assumed non-Gaussian. PMID:25811988

  11. A study on the impact of hydroxypropyl methylcellulose on the viscosity of PEG melt suspensions using surface plots and principal component analysis.

    PubMed

    Oh, Ching Mien; Heng, Paul Wan Sia; Chan, Lai Wah

    2015-04-01

    An understanding of the rheological behaviour of polymer melt suspensions is crucial in pharmaceutical manufacturing, especially when processed by spray congealing or melt extruding. However, a detailed comparison of the viscosities at each and every temperature and concentration between the various grades of adjuvants in the formulation will be tedious and time-consuming. Therefore, the statistical method, principal component analysis (PCA), was explored in this study. The composite formulations comprising polyethylene glycol (PEG) 3350 and hydroxypropyl methylcellulose (HPMC) of ten different grades (K100 LV, K4M, K15M, K100M, E15 LV, E50 LV, E4M, F50 LV, F4M and Methocel VLV) at various concentrations were prepared and their viscosities at different temperatures determined. Surface plots showed that concentration of HPMC had a greater effect on the viscosity compared to temperature. Particle size and size distribution of HPMC played an important role in the viscosity of melt suspensions. Smaller particles led to a greater viscosity than larger particles. PCA was used to evaluate formulations of different viscosities. The complex viscosity profiles of the various formulations containing HPMC were successfully classified into three clusters of low, moderate and high viscosity. Formulations within each group showed similar viscosities despite differences in grade or concentration of HPMC. Formulations in the low viscosity cluster were found to be sprayable. PCA was able to differentiate the complex viscosity profiles of different formulations containing HPMC in an efficient and time-saving manner and provided an excellent visualisation of the data.

  12. Localized Principal Component Analysis based Curve Evolution: A Divide and Conquer Approach

    PubMed Central

    Appia, Vikram; Ganapathy, Balaji; Yezzi, Anthony; Faber, Tracy

    2014-01-01

    We propose a novel localized principal component analysis (PCA) based curve evolution approach which evolves the segmenting curve semi-locally within various target regions (divisions) in an image and then combines these locally accurate segmentation curves to obtain a global segmentation. The training data for our approach consists of training shapes and associated auxiliary (target) masks. The masks indicate the various regions of the shape exhibiting highly correlated variations locally which may be rather independent of the variations in the distant parts of the global shape. Thus, in a sense, we are clustering the variations exhibited in the training data set. We then use a parametric model to implicitly represent each localized segmentation curve as a combination of the local shape priors obtained by representing the training shapes and the masks as a collection of signed distance functions. We also propose a parametric model to combine the locally evolved segmentation curves into a single hybrid (global) segmentation. Finally, we combine the evolution of these semilocal and global parameters to minimize an objective energy function. The resulting algorithm thus provides a globally accurate solution, which retains the local variations in shape. We present some results to illustrate how our approach performs better than the traditional approach with fully global PCA. PMID:25520901

  13. Relaxation mode analysis of a peptide system: comparison with principal component analysis.

    PubMed

    Mitsutake, Ayori; Iijima, Hiromitsu; Takano, Hiroshi

    2011-10-28

    This article reports the first attempt to apply the relaxation mode analysis method to a simulation of a biomolecular system. In biomolecular systems, the principal component analysis is a well-known method for analyzing the static properties of fluctuations of structures obtained by a simulation and classifying the structures into some groups. On the other hand, the relaxation mode analysis has been used to analyze the dynamic properties of homopolymer systems. In this article, a long Monte Carlo simulation of Met-enkephalin in gas phase has been performed. The results are analyzed by the principal component analysis and relaxation mode analysis methods. We compare the results of both methods and show the effectiveness of the relaxation mode analysis.

  14. Nucleus and cytoplasm segmentation in microscopic images using K-means clustering and region growing

    PubMed Central

    Sarrafzadeh, Omid; Dehnavi, Alireza Mehri

    2015-01-01

    Background: Segmentation of leukocytes acts as the foundation for all automated image-based hematological disease recognition systems. Most of the time, hematologists are interested in evaluation of white blood cells only. Digital image processing techniques can help them in their analysis and diagnosis. Materials and Methods: The main objective of this paper is to detect leukocytes from a blood smear microscopic image and segment them into their two dominant elements, nucleus and cytoplasm. The segmentation is conducted using two stages of applying K-means clustering. First, the nuclei are segmented using K-means clustering. Then, a proposed method based on region growing is applied to separate the connected nuclei. Next, the nuclei are subtracted from the original image. Finally, the cytoplasm is segmented using the second stage of K-means clustering. Results: The results indicate that the proposed method is able to extract the nucleus and cytoplasm regions accurately and works well even though there is no significant contrast between the components in the image. Conclusions: In this paper, a method based on K-means clustering and region growing is proposed in order to detect leukocytes from a blood smear microscopic image and segment its components, the nucleus and the cytoplasm. As region growing step of the algorithm relies on the information of edges, it will not able to separate the connected nuclei more accurately in poor edges and it requires at least a weak edge to exist between the nuclei. The nucleus and cytoplasm segments of a leukocyte can be used for feature extraction and classification which leads to automated leukemia detection. PMID:26605213

  15. Compressive strength of human openwedges: a selection method

    NASA Astrophysics Data System (ADS)

    Follet, H.; Gotteland, M.; Bardonnet, R.; Sfarghiu, A. M.; Peyrot, J.; Rumelhart, C.

    2004-02-01

    A series of 44 samples of bone wedges of human origin, intended for allograft openwedge osteotomy and obtained without particular precautions during hip arthroplasty were re-examined. After viral inactivity chemical treatment, lyophilisation and radio-sterilisation (intended to produce optimal health safety), the compressive strength, independent of age, sex and the height of the sample (or angle of cut), proved to be too widely dispersed [ 10{-}158 MPa] in the first study. We propose a method for selecting samples which takes into account their geometry (width, length, thicknesses, cortical surface area). Statistical methods (Principal Components Analysis PCA, Hierarchical Cluster Analysis, Multilinear regression) allowed final selection of 29 samples having a mean compressive strength σ_{max} =103 MPa ± 26 and with variation [ 61{-}158 MPa] . These results are equivalent or greater than average materials currently used in openwedge osteotomy.

  16. Micro-Raman spectroscopy of natural and synthetic indigo samples.

    PubMed

    Vandenabeele, Peter; Moens, Luc

    2003-02-01

    In this work indigo samples from three different sources are studied by using Raman spectroscopy: the synthetic pigment and pigments from the woad (Isatis tinctoria) and the indigo plant (Indigofera tinctoria). 21 samples were obtained from 8 suppliers; for each sample 5 Raman spectra were recorded and used for further chemometrical analysis. Principal components analysis (PCA) was performed as data reduction method before applying hierarchical cluster analysis. Linear discriminant analysis (LDA) was implemented as a non-hierarchical supervised pattern recognition method to build a classification model. In order to avoid broad-shaped interferences from the fluorescence background, the influence of 1st and 2nd derivatives on the classification was studied by using cross-validation. Although chemically identical, it is shown that Raman spectroscopy in combination with suitable chemometric methods has the potential to discriminate between synthetic and natural indigo samples.

  17. Quantification and recognition of parkinsonian gait from monocular video imaging using kernel-based principal component analysis

    PubMed Central

    2011-01-01

    Background The computer-aided identification of specific gait patterns is an important issue in the assessment of Parkinson's disease (PD). In this study, a computer vision-based gait analysis approach is developed to assist the clinical assessments of PD with kernel-based principal component analysis (KPCA). Method Twelve PD patients and twelve healthy adults with no neurological history or motor disorders within the past six months were recruited and separated according to their "Non-PD", "Drug-On", and "Drug-Off" states. The participants were asked to wear light-colored clothing and perform three walking trials through a corridor decorated with a navy curtain at their natural pace. The participants' gait performance during the steady-state walking period was captured by a digital camera for gait analysis. The collected walking image frames were then transformed into binary silhouettes for noise reduction and compression. Using the developed KPCA-based method, the features within the binary silhouettes can be extracted to quantitatively determine the gait cycle time, stride length, walking velocity, and cadence. Results and Discussion The KPCA-based method uses a feature-extraction approach, which was verified to be more effective than traditional image area and principal component analysis (PCA) approaches in classifying "Non-PD" controls and "Drug-Off/On" PD patients. Encouragingly, this method has a high accuracy rate, 80.51%, for recognizing different gaits. Quantitative gait parameters are obtained, and the power spectrums of the patients' gaits are analyzed. We show that that the slow and irregular actions of PD patients during walking tend to transfer some of the power from the main lobe frequency to a lower frequency band. Our results indicate the feasibility of using gait performance to evaluate the motor function of patients with PD. Conclusion This KPCA-based method requires only a digital camera and a decorated corridor setup. The ease of use and installation of the current method provides clinicians and researchers a low cost solution to monitor the progression of and the treatment to PD. In summary, the proposed method provides an alternative to perform gait analysis for patients with PD. PMID:22074315

  18. Comparison of five Lonicera flowers by simultaneous determination of multi-components with single reference standard method and principal component analysis.

    PubMed

    Gao, Wen; Wang, Rui; Li, Dan; Liu, Ke; Chen, Jun; Li, Hui-Jun; Xu, Xiaojun; Li, Ping; Yang, Hua

    2016-01-05

    The flowers of Lonicera japonica Thunb. were extensively used to treat many diseases. As the demands for L. japonica increased, some related Lonicera plants were often confused or misused. Caffeoylquinic acids were always regarded as chemical markers in the quality control of L. japonica, but they could be found in all Lonicera species. Thus, a simple and reliable method for the evaluation of different Lonicera flowers is necessary to be established. In this work a method based on single standard to determine multi-components (SSDMC) combined with principal component analysis (PCA) for control and distinguish of Lonicera species flowers have been developed. Six components including three caffeoylquinic acids and three iridoid glycosides were assayed simultaneously using chlorogenic acid as the reference standard. The credibility and feasibility of the SSDMC method were carefully validated and the results demonstrated that there were no remarkable differences compared with external standard method. Finally, a total of fifty-one batches covering five Lonicera species were analyzed and PCA was successfully applied to distinguish the Lonicera species. This strategy simplifies the processes in the quality control of multiple-componential herbal medicine which effectively adapted for improving the quality control of those herbs belonging to closely related species. Copyright © 2015 Elsevier B.V. All rights reserved.

  19. Sample handling for mass spectrometric proteomic investigations of human sera.

    PubMed

    West-Nielsen, Mikkel; Høgdall, Estrid V; Marchiori, Elena; Høgdall, Claus K; Schou, Christian; Heegaard, Niels H H

    2005-08-15

    Proteomic investigations of sera are potentially of value for diagnosis, prognosis, choice of therapy, and disease activity assessment by virtue of discovering new biomarkers and biomarker patterns. Much debate focuses on the biological relevance and the need for identification of such biomarkers while less effort has been invested in devising standard procedures for sample preparation and storage in relation to model building based on complex sets of mass spectrometric (MS) data. Thus, development of standardized methods for collection and storage of patient samples together with standards for transportation and handling of samples are needed. This requires knowledge about how sample processing affects MS-based proteome analyses and thereby how nonbiological biased classification errors are avoided. In this study, we characterize the effects of sample handling, including clotting conditions, storage temperature, storage time, and freeze/thaw cycles, on MS-based proteomics of human serum by using principal components analysis, support vector machine learning, and clustering methods based on genetic algorithms as class modeling and prediction methods. Using spiking to artificially create differentiable sample groups, this integrated approach yields data that--even when working with sample groups that differ more than may be expected in biological studies--clearly demonstrate the need for comparable sampling conditions for samples used for modeling and for the samples that are going into the test set group. Also, the study emphasizes the difference between class prediction and class comparison studies as well as the advantages and disadvantages of different modeling methods.

  20. The evolution of cerebrotypes in birds.

    PubMed

    Iwaniuk, Andrew N; Hurd, Peter L

    2005-01-01

    Multivariate analyses of brain composition in mammals, amphibians and fish have revealed the evolution of 'cerebrotypes' that reflect specific niches and/or clades. Here, we present the first demonstration of similar cerebrotypes in birds. Using principal component analysis and hierarchical clustering methods to analyze a data set of 67 species, we demonstrate that five main cerebrotypes can be recognized. One type is dominated by galliforms and pigeons, among other species, that all share relatively large brainstems, but can be further differentiated by the proportional size of the cerebellum and telencephalic regions. The second cerebrotype contains a range of species that all share relatively large cerebellar and small nidopallial volumes. A third type is composed of two species, the tawny frogmouth (Podargus strigoides) and an owl, both of which share extremely large Wulst volumes. Parrots and passerines, the principal members of the fourth group, possess much larger nidopallial, mesopallial and striatopallidal proportions than the other groups. The fifth cerebrotype contains species such as raptors and waterfowl that are not found at the extremes for any of the brain regions and could therefore be classified as 'generalist' brains. Overall, the clustering of species does not directly reflect the phylogenetic relationships among species, but there is a tendency for species within an order to clump together. There may also be a weak relationship between cerebrotype and developmental differences, but two of the main clusters contained species with both altricial and precocial developmental patterns. As a whole, the groupings do agree with behavioral and ecological similarities among species. Most notably, species that share similarities in locomotor behavior, mode of prey capture or cognitive ability are clustered together. The relationship between cerebrotype and behavior/ecology in birds suggests that future comparative studies of brain-behavior relationships will benefit from adopting a multivariate approach. Copyright 2005 S. Karger AG, Basel.

Top