Peterson, Leif E
2002-01-01
CLUSFAVOR (CLUSter and Factor Analysis with Varimax Orthogonal Rotation) 5.0 is a Windows-based computer program for hierarchical cluster and principal-component analysis of microarray-based transcriptional profiles. CLUSFAVOR 5.0 standardizes input data; sorts data according to gene-specific coefficient of variation, standard deviation, average and total expression, and Shannon entropy; performs hierarchical cluster analysis using nearest-neighbor, unweighted pair-group method using arithmetic averages (UPGMA), or furthest-neighbor joining methods, and Euclidean, correlation, or jack-knife distances; and performs principal-component analysis. PMID:12184816
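The gene-sorting criteria named in this abstract can be illustrated with a minimal sketch. It is not CLUSFAVOR's code: the function names, the use of the population standard deviation, the scaling of each profile to sum to 1 before computing entropy, and the sample values are all assumptions made for illustration.

```python
import math
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean (assumed definition)."""
    m = mean(values)
    return pstdev(values) / m if m != 0 else float("inf")


def shannon_entropy(values):
    """Entropy in bits of a non-negative profile scaled to sum to 1."""
    total = sum(values)
    if total <= 0:
        return 0.0
    probs = [v / total for v in values if v > 0]
    return -sum(p * math.log2(p) for p in probs)


# Hypothetical expression profiles (invented numbers).
profiles = {
    "gene_a": [5.0, 5.0, 5.0, 5.0],   # flat profile
    "gene_b": [1.0, 1.0, 1.0, 17.0],  # one dominant sample
    "gene_c": [2.0, 4.0, 6.0, 8.0],   # steady gradient
}

# Most variable genes first.
by_cv = sorted(profiles, key=lambda g: coefficient_of_variation(profiles[g]), reverse=True)
# Lowest entropy (most concentrated expression) first.
by_entropy = sorted(profiles, key=lambda g: shannon_entropy(profiles[g]))
```

In this toy data both orderings put `gene_b` first and the flat `gene_a` last, but the two criteria need not agree in general.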
ERIC Educational Resources Information Center
Steinley, Douglas; Brusco, Michael J.; Henson, Robert
2012-01-01
A measure of "clusterability" serves as the basis of a new methodology designed to preserve cluster structure in a reduced dimensional space. Similar to principal component analysis, which finds the direction of maximal variance in multivariate space, principal cluster axes find the direction of maximum clusterability in multivariate space.…
NASA Astrophysics Data System (ADS)
Ginanjar, Irlandia; Pasaribu, Udjianna S.; Indratno, Sapto W.
2017-03-01
This article presents an application of the principal component analysis (PCA) biplot to data mining. It aims to simplify the methods for clustering objects in a PCA biplot and to make them more objective. The novelty of the paper is a measure that can be used to objectify the clustering of objects in a PCA biplot. Orthonormal eigenvectors are the coefficients of a principal component model and represent an association between the principal components and the initial variables. The existence of this association is a valid ground for clustering objects based on principal axes values; thus, if m principal axes are used in the PCA, then the objects can be classified into 2m clusters. Inter-city buses are clustered based on maintenance cost data using a PCA biplot with two principal axes. The buses are clustered into four groups. The first group is the buses with high maintenance costs, especially for lube and brake canvas. The second group is the buses with high maintenance costs, especially for tire and filter. The third group is the buses with low maintenance costs, especially for lube and brake canvas. The fourth group is the buses with low maintenance costs, especially for tire and filter.
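One possible reading of clustering "based on principal axes value" is to group objects by the sign of their score on each axis. The abstract does not state the exact rule, so the sketch below is an assumption, and the bus names and scores are invented. With two axes there are four possible sign patterns, which matches the four bus groups reported.

```python
def sign_pattern_cluster(scores):
    """Label an object by the sign of each principal-axis score.

    '+' marks a score >= 0 and '-' a negative score (an assumed rule).
    """
    return "".join("+" if s >= 0 else "-" for s in scores)


# Hypothetical scores of five buses on two principal axes (invented numbers).
bus_scores = {
    "bus_1": (2.1, 0.4),
    "bus_2": (1.3, -0.9),
    "bus_3": (-0.7, 1.8),
    "bus_4": (-1.5, -0.2),
    "bus_5": (0.6, 1.1),
}

# Collect buses that share a sign pattern into the same group.
groups = {}
for bus, scores in bus_scores.items():
    groups.setdefault(sign_pattern_cluster(scores), []).append(bus)
```

Here `groups` holds one list of buses per sign pattern, so `bus_1` and `bus_5` fall in the same group because both scores are non-negative.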
Principal Component Clustering Approach to Teaching Quality Discriminant Analysis
ERIC Educational Resources Information Center
Xian, Sidong; Xia, Haibo; Yin, Yubo; Zhai, Zhansheng; Shang, Yan
2016-01-01
Teaching quality is the lifeline of higher education. Many universities have made effective progress in evaluating teaching quality. In this paper, we establish a students' evaluation of teaching (SET) discriminant analysis model and algorithm based on principal component clustering analysis. Additionally, we classify the SET…
NASA Astrophysics Data System (ADS)
Ma, Mengli; Lei, En; Meng, Hengling; Wang, Tiantao; Xie, Linyan; Shen, Dong; Xianwang, Zhou; Lu, Bingyue
2017-08-01
Amomum tsao-ko is a commercial plant that is used for various purposes in the medicinal and food industries. For the present investigation, 44 germplasm samples were collected from Jinping County of Yunnan Province. Cluster analysis and 2-dimensional principal component analysis (PCA) were used to represent the genetic relations among Amomum tsao-ko samples by using simple sequence repeat (SSR) markers. Clustering analysis clearly distinguished the sample groups. Two major clusters were formed: the first (Cluster I) consisted of 34 individuals and the second (Cluster II) consisted of 10 individuals; Cluster I, as the main group, contained multiple sub-clusters. PCA also showed 2 groups: PCA Group 1 included 29 individuals and PCA Group 2 included 12 individuals, consistent with the results of cluster analysis. The purpose of the present investigation was to provide information on the genetic relationships of Amomum tsao-ko germplasm resources in the main producing areas, and to provide a theoretical basis for the protection and utilization of Amomum tsao-ko resources.
Use of multivariate statistics to identify unreliable data obtained using CASA.
Martínez, Luis Becerril; Crispín, Rubén Huerta; Mendoza, Maximino Méndez; Gallegos, Oswaldo Hernández; Martínez, Andrés Aragón
2013-06-01
In order to identify unreliable data in a dataset of motility parameters obtained from a pilot study acquired by a veterinarian with experience in boar semen handling, but without experience in the operation of a computer assisted sperm analysis (CASA) system, a multivariate graphical and statistical analysis was performed. Sixteen boar semen samples were aliquoted, then incubated with varying concentrations of progesterone from 0 to 3.33 µg/ml and analyzed in a CASA system. After standardization of the data, Chernoff faces were pictured for each measurement, and a principal component analysis (PCA) was used to reduce the dimensionality and pre-process the data before hierarchical clustering. The first twelve individual measurements showed abnormal features when Chernoff faces were drawn. PCA revealed that principal components 1 and 2 explained 63.08% of the variance in the dataset. Values of principal components for each individual measurement of semen samples were mapped to identify differences among treatments or among boars. Twelve individual measurements presented low values of principal component 1. Confidence ellipses on the map of principal components showed no statistically significant effects for treatment or boar. Hierarchical clustering performed on the first two principal components produced three clusters. Cluster 1 contained evaluations of the first two samples in each treatment, each one from a different boar. With the exception of one individual measurement, all other measurements in cluster 1 were the same as those observed with abnormal Chernoff faces. Unreliable data in cluster 1 are probably related to the operator's inexperience with a CASA system. These findings could be used to objectively evaluate the skill level of an operator of a CASA system. This may be particularly useful in the quality control of semen analysis using CASA systems.
Lin, Nan; Jiang, Junhai; Guo, Shicheng; Xiong, Momiao
2015-01-01
Due to advances in sensor technology, the growing volume of medical image data makes it possible to visualize anatomical changes in biological tissues. As a consequence, medical images have the potential to enhance the diagnosis of disease, the prediction of clinical outcomes and the characterization of disease progression. At the same time, the growing data dimensions pose great methodological and computational challenges for the representation and selection of features in image cluster analysis. To address these challenges, we first extend functional principal component analysis (FPCA) from one dimension to two dimensions to fully capture the spatial variation of the image signals. The image signals contain a large number of redundant features which provide no additional information for clustering analysis. The widely used methods for removing the irrelevant features are sparse clustering algorithms using a lasso-type penalty to select the features. However, the accuracy of clustering using a lasso-type penalty depends on the selection of the penalty parameters and the threshold value, which are difficult to determine in practice. Recently, randomized algorithms have received a great deal of attention in big data analysis. This paper presents a randomized algorithm for accurate feature selection in image clustering analysis. The proposed method is applied to both the liver and kidney cancer histology image data from the TCGA database. The results demonstrate that the randomized feature selection method coupled with functional principal component analysis substantially outperforms the current sparse clustering algorithms in image cluster analysis. PMID:26196383
Dynamic competitive probabilistic principal components analysis.
López-Rubio, Ezequiel; Ortiz-DE-Lazcano-Lobato, Juan Miguel
2009-04-01
We present a new neural model which extends the classical competitive learning (CL) by performing a Probabilistic Principal Components Analysis (PPCA) at each neuron. The model also has the ability to learn the number of basis vectors required to represent the principal directions of each cluster, so it overcomes a drawback of most local PCA models, where the dimensionality of a cluster must be fixed a priori. Experimental results are presented to show the performance of the network with multispectral image data.
Yin, Yihang; Liu, Fengzheng; Zhou, Xiang; Li, Quanzhong
2015-08-07
Wireless sensor networks (WSNs) have been widely used to monitor the environment, and sensors in WSNs are usually power constrained. Because inner-node communication consumes most of the power, efficient data compression schemes are needed to reduce the data transmission to prolong the lifetime of WSNs. In this paper, we propose an efficient data compression model to aggregate data, which is based on spatial clustering and principal component analysis (PCA). First, sensors with a strong temporal-spatial correlation are grouped into one cluster for further processing with a novel similarity measure metric. Next, sensor data in one cluster are aggregated in the cluster head sensor node, and an efficient adaptive strategy is proposed for the selection of the cluster head to conserve energy. Finally, the proposed model applies principal component analysis with an error bound guarantee to compress the data and retain the definite variance at the same time. Computer simulations show that the proposed model can greatly reduce communication and obtain a lower mean square error than other PCA-based algorithms.
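The idea of compressing data with PCA while retaining a set share of the variance can be sketched generically. The abstract does not describe the paper's actual error-bound mechanism, so the rule below (keep the fewest leading components whose eigenvalues reach a target fraction of the total variance) is a common stand-in, and the function name and eigenvalues are hypothetical.

```python
def components_for_variance(eigenvalues, target):
    """Smallest k whose k largest eigenvalues hold at least `target` of the total variance."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for k, value in enumerate(ordered, start=1):
        running += value
        if running / total >= target:
            return k
    return len(ordered)


# Hypothetical covariance eigenvalues for the readings of one sensor cluster.
eigenvalues = [6.0, 2.5, 1.0, 0.5]

k_80 = components_for_variance(eigenvalues, 0.80)  # first two hold 85% of the variance
k_90 = components_for_variance(eigenvalues, 0.90)  # a third is needed to pass 90%
```

A cluster head would then transmit only the first `k` component scores per sample instead of every raw reading, trading reconstruction error against communication cost.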
Matsen IV, Frederick A.; Evans, Steven N.
2013-01-01
Principal components analysis (PCA) and hierarchical clustering are two of the most heavily used techniques for analyzing the differences between nucleic acid sequence samples taken from a given environment. They have led to many insights regarding the structure of microbial communities. We have developed two new complementary methods that leverage how this microbial community data sits on a phylogenetic tree. Edge principal components analysis enables the detection of important differences between samples that contain closely related taxa. Each principal component axis is a collection of signed weights on the edges of the phylogenetic tree, and these weights are easily visualized by a suitable thickening and coloring of the edges. Squash clustering outputs a (rooted) clustering tree in which each internal node corresponds to an appropriate “average” of the original samples at the leaves below the node. Moreover, the length of an edge is a suitably defined distance between the averaged samples associated with the two incident nodes, rather than the less interpretable average of distances produced by UPGMA, the most widely used hierarchical clustering method in this context. We present these methods and illustrate their use with data from the human microbiome. PMID:23505415
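For contrast with squash clustering, the "average of distances" that UPGMA uses between two clusters can be written in a few lines. This sketch covers only the UPGMA inter-cluster distance, not squash clustering or edge PCA; the function name and the one-dimensional sample points are illustrative assumptions.

```python
def upgma_distance(cluster_a, cluster_b, dist):
    """Mean of all pairwise distances between members of two clusters."""
    pairs = [(a, b) for a in cluster_a for b in cluster_b]
    return sum(dist(a, b) for a, b in pairs) / len(pairs)


def abs_diff(x, y):
    """Distance between two one-dimensional points."""
    return abs(x - y)


# Hypothetical one-dimensional samples grouped into two clusters.
left = [0.0, 2.0]
right = [5.0, 9.0]

# Pairwise distances are 5, 9, 3 and 7, so the average is 6.
d = upgma_distance(left, right, abs_diff)
```

The value 6.0 is an average over pairs and does not correspond to the distance between any single pair of representative points, which is the interpretability issue the abstract raises.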
Liu, Xiao-Fang; Xue, Chang-Hu; Wang, Yu-Ming; Li, Zhao-Jie; Xue, Yong; Xu, Jie
2011-11-01
The present study investigates the feasibility of multi-element analysis for determining the geographical origin of the sea cucumber Apostichopus japonicus, and selects effective tracers for assessing its geographical origin. The contents of elements such as Al, V, Cr, Mn, Fe, Co, Ni, Cu, Zn, As, Se, Mo, Cd, Hg and Pb in Apostichopus japonicus samples from seven places of geographical origin were determined by means of ICP-MS. The results were used to develop an element database. Cluster analysis (CA) and principal component analysis (PCA) were applied to differentiate the geographical origin of the samples. Three principal components, which accounted for over 89% of the total variance, were extracted from the standardized data. The results of Q-type cluster analysis showed that the 26 samples could be clustered reasonably into five groups, and the classification results were significantly associated with the marine distribution of the samples. CA and PCA were effective methods for element analysis of Apostichopus japonicus samples. The contents of the mineral elements in the samples were good chemical descriptors for differentiating their geographical origins.
NASA Astrophysics Data System (ADS)
Ueki, Kenta; Iwamori, Hikaru
2017-10-01
In this study, with a view to understanding the structure of high-dimensional geochemical data and discussing the chemical processes at work in the evolution of arc magmas, we employed principal component analysis (PCA) to evaluate the compositional variations of volcanic rocks from the Sengan volcanic cluster of the Northeastern Japan Arc. We analyzed the trace element compositions of various arc volcanic rocks, sampled from 17 different volcanoes in a volcanic cluster. The PCA results demonstrated that the first three principal components accounted for 86% of the geochemical variation in the magma of the Sengan region. Based on the relationships between the principal components and the major elements, the mass-balance relationships with respect to the contributions of minerals, the composition of plagioclase phenocrysts, the geothermal gradient, and the seismic velocity structure in the crust, the first, second, and third principal components appear to represent magma mixing, crystallization of olivine/pyroxene, and crystallization of plagioclase, respectively. These represented 59%, 20%, and 6%, respectively, of the variance in the entire compositional range, indicating that magma mixing accounted for the largest variance in the geochemical variation of the arc magma. Our results indicate that crustal processes dominate the geochemical variation of magma in the Sengan volcanic cluster.
Groundwater quality assessment of urban Bengaluru using multivariate statistical techniques
NASA Astrophysics Data System (ADS)
Gulgundi, Mohammad Shahid; Shetty, Amba
2018-03-01
Groundwater quality deterioration due to anthropogenic activities has become a subject of prime concern. The objective of the study was to assess the spatial and temporal variations in groundwater quality and to identify the sources in the western half of Bengaluru city using multivariate statistical techniques. A water quality index rating was calculated for the pre- and post-monsoon seasons to quantify overall water quality for human consumption. The post-monsoon samples show signs of poorer quality for drinking purposes compared to pre-monsoon samples. Cluster analysis (CA), principal component analysis (PCA) and discriminant analysis (DA) were applied to the groundwater quality data measured on 14 parameters from 67 sites distributed across the city. Hierarchical cluster analysis grouped the 67 sampling stations into two groups, cluster 1 having high pollution and cluster 2 having lesser pollution. Discriminant analysis was applied to delineate the most meaningful parameters accounting for temporal and spatial variations in groundwater quality of the study area. Temporal DA identified pH as the most important parameter, which discriminates between water quality in the pre-monsoon and post-monsoon seasons and accounts for 72% seasonal assignation of cases. Spatial DA identified Mg, Cl and NO3 as the three most important parameters discriminating between the two clusters and accounting for 89% spatial assignation of cases. Principal component analysis was applied to the dataset obtained from the two clusters and yielded three factors in each cluster, explaining 85.4 and 84% of the total variance, respectively. Varifactors obtained from principal component analysis showed that groundwater quality variation is mainly explained by dissolution of minerals from rock-water interactions in the aquifer, the effect of anthropogenic activities and ion exchange processes in water.
Liu, Xiang; Guo, Ling-Peng; Zhang, Fei-Yun; Ma, Jie; Mu, Shu-Yong; Zhao, Xin; Li, Lan-Hai
2015-02-01
Eight physical and chemical indicators related to water quality were monitored at nineteen sampling sites along the Kunes River at the end of the snowmelt season in spring. To investigate the spatial distribution characteristics of the water's physical and chemical properties, cluster analysis (CA), discriminant analysis (DA) and principal component analysis (PCA) were employed. The result of the cluster analysis showed that the Kunes River could be divided into three reaches according to the similarities of water physical and chemical properties among sampling sites, representing the upstream, midstream and downstream of the river, respectively. The result of the discriminant analysis demonstrated that the reliability of such a classification was high, and that DO, Cl- and BOD5 were the significant indexes leading to this classification. Three principal components were extracted by the principal component analysis, with a cumulative variance contribution of 86.90%. The result of the principal component analysis also indicated that water physical and chemical properties were mostly affected by EC, ORP, NO3(-)-N, NH4(+)-N, Cl- and BOD5. The sorted principal component scores at each sampling site showed that the water quality was mainly influenced by DO in the upstream reach, by pH in the midstream reach, and by the rest of the indicators in the downstream reach. The order of comprehensive scores for principal components revealed that the water quality degraded from upstream to downstream, i.e., the upstream reach had the best water quality, followed by the midstream reach, while the water quality downstream was the worst. This result corresponded exactly to the three reaches classified using cluster analysis. Anthropogenic activity and the accumulation of pollutants along the river were probably the main reasons for this spatial difference.
Rosacea assessment by erythema index and principal component analysis segmentation maps
NASA Astrophysics Data System (ADS)
Kuzmina, Ilona; Rubins, Uldis; Saknite, Inga; Spigulis, Janis
2017-12-01
RGB images of rosacea were analyzed using segmentation maps of principal component analysis (PCA) and erythema index (EI). Areas of segmented clusters were compared to Clinician's Erythema Assessment (CEA) values given by two dermatologists. The results show that visible blood vessels are segmented more precisely on maps of the erythema index and the third principal component (PC3). In many cases, the distributions of clusters on EI and PC3 maps are very similar. Mean values of the clusters' areas on these maps show a decrease in the area of blood vessels and erythema and an increase in the lighter skin area after therapy for patients with a diagnosis of CEA = 2 on the first visit and CEA = 1 on the second visit. This study shows that EI and PC3 maps are more useful than the maps of the first (PC1) and second (PC2) principal components for indicating vascular structures and erythema on the skin of rosacea patients and for therapy monitoring.
Preliminary Comparisons of the Information Content and Utility of TM Versus MSS Data
NASA Technical Reports Server (NTRS)
Markham, B. L.
1984-01-01
Comparisons were made between subscenes from the first TM scene acquired of the Washington, D.C. area and a MSS scene acquired approximately one year earlier. Three types of analyses were conducted to compare TM and MSS data: a water body analysis, a principal components analysis and a spectral clustering analysis. The water body analysis compared the capability of the TM to the MSS for detecting small uniform targets. Of the 59 ponds located on aerial photographs 34 (58%) were detected by the TM with six commission errors (15%) and 13 (22%) were detected by the MSS with three commission errors (19%). The smallest water body detected by the TM was 16 meters; the smallest detected by the MSS was 40 meters. For the principal components analysis, means and covariance matrices were calculated for each subscene, and principal components images generated and characterized. In the spectral clustering comparison each scene was independently clustered and the clusters were assigned to informational classes. The preliminary comparison indicated that TM data provides enhancements over MSS in terms of (1) small target detection and (2) data dimensionality (even with 4-band data). The extra dimension, partially resultant from TM band 1, appears useful for built-up/non-built-up area separation.
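The percentages in the water body analysis can be reproduced with simple arithmetic. The abstract does not define its denominators, so this sketch assumes the detection rate is detections divided by the 59 reference ponds and the commission rate is commission errors divided by detections plus commission errors. Those assumptions reproduce the reported figures; the function names are illustrative.

```python
def detection_rate(detected, reference_total):
    """Share of reference targets that the sensor detected, in percent."""
    return 100.0 * detected / reference_total


def commission_rate(commission_errors, detected):
    """Share of all sensor calls that were false, in percent (assumed denominator)."""
    return 100.0 * commission_errors / (detected + commission_errors)


PONDS = 59  # ponds located on aerial photographs

tm_detect = round(detection_rate(34, PONDS))   # TM detections
tm_commit = round(commission_rate(6, 34))      # 6 false calls out of 40
mss_detect = round(detection_rate(13, PONDS))  # MSS detections
mss_commit = round(commission_rate(3, 13))     # 3 false calls out of 16
```

Rounded to whole percents these give 58 and 15 for the TM and 22 and 19 for the MSS, in line with the abstract.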
Improving Cluster Analysis with Automatic Variable Selection Based on Trees
2014-12-01
regression trees Daisy DISsimilAritY PAM partitioning around medoids PMA penalized multivariate analysis SPC sparse principal components UPGMA unweighted...unweighted pair-group average method (UPGMA). This method measures dissimilarities between all objects in two clusters and takes the average value
Using Machine Learning Techniques in the Analysis of Oceanographic Data
NASA Astrophysics Data System (ADS)
Falcinelli, K. E.; Abuomar, S.
2017-12-01
Acoustic Doppler Current Profilers (ADCPs) are oceanographic tools capable of collecting large amounts of current profile data. Using unsupervised machine learning techniques such as principal component analysis, fuzzy c-means clustering, and self-organizing maps, patterns and trends in an ADCP dataset are found. Cluster validity algorithms such as visual assessment of cluster tendency and clustering index are used to determine the optimal number of clusters in the ADCP dataset. These techniques prove to be useful in analysis of ADCP data and demonstrate potential for future use in other oceanographic applications.
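Of the techniques listed, fuzzy c-means differs from hard clustering in that each data point receives a graded membership in every cluster. The sketch below shows only the standard membership update for a single one-dimensional point with fixed centers; it is not the authors' pipeline, and the function name, the fuzzifier value of 2 and the sample numbers are assumptions.

```python
def fcm_memberships(point, centers, fuzzifier=2.0):
    """Fuzzy c-means membership of one 1-D point in each cluster.

    Uses u_j = 1 / sum_k (d_j / d_k) ** (2 / (fuzzifier - 1)),
    where d_j is the distance from the point to center j.
    """
    dists = [abs(point - c) for c in centers]
    if 0.0 in dists:
        # The point sits exactly on one or more centers: share membership among them.
        hits = dists.count(0.0)
        return [1.0 / hits if d == 0.0 else 0.0 for d in dists]
    power = 2.0 / (fuzzifier - 1.0)
    return [1.0 / sum((d_j / d_k) ** power for d_k in dists) for d_j in dists]


# Hypothetical current speed of 1.0 with cluster centers at 0.0 and 4.0.
u = fcm_memberships(1.0, [0.0, 4.0])
```

With distances 1 and 3 and a fuzzifier of 2, the memberships come out as 0.9 and 0.1, and they always sum to 1 across clusters.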
Self-aggregation in scaled principal component space
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ding, Chris H.Q.; He, Xiaofeng; Zha, Hongyuan
2001-10-05
Automatic grouping of voluminous data into meaningful structures is a challenging task frequently encountered in broad areas of science, engineering and information processing. These data clustering tasks are frequently performed in Euclidean space or a subspace chosen from principal component analysis (PCA). Here we describe a space obtained by a nonlinear scaling of PCA in which data objects self-aggregate automatically into clusters. Projection into this space gives sharp distinctions among clusters. Gene expression profiles of cancer tissue subtypes, Web hyperlink structure and Internet newsgroups are analyzed to illustrate interesting properties of the space.
Molecular Reclassification of Crohn’s Disease: A Cautionary Note on Population Stratification
Maus, Bärbel; Jung, Camille; Mahachie John, Jestinah M.; Hugot, Jean-Pierre; Génin, Emmanuelle; Van Steen, Kristel
2013-01-01
Complex human diseases commonly differ in their phenotypic characteristics, e.g., Crohn’s disease (CD) patients are heterogeneous with regard to disease location and disease extent. The genetic susceptibility to Crohn’s disease is widely acknowledged and has been demonstrated by identification of over 100 CD associated genetic loci. However, relating CD subphenotypes to disease susceptible loci has proven to be a difficult task. In this paper we discuss the use of cluster analysis on genetic markers to identify genetic-based subgroups while taking into account possible confounding by population stratification. We show that it is highly relevant to consider the confounding nature of population stratification in order to avoid that detected clusters are strongly related to population groups instead of disease-specific groups. Therefore, we explain the use of principal components to correct for population stratification while clustering affected individuals into genetic-based subgroups. The principal components are obtained using 30 ancestry informative markers (AIM), and the first two PCs are determined to discriminate between continental origins of the affected individuals. Genotypes on 51 CD associated single nucleotide polymorphisms (SNPs) are used to perform latent class analysis, hierarchical and Partitioning Around Medoids (PAM) cluster analysis within a sample of affected individuals with and without the use of principal components to adjust for population stratification. It is seen that without correction for population stratification clusters seem to be influenced by population stratification while with correction clusters are unrelated to continental origin of individuals. PMID:24147066
Dong, Jianghu J; Wang, Liangliang; Gill, Jagbir; Cao, Jiguo
2017-01-01
This article is motivated by some longitudinal clinical data of kidney transplant recipients, where kidney function progression is recorded as the estimated glomerular filtration rates at multiple time points post kidney transplantation. We propose to use the functional principal component analysis method to explore the major source of variations of glomerular filtration rate curves. We find that the estimated functional principal component scores can be used to cluster glomerular filtration rate curves. Ordering functional principal component scores can detect abnormal glomerular filtration rate curves. Finally, functional principal component analysis can effectively estimate missing glomerular filtration rate values and predict future glomerular filtration rate values.
Identifying Subgroups of Tinnitus Using Novel Resting State fMRI Biomarkers and Cluster Analysis
2016-10-01
AWARD NUMBER: W81XWH-15-2-0032. TITLE: Identifying Subgroups of Tinnitus Using Novel Resting State fMRI Biomarkers and Cluster Analysis. ...Public Release; Distribution Unlimited. ABSTRACT: The subject of the project is FY14 PRMRP Topic Area – Tinnitus. The broad…
NASA Astrophysics Data System (ADS)
Jha, S. K.; Brockman, R. A.; Hoffman, R. M.; Sinha, V.; Pilchak, A. L.; Porter, W. J.; Buchanan, D. J.; Larsen, J. M.; John, R.
2018-05-01
Principal component analysis and fuzzy c-means clustering algorithms were applied to slip-induced strain and geometric metric data in an attempt to discover unique microstructural configurations and their frequencies of occurrence in statistically representative instantiations of a titanium alloy microstructure. Grain-averaged fatigue indicator parameters were calculated for the same instantiation. The fatigue indicator parameters strongly correlated with the spatial location of the microstructural configurations in the principal components space. The fuzzy c-means clustering method identified clusters of data that varied in terms of their average fatigue indicator parameters. Furthermore, the number of points in each cluster was inversely correlated to the average fatigue indicator parameter. This analysis demonstrates that data-driven methods have significant potential for providing unbiased determination of unique microstructural configurations and their frequencies of occurrence in a given volume from the point of view of strain localization and fatigue crack initiation.
NASA Astrophysics Data System (ADS)
Sokolov, Anton; Dmitriev, Egor; Delbarre, Hervé; Augustin, Patrick; Gengembre, Cyril; Fourmenten, Marc
2016-04-01
The problem of atmospheric contamination by principal air pollutants was considered for the industrialized coastal region of the English Channel at Dunkirk, which is influenced by north European metropolitan areas. MESO-NH nested models were used to simulate the local atmospheric dynamics and to calculate Lagrangian backward trajectories online with a 15-minute temporal resolution and a horizontal resolution down to 500 m. The one-month mesoscale numerical simulation was coupled with local pollution measurements of volatile organic components, particulate matter, ozone, sulphur dioxide and nitrogen oxides. Principal atmospheric pathways were determined by a clustering technique applied to the simulated backward trajectories. Six clusters were obtained that describe the local atmospheric dynamics: four with winds blowing through the English Channel, one with wind coming from the south, and the biggest cluster with small wind speeds. This last cluster includes mostly sea breeze events. The analysis of meteorological data and pollution measurements allows the principal atmospheric pathways to be related to local air contamination events. It was shown that contamination events are mostly connected with a channelling of pollution from local sources and with low-turbulence states of the local atmosphere.
NASA Astrophysics Data System (ADS)
Farsadnia, Farhad; Ghahreman, Bijan
2016-04-01
Identification of hydrologically homogeneous groups is considered both fundamental and applied research in hydrology. Clustering methods are among the conventional methods for delineating hydrologically homogeneous regions. Recently, the Self-Organizing feature Map (SOM) method has been applied in some studies. However, the main problem with this method is the interpretation of its output map. Therefore, the SOM is used as input to other clustering algorithms. The aim of this study is to apply a two-level approach, combining a Self-Organizing feature Map with the Ward hierarchical clustering method, to determine the hydrologically homogeneous regions in North and Razavi Khorasan provinces. First, we reduced the dimension of the SOM input matrix by principal component analysis; then the SOM was used to form a two-dimensional feature map. To determine homogeneous regions for flood frequency analysis, the SOM output nodes were used as input to the Ward method. Generally, the regions identified by clustering algorithms are not statistically homogeneous. Consequently, they have to be adjusted to improve their homogeneity. After adjustment of the regions by L-moment tests, five hydrologically homogeneous regions were identified. Finally, adjusted regions were created by a two-level SOM and then the best regional distribution function and associated parameters were selected by the L-moment approach. The results showed that the combination of self-organizing maps and Ward hierarchical clustering with principal components as input is more effective than the hierarchical method with principal components or standardized inputs for obtaining hydrologically homogeneous regions.
Wolf, Antje; Kirschner, Karl N
2013-02-01
With improvements in computer speed and algorithm efficiency, MD simulations are sampling larger numbers of molecular and biomolecular conformations. Being able to qualitatively and quantitatively sift these conformations into meaningful groups is a difficult and important task, especially when considering the structure-activity paradigm. Here we present a study that combines two popular techniques, principal component (PC) analysis and clustering, for revealing major conformational changes that occur in molecular dynamics (MD) simulations. Specifically, we explored how clustering different PC subspaces affects the resulting clusters versus clustering the complete trajectory data. As a case example, we used the trajectory data from an explicitly solvated simulation of a bacterial L11·23S ribosomal subdomain, which is a target of thiopeptide antibiotics. Clustering was performed, using K-means and average-linkage algorithms, on data involving the first two to the first five PC subspace dimensions. For the average-linkage algorithm we found that data-point membership, cluster shape, and cluster size depended on the selected PC subspace data. In contrast, K-means provided very consistent results regardless of the selected subspace. Since we present results on a single model system, generalization concerning the clustering of different PC subspaces of other molecular systems is currently premature. However, our hope is that this study illustrates a) the complexities in selecting the appropriate clustering algorithm, b) the complexities in interpreting and validating their results, and c) that combining PC analysis with subsequent clustering can yield valuable dynamic and conformational information.
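The comparison described in this abstract can be illustrated with a minimal sketch. It is not the authors' workflow: the "frames" below are synthetic points standing in for trajectory data, only K-means is shown (average linkage is omitted), and the helper names `pca_scores` and `kmeans` are hypothetical. The sketch projects the data onto the first two to five principal components and clusters each subspace separately.

```python
import numpy as np

def pca_scores(X, n_components):
    """Project mean-centered rows of X onto the leading principal components."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal axes, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def kmeans(X, k, n_iter=50):
    """Plain Lloyd's algorithm with a deterministic farthest-point start."""
    centers = [X[0]]
    for _ in range(1, k):
        # Squared distance of every point to its closest existing center.
        d = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

# Synthetic stand-in for trajectory frames (assumption): two well-separated
# "conformational states" in a 6-dimensional coordinate space.
rng = np.random.default_rng(0)
frames = np.vstack([rng.normal(0.0, 0.3, (40, 6)),
                    rng.normal(3.0, 0.3, (40, 6))])

# Cluster the first 2, 3, 4 and 5 PC subspace dimensions separately.
labels_by_subspace = {m: kmeans(pca_scores(frames, m), k=2) for m in range(2, 6)}
```

Comparing the label vectors in `labels_by_subspace` across subspace sizes is one simple way to check whether cluster membership depends on the chosen subspace.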
PCA based clustering for brain tumor segmentation of T1w MRI images.
Kaya, Irem Ersöz; Pehlivanlı, Ayça Çakmak; Sekizkardeş, Emine Gezmez; Ibrikci, Turgay
2017-03-01
Medical images are huge collections of information that are difficult to store and process, consuming extensive computing time. Therefore, reduction techniques are commonly used as a data pre-processing step to make the image data less complex so that high-dimensional data can be identified by an appropriate low-dimensional representation. PCA is one of the most popular multivariate methods for data reduction. This paper focuses on clustering T1-weighted MRI images for brain tumor segmentation with dimension reduction by different common Principal Component Analysis (PCA) algorithms. Our primary aim is to present a comparison between different variations of PCA algorithms on MRIs for two clustering methods. The five most common PCA algorithms, namely the conventional PCA, Probabilistic Principal Component Analysis (PPCA), Expectation Maximization Based Principal Component Analysis (EM-PCA), Generalized Hebbian Algorithm (GHA), and Adaptive Principal Component Extraction (APEX), were applied to reduce dimensionality in advance of two clustering algorithms, K-Means and Fuzzy C-Means. In the study, T1-weighted MRI images of the human brain with brain tumor were used for clustering. In addition to the original size of 512 lines and 512 pixels per line, three more sizes, 256 × 256, 128 × 128 and 64 × 64, were included in the study to examine their effect on the methods. The obtained results were compared in terms of both the reconstruction errors and the Euclidean distance errors among the clustered images containing the same number of principal components. According to the findings, the PPCA obtained the best results among all others. Furthermore, the EM-PCA and the PPCA helped the K-Means algorithm achieve the best clustering performance in the majority of cases, as well as achieving significant results with both clustering algorithms for all sizes of T1w MRI images. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
ICAP - An Interactive Cluster Analysis Procedure for analyzing remotely sensed data
NASA Technical Reports Server (NTRS)
Wharton, S. W.; Turner, B. J.
1981-01-01
An Interactive Cluster Analysis Procedure (ICAP) was developed to derive classifier training statistics from remotely sensed data. ICAP differs from conventional clustering algorithms by allowing the analyst to optimize the cluster configuration by inspection, rather than by manipulating process parameters. Control of the clustering process alternates between the algorithm, which creates new centroids and forms clusters, and the analyst, who can evaluate and elect to modify the cluster structure. Clusters can be deleted, or lumped together pairwise, or new centroids can be added. A summary of the cluster statistics can be requested to facilitate cluster manipulation. The principal advantage of this approach is that it allows prior information (when available) to be used directly in the analysis, since the analyst interacts with ICAP in a straightforward manner, using basic terms with which he is more likely to be familiar. Results from testing ICAP showed that an informed use of ICAP can improve classification, as compared to an existing cluster analysis procedure.
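The alternation between algorithm and analyst described above can be sketched roughly as follows. This is an illustration only, not the ICAP implementation: the function names (`assign`, `recompute`, `lump`), the unweighted midpoint used when lumping two clusters, and the toy points are all assumptions.

```python
import math

def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda k: math.dist(point, centroids[k]))

def assign(points, centroids):
    """Algorithm step: form clusters by nearest-centroid assignment."""
    return [nearest(p, centroids) for p in points]

def recompute(points, labels, centroids):
    """Algorithm step: move each centroid to the mean of its members."""
    new = []
    for k, old in enumerate(centroids):
        members = [p for p, l in zip(points, labels) if l == k]
        if members:
            new.append(tuple(sum(c) / len(members) for c in zip(*members)))
        else:
            new.append(old)  # keep an empty cluster's centroid unchanged
    return new

def lump(centroids, i, j):
    """Analyst step: lump clusters i and j together (midpoint is an assumption)."""
    merged = tuple((a + b) / 2 for a, b in zip(centroids[i], centroids[j]))
    return [c for k, c in enumerate(centroids) if k not in (i, j)] + [merged]

# Toy data: two compact groups, initially over-split into three clusters.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
centroids = [(0.0, 0.0), (1.0, 1.0), (10.5, 10.5)]

# The analyst inspects the configuration and lumps clusters 0 and 1 together.
centroids = lump(centroids, 0, 1)
labels = assign(points, centroids)
centroids = recompute(points, labels, centroids)
```

Deleting a cluster or adding a centroid would follow the same pattern: edit the centroid list, then let `assign` and `recompute` rebuild the clusters.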
Opara, Umezuruike Linus; Jacobson, Dan; Al-Saady, Nadiya Abubakar
2010-01-01
Banana is an important crop grown in Oman and there is a dearth of information on its genetic diversity to assist in crop breeding and improvement programs. This study employed amplified fragment length polymorphism (AFLP) to investigate the genetic variation in local banana cultivars from the southern region of Oman. Using 12 primer combinations, a total of 1094 bands were scored, of which 1012 were polymorphic. Eighty-two unique markers were identified, which revealed the distinct separation of the seven cultivars. The results obtained show that AFLP can be used to differentiate the banana cultivars. Further classification by phylogenetic, hierarchical clustering and principal component analyses showed significant differences between the clusters found with molecular markers and those clusters created by previous studies using morphological analysis. Based on the analytical results, a consensus dendrogram of the banana cultivars is presented. PMID:20443211
DOE Office of Scientific and Technical Information (OSTI.GOV)
Steenbergen, K. G., E-mail: kgsteen@gmail.com; Gaston, N.
2014-02-14
Inspired by methods of remote sensing image analysis, we analyze structural variation in cluster molecular dynamics (MD) simulations through a unique application of the principal component analysis (PCA) and Pearson Correlation Coefficient (PCC). The PCA analysis characterizes the geometric shape of the cluster structure at each time step, yielding a detailed and quantitative measure of structural stability and variation at finite temperature. Our PCC analysis captures bond structure variation in MD, which can be used to both supplement the PCA analysis as well as compare bond patterns between different cluster sizes. Relying only on atomic position data, without requirement for a priori structural input, PCA and PCC can be used to analyze both classical and ab initio MD simulations for any cluster composition or electronic configuration. Taken together, these statistical tools represent powerful new techniques for quantitative structural characterization and isomer identification in cluster MD.
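A minimal sketch of the two ingredients, working from atomic positions alone, might look like this. The details are assumptions rather than the authors' definitions: the shape measure is taken as the eigenvalues of the position covariance matrix, and the PCC is computed between the pairwise interatomic distances of two frames whose atoms share the same ordering. The function names are hypothetical.

```python
import numpy as np

def shape_eigenvalues(positions):
    """PCA of atomic positions: eigenvalues of the 3x3 position covariance,
    largest first. Their ratios describe the overall shape of the cluster."""
    centered = positions - positions.mean(axis=0)
    cov = centered.T @ centered / len(positions)
    return np.sort(np.linalg.eigvalsh(cov))[::-1]

def pair_distances(positions):
    """All interatomic distances, one entry per unordered atom pair."""
    i, j = np.triu_indices(len(positions), k=1)
    return np.linalg.norm(positions[i] - positions[j], axis=1)

def bond_pattern_pcc(frame_a, frame_b):
    """Pearson correlation between the pair-distance patterns of two frames
    (assumes both frames list the same atoms in the same order)."""
    return np.corrcoef(pair_distances(frame_a), pair_distances(frame_b))[0, 1]

# Toy frame: five atoms on a line, i.e. a maximally prolate "cluster".
rod = np.array([[float(i), 0.0, 0.0] for i in range(5)])
eigs = shape_eigenvalues(rod)          # one non-zero eigenvalue for a rod
pcc = bond_pattern_pcc(rod, 2 * rod)   # uniform scaling keeps the pattern
```

Tracking `shape_eigenvalues` frame by frame gives a time series of shape, while `bond_pattern_pcc` against a reference frame flags changes in the bonding pattern.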
Transforming Graph Data for Statistical Relational Learning
2012-10-01
Jordan, 2003), PLSA (Hofmann, 1999), ? Classification via RMN (Taskar et al., 2003) or SVM (Hasan, Chaoji, Salem, & Zaki, 2006) ? Hierarchical...dimensionality reduction methods such as Principal 407 Rossi, McDowell, Aha, & Neville Component Analysis (PCA), Principal Factor Analysis (PFA), and...clustering algorithm. Journal of the Royal Statistical Society. Series C, Applied statistics, 28, 100–108. Hasan, M. A., Chaoji, V., Salem, S., & Zaki, M
Ma, Li; Sun, Jing; Yang, Zhaoguang; Wang, Lin
2015-12-01
Heavy metal contamination has attracted widespread attention because of the strong toxicity and persistence of heavy metals. The Ganxi River, located in Chenzhou City, Southern China, has been severely polluted by lead/zinc ore mining activities. This work investigated the heavy metal pollution in agricultural soils around the Ganxi River. The total concentrations of heavy metals were determined by inductively coupled plasma-mass spectrometry. The potential risk associated with the heavy metals in soil was assessed by the Nemerow comprehensive index and the potential ecological risk index. In both methods, the study area was rated as very high risk. Multivariate statistical methods including Pearson's correlation analysis, hierarchical cluster analysis, and principal component analysis were employed to evaluate the relationships between heavy metals, as well as the correlation between heavy metals and pH, to identify the metal sources. Three distinct clusters were observed by hierarchical cluster analysis. In principal component analysis, a total of two components were extracted to explain over 90% of the total variance, both of which were associated with anthropogenic sources.
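For readers unfamiliar with the two indices, a short sketch of their usual textbook forms follows. The Nemerow comprehensive index combines the mean and the maximum of the single-metal ratios C_i/S_i, and the potential ecological risk index sums T_i * C_i / B_i over metals. All concentrations, standards and background values below are hypothetical illustration values, not data from the study; the toxic-response factors are the commonly cited Hakanson values and are assumed here.

```python
import math

def nemerow_index(conc, standard):
    """Nemerow comprehensive index: sqrt((mean(P)^2 + max(P)^2) / 2),
    where P_i = C_i / S_i is the single-factor pollution index."""
    ratios = [conc[m] / standard[m] for m in conc]
    mean_ratio = sum(ratios) / len(ratios)
    return math.sqrt((mean_ratio ** 2 + max(ratios) ** 2) / 2)

def potential_ecological_risk(conc, background, toxic_factor):
    """Potential ecological risk index: RI = sum_i T_i * C_i / B_i."""
    return sum(toxic_factor[m] * conc[m] / background[m] for m in conc)

# Hypothetical soil concentrations and reference values (mg/kg).
conc = {"Pb": 300.0, "Zn": 400.0, "Cd": 3.0}
standard = {"Pb": 100.0, "Zn": 200.0, "Cd": 1.0}     # assumed quality standard
background = {"Pb": 30.0, "Zn": 100.0, "Cd": 0.1}    # assumed background levels
toxic_factor = {"Pb": 5.0, "Zn": 1.0, "Cd": 30.0}    # commonly cited factors

p_n = nemerow_index(conc, standard)
ri = potential_ecological_risk(conc, background, toxic_factor)
```

Because the Nemerow index includes the maximum ratio, a single strongly enriched metal raises the rating even when the other metals are near the standard.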
Factor Analysis and Counseling Research
ERIC Educational Resources Information Center
Weiss, David J.
1970-01-01
Topics discussed include factor analysis versus cluster analysis, analysis of Q correlation matrices, ipsativity and factor analysis, and tests for the significance of a correlation matrix prior to application of factor analytic techniques. Techniques for factor extraction discussed include principal components, canonical factor analysis, alpha…
NASA Astrophysics Data System (ADS)
Borgelt, Christian
In clustering we often face the situation that only a subset of the available attributes is relevant for forming clusters, even though this may not be known beforehand. In such cases it is desirable to have a clustering algorithm that automatically weights attributes or even selects a proper subset. In this paper I study such an approach for fuzzy clustering, which is based on the idea to transfer an alternative to the fuzzifier (Klawonn and Höppner, What is fuzzy about fuzzy clustering? Understanding and improving the concept of the fuzzifier, In: Proc. 5th Int. Symp. on Intelligent Data Analysis, 254-264, Springer, Berlin, 2003) to attribute weighting fuzzy clustering (Keller and Klawonn, Int J Uncertain Fuzziness Knowl Based Syst 8:735-746, 2000). In addition, by reformulating Gustafson-Kessel fuzzy clustering, a scheme for weighting and selecting principal axes can be obtained. While in Borgelt (Feature weighting and feature selection in fuzzy clustering, In: Proc. 17th IEEE Int. Conf. on Fuzzy Systems, IEEE Press, Piscataway, NJ, 2008) I already presented such an approach for a global selection of attributes and principal axes, this paper extends it to a cluster-specific selection, thus arriving at a fuzzy subspace clustering algorithm (Parsons, Haque, and Liu, 2004).
Fernández-Arjona, María del Mar; Grondona, Jesús M.; Granados-Durán, Pablo; Fernández-Llebrez, Pedro; López-Ávalos, María D.
2017-01-01
It is known that microglia morphology and function are closely related, but only a few studies have objectively described different morphological subtypes. To address this issue, morphological parameters of microglial cells were analyzed in a rat model of aseptic neuroinflammation. After the injection of a single dose of the enzyme neuraminidase (NA) within the lateral ventricle (LV), an acute inflammatory process occurs. Sections from NA-injected animals and sham controls were immunolabeled with the microglial marker IBA1, which highlights ramifications and features of the cell shape. Using images obtained by section scanning, individual microglial cells were sampled from various regions (septofimbrial nucleus, hippocampus and hypothalamus) at different times post-injection (2, 4 and 12 h). Each cell yielded a set of 15 morphological parameters by means of image analysis software. Five initial parameters (including fractal measures) were statistically different in cells from NA-injected rats (most of them IL-1β positive, i.e., M1-state) compared to those from control animals (none of them IL-1β positive, i.e., surveillant state). However, additional multimodal parameters proved more suitable for hierarchical cluster analysis (HCA). This method classified the microglia population into four clusters. Furthermore, a linear discriminant analysis (LDA) suggested three specific parameters to objectively classify any microglia by a decision tree. In addition, a principal components analysis (PCA) revealed two extra valuable variables that allowed microglia to be further classified into a total of eight sub-clusters or types. The spatio-temporal distribution of these different morphotypes in our rat inflammation model made it possible to relate specific morphotypes to microglial activation status and brain location. An objective method for microglia classification based on morphological parameters is proposed.
Main points: Microglia undergo a quantifiable morphological change upon neuraminidase-induced inflammation. Hierarchical cluster and principal components analysis allow morphological classification of microglia. Brain location of microglia is a relevant factor. PMID:28848398
Silva, D M; Siqueira, M V B M; Carrasco, N F; Mantello, C C; Nascimento, W F; Veasey, E A
2016-05-23
Dioscorea is the largest genus in the Dioscoreaceae family, and includes a number of economically important species including the air yam, D. bulbifera L. This study aimed to develop new simple sequence repeat primers and characterize the genetic diversity of local varieties that originated in several municipalities of Brazil. We developed an enriched genomic library for D. bulbifera resulting in seven primers, six of which were polymorphic, and added four polymorphic loci developed for other Dioscorea species. This resulted in 10 polymorphic primers to evaluate 42 air yam accessions. Thirty-three alleles (bands) were found, with an average of 3.3 alleles per locus. The discrimination power ranged from 0.113 to 0.834, with an average of 0.595. Both principal coordinate and cluster analyses (using the Jaccard Index) failed to clearly separate the accessions according to their origins. However, the 13 accessions from Conceição dos Ouros, Minas Gerais State were clustered above zero on the principal coordinate 2 axis, and were also clustered into one subgroup in the cluster analysis. Accessions from Ubatuba, São Paulo State were clustered below zero on the same principal coordinate 2 axis, except for one accession, although they were scattered in several subgroups in the cluster analysis. Overall, we found little spatial structure in the accessions, although those from Conceição dos Ouros and Ubatuba exhibited some, and a considerable level of genetic diversity in D. bulbifera is maintained by traditional farmers in Brazil.
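The Jaccard Index used for the cluster and principal coordinate analyses compares two accessions by the bands they share. A minimal sketch is given below; the band profiles are invented for illustration, the function names are hypothetical, and returning 1.0 when neither accession shows any band is a convention assumed here.

```python
def jaccard_similarity(a, b):
    """Shared bands divided by bands present in at least one accession.
    Joint absences (0, 0) are ignored, as is usual for marker band data."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 1.0  # convention for two empty profiles

def jaccard_distance_matrix(profiles):
    """Pairwise 1 - similarity, suitable as input to clustering or PCoA."""
    names = list(profiles)
    return {(p, q): 1.0 - jaccard_similarity(profiles[p], profiles[q])
            for p in names for q in names}

# Hypothetical presence (1) / absence (0) band profiles for three accessions.
profiles = {
    "acc1": [1, 1, 0, 0, 1],
    "acc2": [1, 0, 1, 0, 1],
    "acc3": [0, 0, 1, 1, 0],
}
distances = jaccard_distance_matrix(profiles)
```

Ignoring joint absences matters for band data, because two accessions lacking the same band are not thereby shown to be related.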
NASA Astrophysics Data System (ADS)
Unglert, K.; Radić, V.; Jellinek, A. M.
2016-06-01
Variations in the spectral content of volcano seismicity related to changes in volcanic activity are commonly identified manually in spectrograms. However, long time series of monitoring data at volcano observatories require tools to facilitate automated and rapid processing. Techniques such as self-organizing maps (SOM) and principal component analysis (PCA) can help to quickly and automatically identify important patterns related to impending eruptions. For the first time, we evaluate the performance of SOM and PCA on synthetic volcano seismic spectra constructed from observations during two well-studied eruptions at Kīlauea Volcano, Hawai'i, that include features observed in many volcanic settings. In particular, our objective is to test which of the techniques can best retrieve a set of three spectral patterns that we used to compose a synthetic spectrogram. We find that, without a priori knowledge of the given set of patterns, neither SOM nor PCA can directly recover the spectra. We thus test hierarchical clustering, a commonly used method, to investigate whether clustering in the space of the principal components and on the SOM, respectively, can retrieve the known patterns. Our clustering method applied to the SOM fails to detect the correct number and shape of the known input spectra. In contrast, clustering of the data reconstructed by the first three PCA modes reproduces these patterns and their occurrence in time more consistently. This result suggests that PCA in combination with hierarchical clustering is a powerful practical tool for automated identification of characteristic patterns in volcano seismic spectra. Our results indicate that, in contrast to PCA, common clustering algorithms may not be ideal to group patterns on the SOM and that it is crucial to evaluate the performance of these tools on a control dataset prior to their application to real data.
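The PCA-plus-clustering route can be sketched on a toy synthetic spectrogram. Everything here is an assumption made for illustration: the three Gaussian-peak "spectral patterns", the noise level, the naive average-linkage routine (the abstract does not state which linkage was used), and the function names.

```python
import numpy as np

def reconstruct(X, n_modes):
    """Rebuild X (time windows x frequency bins) from its first PCA modes."""
    mean = X.mean(axis=0)
    U, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean + (U[:, :n_modes] * s[:n_modes]) @ Vt[:n_modes]

def agglomerate(X, n_clusters):
    """Naive average-linkage agglomerative clustering; fine for a small demo."""
    clusters = [[i] for i in range(len(X))]
    D = np.linalg.norm(X[:, None] - X[None, :], axis=2)
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = D[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] += clusters.pop(b)  # merge the closest pair
    labels = np.empty(len(X), dtype=int)
    for k, members in enumerate(clusters):
        labels[members] = k
    return labels

# Synthetic spectrogram: three spectral patterns, ten noisy windows each.
rng = np.random.default_rng(1)
freq = np.arange(50)
patterns = [np.exp(-0.5 * ((freq - c) / 3.0) ** 2) for c in (10, 25, 40)]
spectrogram = np.vstack([p + rng.normal(0, 0.05, (10, 50)) for p in patterns])

# Cluster the data reconstructed from the first three PCA modes.
labels = agglomerate(reconstruct(spectrogram, 3), n_clusters=3)
```

Averaging the reconstructed spectra within each label then gives one candidate pattern per cluster, which can be compared against the known inputs of a control dataset.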
Ramli, Saifullah; Ismail, Noryati; Alkarkhi, Abbas Fadhl Mubarek; Easa, Azhar Mat
2010-01-01
Banana peel flour (BPF) prepared from green or ripe Cavendish and Dream banana fruits were assessed for their total starch (TS), digestible starch (DS), resistant starch (RS), total dietary fibre (TDF), soluble dietary fibre (SDF) and insoluble dietary fibre (IDF). Principal component analysis (PCA) identified that only 1 component was responsible for 93.74% of the total variance in the starch and dietary fibre components that differentiated ripe and green banana flours. Cluster analysis (CA) applied to similar data obtained two statistically significant clusters (green and ripe bananas) to indicate difference in behaviours according to the stages of ripeness based on starch and dietary fibre components. We concluded that the starch and dietary fibre components could be used to discriminate between flours prepared from peels obtained from fruits of different ripeness. The results were also suggestive of the potential of green and ripe BPF as functional ingredients in food. PMID:24575193
Cuthbertson, Daniel; Andrews, Preston K.; Reganold, John P.; Davies, Neal M.; Lange, B. Markus
2012-01-01
A gas chromatography–mass spectrometry approach was employed to evaluate the use of metabolite patterns to differentiate fruit from six commercially grown apple cultivars harvested in 2008. Principal component analysis (PCA) of apple fruit peel and flesh data indicated that individual cultivar replicates clustered together and were separated from all other cultivar samples. An independent metabolomics investigation with fruit harvested in 2003 confirmed the separate clustering of fruit from different cultivars. Further evidence for cultivar separation was obtained using a hierarchical clustering analysis. An evaluation of PCA component loadings revealed specific metabolite classes that contributed the most to each principal component, whereas a correlation analysis demonstrated that specific metabolites correlate directly with quality traits such as antioxidant activity, total phenolics, and total anthocyanins, which are important parameters in the selection of breeding germplasm. These data sets lay the foundation for elucidating the metabolic basis of commercially important fruit quality traits. PMID:22881116
Chemometric expertise of the quality of groundwater sources for domestic use.
Spanos, Thomas; Ene, Antoaneta; Simeonova, Pavlina
2015-01-01
In the present study 49 representative sites have been selected for the collection of water samples from central water supplies with different geographical locations in the region of Kavala, Northern Greece. Ten physicochemical parameters (pH, electric conductivity, nitrate, chloride, sodium, potassium, total alkalinity, total hardness, bicarbonate and calcium) were analyzed monthly, in the period from January 2010 to December 2010. Chemometric methods were used for mining and interpreting the monitoring data (cluster analysis, principal components analysis and source apportioning by principal components regression). The clustering of the chemical indicators delivers two major clusters related to the water hardness and the mineral components (impacted by sea, bedrock and acidity factors). The sampling locations are separated into three major clusters corresponding to the spatial distribution of the sites: coastal, lowland and semi-mountainous. The principal components analysis reveals two latent factors responsible for the data structures, which are also an indication of the sources determining the groundwater quality of the region (conditionally named "mineral" factor and "water hardness" factor). The apportionment approach shows the contribution of each identified source to the total concentration of each chemical parameter. The mean values of the studied physicochemical parameters were found to be within the limits given in the 98/83/EC Directive. The water samples are appropriate for human consumption. The results of this study provide an overview of the hydrogeological profile of the water supply system for the studied area.
Multivariate Statistical Analysis of MSL APXS Bulk Geochemical Data
NASA Astrophysics Data System (ADS)
Hamilton, V. E.; Edwards, C. S.; Thompson, L. M.; Schmidt, M. E.
2014-12-01
We apply cluster and factor analyses to bulk chemical data of 130 soil and rock samples measured by the Alpha Particle X-ray Spectrometer (APXS) on the Mars Science Laboratory (MSL) rover Curiosity through sol 650. Multivariate approaches such as principal components analysis (PCA), cluster analysis, and factor analysis complement more traditional approaches (e.g., Harker diagrams), with the advantage of simultaneously examining the relationships between multiple variables for large numbers of samples. Principal components analysis has been applied with success to APXS, Pancam, and Mössbauer data from the Mars Exploration Rovers. Factor analysis and cluster analysis have been applied with success to thermal infrared (TIR) spectral data of Mars. Cluster analyses group the input data by similarity, where there are a number of different methods for defining similarity (hierarchical, density, distribution, etc.). For example, without any assumptions about the chemical contributions of surface dust, preliminary hierarchical and K-means cluster analyses clearly distinguish the physically adjacent rock targets Windjana and Stephen as being distinctly different from lithologies observed prior to Curiosity's arrival at The Kimberley. In addition, they are separated from each other, consistent with chemical trends observed in variation diagrams but without requiring assumptions about chemical relationships. We will discuss the variation in cluster analysis results as a function of clustering method and pre-processing (e.g., log transformation, correction for dust cover) and implications for interpreting chemical data. Factor analysis shares some similarities with PCA, and examines the variability among observed components of a dataset so as to reveal variations attributable to unobserved components.
Factor analysis has been used to extract the TIR spectra of components that are typically observed in mixtures and only rarely in isolation; there is the potential for similar results with data from APXS. These techniques offer new ways to understand the chemical relationships between the materials interrogated by Curiosity, and potentially their relation to materials observed by APXS instruments on other landed missions.
Description and typology of intensive Chios dairy sheep farms in Greece.
Gelasakis, A I; Valergakis, G E; Arsenos, G; Banos, G
2012-06-01
The aim was to assess the intensified dairy sheep farming systems of the Chios breed in Greece, establishing a typology that may properly describe and characterize them. The study included the total of the 66 farms of the Chios sheep breeders' cooperative Macedonia. Data were collected using a structured direct questionnaire for in-depth interviews, including questions properly selected to obtain a general description of farm characteristics and overall management practices. A multivariate statistical analysis was used on the data to obtain the most appropriate typology. Initially, principal component analysis was used to produce uncorrelated variables (principal components), which would be used for the consecutive cluster analysis. The number of clusters was decided using hierarchical cluster analysis, whereas, the farms were allocated in 4 clusters using k-means cluster analysis. The identified clusters were described and afterward compared using one-way ANOVA or a chi-squared test. The main differences were evident on land availability and use, facility and equipment availability and type, expansion rates, and application of preventive flock health programs. In general, cluster 1 included newly established, intensive, well-equipped, specialized farms and cluster 2 included well-established farms with balanced sheep and feed/crop production. In cluster 3 were assigned small flock farms focusing more on arable crops than on sheep farming with a tendency to evolve toward cluster 2, whereas cluster 4 included farms representing a rather conservative form of Chios sheep breeding with low/intermediate inputs and choosing not to focus on feed/crop production. 
In the studied set of farms, 4 different farmer attitudes were evident: 1) farming disrupts sheep breeding; feed should be purchased and economies of scale will decrease costs (mainly cluster 1), 2) only exercise/pasture land is necessary; at least part of the feed (pasture) must be home-grown to decrease costs (clusters 1 and 4), 3) providing pasture to sheep is essential; on-farm feed production decreases costs (mainly cluster 3), and 4) large-scale farming (feed production and cash crops) does not disrupt sheep breeding; all feed must be produced on-farm to decrease costs (mainly cluster 3). Conducting a profitability analysis among different clusters, exploring and discovering the most beneficial levels of intensified management and capital investment should now be considered. Copyright © 2012 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
Clustering analysis strategies for electron energy loss spectroscopy (EELS).
Torruella, Pau; Estrader, Marta; López-Ortega, Alberto; Baró, Maria Dolors; Varela, Maria; Peiró, Francesca; Estradé, Sònia
2018-02-01
In this work, the use of cluster analysis algorithms, widely applied in the field of big data, is proposed to explore and analyze electron energy loss spectroscopy (EELS) data sets. Three different data clustering approaches have been tested both with simulated and experimental data from Fe3O4/Mn3O4 core/shell nanoparticles. The first method consists of applying data clustering directly to the acquired spectra. A second approach is to analyze spectral variance with principal component analysis (PCA) within a given data cluster. Lastly, data clustering on PCA score maps is discussed. The advantages and requirements of each approach are studied. Results demonstrate how clustering is able to recover compositional and oxidation state information from EELS data with minimal user input, giving great prospects for its usage in EEL spectroscopy. Copyright © 2017 Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Ye, M.; Pacheco Castro, R. B.; Pacheco Avila, J.; Cabrera Sansores, A.
2014-12-01
The karstic aquifer of Yucatan is a vulnerable and complex system. The first fifteen meters of this aquifer have been polluted; protecting this resource is therefore important because it is the only source of potable water for the entire State. Through the assessment of groundwater quality we can gain some knowledge about the main processes governing water chemistry as well as spatial patterns which are important to establish protection zones. In this work multivariate statistical techniques are used to assess the groundwater quality of the supply wells (30 to 40 meters deep) in the hydrogeologic region of the Ring of Cenotes, located in Yucatan, Mexico. Cluster analysis and principal component analysis are applied to groundwater chemistry data from the study area. Results of principal component analysis show that the main sources of variation in the data are sea water intrusion, the interaction of the water with the carbonate rocks of the system, and some pollution processes. The cluster analysis shows that the data can be divided into four clusters. The spatial distribution of the clusters seems to be random, but is consistent with sea water intrusion and pollution with nitrates. The overall results show that multivariate statistical analysis can be successfully applied in the groundwater quality assessment of this karstic aquifer.
Common factor analysis versus principal component analysis: choice for symptom cluster research.
Kim, Hee-Ju
2008-03-01
The purpose of this paper is to examine differences between two factor analytical methods and their relevance for symptom cluster research: common factor analysis (CFA) versus principal component analysis (PCA). Literature was critically reviewed to elucidate the differences between CFA and PCA. A secondary analysis (N = 84) was utilized to show the actual result differences from the two methods. CFA analyzes only the reliable common variance of data, while PCA analyzes all the variance of data. An underlying hypothetical process or construct is involved in CFA but not in PCA. PCA tends to increase factor loadings especially in a study with a small number of variables and/or low estimated communality. Thus, PCA is not appropriate for examining the structure of data. If the study purpose is to explain correlations among variables and to examine the structure of the data (this is usual for most cases in symptom cluster research), CFA provides a more accurate result. If the purpose of a study is to summarize data with a smaller number of variables, PCA is the choice. PCA can also be used as an initial step in CFA because it provides information regarding the maximum number and nature of factors. In using factor analysis for symptom cluster research, several issues need to be considered, including subjectivity of solution, sample size, symptom selection, and level of measure.
Towards the identification of plant and animal binders on Australian stone knives.
Blee, Alisa J; Walshe, Keryn; Pring, Allan; Quinton, Jamie S; Lenehan, Claire E
2010-07-15
There is limited information regarding the nature of plant and animal residues used as adhesives, fixatives and pigments found on Australian Aboriginal artefacts. This paper reports the use of FTIR in combination with the chemometric tools principal component analysis (PCA) and hierarchical clustering (HC) for the analysis and identification of Australian plant and animal fixatives on Australian stone artefacts. Ten different plant and animal residues were able to be discriminated from each other at a species level by combining FTIR spectroscopy with the chemometric data analysis methods, principal component analysis (PCA) and hierarchical clustering (HC). Application of this method to residues from three broken stone knives from the collections of the South Australian Museum indicated that two of the handles of knives were likely to have contained beeswax as the fixative whilst Spinifex resin was the probable binder on the third. Copyright 2010 Elsevier B.V. All rights reserved.
A modified procedure for mixture-model clustering of regional geochemical data
Ellefsen, Karl J.; Smith, David B.; Horton, John D.
2014-01-01
A modified procedure is proposed for mixture-model clustering of regional-scale geochemical data. The key modification is the robust principal component transformation of the isometric log-ratio transforms of the element concentrations. This principal component transformation and the associated dimension reduction are applied before the data are clustered. The principal advantage of this modification is that it significantly improves the stability of the clustering. The principal disadvantage is that it requires subjective selection of the number of clusters and the number of principal components. To evaluate the efficacy of this modified procedure, it is applied to soil geochemical data that comprise 959 samples from the state of Colorado (USA) for which the concentrations of 44 elements are measured. The distributions of element concentrations that are derived from the mixture model and from the field samples are similar, indicating that the mixture model is a suitable representation of the transformed geochemical data. Each cluster and the associated distributions of the element concentrations are related to specific geologic and anthropogenic features. In this way, mixture model clustering facilitates interpretation of the regional geochemical data.
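The isometric log-ratio step that precedes the principal component transformation can be sketched as follows. This is a minimal illustration rather than the authors' code: it assumes pivot coordinates (one common choice of orthonormal basis), strictly positive concentrations, and a hypothetical function name `ilr`. The robust principal component step itself is not shown.

```python
import math


def ilr(composition):
    """Isometric log-ratio transform using pivot coordinates.

    Assumption: every part is strictly positive. A D-part composition
    maps to D - 1 unconstrained real coordinates.
    """
    d = len(composition)
    if d < 2 or any(x <= 0 for x in composition):
        raise ValueError("need at least two strictly positive parts")
    logs = [math.log(x) for x in composition]
    coords = []
    for i in range(d - 1):
        remaining = d - i - 1  # number of parts after position i
        mean_log_rest = sum(logs[i + 1:]) / remaining
        scale = math.sqrt(remaining / (remaining + 1))
        coords.append(scale * (logs[i] - mean_log_rest))
    return coords


# Hypothetical three-part composition (illustrative values, not survey data).
print(ilr([0.5, 0.25, 0.25]))
```

Because the transform depends only on ratios between parts, rescaling a sample (for example from fractions to mg/kg) leaves the coordinates unchanged, which is why it is applied before any variance-based method such as PCA.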
A Dimensionality Reduction-Based Multi-Step Clustering Method for Robust Vessel Trajectory Analysis
Liu, Jingxian; Wu, Kefeng
2017-01-01
The Shipboard Automatic Identification System (AIS) is crucial for navigation safety and maritime surveillance, and data mining and pattern analysis of AIS information have attracted considerable attention in terms of both basic research and practical applications. Clustering of spatio-temporal AIS trajectories can be used to identify abnormal patterns and mine customary route data for transportation safety. Thus, the capacities of navigation safety and maritime traffic monitoring could be enhanced correspondingly. However, trajectory clustering is often sensitive to undesirable outliers and is essentially more complex compared with traditional point clustering. To overcome this limitation, a multi-step trajectory clustering method is proposed in this paper for robust AIS trajectory clustering. In particular, Dynamic Time Warping (DTW), a similarity measurement method, is introduced in the first step to measure the distances between different trajectories. The calculated distances, inversely proportional to the similarities, constitute a distance matrix in the second step. Furthermore, as a widely used dimensionality reduction method, Principal Component Analysis (PCA) is exploited to decompose the obtained distance matrix. In particular, the top k principal components with above 95% accumulative contribution rate are extracted by PCA, and the number of the centers k is chosen. The k centers are found by the improved automatic center selection algorithm. In the last step, the improved center clustering algorithm with k clusters is implemented on the distance matrix to achieve the final AIS trajectory clustering results. In order to improve the accuracy of the proposed multi-step clustering algorithm, an automatic algorithm for choosing the k clusters is developed according to the similarity distance.
Numerous experiments on realistic AIS trajectory datasets in the bridge area waterway and Mississippi River have been implemented to compare our proposed method with traditional spectral clustering and fast affinity propagation clustering. Experimental results have illustrated its superior performance in terms of quantitative and qualitative evaluations. PMID:28777353
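The first steps of the pipeline described above (DTW distances, a pairwise distance matrix, and choosing k from the cumulative contribution rate) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: trajectories are taken to be lists of (x, y) points, the textbook DTW recurrence with Euclidean point cost is used, the eigenvalues are assumed to come from a separate PCA of the distance matrix, and all function names and sample values are hypothetical. The improved center selection and clustering steps are not reproduced.

```python
import math


def dtw_distance(a, b):
    """Textbook dynamic time warping between two sequences of (x, y) points."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = math.dist(a[i - 1], b[j - 1])  # Euclidean point cost
            cost[i][j] = step + min(cost[i - 1][j],      # skip a point in b
                                    cost[i][j - 1],      # skip a point in a
                                    cost[i - 1][j - 1])  # advance both
    return cost[n][m]


def distance_matrix(trajectories):
    """Symmetric matrix of pairwise DTW distances."""
    size = len(trajectories)
    matrix = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(i + 1, size):
            d = dtw_distance(trajectories[i], trajectories[j])
            matrix[i][j] = matrix[j][i] = d
    return matrix


def components_for_contribution(eigenvalues, threshold=0.95):
    """Smallest k whose leading eigenvalues reach the cumulative threshold."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    if total <= 0:
        raise ValueError("eigenvalues must sum to a positive value")
    running = 0.0
    for k, value in enumerate(ordered, start=1):
        running += value
        if running / total >= threshold:
            return k
    return len(ordered)


# Hypothetical tracks: the second repeats a point, the third is shifted north.
tracks = [
    [(0, 0), (1, 0), (2, 0)],
    [(0, 0), (1, 0), (1, 0), (2, 0)],
    [(0, 1), (1, 1), (2, 1)],
]
print(distance_matrix(tracks))
print(components_for_contribution([6.0, 3.0, 0.7, 0.3]))
```

DTW tolerates the repeated point in the second track because one sample may align with several samples of the other sequence, which is the property that makes it suitable for trajectories recorded at uneven rates.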
Schultz, K K; Bennett, T B; Nordlund, K V; Döpfer, D; Cook, N B
2016-09-01
Transition cow management has been tracked via the Transition Cow Index (TCI; AgSource Cooperative Services, Verona, WI) since 2006. Transition Cow Index was developed to measure the difference between actual and predicted milk yield at first test day to evaluate the relative success of the transition period program. This project aimed to assess TCI in relation to all commonly used Dairy Herd Improvement (DHI) metrics available through AgSource Cooperative Services. Regression analysis was used to isolate variables that were relevant to TCI, and then principal components analysis and network analysis were used to determine the relative strength and relatedness among variables. Finally, cluster analysis was used to segregate herds based on similarity of relevant variables. The DHI data were obtained from 2,131 Wisconsin dairy herds with test-day mean ≥30 cows, which were tested ≥10 times throughout the 2014 calendar year. The original list of 940 DHI variables was reduced through expert-driven selection and regression analysis to 23 variables. The K-means cluster analysis produced 5 distinct clusters. Descriptive statistics were calculated for the 23 variables per cluster grouping. Using principal components analysis, cluster analysis, and network analysis, 4 parameters were isolated as most relevant to TCI; these were energy-corrected milk, 3 measures of intramammary infection (dry cow cure rate, linear somatic cell count score in primiparous cows, and new infection rate), peak ratio, and days in milk at peak milk production. These variables together with cow and newborn calf survival measures form a group of metrics that can be used to assist in the evaluation of overall transition period performance. Copyright © 2016 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
Using Interactive Graphics to Teach Multivariate Data Analysis to Psychology Students
ERIC Educational Resources Information Center
Valero-Mora, Pedro M.; Ledesma, Ruben D.
2011-01-01
This paper discusses the use of interactive graphics to teach multivariate data analysis to Psychology students. Three techniques are explored through separate activities: parallel coordinates/boxplots; principal components/exploratory factor analysis; and cluster analysis. With interactive graphics, students may perform important parts of the…
Samsir, Sri A'jilah; Bunawan, Hamidun; Yen, Choong Chee; Noor, Normah Mohd
2016-09-01
In this dataset, we distinguish 15 accessions of Garcinia mangostana from Peninsular Malaysia using Fourier transform-infrared spectroscopy coupled with chemometric analysis. We found that the position and intensity of characteristic peaks at 3600-3100 cm⁻¹ in IR spectra allowed discrimination of G. mangostana from different locations. Further principal component analysis (PCA) of all the accessions suggests that two main clusters were formed: samples from Johor, Melaka, and Negeri Sembilan (South) were clustered together in one group while samples from Perak, Kedah, Penang, Selangor, Kelantan, and Terengganu (North and East Coast) were clustered in another group.
Analyzing coastal environments by means of functional data analysis
NASA Astrophysics Data System (ADS)
Sierra, Carlos; Flor-Blanco, Germán; Ordoñez, Celestino; Flor, Germán; Gallego, José R.
2017-07-01
Here we used Functional Data Analysis (FDA) to examine particle-size distributions (PSDs) in a beach/shallow marine sedimentary environment in Gijón Bay (NW Spain). The work involved both Functional Principal Components Analysis (FPCA) and Functional Cluster Analysis (FCA). The grain size of the sand samples was characterized by means of laser dispersion spectroscopy. Within this framework, FPCA was used as a dimension reduction technique to explore and uncover patterns in grain-size frequency curves. This procedure proved useful to describe variability in the structure of the data set. Moreover, an alternative approach, FCA, was applied to identify clusters and to interpret their spatial distribution. Results obtained with this latter technique were compared with those obtained by means of two vector approaches that combine PCA with CA (Cluster Analysis). The first method, the point density function (PDF), was employed after fitting a log-normal distribution to each PSD and summarizing each of the density functions by its mean, sorting, skewness and kurtosis. The second applied a centered log-ratio (clr) transformation to the original data. PCA was then applied to the transformed data, and finally CA to the retained principal component scores. The study revealed functional data analysis, specifically FPCA and FCA, as a suitable alternative with considerable advantages over traditional vector analysis techniques in sedimentary geology studies.
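The vector-style summary used for comparison, in which each particle-size distribution is reduced to its mean, sorting, skewness and kurtosis, can be sketched with the method of moments. This is a simplified illustration under stated assumptions: the inputs are class midpoints (for example in phi units) and class frequencies, sorting is taken to be the standard deviation, and no log-normal fit is performed first, unlike in the study. The function name and sample values are hypothetical.

```python
import math


def moment_summary(midpoints, frequencies):
    """Mean, sorting (standard deviation), skewness and kurtosis of a
    binned grain-size distribution, computed by the method of moments."""
    total = sum(frequencies)
    if total <= 0:
        raise ValueError("frequencies must sum to a positive value")
    weights = [f / total for f in frequencies]
    mean = sum(w * x for w, x in zip(weights, midpoints))
    variance = sum(w * (x - mean) ** 2 for w, x in zip(weights, midpoints))
    sorting = math.sqrt(variance)
    if sorting == 0:
        raise ValueError("all material falls in a single size class")
    skewness = sum(w * (x - mean) ** 3
                   for w, x in zip(weights, midpoints)) / sorting ** 3
    kurtosis = sum(w * (x - mean) ** 4
                   for w, x in zip(weights, midpoints)) / sorting ** 4
    return mean, sorting, skewness, kurtosis


# Hypothetical symmetric distribution over three size classes.
print(moment_summary([1.0, 2.0, 3.0], [25.0, 50.0, 25.0]))
```

Each sample then becomes a four-element vector that can be passed to an ordinary PCA or cluster analysis, which is the sense in which this approach is a vector method as opposed to a functional one.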
The Potential of Multivariate Analysis in Assessing Students' Attitude to Curriculum Subjects
ERIC Educational Resources Information Center
Gaotlhobogwe, Michael; Laugharne, Janet; Durance, Isabelle
2011-01-01
Background: Understanding student attitudes to curriculum subjects is central to providing evidence-based options to policy makers in education. Purpose: We illustrate how quantitative approaches used in the social sciences and based on multivariate analysis (categorical Principal Components Analysis, Clustering Analysis and General Linear…
Carvalho, Carolina Abreu de; Fonsêca, Poliana Cristina de Almeida; Nobre, Luciana Neri; Priore, Silvia Eloiza; Franceschini, Sylvia do Carmo Castro
2016-01-01
The objective of this study is to provide guidance for identifying dietary patterns using the a posteriori approach, and analyze the methodological aspects of the studies conducted in Brazil that identified the dietary patterns of children. Articles were selected from the Latin American and Caribbean Literature on Health Sciences, Scientific Electronic Library Online and PubMed databases. The key words were: Dietary pattern; Food pattern; Principal Components Analysis; Factor analysis; Cluster analysis; Reduced rank regression. We included studies that identified dietary patterns of children using the a posteriori approach. Seven studies published between 2007 and 2014 were selected, six of which were cross-sectional and one cohort. Five studies used the food frequency questionnaire for dietary assessment; one used a 24-hour dietary recall and the other a food list. The method of exploratory approach used in most publications was principal components factor analysis, followed by cluster analysis. The sample size of the studies ranged from 232 to 4231, the values of the Kaiser-Meyer-Olkin test from 0.524 to 0.873, and Cronbach's alpha from 0.51 to 0.69. Few Brazilian studies identified dietary patterns of children using the a posteriori approach and principal components factor analysis was the technique most used.
Kumar, Raj G; Rubin, Jonathan E; Berger, Rachel P; Kochanek, Patrick M; Wagner, Amy K
2016-03-01
Studies have characterized absolute levels of multiple inflammatory markers as significant risk factors for poor outcomes after traumatic brain injury (TBI). However, inflammatory marker concentrations are highly inter-related, and production of one may result in the production or regulation of another. Therefore, a more comprehensive characterization of the inflammatory response post-TBI should consider relative levels of markers in the inflammatory pathway. We used principal component analysis (PCA) as a dimension-reduction technique to characterize the sets of markers that contribute independently to variability in cerebrospinal fluid (CSF) inflammatory profiles after TBI. Using PCA results, we defined groups (or clusters) of individuals (n=111) with similar patterns of acute CSF inflammation that were then evaluated in the context of outcome and other relevant CSF and serum biomarkers collected days 0-3 and 4-5 post-injury. We identified four significant principal components (PC1-PC4) for CSF inflammation from days 0-3, and PC1 accounted for the greatest (31%) percentage of variance. PC1 was characterized by relatively higher CSF sICAM-1, sFAS, IL-10, IL-6, sVCAM-1, IL-5, and IL-8 levels. Cluster analysis then defined two distinct clusters, such that individuals in cluster 1 had highly positive PC1 scores and relatively higher levels of CSF cortisol, progesterone, estradiol, testosterone, brain derived neurotrophic factor (BDNF), and S100b; this group also had higher serum cortisol and lower serum BDNF. Multinomial logistic regression analyses showed that individuals in cluster 1 had a 10.9 times increased likelihood of GOS scores of 2/3 vs. 4/5 at 6 months compared to cluster 2, after controlling for covariates. Cluster group did not discriminate between mortality compared to GOS scores of 4/5 after controlling for age and other covariates. Cluster groupings also did not discriminate mortality or 12 month outcomes in multivariate models.
PCA and cluster analysis establish that a subset of CSF inflammatory markers measured in days 0-3 post-TBI may distinguish individuals with poor 6-month outcome, and future studies should prospectively validate these findings. PCA of inflammatory mediators after TBI could aid in prognostication and in identifying patient subgroups for therapeutic interventions. Copyright © 2015 Elsevier Inc. All rights reserved.
Multivariate analysis of molecular and morphological diversity in fig (Ficus carica L.)
USDA-ARS's Scientific Manuscript database
Genetic polymorphism across 15 microsatellite loci among 194 fig accessions including Common, Smyrna, San Pedro, and Caprifig were analyzed using a cluster analysis (CA) and the principal components analysis (PCA). The collection was moderately variable with observed number of alleles per locus rang...
Unsupervised analysis of small animal dynamic Cerenkov luminescence imaging
NASA Astrophysics Data System (ADS)
Spinelli, Antonello E.; Boschi, Federico
2011-12-01
Clustering analysis (CA) and principal component analysis (PCA) were applied to dynamic Cerenkov luminescence images (dCLI). In order to investigate the performances of the proposed approaches, two distinct dynamic data sets obtained by injecting mice with 32P-ATP and 18F-FDG were acquired using the IVIS 200 optical imager. The k-means clustering algorithm has been applied to dCLI and was implemented using interactive data language 8.1. We show that cluster analysis allows us to obtain good agreement between the clustered and the corresponding emission regions like the bladder, the liver, and the tumor. We also show a good correspondence between the time activity curves of the different regions obtained by using CA and manual region of interest analysis on dCLIT and PCA images. We conclude that CA provides an automatic unsupervised method for the analysis of preclinical dynamic Cerenkov luminescence image data.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jin, Ling; Harley, Robert A.; Brown, Nancy J.
Cluster analysis was applied to daily 8 h ozone maxima modeled for a summer season to characterize meteorology-induced variations in the spatial distribution of ozone. Principal component analysis is employed to form a reduced dimension set to describe and interpret ozone spatial patterns. The first three principal components (PCs) capture approximately 85% of total variance, with PC1 describing a general spatial trend, and PC2 and PC3 each describing a spatial contrast. Six clusters were identified for California's San Joaquin Valley (SJV) with two low, three moderate, and one high-ozone cluster. The moderate ozone clusters are distinguished by elevated ozone levels in different parts of the valley: northern, western, and eastern, respectively. The SJV ozone clusters have stronger coupling with the San Francisco Bay area (SFB) than with the Sacramento Valley (SV). Variations in ozone spatial distributions induced by anthropogenic emission changes are small relative to the overall variations in ozone anomalies observed for the whole summer. Ozone regimes identified here are mostly determined by the direct and indirect meteorological effects. Existing measurement sites are sufficiently representative to capture ozone spatial patterns in the SFB and SV, but the western side of the SJV is under-sampled.
Worldwide Topology of the Scientific Subject Profile: A Macro Approach in the Country Level
Moya-Anegón, Félix; Herrero-Solana, Víctor
2013-01-01
Background: Models for the production of knowledge and systems of innovation and science are key elements for characterizing a country in view of its scientific thematic profile. With regard to scientific output and publication in journals of international visibility, the countries of the world may be classified into three main groups according to their thematic bias. Methodology/Principal Findings: This paper aims to classify the countries of the world in several broad groups, described in terms of behavioural models that attempt to sum up the characteristics of their systems of knowledge and innovation. We perceive three clusters in our analysis: 1) the biomedical cluster, 2) the basic science & engineering cluster, and 3) the agricultural cluster. The countries are conceptually associated with the clusters via Principal Component Analysis (PCA), and a Multidimensional Scaling (MDS) map with all the countries is presented. Conclusions/Significance: As we have seen, insofar as scientific output and publication in journals of international visibility is concerned, the countries of the world may be classified into three main groups according to their thematic profile. These groups can be described in terms of behavioral models that attempt to sum up the characteristics of their systems of knowledge and innovation. PMID:24349467
Sensory characteristics and consumer preference for chicken meat in Guinea.
Sow, T M A; Grongnet, J F
2010-10-01
This study identified the sensory characteristics and consumer preference for chicken meat in Guinea. Five chicken samples [live village chicken, live broiler, live spent laying hen, ready-to-cook broiler, and ready-to-cook broiler (imported)] bought from different locations were assessed by 10 trained panelists using 19 sensory attributes. The ANOVA results showed that 3 chicken appearance attributes (brown, yellow, and white), 5 chicken odor attributes (oily, intense, medicine smell, roasted, and mouth persistent), 3 chicken flavor attributes (sweet, bitter, and astringent), and 8 chicken texture attributes (firm, tender, juicy, chew, smooth, springy, hard, and fibrous) were significantly discriminating between the chicken samples (P<0.05). Principal component analysis of the sensory data showed that the first 2 principal components explained 84% of the sensory data variance. The principal component analysis results showed that the live village chicken, the live spent laying hen, and the ready-to-cook broiler (imported) were very well represented and clearly distinguished from the live broiler and the ready-to-cook broiler. One hundred twenty consumers expressed their preferences for the chicken samples using a 5-point Likert scale. The hierarchical cluster analysis of the preference data identified 4 homogenous consumer clusters. The hierarchical cluster analysis results showed that the live village chicken was the most preferred chicken sample, whereas the ready-to-cook broiler was the least preferred one. The partial least squares regression (PLSR) type 1 showed that 72% of the sensory data for the first 2 principal components explained 83% of the chicken preference. The PLSR1 identified that the sensory characteristics juicy, oily, sweet, hard, mouth persistent, and yellow were the most relevant sensory drivers of the Guinean chicken preference. 
The PLSR2 (with multiple responses) identified the relationship between the chicken samples, their sensory attributes, and the consumer clusters. Our results showed that there was not a chicken category that was exclusively preferred from the other chicken samples and therefore highlight the existence of place for development of all chicken categories in the local market.
Cluster Analysis of Atmospheric Dynamics and Pollution Transport in a Coastal Area
NASA Astrophysics Data System (ADS)
Sokolov, Anton; Dmitriev, Egor; Maksimovich, Elena; Delbarre, Hervé; Augustin, Patrick; Gengembre, Cyril; Fourmentin, Marc; Locoge, Nadine
2016-11-01
Summertime atmospheric dynamics in the coastal zone of the industrialized Dunkerque agglomeration in northern France was characterized by a cluster analysis of back trajectories in the context of pollution transport. The MESO-NH atmospheric model was used to simulate the local dynamics at multiple scales with horizontal resolution down to 500 m, and for the online calculation of the Lagrangian backward trajectories with 30-min temporal resolution. Airmass transport was performed along six principal pathways obtained by the weighted k-means clustering technique. Four of these centroids corresponded to a range of wind speeds over the English Channel: two for wind directions from the north-east and two from the south-west. Another pathway corresponded to a south-westerly continental transport. The backward trajectories of the largest and most dispersed sixth cluster contained low wind speeds, including sea-breeze circulations. Based on analyses of meteorological data and pollution measurements, the principal atmospheric pathways were related to local air-contamination events. Continuous air quality and meteorological data were collected during the Benzene-Toluene-Ethylbenzene-Xylene 2006 campaign. The sites of the pollution measurements served as the endpoints for the backward trajectories. Pollutant transport pathways corresponding to the highest air contamination were defined.
Papaleo, Elena; Mereghetti, Paolo; Fantucci, Piercarlo; Grandori, Rita; De Gioia, Luca
2009-01-01
Several molecular dynamics (MD) simulations were used to sample conformations in the neighborhood of the native structure of holo-myoglobin (holo-Mb), collecting trajectories spanning 0.22 µs at 300 K. Principal component (PCA) and free-energy landscape (FEL) analyses, integrated by cluster analysis, which was performed considering the position and structures of the individual helices of the globin fold, were carried out. The coherence between the different structural clusters and the basins of the FEL, together with the convergence of parameters derived by PCA indicates that an accurate description of the Mb conformational space around the native state was achieved by multiple MD trajectories spanning at least 0.14 µs. The integration of FEL, PCA, and structural clustering was shown to be a very useful approach to gain an overall view of the conformational landscape accessible to a protein and to identify representative protein substates. This method could be also used to investigate the conformational and dynamical properties of Mb apo-, mutant, or delete versions, in which greater conformational variability is expected and, therefore identification of representative substates from the simulations is relevant to disclose structure-function relationship.
Vidigal, Pedrina Gonçalves; Mosel, Frank; Koehling, Hedda Luise; Mueller, Karl Dieter; Buer, Jan; Rath, Peter Michael; Steinmann, Joerg
2014-12-01
Stenotrophomonas maltophilia is an opportunist multidrug-resistant pathogen that causes a wide range of nosocomial infections. Various cystic fibrosis (CF) centres have reported an increasing prevalence of S. maltophilia colonization/infection among patients with this disease. The purpose of this study was to assess specific fingerprints of S. maltophilia isolates from CF patients (n = 71) by investigating fatty acid methyl esters (FAMEs) through gas chromatography (GC) and highly abundant proteins by matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS), and to compare them with isolates obtained from intensive care unit (ICU) patients (n = 20) and the environment (n = 11). Principal component analysis (PCA) of GC-FAME patterns did not reveal a clustering corresponding to distinct CF, ICU or environmental types. Based on the peak area index, it was observed that S. maltophilia isolates from CF patients produced significantly higher amounts of fatty acids in comparison with ICU patients and the environmental isolates. Hierarchical cluster analysis (HCA) based on the MALDI-TOF MS peak profiles of S. maltophilia revealed the presence of five large clusters, suggesting a high phenotypic diversity. Although HCA of MALDI-TOF mass spectra did not result in distinct clusters predominantly composed of CF isolates, PCA revealed the presence of a distinct cluster composed of S. maltophilia isolates from CF patients. Our data suggest that S. maltophilia colonizing CF patients tend to modify not only their fatty acid patterns but also their protein patterns as a response to adaptation in the unfavourable environment of the CF lung. © 2014 The Authors.
Performance evaluation of PCA-based spike sorting algorithms.
Adamos, Dimitrios A; Kosmidis, Efstratios K; Theophilidis, George
2008-09-01
Deciphering the electrical activity of individual neurons from multi-unit noisy recordings is critical for understanding complex neural systems. A widely used spike sorting algorithm is being evaluated for single-electrode nerve trunk recordings. The algorithm is based on principal component analysis (PCA) for spike feature extraction. In the neuroscience literature it is generally assumed that the use of the first two or most commonly three principal components is sufficient. We estimate the optimum PCA-based feature space by evaluating the algorithm's performance on simulated series of action potentials. A number of modifications are made to the open source nev2lkit software to enable systematic investigation of the parameter space. We introduce a new metric to define clustering error considering over-clustering more favorable than under-clustering as proposed by experimentalists for our data. Both the program patch and the metric are available online. Correlated and white Gaussian noise processes are superimposed to account for biological and artificial jitter in the recordings. We report that the employment of more than three principal components is in general beneficial for all noise cases considered. Finally, we apply our results to experimental data and verify that the sorting process with four principal components is in agreement with a panel of electrophysiology experts.
Evaluation of Low-Voltage Distribution Network Index Based on Improved Principal Component Analysis
NASA Astrophysics Data System (ADS)
Fan, Hanlu; Gao, Suzhou; Fan, Wenjie; Zhong, Yinfeng; Zhu, Lei
2018-01-01
In order to evaluate the development level of the low-voltage distribution network objectively and scientifically, chromatography analysis method is utilized to construct evaluation index model of low-voltage distribution network. Based on the analysis of principal component and the characteristic of logarithmic distribution of the index data, a logarithmic centralization method is adopted to improve the principal component analysis algorithm. The algorithm can decorrelate and reduce the dimensions of the evaluation model and the comprehensive score has a better dispersion degree. The clustering method is adopted to analyse the comprehensive score because the comprehensive score of the courts is concentrated. Then the stratification evaluation of the courts is realized. An example is given to verify the objectivity and scientificity of the evaluation method.
Khamis, Fathiya M.; Masiga, Daniel K.; Mohamed, Samira A.; Salifu, Daisy; de Meyer, Marc; Ekesi, Sunday
2012-01-01
In 2003, a new fruit fly pest species was recorded for the first time in Kenya and has subsequently been found in 28 countries across tropical Africa. The insect was described as Bactrocera invadens, due to its rapid invasion of the African continent. In this study, the morphometry and DNA Barcoding of different populations of B. invadens distributed across the species range of tropical Africa and a sample from the pest's putative aboriginal home of Sri Lanka was investigated. Morphometry using wing veins and tibia length was used to separate B. invadens populations from other closely related Bactrocera species. The Principal component analysis yielded 15 components which correspond to the 15 morphometric measurements. The first two principal axes contributed to 90.7% of the total variance and showed partial separation of these populations. Canonical discriminant analysis indicated that only the first five canonical variates were statistically significant. The first two canonical variates contributed a total of 80.9% of the total variance clustering B. invadens with other members of the B. dorsalis complex while distinctly separating B. correcta, B. cucurbitae, B. oleae and B. zonata. The largest Mahalanobis squared distance (D² = 122.9) was found to be between B. cucurbitae and B. zonata, while the lowest was observed between B. invadens populations against B. kandiensis (8.1) and against B. dorsalis s.s. (11.4). Evolutionary history inferred by the Neighbor-Joining method clustered the Bactrocera species populations into four clusters. First cluster consisted of the B. dorsalis complex (B. invadens, B. kandiensis and B. dorsalis s.s.), branching from the same node while the second group was paraphyletic clades of B. correcta and B. zonata. The last two are monophyletic clades, consisting of B. cucurbitae and B. oleae, respectively. Principal component analysis using the genetic distances confirmed the clustering inferred by the NJ tree. PMID:23028649
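The Mahalanobis squared distance reported between species groups is the quadratic form of the difference between two group mean vectors, weighted by the inverse covariance matrix. A minimal sketch follows. The abstract does not state how the covariance was estimated, so the sketch assumes the caller supplies a pooled within-group covariance matrix; the function names and the numbers are hypothetical and are not the study's measurements.

```python
def solve_linear(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):  # back substitution
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


def mahalanobis_squared(mean_a, mean_b, pooled_cov):
    """D^2 = (a - b)^T S^-1 (a - b), computed without forming S^-1."""
    diff = [a - b for a, b in zip(mean_a, mean_b)]
    weighted = solve_linear(pooled_cov, diff)
    return sum(d * w for d, w in zip(diff, weighted))


# Hypothetical group means for two measurements and an assumed covariance.
print(mahalanobis_squared([3.0, 4.0], [0.0, 0.0], [[4.0, 0.0], [0.0, 16.0]]))
```

Solving the linear system instead of inverting the covariance matrix is a design choice for numerical stability; with an identity covariance the result reduces to the squared Euclidean distance between the means.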
Hydrochemical and multivariate analysis of groundwater quality in the northwest of Sinai, Egypt.
El-Shahat, M F; Sadek, M A; Salem, W M; Embaby, A A; Mohamed, F A
2017-08-01
The northwestern coast of Sinai is home to many economic activities and development programs, thus evaluation of the potentiality and vulnerability of water resources is important. The present work has been conducted on the groundwater resources of this area to describe the major features of groundwater quality and the principal factors that control salinity evolution. The major ionic content of 39 groundwater samples collected from the Quaternary aquifer shows high coefficients of variation, reflecting asymmetry of aquifer recharge. The groundwater samples have been classified into four clusters using hierarchical cluster analysis; these match the variety of total dissolved solids, water types and ionic orders. The principal component analysis combined the ionic parameters of the studied groundwater samples into two principal components. The first represents about 56% of the whole sample variance, reflecting salinization due to evaporation, leaching, dissolution of marine salts and/or seawater intrusion. The second represents about 15.8%, reflecting dilution with rain water and the El-Salam Canal. Most groundwater samples were not suitable for human consumption and about 41% are suitable for irrigation. However, all groundwater samples are suitable for cattle, and about 69% and 15% are suitable for horses and poultry, respectively.
Yücel, Yasin; Sultanoğlu, Pınar
2013-09-01
Chemical characterisation has been carried out on 45 honey samples collected from the Hatay region of Turkey. The concentrations of 17 elements were determined by inductively coupled plasma optical emission spectrometry (ICP-OES). Ca, K, Mg and Na were the most abundant elements, with mean contents of 219.38, 446.93, 49.06 and 95.91 mg kg⁻¹ respectively. The trace element mean contents ranged between 0.03 and 15.07 mg kg⁻¹. Chemometric methods such as principal component analysis (PCA) and cluster analysis (CA) techniques were applied to classify honey according to mineral content. The first principal component (PC) was strongly associated with the values of Al, B, Cd and Co. CA showed eight clusters corresponding to the eight botanical origins of honey. PCA explained 75.69% of the variance with the first six PCs. Chemometric analysis of the analytical data allowed the accurate classification of the honey samples according to origin. Copyright © 2013 Elsevier Ltd. All rights reserved.
STAR FORMATION ACROSS THE W3 COMPLEX
DOE Office of Scientific and Technical Information (OSTI.GOV)
Román-Zúñiga, Carlos G.; Ybarra, Jason E.; Tapia, Mauricio
We present a multi-wavelength analysis of the history of star formation in the W3 complex. Using deep, near-infrared ground-based images combined with images obtained with Spitzer and Chandra observatories, we identified and classified young embedded sources. We identified the principal clusters in the complex and determined their structure and extension. We constructed extinction-limited samples for five principal clusters and constructed K-band luminosity functions that we compare with those of artificial clusters with varying ages. This analysis provided mean ages and possible age spreads for the clusters. We found that IC 1795, the centermost cluster of the complex, still hosts a large fraction of young sources with circumstellar disks. This indicates that star formation was active in IC 1795 as recently as 2 Myr ago, simultaneous to the star-forming activity in the flanking embedded clusters, W3-Main and W3(OH). A comparison with carbon monoxide emission maps indicates strong velocity gradients in the gas clumps hosting W3-Main and W3(OH) and shows small receding clumps of gas at IC 1795, suggestive of rapid gas removal (faster than the T Tauri timescale) in the cluster-forming regions. We discuss one possible scenario for the progression of cluster formation in the W3 complex. We propose that early processes of gas collapse in the main structure of the complex could have defined the progression of cluster formation across the complex with relatively small age differences from one group to another. However, triggering effects could act as catalysts for enhanced efficiency of formation at a local level, in agreement with previous studies.
Recuerda, Maximilien; Périé, Delphine; Gilbert, Guillaume; Beaudoin, Gilles
2012-10-12
The treatment planning of spine pathologies requires information on the rigidity and permeability of the intervertebral discs (IVDs). Magnetic resonance imaging (MRI) offers great potential as a sensitive and non-invasive technique for describing the mechanical properties of IVDs. However, the literature has reported small correlation coefficients between mechanical properties and MRI parameters. Our hypothesis is that the compressive modulus and the permeability of the IVD can be predicted by a linear combination of MRI parameters. Sixty IVDs were harvested from bovine tails and randomly separated into four groups (in-situ, digested-6h, digested-18h, digested-24h). Multi-parametric MRI acquisitions were used to quantify the relaxation times T1 and T2, the magnetization transfer ratio MTR, the apparent diffusion coefficient ADC and the fractional anisotropy FA. Unconfined compression, confined compression and direct permeability measurements were performed to quantify the compressive moduli and the hydraulic permeabilities. Differences between groups were evaluated with a one-way ANOVA. Multilinear regressions were performed between dependent mechanical properties and independent MRI parameters to verify our hypothesis. A principal component analysis was used to convert the set of possibly correlated variables into a set of linearly uncorrelated variables. Agglomerative hierarchical clustering was performed on the 3 principal components. Multilinear regressions showed that 45 to 80% of the Young's modulus E, the aggregate modulus in absence of deformation HA0, the radial permeability kr and the axial permeability in absence of deformation k0 can be explained by the MRI parameters within both the nucleus pulposus and the annulus fibrosus. The principal component analysis reduced our variables to two principal components with a cumulative variability of 52-65%, which increased to 70-82% when considering the third principal component. The dendrograms showed a natural division into four clusters for the nucleus pulposus and into three or four clusters for the annulus fibrosus. The compressive moduli and the permeabilities of isolated IVDs can be assessed mostly by MT and diffusion sequences. However, the relationships have to be improved with the inclusion of MRI parameters more sensitive to IVD degeneration. Before this technique is used to quantify the mechanical properties of IVDs in vivo in patients suffering from various diseases, the relationships have to be defined for each degeneration state of the tissue that mimics the pathology. Our MRI protocol, combined with principal component analysis and agglomerative hierarchical clustering, is a promising tool to classify degenerated intervertebral discs and to find biomarkers and predictive factors of the evolution of the pathologies.
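The agglomerative hierarchical clustering step described above can be sketched as follows. This is a minimal illustration that assumes the input is a small table of principal component scores (one row per disc) and that average linkage on Euclidean distances is used. The abstract does not state the linkage criterion, so that choice and the function name `average_linkage` are assumptions.

```python
import numpy as np

def average_linkage(points, n_clusters):
    """Naive agglomerative clustering with average linkage.

    Starts with one cluster per row and repeatedly merges the two
    clusters with the smallest mean pairwise Euclidean distance until
    n_clusters remain. Intended for small samples (tens of rows).
    """
    points = np.asarray(points, dtype=float)
    clusters = [[i] for i in range(len(points))]
    # Full pairwise Euclidean distance matrix.
    dist = np.sqrt(((points[:, None] - points[None, :]) ** 2).sum(axis=2))
    while len(clusters) > n_clusters:
        best = (np.inf, 0, 1)
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = dist[np.ix_(clusters[i], clusters[j])].mean()
                if d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge j into i
        del clusters[j]
    labels = np.empty(len(points), dtype=int)
    for k, members in enumerate(clusters):
        labels[members] = k
    return labels
```

Cutting the tree at a fixed number of clusters, as done here, corresponds to reading a dendrogram at the height where the chosen number of branches remains.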
Computational gene expression profiling under salt stress reveals patterns of co-expression
Sanchita; Sharma, Ashok
2016-01-01
Plants respond differently to environmental conditions. Among various abiotic stresses, salt stress is a condition where excess salt in soil causes inhibition of plant growth. To understand the response of plants to the stress conditions, identification of the responsible genes is required. Clustering is a data mining technique used to group the genes with similar expression. The genes of a cluster show similar expression and function. We applied clustering algorithms on gene expression data of Solanum tuberosum showing differential expression in Capsicum annuum under salt stress. The clusters that were common to multiple algorithms were taken further for analysis. Principal component analysis (PCA) further validated the findings of other cluster algorithms by visualizing their clusters in three-dimensional space. Functional annotation results revealed that most of the genes were involved in stress-related responses. Our findings suggest that these algorithms may be helpful in the prediction of the function of co-expressed genes. PMID:26981411
Fleming, Brandon J.; LaMotte, Andrew E.; Sekellick, Andrew J.
2013-01-01
Hydrogeologic regions in the fractured rock area of Maryland were classified using geographic information system tools with principal components and cluster analyses. A study area consisting of the 8-digit Hydrologic Unit Code (HUC) watersheds with rivers that flow through the fractured rock area of Maryland and bounded by the Fall Line was further subdivided into 21,431 catchments from the National Hydrography Dataset Plus. The catchments were then used as a common hydrologic unit to compile relevant climatic, topographic, and geologic variables. A principal components analysis was performed on 10 input variables, and 4 principal components that accounted for 83 percent of the variability in the original data were identified. A subsequent cluster analysis grouped the catchments based on four principal component scores into six hydrogeologic regions. Two crystalline rock hydrogeologic regions, including large parts of the Washington, D.C. and Baltimore metropolitan regions that represent over 50 percent of the fractured rock area of Maryland, are distinguished by differences in recharge, Precipitation minus Potential Evapotranspiration, sand content in soils, and groundwater contributions to streams. This classification system will provide a georeferenced digital hydrogeologic framework for future investigations of groundwater availability in the fractured rock area of Maryland.
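The two-stage workflow in the abstract above (principal components analysis on standardized variables, then clustering on the retained component scores) can be sketched roughly as below. The abstract does not name the clustering algorithm, so k-means is used here purely as an assumption; `pca_scores`, `kmeans`, and the 0.83 variance target are illustrative names and values taken from or modelled on the text, not the authors' code.

```python
import numpy as np

def pca_scores(X, var_target=0.83):
    """Standardize columns, run PCA via SVD, and keep the fewest
    components whose cumulative explained variance reaches var_target.
    Assumes no column is constant."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    k = int(np.searchsorted(np.cumsum(explained), var_target)) + 1
    k = min(k, len(s))
    return Z @ Vt[:k].T, explained[:k]

def kmeans(points, n_clusters, n_iter=100, seed=0):
    """Plain Lloyd's k-means with random initial centres."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), n_clusters, replace=False)]
    for _ in range(n_iter):
        d = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j)
            else centers[j]  # keep the old centre if a cluster empties
            for j in range(n_clusters)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers
```

With a catchment-by-variable table `X`, a call such as `scores, explained = pca_scores(X)` followed by `labels, _ = kmeans(scores, 6)` would assign each catchment to one of six groups under these assumptions.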
Formation of charged nanoparticles in hydrocarbon flames: principal mechanisms
NASA Astrophysics Data System (ADS)
Starik, A. M.; Savel'ev, A. M.; Titova, N. S.
2008-11-01
The processes of charged gaseous and particulate species formation in sooting hydrocarbon/air flame are studied. The original kinetic model, comprising the chemistry of neutral and charged gaseous species, generation of primary clusters, which then undergo charging due to attachment of ions and electrons to clusters and via thermoemission, and coagulation of charged-charged, charged-neutral and neutral-neutral particles, is reported. The analysis shows that the principal mechanisms of charged particle origin in hydrocarbon flames are associated with the attachment of ions and electrons produced in the course of chemoionization reactions to primary small clusters and particles and coagulation via charged-charged and charged-neutral particle interaction. Thermal ionization of particles does not play a significant role in the particle charging. This paper was presented at the Third International Symposium on Nonequilibrium Process, combustion, and Atmospheric Phenomena (Dagomys, Sochi, Russia, 25-29 June 2007).
Solid-state NMR/NQR and first-principles study of two niobium halide cluster compounds.
Perić, Berislav; Gautier, Régis; Pickard, Chris J; Bosiočić, Marko; Grbić, Mihael S; Požek, Miroslav
2014-01-01
Two hexanuclear niobium halide cluster compounds with a [Nb6X12](2+) (X=Cl, Br) diamagnetic cluster core have been studied by a combination of experimental solid-state NMR/NQR techniques and PAW/GIPAW calculations. For niobium sites the NMR parameters were determined by using variable B0 field static broadband NMR measurements and additional NQR measurements. It was found that they possess large positive chemical shifts, contrary to the majority of niobium compounds studied so far by solid-state NMR, but in accordance with chemical shifts of (95)Mo nuclei in structurally related compounds containing [Mo6Br8](4+) cluster cores. Experimentally determined δiso((93)Nb) values are in the range from 2,400 to 3,000 ppm. A detailed analysis of geometrical relations between computed electric field gradient (EFG) and chemical shift (CS) tensors with respect to structural features of cluster units was carried out. These tensors on niobium sites are almost axially symmetric with parallel orientation of the largest EFG and the smallest CS principal axes (Vzz and δ33) coinciding with the molecular four-fold axis of the [Nb6X12](2+) unit. Bridging halogen sites are characterized by large asymmetry of EFG and CS tensors: the largest EFG principal axis (Vzz) is perpendicular to the X-Nb bonds, while the intermediate EFG principal axis (Vyy) and the largest CS principal axis (δ11) are oriented in the radial direction with respect to the center of the cluster unit. For the more symmetrical bromide compound the PAW predictions for EFG parameters are in better correspondence with the NMR/NQR measurements than for the less symmetrical chloride compound. Theoretically predicted NMR parameters of bridging halogen sites were checked by (79/81)Br NQR and (35)Cl solid-state NMR measurements. Copyright © 2014 Elsevier Inc. All rights reserved.
Song, Yuqiao; Liao, Jie; Dong, Junxing; Chen, Li
2015-09-01
The seeds of grapevine (Vitis vinifera) are a byproduct of wine production. To examine the potential value of grape seeds, grape seeds from seven sources were subjected to fingerprinting using direct analysis in real time coupled with time-of-flight mass spectrometry combined with chemometrics. Firstly, we listed all reported components (56 components) from grape seeds and calculated the precise m/z values of the deprotonated ions [M-H]⁻. Secondly, the experimental conditions were systematically optimized based on the peak areas of total ion chromatograms of the samples. Thirdly, the seven grape seed samples were examined using the optimized method. Information about 20 grape seed components was utilized to represent characteristic fingerprints. Finally, hierarchical clustering analysis and principal component analysis were performed to analyze the data. Grape seeds from seven different sources were classified into two clusters; hierarchical clustering analysis and principal component analysis yielded similar results. The results of this study lay the foundation for appropriate utilization and exploitation of grape seed samples. Due to the absence of complicated sample preparation methods and chromatographic separation, the method developed in this study represents one of the simplest and least time-consuming methods for grape seed fingerprinting. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Chen, Yanxian; Chang, Billy Heung Wing; Ding, Xiaohu; He, Mingguang
2016-11-22
In the present study we attempt to use hypothesis-independent analysis in investigating the patterns in refraction growth in Chinese children, and to explore the possible risk factors affecting the different components of progression, as defined by Principal Component Analysis (PCA). A total of 637 first-born twins in Guangzhou Twin Eye Study with 6-year annual visits (baseline age 7-15 years) were available in the analysis. Cluster 1 to 3 were classified after a partitioning clustering, representing stable, slow and fast progressing groups of refraction respectively. Baseline age and refraction, paternal refraction, maternal refraction and proportion of two myopic parents showed significant differences across the three groups. Three major components of progression were extracted using PCA: "Average refraction", "Acceleration" and the combination of "Myopia stabilization" and "Late onset of refraction progress". In regression models, younger children with more severe myopia were associated with larger "Acceleration". The risk factors of "Acceleration" included change of height and weight, near work, and parental myopia, while female gender, change of height and weight were associated with "Stabilization", and increased outdoor time was related to "Late onset of refraction progress". We therefore concluded that genetic and environmental risk factors have different impacts on patterns of refraction progression.
Chen, Yanxian; Chang, Billy Heung Wing; Ding, Xiaohu; He, Mingguang
2016-01-01
In the present study we attempt to use hypothesis-independent analysis in investigating the patterns in refraction growth in Chinese children, and to explore the possible risk factors affecting the different components of progression, as defined by Principal Component Analysis (PCA). A total of 637 first-born twins in Guangzhou Twin Eye Study with 6-year annual visits (baseline age 7–15 years) were available in the analysis. Cluster 1 to 3 were classified after a partitioning clustering, representing stable, slow and fast progressing groups of refraction respectively. Baseline age and refraction, paternal refraction, maternal refraction and proportion of two myopic parents showed significant differences across the three groups. Three major components of progression were extracted using PCA: “Average refraction”, “Acceleration” and the combination of “Myopia stabilization” and “Late onset of refraction progress”. In regression models, younger children with more severe myopia were associated with larger “Acceleration”. The risk factors of “Acceleration” included change of height and weight, near work, and parental myopia, while female gender, change of height and weight were associated with “Stabilization”, and increased outdoor time was related to “Late onset of refraction progress”. We therefore concluded that genetic and environmental risk factors have different impacts on patterns of refraction progression. PMID:27874105
Burnett, Andrew D; Fan, Wenhui; Upadhya, Prashanth C; Cunningham, John E; Hargreaves, Michael D; Munshi, Tasnim; Edwards, Howell G M; Linfield, Edmund H; Davies, A Giles
2009-08-01
Terahertz frequency time-domain spectroscopy has been used to analyse a wide range of samples containing cocaine hydrochloride, heroin and ecstasy--common drugs-of-abuse. We investigated real-world samples seized by law enforcement agencies, together with pure drugs-of-abuse, and pure drugs-of-abuse systematically adulterated in the laboratory to emulate real-world samples. In order to investigate the feasibility of automatic spectral recognition of such illicit materials by terahertz spectroscopy, principal component analysis was employed to cluster spectra of similar compounds.
Vavougios, George D; George D, George; Pastaka, Chaido; Zarogiannis, Sotirios G; Gourgoulianis, Konstantinos I
2016-02-01
Phenotyping obstructive sleep apnea syndrome's comorbidity has been attempted for the first time only recently. The aim of our study was to determine phenotypes of comorbidity in obstructive sleep apnea syndrome patients employing a data-driven approach. Data from 1472 consecutive patient records were recovered from our hospital's database. Categorical principal component analysis and two-step clustering were employed to detect distinct clusters in the data. Univariate comparisons between clusters included one-way analysis of variance with Bonferroni correction and chi-square tests. Predictors of pairwise cluster membership were determined via a binary logistic regression model. The analyses revealed six distinct clusters: A, 'healthy, reporting sleeping related symptoms'; B, 'mild obstructive sleep apnea syndrome without significant comorbidities'; C1: 'moderate obstructive sleep apnea syndrome, obesity, without significant comorbidities'; C2: 'moderate obstructive sleep apnea syndrome with severe comorbidity, obesity and the exclusive inclusion of stroke'; D1: 'severe obstructive sleep apnea syndrome and obesity without comorbidity and a 33.8% prevalence of hypertension'; and D2: 'severe obstructive sleep apnea syndrome with severe comorbidities, along with the highest Epworth Sleepiness Scale score and highest body mass index'. Clusters differed significantly in apnea-hypopnea index, oxygen desaturation index; arousal index; age, body mass index, minimum oxygen saturation and daytime oxygen saturation (one-way analysis of variance P < 0.0001). Binary logistic regression indicated that older age, greater body mass index, lower daytime oxygen saturation and hypertension were associated independently with an increased risk of belonging in a comorbid cluster. Six distinct phenotypes of obstructive sleep apnea syndrome and its comorbidities were identified. 
Mapping the heterogeneity of the obstructive sleep apnea syndrome may help the early identification of at-risk groups. Finally, determining predictors of comorbidity for the moderate and severe strata of these phenotypes implies a need to take these factors into account when considering obstructive sleep apnea syndrome treatment options. © 2015 The Authors. Journal of Sleep Research published by John Wiley & Sons Ltd on behalf of European Sleep Research Society.
Application of multivariable statistical techniques in plant-wide WWTP control strategies analysis.
Flores, X; Comas, J; Roda, I R; Jiménez, L; Gernaey, K V
2007-01-01
The main objective of this paper is to present the application of selected multivariable statistical techniques in plant-wide wastewater treatment plant (WWTP) control strategies analysis. In this study, cluster analysis (CA), principal component analysis/factor analysis (PCA/FA) and discriminant analysis (DA) are applied to the evaluation matrix data set obtained by simulation of several control strategies applied to the plant-wide IWA Benchmark Simulation Model No 2 (BSM2). These techniques make it possible i) to determine natural groups or clusters of control strategies with similar behaviour, ii) to find and interpret hidden, complex and causal relations in the data set and iii) to identify important discriminant variables within the groups found by the cluster analysis. This study illustrates the usefulness of multivariable statistical techniques for both analysis and interpretation of complex multicriteria data sets and allows an improved use of information for effective evaluation of control strategies.
Borri, Marco; Schmidt, Maria A; Powell, Ceri; Koh, Dow-Mu; Riddell, Angela M; Partridge, Mike; Bhide, Shreerang A; Nutting, Christopher M; Harrington, Kevin J; Newbold, Katie L; Leach, Martin O
2015-01-01
To describe a methodology, based on cluster analysis, to partition multi-parametric functional imaging data into groups (or clusters) of similar functional characteristics, with the aim of characterizing functional heterogeneity within head and neck tumour volumes. To evaluate the performance of the proposed approach on a set of longitudinal MRI data, analysing the evolution of the obtained sub-sets with treatment. The cluster analysis workflow was applied to a combination of dynamic contrast-enhanced and diffusion-weighted imaging MRI data from a cohort of squamous cell carcinoma of the head and neck patients. Cumulative distributions of voxels, containing pre and post-treatment data and including both primary tumours and lymph nodes, were partitioned into k clusters (k = 2, 3 or 4). Principal component analysis and cluster validation were employed to investigate data composition and to independently determine the optimal number of clusters. The evolution of the resulting sub-regions with induction chemotherapy treatment was assessed relative to the number of clusters. The clustering algorithm was able to separate clusters which significantly reduced in voxel number following induction chemotherapy from clusters with a non-significant reduction. Partitioning with the optimal number of clusters (k = 4), determined with cluster validation, produced the best separation between reducing and non-reducing clusters. The proposed methodology was able to identify tumour sub-regions with distinct functional properties, independently separating clusters which were affected differently by treatment. This work demonstrates that unsupervised cluster analysis, with no prior knowledge of the data, can be employed to provide a multi-parametric characterization of functional heterogeneity within tumour volumes.
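Choosing between k = 2, 3 or 4 by cluster validation, as described above, can be illustrated with a silhouette score. The abstract does not say which validation index was used, so the silhouette is an assumption, and the function name `silhouette` is hypothetical. The sketch builds a full pairwise distance matrix, so it suits small samples rather than whole voxel populations.

```python
import numpy as np

def silhouette(points, labels):
    """Mean silhouette width for a labelling with at least two clusters.

    For each point: a = mean distance to the other members of its own
    cluster, b = smallest mean distance to any other cluster, and the
    silhouette is (b - a) / max(a, b). Singletons score 0.
    Assumes the points are not all identical.
    """
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    dist = np.sqrt(((points[:, None] - points[None, :]) ** 2).sum(axis=2))
    unique = np.unique(labels)
    scores = np.zeros(len(points))
    for i in range(len(points)):
        own = labels == labels[i]
        n_own = own.sum()
        if n_own == 1:
            continue  # singleton cluster: silhouette 0 by convention
        a = dist[i, own].sum() / (n_own - 1)
        b = min(dist[i, labels == c].mean() for c in unique if c != labels[i])
        scores[i] = (b - a) / max(a, b)
    return float(scores.mean())
```

Under this approach one would compute the score for the labellings obtained with each candidate k and keep the k with the highest mean silhouette.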
NASA Astrophysics Data System (ADS)
Serrano, Francisco; Guerra-Merchán, Antonio; Lozano-Francisco, Carmen; Vera-Peláez, José Luis
1997-09-01
Nerja Cave is a karstic cavity used by humans from Late Paleolithic to post-Chalcolithic times. Remains of molluscan foods in the uppermost Pleistocene and Holocene sediments were studied with cluster analysis and principal components analysis, in both Q and R modes. The results from cluster analysis distinguished interval groups mainly in accordance with chronology and distinguished assemblages of species mainly according to habitat. Significant changes in the shellfish diet through time were revealed. In the Late Magdalenian, most molluscs consumed consisted of pulmonate gastropods and species from sandy sea bottoms. The Epipaleolithic diet was more varied and included species from rocky shorelines. From the Neolithic onward most molluscs consumed were from rocky shorelines. From the principal components analysis in Q mode, the first factor reflected mainly changes in the predominant capture environment, probably because of major paleogeographic changes. The second factor may reflect selective capture along rocky coastlines during certain times. The third factor correlated well with the sea-surface temperature curve in the western Mediterranean (Alboran Sea) during the late Quaternary.
Benson, Nsikak U.; Asuquo, Francis E.; Williams, Akan B.; Essien, Joseph P.; Ekong, Cyril I.; Akpabio, Otobong; Olajire, Abaas A.
2016-01-01
Trace metal (Cd, Cr, Cu, Ni and Pb) concentrations in benthic sediments were analyzed through a multi-step fractionation scheme to assess the levels and sources of contamination in estuarine, riverine and freshwater ecosystems in the Niger Delta (Nigeria). The degree of contamination was assessed using the individual contamination factors (ICF) and global contamination factor (GCF). Multivariate statistical approaches including principal component analysis (PCA), cluster analysis and correlation tests were employed to evaluate the interrelationships and associated sources of contamination. The spatial distribution of metal concentrations followed the pattern Pb>Cu>Cr>Cd>Ni. Ecological risk index by ICF showed significant potential mobility and bioavailability for Cu, Cu and Ni. The ICF contamination trend in the benthic sediments at all studied sites was Cu>Cr>Ni>Cd>Pb. The principal component and agglomerative clustering analyses indicate that trace metal contamination in the ecosystems was influenced by multiple pollution sources. PMID:27257934
NASA Astrophysics Data System (ADS)
Chakraborty, Debdutta; Chattaraj, Pratim Kumar
2017-10-01
The possibility of functionalizing boron nitride flakes (BNFs) with some selected main group metal clusters, viz. OLi4, NLi5, CLi6, BLi7 and Al12Be, has been analyzed with the aid of density functional theory (DFT) based computations. Thermochemical as well as energetic considerations suggest that all the metal clusters interact with the BNF moiety in a favorable fashion. As a result of functionalization, the static (first) hyperpolarizability (β) values of the metal cluster supported BNF moieties increase quite significantly as compared to that in the case of pristine BNF. Time dependent DFT analysis reveals that the metal clusters can lower the transition energies associated with the dominant electronic transitions quite significantly thereby enabling the metal cluster supported BNF moieties to exhibit significant non-linear optical activity. Moreover, the studied systems demonstrate broad band absorption capability spanning the UV-visible as well as infra-red domains. Energy decomposition analysis reveals that the electrostatic interactions principally stabilize the metal cluster supported BNF moieties.
Chakraborty, Debdutta; Chattaraj, Pratim Kumar
2017-10-25
The possibility of functionalizing boron nitride flakes (BNFs) with some selected main group metal clusters, viz. OLi4, NLi5, CLi6, BLi7 and Al12Be, has been analyzed with the aid of density functional theory (DFT) based computations. Thermochemical as well as energetic considerations suggest that all the metal clusters interact with the BNF moiety in a favorable fashion. As a result of functionalization, the static (first) hyperpolarizability (β) values of the metal cluster supported BNF moieties increase quite significantly as compared to that in the case of pristine BNF. Time dependent DFT analysis reveals that the metal clusters can lower the transition energies associated with the dominant electronic transitions quite significantly thereby enabling the metal cluster supported BNF moieties to exhibit significant non-linear optical activity. Moreover, the studied systems demonstrate broad band absorption capability spanning the UV-visible as well as infra-red domains. Energy decomposition analysis reveals that the electrostatic interactions principally stabilize the metal cluster supported BNF moieties.
A graph-Laplacian-based feature extraction algorithm for neural spike sorting.
Ghanbari, Yasser; Spence, Larry; Papamichalis, Panos
2009-01-01
Analysis of extracellular neural spike recordings is highly dependent upon the accuracy of neural waveform classification, commonly referred to as spike sorting. Feature extraction is an important stage of this process because it can limit the quality of clustering which is performed in the feature space. This paper proposes a new feature extraction method (which we call Graph Laplacian Features, GLF) based on minimizing the graph Laplacian and maximizing the weighted variance. The algorithm is compared with Principal Components Analysis (PCA, the most commonly-used feature extraction method) using simulated neural data. The results show that the proposed algorithm produces more compact and well-separated clusters compared to PCA. As an added benefit, tentative cluster centers are output which can be used to initialize a subsequent clustering stage.
Clustering of Variables for Mixed Data
NASA Astrophysics Data System (ADS)
Saracco, J.; Chavent, M.
2016-05-01
This chapter presents clustering of variables, whose aim is to lump together strongly related variables. The proposed approach works on a mixed data set, i.e. on a data set which contains numerical variables and categorical variables. Two algorithms for clustering of variables are described: a hierarchical clustering and a k-means type clustering. A brief description of the PCAmix method (that is, a principal component analysis for mixed data) is provided, since the calculation of the synthetic variables summarizing the obtained clusters of variables is based on this multivariate method. Finally, the R packages ClustOfVar and PCAmixdata are illustrated on real mixed data. The PCAmix and ClustOfVar approaches are first used for dimension reduction (step 1) before applying in step 2 a standard clustering method to obtain groups of individuals.
Slaus, Mario; Tomicić, Zeljko; Uglesić, Ante; Jurić, Radomir
2004-08-01
To determine the ethnic composition of the early medieval Croats, the location from which they migrated to the east coast of the Adriatic, and to separate early medieval Croats from Bijelo brdo culture members, using principal components analysis and discriminant function analysis of craniometric data from Central and South-East European medieval archaeological sites. Mean male values for 8 cranial measurements from 39 European and 5 Iranian sites were analyzed by principal components analysis. Raw data for 17 cranial measurements for 103 female and 112 male skulls were used to develop discriminant functions. The scatter-plot of the analyzed sites on the first 2 principal components showed a pattern of intergroup relationships consistent with geographical and archaeological information not included in the data set. The first 2 principal components separated the sites into 4 distinct clusters: Avaroslav sites west of the Danube, Avaroslav sites east of the Danube, Bijelo brdo sites, and Polish sites. All early medieval Croat sites were located in the cluster of Polish sites. Two discriminant functions successfully differentiated between early medieval Croats and Bijelo brdo members. Overall accuracies were high -- 89.3% for males, and 97.1% for females. Early medieval Croats seem to be of Slavic ancestry, and at one time shared a common homeland with medieval Poles. Application of unstandardized discriminant function coefficients to unclassified crania from 18 sites showed an expansion of early medieval Croats into continental Croatia during the 10th to 13th century.
Goekoop, Rutger; Goekoop, Jaap G
2014-01-01
The vast number of psychopathological syndromes that can be observed in clinical practice can be described in terms of a limited number of elementary syndromes that are differentially expressed. Previous attempts to identify elementary syndromes have shown limitations that have slowed progress in the taxonomy of psychiatric disorders. To examine the ability of network community detection (NCD) to identify elementary syndromes of psychopathology and move beyond the limitations of current classification methods in psychiatry. 192 patients with unselected mental disorders were tested on the Comprehensive Psychopathological Rating Scale (CPRS). Principal component analysis (PCA) was performed on the bootstrapped correlation matrix of symptom scores to extract the principal component structure (PCS). An undirected and weighted network graph was constructed from the same matrix. Network community structure (NCS) was optimized using a previously published technique. In the optimal network structure, network clusters showed a 89% match with principal components of psychopathology. Some 6 network clusters were found, including "Depression", "Mania", "Anxiety", "Psychosis", "Retardation", and "Behavioral Disorganization". Network metrics were used to quantify the continuities between the elementary syndromes. We present the first comprehensive network graph of psychopathology that is free from the biases of previous classifications: a 'Psychopathology Web'. Clusters within this network represent elementary syndromes that are connected via a limited number of bridge symptoms. Many problems of previous classifications can be overcome by using a network approach to psychopathology.
NASA Astrophysics Data System (ADS)
Kholodov, V. A.; Yaroslavtseva, N. V.; Lazarev, V. I.; Frid, A. S.
2016-09-01
Cluster analysis and principal component analysis (PCA) have been used for the interpretation of dry sieving data. Chernozems from the treatments of long-term field experiments with different land-use patterns— annually mowed steppe, continuous potato culture, permanent black fallow, and untilled fallow since 1998 after permanent black fallow—have been used. Analysis of dry sieving data by PCA has shown that the treatments of untilled fallow after black fallow and annually mowed steppe differ most in the series considered; the content of dry aggregates of 10-7 mm makes the largest contribution to the distribution of objects along the first principal component. This fraction has been sieved in water and analyzed by PCA. In contrast to dry sieving data, the wet sieving data showed the closest mathematical distance between the treatment of untilled fallow after black fallow and the undisturbed treatment of annually mowed steppe, while the untilled fallow after black fallow and the permanent black fallow were the most distant treatments. Thus, it may be suggested that the water stability of structure is first restored after the removal of destructive anthropogenic load. However, the restoration of the distribution of structural separates to the parameters characteristic of native soils is a significantly longer process.
Going beyond Clustering in MD Trajectory Analysis: An Application to Villin Headpiece Folding
Rajan, Aruna; Freddolino, Peter L.; Schulten, Klaus
2010-01-01
Recent advances in computing technology have enabled microsecond long all-atom molecular dynamics (MD) simulations of biological systems. Methods that can distill the salient features of such large trajectories are now urgently needed. Conventional clustering methods used to analyze MD trajectories suffer from various setbacks, namely (i) they are not data driven, (ii) they are unstable to noise and changes in cut-off parameters such as cluster radius and cluster number, and (iii) they do not reduce the dimensionality of the trajectories, and hence are unsuitable for finding collective coordinates. We advocate the application of principal component analysis (PCA) and a non-metric multidimensional scaling (nMDS) method to reduce MD trajectories and overcome the drawbacks of clustering. To illustrate the superiority of nMDS over other methods in reducing data and reproducing salient features, we analyze three complete villin headpiece folding trajectories. Our analysis suggests that the folding process of the villin headpiece is structurally heterogeneous. PMID:20419160
Going beyond clustering in MD trajectory analysis: an application to villin headpiece folding.
Rajan, Aruna; Freddolino, Peter L; Schulten, Klaus
2010-04-15
Recent advances in computing technology have enabled microsecond long all-atom molecular dynamics (MD) simulations of biological systems. Methods that can distill the salient features of such large trajectories are now urgently needed. Conventional clustering methods used to analyze MD trajectories suffer from various setbacks, namely (i) they are not data driven, (ii) they are unstable to noise and changes in cut-off parameters such as cluster radius and cluster number, and (iii) they do not reduce the dimensionality of the trajectories, and hence are unsuitable for finding collective coordinates. We advocate the application of principal component analysis (PCA) and a non-metric multidimensional scaling (nMDS) method to reduce MD trajectories and overcome the drawbacks of clustering. To illustrate the superiority of nMDS over other methods in reducing data and reproducing salient features, we analyze three complete villin headpiece folding trajectories. Our analysis suggests that the folding process of the villin headpiece is structurally heterogeneous.
Optimal wavelength band clustering for multispectral iris recognition.
Gong, Yazhuo; Zhang, David; Shi, Pengfei; Yan, Jingqi
2012-07-01
This work explores the possibility of clustering spectral wavelengths based on the maximum dissimilarity of iris textures. The eventual goal is to determine how many bands of spectral wavelengths will be enough for iris multispectral fusion and to find these bands that will provide higher performance of iris multispectral recognition. A multispectral acquisition system was first designed for imaging the iris at narrow spectral bands in the range of 420 to 940 nm. Next, a set of 60 human iris images that correspond to the right and left eyes of 30 different subjects were acquired for an analysis. Finally, we determined that 3 clusters were enough to represent the 10 feature bands of spectral wavelengths using the agglomerative clustering based on two-dimensional principal component analysis. The experimental results suggest (1) the number, center, and composition of clusters of spectral wavelengths and (2) the higher performance of iris multispectral recognition based on a three wavelengths-bands fusion.
Syazwan, AI; Rafee, B Mohd; Juahir, Hafizan; Azman, AZF; Nizar, AM; Izwyn, Z; Syahidatussyakirah, K; Muhaimin, AA; Yunos, MA Syafiq; Anita, AR; Hanafiah, J Muhamad; Shaharuddin, MS; Ibthisham, A Mohd; Hasmadi, I Mohd; Azhar, MN Mohamad; Azizan, HS; Zulfadhli, I; Othman, J; Rozalini, M; Kamarul, FT
2012-01-01
Purpose To analyze and characterize a multidisciplinary, integrated indoor air quality checklist for evaluating the health risk of building occupants in a nonindustrial workplace setting. Design A cross-sectional study based on a participatory occupational health program conducted by the National Institute of Occupational Safety and Health (Malaysia) and Universiti Putra Malaysia. Method A modified version of the indoor environmental checklist published by the Department of Occupational Health and Safety, based on the literature and discussion with occupational health and safety professionals, was used in the evaluation process. Summated scores were given according to the cluster analysis and principal component analysis in the characterization of risk. Environmetric techniques were used to classify the risk of variables in the checklist. Identification of the possible source of item pollutants was also evaluated from a semiquantitative approach. Result Hierarchical agglomerative cluster analysis resulted in the grouping of factorial components into three clusters (high complaint, moderate-high complaint, moderate complaint), which were further analyzed by discriminant analysis. From this, 15 major variables that influence indoor air quality were determined. Principal component analysis of each cluster revealed that the main factors influencing the high complaint group were fungal-related problems, chemical indoor dispersion, detergent, renovation, thermal comfort, and location of fresh air intake. The moderate-high complaint group showed significantly high loading on ventilation, air filters, and smoking-related activities. The moderate complaint group showed high loading on dampness, odor, and thermal comfort. Conclusion This semiquantitative assessment, which graded risk from low to high based on the intensity of the problem, shows promising and reliable results. It should be used as an important tool in the preliminary assessment of indoor air quality and as a categorizing method for further IAQ investigations and complaints procedures. PMID:23055779
Syazwan, Ai; Rafee, B Mohd; Juahir, Hafizan; Azman, Azf; Nizar, Am; Izwyn, Z; Syahidatussyakirah, K; Muhaimin, Aa; Yunos, Ma Syafiq; Anita, Ar; Hanafiah, J Muhamad; Shaharuddin, Ms; Ibthisham, A Mohd; Hasmadi, I Mohd; Azhar, Mn Mohamad; Azizan, Hs; Zulfadhli, I; Othman, J; Rozalini, M; Kamarul, Ft
2012-01-01
To analyze and characterize a multidisciplinary, integrated indoor air quality checklist for evaluating the health risk of building occupants in a nonindustrial workplace setting. A cross-sectional study based on a participatory occupational health program conducted by the National Institute of Occupational Safety and Health (Malaysia) and Universiti Putra Malaysia. A modified version of the indoor environmental checklist published by the Department of Occupational Health and Safety, based on the literature and discussion with occupational health and safety professionals, was used in the evaluation process. Summated scores were given according to the cluster analysis and principal component analysis in the characterization of risk. Environmetric techniques were used to classify the risk of variables in the checklist. Identification of the possible source of item pollutants was also evaluated from a semiquantitative approach. Hierarchical agglomerative cluster analysis resulted in the grouping of factorial components into three clusters (high complaint, moderate-high complaint, moderate complaint), which were further analyzed by discriminant analysis. From this, 15 major variables that influence indoor air quality were determined. Principal component analysis of each cluster revealed that the main factors influencing the high complaint group were fungal-related problems, chemical indoor dispersion, detergent, renovation, thermal comfort, and location of fresh air intake. The moderate-high complaint group showed significantly high loading on ventilation, air filters, and smoking-related activities. The moderate complaint group showed high loading on dampness, odor, and thermal comfort. This semiquantitative assessment, which graded risk from low to high based on the intensity of the problem, shows promising and reliable results. It should be used as an important tool in the preliminary assessment of indoor air quality and as a categorizing method for further IAQ investigations and complaints procedures.
[A study of Boletus bicolor from different areas using Fourier transform infrared spectrometry].
Zhou, Zai-Jin; Liu, Gang; Ren, Xian-Pei
2010-04-01
It is hard to differentiate wild-growing mushrooms of the same species from different areas by macromorphological features. In this paper, Fourier transform infrared (FTIR) spectroscopy combined with principal component analysis was used to identify 58 samples of Boletus bicolor from five different areas. Based on the fingerprint infrared spectra of the Boletus bicolor samples, principal component analysis was conducted on the 58 spectra in the range of 1350-750 cm(-1) using the statistical software SPSS 13.0. According to the results, the cumulative contribution ratio of the first three principal components was 88.87%, so they captured most of the information in the samples. The two-dimensional projection plot of the first and second principal components showed a satisfactory clustering effect for the classification and discrimination of Boletus bicolor. All samples were divided into five groups with a classification accuracy of 98.3%. The study demonstrated that wild-growing Boletus bicolor from different areas can be told apart by FTIR spectra combined with principal component analysis.
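The "cumulative contributing ratio" reported above is the running share of total variance carried by the leading principal components. A minimal sketch of that arithmetic follows; the eigenvalues are hypothetical and are not the study's values, and the function name is chosen here for illustration only.

```python
from itertools import accumulate


def cumulative_contribution(eigenvalues):
    """Return the cumulative share of total variance for each leading component."""
    total = sum(eigenvalues)
    if total <= 0:
        raise ValueError("eigenvalues must sum to a positive number")
    return [running / total for running in accumulate(eigenvalues)]


# Hypothetical eigenvalues sorted from largest to smallest (not the study's).
eigenvalues = [6.0, 3.0, 1.0]
ratios = cumulative_contribution(eigenvalues)
print([round(r, 2) for r in ratios])
```

Reading off the third entry of such a list is how a figure like 88.87% for the first three components would be obtained.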
Kharroubi, Adel; Gargouri, Dorra; Baati, Houda; Azri, Chafai
2012-06-01
Concentrations of selected heavy metals (Cd, Pb, Zn, Cu, Mn, and Fe) in surface sediments from 66 sites in both northern and eastern Mediterranean Sea-Boughrara lagoon exchange areas (southeastern Tunisia) were studied in order to understand current metal contamination due to the urbanization and economic development of several nearby coastal regions of the Gulf of Gabès. Multiple approaches were applied for the sediment quality assessment. These approaches were based on GIS coupled with chemometric methods (enrichment factors, geoaccumulation index, principal component analysis, and cluster analysis). Enrichment factors and principal component analysis revealed two distinct groups of metals. The first group corresponded to Fe and Mn derived from natural sources, and the second group contained Cd, Pb, Zn, and Cu originating from man-made sources. For these latter metals, cluster analysis showed two distinct distributions in the selected areas. They were attributed to temporal and spatial variations in the input of contaminant sources. The geoaccumulation index (I_geo) values indicated that only Cd, Pb, and Cu can be considered moderate to extreme pollutants in the studied sediments.
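The two screening indices named above have simple closed forms. The sketch below uses the commonly cited definitions, Mueller's geoaccumulation index I_geo = log2(C_n / (1.5 B_n)) and an enrichment factor normalized to a reference element. The abstract does not state which background values or reference element the authors used, so every number here is hypothetical.

```python
import math


def geoaccumulation_index(concentration: float, background: float) -> float:
    """Mueller's index: I_geo = log2(C_n / (1.5 * B_n))."""
    if concentration <= 0 or background <= 0:
        raise ValueError("concentrations must be positive")
    return math.log2(concentration / (1.5 * background))


def enrichment_factor(metal, reference, metal_background, reference_background):
    """EF = (metal / reference) in the sample over (metal / reference) in the background."""
    return (metal / reference) / (metal_background / reference_background)


# Hypothetical concentrations in mg/kg, for illustration only.
print(geoaccumulation_index(concentration=3.0, background=1.0))
print(enrichment_factor(metal=10.0, reference=100.0,
                        metal_background=5.0, reference_background=100.0))
```

The factor 1.5 in I_geo is the usual allowance for natural variation in the background value.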
Tchabo, William; Ma, Yongkun; Kwaw, Emmanuel; Zhang, Haining; Xiao, Lulu; Apaliya, Maurice T
2018-01-15
The four different methods of color measurement of wine proposed by Boulton, Giusti, Glories and the Commission Internationale de l'Eclairage (CIE) were applied to assess the statistical relationship between the phytochemical profile and chromatic characteristics of sulfur dioxide-free mulberry (Morus nigra) wine submitted to non-thermal maturation processes. The alterations in the chromatic properties and phenolic composition of non-thermally aged mulberry wine were examined with the aid of Pearson correlation, cluster analysis, and principal component analysis. The results revealed a positive effect of non-thermal processes on the phytochemical families of the wines. Pearson correlation analysis established relationships between chromatic indexes and flavonols as well as anthocyanins. Cluster analysis highlighted similarities between the Boulton and Giusti parameters, as well as between the Glories and CIE parameters, in the assessment of the chromatic properties of wines. Finally, principal component analysis was able to discriminate wines subjected to different maturation techniques on the basis of their chromatic and phenolic characteristics. Copyright © 2017. Published by Elsevier Ltd.
Method for exploratory cluster analysis and visualisation of single-trial ERP ensembles.
Williams, N J; Nasuto, S J; Saddy, J D
2015-07-30
The validity of ensemble averaging on event-related potential (ERP) data has been questioned, due to its assumption that the ERP is identical across trials. Thus, there is a need for preliminary testing for cluster structure in the data. We propose a complete pipeline for the cluster analysis of ERP data. To increase the signal-to-noise ratio (SNR) of the raw single-trials, we used a denoising method based on Empirical Mode Decomposition (EMD). Next, we used a bootstrap-based method to determine the number of clusters, through a measure called the Stability Index (SI). We then used a clustering algorithm based on a Genetic Algorithm (GA) to define initial cluster centroids for subsequent k-means clustering. Finally, we visualised the clustering results through a scheme based on Principal Component Analysis (PCA). After validating the pipeline on simulated data, we tested it on data from two experiments - a P300 speller paradigm on a single subject and a language processing study on 25 subjects. Results revealed evidence for the existence of 6 clusters in one experimental condition from the language processing study. Further, a two-way chi-square test revealed an influence of subject on cluster membership. Our analysis operates on denoised single-trials, the number of clusters is determined in a principled manner, and the results are presented through an intuitive visualisation. Given the cluster structure in some experimental conditions, we suggest application of cluster analysis as a preliminary step before ensemble averaging. Copyright © 2015 Elsevier B.V. All rights reserved.
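One step of the pipeline above, k-means started from externally supplied centroids, can be sketched in a few lines. This is only an illustration of seeded Lloyd-style k-means: the genetic algorithm that would produce the seeds is not shown, and the points and seed centroids below are hypothetical.

```python
import math


def kmeans(points, initial_centroids, max_iter=100):
    """Lloyd's k-means starting from caller-supplied centroids.

    Returns (labels, centroids). Points and centroids are tuples of floats.
    """
    centroids = [tuple(c) for c in initial_centroids]
    labels = [None] * len(points)
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
    return labels, centroids


# Hypothetical 2-D data and seed centroids (stand-ins for GA output).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
seeds = [(1.0, 1.0), (9.0, 9.0)]
labels, centroids = kmeans(points, seeds)
print(labels, centroids)
```

Because k-means only finds a local optimum, the quality of the seeds matters, which is presumably why the authors invest in a separate initialization step.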
REGIONAL-SCALE WIND FIELD CLASSIFICATION EMPLOYING CLUSTER ANALYSIS
DOE Office of Scientific and Technical Information (OSTI.GOV)
Glascoe, L G; Glaser, R E; Chin, H S
2004-06-17
The classification of time-varying multivariate regional-scale wind fields at a specific location can assist event planning as well as consequence and risk analysis. Further, wind field classification involves data transformation and inference techniques that effectively characterize stochastic wind field variation. Such a classification scheme is potentially useful for addressing overall atmospheric transport uncertainty and meteorological parameter sensitivity issues. Different methods to classify wind fields over a location include the principal component analysis of wind data (e.g., Hardy and Walton, 1978) and the use of cluster analysis for wind data (e.g., Green et al., 1992; Kaufmann and Weber, 1996). The goal of this study is to use a clustering method to classify the winds of a gridded data set, i.e., from meteorological simulations generated by a forecast model.
VanderKnyff, Jeremy; Friedman, Daniela B; Tanner, Andrea
2015-01-01
Using a sample of YouTube videos posted on the YouTube channels of organ procurement organizations, a content analysis was conducted to identify the frames used to strategically communicate prodonation messages. A total of 377 videos were coded for general characteristics, format, speaker characteristics, organs discussed, structure, problem definition, and treatment. Principal components analysis identified message frames, and k-means cluster analysis established distinct groupings of videos on the basis of the strength of their relationship to message frames. Analysis of these frames and clusters found that organ procurement organizations present multiple, and sometimes competing, video types and message frames on YouTube. This study serves as important formative research that will inform future studies to measure the effectiveness of the distinct message frames and clusters identified.
Decomposing the Apoptosis Pathway Into Biologically Interpretable Principal Components
Wang, Min; Kornblau, Steven M; Coombes, Kevin R
2018-01-01
Principal component analysis (PCA) is one of the most common techniques in the analysis of biological data sets, but applying PCA raises 2 challenges. First, one must determine the number of significant principal components (PCs). Second, because each PC is a linear combination of genes, it rarely has a biological interpretation. Existing methods to determine the number of PCs are either subjective or computationally extensive. We review several methods and describe a new R package, PCDimension, that implements additional methods, the most important being an algorithm that extends and automates a graphical Bayesian method. Using simulations, we compared the methods. Our newly automated procedure is competitive with the best methods when considering both accuracy and speed and is the most accurate when the number of objects is small compared with the number of attributes. We applied the method to a proteomics data set from patients with acute myeloid leukemia. Proteins in the apoptosis pathway could be explained using 6 PCs. By clustering the proteins in PC space, we were able to replace the PCs by 6 “biological components,” 3 of which could be immediately interpreted from the current literature. We expect this approach combining PCA with clustering to be widely applicable. PMID:29881252
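To make the first challenge concrete, here is a sketch of one simple, widely cited rule for choosing the number of components, the broken-stick rule. It is offered only as an illustration of the problem; the abstract does not say which rules the authors compared, and this is not their automated Bayesian procedure. The function name and eigenvalues are hypothetical.

```python
def broken_stick_count(eigenvalues):
    """Count leading components whose variance share beats the broken-stick share."""
    p = len(eigenvalues)
    total = sum(eigenvalues)
    ordered = sorted(eigenvalues, reverse=True)
    count = 0
    for k, value in enumerate(ordered, start=1):
        # Expected share of the k-th largest piece of a stick
        # broken at random into p pieces.
        expected = sum(1.0 / i for i in range(k, p + 1)) / p
        if value / total <= expected:
            break  # stop at the first component that fails the comparison
        count += 1
    return count


# Hypothetical eigenvalues, for illustration only.
print(broken_stick_count([5.0, 2.0, 1.0, 1.0, 1.0]))
```

Rules of this kind are fast but can disagree with one another, which is part of the motivation for comparing methods by simulation.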
Borri, Marco; Schmidt, Maria A.; Powell, Ceri; Koh, Dow-Mu; Riddell, Angela M.; Partridge, Mike; Bhide, Shreerang A.; Nutting, Christopher M.; Harrington, Kevin J.; Newbold, Katie L.; Leach, Martin O.
2015-01-01
Purpose To describe a methodology, based on cluster analysis, to partition multi-parametric functional imaging data into groups (or clusters) of similar functional characteristics, with the aim of characterizing functional heterogeneity within head and neck tumour volumes. To evaluate the performance of the proposed approach on a set of longitudinal MRI data, analysing the evolution of the obtained sub-sets with treatment. Material and Methods The cluster analysis workflow was applied to a combination of dynamic contrast-enhanced and diffusion-weighted imaging MRI data from a cohort of squamous cell carcinoma of the head and neck patients. Cumulative distributions of voxels, containing pre and post-treatment data and including both primary tumours and lymph nodes, were partitioned into k clusters (k = 2, 3 or 4). Principal component analysis and cluster validation were employed to investigate data composition and to independently determine the optimal number of clusters. The evolution of the resulting sub-regions with induction chemotherapy treatment was assessed relative to the number of clusters. Results The clustering algorithm was able to separate clusters which significantly reduced in voxel number following induction chemotherapy from clusters with a non-significant reduction. Partitioning with the optimal number of clusters (k = 4), determined with cluster validation, produced the best separation between reducing and non-reducing clusters. Conclusion The proposed methodology was able to identify tumour sub-regions with distinct functional properties, independently separating clusters which were affected differently by treatment. This work demonstrates that unsupervised cluster analysis, with no prior knowledge of the data, can be employed to provide a multi-parametric characterization of functional heterogeneity within tumour volumes. PMID:26398888
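Cluster validation, as used above to pick k, scores a partition by how compact and well separated its clusters are. The abstract does not name the validation index used, so the sketch below assumes the mean silhouette width purely as an example; the 1-D points are hypothetical, and the function assumes at least two clusters and non-identical points.

```python
import math


def mean_silhouette(points, labels):
    """Average silhouette width over all points (higher means better separation).

    Assumes at least two distinct labels are present.
    """
    scores = []
    for i, (p, lab) in enumerate(zip(points, labels)):
        own = [
            math.dist(p, q)
            for j, (q, other) in enumerate(zip(points, labels))
            if other == lab and j != i
        ]
        if not own:
            scores.append(0.0)  # convention: a singleton cluster scores 0
            continue
        a = sum(own) / len(own)  # mean distance within the point's own cluster
        # b: smallest mean distance from the point to any other cluster
        b = min(
            sum(math.dist(p, q) for q, l in zip(points, labels) if l == other)
            / labels.count(other)
            for other in set(labels)
            if other != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Hypothetical 1-D data with two obvious groups.
points = [(0.0,), (1.0,), (10.0,), (11.0,)]
good = mean_silhouette(points, [0, 0, 1, 1])
bad = mean_silhouette(points, [0, 1, 0, 1])
print(round(good, 3), round(bad, 3))
```

In practice one would compute such a score for each candidate k (here 2, 3, or 4) and prefer the partition with the highest value.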
Wu, Dingming; Wang, Dongfang; Zhang, Michael Q; Gu, Jin
2015-12-01
One major goal of large-scale cancer omics studies is to identify molecular subtypes for more accurate cancer diagnoses and treatments. To deal with high-dimensional cancer multi-omics data, a promising strategy is to find an effective low-dimensional subspace of the original data and then cluster cancer samples in the reduced subspace. However, due to data-type diversity and large data volume, few methods can integratively and efficiently find the principal low-dimensional manifold of high-dimensional cancer multi-omics data. In this study, we proposed a novel integrative probabilistic model based on low-rank approximation to quickly find the shared principal subspace across multiple data types; the convexity of the low-rank regularized likelihood function of the probabilistic model ensures efficient and stable model fitting. Candidate molecular subtypes can be identified by unsupervised clustering of hundreds of cancer samples in the reduced low-dimensional subspace. On testing datasets, our method LRAcluster (low-rank approximation based multi-omics data clustering) runs much faster with better clustering performance than the existing method. Then, we applied LRAcluster to large-scale cancer multi-omics data from TCGA. The pan-cancer analysis results show that cancers of different tissue origins are generally grouped as independent clusters, except squamous-like carcinomas, while the single cancer type analysis suggests that the omics data have different subtyping abilities for different cancer types. LRAcluster is a very useful method for fast dimension reduction and unsupervised clustering of large-scale multi-omics data. LRAcluster is implemented in R and freely available via http://bioinfo.au.tsinghua.edu.cn/software/lracluster/ .
Water quality analysis of the Rapur area, Andhra Pradesh, South India using multivariate techniques
NASA Astrophysics Data System (ADS)
Nagaraju, A.; Sreedhar, Y.; Thejaswi, A.; Sayadi, Mohammad Hossein
2017-10-01
Groundwater samples were collected from different sites in the Rapur area to evaluate the major ion chemistry. A large volume of data can lead to difficulties in the integration, interpretation, and representation of the results. Two multivariate statistical methods, hierarchical cluster analysis (HCA) and factor analysis (FA), were applied to evaluate their usefulness in classifying and identifying the geochemical processes controlling groundwater geochemistry. Four statistically significant clusters were obtained from 30 sampling stations. The analysis yielded two important clusters, viz., cluster 1 (pH, Si, CO3, Mg, SO4, Ca, K, HCO3, alkalinity, Na, Na + K, Cl, and hardness) and cluster 2 (EC and TDS), which are released to the study area from different sources. The application of multivariate statistical techniques such as principal component analysis (PCA) assists in the interpretation of complex data matrices for a better understanding of the water quality of a study area. From PCA, it is clear that the first factor (factor 1), which accounted for 36.2% of the total variance, had high positive loadings on EC, Mg, Cl, TDS, and hardness. Based on the PCA scores, four significant cluster groups of sampling locations were detected on the basis of similarity of their water quality.
NASA Astrophysics Data System (ADS)
Atherton, Daniel
Early detection of disease and insect infestation within crops and precise application of pesticides can help reduce potential production losses, reduce environmental risk, and reduce the cost of farming. The goal of this study was the advanced detection of early blight (Alternaria solani) in potato (Solanum tuberosum) plants using hyperspectral remote sensing data captured with a handheld spectroradiometer. Hyperspectral reflectance spectra were captured 10 times over five weeks from plants grown to the vegetative and tuber bulking growth stages. The spectra were analyzed using principal component analysis (PCA), spectral change (ratio) analysis, partial least squares (PLS), cluster analysis, and vegetative indices. PCA successfully distinguished more heavily diseased plants from healthy and minimally diseased plants using two principal components. Spectral change (ratio) analysis provided wavelengths (490-510, 640, 665-670, 690, 740-750, and 935 nm) most sensitive to early blight infection followed by ANOVA results indicating a highly significant difference (p < 0.0001) between disease rating group means. In the majority of the experiments, comparisons of diseased plants with healthy plants using Fisher's LSD revealed more heavily diseased plants were significantly different from healthy plants. PLS analysis demonstrated the feasibility of detecting early blight infected plants, finding four optimal factors for raw spectra with the predictor variation explained ranging from 93.4% to 94.6% and the response variation explained ranging from 42.7% to 64.7%. Cluster analysis successfully distinguished healthy plants from all diseased plants except for the most mildly diseased plants, showing clustering analysis was an effective method for detection of early blight. 
Analysis of the reflectance spectra using the simple ratio (SR) and the normalized difference vegetative index (NDVI) was effective at differentiating all diseased plants from healthy plants, except for the most mildly diseased plants. Of the analysis methods attempted, cluster analysis and vegetative indices were the most promising. The results show the potential of hyperspectral remote sensing for the detection of early blight in potato plants.
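The two vegetation indices mentioned above are simple band ratios: SR = NIR / Red and NDVI = (NIR - Red) / (NIR + Red). A minimal sketch follows. The abstract does not say which wavelengths were taken as the red and near-infrared bands, and the reflectance values below are hypothetical.

```python
def simple_ratio(nir: float, red: float) -> float:
    """Simple ratio (SR): near-infrared reflectance divided by red reflectance."""
    if red == 0:
        raise ValueError("red reflectance must be non-zero")
    return nir / red


def ndvi(nir: float, red: float) -> float:
    """Normalized difference index: (NIR - red) / (NIR + red)."""
    total = nir + red
    if total == 0:
        raise ValueError("NIR + red reflectance must be non-zero")
    return (nir - red) / total


# Hypothetical reflectance values, for illustration only (not from the study).
healthy = {"nir": 0.50, "red": 0.05}
diseased = {"nir": 0.35, "red": 0.10}

for name, leaf in (("healthy", healthy), ("diseased", diseased)):
    print(name, round(simple_ratio(**leaf), 2), round(ndvi(**leaf), 3))
```

Stressed foliage typically reflects less in the near-infrared and more in the red, so both indices tend to fall as disease severity rises, which is the contrast such indices exploit.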
Goekoop, Rutger; Goekoop, Jaap G.
2014-01-01
Introduction The vast number of psychopathological syndromes that can be observed in clinical practice can be described in terms of a limited number of elementary syndromes that are differentially expressed. Previous attempts to identify elementary syndromes have shown limitations that have slowed progress in the taxonomy of psychiatric disorders. Aim To examine the ability of network community detection (NCD) to identify elementary syndromes of psychopathology and move beyond the limitations of current classification methods in psychiatry. Methods 192 patients with unselected mental disorders were tested on the Comprehensive Psychopathological Rating Scale (CPRS). Principal component analysis (PCA) was performed on the bootstrapped correlation matrix of symptom scores to extract the principal component structure (PCS). An undirected and weighted network graph was constructed from the same matrix. Network community structure (NCS) was optimized using a previously published technique. Results In the optimal network structure, network clusters showed a 89% match with principal components of psychopathology. Some 6 network clusters were found, including "DEPRESSION", "MANIA", “ANXIETY”, "PSYCHOSIS", "RETARDATION", and "BEHAVIORAL DISORGANIZATION". Network metrics were used to quantify the continuities between the elementary syndromes. Conclusion We present the first comprehensive network graph of psychopathology that is free from the biases of previous classifications: a ‘Psychopathology Web’. Clusters within this network represent elementary syndromes that are connected via a limited number of bridge symptoms. Many problems of previous classifications can be overcome by using a network approach to psychopathology. PMID:25427156
Gambling, games of skill and human ecology: a pilot study by a multidimensional analysis approach.
Valera, Luca; Giuliani, Alessandro; Gizzi, Alessio; Tartaglia, Francesco; Tambone, Vittoradolfo
2015-01-01
The present pilot study aims at analyzing the human activity of playing in the light of an indicator of human ecology (HE). We highlighted the four essential anthropological dimensions (FEAD), starting from the analysis of questionnaires administered to actual gamers. The coherence between theoretical construct and observational data is a remarkable proof-of-concept of the possibility of establishing an experimentally motivated link between a philosophical construct (coming from Huizinga's Homo ludens definition) and actual gamers' motivation pattern. The starting hypothesis is that the activity of playing becomes ecological (and thus not harmful) when it achieves the harmony between the FEAD, thus realizing HE; conversely, it becomes at risk of creating some form of addiction, when destroying FEAD balance. We analyzed the data by means of variable clustering (oblique principal components) so to experimentally verify the existence of the hypothesized dimensions. The subsequent projection of statistical units (gamers) on the orthogonal space spanned by principal components allowed us to generate a meaningful, albeit preliminary, clusterization of gamer profiles.
Bible, Joe; Beck, James D.; Datta, Somnath
2016-01-01
Summary Ignorance of the mechanisms responsible for the availability of information presents an unusual problem for analysts. It is often the case that the availability of information is dependent on the outcome. In the analysis of clustered data, we say that a condition of informative cluster size (ICS) exists when the inference drawn from analysis of hypothetical balanced data differs from the inference drawn from observed data. Much work has been done to address the analysis of clustered data with informative cluster size; examples include Inverse Probability Weighting (IPW), Cluster Weighted Generalized Estimating Equations (CWGEE), and Doubly Weighted Generalized Estimating Equations (DWGEE). When cluster size changes with time, i.e., the data set possesses temporally varying cluster sizes (TVCS), these methods may produce biased inference for the underlying marginal distribution of interest. We propose a new marginalization that may be appropriate for addressing clustered longitudinal data with TVCS. The principal motivation for our present work is to analyze the periodontal data collected by Beck et al. (1997, Journal of Periodontal Research 6, 497–505). Longitudinal periodontal data often exhibit both ICS and TVCS, as the number of teeth possessed by participants at the onset of study is not constant and teeth as well as individuals may be displaced throughout the study. PMID:26682911
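The effect of informative cluster size can be seen with a toy calculation. The sketch below contrasts a pooled mean with a mean in which each observation is weighted by the inverse of its cluster size, which is the basic idea behind cluster-weighted estimators. It is not a GEE fit and not the marginalization the authors propose; the data (a tooth-level indicator per participant) are hypothetical.

```python
def naive_mean(clusters):
    """Pool all observations, so large clusters dominate the estimate."""
    values = [v for members in clusters.values() for v in members]
    return sum(values) / len(values)


def cluster_weighted_mean(clusters):
    """Weight each observation by 1 / (its cluster size).

    Every cluster then contributes equally, so the result equals
    the mean of the per-cluster means.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for members in clusters.values():
        w = 1.0 / len(members)
        weighted_sum += w * sum(members)
        weight_total += w * len(members)
    return weighted_sum / weight_total


# Hypothetical data: 1 = diseased tooth, 0 = healthy tooth, grouped by participant.
clusters = {"participant_1": [1, 1, 1, 1], "participant_2": [0]}
print(naive_mean(clusters), cluster_weighted_mean(clusters))
```

When the number of teeth a participant retains is itself related to disease, the two estimates answer different questions, which is why the choice of weighting matters.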
A Dimensionally Reduced Clustering Methodology for Heterogeneous Occupational Medicine Data Mining.
Saâdaoui, Foued; Bertrand, Pierre R; Boudet, Gil; Rouffiac, Karine; Dutheil, Frédéric; Chamoux, Alain
2015-10-01
Clustering is a set of statistical learning techniques aimed at partitioning heterogeneous data into homogeneous groups called clusters. There are several fields in which clustering has been successfully applied, such as medicine, biology, finance, and economics. In this paper, we introduce the notion of clustering in multifactorial data analysis problems. A case study is conducted for an occupational medicine problem with the purpose of analyzing patterns in a population of 813 individuals. To reduce the data set dimensionality, we base our approach on Principal Component Analysis (PCA), which is the statistical tool most commonly used in factorial analysis. However, problems in nature, especially in medicine, are often based on heterogeneous qualitative-quantitative measurements, whereas PCA processes only quantitative ones. Besides, qualitative data are originally unobservable quantitative responses that are usually binary-coded. Hence, we propose a new set of strategies that handle quantitative and qualitative data simultaneously. The principle of this approach is to perform a projection of the qualitative variables on the subspaces spanned by quantitative ones. Subsequently, an optimal model is allocated to the resulting PCA-regressed subspaces.
Ofner, Johannes; Kamilli, Katharina A; Eitenberger, Elisabeth; Friedbacher, Gernot; Lendl, Bernhard; Held, Andreas; Lohninger, Hans
2015-09-15
The chemometric analysis of multisensor hyperspectral data allows a comprehensive image-based analysis of precipitated atmospheric particles. Atmospheric particulate matter was precipitated on aluminum foils and analyzed by Raman microspectroscopy and subsequently by electron microscopy and energy dispersive X-ray spectroscopy. All obtained images were of the same spot of an area of 100 × 100 μm². The two hyperspectral data sets and the high-resolution scanning electron microscope images were fused into a combined multisensor hyperspectral data set. This multisensor data cube was analyzed using principal component analysis, hierarchical cluster analysis, k-means clustering, and vertex component analysis. The detailed chemometric analysis of the multisensor data allowed an extensive chemical interpretation of the precipitated particles, and their structure and composition led to a comprehensive understanding of atmospheric particulate matter.
The fine-scale genetic structure and evolution of the Japanese population.
Takeuchi, Fumihiko; Katsuya, Tomohiro; Kimura, Ryosuke; Nabika, Toru; Isomura, Minoru; Ohkubo, Takayoshi; Tabara, Yasuharu; Yamamoto, Ken; Yokota, Mitsuhiro; Liu, Xuanyao; Saw, Woei-Yuh; Mamatyusupu, Dolikun; Yang, Wenjun; Xu, Shuhua; Teo, Yik-Ying; Kato, Norihiro
2017-01-01
The contemporary Japanese populations largely consist of three genetically distinct groups: Hondo, Ryukyu and Ainu. By principal-component analysis, while the three groups can be clearly separated, the Hondo people, comprising 99% of the Japanese, form one almost indistinguishable cluster. To understand fine-scale genetic structure, we applied powerful haplotype-based statistical methods to genome-wide single nucleotide polymorphism data from 1600 Japanese individuals, sampled from eight distinct regions in Japan. We then combined the Japanese data with data from 26 other Asian populations to analyze the shared ancestry and genetic differentiation. We found that the Japanese could be separated into nine genetic clusters in our dataset, showing a marked concordance with geography; and that major components of the ancestry profile of the Japanese were from the Korean and Han Chinese clusters. We also detected and dated admixture in the Japanese. While genetic differentiation between Ryukyu and Hondo was suggested to be caused in part by positive selection, genetic differentiation among the Hondo clusters appeared to result principally from genetic drift. Notably, in Asians, we found the possibility that positive selection accentuated genetic differentiation among distant populations but attenuated genetic differentiation among close populations. These findings are significant for studies of human evolution and medical genetics.
Shukla, Sudhir; Bhargava, Atul; Chatterjee, Avijeet; Pandey, Avinash Chandra; Mishra, Brij K
2010-01-15
Assessment of genetic diversity in a crop-breeding programme helps in the identification of diverse parental combinations to create segregating progenies with maximum genetic variability and facilitates introgression of desirable genes from diverse germplasm into the available genetic base. In the present study, 39 strains of vegetable amaranth (Amaranthus tricolor) were evaluated for eight morphological and seven quality traits for two test seasons to study the extent of genetic divergence among the strains. Multivariate analysis showed that the first four principal components contributed 67.55% of the variability. Cluster analysis grouped the strains into six clusters that displayed a wide range of diversity for most of the traits. Cluster analysis has proved to be an effective method in grouping strains that may facilitate effective management and utilisation in crop-breeding programmes. The diverse strains falling in different clusters were identified, which can be utilised in different hybridisation programmes to develop high-foliage-yielding varieties rich in nutritional components. Copyright (c) 2009 Society of Chemical Industry.
Ecological characteristics of Simulium breeding sites in West Africa.
Cheke, Robert A; Young, Stephen; Garms, Rolf
2017-03-01
Twenty-nine taxa of Simulium were identified amongst 527 collections of larvae and pupae from untreated rivers and streams in Liberia (362 collections in 1967-71 & 1989), Togo (125 in 1979-81), Benin (35 in 1979-81) and Ghana (5 in 1980-81). Presence or absence of associations between different taxa were used to group them into six clusters using Ward agglomerative hierarchical cluster analysis. Environmental data associated with the pre-imaginal habitats were then analysed in relation to the six clusters by one way ANOVA. The results revealed significant effects in determining the clusters of maximum river width (all P<0.001 unless stated otherwise), water temperature, dry bulb air temperature, relative humidity, altitude, type of water (on a range from trickle to large river), water level, slope, current, vegetation, light conditions, discharge, length of breeding area, environs, terrain, river bed type (P<0.01), and the supports to which the insects were attached (P<0.01). When four non-significant contributors (wet bulb temperature, river features, height of waterfall and depth) were excluded and the reduced data-set analysed by principal components analysis (PCA), the first two principal components (PCs) accounted for 87% of the variance, with geographical features dominant in PC1 and hydrological characteristics in PC2. The analyses also revealed the ecological characteristics of each taxon's pre-imaginal habitats, which are discussed with particular reference to members of the Simulium damnosum species complex, whose breeding site distributions were further analysed by canonical correspondence analysis (CCA), a method also applied to the data on non-vector species. Copyright © 2016 Elsevier B.V. All rights reserved.
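The one-way ANOVA step described above, testing whether an environmental variable differs between clusters, can be sketched as follows. The sketch computes only the F statistic from its textbook definition (between-group mean square divided by within-group mean square). The cluster values are hypothetical and the function name is an illustrative choice, not part of the study.

```python
def one_way_anova_f(groups):
    """Return the F statistic for a one-way ANOVA over a list of groups."""
    n_total = sum(len(g) for g in groups)
    k = len(groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups
    )
    # Within-group sum of squares: spread of observations around their group mean.
    ss_within = sum(
        sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups
    )

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


# Hypothetical maximum river widths (m) for sites in three clusters.
widths_by_cluster = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
f_stat = one_way_anova_f(widths_by_cluster)
print(f_stat)
```

A P value would then be obtained by comparing the statistic with an F distribution on (k - 1, N - k) degrees of freedom, which is left out here to keep the sketch free of external libraries.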
Chen, Lei Tai; Sun, Ai Qing; Yang, Min; Chen, Lu Lu; Ma, Xue Li; Li, Mei Ling; Yin, Yan Ping
2016-09-01
A total of 16 wheat cultivars were selected to detect seed vigor of different genotypes using standard germination test, seed germination test under stress conditions and field emergence test. The adversity resistance indices of seed vigor indices and field emergence percentage under different germination conditions were used as the indices to evaluate adversity resistance. Principal component analysis and cluster analysis were used for the comprehensive evaluation of seed vigor. Results showed that drought stress, artificial aging and cold soaking treatments affected seed vigor to some extent. The adversity resistance indices of the artificial aging and cold soaking tests were significantly positively correlated with the field emergence percentage, while the adversity resistance index of the drought stress test had no significant correlation with the field emergence percentage. The 16 wheat cultivars were classified into three groups based on the principal component analysis and cluster analysis. Yunong 949, Yumai 49-198, Luyuan 502, Zhengyumai 9987, Shimai 21, Shannong 23, and Shixin 828 belonged to high vigor seeds. Xunong 5, Yunong 982, Tangmai 8, Jimai 20, Jimai 22, Jinan 17, and Shannong 20 belonged to medium vigor seeds. The other two cultivars, Chang 4738 and Lunxuan 061, belonged to low vigor seeds.
Genetic diversity and relationship analysis of Gossypium arboreum accessions.
Liu, F; Zhou, Z L; Wang, C Y; Wang, Y H; Cai, X Y; Wang, X X; Zhang, Z S; Wang, K B
2015-11-19
Simple sequence repeat techniques were used to identify the genetic diversity of 101 Gossypium arboreum accessions collected from India, Vietnam, and the southwest of China (Guizhou, Guangxi, and Yunnan provinces). Twenty-six pairs of SSR primers produced a total of 103 polymorphic loci with an average of 3.96 polymorphic loci per primer. The average of the effective number of alleles, Nei's gene diversity, and Shannon's information index were 0.59, 0.2835, and 0.4361, respectively. The diversity varied among different geographic regions. The result of principal component analysis was consistent with that of unweighted pair group method with arithmetic mean clustering analysis. The 101 G. arboreum accessions were clustered into 2 groups.
Wang, Xihua; Zhang, Guangxin; Xu, Y Jun; Sun, Guangzhi
2015-11-01
Assessment of the interaction between groundwater and surface water (GW-SW) can generate information that is critical to regional water resource management, especially for regions that are highly dependent on groundwater resources for irrigation. This study investigated such interaction on China's Sanjiang Plain (10.9 × 10⁴ km²) and produced results to assist sustainable regional water management for intensive agricultural activities. Methods of hierarchical cluster analysis (HCA), principal component analysis (PCA), and statistical analysis were used in this study. One hundred two water samples (60 from shallow groundwater, 7 from deep groundwater, and 35 from surface water) were collected and grouped into three clusters and seven sub-clusters during the analyses. The PCA identified four principal components of the interaction, which explained 85.9% of the total variance and were attributed to the dissolution and evolution of gypsum, feldspar, and other natural minerals in the region that was affected by anthropic and geological (sedimentary rock mineral) activities. The analyses showed that surface water in the upper region of the Sanjiang Plain gained water from local shallow groundwater, indicating that the surface water in the upper region was relatively more resilient to withdrawal for usage, whereas in the middle region, there was only a weak interaction between shallow groundwater and surface water. In the lower region of the Sanjiang Plain, surface water lost water to shallow groundwater, indicating that the groundwater was vulnerable to pollution by pesticides and fertilizers from terrestrial sources.
Dong, Skye T; Costa, Daniel S J; Butow, Phyllis N; Lovell, Melanie R; Agar, Meera; Velikova, Galina; Teckle, Paulos; Tong, Allison; Tebbutt, Niall C; Clarke, Stephen J; van der Hoek, Kim; King, Madeleine T; Fayers, Peter M
2016-01-01
Symptom clusters in advanced cancer can influence patient outcomes. There is large heterogeneity in the methods used to identify symptom clusters. To investigate the consistency of symptom cluster composition in advanced cancer patients using different statistical methodologies for all patients across five primary cancer sites, and to examine which clusters predict functional status, a global assessment of health and global quality of life. Principal component analysis and exploratory factor analysis (with different rotation and factor selection methods) and hierarchical cluster analysis (with different linkage and similarity measures) were used on a data set of 1562 advanced cancer patients who completed the European Organization for the Research and Treatment of Cancer Quality of Life Questionnaire-Core 30. Four clusters consistently formed for many of the methods and cancer sites: tense-worry-irritable-depressed (emotional cluster), fatigue-pain, nausea-vomiting, and concentration-memory (cognitive cluster). The emotional cluster was a stronger predictor of overall quality of life than the other clusters. Fatigue-pain was a stronger predictor of overall health than the other clusters. The cognitive cluster and fatigue-pain predicted physical functioning, role functioning, and social functioning. The four identified symptom clusters were consistent across statistical methods and cancer types, although there were some noteworthy differences. Statistical derivation of symptom clusters is in need of greater methodological guidance. A psychosocial pathway in the management of symptom clusters may improve quality of life. Biological mechanisms underpinning symptom clusters need to be delineated by future research. A framework for evidence-based screening, assessment, treatment, and follow-up of symptom clusters in advanced cancer is essential. Copyright © 2016 American Academy of Hospice and Palliative Medicine. Published by Elsevier Inc. All rights reserved.
Local Prediction Models on Mid-Atlantic Ridge MORB by Principal Component Regression
NASA Astrophysics Data System (ADS)
Ling, X.; Snow, J. E.; Chin, W.
2017-12-01
The isotopic compositions of the daughter isotopes of long-lived radioactive systems (Sr, Nd, Hf and Pb) can be used to map the scale and history of mantle heterogeneities beneath mid-ocean ridges. Our goal is to relate the multidimensional structure in the existing isotopic dataset with an underlying physical reality of mantle sources. The numerical technique of Principal Component Analysis is useful to reduce the linear dependence of the data to a minimum set of orthogonal eigenvectors encapsulating the information contained (cf. Agranier et al., 2005). The dataset used for this study covers almost all the MORBs along the Mid-Atlantic Ridge (MAR), from 54°S to 77°N and 8.8°W to -46.7°W, including a replication of the published dataset of Agranier et al. (2005) plus 53 basalt samples dredged and analyzed since then (data from PetDB). The principal components PC1 and PC2 account for 61.56% and 29.21%, respectively, of the total variability of the isotope ratios. The samples with compositions similar to HIMU, EM and DM are identified to better understand the PCs. PC1 and PC2 account for HIMU and EM, whereas PC2 has limited control over the DM source. PC3 is more strongly controlled by the depleted mantle source than PC2. This means that all three principal components have a high degree of significance relevant to the established mantle sources. We also tested the relationship between mantle heterogeneity and sample locality. The k-means clustering algorithm is a type of unsupervised learning that finds groups in the data based on feature similarity. The PC factor scores of each sample are clustered into three groups. Clusters one and three alternate on the north and south MAR. Cluster two appears from 45.18°N to 0.79°N and -27.9°W to -30.40°W, alternating with cluster one. The ridge has been preliminarily divided into 16 sections considering both the clusters and ridge segments. The principal component regression models the section based on 6 isotope ratios and PCs.
The prediction residual is about 1-2 km. It means that the combined 5 isotopes are a strong predictor of geographic location along the ridge, a slightly surprising result. PCR is a robust and powerful method for both visualizing and manipulating the multidimensional representation of isotope data.
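The k-means step applied to the PC factor scores can be sketched with a plain implementation of Lloyd's algorithm. This is a minimal illustration under stated assumptions: the PC1/PC2 scores are made up, the initial centroids are picked by hand, and the function name is hypothetical. It does not reproduce the authors' data or software.

```python
import math


def kmeans(points, initial_centroids, n_iter=20):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid update."""
    centroids = [tuple(c) for c in initial_centroids]
    labels = []
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centroids.append(centroids[k])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels, centroids


# Hypothetical (PC1, PC2) factor scores for nine samples.
scores = [
    (-2.0, 0.1), (-1.8, -0.2), (-2.2, 0.0),
    (0.1, 1.9), (0.0, 2.1), (-0.1, 2.0),
    (2.0, -1.0), (2.1, -0.9), (1.9, -1.1),
]
labels, centroids = kmeans(scores, [scores[0], scores[3], scores[6]])
print(labels)
```

In practice the initial centroids are usually chosen at random and the algorithm is restarted several times, since the result of k-means depends on initialization.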
SELF-ORGANIZING MAPS FOR INTEGRATED ASSESSMENT OF THE MID-ATLANTIC REGION
A new method was developed to perform an environmental assessment for the Mid-Atlantic Region (MAR). This was a combination of the self-organizing map (SOM) neural network and principal component analysis (PCA). The method is capable of clustering ecosystems in terms of envi...
Characterization of spatial and temporal variability in hydrochemistry of Johor Straits, Malaysia.
Abdullah, Pauzi; Abdullah, Sharifah Mastura Syed; Jaafar, Othman; Mahmud, Mastura; Khalik, Wan Mohd Afiq Wan Mohd
2015-12-15
Characterization of hydrochemistry changes in Johor Straits within 5 years of monitoring works was successfully carried out. Water quality data sets (27 stations and 19 parameters) collected in this area were interpreted subject to multivariate statistical analysis. Cluster analysis grouped all the stations into four clusters ((Dlink/Dmax) × 100<90) and two clusters ((Dlink/Dmax) × 100<80) for site and period similarities. Principal component analysis rendered six significant components (eigenvalue>1) that explained 82.6% of the total variance of the data set. Classification matrix of discriminant analysis assigned 88.9-92.6% and 83.3-100% correctness in spatial and temporal variability, respectively. Time series analysis then confirmed that only four parameters did not change significantly over time. Therefore, it is imperative that the environmental impact of reclamation and dredging works, municipal or industrial discharge, marine aquaculture and shipping activities in this area be effectively controlled and managed. Copyright © 2015 Elsevier Ltd. All rights reserved.
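The "eigenvalue>1" rule used above to select significant components (often called the Kaiser criterion) and the resulting share of explained variance can be sketched as follows. The eigenvalues are hypothetical, chosen only so that they sum to the number of standardized variables, and the function name is illustrative.

```python
def kaiser_selection(eigenvalues):
    """Keep components with eigenvalue > 1 and report their share of total variance."""
    total = sum(eigenvalues)
    retained = [ev for ev in sorted(eigenvalues, reverse=True) if ev > 1.0]
    return retained, sum(retained) / total


# Hypothetical eigenvalues of a correlation matrix for 8 standardized
# variables (they sum to 8, the trace of the correlation matrix).
eigenvalues = [3.2, 1.9, 1.1, 0.7, 0.5, 0.3, 0.2, 0.1]
retained, explained = kaiser_selection(eigenvalues)
print(len(retained), round(explained, 3))
```

The threshold of 1 reflects the fact that each standardized variable contributes one unit of variance, so a retained component explains more than any single original variable.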
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lindsey, Cary R.; Neupane, Ghanashym; Spycher, Nicolas; ...
2018-01-03
Although many Known Geothermal Resource Areas in Oregon and Idaho were identified during the 1970s and 1980s, few were subsequently developed commercially. Because of advances in power plant design and energy conversion efficiency since the 1980s, some previously identified KGRAs may now be economically viable prospects. Unfortunately, available characterization data vary widely in accuracy, precision, and granularity, making assessments problematic. In this paper, we suggest a procedure for comparing test areas against proven resources using Principal Component Analysis and cluster identification. The result is a low-cost tool for evaluating potential exploration targets using uncertain or incomplete data.
Perceptions of Principal Attributes in the Era of Accountability
ERIC Educational Resources Information Center
Mosley, Jahmal I.
2010-01-01
This dissertation investigates Vermont principals' perceptions of leadership attributes linked to the role of the principal. It is guided by four research questions: (1) are there any clusters of participants who sorted the principal leadership attribute items similarly and differently; (2) how are the principal leadership attribute items within…
Principal Component and Linkage Analysis of Cardiovascular Risk Traits in the Norfolk Isolate
Cox, Hannah C.; Bellis, Claire; Lea, Rod A.; Quinlan, Sharon; Hughes, Roger; Dyer, Thomas; Charlesworth, Jac; Blangero, John; Griffiths, Lyn R.
2009-01-01
Objective(s) An individual's risk of developing cardiovascular disease (CVD) is influenced by genetic factors. This study focussed on mapping genetic loci for CVD-risk traits in a unique population isolate derived from Norfolk Island. Methods This investigation focussed on 377 individuals descended from the population founders. Principal component analysis was used to extract orthogonal components from 11 cardiovascular risk traits. Multipoint variance component methods were used to assess genome-wide linkage using SOLAR to the derived factors. A total of 285 of the 377 related individuals were informative for linkage analysis. Results A total of 4 principal components accounting for 83% of the total variance were derived. Principal component 1 was loaded with body size indicators; principal component 2 with body size, cholesterol and triglyceride levels; principal component 3 with the blood pressures; and principal component 4 with LDL-cholesterol and total cholesterol levels. Suggestive evidence of linkage for principal component 2 (h² = 0.35) was observed on chromosome 5q35 (LOD = 1.85; p = 0.0008). Peak regions on chromosome 10p11.2 (LOD = 1.27; p = 0.005) and 12q13 (LOD = 1.63; p = 0.003) were also observed to segregate with principal components 1 (h² = 0.33) and 4 (h² = 0.42), respectively. Conclusion(s): This study investigated a number of CVD risk traits in a unique isolated population. Findings support the clustering of CVD risk traits and provide interesting evidence of a region on chromosome 5q35 segregating with weight, waist circumference, HDL-c and total triglyceride levels. PMID:19339786
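How a principal component is extracted from correlated traits can be sketched with power iteration on the correlation matrix. This is a minimal illustration, assuming three hypothetical body-size traits for six individuals. It recovers only the first component, and the function names are illustrative rather than taken from the study or from SOLAR.

```python
import math


def standardize(rows):
    """Convert each column to z-scores (mean 0, sample standard deviation 1)."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    sds = [
        math.sqrt(sum((x - m) ** 2 for x in c) / (n - 1))
        for c, m in zip(cols, means)
    ]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]


def correlation_matrix(z):
    """Correlation matrix of already standardized data."""
    n, p = len(z), len(z[0])
    return [
        [sum(r[i] * r[j] for r in z) / (n - 1) for j in range(p)]
        for i in range(p)
    ]


def first_component(matrix, n_iter=500):
    """Power iteration: returns (largest eigenvalue, unit-length loading vector)."""
    p = len(matrix)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(n_iter):
        w = [sum(matrix[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(
        v[i] * sum(matrix[i][j] * v[j] for j in range(p)) for i in range(p)
    )
    return eigenvalue, v


# Hypothetical traits per individual: weight (kg), waist (cm), BMI.
traits = [
    (60, 70, 21), (72, 82, 24), (85, 95, 28),
    (90, 99, 30), (68, 80, 23), (77, 88, 26),
]
corr = correlation_matrix(standardize(traits))
eigenvalue, loadings = first_component(corr)
explained = eigenvalue / len(corr)  # share of total variance captured by PC1
print(round(explained, 3), [round(x, 3) for x in loadings])
```

Because the three traits move together, a single component captures most of the variance, which is the sense in which a component can be "loaded with body size indicators".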
Ocké, Marga C
2013-05-01
This paper aims to describe different approaches for studying the overall diet with advantages and limitations. Studies of the overall diet have emerged because the relationship between dietary intake and health is very complex with all kinds of interactions. These cannot be captured well by studying single dietary components. Three main approaches to study the overall diet can be distinguished. The first method is researcher-defined scores or indices of diet quality. These are usually based on guidelines for a healthy diet or on diets known to be healthy. The second approach, using principal component or cluster analysis, is driven by the underlying dietary data. In principal component analysis, scales are derived based on the underlying relationships between food groups, whereas in cluster analysis, subgroups of the population are created with people that cluster together based on their dietary intake. A third approach includes methods that are driven by a combination of biological pathways and the underlying dietary data. Reduced rank regression defines linear combinations of food intakes that maximally explain nutrient intakes or intermediate markers of disease. Decision tree analysis identifies subgroups of a population whose members share dietary characteristics that influence (intermediate markers of) disease. It is concluded that all approaches have advantages and limitations and essentially answer different questions. The third approach is still more in an exploration phase, but seems to have great potential with complementary value. More insight into the utility of conducting studies on the overall diet can be gained if more attention is given to methodological issues.
Chemometrics-based Approach in Analysis of Arnicae flos
Zheleva-Dimitrova, Dimitrina Zh.; Balabanova, Vessela; Gevrenova, Reneta; Doichinova, Irini; Vitkova, Antonina
2015-01-01
Introduction: Arnica montana flowers have a long history as herbal medicines for external use on injuries and rheumatic complaints. Objective: To investigate Arnicae flos of cultivated accessions from Bulgaria, Poland, Germany, Finland, and a pharmacy store for phenolic derivatives and sesquiterpene lactones (STLs). Materials and Methods: Samples of Arnica from nine origins were prepared by ultrasound-assisted extraction with 80% methanol for phenolic compounds analysis. Subsequent reverse-phase high-performance liquid chromatography (HPLC) separation of the analytes was performed using gradient elution and ultraviolet detection at 280 and 310 nm (phenolic acids), and 360 nm (flavonoids). Total STLs were determined in chloroform extracts by solid-phase extraction-HPLC at 225 nm. The HPLC-generated chromatographic data were analyzed using principal component analysis (PCA) and hierarchical clustering (HC). Results: The highest total amount of phenolic acids was found in the sample from the Botanical Garden at Joensuu University, Finland (2.36 mg/g dw). Astragalin, isoquercitrin, and isorhamnetin 3-glucoside were the main flavonol glycosides, being present at up to 3.37 mg/g (astragalin). Three well-defined clusters were distinguished by PCA and HC. Cluster C1 comprised the German and Finnish accessions, characterized by the highest content of flavonols. Cluster C2 included the Bulgarian and Polish samples, presenting a low content of flavonoids. Cluster C3 consisted of only one sample from a pharmacy store. Conclusion: A validated HPLC method for simultaneous determination of phenolic acids, flavonoid glycosides, and aglycones in A. montana flowers was developed. The PCA loading plot showed that quercetin, kaempferol, and isorhamnetin can be used to distinguish different Arnica accessions.
SUMMARY A principal component analysis (PCA) on 13 phenolic compounds and total amount of sesquiterpene lactones in the Arnicae flos collection tended to cluster the studied 9 accessions into three main groups. The profiles obtained demonstrated that the samples from Germany and Finland are characterized by greater amounts of phenolic derivatives than the Bulgarian and Polish ones. The PCA loading plot showed that quercetin, kaempferol and isorhamnetin can be used to distinguish different Arnica accessions. PMID:27013791
Alignments of the galaxies in and around the Virgo cluster with the local velocity shear
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lee, Jounghun; Rey, Soo Chang; Kim, Suk, E-mail: jounghun@astro.snu.ac.kr
2014-08-10
Observational evidence is presented for the alignment between the cosmic sheet and the principal axis of the velocity shear field at the position of the Virgo cluster. The galaxies in and around the Virgo cluster from the Extended Virgo Cluster Catalog that was recently constructed by Kim et al. are used to determine the direction of the local sheet. The peculiar velocity field reconstructed from the Sloan Digital Sky Survey Data Release 7 is analyzed to estimate the local velocity shear tensor at the Virgo center. Showing first that the minor principal axis of the local velocity shear tensor is almost parallel to the direction of the line of sight, we detect a clear signal of alignment between the positions of the Virgo satellites and the intermediate principal axis of the local velocity shear projected onto the plane of the sky. Furthermore, the dwarf satellites are found to appear more strongly aligned than their normal counterparts, which is interpreted as an indication of the following. (1) The normal satellites and the dwarf satellites fall in the Virgo cluster preferentially along the local filament and the local sheet, respectively. (2) The local filament is aligned with the minor principal axis of the local velocity shear while the local sheet is parallel to the plane spanned by the minor and intermediate principal axes. Our result is consistent with the recent numerical claim that the velocity shear is a good tracer of the cosmic web.
Network visualization of conformational sampling during molecular dynamics simulation.
Ahlstrom, Logan S; Baker, Joseph Lee; Ehrlich, Kent; Campbell, Zachary T; Patel, Sunita; Vorontsov, Ivan I; Tama, Florence; Miyashita, Osamu
2013-11-01
Effective data reduction methods are necessary for uncovering the inherent conformational relationships present in large molecular dynamics (MD) trajectories. Clustering algorithms provide a means to interpret the conformational sampling of molecules during simulation by grouping trajectory snapshots into a few subgroups, or clusters, but the relationships between the individual clusters may not be readily understood. Here we show that network analysis can be used to visualize the dominant conformational states explored during simulation as well as the connectivity between them, providing a more coherent description of conformational space than traditional clustering techniques alone. We compare the results of network visualization against 11 clustering algorithms and principal component conformer plots. Several MD simulations of proteins undergoing different conformational changes demonstrate the effectiveness of networks in reaching functional conclusions. Copyright © 2013 Elsevier Inc. All rights reserved.
EMPCA and Cluster Analysis of Quasar Spectra: Construction and Application to Simulated Spectra
NASA Astrophysics Data System (ADS)
Marrs, Adam; Leighly, Karen; Wagner, Cassidy; Macinnis, Francis
2017-01-01
Quasars have complex spectra with emission lines influenced by many factors. Therefore, to fully describe the spectrum requires specification of a large number of parameters, such as line equivalent width, blueshift, and ratios. Principal Component Analysis (PCA) aims to construct eigenvectors, or principal components, from the data with the goal of finding a few key parameters that can be used to predict the rest of the spectrum fairly well. Analysis of simulated quasar spectra was used to verify and justify our modified application of PCA. We used a variant of PCA called Weighted Expectation Maximization PCA (EMPCA; Bailey 2012) along with k-means cluster analysis to analyze simulated quasar spectra. Our approach combines both analytical methods to address two known problems with classical PCA. EMPCA uses weights to account for uncertainty and missing points in the spectra. K-means groups similar spectra together to address the nonlinearity of quasar spectra, specifically variance in blueshifts and widths of the emission lines. In producing and analyzing simulations, we first tested the effects of varying equivalent widths and blueshifts on the derived principal components, and explored the differences between standard PCA and EMPCA. We also tested the effects of varying signal-to-noise ratio. Next we used the results of fits to composite quasar spectra (see accompanying poster by Wagner et al.) to construct a set of realistic simulated spectra, and subjected those spectra to the EMPCA/k-means analysis. We concluded that our approach was validated when we found that the mean spectra from our k-means clusters derived from PCA projection coefficients reproduced the trends observed in the composite spectra. Furthermore, our method needed only two eigenvectors to identify both sets of correlations used to construct the simulations, as well as indicating the linear and nonlinear segments.
Comparing this to regular PCA, which can require a dozen or more components, or to direct spectral analysis that may need measurement of 20 fit parameters, shows why the dual application of these two techniques is such a powerful tool.
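The role of weights in handling missing or unreliable points can be illustrated with a small sketch. It assumes, as a simplification, that the projection coefficient of a spectrum onto a single eigenvector is obtained by weighted least squares, with weight 0 marking a bad pixel. The spectrum, eigenvector, and function name are hypothetical, and this is not the EMPCA code used by the authors.

```python
def weighted_coefficient(spectrum, weights, eigenvector):
    """Weighted least-squares coefficient c minimizing sum_j w_j (x_j - c * phi_j)^2."""
    numerator = sum(w * x * phi for x, w, phi in zip(spectrum, weights, eigenvector))
    denominator = sum(w * phi * phi for w, phi in zip(weights, eigenvector))
    return numerator / denominator


# Hypothetical unit-length eigenvector and a spectrum equal to 4 * eigenvector,
# except for the last pixel, which is corrupted (e.g. a cosmic-ray hit).
eigenvector = [0.5, 0.5, 0.5, 0.5]
spectrum = [2.0, 2.0, 2.0, 100.0]

# Classical projection treats every pixel equally and is thrown off by the outlier.
unweighted = weighted_coefficient(spectrum, [1.0, 1.0, 1.0, 1.0], eigenvector)

# Giving the bad pixel zero weight recovers the underlying coefficient.
weighted = weighted_coefficient(spectrum, [1.0, 1.0, 1.0, 0.0], eigenvector)
print(unweighted, weighted)
```

With equal weights the expression reduces to the ordinary dot-product projection used in classical PCA, which is why a single corrupted pixel distorts the unweighted coefficient.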
Holmes, Sean T; Iuliucci, Robbie J; Mueller, Karl T; Dybowski, Cecil
2015-11-10
Calculations of the principal components of magnetic-shielding tensors in crystalline solids require the inclusion of the effects of lattice structure on the local electronic environment to obtain significant agreement with experimental NMR measurements. We assess periodic (GIPAW) and GIAO/symmetry-adapted cluster (SAC) models for computing magnetic-shielding tensors by calculations on a test set containing 72 insulating molecular solids, with a total of 393 principal components of chemical-shift tensors from 13C, 15N, 19F, and 31P sites. When clusters are carefully designed to represent the local solid-state environment and when periodic calculations include sufficient variability, both methods predict magnetic-shielding tensors that agree well with experimental chemical-shift values, demonstrating the correspondence of the two computational techniques. At the basis-set limit, we find that the small differences in the computed values have no statistical significance for three of the four nuclides considered. Subsequently, we explore the effects of additional DFT methods available only with the GIAO/cluster approach, particularly the use of hybrid-GGA functionals, meta-GGA functionals, and hybrid meta-GGA functionals that demonstrate improved agreement in calculations on symmetry-adapted clusters. We demonstrate that meta-GGA functionals improve computed NMR parameters over those obtained by GGA functionals in all cases, and that hybrid functionals improve computed results over the respective pure DFT functional for all nuclides except 15N.
Spatial assessment of water quality using chemometrics in the Pearl River Estuary, China
NASA Astrophysics Data System (ADS)
Wu, Meilin; Wang, Youshao; Dong, Junde; Sun, Fulin; Wang, Yutu; Hong, Yiguo
2017-03-01
A cruise was commissioned in the summer of 2009 to evaluate water quality in the Pearl River Estuary (PRE). Chemometrics such as Principal Component Analysis (PCA), Cluster analysis (CA) and Self-Organizing Map (SOM) were employed to identify anthropogenic and natural influences on estuary water quality. The scores of stations in the surface layer on the first principal component (PC1) were related to NH4-N, PO4-P, NO2-N, NO3-N, TP, and Chlorophyll a, while those on the second principal component (PC2) were related to salinity, turbidity, and SiO3-Si. Similarly, the scores of stations in the bottom layer on PC1 were related to PO4-P, NO2-N, NO3-N, and TP, while those on PC2 were related to salinity, Chlorophyll a, NH4-N, and SiO3-Si. Results of the PCA identified the spatial distribution of the surface and bottom water quality, namely the Guangzhou urban reach, Middle reach, and Lower reach of the estuary. Cluster analysis and PCA produced similar results. The self-organizing map delineated the Guangzhou urban reach of the Pearl River, which was mainly influenced by human activities. The middle and lower reaches of the PRE were mainly influenced by the waters of the South China Sea. The information extracted by PCA, CA, and SOM would be very useful to regional agencies in developing a strategy to carry out scientific plans for resource use based on marine system functions.
Steindl, Theodora M; Crump, Carolyn E; Hayden, Frederick G; Langer, Thierry
2005-10-06
The development and application of a sophisticated virtual screening and selection protocol to identify potential, novel inhibitors of the human rhinovirus coat protein employing various computer-assisted strategies are described. A large commercially available database of compounds was screened using a highly selective, structure-based pharmacophore model generated with the program Catalyst. A docking study and a principal component analysis were carried out within the software package Cerius and served to validate and further refine the obtained results. These combined efforts led to the selection of six candidate structures, for which in vitro anti-rhinoviral activity could be shown in a biological assay.
[A spatial adaptive algorithm for endmember extraction on multispectral remote sensing image].
Zhu, Chang-Ming; Luo, Jian-Cheng; Shen, Zhan-Feng; Li, Jun-Li; Hu, Xiao-Dong
2011-10-01
Because the convex cone analysis (CCA) method can extract only a limited number of endmembers from multispectral imagery, this paper proposes a new endmember extraction method for multispectral remote sensing images, using spatially adaptive spectral feature analysis based on spatial clustering and image slicing. First, to remove spatial and spectral redundancy, the principal component analysis (PCA) algorithm was used to reduce the dimensionality of the multispectral data. Second, the iterative self-organizing data analysis technique algorithm (ISODATA) was used to cluster the image according to the spectral similarity of pixels. Then, through cluster post-processing and merging of small clusters, the whole image was divided into several blocks (tiles). Finally, the number of endmembers was determined from the complexity of each block's landscape and the features of its scatter diagram, and the endmembers were extracted with the hourglass algorithm. An endmember extraction experiment on TM multispectral imagery showed that the method can extract endmember spectra from multispectral imagery effectively. Moreover, the method overcame the limitation on the number of endmembers and improved the accuracy of endmember extraction. The method provides a new approach to endmember extraction from multispectral images.
Advanced Treatment Monitoring for Olympic-Level Athletes Using Unsupervised Modeling Techniques
Siedlik, Jacob A.; Bergeron, Charles; Cooper, Michael; Emmons, Russell; Moreau, William; Nabhan, Dustin; Gallagher, Philip; Vardiman, John P.
2016-01-01
Context Analysis of injury and illness data collected at large international competitions provides the US Olympic Committee and the national governing bodies for each sport with information to best prepare for future competitions. Research in which authors have evaluated medical contacts to provide the expected level of medical care and sports medicine services at international competitions is limited. Objective To analyze the medical-contact data for athletes, staff, and coaches who participated in the 2011 Pan American Games in Guadalajara, Mexico, using unsupervised modeling techniques to identify underlying treatment patterns. Design Descriptive epidemiology study. Setting Pan American Games. Patients or Other Participants A total of 618 US athletes (337 males, 281 females) participated in the 2011 Pan American Games. Main Outcome Measure(s) Medical data were recorded from the injury-evaluation and injury-treatment forms used by clinicians assigned to the central US Olympic Committee Sport Medicine Clinic and satellite locations during the operational 17-day period of the 2011 Pan American Games. We used principal components analysis and agglomerative clustering algorithms to identify and define grouped modalities. Lift statistics were calculated for within-cluster subgroups. Results Principal component analyses identified 3 components, accounting for 72.3% of the variability in datasets. Plots of the principal components showed that individual contacts focused on 4 treatment clusters: massage, paired manipulation and mobilization, soft tissue therapy, and general medical. Conclusions Unsupervised modeling techniques were useful for visualizing complex treatment data and provided insights for improved treatment modeling in athletes. Given its ability to detect clinically relevant treatment pairings in large datasets, unsupervised modeling should be considered a feasible option for future analyses of medical-contact data from international competitions. 
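A statement such as "3 components, accounting for 72.3% of the variability" comes from the cumulative explained-variance curve of a PCA. The sketch below shows how that curve and the number of components needed for a given threshold could be computed. The data are synthetic (two hypothetical latent treatment patterns driving six variables), the function names are illustrative, and columns are assumed to be non-constant so that standardisation is defined.

```python
import numpy as np

def cumulative_explained_variance(data):
    """Cumulative fraction of variance explained by successive principal components."""
    standardised = (data - data.mean(axis=0)) / data.std(axis=0)
    singular_values = np.linalg.svd(standardised, compute_uv=False)
    variances = singular_values ** 2
    return np.cumsum(variances / variances.sum())

def components_needed(cumulative, threshold):
    """Smallest number of components whose cumulative share reaches the threshold."""
    count = int(np.searchsorted(cumulative, threshold)) + 1
    return min(count, len(cumulative))

# Synthetic example: 200 records, 6 variables, 2 underlying patterns plus noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
loadings = np.array([[1.0, 1.0, 1.0, 0.0, 0.0, 0.0],
                     [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]])
data = latent @ loadings + 0.05 * rng.normal(size=(200, 6))

cumulative = cumulative_explained_variance(data)
n_components = components_needed(cumulative, threshold=0.9)
```

With two latent patterns, two components carry nearly all of the variance in this toy data; real medical-contact data would generally need more components and a less clear-cut threshold.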
PMID:26794628
Wilderness ecology: virgin plant communities of the Boundary Waters Canoe Area.
Lewis F. Ohmann; Robert R. Ream
1971-01-01
Describes virgin plant communities in the Boundary Waters Canoe Area. Data from all vegetative components of 106 virgin upland stands were used to construct a community classification through a combination of agglomerative clustering and principal components analysis. Discusses the relation of communities to their environment and to past wildfires.
The fine-scale genetic structure and evolution of the Japanese population
Katsuya, Tomohiro; Kimura, Ryosuke; Nabika, Toru; Isomura, Minoru; Ohkubo, Takayoshi; Tabara, Yasuharu; Yamamoto, Ken; Yokota, Mitsuhiro; Liu, Xuanyao; Saw, Woei-Yuh; Mamatyusupu, Dolikun; Yang, Wenjun; Xu, Shuhua
2017-01-01
The contemporary Japanese populations largely consist of three genetically distinct groups—Hondo, Ryukyu and Ainu. By principal-component analysis, while the three groups can be clearly separated, the Hondo people, comprising 99% of the Japanese, form one almost indistinguishable cluster. To understand fine-scale genetic structure, we applied powerful haplotype-based statistical methods to genome-wide single nucleotide polymorphism data from 1600 Japanese individuals, sampled from eight distinct regions in Japan. We then combined the Japanese data with 26 other Asian populations data to analyze the shared ancestry and genetic differentiation. We found that the Japanese could be separated into nine genetic clusters in our dataset, showing a marked concordance with geography; and that major components of ancestry profile of Japanese were from the Korean and Han Chinese clusters. We also detected and dated admixture in the Japanese. While genetic differentiation between Ryukyu and Hondo was suggested to be caused in part by positive selection, genetic differentiation among the Hondo clusters appeared to result principally from genetic drift. Notably, in Asians, we found the possibility that positive selection accentuated genetic differentiation among distant populations but attenuated genetic differentiation among close populations. These findings are significant for studies of human evolution and medical genetics. PMID:29091727
A novel unsupervised spike sorting algorithm for intracranial EEG.
Yadav, R; Shah, A K; Loeb, J A; Swamy, M N S; Agarwal, R
2011-01-01
This paper presents a novel, unsupervised spike classification algorithm for intracranial EEG. The method combines template matching and principal component analysis (PCA) for building a dynamic patient-specific codebook without a priori knowledge of the spike waveforms. The problem of misclassification due to overlapping classes is resolved by identifying similar classes in the codebook using hierarchical clustering. Cluster quality is visually assessed by projecting inter- and intra-clusters onto a 3D plot. Intracranial EEG from 5 patients was utilized to optimize the algorithm. The resulting codebook retains 82.1% of the detected spikes in non-overlapping and disjoint clusters. Initial results suggest a definite role of this method for both rapid review and quantitation of interictal spikes that could enhance both clinical treatment and research studies on epileptic patients.
Pang, Yuanjie; Peng, Roger D; Jones, Miranda R; Francesconi, Kevin A; Goessler, Walter; Howard, Barbara V; Umans, Jason G; Best, Lyle G; Guallar, Eliseo; Post, Wendy S; Kaufman, Joel D; Vaidya, Dhananjay; Navas-Acien, Ana
2016-05-01
Natural and anthropogenic sources of metal exposure differ for urban and rural residents. We searched to identify patterns of metal mixtures which could suggest common environmental sources and/or metabolic pathways of different urinary metals, and compared metal-mixtures in two population-based studies from urban/sub-urban and rural/town areas in the US: the Multi-Ethnic Study of Atherosclerosis (MESA) and the Strong Heart Study (SHS). We studied a random sample of 308 White, Black, Chinese-American, and Hispanic participants in MESA (2000-2002) and 277 American Indian participants in SHS (1998-2003). We used principal component analysis (PCA), cluster analysis (CA), and linear discriminant analysis (LDA) to evaluate nine urinary metals (antimony [Sb], arsenic [As], cadmium [Cd], lead [Pb], molybdenum [Mo], selenium [Se], tungsten [W], uranium [U] and zinc [Zn]). For arsenic, we used the sum of inorganic and methylated species (∑As). All nine urinary metals were higher in SHS compared to MESA participants. PCA and CA revealed the same patterns in SHS, suggesting 4 distinct principal components (PC) or clusters (∑As-U-W, Pb-Sb, Cd-Zn, Mo-Se). In MESA, CA showed 2 large clusters (∑As-Mo-Sb-U-W, Cd-Pb-Se-Zn), while PCA showed 4 PCs (Sb-U-W, Pb-Se-Zn, Cd-Mo, ∑As). LDA indicated that ∑As, U, W, and Zn were the most discriminant variables distinguishing MESA and SHS participants. In SHS, the ∑As-U-W cluster and PC might reflect groundwater contamination in rural areas, and the Cd-Zn cluster and PC could reflect common sources from meat products or metabolic interactions. Among the metals assayed, ∑As, U, W and Zn differed the most between MESA and SHS, possibly reflecting disproportionate exposure from drinking water and perhaps food in rural Native communities compared to urban communities around the US. Copyright © 2016 Elsevier Inc. All rights reserved.
Xue, Gang; Song, Wen-qi; Li, Shu-chao
2015-01-01
To rapidly identify the different brands of fire resistive coating for steel structures in circulation, a new method for fast discrimination of coating varieties by near infrared spectroscopy was proposed. A raster scanning near infrared spectroscopy instrument and near infrared diffuse reflectance spectroscopy were used to collect the spectral curves of different brands of fire resistive coating for steel structures, and the spectral data were preprocessed with the standard normal variate transformation (SNV) and the Norris second derivative. Principal component analysis (PCA) was applied to the near infrared spectra for cluster analysis. The analysis showed that the cumulative reliability of PC1 to PC5 was 99.791%. The 3-dimensional plot was drawn with the scores of PC1, PC2 and PC3 × 10, which appeared to provide the best clustering of the varieties of fire resistive coating for steel structures. A total of 150 fire resistive coating samples were divided randomly into a calibration set and a validation set; the calibration set had 125 samples, with 25 samples of each variety, and the validation set had 25 samples, with 5 samples of each variety. From the principal component scores of unknown samples, Mahalanobis distances between each variety and the unknown samples were calculated to discriminate between varieties. In external verification with unknown samples, the qualitative analysis model gave a recognition ratio of 10%. The results demonstrated that this identification method can be used as a rapid, accurate method for classifying fire resistive coatings for steel structures and can provide a technical reference for market regulation.
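The discrimination step described above, assigning an unknown sample to the variety with the smallest Mahalanobis distance in principal component score space, can be sketched as follows. The scores here are synthetic stand-ins for PCA scores (5 varieties with 25 calibration samples each, matching the calibration set size in the abstract); the class centres, noise level, and function names are assumptions for illustration only.

```python
import numpy as np

def fit_classes(scores, labels):
    """Return {label: (mean vector, inverse covariance)} for each variety."""
    model = {}
    for lab in np.unique(labels):
        x = scores[labels == lab]
        model[int(lab)] = (x.mean(axis=0), np.linalg.inv(np.cov(x, rowvar=False)))
    return model

def mahalanobis(x, mean, inv_cov):
    """Mahalanobis distance from x to a class with the given mean and covariance."""
    d = x - mean
    return float(np.sqrt(d @ inv_cov @ d))

def classify(x, model):
    """Assign x to the class with the smallest Mahalanobis distance."""
    return min(model, key=lambda lab: mahalanobis(x, *model[lab]))

# Synthetic calibration set: 5 varieties x 25 samples, 3 PC scores each.
rng = np.random.default_rng(0)
centres = np.array([[0, 0, 0], [6, 0, 0], [0, 6, 0], [0, 0, 6], [6, 6, 6]], dtype=float)
scores = np.vstack([c + rng.normal(size=(25, 3)) for c in centres])
labels = np.repeat(np.arange(5), 25)

model = fit_classes(scores, labels)
predicted = classify(centres[3], model)   # an "unknown" sample at the centre of variety 3
```

Using a separate covariance per variety lets elongated clusters be handled sensibly; with few calibration samples per class, a pooled covariance may be the more stable choice.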
An Empirical Taxonomy of Hospital Governing Board Roles
Lee, Shoou-Yih D; Alexander, Jeffrey A; Wang, Virginia; Margolin, Frances S; Combes, John R
2008-01-01
Objective To develop a taxonomy of governing board roles in U.S. hospitals. Data Sources 2005 AHA Hospital Governance Survey, 2004 AHA Annual Survey of Hospitals, and Area Resource File. Study Design A governing board taxonomy was developed using cluster analysis. Results were validated and reviewed by industry experts. Differences in hospital and environmental characteristics across clusters were examined. Data Extraction Methods One-thousand three-hundred thirty-four hospitals with complete information on the study variables were included in the analysis. Principal Findings Five distinct clusters of hospital governing boards were identified. Statistical tests showed that the five clusters had high internal reliability and high internal validity. Statistically significant differences in hospital and environmental conditions were found among clusters. Conclusions The developed taxonomy provides policy makers, health care executives, and researchers a useful way to describe and understand hospital governing board roles. The taxonomy may also facilitate valid and systematic assessment of governance performance. Further, the taxonomy could be used as a framework for governing boards themselves to identify areas for improvement and direction for change. PMID:18355260
ERIC Educational Resources Information Center
Moss, S. C.; Hogg, J.
1990-01-01
Principal components analysis was employed on the Adaptive Behavior Scales with scores of 122 older (mean age 63.5) individuals with severe intellectual impairment living in England. The study found the structure of adaptive skills and interpersonal maladaptive behaviors similar to that found for younger retarded adults. Two factors, personal…
Unsupervised spike sorting based on discriminative subspace learning.
Keshtkaran, Mohammad Reza; Yang, Zhi
2014-01-01
Spike sorting is a fundamental preprocessing step for many neuroscience studies which rely on the analysis of spike trains. In this paper, we present two unsupervised spike sorting algorithms based on discriminative subspace learning. The first algorithm simultaneously learns the discriminative feature subspace and performs clustering. It uses histogram of features in the most discriminative projection to detect the number of neurons. The second algorithm performs hierarchical divisive clustering that learns a discriminative 1-dimensional subspace for clustering in each level of the hierarchy until achieving almost unimodal distribution in the subspace. The algorithms are tested on synthetic and in-vivo data, and are compared against two widely used spike sorting methods. The comparative results demonstrate that our spike sorting methods can achieve substantially higher accuracy in lower dimensional feature space, and they are highly robust to noise. Moreover, they provide significantly better cluster separability in the learned subspace than in the subspace obtained by principal component analysis or wavelet transform.
Traiperm, Paweena; Chow, Janene; Nopun, Possathorn; Staples, G; Swangpol, Sasivimon C
2017-12-01
The genus Argyreia Lour. is one of the species-rich Asian genera in the family Convolvulaceae. Several species complexes were recognized in which taxon delimitation was imprecise, especially when examining herbarium materials without fully developed open flowers. The main goal of this study is to investigate and describe leaf anatomy for some morphologically similar Argyreia using epidermal peeling, leaf and petiole transverse sections, and scanning electron microscopy. Phenetic analyses including cluster analysis and principal component analysis were used to investigate the similarity of these morpho-types. Anatomical differences observed between the morpho-types include epidermal cell walls and the trichome types on the leaf epidermis. Additional differences in the leaf and petiole transverse sections include the epidermal cell shape of the adaxial leaf blade, the leaf margins, and the petiole transverse sectional outline. The phenogram from cluster analysis using the UPGMA method represented four groups with an R value of 0.87. Moreover, the important quantitative and qualitative leaf anatomical traits of the four groups were confirmed by the principal component analysis of the first two components. The results from phenetic analyses confirmed the anatomical differentiation between the morpho-types. Leaf anatomical features regarded as particularly informative for morpho-type differentiation can be used to supplement macro morphological identification.
Measuring the Indonesian provinces competitiveness by using PCA technique
NASA Astrophysics Data System (ADS)
Runita, Ditha; Fajriyah, Rohmatul
2017-12-01
Indonesia is a country with a vast territory and 34 provinces. Building local competitiveness is critical to enhancing long-term national competitiveness, especially for a country as diverse as Indonesia. A competitive local government can attract and retain successful firms and raise living standards for its inhabitants, because investment and skilled workers gravitate from uncompetitive regions to more competitive ones. Although there are other methods for measuring competitiveness, here we demonstrate a simple method using principal component analysis (PCA), which can be applied directly to correlated, multivariate data. The analysis of Indonesian provinces yields 3 clusters based on the competitiveness measurement: Bad, Good and Best performing provinces.
Potential of SNP markers for the characterization of Brazilian cassava germplasm.
de Oliveira, Eder Jorge; Ferreira, Cláudia Fortes; da Silva Santos, Vanderlei; de Jesus, Onildo Nunes; Oliveira, Gilmara Alvarenga Fachardo; da Silva, Maiane Suzarte
2014-06-01
High-throughput markers, such as SNPs, along with different methodologies were used to evaluate the applicability of the Bayesian approach and the multivariate analysis in structuring the genetic diversity in cassavas. The objective of the present work was to evaluate the diversity and genetic structure of the largest cassava germplasm bank in Brazil. Complementary methodological approaches such as discriminant analysis of principal components (DAPC), Bayesian analysis and molecular analysis of variance (AMOVA) were used to understand the structure and diversity of 1,280 accessions genotyped using 402 single nucleotide polymorphism markers. The genetic diversity (0.327) and the average observed heterozygosity (0.322) were high considering the bi-allelic markers. In terms of population, the presence of a complex genetic structure was observed indicating the formation of 30 clusters by DAPC and 34 clusters by Bayesian analysis. Both methodologies presented difficulties and controversies in terms of the allocation of some accessions to specific clusters. However, the clusters suggested by the DAPC analysis seemed to be more consistent for presenting higher probability of allocation of the accessions within the clusters. Prior information related to breeding patterns and geographic origins of the accessions were not sufficient for providing clear differentiation between the clusters according to the AMOVA analysis. In contrast, the F ST was maximized when considering the clusters suggested by the Bayesian and DAPC analyses. The high frequency of germplasm exchange between producers and the subsequent alteration of the name of the same material may be one of the causes of the low association between genetic diversity and geographic origin. The results of this study may benefit cassava germplasm conservation programs, and contribute to the maximization of genetic gains in breeding programs.
Mantle, Peter; Modalca, Mirela; Nicholls, Andrew; Tatu, Calin; Tatu, Diana; Toncheva, Draga
2011-01-01
1H NMR spectroscopy of urine has been applied to exploring metabolomic differences between people diagnosed with Balkan endemic nephropathy (BEN), and treated by haemodialysis, and those without overt renal disease in Romania and Bulgaria. Convenience sampling was made from patients receiving haemodialysis in hospital and healthy controls in their village. Principal component analysis clustered healthy controls from both countries together. Bulgarian BEN patients clustered separately from controls, though in the same space. However, Romanian BEN patients not only also clustered away from controls but also clustered separately from the BEN patients in Bulgaria. Notably, the urinary metabolomic data of two people sampled as Romanian controls clustered within the Romanian BEN group. One of these had been suspected of incipient symptoms of BEN at the time of selection as a ‘healthy’ control. This implies, at first sight, that metabolomic analysis can be predictive of impending morbidity before conventional criteria can diagnose BEN. Separate clustering of BEN patients from Romania and Bulgaria could indicate difference in aetiology of this particular silent renal atrophy in different geographic foci across the Balkans. PMID:22069742
Phase stability and electronic structure of UMo2Al20: A first-principles study
NASA Astrophysics Data System (ADS)
Liu, Peng-Chuang; Xian, Ya-Jiang; Wang, Xin; Zhang, Yu-Ting; Zhang, Peng-Cheng
2017-09-01
In this paper, the phase stability of UMo2Al20 was explored using cluster formula in combination with first-principles calculations. Cluster formula analysis uncovered that the compound was composed of two principal clusters, i.e. [Mo-Al12] and [U-Al16]. The electronic interactions between U, Mo and Al atoms in this compound were discussed using elastic property, Bader charges and energy-resolved local bonding analysis, as well as the electronic interactions between Mo and Al atoms in [Mo-Al12] cluster and between U and Al atoms in [U-Al16] cluster. It revealed that UMo2Al20 satisfied the mechanical stability criterion for cubic system, and exhibited near ionic bonding character with weak bonding directionality. The calculations within both standard DFT and HSE frameworks demonstrated that U and Al atoms acted as an electron donor while Mo atoms acted as electron acceptor. The intrinsic stability of UMo2Al20 mainly stemmed from the bonding states of Mo-Al bonds and Al-Al bonds in [Mo-Al12] cluster. These calculations provide a further insight on the CeCr2Al20-type ternary compounds.
Rabey, Martin; Slater, Helen; OʼSullivan, Peter; Beales, Darren; Smith, Anne
2015-10-01
The objectives of this study were to explore the existence of subgroups in a cohort with chronic low back pain (n = 294) based on the results of multimodal sensory testing and profile subgroups on demographic, psychological, lifestyle, and general health factors. Bedside (2-point discrimination, brush, vibration and pinprick perception, temporal summation on repeated monofilament stimulation) and laboratory (mechanical detection threshold, pressure, heat and cold pain thresholds, conditioned pain modulation) sensory testing were examined at wrist and lumbar sites. Data were entered into principal component analysis, and 5 component scores were entered into latent class analysis. Three clusters, with different sensory characteristics, were derived. Cluster 1 (31.9%) was characterised by average to high temperature and pressure pain sensitivity. Cluster 2 (52.0%) was characterised by average to high pressure pain sensitivity. Cluster 3 (16.0%) was characterised by low temperature and pressure pain sensitivity. Temporal summation occurred significantly more frequently in cluster 1. Subgroups were profiled on pain intensity, disability, depression, anxiety, stress, life events, fear avoidance, catastrophizing, perception of the low back region, comorbidities, body mass index, multiple pain sites, sleep, and activity levels. Clusters 1 and 2 had a significantly greater proportion of female participants and higher depression and sleep disturbance scores than cluster 3. The proportion of participants undertaking <300 minutes per week of moderate activity was significantly greater in cluster 1 than in clusters 2 and 3. Low back pain, therefore, does not appear to be homogeneous. Pain mechanisms relating to presentations of each subgroup were postulated. Future research may investigate prognoses and interventions tailored towards these subgroups.
2010-01-01
Background Cluster analysis, and in particular hierarchical clustering, is widely used to extract information from gene expression data. The aim is to discover new classes, or sub-classes, of either individuals or genes. Performing a cluster analysis commonly involves decisions on how to handle missing values, standardize the data and select genes. In addition, pre-processing, involving various types of filtration and normalization procedures, can have an effect on the ability to discover biologically relevant classes. Here we consider cluster analysis in a broad sense and perform a comprehensive evaluation that covers several aspects of cluster analyses, including normalization. Result We evaluated 2780 cluster analysis methods on seven publicly available 2-channel microarray data sets with common reference designs. Each cluster analysis method differed in data normalization (5 normalizations were considered), missing value imputation (2), standardization of data (2), gene selection (19) or clustering method (11). The cluster analyses are evaluated using known classes, such as cancer types, and the adjusted Rand index. The performances of the different analyses vary between the data sets and it is difficult to give general recommendations. However, normalization, gene selection and clustering method are all variables that have a significant impact on the performance. In particular, gene selection is important and it is generally necessary to include a relatively large number of genes in order to get good performance. Selecting genes with high standard deviation or using principal component analysis are shown to be the preferred gene selection methods. Hierarchical clustering using Ward's method, k-means clustering and Mclust are the clustering methods considered in this paper that achieve the highest adjusted Rand index.
Normalization can have a significant positive impact on the ability to cluster individuals, and there are indications that background correction is preferable, in particular if the gene selection is successful. However, this is an area that needs to be studied further in order to draw any general conclusions. Conclusions The choice of cluster analysis, and in particular gene selection, has a large impact on the ability to cluster individuals correctly based on expression profiles. Normalization has a positive effect, but the relative performance of different normalizations is an area that needs more research. In summary, although clustering, gene selection and normalization are considered standard methods in bioinformatics, our comprehensive analysis shows that selecting the right methods, and the right combinations of methods, is far from trivial and that much is still unexplored in what is considered to be the most basic analysis of genomic data. PMID:20937082
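The adjusted Rand index used above to score a clustering against known classes can be computed directly from the contingency table of the two labelings. The following is a minimal pure-Python sketch of the standard pair-counting formula; the function name and the toy label lists are illustrative, and at least two samples are assumed.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(true_labels, pred_labels):
    """Adjusted Rand index between two labelings of the same samples."""
    n = len(true_labels)
    # Pairs of samples that share a cell, a true class, or a predicted cluster.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(true_labels, pred_labels)).values())
    sum_true = sum(comb(c, 2) for c in Counter(true_labels).values())
    sum_pred = sum(comb(c, 2) for c in Counter(pred_labels).values())
    expected = sum_true * sum_pred / comb(n, 2)   # agreement expected by chance
    max_index = 0.5 * (sum_true + sum_pred)
    if max_index == expected:                     # degenerate case, e.g. one cluster each
        return 1.0
    return (sum_cells - expected) / (max_index - expected)

known_classes = [0, 0, 1, 1]
same_up_to_renaming = adjusted_rand_index(known_classes, [1, 1, 0, 0])
crossed = adjusted_rand_index(known_classes, [0, 1, 0, 1])
```

The index is 1 for identical partitions regardless of how clusters are named, about 0 for random agreement, and can be negative when agreement is worse than chance.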
Dynamic of consumer groups and response of commodity markets by principal component analysis
NASA Astrophysics Data System (ADS)
Nobi, Ashadun; Alam, Shafiqul; Lee, Jae Woo
2017-09-01
This study investigates financial states and group dynamics by applying principal component analysis to the cross-correlation coefficients of the daily returns of commodity futures. The eigenvalues of the cross-correlation matrix in the 6-month timeframe display similar values during 2010-2011, but decline after 2012. A sharp drop in eigenvalue implies a significant change in the market state. Three commodity sectors, energy, metals and agriculture, are projected onto a two-dimensional space spanned by two principal components (PCs). We observe that they form three distinct clusters corresponding to the sectors. However, commodities with distinct features intermingled with one another and scattered during severe crises, such as the European sovereign debt crisis. We observe notable changes in the positions of the groups in the two-dimensional space during financial crises. By considering the first principal component (PC1) within the 6-month moving timeframe, we observe that commodities of the same group change states in a similar pattern, and that the change of state of one group can be used as a warning for other groups.
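Tracking the largest eigenvalue of the cross-correlation matrix in a moving window, as described above, can be sketched as follows. The returns are synthetic, with a common market factor that weakens halfway through, and the window of 120 trading days is an assumed stand-in for the 6-month timeframe; neither is taken from the study.

```python
import numpy as np

def rolling_largest_eigenvalue(returns, window):
    """Largest eigenvalue of the correlation matrix in each sliding window."""
    values = []
    for start in range(returns.shape[0] - window + 1):
        corr = np.corrcoef(returns[start:start + window], rowvar=False)
        # eigvalsh returns eigenvalues of a symmetric matrix in ascending order.
        values.append(np.linalg.eigvalsh(corr)[-1])
    return np.array(values)

# Synthetic daily returns for 6 hypothetical commodities over 500 days.
rng = np.random.default_rng(0)
n_days, n_assets = 500, 6
market = rng.normal(size=(n_days, 1))
beta = np.where(np.arange(n_days)[:, None] < 250, 2.0, 0.2)   # coupling weakens later
returns = beta * market + rng.normal(size=(n_days, n_assets))

largest = rolling_largest_eigenvalue(returns, window=120)
```

A large leading eigenvalue means the commodities move together; a sharp drop in `largest` signals that the collective mode has weakened, which is the kind of change in market state the abstract describes.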
Oberle, Michael; Wohlwend, Nadia; Jonas, Daniel; Maurer, Florian P; Jost, Geraldine; Tschudin-Sutter, Sarah; Vranckx, Katleen; Egli, Adrian
2016-01-01
The technical, biological, and inter-center reproducibility of matrix-assisted laser desorption ionization-time of flight mass spectrometry (MALDI TOF MS) typing data has not yet been explored. The aim of this study is to compare typing data from multiple centers employing bioinformatics using bacterial strains from two past outbreaks and non-related strains. Participants received twelve extended spectrum betalactamase-producing E. coli isolates and followed the same standard operating procedure (SOP) including a full-protein extraction protocol. All laboratories provided visually read spectra via flexAnalysis (Bruker, Germany). Raw data from each laboratory allowed calculating the technical and biological reproducibility between centers using BioNumerics (Applied Maths NV, Belgium). Technical and biological reproducibility ranged between 96.8-99.4% and 47.6-94.4%, respectively. The inter-center reproducibility showed a comparable clustering among identical isolates. Principal component analysis indicated a higher tendency to cluster within the same center. Therefore, we used a discriminant analysis, which completely separated the clusters. Next, we defined a reference center and performed a statistical analysis to identify specific peaks to identify the outbreak clusters. Finally, we used a classifier algorithm and a linear support vector machine on the determined peaks as classifier. A validation showed that within the set of the reference center, the identification of the cluster was 100% correct with a large contrast between the score with the correct cluster and the next best scoring cluster. Based on the sufficient technical and biological reproducibility of MALDI-TOF MS based spectra, detection of specific clusters is possible from spectra obtained from different centers. However, we believe that a shared SOP and a bioinformatics approach are required to make the analysis robust and reliable.
NASA Astrophysics Data System (ADS)
Huang, W.; Campredon, R.; Abrao, J. J.; Bernat, M.; Latouche, C.
1994-06-01
In the last decade, the Atlantic coast of south-eastern Brazil has been affected by increasing deforestation and anthropogenic effluents. Sediments in the coastal lagoons have recorded the process of such environmental change. Thirty-seven sediment samples from three cores in Piratininga Lagoon, Rio de Janeiro, were analyzed for their major components and minor element concentrations in order to examine geochemical characteristics and the depositional environment and to investigate the variation of heavy metals of environmental concern. Two multivariate analysis methods, principal component analysis and cluster analysis, were performed on the analytical data set to help visualize the sample clusters and the element associations. On the whole, the sediment samples from each core are similar and the sample clusters corresponding to the three cores are clearly separated, as a result of the different conditions of sedimentation. Some changes in the depositional environment are recognized using the results of multivariate analysis. The enrichment of Pb, Cu, and Zn in the upper parts of cores is in agreement with increasing anthropogenic influx (pollution).
NASA Technical Reports Server (NTRS)
Li, Z. K.
1985-01-01
A specialized program was developed for flow cytometric list-mode data using a hierarchical tree method for identifying and enumerating individual subpopulations, the method of principal components for a two-dimensional display of a 6-parameter data array, and a standard sorting algorithm for characterizing subpopulations. The program was tested against a published data set subjected to cluster analysis and experimental data sets from controlled flow cytometry experiments using a Coulter Electronics EPICS V Cell Sorter. A version of the program in compiled BASIC is usable on a 16-bit microcomputer with the MS-DOS operating system. It is specialized for 6 parameters and up to 20,000 cells. Its two-dimensional display of Euclidean distances reveals clusters clearly, as does its 1-dimensional display. The identified subpopulations can, in suitable experiments, be related to functional subpopulations of cells.
Genetic diversity studies in pea (Pisum sativum L.) using simple sequence repeat markers.
Kumari, P; Basal, N; Singh, A K; Rai, V P; Srivastava, C P; Singh, P K
2013-03-13
The genetic diversity among 28 pea (Pisum sativum L.) genotypes was analyzed using 32 simple sequence repeat markers. A total of 44 polymorphic bands, with an average of 2.1 bands per primer, were obtained. The polymorphism information content ranged from 0.657 to 0.309 with an average of 0.493. The variation in genetic diversity among these cultivars ranged from 0.11 to 0.73. Cluster analysis based on Jaccard's similarity coefficient using the unweighted pair-group method with arithmetic mean (UPGMA) revealed 2 distinct clusters, I and II, comprising 6 and 22 genotypes, respectively. Cluster II was further differentiated into 2 subclusters, IIA and IIB, with 12 and 10 genotypes, respectively. Principal component (PC) analysis revealed results similar to those of UPGMA. The first, second, and third PCs contributed 21.6, 16.1, and 14.0% of the variation, respectively; cumulative variation of the first 3 PCs was 51.7%.
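The clustering step described in this abstract (Jaccard similarity followed by UPGMA) can be sketched in a few lines. The following is a minimal illustration only, not the authors' software: the band scores, genotype names, and the helper functions `jaccard_similarity` and `upgma` are hypothetical, and the distance is assumed to be 1 minus the Jaccard similarity.

```python
from itertools import combinations

def jaccard_similarity(a, b):
    """Jaccard similarity between two 0/1 band-presence vectors."""
    shared = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return shared / either if either else 1.0

def upgma(profiles):
    """Average-linkage (UPGMA) clustering on Jaccard distances.

    Returns the merge history as (members_a, members_b, distance) tuples.
    """
    names = list(profiles)
    # Pairwise distances between the original items (1 - similarity).
    dist = {
        frozenset(pair): 1.0 - jaccard_similarity(profiles[pair[0]], profiles[pair[1]])
        for pair in combinations(names, 2)
    }
    clusters = [(name,) for name in names]
    merges = []

    def avg_dist(c1, c2):
        # UPGMA: unweighted mean of all between-cluster item distances.
        total = sum(dist[frozenset((i, j))] for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Merge the closest pair of clusters (first pair wins on ties).
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: avg_dist(*pair))
        merges.append((c1, c2, avg_dist(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges

# Hypothetical band scores (1 = band present) for four made-up genotypes.
bands = {
    "G1": [1, 1, 0, 1, 0, 0],
    "G2": [1, 1, 0, 1, 1, 0],
    "G3": [0, 0, 1, 0, 1, 1],
    "G4": [0, 0, 1, 1, 1, 1],
}
merges = upgma(bands)
for left, right, distance in merges:
    print(left, right, round(distance, 3))
```

In this toy data the two tightest pairs merge first, and the final merge joins the two resulting groups, which is the kind of two-cluster structure a dendrogram would display.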
USDA-ARS's Scientific Manuscript database
The United States Department of Agriculture (USDA), Agricultural Research Service (ARS), Plant Genetic Resources Conservation Unit’s (PGRCU) sunn hemp (Crotalaria juncea L.) germplasm collection consists of 22 accessions. The sixteen (16) most seed-productive accessions were selected. These access...
ERIC Educational Resources Information Center
Silva, Marisa; da Silva, Sofia Marques; Araújo, Helena C
2017-01-01
This article presents an analysis of school principals' perspectives on networking concerning schools and school clusters from areas of social vulnerability (Educational Territories of Priority Intervention (TEIP)) in Northern Portugal. The meanings, purpose, benefits and difficulties of networking in education are examined, based on interviews…
Koželj, Vesna; Vegnuti, Miljana; Drevenšek, Martina; Hortis-Dzierzbicka, Maria; Gonzalez-Landa, Gonzalo; Hanstein, Siiri; Klimova, Irena; Kobus, Kazimierz; Kobus-Zaleśna, Katarzyna; Semb, Gunvor; Shaw, Bill
2012-11-01
To compare palatal dimensions in 6-year-old children with unilateral cleft lip and palate (UCLP) treated by different protocols with those of noncleft children. Retrospective intercenter outcome study. Patients: Upper dental casts from 129 children with repaired UCLP and 30 controls were analyzed by the trigonometric method. Six European cleft centers. Main outcome measures: Sagittal, transverse, and vertical dimensions of the palate were observed. Palate variables were analyzed with descriptive methods and nonparametric tests. Because many characteristics were measured on a relatively small number of subjects, hierarchical clustering, k-means clustering, and principal component analyses were used. Mean values of the observed dimensions for five cleft groups differed significantly from the control (p < .05). The group with one-stage closure of the cleft differed significantly from all other cleft groups in most variables (p < .05). Principal component analysis of all 159 cases identified three clusters with specific morphologic characteristics of the palate. A similar number of treated children were classified into each cluster, while all children without clefts were classified in the same cluster. The percentage of treated children from a particular group that fit this cluster ranged from 0% to 70% and increased with age at palatal closure and number of primary surgical procedures. At 6 years of age, children with stepwise repair and hard palate closure after the age of two more frequently have palatal dimensions resembling those of noncleft controls than children with earlier palatal closure and one-stage cleft repair.
Symptom clustering and quality of life in patients with ovarian cancer undergoing chemotherapy.
Nho, Ju-Hee; Reul Kim, Sung; Nam, Joo-Hyun
2017-10-01
The symptom clusters in patients with ovarian cancer undergoing chemotherapy have not been well evaluated. We investigated the symptom clusters and effects of symptom clusters on the quality of life of patients with ovarian cancer. We recruited 210 ovarian cancer patients being treated with chemotherapy and used a descriptive cross-sectional study design to collect information on their symptoms. To determine inter-relationships among symptoms, a principal component analysis with varimax rotation was performed based on the patient's symptoms (fatigue, pain, sleep disturbance, chemotherapy-induced peripheral neuropathy, anxiety, depression, and sexual dysfunction). All patients had experienced at least two domains of concurrent symptoms, and there were two types of symptom clusters. The first symptom cluster consisted of anxiety, depression, fatigue, and sleep disturbance symptoms, while the second symptom cluster consisted of pain and chemotherapy-induced peripheral neuropathy symptoms. Our subgroup cluster analysis showed that ovarian cancer patients with higher-scoring symptoms had significantly poorer quality of life in both symptom cluster 1 and 2 subgroups, with subgroup-specific patterns. The symptom clusters were different depending on age, age at disease onset, disease duration, recurrence, and performance status of patients with ovarian cancer. In addition, ovarian cancer patients experienced different symptom clusters according to cancer stage. The current study demonstrated that there is a specific pattern of symptom clusters, and symptom clusters negatively influence the quality of life in patients with ovarian cancer. Identifying symptom clusters of ovarian cancer patients may have clinical implications in improving symptom management. Copyright © 2017 Elsevier Ltd. All rights reserved.
Butaciu, Sinziana; Senila, Marin; Sarbu, Costel; Ponta, Michaela; Tanaselia, Claudiu; Cadar, Oana; Roman, Marius; Radu, Emil; Sima, Mihaela; Frentiu, Tiberiu
2017-04-01
The study proposes a combined model based on diagrams (Gibbs, Piper, Stuyfzand Hydrogeochemical Classification System) and unsupervised statistical approaches (Cluster Analysis, Principal Component Analysis, Fuzzy Principal Component Analysis, Fuzzy Hierarchical Cross-Clustering) to describe natural enrichment of inorganic arsenic and co-occurring species in groundwater in the Banat Plain, southwestern Romania. Speciation of inorganic As (arsenite, arsenate), ion concentrations (Na + , K + , Ca 2+ , Mg 2+ , HCO 3 - , Cl - , F - , SO 4 2- , PO 4 3- , NO 3 - ), pH, redox potential, conductivity and total dissolved substances were performed. Classical diagrams provided the hydrochemical characterization, while statistical approaches were helpful to establish (i) the mechanism of naturally occurring of As and F - species and the anthropogenic one for NO 3 - , SO 4 2- , PO 4 3- and K + and (ii) classification of groundwater based on content of arsenic species. The HCO 3 - type of local groundwater and alkaline pH (8.31-8.49) were found to be responsible for the enrichment of arsenic species and occurrence of F - but by different paths. The PO 4 3- -AsO 4 3- ion exchange, water-rock interaction (silicates hydrolysis and desorption from clay) were associated to arsenate enrichment in the oxidizing aquifer. Fuzzy Hierarchical Cross-Clustering was the strongest tool for the rapid simultaneous classification of groundwaters as a function of arsenic content and hydrogeochemical characteristics. The approach indicated the Na + -F - -pH cluster as marker for groundwater with naturally elevated As and highlighted which parameters need to be monitored. A chemical conceptual model illustrating the natural and anthropogenic paths and enrichment of As and co-occurring species in the local groundwater supported by mineralogical analysis of rocks was established. Copyright © 2016 Elsevier Ltd. All rights reserved.
Fan, Yan; Zhang, Chenglin; Wu, Wendan; He, Wei; Zhang, Li; Ma, Xiao
2017-10-16
Indigofera pseudotinctoria Mats is an agronomically and economically important perennial legume shrub with a high forage yield, protein content and strong adaptability, which is subject to natural habitat fragmentation and serious human disturbance. Until now, our knowledge of the genetic relationships and intraspecific genetic diversity for its wild collections is still poor, especially at small spatial scales. Here amplified fragment length polymorphism (AFLP) technology was employed for analysis of genetic diversity, differentiation, and structure of 364 genotypes of I. pseudotinctoria from 15 natural locations in Wushan Mountain, a highly structured mountain with typical karst landforms in Southwest China. We also tested whether eco-climate factors have affected genetic structure by correlating genetic diversity with habitat features. A total of 515 distinctly scoreable bands were generated, and 324 of them were polymorphic. The polymorphic information content (PIC) ranged from 0.694 to 0.890 with an average of 0.789 per primer pair. On species level, Nei's gene diversity (Hj), the Bayesian genetic diversity index (HB) and the Shannon information index (I) were 0.2465, 0.2363 and 0.3772, respectively. High differentiation among all sampling sites was detected (FST = 0.2217, GST = 0.1746, G'ST = 0.2060, θB = 0.1844), and gene flow among accessions (Nm = 1.1819) was restricted. The population genetic structure resolved by the UPGMA tree, principal coordinate analysis, and Bayesian-based cluster analyses irrefutably grouped all accessions into two distinct clusters, i.e., lowland and highland groups. This structure pattern may indicate joint effects by the neutral evolution and natural selection.
Restricted Nm was observed across all accessions, and genetic barriers were detected between adjacent accessions due to the specific geographical landform.
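As a worked check on the gene-flow figure reported above: the abstract does not say how the number of migrants per generation was estimated, so the following assumes the common island-model approximation based on the coefficient of gene differentiation. Under that assumption, the reported differentiation value gives

```latex
N_m \approx \frac{1 - G_{ST}}{4\,G_{ST}}
    = \frac{1 - 0.1746}{4 \times 0.1746}
    = \frac{0.8254}{0.6984}
    \approx 1.18
```

which agrees with the reported value of 1.1819 to within rounding of the input. The formula is an assumption made here for illustration, not a statement of the authors' method.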
AFLP-based genetic diversity assessment of commercially important tea germplasm in India.
Sharma, R K; Negi, M S; Sharma, S; Bhardwaj, P; Kumar, R; Bhattachrya, E; Tripathi, S B; Vijayan, D; Baruah, A R; Das, S C; Bera, B; Rajkumar, R; Thomas, J; Sud, R K; Muraleedharan, N; Hazarika, M; Lakshmikumaran, M; Raina, S N; Ahuja, P S
2010-08-01
India has a large repository of important tea accessions and, therefore, plays a major role in improving production and quality of tea across the world. Using seven AFLP primer combinations, we analyzed 123 commercially important tea accessions representing major populations in India. The overall genetic similarity recorded was 51%. No significant differences were recorded in average genetic similarity among tea populations cultivated in various geographic regions (northwest 0.60, northeast and south both 0.59). UPGMA cluster analysis grouped the tea accessions according to geographic locations, with a bias toward China or Assam/Cambod types. Cluster analysis results were congruent with principal component analysis. Further, analysis of molecular variance detected a high level of genetic variation (85%) within and limited genetic variation (15%) among the populations, suggesting their origin from a similar genetic pool.
Biomolecular Characterization of Diazotrophs Isolated from the Tropical Soil in Malaysia
Naher, Umme Aminun; Othman, Radziah; Latif, Mohammad Abdul; Panhwar, Qurban Ali; Amaddin, Puteri Aminatulhawa Megat; Shamsuddin, Zulkifli H
2013-01-01
This study was conducted to evaluate selected biomolecular characteristics of rice root-associated diazotrophs isolated from the Tanjong Karang rice irrigation project area of Malaysia. Soil and rice plant samples were collected from seven soil series belonging to order Inceptisol (USDA soil taxonomy). A total of 38 diazotrophs were isolated using a nitrogen-free medium. The biochemical properties of the isolated bacteria, such as nitrogenase activity, indoleacetic acid (IAA) production and sugar utilization, were measured. According to a cluster analysis of Jaccard’s similarity coefficients, the genetic similarities among the isolated diazotrophs ranged from 10% to 100%. A dendrogram constructed using the unweighted pair-group method with arithmetic mean (UPGMA) showed that the isolated diazotrophs clustered into 12 groups. The genomic DNA rep-PCR data were subjected to a principal component analysis, and the first four principal components (PC) accounted for 52.46% of the total variation among the 38 diazotrophs. The 10 diazotrophs that tested highly positive in the acetylene reduction assay (ARA) were identified as Bacillus spp. (9 diazotrophs) and Burkholderia sp. (Sb16) using the partial 16S rRNA gene sequence analysis. In the analysis of the biochemical characteristics, three principal components accounted for approximately 85% of the total variation among the identified diazotrophs. The examination of root colonization using scanning electron microscopy (SEM) and transmission electron microscopy (TEM) proved that two of the isolated diazotrophs (Sb16 and Sb26) were able to colonize the surface and interior of rice roots and fixed 22%–24% of the total tissue nitrogen from the atmosphere. In general, the tropical soils (Inceptisols) of the Tanjong Karang rice irrigation project area in Malaysia harbor a diverse group of diazotrophs that exhibit a large variation of biomolecular characteristics. PMID:23999588
Regionalization of precipitation characteristics in Iran's Lake Urmia basin
NASA Astrophysics Data System (ADS)
Fazel, Nasim; Berndtsson, Ronny; Uvo, Cintia Bertacchi; Madani, Kaveh; Kløve, Bjørn
2018-04-01
Lake Urmia in northwest Iran, once one of the largest hypersaline lakes in the world, has shrunk by almost 90% in area and 80% in volume during the last four decades. To improve the understanding of regional differences in water availability throughout the region and to refine the existing information on precipitation variability, this study investigated the spatial pattern of precipitation for the Lake Urmia basin. Daily rainfall time series from 122 precipitation stations with different record lengths were used to extract 15 statistical descriptors comprising 25th percentile, 75th percentile, and coefficient of variation for annual and seasonal total precipitation. Principal component analysis in association with cluster analysis identified three main homogeneous precipitation groups in the lake basin. The first sub-region (group 1) includes stations located in the center and southeast; the second sub-region (group 2) covers mostly northern and northeastern part of the basin, and the third sub-region (group 3) covers the western and southern edges of the basin. Results of principal component (PC) and clustering analyses showed that seasonal precipitation variation is the most important feature controlling the spatial pattern of precipitation in the lake basin. The 25th and 75th percentiles of winter and autumn are the most important variables controlling the spatial pattern of the first rotated principal component explaining about 32% of the total variance. Summer and spring precipitation variations are the most important variables in the second and third rotated principal components, respectively. Seasonal variation in precipitation amount and seasonality are explained by topography and influenced by the lake and westerly winds that are related to the strength of the North Atlantic Oscillation. Despite using incomplete time series with different lengths, the identified sub-regions are physically meaningful.
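The descriptor-extraction step in this abstract can be illustrated with a short sketch. It assumes that the 15 descriptors are the 25th percentile, 75th percentile, and coefficient of variation for each of five periods (annual plus four seasons); the abstract does not spell out that breakdown. The station totals and the function name `precipitation_descriptors` are hypothetical, and the percentile interpolation method is a choice made here rather than one taken from the study.

```python
import statistics

def precipitation_descriptors(totals):
    """Return the 25th percentile, 75th percentile, and coefficient of variation."""
    # quantiles(..., n=4) returns the three quartile cut points.
    q1, _median, q3 = statistics.quantiles(totals, n=4, method="inclusive")
    cv = statistics.stdev(totals) / statistics.mean(totals)
    return {"p25": q1, "p75": q3, "cv": cv}

# Hypothetical yearly totals (mm) for one made-up station, by period.
series = {
    "annual": [100, 200, 300, 400, 500],
    "winter": [40, 55, 60, 70, 90],
    "spring": [30, 45, 50, 65, 80],
    "summer": [2, 5, 8, 10, 15],
    "autumn": [25, 35, 50, 60, 75],
}

# Flatten to one feature vector per station: 5 periods x 3 statistics.
descriptors = {
    f"{period}_{name}": value
    for period, totals in series.items()
    for name, value in precipitation_descriptors(totals).items()
}
print(len(descriptors), descriptors["annual_p25"], descriptors["annual_p75"])
```

One such vector per station would then form the rows of the matrix passed to principal component analysis and clustering.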
Lipophilicity of oils and fats estimated by TLC.
Naşcu-Briciu, Rodica D; Sârbu, Costel
2013-04-01
A representative series of natural toxins belonging to alkaloids and mycotoxins classes was investigated by TLC on classical chemically bonded plates and also on oils- and fats-impregnated plates. Their lipophilicity indices are employed in the characterization and comparison of oils and fats. The retention results allowed an accurate indirect estimation of oils and fats lipophilicity. The investigated fats and oils near classical chemically bonded phases are classified and compared by means of multivariate exploratory techniques, such as cluster analysis, principal component analysis, or fuzzy-principal component analysis. Additionally, a concrete hierarchy of oils and fats derived from the observed lipophilic character is suggested. Human fat seems to be very similar to animal fats, but also possess RP-18, RP-18W, and RP-8. © 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Clinical Study of the 3D-Master Color System among the Spanish Population.
Gómez-Polo, Cristina; Gómez-Polo, Miguel; Martínez Vázquez de Parga, Juan Antonio; Celemín-Viñuela, Alicia
2017-01-12
To study whether the shades of the 3D-Master System were grouped and represented in the chromatic space according to the three-color coordinates of value, chroma, and hue. Maxillary central incisor color was measured on tooth surfaces through the Easyshade Compact spectrophotometer using 1361 participants aged between 16 and 89. The natural (not bleached teeth) color of the middle thirds was registered in the 3D-Master System nomenclature and in the CIELCh system. Principal component analysis and cluster analysis were applied. 75 colors of the 3D-Master System were found. The statistical analysis revealed the existence of 5 cluster groups. The centroid, the average of the 75 samples, in relation to lightness (L*) was 74.64, 22.87 for chroma (C*), and 88.85 for hue (h*). All of the clusters, except cluster 3, showed significant statistical differences with the centroid for the three-color coordinates (p <0.001). The results of this study indicated that 75 shades in the 3D-Master System were grouped into 5 clusters following coordinates L*, C*, and h* resulting from the dental spectrophotometer Vita Easyshade compact. The shades that composed each cluster did not belong to the same lightness color dimension groups. There was no special uniform chromatic distribution among the colors of the 3D-Master System. © 2017 by the American College of Prosthodontists.
Recognizing different tissues in human fetal femur cartilage by label-free Raman microspectroscopy
NASA Astrophysics Data System (ADS)
Kunstar, Aliz; Leijten, Jeroen; van Leuveren, Stefan; Hilderink, Janneke; Otto, Cees; van Blitterswijk, Clemens A.; Karperien, Marcel; van Apeldoorn, Aart A.
2012-11-01
Traditionally, the composition of bone and cartilage is determined by standard histological methods. We used Raman microscopy, which provides a molecular "fingerprint" of the investigated sample, to detect differences between the zones in human fetal femur cartilage without the need for additional staining or labeling. Raman area scans were made from the (pre)articular cartilage, resting, proliferative, and hypertrophic zones of growth plate and endochondral bone within human fetal femora. Multivariate data analysis was performed on Raman spectral datasets to construct cluster images with corresponding cluster averages. Cluster analysis resulted in detection of individual chondrocyte spectra that could be separated from cartilage extracellular matrix (ECM) spectra and was verified by comparing cluster images with intensity-based Raman images for the deoxyribonucleic acid/ribonucleic acid (DNA/RNA) band. Specific dendrograms were created using Ward's clustering method, and principal component analysis (PCA) was performed with the separated and averaged Raman spectra of cells and ECM of all measured zones. Overall (dis)similarities between measured zones were effectively visualized on the dendrograms and main spectral differences were revealed by PCA allowing for label-free detection of individual cartilaginous zones and for label-free evaluation of proper cartilaginous matrix formation for future tissue engineering and clinical purposes.
Kinematic foot types in youth with equinovarus secondary to hemiplegia.
Krzak, Joseph J; Corcos, Daniel M; Damiano, Diane L; Graf, Adam; Hedeker, Donald; Smith, Peter A; Harris, Gerald F
2015-02-01
Elevated kinematic variability of the foot and ankle segments exists during gait among individuals with equinovarus secondary to hemiplegic cerebral palsy (CP). Clinicians have previously addressed such variability by developing classification schemes to identify subgroups of individuals based on their kinematics. To identify kinematic subgroups among youth with equinovarus secondary to CP using 3-dimensional multi-segment foot and ankle kinematics during locomotion as inputs for principal component analysis (PCA), and K-means cluster analysis. In a single assessment session, multi-segment foot and ankle kinematics using the Milwaukee Foot Model (MFM) were collected in 24 children/adolescents with equinovarus and 20 typically developing children/adolescents. PCA was used as a data reduction technique on 40 variables. K-means cluster analysis was performed on the first six principal components (PCs) which accounted for 92% of the variance of the dataset. The PCs described the location and plane of involvement in the foot and ankle. Five distinct kinematic subgroups were identified using K-means clustering. Participants with equinovarus presented with variable involvement ranging from primary hindfoot or forefoot deviations to deformity that included both segments in multiple planes. This study provides further evidence of the variability in foot characteristics associated with equinovarus secondary to hemiplegic CP. These findings would not have been detected using a single segment foot model. The identification of multiple kinematic subgroups with unique foot and ankle characteristics has the potential to improve treatment since similar patients within a subgroup are likely to benefit from the same intervention(s). Copyright © 2014 Elsevier B.V. All rights reserved.
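The K-means step applied to principal component scores can be sketched as follows. This is a generic implementation of Lloyd's algorithm under simplifying assumptions, not the study's analysis: the scores are hypothetical two-dimensional values (the study used six PCs), only two clusters are formed (the study found five), and the starting centroids are chosen by hand so the result is deterministic.

```python
import math

def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm; returns (labels, centroids)."""
    centroids = [list(c) for c in centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centroid if a cluster empties
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical PC1/PC2 scores for six made-up participants.
scores = [(-2.1, 0.3), (-1.8, -0.2), (-2.4, 0.1),
          (1.9, 0.4), (2.2, -0.3), (2.0, 0.0)]
labels, centers = kmeans(scores, centroids=[scores[0], scores[3]])
print(labels)
```

In practice the number of clusters and the initialization would need to be chosen and validated for the data at hand; nothing here reproduces the five subgroups reported in the abstract.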
Kinematic foot types in youth with equinovarus secondary to hemiplegia
Krzak, Joseph J.; Corcos, Daniel M.; Damiano, Diane L.; Graf, Adam; Hedeker, Donald; Smith, Peter A.; Harris, Gerald F.
2015-01-01
Background Elevated kinematic variability of the foot and ankle segments exists during gait among individuals with equinovarus secondary to hemiplegic cerebral palsy (CP). Clinicians have previously addressed such variability by developing classification schemes to identify subgroups of individuals based on their kinematics. Objective To identify kinematic subgroups among youth with equinovarus secondary to CP using 3-dimensional multi-segment foot and ankle kinematics during locomotion as inputs for principal component analysis (PCA), and K-means cluster analysis. Methods In a single assessment session, multi-segment foot and ankle kinematics using the Milwaukee Foot Model (MFM) were collected in 24 children/adolescents with equinovarus and 20 typically developing children/adolescents. Results PCA was used as a data reduction technique on 40 variables. K-means cluster analysis was performed on the first six principal components (PCs) which accounted for 92% of the variance of the dataset. The PCs described the location and plane of involvement in the foot and ankle. Five distinct kinematic subgroups were identified using K-means clustering. Participants with equinovarus presented with variable involvement ranging from primary hindfoot or forefoot deviations to deformity that included both segments in multiple planes. Conclusion This study provides further evidence of the variability in foot characteristics associated with equinovarus secondary to hemiplegic CP. These findings would not have been detected using a single segment foot model. The identification of multiple kinematic subgroups with unique foot and ankle characteristics has the potential to improve treatment since similar patients within a subgroup are likely to benefit from the same intervention(s). PMID:25467429
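Retaining the first few principal components that together explain a target share of the variance, as in the data-reduction step above, comes down to a cumulative sum over eigenvalues. The sketch below shows one way to do that; the eigenvalues, the 92% threshold used as a cutoff, and the function name `components_needed` are illustrative assumptions, and the abstract does not state that a threshold rule was how the six PCs were chosen.

```python
def components_needed(eigenvalues, threshold):
    """Smallest number of leading PCs whose cumulative variance share
    reaches the threshold, plus the share actually reached."""
    total = sum(eigenvalues)
    cumulative = 0.0
    for count, value in enumerate(sorted(eigenvalues, reverse=True), start=1):
        cumulative += value
        if cumulative / total >= threshold:
            return count, cumulative / total
    return len(eigenvalues), 1.0

# Hypothetical eigenvalues of a covariance matrix (not from the study).
eigenvalues = [5.0, 2.0, 1.5, 1.0, 0.5]
count, share = components_needed(eigenvalues, threshold=0.92)
print(count, share)
```

With these made-up eigenvalues the first three components explain 85% of the variance and the first four explain 95%, so four components are kept.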
NASA Astrophysics Data System (ADS)
Farshadfar, M.; Farshadfar, E.
The present research was conducted to determine the genetic variability of 18 Lucerne cultivars, based on morphological and biochemical markers. The traits studied were plant height, tiller number, biomass, dry yield, dry yield/biomass, dry leaf/dry yield, macro and micro elements, crude protein, dry matter, crude fiber and ash percentage and SDS-PAGE in seed and leaf samples. Field experiments included 18 plots of two meter rows. Data based on morphological, chemical and SDS-PAGE markers were analyzed using SPSSWIN software and the multivariate statistical procedures of cluster analysis (UPGMA) and principal component analysis. Analysis of variance and mean comparison for morphological traits reflected significant differences among genotypes. Genotypes 13 and 15 had the greatest values for most traits. The Genotypic Coefficient of Variation (GCV), Phenotypic Coefficient of Variation (PCV) and Heritability (Hb) parameters were estimated for the different characters: PCV ranged from 12.49 to 26.58% and GCV ranged from 6.84 to 18.84%. The greatest value of Hb was 0.94 for stem number. Based on morphological traits, the Lucerne genotypes could be classified into four clusters, and 94% of the variance among the genotypes was explained by two principal components. Based on chemical traits, they were classified into five groups, and 73.492% of the variance was explained by four principal components. Dry matter, protein, fiber, P, K, Na, Mg and Zn had higher variance. Based on the SDS-PAGE patterns, all genotypes were classified into three clusters. The greatest genetic distance was between cultivar 10 and the others; cultivar 10 would therefore be a suitable parent in a breeding program.
Metsalu, Tauno; Vilo, Jaak
2015-01-01
The Principal Component Analysis (PCA) is a widely used method of reducing the dimensionality of high-dimensional data, often followed by visualizing two of the components on the scatterplot. Although widely used, the method is lacking an easy-to-use web interface that scientists with little programming skills could use to make plots of their own data. The same applies to creating heatmaps: it is possible to add conditional formatting for Excel cells to show colored heatmaps, but for more advanced features such as clustering and experimental annotations, more sophisticated analysis tools have to be used. We present a web tool called ClustVis that aims to have an intuitive user interface. Users can upload data from a simple delimited text file that can be created in a spreadsheet program. It is possible to modify data processing methods and the final appearance of the PCA and heatmap plots by using drop-down menus, text boxes, sliders etc. Appropriate defaults are given to reduce the time needed by the user to specify input parameters. As an output, users can download PCA plot and heatmap in one of the preferred file formats. This web server is freely available at http://biit.cs.ut.ee/clustvis/. PMID:25969447
Nijman, Henk; Simpson, Alan; Jones, Julia
2010-01-01
Background Conflict (aggression, substance use, absconding, etc.) and containment (coerced medication, manual restraint, etc.) threaten the safety of patients and staff on psychiatric wards. Previous work has suggested that staff variables may be significant in explaining differences between wards in their rates of these behaviours, and that structure (ward organisation, rules and daily routines) might be the most critical of these. This paper describes the exploration of a large dataset to assess the relationship between structure and other staff variables. Methods A multivariate cross-sectional design was utilised. Data were collected from staff on 136 acute psychiatric wards in 26 NHS Trusts in England, measuring leadership, teamwork, structure, burnout and attitudes towards difficult patients. Relationships between these variables were explored through principal components analysis (PCA), structural equation modelling and cluster analysis. Results Principal components analysis resulted in the identification of each questionnaire as a separate factor, indicating that the selected instruments assessed a number of non-overlapping items relevant for ward functioning. Structural equation modelling suggested a linear model in which leadership influenced teamwork, teamwork structure; structure burnout; and burnout feelings about difficult patients. Finally, cluster analysis identified two significantly distinct groups of wards: the larger of which had particularly good leadership, teamwork, structure, attitudes towards patients and low burnout; and the second smaller proportion which was poor on all variables and high on burnout. The better functioning cluster of wards had significantly lower rates of containment events. Conclusion The overall performance of staff teams is associated with differing rates of containment on wards. 
Interventions to reduce rates of containment on wards may need to address staff issues at every level, from leadership through to staff attitudes. PMID:20082064
Odoi, Agricola; Wray, Ron; Emo, Marion; Birch, Stephen; Hutchison, Brian; Eyles, John; Abernathy, Tom
2005-01-01
Background Population health planning aims to improve the health of the entire population and to reduce health inequities among population groups. Socioeconomic factors are increasingly being recognized as major determinants of many aspects of health and causes of health inequities. Knowledge of socioeconomic characteristics of neighbourhoods is necessary to identify their unique health needs and enhance identification of socioeconomically disadvantaged populations. Careful integration of this knowledge into health planning activities is necessary to ensure that health planning and service provision are tailored to unique neighbourhood population health needs. In this study, we identify unique neighbourhood socioeconomic characteristics and classify the neighbourhoods based on these characteristics. Principal components analysis (PCA) of 18 socioeconomic variables was used to identify the principal components explaining most of the variation in socioeconomic characteristics across the neighbourhoods. Cluster analysis was used to classify neighbourhoods based on their socioeconomic characteristics. Results Results of the PCA and cluster analysis were similar but the latter were more objective and easier to interpret. Five neighbourhood types with distinguishing socioeconomic and demographic characteristics were identified. The methodology provides a more complete picture of the neighbourhood socioeconomic characteristics than when a single variable (e.g. income) is used to classify neighbourhoods. Conclusion Cluster analysis is useful for generating neighbourhood population socioeconomic and demographic characteristics that can be useful in guiding neighbourhood health planning and service provision. 
This study is the first of a series of studies designed to investigate health inequalities at the neighbourhood level with a view to providing evidence-base for health planners, service providers and policy makers to help address health inequity issues at the neighbourhood level. Subsequent studies will investigate inequalities in health outcomes both within and across the neighbourhood types identified in the current study. PMID:16092969
Zhang, Qin-di; Jia, Rui-Zhi; Meng, Chao; Ti, Chao-Wen; Wang, Yi-Ling
2015-01-01
Knowledge of the genetic diversity and structure of tree species across their geographic ranges is essential for sustainable use and management of forest ecosystems. Acer grosseri Pax., an economically and ecologically important maple species, is mainly distributed in North China. In this study, the genetic diversity and population differentiation of 24 natural populations of this species were evaluated using sequence-related amplified polymorphism markers and morphological characters. The results show that highly significant differences occurred in 32 morphological traits. The coefficient of variation of 34 characters was 18.19 %. Principal component analysis indicated that 18 of 34 traits explained 60.20 % of the total variance. The phenotypic differentiation coefficient (VST) was 36.06 % for all morphological traits. The Shannon–Wiener index of 34 morphological characters was 6.09, while at the population level, it was 1.77. The percentage of polymorphic bands of all studied A. grosseri populations was 82.14 %. Nei's gene diversity (He) and Shannon's information index (I) were 0.35 and 0.50, respectively. Less genetic differentiation was detected among the natural populations (GST = 0.20, ΦST = 0.10). Twenty-four populations of A. grosseri formed two main clusters, which is consistent with morphological cluster analysis. Principal coordinates analysis and STRUCTURE analysis supported the UPGMA-cluster dendrogram. There was no significant correlation between genetic and geographical distances among populations. Both molecular and morphological data suggested that A. grosseri is rich in genetic diversity. The high level of genetic variation within populations could be affected by the biological characters, mating system and lifespan of A. grosseri, whereas the lower genetic diversity among populations could be caused by effective gene exchange, selective pressure from environmental heterogeneity and the species' geographical range. PMID:26311734
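The abstract reports Nei's gene diversity (He) and Shannon's information index (I) without giving formulas. A minimal sketch using the standard per-locus definitions follows; the allele frequencies are hypothetical and are not data from the study.

```python
import math

def nei_gene_diversity(freqs):
    """Nei's gene diversity for one locus: 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs)

def shannon_index(freqs):
    """Shannon information index: -sum(p_i * ln p_i), skipping zero frequencies."""
    return -sum(p * math.log(p) for p in freqs if p > 0)

# Hypothetical allele frequencies at three two-allele loci (illustration only).
loci = [(0.5, 0.5), (0.8, 0.2), (1.0, 0.0)]

# Averages over loci, as such indices are commonly summarised.
mean_he = sum(nei_gene_diversity(f) for f in loci) / len(loci)
mean_i = sum(shannon_index(f) for f in loci) / len(loci)
```

A monomorphic locus contributes zero to both indices, so the averages fall as the share of polymorphic loci falls.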
Analysis of the mutations induced by conazole fungicides in vivo.
Ross, Jeffrey A; Leavitt, Sharon A
2010-05-01
The mouse liver tumorigenic conazole fungicides triadimefon and propiconazole have previously been shown to be in vivo mouse liver mutagens in the Big Blue transgenic mutation assay when administered in feed at tumorigenic doses, whereas the non-tumorigenic conazole myclobutanil was not mutagenic. DNA sequencing of the mutants recovered from each treatment group as well as from animals receiving control diet was conducted to gain additional insight into the mode of action by which tumorigenic conazoles induce mutations. Relative dinucleotide mutabilities (RDMs) were calculated for each possible dinucleotide in each treatment group and then examined by multivariate statistical analysis techniques. Unsupervised hierarchical clustering analysis of RDM values segregated two independent control groups together, along with the non-tumorigen myclobutanil. The two tumorigenic conazoles clustered together in a distinct grouping. Partitioning around medoids of RDM values into two clusters also grouped triadimefon and propiconazole together in one cluster and the two control groups and myclobutanil together in a second cluster. Principal component analysis of these results identified two components that account for 88.3% of the variability in the points. Taken together, these results are consistent with the hypothesis that propiconazole- and triadimefon-induced mutations do not represent clonal expansion of background mutations and support the hypothesis that they arise from the accumulation of reactive electrophilic metabolic intermediates within the liver in vivo.
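To illustrate partitioning around medoids, here is a simplified alternating k-medoids sketch. It is not the full PAM swap search and not the authors' code; the two-dimensional profiles are hypothetical stand-ins for the RDM vectors, and the naive initialisation is an assumption.

```python
def manhattan(a, b):
    """Manhattan distance between two equal-length profiles."""
    return sum(abs(x - y) for x, y in zip(a, b))

def k_medoids(points, k, dist=manhattan, max_iter=100):
    """Alternate between assigning points to the nearest medoid and re-picking medoids."""
    medoids = list(range(k))  # naive start: the first k points (assumption)
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its closest medoid.
        labels = [min(range(k), key=lambda c: dist(p, points[medoids[c]]))
                  for p in points]
        new_medoids = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:  # keep the old medoid if a cluster empties
                new_medoids.append(medoids[c])
                continue
            # Update step: the medoid is the member with the smallest total distance.
            new_medoids.append(min(
                members,
                key=lambda i: sum(dist(points[i], points[j]) for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, labels

# Hypothetical profiles: two controls, myclobutanil, triadimefon, propiconazole.
names = ["control_1", "control_2", "myclobutanil", "triadimefon", "propiconazole"]
profiles = [(10, 11), (9, 10), (8, 10), (20, 21), (21, 19)]
medoids, labels = k_medoids(profiles, k=2)
```

Unlike k-means, each cluster centre is an actual observation, which keeps the result interpretable when profiles are few.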
Mueller, Daniela; Ferrão, Marco Flôres; Marder, Luciano; da Costa, Adilson Ben; de Cássia de Souza Schneider, Rosana
2013-01-01
The main objective of this study was to use infrared spectroscopy to identify vegetable oils used as raw material for biodiesel production and apply multivariate analysis to the data. Six different vegetable oil sources (canola, cotton, corn, palm, sunflower and soybeans) were used to produce biodiesel batches. The spectra were acquired by Fourier transform infrared spectroscopy using a universal attenuated total reflectance sensor (FTIR-UATR). For the multivariate analysis, principal component analysis (PCA), hierarchical cluster analysis (HCA), interval principal component analysis (iPCA) and soft independent modeling of class analogy (SIMCA) were used. The results indicate that it is possible to develop a methodology to identify vegetable oils used as raw material in the production of biodiesel by FTIR-UATR applying multivariate analysis. It was also observed that the iPCA found the best spectral range for separation of biodiesel batches using FTIR-UATR data, and with this result, the SIMCA method classified 100% of the soybean biodiesel samples. PMID:23539030
Phung, Dung; Huang, Cunrui; Rutherford, Shannon; Dwirahmadi, Febi; Chu, Cordia; Wang, Xiaoming; Nguyen, Minh; Nguyen, Nga Huy; Do, Cuong Manh; Nguyen, Trung Hieu; Dinh, Tuan Anh Diep
2015-05-01
The present study is an evaluation of temporal/spatial variations of surface water quality using multivariate statistical techniques, comprising cluster analysis (CA), principal component analysis (PCA), factor analysis (FA) and discriminant analysis (DA). Eleven water quality parameters were monitored at 38 different sites in Can Tho City, a Mekong Delta area of Vietnam from 2008 to 2012. Hierarchical cluster analysis grouped the 38 sampling sites into three clusters, representing mixed urban-rural areas, agricultural areas and industrial zone. FA/PCA resulted in three latent factors for the entire research location, three for cluster 1, four for cluster 2, and four for cluster 3 explaining 60, 60.2, 80.9, and 70% of the total variance in the respective water quality. The varifactors from FA indicated that the parameters responsible for water quality variations are related to erosion from disturbed land or inflow of effluent from sewage plants and industry, discharges from wastewater treatment plants and domestic wastewater, agricultural activities and industrial effluents, and contamination by sewage waste with faecal coliform bacteria through sewer and septic systems. Discriminant analysis (DA) revealed that nephelometric turbidity units (NTU), chemical oxygen demand (COD) and NH₃ are the discriminating parameters in space, affording 67% correct assignation in spatial analysis; pH and NO₂ are the discriminating parameters according to season, assigning approximately 60% of cases correctly. The findings suggest a possible revised sampling strategy that can reduce the number of sampling sites and the indicator parameters responsible for large variations in water quality. This study demonstrates the usefulness of multivariate statistical techniques for evaluation of temporal/spatial variations in water quality assessment and management.
Keshtkaran, Mohammad Reza; Yang, Zhi
2017-06-01
Spike sorting is a fundamental preprocessing step for many neuroscience studies which rely on the analysis of spike trains. Most of the feature extraction and dimensionality reduction techniques that have been used for spike sorting give a projection subspace which is not necessarily the most discriminative one. Therefore, clusters which appear inherently separable in some discriminative subspace may overlap if projected using conventional feature extraction approaches, leading to poor sorting accuracy, especially when the noise level is high. In this paper, we propose a noise-robust and unsupervised spike sorting algorithm based on learning discriminative spike features for clustering. The proposed algorithm uses discriminative subspace learning to extract low-dimensional and most discriminative features from the spike waveforms and performs clustering with automatic detection of the number of clusters. The core part of the algorithm involves iterative subspace selection using linear discriminant analysis and clustering using a Gaussian mixture model with outlier detection. A statistical test in the discriminative subspace is proposed to automatically detect the number of clusters. Comparative results on publicly available simulated and real in vivo datasets demonstrate that our algorithm achieves substantially improved cluster distinction, leading to higher sorting accuracy and more reliable detection of clusters which are highly overlapping and not detectable using conventional feature extraction techniques such as principal component analysis or wavelets. By providing more accurate information about the activity of a larger number of individual neurons with high robustness to neural noise and outliers, the proposed unsupervised spike sorting algorithm facilitates more detailed and accurate analysis of single- and multi-unit activities in neuroscience and brain machine interface studies.
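The claim that a maximum-variance projection need not be the most discriminative one can be illustrated with a two-class Fisher discriminant. This is only a sketch of the discriminant step for fixed labels, under the assumption of two-dimensional features; the iterative subspace selection, the Gaussian mixture clustering and the outlier handling described above are not reproduced, and the points are hypothetical.

```python
def fisher_direction(class_a, class_b):
    """Fisher discriminant direction w = Sw^-1 (mean_b - mean_a) for 2-D features."""
    def mean(pts):
        return [sum(p[0] for p in pts) / len(pts), sum(p[1] for p in pts) / len(pts)]

    def scatter(pts, m):
        sxx = sum((p[0] - m[0]) ** 2 for p in pts)
        syy = sum((p[1] - m[1]) ** 2 for p in pts)
        sxy = sum((p[0] - m[0]) * (p[1] - m[1]) for p in pts)
        return sxx, syy, sxy

    ma, mb = mean(class_a), mean(class_b)
    sa, sb = scatter(class_a, ma), scatter(class_b, mb)
    # Pooled within-class scatter matrix [[sxx, sxy], [sxy, syy]].
    sxx, syy, sxy = (sa[i] + sb[i] for i in range(3))
    det = sxx * syy - sxy * sxy
    dx, dy = mb[0] - ma[0], mb[1] - ma[1]
    # Apply the closed-form 2x2 inverse to the difference of class means.
    return [(syy * dx - sxy * dy) / det, (sxx * dy - sxy * dx) / det]

# Hypothetical features: large spread along x (noise), small offset along y (signal).
unit_a = [(-2.0, 0.1), (0.0, -0.1), (2.0, 0.0)]
unit_b = [(-2.0, 1.0), (0.0, 1.1), (2.0, 0.9)]
w = fisher_direction(unit_a, unit_b)
```

Here the largest variance lies along x, which a variance-based projection would favour, while the discriminant direction is dominated by y, the axis that actually separates the two units.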
Buonaccorsi, G A; Rose, C J; O'Connor, J P B; Roberts, C; Watson, Y; Jackson, A; Jayson, G C; Parker, G J M
2010-01-01
Clinical trials of anti-angiogenic and vascular-disrupting agents often use biomarkers derived from DCE-MRI, typically reporting whole-tumor summary statistics and so overlooking spatial parameter variations caused by tissue heterogeneity. We present a data-driven segmentation method comprising tracer-kinetic model-driven registration for motion correction, conversion from MR signal intensity to contrast agent concentration for cross-visit normalization, iterative principal components analysis for imputation of missing data and dimensionality reduction, and statistical outlier detection using the minimum covariance determinant to obtain a robust Mahalanobis distance. After applying these techniques we cluster in the principal components space using k-means. We present results from a clinical trial of a VEGF inhibitor, using time-series data selected because of problems due to motion and outlier time series. We obtained spatially-contiguous clusters that map to regions with distinct microvascular characteristics. This methodology has the potential to uncover localized effects in trials using DCE-MRI-based biomarkers.
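The final step, clustering in principal components space with k-means, can be sketched as follows. This is a toy illustration, not the authors' pipeline: registration, imputation and the minimum covariance determinant outlier step are omitted, only the leading component is kept, and the "enhancement curves" are hypothetical.

```python
import math

def first_pc_scores(rows, iters=200):
    """Project mean-centred rows onto the leading principal axis (power iteration)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centred) / (n - 1) for b in range(d)]
           for a in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return [sum(c[j] * v[j] for j in range(d)) for c in centred]

def kmeans_1d(values, k=2, iters=50):
    """Lloyd's algorithm on scalar scores (k >= 2); centres start evenly spaced."""
    lo, hi = min(values), max(values)
    centres = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: abs(x - centres[c])) for x in values]
        for c in range(k):
            members = [x for x, lab in zip(values, labels) if lab == c]
            if members:  # keep the old centre if a cluster empties
                centres[c] = sum(members) / len(members)
    return labels

# Hypothetical three-point enhancement curves for six voxels.
curves = [[1.0, 2.0, 3.0], [1.1, 2.1, 2.9], [0.9, 1.9, 3.1],
          [5.0, 6.0, 7.0], [5.2, 6.1, 6.8], [4.9, 5.8, 7.1]]
labels = kmeans_1d(first_pc_scores(curves))
```

Clustering on the component scores rather than on the raw curves is what makes the dimensionality reduction useful: distances are computed only along directions that carry most of the variance.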
Ling, Y H; Zhang, X D; Yao, N; Ding, J P; Chen, H Q; Zhang, Z J; Zhang, Y H; Ren, C H; Ma, Y H; Zhang, X R
2012-02-01
To investigate the genetic diversity of seven Chinese indigenous meat goat breeds (Tibet goat, Guizhou white goat, Shannan white goat, Yichang white goat, Matou goat, Changjiangsanjiaozhou white goat and Anhui white goat), explain their genetic relationship and assess their integrity and degree of admixture, 302 individuals from these breeds and 42 Boer goats introduced from Africa as reference samples were genotyped for 11 microsatellite markers. Results indicated that the genetic diversity of Chinese indigenous meat goats was rich. The mean heterozygosity and the mean allelic richness (AR) for the 8 goat breeds varied from 0.697 to 0.738 and 6.21 to 7.35, respectively. Structure analysis showed that the Tibet goat breed was genetically distinct and was the first to separate; the other Chinese goats were then divided into two sub-clusters: Shannan white goat and Yichang white goat in one cluster; and Guizhou white goat, Matou goat, Changjiangsanjiaozhou white goat and Anhui white goat in the other cluster. This grouping pattern was further supported by clustering analysis and principal component analysis. These results may provide a scientific basis for the characterization, conservation and utilization of Chinese meat goats.
Li, Dongsheng; Yang, Wei; Zhang, Wenyao
2017-05-01
Stress corrosion is the major failure type of bridge cable damage. The acoustic emission (AE) technique was applied to monitor the stress corrosion process of steel wires used in bridge cable structures. The damage evolution of stress corrosion in bridge cables was obtained according to the AE characteristic parameter figure. A particle swarm optimization cluster method was developed to determine the relationship between the AE signal and stress corrosion mechanisms. Results indicate that the main AE sources of stress corrosion in bridge cables included four types: passive film breakdown and detachment of the corrosion product, crack initiation, crack extension, and cable fracture. By analyzing different types of clustering data, the mean value of each damage pattern's AE characteristic parameters was determined. Different corrosion damage source AE waveforms and the peak frequency were extracted. AE particle swarm optimization cluster analysis based on principal component analysis was also proposed. This method can completely distinguish the four types of damage sources and simplifies the determination of the evolution process of corrosion damage and broken wire signals. Copyright © 2017. Published by Elsevier B.V.
Chemical Polymorphism of Essential Oils of Artemisia vulgaris Growing Wild in Lithuania.
Judzentiene, Asta; Budiene, Jurga
2018-02-01
Compositional variability of mugwort (Artemisia vulgaris L.) essential oils was investigated in this study. Plant material (above-ground parts at the full flowering stage) was collected from forty-four wild populations in Lithuania. The oils from aerial parts were obtained by hydrodistillation and analyzed by GC(FID) and GC/MS. In total, up to 111 components were determined in the oils. The major constituents were sabinene, 1,8-cineole, artemisia ketone, both thujone isomers, camphor, cis-chrysanthenyl acetate, davanone and davanone B. The compositional data were subjected to statistical analysis. The application of PCA (Principal Component Analysis) and AHC (Agglomerative Hierarchical Clustering) allowed the oils to be grouped into six clusters. AHC distinguished an artemisia ketone chemotype, which, to the best of our knowledge, is very scarce. Additionally, two rare oil types, cis-chrysanthenyl acetate and sabinene, were determined for the plants growing in Lithuania. Furthermore, davanone was found for the first time as a principal component in mugwort oils. The study revealed significant chemical polymorphism of essential oils in mugwort plants native to Lithuania; it has expanded our chemotaxonomic knowledge of both the A. vulgaris species and the Artemisia genus. © 2018 Wiley-VHCA AG, Zurich, Switzerland.
NASA Technical Reports Server (NTRS)
Storrie-Lombardi, Michael C.; Hoover, Richard B.
2005-01-01
Last year we presented techniques for the detection of fossils during robotic missions to Mars using both structural and chemical signatures [Storrie-Lombardi and Hoover, 2004]. Analyses included lossless compression of photographic images to estimate the relative complexity of a putative fossil compared to the rock matrix [Corsetti and Storrie-Lombardi, 2003] and elemental abundance distributions to provide mineralogical classification of the rock matrix [Storrie-Lombardi and Fisk, 2004]. We presented a classification strategy employing two exploratory classification algorithms (Principal Component Analysis and Hierarchical Cluster Analysis) and a non-linear stochastic neural network to produce a Bayesian estimate of classification accuracy. We now present an extension of our previous experiments exploring putative fossil forms morphologically resembling cyanobacteria discovered in the Orgueil meteorite. Elemental abundances (C6, N7, O8, Na11, Mg12, Al13, Si14, P15, S16, Cl17, K19, Ca20, Fe26) obtained for both extant cyanobacteria and fossil trilobites produce signatures readily distinguishing them from meteorite targets. When compared to elemental abundance signatures for extant cyanobacteria, Orgueil structures exhibit decreased abundances for C6, N7, Na11, Al13, P15, Cl17, K19, Ca20 and increases in Mg12, S16, Fe26. Diatoms and silicified portions of cyanobacterial sheaths exhibiting high levels of silicon and correspondingly low levels of carbon cluster more closely with terrestrial fossils than with extant cyanobacteria. Compression indices verify that variations in random and redundant textural patterns between perceived forms and the background matrix contribute significantly to morphological visual identification. The results provide a quantitative probabilistic methodology for discriminating putative fossils from the surrounding rock matrix and from extant organisms using both structural and chemical information.
The techniques described appear applicable to the geobiological analysis of meteoritic samples or in situ exploration of the Mars regolith. Keywords: cyanobacteria, microfossils, Mars, elemental abundances, complexity analysis, multifactor analysis, principal component analysis, hierarchical cluster analysis, artificial neural networks, paleo-biosignatures
Tang, Ping-Han; Wu, Ten-Ming; Yen, Tsung-Wen; Lai, S K; Hsu, P J
2011-09-07
We perform isothermal Brownian-type molecular dynamics simulations to obtain the velocity autocorrelation function and its time Fourier-transformed power spectral density for the metallic cluster Ag(17)Cu(2). The temperature dependences of these dynamical quantities from T = 0 to 1500 K were examined and across this temperature range the cluster melting temperature T(m), which we define to be the principal maximum position of the specific heat is determined. The instantaneous normal mode analysis is then used to dissect the cluster dynamics by calculating the vibrational instantaneous normal mode density of states and hence its frequency integrated value I(j) which is an ensemble average of all vibrational projection operators for the jth atom in the cluster. In addition to comparing the results with simulation data, we look more closely at the entities I(j) of all atoms using the point group symmetry and diagnose their temperature variations. We find that I(j) exhibit features that may be used to deduce T(m), which turns out to agree very well with those inferred from the power spectral density and specific heat. © 2011 American Institute of Physics
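The two dynamical quantities named above can be sketched in a few lines: a normalised velocity autocorrelation function and its cosine transform as a power spectrum estimate. The velocity series below is a synthetic single-frequency signal, not simulation output, and the time step, frequency grid and trapezoid quadrature are assumptions made for illustration.

```python
import math

def vacf(velocities, max_lag):
    """Normalised autocorrelation C(t) = <v(0).v(t)> / <v(0).v(0)>, averaged over time origins."""
    n = len(velocities)

    def corr(lag):
        total = 0.0
        for i in range(n - lag):
            total += sum(a * b for a, b in zip(velocities[i], velocities[i + lag]))
        return total / (n - lag)

    c0 = corr(0)
    return [corr(lag) / c0 for lag in range(max_lag + 1)]

def cosine_transform(c, dt, omega):
    """Trapezoid-rule estimate of the integral of C(t) cos(omega t) over the sampled window."""
    total = 0.0
    for k, value in enumerate(c):
        weight = 0.5 if k in (0, len(c) - 1) else 1.0
        total += weight * value * math.cos(omega * k * dt)
    return total * dt

# Synthetic one-component velocity oscillating at omega0 (illustration only).
dt, omega0 = 0.1, 2.0
velocities = [(math.cos(omega0 * i * dt),) for i in range(2000)]

c = vacf(velocities, max_lag=200)
omegas = [0.5 * k for k in range(1, 9)]
spectrum = [cosine_transform(c, dt, w) for w in omegas]
peak = omegas[spectrum.index(max(spectrum))]
```

For real trajectories the same functions would take per-atom velocity vectors at each time step, and the spectrum would show a band of vibrational frequencies rather than a single peak.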
Dong, Xinwen; Zhang, Yunbo; Dong, Jin; Zhao, Yue; Guo, Jipeng; Wang, Zhanju; Liu, Mingqi; Na, Xiaolin; Wang, Cheng
2017-07-01
Di(2-ethylhexyl) phthalate (DEHP) is an omnipresent environmental chemical with widespread nonoccupational human exposure through multiple ways. Although considerable efforts have been invested to investigate mechanisms of DEHP toxicity, the key metabolic biomarkers of DEHP toxicity remain to be identified. The aim of this study was to assess the urinary metabonomics of dietary DEHP in rats using the technique of ultra-performance liquid chromatography quadrupole time-of-flight tandem mass spectrometry (UPLC/Q-TOF-MS). Fourteen female Wistar rats were divided into two groups and given increasing dietary doses of DEHP for 30 consecutive days. The urinary metabolite profile was studied using ultra-performance liquid chromatography coupled with quadrupole time-of-flight tandem mass spectrometry. Principal component analysis (PCA) and partial least squares-discriminant analysis (PLS-DA) enabled clusters to be clearly separated. Eleven principal urinary metabolites were identified as contributing to the clusters. The clusters in the positive electrospray ionization (ESI) mode were xanthurenic acid, kynurenic acid, nonate, N6-methyladenosine, and L-isoleucyl-L-proline. The clusters in the negative ESI mode were hippuric acid, tetrahydrocortisol, citric acid, phenylpropionylglycine, cPA(18:2(9Z, 12Z)/0:0), and LysoPC(14:1(9Z)). The urinary metabonomic changes indicated that exposure to dietary DEHP can affect energy-related metabolism, liver and renal function, fatty acid metabolism, and cause DNA damage in rats. The findings of this study on the urinary metabolites and metabolic pathways of DEHP may form the basis for future studies on the mechanisms of toxicity of this commonly found environmental chemical.
Sakhteman, Amirhossein; Faridi, Pouya; Daneshamouz, Saeid; Akbarizadeh, Amin Reza; Borhani-Haghighi, Afshin; Mohagheghzadeh, Abdolali
2017-01-01
Herbal oils have been widely used in Iran as medicinal compounds for thousands of years. Chamomile oil is a widely used example of a traditional oil. We remade chamomile oils and tried to modify them with current knowledge and facilities. Six types of oil (traditional and modified) were prepared. Microbial limit tests and physicochemical tests were performed on them. Also, principal component analysis, hierarchical cluster analysis, and partial least squares discriminant analysis were performed on the attenuated total reflectance–infrared spectral data in order to gain insight into the classification pattern of the samples. The results show that modified versions of the chamomile oils (modified Clevenger-type apparatus method and microwave method) can be used with the same content as the traditional ones, with less microbial contamination and better physicochemical properties. PMID:28585466
Zargaran, Arman; Sakhteman, Amirhossein; Faridi, Pouya; Daneshamouz, Saeid; Akbarizadeh, Amin Reza; Borhani-Haghighi, Afshin; Mohagheghzadeh, Abdolali
2017-10-01
Herbal oils have been widely used in Iran as medicinal compounds for thousands of years. Chamomile oil is a widely used example of a traditional oil. We remade chamomile oils and tried to modify them with current knowledge and facilities. Six types of oil (traditional and modified) were prepared. Microbial limit tests and physicochemical tests were performed on them. Also, principal component analysis, hierarchical cluster analysis, and partial least squares discriminant analysis were performed on the attenuated total reflectance-infrared spectral data in order to gain insight into the classification pattern of the samples. The results show that modified versions of the chamomile oils (modified Clevenger-type apparatus method and microwave method) can be used with the same content as the traditional ones, with less microbial contamination and better physicochemical properties.
Roshan, Abdul-Rahman A; Gad, Haidy A; El-Ahmady, Sherweit H; Khanbash, Mohamed S; Abou-Shoer, Mohamed I; Al-Azizi, Mohamed M
2013-08-14
This work describes a simple model developed for the authentication of monofloral Yemeni Sidr honey using UV spectroscopy together with chemometric techniques of hierarchical cluster analysis (HCA), principal component analysis (PCA), and soft independent modeling of class analogy (SIMCA). The model was constructed using 13 genuine Sidr honey samples and challenged with 25 honey samples of different botanical origins. HCA and PCA were successfully able to present a preliminary clustering pattern to segregate the genuine Sidr samples from the lower priced local polyfloral and non-Sidr samples. The SIMCA model presented a clear demarcation of the samples and was used to identify genuine Sidr honey samples as well as detect admixture with lower priced polyfloral honey by detection limits >10%. The constructed model presents a simple and efficient method of analysis and may serve as a basis for the authentication of other honey types worldwide.
Real Time Intelligent Target Detection and Analysis with Machine Vision
NASA Technical Reports Server (NTRS)
Howard, Ayanna; Padgett, Curtis; Brown, Kenneth
2000-01-01
We present an algorithm for detecting a specified set of targets for an Automatic Target Recognition (ATR) application. ATR involves processing images for detecting, classifying, and tracking targets embedded in a background scene. We address the problem of discriminating between targets and nontarget objects in a scene by evaluating 40x40 image blocks belonging to an image. Each image block is first projected onto a set of templates specifically designed to separate images of targets embedded in a typical background scene from those background images without targets. These filters are found using directed principal component analysis which maximally separates the two groups. The projected images are then clustered into one of n classes based on a minimum distance to a set of n cluster prototypes. These cluster prototypes have previously been identified using a modified clustering algorithm based on prior sensed data. Each projected image pattern is then fed into the associated cluster's trained neural network for classification. A detailed description of our algorithm will be given in this paper. We outline our methodology for designing the templates, describe our modified clustering algorithm, and provide details on the neural network classifiers. Evaluation of the overall algorithm demonstrates that our detection rates approach 96% with a false positive rate of less than 0.03%.
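The projection and minimum-distance steps can be sketched as below. The templates, prototypes and four-pixel "blocks" are hypothetical stand-ins for the learned filters and the flattened 40x40 blocks; the per-cluster neural network is not modelled.

```python
def project(block, templates):
    """Dot product of a flattened image block with each template (filter)."""
    return [sum(p * t for p, t in zip(block, template)) for template in templates]

def nearest_prototype(vector, prototypes):
    """Index of the prototype with the smallest squared Euclidean distance to `vector`."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(prototypes)), key=lambda i: sqdist(vector, prototypes[i]))

# Toy 4-pixel blocks stand in for flattened 40x40 (1600-pixel) blocks.
templates = [[1, 1, 0, 0], [0, 0, 1, 1]]      # hypothetical filters
prototypes = [[2.0, 0.0], [0.0, 2.0]]         # hypothetical cluster prototypes
block = [0.9, 1.0, 0.1, 0.0]

features = project(block, templates)
cluster = nearest_prototype(features, prototypes)
```

In the pipeline described above, the returned cluster index would select which trained classifier receives the projected pattern.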
Middleton, David A; Hughes, Eleri; Madine, Jillian
2004-08-11
We describe an NMR approach for detecting the interactions between phospholipid membranes and proteins, peptides, or small molecules. First, 1H-13C dipolar coupling profiles are obtained from hydrated lipid samples at natural isotope abundance using cross-polarization magic-angle spinning NMR methods. Principal component analysis of dipolar coupling profiles for synthetic lipid membranes in the presence of a range of biologically active additives reveals clusters that relate to different modes of interaction of the additives with the lipid bilayer. Finally, by representing profiles from multiple samples in the form of contour plots, it is possible to reveal statistically significant changes in dipolar couplings, which reflect perturbations in the lipid molecules at the membrane surface or within the hydrophobic interior.
Boubaker, Moez Ben; Picard, Donald; Duchesne, Carl; Tessier, Jayson; Alamdari, Houshang; Fafard, Mario
2018-05-17
This paper reports on the application of an acousto-ultrasonic (AU) scheme for the inspection of industrial-size carbon anode blocks used in the production of primary aluminium by the Hall-Héroult process. A frequency-modulated wave is used to excite the anode blocks at multiple points. The collected attenuated AU signals are decomposed using the Discrete Wavelet Transform (DWT), after which vectors of features are calculated. Principal Component Analysis (PCA) is utilized to cluster the AU responses of the anodes. The approach allows cracks in the blocks to be located, and the AU features were found to be sensitive to crack severity. The results are validated using images collected after cutting some anodes. Copyright © 2018 Elsevier B.V. All rights reserved.
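A wavelet-based feature vector of the kind described can be sketched with the Haar wavelet. The abstract does not state the wavelet family or the features used, so the Haar basis and the band-energy features here are assumptions, and the signal is hypothetical.

```python
import math

def haar_step(signal):
    """One level of the orthonormal Haar DWT: (approximation, detail) coefficients."""
    s = 1 / math.sqrt(2)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal) - 1, 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal) - 1, 2)]
    return approx, detail

def band_energies(signal, levels):
    """Energy of the detail band at each level, followed by the final approximation energy.

    The signal length is assumed to be divisible by 2**levels.
    """
    energies = []
    current = list(signal)
    for _ in range(levels):
        current, detail = haar_step(current)
        energies.append(sum(d * d for d in detail))
    energies.append(sum(a * a for a in current))
    return energies

# Hypothetical 8-sample received signal.
signal = [1, 3, 2, 6, 4, 4, 0, 2]
features = band_energies(signal, levels=3)
```

Because the transform is orthonormal, the band energies sum to the signal energy, so the feature vector describes how that energy is distributed across scales; vectors like this are what a subsequent PCA would take as input.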
Mishra, K K; Pal, R S; Arunkumar, R; Chandrashekara, C; Jain, S K; Bhatt, J C
2013-06-01
Total phenolics, radical scavenging activity (RSA) on DPPH, ascorbic acid content and chelating activity on Fe(2+) of Pleurotus citrinopileatus, Pleurotus djamor, Pleurotus eryngii, Pleurotus flabellatus, Pleurotus florida, Pleurotus ostreatus, Pleurotus sajor-caju and Hypsizygus ulmarius have been evaluated. The assayed mushrooms contained 3.94-21.67 mg TAE of phenolics, 13.63-69.67% DPPH scavenging activity, 3.76-6.76 mg ascorbic acid and 60.25-82.7% chelating activity. Principal Component Analysis (PCA) revealed that significantly higher total phenolics, RSA on DPPH and growth/day were present in P. eryngii, whereas P. citrinopileatus showed higher ascorbic acid and chelating activity. Agglomerative hierarchical clustering analysis revealed that the studied mushroom species fall into two clusters: Cluster I included P. djamor, P. eryngii and P. flabellatus, while Cluster II included H. ulmarius, P. sajor-caju, P. citrinopileatus, P. ostreatus and P. florida. Enhanced yield of P. eryngii was achieved on spent compost casing material. Use of casing materials enhanced yield by 21-107% over non-cased substrate. Copyright © 2012 Elsevier Ltd. All rights reserved.
Seasonal and spatial variations of water quality and trophic status in Daya Bay, South China Sea.
Wu, Mei-Lin; Wang, You-Shao; Wang, Yu-Tu; Sun, Fu-Lin; Sun, Cui-Ci; Cheng, Hao; Dong, Jun-De
2016-11-15
Coastal water quality and trophic status are subject to intensive environmental stress induced by human activities and climate change. Quarterly cruises were conducted to identify environmental characteristics in Daya Bay in 2013. Water quality is spatially and temporally dynamic in the bay. Cluster analysis (CA) groups 12 monitoring stations into two clusters. Cluster I consists of stations (S1, S2, S4-S7, S9, and S12) located in the central, eastern, and southern parts of the bay, representing less polluted regions. Cluster II includes stations (S3, S8, S10, and S11) located in the western and northern parts of the bay, indicating the highly polluted regions receiving a high amount of wastewater and freshwater discharge. Principal component analysis (PCA) identified that water quality experiences seasonal change (summer, winter, and spring-autumn seasons) because of two monsoons in the study area. Eutrophication in the bay is graded as high by Assessment of Estuarine Trophic Status (ASSETS). Copyright © 2016 Elsevier Ltd. All rights reserved.
Symptom clusters in patients with nasopharyngeal carcinoma during radiotherapy.
Xiao, Wenli; Chan, Carmen W H; Fan, Yuying; Leung, Doris Y P; Xia, Weixiong; He, Yan; Tang, Linquan
2017-06-01
Despite the improvement in radiotherapy (RT) technology, patients with nasopharyngeal carcinoma (NPC) still suffer from numerous distressing symptoms simultaneously during RT. The purpose of the study was to investigate the symptom clusters experienced by NPC patients during RT. First-treated Chinese NPC patients (n = 130) undergoing late-period RT (from week 4 till the end) were recruited for this cross-sectional study. They completed a sociodemographic and clinical data questionnaire, the Chinese version of the M. D. Anderson Symptom Inventory - Head and Neck Module (MDASI-HN-C) and the Chinese version of the Functional Assessment of Cancer Therapy - Head and Neck Scale (FACT-H&N-C). Principal axis factor analysis with oblimin rotation, independent t-test, one-way analysis of variance (ANOVA) and Pearson product-moment correlation were used to analyze the data. Four symptom clusters were identified, and labelled general, gastrointestinal, nutrition impact and social interaction impact. Of these 4 types, the nutrition impact symptom cluster was the most severe. Statistically positive correlations were found between severity of all 4 symptom clusters and symptom interference, as well as weight loss. Statistically negative correlations were detected between the cluster severity and the QOL total score and 3 out of 5 subscale scores. The four clusters identified reveal the symptom patterns experienced by NPC patients during RT. Future intervention studies on managing these symptom clusters are warranted, especially for the nutrition impact symptom cluster. Copyright © 2017 Elsevier Ltd. All rights reserved.
Computational methods for evaluation of cell-based data assessment--Bioconductor.
Le Meur, Nolwenn
2013-02-01
Recent advances in miniaturization and automation of technologies have enabled cell-based assay high-throughput screening, bringing along new challenges in data analysis. Automation, standardization, and reproducibility have become requirements for qualitative research. The Bioconductor community has worked in that direction, proposing several R packages to handle high-throughput data including flow cytometry (FCM) experiments. Altogether, these packages cover the main steps of an FCM analysis workflow, that is, data management, quality assessment, normalization, outlier detection, automated gating, cluster labeling, and feature extraction. Additionally, the open-source philosophy of R and Bioconductor, which offers room for new development, continuously drives research and improvement of these analysis methods, especially in the field of clustering and data mining. This review presents the principal FCM packages currently available in R and Bioconductor, their advantages and their limits. Copyright © 2012 Elsevier Ltd. All rights reserved.
Motegi, Hiromi; Tsuboi, Yuuri; Saga, Ayako; Kagami, Tomoko; Inoue, Maki; Toki, Hideaki; Minowa, Osamu; Noda, Tetsuo; Kikuchi, Jun
2015-11-04
There is an increasing need to use multivariate statistical methods for understanding biological functions, identifying the mechanisms of diseases, and exploring biomarkers. In addition to classical analyses such as hierarchical cluster analysis, principal component analysis, and partial least squares discriminant analysis, various multivariate strategies, including independent component analysis, non-negative matrix factorization, and multivariate curve resolution, have recently been proposed. However, determining the number of components is problematic. Despite the proposal of several different methods, no satisfactory approach has yet been reported. To resolve this problem, we implemented a new idea: classifying a component as "reliable" or "unreliable" based on the reproducibility of its appearance, regardless of the number of components in the calculation. Using the clustering method for classification, we applied this idea to multivariate curve resolution-alternating least squares (MCR-ALS). Comparisons between conventional and modified methods applied to proton nuclear magnetic resonance ((1)H-NMR) spectral datasets derived from known standard mixtures and biological mixtures (urine and feces of mice) revealed that more plausible results are obtained by the modified method. In particular, clusters containing little information were detected with reliability. This strategy, named "cluster-aided MCR-ALS," will facilitate the attainment of more reliable results in the metabolomics datasets.
Roblová, Vendula; Bittová, Miroslava; Kubáň, Petr; Kubáň, Vlastimil
2016-07-01
In this work aqueous infusions from ten Mentha herbal samples (four different Mentha species and six hybrids of Mentha x piperita) and 20 different peppermint teas were screened by capillary electrophoresis with UV detection. The fingerprint separation was accomplished in a 25 mM borate background electrolyte with 10% methanol at pH 9.3. The total polyphenolic content in the extracts was determined spectrophotometrically at 765 nm by a Folin-Ciocalteu phenol assay. Total antioxidant activity was determined by scavenging of 2,2-diphenyl-1-picrylhydrazyl radical at 515 nm. The peak areas of 12 dominant peaks from CE analysis, present in all samples, and the values of total polyphenolic content and total antioxidant activity obtained by spectrophotometry were combined into a single data matrix and principal component analysis was applied. The obtained principal component analysis model resulted in distinct clusters of Mentha and peppermint tea samples, distinguishing the samples according to their potential protective antioxidant effect. Principal component analysis, using a non-targeted approach with no need for compound identification, was found to be a promising new tool for the screening of herbal tea products. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Efficient generation of low-energy folded states of a model protein
NASA Astrophysics Data System (ADS)
Gordon, Heather L.; Kwan, Wai Kei; Gong, Chunhang; Larrass, Stefan; Rothstein, Stuart M.
2003-01-01
A number of short simulated annealing runs are performed on a highly-frustrated 46-"residue" off-lattice model protein. We perform, in an iterative fashion, a principal component analysis of the 946 nonbonded interbead distances, followed by two varieties of cluster analyses: hierarchical and k-means clustering. We identify several distinct sets of conformations with reasonably consistent cluster membership. Nonbonded distance constraints are derived for each cluster and are employed within a distance geometry approach to generate many new conformations, previously unidentified by the simulated annealing experiments. Subsequent analyses suggest that these new conformations are members of the parent clusters from which they were generated. Furthermore, several novel, previously unobserved structures with low energy were uncovered, augmenting the ensemble of simulated annealing results, and providing a complete distribution of low-energy states. The computational cost of this approach to generating low-energy conformations is small when compared to the expense of further Monte Carlo simulated annealing runs.
NASA Astrophysics Data System (ADS)
Fučkar, Neven-Stjepan; Guemas, Virginie; Massonnet, François; Doblas-Reyes, Francisco
2015-04-01
Over the modern observational era, the northern hemisphere sea ice concentration, age and thickness have experienced a sharp long-term decline superimposed with strong internal variability. Hence, there is a crucial need to identify robust patterns of Arctic sea ice variability on interannual timescales and disentangle them from the long-term trend in noisy datasets. The principal component analysis (PCA) is a versatile and broadly used method for the study of climate variability. However, the PCA has several limiting aspects because it assumes that all modes of variability have symmetry between positive and negative phases, and suppresses nonlinearities by using a linear covariance matrix. Clustering methods offer an alternative set of dimension reduction tools that are more robust and capable of taking into account possible nonlinear characteristics of a climate field. Cluster analysis aggregates data into groups or clusters based on their distance, to simultaneously minimize the distance between data points in a given cluster and maximize the distance between the centers of the clusters. We extract modes of Arctic interannual sea-ice variability with nonhierarchical K-means cluster analysis and investigate the mechanisms leading to these modes. Our focus is on the sea ice thickness (SIT) as the base variable for clustering because SIT holds most of the climate memory for variability and predictability on interannual timescales. We primarily use global reconstructions of sea ice fields with a state-of-the-art ocean-sea-ice model, but we also verify the robustness of determined clusters in other Arctic sea ice datasets. Applied cluster analysis over the 1958-2013 period shows that the optimal number of detrended SIT clusters is K=3. Determined SIT cluster patterns and their time series of occurrence are rather similar between different seasons and months. 
Two opposite thermodynamic modes are characterized with prevailing negative or positive SIT anomalies over the Arctic basin. The intermediate mode, with negative anomalies centered on the East Siberian shelf and positive anomalies along the North American side of the basin, has predominately dynamic characteristics. The associated sea ice concentration (SIC) clusters vary more between different seasons and months, but the SIC patterns are physically framed by the SIT cluster patterns.
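The nonhierarchical K-means step described in this abstract can be illustrated with a minimal sketch of Lloyd's algorithm. This is not the authors' code: the function name `kmeans`, the random initialisation, and the toy two-dimensional points are assumptions made for illustration, and the study itself clustered detrended sea ice thickness fields with K=3.

```python
import random

def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's algorithm on a list of equal-length tuples.

    Returns (centers, labels). Initial centers are k distinct points
    drawn at random (an illustrative choice, not the study's setup).
    """
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins the nearest center
        # (squared Euclidean distance).
        new_labels = [
            min(range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centers[c])))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
    return centers, labels

# Hypothetical toy data: two well-separated groups of points.
toy = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = kmeans(toy, k=2)
```

Each iteration lowers (or leaves unchanged) the within-cluster sum of squared distances, which is the sense in which points in a cluster are pulled together while cluster centers end up apart.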
Open-Source Sequence Clustering Methods Improve the State Of the Art.
Kopylova, Evguenia; Navas-Molina, Jose A; Mercier, Céline; Xu, Zhenjiang Zech; Mahé, Frédéric; He, Yan; Zhou, Hong-Wei; Rognes, Torbjørn; Caporaso, J Gregory; Knight, Rob
2016-01-01
Sequence clustering is a common early step in amplicon-based microbial community analysis, when raw sequencing reads are clustered into operational taxonomic units (OTUs) to reduce the run time of subsequent analysis steps. Here, we evaluated the performance of recently released state-of-the-art open-source clustering software products, namely, OTUCLUST, Swarm, SUMACLUST, and SortMeRNA, against current principal options (UCLUST and USEARCH) in QIIME, hierarchical clustering methods in mothur, and USEARCH's most recent clustering algorithm, UPARSE. All the latest open-source tools showed promising results, reporting up to 60% fewer spurious OTUs than UCLUST, indicating that the underlying clustering algorithm can vastly reduce the number of these derived OTUs. Furthermore, we observed that stringent quality filtering, such as is done in UPARSE, can cause a significant underestimation of species abundance and diversity, leading to incorrect biological results. Swarm, SUMACLUST, and SortMeRNA have been included in the QIIME 1.9.0 release. IMPORTANCE Massive collections of next-generation sequencing data call for fast, accurate, and easily accessible bioinformatics algorithms to perform sequence clustering. A comprehensive benchmark is presented, including open-source tools and the popular USEARCH suite. Simulated, mock, and environmental communities were used to analyze sensitivity, selectivity, species diversity (alpha and beta), and taxonomic composition. The results demonstrate that recent clustering algorithms can significantly improve accuracy and preserve estimated diversity without the application of aggressive filtering. Moreover, these tools are all open source, apply multiple levels of multithreading, and scale to the demands of modern next-generation sequencing data, which is essential for the analysis of massive multidisciplinary studies such as the Earth Microbiome Project (EMP) (J. A. Gilbert, J. K. Jansson, and R. 
Knight, BMC Biol 12:69, 2014, http://dx.doi.org/10.1186/s12915-014-0069-1).
Cluster-based exposure variation analysis
2013-01-01
Background Static posture, repetitive movements and lack of physical variation are known risk factors for work-related musculoskeletal disorders, and thus need to be properly assessed in occupational studies. The aims of this study were (i) to investigate the effectiveness of a conventional exposure variation analysis (EVA) in discriminating exposure time lines and (ii) to compare it with a new cluster-based method for analysis of exposure variation. Methods For this purpose, we simulated a repeated cyclic exposure varying within each cycle between “low” and “high” exposure levels in a “near” or “far” range, and with “low” or “high” velocities (exposure change rates). The duration of each cycle was also manipulated by selecting a “small” or “large” standard deviation of the cycle time. These parameters reflected three dimensions of exposure variation, i.e. range, frequency and temporal similarity. Each simulation trace included two realizations of 100 concatenated cycles with either low (ρ = 0.1), medium (ρ = 0.5) or high (ρ = 0.9) correlation between the realizations. These traces were analyzed by conventional EVA, and a novel cluster-based EVA (C-EVA). Principal component analysis (PCA) was applied on the marginal distributions of 1) the EVA of each of the realizations (univariate approach), 2) a combination of the EVA of both realizations (multivariate approach) and 3) C-EVA. The least number of principal components describing more than 90% of variability in each case was selected and the projection of marginal distributions along the selected principal component was calculated. A linear classifier was then applied to these projections to discriminate between the simulated exposure patterns, and the accuracy of classified realizations was determined. 
Results C-EVA classified exposures more correctly than univariate and multivariate EVA approaches; classification accuracy was 49%, 47% and 52% for EVA (univariate and multivariate), and C-EVA, respectively (p < 0.001). All three methods performed poorly in discriminating exposure patterns differing with respect to the variability in cycle time duration. Conclusion While C-EVA had a higher accuracy than conventional EVA, both failed to detect differences in temporal similarity. The data-driven optimality of data reduction and the capability of handling multiple exposure time lines in a single analysis are the advantages of the C-EVA. PMID:23557439
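The component-selection rule used in the Methods above (the least number of principal components describing more than 90% of variability) can be sketched as follows. The function name `components_needed` and the example eigenvalues are hypothetical; the abstract does not report the actual eigenvalue spectra.

```python
def components_needed(eigenvalues, threshold=0.90):
    """Smallest number of leading components whose cumulative share of
    total variance strictly exceeds `threshold`."""
    ordered = sorted(eigenvalues, reverse=True)  # largest variance first
    total = sum(ordered)
    cumulative = 0.0
    for count, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total > threshold:
            return count
    return len(ordered)  # threshold never exceeded: keep everything

# Hypothetical covariance-matrix eigenvalues (variances along each PC).
# Cumulative shares are 0.50, 0.80, 0.95, 1.00, so three components
# are the first to pass 90%.
n_components = components_needed([5.0, 3.0, 1.5, 0.5])
```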
Automated cloud screening of AVHRR imagery using split-and-merge clustering
NASA Technical Reports Server (NTRS)
Gallaudet, Timothy C.; Simpson, James J.
1991-01-01
Previous methods to segment clouds from ocean in AVHRR imagery have shown varying degrees of success, with nighttime approaches being the most limited. An improved method of automatic image segmentation, the principal component transformation split-and-merge clustering (PCTSMC) algorithm, is presented and applied to cloud screening of both nighttime and daytime AVHRR data. The method combines spectral differencing, the principal component transformation, and split-and-merge clustering to sample objectively the natural classes in the data. This segmentation method is then augmented by supervised classification techniques to screen clouds from the imagery. Comparisons with other nighttime methods demonstrate its improved capability in this application. The sensitivity of the method to clustering parameters is presented; the results show that the method is insensitive to the split-and-merge thresholds.
NASA Astrophysics Data System (ADS)
Radzka, Elżbieta; Rymuza, Katarzyna
2015-04-01
The work is based on meteorological data recorded by nine stations of the Institute of Meteorology and Water Management located in east-central Poland from 1971 to 2005. The region encompasses the North Podlasian Lowland and the South Podlasian Lowland. Average values of selected agroclimate indicators for the growing season were determined. Moreover, principal component analysis was conducted to indicate elements that exerted the greatest influence on the agroclimate. Also, cluster analysis was carried out to select stations with similar agroclimate. Ward's method was used for clustering and the Euclidean distance was applied. Principal component analysis revealed that the agroclimate of east-central Poland was predominantly affected by climatic water balance, number of days of active plant growth, length of the farming period, and the average air temperature during the growing season (Apr-Sept). Based on the analysis, the region of east-central Poland was divided into two groups (areas) with different agroclimatic conditions. The first area comprised the following stations: Szepietowo and Białowieża located in the North Podlasian Lowland and Biała Podlaska situated in the northern part of the South Podlasian Lowland. This area was characterized by shorter farming periods and a lower average air temperature during the growing season. The other group included the remaining stations located in the western part of both the Lowlands which was warmer and where greater water deficits were recorded.
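Ward clustering with Euclidean distance, as named in this abstract, merges at each step the pair of clusters whose union gives the smallest increase in within-cluster sum of squares. The sketch below is a naive illustration of that criterion, not the authors' workflow; the function names and the five two-indicator "stations" are invented for the example.

```python
from itertools import combinations

def centroid(cluster):
    """Coordinate-wise mean of a list of points."""
    return [sum(col) / len(cluster) for col in zip(*cluster)]

def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge:
    |a||b| / (|a| + |b|) times the squared Euclidean centroid distance."""
    ca, cb = centroid(a), centroid(b)
    sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * sq_dist

def ward_clusters(points, k):
    """Agglomerate singleton clusters until only k clusters remain."""
    clusters = [[tuple(p)] for p in points]
    while len(clusters) > k:
        # Pick the pair of clusters that is cheapest to merge.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: ward_cost(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)]
        clusters.append(merged)
    return clusters

# Hypothetical standardized indicator values for five stations.
stations = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (5.0, 5.0), (5.1, 4.8)]
groups = ward_clusters(stations, k=2)
```

This pairwise search is cubic in the number of points, which is acceptable for a handful of stations; library implementations use more efficient update formulas.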
Research of seafloor topographic analyses for a staged mineral exploration
NASA Astrophysics Data System (ADS)
Ikeda, M.; Kadoshima, K.; Koizumi, Y.; Yamakawa, T.; Asakawa, E.; Sumi, T.; Kose, M.
2016-12-01
J-MARES (Research and Development Partnership for Next Generation Technology of Marine Resources Survey, JAPAN) has been designing a low-cost and high-efficiency exploration system for seafloor hydrothermal massive sulfide (SMS) deposits since 2014 under the "Cross-ministerial Strategic Innovation Promotion Program (SIP)" granted by the Cabinet Office, Government of Japan. We proposed a multi-stage approach that proceeds from a regional-scale survey through a semi-detailed stage to a detailed-scale survey, narrowing down prospective areas by seafloor topographic analyses. We applied this method to an area of more than 100 km x 100 km around the Okinawa Trough, including some well-known mineralized deposits. In the regional-scale survey, we assume survey areas of more than 100 km x 100 km, for which the spatial resolution of the topography data should be greater than 100 m. The 500 m resolution data, interpolated to 250 m resolution, was used for extracting depressions and performing principal component analysis (PCA) on the wavelengths obtained from frequency analysis. As a result, we successfully extracted areas having topographic features quite similar to those of well-known mineralized deposits. In the semi-local survey stage, we use topography data obtained by bathymetric survey using a multi-narrow beam echo-sounder. The 30 m resolution data was used for extracting depressions, relatively large mounds, hills, and lineaments such as faults, and also for performing frequency analysis. As a result, the wavelengths constituting the principal components in the target area were extracted by PCA of the wavelengths obtained from frequency analysis. A color image was then composited using the second principal component (PC2) to the fourth principal component (PC4), in which the continuity of specific wavelengths was observed and was consistent with the extracted lineaments. 
In addition, well-known mineralized deposits were discriminated in the same clusters by using clustering from PC2 to PC4. We applied the results described above to a new area, and successfully extracted a closely similar area in the vicinity of one of the well-known mineralized deposits. We are therefore going to verify the extracted areas by using geophysical methods, such as vertical cable seismic and time-domain EM survey, developed in this SIP project.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chow, Edward, E-mail: Edward.Chow@sunnybrook.c; James, Jennifer; Barsevick, Andrea
Purpose: To explore the relationships (clusters) among the functional interference items in the Brief Pain Inventory (BPI) in patients with bone metastases. Methods: Patients enrolled in the Radiation Therapy Oncology Group (RTOG) 9714 bone metastases study were eligible. Patients were assessed at baseline and 4, 8, and 12 weeks after randomization for the palliative radiotherapy with the BPI, which consists of seven functional items: general activity, mood, walking ability, normal work, relations with others, sleep, and enjoyment of life. Principal component analysis with varimax rotation was used to determine the clusters between the functional items at baseline and the follow-up. Cronbach's alpha was used to determine the consistency and reliability of each cluster at baseline and follow-up. Results: There were 448 male and 461 female patients, with a median age of 67 years. There were two functional interference clusters at baseline, which accounted for 71% of the total variance. The first cluster (physical interference) included normal work and walking ability, which accounted for 58% of the total variance. The second cluster (psychosocial interference) included relations with others and sleep, which accounted for 13% of the total variance. The Cronbach's alpha statistics were 0.83 and 0.80, respectively. The functional clusters changed at week 12 in responders but persisted through week 12 in nonresponders. Conclusion: Palliative radiotherapy is effective in reducing bone pain. Functional interference component clusters exist in patients treated for bone metastases. These clusters changed over time in this study, possibly attributable to treatment. Further research is needed to examine these effects.
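The internal-consistency statistic reported above can be sketched with the standard formula for Cronbach's alpha, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores), where k is the number of items in a cluster. The function name and the small score table are hypothetical; the BPI data themselves are not given in the abstract.

```python
from statistics import variance

def cronbach_alpha(items):
    """Cronbach's alpha for a cluster of questionnaire items.

    `items` is a list of per-item score lists, each holding one score
    per respondent in the same respondent order. Needs at least two items.
    """
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # per-respondent sum
    item_var_sum = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_var_sum / variance(totals))

# Hypothetical 0-10 interference ratings from five respondents on the
# two items of a "physical interference" cluster.
normal_work = [2, 5, 7, 3, 9]
walking_ability = [3, 4, 8, 2, 9]
alpha = cronbach_alpha([normal_work, walking_ability])
```

Values near 1 indicate that the items in a cluster move together across respondents; values near 0 indicate that they are essentially unrelated.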
NeatMap--non-clustering heat map alternatives in R.
Rajaram, Satwik; Oono, Yoshi
2010-01-22
The clustered heat map is the most popular means of visualizing genomic data. It compactly displays a large amount of data in an intuitive format that facilitates the detection of hidden structures and relations in the data. However, it is hampered by its use of cluster analysis which does not always respect the intrinsic relations in the data, often requiring non-standardized reordering of rows/columns to be performed post-clustering. This sometimes leads to uninformative and/or misleading conclusions. Often it is more informative to use dimension-reduction algorithms (such as Principal Component Analysis and Multi-Dimensional Scaling) which respect the topology inherent in the data. Yet, despite their proven utility in the analysis of biological data, they are not as widely used. This is at least partially due to the lack of user-friendly visualization methods with the visceral impact of the heat map. NeatMap is an R package designed to meet this need. NeatMap offers a variety of novel plots (in 2 and 3 dimensions) to be used in conjunction with these dimension-reduction techniques. Like the heat map, but unlike traditional displays of such results, it allows the entire dataset to be displayed while visualizing relations between elements. It also allows superimposition of cluster analysis results for mutual validation. NeatMap is shown to be more informative than the traditional heat map with the help of two well-known microarray datasets. NeatMap thus preserves many of the strengths of the clustered heat map while addressing some of its deficiencies. It is hoped that NeatMap will spur the adoption of non-clustering dimension-reduction algorithms.
Wongchai, C; Chaidee, A; Pfeiffer, W
2012-01-01
Global warming increases plant salt stress via evaporation after irrigation, but how plant cells sense salt stress remains unknown. Here, we searched for correlation-based targets of salt stress sensing in Chenopodium rubrum cell suspension cultures. We proposed a linkage between the sensing of salt stress and the sensing of distinct metabolites. Consequently, we analysed various extracellular pH signals in autotroph and heterotroph cell suspensions. Our search included signals after 52 treatments: salt and osmotic stress, ion channel inhibitors (amiloride, quinidine), salt-sensing modulators (proline), amino acids, carboxylic acids and regulators (salicylic acid, 2,4-dichlorphenoxyacetic acid). Multivariate analyses revealed hierarchical clusters of signals and five principal components of extracellular proton flux. The principal component correlated with salt stress was an antagonism of γ-aminobutyric and salicylic acid, confirming involvement of acid-sensing ion channels (ASICs) in salt stress sensing. Proline, short non-substituted mono-carboxylic acids (C2-C6), lactic acid and amiloride characterised the four uncorrelated principal components of proton flux. The proline-associated principal component included an antagonism of 2,4-dichlorphenoxyacetic acid and a set of amino acids (hydrophobic, polar, acidic, basic). The five principal components captured 100% of variance of extracellular proton flux. Thus, a bias-free, functional high-throughput screening was established to extract new clusters of response elements and potential signalling pathways, and to serve as a core for quantitative meta-analysis in plant biology. The eigenvectors reorient research, associating proline with development instead of salt stress, and the proof of existence of multiple components of proton flux can help to resolve controversy about the acid growth theory. © 2011 German Botanical Society and The Royal Botanical Society of the Netherlands.
Clustering of food and activity preferences in primary school children.
Rodenburg, Gerda; Oenema, Anke; Pasma, Marleen; Kremers, Stef P J; van de Mheen, Dike
2013-01-01
This study examined clustering of food and activity preferences in Dutch primary school children. It also explored whether the preference clusters are associated with child and parental background characteristics and with parenting practices. Data were used from 1480 parent-child dyads participating in the IVO Nutrition and Physical Activity Child cohort (INPACT). Children aged 8-11 years reported their preferences for food (e.g. fruit and sweet snacks) and activities (e.g. biking and watching television) at school with a newly-developed, visual instrument designed for primary school children. Parents completed a questionnaire at home. Principal component analysis was used to identify preference clusters. Backward regression analyses were used to examine the relationship between child and parental characteristics with cluster scores. We found (1) a clustering of preferences for unhealthy foods and unhealthy drinks, (2) a clustering of preferences for various physical activity behaviours, and (3) a clustering of preferences for unhealthy drinks and sedentary behaviour. Boys had a higher cluster score than girls on all three preference clusters. In addition, physical activity-related parenting practices were negatively related to unhealthy preference clusters and positively to the physical-activity-preference cluster. The next step is to relate our preference clusters to child dietary and activity behaviours, with special attention to gender differences. This may help in the development of interventions aimed at improving children's food and activity preferences. Copyright © 2012 Elsevier Ltd. All rights reserved.
Lifestyle and accidents among young drivers.
Gregersen, N P; Berg, H Y
1994-06-01
This study covers the lifestyle component of the problems related to young drivers' accident risk. The purpose of the study is to measure the relationship between lifestyle and accident risk, and to identify specific high-risk and low-risk groups. Lifestyle is measured through a questionnaire, where 20-year-olds describe themselves and how often they deal with a large number of different activities, like sports, music, movies, reading, cars and driving, political engagement, etc. They also report their involvement in traffic accidents. With a principal component analysis followed by a cluster analysis, lifestyle profiles are defined. These profiles are finally correlated to accidents, which makes it possible to define high-risk and low-risk groups. The cluster analysis defined 15 clusters including four high-risk groups with an average overrisk of 150% and two low-risk groups with an average underrisk of 75%. The results are discussed from two perspectives. The first is the importance of theoretical understanding of the contribution of lifestyle factors to young drivers' high accident risk. The second is how the findings could be used in practical road safety measures, like education, campaigns, etc.
Genetic differentiation and geographical Relationship of Asian barley landraces using SSRs
Naeem, Rehan; Dahleen, Lynn; Mirza, Bushra
2011-01-01
Genetic diversity in 403 morphologically distinct landraces of barley (Hordeum vulgare L. subsp. vulgare) originating from seven geographical zones of Asia was studied using simple sequence repeat (SSR) markers from regions of medium to high recombination in the barley genome. The seven polymorphic SSR markers representing each of the chromosomes chosen for the study revealed a high level of allelic diversity among the landraces. Genetic richness was highest in those from India, followed by Pakistan while it was lowest for Uzbekistan and Turkmenistan. Out of the 50 alleles detected, 15 were unique to a geographic region. Genetic diversity was highest for landraces from Pakistan (0.70 ± 0.06) and lowest for those from Uzbekistan (0.18 ± 0.17). Likewise, polymorphic information content (PIC) was highest for Pakistan (0.67 ± 0.06) and lowest for Uzbekistan (0.15 ± 0.17). Diversity among groups was 40% compared to 60% within groups. Principal component analysis clustered the barley landraces into three groups to predict their domestication patterns. In total 51.58% of the variation was explained by the first two principal components of the barley germplasm. Pakistan landraces were clustered separately from those of India, Iran, Nepal and Iraq, whereas those from Turkmenistan and Uzbekistan were clustered together into a separate group. PMID:21734828
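The two per-marker summaries quoted above, genetic diversity and polymorphic information content (PIC), are commonly computed from allele frequencies p_i as 1 - Σ p_i² and 1 - Σ p_i² - Σ_{i<j} 2 p_i² p_j², respectively. The abstract does not state which estimators the authors used, so the sketch below assumes these widely used definitions; the function names and allele frequencies are hypothetical.

```python
from itertools import combinations

def gene_diversity(freqs):
    """Expected heterozygosity: 1 minus the sum of squared allele frequencies."""
    return 1 - sum(p * p for p in freqs)

def pic(freqs):
    """Polymorphic information content under the commonly used definition
    (assumed here, not confirmed by the abstract)."""
    homozygosity = sum(p * p for p in freqs)
    cross = sum(2 * p * p * q * q for p, q in combinations(freqs, 2))
    return 1 - homozygosity - cross

# Hypothetical allele frequencies at one SSR locus in one region.
freqs = [0.5, 0.3, 0.2]
diversity = gene_diversity(freqs)
information = pic(freqs)
```

PIC is never larger than gene diversity for the same frequencies, which matches the pattern of the paired values reported for Pakistan and Uzbekistan.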
Armitage, Emily G; Godzien, Joanna; Peña, Imanol; López-Gonzálvez, Ángeles; Angulo, Santiago; Gradillas, Ana; Alonso-Herranz, Vanesa; Martín, Julio; Fiandor, Jose M; Barrett, Michael P; Gabarro, Raquel; Barbas, Coral
2018-05-18
A lack of viable hits, increasing resistance, and limited knowledge on mode of action is hindering drug discovery for many diseases. To optimize prioritization and accelerate the discovery process, a strategy to cluster compounds based on more than chemical structure is required. We show the power of metabolomics in comparing effects on metabolism of 28 different candidate treatments for Leishmaniasis (25 from the GSK Leishmania box, two analogues of Leishmania box series, and amphotericin B as a gold standard treatment), tested in the axenic amastigote form of Leishmania donovani. Capillary electrophoresis-mass spectrometry was applied to identify the metabolic profile of Leishmania donovani, and principal components analysis was used to cluster compounds on potential mode of action, offering a medium throughput screening approach in drug selection/prioritization. The comprehensive and sensitive nature of the data has also made detailed effects of each compound obtainable, providing a resource to assist in further mechanistic studies and prioritization of these compounds for the development of new antileishmanial drugs.
A Multivariate Analysis of Galaxy Cluster Properties
NASA Astrophysics Data System (ADS)
Ogle, P. M.; Djorgovski, S.
1993-05-01
We have assembled from the literature a data base on 394 clusters of galaxies, with up to 16 parameters per cluster. They include optical and x-ray luminosities, x-ray temperatures, galaxy velocity dispersions, central galaxy and particle densities, optical and x-ray core radii and ellipticities, etc. In addition, derived quantities, such as the mass-to-light ratios and x-ray gas masses are included. Doubtful measurements have been identified, and deleted from the data base. Our goal is to explore the correlations between these parameters, and interpret them in the framework of our understanding of evolution of clusters and large-scale structure, such as the Gott-Rees scaling hierarchy. Among the simple, monovariate correlations we found, the most significant include those between the optical and x-ray luminosities, x-ray temperatures, cluster velocity dispersions, and central galaxy densities, in various mutual combinations. While some of these correlations have been discussed previously in the literature, generally smaller samples of objects have been used. We will also present the results of a multivariate statistical analysis of the data, including a principal component analysis (PCA). Such an approach has not been used previously for studies of cluster properties, even though it is much more powerful and complete than the simple monovariate techniques which are commonly employed. The observed correlations may lead to powerful constraints for theoretical models of formation and evolution of galaxy clusters. P.M.O. was supported by a Caltech graduate fellowship. S.D. acknowledges a partial support from the NASA contract NAS5-31348 and the NSF PYI award AST-9157412.
Oberle, Michael; Wohlwend, Nadia; Jonas, Daniel; Maurer, Florian P.; Jost, Geraldine; Tschudin-Sutter, Sarah; Vranckx, Katleen; Egli, Adrian
2016-01-01
Background The technical, biological, and inter-center reproducibility of matrix-assisted laser desorption ionization-time of flight mass spectrometry (MALDI TOF MS) typing data has not yet been explored. The aim of this study is to compare typing data from multiple centers employing bioinformatics using bacterial strains from two past outbreaks and non-related strains. Material/Methods Participants received twelve extended spectrum betalactamase-producing E. coli isolates and followed the same standard operating procedure (SOP) including a full-protein extraction protocol. All laboratories provided visually read spectra via flexAnalysis (Bruker, Germany). Raw data from each laboratory allowed calculating the technical and biological reproducibility between centers using BioNumerics (Applied Maths NV, Belgium). Results Technical and biological reproducibility ranged between 96.8–99.4% and 47.6–94.4%, respectively. The inter-center reproducibility showed a comparable clustering among identical isolates. Principal component analysis indicated a higher tendency to cluster within the same center. Therefore, we used a discriminant analysis, which completely separated the clusters. Next, we defined a reference center and performed a statistical analysis to identify specific peaks to identify the outbreak clusters. Finally, we used a classifier algorithm and a linear support vector machine on the determined peaks as classifier. A validation showed that within the set of the reference center, the identification of the cluster was 100% correct with a large contrast between the score with the correct cluster and the next best scoring cluster. Conclusions Based on the sufficient technical and biological reproducibility of MALDI-TOF MS based spectra, detection of specific clusters is possible from spectra obtained from different centers. However, we believe that a shared SOP and a bioinformatics approach are required to make the analysis robust and reliable. PMID:27798637
Grouping of Bulgarian wines according to grape variety by using statistical methods
NASA Astrophysics Data System (ADS)
Milev, M.; Nikolova, Kr.; Ivanova, Ir.; Minkova, St.; Evtimov, T.; Krustev, St.
2017-12-01
Sixty-eight types of Bulgarian wine were studied with respect to 9 optical parameters: color parameters in the XYZ and CIE Lab color systems, lightness, hue angle, chroma, fluorescence intensity and emission wavelength. The main objective of this research is to use hierarchical cluster analysis to evaluate the similarity and distance between the examined wines and to group them on the basis of physical parameters. We found that the wines are grouped in clusters according to the degree of similarity between them. There are two main clusters, each with two subclusters. The first contains white wines and Sira; the second contains red wines and rosé. The results of the cluster analysis are presented graphically as a dendrogram. The other statistical technique used is factor analysis performed by the method of principal components (PCA). The aim is to reduce the large number of variables to a few factors by grouping correlated variables into one factor and assigning uncorrelated variables to different factors. Moreover, the factor analysis made it possible to determine the parameters with the greatest influence on the distribution of samples among the clusters. In our study, after Varimax rotation of the factors, the parameters were combined into two factors, which together explain about 80% of the total variation. The first explains 61.49% and correlates with the color characteristics; the second explains 18.34% of the variation and correlates with the parameters connected with fluorescence spectroscopy.
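The grouping step described in this abstract can be illustrated with a minimal sketch of agglomerative hierarchical clustering. The abstract does not state the linkage rule or distance used, so average linkage with Euclidean distance is an assumption here, and the four two-parameter samples (lightness, chroma) are invented purely for illustration.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def average_linkage(points, n_clusters):
    """Repeatedly merge the two clusters with the smallest mean pairwise distance."""
    clusters = [[i] for i in range(len(points))]
    merges = []  # (members_a, members_b, distance): the information a dendrogram draws
    while len(clusters) > n_clusters:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            total = sum(dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
            d = total / (len(clusters[a]) * len(clusters[b]))
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), d))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe: combinations guarantees a < b
    return [sorted(c) for c in clusters], merges


# Hypothetical samples: two light, low-chroma wines and two dark, high-chroma wines.
samples = [(90.0, 5.0), (88.0, 7.0), (30.0, 40.0), (28.0, 45.0)]
groups, merges = average_linkage(samples, n_clusters=2)
print(groups)  # [[0, 1], [2, 3]]
```

The `merges` list records the order and height of each join, which is what a dendrogram plots.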
Steingass, Christof Björn; Jutzi, Manfred; Müller, Jenny; Carle, Reinhold; Schmarr, Hans-Georg
2015-03-01
Ripening-dependent changes of pineapple volatiles were studied in a nontargeted profiling analysis. Volatiles were isolated via headspace solid phase microextraction and analyzed by comprehensive 2D gas chromatography and mass spectrometry (HS-SPME-GC×GC-qMS). Profile patterns presented in the contour plots were evaluated applying image processing techniques and subsequent multivariate statistical data analysis. Statistical methods comprised unsupervised hierarchical cluster analysis (HCA) and principal component analysis (PCA) to classify the samples. Supervised partial least squares discriminant analysis (PLS-DA) and partial least squares (PLS) regression were applied to discriminate different ripening stages and describe the development of volatiles during postharvest storage, respectively. Hereby, substantial chemical markers allowing for class separation were revealed. The workflow permitted the rapid distinction between premature green-ripe pineapples and postharvest-ripened sea-freighted fruits. Volatile profiles of fully ripe air-freighted pineapples were similar to those of green-ripe fruits postharvest ripened for 6 days after simulated sea freight export, after PCA with only two principal components. However, PCA considering also the third principal component allowed differentiation between air-freighted fruits and the four progressing postharvest maturity stages of sea-freighted pineapples.
Verma, Priyanka; Kumar, Manoj; Mishra, Girish; Sahoo, Dinabandhu
2017-02-01
In the present study, thirty seaweeds from Indian coasts were analyzed for their biochemical components, including pigments, fatty acids and ash content, as part of a bioprospecting effort. Multivariate analysis of biochemical components and fatty acids was done using Principal Component Analysis (PCA) and Agglomerative Hierarchical Clustering (AHC) to reveal chemotaxonomic relationships among the seaweeds. The overall analysis suggests that these seaweeds have multi-functional properties and can be utilized as a promising bioresource of proteins, lipids, pigments and carbohydrates for the food/feed and biofuel industries. Copyright © 2016. Published by Elsevier Ltd.
Yennurajalingam, Sriram; Williams, Janet L; Chisholm, Gary; Bruera, Eduardo
2016-03-01
Advanced cancer patients frequently experience debilitating symptoms that occur in clusters, but few pharmacological studies have targeted symptom clusters. Our objective was to examine the effects of dexamethasone on symptom clusters in patients with advanced cancer. We reviewed the data from a previous randomized clinical trial to determine the effects of dexamethasone on cancer symptoms. Symptom clusters were identified according to baseline symptoms by using principal component analysis. Correlations and change in the severity of symptom clusters were analyzed after study treatment. A total of 114 participants were included in this study. Three clusters were identified: fatigue/anorexia-cachexia/depression (FAD), sleep/anxiety/drowsiness (SAD), and pain/dyspnea (PD). Changes in severity of FAD and PD significantly correlated over time (at baseline, day 8, and day 15). The FAD cluster was associated with significant improvement in severity at day 8 and day 15, whereas no significant change was observed with the SAD cluster or PD cluster after dexamethasone treatment. The results of this preliminary study suggest significant correlation over time and improvement in the FAD cluster at day 8 and day 15 after treatment with dexamethasone. These findings suggest that fatigue, anorexia-cachexia, and depression may share a common pathophysiologic basis. Further studies are needed to investigate this cluster and target anti-inflammatory therapies. ©AlphaMed Press.
Amro, Amin; Waldum, Bård; von der Lippe, Nanna; Brekke, Fredrik Barth; Dammen, Toril; Miaskowski, Christine; Os, Ingrid
2015-01-01
Patients with end-stage renal disease on dialysis have reduced survival rates compared with the general population. Symptoms are frequent in dialysis patients, and a symptom cluster is defined as two or more related co-occurring symptoms. The aim of this study was to explore the associations between symptom clusters and mortality in dialysis patients. In a prospective observational cohort study of dialysis patients (n = 301), Kidney Disease and Quality of Life Short Form and Beck Depression Inventory questionnaires were administered. To generate symptom clusters, principal component analysis with varimax rotation was used on 11 kidney-specific self-reported physical symptoms. A Beck Depression Inventory score of 16 or greater was defined as clinically significant depressive symptoms. Physical and mental component summary scores were generated from Short Form-36. Multivariate Cox regression analysis was used for the survival analysis, Kaplan-Meier curves and log-rank statistics were applied to compare survival rates between the groups. Three different symptom clusters were identified; one included loading of several uremic symptoms. In multivariate analyses and after adjustment for health-related quality of life and depressive symptoms, the worst perceived quartile of the "uremic" symptom cluster independently predicted all-cause mortality (hazard ratio 2.47, 95% CI 1.44-4.22, P = 0.001) compared with the other quartiles during a follow-up period that ranged from four to 52 months. The two other symptom clusters ("neuromuscular" and "skin") or the individual symptoms did not predict mortality. Clustering of uremic symptoms predicted mortality. Assessing co-occurring symptoms rather than single symptoms may help to identify dialysis patients at high risk for mortality. Copyright © 2015 American Academy of Hospice and Palliative Medicine. Published by Elsevier Inc. All rights reserved.
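The Kaplan-Meier curves mentioned in this abstract rest on a simple product-limit calculation, sketched below. The follow-up times and event flags are hypothetical and are not taken from the study; they only show how censored patients stay in the risk set without counting as deaths.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of survival.

    times  : follow-up time per patient (months)
    events : 1 if death was observed at that time, 0 if the patient was censored
    Returns a list of (time, survival probability) at each time with a death.
    """
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        at_risk = sum(1 for ti in times if ti >= t)  # still under observation at t
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
    return curve


# Hypothetical cohort of six patients followed for 4 to 52 months.
times = [4, 10, 10, 20, 30, 52]
events = [1, 1, 0, 1, 0, 0]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"month {t}: S = {s:.3f}")
# month 4: S = 0.833
# month 10: S = 0.667
# month 20: S = 0.444
```

Comparing two such curves (for example, the worst quartile against the rest) is what the log-rank statistic formalizes.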
NASA Astrophysics Data System (ADS)
Sanchez, J. L.; Osipowicz, T.; Tang, S. M.; Tay, T. S.; Win, T. T.
1997-07-01
The trace element concentrations found in geological samples can shed light on the formation process. In the case of gemstones, which might be of artificial or natural origin, there is also considerable interest in the development of methods that provide identification of the origin of a sample. For rubies, trace element concentrations present in natural samples were shown previously to be significant indicators of the region of origin [S.M. Tang et al., Appl. Spectr. 42 (1988) 44, and 43 (1989) 219]. Here we report the results of micro-PIXE analyses of trace element (Ti, V, Cr, Fe, Cu and Ga) concentrations of a large set (n = 130) of natural rough rubies from nine locations in Myanmar (Burma). The resulting concentrations are subjected to statistical analysis. Six of the nine groups form clusters when the database is evaluated using tree clustering and principal component analysis.
Gopinath, Kaundinya; Krishnamurthy, Venkatagiri; Lacey, Simon; Sathian, K
2018-02-01
In a recent study Eklund et al. have shown that cluster-wise family-wise error (FWE) rate-corrected inferences made in parametric statistical method-based functional magnetic resonance imaging (fMRI) studies over the past couple of decades may have been invalid, particularly for cluster defining thresholds less stringent than p < 0.001; principally because the spatial autocorrelation functions (sACFs) of fMRI data had been modeled incorrectly to follow a Gaussian form, whereas empirical data suggest otherwise. Hence, the residuals from general linear model (GLM)-based fMRI activation estimates in these studies may not have possessed a homogenously Gaussian sACF. Here we propose a method based on the assumption that heterogeneity and non-Gaussianity of the sACF of the first-level GLM analysis residuals, as well as temporal autocorrelations in the first-level voxel residual time-series, are caused by unmodeled MRI signal from neuronal and physiological processes as well as motion and other artifacts, which can be approximated by appropriate decompositions of the first-level residuals with principal component analysis (PCA), and removed. We show that application of this method yields GLM residuals with significantly reduced spatial correlation, nearly Gaussian sACF and uniform spatial smoothness across the brain, thereby allowing valid cluster-based FWE-corrected inferences based on assumption of Gaussian spatial noise. We further show that application of this method renders the voxel time-series of first-level GLM residuals independent, and identically distributed across time (which is a necessary condition for appropriate voxel-level GLM inference), without having to fit ad hoc stochastic colored noise models. Furthermore, the detection power of individual subject brain activation analysis is enhanced. This method will be especially useful for case studies, which rely on first-level GLM analysis inferences.
Stuckey, Bronwyn G A; Opie, Nicole; Cussons, Andrea J; Watts, Gerald F; Burke, Valerie
2014-08-01
Polycystic ovary syndrome (PCOS) is a prevalent condition with heterogeneity of clinical features and cardiovascular risk factors that implies multiple aetiological factors and possible outcomes. To reduce a set of correlated variables to a smaller number of uncorrelated and interpretable factors that may delineate subgroups within PCOS or suggest pathogenetic mechanisms. We used principal component analysis (PCA) to examine the endocrine and cardiometabolic variables associated with PCOS defined by the National Institutes of Health (NIH) criteria. Data were retrieved from the database of a single clinical endocrinologist. We included women with PCOS (N = 378) who were not taking the oral contraceptive pill or other sex hormones, lipid lowering medication, metformin or other medication that could influence the variables of interest. PCA was performed retaining those factors with eigenvalues of at least 1.0. Varimax rotation was used to produce interpretable factors. We identified three principal components. In component 1, the dominant variables were homeostatic model assessment (HOMA) index, body mass index (BMI), high density lipoprotein (HDL) cholesterol and sex hormone binding globulin (SHBG); in component 2, systolic blood pressure, low density lipoprotein (LDL) cholesterol and triglycerides; in component 3, total testosterone and LH/FSH ratio. These components explained 37%, 13% and 11% of the variance in the PCOS cohort respectively. Multiple correlated variables from patients with PCOS can be reduced to three uncorrelated components characterised by insulin resistance, dyslipidaemia/hypertension or hyperandrogenaemia. Clustering of risk factors is consistent with different pathogenetic pathways within PCOS and/or differing cardiometabolic outcomes. Copyright © 2014 Elsevier Inc. All rights reserved.
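The retention rule used in this abstract (keep components with eigenvalues of at least 1.0) can be sketched as follows. For a PCA on a correlation matrix the eigenvalues sum to the number of variables, so each eigenvalue divided by that sum is the share of variance explained. The ten eigenvalues below are hypothetical, chosen only so that the retained shares match the 37%, 13% and 11% reported; the study's actual eigenvalues and variable count are not given in the abstract.

```python
def kaiser_retain(eigenvalues, threshold=1.0):
    """Return (eigenvalue, explained-variance share) for components meeting the threshold."""
    total = sum(eigenvalues)  # equals the number of variables for a correlation-matrix PCA
    return [(lam, lam / total) for lam in eigenvalues if lam >= threshold]


# Hypothetical eigenvalues for ten standardized variables (they sum to 10).
eigenvalues = [3.7, 1.3, 1.1, 0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2]
retained = kaiser_retain(eigenvalues)
for i, (lam, share) in enumerate(retained, start=1):
    print(f"component {i}: eigenvalue {lam:.1f}, explains {share:.0%}")
# component 1: eigenvalue 3.7, explains 37%
# component 2: eigenvalue 1.3, explains 13%
# component 3: eigenvalue 1.1, explains 11%
```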
Crnovčić, Ivana; Rückert, Christian; Semsary, Siamak; Lang, Manuel; Kalinowski, Jörn; Keller, Ullrich
2017-01-01
Sequencing the actinomycin (acm) biosynthetic gene cluster of Streptomyces antibioticus IMRU 3720, which produces actinomycin X (Acm X), revealed 20 genes organized in a framework highly similar to that of the bi-armed acm C biosynthetic gene cluster of Streptomyces chrysomallus but without the additional attached arm of orthologues found in the latter. Curiously, the extra arm of the S. chrysomallus gene cluster turned out to perfectly match the single arm of the S. antibioticus gene cluster in the same order of orthologues, including the presence of two pseudogenes, scacmM and scacmN, encoding a cytochrome P450 and its ferredoxin, respectively. Orthologues of the latter genes were both missing in the principal arm of the S. chrysomallus acm C gene cluster. All orthologues of the extra arm showed a G+C content different from that of their counterparts in the principal arm. Moreover, the similarities of translation products from the extra arm were all higher to the corresponding translation products of orthologue genes from the S. antibioticus acm X gene cluster than to those encoded by the principal arm of their own gene cluster. This suggests that the duplicated structure of the S. chrysomallus acm C biosynthetic gene cluster evolved from previous fusion between two one-armed acm gene clusters each from a different genetic background. However, while scacmM and scacmN in the extra arm of the S. chrysomallus acm C gene cluster are mutated and therefore are non-functional, their orthologues saacmM and saacmN in the S. antibioticus acm C gene cluster show no defects, seemingly encoding active enzymes with functions specific for Acm X biosynthesis. Both acm biosynthetic gene clusters lack a kynurenine-3-monooxygenase gene necessary for biosynthesis of 3-hydroxy-4-methylanthranilic acid, the building block of the Acm chromophore, which suggests participation of a genome-encoded relevant monooxygenase during Acm biosynthesis in both S. chrysomallus and S. antibioticus. PMID:28435299
Zautner, Andreas Erich; Masanta, Wycliffe Omurwa; Tareen, Abdul Malik; Weig, Michael; Lugert, Raimond; Groß, Uwe; Bader, Oliver
2013-11-07
Campylobacter jejuni, the most common bacterial pathogen causing gastroenteritis, shows a wide genetic diversity. Previously, we demonstrated by the combination of multi locus sequence typing (MLST)-based UPGMA-clustering and analysis of 16 genetic markers that twelve different C. jejuni subgroups can be distinguished. Among these are two prominent subgroups. The first subgroup contains the majority of hyperinvasive strains and is characterized by a dimeric form of the chemotaxis-receptor Tlp7(m+c). The second has an extended amino acid metabolism and is characterized by the presence of a periplasmic asparaginase (ansB) and gamma-glutamyl-transpeptidase (ggt). Phyloproteomic principal component analysis (PCA) hierarchical clustering of MALDI-TOF based intact cell mass spectrometry (ICMS) spectra was able to group particular C. jejuni subgroups of phylogenetic related isolates in distinct clusters. Especially the aforementioned Tlp7(m+c)(+) and ansB+/ ggt+ subgroups could be discriminated by PCA. Overlay of ICMS spectra of all isolates led to the identification of characteristic biomarker ions for these specific C. jejuni subgroups. Thus, mass peak shifts can be used to identify the C. jejuni subgroup with an extended amino acid metabolism. Although the PCA hierarchical clustering of ICMS-spectra groups the tested isolates into a different order as compared to MLST-based UPGMA-clustering, the isolates of the indicator-groups form predominantly coherent clusters. These clusters reflect phenotypic aspects better than phylogenetic clustering, indicating that the genes corresponding to the biomarker ions are phylogenetically coupled to the tested marker genes. Thus, PCA clustering could be an additional tool for analyzing the relatedness of bacterial isolates.
Sequential analysis of hydrochemical data for watershed characterization.
Thyne, Geoffrey; Güler, Cüneyt; Poeter, Eileen
2004-01-01
A methodology for characterizing the hydrogeology of watersheds using hydrochemical data that combines statistical, geochemical, and spatial techniques is presented. Surface water and ground water base flow and spring runoff samples (180 total) from a single watershed are first classified using hierarchical cluster analysis. The statistical clusters are analyzed for spatial coherence, confirming that the clusters have a geological basis corresponding to topographic flowpaths and showing that the fractured rock aquifer behaves as an equivalent porous medium on the watershed scale. Then principal component analysis (PCA) is used to determine the sources of variation between parameters. PCA shows that the variations within the dataset are related to variations in calcium, magnesium, SO4, and HCO3, which are derived from natural weathering reactions, and pH, NO3, and chlorine, which indicate anthropogenic impact. PHREEQC modeling is used to quantitatively describe the natural hydrochemical evolution for the watershed and aid in discrimination of samples that have an anthropogenic component. Finally, the seasonal changes in the water chemistry of individual sites were analyzed to better characterize the spatial variability of vertical hydraulic conductivity. The integrated result provides a method to characterize the hydrogeology of the watershed that fully utilizes traditional data.
Wang, Lei; Csallany, A Saari; Kerr, Brian J; Shurson, Gerald C; Chen, Chi
2016-05-18
In this study, the kinetics of aldehyde formation in heated frying oils was characterized by 2-hydrazinoquinoline derivatization, liquid chromatography-mass spectrometry (LC-MS) analysis, principal component analysis (PCA), and hierarchical cluster analysis (HCA). The aldehydes contributing to time-dependent separation of heated soybean oil (HSO) in a PCA model were grouped by the HCA into three clusters (A1, A2, and B) on the basis of their kinetics and fatty acid precursors. The increases of 4-hydroxynonenal (4-HNE) and the A2-to-B ratio in HSO were well-correlated with the duration of thermal stress. Chemometric and quantitative analysis of three frying oils (soybean, corn, and canola oils) and French fry extracts further supported the associations between aldehyde profiles and fatty acid precursors and also revealed that the concentrations of pentanal, hexanal, acrolein, and the A2-to-B ratio in French fry extracts were more comparable to their values in the frying oils than other unsaturated aldehydes. All of these results suggest the roles of specific aldehydes or aldehyde clusters as novel markers of the lipid oxidation status for frying oils or fried foods.
Analysis of cytokine release assay data using machine learning approaches.
Xiong, Feiyu; Janko, Marco; Walker, Mindi; Makropoulos, Dorie; Weinstock, Daniel; Kam, Moshe; Hrebien, Leonid
2014-10-01
The possible onset of Cytokine Release Syndrome (CRS) is an important consideration in the development of monoclonal antibody (mAb) therapeutics. In this study, several machine learning approaches are used to analyze CRS data. The analyzed data come from a human blood in vitro assay which was used to assess the potential of mAb-based therapeutics to produce cytokine release similar to that induced by Anti-CD28 superagonistic (Anti-CD28 SA) mAbs. The data contain 7 mAbs and two negative controls, a total of 423 samples coming from 44 donors. Three (3) machine learning approaches were applied in combination to observations obtained from that assay, namely (i) Hierarchical Cluster Analysis (HCA); (ii) Principal Component Analysis (PCA) followed by K-means clustering; and (iii) Decision Tree Classification (DTC). All three approaches were able to identify the treatment that caused the most severe cytokine response. HCA was able to provide information about the expected number of clusters in the data. PCA coupled with K-means clustering allowed classification of treatments sample by sample, and visualizing clusters of treatments. DTC models showed the relative importance of various cytokines such as IFN-γ, TNF-α and IL-10 to CRS. The use of these approaches in tandem provides better selection of parameters for one method based on outcomes from another, and an overall improved analysis of the data through complementary approaches. Moreover, the DTC analysis showed in addition that IL-17 may be correlated with CRS reactions, although this correlation has not yet been corroborated in the literature. Copyright © 2014 Elsevier B.V. All rights reserved.
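The second approach in this abstract, PCA followed by K-means, assigns each sample to the nearest cluster centre in the reduced space. The sketch below covers only the K-means step and assumes the PCA scores have already been computed; the six two-component score pairs and the choice of starting centroids are invented for illustration.

```python
from math import dist  # Euclidean distance, Python 3.8+


def kmeans(points, centroids, n_iter=20):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid update."""
    centroids = list(centroids)
    for _ in range(n_iter):
        labels = [min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
                  for p in points]
        updated = []
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                updated.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                updated.append(centroids[k])  # keep an empty cluster's centroid in place
        if updated == centroids:
            break  # assignments can no longer change
        centroids = updated
    return labels, centroids


# Hypothetical (PC1, PC2) scores for six samples from two treatments.
scores = [(-2.1, 0.3), (-1.9, -0.2), (-2.3, 0.1),
          (2.0, 0.4), (2.2, -0.1), (1.8, 0.0)]
labels, centroids = kmeans(scores, centroids=[scores[0], scores[3]])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

The number of clusters has to be supplied up front, which is why the abstract notes that HCA was useful for suggesting it.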
Groundwater Quality: Analysis of Its Temporal and Spatial Variability in a Karst Aquifer.
Pacheco Castro, Roger; Pacheco Ávila, Julia; Ye, Ming; Cabrera Sansores, Armando
2018-01-01
This study develops an approach based on hierarchical cluster analysis for investigating the spatial and temporal variation of water quality governing processes. The water quality data used in this study were collected in the karst aquifer of Yucatan, Mexico, the only source of drinking water for a population of nearly two million people. Hierarchical cluster analysis was applied to the quality data of all the sampling periods lumped together. This was motivated by the observation that, if water quality does not vary significantly in time, two samples from the same sampling site will belong to the same cluster. The resulting distribution maps of clusters and box-plots of the major chemical components reveal the spatial and temporal variability of groundwater quality. Principal component analysis was used to verify the results of cluster analysis and to derive the variables that explained most of the variation of the groundwater quality data. Results of this work increase the knowledge about how precipitation and human contamination impact groundwater quality in Yucatan. Spatial variability of groundwater quality in the study area is caused by: a) seawater intrusion and groundwater rich in sulfates at the west and in the coast, b) water-rock interactions and the average annual precipitation at the middle and east zones respectively, and c) human contamination present in two localized zones. Changes in the amount and distribution of precipitation cause temporal variation by diluting groundwater in the aquifer. This approach allows the variation of the processes controlling groundwater quality to be analyzed efficiently and simultaneously. © 2017, National Ground Water Association.
Guo, Jin-Cheng; Wu, Yang; Chen, Yang; Pan, Feng; Wu, Zhi-Yong; Zhang, Jia-Sheng; Wu, Jian-Yi; Xu, Xiu-E; Zhao, Jian-Mei; Li, En-Min; Zhao, Yi; Xu, Li-Yan
2018-04-09
Esophageal squamous cell carcinoma (ESCC) is the predominant subtype of esophageal carcinoma in China. This study aimed to develop a staging model to predict outcomes of patients with ESCC. Using Cox regression analysis, principal component analysis (PCA), partitioning clustering, Kaplan-Meier analysis, receiver operating characteristic (ROC) curve analysis, and classification and regression tree (CART) analysis, we mined the Gene Expression Omnibus database to determine the expression profiles of genes in 179 patients with ESCC from the GSE63624 and GSE63622 datasets. Univariate Cox regression analysis of the GSE63624 dataset revealed that 2404 protein-coding genes (PCGs) and 635 long non-coding RNAs (lncRNAs) were associated with the survival of patients with ESCC. PCA categorized these PCGs and lncRNAs into three principal components (PCs), which were used to cluster the patients into three groups. ROC analysis demonstrated that the predictive ability of PCG-lncRNA PCs when applied to new patients was better than that of the tumor-node-metastasis staging (area under ROC curve [AUC]: 0.69 vs. 0.65, P < 0.05). Accordingly, we constructed a molecular disaggregated model comprising one lncRNA and two PCGs, which we designated as the LSB staging model, using CART analysis in the GSE63624 dataset. This LSB staging model classified the GSE63622 dataset of patients into three different groups, and its effectiveness was validated by analysis of another cohort of 105 patients. The LSB staging model has clinical significance for the prognosis prediction of patients with ESCC and may serve as a three-gene staging microarray.
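The AUC values compared in this abstract (0.69 vs. 0.65) have a direct interpretation: the probability that a randomly chosen patient with the outcome receives a higher score than a randomly chosen patient without it. The sketch below computes AUC from that definition. The risk scores and outcomes are hypothetical and unrelated to the study data.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical risk scores for six patients; 1 = died during follow-up, 0 = survived.
risk = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
died = [1, 1, 1, 0, 0, 0]
auc = roc_auc(risk, died)
print(round(auc, 3))  # 0.889
```

An AUC of 0.5 corresponds to random ranking, so values such as 0.65 and 0.69 indicate modest discrimination.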
Raman spectroscopy of normal oral buccal mucosa tissues: study on intact and incised biopsies
NASA Astrophysics Data System (ADS)
Deshmukh, Atul; Singh, S. P.; Chaturvedi, Pankaj; Krishna, C. Murali
2011-12-01
Oral squamous cell carcinoma is among the top 10 malignancies. Optical spectroscopy, including Raman, is being actively pursued as an alternative or adjunct for cancer diagnosis. Earlier studies have demonstrated the feasibility of classifying normal, premalignant, and malignant oral ex vivo tissues. Spectral features showed predominance of lipids and proteins in normal and cancer conditions, respectively, which were attributed to membrane lipids and surface proteins. In view of recent developments in deep tissue Raman spectroscopy, we have recorded Raman spectra from superior and inferior surfaces of 10 normal oral tissues on intact, as well as incised, biopsies after separation of epithelium from connective tissue. Spectral variations and similarities among different groups were explored by unsupervised (principal component analysis) and supervised (linear discriminant analysis, factorial discriminant analysis) methodologies. Clusters of spectra from superior and inferior surfaces of intact tissues show a high overlap, whereas spectra from separated epithelium and connective tissue sections yielded clear clusters, though they also overlap with clusters of intact tissues. Spectra of all four groups of normal tissues gave exclusive clusters when tested against malignant spectra. Thus, this study demonstrates that spectra recorded from the superior surface of an intact tissue may have contributions from deeper layers, but that this has no bearing on the classification of malignant tissues.
Sources of hydrocarbons in urban road dust: Identification, quantification and prediction.
Mummullage, Sandya; Egodawatta, Prasanna; Ayoko, Godwin A; Goonetilleke, Ashantha
2016-09-01
Among urban stormwater pollutants, hydrocarbons are a significant environmental concern due to their toxicity and relatively stable chemical structure. This study focused on the identification of hydrocarbon contributing sources to urban road dust and approaches for the quantification of pollutant loads to enhance the design of source control measures. The study confirmed the validity of the use of mathematical techniques of principal component analysis (PCA) and hierarchical cluster analysis (HCA) for source identification and principal component analysis/absolute principal component scores (PCA/APCS) receptor model for pollutant load quantification. Study outcomes identified non-combusted lubrication oils, non-combusted diesel fuels and tyre and asphalt wear as the three most critical urban hydrocarbon sources. The site specific variabilities of contributions from sources were replicated using three mathematical models. The models employed predictor variables of daily traffic volume (DTV), road surface texture depth (TD), slope of the road section (SLP), effective population (EPOP) and effective impervious fraction (EIF), which can be considered as the five governing parameters of pollutant generation, deposition and redistribution. Models were developed such that they can be applicable in determining hydrocarbon contributions from urban sites enabling effective design of source control measures. Copyright © 2016 Elsevier Ltd. All rights reserved.
Van Cann, Joannes; Virgilio, Massimiliano; Jordaens, Kurt; De Meyer, Marc
2015-01-01
Previous attempts to resolve the Ceratitis FAR complex (Ceratitis fasciventris, Ceratitis anonae, Ceratitis rosa, Diptera, Tephritidae) showed contrasting results and revealed the occurrence of five microsatellite genotypic clusters (A, F1, F2, R1, R2). In this paper we explore the potential of wing morphometrics for the diagnosis of FAR morphospecies and genotypic clusters. We considered a set of 227 specimens previously morphologically identified and genotyped at 16 microsatellite loci. Seventeen wing landmarks and 6 wing band areas were used for morphometric analyses. Permutational multivariate analysis of variance detected significant differences both across morphospecies and genotypic clusters (for both males and females). Unconstrained and constrained ordinations did not properly resolve groups corresponding to morphospecies or genotypic clusters. However, posterior group membership probabilities (PGMPs) of the Discriminant Analysis of Principal Components (DAPC) allowed the consistent identification of a relevant proportion of specimens (but with performances differing across morphospecies and genotypic clusters). This study suggests that wing morphometrics and PGMPs might represent a possible tool for the diagnosis of species within the FAR complex. Here, we propose a tentative diagnostic method and provide a first reference library of morphometric measures that might be used for the identification of additional and unidentified FAR specimens.
ERIC Educational Resources Information Center
Gobena, Gemechu Abera
2017-01-01
This study aimed to investigate the attitudes of principals, supervisors and mentees towards action research as reflective practice in the Postgraduate Diploma in Secondary School Teaching (PGDT). The sample consisted of 82 mentees, 38 principals and 26 supervisors taken from three clustered centres by using stratified random…
Vargas-Bello-Pérez, Einar; Toro-Mujica, Paula; Enriquez-Hidalgo, Daniel; Fellenberg, María Angélica; Gómez-Cortés, Pilar
2017-06-01
We used a multivariate chemometric approach to differentiate or associate retail bovine milks with different fat contents and non-dairy beverages, using fatty acid profiles and statistical analysis. We collected samples of bovine milk (whole, semi-skim, and skim; n = 62) and non-dairy beverages (n = 27), and we analyzed them using gas-liquid chromatography. Principal component analysis of the fatty acid data yielded 3 significant principal components, which accounted for 72% of the total variance in the data set. Principal component 1 was related to saturated fatty acids (C4:0, C6:0, C8:0, C12:0, C14:0, C17:0, and C18:0) and monounsaturated fatty acids (C14:1 cis-9, C16:1 cis-9, C17:1 cis-9, and C18:1 trans-11); whole milk samples were clearly differentiated from the rest using this principal component. Principal component 2 differentiated semi-skim milk samples by n-3 fatty acid content (C20:3n-3, C20:5n-3, and C22:6n-3). Principal component 3 was related to C18:2 trans-9,trans-12 and C20:4n-6, and its lower scores were observed in skim milk and non-dairy beverages. A cluster analysis yielded 3 groups: group 1 consisted of only whole milk samples, group 2 was represented mainly by semi-skim milks, and group 3 included skim milk and non-dairy beverages. Overall, the present study showed that a multivariate chemometric approach is a useful tool for differentiating or associating retail bovine milks and non-dairy beverages using their fatty acid profile. Copyright © 2017 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
Hajdari, Avni; Mustafa, Behxhet; Nebija, Dashnor; Selimi, Hyrmete; Veselaj, Zeqir; Breznica, Pranvera; Quave, Cassandra Leah; Novak, Johannes
The principal aim of this study was to analyze the chemical composition and qualitative and quantitative variability of essential oils obtained from seven naturally grown populations of the Pinus peuce Grisebach, Pinaceae in Kosovo. Plant materials were collected from three populations in the Sharri National Park and from four other populations in the Bjeshkët e Nemuna National Park, in Kosovo. Essential oils were obtained by steam distillation and analyzed by GC-FID (Gas Chromatography-Flame Ionization Detection) and GC-MS (Gas Chromatography-Mass Spectrometry). The results showed that the yield of essential oils (v/w dry weight) varied depending on the origin of population and the plant organs and ranged from 0.7 to 3.3%. In total, 51 compounds were identified. The main compounds were α-pinene (needles: 21.6-34.9%; twigs: 11.0-24%), β-phellandrene (needles: 4.1-27.7; twigs: 29.0-49.8%), and β-pinene (needles: 10.0-16.1; twigs: 6.9-20.7%). HCA (Hierarchical Cluster Analysis) and PCA (Principal Component Analyses) were used to assess geographical variations in essential oil composition. Statistical analysis showed that the analyzed populations are grouped in three main clusters which seem to reflect microclimatic conditions on the chemical composition of the essential oils.
NASA Astrophysics Data System (ADS)
Sehgal, V.; Lakhanpal, A.; Maheswaran, R.; Khosa, R.; Sridhar, Venkataramana
2018-01-01
This study proposes a wavelet-based multi-resolution modeling approach for statistical downscaling of GCM variables to mean monthly precipitation for five locations in the Krishna Basin, India. A climatic dataset from NCEP is used for training the proposed models (Jan.'69 to Dec.'94), which are then applied to the corresponding CanCM4 GCM variables to simulate precipitation for the validation (Jan.'95-Dec.'05) and forecast (Jan.'06-Dec.'35) periods. The observed precipitation data is obtained from the India Meteorological Department (IMD) gridded precipitation product at 0.25 degree spatial resolution. This paper proposes a novel Multi-Scale Wavelet Entropy (MWE) based approach for clustering climatic variables into suitable clusters using k-means methodology. Principal Component Analysis (PCA) is used to obtain the representative Principal Components (PC) explaining 90-95% variance for each cluster. A multi-resolution non-linear approach combining Discrete Wavelet Transform (DWT) and Second Order Volterra (SoV) is used to model the representative PCs to obtain the downscaled precipitation for each downscaling location (W-P-SoV model). The results establish that wavelet-based multi-resolution SoV models perform significantly better compared to the traditional Multiple Linear Regression (MLR) and Artificial Neural Networks (ANN) based frameworks. It is observed that the proposed MWE-based clustering and subsequent PCA help reduce the dimensionality of the input climatic variables, while capturing more variability compared to stand-alone k-means (no MWE). The proposed models perform better in estimating the number of precipitation events during the non-monsoon periods, whereas the models with clustering without MWE over-estimate the rainfall during the dry season.
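Retaining the principal components that explain 90-95% of the variance amounts to a cumulative-variance cutoff. The sketch below shows one way to pick that number of components; the function name and the toy matrix are illustrative assumptions, not taken from the study.

```python
import numpy as np

def components_for_variance(data, target):
    """Smallest number of PCs whose cumulative explained variance reaches target."""
    centered = data - data.mean(axis=0)
    singular = np.linalg.svd(centered, compute_uv=False)
    explained = singular**2 / np.sum(singular**2)   # variance share per component
    cumulative = np.cumsum(explained)
    # First index whose cumulative share is >= target, converted to a count.
    return int(np.searchsorted(cumulative, target) + 1), cumulative

# Toy data with orthogonal, zero-mean columns so the variance shares are
# exactly 18/28, 8/28 and 2/28.
data = np.array([[3, 0, 0], [-3, 0, 0], [0, 2, 0],
                 [0, -2, 0], [0, 0, 1], [0, 0, -1]], dtype=float)
n90, cumulative = components_for_variance(data, 0.90)
n95, _ = components_for_variance(data, 0.95)
```

For this toy matrix the first two components carry about 92.9% of the variance, so a 90% target keeps two components and a 95% target keeps all three.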
Białek, A; Białek, M; Lepionka, T; Kaszperuk, K; Banaszkiewicz, T; Tokarz, A
2018-04-23
The aim of this study was to determine whether diet modification with different doses of grapeseed oil or pomegranate seed oil will improve the nutritive value of poultry meat in terms of n-3 and n-6 fatty acids, as well as rumenic acid (cis-9, trans-11 conjugated linoleic acid) content, in tissues differing in lipid composition and roles in lipid metabolism. To evaluate the influence of the applied diet modification comprehensively, two chemometric methods were used. Results of cluster analysis demonstrated that pomegranate seed oil modifies the fatty acid profile most potently, mainly by an increase in rumenic acid content. Principal component analysis showed that, regardless of the type of tissue, the first principal component is strongly associated with the type of deposited fatty acid, while the second principal component enables identification of the place of deposition, i.e. the type of tissue. Pomegranate seed oil seems to be a valuable feed additive in chickens' feeding. © 2018 Blackwell Verlag GmbH.
[Discrimination of varieties of brake fluid using visible-near infrared spectra].
Jiang, Lu-lu; Tan, Li-hong; Qiu, Zheng-jun; Lu, Jiang-feng; He, Yong
2008-06-01
A new method was developed to rapidly discriminate between brands of brake fluid by means of visible-near infrared spectroscopy. Five different brands of brake fluid were analyzed using a handheld near infrared spectrograph manufactured by ASD Company, and 60 samples were obtained from each brand. The sample data were pretreated using average smoothing and the standard normal variate method, and then analyzed using principal component analysis (PCA). A 2-dimensional plot was drawn based on the first and second principal components, and the plot indicated that the different brake fluids cluster distinctly. The first 6 principal components were taken as input variables, and the brand of brake fluid as the output variable, to build the discriminant model by stepwise discriminant analysis. Two hundred twenty-five randomly selected samples were used to create the model, and the remaining 75 samples to verify it. The distinguishing rate was 94.67%, indicating that the proposed method performs well in classification and discrimination. It provides a new way to rapidly discriminate between different brands of brake fluid.
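The sample counts in this abstract can be checked with a few lines of arithmetic. The sketch assumes, as is usual but not stated in the abstract, that the distinguishing rate is the share of the 75 verification samples classified correctly.

```python
brands, per_brand = 5, 60
total = brands * per_brand              # 300 spectra in all
train, validate = 225, 75
assert train + validate == total        # the split uses every sample

# Assumption: rate = correctly classified verification samples / 75.
candidates = [k for k in range(validate + 1)
              if round(100 * k / validate, 2) == 94.67]
print(candidates)  # [71]
```

Under that assumption the reported 94.67% corresponds to 71 of the 75 verification samples being assigned to the right brand.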
Isomap transform for segmenting human body shapes.
Cerveri, P; Sarro, K J; Marchente, M; Barros, R M L
2011-09-01
Segmentation of the 3D human body is a very challenging problem in applications exploiting volume capture data. Direct clustering in the Euclidean space is usually complex or even unsolvable. This paper presents an original method based on the Isomap (isometric feature mapping) transform of the volume data-set. The 3D articulated posture is mapped by Isomap in the pose of Da Vinci's Vitruvian man. The limbs are unrolled from each other and separated from the trunk and pelvis, and the topology of the human body shape is recovered. In such a configuration, Hoshen-Kopelman clustering applied to concentric spherical shells is used to automatically group points into the labelled principal curves. Shepard interpolation is utilised to back-map points of the principal curves into the original volume space. The experimental results performed on many different postures have proved the validity of the proposed method. Reliability of less than 2 cm and 3° in the location of the joint centres and direction axes of rotations has been obtained, respectively, which qualifies this procedure as a potential tool for markerless motion analysis.
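Shepard interpolation, which the abstract uses for back-mapping, is classically an inverse-distance-weighted average. The sketch below shows that basic form only; how the authors configure it (power, neighbourhood, what values are carried back) is not given in the abstract, so the function name and parameters here are assumptions.

```python
import numpy as np

def shepard_interpolate(query, points, values, power=2.0):
    """Inverse-distance-weighted estimate at `query` from known points and values."""
    dist = np.linalg.norm(points - query, axis=1)
    exact = dist == 0.0
    if exact.any():                      # query coincides with a known point
        return values[exact][0]
    weights = 1.0 / dist**power          # closer points weigh more
    return weights @ values / weights.sum()

points = np.array([[0.0, 0.0], [2.0, 0.0]])
values = np.array([0.0, 10.0])
midpoint = shepard_interpolate(np.array([1.0, 0.0]), points, values)
at_node = shepard_interpolate(np.array([0.0, 0.0]), points, values)
```

At the midpoint both known points get equal weight, so the estimate is the plain average of their values, and at a known point the stored value is returned unchanged.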
Kandadai, Venk; Yang, Haodong; Jiang, Ling; Yang, Christopher C; Fleisher, Linda; Winston, Flaura Koplin
2016-05-05
Little is known about the ability of individual stakeholder groups to achieve health information dissemination goals through Twitter. This study aimed to develop and apply methods for the systematic evaluation and optimization of health information dissemination by stakeholders through Twitter. Tweet content from 1790 followers of @SafetyMD (July-November 2012) was examined. User emphasis, a new indicator of Twitter information dissemination, was defined and applied to retweets across two levels of retweeters originating from @SafetyMD. User interest clusters were identified based on principal component analysis (PCA) and hierarchical cluster analysis (HCA) of a random sample of 170 followers. User emphasis of keywords remained across levels but decreased by 9.5 percentage points. PCA and HCA identified 12 statistically unique clusters of followers within the @SafetyMD Twitter network. This study is one of the first to develop methods for use by stakeholders to evaluate and optimize their use of Twitter to disseminate health information. Our new methods provide preliminary evidence that individual stakeholders can evaluate the effectiveness of health information dissemination and create content-specific clusters for more specific targeted messaging.
Busch, Vincent; Van Stel, Henk F; Schrijvers, Augustinus J P; de Leeuw, Johannes R J
2013-12-04
Recent studies show several health-related behaviors to cluster in adolescents. This has important implications for public health. Interrelated behaviors have been shown to be most effectively targeted by multimodal interventions addressing wider-ranging improvements in lifestyle instead of via separate interventions targeting individual behaviors. However, few previous studies have taken into account a broad, multi-disciplinary range of health-related behaviors and connected these behavioral patterns to health-related outcomes. This paper presents an analysis of the clustering of a broad range of health-related behaviors with relevant demographic factors and several health-related outcomes in adolescents. Self-report questionnaire data were collected from a sample of 2,690 Dutch high school adolescents. Behavioral patterns were deduced via Principal Components Analysis. Subsequently a Two-Step Cluster Analysis was used to identify groups of adolescents with similar behavioral patterns and health-related outcomes. Four distinct behavioral patterns describe the analyzed individual behaviors: 1- risk-prone behavior, 2- bully behavior, 3- problematic screen time use, and 4- sedentary behavior. Subsequent cluster analysis identified four clusters of adolescents. Multi-problem behavior was associated with problematic physical and psychosocial health outcomes, as opposed to those exhibiting relatively few unhealthy behaviors. These associations were relatively independent of demographics such as ethnicity, gender and socio-economic status. The results show that health-related behaviors tend to cluster, indicating that specific behavioral patterns underlie individual health behaviors. In addition, specific patterns of health-related behaviors were associated with specific health outcomes and demographic factors. In general, unhealthy behavior on account of multiple health-related behaviors was associated with both poor psychosocial and physical health.
These findings have significant meaning for future public health programs, which should be more tailored with use of such knowledge on behavioral clustering via e.g. Transfer Learning.
PMID:24305509
Unsupervised pattern recognition methods in ciders profiling based on GCE voltammetric signals.
Jakubowska, Małgorzata; Sordoń, Wanda; Ciepiela, Filip
2016-07-15
This work presents a complete methodology of distinguishing between different brands of cider and ageing degrees, based on voltammetric signals, utilizing dedicated data preprocessing procedures and unsupervised multivariate analysis. It was demonstrated that voltammograms recorded on glassy carbon electrode in Britton-Robinson buffer at pH 2 are reproducible for each brand. By application of clustering algorithms and principal component analysis visible homogenous clusters were obtained. Advanced signal processing strategy which included automatic baseline correction, interval scaling and continuous wavelet transform with dedicated mother wavelet, was a key step in the correct recognition of the objects. The results show that voltammetry combined with optimized univariate and multivariate data processing is a sufficient tool to distinguish between ciders from various brands and to evaluate their freshness. Copyright © 2016 Elsevier Ltd. All rights reserved.
Kenzaka, Tsuneaki; Kumabe, Ayako; Kosami, Koki; Matsuoka, Yasufumi; Minami, Kensuke; Ninomiya, Daisuke; Noda, Ayako; Okayama, Masanobu
2017-05-01
To investigate the items that are considered by physicians when making decisions regarding the resumption of oral intake among patients with aspiration pneumonia who have undergone short-term fasting. We surveyed 2490 Japanese hospitals that had internal medicine and respiratory medicine departments. We mailed questionnaires that contained 24 items related to oral intake resumption after aspiration pneumonia to the head of the department at each hospital. Cronbach statistics, principal component analysis and cluster analysis were used to analyze the results. We received responses from 350 hospitals; 89.7% of the respondents answered that they "Strongly agree" that "level of consciousness" is a useful criterion for resuming oral intake. Furthermore, 66%, 66%, 63.4%, 58.5% and 51% of the respondents answered that they "strongly agree" regarding the use of SpO2, the discretion of the attending physician, body temperature, swallowing function test results, mental state and respiratory rate, respectively. In the cluster analysis, level of consciousness, body temperature, SpO2, respiratory rate, mental state and the discretion of the attending physician belonged to the first cluster. The second cluster consisted of the patient's request, the family's request, the opinions of the medical staff and non-physician healthcare providers, and performance status. Physicians consider several criteria during decision-making regarding oral intake resumption, which can be assigned to two clusters. Future studies are required to develop generalizable and objective criteria. Geriatr Gerontol Int 2017; 17: 810-818. © 2016 The Authors. Geriatrics & Gerontology International published by John Wiley & Sons Australia, Ltd on behalf of Japan Geriatrics Society.
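The "Cronbach statistics" mentioned here presumably refer to Cronbach's alpha, the standard internal-consistency coefficient for multi-item questionnaires; that reading is an assumption. A minimal sketch of the coefficient, with made-up responses rather than the survey data:

```python
import numpy as np

def cronbach_alpha(responses):
    """Cronbach's alpha for a (respondents x items) score matrix."""
    responses = np.asarray(responses, dtype=float)
    k = responses.shape[1]                                 # number of items
    item_var = responses.var(axis=0, ddof=1).sum()         # sum of item variances
    total_var = responses.sum(axis=1).var(ddof=1)          # variance of total scores
    return k / (k - 1) * (1.0 - item_var / total_var)

# Hypothetical answers: every item scored identically, so consistency is perfect.
identical_items = [[1, 1, 1], [2, 2, 2], [3, 3, 3], [5, 5, 5]]
alpha_perfect = cronbach_alpha(identical_items)

# Hypothetical 5-point Likert answers from six respondents on four items.
likert = [[4, 5, 4, 3], [2, 2, 3, 2], [5, 5, 5, 4],
          [3, 3, 2, 3], [1, 2, 1, 2], [4, 4, 5, 5]]
alpha_likert = cronbach_alpha(likert)
```

Values near 1 indicate that the items move together, while values near 0 indicate that they share little common variance.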
Arciniega, Marcelino; Beck, Philipp; Lange, Oliver F.; Groll, Michael; Huber, Robert
2014-01-01
Two clusters of configurations of the main proteolytic subunit β5 were identified by principal component analysis of crystal structures of the yeast proteasome core particle (yCP). The apo-cluster encompasses unliganded species and complexes with nonpeptidic ligands, and the pep-cluster comprises complexes with peptidic ligands. The murine constitutive CP structures conform to the yeast system, with the apo-form settled in the apo-cluster and the PR-957 (a peptidic ligand) complex in the pep-cluster. In striking contrast, the murine immune CP classifies into the pep-cluster in both the apo and the PR-957–liganded species. The two clusters differ essentially by multiple small structural changes and a domain motion enabling enclosure of the peptidic ligand and formation of specific hydrogen bonds in the pep-cluster. The immune CP species is in optimal peptide binding configuration also in its apo form. This favors productive ligand binding and may help to explain the generally increased functional activity of the immunoproteasome. Molecular dynamics simulations of the representative murine species are consistent with the experimentally observed configurations. A comparison of all 28 subunits of the unliganded species with the peptidic liganded forms demonstrates a greatly enhanced plasticity of β5 and suggests specific signaling pathways to other subunits. PMID:24979800
Fraga, Angelina Bossi; de Lima Silva, Fabiane; Hongyu, Kuang; Da Silva Santos, Darlim; Murphy, Thomas Wayne; Lopes, Fernando Brito
2016-03-01
The objective of this research was to try to unveil the relationship between production traits and genotypic proportions of crossbred dairy cattle using principal component analysis (PCA) and cluster analysis. The herd consists of crossbred animals of Holstein (H) and Zebu (Z) (Gir and Guzerat) in different genotypic proportions; the composition of which varies from 12.5 to 100.0 % of the genetic group H. For this study, 834 milk production records from 257 cows from the years 1997 to 2014 were analyzed. The animals were all managed at a farm located in northeastern Brazil. The variables in the PCA were total milk yield per lactation (MY), milk yield adjusted to 305 days (MY305), lactation length (LL), and proportion of H and Z breeding. This analysis reduced the size of the sample space from the original five variables to two principal components (PCs) that together explained 89.4 % of the total variation. MY, MY305, LL, and genotypic proportion of H all contributed positively to PC1. The genotypic proportion of Z contributed negatively, which established a contrast between H and Z. Further cluster analysis identified two distinct groups when considering production performance and genotype of the animals. The high-performance group was predominantly Holstein breeding, while the lower performing group consisted mostly of Zebu. Under the environmental and management conditions in which this research was conducted, the best performances for the traits considered were achieved from cows whose genotypic proportion was between 38.0 and 94.0 % Holstein breeding.
Diabetes Changes Symptoms Cluster Patterns in Persons Living With HIV.
Zuniga, Julie Ann; Bose, Eliezer; Park, Jungmin; Lapiz-Bluhm, M Danet; García, Alexandra A
Approximately 10-15% of persons living with HIV (PLWH) have a comorbid diagnosis of diabetes mellitus (DM). Both of these long-term chronic conditions are associated with high rates of symptom burden. The purpose of our study was to describe symptom patterns for PLWH with DM (PLWH+DM) using a large secondary dataset. The prevalence, burden, and bothersomeness of symptoms reported by patients in routine clinic visits during 2015 were assessed using the 20-item HIV Symptom Index. Principal component analysis was used to identify symptom clusters. Three main clusters were identified: (a) neurological/psychological, (b) gastrointestinal/flu-like, and (c) physical changes. The most prevalent symptoms were fatigue, poor sleep, aches, neuropathy, and sadness. When compared to a previous symptom study with PLWH, symptoms clustered differently in our sample of patients with dual diagnoses of HIV and diabetes. Clinicians should appropriately assess symptoms for their patients' comorbid conditions. Copyright © 2017 Association of Nurses in AIDS Care. Published by Elsevier Inc. All rights reserved.
Liévanos, Raoul S
2015-11-01
This article contributes to environmental inequality outcomes research on the spatial and demographic factors associated with cumulative air-toxic health risks at multiple geographic scales across the United States. It employs a rigorous spatial cluster analysis of census tract-level 2005 estimated lifetime cancer risk (LCR) of ambient air-toxic emissions from stationary (e.g., facility) and mobile (e.g., vehicular) sources to locate spatial clusters of air-toxic LCR risk in the continental United States. It then tests intersectional environmental inequality hypotheses on the predictors of tract presence in air-toxic LCR clusters with tract-level principal component factor measures of economic deprivation by race and immigrant status. Logistic regression analyses show that net of controls, isolated Latino immigrant-economic deprivation is the strongest positive demographic predictor of tract presence in air-toxic LCR clusters, followed by black-economic deprivation and isolated Asian/Pacific Islander immigrant-economic deprivation. Findings suggest scholarly and practical implications for future research, advocacy, and policy. Copyright © 2015 Elsevier Inc. All rights reserved.
Chastagner, Amélie; Dugat, Thibaud; Vourc'h, Gwenaël; Verheyden, Hélène; Legrand, Loïc; Bachy, Véronique; Chabanne, Luc; Joncour, Guy; Maillard, Renaud; Boulouis, Henri-Jean; Haddad, Nadia; Bailly, Xavier; Leblond, Agnès
2014-12-09
Molecular epidemiology represents a powerful approach to elucidate the complex epidemiological cycles of multi-host pathogens, such as Anaplasma phagocytophilum. A. phagocytophilum is a tick-borne bacterium that affects a wide range of wild and domesticated animals. Here, we characterized its genetic diversity in populations of French cattle; we then compared the observed genotypes with those found in horses, dogs, and roe deer to determine whether genotypes of A. phagocytophilum are shared among different hosts. We sampled 120 domesticated animals (104 cattle, 13 horses, and 3 dogs) and 40 wild animals (roe deer) and used multilocus sequence analysis on nine loci (ankA, msp4, groESL, typA, pled, gyrA, recG, polA, and an intergenic region) to characterize the genotypes of A. phagocytophilum present. Phylogenetic analysis revealed three genetic clusters of bacterial variants in domesticated animals. The two principal clusters included 98% of the bacterial genotypes found in cattle, which were only distantly related to those in roe deer. One cluster comprised only cattle genotypes, while the second contained genotypes from cattle, horses, and dogs. The third contained all roe deer genotypes and three cattle genotypes. Geographical factors could not explain this clustering pattern. These results suggest that roe deer do not contribute to the spread of A. phagocytophilum in cattle in France. Further studies should explore if these different clusters are associated with differing disease severity in domesticated hosts. Additionally, it remains to be seen if the three clusters of A. phagocytophilum genotypes in cattle correspond to distinct epidemiological cycles, potentially involving different reservoir hosts.
Kinematic gait patterns in healthy runners: A hierarchical cluster analysis.
Phinyomark, Angkoon; Osis, Sean; Hettinga, Blayne A; Ferber, Reed
2015-11-05
Previous studies have demonstrated distinct clusters of gait patterns in both healthy and pathological groups, suggesting that different movement strategies may be represented. However, these studies have used discrete time point variables and usually focused on only one specific joint and plane of motion. Therefore, the first purpose of this study was to determine if running gait patterns for healthy subjects could be classified into homogeneous subgroups using three-dimensional kinematic data from the ankle, knee, and hip joints. The second purpose was to identify differences in joint kinematics between these groups. The third purpose was to investigate the practical implications of clustering healthy subjects by comparing these kinematics with runners experiencing patellofemoral pain (PFP). A principal component analysis (PCA) was used to reduce the dimensionality of the entire gait waveform data and then a hierarchical cluster analysis (HCA) determined group sets of similar gait patterns and homogeneous clusters. The results show two distinct running gait patterns were found with the main between-group differences occurring in frontal and sagittal plane knee angles (P<0.001), independent of age, height, weight, and running speed. When these two groups were compared to PFP runners, one cluster exhibited greater while the other exhibited reduced peak knee abduction angles (P<0.05). The variability observed in running patterns across this sample could be the result of different gait strategies. These results suggest care must be taken when selecting samples of subjects in order to investigate the pathomechanics of injured runners. Copyright © 2015 Elsevier Ltd. All rights reserved.
Quality Evaluation of Agricultural Distillates Using an Electronic Nose
Dymerski, Tomasz; Gębicki, Jacek; Wardencki, Waldemar; Namieśnik, Jacek
2013-01-01
The paper presents the application of an electronic nose instrument to fast evaluation of agricultural distillates differing in quality. The investigations were carried out using a prototype of electronic nose equipped with a set of six semiconductor sensors by FIGARO Co., an electronic circuit converting signal into digital form and a set of thermostats able to provide gradient temperature characteristics to a gas mixture. A volatile fraction of the agricultural distillate samples differing in quality was obtained by barbotage. Interpretation of the results involved three data analysis techniques: principal component analysis, single-linkage cluster analysis and cluster analysis with spheres method. The investigations prove the usefulness of the presented technique in the quality control of agricultural distillates. Optimum measurements conditions were also defined, including volumetric flow rate of carrier gas (15 L/h), thermostat temperature during the barbotage process (15 °C) and time of sensor signal acquisition from the onset of the barbotage process (60 s). PMID:24287525
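Single-linkage cluster analysis, one of the three techniques listed, repeatedly merges the two clusters whose closest members are nearest to each other. The sketch below implements that rule with a union-find structure; the six-value readings stand in for the six sensor signals and are invented for illustration.

```python
from itertools import combinations
from math import dist

def single_linkage(points, n_clusters):
    """Merge nearest clusters (closest-pair rule) until n_clusters remain."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the cluster representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Visiting pairs in order of increasing distance reproduces single linkage.
    pairs = sorted(combinations(range(len(points)), 2),
                   key=lambda p: dist(points[p[0]], points[p[1]]))
    clusters = len(points)
    for a, b in pairs:
        if clusters == n_clusters:
            break
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
            clusters -= 1
    relabel = {}
    return [relabel.setdefault(find(i), len(relabel)) for i in range(len(points))]

# Hypothetical six-sensor readings for two quality grades of distillate.
readings = [
    (1.0, 1.1, 0.9, 1.0, 1.2, 1.0), (1.1, 1.0, 1.0, 0.9, 1.1, 1.0),
    (0.9, 1.2, 1.0, 1.1, 1.0, 0.9), (3.0, 3.1, 2.9, 3.0, 3.2, 3.1),
    (3.1, 3.0, 3.0, 2.9, 3.1, 3.0), (2.9, 3.2, 3.1, 3.0, 3.0, 2.9),
]
labels = single_linkage(readings, n_clusters=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Because single linkage looks only at the closest pair between clusters, it can chain elongated groups together, which is one reason to compare it with other methods as the paper does.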
NASA Astrophysics Data System (ADS)
Benninghoff, L.; von Czarnowski, D.; Denkhaus, E.; Lemke, K.
1997-07-01
For the determination of trace element distributions of more than 20 elements in malignant and normal tissues of the human colon, tissue samples (approx. 400 mg wet weight) were digested with 3 ml of nitric acid (sub-boiled quality) by use of an autoclave system. The accuracy of measurements has been investigated by using certified materials. The analytical results were evaluated by using a spreadsheet program to give an overview of the element distribution in cancerous samples and in normal colon tissues. A further application, cluster analysis of the analytical results, was introduced to demonstrate the possibility of classification for cancer diagnosis. To confirm the results of cluster analysis, multivariate three-way principal component analysis was performed. Additionally, microtome frozen sections (10 μm) were prepared from the same tissue samples to compare the analytical results, i.e. the mass fractions of elements, according to the preparation method and to exclude systematic errors depending on the inhomogeneity of the tissues.
Nakamura, Kengo; Kuwatani, Tatsu; Kawabe, Yoshishige; Komai, Takeshi
2016-02-01
Tsunami deposits accumulated on the Tohoku coastal area in Japan due to the impact of the Tohoku-oki earthquake. In the study reported in this paper, we applied principal component analysis (PCA) and cluster analysis (CA) to determine the concentrations of heavy metals in tsunami deposits that had been diluted with water or digested using 1 M HCl. The results suggest that the environmental risk is relatively low, evidenced by the following geometric mean concentrations: Pb, 16 mg kg⁻¹ and 0.003 ml L⁻¹; As, 1.8 mg kg⁻¹ and 0.004 ml L⁻¹; and Cd, 0.17 mg kg⁻¹ and 0.0001 ml L⁻¹. CA was performed after outliers were excluded using PCA. The analysis grouped the concentrations of heavy metals for leaching in water and acid. For the acid case, the first cluster contained Ni, Fe, Cd, Cu, Al, Cr, Zn, and Mn; while the second contained Pb, Sb, As, and Mo. For water, the first cluster contained Ni, Fe, Al, and Cr; and the second cluster contained Mo, Sb, As, Cu, Zn, Pb, and Mn. Statistical analysis revealed that the typical toxic elements, As, Pb, and Cd have steady correlations for acid leaching but are relatively sparse for water leaching. Pb and As from the tsunami deposits seemed to reveal a kind of redox elution mechanism using 1 M HCl. Copyright © 2015 The Authors. Published by Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Kumar, Raj; Sharma, Vishal
2017-03-01
The present research is focused on the analysis of writing inks using destructive UV-Vis spectroscopy (dissolution of ink by the solvent) and non-destructive diffuse reflectance UV-Vis-NIR spectroscopy along with chemometrics. Fifty-seven samples of blue ballpoint pen inks were analyzed under optimum conditions to determine the differences in spectral features of inks among same and different manufacturers. Normalization was performed on the spectroscopic data before chemometric analysis. Principal Component Analysis (PCA) and k-means cluster analysis were used on the data to ascertain whether the blue ballpoint pen inks could be differentiated by their UV-Vis/UV-Vis NIR spectra. The discriminating power is calculated by qualitative analysis by the visual comparison of the spectra (absorbance peaks), produced by the destructive and non-destructive methods. In the latter two methods, the pairwise comparison is made by incorporating the clustering method. It is found that the chemometric method provides better discriminating power (98.72% and 99.46%, in destructive and non-destructive, respectively) in comparison to the qualitative analysis (69.67%).
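The clustering-based pairwise comparison can be sketched with a small k-means routine followed by a discriminating-power calculation. Two assumptions are made that the abstract does not confirm: discriminating power is taken as the share of sample pairs assigned to different clusters, and the two-feature "spectra" below are invented toy values.

```python
from itertools import combinations
import numpy as np

def kmeans(data, init_centers, n_iter=100):
    """Plain Lloyd's algorithm with caller-supplied initial centers."""
    centers = np.array(init_centers, dtype=float)
    for _ in range(n_iter):
        # Distance from every sample to every center, then nearest-center labels.
        d = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        new_centers = np.array([
            data[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(len(centers))
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

def discriminating_power(labels):
    """Assumed definition: discriminated pairs / all possible pairs."""
    pairs = list(combinations(range(len(labels)), 2))
    return sum(labels[i] != labels[j] for i, j in pairs) / len(pairs)

# Toy normalized "spectra" (two features each) for six hypothetical inks.
spectra = np.array([[0.10, 0.90], [0.20, 1.00], [0.15, 0.95],
                    [0.90, 0.10], [1.00, 0.20], [0.95, 0.15]])
labels, centers = kmeans(spectra, init_centers=spectra[[0, 3]])
dp = discriminating_power(labels)
```

With two clusters of three inks each, 9 of the 15 possible pairs are told apart, so the discriminating power of this toy example is 0.6.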
Vasilaki, V; Volcke, E I P; Nandi, A K; van Loosdrecht, M C M; Katsou, E
2018-04-26
Multivariate statistical analysis was applied to investigate the dependencies and underlying patterns between N2O emissions and online operational variables (dissolved oxygen and nitrogen component concentrations, temperature and influent flow-rate) during biological nitrogen removal from wastewater. The system under study was a full-scale reactor, for which hourly sensor data were available. The 15-month long monitoring campaign was divided into 10 sub-periods based on the profile of N2O emissions, using Binary Segmentation. The dependencies between operating variables and N2O emissions fluctuated according to Spearman's rank correlation. The correlation between N2O emissions and nitrite concentrations ranged between 0.51 and 0.78. Correlation >0.7 between N2O emissions and nitrate concentrations was observed at sub-periods with average temperature lower than 12 °C. Hierarchical k-means clustering and principal component analysis linked N2O emission peaks with precipitation events and ammonium concentrations higher than 2 mg/L, especially in sub-periods characterized by low N2O fluxes. Additionally, the highest ranges of measured N2O fluxes belonged to clusters corresponding with NO3-N concentration less than 1 mg/L in the upstream plug-flow reactor (middle of oxic zone), indicating slow nitrification rates. The results showed that the range of N2O emissions partially depends on the prior behavior of the system. The principal component analysis validated the findings from the clustering analysis and showed that ammonium, nitrate, nitrite and temperature explained a considerable percentage of the variance in the system for the majority of the sub-periods. The applied statistical methods linked the different ranges of emissions with the system variables, provided insights on the effect of operating conditions on N2O emissions in each sub-period and can be integrated into N2O emissions data processing at wastewater treatment plants.
Copyright © 2018. Published by Elsevier Ltd.
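Spearman's rank correlation, used in the wastewater study above, is the Pearson correlation of rank-transformed data. The following is a minimal standard-library sketch; the helper names and the sensor readings are hypothetical and are not taken from the study.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / (sx * sy)

# Hypothetical hourly readings (illustrative only).
nitrite = [0.1, 0.4, 0.2, 0.9, 0.5]
n2o = [1.0, 3.5, 2.0, 30.0, 4.0]
rho = spearman(nitrite, n2o)
```

Because only ranks enter the calculation, a monotonic but nonlinear relationship (such as the emission spike in the hypothetical data) still yields a high coefficient.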
[Study on HPLC fingerprint of Oldenlandia diffusa].
Chen, Yan; Yao, Zhi-Hong; Dai, Yi; Cheng, Hong; Wen, Li-Rong; Zhou, Guang-Xiong; Yao, Xin-Sheng
2012-06-01
To establish the HPLC fingerprint chromatogram of Oldenlandia diffusa, coupled with chemometric methods, for the quality control of multiple batches of medicinal material. The separation was developed on a C18 column (4.6 mm x 250 mm, 5 μm) by gradient elution with acetonitrile-water (both containing 0.1 per thousand (V/V) acetic acid) as mobile phase at a flow rate of 0.8 mL/min, with the detection wavelength at 238 nm and the column temperature at 30 °C. The HPLC fingerprint chromatogram of Oldenlandia diffusa was set up and the main characteristic peaks were identified by comparison with chemical reference substances. The quality of 22 batches of medicinal material was evaluated by similarity assay as well as principal component analysis (PCA) and cluster analysis. The established HPLC fingerprint chromatogram of Oldenlandia diffusa was specific, precise, reproducible and stable. Eleven peaks were chemically identified. The similarity of the 17 batches of Oldenlandia diffusa was clearly higher than that of the 5 batches of adulterants. PCA showed that the 17 batches of Oldenlandia diffusa were in one domain and the 5 batches of adulterants were far apart from that domain. The cluster analysis of the 22 batches of medicinal material showed that the 17 batches of Oldenlandia diffusa were in one cluster while the 5 batches of adulterants were excluded. Further cluster analysis was carried out for the quality consistency of the 17 batches of Oldenlandia diffusa, and accordingly they were divided into 4 clusters. Combined with chemometric methods, the HPLC fingerprint chromatogram provides a method for evaluating the authenticity and quality control of Oldenlandia diffusa, which is favorable for improving the overall quality control of Oldenlandia diffusa.
Deschamps, Kevin; Matricali, Giovanni Arnoldo; Roosen, Philip; Desloovere, Kaat; Bruyninckx, Herman; Spaepen, Pieter; Nobels, Frank; Tits, Jos; Flour, Mieke; Staes, Filip
2013-01-01
Background The aim of this study was to identify groups of subjects with similar patterns of forefoot loading and verify if specific groups of patients with diabetes could be isolated from non-diabetics. Methodology/Principal Findings Ninety-seven patients with diabetes and 33 control participants between 45 and 70 years were prospectively recruited in two Belgian Diabetic Foot Clinics. Barefoot plantar pressure measurements were recorded and subsequently analysed using a semi-automatic total mapping technique. K-means cluster analysis was applied on relative regional impulses of six forefoot segments in order to pursue a classification for the control group separately, the diabetic group separately and both groups together. Cluster analysis led to identification of three distinct groups when considering only the control group. For the diabetic group, and the computation considering both groups together, four distinct groups were isolated. Compared to the cluster analysis of the control group an additional forefoot loading pattern was identified. This group comprised diabetic feet only. The relevance of the reported clusters was supported by ANOVA statistics indicating significant differences between different regions of interest and different clusters. Conclusions/Significance A new era seems to be emerging in diabetic foot medicine, one which embraces the classification of diabetic patients according to their biomechanical profile. Classification of the plantar pressure distribution has the potential to provide a means to determine mechanical interventions for the prevention and/or treatment of the diabetic foot. PMID:24278219
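As a rough illustration of the K-means step described above, the following sketch implements Lloyd's algorithm with caller-supplied starting centroids. It assumes only two hypothetical regional impulse fractions per foot (the study used six forefoot segments), and all values are invented for illustration.

```python
def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm with caller-supplied starting centroids."""
    centroids = [list(c) for c in centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(range(len(centroids)),
                key=lambda k: sum((x - c) ** 2 for x, c in zip(p, centroids[k])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centroid if a cluster empties
                centroids[k] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids

# Hypothetical relative impulses for two forefoot regions (illustrative only).
feet = [[0.10, 0.30], [0.12, 0.28], [0.40, 0.10], [0.42, 0.08]]
foot_labels, foot_centroids = kmeans(feet, centroids=[feet[0], feet[2]])
```

Fixing the starting centroids makes the sketch deterministic; in practice the result of K-means depends on initialization, so several random starts are usually compared.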
Employment relations and global health: a typological study of world labor markets.
Chung, Haejoo; Muntaner, Carles; Benach, Joan
2010-01-01
In this study, the authors investigate the global labor market and employment relations, which are central building blocks of the welfare state; the aim is to propose a global typology of labor markets to explain global inequalities in population health. Countries are categorized into core (21), semi-peripheral (42), and peripheral (71) countries, based on gross national product per capita (Atlas method). Labor market-related variables and factors are then used to generate clusters of countries with principal components and cluster analysis methods. The authors then examine the relationship between the resulting clusters and health outcomes. The clusters of countries are largely geographically defined, each cluster with similar historical background and developmental strategy. However, there are interesting exceptions, which warrant further elaboration. The relationship between health outcomes and clusters largely follows the authors' expectations (except for communicable diseases): more egalitarian labor institutions have better health outcomes. The world system, then, can be divided according to different types of labor markets that are predictive of population health outcomes at each level of economic development. As is the case for health and social policies, variability in labor market characteristics is likely to reflect, in part, the relative strength of a country's political actors.
Transport in the Subtropical Lowermost Stratosphere during CRYSTAL-FACE
NASA Technical Reports Server (NTRS)
Pittman, Jasna V.; Weinstock, Elliot M.; Oglesby, Robert J.; Sayres, David S.; Smith, Jessica B.; Anderson, James G.; Cooper, Owen R.; Wofsy, Steven C.; Xueref, Irene; Gerbig, Cristoph;
2007-01-01
We use in situ measurements of water vapor (H2O), ozone (O3), carbon dioxide (CO2), carbon monoxide (CO), nitric oxide (NO), and total reactive nitrogen (NO(y)) obtained during the CRYSTAL-FACE campaign in July 2002 to study summertime transport in the subtropical lowermost stratosphere. We use an objective methodology to distinguish the latitudinal origin of the sampled air masses despite the influence of convection, and we calculate backward trajectories to elucidate their recent geographical history. The methodology consists of exploring the statistical behavior of the data by performing multivariate clustering and agglomerative hierarchical clustering calculations, and projecting cluster groups onto principal component space to identify air masses of like composition and hence presumed origin. The statistically derived cluster groups are then examined in physical space using tracer-tracer correlation plots. Interpretation of the principal component analysis suggests that the variability in the data is accounted for primarily by the mean age of air in the stratosphere, followed by the age of the convective influence, and lastly by the extent of convective influence, potentially related to the latitude of convective injection [Dessler and Sherwood, 2004]. We find that high-latitude stratospheric air is the dominant source region during the beginning of the campaign while tropical air is the dominant source region during the rest of the campaign. Influence of convection from both local and non-local events is frequently observed. The identification of air mass origin is confirmed with backward trajectories, and the behavior of the trajectories is associated with the North American monsoon circulation.
Silva, A V C; Nascimento, A L S; Vitória, M F; Rabbani, A R C; Soares, A N R; Lédo, A S
2017-02-23
Banana (Musa spp.) is a fruit species frequently cultivated and consumed worldwide. Molecular markers are important for estimating genetic diversity in germplasm and between genotypes in breeding programs. The objective of this study was to analyze the genetic diversity of 21 banana genotypes (FHIA 23, PA42-44, Maçã, Pacovan Ken, Bucaneiro, YB42-47, Grand Naine, Tropical, FHIA 18, PA94-01, YB42-17, Enxerto, Japira, Pacovã, Prata-Anã, Maravilha, PV79-34, Caipira, Princesa, Garantida, and Thap Maeo), by using inter-simple sequence repeat (ISSR) markers. Material was generated from the banana breeding program of Embrapa Cassava & Fruits and evaluated at Embrapa Coastal Tablelands. The 12 primers used in this study generated 97.5% polymorphism. Four clusters were identified among the different genotypes studied, and the sum of the first two principal components was 48.91%. From the Unweighted Pair Group Method using Arithmetic averages (UPGMA) dendrogram, it was possible to identify two main clusters and subclusters. Two genotypes (Garantida and Thap Maeo) remained isolated from the others, both in the UPGMA clustering and in the principal coordinate analysis (PCoA). Using ISSR markers, we could analyze the genetic diversity of the studied material and state that these markers were efficient at detecting sufficient polymorphism to estimate the genetic variability in banana genotypes.
Rakotosamimanana, Sitraka; Mandrosovololona, Vatsiharizandry; Rakotonirina, Julio; Ramamonjisoa, Joselyne; Ranjalahy, Justin Rasolofomanana; Randremanana, Rindra Vatosoa; Rakotomanana, Fanjasoa
2014-01-01
Tuberculosis infection may remain latent, but the disease is nevertheless a serious public health issue. Various epidemiological studies on pulmonary tuberculosis have considered the spatial component and taken it into account, revealing the tendency of this disease to cluster in particular locations. The aim was to assess the contribution of Knowledge Attitude and Practice (KAP) to the distribution of tuberculosis and to provide information for the improvement of the National Tuberculosis Program. We investigated the role of KAP in the distribution patterns of pulmonary tuberculosis in Antananarivo. First, we performed spatial scanning of tuberculosis aggregation among permanent cases resident in Antananarivo Urban Township using the Kulldorff method, and then we carried out a quantitative study on KAP, involving TB patients. The KAP study in the population was based on qualitative methods with focus groups. The disease still clusters in the same districts identified in the previous study. The principal cluster covered 22 neighborhoods. Most of them are part of the first district. A secondary cluster was found, involving 18 neighborhoods in the sixth district and two neighborhoods in the fifth. The relative risk was 1.7 (p<10^-6) in the principal cluster and 1.6 (p<10^-3) in the secondary cluster. Our study showed that more was known about TB symptoms than about the duration of the disease or free treatment. Knowledge about TB was limited to that acquired at school or from relatives with TB. The attitude and practices of patients and the population in general indicated that there is still a stigma attached to tuberculosis. This type of survey can be conducted in remote zones where the tuberculosis-related KAP of the TB patients and the general population is less known or not documented; the findings could be used to adapt control measures to the local particularities.
Finch, Caroline F; Stephan, Karen; Shee, Anna Wong; Hill, Keith; Haines, Terry P; Clemson, Lindy; Day, Lesley
2015-01-01
Background There has been limited research investigating the relationship between injurious falls and hospital resource use. The aims of this study were to identify clusters of community-dwelling older people in the general population who are at increased risk of being admitted to hospital following a fall and how those clusters differed in their use of hospital resources. Methods Analysis of routinely collected hospital admissions data relating to 45 374 fall-related admissions in Victorian community-dwelling older adults aged ≥65 years that occurred during 2008/2009 to 2010/2011. Fall-related admission episodes were identified based on being admitted from a private residence to hospital with a principal diagnosis of injury (International Classification of Diseases (ICD)-10-AM codes S00 to T75) and having a first external cause of a fall (ICD-10-AM codes W00 to W19). A cluster analysis was performed to identify homogeneous groups using demographic details of patients and information on the presence of comorbidities. Hospital length of stay (LOS) was compared across clusters using competing risks regression. Results Clusters based on area of residence, demographic factors (age, gender, marital status, country of birth) and the presence of comorbidities were identified. Clusters representing hospitalised fallers with comorbidities were associated with longer LOS compared with other cluster groups. Clusters delineated by demographic factors were also associated with increased LOS. Conclusions All patients with comorbidity, and older women without comorbidities, stay in hospital longer following a fall and hence consume a disproportionate share of hospital resources. These findings have important implications for the targeting of falls prevention interventions for community-dwelling older people. PMID:25618735
Multivariate analysis of selected metals in tannery effluents and related soil.
Tariq, Saadia R; Shah, Munir H; Shaheen, N; Khalique, A; Manzoor, S; Jaffar, M
2005-06-30
Effluent and relevant soil samples from 38 tanning units housed in Kasur, Pakistan, were obtained for metal analysis by flame atomic absorption spectrophotometric method. The levels of 12 metals, Na, Ca, K, Mg, Fe, Mn, Cr, Co, Cd, Ni, Pb and Zn were determined in the two media. The data were evaluated towards metal distribution and metal-to-metal correlations. The study evidenced enhanced levels of Cr (391, 16.7 mg/L) and Na (25,519, 9369 mg/L) in tannery effluents and relevant soil samples, respectively. The effluent versus soil trace metal content relationship confirmed that the effluent Cr was strongly correlated with soil Cr. For metal source identification the techniques of principal component analysis and cluster analysis were applied. The principal component analysis yielded two factors for effluents: factor 1 (49.6% variance) showed significant loading for Ca, Fe, Mn, Cr, Cd, Ni, Pb and Zn, referring to a tanning related source for these metals, and factor 2 (12.6% variance) with higher loadings of Na, K, Mg and Co, was associated with the processes during the skin/hide treatment. Similarly, two factors with a cumulative variance of 34.8% were obtained for soil samples: factor 1 manifested the contribution from Mg, Mn, Co, Cd, Ni and Pb, which though soil-based is basically effluent-derived, while factor 2 was found associated with Na, K, Ca, Cr and Zn which referred to a tannery-based source. The dendrograms obtained from cluster analysis also support the observed results. The study exhibits a gross pollution of soils with Cr at levels far exceeding the stipulated safe limit laid down for tannery effluents.
Hussain, Mahbub; Ahmed, Syed Munaf; Abderrahman, Walid
2008-01-01
A multivariate statistical technique, cluster analysis, was used to assess the logged surface water quality at an irrigation project at Al-Fadhley, Eastern Province, Saudi Arabia. The principal idea behind using the technique was to utilize all available hydrochemical variables in the quality assessment including trace elements and other ions which are not considered in conventional techniques for water quality assessments like Stiff and Piper diagrams. Furthermore, the area belongs to an irrigation project where water contamination associated with the use of fertilizers, insecticides and pesticides is expected. This quality assessment study was carried out on a total of 34 surface/logged water samples. To gain a greater insight in terms of the seasonal variation of water quality, 17 samples were collected from both summer and winter seasons. The collected samples were analyzed for a total of 23 water quality parameters including pH, TDS, conductivity, alkalinity, sulfate, chloride, bicarbonate, nitrate, phosphate, bromide, fluoride, calcium, magnesium, sodium, potassium, arsenic, boron, copper, cobalt, iron, lithium, manganese, molybdenum, nickel, selenium, mercury and zinc. Cluster analysis in both Q and R modes was used. Q-mode analysis resulted in three distinct water types for both the summer and winter seasons. Q-mode analysis also showed the spatial as well as temporal variation in water quality. R-mode cluster analysis led to the conclusion that there are two major sources of contamination for the surface/shallow groundwater in the area: fertilizers, micronutrients, pesticides, and insecticides used in agricultural activities, and non-point natural sources.
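The Q-mode and R-mode analyses described above differ only in what is being compared: Q-mode groups samples (rows of the data table), while R-mode groups variables (columns). A minimal sketch of the two distance matrices that would feed a hierarchical clustering follows. The data are hypothetical, and Euclidean distance is an assumption made for brevity (the abstract does not name the measure, and R-mode work often uses correlation instead).

```python
def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5

def distance_matrix(rows):
    """Pairwise Euclidean distances between the given rows."""
    return [[euclidean(r, s) for s in rows] for r in rows]

def transpose(table):
    """Swap rows and columns of a rectangular table."""
    return [list(col) for col in zip(*table)]

# Hypothetical standardized data: 3 samples (rows) x 2 parameters (columns).
data = [
    [0.0, 0.0],
    [3.0, 4.0],
    [3.0, 4.0],
]
q_mode = distance_matrix(data)             # 3 x 3: sample vs. sample
r_mode = distance_matrix(transpose(data))  # 2 x 2: parameter vs. parameter
```

The linkage step that turns either matrix into a dendrogram is omitted here; only the change of viewpoint between the two modes is shown.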
Rangjaroen, Chakrapong; Rerkasem, Benjavan; Teaumroong, Neung; Sungthong, Rungroch; Lumyong, Saisamorn
2014-01-01
Communities of bacterial endophytes within the rice landraces cultivated in the highlands of northern Thailand were studied using fingerprinting data of 16S rRNA and nifH genes profiling by polymerase chain reaction-denaturing gradient gel electrophoresis. The bacterial communities' richness, diversity index, evenness, and stability varied depending on the plant tissues, stages of growth, and rice cultivars. These indices for the endophytic diazotrophic bacteria within the landrace rice Bue Wah Bo were significantly the lowest. The endophytic bacteria revealed greater diversity by cluster analysis with seven clusters compared to the endophytic diazotrophic bacteria (three clusters). Principal component analysis suggested that the community structures of the endophytic bacteria across the rice landraces had a higher stability than those of the endophytic diazotrophic bacteria. Uncultured bacteria were found dominantly in both bacterial communities, while higher generic varieties were observed in the endophytic diazotrophic bacterial community. These differences in bacterial communities might be influenced either by genetic variation in the rice landraces or the rice cultivation system, where the nitrogen input affects the endophytic diazotrophic bacterial community.
Mo, Yun; Zhang, Zhongzhao; Meng, Weixiao; Ma, Lin; Wang, Yao
2014-01-01
Indoor positioning systems based on the fingerprint method are widely used due to the large number of existing devices with a wide range of coverage. However, extensive positioning regions with a massive fingerprint database may cause high computational complexity and error margins, therefore clustering methods are widely applied as a solution. However, traditional clustering methods in positioning systems can only measure the similarity of the Received Signal Strength without being concerned with the continuity of physical coordinates. Besides, outage of access points could result in asymmetric matching problems which severely affect the fine positioning procedure. To solve these issues, in this paper we propose a positioning system based on the Spatial Division Clustering (SDC) method for clustering the fingerprint dataset subject to physical distance constraints. With the Genetic Algorithm and Support Vector Machine techniques, SDC can achieve higher coarse positioning accuracy than traditional clustering algorithms. In terms of fine localization, based on the Kernel Principal Component Analysis method, the proposed positioning system outperforms its counterparts based on other feature extraction methods in low dimensionality. Apart from balancing online matching computational burden, the new positioning system exhibits advantageous performance on radio map clustering, and also shows better robustness and adaptability in the asymmetric matching problem aspect. PMID:24451470
Gerns Storey, Helen L; Richardson, Barbra A; Singa, Benson; Naulikha, Jackie; Prindle, Vivian C; Diaz-Ochoa, Vladimir E; Felgner, Phil L; Camerini, David; Horton, Helen; John-Stewart, Grace; Walson, Judd L
2014-01-01
The role of HIV-1-specific antibody responses in HIV disease progression is complex and would benefit from analysis techniques that examine clusterings of responses. Protein microarray platforms facilitate the simultaneous evaluation of numerous protein-specific antibody responses, though excessive data are cumbersome in analyses. Principal components analysis (PCA) reduces data dimensionality by generating fewer composite variables that maximally account for variance in a dataset. To identify clusters of antibody responses involved in disease control, we investigated the association of HIV-1-specific antibody responses by protein microarray, and assessed their association with disease progression using PCA in a nested cohort design. Associations observed among collections of antibody responses paralleled protein-specific responses. At baseline, greater antibody responses to the transmembrane glycoprotein (TM) and reverse transcriptase (RT) were associated with higher viral loads, while responses to the surface glycoprotein (SU), capsid (CA), matrix (MA), and integrase (IN) proteins were associated with lower viral loads. Over 12 months greater antibody responses were associated with smaller decreases in CD4 count (CA, MA, IN), and reduced likelihood of disease progression (CA, IN). PCA and protein microarray analyses highlighted a collection of HIV-specific antibody responses that together were associated with reduced disease progression, and may not have been identified by examining individual antibody responses. This technique may be useful to explore multifaceted host-disease interactions, such as HIV coinfections.
Seierstad, Therese; Røe, Kathrine; Sitter, Beathe; Halgunset, Jostein; Flatmark, Kjersti; Ree, Anne H; Olsen, Dag Rune; Gribbestad, Ingrid S; Bathen, Tone F
2008-01-01
Background This study was conducted in order to elucidate metabolic differences between human rectal cancer biopsies and colorectal HT29, HCT116 and SW620 xenografts by using high-resolution magic angle spinning (MAS) magnetic resonance spectroscopy (MRS) and for determination of the most appropriate human rectal xenograft model for preclinical MR spectroscopy studies. A further aim was to investigate metabolic changes following irradiation of HT29 xenografts. Methods HR MAS MRS of tissue samples from xenografts and rectal biopsies were obtained with a Bruker Avance DRX600 spectrometer and analyzed using principal component analysis (PCA) and partial least square (PLS) regression analysis. Results and conclusion HR MAS MRS enabled assignment of 27 metabolites. Score plots from PCA of spin-echo and single-pulse spectra revealed separate clusters of the different xenografts and rectal biopsies, reflecting underlying differences in metabolite composition. The loading profile indicated that clustering was mainly based on differences in relative amounts of lipids, lactate and choline-containing compounds, with HT29 exhibiting the metabolic profile most similar to human rectal cancer tissue. Due to high necrotic fractions in the HT29 xenografts, radiation-induced changes were not detected when comparing spectra from untreated and irradiated HT29 xenografts. However, PLS calibration relating spectral data to the necrotic fraction revealed a significant correlation, indicating that necrotic fraction can be assessed from the MR spectra. PMID:18439252
A data fusion-based drought index
NASA Astrophysics Data System (ADS)
Azmi, Mohammad; Rüdiger, Christoph; Walker, Jeffrey P.
2016-03-01
Drought and water stress monitoring plays an important role in the management of water resources, especially during periods of extreme climate conditions. Here, a data fusion-based drought index (DFDI) has been developed and analyzed for three different locations of varying land use and climate regimes in Australia. The proposed index comprehensively considers all types of drought through a selection of indices and proxies associated with each drought type. In deriving the proposed index, weekly data from three different data sources (OzFlux Network, Asia-Pacific Water Monitor, and MODIS-Terra satellite) were employed to first derive commonly used individual standardized drought indices (SDIs), which were then grouped using an advanced clustering method. Next, three different multivariate methods (principal component analysis, factor analysis, and independent component analysis) were utilized to aggregate the SDIs located within each group. For the two clusters in which the grouped SDIs best reflected the water availability and vegetation conditions, the variables were aggregated based on an averaging between the standardized first principal components of the different multivariate methods. Then, considering those two aggregated indices as well as the classifications of months (dry/wet months and active/non-active months), the proposed DFDI was developed. Finally, the symbolic regression method was used to derive mathematical equations for the proposed DFDI. The results presented here show that the proposed index has revealed new aspects in water stress monitoring which previous indices were not able to, by simultaneously considering both hydrometeorological and ecological concepts to define the real water stress of the study areas.
NASA Astrophysics Data System (ADS)
Das, Shreya; Nag, S. K.
2017-05-01
Multivariate statistical techniques, cluster and principal component analysis were applied to the data on groundwater quality of Suri I and II Blocks of Birbhum District, West Bengal, India, to extract principal factors corresponding to the different sources of variation in the hydrochemistry as well as the main controls on the hydrochemistry. For this, bore well water samples have been collected in two phases, during Post-monsoon (November 2012) and Pre-monsoon (April 2013) from 26 sampling locations spread homogeneously over the two blocks. Excess fluoride in groundwater has been reported at two locations both in post- and in pre-monsoon sessions, with a rise observed in pre-monsoon. Localized presence of excess iron has also been observed during both sessions. The water is found to be mildly alkaline in post-monsoon but slightly acidic at some locations during pre-monsoon. Correlation and cluster analysis studies demonstrate that fluoride shares a moderately positive correlation with pH in post-monsoon and a very strong one with carbonate in pre-monsoon indicating dominance of rock water interaction and ion exchange activity in the study area. Certain locations in the study area have been reported with less than 0.6 mg/l fluoride in groundwater, leading to possibility of occurrence of severe dental caries especially in children. Low values of sulfate and phosphate in water indicate a meager chance of contamination of groundwater due to anthropogenic factors.
Quantum Dynamics of Helium Clusters
1993-03-01
the structure of both these and the HeN clusters in the body fixed frame by computing principal moments of inertia, thereby avoiding the...' of helium clusters, with the modification that we subtract 0.96 K from the computed values so that for sufficiently large clusters we recover the...phonon spectrum of liquid He. To get a picture of these spectra one needs to compute the structure functions 51. Monte Carlo random walk simulations
Riaz, Summaira; De Lorenzis, Gabriella; Velasco, Dianne; Koehmstedt, Anne; Maghradze, David; Bobokashvili, Zviad; Musayev, Mirza; Zdunic, Goran; Laucou, Valerie; Andrew Walker, M; Failla, Osvaldo; Preece, John E; Aradhya, Mallikarjuna; Arroyo-Garcia, Rosa
2018-06-27
The mountainous region between the Caucasus and China is considered to be the center of domestication for grapevine. Despite the importance of Central Asia in the history of grape growing, information about the extent and distribution of grape genetic variation in this region is limited in comparison to wild and cultivated grapevines from around the Mediterranean basin. The principal goal of this work was to survey the genetic diversity and relationships among wild and cultivated grape germplasm from the Caucasus, Central Asia, and the Mediterranean basin collectively to understand gene flow, possible domestication events and adaptive introgression. A total of 1378 wild and cultivated grapevines collected around the Mediterranean basin and from Central Asia were tested with a set of 20 nuclear SSR markers. Genetic data were analyzed (Cluster analysis, Principal Coordinate Analysis and STRUCTURE) to identify groups, and the results were validated by Nei's genetic distance, pairwise FST analysis and assignment tests. All of these analyses identified three genetic groups: G1, wild accessions from Croatia, France, Italy and Spain; G2, wild accessions from Armenia, Azerbaijan and Georgia; and G3, cultivars from Spain, France, Italy, Georgia, Iran, Pakistan and Turkmenistan, which included a small group of wild accessions from Georgia and Croatia. Wild accessions from Georgia clustered with cultivated grape from the same area (proles pontica), but also with Western Europe (proles occidentalis), supporting Georgia as the ancient center of grapevine domestication. In addition, cluster analysis indicated that Western European wild grapes grouped with cultivated grapes from the same area, suggesting that the cultivated proles occidentalis contributed more to the early development of wine grapes than the wild vines from Eastern Europe.
The analysis of genetic relationships among the tested genotypes provided evidence of genetic relationships between wild and cultivated accessions in the Mediterranean basin and Central Asia. The genetic structure indicated a considerable amount of gene flow, which limited the differentiation between the two subspecies. The results also indicated that grapes with mixed ancestry occur in the regions where wild grapevines were domesticated.
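The abstract above names Nei's genetic distance as one of the validation measures but does not say which variant was used. As a minimal sketch, assuming Nei's standard (1972) distance and allele frequencies already estimated per locus, the calculation could look like this. The function name and the example frequencies are hypothetical and are not taken from the study.

```python
import math


def nei_standard_distance(pop_x, pop_y):
    """Nei's standard genetic distance between two populations.

    Each population is a list of loci; each locus is a dict mapping an
    allele label to its frequency in that population (assumed input format).
    """
    if not pop_x or len(pop_x) != len(pop_y):
        raise ValueError("both populations need the same, non-zero number of loci")

    jx = jy = jxy = 0.0
    for freq_x, freq_y in zip(pop_x, pop_y):
        alleles = set(freq_x) | set(freq_y)
        # Per-locus homozygosity-like sums, accumulated over all loci.
        jx += sum(freq_x.get(a, 0.0) ** 2 for a in alleles)
        jy += sum(freq_y.get(a, 0.0) ** 2 for a in alleles)
        jxy += sum(freq_x.get(a, 0.0) * freq_y.get(a, 0.0) for a in alleles)

    if jxy == 0.0:
        # No shared alleles at any locus: the distance is unbounded.
        return math.inf

    n_loci = len(pop_x)
    identity = (jxy / n_loci) / math.sqrt((jx / n_loci) * (jy / n_loci))
    return -math.log(identity)


# Hypothetical single-locus example: one polymorphic and one fixed population.
wild = [{"A": 0.5, "B": 0.5}]
cultivated = [{"A": 1.0}]
distance = nei_standard_distance(wild, cultivated)
```

Identical allele frequencies give a distance of zero, and the value grows as the populations share fewer alleles.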
Weerasekera, Manjula M; Sissons, Chris H; Wong, Lisa; Anderson, Sally A; Holmes, Ann R; Cannon, Richard D
2017-10-01
The aim was to investigate the relationship between groups of bacteria identified by cluster analysis of the DGGE fingerprints and the amounts and diversity of yeast present. Bacterial and yeast populations in saliva samples from 24 adults were analysed using denaturing gradient gel electrophoresis (DGGE) of the bacteria present and by yeast culture. Eubacterial DGGE banding patterns showed considerable variation between individuals. Seventy-one different amplicon bands were detected; the band number per saliva sample ranged from 21 to 39 (mean ± SD = 29.3 ± 4.9). Cluster and principal component analysis of the bacterial DGGE patterns yielded three major clusters containing 20 of the samples. Seventeen of the 24 (71%) saliva samples were yeast positive, with concentrations up to 10^3 cfu/mL. Candida albicans was the predominant species in saliva samples, although six other yeast species, including Candida dubliniensis, Candida tropicalis, Candida krusei, Candida guilliermondii, Candida rugosa and Saccharomyces cerevisiae, were identified. The presence, concentration, and species of yeast in samples showed no clear relationship to the bacterial clusters. Despite indications of in vitro bacteria-yeast interactions, there was a lack of association between the presence, identity and diversity of yeasts and the bacterial DGGE fingerprint clusters in saliva. This suggests significant ecological individual-specificity of these associations in highly complex in vivo oral biofilm systems under normal oral conditions. Copyright © 2017 Elsevier Ltd. All rights reserved.
HPLC-DAD-ESI-MS Analysis of Flavonoids from Leaves of Different Cultivars of Sweet Osmanthus.
Wang, Yiguang; Fu, Jianxin; Zhang, Chao; Zhao, Hongbo
2016-09-14
Osmanthus fragrans Lour. has traditionally been a popular ornamental plant in China. In this study, ethanol extracts of the leaves of four cultivar groups of O. fragrans were analyzed by high-performance liquid chromatography coupled with diode array detection (HPLC-DAD) and high-performance liquid chromatography with electrospray ionization and mass spectrometry (HPLC-ESI-MS). The results suggest that variation in flavonoids among O. fragrans cultivars is quantitative, rather than qualitative. Fifteen components were detected and separated, among which, the structures of 11 flavonoids and two coumarins were identified or tentatively identified. According to principal component analysis (PCA) and hierarchical cluster analysis (HCA) based on the abundance of these components (expressed as rutin equivalents), 22 selected cultivars were classified into four clusters. The seven cultivars from Cluster III ('Xiaoye Sugui', 'Boye Jingui', 'Wuyi Dangui', 'Yingye Dangui', 'Danzhuang', 'Foding Zhu', and 'Tianxiang Taige'), which are enriched in rutin and total flavonoids, and 'Sijigui' from Cluster II which contained the highest amounts of kaempferol glycosides and apigenin 7-O-glucoside, could be selected as potential pharmaceutical resources. However, the chemotaxonomy in this paper does not correlate with the distribution of the existing cultivar groups, demonstrating that the distribution of flavonoids in O. fragrans leaves does not provide an effective means of classification for O. fragrans cultivars based on flower color.
Santos, D N; Nunes, C F; Setotaw, T A; Pio, R; Pasqual, M; Cançado, G M A
2016-12-19
Cambuci (Campomanesia phaea) belongs to the Myrtaceae family and is native to the Atlantic Forest of Brazil. It has ecological and social appeal but is exposed to problems associated with environmental degradation and expansion of agricultural activities in the region. Comprehensive studies on this species are rare, making its conservation and genetic improvement difficult. Thus, it is important to develop research activities to understand the current situation of the species as well as to make recommendations for its conservation and use. This study was performed to characterize the cambuci accessions found in the germplasm bank of Coordenadoria de Assistência Técnica Integral using inter-simple sequence repeat markers, with the goal of understanding the plant's population structure. The results showed the existence of some level of genetic diversity among the cambuci accessions that could be exploited for the genetic improvement of the species. Principal coordinate analysis and discriminant analysis clustered the 80 accessions into three groups, whereas Bayesian model-based clustering analysis clustered them into two groups. The formation of two cluster groups and the high membership coefficients within the groups pointed out the importance of further collection to cover more areas and more genetic variability within the species. The study also showed the lack of conservation activities; therefore, more attention from the appropriate organizations is needed to plan and implement natural and ex situ conservation activities.
Hamad, Ismail; AbdElgawad, Hamada; Al Jaouni, Soad; Zinta, Gaurav; Asard, Han; Hassan, Sherif; Hegab, Momtaz; Hagagy, Nashwa; Selim, Samy
2015-07-27
Date palm is an important crop, especially in the hot-arid regions of the world. Date palm fruits have high nutritional and therapeutic value and possess significant antibacterial and antifungal properties. In this study, we performed bioactivity analyses and metabolic profiling of date fruits of 12 cultivars from Saudi Arabia to assess their nutritional value. Our results showed that the date extracts from different cultivars have different free radical scavenging and anti-lipid peroxidation activities. Moreover, the cultivars showed significant differences in their chemical composition, e.g., the phenolic content (10.4-22.1 mg/100 g DW), amino acids (37-108 μmol·g⁻¹ FW) and minerals (237-969 mg/100 g DW). Principal component analysis (PCA) showed a clear separation of the cultivars into four different groups. The first group consisted of the Sokary and Nabtit Ali cultivars; the second group of Khlas Al Kharj, Khla Al Qassim, Mabroom and Khlas Al Ahsa; the third group of Khals Elshiokh, Nabot Saif and Khodry; and the fourth group of the Ajwa Al Madinah, Saffawy and Rashodia cultivars. Hierarchical cluster analysis (HCA) revealed clustering of date cultivars into two groups. The first cluster consisted of the Sokary, Rashodia and Nabtit Ali cultivars, and the second cluster contained all the other tested cultivars. These results indicate that date fruits have high nutritive value, and that different cultivars have different chemical compositions.
A stochastic model of weather states and concurrent daily precipitation at multiple precipitation stations is described. Four algorithms are investigated for classification of daily weather states: k-means, fuzzy clustering, principal components, and principal components coupled with ...
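K-means, the first of the classification algorithms named above, can be sketched as follows. This is a minimal illustration of the standard Lloyd iteration, not the implementation used in the cited work. The feature vectors, the starting centroids, and the function names are hypothetical.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, initial_centroids, max_iter=100):
    """Plain Lloyd-style k-means with caller-supplied starting centroids.

    Returns (labels, centroids). An empty cluster keeps its old centroid.
    """
    centroids = [tuple(float(v) for v in c) for c in initial_centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: squared_distance(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centroids.append(old)
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids


# Hypothetical two-feature daily observations forming two obvious groups.
days = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
labels, centers = kmeans(days, [(0.0, 0.0), (10.0, 10.0)])
```

Passing the starting centroids explicitly keeps the sketch deterministic; in practice the result depends on initialisation, so several starts are usually compared.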
Costa, Patrício Soares; Santos, Nadine Correia; Cunha, Pedro; Cotter, Jorge; Sousa, Nuno
2013-01-01
The main focus of this study was to illustrate the applicability of multiple correspondence analysis (MCA) in detecting and representing underlying structures in large datasets used to investigate cognitive ageing. Principal component analysis (PCA) was used to obtain main cognitive dimensions, and MCA was used to detect and explore relationships between cognitive, clinical, physical, and lifestyle variables. Two PCA dimensions were identified (general cognition/executive function and memory), and two MCA dimensions were retained. Poorer cognitive performance was associated with older age, fewer years of schooling, unhealthier lifestyle indicators, and presence of pathology. The first MCA dimension indicated the clustering of general/executive function and lifestyle indicators and education, while the second association was between memory and clinical parameters and age. The clustering analysis with object scores method was used to identify groups sharing similar characteristics. The weaker cognitive clusters in terms of memory and executive function comprised individuals with characteristics contributing to a higher MCA dimensional mean score (age, less education, and presence of indicators of unhealthier lifestyle habits and/or clinical pathologies). MCA provided a powerful tool to explore complex ageing data, covering multiple and diverse variables, showing if a relationship exists and how variables are related, and offering statistical results that can be seen both analytically and visually.
Chemometric analysis of minerals in gluten-free products.
Gliszczyńska-Świgło, Anna; Klimczak, Inga; Rybicka, Iga
2018-06-01
Numerous studies indicate mineral deficiencies in people on a gluten-free (GF) diet. These deficiencies may indicate that GF products are a less valuable source of minerals than gluten-containing products. In the study, the nutritional quality of 50 GF products is discussed taking into account the nutritional requirements for minerals expressed as percentage of recommended daily allowance (%RDA) or percentage of adequate intake (%AI) for a model celiac patient. Elements analyzed were calcium, potassium, magnesium, sodium, copper, iron, manganese, and zinc. Analysis of %RDA or %AI was performed using principal component analysis (PCA) and hierarchical cluster analysis (HCA). Using PCA, it was possible to differentiate between products based on rice, corn, potato and GF wheat starch and products based on buckwheat, chickpea, millet, oats, amaranth, teff, quinoa, chestnut, and acorn. In the HCA, four clusters were created. The main criterion determining the assignment of a sample to a cluster was the content of all minerals included in the HCA (K, Mg, Cu, Fe, Mn); however, only the Mn content differentiated all four groups. GF products made of buckwheat, chickpea, millet, oats, amaranth, teff, quinoa, chestnut, and acorn are a better source of minerals than those based on other GF raw materials, which was confirmed by PCA and HCA. © 2017 Society of Chemical Industry.
Shyamalamma, S; Chandra, S B C; Hegde, M; Naryanswamy, P
2008-07-22
Artocarpus heterophyllus Lam., commonly called jackfruit, is a medium-sized evergreen tree that bears high yields of the largest known edible fruit. Yet, it has been little explored commercially due to wide variation in fruit quality. The genetic diversity and genetic relatedness of 50 jackfruit accessions were studied using amplified fragment length polymorphism markers. Of 16 primer pairs evaluated, eight were selected for screening of genotypes based on the number and quality of polymorphic fragments produced. These primer combinations produced 5976 bands, 1267 (22%) of which were polymorphic. Among the jackfruit accessions, the similarity coefficient ranged from 0.137 to 0.978; the accessions also shared a large number of monomorphic fragments (78%). Cluster analysis and principal component analysis grouped all jackfruit genotypes into three major clusters. Cluster I included the genotypes grown in a jackfruit region of Karnataka, called Tamaka, with very dry conditions; cluster II contained the genotypes collected from locations having medium to heavy rainfall in Karnataka; cluster III grouped the genotypes in distant locations with different environmental conditions. Strong coincidence of these amplified fragment length polymorphism-based groupings with geographical localities as well as morphological characters was observed. We found moderate genetic diversity in these jackfruit accessions. This information should be useful for tree breeding programs, as part of our effort to popularize jackfruit as a commercial crop.
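The abstract above reports pairwise similarity coefficients for the AFLP profiles without naming the coefficient. As a rough sketch, assuming a Jaccard coefficient on band presence/absence (a common choice for dominant markers, but an assumption here), the calculation could look like this. The band profiles are invented for illustration.

```python
def jaccard_similarity(profile_a, profile_b):
    """Jaccard similarity of two band presence/absence profiles (1 = band present).

    Shared absences are ignored, which is the usual reason for preferring
    Jaccard over simple matching with dominant markers.
    """
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must score the same set of bands")
    shared = sum(1 for a, b in zip(profile_a, profile_b) if a and b)
    either = sum(1 for a, b in zip(profile_a, profile_b) if a or b)
    # Two profiles with no bands at all are treated as identical (a convention chosen here).
    return shared / either if either else 1.0


# Hypothetical accessions scored at five band positions.
accession_1 = [1, 1, 0, 1, 0]
accession_2 = [1, 0, 0, 1, 1]
similarity = jaccard_similarity(accession_1, accession_2)
```

A matrix of such pairwise values (or one minus them, as distances) is the usual input to the cluster analysis step.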
Li, Siyue; Zhang, Quanfa
2010-04-15
A data matrix (4032 observations), obtained during a 2-year monitoring period (2005-2006) from 42 sites in the upper Han River, is subjected to various multivariate statistical techniques including cluster analysis, principal component analysis (PCA), factor analysis (FA), correlation analysis and analysis of variance to determine the spatial characterization of dissolved trace elements and heavy metals. Our results indicate that waters in the upper Han River are primarily polluted by Al, As, Cd, Pb, Sb and Se, and the potential pollutants include Ba, Cr, Hg, Mn and Ni. Spatial distribution of trace metals indicates that the polluted sections are mainly concentrated in the Danjiang, Danjiangkou Reservoir catchment and Hanzhong Plain, and the most contaminated river is in the Hanzhong Plain. Q-model clustering depends on geographical location of sampling sites and groups the 42 sampling sites into four clusters, i.e., Danjiang, Danjiangkou Reservoir region (lower catchment), upper catchment and one river in headwaters pertaining to water quality. The headwaters, Danjiang and lower catchment, and upper catchment correspond to very highly polluted, moderately polluted and relatively lightly polluted regions, respectively. Additionally, PCA/FA and correlation analysis demonstrate that Al, Cd, Mn, Ni, Fe, Si and Sr are controlled by natural sources, whereas the other metals appear to be primarily controlled by anthropogenic origins, although geogenic sources also contribute to them. 2009 Elsevier B.V. All rights reserved.
Bioclimatic Classification of Northeast Asia for climate change response
NASA Astrophysics Data System (ADS)
Choi, Y.; Jeon, S. W.; Lim, C. H.
2016-12-01
As climate change worsens, changes in biodiversity and species distributions need to be monitored in order to manage the resulting risks and take advantage of any opportunities. Developing a bioclimatic map, which classifies land into homogeneous zones with similar environmental properties, is the first step in establishing a strategy. Statistically derived classifications of land provide useful spatial frameworks to support ecosystem research, monitoring and policy decisions. Many countries are producing this kind of map and actively using it for ecosystem conservation and management. However, Northeast Asia, including North Korea, lacks detailed environmental information and has no environmental classification map. Therefore, this study presents a bioclimatic map of Northeast Asia based on statistical clustering of bioclimate data. Bioclim data ver1.4, provided by WorldClim, were considered for inclusion in the model. Eight of the most relevant climate variables were selected by correlation analysis, based on previous studies. Principal Components Analysis (PCA) was used to explain 86% of the variation in three independent dimensions, which were subsequently clustered using ISODATA clustering. The bioclimatic classification of Northeast Asia could consist of 29, 35, or 50 zones. This bioclimatic map has a 30' resolution. To assess the accuracy, the correlation coefficient was calculated between the first principal component values of the classification variables and the vegetation index, Gross Primary Production (GPP). The Pearson correlation coefficient was about 0.5. This study constructed a high-resolution bioclimatic map of Northeast Asia by a statistical method, but a wider variety of climate variables should be considered in order to better reflect reality. Further studies should also carry out more quantitative and qualitative validation in various ways. The map could then be used more effectively to support decision making on climate change adaptation.
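The statement that three principal components account for 86% of the variation rests on the cumulative share of the eigenvalues of the correlation (or covariance) matrix. A minimal sketch of that bookkeeping is shown below. The eight eigenvalues are invented for illustration and are not the study's values, and the function name is hypothetical.

```python
def components_needed(eigenvalues, target=0.86):
    """Return (number of components, cumulative variance share) reaching `target`.

    Eigenvalues are sorted in descending order before accumulating, as is
    conventional when ranking principal components.
    """
    total = sum(eigenvalues)
    if total <= 0:
        raise ValueError("eigenvalues must sum to a positive value")
    cumulative = 0.0
    ordered = sorted(eigenvalues, reverse=True)
    for count, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total >= target:
            return count, cumulative / total
    # Target not reached (for example because of rounding): keep every component.
    return len(ordered), cumulative / total


# Hypothetical eigenvalues for eight standardized climate variables.
example_eigenvalues = [4.0, 2.0, 1.0, 0.5, 0.25, 0.15, 0.07, 0.03]
n_components, share = components_needed(example_eigenvalues, target=0.86)
```

With these made-up values the first three components are retained; the retained component scores are what a subsequent clustering step would operate on.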
Automated Classification and Analysis of Non-metallic Inclusion Data Sets
NASA Astrophysics Data System (ADS)
Abdulsalam, Mohammad; Zhang, Tongsheng; Tan, Jia; Webler, Bryan A.
2018-05-01
The aim of this study is to utilize principal component analysis (PCA), clustering methods, and correlation analysis to condense and examine large, multivariate data sets produced from automated analysis of non-metallic inclusions. Non-metallic inclusions play a major role in defining the properties of steel and their examination has been greatly aided by automated analysis in scanning electron microscopes equipped with energy dispersive X-ray spectroscopy. The methods were applied to analyze inclusions on two sets of samples: two laboratory-scale samples and four industrial samples from near-finished 4140 alloy steel components with varying machinability. The laboratory samples had well-defined inclusion chemistries, composed of MgO-Al2O3-CaO, spinel (MgO-Al2O3), and calcium aluminate inclusions. The industrial samples contained MnS inclusions as well as (Ca,Mn)S + calcium aluminate oxide inclusions. PCA could be used to reduce inclusion chemistry variables to a 2D plot, which revealed inclusion chemistry groupings in the samples. Clustering methods were used to automatically classify inclusion chemistry measurements into groups, i.e., no user-defined rules were required.
Cholera Epidemic in Guinea-Bissau (2008): The Importance of “Place”
Luquero, Francisco J.; Banga, Cunhate Na; Remartínez, Daniel; Palma, Pedro Pablo; Baron, Emanuel; Grais, Rebeca F.
2011-01-01
Background: As resources are limited when responding to cholera outbreaks, knowledge about where to orient interventions is crucial. We describe the cholera epidemic affecting Guinea-Bissau in 2008 focusing on the geographical spread in order to guide prevention and control activities. Methodology/Principal Findings: We conducted two studies: 1) a descriptive analysis of the cholera epidemic in Guinea-Bissau focusing on its geographical spread (country level and within the capital); and 2) a cross-sectional study to measure the prevalence of houses with at least one cholera case in the most affected neighbourhood of the capital (Bairro Bandim) to detect clustering of households with cases (cluster analysis). All cholera cases attending the cholera treatment centres in Guinea-Bissau who fulfilled a modified World Health Organization clinical case definition during the epidemic were included in the descriptive study. For the cluster analysis, a sample of houses was selected from a satellite photo (Google Earth™); 140 houses (and the four closest houses) were assessed from the 2,202 identified structures. We applied K-functions and Kernel smoothing to detect clustering. We confirmed the clustering using Kulldorff's spatial scan statistic. A total of 14,222 cases and 225 deaths were reported in the country (AR = 0.94%, CFR = 1.64%). The most affected regions were Biombo, Bijagos and Bissau (the capital). Bairro Bandim was the most affected neighborhood of the capital (AR = 4.0). We found at least one case in 22.7% of the houses (95%CI: 19.5–26.2) in this neighborhood. The cluster analysis identified two areas within Bairro Bandim at highest risk: a market and an intersection where runoff accumulates waste (p<0.001). Conclusions/Significance: Our analysis allowed for the identification of the most affected regions in Guinea-Bissau during the 2008 cholera outbreak, and the most affected areas within the capital. 
This information was essential for making decisions on where to reinforce treatment and to guide control and prevention activities. PMID:21572530
Li, Tao; Sun, Guihua; Ma, Shengzhong; Liang, Kai; Yang, Chupeng; Li, Bo; Luo, Weidong
2016-11-15
Concentration, spatial distribution, composition and sources of polycyclic aromatic hydrocarbons (PAHs) were investigated based on measurements of 16 PAH compounds in surface sediments of the western Taiwan Strait. Total PAH concentrations ranged from 2.41 to 218.54 ng g⁻¹. Cluster analysis identified three site clusters representing the northern, central and southern regions. Sedimentary PAHs mainly originated from a mixture of pyrolytic and petrogenic sources in the north, from pyrolytic sources in the central region, and from petrogenic sources in the south. An end-member mixing model was performed using PAH compound data to estimate mixing proportions for unknown end-members (i.e., extreme-value sample points) proposed by principal component analysis (PCA). The results showed that the analyzed samples can be expressed as mixtures of three end-members, and the mixing of different end-members was strongly related to the transport pathway controlled by two currents, which alternately prevail in the Taiwan Strait during different seasons. Copyright © 2016. Published by Elsevier Ltd.
H, Maulidiani; Khatib, Alfi; Shaari, Khozirah; Abas, Faridah; Shitan, Mahendran; Kneer, Ralf; Neto, Victor; Lajis, Nordin H
2012-01-11
The metabolites of three species of Apiaceae, also known as Pegaga, were analyzed utilizing ¹H NMR spectroscopy and multivariate data analysis. Principal component analysis (PCA) and hierarchical cluster analysis (HCA) resolved the species, Centella asiatica, Hydrocotyle bonariensis, and Hydrocotyle sibthorpioides, into three clusters. The saponins, asiaticoside and madecassoside, along with chlorogenic acids were the metabolites that contributed most to the separation. Furthermore, the effects of growth-lighting conditions on metabolite contents were also investigated. The extracts of C. asiatica grown in full-day light exposure exhibited a stronger radical scavenging activity and contained more triterpenes (asiaticoside and madecassoside), flavonoids, and chlorogenic acids as compared to plants grown in 50% shade. This study established the potential of using a combination of ¹H NMR spectroscopy and multivariate data analyses in differentiating three closely related species and the effects of growth lighting, based on their metabolite contents and identification of the markers contributing to their differences.
Puma (Puma concolor) epididymal sperm morphometry
Cucho, Hernán; Alarcón, Virgilio; Ordóñez, César; Ampuero, Enrique; Meza, Aydee; Soler, Carles
2016-01-01
The Andean puma (Puma concolor) has not been widely studied, particularly in reference to its semen characteristics. The aim of the present study was to define the morphometry of puma sperm heads and classify their subpopulations by cluster analysis. Samples were recovered postmortem from two epididymides from one animal and prepared for morphological observation after staining with the Hemacolor kit. Morphometric data were obtained from 581 spermatozoa using a CASA-Morph system, rendering 13 morphometric parameters. The principal component (PC) analysis was performed followed by cluster analysis for the establishment of subpopulations. Two PC components were obtained, the first related to size and the second to shape. Three subpopulations were observed, corresponding to elongated and intermediate-size sperm heads and acrosomes, to large heads with large acrosomes, and to small heads with short acrosomes. In conclusion, puma spermatozoa showed no uniform sperm morphology but three clear subpopulations. These results should be used for future work in the establishment of an adequate germplasm bank of this species. PMID:27678466
Essential Oil Composition of Pinus peuce Griseb. Needles and Twigs from Two National Parks of Kosovo
Hajdari, Avni; Mustafa, Behxhet; Selimi, Hyrmete; Veselaj, Zeqir; Breznica, Pranvera; Novak, Johannes
2016-01-01
The principal aim of this study was to analyze the chemical composition and qualitative and quantitative variability of essential oils obtained from seven naturally grown populations of Pinus peuce Grisebach (Pinaceae) in Kosovo. Plant materials were collected from three populations in the Sharri National Park and from four other populations in the Bjeshkët e Nemuna National Park, in Kosovo. Essential oils were obtained by steam distillation and analyzed by GC-FID (Gas Chromatography-Flame Ionization Detection) and GC-MS (Gas Chromatography-Mass Spectrometry). The results showed that the yield of essential oils (v/w dry weight) varied depending on the origin of population and the plant organs and ranged from 0.7 to 3.3%. In total, 51 compounds were identified. The main compounds were α-pinene (needles: 21.6–34.9%; twigs: 11.0–24%), β-phellandrene (needles: 4.1–27.7%; twigs: 29.0–49.8%), and β-pinene (needles: 10.0–16.1%; twigs: 6.9–20.7%). HCA (Hierarchical Cluster Analysis) and PCA (Principal Component Analyses) were used to assess geographical variations in essential oil composition. Statistical analysis showed that the analyzed populations are grouped in three main clusters, which seem to reflect the influence of microclimatic conditions on the chemical composition of the essential oils. PMID:27579344
Hajdari, Avni; Mustafa, Behxhet; Nebija, Dashnor; Miftari, Elheme; Quave, Cassandra L; Novak, Johannes
2015-11-01
Ripe cones of Juniperus communis L. (Cupressaceae) were collected from five wild populations in Kosovo, with the aim of investigating the chemical composition and natural variation of essential oils between and within wild populations. Ripe cones were collected, air dried, crushed, and the essential oils obtained by hydrodistillation. The essential-oil constituents were identified by GC-FID and GC/MS analyses. The yield of essential oil differed depending on the population origins and ranged from 0.4 to 3.8% (v/w, based on the dry weight). In total, 42 compounds were identified in the essential oils of all populations. The principal components of the cone-essential oils were α-pinene, followed by β-myrcene, sabinene, and D-limonene. Taking into consideration the yield and chemical composition, the essential oil originating from various collection sites in Kosovo fulfilled the minimum requirements for J. communis essential oils of the European Pharmacopoeia. Hierarchical cluster analysis (HCA) and principal component analysis (PCA) were used to determine the influence of the geographical variations on the essential-oil composition. These statistical analyses suggested that the clustering of populations was not related to their geographic location, but rather appeared to be linked to local selective forces acting on the chemotype diversity. Copyright © 2015 Verlag Helvetica Chimica Acta AG, Zürich.
Yamaguchi-Kabata, Yumi; Tsunoda, Tatsuhiko; Kumasaka, Natsuhiko; Takahashi, Atsushi; Hosono, Naoya; Kubo, Michiaki; Nakamura, Yusuke; Kamatani, Naoyuki
2012-05-01
Although the Japanese population has a rather low genetic diversity, we recently confirmed the presence of two main clusters (the Hondo and Ryukyu clusters) through principal component analysis of genome-wide single-nucleotide polymorphism (SNP) genotypes. Understanding the genetic differences between the two main clusters requires further genome-wide analyses based on a dense SNP set and comparison of haplotype frequencies. In the present study, we determined haplotypes for the Hondo cluster of the Japanese population by detecting SNP homozygotes with 388,591 autosomal SNPs from 18,379 individuals and estimated the haplotype frequencies. Haplotypes for the Ryukyu cluster were inferred by a statistical approach using the genotype data from 504 individuals. We then compared the haplotype frequencies between the Hondo and Ryukyu clusters. In most genomic regions, the haplotype frequencies in the Hondo and Ryukyu clusters were very similar. However, in addition to the human leukocyte antigen region on chromosome 6, other genomic regions (chromosomes 3, 4, 5, 7, 10 and 12) showed dissimilarities in haplotype frequency. These regions were enriched for genes involved in the immune system, cell-cell adhesion and the intracellular signaling cascade. These differentiated genomic regions between the Hondo and Ryukyu clusters are of interest because they (1) should be examined carefully in association studies and (2) likely contain genes responsible for morphological or physiological differences between the two groups.
Cardiometabolic risk clustering in spinal cord injury: results of exploratory factor analysis.
Libin, Alexander; Tinsley, Emily A; Nash, Mark S; Mendez, Armando J; Burns, Patricia; Elrod, Matt; Hamm, Larry F; Groah, Suzanne L
2013-01-01
Evidence suggests an elevated prevalence of cardiometabolic risks among persons with spinal cord injury (SCI); however, the unique clustering of risk factors in this population has not been fully explored. The purpose of this study was to describe unique clustering of cardiometabolic risk factors differentiated by level of injury. One hundred twenty-one subjects (mean 37 ± 12 years; range, 18-73) with chronic C5 to T12 motor complete SCI were studied. Assessments included medical histories, anthropometrics and blood pressure, and fasting serum lipids, glucose, insulin, and hemoglobin A1c (HbA1c). The most common cardiometabolic risk factors were overweight/obesity, high levels of low-density lipoprotein (LDL-C), and low levels of high-density lipoprotein (HDL-C). Risk clustering was found in 76.9% of the population. Exploratory principal component factor analysis using varimax rotation revealed a 3-factor model in persons with paraplegia (65.4% variance) and a 4-factor solution in persons with tetraplegia (73.3% variance). The differences between groups were emphasized by the varied composition of the extracted factors: Lipid Profile A (total cholesterol [TC] and LDL-C), Body Mass-Hypertension Profile (body mass index [BMI], systolic blood pressure [SBP], and fasting insulin [FI]); Glycemic Profile (fasting glucose and HbA1c), and Lipid Profile B (TG and HDL-C). BMI and SBP formed a separate factor only in persons with tetraplegia. Although the majority of the population with SCI has risk clustering, the composition of the risk clusters may be dependent on level of injury, based on a factor analysis group comparison. This is clinically plausible and relevant as tetraplegics tend to be hypo- to normotensive and more sedentary, resulting in lower HDL-C and a greater propensity toward impaired carbohydrate metabolism.
Choudhary, Shashi Bhushan; Sharma, Hariom Kumar; Kumar, Arroju Anil; Maruthi, Rangappa Thimmaiah; Mitra, Jiban; Chowdhury, Isholeena; Singh, Binay Kumar; Karmakar, Pran Gobinda
2017-02-01
A total of 130 flax accessions of diverse morphotypes and worldwide origin were assessed for genetic diversity and population structure using 11 morphological traits and microsatellite markers (15 gSSRs and 7 EST-SSRs). Analysis was performed after classifying these accessions, on the basis of plant height, branching pattern, seed size and Indian/foreign origin, into six categories called sub-populations, viz. fibre type exotic, fibre type indigenous, intermediate type exotic, intermediate type indigenous, linseed type exotic and linseed type indigenous. The study assessed different diversity indices, AMOVA, population structure and included a principal coordinate analysis based on different marker systems. The highest diversity was exhibited by gSSR markers (SI=0.46; He=0.31; P=85.11). AMOVA based on all markers explained significant difference among fibre type, intermediate type and linseed type populations of flax. In terms of variation explained by different markers, EST-SSR markers (12%) better differentiated flax populations compared to morphological (9%) and gSSR (6%) markers at P=0.01. The maximum Nei's unbiased genetic distance (D=0.11) was observed between fibre type and linseed type exotic sub-populations based on EST-SSR markers. The combined structure analysis by using all markers grouped Indian fibre type accessions (63.4%) in a separate cluster along with the Indian intermediate type (48.7%), whereas Indian accessions (82.16%) of linseed type constituted an independent cluster. These findings were supported by the results of the principal coordinate analysis. Morphological markers employed in the study were found to be complementary to microsatellite-based markers in deciphering genetic diversity and population structure of the flax germplasm. Copyright © 2016 Académie des sciences. Published by Elsevier Masson SAS. All rights reserved.
Phylogenetic Evidence for Lateral Gene Transfer in the Intestine of Marine Iguanas
Nelson, David M.; Cann, Isaac K. O.; Altermann, Eric; Mackie, Roderick I.
2010-01-01
Background Lateral gene transfer (LGT) appears to promote genotypic and phenotypic variation in microbial communities in a range of environments, including the mammalian intestine. However, the extent and mechanisms of LGT in intestinal microbial communities of non-mammalian hosts remain poorly understood. Methodology/Principal Findings We sequenced two fosmid inserts obtained from a genomic DNA library derived from an agar-degrading enrichment culture of marine iguana fecal material. The inserts harbored 16S rRNA genes that place the organism from which they originated within Clostridium cluster IV, a well-documented group that inhabits the mammalian intestinal tract. However, sequence analysis indicates that 52% of the protein-coding genes on the fosmids have top BLASTX hits to bacterial species that are not members of Clostridium cluster IV, and phylogenetic analysis suggests that at least 10 of 44 coding genes on the fosmids may have been transferred from Clostridium cluster XIVa to cluster IV. The fosmids encoded four transposase-encoding genes and an integrase-encoding gene, suggesting their involvement in LGT. In addition, several coding genes likely involved in sugar transport were probably acquired through LGT. Conclusion Our phylogenetic evidence suggests that LGT may be common among phylogenetically distinct members of the phylum Firmicutes inhabiting the intestinal tract of marine iguanas. PMID:20520734
State-Space Estimation of Soil Organic Carbon Stock
NASA Astrophysics Data System (ADS)
Ogunwole, Joshua O.; Timm, Luis C.; Obidike-Ugwu, Evelyn O.; Gabriels, Donald M.
2014-04-01
Understanding soil spatial variability and identifying soil parameters most determinant to soil organic carbon stock is pivotal to precision in ecological modelling, prediction, estimation and management of soil within a landscape. This study investigates and describes field soil variability and its structural pattern for agricultural management decisions. The main aim was to relate variation in soil organic carbon stock to soil properties and to estimate soil organic carbon stock from the soil properties. A transect sampling of 100 points at 3 m intervals was carried out. Soils were sampled and analyzed for soil organic carbon and other selected soil properties along with determination of dry aggregate and water-stable aggregate fractions. Principal component analysis, geostatistics, and state-space analysis were conducted on the analyzed soil properties. The first three principal components explained 53.2% of the total variation; Principal Component 1 was dominated by soil exchange complex and dry sieved macroaggregates clusters. An exponential semivariogram model described the structure of soil organic carbon stock with a strong dependence, indicating that soil organic carbon values were correlated up to 10.8 m. Neighbouring values of soil organic carbon stock, all water-stable aggregate fractions, and dithionite and pyrophosphate iron gave a reliable estimate of soil organic carbon stock by state-space.
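As an illustration of the exponential semivariogram mentioned in this abstract, the following is a minimal sketch under stated assumptions. It uses one common convention, gamma(h) = nugget + partial_sill * (1 - exp(-3h / range)), in which the range parameter is the "practical range" where the curve reaches roughly 95% of the sill. The abstract does not say which convention the authors used, and the nugget and sill values below are hypothetical; only the 10.8 m distance comes from the abstract.

```python
import math

def exponential_semivariogram(h, nugget, partial_sill, practical_range):
    """Exponential model with the practical-range convention (an assumption).

    gamma(0) is 0 by definition; for h > 0 the curve starts at the nugget
    and approaches nugget + partial_sill asymptotically.
    """
    if h < 0:
        raise ValueError("lag distance must be non-negative")
    if h == 0:
        return 0.0
    return nugget + partial_sill * (1.0 - math.exp(-3.0 * h / practical_range))

# Hypothetical nugget and sill; 10.8 m is the correlation distance in the abstract.
NUGGET, PARTIAL_SILL, RANGE_M = 0.2, 0.8, 10.8
gamma_at_range = exponential_semivariogram(RANGE_M, NUGGET, PARTIAL_SILL, RANGE_M)
```

Under this convention, pairs of points separated by more than the practical range are treated as effectively uncorrelated.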
Vijaykumar, Archana; Saini, Ajay; Jawali, Narendra
2012-01-01
Background and aims Intra-species hybridization and incompletely homogenized ribosomal RNA repeat units have earlier been reported in 21 accessions of Vigna unguiculata from six subspecies using internal transcribed spacer (ITS) and 5S intergenic spacer (IGS) analyses. However, the relationships among these accessions were not clear from these analyses. We therefore assessed intra-species hybridization in the same set of accessions. Methodology Arbitrarily primed polymerase chain reaction (AP-PCR) analysis was carried out using 12 primers. The PCR products were resolved on agarose gels and the DNA fragments were scored manually. Genetic relationships were inferred by TREECON software using unweighted paired group method with arithmetic averages (UPGMA) cluster analysis evaluated by bootstrapping and compared with previous analyses based on ITS and 5S IGS. Principal results A total of 202 (86 %) fragments were found to be polymorphic and used for generating a genetic distance matrix. Twenty-one V. unguiculata accessions were grouped into three main clusters. The cultivated subspecies (var. unguiculata) and most of its wild progenitors (var. spontanea) were placed in cluster I along with ssp. pubescens and ssp. stenophylla. Whereas var. spontanea were grouped with ssp. alba and ssp. tenuis accessions in cluster II, ssp. alba and ssp. baoulensis were included in cluster III. Close affinities of ssp. unguiculata, ssp. alba and ssp. tenuis suggested inter-subspecies hybridization. Conclusions Multi-locus AP-PCR analysis reveals that intra-species hybridization is prevalent among V. unguiculata subspecies and suggests that grouping of accessions from two different subspecies is not solely due to the similarity in the ITS and 5S IGS regions but also due to other regions of the genome. PMID:22619698
NASA Astrophysics Data System (ADS)
Song, Bowen; Zhang, Guopeng; Wang, Huafeng; Zhu, Wei; Liang, Zhengrong
2013-02-01
Various types of features, e.g., geometric features, texture features, projection features etc., have been introduced for polyp detection and differentiation tasks via computer aided detection and diagnosis (CAD) for computed tomography colonography (CTC). Although these features together cover more information of the data, some of them are statistically highly related to others, which makes the feature set redundant and burdens the computation task of CAD. In this paper, we propose a new dimension reduction method which combines hierarchical clustering and principal component analysis (PCA) for the false positive (FP) reduction task. First, we group all the features based on their similarity using hierarchical clustering, and then PCA is employed within each group. Different numbers of principal components are selected from each group to form the final feature set. A support vector machine is used to perform the classification. The results show that when three principal components are chosen from each group we can achieve an area under the receiver operating characteristic curve of 0.905, which is as high as that of the original dataset. Meanwhile, the computation time is reduced by 70% and the feature set size is reduced by 77%. It can be concluded that the proposed method captures the most important information of the feature set and the classification accuracy is not affected after the dimension reduction. The result is promising, and further investigation, such as automatic threshold setting, is worthwhile and in progress.
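The two-stage reduction described in this abstract (group similar features, then run PCA inside each group) can be sketched roughly as follows. This is not the authors' implementation: the similarity measure (1 minus absolute correlation), the average-linkage rule, the function names, and the toy data are all assumptions made for illustration, and the SVM classification step is omitted.

```python
import numpy as np

def average_linkage_groups(D, n_groups):
    """Naive agglomerative (average-linkage) clustering on a distance matrix D."""
    groups = [[i] for i in range(D.shape[0])]
    while len(groups) > n_groups:
        best = None
        for a in range(len(groups)):
            for b in range(a + 1, len(groups)):
                d = D[np.ix_(groups[a], groups[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        groups[a] = groups[a] + groups[b]  # merge the closest pair of groups
        del groups[b]
    return groups

def grouped_pca(X, n_groups, n_components):
    """Group correlated features, then keep the leading PCs of each group."""
    corr = np.corrcoef(X, rowvar=False)
    groups = average_linkage_groups(1.0 - np.abs(corr), n_groups)
    blocks = []
    for g in groups:
        Xg = X[:, g] - X[:, g].mean(axis=0)
        # Rows of Vt are principal directions, sorted by decreasing variance.
        _, _, Vt = np.linalg.svd(Xg, full_matrices=False)
        k = min(n_components, len(g))
        blocks.append(Xg @ Vt[:k].T)
    return groups, np.hstack(blocks)

# Toy data: six features generated from two independent latent signals.
rng = np.random.default_rng(0)
z = rng.normal(size=(200, 2))
X = np.hstack([z[:, [0]] + 0.05 * rng.normal(size=(200, 3)),
               z[:, [1]] + 0.05 * rng.normal(size=(200, 3))])
groups, Z = grouped_pca(X, n_groups=2, n_components=1)
```

The reduced matrix `Z` would then be passed to a classifier; running PCA per group keeps at least one component from every family of related features instead of letting one large family dominate a single global PCA.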
Utah Principals Academy, 1987-1988.
ERIC Educational Resources Information Center
Utah State Board of Education, Salt Lake City.
Improving instructional leadership skills of principals is the focus of the academy. Following a foreword and mission statement by James R. Moss, the state superintendent of public instruction, the booklet describes three programs that help to achieve the academy's goals: Academy Fellows, Academy Seminars, and Cluster Grants. Titles and authors of…
Wang, Ling; Lan, Xin-Yi; Ji, Jun; Zhang, Chun-Feng; Li, Fei; Wang, Chong-Zhi; Yuan, Chun-Su
2018-06-01
Rheumatoid arthritis (RA) is one of the most prevalent chronic inflammatory and angiogenic diseases. The aim of this study was to evaluate the anti-inflammatory and anti-angiogenic activities in vitro of eight diterpenoids isolated from Daphne genkwa. LC-MS was used to identify diterpenes isolated from D. genkwa. The anti-inflammatory and anti-angiogenic activities of eight diterpenoids were evaluated on LPS-induced macrophage RAW264.7 cells and TNF-α-stimulated human umbilical vein endothelial cells (HUVECs) using hierarchical cluster analysis (HCA) and principal component analysis (PCA). The eight diterpenes isolated from D. genkwa were identified as yuanhuaphnin, isoyuanhuacine, 12-O-(2'E,4'E-decadienoyl)-4-hydroxyphorbol-13-acetyl, yuanhuagine, isoyuanhuadine, yuanhuadine, yuanhuaoate C and yuanhuacine. All the eight diterpenes significantly down-regulated the excessive secretion of TNF-α, IL-6, IL-1β and NO in LPS-induced RAW264.7 macrophages. However, only 12-O-(2'E,4'E-decadienoyl)-4-hydroxyphorbol-13-acetyl markedly reduced production of VEGF, MMP-3, ICAM and VCAM in TNF-α-stimulated HUVECs. HCA obtained 4 clusters, containing 12-O-(2'E,4'E-decadienoyl)-4-hydroxyphorbol-13-acetyl, isoyuanhuacine, isoyuanhuadine and five other compounds. PCA showed that the ranking of diterpenes sorted by efficacy from highest to lowest was 12-O-(2'E,4'E-decadienoyl)-4-hydroxyphorbol-13-acetyl, yuanhuaphnin, isoyuanhuacine, yuanhuacine, yuanhuaoate C, yuanhuagine, isoyuanhuadine, yuanhuadine. In conclusion, eight diterpenes isolated from D. genkwa showed different levels of activity in LPS-induced RAW264.7 cells and TNF-α-stimulated HUVECs. The comprehensive evaluation of activity by HCA and PCA indicated that of the eight diterpenes, 12-O-(2'E,4'E-decadienoyl)-4-hydroxyphorbol-13-acetyl was the best, and can be developed as a new drug for RA therapy.
Detection and tracking of gas plumes in LWIR hyperspectral video sequence data
NASA Astrophysics Data System (ADS)
Gerhart, Torin; Sunu, Justin; Lieu, Lauren; Merkurjev, Ekaterina; Chang, Jen-Mei; Gilles, Jérôme; Bertozzi, Andrea L.
2013-05-01
Automated detection of chemical plumes presents a segmentation challenge. The segmentation problem for gas plumes is difficult due to the diffusive nature of the cloud. The advantage of considering hyperspectral images in the gas plume detection problem over the conventional RGB imagery is the presence of non-visual data, allowing for a richer representation of information. In this paper we present an effective method of visualizing hyperspectral video sequences containing chemical plumes and investigate the effectiveness of segmentation techniques on these post-processed videos. Our approach uses a combination of dimension reduction and histogram equalization to prepare the hyperspectral videos for segmentation. First, Principal Components Analysis (PCA) is used to reduce the dimension of the entire video sequence. This is done by projecting each pixel onto the first few Principal Components resulting in a type of spectral filter. Next, a Midway method for histogram equalization is used. These methods redistribute the intensity values in order to reduce flicker between frames. This properly prepares these high-dimensional video sequences for more traditional segmentation techniques. We compare the ability of various clustering techniques to properly segment the chemical plume. These include K-means, spectral clustering, and the Ginzburg-Landau functional.
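The PCA step in this abstract, projecting every pixel spectrum of the whole video onto the first few principal components, might look like the sketch below. The array layout (frames, rows, cols, bands), the function name, and the random toy cube are assumptions; the Midway equalization and the segmentation stages are not shown.

```python
import numpy as np

def pca_project(cube, n_components=3):
    """Project each pixel spectrum onto the leading principal components.

    cube: array of shape (frames, rows, cols, bands).
    Returns the reduced cube (frames, rows, cols, n_components) and the
    singular values of the centered pixel-by-band matrix.
    """
    f, r, c, b = cube.shape
    X = cube.reshape(-1, b).astype(float)   # one row per pixel, all frames pooled
    Xc = X - X.mean(axis=0)                  # center each spectral band
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:n_components].T        # coordinates in PC space
    return scores.reshape(f, r, c, n_components), s

# Toy stand-in for a hyperspectral video: 4 frames of 8x8 pixels, 16 bands.
rng = np.random.default_rng(0)
video = rng.random((4, 8, 8, 16))
reduced, singular_values = pca_project(video, n_components=3)
```

Pooling all frames before computing the components gives every frame the same projection, so that pixel values stay comparable from frame to frame.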
Type 2 diabetes mellitus: distribution of genetic markers in Kazakh population.
Sikhayeva, Nurgul; Talzhanov, Yerkebulan; Iskakova, Aisha; Dzharmukhanov, Jarkyn; Nugmanova, Raushan; Zholdybaeva, Elena; Ramanculov, Erlan
2018-01-01
Ethnic differences exist in the frequencies of genetic variations that contribute to the risk of common disease. This study aimed to analyse the distribution of several genes, previously associated with susceptibility to type 2 diabetes and obesity-related phenotypes, in a Kazakh population. A total of 966 individuals belonging to the Kazakh ethnicity were recruited from an outpatient clinic. We genotyped 41 common single nucleotide polymorphisms (SNPs) previously associated with type 2 diabetes in other ethnic groups and 31 of these were in Hardy-Weinberg equilibrium. The obtained allele frequencies were further compared to publicly available data from other ethnic populations. Allele frequencies for other (compared) populations were pooled from the haplotype map (HapMap) database. Principal component analysis (PCA), cluster analysis, and multidimensional scaling (MDS) were used for the analysis of genetic relationship between the populations. Comparative analysis of allele frequencies of the studied SNPs showed significant differentiation among the studied populations. The Kazakh population was grouped with Asian populations according to the cluster analysis and with the Caucasian populations according to PCA. According to MDS, results of the current study show that the Kazakh population holds an intermediate position between Caucasian and Asian populations. A high percentage of population differentiation was observed between Kazakh and world populations. The Kazakh population was clustered with Caucasian populations, and this result may indicate a significant Caucasian component in the Kazakh gene pool.
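The Hardy-Weinberg check mentioned in this abstract is commonly done with a chi-square goodness-of-fit test on genotype counts. The sketch below shows that textbook test for a biallelic SNP; the abstract does not state which test or threshold the authors applied, and the function name and example counts are hypothetical.

```python
def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square statistic comparing observed genotype counts with
    Hardy-Weinberg expectations for a biallelic marker.

    Assumes both alleles are present (otherwise an expected count is zero).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# With one degree of freedom, a statistic above 3.841 rejects HWE at the 5% level.
CRITICAL_5_PERCENT = 3.841
in_equilibrium = hwe_chi_square(25, 50, 25) < CRITICAL_5_PERCENT
```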
Khosravi, Rasoul; Rezaei, Hamid Reza; Kaboli, Mohammad
2013-01-01
The genetic threat due to hybridization with free-ranging dogs is one major concern in wolf conservation. The identification of hybrids and extent of hybridization is important in the conservation and management of wolf populations. Genetic variation was analyzed at 15 unlinked loci in 28 dogs, 28 wolves, four known hybrids, two black wolves, and one dog with abnormal traits in Iran. Pritchard's model, multivariate ordination by principal component analysis and neighbor joining clustering were used for population clustering and individual assignment. Analysis of genetic variation showed that genetic variability is high in both wolf and dog populations in Iran. Values of H(E) in dog and wolf samples ranged from 0.75-0.92 and 0.77-0.92, respectively. The results of AMOVA showed that the two groups of dog and wolf were significantly different (F(ST) = 0.05 and R(ST) = 0.36; P < 0.001). In each of the three methods, wolf and dog samples were separated into two distinct clusters. Two dark wolves were assigned to the wolf cluster. Also these models detected D32 (dog with abnormal traits) and some other samples, which were assigned to more than one cluster and could be a hybrid. This study is the beginning of a genetic study in wolf populations in Iran, and our results reveal that as in other countries, hybridization between wolves and dogs is sporadic in Iran and can be a threat to wolf populations if human perturbations increase.
Analysis of the structure and dynamics of human serum albumin.
Guizado, T R Cuya
2014-10-01
Human serum albumin (HSA) is a biologically relevant protein that binds a variety of drugs and other small molecules. No less than 50 structures are deposited in the RCSB Protein Data Bank (PDB). Based on these structures, we first performed a clustering analysis. Despite the diversity of ligands, only two well defined conformations are detected, with a deviation of 0.46 nm between the average structures of the two clusters, while deviations within each cluster are smaller than 0.08 nm. Those two conformations are representative of the apoprotein and the HSA-myristate complex already identified in previous literature. Considering the structures within each cluster as a representative sample of the dynamical states of the corresponding conformation, we scrutinize the structural and dynamical differences between both conformations. Analysis of the fluctuations within each cluster set reveals that domain II is the most rigid one and better matches both structures. Then, taking this domain as reference, we show that the structural difference between both conformations can be expressed in terms of twist and hinge motions of domains I and III, respectively. We also characterize the dynamical difference between conformations by computing correlations and principal components for each set of dynamical states. The two conformations display different collective motions. The results are compared with those obtained from the trajectories of short molecular dynamics simulations, giving consistent outcomes. Let us remark that, beyond the relevance of the results for the structural and dynamical characterization of HSA conformations, the present methodology could be extended to other proteins in the PDB archive.
Organic Food Market Segmentation in Lebanon
NASA Astrophysics Data System (ADS)
Tleis, Malak; Roma, Rocco; Callieris, Roberta
2015-04-01
Organic farming in Lebanon is not a new concept. It started with the efforts of the private sector more than a decade ago and is still present even with the limited agricultural production. The local market is quite developed in comparison to neighboring countries, depending mainly on imports. Few studies have addressed organic consumption in Lebanon, and none of them dealt with organic consumer analysis. Therefore, our objectives were to identify the profiles of Lebanese organic and non-organic consumers and to propose appropriate marketing strategies for each consumer segment, with the final aim of developing the Lebanese organic market. A survey, based on a closed-ended questionnaire, was addressed to 400 consumers in the capital, Beirut, from the end of February till the end of March 2014. Data underwent descriptive analyses, principal component analyses (PCA) and cluster analyses (k-means method) through the statistical software SPSS. Four clusters were obtained based on psychographic characteristics and willingness to pay (WTP) for the principal organic products purchased. The "Localists" and "Health conscious" clusters constituted the largest proportion of the selected sample, and thus were the most critical to be addressed by specific marketing strategies emphasizing the combination of local and organic food and the healthy properties of organic products. The "Rational" and "Irregular" clusters were relatively small groups, addressed by pricing and promotional strategies. This study showed a positive attitude among Lebanese consumers towards organic food, where egoistic motives prevail over altruistic motives. High prices of organic commodities and low trust in organic farming remain constraints on increasing organic consumption. The combined efforts of the public and the private sector are required to spread knowledge about the positive environmental payback of organic agriculture and to promote locally produced organic goods.
The evolution of cerebrotypes in birds.
Iwaniuk, Andrew N; Hurd, Peter L
2005-01-01
Multivariate analyses of brain composition in mammals, amphibians and fish have revealed the evolution of 'cerebrotypes' that reflect specific niches and/or clades. Here, we present the first demonstration of similar cerebrotypes in birds. Using principal component analysis and hierarchical clustering methods to analyze a data set of 67 species, we demonstrate that five main cerebrotypes can be recognized. One type is dominated by galliforms and pigeons, among other species, that all share relatively large brainstems, but can be further differentiated by the proportional size of the cerebellum and telencephalic regions. The second cerebrotype contains a range of species that all share relatively large cerebellar and small nidopallial volumes. A third type is composed of two species, the tawny frogmouth (Podargus strigoides) and an owl, both of which share extremely large Wulst volumes. Parrots and passerines, the principal members of the fourth group, possess much larger nidopallial, mesopallial and striatopallidal proportions than the other groups. The fifth cerebrotype contains species such as raptors and waterfowl that are not found at the extremes for any of the brain regions and could therefore be classified as 'generalist' brains. Overall, the clustering of species does not directly reflect the phylogenetic relationships among species, but there is a tendency for species within an order to clump together. There may also be a weak relationship between cerebrotype and developmental differences, but two of the main clusters contained species with both altricial and precocial developmental patterns. As a whole, the groupings do agree with behavioral and ecological similarities among species. Most notably, species that share similarities in locomotor behavior, mode of prey capture or cognitive ability are clustered together. 
The relationship between cerebrotype and behavior/ecology in birds suggests that future comparative studies of brain-behavior relationships will benefit from adopting a multivariate approach. Copyright 2005 S. Karger AG, Basel.
Gopinath, Kaundinya; Krishnamurthy, Venkatagiri; Sathian, K
2018-02-01
In a recent study, Eklund et al. employed resting-state functional magnetic resonance imaging data as a surrogate for null functional magnetic resonance imaging (fMRI) datasets and posited that cluster-wise family-wise error (FWE) rate-corrected inferences made by using parametric statistical methods in fMRI studies over the past two decades may have been invalid, particularly for cluster defining thresholds less stringent than p < 0.001; this was principally because the spatial autocorrelation functions (sACF) of fMRI data had been modeled incorrectly to follow a Gaussian form, whereas empirical data suggested otherwise. Here, we show that accounting for non-Gaussian signal components such as those arising from resting-state neural activity as well as physiological responses and motion artifacts in the null fMRI datasets yields first- and second-level general linear model analysis residuals with nearly uniform and Gaussian sACF. Further comparison with nonparametric permutation tests indicates that cluster-based FWE corrected inferences made with Gaussian spatial noise approximations are valid.
Robust fiber clustering of cerebral fiber bundles in white matter
NASA Astrophysics Data System (ADS)
Yao, Xufeng; Wang, Yongxiong; Zhuang, Songlin
2014-11-01
Diffusion tensor imaging fiber tracking (DTI-FT) has been widely accepted in the diagnosis and treatment of brain diseases. During the rendering pipeline of specific fiber tracts, the image noise and low resolution of DTI would lead to false propagations. In this paper, we propose a robust fiber clustering (FC) approach to diminish false fibers from one fiber tract. Our algorithm consists of three steps. Firstly, the optimized fiber assignment continuous tracking (FACT) is implemented to reconstruct one fiber tract; and then each curved fiber in the fiber tract is mapped to a point by kernel principal component analysis (KPCA); finally, the point clouds of fiber tract are clustered by hierarchical clustering which could distinguish false fibers from true fibers in one tract. In our experiment, the corticospinal tract (CST) in one case of human data in vivo was used to validate our method. Our method showed reliable capability in decreasing the false fibers in one tract. In conclusion, our method could effectively optimize the visualization of fiber bundles and would help a lot in the field of fiber evaluation.
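The middle step of this pipeline, mapping each fiber to a low-dimensional point with kernel PCA, can be sketched as follows. Several details are assumptions not given in the abstract: each fiber is represented by a fixed number of resampled 3-D points flattened into one vector, the kernel is a Gaussian (RBF) kernel with a hypothetical `gamma`, and the toy fibers are random. Tractography and the final hierarchical clustering are not shown.

```python
import numpy as np

def kernel_pca(X, n_components=2, gamma=1.0):
    """Kernel PCA with an RBF kernel; returns one embedded point per row of X."""
    sq = np.sum(X ** 2, axis=1)
    # Pairwise squared distances, clipped to avoid tiny negative round-off.
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    K = np.exp(-gamma * d2)
    n = K.shape[0]
    one = np.full((n, n), 1.0 / n)
    Kc = K - one @ K - K @ one + one @ K @ one   # center in feature space
    vals, vecs = np.linalg.eigh(Kc)              # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:n_components]
    vals, vecs = vals[idx], vecs[:, idx]
    return vecs * np.sqrt(np.maximum(vals, 0.0))

# Toy "fibers": 12 curves, each resampled to 10 points in 3-D and flattened.
rng = np.random.default_rng(0)
fibers = rng.normal(size=(12, 10, 3))
points = kernel_pca(fibers.reshape(12, -1), n_components=2, gamma=0.01)
```

Each row of `points` is the embedded coordinate of one fiber, which is the form a hierarchical clustering step would consume.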
Kumar, Raj; Sharma, Vishal
2017-03-15
The present research is focused on the analysis of writing inks using destructive UV-Vis spectroscopy (dissolution of ink by the solvent) and non-destructive diffuse reflectance UV-Vis-NIR spectroscopy along with chemometrics. Fifty-seven samples of blue ballpoint pen inks were analyzed under optimum conditions to determine the differences in spectral features of inks among same and different manufacturers. Normalization was performed on the spectroscopic data before chemometric analysis. Principal Component Analysis (PCA) and K-means cluster analysis were used on the data to ascertain whether the blue ballpoint pen inks could be differentiated by their UV-Vis/UV-Vis NIR spectra. The discriminating power is calculated by qualitative analysis, through visual comparison of the spectra (absorbance peaks) produced by the destructive and non-destructive methods. In the latter two methods, the pairwise comparison is made by incorporating the clustering method. It is found that the chemometric method provides better discriminating power (98.72% and 99.46%, in destructive and non-destructive, respectively) in comparison to the qualitative analysis (69.67%). Copyright © 2016 Elsevier B.V. All rights reserved.
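Discriminating power in pairwise ink comparisons is usually defined as the number of discriminated sample pairs divided by the total number of possible pairs. The sketch below applies that definition to cluster labels, assuming that two samples count as discriminated when they fall into different clusters; the abstract does not spell out the authors' exact rule, and the labels here are made up.

```python
from itertools import combinations

def discriminating_power(labels):
    """Return (fraction of sample pairs with different labels, number of pairs)."""
    pairs = list(combinations(range(len(labels)), 2))
    discriminated = sum(1 for i, j in pairs if labels[i] != labels[j])
    return discriminated / len(pairs), len(pairs)

# Hypothetical cluster labels for four ink samples.
dp, n_pairs = discriminating_power([0, 0, 1, 2])

# For 57 samples, as in the abstract, the number of possible pairs is 57 * 56 / 2.
_, pairs_for_57 = discriminating_power(list(range(57)))
```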
Hou, Deyi; O'Connor, David; Nathanail, Paul; Tian, Li; Ma, Yan
2017-12-01
Heavy metal soil contamination is associated with potential toxicity to humans or ecotoxicity. Scholars have increasingly used a combination of geographical information science (GIS) with geostatistical and multivariate statistical analysis techniques to examine the spatial distribution of heavy metals in soils at a regional scale. A review of such studies showed that most soil sampling programs were based on grid patterns and composite sampling methodologies. Many programs intended to characterize various soil types and land use types. The most often used sampling depth intervals were 0-0.10 m, or 0-0.20 m, below surface; and the sampling densities used ranged from 0.0004 to 6.1 samples per km², with a median of 0.4 samples per km². The most widely used spatial interpolators were inverse distance weighted interpolation and ordinary kriging; and the most often used multivariate statistical analysis techniques were principal component analysis and cluster analysis. The review also identified several determining and correlating factors in heavy metal distribution in soils, including soil type, soil pH, soil organic matter, land use type, Fe, Al, and heavy metal concentrations. The major natural and anthropogenic sources of heavy metals were found to derive from lithogenic origin, roadway and transportation, atmospheric deposition, wastewater and runoff from industrial and mining facilities, fertilizer application, livestock manure, and sewage sludge. This review argues that the full potential of integrated GIS and multivariate statistical analysis for assessing heavy metal distribution in soils on a regional scale has not yet been fully realized. 
It is proposed that future research be conducted to map multivariate results in GIS to pinpoint specific anthropogenic sources, to analyze temporal trends in addition to spatial patterns, to optimize modeling parameters, and to expand the use of different multivariate analysis tools beyond principal component analysis (PCA) and cluster analysis (CA). Copyright © 2017 Elsevier Ltd. All rights reserved.
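Inverse distance weighted (IDW) interpolation, one of the two interpolators this review names as most widely used, estimates a value at an unsampled location as a weighted mean of the sampled values, with weights proportional to 1 / distance^power. A minimal sketch follows; the power of 2, the small `eps` guard, and the sample coordinates and concentrations are illustrative assumptions.

```python
import numpy as np

def idw(xy_known, values, xy_query, power=2.0, eps=1e-12):
    """Inverse distance weighted interpolation.

    xy_known: (n, 2) sample coordinates; values: (n,) measurements;
    xy_query: (m, 2) locations to estimate. eps avoids division by zero
    when a query point coincides with a sample.
    """
    d = np.linalg.norm(xy_query[:, None, :] - xy_known[None, :, :], axis=2)
    w = 1.0 / (d + eps) ** power
    return (w @ values) / w.sum(axis=1)

# Two hypothetical sampling points with made-up concentrations.
samples = np.array([[0.0, 0.0], [1.0, 0.0]])
conc = np.array([10.0, 20.0])
queries = np.array([[0.5, 0.0], [0.0, 0.0]])
estimates = idw(samples, conc, queries)
```

A query midway between two samples receives their average, and a query on top of a sample reproduces that sample's value; unlike kriging, IDW does not use a fitted spatial correlation model.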
Li, Yaqian; Du, Xilin; Lu, Zhi John; Wu, Daqiang; Zhao, Yilei; Ren, Bin; Huang, Jiaofang; Huang, Xianqing; Xu, Yuhong; Xu, Yuquan
2011-01-01
Background Phenazines are important compounds produced by pseudomonads and other bacteria. Two phz gene clusters, called phzA1-G1 and phzA2-G2, were found in the genome of Pseudomonas sp. M18, an effective biocontrol agent that is highly homologous to the opportunistic human pathogen P. aeruginosa PAO1. However, little is known about the correlation between the expression of the two phz gene clusters. Methodology/Principal Findings Chromosomal insertion-inactivated mutants for each of the two gene clusters were constructed, and the correlation between the expression of the two phz gene clusters was investigated in strain M18. Phenazine-1-carboxylic acid (PCA) molecules produced from the phzA2-G2 gene cluster are able to auto-regulate that cluster's own expression and activate the expression of the phzA1-G1 gene cluster in a circulated amplification pattern. However, the post-transcriptional expression of the phzA1-G1 transcript was blocked principally through the 5′-untranslated region (UTR). In contrast, the phzA2-G2 gene cluster was transcribed to a lesser extent and translated efficiently and was negatively regulated by the GacA signal transduction pathway, mainly at a post-transcriptional level. Conclusions/Significance A single molecule, PCA, produced in different quantities by the two phz gene clusters acted as the functional mediator, and the two phz gene clusters developed a specific regulatory mechanism which acts through the 5′-UTR to transfer a single, but complex bacterial signaling event in Pseudomonas sp. strain M18. PMID:21559370
Arboleda, Mark; Reichardt, Wolfgang
2009-01-01
In search for microbiological indicators of coral health and coral diseases, community profiles of coral-associated epizoic prokaryotes were investigated because of their dual potential as a source of coral pathogens and their antagonists. In pairwise samples of visually healthy and diseased coral specimens from Bolinao Bay (Pangasinan, Philippines), mixed biofilm communities of ectoderm- and mucus-colonizing epizoic prokaryotes were compared using fluorescent in situ hybridization (FISH). Oligonucleotide probes targeted 13 phylotypes representing the main taxonomic groups of marine prokaryotes. Coral taxa tended to show specific community profiles. An attempt to separate the profiles of healthy and diseased specimens by applying principal component analysis (PCA) to a (nonselective) collection of corals (affected by various diseases) proved unsuccessful. On the other hand, separate PCA clusters were obtained from healthy and diseased corals belonging to a single species (Pocillopora damicornis) only. This cluster formation was dominated by principal component 1 with the genus Vibrio accounting for 18%. At the same time, reef-site-specific clusters were formed as well. At a reef site exposed to pollution from intensive fish cage (Chanos chanos) farming, healthy P. damicornis were mainly (93%) colonized by unicellular cyanobacteria. The formal calculation of diversity parameters suggested that evenness in particular was driven by both health status and reef site location. Despite the low resolution of taxonomic levels achieved with FISH probes targeting only large phylotype groups, significant differences between healthy and diseased corals and also between polluted and nonpolluted reef sites were observed.
Jiang, Jheng Jie; Lee, Chon Lin; Fang, Meng Der; Boyd, Kenneth G.; Gibb, Stuart W.
2015-01-01
This paper presents a methodology based on multivariate data analysis for characterizing potential source contributions of emerging contaminants (ECs) detected in 26 river water samples across multi-scape regions during dry and wet seasons. Based on this methodology, we unveil an approach toward potential source contributions of ECs, a concept we refer to as the “Pharmaco-signature.” Exploratory analysis of data points has been carried out by unsupervised pattern recognition (hierarchical cluster analysis, HCA) and a receptor model (principal component analysis-multiple linear regression, PCA-MLR) in an attempt to demonstrate significant source contributions of ECs in different land-use zones. Robust cluster solutions grouped the database according to different EC profiles. PCA-MLR identified that 58.9% of the mean summed ECs were contributed by domestic impact, 9.7% by antibiotics application, and 31.4% by drug abuse. Diclofenac, ibuprofen, codeine, ampicillin, tetracycline, and erythromycin-H2O have significant pollution risk quotients (RQ>1), indicating potentially high risk to aquatic organisms in Taiwan. PMID:25874375
Contrast improvement of terahertz images of thin histopathologic sections
Formanek, Florian; Brun, Marc-Aurèle; Yasuda, Akio
2011-01-01
We present terahertz images of 10 μm thick histopathologic sections obtained in reflection geometry with a time-domain spectrometer, and demonstrate improved contrast for sections measured in paraffin with water. Automated segmentation is applied to the complex refractive index data to generate clustered terahertz images distinguishing cancer from healthy tissues. The degree of classification of pixels is then evaluated using registered visible microscope images. Principal component analysis and propagation simulations are employed to investigate the origin and the gain of image contrast. PMID:21326635
Fogel, Paul; Gaston-Mathé, Yann; Hawkins, Douglas; Fogel, Fajwel; Luta, George; Young, S. Stanley
2016-01-01
Often data can be represented as a matrix, e.g., observations as rows and variables as columns, or as a doubly classified contingency table. Researchers may be interested in clustering the observations, the variables, or both. If the data is non-negative, then Non-negative Matrix Factorization (NMF) can be used to perform the clustering. By its nature, NMF-based clustering is focused on the large values. If the data is normalized by subtracting the row/column means, it becomes of mixed signs and the original NMF cannot be used. Our idea is to split and then concatenate the positive and negative parts of the matrix, after taking the absolute value of the negative elements. NMF applied to the concatenated data, which we call PosNegNMF, offers the advantages of the original NMF approach, while giving equal weight to large and small values. We use two public health datasets to illustrate the new method and compare it with alternative clustering methods, such as K-means and clustering methods based on the Singular Value Decomposition (SVD) or Principal Component Analysis (PCA). With the exception of situations where a reasonably accurate factorization can be achieved using the first SVD component, we recommend that the epidemiologists and environmental scientists use the new method to obtain clusters with improved quality and interpretability. PMID:27213413
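The split-and-concatenate step described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: row centering and column-wise concatenation are assumptions (the abstract only says the positive and negative parts are split and concatenated), and the NMF step itself is omitted.

```python
def center_rows(matrix):
    """Subtract each row's mean, which produces mixed-sign data."""
    centered = []
    for row in matrix:
        mean = sum(row) / len(row)
        centered.append([value - mean for value in row])
    return centered


def split_pos_neg(matrix):
    """Place the positive parts next to the absolute negative parts.

    Each row of width n becomes a non-negative row of width 2n, so a
    standard NMF routine could be applied to the result.
    """
    augmented = []
    for row in matrix:
        positive = [max(value, 0.0) for value in row]
        negative = [max(-value, 0.0) for value in row]
        augmented.append(positive + negative)
    return augmented


# Hypothetical toy data: two observations, three variables.
data = [[1.0, 2.0, 6.0], [4.0, 4.0, 1.0]]
augmented = split_pos_neg(center_rows(data))
for row in augmented:
    print(row)
```

Large negative deviations now appear as large positive entries in the second half of each row, which is how small original values can receive the same weight as large ones.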
Mwangi, Benson; Soares, Jair C; Hasan, Khader M
2014-10-30
Neuroimaging machine learning studies have largely utilized supervised algorithms - meaning they require both neuroimaging scan data and corresponding target variables (e.g. healthy vs. diseased) to be successfully 'trained' for a prediction task. Noticeably, this approach may not be optimal or possible when the global structure of the data is not well known and the researcher does not have an a priori model to fit the data. We set out to investigate the utility of an unsupervised machine learning technique, t-distributed stochastic neighbour embedding (t-SNE), in identifying 'unseen' sample population patterns that may exist in high-dimensional neuroimaging data. Multimodal neuroimaging scans from 92 healthy subjects were pre-processed using atlas-based methods, integrated and input into the t-SNE algorithm. Patterns and clusters discovered by the algorithm were visualized using a 2D scatter plot and further analyzed using the K-means clustering algorithm. t-SNE was evaluated against classical principal component analysis. Remarkably, based on unlabelled multimodal scan data, t-SNE separated study subjects into two very distinct clusters which corresponded to subjects' gender labels (cluster silhouette index value=0.79). The resulting clusters were used to develop an unsupervised minimum distance clustering model which identified 93.5% of subjects' gender. Notably, from a neuropsychiatric perspective this method may allow discovery of data-driven disease phenotypes or sub-types of treatment responders. Copyright © 2014 Elsevier B.V. All rights reserved.
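The cluster silhouette index reported above can be illustrated with a short sketch. It assumes the textbook mean silhouette width with Euclidean distance on a 2D embedding; the abstract does not say which distance or software was used, and the points below are hypothetical.

```python
import math


def silhouette_index(points, labels):
    """Mean silhouette width over all points (textbook definition)."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        # (the distance of the point to itself contributes zero to the sum)
        a = sum(math.dist(point, other) for other in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(point, other) for other in members) / len(members)
            for other_label, members in clusters.items()
            if other_label != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical 2D embedding with two well-separated groups.
embedding = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
groups = [0, 0, 1, 1]
score = silhouette_index(embedding, groups)
print(round(score, 3))
```

Values close to 1 indicate compact, well-separated clusters; values near 0 indicate overlapping clusters.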
Comparison of organs' shapes with geometric and Zernike 3D moments.
Broggio, D; Moignier, A; Ben Brahim, K; Gardumi, A; Grandgirard, N; Pierrat, N; Chea, M; Derreumaux, S; Desbrée, A; Boisserie, G; Aubert, B; Mazeron, J-J; Franck, D
2013-09-01
The morphological similarity of organs is studied with feature vectors based on geometric and Zernike 3D moments. In particular, it is investigated whether outliers and average models can be identified. For this purpose, the relative proximity to the mean feature vector is defined, and principal coordinate and clustering analyses are also performed. To study the consistency and usefulness of this approach, 17 liver and 76 heart voxel models from several sources are considered. In the liver case, models with similar morphological features are identified. For the limited number of cases studied, the liver of the ICRP male voxel model is identified as a better surrogate than the female one. For hearts, the clustering analysis shows that three heart shapes represent about 80% of the morphological variations. The relative proximity and clustering analyses rather consistently identify outliers and average models. For the two cases, identification of outliers and of surrogates for average models is rather robust. However, deeper classification of morphological features is subject to caution and can only be performed after cross analysis of at least two kinds of feature vectors. Finally, the Zernike moments contain all the information needed to reconstruct the studied objects and thus appear as a promising tool to derive statistical organ shapes. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
2013-01-01
Background The publication of protocols by medical journals is increasingly becoming an accepted means for promoting good quality research and maximising transparency. Recently, Finfer and Bellomo have suggested the publication of statistical analysis plans (SAPs). The aim of this paper is to make public and to report in detail the planned analyses that were approved by the Trial Steering Committee in May 2010 for the principal papers of the PACE (Pacing, graded Activity, and Cognitive behaviour therapy: a randomised Evaluation) trial, a treatment trial for chronic fatigue syndrome. It illustrates planned analyses of a complex intervention trial that allows for the impact of clustering by care providers, where multiple care-providers are present for each patient in some but not all arms of the trial. Results The trial design, objectives and data collection are reported. Considerations relating to blinding, samples, adherence to the protocol, stratification, centre and other clustering effects, missing data, multiplicity and compliance are described. Descriptive, interim and final analyses of the primary and secondary outcomes are then outlined. Conclusions This SAP maximises transparency, providing a record of all planned analyses, and it may be a resource for those who are developing SAPs, acting as an illustrative example for teaching and methodological research. It is not the sum of the statistical analysis sections of the principal papers, being completed well before individual papers were drafted. Trial registration ISRCTN54285094 assigned 22 May 2003; First participant was randomised on 18 March 2005. PMID:24225069
Statistical analyses and characteristics of volcanic tremor on Stromboli Volcano (Italy)
NASA Astrophysics Data System (ADS)
Falsaperla, S.; Langer, H.; Spampinato, S.
A study of volcanic tremor on Stromboli is carried out on the basis of data recorded daily between 1993 and 1995 by a permanent seismic station (STR) located 1.8km away from the active craters. We also consider the signal of a second station (TF1), which operated for a shorter time span. Changes in the spectral tremor characteristics can be related to modifications in volcanic activity, particularly to lava effusions and explosive sequences. Statistical analyses were carried out on a set of spectra calculated daily from seismic signals where explosion quakes were present or excluded. Principal component analysis and cluster analysis were applied to identify different classes of spectra. Three clusters of spectra are associated with two different states of volcanic activity. One cluster corresponds to a state of low to moderate activity, whereas the two other clusters are present during phases with a high magma column as inferred from the occurrence of lava fountains or effusions. We therefore conclude that variations in volcanic activity at Stromboli are usually linked to changes in the spectral characteristics of volcanic tremor. Site effects are evident when comparing the spectra calculated from signals synchronously recorded at STR and TF1. However, some major spectral peaks at both stations may reflect source properties. Statistical considerations and polarization analysis are in favor of a prevailing presence of P-waves in the tremor signal along with a position of the source northwest of the craters and at shallow depth.
Pathak, Bhuvan; Ayala-Silva, Tomas; Yang, Xiping; Todd, James; Glynn, Neil C.; Kuhn, David N.; Glaz, Barry; Gilbert, Robert A.; Comstock, Jack C.; Wang, Jianping
2014-01-01
Sugarcane (Saccharum spp.) and other members of Saccharum spp. are attractive biofuel feedstocks. One of the two World Collections of Sugarcane and Related Grasses (WCSRG) is in Miami, FL. This WCSRG has 1002 accessions, presumably with valuable alleles for biomass, other important agronomic traits, and stress resistance. However, the WCSRG has not been fully exploited by breeders due to its lack of characterization and unmanageable population. In order to optimize the use of this genetic resource, we aim to 1) genotypically evaluate all the 1002 accessions to understand its genetic diversity and population structure and 2) form a core collection, which captures most of the genetic diversity in the WCSRG. We screened 36 microsatellite markers on 1002 genotypes and recorded 209 alleles. Genetic diversity of the WCSRG ranged from 0 to 0.5 with an average of 0.304. The population structure analysis and principal coordinate analysis revealed three clusters with all S. spontaneum in one cluster, S. officinarum and S. hybrids in the second cluster and mostly non-Saccharum spp. in the third cluster. A core collection of 300 accessions was identified which captured the maximum genetic diversity of the entire WCSRG which can be further exploited for sugarcane and energy cane breeding. Sugarcane and energy cane breeders can effectively utilize this core collection for cultivar improvement. Further, the core collection can provide resources for forming an association panel to evaluate the traits of agronomic and commercial importance. PMID:25333358
Ishii, Genichiro; Aoyagi, Kazuhiko; Sasaki, Hiroki; Ochiai, Atsushi
2015-01-01
Background Fibroblasts are the principal stromal cells that exist in whole organs and play vital roles in many biological processes. Although the functional diversity of fibroblasts has been estimated, a comprehensive analysis of fibroblasts from the whole body has not been performed and their transcriptional diversity has not been sufficiently explored. The aim of this study was to elucidate the transcriptional diversity of human fibroblasts within the whole body. Methods Global gene expression analysis was performed on 63 human primary fibroblasts from 13 organs. Of these, 32 fibroblasts from gastrointestinal organs (gastrointestinal fibroblasts: GIFs) were obtained from a pair of 2 anatomical sites: the submucosal layer (submucosal fibroblasts: SMFs) and the subperitoneal layer (subperitoneal fibroblasts: SPFs). Using hierarchical clustering analysis, we elucidated identifiable subgroups of fibroblasts and analyzed the transcriptional character of each subgroup. Results In unsupervised clustering, 2 major clusters that separate GIFs and non-GIFs were observed. Organ- and anatomical site-dependent clusters within GIFs were also observed. The signature genes that discriminated GIFs from non-GIFs, SMFs from SPFs, and the fibroblasts of one organ from another organ consisted of genes associated with transcriptional regulation, signaling ligands, and extracellular matrix remodeling. Conclusions GIFs are characteristic fibroblasts with specific gene expressions from transcriptional regulation, signaling ligands, and extracellular matrix remodeling related genes. In addition, the anatomical site- and organ-dependent diversity of GIFs was also discovered. These features of GIFs contribute to their specific physiological function and homeostatic maintenance, and create a functional diversity of the gastrointestinal tract. PMID:26046848
Chemical study of the metal-rich globular cluster NGC 5927
NASA Astrophysics Data System (ADS)
Mura-Guzmán, A.; Villanova, S.; Muñoz, C.; Tang, B.
2018-03-01
Globular clusters (GCs) are natural laboratories where stellar and chemical evolution can be studied in detail. In addition, their chemical patterns and kinematics can tell us to which Galactic structure (disc, bulge, halo or extragalactic) the cluster belongs. NGC 5927 is one of the most metal-rich GCs in the Galaxy and its kinematics links it to the thick disc. We present an abundance analysis based on high-resolution spectra of seven giant stars. The data were obtained using the Fibre Large Array Multi Element Spectrograph/Ultraviolet Echelle Spectrograph (UVES) mounted on the UT2 telescope of the European Southern Observatory. The principal objective of this work is to perform a wide and detailed chemical abundance analysis of the cluster and look for possible Multiple Populations (MPs). We determined stellar parameters and measured 22 elements corresponding to light (Na, Al), alpha (O, Mg, Si, Ca, Ti), iron-peak (Sc, V, Cr, Mn, Fe, Co, Ni, Cu, Zn), and heavy elements (Y, Zr, Ba, Ce, Nd, Eu). We found a mean iron content of [Fe/H] = -0.47 ± 0.02 (error on the mean). We confirm the existence of MPs in this GC with an O-Na anti-correlation, and moderate spread in Al abundances. We estimate a mean [α/Fe] = 0.25 ± 0.08. Iron-peak elements show no significant spread. The [Ba/Eu] ratios indicate a predominant contribution from SNeII for the formation of the cluster.
Research on potential user identification model for electric energy substitution
NASA Astrophysics Data System (ADS)
Xia, Huaijian; Chen, Meiling; Lin, Haiying; Yang, Shuo; Miao, Bo; Zhu, Xinzhi
2018-01-01
The implementation of electric energy substitution plays an important role in promoting energy conservation and emission reduction in China. Using data from an energy service management platform, the enterprise production value, product output, and consumption of coal and other energy sources are taken as evaluation indices for potential substitution users. A principal component analysis model condenses these indices into comprehensive characteristic indices that retain the information in the original variables, and a fuzzy clustering model provides a flexible classification of users within the same industry. Based on the comprehensive indices and the user cluster assignments, a particle optimization neural network classification model is constructed to predict each user's electric energy substitution potential. The results of an example show that the model can effectively predict users' substitution potential.
Tchabo, William; Ma, Yongkun; Kwaw, Emmanuel; Zhang, Haining; Xiao, Lulu; Tahir, Haroon Elrasheid
2017-10-01
The present study was undertaken to assess accelerating aging effects of high pressure, ultrasound and manosonication on the aromatic profile and sensorial attributes of aged mulberry wines (AMW). A total of 166 volatile compounds were found amongst the AMW. The outcomes of the investigation were presented by means of geometric mean (GM), cluster analysis (CA), principal component analysis (PCA), partial least squares regressions (PLSR) and principal component regression (PCR). GM highlighted 24 organoleptic attributes responsible for the sensorial profile of the AMW. Moreover, CA revealed that the volatile composition of the non-thermal accelerated aged wines differs from that of the conventional aged wines. Besides, PCA discriminated the AMW on the basis of their main sensorial characteristics. Furthermore, PLSR identified 75 aroma compounds which were mainly responsible for the olfactory notes of the AMW. Finally, the overall quality of the AMW was noted to be better predicted by PLSR than PCR. Copyright © 2017 Elsevier Ltd. All rights reserved.
2013-01-01
Background Various diet- and activity-related parenting practices are positive determinants of child dietary and activity behaviour, including home availability, parental modelling and parental policies. There is evidence that parenting practices cluster within the dietary domain and within the activity domain. This study explores whether diet- and activity-related parenting practices cluster across the dietary and activity domain. Also examined is whether the clusters are related to child and parental background characteristics. Finally, to indicate the relevance of the clusters in influencing child dietary and activity behaviour, we examined whether clusters of parenting practices are related to these behaviours. Methods Data were used from 1480 parent–child dyads participating in the Dutch IVO Nutrition and Physical Activity Child cohorT (INPACT). Parents of children aged 8–11 years completed questionnaires at home assessing their diet- and activity-related parenting practices, child and parental background characteristics, and child dietary and activity behaviours. Principal component analysis (PCA) was used to identify clusters of parenting practices. Backward regression analysis was used to examine the relationship between child and parental background characteristics with cluster scores, and partial correlations to examine associations between cluster scores and child dietary and activity behaviours. Results PCA revealed five clusters of parenting practices: 1) high visibility and accessibility of screens and unhealthy food, 2) diet- and activity-related rules, 3) low availability of unhealthy food, 4) diet- and activity-related positive modelling, and 5) positive modelling on sports and fruit. Low parental education was associated with unhealthy cluster 1, while high(er) education was associated with healthy clusters 2, 3 and 5. 
Separate clusters were related to both child dietary and activity behaviour in the hypothesized directions: healthy clusters were positively related to obesity-reducing behaviours and negatively to obesity-inducing behaviours. Conclusion Parenting practices cluster across the dietary and activity domain. Parental education can be seen as an indicator of a broader parental context in which clusters of parenting practices operate. Separate clusters are related to both child dietary and activity behaviour. Interventions that focus on clusters of parenting practices to assist parents (especially low-educated parents) in changing their child’s dietary and activity behaviour seem justified. PMID:23531232
[Study on Commercial Specification of Lonicerae Japonicae Flos].
Zhou, Jie; Zou, Lin; Liu, Wei; Bian, Li-hua; Wang, Xiao; Zhang, Yong-qing; Dan, Staerk
2015-04-01
To provide basic data for establishing a commercial specification standard for Lonicerae Japonicae Flos, 39 commercial samples of Lonicerae Japonicae Flos of different grades were collected from the market. A vernier caliper and an electronic balance were used to measure the number of flower buds and the blooming rate per 0.5 g, contamination content, browning degree, mildew and rot, length, and upper, middle and bottom diameters. The contents of neochlorogenic acid, chlorogenic acid, cryptochlorogenic acid, rutin, galuteolin, 3,5-dicaffeoylquinic acid and 4,5-dicaffeoylquinic acid were determined by HPLC. Correlation analysis, principal component analysis and cluster analysis were performed in SPSS on all index data, and the correlation between appearance characteristics and intrinsic active constituents was discussed. The number of flower buds and the blooming rate per 0.5 g, contamination content and browning degree were the principal component indexes. The length of the flower bud showed a significant correlation with galuteolin content, and the browning degree and upper diameter showed a significant correlation with chlorogenic acid content. Commercial Lonicerae Japonicae Flos should be divided into four specification grades according to the screened indexes.
Bao, Zhihua; Ikunaga, Yoko; Matsushita, Yuko; Morimoto, Sho; Takada-Hoshino, Yuko; Okada, Hiroaki; Oba, Hirosuke; Takemoto, Shuhei; Niwa, Shigeru; Ohigashi, Kentaro; Suzuki, Chika; Nagaoka, Kazunari; Takenaka, Makoto; Urashima, Yasufumi; Sekiguchi, Hiroyuki; Kushida, Atsuhiko; Toyota, Koki; Saito, Masanori; Tsushima, Seiya
2012-01-01
We simultaneously examined the bacteria, fungi and nematode communities in Andosols from four agro-geographical sites in Japan using polymerase chain reaction-denaturing gradient gel electrophoresis (PCR-DGGE) and statistical analyses to test the effects of environmental factors including soil properties on these communities depending on geographical sites. Statistical analyses such as Principal component analysis (PCA) and Redundancy analysis (RDA) revealed that the compositions of the three soil biota communities were strongly affected by geographical sites, which were in turn strongly associated with soil characteristics such as total C (TC), total N (TN), C/N ratio and annual mean soil temperature (ST). In particular, the TC, TN and C/N ratio had stronger effects on bacterial and fungal communities than on the nematode community. Additionally, two-way cluster analysis using the combined DGGE profile also indicated that all soil samples were classified into four clusters corresponding to the four sites, showing high site specificity of soil samples, and all DNA bands were classified into four clusters, showing the coexistence of specific DGGE bands of bacteria, fungi and nematodes in Andosol fields. The results of this study suggest that geography relative to soil properties has a simultaneous impact on soil microbial and nematode community compositions. This is the first combined profile analysis of bacteria, fungi and nematodes at different sites with agricultural Andosols. PMID:22223474
Melo, Armindo; Pinto, Edgar; Aguiar, Ana; Mansilha, Catarina; Pinho, Olívia; Ferreira, Isabel M P L V O
2012-07-01
A monitoring program of nitrate, nitrite, potassium, sodium, and pesticides was carried out in water samples from an intensive horticulture area in a vulnerable zone from north of Portugal. Eight collecting points were selected and water-analyzed in five sampling campaigns, during 1 year. Chemometric techniques, such as cluster analysis, principal component analysis (PCA), and discriminant analysis, were used in order to understand the impact of intensive horticulture practices on dug and drilled wells groundwater and to study variations in the hydrochemistry of groundwater. PCA performed on pesticide data matrix yielded seven significant PCs explaining 77.67% of the data variance. Although PCA rendered considerable data reduction, it could not clearly group and distinguish the sample types. However, a visible differentiation between the water samples was obtained. Cluster and discriminant analysis grouped the eight collecting points into three clusters of similar characteristics pertaining to water contamination, indicating that it is necessary to improve the use of water, fertilizers, and pesticides. Inorganic fertilizers such as potassium nitrate were suspected to be the most important factors for nitrate contamination since highly significant Pearson correlation (r = 0.691, P < 0.01) was obtained between groundwater nitrate and potassium contents. Water from dug wells is especially prone to contamination from the grower and their closer neighbor's practices. Water from drilled wells is also contaminated from distant practices.
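The Pearson correlation used above to relate nitrate and potassium contents can be computed as in the following sketch. The concentrations are hypothetical and chosen only to make the arithmetic easy to follow; they are not the monitoring data, and no significance test is included.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient of two samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    return covariation / math.sqrt(spread_x * spread_y)


# Hypothetical groundwater concentrations (mg/L) at five sampling points.
nitrate = [10.0, 20.0, 30.0, 40.0, 50.0]
potassium = [2.0, 1.0, 4.0, 3.0, 5.0]

r = pearson_r(nitrate, potassium)
print(round(r, 3))  # 0.8 for these toy values
```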
Zhang, Xianming; Lohmann, Rainer; Dassuncao, Clifton; Hu, Xindi C.; Weber, Andrea K.; Vecitis, Chad D.; Sunderland, Elsie M.
2017-01-01
Exposure to poly- and perfluoroalkyl substances (PFASs) has been associated with adverse health effects in humans and wildlife. Understanding pollution sources is essential for environmental regulation but source attribution for PFASs has been confounded by limited information on industrial releases and rapid changes in chemical production. Here we use principal component analysis (PCA), hierarchical clustering, and geospatial analysis to understand source contributions to 14 PFASs measured across 37 sites in the Northeastern United States in 2014. PFASs are significantly elevated in urban areas compared to rural sites except for perfluorobutane sulfonate (PFBS), N-methyl perfluorooctanesulfonamidoacetic acid (N-MeFOSAA), perfluoroundecanoate (PFUnDA) and perfluorododecanoate (PFDoDA). The highest PFAS concentrations across sites were for perfluorooctanoate (PFOA, 56 ng L⁻¹) and perfluorohexane sulfonate (PFHxS, 43 ng L⁻¹), and PFOS levels are lower than earlier measurements of U.S. surface waters. PCA and cluster analysis indicate three main statistical groupings of PFASs. Geospatial analysis of watersheds reveals the first component/cluster originates from a mixture of contemporary point sources such as airports and textile mills. Atmospheric sources from the waste sector are consistent with the second component, and the metal smelting industry plausibly explains the third component. We find this source-attribution technique is effective for better understanding PFAS sources in urban areas. PMID:28217711
NASA Astrophysics Data System (ADS)
Milev, M.; Nikolova, Kr.; Ivanova, Ir.; Dobreva, M.
2015-11-01
Twenty-five olive oils of different origin and extraction method were studied with respect to 17 physico-chemical parameters: the color parameters a and b, lightness, fluorescence peaks, the pigments chlorophyll and β-carotene, and fatty-acid content. The goals of the study were (i) to conduct a correlation analysis to find the internal relations between the studied indices; (ii) to apply factor analysis by the method of principal components (PCA) to reduce the large number of variables to a few factors that are most important for distinguishing the different types of olive oil; and (iii) to use K-means clustering to compare and group the tested olive oils by similarity. The internal relations between the studied indices were found by correlation analysis, and a factor analysis using PCA was applied to the resulting correlation matrix. The studied indices were thereby reduced to 4 factors, which explained 79.3% of the total variation. The first factor combined the color parameters, β-carotene and the fluorescence peak at about 520 nm, which is related to oxidation products. The second was determined mainly by the chlorophyll content and its associated fluorescence peak at about 670 nm. The third and fourth factors were determined by the fatty-acid content of the samples. The third combined the fatty acids that make it possible to distinguish olive oil from other plant oils: oleic, linoleic and stearic acids. The fourth factor included fatty acids present at much lower levels in the studied samples. K-means cluster analysis requires the number of clusters to be specified in advance; K = 3 was chosen because there were three types of olive oil. The first cluster contained all salad and pomace olive oils, and the second contained the samples of extra virgin oils taken as controls from producers, which were bought from the trade network. The third cluster contained pomace and extra virgin oil samples whose parameters differ from those of natural olive oils because of the presence of plant-oil impurities.
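The K-means step with a preset K = 3 can be sketched as below. This is a plain Lloyd-style iteration with caller-supplied starting centroids, written as an illustration rather than as the software used in the study; the two-dimensional "factor scores" are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, initial_centroids, max_iter=100):
    """Assign points to the nearest centroid and update until stable."""
    centroids = [list(c) for c in initial_centroids]
    labels = [None] * len(points)
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centroids)),
                key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments no longer change
        labels = new_labels
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical factor scores for six samples on two factors.
scores = [(0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (5.2, 4.9), (10.0, 0.0), (9.8, 0.2)]
# K = 3 is fixed in advance; the starting centroids are three of the samples.
labels, centroids = k_means(scores, [scores[0], scores[2], scores[4]])
print(labels)
```

The result of K-means depends on the starting centroids, so in practice several initializations are usually compared.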
Buriani, Alessandro; Fortinguerra, Stefano; Sorrenti, Vincenzo; Dall'Acqua, Stefano; Innocenti, Gabbriella; Montopoli, Monica; Gabbia, Daniela; Carrara, Maria
2017-08-11
Principal component analysis (PCA) multivariate analysis was applied to study the cytotoxic activity of essential oils from various species of the Pistacia genus on human tumor cell lines. In particular, the cytotoxic activity of essential oils obtained from P. lentiscus, P. lentiscus var. chia (mastic gum), P. terebinthus, P. vera, and P. integerrima was screened on three human adenocarcinoma cell lines: MCF-7 (breast), 2008 (ovarian), and LoVo (colon). The results indicate that all the Pistacia phytocomplexes, with the exception of mastic gum oil, induce cytotoxic effects on one or more of the three cell lines. PCA highlighted the presence of different cooperating clusters of bioactive molecules. Cluster variability among species, and even within the same species, could explain some of the differences seen among samples, suggesting the presence of both common and species-specific mechanisms. Single molecules from one of the most significant clusters were tested, but only bornyl-acetate presented cytotoxic activity, although at much higher concentrations (IC50 = 138.5 µg/mL) than those present in the essential oils, indicating that understanding of the full biological effect requires a holistic vision of the phytocomplexes with all their constituents.
Huang, Wei; Oh, Sung-Kwun; Pedrycz, Witold
2014-12-01
In this study, we propose Hybrid Radial Basis Function Neural Networks (HRBFNNs) realized with the aid of fuzzy clustering method (Fuzzy C-Means, FCM) and polynomial neural networks. Fuzzy clustering used to form information granulation is employed to overcome a possible curse of dimensionality, while the polynomial neural network is utilized to build local models. Furthermore, genetic algorithm (GA) is exploited here to optimize the essential design parameters of the model (including fuzzification coefficient, the number of input polynomial fuzzy neurons (PFNs), and a collection of the specific subset of input PFNs) of the network. To reduce dimensionality of the input space, principal component analysis (PCA) is considered as a sound preprocessing vehicle. The performance of the HRBFNNs is quantified through a series of experiments, in which we use several modeling benchmarks of different levels of complexity (different number of input variables and the number of available data). A comparative analysis reveals that the proposed HRBFNNs exhibit higher accuracy in comparison to the accuracy produced by some models reported previously in the literature. Copyright © 2014 Elsevier Ltd. All rights reserved.
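The fuzzy clustering step (Fuzzy C-Means) and the role of the fuzzification coefficient can be illustrated with the standard update equations. This sketch covers only textbook FCM; the polynomial neural network and the genetic optimization from the abstract are not reproduced, and the data and starting centers are hypothetical.

```python
import math


def fcm_memberships(points, centers, m=2.0):
    """Membership of every point in every cluster (each row sums to 1)."""
    exponent = 2.0 / (m - 1.0)  # m > 1 is the fuzzification coefficient
    memberships = []
    for p in points:
        dists = [math.dist(p, c) for c in centers]
        if 0.0 in dists:
            # The point coincides with a center: full membership there.
            row = [1.0 if d == 0.0 else 0.0 for d in dists]
            total = sum(row)
            row = [v / total for v in row]
        else:
            row = [1.0 / sum((d_i / d_k) ** exponent for d_k in dists)
                   for d_i in dists]
        memberships.append(row)
    return memberships


def fcm_centers(points, memberships, m=2.0):
    """Cluster centers as membership-weighted means of the points."""
    dim = len(points[0])
    centers = []
    for i in range(len(memberships[0])):
        weights = [row[i] ** m for row in memberships]
        total = sum(weights)
        centers.append([sum(w * p[d] for w, p in zip(weights, points)) / total
                        for d in range(dim)])
    return centers


def fuzzy_c_means(points, centers, m=2.0, n_iter=50):
    """Alternate membership and center updates for a fixed iteration count."""
    for _ in range(n_iter):
        centers = fcm_centers(points, fcm_memberships(points, centers, m), m)
    return centers, fcm_memberships(points, centers, m)


# Hypothetical one-dimensional data with two obvious groups.
data = [(0.0,), (1.0,), (9.0,), (10.0,)]
centers, memberships = fuzzy_c_means(data, [(0.0,), (10.0,)])
print([round(c[0], 2) for c in centers])
```

Larger values of m make the memberships softer; values close to 1 approach a hard assignment.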
2013-01-01
Background The paper presents the evaluation of soil contamination with total, water-available, mobile, semi-mobile and non-mobile Hg fractions in the surroundings of a former chlor-alkali plant in connection with several chemical soil characteristics. Principal Component Analysis and Cluster Analysis were used to evaluate the chemical composition variability of soil and factors influencing the fate of Hg in such areas. The sequential extraction EPA 3200-Method and the determination technique based on capacitively coupled microplasma optical emission spectrometry were checked. Results A case study was conducted in the Turda town, Romania. The results revealed a high contamination with Hg in the area of the former chlor-alkali plant and waste landfills, where soils were categorized as hazardous waste. The weight of the Hg fractions decreased in the order semi-mobile > non-mobile > mobile > water leachable. Principal Component Analysis revealed 7 factors describing chemical composition variability of soil, of which 3 attributed to Hg species. Total Hg, semi-mobile, non-mobile and mobile fractions were observed to have a strong influence, while the water leachable fraction a weak influence. The two-dimensional plot of PCs highlighted 3 groups of sites according to the Hg contamination factor. The statistical approach has shown that the Hg fate in soil is dependent on pH, content of organic matter, Ca, Fe, Mn, Cu and SO42- rather than natural components, such as aluminosilicates. Cluster analysis of soil characteristics revealed 3 clusters, one of which including Hg species. Soil contamination with Cu as sulfate and Zn as nitrate was also observed. Conclusions The approach based on speciation and statistical interpretation of data developed in this study could be useful in the investigation of other chlor-alkali contaminated areas. 
According to the Bland and Altman test the 3-step sequential extraction scheme is suitable for Hg speciation in soil, while the used determination method of Hg is appropriate. PMID:24252185
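For readers unfamiliar with the Bland and Altman approach cited here, the usual summary is the mean difference between two methods (the bias) and its 95% limits of agreement. The sketch below assumes paired measurements and the conventional 1.96 multiplier; the numbers are invented and are not the Hg data from the study.

```python
import statistics


def bland_altman(method_a, method_b):
    """Return (bias, lower limit, upper limit) for paired measurements."""
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical paired results from two determination methods.
bias, lower, upper = bland_altman([10.0, 12.0, 14.0], [9.0, 12.0, 15.0])
```

Two methods are usually judged interchangeable when the bias is close to zero and the limits are narrow enough for the intended use; that threshold is a domain decision, not something the calculation provides.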
Zheng, Yiqi; Xu, Shaojun; Liu, Jing; Zhao, Yan; Liu, Jianxiu
2017-01-01
Bermudagrass [Cynodon dactylon (L.) Pers.], an important turfgrass used in public parks, home lawns, golf courses and sports fields, is widely distributed in China. In the present study, sequence-related amplified polymorphism (SRAP) markers were used to assess genetic diversity and population structure among 157 indigenous bermudagrass genotypes from 20 provinces in China. The application of 26 SRAP primer pairs produced 340 bands, of which 328 (96.58%) were polymorphic. The polymorphic information content (PIC) ranged from 0.36 to 0.49 with a mean of 0.44. Genetic distance coefficients among accessions ranged from 0.04 to 0.61, with an average of 0.32. The results of STRUCTURE analysis suggested that 157 bermudagrass accessions can be grouped into three subpopulations. Moreover, according to clustering based on the unweighted pair-group method of arithmetic averages (UPGMA), accessions were divided into three major clusters. The UPGMA dendrogram revealed that accessions from identical or adjacent areas were generally, but not entirely, clustered into the same cluster. Comparison of the UPGMA dendrogram and the Bayesian STRUCTURE analysis showed general agreement between the population subdivisions and the genetic relationships among accessions. Principal coordinate analysis (PCoA) with SRAP markers revealed a similar grouping of accessions to the UPGMA dendrogram and STRUCTURE analysis. Analysis of molecular variance (AMOVA) indicated that 18% of total molecular variance was attributed to diversity among subpopulations, while 82% of variance was associated with differences within subpopulations. Our study represents the most comprehensive investigation of the genetic diversity and population structure of bermudagrass in China to date, and provides valuable information for the germplasm collection, genetic improvement, and systematic utilization of bermudagrass. PMID:28493962
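UPGMA, used in this abstract to build the dendrogram, repeatedly merges the two clusters with the smallest average pairwise distance. The following is a small, unoptimized sketch of that idea; the four accession labels and the distance values are made up for illustration and are not taken from the study.

```python
from itertools import combinations


def upgma(dist, labels):
    """Average-linkage (UPGMA) clustering on a symmetric distance matrix.

    Returns the merges in order as (members_a, members_b, height).
    """
    clusters = {i: (i,) for i in range(len(labels))}
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(sorted(clusters), 2):
            # Average distance over all member pairs of the two clusters.
            pairs = [(i, j) for i in clusters[a] for j in clusters[b]]
            d = sum(dist[i][j] for i, j in pairs) / len(pairs)
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        merges.append((
            tuple(labels[i] for i in clusters[a]),
            tuple(labels[i] for i in clusters[b]),
            d,
        ))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return merges


# Hypothetical genetic distances among four accessions.
labels = ["A", "B", "C", "D"]
dist = [
    [0.00, 0.04, 0.40, 0.60],
    [0.04, 0.00, 0.44, 0.58],
    [0.40, 0.44, 0.00, 0.20],
    [0.60, 0.58, 0.20, 0.00],
]
merges = upgma(dist, labels)
```

Here A and B join first, then C and D, and the two pairs merge last at the mean of the four cross distances. For real marker data a library routine such as an average-linkage implementation would normally be preferred over this quadratic-per-merge loop.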
Danielsson, Rebecca; Dicksved, Johan; Sun, Li; Gonda, Horacio; Müller, Bettina; Schnürer, Anna; Bertilsson, Jan
2017-01-01
Methane (CH4) is produced as an end product from feed fermentation in the rumen. Yield of CH4 varies between individuals despite identical feeding conditions. To get a better understanding of factors behind the individual variation, 73 dairy cows given the same feed but differing in CH4 emissions were investigated with focus on fiber digestion, fermentation end products and bacterial and archaeal composition. In total 21 cows (12 Holstein, 9 Swedish Red) identified as persistent low, medium or high CH4 emitters over a 3 month period were furthermore chosen for analysis of microbial community structure in rumen fluid. This was assessed by sequencing the V4 region of 16S rRNA gene and by quantitative qPCR of targeted Methanobrevibacter groups. The results showed a positive correlation between low CH4 emitters and higher abundance of Methanobrevibacter ruminantium clade. Principal coordinate analysis (PCoA) on operational taxonomic unit (OTU) level of bacteria showed two distinct clusters (P < 0.01) that were related to CH4 production. One cluster was associated with low CH4 production (referred to as cluster L) whereas the other cluster was associated with high CH4 production (cluster H) and the medium emitters occurred in both clusters. The differences between clusters were primarily linked to differential abundances of certain OTUs belonging to Prevotella. Moreover, several OTUs belonging to the family Succinivibrionaceae were dominant in samples belonging to cluster L. Fermentation pattern of volatile fatty acids showed that proportion of propionate was higher in cluster L, while proportion of butyrate was higher in cluster H. No difference was found in milk production or organic matter digestibility between cows. 
Cows in cluster L had lower CH4 per kg energy-corrected milk (ECM) than cows in cluster H (8.3 compared to 9.7 g CH4/kg ECM), showing that low-CH4 cows utilized the feed more efficiently for milk production, which might indicate a more efficient microbial population or host genetic differences that are reflected in the bacterial and archaeal (or methanogen) populations. PMID:28261182
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pang, Yuanjie, E-mail: yuanjie.p@gmail.com
Background: Natural and anthropogenic sources of metal exposure differ for urban and rural residents. We sought to identify patterns of metal mixtures which could suggest common environmental sources and/or metabolic pathways of different urinary metals, and compared metal mixtures in two population-based studies from urban/sub-urban and rural/town areas in the US: the Multi-Ethnic Study of Atherosclerosis (MESA) and the Strong Heart Study (SHS). Methods: We studied a random sample of 308 White, Black, Chinese-American, and Hispanic participants in MESA (2000–2002) and 277 American Indian participants in SHS (1998–2003). We used principal component analysis (PCA), cluster analysis (CA), and linear discriminant analysis (LDA) to evaluate nine urinary metals (antimony [Sb], arsenic [As], cadmium [Cd], lead [Pb], molybdenum [Mo], selenium [Se], tungsten [W], uranium [U] and zinc [Zn]). For arsenic, we used the sum of inorganic and methylated species (∑As). Results: All nine urinary metals were higher in SHS compared to MESA participants. PCA and CA revealed the same patterns in SHS, suggesting 4 distinct principal components (PC) or clusters (∑As-U-W, Pb-Sb, Cd-Zn, Mo-Se). In MESA, CA showed 2 large clusters (∑As-Mo-Sb-U-W, Cd-Pb-Se-Zn), while PCA showed 4 PCs (Sb-U-W, Pb-Se-Zn, Cd-Mo, ∑As). LDA indicated that ∑As, U, W, and Zn were the most discriminant variables distinguishing MESA and SHS participants. Conclusions: In SHS, the ∑As-U-W cluster and PC might reflect groundwater contamination in rural areas, and the Cd-Zn cluster and PC could reflect common sources from meat products or metabolic interactions. Among the metals assayed, ∑As, U, W and Zn differed the most between MESA and SHS, possibly reflecting disproportionate exposure from drinking water and perhaps food in rural Native communities compared to urban communities around the US.
Highlights:
• We identified and compared environmental sources of urinary metals in MESA and SHS.
• ∑As-U-W in SHS may reflect groundwater contamination in rural areas.
• Cd-Zn in SHS may reflect common sources from meat products or metabolic interaction.
• ∑As, U, W, and Zn differed the most between MESA and SHS participants.
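To illustrate how PCA groups co-varying metals into one component, the sketch below standardizes a few variables, builds their correlation matrix, and extracts the first principal component by power iteration. It is only an assumed, simplified stand-in for the analysis described in the abstract: the three columns and their values are invented, and a real analysis would use a numerical library and all components.

```python
import math
import statistics


def correlation_matrix(columns):
    """Pearson correlation matrix of equally long data columns."""
    standardized = []
    for col in columns:
        mean, sd = statistics.mean(col), statistics.pstdev(col)
        standardized.append([(x - mean) / sd for x in col])
    n = len(columns[0])
    return [
        [sum(a * b for a, b in zip(u, v)) / n for v in standardized]
        for u in standardized
    ]


def first_component(matrix, iterations=200):
    """Leading eigenvector (loadings) and share of variance explained."""
    size = len(matrix)
    v = [float(i + 1) for i in range(size)]  # deterministic starting vector
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(size)) for i in range(size)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(
        v[i] * sum(matrix[i][j] * v[j] for j in range(size)) for i in range(size)
    )
    # The trace of a correlation matrix equals the number of variables.
    return v, eigenvalue / size


# Hypothetical concentrations: the first two metals rise together,
# the third varies independently of them.
metal_1 = [1.0, 2.0, 3.0, 4.0]
metal_2 = [2.0, 4.0, 6.0, 8.0]
metal_3 = [1.0, -1.0, -1.0, 1.0]
loadings, explained = first_component(
    correlation_matrix([metal_1, metal_2, metal_3])
)
```

In this toy case the first component loads equally on the two co-varying metals and not at all on the third, which is the kind of pattern that would be read as a shared source or pathway.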
Sischo, William M.; Short, Diana M.; Geissler, Mareen; Bunyatratchata, Apichaya; Barile, Daniela
2017-01-01
Prebiotics are nondigestible dietary ingredients, usually oligosaccharides (OS), that provide a health benefit to the host by directly modulating the gut microbiota. Although there is some information describing OS content in dairy-source milk, no information is available to describe the OS content of beef-source milk. Given the different trait emphasis between dairy and beef for milk production and calf survivability, it is plausible that OS composition, diversity, and abundance differ between production types. The goal of this study was to compare OS in milk from commercial dairy and beef cows in early lactation. Early-lactation multiparous cows (5–12 d in milk) from 5 commercial Holstein dairy herds and 5 Angus or Angus hybrid beef herds were sampled once. Milk was obtained from each enrolled cow and frozen on the farm. Subsequently, each milk sample was assessed for total solids, pH, and OS content and relative abundance. Oligosaccharide diversity and abundance within and between samples was transformed through principal component analysis to reduce data complexity. Factors from principal component analysis were used to create similarity clusters, which were subsequently used in a multivariate logistic regression. In total, 30 OS were identified in early-lactation cow milk, including 21 distinct OS and 9 isomers with unique retention times. The majority of OS detected in the milk samples were present in all individual samples regardless of production type. Two clusters described distribution patterns of OS for the study sample; when median OS abundance was compared between the 2 clusters, we found that overall OS relative abundance was consistently greater in the cluster dominated by beef cows. For several of the structures, including those with known prebiotic effect, the difference in abundance was 2- to 4-fold greater in the beef-dominated cluster. 
Assuming that beef OS content in milk is the gold standard for cattle, it is likely that preweaning dairy calves are deprived of dietary-source OS. Although supplementing rations with OS is an approach to rectify this deficiency, understanding the health and productivity effects of improving OS abundance being fed to preweaning calves is a necessary next step before recommending supplementation. These studies should account for the observation that OS products are variable for both OS diversity and structural complexity, and some products may not be suitable as prebiotics. PMID:28318588
Krüsemann, Erna J Z; Lasschuijt, Marlou P; de Graaf, C; de Wijk, René A; Punter, Pieter H; van Tiel, Loes; Cremers, Johannes W J M; van de Nobelen, Suzanne; Boesveldt, Sanne; Talhout, Reinskje
2018-05-23
Tobacco flavours are an important regulatory concept in several jurisdictions, for example in the USA, Canada and Europe. The European Tobacco Products Directive 2014/40/EU prohibits cigarettes and roll-your-own tobacco having a characterising flavour. This directive defines characterising flavour as 'a clearly noticeable smell or taste other than one of tobacco […]'. To distinguish between products with and without a characterising flavour, we trained an expert panel to identify characterising flavours by smelling. An expert panel (n=18) evaluated the smell of 20 tobacco products using self-defined odour attributes, following Quantitative Descriptive Analysis. The panel was trained during 14 attribute training, consensus training and performance monitoring sessions. Products were assessed during six test sessions. Principal component analysis, hierarchical clustering (four and six clusters) and Hotelling's T-tests (95% and 99% CIs) were used to determine differences and similarities between tobacco products based on odour attributes. The final attribute list contained 13 odour descriptors. Panel performance was sufficient after 14 training sessions. Products marketed as unflavoured that formed a cluster were considered reference products. A four-cluster method distinguished cherry-flavoured, vanilla-flavoured and menthol-flavoured products from reference products. Six clusters subdivided reference products into tobacco leaves, roll-your-own and commercial products. An expert panel was successfully trained to assess characterising odours in cigarettes and roll-your-own tobacco. This method could be applied to other product types such as e-cigarettes. Regulatory decisions on the choice of reference products and significance level are needed which directly influences the products being assessed as having a characterising odour. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. 
ERIC Educational Resources Information Center
Santamaría, Andrés P.; Webber, Melinda; Santamaría, Lorri J.; Dam, Lincoln I.
2015-01-01
In early 2014, a team of researchers was invited into partnership with the Maori Success Initiative (MSI), a national, indigenous led network of Maori and non-Maori principals committed to working collaboratively to raise Maori student achievement. Working with over sixty principals across six regional clusters throughout Aotearoa New Zealand,…
On Identifying Clusters Within the C-type Asteroids of the Sloan Digital Sky Survey
NASA Astrophysics Data System (ADS)
Poole, Renae; Ziffer, J.; Harvell, T.
2012-10-01
We applied AutoClass, a data mining technique based upon Bayesian Classification, to C-group asteroid colors in the Sloan Digital Sky Survey (SDSS). Previous taxonomic studies relied mostly on Principal Component Analysis (PCA) to differentiate asteroids within the C-group (e.g. B, G, F, Ch, Cg and Cb). AutoClass's advantage is that it calculates the most probable classification for us, removing the human factor from this part of the analysis. In our results, AutoClass divided the C-groups into two large classes and six smaller classes. The two large classes (n=4974 and 2033, respectively) display distinct regions with some overlap in color-vs-color plots. Each cluster's average spectrum is compared to 'typical' spectra of the C-group subtypes as defined by Tholen (1989) and each cluster's members are evaluated for consistency with previous taxonomies. Of the 117 asteroids classified as B-type in previous taxonomies, only 12 were found with SDSS colors that matched our criteria of having less than 0.1 magnitude error in u and 0.05 magnitude error in g, r, i, and z colors. Although this is a relatively small group, 11 of the 12 B-types were placed by AutoClass in the same cluster. By determining the C-group sub-classifications in the large SDSS database, this research furthers our understanding of the stratigraphy and composition of the main-belt.
Biochemical imaging of tissues by SIMS for biomedical applications
NASA Astrophysics Data System (ADS)
Lee, Tae Geol; Park, Ji-Won; Shon, Hyun Kyong; Moon, Dae Won; Choi, Won Woo; Li, Kapsok; Chung, Jin Ho
2008-12-01
With the development of optimal surface cleaning techniques by cluster ion beam sputtering, certain applications of SIMS for analyzing cells and tissues have been actively investigated. For this report, we collaborated with biomedical scientists to study bio-SIMS analyses of skin and cancer tissues for biomedical diagnostics. We paid close attention to setting up a routine procedure for preparing tissue specimens and treating the surface before obtaining the bio-SIMS data. Bio-SIMS was used to study two biosystems: skin tissues, for understanding the effects of photoaging, and colon cancer tissues, for insight into the development of new cancer diagnostics. Time-of-flight SIMS imaging measurements were taken after surface cleaning with cluster ion bombardment by Bi_n or C_60 under varying conditions. The imaging capability of bio-SIMS, with a spatial resolution of a few microns, combined with principal component analysis reveals biologically meaningful information, but the lack of high molecular weight peaks even with cluster ion bombardment was a problem. This, among other problems, shows that discourse with biologists and medical doctors is critical to glean any meaningful information from SIMS mass spectrometric and imaging data. For SIMS to be accepted as a routine, daily analysis tool in biomedical laboratories, various practical sample handling methodologies such as surface matrix treatment, including nano-metal particles and metal coating, in addition to cluster sputtering, should be studied.
Roessner, Ute; Willmitzer, Lothar; Fernie, Alisdair R.
2001-01-01
We conducted a comprehensive metabolic phenotyping of potato (Solanum tuberosum L. cv Desiree) tuber tissue that had been modified either by transgenesis or exposure to different environmental conditions using a recently developed gas chromatography-mass spectrometry profiling protocol. Applying this technique, we were able to identify and quantify the major constituent metabolites of the potato tuber within a single chromatographic run. The plant systems that we selected to profile were tuber discs incubated in varying concentrations of fructose, sucrose, and mannitol and transgenic plants impaired in their starch biosynthesis. The resultant profiles were then compared, first at the level of individual metabolites and then using the statistical tools hierarchical cluster analysis and principal component analysis. These tools allowed us to assign clusters to the individual plant systems and to determine relative distances between these clusters; furthermore, analyzing the loadings of these analyses enabled identification of the most important metabolites in the definition of these clusters. The metabolic profiles of the sugar-fed discs were dramatically different from the wild-type steady-state values. When these profiles were compared with one another and also with those we assessed in previous studies, however, we were able to evaluate potential phenocopies. These comparisons highlight the importance of such an approach in the functional and qualitative assessment of diverse systems to gain insights into important mediators of metabolism. PMID:11706160
Advanced multivariate analysis to assess remediation of hydrocarbons in soils.
Lin, Deborah S; Taylor, Peter; Tibbett, Mark
2014-10-01
Accurate monitoring of degradation levels in soils is essential in order to understand and achieve complete degradation of petroleum hydrocarbons in contaminated soils. We aimed to develop the use of multivariate methods for the monitoring of biodegradation of diesel in soils and to determine if diesel contaminated soils could be remediated to a chemical composition similar to that of an uncontaminated soil. An incubation experiment was set up with three contrasting soil types. Each soil was exposed to diesel at varying stages of degradation and then analysed for key hydrocarbons throughout 161 days of incubation. Hydrocarbon distributions were analysed by Principal Coordinate Analysis and similar samples grouped by cluster analysis. Variation and differences between samples were determined using permutational multivariate analysis of variance. It was found that all soils followed trajectories approaching the chemical composition of the unpolluted soil. Some contaminated soils were no longer significantly different to that of uncontaminated soil after 161 days of incubation. The use of cluster analysis allows the assignment of a percentage chemical similarity of a diesel contaminated soil to an uncontaminated soil sample. This will aid in the monitoring of hydrocarbon contaminated sites and the establishment of potential endpoints for successful remediation.
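Principal Coordinate Analysis starts from a distance matrix rather than from the raw variables. A minimal sketch of the classical procedure (square the distances, double-center, take the leading eigenvector) is shown below, using power iteration for the first axis only. The sample positions are invented, and the sketch assumes Euclidean-like distances so that the leading eigenvalue is positive.

```python
import math


def double_center(dist):
    """Gower double-centering of -0.5 * squared distances."""
    n = len(dist)
    a = [[-0.5 * dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row = [sum(r) / n for r in a]  # matrix is symmetric: row means = column means
    grand = sum(row) / n
    return [[a[i][j] - row[i] - row[j] + grand for j in range(n)] for i in range(n)]


def first_axis(b, iterations=500):
    """Coordinates of each sample on the first principal coordinate."""
    n = len(b)
    v = [float(i + 1) for i in range(n)]  # deterministic starting vector
    for _ in range(iterations):
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(b[i][j] * v[j] for j in range(n)) for i in range(n))
    return [x * math.sqrt(eigenvalue) for x in v]


# Hypothetical samples lying on a line at 0, 1, 5 and 6 (arbitrary units).
positions = [0.0, 1.0, 5.0, 6.0]
dist = [[abs(p - q) for q in positions] for p in positions]
coords = first_axis(double_center(dist))
```

Because the toy distances come from points on a line, the first axis recovers those positions up to centering, which makes it easy to check that the two nearby pairs separate into two groups. The sign of an axis is arbitrary in general.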
Madeo, Andrea; Piras, Paolo; Re, Federica; Gabriele, Stefano; Nardinocchi, Paola; Teresi, Luciano; Torromeo, Concetta; Chialastri, Claudia; Schiariti, Michele; Giura, Geltrude; Evangelista, Antonietta; Dominici, Tania; Varano, Valerio; Zachara, Elisabetta; Puddu, Paolo Emilio
2015-01-01
The assessment of left ventricular shape changes during cardiac revolution may be a new step in clinical cardiology to ease early diagnosis and treatment. To quantify these changes, only point registration was adopted and neither Generalized Procrustes Analysis nor Principal Component Analysis was applied as we did previously to study a group of healthy subjects. Here, we extend to patients affected by hypertrophic cardiomyopathy the original approach and preliminarily include genotype positive/phenotype negative individuals to explore the potential that incumbent pathology might also be detected. Using 3D Speckle Tracking Echocardiography, we recorded left ventricular shape of 48 healthy subjects, 24 patients affected by hypertrophic cardiomyopathy and 3 genotype positive/phenotype negative individuals. We then applied Generalized Procrustes Analysis and Principal Component Analysis, and inter-individual differences were cleaned by Parallel Transport performed on the tangent space, along the horizontal geodesic, between the per-subject consensuses and the grand mean. Endocardial and epicardial layers were evaluated separately, unlike in many echocardiographic applications. Under a common Principal Component Analysis, we then evaluated left ventricle morphological changes (at both layers) explained by first Principal Component scores. Trajectories’ shape and orientation were investigated and contrasted. Logistic regression and Receiver Operating Characteristic curves were used to compare these morphometric indicators with traditional 3D Speckle Tracking Echocardiography global parameters. Geometric morphometrics indicators performed better than 3D Speckle Tracking Echocardiography global parameters in recognizing pathology both in systole and diastole. Genotype positive/phenotype negative individuals clustered with patients affected by hypertrophic cardiomyopathy during diastole, suggesting that incumbent pathology may indeed be foreseen by these methods. 
Left ventricle deformation in patients affected by hypertrophic cardiomyopathy compared to healthy subjects may be assessed by modern shape analysis better than by traditional 3D Speckle Tracking Echocardiography global parameters. Hypertrophic cardiomyopathy pathophysiology was unveiled in a new manner whereby diastolic phase abnormalities are also evident, which are more difficult to investigate by traditional echocardiographic techniques. PMID:25875818
Pandolfi, Fanny; Edwards, Sandra A; Maes, Dominiek; Kyriazakis, Ilias
2018-01-01
This study aimed to provide an overview of the interconnections between biosecurity, health, welfare, and performance in commercial pig farms in Great Britain. We collected on-farm data about the level of biosecurity and animal performance in 40 fattening pig farms and 28 breeding pig farms between 2015 and 2016. We identified interconnections between these data, slaughterhouse health indicators, and welfare indicator records in fattening pig farms. After achieving the connections between databases, a secondary data analysis was performed to assess the interconnections between biosecurity, health, welfare, and performance using correlation analysis, principal component analysis, and hierarchical clustering. Although we could connect the different data sources the final sample size was limited, suggesting room for improvement in database connection to conduct secondary data analyses. The farm biosecurity scores ranged from 40 to 90 out of 100, with internal biosecurity scores being lower than external biosecurity scores. Our analysis suggested several interconnections between health, welfare, and performance. The initial correlation analysis showed that the prevalence of lameness and severe tail lesions was associated with the prevalence of enzootic pneumonia-like lesions and pyaemia, and the prevalence of severe body marks was associated with several disease indicators, including peritonitis and milk spots ( r > 0.3; P < 0.05). Higher average daily weight gain (ADG) was associated with lower prevalence of pleurisy ( r > 0.3; P < 0.05), but no connection was identified between mortality and health indicators. A subsequent cluster analysis enabled identification of patterns which considered concurrently indicators of health, welfare, and performance. Farms from cluster 1 had lower biosecurity scores, lower ADG, and higher prevalence of several disease and welfare indicators. 
Farms from cluster 2 had higher biosecurity scores than cluster 1, but a higher prevalence of lameness and of pigs requiring hospitalization, which confirmed the correlation between biosecurity and the prevalence of pigs requiring hospitalization (r > 0.3; P < 0.05). Farms from cluster 3 had higher biosecurity, higher ADG, and lower prevalence for some disease and welfare indicators. The study suggests a smaller impact of biosecurity on issues such as mortality, prevalence of lameness, and pigs requiring hospitalization. The correlations and the identified clusters suggested the importance of animal welfare for the pig industry.
Xu, Ning; Zhou, Guofu; Li, Xiaojuan; Lu, Heng; Meng, Fanyun; Zhai, Huaqiang
2017-05-01
A reliable and comprehensive method for identifying the origin and assessing the quality of Epimedium has been developed. The method is based on analysis of HPLC fingerprints, combined with similarity analysis, hierarchical cluster analysis (HCA), principal component analysis (PCA) and multi-ingredient quantitative analysis. Nineteen batches of Epimedium, collected from different areas in the western regions of China, were used to establish the fingerprints and 18 peaks were selected for the analysis. Similarity analysis, HCA and PCA all classified the 19 areas into three groups. Simultaneous quantification of the five major bioactive ingredients in the Epimedium samples was also carried out to confirm the consistency of the quality tests. These methods were successfully used to identify the geographical origin of the Epimedium samples and to evaluate their quality. Copyright © 2016 John Wiley & Sons, Ltd.
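Similarity analysis of chromatographic fingerprints is often done by comparing each batch's peak-area vector with a reference fingerprint. The sketch below assumes cosine similarity against the mean fingerprint, which is one common choice; the abstract does not state which measure was used, and the three-peak batches are invented for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two peak-area vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity_to_reference(fingerprints):
    """Similarity of each batch to the mean (reference) fingerprint."""
    n_peaks = len(fingerprints[0])
    reference = [
        sum(fp[k] for fp in fingerprints) / len(fingerprints)
        for k in range(n_peaks)
    ]
    return [cosine_similarity(fp, reference) for fp in fingerprints]


# Hypothetical peak areas: batches 1 and 2 share a profile, batch 3 is reversed.
batches = [[10.0, 20.0, 30.0], [20.0, 40.0, 60.0], [30.0, 20.0, 10.0]]
scores = similarity_to_reference(batches)
```

Cosine similarity ignores overall concentration, so the first two batches score identically even though one is twice as concentrated; that is why fingerprint similarity is typically paired with quantitative assays of individual ingredients.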
Descriptive Epidemiology of Typhoid Fever during an Epidemic in Harare, Zimbabwe, 2012
Polonsky, Jonathan A.; Martínez-Pino, Isabel; Nackers, Fabienne; Chonzi, Prosper; Manangazira, Portia; Van Herp, Michel; Maes, Peter; Porten, Klaudia; Luquero, Francisco J.
2014-01-01
Background Typhoid fever remains a significant public health problem in developing countries. In October 2011, a typhoid fever epidemic was declared in Harare, Zimbabwe - the fourth enteric infection epidemic since 2008. To orient control activities, we described the epidemiology and spatiotemporal clustering of the epidemic in Dzivaresekwa and Kuwadzana, the two most affected suburbs of Harare. Methods A typhoid fever case-patient register was analysed to describe the epidemic. To explore clustering, we constructed a dataset comprising GPS coordinates of case-patient residences and randomly sampled residential locations (spatial controls). The scale and significance of clustering was explored with Ripley K functions. Cluster locations were determined by a random labelling technique and confirmed using Kulldorff's spatial scan statistic. Principal Findings We analysed data from 2570 confirmed and suspected case-patients, and found significant spatiotemporal clustering of typhoid fever in two non-overlapping areas, which appeared to be linked to environmental sources. Peak relative risk was more than six times greater than in areas lying outside the cluster ranges. Clusters were identified in similar geographical ranges by both random labelling and Kulldorff's spatial scan statistic. The spatial scale at which typhoid fever clustered was highly localised, with significant clustering at distances up to 4.5 km and peak levels at approximately 3.5 km. The epicentre of infection transmission shifted from one cluster to the other during the course of the epidemic. Conclusions This study demonstrated highly localised clustering of typhoid fever during an epidemic in an urban African setting, and highlights the importance of spatiotemporal analysis for making timely decisions about targetting prevention and control activities and reinforcing treatment during epidemics. 
This approach should be integrated into existing surveillance systems to facilitate early detection of epidemics and identify their spatial range. PMID:25486292
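The Ripley K analysis mentioned in the abstract above can be illustrated with a minimal sketch. It assumes projected planar coordinates (for example in km), a known study-area size, and no edge correction; the function names `ripley_k` and `k_difference` are hypothetical, and comparing the K function of cases against that of spatial controls is a common case-control convention rather than a detail confirmed by the abstract.

```python
import math

def ripley_k(points, radius, area):
    """Naive Ripley's K estimate (no edge correction) for 2D points."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    pairs_within = 0
    for i, (xi, yi) in enumerate(points):
        for j, (xj, yj) in enumerate(points):
            # Count ordered pairs of distinct points closer than `radius`.
            if i != j and math.hypot(xi - xj, yi - yj) <= radius:
                pairs_within += 1
    # Scale by area / (n * (n - 1)) to normalise for point intensity.
    return area * pairs_within / (n * (n - 1))

def k_difference(cases, controls, radius, area):
    """K_cases(r) - K_controls(r); positive values hint at extra case clustering."""
    return ripley_k(cases, radius, area) - ripley_k(controls, radius, area)

# Toy data: controls on the corners of a unit square, cases packed in one corner.
controls = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
cases = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1)]
k_short = ripley_k(controls, 1.0, 1.0)
k_long = ripley_k(controls, 1.5, 1.0)
excess = k_difference(cases, controls, 0.5, 1.0)
```

In practice the significance of the difference would be judged against an envelope obtained by randomly relabelling cases and controls, which this sketch leaves out.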
STAR FORMATION AND SUPERCLUSTER ENVIRONMENT OF 107 NEARBY GALAXY CLUSTERS
DOE Office of Scientific and Technical Information (OSTI.GOV)
Cohen, Seth A.; Hickox, Ryan C.; Wegner, Gary A.
We analyze the relationship between star formation (SF), substructure, and supercluster environment in a sample of 107 nearby galaxy clusters using data from the Sloan Digital Sky Survey. Previous works have investigated the relationships between SF and cluster substructure, and between cluster substructure and supercluster environment, but definitive conclusions relating all three of these variables have remained elusive. We find an inverse relationship between cluster SF fraction (f_SF) and supercluster environment density, calculated using the galaxy luminosity density field at a smoothing length of 8 h^−1 Mpc (D8). The slope of f_SF versus D8 is −0.008 ± 0.002. The f_SF of clusters located in low-density large-scale environments, 0.244 ± 0.011, is higher than for clusters located in high-density supercluster cores, 0.202 ± 0.014. We also divide superclusters, according to their morphology, into filament- and spider-type systems. The inverse relationship between cluster f_SF and large-scale density is dominated by filament- rather than spider-type superclusters. In high-density cores of superclusters, we find a higher f_SF in spider-type superclusters, 0.229 ± 0.016, than in filament-type superclusters, 0.166 ± 0.019. Using principal component analysis, we confirm these results and the direct correlation between cluster substructure and SF. These results indicate that cluster SF is affected by the dynamical age of the cluster (younger systems exhibit higher amounts of SF), the large-scale density of the supercluster environment (high-density core regions exhibit lower amounts of SF), and supercluster morphology (spider-type superclusters exhibit higher amounts of SF at high densities).
Type 2 diabetes mellitus: distribution of genetic markers in Kazakh population
Sikhayeva, Nurgul; Talzhanov, Yerkebulan; Iskakova, Aisha; Dzharmukhanov, Jarkyn; Nugmanova, Raushan; Zholdybaeva, Elena; Ramanculov, Erlan
2018-01-01
Background Ethnic differences exist in the frequencies of genetic variations that contribute to the risk of common disease. This study aimed to analyse the distribution of several genes, previously associated with susceptibility to type 2 diabetes and obesity-related phenotypes, in a Kazakh population. Methods A total of 966 individuals belonging to the Kazakh ethnicity were recruited from an outpatient clinic. We genotyped 41 common single nucleotide polymorphisms (SNPs) previously associated with type 2 diabetes in other ethnic groups and 31 of these were in Hardy–Weinberg equilibrium. The obtained allele frequencies were further compared to publicly available data from other ethnic populations. Allele frequencies for other (compared) populations were pooled from the haplotype map (HapMap) database. Principal component analysis (PCA), cluster analysis, and multidimensional scaling (MDS) were used for the analysis of genetic relationship between the populations. Results Comparative analysis of allele frequencies of the studied SNPs showed significant differentiation among the studied populations. The Kazakh population was grouped with Asian populations according to the cluster analysis and with the Caucasian populations according to PCA. According to MDS, results of the current study show that the Kazakh population holds an intermediate position between Caucasian and Asian populations. Conclusion A high percentage of population differentiation was observed between Kazakh and world populations. The Kazakh population was clustered with Caucasian populations, and this result may indicate a significant Caucasian component in the Kazakh gene pool. PMID:29551892
Automated extraction and analysis of rock discontinuity characteristics from 3D point clouds
NASA Astrophysics Data System (ADS)
Bianchetti, Matteo; Villa, Alberto; Agliardi, Federico; Crosta, Giovanni B.
2016-04-01
A reliable characterization of fractured rock masses requires an exhaustive geometrical description of discontinuities, including orientation, spacing, and size. These are required to describe discontinuum rock mass structure, perform Discrete Fracture Network and DEM modelling, or provide input for rock mass classification or equivalent continuum estimate of rock mass properties. Although several advanced methodologies have been developed in the last decades, a complete characterization of discontinuity geometry in practice is still challenging, due to scale-dependent variability of fracture patterns and difficult access to large outcrops. Recent advances in remote survey techniques, such as terrestrial laser scanning and digital photogrammetry, allow a fast and accurate acquisition of dense 3D point clouds, which promoted the development of several semi-automatic approaches to extract discontinuity features. Nevertheless, these often need user supervision on algorithm parameters which can be difficult to assess. To overcome this problem, we developed an original Matlab tool, allowing fast, fully automatic extraction and analysis of discontinuity features with no requirements on point cloud accuracy, density and homogeneity. The tool consists of a set of algorithms which: (i) process raw 3D point clouds, (ii) automatically characterize discontinuity sets, (iii) identify individual discontinuity surfaces, and (iv) analyse their spacing and persistence. The tool operates in either a supervised or unsupervised mode, starting from an automatic preliminary exploratory data analysis. The identification and geometrical characterization of discontinuity features is divided into steps. First, coplanar surfaces are identified in the whole point cloud using K-Nearest Neighbor and Principal Component Analysis algorithms optimized on point cloud accuracy and specified typical facet size.
Then, discontinuity set orientation is calculated using Kernel Density Estimation and principal vector similarity criteria. Poles to points are assigned to individual discontinuity objects using easy custom vector clustering and Jaccard distance approaches, and each object is segmented into planar clusters using an improved version of the DBSCAN algorithm. Modal set orientations are then recomputed by cluster-based orientation statistics to avoid the effects of biases related to cluster size and density heterogeneity of the point cloud. Finally, spacing values are measured between individual discontinuity clusters along scanlines parallel to modal pole vectors, whereas individual feature size (persistence) is measured using 3D convex hull bounding boxes. Spacing and size are provided both as raw population data and as summary statistics. The tool is optimized for parallel computing on 64-bit systems, and a Graphical User Interface (GUI) has been developed to manage data processing and provide several outputs, including reclassified point clouds, tables, plots, and derived fracture intensity parameters, as well as export to modelling software tools. We present test applications performed both on synthetic 3D data (simple 3D solids) and real case studies, validating the results with existing geomechanical datasets.
Hyperspectral imaging of polymer banknotes for building and analysis of spectral library
NASA Astrophysics Data System (ADS)
Lim, Hoong-Ta; Murukeshan, Vadakke Matham
2017-11-01
The use of counterfeit banknotes increases crime rates and cripples the economy. New countermeasures are required to stop counterfeiters who use advancing technologies with criminal intent. Many countries started adopting polymer banknotes to replace paper notes, as polymer notes are more durable and have better quality. The research on authenticating such banknotes is of much interest to the forensic investigators. Hyperspectral imaging can be employed to build a spectral library of polymer notes, which can then be used for classification to authenticate these notes. This is however not widely reported and has become a research interest in forensic identification. This paper focuses on the use of hyperspectral imaging on polymer notes to build spectral libraries, using a pushbroom hyperspectral imager which has been previously reported. As an initial study, a spectral library will be built from three arbitrarily chosen regions of interest of five circulated genuine polymer notes. Principal component analysis is used for dimension reduction and to convert the information in the spectral library to principal components. A 99% confidence ellipse is formed around the cluster of principal component scores of each class and then used as classification criteria. The potential of the adopted methodology is demonstrated by the classification of the imaged regions as training samples.
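The confidence-ellipse criterion described in the abstract above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it assumes two principal components are retained, that the scores of a class are roughly bivariate normal, and that the ellipse is the chi-square contour of the sample covariance. The function names `fit_ellipse` and `inside_ellipse` are hypothetical.

```python
import math

def fit_ellipse(scores):
    """Return the mean and inverse sample covariance of 2D principal-component scores."""
    n = len(scores)
    mx = sum(x for x, _ in scores) / n
    my = sum(y for _, y in scores) / n
    sxx = sum((x - mx) ** 2 for x, _ in scores) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in scores) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in scores) / (n - 1)
    det = sxx * syy - sxy * sxy
    if det <= 0:
        raise ValueError("degenerate covariance matrix")
    inverse = ((syy / det, -sxy / det), (-sxy / det, sxx / det))
    return (mx, my), inverse

def inside_ellipse(point, mean, inverse, confidence=0.99):
    """True if the squared Mahalanobis distance falls within the confidence contour."""
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    d2 = (dx * (inverse[0][0] * dx + inverse[0][1] * dy)
          + dy * (inverse[1][0] * dx + inverse[1][1] * dy))
    # For 2 degrees of freedom the chi-square quantile is -2 * ln(1 - p).
    threshold = -2.0 * math.log(1.0 - confidence)
    return d2 <= threshold

# Toy class scores (assumed values, not data from the study).
class_scores = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
mean, inverse = fit_ellipse(class_scores)
accepted = inside_ellipse((4.0, 1.0), mean, inverse)
rejected = inside_ellipse((5.0, 1.0), mean, inverse)
```

With few training spectra per class, a Hotelling T-squared threshold based on the F distribution would be a more conservative choice than the chi-square contour used here.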
Distributions of experimental protein structures on coarse-grained free energy landscapes
Liu, Jie; Jernigan, Robert L.
2015-01-01
Predicting conformational changes of proteins is needed in order to fully comprehend functional mechanisms. With the large number of available structures in sets of related proteins, it is now possible to directly visualize the clusters of conformations and their conformational transitions through the use of principal component analysis. The most striking observation about the distributions of the structures along the principal components is their highly non-uniform distributions. In this work, we use principal component analysis of experimental structures of 50 diverse proteins to extract the most important directions of their motions, sample structures along these directions, and estimate their free energy landscapes by combining knowledge-based potentials and entropy computed from elastic network models. When these resulting motions are visualized upon their coarse-grained free energy landscapes, the basis for conformational pathways becomes readily apparent. Using three well-studied proteins, T4 lysozyme, serum albumin, and sarco-endoplasmic reticular Ca2+ adenosine triphosphatase (SERCA), as examples, we show that such free energy landscapes of conformational changes provide meaningful insights into the functional dynamics and suggest transition pathways between different conformational states. As a further example, we also show that Monte Carlo simulations on the coarse-grained landscape of HIV-1 protease can directly yield pathways for force-driven conformational changes. PMID:26723638
Davis, Harley T.; Aelion, C. Marjorie; McDermott, Suzanne; Lawson, Andrew B.
2009-01-01
Determining sources of neurotoxic metals in rural and urban soils is important for mitigating human exposure. Surface soil from four areas with significant clusters of mental retardation and developmental delay (MR/DD) in children, and one control site were analyzed for nine metals and characterized by soil type, climate, ecological region, land use and industrial facilities using readily-available GIS-based data. Kriging, principal component analysis (PCA) and cluster analysis (CA) were used to identify commonalities of metal distribution. Three MR/DD areas (one rural and two urban) had similar soil types and significantly higher soil metal concentrations. PCA and CA results suggested that Ba, Be and Mn were consistently from natural sources; Pb and Hg from anthropogenic sources; and As, Cr, Cu, and Ni from both sources. Arsenic had low commonality estimates, was highly associated with a third PCA factor, and had a complex distribution, complicating mitigation strategies to minimize concentrations and exposures. PMID:19361902
Statistical and clustering analysis for disturbances: A case study of voltage dips in wind farms
Garcia-Sanchez, Tania; Gomez-Lazaro, Emilio; Muljadi, Eduard; ...
2016-01-28
This study proposes and evaluates an alternative statistical methodology to analyze a large number of voltage dips. For a given voltage dip, a set of lengths is first identified to characterize the root mean square (rms) voltage evolution along the disturbance, deduced from partial linearized time intervals and trajectories. Principal component analysis and K-means clustering processes are then applied to identify rms-voltage patterns and propose a reduced number of representative rms-voltage profiles from the linearized trajectories. This reduced group of averaged rms-voltage profiles enables the representation of a large number of disturbances, which offers a visual and graphical representation of their evolution along the events, aspects that were not previously considered in other contributions. The complete process is evaluated on real voltage dips collected in intense field-measurement campaigns carried out in a wind farm in Spain over several years. The results are included in this paper.
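The idea of reducing many dips to a few averaged rms-voltage profiles can be sketched with plain K-means. This sketch is a simplification under stated assumptions: it skips the principal component analysis step, treats each dip as a short vector of per-unit rms values sampled at the same instants, and initialises centroids from the first k profiles. The function name `kmeans` and the toy values are hypothetical.

```python
def kmeans(profiles, k, max_iter=100):
    """Lloyd's K-means on equal-length profiles; returns (centroids, labels)."""
    centroids = [list(p) for p in profiles[:k]]  # deterministic initialisation
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = []
        for p in profiles:
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            new_labels.append(dists.index(min(dists)))
        if new_labels == labels:
            break  # assignments stable, so centroids are already up to date
        labels = new_labels
        # Update step: each centroid becomes the average profile of its members.
        for j in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels

# Toy per-unit rms profiles: two shallow dips and two deep dips (assumed values).
dips = [
    [1.0, 0.8, 0.8, 1.0],
    [1.0, 0.2, 0.2, 1.0],
    [1.0, 0.9, 0.9, 1.0],
    [1.0, 0.3, 0.3, 1.0],
]
representative, membership = kmeans(dips, k=2)
```

Each centroid is the averaged profile of its cluster, which is the sense in which a handful of profiles can stand in for a large set of recorded dips.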
Milanović, Vesna; Osimani, Andrea; Pasquini, Marina; Aquilanti, Lucia; Garofalo, Cristiana; Taccari, Manuela; Cardinali, Federica; Riolo, Paola; Clementi, Francesca
2016-06-16
This study was aimed at investigating the occurrence of 11 transferable antibiotic resistance (AR) genes [erm(A), erm(B), erm(C), vanA, vanB, tet(M), tet(O), tet(S), tet(K), mecA, blaZ] in 11 species of marketed edible insects (small crickets powder, small crickets, locusts, mealworm larvae, giant waterbugs, black ants, winged termite alates, rhino beetles, mole crickets, silkworm pupae, and black scorpions) in order to provide a first baseline for risk assessment. Among the AR genes under study, tet(K) occurred with the highest frequency, followed by erm(B), tet(S) and blaZ. High variability was seen among the samples in terms of the occurrence of different AR determinants. Cluster Analysis and Principal Coordinates Analysis allowed the 11 samples to be grouped in two main clusters, one including all but one of the samples produced in Thailand and the other including those produced in the Netherlands. Copyright © 2016 Elsevier B.V. All rights reserved.
Ben Ayed, Rayda; Ben Hassen, Hanen; Ennouri, Karim; Rebai, Ahmed
2016-12-01
The genetic diversity of 22 olive tree cultivars (Olea europaea L.) sampled from different Mediterranean countries was assessed using 5 SNP markers (FAD2.1; FAD2.3; CALC; SOD and ANTHO3) located in four different genes. The genotyping analysis of the 22 cultivars with 5 SNP loci revealed 11 alleles (average 2.2 per locus). The dendrogram based on cultivar genotypes revealed three clusters consistent with the cultivars classification. In addition, the results obtained with the five SNPs were compared to those obtained with the SSR markers using bioinformatic analyses and by computing a cophenetic correlation coefficient, indicating the usefulness of the UPGMA method for clustering plant genotypes. Based on principal coordinate analysis using a similarity matrix, the first two coordinates explained 54.94 % of the total variance. This work provides a more comprehensive explanation of the diversity available in Tunisian olive cultivars, and an important contribution for olive breeding and olive oil authenticity.
Mojarrad, Mehran; Hosseini Sarghein, Siavash; Sonboli, Ali
2018-05-16
Chemical diversity of the essential oils of twenty wild populations of Tanacetum polycephalum Sch. Bip. was investigated. The aerial parts of T. polycephalum were collected at full flowering stage from West Azerbaijan Province of Iran, air-dried, and hydrodistilled to produce essential oils. The essential oils were analyzed by GC-FID and GC-MS. A total of forty compounds were identified, accounting for 96.4-99.9% of the total oils. The principal compounds were cis-thujone (0-82.3%), trans-thujone (0-79.8%), camphor (1.3-75.0%), 1,8-cineole (4.5-43.3%), borneol (1.0-36.2%) and bornyl acetate (0-26.8%). Hierarchical cluster analysis based on the percentages (>0.5%) of the essential oils components was carried out to determine the chemical diversity among the populations studied. The cluster analysis resulted in the identification of four main chemotypes namely: 'camphor + 1,8-cineole', 'mixed', 'cis-thujone' and 'trans-thujone'.
Net-phytoplankton communities in the Western Boundary Currents and their environmental correlations
NASA Astrophysics Data System (ADS)
Chen, Yunyan; Sun, Xiaoxia; Zhun, Mingliang
2018-03-01
This study investigated net-phytoplankton biomass, species composition, the phytoplankton abundance horizontal distribution, and the correlations between net-phytoplankton communities and mesoscale structure that were derived from the net samples taken from the Western Boundary Currents during summer, 2014. A total of 199 phytoplankton species belonging to 61 genera in four phyla were identified. The dominant species included Climacodium frauenfeldianum, Thalassiothrix longissima, Rhizosolenia styliformis var. styliformis, Pyrocystis noctiluca, Ceratium trichoceros, and Trichodesmium thiebautii. Four phytoplankton communities were divided by cluster analysis and the clusters were mainly associated with the North Equatorial Counter Current (NECC), the North Equatorial Current (NEC), the Subtropical Counter Current (STCC), and the Luzon Current (LC), respectively. The lowest phytoplankton cell abundance and the highest Trichodesmium filament abundance were recorded in the STCC region. The principal component analysis showed that T. thiebautii preferred warm and nutrient poor water. There was also an increase in phytoplankton abundance and biomass near 5°N in the NECC region, where they benefit from upwellings and eddies.
Tuttolomondo, Teresa; Dugo, Giacomo; Ruberto, Giuseppe; Leto, Claudio; Napoli, Edoardo M; Cicero, Nicola; Gervasi, Teresa; Virga, Giuseppe; Leone, Raffaele; Licata, Mario; La Bella, Salvatore
2015-01-01
In this study the chemical characterisation of the essential oils of 10 Sicilian Rosmarinus officinalis L. biotypes is reported. The main goal of this work was to analyse the relationship between the essential oil yield and the geographical distribution of the plants. The essential oils were analysed by GC-FID and GC-MS. Hierarchical cluster analysis and principal component analysis were used to cluster biotypes according to the chemical composition of their essential oils. The essential oil yield ranged from 0.8 to 2.3 (v/w). In total 82 compounds were identified, representing 96.7-99.9% of the essential oil. The most represented compounds in the essential oils were 1,8-cineole, linalool, α-terpineol, verbenone, α-pinene, limonene, bornyl acetate and terpinolene. The results show that the essential oil yield of the 10 biotypes is affected by the environmental characteristics of the sampling sites while the chemical composition is linked to the genetic characteristics of different biotypes.
Wu, Xiao; Yin, Hao; Shi, Zebin; Chen, Yangyang; Qi, Kaijie; Qiao, Xin; Wang, Guoming; Cao, Peng; Zhang, Shaoling
2018-01-01
An evaluation of fruit wax components will provide us with valuable information for pear breeding and enhancing fruit quality. Here, we dissected the epicuticular wax concentration, composition and structure of mature fruits from 35 pear cultivars belonging to five different species and hybrid interspecies. A total of 146 epicuticular wax compounds were detected, and the wax composition and concentration varied dramatically among species, with the highest level of 1.53 mg/cm2 in Pyrus communis and the lowest level of 0.62 mg/cm2 in Pyrus pyrifolia. Field emission scanning electron microscopy (FESEM) analysis showed amorphous structures of the epicuticular wax crystals of different pear cultivars. Cluster analysis revealed that the Pyrus bretschneideri cultivars were grouped much closer to Pyrus pyrifolia and Pyrus ussuriensis, and the Pyrus sinkiangensis cultivars were clustered into a distant group. Based on the principal component analysis (PCA), the cultivars could be divided into three groups and five groups according to seven main classes of epicuticular wax compounds and 146 wax compounds, respectively. PMID:29875784
Washio, Kana; Oka, Takashi; Abdalkader, Lamia; Muraoka, Michiko; Shimada, Akira; Oda, Megumi; Sato, Hiaki; Takata, Katsuyoshi; Kagami, Yoshitoyo; Shimizu, Norio; Kato, Seiichi; Kimura, Hiroshi; Nishizaki, Kazunori; Yoshino, Tadashi; Tsukahara, Hirokazu
2017-11-01
The human herpes virus, Epstein-Barr virus (EBV), is a known oncogenic virus and plays important roles in life-threatening T/NK-cell lymphoproliferative disorders (T/NK-cell LPD) such as hypersensitivity to mosquito bite (HMB), chronic active EBV infection (CAEBV), and NK/T-cell lymphoma/leukemia. During the clinical courses of HMB and CAEBV, patients frequently develop malignant lymphomas and the diseases passively progress sequentially. In the present study, gene expression of CD16(-)CD56(+), EBV(+) HMB, CAEBV, NK-lymphoma, and NK-leukemia cell lines, which were established from patients, was analyzed using oligonucleotide microarrays and compared to that of CD56 bright CD16 dim/- NK cells from healthy donors. Principal components analysis showed that CAEBV and NK-lymphoma cells were relatively closely located, indicating that they had similar expression profiles. Unsupervised hierarchical clustering analyses of microarray data and gene ontology analysis revealed specific gene clusters and identified several candidate genes responsible for disease that can be used to discriminate each category of NK-LPD and NK-cell lymphoma/leukemia.
Chen, Lin; Liu, Yuetao; Guo, Qingfeng; Zheng, Qingxia; Zhang, Wancun
2018-05-11
A systematic study on the metabolome differences between wild Ophiocordyceps sinensis and artificially cultured Cordyceps militaris was conducted using liquid chromatography-mass spectrometry. Principal component analysis and orthogonal projection on latent structure-discriminant analysis results showed that C. militaris grown on solid rice medium (R-CM) and C. militaris grown on tussah pupa (T-CM) evidently separated from each other and from wild O. sinensis, indicating metabolome differences among wild O. sinensis, R-CM and T-CM. The metabolome differences between R-CM and T-CM indicated that C. militaris could adapt to the culture medium by differential metabolic regulation. Hierarchical clustering analysis was further performed to cluster the differential metabolites and samples based on their metabolic similarity. The higher content of amino acids (pyroglutamic acid, glutamic acid, histidine, phenylalanine and arginine), unsaturated fatty acids (linolenic acid and linoleic acid), peptides, mannitol, adenosine and succinoadenosine in O. sinensis makes it an excellent choice as a traditional Chinese medicine for invigoration or nutritional supplementation. A composition similar to that of O. sinensis and easy cultivation make artificially cultured C. militaris a possible alternative to O. sinensis. Copyright © 2018 John Wiley & Sons, Ltd.
Szymanska-Chargot, M; Chylinska, M; Kruk, B; Zdunek, A
2015-01-22
The aim of this work was to quantitatively and qualitatively determine the composition of the cell wall material from apples during development by means of Fourier transform infrared (FT-IR) spectroscopy. The FT-IR region of 1500-800 cm(-1), containing characteristic bands for galacturonic acid, hemicellulose and cellulose, was examined using principal component analysis (PCA), k-means clustering and partial least squares (PLS). The samples were differentiated by development stage and cultivar using PCA and k-means clustering. PLS calibration models for galacturonic acid, hemicellulose and cellulose content from FT-IR spectra were developed and validated with the reference data. PLS models were tested using the root-mean-square errors of cross-validation for contents of galacturonic acid, hemicellulose and cellulose, which were 8.30 mg/g, 4.08% and 1.74%, respectively. It was shown that FT-IR spectroscopy combined with chemometric methods has potential for fast and reliable determination of the main constituents of fruit cell walls. Copyright © 2014 Elsevier Ltd. All rights reserved.
Cao, Zhen; Wang, Zhenjie; Shang, Zhonglin; Zhao, Jiancheng
2017-01-01
Fourier-transform infrared spectroscopy (FTIR) with the attenuated total reflectance technique was used to distinguish Rhodobryum roseum from its four adulterants. The FTIR spectra of six samples in the range from 4000 cm-1 to 600 cm-1 were obtained. The second-derivative transformation test was used to identify the small and nearby absorption peaks. A cluster analysis was performed to classify the spectra in a dendrogram based on the spectral similarity. Principal component analysis (PCA) was used to classify the species of six moss samples. A cluster analysis with PCA was used to identify different genera. However, some species of the same genus exhibited highly similar chemical components and FTIR spectra. Fourier self-deconvolution and discrete wavelet transform (DWT) were used to enhance the differences among the species with similar chemical components and FTIR spectra. Three scales were selected as the feature-extracting space in the DWT domain. The results show that FTIR spectroscopy with chemometrics is suitable for identifying Rhodobryum roseum and its adulterants.
NASA Astrophysics Data System (ADS)
Alevizos, Evangelos; Snellen, Mirjam; Simons, Dick; Siemes, Kerstin; Greinert, Jens
2018-06-01
This study applies three classification methods exploiting the angular dependence of acoustic seafloor backscatter along with high resolution sub-bottom profiling for seafloor sediment characterization in the Eckernförde Bay, Baltic Sea Germany. This area is well suited for acoustic backscatter studies due to its shallowness, its smooth bathymetry and the presence of a wide range of sediment types. Backscatter data were acquired using a Seabeam1180 (180 kHz) multibeam echosounder and sub-bottom profiler data were recorded using a SES-2000 parametric sonar transmitting 6 and 12 kHz. The high density of seafloor soundings allowed extracting backscatter layers for five beam angles over a large part of the surveyed area. A Bayesian probability method was employed for sediment classification based on the backscatter variability at a single incidence angle, whereas Maximum Likelihood Classification (MLC) and Principal Components Analysis (PCA) were applied to the multi-angle layers. The Bayesian approach was used for identifying the optimum number of acoustic classes because cluster validation is carried out prior to class assignment and class outputs are ordinal categorical values. The method is based on the principle that backscatter values from a single incidence angle express a normal distribution for a particular sediment type. The resulting Bayesian classes were well correlated to median grain sizes and the percentage of coarse material. The MLC method uses angular response information from five layers of training areas extracted from the Bayesian classification map. The subsequent PCA analysis is based on the transformation of these five layers into two principal components that comprise most of the data variability. These principal components were clustered in five classes after running an external cluster validation test. 
In general, both methods, MLC and PCA, separated the various sediment types effectively, showing good agreement (kappa >0.7) with the Bayesian approach, which also correlates well with ground truth data (r2 > 0.7). In addition, sub-bottom data were used in conjunction with the Bayesian classification results to characterize acoustic classes with respect to their geological and stratigraphic interpretation. The joint interpretation of seafloor and sub-seafloor data sets proved to be an efficient approach for a better understanding of seafloor backscatter patchiness and for discriminating acoustically similar classes in different geological/bathymetric settings.
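The agreement figure quoted above (kappa > 0.7) can be illustrated with a short sketch. It assumes the statistic is Cohen's kappa computed cell by cell between two classification maps, which the abstract does not state explicitly; the function name `cohen_kappa` and the toy class labels are hypothetical.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa between two label sequences of equal length."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of cells given the same class by both maps.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the marginal class frequencies, summed over classes.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both maps use a single identical class everywhere
    return (observed - expected) / (1.0 - expected)

# Toy maps flattened to lists of per-cell classes (assumed values).
bayesian_map = ["mud", "mud", "sand", "sand"]
pca_map = ["mud", "mud", "sand", "mud"]
agreement = cohen_kappa(bayesian_map, pca_map)
```

Kappa discounts the agreement expected by chance, so it is lower than the raw fraction of matching cells (0.75 in the toy maps).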
Inference from clustering with application to gene-expression microarrays.
Dougherty, Edward R; Barrera, Junior; Brun, Marcel; Kim, Seungchan; Cesar, Roberto M; Chen, Yidong; Bittner, Michael; Trent, Jeffrey M
2002-01-01
There are many algorithms to cluster sample data points based on nearness or a similarity measure. Often the implication is that points in different clusters come from different underlying classes, whereas those in the same cluster come from the same class. Stochastically, the underlying classes represent different random processes. The inference is that clusters represent a partition of the sample points according to which process they belong. This paper discusses a model-based clustering toolbox that evaluates cluster accuracy. Each random process is modeled as its mean plus independent noise, sample points are generated, the points are clustered, and the clustering error is the number of points clustered incorrectly according to the generating random processes. Various clustering algorithms are evaluated based on process variance and the key issue of the rate at which algorithmic performance improves with increasing numbers of experimental replications. The model means can be selected by hand to test the separability of expected types of biological expression patterns. Alternatively, the model can be seeded by real data to test the expected precision of that output or the extent of improvement in precision that replication could provide. In the latter case, a clustering algorithm is used to form clusters, and the model is seeded with the means and variances of these clusters. Other algorithms are then tested relative to the seeding algorithm. Results are averaged over various seeds. Output includes error tables and graphs, confusion matrices, principal-component plots, and validation measures. Five algorithms are studied in detail: K-means, fuzzy C-means, self-organizing maps, hierarchical Euclidean-distance-based and correlation-based clustering. The toolbox is applied to gene-expression clustering based on cDNA microarrays using real data. Expression profile graphics are generated and error analysis is displayed within the context of these profile graphics. 
A large amount of generated output is available over the web.
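The clustering error defined in the abstract above, the number of points clustered incorrectly with respect to the generating processes, can be sketched as follows. Because cluster labels are arbitrary, the sketch takes the minimum error over all relabellings. It assumes integer labels in range(k) and a small k, since the search is factorial in k; `simulate` and `clustering_error` are hypothetical names, not part of the toolbox described.

```python
import random
from itertools import permutations

def simulate(means, sigma, n_per_class, seed=0):
    """Draw points as class mean plus independent Gaussian noise."""
    rng = random.Random(seed)
    points, labels = [], []
    for label, mean in enumerate(means):
        for _ in range(n_per_class):
            points.append([m + rng.gauss(0.0, sigma) for m in mean])
            labels.append(label)
    return points, labels

def clustering_error(true_labels, cluster_labels, k):
    """Minimum number of misclustered points over all cluster-to-class mappings."""
    best = len(true_labels)
    for mapping in permutations(range(k)):
        # mapping[c] is the class assigned to cluster c under this relabelling.
        errors = sum(mapping[c] != t for t, c in zip(true_labels, cluster_labels))
        best = min(best, errors)
    return best

points, truth = simulate([[0.0, 0.0], [5.0, 5.0]], sigma=1.0, n_per_class=3)
example_error = clustering_error([0, 0, 1, 1, 2, 2], [1, 1, 0, 0, 2, 0], k=3)
```

Any clustering algorithm can be placed between `simulate` and `clustering_error`; repeating the run for increasing `n_per_class` gives the kind of error-versus-replication curve the abstract refers to.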
Maheux, Andrée F; Sellam, Adnane; Piché, Yves; Boissinot, Maurice; Pelletier, René; Boudreau, Dominique K; Picard, François J; Trépanier, Hélène; Boily, Marie-Josée; Ouellette, Marc; Roy, Paul H; Bergeron, Michel G
2016-12-01
Successful treatment of a Candida infection relies on 1) accurate identification of the pathogenic fungus and 2) its susceptibility to antifungal drugs. In the present study we investigated the level of correlation between phylogenetical evolution and susceptibility of pathogenic Candida spp. to antifungal drugs. For this, we compared a phylogenetic tree, assembled with the concatenated sequences (2475-bp) of the ATP2, TEF1, and TUF1 genes from 20 representative Candida species, with published minimal inhibitory concentrations (MIC) of the four principal antifungal drug classes commonly used in the treatment of candidiasis: polyenes, triazoles, nucleoside analogues, and echinocandins. The phylogenetic tree revealed three distinct phylogenetic clusters among Candida species. Species within a given phylogenetic cluster have generally similar susceptibility profiles to antifungal drugs and species within Clusters II and III were less sensitive to antifungal drugs than Cluster I species. These results showed that phylogenetical relationship between clusters and susceptibility to several antifungal drugs could be used to guide therapy when only species identification is available prior to information pertaining to its resistance profile. An extended study comprising a large panel of clinical samples should be conducted to confirm the efficiency of this approach in the treatment of candidiasis. Copyright © 2016. Published by Elsevier B.V.
Recognizing patterns of visual field loss using unsupervised machine learning
NASA Astrophysics Data System (ADS)
Yousefi, Siamak; Goldbaum, Michael H.; Zangwill, Linda M.; Medeiros, Felipe A.; Bowd, Christopher
2014-03-01
Glaucoma is a potentially blinding optic neuropathy that results in a decrease in visual sensitivity. Visual field abnormalities (decreased visual sensitivity on psychophysical tests) are the primary means of glaucoma diagnosis. One form of visual field testing is Frequency Doubling Technology (FDT), which tests sensitivity at 52 points within the visual field. Like other psychophysical tests used in clinical practice, FDT results yield specific patterns of defect indicative of the disease. We used a Gaussian Mixture Model with Expectation Maximization (GEM), in which EM estimates the model parameters, to automatically separate FDT data into clusters of normal and abnormal eyes. Principal component analysis (PCA) was used to decompose each cluster into different axes (patterns). FDT measurements were obtained from 1,190 eyes with normal FDT results and 786 eyes with abnormal (i.e., glaucomatous) FDT results, recruited from a university-based, longitudinal, multi-center, clinical study on glaucoma. The GEM input was the 52-point FDT threshold sensitivities for all eyes. The optimal GEM model separated the FDT fields into 3 clusters. Cluster 1 contained 94% normal fields (94% specificity), and clusters 2 and 3 combined contained 77% abnormal fields (77% sensitivity). For clusters 1, 2 and 3, the optimal numbers of PCA-identified axes were 2, 2 and 5, respectively. GEM with PCA successfully separated FDT fields from healthy and glaucoma eyes and identified familiar glaucomatous patterns of loss.
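The expectation-maximization step behind a Gaussian mixture model can be illustrated in one dimension. The sketch below is not the study's pipeline: the study clustered 52-dimensional FDT fields, whereas this example fits two components to synthetic scalar values, and the data, starting means, and iteration count are assumptions.

```python
import math
import random


def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(xs, means, iters=100):
    """EM for a 1-D Gaussian mixture; `means` supplies the initial component means."""
    k = len(means)
    means = list(means)
    variances = [1.0] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each observation.
        resp = []
        for x in xs:
            dens = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means, and variances from the responsibilities.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(xs)
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var, 1e-6)  # floor keeps a component from collapsing
    return weights, means, variances


# Synthetic stand-in data: two well-separated groups of scalar measurements.
rng = random.Random(1)
xs = [rng.gauss(0.0, 1.0) for _ in range(100)] + [rng.gauss(8.0, 1.0) for _ in range(100)]
weights, means, variances = fit_gmm_1d(xs, means=(min(xs), max(xs)))
low_mean, high_mean = sorted(means)
```

Each observation can then be assigned to the component with the largest responsibility, which is the clustering step; a per-cluster PCA, as described in the abstract, would follow on the assigned members.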
Long-term surface EMG monitoring using K-means clustering and compressive sensing
NASA Astrophysics Data System (ADS)
Balouchestani, Mohammadreza; Krishnan, Sridhar
2015-05-01
In this work, we present an advanced K-means clustering algorithm based on Compressed Sensing (CS) theory in combination with the K-Singular Value Decomposition (K-SVD) method for clustering long-term recordings of surface electromyography (sEMG) signals. Long-term monitoring of sEMG signals aims to record the electrical activity produced by muscles, which is a very useful procedure for treatment and diagnostic purposes as well as for detection of various pathologies. The proposed algorithm is examined for three scenarios of sEMG signals: a healthy person (sEMG-Healthy), a patient with myopathy (sEMG-Myopathy), and a patient with neuropathy (sEMG-Neuropathy). The proposed algorithm can easily scan large datasets of long-term sEMG recordings. We test the proposed algorithm with Principal Component Analysis (PCA) and Linear Correlation Coefficient (LCC) dimensionality reduction methods. Then, the output of the proposed algorithm is fed to K-Nearest Neighbours (K-NN) and Probabilistic Neural Network (PNN) classifiers in order to calculate the clustering performance. The proposed algorithm achieves a classification accuracy of 99.22%. This ability allows reducing Average Classification Error (ACE) by 17%, Training Error (TE) by 9%, and Root Mean Square Error (RMSE) by 18%. The proposed algorithm also reduces clustering energy consumption by 14% compared to the existing K-Means clustering algorithm.
Zhang, Bing; Song, Xianfang; Zhang, Yinghua; Han, Dongmei; Tang, Changyuan; Yu, Yilei; Ma, Ying
2012-05-15
Water quality is a critical factor that influences human health and the quantity and quality of grain production in semi-humid and semi-arid areas. The Songnen Plain is one of the grain bases in China, as well as one of the three major distribution regions of soda saline-alkali soil in the world. To assess the water quality, surface water and groundwater were sampled and analyzed by fuzzy membership analysis and multivariate statistics. By fuzzy membership analysis, the surface water samples were grouped into classes I, IV and V, while the groundwater samples were grouped into classes I, II, III and V. The water samples were grouped into four categories according to the irrigation water quality assessment diagrams of the USDA. Most water samples fell in categories C1-S1, C2-S2 and C3-S3. Three groups were generated from hierarchical cluster analysis. Four principal components were extracted from principal component analysis. The indicators for water quality assessment identified by principal component analysis were Na, HCO(3), NO(3), Fe, Mn and EC. We conclude that surface water and shallow groundwater are suitable for irrigation, while the reservoir and deep groundwater upstream are resources for drinking. The naturally occurring Fe and Mn ions should be removed from water used for drinking. Control of sodium and salinity hazards is required for irrigation. Integrated management of surface water and groundwater for drinking and irrigation is needed to solve the water issues. Copyright © 2012 Elsevier Ltd. All rights reserved.
Mansfeldt, Cresten B.; Rowe, Annette R.; Heavner, Gretchen L. W.; Zinder, Stephen H.
2014-01-01
A cDNA-microarray was designed and used to monitor the transcriptomic profile of Dehalococcoides mccartyi strain 195 (in a mixed community) respiring various chlorinated organics, including chloroethenes and 2,3-dichlorophenol. The cultures were continuously fed in order to establish steady-state respiration rates and substrate levels. The organization of array data into a clustered heat map revealed two major experimental partitions. This partitioning in the data set was further explored through principal component analysis. The first two principal components separated the experiments into those with slow (1.6 ± 0.6 μM Cl−/h)- and fast (22.9 ± 9.6 μM Cl−/h)-respiring cultures. Additionally, the transcripts with the highest loadings in these principal components were identified, suggesting that those transcripts were responsible for the partitioning of the experiments. By analyzing the transcriptomes (n = 53) across experiments, relationships among transcripts were identified, and hypotheses about the relationships between electron transport chain members were proposed. One hypothesis, that the hydrogenases Hup and Hym and the formate dehydrogenase-like oxidoreductase (DET0186-DET0187) form a complex (as displayed by their tight clustering in the heat map analysis), was explored using a nondenaturing protein separation technique combined with proteomic sequencing. Although these proteins did not migrate as a single complex, DET0112 (an FdhB-like protein encoded in the Hup operon) was found to comigrate with DET0187 rather than with the catalytic Hup subunit DET0110. On closer inspection of the genome annotations of all Dehalococcoides strains, the DET0185-to-DET0187 operon was found to lack a key subunit, an FdhB-like protein. Therefore, on the basis of the transcriptomic, genomic, and proteomic evidence, the place of the missing subunit in the DET0185-to-DET0187 operon is likely filled by recruiting a subunit expressed from the Hup operon (DET0112). PMID:25063656
Buccheri, Maria A; Spina, Sonia; Ruberto, Concetta; Lombardo, Turi; Labie, Dominique; Ragusa, And Angela
2013-01-01
Fetal hemoglobin (Hb F) is the principal ameliorating factor of β-thalassemia (β-thal) and sickle cell disease. Persistent production in adult life is a quantitative trait regulated by loci inside or outside the β-globin gene cluster. From genome-wide association studies, principal quantitative trait loci (QTL) (accounting for 50.0% of Hb F variability in different populations) have been identified in the BCL11A gene, HBS1L-MYB intergenic polymorphism and the β-globin gene cluster itself. In this study, we analyzed quantitative trait haplotypes in two Sicilian families with extremely mild β-thal and unusually high Hb F expression, in order to examine possible genetic background variations in a similar β-thalassemic phenotype. This study redefines the linkage disequilibrium blocks at these loci, but also shows slight differences between probands in haplotype combinations which could reflect different mechanisms of high Hb F production in patients with β-thal. We proposed a haplotype-based approach as a useful tool for the understanding of β-thal phenotype variation in patients with similar β-thalassemic backgrounds in an attempt to answer the recurring question of why patients with the same β-thalassemic genotype show different phenotypes.
Fetal exposure markers of dioxins and dioxin-like PCBs.
Lampa, Erik; Eguchi, Akifumi; Todaka, Emiko; Mori, Chisato
2018-04-01
Fetal exposure to polychlorinated biphenyls (PCBs), polychlorinated dibenzo-p-dioxins (PCDDs), and polychlorinated dibenzofurans (PCDFs) has been associated with a number of adverse health outcomes. Although the placenta acts as a barrier between the mother and the fetus, these contaminants transfer through the placenta, exposing the fetus. Several studies have investigated placental transfer, but few have assessed the co-variation among these contaminants. Maternal blood, cord blood, and cord tissue were collected from 41 Japanese mother-infant pairs and analyzed for dioxin-like PCBs and PCDD/Fs. Hierarchical cluster analysis followed by principal component analysis was used to assess the co-variation. Two stable clusters of dioxin-like PCBs were found in maternal and cord blood. One cluster of low/medium chlorinated dioxin-like PCBs was present in all three matrices with 2,3',4,4',5-PeCB(#118) and 3,3',4,4',5-PeCB(#126) explaining the majority of the clusters' variances. Medium/high chlorinated dioxin-like PCBs clustered in maternal blood and cord blood but not in cord tissue. 2,3,4,4',5-PeCB(#114) and 2,3,3',4,4',5,5'-HpCB(#189) explained the majority of the clusters' variances. There was a substantial correlation between the sum of dioxin-like PCBs and total PCDD/F in all three matrices. The sum of the four suggested PCBs plus 3,3',4,4'-TeCB(#77) correlated well with total PCDD/F in all three matrices. Apart from the dioxin-like PCBs, little co-variation existed among the studied contaminants. The five PCBs can be used as fetal exposure markers for dioxins and dioxin-like PCBs in maternal and cord blood, respectively. In cord tissue, additional, more highly chlorinated dioxin-like PCBs need to be measured as well.
He, Yan; Caporaso, J Gregory; Jiang, Xiao-Tao; Sheng, Hua-Fang; Huse, Susan M; Rideout, Jai Ram; Edgar, Robert C; Kopylova, Evguenia; Walters, William A; Knight, Rob; Zhou, Hong-Wei
2015-01-01
The operational taxonomic unit (OTU) is widely used in microbial ecology. Reproducibility in microbial ecology research depends on the reliability of OTU-based 16S ribosomal subunit RNA (rRNA) analyses. Here, we report that many hierarchical and greedy clustering methods produce unstable OTUs, with membership that depends on the number of sequences clustered. If OTUs are regenerated with additional sequences or samples, sequences originally assigned to a given OTU can be split into different OTUs. Alternatively, sequences assigned to different OTUs can be merged into a single OTU. This OTU instability affects alpha-diversity analyses such as rarefaction curves, beta-diversity analyses such as distance-based ordination (for example, Principal Coordinate Analysis (PCoA)), and the identification of differentially represented OTUs. Our results show that the proportion of unstable OTUs varies for different clustering methods. We found that the closed-reference method is the only one that produces completely stable OTUs, with the caveat that sequences that do not match a pre-existing reference sequence collection are discarded. As a compromise to the factors listed above, we propose using an open-reference method to enhance OTU stability. This type of method clusters sequences against a database and includes unmatched sequences by clustering them via a relatively stable de novo clustering method. OTU stability is an important consideration when analyzing microbial diversity and is a feature that should be taken into account during the development of novel OTU clustering methods.
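The merging behaviour described in this abstract can be reproduced with a toy example. The sketch below assumes single-linkage clustering (one hierarchical method) with a Hamming-distance threshold; the three short sequences and the threshold are hypothetical and chosen only to make the effect visible.

```python
def hamming(a, b):
    """Number of mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def single_linkage_otus(seqs, max_diff):
    """Group sequences into OTUs: two sequences are linked if they differ at
    no more than `max_diff` positions, and links are followed transitively."""
    parent = list(range(len(seqs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            if hamming(seqs[i], seqs[j]) <= max_diff:
                parent[find(i)] = find(j)

    groups = {}
    for i, s in enumerate(seqs):
        groups.setdefault(find(i), set()).add(s)
    return sorted(sorted(g) for g in groups.values())


# Two sequences that differ at two positions form separate OTUs ...
before = single_linkage_otus(["AAAAAAAA", "AAAAAATT"], max_diff=1)
# ... until an intermediate sequence is added, which links them into one OTU.
after = single_linkage_otus(["AAAAAAAA", "AAAAAATT", "AAAAAAAT"], max_diff=1)
```

Here `before` holds two OTUs and `after` holds one, so the membership of the original two sequences changed only because a third sequence was added. A closed-reference assignment avoids this because each sequence is compared to a fixed reference set rather than to the other input sequences.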
Kebir, Sied; Khurshid, Zain; Gaertner, Florian C.; Essler, Markus; Hattingen, Elke; Fimmers, Rolf; Scheffler, Björn; Herrlinger, Ulrich; Bundschuh, Ralph A.; Glas, Martin
2017-01-01
Rationale: Timely detection of pseudoprogression (PSP) is crucial for the management of patients with high-grade glioma (HGG) but remains difficult. Textural features of O-(2-[18F]fluoroethyl)-L-tyrosine positron emission tomography (FET-PET) mirror tumor uptake heterogeneity; some of them may be associated with tumor progression. Methods: Fourteen patients with HGG and suspected of PSP underwent FET-PET imaging. A set of 19 conventional and textural FET-PET features were evaluated and subjected to unsupervised consensus clustering. The final diagnosis of true progression vs. PSP was based on follow-up MRI using RANO criteria. Results: Three robust clusters have been identified based on 10 predominantly textural FET-PET features. None of the patients with PSP fell into cluster 2, which was associated with high values for textural FET-PET markers of uptake heterogeneity. Three out of 4 patients with PSP were assigned to cluster 3 that was largely associated with low values of textural FET-PET features. By comparison, tumor-to-normal brain ratio (TNRmax) at the optimal cutoff 2.1 was less predictive of PSP (negative predictive value 57% for detecting true progression, p=0.07 vs. 75% with cluster 3, p=0.04). Principal Conclusions: Clustering based on textural O-(2-[18F]fluoroethyl)-L-tyrosine PET features may provide valuable information in assessing the elusive phenomenon of pseudoprogression. PMID:28030820
Spatial assessment of air quality patterns in Malaysia using multivariate analysis
NASA Astrophysics Data System (ADS)
Dominick, Doreena; Juahir, Hafizan; Latif, Mohd Talib; Zain, Sharifuddin M.; Aris, Ahmad Zaharin
2012-12-01
This study aims to investigate possible sources of air pollutants and the spatial patterns within the eight selected Malaysian air monitoring stations based on a two-year database (2008-2009). Multivariate analysis was applied to the dataset. It incorporated Hierarchical Agglomerative Cluster Analysis (HACA) to assess the spatial patterns, Principal Component Analysis (PCA) to determine the major sources of the air pollution and Multiple Linear Regression (MLR) to assess the percentage contribution of each air pollutant. The HACA results grouped the eight monitoring stations into three different clusters, based on the characteristics of the air pollutants and meteorological parameters. The PCA analysis showed that the major sources of air pollution were emissions from motor vehicles, aircraft, industries and areas of high population density. The MLR analysis demonstrated that the main pollutant contributing to variability in the Air Pollutant Index (API) at all stations was particulate matter with a diameter of less than 10 μm (PM10). Further MLR analysis showed that the main air pollutant influencing the high concentration of PM10 was carbon monoxide (CO). This was due to combustion processes, particularly originating from motor vehicles. Meteorological factors such as ambient temperature, wind speed and humidity were also noted to influence the concentration of PM10.
ERIC Educational Resources Information Center
Xu, Zeyu; Nichols, Austin
2010-01-01
The gold standard in making causal inference on program effects is a randomized trial. Most randomization designs in education randomize classrooms or schools rather than individual students. Such "clustered randomization" designs have one principal drawback: They tend to have limited statistical power or precision. This study aims to…
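The loss of precision under clustered randomization is commonly summarized by the design effect, 1 + (m - 1)ρ, where m is the cluster size and ρ is the intraclass correlation. The sketch below applies that standard formula under the assumption of equal cluster sizes; the numbers are hypothetical and are not taken from the study.

```python
def design_effect(cluster_size, icc):
    """Variance inflation from randomizing clusters of equal size."""
    return 1.0 + (cluster_size - 1) * icc


def effective_sample_size(n_clusters, cluster_size, icc):
    """Number of independently randomized individuals giving the same precision."""
    return n_clusters * cluster_size / design_effect(cluster_size, icc)


# Hypothetical trial: 40 classrooms of 25 students, intraclass correlation 0.2.
deff = design_effect(cluster_size=25, icc=0.2)
n_eff = effective_sample_size(n_clusters=40, cluster_size=25, icc=0.2)
```

With these inputs the 1,000 enrolled students carry roughly the information of 172 independently randomized students, which is why adding clusters usually helps more than enlarging them.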
Cluster Supervision Practices in Primary School of Jimma Zone
ERIC Educational Resources Information Center
Afework, E. A.; Frew, A. T.; Abeya, G. G.
2017-01-01
The main objective of this study was to assess the supervisory practice of cluster resource centre (CRC) supervisors in Jimma Zone primary schools. To achieve this purpose, the descriptive survey design was employed. Data were collected from 238 randomly selected teachers, and 60 school principals with a response rate of 98.6%. Moreover, 12 CRC…
Penalized unsupervised learning with outliers
Witten, Daniela M.
2013-01-01
We consider the problem of performing unsupervised learning in the presence of outliers – that is, observations that do not come from the same distribution as the rest of the data. It is known that in this setting, standard approaches for unsupervised learning can yield unsatisfactory results. For instance, in the presence of severe outliers, K-means clustering will often assign each outlier to its own cluster, or alternatively may yield distorted clusters in order to accommodate the outliers. In this paper, we take a new approach to extending existing unsupervised learning techniques to accommodate outliers. Our approach is an extension of a recent proposal for outlier detection in the regression setting. We allow each observation to take on an “error” term, and we penalize the errors using a group lasso penalty in order to encourage most of the observations’ errors to exactly equal zero. We show that this approach can be used in order to develop extensions of K-means clustering and principal components analysis that result in accurate outlier detection, as well as improved performance in the presence of outliers. These methods are illustrated in a simulation study and on two gene expression data sets, and connections with M-estimation are explored. PMID:23875057
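One way to read the idea in this abstract is as an alternating scheme: cluster the error-corrected observations with K-means, then update each observation's error term by group soft-thresholding its residual, so that most errors become exactly zero. The sketch below follows that reading and is not the paper's algorithm; the objective (squared residual plus `lam` times the Euclidean norm of each error), the data, the starting centers, and the value of `lam` are all assumptions.

```python
import math


def group_soft_threshold(residual, lam):
    """Minimizer over e of ||residual - e||^2 + lam * ||e||_2."""
    norm = math.sqrt(sum(r * r for r in residual))
    if norm <= lam / 2.0:
        return [0.0] * len(residual)  # small residuals get an exactly-zero error
    scale = 1.0 - lam / (2.0 * norm)
    return [scale * r for r in residual]


def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def penalized_kmeans(points, centers, lam, iters=25):
    """Alternate K-means on (point - error) with a group-lasso update of the errors."""
    centers = [list(c) for c in centers]
    errors = [[0.0] * len(points[0]) for _ in points]
    labels = [0] * len(points)
    for _ in range(iters):
        cleaned = [[x - e for x, e in zip(p, err)] for p, err in zip(points, errors)]
        # K-means assignment and center update on the error-corrected points.
        labels = [min(range(len(centers)), key=lambda j: sq_dist(c, centers[j]))
                  for c in cleaned]
        for j in range(len(centers)):
            members = [c for c, lab in zip(cleaned, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
        # Error update: shrink each residual toward zero as a group.
        errors = [group_soft_threshold([x - m for x, m in zip(p, centers[lab])], lam)
                  for p, lab in zip(points, labels)]
    return labels, centers, errors


# Two tight hypothetical clusters plus one gross outlier (the last point).
points = [(0.0, 0.0), (0.2, 0.0), (0.0, 0.2), (0.2, 0.2),
          (5.0, 5.0), (5.2, 5.0), (5.0, 5.2), (5.2, 5.2),
          (20.0, 0.0)]
labels, centers, errors = penalized_kmeans(points, centers=[(0.0, 0.0), (5.0, 5.0)], lam=4.0)
flagged = [i for i, e in enumerate(errors) if any(v != 0.0 for v in e)]
```

Observations with a nonzero error term are the ones flagged as outliers. Because the error update caps each point's pull on its center at `lam / 2`, the center update behaves like a Huber-type estimate, which is consistent with the connection to M-estimation that the abstract mentions.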
Spatial and kinematic structure of Monoceros star-forming region
NASA Astrophysics Data System (ADS)
Costado, M. T.; Alfaro, E. J.
2018-05-01
The principal aim of this work is to study the velocity field in the Monoceros star-forming region using the radial velocity data available in the literature, as well as astrometric data from the Gaia first release. This region is a large star-forming complex formed by two associations named Monoceros OB1 and OB2. We have collected radial velocity data for more than 400 stars in an area of 8 × 12 deg² and distances for more than 200 objects. We apply a clustering analysis in the subspace of the phase space formed by angular coordinates and radial velocity or distance data using the Spectrum of Kinematic Grouping methodology. We found four and three spatial groupings in the radial velocity and distance variables, respectively, corresponding to the Local arm, the central clusters forming the associations, and the Perseus arm.
ERIC Educational Resources Information Center
Kean, Teoh Hong; Kannan, Sathiamoorthy; Piaw, Chua Yan
2017-01-01
The main aim of this research paper was to ascertain the relationship between principal leadership practices and teacher commitment. The study was conducted by administering a quantitative survey questionnaire to 384 secondary school teachers, ranging from band 1 to band 6, in Malaysia, using multi-stage stratified cluster random sampling. This study was using…
Differentiation of aflatoxigenic and non-aflatoxigenic strains of Aspergilli by FT-IR spectroscopy.
Atkinson, Curtis; Pechanova, Olga; Sparks, Darrell L; Brown, Ashli; Rodriguez, Jose M
2014-01-01
Fourier transform infrared spectroscopy (FT-IR) is a well-established and widely accepted methodology to identify and differentiate diverse microbial species. In this study, FT-IR was used to differentiate 20 strains of ubiquitous and agronomically important phytopathogens of Aspergillus flavus and Aspergillus parasiticus. By analyzing their spectral profiles via principal component and cluster analysis, differentiation was achieved between the aflatoxin-producing and nonproducing strains of both fungal species. This study thus indicates that FT-IR coupled to multivariate statistics can rapidly differentiate strains of Aspergilli based on their toxigenicity.
NASA Astrophysics Data System (ADS)
Mignani, Anna G.; Ciaccheri, Leonardo; Cimato, Antonio; Sani, Graziano; Smith, Peter R.
2004-03-01
Absorption spectroscopy and multi-angle scattering measurements in the visible spectral range are innovatively used to analyze samples of extra virgin olive oils coming from selected areas of Tuscany, an Italian region famous for the production of extra virgin olive oil. The measured spectra are processed by means of the Principal Component Analysis method, so as to create a 3D map capable of clustering the Tuscan oils within the wider area of Italian extra virgin olive oils.
Cardiometabolic Risk Clustering in Spinal Cord Injury: Results of Exploratory Factor Analysis
2013-01-01
Background: Evidence suggests an elevated prevalence of cardiometabolic risks among persons with spinal cord injury (SCI); however, the unique clustering of risk factors in this population has not been fully explored. Objective: The purpose of this study was to describe unique clustering of cardiometabolic risk factors differentiated by level of injury. Methods: One hundred twenty-one subjects (mean 37 ± 12 years; range, 18–73) with chronic C5 to T12 motor complete SCI were studied. Assessments included medical histories, anthropometrics and blood pressure, and fasting serum lipids, glucose, insulin, and hemoglobin A1c (HbA1c). Results: The most common cardiometabolic risk factors were overweight/obesity, high levels of low-density lipoprotein (LDL-C), and low levels of high-density lipoprotein (HDL-C). Risk clustering was found in 76.9% of the population. Exploratory principal component factor analysis using varimax rotation revealed a 3–factor model in persons with paraplegia (65.4% variance) and a 4–factor solution in persons with tetraplegia (73.3% variance). The differences between groups were emphasized by the varied composition of the extracted factors: Lipid Profile A (total cholesterol [TC] and LDL-C), Body Mass-Hypertension Profile (body mass index [BMI], systolic blood pressure [SBP], and fasting insulin [FI]); Glycemic Profile (fasting glucose and HbA1c), and Lipid Profile B (TG and HDL-C). BMI and SBP formed a separate factor only in persons with tetraplegia. Conclusions: Although the majority of the population with SCI has risk clustering, the composition of the risk clusters may be dependent on level of injury, based on a factor analysis group comparison. This is clinically plausible and relevant as tetraplegics tend to be hypo- to normotensive and more sedentary, resulting in lower HDL-C and a greater propensity toward impaired carbohydrate metabolism. PMID:23960702
2012-01-01
Background: The use of growth-promoters in beef cattle, despite the EU ban, remains a frequent practice. The use of transcriptomic markers has already been proposed to identify indirect evidence of anabolic hormone treatment. So far, such an approach has been tested in experimentally treated animals. Here, for the first time, commercial samples were analyzed. Results: Quantitative determination of Dexamethasone (DEX) residues in the urine collected at the slaughterhouse was performed by Liquid Chromatography-Mass Spectrometry (LC-MS). DNA-microarray technology was used to obtain transcriptomic profiles of skeletal muscle in commercial samples and negative controls. LC-MS confirmed the presence of low levels of DEX residues in the urine of the commercial samples that were suspect on histological classification. Principal Component Analysis (PCA) on microarray data identified two clusters of samples. One cluster included negative controls and a subset of commercial samples, while a second cluster included part of the specimens collected at the slaughterhouse together with positives for corticosteroid treatment based on thymus histology and LC-MS. Functional analysis of the differentially expressed genes (3961) between the two groups provided further evidence that animals clustering with positive samples might have been treated with corticosteroids. These suspect samples could be reliably classified with a specific classification tool (Prediction Analysis of Microarray) using just two genes. Conclusions: Despite broad variation observed in gene expression profiles, the present study showed that DNA-microarrays can be used to find transcriptomic signatures of putative anabolic treatments and that gene expression markers could represent a useful screening tool. PMID:23110699
Lee, Jennifer E.; Watson, David; Frey-Law, Laura A.
2012-01-01
Background: Recent studies suggest an underlying three- or four-factor structure explains the conceptual overlap and distinctiveness of several negative emotionality and pain-related constructs. However, the validity of these latent factors for predicting pain has not been examined. Methods: A cohort of 189 (99F; 90M) healthy volunteers completed eight self-report negative emotionality and pain-related measures (Eysenck Personality Questionnaire-Revised; Positive and Negative Affect Schedule; State-Trait Anxiety Inventory; Pain Catastrophizing Scale; Fear of Pain Questionnaire; Somatosensory Amplification Scale; Anxiety Sensitivity Index; Whiteley Index). Using principal axis factoring, three primary latent factors were extracted: General Distress; Catastrophic Thinking; and Pain-Related Fear. Using these factors, individuals clustered into three subgroups of high, moderate, and low negative emotionality responses. Experimental pain was induced via intramuscular acidic infusion into the anterior tibialis muscle, producing local (infusion site) and/or referred (anterior ankle) pain and hyperalgesia. Results: Pain outcomes differed between clusters (multivariate analysis of variance and multinomial regression), with individuals in the highest negative emotionality cluster reporting the greatest local pain (p = 0.05), mechanical hyperalgesia (pressure pain thresholds; p = 0.009) and greater odds (2.21 OR) of experiencing referred pain compared to the lowest negative emotionality cluster. Conclusion: Our results provide support for three latent psychological factors explaining the majority of the variance between several pain-related psychological measures, and that individuals in the high negative emotionality subgroup are at increased risk for (1) acute local muscle pain; (2) local hyperalgesia; and (3) referred pain using a standardized nociceptive input. PMID:23165778
20 Years Spatial-Temporal Analysis of Dengue Fever and Hemorrhagic Fever in Mexico.
Hernández-Gaytán, Sendy Isarel; Díaz-Vásquez, Francisco Javier; Duran-Arenas, Luis Gerardo; López Cervantes, Malaquías; Rothenberg, Stephen J
2017-10-01
Dengue Fever (DF) is a human vector-borne disease and a major public health problem worldwide. In Mexico, DF and Dengue Hemorrhagic Fever (DHF) cases have increased in recent years. The aim of this study was to identify variations in the spatial distribution of DF and DHF cases over time using space-time statistical analysis and geographic information systems. Official data of DF and DHF cases were obtained in 32 states from 1995-2015. Space-time scan statistics were used to determine the space-time clusters of DF and DHF cases nationwide, and a geographic information system was used to display the location of clusters. A total of 885,748 DF cases were registered in the 32 states from 1995-2015, of which 13.4% (n = 119,174) corresponded to DHF. The most likely cluster of DF (relative risk = 25.5) contained the states of Jalisco, Colima, and Nayarit, on the Pacific coast in 2009, and the most likely cluster of DHF (relative risk = 8.5) was in the states of Chiapas, Tabasco, Campeche, Oaxaca, Veracruz, Quintana Roo, Yucatán, Puebla, Morelos, and Guerrero principally on the Gulf coast over 2006-2015. The geographic distribution of DF and DHF cases has increased in recent years and cases are significantly clustered in two coastal areas (Pacific and Gulf of Mexico). This provides the basis for further investigation of risk factors as well as interventions in specific areas. Copyright © 2018 IMSS. Published by Elsevier Inc. All rights reserved.
Transcriptional profiling reveals progeroid Ercc1-/Δ mice as a model system for glomerular aging
2013-01-01
Background: Aging-related kidney diseases are a major health concern. Currently, models to study renal aging are lacking. Due to a reduced life-span progeroid models hold the promise to facilitate aging studies and allow examination of tissue-specific changes. Defects in genome maintenance in the Ercc1-/Δ progeroid mouse model result in premature aging and typical age-related pathologies. Here, we compared the glomerular transcriptome of young and aged Ercc1-deficient mice to young and aged WT mice in order to establish a novel model for research of aging-related kidney disease. Results: In a principal component analysis, age and genotype emerged as first and second principal components. Hierarchical clustering of all 521 genes differentially regulated between young and old WT and young and old Ercc1-/Δ mice showed cluster formation between young WT and Ercc1-/Δ as well as old WT and Ercc1-/Δ samples. An unexpectedly high number of 77 genes were differentially regulated in both WT and Ercc1-/Δ mice (p < 0.0001). GO term enrichment analysis revealed these genes to be involved in immune and inflammatory response, cell death, and chemotaxis. In a network analysis, these genes were part of insulin signaling, chemokine and cytokine signaling and extracellular matrix pathways. Conclusion: Beyond insulin signaling, we find chemokine and cytokine signaling as well as modifiers of extracellular matrix composition to be subject to major changes in the aging glomerulus. At the level of the transcriptome, the pattern of gene activities is similar in the progeroid Ercc1-/Δ mouse model constituting a valuable tool for future studies of aging-associated glomerular pathologies. PMID:23947592
Transcriptional profiling reveals progeroid Ercc1(-/Δ) mice as a model system for glomerular aging.
Schermer, Bernhard; Bartels, Valerie; Frommolt, Peter; Habermann, Bianca; Braun, Fabian; Schultze, Joachim L; Roodbergen, Marianne; Hoeijmakers, Jan Hj; Schumacher, Björn; Nürnberg, Peter; Dollé, Martijn Et; Benzing, Thomas; Müller, Roman-Ulrich; Kurschat, Christine E
2013-08-16
Aging-related kidney diseases are a major health concern. Currently, models to study renal aging are lacking. Due to a reduced life-span progeroid models hold the promise to facilitate aging studies and allow examination of tissue-specific changes. Defects in genome maintenance in the Ercc1(-/Δ) progeroid mouse model result in premature aging and typical age-related pathologies. Here, we compared the glomerular transcriptome of young and aged Ercc1-deficient mice to young and aged WT mice in order to establish a novel model for research of aging-related kidney disease. In a principal component analysis, age and genotype emerged as first and second principal components. Hierarchical clustering of all 521 genes differentially regulated between young and old WT and young and old Ercc1(-/Δ) mice showed cluster formation between young WT and Ercc1(-/Δ) as well as old WT and Ercc1(-/Δ) samples. An unexpectedly high number of 77 genes were differentially regulated in both WT and Ercc1(-/Δ) mice (p < 0.0001). GO term enrichment analysis revealed these genes to be involved in immune and inflammatory response, cell death, and chemotaxis. In a network analysis, these genes were part of insulin signaling, chemokine and cytokine signaling and extracellular matrix pathways. Beyond insulin signaling, we find chemokine and cytokine signaling as well as modifiers of extracellular matrix composition to be subject to major changes in the aging glomerulus. At the level of the transcriptome, the pattern of gene activities is similar in the progeroid Ercc1(-/Δ) mouse model constituting a valuable tool for future studies of aging-associated glomerular pathologies.
Wang, Xinwang; Wadl, Phillip A; Wood-Jones, Alicia; Windham, Gary; Trigiano, Robert N; Scruggs, Mary; Pilgrim, Candace; Baird, Richard
2012-12-01
Simple sequence repeat (SSR) markers were developed from the Aspergillus flavus expressed sequence tag (EST) database to conduct an analysis of genetic relationships of Aspergillus isolates from numerous host species and geographical regions, but primarily from the United States. Twenty-nine primers were designed from 362 tri-nucleotide EST-SSR sequences. Eighteen polymorphic loci were used to genotype 96 Aspergillus species isolates. The number of alleles detected per locus ranged from 2 to 24 with a mean of 8.2 alleles. Haploid diversity ranged from 0.28 to 0.91. A genetic distance matrix was used to perform principal coordinates analysis (PCA) and to generate dendrograms using the unweighted pair group method with arithmetic mean (UPGMA). Two principal coordinates explained more than 75 % of the total variation among the isolates. One clade was identified for A. flavus isolates (n = 87) with the other Aspergillus species (n = 7) using PCA, but five distinct clusters were present when the other taxa were excluded from the analysis. Six groups were noted when the EST-SSR data were compared using UPGMA. However, the latter PCA or UPGMA comparison resulted in no direct associations with host species, geographical region or aflatoxin production. Furthermore, there was no direct correlation to visible morphological features such as sclerotial types. The isolates from the Mississippi Delta region, which contained the largest percentage of isolates, did not show any unusual clustering except for isolates K32, K55, and 199. Further studies of these three isolates are warranted to evaluate their pathogenicity, aflatoxin production potential, additional gene sequences (e.g., RPB2), and morphological comparisons.
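As an illustration of the UPGMA step mentioned in this abstract, the following is a minimal sketch of average-linkage agglomeration on a small distance matrix. The labels and distances are hypothetical, and the function name `upgma` is an assumption for illustration; it is not the software the authors used.

```python
from itertools import combinations

def upgma(labels, dist):
    """Average-linkage (UPGMA) clustering.

    labels: list of unique item names.
    dist: dict mapping frozenset({a, b}) to the distance between a and b.
    Returns the merges in order as (members_a, members_b, node_height).
    """
    def average(a, b):
        # Unweighted mean of all original pairwise distances between clusters.
        total = sum(dist[frozenset((x, y))] for x in a for y in b)
        return total / (len(a) * len(b))

    clusters = [(name,) for name in labels]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: average(*pair))
        # In an ultrametric tree the node sits at half the cluster distance.
        merges.append((a, b, average(a, b) / 2.0))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges

# Hypothetical distances between three isolates (not data from the study).
dist = {
    frozenset(("A", "B")): 2.0,
    frozenset(("A", "C")): 6.0,
    frozenset(("B", "C")): 6.0,
}
merges = upgma(["A", "B", "C"], dist)
for step in merges:
    print(step)
```

The sketch recomputes cluster averages from the original distances at every step, which is simple but slow for large matrices; it is meant only to show the joining rule.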
Detecting most influencing courses on students grades using block PCA
NASA Astrophysics Data System (ADS)
Othman, Osama H.; Gebril, Rami Salah
2014-12-01
One of the modern solutions adopted in dealing with the problem of a large number of variables in statistical analyses is Block Principal Component Analysis (Block PCA). This modified technique can be used to reduce the vertical dimension (variables) of the n × p data matrix X by selecting a smaller number of variables (say m) containing most of the statistical information. These selected variables can then be employed in further investigations and analyses. Block PCA is an adapted multistage technique of the original PCA. It involves the application of Cluster Analysis (CA) and variable selection through the scores of sub-principal components (PCs). The application of Block PCA in this paper is a modified version of the original work of Liu et al. (2002). The main objective was to apply PCA to each group of variables (established using cluster analysis), instead of involving the whole large pack of variables, which was shown to be unreliable. In this work, Block PCA is used to reduce the size of a huge data matrix ((n = 41) × (p = 251)) consisting of the Grade Point Averages (GPA) of the students in 251 courses (variables) in the Faculty of Science at Benghazi University. In other words, we construct a smaller analytical data matrix of the students' GPAs with fewer variables containing most of the variation (statistical information) in the original database. By applying Block PCA, 12 courses were found to `absorb' most of the variation or influence from the original data matrix, and hence are worth keeping for future statistical exploration and analytical studies. In addition, the course Independent Study (Math.) was found to be the most influential course on students' GPA among the 12 selected courses.
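The multistage idea described above (group the variables, run PCA inside each group, keep a representative variable per group) can be sketched roughly as follows. This is an assumption-laden illustration: the blocks are taken as given (as if produced by a prior cluster analysis), the first component is found by power iteration on the covariance matrix, and the selection rule (largest absolute loading on the first component) is one plausible choice, not necessarily the rule used by the authors or by Liu et al. All names and data are hypothetical.

```python
import math

def first_pc_loadings(columns, iters=200):
    """Loadings of the first principal component, found by power
    iteration on the sample covariance matrix of the given variables."""
    n = len(columns[0])
    centred = []
    for col in columns:
        mean = sum(col) / n
        centred.append([x - mean for x in col])
    p = len(centred)
    cov = [[sum(a * b for a, b in zip(centred[i], centred[j])) / (n - 1)
            for j in range(p)] for i in range(p)]
    v = [1.0] * p  # start vector; assumed not orthogonal to the leading PC
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # degenerate block (no variance along v); stop
            break
        v = [x / norm for x in w]
    return v

def block_pca_select(data, blocks):
    """Pick, from each block, the variable with the largest absolute
    loading on that block's first principal component (assumed rule)."""
    selected = []
    for block in blocks:
        loadings = first_pc_loadings([data[name] for name in block])
        name, _ = max(zip(block, loadings), key=lambda pair: abs(pair[1]))
        selected.append(name)
    return selected

# Hypothetical grades of four students in three courses.
grades = {
    "calculus": [1.0, 2.0, 3.0, 4.0],
    "algebra": [2.0, 4.0, 6.0, 8.0],
    "botany": [5.0, 1.0, 3.0, 2.0],
}
blocks = [["calculus", "algebra"], ["botany"]]  # assumed clustering result
chosen = block_pca_select(grades, blocks)
print(chosen)
```

Working on the covariance matrix favours high-variance variables; standardising the columns first would give a correlation-based variant.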
NASA Astrophysics Data System (ADS)
Sierra-Pérez, Julián; Torres-Arredondo, M.-A.; Alvarez-Montoya, Joham
2018-01-01
Structural health monitoring consists of using sensors integrated within structures together with algorithms to perform load monitoring, damage detection, damage location, damage size and severity, and prognosis. One possibility is to use strain sensors to infer structural integrity by comparing patterns in the strain field between the pristine and damaged conditions. In previous works, the authors have demonstrated that it is possible to detect small defects based on strain field pattern recognition by using robust machine learning techniques. They have focused on methodologies based on principal component analysis (PCA) and on the development of several unfolding and standardization techniques, which allow dealing with multiple load conditions. However, before a real implementation of this approach in engineering structures, changes in the strain field due to conditions different from damage occurrence need to be isolated. Since load conditions may vary in most engineering structures and promote significant changes in the strain field, it is necessary to implement novel techniques for uncoupling such changes from those produced by damage occurrence. A damage detection methodology based on optimal baseline selection (OBS) by means of clustering techniques is presented. The methodology includes the use of hierarchical nonlinear PCA as a nonlinear modeling technique in conjunction with Q and nonlinear-T² damage indices. The methodology is experimentally validated using strain measurements obtained by 32 fiber Bragg grating sensors bonded to an aluminum beam under dynamic bending loads and simultaneously submitted to variations in its pitch angle. The results demonstrated the capability of the methodology for clustering data according to 13 different load conditions (pitch angles), performing the OBS and detecting six different damages induced in a cumulative way. 
The proposed methodology showed a true positive rate of 100% and a false positive rate of 1.28% at a 99% confidence level.
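For readers unfamiliar with the two rates quoted here, the sketch below shows how a true positive rate and a false positive rate are obtained from ground-truth damage labels and detector flags. The labels are hypothetical toy values and the function name is an assumption; nothing here reproduces the study's data or its damage indices.

```python
def detection_rates(is_damaged, is_flagged):
    """Return (true positive rate, false positive rate) for a detector.

    is_damaged: ground truth per measurement (True if damage is present).
    is_flagged: detector decision per measurement (True if damage reported).
    """
    pairs = list(zip(is_damaged, is_flagged))
    tp = sum(1 for d, f in pairs if d and f)
    fn = sum(1 for d, f in pairs if d and not f)
    fp = sum(1 for d, f in pairs if not d and f)
    tn = sum(1 for d, f in pairs if not d and not f)
    return tp / (tp + fn), fp / (fp + tn)

# Hypothetical example: two damaged cases, four pristine cases.
truth = [True, True, False, False, False, False]
flags = [True, True, True, False, False, False]
tpr, fpr = detection_rates(truth, flags)
print(tpr, fpr)
```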
NASA Astrophysics Data System (ADS)
Minguez, D. A.; Kodama, K. P.
2013-12-01
We present the preliminary results of a multi-faceted rock magnetic study conducted on 195 samples from the Oatka Creek member of the Marcellus formation, where it has been extracted from the subsurface as a drill core near Sunbury, PA. Samples were oriented based on bedding attitude observed within the core and were removed from the core at a spacing of ≈0.25 meters starting from the base (depth ≈ 500 meters) and spanning 51 meters of stratigraphic section. The results of measurements of the anisotropy of magnetic susceptibility (AMS) consistently demonstrate a nearly triaxial fabric with maximum principal axes clustering east-west and horizontal in geographic coordinates, nearly parallel to the direction of bedding strike. AMS minimum principal axes cluster near the pole to the bedding plane. Anisotropy of anhysteretic remanence (AAR) applied with a 100 mT peak field and a 97 μT bias field in 9 orientations demonstrates a markedly different fabric, with maximum principal axis clustering north-south and horizontal in geographic coordinates. Minimum principal axes of AAR cluster steeply (~60-70 degrees) to the west. The discrepancy between AAR and AMS fabrics likely indicates the AMS is dominated by paramagnetic clays, and thus may be interpreted as an east-west intersection lineation of clay particles dipping gently north or south. Paleomagnetic directions obtained using Alternating Field (AF) demagnetization in 5 mT steps up to 110 mT demonstrates a high coercivity remanence (>35 mT) with a south and shallow direction (D= 183.4 I=-14.7). This result is consistent with previous studies of the Marcellus formation and the Devonian Catskill red beds. Thermal demagnetization experiments demonstrate a similar magnetization removed by temperatures between 250 and 350 degrees Celsius, however, continued heating results in the acquisition of strong, inconsistent magnetizations likely the result of oxidizing iron sulfides. 
Thermal demagnetization of orthogonal partial ARMs applied in 100 mT and 50 mT peak fields was conducted on a subset of samples sealed in aluminum foil with alumina-silica cement to prevent oxidization. The results demonstrate that the low coercivity pARM is removed by 400 degrees Celsius and the high coercivity pARM, which is only 10% of the total remanence, is removed by 600 degrees Celsius. The results suggest the presence of low coercivity Fe sulfides and high coercivity magnetite. Lastly, time series analysis of bulk magnetic susceptibility using the Multi-Taper Method (MTM) demonstrates oscillations with a wavelength of 18 meters above the 99% confidence level with respect to the robust red noise. This wavelength may have a duration of 405 kyr given ancillary chronostratigraphic evidence.
Dietary patterns in middle-aged Irish men and women defined by cluster analysis.
Villegas, R; Salim, A; Collins, M M; Flynn, A; Perry, I J
2004-12-01
To identify and characterise dietary patterns in a middle-aged Irish population sample and study associations between these patterns, sociodemographic and anthropometric variables and major risk factors for cardiovascular disease. A cross-sectional study. A group of 1473 men and women were sampled from 17 general practice lists in the South of Ireland. A total of 1018 attended for screening, with a response rate of 69%. Participants completed a detailed health and lifestyle questionnaire and provided a fasting blood sample for glucose, lipids and homocysteine. Dietary intake was assessed using a standard food-frequency questionnaire adapted for use in the Irish population. The food-frequency questionnaire was a modification of that used in the UK arm of the European Prospective Investigation into Cancer study, which was based on that used in the US Nurses' Health Study. Dietary patterns were assessed primarily by K-means cluster analysis, following initial principal components analysis to identify the seeds. Three dietary patterns were identified. These clusters corresponded to a traditional Irish diet, a prudent diet and a diet characterised by high consumption of alcoholic drinks and convenience foods. Cluster 1 (Traditional Diet) had the highest intakes of saturated fat (SFA), monounsaturated fat (MUFA) and percentage of total energy from fat, and the lowest polyunsaturated fat (PUFA) intake and ratio of polyunsaturated to saturated fat (P:S). Cluster 2 (Prudent Diet) was characterised by significantly higher intakes of fibre, PUFA, P:S ratio and antioxidant vitamins (vitamins C and E), and lower intakes of total fat, MUFA, SFA and cholesterol. Cluster 3 (Alcohol & Convenience Foods) had the highest intakes of alcohol, protein, cholesterol, vitamin B(12), vitamin B(6), folate, iron, phosphorus, selenium and zinc, and the lowest intakes of PUFA, vitamin A and antioxidant vitamins (vitamins C and E). 
There were significant differences between clusters in gender distribution, smoking status, physical activity, body mass index, waist circumference and serum homocysteine concentrations. In this general population sample, cluster analysis methods yielded two major dietary patterns: prudent and traditional. The prudent dietary pattern is associated with other health-seeking behaviours. Study of dietary patterns will help elucidate links between diet and disease and contribute to the development of healthy eating guidelines for health promotion.
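The abstract describes K-means clustering started from seeds obtained by an initial principal components analysis. A minimal sketch of the seeded K-means part is given below; the seeds are simply passed in (the PCA step that would produce them is assumed, not shown), and the points are hypothetical two-dimensional intake scores.

```python
def seeded_kmeans(points, seeds, max_iter=100):
    """K-means with user-supplied starting centres (the 'seeds').

    points: list of equal-length numeric tuples.
    seeds: list of starting centres, one per cluster.
    Returns (labels, centres).
    """
    centres = [list(s) for s in seeds]
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centre by squared Euclidean distance.
        new_labels = [
            min(range(len(centres)),
                key=lambda k: sum((x - c) ** 2 for x, c in zip(pt, centres[k])))
            for pt in points
        ]
        if new_labels == labels:  # converged: assignments unchanged
            break
        labels = new_labels
        # Update step: move each centre to the mean of its members.
        for k in range(len(centres)):
            members = [pt for pt, lab in zip(points, labels) if lab == k]
            if members:
                centres[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centres

# Hypothetical standardised scores for four participants.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
seeds = [(0.0, 0.0), (10.0, 10.0)]  # assumed to come from a prior PCA
labels, centres = seeded_kmeans(points, seeds)
print(labels, centres)
```

Supplying seeds makes the result reproducible, which is the usual motivation for choosing starting centres deliberately rather than at random.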
Lubell, Jessica D; Brand, Mark H; Lehrer, Jonathan M; Holsinger, Kent E
2008-06-01
Japanese barberry (Berberis thunbergii DC.) is a widespread invasive plant that remains an important landscape shrub represented by ornamental, purple-leaved forms of the botanical variety atropurpurea. These forms differ greatly in appearance from feral plants, bringing into question whether they contribute to invasive populations or whether the invasions represent self-sustaining populations derived from the initial introduction of the species in the late 19th century. In this study we used amplified fragment length polymorphism (AFLP) markers to determine whether genetic contributions from B. t. var. atropurpurea are found within naturalized Japanese barberry populations in southern New England. Bayesian clustering of AFLP genotypes and principal coordinate analysis distinguished B. t. var. atropurpurea genotypes from 85 plants representing five invasive populations. While a single feral plant resembled B. t. var. atropurpurea phenotypically and fell within the same genetic cluster, all other naturalized plants sampled were genetically distinct from the purple-leaved genotypes. Seven plants from two different sites possessed morphology consistent with Berberis vulgaris (common barberry) or B. ×ottawensis (B. thunbergii × B. vulgaris). Genetic analysis placed these plants in two clusters separate from B. thunbergii. Although the Bayesian analysis indicated some introgression of B. t. var. atropurpurea and B. vulgaris, these genotypes have had limited influence on extant feral populations of B. thunbergii.
Feng, Jingjing; Chen, Xiaolin; Jia, Lei; Liu, Qizhen; Chen, Xiaojia; Han, Deming; Cheng, Jinping
2018-04-10
Wastewater treatment plants (WWTPs) are the most common form of industrial and municipal wastewater control. To evaluate the performance of wastewater treatment and the potential risk of treated wastewater to aquatic life and human health, the influent and effluent concentrations of nine toxic metals were determined in 12 full-scale WWTPs in Shanghai, China. The performance was evaluated based on national standards for reclamation and aquatic criteria published by US EPA, and by comparison with other full-scale WWTPs in different countries. Potential sources of heavy metals were recognized using partial correlation analysis, hierarchical clustering, and principal component analysis (PCA). Results indicated significant treatment effect on As, Cd, Cr, Cu, Hg, Mn, Pb, and Zn. The removal efficiencies ranged from 92% (Cr) to 16.7% (Hg). The results indicated potential acute and/or chronic effect of Cu, Ni, Pb, and Zn on aquatic life and potential harmful effect of As and Mn on human health for the consumption of water and/or organism. The results of partial correlation analysis, hierarchical clustering based on cosine distance, and PCA, which were consistent with each other, suggested common source of Cd, Cr, Cu, and Pb and common source of As, Hg, Mn, Ni, and Zn. Hierarchical clustering based on Jaccard similarity suggested common source of Cd, Hg, and Ni, which was statistically proved by Fisher's exact test.
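Two quantities used in this abstract lend themselves to a short worked example: removal efficiency and cosine distance between concentration profiles. The sketch assumes the conventional definition of removal efficiency, (influent − effluent) / influent, which the abstract does not spell out, and uses hypothetical concentrations.

```python
import math

def removal_efficiency(influent, effluent):
    """Percentage of a metal removed between plant inlet and outlet
    (conventional definition, assumed here)."""
    return 100.0 * (influent - effluent) / influent

def cosine_distance(u, v):
    """1 - cosine similarity of two concentration profiles."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm

# Hypothetical concentrations (same units) at inlet and outlet.
eff = removal_efficiency(50.0, 4.0)

# Hypothetical profiles of two metals across the same set of plants:
# proportional profiles have a cosine distance close to zero.
d_same = cosine_distance((1.0, 2.0), (2.0, 4.0))
d_orth = cosine_distance((1.0, 0.0), (0.0, 1.0))
print(eff, d_same, d_orth)
```

Because cosine distance ignores overall magnitude, two metals whose concentrations rise and fall together across plants end up close even if one is far more abundant, which is why it suits source-apportionment style clustering.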
Yang, Yongxin; Bu, Dengpan; Zhao, Xiaowei; Sun, Peng; Wang, Jiaqi; Zhou, Lingyun
2013-04-05
To aid in unraveling diverse genetic and biological unknowns, a proteomic approach was used to analyze the whey proteome in cow, yak, buffalo, goat, and camel milk based on the isobaric tag for relative and absolute quantification (iTRAQ) techniques. This analysis is the first to produce proteomic data for the milk from the above-mentioned animal species: 211 proteins have been identified and 113 proteins have been categorized according to molecular function, cellular components, and biological processes based on gene ontology annotation. The results of principal component analysis showed significant differences in proteomic patterns among goat, camel, cow, buffalo, and yak milk. Furthermore, 177 differentially expressed proteins were submitted to advanced hierarchical clustering. The resulting clustering pattern included three major sample clusters: (1) cow, buffalo, and yak milk; (2) goat, cow, buffalo, and yak milk; and (3) camel milk. Certain proteins were chosen as characterization traits for a given species: whey acidic protein and quinone oxidoreductase for camel milk, biglycan for goat milk, uncharacterized protein (Accession Number: F1MK50) for yak milk, clusterin for buffalo milk, and primary amine oxidase for cow milk. These results help reveal the quantitative milk whey proteome pattern for the analyzed species. This provides information for evaluating adulteration of milk from a specific species and may provide potential directions for application of specific milk protein production based on physiological differences among animal species.
Pes, Giovanni Mario; Delitala, Alessandro Palmerio; Errigo, Alessandra; Delitala, Giuseppe; Dore, Maria Pina
2016-06-01
Latent autoimmune diabetes in adults (LADA) which accounts for more than 10 % of all cases of diabetes is characterized by onset after age 30, absence of ketoacidosis, insulin independence for at least 6 months, and presence of circulating islet-cell antibodies. Its marked heterogeneity in clinical features and immunological markers suggests the existence of multiple mechanisms underlying its pathogenesis. The principal component (PC) analysis is a statistical approach used for finding patterns in data of high dimension. In this study the PC analysis was applied to a set of variables from a cohort of Sardinian LADA patients to identify a smaller number of latent patterns. A list of 11 variables including clinical (gender, BMI, lipid profile, systolic and diastolic blood pressure and insulin-free time period), immunological (anti-GAD65, anti-IA-2 and anti-TPO antibody titers) and genetic features (predisposing gene variants previously identified as risk factors for autoimmune diabetes) retrieved from clinical records of 238 LADA patients referred to the Internal Medicine Unit of University of Sassari, Italy, were analyzed by PC analysis. The predictive value of each PC on the further development of insulin dependence was evaluated using Kaplan-Meier curves. Overall 4 clusters were identified by PC analysis. In component PC-1, the dominant variables were: BMI, triglycerides, systolic and diastolic blood pressure and duration of insulin-free time period; in PC-2: genetic variables such as Class II HLA, CTLA-4 as well as anti-GAD65, anti-IA-2 and anti-TPO antibody titers, and the insulin-free time period predominated; in PC-3: gender and triglycerides; and in PC-4: total cholesterol. These components explained 18, 15, 12, and 12 %, respectively, of the total variance in the LADA cohort. The predictive power of insulin dependence of the four components was different. 
PC-2 (characterized mostly by high antibody titers and presence of predisposing genetic markers) showed a faster beta-cell failure, and PC-3 (characterized mostly by gender and high triglycerides) and PC-4 (high cholesterol) showed a slower beta-cell failure. PC-1 (including dyslipidemia and other metabolic dysfunctions) showed a mild beta-cell failure. In conclusion, variable clustering might be consistent with different pathogenic pathways and/or distinct immune mechanisms in LADA and could potentially help physicians improve the clinical management of these patients.
Clustering high dimensional data using RIA
DOE Office of Scientific and Technical Information (OSTI.GOV)
Aziz, Nazrina
2015-05-15
Clustering may simply represent a convenient method for organizing a large data set so that it can easily be understood and information can efficiently be retrieved. However, identifying clusters in high-dimensional data sets is a difficult task because of the curse of dimensionality. Another challenge in clustering is that some traditional functions cannot capture the pattern dissimilarity among objects. In this article, we used an alternative dissimilarity measurement called Robust Influence Angle (RIA) in the partitioning method. RIA is developed using the eigenstructure of the covariance matrix and robust principal component scores. We notice that it can obtain clusters easily and hence avoid the curse of dimensionality. It also manages to cluster large data sets with mixed numeric and categorical values.
Wang, Jian; Zhu, Jinmao; Huang, RuZhu; Yang, YuSheng
2012-07-01
We explored the rapid qualitative analysis of wheat cultivars with good lodging resistance by Fourier transform infrared (FTIR) spectroscopy and multivariate statistical analysis. FTIR imaging showed that wheat stem cell walls were mainly composed of cellulose, pectin, protein, and lignin. Principal components analysis (PCA) was used to eliminate multicollinearity among multiple peak absorptions. PCA revealed that the developmental internodes of wheat stems could be distributed from low to high along the load of the second principal component, which was consistent with the corresponding bands of cellulose in the FTIR spectra of the cell walls. Furthermore, four distinct stem populations could also be identified by spectral features related to their corresponding mechanical properties via PCA and cluster analysis. Histochemical staining of four types of wheat stems with various abilities to resist lodging revealed that cellulose contributed more than lignin to the ability to resist lodging. These results strongly suggested that the main cell wall component responsible for these differences was cellulose. Therefore, the combination of multivariate analysis and FTIR could rapidly screen wheat cultivars with good lodging resistance. Furthermore, the application of these methods to a much wider range of cultivars of unknown mechanical properties promises to be of interest.
NASA Astrophysics Data System (ADS)
Liu, Wen; Zhang, Yuying; Yang, Si; Han, Donghai
2018-05-01
A new technique to identify the floral sources of honeys is needed. Terahertz time-domain attenuated total reflection spectroscopy combined with chemometric methods was applied to discriminate different categories of honey (Medlar honey, Vitex honey, and Acacia honey). Principal component analysis (PCA), cluster analysis (CA) and partial least squares-discriminant analysis (PLS-DA) were used to extract information on the botanical origins of the honeys. The spectral range was also examined to increase the precision of the PLS-DA model. An accuracy of 88.46% for the validation set was obtained using the PLS-DA model in the 0.5-1.5 THz range. This work indicated that terahertz time-domain attenuated total reflection spectroscopy is a viable approach for evaluating the quality of honey rapidly.
[Identification of two varieties of Citri Fructus by fingerprint and chemometrics].
Su, Jing-hua; Zhang, Chao; Sun, Lei; Gu, Bing-ren; Ma, Shuang-cheng
2015-06-01
Citri Fructus identification by fingerprint and chemometrics was investigated in this paper. Twenty-three Citri Fructus samples were collected, representing the two varieties Citrus wilsonii and C. medica recorded in the Chinese Pharmacopoeia. HPLC chromatograms were obtained. The components were partly identified by reference substances, and then a common pattern was established for chemometric analysis. Similarity analysis, principal component analysis (PCA), partial least squares-discriminant analysis (PLS-DA) and a hierarchical cluster analysis heatmap were applied. The results indicated that C. wilsonii and C. medica could be ideally classified with a common pattern containing twenty-five characteristic peaks. In addition, preliminary pattern recognition verified the chemometric analytical results. Absolute peak area (APA) was used for the relevant quantitative analysis; the results showed the differences between the two varieties and are valuable for further quality control through the selection of characteristic components.
puma: a Bioconductor package for propagating uncertainty in microarray analysis.
Pearson, Richard D; Liu, Xuejun; Sanguinetti, Guido; Milo, Marta; Lawrence, Neil D; Rattray, Magnus
2009-07-09
Most analyses of microarray data are based on point estimates of expression levels and ignore the uncertainty of such estimates. By determining uncertainties from Affymetrix GeneChip data and propagating these uncertainties to downstream analyses it has been shown that we can improve results of differential expression detection, principal component analysis and clustering. Previously, implementations of these uncertainty propagation methods have only been available as separate packages, written in different languages. Previous implementations have also suffered from being very costly to compute, and in the case of differential expression detection, have been limited in the experimental designs to which they can be applied. puma is a Bioconductor package incorporating a suite of analysis methods for use on Affymetrix GeneChip data. puma extends the differential expression detection methods of previous work from the 2-class case to the multi-factorial case. puma can be used to automatically create design and contrast matrices for typical experimental designs, which can be used both within the package itself but also in other Bioconductor packages. The implementation of differential expression detection methods has been parallelised leading to significant decreases in processing time on a range of computer architectures. puma incorporates the first R implementation of an uncertainty propagation version of principal component analysis, and an implementation of a clustering method based on uncertainty propagation. All of these techniques are brought together in a single, easy-to-use package with clear, task-based documentation. For the first time, the puma package makes a suite of uncertainty propagation methods available to a general audience. These methods can be used to improve results from more traditional analyses of microarray data. puma also offers improvements in terms of scope and speed of execution over previously available methods. 
puma is recommended for anyone working with the Affymetrix GeneChip platform for gene expression analysis and can also be applied more generally.
Effect of the statin therapy on biochemical laboratory tests--a chemometrics study.
Durceková, Tatiana; Mocák, Ján; Boronová, Katarína; Balla, Ján
2011-01-05
Statins are the first-line choice for lowering total and LDL cholesterol levels and are very important medicaments for reducing the risk of coronary artery disease. The aim of this study is therefore to assess the results of biochemical tests characterizing the condition of 172 patients before and after administration of statins. For this purpose, several chemometric tools, namely principal component analysis, cluster analysis, discriminant analysis, logistic regression, KNN classification, ROC analysis, descriptive statistics and ANOVA, were used. Mutual relations of 11 biochemical laboratory tests, the patient's age and gender were investigated in detail. The results make it possible to evaluate the extent of the statin treatment in each individual case. They may also help in monitoring the dynamic progression of the disease. Copyright © 2010 Elsevier B.V. All rights reserved.
Effects of mutation, truncation, and temperature on the folding kinetics of a WW domain.
Maisuradze, Gia G; Zhou, Rui; Liwo, Adam; Xiao, Yi; Scheraga, Harold A
2012-07-20
The purpose of this work is to show how mutation, truncation, and change of temperature can influence the folding kinetics of a protein. This is accomplished by principal component analysis of molecular-dynamics-generated folding trajectories of the triple β-strand WW domain from formin binding protein 28 (FBP28) (Protein Data Bank ID: 1E0L) and its full-size, and singly- and doubly-truncated mutants at temperatures below and very close to the melting point. The reasons for biphasic folding kinetics [i.e., coexistence of slow (three-state) and fast (two-state) phases], including the involvement of a solvent-exposed hydrophobic cluster and another delocalized hydrophobic core in the folding kinetics, are discussed. New folding pathways are identified in free-energy landscapes determined in terms of principal components for full-size mutants. Three-state folding is found to be a main mechanism for folding the FBP28 WW domain and most of the full-size and truncated mutants. The results from the theoretical analysis are compared to those from experiment. Agreements and discrepancies between the theoretical and experimental results are discussed. Because of its importance in understanding protein kinetics and function, the diffusive mechanism by which the FBP28 WW domain and its full-size and truncated mutants explore their conformational space is examined in terms of the mean-square displacement and principal component analysis eigenvalue spectrum analyses. Subdiffusive behavior is observed for all studied systems. Copyright © 2012. Published by Elsevier Ltd.
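The mean-square displacement analysis mentioned above is commonly summarised by the exponent α in MSD(τ) ∝ τ^α, with α < 1 indicating subdiffusion. The sketch below estimates α from a one-dimensional trajectory (for instance, a projection onto one principal component) by a log-log least-squares fit. The function names, the lag choices, and the toy trajectory are assumptions for illustration, not the authors' procedure.

```python
import math

def mean_square_displacement(traj, lag):
    """Time-averaged squared displacement of a 1-D trajectory at a given lag."""
    diffs = [(traj[i + lag] - traj[i]) ** 2 for i in range(len(traj) - lag)]
    return sum(diffs) / len(diffs)

def diffusion_exponent(traj, lags):
    """Slope of log(MSD) against log(lag), fitted by least squares."""
    xs = [math.log(lag) for lag in lags]
    ys = [math.log(mean_square_displacement(traj, lag)) for lag in lags]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

# Toy check: uniform (ballistic) motion has MSD = lag**2, so the exponent is 2.
# A subdiffusive trajectory would give a value below 1.
ballistic = [float(t) for t in range(50)]
alpha = diffusion_exponent(ballistic, [1, 2, 4, 8])
print(alpha)
```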
Surnames in Honduras: A study of the population of Honduras through isonymy.
Herrera Paz, Edwin Francisco; Scapoli, Chiara; Mamolini, Elisabetta; Sandri, Massimo; Carrieri, Alberto; Rodriguez-Larralde, Alvaro; Barrai, Italo
2014-05-01
In this work, we investigated surname distribution in 4,348,021 Honduran electors with the aim of detecting population structure through the study of isonymy in three administrative levels: the whole nation, the 18 departments, and the 298 municipalities. For each administrative level, we studied the surname effective number, α, the total inbreeding, FIT, the random inbreeding, FST, and the local inbreeding, FIS. Principal components analysis, multidimensional scaling, and cluster analysis were performed on Lasker's distance matrix to detect the direction of surname diffusion and for a graphic representation of the surname relationship between different locations. The values of FIT, FST, and FIS display a variation of random inbreeding between the administrative levels in the Honduras population, which is attributed to the "Prefecture effect." Multivariate analyses of department data identified two main clusters, one south-western and the second north-eastern, with the Bay Islands and the eastern Gracias a Dios out of the main clusters. The results suggest that currently the population structure of this country is the result of the joint action of short-range directional migration and drift, with drift dominating over migration, and that population diffusion may have taken place mainly in the NW-SE direction. © 2014 John Wiley & Sons Ltd/University College London.
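To make the isonymy quantities more concrete, here is a small sketch that assumes the definitions conventionally used in surname studies: isonymy between two populations as the sum over surnames of the product of their relative frequencies, the effective surname number α as the reciprocal of within-population isonymy, and Lasker's distance as the negative logarithm of between-population isonymy. These definitions and the sample-size correction they omit are assumptions; the abstract does not state its formulas. The surname lists are hypothetical.

```python
import math
from collections import Counter

def surname_frequencies(surnames):
    """Relative frequency of each surname in one population."""
    counts = Counter(surnames)
    total = len(surnames)
    return {name: count / total for name, count in counts.items()}

def isonymy(freq_a, freq_b):
    """Probability that two people, one drawn from each population,
    share a surname."""
    return sum(p * freq_b.get(name, 0.0) for name, p in freq_a.items())

def effective_surname_number(freq):
    """Reciprocal of within-population isonymy (assumed definition of alpha)."""
    return 1.0 / isonymy(freq, freq)

def lasker_distance(freq_a, freq_b):
    """-ln of between-population isonymy (assumed definition).
    Undefined when the populations share no surname."""
    return -math.log(isonymy(freq_a, freq_b))

# Hypothetical electors in two municipalities.
town_a = surname_frequencies(["Paz", "Paz", "Lopez", "Lopez"])
town_b = surname_frequencies(["Paz", "Paz", "Paz", "Paz"])
alpha_a = effective_surname_number(town_a)
distance = lasker_distance(town_a, town_b)
print(alpha_a, distance)
```

A matrix of such distances between all pairs of departments or municipalities is the kind of input that the multivariate analyses in the abstract operate on.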
Cui, G F; Wu, L F; Wang, X N; Jia, W J; Duan, Q; Ma, L L; Jiang, Y L; Wang, J H
2014-07-29
Inter-simple sequence repeat (ISSR) markers were used to discriminate 62 lily cultivars of 5 hybrid series. Eight ISSR primers generated 104 bands in total, which all showed 100% polymorphism, and an average of 13 bands were amplified by each primer. Two software packages, POPGENE 1.32 and NTSYSpc 2.1, were used to analyze the data matrix. Our results showed that the observed number of alleles (NA), effective number of alleles (NE), Nei's genetic diversity (H), and Shannon's information index (I) were 1.9630, 1.4179, 0.2606, and 0.4080, respectively. The highest genetic similarity (0.9601) was observed between the Oriental x Trumpet and Oriental lilies, which indicated that the two hybrids had a close genetic relationship. An unweighted pair-group method with arithmetic means dendrogram showed that the 62 lily cultivars clustered into two discrete groups. The first group included the Oriental and OT cultivars, while the Asiatic, LA, and Longiflorum lilies were placed in the second cluster. The distribution of individuals in the principal component analysis was consistent with the clustering of the dendrogram. Fingerprints of all lily cultivars built from 8 primers could be separated completely. This study confirmed the effect and efficiency of ISSR identification in lily cultivars.
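The three diversity measures reported here (effective number of alleles, Nei's gene diversity, Shannon's information index) can be illustrated for a single locus from its allele frequencies. The sketch uses the textbook formulas NE = 1/Σp², H = 1 − Σp², and I = −Σ p ln p; how POPGENE estimates the allele frequencies from dominant ISSR bands is not covered, and the frequencies below are hypothetical.

```python
import math

def locus_diversity(allele_freqs):
    """Return (effective number of alleles, Nei's gene diversity,
    Shannon's information index) for one locus."""
    homozygosity = sum(p * p for p in allele_freqs)
    effective_alleles = 1.0 / homozygosity
    gene_diversity = 1.0 - homozygosity
    shannon = -sum(p * math.log(p) for p in allele_freqs if p > 0.0)
    return effective_alleles, gene_diversity, shannon

# Hypothetical biallelic locus with equal allele frequencies,
# which gives the maximum possible values for two alleles.
ne, h, shannon = locus_diversity((0.5, 0.5))
print(ne, h, shannon)
```

Values reported for a whole marker set, such as those in the abstract, are typically averages of these per-locus quantities.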
Zhang, Wanfeng; Zhu, Shukui; He, Sheng; Wang, Yanxin
2015-02-06
Using comprehensive two-dimensional gas chromatography coupled to time-of-flight mass spectrometry (GC×GC/TOFMS), volatile and semi-volatile organic compounds in crude oil samples from different reservoirs or regions were analyzed for the development of a molecular fingerprint database. Based on the GC×GC/TOFMS fingerprints of crude oils, principal component analysis (PCA) and cluster analysis were used to distinguish the oil sources and find biomarkers. As a supervised technique, the geological characteristics of crude oils, including thermal maturity, sedimentary environment etc., are assigned to the principal components. The results show that tri-aromatic steroid (TAS) series are the suitable marker compounds in crude oils for the oil screening, and the relative abundances of individual TAS compounds have excellent correlation with oil sources. In order to correct the effects of some other external factors except oil sources, the variables were defined as the content ratio of some target compounds and 13 parameters were proposed for the screening of oil sources. With the developed model, the crude oils were easily discriminated, and the result is in good agreement with the practical geological setting. Copyright © 2014 Elsevier B.V. All rights reserved.
Xu, Lingyang; Liu, Gang; Wang, Zhigang; Zhao, Fuping; Zhang, Li; Han, Xu; Du, Lixin; Liu, Chousheng
2014-01-01
Background China has numerous native domestic goat breeds; however, most studies have focused on genetic diversity within a few breeds and limited regions, and the demographic history and origin of Chinese goats remain unclear. The role of geographical structure in the domestication of Chinese goats has not been analyzed. In this study, the genetic relationships of Chinese indigenous goat populations were evaluated using 30 microsatellite markers. Methodology/Principal Findings Forty Chinese indigenous populations containing 2078 goats were sampled from different geographic regions of China. Moderate genetic diversity at the population level (HS of 0.644) and high population diversity at the species level (HT value of 0.737) were estimated. Significant moderate population differentiation was detected (FST value of 0.129). Significant excess homozygosity (FIS of 0.105) and recent population bottlenecks were detected in thirty-six populations. Neighbour-joining tree, principal components analysis and Bayesian clusters all revealed that Chinese goat populations could be subdivided into at least four genetic clusters: Southwest China, South China, Northwest China and East China. It was observed that the genetic diversity of Northern China goats was highest among these clusters. The results here suggested that the goat populations in Southwest China might be the earliest domestic goats in China. Conclusions/Significance Our results suggested that the current genetic structure of Chinese goats resulted from the special geographical structure, especially in Western China, and that the Western goat populations had been separated by the geographic structure (Hengduan Mountains and Qinling Mountains-Huaihe River Line) into two clusters: the Southwest and Northwest. It also indicated that the current genetic structure was caused mainly by geographical origin, in close accordance with the history of human migration throughout China. This study provides a fundamental genetic profile for the conservation of these populations and helps to better understand the domestication process and origin of Chinese goats. PMID:24718092
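The reported diversity values can be related through Nei's coefficient of gene differentiation, G_ST = (HT - HS) / HT, which is an analogue of FST. The abstract does not say which estimator produced the FST of 0.129, so the sketch below is only a rough consistency check under that assumption; the function name is hypothetical.

```python
def gst(h_total, h_within):
    """Nei's G_ST: share of total gene diversity found among populations."""
    return (h_total - h_within) / h_total

# HT and HS as reported in the abstract
value = gst(0.737, 0.644)
```

The result is about 0.126, close to the reported FST of 0.129; a small gap of this kind is expected when different estimators are used.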
Landsat-TM identification of Amblyomma variegatum (Acari: Ixodidae) habitats in Guadeloupe
NASA Technical Reports Server (NTRS)
Hugh-Jones, M.; Barre, N.; Nelson, G.; Wehnes, K.; Warner, J.; Garvin, J.; Garris, G.
1992-01-01
The feasibility of identifying specific habitats of the African bont tick, Amblyomma variegatum, from Landsat-TM images was investigated by comparing remotely sensed images of visible farms in Grande Terre (Guadeloupe) with field observations made in the same period of time (1986-1987). The different tick habitats could be separated using principal component analysis. The analysis clustered the sites by large and small variance of band values, and by vegetation and moisture indexes. It was found that herds in heterogeneous sites with large variances had more ticks than those in homogeneous or low variance sites. Within the heterogeneous sites, those with high vegetation and moisture indexes had more ticks than those with low values.
NASA Astrophysics Data System (ADS)
Panahi, Nima S.
We studied the problem of understanding and computing the essential features and dynamics of molecular motions through the development of two theories for two different systems. First, we studied the process of the Berry Pseudorotation of PF5 and the rotations it induces in the molecule through its natural and intrinsic geometric nature by setting it in the language of fiber bundles and graph theory. With these tools, we successfully extracted the essentials of the process' loops and induced rotations. The infinite number of pseudorotation loops were broken down into a small set of essential loops called "super loops", with their intrinsic properties and link to the physical movements of the molecule extensively studied. In addition, only the three "self-edge loops" generated any induced rotations, and then only a finite number of classes of them. Second, we studied applying the statistical methods of Principal Components Analysis (PCA) and Principal Coordinate Analysis (PCO) to capture only the most important changes in Argon clusters so as to reduce computational costs and graph the potential energy surface (PES) in three dimensions respectively. Both methods proved successful, but PCA was only partially successful since one will only see advantages for PES database systems much larger than those both currently being studied and those that can be computationally studied in the next few decades to come. In addition, PCA is only needed for the very rare case of a PES database that does not already include Hessian eigenvalues.
Oh, Ching Mien; Heng, Paul Wan Sia; Chan, Lai Wah
2015-04-01
An understanding of the rheological behaviour of polymer melt suspensions is crucial in pharmaceutical manufacturing, especially when processed by spray congealing or melt extruding. However, a detailed comparison of the viscosities at each and every temperature and concentration between the various grades of adjuvants in the formulation will be tedious and time-consuming. Therefore, the statistical method, principal component analysis (PCA), was explored in this study. The composite formulations comprising polyethylene glycol (PEG) 3350 and hydroxypropyl methylcellulose (HPMC) of ten different grades (K100 LV, K4M, K15M, K100M, E15 LV, E50 LV, E4M, F50 LV, F4M and Methocel VLV) at various concentrations were prepared and their viscosities at different temperatures determined. Surface plots showed that concentration of HPMC had a greater effect on the viscosity compared to temperature. Particle size and size distribution of HPMC played an important role in the viscosity of melt suspensions. Smaller particles led to a greater viscosity than larger particles. PCA was used to evaluate formulations of different viscosities. The complex viscosity profiles of the various formulations containing HPMC were successfully classified into three clusters of low, moderate and high viscosity. Formulations within each group showed similar viscosities despite differences in grade or concentration of HPMC. Formulations in the low viscosity cluster were found to be sprayable. PCA was able to differentiate the complex viscosity profiles of different formulations containing HPMC in an efficient and time-saving manner and provided an excellent visualisation of the data.
NASA Astrophysics Data System (ADS)
Yang, Haiqing; Wu, Di; He, Yong
2007-11-01
Near-infrared spectroscopy (NIRS) is a pollution-free, rapid, quantitative and qualitative analysis method characterized by high speed, non-destructiveness, high precision and reliable detection data. A new approach for variety discrimination of brown sugars using short-wave NIR spectroscopy (800-1050 nm) was developed in this work. The relationship between the absorbance spectra and brown sugar varieties was established. The spectral data were compressed by principal component analysis (PCA). The resulting features can be visualized in principal component (PC) space, which can lead to the discovery of structures correlated with the different classes of spectral samples. PCA appears to provide a reasonable variety clustering of brown sugars. The 2-D plot obtained using the first two PCs can be used for pattern recognition. Least-squares support vector machines (LS-SVM) were applied to solve the multivariate calibration problems in a relatively fast way. The work has shown that the short-wave NIR spectroscopy technique is suitable for the brand identification of brown sugar, and that LS-SVM has better identification ability than PLS when the calibration set is small.
Comparative multivariate analysis of biometric traits of West African Dwarf and Red Sokoto goats.
Yakubu, Abdulmojeed; Salako, Adebowale E; Imumorin, Ikhide G
2011-03-01
The population structure of 302 randomly selected West African Dwarf (WAD) and Red Sokoto (RS) goats was examined using multivariate morphometric analyses. This was to make the case for conservation, rational management and genetic improvement of these two most important Nigerian goat breeds. Fifteen morphometric measurements were made on each individual animal. RS goats were superior (P<0.05) to the WAD for the body size and skeletal proportions investigated. The phenotypic variability between the two breeds was revealed by their mutual responses in the principal components. While four principal components were extracted for WAD goats, three components were obtained for their RS counterparts with variation in the loading traits of each component for each breed. The Mahalanobis distance of 72.28 indicated a high degree of spatial racial separation in morphology between the genotypes. The Ward's option of the cluster analysis consolidated the morphometric distinctness of the two breeds. Application of selective breeding to genetic improvement would benefit from the detected phenotypic differentiation. Other implications for management and conservation of the goats are highlighted.
AFLP analysis of Cynodon dactylon (L.) Pers. var. dactylon genetic variation.
Wu, Y Q; Taliaferro, C M; Bai, G H; Anderson, M P
2004-08-01
Cynodon dactylon (L.) Pers. var. dactylon (common bermudagrass) is geographically widely distributed between about lat 45 degrees N and lat 45 degrees S, penetrating to about lat 53 degrees N in Europe. The extensive variation of morphological and adaptive characteristics of the taxon is substantially documented, but information is lacking on DNA molecular variation in geographically disparate forms. Accordingly, this study was conducted to assess molecular genetic variation and genetic relatedness among 28 C. dactylon var. dactylon accessions originating from 11 countries on 4 continents (Africa, Asia, Australia, and Europe). A fluorescence-labeled amplified fragment length polymorphism (AFLP) DNA profiling method was used to detect the genetic diversity and relatedness. On the basis of 443 polymorphic AFLP fragments from 8 primer combinations, the accessions were grouped into clusters and subclusters associating with their geographic origins. Genetic similarity coefficients (SC) for the 28 accessions ranged from 0.53 to 0.98. Accessions originating from Africa, Australia, Asia, and Europe formed major groupings as indicated by cluster and principal coordinate analysis. Accessions from Australia and Asia, though separately clustered, were relatively closely related and most distantly related to accessions of European origin. African accessions formed two distant clusters and had the greatest variation in genetic relatedness relative to accessions from other geographic regions. Sampling the full extent of genetic variation in C. dactylon var. dactylon would require extensive germplasm collection in the major geographic regions of its distributional range.
The NGC 4839 group falling into the Coma cluster observed by XMM-Newton
NASA Astrophysics Data System (ADS)
Neumann, D. M.; Arnaud, M.; Gastaud, R.; Aghanim, N.; Lumb, D.; Briel, U. G.; Vestrand, W. T.; Stewart, G. C.; Molendi, S.; Mittaz, J. P. D.
2001-01-01
We present here the first analysis of the XMM-Newton EPIC-MOS data of the galaxy group around NGC 4839, which lies at a projected distance to the Coma cluster center of $1.6\,h_{50}^{-1}$ Mpc. In our analysis, which includes imaging, spectro-imaging and spectroscopy we find compelling evidence for the sub group being on its first infall onto the Coma cluster. The complex temperature structure around NGC 4839 is consistent with simulations of galaxies falling into a cluster environment. We see indications of a bow shock and of ram pressure stripping around NGC 4839. Furthermore our data reveal a displacement between NGC 4839 and the center of the hot gas in the group of about $300\,h_{50}^{-1}$ kpc. With a simple approximation we can explain this displacement by the pressure force originating from the infall, which acts much stronger on the group gas than on the galaxies. Based on observations obtained with XMM-Newton, an ESA science mission with instruments and contributions directly funded by ESA Member States and the USA (NASA). EPIC was developed by the EPIC Consortium led by the Principal Investigator, Dr. M. J. L. Turner. The consortium comprises the following Institutes: University of Leicester, University of Birmingham, (UK); CEA/Saclay, IAS Orsay, CESR Toulouse, (France); IAAP Tuebingen, MPE Garching, (Germany); IFC Milan, ITESRE Bologna, IAUP Palermo, (Italy). EPIC is funded by: PPARC, CEA, CNES, DLR and ASI.
Zielinski, Acácio Antonio Ferreira; Ávila, Suelen; Ito, Vivian; Nogueira, Alessandro; Wosiacki, Gilvan; Haminiuk, Charles Windson Isidoro
2014-04-01
A total of 19 Brazilian frozen pulps from the following fruits: açaí (Euterpe oleracea), blackberry (Rubus sp.), cajá (Spondias mombin), cashew (Anacardium occidentale), cocoa (Theobroma cacao), coconut (Cocos nucifera), grape (Vitis sp.), graviola (Annona muricata), guava (Psidium guajava), papaya (Carica papaya), peach (Prunus persica), pineapple (Ananas comosus), pineapple and mint (A. comosus and Mentha spicata), red fruits (Rubus sp. and Fragaria sp.), seriguela (Spondias purpurea), strawberry (Fragaria sp.), tamarind (Tamarindus indica), umbu (Spondias tuberosa), and yellow passion fruit (Passiflora edulis) were analyzed in terms of chromaticity, phenolic compounds, carotenoids, and in vitro antioxidant activity using ferric reducing antioxidant power (FRAP) and 1,1-diphenyl-2-picrylhydrazyl (DPPH) assays. Data were processed using principal component analysis (PCA) and hierarchical cluster analysis (HCA). Antioxidant capacity was measured by DPPH and FRAP assays, which showed significant (P < 0.01) correlation with total phenolic compounds (r = 0.88 and 0.70, respectively), total flavonoids (r = 0.63 and 0.81, respectively), and total monomeric anthocyanins (r = 0.59 and 0.73, respectively). PCA explained 74.82% of the total variance of the data, and the separation into 3 groups in a scatter plot was verified. HCA also suggested three clusters, corroborating the PCA; cluster 3 was formed by strawberry, red fruits, blackberry, açaí, and grape pulps. This cluster showed the highest contents of total phenolic compounds, total flavonoids, and antioxidant activity. © 2014 Institute of Food Technologists®
An efficient method to identify differentially expressed genes in microarray experiments
Qin, Huaizhen; Feng, Tao; Harding, Scott A.; Tsai, Chung-Jui; Zhang, Shuanglin
2013-01-01
Motivation Microarray experiments typically analyze thousands to tens of thousands of genes from small numbers of biological replicates. The fact that genes are normally expressed in functionally relevant patterns suggests that gene-expression data can be stratified and clustered into relatively homogenous groups. Cluster-wise dimensionality reduction should make it feasible to improve screening power while minimizing information loss. Results We propose a powerful and computationally simple method for finding differentially expressed genes in small microarray experiments. The method incorporates a novel stratification-based tight clustering algorithm, principal component analysis and information pooling. Comprehensive simulations show that our method is substantially more powerful than the popular SAM and eBayes approaches. We applied the method to three real microarray datasets: one from a Populus nitrogen stress experiment with 3 biological replicates; and two from public microarray datasets of human cancers with 10 to 40 biological replicates. In all three analyses, our method proved more robust than the popular alternatives for identification of differentially expressed genes. Availability The C++ code to implement the proposed method is available upon request for academic use. PMID:18453554
Spike sorting based upon machine learning algorithms (SOMA).
Horton, P M; Nicol, A U; Kendrick, K M; Feng, J F
2007-02-15
We have developed a spike sorting method, using a combination of various machine learning algorithms, to analyse electrophysiological data and automatically determine the number of sampled neurons from an individual electrode, and discriminate their activities. We discuss extensions to a standard unsupervised learning algorithm (Kohonen), as using a simple application of this technique would only identify a known number of clusters. Our extra techniques automatically identify the number of clusters within the dataset, and their sizes, thereby reducing the chance of misclassification. We also discuss a new pre-processing technique, which transforms the data into a higher dimensional feature space revealing separable clusters. Using principal component analysis (PCA) alone may not achieve this. Our new approach appends the features acquired using PCA with features describing the geometric shapes that constitute a spike waveform. To validate our new spike sorting approach, we have applied it to multi-electrode array datasets acquired from the rat olfactory bulb and from the sheep infero-temporal cortex, and to simulated data. The SOMA software is available at http://www.sussex.ac.uk/Users/pmh20/spikes.
Hadjisolomou, Ekaterini; Stefanidis, Konstantinos; Papatheodorou, George; Papastergiadou, Evanthia
2018-03-19
During the last decades, Mediterranean freshwater ecosystems, especially lakes, have been under severe pressure due to increasing eutrophication and water quality deterioration. In this article, we compared the effectiveness of different data analysis methods by assessing the contribution of environmental parameters to eutrophication processes. For this purpose, principal components analysis (PCA), cluster analysis, and a self-organizing map (SOM) were applied, using water quality data from two transboundary lakes of North Greece. SOM is considered an advanced and powerful data analysis tool because of its ability to represent complex and nonlinear relationships among multivariate data sets. The results of PCA and cluster analysis agreed with the SOM results, although the latter provided more information because of its ability to visualize the relationships among the parameters. Besides nutrients, which were found to be a key factor controlling chlorophyll-a (Chl-a), water temperature was positively related to algal production, while the Secchi disk depth parameter was found to be highly important and negatively related to eutrophic conditions. In general, the SOM results were more specific and allowed direct associations between the water quality variables. Our work showed that SOMs can be used effectively in limnological studies to produce robust and interpretable results, aiding scientists and managers in coping with environmental problems such as eutrophication.
Jung, Brian C.; Choi, Soo I.; Du, Annie X.; Cuzzocreo, Jennifer L.; Geng, Zhuo Z.; Ying, Howard S.; Perlman, Susan L.; Toga, Arthur W.; Prince, Jerry L.
2014-01-01
Although “cerebellar ataxia” is often used in reference to a disease process, presumably there are different underlying pathogenetic mechanisms for different subtypes. Indeed, spinocerebellar ataxia (SCA) types 2 and 6 demonstrate complementary phenotypes, thus predicting a different anatomic pattern of degeneration. Here, we show that an unsupervised classification method, based on principal component analysis (PCA) of cerebellar shape characteristics, can be used to separate SCA2 and SCA6 into two classes, which may represent disease-specific archetypes. Patients with SCA2 (n=11) and SCA6 (n=7) were compared against controls (n=15) using PCA to classify cerebellar anatomic shape characteristics. Within the first three principal components, SCA2 and SCA6 differed from controls and from each other. In a secondary analysis, we studied five additional subjects and found that these patients were consistent with the previously defined archetypal clusters of clinical and anatomical characteristics. Secondary analysis of five subjects with related diagnoses showed that disease groups that were clinically and pathophysiologically similar also shared similar anatomic characteristics. Specifically, Archetype #1 consisted of SCA3 (n=1) and SCA2, suggesting that cerebellar syndromes accompanied by atrophy of the pons may be associated with a characteristic pattern of cerebellar neurodegeneration. In comparison, Archetype #2 was comprised of disease groups with pure cerebellar atrophy (episodic ataxia type 2 (n=1), idiopathic late-onset cerebellar ataxias (n=3), and SCA6). This suggests that cerebellar shape analysis could aid in discriminating between different pathologies. Our findings further suggest that magnetic resonance imaging is a promising imaging biomarker that could aid in the diagnosis and therapeutic management in patients with cerebellar syndromes. PMID:22258915
Early Environment and Neurobehavioral Development Predict Adult Temperament Clusters
Congdon, Eliza; Service, Susan; Wessman, Jaana; Seppänen, Jouni K.; Schönauer, Stefan; Miettunen, Jouko; Turunen, Hannu; Koiranen, Markku; Joukamaa, Matti; Järvelin, Marjo-Riitta; Veijola, Juha; Mannila, Heikki; Paunio, Tiina; Freimer, Nelson B.
2012-01-01
Background Investigation of the environmental influences on human behavioral phenotypes is important for our understanding of the causation of psychiatric disorders. However, there are complexities associated with the assessment of environmental influences on behavior. Methods/Principal Findings We conducted a series of analyses using a prospective, longitudinal study of a nationally representative birth cohort from Finland (the Northern Finland 1966 Birth Cohort). Participants included a total of 3,761 male and female cohort members who were living in Finland at the age of 16 years and who had complete temperament scores. Our initial analyses (Wessman et al., in press) provide evidence in support of four stable and robust temperament clusters. Using these temperament clusters, as well as independent temperament dimensions for comparison, we conducted a data-driven analysis to assess the influence of a broad set of life course measures, assessed pre-natally, in infancy, and during adolescence, on adult temperament. Results Measures of early environment, neurobehavioral development, and adolescent behavior significantly predict adult temperament, classified by both cluster membership and temperament dimensions. Specifically, our results suggest that a relatively consistent set of life course measures are associated with adult temperament profiles, including maternal education, characteristics of the family’s location and residence, adolescent academic performance, and adolescent smoking. Conclusions Our finding that a consistent set of life course measures predict temperament clusters indicate that these clusters represent distinct developmental temperament trajectories and that information about a subset of life course measures has implications for adult health outcomes. PMID:22815688
Liu, Zhangxiong; Li, Huihui; Wen, Zixiang; Fan, Xuhong; Li, Yinghui; Guan, Rongxia; Guo, Yong; Wang, Shuming; Wang, Dechun; Qiu, Lijuan
2017-01-01
Soybean is one of the most important economic crops for both China and the United States (US). The exchange of germplasm between these two countries has long been active. In order to investigate genetic relationships between Chinese and US soybean germplasm, 277 Chinese soybean accessions and 300 US soybean accessions from geographically diverse regions were analyzed using 5,361 SNP markers. The genetic diversity and the polymorphism information content (PIC) of the Chinese accessions was higher than that of the US accessions. Population structure analysis, principal component analysis, and cluster analysis all showed that the genetic basis of Chinese soybeans is distinct from that of the USA. The groupings observed in clustering analysis reflected the geographical origins of the accessions; this conclusion was validated with both genetic distance analysis and relative kinship analysis. FST-based and EigenGWAS statistical analysis revealed high genetic variation between the two subpopulations. Analysis of the 10 loci with the strongest selection signals showed that many loci were located in chromosome regions that have previously been identified as quantitative trait loci (QTL) associated with environmental-adaptation-related and yield-related traits. The pattern of diversity among the American and Chinese accessions should help breeders to select appropriate parental accessions to enhance the performance of future soybean cultivars. PMID:29250088
Identifying Symptom Patterns in People Living With HIV Disease.
Wilson, Natalie L; Azuero, Andres; Vance, David E; Richman, Joshua S; Moneyham, Linda D; Raper, James L; Heath, Sonya L; Kempf, Mirjam-Colette
2016-01-01
Symptoms guide disease management, and patients frequently report HIV-related symptoms, but HIV symptom patterns reported by patients have not been described in the era of improved antiretroviral treatment. The objectives of our study were to investigate the prevalence and burden of symptoms in people living with HIV and attending an outpatient clinic. The prevalence, burden, and bothersomeness of symptoms reported by patients in routine clinic visits during 2011 were assessed using the 20-item HIV Symptom Index. Principal component analysis was used to identify symptom clusters and relationships between groups using appropriate statistic techniques. Two main clusters were identified. The most prevalent and bothersome symptoms were muscle aches/joint pain, fatigue, and poor sleep. A third of patients had seven or more symptoms, including the most burdensome symptoms. Even with improved antiretroviral drug side-effect profiles, symptom prevalence and burden, independent of HIV viral load and CD4+ T cell count, are high. Published by Elsevier Inc.
Identifying Symptom Patterns in People Living With HIV Disease
Wilson, Natalie L.; Azuero, Andres; Vance, David E.; Richman, Joshua S.; Moneyham, Linda D.; Raper, James L.; Heath, Sonya L.; Kempf, Mirjam-Colette
2016-01-01
Symptoms guide disease management, and patients frequently report HIV-related symptoms, but HIV symptom patterns reported by patients have not been described in the era of improved antiretroviral treatment. The objectives of our study were to investigate the prevalence and burden of symptoms in people living with HIV and attending an outpatient clinic. The prevalence, burden, and bothersomeness of symptoms reported by patients in routine clinic visits during 2011 were assessed using the 20-item HIV Symptom Index. Principal component analysis was used to identify symptom clusters and relationships between groups using appropriate statistic techniques. Two main clusters were identified. The most prevalent and bothersome symptoms were muscle aches/joint pain, fatigue, and poor sleep. A third of patients had seven or more symptoms, including the most burdensome symptoms. Even with improved antiretroviral drug side-effect profiles, symptom prevalence and burden, independent of HIV viral load and CD4+ T cell count, are high. PMID:26790340
Dimension Reduction of Hyperspectral Data on Beowulf Clusters
NASA Technical Reports Server (NTRS)
El-Ghazawi, Tarek
2000-01-01
Traditional remote sensing instruments are multispectral, collecting observations at a few different spectral bands. Recently, many hyperspectral instruments, which can collect observations at hundreds of bands, have become operational. Furthermore, there are ongoing research efforts on ultraspectral instruments that can produce observations at thousands of spectral bands. While these developments in remote sensing technology hold great promise for new findings in Earth and space science, they present many challenges. These include the need for faster processing of such increased data volumes and for methods of data reduction. One dimension reduction method, a spectral transformation that is widely used in remote sensing, is Principal Components Analysis (PCA). In light of the growing number of spectral channels of modern instruments, the paper reports on the development of a parallel PCA and its implementation on two Beowulf cluster configurations, one with a Fast Ethernet switch and the other with a Myrinet interconnection.
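The abstract does not describe how the PCA was decomposed across cluster nodes. One common way to parallelize the covariance step, shown here only as a sketch under that assumption, is to let each worker accumulate per-band sums and sums of outer products for its share of the pixels and then merge those partial statistics on one node. The function names and the tiny two-band data set are hypothetical.

```python
def partial_stats(rows):
    """Return (count, per-band sums, sums of outer products) for one chunk."""
    d = len(rows[0])
    s = [0.0] * d
    ss = [[0.0] * d for _ in range(d)]
    for r in rows:
        for i in range(d):
            s[i] += r[i]
            for j in range(d):
                ss[i][j] += r[i] * r[j]
    return len(rows), s, ss

def combine(stats):
    """Merge the partial statistics of all workers into mean and covariance."""
    d = len(stats[0][1])
    n = sum(st[0] for st in stats)
    s = [sum(st[1][i] for st in stats) for i in range(d)]
    ss = [[sum(st[2][i][j] for st in stats) for j in range(d)]
          for i in range(d)]
    mean = [v / n for v in s]
    # Sample covariance from accumulated sums (n - 1 in the denominator)
    cov = [[(ss[i][j] - n * mean[i] * mean[j]) / (n - 1) for j in range(d)]
           for i in range(d)]
    return mean, cov

# Four pixels with two bands, split into two chunks as if on two nodes
pixels = [[1.0, 2.0], [2.0, 4.0], [3.0, 7.0], [4.0, 7.0]]
chunks = [pixels[:2], pixels[2:]]
mean, cov = combine([partial_stats(c) for c in chunks])
```

Only the small band-by-band matrices travel over the interconnect in this scheme, and the eigen-decomposition of the merged covariance matrix can then run on a single node. Accumulating raw sums of products can lose precision for large values, so a production version would likely centre the data first.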
[Research on spectra recognition method for cabbages and weeds based on PCA and SIMCA].
Zu, Qin; Deng, Wei; Wang, Xiu; Zhao, Chun-Jiang
2013-10-01
In order to improve the accuracy and efficiency of weed identification, differences in spectral reflectance were employed to distinguish between crops and weeds. Firstly, different combinations of Savitzky-Golay (SG) convolutional derivation and the multiplicative scattering correction (MSC) method were applied to preprocess the raw spectral data. Then the clustering analysis of the various types of plants was completed using principal component analysis (PCA), and the feature wavelengths that were sensitive for classifying the various types of plants were extracted according to the corresponding loading plots of the optimal principal components in the PCA results. Finally, with the feature wavelengths set as the input variables, the soft independent modeling of class analogy (SIMCA) classification method was used to identify the various types of plants. The experimental results of classifying cabbages and weeds showed that, on the basis of the optimal pretreatment, a combined application of MSC and SG convolutional derivation with the SG parameters set to 1st-order derivative, 3rd-degree polynomial and 51 smoothing points, 23 feature wavelengths were extracted in accordance with the top three principal components in the PCA results. When the SIMCA method was used for classification with the previously selected 23 feature wavelengths as the input variables, the classification rates of the modeling set and the prediction set reached 98.6% and 100%, respectively.
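The MSC preprocessing step named above can be sketched as follows. This is a minimal illustration and not the authors' code: it assumes the usual formulation in which each spectrum is regressed on the mean spectrum and then corrected with the fitted offset and slope. The function name and the toy spectra are hypothetical.

```python
def msc(spectra):
    """Multiplicative scatter correction against the mean spectrum."""
    n = len(spectra[0])
    # Reference spectrum: mean over all samples at each wavelength
    ref = [sum(s[k] for s in spectra) / len(spectra) for k in range(n)]
    ref_mean = sum(ref) / n
    ref_var = sum((r - ref_mean) ** 2 for r in ref)
    corrected = []
    for s in spectra:
        s_mean = sum(s) / n
        # Least-squares fit s ~ a + b * ref
        b = sum((r - ref_mean) * (x - s_mean)
                for r, x in zip(ref, s)) / ref_var
        a = s_mean - b * ref_mean
        # Remove the additive offset a and the multiplicative factor b
        corrected.append([(x - a) / b for x in s])
    return ref, corrected

# Two toy spectra that differ only by an offset and a scale factor
base = [0.10, 0.30, 0.20, 0.50]
spectra = [base, [2 * x + 1 for x in base]]
ref, corrected = msc(spectra)
```

Because the two toy spectra differ only by scatter-like effects, both collapse onto the reference after correction. The SG derivative with the stated settings would typically come from a library routine, for example `scipy.signal.savgol_filter` with `window_length=51`, `polyorder=3` and `deriv=1`, assuming SciPy is available.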
Balouchestani, Mohammadreza; Krishnan, Sridhar
2014-01-01
Long-term recording of Electrocardiogram (ECG) signals plays an important role in health care systems for the diagnosis and treatment of heart diseases. Clustering and classification of the collected data are essential for detecting concealed information of P-QRS-T waves in long-term ECG recordings. Currently used algorithms have their share of drawbacks: 1) clustering and classification cannot be done in real time; 2) they suffer from huge energy consumption and sampling load. These drawbacks motivated us to develop a novel optimized clustering algorithm that can easily scan large ECG datasets for establishing low-power long-term ECG recording. In this paper, we present an advanced K-means clustering algorithm based on Compressed Sensing (CS) theory as a random sampling procedure. Then, two dimensionality reduction methods, Principal Component Analysis (PCA) and Linear Correlation Coefficient (LCC), followed by sorting of the data using the K-Nearest Neighbours (K-NN) and Probabilistic Neural Network (PNN) classifiers, are applied to the proposed algorithm. We show that our algorithm based on PCA features in combination with the K-NN classifier performs better than the other methods. The proposed algorithm outperforms existing algorithms, increasing classification accuracy by 11%. In addition, the proposed algorithm achieves classification accuracies for the K-NN and PNN classifiers and a Receiver Operating Characteristics (ROC) area of 99.98%, 99.83%, and 99.75%, respectively.
Coarse Point Cloud Registration by EGI Matching of Voxel Clusters
NASA Astrophysics Data System (ADS)
Wang, Jinhu; Lindenbergh, Roderik; Shen, Yueqian; Menenti, Massimo
2016-06-01
Laser scanning samples the surface geometry of objects efficiently and records versatile information as point clouds. However, multiple scans are often required to fully cover a scene. Therefore, a registration step is required that transforms the different scans into a common coordinate system. The registration of point clouds is usually conducted in two steps, i.e. coarse registration followed by fine registration. In this study an automatic marker-free coarse registration method for pair-wise scans is presented. First the two input point clouds are re-sampled as voxels and dimensionality features of the voxels are determined by principal component analysis (PCA). Then voxel cells with the same dimensionality are clustered. Next, the Extended Gaussian Image (EGI) descriptors of those voxel clusters are constructed using significant eigenvectors of each voxel in the cluster. Correspondences between clusters in source and target data are obtained according to the similarity between their EGI descriptors. The random sample consensus (RANSAC) algorithm is employed to remove outlying correspondences until a coarse alignment is obtained. If necessary, a fine registration is performed in a final step. This new method is illustrated on scan data sampling two indoor scenarios. The results of the tests are evaluated by computing the point to point distance between the two input point clouds. The two tests presented resulted in mean distances of 7.6 mm and 9.5 mm respectively, which are adequate for fine registration.
Multilevel Hierarchical Kernel Spectral Clustering for Real-Life Large Scale Complex Networks
Mall, Raghvendra; Langone, Rocco; Suykens, Johan A. K.
2014-01-01
Kernel spectral clustering corresponds to a weighted kernel principal component analysis problem in a constrained optimization framework. The primal formulation leads to an eigen-decomposition of a centered Laplacian matrix at the dual level. The dual formulation allows a model to be built on a representative subgraph of the large scale network in the training phase, and the model parameters are estimated in the validation stage. The KSC model has a powerful out-of-sample extension property which allows cluster affiliation for the unseen nodes of the big data network. In this paper we exploit the structure of the projections in the eigenspace during the validation stage to automatically determine a set of increasing distance thresholds. We use these distance thresholds in the test phase to obtain multiple levels of hierarchy for the large scale network. The hierarchical structure in the network is determined in a bottom-up fashion. We show empirically that real-world networks have a multilevel hierarchical organization which cannot be detected efficiently by several state-of-the-art large scale hierarchical community detection techniques like the Louvain, OSLOM and Infomap methods. We show that a major advantage of our proposed approach is the ability to locate good quality clusters at both the finer and coarser levels of hierarchy using internal cluster quality metrics on 7 real-life networks. PMID:24949877
Cheng, Lin; Zhu, Yang; Sun, Junfeng; Deng, Lifu; He, Naying; Yang, Yang; Ling, Huawei; Ayaz, Hasan; Fu, Yi; Tong, Shanbao
2018-01-25
Task-related reorganization of functional connectivity (FC) has been widely investigated. Under classic static FC analysis, brain networks under task and rest have been shown to be generally similar. However, brain activity and cognitive processes are believed to be dynamic and adaptive. Since static FC inherently ignores the distinct temporal patterns between rest and task, dynamic FC may be a more suitable technique to characterize the brain's dynamic and adaptive activities. In this study, we adopted k-means clustering to investigate task-related spatiotemporal reorganization of dynamic brain networks and hypothesized that dynamic FC would be able to reveal the link between resting-state and task-state brain organization, including broadly similar spatial patterns but distinct temporal patterns. In order to test this hypothesis, this study examined the dynamic FC in the default-mode network (DMN) and motor-related network (MN) using Blood-Oxygenation-Level-Dependent (BOLD)-fMRI data from 26 healthy subjects during rest (REST) and a hand closing-and-opening (HCO) task. Two principal FC states in REST and one principal FC state in HCO were identified. The first principal FC state in REST was found to be similar to that in HCO, which appeared to represent intrinsic network architecture and validated the broadly similar spatial patterns between REST and HCO. However, the second principal FC state in REST, with a much shorter "dwell time", implied a transient functional relationship between DMN and MN during REST. In addition, more frequent shifting between the two principal FC states indicated that the brain network dynamically maintained a "default mode" in the motor system during REST, whereas the presence of a single principal FC state and reduced FC variability implied a more temporally stable connectivity during HCO, validating the distinct temporal patterns between REST and HCO.
Our results further demonstrated that dynamic FC analysis could offer unique insights in understanding how the brain reorganizes itself during rest and task states, and the ways in which the brain adaptively responds to the cognitive requirements of tasks.
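The "dwell time" and state-shifting measures discussed in this abstract can be computed from a sequence of per-window state labels, such as those a clustering step would produce. The following is a minimal sketch under that assumption; the function names and the example label sequence are hypothetical, and dwell time is expressed in windows rather than seconds.

```python
from itertools import groupby


def dwell_times(labels):
    """Map each state to the lengths of its uninterrupted runs."""
    runs = {}
    for state, group in groupby(labels):
        runs.setdefault(state, []).append(sum(1 for _ in group))
    return runs


def mean_dwell_time(labels):
    """Average run length per state, in number of windows."""
    return {state: sum(r) / len(r) for state, r in dwell_times(labels).items()}


def count_transitions(labels):
    """Number of times the state changes between consecutive windows."""
    return sum(1 for a, b in zip(labels, labels[1:]) if a != b)


# Hypothetical sequence of FC-state labels for successive sliding windows.
states = [0, 0, 0, 1, 0, 0, 1, 1]
print(mean_dwell_time(states), count_transitions(states))
```

A session with a single state yields zero transitions, matching the description of a temporally stable task condition.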
Malaquias, José B; Ramalho, Francisco S; Dos S Dias, Carlos T; Brugger, Bruno P; S Lira, Aline Cristina; Wilcken, Carlos F; Pachú, Jéssica K S; Zanuncio, José C
2017-02-09
The relationship between pests and natural enemies on cotton at different spacings has not yet been documented using multivariate analysis. With multivariate approaches, it is possible to optimize strategies to control Aphis gossypii at different crop spacings because they allow better use of aphid sampling strategies as well as the conservation and release of its natural enemies. The aims of the study were (i) to characterize the temporal abundance data of aphids and their natural enemies using principal components, (ii) to analyze the degree of correlation between the insects and between groups of variables (pests and natural enemies), (iii) to identify the main natural enemies responsible for regulating A. gossypii populations, and (iv) to investigate the similarities in arthropod occurrence patterns at different spacings of cotton crops over two seasons. High correlations in the occurrence of Scymnus rubicundus with aphids are shown through principal component analysis and through the important role the species plays in canonical correlation analysis. The clustering of apterous aphid presence matches the pattern observed for Chrysoperla externa at the three different spacings between rows. Our results indicate that S. rubicundus is the main candidate to regulate the aphid populations in all spacings studied.
Malaquias, José B.; Ramalho, Francisco S.; dos S. Dias, Carlos T.; Brugger, Bruno P.; S. Lira, Aline Cristina; Wilcken, Carlos F.; Pachú, Jéssica K. S.; Zanuncio, José C.
2017-01-01
The relationship between pests and natural enemies on cotton at different spacings has not yet been documented using multivariate analysis. With multivariate approaches, it is possible to optimize strategies to control Aphis gossypii at different crop spacings because they allow better use of aphid sampling strategies as well as the conservation and release of its natural enemies. The aims of the study were (i) to characterize the temporal abundance data of aphids and their natural enemies using principal components, (ii) to analyze the degree of correlation between the insects and between groups of variables (pests and natural enemies), (iii) to identify the main natural enemies responsible for regulating A. gossypii populations, and (iv) to investigate the similarities in arthropod occurrence patterns at different spacings of cotton crops over two seasons. High correlations in the occurrence of Scymnus rubicundus with aphids are shown through principal component analysis and through the important role the species plays in canonical correlation analysis. The clustering of apterous aphid presence matches the pattern observed for Chrysoperla externa at the three different spacings between rows. Our results indicate that S. rubicundus is the main candidate to regulate the aphid populations in all spacings studied. PMID:28181503
Breland, Jessica Y; Hundt, Natalie E; Barrera, Terri L; Mignogna, Joseph; Petersen, Nancy J; Stanley, Melinda A; Cully, Jeffery A
2015-10-01
Treatment of chronic obstructive pulmonary disease (COPD) is palliative, and quality of life is important. Increased understanding of correlates of quality of life and its domains could help clinicians and researchers better tailor COPD treatments and better support patients engaging in those treatments or other important self-management behaviors. Anxiety is common in those with COPD; however, overlap of physical and emotional symptoms complicates its assessment. The current study aimed to identify anxiety symptom clusters and to assess the association of these symptom clusters with COPD-related quality of life. Participants (N = 162) with COPD completed the Beck Anxiety Inventory (BAI), Chronic Respiratory Disease Questionnaire, Patient Health Questionnaire-9, and Medical Research Council dyspnea scale. Anxiety clusters were identified, using principal component analysis (PCA) on the BAI's 21 items. Anxiety clusters, along with factors previously associated with quality of life, were entered into a multiple regression designed to predict COPD-related quality of life. PCA identified four symptom clusters related to (1) general somatic distress, (2) fear, (3) nervousness, and (4) respiration-related distress. Multiple regression analyses indicated that greater fear was associated with less perceived mastery over COPD (β = -0.19, t(149) = -2.69, p < 0.01). Anxiety symptoms associated with fear appear to be an important indicator of anxiety in patients with COPD. In particular, fear was associated with perceptions of mastery, an important psychological construct linked to disease self-management. Assessing the BAI symptom cluster associated with fear (five items) may be a valuable rapid assessment tool to improve COPD treatment and physical health outcomes.
Rahimi, Mohammad Ali; Nazeri, Vahideh; Andi, Seyed Ali; Sefidkon, Fatemeh
2018-05-21
In the present work, the chemical composition of the essential oils obtained from dried flowering aerial parts of Teucrium hircanicum L. (Labiatae) from ten wild populations in Iran was analyzed by GC-FID and GC/MS. The oil yields varied from 0.04% to 0.1%. A total of thirty-two compounds representing 67.6-97.7% of the oil were identified. The essential oil was found to be rich in the sesquiterpene hydrocarbons (E)-α-bergamotene (17.5-86.9%) and (E)-β-farnesene (0.5-21.4%). Of the identified compounds, sesquiterpene hydrocarbons (36.1-89.7%) constituted the largest essential oil fraction in all the populations, followed by oxygenated monoterpenes (2.2-21.6%), oxygenated sesquiterpenes (0.0-14.4%) and monoterpene hydrocarbons (0.0-9.5%). Hierarchical Cluster Analysis (HCA) and Principal Component Analysis (PCA) were used to distinguish any geographical variations, indicating that the clustering of populations is related to their geographic origin. According to the GC/MS analysis, two chemotypes consisting of (E)-α-bergamotene and (E)-α-bergamotene-(E)-β-farnesene were identified in the populations.
Physico-chemical trends in the sediments of Agbede Wetlands, Nigeria
NASA Astrophysics Data System (ADS)
Dirisu, Abdul-Rahman; Olomukoro, John Ovie; Ezenwa, Ifeanyi Maxwell
2017-07-01
This study assessed the physico-chemical status of sediments in the Agbede Wetlands with the aim to create a reference archive for the Edo North catchment and to further identify the characteristics mostly influenced by the natural and anthropogenic activities going on at the watershed. Nutrients, zinc, nickel and lead were identified to be mostly of anthropogenic origin, while alkali metals and alkaline earth metals were from both anthropogenic and natural sources. The clustering of stations 1 and 4 indicates that the sediment quality in the lentic systems was not completely excluded from the lotic system, suggesting that principal component analysis (PCA) and cluster analysis (CA) techniques are invaluable tools for identifying factors influencing the sediment quality. The mean values of the particle size distribution were in the following order across the ecosystems: sand (61.86-80.53%) > silt (9.75-30.34%) > clay (7.83-13.89%). The contamination of the water bodies was primarily derived from agricultural run-offs and through geochemical weathering of the top soils. Therefore, our analysis indicates that the concentrations of cations, anions and nutrients in the sediments of the lotic and lentic ecosystems in Agbede Wetlands are not at an alarming level.
Assessment of self-organizing maps to analyze sole-carbon source utilization profiles.
Leflaive, Joséphine; Céréghino, Régis; Danger, Michaël; Lacroix, Gérard; Ten-Hage, Loïc
2005-07-01
Community-level physiological profiles obtained with Biolog microplates are widely used to assess the functional diversity of bacterial communities. Biolog produces a large amount of data, the analysis of which has been the subject of many studies. In most cases, after some transformations, these data were investigated with classical multivariate analyses. Here we provide an alternative to this method, namely the use of an artificial intelligence technique, the Self-Organizing Map (SOM, an unsupervised neural network). We used data from a microcosm study of algae-associated bacterial communities placed in various nutritive conditions. Analyses were carried out on the net absorbances at two incubation times for each substrate and on the chemical guild categorization of the total bacterial activity. Compared to Principal Components Analysis and cluster analysis, SOM appeared to be a valuable tool for community classification and for establishing clear relationships between clusters of bacterial communities and sole-carbon source utilization. Specifically, SOM offered a clear bidimensional projection of a relatively large volume of data and were easier to interpret than plots commonly obtained with multivariate analyses. They would be recommended to pattern the temporal evolution of communities' functional diversity.
Zhou, L X; Xiao, Y; Xia, W; Yang, Y D
2015-12-08
Genetic diversity and patterns of population structure of 94 oil palm lines were investigated using species-specific simple sequence repeat (SSR) markers. We designed primers for 63 SSR loci based on their flanking sequences and conducted amplification in 94 oil palm DNA samples. The amplification results showed that a relatively high level of genetic diversity was observed between oil palm individuals according to a set of 21 polymorphic microsatellite loci. The observed heterozygosity (Ho) was 0.3683 and 0.4035, with an average of 0.3859. The Ho value was a reliable determinant of the discriminatory power of the SSR primer combinations. The principal component analysis and unweighted pair-group method with arithmetic averaging cluster analysis showed that the 94 oil palm lines were grouped into one cluster. These results demonstrated that the oil palm in Hainan Province of China and the germplasm introduced from Malaysia may be from the same source. The SSR protocol was effective and reliable for assessing the genetic diversity of oil palm. Knowledge of the genetic diversity and population structure will be crucial for establishing appropriate management stocks for this species.
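Observed heterozygosity (Ho), as reported in this abstract, is conventionally the fraction of genotyped individuals carrying two different alleles at a locus, averaged over loci. A minimal sketch under that assumption follows; the function names and the allele sizes are hypothetical and do not reproduce the study's data.

```python
def observed_heterozygosity(genotypes):
    """Fraction of called individuals whose two alleles differ at one locus."""
    # None marks a missing genotype call and is excluded from the denominator.
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes at this locus")
    return sum(1 for a, b in called if a != b) / len(called)


def mean_heterozygosity(loci):
    """Average observed heterozygosity across loci."""
    values = [observed_heterozygosity(genotypes) for genotypes in loci]
    return sum(values) / len(values)


# Hypothetical SSR allele sizes (base pairs) for two loci.
locus_a = [(150, 150), (150, 152), (152, 154), None]
locus_b = [(200, 200), (200, 200)]
print(mean_heterozygosity([locus_a, locus_b]))
```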
TU-CD-BRB-12: Radiogenomics of MRI-Guided Prostate Cancer Biopsy Habitats
DOE Office of Scientific and Technical Information (OSTI.GOV)
Stoyanova, R; Lynne, C; Abraham, S
2015-06-15
Purpose: Diagnostic prostate biopsies are subject to sampling bias. We hypothesize that quantitative imaging with multiparametric (MP)-MRI can more accurately direct targeted biopsies to index lesions associated with highest risk clinical and genomic features. Methods: Regionally distinct prostate habitats were delineated on MP-MRI (T2-weighted, perfusion and diffusion imaging). Directed biopsies were performed on 17 habitats from 6 patients using MRI-ultrasound fusion. Biopsy location was characterized with 52 radiographic features. Transcriptome-wide analysis of 1.4 million RNA probes was performed on RNA from each habitat. Genomic features with insignificant expression values (<0.25) and interquartile range <0.5 were filtered, leaving a total of 212 genes. The correlation between imaging features, genes and a 22-feature genomic classifier (GC), developed as a prognostic assay for metastasis after radical prostatectomy, was investigated. Results: High quality genomic data was derived from 17 (100%) biopsies. Using the 212 ‘unbiased’ genes, the samples clustered by patient origin in unsupervised analysis. When only prostate cancer related genomic features were used, hierarchical clustering revealed that samples clustered by needle-biopsy Gleason score (GS). Similarly, principal component analysis of the imaging features found that the primary source of variance segregated the samples into high (≥7) and low (6) GS. Pearson’s correlation analysis of genes with significant expression showed two main patterns of gene expression clustering prostate peripheral and transitional zone MRI features. Two-way hierarchical clustering of GC with radiomics features resulted in the expected groupings of high and low expressed genes in this metastasis signature. Conclusions: MP-MRI-targeted diagnostic biopsies can potentially improve risk stratification by directing pathological and genomic analysis to clinically significant index lesions.
As determinant lesions are more reliably identified, targeting with radiotherapy should improve outcome. This is the first demonstration of a link between quantitative imaging features (radiomics) with genomic features in MRI-directed prostate biopsies. The research was supported by NIH- NCI R01 CA 189295 and R01 CA 189295; E Davicioni is partial owner of GenomeDx Biosciences, Inc. M Takhar, N Erho, L Lam, C Buerki and E Davicioni are current employees at GenomeDx Biosciences, Inc.
Characterization of synoptic patterns causing dust outbreaks that affect the Arabian Peninsula
NASA Astrophysics Data System (ADS)
Hermida, L.; Merino, A.; Sánchez, J. L.; Fernández-González, S.; García-Ortega, E.; López, L.
2018-01-01
Dust storms pose serious weather hazards in arid and semiarid regions of the earth. Understanding the main synoptic conditions that give rise to dust outbreaks is important for issuing forecasts and warnings to the public in cases of severe storms. The aim of the present study is to determine synoptic patterns that are associated with or even favor dust outbreaks over the Arabian Peninsula. In this respect, red-green-blue dust composite images from the Meteosat Second Generation (MSG) satellite are used to detect dust outbreaks affecting the Arabian Peninsula, with possible influences in southwestern Asia and northeastern Africa, between 2005 and 2013. The Meteosat imagery yielded a sample of 95 dust storm days. Meteorological fields from NCEP/NCAR reanalysis data of wind fields at 10 m and 250 hPa, mean sea level pressure, and geopotential heights at 850 and 500 hPa were obtained for the dust storm days. Using principal component analysis in T-mode and non-hierarchical k-means clustering, we obtained four major atmospheric circulation patterns associated with dust outbreaks during the study days. Cluster 4 had the largest number of days with dust events, which were constrained to summer, and cluster 3 had the fewest. In clusters 1, 2 and 3, the jet stream favored the entry of a low-pressure area or trough that varied in location between the three clusters. Their most northerly location was found in cluster 4, along with an extensive low-pressure area supporting strong winds over the Arabian Peninsula. The spatial distribution of aerosol optical depth for each cluster obtained was characterized using the Moderate Resolution Imaging Spectroradiometer data. Then, using METAR stations, clusters were also characterized in terms of frequency and visibility.
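The non-hierarchical k-means step described in this abstract can be sketched as follows. The sketch assumes each dust storm day is already represented by a short vector of principal-component values; the deterministic farthest-point initialisation, the function names and the example points are illustrative choices, not details taken from the study.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means; returns (labels, centroids)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Farthest-point initialisation: start from the first point, then keep
    # adding the point that lies farthest from the centroids chosen so far.
    centroids = [tuple(points[0])]
    while len(centroids) < k:
        farthest = max(
            points, key=lambda p: min(squared_distance(p, c) for c in centroids)
        )
        centroids.append(tuple(farthest))
    labels = None
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids


# Hypothetical principal-component values for six days.
days = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = kmeans(days, 2)
print(labels)
```

Counting the days per label then gives the cluster sizes that the abstract compares across its four circulation patterns.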
Seo, Joo Hee; Lee, Jun Heon; Kong, Hong Sik
2017-01-01
Objective: This study was conducted to investigate the basic information on genetic structure and characteristics of Korean Native chickens (NC) and foreign breeds through the analysis of the pure chicken populations and commercial chicken lines of the Hanhyup Company which are popular in the NC market, using the 20 microsatellite markers. Methods: In this study, the genetic diversity and phylogenetic relationships of 445 NC from five different breeds (NC, Leghorn [LH], Cornish [CS], Rhode Island Red [RIR], and Hanhyup [HH] commercial line) were investigated by performing genotyping using 20 microsatellite markers. Results: The highest genetic distance was observed between RIR and LH (18.9%), whereas the lowest genetic distance was observed between HH and NC (2.7%). In the principal coordinates analysis (PCoA) illustrated by the first component, LH was clearly separated from the other groups. The correspondence analysis showed close relationship among individuals belonging to the NC, CS, and HH lines. From the STRUCTURE program, the presence of 5 clusters was detected and it was found that the proportion of membership in the different clusters was almost comparable among the breeds with the exception of one breed (HH), although it was highest in LH (0.987) and lowest in CS (0.578). For the cluster 1 it was high in HH (0.582) and in CS (0.368), while for the cluster 4 it was relatively higher in HH (0.392) than other breeds. Conclusion: Our study showed useful genetic diversity and phylogenetic relationship data that can be utilized for NC breeding and development by the commercial chicken industry to meet consumer demands. PMID:28335091
Kaestli, Mirjam; Mayo, Mark; Harrington, Glenda; Ward, Linda; Watt, Felicity; Hill, Jason V.; Cheng, Allen C.; Currie, Bart J.
2009-01-01
Background: The soil-dwelling saprophyte bacterium Burkholderia pseudomallei is the cause of melioidosis, a severe disease of humans and animals in southeast Asia and northern Australia. Despite the detection of B. pseudomallei in various soil and water samples from endemic areas, the environmental habitat of B. pseudomallei remains unclear. Methodology/Principal Findings: We performed a large survey in the Darwin area in tropical Australia and screened 809 soil samples for the presence of these bacteria. B. pseudomallei were detected by using a recently developed and validated protocol involving soil DNA extraction and real-time PCR targeting the B. pseudomallei–specific Type III Secretion System TTS1 gene cluster. Statistical analyses such as multivariable cluster logistic regression and principal component analysis were performed to assess the association of B. pseudomallei with environmental factors. The combination of factors describing the habitat of B. pseudomallei differed between undisturbed sites and environmentally manipulated areas. At undisturbed sites, the occurrence of B. pseudomallei was found to be significantly associated with areas rich in grasses, whereas at environmentally disturbed sites, B. pseudomallei was associated with the presence of livestock animals, lower soil pH and different combinations of soil texture and colour. Conclusions/Significance: This study contributes to the elucidation of environmental factors influencing the occurrence of B. pseudomallei and raises concerns that B. pseudomallei may spread due to changes in land use. PMID:19156200
Badran, M; Morsy, R; Soliman, H; Elnimr, T
2016-01-01
Trace element metabolism has been reported to play specific roles in the pathogenesis and progression of diabetes mellitus. Given the continuous increase in the number of patients with Type 2 diabetes (T2D), this study aims to assess the levels and inter-relationships of fasting blood glucose (FBG) and serum trace elements in Type 2 diabetic patients. This study was conducted on 40 Egyptian Type 2 diabetic patients and 36 healthy volunteers (Hospital of Tanta University, Tanta, Egypt). The blood serum was digested and then used to determine the levels of 24 trace elements using inductively coupled plasma mass spectrometry (ICP-MS). Multivariate statistical analyses based on correlation coefficients, cluster analysis (CA) and principal component analysis (PCA) were used to analyze the data. The results showed significant changes in FBG and in the levels of eight trace elements (Zn, Cu, Se, Fe, Mn, Cr, Mg, and As) in the blood serum of Type 2 diabetic patients relative to those of healthy controls. The multivariate statistical techniques were effective in reducing the number of experimental variables and grouped the trace elements in patients into three clusters. The application of PCA revealed a distinct difference in the associations of trace elements and their clustering patterns between the control and patient groups, in particular for Mg, Fe, Cu, and Zn, which appeared to be the factors most closely related to Type 2 diabetes. Therefore, on the basis of this study, the contribution of trace element content in Type 2 diabetic patients can be determined and specified using correlation and multivariate statistical analysis, which confirms that the alteration of some essential trace metals may play a role in the development of diabetes mellitus. Copyright © 2015 Elsevier GmbH. All rights reserved.
NASA Astrophysics Data System (ADS)
Barette, Florian; Poppe, Sam; Smets, Benoît; Benbakkar, Mhammed; Kervyn, Matthieu
2017-10-01
We present an integrated, spatially-explicit database of existing geochemical major-element analyses available from (post-) colonial scientific reports, PhD Theses and international publications for the Virunga Volcanic Province, located in the western branch of the East African Rift System. This volcanic province is characterised by alkaline volcanism, including silica-undersaturated, alkaline and potassic lavas. The database contains a total of 908 geochemical analyses of eruptive rocks for the entire volcanic province with a localisation for most samples. A preliminary analysis of the overall consistency of the database, using statistical techniques on sets of geochemical analyses with contrasted analytical methods or dates, demonstrates that the database is consistent. We applied a principal component analysis and cluster analysis on whole-rock major element compositions included in the database to study the spatial variation of the chemical composition of eruptive products in the Virunga Volcanic Province. These statistical analyses identify spatially distributed clusters of eruptive products. The known geochemical contrasts are highlighted by the spatial analysis, such as the unique geochemical signature of Nyiragongo lavas compared to other Virunga lavas, the geochemical heterogeneity of the Bulengo area, and the trachyte flows of Karisimbi volcano. Most importantly, we identified separate clusters of eruptive products which originate from primitive magmatic sources. These lavas of primitive composition are preferentially located along NE-SW inherited rift structures, often at distance from the central Virunga volcanoes. Our results illustrate the relevance of a spatial analysis on integrated geochemical data for a volcanic province, as a complement to classical petrological investigations. 
This approach indeed helps to characterise geochemical variations within a complex of magmatic systems and to identify specific petrologic and geochemical investigations that should be tackled within a study area.
Kwon, Yong-Kook; Ahn, Myung Suk; Park, Jong Suk; Liu, Jang Ryol; In, Dong Su; Min, Byung Whan; Kim, Suk Weon
2013-01-01
To determine whether Fourier transform (FT)-IR spectral analysis combined with multivariate analysis of whole-cell extracts from ginseng leaves can be applied as a high-throughput discrimination system of cultivation ages and cultivars, a total of 480 leaf samples belonging to 12 categories corresponding to four different cultivars (Yunpung, Kumpung, Chunpung, and an open-pollinated variety) and three different cultivation ages (1 yr, 2 yr, and 3 yr) were subjected to FT-IR. The spectral data were analyzed by principal component analysis and partial least squares-discriminant analysis. A dendrogram based on hierarchical clustering analysis of the FT-IR spectral data on ginseng leaves showed that leaf samples were initially segregated into three groups in a cultivation age-dependent manner. Then, within the same cultivation age group, leaf samples were clustered into four subgroups in a cultivar-dependent manner. The overall prediction accuracy for discrimination of cultivars and cultivation ages was 94.8% in a cross-validation test. These results clearly show that FT-IR spectra from ginseng leaves combined with multivariate analysis can be applied as an alternative tool for discriminating ginseng cultivars and cultivation ages. Therefore, we suggest that this result could be used as a rapid and reliable F1 hybrid seed-screening tool for accelerating the conventional breeding of ginseng. PMID:24558311
Goekoop, Rutger; Goekoop, Jaap G.; Scholte, H. Steven
2012-01-01
Introduction: Human personality is described preferentially in terms of factors (dimensions) found using factor analysis. An alternative and highly related method is network analysis, which may have several advantages over factor analytic methods. Aim: To directly compare the ability of network community detection (NCD) and principal component factor analysis (PCA) to examine modularity in multidimensional datasets such as the neuroticism-extraversion-openness personality inventory revised (NEO-PI-R). Methods: 434 healthy subjects were tested on the NEO-PI-R. PCA was performed to extract factor structures (FS) of the current dataset using both item scores and facet scores. Correlational network graphs were constructed from univariate correlation matrices of interactions between both items and facets. These networks were pruned in a link-by-link fashion while calculating the network community structure (NCS) of each resulting network using the Wakita Tsurumi clustering algorithm. NCSs were matched against FS and networks of best matches were kept for further analysis. Results: At facet level, NCS showed a best match (96.2%) with a ‘confirmatory’ 5-FS. At item level, NCS showed a best match (80%) with the standard 5-FS and involved a total of 6 network clusters. Lesser matches were found with ‘confirmatory’ 5-FS and ‘exploratory’ 6-FS of the current dataset. Network analysis did not identify facets as a separate level of organization in between items and clusters. A small-world network structure was found in both item- and facet level networks. Conclusion: We present the first optimized network graph of personality traits according to the NEO-PI-R: a ‘Personality Web’. Such a web may represent the possible routes that subjects can take during personality development. NCD outperforms PCA by producing plausible modularity at item level in non-standard datasets, and can identify the key roles of individual items and clusters in the network. PMID:23284713
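The idea of pruning a correlational network and inspecting the groups that remain can be sketched in simplified form. The example below is an assumption-laden stand-in: it prunes by a single absolute-correlation cutoff and reports connected components, which is much cruder than the link-by-link pruning and the Wakita Tsurumi community detection used in the study. The function names, the item names and the correlation values are hypothetical.

```python
def threshold_network(corr, names, cutoff):
    """Keep an undirected edge wherever |r| >= cutoff."""
    edges = {name: set() for name in names}
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(corr[i][j]) >= cutoff:
                edges[names[i]].add(names[j])
                edges[names[j]].add(names[i])
    return edges


def connected_components(edges):
    """Group nodes that remain linked after pruning (depth-first search)."""
    seen, components = set(), []
    for start in edges:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(edges[node] - component)
        seen |= component
        components.append(component)
    return components


# Hypothetical correlations among four questionnaire items.
items = ["N1", "N2", "E1", "E2"]
r = [
    [1.0, 0.8, 0.1, 0.1],
    [0.8, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.7],
    [0.1, 0.1, 0.7, 1.0],
]
print(connected_components(threshold_network(r, items, 0.5)))
```

Raising the cutoff removes the weakest links first, which is a coarse analogue of the pruning procedure the abstract describes.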
Does the 1H-NMR plasma metabolome reflect the host-tumor interactions in human breast cancer?
Richard, Vincent; Conotte, Raphaël; Mayne, David; Colet, Jean-Marie
2017-07-25
Breast cancer (BC) is the most commonly diagnosed cancer and the leading cause of cancer death in women worldwide. There is an obvious need for a better understanding of BC biology. Alterations in the serum metabolome of BC patients have been identified but their clinical significance remains elusive. We evaluated the filtered plasma metabolome of 50 early (EBC) and 15 metastatic BC (MBC) patients by 1H-Nuclear Magnetic Resonance (1H-NMR) spectroscopy. Using Principal Component Analysis, Partial Least-Squares Discriminant Analysis and Hierarchical Clustering, we show that plasma levels of glucose, lactate, pyruvate, alanine, leucine, isoleucine, glutamate, glutamine, valine, lysine, glycine, threonine, tyrosine, phenylalanine, acetate, acetoacetate, β-hydroxy-butyrate, urea, creatine and creatinine are modulated across patient clusters. In particular, lactate levels are inversely correlated with tumor size in the EBC cohort (Pearson correlation r = -0.309; p = 0.044). We suggest that, in BC patients, tumor cells could induce modulation of the whole patient's metabolism even at early stages. If confirmed in a larger study, these observations could be of clinical importance.
Li, Muwang; Shen, Li; Xu, Anying; Miao, Xuexia; Hou, Chengxiang; Sun, Pingjiang; Zhang, Yuehua; Huang, Yongping
2005-10-01
To determine genetic relationships among strains of silkworm, Bombyx mori L., 31 strains with different origins, number of generations per year, number of molts per generation, and morphological characters were studied using simple sequence repeat (SSR) markers. Twenty-six primer pairs flanking microsatellite sequences in the silkworm genome were assayed. All were polymorphic and unambiguously separated silkworm strains from each other. A total of 188 alleles were detected with a mean value of 7.2 alleles/locus (range 2-17). The average heterozygosity value for each SSR locus ranged from 0 to 0.60, and the highest one was 0.96 (Fl0516 in 4013). The mean polymorphism index content (PIC) was 0.66 (range 0.12-0.89). Unweighted pair group method with arithmetic means (UPGMA) cluster analysis of Nei's genetic distance grouped silkworm strains based on their origin. Seven major ecotypic silkworm groups were analyzed. Principal components analysis (PCA) of the SSR data supported the UPGMA clustering. The results indicated that SSR markers are an efficient tool for fingerprinting cultivars and conducting genetic-diversity studies in the silkworm.
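The per-locus statistics quoted in this abstract can be illustrated with a short sketch. It is illustrative only: it assumes that PIC follows the widely used Botstein-style formula, which the abstract does not state, and the function names and the two-allele example locus are hypothetical.

```python
from itertools import combinations


def expected_heterozygosity(freqs):
    """Expected heterozygosity for one locus: H = 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs):
    """Assumed formula: PIC = 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2."""
    cross = sum(2.0 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return expected_heterozygosity(freqs) - cross


# Mean alleles per locus from the totals reported in the abstract:
# 188 alleles detected over 26 SSR loci.
mean_alleles = 188 / 26

# Hypothetical locus with two equally frequent alleles (not data from the study).
example_pic = pic([0.5, 0.5])
```

Under these assumptions, the reported totals give about 7.2 alleles per locus, matching the abstract, and a locus with two equally frequent alleles has a PIC of 0.375.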
Basile, F; Voorhees, K J; Hadfield, T L
1995-04-01
Curie-point pyrolysis (Py)-mass spectrometry has been used to differentiate 19 microorganisms by Gram type on the basis of the methyl esters of their fatty acid distribution. The mass spectra of gram-negative microorganisms were characterized by the presence of palmitoleic acid (C16:1) and oleic acid (C18:1), as well as a higher abundance of palmitic acid (C16:0) than pentadecanoic acid (C15:0). For gram-positive microorganisms, a signal of branched C15:0 (isoC15:0 and/or anteisoC15:0) more intense than that of palmitic acid was observed in the mass spectra. Principal components analysis of these mass spectral data segregated the microorganisms investigated in this study into three discrete clusters that correlated to their gram reactions and pathogenicities. Further tandem mass spectrometric analysis demonstrated that the nature of the C15:0 fatty acid isomer (branched or normal) present in the mass spectrum of each microorganism was important for achieving the classification into three clusters.
Cheong, Kit-Leong; Wu, Ding-Tao; Deng, Yong; Leong, Fong; Zhao, Jing; Zhang, Wen-Jie; Li, Shao-Ping
2016-11-20
The objective of this study was to qualify and quantify the specific polysaccharides in Panax spp. The analyses of specific polysaccharides were performed by using GC-MS, saccharide mapping and high performance size exclusion chromatography (HPSEC) coupled with multi angle laser light scattering (MALLS) and refractive index detector (RID). Results showed that compositional monosaccharides were the same in different species of Panax and composed of rhamnose, arabinose, galacturonic acid, mannose, glucose, and galactose. Saccharide mapping results showed that the glycosidic linkages present in specific polysaccharides from Panax spp. were similar. Additionally, the contents of specific polysaccharides of P. ginseng, P. notoginseng and P. quinquefolium were 17.9-20.5 mg/g, 11.9-15.0 mg/g, and 9.9-13.3 mg/g, respectively. P. ginseng, P. notoginseng, and P. quinquefolium could be clustered into three groups using both hierarchical cluster analysis and principal component analysis. The results showed great potential for the characterization and content determination of specific polysaccharides in Panax spp.
Korkmaz, Selcuk; Zararsiz, Gokmen; Goksuluk, Dincer
2015-01-01
Virtual screening is an important step in the early phase of the drug discovery process. Since there are thousands of compounds, this step should be both fast and effective in order to distinguish drug-like from nondrug-like molecules. Statistical machine learning methods are widely used in drug discovery studies for classification purposes. Here, we aim to develop a new tool, which can classify molecules as drug-like and nondrug-like based on various machine learning methods, including discriminant, tree-based, kernel-based, ensemble and other algorithms. To construct this tool, the performances of twenty-three different machine learning algorithms were first compared by ten different measures; the ten best-performing algorithms were then selected based on principal component and hierarchical cluster analysis results. Besides classification, this application can also create heat maps and dendrograms for visual inspection of the molecules through hierarchical cluster analysis. Moreover, users can connect to the PubChem database to download molecular information and to create two-dimensional structures of compounds. This application is freely available through www.biosoft.hacettepe.edu.tr/MLViS/. PMID:25928885
Analysis of PETT images in psychiatric disorders
DOE Office of Scientific and Technical Information (OSTI.GOV)
Brodie, J.D.; Gomez-Mont, F.; Volkow, N.D.
1983-01-01
A quantitative method is presented for studying the pattern of metabolic activity in a set of Positron Emission Transaxial Tomography (PETT) images. Using complex Fourier coefficients as a feature vector for each image, cluster, principal components, and discriminant function analyses are used to empirically describe metabolic differences between control subjects and patients with DSM III diagnosis for schizophrenia or endogenous depression. We also present data on the effects of neuroleptic treatment on the local cerebral metabolic rate of glucose utilization (LCMRGI) in a group of chronic schizophrenics using the region of interest approach. 15 references, 4 figures, 3 tables.
Patidar, Shailesh Kumar; Chokshi, Kaumeel; George, Basil; Bhattacharya, Sourish; Mishra, Sandhya
2015-01-01
Industrial clusters of Gujarat, India, generate high quantity of effluents which are received by aquatic bodies such as estuary and coastal water. In the present study, microalgal assemblage, heavy metals, and physico-chemical variables were studied from different habitats. Principal component analysis revealed that biovolume of cyanobacterial and cryptophytic community positively correlated with the heavy metal concentration (Hg, As, Zn, Fe, Mo, Ni, and Co) and chromophoric dissolved organic matter (CDOM) under hypoxic environment. Green algae and diatoms dominated at comparatively lower nitrate concentration which was positively associated with Pb and Mn.
Detection Method of TOXOPLASMA GONDII Tachyzoites
NASA Astrophysics Data System (ADS)
Eassa, Souzan; Bose, Chhanda; Alusta, Pierre; Tarasenko, Olga
2011-06-01
Tachyzoites are considered to be the most important stage of Toxoplasma gondii, which causes toxoplasmosis. T. gondii is an obligate intracellular parasite which infects a wide range of cells. The present study was designed to develop a method for the early detection of T. gondii tachyzoites. The method comprised a binding assay which was analyzed using principal component and cluster analysis. Our data showed that glycoconjugates GC1, GC2, GC3 and GC10 exhibit a significantly higher binding affinity for T. gondii tachyzoites as compared to controls (T. gondii only, PAA only, GC 1, 2, 3, and 10 only).
Kubo, Yuji; Rooney, Alejandro P; Tsukakoshi, Yoshiki; Nakagawa, Rikio; Hasegawa, Hiromasa; Kimura, Keitarou
2011-09-01
Spore-forming Bacillus strains that produce extracellular poly-γ-glutamic acid were screened for their application to natto (fermented soybean food) fermentation. Among the 424 strains, including Bacillus subtilis and B. amyloliquefaciens, which we isolated from rice straw, 59 were capable of fermenting natto. Biotin auxotrophism was tightly linked to natto fermentation. A multilocus nucleotide sequence of six genes (rpoB, purH, gyrA, groEL, polC, and 16S rRNA) was used for phylogenetic analysis, and amplified fragment length polymorphism (AFLP) analysis was also conducted on the natto-fermenting strains. The ability to ferment natto was inferred from the two principal components of the AFLP banding pattern, and natto-fermenting strains formed a tight cluster within the B. subtilis subsp. subtilis group.
Kubo, Yuji; Rooney, Alejandro P.; Tsukakoshi, Yoshiki; Nakagawa, Rikio; Hasegawa, Hiromasa; Kimura, Keitarou
2011-01-01
Spore-forming Bacillus strains that produce extracellular poly-γ-glutamic acid were screened for their application to natto (fermented soybean food) fermentation. Among the 424 strains, including Bacillus subtilis and B. amyloliquefaciens, which we isolated from rice straw, 59 were capable of fermenting natto. Biotin auxotrophism was tightly linked to natto fermentation. A multilocus nucleotide sequence of six genes (rpoB, purH, gyrA, groEL, polC, and 16S rRNA) was used for phylogenetic analysis, and amplified fragment length polymorphism (AFLP) analysis was also conducted on the natto-fermenting strains. The ability to ferment natto was inferred from the two principal components of the AFLP banding pattern, and natto-fermenting strains formed a tight cluster within the B. subtilis subsp. subtilis group. PMID:21764950
Yu, Xiaoxue; Zhang, Yafeng; Wang, Dongmei; Jiang, Lin; Xu, Xinjun
2018-01-01
Background: Citri Reticulatae Pericarpium is the dried mature pericarp of Citrus reticulata Blanco which can be divided into “Chenpi” and “Guangchenpi.” “Guangchenpi” is the genuine Chinese medicinal material in Xinhui, Guangdong province; based on the greatest quality and least amount, it is most expensive among others. Hesperidin is used as the marker to identify Citri Reticulatae Pericarpium described in the Chinese Pharmacopoeia 2010. However, both “Chenpi” and “Guangchenpi” contain hesperidin so that it is impossible to differentiate them by measuring hesperidin. Objective: Our study aims to develop an efficient and accurate method to separate and identify “Guangchenpi” from other Citri Reticulatae Pericarpium. Materials and Methods: The genomic deoxyribonucleic acid (DNA) of all the materials was extracted and then the internal transcribed spacer 2 was amplified, sequenced, aligned, and analyzed. The secondary structures were created in terms of the database and website established by Jörg Schultz et al. High-performance liquid chromatography-diode array detection-electrospray Ionization/mass spectrometry (HPLC-DAD-ESI-MS)/MS coupled with chemometric analysis was applied to compare the differences in chemical profiles of the three kinds of Citri Reticulatae Pericarpium. Results: A total of 22 samples were classified into three groups. The results of DNA barcoding were in accordance with principal component analysis and hierarchical cluster analysis. Eight compounds were deduced from HPLC-DAD-ESI-MS/MS. Conclusions: This method is a reliable and effective tool to differentiate the three Citri Reticulatae Pericarpium. 
SUMMARY The internal transcribed spacer 2 regions and the secondary structure among three kinds of Citri Reticulatae Pericarpium varied considerably. All the 22 samples were analyzed by high-performance liquid chromatography (HPLC) to obtain the chemical profiles. Principal component analysis and hierarchical cluster analysis were used in the chemometric analysis. Deoxyribonucleic acid barcoding and HPLC-diode array detection-electrospray ionization/mass spectrometry/MS coupled with chemometric analysis provided an accurate and strong proof to identify these three herbs. Abbreviations used: CTAB: Hexadecyltrimethylammonium bromide, DNA: Deoxyribonucleic acid, ITS2: Internal transcribed spacer 2, PCR: Polymerase chain reaction. PMID:29576703
Selection of fragrance for cosmetic cream containing olive oil.
Parente, María Emma; Gámbaro, Adriana; Boinbaser, Lucía; Roascio, Antonella
2014-01-01
Perceptions of essences for potential use in the development of a line of cosmetic emulsions containing olive oil were studied. Six cream samples prepared with six essences selected in a preliminary study were evaluated for overall liking and intention to purchase by a 63-women sample. A check-all-that-apply (CATA) question consisting of 32 terms was used to gather information about consumer perceptions of fragrance, affective associations, effects on the skin, price, target market, zones of application, and occasions of use. Hierarchical cluster analysis led to the identification of two consumer clusters with different frequency of use of face creams. The two clusters assigned different overall liking scores to the samples and used the CATA terms differently to describe them. A fragrance with jasmine as its principal note was selected for further development of cosmetic creams, as it was awarded the highest overall liking scores by respondents of the two clusters, and was significantly associated with cosmetic features including nourishing, moisturizing, softening, with a delicious and mild smell, and with a natural image, as well as being considered suitable for face and body creams. The use of CATA questions enabled the rapid identification of attributes associated by respondents with a cosmetic cream's fragrance, in addition to contributing relevant information for the definition of marketing and communication strategies.
Fine-scale population genetic structure of arctic foxes (Vulpes lagopus) in the High Arctic.
Lai, Sandra; Quiles, Adrien; Lambourdière, Josie; Berteaux, Dominique; Lalis, Aude
2017-12-01
The arctic fox (Vulpes lagopus) is a circumpolar species inhabiting all accessible Arctic tundra habitats. The species forms a panmictic population over areas connected by sea ice, but recently, kin clustering and population differentiation were detected even in regions where sea ice was present. The purpose of this study was to examine the genetic structure of a population in the High Arctic using a robust panel of highly polymorphic microsatellites. We analyzed the genotypes of 210 individuals from Bylot Island, Nunavut, Canada, using 15 microsatellite loci. No pattern of isolation-by-distance was detected, but a spatial principal component analysis (sPCA) revealed the presence of genetic subdivisions. Overall, the sPCA revealed two spatially distinct genetic clusters corresponding to the northern and southern parts of the study area, plus another subdivision within each of these two clusters. The north-south genetic differentiation partly matched the distribution of a snow goose colony, which could reflect a preference for settling into familiar ecological environments. Secondary clusters may result from higher-order social structures (neighbourhoods) that use landscape features to delimit their borders. The cryptic genetic subdivisions found in our population may highlight ecological processes deserving further investigations in arctic foxes at larger, regional spatial scales.
Micro-Raman spectroscopy of natural and synthetic indigo samples.
Vandenabeele, Peter; Moens, Luc
2003-02-01
In this work indigo samples from three different sources are studied by using Raman spectroscopy: the synthetic pigment and pigments from the woad (Isatis tinctoria) and the indigo plant (Indigofera tinctoria). 21 samples were obtained from 8 suppliers; for each sample 5 Raman spectra were recorded and used for further chemometrical analysis. Principal components analysis (PCA) was performed as data reduction method before applying hierarchical cluster analysis. Linear discriminant analysis (LDA) was implemented as a non-hierarchical supervised pattern recognition method to build a classification model. In order to avoid broad-shaped interferences from the fluorescence background, the influence of 1st and 2nd derivatives on the classification was studied by using cross-validation. Although chemically identical, it is shown that Raman spectroscopy in combination with suitable chemometric methods has the potential to discriminate between synthetic and natural indigo samples.
NASA Astrophysics Data System (ADS)
Grasel, Fábio dos Santos; Ferrão, Marco Flôres; Wolf, Carlos Rodolfo
2016-01-01
Tannins are polyphenolic compounds of complex structures formed by secondary metabolism in several plants. These polyphenolic compounds have different applications, such as drugs, anti-corrosion agents, flocculants, and tanning agents. This study analyses six different types of polyphenolic extracts by Fourier transform infrared spectroscopy (FTIR) combined with multivariate analysis. Through both principal component analysis (PCA) and hierarchical cluster analysis (HCA), we observed well-defined separation between condensed (quebracho and black wattle) and hydrolysable (valonea, chestnut, myrobalan, and tara) tannins. For hydrolysable tannins, it was also possible to observe the formation of two different subgroups between samples of chestnut and valonea and between samples of tara and myrobalan. Among all samples analysed, the chestnut and valonea showed the greatest similarity, indicating that these extracts have equivalent chemical compositions and structures and, therefore, similar properties.
The hoard of Beçin—non-destructive analysis of the silver coins
NASA Astrophysics Data System (ADS)
Rodrigues, M.; Schreiner, M.; Mäder, M.; Melcher, M.; Guerra, M.; Salomon, J.; Radtke, M.; Alram, M.; Schindel, N.
2010-05-01
We report the results of an analytical investigation on 416 silver-copper coins stemming from the Ottoman Empire (end of 16th and beginning of 17th centuries), using synchrotron micro X-ray fluorescence analysis (SRXRF). In the past, analyses had already been conducted with energy dispersive X-ray fluorescence analysis (EDXRF), scanning electron microscopy with energy dispersive X-ray spectrometry (SEM/EDX) and proton induced X-ray emission spectroscopy (PIXE). With this combination of techniques it was possible to confirm the fineness of the coinage as well as to study the provenance of the alloy used for the coins. For the interpretation of the data statistical analysis (principal component analysis—PCA) has been performed. A definite local assignment was explored and significant clustering was obtained regarding the minor and trace elements composing the coin alloys.
Identification of Marker-Trait Associations for Lint Traits in Cotton
Iqbal, Muhammad A.; Rahman, Mehboob-ur-
2017-01-01
Harvesting high quality lint, a long-awaited breeding goal—accomplished partly, can be achieved by identifying DNA markers which could be used for diagnosing cotton plants containing the desired traits. In the present studies, a total of 185 cotton genotypes exhibiting diversity for lint traits were selected from a set of 546 genotypes evaluated for fiber traits in 2009. These genotypes were extensively studied for three consecutive years (2011–2013) at three different locations. Significant genetic variations were found for average boll weight, ginning out turn (GOT), micronaire value, staple length, fiber bundle strength, and uniformity index. IR-NIBGE-3701 showed maximum GOT (43.63%). Clustering of genotypes using Ward's method was found more informative than that of the clusters generated by principal component analysis. A total of 382 SSRs were surveyed on 10 Gossypium hirsutum genotypes exhibiting contrasting fiber traits. Out of these, 95 polymorphic SSR primer pairs were then surveyed on 185 genotypes. The gene diversity averaged 0.191 and the polymorphic information content (PIC) averaged 0.175. Unweighted pair group method with arithmetic mean (UPGMA), principal coordinate analysis (PCoA), and STRUCTURE software grouped these genotypes into four major clusters each. Genetic distance within the clusters ranged from 0.0587 to 0.1030. A total of 47 (25.41%) genotypes exhibited shared ancestry. In total 6.8% (r2 ≥ 0.05) and 4.4% (r2 ≥ 0.1) of the marker pairs showed significant linkage disequilibrium (LD). A number of marker-trait associations (in total 75) including 13 for average boll weight, 18 for GOT percentage, eight for micronaire value, 18 for staple length, three for fiber bundle strength, and 15 for uniformity index were calculated. Out of these, MGHES-51 was associated with all the traits. Most of the marker-trait associations were novel while few validated the associations reported in the previous studies. 
High frequency of favorable alleles in cultivated varieties is possibly due to fixation of desirable alleles by domestication. These favorable alleles can be used in marker assisted breeding or for gene cloning using next generation sequencing tools. The present studies would set a stage for harvesting high quality lint without compromising the yield potential—ascertaining natural fiber security. PMID:28220132
Pertoldi, Cino; Sonne, Christian; Wiig, Øystein; Baagøe, Hans J; Loeschcke, Volker; Bechshøft, Thea Østergaard
2012-06-01
A morphometric study was conducted on four skull traits of 37 male and 18 female adult East Greenland polar bears (Ursus maritimus) collected 1892-1968, and on 54 male and 44 female adult Barents Sea polar bears collected 1950-1969. The aim was to compare differences in size and shape of the bear skulls using a multivariate approach, characterizing the variation between the two populations using morphometric traits as an indicator of environmental and genetic differences. Mixture analysis testing for geographic differentiation within each population revealed three clusters for Barents Sea males and three clusters for Barents Sea females. East Greenland consisted of one female and one male cluster. A principal component analysis (PCA) conducted on the clusters defined by the mixture analysis, showed that East Greenland and Barents Sea polar bear populations overlapped to a large degree, especially with regards to females. Multivariate analyses of variance (MANOVA) showed no significant differences in morphometric means between the two populations, but differences were detected between clusters from each respective geographic locality. To estimate the importance of genetics and environment in the morphometric differences between the bears, a PCA was performed on the covariance matrix derived from the skull measurements. Skull trait size (PC1) explained approx. 80% of the morphometric variation, whereas shape (PC2) defined approx. 15%, indicating some genetic differentiation. Hence, both environmental and genetic factors seem to have contributed to the observed skull differences between the two populations. Overall, results indicate that many Barents Sea polar bears are morphometrically similar to the East Greenland ones, suggesting an exchange of individuals between the two populations. 
Furthermore, a subpopulation structure in the Barents Sea population was also indicated from the present analyses, which should be considered with regards to future management decisions.
Muntaner, Carles; Chung, Haejoo; Benach, Joan; Ng, Edwin
2012-04-18
An important contribution of the social determinants of health perspective has been to inquire about non-medical determinants of population health. Among these, labour market regulations are of vital significance. In this study, we investigate the labour market regulations among low- and middle-income countries (LMICs) and propose a labour market taxonomy to further understand population health in a global context. Using Gross National Product per capita, we classify 113 countries into either low-income (n = 71) or middle-income (n = 42) strata. Principal component analysis of three standardized indicators of labour market inequality and poverty is used to construct 2 factor scores. Factor score reliability is evaluated with Cronbach's alpha. Using these scores, we conduct a hierarchical cluster analysis to produce a labour market taxonomy, conduct zero-order correlations, and create box plots to test their associations with adult mortality, healthy life expectancy, infant mortality, maternal mortality, neonatal mortality, under-5 mortality, and years of life lost to communicable and non-communicable diseases. Labour market and health data are retrieved from the International Labour Organization's Key Indicators of Labour Markets and World Health Organization's Statistical Information System. Six labour market clusters emerged: Residual (n = 16), Emerging (n = 16), Informal (n = 10), Post-Communist (n = 18), Less Successful Informal (n = 22), and Insecure (n = 31). 
Primary findings indicate: (i) labour market poverty and population health are correlated in both low- and middle-income countries; (ii) the association between labour market inequality and health indicators is significant only in low-income countries; (iii) Emerging (e.g., East Asian and Eastern European countries) and Insecure (e.g., sub-Saharan African nations) clusters are the most advantaged and disadvantaged, respectively, with the remaining clusters experiencing levels of population health consistent with their labour market characteristics. The labour market regulations of LMICs appear to be an important social determinant of population health. This study demonstrates the heuristic value of understanding the labour markets of LMICs and their health effects using exploratory taxonomy approaches.
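The reliability check mentioned in this abstract (Cronbach's alpha for the indicators behind each factor score) can be sketched as follows. This is a minimal illustration using the standard textbook formula; the function name and the toy indicator values are hypothetical and are not taken from the study.

```python
import statistics


def cronbach_alpha(items):
    """Cronbach's alpha for k indicators.

    items: list of k columns, each a list of n scores (one per country).
    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score))
    """
    k = len(items)
    if k < 2:
        raise ValueError("need at least two indicators")
    # Sample variance of each indicator across countries.
    item_vars = [statistics.variance(col) for col in items]
    # Total score per country is the sum of its indicator values.
    totals = [sum(row) for row in zip(*items)]
    total_var = statistics.variance(totals)
    return (k / (k - 1)) * (1.0 - sum(item_vars) / total_var)


# Hypothetical standardized indicators for four countries.
alpha = cronbach_alpha([[1, 2, 3, 4], [2, 1, 4, 3]])
```

With these toy values the two indicators rise together only partly, so alpha falls below 1; two identical indicators would give exactly 1.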
Images of Leadership and their Effect Upon School Principals' Performance
NASA Astrophysics Data System (ADS)
Gaziel, Haim
2003-09-01
The purpose of the present study is to identify how school principals perceive their world and how their perceptions influence their effectiveness as managers and leaders. The principals' views of their world were categorised into four different metaphorical ways of describing the workings of organisations: (1) the structural model (organisations as machines); (2) the human-resource model (organisations as organisms); (3) the political model (organisations as political systems); (4) the symbolic model (organisations as cultural patterns and clusters of myths and symbols). The results reveal that the best predictors of school principals' effectiveness as managers, according to their own assessments and teachers' reports, are the structural and human resource models, while the best predictors of effective leadership are the political and human-resource models.
A Parametric k-Means Algorithm
Tarpey, Thaddeus
2007-01-01
Summary: The k points that optimally represent a distribution (usually in terms of a squared error loss) are called the k principal points. This paper presents a computationally intensive method that automatically determines the principal points of a parametric distribution. Cluster means from the k-means algorithm are nonparametric estimators of principal points. A parametric k-means approach is introduced for estimating principal points by running the k-means algorithm on a very large simulated data set from a distribution whose parameters are estimated using maximum likelihood. Theoretical and simulation results are presented comparing the parametric k-means algorithm to the usual k-means algorithm and an example on determining sizes of gas masks is used to illustrate the parametric k-means algorithm. PMID:17917692
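The procedure described in this abstract (fit parameters by maximum likelihood, simulate a large data set, run k-means on it) can be sketched for the simplest case. This is a minimal sketch assuming a univariate normal model; it is not the paper's implementation, and the function names, simulation size, and quantile-based initialisation are illustrative choices.

```python
import random
import statistics


def kmeans_1d(data, k, iters=50):
    """Lloyd's algorithm in one dimension; returns the sorted cluster means."""
    data = sorted(data)
    n = len(data)
    # Deterministic initialisation at evenly spaced sample quantiles.
    centres = [data[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    for _ in range(iters):
        sums = [0.0] * k
        counts = [0] * k
        for x in data:
            # Assign each point to its nearest centre (squared error loss).
            j = min(range(k), key=lambda c: (x - centres[c]) ** 2)
            sums[j] += x
            counts[j] += 1
        # Keep the old centre if a cluster happens to be empty.
        new = [sums[j] / counts[j] if counts[j] else centres[j] for j in range(k)]
        if new == centres:
            break
        centres = new
    return sorted(centres)


def parametric_kmeans_normal(sample, k, n_sim=20000, seed=0):
    """Estimate k principal points of a normal model fitted to `sample`."""
    mu = statistics.fmean(sample)
    sigma = statistics.pstdev(sample)  # ML estimate of sigma (divides by n)
    rng = random.Random(seed)
    simulated = [rng.gauss(mu, sigma) for _ in range(n_sim)]
    return kmeans_1d(simulated, k)


# Hypothetical sample whose ML estimates are mu = 0 and sigma = 1.
points = parametric_kmeans_normal([-1.0, 1.0], k=2)
```

For a standard normal distribution the two principal points are known to be ±sqrt(2/π), roughly ±0.80, so the estimates from this sketch should land close to those values. A multivariate or non-normal model would need a different simulation step but the same overall structure.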
NASA Astrophysics Data System (ADS)
Yamashita, S.; Nakajo, T.; Naruse, H.
2009-12-01
In this study, we statistically classified the grain size distribution of the bottom surface sediment on a microtidal sand flat to analyze the depositional processes of the sediment. Multiple classification analysis revealed that two types of sediment populations exist in the bottom surface sediment. Then, we employed the sediment trend model developed by Gao and Collins (1992) for the estimation of sediment transport pathways. As a result, we found that statistical discrimination of the bottom surface sediment provides useful information for the sediment trend model while dealing with various types of sediment transport processes. The microtidal sand flat along the Kushida River estuary, Ise Bay, central Japan, was investigated, and 102 bottom surface sediment samples were obtained. Then, their grain size distribution patterns were measured by the settling tube method, and each grain size distribution parameter (mud and gravel contents, mean grain size, coefficient of variation (CV), skewness, kurtosis, and the 5th, 25th, 50th, 75th, and 95th percentiles) was calculated. Here, CV is the sorting value normalized by (divided by) the mean grain size. Two classical statistical methods, principal component analysis (PCA) and fuzzy cluster analysis, were applied. The results of PCA showed that the bottom surface sediment of the study area is mainly characterized by grain size (mean grain size and 5-95 percentile) and the CV value, which had predominantly large absolute factor loadings on principal component (PC) 1. PC1 is interpreted as being indicative of the grain-size trend, in which a finer grain-size distribution indicates better size sorting. The frequency distribution of PC1 has a bimodal shape and suggests the existence of two types of sediment populations. Therefore, we applied fuzzy cluster analysis, the results of which revealed two groupings of the sediment (Cluster 1 and Cluster 2). Cluster 1 shows a lower value of PC1, indicating coarse and poorly sorted sediments. 
Cluster 1 sediments are distributed around the branched channel from Kushida River and show an expanding distribution from the river mouth toward the northeast direction. Cluster 2 shows a higher value of PC1, indicating fine and well-sorted sediments; this cluster is distributed in a distant area from the river mouth, including the offshore region. Therefore, Cluster 1 and Cluster 2 are interpreted as being deposited by fluvial and wave processes, respectively. Finally, on the basis of this distribution pattern, the sediment trend model was applied in areas dominated separately by fluvial and wave processes. Resultant sediment transport patterns showed good agreement with those obtained by field observations. The results of this study provide an important insight into the numerical models of sediment transport.
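The grain-size parameters listed in this abstract, including the CV defined as sorting divided by mean grain size, can be illustrated with a short sketch. It assumes moment-based statistics on a list of positive grain sizes; the abstract does not say which estimator was used, and the function name and input values are hypothetical.

```python
import statistics


def grain_size_summary(sizes):
    """Moment-based grain-size parameters for one sample (assumed method)."""
    mean = statistics.fmean(sizes)
    sorting = statistics.pstdev(sizes)  # sorting taken as the standard deviation
    if mean == 0 or sorting == 0:
        raise ValueError("mean and sorting must be non-zero")
    # Third and fourth central moments for skewness and kurtosis.
    m3 = statistics.fmean((x - mean) ** 3 for x in sizes)
    m4 = statistics.fmean((x - mean) ** 4 for x in sizes)
    return {
        "mean": mean,
        "sorting": sorting,
        "cv": sorting / mean,  # CV as defined in the abstract
        "skewness": m3 / sorting ** 3,
        "kurtosis": m4 / sorting ** 4,
    }


# Hypothetical grain sizes in arbitrary positive units.
summary = grain_size_summary([1.0, 2.0, 3.0, 4.0, 5.0])
```

A table of such summaries, one row per sample, is the kind of input that PCA and fuzzy clustering would then operate on. Dividing by the mean only makes sense for sizes on a positive scale, which is why the sketch rejects a zero mean.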
Liem, David Alexandre; Murali, Sanjana; Sigdel, Dibakar; Shi, Yu; Wang, Xuan; Shen, Jiaming; Choi, Howard; Caufield, J Harry; Wang, Wei; Ping, Peipei; Han, Jiawei
2018-05-18
Extracellular matrix (ECM) proteins have been shown to play important roles regulating multiple biological processes in an array of organ systems, including the cardiovascular system. By using a novel bioinformatics text-mining tool, we studied six categories of cardiovascular disease (CVD), namely ischemic heart disease (IHD), cardiomyopathies (CM), cerebrovascular accident (CVA), congenital heart disease (CHD), arrhythmias (ARR), and valve disease (VD), anticipating novel ECM protein-disease and protein-protein relationships hidden within vast quantities of textual data. We conducted a phrase-mining analysis, delineating the relationships of 709 ECM proteins with the six groups of CVDs reported in 1,099,254 abstracts. The technology pipeline known as Context-aware Semantic Online Analytical Processing (CaseOLAP) was applied to semantically rank the association of proteins to each and all six CVDs, performing analyses to quantify each protein-disease relationship. We performed principal component analysis and hierarchical clustering of the data, where each protein is visualized as a six dimensional vector. We found that ECM proteins display variable degrees of association with the six CVDs; certain CVDs share groups of associated proteins whereas others have divergent protein associations. We identified 82 ECM proteins sharing associations with all six CVDs. Our bioinformatics analysis ascribed distinct ECM pathways (via Reactome) from this subset of proteins, namely insulin-like growth factor regulation and interleukin-4 and interleukin-13 signaling, suggesting their contribution to the pathogenesis of all six CVDs. Finally, we performed hierarchical clustering analysis and identified protein clusters associated with a targeted CVD; analyses revealed unexpected insights underlying ECM-pathogenesis of CVDs.
Assessment of Depression in a Rodent Model of Spinal Cord Injury
Luedtke, Kelsey; Bouchard, Sioui Maldonado; Woller, Sarah A.; Funk, Mary Katherine; Aceves, Miriam
2014-01-01
Despite an increased incidence of depression in patients after spinal cord injury (SCI), there is no animal model of depression after SCI. To address this, we used a battery of established tests to assess depression after a rodent contusion injury. Subjects were acclimated to the tasks, and baseline scores were collected before SCI. Testing was conducted on days 9–10 (acute) and 19–20 (chronic) postinjury. To categorize depression, subjects' scores on each behavioral measure were averaged across the acute and chronic stages of injury and subjected to a principal component analysis. This analysis revealed a two-component structure, which explained 72.2% of between-subjects variance. The data were then analyzed with a hierarchical cluster analysis, identifying two clusters that differed significantly on the sucrose preference, open field, social exploration, and burrowing tasks. One cluster (9 of 26 subjects) displayed characteristics of depression. Using these data, a discriminant function analysis was conducted to derive an equation that could classify subjects as “depressed” on days 9–10. The discriminant function was used in a second experiment examining whether the depression-like symptoms could be reversed with the antidepressant, fluoxetine. Fluoxetine significantly decreased immobility in the forced swim test (FST) in depressed subjects identified with the equation. Subjects that were depressed and treated with saline displayed significantly increased immobility on the FST, relative to not depressed, saline-treated controls. These initial experiments validate our tests of depression, generating a powerful model system for further understanding the relationships between molecular changes induced by SCI and the development of depression. PMID:24564232
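The two-step analysis described in this abstract (principal component analysis of the averaged scores, then hierarchical clustering into two groups) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' procedure: the data are synthetic, the group means are made up, clustering is run on the first two component scores, and average linkage with Euclidean distance is an assumption, since the abstract does not name the linkage or distance used. The helper names pca_scores and average_linkage are hypothetical.

```python
import numpy as np

def pca_scores(X, n_components):
    """Standardize each column, then project onto the leading principal axes."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # SVD of the standardized data: the rows of Vt are the principal axes.
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = s**2 / np.sum(s**2)  # share of variance per component
    return Z @ Vt[:n_components].T, explained[:n_components]

def average_linkage(points, n_clusters):
    """Naive agglomerative clustering (average linkage, Euclidean distance)."""
    clusters = [[i] for i in range(len(points))]
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Mean pairwise distance between the two candidate clusters.
                d = dist[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    labels = np.empty(len(points), dtype=int)
    for k, members in enumerate(clusters):
        labels[members] = k
    return labels

# Synthetic stand-in data: 26 subjects x 4 behavioral measures (made-up values).
rng = np.random.default_rng(0)
group_a = rng.normal(loc=0.0, scale=1.0, size=(17, 4))
group_b = rng.normal(loc=6.0, scale=1.0, size=(9, 4))
X = np.vstack([group_a, group_b])

scores, explained = pca_scores(X, n_components=2)
labels = average_linkage(scores, n_clusters=2)
print("variance explained by two components:", explained.sum())
print("cluster sizes:", np.bincount(labels))
```

With real behavioral data, a library routine such as SciPy's hierarchical clustering would normally replace the naive loop; it is written out here only to keep the sketch dependent on NumPy alone.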
Gad, Haidy A; El-Ahmady, Sherweit H; Abou-Shoer, Mohamed I; Al-Azizi, Mohamed M
2013-01-01
Recently, the fields of chemometrics and multivariate analysis have been widely implemented in the quality control of herbal drugs to produce precise results, which is crucial in the field of medicine. Thyme represents an essential medicinal herb that is constantly adulterated due to its resemblance to many other plants with similar organoleptic properties. The objective was to establish a simple model for the quality assessment of Thymus species using UV spectroscopy together with known chemometric techniques. The success of this model may also serve as a technique for the quality control of other herbal drugs. The model was constructed using 30 samples of authenticated Thymus vulgaris and challenged with 20 samples of different botanical origins. The methanolic extracts of all samples were assessed using UV spectroscopy together with chemometric techniques: principal component analysis (PCA), soft independent modeling of class analogy (SIMCA) and hierarchical cluster analysis (HCA). The model was able to discriminate T. vulgaris from other Thymus, Satureja, Origanum, Plectranthus and Eriocephalus species, all traded in the Egyptian market as different types of thyme. The model was also able to classify closely related species in clusters using PCA and HCA. The model was finally used to classify 12 commercial thyme varieties into clusters of species incorporated in the model as thyme or non-thyme. The model constructed is highly recommended as a simple and efficient method for distinguishing T. vulgaris from other related species as well as the classification of marketed herbs as thyme or non-thyme. Copyright © 2013 John Wiley & Sons, Ltd.
Colorimetric sensing of anions in water using ratiometric indicator-displacement assay.
Feng, Liang; Li, Hui; Li, Xiao; Chen, Liang; Shen, Zheng; Guan, Yafeng
2012-09-19
The analysis of anions in water presents a difficult challenge due to their low charge-to-radius ratio, and the ability to discriminate among similar anions often remains problematic. The use of a 3×6 ratiometric indicator-displacement assay (RIDA) array for the colorimetric detection and identification of ten anions in water is reported. The sensor array consists of different combinations of colorimetric indicators and metal cations. The colorimetric indicators chelate with metal cations, forming the color changes. Upon the addition of anions, anions compete with the indicator ligands according to solubility product constants (K(sp)). The indicator-metal chelate compound changes color back dramatically when the competition of anions wins. The color changes of the RIDA array were used as a digital representation of the array response and analyzed with standard statistical methods, including principal component analysis and hierarchical clustering analysis. No confusion or errors in classification by hierarchical clustering analysis were observed in 44 trials. The limit of detection was calculated approximately, and most limits of detections of anions are well below μM level using our RIDA array. The pH effect, temperature influence, interfering anions were also investigated, and the RIDA array shows the feasibility of real sample testing. Copyright © 2012 Elsevier B.V. All rights reserved.
Bayesian and Phylogenic Approaches for Studying Relationships among Table Olive Cultivars.
Ben Ayed, Rayda; Ennouri, Karim; Ben Amar, Fathi; Moreau, Fabienne; Triki, Mohamed Ali; Rebai, Ahmed
2017-08-01
To enhance table olive tree authentication, relationship, and productivity, we consider the analysis of 18 worldwide table olive cultivars (Olea europaea L.) based on morphological, biological, and physicochemical markers analyzed by bioinformatic and biostatistic tools. Accordingly, we assess the relationships between the studied varieties, on the one hand, and the potential productivity-quantitative parameter links on the other hand. The bioinformatic analysis based on the graphical representation of the matrix of Euclidean distances, the principal components analysis, unweighted pair group method with arithmetic mean, and principal coordinate analysis (PCoA) revealed three major clusters which were not correlated with the geographic origin. The statistical analysis based on Kendall's and Spearman correlation coefficients suggests two highly significant associations with both fruit color and pollinization and the productivity character. These results are confirmed by the multiple linear regression prediction models. In fact, based on the coefficient of determination (R²) value, the best model demonstrated the power of the pollinization on the tree productivity (R² = 0.846). Moreover, the derived directed acyclic graph showed that only two direct influences are detected: effect of tolerance on fruit and stone symmetry on one side and effect of tolerance on stone form and oil content on the other side. This work provides better understanding of the diversity available in worldwide table olive cultivars and supplies an important contribution for olive breeding and authenticity.
Fasoula, S; Zisi, Ch; Sampsonidis, I; Virgiliou, Ch; Theodoridis, G; Gika, H; Nikitas, P; Pappa-Louisi, A
2015-03-27
In the present study a series of 45 metabolite standards belonging to four chemically similar metabolite classes (sugars, amino acids, nucleosides and nucleobases, and amines) was subjected to LC analysis on three HILIC columns under 21 different gradient conditions with the aim to explore whether the retention properties of these analytes are determined by the chemical group to which they belong. Two multivariate techniques, principal component analysis (PCA) and discriminant analysis (DA), were used for statistical evaluation of the chromatographic data and extraction of similarities between chemically related compounds. The total variance explained by the first two principal components of PCA was found to be about 98%, whereas both statistical analyses indicated that all analytes are successfully grouped in four clusters of chemical structure based on the retention obtained in four or at least three chromatographic runs, which, however, should be performed on two different HILIC columns. Moreover, leave-one-out cross-validation of the above retention data set showed that the chemical group to which an analyte belongs can be 95.6% correctly predicted when the analyte is subjected to LC analysis under the same four or three experimental conditions under which the full set of analytes was run beforehand. That, in turn, may assist with disambiguation of analyte identification in complex biological extracts. Copyright © 2015 Elsevier B.V. All rights reserved.
Li, Jinling; He, Ming; Han, Wei; Gu, Yifan
2009-05-30
An investigation on heavy metal sources, i.e., Cu, Zn, Ni, Pb, Cr, and Cd in the coastal soils of Shanghai, China, was conducted using multivariate statistical methods (principal component analysis, clustering analysis, and correlation analysis). All the results of the multivariate analysis showed that: (i) Cu, Ni, Pb, and Cd had anthropogenic sources (e.g., overuse of chemical fertilizers and pesticides, industrial and municipal discharges, animal wastes, sewage irrigation, etc.); (ii) Zn and Cr were associated with parent materials and therefore had natural sources (e.g., the weathering process of parent materials and subsequent pedo-genesis due to the alluvial deposits). The effect of heavy metals in the soils was greatly affected by soil formation, atmospheric deposition, and human activities. These findings provided essential information on the possible sources of heavy metals, which would contribute to the monitoring and assessment process of agricultural soils in worldwide regions.
Chang, Cheng; Xu, Kaikun; Guo, Chaoping; Wang, Jinxia; Yan, Qi; Zhang, Jian; He, Fuchu; Zhu, Yunping
2018-05-22
Compared with the numerous software tools developed for identification and quantification of -omics data, there remains a lack of suitable tools for both downstream analysis and data visualization. To help researchers better understand the biological meanings in their -omics data, we present an easy-to-use tool, named PANDA-view, for both statistical analysis and visualization of quantitative proteomics data and other -omics data. PANDA-view contains various kinds of analysis methods such as normalization, missing value imputation, statistical tests, clustering and principal component analysis, as well as the most commonly-used data visualization methods including an interactive volcano plot. Additionally, it provides user-friendly interfaces for protein-peptide-spectrum representation of the quantitative proteomics data. PANDA-view is freely available at https://sourceforge.net/projects/panda-view/. Contact: 1987ccpacer@163.com and zhuyunping@gmail.com. Supplementary data are available at Bioinformatics online.
Tahir, Haroon Elrasheid; Xiaobo, Zou; Xiaowei, Huang; Jiyong, Shi; Mariod, Abdalbasit Adam
2016-09-01
Aroma profiles of six honey varieties of different botanical origins were investigated using colorimetric sensor array, gas chromatography-mass spectrometry (GC-MS) and descriptive sensory analysis. Fifty-eight aroma compounds were identified, including 2 norisoprenoids, 5 hydrocarbons, 4 terpenes, 6 phenols, 7 ketones, 9 acids, 12 aldehydes and 13 alcohols. Twenty abundant or active compounds were chosen as key compounds to characterize honey aroma. Discrimination of the honeys was subsequently implemented using multivariate analysis, including hierarchical clustering analysis (HCA) and principal component analysis (PCA). Honeys of the same botanical origin were grouped together in the PCA score plot and HCA dendrogram. SPME-GC/MS and colorimetric sensor array were able to discriminate the honeys effectively with the advantages of being rapid, simple and low-cost. Moreover, partial least squares regression (PLSR) was applied to indicate the relationship between sensory descriptors and aroma compounds. Copyright © 2016 Elsevier Ltd. All rights reserved.
Landsat-4 MSS and Thematic Mapper data quality and information content analysis
NASA Technical Reports Server (NTRS)
Anuta, P. E.; Bartolucci, L. A.; Dean, M. E.; Lozano, D. F.; Malaret, E.; Mcgillem, C. D.; Valdes, J. A.; Valenzuela, C. R.
1984-01-01
Landsat-4 Thematic Mapper and Multispectral Scanner data were analyzed to obtain information on data quality and information content. Geometric evaluations were performed to test band-to-band registration accuracy. Thematic Mapper overall system resolution was evaluated using scene objects which demonstrated sharp high contrast edge responses. Radiometric evaluation included detector relative calibration, effects of resampling, and coherent noise effects. Information content evaluation was carried out using clustering, principal components, transformed divergence separability measure, and numerous supervised classifiers on data from Iowa and Illinois. A detailed spectral class analysis (multispectral classification) was carried out on data from the Des Moines, IA area to compare the information content of the MSS and TM for a large number of scene classes.
Kalgin, Igor V; Caflisch, Amedeo; Chekmarev, Sergei F; Karplus, Martin
2013-05-23
A new analysis of the 20 μs equilibrium folding/unfolding molecular dynamics simulations of the three-stranded antiparallel β-sheet miniprotein (beta3s) in implicit solvent is presented. The conformation space is reduced in dimensionality by introduction of linear combinations of hydrogen bond distances as the collective variables making use of a specially adapted principal component analysis (PCA); i.e., to make structured conformations more pronounced, only the formed bonds are included in determining the principal components. It is shown that a three-dimensional (3D) subspace gives a meaningful representation of the folding behavior. The first component, to which eight native hydrogen bonds make the major contribution (four in each beta hairpin), is found to play the role of the reaction coordinate for the overall folding process, while the second and third components distinguish the structured conformations. The representative points of the trajectory in the 3D space are grouped into conformational clusters that correspond to locally stable conformations of beta3s identified in earlier work. A simplified kinetic network based on the three components is constructed, and it is complemented by a hydrodynamic analysis. The latter, making use of "passive tracers" in 3D space, indicates that the folding flow is much more complex than suggested by the kinetic network. A 2D representation of streamlines shows there are vortices which correspond to repeated local rearrangement, not only around minima of the free energy surface but also in flat regions between minima. The vortices revealed by the hydrodynamic analysis are apparently not evident in folding pathways generated by transition-path sampling. Making use of the fact that the values of the collective hydrogen bond variables are linearly related to the Cartesian coordinate space, the RMSD between clusters is determined. 
Interestingly, the transition rates show an approximate exponential correlation with distance in the hydrogen bond subspace. Comparison with the many published studies shows good agreement with the present analysis for the parts that can be compared, supporting the robust character of our understanding of this "hydrogen atom" of protein folding.
ToF-SIMS observation for evaluating the interaction between amyloid β and lipid membranes.
Aoyagi, Satoka; Shimanouchi, Toshinori; Kawashima, Tomoko; Iwai, Hideo
2015-04-01
The adsorption behaviour of amyloid beta (Aβ), thought to be a key peptide for understanding Alzheimer's disease, was investigated by means of time-of-flight secondary ion mass spectrometry (ToF-SIMS). Aβ aggregates depending on the lipid membrane condition though it has not been fully understood yet. In this study, Aβ samples on different lipid membranes, 1,2-dipalmitoyl-sn-glycero-3-phosphocholine (DPPC), 1,2-dimyristoyl-sn-glycero-3-phosphocholine (DMPC) and 1,2-dioleoyl-sn-glycero-3-phosphocholine (DOPC), were observed with ToF-SIMS and the complex ToF-SIMS data of the Aβ samples was interpreted using data analysis techniques such as principal component analysis (PCA), gentle-SIMS (G-SIMS) and g-ogram. DOPC and DMPC are liquid crystal at room temperature, while DPPC is gel at room temperature. As primary ion beams, Bi3(+) and Ar cluster ion beams were used and the effect of an Ar cluster ion for evaluating biomolecules was also studied. The secondary ion images of the peptide fragment ions indicated by G-SIMS and g-ogram were consistent with the PCA results. It is suggested that Aβ is adsorbed homogeneously on the liquid-crystalline-phase lipid membranes, while it aggregates along the lipid on the gel-phase lipid membrane. Moreover, in the results using the Ar cluster, the influence of contamination was reduced.
Chemical indices and methods of multivariate statistics as a tool for odor classification.
Mahlke, Ingo T; Thiesen, Peter H; Niemeyer, Bernd
2007-04-01
Industrial and agricultural off-gas streams are comprised of numerous volatile compounds, many of which have substantially different odorous properties. State-of-the-art waste-gas treatment includes the characterization of these molecules and is directed at, if possible, either the avoidance of such odorants during processing or the use of existing standardized air purification techniques like bioscrubbing or afterburning, which, however, often show low efficiency in ecological and economic terms. Selective odor separation from the off-gas streams could ease many of these disadvantages but is not yet widely applicable. Thus, the aim of this paper is to identify possible model substances in selective odor separation research from 155 volatile molecules mainly originating from livestock facilities, fat refineries, and cocoa and coffee production by knowledge-based methods. All compounds are examined with regard to their structure and information-content using topological and information-theoretical indices. Resulting data are fitted in an observation matrix, and similarities between the substances are computed. Principal component analysis and k-means cluster analysis are conducted showing that clustering of indices data can depict odor information correlating well to molecular composition and molecular shape. Quantitative molecule description along with the application of such statistical means therefore provides a good classification tool of malodorant structure properties with no thermodynamic data needed. The approximate look-alike shape of odorous compounds within the clusters suggests a fair choice of possible model molecules.
Counties eliminating racial disparities in colorectal cancer mortality.
Rust, George; Zhang, Shun; Yu, Zhongyuan; Caplan, Lee; Jain, Sanjay; Ayer, Turgay; McRoy, Luceta; Levine, Robert S
2016-06-01
Although colorectal cancer (CRC) mortality rates are declining, racial-ethnic disparities in CRC mortality nationally are widening. Herein, the authors attempted to identify county-level variations in this pattern, and to characterize counties with improving disparity trends. The authors examined 20-year trends in US county-level black-white disparities in CRC age-adjusted mortality rates during the study period between 1989 and 2010. Using a mixed linear model, counties were grouped into mutually exclusive patterns of black-white racial disparity trends in age-adjusted CRC mortality across 20 three-year rolling average data points. County-level characteristics from census data and from the Area Health Resources File were normalized and entered into a principal component analysis. Multinomial logistic regression models were used to test the relation between these factors (clusters of related contextual variables) and the disparity trend pattern group for each county. Counties were grouped into 4 disparity trend pattern groups: 1) persistent disparity (parallel black and white trend lines); 2) diverging (widening disparity); 3) sustained equality; and 4) converging (moving from disparate outcomes toward equality). The initial principal component analysis clustered the 82 independent variables into a smaller number of components, 6 of which explained 47% of the county-level variation in disparity trend patterns. County-level variation in social determinants, health care workforce, and health systems all were found to contribute to variations in cancer mortality disparity trend patterns from 1990 through 2010. Counties sustaining equality over time or moving from disparities to equality in cancer mortality suggest that disparities are not inevitable, and provide hope that more communities can achieve optimal and equitable cancer outcomes for all. Cancer 2016;122:1735-48. © 2016 American Cancer Society.
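The "20 three-year rolling average data points" mentioned in this abstract are consistent with 22 annual values (1989 through 2010) smoothed with a window of three, since 22 - 3 + 1 = 20. A minimal sketch of that arithmetic follows. The annual rates are made-up placeholders, the function name rolling_mean is hypothetical, and labelling each window by its middle year is an assumption, as the abstract does not say how the windows were aligned.

```python
def rolling_mean(values, window=3):
    """Return the mean of every run of `window` consecutive values."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]

# Hypothetical annual mortality rates for one county, 1989-2010 (22 values).
years = list(range(1989, 2011))
rates = [30.0 - 0.5 * i for i in range(len(years))]

smoothed = rolling_mean(rates, window=3)
# Assumption: label each three-year window by its middle year.
window_centers = years[1:-1]

for year, value in zip(window_centers, smoothed):
    print(year, round(value, 2))
```

Smoothing of this kind trades temporal resolution for stability, which matters when annual county-level death counts are small.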
Russell, James A; Fjell, Chris; Hsu, Joseph L; Lee, Terry; Boyd, John; Thair, Simone; Singer, Joel; Patterson, Andrew J; Walley, Keith R
2013-08-01
Changes in plasma cytokine levels may predict mortality, and therapies (vasopressin versus norepinephrine) could change plasma cytokine levels in early septic shock. Our hypotheses were that changes in plasma cytokine levels over 24 hours differ between survivors and nonsurvivors, and that there are different effects of vasopressin and norepinephrine on plasma cytokine levels in septic shock. We studied 394 patients in a randomized, controlled trial of vasopressin versus norepinephrine in septic shock. We used hierarchical clustering and principal components analysis of the baseline cytokine concentrations to subgroup cytokines; we then compared survivors to nonsurvivors (28 d) and compared vasopressin- versus norepinephrine-induced changes in cytokine levels over 24 hours. A total of 39 plasma cytokines were measured at baseline and at 24 hours. Hierarchical clustering and principal components analysis grouped cytokines similarly. Survivors (versus nonsurvivors) had greater decreases of overall cytokine levels (P < 0.001). Vasopressin decreased overall 24-hour cytokine concentration compared with norepinephrine (P = 0.037). In less severe septic shock, the difference in plasma cytokine reduction over 24 hours between survivors and nonsurvivors was less pronounced than that seen in more severe septic shock. Furthermore, vasopressin decreased interferon-inducible protein 10 and granulocyte colony-stimulating factor more than did norepinephrine in less severe septic shock, whereas vasopressin decreased granulocyte-macrophage colony-stimulating factor in patients who had more severe shock. Survivors of septic shock had greater decreases of cytokines, chemokines and growth factors in early septic shock. Vasopressin decreased 24-hour plasma cytokine levels more than did norepinephrine. The vasopressin-associated decrease of cytokines differed according to severity of shock. Clinical trial registered with www.controlled-trials.com (ISRCTN94845869).
Zhang, Hong-Guang; Yang, Qin-Min; Lu, Jian-Gang
2014-04-01
In this paper, a novel discriminant methodology based on near infrared spectroscopic analysis and least square support vector machine was proposed for rapid and nondestructive discrimination of different types of Polyacrylamide. The diffuse reflectance spectra of samples of Non-ionic Polyacrylamide, Anionic Polyacrylamide and Cationic Polyacrylamide were measured. Principal component analysis was then applied to reduce the dimension of the spectral data and extract the principal components. The first three principal components were used for cluster analysis of the three different types of Polyacrylamide. Those principal components were also used as inputs of the least square support vector machine model. The optimization of the parameters and the number of principal components used as inputs of the least square support vector machine model was performed through cross validation based on grid search. 60 samples of each type of Polyacrylamide were collected, giving a total of 180 samples. 135 samples, 45 for each type of Polyacrylamide, were randomly selected as a training set to build the calibration model, and the remaining 45 samples were used as a test set to evaluate the performance of the developed model. In addition, 5 Cationic Polyacrylamide samples and 5 Anionic Polyacrylamide samples adulterated with different proportions of Non-ionic Polyacrylamide were also prepared to show the feasibility of the proposed method to discriminate the adulterated Polyacrylamide samples. The prediction error threshold for each type of Polyacrylamide was determined by an F statistical significance test based on the prediction error of the training set of the corresponding type of Polyacrylamide in cross validation. The discrimination accuracy of the built model was 100% for prediction of the test set. The prediction of the model for the 10 mixed samples was also presented, and all mixed samples were accurately discriminated as adulterated samples. 
The overall results demonstrate that the discrimination method proposed in the present paper can rapidly and nondestructively discriminate the different types of Polyacrylamide and the adulterated Polyacrylamide samples, and offers a new approach to discriminating the types of Polyacrylamide.
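The sample split described in this abstract (180 samples, 60 per type, with 45 per type drawn at random for training and the remaining 15 per type held out) can be sketched as a stratified random split. This is an illustrative sketch only: the function name stratified_split, the class labels, and the fixed seed are assumptions, and the abstract does not state how the authors performed the random selection.

```python
import random

def stratified_split(labels, n_train_per_class, seed=0):
    """Randomly pick n_train_per_class indices per class; the rest form the test set."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    train, test = [], []
    for label, indices in by_class.items():
        if n_train_per_class > len(indices):
            raise ValueError(f"class {label!r} has only {len(indices)} samples")
        shuffled = indices[:]  # copy so the caller's grouping is left untouched
        rng.shuffle(shuffled)
        train.extend(shuffled[:n_train_per_class])
        test.extend(shuffled[n_train_per_class:])
    return sorted(train), sorted(test)

# 60 samples of each Polyacrylamide type, as stated in the abstract.
labels = ["nonionic"] * 60 + ["anionic"] * 60 + ["cationic"] * 60
train_idx, test_idx = stratified_split(labels, n_train_per_class=45)

print(len(train_idx), "training samples,", len(test_idx), "test samples")
```

Splitting per class keeps the three types equally represented in both sets, which a plain random split of all 180 samples would not guarantee.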
Baudry, Julia; Touvier, Mathilde; Allès, Benjamin; Péneau, Sandrine; Méjean, Caroline; Galan, Pilar; Hercberg, Serge; Lairon, Denis; Kesse-Guyot, Emmanuelle
2016-08-01
Limited information is available on large-scale populations regarding the socio-demographic and nutrient profiles and eating behaviour of consumers, taking into account both organic and conventional foods. The aims of this study were to draw up a typology of consumers according to their eating habits, based both on their dietary patterns and the mode of food production, and to outline their socio-demographic, behavioural and nutritional characteristics. Data were collected from 28 245 participants of the NutriNet-Santé study. Dietary information was obtained using a 264-item, semi-quantitative, organic FFQ. To identify clusters of consumers, principal component analysis was applied on sixteen conventional and sixteen organic food groups followed by a clustering procedure. The following five clusters of consumers were identified: (1) a cluster characterised by low energy intake, low consumption of organic food and high prevalence of inadequate nutrient intakes; (2) a cluster of big eaters of conventional foods with high intakes of SFA and cholesterol; (3) a cluster with high consumption of organic food and relatively adequate nutritional diet quality; (4) a group with a high percentage of organic food consumers, 14 % of which were either vegetarians or vegans, who exhibited a high nutritional diet quality and a low prevalence of inadequate intakes of most vitamins except B12; and (5) a group of moderate organic food consumers with a particularly high intake of proteins and alcohol and a poor nutritional diet quality. These findings may have implications for future aetiological studies investigating the potential impact of organic food consumption.
Rosas-Castor, J M; Guzmán-Mar, J L; Alfaro-Barbosa, J M; Hernández-Ramírez, A; Pérez-Maldonado, I N; Caballero-Quintero, A; Hinojosa-Reyes, L
2014-11-01
The presence of arsenic (As) in agricultural food products is a matter of concern because it can cause adverse health effects at low concentrations. Agricultural-product intake constitutes a principal source for As exposure in humans. In this study, the contribution of the chemical-soil parameters in As accumulation and translocation in the maize crop from a mining area of San Luis Potosi was evaluated. The total arsenic concentration and arsenic speciation were determined by HG-AFS and IC-HG-AFS, respectively. The data analysis was conducted by cluster analysis (CA) and principal component analysis (PCA). The soil pH presented a negative correlation with the accumulated As in each maize plant part, and parameters such as iron (Fe) and manganese (Mn) presented a higher correlation with the As translocation in maize. Thus, the metabolic stress in maize may induce organic acid exudation leading to a higher As bioavailability. A high As inorganic/organic ratio in edible maize plant tissues suggests a substantial risk of poisoning by this metalloid. Careful attention to the chemical changes in the rhizosphere of the agricultural zones that can affect As transfer through the food chain could reduce the As-intoxication risk of maize consumers. Copyright © 2014 Elsevier B.V. All rights reserved.
Fadil, Mouhcine; Farah, Abdellah; Ihssane, Bouchaib; Haloui, Taoufik; Lebrazi, Sara; Zghari, Badreddine; Rachiq, Saâd
2016-01-01
To investigate the effect of environmental factors such as light and shade on essential oil yield and morphological traits of Moroccan Myrtus communis, a chemometric study was conducted on 20 individuals growing under two contrasting light environments. The study of the individuals' parameters by principal component analysis has shown that essential oil yield, altitude, and leaf thickness were positively correlated with one another and negatively correlated with plant height, leaf length and leaf width. Principal component analysis and hierarchical cluster analysis have also shown that the individuals of each sampling site were grouped separately. The one-way ANOVA test has confirmed the effect of light and shade on essential oil yield and morphological parameters by showing a statistically significant difference between them from the shaded side to the sunny one. Finally, the multiple linear model containing main, interaction and quadratic terms was chosen for the modeling of essential oil yield in terms of morphological parameters. Sun plants are shorter and have shorter, narrower leaves, but their leaves are thicker and richer in essential oil than those of shade plants, which showed almost the opposite. The highlighted multiple linear model can be used to predict essential oil yield in the studied area.
Craters on Earth, Moon, and Mars: Multivariate classification and mode of origin
Pike, R.J.
1974-01-01
Testing extraterrestrial craters and candidate terrestrial analogs for morphologic similitude is treated as a problem in numerical taxonomy. According to a principal-components solution and a cluster analysis, 402 representative craters on the Earth, the Moon, and Mars divide into two major classes of contrasting shapes and modes of origin. Craters of net accumulation of material (cratered lunar domes, Martian "calderas," and all terrestrial volcanoes except maars and tuff rings) group apart from craters of excavation (terrestrial meteorite impact and experimental explosion craters, typical Martian craters, and all other lunar craters). Maars and tuff rings belong to neither group but are transitional. The classification criteria are four independent attributes of topographic geometry derived from seven descriptive variables by the principal-components transformation. Morphometric differences between crater bowl and raised rim constitute the strongest of the four components. Although single topographic variables cannot confidently predict the genesis of individual extraterrestrial craters, multivariate statistical models constructed from several variables can distinguish consistently between large impact craters and volcanoes. © 1974.