Sample records for multivariate cluster analysis

  1. Principal Cluster Axes: A Projection Pursuit Index for the Preservation of Cluster Structures in the Presence of Data Reduction

    ERIC Educational Resources Information Center

    Steinley, Douglas; Brusco, Michael J.; Henson, Robert

    2012-01-01

    A measure of "clusterability" serves as the basis of a new methodology designed to preserve cluster structure in a reduced dimensional space. Similar to principal component analysis, which finds the direction of maximal variance in multivariate space, principal cluster axes find the direction of maximum clusterability in multivariate space.…

  2. Multivariate statistical analysis: Principles and applications to coorbital streams of meteorite falls

    NASA Technical Reports Server (NTRS)

    Wolf, S. F.; Lipschutz, M. E.

    1993-01-01

    Multivariate statistical analysis techniques (linear discriminant analysis and logistic regression) can provide powerful discrimination tools which are generally unfamiliar to the planetary science community. Fall parameters were used to identify a group of 17 H chondrites (Cluster 1) that were part of a coorbital stream which intersected Earth's orbit in May, from 1855 - 1895, and can be distinguished from all other H chondrite falls. Using multivariate statistical techniques, it was demonstrated that by a totally different criterion, labile trace element contents - hence thermal histories - of 13 Cluster 1 meteorites are distinguishable from those of 45 non-Cluster 1 H chondrites. Here, we focus upon the principles of multivariate statistical techniques and illustrate their application using non-meteoritic and meteoritic examples.

  3. Development of Pattern Recognition Techniques for the Evaluation of Toxicant Impacts to Multispecies Systems

    DTIC Science & Technology

    1993-06-18

    ... rule rather than the exception. In the Standardized Aquatic Microcosm and the Mixed Flask Culture (MFC) microcosms, multivariate analysis and clustering methods ... experiments using two microcosm protocols. We use nonmetric clustering, a multivariate pattern recognition technique developed by Matthews and Hearne (1991 ...

  4. Mapping Informative Clusters in a Hierarchial Framework of fMRI Multivariate Analysis

    PubMed Central

    Xu, Rui; Zhen, Zonglei; Liu, Jia

    2010-01-01

    Pattern recognition methods, which are powerful in discriminating between multi-voxel patterns of brain activity associated with different mental states, have become increasingly popular in fMRI data analysis. However, when they are used in functional brain mapping, the location of discriminative voxels varies significantly, raising difficulties in interpreting the locus of the effect. Here we proposed a hierarchical multivariate framework that maps informative clusters rather than voxels to achieve reliable functional brain mapping without compromising the discriminative power. In particular, we first searched for local homogeneous clusters that consisted of voxels with similar response profiles. Then, a multi-voxel classifier was built for each cluster to extract discriminative information from the multi-voxel patterns. Finally, through multivariate ranking, outputs from the classifiers served as a multi-cluster pattern to identify informative clusters by examining interactions among clusters. Results from both simulated and real fMRI data demonstrated that this hierarchical approach yielded more robust functional brain mapping than traditional voxel-based multivariate methods. In addition, the mapped clusters overlapped strongly for two perceptually equivalent object categories, further confirming the validity of our approach. In short, the hierarchical multivariate framework is suitable for both pattern classification and brain mapping in fMRI studies. PMID:21152081

  5. FACTOR ANALYTIC MODELS OF CLUSTERED MULTIVARIATE DATA WITH INFORMATIVE CENSORING

    EPA Science Inventory

    This paper describes a general class of factor analytic models for the analysis of clustered multivariate data in the presence of informative missingness. We assume that there are distinct sets of cluster-level latent variables related to the primary outcomes and to the censorin...

  6. Identification of complex metabolic states in critically injured patients using bioinformatic cluster analysis.

    PubMed

    Cohen, Mitchell J; Grossman, Adam D; Morabito, Diane; Knudson, M Margaret; Butte, Atul J; Manley, Geoffrey T

    2010-01-01

    Advances in technology have made extensive monitoring of patient physiology the standard of care in intensive care units (ICUs). While many systems exist to compile these data, there has been no systematic multivariate analysis and categorization across patient physiological data. The sheer volume and complexity of these data make pattern recognition or identification of patient state difficult. Hierarchical cluster analysis allows visualization of high dimensional data and enables pattern recognition and identification of physiologic patient states. We hypothesized that processing of multivariate data using hierarchical clustering techniques would allow identification of otherwise hidden patient physiologic patterns that would be predictive of outcome. Multivariate physiologic and ventilator data were collected continuously using a multimodal bioinformatics system in the surgical ICU at San Francisco General Hospital. These data were incorporated with non-continuous data and stored on a server in the ICU. A hierarchical clustering algorithm grouped each minute of data into 1 of 10 clusters. Clusters were correlated with outcome measures including incidence of infection, multiple organ failure (MOF), and mortality. We identified 10 clusters, which we defined as distinct patient states. While patients transitioned between states, they spent significant amounts of time in each. Clusters were enriched for our outcome measures: 2 of the 10 states were enriched for infection, 6 of 10 were enriched for MOF, and 3 of 10 were enriched for death. Further analysis of correlations between pairs of variables within each cluster reveals significant differences in physiology between clusters. Here we show for the first time the feasibility of clustering physiological measurements to identify clinically relevant patient states after trauma. 
    These results demonstrate that hierarchical clustering techniques can be useful for visualizing complex multivariate data and may provide new insights for the care of critically injured patients.

  7. Application of multivariable statistical techniques in plant-wide WWTP control strategies analysis.

    PubMed

    Flores, X; Comas, J; Roda, I R; Jiménez, L; Gernaey, K V

    2007-01-01

    The main objective of this paper is to present the application of selected multivariable statistical techniques in plant-wide wastewater treatment plant (WWTP) control strategies analysis. In this study, cluster analysis (CA), principal component analysis/factor analysis (PCA/FA) and discriminant analysis (DA) are applied to the evaluation matrix data set obtained by simulation of several control strategies applied to the plant-wide IWA Benchmark Simulation Model No 2 (BSM2). These techniques make it possible i) to determine natural groups or clusters of control strategies with a similar behaviour, ii) to find and interpret hidden, complex and causal relation features in the data set and iii) to identify important discriminant variables within the groups found by the cluster analysis. This study illustrates the usefulness of multivariable statistical techniques for both analysis and interpretation of complex multicriteria data sets and allows an improved use of information for effective evaluation of control strategies.

  8. Identification of Reliable Components in Multivariate Curve Resolution-Alternating Least Squares (MCR-ALS): a Data-Driven Approach across Metabolic Processes.

    PubMed

    Motegi, Hiromi; Tsuboi, Yuuri; Saga, Ayako; Kagami, Tomoko; Inoue, Maki; Toki, Hideaki; Minowa, Osamu; Noda, Tetsuo; Kikuchi, Jun

    2015-11-04

    There is an increasing need to use multivariate statistical methods for understanding biological functions, identifying the mechanisms of diseases, and exploring biomarkers. In addition to classical analyses such as hierarchical cluster analysis, principal component analysis, and partial least squares discriminant analysis, various multivariate strategies, including independent component analysis, non-negative matrix factorization, and multivariate curve resolution, have recently been proposed. However, determining the number of components is problematic. Despite the proposal of several different methods, no satisfactory approach has yet been reported. To resolve this problem, we implemented a new idea: classifying a component as "reliable" or "unreliable" based on the reproducibility of its appearance, regardless of the number of components in the calculation. Using the clustering method for classification, we applied this idea to multivariate curve resolution-alternating least squares (MCR-ALS). Comparisons between conventional and modified methods applied to proton nuclear magnetic resonance ((1)H-NMR) spectral datasets derived from known standard mixtures and biological mixtures (urine and feces of mice) revealed that more plausible results are obtained by the modified method. In particular, clusters containing little information were detected with reliability. This strategy, named "cluster-aided MCR-ALS," will facilitate the attainment of more reliable results in the metabolomics datasets.

  9. Water quality analysis of the Rapur area, Andhra Pradesh, South India using multivariate techniques

    NASA Astrophysics Data System (ADS)

    Nagaraju, A.; Sreedhar, Y.; Thejaswi, A.; Sayadi, Mohammad Hossein

    2017-10-01

    Groundwater samples from the Rapur area were collected from different sites to evaluate the major ion chemistry. The large volume of data can lead to difficulties in the integration, interpretation, and representation of the results. Two multivariate statistical methods, hierarchical cluster analysis (HCA) and factor analysis (FA), were applied to evaluate their usefulness to classify and identify geochemical processes controlling groundwater geochemistry. Four statistically significant clusters were obtained from 30 sampling stations. This has resulted in two important clusters, viz. cluster 1 (pH, Si, CO3, Mg, SO4, Ca, K, HCO3, alkalinity, Na, Na + K, Cl, and hardness) and cluster 2 (EC and TDS), which are released to the study area from different sources. The application of different multivariate statistical techniques, such as principal component analysis (PCA), assists in the interpretation of complex data matrices for a better understanding of water quality of a study area. From PCA, it is clear that the first factor (factor 1), which accounted for 36.2% of the total variance, had high positive loadings on EC, Mg, Cl, TDS, and hardness. Based on the PCA scores, four significant cluster groups of sampling locations were detected on the basis of similarity of their water quality.
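
The "36.2% of the total variance" figure in the record above is the usual proportion-of-variance statistic for a principal component. As a hedged reminder of how such a number is normally obtained (the symbols below are illustrative assumptions, not notation from the paper): if the PCA is run on the correlation matrix of p standardized variables with eigenvalues \lambda_1 \ge \dots \ge \lambda_p, then

```latex
\[
  \text{proportion explained by factor } k
  \;=\; \frac{\lambda_k}{\sum_{j=1}^{p} \lambda_j}
  \;=\; \frac{\lambda_k}{p},
  \qquad\text{so}\qquad
  \frac{\lambda_1}{p} = 0.362
  \;\Longleftrightarrow\;
  \lambda_1 = 0.362\,p .
\]
```

The second equality holds only for standardized variables, where the eigenvalues sum to p. The abstract does not state how many variables entered the PCA, so p is left symbolic here.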

  10. Multivariate Cluster Analysis.

    ERIC Educational Resources Information Center

    McRae, Douglas J.

    Procedures for grouping students into homogeneous subsets have long interested educational researchers. The research reported in this paper is an investigation of a set of objective grouping procedures based on multivariate analysis considerations. Four multivariate functions that might serve as criteria for adequate grouping are given and…

  11. Simultaneous Two-Way Clustering of Multiple Correspondence Analysis

    ERIC Educational Resources Information Center

    Hwang, Heungsun; Dillon, William R.

    2010-01-01

    A 2-way clustering approach to multiple correspondence analysis is proposed to account for cluster-level heterogeneity of both respondents and variable categories in multivariate categorical data. Specifically, in the proposed method, multiple correspondence analysis is combined with k-means in a unified framework in which k-means is…

  12. Applications of modern statistical methods to analysis of data in physical science

    NASA Astrophysics Data System (ADS)

    Wicker, James Eric

    Modern methods of statistical and computational analysis offer solutions to dilemmas confronting researchers in physical science. Although the ideas behind modern statistical and computational analysis methods were originally introduced in the 1970's, most scientists still rely on methods written during the early era of computing. These researchers, who analyze increasingly voluminous and multivariate data sets, need modern analysis methods to extract the best results from their studies. The first section of this work showcases applications of modern linear regression. Since the 1960's, many researchers in spectroscopy have used classical stepwise regression techniques to derive molecular constants. However, problems with thresholds of entry and exit for model variables plague this analysis method. Other criticisms of this kind of stepwise procedure include its inefficient searching method, the order in which variables enter or leave the model and problems with overfitting data. We implement an information scoring technique that overcomes the assumptions inherent in the stepwise regression process to calculate molecular model parameters. We believe that this kind of information based model evaluation can be applied to more general analysis situations in physical science. The second section proposes new methods of multivariate cluster analysis. The K-means algorithm and the EM algorithm, introduced in the 1960's and 1970's respectively, formed the basis of multivariate cluster analysis methodology for many years. However, shortcomings of these methods include strong dependence on initial seed values and inaccurate results when the data seriously depart from hypersphericity. We propose new cluster analysis methods based on genetic algorithms that overcome the strong dependence on initial seed values. In addition, we propose a generalization of the Genetic K-means algorithm which can accurately identify clusters with complex hyperellipsoidal covariance structures. We then use this new algorithm in a genetic algorithm based Expectation-Maximization process that can accurately calculate parameters describing complex clusters in a mixture model routine. Using the accuracy of this GEM algorithm, we assign information scores to cluster calculations in order to best identify the number of mixture components in a multivariate data set. We will showcase how these algorithms can be used to process multivariate data from astronomical observations.
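
The seed dependence of K-means mentioned in the record above can be illustrated with a minimal sketch. This is plain Lloyd's K-means with simple random restarts, not the genetic algorithm the thesis proposes; the synthetic data, the function names (`sq_dist`, `mean_point`, `kmeans`) and the choice of ten seeds are all assumptions made for illustration.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean_point(group):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(axis) / len(group) for axis in zip(*group))

def kmeans(points, k, seed, max_iter=100):
    """Lloyd's K-means; returns (centroids, within-cluster sum of squares)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)          # seed values drawn from the data
    for _ in range(max_iter):
        groups = [[] for _ in range(k)]
        for p in points:                       # assignment step
            nearest = min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            groups[nearest].append(p)
        updated = [mean_point(g) if g else centroids[j]   # update step
                   for j, g in enumerate(groups)]
        if updated == centroids:               # converged
            break
        centroids = updated
    wcss = sum(min(sq_dist(p, c) for c in centroids) for p in points)
    return centroids, wcss

# Synthetic data (an assumption for illustration): three 2-D Gaussian blobs.
data_rng = random.Random(0)
points = [(data_rng.gauss(cx, 0.5), data_rng.gauss(cy, 0.5))
          for cx, cy in [(0, 0), (5, 0), (0, 5)] for _ in range(30)]

# Same data, different seeds: the final objective can differ between runs.
scores = [kmeans(points, 3, seed)[1] for seed in range(10)]
best_score = min(scores)
print(f"best {best_score:.2f}, worst {max(scores):.2f}")
```

Keeping the best of several restarts is the simplest workaround for seed sensitivity. A genetic search, as described in the abstract, explores the space of initial seeds more systematically.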

  13. Cardiovascular reactivity patterns and pathways to hypertension: a multivariate cluster analysis.

    PubMed

    Brindle, R C; Ginty, A T; Jones, A; Phillips, A C; Roseboom, T J; Carroll, D; Painter, R C; de Rooij, S R

    2016-12-01

    Substantial evidence links exaggerated mental stress induced blood pressure reactivity to future hypertension, but the results for heart rate reactivity are less clear. For this reason multivariate cluster analysis was carried out to examine the relationship between heart rate and blood pressure reactivity patterns and hypertension in a large prospective cohort (age range 55-60 years). Four clusters emerged with statistically different systolic and diastolic blood pressure and heart rate reactivity patterns. Cluster 1 was characterised by a relatively exaggerated blood pressure and heart rate response while the blood pressure and heart rate responses of cluster 2 were relatively modest and in line with the sample mean. Cluster 3 was characterised by blunted cardiovascular stress reactivity across all variables and cluster 4 by an exaggerated blood pressure response and modest heart rate response. Membership in cluster 4 conferred an increased risk of hypertension at 5-year follow-up (hazard ratio=2.98 (95% CI: 1.50-5.90), P<0.01) that survived adjustment for a host of potential confounding variables. These results suggest that cardiac reactivity plays a potentially important role in the link between blood pressure reactivity and hypertension and support the use of multivariate approaches to stress psychophysiology.
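
The hazard ratio, confidence interval and P value quoted in the record above can be checked against each other with a short calculation. The sketch assumes the interval is a standard Wald interval, symmetric on the log scale with z = 1.96. That assumption is not stated in the abstract, and the variable names are illustrative.

```python
import math

hr, lower, upper = 2.98, 1.50, 5.90  # values reported in the abstract

# A Wald interval is symmetric on the log scale, so the midpoint of the
# log-limits should land close to the reported point estimate.
hr_from_ci = math.exp((math.log(lower) + math.log(upper)) / 2)

# Standard error of log(HR) implied by a 95% interval (assumes z = 1.96).
se_log_hr = (math.log(upper) - math.log(lower)) / (2 * 1.96)

# Two-sided p-value from the normal approximation.
z = math.log(hr) / se_log_hr
p_value = math.erfc(abs(z) / math.sqrt(2))

print(f"HR recovered from CI: {hr_from_ci:.2f}")
print(f"z = {z:.2f}, two-sided p = {p_value:.4f}")
```

Under these assumptions the recovered estimate agrees with the reported hazard ratio to within rounding, and the implied p-value falls below the reported 0.01 threshold.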

  14. Multivariate time series clustering on geophysical data recorded at Mt. Etna from 1996 to 2003

    NASA Astrophysics Data System (ADS)

    Di Salvo, Roberto; Montalto, Placido; Nunnari, Giuseppe; Neri, Marco; Puglisi, Giuseppe

    2013-02-01

    Time series clustering is an important data analysis task for extracting implicit, previously unknown, and potentially useful information from a large collection of data. Finding useful similar trends in multivariate time series represents a challenge in several areas including geophysics environment research. While traditional time series analysis methods deal only with univariate time series, multivariate time series analysis is a more suitable approach in fields of research where different kinds of data are available. Moreover, conventional time series clustering techniques do not provide the desired results for geophysical datasets because of the huge amount of data, whose sampling rate differs according to the nature of the signal. In this paper, a novel approach concerning geophysical multivariate time series clustering is proposed using dynamic time series segmentation and Self Organizing Maps techniques. This method makes it possible to find coupling among trends of different geophysical data recorded from monitoring networks at Mt. Etna spanning from 1996 to 2003, when the transition from summit eruptions to flank eruptions occurred. This information can be used to carry out a more careful evaluation of the state of the volcano and to define potential hazard assessment at Mt. Etna.

  15. A nonparametric clustering technique which estimates the number of clusters

    NASA Technical Reports Server (NTRS)

    Ramey, D. B.

    1983-01-01

    In applications of cluster analysis, one usually needs to determine the number of clusters, K, and the assignment of observations to each cluster. A clustering technique based on recursive application of a multivariate test of bimodality which automatically estimates both K and the cluster assignments is presented.

  16. Groundwater source contamination mechanisms: Physicochemical profile clustering, risk factor analysis and multivariate modelling

    NASA Astrophysics Data System (ADS)

    Hynds, Paul; Misstear, Bruce D.; Gill, Laurence W.; Murphy, Heather M.

    2014-04-01

    An integrated domestic well sampling and "susceptibility assessment" programme was undertaken in the Republic of Ireland from April 2008 to November 2010. Overall, 211 domestic wells were sampled, assessed and collated with local climate data. Based upon groundwater physicochemical profile, three clusters have been identified and characterised by source type (borehole or hand-dug well) and local geological setting. Statistical analysis indicates that cluster membership is significantly associated with the prevalence of bacteria (p = 0.001), with mean Escherichia coli presence within clusters ranging from 15.4% (Cluster-1) to 47.6% (Cluster-3). Bivariate risk factor analysis shows that on-site septic tank presence was the only risk factor significantly associated (p < 0.05) with bacterial presence within all clusters. Point agriculture adjacency was significantly associated with both borehole-related clusters. Well design criteria were associated with hand-dug wells and boreholes in areas characterised by high permeability subsoils, while local geological setting was significant for hand-dug wells and boreholes in areas dominated by low/moderate permeability subsoils. Multivariate susceptibility models were developed for all clusters, with predictive accuracies of 84% (Cluster-1) to 91% (Cluster-2) achieved. Septic tank setback was a common variable within all multivariate models, while agricultural sources were also significant, albeit to a lesser degree. Furthermore, well liner clearance was a significant factor in all models, indicating that direct surface ingress is a significant well contamination mechanism. Identification and elucidation of cluster-specific contamination mechanisms may be used to develop improved overall risk management and wellhead protection strategies, while also informing future remediation and maintenance efforts.

  17. Bayesian multivariate hierarchical transformation models for ROC analysis.

    PubMed

    O'Malley, A James; Zou, Kelly H

    2006-02-15

    A Bayesian multivariate hierarchical transformation model (BMHTM) is developed for receiver operating characteristic (ROC) curve analysis based on clustered continuous diagnostic outcome data with covariates. Two special features of this model are that it incorporates non-linear monotone transformations of the outcomes and that multiple correlated outcomes may be analysed. The mean, variance, and transformation components are all modelled parametrically, enabling a wide range of inferences. The general framework is illustrated by focusing on two problems: (1) analysis of the diagnostic accuracy of a covariate-dependent univariate test outcome requiring a Box-Cox transformation within each cluster to map the test outcomes to a common family of distributions; (2) development of an optimal composite diagnostic test using multivariate clustered outcome data. In the second problem, the composite test is estimated using discriminant function analysis and compared to the test derived from logistic regression analysis where the gold standard is a binary outcome. The proposed methodology is illustrated on prostate cancer biopsy data from a multi-centre clinical trial.

  18. Bayesian multivariate hierarchical transformation models for ROC analysis

    PubMed Central

    O'Malley, A. James; Zou, Kelly H.

    2006-01-01

    A Bayesian multivariate hierarchical transformation model (BMHTM) is developed for receiver operating characteristic (ROC) curve analysis based on clustered continuous diagnostic outcome data with covariates. Two special features of this model are that it incorporates non-linear monotone transformations of the outcomes and that multiple correlated outcomes may be analysed. The mean, variance, and transformation components are all modelled parametrically, enabling a wide range of inferences. The general framework is illustrated by focusing on two problems: (1) analysis of the diagnostic accuracy of a covariate-dependent univariate test outcome requiring a Box–Cox transformation within each cluster to map the test outcomes to a common family of distributions; (2) development of an optimal composite diagnostic test using multivariate clustered outcome data. In the second problem, the composite test is estimated using discriminant function analysis and compared to the test derived from logistic regression analysis where the gold standard is a binary outcome. The proposed methodology is illustrated on prostate cancer biopsy data from a multi-centre clinical trial. PMID:16217836

  19. Using Interactive Graphics to Teach Multivariate Data Analysis to Psychology Students

    ERIC Educational Resources Information Center

    Valero-Mora, Pedro M.; Ledesma, Ruben D.

    2011-01-01

    This paper discusses the use of interactive graphics to teach multivariate data analysis to Psychology students. Three techniques are explored through separate activities: parallel coordinates/boxplots; principal components/exploratory factor analysis; and cluster analysis. With interactive graphics, students may perform important parts of the…

  20. Resemblance profiles as clustering decision criteria: Estimating statistical power, error, and correspondence for a hypothesis test for multivariate structure.

    PubMed

    Kilborn, Joshua P; Jones, David L; Peebles, Ernst B; Naar, David F

    2017-04-01

    Clustering data continues to be a highly active area of data analysis, and resemblance profiles are being incorporated into ecological methodologies as a hypothesis testing-based approach to clustering multivariate data. However, these new clustering techniques have not been rigorously tested to determine the performance variability based on the algorithm's assumptions or any underlying data structures. Here, we use simulation studies to estimate the statistical error rates for the hypothesis test for multivariate structure based on dissimilarity profiles (DISPROF). We concurrently tested a widely used algorithm that employs the unweighted pair group method with arithmetic mean (UPGMA) to estimate the proficiency of clustering with DISPROF as a decision criterion. We simulated unstructured multivariate data from different probability distributions with increasing numbers of objects and descriptors, and grouped data with increasing overlap, overdispersion for ecological data, and correlation among descriptors within groups. Using simulated data, we measured the resolution and correspondence of clustering solutions achieved by DISPROF with UPGMA against the reference grouping partitions used to simulate the structured test datasets. Our results highlight the dynamic interactions between dataset dimensionality, group overlap, and the properties of the descriptors within a group (i.e., overdispersion or correlation structure) that are relevant to resemblance profiles as a clustering criterion for multivariate data. These methods are particularly useful for multivariate ecological datasets that benefit from distance-based statistical analyses. We propose guidelines for using DISPROF as a clustering decision tool that will help future users avoid potential pitfalls during the application of methods and the interpretation of results.
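
For readers unfamiliar with UPGMA, the agglomeration step named in the record above can be sketched in a few lines. This covers only the average-linkage merging. The DISPROF permutation test that decides whether a group should be split is not reproduced here. The function name `upgma`, the input format and the four-object dissimilarity table are assumptions made for illustration.

```python
def upgma(labels, dist):
    """Agglomerate with UPGMA (average linkage).

    dist maps (a, b) label pairs to dissimilarities. Returns the merges
    in order as (cluster_a, cluster_b, height) tuples.
    """
    clusters = {(name,): 1 for name in labels}            # cluster -> size
    d = {frozenset({(a,), (b,)}): v for (a, b), v in dist.items()}
    merges = []
    while len(clusters) > 1:
        pair = min(d, key=d.get)                          # closest two clusters
        a, b = sorted(pair)
        height = d.pop(pair)
        na, nb = clusters.pop(a), clusters.pop(b)
        merged = tuple(sorted(a + b))
        # Average linkage: size-weighted mean of the two old distances.
        for c in clusters:
            d[frozenset({merged, c})] = (
                na * d.pop(frozenset({a, c})) + nb * d.pop(frozenset({b, c}))
            ) / (na + nb)
        clusters[merged] = na + nb
        merges.append((a, b, height))
    return merges

# Hypothetical dissimilarities among four objects.
dissimilarities = {
    ("A", "B"): 2.0, ("A", "C"): 5.0, ("B", "C"): 7.0,
    ("A", "D"): 9.0, ("B", "D"): 11.0, ("C", "D"): 10.0,
}
merges = upgma(["A", "B", "C", "D"], dissimilarities)
for a, b, height in merges:
    print(a, b, height)
```

In a DISPROF-style workflow, a significance test would be applied at each node of the resulting tree to decide whether the two branches represent distinct groups. Here the tree is simply built to completion.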

  1. The Potential of Multivariate Analysis in Assessing Students' Attitude to Curriculum Subjects

    ERIC Educational Resources Information Center

    Gaotlhobogwe, Michael; Laugharne, Janet; Durance, Isabelle

    2011-01-01

    Background: Understanding student attitudes to curriculum subjects is central to providing evidence-based options to policy makers in education. Purpose: We illustrate how quantitative approaches used in the social sciences and based on multivariate analysis (categorical Principal Components Analysis, Clustering Analysis and General Linear…

  2. An enhanced cluster analysis program with bootstrap significance testing for ecological community analysis

    USGS Publications Warehouse

    McKenna, J.E.

    2003-01-01

    The biosphere is filled with complex living patterns and important questions about biodiversity and community and ecosystem ecology are concerned with structure and function of multispecies systems that are responsible for those patterns. Cluster analysis identifies discrete groups within multivariate data and is an effective method of coping with these complexities, but often suffers from subjective identification of groups. The bootstrap testing method greatly improves objective significance determination for cluster analysis. The BOOTCLUS program makes cluster analysis that reliably identifies real patterns within a data set more accessible and easier to use than previously available programs. A variety of analysis options and rapid re-analysis provide a means to quickly evaluate several aspects of a data set. Interpretation is influenced by sampling design and a priori designation of samples into replicate groups, and ultimately relies on the researcher's knowledge of the organisms and their environment. However, the BOOTCLUS program provides reliable, objectively determined groupings of multivariate data.

  3. Using sperm morphometry and multivariate analysis to differentiate species of gray Mazama

    PubMed Central

    Duarte, José Maurício Barbanti

    2016-01-01

    There is genetic evidence that the two species of Brazilian gray Mazama, Mazama gouazoubira and Mazama nemorivaga, belong to different genera. This study identified significant differences that separated them into distinct groups, based on characteristics of the spermatozoa and ejaculate of both species. The characteristics that most clearly differentiated between the species were ejaculate colour, white for M. gouazoubira and reddish for M. nemorivaga, and sperm head dimensions. Multivariate analysis of sperm head dimension and format data accurately discriminated three groups for species, with a total misclassification percentage of 0.71. The individual analysis, by animal, and the multivariate analysis also correctly discriminated all five animals (total misclassification percentage of 13.95%), and the canonical plot showed three different clusters: Cluster 1, including individuals of M. nemorivaga; Cluster 2, including two individuals of M. gouazoubira; and Cluster 3, including a single individual of M. gouazoubira. The results obtained in this work corroborate the hypothesis of the formation of new genera and species for gray Mazama. Moreover, the easily applied method described herein can be used as an auxiliary tool to identify sibling species of other taxonomic groups. PMID:28018612

  4. Finding Groups Using Model-Based Cluster Analysis: Heterogeneous Emotional Self-Regulatory Processes and Heavy Alcohol Use Risk

    ERIC Educational Resources Information Center

    Mun, Eun Young; von Eye, Alexander; Bates, Marsha E.; Vaschillo, Evgeny G.

    2008-01-01

    Model-based cluster analysis is a new clustering procedure to investigate population heterogeneity utilizing finite mixture multivariate normal densities. It is an inferentially based, statistically principled procedure that allows comparison of nonnested models using the Bayesian information criterion to compare multiple models and identify the…
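
As a brief reminder of the two ingredients named in the record above, a finite mixture of multivariate normal densities and the Bayesian information criterion are commonly written as follows. The symbols (G components, mixing proportions \pi_g, k free parameters, n observations, maximized likelihood \hat{L}) are conventional notation assumed here, not taken from the paper.

```latex
\[
  f(\mathbf{x}) \;=\; \sum_{g=1}^{G} \pi_g \,
  \phi\!\left(\mathbf{x} \mid \boldsymbol{\mu}_g, \boldsymbol{\Sigma}_g\right),
  \qquad \pi_g \ge 0, \quad \sum_{g=1}^{G} \pi_g = 1,
\]
\[
  \mathrm{BIC} \;=\; -2 \ln \hat{L} \;+\; k \ln n .
\]
```

With this sign convention the model with the smallest BIC is preferred. Some software reports the negative of this quantity, in which case the largest value is preferred. Because BIC depends only on the maximized likelihood, k and n, it can be compared across models that are not nested, such as mixtures with different numbers of components or covariance structures.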

  5. Clustering of Multivariate Geostatistical Data

    NASA Astrophysics Data System (ADS)

    Fouedjio, Francky

    2017-04-01

    Multivariate data indexed by geographical coordinates have become omnipresent in the geosciences and pose substantial analysis challenges. One of them is the grouping of data locations into spatially contiguous clusters so that data locations belonging to the same cluster have a certain degree of homogeneity while data locations in different clusters are as different as possible. However, groups of data locations created through classical clustering techniques turn out to show poor spatial contiguity, a feature obviously inconvenient for many geoscience applications. In this work, we develop a clustering method that overcomes this problem by accounting for the spatial dependence structure of the data, thus reinforcing the spatial contiguity of the resulting clusters. The capability of the proposed clustering method to provide spatially contiguous and meaningful clusters of data locations is assessed using both synthetic and real datasets. Keywords: clustering, geostatistics, spatial contiguity, spatial dependence.

  6. Variation of heavy metals in recent sediments from Piratininga Lagoon (Brazil): interpretation of geochemical data with the aid of multivariate analysis

    NASA Astrophysics Data System (ADS)

    Huang, W.; Campredon, R.; Abrao, J. J.; Bernat, M.; Latouche, C.

    1994-06-01

    In the last decade, the Atlantic coast of south-eastern Brazil has been affected by increasing deforestation and anthropogenic effluents. Sediments in the coastal lagoons have recorded the process of such environmental change. Thirty-seven sediment samples from three cores in Piratininga Lagoon, Rio de Janeiro, were analyzed for their major components and minor element concentrations in order to examine geochemical characteristics and the depositional environment and to investigate the variation of heavy metals of environmental concern. Two multivariate analysis methods, principal component analysis and cluster analysis, were performed on the analytical data set to help visualize the sample clusters and the element associations. On the whole, the sediment samples from each core are similar and the sample clusters corresponding to the three cores are clearly separated, as a result of the different conditions of sedimentation. Some changes in the depositional environment are recognized using the results of multivariate analysis. The enrichment of Pb, Cu, and Zn in the upper parts of cores is in agreement with increasing anthropogenic influx (pollution).

  7. Temporal and spatial assessment of river surface water quality using multivariate statistical techniques: a study in Can Tho City, a Mekong Delta area, Vietnam.

    PubMed

    Phung, Dung; Huang, Cunrui; Rutherford, Shannon; Dwirahmadi, Febi; Chu, Cordia; Wang, Xiaoming; Nguyen, Minh; Nguyen, Nga Huy; Do, Cuong Manh; Nguyen, Trung Hieu; Dinh, Tuan Anh Diep

    2015-05-01

    The present study is an evaluation of temporal/spatial variations of surface water quality using multivariate statistical techniques, comprising cluster analysis (CA), principal component analysis (PCA), factor analysis (FA) and discriminant analysis (DA). Eleven water quality parameters were monitored at 38 different sites in Can Tho City, a Mekong Delta area of Vietnam from 2008 to 2012. Hierarchical cluster analysis grouped the 38 sampling sites into three clusters, representing mixed urban-rural areas, agricultural areas and industrial zone. FA/PCA resulted in three latent factors for the entire research location, three for cluster 1, four for cluster 2, and four for cluster 3 explaining 60, 60.2, 80.9, and 70% of the total variance in the respective water quality. The varifactors from FA indicated that the parameters responsible for water quality variations are related to erosion from disturbed land or inflow of effluent from sewage plants and industry, discharges from wastewater treatment plants and domestic wastewater, agricultural activities and industrial effluents, and contamination by sewage waste with faecal coliform bacteria through sewer and septic systems. Discriminant analysis (DA) revealed that nephelometric turbidity units (NTU), chemical oxygen demand (COD) and NH₃ are the discriminating parameters in space, affording 67% correct assignation in spatial analysis; pH and NO₂ are the discriminating parameters according to season, assigning approximately 60% of cases correctly. The findings suggest a possible revised sampling strategy that can reduce the number of sampling sites and the indicator parameters responsible for large variations in water quality. This study demonstrates the usefulness of multivariate statistical techniques for evaluation of temporal/spatial variations in water quality assessment and management.

  8. A Cyber-Attack Detection Model Based on Multivariate Analyses

    NASA Astrophysics Data System (ADS)

    Sakai, Yuto; Rinsaka, Koichiro; Dohi, Tadashi

    In the present paper, we propose a novel cyber-attack detection model that applies two multivariate-analysis methods to the audit data observed on a host machine. The statistical techniques used here are the well-known Hayashi's quantification method IV and cluster analysis. We quantify the observed qualitative audit event sequences via quantification method IV, and collect similar audit event sequences into the same groups based on the cluster analysis. It is shown in simulation experiments that our model can improve the cyber-attack detection accuracy in some realistic cases where both normal and attack activities are intermingled.

  9. Improving Cluster Analysis with Automatic Variable Selection Based on Trees

    DTIC Science & Technology

    2014-12-01

    regression trees Daisy DISsimilAritY PAM partitioning around medoids PMA penalized multivariate analysis SPC sparse principal components UPGMA unweighted...unweighted pair-group average method (UPGMA). This method measures dissimilarities between all objects in two clusters and takes the average value
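
    The average-linkage rule described in this excerpt can be illustrated with a minimal sketch. It assumes Euclidean distance as the dissimilarity and uses a hypothetical helper name, `average_linkage`; the report itself may use a different dissimilarity (for example one computed by daisy).

```python
from itertools import product
from math import dist


def average_linkage(cluster_a, cluster_b):
    """UPGMA-style dissimilarity: mean of all pairwise distances
    between the objects of two clusters (Euclidean distance assumed)."""
    pairs = list(product(cluster_a, cluster_b))
    return sum(dist(p, q) for p, q in pairs) / len(pairs)


# Two small illustrative clusters of 2-D points (made-up values).
a = [(0.0, 0.0), (0.0, 2.0)]
b = [(3.0, 0.0), (3.0, 4.0)]
d_ab = average_linkage(a, b)
print(d_ab)
```

    In an agglomerative procedure, the pair of clusters with the smallest such value would be merged at each step.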

  10. Multivariate Statistical Analysis: a tool for groundwater quality assessment in the hidrogeologic region of the Ring of Cenotes, Yucatan, Mexico.

    NASA Astrophysics Data System (ADS)

    Ye, M.; Pacheco Castro, R. B.; Pacheco Avila, J.; Cabrera Sansores, A.

    2014-12-01

    The karstic aquifer of Yucatan is a vulnerable and complex system. The first fifteen meters of this aquifer have been polluted; protecting this resource is therefore important because it is the only source of potable water for the entire State. Through the assessment of groundwater quality we can gain some knowledge about the main processes governing water chemistry as well as spatial patterns, which are important for establishing protection zones. In this work multivariate statistical techniques are used to assess the groundwater quality of the supply wells (30 to 40 meters deep) in the hydrogeologic region of the Ring of Cenotes, located in Yucatan, Mexico. Cluster analysis and principal component analysis are applied to groundwater chemistry data of the study area. Results of principal component analysis show that the main sources of variation in the data are sea water intrusion, the interaction of the water with the carbonate rocks of the system, and some pollution processes. The cluster analysis shows that the data can be divided into four clusters. The spatial distribution of the clusters seems to be random, but is consistent with sea water intrusion and pollution with nitrates. The overall results show that multivariate statistical analysis can be successfully applied in the groundwater quality assessment of this karstic aquifer.

  11. Detecting synchronization clusters in multivariate time series via coarse-graining of Markov chains.

    PubMed

    Allefeld, Carsten; Bialonski, Stephan

    2007-12-01

    Synchronization cluster analysis is an approach to the detection of underlying structures in data sets of multivariate time series, starting from a matrix R of bivariate synchronization indices. A previous method utilized the eigenvectors of R for cluster identification, analogous to several recent attempts at group identification using eigenvectors of the correlation matrix. All of these approaches assumed a one-to-one correspondence of dominant eigenvectors and clusters, which has however been shown to be wrong in important cases. We clarify the usefulness of eigenvalue decomposition for synchronization cluster analysis by translating the problem into the language of stochastic processes, and derive an enhanced clustering method harnessing recent insights from the coarse-graining of finite-state Markov processes. We illustrate the operation of our method using a simulated system of coupled Lorenz oscillators, and we demonstrate its superior performance over the previous approach. Finally we investigate the question of robustness of the algorithm against small sample size, which is important with regard to field applications.

  12. Combination of multivariate curve resolution and multivariate classification techniques for comprehensive high-performance liquid chromatography-diode array absorbance detection fingerprints analysis of Salvia reuterana extracts.

    PubMed

    Hakimzadeh, Neda; Parastar, Hadi; Fattahi, Mohammad

    2014-01-24

    In this study, multivariate curve resolution (MCR) and multivariate classification methods are proposed to develop a new chemometric strategy for comprehensive analysis of high-performance liquid chromatography-diode array absorbance detection (HPLC-DAD) fingerprints of sixty Salvia reuterana samples from five different geographical regions. Different chromatographic problems that occurred during HPLC-DAD analysis of S. reuterana samples, such as baseline/background contribution and noise, low signal-to-noise ratio (S/N), asymmetric peaks, elution time shifts, and peak overlap, are handled using the proposed strategy. In this way, chromatographic fingerprints of sixty samples are properly segmented into ten common chromatographic regions using local rank analysis and then the corresponding segments are column-wise augmented for subsequent MCR analysis. Extended multivariate curve resolution-alternating least squares (MCR-ALS) is used to obtain pure component profiles in each segment. In general, thirty-one chemical components were resolved using MCR-ALS in sixty S. reuterana samples and the lack of fit (LOF) values of MCR-ALS models were below 10.0% in all cases. Pure spectral profiles are considered for identification of chemical components by comparing their resolved spectra with the standard ones, and twenty-four of the thirty-one components were identified. Additionally, pure elution profiles are used to obtain relative concentrations of chemical components in different samples for multivariate classification analysis by principal component analysis (PCA) and k-nearest neighbors (kNN). Inspection of the PCA score plot (three PCs explaining 76.1% of the variance) showed that S. reuterana samples belong to four clusters. The degree of class separation (DCS), which quantifies the distance separating clusters in relation to the scatter within each cluster, is calculated for the four clusters and was in the range of 1.6-5.8. These results are then confirmed by kNN. In addition, according to the PCA loading plot and kNN dendrogram of thirty-one variables, five chemical constituents (luteolin-7-o-glucoside, salvianolic acid D, rosmarinic acid, lithospermic acid and trijuganone A) are identified as the most important variables (i.e., chemical markers) for cluster discrimination. Finally, the effect of different chemical markers on sample differentiation is investigated using the counter-propagation artificial neural network (CP-ANN) method. It is concluded that the proposed strategy can be successfully applied for comprehensive analysis of chromatographic fingerprints of complex natural samples. Copyright © 2013 Elsevier B.V. All rights reserved.

  13. A hybrid clustering approach for multivariate time series - A case study applied to failure analysis in a gas turbine.

    PubMed

    Fontes, Cristiano Hora; Budman, Hector

    2017-11-01

    A clustering problem involving multivariate time series (MTS) requires the selection of similarity metrics. This paper shows the limitations of the PCA similarity factor (SPCA) as a single metric in nonlinear problems where there are differences in magnitude of the same process variables due to expected changes in operation conditions. A novel method for clustering MTS based on a combination between SPCA and the average-based Euclidean distance (AED) within a fuzzy clustering approach is proposed. Case studies involving either simulated or real industrial data collected from a large scale gas turbine are used to illustrate that the hybrid approach enhances the ability to recognize normal and fault operating patterns. This paper also proposes an oversampling procedure to create synthetic multivariate time series that can be useful in commonly occurring situations involving unbalanced data sets. Copyright © 2017 ISA. Published by Elsevier Ltd. All rights reserved.
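
    The limitation described in this abstract can be illustrated with a short sketch. It assumes the PCA similarity factor takes the commonly used form trace(L'MM'L)/k, where L and M hold the first k principal-component loadings of the two series, and it assumes the average-based Euclidean distance is the distance between per-variable time averages. Both definitions, and the names `pca_loadings`, `spca` and `aed`, are assumptions for illustration and may differ from the paper's.

```python
import numpy as np


def pca_loadings(X, k):
    """First k principal-component loadings (p x k) of a series X
    with shape (time, variables)."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return vt[:k].T


def spca(X1, X2, k):
    """Assumed PCA similarity factor: trace(L' M M' L) / k, in [0, 1]."""
    L, M = pca_loadings(X1, k), pca_loadings(X2, k)
    return float(np.trace(L.T @ M @ M.T @ L) / k)


def aed(X1, X2):
    """Assumed average-based Euclidean distance between mean vectors."""
    return float(np.linalg.norm(X1.mean(axis=0) - X2.mean(axis=0)))


# Synthetic series and a copy shifted in magnitude (made-up data).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3)) * np.array([3.0, 2.0, 1.0])
X_shifted = X + 5.0

# SPCA ignores the shift (stays close to 1); AED registers it.
print(spca(X, X_shifted, k=2), aed(X, X_shifted))
```

    Because mean-centering removes the offset, the similarity factor alone cannot separate the two operating levels, which is consistent with the motivation for combining it with a magnitude-sensitive distance.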

  14. A method of using cluster analysis to study statistical dependence in multivariate data

    NASA Technical Reports Server (NTRS)

    Borucki, W. J.; Card, D. H.; Lyle, G. C.

    1975-01-01

    A technique is presented that uses both cluster analysis and a Monte Carlo significance test of clusters to discover associations between variables in multidimensional data. The method is applied to an example of a noisy function in three-dimensional space, to a sample from a mixture of three bivariate normal distributions, and to the well-known Fisher's Iris data.
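
    The idea of a Monte Carlo significance test for association can be sketched as follows. This is not the authors' procedure: as a stand-in for their cluster analysis it uses the mean nearest-neighbour distance as the test statistic, and it builds the null distribution by shuffling one variable to break any dependence. The function names and the toy data are hypothetical.

```python
import random
from math import dist


def mean_nn_distance(points):
    """Average distance from each point to its nearest neighbour;
    small values indicate tightly grouped points."""
    return sum(
        min(dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ) / len(points)


def monte_carlo_p_value(xs, ys, n_sim=199, seed=0):
    """Fraction of shuffled data sets at least as tightly grouped as
    the observed one (with the usual +1 correction)."""
    rng = random.Random(seed)
    observed = mean_nn_distance(list(zip(xs, ys)))
    hits = 0
    for _ in range(n_sim):
        shuffled = ys[:]
        rng.shuffle(shuffled)  # destroys any dependence between xs and ys
        if mean_nn_distance(list(zip(xs, shuffled))) <= observed:
            hits += 1
    return (hits + 1) / (n_sim + 1)


# Perfectly dependent toy variables (y equals x).
xs = list(range(30))
ys = list(range(30))
p = monte_carlo_p_value(xs, ys)
print(p)
```

    A small p-value suggests that the observed grouping is tighter than expected if the variables were independent.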

  15. Multivariate Analysis of the Visual Information Processing of Numbers

    ERIC Educational Resources Information Center

    Levine, David M.

    1977-01-01

    Nonmetric multidimensional scaling and hierarchical clustering procedures are applied to a confusion matrix of numerals. Two dimensions were interpreted: straight versus curved, and locus of curvature. Four major clusters of numerals were developed. (Author/JKS)

  16. Multi-Sample Cluster Analysis Using Akaike’s Information Criterion.

    DTIC Science & Technology

    1982-12-20

    of Likelihood Criteria for Different Hypotheses," in P. R. Krishnaiah (Ed.), Multivariate Analysis-II, New York: Academic Press. [5] Fisher, R. A...Methods of Simultaneous Inference in MANOVA," in P. R. Krishnaiah (Ed.), Multivariate Analysis-II, New York: Academic Press. [8] Kendall, M. G. (1966...1982), Applied Multivariate Statistical Analysis, Englewood Cliffs: Prentice-Hall, Inc. [10] Krishnaiah, P. R. (1969), "Simultaneous Test

  17. Multivariate analysis: A statistical approach for computations

    NASA Astrophysics Data System (ADS)

    Michu, Sachin; Kaushik, Vandana

    2014-10-01

    Multivariate analysis is a type of multivariate statistical approach commonly used in automotive diagnosis, education, evaluating clusters in finance, etc., and more recently in the health-related professions. The objective of the paper is to provide a detailed exploratory discussion of factor analysis (FA) in image retrieval methods and correlation analysis (CA) of network traffic. Image retrieval methods aim to retrieve relevant images from a collected database, based on their content. The problem is made more difficult by the high dimension of the variable space in which the images are represented. Multivariate correlation analysis proposes an anomaly detection and analysis method based on the correlation coefficient matrix. Anomalous behaviors in the network include various attacks such as DDoS attacks and network scanning.

  18. Assessment of trace elements levels in patients with Type 2 diabetes using multivariate statistical analysis.

    PubMed

    Badran, M; Morsy, R; Soliman, H; Elnimr, T

    2016-01-01

    Trace element metabolism has been reported to play specific roles in the pathogenesis and progress of diabetes mellitus. Due to the continuous increase in the population of patients with Type 2 diabetes (T2D), this study aims to assess the levels and inter-relationships of fasting blood glucose (FBG) and serum trace elements in Type 2 diabetic patients. This study was conducted on 40 Egyptian Type 2 diabetic patients and 36 healthy volunteers (Hospital of Tanta University, Tanta, Egypt). The blood serum was digested and then used to determine the levels of 24 trace elements using inductively coupled plasma mass spectrometry (ICP-MS). Multivariate statistical analyses based on the correlation coefficient, cluster analysis (CA) and principal component analysis (PCA) were used to analyze the data. The results exhibited significant changes in FBG and in the levels of eight trace elements (Zn, Cu, Se, Fe, Mn, Cr, Mg, and As) in the blood serum of Type 2 diabetic patients relative to those of healthy controls. The multivariate statistical techniques were effective in reducing the experimental variables and in grouping the trace elements in patients into three clusters. The application of PCA revealed a distinct difference in the associations of trace elements and their clustering patterns between the control and patient groups, in particular for Mg, Fe, Cu, and Zn, which appeared to be the most crucial factors related to Type 2 diabetes. Therefore, on the basis of this study, the contributors to trace element content in Type 2 diabetic patients can be determined and specified with correlation relationships and multivariate statistical analysis, which confirm that the alteration of some essential trace metals may play a role in the development of diabetes mellitus. Copyright © 2015 Elsevier GmbH. All rights reserved.

  19. Groundwater quality assessment of urban Bengaluru using multivariate statistical techniques

    NASA Astrophysics Data System (ADS)

    Gulgundi, Mohammad Shahid; Shetty, Amba

    2018-03-01

    Groundwater quality deterioration due to anthropogenic activities has become a subject of prime concern. The objective of the study was to assess the spatial and temporal variations in groundwater quality and to identify the sources in the western half of Bengaluru city using multivariate statistical techniques. A water quality index rating was calculated for the pre- and post-monsoon seasons to quantify overall water quality for human consumption. The post-monsoon samples show signs of poor quality for drinking purposes compared to the pre-monsoon samples. Cluster analysis (CA), principal component analysis (PCA) and discriminant analysis (DA) were applied to the groundwater quality data measured on 14 parameters from 67 sites distributed across the city. Hierarchical cluster analysis (CA) grouped the 67 sampling stations into two groups, cluster 1 having high pollution and cluster 2 having lesser pollution. Discriminant analysis (DA) was applied to delineate the most meaningful parameters accounting for temporal and spatial variations in groundwater quality of the study area. Temporal DA identified pH as the most important parameter, which discriminates between water quality in the pre-monsoon and post-monsoon seasons and accounts for 72% seasonal assignation of cases. Spatial DA identified Mg, Cl and NO3 as the three most important parameters discriminating between the two clusters and accounting for 89% spatial assignation of cases. Principal component analysis was applied to the dataset obtained from the two clusters, which evolved three factors in each cluster, explaining 85.4 and 84% of the total variance, respectively. Varifactors obtained from principal component analysis showed that groundwater quality variation is mainly explained by dissolution of minerals from rock-water interactions in the aquifer, the effect of anthropogenic activities and ion exchange processes in water.

  20. Prevalence and risk factors for scrub typhus in South India.

    PubMed

    Trowbridge, Paul; P, Divya; Premkumar, Prasanna S; Varghese, George M

    2017-05-01

    To determine the prevalence and risk factors of scrub typhus in Tamil Nadu, South India. We performed a clustered seroprevalence study of the areas around Vellore. All participants completed a risk factor survey, with seropositive and seronegative participants acting as cases and controls, respectively, in a risk factor analysis. After univariate analysis, variables found to be significant underwent multivariate analysis. Of 721 people participating in this study, 31.8% tested seropositive. By univariate analysis, after accounting for clustering, having a house that was clustered with other houses, having fewer rooms in a house, having fewer people living in a household, defecating outside, female sex, age >60 years, shorter height, lower weight, smaller body mass index and smaller mid-upper arm circumference were found to be significantly associated with seropositivity. After multivariate regression modelling, living in a house clustered with other houses, female sex and age >60 years were significantly associated with scrub typhus exposure. Overall, scrub typhus is much more common than previously thought. Previously described individual environmental and habitual risk factors seem to have less importance in South India, perhaps because of the overall scrub typhus-conducive nature of the environment in this region. © 2017 John Wiley & Sons Ltd.

  1. Clustering analysis for muon tomography data elaboration in the Muon Portal project

    NASA Astrophysics Data System (ADS)

    Bandieramonte, M.; Antonuccio-Delogu, V.; Becciani, U.; Costa, A.; La Rocca, P.; Massimino, P.; Petta, C.; Pistagna, C.; Riggi, F.; Riggi, S.; Sciacca, E.; Vitello, F.

    2015-05-01

    Clustering analysis is one of the multivariate data analysis techniques that allows statistical data units to be gathered into groups so as to minimize the logical distance within each group and to maximize the distance between different groups. In these proceedings, the authors present a novel approach to muon tomography data analysis based on clustering algorithms. As a case study we present the Muon Portal project, which aims to build and operate a dedicated particle detector for the inspection of harbor containers to hinder the smuggling of nuclear materials. Clustering techniques, working directly on scattering points, help to detect the presence of suspicious items inside the container, acting, as will be shown, as a filter for a preliminary analysis of the data.

  2. Multivariate Statistical Analysis of MSL APXS Bulk Geochemical Data

    NASA Astrophysics Data System (ADS)

    Hamilton, V. E.; Edwards, C. S.; Thompson, L. M.; Schmidt, M. E.

    2014-12-01

    We apply cluster and factor analyses to bulk chemical data of 130 soil and rock samples measured by the Alpha Particle X-ray Spectrometer (APXS) on the Mars Science Laboratory (MSL) rover Curiosity through sol 650. Multivariate approaches such as principal components analysis (PCA), cluster analysis, and factor analysis complement more traditional approaches (e.g., Harker diagrams), with the advantage of simultaneously examining the relationships between multiple variables for large numbers of samples. Principal components analysis has been applied with success to APXS, Pancam, and Mössbauer data from the Mars Exploration Rovers. Factor analysis and cluster analysis have been applied with success to thermal infrared (TIR) spectral data of Mars. Cluster analyses group the input data by similarity, where there are a number of different methods for defining similarity (hierarchical, density, distribution, etc.). For example, without any assumptions about the chemical contributions of surface dust, preliminary hierarchical and K-means cluster analyses clearly distinguish the physically adjacent rock targets Windjana and Stephen as being distinctly different from lithologies observed prior to Curiosity's arrival at The Kimberley. In addition, they are separated from each other, consistent with chemical trends observed in variation diagrams but without requiring assumptions about chemical relationships. We will discuss the variation in cluster analysis results as a function of clustering method and pre-processing (e.g., log transformation, correction for dust cover) and implications for interpreting chemical data. Factor analysis shares some similarities with PCA, and examines the variability among observed components of a dataset so as to reveal variations attributable to unobserved components. Factor analysis has been used to extract the TIR spectra of components that are typically observed in mixtures and only rarely in isolation; there is the potential for similar results with data from APXS. These techniques offer new ways to understand the chemical relationships between the materials interrogated by Curiosity, and potentially their relation to materials observed by APXS instruments on other landed missions.

  3. Biostatistics Series Module 10: Brief Overview of Multivariate Methods.

    PubMed

    Hazra, Avijit; Gogtay, Nithya

    2017-01-01

    Multivariate analysis refers to statistical techniques that simultaneously look at three or more variables in relation to the subjects under investigation with the aim of identifying or clarifying the relationships between them. These techniques have been broadly classified as dependence techniques, which explore the relationship between one or more dependent variables and their independent predictors, and interdependence techniques, that make no such distinction but treat all variables equally in a search for underlying relationships. Multiple linear regression models a situation where a single numerical dependent variable is to be predicted from multiple numerical independent variables. Logistic regression is used when the outcome variable is dichotomous in nature. The log-linear technique models count type of data and can be used to analyze cross-tabulations where more than two variables are included. Analysis of covariance is an extension of analysis of variance (ANOVA), in which an additional independent variable of interest, the covariate, is brought into the analysis. It tries to examine whether a difference persists after "controlling" for the effect of the covariate that can impact the numerical dependent variable of interest. Multivariate analysis of variance (MANOVA) is a multivariate extension of ANOVA used when multiple numerical dependent variables have to be incorporated in the analysis. Interdependence techniques are more commonly applied to psychometrics, social sciences and market research. Exploratory factor analysis and principal component analysis are related techniques that seek to extract from a larger number of metric variables, a smaller number of composite factors or components, which are linearly related to the original variables. Cluster analysis aims to identify, in a large number of cases, relatively homogeneous groups called clusters, without prior information about the groups. 
    The calculation-intensive nature of multivariate analysis has so far precluded most researchers from using these techniques routinely. The situation is now changing with wider availability and increasing sophistication of statistical software, and researchers should no longer shy away from exploring the applications of multivariate methods to real-life data sets.

  4. Integrated GIS and multivariate statistical analysis for regional scale assessment of heavy metal soil contamination: A critical review.

    PubMed

    Hou, Deyi; O'Connor, David; Nathanail, Paul; Tian, Li; Ma, Yan

    2017-12-01

    Heavy metal soil contamination is associated with potential toxicity to humans or ecotoxicity. Scholars have increasingly used a combination of geographical information science (GIS) with geostatistical and multivariate statistical analysis techniques to examine the spatial distribution of heavy metals in soils at a regional scale. A review of such studies showed that most soil sampling programs were based on grid patterns and composite sampling methodologies. Many programs intended to characterize various soil types and land use types. The most often used sampling depth intervals were 0-0.10 m, or 0-0.20 m, below surface; and the sampling densities used ranged from 0.0004 to 6.1 samples per km², with a median of 0.4 samples per km². The most widely used spatial interpolators were inverse distance weighted interpolation and ordinary kriging; and the most often used multivariate statistical analysis techniques were principal component analysis and cluster analysis. The review also identified several determining and correlating factors in heavy metal distribution in soils, including soil type, soil pH, soil organic matter, land use type, Fe, Al, and heavy metal concentrations. The major natural and anthropogenic sources of heavy metals were found to derive from lithogenic origin, roadway and transportation, atmospheric deposition, wastewater and runoff from industrial and mining facilities, fertilizer application, livestock manure, and sewage sludge. This review argues that the full potential of integrated GIS and multivariate statistical analysis for assessing heavy metal distribution in soils on a regional scale has not yet been fully realized. It is proposed that future research be conducted to map multivariate results in GIS to pinpoint specific anthropogenic sources, to analyze temporal trends in addition to spatial patterns, to optimize modeling parameters, and to expand the use of different multivariate analysis tools beyond principal component analysis (PCA) and cluster analysis (CA). Copyright © 2017 Elsevier Ltd. All rights reserved.

  5. Advanced multivariate analysis to assess remediation of hydrocarbons in soils.

    PubMed

    Lin, Deborah S; Taylor, Peter; Tibbett, Mark

    2014-10-01

    Accurate monitoring of degradation levels in soils is essential in order to understand and achieve complete degradation of petroleum hydrocarbons in contaminated soils. We aimed to develop the use of multivariate methods for the monitoring of biodegradation of diesel in soils and to determine if diesel contaminated soils could be remediated to a chemical composition similar to that of an uncontaminated soil. An incubation experiment was set up with three contrasting soil types. Each soil was exposed to diesel at varying stages of degradation and then analysed for key hydrocarbons throughout 161 days of incubation. Hydrocarbon distributions were analysed by Principal Coordinate Analysis and similar samples grouped by cluster analysis. Variation and differences between samples were determined using permutational multivariate analysis of variance. It was found that all soils followed trajectories approaching the chemical composition of the unpolluted soil. Some contaminated soils were no longer significantly different to that of uncontaminated soil after 161 days of incubation. The use of cluster analysis allows the assignment of a percentage chemical similarity of a diesel contaminated soil to an uncontaminated soil sample. This will aid in the monitoring of hydrocarbon contaminated sites and the establishment of potential endpoints for successful remediation.

  6. A detailed comparison of analysis processes for MCC-IMS data in disease classification—Automated methods can replace manual peak annotations

    PubMed Central

    Horsch, Salome; Kopczynski, Dominik; Kuthe, Elias; Baumbach, Jörg Ingo; Rahmann, Sven

    2017-01-01

    Motivation: Disease classification from molecular measurements typically requires an analysis pipeline from raw noisy measurements to final classification results. Multi-capillary column ion mobility spectrometry (MCC-IMS) is a promising technology for the detection of volatile organic compounds in the air of exhaled breath. From raw measurements, the peak regions representing the compounds have to be identified, quantified, and clustered across different experiments. Currently, several steps of this analysis process require manual intervention of human experts. Our goal is to identify a fully automatic pipeline that yields competitive disease classification results compared to an established but subjective and tedious semi-manual process. Method: We combine a large number of modern methods for peak detection, peak clustering, and multivariate classification into analysis pipelines for raw MCC-IMS data. We evaluate all combinations on three different real datasets in an unbiased cross-validation setting. We determine which specific algorithmic combinations lead to high AUC values in disease classifications across the different medical application scenarios. Results: The best fully automated analysis process achieves even better classification results than the established manual process. The best algorithms for the three analysis steps are (i) SGLTR (Savitzky-Golay Laplace-operator filter thresholding regions) and LM (Local Maxima) for automated peak identification, (ii) EM clustering (Expectation Maximization) and DBSCAN (Density-Based Spatial Clustering of Applications with Noise) for the clustering step and (iii) RF (Random Forest) for multivariate classification. Thus, automated methods can replace the manual steps in the analysis process to enable an unbiased high throughput use of the technology. PMID:28910313

  7. Transmission clustering among newly diagnosed HIV patients in Chicago, 2008 to 2011: using phylogenetics to expand knowledge of regional HIV transmission patterns

    PubMed Central

    Lubelchek, Ronald J.; Hoehnen, Sarah C.; Hotton, Anna L.; Kincaid, Stacey L.; Barker, David E.; French, Audrey L.

    2014-01-01

    Introduction: HIV transmission cluster analyses can inform HIV prevention efforts. We describe the first such assessment for transmission clustering among HIV patients in Chicago. Methods: We performed transmission cluster analyses using HIV pol sequences from newly diagnosed patients presenting to Chicago’s largest HIV clinic between 2008 and 2011. We compared sequences via progressive pairwise alignment, using neighbor joining to construct an un-rooted phylogenetic tree. We defined clusters as >2 sequences among which each sequence had at least one partner within a genetic distance of ≤ 1.5%. We used multivariable regression to examine factors associated with clustering and used geospatial analysis to assess geographic proximity of phylogenetically clustered patients. Results: We compared sequences from 920 patients; median age 35 years; 75% male; 67% Black, 23% Hispanic; 8% had a Rapid Plasma Reagin (RPR) titer ≥ 1:16 concurrent with their HIV diagnosis. We had HIV transmission risk data for 54%; 43% identified as men who have sex with men (MSM). Phylogenetic analysis demonstrated 123 patients (13%) grouped into 26 clusters, the largest having 20 members. In multivariable regression, age < 25, Black race, MSM status, male gender, higher HIV viral load, and RPR ≥ 1:16 were associated with clustering. We did not observe geographic grouping of genetically clustered patients. Discussion: Our results demonstrate high rates of HIV transmission clustering, without local geographic foci, among young Black MSM in Chicago. Applied prospectively, phylogenetic analyses could guide prevention efforts and help break the cycle of transmission. PMID:25321182
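
    The cluster definition quoted in this abstract (more than two sequences, each with at least one partner within a genetic distance of 1.5%) can be read as finding connected groups under a distance cutoff. The sketch below takes that reading; it is an assumption, not the authors' pipeline, which also involved alignment and a neighbor-joining tree. The function name and the distance values are made up for illustration.

```python
def threshold_clusters(distances, n, cutoff=0.015, min_size=3):
    """Group n sequences so that each member has at least one partner
    within `cutoff`; `distances` maps (i, j) index pairs to a genetic
    distance. Pairs that are absent are treated as not linked."""
    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (i, j), d in distances.items():
        if d <= cutoff:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    # Keep only groups with more than two sequences.
    return sorted(g for g in groups.values() if len(g) >= min_size)


# Six hypothetical sequences with made-up pairwise distances.
example = {(0, 1): 0.010, (1, 2): 0.014, (0, 2): 0.030,
           (3, 4): 0.012, (4, 5): 0.050}
clusters = threshold_clusters(example, n=6)
print(clusters)  # [[0, 1, 2]]
```

    Sequences 3 and 4 are linked but form only a pair, so they do not meet the size criterion, and sequence 5 has no partner within the cutoff.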

  8. Igloo-Plot: a tool for visualization of multidimensional datasets.

    PubMed

    Kuntal, Bhusan K; Ghosh, Tarini Shankar; Mande, Sharmila S

    2014-01-01

    Advances in science and technology have resulted in an exponential growth of multivariate (or multi-dimensional) datasets which are being generated from various research areas, especially in the domain of biological sciences. Visualization and analysis of such data (with the objective of uncovering the hidden patterns therein) is an important and challenging task. We present a tool, called Igloo-Plot, for efficient visualization of multidimensional datasets. The tool addresses some of the key limitations of contemporary multivariate visualization and analysis tools. The visualization layout facilitates easy identification not only of clusters of data-points having similar feature compositions, but also of the 'marker features' specific to each of these clusters. The applicability of the various functionalities implemented herein is demonstrated using several well studied multi-dimensional datasets. Igloo-Plot is expected to be a valuable resource for researchers working in multivariate data mining studies. Igloo-Plot is available for download from: http://metagenomics.atc.tcs.com/IglooPlot/. Copyright © 2014 Elsevier Inc. All rights reserved.

  9. A CLIPS expert system for clinical flow cytometry data analysis

    NASA Technical Reports Server (NTRS)

    Salzman, G. C.; Duque, R. E.; Braylan, R. C.; Stewart, C. C.

    1990-01-01

    An expert system is being developed using CLIPS to assist clinicians in the analysis of multivariate flow cytometry data from cancer patients. Cluster analysis is used to find subpopulations representing various cell types in multiple datasets each consisting of four to five measurements on each of 5000 cells. CLIPS facts are derived from results of the clustering. CLIPS rules are based on the expertise of Drs. Stewart, Duque, and Braylan. The rules incorporate certainty factors based on case histories.

  10. Multi-Sample Cluster Analysis Using Akaike’s Information Criterion.

    DTIC Science & Technology

    1982-12-20

    Intervals. For more details on these test procedures refer to Gabriel [7], Krishnaiah ([10], [11]), Srivastava [16], and others. As noted in Consul...723. [4] Consul, P. C. (1969), "The Exact Distributions of Likelihood Criteria for Different Hypotheses," in P. R. Krishnaiah (Ed.), Multivariate...1178. [7] Gabriel, K. R. (1969), "A Comparison of Some Methods of Simultaneous Inference in MANOVA," in P. R. Krishnaiah (Ed.), Multivariate Analysis-II

  11. A Multivariate Analysis of Galaxy Cluster Properties

    NASA Astrophysics Data System (ADS)

    Ogle, P. M.; Djorgovski, S.

    1993-05-01

    We have assembled from the literature a data base on 394 clusters of galaxies, with up to 16 parameters per cluster. They include optical and x-ray luminosities, x-ray temperatures, galaxy velocity dispersions, central galaxy and particle densities, optical and x-ray core radii and ellipticities, etc. In addition, derived quantities, such as the mass-to-light ratios and x-ray gas masses are included. Doubtful measurements have been identified, and deleted from the data base. Our goal is to explore the correlations between these parameters, and interpret them in the framework of our understanding of evolution of clusters and large-scale structure, such as the Gott-Rees scaling hierarchy. Among the simple, monovariate correlations we found, the most significant include those between the optical and x-ray luminosities, x-ray temperatures, cluster velocity dispersions, and central galaxy densities, in various mutual combinations. While some of these correlations have been discussed previously in the literature, generally smaller samples of objects have been used. We will also present the results of a multivariate statistical analysis of the data, including a principal component analysis (PCA). Such an approach has not been used previously for studies of cluster properties, even though it is much more powerful and complete than the simple monovariate techniques which are commonly employed. The observed correlations may lead to powerful constraints for theoretical models of formation and evolution of galaxy clusters. P.M.O. was supported by a Caltech graduate fellowship. S.D. acknowledges partial support from the NASA contract NAS5-31348 and the NSF PYI award AST-9157412.

  12. Symptom clusters predict mortality among dialysis patients in Norway: a prospective observational cohort study.

    PubMed

    Amro, Amin; Waldum, Bård; von der Lippe, Nanna; Brekke, Fredrik Barth; Dammen, Toril; Miaskowski, Christine; Os, Ingrid

    2015-01-01

    Patients with end-stage renal disease on dialysis have reduced survival rates compared with the general population. Symptoms are frequent in dialysis patients, and a symptom cluster is defined as two or more related co-occurring symptoms. The aim of this study was to explore the associations between symptom clusters and mortality in dialysis patients. In a prospective observational cohort study of dialysis patients (n = 301), Kidney Disease and Quality of Life Short Form and Beck Depression Inventory questionnaires were administered. To generate symptom clusters, principal component analysis with varimax rotation was used on 11 kidney-specific self-reported physical symptoms. A Beck Depression Inventory score of 16 or greater was defined as clinically significant depressive symptoms. Physical and mental component summary scores were generated from Short Form-36. Multivariate Cox regression analysis was used for the survival analysis, Kaplan-Meier curves and log-rank statistics were applied to compare survival rates between the groups. Three different symptom clusters were identified; one included loading of several uremic symptoms. In multivariate analyses and after adjustment for health-related quality of life and depressive symptoms, the worst perceived quartile of the "uremic" symptom cluster independently predicted all-cause mortality (hazard ratio 2.47, 95% CI 1.44-4.22, P = 0.001) compared with the other quartiles during a follow-up period that ranged from four to 52 months. The two other symptom clusters ("neuromuscular" and "skin") or the individual symptoms did not predict mortality. Clustering of uremic symptoms predicted mortality. Assessing co-occurring symptoms rather than single symptoms may help to identify dialysis patients at high risk for mortality. Copyright © 2015 American Academy of Hospice and Palliative Medicine. Published by Elsevier Inc. All rights reserved.

  13. Fourier Transform Infrared Spectroscopy (FTIR) and Multivariate Analysis for Identification of Different Vegetable Oils Used in Biodiesel Production

    PubMed Central

    Mueller, Daniela; Ferrão, Marco Flôres; Marder, Luciano; da Costa, Adilson Ben; de Cássia de Souza Schneider, Rosana

    2013-01-01

    The main objective of this study was to use infrared spectroscopy to identify vegetable oils used as raw material for biodiesel production and apply multivariate analysis to the data. Six different vegetable oil sources—canola, cotton, corn, palm, sunflower and soybeans—were used to produce biodiesel batches. The spectra were acquired by Fourier transform infrared spectroscopy using a universal attenuated total reflectance sensor (FTIR-UATR). For the multivariate analysis, principal component analysis (PCA), hierarchical cluster analysis (HCA), interval principal component analysis (iPCA) and soft independent modeling of class analogy (SIMCA) were used. The results indicate that it is possible to develop a methodology to identify vegetable oils used as raw material in the production of biodiesel by FTIR-UATR applying multivariate analysis. It was also observed that the iPCA found the best spectral range for separation of biodiesel batches using FTIR-UATR data, and with this result, the SIMCA method classified 100% of the soybean biodiesel samples. PMID:23539030

  14. Multivariate analysis of molecular and morphological diversity in fig (Ficus carica L.)

    USDA-ARS's Scientific Manuscript database

    Genetic polymorphism across 15 microsatellite loci among 194 fig accessions including Common, Smyrna, San Pedro, and Caprifig were analyzed using a cluster analysis (CA) and the principal components analysis (PCA). The collection was moderately variable with observed number of alleles per locus rang...

  15. Multivariate analysis of fatty acid and biochemical constitutes of seaweeds to characterize their potential as bioresource for biofuel and fine chemicals.

    PubMed

    Verma, Priyanka; Kumar, Manoj; Mishra, Girish; Sahoo, Dinabandhu

    2017-02-01

    In the present study bio prospecting of thirty seaweeds from Indian coasts was analyzed for their biochemical components including pigments, fatty acid and ash content. Multivariate analysis of biochemical components and fatty acids was done using Principal Component Analysis (PCA) and Agglomerative hierarchical clustering (AHC) to manifest chemotaxonomic relationship among various seaweeds. The overall analysis suggests that these seaweeds have multi-functional properties and can be utilized as promising bioresource for proteins, lipids, pigments and carbohydrates for the food/feed and biofuel industry. Copyright © 2016. Published by Elsevier Ltd.

  16. 1H-NMR with Multivariate Analysis for Automobile Lubricant Comparison.

    PubMed

    Kim, Siwon; Yoon, Dahye; Lee, Dong-Kye; Yoon, Changshin; Kim, Suhkmann

    2017-07-01

    Identification of suspected automobile-related lubricants could provide valuable information in forensic cases. We examined whether automobile lubricants exhibit chemometric characteristics specific to their individual usages. To compare the degree of clustering in the plots, we co-plotted general industrial oils that were highly dissimilar to automobile lubricants in additive compositions. 1H-NMR spectroscopy was used with multivariate statistics as a tool for grouping, clustering, and identification of automobile lubricants in laboratory conditions. We analyzed automobile lubricants including automobile engine oils, automobile transmission oils, automobile gear oils, and motorcycle oils. In contrast to the general industrial oils, automobile lubricants showed relatively high tendencies to cluster according to their usages. Our pilot study demonstrated that the comparison of known and questioned samples according to their usages might be possible in forensic fields. © 2017 American Academy of Forensic Sciences.

  17. An Extension of Multiple Correspondence Analysis for Identifying Heterogeneous Subgroups of Respondents

    ERIC Educational Resources Information Center

    Hwang, Heungsun; Montreal, Hec; Dillon, William R.; Takane, Yoshio

    2006-01-01

    An extension of multiple correspondence analysis is proposed that takes into account cluster-level heterogeneity in respondents' preferences/choices. The method involves combining multiple correspondence analysis and k-means in a unified framework. The former is used for uncovering a low-dimensional space of multivariate categorical variables…

  18. Detecting Outliers in Factor Analysis Using the Forward Search Algorithm

    ERIC Educational Resources Information Center

    Mavridis, Dimitris; Moustaki, Irini

    2008-01-01

    In this article we extend and implement the forward search algorithm for identifying atypical subjects/observations in factor analysis models. The forward search has been mainly developed for detecting aberrant observations in regression models (Atkinson, 1994) and in multivariate methods such as cluster and discriminant analysis (Atkinson, Riani,…

  19. OGLE II Eclipsing Binaries In The LMC: Analysis With Class

    NASA Astrophysics Data System (ADS)

    Devinney, Edward J.; Prsa, A.; Guinan, E. F.; DeGeorge, M.

    2011-01-01

    The Eclipsing Binaries (EBs) via Artificial Intelligence (EBAI) Project is applying machine learning techniques to elucidate the nature of EBs. Previously, Prsa, et al. applied artificial neural networks (ANNs) trained on physically-realistic Wilson-Devinney models to solve the light curves of the 1882 detached EBs in the LMC discovered by the OGLE II Project (Wyrzykowski, et al.) fully automatically, bypassing the need for manually-derived starting solutions. A curious result is the non-monotonic distribution of the temperature ratio parameter T2/T1, featuring a subsidiary peak noted previously by Mazeh, et al. in an independent analysis using the EBOP EB solution code (Tamuz, et al.). To explore this and to gain a fuller understanding of the multivariate EBAI LMC observational plus solutions data, we have employed automatic clustering and advanced visualization (CAV) techniques. Clustering the OGLE II data aggregates objects that are similar with respect to many parameter dimensions. Measures of similarity could include, for example, the multidimensional Euclidean distance between data objects, although other measures may be appropriate. Applying clustering, we find good evidence that the T2/T1 subsidiary peak is due to evolved binaries, in support of Mazeh et al.'s speculation. Further, clustering suggests that the LMC detached EBs occupying the main sequence region belong to two distinct classes. Also identified as a separate cluster in the multivariate data are stars having a Period-I band relation. Derekas et al. had previously found a Period-K band relation for LMC EBs discovered by the MACHO Project (Alcock, et al.). We suggest such CAV techniques will prove increasingly useful for understanding the large, multivariate datasets increasingly being produced in astronomy. We are grateful for the support of this research from NSF/RUI Grant AST-05-75042 f.

  20. Untangling Magmatic Processes and Hydrothermal Alteration of in situ Superfast Spreading Ocean Crust at ODP/IODP Site 1256 with Fuzzy c-means Cluster Analysis of Rock Magnetic Properties

    NASA Astrophysics Data System (ADS)

    Dekkers, M. J.; Heslop, D.; Herrero-Bervera, E.; Acton, G.; Krasa, D.

    2014-12-01

    Ocean Drilling Program (ODP)/Integrated ODP (IODP) Hole 1256D (6°44.1' N, 91°56.1' W) on the Cocos Plate occurs in 15.2 Ma oceanic crust generated by superfast seafloor spreading. Presently, it is the only drill hole that has sampled all three oceanic crust layers in a tectonically undisturbed setting. Here we interpret down-hole trends in several rock-magnetic parameters with fuzzy c-means cluster analysis, a multivariate statistical technique. The parameters include the magnetization ratio, the coercivity ratio, the coercive force, the low-field susceptibility, and the Curie temperature. Through their combined multivariate analysis, the effects of magmatic and hydrothermal processes can be evaluated. The optimal number of clusters - a key point in the analysis because there is no a priori information on this - was determined through a combination of approaches: by calculation of several cluster validity indices, by testing for coherent cluster distributions on non-linear-map plots, and importantly by testing for stability of the cluster solution from all possible starting points. Here, we consider a solution robust if the cluster allocation is independent of the starting configuration. The five-cluster solution appeared to be robust. Three clusters are distinguished in the extrusive segment of the Hole that express increasing hydrothermal alteration of the lavas. The sheeted dike and gabbro portions are characterized by two clusters, both with higher coercivities than in lava samples. Extensive alteration, however, can obliterate magnetic property differences between lavas, dikes, and gabbros. The imprint of thermochemical alteration on the iron-titanium oxides is only partially related to the porosity of the rocks. All clusters display rock magnetic characteristics in line with a stable NRM. This implies that the entire sampled sequence of ocean crust can contribute to marine magnetic anomalies. Determination of the absolute paleointensity with thermal techniques is not straightforward because of the propensity of oxyexsolution during laboratory heating and/or the presence of intergrowths. The upper part of the extrusive sequence, the granoblastic portion of the dikes, and moderately altered gabbros may contain a comparatively uncontaminated thermoremanent magnetization.
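
    The robustness criterion described above (a solution counts as robust when the cluster allocation does not depend on the starting configuration) can be illustrated with a small fuzzy c-means sketch. This is a minimal version under stated assumptions, not the authors' implementation. It uses the standard membership and center updates with fuzzifier m = 2 and checks only a handful of random starts rather than all possible ones. The function names and toy points are hypothetical.

```python
import math
import random


def _memberships(points, centers, m):
    rows = []
    for x in points:
        # Floor distances so a point sitting on a center does not divide by zero.
        inv = [max(math.dist(x, v), 1e-12) ** (-2.0 / (m - 1.0)) for v in centers]
        total = sum(inv)
        rows.append([w / total for w in inv])
    return rows


def _centers(points, memberships, c, m):
    dim = len(points[0])
    centers = []
    for k in range(c):
        # Each center is the mean of the points weighted by membership ** m.
        weights = [u[k] ** m for u in memberships]
        total = sum(weights)
        centers.append(tuple(
            sum(w * x[j] for w, x in zip(weights, points)) / total
            for j in range(dim)
        ))
    return centers


def fuzzy_c_means(points, c, m=2.0, iters=100, seed=0):
    """Return (centers, memberships) after a fixed number of iterations."""
    rng = random.Random(seed)
    centers = rng.sample(points, c)  # random starting configuration
    for _ in range(iters):
        centers = _centers(points, _memberships(points, centers, m), c, m)
    return centers, _memberships(points, centers, m)


def hard_partition(memberships):
    """Assign each point to its highest-membership cluster, ignoring label order."""
    groups = {}
    for i, u in enumerate(memberships):
        groups.setdefault(u.index(max(u)), set()).add(i)
    return frozenset(frozenset(g) for g in groups.values())


def is_stable(points, c, seeds=range(10)):
    """True if every start yields the same hard allocation of points."""
    partitions = {hard_partition(fuzzy_c_means(points, c, seed=s)[1]) for s in seeds}
    return len(partitions) == 1


# Hypothetical toy points forming two well-separated groups.
toy_points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (10.0, 10.0), (10.1, 10.2), (10.2, 10.1)]
print(is_stable(toy_points, 2))
```

    Comparing partitions as sets of index sets avoids false alarms when two runs find the same grouping but number the clusters differently.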

  1. Use of multivariate statistics to identify unreliable data obtained using CASA.

    PubMed

    Martínez, Luis Becerril; Crispín, Rubén Huerta; Mendoza, Maximino Méndez; Gallegos, Oswaldo Hernández; Martínez, Andrés Aragón

    2013-06-01

    In order to identify unreliable data in a dataset of motility parameters obtained from a pilot study acquired by a veterinarian with experience in boar semen handling, but without experience in the operation of a computer assisted sperm analysis (CASA) system, a multivariate graphical and statistical analysis was performed. Sixteen boar semen samples were aliquoted then incubated with varying concentrations of progesterone from 0 to 3.33 µg/ml and analyzed in a CASA system. After standardization of the data, Chernoff faces were pictured for each measurement, and a principal component analysis (PCA) was used to reduce the dimensionality and pre-process the data before hierarchical clustering. The first twelve individual measurements showed abnormal features when Chernoff faces were drawn. PCA revealed that principal components 1 and 2 explained 63.08% of the variance in the dataset. Values of principal components for each individual measurement of semen samples were mapped to identify differences among treatments or among boars. Twelve individual measurements presented low values of principal component 1. Confidence ellipses on the map of principal components showed no statistically significant effects for treatment or boar. Hierarchical clustering performed on the first two principal components produced three clusters. Cluster 1 contained evaluations of the first two samples in each treatment, each one from a different boar. With the exception of one individual measurement, all other measurements in cluster 1 were the same as those observed in abnormal Chernoff faces. Unreliable data in cluster 1 are probably related to the operator's inexperience with a CASA system. These findings could be used to objectively evaluate the skill level of an operator of a CASA system. This may be particularly useful in the quality control of semen analysis using CASA systems.
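
    The standardization and PCA steps mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's analysis. It z-scores each variable, then extracts only the first principal component by power iteration and reports its share of the total variance. The second component and the hierarchical clustering step are not shown. The function names and toy measurements are hypothetical.

```python
import math


def standardize(rows):
    """Z-score each column (mean 0, sample standard deviation 1)."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    sds = [math.sqrt(sum((v - mu) ** 2 for v in c) / (n - 1)) for c, mu in zip(cols, means)]
    return [[(v - mu) / sd for v, mu, sd in zip(row, means, sds)] for row in rows]


def covariance(rows):
    """Sample covariance matrix of the columns."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [
        [sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1) for j in range(p)]
        for i in range(p)
    ]


def first_component(cov, iters=500):
    """Dominant eigenvector and eigenvalue of a covariance matrix by power iteration."""
    p = len(cov)
    # Uneven start vector; assumed not to be orthogonal to the dominant eigenvector.
    vec = [float(i + 1) for i in range(p)]
    for _ in range(iters):
        nxt = [sum(cov[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(v * v for v in nxt))
        vec = [v / norm for v in nxt]
    eigenvalue = sum(vec[i] * cov[i][j] * vec[j] for i in range(p) for j in range(p))
    return vec, eigenvalue


def explained_by_first_component(rows):
    """Fraction of total variance carried by the first principal component."""
    cov = covariance(standardize(rows))
    _, eigenvalue = first_component(cov)
    return eigenvalue / sum(cov[i][i] for i in range(len(cov)))


# Hypothetical toy measurements (three variables per sample), not data from the study.
toy_rows = [
    [55.0, 30.1, 4.2],
    [60.0, 33.0, 3.9],
    [48.0, 27.5, 4.8],
    [70.0, 38.2, 4.0],
    [65.0, 35.9, 4.5],
]
print(round(explained_by_first_component(toy_rows), 3))
```

    Because the variables are standardized first, the covariance matrix is a correlation matrix and its trace equals the number of variables, so no single measurement scale dominates the components.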

  2. Discrimination of cultivation ages and cultivars of ginseng leaves using Fourier transform infrared spectroscopy combined with multivariate analysis

    PubMed Central

    Kwon, Yong-Kook; Ahn, Myung Suk; Park, Jong Suk; Liu, Jang Ryol; In, Dong Su; Min, Byung Whan; Kim, Suk Weon

    2013-01-01

    To determine whether Fourier transform (FT)-IR spectral analysis combined with multivariate analysis of whole-cell extracts from ginseng leaves can be applied as a high-throughput discrimination system of cultivation ages and cultivars, a total of 480 leaf samples belonging to 12 categories corresponding to four different cultivars (Yunpung, Kumpung, Chunpung, and an open-pollinated variety) and three different cultivation ages (1 yr, 2 yr, and 3 yr) were subjected to FT-IR. The spectral data were analyzed by principal component analysis and partial least squares-discriminant analysis. A dendrogram based on hierarchical clustering analysis of the FT-IR spectral data on ginseng leaves showed that leaf samples were initially segregated into three groups in a cultivation age-dependent manner. Then, within the same cultivation age group, leaf samples were clustered into four subgroups in a cultivar-dependent manner. The overall prediction accuracy for discrimination of cultivars and cultivation ages was 94.8% in a cross-validation test. These results clearly show that the FT-IR spectra combined with multivariate analysis from ginseng leaves can be applied as an alternative tool for discriminating ginseng cultivars and cultivation ages. Therefore, we suggest that this result could be used as a rapid and reliable F1 hybrid seed-screening tool for accelerating the conventional breeding of ginseng. PMID:24558311

  3. Hydrogeochemistry and water quality of the Kordkandi-Duzduzan plain, NW Iran: application of multivariate statistical analysis and PoS index.

    PubMed

    Soltani, Shahla; Asghari Moghaddam, Asghar; Barzegar, Rahim; Kazemian, Naeimeh; Tziritis, Evangelos

    2017-08-18

    Kordkandi-Duzduzan plain is one of the fertile plains of East Azarbaijan Province, NW of Iran. Groundwater is an important resource for drinking and agricultural purposes due to the lack of surface water resources in the region. The main objectives of the present study are to identify the hydrogeochemical processes and the potential sources of major, minor, and trace metals and metalloids such as Cr, Mn, Cd, Fe, Al, and As by using joint hydrogeochemical techniques and multivariate statistical analysis and to evaluate groundwater quality deterioration with the use of PoS environmental index. To achieve these objectives, 23 groundwater samples were collected in September 2015. Piper diagram shows that the mixed Ca-Mg-Cl is the dominant groundwater type, and some of the samples have Ca-HCO₃, Ca-Cl, and Na-Cl types. Multivariate statistical analyses indicate that weathering and dissolution of different rocks and minerals, e.g., silicates, gypsum, and halite, ion exchange, and agricultural activities influence the hydrogeochemistry of the study area. The cluster analysis divides the samples into two distinct clusters which are completely different in EC (and its dependent variables such as Na⁺, K⁺, Ca²⁺, Mg²⁺, SO₄²⁻, and Cl⁻), Cd, and Cr variables according to the ANOVA statistical test. Based on the median values, the concentrations of pH, NO₃⁻, SiO₂, and As in cluster 1 are elevated compared with those of cluster 2, while their maximum values occur in cluster 2. According to the PoS index, the dominant parameter that controls quality deterioration is As, with 60% of contribution. Samples of lowest PoS values are located in the southern and northern parts (recharge area) while samples of the highest values are located in the discharge area and the eastern part.

  4. Unsupervised pattern recognition methods in ciders profiling based on GCE voltammetric signals.

    PubMed

    Jakubowska, Małgorzata; Sordoń, Wanda; Ciepiela, Filip

    2016-07-15

    This work presents a complete methodology of distinguishing between different brands of cider and ageing degrees, based on voltammetric signals, utilizing dedicated data preprocessing procedures and unsupervised multivariate analysis. It was demonstrated that voltammograms recorded on glassy carbon electrode in Britton-Robinson buffer at pH 2 are reproducible for each brand. By application of clustering algorithms and principal component analysis visible homogenous clusters were obtained. Advanced signal processing strategy which included automatic baseline correction, interval scaling and continuous wavelet transform with dedicated mother wavelet, was a key step in the correct recognition of the objects. The results show that voltammetry combined with optimized univariate and multivariate data processing is a sufficient tool to distinguish between ciders from various brands and to evaluate their freshness. Copyright © 2016 Elsevier Ltd. All rights reserved.

  5. Agro-ecoregionalization of Iowa using multivariate geographical clustering

    Treesearch

    Carol L. Williams; William W. Hargrove; Matt Leibman; David E. James

    2008-01-01

    Agro-ecoregionalization is categorization of landscapes for use in crop suitability analysis, strategic agroeconomic development, risk analysis, and other purposes. Past agro-ecoregionalizations have been subjective, expert opinion driven, crop specific, and unsuitable for statistical extrapolation. Use of quantitative analytical methods provides an opportunity for...

  6. Applying Multivariate Adaptive Splines to Identify Genes With Expressions Varying After Diagnosis in Microarray Experiments.

    PubMed

    Duan, Fenghai; Xu, Ye

    2017-01-01

    To analyze a microarray experiment to identify the genes with expressions varying after the diagnosis of breast cancer. A total of 44 928 probe sets in an Affymetrix microarray data publicly available on Gene Expression Omnibus from 249 patients with breast cancer were analyzed by the nonparametric multivariate adaptive splines. Then, the identified genes with turning points were grouped by K-means clustering, and their network relationship was subsequently analyzed by the Ingenuity Pathway Analysis. In total, 1640 probe sets (genes) were reliably identified to have turning points along with the age at diagnosis in their expression profiling, of which 927 expressed lower after turning points and 713 expressed higher after the turning points. K-means clustered them into 3 groups with turning points centering at 54, 62.5, and 72, respectively. The pathway analysis showed that the identified genes were actively involved in various cancer-related functions or networks. In this article, we applied the nonparametric multivariate adaptive splines method to a publicly available gene expression data and successfully identified genes with expressions varying before and after breast cancer diagnosis.
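
    The K-means grouping of turning points described above can be sketched in one dimension. This is a minimal illustration of Lloyd's algorithm with a deterministic start, not the authors' analysis. The function name and the toy ages are hypothetical values invented for the example, not data from the study.

```python
def kmeans_1d(values, k, iters=100):
    """Lloyd's algorithm on scalar values; returns (centers, groups)."""
    data = sorted(values)
    # Deterministic start: centers at evenly spaced positions in the sorted data.
    centers = [data[(2 * i + 1) * len(data) // (2 * k)] for i in range(k)]
    groups = [[] for _ in range(k)]
    for _ in range(iters):
        # Assignment step: each value joins its nearest center.
        groups = [[] for _ in range(k)]
        for v in data:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Update step: each center moves to the mean of its group.
        updated = [sum(g) / len(g) if g else centers[i] for i, g in enumerate(groups)]
        if updated == centers:
            break
        centers = updated
    return centers, groups


# Hypothetical turning-point ages (years), invented for illustration only.
toy_ages = [52, 53, 54, 55, 56, 61, 62, 63, 64, 70, 72, 74]
centers, groups = kmeans_1d(toy_ages, 3)
print(centers)
```

    A fixed starting rule keeps the sketch reproducible. In practice K-means is usually run from several random starts, since the result can depend on initialization.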

  7. The effect of heavy metal contamination on the bacterial community structure at Jiaozhou Bay, China.

    PubMed

    Yao, Xie-Feng; Zhang, Jiu-Ming; Tian, Li; Guo, Jian-Hua

    In this study, determination of heavy metal parameters and microbiological characterization of marine sediments obtained from two heavily polluted sites and one low-grade contaminated reference station at Jiaozhou Bay in China were carried out. The microbial communities found in the sampled marine sediments were studied using PCR-DGGE (denaturing gradient gel electrophoresis) fingerprinting profiles in combination with multivariate analysis. Clustering analysis of DGGE and matrix of heavy metals displayed similar occurrence patterns. On this basis, 17 samples were classified into two clusters depending on the presence or absence of the high level contamination. Moreover, the cluster of highly contaminated samples was further classified into two sub-groups based on the stations of their origin. These results showed that the composition of the bacterial community is strongly influenced by heavy metal variables present in the sediments found in the Jiaozhou Bay. This study also suggested that metagenomic techniques such as PCR-DGGE fingerprinting in combination with multivariate analysis is an efficient method to examine the effect of metal contamination on the bacterial community structure. Copyright © 2016 Sociedade Brasileira de Microbiologia. Published by Elsevier Editora Ltda. All rights reserved.

  8. Groundwater flow and hydrogeochemical evolution in the Jianghan Plain, central China

    NASA Astrophysics Data System (ADS)

    Gan, Yiqun; Zhao, Ke; Deng, Yamin; Liang, Xing; Ma, Teng; Wang, Yanxin

    2018-05-01

    Hydrogeochemical analysis and multivariate statistics were applied to identify flow patterns and major processes controlling the hydrogeochemistry of groundwater in the Jianghan Plain, which is located in central Yangtze River Basin (central China) and characterized by intensive surface-water/groundwater interaction. Although HCO3-Ca-(Mg) type water predominated in the study area, the 457 (21 surface water and 436 groundwater) samples were effectively classified into five clusters by hierarchical cluster analysis. The hydrochemical variations among these clusters were governed by three factors from factor analysis. Major components (e.g., Ca, Mg and HCO3) in surface water and groundwater originated from carbonate and silicate weathering (factor 1). Redox conditions (factor 2) influenced the geogenic Fe and As contamination in shallow confined groundwater. Anthropogenic activities (factor 3) primarily caused high levels of Cl and SO4 in surface water and phreatic groundwater. Furthermore, the factor score 1 of samples in the shallow confined aquifer gradually increased along the flow paths. This study demonstrates that enhanced information on hydrochemistry in complex groundwater flow systems, by multivariate statistical methods, improves the understanding of groundwater flow and hydrogeochemical evolution due to natural and anthropogenic impacts.

  9. Epidemiological study of phylogenetic transmission clusters in a local HIV-1 epidemic reveals distinct differences between subtype B and non-B infections.

    PubMed

    Chalmet, Kristen; Staelens, Delfien; Blot, Stijn; Dinakis, Sylvie; Pelgrom, Jolanda; Plum, Jean; Vogelaers, Dirk; Vandekerckhove, Linos; Verhofstede, Chris

    2010-09-07

    The number of HIV-1 infected individuals in the Western world continues to rise. More in-depth understanding of regional HIV-1 epidemics is necessary for the optimal design and adequate use of future prevention strategies. The use of a combination of phylogenetic analysis of HIV sequences, with data on patients' demographics, infection route, clinical information and laboratory results, will allow a better characterization of individuals responsible for local transmission. Baseline HIV-1 pol sequences, obtained through routine drug-resistance testing, from 506 patients, newly diagnosed between 2001 and 2009, were used to construct phylogenetic trees and identify transmission-clusters. Patients' demographics, laboratory and clinical data, were retrieved anonymously. Statistical analysis was performed to identify subtype-specific and transmission-cluster-specific characteristics. Multivariate analysis showed significant differences between the 59.7% of individuals with subtype B infection and the 40.3% non-B infected individuals, with regard to route of transmission, origin, infection with Chlamydia (p = 0.01) and infection with Hepatitis C virus (p = 0.017). More and larger transmission-clusters were identified among the subtype B infections (p < 0.001). Overall, in multivariate analysis, clustering was significantly associated with Caucasian origin, infection through homosexual contact and younger age (all p < 0.001). Bivariate analysis additionally showed a correlation between clustering and syphilis (p < 0.001), higher CD4 counts (p = 0.002), Chlamydia infection (p = 0.013) and primary HIV (p = 0.017). Combination of phylogenetics with demographic information, laboratory and clinical data, revealed that HIV-1 subtype B infected Caucasian men-who-have-sex-with-men with high prevalence of sexually transmitted diseases, account for the majority of local HIV-transmissions. This finding elucidates observed epidemiological trends through molecular analysis, and justifies sustained focus in prevention on this high risk group.

  10. Comprehensive analysis of Polygoni Multiflori Radix of different geographical origins using ultra-high-performance liquid chromatography fingerprints and multivariate chemometric methods.

    PubMed

    Sun, Li-Li; Wang, Meng; Zhang, Hui-Jie; Liu, Ya-Nan; Ren, Xiao-Liang; Deng, Yan-Ru; Qi, Ai-Di

    2018-01-01

    Polygoni Multiflori Radix (PMR) is increasingly being used not just as a traditional herbal medicine but also as a popular functional food. In this study, multivariate chemometric methods and mass spectrometry were combined to analyze the ultra-high-performance liquid chromatograph (UPLC) fingerprints of PMR from six different geographical origins. A chemometric strategy based on multivariate curve resolution-alternating least squares (MCR-ALS) and three classification methods is proposed to analyze the UPLC fingerprints obtained. Common chromatographic problems, including the background contribution, baseline contribution, and peak overlap, were handled by the established MCR-ALS model. A total of 22 components were resolved. Moreover, relative species concentrations were obtained from the MCR-ALS model, which was used for multivariate classification analysis. Principal component analysis (PCA) and Ward's method have been applied to classify 72 PMR samples from six different geographical regions. The PCA score plot showed that the PMR samples fell into four clusters, which related to the geographical location and climate of the source areas. The results were then corroborated by Ward's method. In addition, according to the variance-weighted distance between cluster centers obtained from Ward's method, five components were identified as the most significant variables (chemical markers) for cluster discrimination. A counter-propagation artificial neural network has been applied to confirm and predict the effects of chemical markers on different samples. Finally, the five chemical markers were identified by UPLC-quadrupole time-of-flight mass spectrometer. Components 3, 12, 16, 18, and 19 were identified as 2,3,5,4'-tetrahydroxy-stilbene-2-O-β-d-glucoside, emodin-8-O-β-d-glucopyranoside, emodin-8-O-(6'-O-acetyl)-β-d-glucopyranoside, emodin, and physcion, respectively. In conclusion, the proposed method can be applied for the comprehensive analysis of natural samples. Copyright © 2016. Published by Elsevier B.V.

  11. Motivational Profiles of Adult Learners

    ERIC Educational Resources Information Center

    Rothes, Ana; Lemos, Marina S.; Gonçalves, Teresa

    2017-01-01

    This study investigated profiles of autonomous and controlled motivation and their effects in a sample of 188 adult learners from two Portuguese urban areas. Using a person-centered approach, results of cluster analysis and multivariate analysis of covariance revealed four motivational groups with different effects in self-efficacy, engagement,…

  12. Mean Comparison: Manifest Variable versus Latent Variable

    ERIC Educational Resources Information Center

    Yuan, Ke-Hai; Bentler, Peter M.

    2006-01-01

    An extension of multiple correspondence analysis is proposed that takes into account cluster-level heterogeneity in respondents' preferences/choices. The method involves combining multiple correspondence analysis and k-means in a unified framework. The former is used for uncovering a low-dimensional space of multivariate categorical variables…

  13. Muscle ischaemia associated with NXP2 autoantibodies: a severe subtype of juvenile dermatomyositis.

    PubMed

    Aouizerate, Jessie; De Antonio, Marie; Bader-Meunier, Brigitte; Barnerias, Christine; Bodemer, Christine; Isapof, Arnaud; Quartier, Pierre; Melki, Isabelle; Charuel, Jean-Luc; Bassez, Guillaume; Desguerre, Isabelle; Gherardi, Romain K; Authier, François-Jérôme; Gitiaux, Cyril

    2018-05-01

    Myositis-specific autoantibodies (MSAs) are increasingly used to delineate distinct subgroups of JDM. The aim of our study was to explore without a priori hypotheses whether MSAs are associated with distinct clinical-pathological changes and severity in a monocentric JDM cohort. Clinical, biological and histological findings from 23 JDM patients were assessed. Twenty-six histopathological parameters were subjected to multivariate analysis. Autoantibodies included anti-NXP2 (9/23), anti-TIF1γ (4/23), anti-MDA5 (2/23), no MSAs (8/23). Multivariate analysis yielded two histopathological clusters. Cluster 1 (n = 11) showed a more severe and ischaemic pattern than cluster 2 (n = 12) assessed by: total score severity ⩾ 20 (100.0% vs 25.0%); visual analogic score ⩾ 6 (100.0% vs 25.0%); the vascular domain score > 1 (100.0% vs 41.7%); microinfarcts (100.0% vs 58.3%); ischaemic myofibrillary loss (focal punched-out vacuoles) (90.9% vs 25.0%); and obvious capillary loss (81.8% vs 16.7%). Compared with cluster 2, patients in cluster 1 more often had anti-NXP2 antibodies (7/11 vs 2/12), had more pronounced muscle weakness and more gastrointestinal involvement, and required more aggressive treatment. Furthermore, patients with anti-NXP2 antibodies, mostly assigned to the first cluster, also displayed more severe muscular disease, requiring more aggressive treatment and having a lower remission rate during the follow-up period. Marked muscle ischaemic involvement and the presence of anti-NXP2 autoantibodies are associated with more severe forms of JDM.

  14. Spatial characterization of dissolved trace elements and heavy metals in the upper Han River (China) using multivariate statistical techniques.

    PubMed

    Li, Siyue; Zhang, Quanfa

    2010-04-15

    A data matrix (4032 observations), obtained during a 2-year monitoring period (2005-2006) from 42 sites in the upper Han River, was subjected to various multivariate statistical techniques including cluster analysis, principal component analysis (PCA), factor analysis (FA), correlation analysis and analysis of variance to determine the spatial characterization of dissolved trace elements and heavy metals. Our results indicate that waters in the upper Han River are primarily polluted by Al, As, Cd, Pb, Sb and Se, and the potential pollutants include Ba, Cr, Hg, Mn and Ni. The spatial distribution of trace metals indicates that the polluted sections are mainly concentrated in the Danjiang, Danjiangkou Reservoir catchment and Hanzhong Plain, and the most contaminated river is in the Hanzhong Plain. Q-model clustering depends on the geographical location of sampling sites and groups the 42 sampling sites into four clusters, i.e., Danjiang, Danjiangkou Reservoir region (lower catchment), upper catchment and one river in headwaters pertaining to water quality. The headwaters, Danjiang and lower catchment, and upper catchment correspond to very highly polluted, moderately polluted and relatively lightly polluted regions, respectively. Additionally, PCA/FA and correlation analysis demonstrate that Al, Cd, Mn, Ni, Fe, Si and Sr are controlled by natural sources, whereas the other metals appear to be primarily controlled by anthropogenic origins, although geogenic sources also contribute to them.

  15. Source Evaluation and Trace Metal Contamination in Benthic Sediments from Equatorial Ecosystems Using Multivariate Statistical Techniques

    PubMed Central

    Benson, Nsikak U.; Asuquo, Francis E.; Williams, Akan B.; Essien, Joseph P.; Ekong, Cyril I.; Akpabio, Otobong; Olajire, Abaas A.

    2016-01-01

    Trace metal (Cd, Cr, Cu, Ni and Pb) concentrations in benthic sediments were analyzed through a multi-step fractionation scheme to assess the levels and sources of contamination in estuarine, riverine and freshwater ecosystems in the Niger Delta (Nigeria). The degree of contamination was assessed using the individual contamination factors (ICF) and global contamination factor (GCF). Multivariate statistical approaches including principal component analysis (PCA), cluster analysis and correlation test were employed to evaluate the interrelationships and associated sources of contamination. The spatial distribution of metal concentrations followed the pattern Pb>Cu>Cr>Cd>Ni. Ecological risk index by ICF showed significant potential mobility and bioavailability for Cu, Cu and Ni. The ICF contamination trend in the benthic sediments at all studied sites was Cu>Cr>Ni>Cd>Pb. The principal component and agglomerative clustering analyses indicate that trace metal contamination in the ecosystems was influenced by multiple pollution sources. PMID:27257934

  16. Analysis and assessment on heavy metal sources in the coastal soils developed from alluvial deposits using multivariate statistical methods.

    PubMed

    Li, Jinling; He, Ming; Han, Wei; Gu, Yifan

    2009-05-30

    An investigation of the sources of heavy metals (Cu, Zn, Ni, Pb, Cr, and Cd) in the coastal soils of Shanghai, China, was conducted using multivariate statistical methods (principal component analysis, clustering analysis, and correlation analysis). All the results of the multivariate analysis showed that: (i) Cu, Ni, Pb, and Cd had anthropogenic sources (e.g., overuse of chemical fertilizers and pesticides, industrial and municipal discharges, animal wastes, sewage irrigation, etc.); (ii) Zn and Cr were associated with parent materials and therefore had natural sources (e.g., the weathering process of parent materials and subsequent pedogenesis due to the alluvial deposits). The effect of heavy metals in the soils was greatly affected by soil formation, atmospheric deposition, and human activities. These findings provide essential information on the possible sources of heavy metals, which would contribute to the monitoring and assessment of agricultural soils worldwide.

  17. Discrimination of three Pegaga (Centella) varieties and determination of growth-lighting effects on metabolites content based on the chemometry of 1H nuclear magnetic resonance spectroscopy.

    PubMed

    H, Maulidiani; Khatib, Alfi; Shaari, Khozirah; Abas, Faridah; Shitan, Mahendran; Kneer, Ralf; Neto, Victor; Lajis, Nordin H

    2012-01-11

    The metabolites of three species of Apiaceae, also known as Pegaga, were analyzed utilizing 1H NMR spectroscopy and multivariate data analysis. Principal component analysis (PCA) and hierarchical cluster analysis (HCA) resolved the species, Centella asiatica, Hydrocotyle bonariensis, and Hydrocotyle sibthorpioides, into three clusters. The saponins, asiaticoside and madecassoside, along with chlorogenic acids were the metabolites that contributed most to the separation. Furthermore, the effects of growth-lighting conditions on metabolite contents were also investigated. The extracts of C. asiatica grown in full-day light exposure exhibited a stronger radical scavenging activity and contained more triterpenes (asiaticoside and madecassoside), flavonoids, and chlorogenic acids as compared to plants grown in 50% shade. This study established the potential of using a combination of 1H NMR spectroscopy and multivariate data analyses in differentiating three closely related species and the effects of growth lighting, based on their metabolite contents and identification of the markers contributing to their differences.

  18. Using multivariate techniques to assess the effects of urbanization on surface water quality: a case study in the Liangjiang New Area, China.

    PubMed

    Luo, Kun; Hu, Xuebin; He, Qiang; Wu, Zhengsong; Cheng, Hao; Hu, Zhenlong; Mazumder, Asit

    2017-04-01

    Rapid urbanization in China has been causing dramatic deterioration in the water quality of rivers and threatening aquatic ecosystem health. In this paper, multivariate techniques, such as factor analysis (FA) and cluster analysis (CA), were applied to analyze the water quality datasets for 19 rivers in Liangjiang New Area (LJNA), China, collected in April (dry season) and September (wet season) of 2014 and 2015. In most sampled rivers, total phosphorus, total nitrogen, and fecal coliform exceeded the Class V guideline (GB3838-2002), which could thereby threaten the water quality in the Yangtze and Jialing Rivers. FA clearly identified five groups of water quality variables, which explain the majority of the experimental data. Nutrient pollution, seasonal changes, and construction activities were three key factors influencing the rivers' water quality in LJNA. CA grouped the 19 sampling sites into two clusters, which were located in sub-catchments with high- and low-level urbanization, respectively. One-way ANOVA showed that the nutrients (total phosphorus, soluble reactive phosphorus, total nitrogen, ammonium nitrogen, and nitrite), fecal coliform, and conductivity in cluster 1 were significantly greater than in cluster 2. Thus, catchment urbanization degraded the rivers' water quality in Liangjiang New Area. Identifying effective buffer zones at the riparian scale to weaken the negative impacts of catchment urbanization was recommended.
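
    The one-way ANOVA comparison between the two clusters can be illustrated with a minimal sketch. The function name and the sample values below are hypothetical and are not taken from the study; the code only computes the textbook F statistic (between-group mean square divided by within-group mean square) and leaves the significance lookup to a statistics package.

```python
def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of numeric groups."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: spread of observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical total phosphorus readings (mg/L) for sites in two clusters.
cluster_1 = [0.42, 0.51, 0.47, 0.55]
cluster_2 = [0.12, 0.18, 0.15, 0.10]
f_stat = one_way_anova_f([cluster_1, cluster_2])
print(f"F = {f_stat:.2f}")
```

    A large F value relative to the F distribution with (k - 1, N - k) degrees of freedom would indicate that the cluster means differ.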

  19. Some Integrated Squared Error Procedures for Multivariate Normal Data,

    DTIC Science & Technology

    1986-01-01

    a linear regression or experimental design model). Our procedures have also been used widely on non-linear models but we do not address non-linear...of fit, outliers, influence functions, experimental design, cluster analysis, robustness...structured data such as multivariate experimental designs. Several illustrations are provided.

  20. Parallel Multivariate Spatio-Temporal Clustering of Large Ecological Datasets on Hybrid Supercomputers

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sreepathi, Sarat; Kumar, Jitendra; Mills, Richard T.

    A proliferation of data from vast networks of remote sensing platforms (satellites, unmanned aircraft systems (UAS), airborne etc.), observational facilities (meteorological, eddy covariance etc.), state-of-the-art sensors, and simulation models offer unprecedented opportunities for scientific discovery. Unsupervised classification is a widely applied data mining approach to derive insights from such data. However, classification of very large data sets is a complex computational problem that requires efficient numerical algorithms and implementations on high performance computing (HPC) platforms. Additionally, increasing power, space, cooling and efficiency requirements have led to the deployment of hybrid supercomputing platforms with complex architectures and memory hierarchies like the Titan system at Oak Ridge National Laboratory. The advent of such accelerated computing architectures offers new challenges and opportunities for big data analytics in general and specifically, large scale cluster analysis in our case. Although there is an existing body of work on parallel cluster analysis, those approaches do not fully meet the needs imposed by the nature and size of our large data sets. Moreover, they had scaling limitations and were mostly limited to traditional distributed memory computing platforms. We present a parallel Multivariate Spatio-Temporal Clustering (MSTC) technique based on k-means cluster analysis that can target hybrid supercomputers like Titan. We developed a hybrid MPI, CUDA and OpenACC implementation that can utilize both CPU and GPU resources on computational nodes. We describe performance results on Titan that demonstrate the scalability and efficacy of our approach in processing large ecological data sets.
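
    For orientation, the k-means step that MSTC builds on can be sketched serially. This is a plain, single-process version of Lloyd's algorithm written for illustration only; it is not the authors' MPI/CUDA/OpenACC implementation, and the function name, parameters and toy data are assumptions.

```python
import random


def kmeans(points, k, iterations=50, seed=0):
    """Serial Lloyd's k-means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k distinct data points
    labels = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid
        # (squared Euclidean distance).
        new_labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the run has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


# Toy data: two well-separated groups of 2-D observations.
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = kmeans(data, k=2)
```

    In a parallel setting the assignment step is the natural part to distribute, since each point's nearest centroid can be computed independently.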

  1. Clustering of Variables for Mixed Data

    NASA Astrophysics Data System (ADS)

    Saracco, J.; Chavent, M.

    2016-05-01

    This chapter presents clustering of variables, whose aim is to lump together strongly related variables. The proposed approach works on a mixed data set, i.e. on a data set which contains numerical variables and categorical variables. Two algorithms for clustering of variables are described: a hierarchical clustering and a k-means type clustering. A brief description of the PCAmix method (that is, a principal component analysis for mixed data) is provided, since the calculation of the synthetic variables summarizing the obtained clusters of variables is based on this multivariate method. Finally, the R packages ClustOfVar and PCAmixdata are illustrated on real mixed data. The PCAmix and ClustOfVar approaches are first used for dimension reduction (step 1) before applying in step 2 a standard clustering method to obtain groups of individuals.

  2. ANALYSIS OF LOTIC MACROINVERTEBRATE ASSEMBLAGES IN CALIFORNIA'S CENTRAL VALLEY

    EPA Science Inventory

    Using multivariate and cluster analyses, we examined the relationships between chemical and physical characteristics and macroinvertebrate assemblages at sites sampled by R-EMAP in California's Central Valley. By contrasting results where community structure was summarized as met...

  3. Is It Feasible to Identify Natural Clusters of TSC-Associated Neuropsychiatric Disorders (TAND)?

    PubMed

    Leclezio, Loren; Gardner-Lubbe, Sugnet; de Vries, Petrus J

    2018-04-01

    Tuberous sclerosis complex (TSC) is a genetic disorder with multisystem involvement. The lifetime prevalence of TSC-Associated Neuropsychiatric Disorders (TAND) is in the region of 90% in an apparently unique, individual pattern. This "uniqueness" poses significant challenges for diagnosis, psycho-education, and intervention planning. To date, no studies have explored whether there may be natural clusters of TAND. The purpose of this feasibility study was (1) to investigate the practicability of identifying natural TAND clusters, and (2) to identify appropriate multivariate data analysis techniques for larger-scale studies. TAND Checklist data were collected from 56 individuals with a clinical diagnosis of TSC (n = 20 from South Africa; n = 36 from Australia). Using R, the open-source statistical platform, mean squared contingency coefficients were calculated to produce a correlation matrix, and various cluster analyses and exploratory factor analysis were examined. Ward's method rendered six TAND clusters with good face validity and significant convergence with a six-factor exploratory factor analysis solution. The "bottom-up" data-driven strategies identified a "scholastic" cluster of TAND manifestations, an "autism spectrum disorder-like" cluster, a "dysregulated behavior" cluster, a "neuropsychological" cluster, a "hyperactive/impulsive" cluster, and a "mixed/mood" cluster. These feasibility results suggest that a combination of cluster analysis and exploratory factor analysis methods may be able to identify clinically meaningful natural TAND clusters. Findings require replication and expansion in larger datasets, and could include quantification of cluster or factor scores at an individual level.
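
    The correlation-matrix step can be sketched as follows, assuming the checklist items are coded as binary (0/1) responses, in which case the mean square contingency coefficient is the phi coefficient. The study itself used R; this Python version, its function names and its toy responses are illustrative assumptions only.

```python
from math import sqrt


def phi(x, y):
    """Phi (mean square contingency) coefficient for two binary sequences."""
    n11 = sum(1 for a, b in zip(x, y) if a and b)
    n10 = sum(1 for a, b in zip(x, y) if a and not b)
    n01 = sum(1 for a, b in zip(x, y) if not a and b)
    n00 = sum(1 for a, b in zip(x, y) if not a and not b)
    denom = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    # An item with no variation has an undefined phi; return 0.0 by convention here.
    return (n11 * n00 - n10 * n01) / denom if denom else 0.0


def phi_matrix(items):
    """Pairwise phi coefficients for a list of binary item vectors."""
    return [[phi(a, b) for b in items] for a in items]


# Hypothetical responses of four individuals to three checklist items.
items = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
]
matrix = phi_matrix(items)
```

    A matrix of this kind can then be converted to dissimilarities (for example 1 minus phi) and passed to a hierarchical clustering routine such as Ward's method.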

  4. REGIONAL-SCALE WIND FIELD CLASSIFICATION EMPLOYING CLUSTER ANALYSIS

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Glascoe, L G; Glaser, R E; Chin, H S

    2004-06-17

    The classification of time-varying multivariate regional-scale wind fields at a specific location can assist event planning as well as consequence and risk analysis. Further, wind field classification involves data transformation and inference techniques that effectively characterize stochastic wind field variation. Such a classification scheme is potentially useful for addressing overall atmospheric transport uncertainty and meteorological parameter sensitivity issues. Different methods to classify wind fields over a location include the principal component analysis of wind data (e.g., Hardy and Walton, 1978) and the use of cluster analysis for wind data (e.g., Green et al., 1992; Kaufmann and Weber, 1996). The goal of this study is to use a clustering method to classify the winds of a gridded data set, i.e., from meteorological simulations generated by a forecast model.

  5. Improved Ant Colony Clustering Algorithm and Its Performance Study

    PubMed Central

    Gao, Wei

    2016-01-01

    Clustering analysis is used in many disciplines and applications; it is an important tool that descriptively identifies homogeneous groups of objects based on attribute values. The ant colony clustering algorithm is a swarm-intelligent method used for clustering problems that is inspired by the behavior of ant colonies that cluster their corpses and sort their larvae. A new abstraction ant colony clustering algorithm using a data combination mechanism is proposed to improve the computational efficiency and accuracy of the ant colony clustering algorithm. The abstraction ant colony clustering algorithm is used to cluster benchmark problems, and its performance is compared with the ant colony clustering algorithm and other methods used in existing literature. Based on similar computational difficulties and complexities, the results show that the abstraction ant colony clustering algorithm produces results that are not only more accurate but also more efficiently determined than the ant colony clustering algorithm and the other methods. Thus, the abstraction ant colony clustering algorithm can be used for efficient multivariate data clustering. PMID:26839533

  6. Computer-based self-organized tectonic zoning: a tentative pattern recognition for Iran

    NASA Astrophysics Data System (ADS)

    Zamani, Ahmad; Hashemi, Naser

    2004-08-01

    Conventional methods of tectonic zoning are frequently characterized by two deficiencies. The first one is the large uncertainty involved in tectonic zoning based on non-quantitative and subjective analysis. Failure to interpret accurately a large amount of data "by eye" is the second. In order to alleviate each of these deficiencies, the multivariate statistical method of cluster analysis has been utilized to seek and separate zones with similar tectonic pattern and construct automated self-organized multivariate tectonic zoning maps. This analytical method of tectonic regionalization is particularly useful for showing trends in tectonic evolution of a region that could not be discovered by any other means. To illustrate, this method has been applied for producing a general-purpose numerical tectonic zoning map of Iran. While there are some similarities between the self-organized multivariate numerical maps and the conventional maps, the cluster solution maps reveal some remarkable features that cannot be observed on the current tectonic maps. The following specific examples need to be noted: (1) The much disputed extent and rigidity of the Lut Rigid Block, described as the microplate of east Iran, is clearly revealed on the self-organized numerical maps. (2) The cluster solution maps reveal a striking similarity between this microplate and the northern Central Iran, including the Great Kavir region. (3) Contrary to the conventional map, the cluster solution maps make a clear distinction between the East Iranian Ranges and the Makran Mountains. (4) Moreover, an interesting similarity between the Azarbaijan region in the northwest and the Makran Mountains in the southeast and between the Kopet Dagh Ranges in the northeast and the Zagros Folded Belt in the southwest of Iran are revealed in the clustering process. This new approach to tectonic zoning is a starting point and is expected to be improved and refined by collection of new data. The method is also a useful tool in studying neotectonics, seismotectonics, seismic zoning, and hazard estimation of the seismogenic regions.

  7. Variable selection based on clustering analysis for improvement of polyphenols prediction in green tea using synchronous fluorescence spectra

    NASA Astrophysics Data System (ADS)

    Shan, Jiajia; Wang, Xue; Zhou, Hao; Han, Shuqing; Riza, Dimas Firmanda Al; Kondo, Naoshi

    2018-04-01

    Synchronous fluorescence spectra combined with multivariate analysis were used to predict flavonoids content in green tea rapidly and nondestructively. This paper presented a new and efficient spectral intervals selection method called clustering based partial least square (CL-PLS), which selected informative wavelengths by combining clustering concept and partial least square (PLS) methods to improve models' performance by synchronous fluorescence spectra. The fluorescence spectra of tea samples were obtained, k-means and Kohonen self-organizing map clustering algorithms were carried out to cluster the full spectra into several clusters, and a sub-PLS regression model was developed on each cluster. Finally, CL-PLS models consisting of gradually selected clusters were built. Correlation coefficient (R) was used to evaluate the effect on prediction performance of PLS models. In addition, variable influence on projection partial least square (VIP-PLS), selectivity ratio partial least square (SR-PLS), interval partial least square (iPLS) models and full spectra PLS model were investigated and the results were compared. The results showed that CL-PLS presented the best result for flavonoids prediction using synchronous fluorescence spectra.

  8. Variable selection based on clustering analysis for improvement of polyphenols prediction in green tea using synchronous fluorescence spectra.

    PubMed

    Shan, Jiajia; Wang, Xue; Zhou, Hao; Han, Shuqing; Riza, Dimas Firmanda Al; Kondo, Naoshi

    2018-03-13

    Synchronous fluorescence spectra combined with multivariate analysis were used to predict flavonoids content in green tea rapidly and nondestructively. This paper presented a new and efficient spectral intervals selection method called clustering based partial least square (CL-PLS), which selected informative wavelengths by combining clustering concept and partial least square (PLS) methods to improve models' performance by synchronous fluorescence spectra. The fluorescence spectra of tea samples were obtained, k-means and Kohonen self-organizing map clustering algorithms were carried out to cluster the full spectra into several clusters, and a sub-PLS regression model was developed on each cluster. Finally, CL-PLS models consisting of gradually selected clusters were built. Correlation coefficient (R) was used to evaluate the effect on prediction performance of PLS models. In addition, variable influence on projection partial least square (VIP-PLS), selectivity ratio partial least square (SR-PLS), interval partial least square (iPLS) models and full spectra PLS model were investigated and the results were compared. The results showed that CL-PLS presented the best result for flavonoids prediction using synchronous fluorescence spectra.

  9. A Framework for Establishing Standard Reference Scale of Texture by Multivariate Statistical Analysis Based on Instrumental Measurement and Sensory Evaluation.

    PubMed

    Zhi, Ruicong; Zhao, Lei; Xie, Nan; Wang, Houyin; Shi, Bolin; Shi, Jingye

    2016-01-13

    A framework for establishing a standard reference scale (texture) is proposed, using multivariate statistical analysis of instrumental measurement and sensory evaluation. Multivariate statistical analysis is conducted to rapidly select typical reference samples with characteristics of universality, representativeness, stability, substitutability, and traceability. The reasonableness of the framework method is verified by establishing a standard reference scale for a texture attribute (hardness) with well-known Chinese foods. More than 100 food products in 16 categories were tested using instrumental measurement (TPA test), and the result was analyzed with clustering analysis, principal component analysis, relative standard deviation, and analysis of variance. As a result, nine kinds of foods were determined to construct the hardness standard reference scale. The results indicate that the regression coefficient between the estimated sensory value and the instrumentally measured value is significant (R² = 0.9765), which fits well with Stevens's theory. The research provides a reliable theoretical basis and practical guide for quantitative standard reference scale establishment on food texture characteristics.
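
    One way to read the reference to Stevens's theory is as a power-law relation between perceived intensity and physical magnitude, sensation = k · stimulus^a, which becomes a straight line after taking logarithms. The sketch below fits that form by ordinary least squares on the log-log scale. It is an assumption that the study used this functional form; the function name and the data are made up for illustration.

```python
from math import exp, log


def fit_power_law(stimulus, sensation):
    """Fit sensation = k * stimulus**a by least squares on log-log data.

    Returns (k, a, r_squared), where r_squared is computed on the log scale.
    """
    xs = [log(v) for v in stimulus]
    ys = [log(v) for v in sensation]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx                      # exponent (slope of the log-log line)
    k = exp(mean_y - a * mean_x)       # scale factor (from the intercept)
    ss_res = sum((y - (mean_y + a * (x - mean_x))) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return k, a, 1.0 - ss_res / ss_tot


# Hypothetical instrumental hardness readings and panel scores.
hardness = [1.0, 2.0, 4.0, 8.0]
scores = [2.0 * h ** 0.5 for h in hardness]  # synthetic, exactly power-law
k, a, r2 = fit_power_law(hardness, scores)
```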

  10. Human Adenocarcinoma Cell Line Sensitivity to Essential Oil Phytocomplexes from Pistacia Species: a Multivariate Approach.

    PubMed

    Buriani, Alessandro; Fortinguerra, Stefano; Sorrenti, Vincenzo; Dall'Acqua, Stefano; Innocenti, Gabbriella; Montopoli, Monica; Gabbia, Daniela; Carrara, Maria

    2017-08-11

    Principal component analysis (PCA) multivariate analysis was applied to study the cytotoxic activity of essential oils from various species of the Pistacia genus on human tumor cell lines. In particular, the cytotoxic activity of essential oils obtained from P. lentiscus, P. lentiscus var. chia (mastic gum), P. terebinthus, P. vera, and P. integerrima was screened on three human adenocarcinoma cell lines: MCF-7 (breast), 2008 (ovarian), and LoVo (colon). The results indicate that all the Pistacia phytocomplexes, with the exception of mastic gum oil, induce cytotoxic effects on one or more of the three cell lines. PCA highlighted the presence of different cooperating clusters of bioactive molecules. Cluster variability among species, and even within the same species, could explain some of the differences seen among samples, suggesting the presence of both common and species-specific mechanisms. Single molecules from one of the most significant clusters were tested, but only bornyl-acetate presented cytotoxic activity, although at much higher concentrations (IC50 = 138.5 µg/mL) than those present in the essential oils, indicating that understanding of the full biological effect requires a holistic vision of the phytocomplexes with all their constituents.

  11. An open-source software package for multivariate modeling and clustering: applications to air quality management.

    PubMed

    Wang, Xiuquan; Huang, Guohe; Zhao, Shan; Guo, Junhong

    2015-09-01

    This paper presents an open-source software package, rSCA, which is developed based upon a stepwise cluster analysis method and serves as a statistical tool for modeling the relationships between multiple dependent and independent variables. The rSCA package is efficient in dealing with both continuous and discrete variables, as well as nonlinear relationships between the variables. It divides the sample sets of dependent variables into different subsets (or subclusters) through a series of cutting and merging operations based upon the theory of multivariate analysis of variance (MANOVA). The modeling results are given by a cluster tree, which includes both intermediate and leaf subclusters as well as the flow paths from the root of the tree to each leaf subcluster specified by a series of cutting and merging actions. The rSCA package is a handy and easy-to-use tool and is freely available at http://cran.r-project.org/package=rSCA . By applying the developed package to air quality management in an urban environment, we demonstrate its effectiveness in dealing with the complicated relationships among multiple variables in real-world problems.

  12. Multivariate Analysis and Its Applications

    DTIC Science & Technology

    1989-02-14

    defined in situations where measurements are taken on natural clusters of individuals like brothers in a family. A number of problems arise in the study of...intraclass correlations. How do we estimate it when observations are available on clusters of different sizes? How do we test the hypothesis that the...the random variable y(X) = θ_1 X_1 + θ_2 X_2 + ... + θ_m X_m follows an exponential distribution with mean unity. Such a class of life distributions has a

  13. Nearest clusters based partial least squares discriminant analysis for the classification of spectral data.

    PubMed

    Song, Weiran; Wang, Hui; Maguire, Paul; Nibouche, Omar

    2018-06-07

    Partial Least Squares Discriminant Analysis (PLS-DA) is one of the most effective multivariate analysis methods for spectral data analysis, which extracts latent variables and uses them to predict responses. In particular, it is an effective method for handling high-dimensional and collinear spectral data. However, PLS-DA does not explicitly address data multimodality, i.e., within-class multimodal distribution of data. In this paper, we present a novel method termed nearest clusters based PLS-DA (NCPLS-DA) for addressing the multimodality and nonlinearity issues explicitly and improving the performance of PLS-DA on spectral data classification. The new method applies hierarchical clustering to divide samples into clusters and calculates the corresponding centre of every cluster. For a given query point, only clusters whose centres are nearest to such a query point are used for PLS-DA. Such a method can provide a simple and effective tool for separating multimodal and nonlinear classes into clusters which are locally linear and unimodal. Experimental results on 17 datasets, including 12 UCI and 5 spectral datasets, show that NCPLS-DA can outperform 4 baseline methods, namely, PLS-DA, kernel PLS-DA, local PLS-DA and k-NN, achieving the highest classification accuracy most of the time.
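
    The cluster-selection step described above can be sketched in isolation. The code assumes cluster labels are already available (for example from a hierarchical clustering run) and only shows how the training samples belonging to the clusters nearest a query point might be picked; fitting PLS-DA on that subset is omitted. Function names, the `n_clusters` parameter and the toy data are hypothetical and do not come from the paper.

```python
def cluster_centres(samples, labels):
    """Mean vector of each cluster, keyed by cluster label."""
    groups = {}
    for sample, label in zip(samples, labels):
        groups.setdefault(label, []).append(sample)
    return {
        label: tuple(sum(col) / len(members) for col in zip(*members))
        for label, members in groups.items()
    }


def nearest_cluster_subset(samples, labels, query, n_clusters=1):
    """Indices of the samples in the n_clusters clusters closest to query."""
    centres = cluster_centres(samples, labels)
    # Rank clusters by squared Euclidean distance from the query to each centre.
    ranked = sorted(
        centres,
        key=lambda lab: sum((q - c) ** 2 for q, c in zip(query, centres[lab])),
    )
    keep = set(ranked[:n_clusters])
    return [i for i, lab in enumerate(labels) if lab in keep]


# Toy 2-D "spectra" with precomputed cluster labels.
samples = [(0, 0), (0, 2), (10, 10), (10, 12), (20, 0)]
labels = [0, 0, 1, 1, 2]
local_training_set = nearest_cluster_subset(samples, labels, query=(9, 9))
```

    A local model trained on `local_training_set` would then be used to classify the query point.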

  14. Multivariate Statistical Analysis of Water Quality data in Indian River Lagoon, Florida

    NASA Astrophysics Data System (ADS)

    Sayemuzzaman, M.; Ye, M.

    2015-12-01

    The Indian River Lagoon, part of the longest barrier island complex in the United States, is a region of particular concern to environmental scientists because of the rapid rate of human development throughout the region and its geographical position between the colder temperate zone and the warmer sub-tropical zone. Surface water quality analysis in this region therefore continues to yield new information. In the present study, multivariate statistical procedures were applied to analyze the spatial and temporal water quality in the Indian River Lagoon over the period 1998-2013. Twelve parameters were analyzed at twelve key water monitoring stations in and beside the lagoon using monthly datasets (a total of 27,648 observations). The dataset was treated using cluster analysis (CA), principal component analysis (PCA) and non-parametric trend analysis. The CA was used to cluster the twelve monitoring stations into four groups, with stations having similar surrounding characteristics falling in the same group. The PCA was then applied to each group to find the important water quality parameters. The principal components (PCs) PC1 to PC5 were considered based on explained cumulative variances of 75% to 85% in each cluster group. Nutrient species (phosphorus and nitrogen), salinity, specific conductivity and erosion factors (TSS, turbidity) were the major variables involved in the construction of the PCs. Statistically significant positive or negative trends and abrupt trend shifts were detected by applying the Mann-Kendall trend test and the Sequential Mann-Kendall (SQMK) test to each individual station for the important water quality parameters. Land use and land cover change patterns, local anthropogenic activities and extreme climate events such as drought might be associated with these trends. This study presents a multivariate statistical assessment intended to provide better information about the quality of surface water, so that effective pollution control and management of the surface waters can be undertaken.
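
    The Mann-Kendall trend test named in this abstract can be sketched in a few lines. The version below is a minimal illustration that assumes a series without tied values and uses the normal approximation; the function name `mann_kendall` and the sample values are hypothetical and are not taken from the study.

```python
import math

def mann_kendall(series):
    """Return (S, Z, two-sided p) for the Mann-Kendall trend test.

    Assumes no tied values, so the simple variance formula applies.
    """
    n = len(series)
    # S is the number of increasing pairs minus decreasing pairs over all i < j.
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    # Continuity-corrected standard normal score.
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    # Two-sided tail probability of the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return s, z, p

# Hypothetical monthly values, invented for illustration only.
rising = [1.0, 1.2, 1.5, 1.9, 2.4, 3.0, 3.7, 4.5, 5.4, 6.4]
s_stat, z_score, p_value = mann_kendall(rising)
```

    A positive S with a small p-value suggests an upward trend and a negative S a downward one. Real monitoring data usually contain ties and seasonality, which this sketch does not handle.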

  15. A Novel Approach to Detect Accelerated Aged and Surface-Mediated Degradation in Explosives by UPLC-ESI-MS.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Beppler, Christina L

    2015-12-01

    A new approach was created for studying energetic material degradation. This approach involved detecting and tentatively identifying non-volatile chemical species that form as the CL-20 energetic material thermally degrades, using liquid chromatography-mass spectrometry (LC-MS) with multivariate statistical data analysis. Multivariate data analysis showed clear separation and clustering of samples based on sample group: either pristine or aged material. Further analysis showed counter-clockwise trends in the scores plots from principal components analysis (PCA), a type of multivariate data analysis. These trends may indicate that there was a discrete shift in the chemical markers as the CL-20 went from pristine to aged material, and then again when the aged CL-20 mixed with a potentially incompatible material was thermally aged for 4, 6, or 9 months. This new approach to studying energetic material degradation should provide greater knowledge of potential degradation markers in these materials.

  16. BIVARIATE MODELLING OF CLUSTERED CONTINUOUS AND ORDERED CATEGORICAL OUTCOMES. (R824757)

    EPA Science Inventory

    Simultaneous observation of continuous and ordered categorical outcomes for each subject is common in biomedical research but multivariate analysis of the data is complicated by the multiple data types. Here we construct a model for the joint distribution of bivariate continuous ...

  17. Image-based compound profiling reveals a dual inhibitor of tyrosine kinase and microtubule polymerization.

    PubMed

    Tanabe, Kenji

    2016-04-27

    Small-molecule compounds are widely used as biological research tools and therapeutic drugs. Therefore, uncovering novel targets of these compounds should provide insights that are valuable in both basic and clinical studies. I developed a method for image-based compound profiling by quantitating the effects of compounds on signal transduction and vesicle trafficking of epidermal growth factor receptor (EGFR). Using six signal transduction molecules and two markers of vesicle trafficking, 570 image features were obtained and subjected to multivariate analysis. Fourteen compounds that affected EGFR or its pathways were classified into four clusters, based on their phenotypic features. Surprisingly, one EGFR inhibitor (CAS 879127-07-8) was classified into the same cluster as nocodazole, a microtubule depolymerizer. In fact, this compound directly depolymerized microtubules. These results indicate that CAS 879127-07-8 could be used as a chemical probe to investigate both the EGFR pathway and microtubule dynamics. The image-based multivariate analysis developed herein has potential as a powerful tool for discovering unexpected drug properties.

  18. The role of chemometrics in single and sequential extraction assays: a review. Part II. Cluster analysis, multiple linear regression, mixture resolution, experimental design and other techniques.

    PubMed

    Giacomino, Agnese; Abollino, Ornella; Malandrino, Mery; Mentasti, Edoardo

    2011-03-04

    Single and sequential extraction procedures are used for studying element mobility and availability in solid matrices, like soils, sediments, sludge, and airborne particulate matter. In the first part of this review we reported an overview on these procedures and described the applications of chemometric uni- and bivariate techniques and of multivariate pattern recognition techniques based on variable reduction to the experimental results obtained. The second part of the review deals with the use of chemometrics not only for the visualization and interpretation of data, but also for the investigation of the effects of experimental conditions on the response, the optimization of their values and the calculation of element fractionation. We will describe the principles of the multivariate chemometric techniques considered, the aims for which they were applied and the key findings obtained. The following topics will be critically addressed: pattern recognition by cluster analysis (CA), linear discriminant analysis (LDA) and other less common techniques; modelling by multiple linear regression (MLR); investigation of spatial distribution of variables by geostatistics; calculation of fractionation patterns by a mixture resolution method (Chemometric Identification of Substrates and Element Distributions, CISED); optimization and characterization of extraction procedures by experimental design; other multivariate techniques less commonly applied. Copyright © 2010 Elsevier B.V. All rights reserved.

  19. Countries population determination to test rice crisis indicator at national level using k-means cluster analysis

    NASA Astrophysics Data System (ADS)

    Hidayat, Y.; Purwandari, T.; Sukono; Ariska, Y. D.

    2017-01-01

    This study aimed to identify the population of countries that are similar to Indonesia based on three characteristics: democratic atmosphere, rice consumption and purchasing power for rice. The result is useful as reference material for research testing the strength and predictability of the rice crisis indicator Unprecedented Restlessness (UR). Countries similar to Indonesia were identified using multivariate analysis, namely non-hierarchical k-means cluster analysis, with 38 countries as the data population. The analysis was repeated until it yielded a number of clusters that shows the differentiating power of the three characteristics and high similarity within clusters. The results show that 6 clusters can describe the differentiating power of the characteristics of the formed clusters. However, to answer the purpose of the study, only one cluster was taken, according to the following criteria for a population of countries similar to Indonesia: the cluster contains Indonesia, it includes countries that did and did not sustain a rice crisis in 2008, and it has the largest membership among the clusters. These criteria are met by cluster 2, which consists of 22 countries, namely Indonesia, Brazil, Costa Rica, Djibouti, Dominican Republic, Ecuador, Fiji, Guinea-Bissau, Haiti, India, Jamaica, Japan, Korea South, Madagascar, Malaysia, Mali, Nicaragua, Panama, Peru, Senegal, Sierra Leone and Suriname.
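
    For readers unfamiliar with the non-hierarchical k-means procedure this abstract relies on, a minimal sketch of Lloyd's algorithm follows. It is an illustration only: the function name `kmeans`, the deterministic choice of the first k points as starting centroids, and the six three-feature records are all assumptions made here, not details from the study.

```python
def kmeans(points, k, iterations=100):
    """Lloyd's algorithm on a list of equal-length numeric tuples."""
    # Deterministic start (an assumption): the first k points are the initial centroids.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid (squared Euclidean distance).
        new_labels = []
        for p in points:
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            new_labels.append(dists.index(min(dists)))
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for idx in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == idx]
            if members:
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[idx])  # keep an empty cluster's centroid
        if new_labels == labels and new_centroids == centroids:
            break  # converged: nothing changed in this pass
        labels, centroids = new_labels, new_centroids
    return labels, centroids

# Hypothetical records with three standardized features each.
records = [(0.1, 0.2, 0.1), (5.0, 5.1, 4.9), (0.2, 0.1, 0.3),
           (5.2, 4.8, 5.1), (0.0, 0.3, 0.2), (4.9, 5.0, 5.0)]
labels, centroids = kmeans(records, k=2)
```

    Repeating the call for several values of k and comparing the within-cluster spread is one common way to settle on a cluster count; the abstract does not say which criterion the authors used.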

  20. Spatial assessment of air quality patterns in Malaysia using multivariate analysis

    NASA Astrophysics Data System (ADS)

    Dominick, Doreena; Juahir, Hafizan; Latif, Mohd Talib; Zain, Sharifuddin M.; Aris, Ahmad Zaharin

    2012-12-01

    This study aims to investigate possible sources of air pollutants and the spatial patterns within the eight selected Malaysian air monitoring stations based on a two-year database (2008-2009). Multivariate analysis was applied to the dataset. It incorporated Hierarchical Agglomerative Cluster Analysis (HACA) to assess the spatial patterns, Principal Component Analysis (PCA) to determine the major sources of the air pollution and Multiple Linear Regression (MLR) to assess the percentage contribution of each air pollutant. The HACA results grouped the eight monitoring stations into three different clusters, based on the characteristics of the air pollutants and meteorological parameters. The PCA analysis showed that the major sources of air pollution were emissions from motor vehicles, aircraft, industries and areas of high population density. The MLR analysis demonstrated that the main pollutant contributing to variability in the Air Pollutant Index (API) at all stations was particulate matter with a diameter of less than 10 μm (PM10). Further MLR analysis showed that the main air pollutant influencing the high concentration of PM10 was carbon monoxide (CO). This was due to combustion processes, particularly originating from motor vehicles. Meteorological factors such as ambient temperature, wind speed and humidity were also noted to influence the concentration of PM10.

  1. Statistical inferences for data from studies conducted with an aggregated multivariate outcome-dependent sample design

    PubMed Central

    Lu, Tsui-Shan; Longnecker, Matthew P.; Zhou, Haibo

    2016-01-01

    The outcome-dependent sampling (ODS) scheme is a cost-effective sampling scheme where one observes the exposure with a probability that depends on the outcome. Well-known examples of such designs are the case-control design for binary response, the case-cohort design for failure time data and the general ODS design for a continuous response. While substantial work has been done for the univariate response case, statistical inference and design for the ODS with multivariate cases remain under-developed. Motivated by the need in biological studies for taking advantage of the available responses for subjects in a cluster, we propose a multivariate outcome dependent sampling (Multivariate-ODS) design that is based on a general selection of the continuous responses within a cluster. The proposed inference procedure for the Multivariate-ODS design is semiparametric where all the underlying distributions of covariates are modeled nonparametrically using the empirical likelihood methods. We show that the proposed estimator is consistent and develop its asymptotic normality properties. Simulation studies show that the proposed estimator is more efficient than the estimator obtained using only the simple-random-sample portion of the Multivariate-ODS or the estimator from a simple random sample with the same sample size. The Multivariate-ODS design together with the proposed estimator provides an approach to further improve study efficiency for a given fixed study budget. We illustrate the proposed design and estimator with an analysis of association of PCB exposure to hearing loss in children born to the Collaborative Perinatal Study. PMID:27966260

  2. Diversity in phenotypic and nutritional traits in vegetable amaranth (Amaranthus tricolor), a nutritionally underutilised crop.

    PubMed

    Shukla, Sudhir; Bhargava, Atul; Chatterjee, Avijeet; Pandey, Avinash Chandra; Mishra, Brij K

    2010-01-15

    Assessment of genetic diversity in a crop-breeding programme helps in the identification of diverse parental combinations to create segregating progenies with maximum genetic variability and facilitates introgression of desirable genes from diverse germplasm into the available genetic base. In the present study, 39 strains of vegetable amaranth (Amaranthus tricolor) were evaluated for eight morphological and seven quality traits for two test seasons to study the extent of genetic divergence among the strains. Multivariate analysis showed that the first four principal components contributed 67.55% of the variability. Cluster analysis grouped the strains into six clusters that displayed a wide range of diversity for most of the traits. Cluster analysis has proved to be an effective method in grouping strains that may facilitate effective management and utilisation in crop-breeding programmes. The diverse strains falling in different clusters were identified, which can be utilised in different hybridisation programmes to develop high-foliage-yielding varieties rich in nutritional components. Copyright (c) 2009 Society of Chemical Industry.

  3. Mapping the Diversity among Runaways: A Descriptive Multivariate Analysis of Selected Social Psychological Background Conditions.

    ERIC Educational Resources Information Center

    Brennan, Tim

    1980-01-01

    A review of prior classification systems of runaways is followed by a descriptive taxonomy of runaways developed using cluster-analytic methods. The empirical types illustrate patterns of weakness in bonds between runaways and families, schools, or peer relationships. (Author)

  4. Potential use of MCR-ALS for the identification of coeliac-related biochemical changes in hyperspectral Raman maps from pediatric intestinal biopsies.

    PubMed

    Fornasaro, Stefano; Vicario, Annalisa; De Leo, Luigina; Bonifacio, Alois; Not, Tarcisio; Sergo, Valter

    2018-05-14

    Raman hyperspectral imaging is an emerging practice in biological and biomedical research for label free analysis of tissues and cells. Using this method, both spatial distribution and spectral information of analyzed samples can be obtained. The current study reports the first Raman microspectroscopic characterisation of colon tissues from patients with Coeliac Disease (CD). The aim was to assess if Raman imaging coupled with hyperspectral multivariate image analysis is capable of detecting the alterations in the biochemical composition of intestinal tissues associated with CD. The analytical approach was based on a multi-step methodology: duodenal biopsies from healthy and coeliac patients were measured and processed with Multivariate Curve Resolution Alternating Least Squares (MCR-ALS). Based on the distribution maps and the pure spectra of the image constituents obtained from MCR-ALS, interesting biochemical differences between healthy and coeliac patients were derived. Notably, a reduced distribution of complex lipids in the pericryptic space, and a different distribution and abundance of proteins rich in beta-sheet structures, were found in CD patients. The output of the MCR-ALS analysis was then used as a starting point for two clustering algorithms (k-means clustering and hierarchical clustering methods). Both methods converged with similar results providing precise segmentation over multiple Raman images of studied tissues.

  5. Alteration mapping at Goldfield, Nevada, by cluster and discriminant analysis of LANDSAT digital data

    NASA Technical Reports Server (NTRS)

    Ballew, G.

    1977-01-01

    The ability of Landsat multispectral digital data to differentiate among 62 combinations of rock and alteration types at the Goldfield mining district of Western Nevada was investigated by using statistical techniques of cluster and discriminant analysis. Multivariate discriminant analysis was not effective in classifying each of the 62 groups, with classification results essentially the same whether data of four channels alone or combined with six ratios of channels were used. Bivariate plots of group means revealed a cluster of three groups including mill tailings, basalt and all other rock and alteration types. Automatic hierarchical clustering based on the fourth dimensional Mahalanobis distance between group means of 30 groups having five or more samples was performed. The results of the cluster analysis revealed hierarchies of mill tailings vs. natural materials, basalt vs. non-basalt, highly reflectant rocks vs. other rocks and exclusively unaltered rocks vs. predominantly altered rocks. The hierarchies were used to determine the order in which sets of multiple discriminant analyses were to be performed and the resulting discriminant functions were used to produce a map of geology and alteration which has an overall accuracy of 70 percent for discriminating exclusively altered rocks from predominantly altered rocks.

  6. COVARIATE-ADAPTIVE CLUSTERING OF EXPOSURES FOR AIR POLLUTION EPIDEMIOLOGY COHORTS*

    PubMed Central

    Keller, Joshua P.; Drton, Mathias; Larson, Timothy; Kaufman, Joel D.; Sandler, Dale P.; Szpiro, Adam A.

    2017-01-01

    Cohort studies in air pollution epidemiology aim to establish associations between health outcomes and air pollution exposures. Statistical analysis of such associations is complicated by the multivariate nature of the pollutant exposure data as well as the spatial misalignment that arises from the fact that exposure data are collected at regulatory monitoring network locations distinct from cohort locations. We present a novel clustering approach for addressing this challenge. Specifically, we present a method that uses geographic covariate information to cluster multi-pollutant observations and predict cluster membership at cohort locations. Our predictive k-means procedure identifies centers using a mixture model and is followed by multi-class spatial prediction. In simulations, we demonstrate that predictive k-means can reduce misclassification error by over 50% compared to ordinary k-means, with minimal loss in cluster representativeness. The improved prediction accuracy results in large gains of 30% or more in power for detecting effect modification by cluster in a simulated health analysis. In an analysis of the NIEHS Sister Study cohort using predictive k-means, we find that the association between systolic blood pressure (SBP) and long-term fine particulate matter (PM2.5) exposure varies significantly between different clusters of PM2.5 component profiles. Our cluster-based analysis shows that for subjects assigned to a cluster located in the Midwestern U.S., a 10 μg/m3 difference in exposure is associated with 4.37 mmHg (95% CI, 2.38, 6.35) higher SBP. PMID:28572869

  7. Skill Assessment for Coupled Biological/Physical Models of Marine Systems

    DTIC Science & Technology

    2009-01-01

    cluster analysis e.g., Clark and Corley, 2006) and shown that the dimensions of the problem can be reduced and multivariate and univariate goodness...information; a follow-up analysis (Arhonditsis et al., 2006) reported no relationship between the level of skill assessment presented or the accuracy of the...uncertainty analysis (Beck, 1987), model selection (Kass and Raftery, 1995), model averaging (Hoeting et al., 1999), and scores for probabilistic

  8. DENBRAN: A basic program for a significance test for multivariate normality of clusters from branching patterns in dendrograms

    NASA Astrophysics Data System (ADS)

    Sneath, P. H. A.

    A BASIC program is presented for significance tests to determine whether a dendrogram is derived from clustering of points that belong to a single multivariate normal distribution. The significance tests are based on statistics of the Kolmogorov-Smirnov type, obtained by comparing the observed cumulative graph of branch levels with a graph for the hypothesis of multivariate normality. The program also permits testing whether the dendrogram could be from a cluster of lower dimensionality due to character correlations. The program makes provision for three similarity coefficients, (1) Euclidean distances, (2) squared Euclidean distances, and (3) Simple Matching Coefficients, and for five cluster methods, (1) WPGMA, (2) UPGMA, (3) Single Linkage (or Minimum Spanning Trees), (4) Complete Linkage, and (5) Ward's Increase in Sums of Squares. The program is entitled DENBRAN.
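
    A statistic "of the Kolmogorov-Smirnov type" measures the largest gap between an observed cumulative curve and a hypothesized one. The sketch below shows the generic one-sample form against a normal reference curve. It is not a port of DENBRAN: the reference distribution, the function names and the sample values are assumptions chosen only to illustrate the comparison.

```python
import math

def normal_cdf(x, mean=0.0, sd=1.0):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def ks_statistic(sample, cdf):
    """Largest vertical gap between the empirical CDF of `sample` and `cdf`."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF steps from i/n to (i+1)/n at x, so check both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d

# Hypothetical standardized levels, invented for illustration only.
levels = [-1.2, -0.4, 0.0, 0.3, 1.1]
d_stat = ks_statistic(levels, normal_cdf)
```

    In DENBRAN the reference curve is the one expected for branch levels under multivariate normality, which the abstract does not specify, so any function passed as `cdf` here is a stand-in.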

  9. Sputum neutrophil counts are associated with more severe asthma phenotypes using cluster analysis.

    PubMed

    Moore, Wendy C; Hastie, Annette T; Li, Xingnan; Li, Huashi; Busse, William W; Jarjour, Nizar N; Wenzel, Sally E; Peters, Stephen P; Meyers, Deborah A; Bleecker, Eugene R

    2014-06-01

    Clinical cluster analysis from the Severe Asthma Research Program (SARP) identified 5 asthma subphenotypes that represent the severity spectrum of early-onset allergic asthma, late-onset severe asthma, and severe asthma with chronic obstructive pulmonary disease characteristics. Analysis of induced sputum from a subset of SARP subjects showed 4 sputum inflammatory cellular patterns. Subjects with concurrent increases in eosinophil (≥2%) and neutrophil (≥40%) percentages had characteristics of very severe asthma. To better understand interactions between inflammation and clinical subphenotypes, we integrated inflammatory cellular measures and clinical variables in a new cluster analysis. Participants in SARP who underwent sputum induction at 3 clinical sites were included in this analysis (n = 423). Fifteen variables, including clinical characteristics and blood and sputum inflammatory cell assessments, were selected using factor analysis for unsupervised cluster analysis. Four phenotypic clusters were identified. Cluster A (n = 132) and B (n = 127) subjects had mild-to-moderate early-onset allergic asthma with paucigranulocytic or eosinophilic sputum inflammatory cell patterns. In contrast, these inflammatory patterns were present in only 7% of cluster C (n = 117) and D (n = 47) subjects who had moderate-to-severe asthma with frequent health care use despite treatment with high doses of inhaled or oral corticosteroids and, in cluster D, reduced lung function. The majority of these subjects (>83%) had sputum neutrophilia either alone or with concurrent sputum eosinophilia. Baseline lung function and sputum neutrophil percentages were the most important variables determining cluster assignment. This multivariate approach identified 4 asthma subphenotypes representing the severity spectrum from mild-to-moderate allergic asthma with minimal or eosinophil-predominant sputum inflammation to moderate-to-severe asthma with neutrophil-predominant or mixed granulocytic inflammation. Published by Mosby, Inc.

  10. Sputum neutrophils are associated with more severe asthma phenotypes using cluster analysis

    PubMed Central

    Moore, Wendy C.; Hastie, Annette T.; Li, Xingnan; Li, Huashi; Busse, William W.; Jarjour, Nizar N.; Wenzel, Sally E.; Peters, Stephen P.; Meyers, Deborah A.; Bleecker, Eugene R.

    2013-01-01

    Background Clinical cluster analysis from the Severe Asthma Research Program (SARP) identified five asthma subphenotypes that represent the severity spectrum of early onset allergic asthma, late onset severe asthma and severe asthma with COPD characteristics. Analysis of induced sputum from a subset of SARP subjects showed four sputum inflammatory cellular patterns. Subjects with concurrent increases in eosinophils (≥2%) and neutrophils (≥40%) had characteristics of very severe asthma. Objective To better understand interactions between inflammation and clinical subphenotypes we integrated inflammatory cellular measures and clinical variables in a new cluster analysis. Methods Participants in SARP at three clinical sites who underwent sputum induction were included in this analysis (n=423). Fifteen variables including clinical characteristics and blood and sputum inflammatory cell assessments were selected by factor analysis for unsupervised cluster analysis. Results Four phenotypic clusters were identified. Cluster A (n=132) and B (n=127) subjects had mild-moderate early onset allergic asthma with paucigranulocytic or eosinophilic sputum inflammatory cell patterns. In contrast, these inflammatory patterns were present in only 7% of Cluster C (n=117) and D (n=47) subjects who had moderate-severe asthma with frequent health care utilization despite treatment with high doses of inhaled or oral corticosteroids, and in Cluster D, reduced lung function. The majority of these subjects (>83%) had sputum neutrophilia either alone or with concurrent sputum eosinophilia. Baseline lung function and sputum neutrophils were the most important variables determining cluster assignment. Conclusion This multivariate approach identified four asthma subphenotypes representing the severity spectrum from mild-moderate allergic asthma with minimal or eosinophilic predominant sputum inflammation to moderate-severe asthma with neutrophilic predominant or mixed granulocytic inflammation. PMID:24332216

  11. Clustering of unhealthy outdoor advertisements around child-serving institutions: a comparison of three cities.

    PubMed

    Hillier, Amy; Cole, Brian L; Smith, Tony E; Yancey, Antronette K; Williams, Jerome D; Grier, Sonya A; McCarthy, William J

    2009-12-01

    Using GPS devices and digital cameras, we surveyed outdoor advertisements in Austin, Los Angeles and Philadelphia. GIS and hot spot analysis revealed that unhealthy ads were clustered around child-serving institutions in Los Angeles and Philadelphia but not in Austin. Multivariate generalized least square (GLS) regression models showed that percent black (p<0.04) was a significant positive predictor of clustering in Philadelphia and percent white (p<0.06) was a marginally significant negative predictor of clustering in Los Angeles after controlling for several land use variables. The results emphasize the importance of zoning and land use regulations to protect children from exposure to unhealthy commercial messages, particularly in neighborhoods with significant racial/ethnic minority populations.

  12. Application of multivariate analysis to investigate the trace element contamination in top soil of coal mining district in Jorong, South Kalimantan, Indonesia

    NASA Astrophysics Data System (ADS)

    Pujiwati, Arie; Nakamura, K.; Watanabe, N.; Komai, T.

    2018-02-01

    Multivariate analysis is applied to investigate the geochemistry of several trace elements in top soils and their relation to contamination sources under the influence of coal mines in Jorong, South Kalimantan. Total concentrations of Cd, V, Co, Ni, Cr, Zn, As, Pb, Sb, Cu and Ba were determined in 20 soil samples by bulk analysis. Pearson correlation is applied to specify the linear correlation among the elements. Principal Component Analysis (PCA) and Cluster Analysis (CA) were applied to observe the classification of trace elements and contamination sources. The results suggest that contamination loading is contributed by Cr, Cu, Ni, Zn, As, and Pb. The elemental loading mostly affects the non-coal mining area, for instance the area near settlements and agricultural land use. Moreover, the contamination source is classified into the areas that are influenced by the coal mining activity, the agricultural types, and the river mixing zone. Multivariate analysis could elucidate the elemental loading and the contamination sources of trace elements in the vicinity of the coal mine area.

  13. Characterization of spatial and temporal variability in hydrochemistry of Johor Straits, Malaysia.

    PubMed

    Abdullah, Pauzi; Abdullah, Sharifah Mastura Syed; Jaafar, Othman; Mahmud, Mastura; Khalik, Wan Mohd Afiq Wan Mohd

    2015-12-15

    Characterization of hydrochemistry changes in Johor Straits over 5 years of monitoring was successfully carried out. Water quality data sets (27 stations and 19 parameters) collected in this area were interpreted using multivariate statistical analysis. Cluster analysis grouped all the stations into four clusters ((Dlink/Dmax) × 100<90) and two clusters ((Dlink/Dmax) × 100<80) for site and period similarities. Principal component analysis rendered six significant components (eigenvalue>1) that explained 82.6% of the total variance of the data set. The classification matrix of discriminant analysis assigned 88.9-92.6% and 83.3-100% correctness in spatial and temporal variability, respectively. Time series analysis then confirmed that only four parameters did not change significantly over time. Therefore, it is imperative that the environmental impact of reclamation and dredging works, municipal or industrial discharge, marine aquaculture and shipping activities in this area be effectively controlled and managed. Copyright © 2015 Elsevier Ltd. All rights reserved.

  14. Flexible mixture modeling via the multivariate t distribution with the Box-Cox transformation: an alternative to the skew-t distribution

    PubMed Central

    Lo, Kenneth

    2011-01-01

    Cluster analysis is the automated search for groups of homogeneous observations in a data set. A popular modeling approach for clustering is based on finite normal mixture models, which assume that each cluster is modeled as a multivariate normal distribution. However, the normality assumption that each component is symmetric is often unrealistic. Furthermore, normal mixture models are not robust against outliers; they often require extra components for modeling outliers and/or give a poor representation of the data. To address these issues, we propose a new class of distributions, multivariate t distributions with the Box-Cox transformation, for mixture modeling. This class of distributions generalizes the normal distribution with the more heavy-tailed t distribution, and introduces skewness via the Box-Cox transformation. As a result, this provides a unified framework to simultaneously handle outlier identification and data transformation, two interrelated issues. We describe an Expectation-Maximization algorithm for parameter estimation along with transformation selection. We demonstrate the proposed methodology with three real data sets and simulation studies. Compared with a wealth of approaches including the skew-t mixture model, the proposed t mixture model with the Box-Cox transformation performs favorably in terms of accuracy in the assignment of observations, robustness against model misspecification, and selection of the number of components. PMID:22125375
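
    The Box-Cox transformation mentioned in this abstract has a simple closed form in its standard one-parameter version, sketched below. The abstract does not state which variant the authors use inside their mixture model, so the function `box_cox` and its restriction to positive inputs are assumptions for illustration.

```python
import math

def box_cox(x, lam):
    """Standard one-parameter Box-Cox transform of a positive value x."""
    if x <= 0:
        raise ValueError("the standard Box-Cox transform requires x > 0")
    if lam == 0:
        return math.log(x)  # limiting case as lam approaches 0
    return (x ** lam - 1.0) / lam

# lam = 1 only shifts the data; lam = 0.5 compresses large values.
shifted = box_cox(3.0, 1.0)
compressed = box_cox(4.0, 0.5)
```

    Choosing `lam` per component is what lets a symmetric t component fit skewed data on the original scale; that estimation step, which the abstract attributes to an Expectation-Maximization algorithm, is not reproduced here.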

  15. Flexible mixture modeling via the multivariate t distribution with the Box-Cox transformation: an alternative to the skew-t distribution.

    PubMed

    Lo, Kenneth; Gottardo, Raphael

    2012-01-01

    Cluster analysis is the automated search for groups of homogeneous observations in a data set. A popular modeling approach for clustering is based on finite normal mixture models, which assume that each cluster is modeled as a multivariate normal distribution. However, the normality assumption that each component is symmetric is often unrealistic. Furthermore, normal mixture models are not robust against outliers; they often require extra components for modeling outliers and/or give a poor representation of the data. To address these issues, we propose a new class of distributions, multivariate t distributions with the Box-Cox transformation, for mixture modeling. This class of distributions generalizes the normal distribution with the more heavy-tailed t distribution, and introduces skewness via the Box-Cox transformation. As a result, this provides a unified framework to simultaneously handle outlier identification and data transformation, two interrelated issues. We describe an Expectation-Maximization algorithm for parameter estimation along with transformation selection. We demonstrate the proposed methodology with three real data sets and simulation studies. Compared with a wealth of approaches including the skew-t mixture model, the proposed t mixture model with the Box-Cox transformation performs favorably in terms of accuracy in the assignment of observations, robustness against model misspecification, and selection of the number of components.

  16. Analysis of risk factors for cluster behavior of dental implant failures.

    PubMed

    Chrcanovic, Bruno Ramos; Kisch, Jenö; Albrektsson, Tomas; Wennerberg, Ann

    2017-08-01

    Some studies indicated that implant failures are commonly concentrated in few patients. To identify and analyze cluster behavior of dental implant failures among subjects of a retrospective study. This retrospective study included patients receiving at least three implants only. Patients presenting at least three implant failures were classified as presenting a cluster behavior. Univariate and multivariate logistic regression models and generalized estimating equations analysis evaluated the effect of explanatory variables on the cluster behavior. There were 1406 patients with three or more implants (8337 implants, 592 failures). Sixty-seven (4.77%) patients presented cluster behavior, with 56.8% of all implant failures. The intake of antidepressants and bruxism were identified as potential negative factors exerting a statistically significant influence on a cluster behavior at the patient-level. The negative factors at the implant-level were turned implants, short implants, poor bone quality, age of the patient, the intake of medicaments to reduce the acid gastric production, smoking, and bruxism. A cluster pattern among patients with implant failure is highly probable. Factors of interest as predictors for implant failures could be a number of systemic and local factors, although a direct causal relationship cannot be ascertained. © 2017 Wiley Periodicals, Inc.

  17. Statistical inferences for data from studies conducted with an aggregated multivariate outcome-dependent sample design.

    PubMed

    Lu, Tsui-Shan; Longnecker, Matthew P; Zhou, Haibo

    2017-03-15

    An outcome-dependent sampling (ODS) scheme is a cost-effective sampling scheme in which one observes the exposure with a probability that depends on the outcome. Well-known designs of this kind are the case-control design for binary responses, the case-cohort design for failure time data, and the general ODS design for a continuous response. While substantial work has been carried out for the univariate response case, statistical inference and design for the ODS with multivariate cases remain under-developed. Motivated by the need in biological studies to take advantage of the available responses for subjects in a cluster, we propose a multivariate outcome-dependent sampling (multivariate-ODS) design that is based on a general selection of the continuous responses within a cluster. The proposed inference procedure for the multivariate-ODS design is semiparametric, where all the underlying distributions of covariates are modeled nonparametrically using empirical likelihood methods. We show that the proposed estimator is consistent and develop its asymptotic normality properties. Simulation studies show that the proposed estimator is more efficient than the estimator obtained using only the simple-random-sample portion of the multivariate-ODS or the estimator from a simple random sample with the same sample size. The multivariate-ODS design together with the proposed estimator provides an approach to further improve study efficiency for a given fixed study budget. We illustrate the proposed design and estimator with an analysis of the association of polychlorinated biphenyl exposure with hearing loss in children born in the Collaborative Perinatal Study. Copyright © 2016 John Wiley & Sons, Ltd.

  18. Characterization of Interfacial Chemistry of Adhesive/Dentin Bond Using FTIR Chemical Imaging With Univariate and Multivariate Data Processing

    PubMed Central

    Wang, Yong; Yao, Xiaomei; Parthasarathy, Ranganathan

    2008-01-01

    Fourier transform infrared (FTIR) chemical imaging can be used to investigate molecular chemical features of the adhesive/dentin interfaces. However, the information is not straightforward, and is not easily extracted. The objective of this study was to use multivariate analysis methods, principal component analysis and fuzzy c-means clustering, to analyze spectral data in comparison with univariate analysis. The spectral imaging data collected from both the adhesive/healthy dentin and adhesive/caries-affected dentin specimens were used and compared. The univariate statistical methods such as mapping of intensities of specific functional group do not always accurately identify functional group locations and concentrations due to more or less band overlapping in adhesive and dentin. Apart from the ease with which information can be extracted, multivariate methods highlight subtle and often important changes in the spectra that are difficult to observe using univariate methods. The results showed that the multivariate methods gave more satisfactory, interpretable results than univariate methods and were conclusive in showing that they can discriminate and classify differences between healthy dentin and caries-affected dentin within the interfacial regions. It is demonstrated that the multivariate FTIR imaging approaches can be used in the rapid characterization of heterogeneous, complex structure. PMID:18980198

  19. Assessment of self-organizing maps to analyze sole-carbon source utilization profiles.

    PubMed

    Leflaive, Joséphine; Céréghino, Régis; Danger, Michaël; Lacroix, Gérard; Ten-Hage, Loïc

    2005-07-01

    The use of community-level physiological profiles obtained with Biolog microplates is widely employed to consider the functional diversity of bacterial communities. Biolog produces a great amount of data, the analysis of which has been the subject of many studies. In most cases, after some transformations, these data were investigated with classical multivariate analyses. Here we provide an alternative to this method, namely the use of an artificial intelligence technique, Self-Organizing Maps (SOM, an unsupervised neural network). We used data from a microcosm study of algae-associated bacterial communities placed in various nutritive conditions. Analyses were carried out on the net absorbances at two incubation times for each substrate and on the chemical guild categorization of the total bacterial activity. Compared to Principal Components Analysis and cluster analysis, SOM appeared to be a valuable tool for community classification and for establishing clear relationships between clusters of bacterial communities and sole-carbon source utilization. Specifically, SOM offered a clear bidimensional projection of a relatively large volume of data and were easier to interpret than plots commonly obtained with multivariate analyses. They would be recommended to pattern the temporal evolution of communities' functional diversity.

  20. Multivariate statistical techniques for the evaluation of surface water quality of the Himalayan foothills streams, Pakistan

    NASA Astrophysics Data System (ADS)

    Malik, Riffat Naseem; Hashmi, Muhammad Zaffar

    2017-10-01

    The Himalayan foothills streams of Pakistan play an important role in water supply for daily living and irrigation of farmlands; thus, the water quality is closely related to public health. Multivariate techniques were applied to check spatial and seasonal trends, and metal contamination sources, of the Himalayan foothills streams, Pakistan. Grab surface water samples were collected from different sites (5-15 cm water depth) in pre-washed polyethylene containers. A Fast Sequential Atomic Absorption Spectrophotometer (Varian FSAA-240) was used to measure the metal concentrations. Concentrations of Ni, Cu, and Mn were higher in the pre-monsoon season than in the post-monsoon season. Cluster analysis identified impaired, moderately impaired and least impaired clusters based on water parameters. Discriminant function analysis indicated that spatial variability in water was due to temperature, electrical conductivity, nitrates, iron and lead, whereas seasonal variations were correlated with 16 physicochemical parameters. Factor analysis identified municipal and poultry waste, automobile activities, surface runoff, and soil weathering as major sources of contamination. Levels of Mn, Cr, Fe, Pb, Cd, Zn and alkalinity were above the WHO and USEPA standards for surface water. The results of the present study will help higher authorities with the management of the Himalayan foothills streams.

  1. Multivariate carbon and nitrogen stable isotope model for the reconstruction of prehistoric human diet.

    PubMed

    Froehle, A W; Kellner, C M; Schoeninger, M J

    2012-03-01

    Using a sample of published archaeological data, we expand on an earlier bivariate carbon model for diet reconstruction by adding bone collagen nitrogen stable isotope values (δ¹⁵N), which provide information on trophic level and consumption of terrestrial vs. marine protein. The bivariate carbon model (δ¹³C(apatite) vs. δ¹³C(collagen)) provides detailed information on the isotopic signatures of whole diet and dietary protein, but is limited in its ability to distinguish between C₄ and marine protein. Here, using cluster analysis and discriminant function analysis, we generate a multivariate diet reconstruction model that incorporates δ¹³C(apatite), δ¹³C(collagen), and δ¹⁵N holistically. Inclusion of the δ¹⁵N data proves useful in resolving protein-related limitations of the bivariate carbon model, and splits the sample into five distinct dietary clusters. Two significant discriminant functions account for 98.8% of the sample variance, providing a multivariate model for diet reconstruction. Both carbon variables dominate the first function, while δ¹⁵N most strongly influences the second. Independent support for the functions' ability to accurately classify individuals according to diet comes from a small sample of experimental rats, which cluster as expected from their diets. The new model also provides a statistical basis for distinguishing between food sources with similar isotopic signatures, as in a previously analyzed archaeological population from Saipan (see Ambrose et al.: AJPA 104(1997) 343-361). Our model suggests that the Saipan islanders' ¹³C-enriched signal derives mainly from sugarcane, not seaweed. Further development and application of this model can similarly improve dietary reconstructions in archaeological, paleontological, and primatological contexts. Copyright © 2011 Wiley Periodicals, Inc.

  2. Phenotypes determined by cluster analysis in severe or difficult-to-treat asthma.

    PubMed

    Schatz, Michael; Hsu, Jin-Wen Y; Zeiger, Robert S; Chen, Wansu; Dorenbaum, Alejandro; Chipps, Bradley E; Haselkorn, Tmirah

    2014-06-01

    Asthma phenotyping can facilitate understanding of disease pathogenesis and potential targeted therapies. To further characterize the distinguishing features of phenotypic groups in difficult-to-treat asthma. Children ages 6-11 years (n = 518) and adolescents and adults ages ≥12 years (n = 3612) with severe or difficult-to-treat asthma from The Epidemiology and Natural History of Asthma: Outcomes and Treatment Regimens (TENOR) study were evaluated in this post hoc cluster analysis. Analyzed variables included sex, race, atopy, age of asthma onset, smoking (adolescents and adults), passive smoke exposure (children), obesity, and aspirin sensitivity. Cluster analysis used the hierarchical clustering algorithm with the Ward minimum variance method. The results were compared among clusters by χ² analysis; variables with significant (P < .05) differences among clusters were considered as distinguishing feature candidates. Associations among clusters and asthma-related health outcomes were assessed in multivariable analyses by adjusting for socioeconomic status, environmental exposures, and intensity of therapy. Five clusters were identified in each age stratum. Sex, atopic status, and nonwhite race were distinguishing variables in both strata; passive smoke exposure was distinguishing in children and aspirin sensitivity in adolescents and adults. Clusters were not related to outcomes in children, but 2 adult and adolescent clusters distinguished by nonwhite race and aspirin sensitivity manifested poorer quality of life (P < .0001), and the aspirin-sensitive cluster experienced more frequent asthma exacerbations (P < .0001). Distinct phenotypes appear to exist in patients with severe or difficult-to-treat asthma, which is related to outcomes in adolescents and adults but not in children. The study of the therapeutic implications of these phenotypes is warranted. Copyright © 2013 American Academy of Allergy, Asthma & Immunology. Published by Mosby, Inc. All rights reserved.

  3. Emergence of sporadic non-clustered cases of hospital-associated listeriosis among immunocompromised adults in southern Taiwan from 1992 to 2013: effect of precipitating immunosuppressive agents.

    PubMed

    Lee, Chun-Yuan; Tsai, Hung-Chin; Kunin, Calvin M; Lee, Susan Shin-Jung; Wu, Kuan-Sheng; Chen, Yao-Shen

    2014-03-19

    Sporadic non-clustered hospital-associated listeriosis is an emerging infectious disease in immunocompromised hosts. The current study was designed to determine the impact of long-term and precipitating immunosuppressive agents and underlying diseases on triggering the expression of the disease, and to compare the clinical features and outcome of hospital-associated and community-associated listeriosis. We reviewed the medical records of all patients with Listeria monocytogenes isolated from sterile body sites at a large medical center in southern Taiwan during 1992-2013. Non-clustered cases were defined as those unrelated to any other in time or place. Multivariable regression analysis was used to determine factors associated with prognosis. Thirty-five non-clustered cases of listeriosis were identified. Twelve (34.2%) were hospital-associated, and 23 (65.7%) were community-associated. The 60-day mortality was significantly greater in hospital-associated than in community-associated cases (66.7% vs. 17.4%, p = 0.007). Significantly more hospital-associated than community-associated cases were treated with a precipitating immunosuppressive agent within 4 weeks prior to onset of listeriosis (91.7% vs. 4.3%, respectively p < 0.001). The median period from the start of precipitating immunosuppressive treatment to the onset of listeriosis-related symptoms was 12 days (range, 4-27 days) in 11 of the 12 hospital-associated cases. In the multivariable analysis, APACHE II score >21 (p = 0.04) and receipt of precipitating immunosuppressive therapy (p = 0.02) were independent risk factors for 60-day mortality. Sporadic non-clustered hospital-associated listeriosis needs to be considered in the differential diagnosis of sepsis in immunocompromised patients, particularly in those treated with new or increased doses of immunosuppressive agents.

  4. Multivariate Statistical Analysis of Cigarette Design Feature Influence on ISO TNCO Yields.

    PubMed

    Agnew-Heard, Kimberly A; Lancaster, Vicki A; Bravo, Roberto; Watson, Clifford; Walters, Matthew J; Holman, Matthew R

    2016-06-20

    The aim of this study is to explore how differences in cigarette physical design parameters influence tar, nicotine, and carbon monoxide (TNCO) yields in mainstream smoke (MSS) using the International Organization of Standardization (ISO) smoking regimen. Standardized smoking methods were used to evaluate 50 U.S. domestic brand cigarettes and a reference cigarette representing a range of TNCO yields in MSS collected from linear smoking machines using a nonintense smoking regimen. Multivariate statistical methods were used to form clusters of cigarettes based on their ISO TNCO yields and then to explore the relationship between the ISO generated TNCO yields and the nine cigarette physical design parameters between and within each cluster simultaneously. The ISO generated TNCO yields in MSS are 1.1-17.0 mg tar/cigarette, 0.1-2.2 mg nicotine/cigarette, and 1.6-17.3 mg CO/cigarette. Cluster analysis divided the 51 cigarettes into five discrete clusters based on their ISO TNCO yields. No one physical parameter dominated across all clusters. Predicting ISO machine generated TNCO yields based on these nine physical design parameters is complex due to the correlation among and between the nine physical design parameters and TNCO yields. From these analyses, it is estimated that approximately 20% of the variability in the ISO generated TNCO yields comes from other parameters (e.g., filter material, filter type, inclusion of expanded or reconstituted tobacco, and tobacco blend composition, along with differences in tobacco leaf origin and stalk positions and added ingredients). A future article will examine the influence of these physical design parameters on TNCO yields under a Canadian Intense (CI) smoking regimen. Together, these papers will provide a more robust picture of the design features that contribute to TNCO exposure across the range of real world smoking patterns.

  5. Multivariate statistical assessment of heavy metal pollution sources of groundwater around a lead and zinc plant.

    PubMed

    Zamani, Abbas Ali; Yaftian, Mohammad Reza; Parizanganeh, Abdolhossein

    2012-12-17

    The contamination of groundwater by heavy metal ions around a lead and zinc plant has been studied. As a case study groundwater contamination in Bonab Industrial Estate (Zanjan-Iran) for iron, cobalt, nickel, copper, zinc, cadmium and lead content was investigated using differential pulse polarography (DPP). Although, cobalt, copper and zinc were found correspondingly in 47.8%, 100.0%, and 100.0% of the samples, they did not contain these metals above their maximum contaminant levels (MCLs). Cadmium was detected in 65.2% of the samples and 17.4% of them were polluted by this metal. All samples contained detectable levels of lead and iron with 8.7% and 13.0% of the samples higher than their MCLs. Nickel was also found in 78.3% of the samples, out of which 8.7% were polluted. In general, the results revealed the contamination of groundwater sources in the studied zone. The higher health risks are related to lead, nickel, and cadmium ions. Multivariate statistical techniques were applied for interpreting the experimental data and giving a description for the sources. The data analysis showed correlations and similarities between investigated heavy metals and helps to classify these ion groups. Cluster analysis identified five clusters among the studied heavy metals. Cluster 1 consisted of Pb, Cu, and cluster 3 included Cd, Fe; also each of the elements Zn, Co and Ni was located in groups with single member. The same results were obtained by factor analysis. Statistical investigations revealed that anthropogenic factors and notably lead and zinc plant and pedo-geochemical pollution sources are influencing water quality in the studied area.

  6. Multivariate statistical assessment of heavy metal pollution sources of groundwater around a lead and zinc plant

    PubMed Central

    2012-01-01

    The contamination of groundwater by heavy metal ions around a lead and zinc plant has been studied. As a case study groundwater contamination in Bonab Industrial Estate (Zanjan-Iran) for iron, cobalt, nickel, copper, zinc, cadmium and lead content was investigated using differential pulse polarography (DPP). Although, cobalt, copper and zinc were found correspondingly in 47.8%, 100.0%, and 100.0% of the samples, they did not contain these metals above their maximum contaminant levels (MCLs). Cadmium was detected in 65.2% of the samples and 17.4% of them were polluted by this metal. All samples contained detectable levels of lead and iron with 8.7% and 13.0% of the samples higher than their MCLs. Nickel was also found in 78.3% of the samples, out of which 8.7% were polluted. In general, the results revealed the contamination of groundwater sources in the studied zone. The higher health risks are related to lead, nickel, and cadmium ions. Multivariate statistical techniques were applied for interpreting the experimental data and giving a description for the sources. The data analysis showed correlations and similarities between investigated heavy metals and helps to classify these ion groups. Cluster analysis identified five clusters among the studied heavy metals. Cluster 1 consisted of Pb, Cu, and cluster 3 included Cd, Fe; also each of the elements Zn, Co and Ni was located in groups with single member. The same results were obtained by factor analysis. Statistical investigations revealed that anthropogenic factors and notably lead and zinc plant and pedo-geochemical pollution sources are influencing water quality in the studied area. PMID:23369182

  7. Deeper Insights into the Circumgalactic Medium using Multivariate Analysis Methods

    NASA Astrophysics Data System (ADS)

    Lewis, James; Churchill, Christopher W.; Nielsen, Nikole M.; Kacprzak, Glenn

    2017-01-01

    Drawing from a database of galaxies whose surrounding gas has absorption from MgII, called the MgII-Absorbing Galaxy Catalog (MAGIICAT, Nielsen et al. 2013), we studied the circumgalactic medium (CGM) for a sample of 47 galaxies. Using multivariate analysis, in particular the k-means clustering algorithm, we determined that simultaneously examining column density (N), rest-frame B-K color, virial mass, and azimuthal angle (the projected angle between the galaxy major axis and the quasar line of sight) yields two distinct populations: (1) bluer, lower mass galaxies with higher column density along the minor axis, and (2) redder, higher mass galaxies with lower column density along the major axis. We support this grouping by running (i) two-sample, two-dimensional Kolmogorov-Smirnov (KS) tests on each of the six bivariate planes and (ii) two-sample KS tests on each of the four variables to show that the galaxies significantly cluster into two independent populations. To account for the fact that 16 of our 47 galaxies have upper limits on N, we performed Monte-Carlo tests whereby we replaced upper limits with random deviates drawn from a Schechter distribution fit, f(N). These tests strengthen the results of the KS tests. We examined the behavior of the MgII λ2796 absorption line equivalent width and velocity width for each galaxy population. We find that equivalent width and velocity width do not show similar characteristic distinctions between the two galaxy populations. We discuss the k-means clustering algorithm for optimizing the analysis of populations within datasets as opposed to using arbitrary bivariate subsample cuts. We also discuss the power of the k-means clustering algorithm in extracting deeper physical insight into the CGM in relationship to host galaxies.
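
    The k-means step described above can be sketched in a few lines. This is a generic illustration of Lloyd's algorithm in plain Python, not the authors' code: the function name `kmeans`, the random initialization, and the two-dimensional toy points are assumptions made for the example. In practice, variables on very different scales (such as column density and azimuthal angle) would usually be standardized first.

```python
import random


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's k-means on a list of equal-length numeric tuples.

    Returns (labels, centers). Initial centers are k distinct points
    drawn at random; empty clusters keep their previous center.
    """
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(k),
                key=lambda c: sum((x - m) ** 2 for x, m in zip(p, centers[c])))
            for p in points
        ]
        # Stop once assignments no longer change.
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Hypothetical, well-separated toy data in two features.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]
labels, centers = kmeans(points, k=2)
print(labels)
print(centers)
```

    Cluster indices are arbitrary, so only the grouping (which points share a label) is meaningful, not the label values themselves.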

  8. Clustangles: An Open Library for Clustering Angular Data.

    PubMed

    Sargsyan, Karen; Hua, Yun Hao; Lim, Carmay

    2015-08-24

    Dihedral angles are good descriptors of the numerous conformations visited by large, flexible systems, but their analysis requires directional statistics. A single package including the various multivariate statistical methods for angular data that accounts for the distinct topology of such data does not exist. Here, we present a lightweight standalone, operating-system independent package called Clustangles to fill this gap. Clustangles will be useful in analyzing the ever-increasing number of structures in the Protein Data Bank and clustering the copious conformations from increasingly long molecular dynamics simulations.
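
    The need for directional statistics mentioned above can be illustrated with a small, generic example. The sketch below does not use or reproduce the Clustangles API; `circular_mean` and `angular_distance` are hypothetical helper names. It shows why ordinary arithmetic fails for angles: 350° and 10° are only 20° apart, and their mean direction is near 0°, not 180°.

```python
import math


def circular_mean(angles_deg):
    """Mean direction of angles in degrees, returned in [0, 360)."""
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    return math.degrees(math.atan2(s, c)) % 360.0


def angular_distance(a_deg, b_deg):
    """Shortest separation between two angles, in [0, 180] degrees."""
    d = abs(a_deg - b_deg) % 360.0
    return min(d, 360.0 - d)


angles = [350.0, 10.0]
print("arithmetic mean:", sum(angles) / len(angles))  # misleading for angles
print("circular mean  :", circular_mean(angles))
print("separation     :", angular_distance(350.0, 10.0))
```

    A clustering method for dihedral angles would use a periodic distance of this kind in place of the plain Euclidean difference.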

  9. Cluster analysis of the national weight control registry to identify distinct subgroups maintaining successful weight loss.

    PubMed

    Ogden, Lorraine G; Stroebele, Nanette; Wyatt, Holly R; Catenacci, Victoria A; Peters, John C; Stuht, Jennifer; Wing, Rena R; Hill, James O

    2012-10-01

    The National Weight Control Registry (NWCR) is the largest ongoing study of individuals successful at maintaining weight loss; the registry enrolls individuals maintaining a weight loss of at least 13.6 kg (30 lb) for a minimum of 1 year. The current report uses multivariate latent class cluster analysis to identify unique clusters of individuals within the NWCR that have distinct experiences, strategies, and attitudes with respect to weight loss and weight loss maintenance. The cluster analysis considers weight and health history, weight control behaviors and strategies, effort and satisfaction with maintaining weight, and psychological and demographic characteristics. The analysis includes 2,228 participants enrolled between 1998 and 2002. Cluster 1 (50.5%) represents a weight-stable, healthy, exercise conscious group who are very satisfied with their current weight. Cluster 2 (26.9%) has continuously struggled with weight since childhood; they rely on the greatest number of resources and strategies to lose and maintain weight, and report higher levels of stress and depression. Cluster 3 (12.7%) represents a group successful at weight reduction on the first attempt; they were least likely to be overweight as children, are maintaining the longest duration of weight loss, and report the least difficulty maintaining weight. Cluster 4 (9.9%) represents a group less likely to use exercise to control weight; they tend to be older, eat fewer meals, and report more health problems. Further exploration of the unique characteristics of these clusters could be useful for tailoring future weight loss and weight maintenance programs to the specific characteristics of an individual.

  10. Impact of multi-resolution analysis of artificial intelligence models inputs on multi-step ahead river flow forecasting

    NASA Astrophysics Data System (ADS)

    Badrzadeh, Honey; Sarukkalige, Ranjan; Jayawardena, A. W.

    2013-12-01

    Discrete wavelet transform was applied to decomposed ANN and ANFIS inputs. Novel approach of WNF with subtractive clustering applied for flow forecasting. Forecasting was performed in 1-5 step ahead, using multi-variate inputs. Forecasting accuracy of peak values and longer lead-time significantly improved.

  11. Bias correction in the hierarchical likelihood approach to the analysis of multivariate survival data.

    PubMed

    Jeon, Jihyoun; Hsu, Li; Gorfine, Malka

    2012-07-01

    Frailty models are useful for measuring unobserved heterogeneity in risk of failures across clusters, providing cluster-specific risk prediction. In a frailty model, the latent frailties shared by members within a cluster are assumed to act multiplicatively on the hazard function. In order to obtain parameter and frailty variate estimates, we consider the hierarchical likelihood (H-likelihood) approach (Ha, Lee and Song, 2001. Hierarchical-likelihood approach for frailty models. Biometrika 88, 233-243) in which the latent frailties are treated as "parameters" and estimated jointly with other parameters of interest. We find that the H-likelihood estimators perform well when the censoring rate is low, however, they are substantially biased when the censoring rate is moderate to high. In this paper, we propose a simple and easy-to-implement bias correction method for the H-likelihood estimators under a shared frailty model. We also extend the method to a multivariate frailty model, which incorporates complex dependence structure within clusters. We conduct an extensive simulation study and show that the proposed approach performs very well for censoring rates as high as 80%. We also illustrate the method with a breast cancer data set. Since the H-likelihood is the same as the penalized likelihood function, the proposed bias correction method is also applicable to the penalized likelihood estimators.

  12. Topic modeling for cluster analysis of large biological and medical datasets

    PubMed Central

    2014-01-01

    Background: The big data moniker is nowhere better deserved than to describe the ever-increasing prodigiousness and complexity of biological and medical datasets. New methods are needed to generate and test hypotheses, foster biological interpretation, and build validated predictors. Although multivariate techniques such as cluster analysis may allow researchers to identify groups, or clusters, of related variables, the accuracies and effectiveness of traditional clustering methods diminish for large and hyper dimensional datasets. Topic modeling is an active research field in machine learning and has been mainly used as an analytical tool to structure large textual corpora for data mining. Its ability to reduce high dimensionality to a small number of latent variables makes it suitable as a means for clustering or overcoming clustering difficulties in large biological and medical datasets. Results: In this study, three topic model-derived clustering methods, highest probable topic assignment, feature selection and feature extraction, are proposed and tested on the cluster analysis of three large datasets: Salmonella pulsed-field gel electrophoresis (PFGE) dataset, lung cancer dataset, and breast cancer dataset, which represent various types of large biological or medical datasets. All three various methods are shown to improve the efficacy/effectiveness of clustering results on the three datasets in comparison to traditional methods. A preferable cluster analysis method emerged for each of the three datasets on the basis of replicating known biological truths. Conclusion: Topic modeling could be advantageously applied to the large datasets of biological or medical research. The three proposed topic model-derived clustering methods, highest probable topic assignment, feature selection and feature extraction, yield clustering improvements for the three different data types. Clusters more efficaciously represent truthful groupings and subgroupings in the data than traditional methods, suggesting that topic model-based methods could provide an analytic advancement in the analysis of large biological or medical datasets. PMID:25350106

  13. Topic modeling for cluster analysis of large biological and medical datasets.

    PubMed

    Zhao, Weizhong; Zou, Wen; Chen, James J

    2014-01-01

    The big data moniker is nowhere better deserved than to describe the ever-increasing prodigiousness and complexity of biological and medical datasets. New methods are needed to generate and test hypotheses, foster biological interpretation, and build validated predictors. Although multivariate techniques such as cluster analysis may allow researchers to identify groups, or clusters, of related variables, the accuracies and effectiveness of traditional clustering methods diminish for large and hyper dimensional datasets. Topic modeling is an active research field in machine learning and has been mainly used as an analytical tool to structure large textual corpora for data mining. Its ability to reduce high dimensionality to a small number of latent variables makes it suitable as a means for clustering or overcoming clustering difficulties in large biological and medical datasets. In this study, three topic model-derived clustering methods, highest probable topic assignment, feature selection and feature extraction, are proposed and tested on the cluster analysis of three large datasets: Salmonella pulsed-field gel electrophoresis (PFGE) dataset, lung cancer dataset, and breast cancer dataset, which represent various types of large biological or medical datasets. All three various methods are shown to improve the efficacy/effectiveness of clustering results on the three datasets in comparison to traditional methods. A preferable cluster analysis method emerged for each of the three datasets on the basis of replicating known biological truths. Topic modeling could be advantageously applied to the large datasets of biological or medical research. The three proposed topic model-derived clustering methods, highest probable topic assignment, feature selection and feature extraction, yield clustering improvements for the three different data types. Clusters more efficaciously represent truthful groupings and subgroupings in the data than traditional methods, suggesting that topic model-based methods could provide an analytic advancement in the analysis of large biological or medical datasets.

  14. Peeking Network States with Clustered Patterns

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kim, Jinoh; Sim, Alex

    2015-10-20

    Network traffic monitoring has long been a core element for effective network management and security. However, it is still a challenging task with a high degree of complexity for comprehensive analysis when considering multiple variables and ever-increasing traffic volumes to monitor. For example, one of the widely considered approaches is to scrutinize probabilistic distributions, but it poses a scalability concern and multivariate analysis is not generally supported due to the exponential increase of the complexity. In this work, we propose a novel method for network traffic monitoring based on clustering, one of the powerful deep-learning techniques. We show that the new approach enables us to recognize clustered results as patterns representing the network states, which can then be utilized to evaluate “similarity” of network states over time. In addition, we define a new quantitative measure for the similarity between two compared network states observed in different time windows, as a supportive means for intuitive analysis. Finally, we demonstrate the clustering-based network monitoring with public traffic traces, and show that the proposed approach using the clustering method has a great opportunity for feasible, cost-effective network monitoring.

  15. Factors influencing epibenthic assemblages in the Minho Estuary (NW Iberian Peninsula).

    PubMed

    Costa-Dias, Sérgia; Freitas, Vânia; Sousa, Ronaldo; Antunes, Carlos

    2010-01-01

    The epibenthic community of the Minho Estuary was studied during the summer of 2006. Diversity was generally low and a total of 14 fish and five crustacean taxa were identified. Multivariate analysis revealed two site clusters (A and B). Water conductivity and percentage of fine sand were the abiotic variables that most contributed to the spatial distinction between clusters. The species contributing the most to the average similarity within Cluster A were Crangon crangon and Pomatoschistus microps, while in Cluster B it was Atyaephyra desmarestii. Possible factors responsible for the low diversity of the epibenthic community in the Minho Estuary were the low macrozoobenthic abundance and diversity, and the high abiotic oscillations between tides (mainly salinity) acting on the ecosystem. Copyright 2010 Elsevier Ltd. All rights reserved.

  16. Interactive visual exploration and analysis of origin-destination data

    NASA Astrophysics Data System (ADS)

    Ding, Linfang; Meng, Liqiu; Yang, Jian; Krisp, Jukka M.

    2018-05-01

    In this paper, we propose a visual analytics approach for the exploration of spatiotemporal interaction patterns of massive origin-destination data. Firstly, we visually query the movement database for data at certain time windows. Secondly, we conduct interactive clustering to allow the users to select input variables/features (e.g., origins, destinations, distance, and duration) and to adjust clustering parameters (e.g. distance threshold). The agglomerative hierarchical clustering method is applied for the multivariate clustering of the origin-destination data. Thirdly, we design a parallel coordinates plot for visualizing the precomputed clusters and for further exploration of interesting clusters. Finally, we propose a gradient line rendering technique to show the spatial and directional distribution of origin-destination clusters on a map view. We implement the visual analytics approach in a web-based interactive environment and apply it to real-world floating car data from Shanghai. The experiment results show the origin/destination hotspots and their spatial interaction patterns. They also demonstrate the effectiveness of our proposed approach.
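
    The clustering step described in this record (agglomerative hierarchical clustering of origin-destination features, controlled by a distance threshold) can be illustrated with a minimal sketch. It is not the authors' implementation: single linkage, Euclidean distance, the function name `agglomerative`, and the toy `trips` data are all assumptions made for illustration, since the abstract does not state the linkage rule or the feature scaling.

```python
import math

def agglomerative(points, threshold):
    """Single-linkage agglomerative clustering with a distance cut-off.

    Repeatedly merges the two closest clusters and stops once the closest
    pair is farther apart than `threshold`. Returns a sorted list of
    clusters, each a sorted list of indices into `points`.
    """
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Single linkage (an assumption): smallest member-to-member distance.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        if d > threshold:
            break  # remaining clusters are all farther apart than the cut-off
        clusters[x] = sorted(clusters[x] + clusters[y])
        del clusters[y]  # y > x, so index x is unaffected
    return sorted(clusters)

# Hypothetical trips: (origin_x, origin_y, dest_x, dest_y) in arbitrary units.
trips = [
    (0.0, 0.0, 5.0, 5.0),
    (0.1, 0.0, 5.0, 5.1),
    (0.0, 0.2, 5.1, 5.0),
    (9.0, 9.0, 1.0, 1.0),
    (9.1, 9.0, 1.0, 1.2),
]
groups = agglomerative(trips, threshold=1.0)
```

    Raising or lowering `threshold` plays the role of the user-adjustable clustering parameter mentioned in the abstract: a larger value merges more trips into fewer, coarser flows.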

  17. East Greenland and Barents Sea polar bears (Ursus maritimus): adaptive variation between two populations using skull morphometrics as an indicator of environmental and genetic differences.

    PubMed

    Pertoldi, Cino; Sonne, Christian; Wiig, Øystein; Baagøe, Hans J; Loeschcke, Volker; Bechshøft, Thea Østergaard

    2012-06-01

    A morphometric study was conducted on four skull traits of 37 male and 18 female adult East Greenland polar bears (Ursus maritimus) collected 1892-1968, and on 54 male and 44 female adult Barents Sea polar bears collected 1950-1969. The aim was to compare differences in size and shape of the bear skulls using a multivariate approach, characterizing the variation between the two populations using morphometric traits as an indicator of environmental and genetic differences. Mixture analysis testing for geographic differentiation within each population revealed three clusters for Barents Sea males and three clusters for Barents Sea females. East Greenland consisted of one female and one male cluster. A principal component analysis (PCA) conducted on the clusters defined by the mixture analysis, showed that East Greenland and Barents Sea polar bear populations overlapped to a large degree, especially with regards to females. Multivariate analyses of variance (MANOVA) showed no significant differences in morphometric means between the two populations, but differences were detected between clusters from each respective geographic locality. To estimate the importance of genetics and environment in the morphometric differences between the bears, a PCA was performed on the covariance matrix derived from the skull measurements. Skull trait size (PC1) explained approx. 80% of the morphometric variation, whereas shape (PC2) defined approx. 15%, indicating some genetic differentiation. Hence, both environmental and genetic factors seem to have contributed to the observed skull differences between the two populations. Overall, results indicate that many Barents Sea polar bears are morphometrically similar to the East Greenland ones, suggesting an exchange of individuals between the two populations. 
Furthermore, a subpopulation structure in the Barents Sea population was also indicated from the present analyses, which should be considered with regards to future management decisions. © 2012 The Authors.

  18. Alteration mapping at Goldfield, Nevada, by cluster and discriminant analysis of Landsat digital data. [mapping of hydrothermally altered volcanic rocks]

    NASA Technical Reports Server (NTRS)

    Ballew, G.

    1977-01-01

    The ability of Landsat multispectral digital data to differentiate among 62 combinations of rock and alteration types at the Goldfield mining district of Western Nevada was investigated by using statistical techniques of cluster and discriminant analysis. Multivariate discriminant analysis was not effective in classifying each of the 62 groups, with classification results essentially the same whether data of four channels alone or combined with six ratios of channels were used. Bivariate plots of group means revealed a cluster of three groups including mill tailings, basalt and all other rock and alteration types. Automatic hierarchical clustering based on the four-dimensional Mahalanobis distance between group means of 30 groups having five or more samples was performed using Johnson's HICLUS program. The results of the cluster analysis revealed hierarchies of mill tailings vs. natural materials, basalt vs. non-basalt, highly reflectant rocks vs. other rocks and exclusively unaltered rocks vs. predominantly altered rocks. The hierarchies were used to determine the order in which sets of multiple discriminant analyses were to be performed and the resulting discriminant functions were used to produce a map of geology and alteration which has an overall accuracy of 70 percent for discriminating exclusively altered rocks from predominantly altered rocks.

  19. Processes and subdivisions in diogenites, a multivariate statistical analysis

    NASA Technical Reports Server (NTRS)

    Harriott, T. A.; Hewins, R. H.

    1984-01-01

    Multivariate statistical techniques used on diogenite orthopyroxene analyses show the relationships that occur within diogenites and the two orthopyroxenite components (class I and II) in the polymict diogenite Garland. Cluster analysis shows that only Peckelsheim is similar to Garland class I (Fe-rich) and the other diogenites resemble Garland class II. The unique diogenite Y 75032 may be related to type I by fractionation. Factor analysis confirms the subdivision and shows that Fe does not correlate with the weakly incompatible elements across the entire pyroxene composition range, indicating that igneous fractionation is not the process controlling total diogenite composition variation. The occurrence of two groups of diogenites is interpreted as the result of sampling or mixing of two main sequences of orthopyroxene cumulates with slightly different compositions.

  20. Multivariate analysis of heavy metal contamination using river sediment cores of Nankan River, northern Taiwan

    NASA Astrophysics Data System (ADS)

    Lee, An-Sheng; Lu, Wei-Li; Huang, Jyh-Jaan; Chang, Queenie; Wei, Kuo-Yen; Lin, Chin-Jung; Liou, Sofia Ya Hsuan

    2016-04-01

    Owing to the geology and climate of Taiwan, rivers generally carry a lot of suspended particles. After these particles settle, they become sediments, which are good sorbents for heavy metals in the river system. Consequently, sediments can record the contamination footprint in low flow energy regions, such as estuaries. Seven sediment cores were collected along the Nankan River, northern Taiwan, which is seriously contaminated by factory, household and agricultural inputs. Physico-chemical properties of these cores were derived from an Itrax-XRF Core Scanner and grain size analysis. In order to interpret these complex data matrices, multivariate statistical techniques (cluster analysis, factor analysis and discriminant analysis) were introduced to this study. The statistical results indicate four types of sediment. One of them represents a contamination event, showing high concentrations of Cu, Zn, Pb, Ni and Fe, and low concentrations of Si and Zr. Furthermore, three possible contamination sources of this type of sediment were revealed by factor analysis. The combination of sediment analysis and multivariate statistical techniques provides new insights into the contamination depositional history of the Nankan River and could be similarly applied to other river systems to determine the scale of anthropogenic contamination.

  1. Development of methodology for identification the nature of the polyphenolic extracts by FTIR associated with multivariate analysis

    NASA Astrophysics Data System (ADS)

    Grasel, Fábio dos Santos; Ferrão, Marco Flôres; Wolf, Carlos Rodolfo

    2016-01-01

    Tannins are polyphenolic compounds of complex structures formed by secondary metabolism in several plants. These polyphenolic compounds have different applications, such as drugs, anti-corrosion agents, flocculants, and tanning agents. This study analyses six different types of polyphenolic extracts by Fourier transform infrared spectroscopy (FTIR) combined with multivariate analysis. Through both principal component analysis (PCA) and hierarchical cluster analysis (HCA), we observed well-defined separation between condensed (quebracho and black wattle) and hydrolysable (valonea, chestnut, myrobalan, and tara) tannins. For hydrolysable tannins, it was also possible to observe the formation of two different subgroups between samples of chestnut and valonea and between samples of tara and myrobalan. Among all samples analysed, the chestnut and valonea showed the greatest similarity, indicating that these extracts contain equivalent chemical compositions and structure and, therefore, similar properties.

  2. Estimating global distribution of boreal, temperate, and tropical tree plant functional types using clustering techniques

    NASA Astrophysics Data System (ADS)

    Wang, Audrey; Price, David T.

    2007-03-01

    A simple integrated algorithm was developed to relate global climatology to distributions of tree plant functional types (PFT). Multivariate cluster analysis was performed to analyze the statistical homogeneity of the climate space occupied by individual tree PFTs. Forested regions identified from the satellite-based GLC2000 classification were separated into tropical, temperate, and boreal sub-PFTs for use in the Canadian Terrestrial Ecosystem Model (CTEM). Global data sets of monthly minimum temperature, growing degree days, an index of climatic moisture, and estimated PFT cover fractions were then used as variables in the cluster analysis. The statistical results for individual PFT clusters were found consistent with other global-scale classifications of dominant vegetation. As an improvement of the quantification of the climatic limitations on PFT distributions, the results also demonstrated overlapping of PFT cluster boundaries that reflected vegetation transitions, for example, between tropical and temperate biomes. The resulting global database should provide a better basis for simulating the interaction of climate change and terrestrial ecosystem dynamics using global vegetation models.

  3. An application of bioassessment metrics and multivariate techniques to evaluate central Nebraska streams

    USGS Publications Warehouse

    Frenzel, S.A.

    1996-01-01

    Ninety-one stream sites in central Nebraska were classified into four clusters on the basis of a cluster analysis (TWINSPAN) of macroinvertebrate data. Rapid bioassessment protocol scores for macroinvertebrate species were significantly different among sites grouped by the first division into two clusters. This division may have distinguished sites on the basis of water-quality impairment. Individual metrics that differed between clusters of sites were the Hilsenhoff Biotic Index, the number of Ephemeroptera, Plecoptera, and Trichoptera (EPT) taxa, and the ratio of individuals in EPT to Chironomidae taxa. Canonical correspondence analysis of 57 of 91 sites showed that stream width, site altitude, latitude, soil permeability, water temperature, and mean annual precipitation were the most important environmental variables describing variance in the species-environment relation. Stream width and soil permeability reflected streamflow characteristics of a site, whereas site altitude and latitude were factors related to general climatic conditions. Mean annual precipitation related to both streamflow and climatic conditions.

  4. Cluster and Multiple Correspondence Analyses in Rheumatology: Paths to Uncovering Relationships in a Sea of Data.

    PubMed

    Han, Lu; Benseler, Susanne M; Tyrrell, Pascal N

    2018-05-01

    Rheumatic diseases encompass a wide range of conditions caused by inflammation and dysregulation of the immune system resulting in organ damage. Research in these heterogeneous diseases benefits from multivariate methods. The aim of this review was to describe and evaluate current literature in rheumatology regarding cluster analysis and correspondence analysis. A systematic review showed an increase in studies making use of these 2 methods. However, standardization in how these methods are applied and reported is needed. Researcher expertise was determined to be the main barrier to considering these approaches, whereas education and collaborating with a biostatistician were suggested ways forward. Copyright © 2018 Elsevier Inc. All rights reserved.

  5. Measuring the Indonesian provinces competitiveness by using PCA technique

    NASA Astrophysics Data System (ADS)

    Runita, Ditha; Fajriyah, Rohmatul

    2017-12-01

    Indonesia is a country with a vast territory and 34 provinces. Building local competitiveness is critical to enhance the long-term national competitiveness, especially for a country as diverse as Indonesia. A competitive local government can attract and maintain successful firms and increase living standards for its inhabitants, because investment and skilled workers gravitate from uncompetitive regions to more competitive ones. Although there are other methods for measuring competitiveness, here we demonstrate a simple method using principal component analysis (PCA). It can be applied directly to correlated, multivariate data. The analysis of Indonesian provinces yields 3 clusters based on the competitiveness measurement: Bad, Good and Best performing provinces.

  6. Potential of SNP markers for the characterization of Brazilian cassava germplasm.

    PubMed

    de Oliveira, Eder Jorge; Ferreira, Cláudia Fortes; da Silva Santos, Vanderlei; de Jesus, Onildo Nunes; Oliveira, Gilmara Alvarenga Fachardo; da Silva, Maiane Suzarte

    2014-06-01

    High-throughput markers, such as SNPs, along with different methodologies were used to evaluate the applicability of the Bayesian approach and the multivariate analysis in structuring the genetic diversity in cassavas. The objective of the present work was to evaluate the diversity and genetic structure of the largest cassava germplasm bank in Brazil. Complementary methodological approaches such as discriminant analysis of principal components (DAPC), Bayesian analysis and molecular analysis of variance (AMOVA) were used to understand the structure and diversity of 1,280 accessions genotyped using 402 single nucleotide polymorphism markers. The genetic diversity (0.327) and the average observed heterozygosity (0.322) were high considering the bi-allelic markers. In terms of population, the presence of a complex genetic structure was observed indicating the formation of 30 clusters by DAPC and 34 clusters by Bayesian analysis. Both methodologies presented difficulties and controversies in terms of the allocation of some accessions to specific clusters. However, the clusters suggested by the DAPC analysis seemed to be more consistent for presenting higher probability of allocation of the accessions within the clusters. Prior information related to breeding patterns and geographic origins of the accessions were not sufficient for providing clear differentiation between the clusters according to the AMOVA analysis. In contrast, the FST was maximized when considering the clusters suggested by the Bayesian and DAPC analyses. The high frequency of germplasm exchange between producers and the subsequent alteration of the name of the same material may be one of the causes of the low association between genetic diversity and geographic origin. The results of this study may benefit cassava germplasm conservation programs, and contribute to the maximization of genetic gains in breeding programs.

  7. Fast clustering using adaptive density peak detection.

    PubMed

    Wang, Xiao-Feng; Xu, Yifan

    2017-12-01

    Common limitations of clustering methods include slow algorithm convergence, instability with respect to the pre-specification of a number of intrinsic parameters, and a lack of robustness to outliers. A recent clustering approach proposed a fast search algorithm of cluster centers based on their local densities. However, the selection of the key intrinsic parameters in the algorithm was not systematically investigated. It is relatively difficult to estimate the "optimal" parameters since the original definition of the local density in the algorithm is based on a truncated counting measure. In this paper, we propose a clustering procedure with adaptive density peak detection, where the local density is estimated through nonparametric multivariate kernel estimation. The model parameter can then be calculated from equations with statistical theoretical justification. We also develop an automatic cluster centroid selection method through maximizing an average silhouette index. The advantage and flexibility of the proposed method are demonstrated through simulation studies and the analysis of a few benchmark gene expression data sets. The method runs in a single step without any iteration and thus is fast and has great potential for big data analysis. A user-friendly R package ADPclust is developed for public use.
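
    The density-peak idea summarized in this record can be sketched in a few lines. This is a simplified illustration, not the ADPclust algorithm: the kernel bandwidth and the number of centers are set by hand here (both are assumptions), whereas the abstract describes estimating the density nonparametrically and choosing centers by maximizing an average silhouette index. The function name `density_peaks` and the toy data are hypothetical.

```python
import math

def density_peaks(points, bandwidth, n_centers):
    """Simplified density-peak clustering.

    rho[i]   : Gaussian-kernel local density of point i
    delta[i] : distance from i to the nearest point of higher density
    Centers are the points with the largest rho * delta; every other point
    inherits the label of its nearest higher-density neighbour.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    rho = [
        sum(math.exp(-0.5 * (dist[i][j] / bandwidth) ** 2)
            for j in range(n) if j != i)
        for i in range(n)
    ]
    order = sorted(range(n), key=lambda i: -rho[i])  # densest first
    delta = [0.0] * n
    parent = [-1] * n
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = max(dist[i])  # convention for the global density peak
            continue
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i], parent[i] = dist[i][j], j
    centers = sorted(range(n), key=lambda i: -rho[i] * delta[i])[:n_centers]
    labels = [-1] * n
    for label, c in enumerate(centers):
        labels[c] = label
    for i in order:  # parents are always labelled before their children
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return centers, labels

# Two hypothetical, well-separated blobs.
data = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
        (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
centers, labels = density_peaks(data, bandwidth=1.0, n_centers=2)
```

    Because labels are assigned in one pass over the density-sorted points, no iterative refinement is needed, which is the property the abstract highlights.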

  8. Finding Groups Using Model-based Cluster Analysis: Heterogeneous Emotional Self-regulatory Processes and Heavy Alcohol Use Risk

    PubMed Central

    Mun, Eun-Young; von Eye, Alexander; Bates, Marsha E.; Vaschillo, Evgeny G.

    2010-01-01

    Model-based cluster analysis is a new clustering procedure to investigate population heterogeneity utilizing finite mixture multivariate normal densities. It is an inferentially based, statistically principled procedure that allows comparison of non-nested models using the Bayesian Information Criterion (BIC) to compare multiple models and identify the optimum number of clusters. The current study clustered 36 young men and women based on their baseline heart rate (HR) and HR variability (HRV), chronic alcohol use, and reasons for drinking. Two cluster groups were identified and labeled High Alcohol Risk and Normative groups. Compared to the Normative group, individuals in the High Alcohol Risk group had higher levels of alcohol use and more strongly endorsed disinhibition and suppression reasons for use. The High Alcohol Risk group showed significant HRV changes in response to positive and negative emotional and appetitive picture cues, compared to neutral cues. In contrast, the Normative group showed a significant HRV change only to negative cues. Findings suggest that the individuals with autonomic self-regulatory difficulties may be more susceptible to heavy alcohol use and use alcohol for emotional regulation. PMID:18331138
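
    For readers unfamiliar with the model-comparison step mentioned in this record, the Bayesian Information Criterion for a G-component mixture can be written as below. This is the standard textbook form, not a formula taken from the study: the parameter count assumes unconstrained covariance matrices in d dimensions, and the sign convention (smaller is better) is one of two in common use; some software reports the negated quantity, so that larger is better.

```latex
\mathrm{BIC}(G) = -2 \ln \hat{L}_G + k_G \ln n,
\qquad
k_G = \underbrace{(G - 1)}_{\text{mixing weights}}
    + \underbrace{G\,d}_{\text{means}}
    + \underbrace{G\,\frac{d(d+1)}{2}}_{\text{covariances}}
```

    Here \hat{L}_G is the maximized likelihood of the G-component model and n is the sample size (n = 36 in this record). Candidate values of G are compared by their BIC, which penalizes the extra parameters that each additional cluster introduces.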

  9. Subgroups of advanced cancer patients clustered by their symptom profiles: quality-of-life outcomes.

    PubMed

    Husain, Amna; Myers, Jeff; Selby, Debbie; Thomson, Barbara; Chow, Edward

    2011-11-01

    Symptom cluster analysis is a new frontier of research in symptom management. This study clustered patients by their symptom profiles to identify subgroups that may be at higher risk for poor quality of life (QOL) and that may, therefore, benefit most from targeted interventions. Longitudinal study of metastatic cancer patients using the Edmonton Symptom Assessment Scale (ESAS). We generated two-, three-, and four-cluster subgroups and examined the relationship of cluster membership with patient outcomes. To address the problem of missing longitudinal data, we developed a novel outcome variable (QualTime) that measures both QOL and time in study. Two hundred and twenty-one patients with a mean Palliative Performance Scale (PPS) of 59.1 were enrolled. The three-cluster model was chosen for further analysis. The low-burden subgroup had all low severity symptom scores. The intermediate subgroup separates from the low-burden group on the "debility" profile of fatigue, drowsiness, appetite, and well-being. The high-burden group separates from the intermediate-burden group on pain, depression, and anxiety. At baseline, PPS (p=0.0003) and cluster membership (p<0.0001) contributed significantly to global QOL. In univariate analysis, cluster membership was related to the longitudinal outcome, QualTime. In a multivariate model, the relationship of PPS to QualTime was still significant (p=0.0002), but subgroup membership was no longer significant (p=0.1009). PPS is a stronger predictor of the longitudinal variable than cluster subgroups; however, cluster subgroups provide a target for clinical interventions that may improve QOL.

  10. Heavy metal contamination of agricultural soils affected by mining activities around the Ganxi River in Chenzhou, Southern China.

    PubMed

    Ma, Li; Sun, Jing; Yang, Zhaoguang; Wang, Lin

    2015-12-01

    Heavy metal contamination has attracted widespread attention due to the strong toxicity and persistence of heavy metals. The Ganxi River, located in Chenzhou City, Southern China, has been severely polluted by lead/zinc ore mining activities. This work investigated the heavy metal pollution in agricultural soils around the Ganxi River. The total concentrations of heavy metals were determined by inductively coupled plasma-mass spectrometry. The potential risk associated with the heavy metals in soil was assessed by Nemerow comprehensive index and potential ecological risk index. In both methods, the study area was rated as very high risk. Multivariate statistical methods including Pearson's correlation analysis, hierarchical cluster analysis, and principal component analysis were employed to evaluate the relationships between heavy metals, as well as the correlation between heavy metals and pH, to identify the metal sources. Three distinct clusters have been observed by hierarchical cluster analysis. In principal component analysis, a total of two components were extracted to explain over 90% of the total variance, both of which were associated with anthropogenic sources.

  11. Determining the Number of Component Clusters in the Standard Multivariate Normal Mixture Model Using Model-Selection Criteria.

    DTIC Science & Technology

    1983-06-16

    has been advocated by Gnanadesikan and Wilk (1969), and others in the literature. This suggests that, if we use the formal significance test type...American Statistical Asso., 62, 1159-1178. Gnanadesikan, R., and Wilk, M. B. (1969). Data Analytic Methods in Multivariate Statistical Analysis. In

  12. Study on the application of MRF and the D-S theory to image segmentation of the human brain and quantitative analysis of the brain tissue

    NASA Astrophysics Data System (ADS)

    Guan, Yihong; Luo, Yatao; Yang, Tao; Qiu, Lei; Li, Junchang

    2012-01-01

    The spatial information features of a Markov random field (MRF) image are used in image segmentation; they can effectively remove noise and yield more accurate segmentation results. Based on the fuzziness and clustering of pixel grayscale information, we find the clustering centers of the different tissues and the background of the medical image through the fuzzy c-means clustering method. Then we find each threshold point for multi-threshold segmentation through the two-dimensional histogram method, and segment the image. Multivariate information is fused on the basis of the Dempster-Shafer evidence theory to achieve image fusion and segmentation. This paper adopts the above three theories to propose a new human brain image segmentation method. Experimental results show that the segmentation result is more in line with human vision, and is of vital significance to accurate analysis and application of tissues.

  13. Autonomic specificity of basic emotions: evidence from pattern classification and cluster analysis.

    PubMed

    Stephens, Chad L; Christie, Israel C; Friedman, Bruce H

    2010-07-01

    Autonomic nervous system (ANS) specificity of emotion remains controversial in contemporary emotion research, and has received mixed support over decades of investigation. This study was designed to replicate and extend psychophysiological research, which has used multivariate pattern classification analysis (PCA) in support of ANS specificity. Forty-nine undergraduates (27 women) listened to emotion-inducing music and viewed affective films while a montage of ANS variables, including heart rate variability indices, peripheral vascular activity, systolic time intervals, and electrodermal activity, were recorded. Evidence for ANS discrimination of emotion was found via PCA with 44.6% of overall observations correctly classified into the predicted emotion conditions, using ANS variables (z=16.05, p<.001). Cluster analysis of these data indicated a lack of distinct clusters, which suggests that ANS responses to the stimuli were nomothetic and stimulus-specific rather than idiosyncratic and individual-specific. Collectively these results further confirm and extend support for the notion that basic emotions have distinct ANS signatures. Copyright © 2010 Elsevier B.V. All rights reserved.

  14. Research Update: Spatially resolved mapping of electronic structure on atomic level by multivariate statistical analysis

    NASA Astrophysics Data System (ADS)

    Belianinov, Alex; Ganesh, Panchapakesan; Lin, Wenzhi; Sales, Brian C.; Sefat, Athena S.; Jesse, Stephen; Pan, Minghu; Kalinin, Sergei V.

    2014-12-01

    Atomic level spatial variability of electronic structure in Fe-based superconductor FeTe0.55Se0.45 (Tc = 15 K) is explored using current-imaging tunneling-spectroscopy. Multivariate statistical analysis of the data differentiates regions of dissimilar electronic behavior that can be identified with the segregation of chalcogen atoms, as well as boundaries between terminations and near neighbor interactions. Subsequent clustering analysis allows identification of the spatial localization of these dissimilar regions. Similar statistical analysis of modeled calculated density of states of chemically inhomogeneous FeTe1-xSex structures further confirms that the two types of chalcogens, i.e., Te and Se, can be identified by their electronic signature and differentiated by their local chemical environment. This approach allows detailed chemical discrimination of the scanning tunneling microscopy data including separation of atomic identities, proximity, and local configuration effects and can be universally applicable to chemically and electronically inhomogeneous surfaces.

  15. Development of methodology for identification the nature of the polyphenolic extracts by FTIR associated with multivariate analysis.

    PubMed

    Grasel, Fábio dos Santos; Ferrão, Marco Flôres; Wolf, Carlos Rodolfo

    2016-01-15

    Tannins are polyphenolic compounds of complex structures formed by secondary metabolism in several plants. These polyphenolic compounds have different applications, such as drugs, anti-corrosion agents, flocculants, and tanning agents. This study analyses six different types of polyphenolic extracts by Fourier transform infrared spectroscopy (FTIR) combined with multivariate analysis. Through both principal component analysis (PCA) and hierarchical cluster analysis (HCA), we observed well-defined separation between condensed (quebracho and black wattle) and hydrolysable (valonea, chestnut, myrobalan, and tara) tannins. For hydrolysable tannins, it was also possible to observe the formation of two different subgroups between samples of chestnut and valonea and between samples of tara and myrobalan. Among all samples analysed, the chestnut and valonea showed the greatest similarity, indicating that these extracts contain equivalent chemical compositions and structure and, therefore, similar properties. Copyright © 2015 Elsevier B.V. All rights reserved.

  16. Clustering Multivariate Time Series Using Hidden Markov Models

    PubMed Central

    Ghassempour, Shima; Girosi, Federico; Maeder, Anthony

    2014-01-01

    In this paper we describe an algorithm for clustering multivariate time series with variables taking both categorical and continuous values. Time series of this type are frequent in health care, where they represent the health trajectories of individuals. The problem is challenging because categorical variables make it difficult to define a meaningful distance between trajectories. We propose an approach based on Hidden Markov Models (HMMs), where we first map each trajectory into an HMM, then define a suitable distance between HMMs and finally proceed to cluster the HMMs with a method based on a distance matrix. We test our approach on a simulated, but realistic, data set of 1,255 trajectories of individuals of age 45 and over, on a synthetic validation set with known clustering structure, and on a smaller set of 268 trajectories extracted from the longitudinal Health and Retirement Survey. The proposed method can be implemented quite simply using standard packages in R and Matlab and may be a good candidate for solving the difficult problem of clustering multivariate time series with categorical variables using tools that do not require advanced statistical knowledge, and therefore are accessible to a wide range of researchers. PMID:24662996

  17. Recognizing different tissues in human fetal femur cartilage by label-free Raman microspectroscopy

    NASA Astrophysics Data System (ADS)

    Kunstar, Aliz; Leijten, Jeroen; van Leuveren, Stefan; Hilderink, Janneke; Otto, Cees; van Blitterswijk, Clemens A.; Karperien, Marcel; van Apeldoorn, Aart A.

    2012-11-01

    Traditionally, the composition of bone and cartilage is determined by standard histological methods. We used Raman microscopy, which provides a molecular "fingerprint" of the investigated sample, to detect differences between the zones in human fetal femur cartilage without the need for additional staining or labeling. Raman area scans were made from the (pre)articular cartilage, resting, proliferative, and hypertrophic zones of growth plate and endochondral bone within human fetal femora. Multivariate data analysis was performed on Raman spectral datasets to construct cluster images with corresponding cluster averages. Cluster analysis resulted in detection of individual chondrocyte spectra that could be separated from cartilage extracellular matrix (ECM) spectra and was verified by comparing cluster images with intensity-based Raman images for the deoxyribonucleic acid/ribonucleic acid (DNA/RNA) band. Specific dendrograms were created using Ward's clustering method, and principal component analysis (PCA) was performed with the separated and averaged Raman spectra of cells and ECM of all measured zones. Overall (dis)similarities between measured zones were effectively visualized on the dendrograms and main spectral differences were revealed by PCA allowing for label-free detection of individual cartilaginous zones and for label-free evaluation of proper cartilaginous matrix formation for future tissue engineering and clinical purposes.

  18. Description and typology of intensive Chios dairy sheep farms in Greece.

    PubMed

    Gelasakis, A I; Valergakis, G E; Arsenos, G; Banos, G

    2012-06-01

    The aim was to assess the intensified dairy sheep farming systems of the Chios breed in Greece, establishing a typology that may properly describe and characterize them. The study included the total of the 66 farms of the Chios sheep breeders' cooperative Macedonia. Data were collected using a structured direct questionnaire for in-depth interviews, including questions properly selected to obtain a general description of farm characteristics and overall management practices. A multivariate statistical analysis was used on the data to obtain the most appropriate typology. Initially, principal component analysis was used to produce uncorrelated variables (principal components), which would be used for the subsequent cluster analysis. The number of clusters was decided using hierarchical cluster analysis, whereas the farms were allocated to 4 clusters using k-means cluster analysis. The identified clusters were described and afterward compared using one-way ANOVA or a chi-squared test. The main differences were evident on land availability and use, facility and equipment availability and type, expansion rates, and application of preventive flock health programs. In general, cluster 1 included newly established, intensive, well-equipped, specialized farms and cluster 2 included well-established farms with balanced sheep and feed/crop production. Cluster 3 comprised small-flock farms focusing more on arable crops than on sheep farming with a tendency to evolve toward cluster 2, whereas cluster 4 included farms representing a rather conservative form of Chios sheep breeding with low/intermediate inputs and choosing not to focus on feed/crop production. In the studied set of farms, 4 different farmer attitudes were evident: 1) farming disrupts sheep breeding; feed should be purchased and economies of scale will decrease costs (mainly cluster 1), 2) only exercise/pasture land is necessary; at least part of the feed (pasture) must be home-grown to decrease costs (clusters 1 and 4), 3) providing pasture to sheep is essential; on-farm feed production decreases costs (mainly cluster 3), and 4) large-scale farming (feed production and cash crops) does not disrupt sheep breeding; all feed must be produced on-farm to decrease costs (mainly cluster 3). Conducting a profitability analysis among different clusters, exploring and discovering the most beneficial levels of intensified management and capital investment should now be considered. Copyright © 2012 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.

  19. Analysis of the mutations induced by conazole fungicides in vivo.

    PubMed

    Ross, Jeffrey A; Leavitt, Sharon A

    2010-05-01

    The mouse liver tumorigenic conazole fungicides triadimefon and propiconazole have previously been shown to be in vivo mouse liver mutagens in the Big Blue transgenic mutation assay when administered in feed at tumorigenic doses, whereas the non-tumorigenic conazole myclobutanil was not mutagenic. DNA sequencing of the mutants recovered from each treatment group as well as from animals receiving control diet was conducted to gain additional insight into the mode of action by which tumorigenic conazoles induce mutations. Relative dinucleotide mutabilities (RDMs) were calculated for each possible dinucleotide in each treatment group and then examined by multivariate statistical analysis techniques. Unsupervised hierarchical clustering analysis of RDM values segregated two independent control groups together, along with the non-tumorigen myclobutanil. The two tumorigenic conazoles clustered together in a distinct grouping. Partitioning around medoids of RDM values into two clusters also grouped triadimefon and propiconazole together in one cluster and the two control groups and myclobutanil together in a second cluster. Principal component analysis of these results identified two components that account for 88.3% of the variability in the points. Taken together, these results are consistent with the hypothesis that propiconazole- and triadimefon-induced mutations do not represent clonal expansion of background mutations and support the hypothesis that they arise from the accumulation of reactive electrophilic metabolic intermediates within the liver in vivo.

  20. Objective and Perceived Weight: Associations with Risky Adolescent Sexual Behavior

    PubMed Central

    Akers, Aletha Y.; Cohen, Elan D.; Marshal, Michael P.; Roebuck, Geoff; Yu, Lan; Hipwell, Alison E.

    2016-01-01

    CONTEXT Studies have shown that obesity is associated with increased sexual risk-taking, particularly among adolescent females, but the relationships between obesity, perceived weight and sexual risk behaviors are poorly understood. METHODS Integrative data analysis was performed that combined baseline data from the 1994–1995 National Longitudinal Study of Adolescent Health (from 17,606 respondents in grades 7–12) and the 1997 National Longitudinal Survey of Youth (from 7,752 respondents aged 12–16). Using six sexual behaviors measured in both data sets (age at first intercourse, various measures of contraceptive use and number of partners), cluster analysis was conducted that identified five distinct behavior clusters. Multivariate ordinal logistic regression analysis examined associations between adolescents’ weight status (categorized as underweight, normal-weight, overweight or obese) and weight perception and their cluster membership. RESULTS Among males, being underweight, rather than normal-weight, was negatively associated with membership in increasingly risky clusters (odds ratio, 0.5), as was the perception of being overweight, as opposed to about the right weight (0.8). However, being overweight was positively associated with males’ membership in increasingly risky clusters (1.3). Among females, being obese, rather than normal-weight, was negatively correlated with membership in increasingly risky clusters (0.8), while the perception of being overweight was positively correlated with such membership (1.1). CONCLUSIONS Both objective and subjective assessments of weight are associated with the clustering of risky sexual behaviors among adolescents, and these behavioral patterns differ by gender. PMID:27608419

  1. Objective and Perceived Weight: Associations with Risky Adolescent Sexual Behavior.

    PubMed

    Akers, Aletha Y; Cohen, Elan D; Marshal, Michael P; Roebuck, Geoff; Yu, Lan; Hipwell, Alison E

    2016-09-01

    Studies have shown that obesity is associated with increased sexual risk-taking, particularly among adolescent females, but the relationships between obesity, perceived weight and sexual risk behaviors are poorly understood. Integrative data analysis was performed that combined baseline data from the 1994-1995 National Longitudinal Study of Adolescent Health (from 17,606 respondents in grades 7-12) and the 1997 National Longitudinal Survey of Youth (from 7,752 respondents aged 12-16). Using six sexual behaviors measured in both data sets (age at first intercourse, various measures of contraceptive use and number of partners), cluster analysis was conducted that identified five distinct behavior clusters. Multivariate ordinal logistic regression analysis examined associations between adolescents' weight status (categorized as underweight, normal-weight, overweight or obese) and weight perception and their cluster membership. Among males, being underweight, rather than normal-weight, was negatively associated with membership in increasingly risky clusters (odds ratio, 0.5), as was the perception of being overweight, as opposed to about the right weight (0.8). However, being overweight was positively associated with males' membership in increasingly risky clusters (1.3). Among females, being obese, rather than normal-weight, was negatively correlated with membership in increasingly risky clusters (0.8), while the perception of being overweight was positively correlated with such membership (1.1). Both objective and subjective assessments of weight are associated with the clustering of risky sexual behaviors among adolescents, and these behavioral patterns differ by gender. Copyright © 2016 by the Guttmacher Institute.

  2. Farm, household, and farmer characteristics associated with changes in management practices and technology adoption among dairy smallholders.

    PubMed

    Martínez-García, Carlos Galdino; Ugoretz, Sarah Janes; Arriaga-Jordán, Carlos Manuel; Wattiaux, Michel André

    2015-02-01

    This study explored whether technology adoption and changes in management practices were associated with farm structure, household, and farmer characteristics and to identify processes that may foster productivity and sustainability of small-scale dairy farming in the central highlands of Mexico. Factor analysis of survey data from 44 smallholders identified three factors (related to farm size, farmer's engagement, and household structure) that explained 70% of cumulative variance. The subsequent hierarchical cluster analysis yielded three clusters. Cluster 1 included the most senior farmers with fewest years of education but greatest years of experience. Cluster 2 included farmers who reported access to extension, cooperative services, and more management changes. Cluster 2 obtained 25% and 35% more milk than farmers in clusters 1 and 3, respectively. Cluster 3 included the youngest farmers, with most years of education and greatest availability of family labor. Access to a network and membership in a community of peers appeared as important contributors to success. Smallholders gravitated towards easy to implement technologies that have immediate benefits. Nonusers of high investment technologies found them unaffordable because of cost, insufficient farm size, and lack of knowledge or reliable electricity. Multivariate analysis may be a useful tool in planning extension activities and organizing channels of communication to effectively target farmers with varying needs, constraints, and motivations for change and in identifying farmers who may exemplify models of change for others who manage farms that are structurally similar but performing at a lower level.

  3. Evaluation of genetic diversity among soybean (Glycine max) genotypes using univariate and multivariate analysis.

    PubMed

    Oliveira, M M; Sousa, L B; Reis, M C; Silva Junior, E G; Cardoso, D B O; Hamawaki, O T; Nogueira, A P O

    2017-05-31

    The study of genetic diversity is of paramount importance in breeding programs because it allows the selection of genetically divergent parents that have the agronomic traits desired by the breeder. This study aimed to characterize the genetic divergence among 24 soybean genotypes through their agronomic traits, using multivariate clustering methods to select potential parents for promising hybrid combinations. The six agronomic traits evaluated were number of days to flowering and to maturity, plant height at flowering and at maturity, insertion height of the first pod, and yield. Genetic divergence was evaluated by multivariate analysis that first estimated the Mahalanobis generalized distance (D²), then clustered the genotypes using Tocher's optimization method and the unweighted pair group method with arithmetic average (UPGMA). Tocher's optimization method and UPGMA agreed with each other on group constitution, with eight distinct groups formed according to Tocher's method and seven distinct groups using UPGMA. The trait number of days to flowering (45.66%) contributed most to the dissimilarity between genotypes and should be one of the main traits considered by the breeder when choosing parents in soybean breeding programs. The genetic variability allowed the identification of dissimilar genotypes with superior performance. The hybridizations UFU 18 x UFUS CARAJÁS, UFU 15 x UFU 13, and UFU 13 x UFUS CARAJÁS are promising for obtaining superior segregating populations, which enable the development of more productive genotypes.
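
    The UPGMA step named in this abstract can be sketched as follows. This is a minimal illustration only, assuming a precomputed symmetric dissimilarity matrix (for example Mahalanobis D² values); the function name `upgma`, the labels and the distances are hypothetical and do not come from the study.

```python
def upgma(dist, labels):
    """Average-linkage (UPGMA) clustering on a symmetric distance matrix.

    Returns the merge history as (members_a, members_b, height) tuples.
    """
    n = len(labels)
    clusters = {i: (labels[i],) for i in range(n)}
    # Pairwise distances between current clusters, keyed by (low_id, high_id).
    d = {(i, j): dist[i][j] for i in range(n) for j in range(i + 1, n)}
    merges = []
    next_id = n
    while len(clusters) > 1:
        # Pick the closest pair of clusters.
        (a, b), height = min(d.items(), key=lambda kv: kv[1])
        size_a, size_b = len(clusters[a]), len(clusters[b])
        merges.append((clusters[a], clusters[b], height))
        # Distance from the merged cluster to every other cluster is the
        # size-weighted mean of the two old distances.
        new_d = {}
        for c in clusters:
            if c in (a, b):
                continue
            d_ac = d[(min(a, c), max(a, c))]
            d_bc = d[(min(b, c), max(b, c))]
            new_d[c] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        # Drop every entry that involves the two merged clusters.
        d = {k: v for k, v in d.items() if a not in k and b not in k}
        for c, v in new_d.items():
            d[(c, next_id)] = v  # next_id is larger than any existing id
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        next_id += 1
    return merges


# Hypothetical dissimilarities among four genotypes (not data from the study).
example_labels = ["G1", "G2", "G3", "G4"]
example_dist = [
    [0, 2, 6, 10],
    [2, 0, 6, 10],
    [6, 6, 0, 4],
    [10, 10, 4, 0],
]
for left, right, height in upgma(example_dist, example_labels):
    print(left, right, height)
```

    Cutting the resulting merge history at a chosen height gives the groups; how the study chose its cut-off is not stated in the abstract.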

  4. A Multivariate Model and Analysis of Competitive Strategy in the U.S. Hardwood Lumber Industry

    Treesearch

    Robert J. Bush; Steven A. Sinclair

    1991-01-01

    Business-level competitive strategy in the hardwood lumber industry was modeled through the identification of strategic groups among large U.S. hardwood lumber producers. Strategy was operationalized using a measure based on the variables developed by Dess and Davis (1984). Factor and cluster analyses were used to define strategic groups along the dimensions of cost...

  5. Gap Shape Classification using Landscape Indices and Multivariate Statistics

    PubMed Central

    Wu, Chih-Da; Cheng, Chi-Chuan; Chang, Che-Chang; Lin, Chinsu; Chang, Kun-Cheng; Chuang, Yung-Chung

    2016-01-01

    This study proposed a novel methodology to classify the shape of gaps using landscape indices and multivariate statistics. Patch-level indices were used to collect the qualified shape and spatial configuration characteristics for canopy gaps in the Lienhuachih Experimental Forest in Taiwan in 1998 and 2002. Non-hierarchical cluster analysis was used to assess the optimal number of gap clusters and canonical discriminant analysis was used to generate the discriminant functions for canopy gap classification. The gaps for the two periods were optimally classified into three categories. In general, gap type 1 had a more complex shape, gap type 2 was more elongated and gap type 3 had the largest gaps that were more regular in shape. The results were evaluated using Wilks’ lambda as satisfactory (p < 0.001). The agreement rate of confusion matrices exceeded 96%. Differences in gap characteristics between the classified gap types that were determined using a one-way ANOVA showed a statistical significance in all patch indices (p = 0.00), except for the Euclidean nearest neighbor distance (ENN) in 2002. Taken together, these results demonstrated the feasibility and applicability of the proposed methodology to classify the shape of a gap. PMID:27901127

  6. Gap Shape Classification using Landscape Indices and Multivariate Statistics.

    PubMed

    Wu, Chih-Da; Cheng, Chi-Chuan; Chang, Che-Chang; Lin, Chinsu; Chang, Kun-Cheng; Chuang, Yung-Chung

    2016-11-30

    This study proposed a novel methodology to classify the shape of gaps using landscape indices and multivariate statistics. Patch-level indices were used to collect the qualified shape and spatial configuration characteristics for canopy gaps in the Lienhuachih Experimental Forest in Taiwan in 1998 and 2002. Non-hierarchical cluster analysis was used to assess the optimal number of gap clusters and canonical discriminant analysis was used to generate the discriminant functions for canopy gap classification. The gaps for the two periods were optimally classified into three categories. In general, gap type 1 had a more complex shape, gap type 2 was more elongated and gap type 3 had the largest gaps that were more regular in shape. The results were evaluated using Wilks' lambda as satisfactory (p < 0.001). The agreement rate of confusion matrices exceeded 96%. Differences in gap characteristics between the classified gap types that were determined using a one-way ANOVA showed a statistical significance in all patch indices (p = 0.00), except for the Euclidean nearest neighbor distance (ENN) in 2002. Taken together, these results demonstrated the feasibility and applicability of the proposed methodology to classify the shape of a gap.

  7. Cytologic separation of branchial cleft cyst from metastatic cystic squamous cell carcinoma: A multivariate analysis of nineteen cytomorphologic features.

    PubMed

    Layfield, Lester J; Esebua, Magda; Schmidt, Robert L

    2016-07-01

    The separation of branchial cleft cysts from metastatic cystic squamous cell carcinomas in adults can be clinically and cytologically challenging. Diagnostic accuracy for separation is reported to be as low as 75% prompting some authors to recommend frozen section evaluation of suspected branchial cleft cysts before resection. We evaluated 19 cytologic features to determine which were useful in this distinction. Thirty-three cases (21 squamous carcinoma and 12 branchial cysts) of histologically confirmed cystic lesions of the lateral neck were graded for the presence or absence of 19 cytologic features by two cytopathologists. The cytologic features were analyzed for agreement between observers and underwent multivariate analysis for correlation with the diagnosis of carcinoma. Interobserver agreement was greatest for increased nuclear/cytoplasmic (N/C) ratio, pyknotic nuclei, and irregular nuclear membranes. Recursive partitioning analysis showed increased N/C ratio, small clusters of cells, and irregular nuclear membranes were the best discriminators. The distinction of branchial cleft cysts from cystic squamous cell carcinoma is cytologically difficult. Both digital image analysis and p16 testing have been suggested as aids in this separation, but analysis of cytologic features remains the main method for diagnosis. In an analysis of 19 cytologic features, we found that high nuclear cytoplasmic ratio, irregular nuclear membranes, and small cell clusters were most helpful in their distinction. Diagn. Cytopathol. 2016;44:561-567. © 2016 Wiley Periodicals, Inc.

  8. Rapid quality assessment of Radix Aconiti Preparata using direct analysis in real time mass spectrometry.

    PubMed

    Zhu, Hongbin; Wang, Chunyan; Qi, Yao; Song, Fengrui; Liu, Zhiqiang; Liu, Shuying

    2012-11-08

    This study presents a novel and rapid method to identify chemical markers for the quality control of Radix Aconiti Preparata, a traditional herbal medicine widely used worldwide. In the method, samples prepared with a fast extraction procedure were analyzed using direct analysis in real time mass spectrometry (DART MS) combined with multivariate data analysis. At present, the quality assessment of Radix Aconiti Preparata is based on the two processing methods recorded in the Chinese Pharmacopoeia for the purpose of reducing the toxicity of Radix Aconiti and ensuring its clinical therapeutic efficacy. To ensure safety and efficacy in clinical use, the degree of processing of Radix Aconiti should be well controlled and assessed. In the paper, hierarchical cluster analysis and principal component analysis were performed to evaluate the DART MS data of Radix Aconiti Preparata samples processed for different times. The results showed that the well-processed Radix Aconiti Preparata, the unqualified processed samples and the raw Radix Aconiti could be clustered reasonably according to their constituents. The loading plot shows that the chemical markers with the most influence on the discrimination between qualified and unqualified samples were mainly monoester diterpenoid aconitines and diester diterpenoid aconitines, i.e. benzoylmesaconine, hypaconitine, mesaconitine, neoline, benzoylhypaconine, benzoylaconine, fuziline, aconitine and 10-OH-mesaconitine. The established DART MS approach in combination with multivariate data analysis provides a very flexible and reliable method for quality assessment of toxic herbal medicine. Copyright © 2012 Elsevier B.V. All rights reserved.

  9. Psychosocial Clusters and their Associations with Well-Being and Health: An Empirical Strategy for Identifying Psychosocial Predictors Most Relevant to Racially/Ethnically Diverse Women’s Health

    PubMed Central

    Jabson, Jennifer M.; Bowen, Deborah; Weinberg, Janice; Kroenke, Candyce; Luo, Juhua; Messina, Catherine; Shumaker, Sally; Tindle, Hilary A.

    2016-01-01

    BACKGROUND Strategies for identifying the most relevant psychosocial predictors in studies of racial/ethnic minority women’s health are limited because they largely exclude cultural influences and they assume that psychosocial predictors are independent. This paper proposes and tests an empirical solution. METHODS Hierarchical cluster analysis, conducted with data from 140,652 Women’s Health Initiative participants, identified clusters among individual psychosocial predictors. Multivariable analyses tested associations between clusters and health outcomes. RESULTS A Social Cluster and a Stress Cluster were identified. The Social Cluster was positively associated with well-being and inversely associated with chronic disease index, and the Stress Cluster was inversely associated with well-being and positively associated with chronic disease index. As hypothesized, the magnitude of association between clusters and outcomes differed by race/ethnicity. CONCLUSIONS By identifying psychosocial clusters and their associations with health, we have taken an important step toward understanding how individual psychosocial predictors interrelate and how empirically formed Stress and Social clusters relate to health outcomes. This study has also demonstrated important insight about differences in associations between these psychosocial clusters and health among racial/ethnic minorities. These differences could signal the best pathways for intervention modification and tailoring. PMID:27279761

  10. Impact of socioeconomic inequalities on geographic disparities in cancer incidence: comparison of methods for spatial disease mapping.

    PubMed

    Goungounga, Juste Aristide; Gaudart, Jean; Colonna, Marc; Giorgi, Roch

    2016-10-12

    The reliability of spatial statistics is often put into question because real spatial variations may not be found, especially in heterogeneous areas. Our objective was to compare empirically different cluster detection methods. We assessed their ability to find spatial clusters of cancer cases and evaluated the impact of the socioeconomic status (e.g., the Townsend index) on cancer incidence. Moran's I, the empirical Bayes index (EBI), and Potthoff-Whittinghill test were used to investigate the general clustering. The local cluster detection methods were: i) the spatial oblique decision tree (SpODT); ii) the spatial scan statistic of Kulldorff (SaTScan); and, iii) the hierarchical Bayesian spatial modeling (HBSM) in a univariate and multivariate setting. These methods were used with and without introducing the Townsend index of socioeconomic deprivation known to be related to the distribution of cancer incidence. Incidence data stemmed from the Cancer Registry of Isère and were limited to prostate, lung, colon-rectum, and bladder cancers diagnosed between 1999 and 2007 in men only. The study found a spatial heterogeneity (p < 0.01) and an autocorrelation for prostate (EBI = 0.02; p = 0.001), lung (EBI = 0.01; p = 0.019) and bladder (EBI = 0.007; p = 0.05) cancers. After introduction of the Townsend index, SaTScan failed to find cancer clusters. This introduction changed the results obtained with the other methods. SpODT identified five spatial classes (p < 0.05): four in the Western and one in the Northern parts of the study area (standardized incidence ratios: 1.68, 1.39, 1.14, 1.12, and 1.16, respectively). In the univariate setting, the Bayesian smoothing method found the same clusters as the two other methods (RR >1.2). The multivariate HBSM found a spatial correlation between lung and bladder cancers (r = 0.6). In spatial analysis of cancer incidence, SpODT and HBSM may be used not only for cluster detection but also for searching for confounding or etiological factors in small areas. Moreover, the multivariate HBSM offers a flexible and meaningful modeling of spatial variations; it shows plausible previously unknown associations between various cancers.
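
    Moran's I, one of the global clustering statistics named in this abstract, can be sketched as follows. This is a minimal illustration, assuming a binary adjacency weight matrix and hypothetical area values; the function name `morans_i` is an assumption and the example is not the study's code or data. Significance testing (for example by permutation) is omitted.

```python
def morans_i(values, weights):
    """Global Moran's I for area values and a spatial weight matrix.

    I = (n / W) * sum_ij w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2
    where W is the sum of all weights.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_sum = sum(sum(row) for row in weights)
    cross = sum(
        weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n)
    )
    spread = sum(x * x for x in dev)
    return (n / w_sum) * (cross / spread)


# Four hypothetical areas arranged in a line; neighbours share a border.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
print(morans_i([1, 2, 3, 4], chain))  # smooth gradient: positive autocorrelation
print(morans_i([1, 0, 1, 0], chain))  # alternating values: negative autocorrelation
```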

  11. Buried landmine detection using multivariate normal clustering

    NASA Astrophysics Data System (ADS)

    Duston, Brian M.

    2001-10-01

    A Bayesian classification algorithm is presented for discriminating buried land mines from buried and surface clutter in Ground Penetrating Radar (GPR) signals. This algorithm is based on multivariate normal (MVN) clustering, where feature vectors are used to identify populations (clusters) of mines and clutter objects. The features are extracted from two-dimensional images created from ground penetrating radar scans. MVN clustering is used to determine the number of clusters in the data and to create probability density models for target and clutter populations, producing the MVN clustering classifier (MVNCC). The Bayesian Information Criteria (BIC) is used to evaluate each model to determine the number of clusters in the data. An extension of the MVNCC allows the model to adapt to local clutter distributions by treating each of the MVN cluster components as a Poisson process and adaptively estimating the intensity parameters. The algorithm is developed using data collected by the Mine Hunter/Killer Close-In Detector (MH/K CID) at prepared mine lanes. The Mine Hunter/Killer is a prototype mine detecting and neutralizing vehicle developed for the U.S. Army to clear roads of anti-tank mines.
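
    The use of an information criterion to choose the number of clusters can be sketched as follows. This is a minimal illustration, assuming full-covariance multivariate normal components and the convention BIC = k ln(n) - 2 ln(L), where lower is better (some software reports the negated quantity and maximizes it). The function names and the log-likelihood values are hypothetical; the abstract does not give the author's exact formulation.

```python
import math


def mvn_mixture_param_count(n_components, dim):
    """Free parameters of a mixture of full-covariance multivariate normals."""
    mixing_weights = n_components - 1
    means = n_components * dim
    covariances = n_components * dim * (dim + 1) // 2
    return mixing_weights + means + covariances


def bic(log_likelihood, n_params, n_samples):
    """Bayesian information criterion; lower values indicate a better trade-off."""
    return n_params * math.log(n_samples) - 2.0 * log_likelihood


def pick_components(log_likelihoods, dim, n_samples):
    """Return the component count with the lowest BIC, plus all scores."""
    scores = {
        g: bic(ll, mvn_mixture_param_count(g, dim), n_samples)
        for g, ll in log_likelihoods.items()
    }
    return min(scores, key=scores.get), scores


# Hypothetical maximized log-likelihoods for 1, 2 and 3 clusters on
# 200 two-dimensional feature vectors (illustrative numbers only).
best, scores = pick_components({1: -500.0, 2: -420.0, 3: -415.0}, dim=2, n_samples=200)
print(best, scores)
```

    In this made-up example the third component improves the likelihood only slightly, so the penalty term makes the two-cluster model the preferred one.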

  12. [Molecular epidemiologic study on Mycobacterium tuberculosis from drug resistance monitoring sites of Guangdong Province, 2015].

    PubMed

    Huang, X C; Guo, H X; Wu, Z H; Guo, C X; Wei, W J; Li, H C; Sun, Q; Zhang, C C; Li, Z Y; Chen, T; Zhong, Q; Zhou, L

    2017-05-12

    Objective: To understand the epidemiological characteristics and distribution of Mycobacterium tuberculosis (MTB) in Guangdong Province, and to explore the risk factors associated with drug resistance. Methods: A total of 225 clinical strains of MTB collected from 5 drug resistance monitoring sites of Guangdong Province in 2015 were tested by the Regions of Difference 105 (RD105) deletion test, and 15-locus mycobacterial interspersed repetitive units (MIRU) typing was used for genotyping. Gene clustering was analyzed using BioNumerics 7.6. Drug susceptibility was tested by the proportion method. The statistical analysis used the chi-square test and multivariate logistic regression. Results: There were 158 (70.2%) Beijing family strains among the 225 cases. The Hunter-Gaston index varied among the MIRU loci. The MTB strains from Guangdong Province were categorized into 2 gene clusters by clustering analysis, in which the clustering rate of complex Ⅰ was significantly higher than that of complex Ⅱ (χ² = 9.331, P = 0.020). Multivariate logistic regression found that QUB11b was associated with resistance to rifampicin and isoniazid (P = 0.013 and 0.012, respectively), ETR F with resistance to isoniazid, streptomycin, ethambutol and ofloxacin (P = 0.039, 0.040, 0.023 and 0.003, respectively), Mtub21 with resistance to capreomycin (P = 0.040), and QUB26 with resistance to ethionamide (P = 0.047). Conclusions: The genes of MTB from Guangdong Province were polymorphic and the distribution of strains was stable. QUB11b, ETR F, Mtub21 and QUB26 could be related to biomarkers for predicting drug resistance.
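
    The Hunter-Gaston discriminatory index reported per locus in this abstract can be computed as sketched below. This is a minimal illustration, assuming the standard definition D = 1 - Σ n_j(n_j - 1) / (N(N - 1)), where n_j is the number of isolates of type j and N the total; the function name and the repeat counts are hypothetical and are not data from the study.

```python
from collections import Counter


def hunter_gaston_index(types):
    """Probability that two isolates drawn without replacement differ in type."""
    n = len(types)
    if n < 2:
        raise ValueError("need at least two isolates")
    counts = Counter(types)
    same_type_pairs = sum(c * (c - 1) for c in counts.values())
    return 1.0 - same_type_pairs / (n * (n - 1))


# Hypothetical repeat counts observed at one MIRU locus in six isolates.
locus_alleles = [2, 2, 3, 3, 3, 4]
print(round(hunter_gaston_index(locus_alleles), 3))  # 0.733
```

    A value near 1 means the locus separates almost every pair of isolates; a value near 0 means nearly all isolates share one allele.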

  13. Clustering of change patterns using Fourier coefficients.

    PubMed

    Kim, Jaehee; Kim, Haseong

    2008-01-15

    To understand the behavior of genes, it is important to explore how the patterns of gene expression change over a time period because biologically related gene groups can share the same change patterns. Many clustering algorithms have been proposed to group observation data. However, because of the complexity of the underlying functions there have not been many studies on grouping data based on change patterns. In this study, the problem of finding similar change patterns is reduced to clustering with the derivative Fourier coefficients. The sample Fourier coefficients not only provide information about the underlying functions, but also reduce the dimension. In addition, as their limiting distribution is a multivariate normal, a model-based clustering method incorporating statistical properties would be appropriate. This work is aimed at discovering gene groups with similar change patterns that share similar biological properties. We developed a statistical model using derivative Fourier coefficients to identify similar change patterns of gene expression. We used a model-based method to cluster the Fourier series estimation of derivatives. The model-based method is advantageous over other methods in our proposed model because the sample Fourier coefficients asymptotically follow the multivariate normal distribution. Change patterns are automatically estimated with the Fourier representation in our model. Our model was tested in simulations and on real gene data sets. The simulation results showed that the model-based clustering method with the sample Fourier coefficients has a lower clustering error rate than K-means clustering. Even when the number of repeated time points was small, the same results were obtained. We also applied our model to cluster change patterns of yeast cell cycle microarray expression data with alpha-factor synchronization. It showed that, as the method clusters with the probability-neighboring data, the model-based clustering with our proposed model yielded biologically interpretable results. We expect that our proposed Fourier analysis with suitably chosen smoothing parameters could serve as a useful tool in classifying genes and interpreting possible biological change patterns. The R program is available upon request.

  14. Modified multidimensional scaling approach to analyze financial markets.

    PubMed

    Yin, Yi; Shang, Pengjian

    2014-06-01

    Detrended cross-correlation coefficient (σDCCA) and dynamic time warping (DTW) are introduced as the dissimilarity measures, respectively, while multidimensional scaling (MDS) is employed to translate the dissimilarities between daily price returns of 24 stock markets. We first propose MDS based on σDCCA dissimilarity and MDS based on DTW dissimilarity creatively, while MDS based on Euclidean dissimilarity is also employed to provide a reference for comparisons. We apply these methods in order to further visualize the clustering between stock markets. Moreover, we decide to confront MDS with an alternative visualization method, "Unweighed Average" clustering method, for comparison. The MDS analysis and "Unweighed Average" clustering method are employed based on the same dissimilarity. Through the results, we find that MDS gives us a more intuitive mapping for observing stable or emerging clusters of stock markets with similar behavior, while the MDS analysis based on σDCCA dissimilarity can provide more clear, detailed, and accurate information on the classification of the stock markets than the MDS analysis based on Euclidean dissimilarity. The MDS analysis based on DTW dissimilarity indicates more knowledge about the correlations between stock markets particularly and interestingly. Meanwhile, it reflects more abundant results on the clustering of stock markets and is much more intensive than the MDS analysis based on Euclidean dissimilarity. In addition, the graphs, originated from applying MDS methods based on σDCCA dissimilarity and DTW dissimilarity, may also guide the construction of multivariate econometric models.
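
    The dynamic time warping dissimilarity used as an input to MDS in this abstract can be sketched as follows. This is a minimal illustration of the classic DTW recursion, assuming an absolute-difference local cost and no window constraint or normalization; the abstract does not state which variant the authors used, and the function names and return series are hypothetical.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] is the cheapest alignment of a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def dtw_matrix(series):
    """Pairwise DTW dissimilarity matrix, suitable as input to MDS or clustering."""
    return [[dtw_distance(s, t) for t in series] for s in series]


# Hypothetical daily returns for three markets (illustrative numbers only).
returns = [
    [0.01, 0.02, -0.01, 0.00],
    [0.01, 0.01, 0.02, -0.01],
    [-0.02, 0.03, 0.01, 0.02],
]
for row in dtw_matrix(returns):
    print(row)
```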

  15. Non-targeted analyses of animal plasma: betaine and choline represent the nutritional and metabolic status.

    PubMed

    Katayama, K; Sato, T; Arai, T; Amao, H; Ohta, Y; Ozawa, T; Kenyon, P R; Hickson, R E; Tazaki, H

    2013-02-01

    Simple liquid chromatography-mass spectrometry (LC-MS) was applied to non-targeted metabolic analyses to discover new metabolic markers in animal plasma. Principal component analysis (PCA) and partial least squares-discriminant analysis (PLS-DA) were used to analyse LC-MS multivariate data. PCA clearly generated two separate clusters for artificially induced diabetic mice and healthy control mice. PLS-DA of time-course changes in plasma metabolites of chicks after feeding generated three clusters (pre- and immediately after feeding, 0.5-3 h after feeding and 4 h after feeding). Two separate clusters were also generated for plasma metabolites of pregnant Angus heifers with differing live-weight change profiles (gaining or losing). The accompanying PLS-DA loading plot detailed the metabolites that contribute the most to the cluster separation. In each case, the same highly hydrophilic metabolite was strongly correlated to the group separation. The metabolite was identified as betaine by LC-MS/MS. This result indicates that betaine and its metabolic precursor, choline, may be useful biomarkers to evaluate the nutritional and metabolic status of animals. © 2011 Blackwell Verlag GmbH.

  16. A framework to spatially cluster air pollution monitoring sites in US based on the PM2.5 composition

    PubMed Central

    Austin, Elena; Coull, Brent A.; Zanobetti, Antonella; Koutrakis, Petros

    2013-01-01

    Background Heterogeneity in the response to PM2.5 is hypothesized to be related to differences in particle composition across monitoring sites which reflect differences in source types as well as climatic and topographic conditions impacting different geographic locations. Identifying spatial patterns in particle composition is a multivariate problem that requires novel methodologies. Objectives Use cluster analysis methods to identify spatial patterns in PM2.5 composition. Verify that the resulting clusters are distinct and informative. Methods 109 monitoring sites with 75% reported speciation data during the period 2003–2008 were selected. These sites were categorized based on their average PM2.5 composition over the study period using k-means cluster analysis. The obtained clusters were validated and characterized based on their physico-chemical characteristics, geographic locations, emissions profiles, population density and proximity to major emission sources. Results Overall 31 clusters were identified. These include 21 clusters with 2 or more sites which were further grouped into 4 main types using hierarchical clustering. The resulting groupings are chemically meaningful and represent broad differences in emissions. The remaining clusters, encompassing single sites, were characterized based on their particle composition and geographic location. Conclusions The framework presented here provides a novel tool which can be used to identify and further classify sites based on their PM2.5 composition. The solution presented is fairly robust and yielded groupings that were meaningful in the context of air-pollution research. PMID:23850585
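
    The k-means step named in this abstract can be sketched as follows. This is a minimal version of Lloyd's algorithm under stated assumptions: the starting centroids are supplied by the caller, distance is squared Euclidean, and each point stands for one site's average composition vector. The function name and the toy data in the usage note are hypothetical, and the abstract does not describe the authors' initialisation or choice of the number of clusters.

```python
def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm; `centroids` are caller-supplied starting centres (assumed)."""
    centroids = [list(c) for c in centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(range(len(centroids)),
                key=lambda k: sum((x - c) ** 2 for x, c in zip(p, centroids[k])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stable, so the centroids will not move again
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

    For example, `kmeans([(0, 0), (0, 1), (10, 10), (10, 11)], [(0, 0), (10, 10)])` separates the two obvious groups. Single-site clusters such as those reported in the abstract would appear here as centroids with exactly one member.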

  17. Temporal and spatial analysis of psittacosis in association with poultry farming in the Netherlands, 2000-2015.

    PubMed

    Hogerwerf, Lenny; Holstege, Manon M C; Benincà, Elisa; Dijkstra, Frederika; van der Hoek, Wim

    2017-07-26

    Human psittacosis is a highly underdiagnosed zoonotic disease, commonly linked to psittacine birds. Psittacosis in birds, also known as avian chlamydiosis, is endemic in poultry, but the risk for people living close to poultry farms is unknown. Therefore, our study aimed to explore the temporal and spatial patterns of human psittacosis infections and identify possible associations with poultry farming in the Netherlands. We analysed data on 700 human cases of psittacosis notified between 01-01-2000 and 01-09-2015. First, we studied the temporal behaviour of psittacosis notifications by applying wavelet analysis. Then, to identify possible spatial patterns, we applied spatial cluster analysis. Finally, we investigated the possible spatial association between psittacosis notifications and data on the Dutch poultry sector at municipality level using a multivariable model. We found a large spatial cluster that covered a highly poultry-dense area but additional clusters were found in areas that had a low poultry density. There were marked geographical differences in the awareness of psittacosis and the amount and the type of laboratory diagnostics used for psittacosis, making it difficult to draw conclusions about the correlation between the large cluster and poultry density. The multivariable model showed that the presence of chicken processing plants and slaughter duck farms in a municipality was associated with a higher rate of human psittacosis notifications. The significance of the associations was influenced by the inclusion or exclusion of farm density in the model. Our temporal and spatial analyses showed weak associations between poultry-related variables and psittacosis notifications. Because of the low number of psittacosis notifications available for analysis, the power of our analysis was relatively low.
Because of the exploratory nature of this research, the associations found cannot be interpreted as evidence for airborne transmission of psittacosis from poultry to the general population. Further research is needed to determine the prevalence of C. psittaci in Dutch poultry. Also, efforts to promote PCR-based testing for C. psittaci and genotyping for source tracing are important to reduce the diagnostic deficit, and to provide better estimates of the human psittacosis burden, and the possible role of poultry.

  18. Emergence of sporadic non-clustered cases of hospital-associated listeriosis among immunocompromised adults in southern Taiwan from 1992 to 2013: effect of precipitating immunosuppressive agents

    PubMed Central

    2014-01-01

    Background Sporadic non-clustered hospital-associated listeriosis is an emerging infectious disease in immunocompromised hosts. The current study was designed to determine the impact of long-term and precipitating immunosuppressive agents and underlying diseases on triggering the expression of the disease, and to compare the clinical features and outcome of hospital-associated and community-associated listeriosis. Methods We reviewed the medical records of all patients with Listeria monocytogenes isolated from sterile body sites at a large medical center in southern Taiwan during 1992–2013. Non-clustered cases were defined as those unrelated to any other in time or place. Multivariable regression analysis was used to determine factors associated with prognosis. Results Thirty-five non-clustered cases of listeriosis were identified. Twelve (34.2%) were hospital-associated, and 23 (65.7%) were community-associated. The 60-day mortality was significantly greater in hospital-associated than in community-associated cases (66.7% vs. 17.4%, p = 0.007). Significantly more hospital-associated than community-associated cases were treated with a precipitating immunosuppressive agent within 4 weeks prior to onset of listeriosis (91.7% vs. 4.3%, respectively p < 0.001). The median period from the start of precipitating immunosuppressive treatment to the onset of listeriosis-related symptoms was 12 days (range, 4–27 days) in 11 of the 12 hospital-associated cases. In the multivariable analysis, APACHE II score >21 (p = 0.04) and receipt of precipitating immunosuppressive therapy (p = 0.02) were independent risk factors for 60-day mortality. Conclusions Sporadic non-clustered hospital-associated listeriosis needs to be considered in the differential diagnosis of sepsis in immunocompromised patients, particularly in those treated with new or increased doses of immunosuppressive agents. PMID:24641498

  19. imDEV: a graphical user interface to R multivariate analysis tools in Microsoft Excel.

    PubMed

    Grapov, Dmitry; Newman, John W

    2012-09-01

    Interactive modules for Data Exploration and Visualization (imDEV) is a Microsoft Excel spreadsheet embedded application providing an integrated environment for the analysis of omics data through a user-friendly interface. Individual modules enable interactive and dynamic analyses of large data by interfacing R's multivariate statistics and highly customizable visualizations with the spreadsheet environment, aiding robust inferences and generating information-rich data visualizations. This tool provides access to multiple comparisons with false discovery correction, hierarchical clustering, principal and independent component analyses, partial least squares regression and discriminant analysis, through an intuitive interface for creating high-quality two- and three-dimensional visualizations including scatter plot matrices, distribution plots, dendrograms, heat maps, biplots, trellis biplots and correlation networks. Freely available for download at http://sourceforge.net/projects/imdev/. Implemented in R and VBA and supported by Microsoft Excel (2003, 2007 and 2010).

  20. Cluster-based exposure variation analysis

    PubMed Central

    2013-01-01

    Background Static posture, repetitive movements and lack of physical variation are known risk factors for work-related musculoskeletal disorders, and thus need to be properly assessed in occupational studies. The aims of this study were (i) to investigate the effectiveness of a conventional exposure variation analysis (EVA) in discriminating exposure time lines and (ii) to compare it with a new cluster-based method for analysis of exposure variation. Methods For this purpose, we simulated a repeated cyclic exposure varying within each cycle between “low” and “high” exposure levels in a “near” or “far” range, and with “low” or “high” velocities (exposure change rates). The duration of each cycle was also manipulated by selecting a “small” or “large” standard deviation of the cycle time. These parameters reflected three dimensions of exposure variation, i.e. range, frequency and temporal similarity. Each simulation trace included two realizations of 100 concatenated cycles with either low (ρ = 0.1), medium (ρ = 0.5) or high (ρ = 0.9) correlation between the realizations. These traces were analyzed by conventional EVA, and a novel cluster-based EVA (C-EVA). Principal component analysis (PCA) was applied on the marginal distributions of 1) the EVA of each of the realizations (univariate approach), 2) a combination of the EVA of both realizations (multivariate approach) and 3) C-EVA. The least number of principal components describing more than 90% of variability in each case was selected and the projection of marginal distributions along the selected principal component was calculated. A linear classifier was then applied to these projections to discriminate between the simulated exposure patterns, and the accuracy of classified realizations was determined.
Results C-EVA classified exposures more correctly than univariate and multivariate EVA approaches; classification accuracy was 49%, 47% and 52% for EVA (univariate and multivariate), and C-EVA, respectively (p < 0.001). All three methods performed poorly in discriminating exposure patterns differing with respect to the variability in cycle time duration. Conclusion While C-EVA had a higher accuracy than conventional EVA, both failed to detect differences in temporal similarity. The data-driven optimality of data reduction and the capability of handling multiple exposure time lines in a single analysis are the advantages of the C-EVA. PMID:23557439
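
    The rule described in the Methods, keeping the least number of principal components that describe more than 90% of the variability, reduces to a cumulative sum over the PCA eigenvalues. The sketch below assumes the eigenvalues (component variances) have already been computed elsewhere; the function name and the default threshold argument are illustrative and are not taken from the paper.

```python
def components_needed(eigenvalues, threshold=0.90):
    """Smallest number of leading components whose cumulative variance share exceeds `threshold`."""
    ordered = sorted(eigenvalues, reverse=True)  # largest variance first
    total = sum(ordered)
    running = 0.0
    for count, value in enumerate(ordered, start=1):
        running += value
        if running / total > threshold:
            return count
    return len(ordered)  # reached only if rounding keeps the share at or below the threshold
```

    With eigenvalues `[7, 2.5, 0.5]` the first component explains 70% and the first two explain 95%, so two components would be retained.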

  1. Relationship between Oral Malodor and the Global Composition of Indigenous Bacterial Populations in Saliva

    PubMed Central

    Takeshita, Toru; Suzuki, Nao; Nakano, Yoshio; Shimazaki, Yoshihiro; Yoneda, Masahiro; Hirofuji, Takao; Yamashita, Yoshihisa

    2010-01-01

    Oral malodor develops mostly from the metabolic activities of indigenous bacterial populations within the oral cavity, but whether healthy or oral malodor-related patterns of the global bacterial composition exist remains unclear. In this study, the bacterial compositions in the saliva of 240 subjects complaining of oral malodor were divided into groups based on terminal-restriction fragment length polymorphism (T-RFLP) profiles using hierarchical cluster analysis, and the patterns of the microbial community composition of those exhibiting higher and lower malodor were explored. Four types of bacterial community compositions were detected (clusters I, II, III, and IV). Two parameters for measuring oral malodor intensity (the concentration of volatile sulfur compounds in mouth air and the organoleptic score) were noticeably lower in cluster I than in the other clusters. Using multivariate analysis, the differences in the levels of oral malodor were significant after adjustment for potential confounding factors such as total bacterial count, mean periodontal pocket depth, and tongue coating score (P < 0.001). Among the four clusters with different proportions of indigenous members, the T-RFLP profiles of cluster I were implicated as the bacterial populations with higher proportions of Streptococcus, Granulicatella, Rothia, and Treponema species than those of the other clusters. These results clearly correlate the global composition of indigenous bacterial populations with the severity of oral malodor. PMID:20228112

  2. Applications of Multivariate Statistical Techniques for Computer Performance Evaluation.

    DTIC Science & Technology

    1983-12-01

    parameters has on another parameter. AFIT/GCS/EE/83D-4 CHAPTER VIII CLUSTER ANALYSIS In data analysis the study...their highest, with bnchmk being 50% greater than the overall average of .318 seconds and nuprocs being 147% greater than its overall average of 30.8. These increased values of bnchmk indicate that during

  3. Spermiogram and sperm head morphometry assessed by multivariate cluster analysis results during adolescence (12-18 years) and the effect of varicocele

    PubMed Central

    Vásquez, Fernando; Soler, Carles; Camps, Patricia; Valverde, Anthony; García-Molina, Almudena

    2016-01-01

    This work evaluates sperm head morphometric characteristics in adolescents from 12 to 18 years of age, and the effect of varicocele. Volunteers between 150 and 224 months of age (mean 191, n = 87), who had reached oigarche by 12 years old, were recruited in the area of Barranquilla, Colombia. Morphometric analysis of sperm heads was performed with principal component (PC) and discriminant analysis. Combining seminal fluid and sperm parameters provided five PCs: two related to sperm morphometry, one to sperm motility, and two to seminal fluid components. Discriminant analysis on the morphometric results of varicocele and nonvaricocele groups did not provide a useful classification matrix. Of the semen-related PCs, the most explanatory (40%) was related to sperm motility. Two PCs, including sperm head elongation and size, were sufficient to evaluate sperm morphometric characteristics. Most of the morphometric variables were correlated with age, with an increase in size and decrease in the elongation of the sperm head. For head size, the entire sperm population could be divided into two morphometric subpopulations, SP1 and SP2, which did not change during adolescence. In general, for varicocele individuals, SP1 had larger and more elongated sperm heads than SP2, which had smaller and more elongated heads than in nonvaricocele men. In summary, sperm head morphometry assessed by CASA-Morph and multivariate cluster analysis provides a better comprehension of the ejaculate structure and possibly sperm function. Morphometric analysis provides much more information than data obtained from conventional semen analysis. PMID:27751986

  4. Validity analysis on merged and averaged data using within and between analysis: focus on effect of qualitative social capital on self-rated health.

    PubMed

    Shin, Sang Soo; Shin, Young-Jeon

    2016-01-01

    With an increasing number of studies highlighting regional social capital (SC) as a determinant of health, many studies are using multi-level analysis with merged and averaged scores of community residents' survey responses calculated from community SC data. Sufficient examination is required to validate if the merged and averaged data can represent the community. Therefore, this study analyzes the validity of the selected indicators and their applicability in multi-level analysis. Within and between analysis (WABA) was performed after creating community variables using merged and averaged data of community residents' responses from the 2013 Community Health Survey in Korea, using subjective self-rated health assessment as a dependent variable. Further analysis was performed following the model suggested by WABA result. Both E-test results (1) and WABA results (2) revealed that single-level analysis needs to be performed using qualitative SC variable with cluster mean centering. Through single-level multivariate regression analysis, qualitative SC with cluster mean centering showed positive effect on self-rated health (0.054, p<0.001), although there was no substantial difference in comparison to analysis using SC variables without cluster mean centering or multi-level analysis. As modification in qualitative SC was larger within the community than between communities, we validate that relational analysis of individual self-rated health can be performed within the group, using cluster mean centering. Other tests besides the WABA can be performed in the future to confirm the validity of using community variables and their applicability in multi-level analysis.
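
    Cluster mean centering, as used in this abstract, subtracts each community's mean from the scores of its own residents, so that only within-community variation remains in the centred variable. The sketch below is a minimal illustration; the function name and the toy inputs in the usage note are hypothetical and do not come from the Community Health Survey.

```python
from collections import defaultdict


def cluster_mean_center(values, groups):
    """Return (centred values, group means); groups[i] is the cluster label of values[i]."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for v, g in zip(values, groups):
        totals[g] += v
        counts[g] += 1
    means = {g: totals[g] / counts[g] for g in totals}
    # Each centred score is the deviation from the respondent's own cluster mean.
    centred = [v - means[g] for v, g in zip(values, groups)]
    return centred, means
```

    For instance, `cluster_mean_center([1, 3, 10, 14], ["a", "a", "b", "b"])` gives centred scores `[-1, 1, -2, 2]`: the large difference between the two communities' means is removed, leaving only the within-community deviations.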

  5. Subspace K-means clustering.

    PubMed

    Timmerman, Marieke E; Ceulemans, Eva; De Roover, Kim; Van Leeuwen, Karla

    2013-12-01

    To achieve an insightful clustering of multivariate data, we propose subspace K-means. Its central idea is to model the centroids and cluster residuals in reduced spaces, which allows for dealing with a wide range of cluster types and yields rich interpretations of the clusters. We review the existing related clustering methods, including deterministic, stochastic, and unsupervised learning approaches. To evaluate subspace K-means, we performed a comparative simulation study, in which we manipulated the overlap of subspaces, the between-cluster variance, and the error variance. The study shows that the subspace K-means algorithm is sensitive to local minima but that the problem can be reasonably dealt with by using partitions of various cluster procedures as a starting point for the algorithm. Subspace K-means performs very well in recovering the true clustering across all conditions considered and appears to be superior to its competitor methods: K-means, reduced K-means, factorial K-means, mixtures of factor analyzers (MFA), and MCLUST. The best competitor method, MFA, showed a performance similar to that of subspace K-means in easy conditions but deteriorated in more difficult ones. Using data from a study on parental behavior, we show that subspace K-means analysis provides a rich insight into the cluster characteristics, in terms of both the relative positions of the clusters (via the centroids) and the shape of the clusters (via the within-cluster residuals).

  6. Cluster Analysis Identifies 3 Phenotypes within Allergic Asthma.

    PubMed

    Sendín-Hernández, María Paz; Ávila-Zarza, Carmelo; Sanz, Catalina; García-Sánchez, Asunción; Marcos-Vadillo, Elena; Muñoz-Bellido, Francisco J; Laffond, Elena; Domingo, Christian; Isidoro-García, María; Dávila, Ignacio

    Asthma is a heterogeneous chronic disease with different clinical expressions and responses to treatment. In recent years, several unbiased approaches based on clinical, physiological, and molecular features have described several phenotypes of asthma. Some phenotypes are allergic, but little is known about whether these phenotypes can be further subdivided. We aimed to phenotype patients with allergic asthma using an unbiased approach based on multivariate classification techniques (unsupervised hierarchical cluster analysis). From a total of 54 variables of 225 patients with well-characterized allergic asthma diagnosed following American Thoracic Society (ATS) recommendation, positive skin prick test to aeroallergens, and concordant symptoms, we finally selected 19 variables by multiple correspondence analyses. Then a cluster analysis was performed. Three groups were identified. Cluster 1 was constituted by patients with intermittent or mild persistent asthma, without family antecedents of atopy, asthma, or rhinitis. This group showed the lowest total IgE levels. Cluster 2 was constituted by patients with mild asthma with a family history of atopy, asthma, or rhinitis. Total IgE levels were intermediate. Cluster 3 included patients with moderate or severe persistent asthma that needed treatment with corticosteroids and long-acting β-agonists. This group showed the highest total IgE levels. We identified 3 phenotypes of allergic asthma in our population. Furthermore, we described 2 phenotypes of mild atopic asthma mainly differentiated by a family history of allergy. Copyright © 2017 American Academy of Allergy, Asthma & Immunology. Published by Elsevier Inc. All rights reserved.

  7. Social Media Use and Depression and Anxiety Symptoms: A Cluster Analysis.

    PubMed

    Shensa, Ariel; Sidani, Jaime E; Dew, Mary Amanda; Escobar-Viera, César G; Primack, Brian A

    2018-03-01

    Individuals use social media with varying quantity, emotional, and behavioral attachment that may have differential associations with mental health outcomes. In this study, we sought to identify distinct patterns of social media use (SMU) and to assess associations between those patterns and depression and anxiety symptoms. In October 2014, a nationally-representative sample of 1730 US adults ages 19 to 32 completed an online survey. Cluster analysis was used to identify patterns of SMU. Depression and anxiety were measured using respective 4-item Patient-Reported Outcome Measurement Information System (PROMIS) scales. Multivariable logistic regression models were used to assess associations between cluster membership and depression and anxiety. Cluster analysis yielded a 5-cluster solution. Participants were characterized as "Wired," "Connected," "Diffuse Dabblers," "Concentrated Dabblers," and "Unplugged." Membership in 2 clusters - "Wired" and "Connected" - increased the odds of elevated depression and anxiety symptoms (AOR = 2.7, 95% CI = 1.5-4.7; AOR = 3.7, 95% CI = 2.1-6.5, respectively, and AOR = 2.0, 95% CI = 1.3-3.2; AOR = 2.0, 95% CI = 1.3-3.1, respectively). SMU pattern characterization of a large population suggests 2 patterns are associated with risk for depression and anxiety. Developing educational interventions that address use patterns rather than single aspects of SMU (eg, quantity) would likely be useful.

  8. Method of identifying clusters representing statistical dependencies in multivariate data

    NASA Technical Reports Server (NTRS)

    Borucki, W. J.; Card, D. H.; Lyle, G. C.

    1975-01-01

    Approach is first to cluster and then to compute spatial boundaries for resulting clusters. Next step is to compute, from set of Monte Carlo samples obtained from scrambled data, estimates of probabilities of obtaining at least as many points within boundaries as were actually observed in original data.
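
    The Monte Carlo step in this abstract can be sketched as below. Two details are assumptions made only for illustration, since the abstract does not specify them: the cluster boundary is taken to be an axis-aligned box, and "scrambling" is taken to mean shuffling each variable independently across observations, which preserves the marginal distributions while destroying dependencies between variables. All names are hypothetical.

```python
import random


def count_inside(points, box):
    """Number of points inside an axis-aligned box given as [(low, high), ...] per variable."""
    return sum(all(lo <= x <= hi for x, (lo, hi) in zip(p, box)) for p in points)


def scrambled_probability(points, box, n_samples=1000, seed=0):
    """Estimate P(count in box >= observed count) under independently scrambled variables."""
    rng = random.Random(seed)
    observed = count_inside(points, box)
    columns = [list(col) for col in zip(*points)]
    hits = 0
    for _ in range(n_samples):
        # Shuffle each variable on its own: marginals are kept, dependencies are broken.
        shuffled = [rng.sample(col, len(col)) for col in columns]
        if count_inside(list(zip(*shuffled)), box) >= observed:
            hits += 1
    return observed, hits / n_samples
```

    A small estimated probability suggests that the observed concentration of points inside the boundary is unlikely to arise from the marginal distributions alone, which is the sense in which a cluster would represent a statistical dependency.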

  9. A Comparison of Two Approaches to Beta-Flexible Clustering.

    ERIC Educational Resources Information Center

    Belbin, Lee; And Others

    1992-01-01

    A method for hierarchical agglomerative polythetic (multivariate) clustering, based on unweighted pair group using arithmetic averages (UPGMA) is compared with the original beta-flexible technique, a weighted average method. Reasons the flexible UPGMA strategy is recommended are discussed, focusing on the ability to recover cluster structure over…

  10. A K-means multivariate approach for clustering independent components from magnetoencephalographic data.

    PubMed

    Spadone, Sara; de Pasquale, Francesco; Mantini, Dante; Della Penna, Stefania

    2012-09-01

    Independent component analysis (ICA) is typically applied on functional magnetic resonance imaging, electroencephalographic and magnetoencephalographic (MEG) data due to its data-driven nature. In these applications, ICA needs to be extended from single to multi-session and multi-subject studies for interpreting and assigning a statistical significance at the group level. Here a novel strategy for analyzing MEG independent components (ICs) is presented, Multivariate Algorithm for Grouping MEG Independent Components K-means based (MAGMICK). The proposed approach is able to capture spatio-temporal dynamics of brain activity in MEG studies by running ICA at subject level and then clustering the ICs across sessions and subjects. Distinctive features of MAGMICK are: i) the implementation of an efficient set of "MEG fingerprints" designed to summarize properties of MEG ICs as they are built on spatial, temporal and spectral parameters; ii) the implementation of a modified version of the standard K-means procedure to improve its data-driven character. This algorithm groups the obtained ICs automatically estimating the number of clusters through an adaptive weighting of the parameters and a constraint on the ICs independence, i.e. components coming from the same session (at subject level) or subject (at group level) cannot be grouped together. The performances of MAGMICK are illustrated by analyzing two sets of MEG data obtained during a finger tapping task and median nerve stimulation. The results demonstrate that the method can extract consistent patterns of spatial topography and spectral properties across sessions and subjects that are in good agreement with the literature. In addition, these results are compared to those from a modified version of affinity propagation clustering method. The comparison, evaluated in terms of different clustering validity indices, shows that our methodology often outperforms the clustering algorithm. 
Eventually, these results are confirmed by a comparison with a MEG tailored version of the self-organizing group ICA, which is largely used for fMRI IC clustering. Copyright © 2012 Elsevier Inc. All rights reserved.

  11. Customized recommendations for production management clusters of North American automatic milking systems.

    PubMed

    Tremblay, Marlène; Hess, Justin P; Christenson, Brock M; McIntyre, Kolby K; Smink, Ben; van der Kamp, Arjen J; de Jong, Lisanne G; Döpfer, Dörte

    2016-07-01

    Automatic milking systems (AMS) are implemented in a variety of situations and environments. Consequently, there is a need to characterize individual farming practices and regional challenges to streamline management advice and objectives for producers. Benchmarking is often used in the dairy industry to compare farms by computing percentile ranks of the production values of groups of farms. Grouping for conventional benchmarking is commonly limited to the use of a few factors such as farms' geographic region or breed of cattle. We hypothesized that herds' production data and management information could be clustered in a meaningful way using cluster analysis and that this clustering approach would yield better peer groups of farms than benchmarking methods based on criteria such as country, region, breed, or breed and region. By applying mixed latent-class model-based cluster analysis to 529 North American AMS dairy farms with respect to 18 significant risk factors, 6 clusters were identified. Each cluster (i.e., peer group) represented unique management styles, challenges, and production patterns. When compared with peer groups based on criteria similar to the conventional benchmarking standards, the 6 clusters better predicted milk produced (kilograms) per robot per day. Each cluster represented a unique management and production pattern that requires specialized advice. For example, cluster 1 farms were those that recently installed AMS robots, whereas cluster 3 farms (the most northern farms) fed high amounts of concentrates through the robot to compensate for low-energy feed in the bunk. In addition to general recommendations for farms within a cluster, individual farms can generate their own specific goals by comparing themselves to farms within their cluster. This is very comparable to benchmarking but adds the specific characteristics of the peer group, resulting in better farm management advice. 
The improvement that cluster analysis allows for is characterized by the multivariable approach and the fact that comparisons between production units can be accomplished within a cluster and between clusters as a choice. Copyright © 2016 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.

  12. A Network-Based Algorithm for Clustering Multivariate Repeated Measures Data

    NASA Technical Reports Server (NTRS)

    Koslovsky, Matthew; Arellano, John; Schaefer, Caroline; Feiveson, Alan; Young, Millennia; Lee, Stuart

    2017-01-01

    The National Aeronautics and Space Administration (NASA) Astronaut Corps is a unique occupational cohort for which vast amounts of measures data have been collected repeatedly in research or operational studies pre-, in-, and post-flight, as well as during multiple clinical care visits. In exploratory analyses aimed at generating hypotheses regarding physiological changes associated with spaceflight exposure, such as impaired vision, it is of interest to identify anomalies and trends across these expansive datasets. Multivariate clustering algorithms for repeated measures data may help parse the data to identify homogeneous groups of astronauts that have higher risks for a particular physiological change. However, available clustering methods may not be able to accommodate the complex data structures found in NASA data, since the methods often rely on strict model assumptions, require equally-spaced and balanced assessment times, cannot accommodate missing data or differing time scales across variables, and cannot process continuous and discrete data simultaneously. To fill this gap, we propose a network-based, multivariate clustering algorithm for repeated measures data that can be tailored to fit various research settings. Using simulated data, we demonstrate how our method can be used to identify patterns in complex data structures found in practice.

  13. Univariate and multivariate analysis of tannin-impregnated wood species using vibrational spectroscopy.

    PubMed

    Schnabel, Thomas; Musso, Maurizio; Tondi, Gianluca

    2014-01-01

    Vibrational spectroscopy is one of the most powerful tools in polymer science. Three main techniques--Fourier transform infrared spectroscopy (FT-IR), FT-Raman spectroscopy, and FT near-infrared (NIR) spectroscopy--can also be applied to wood science. Here, these three techniques were used to investigate the chemical modification occurring in wood after impregnation with tannin-hexamine preservatives. These spectroscopic techniques have the capacity to detect the externally added tannin. FT-IR has very strong sensitivity to the aromatic peak at around 1610 cm(-1) in the tannin-treated samples, whereas FT-Raman reflects the peak at around 1600 cm(-1) for the externally added tannin. This high efficacy in distinguishing chemical features was demonstrated in univariate analysis and confirmed via cluster analysis. Conversely, the results of the NIR measurements show noticeable sensitivity for small differences. For this technique, multivariate analysis is required and with this chemometric tool, it is also possible to predict the concentration of tannin on the surface.

  14. Research Update: Spatially resolved mapping of electronic structure on atomic level by multivariate statistical analysis

    DOE PAGES

    Belianinov, Alex; Panchapakesan, G.; Lin, Wenzhi; ...

    2014-12-02

    Atomic level spatial variability of electronic structure in Fe-based superconductor FeTe0.55Se0.45 (Tc = 15 K) is explored using current-imaging tunneling-spectroscopy. Multivariate statistical analysis of the data differentiates regions of dissimilar electronic behavior that can be identified with the segregation of chalcogen atoms, as well as boundaries between terminations and near neighbor interactions. Subsequent clustering analysis allows identification of the spatial localization of these dissimilar regions. Similar statistical analysis of modeled calculated density of states of chemically inhomogeneous FeTe1-xSex structures further confirms that the two types of chalcogens, i.e., Te and Se, can be identified by their electronic signature and differentiated by their local chemical environment. This approach allows detailed chemical discrimination of the scanning tunneling microscopy data including separation of atomic identities, proximity, and local configuration effects and can be universally applicable to chemically and electronically inhomogeneous surfaces.

  15. Stone loaches of Choman River system, Kurdistan, Iran (Teleostei: Cypriniformes: Nemacheilidae).

    PubMed

    Kamangar, Barzan Bahrami; Prokofiev, Artem M; Ghaderi, Edris; Nalbant, Theodore T

    2014-01-20

    For the first time, we present data on species composition and distributions of nemacheilid loaches in the Choman River basin of Kurdistan province, Iran. Two genera and four species are recorded from the area, of which three species are new to science: Oxynoemacheilus kurdistanicus, O. zagrosensis, O. chomanicus spp. nov., and Turcinoemacheilus kosswigi Băn. et Nalb. Detailed and illustrated morphological descriptions and univariate and multivariate analyses of morphometric and meristic features are provided for each of these species. Forty morphometric and eleven meristic characters were used in multivariate analysis to select characters that could discriminate between the four loach species. Discriminant Function Analysis revealed that sixteen morphometric measures and five meristic characters have the most variability between the loach species. The dendrograms based on cluster analysis of Mahalanobis distances of morphometrics and a combination of both characters confirmed two distinct groups: Oxynoemacheilus spp. and T. kosswigi. Within Oxynoemacheilus, O. zagrosensis and O. chomanicus are more similar to each other than either is to O. kurdistanicus.
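
    The Mahalanobis distance between group centroids mentioned in the abstract above can be illustrated with a minimal sketch. It assumes only two hypothetical characters per specimen and a pooled within-group covariance matrix; the function names and the data layout are illustrative and are not taken from the paper.

```python
def mean_vec(rows):
    """Centroid of a group of 2-character specimens."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]


def pooled_cov(groups):
    """Pooled within-group 2x2 covariance matrix (assumed choice of scaling)."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    dof = 0
    for rows in groups:
        m = mean_vec(rows)
        for r in rows:
            d = [r[0] - m[0], r[1] - m[1]]
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
        dof += len(rows) - 1
    return [[v / dof for v in row] for row in s]


def mahalanobis_sq(a, b, cov):
    """Squared Mahalanobis distance between centroids a and b."""
    det = cov[0][0] * cov[1][1] - cov[0][1] * cov[1][0]
    inv = [[cov[1][1] / det, -cov[0][1] / det],
           [-cov[1][0] / det, cov[0][0] / det]]
    d = [a[0] - b[0], a[1] - b[1]]
    return sum(d[i] * inv[i][j] * d[j] for i in range(2) for j in range(2))


# Hypothetical measurements for two groups of four specimens each.
group_a = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
group_b = [(3.0, 4.0), (5.0, 4.0), (3.0, 6.0), (5.0, 6.0)]
cov = pooled_cov([group_a, group_b])
d2 = mahalanobis_sq(mean_vec(group_a), mean_vec(group_b), cov)
```

    A matrix of such pairwise distances between species centroids is the kind of input a dendrogram-building routine would take; the linkage rule used by the authors is not stated in the abstract.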

  16. Composting of cow dung and crop residues using termite mounds as bulking agent.

    PubMed

    Karak, Tanmoy; Sonar, Indira; Paul, Ranjit K; Das, Sampa; Boruah, R K; Dutta, Amrit K; Das, Dilip K

    2014-10-01

    The present study reports the suitability of termite mounds as a bulking agent for composting with crop residues and cow dung in pit method. Use of 50 kg termite mound with the crop residues (stover of ground nut: 361.65 kg; soybean: 354.59 kg; potato: 357.67 kg and mustard: 373.19 kg) and cow dung (84.90 kg) formed a good quality compost within 70 days of composting having nitrogen, phosphorus and potassium as 20.19, 3.78 and 32.77 g kg(-1) respectively with a bulk density of 0.85 g cm(-3). Other physico-chemical and germination parameters of the compost were within Indian standard, which had been confirmed by the application of multivariate analysis of variance and multivariate contrast analysis. Principal component analysis was applied in order to gain insight into the characteristic variables. Four composting treatments formed two different groups when hierarchical cluster analysis was applied. Copyright © 2014 Elsevier Ltd. All rights reserved.

  17. Research Update: Spatially resolved mapping of electronic structure on atomic level by multivariate statistical analysis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Belianinov, Alex, E-mail: belianinova@ornl.gov; Ganesh, Panchapakesan; Lin, Wenzhi

    2014-12-01

    Atomic level spatial variability of electronic structure in Fe-based superconductor FeTe0.55Se0.45 (Tc = 15 K) is explored using current-imaging tunneling-spectroscopy. Multivariate statistical analysis of the data differentiates regions of dissimilar electronic behavior that can be identified with the segregation of chalcogen atoms, as well as boundaries between terminations and near neighbor interactions. Subsequent clustering analysis allows identification of the spatial localization of these dissimilar regions. Similar statistical analysis of modeled calculated density of states of chemically inhomogeneous FeTe1−xSex structures further confirms that the two types of chalcogens, i.e., Te and Se, can be identified by their electronic signature and differentiated by their local chemical environment. This approach allows detailed chemical discrimination of the scanning tunneling microscopy data including separation of atomic identities, proximity, and local configuration effects and can be universally applicable to chemically and electronically inhomogeneous surfaces.

  18. Sugar and acid content of Citrus prediction modeling using FT-IR fingerprinting in combination with multivariate statistical analysis.

    PubMed

    Song, Seung Yeob; Lee, Young Koung; Kim, In-Jung

    2016-01-01

    A high-throughput screening system for Citrus lines with higher sugar and acid contents was established using Fourier transform infrared (FT-IR) spectroscopy in combination with multivariate analysis. FT-IR spectra confirmed typical spectral differences between the frequency regions of 950-1100 cm(-1), 1300-1500 cm(-1), and 1500-1700 cm(-1). Principal component analysis (PCA) and subsequent partial least square-discriminant analysis (PLS-DA) were able to discriminate five Citrus lines into three separate clusters corresponding to their taxonomic relationships. The quantitative predictive modeling of sugar and acid contents from Citrus fruits was established using partial least square regression algorithms from FT-IR spectra. The regression coefficients (R(2)) between predicted values and estimated sugar and acid content values were 0.99. These results demonstrate that by using FT-IR spectra and applying quantitative prediction modeling to Citrus sugar and acid contents, excellent Citrus lines can be detected early and with greater accuracy. Copyright © 2015 Elsevier Ltd. All rights reserved.

  19. The differentiation of camel breeds based on meat measurements using discriminant analysis.

    PubMed

    Al-Atiyat, Raed Mahmoud; Suliman, Gamal; AlSuhaibani, Entissar; El-Waziry, Ahmad; Al-Owaimer, Abdullah; Basmaeil, Saeid

    2016-06-01

    The meat productivity of camels in the tropics is still under investigation to identify a better meat breed or type. Therefore, four one-humped Saudi Arabian (SA) camel breeds, Majaheem, Maghateer, Hamrah, and Safrah, were studied in order to differentiate them from each other based on meat measurements. The measurements were biometrical meat traits measured on six intact males from each breed. The results showed higher values for the Majaheem breed than for the other breeds, except in a few cases such as dressing percentage and rib-eye area. In differentiation analysis, the most discriminating meat variables were myofibrillar protein index, meat color components (L*, a*, and b*), and cooking loss. Consequently, the Safrah and the Majaheem breeds presented the largest dissimilarity as evidenced by their multivariate means. The canonical discriminant analysis allowed an additional understanding of the differentiation between breeds. Furthermore, two large clusters were formed, one of which grouped Hamrah and Maghateer along with Safrah. This classification may assign these breeds to one cluster because they are better meat producers. The Majaheem was placed alone in the other cluster, which might be a result of its being a better milk producer. Nevertheless, the productivity type of the camel breeds of SA needs further morphological and genetic description.

  20. Lagged segmented Poincaré plot analysis for risk stratification in patients with dilated cardiomyopathy.

    PubMed

    Voss, Andreas; Fischer, Claudia; Schroeder, Rico; Figulla, Hans R; Goernig, Matthias

    2012-07-01

    The objectives of this study were to introduce a new type of heart-rate variability analysis improving risk stratification in patients with idiopathic dilated cardiomyopathy (DCM) and to provide additional information about impaired heart beat generation in these patients. Beat-to-beat intervals (BBI) of 30-min ECGs recorded from 91 DCM patients and 21 healthy subjects were analyzed applying the lagged segmented Poincaré plot analysis (LSPPA) method. LSPPA includes the Poincaré plot reconstruction with lags of 1-100, rotating the cloud of points, its normalized segmentation adapted to their standard deviations, and finally, a frequency-dependent clustering. The lags were combined into eight different clusters representing specific frequency bands within 0.012-1.153 Hz. Statistical differences between low- and high-risk DCM could be found within the clusters II-VIII (e.g., cluster IV: 0.033-0.038 Hz; p = 0.0002; sensitivity = 85.7 %; specificity = 71.4 %). The multivariate statistics led to a sensitivity of 92.9 %, specificity of 85.7 % and an area under the curve of 92.1 % discriminating these patient groups. We introduced the LSPPA method to investigate time correlations in BBI time series. We found that LSPPA contributes considerably to risk stratification in DCM and yields the highest discriminant power in the low and very low-frequency bands.
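
    The first two steps named in the abstract above (Poincaré plot reconstruction at a given lag, then rotation of the cloud of points) can be sketched as follows. This is a minimal illustration under stated assumptions: it uses the conventional 45-degree rotation and reports the standard deviations across and along the identity line (commonly called SD1 and SD2). The normalized segmentation and frequency-dependent clustering of LSPPA are not reproduced, and the function name is hypothetical.

```python
import math
import statistics


def lagged_poincare(bbi, lag):
    """Return (SD1, SD2) of the Poincare plot of `bbi` at the given lag."""
    if lag < 1 or lag >= len(bbi) - 1:
        raise ValueError("lag must satisfy 1 <= lag < len(bbi) - 1")
    x = bbi[:-lag]   # interval i
    y = bbi[lag:]    # interval i + lag
    # Rotate the cloud by 45 degrees: one axis runs across the identity
    # line (short-term variability), the other along it.
    across = [(b - a) / math.sqrt(2) for a, b in zip(x, y)]
    along = [(b + a) / math.sqrt(2) for a, b in zip(x, y)]
    return statistics.pstdev(across), statistics.pstdev(along)


# Hypothetical beat-to-beat intervals in milliseconds.
bbi = [800, 900] * 4
sd1, sd2 = lagged_poincare(bbi, 2)
```

    Repeating the call for lags 1 to 100 would give one pair of descriptors per lag, which is the kind of per-lag summary that a later grouping into frequency bands could start from.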

  1. Relationships between coping style and PAI profiles in a community sample.

    PubMed

    Deisinger, J A; Cassisi, J E; Whitaker, S L

    1996-05-01

    Relationships between coping style and psychological functioning were examined in a heterogeneous community sample (N = 168). Psychological functioning was categorized with the Personality Assessment Inventory (PAI; Morey, 1991). Subjects were assigned to PAI configural profile clusters, using T-scores from PAI clinical scales. Three PAI clusters were prominent in this sample: normal, anxious, and eccentric. Multivariate analysis of covariance revealed that these clusters differed significantly in coping style, as measured by the dispositional format of the COPE Inventory (Carver, Scheier, & Weintraub, 1989). Normals coped through avoidance significantly less than anxious or eccentric subjects. Also, normals engaged in seeking social support and venting more than eccentric but less than anxious subjects. Gender differences also were noted, with women more likely to cope by seeking social support and men more likely to cope through hedonistic escapism.

  2. Wing morphometrics as a possible tool for the diagnosis of the Ceratitis fasciventris, C. anonae, C. rosa complex (Diptera, Tephritidae).

    PubMed

    Van Cann, Joannes; Virgilio, Massimiliano; Jordaens, Kurt; De Meyer, Marc

    2015-01-01

    Previous attempts to resolve the Ceratitis FAR complex (Ceratitis fasciventris, Ceratitis anonae, Ceratitis rosa, Diptera, Tephritidae) showed contrasting results and revealed the occurrence of five microsatellite genotypic clusters (A, F1, F2, R1, R2). In this paper we explore the potential of wing morphometrics for the diagnosis of FAR morphospecies and genotypic clusters. We considered a set of 227 specimens previously morphologically identified and genotyped at 16 microsatellite loci. Seventeen wing landmarks and 6 wing band areas were used for morphometric analyses. Permutational multivariate analysis of variance detected significant differences both across morphospecies and genotypic clusters (for both males and females). Unconstrained and constrained ordinations did not properly resolve groups corresponding to morphospecies or genotypic clusters. However, posterior group membership probabilities (PGMPs) of the Discriminant Analysis of Principal Components (DAPC) allowed the consistent identification of a relevant proportion of specimens (but with performances differing across morphospecies and genotypic clusters). This study suggests that wing morphometrics and PGMPs might represent a possible tool for the diagnosis of species within the FAR complex. Here, we propose a tentative diagnostic method and provide a first reference library of morphometric measures that might be used for the identification of additional and unidentified FAR specimens.

  3. Health-related fitness profiles in adolescents with complex congenital heart disease.

    PubMed

    Klausen, Susanne Hwiid; Wetterslev, Jørn; Søndergaard, Lars; Andersen, Lars L; Mikkelsen, Ulla Ramer; Dideriksen, Kasper; Zoffmann, Vibeke; Moons, Philip

    2015-04-01

    This study investigates whether subgroups of different health-related fitness (HrF) profiles exist among girls and boys with complex congenital heart disease (ConHD) and how these are associated with lifestyle behaviors. We measured the cardiorespiratory fitness, muscle strength, and body composition of 158 adolescents aged 13-16 years with previous surgery for a complex ConHD. Data on lifestyle behaviors were collected concomitantly between October 2010 and April 2013. A cluster analysis was conducted to identify profiles with similar HrF. For comparisons between clusters, multivariate analyses of covariance were used to test the differences in lifestyle behaviors. Three distinct profiles were formed: (1) Robust (43, 27%; 20 girls and 23 boys); (2) Moderately Robust (85, 54%; 37 girls and 48 boys); and (3) Less robust (30, 19%; 9 girls and 21 boys). The participants in the Robust clusters reported leading a physically active lifestyle and participants in the Less robust cluster reported leading a sedentary lifestyle. Diagnoses were evenly distributed between clusters. The cluster analysis attributed some of the variability in cardiorespiratory fitness among adolescents with complex ConHD to lifestyle behaviors and physical activity. Profiling of HrF offers a valuable new option in the management of person-centered health promotion. Copyright © 2015 Society for Adolescent Health and Medicine. Published by Elsevier Inc. All rights reserved.

  4. Delineation of estuarine management areas using multivariate geostatistics: the case of Sado Estuary.

    PubMed

    Caeiro, Sandra; Goovaerts, Pierre; Painho, Marco; Costa, M Helena

    2003-09-15

    The Sado Estuary is a coastal zone located in the south of Portugal where conflicts between conservation and development exist because of its location near industrialized urban zones and its designation as a natural reserve. The aim of this paper is to evaluate a set of multivariate geostatistical approaches to delineate spatially contiguous regions of sediment structure for Sado Estuary. These areas will be the supporting infrastructure of an environmental management system for this estuary. The boundaries of each homogeneous area were derived from three sediment characterization attributes through three different approaches: (1) cluster analysis of dissimilarity matrix function of geographical separation followed by indicator kriging of the cluster data, (2) discriminant analysis of kriged values of the three sediment attributes, and (3) a combination of methods 1 and 2. Final maximum likelihood classification was integrated into a geographical information system. All methods generated fairly spatially contiguous management areas that reproduce well the environment of the estuary. Map comparison techniques based on kappa statistics showed that the resultant three maps are similar, supporting the choice of any of the methods as appropriate for management of the Sado Estuary. However, the results of method 1 seem to be in better agreement with estuary behavior, assessment of contamination sources, and previous work conducted at this site.

  5. Spatial and temporal variation of water quality of a segment of Marikina River using multivariate statistical methods.

    PubMed

    Chounlamany, Vanseng; Tanchuling, Maria Antonia; Inoue, Takanobu

    2017-09-01

    Payatas landfill in Quezon City, Philippines, releases leachate to the Marikina River through a creek. Multivariate statistical techniques were applied to study temporal and spatial variations in water quality of a segment of the Marikina River. The data set included 12 physico-chemical parameters for five monitoring stations over a year. Cluster analysis grouped the monitoring stations into four clusters and identified January-May as dry season and June-September as wet season. Principal components analysis showed that three latent factors are responsible for the data set explaining 83% of its total variance. The chemical oxygen demand, biochemical oxygen demand, total dissolved solids, Cl⁻ and PO₄³⁻ are influenced by anthropogenic impact/eutrophication pollution from point sources. Total suspended solids, turbidity and SO₄²⁻ are influenced by rain and soil erosion. The highest state of pollution is at the Payatas creek outfall from March to May, whereas at downstream stations it is in May. The current study indicates that the river monitoring requires only four stations, nine water quality parameters and testing over three specific months of the year. The findings of this study imply that Payatas landfill requires a proper leachate collection and treatment system to reduce its impact on the Marikina River.
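
    The grouping of monitoring stations by cluster analysis described above can be sketched with a small agglomerative procedure. The abstract does not state the distance measure or linkage rule, so Euclidean distance on z-scored parameters and single linkage are assumptions made here for illustration only. The function names and the station values are hypothetical.

```python
import math


def zscore_columns(rows):
    """Standardize each parameter (column) to zero mean and unit variance."""
    out_cols = []
    for col in zip(*rows):
        mu = sum(col) / len(col)
        sd = math.sqrt(sum((v - mu) ** 2 for v in col) / len(col))
        out_cols.append([(v - mu) / sd if sd else 0.0 for v in col])
    return [list(r) for r in zip(*out_cols)]


def agglomerate(points, k):
    """Single-linkage agglomerative clustering; returns k lists of row indices."""
    clusters = [[i] for i in range(len(points))]

    def link(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > k:
        _, p, q = min(
            (link(clusters[p], clusters[q]), p, q)
            for p in range(len(clusters))
            for q in range(p + 1, len(clusters))
        )
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return clusters


# Hypothetical stations described by two already comparable parameters.
stations = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0], [50.0, 50.0]]
groups = agglomerate(stations, 3)
scaled = zscore_columns([[1.0, 10.0], [3.0, 30.0]])
```

    In practice the parameters would be standardized with `zscore_columns` before clustering, since measurements such as turbidity and chemical oxygen demand are on very different scales.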

  6. Genetic Structure of Bluefin Tuna in the Mediterranean Sea Correlates with Environmental Variables

    PubMed Central

    Riccioni, Giulia; Stagioni, Marco; Landi, Monica; Ferrara, Giorgia; Barbujani, Guido; Tinti, Fausto

    2013-01-01

    Background: Atlantic Bluefin Tuna (ABFT) shows complex demography and ecological variation in the Mediterranean Sea. Genetic surveys have detected significant, although weak, signals of population structuring; catch series analyses and tagging programs identified complex ABFT spatial dynamics and migration patterns. Here, we tested the hypothesis that the genetic structure of the ABFT in the Mediterranean is correlated with mean surface temperature and salinity. Methodology: We used six samples collected from Western and Central Mediterranean integrated with a new sample collected from the recently identified easternmost reproductive area of Levantine Sea. To assess population structure in the Mediterranean we used a multidisciplinary framework combining classical population genetics, spatial and Bayesian clustering methods and a multivariate approach based on factor analysis. Conclusions: FST analysis and Bayesian clustering methods detected several subpopulations in the Mediterranean, a result also supported by multivariate analyses. In addition, we identified significant correlations of genetic diversity with mean salinity and surface temperature values revealing that ABFT is genetically structured along two environmental gradients. These results suggest that a preference for some spawning habitat conditions could contribute to shape ABFT genetic structuring in the Mediterranean. However, further studies should be performed to assess to what extent ABFT spawning behaviour in the Mediterranean Sea can be affected by environmental variation. PMID:24260341

  7. Effect of sexual steroids on boar kinematic sperm subpopulations.

    PubMed

    Ayala, E M E; Aragón, M A

    2017-11-01

    Here, we show the effects of sexual steroids, progesterone, testosterone, or estradiol on motility parameters of boar sperm. Sixteen commercial seminal doses, four each of four adult boars, were analyzed using computer assisted sperm analysis (CASA). Mean values of motility parameters were analyzed by bivariate and multivariate statistics. Principal component analysis (PCA), followed by hierarchical clustering, was applied on data of motility parameters, provided automatically as intervals by the CASA system. Effects of sexual steroids were described in the kinematic subpopulations identified from multivariate statistics. Mean values of motility parameters were not significantly changed after addition of sexual steroids. Multivariate graphics showed that sperm subpopulations were not sensitive to the addition of either testosterone or estradiol, but sperm subpopulations responsive to progesterone were found. Distributions of motility parameters were wide in controls but sharpened at distinct concentrations of progesterone. We conclude that kinematic sperm subpopulations responsive to progesterone are present in boar semen, and these subpopulations are masked in evaluations of mean values of motility parameters. © 2017 International Society for Advancement of Cytometry.

  8. Identifying prognostic intratumor heterogeneity using pre- and post-radiotherapy 18F-FDG PET images for pancreatic cancer patients.

    PubMed

    Yue, Yong; Osipov, Arsen; Fraass, Benedick; Sandler, Howard; Zhang, Xiao; Nissen, Nicholas; Hendifar, Andrew; Tuli, Richard

    2017-02-01

    To stratify risks of pancreatic adenocarcinoma (PA) patients using pre- and post-radiotherapy (RT) PET/CT images, and to assess the prognostic value of texture variations in predicting therapy response of patients. Twenty-six PA patients treated with RT from 2011-2013 with pre- and post-treatment 18F-FDG-PET/CT scans were identified. Tumor locoregional texture was calculated using 3D kernel-based approach, and texture variations were identified by fitting discrepancies of texture maps of pre- and post-treatment images. A total of 48 texture and clinical variables were identified and evaluated for association with overall survival (OS). The prognostic heterogeneity features were selected using lasso/elastic net regression, and further were evaluated by multivariate Cox analysis. Median age was 69 y (range, 46-86 y). The texture map and temporal variations between pre- and post-treatment were well characterized by histograms and statistical fitting. The lasso analysis identified seven predictors (age, node stage, post-RT SUVmax, variations of homogeneity, variance, sum mean, and cluster tendency). The multivariate Cox analysis identified five significant variables: age, node stage, variations of homogeneity, variance, and cluster tendency (with P=0.020, 0.040, 0.065, 0.078, and 0.081, respectively). The patients were stratified into two groups based on the risk score of multivariate analysis with log-rank P=0.001: a low risk group (n=11) with a longer mean OS (29.3 months) and higher texture variation (>30%), and a high risk group (n=15) with a shorter mean OS (17.7 months) and lower texture variation (<15%). Locoregional metabolic texture response provides a feasible approach for evaluating and predicting clinical outcomes following treatment of PA with RT. The proposed method can be used to stratify patient risk and help select appropriate treatment strategies for individual patients toward implementing response-driven adaptive RT.

  9. Multivariate approach to quantitative analysis of Aphis gossypii Glover (Hemiptera: Aphididae) and their natural enemy populations at different cotton spacings.

    PubMed

    Malaquias, José B; Ramalho, Francisco S; Dos S Dias, Carlos T; Brugger, Bruno P; S Lira, Aline Cristina; Wilcken, Carlos F; Pachú, Jéssica K S; Zanuncio, José C

    2017-02-09

    The relationship between pests and natural enemies on cotton at different spacings has not yet been documented using multivariate analysis. With multivariate approaches, it is possible to optimize strategies to control Aphis gossypii at different crop spacings because they allow better use of aphid sampling strategies as well as the conservation and release of its natural enemies. The aims of the study were (i) to characterize the temporal abundance data of aphids and their natural enemies using principal components, (ii) to analyze the degree of correlation between the insects and between groups of variables (pests and natural enemies), (iii) to identify the main natural enemies responsible for regulating A. gossypii populations, and (iv) to investigate the similarities in arthropod occurrence patterns at different spacings of cotton crops over two seasons. High correlations in the occurrence of Scymnus rubicundus with aphids are shown through principal component analysis and through the important role the species plays in canonical correlation analysis. Clustering the presence of apterous aphids matches the pattern verified for Chrysoperla externa at the three different spacings between rows. Our results indicate that S. rubicundus is the main candidate to regulate the aphid populations in all spacings studied.

  10. Multivariate approach to quantitative analysis of Aphis gossypii Glover (Hemiptera: Aphididae) and their natural enemy populations at different cotton spacings

    PubMed Central

    Malaquias, José B.; Ramalho, Francisco S.; dos S. Dias, Carlos T.; Brugger, Bruno P.; S. Lira, Aline Cristina; Wilcken, Carlos F.; Pachú, Jéssica K. S.; Zanuncio, José C.

    2017-01-01

    The relationship between pests and natural enemies on cotton at different spacings has not yet been documented using multivariate analysis. With multivariate approaches, it is possible to optimize strategies to control Aphis gossypii at different crop spacings because they allow better use of aphid sampling strategies as well as the conservation and release of its natural enemies. The aims of the study were (i) to characterize the temporal abundance data of aphids and their natural enemies using principal components, (ii) to analyze the degree of correlation between the insects and between groups of variables (pests and natural enemies), (iii) to identify the main natural enemies responsible for regulating A. gossypii populations, and (iv) to investigate the similarities in arthropod occurrence patterns at different spacings of cotton crops over two seasons. High correlations in the occurrence of Scymnus rubicundus with aphids are shown through principal component analysis and through the important role the species plays in canonical correlation analysis. Clustering the presence of apterous aphids matches the pattern verified for Chrysoperla externa at the three different spacings between rows. Our results indicate that S. rubicundus is the main candidate to regulate the aphid populations in all spacings studied. PMID:28181503

  11. Multivariate approach to quantitative analysis of Aphis gossypii Glover (Hemiptera: Aphididae) and their natural enemy populations at different cotton spacings

    NASA Astrophysics Data System (ADS)

    Malaquias, José B.; Ramalho, Francisco S.; Dos S. Dias, Carlos T.; Brugger, Bruno P.; S. Lira, Aline Cristina; Wilcken, Carlos F.; Pachú, Jéssica K. S.; Zanuncio, José C.

    2017-02-01

    The relationship between pests and natural enemies on cotton at different spacings has not yet been documented using multivariate analysis. With multivariate approaches, it is possible to optimize strategies to control Aphis gossypii at different crop spacings because they allow better use of aphid sampling strategies as well as the conservation and release of its natural enemies. The aims of the study were (i) to characterize the temporal abundance data of aphids and their natural enemies using principal components, (ii) to analyze the degree of correlation between the insects and between groups of variables (pests and natural enemies), (iii) to identify the main natural enemies responsible for regulating A. gossypii populations, and (iv) to investigate the similarities in arthropod occurrence patterns at different spacings of cotton crops over two seasons. High correlations in the occurrence of Scymnus rubicundus with aphids are shown through principal component analysis and through the important role the species plays in canonical correlation analysis. Clustering the presence of apterous aphids matches the pattern verified for Chrysoperla externa at the three different spacings between rows. Our results indicate that S. rubicundus is the main candidate to regulate the aphid populations in all spacings studied.

  12. Combined data preprocessing and multivariate statistical analysis characterizes fed-batch culture of mouse hybridoma cells for rational medium design.

    PubMed

    Selvarasu, Suresh; Kim, Do Yun; Karimi, Iftekhar A; Lee, Dong-Yup

    2010-10-01

    We present an integrated framework for characterizing fed-batch cultures of mouse hybridoma cells producing monoclonal antibody (mAb). This framework systematically combines data preprocessing, elemental balancing and statistical analysis technique. Initially, specific rates of cell growth, glucose/amino acid consumptions and mAb/metabolite productions were calculated via curve fitting using logistic equations, with subsequent elemental balancing of the preprocessed data indicating the presence of experimental measurement errors. Multivariate statistical analysis was then employed to understand physiological characteristics of the cellular system. The results from principal component analysis (PCA) revealed three major clusters of amino acids with similar trends in their consumption profiles: (i) arginine, threonine and serine, (ii) glycine, tyrosine, phenylalanine, methionine, histidine and asparagine, and (iii) lysine, valine and isoleucine. Further analysis using partial least square (PLS) regression identified key amino acids which were positively or negatively correlated with the cell growth, mAb production and the generation of lactate and ammonia. Based on these results, the optimal concentrations of key amino acids in the feed medium can be inferred, potentially leading to an increase in cell viability and productivity, as well as a decrease in toxic waste production. The study demonstrated how the current methodological framework using multivariate statistical analysis techniques can serve as a potential tool for deriving rational medium design strategies. Copyright © 2010 Elsevier B.V. All rights reserved.

  13. Salient concerns in using analgesia for cancer pain among outpatients: A cluster analysis study.

    PubMed

    Meghani, Salimah H; Knafl, George J

    2017-02-10

    To identify unique clusters of patients based on their concerns in using analgesia for cancer pain and predictors of the cluster membership. This was a 3-month prospective observational study (n = 207). Patients were included if they were adults (≥ 18 years), diagnosed with solid tumors or multiple myelomas, and had at least one prescription of around-the-clock pain medication for cancer or cancer-treatment-related pain. Patients were recruited from two outpatient medical oncology clinics within a large health system in Philadelphia. A choice-based conjoint (CBC) analysis experiment was used to elicit analgesic treatment preferences (utilities). Patients employed trade-offs based on five analgesic attributes (percent relief from analgesics, type of analgesic, type of side-effects, severity of side-effects, out of pocket cost). Patients were clustered based on CBC utilities using novel adaptive statistical methods. Multiple logistic regression was used to identify predictors of cluster membership. The analyses found 4 unique clusters: Most patients made trade-offs based on the expectation of pain relief (cluster 1, 41%). For two subsets, the main underlying concern was the type of analgesic prescribed, i.e., opioid vs non-opioid (cluster 2, 11%), and the type of analgesic side effects (cluster 4, 21%), respectively. About one in four made trade-offs based on multiple concerns simultaneously including pain relief, type of side effects, and severity of side effects (cluster 3, 28%). In multivariable analysis, to identify predictors of cluster membership, clinical and socioeconomic factors (education, health literacy, income, social support) rather than analgesic attitudes and beliefs were found important; only one belief, i.e., that pain medications can mask changes in health or keep you from knowing what is going on in your body, was found significant in predicting two of the four clusters [cluster 1 (-); cluster 4 (+)]. Most patients appear to be driven by a single salient concern in using analgesia for cancer pain. Addressing these concerns, perhaps through real time clinical assessments, may improve patients' analgesic adherence patterns and cancer pain outcomes.

  14. Multivariate Analysis and Prediction of Dioxin-Furan ...

    EPA Pesticide Factsheets

    Peer Review Draft of Regional Methods Initiative Final Report. Dioxins, which are bioaccumulative and environmentally persistent, pose an ongoing risk to human and ecosystem health. Fish constitute a significant source of dioxin exposure for humans and fish-eating wildlife. Current dioxin analytical methods are costly, time-consuming, and produce hazardous by-products. A Danish team developed a novel, multivariate statistical methodology based on the covariance of dioxin-furan congener Toxic Equivalences (TEQs) and fatty acid methyl esters (FAMEs) and applied it to North Atlantic Ocean fishmeal samples. The goal of the current study was to attempt to extend this Danish methodology to 77 whole and composite fish samples from three trophic groups: predator (whole largemouth bass), benthic (whole flathead and channel catfish) and forage fish (composite bluegill, pumpkinseed and green sunfish) from two dioxin contaminated rivers (Pocatalico R. and Kanawha R.) in West Virginia, USA. Multivariate statistical analyses, including Principal Components Analysis (PCA), Hierarchical Clustering, and Partial Least Squares Regression (PLS), were used to assess the relationship between the FAMEs and TEQs in these dioxin contaminated freshwater fish from the Kanawha and Pocatalico Rivers. These three multivariate statistical methods all confirm that the pattern of Fatty Acid Methyl Esters (FAMEs) in these freshwater fish covaries with and is predictive of the WHO TE

  15. Support vector machine learning-based fMRI data group analysis.

    PubMed

    Wang, Ze; Childress, Anna R; Wang, Jiongjiong; Detre, John A

    2007-07-15

    To explore the multivariate nature of fMRI data and to consider the inter-subject brain response discrepancies, a multivariate and brain response model-free method is fundamentally required. Two such methods are presented in this paper by integrating a machine learning algorithm, the support vector machine (SVM), and the random effect model. Without any brain response modeling, SVM was used to extract a whole brain spatial discriminance map (SDM), representing the brain response difference between the contrasted experimental conditions. Population inference was then obtained through the random effect analysis (RFX) or permutation testing (PMU) on the individual subjects' SDMs. Applied to arterial spin labeling (ASL) perfusion fMRI data, SDM RFX yielded lower false-positive rates in the null hypothesis test and higher detection sensitivity for synthetic activations with varying cluster size and activation strengths, compared to the univariate general linear model (GLM)-based RFX. For a sensory-motor ASL fMRI study, both SDM RFX and SDM PMU yielded similar activation patterns to GLM RFX and GLM PMU, respectively, but with higher t values and cluster extensions at the same significance level. Capitalizing on the absence of temporal noise correlation in ASL data, this study also incorporated PMU in the individual-level GLM and SVM analyses accompanied by group-level analysis through RFX or group-level PMU. Providing inferences on the probability of being activated or deactivated at each voxel, these individual-level PMU-based group analysis methods can be used to threshold the analysis results of GLM RFX, SDM RFX or SDM PMU.

  16. Cluster Analysis of Weighted Bipartite Networks: A New Copula-Based Approach

    PubMed Central

    Chessa, Alessandro; Crimaldi, Irene; Riccaboni, Massimo; Trapin, Luca

    2014-01-01

    In this work we are interested in identifying clusters of “positional equivalent” actors, i.e. actors who play a similar role in a system. In particular, we analyze weighted bipartite networks that describe the relationships between actors on one side and features or traits on the other, together with the intensity level to which actors show their features. We develop a methodological approach that takes into account the underlying multivariate dependence among groups of actors. The idea is that positions in a network could be defined on the basis of the similar intensity levels that the actors exhibit in expressing some features, instead of just considering relationships that actors hold with each other. Moreover, we propose a new clustering procedure that exploits the potential of copula functions, a mathematical instrument for modeling the stochastic dependence structure. Our clustering algorithm can be applied both to binary and real-valued matrices. We validate it with simulations and applications to real-world data. PMID:25303095

  17. Analysis of human tissues by total reflection X-ray fluorescence. Application of chemometrics for diagnostic cancer recognition

    NASA Astrophysics Data System (ADS)

    Benninghoff, L.; von Czarnowski, D.; Denkhaus, E.; Lemke, K.

    1997-07-01

    For the determination of trace element distributions of more than 20 elements in malignant and normal tissues of the human colon, tissue samples (approx. 400 mg wet weight) were digested with 3 ml of nitric acid (sub-boiled quality) by use of an autoclave system. The accuracy of measurements has been investigated by using certified materials. The analytical results were evaluated by using a spreadsheet program to give an overview of the element distribution in cancerous samples and in normal colon tissues. A further application, cluster analysis of the analytical results, was introduced to demonstrate the possibility of classification for cancer diagnosis. To confirm the results of cluster analysis, multivariate three-way principal component analysis was performed. Additionally, microtome frozen sections (10 μm) were prepared from the same tissue samples to compare the analytical results, i.e. the mass fractions of elements, according to the preparation method and to exclude systematic errors depending on the inhomogeneity of the tissues.

  18. Evolutionary analysis of groundwater flow: Application of multivariate statistical analysis to hydrochemical data in the Densu Basin, Ghana

    NASA Astrophysics Data System (ADS)

    Yidana, Sandow Mark; Bawoyobie, Patrick; Sakyi, Patrick; Fynn, Obed Fiifi

    2018-02-01

    An evolutionary trend has been postulated through the analysis of hydrochemical data of a crystalline rock aquifer system in the Densu Basin, Southern Ghana. Hydrochemical data from 63 groundwater samples, taken from two main groundwater outlets (boreholes and hand dug wells) were used to postulate an evolutionary theory for the basin. Sequential factor and hierarchical cluster analysis were used to disintegrate the data into three factors and five clusters (spatial associations). These were used to characterize the controls on groundwater hydrochemistry and its evolution in the terrain. The dissolution of soluble salts and cation exchange processes are the dominant processes controlling groundwater hydrochemistry in the terrain. The trend of evolution of this set of processes follows the pattern of groundwater flow predicted by a calibrated transient groundwater model in the area. The data suggest that anthropogenic activities represent the second most important process in the hydrochemistry. Silicate mineral weathering is the third most important set of processes. Groundwater associations resulting from Q-mode hierarchical cluster analysis indicate an evolutionary pattern consistent with the general groundwater flow pattern in the basin. These key findings are at variance with results of previous investigations and indicate that when carefully done, groundwater hydrochemical data can be very useful for conceptualizing groundwater flow in basins.

  19. Multivariate statistical analysis of wildfires in Portugal

    NASA Astrophysics Data System (ADS)

    Costa, Ricardo; Caramelo, Liliana; Pereira, Mário

    2013-04-01

    Several studies demonstrate that wildfires in Portugal present high temporal and spatial variability as well as cluster behavior (Pereira et al., 2005, 2011). This study aims to contribute to the characterization of the fire regime in Portugal with the multivariate statistical analysis of the time series of number of fires and area burned in Portugal during the 1980 - 2009 period. The data used in the analysis is an extended version of the Rural Fire Portuguese Database (PRFD) (Pereira et al., 2011), provided by the National Forest Authority (Autoridade Florestal Nacional, AFN), the Portuguese Forest Service, which includes information for more than 500,000 fire records. There are many advanced techniques for examining the relationships among multiple time series at the same time (e.g., canonical correlation analysis, principal components analysis, factor analysis, path analysis, multiple analyses of variance, clustering systems). This study compares and discusses the results obtained with these different techniques. Pereira, M.G., Trigo, R.M., DaCamara, C.C., Pereira, J.M.C., Leite, S.M., 2005: "Synoptic patterns associated with large summer forest fires in Portugal". Agricultural and Forest Meteorology. 129, 11-25. Pereira, M. G., Malamud, B. D., Trigo, R. M., and Alves, P. I.: The history and characteristics of the 1980-2005 Portuguese rural fire database, Nat. Hazards Earth Syst. Sci., 11, 3343-3358, doi:10.5194/nhess-11-3343-2011, 2011. This work is supported by European Union Funds (FEDER/COMPETE - Operational Competitiveness Programme) and by national funds (FCT - Portuguese Foundation for Science and Technology) under the project FCOMP-01-0124-FEDER-022692, the project FLAIR (PTDC/AAC-AMB/104702/2008) and the EU 7th Framework Program through FUME (contract number 243888).

  20. Spatial distribution and ecological risk assessment of heavy metal on surface sediment in west part of Java Sea

    NASA Astrophysics Data System (ADS)

    Effendi, Hefni; Wardiatno, Yusli; Kawaroe, Mujizat; Mursalin; Fauzia Lestari, Dea

    2017-01-01

    Surface sediments from the west part of the Java Sea were examined to evaluate the spatial distribution and ecological risk potential of heavy metals (Hg, As, Cd, Cr, Cu, Pb, Zn and Ni). The samples were taken from surface sediment (<0.5 m) at 26 m to 80 m water depth with an Ekman grab. The average material composition of the sediment samples was clay (9.86%), sand (8.57%) and mud sand (81.57%). The analysis showed that Pb (11.2%), Cd (49.7%), and Ni (59.5%) exceeded the Probable Effect Level (PEL). Based on ecological risk analysis, Cd ($E_r^i$: 300.64) and Cr ($E_r^i$: 0.02) were categorized as high risk and low risk, respectively. The ecological risk potential sequence in this study was Cd>Hg>Pb>Ni>Cu>As>Zn>Cr. Furthermore, the multivariate statistical analysis shows that heavy metal pairs (As/Ni, Cd/Ni, and Cu/Zn) and heavy metals with the Risk Index (Cd/Ri and Ni/Ri) were positively correlated at significance level p<0.05. Factor analysis explained 80.04% of the total variance with 3 factors (eigenvalues >1). In the cluster analysis, Cd, Ni, and Pb were identified as having a fairly high contamination level (cluster 1), Hg a moderate contamination level (cluster 2), and Cu, Zn, and Cr a lower contamination level (cluster 3).
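
    The abstract reports per-metal risk factors without stating how they are computed. The sketch below assumes the widely used Hakanson (1980) potential ecological risk formulation, in which a metal's risk factor is its toxic-response factor times the ratio of measured to background concentration, and the Risk Index is the sum over metals. The toxic-response factors, class thresholds, and function names are assumptions for illustration, not values taken from the study.

```python
# Toxic-response factors as commonly used with the Hakanson approach
# (assumed here; the abstract does not list them).
TOXIC_RESPONSE = {"Hg": 40, "Cd": 30, "As": 10, "Cu": 5, "Pb": 5, "Ni": 5, "Cr": 2, "Zn": 1}


def risk_factor(metal, concentration, background):
    """Single-metal risk factor: toxic response x (measured / background)."""
    return TOXIC_RESPONSE[metal] * concentration / background


def risk_class(er):
    """Map a single-metal risk factor to the usual Hakanson classes (assumed thresholds)."""
    if er < 40:
        return "low"
    if er < 80:
        return "moderate"
    if er < 160:
        return "considerable"
    if er < 320:
        return "high"
    return "very high"


def risk_index(factors):
    """Risk Index: sum of the single-metal risk factors."""
    return sum(factors.values())


# The two values quoted in the abstract fall into the classes it reports.
print(risk_class(300.64), risk_class(0.02))
```

    Under these assumed thresholds the quoted Cd value lands in the "high" class and the Cr value in the "low" class, which agrees with the categories given in the abstract.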

  1. Relating N2O emissions during biological nitrogen removal with operating conditions using multivariate statistical techniques.

    PubMed

    Vasilaki, V; Volcke, E I P; Nandi, A K; van Loosdrecht, M C M; Katsou, E

    2018-04-26

    Multivariate statistical analysis was applied to investigate the dependencies and underlying patterns between N2O emissions and online operational variables (dissolved oxygen and nitrogen component concentrations, temperature and influent flow-rate) during biological nitrogen removal from wastewater. The system under study was a full-scale reactor, for which hourly sensor data were available. The 15-month long monitoring campaign was divided into 10 sub-periods based on the profile of N2O emissions, using Binary Segmentation. The dependencies between operating variables and N2O emissions fluctuated according to Spearman's rank correlation. The correlation between N2O emissions and nitrite concentrations ranged between 0.51 and 0.78. Correlation >0.7 between N2O emissions and nitrate concentrations was observed at sub-periods with average temperature lower than 12 °C. Hierarchical k-means clustering and principal component analysis linked N2O emission peaks with precipitation events and ammonium concentrations higher than 2 mg/L, especially in sub-periods characterized by low N2O fluxes. Additionally, the highest ranges of measured N2O fluxes belonged to clusters corresponding with NO3-N concentration less than 1 mg/L in the upstream plug-flow reactor (middle of oxic zone), indicating slow nitrification rates. The results showed that the range of N2O emissions partially depends on the prior behavior of the system. The principal component analysis validated the findings from the clustering analysis and showed that ammonium, nitrate, nitrite and temperature explained a considerable percentage of the variance in the system for the majority of the sub-periods. The applied statistical methods linked the different ranges of emissions with the system variables, provided insights on the effect of operating conditions on N2O emissions in each sub-period and can be integrated into N2O emissions data processing at wastewater treatment plants.
Copyright © 2018. Published by Elsevier Ltd.
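
    The per-sub-period Spearman correlations described in this abstract can be illustrated with a minimal, standard-library sketch. It ranks each series (ties share the average rank) and takes the Pearson correlation of the ranks, once per sub-period. The function names and the toy readings are hypothetical; the study's own segmentation and data are not reproduced here.

```python
def ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; assumes neither series is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def spearman_by_period(emissions, variable, period_labels):
    """Compute the rank correlation separately within each labelled sub-period."""
    groups = {}
    for e, v, p in zip(emissions, variable, period_labels):
        xs, ys = groups.setdefault(p, ([], []))
        xs.append(e)
        ys.append(v)
    return {p: spearman(xs, ys) for p, (xs, ys) in groups.items()}


# Hypothetical hourly readings split into two sub-periods (labels 0 and 1)
n2o = [1.0, 2.0, 3.0, 3.0, 2.0, 1.0]
nitrite = [2.0, 4.0, 8.0, 1.0, 2.0, 3.0]
print(spearman_by_period(n2o, nitrite, [0, 0, 0, 1, 1, 1]))
```

    Computing the coefficient within each sub-period, rather than over the whole campaign, is what lets the sign and strength of a dependency change from one operating regime to the next.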

  2. A novel combined approach of diffuse reflectance UV-Vis-NIR spectroscopy and multivariate analysis for non-destructive examination of blue ballpoint pen inks in forensic application

    NASA Astrophysics Data System (ADS)

    Kumar, Raj; Sharma, Vishal

    2017-03-01

    The present research is focused on the analysis of writing inks using destructive UV-Vis spectroscopy (dissolution of ink by the solvent) and non-destructive diffuse reflectance UV-Vis-NIR spectroscopy along with chemometrics. Fifty-seven samples of blue ballpoint pen inks were analyzed under optimum conditions to determine the differences in spectral features of inks among same and different manufacturers. Normalization was performed on the spectroscopic data before chemometric analysis. Principal Component Analysis (PCA) and K-means cluster analysis were used on the data to ascertain whether the blue ballpoint pen inks could be differentiated by their UV-Vis/UV-Vis NIR spectra. The discriminating power is calculated by qualitative analysis by the visual comparison of the spectra (absorbance peaks), produced by the destructive and non-destructive methods. In the latter two methods, the pairwise comparison is made by incorporating the clustering method. It is found that the chemometric method provides better discriminating power (98.72% and 99.46%, in destructive and non-destructive, respectively) in comparison to the qualitative analysis (69.67%).

  3. Sampling effort affects multivariate comparisons of stream assemblages

    USGS Publications Warehouse

    Cao, Y.; Larsen, D.P.; Hughes, R.M.; Angermeier, P.L.; Patton, T.M.

    2002-01-01

    Multivariate analyses are used widely for determining patterns of assemblage structure, inferring species-environment relationships and assessing human impacts on ecosystems. The estimation of ecological patterns often depends on sampling effort, so the degree to which sampling effort affects the outcome of multivariate analyses is a concern. We examined the effect of sampling effort on site and group separation, which was measured using a mean similarity method. Two similarity measures, the Jaccard Coefficient and Bray-Curtis Index were investigated with 1 benthic macroinvertebrate and 2 fish data sets. Site separation was significantly improved with increased sampling effort because the similarity between replicate samples of a site increased more rapidly than between sites. Similarly, the faster increase in similarity between sites of the same group than between sites of different groups caused clearer separation between groups. The strength of site and group separation completely stabilized only when the mean similarity between replicates reached 1. These results are applicable to commonly used multivariate techniques such as cluster analysis and ordination because these multivariate techniques start with a similarity matrix. Completely stable outcomes of multivariate analyses are not feasible. Instead, we suggest 2 criteria for estimating the stability of multivariate analyses of assemblage data: 1) mean within-site similarity across all sites compared, indicating sample representativeness, and 2) the SD of within-site similarity across sites, measuring sample comparability.
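
    The two similarity measures named in this abstract, and the mean similarity built from them, can be sketched in a few lines. This is a minimal illustration using the standard definitions (Jaccard on presence/absence, Bray-Curtis on abundances); the function names and the toy counts are hypothetical, and the study's exact mean similarity procedure may differ.

```python
def jaccard_similarity(a, b):
    """Jaccard coefficient: taxa shared by both samples / taxa present in either."""
    present_a = {i for i, count in enumerate(a) if count > 0}
    present_b = {i for i, count in enumerate(b) if count > 0}
    union = present_a | present_b
    if not union:
        return 1.0  # convention chosen here: two empty samples count as identical
    return len(present_a & present_b) / len(union)


def bray_curtis_similarity(a, b):
    """Bray-Curtis similarity: 2 * sum of shared abundance / total abundance."""
    total = sum(a) + sum(b)
    if total == 0:
        return 1.0  # same convention for two empty samples
    return 2 * sum(min(x, y) for x, y in zip(a, b)) / total


def mean_similarity(samples, measure):
    """Mean of the measure over all pairs of samples (needs at least two)."""
    pairs = [(i, j) for i in range(len(samples)) for j in range(i + 1, len(samples))]
    return sum(measure(samples[i], samples[j]) for i, j in pairs) / len(pairs)


# Hypothetical counts of four taxa in two replicate samples from one site
replicate_1 = [10, 0, 5, 1]
replicate_2 = [8, 2, 5, 0]
print(jaccard_similarity(replicate_1, replicate_2))
print(bray_curtis_similarity(replicate_1, replicate_2))
```

    Applying `mean_similarity` to the replicates of each site gives a within-site value; its mean and standard deviation across sites correspond to the two stability criteria the abstract proposes.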

  4. Novel clustering of items from the Autism Diagnostic Interview-Revised to define phenotypes within autism spectrum disorders

    PubMed Central

    Hu, Valerie W.; Steinberg, Mara E.

    2009-01-01

    Heterogeneity in phenotypic presentation of ASD has been cited as one explanation for the difficulty in pinpointing specific genes involved in autism. Recent studies have attempted to reduce the “noise” in genetic and other biological data by reducing the phenotypic heterogeneity of the sample population. The current study employs multiple clustering algorithms on 123 item scores from the Autism Diagnostic Interview-Revised (ADI-R) diagnostic instrument of nearly 2000 autistic individuals to identify subgroups of autistic probands with clinically relevant behavioral phenotypes in order to isolate more homogeneous groups of subjects for gene expression analyses. Our combined cluster analyses suggest optimal division of the autistic probands into 4 phenotypic clusters based on similarity of symptom severity across the 123 selected item scores. One cluster is characterized by severe language deficits, while another exhibits milder symptoms across the domains. A third group possesses a higher frequency of savant skills while the fourth group exhibited intermediate severity across all domains. Grouping autistic individuals by multivariate cluster analysis of ADI-R scores reveals meaningful phenotypes of subgroups within the autistic spectrum which we show, in a related (accompanying) study, to be associated with distinct gene expression profiles. PMID:19455643

  5. Analysis of Big Data in Gait Biomechanics: Current Trends and Future Directions.

    PubMed

    Phinyomark, Angkoon; Petri, Giovanni; Ibáñez-Marcelo, Esther; Osis, Sean T; Ferber, Reed

    2018-01-01

    The increasing amount of data in biomechanics research has greatly increased the importance of developing advanced multivariate analysis and machine learning techniques, which are better able to handle "big data". Consequently, advances in data science methods will expand the knowledge for testing new hypotheses about biomechanical risk factors associated with walking and running gait-related musculoskeletal injury. This paper begins with a brief introduction to an automated three-dimensional (3D) biomechanical gait data collection system: 3D GAIT, followed by how the studies in the field of gait biomechanics fit the quantities in the 5 V's definition of big data: volume, velocity, variety, veracity, and value. Next, we provide a review of recent research and development in multivariate and machine learning methods-based gait analysis that can be applied to big data analytics. These modern biomechanical gait analysis methods include several main modules such as initial input features, dimensionality reduction (feature selection and extraction), and learning algorithms (classification and clustering). Finally, a promising big data exploration tool called "topological data analysis" and directions for future research are outlined and discussed.

  6. Evaluation of drinking quality of groundwater through multivariate techniques in urban area.

    PubMed

    Das, Madhumita; Kumar, A; Mohapatra, M; Muduli, S D

    2010-07-01

    Groundwater is a major source of drinking water in urban areas. Because urbanization and development increasingly threaten to degrade water quality, monitoring water quality is a prerequisite for ensuring its suitability for drinking. However, analyzing a large number of properties and evaluating water quality parameter by parameter is not feasible at regular intervals. Multivariate techniques can reduce the data to a reasonably manageable set without much loss of information. In this study, using principal component analysis, 11 relevant properties of 58 water samples were grouped into three statistical factors. Discriminant analysis identified "pH influence" as the most distinguishing factor and pH, Fe, and NO₃⁻ as the most discriminating variables, which could be treated as water quality indicators. These were used to classify the sampling sites into homogeneous clusters that reflect the location-wise importance of specific indicators for monitoring drinking water quality across the whole study area.

  7. imDEV: a graphical user interface to R multivariate analysis tools in Microsoft Excel

    PubMed Central

    Grapov, Dmitry; Newman, John W.

    2012-01-01

    Summary: Interactive modules for Data Exploration and Visualization (imDEV) is a Microsoft Excel spreadsheet embedded application providing an integrated environment for the analysis of omics data through a user-friendly interface. Individual modules enable interactive and dynamic analyses of large data by interfacing R's multivariate statistics and highly customizable visualizations with the spreadsheet environment, aiding robust inferences and generating information-rich data visualizations. This tool provides access to multiple comparisons with false discovery correction, hierarchical clustering, principal and independent component analyses, partial least squares regression and discriminant analysis, through an intuitive interface for creating high-quality two- and three-dimensional visualizations including scatter plot matrices, distribution plots, dendrograms, heat maps, biplots, trellis biplots and correlation networks. Availability and implementation: Freely available for download at http://sourceforge.net/projects/imdev/. Implemented in R and VBA and supported by Microsoft Excel (2003, 2007 and 2010). Contact: John.Newman@ars.usda.gov Supplementary Information: Installation instructions, tutorials and user's manual are available at http://sourceforge.net/projects/imdev/. PMID:22815358

  8. A cross-species socio-emotional behaviour development revealed by a multivariate analysis.

    PubMed

    Koshiba, Mamiko; Senoo, Aya; Mimura, Koki; Shirakawa, Yuka; Karino, Genta; Obara, Saya; Ozawa, Shinpei; Sekihara, Hitomi; Fukushima, Yuta; Ueda, Toyotoshi; Kishino, Hirohisa; Tanaka, Toshihisa; Ishibashi, Hidetoshi; Yamanouchi, Hideo; Yui, Kunio; Nakamura, Shun

    2013-01-01

    Recent progress in affective neuroscience and social neurobiology has been propelled by neuro-imaging technology and epigenetic approaches in the neurobiology of animal behaviour. However, quantitative measurements of socio-emotional development remain lacking, though sensory-motor development has been extensively studied in terms of digitised imaging analysis. Here, we developed a method for socio-emotional behaviour measurement that is based on video recordings under well-defined social contexts, using animal models with various social sensory interactions during development. The behaviour features digitised from the video recordings were visualised in a multivariate statistical space using principal component analysis. The clustering of the behaviour parameters suggested the existence of species- and stage-specific as well as cross-species behaviour modules. These modules were used to characterise the behaviour of children with or without autism spectrum disorders (ASDs). We found that socio-emotional behaviour is highly dependent on social context and that the cross-species behaviour modules may predict the neurobiological basis of ASDs.

  9. Handwriting Examination: Moving from Art to Science

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jarman, K.H.; Hanlen, R.C.; Manzolillo, P.A.

    In this document, we present a method for validating the premises and methodology of forensic handwriting examination. This method is intuitively appealing because it relies on quantitative measurements currently used qualitatively by FDE's in making comparisons, and it is scientifically rigorous because it exploits the power of multivariate statistical analysis. This approach uses measures of both central tendency and variation to construct a profile for a given individual. (Central tendency and variation are important for characterizing an individual's writing and both are currently used by FDE's in comparative analyses). Once constructed, different profiles are then compared for individuality using cluster analysis; they are grouped so that profiles within a group cannot be differentiated from one another based on the measured characteristics, whereas profiles between groups can. The cluster analysis procedure used here exploits the power of multivariate hypothesis testing. The result is not only a profile grouping but also an indication of statistical significance of the groups generated.

  10. Toeplitz Inverse Covariance-Based Clustering of Multivariate Time Series Data

    PubMed Central

    Hallac, David; Vare, Sagar; Boyd, Stephen; Leskovec, Jure

    2018-01-01

    Subsequence clustering of multivariate time series is a useful tool for discovering repeated patterns in temporal data. Once these patterns have been discovered, seemingly complicated datasets can be interpreted as a temporal sequence of only a small number of states, or clusters. For example, raw sensor data from a fitness-tracking application can be expressed as a timeline of a select few actions (i.e., walking, sitting, running). However, discovering these patterns is challenging because it requires simultaneous segmentation and clustering of the time series. Furthermore, interpreting the resulting clusters is difficult, especially when the data is high-dimensional. Here we propose a new method of model-based clustering, which we call Toeplitz Inverse Covariance-based Clustering (TICC). Each cluster in the TICC method is defined by a correlation network, or Markov random field (MRF), characterizing the interdependencies between different observations in a typical subsequence of that cluster. Based on this graphical representation, TICC simultaneously segments and clusters the time series data. We solve the TICC problem through alternating minimization, using a variation of the expectation maximization (EM) algorithm. We derive closed-form solutions to efficiently solve the two resulting subproblems in a scalable way, through dynamic programming and the alternating direction method of multipliers (ADMM), respectively. We validate our approach by comparing TICC to several state-of-the-art baselines in a series of synthetic experiments, and we then demonstrate on an automobile sensor dataset how TICC can be used to learn interpretable clusters in real-world scenarios. PMID:29770257

  11. DEFINITION OF MULTIVARIATE GEOCHEMICAL ASSOCIATIONS WITH POLYMETALLIC MINERAL OCCURRENCES USING A SPATIALLY DEPENDENT CLUSTERING TECHNIQUE AND RASTERIZED STREAM SEDIMENT DATA - AN ALASKAN EXAMPLE.

    USGS Publications Warehouse

    Jenson, Susan K.; Trautwein, C.M.

    1984-01-01

    The application of an unsupervised, spatially dependent clustering technique (AMOEBA) to interpolated raster arrays of stream sediment data has been found to provide useful multivariate geochemical associations for modeling regional polymetallic resource potential. The technique is based on three assumptions regarding the compositional and spatial relationships of stream sediment data and their regional significance. These assumptions are: (1) compositionally separable classes exist and can be statistically distinguished; (2) the classification of multivariate data should minimize the pair probability of misclustering to establish useful compositional associations; and (3) a compositionally defined class represented by three or more contiguous cells within an array is a more important descriptor of a terrane than a class represented by spatial outliers.

  12. Using Clustering to Establish Climate Regimes from PCM Output

    NASA Technical Reports Server (NTRS)

    Oglesby, Robert; Arnold, James E. (Technical Monitor); Hoffman, Forrest; Hargrove, W. W.; Erickson, D.

    2002-01-01

    A multivariate statistical clustering technique, based on the k-means algorithm of Hartigan, has been used to extract patterns of climatological significance from 200 years of general circulation model (GCM) output. Originally developed and implemented on a Beowulf-style parallel computer constructed by Hoffman and Hargrove from surplus commodity desktop PCs, the high performance parallel clustering algorithm was previously applied to the derivation of ecoregions from map stacks of 9 and 25 geophysical conditions or variables for the conterminous U.S. at a resolution of 1 sq km. Now applied both across space and through time, the clustering technique yields temporally-varying climate regimes predicted by transient runs of the Parallel Climate Model (PCM). Using a business-as-usual (BAU) scenario and clustering four fields of significance to the global water cycle (surface temperature, precipitation, soil moisture, and snow depth) from 1871 through 2098, the authors' analysis shows an increase in spatial area occupied by the cluster or climate regime which typifies desert regions (i.e., an increase in desertification) and a decrease in the spatial area occupied by the climate regime typifying winter-time high latitude permafrost regions. The patterns of cluster changes have been analyzed to understand the predicted variability in the water cycle on global and continental scales. In addition, representative climate regimes were determined by taking three 10-year averages of the fields 100 years apart for northern hemisphere winter (December, January, and February) and summer (June, July, and August). The result is global maps of typical seasonal climate regimes for 100 years in the past, for the present, and for 100 years into the future.
Using three-dimensional data or phase space representations of these climate regimes (i.e., the cluster centroids), the authors demonstrate the portion of this phase space occupied by the land surface at all points in space and time. Any single spot on the globe will exist in one of these climate regimes at any single point in time. By incrementing time, that same spot will trace out a trajectory or orbit between and among these climate regimes (or atmospheric states) in phase (or state) space. When a geographic region enters a state it never previously visited, a climatic change is said to have occurred. Tracing out the entire trajectory of a single spot on the globe yields a 'manifold' in state space representing the shape of its predicted climate occupancy. This sort of analysis enables a researcher to more easily grasp the multivariate behavior of the climate system.
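
    The trajectory idea in this abstract can be sketched as follows: each time step of a grid cell is assigned to the nearest cluster centroid (climate regime), the sequence of assignments is the cell's path through state space, and a change is flagged when the cell enters a regime it has not visited before. The centroids are assumed to come from a prior clustering run; all names and numbers below are hypothetical.

```python
def nearest_regime(state, centroids):
    """Index of the centroid closest to `state` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda k: sum((s - c) ** 2 for s, c in zip(state, centroids[k])),
    )


def trajectory(states, centroids):
    """Regime index occupied by one location at each time step."""
    return [nearest_regime(state, centroids) for state in states]


def first_visits(path):
    """Time steps (after the first) at which a not-yet-visited regime is entered."""
    if not path:
        return []
    seen = {path[0]}
    changes = []
    for t, regime in enumerate(path[1:], start=1):
        if regime not in seen:
            changes.append(t)
            seen.add(regime)
    return changes


# Hypothetical centroids in a standardized (temperature, precipitation, soil moisture) space
centroids = [(1.0, -1.0, -1.0), (0.0, 0.5, 0.5), (-1.5, 0.0, 0.2)]
# Hypothetical states of a single grid cell over five time steps
cell_states = [(0.1, 0.4, 0.6), (0.2, 0.3, 0.4), (0.8, -0.7, -0.8), (0.1, 0.5, 0.5), (0.9, -0.9, -1.0)]

path = trajectory(cell_states, centroids)
print(path, first_visits(path))
```

    Returning to a regime already visited is not flagged, which matches the abstract's wording that a climatic change occurs only when a region enters a state it never previously visited.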

  13. Evaluation of biomolecular distributions in rat brain tissues by means of ToF-SIMS using a continuous beam of Ar clusters.

    PubMed

    Nakano, Shusuke; Yokoyama, Yuta; Aoyagi, Satoka; Himi, Naoyuki; Fletcher, John S; Lockyer, Nicholas P; Henderson, Alex; Vickerman, John C

    2016-06-08

    Time-of-flight secondary ion mass spectrometry (ToF-SIMS) provides detailed chemical structure information and high spatial resolution images. Therefore, ToF-SIMS is useful for studying biological phenomena such as ischemia. In this study, in order to evaluate cerebral microinfarction, the distribution of biomolecules generated by ischemia was measured with ToF-SIMS. ToF-SIMS data sets were analyzed by means of multivariate analysis for interpreting complex samples containing unknown information and to obtain biomolecular mapping indicated by fragment ions from the target biomolecules. Using conventional ToF-SIMS (primary ion source: Bi cluster ion), it is difficult to detect secondary ions beyond approximately 1000 u. Moreover, the intensity of secondary ions related to biomolecules is not always high enough for imaging because of low concentration even if the masses are lower than 1000 u. However, for the observation of biomolecular distributions in tissues, it is important to detect low amounts of biological molecules from a particular area of tissue. Rat brain tissue samples were measured with ToF-SIMS (J105, Ionoptika, Ltd., Chandlers Ford, UK), using a continuous beam of Ar clusters as a primary ion source. ToF-SIMS with Ar clusters efficiently detects secondary ions related to biomolecules and larger molecules. Molecules detected by ToF-SIMS were examined by analyzing ToF-SIMS data using multivariate analysis. Microspheres (45 μm diameter) were injected into the rat unilateral internal carotid artery (MS rat) to cause cerebral microinfarction. The rat brain was sliced and then measured with ToF-SIMS. The brain samples of a normal rat and the MS rat were examined to find specific secondary ions related to important biomolecules, and then the difference between them was investigated. Finally, specific secondary ions were found around vessels incorporating microspheres in the MS rat. The results suggest that important biomolecules related to cerebral microinfarction can be detected by ToF-SIMS.

  14. A data fusion-based drought index

    NASA Astrophysics Data System (ADS)

    Azmi, Mohammad; Rüdiger, Christoph; Walker, Jeffrey P.

    2016-03-01

    Drought and water stress monitoring plays an important role in the management of water resources, especially during periods of extreme climate conditions. Here, a data fusion-based drought index (DFDI) has been developed and analyzed for three different locations of varying land use and climate regimes in Australia. The proposed index comprehensively considers all types of drought through a selection of indices and proxies associated with each drought type. In deriving the proposed index, weekly data from three different data sources (OzFlux Network, Asia-Pacific Water Monitor, and MODIS-Terra satellite) were employed to first derive commonly used individual standardized drought indices (SDIs), which were then grouped using an advanced clustering method. Next, three different multivariate methods (principal component analysis, factor analysis, and independent component analysis) were utilized to aggregate the SDIs located within each group. For the two clusters in which the grouped SDIs best reflected the water availability and vegetation conditions, the variables were aggregated based on an averaging between the standardized first principal components of the different multivariate methods. Then, considering those two aggregated indices as well as the classifications of months (dry/wet months and active/non-active months), the proposed DFDI was developed. Finally, the symbolic regression method was used to derive mathematical equations for the proposed DFDI. The results presented here show that the proposed index has revealed new aspects in water stress monitoring which previous indices were not able to, by simultaneously considering both hydrometeorological and ecological concepts to define the real water stress of the study areas.

  15. Application of multivariate statistical analysis concepts for assessment of hydrogeochemistry of groundwater—a study in Suri I and II blocks of Birbhum District, West Bengal, India

    NASA Astrophysics Data System (ADS)

    Das, Shreya; Nag, S. K.

    2017-05-01

    Multivariate statistical techniques (cluster analysis and principal component analysis) were applied to groundwater quality data from Suri I and II Blocks of Birbhum District, West Bengal, India, to extract principal factors corresponding to the different sources of variation in the hydrochemistry as well as the main controls on the hydrochemistry. For this, bore well water samples were collected in two phases, during post-monsoon (November 2012) and pre-monsoon (April 2013), from 26 sampling locations spread homogeneously over the two blocks. Excess fluoride in groundwater was reported at two locations in both the post- and pre-monsoon sessions, with a rise observed in pre-monsoon. Localized presence of excess iron was also observed during both sessions. The water is found to be mildly alkaline in post-monsoon but slightly acidic at some locations during pre-monsoon. Correlation and cluster analysis studies demonstrate that fluoride shares a moderately positive correlation with pH in post-monsoon and a very strong one with carbonate in pre-monsoon, indicating dominance of rock-water interaction and ion exchange activity in the study area. Certain locations in the study area were reported with less than 0.6 mg/l fluoride in groundwater, raising the possibility of severe dental caries, especially in children. Low values of sulfate and phosphate in water indicate a meager chance of contamination of groundwater due to anthropogenic factors.

  16. Cluster analysis and quality assessment of logged water at an irrigation project, eastern Saudi Arabia.

    PubMed

    Hussain, Mahbub; Ahmed, Syed Munaf; Abderrahman, Walid

    2008-01-01

    A multivariate statistical technique, cluster analysis, was used to assess the logged surface water quality at an irrigation project at Al-Fadhley, Eastern Province, Saudi Arabia. The principal idea behind using the technique was to utilize all available hydrochemical variables in the quality assessment, including trace elements and other ions which are not considered in conventional techniques for water quality assessments like Stiff and Piper diagrams. Furthermore, the area belongs to an irrigation project where water contamination associated with the use of fertilizers, insecticides and pesticides is expected. This quality assessment study was carried out on a total of 34 surface/logged water samples. To gain greater insight into the seasonal variation of water quality, 17 samples were collected in each of the summer and winter seasons. The collected samples were analyzed for a total of 23 water quality parameters including pH, TDS, conductivity, alkalinity, sulfate, chloride, bicarbonate, nitrate, phosphate, bromide, fluoride, calcium, magnesium, sodium, potassium, arsenic, boron, copper, cobalt, iron, lithium, manganese, molybdenum, nickel, selenium, mercury and zinc. Cluster analysis in both Q and R modes was used. Q-mode analysis resulted in three distinct water types for both the summer and winter seasons. Q-mode analysis also showed the spatial as well as temporal variation in water quality. R-mode cluster analysis led to the conclusion that there are two major sources of contamination for the surface/shallow groundwater in the area: fertilizers, micronutrients, pesticides, and insecticides used in agricultural activities, and non-point natural sources.
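
    The Q-mode and R-mode distinction in the record above can be illustrated with a minimal sketch. It assumes the usual convention that Q-mode compares samples (rows) and R-mode compares variables (columns); the choice of z-scored Euclidean distance and of 1 minus Pearson correlation as dissimilarities, the function names, and the toy numbers are all illustrative assumptions, not the authors' procedure. Either matrix could then be passed to a hierarchical clustering routine.

```python
import math

def zscore_columns(rows):
    """Standardize each variable (column) to mean 0 and sample SD 1.
    Assumes every column has non-zero variance."""
    scaled_cols = []
    for col in zip(*rows):
        m = sum(col) / len(col)
        sd = math.sqrt(sum((v - m) ** 2 for v in col) / (len(col) - 1))
        scaled_cols.append([(v - m) / sd for v in col])
    return [list(r) for r in zip(*scaled_cols)]

def pearson(xs, ys):
    """Pearson correlation between two equally long sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def q_mode_distances(rows):
    """Q-mode: dissimilarity between samples (one row per water sample)."""
    z = zscore_columns(rows)
    return [[math.dist(a, b) for b in z] for a in z]

def r_mode_distances(rows):
    """R-mode: dissimilarity between variables (one column per parameter)."""
    cols = list(zip(*rows))
    return [[1 - pearson(a, b) for b in cols] for a in cols]

# Hypothetical data: 4 samples x 3 parameters (values are made up).
samples = [(1, 2, 9), (2, 4, 7), (3, 6, 8), (4, 8, 1)]
q = q_mode_distances(samples)  # 4 x 4 matrix, sample vs sample
r = r_mode_distances(samples)  # 3 x 3 matrix, variable vs variable
```

    In this toy data the second parameter is exactly twice the first, so the two variables sit at zero R-mode distance and would merge first in any agglomerative scheme.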

  17. Three estimates of the association between linear growth failure and cognitive ability.

    PubMed

    Cheung, Y B; Lam, K F

    2009-09-01

    To compare three estimators of association between growth stunting as measured by height-for-age Z-score and cognitive ability in children, and to examine the extent to which statistical adjustment for covariates is useful for removing confounding due to socio-economic status. Three estimators, namely random-effects, within- and between-cluster estimators, for panel data were used to estimate the association in a survey of 1105 pairs of siblings who were assessed for anthropometry and cognition. Furthermore, a 'combined' model was formulated to simultaneously provide the within- and between-cluster estimates. Random-effects and between-cluster estimators showed strong association between linear growth and cognitive ability, even after adjustment for a range of socio-economic variables. In contrast, the within-cluster estimator showed a much more modest association: For every increase of one Z-score in linear growth, cognitive ability increased by about 0.08 standard deviation (P < 0.001). The combined model verified that the between-cluster estimate was significantly larger than the within-cluster estimate (P = 0.004). Residual confounding by socio-economic situations may explain a substantial proportion of the observed association between linear growth and cognition in studies that attempt to control the confounding by means of multivariable regression analysis. The within-cluster estimator provides more convincing and modest results about the strength of association.
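
    The difference between the within- and between-cluster estimators can be sketched in a few lines. This is a simplified, unadjusted illustration under stated assumptions: one predictor, no covariates, equally sized clusters (sibling pairs), and ordinary least-squares slopes. The function names and the toy data are hypothetical and do not reproduce the study's models.

```python
from statistics import mean

def ols_slope(xs, ys):
    """Least-squares slope of y on x."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

def within_between(clusters):
    """Return (within-cluster slope, between-cluster slope).

    clusters: list of clusters, each a list of (x, y) observations,
    e.g. one cluster per sibling pair.
    """
    mean_x, mean_y, dev_x, dev_y = [], [], [], []
    for members in clusters:
        cx = mean(x for x, _ in members)
        cy = mean(y for _, y in members)
        mean_x.append(cx)
        mean_y.append(cy)
        for x, y in members:
            # Deviations from the cluster mean remove everything shared
            # by the cluster, such as household socio-economic status.
            dev_x.append(x - cx)
            dev_y.append(y - cy)
    return ols_slope(dev_x, dev_y), ols_slope(mean_x, mean_y)

# Made-up sibling pairs as (growth Z-score, cognition score).
pairs = [
    [(-1.5, -1.05), (-0.5, -0.95)],
    [(-0.5, -0.05), (0.5, 0.05)],
    [(0.5, 0.95), (1.5, 1.05)],
]
within, between = within_between(pairs)
```

    The toy data are built so that the between-pair slope is much steeper than the within-pair slope, mirroring the qualitative pattern the abstract reports; the numbers themselves carry no empirical meaning.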

  18. Factors associated with recently transmitted Mycobacterium tuberculosis strain MS0006 in Hinds County, Mississippi.

    PubMed

    Temple, Brian; Kwara, Awewura; Sunesara, Imran; Mena, Leandro; Dobbs, Thomas; Henderson, Harold; Holcomb, Mike; Webb, Risa

    2011-12-01

    The objective of this study was to investigate risk factors associated with tuberculosis (TB) transmission that was caused by Mycobacterium tuberculosis strain MS0006 from 2004 to 2009 in Hinds County, Mississippi. DNA fingerprinting using spoligotyping, mycobacterial interspersed repetitive unit, and IS6110-based restriction fragment length polymorphism of culture-confirmed cases of TB was performed. Clinical and demographic factors associated with strain MS0006 were analyzed by univariate and multivariate analysis. Of the 144 cases of TB diagnosed during the study period, 117 were culture positive with fingerprints available. There were 48 different strains, of which 6 clustered strains were distributed among 74 patients. The MS0006 strain accounted for 46.2% of all culture-confirmed cases. Risk factors for having the MS0006 strain in a univariate analysis included homelessness, HIV co-infection, sputum smear negativity, tuberculin skin test negativity, and noninjectable drug use. Multivariate analysis identified homelessness (odds ratio 7.88, 95% confidence interval 2.90-21.35) and African American race (odds ratio 5.80, 95% confidence interval 1.37-24.55) as independent predictors of having TB caused by the MS0006 strain of M. tuberculosis. Our findings suggest that a majority of recently transmitted TB in the studied county was caused by the MS0006 strain. African American race and homelessness were significant risk factors for inclusion in the cluster. Molecular epidemiology techniques continue to provide in-depth analysis of disease transmission and play a vital role in effective contact tracing and interruption of ongoing transmission.

  19. Identifying patterns of general practitioner service utilisation and their relationship with potentially preventable hospitalisations in people with diabetes: The utility of a cluster analysis approach.

    PubMed

    Ha, Ninh Thi; Harris, Mark; Preen, David; Robinson, Suzanne; Moorin, Rachael

    2018-04-01

    We aimed to characterise use of general practitioners (GP) simultaneously across multiple attributes in people with diabetes and examine its impact on diabetes-related potentially preventable hospitalisations (PPHs). Five years of panel data from 40,625 adults with diabetes were sourced from Western Australian administrative health records. Cluster analysis (CA) was used to group individuals with similar patterns of GP utilisation characterised by frequency and recency of services. The relationship between GP utilisation cluster and the risk of PPHs was examined using multivariable random-effects negative binomial regression. CA categorised GP utilisation into three clusters (moderate, high and very high usage) with distinct patient characteristics. After adjusting for potential confounders, the rate of PPHs was significantly lower across all GP usage clusters compared with those with no GP usage: IRR = 0.67 (95%CI: 0.62-0.71) among the moderate, IRR = 0.70 (95%CI 0.66-0.73) among the high and IRR = 0.76 (95%CI 0.72-0.80) among the very high GP usage clusters. Combination of temporal factors with measures of frequency of use of GP services revealed patterns of primary health care utilisation associated with different underlying patient characteristics. Incorporation of multiple attributes that go beyond frequency-based approaches may better characterise the complex relationship between use of GP services and diabetes-related hospitalisation. Copyright © 2018 Elsevier B.V. All rights reserved.

  20. A Model-Based Analysis of Chemical and Temporal Patterns of Cuticular Hydrocarbons in Male Drosophila melanogaster

    PubMed Central

    Kent, Clement; Azanchi, Reza; Smith, Ben; Chu, Adrienne; Levine, Joel

    2007-01-01

    Drosophila Cuticular Hydrocarbons (CH) influence courtship behaviour, mating, aggregation, oviposition, and resistance to desiccation. We measured levels of 24 different CH compounds of individual male D. melanogaster hourly under a variety of environmental (LD/DD) conditions. Using a model-based analysis of CH variation, we developed an improved normalization method for CH data, and show that CH compounds have reproducible cyclic within-day temporal patterns of expression which differ between LD and DD conditions. Multivariate clustering of expression patterns identified 5 clusters of co-expressed compounds with common chemical characteristics. Turnover rate estimates suggest CH production may be a significant metabolic cost. Male cuticular hydrocarbon expression is a dynamic trait influenced by light and time of day; since abundant hydrocarbons affect male sexual behavior, males may present different pheromonal profiles at different times and under different conditions. PMID:17896002

  1. Permutation Tests of Hierarchical Cluster Analyses of Carrion Communities and Their Potential Use in Forensic Entomology.

    PubMed

    van der Ham, Joris L

    2016-05-19

    Forensic entomologists can use carrion communities' ecological succession data to estimate the postmortem interval (PMI). Permutation tests of hierarchical cluster analyses of these data provide a conceptual method to estimate part of the PMI, the post-colonization interval (post-CI). This multivariate approach produces a baseline of statistically distinct clusters that reflect changes in the carrion community composition during the decomposition process. Carrion community samples of unknown post-CIs are compared with these baseline clusters to estimate the post-CI. In this short communication, I use data from previously published studies to demonstrate the conceptual feasibility of this multivariate approach. Analyses of these data produce series of significantly distinct clusters, which represent carrion communities during 1- to 20-day periods of the decomposition process. For 33 carrion community samples, collected over an 11-day period, this approach correctly estimated the post-CI within an average range of 3.1 days. © The Authors 2016. Published by Oxford University Press on behalf of Entomological Society of America. All rights reserved.
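
    The general idea of testing whether two candidate clusters are statistically distinct can be sketched with an exact permutation test. This is a generic illustration, not the procedure used in the paper: the test statistic (mean between-group distance minus mean within-group distance), the exhaustive enumeration of label assignments, the function names, and the toy data are all assumptions made for the example.

```python
import math
from itertools import combinations

def separation(points, group_a):
    """Mean between-group distance minus mean within-group distance.

    group_a holds the indices assigned to the first group; all other
    indices form the second group. Needs at least one within-group pair.
    """
    in_a = set(group_a)
    within, between = [], []
    for i, j in combinations(range(len(points)), 2):
        d = math.dist(points[i], points[j])
        (within if (i in in_a) == (j in in_a) else between).append(d)
    return sum(between) / len(between) - sum(within) / len(within)

def exact_permutation_p(points, group_a):
    """Share of all equally sized label assignments whose separation is
    at least as large as the observed one (one-sided p-value)."""
    observed = separation(points, group_a)
    splits = list(combinations(range(len(points)), len(group_a)))
    hits = sum(1 for s in splits if separation(points, s) >= observed - 1e-12)
    return hits / len(splits)

# Hypothetical one-dimensional community scores for two sampling periods.
scores = [(0,), (1,), (2,), (3,), (10,), (11,), (12,), (13,)]
p_value = exact_permutation_p(scores, [0, 1, 2, 3])
```

    Exhaustive enumeration is only practical for small samples; for larger data sets a random subset of permutations would normally be drawn instead.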

  2. Phenotype in combination with genotype improves outcome prediction in acute myeloid leukemia: a report from Children’s Oncology Group protocol AAML0531

    PubMed Central

    Voigt, Andrew P.; Brodersen, Lisa Eidenschink; Alonzo, Todd A.; Gerbing, Robert B.; Menssen, Andrew J.; Wilson, Elisabeth R.; Kahwash, Samir; Raimondi, Susana C.; Hirsch, Betsy A.; Gamis, Alan S.; Meshinchi, Soheil; Wells, Denise A.; Loken, Michael R.

    2017-01-01

    Diagnostic biomarkers can be used to determine relapse risk in acute myeloid leukemia, and certain genetic aberrancies have prognostic relevance. A diagnostic immunophenotypic expression profile, which quantifies the amounts of distinct gene products, not just their presence or absence, was established in order to improve outcome prediction for patients with acute myeloid leukemia. The immunophenotypic expression profile, which defines each patient’s leukemia as a location in 15-dimensional space, was generated for 769 patients enrolled in the Children’s Oncology Group AAML0531 protocol. Unsupervised hierarchical clustering grouped patients with similar immunophenotypic expression profiles into eleven patient cohorts, demonstrating high associations among phenotype, genotype, morphology, and outcome. Of 95 patients with inv(16), 79% segregated in Cluster A. Of 109 patients with t(8;21), 92% segregated in Clusters A and B. Of 152 patients with 11q23 alterations, 78% segregated in Clusters D, E, F, G, or H. For both inv(16) and 11q23 abnormalities, differential phenotypic expression identified patient groups with different survival characteristics (P<0.05). Clinical outcome analysis revealed that Cluster B (predominantly t(8;21)) was associated with favorable outcome (P<0.001) and Clusters E, G, H, and K were associated with adverse outcomes (P<0.05). Multivariable regression analysis revealed that Clusters E, G, H, and K were independently associated with worse survival (P range <0.001 to 0.008). The Children’s Oncology Group AAML0531 trial: clinicaltrials.gov Identifier: 00372593. PMID:28883080

  3. Extending the Functionality of Behavioural Change-Point Analysis with k-Means Clustering: A Case Study with the Little Penguin (Eudyptula minor)

    PubMed Central

    Zhang, Jingjing; Dennis, Todd E.

    2015-01-01

    We present a simple framework for classifying mutually exclusive behavioural states within the geospatial lifelines of animals. This method involves use of three sequentially applied statistical procedures: (1) behavioural change point analysis to partition movement trajectories into discrete bouts of same-state behaviours, based on abrupt changes in the spatio-temporal autocorrelation structure of movement parameters; (2) hierarchical multivariate cluster analysis to determine the number of different behavioural states; and (3) k-means clustering to classify inferred bouts of same-state location observations into behavioural modes. We demonstrate application of the method by analysing synthetic trajectories of known ‘artificial behaviours’ comprised of different correlated random walks, as well as real foraging trajectories of little penguins (Eudyptula minor) obtained by global-positioning-system telemetry. Our results show that the modelling procedure correctly classified 92.5% of all individual location observations in the synthetic trajectories, demonstrating reasonable ability to successfully discriminate behavioural modes. Most individual little penguins were found to exhibit three unique behavioural states (resting, commuting/active searching, area-restricted foraging), with variation in the timing and locations of observations apparently related to ambient light, bathymetry, and proximity to coastlines and river mouths. Addition of k-means clustering extends the utility of behavioural change point analysis, by providing a simple means through which the behaviours inferred for the location observations comprising individual movement trajectories can be objectively classified. PMID:25922935
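
    Step (3) of the framework above, k-means classification of bouts, can be sketched as follows. This is a plain Lloyd's-algorithm illustration with caller-supplied starting centroids so that the result is deterministic; the two bout features (mean speed and mean turning angle), the function name, and the numbers are hypothetical and are not taken from the study.

```python
import math

def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm. Returns (labels, final centroids)."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                updated.append(old)  # keep an empty cluster where it was
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids

# Hypothetical bout summaries: (mean speed, mean turning angle).
bouts = [(0.1, 0.2), (0.2, 0.1), (5.0, 0.1), (5.2, 0.3), (1.0, 2.0), (1.2, 2.2)]
# k = 3 is assumed to come from the preceding hierarchical step.
labels, centres = kmeans(bouts, [bouts[0], bouts[2], bouts[4]])
```

    Supplying the initial centroids keeps the sketch reproducible; in practice k-means is usually run from several random starts and the best solution is kept.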

  4. Extending the Functionality of Behavioural Change-Point Analysis with k-Means Clustering: A Case Study with the Little Penguin (Eudyptula minor).

    PubMed

    Zhang, Jingjing; O'Reilly, Kathleen M; Perry, George L W; Taylor, Graeme A; Dennis, Todd E

    2015-01-01

    We present a simple framework for classifying mutually exclusive behavioural states within the geospatial lifelines of animals. This method involves use of three sequentially applied statistical procedures: (1) behavioural change point analysis to partition movement trajectories into discrete bouts of same-state behaviours, based on abrupt changes in the spatio-temporal autocorrelation structure of movement parameters; (2) hierarchical multivariate cluster analysis to determine the number of different behavioural states; and (3) k-means clustering to classify inferred bouts of same-state location observations into behavioural modes. We demonstrate application of the method by analysing synthetic trajectories of known 'artificial behaviours' comprised of different correlated random walks, as well as real foraging trajectories of little penguins (Eudyptula minor) obtained by global-positioning-system telemetry. Our results show that the modelling procedure correctly classified 92.5% of all individual location observations in the synthetic trajectories, demonstrating reasonable ability to successfully discriminate behavioural modes. Most individual little penguins were found to exhibit three unique behavioural states (resting, commuting/active searching, area-restricted foraging), with variation in the timing and locations of observations apparently related to ambient light, bathymetry, and proximity to coastlines and river mouths. Addition of k-means clustering extends the utility of behavioural change point analysis, by providing a simple means through which the behaviours inferred for the location observations comprising individual movement trajectories can be objectively classified.

  5. Improving the sampling strategy of the Joint Danube Survey 3 (2013) by means of multivariate statistical techniques applied on selected physico-chemical and biological data.

    PubMed

    Hamchevici, Carmen; Udrea, Ion

    2013-11-01

    The concept of basin-wide Joint Danube Survey (JDS) was launched by the International Commission for the Protection of the Danube River (ICPDR) as a tool for investigative monitoring under the Water Framework Directive (WFD), with a frequency of 6 years. The first JDS was carried out in 2001 and its success in providing key information for characterisation of the Danube River Basin District as required by WFD led to the organisation of the second JDS in 2007, which was the world's biggest river research expedition in that year. The present paper presents an approach for improving the survey strategy for the next planned survey JDS3 (2013) by means of several multivariate statistical techniques. In order to design the optimum structure in terms of parameters and sampling sites, principal component analysis (PCA), factor analysis (FA) and cluster analysis were applied on JDS2 data for 13 selected physico-chemical and one biological element measured in 78 sampling sites located on the main course of the Danube. Results from PCA/FA showed that most of the dataset variance (above 75%) was explained by five varifactors loaded with 8 out of 14 variables: physical (transparency and total suspended solids), relevant nutrients (N-nitrates and P-orthophosphates), feedback effects of primary production (pH, alkalinity and dissolved oxygen) and algal biomass. Taking into account the representation of the factor scores given by FA versus sampling sites and the major groups generated by the clustering procedure, the spatial network of the next survey could be carefully tailored, reducing the number of sampling sites by more than 30%. The approach of target-oriented sampling strategy based on the selected multivariate statistics can provide a strong reduction in dimensionality of the original data and corresponding costs as well, without any loss of information.

  6. Geomorphic Classification and Evaluation of Channel Width and Emergent Sandbar Habitat Relations on the Lower Platte River, Nebraska

    USGS Publications Warehouse

    Elliott, Caroline M.

    2011-01-01

    This report presents a summary of geomorphic characteristics extracted from aerial imagery for three broad segments of the Lower Platte River. This report includes a summary of the longitudinal multivariate classification in Elliott and others (2009) and presents a new analysis of total channel width and habitat variables. Three segments on the lower 102.8 miles of the Lower Platte River are addressed in this report: the Loup River to the Elkhorn River (70 miles long), the Elkhorn River to Salt Creek (6.9 miles long), and Salt Creek to the Missouri River (25.9 miles long). The locations of these segments were determined by the locations of tributaries potentially significant to the hydrology or sediment supply of the Lower Platte River. This report summarizes channel characteristics as mapped from July 2006 aerial imagery including river width, valley width, channel curvature, and in-channel habitat features. In-channel habitat measurements were not made under consistent hydrologic conditions and must be considered general estimates of channel condition in late July 2006. Longitudinal patterns in these features are explored and are summarized in the context of the longitudinal multivariate classification in Elliott and others (2009) for the three Lower Platte River segments. Detailed descriptions of data collection and classification methods are described in Elliott and others (2009). Nesting data for the endangered interior least tern (Sternula antillarum) and threatened piping plover (Charadrius melodus) from 2006 through 2009 are examined within the context of the multivariate classification and Lower Platte River segments. The widest reaches of the Lower Platte River are located in the segment downstream from the Loup River to the Elkhorn River. This segment also has the widest valley and highest degree of braiding of the three segments and many large vegetated islands. 
    The short segment of river between the Elkhorn River and Salt Creek has a fairly low valley width and high channel sinuosities at larger scales. The segment from Salt Creek to the Missouri River has narrow valleys and generally low channel sinuosity. Tern and plover nest sites from 2006 through 2009 in the multi-scale multivariate classification indicated relative nesting selection of cluster 2 reaches among the four-cluster classification and reaches containing clusters 2, 3, and 6 from the seven-cluster classification. These classes, with the exception of cluster 6 are common downstream from the Elkhorn River. Trends in total channel width indicated that reaches dominated by dark vegetation (islands) are the widest on the Lower Platte River. Reaches with high percentages of dry sand and dry sand plus light vegetation were the narrowest reaches. This suggests that narrow channel reaches have sufficient transport capacity to maintain sandbars under recent (2006) flow regimes and are likely to be most amenable to maintaining tern and plover habitat in the Lower Platte River. Further investigations into the dynamics of emergent sandbar habitat and the effects of bank stabilization on in-channel habitats will require the collection and analysis of new data, particularly detailed elevation information and an assessment of existing bank stabilization structures.

  7. Exploring Geographical Differentiation of the Hoelen Medicinal Mushroom, Wolfiporia extensa (Agaricomycetes), Using Fourier-Transform Infrared Spectroscopy Combined with Multivariate Analysis.

    PubMed

    Li, Yan; Zhang, Ji; Zhao, Yanli; Liu, Honggao; Wang, Yuanzhong; Jin, Hang

    2016-01-01

    In this study the geographical differentiation of dried sclerotia of the medicinal mushroom Wolfiporia extensa, obtained from different regions in Yunnan Province, China, was explored using Fourier-transform infrared (FT-IR) spectroscopy coupled with multivariate data analysis. The FT-IR spectra of 97 samples were obtained for wave numbers ranging from 4000 to 400 cm⁻¹. Then, the fingerprint region of 1800-600 cm⁻¹ of the FT-IR spectrum, rather than the full spectrum, was analyzed. Different pretreatments were applied on the spectra, and a discriminant analysis model based on the Mahalanobis distance was developed to select an optimal pretreatment combination. Two unsupervised pattern recognition procedures (principal component analysis and hierarchical cluster analysis) were applied to enhance the authenticity of discrimination of the specimens. The results showed that excellent classification could be obtained after optimizing spectral pretreatment. The tested samples were successfully discriminated according to their geographical locations. The chemical properties of dried sclerotia of W. extensa were clearly dependent on the mushroom's geographical origins. Furthermore, an interesting finding implied that the elevations of collection areas may have effects on the chemical components of wild W. extensa sclerotia. Overall, this study highlights the feasibility of FT-IR spectroscopy combined with multivariate data analysis in particular for exploring the distinction of different regional W. extensa sclerotia samples. This research could also serve as a basis for the exploitation and utilization of medicinal mushrooms.
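
    A discriminant rule based on the Mahalanobis distance can be sketched for the two-feature case. This is an illustration under stated assumptions only: two features per sample, a separate covariance matrix per class, and assignment to the class with the smallest squared distance. The function names, class labels, and data are hypothetical; the study's actual model and spectral features are not described in enough detail here to reproduce.

```python
def mean_and_cov(samples):
    """Mean vector and sample covariance (sxx, sxy, syy) of 2-D samples."""
    n = len(samples)
    mx = sum(x for x, _ in samples) / n
    my = sum(y for _, y in samples) / n
    sxx = sum((x - mx) ** 2 for x, _ in samples) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in samples) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in samples) / (n - 1)
    return (mx, my), (sxx, sxy, syy)

def mahalanobis_sq(point, mean, cov):
    """Squared Mahalanobis distance, using the closed-form 2x2 inverse.
    Assumes the covariance matrix is non-singular."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy ** 2
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det

def classify(point, classes):
    """Assign the point to the class with the smallest squared distance.
    classes maps a label to the (mean, cov) pair from mean_and_cov."""
    return min(classes, key=lambda name: mahalanobis_sq(point, *classes[name]))

# Hypothetical training data: two features for samples from two regions.
region_a = [(0, 0), (2, 0), (0, 1), (2, 1)]
region_b = [(10, 10), (12, 10), (10, 11), (12, 11)]
classes = {"region_a": mean_and_cov(region_a), "region_b": mean_and_cov(region_b)}
predicted = classify((1, 1), classes)
```

    Real spectra have far more than two variables, so a practical version would work on a reduced set of scores and use a general matrix inverse rather than the 2x2 formula.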

  8. Multivariate Analysis of Remains of Molluscan Foods Consumed by Latest Pleistocene and Holocene Humans in Nerja Cave, Málaga, Spain

    NASA Astrophysics Data System (ADS)

    Serrano, Francisco; Guerra-Merchán, Antonio; Lozano-Francisco, Carmen; Vera-Peláez, José Luis

    1997-09-01

    Nerja Cave is a karstic cavity used by humans from Late Paleolithic to post-Chalcolithic times. Remains of molluscan foods in the uppermost Pleistocene and Holocene sediments were studied with cluster analysis and principal components analysis, in both Q and R modes. The results from cluster analysis distinguished interval groups mainly in accordance with chronology and distinguished assemblages of species mainly according to habitat. Significant changes in the shellfish diet through time were revealed. In the Late Magdalenian, most molluscs consumed consisted of pulmonate gastropods and species from sandy sea bottoms. The Epipaleolithic diet was more varied and included species from rocky shorelines. From the Neolithic onward most molluscs consumed were from rocky shorelines. From the principal components analysis in Q mode, the first factor reflected mainly changes in the predominant capture environment, probably because of major paleogeographic changes. The second factor may reflect selective capture along rocky coastlines during certain times. The third factor correlated well with the sea-surface temperature curve in the western Mediterranean (Alboran Sea) during the late Quaternary.

  9. Automated Classification and Analysis of Non-metallic Inclusion Data Sets

    NASA Astrophysics Data System (ADS)

    Abdulsalam, Mohammad; Zhang, Tongsheng; Tan, Jia; Webler, Bryan A.

    2018-05-01

    The aim of this study is to utilize principal component analysis (PCA), clustering methods, and correlation analysis to condense and examine large, multivariate data sets produced from automated analysis of non-metallic inclusions. Non-metallic inclusions play a major role in defining the properties of steel and their examination has been greatly aided by automated analysis in scanning electron microscopes equipped with energy dispersive X-ray spectroscopy. The methods were applied to analyze inclusions on two sets of samples: two laboratory-scale samples and four industrial samples from near-finished 4140 alloy steel components with varying machinability. The laboratory samples had well-defined inclusion chemistries, composed of MgO-Al2O3-CaO, spinel (MgO-Al2O3), and calcium aluminate inclusions. The industrial samples contained MnS inclusions as well as (Ca,Mn)S + calcium aluminate oxide inclusions. PCA could be used to reduce inclusion chemistry variables to a 2D plot, which revealed inclusion chemistry groupings in the samples. Clustering methods were used to automatically classify inclusion chemistry measurements into groups, i.e., no user-defined rules were required.

  10. Application of Multivariate Statistical Analysis to Biomarkers in Se-Turkey Crude Oils

    NASA Astrophysics Data System (ADS)

    Gürgey, K.; Canbolat, S.

    2017-11-01

    Twenty-four crude oil samples were collected from the 24 oil fields distributed in different districts of SE-Turkey. API and Sulphur content (%), Stable Carbon Isotope, Gas Chromatography (GC), and Gas Chromatography-Mass Spectrometry (GC-MS) data were used to construct a geochemical data matrix. The aim of this study is to examine the genetic grouping or correlations in the crude oil samples, and hence the number of source rocks present in SE-Turkey. To achieve these aims, two multivariate statistical analysis techniques (Principal Component Analysis [PCA] and Cluster Analysis) were applied to a data matrix of 24 samples and 8 source-specific biomarker variables/parameters. The results showed that there are 3 genetically different oil groups (Batman-Nusaybin Oils, Adıyaman-Kozluk Oils and Diyarbakir Oils), in addition to one mixed group. These groupings imply that at least three different source rocks are present in South-Eastern (SE) Turkey. Grouping of the crude oil samples appears to be consistent with the geographic locations of the oil fields, the subsurface stratigraphy, and the geology of the area.

  11. Characterization of monofloral honeys with multivariate analysis of their chemical profile and antioxidant activity.

    PubMed

    Sant'Ana, Luiza D'O; Sousa, Juliana P L M; Salgueiro, Fernanda B; Lorenzon, Maria Cristina Affonso; Castro, Rosane N

    2012-01-01

    Various bioactive chemical constituents were quantified for 21 honey samples obtained at Rio de Janeiro and Minas Gerais, Brazil. To evaluate their antioxidant activity, 3 different methods were used: the ferric reducing antioxidant power, the 1,1-diphenyl-2-picrylhydrazyl (DPPH) radical-scavenging activity, and the 2,2'-azinobis (3-ethylbenzothiazolin)-6-sulfonate (ABTS) assays. Correlations between the parameters were statistically significant (-0.6684 ≤ r ≤ -0.8410, P < 0.05). Principal component analysis showed that honey samples from the same floral origins had more similar profiles, which made it possible to group the eucalyptus, morrão de candeia, and cambara honey samples in 3 distinct areas, while cluster analysis could separate the artificial honey from the floral honeys. This research might aid in the discrimination of honey floral origin, by using simple analytical methods in association with multivariate analysis, which could also show a great difference among floral honeys and artificial honey, indicating a possible way to help with the identification of artificial honeys. © 2011 Institute of Food Technologists®

  12. Selecting climate simulations for impact studies based on multivariate patterns of climate change.

    PubMed

    Mendlik, Thomas; Gobiet, Andreas

    In climate change impact research it is crucial to carefully select the meteorological input for impact models. We present a method for model selection that enables the user to shrink the ensemble to a few representative members, conserving the model spread and accounting for model similarity. This is done in three steps: First, using principal component analysis for a multitude of meteorological parameters, to find common patterns of climate change within the multi-model ensemble. Second, detecting model similarities with regard to these multivariate patterns using cluster analysis. And third, sampling models from each cluster, to generate a subset of representative simulations. We present an application based on the ENSEMBLES regional multi-model ensemble with the aim to provide input for a variety of climate impact studies. We find that the two most dominant patterns of climate change relate to temperature and humidity patterns. The ensemble can be reduced from 25 to 5 simulations while still maintaining its essential characteristics. Having such a representative subset of simulations reduces computational costs for climate impact modeling and enhances the quality of the ensemble at the same time, as it prevents double-counting of dependent simulations that would lead to biased statistics. The online version of this article (doi:10.1007/s10584-015-1582-0) contains supplementary material, which is available to authorized users.
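
    The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the abstract does not name the clustering algorithm, so plain k-means is swapped in as an assumption, "sampling" is simplified to picking the member nearest each cluster centre, and the function names and the synthetic 25 × 6 matrix of change signals are hypothetical.

```python
import numpy as np

def pca_scores(X, n_components):
    """Step 1: project standardized rows of X onto the leading principal components."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)  # assumes no constant column
    # Rows of Vt are the principal directions of the standardized data.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ Vt[:n_components].T

def nearest_centroid(P, centroids):
    """Index of the closest centroid for every row of P."""
    d = np.linalg.norm(P[:, None, :] - centroids[None, :, :], axis=2)
    return d.argmin(axis=1)

def kmeans(P, k, n_iter=100, seed=0):
    """Step 2 (assumed algorithm): plain Lloyd's k-means."""
    rng = np.random.default_rng(seed)
    centroids = P[rng.choice(len(P), size=k, replace=False)].copy()
    for _ in range(n_iter):
        labels = nearest_centroid(P, centroids)
        updated = centroids.copy()
        for j in range(k):
            if np.any(labels == j):  # keep the old centre if a cluster empties
                updated[j] = P[labels == j].mean(axis=0)
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return nearest_centroid(P, centroids), centroids

def select_representatives(X, n_components=2, k=5, seed=0):
    """Step 3: pick, per cluster, the member closest to the cluster centre."""
    P = pca_scores(X, n_components)
    labels, centroids = kmeans(P, k, seed=seed)
    reps = []
    for j in range(k):
        members = np.flatnonzero(labels == j)
        if members.size == 0:
            continue
        dist = np.linalg.norm(P[members] - centroids[j], axis=1)
        reps.append(int(members[dist.argmin()]))
    return sorted(reps), labels

if __name__ == "__main__":
    rng = np.random.default_rng(42)
    X = rng.normal(size=(25, 6))  # synthetic stand-in: 25 simulations, 6 change signals
    reps, labels = select_representatives(X, n_components=2, k=5)
    print("representative simulations:", reps)
```

    Choosing the member nearest each centre is only one possible sampling rule; any rule that draws one simulation per cluster keeps the property that dependent simulations are not counted twice.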

  13. Sexual Partnership Types as Determinant of HIV Risk in South African MSM: An Event-Level Cluster Analysis

    PubMed Central

    Sandfort, Theo; Yi, Huso; Knox, Justin; Reddy, Vasu

    2012-01-01

    While individual determinants of HIV risk among MSM have been widely studied, there is limited understanding of how relational characteristics determine sexual risk. Based on data collected among 300 South African men who have sex with men (MSM) and using cluster analysis, this study developed a typology of four partnership types: the “Race-Economic Similar,” “Age-Race-Economic Discordant,” “Non-regular Neighbourhood,” and “Familiar” partnership types. Support for the meaningfulness of these types was found through associations of these partnership types with participant characteristics and characteristics of the last anal sex event. Furthermore, in a multivariate analysis, only partnership type independently predicted whether the last anal sex event was unprotected. Findings of the study illustrate the importance of taking into account the relational context in understanding unprotected sexual practices and present ways to target intervention efforts as well as identify relationship specific determinants of unprotected sex. PMID:22956229

  14. Cluster analysis of phytoplankton data collected from the National Stream Quality Accounting Network in the Tennessee River basin, 1974-81

    USGS Publications Warehouse

    Stephens, D.W.; Wangsgard, J.B.

    1988-01-01

    A computer program, Numerical Taxonomy System of Multivariate Statistical Programs (NTSYS), was used with interfacing software to perform cluster analyses of phytoplankton data stored in the biological files of the U.S. Geological Survey. The NTSYS software performs various types of statistical analyses and is capable of handling a large matrix of data. Cluster analyses were done on phytoplankton data collected from 1974 to 1981 at four National Stream Quality Accounting Network stations in the Tennessee River basin. Analysis of the changes in clusters of phytoplankton genera indicated possible changes in the water quality of the French Broad River near Knoxville, Tennessee. At this station, the most common diatom groups indicated a shift in dominant forms with some of the less common diatoms being replaced by green and blue-green algae. There was a reduction in genera variability between the 1974-77 and 1979-81 sampling periods. Statistical analysis of chloride and dissolved solids confirmed that concentrations of these substances were smaller in 1974-77 than in 1979-81. At Pickwick Landing Dam, the furthest downstream station used in the study, there was an increase in the number of genera of 'rare' organisms with time. The appearance of two groups of green and blue-green algae indicated that an increase in temperature or nutrient concentrations occurred from 1974 to 1981, but this could not be confirmed using available water quality data. Associations of genera forming the phytoplankton communities at three stations on the Tennessee River were found to be seasonal. Nodal analysis of combined data from all four stations used in the study did not identify any seasonal or temporal patterns during 1974-81. Cluster analysis using the NTSYS programs was effective in reducing the large phytoplankton data set to a manageable size and provided considerable insight into the structure of phytoplankton communities in the Tennessee River basin. Problems encountered using cluster analysis were the subjectivity introduced in the definition of meaningful clusters, and the lack of taxonomic identification to the species level. (Author's abstract)

  15. Visual cues for data mining

    NASA Astrophysics Data System (ADS)

    Rogowitz, Bernice E.; Rabenhorst, David A.; Gerth, John A.; Kalin, Edward B.

    1996-04-01

    This paper describes a set of visual techniques, based on principles of human perception and cognition, which can help users analyze and develop intuitions about tabular data. Collections of tabular data are widely available, including, for example, multivariate time series data, customer satisfaction data, stock market performance data, multivariate profiles of companies and individuals, and scientific measurements. In our approach, we show how visual cues can help users perform a number of data mining tasks, including identifying correlations and interaction effects, finding clusters and understanding the semantics of cluster membership, identifying anomalies and outliers, and discovering multivariate relationships among variables. These cues are derived from psychological studies on perceptual organization, visual search, perceptual scaling, and color perception. These visual techniques are presented as a complement to the statistical and algorithmic methods more commonly associated with these tasks, and provide an interactive interface for the human analyst.

  16. A multivariate assessment of changes in wetland habitat for waterbirds at Moosehorn National Wildlife Refuge, Maine, USA

    USGS Publications Warehouse

    Hierl, L.A.; Loftin, C.S.; Longcore, J.R.; McAuley, D.G.; Urban, D.L.

    2007-01-01

    We assessed changes in vegetative structure of 49 impoundments at Moosehorn National Wildlife Refuge (MNWR), Maine, USA, between 1984-1985 and 2002 with a multivariate, adaptive approach that may be useful in a variety of wetland and other habitat management situations. We used Mahalanobis Distance (MD) analysis to classify the refuge's wetlands as poor or good waterbird habitat based on five variables: percent emergent vegetation, percent shrub, percent open water, relative richness of vegetative types, and an interspersion juxtaposition index that measures adjacency of vegetation patches. Mahalanobis Distance is a multivariate statistic that examines whether a particular data point is an outlier or a member of a data cluster while accounting for correlations among inputs. For each wetland, we used MD analysis to quantify a distance from a reference condition defined a priori by habitat conditions measured in MNWR wetlands used by waterbirds. Twenty-five wetlands declined in quality between the two periods, whereas 23 wetlands improved. We identified specific wetland characteristics that may be modified to improve habitat conditions for waterbirds. The MD analysis seems ideal for instituting an adaptive wetland management approach because metrics can be easily added or removed, ranges of target habitat conditions can be defined by field-collected data, and the analysis can identify priorities for single or multiple management objectives.
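
    For readers unfamiliar with the statistic, the Mahalanobis distance of a point x from a reference set with mean m and covariance S is D(x) = sqrt((x - m)' S^-1 (x - m)). The sketch below computes it with NumPy; the function name and the toy numbers are hypothetical and are not taken from the study, and the pseudo-inverse is an assumed safeguard for near-singular covariance matrices.

```python
import numpy as np

def mahalanobis_to_reference(reference, sites):
    """Mahalanobis distance of each row of `sites` from the reference set.

    reference: (n_ref, n_vars) array describing the reference condition
    sites:     (n_sites, n_vars) array of sites to be scored
    """
    mu = reference.mean(axis=0)
    cov = np.cov(reference, rowvar=False)   # sample covariance of the reference set
    cov_inv = np.linalg.pinv(cov)           # pseudo-inverse tolerates near-singular cases
    diff = sites - mu
    # Quadratic form diff_i' * cov_inv * diff_i for every site i.
    d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    return np.sqrt(np.maximum(d2, 0.0))

if __name__ == "__main__":
    # Toy reference set with two uncorrelated variables (illustrative only).
    ref = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]])
    sites = np.array([[1.0, 1.0], [3.0, 1.0]])
    print(mahalanobis_to_reference(ref, sites))
```

    A site at the reference mean scores zero, and distances grow with departures from the reference condition after scaling by the reference variability and correlations.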

  17. Integrated Multivariate Analysis with Nondetects for the Development of Human Sewage Source-Tracking Tools Using Bacteriophages of Enterococcus faecalis.

    PubMed

    Wangkahad, Bencharong; Mongkolsuk, Skorn; Sirikanchana, Kwanrawee

    2017-02-21

    We developed sewage-specific microbial source tracking (MST) tools using enterococci bacteriophages and evaluated their performance with univariate and multivariate analyses involving data below detection limits. Newly isolated Enterococcus faecalis bacterial strains AIM06 (DSM100702) and SR14 (DSM100701) demonstrated 100% specificity and 90% sensitivity to human sewage without detecting 68 animal manure pooled samples of cats, chickens, cows, dogs, ducks, pigs, and pigeons. AIM06 and SR14 bacteriophages were present in human sewage at 2-4 orders of magnitude. A principal component analysis confirmed the importance of both phages as main water quality parameters. The phages were present only in the polluted water, as classified by a cluster analysis, and at median concentrations of 1.71 × 10^2 and 4.27 × 10^2 PFU/100 mL, respectively, higher than nonhost specific RYC2056 phages and sewage-specific KS148 phages (p < 0.05). Interestingly, AIM06 and SR14 phages exhibited significant correlations with each other and with total coliforms, E. coli, enterococci, and biochemical oxygen demand (Kendall's tau = 0.348 to 0.605, p < 0.05), a result supporting their roles as water quality indicators. This research demonstrates the multiregional applicability of enterococci hosts in MST applications and highlights the significance of multivariate analysis with nondetects in evaluating the performance of new MST host strains.

  18. Phylogenetic Factor Analysis.

    PubMed

    Tolkoff, Max R; Alfaro, Michael E; Baele, Guy; Lemey, Philippe; Suchard, Marc A

    2018-05-01

    Phylogenetic comparative methods explore the relationships between quantitative traits adjusting for shared evolutionary history. This adjustment often occurs through a Brownian diffusion process along the branches of the phylogeny that generates model residuals or the traits themselves. For high-dimensional traits, inferring all pair-wise correlations within the multivariate diffusion is limiting. To circumvent this problem, we propose phylogenetic factor analysis (PFA) that assumes a small unknown number of independent evolutionary factors arise along the phylogeny and these factors generate clusters of dependent traits. Set in a Bayesian framework, PFA provides measures of uncertainty on the factor number and groupings, combines both continuous and discrete traits, integrates over missing measurements and incorporates phylogenetic uncertainty with the help of molecular sequences. We develop Gibbs samplers based on dynamic programming to estimate the PFA posterior distribution, over 3-fold faster than for multivariate diffusion and a further order-of-magnitude more efficiently in the presence of latent traits. We further propose a novel marginal likelihood estimator for previously impractical models with discrete data and find that PFA also provides a better fit than multivariate diffusion in evolutionary questions in columbine flower development, placental reproduction transitions and triggerfish fin morphometry.

  19. PCA based clustering for brain tumor segmentation of T1w MRI images.

    PubMed

    Kaya, Irem Ersöz; Pehlivanlı, Ayça Çakmak; Sekizkardeş, Emine Gezmez; Ibrikci, Turgay

    2017-03-01

    Medical images are huge collections of information that are difficult to store and that consume extensive computing time to process. Therefore, reduction techniques are commonly used as a data pre-processing step to make the image data less complex so that high-dimensional data can be identified by an appropriate low-dimensional representation. PCA is one of the most popular multivariate methods for data reduction. This paper is focused on T1-weighted MRI image clustering for brain tumor segmentation with dimension reduction by different common Principal Component Analysis (PCA) algorithms. Our primary aim is to present a comparison between different variations of PCA algorithms on MRIs for two cluster methods. The five most common PCA algorithms, namely the conventional PCA, Probabilistic Principal Component Analysis (PPCA), Expectation Maximization Based Principal Component Analysis (EM-PCA), Generalized Hebbian Algorithm (GHA), and Adaptive Principal Component Extraction (APEX), were applied to reduce dimensionality in advance of two clustering algorithms, K-Means and Fuzzy C-Means. In the study, the T1-weighted MRI images of the human brain with brain tumor were used for clustering. In addition to the original size of 512 lines and 512 pixels per line, three more different sizes, 256 × 256, 128 × 128 and 64 × 64, were included in the study to examine their effect on the methods. The obtained results were compared in terms of both the reconstruction errors and the Euclidean distance errors among the clustered images containing the same number of principal components. According to the findings, the PPCA obtained the best results among all others. Furthermore, the EM-PCA and the PPCA helped the K-Means algorithm achieve the best clustering performance in most cases and achieved significant results with both clustering algorithms for all sizes of T1w MRI images. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
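
    The reconstruction error used as a comparison criterion can be illustrated for conventional PCA only. The following is a rough sketch under stated assumptions: rows of the input matrix are observations (how the paper arranges MRI data into a matrix is not given in the abstract), the error is taken as the mean squared difference, and the function name and random test matrix are hypothetical. The other four PCA variants named above are not reproduced here.

```python
import numpy as np

def pca_reconstruct(X, n_components):
    """Reduce X to `n_components` principal components and map it back.

    Returns the reconstruction and its mean squared error.
    """
    mean = X.mean(axis=0)
    Xc = X - mean
    # Conventional PCA via SVD of the centred data; rows of Vt are components.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:n_components].T
    X_hat = (Xc @ V) @ V.T + mean           # project, then back-project
    mse = float(np.mean((X - X_hat) ** 2))
    return X_hat, mse

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(20, 8))            # synthetic stand-in for image data
    for k in (2, 4, 8):
        print(k, "components -> MSE", pca_reconstruct(X, k)[1])
```

    The error shrinks as components are added and vanishes (up to rounding) once all components are kept, which is why methods are compared at the same number of components.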

  20. Identification and Characterization of Unique Subgroups of Chronic Pain Individuals with Dispositional Personality Traits.

    PubMed

    Mehta, S; Rice, D; McIntyre, A; Getty, H; Speechley, M; Sequeira, K; Shapiro, A P; Morley-Forster, P; Teasell, R W

    2016-01-01

    Objective. The current study attempted to identify and characterize distinct CP subgroups based on their level of dispositional personality traits. The secondary objective was to compare the difference among the subgroups in mood, coping, and disability. Methods. Individuals with chronic pain were assessed for demographic, psychosocial, and personality measures. A two-step cluster analysis was conducted in order to identify distinct subgroups of patients based on their level of personality traits. Differences in clinical outcomes were compared using the multivariate analysis of variance based on cluster membership. Results. In 229 participants, three clusters were formed. No significant difference was seen among the clusters on patient demographic factors including age, sex, relationship status, duration of pain, and pain intensity. Those with high levels of dispositional personality traits had greater levels of mood impairment compared to the other two groups (p < 0.05). Significant difference in disability was seen between the subgroups. Conclusions. The study identified a high risk group of CP individuals whose level of personality traits significantly correlated with impaired mood and coping. Use of pharmacological treatment alone may not be successful in improving clinical outcomes among these individuals. Instead, a more comprehensive treatment involving psychological treatments may be important in managing the personality traits that interfere with recovery.

  1. The Wilcoxon signed rank test for paired comparisons of clustered data.

    PubMed

    Rosner, Bernard; Glynn, Robert J; Lee, Mei-Ling T

    2006-03-01

    The Wilcoxon signed rank test is a frequently used nonparametric test for paired data (e.g., consisting of pre- and posttreatment measurements) based on independent units of analysis. This test cannot be used for paired comparisons arising from clustered data (e.g., if paired comparisons are available for each of two eyes of an individual). To incorporate clustering, a generalization of the randomization test formulation for the signed rank test is proposed, where the unit of randomization is at the cluster level (e.g., person), while the individual paired units of analysis are at the subunit within cluster level (e.g., eye within person). An adjusted variance estimate of the signed rank test statistic is then derived, which can be used for either balanced (same number of subunits per cluster) or unbalanced (different number of subunits per cluster) data, with an exchangeable correlation structure, with or without tied values. The resulting test statistic is shown to be asymptotically normal as the number of clusters becomes large, if the cluster size is bounded. Simulation studies are performed based on simulating correlated ranked data from a signed log-normal distribution. These studies indicate appropriate type I error for data sets with ≥ 20 clusters and a superior power profile compared with either the ordinary signed rank test based on the average cluster difference score or the multivariate signed rank test of Puri and Sen. Finally, the methods are illustrated with two data sets, (i) an ophthalmologic data set involving a comparison of electroretinogram (ERG) data in retinitis pigmentosa (RP) patients before and after undergoing an experimental surgical procedure, and (ii) a nutritional data set based on a randomized prospective study of nutritional supplements in RP patients where vitamin E intake outside of study capsules is compared before and after randomization to monitor compliance with nutritional protocols.
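
    The idea of randomizing at the cluster level can be illustrated with a Monte Carlo sketch. This is not the adjusted-variance, asymptotically normal test derived in the paper: it simply ranks all absolute paired differences together, sums the signed ranks within each cluster, and flips the sign of whole clusters at random to build a null distribution. Dropping zero differences and all names below are assumptions made for illustration.

```python
import numpy as np

def midranks(values):
    """Ranks 1..n with ties replaced by their average rank."""
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(1, len(values) + 1)
    _, inv = np.unique(values, return_inverse=True)
    return (np.bincount(inv, weights=ranks) / np.bincount(inv))[inv]

def clustered_signed_rank_test(diffs, cluster_ids, n_perm=10000, seed=0):
    """Two-sided Monte Carlo signed rank test with cluster-level sign flips."""
    diffs = np.asarray(diffs, dtype=float)
    cluster_ids = np.asarray(cluster_ids)
    keep = diffs != 0                        # assumption: zero differences are dropped
    diffs, cluster_ids = diffs[keep], cluster_ids[keep]
    signed = np.sign(diffs) * midranks(np.abs(diffs))
    # Sum signed ranks within each cluster (e.g. both eyes of one person).
    _, inv = np.unique(cluster_ids, return_inverse=True)
    cluster_sums = np.bincount(inv, weights=signed)
    observed = float(cluster_sums.sum())
    # Under the null, each whole cluster is equally likely to have its signs reversed.
    rng = np.random.default_rng(seed)
    flips = rng.choice([-1.0, 1.0], size=(n_perm, cluster_sums.size))
    null = flips @ cluster_sums
    p_value = (1 + np.sum(np.abs(null) >= abs(observed))) / (n_perm + 1)
    return observed, float(p_value)

if __name__ == "__main__":
    # Hypothetical data: 6 persons, 2 eyes each, post minus pre differences.
    d = [0.4, 0.6, -0.1, 0.3, 0.5, 0.2, -0.2, 0.1, 0.7, 0.9, 0.3, 0.4]
    person = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6]
    print(clustered_signed_rank_test(d, person))
```

    Because signs are flipped per cluster rather than per subunit, the within-cluster correlation is preserved in the null distribution, which is the point of randomizing at the cluster level.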

  2. Multivariate analysis of water quality and environmental variables in the Great Barrier Reef catchments

    NASA Astrophysics Data System (ADS)

    Ryu, D.; Liu, S.; Western, A. W.; Webb, J. A.; Lintern, A.; Leahy, P.; Wilson, P.; Watson, M.; Waters, D.; Bende-Michl, U.

    2016-12-01

    The Great Barrier Reef (GBR) lagoon has been experiencing significant water quality deterioration due in part to agricultural intensification and urban settlement in adjacent catchments. The degradation of water quality in rivers is caused by land-derived pollutants (i.e. sediments, nutrients and pesticides). A better understanding of the dynamics of water quality is essential for land management to improve the GBR ecosystem. However, water quality is also greatly influenced by natural hydrological processes. To assess influencing factors and predict the water quality accurately, selection of the most important predictors of water quality is necessary. In this work, multivariate statistical techniques - cluster analysis (CA), principal component analysis (PCA) and factor analysis (FA) - are used to reduce the complexity derived from the multidimensional water quality monitoring data. Seventeen stations are selected across the GBR catchments, and the event-based measurements of 12 variables monitored during 9 years (2006 - 2014) were analysed by means of CA and PCA/FA. The key findings are: (1) 17 stations can be grouped into two clusters according to the hierarchical CA, and the spatial dissimilarity between these sites is characterised by the different climate and land use in the GBR catchments. (2) PCA results indicate that the first 3 PCs explain 85% of the total variance, and FA on the entire data set shows that the varifactor (VF) loadings can be used to interpret the sources of spatial variation in water quality at the GBR catchments level. Soil erosion and non-point-source pollutants from agriculture contribute to VF1, while the variability in hydrological conditions and biogeochemical processes can explain the loadings in VF2. (3) FA is also performed individually on the two groups of sites identified in CA, to evaluate the underlying sources that are responsible for spatial variability in water quality in the two groups. For the Cluster 1 sites, spatial variations in water quality are likely from the agricultural inputs (fertilisers), and for the Cluster 2 sites, the differences in hydrological transport are responsible for large spatial variations in water quality. These findings can be applied to water quality assessment and to establishing effective water and land management in the future.

  3. Toxic effects of two brominated flame retardants BDE-47 and BDE-183 on the survival and protein expression of the tubificid Monopylephorus limosus.

    PubMed

    Chiu, K H; Lin, C-R; Huang, H-W; Shiea, J; Liu, L L

    2012-10-01

    The toxic effects of two brominated diphenyl ethers (BDE), BDE-47 and BDE-183, on a benthic oligochaete tubificid, Monopylephorus limosus, were studied under laboratory conditions. Investigated responses included survival, growth, and protein expression profiles, at BDE concentrations of 1, 10, 100, and 700 ng/g on a dry soil weight basis, with isooctane as the carrier solvent. Body weight losses among treatments were insignificant after 8 weeks of exposure. The 8-wk LC50 values of BDE-47 and -183 were 2311 and 169 ng/g, respectively. By applying multivariate analysis techniques, protein expression patterns were compared and correlated with the stress sources of long-term culture, carrier solvent, BDE-47 and -183. The treatment of 8-wk 100 ng/g BDE-47 was most closely clustered to the 10 ng/g BDE-183 treatment, based on the 40 examined protein spots. This indicated that BDE-183 was more potent to M. limosus than was BDE-47. The 2-wk and 8-wk controls clustered into different groups, indicating the occurrence of physiological changes due to long-term laboratory culture. Additionally, a solvent effect was shown by the grouping of the isooctane carrier into different clusters. With further characterization by principal component analysis, it was found that the separation was mainly driven by the second principal component. The primary inhibitory variation was at spots 2 (UMP-CMP kinase) and 40 (plasma retinol-binding protein precursor) in the 8-wk groups. In contrast, protein spots 16 (cell division control protein 2 homolog) and 24 (mitochondrial DNA mismatch repair protein) showed stimulatory variation. In all, the observed proteomic responses suggest that BDEs disrupted metabolic function in M. limosus and that multivariate analysis tools offer significant potential for the assessment of various stress sources at the biochemical level. Copyright © 2012 Elsevier Inc. All rights reserved.

  4. Selection Indices and Multivariate Analysis Show Similar Results in the Evaluation of Growth and Carcass Traits in Beef Cattle

    PubMed Central

    Brito Lopes, Fernando; da Silva, Marcelo Corrêa; Magnabosco, Cláudio Ulhôa; Goncalves Narciso, Marcelo; Sainz, Roberto Daniel

    2016-01-01

    This research evaluated a multivariate approach as an alternative tool for the purpose of selection regarding expected progeny differences (EPDs). Data were fitted using a multi-trait model and consisted of growth traits (birth weight and weights at 120, 210, 365 and 450 days of age) and carcass traits (longissimus muscle area (LMA), back-fat thickness (BF), and rump fat thickness (RF)), registered over 21 years in extensive breeding systems of Polled Nellore cattle in Brazil. Multivariate analyses were performed using standardized (zero mean and unit variance) EPDs. The k mean method revealed that the best fit of data occurred using three clusters (k = 3) (P < 0.001). Estimates of genetic correlation among growth and carcass traits and the estimates of heritability were moderate to high, suggesting that a correlated response approach is suitable for practical decision making. Estimates of correlation between selection indices and the multivariate index (LD1) were moderate to high, ranging from 0.48 to 0.97. This reveals that both types of indices give similar results and that the multivariate approach is reliable for the purpose of selection. The alternative tool seems very handy when economic weights are not available or in cases where more rapid identification of the best animals is desired. Interestingly, multivariate analysis allowed forecasting information based on the relationships among breeding values (EPDs). Also, it enabled fine discrimination, rapid data summarization after genetic evaluation, and permitted accounting for maternal ability and the genetic direct potential of the animals. In addition, we recommend the use of longissimus muscle area and subcutaneous fat thickness as selection criteria, to allow estimation of breeding values before the first mating season in order to accelerate the response to individual selection. PMID:26789008

  5. Selection Indices and Multivariate Analysis Show Similar Results in the Evaluation of Growth and Carcass Traits in Beef Cattle.

    PubMed

    Brito Lopes, Fernando; da Silva, Marcelo Corrêa; Magnabosco, Cláudio Ulhôa; Goncalves Narciso, Marcelo; Sainz, Roberto Daniel

    2016-01-01

    This research evaluated a multivariate approach as an alternative tool for the purpose of selection regarding expected progeny differences (EPDs). Data were fitted using a multi-trait model and consisted of growth traits (birth weight and weights at 120, 210, 365 and 450 days of age) and carcass traits (longissimus muscle area (LMA), back-fat thickness (BF), and rump fat thickness (RF)), registered over 21 years in extensive breeding systems of Polled Nellore cattle in Brazil. Multivariate analyses were performed using standardized (zero mean and unit variance) EPDs. The k mean method revealed that the best fit of data occurred using three clusters (k = 3) (P < 0.001). Estimates of genetic correlation among growth and carcass traits and the estimates of heritability were moderate to high, suggesting that a correlated response approach is suitable for practical decision making. Estimates of correlation between selection indices and the multivariate index (LD1) were moderate to high, ranging from 0.48 to 0.97. This reveals that both types of indices give similar results and that the multivariate approach is reliable for the purpose of selection. The alternative tool seems very handy when economic weights are not available or in cases where more rapid identification of the best animals is desired. Interestingly, multivariate analysis allowed forecasting information based on the relationships among breeding values (EPDs). Also, it enabled fine discrimination, rapid data summarization after genetic evaluation, and permitted accounting for maternal ability and the genetic direct potential of the animals. In addition, we recommend the use of longissimus muscle area and subcutaneous fat thickness as selection criteria, to allow estimation of breeding values before the first mating season in order to accelerate the response to individual selection.

  6. Detecting hybridization between Iranian wild wolf (Canis lupus pallipes) and free-ranging domestic dog (Canis familiaris) by analysis of microsatellite markers.

    PubMed

    Khosravi, Rasoul; Rezaei, Hamid Reza; Kaboli, Mohammad

    2013-01-01

    The genetic threat due to hybridization with free-ranging dogs is one major concern in wolf conservation. The identification of hybrids and extent of hybridization is important in the conservation and management of wolf populations. Genetic variation was analyzed at 15 unlinked loci in 28 dogs, 28 wolves, four known hybrids, two black wolves, and one dog with abnormal traits in Iran. Pritchard's model, multivariate ordination by principal component analysis and neighbor joining clustering were used for population clustering and individual assignment. Analysis of genetic variation showed that genetic variability is high in both wolf and dog populations in Iran. Values of H(E) in dog and wolf samples ranged from 0.75-0.92 and 0.77-0.92, respectively. The results of AMOVA showed that the two groups of dog and wolf were significantly different (F(ST) = 0.05 and R(ST) = 0.36; P < 0.001). In each of the three methods, wolf and dog samples were separated into two distinct clusters. Two dark wolves were assigned to the wolf cluster. Also these models detected D32 (dog with abnormal traits) and some other samples, which were assigned to more than one cluster and could be a hybrid. This study is the beginning of a genetic study in wolf populations in Iran, and our results reveal that as in other countries, hybridization between wolves and dogs is sporadic in Iran and can be a threat to wolf populations if human perturbations increase.

  7. The evolution of cerebrotypes in birds.

    PubMed

    Iwaniuk, Andrew N; Hurd, Peter L

    2005-01-01

    Multivariate analyses of brain composition in mammals, amphibians and fish have revealed the evolution of 'cerebrotypes' that reflect specific niches and/or clades. Here, we present the first demonstration of similar cerebrotypes in birds. Using principal component analysis and hierarchical clustering methods to analyze a data set of 67 species, we demonstrate that five main cerebrotypes can be recognized. One type is dominated by galliforms and pigeons, among other species, that all share relatively large brainstems, but can be further differentiated by the proportional size of the cerebellum and telencephalic regions. The second cerebrotype contains a range of species that all share relatively large cerebellar and small nidopallial volumes. A third type is composed of two species, the tawny frogmouth (Podargus strigoides) and an owl, both of which share extremely large Wulst volumes. Parrots and passerines, the principal members of the fourth group, possess much larger nidopallial, mesopallial and striatopallidal proportions than the other groups. The fifth cerebrotype contains species such as raptors and waterfowl that are not found at the extremes for any of the brain regions and could therefore be classified as 'generalist' brains. Overall, the clustering of species does not directly reflect the phylogenetic relationships among species, but there is a tendency for species within an order to clump together. There may also be a weak relationship between cerebrotype and developmental differences, but two of the main clusters contained species with both altricial and precocial developmental patterns. As a whole, the groupings do agree with behavioral and ecological similarities among species. Most notably, species that share similarities in locomotor behavior, mode of prey capture or cognitive ability are clustered together. 
The relationship between cerebrotype and behavior/ecology in birds suggests that future comparative studies of brain-behavior relationships will benefit from adopting a multivariate approach. Copyright 2005 S. Karger AG, Basel.

  8. Traveling around Cape Horn: Otolith chemistry reveals a mixed stock of Patagonian hoki with separate Atlantic and Pacific spawning grounds

    USGS Publications Warehouse

    Schuchert, P.C.; Arkhipkin, A.I.; Koenig, A.E.

    2010-01-01

    Trace element fingerprints of edge and core regions in otoliths from 260 specimens of Patagonian hoki, Macruronus magellanicus Lönnberg, 1907, were analyzed by LA-ICPMS to reveal whether this species forms one or more population units (stocks) in the Southern Oceans. Fish were caught on their spawning grounds in Chile and feeding grounds in Chile and the Falkland Islands. Univariate and multivariate analyses of trace element concentrations in the otolith edges, which relate to the adult life of fish, could not distinguish between Atlantic (Falkland) and Pacific (Chile) hoki. Cluster analyses of element concentrations in the otolith edges produced three different clusters in all sample areas indicating high mixture of the stocks. Cluster analysis of trace element concentrations in the otolith cores, relating to juvenile and larval life stages, produced two separate clusters mainly distinguished by 137Ba concentrations. The results suggest that Patagonian hoki is a highly mixed fish stock with at least two spawning grounds around South America. © 2009 Elsevier B.V.

  9. A novel combined approach of diffuse reflectance UV-Vis-NIR spectroscopy and multivariate analysis for non-destructive examination of blue ballpoint pen inks in forensic application.

    PubMed

    Kumar, Raj; Sharma, Vishal

    2017-03-15

    The present research is focused on the analysis of writing inks using destructive UV-Vis spectroscopy (dissolution of ink by the solvent) and non-destructive diffuse reflectance UV-Vis-NIR spectroscopy along with chemometrics. Fifty-seven samples of blue ballpoint pen inks were analyzed under optimum conditions to determine the differences in spectral features of inks among same and different manufacturers. Normalization was performed on the spectroscopic data before chemometric analysis. Principal Component Analysis (PCA) and K-means cluster analysis were used on the data to ascertain whether the blue ballpoint pen inks could be differentiated by their UV-Vis/UV-Vis-NIR spectra. The discriminating power is calculated by qualitative analysis by the visual comparison of the spectra (absorbance peaks), produced by the destructive and non-destructive methods. In the latter two methods, the pairwise comparison is made by incorporating the clustering method. It is found that the chemometric method provides better discriminating power (98.72% and 99.46% for the destructive and non-destructive methods, respectively) than the qualitative analysis (69.67%). Copyright © 2016 Elsevier B.V. All rights reserved.
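
    The abstract reports discriminating power but does not give its formula. A minimal sketch, assuming the usual pairwise definition (the fraction of all sample pairs that can be told apart) and assuming that a pair counts as discriminated when its two samples fall in different clusters; the function name and the toy labels are hypothetical, not taken from the study.

```python
from itertools import combinations


def discriminating_power(labels):
    """Fraction of sample pairs whose members carry different cluster labels.

    Assumption: a pair is 'discriminated' when the clustering step puts
    its two samples in different clusters.
    """
    pairs = list(combinations(range(len(labels)), 2))
    separated = sum(1 for i, j in pairs if labels[i] != labels[j])
    return separated / len(pairs)


# Toy example: four inks, two clusters of two inks each.
toy_labels = [0, 0, 1, 1]
dp = discriminating_power(toy_labels)  # 4 of the 6 pairs are separated
```

    Under this definition, two inks that share a cluster are the only pairs that lower the score, so a finer clustering can only raise it.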

  10. Source Apportionment and Risk Assessment of Emerging Contaminants: An Approach of Pharmaco-Signature in Water Systems

    PubMed Central

    Jiang, Jheng Jie; Lee, Chon Lin; Fang, Meng Der; Boyd, Kenneth G.; Gibb, Stuart W.

    2015-01-01

    This paper presents a methodology based on multivariate data analysis for characterizing potential source contributions of emerging contaminants (ECs) detected in 26 river water samples across multi-scape regions during dry and wet seasons. Based on this methodology, we unveil an approach toward potential source contributions of ECs, a concept we refer to as the “Pharmaco-signature.” Exploratory analysis of data points has been carried out by unsupervised pattern recognition (hierarchical cluster analysis, HCA) and a receptor model (principal component analysis-multiple linear regression, PCA-MLR) in an attempt to demonstrate significant source contributions of ECs in different land-use zones. Robust cluster solutions grouped the database according to different EC profiles. PCA-MLR identified that 58.9% of the mean summed ECs were contributed by domestic impact, 9.7% by antibiotics application, and 31.4% by drug abuse. Diclofenac, ibuprofen, codeine, ampicillin, tetracycline, and erythromycin-H2O have significant pollution risk quotients (RQ>1), indicating potentially high risk to aquatic organisms in Taiwan. PMID:25874375

  11. Origin Discrimination of Osmanthus fragrans var. thunbergii Flowers using GC-MS and UPLC-PDA Combined with Multivariable Analysis Methods.

    PubMed

    Zhou, Fei; Zhao, Yajing; Peng, Jiyu; Jiang, Yirong; Li, Maiquan; Jiang, Yuan; Lu, Baiyi

    2017-07-01

    Osmanthus fragrans flowers are used as folk medicine and additives for teas, beverages and foods. The metabolites of O. fragrans flowers from different geographical origins were inconsistent to some extent. Chromatography and mass spectrometry combined with multivariable analysis methods provide an approach for discriminating the origin of O. fragrans flowers. The objective was to discriminate Osmanthus fragrans var. thunbergii flowers from different origins using the identified metabolites. GC-MS and UPLC-PDA were conducted to analyse the metabolites in O. fragrans var. thunbergii flowers (in total 150 samples). Principal component analysis (PCA), soft independent modelling of class analogy analysis (SIMCA) and random forest (RF) analysis were applied to group the GC-MS and UPLC-PDA data. GC-MS identified 32 compounds common to all samples while UPLC-PDA/QTOF-MS identified 16 common compounds. PCA of the UPLC-PDA data generated a better clustering than PCA of the GC-MS data. Ten metabolites (six from GC-MS and four from UPLC-PDA) were selected as effective compounds for discrimination by PCA loadings. SIMCA and RF analysis were used to build classification models, and the RF model, based on the four effective compounds (caffeic acid derivative, acteoside, ligustroside and compound 15), yielded better results with a classification rate of 100% in the calibration set and 97.8% in the prediction set. GC-MS and UPLC-PDA combined with multivariable analysis methods can discriminate the origin of Osmanthus fragrans var. thunbergii flowers. Copyright © 2017 John Wiley & Sons, Ltd.

  12. Sequential Linker Installation: Precise Placement of Functional Groups in Multivariate Metal-Organic Frameworks

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yuan, S; Lu, WG; Chen, YP

    2015-03-11

    A unique strategy, sequential linker installation (SLI), has been developed to construct multivariate MOFs with functional groups precisely positioned. PCN-700, a Zr-MOF with eight-connected Zr6O4(OH)8(H2O)4 clusters, has been judiciously designed; the Zr6 clusters in this MOF are arranged in such a fashion that, by replacement of terminal OH-/H2O ligands, subsequent insertion of linear dicarboxylate linkers is achieved. We demonstrate that linkers with distinct lengths and functionalities can be sequentially installed into PCN-700. Single-crystal to single-crystal transformation is realized so that the positions of the subsequently installed linkers are pinpointed via single-crystal X-ray diffraction analyses. This methodology provides a powerful tool to construct multivariate MOFs with precisely positioned functionalities in the desired proximity, which would otherwise be difficult to achieve.

  13. Cross multivariate correlation coefficients as screening tool for analysis of concurrent EEG-fMRI recordings.

    PubMed

    Ji, Hong; Petro, Nathan M; Chen, Badong; Yuan, Zejian; Wang, Jianji; Zheng, Nanning; Keil, Andreas

    2018-02-06

    Over the past decade, the simultaneous recording of electroencephalogram (EEG) and functional magnetic resonance imaging (fMRI) data has garnered growing interest because it may provide an avenue towards combining the strengths of both imaging modalities. Given their pronounced differences in temporal and spatial statistics, the combination of EEG and fMRI data is however methodologically challenging. Here, we propose a novel screening approach that relies on a Cross Multivariate Correlation Coefficient (xMCC) framework. This approach accomplishes three tasks: (1) it provides a measure for testing multivariate correlation and multivariate uncorrelation of the two modalities; (2) it provides a criterion for the selection of EEG features; (3) it performs a screening of relevant EEG information by grouping the EEG channels into clusters to improve efficiency and to reduce computational load when searching for the best predictors of the BOLD signal. The present report applies this approach to a data set with concurrent recordings of steady-state visual evoked potentials (ssVEPs) and fMRI, recorded while observers viewed phase-reversing Gabor patches. We test the hypothesis that fluctuations in visuo-cortical mass potentials systematically covary with BOLD fluctuations not only in visual cortical, but also in anterior temporal and prefrontal areas. Results supported the hypothesis and showed that the xMCC-based analysis provides straightforward identification of neurophysiologically plausible brain regions with EEG-fMRI covariance. Furthermore, xMCC converged with other extant methods for EEG-fMRI analysis. © 2018 The Authors Journal of Neuroscience Research Published by Wiley Periodicals, Inc.

  14. Characteristics of HIV-infected U.S. Army soldiers linked in molecular transmission clusters, 2001-2012

    PubMed Central

    Jagodzinski, Linda L.; Liu, Ying; Pham, Peter T.; Kijak, Gustavo H.; Tovanabutra, Sodsai; McCutchan, Francine E.; Scoville, Stephanie L.; Cersovsky, Steven B.; Michael, Nelson L.; Scott, Paul T.; Peel, Sheila A.

    2017-01-01

    Objective: Recent surveillance data suggest the United States (U.S.) Army HIV epidemic is concentrated among men who have sex with men. To identify potential targets for HIV prevention strategies, the relationship between demographic and clinical factors and membership within transmission clusters based on baseline pol sequences of HIV-infected Soldiers from 2001 through 2012 was analyzed. Methods: We conducted a retrospective analysis of baseline partial pol sequences, demographic and clinical characteristics available for all Soldiers in active service and newly diagnosed with HIV-1 infection from January 1, 2001 through December 31, 2012. HIV-1 subtype designations and transmission clusters were identified from phylogenetic analysis of sequences. Univariate and multivariate logistic regression models were used to evaluate and adjust for the association between characteristics and cluster membership. Results: Among 518 of 995 HIV-infected Soldiers with available partial pol sequences, 29% were members of a transmission cluster. Assignment to a southern U.S. region at diagnosis and year of diagnosis were independently associated with cluster membership after adjustment for other significant characteristics (p<0.10) of age, race, year of diagnosis, region of duty assignment, sexually transmitted infections, last negative HIV test, antiretroviral therapy, and transmitted drug resistance. Subtyping of the pol fragment indicated HIV-1 subtype B infection predominated (94%) among HIV-infected Soldiers. Conclusion: These findings identify areas to explore as HIV prevention targets in the U.S. Army. An increased frequency of current force testing may be justified, especially among Soldiers assigned to duty in installations with high local HIV prevalence such as southern U.S. states. PMID:28759645

  15. Clusters of Healthy and Unhealthy Eating Behaviors are Associated with Body Mass Index Among Adults

    PubMed Central

    Heerman, William J.; Jackson, Natalie; Hargreaves, Margaret; Mulvaney, Shelagh A.; Schlundt, David; Wallston, Kenneth A.; Rothman, Russell L.

    2017-01-01

    Objective: To identify eating styles from 6 eating behaviors and test their association with Body Mass Index (BMI) among adults. Design: Cross-sectional analysis of self-report survey data. Setting: 12 primary care and specialty clinics in 5 states. Participants: 11,776 adult patients consented to participate; 9,977 completed survey questions. Variables measured: Frequency of eating healthy food; frequency of eating unhealthy food; breakfast frequency; frequency of snacking; overall diet quality; and problem eating behaviors. The primary dependent variable was BMI, calculated from self-reported height and weight data. Analysis: K-means cluster analysis of eating behaviors was used to determine eating styles. A categorical variable representing each eating style cluster was entered in a multivariate linear regression predicting BMI, controlling for covariates. Results: Four eating styles were identified and defined by healthy vs. unhealthy diet patterns and engagement in problem eating behaviors. Each group had significantly higher average BMI than the healthy eating style: healthy with problem eating behaviors (β=1.9, p<0.001); unhealthy (β=2.5, p<0.001), and unhealthy with problem eating behaviors (β=5.1, p<0.001). Conclusions: Future attempts to improve eating styles should address not only the consumption of healthy foods, but also snacking behaviors and the emotional component of eating. PMID:28363804
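
    The analysis step above enters cluster membership as a categorical predictor of BMI. A minimal sketch of that idea, assuming dummy coding against a reference cluster and ordinary least squares with no covariates (the study controlled for covariates, which are omitted here); the function name and the toy numbers are hypothetical.

```python
import numpy as np


def cluster_betas(labels, y, reference=0):
    """Regress y on dummy-coded cluster labels; return named coefficients.

    Each beta is the difference in mean y between a cluster and the
    reference cluster (true only because no covariates are included).
    """
    labels = np.asarray(labels)
    y = np.asarray(y, dtype=float)
    others = [k for k in np.unique(labels) if k != reference]
    # Design matrix: intercept column plus one indicator per non-reference cluster.
    columns = [np.ones(len(y))] + [(labels == k).astype(float) for k in others]
    X = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    names = ["intercept"] + [f"cluster_{k}" for k in others]
    return dict(zip(names, coef))


# Toy data: three clusters with mean BMI 21, 24 and 27.
betas = cluster_betas([0, 0, 1, 1, 2, 2], [20, 22, 23, 25, 26, 28])
```

    In this coding the intercept is the mean of the reference cluster, which matches how the abstract reports each β relative to the healthy eating style.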

  16. Detecting subtle hydrochemical anomalies with multivariate statistics: an example from homogeneous groundwaters in the Great Artesian Basin, Australia

    NASA Astrophysics Data System (ADS)

    O'Shea, Bethany; Jankowski, Jerzy

    2006-12-01

    The major ion composition of Great Artesian Basin groundwater in the lower Namoi River valley is relatively homogeneous in chemical composition. Traditional graphical techniques have been combined with multivariate statistical methods to determine whether subtle differences in the chemical composition of these waters can be delineated. Hierarchical cluster analysis and principal components analysis were successful in delineating minor variations within the groundwaters of the study area that were not visually identified in the graphical techniques applied. Hydrochemical interpretation allowed geochemical processes to be identified in each statistically defined water type and illustrated how these groundwaters differ from one another. Three main geochemical processes were identified in the groundwaters: ion exchange, precipitation, and mixing between waters from different sources. Both statistical methods delineated an anomalous sample suspected of being influenced by magmatic CO2 input. The use of statistical methods to complement traditional graphical techniques for waters appearing homogeneous is emphasized for all investigations of this type.
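
    Principal components analysis of the kind mentioned above is commonly run on standardized concentrations so that no single ion dominates. A minimal sketch, assuming z-score scaling and an SVD-based PCA; the abstract does not state the preprocessing actually used, and the function name and toy matrix are hypothetical.

```python
import numpy as np


def pca_scores(X, n_components=2):
    """Standardize columns, then return PCA scores and explained-variance ratios."""
    X = np.asarray(X, dtype=float)
    # z-score each variable (column); assumes no column is constant.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # SVD of the standardized data: rows of Vt are the principal axes.
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    scores = U[:, :n_components] * s[:n_components]
    return scores, explained[:n_components]


# Toy matrix: four samples, two perfectly correlated variables.
scores, ratio = pca_scores([[1, 2], [2, 4], [3, 6], [4, 8]], n_components=1)
```

    Samples that sit far from the rest along the leading components are candidates for the kind of anomalous sample the abstract describes.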

  17. Towards the authentication of European sea bass origin through a combination of biometric measurements and multiple analytical techniques.

    PubMed

    Farabegoli, Federica; Pirini, Maurizio; Rotolo, Magda; Silvi, Marina; Testi, Silvia; Ghidini, Sergio; Zanardi, Emanuela; Remondini, Daniel; Bonaldo, Alessio; Parma, Luca; Badiani, Anna

    2018-06-08

    The authenticity of fish products has become an imperative issue for authorities involved in the protection of consumers against fraudulent practices and in the market stabilization. The present study aimed to provide a method for authentication of European sea bass (Dicentrarchus labrax) according to the requirements for seafood labels (Regulation 1379/2013/EU). Data on biometric traits, fatty acid profile, elemental composition, and isotopic abundance of wild and reared (intensively, semi-intensively and extensively) specimens from 18 Southern European sources (n = 160) were collected and clustered in 6 sets of parameters, then subjected to multivariate analysis. Correct allocations of subjects according to their production method, origin and stocking density were demonstrated with good approximation rates (94%, 92% and 92%, respectively) using fatty acid profiles. Less satisfying results were obtained using isotopic abundance, biometric traits, and elemental composition. The multivariate analysis also revealed that extensively reared subjects cannot be analytically discriminated from wild ones.

  18. Multivariate proteomic profiling identifies novel accessory proteins of coated vesicles

    PubMed Central

    Antrobus, Robin; Hirst, Jennifer; Bhumbra, Gary S.; Kozik, Patrycja; Jackson, Lauren P.; Sahlender, Daniela A.

    2012-01-01

    Despite recent advances in mass spectrometry, proteomic characterization of transport vesicles remains challenging. Here, we describe a multivariate proteomics approach to analyzing clathrin-coated vesicles (CCVs) from HeLa cells. siRNA knockdown of coat components and different fractionation protocols were used to obtain modified coated vesicle-enriched fractions, which were compared by stable isotope labeling of amino acids in cell culture (SILAC)-based quantitative mass spectrometry. 10 datasets were combined through principal component analysis into a “profiling” cluster analysis. Overall, 136 CCV-associated proteins were predicted, including 36 new proteins. The method identified >93% of established CCV coat proteins and assigned >91% correctly to intracellular or endocytic CCVs. Furthermore, the profiling analysis extends to less well characterized types of coated vesicles, and we identify and characterize the first AP-4 accessory protein, which we have named tepsin. Finally, our data explain how sequestration of TACC3 in cytosolic clathrin cages causes the severe mitotic defects observed in auxilin-depleted cells. The profiling approach can be adapted to address related cell and systems biological questions. PMID:22472443

  19. Opportunities for multivariate analysis of open spatial datasets to characterize urban flooding risks

    NASA Astrophysics Data System (ADS)

    Gaitan, S.; ten Veldhuis, J. A. E.

    2015-06-01

    Cities worldwide are challenged by increasing urban flood risks. Precise and realistic measures are required to reduce flooding impacts. However, currently implemented sewer and topographic models do not provide realistic predictions of local flooding occurrence during heavy rain events. Assessing other factors such as spatially distributed rainfall, socioeconomic characteristics, and social sensing, may help to explain probability and impacts of urban flooding. Several spatial datasets have been recently made available in the Netherlands, including rainfall-related incident reports made by citizens, spatially distributed rain depths, semidistributed socioeconomic information, and buildings age. Inspecting the potential of this data to explain the occurrence of rainfall related incidents has not been done yet. Multivariate analysis tools for describing communities and environmental patterns have been previously developed and used in the field of study of ecology. The objective of this paper is to outline opportunities for these tools to explore urban flooding risks patterns in the mentioned datasets. To that end, a cluster analysis is performed. Results indicate that incidence of rainfall-related impacts is higher in areas characterized by older infrastructure and higher population density.

  20. Synchronization of world economic activity

    NASA Astrophysics Data System (ADS)

    Groth, Andreas; Ghil, Michael

    2017-12-01

    Common dynamical properties of business cycle fluctuations are studied in a sample of more than 100 countries that represent economic regions from all around the world. We apply the methodology of multivariate singular spectrum analysis (M-SSA) to identify oscillatory modes and to detect whether these modes are shared by clusters of phase- and frequency-locked oscillators. An extension of the M-SSA approach is introduced to help analyze structural changes in the cluster configuration of synchronization. With this novel technique, we are able to identify a common mode of business cycle activity across our sample, and thus point to the existence of a world business cycle. Superimposed on this mode, we further identify several major events that have markedly influenced the landscape of world economic activity in the postwar era.

  1. Synchronization of world economic activity.

    PubMed

    Groth, Andreas; Ghil, Michael

    2017-12-01

    Common dynamical properties of business cycle fluctuations are studied in a sample of more than 100 countries that represent economic regions from all around the world. We apply the methodology of multivariate singular spectrum analysis (M-SSA) to identify oscillatory modes and to detect whether these modes are shared by clusters of phase- and frequency-locked oscillators. An extension of the M-SSA approach is introduced to help analyze structural changes in the cluster configuration of synchronization. With this novel technique, we are able to identify a common mode of business cycle activity across our sample, and thus point to the existence of a world business cycle. Superimposed on this mode, we further identify several major events that have markedly influenced the landscape of world economic activity in the postwar era.

  2. Impact of human activity and natural processes on groundwater arsenic in an urbanized area (South China) using multivariate statistical techniques.

    PubMed

    Huang, Guanxing; Chen, Zongyu; Liu, Fan; Sun, Jichao; Wang, Jincui

    2014-11-01

    Anthropogenic factors resulting from urbanization may affect groundwater As in urbanized areas. Groundwater samples from Guangzhou city (South China) were collected for As and other parameter analysis, in order to assess the impact of urbanization and natural processes on As distribution in aquifers. Nearly 25.5% of groundwater samples were above the WHO drinking water standard for As, and the As concentrations in the granular aquifer (GA) were generally far higher than those in the fractured bedrock aquifer (FBA). Samples were classified into four clusters by using hierarchical cluster analysis. Cluster 1 is mainly located in the FBA and controlled by natural processes. Anthropogenic pollution resulting from urbanization is responsible for the high As concentrations identified in cluster 2. Clusters 3 and 4 are mainly located in the GA and controlled by both natural processes and anthropogenic factors. Three main mechanisms control the source and mobilization of groundwater As in the study area. Firstly, the interaction of water and calcareous rocks appears to be responsible for As release in the FBA. Secondly, reduction of Fe/Mn oxyhydroxides and decomposition of organic matter are probably responsible for high As concentrations in the GA. Thirdly, during the process of urbanization, the infiltration of wastewater/leachate with a high As content is likely to be the main source for groundwater As, while NO3- contamination diminishes groundwater As.

  3. Redefining the Breast Cancer Exosome Proteome by Tandem Mass Tag Quantitative Proteomics and Multivariate Cluster Analysis.

    PubMed

    Clark, David J; Fondrie, William E; Liao, Zhongping; Hanson, Phyllis I; Fulton, Amy; Mao, Li; Yang, Austin J

    2015-10-20

    Exosomes are microvesicles of endocytic origin constitutively released by multiple cell types into the extracellular environment. With evidence that exosomes can be detected in the blood of patients with various malignancies, the development of a platform that uses exosomes as a diagnostic tool has been proposed. However, it has been difficult to truly define the exosome proteome due to the challenge of discerning contaminant proteins that may be identified via mass spectrometry using various exosome enrichment strategies. To better define the exosome proteome in breast cancer, we combined a Tandem-Mass-Tag (TMT) quantitative proteomics approach with Support Vector Machine (SVM) cluster analysis of three conditioned media derived fractions corresponding to a 10 000g cellular debris pellet, a 100 000g crude exosome pellet, and an Optiprep enriched exosome pellet. The quantitative analysis identified 2 179 proteins in all three fractions, with known exosomal cargo proteins displaying at least a 2-fold enrichment in the exosome fraction based on the TMT protein ratios. Employing SVM cluster analysis allowed for the classification of 251 proteins as "true" exosomal cargo proteins. This study provides a robust and vigorous framework for the future development of using exosomes as a potential multiprotein marker phenotyping tool that could be useful in breast cancer diagnosis and monitoring disease progression.

  4. Whole Blood Gene Expression Profiling Predicts Severe Morbidity and Mortality in Cystic Fibrosis: A 5-Year Follow-Up Study.

    PubMed

    Saavedra, Milene T; Quon, Bradley S; Faino, Anna; Caceres, Silvia M; Poch, Katie R; Sanders, Linda A; Malcolm, Kenneth C; Nichols, David P; Sagel, Scott D; Taylor-Cousar, Jennifer L; Leach, Sonia M; Strand, Matthew; Nick, Jerry A

    2018-05-01

    Cystic fibrosis pulmonary exacerbations accelerate pulmonary decline and increase mortality. Previously, we identified a 10-gene leukocyte panel measured directly from whole blood, which indicates response to exacerbation treatment. We hypothesized that molecular characteristics of exacerbations could also predict future disease severity. We tested whether a 10-gene panel measured from whole blood could identify patient cohorts at increased risk for severe morbidity and mortality, beyond standard clinical measures. Transcript abundance for the 10-gene panel was measured from whole blood at the beginning of exacerbation treatment (n = 57). A hierarchical cluster analysis of subjects based on their gene expression was performed, yielding four molecular clusters. An analysis of cluster membership and outcomes incorporating an independent cohort (n = 21) was completed to evaluate robustness of cluster partitioning of genes to predict severe morbidity and mortality. The four molecular clusters were analyzed for differences in forced expiratory volume in 1 second, C-reactive protein, return to baseline forced expiratory volume in 1 second after treatment, time to next exacerbation, and time to morbidity or mortality events (defined as lung transplant referral, lung transplant, intensive care unit admission for respiratory insufficiency, or death). Clustering based on gene expression discriminated between patient groups with significant differences in forced expiratory volume in 1 second, admission frequency, and overall morbidity and mortality. At 5 years, all subjects in cluster 1 (very low risk) were alive and well, whereas 90% of subjects in cluster 4 (high risk) had suffered a major event (P = 0.0001). In multivariable analysis, the ability of gene expression to predict clinical outcomes remained significant, despite adjustment for forced expiratory volume in 1 second, sex, and admission frequency. 
    The robustness of gene clustering to categorize patients appropriately in terms of clinical characteristics, and short- and long-term clinical outcomes, remained consistent, even when adding in a secondary population with significantly different clinical outcomes. Whole blood gene expression profiling allows molecular classification of acute pulmonary exacerbations, beyond standard clinical measures, providing a predictive tool for identifying subjects at increased risk for mortality and disease progression.

  5. GENETIC AND ENVIRONMENTAL CONTRIBUTIONS TO THE CO-OCCURRENCE OF DEPRESSIVE PERSONALITY DISORDER AND DSM-IV PERSONALITY DISORDERS

    PubMed Central

    Ørstavik, Ragnhild E.; Kendler, Kenneth S.; Røysamb, Espen; Czajkowski, Nikolai; Tambs, Kristian; Reichborn-Kjennerud, Ted

    2012-01-01

    One of the main controversies with regard to depressive personality disorder (DPD) concerns the co-occurrence with the established DSM-IV personality disorders (PDs). The main aim of this study was to examine to what extent DPD and the DSM-IV PDs share genetic and environmental risk factors, using multivariate twin modeling. The DSM-IV Structured Interview for Personality was applied to 2,794 young adult twins. Paranoid PD from Cluster A, borderline PD from Cluster B, and all three PDs from Cluster C were independently and significantly associated with DPD in multiple regression analysis. The genetic correlations between DPD and the other PDs were strong (.53–.83), while the environmental correlations were moderate (.36–.40). Close to 50% of the total variance in DPD was disorder specific. However, only 5% was due to disorder-specific genetic factors, indicating that a substantial part of the genetic vulnerability to DPD also increases the vulnerability to other PDs. PMID:22686231

  6. Genetic and environmental contributions to the co-occurrence of depressive personality disorder and DSM-IV personality disorders.

    PubMed

    Ørstavik, Ragnhild E; Kendler, Kenneth S; Røysamb, Espen; Czajkowski, Nikolai; Tambs, Kristian; Reichborn-Kjennerud, Ted

    2012-06-01

    One of the main controversies with regard to depressive personality disorder (DPD) concerns the co-occurrence with the established DSM-IV personality disorders (PDs). The main aim of this study was to examine to what extent DPD and the DSM-IV PDs share genetic and environmental risk factors, using multivariate twin modeling. The DSM-IV Structured Interview for Personality was applied to 2,794 young adult twins. Paranoid PD from Cluster A, borderline PD from Cluster B, and all three PDs from Cluster C were independently and significantly associated with DPD in multiple regression analysis. The genetic correlations between DPD and the other PDs were strong (.53-.83), while the environmental correlations were moderate (.36-.40). Close to 50% of the total variance in DPD was disorder specific. However, only 5% was due to disorder-specific genetic factors, indicating that a substantial part of the genetic vulnerability to DPD also increases the vulnerability to other PDs.

  7. Mental toughness profiles and their relations with achievement goals and sport motivation in adolescent Australian footballers.

    PubMed

    Gucciardi, Daniel F

    2010-04-01

    The aims of this study were to identify the mental toughness profiles of adolescent Australian footballers and to explore the relations between the mental toughness clusters and achievement goals and sport motivation. A total of 214 non-elite, male Australian footballers aged 16-18 years (mean = 16.8, s = 0.7) provided self-reports of mental toughness, achievement goals, and sport motivation. Cluster analysis supported the presence of two groups in which players evidenced moderate and high levels of all four mental toughness subscales. Significant multivariate effects were observed for achievement goals and sport motivation with the high mental toughness group favouring both mastery- and performance-approach goals and self-determined as well as extrinsic motivational tendencies. The results suggest that adolescent Australian footballers' self-perceptions of mental toughness fall within two clusters involving high and moderate forms of all four components, and that these profiles show varying relations with achievement goals (particularly mastery-approach) and sport motivation.

  8. Unsupervised classification of multivariate geostatistical data: Two algorithms

    NASA Astrophysics Data System (ADS)

    Romary, Thomas; Ors, Fabien; Rivoirard, Jacques; Deraisme, Jacques

    2015-12-01

    With the increasing development of remote sensing platforms and the evolution of sampling facilities in the mining and oil industries, spatial datasets are becoming increasingly large, include a growing number of variables, and cover wider and wider areas. Therefore, it is often necessary to split the domain of study to account for radically different behaviors of the natural phenomenon over the domain and to simplify the subsequent modeling step. The definition of these areas can be seen as a problem of unsupervised classification, or clustering, where we try to divide the domain into homogeneous domains with respect to the values taken by the variables at hand. The application of classical clustering methods, designed for independent observations, does not ensure the spatial coherence of the resulting classes. Image segmentation methods, based on e.g. Markov random fields, are not adapted to irregularly sampled data. Other existing approaches, based on mixtures of Gaussian random functions estimated via the expectation-maximization algorithm, are limited to reasonable sample sizes and a small number of variables. In this work, we propose two algorithms based on adaptations of classical algorithms to multivariate geostatistical data. Both algorithms are model free and can handle large volumes of multivariate, irregularly spaced data. The first one proceeds by agglomerative hierarchical clustering. The spatial coherence is ensured by a proximity condition imposed for two clusters to merge. This proximity condition relies on a graph organizing the data in the coordinates space. The hierarchical algorithm can then be seen as a graph-partitioning algorithm. Following this interpretation, a spatial version of the spectral clustering algorithm is also proposed. The performance of both algorithms is assessed on toy examples and a mining dataset.
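
    The first algorithm described above merges clusters only when they are neighbours in coordinate space. The sketch below illustrates that general idea and is not the authors' algorithm: it assumes a fixed-radius proximity graph and a centroid distance in attribute space as the merge criterion, and the function name and parameters are hypothetical.

```python
import numpy as np


def spatial_agglomerative(coords, values, n_clusters, radius):
    """Greedy agglomerative clustering restricted to spatially adjacent clusters."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)   # shape (n_points, n_variables)
    n = len(coords)
    # Proximity graph (assumption): points within `radius` of each other are linked.
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    adjacent = (dist <= radius) & ~np.eye(n, dtype=bool)
    members = {i: [i] for i in range(n)}        # cluster id -> point indices
    links = {i: {int(j) for j in np.flatnonzero(adjacent[i])} for i in range(n)}

    while len(members) > n_clusters:
        best = None
        for a in members:
            for b in links[a]:
                if a < b:  # visit each linked pair once
                    gap = np.linalg.norm(values[members[a]].mean(axis=0)
                                         - values[members[b]].mean(axis=0))
                    if best is None or gap < best[0]:
                        best = (gap, a, b)
        if best is None:   # no adjacent clusters left to merge
            break
        _, a, b = best
        members[a] += members.pop(b)
        # Cluster b's neighbours become neighbours of the merged cluster a.
        for c in links.pop(b):
            links[c].discard(b)
            if c != a:
                links[c].add(a)
                links[a].add(c)

    labels = np.empty(n, dtype=int)
    for k, idx in enumerate(members.values()):
        labels[idx] = k
    return labels


# Toy transect: six points on a line with a jump in the measured variable.
xy = np.arange(6, dtype=float).reshape(-1, 1)
z = np.array([[0.0], [0.0], [0.0], [10.0], [10.0], [10.0]])
labels = spatial_agglomerative(xy, z, n_clusters=2, radius=1.0)
```

    Because merges are limited to linked clusters, two regions with similar values that are not connected in the graph stay separate, which is the spatial-coherence property the abstract emphasizes.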

  9. Global Tree Range Shifts Under Forecasts from Two Alternative GCMs Using Two Future Scenarios

    NASA Astrophysics Data System (ADS)

    Hargrove, W. W.; Kumar, J.; Potter, K. M.; Hoffman, F. M.

    2013-12-01

    Global shifts in the environmentally suitable ranges of 215 tree species were predicted under forecasts from two GCMs (the Parallel Climate Model (PCM), and the Hadley Model), each under two IPCC future climatic scenarios (A1 and B1), each at two future dates (2050 and 2100). The analysis considers all global land surface at a resolution of 4 km². A statistical multivariate clustering procedure was used to quantitatively delineate 30 thousand environmentally homogeneous ecoregions across present and 8 potential future global locations at once, using global maps of 17 environmental characteristics describing temperature, precipitation, soils, topography and solar insolation. Presence of each tree species on Forest Inventory Analysis (FIA) plots and in Global Biodiversity Information Facility (GBIF) samples was used to select a subset of suitable ecoregions from the full set of 30 thousand. Once identified, this suitable subset of ecoregions was compared to the known current range of the tree species under present conditions. Predicted present ranges correspond well with current understanding for all but a few of the 215 tree species. The subset of suitable ecoregions for each tree species can then be tracked into the future to determine whether the suitable home range for this species remains the same, moves, grows, shrinks, or disappears under each model/scenario combination. Occurrence and growth performance measurements for various tree species across the U.S. are limited to FIA plots. We present a new, general-purpose empirical imputation method which associates sparse measurements of dependent variables with particular multivariate clustered combinations of the independent variables, and then estimates values for unmeasured clusters, based on directional proximity in multidimensional data space, at both the cluster and map-cell levels of resolution. Using Associative Clustering, we scaled up the FIA point measurements into continuous maps that show the expected growth and suitability for individual tree species across the continental US. Maps were generated for each tree species showing the Minimum Required Movement (MRM) straight-line distance from each currently suitable location to the geographically nearest "lifeboat" location having suitable conditions in the future. Locations that are the closest "lifeboats" for many MRM propagules originating from wide surrounding areas may constitute high-priority preservation targets as a refugium against climatic change.
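
    The Minimum Required Movement idea can be sketched as a nearest-neighbour distance computation. This is only an illustration under simplifying assumptions: cells are given as planar (x, y) coordinates, distance is Euclidean, and the function name `minimum_required_movement` is hypothetical. A global analysis would need a geodesic distance instead.

```python
import math


def minimum_required_movement(current_cells, future_cells):
    """Distance from each currently suitable cell to its nearest future 'lifeboat'.

    Returns a dict mapping each current cell to the straight-line distance
    to the closest future-suitable cell (infinity if there is none).
    """
    if not future_cells:
        return {cell: math.inf for cell in current_cells}
    return {cell: min(math.dist(cell, f) for f in future_cells)
            for cell in current_cells}


# Hypothetical grid cells for one species.
current = [(0, 0), (3, 4)]
future = [(3, 4), (6, 8)]
print(minimum_required_movement(current, future))
```

    A cell that remains suitable in the future gets a distance of zero; counting how often each future cell is the nearest target would give a rough indication of candidate refugia.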

  10. Cluster analysis identifies three urodynamic patterns in patients with orthotopic neobladder reconstruction.

    PubMed

    Kim, Kwang Hyun; Yoon, Hyun Suk; Song, Wan; Choo, Hee Jung; Yoon, Hana; Chung, Woo Sik; Sim, Bong Suk; Lee, Dong Hyeon

    2017-01-01

    To classify patients with orthotopic neobladder based on urodynamic parameters using cluster analysis and to characterize the voiding function of each group. From January 2012 to November 2015, 142 patients with bladder cancer underwent radical cystectomy and Studer neobladder reconstruction at our institute. Of the 142 patients, 103 with complete urodynamic data and information on urinary functional outcomes were included in this study. K-means clustering was performed with urodynamic parameters which included maximal cystometric capacity, residual volume, maximal flow rate, compliance, and detrusor pressure at maximum flow rate. Three groups emerged by cluster analysis. Urodynamic parameters and urinary function outcomes were compared between three groups. Group 1 (n = 44) had ideal urodynamic parameters with a mean maximal bladder capacity of 513.3 ml and mean residual urine volume of 33.1 ml. Group 2 (n = 42) was characterized by small bladder capacity with low compliance. Patients in group 2 had higher rates of daytime incontinence and nighttime incontinence than patients in group 1. Group 3 (n = 17) was characterized by large residual urine volume with high compliance. When we examined gender differences in urodynamics and functional outcomes, residual urine volume and the rate of daytime incontinence were only marginally significant. However, females were significantly more likely to belong to group 2 or 3 (P = 0.003). In multivariate analysis to identify factors associated with group 1 which has the most ideal urodynamic pattern, age (OR 0.95, P = 0.017) and male gender (OR 7.57, P = 0.003) were identified as significant factors. While patients with ileal neobladder present with various voiding symptoms, three urodynamic patterns were identified by cluster analysis. Approximately half of patients had ideal urodynamic parameters. The other two groups were characterized by large residual urine and small capacity bladder with low compliance. 
Young age and male gender appear to have a favorable impact on urodynamic and voiding outcomes in patients undergoing orthotopic neobladder reconstruction.
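
    For readers unfamiliar with k-means, the following is a minimal sketch of the standard Lloyd iteration, not the software used in the study. The starting centroids are passed in explicitly so the result is deterministic, and the sample values are invented. In practice, urodynamic parameters measured in different units would presumably be standardised first.

```python
import math


def kmeans(points, centroids, max_iter=100):
    """Plain Lloyd's algorithm: assign to nearest centroid, then recompute means."""
    centroids = [list(c) for c in centroids]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Invented two-feature data (e.g. two standardised parameters per patient).
points = [(0, 0), (0, 1), (10, 10), (10, 11)]
labels, centres = kmeans(points, centroids=[(0, 0), (10, 10)])
print(labels, centres)
```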

  11. A Holarctic Biogeographical Analysis of the Collembola (Arthropoda, Hexapoda) Unravels Recent Post-Glacial Colonization Patterns

    PubMed Central

    Ávila-Jiménez, María Luisa; Coulson, Stephen James

    2011-01-01

    We aimed to describe the main Arctic biogeographical patterns of the Collembola, and analyze historical factors and current climatic regimes determining Arctic collembolan species distribution. Furthermore, we aimed to identify possible dispersal routes, colonization sources and glacial refugia for Arctic Collembola. We implemented a Gaussian Mixture Clustering method on species distribution ranges and applied a distance-based parametric bootstrap test on presence-absence collembolan species distribution data. Additionally, multivariate analysis was performed considering species distributions, biodiversity, cluster distribution and environmental factors (temperature and precipitation). No clear relation was found between current climatic regimes and species distribution in the Arctic. Gaussian Mixture Clustering found common elements within Siberian areas, Atlantic areas, the Canadian Arctic, a mid-Siberian cluster and specific Beringian elements, following the same pattern previously described, using a variety of molecular methods, for Arctic plants. Species distributions hence indicate the influence of recent glacial history, as LGM glacial refugia (mid-Siberia, and Beringia) and major dispersal routes to high Arctic island groups can be identified. Endemic species are found in the high Arctic, but no specific biogeographical pattern can be clearly identified as a sign of high Arctic glacial refugia. Ocean current patterns are suggested as being an important factor shaping the distribution of Arctic Collembola, which is consistent with Antarctic studies in collembolan biogeography. 
The clear relations between cluster distribution and geographical areas considering their recent glacial history, lack of relationship of species distribution with current climatic regimes, and consistency with previously described Arctic patterns in a series of organisms inferred using a variety of methods, suggest that historical phenomena shaping contemporary collembolan distribution can be inferred through biogeographical analysis. PMID:26467728

  12. Regression analysis for LED color detection of visual-MIMO system

    NASA Astrophysics Data System (ADS)

    Banik, Partha Pratim; Saha, Rappy; Kim, Ki-Doo

    2018-04-01

    Color detection from a light emitting diode (LED) array using a smartphone camera is very difficult in a visual multiple-input multiple-output (visual-MIMO) system. In this paper, we propose a method to determine the LED color using a smartphone camera by applying regression analysis. We employ a multivariate regression model to identify the LED color. After taking a picture of an LED array, we select the LED array region, and detect the LED using an image processing algorithm. We then apply the k-means clustering algorithm to determine the number of potential colors for feature extraction of each LED. Finally, we apply the multivariate regression model to predict the color of the transmitted LEDs. In this paper, we show our results for three types of environmental light condition: room environmental light, low environmental light (560 lux), and strong environmental light (2450 lux). We compare the results of our proposed algorithm using training and test R-Square (%) values and the percentage of closeness of transmitted and predicted colors, and we also report the number of distorted test data points, taken from a distortion bar graph in CIE1931 color space.

  13. Implementation of physicochemical and sensory analysis in conjunction with multivariate analysis towards assessing olive oil authentication/adulteration.

    PubMed

    Arvanitoyannis, Ioannis S; Vlachos, Antonios

    2007-01-01

    The authenticity of products labeled as olive oils, and in particular as virgin olive oils, stands for a very important issue both in terms of its health and commercial aspects. In view of the continuously increasing interest in virgin olive oil therapeutic properties, the traditional methods of characterization and physical and sensory analysis were further enriched with more advanced and sophisticated methods such as HPLC-MS, HPLC-GC/C/IRMS, RPLC-GC, DEPT, and CSIA among others. The results of both traditional and "novel" methods were treated both by means of classical multivariate analysis (cluster, principal component, correspondence, canonical, and discriminant) and artificial intelligence methods showing that nowadays the adulteration of virgin olive oil with seed oil is detectable at very low percentages, sometimes even at less than 1%. Furthermore, the detection of geographical origin of olive oil is equally feasible and much more accurate in countries like Italy and Spain where databases of physical/chemical properties exist. However, this geographical origin classification can also be accomplished in the absence of such databases provided that an adequate number of oil samples are used and the parameters studied have "discriminating power."

  14. How Do Social Capital and HIV/AIDS Outcomes Geographically Cluster and Which Sociocontextual Mechanisms Predict Differences Across Clusters?

    PubMed

    Ransome, Yusuf; Dean, Lorraine T; Crawford, Natalie D; Metzger, David S; Blank, Michael B; Nunn, Amy S

    2017-09-01

    Place of residence has been associated with HIV transmission risks. Social capital, defined as features of social organization that improve efficiency of society by facilitating coordinated actions, often varies by neighborhood, and is hypothesized to have protective effects on HIV care continuum outcomes. We examined whether the association between social capital and 2 HIV care continuum outcomes clustered geographically and whether sociocontextual mechanisms predict differences across clusters. Bivariate Local Moran's I evaluated geographical clustering in the association between social capital (participation in civic and social organizations, 2006, 2008, 2010) and [5-year (2007-2011) prevalence of late HIV diagnosis and linkage to HIV care] across Philadelphia, PA, census tracts (N = 378). Maps documented the clusters and multinomial regression assessed which sociocontextual mechanisms (eg, racial composition) predict differences across clusters. We identified 4 significant clusters (high social capital-high HIV/AIDS, low social capital-low HIV/AIDS, low social capital-high HIV/AIDS, and high social capital-low HIV/AIDS). Moran's I between social capital and late HIV diagnosis was I = 0.19 (z = 9.54, P < 0.001) and, between social capital and linkage to HIV care, I = 0.06 (z = 3.274, P = 0.002). In multivariable analysis, median household income predicted differences across clusters, particularly where social capital was lowest and HIV burden the highest, compared with clusters with high social capital and lowest HIV burden. The association between social participation and HIV care continuum outcomes clusters geographically in Philadelphia, PA. HIV prevention interventions should account for this phenomenon. Reducing geographic disparities will require interventions tailored to each continuum step and that address socioeconomic factors such as neighborhood median income.
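
    As a rough illustration of the statistic named in this abstract, a bivariate local Moran's I can be computed as the standardised value of one variable at a location multiplied by the spatial lag (neighbour average) of the other standardised variable. The sketch below assumes row-standardised binary contiguity weights and uses invented data. The function names are hypothetical, and the permutation-based significance testing used in practice is omitted.

```python
import statistics


def zscores(x):
    """Standardise a sequence using the population standard deviation."""
    mu = statistics.fmean(x)
    sd = statistics.pstdev(x)
    return [(v - mu) / sd for v in x]


def bivariate_local_moran(x, y, neighbors):
    """I_i = z_x[i] * mean(z_y[j] for j in neighbors[i]).

    `neighbors[i]` lists the indices of units adjacent to unit i
    (row-standardised weights). Units with no neighbours get 0.0.
    """
    zx, zy = zscores(x), zscores(y)
    result = []
    for i, nbrs in enumerate(neighbors):
        lag = sum(zy[j] for j in nbrs) / len(nbrs) if nbrs else 0.0
        result.append(zx[i] * lag)
    return result


# Four hypothetical tracts in a row: x could be a social capital score,
# y a late-diagnosis prevalence.
x = [1, 1, 5, 5]
y = [2, 2, 8, 8]
neighbors = [[1], [0, 2], [1, 3], [2]]
print(bivariate_local_moran(x, y, neighbors))
```

    The signs of the standardised value and of the lag are what separate the four cluster types: positive values of I_i correspond to high-high or low-low pairings, negative values to the two mixed types.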

  15. Computer program documentation: ISOCLS iterative self-organizing clustering program, program C094

    NASA Technical Reports Server (NTRS)

    Minter, R. T. (Principal Investigator)

    1972-01-01

    The author has identified the following significant results. This program implements an algorithm which, ideally, sorts a given set of multivariate data points into similar groups or clusters. The program is intended for use in the evaluation of multispectral scanner data; however, the algorithm could be used for other data types as well. The user may specify a set of initial estimated cluster means to begin the procedure, or he may begin with the assumption that all the data belongs to one cluster. The procedure is initialized by assigning each data point to the nearest (in absolute distance) cluster mean. If no initial cluster means were input, all of the data is assigned to cluster 1. The means and standard deviations are calculated for each cluster.
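
    The initialization step described here can be sketched as follows. This is not the ISOCLS code: the function name `assign_and_summarise` is hypothetical, "absolute distance" is interpreted (as an assumption) as the sum of absolute coordinate differences, and population standard deviations are used.

```python
import statistics


def assign_and_summarise(points, means=None):
    """One initialization pass: assign points, then compute per-cluster statistics.

    If `means` is None, every point is placed in the first cluster (index 0).
    Returns (labels, summary) where summary[k] = (channel means, channel std devs).
    """
    if means is None:
        labels = [0] * len(points)
        n_clusters = 1
    else:
        labels = [
            min(range(len(means)),
                key=lambda k: sum(abs(a - b) for a, b in zip(p, means[k])))
            for p in points
        ]
        n_clusters = len(means)

    summary = {}
    for k in range(n_clusters):
        members = [p for p, lab in zip(points, labels) if lab == k]
        if not members:
            continue  # empty clusters have no statistics
        cols = list(zip(*members))
        summary[k] = ([statistics.fmean(c) for c in cols],
                      [statistics.pstdev(c) for c in cols])
    return labels, summary


# Invented two-channel pixel values and two initial cluster means.
pixels = [(0, 0), (2, 0), (10, 10), (12, 10)]
print(assign_and_summarise(pixels, means=[(1, 0), (11, 10)]))
print(assign_and_summarise(pixels))  # no initial means: a single cluster
```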

  16. Identifying contextual influences of community reintegration among injured servicemembers.

    PubMed

    Hawkins, Brent L; McGuire, Francis A; Britt, Thomas W; Linder, Sandra M

    2015-01-01

    Research suggests that community reintegration (CR) after injury and rehabilitation is difficult for many injured servicemembers. However, little is known about the influence of the contextual factors, both personal and environmental, that influence CR. Framed within the International Classification of Functioning, Disability and Health and Social Cognitive Theory, the quantitative portion of a larger mixed-methods study of 51 injured, community-dwelling servicemembers compared the relative contribution of contextual factors between groups of servicemembers with different levels of CR. Cluster analysis indicated three groups of servicemembers showing low, moderate, and high levels of CR. Statistical analyses identified contextual factors (e.g., personal and environmental factors) that significantly discriminated between CR clusters. Multivariate analysis of variance and discriminant analysis indicated significant contributions of general self-efficacy, services and assistance barriers, physical and structural barriers, attitudes and support barriers, perceived level of disability and/or handicap, work and school barriers, and policy barriers on CR scores. Overall, analyses indicated that injured servicemembers with lower CR scores had lower general self-efficacy scores, reported more difficulty with environmental barriers, and reported their injuries as more disabling.

  17. A scoring metric for multivariate data for reproducibility analysis using chemometric methods

    PubMed Central

    Sheen, David A.; de Carvalho Rocha, Werickson Fortunato; Lippa, Katrice A.; Bearden, Daniel W.

    2017-01-01

    Process quality control and reproducibility in emerging measurement fields such as metabolomics is normally assured by interlaboratory comparison testing. As a part of this testing process, spectral features from a spectroscopic method such as nuclear magnetic resonance (NMR) spectroscopy are attributed to particular analytes within a mixture, and it is the metabolite concentrations that are returned for comparison between laboratories. However, data quality may also be assessed directly by using binned spectral data before the time-consuming identification and quantification. Use of the binned spectra has some advantages, including preserving information about trace constituents and enabling identification of process difficulties. In this paper, we demonstrate the use of binned NMR spectra to conduct a detailed interlaboratory comparison and composition analysis. Spectra of synthetic and biologically-obtained metabolite mixtures, taken from a previous interlaboratory study, are compared with cluster analysis using a variety of distance and entropy metrics. The individual measurements are then evaluated based on where they fall within their clusters, and a laboratory-level scoring metric is developed, which provides an assessment of each laboratory’s individual performance. PMID:28694553

  18. Historic changes in fish assemblage structure in midwestern nonwadeable rivers

    USGS Publications Warehouse

    Parks, Timothy P.; Quist, Michael C.; Pierce, Clay L.

    2014-01-01

    Historical change in fish assemblage structure was evaluated in the mainstems of the Des Moines, Iowa, Cedar, Wapsipinicon, and Maquoketa rivers, in Iowa. Fish occurrence data were compared in each river between historical and recent time periods to characterize temporal changes among 126 species distributions and assess spatiotemporal patterns in faunal similarity. A resampling procedure was used to estimate species occurrences in rivers during each assessment period and changes in species occurrence were summarized. Spatiotemporal shifts in species composition were analyzed at the river and river section scale using cluster analysis, pairwise Jaccard's dissimilarities, and analysis of multivariate beta dispersion. The majority of species exhibited either increases or declines in distribution in all rivers with the exception of several “unknown” or inconclusive trends exhibited by species in the Maquoketa River. Cluster analysis identified temporal patterns of similarity among fish assemblages in the Des Moines, Cedar, and Iowa rivers within the historical and recent assessment period indicating a significant change in species composition. Prominent declines of backwater species with phytophilic spawning strategies contributed to assemblage changes occurring across river systems.
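
    The pairwise Jaccard dissimilarity used for presence-absence comparisons like these is straightforward to compute. The sketch below uses invented species lists and hypothetical assemblage names; it only illustrates the index itself, not the resampling or beta-dispersion analysis.

```python
from itertools import combinations


def jaccard_dissimilarity(a, b):
    """1 - |A ∩ B| / |A ∪ B| for two collections of species names."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # two empty assemblages are treated as identical
    return 1.0 - len(a & b) / len(union)


def pairwise_jaccard(assemblages):
    """Dissimilarity for every pair of named assemblages."""
    return {(p, q): jaccard_dissimilarity(assemblages[p], assemblages[q])
            for p, q in combinations(sorted(assemblages), 2)}


# Invented occurrence lists for one river in two periods.
assemblages = {
    "historical": {"carp", "walleye", "shiner"},
    "recent": {"carp", "walleye", "bass"},
}
print(pairwise_jaccard(assemblages))
```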

  19. Combining FT-IR spectroscopy and multivariate analysis for qualitative and quantitative analysis of the cell wall composition changes during apples development.

    PubMed

    Szymanska-Chargot, M; Chylinska, M; Kruk, B; Zdunek, A

    2015-01-22

    The aim of this work was to quantitatively and qualitatively determine the composition of the cell wall material from apples during development by means of Fourier transform infrared (FT-IR) spectroscopy. The FT-IR region of 1500-800 cm(-1), containing characteristic bands for galacturonic acid, hemicellulose and cellulose, was examined using principal component analysis (PCA), k-means clustering and partial least squares (PLS). The samples were differentiated by development stage and cultivar using PCA and k-means clustering. PLS calibration models for galacturonic acid, hemicellulose and cellulose content from FT-IR spectra were developed and validated with the reference data. PLS models were tested using the root-mean-square errors of cross-validation for contents of galacturonic acid, hemicellulose and cellulose, which were 8.30 mg/g, 4.08% and 1.74%, respectively. It was proven that FT-IR spectroscopy combined with chemometric methods has potential for fast and reliable determination of the main constituents of fruit cell walls. Copyright © 2014 Elsevier Ltd. All rights reserved.

  20. Reexamining Sample Size Requirements for Multivariate, Abundance-Based Community Research: When Resources are Limited, the Research Does Not Have to Be.

    PubMed

    Forcino, Frank L; Leighton, Lindsey R; Twerdy, Pamela; Cahill, James F

    2015-01-01

    Community ecologists commonly perform multivariate techniques (e.g., ordination, cluster analysis) to assess patterns and gradients of taxonomic variation. A critical requirement for a meaningful statistical analysis is accurate information on the taxa found within an ecological sample. However, oversampling (too many individuals counted per sample) also comes at a cost, particularly for ecological systems in which identification and quantification is substantially more resource consuming than the field expedition itself. In such systems, an increasingly larger sample size will eventually result in diminishing returns in improving any pattern or gradient revealed by the data, but will also lead to continually increasing costs. Here, we examine 396 datasets: 44 previously published and 352 created datasets. Using meta-analytic and simulation-based approaches, the research within the present paper seeks (1) to determine minimal sample sizes required to produce robust multivariate statistical results when conducting abundance-based, community ecology research. Furthermore, we seek (2) to determine the dataset parameters (i.e., evenness, number of taxa, number of samples) that require larger sample sizes, regardless of resource availability. We found that in the 44 previously published and the 220 created datasets with randomly chosen abundances, a conservative estimate of a sample size of 58 produced the same multivariate results as all larger sample sizes. However, this minimal number varies as a function of evenness, where increased evenness resulted in increased minimal sample sizes. Sample sizes as small as 58 individuals are sufficient for a broad range of multivariate abundance-based research. In cases when resource availability is the limiting factor for conducting a project (e.g., small university, time to conduct the research project), statistically viable results can still be obtained with less of an investment.

  1. Career paths in physicians' postgraduate training - an eight-year follow-up study.

    PubMed

    Buddeberg-Fischer, Barbara; Stamm, Martina; Klaghofer, Richard

    2010-10-06

    To date, there are hardly any studies on the choice of career path in medical school graduates. The present study aimed to investigate what career paths can be identified in the course of postgraduate training of physicians; what factors have an influence on the choice of a career path; and in what way the career paths are correlated with career-related factors as well as with work-life balance aspirations. The data reported originates from five questionnaire surveys of the prospective SwissMedCareer Study, beginning in 2001 (T1, last year of medical school). The study sample consisted of 358 physicians (197 females, 55%; 161 males, 45%) participating at each assessment from T2 (2003, first year of residency) to T5 (2009, seventh year of residency), answering the question: What career do you aspire to have? Furthermore, personal characteristics, chosen specialty, career motivation, mentoring experience, work-life balance as well as workload, career success and career satisfaction were assessed. Career paths were analysed with cluster analysis, and differences between clusters analysed with multivariate methods. The cluster analysis revealed four career clusters which discriminated distinctly between each other: (1) career in practice, (2) hospital career, (3) academic career, and (4) changing career goal. From T3 (third year of residency) to T5, respondents in Cluster 1-3 were rather stable in terms of their career path aspirations, while those assigned to Cluster 4 showed a high fluctuation in their career plans. Physicians in Cluster 1 showed high values in extraprofessional concerns and often consider part-time work. Cluster 2 and 3 were characterised by high instrumentality, intrinsic and extrinsic career motivation, career orientation and high career success. No cluster differences were seen in career satisfaction. In Cluster 1 and 4, females were overrepresented. 
Trainees should be supported to stay on the career path that best suits their personal and professional profile. Attention should be paid to the subgroup of physicians in Cluster 4 switching from one career goal to another in the course of their postgraduate training.

  2. Predictive modeling of EEG time series for evaluating surgery targets in epilepsy patients.

    PubMed

    Steimer, Andreas; Müller, Michael; Schindler, Kaspar

    2017-05-01

    During the last 20 years, predictive modeling in epilepsy research has largely been concerned with the prediction of seizure events, whereas the inference of effective brain targets for resective surgery has received surprisingly little attention. In this exploratory pilot study, we describe a distributional clustering framework for the modeling of multivariate time series and use it to predict the effects of brain surgery in epilepsy patients. By analyzing the intracranial EEG, we demonstrate how patients who became seizure free after surgery are clearly distinguished from those who did not. More specifically, for 5 out of 7 patients who obtained seizure freedom (= Engel class I) our method predicts the specific collection of brain areas that got actually resected during surgery to yield a markedly lower posterior probability for the seizure related clusters, when compared to the resection of random or empty collections. Conversely, for 4 out of 5 Engel class III/IV patients who still suffer from postsurgical seizures, performance of the actually resected collection is not significantly better than performances displayed by random or empty collections. As the number of possible collections ranges into billions and more, this is a substantial contribution to a problem that today is still solved by visual EEG inspection. Apart from epilepsy research, our clustering methodology is also of general interest for the analysis of multivariate time series and as a generative model for temporally evolving functional networks in the neurosciences and beyond. Hum Brain Mapp 38:2509-2531, 2017. © 2017 Wiley Periodicals, Inc.

  3. Socioeconomic Status (SES) and Childhood Acute Myeloid Leukemia (AML) Mortality

    PubMed Central

    Knoble, Naomi B.; Alderfer, Melissa A.; Hossain, Md Jobayer

    2016-01-01

    Socioeconomic status (SES) is a complex construct of multiple indicators, known to impact cancer outcomes, but has not been adequately examined among pediatric AML patients. This study aimed to identify the patterns of co-occurrence of multiple community-level SES indicators and to explore associations between various patterns of these indicators and pediatric AML mortality risk. A nationally representative US sample of 3,651 pediatric AML patients, aged 0–19 years at diagnosis was drawn from 17 Surveillance, Epidemiology, and End Results (SEER) database registries created between 1973 and 2012. Factor analysis, cluster analysis, stratified univariable and multivariable Cox proportional hazards models were used. Four SES factors accounting for 87% of the variance in SES indicators were identified: F1) economic/educational disadvantage, less immigration; F2) immigration-related features (foreign-born, language-isolation, crowding), less mobility; F3) housing instability; and, F4) absence of moving. F1 and F3 showed elevated risk of mortality, adjusted hazards ratios (aHR) (95% CI): 1.07(1.02–1.12) and 1.05(1.00–1.10), respectively. Seven SES-defined cluster groups were identified. Cluster 1: (low economic/educational disadvantage, few immigration-related features, and residential-stability) showed the minimum risk of mortality. Compared to Cluster 1, Cluster 3: (high economic/educational disadvantage, high-mobility) and Cluster 6: (moderately-high economic/educational disadvantages, housing-instability and immigration-related features) exhibited substantially greater risk of mortality, aHR(95% CI) = 1.19(1.0–1.4) and 1.23 (1.1–1.5), respectively. Factors of correlated SES-indicators and their pattern-based groups demonstrated differential risks in pediatric AML mortality, indicating the need for special public-health attention in areas with economic-educational disadvantages, housing-instability and immigration-related features. PMID:27543948

  4. MCMC Sampling for a Multilevel Model with Nonindependent Residuals within and between Cluster Units

    ERIC Educational Resources Information Center

    Browne, William; Goldstein, Harvey

    2010-01-01

    In this article, we discuss the effect of removing the independence assumptions between the residuals in two-level random effect models. We first consider removing the independence between the Level 2 residuals and instead assume that the vector of all residuals at the cluster level follows a general multivariate normal distribution. We…

  5. Fusion And Inference From Multiple And Massive Disparate Distributed Dynamic Data Sets

    DTIC Science & Technology

    2017-07-01

    principled methodology for two-sample graph testing; designed a provably almost-surely perfect vertex clustering algorithm for block model graphs; proved...3.7 Semi-Supervised Clustering Methodology ...................................................................... 9 3.8 Robust Hypothesis Testing...dimensional Euclidean space – allows the full arsenal of statistical and machine learning methodology for multivariate Euclidean data to be deployed for

  6. The clustering-based case-based reasoning for imbalanced business failure prediction: a hybrid approach through integrating unsupervised process with supervised process

    NASA Astrophysics Data System (ADS)

    Li, Hui; Yu, Jun-Ling; Yu, Le-An; Sun, Jie

    2014-05-01

    Case-based reasoning (CBR) is one of the main forecasting methods in business forecasting, which performs well in prediction and holds the ability of giving explanations for the results. In business failure prediction (BFP), the number of failed enterprises is relatively small, compared with the number of non-failed ones. However, the loss is huge when an enterprise fails. Therefore, it is necessary to develop methods (trained on imbalanced samples) which forecast well for this small proportion of failed enterprises while maintaining high total accuracy. Commonly used methods constructed on the assumption of balanced samples do not perform well in predicting minority samples on imbalanced samples consisting of the minority/failed enterprises and the majority/non-failed ones. This article develops a new method called clustering-based CBR (CBCBR), which integrates clustering analysis, an unsupervised process, with CBR, a supervised process, to enhance the efficiency of retrieving information from both minority and majority in CBR. In CBCBR, various case classes are first generated through hierarchical clustering of the stored experienced cases, and class centres are calculated by integrating the information of the cases in the same clustered class. When predicting the label of a target case, its nearest clustered case class is first retrieved by ranking similarities between the target case and each clustered case class centre. Then, nearest neighbours of the target case in the determined clustered case class are retrieved. Finally, labels of the nearest experienced cases are used in prediction. In the empirical experiment with two imbalanced samples from China, the performance of CBCBR was compared with the classical CBR, a support vector machine, a logistic regression and a multivariate discriminant analysis. 
The results show that, compared with the other four methods, CBCBR performed significantly better in terms of sensitivity for identifying the minority samples while also achieving high total accuracy. The proposed approach makes CBR useful in imbalanced forecasting.
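
    The two-stage retrieval described in this abstract (nearest class centre first, then nearest neighbours inside that class) can be sketched as follows. Several choices here are assumptions rather than the authors' method: cluster memberships are supplied as input instead of being produced by hierarchical clustering, Euclidean distance stands in for the similarity measure, and the final label is a simple majority vote. The function names and data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def class_centres(cases, cluster_ids):
    """Mean feature vector of the stored cases in each clustered class."""
    groups = defaultdict(list)
    for (features, _label), cid in zip(cases, cluster_ids):
        groups[cid].append(features)
    return {cid: [sum(col) / len(rows) for col in zip(*rows)]
            for cid, rows in groups.items()}


def cbcbr_predict(target, cases, cluster_ids, k=3):
    """Retrieve the nearest class, then vote among its k nearest cases."""
    centres = class_centres(cases, cluster_ids)
    nearest_class = min(centres, key=lambda cid: math.dist(target, centres[cid]))
    members = [case for case, cid in zip(cases, cluster_ids) if cid == nearest_class]
    members.sort(key=lambda case: math.dist(target, case[0]))
    votes = Counter(label for _features, label in members[:k])
    return votes.most_common(1)[0][0]


# Invented cases: (features, label), with an assumed clustering into two classes.
cases = [
    ((0.0, 0.0), "non-failed"), ((0.2, 0.1), "non-failed"), ((0.1, 0.3), "non-failed"),
    ((5.0, 5.0), "failed"), ((5.2, 4.9), "failed"), ((4.8, 5.1), "non-failed"),
]
cluster_ids = [0, 0, 0, 1, 1, 1]
print(cbcbr_predict((5.1, 5.0), cases, cluster_ids))
```

    Restricting the neighbour search to one class is what lets minority cases dominate locally even when they are rare in the full case base.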

  7. A multivariate approach for the study of the environmental drivers of wine production structure

    NASA Astrophysics Data System (ADS)

    Lorenzetti, Romina; Costantini, Edoardo A. C.; Malorgio, Giulio

    2015-04-01

    Vitivinicultural "terroir" is a concept referring to an area in which the collective knowledge of the interactions between environment and vitivinicultural practices develops, providing distinctive characteristics to the products. The effect of the environmental components on the terroir has already been widely demonstrated. What has not yet been studied is their possible effect on the structure of wine production. Therefore, the aim of this work was to find out whether environmental drivers influence the wine production structure. This kind of investigation necessarily involves a change of scale towards wide territories. We used the Italian Denomination of Origin territories, which were grouped into Macro-areas (reference scale 1:500,000) with respect to geographic proximity, environmental features, viticultural affinity and tradition. The characterization of the structure of the wine transformation industry was based on the official data reported in the wine production declarations for the year 2008. Statistics were taken into account about general quantitative variables of wine farms, presence of associative forms, degree of vertical integration of wineries, quality orientation of wine producers, and acreage of vineyard. The environmental variables climate, soil, and vegetation vigour were selected for their direct influence on vine growing. A second set of variables was chosen to express the effect of land morphology on viticultural management. A third set was intended to discover the possible relationships between viticultural structures and land quality, such as the indexes of sensitivity to desertification, the soil resistance to water erosion, and land vulnerability. A PCA was carried out separately for the environmental and economic data to reduce the database dimensions. The new economic and environmental synthetic descriptors were used in three multivariate analyses: i) the correlation between economic and environmental descriptors through the non-parametric Spearman test; ii) a cluster analysis to group the Macro-areas into a few homogeneous economic structures; iii) a discriminant analysis of economic clusters and environmental factors, to highlight the environmental drivers of the different wine production structures. The cluster analysis identified six systems of production and organization. Climatic, pedoclimatic, morphological mean conditions and morphological heterogeneity of Macro-areas had the most important discriminant power over the clusters. The economic structures oriented toward large-scale production and those without a clear orientation were located in low hills and plains with Mediterranean climatic conditions. Lands at higher elevation and rougher morphology correlated with high quality products and structures, either made of small independent farms or cooperatives, in the highest cold wet areas, or large independent farms, on medium hills. In conclusion, for the first time it was proved that certain landscape characteristics have a significant influence on the typology of wine production structure. The results of these multivariate analyses suggest that pedo-climatic characteristics and landscape attributes can have a strategic role in the wine industry.
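
    The first of the three analyses named in this abstract, a Spearman rank correlation between economic and environmental descriptors, can be illustrated with a minimal standard-library sketch. The function names and the sample values are hypothetical and are not taken from the study; ties receive average ranks, which is the usual convention.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = shared
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rho: the Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical descriptor scores for five Macro-areas (illustrative only).
env_descriptor = [0.2, 1.5, -0.7, 2.1, 0.9]
econ_descriptor = [1.0, 2.2, 0.1, 3.5, 1.8]
print(spearman(env_descriptor, econ_descriptor))
```

    Because the statistic depends only on ranks, it captures monotonic association without assuming a linear relationship between the descriptors.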

  8. Counting the Homeless: A Previously Incalculable Tuberculosis Risk and Its Social Determinants

    PubMed Central

    Teeter, Larry D.; Musser, James M.; Graviss, Edward A.

    2013-01-01

    Tuberculosis (TB) surveillance among the homeless is not supported by the political will necessary for TB elimination. We merged the first stakeholder-accepted enumeration of homeless persons with existing surveillance data to assess TB risk among the homeless in Houston, Texas. The average incidence per 100 000 was 411 among homeless and 9.5 among housed persons. The homeless were more likely than the housed to be US-born, clustered, and in a larger-sized cluster. Multivariate analysis revealed that TB rates among the homeless were driven not by comorbidities but by social determinants. Homeless patients were hospitalized more days than the housed and required more follow-up time. Reporting of TB rates for populations with known health disparities could help reframe TB prevention and better target limited funds. PMID:23488504

  9. Differentiation of aflatoxigenic and non-aflatoxigenic strains of Aspergilli by FT-IR spectroscopy.

    PubMed

    Atkinson, Curtis; Pechanova, Olga; Sparks, Darrell L; Brown, Ashli; Rodriguez, Jose M

    2014-01-01

    Fourier transform infrared spectroscopy (FT-IR) is a well-established and widely accepted methodology to identify and differentiate diverse microbial species. In this study, FT-IR was used to differentiate 20 strains of ubiquitous and agronomically important phytopathogens of Aspergillus flavus and Aspergillus parasiticus. By analyzing their spectral profiles via principal component and cluster analysis, differentiation was achieved between the aflatoxin-producing and nonproducing strains of both fungal species. This study thus indicates that FT-IR coupled to multivariate statistics can rapidly differentiate strains of Aspergilli based on their toxigenicity.

  10. Recent TB transmission, clustering and predictors of large clusters in London, 2010–2012: results from first 3 years of universal MIRU-VNTR strain typing

    PubMed Central

    Hamblion, Esther L; Le Menach, Arnaud; Anderson, Laura F; Lalor, Maeve K; Brown, Tim; Abubakar, Ibrahim; Anderson, Charlotte; Maguire, Helen; Anderson, Sarah R

    2016-01-01

    Background: The incidence of TB has doubled in the last 20 years in London. A better understanding of risk groups for recent transmission is required to effectively target interventions. We investigated the molecular epidemiological characteristics of TB cases to estimate the proportion of cases due to recent transmission, and identify predictors for belonging to a cluster. Methods: The study population included all culture-positive TB cases in London residents, notified between January 2010 and December 2012, strain typed using 24-loci multiple interspersed repetitive units-variable number tandem repeats. Multivariable logistic regression analysis was performed to assess the risk factors for clustering using sociodemographic and clinical characteristics of cases and for cluster size based on the characteristics of the first two cases. Results: There were 10 147 cases of which 5728 (57%) were culture confirmed and 4790 isolates (84%) were typed. 2194 (46%) were clustered in 570 clusters, and the estimated proportion attributable to recent transmission was 34%. Clustered cases were more likely to be UK born, have pulmonary TB, a previous diagnosis, a history of substance abuse or alcohol abuse and imprisonment, be of white, Indian, black-African or Caribbean ethnicity. The time between notification of the first two cases was more likely to be <90 days in large clusters. Conclusions: Up to a third of TB cases in London may be due to recent transmission. Resources should be directed to the timely investigation of clusters involving cases with risk factors, particularly those with a short period between the first two cases, to interrupt onward transmission of TB. PMID:27417280
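
    The abstract does not state how the 34% estimate was derived. A common convention in molecular epidemiology is the "n minus one" method, which treats one case per cluster as the source and counts the remaining clustered cases as recent transmission. The sketch below assumes that convention and checks that it is consistent with the counts reported above.

```python
# Counts reported in the abstract.
typed_isolates = 4790
clustered_cases = 2194
clusters = 570

# Share of typed cases that belong to any cluster.
clustered_share = clustered_cases / typed_isolates

# Assumption: "n minus one" method, excluding one presumed source case per cluster.
recent_transmission_share = (clustered_cases - clusters) / typed_isolates

print(f"clustered: {clustered_share:.1%}")
print(f"recent transmission (n-1): {recent_transmission_share:.1%}")
```

    Under this assumption both values round to the percentages given in the abstract (46% and 34%).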

  11. Intercenter Differences in Bronchopulmonary Dysplasia or Death Among Very Low Birth Weight Infants

    PubMed Central

    Walsh, Michele; Bobashev, Georgiy; Das, Abhik; Levine, Burton; Carlo, Waldemar A.; Higgins, Rosemary D.

    2011-01-01

    OBJECTIVES: To determine (1) the magnitude of clustering of bronchopulmonary dysplasia (36 weeks) or death (the outcome) across centers of the Eunice Kennedy Shriver National Institute of Child and Human Development National Research Network, (2) the infant-level variables associated with the outcome and estimate their clustering, and (3) the center-specific practices associated with the differences and build predictive models. METHODS: Data on neonates with a birth weight of <1250 g from the cluster-randomized benchmarking trial were used to determine the magnitude of clustering of the outcome according to alternating logistic regression by using pairwise odds ratio and predictive modeling. Clinical variables associated with the outcome were identified by using multivariate analysis. The magnitude of clustering was then evaluated after correction for infant-level variables. Predictive models were developed by using center-specific and infant-level variables for data from 2001–2004 and projected to 2006. RESULTS: In 2001–2004, clustering of bronchopulmonary dysplasia/death was significant (pairwise odds ratio: 1.3; P < .001) and increased in 2006 (pairwise odds ratio: 1.6; overall incidence: 52%; range across centers: 32%–74%); center rates were relatively stable over time. Variables that varied according to center and were associated with increased risk of outcome included lower body temperature at NICU admission, use of prophylactic indomethacin, specific drug therapy on day 1, and lack of endotracheal intubation. Center differences remained significant even after correction for clustered variables. CONCLUSION: Bronchopulmonary dysplasia/death rates demonstrated moderate clustering according to center. Clinical variables associated with the outcome were also clustered. Center differences after correction of clustered variables indicate presence of as-yet unmeasured center variables. PMID:21149431

  12. Socioeconomic status (SES) and childhood acute myeloid leukemia (AML) mortality risk: Analysis of SEER data.

    PubMed

    Knoble, Naomi B; Alderfer, Melissa A; Hossain, Md Jobayer

    2016-10-01

    Socioeconomic status (SES) is a complex construct of multiple indicators, known to impact cancer outcomes, but has not been adequately examined among pediatric AML patients. This study aimed to identify the patterns of co-occurrence of multiple community-level SES indicators and to explore associations between various patterns of these indicators and pediatric AML mortality risk. A nationally representative US sample of 3651 pediatric AML patients, aged 0-19 years at diagnosis was drawn from 17 Surveillance, Epidemiology, and End Results (SEER) database registries created between 1973 and 2012. Factor analysis, cluster analysis, stratified univariable and multivariable Cox proportional hazards models were used. Four SES factors accounting for 87% of the variance in SES indicators were identified: F1) economic/educational disadvantage, less immigration; F2) immigration-related features (foreign-born, language-isolation, crowding), less mobility; F3) housing instability; and, F4) absence of moving. F1 and F3 showed elevated risk of mortality, adjusted hazards ratios (aHR) (95% CI): 1.07(1.02-1.12) and 1.05(1.00-1.10), respectively. Seven SES-defined cluster groups were identified. Cluster 1 (low economic/educational disadvantage, few immigration-related features, and residential-stability) showed the minimum risk of mortality. Compared to Cluster 1, Cluster 3 (high economic/educational disadvantage, high-mobility) and Cluster 6 (moderately-high economic/educational disadvantages, housing-instability and immigration-related features) exhibited substantially greater risk of mortality, aHR(95% CI)=1.19(1.0-1.4) and 1.23 (1.1-1.5), respectively. Factors of correlated SES-indicators and their pattern-based groups demonstrated differential risks in the pediatric AML mortality indicating the need of special public-health attention in areas with economic-educational disadvantages, housing-instability and immigration-related features. Copyright © 2016 Elsevier Ltd. 

  13. Neural Activity Patterns in the Human Brain Reflect Tactile Stickiness Perception.

    PubMed

    Kim, Junsuk; Yeon, Jiwon; Ryu, Jaekyun; Park, Jang-Yeon; Chung, Soon-Cheol; Kim, Sung-Phil

    2017-01-01

    Our previous human fMRI study found brain activations correlated with tactile stickiness perception using the uni-variate general linear model (GLM) (Yeon et al., 2017). Here, we conducted an in-depth investigation on neural correlates of sticky sensations by employing a multivoxel pattern analysis (MVPA) on the same dataset. In particular, we statistically compared multi-variate neural activities in response to the three groups of sticky stimuli: A supra-threshold group including a set of sticky stimuli that evoked vivid sticky perception; an infra-threshold group including another set of sticky stimuli that barely evoked sticky perception; and a sham group including acrylic stimuli with no physically sticky property. Searchlight MVPAs were performed to search for local activity patterns carrying neural information of stickiness perception. Similar to the uni-variate GLM results, significant multi-variate neural activity patterns were identified in postcentral gyrus, subcortical (basal ganglia and thalamus), and insula areas (insula and adjacent areas). Moreover, MVPAs revealed that activity patterns in posterior parietal cortex discriminated the perceptual intensities of stickiness, which was not present in the uni-variate analysis. Next, we applied a principal component analysis (PCA) to the voxel response patterns within identified clusters so as to find low-dimensional neural representations of stickiness intensities. Follow-up clustering analyses clearly showed separate neural grouping configurations between the Supra- and Infra-threshold groups. Interestingly, this neural categorization was in line with the perceptual grouping pattern obtained from the psychophysical data. Our findings thus suggest that different stickiness intensities would elicit distinct neural activity patterns in the human brain and may provide a neural basis for the perception and categorization of tactile stickiness.

  14. Quantitative effects of composting state variables on C/N ratio through GA-aided multivariate analysis.

    PubMed

    Sun, Wei; Huang, Guo H; Zeng, Guangming; Qin, Xiaosheng; Yu, Hui

    2011-03-01

    It is widely known that variation of the C/N ratio is dependent on many state variables during composting processes. This study attempted to develop a genetic algorithm aided stepwise cluster analysis (GASCA) method to describe the nonlinear relationships between the selected state variables and the C/N ratio in food waste composting. The experimental data from six bench-scale composting reactors were used to demonstrate the applicability of GASCA. Within the GASCA framework, GA searched optimal sets of both specified state variables and SCA's internal parameters; SCA established statistical nonlinear relationships between state variables and the C/N ratio; to avoid unnecessary and time-consuming calculation, a proxy table was introduced to save around 70% computational efforts. The obtained GASCA cluster trees had smaller sizes and higher prediction accuracy than the conventional SCA trees. Based on the optimal GASCA tree, the effects of the GA-selected state variables on the C/N ratio were ranged in a descending order as: NH₄+-N concentration>Moisture content>Ash Content>Mean Temperature>Mesophilic bacteria biomass. Such a rank implied that the variation of ammonium nitrogen concentration, the associated temperature and the moisture conditions, the total loss of both organic matters and available mineral constituents, and the mesophilic bacteria activity, were critical factors affecting the C/N ratio during the investigated food waste composting. This first application of GASCA to composting modelling indicated that more direct search algorithms could be coupled with SCA or other multivariate analysis methods to analyze complicated relationships during composting and many other environmental processes. Copyright © 2010 Elsevier B.V. All rights reserved.

  15. Revisiting the role of pathological analysis in transarterial chemoembolization-treated hepatocellular carcinoma after transplantation

    PubMed Central

    Vasuri, Francesco; Malvi, Deborah; Rosini, Francesca; Baldin, Pamela; Fiorentino, Michelangelo; Paccapelo, Alexandro; Ercolani, Giorgio; Pinna, Antonio Daniele; Golfieri, Rita; Morselli-Labate, Antonio Maria; Grigioni, Walter Franco; D’Errico-Grigioni, Antonia

    2014-01-01

    AIM: To define the histopathological features predictive of post-transplant hepatocellular carcinoma (HCC) recurrence after transarterial chemoembolization, applicable for recipient risk stratification. METHODS: We retrospectively reviewed the specimens of all suspicious nodules (total 275) from 101 consecutive liver transplant recipients that came to our Pathology Unit over a 6-year period. All nodules were sampled and analyzed, and follow-up data were collected. We finally considered 11 histological variables for each patient: total number of nodules, number of viable nodules, size of the major nodule, size of the major viable nodule, occurrence of microscopic vascular invasion, maximum Edmondson's grade, clear cell/sarcomatous changes, and the residual neoplastic volume. Survival data were computed by means of the Kaplan-Meier procedure and analyzed by means of the Cox proportional hazards model. Multivariate linear regression and a k-means cluster analysis were also used to compute the standardized histological score. RESULTS: The total number of nodules, the residual neoplastic volume (the total volume of all evaluated nodules minus the necrotic portion) and the microvascular invasion entered the Cox multivariate hazard model with HCC recurrence as dependent variable. The histological score was therefore computed, and a cluster analysis sorted recipients into 3 risk groups, with tumor recurrence rates of 3.3%, 18.5% and 53.8% and tumor-related mortality of 1.6%, 11.1% and 38.5%, respectively, at the end of follow-up. CONCLUSION: The histological score allows a reliable stratification of HCC recurrence risk, especially in those recipients found to be beyond the Milan criteria after orthotopic liver transplantation (OLT). PMID:25309084
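
    The grouping step described here, sorting recipients into three risk groups by k-means on a single score, can be sketched as follows. This is a minimal illustration of Lloyd's algorithm on scalar values, not the authors' implementation; the function name, the deterministic initialisation, and the scores are all hypothetical.

```python
def kmeans_1d(values, k=3, iterations=100):
    """Lloyd's algorithm on scalar scores; returns (centers, labels)."""
    ordered = sorted(values)
    # Deterministic start: pick k centers spread across the sorted scores.
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = []
    for _ in range(iterations):
        # Assignment step: each score joins its nearest center.
        labels = [min(range(k), key=lambda j: abs(v - centers[j])) for v in values]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            new_centers.append(sum(members) / len(members) if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


# Hypothetical standardized scores for nine recipients (illustrative only).
scores = [0.1, 0.2, 0.15, 1.0, 1.1, 0.9, 2.5, 2.7, 2.6]
centers, labels = kmeans_1d(scores, k=3)
print(labels)
```

    With a one-dimensional score the resulting groups are contiguous intervals, so the cluster labels translate directly into low, intermediate, and high risk bands.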

  16. Neural Activity Patterns in the Human Brain Reflect Tactile Stickiness Perception

    PubMed Central

    Kim, Junsuk; Yeon, Jiwon; Ryu, Jaekyun; Park, Jang-Yeon; Chung, Soon-Cheol; Kim, Sung-Phil

    2017-01-01

    Our previous human fMRI study found brain activations correlated with tactile stickiness perception using the uni-variate general linear model (GLM) (Yeon et al., 2017). Here, we conducted an in-depth investigation on neural correlates of sticky sensations by employing a multivoxel pattern analysis (MVPA) on the same dataset. In particular, we statistically compared multi-variate neural activities in response to the three groups of sticky stimuli: A supra-threshold group including a set of sticky stimuli that evoked vivid sticky perception; an infra-threshold group including another set of sticky stimuli that barely evoked sticky perception; and a sham group including acrylic stimuli with no physically sticky property. Searchlight MVPAs were performed to search for local activity patterns carrying neural information of stickiness perception. Similar to the uni-variate GLM results, significant multi-variate neural activity patterns were identified in postcentral gyrus, subcortical (basal ganglia and thalamus), and insula areas (insula and adjacent areas). Moreover, MVPAs revealed that activity patterns in posterior parietal cortex discriminated the perceptual intensities of stickiness, which was not present in the uni-variate analysis. Next, we applied a principal component analysis (PCA) to the voxel response patterns within identified clusters so as to find low-dimensional neural representations of stickiness intensities. Follow-up clustering analyses clearly showed separate neural grouping configurations between the Supra- and Infra-threshold groups. Interestingly, this neural categorization was in line with the perceptual grouping pattern obtained from the psychophysical data. Our findings thus suggest that different stickiness intensities would elicit distinct neural activity patterns in the human brain and may provide a neural basis for the perception and categorization of tactile stickiness. PMID:28936171

  17. Exploring public databases to characterize urban flood risks in Amsterdam

    NASA Astrophysics Data System (ADS)

    Gaitan, Santiago; ten Veldhuis, Marie-claire; van de Giesen, Nick

    2015-04-01

    Cities worldwide are challenged by increasing urban flood risks. Precise and realistic measures are required to decide upon investment to reduce their impacts. Obvious flooding factors affecting flood risk include sewer systems performance and urban topography. However, currently implemented sewer and topographic models do not provide realistic predictions of local flooding occurrence during heavy rain events. Assessing other factors such as spatially distributed rainfall and socioeconomic characteristics may help to explain probability and impacts of urban flooding. Several public databases were analyzed: complaints about flooding made by citizens; rainfall depths (15 min and 100 ha spatio-temporal resolution); grids describing number of inhabitants, income, and housing price (1 ha and 25 ha resolution); and building age. Data analysis was done using Python and GIS programming, and included spatial indexing of data, cluster analysis, and multivariate regression on the complaints. Complaints were used as a proxy to characterize flooding impacts. The cluster analysis, run for all the variables except the complaints, grouped part of the grid-cells of central Amsterdam into a highly differentiated group, covering 10% of the analyzed area, and accounting for 25% of registered complaints. The configuration of the analyzed variables in central Amsterdam coincides with a high complaint count. Remaining complaints were evenly dispersed among the other groups. An adjusted R2 of 0.38 in the multivariate regression suggests that explanatory power can improve if additional variables are considered. While rainfall intensity explained 4% of the incidence of complaints, population density and building age significantly explained around 20% each. Data mining of public databases proved to be a valuable tool to identify factors explaining variability in occurrence of urban pluvial flooding, though additional variables must be considered to fully explain flood risk variability.
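
    For readers unfamiliar with the adjusted R2 reported in this abstract, the sketch below shows the standard adjustment, which penalises the raw R2 for the number of predictors. The abstract gives neither the sample size nor the predictor count, so the inputs in the example call are hypothetical.

```python
def adjusted_r2(r2, n_obs, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    if n_obs - n_predictors - 1 <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Hypothetical inputs: 101 grid cells, 4 predictors, raw R^2 of 0.50.
print(round(adjusted_r2(0.50, 101, 4), 3))
```

    The adjustment matters most when many predictors are fitted to few observations; with large samples the adjusted and raw values nearly coincide.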

  18. Three subgroups of pain profiles identified in 227 women with arthritis: a latent class analysis.

    PubMed

    de Luca, Katie; Parkinson, Lynne; Downie, Aron; Blyth, Fiona; Byles, Julie

    2017-03-01

    The objectives were to identify subgroups of women with arthritis based upon the multi-dimensional nature of their pain experience and to compare health and socio-demographic variables between subgroups. A latent class analysis of 227 women with self-reported arthritis was used to identify clusters of women based upon the sensory, affective, and cognitive dimensions of the pain experience. Multivariate multinomial logistic regression analysis was used to determine the relationship between cluster membership and health and sociodemographic characteristics. A three-class cluster model was most parsimonious. 39.5 % of women had a unidimensional pain profile; 38.6 % of women had moderate multidimensional pain profile that included additional pain symptomatology such as sensory qualities and pain catastrophizing; and 21.9 % of women had severe multidimensional pain profile that included prominent pain symptomatology such as sensory and affective qualities of pain, pain catastrophizing, and neuropathic pain. Women with severe multidimensional pain profile have a 30.5 % higher risk of poorer quality of life and a 7.3 % higher risk of suffering depression, and women with moderate multidimensional pain profile have a 6.4 % higher risk of poorer quality of life when compared to women with unidimensional pain. This study identified three distinct subgroups of pain profiles in older women with arthritis. Women had very different experiences of pain, and cluster membership impacted significantly on health-related quality of life. These preliminary findings provide a stronger understanding of profiles of pain and may contribute to the development of tailored treatment options in arthritis.

  19. Psychological Factors Predict Local and Referred Experimental Muscle Pain: A Cluster Analysis in Healthy Adults

    PubMed Central

    Lee, Jennifer E.; Watson, David; Frey-Law, Laura A.

    2012-01-01

    Background: Recent studies suggest an underlying three- or four-factor structure explains the conceptual overlap and distinctiveness of several negative emotionality and pain-related constructs. However, the validity of these latent factors for predicting pain has not been examined. Methods: A cohort of 189 (99F; 90M) healthy volunteers completed eight self-report negative emotionality and pain-related measures (Eysenck Personality Questionnaire-Revised; Positive and Negative Affect Schedule; State-Trait Anxiety Inventory; Pain Catastrophizing Scale; Fear of Pain Questionnaire; Somatosensory Amplification Scale; Anxiety Sensitivity Index; Whiteley Index). Using principal axis factoring, three primary latent factors were extracted: General Distress; Catastrophic Thinking; and Pain-Related Fear. Using these factors, individuals clustered into three subgroups of high, moderate, and low negative emotionality responses. Experimental pain was induced via intramuscular acidic infusion into the anterior tibialis muscle, producing local (infusion site) and/or referred (anterior ankle) pain and hyperalgesia. Results: Pain outcomes differed between clusters (multivariate analysis of variance and multinomial regression), with individuals in the highest negative emotionality cluster reporting the greatest local pain (p = 0.05), mechanical hyperalgesia (pressure pain thresholds; p = 0.009) and greater odds (2.21 OR) of experiencing referred pain compared to the lowest negative emotionality cluster. Conclusion: Our results provide support for three latent psychological factors explaining the majority of the variance between several pain-related psychological measures, and that individuals in the high negative emotionality subgroup are at increased risk for (1) acute local muscle pain; (2) local hyperalgesia; and (3) referred pain using a standardized nociceptive input. PMID:23165778

  20. Bulk tank milk prevalence and production losses, spatial analysis, and predictive risk mapping of Ostertagia ostertagi infections in Mexican cattle herds.

    PubMed

    Villa-Mancera, Abel; Pastelín-Rojas, César; Olivares-Pérez, Jaime; Córdova-Izquierdo, Alejandro; Reynoso-Palomar, Alejandro

    2018-05-01

    This study investigated the prevalence, production losses, spatial clustering, and predictive risk mapping in different climate zones in five states of Mexico. The bulk tank milk samples obtained between January and April 2015 were analyzed for antibodies against Ostertagia ostertagi using the Svanovir ELISA. A total of 1204 farm owners or managers answered the questionnaire. The overall herd prevalence and mean optical density ratio (ODR) of parasite were 61.96% and 0.55, respectively. Overall, the production loss was approximately 0.542 kg of milk per parasited cow per day (mean ODR = 0.92, 142 farms, 11.79%). The spatial disease cluster analysis using SatScan software indicated that two high-risk clusters were observed. In the multivariable analysis, three models were tested for potential association with the ELISA results supported by climatic, environmental, and management factors. The final logistic regression model based on both climatic/environmental and management variables included the factors rainfall, elevation, land surface temperature (LST) day, and parasite control program that were significantly associated with an increased risk of infection. Geostatistical kriging was applied to generate a risk map for the presence of parasite in dairy cattle herds in Mexico. The results indicate that climatic and meteorological factors had a higher potential impact on the spatial distribution of O. ostertagi than the management factors.
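
    The percentages in this abstract can be cross-checked with simple arithmetic. The sketch below assumes that the 1204 surveyed farms are the denominator for both the overall herd prevalence and the 142-farm subgroup; the abstract does not state this explicitly, so the derived farm count is an inference rather than a reported figure.

```python
farms_total = 1204        # questionnaires reported in the abstract
herd_prevalence = 0.6196  # 61.96% overall herd prevalence
high_odr_farms = 142      # farms with mean ODR = 0.92

# Assumption: all 1204 farms form the denominator for both percentages.
positive_farms = round(farms_total * herd_prevalence)
high_odr_share = high_odr_farms / farms_total

print(positive_farms, f"{high_odr_share:.2%}")
```

    Under that assumption the subgroup share reproduces the 11.79% given in the abstract.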

  1. Investigation of cell wall composition related to stem lodging resistance in wheat (Triticum aestivum L.) by FTIR spectroscopy.

    PubMed

    Wang, Jian; Zhu, Jinmao; Huang, RuZhu; Yang, YuSheng

    2012-07-01

    We explored the rapid qualitative analysis of wheat cultivars with good lodging resistance by Fourier transform infrared (FTIR) spectroscopy and multivariate statistical analysis. FTIR imaging showed that wheat stem cell walls were mainly composed of cellulose, pectin, protein, and lignin. Principal components analysis (PCA) was used to eliminate multicollinearity among multiple peak absorptions. PCA revealed that the developmental internodes of wheat stems could be distributed from low to high along the load of the second principal component, which was consistent with the corresponding bands of cellulose in the FTIR spectra of the cell walls. Furthermore, four distinct stem populations could also be identified by spectral features related to their corresponding mechanical properties via PCA and cluster analysis. Histochemical staining of four types of wheat stems with various abilities to resist lodging revealed that cellulose contributed more than lignin to the ability to resist lodging. These results strongly suggested that the main cell wall component responsible for these differences was cellulose. Therefore, the combination of multivariate analysis and FTIR could rapidly screen wheat cultivars with good lodging resistance. Furthermore, the application of these methods to a much wider range of cultivars of unknown mechanical properties promises to be of interest.

  2. Characterization of the volatile components in green tea by IRAE-HS-SPME/GC-MS combined with multivariate analysis.

    PubMed

    Yang, Yan-Qin; Yin, Hong-Xu; Yuan, Hai-Bo; Jiang, Yong-Wen; Dong, Chun-Wang; Deng, Yu-Liang

    2018-01-01

    In the present work, a novel infrared-assisted extraction coupled to headspace solid-phase microextraction (IRAE-HS-SPME) followed by gas chromatography-mass spectrometry (GC-MS) was developed for rapid determination of the volatile components in green tea. The extraction parameters such as fiber type, sample amount, infrared power, extraction time, and infrared lamp distance were optimized by orthogonal experimental design. Under optimum conditions, a total of 82 volatile compounds in 21 green tea samples from different geographical origins were identified. Compared with classical water-bath heating, the proposed technique has remarkable advantages of considerably reducing the analytical time and high efficiency. In addition, an effective classification of green teas based on their volatile profiles was achieved by partial least square-discriminant analysis (PLS-DA) and hierarchical clustering analysis (HCA). Furthermore, the application of a dual criterion based on the variable importance in the projection (VIP) values of the PLS-DA models and on the category from one-way univariate analysis (ANOVA) allowed the identification of 12 potential volatile markers, which were considered to make the most important contribution to the discrimination of the samples. The results suggest that IRAE-HS-SPME/GC-MS technique combined with multivariate analysis offers a valuable tool to assess geographical traceability of different tea varieties.

  3. An analysis of fracture trace patterns in areas of flat-lying sedimentary rocks for the detection of buried geologic structure. [Kansas and Texas]

    NASA Technical Reports Server (NTRS)

    Podwysocki, M. H.

    1974-01-01

    Two study areas in a cratonic platform underlain by flat-lying sedimentary rocks were analyzed to determine if a quantitative relationship exists between fracture trace patterns and their frequency distributions and subsurface structural closures which might contain petroleum. Fracture trace lengths and frequency (number of fracture traces per unit area) were analyzed by trend surface analysis and length frequency distributions also were compared to a standard Gaussian distribution. Composite rose diagrams of fracture traces were analyzed using a multivariate analysis method which grouped or clustered the rose diagrams and their respective areas on the basis of the behavior of the rays of the rose diagram. Analysis indicates that the lengths of fracture traces are log-normally distributed according to the mapping technique used. Fracture trace frequency appeared higher on the flanks of active structures and lower around passive reef structures. Fracture trace log-mean lengths were shorter over several types of structures, perhaps due to increased fracturing and subsequent erosion. Analysis of rose diagrams using a multivariate technique indicated lithology as the primary control for the lower grouping levels. Groupings at higher levels indicated that areas overlying active structures may be isolated from their neighbors by this technique while passive structures showed no differences which could be isolated.

  4. Ripening-dependent metabolic changes in the volatiles of pineapple (Ananas comosus (L.) Merr.) fruit: II. Multivariate statistical profiling of pineapple aroma compounds based on comprehensive two-dimensional gas chromatography-mass spectrometry.

    PubMed

    Steingass, Christof Björn; Jutzi, Manfred; Müller, Jenny; Carle, Reinhold; Schmarr, Hans-Georg

    2015-03-01

    Ripening-dependent changes of pineapple volatiles were studied in a nontargeted profiling analysis. Volatiles were isolated via headspace solid phase microextraction and analyzed by comprehensive 2D gas chromatography and mass spectrometry (HS-SPME-GC×GC-qMS). Profile patterns presented in the contour plots were evaluated applying image processing techniques and subsequent multivariate statistical data analysis. Statistical methods comprised unsupervised hierarchical cluster analysis (HCA) and principal component analysis (PCA) to classify the samples. Supervised partial least squares discriminant analysis (PLS-DA) and partial least squares (PLS) regression were applied to discriminate different ripening stages and describe the development of volatiles during postharvest storage, respectively. Hereby, substantial chemical markers allowing for class separation were revealed. The workflow permitted the rapid distinction between premature green-ripe pineapples and postharvest-ripened sea-freighted fruits. Volatile profiles of fully ripe air-freighted pineapples were similar to those of green-ripe fruits postharvest ripened for 6 days after simulated sea freight export, after PCA with only two principal components. However, PCA considering also the third principal component allowed differentiation between air-freighted fruits and the four progressing postharvest maturity stages of sea-freighted pineapples.

  5. Characterization of the volatile components in green tea by IRAE-HS-SPME/GC-MS combined with multivariate analysis

    PubMed Central

    Yin, Hong-Xu; Yuan, Hai-Bo; Jiang, Yong-Wen; Dong, Chun-Wang; Deng, Yu-Liang

    2018-01-01

    In the present work, a novel infrared-assisted extraction coupled to headspace solid-phase microextraction (IRAE-HS-SPME) followed by gas chromatography-mass spectrometry (GC-MS) was developed for rapid determination of the volatile components in green tea. The extraction parameters such as fiber type, sample amount, infrared power, extraction time, and infrared lamp distance were optimized by orthogonal experimental design. Under optimum conditions, a total of 82 volatile compounds in 21 green tea samples from different geographical origins were identified. Compared with classical water-bath heating, the proposed technique has remarkable advantages of considerably reducing the analytical time and high efficiency. In addition, an effective classification of green teas based on their volatile profiles was achieved by partial least square-discriminant analysis (PLS-DA) and hierarchical clustering analysis (HCA). Furthermore, the application of a dual criterion based on the variable importance in the projection (VIP) values of the PLS-DA models and on the category from one-way univariate analysis (ANOVA) allowed the identification of 12 potential volatile markers, which were considered to make the most important contribution to the discrimination of the samples. The results suggest that IRAE-HS-SPME/GC-MS technique combined with multivariate analysis offers a valuable tool to assess geographical traceability of different tea varieties. PMID:29494626

  6. Determinants of HIV Phylogenetic Clustering in Chicago Among Young Black Men Who Have Sex With Men From the uConnect Cohort.

    PubMed

    Morgan, Ethan; Nyaku, Amesika N; DʼAquila, Richard T; Schneider, John A

    2017-07-01

    Phylogenetic analysis determines similarities among HIV genetic sequences from persons infected with HIV, identifying clusters of transmission. We determined characteristics associated with both membership in an HIV transmission cluster and the number of clustered sequences among a cohort of young black men who have sex with men (YBMSM) in Chicago. Pairwise genetic distances of HIV-1 pol sequences were collected during 2013-2016. Potential transmission ties were identified among HIV-infected persons whose sequences were ≤1.5% genetically distant. Putative transmission pairs were defined as ≥1 tie to another sequence. We then determined demographic and risk attributes associated with both membership in an HIV transmission cluster and the number of ties to the sequences from other persons in the cluster. Of 86 available sequences, 31 (36.0%) were tied to ≥1 other sequence. Through multivariable analyses, we determined that those who reported symptoms of depression and those who had a higher number of confidants in their network had significantly decreased odds of membership in transmission clusters. We found that those who had unstable housing and who reported heavy marijuana use had significantly more ties to other individuals within transmission clusters, whereas those identifying as bisexual, those participating in group sex, and those with higher numbers of sexual partners had significantly fewer ties. This study demonstrates the potential for combining phylogenetic and individual and network attributes to target HIV control efforts to persons with potentially higher transmission risk, as well as suggesting some unappreciated specific predictors of transmission risk among YBMSM in Chicago for future study.
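
    The tie-counting step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the sequence labels and distance values are invented, and only the 1.5% threshold and the "at least one tie" rule come from the abstract.

```python
# Hypothetical pairwise genetic distances (fraction of differing sites).
# Labels and values are invented for illustration only.
distances = {
    ("A", "B"): 0.009,
    ("A", "C"): 0.031,
    ("A", "D"): 0.052,
    ("B", "C"): 0.014,
    ("B", "D"): 0.047,
    ("C", "D"): 0.060,
}
THRESHOLD = 0.015  # <= 1.5% genetic distance, as stated in the abstract


def count_ties(pairwise, threshold):
    """Return {sequence: number of ties} for pairs at or below the threshold."""
    ties = {}
    for (a, b), d in pairwise.items():
        ties.setdefault(a, 0)
        ties.setdefault(b, 0)
        if d <= threshold:
            ties[a] += 1
            ties[b] += 1
    return ties


ties = count_ties(distances, THRESHOLD)
# A sequence counts as clustered when it has at least one tie.
clustered = sorted(seq for seq, n in ties.items() if n >= 1)
print(ties, clustered)
```

    In a design like this, cluster membership (tie count of at least 1) and the tie count itself become the two outcomes that could then be modelled against participant attributes.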

  7. Experimental variability and data pre-processing as factors affecting the discrimination power of some chemometric approaches (PCA, CA and a new algorithm based on linear regression) applied to (+/-)ESI/MS and RPLC/UV data: Application on green tea extracts.

    PubMed

    Iorgulescu, E; Voicu, V A; Sârbu, C; Tache, F; Albu, F; Medvedovici, A

    2016-08-01

    The influence of the experimental variability (instrumental repeatability, instrumental intermediate precision and sample preparation variability) and data pre-processing (normalization, peak alignment, background subtraction) on the discrimination power of multivariate data analysis methods (Principal Component Analysis -PCA- and Cluster Analysis -CA-) as well as a new algorithm based on linear regression was studied. Data used in the study were obtained through positive or negative ion monitoring electrospray mass spectrometry (+/-ESI/MS) and reversed phase liquid chromatography/UV spectrometric detection (RPLC/UV) applied to green tea extracts. Extractions in ethanol and heated water infusion were used as sample preparation procedures. The multivariate methods were directly applied to mass spectra and chromatograms, involving strictly a holistic comparison of shapes, without assignment of any structural identity to compounds. An alternative data interpretation based on linear regression analysis mutually applied to data series is also discussed. Slopes, intercepts and correlation coefficients produced by the linear regression analysis applied on pairs of very large experimental data series successfully retain information resulting from high frequency instrumental acquisition rates, obviously better defining the profiles being compared. Consequently, each type of sample or comparison between samples produces in the Cartesian space an ellipsoidal volume defined by the normal variation intervals of the slope, intercept and correlation coefficient. Distances between volumes graphically illustrates (dis)similarities between compared data. The instrumental intermediate precision had the major effect on the discrimination power of the multivariate data analysis methods. 
    Mass spectra produced through ionization from liquid state in atmospheric pressure conditions of bulk complex mixtures resulting from extracted materials of natural origins provided an excellent data basis for multivariate analysis methods, equivalent to data resulting from chromatographic separations. The alternative evaluation of very large data series based on linear regression analysis produced information equivalent to results obtained through application of PCA and CA. Copyright © 2016 Elsevier B.V. All rights reserved.
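
    The regression-based comparison of two data series mentioned above can be illustrated with a short sketch. It assumes ordinary least squares of one profile on another and is not the authors' algorithm; the function name `regress` and the sample values are hypothetical.

```python
import math


def regress(x, y):
    """Ordinary least-squares fit of y on x.

    Returns (slope, intercept, r), where r is the Pearson correlation.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have equal length of at least 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("a constant series has no defined slope or correlation")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


# Invented intensity profiles standing in for two spectra or chromatograms.
reference = [1.0, 2.0, 3.0, 4.0, 5.0]
sample = [2.1, 4.0, 6.2, 7.9, 10.1]
slope, intercept, r = regress(reference, sample)
print(slope, intercept, r)
```

    Each pairwise comparison then yields one point (slope, intercept, r); repeated measurements of the same sample type would scatter around a common region of that three-dimensional space.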

  8. Exploring the effects of climatic variables on monthly precipitation variation using a continuous wavelet-based multiscale entropy approach.

    PubMed

    Roushangar, Kiyoumars; Alizadeh, Farhad; Adamowski, Jan

    2018-08-01

    Understanding precipitation on a regional basis is an important component of water resources planning and management. The present study outlines a methodology based on continuous wavelet transform (CWT) and multiscale entropy (CWME), combined with self-organizing map (SOM) and k-means clustering techniques, to measure and analyze the complexity of precipitation. Historical monthly precipitation data from 1960 to 2010 at 31 rain gauges across Iran were preprocessed by CWT. The multi-resolution CWT approach segregated the major features of the original precipitation series by unfolding the structure of the time series which was often ambiguous. The entropy concept was then applied to components obtained from CWT to measure dispersion, uncertainty, disorder, and diversification of subcomponents. Based on different validity indices, k-means clustering captured homogenous areas more accurately, and additional analysis was performed based on the outcome of this approach. The 31 rain gauges in this study were clustered into 6 groups, each one having a unique CWME pattern across different time scales. The results of clustering showed that hydrologic similarity (multiscale variation of precipitation) was not based on geographic contiguity. According to the pattern of entropy across the scales, each cluster was assigned an entropy signature that provided an estimation of the entropy pattern of precipitation data in each cluster. Based on the pattern of mean CWME for each cluster, a characteristic signature was assigned, which provided an estimation of the CWME of a cluster across scales of 1-2, 3-8, and 9-13 months relative to other stations. The validity of the homogeneous clusters demonstrated the usefulness of the proposed approach to regionalize precipitation. 
    Further analysis based on wavelet coherence (WTC) was performed by selecting central rain gauges in each cluster and analyzing them against temperature, wind, the Multivariate ENSO Index (MEI), and the East Atlantic (EA) and North Atlantic Oscillation (NAO) indices. The results revealed that all climatic features except NAO influenced precipitation in Iran during the 1960-2010 period. Copyright © 2018 Elsevier Inc. All rights reserved.

  9. Multivariate analysis as a key tool in chemotaxonomy of brinjal eggplant, African eggplants and wild related species.

    PubMed

    Haliński, Łukasz P; Samuels, John; Stepnowski, Piotr

    2017-12-01

    The brinjal eggplant (Solanum melongena L.) is an important vegetable species worldwide, while African eggplants (S. aethiopicum L., S. macrocarpon L.) are indigenous vegetable species of local significance. Taxonomy of eggplants and their wild relatives is complicated and still unclear. Hence, the objective of the study was to clarify taxonomic position of cultivars and landraces of brinjal, its wild relatives and African eggplant species and their wild ancestors using chemotaxonomic markers and multivariate analysis techniques for data processing, with special attention paid to the recognition of markers characteristic for each group of the plants. The total of 34 accessions belonging to 9 species from genus Solanum L. were used in the study. Chemotaxonomic analysis was based on the profiles of cuticular n-alkanes and methylalkanes, obtained using gas chromatography-mass spectrometry and gas chromatography with flame ionization detector. Standard hierarchical cluster analysis (HCA) and principal component analysis (PCA) were used for the classification, while the latter and two-way HCA allowed to identify markers responsible for the clustering of the species. Cultivars, landraces and wild forms of S. melongena were practically identical in terms of their taxonomic position. The results confirmed high and statistically significant distinctiveness of all African eggplant species from the brinjal eggplant. The latter was characterized mostly by abundant long chain hydrocarbons in the range of 34-37 carbon atoms. The differences between both African eggplant species were, however, also statistically significant; S. aethiopicum displayed the highest contribution of 2-methylalkanes to the total cuticular hydrocarbons, while S. macrocarpon was characterized by elevated n-alkanes in the range of 25-32 carbon atoms. Wild ancestors of both African eggplant species were identical with their cultivated relatives. 
    In conclusion, the high usefulness of the chemotaxonomic approach in the classification of this important group of plants was confirmed. Copyright © 2017 Elsevier Ltd. All rights reserved.

  10. Phenotypic mapping of metabolic profiles using self-organizing maps of high-dimensional mass spectrometry data.

    PubMed

    Goodwin, Cody R; Sherrod, Stacy D; Marasco, Christina C; Bachmann, Brian O; Schramm-Sapyta, Nicole; Wikswo, John P; McLean, John A

    2014-07-01

    A metabolic system is composed of inherently interconnected metabolic precursors, intermediates, and products. The analysis of untargeted metabolomics data has conventionally been performed through the use of comparative statistics or multivariate statistical analysis-based approaches; however, each falls short in representing the related nature of metabolic perturbations. Herein, we describe a complementary method for the analysis of large metabolite inventories using a data-driven approach based upon a self-organizing map algorithm. This workflow allows for the unsupervised clustering, and subsequent prioritization of, correlated features through Gestalt comparisons of metabolic heat maps. We describe this methodology in detail, including a comparison to conventional metabolomics approaches, and demonstrate the application of this method to the analysis of the metabolic repercussions of prolonged cocaine exposure in rat sera profiles.

  11. Large-scale Granger causality analysis on resting-state functional MRI

    NASA Astrophysics Data System (ADS)

    D'Souza, Adora M.; Abidin, Anas Zainul; Leistritz, Lutz; Wismüller, Axel

    2016-03-01

    We demonstrate an approach to measure the information flow between each pair of time series in resting-state functional MRI (fMRI) data of the human brain and subsequently recover its underlying network structure. By integrating dimensionality reduction into predictive time series modeling, large-scale Granger Causality (lsGC) analysis method can reveal directed information flow suggestive of causal influence at an individual voxel level, unlike other multivariate approaches. This method quantifies the influence each voxel time series has on every other voxel time series in a multivariate sense and hence contains information about the underlying dynamics of the whole system, which can be used to reveal functionally connected networks within the brain. To identify such networks, we perform non-metric network clustering, such as accomplished by the Louvain method. We demonstrate the effectiveness of our approach to recover the motor and visual cortex from resting state human brain fMRI data and compare it with the network recovered from a visuomotor stimulation experiment, where the similarity is measured by the Dice Coefficient (DC). The best DC obtained was 0.59 implying a strong agreement between the two networks. In addition, we thoroughly study the effect of dimensionality reduction in lsGC analysis on network recovery. We conclude that our approach is capable of detecting causal influence between time series in a multivariate sense, which can be used to segment functionally connected networks in the resting-state fMRI.
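
    For reference, the Dice coefficient used above to compare two recovered networks is 2|A ∩ B| / (|A| + |B|). The sketch below applies it to sets of voxel indices; the index values are invented, and returning 1.0 for two empty sets is a convention assumed here rather than something stated in the abstract.

```python
def dice_coefficient(a, b):
    """Dice coefficient between two collections of voxel indices."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2 * len(a & b) / (len(a) + len(b))


# Hypothetical voxel index sets for a resting-state and a task-based network.
resting_network = {1, 2, 3, 4, 5}
task_network = {4, 5, 6, 7}
print(dice_coefficient(resting_network, task_network))
```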

  12. Strategies to optimize monitoring schemes of recreational waters from Salta, Argentina: a multivariate approach

    PubMed Central

    Gutiérrez-Cacciabue, Dolores; Teich, Ingrid; Poma, Hugo Ramiro; Cruz, Mercedes Cecilia; Balzarini, Mónica; Rajal, Verónica Beatriz

    2014-01-01

    Several recreational surface waters in Salta, Argentina, were selected to assess their quality. Seventy percent of the measurements exceeded at least one of the limits established by international legislation, making the waters unsuitable for their use. To interpret results of complex data, multivariate techniques were applied. Arenales River, due to the variability observed in the data, was divided in two: upstream and downstream, representing low and high pollution sites, respectively; and Cluster Analysis supported that differentiation. Arenales River downstream and Campo Alegre Reservoir were the most different environments and Vaqueros and La Caldera Rivers were the most similar. Canonical Correlation Analysis allowed exploration of correlations between physicochemical and microbiological variables except in both parts of Arenales River, and Principal Component Analysis allowed finding relationships among the 9 measured variables in all aquatic environments. Variable loadings showed that Arenales River downstream was impacted by industrial and domestic activities, Arenales River upstream was affected by agricultural activities, Campo Alegre Reservoir was disturbed by anthropogenic and ecological effects, and La Caldera and Vaqueros Rivers were influenced by recreational activities. Discriminant Analysis allowed identification of the subgroup of variables responsible for seasonal and spatial variations. Enterococcus, dissolved oxygen, conductivity, E. coli, pH, and fecal coliforms are sufficient to spatially describe the quality of the aquatic environments. Regarding seasonal variations, dissolved oxygen, conductivity, fecal coliforms, and pH can be used to describe water quality during the dry season, while dissolved oxygen, conductivity, total coliforms, E. coli, and Enterococcus can be used during the wet season. Thus, the use of multivariate techniques allowed optimizing monitoring tasks and minimizing the costs involved. PMID:25190636

  13. Complex codon usage pattern and compositional features of retroviruses.

    PubMed

    RoyChoudhury, Sourav; Mukherjee, Debaprasad

    2013-01-01

    Retroviruses infect a wide range of organisms including humans. Among them, HIV-1, which causes AIDS, has now become a major threat for world health. Some of these viruses are also potential gene transfer vectors. In this study, the patterns of synonymous codon usage in retroviruses have been studied through multivariate statistical methods on ORF sequences from the 56 available retroviruses. The principal determinant for evolution of the codon usage pattern in retroviruses seemed to be compositional constraints, while selection for translation of the viral genes plays a secondary role. This was further supported by multivariate analysis of relative synonymous codon usage. Thus, it seems that mutational bias might have played a dominant role over translational selection in shaping the codon usage of retroviruses. Codon adaptation index was used to identify translationally optimal codons among genes from retroviruses. The comparative analysis of the preferred and optimal codons among different retroviral groups revealed that four codons, GAA, AAA, AGA, and GGA, were significantly more frequent in most of the retroviral genes in spite of some differences. Cluster analysis also revealed that phylogenetically related groups of retroviruses have probably evolved their codon usage in a concerted manner under the influence of their nucleotide composition.
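
    Relative synonymous codon usage (RSCU), mentioned above, is commonly defined as the observed count of a codon divided by the mean count of all synonymous codons for the same amino acid. The sketch below computes it for a single synonymous family; the counts are invented and the function name `rscu` is illustrative, not taken from the study.

```python
def rscu(counts):
    """RSCU for ONE synonymous codon family.

    counts maps each codon of the family to its observed count.
    A value above 1 means the codon is used more often than expected
    under equal usage of the synonymous codons.
    """
    total = sum(counts.values())
    if total == 0:
        return {codon: 0.0 for codon in counts}
    mean_count = total / len(counts)
    return {codon: n / mean_count for codon, n in counts.items()}


# Hypothetical lysine codon counts (AAA and AAG both encode lysine).
lysine_counts = {"AAA": 30, "AAG": 10}
print(rscu(lysine_counts))
```

    A table of such values across all codon families, one row per gene or genome, is the kind of matrix on which multivariate and cluster analyses of codon usage are typically run.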

  14. Assessment of changes of vector borne diseases with wetland characteristics using multivariate analysis.

    PubMed

    Sheela, A M; Sarun, S; Justus, J; Vineetha, P; Sheeja, R V

    2015-04-01

    Vector borne diseases are a threat to human health. Little attention has been paid to the prevention of these diseases. We attempted to identify the significant wetland characteristics associated with the spread of chikungunya, dengue fever and malaria in Kerala, a tropical region of South West India, using multivariate analyses (hierarchical cluster analysis, factor analysis and multiple regression). High/medium turbid coastal lagoons and inland water-logged wetlands with aquatic vegetation have a significant effect on the incidence of chikungunya, while dengue is influenced by high turbid coastal beaches and malaria by medium turbid coastal beaches. The high turbidity in water is due to urban waste discharge, namely sewage, sullage and garbage from the densely populated cities and towns. The large extent of wetland in low-lying areas favours the occurrence of vector borne diseases. Hence the provision of pollution control measures at source, including soil erosion control measures, is vital. The identification of vulnerable zones favouring the vector borne diseases will help the authorities to control pollution, especially from urban areas, and prevent these vector borne diseases. Future research should cover land use cover changes, climatic factors, seasonal variations in weather and pollution factors favouring the occurrence of vector borne diseases.

  15. Multivariate Analysis of Conformational Changes Induced by Macromolecular Interactions

    NASA Astrophysics Data System (ADS)

    Mitra, Indranil; Alexov, Emil

    2009-11-01

    Understanding protein-protein binding and associated conformational changes is critical for both understanding the thermodynamics of protein interactions and successful drug discovery. Our study focuses on computational analysis of plausible correlations between induced conformational changes and a set of biophysical characteristics of interacting monomers. It was done by comparing 3D structures of unbound and bound monomers to calculate the RMSD, which is used as a measure of the structural change induced by the binding. We correlate RMSD with the volumetric and interfacial charge of the monomers, the amino acid composition, the energy of binding, and the type of amino acids at the interface as predictors. The data set was analyzed with an SVM in R and SPSS, trained on a combination of a new robust evolutionary conservation signal with the monomeric properties to predict the induced RMSD. The goal of this study is to apply parametric tests and hierarchical cluster and discriminant multivariate analysis to find key predictors, which will be used to develop an algorithm to predict the magnitude of conformational changes from the structure of the interacting monomers. Results indicate that the most promising predictor is the net charge of the monomers; however, other parameters such as the type of amino acids at the interface contribute significantly as well.

  16. Variation of Water Quality Parameters with Siltation Depth for River Ichamati Along International Border with Bangladesh Using Multivariate Statistical Techniques

    NASA Astrophysics Data System (ADS)

    Roy, P. K.; Pal, S.; Banerjee, G.; Biswas Roy, M.; Ray, D.; Majumder, A.

    2014-12-01

    Rivers are considered one of the main sources of freshwater all over the world. Hence the analysis and maintenance of this water resource is globally considered a matter of major concern. This paper deals with the assessment of surface water quality of the Ichamati river using multivariate statistical techniques. Eight distinct surface water quality observation stations were located and samples were collected. Statistical techniques were applied to the physico-chemical parameters and depth of siltation of the collected samples. Cluster analysis was used to determine the relations between surface water quality and siltation depth of the Ichamati river. Multiple regression and mathematical equation modeling were used to characterize the surface water quality of the Ichamati river on the basis of physico-chemical parameters. It was found that the surface water quality of the downstream river was different from the water quality upstream. The analysis of the water quality parameters of the Ichamati river clearly indicates a high pollution load on the river water, which can be attributed to agricultural discharge, tidal effects and soil erosion. The results further reveal that water quality degraded as the depth of siltation increased.

  17. AR-V7 in circulating tumor cells cluster as a predictive biomarker of abiraterone acetate and enzalutamide treatment in castration-resistant prostate cancer patients.

    PubMed

    Okegawa, Takatsugu; Ninomiya, Naoki; Masuda, Kazuki; Nakamura, Yu; Tambo, Mitsuhiro; Nutahara, Kikuo

    2018-06-01

    We examined whether androgen receptor splice variant 7 (AR-V7) in circulating tumor cell (CTC) clusters can be used to predict survival in patients with bone metastatic castration-resistant prostate cancer (mCRPC) treated with abiraterone or enzalutamide. We retrospectively enrolled 98 patients with CRPC on abiraterone or enzalutamide, and investigated the prognostic value of CTC cluster detection (+ v -) and AR-V7 detection (+ v -) using a CTC cluster detection-based AR-V7 mRNA assay. We examined ≤50% prostate-specific antigen (PSA) responses, PSA progression-free survival (PSA-PFS), clinical and radiological progression-free survival (radiologic PFS), and overall survival (OS). We then assessed whether AR-V7 expression in CTC clusters identified after On-chip multi-imaging flow cytometry was related to disease progression and survival after first-line systemic therapy. All abiraterone-treated or enzalutamide-treated patients received prior docetaxel. The median follow-up was 20.7 (range: 3.0-37.0) months in the abiraterone and enzalutamide cohorts, respectively. Forty-nine of the 98 men (50.0%) were CTC cluster(-), 23 of the 98 men (23.5%) were CTC cluster(+)/AR-V7(-), and 26 of the 98 men (26.5%) were CTC cluster(+)/AR-V7(+). CTC cluster(+)/AR-V7(+) patients were more likely to have EOD ≥3 at diagnosis (P = 0.003), pain (P = 0.023), higher alkaline phosphatase levels (P < 0.001), and visceral metastases (P < 0.001). On multivariable analysis, pretherapy CTC cluster(+), CTC cluster(+)/AR-V7(-), and ALP >UNL were independently associated with a poor PSA-PFS, radiographic PFS, and OS in abiraterone-treated patients and enzalutamide-treated patients. The CTC clusters and AR-V7-positive CTC clusters detected were important for assessing the response to abiraterone or enzalutamide therapy and for predicting disease outcome. © 2018 Wiley Periodicals, Inc.

  18. Surnames in Honduras: A study of the population of Honduras through isonymy.

    PubMed

    Herrera Paz, Edwin Francisco; Scapoli, Chiara; Mamolini, Elisabetta; Sandri, Massimo; Carrieri, Alberto; Rodriguez-Larralde, Alvaro; Barrai, Italo

    2014-05-01

    In this work, we investigated surname distribution in 4,348,021 Honduran electors with the aim of detecting population structure through the study of isonymy in three administrative levels: the whole nation, the 18 departments, and the 298 municipalities. For each administrative level, we studied the surname effective number, α, the total inbreeding, FIT, the random inbreeding, FST, and the local inbreeding, FIS. Principal components analysis, multidimensional scaling, and cluster analysis were performed on Lasker's distance matrix to detect the direction of surname diffusion and for a graphic representation of the surname relationship between different locations. The values of FIT, FST, and FIS display a variation of random inbreeding between the administrative levels in the Honduras population, which is attributed to the "Prefecture effect." Multivariate analyses of department data identified two main clusters, one south-western and the second north-eastern, with the Bay Islands and the eastern Gracias a Dios out of the main clusters. The results suggest that currently the population structure of this country is the result of the joint action of short-range directional migration and drift, with drift dominating over migration, and that population diffusion may have taken place mainly in the NW-SE direction. © 2014 John Wiley & Sons Ltd/University College London.

  19. Molecular subtyping of bladder cancer using Kohonen self-organizing maps

    PubMed Central

    Borkowska, Edyta M; Kruk, Andrzej; Jedrzejczyk, Adam; Rozniecki, Marek; Jablonowski, Zbigniew; Traczyk, Magdalena; Constantinou, Maria; Banaszkiewicz, Monika; Pietrusinski, Michal; Sosnowski, Marek; Hamdy, Freddie C; Peter, Stefan; Catto, James WF; Kaluzewski, Bogdan

    2014-01-01

    Kohonen self-organizing maps (SOMs) are unsupervised Artificial Neural Networks (ANNs) that are good for low-density data visualization. They easily deal with complex and nonlinear relationships between variables. We evaluated molecular events that characterize high- and low-grade BC pathways in the tumors from 104 patients. We compared the ability of statistical clustering with a SOM to stratify tumors according to the risk of progression to more advanced disease. In univariable analysis, tumor stage (log rank P = 0.006) and grade (P < 0.001), HPV DNA (P < 0.004), Chromosome 9 loss (P = 0.04) and the A148T polymorphism (rs 3731249) in CDKN2A (P = 0.02) were associated with progression. Multivariable analysis of these parameters identified that tumor grade (Cox regression, P = 0.001, OR 2.9 (95% CI 1.6–5.2)) and the presence of HPV DNA (P = 0.017, OR 3.8 (95% CI 1.3–11.4)) were the only independent predictors of progression. Unsupervised hierarchical clustering grouped the tumors into discrete branches but did not stratify according to progression free survival (log rank P = 0.39). These genetic variables were presented to SOM input neurons. SOMs are suitable for complex data integration, allow easy visualization of outcomes, and may stratify BC progression more robustly than hierarchical clustering. PMID:25142434

  20. Variety identification of brown sugar using short-wave near infrared spectroscopy and multivariate calibration

    NASA Astrophysics Data System (ADS)

    Yang, Haiqing; Wu, Di; He, Yong

    2007-11-01

    Near-infrared spectroscopy (NIRS) with the characteristics of high speed, non-destructiveness, high precision and reliable detection data, etc. is a pollution-free, rapid, quantitative and qualitative analysis method. A new approach for variety discrimination of brown sugars using short-wave NIR spectroscopy (800-1050nm) was developed in this work. The relationship between the absorbance spectra and brown sugar varieties was established. The spectral data were compressed by the principal component analysis (PCA). The resulting features can be visualized in principal component (PC) space, which can lead to discovery of structures correlative with the different class of spectral samples. It appears to provide a reasonable variety clustering of brown sugars. The 2-D PCs plot obtained using the first two PCs can be used for the pattern recognition. Least-squares support vector machines (LS-SVM) was applied to solve the multivariate calibration problems in a relatively fast way. The work has shown that short-wave NIR spectroscopy technique is available for the brand identification of brown sugar, and LS-SVM has the better identification ability than PLS when the calibration set is small.

  1. Comparative multivariate analysis of biometric traits of West African Dwarf and Red Sokoto goats.

    PubMed

    Yakubu, Abdulmojeed; Salako, Adebowale E; Imumorin, Ikhide G

    2011-03-01

    The population structure of 302 randomly selected West African Dwarf (WAD) and Red Sokoto (RS) goats was examined using multivariate morphometric analyses. This was to make the case for conservation, rational management and genetic improvement of these two most important Nigerian goat breeds. Fifteen morphometric measurements were made on each individual animal. RS goats were superior (P<0.05) to the WAD for the body size and skeletal proportions investigated. The phenotypic variability between the two breeds was revealed by their mutual responses in the principal components. While four principal components were extracted for WAD goats, three components were obtained for their RS counterparts with variation in the loading traits of each component for each breed. The Mahalanobis distance of 72.28 indicated a high degree of spatial racial separation in morphology between the genotypes. The Ward's option of the cluster analysis consolidated the morphometric distinctness of the two breeds. Application of selective breeding to genetic improvement would benefit from the detected phenotypic differentiation. Other implications for management and conservation of the goats are highlighted.
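
    The Mahalanobis distance reported above measures the separation between two group means relative to their covariance. The sketch below restricts itself to two variables so that the covariance matrix can be inverted by hand. It is an illustration under stated assumptions: the trait means and pooled covariance are invented, and whether the abstract's value of 72.28 refers to the distance or its square is not stated in the source.

```python
import math


def mahalanobis_2d(mean1, mean2, cov):
    """Mahalanobis distance between two 2-variable group means.

    cov is a symmetric 2x2 (pooled) covariance matrix given as
    ((a, b), (b, c)); its inverse is computed in closed form.
    """
    (a, b), (b2, c) = cov
    det = a * c - b * b2
    if det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    d1 = mean1[0] - mean2[0]
    d2 = mean1[1] - mean2[1]
    # d^T S^-1 d with S^-1 = (1/det) * ((c, -b), (-b2, a))
    d_squared = (c * d1 * d1 - (b + b2) * d1 * d2 + a * d2 * d2) / det
    return math.sqrt(d_squared)


# Invented means of two body measurements (cm) for two breeds,
# with an invented pooled covariance matrix.
breed_a = (50.0, 60.0)
breed_b = (62.0, 75.0)
pooled_cov = ((9.0, 3.0), (3.0, 16.0))
print(mahalanobis_2d(breed_a, breed_b, pooled_cov))
```

    With an identity covariance matrix the function reduces to the ordinary Euclidean distance, which is a quick sanity check on the formula.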

  2. Dittrichia graveolens (L.) Greuter Essential Oil: Chemical Composition, Multivariate Analysis, and Antimicrobial Activity.

    PubMed

    Mitic, Violeta; Stankov Jovanovic, Vesna; Ilic, Marija; Jovanovic, Olga; Djordjevic, Aleksandra; Stojanovic, Gordana

    2016-01-01

    The chemical composition and in vitro antimicrobial activities of Dittrichia graveolens (L.) Greuter essential oil were studied. Moreover, using agglomerative hierarchical cluster (AHC) and principal component analyses (PCA), the interrelationships of the D. graveolens essential-oil profiles characterized so far (including the sample from this study) were investigated. To evaluate the chemical composition of the essential oil, GC-FID and GC/MS analyses were performed. Altogether, 54 compounds were identified, accounting for 92.9% of the total oil composition. The D. graveolens oil belongs to the monoterpenoid chemotype, with monoterpenoids comprising 87.4% of the totally identified compounds. The major components were borneol (43.6%) and bornyl acetate (38.3%). Multivariate analysis showed that the compounds borneol and bornyl acetate exerted the greatest influence on the spatial differences in the composition of the reported oils. The antimicrobial activity against five bacterial and one fungal strain was determined using a disk-diffusion assay. The studied essential oil was active only against Gram-positive bacteria. Copyright © 2016 Verlag Helvetica Chimica Acta AG, Zürich.

  3. Descriptor selection for banana accessions based on univariate and multivariate analysis.

    PubMed

    Brandão, L P; Souza, C P F; Pereira, V M; Silva, S O; Santos-Serejo, J A; Ledo, C A S; Amorim, E P

    2013-05-14

    Our objective was to establish a minimum number of morphological descriptors for the characterization of banana germplasm and evaluate the efficiency of removal of redundant characters, based on univariate and multivariate statistical analyses. Phenotypic characterization was made of 77 accessions from Bahia, Brazil, using 92 descriptors. The selection of the descriptors was carried out by principal components analysis (quantitative) and by entropy (multi-category). Efficiency of elimination was analyzed by a comparative study between the clusters formed, taking into consideration all 92 descriptors and smaller groups. The selected descriptors were analyzed with the Ward-MLM procedure and a combined matrix formed by the Gower algorithm. We were able to reduce the number of descriptors used for characterizing the banana germplasm (42%). The correlation between the matrices considering the 92 descriptors and the selected ones was 0.82, showing that the reduction in the number of descriptors did not influence estimation of genetic variability between the banana accessions. We conclude that removing these descriptors caused no loss of information, considering the groups formed from pre-established criteria, including subgroup/subspecies.
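
    The abstract mentions a combined matrix built with the Gower algorithm for mixed quantitative and multi-category descriptors. A minimal sketch of a Gower-type distance between two accessions follows, assuming the usual definition (range-scaled absolute difference for numeric descriptors, simple mismatch for categorical ones, averaged over descriptors). The descriptor names and values are hypothetical, and the authors' actual software and weighting are not stated in the abstract.

```python
def gower_distance(rec_a, rec_b, kinds, ranges):
    """Gower-type distance between two records with mixed descriptors.

    kinds[k] is "num" or "cat"; ranges[k] is the observed range of
    numeric descriptor k across all accessions (ignored for "cat").
    """
    total = 0.0
    for a, b, kind, rng in zip(rec_a, rec_b, kinds, ranges):
        if kind == "num":
            # Range-scaled absolute difference, in [0, 1].
            total += abs(a - b) / rng if rng else 0.0
        else:
            # Simple mismatch for multi-category descriptors.
            total += 0.0 if a == b else 1.0
    return total / len(kinds)

# Hypothetical accessions: (bunch weight in kg, leaf colour, bunch position).
kinds = ("num", "cat", "cat")
ranges = (20.0, None, None)
acc_1 = (10.0, "green", "erect")
acc_2 = (15.0, "green", "drooping")
print(gower_distance(acc_1, acc_2, kinds, ranges))
```

    Computing this distance for every pair of accessions yields the kind of dissimilarity matrix that a hierarchical method such as Ward's can then cluster.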

  4. Sustainable microbial water quality monitoring programme design using phage-lysis and multivariate techniques.

    PubMed

    Nnane, Daniel Ekane

    2011-11-15

    Contamination of surface waters is a pervasive threat to human health, hence the need to better understand the sources and spatio-temporal variations of contaminants within river catchments. River catchment managers are required to sustainably monitor and manage the quality of surface waters. Catchment managers therefore need cost-effective, long-term, sustainable water quality monitoring and management designs to proactively protect public health and aquatic ecosystems. Multivariate and phage-lysis techniques were used to investigate spatio-temporal variations of water quality, main polluting chemophysical and microbial parameters, faecal micro-organism sources, and to establish 'sentry' sampling sites in the Ouse River catchment, southeast England, UK. 350 river water samples were analysed for fourteen chemophysical and microbial water quality parameters in conjunction with the novel human-specific phages of Bacteroides GB-124. Annual, autumn, spring, summer, and winter principal components (PCs) explained approximately 54%, 75%, 62%, 48%, and 60%, respectively, of the total variance present in the datasets. Significant loadings of Escherichia coli, intestinal enterococci, turbidity, and human-specific Bacteroides GB-124 were observed in all datasets. Cluster analysis successfully grouped sampling sites into five clusters. Importantly, multivariate and phage-lysis techniques were useful in determining the sources and spatial extent of water contamination in the catchment. Though human faecal contamination was significant during dry periods, the main source of contamination was non-human. Bacteroides GB-124 could potentially be used for catchment routine microbial water quality monitoring. For a cost-effective, long-term, sustainable water quality monitoring design, E. coli or intestinal enterococci, turbidity, and Bacteroides GB-124 should be monitored all-year round in this river catchment. Copyright © 2011 Elsevier B.V. All rights reserved.

  5. Forensic discrimination of blue ballpoint pens on documents by laser ablation inductively coupled plasma mass spectrometry and multivariate analysis.

    PubMed

    Alamilla, Francisco; Calcerrada, Matías; García-Ruiz, Carmen; Torre, Mercedes

    2013-05-10

    The differentiation of blue ballpoint pen inks written on documents through an LA-ICP-MS methodology is proposed. Small common office paper portions containing ink strokes from 21 blue pens of known origin were cut and measured without any sample preparation. In a first step, Mg, Ca and Sr were proposed as internal standards (ISs) and used in order to normalize elemental intensities and subtract background signals from the paper. Then, specific criteria were designed and employed to identify target elements (Li, V, Mn, Co, Ni, Cu, Zn, Zr, Sn, W and Pb), which proved independent of the IS chosen in 98% of cases and allowed a qualitative clustering of the samples. In a second step, an elemental-related ratio (ink ratio) based on the targets previously identified was used to obtain mass-independent intensities and perform pairwise comparisons by means of multivariate statistical analyses (MANOVA, Tukey's HSD and Hotelling's T2). This treatment improved the discrimination power (DP) and provided objective results, achieving a complete differentiation among different brands and a partial differentiation within pen inks from the same brands. The designed data treatment, together with the use of multivariate statistical tools, represents an easy and useful tool for differentiating among blue ballpoint pen inks, with hardly any sample destruction and without the need for methodological calibrations, making its use potentially advantageous from a forensic-practice standpoint. To test the procedure, it was applied to analyze real handwritten questioned contracts, previously studied by the Department of Forensic Document Exams of the Criminalistics Service of Civil Guard (Spain). The results showed that all questioned ink entries were clustered in the same group, which differed from the remaining ink on the document. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
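
    The discrimination power mentioned above is commonly defined in forensic work as the fraction of sample pairs that a method can tell apart. The sketch below assumes that definition. The pen count of 21 comes from the abstract, while the number of undiscriminated pairs is a hypothetical input, since the abstract reports no such figure.

```python
from math import comb

def discrimination_power(n_samples, undiscriminated_pairs):
    """Fraction of all sample pairs that the method distinguishes."""
    total_pairs = comb(n_samples, 2)
    return (total_pairs - undiscriminated_pairs) / total_pairs

# 21 pens give 21 * 20 / 2 = 210 pairwise comparisons.
print(comb(21, 2))
# Hypothetical outcome: 5 pairs could not be told apart.
print(discrimination_power(21, 5))
```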

  6. Principal components derived from CSF inflammatory profiles predict outcome in survivors after severe traumatic brain injury.

    PubMed

    Kumar, Raj G; Rubin, Jonathan E; Berger, Rachel P; Kochanek, Patrick M; Wagner, Amy K

    2016-03-01

    Studies have characterized absolute levels of multiple inflammatory markers as significant risk factors for poor outcomes after traumatic brain injury (TBI). However, inflammatory marker concentrations are highly inter-related, and production of one may result in the production or regulation of another. Therefore, a more comprehensive characterization of the inflammatory response post-TBI should consider relative levels of markers in the inflammatory pathway. We used principal component analysis (PCA) as a dimension-reduction technique to characterize the sets of markers that contribute independently to variability in cerebrospinal fluid (CSF) inflammatory profiles after TBI. Using PCA results, we defined groups (or clusters) of individuals (n=111) with similar patterns of acute CSF inflammation that were then evaluated in the context of outcome and other relevant CSF and serum biomarkers collected days 0-3 and 4-5 post-injury. We identified four significant principal components (PC1-PC4) for CSF inflammation from days 0-3, and PC1 accounted for the greatest (31%) percentage of variance. PC1 was characterized by relatively higher CSF sICAM-1, sFAS, IL-10, IL-6, sVCAM-1, IL-5, and IL-8 levels. Cluster analysis then defined two distinct clusters, such that individuals in cluster 1 had highly positive PC1 scores and relatively higher levels of CSF cortisol, progesterone, estradiol, testosterone, brain derived neurotrophic factor (BDNF), and S100b; this group also had higher serum cortisol and lower serum BDNF. Multinomial logistic regression analyses showed that individuals in cluster 1 had a 10.9 times increased likelihood of GOS scores of 2/3 vs. 4/5 at 6 months compared to cluster 2, after controlling for covariates. Cluster group did not discriminate between mortality compared to GOS scores of 4/5 after controlling for age and other covariates. Cluster groupings also did not discriminate mortality or 12 month outcomes in multivariate models. PCA and cluster analysis establish that a subset of CSF inflammatory markers measured in days 0-3 post-TBI may distinguish individuals with poor 6-month outcome, and future studies should prospectively validate these findings. PCA of inflammatory mediators after TBI could aid in prognostication and in identifying patient subgroups for therapeutic interventions. Copyright © 2015 Elsevier Inc. All rights reserved.

  7. Performance Assessment of Kernel Density Clustering for Gene Expression Profile Data

    PubMed Central

    Zeng, Beiyan; Chen, Yiping P.; Smith, Oscar H.

    2003-01-01

    Kernel density smoothing techniques have been used in classification or supervised learning of gene expression profile (GEP) data, but their applications to clustering or unsupervised learning of those data have not been explored and assessed. Here we report a kernel density clustering method for analysing GEP data and compare its performance with the three most widely-used clustering methods: hierarchical clustering, K-means clustering, and multivariate mixture model-based clustering. Using several methods to measure agreement, between-cluster isolation, and within-cluster coherence, such as the Adjusted Rand Index, the Pseudo F test, the r2 test, and the profile plot, we have assessed the effectiveness of kernel density clustering for recovering clusters, and its robustness against noise on clustering both simulated and real GEP data. Our results show that the kernel density clustering method has excellent performance in recovering clusters from simulated data and in grouping large real expression profile data sets into compact and well-isolated clusters, and that it is the most robust clustering method for analysing noisy expression profile data compared to the other three methods assessed. PMID:18629292
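
    Of the agreement measures listed above, the Adjusted Rand Index is the easiest to show in a few lines. The sketch below follows the standard pair-counting formula (observed agreement minus its chance expectation, divided by the maximum minus the expectation). The label vectors are hypothetical toy data, not values from the study.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """Chance-corrected agreement between two partitions of the same items."""
    n = len(labels_a)
    if n != len(labels_b) or n < 2:
        raise ValueError("need two label lists of equal length >= 2")
    # Pairs of items placed together in both partitions.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs placed together within each partition separately.
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        # Degenerate case, e.g. both partitions put every item in one cluster.
        return 1.0
    return (sum_cells - expected) / (max_index - expected)

# Same grouping with swapped label names: perfect agreement.
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
# Groupings that cut across each other: below-chance agreement.
print(adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1]))
```

    The index does not depend on how clusters are named, which is why it suits comparisons between a recovered clustering and a known one.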

  8. Metabolomic analysis of primary metabolites in citrus leaf during defense responses.

    PubMed

    Asai, Tomonori; Matsukawa, Tetsuya; Kajiyama, Shin'ichiro

    2017-03-01

    Mechanical damage is one of the unavoidable environmental stresses to plant growth and development. Plants induce a variety of reactions which defend against natural enemies and/or heal the wounded sites. Jasmonic acid (JA) and salicylic acid (SA), defense-related plant hormones, are well known to be involved in induction of defense reactions and play important roles as signal molecules. However, defense-related metabolites are so numerous and diverse that roles of individual compounds are still to be elucidated. In this report, we carried out a comprehensive analysis of metabolic changes during wound response in citrus plants, which are one of the most commercially important fruit tree families. Changes in amino acid, sugar, and organic acid profiles in leaves were surveyed after wounding, JA and SA treatments using gas chromatography-mass spectrometry (GC/MS) in seven citrus species, Citrus sinensis, Citrus limon, Citrus paradisi, Citrus unshiu, Citrus kinokuni, Citrus grandis, and Citrus hassaku. GC/MS data were applied to multivariate analyses including hierarchical cluster analysis (HCA), principal component analysis (PCA), and orthogonal partial least squares-discriminant analysis (OPLS-DA) to extract stress-related compounds. HCA showed the amino acid cluster including phenylalanine and tryptophan, suggesting that amino acids in this cluster are concertedly regulated during responses against treatments. OPLS-DA showed that tryptophan was accumulated after wounding and JA treatments in all species tested, while serine was down-regulated. Our results suggest that tryptophan and serine are common biomarker candidates in citrus plants for wound stress. Copyright © 2016 The Society for Biotechnology, Japan. Published by Elsevier B.V. All rights reserved.

  9. Autogrid-based clustering of kinases: selection of representative conformations for docking purposes.

    PubMed

    Marzaro, Giovanni; Ferrarese, Alessandro; Chilin, Adriana

    2014-08-01

    The selection of the most appropriate protein conformation is a crucial aspect in molecular docking experiments. In order to reduce the errors arising from the use of a single protein conformation, several authors suggest the use of several tridimensional structures for the target. However, the selection of the most appropriate protein conformations still remains a challenging goal. The protein 3D-structures selection is mainly performed based on pairwise root-mean-square-deviation (RMSD) values computation, followed by hierarchical clustering. Herein we report an alternative strategy, based on the computation of only two atom affinity maps for each protein conformation, followed by multivariate analysis and hierarchical clustering. This methodology was applied to seven different kinases of pharmaceutical interest. The comparison with the classical RMSD-based strategy was based on cross-docking of co-crystallized ligands. In the case of epidermal growth factor receptor kinase, the docking performance on 220 known ligands was also evaluated, followed by 3D-QSAR studies. In all the cases, the herein proposed methodology outperformed the RMSD-based one.
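
    The classical baseline described above (pairwise RMSD followed by hierarchical clustering) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes the conformations are already superimposed and share atom ordering, uses average linkage with a hypothetical distance cutoff, and works on made-up coordinates.

```python
import math
from itertools import combinations

def rmsd(conf_a, conf_b):
    """Root-mean-square deviation between two pre-aligned conformations."""
    sq = sum((a - b) ** 2
             for atom_a, atom_b in zip(conf_a, conf_b)
             for a, b in zip(atom_a, atom_b))
    return math.sqrt(sq / len(conf_a))

def cluster_conformations(confs, cutoff):
    """Naive average-linkage agglomerative clustering on the RMSD matrix."""
    dist = {(i, j): rmsd(confs[i], confs[j])
            for i, j in combinations(range(len(confs)), 2)}
    clusters = [[i] for i in range(len(confs))]

    def linkage(c1, c2):
        # Mean pairwise RMSD between members of the two clusters.
        return sum(dist[min(i, j), max(i, j)] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        p, q = min(combinations(range(len(clusters)), 2),
                   key=lambda pq: linkage(clusters[pq[0]], clusters[pq[1]]))
        if linkage(clusters[p], clusters[q]) > cutoff:
            break  # closest remaining clusters are too far apart to merge
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]  # safe because p < q
    return clusters

# Hypothetical two-atom "conformations" (x, y, z in angstroms).
conf_0 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
conf_1 = [(0.0, 0.0, 0.1), (1.0, 0.0, 0.1)]
conf_2 = [(0.0, 0.0, 3.0), (1.0, 0.0, 3.0)]
print(cluster_conformations([conf_0, conf_1, conf_2], cutoff=1.0))
```

    One representative per cluster (for example the member closest to the others) would then be carried forward to docking.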

  10. Characterization of regional cold-hydrothermal inflows enriched in arsenic and associated trace-elements in the southern part of the Duero Basin (Spain), by multivariate statistical analysis.

    PubMed

    Giménez-Forcada, Elena; Vega-Alegre, Marisol; Timón-Sánchez, Susana

    2017-09-01

    Naturally occurring arsenic in groundwater exceeding the limit for potability has been reported along the southern edge of the Cenozoic Duero Basin (CDB) near its contact with the Spanish Central System (SCS). In this area, spatial variability of arsenic is high, peaking at 241 μg/L. Forty-seven percent of samples collected contained arsenic above the maximum allowable concentration for drinking water (10 μg/L). Correlations of As with other hydrochemical variables were investigated using multivariate statistical analysis (Hierarchical Cluster Analysis, HCA and Principal Component Analysis, PCA). It was found that As, V, Cr and pH are closely related and that there were also close correlations with temperature and Na+. The highest concentrations of arsenic and other associated Potentially Toxic Geogenic Trace Elements (PTGTE) are linked to alkaline NaHCO3 waters (pH≈9), moderate oxic conditions and temperatures of around 18°C-19°C. The most plausible hypothesis to explain the high arsenic concentrations is the contribution of deeper regional flows with a significant hydrothermal component (cold-hydrothermal waters), flowing through faults in the basement rock. Water mixing and water-rock interactions occur both in the fissured aquifer media (igneous and metasedimentary bedrock) and in the sedimentary environment of the CDB, where agricultural pollution phenomena are also active. A combination of multivariate statistical tools and hydrochemical analysis enabled the distribution pattern of dissolved As and other PTGTE in groundwaters in the study area to be interpreted, and their most likely origin to be established. This methodology could be applied to other sedimentary areas with similar characteristics and problems. Copyright © 2017 Elsevier B.V. All rights reserved.

  11. Brain regions with abnormal network properties in severe epilepsy of Lennox-Gastaut phenotype: Multivariate analysis of task-free fMRI.

    PubMed

    Pedersen, Mangor; Curwood, Evan K; Archer, John S; Abbott, David F; Jackson, Graeme D

    2015-11-01

    Lennox-Gastaut syndrome, and the similar but less tightly defined Lennox-Gastaut phenotype, describe patients with severe epilepsy, generalized epileptic discharges, and variable intellectual disability. Our previous functional neuroimaging studies suggest that abnormal diffuse association network activity underlies the epileptic discharges of this clinical phenotype. Herein we use a data-driven multivariate approach to determine the spatial changes in local and global networks of patients with severe epilepsy of the Lennox-Gastaut phenotype. We studied 9 adult patients and 14 controls. In 20 min of task-free blood oxygen level-dependent functional magnetic resonance imaging data, two metrics of functional connectivity were studied: Regional homogeneity or local connectivity, a measure of concordance between each voxel to a focal cluster of adjacent voxels; and eigenvector centrality, a global connectivity estimate designed to detect important neural hubs. Multivariate pattern analysis of these data in a machine-learning framework was used to identify spatial features that classified disease subjects. Multivariate pattern analysis was 95.7% accurate in classifying subjects for both local and global connectivity measures (22/23 subjects correctly classified). Maximal discriminating features were the following: increased local connectivity in frontoinsular and intraparietal areas; increased global connectivity in posterior association areas; decreased local connectivity in sensory (visual and auditory) and medial frontal cortices; and decreased global connectivity in the cingulate cortex, striatum, hippocampus, and pons. Using a data-driven analysis method in task-free functional magnetic resonance imaging, we show increased connectivity in critical areas of association cortex and decreased connectivity in primary cortex. 
    This supports previous findings of a critical role for these association cortical regions as a final common pathway in generating the Lennox-Gastaut phenotype. Abnormal function of these areas is likely to be important in explaining the intellectual problems characteristic of this disorder. Wiley Periodicals, Inc. © 2015 International League Against Epilepsy.
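
    Eigenvector centrality, the global connectivity metric named in this abstract, scores a node highly when it is connected to other high-scoring nodes. A minimal power-iteration sketch on a toy undirected graph follows. The four-node graph is hypothetical and far smaller than a voxel-wise fMRI network, and the identity shift used here is one common way to keep the iteration stable rather than anything the abstract specifies.

```python
import math

def eigenvector_centrality(adj, iterations=200, tol=1e-10):
    """Leading eigenvector of a symmetric adjacency matrix by power iteration."""
    n = len(adj)
    x = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        # Multiply by (A + I); the shift avoids oscillation on bipartite graphs.
        nxt = [x[i] + sum(adj[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in nxt))
        nxt = [v / norm for v in nxt]
        if max(abs(a - b) for a, b in zip(nxt, x)) < tol:
            return nxt
        x = nxt
    return x

# Toy graph: nodes 0, 1, 2 form a triangle; node 3 hangs off node 0.
toy_adj = [
    [0, 1, 1, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
]
centrality = eigenvector_centrality(toy_adj)
print(centrality)  # node 0 scores highest, node 3 lowest
```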

  12. CoSMoMVPA: Multi-Modal Multivariate Pattern Analysis of Neuroimaging Data in Matlab/GNU Octave.

    PubMed

    Oosterhof, Nikolaas N; Connolly, Andrew C; Haxby, James V

    2016-01-01

    Recent years have seen an increase in the popularity of multivariate pattern (MVP) analysis of functional magnetic resonance (fMRI) data, and, to a much lesser extent, magneto- and electro-encephalography (M/EEG) data. We present CoSMoMVPA, a lightweight MVPA (MVP analysis) toolbox implemented in the intersection of the Matlab and GNU Octave languages, that treats both fMRI and M/EEG data as first-class citizens. CoSMoMVPA supports all state-of-the-art MVP analysis techniques, including searchlight analyses, classification, correlations, representational similarity analysis, and the time generalization method. These can be used to address both data-driven and hypothesis-driven questions about neural organization and representations, both within and across: space, time, frequency bands, neuroimaging modalities, individuals, and species. It uses a uniform data representation of fMRI data in the volume or on the surface, and of M/EEG data at the sensor and source level. Through various external toolboxes, it directly supports reading and writing a variety of fMRI and M/EEG neuroimaging formats, and, where applicable, can convert between them. As a result, it can be integrated readily in existing pipelines and used with existing preprocessed datasets. CoSMoMVPA overloads the traditional volumetric searchlight concept to support neighborhoods for M/EEG and surface-based fMRI data, which supports localization of multivariate effects of interest across space, time, and frequency dimensions. CoSMoMVPA also provides a generalized approach to multiple comparison correction across these dimensions using Threshold-Free Cluster Enhancement with state-of-the-art clustering and permutation techniques. CoSMoMVPA is highly modular and uses abstractions to provide a uniform interface for a variety of MVP measures. Typical analyses require a few lines of code, making it accessible to beginner users. At the same time, expert programmers can easily extend its functionality. 
    CoSMoMVPA comes with extensive documentation, including a variety of runnable demonstration scripts and analysis exercises (with example data and solutions). It uses best software engineering practices including version control, distributed development, an automated test suite, and continuous integration testing. It can be used with the proprietary Matlab and the free GNU Octave software, and it complies with open source distribution platforms such as NeuroDebian. CoSMoMVPA is Free/Open Source Software under the permissive MIT license. Website: http://cosmomvpa.org Source code: https://github.com/CoSMoMVPA/CoSMoMVPA.

  13. CoSMoMVPA: Multi-Modal Multivariate Pattern Analysis of Neuroimaging Data in Matlab/GNU Octave

    PubMed Central

    Oosterhof, Nikolaas N.; Connolly, Andrew C.; Haxby, James V.

    2016-01-01

    Recent years have seen an increase in the popularity of multivariate pattern (MVP) analysis of functional magnetic resonance (fMRI) data, and, to a much lesser extent, magneto- and electro-encephalography (M/EEG) data. We present CoSMoMVPA, a lightweight MVPA (MVP analysis) toolbox implemented in the intersection of the Matlab and GNU Octave languages, that treats both fMRI and M/EEG data as first-class citizens. CoSMoMVPA supports all state-of-the-art MVP analysis techniques, including searchlight analyses, classification, correlations, representational similarity analysis, and the time generalization method. These can be used to address both data-driven and hypothesis-driven questions about neural organization and representations, both within and across: space, time, frequency bands, neuroimaging modalities, individuals, and species. It uses a uniform data representation of fMRI data in the volume or on the surface, and of M/EEG data at the sensor and source level. Through various external toolboxes, it directly supports reading and writing a variety of fMRI and M/EEG neuroimaging formats, and, where applicable, can convert between them. As a result, it can be integrated readily in existing pipelines and used with existing preprocessed datasets. CoSMoMVPA overloads the traditional volumetric searchlight concept to support neighborhoods for M/EEG and surface-based fMRI data, which supports localization of multivariate effects of interest across space, time, and frequency dimensions. CoSMoMVPA also provides a generalized approach to multiple comparison correction across these dimensions using Threshold-Free Cluster Enhancement with state-of-the-art clustering and permutation techniques. CoSMoMVPA is highly modular and uses abstractions to provide a uniform interface for a variety of MVP measures. Typical analyses require a few lines of code, making it accessible to beginner users. At the same time, expert programmers can easily extend its functionality. 
    CoSMoMVPA comes with extensive documentation, including a variety of runnable demonstration scripts and analysis exercises (with example data and solutions). It uses best software engineering practices including version control, distributed development, an automated test suite, and continuous integration testing. It can be used with the proprietary Matlab and the free GNU Octave software, and it complies with open source distribution platforms such as NeuroDebian. CoSMoMVPA is Free/Open Source Software under the permissive MIT license. Website: http://cosmomvpa.org Source code: https://github.com/CoSMoMVPA/CoSMoMVPA PMID:27499741

  14. ASTM clustering for improving coal analysis by near-infrared spectroscopy.

    PubMed

    Andrés, J M; Bona, M T

    2006-11-15

    Multivariate analysis techniques have been applied to near-infrared (NIR) spectra of coals to investigate the relationship between nine coal properties (moisture (%), ash (%), volatile matter (%), fixed carbon (%), heating value (kcal/kg), carbon (%), hydrogen (%), nitrogen (%) and sulphur (%)) and the corresponding predictor variables. In this work, a whole set of coal samples was grouped into six more homogeneous clusters following the ASTM reference method for classification prior to the application of calibration methods to each coal set. The results obtained showed a considerable improvement in the determination error compared with the calibration for the whole sample set. For some groups, the established calibrations approached the quality required by the ASTM/ISO norms for laboratory analysis. To predict property values for a new coal sample, that sample must first be assigned to its respective group. Thus, the discrimination and classification ability of coal samples by Diffuse Reflectance Infrared Fourier Transform Spectroscopy (DRIFTS) in the NIR range was also studied by applying Soft Independent Modelling of Class Analogy (SIMCA) and Linear Discriminant Analysis (LDA) techniques. Modelling of the groups by SIMCA led to overlapping models that cannot discriminate for unique classification. On the other hand, the application of Linear Discriminant Analysis improved the classification of the samples but not enough to be satisfactory for every group considered.

  15. Assessment of the Eutrophication-Related Environmental Parameters in Two Mediterranean Lakes by Integrating Statistical Techniques and Self-Organizing Maps.

    PubMed

    Hadjisolomou, Ekaterini; Stefanidis, Konstantinos; Papatheodorou, George; Papastergiadou, Evanthia

    2018-03-19

    During the last decades, Mediterranean freshwater ecosystems, especially lakes, have been under severe pressure due to increasing eutrophication and water quality deterioration. In this article, we compared the effectiveness of different data analysis methods by assessing the contribution of environmental parameters to eutrophication processes. For this purpose, principal components analysis (PCA), cluster analysis, and a self-organizing map (SOM) were applied, using water quality data from two transboundary lakes of North Greece. SOM is considered an advanced and powerful data analysis tool because of its ability to represent complex and nonlinear relationships among multivariate data sets. The results of PCA and cluster analysis agreed with the SOM results, although the latter provided more information because of the visualization abilities regarding the parameters' relationships. Besides nutrients that were found to be a key factor for controlling chlorophyll-a (Chl-a), water temperature was related positively with algal production, while the Secchi disk depth parameter was found to be highly important and negatively related to eutrophic conditions. In general, the SOM results were more specific and allowed direct associations between the water quality variables. Our work showed that SOMs can be used effectively in limnological studies to produce robust and interpretable results, aiding scientists and managers to cope with environmental problems such as eutrophication.

  16. Cognitive function in schizoaffective disorder and clinical subtypes of schizophrenia.

    PubMed

    Goldstein, Gerald; Shemansky, Wendy Jo; Allen, Daniel N

    2005-03-01

    Cognitive studies of patients with Schizoaffective Disorder typically indicate that the cognitive function of these patients resembles that of patients with Schizophrenic Disorder more than it does that of patients with nonpsychotic Mood Disorder. In this study patients with Schizoaffective Disorder were compared with patients with Paranoid, Undifferentiated and Residual clinical subtypes on a number of measures of cognitive function. Multivariate analyses of variance indicated that Schizoaffective and Paranoid patients had more intact cognitive function than did Undifferentiated and Residual patients. Application of cluster analysis indicated that there were relatively high percentages of Schizoaffective and Paranoid patients in a "Neuropsychologically Normal" cluster. It was concluded that Schizoaffective Disorder as well as other clinical subtypes of schizophrenia are cognitively heterogeneous, and it was suggested that a subgroup of patients with Schizoaffective Disorder may not differ in cognitive ability from patients with nonpsychotic Mood Disorder.

  17. Simultaneous gains tuning in boiler/turbine PID-based controller clusters using iterative feedback tuning methodology.

    PubMed

    Zhang, Shu; Taft, Cyrus W; Bentsman, Joseph; Hussey, Aaron; Petrus, Bryan

    2012-09-01

    Tuning a complex multi-loop PID based control system requires considerable experience. In today's power industry the number of available qualified tuners is dwindling and there is a great need for better tuning tools to maintain and improve the performance of complex multivariable processes. Multi-loop PID tuning is the procedure for the online tuning of a cluster of PID controllers operating in a closed loop with a multivariable process. This paper presents the first application of the simultaneous tuning technique to the multi-input-multi-output (MIMO) PID based nonlinear controller in the power plant control context, with the closed-loop system consisting of a MIMO nonlinear boiler/turbine model and a nonlinear cluster of six PID-type controllers. Although simplified, the dynamics and cross-coupling of the process and the PID cluster are similar to those used in a real power plant. The particular technique selected, iterative feedback tuning (IFT), utilizes the linearized version of the PID cluster for signal conditioning, but the data collection and tuning is carried out on the full nonlinear closed-loop system. Based on the figure of merit for the control system performance, the IFT is shown to deliver performance favorably comparable to that attained through the empirical tuning carried out by an experienced control engineer. Copyright © 2012 ISA. Published by Elsevier Ltd. All rights reserved.

  18. Multivariate statistical analysis of heavy metal concentration in soils of Yelagiri Hills, Tamilnadu, India--spectroscopical approach.

    PubMed

    Chandrasekaran, A; Ravisankar, R; Harikrishnan, N; Satapathy, K K; Prasad, M V R; Kanagasabapathy, K V

    2015-02-25

    Anthropogenic activities increase the accumulation of heavy metals in the soil environment. Soil pollution significantly reduces environmental quality and affects human health. In the present study soil samples were collected at different locations of Yelagiri Hills, Tamilnadu, India for heavy metal analysis. The samples were analyzed for twelve selected heavy metals (Mg, Al, K, Ca, Ti, Fe, V, Cr, Mn, Co, Ni and Zn) using energy dispersive X-ray fluorescence (EDXRF) spectroscopy. Heavy metal concentrations in soil were investigated using enrichment factor (EF), geo-accumulation index (Igeo), contamination factor (CF) and pollution load index (PLI) to determine metal accumulation, distribution and pollution status. Heavy metal toxicity risk was assessed using soil quality guidelines (SQGs) given by target and intervention values of Dutch soil standards. The concentrations of Ni, Co, Zn, Cr, Mn, Fe, Ti, K, Al and Mg were mainly controlled by natural sources. Multivariate statistical methods such as correlation matrix, principal component analysis and cluster analysis were applied for the identification of heavy metal sources (anthropogenic/natural origin). Geo-statistical methods such as kriging identified hot spots of metal contamination in road areas influenced mainly by presence of natural rocks. Copyright © 2014 Elsevier B.V. All rights reserved.
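
    The four pollution indices named above have widely used textbook definitions, sketched below: CF is the ratio of measured to background concentration, PLI is the geometric mean of the CFs, Igeo is log2 of the concentration over 1.5 times the background, and EF normalizes both sample and background to a reference element. The abstract does not give the background values or the reference element the authors used, so all concentrations here are hypothetical.

```python
import math

def contamination_factor(conc, background):
    """CF: measured concentration relative to background."""
    return conc / background

def pollution_load_index(cfs):
    """PLI: geometric mean of the contamination factors."""
    return math.prod(cfs) ** (1.0 / len(cfs))

def geoaccumulation_index(conc, background):
    """Igeo: log2 of concentration over 1.5 times background."""
    return math.log2(conc / (1.5 * background))

def enrichment_factor(conc, ref_conc, background, ref_background):
    """EF: metal-to-reference ratio in the sample over the same ratio in background."""
    return (conc / ref_conc) / (background / ref_background)

# Hypothetical concentrations in mg/kg; the reference element is an assumption.
print(contamination_factor(20.0, 10.0))
print(pollution_load_index([2.0, 8.0]))
print(geoaccumulation_index(30.0, 10.0))
print(enrichment_factor(20.0, 100.0, 10.0, 200.0))
```

    Values of CF, PLI and EF near 1, or Igeo at or below 0, are conventionally read as little or no enrichment over background.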

  19. Construction of inorganic elemental fingerprint and multivariate statistical analysis of marine traditional Chinese medicine Meretricis concha from Rushan Bay

    NASA Astrophysics Data System (ADS)

    Wu, Xia; Zheng, Kang; Zhao, Fengjia; Zheng, Yongjun; Li, Yantuan

    2014-08-01

    Meretricis concha is a kind of marine traditional Chinese medicine (TCM), and has been commonly used for the treatment of asthma and scald burns. In order to investigate the relationship between the inorganic elemental fingerprint and the geographical origin identification of Meretricis concha, the elemental contents of M. concha from five sampling points in Rushan Bay have been determined by means of inductively coupled plasma optical emission spectrometry (ICP-OES). Based on the contents of 14 inorganic elements (Al, As, Cd, Co, Cr, Cu, Fe, Hg, Mn, Mo, Ni, Pb, Se, and Zn), the inorganic elemental fingerprint which well reflects the elemental characteristics was constructed. All the data from the five sampling points were discriminated with accuracy through hierarchical cluster analysis (HCA) and principal component analysis (PCA), indicating that a four-factor model which could explain approximately 80% of the detection data was established, and the elements Al, As, Cd, Cu, Ni and Pb could be viewed as the characteristic elements. This investigation suggests that the inorganic elemental fingerprint combined with multivariate statistical analysis is a promising method for verifying the geographical origin of M. concha, and this strategy should be valuable for the authenticity discrimination of some marine TCM.

  20. Targeted metabolomic profiling in rat tissues reveals sex differences.

    PubMed

    Ruoppolo, Margherita; Caterino, Marianna; Albano, Lucia; Pecce, Rita; Di Girolamo, Maria Grazia; Crisci, Daniela; Costanzo, Michele; Milella, Luigi; Franconi, Flavia; Campesi, Ilaria

    2018-03-16

    Sex differences affect several diseases and are organ- and parameter-specific. In humans and animals, sex differences also influence the metabolism and homeostasis of amino acids and fatty acids, which are linked to the onset of diseases. Thus, the use of targeted metabolite profiles in tissues represents a powerful approach to examine the intermediary metabolism and evidence for any sex differences. To clarify the sex-specific activities of liver, heart and kidney tissues, we used targeted metabolomics, linear discriminant analysis (LDA), principal component analysis (PCA), cluster analysis and linear correlation models to evaluate sex and organ-specific differences in amino acids, free carnitine and acylcarnitine levels in male and female Sprague-Dawley rats. Several intra-sex differences affect tissues, indicating that metabolite profiles in rat hearts, livers and kidneys are organ-dependent. Amino acids and carnitine levels in rat hearts, livers and kidneys are affected by sex: male and female hearts show the greatest sexual dimorphism, both qualitatively and quantitatively. Finally, multivariate analysis confirmed the influence of sex on the metabolomics profiling. Our data demonstrate that the metabolomics approach together with a multivariate approach can capture the dynamics of physiological and pathological states, which are essential for explaining the basis of the sex differences observed in physiological and pathological conditions.

  1. Ultraviolet spectroscopy combined with ultra-fast liquid chromatography and multivariate statistical analysis for quality assessment of wild Wolfiporia extensa from different geographical origins.

    PubMed

    Li, Yan; Zhang, Ji; Jin, Hang; Liu, Honggao; Wang, Yuanzhong

    2016-08-05

    A quality assessment system comprised of a tandem technique of ultraviolet (UV) spectroscopy and ultra-fast liquid chromatography (UFLC) aided by multivariate analysis was presented for the determination of geographic origin of Wolfiporia extensa collected from five regions in Yunnan Province of China. Characteristic UV spectroscopic fingerprints of samples were determined based on its methanol extract. UFLC was applied for the determination of pachymic acid (a biomarker) presented in individual test samples. The spectrum data matrix and the content of pachymic acid were integrated and analyzed by partial least squares discriminant analysis (PLS-DA) and hierarchical cluster analysis (HCA). The results showed that chemical properties of samples were clearly dominated by the epidermis and inner part as well as geographical origins. The relationships among samples obtained from these five regions have been also presented. Moreover, an interesting finding implied that geographical origins had much greater influence on the chemical properties of epidermis compared with that of the inner part. This study demonstrated that a rapid tool for accurate discrimination of W. extensa by UV spectroscopy and UFLC could be available for quality control of complicated medicinal mushrooms. Copyright © 2016 Elsevier B.V. All rights reserved.

  2. Extracting galactic structure parameters from multivariated density estimation

    NASA Technical Reports Server (NTRS)

    Chen, B.; Creze, M.; Robin, A.; Bienayme, O.

    1992-01-01

    Multivariate statistical analysis, including cluster analysis (unsupervised classification), discriminant analysis (supervised classification) and principal component analysis (dimensionality reduction method), and nonparametric density estimation have been successfully used to search for meaningful associations in the 5-dimensional space of observables between observed points and the sets of simulated points generated from a synthetic approach of galaxy modelling. These methodologies can be applied as new tools to obtain information about hidden structure otherwise unrecognizable, and place important constraints on the space distribution of various stellar populations in the Milky Way. In this paper, we concentrate on illustrating how to use nonparametric density estimation to substitute for the true densities in both the simulated sample and the real sample in the five-dimensional space. In order to fit model predicted densities to reality, we derive a set of equations which include n lines (where n is the total number of observed points) and m unknown parameters (where m is the number of predefined groups). A least-squares estimation will allow us to determine the density law of different groups and components in the Galaxy. The output from our software, which can be used in many research fields, will also give the systematic error between the model and the observation by a Bayes rule.

  3. Arsenic health risk assessment in drinking water and source apportionment using multivariate statistical techniques in Kohistan region, northern Pakistan.

    PubMed

    Muhammad, Said; Tahir Shah, M; Khan, Sardar

    2010-10-01

    The present study was conducted in Kohistan region, where mafic and ultramafic rocks (Kohistan island arc and Indus suture zone) and metasedimentary rocks (Indian plate) are exposed. Water samples were collected from the springs, streams and Indus river and analyzed for physical parameters, anions, cations and arsenic (As(3+), As(5+) and arsenic total). The water quality in Kohistan region was evaluated by comparing the physio-chemical parameters with permissible limits set by the Pakistan Environmental Protection Agency and the World Health Organization. Most of the studied parameters were found within their respective permissible limits. However in some samples, the iron and arsenic concentrations exceeded their permissible limits. For health risk assessment of arsenic, the average daily dose, hazard quotient (HQ) and cancer risk were calculated by using statistical formulas. The values of HQ were found >1 in the samples collected from Jabba, Dubair, while HQ values were <1 in rest of the samples. This level of contamination should have low chronic risk and medium cancer risk when compared with US EPA guidelines. Furthermore, the inter-dependence of physio-chemical parameters and pollution load was also calculated by using multivariate statistical techniques like one-way ANOVA, correlation analysis, regression analysis, cluster analysis and principal component analysis. Copyright © 2010 Elsevier Ltd. All rights reserved.
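
The abstract does not reproduce the risk formulas, so the sketch below uses the forms generally applied in drinking-water risk assessments. All parameter values are assumptions: the default intake and body weight are hypothetical placeholders, and the reference dose (3.0e-4 mg/kg/day) and cancer slope factor (1.5 per mg/kg/day) are the commonly cited US EPA oral values for inorganic arsenic, not values confirmed by this record.

```python
def average_daily_dose(conc_mg_per_l, intake_l_per_day=2.0, body_weight_kg=72.0):
    """ADD (mg/kg/day) = concentration * daily water intake / body weight.
    Default intake and body weight are illustrative assumptions."""
    return conc_mg_per_l * intake_l_per_day / body_weight_kg

def hazard_quotient(add, rfd=3.0e-4):
    """HQ = ADD / RfD; HQ > 1 flags potential chronic (non-cancer) risk."""
    return add / rfd

def cancer_risk(add, slope_factor=1.5):
    """Lifetime cancer risk = ADD * cancer slope factor."""
    return add * slope_factor
```

For example, under these assumed defaults a concentration of 0.036 mg/L gives an ADD of 0.001 mg/kg/day and therefore an HQ above 1.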

  4. Solfatara volcano subsurface imaging: two different approaches to process and interpret multi-variate data sets

    NASA Astrophysics Data System (ADS)

    Bernardinetti, Stefano; Bruno, Pier Paolo; Lavoué, François; Gresse, Marceau; Vandemeulebrouck, Jean; Revil, André

    2017-04-01

    Reducing model uncertainty and producing more reliable geophysical images and interpretations is a fundamental requirement for geophysical techniques applied in complex environments such as Solfatara Volcano. The use of independent geophysical methods provides a wealth of information on the subsurface because the data differ in their sensitivity to parameters such as compressional and shear wave velocities, bulk electrical conductivity, or density. The joint processing of these multiple physical properties can lead to a very detailed characterization of the subsurface and therefore enhance our imaging and our interpretation. In this work, we develop two different processing approaches based on reflection seismology and seismic P-wave tomography on one hand, and electrical data acquired over the same line, on the other hand. From these data, we obtain an image-guided electrical resistivity tomography and a post-processing integration of tomographic results. The image-guided electrical resistivity tomography is obtained by regularizing the inversion of the electrical data with structural constraints extracted from a migrated seismic section using image processing tools. This approach focuses the reconstruction of electrical resistivity anomalies along the features visible in the seismic section, and acts as a guide for interpretation in terms of subsurface structures and processes. To integrate co-registered P-wave velocity and electrical resistivity values, we apply a data mining tool, the k-means algorithm, to identify relationships between the two sets of variables. This algorithm identifies clusters with the objective of minimizing the sum of squared Euclidean distances within each cluster and maximizing it between clusters for the multivariate data set. We obtain a partitioning of the multivariate data set into a finite number of well-correlated clusters, representative of the optimum clustering of our geophysical variables (P-wave velocities and electrical resistivities). The result is an integrated tomography that shows a finite number of homogeneous geophysical facies, and therefore highlights the main geological features of the subsurface.
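
The k-means step described above can be illustrated with a short sketch. It is a plain Lloyd's-algorithm implementation run on synthetic data; the velocity and resistivity values, the choice of two clusters, the log-transform of resistivity and the standardization step are all assumptions made for illustration and are not taken from the study.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate nearest-center assignment and center update."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Recompute each center as the mean of its members (keep it if empty).
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Synthetic co-registered cells: P-wave velocity (m/s) and log10 resistivity (ohm.m).
rng = np.random.default_rng(1)
vp = np.concatenate([rng.normal(500, 30, 50), rng.normal(1500, 30, 50)])
log_res = np.concatenate([rng.normal(1.0, 0.1, 50), rng.normal(2.5, 0.1, 50)])
X = np.column_stack([vp, log_res])
X = (X - X.mean(axis=0)) / X.std(axis=0)  # put both variables on a common scale

labels, centers = kmeans(X, k=2)
```

Each label can then be mapped back to its position along the profile to draw the facies section. Standardizing first matters because velocity in m/s would otherwise dominate the Euclidean distance.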

  5. Identifying children at risk for being bullies in the United States.

    PubMed

    Shetgiri, Rashmi; Lin, Hua; Flores, Glenn

    2012-01-01

    To identify risk factors associated with the greatest and lowest prevalence of bullying perpetration among U.S. children. Using the 2001-2002 Health Behavior in School-Aged Children, a nationally representative survey of U.S. children in 6th-10th grades, bivariate analyses were conducted to identify factors associated with any (once or twice or more), moderate (two to three times/month or more), and frequent (weekly or more) bullying. Stepwise multivariable analyses identified risk factors associated with bullying. Recursive partitioning analysis (RPA) identified risk factors which, in combination, identify students with the highest and lowest bullying prevalence. The prevalence of any bullying in the 13,710 students was 37.3%, moderate bullying was 12.6%, and frequent bullying was 6.6%. Characteristics associated with bullying were similar in the multivariable analyses and RPA clusters. In RPA, the highest prevalence of any bullying (67%) accrued in children with a combination of fighting and weapon-carrying. Students who carry weapons, smoke, and drink alcohol more than 5 to 6 days/week were at greatest risk for moderate bullying (61%). Those who carry weapons, smoke, have more than one alcoholic drink per day, have above-average academic performance, moderate/high family affluence, and feel irritable or bad-tempered daily were at greatest risk for frequent bullying (68%). Risk clusters for any, moderate, and frequent bullying differ. Children who fight and carry weapons are at greatest risk of any bullying. Weapon-carrying, smoking, and alcohol use are included in the greatest risk clusters for moderate and frequent bullying. Risk-group categories may be useful to providers in identifying children at the greatest risk for bullying and in targeting interventions. Copyright © 2012 Academic Pediatric Association. Published by Elsevier Inc. All rights reserved.

  6. Identifying Children At Risk for Being Bullies in the US

    PubMed Central

    Shetgiri, Rashmi; Lin, Hua; Flores, Glenn

    2012-01-01

    Objective To identify risk factors associated with the highest and lowest prevalence of bullying perpetration among US children. Methods Using the 2001–2002 Health Behavior in School-Aged Children, a nationally-representative survey of US children in 6th–10th grades, bivariate analyses were conducted to identify factors associated with any (≥ once or twice), moderate (≥ two-three times/month), and frequent (≥ weekly) bullying. Stepwise multivariable analyses identified risk factors associated with bullying. Recursive partitioning analysis (RPA) identified risk factors which, in combination, identify students with the highest and lowest bullying prevalence. Results The prevalence of any bullying in the 13,710 students was 37.3%, moderate bullying was 12.6%, and frequent bullying was 6.6%. Characteristics associated with bullying were similar in the multivariable analyses and RPA clusters. In RPA, the highest prevalence of any bullying (67%) accrued in children with a combination of fighting and weapon-carrying. Students who carry weapons, smoke, and drink alcohol more than 5–6 days weekly were at highest risk for moderate bullying (61%). Those who carry weapons, smoke, drink > once daily, have above-average academic performance, moderate/high family affluence, and feel irritable or bad-tempered daily were at highest risk for frequent bullying (68%). Conclusions Risk clusters for any, moderate, and frequent bullying differ. Children who fight and carry weapons are at highest risk of any bullying. Weapon-carrying, smoking, and alcohol use are included in the highest risk clusters for moderate and frequent bullying. Risk-group categories may be useful to providers in identifying children at highest risks for bullying and in targeting interventions. PMID:22989731

  7. Influence of shifting cultivation practices on soil-plant-beetle interactions.

    PubMed

    Ibrahim, Kalibulla Syed; Momin, Marcy D; Lalrotluanga, R; Rosangliana, David; Ghatak, Souvik; Zothansanga, R; Kumar, Nachimuthu Senthil; Gurusubramanian, Guruswami

    2016-08-01

    Shifting cultivation (jhum) is a major land use practice in Mizoram. It was considered as an eco-friendly and efficient method when the cycle duration was long (15-30 years), but it poses the problem of land degradation and threat to ecology when shortened (4-5 years) due to increased intensification of farming systems. Studying beetle community structure is very helpful in understanding how shifting cultivation affects the biodiversity features compared to natural forest system. The present study examines the beetle species diversity and estimates the effects of shifting cultivation practices on the beetle assemblages in relation to change in tree species composition and soil nutrients. Scarabaeidae and Carabidae were observed to be the dominant families in the land use systems studied. Shifting cultivation practice significantly (P < 0.05) affected the beetle and tree species diversity as well as the soil nutrients as shown by univariate (one-way analysis of variance (ANOVA), correlation and regression, diversity indices) and multivariate (cluster analysis, principal component analysis (PCA), detrended correspondence analysis (DCA), canonical variate analysis (CVA), permutational multivariate analysis of variance (PERMANOVA), permutational multivariate analysis of dispersion (PERMDISP)) statistical analyses. Besides changing the tree species composition and affecting the soil fertility, shifting cultivation provides less suitable habitat conditions for the beetle species. Bioindicator analysis categorized the beetle species into forest specialists, anthropogenic specialists (shifting cultivation habitat specialist), and habitat generalists. Molecular analysis of bioindicator beetle species was done using mitochondrial cytochrome oxidase subunit I (COI) marker to validate the beetle species and describe genetic variation among them in relation to heterogeneity, transition/transversion bias, codon usage bias, evolutionary distance, and substitution pattern. 
    The present study revealed the fact that shifting cultivation practice significantly affects the beetle species in terms of biodiversity pattern as well as evolutionary features. Spatiotemporal assessment of soil-plant-beetle interactions in shifting cultivation system and their influence in land degradation and ecology will be helpful in making biodiversity conservation decisions in the near future.

  8. Sagittal Thoracic and Lumbar Spine Profiles in Upright Standing and Lying Prone Positions Among Healthy Subjects: Influence of Various Biometric Features.

    PubMed

    Salem, Walid; Coomans, Ysaline; Brismée, Jean-Michel; Klein, Paul; Sobczak, Stéphane; Dugailly, Pierre-Michel

    2015-08-01

    A prospective study was performed on the assessment of both thoracic and lumbar spine sagittal profiles (from C7 to S1). To propose a new noninvasive method for measuring the spine curvatures in standing and lying prone positions and to analyze their relationship with various biometric characteristics. Modifications of spine curvatures (i.e. lordosis or kyphosis) are of importance in the development of spinal disorders. Studies have emphasized the development of new devices to measure the spine sagittal profiles using a noninvasive and low-cost method. To date, it has not been applied for analyzing both lumbar and thoracic alterations for various positioning. Seventy-five healthy subjects (mean 22.6 ± 4.3 yr) were recruited to participate in this study. Thoracic and lumbar sagittal profiles were assessed in standing and lying prone positions using a 3D digitizer. In addition, several biometric data were collected including maximal trunk isometric strength for flexion and extension movement. Statistical analysis consisted in data comparisons of spine profiles and a multivariate analysis including biometric features, to classify individuals considering low within- and high between-variability. Kyphosis and lordosis angles decreased significantly from standing to lying prone position by an average of 13.4° and 16.6°, respectively. Multivariate analysis showed a sample clustering of 3 homogenous subgroups. The first group displayed larger lordosis and flexibility, and had low data values for height, weight, and strength. The second group had lower values than the overall trend of the whole sample, whereas the third group had larger score values for the torques, height, weight, waist, body mass index, and kyphosis angle but a reduced flexibility. 
    The present results demonstrate a significant effect of the positioning on both thoracic and lumbar spine sagittal profiles and highlight the use of cluster analysis to categorize subgroups after biometric characteristics including curvature measurement. N/A.

  9. Hydrochemical and multivariate analysis of groundwater quality in the northwest of Sinai, Egypt.

    PubMed

    El-Shahat, M F; Sadek, M A; Salem, W M; Embaby, A A; Mohamed, F A

    2017-08-01

    The northwestern coast of Sinai is home to many economic activities and development programs, thus evaluation of the potentiality and vulnerability of water resources is important. The present work has been conducted on the groundwater resources of this area for describing the major features of groundwater quality and the principal factors that control salinity evolution. The major ionic content of 39 groundwater samples collected from the Quaternary aquifer shows high coefficients of variation reflecting asymmetry of aquifer recharge. The groundwater samples have been classified into four clusters (using hierarchical cluster analysis), these match the variety of total dissolvable solids, water types and ionic orders. The principal component analysis combined the ionic parameters of the studied groundwater samples into two principal components. The first represents about 56% of the whole sample variance reflecting a salinization due to evaporation, leaching, dissolution of marine salts and/or seawater intrusion. The second represents about 15.8% reflecting dilution with rain water and the El-Salam Canal. Most groundwater samples were not suitable for human consumption and about 41% are suitable for irrigation. However, all groundwater samples are suitable for cattle, about 69% and 15% are suitable for horses and poultry, respectively.
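
The share of variance attributed to each principal component, as quoted in this abstract, can be computed along the following lines. The sketch runs PCA on standardized variables through a singular value decomposition; the synthetic 39-by-6 matrix and the function name are hypothetical, and the abstract does not say whether the authors standardized their ionic parameters.

```python
import numpy as np

def pca_explained_variance(X):
    """Fraction of total variance carried by each principal component
    of the column-standardized data matrix X (samples x variables)."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    singular_values = np.linalg.svd(Z, compute_uv=False)
    variances = singular_values ** 2 / (len(X) - 1)
    return variances / variances.sum()

# Synthetic example: 39 samples, 6 ion concentrations driven by one shared factor.
rng = np.random.default_rng(0)
factor = rng.normal(size=(39, 1))
X = factor @ np.ones((1, 6)) + 0.3 * rng.normal(size=(39, 6))
ratios = pca_explained_variance(X)
```

The entries of `ratios` are returned in descending order and sum to one, so the first two entries correspond to the kind of percentages reported for the first and second components.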

  10. Clustering of diet, physical activity and sedentary behaviour among Australian children: cross-sectional and longitudinal associations with overweight and obesity.

    PubMed

    Leech, R M; McNaughton, S A; Timperio, A

    2015-07-01

    Evidence suggests diet, physical activity (PA) and sedentary behaviour cluster together in children, but research supporting an association with overweight/obesity is equivocal. Furthermore, the stability of clusters over time is unknown. The aim of this study was to examine the clustering of diet, PA and sedentary behaviour in Australian children and cross-sectional and longitudinal associations with overweight/obesity. Stability of obesity-related clusters over 3 years was also examined. Data were drawn from the baseline (T1: 2002/2003) and follow-up waves (T2: 2005/2006) of the Health Eating and Play Study. Parents of Australian children aged 5-6 (n=87) and 10-12 years (n=123) completed questionnaires. Children wore accelerometers and height and weight were measured. Obesity-related clusters were determined using K-medians cluster analysis. Multivariate regression models assessed cross-sectional and longitudinal associations between cluster membership, and body mass index (BMI) Z-score and weight status. Kappa statistics assessed cluster stability over time. Three clusters, labelled 'most healthy', 'energy-dense (ED) consumers who watch TV' and 'high sedentary behaviour/low moderate-to-vigorous PA' were identified at baseline and at follow-up. No cross-sectional associations were found between cluster membership, and BMI Z-score or weight status at baseline. Longitudinally, children in the 'ED consumers who watch TV' cluster had a higher odds of being overweight/obese at follow-up (odds ratio=2.8; 95% confidence interval: 1.1, 6.9; P<0.05). Tracking of cluster membership was fair to moderate in younger (K=0.24; P=0.0001) and older children (K=0.46; P<0.0001). This study identified an unhealthy cluster of TV viewing with ED food/drink consumption, which predicted overweight/obesity in a small longitudinal sample of Australian children. Cluster stability was fair to moderate over 3 years and is a novel finding. 
    Prospective research in larger samples is needed to examine how obesity-related clusters track over time and influence the development of overweight and obesity.
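
The kappa statistic used here to assess cluster stability can be sketched as below. This is the standard unweighted Cohen's kappa; it assumes that cluster labels have already been matched between the two waves, and the abstract does not state whether a weighted variant was used.

```python
from collections import Counter

def cohens_kappa(labels_t1, labels_t2):
    """Agreement between two labelings of the same children, corrected for chance."""
    n = len(labels_t1)
    observed = sum(a == b for a, b in zip(labels_t1, labels_t2)) / n
    counts_t1, counts_t2 = Counter(labels_t1), Counter(labels_t2)
    categories = set(labels_t1) | set(labels_t2)
    # Agreement expected if the two waves were labelled independently.
    expected = sum(counts_t1[c] * counts_t2[c] for c in categories) / n ** 2
    return (observed - expected) / (1 - expected)
```

A value of 1 means every child stayed in the same cluster, and 0 means agreement no better than chance.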

  11. Combined multivariate statistical techniques, Water Pollution Index (WPI) and Daniel Trend Test methods to evaluate temporal and spatial variations and trends of water quality at Shanchong River in the Northwest Basin of Lake Fuxian, China.

    PubMed

    Wang, Quan; Wu, Xianhua; Zhao, Bin; Qin, Jie; Peng, Tingchun

    2015-01-01

    Understanding spatial and temporal variations in river water quality and quantitatively evaluating the trend of changes are important in order to study and efficiently manage water resources. In this study, an analysis of Water Pollution Index (WPI), Daniel Trend Test, Cluster Analysis and Discriminant Analysis are applied as an integrated approach to quantitatively explore the spatial and temporal variations and the latent sources of water pollution in the Shanchong River basin, Northwest Basin of Lake Fuxian, China. We group all field surveys into 2 clusters (dry season and rainy season). Moreover, 14 sampling sites have been grouped into 3 clusters for the rainy season (highly polluted, moderately polluted and less polluted sites) and 2 clusters for the dry season (highly polluted and less polluted sites) based on their similarities and the level of pollution during the two seasons. The results show that the main trend of pollution was aggravated during the transition from the dry to the rainy season. The Water Pollution Index of Total Nitrogen is the highest of all pollution parameters, whereas the Chemical Oxygen Demand (Chromium) is the lowest. Our results also show that the main sources of pollution are farming activities alongside the Shanchong River, soil erosion and fish culture at Shanchong River reservoir area and domestic sewage from scattered rural residential area. Our results suggest that strategies to prevent water pollution at the Shanchong River basin need to focus on non-point pollution control by employing appropriate fertilizer formulas in farming, taking soil and water conservation measures at the Shanchong reservoir area, and purifying sewage from scattered villages.

  12. Combined Multivariate Statistical Techniques, Water Pollution Index (WPI) and Daniel Trend Test Methods to Evaluate Temporal and Spatial Variations and Trends of Water Quality at Shanchong River in the Northwest Basin of Lake Fuxian, China

    PubMed Central

    Wang, Quan; Wu, Xianhua; Zhao, Bin; Qin, Jie; Peng, Tingchun

    2015-01-01

    Understanding spatial and temporal variations in river water quality and quantitatively evaluating the trend of changes are important in order to study and efficiently manage water resources. In this study, an analysis of Water Pollution Index (WPI), Daniel Trend Test, Cluster Analysis and Discriminant Analysis are applied as an integrated approach to quantitatively explore the spatial and temporal variations and the latent sources of water pollution in the Shanchong River basin, Northwest Basin of Lake Fuxian, China. We group all field surveys into 2 clusters (dry season and rainy season). Moreover, 14 sampling sites have been grouped into 3 clusters for the rainy season (highly polluted, moderately polluted and less polluted sites) and 2 clusters for the dry season (highly polluted and less polluted sites) based on their similarities and the level of pollution during the two seasons. The results show that the main trend of pollution was aggravated during the transition from the dry to the rainy season. The Water Pollution Index of Total Nitrogen is the highest of all pollution parameters, whereas the Chemical Oxygen Demand (Chromium) is the lowest. Our results also show that the main sources of pollution are farming activities alongside the Shanchong River, soil erosion and fish culture at Shanchong River reservoir area and domestic sewage from scattered rural residential area. Our results suggest that strategies to prevent water pollution at the Shanchong River basin need to focus on non-point pollution control by employing appropriate fertilizer formulas in farming, taking soil and water conservation measures at the Shanchong reservoir area, and purifying sewage from scattered villages. PMID:25837673

  13. Evaluation of Fourier transform infrared (FT-IR) spectroscopy and chemometrics as a rapid approach for sub-typing Escherichia coli O157:H7 isolates.

    PubMed

    Davis, R; Paoli, G; Mauer, L J

    2012-09-01

    The importance of tracking outbreaks of foodborne illness and the emergence of new virulent subtypes of foodborne pathogens have created the need for rapid and reliable sub-typing methods for Escherichia coli O157:H7. Fourier transform infrared (FT-IR) spectroscopy coupled with multivariate statistical analyses was used for sub-typing 30 strains of E. coli O157:H7 that had previously been typed by multilocus variable number tandem repeat analysis (MLVA) and pulsed field gel electrophoresis (PFGE). Hierarchical cluster analysis (HCA) and canonical variate analysis (CVA) of the FT-IR spectra resulted in the clustering of the same or similar MLVA types and separation of different MLVA types of E. coli O157:H7. The developed FT-IR method showed better discriminatory power than PFGE in sub-typing E. coli O157:H7. Results also indicated the spectral relatedness between different outbreak strains. However, the grouping of some strains was not in complete agreement with the clustering based on PFGE and MLVA. Additionally, HCA of the spectra differentiated the strains into 30 sub-clusters, indicating the high specificity and suitability of the method for strain level identification. Strains were also classified (97% correct) based on the type of Shiga toxin present using CVA of the spectra. This study demonstrated that FT-IR spectroscopy is suitable for rapid (≤16 h) and economical sub-typing of E. coli O157:H7 with comparable accuracy to MLVA typing. This is the first report of using an FT-IR-based method for sub-typing E. coli O157:H7. Copyright © 2012 Elsevier Ltd. All rights reserved.

  14. Multivariate analysis of meat production traits in Murciano-Granadina goat kids.

    PubMed

    Zurita-Herrera, P; Delgado, J V; Argüello, A; Camacho, M E

    2011-07-01

    Growth, carcass quality, and meat quality data from Murciano-Granadina kids (n=61) raised under three different systems were collected. Canonical discriminatory analysis and cluster analysis of the entire meat production process and its stages were performed using the rearing systems as grouping criteria. All comparisons resulted in significant differences and indicated the existence of three products with different quality characteristics as a result of the three rearing systems. Differences among groups were greater when comparing carcass and meat qualities as compared with growth differences. The paired analyses of canonical correlations among groups of variables integrated in growth, carcass and meat quality, resulted in all being statistically significant, pointing out the canonical correlation coefficient between carcass quality and meat quality. Copyright © 2011 Elsevier Ltd. All rights reserved.

  15. Landscape Characterization and Representativeness Analysis for Understanding Sampling Network Coverage

    DOE Data Explorer

    Maddalena, Damian; Hoffman, Forrest; Kumar, Jitendra; Hargrove, William

    2014-08-01

    Sampling networks rarely conform to spatial and temporal ideals, often comprised of network sampling points which are unevenly distributed and located in less than ideal locations due to access constraints, budget limitations, or political conflict. Quantifying the global, regional, and temporal representativeness of these networks by quantifying the coverage of network infrastructure highlights the capabilities and limitations of the data collected, facilitates upscaling and downscaling for modeling purposes, and improves the planning efforts for future infrastructure investment under current conditions and future modeled scenarios. The work presented here utilizes multivariate spatiotemporal clustering analysis and representativeness analysis for quantitative landscape characterization and assessment of the Fluxnet, RAINFOR, and ForestGEO networks. Results include ecoregions that highlight patterns of bioclimatic, topographic, and edaphic variables and quantitative representativeness maps of individual and combined networks.

  16. Interpretation of sedimentological processes of coarse-grained deposits applying a novel combined cluster and discriminant analysis

    NASA Astrophysics Data System (ADS)

    Farics, Éva; Farics, Dávid; Kovács, József; Haas, János

    2017-10-01

    The main aim of this paper is to determine the depositional environments of an Upper-Eocene coarse-grained clastic succession in the Buda Hills, Hungary. First of all, we measured some commonly used parameters of samples (size, amount, roundness and sphericity) in a much more objective overall and faster way than with traditional measurement approaches, using the newly developed Rock Analyst application. For the multivariate data obtained, we applied Combined Cluster and Discriminant Analysis (CCDA) in order to determine homogeneous groups of the sampling locations based on the quantitative composition of the conglomerate as well as the shape parameters (roundness and sphericity). The result is the spatial pattern of these groups, which assists with the interpretation of the depositional processes. According to our concept, those sampling sites which belong to the same homogeneous groups were likely formed under similar geological circumstances and by similar geological processes. In the Buda Hills, we were able to distinguish various sedimentological environments within the area based on the results: fan, intermittent stream or marine.

  17. Myeloid Clusters Are Associated with a Pro-Metastatic Environment and Poor Prognosis in Smoking-Related Early Stage Non-Small Cell Lung Cancer

    PubMed Central

    Zhang, Wang; Pal, Sumanta K.; Liu, Xueli; Yang, Chunmei; Allahabadi, Sachin; Bhanji, Shaira; Figlin, Robert A.; Yu, Hua; Reckamp, Karen L.

    2013-01-01

    Background This study aimed to understand the role of myeloid cell clusters in uninvolved regional lymph nodes from early stage non-small cell lung cancer patients. Methods Uninvolved regional lymph node sections from 67 patients with stage I–III resected non-small cell lung cancer were immunostained to detect myeloid clusters, STAT3 activity and occult metastasis. Anthracosis intensity, myeloid cluster infiltration associated with anthracosis and pSTAT3 level were scored and correlated with patient survival. Multivariate Cox regression analysis was performed with prognostic variables. Human macrophages were used for in vitro nicotine treatment. Results CD68+ myeloid clusters associated with anthracosis and with an immunosuppressive and metastasis-promoting phenotype and elevated overall STAT3 activity were observed in uninvolved lymph nodes. In patients with a smoking history, myeloid cluster score significantly correlated with anthracosis intensity and pSTAT3 level (P<0.01). Nicotine activated STAT3 in macrophages in long-term culture. CD68+ myeloid clusters correlated and colocalized with occult metastasis. Myeloid cluster score was an independent prognostic factor (P = 0.049) and was associated with survival by Kaplan-Meier estimate in patients with a history of smoking (P = 0.055). The combination of myeloid cluster score with either lymph node stage or pSTAT3 level defined two populations with a significant difference in survival (P = 0.024 and P = 0.004, respectively). Conclusions Myeloid clusters facilitate a pro-metastatic microenvironment in uninvolved regional lymph nodes and associate with occult metastasis in early stage non-small cell lung cancer. Myeloid cluster score is an independent prognostic factor for survival in patients with a history of smoking, and may present a novel method to inform therapy choices in the adjuvant setting. Further validation studies are warranted. PMID:23717691

  18. Visualizing frequent patterns in large multivariate time series

    NASA Astrophysics Data System (ADS)

    Hao, M.; Marwah, M.; Janetzko, H.; Sharma, R.; Keim, D. A.; Dayal, U.; Patnaik, D.; Ramakrishnan, N.

    2011-01-01

    The detection of previously unknown, frequently occurring patterns in time series, often called motifs, has been recognized as an important task. However, it is difficult to discover and visualize these motifs as their numbers increase, especially in large multivariate time series. To find frequent motifs, we use several temporal data mining and event encoding techniques to cluster and convert a multivariate time series to a sequence of events. Then we quantify the efficiency of the discovered motifs by linking them with a performance metric. To visualize frequent patterns in a large time series with potentially hundreds of nested motifs on a single display, we introduce three novel visual analytics methods: (1) motif layout, using colored rectangles for visualizing the occurrences and hierarchical relationships of motifs in a multivariate time series, (2) motif distortion, for enlarging or shrinking motifs as appropriate for easy analysis and (3) motif merging, to combine a number of identical adjacent motif instances without cluttering the display. Analysts can interactively optimize the degree of distortion and merging to get the best possible view. A specific motif (e.g., the most efficient or least efficient motif) can be quickly detected from a large time series for further investigation. We have applied these methods to two real-world data sets: data center cooling and oil well production. The results provide important new insights into the recurring patterns.
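
    The motif-merging idea described in this abstract can be illustrated with a minimal Python sketch. It assumes the time series has already been encoded as a list of motif labels, which is a hypothetical representation and not one given in the abstract, and it collapses runs of identical adjacent labels into (label, count) pairs. The function name `merge_adjacent_motifs` and the sample labels are illustrative only.

```python
from itertools import groupby


def merge_adjacent_motifs(events):
    """Collapse runs of identical adjacent motif labels into (label, run_length) pairs."""
    return [(label, sum(1 for _ in run)) for label, run in groupby(events)]


# Hypothetical encoded sequence; the labels are placeholders, not data from the paper.
events = ["A", "A", "B", "A", "A", "A"]
merged = merge_adjacent_motifs(events)
print(merged)
```

    Only adjacent repeats are merged, so a motif that recurs after an interruption stays a separate entry and the original ordering of events is preserved.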

  19. Boredom-proneness, loneliness, social engagement and depression and their association with cognitive function in older people: a population study.

    PubMed

    Conroy, Ronan M; Golden, Jeannette; Jeffares, Isabelle; O'Neill, Desmond; McGee, Hannah

    2010-08-01

    In this study, we use data from a population survey of persons aged 65 and over living in the Irish Republic to examine the relationship of cognitive impairment, assessed using the Abbreviated Mental Test, with loneliness, boredom-proneness, social relations, and depression. Participants were randomly selected community-dwelling Irish people aged 65+ years. An Abbreviated Mental Test score of 8 or 9 out of 10 was classified as 'low normal', and a score of less than 8 as 'possible cognitive impairment'. We used clustering around latent variables analysis (CLV) to identify families of variables associated with reduced cognitive function. The overall prevalence of possible cognitive impairment was 14.7% (95% CI 12.4-17.3%). Low normal scores had a prevalence of 30.5% (95% CI 27.2-33.7%). CLV analysis identified three groups of predictors: 'Low social support' (widowed, living alone, low social support), 'personal cognitive reserve' (low social activity, no leisure exercise, never having married, loneliness and boredom-proneness), and 'sociodemographic cognitive reserve' (primary education, rural domicile). In multivariate analysis, both cognitive reserve clusters, but not social support, were independently associated with cognitive function. Loneliness and boredom-proneness are associated with reduced cognitive function in older age, and cluster with other factors associated with cognitive reserve. Both may have a common underlying mechanism in the failure to select and maintain attention on particular features of the social environment (loneliness) or the non-social environment (boredom-proneness).
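
    The score bands stated in this abstract can be written as a small Python function. This is a sketch under stated assumptions: the abstract names only the 'low normal' (8 or 9) and 'possible cognitive impairment' (below 8) bands, so the label "normal" for a score of 10 and the function name `classify_amt` are assumptions made here for illustration.

```python
def classify_amt(score):
    """Map an Abbreviated Mental Test score (0-10) to the bands named in the abstract."""
    if not 0 <= score <= 10:
        raise ValueError("AMT score must be between 0 and 10")
    if score < 8:
        return "possible cognitive impairment"
    if score <= 9:
        return "low normal"
    # Assumption: the abstract does not name the band for a score of 10.
    return "normal"


for s in (6, 8, 9, 10):
    print(s, classify_amt(s))
```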

  20. Deterministic annealing for density estimation by multivariate normal mixtures

    NASA Astrophysics Data System (ADS)

    Kloppenburg, Martin; Tavan, Paul

    1997-03-01

    An approach to maximum-likelihood density estimation by mixtures of multivariate normal distributions for large high-dimensional data sets is presented. Conventionally that problem is tackled by notoriously unstable expectation-maximization (EM) algorithms. We remove these instabilities by the introduction of soft constraints, enabling deterministic annealing. Our developments are motivated by the proof that algorithmically stable fuzzy clustering methods that are derived from statistical physics analogs are special cases of EM procedures.

  1. Genetic and environmental influences on dimensional representations of DSM-IV cluster C personality disorders: a population-based multivariate twin study.

    PubMed

    Reichborn-Kjennerud, Ted; Czajkowski, Nikolai; Neale, Michael C; Ørstavik, Ragnhild E; Torgersen, Svenn; Tambs, Kristian; Røysamb, Espen; Harris, Jennifer R; Kendler, Kenneth S

    2007-05-01

    The DSM-IV cluster C Axis II disorders include avoidant (AVPD), dependent (DEPD) and obsessive-compulsive (OCPD) personality disorders. We aimed to estimate the genetic and environmental influences on dimensional representations of these disorders and examine the validity of the cluster C construct by determining to what extent common familial factors influence the individual PDs. PDs were assessed using the Structured Interview for DSM-IV Personality (SIDP-IV) in a sample of 1386 young adult twin pairs from the Norwegian Institute of Public Health Twin Panel (NIPHTP). A single-factor independent pathway multivariate model was applied to the number of endorsed criteria for the three cluster C disorders, using the statistical modeling program Mx. The best-fitting model included genetic and unique environmental factors only, and equated parameters for males and females. Heritability ranged from 27% to 35%. The proportion of genetic variance explained by a common factor was 83, 48 and 15% respectively for AVPD, DEPD and OCPD. Common genetic and environmental factors accounted for 54% and 64% respectively of the variance in AVPD and DEPD but only 11% of the variance in OCPD. Cluster C PDs are moderately heritable. No evidence was found for shared environmental or sex effects. Common genetic and individual environmental factors account for a substantial proportion of the variance in AVPD and DEPD. However, OCPD appears to be largely etiologically distinct from the other two PDs. The results do not support the validity of the DSM-IV cluster C construct in its present form.

  2. A modern approach to the authentication and quality assessment of thyme using UV spectroscopy and chemometric analysis.

    PubMed

    Gad, Haidy A; El-Ahmady, Sherweit H; Abou-Shoer, Mohamed I; Al-Azizi, Mohamed M

    2013-01-01

    Recently, the fields of chemometrics and multivariate analysis have been widely implemented in the quality control of herbal drugs to produce precise results, which is crucial in the field of medicine. Thyme represents an essential medicinal herb that is constantly adulterated due to its resemblance to many other plants with similar organoleptic properties. To establish a simple model for the quality assessment of Thymus species using UV spectroscopy together with known chemometric techniques. The success of this model may also serve as a technique for the quality control of other herbal drugs. The model was constructed using 30 samples of authenticated Thymus vulgaris and challenged with 20 samples of different botanical origins. The methanolic extracts of all samples were assessed using UV spectroscopy together with chemometric techniques: principal component analysis (PCA), soft independent modeling of class analogy (SIMCA) and hierarchical cluster analysis (HCA). The model was able to discriminate T. vulgaris from other Thymus, Satureja, Origanum, Plectranthus and Eriocephalus species, all traded in the Egyptian market as different types of thyme. The model was also able to classify closely related species in clusters using PCA and HCA. The model was finally used to classify 12 commercial thyme varieties into clusters of species incorporated in the model as thyme or non-thyme. The model constructed is highly recommended as a simple and efficient method for distinguishing T. vulgaris from other related species as well as the classification of marketed herbs as thyme or non-thyme. Copyright © 2013 John Wiley & Sons, Ltd.

  3. Multivariate analysis of the volatile components in tobacco based on infrared-assisted extraction coupled to headspace solid-phase microextraction and gas chromatography-mass spectrometry.

    PubMed

    Yang, Yanqin; Pan, Yuanjiang; Zhou, Guojun; Chu, Guohai; Jiang, Jian; Yuan, Kailong; Xia, Qian; Cheng, Changhe

    2016-11-01

    A novel infrared-assisted extraction coupled to headspace solid-phase microextraction followed by gas chromatography with mass spectrometry method has been developed for the rapid determination of the volatile components in tobacco. The optimal extraction conditions for maximizing the extraction efficiency were as follows: 65 μm polydimethylsiloxane-divinylbenzene fiber, extraction time of 20 min, infrared power of 175 W, and distance between the infrared lamp and the headspace vial of 2 cm. Under the optimum conditions, 50 components were found to exist in all ten tobacco samples from different geographical origins. Compared with conventional water-bath heating and nonheating extraction methods, the extraction efficiency of infrared-assisted extraction was greatly improved. Furthermore, multivariate analysis including principal component analysis, hierarchical cluster analysis, and similarity analysis were performed to evaluate the chemical information of these samples and divided them into three classifications, including rich, moderate, and fresh flavors. The above-mentioned classification results were consistent with the sensory evaluation, which was pivotal and meaningful for tobacco discrimination. As a simple, fast, cost-effective, and highly efficient method, the infrared-assisted extraction coupled to headspace solid-phase microextraction technique is powerful and promising for distinguishing the geographical origins of the tobacco samples coupled to suitable chemometrics. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  4. Multivariate analysis of chromatographic retention data as a supplementary means for grouping structurally related compounds.

    PubMed

    Fasoula, S; Zisi, Ch; Sampsonidis, I; Virgiliou, Ch; Theodoridis, G; Gika, H; Nikitas, P; Pappa-Louisi, A

    2015-03-27

    In the present study a series of 45 metabolite standards belonging to four chemically similar metabolite classes (sugars, amino acids, nucleosides and nucleobases, and amines) was subjected to LC analysis on three HILIC columns under 21 different gradient conditions, with the aim of exploring whether the retention properties of these analytes are determined by the chemical group to which they belong. Two multivariate techniques, principal component analysis (PCA) and discriminant analysis (DA), were used for statistical evaluation of the chromatographic data and extraction of similarities between chemically related compounds. The total variance explained by the first two principal components of PCA was found to be about 98%, whereas both statistical analyses indicated that all analytes are successfully grouped in four clusters of chemical structure based on the retention obtained in four or at least three chromatographic runs, which, however, should be performed on two different HILIC columns. Moreover, leave-one-out cross-validation of the above retention data set showed that the chemical group to which an analyte belongs can be 95.6% correctly predicted when the analyte is subjected to LC analysis under the same four or three experimental conditions under which the whole set of analytes was run beforehand. That, in turn, may assist with disambiguation of analyte identification in complex biological extracts. Copyright © 2015 Elsevier B.V. All rights reserved.
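
    The leave-one-out cross-validation mentioned in this abstract can be sketched in a few lines of Python. The study used discriminant analysis; the sketch below swaps in a 1-nearest-neighbour classifier purely to keep the example short and dependency-free, so it shows the validation loop and not the authors' model. The function name `loo_accuracy` and the toy retention values are hypothetical.

```python
def loo_accuracy(points, labels):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier (squared Euclidean distance)."""
    correct = 0
    for i, (p, true_label) in enumerate(zip(points, labels)):
        best_label, best_dist = None, float("inf")
        for j, (q, other_label) in enumerate(zip(points, labels)):
            if i == j:
                continue  # hold out the sample being predicted
            dist = sum((a - b) ** 2 for a, b in zip(p, q))
            if dist < best_dist:
                best_dist, best_label = dist, other_label
        correct += best_label == true_label
    return correct / len(points)


# Toy retention times on two hypothetical runs; not data from the paper.
points = [(1.0, 1.2), (1.1, 1.3), (5.0, 5.5), (5.2, 5.4)]
labels = ["sugar", "sugar", "amine", "amine"]
print(loo_accuracy(points, labels))
```

    Each sample is predicted from all the others, so the reported fraction is an estimate of how well group membership generalises to an analyte that was not used to build the model.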

  5. Multivariate generalized hidden Markov regression models with random covariates: Physical exercise in an elderly population.

    PubMed

    Punzo, Antonio; Ingrassia, Salvatore; Maruotti, Antonello

    2018-04-22

    A time-varying latent variable model is proposed to jointly analyze multivariate mixed-support longitudinal data. The proposal can be viewed as an extension of hidden Markov regression models with fixed covariates (HMRMFCs), which is the state of the art for modelling longitudinal data, with a special focus on the underlying clustering structure. HMRMFCs are inadequate for applications in which a clustering structure can be identified in the distribution of the covariates, as the clustering is independent from the covariates distribution. Here, hidden Markov regression models with random covariates are introduced by explicitly specifying state-specific distributions for the covariates, with the aim of improving the recovering of the clusters in the data with respect to a fixed covariates paradigm. The hidden Markov regression models with random covariates class is defined focusing on the exponential family, in a generalized linear model framework. Model identifiability conditions are sketched, an expectation-maximization algorithm is outlined for parameter estimation, and various implementation and operational issues are discussed. Properties of the estimators of the regression coefficients, as well as of the hidden path parameters, are evaluated through simulation experiments and compared with those of HMRMFCs. The method is applied to physical activity data. Copyright © 2018 John Wiley & Sons, Ltd.

  6. A population-based study of the association of medical manpower with county trauma death rates in the United States.

    PubMed Central

    Rutledge, R; Fakhry, S M; Baker, C C; Weaver, N; Ramenofsky, M; Sheldon, G F; Meyer, A A

    1994-01-01

    OBJECTIVE: To determine the association between measures of medical manpower available to treat trauma patients and county trauma death rates in the United States. The primary hypothesis was that greater availability of medical manpower to treat trauma injury would be associated with lower trauma death rates. SUMMARY BACKGROUND DATA: When viewed from the standpoint of the number of productive years of life lost, trauma has a greater effect on health care and lost productivity in the United States than any disease. Allocation of health care manpower to treat injuries seems logical, but studies have not been done to determine its efficacy. The effect of medical manpower and hospital resource allocation on the outcome of injury in the United States has not been fully explored or adequately evaluated. METHODS: Data on trauma deaths in the United States were obtained from the National Center for Health Statistics. Data on the number of surgeons and emergency medicine physicians were obtained from the American Hospital Association and the American Medical Association. Data on physicians who have participated in the American College of Surgeons (ACS) Advanced Trauma Life Support Course (ATLS) were obtained from the ACS. Membership information for the American Association for Surgery of Trauma (AAST) was obtained from that organization. Demographic data were obtained from the United States Census Bureau. Multivariate stepwise linear regression and cluster analysis were used to model the county trauma death rates in the United States. The Statistical Analysis System (Cary, NC) for statistical analysis was used. RESULTS: Bivariate and multivariate analyses showed that a variety of medical manpower measures and demographic factors were associated with county trauma death rates in the United States. As in other studies, measures of low population density and high levels of poverty were found to be strongly associated with increased trauma death rates. 
    After accounting for these variables, using multivariate analysis and cluster analysis, increases in the following medical manpower measures were associated with decreased county trauma death rates: number of board-certified general surgeons, number of board-certified emergency medicine physicians, number of AAST members, and number of ATLS-trained physicians. CONCLUSIONS: This study confirms previous work that showed a strong relation among measures of poverty, rural setting, and increased county trauma death rates. It also found that counties with more board-certified surgeons per capita and with more surgeons with an increased interest (AAST membership) or increased training (ATLS) in trauma care have lower per-capita trauma death rates.(ABSTRACT TRUNCATED AT 400 WORDS) PMID:8185404

  7. Community health assessment using self-organizing maps and geographic information systems

    PubMed Central

    Basara, Heather G; Yuan, May

    2008-01-01

    Background From a public health perspective, a healthier community environment correlates with fewer occurrences of chronic or infectious diseases. Our premise is that community health is a non-linear function of environmental and socioeconomic effects that are not normally distributed among communities. The objective was to integrate multivariate data sets representing social, economic, and physical environmental factors to evaluate the hypothesis that communities with similar environmental characteristics exhibit similar distributions of disease. Results The SOM algorithm used the intrinsic distributions of 92 environmental variables to classify 511 communities into five clusters. SOM determined clusters were reprojected to geographic space and compared with the distributions of several health outcomes. ANOVA results indicated that the variability between community clusters was significant with respect to the spatial distribution of disease occurrence. Conclusion Our study demonstrated a positive relationship between environmental conditions and health outcomes in communities using the SOM-GIS method to overcome data and methodological challenges traditionally encountered in public health research. Results demonstrated that community health can be classified using environmental variables and that the SOM-GIS method may be applied to multivariate environmental health studies. PMID:19116020

  8. Groundwater quality in Ghaziabad district, Uttar Pradesh, India: Multivariate and health risk assessment.

    PubMed

    Chabukdhara, Mayuri; Gupta, Sanjay Kumar; Kotecha, Yatharth; Nema, Arvind K

    2017-07-01

    This study aimed to assess the quality of groundwater and potential health risk due to ingestion of heavy metals in the peri-urban and urban-industrial clusters of Ghaziabad district, Uttar Pradesh, India. Furthermore, the study aimed to evaluate heavy metals sources and their pollution level using multivariate analysis and fuzzy comprehensive assessment (FCA), respectively. Multivariate analysis using principal component analysis (PCA) showed mixed origin for Pb, Cd, Zn, Fe, and Ni, natural source for Cu and Mn and anthropogenic source for Cr. Among all the metals, Pb, Cd, Fe and Ni were above the safe limits of Bureau of Indian Standards (BIS) and World Health Organization (WHO) except Ni. Health risk in terms of hazard quotient (HQ) showed that the HQ values for children were higher than the safe level (HQ = 1) for Pb (2.4) and Cd (2.1) in pre-monsoon while in post-monsoon the value exceeded only for Pb (HQ = 1.23). The health risks of heavy metals for the adults were well within safe limits. The finding of this study indicates potential health risks to the children due to chronic exposure to contaminated groundwater in the region. Based on FCA, groundwater pollution could be categorized as quite high in the peri-urban region, and absolutely high in the urban region of Ghaziabad district. This study showed that different approaches are required for the integrated assessment of the groundwater pollution, and provides a scientific basis for the strategic future planning and comprehensive management. Copyright © 2017 Elsevier Ltd. All rights reserved.

  9. Non-destructive analysis of the conformational differences among feedstock sources and their corresponding co-products from bioethanol production with molecular spectroscopy.

    PubMed

    Gamage, I H; Jonker, A; Zhang, X; Yu, P

    2014-01-24

    The objective of this study was to determine the possibility of using molecular spectroscopy with multivariate technique as a fast method to detect the source effects among original feedstock sources of wheat and their corresponding co-products, wheat DDGS, from bioethanol production. Different sources of the bioethanol feedstock and their corresponding bioethanol co-products, three samples per source, were collected from the same newly-built bioethanol plant with current bioethanol processing technology. Multivariate molecular spectral analyses were carried out using agglomerative hierarchical cluster analysis (AHCA) and principal component analysis (PCA). The molecular spectral data of different feedstock sources and their corresponding co-products were compared at four different regions of ca. 1800-1725 cm⁻¹ (carbonyl C=O ester, mainly related to lipid structure conformation), ca. 1725-1482 cm⁻¹ (amide I and amide II region mainly related to protein structure conformation), ca. 1482-1180 cm⁻¹ (mainly associated with structural carbohydrate) and ca. 1180-800 cm⁻¹ (mainly related to carbohydrates) in complex plant-based system. The results showed that the molecular spectroscopy with multivariate technique could reveal the structural differences among the bioethanol feedstock sources and among their corresponding co-products. The AHCA and PCA analyses were able to distinguish the molecular structure differences associated with chemical functional groups among the different sources of the feedstock and their corresponding co-products. The molecular spectral differences indicated the differences in functional, biomolecular and biopolymer groups which were confirmed by wet chemical analysis. These biomolecular and biopolymer structural differences were associated with chemical and nutrient profiles and nutrient utilization and availability. Molecular spectral analyses had the potential to identify molecular structure difference among bioethanol feedstock sources and their corresponding co-products. Copyright © 2013 Elsevier B.V. All rights reserved.

  10. Non-destructive analysis of the conformational differences among feedstock sources and their corresponding co-products from bioethanol production with molecular spectroscopy

    NASA Astrophysics Data System (ADS)

    Gamage, I. H.; Jonker, A.; Zhang, X.; Yu, P.

    2014-01-01

    The objective of this study was to determine the possibility of using molecular spectroscopy with multivariate technique as a fast method to detect the source effects among original feedstock sources of wheat and their corresponding co-products, wheat DDGS, from bioethanol production. Different sources of the bioethanol feedstock and their corresponding bioethanol co-products, three samples per source, were collected from the same newly-built bioethanol plant with current bioethanol processing technology. Multivariate molecular spectral analyses were carried out using agglomerative hierarchical cluster analysis (AHCA) and principal component analysis (PCA). The molecular spectral data of different feedstock sources and their corresponding co-products were compared at four different regions of ca. 1800-1725 cm⁻¹ (carbonyl C=O ester, mainly related to lipid structure conformation), ca. 1725-1482 cm⁻¹ (amide I and amide II region mainly related to protein structure conformation), ca. 1482-1180 cm⁻¹ (mainly associated with structural carbohydrate) and ca. 1180-800 cm⁻¹ (mainly related to carbohydrates) in complex plant-based system. The results showed that the molecular spectroscopy with multivariate technique could reveal the structural differences among the bioethanol feedstock sources and among their corresponding co-products. The AHCA and PCA analyses were able to distinguish the molecular structure differences associated with chemical functional groups among the different sources of the feedstock and their corresponding co-products. The molecular spectral differences indicated the differences in functional, biomolecular and biopolymer groups which were confirmed by wet chemical analysis. These biomolecular and biopolymer structural differences were associated with chemical and nutrient profiles and nutrient utilization and availability. Molecular spectral analyses had the potential to identify molecular structure difference among bioethanol feedstock sources and their corresponding co-products.

  11. Discrimination of honeys using colorimetric sensor arrays, sensory analysis and gas chromatography techniques.

    PubMed

    Tahir, Haroon Elrasheid; Xiaobo, Zou; Xiaowei, Huang; Jiyong, Shi; Mariod, Abdalbasit Adam

    2016-09-01

    Aroma profiles of six honey varieties of different botanical origins were investigated using colorimetric sensor array, gas chromatography-mass spectrometry (GC-MS) and descriptive sensory analysis. Fifty-eight aroma compounds were identified, including 2 norisoprenoids, 5 hydrocarbons, 4 terpenes, 6 phenols, 7 ketones, 9 acids, 12 aldehydes and 13 alcohols. Twenty abundant or active compounds were chosen as key compounds to characterize honey aroma. Discrimination of the honeys was subsequently implemented using multivariate analysis, including hierarchical clustering analysis (HCA) and principal component analysis (PCA). Honeys of the same botanical origin were grouped together in the PCA score plot and HCA dendrogram. SPME-GC/MS and colorimetric sensor array were able to discriminate the honeys effectively with the advantages of being rapid, simple and low-cost. Moreover, partial least squares regression (PLSR) was applied to indicate the relationship between sensory descriptors and aroma compounds. Copyright © 2016 Elsevier Ltd. All rights reserved.

  12. Risk factors for age-related macular degeneration: findings from the Andhra Pradesh eye disease study in South India.

    PubMed

    Krishnaiah, Sannapaneni; Das, Taraprasad; Nirmalan, Praveen K; Nutheti, Rishita; Shamanna, Bindiganavale R; Rao, Gullapalli N; Thomas, Ravi

    2005-12-01

    To assess prevalence, potential risk factors, and population attributable risk percentage (PAR%) for age-related macular degeneration (AMD) in the Indian state of Andhra Pradesh. A population-based study, using a stratified, random, cluster, systematic sampling strategy, was conducted in the state of Andhra Pradesh in India from 1996 to 2000. Participants from 94 clusters in one urban and three rural areas representative of the population of Andhra Pradesh underwent a detailed interview and a detailed dilated ocular evaluation by trained professionals. In this report, the authors present the prevalence estimates of AMD and examine the association of AMD with potential risk factors in persons aged 40 to 102 years (n = 3723). AMD was defined according to the international classification and grading system. Standard bivariate and multivariate analyses were performed to identify the potential risk factors for AMD. PAR% was calculated by Levin's formula. AMD was present in 71 subjects--an age-gender-area-adjusted prevalence of 1.82% (95% confidence interval [CI], 1.39%-2.25%). Risk factors that were significant in bivariate analyses were considered for multivariate logistic regression analysis. Multivariate analysis showed that the adjusted prevalence of AMD was significantly higher in those 60 years of age or older (odds ratio [OR], 3.55; 95% CI, 1.61-7.82) and history of prior cigar smoking (OR, 3.29; 95%CI, 1.42-7.57). Presence of cortical cataract and prior cataract surgery were significantly associated with increased prevalence of AMD (adjusted OR, 2.87; 95% CI, 1.57-5.26 and 3.79; 95% CI, 2.1-6.78), respectively. The prevalence of AMD was significantly lower in light alcohol drinkers (adjusted OR, 0.38; 95% CI, 0.19-0.76) compared with nondrinkers. The PAR% for hypertension and heavy cigar smoking was 10% and 14%, respectively, in this population. The prevalence of AMD in this south Indian population is similar to those reported in other developed countries. 
    Abstinence from smoking may reduce the risk of AMD in this population.
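
    The abstract states that PAR% was calculated by Levin's formula, PAR% = 100 × p(RR − 1) / (1 + p(RR − 1)), where p is the prevalence of the exposure in the population and RR is the relative risk. A minimal Python sketch follows. The function name `levin_par_percent` and the input values are hypothetical, and the abstract does not say which risk estimate was substituted for RR, so the numbers below are not a reconstruction of the study's figures.

```python
def levin_par_percent(exposure_prevalence, relative_risk):
    """Population attributable risk percentage by Levin's formula."""
    if not 0.0 <= exposure_prevalence <= 1.0:
        raise ValueError("exposure prevalence must be a proportion in [0, 1]")
    if relative_risk <= 0.0:
        raise ValueError("relative risk must be positive")
    excess = exposure_prevalence * (relative_risk - 1.0)
    return 100.0 * excess / (1.0 + excess)


# Hypothetical inputs: 20% of the population exposed, relative risk of 2.
print(round(levin_par_percent(0.2, 2.0), 2))
```

    A relative risk of 1 gives a PAR% of zero regardless of prevalence, and a protective exposure (RR below 1) gives a negative value.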

  13. Multivariate analysis of the impacts of the turbine fuel JP-4 in a microcosm toxicity test with implications for the evaluation of ecosystem dynamics and risk assessment.

    PubMed

    Landis, W G; Matthews, R A; Markiewicz, A J; Matthews, G B

    1993-12-01

    Turbine fuels are often the only aviation fuel available in most of the world. Turbine fuels consist of numerous constituents with varying water solubilities, volatilities and toxicities. This study investigates the toxicity of the water soluble fraction (WSF) of JP-4 using the Standard Aquatic Microcosm (SAM). Multivariate analysis of the complex data, including the relatively new method of nonmetric clustering, was used and compared to more traditional analyses. Particular emphasis is placed on ecosystem dynamics in multivariate space. The WSF is prepared by vigorously mixing the fuel and the SAM microcosm media in a separatory funnel. The water phase, which contains the water-soluble fraction of JP-4, is then collected. The SAM experiment was conducted using concentrations of 0.0, 1.5 and 15% WSF. The WSF is added on day 7 of the experiments by removing 450 ml from each microcosm including the controls, then adding the appropriate amount of toxicant solution and finally bringing the final volume to 3 L with microcosm media. Analysis of the WSF was performed by purge and trap gas chromatography. The organic constituents of the WSF were not recoverable from the water column within several days of the addition of the toxicant. However, the impact of the WSF on the microcosm was apparent. In the highest initial concentration treatment group an algal bloom ensued, generated by the apparent toxicity of the WSF of JP-4 to the daphnids. As the daphnid populations recovered the algal populations decreased to control values. Multivariate methods clearly demonstrated this initial impact along with an additional oscillation separating the four treatment groups in the latter segment of the experiment. Apparent recovery may be an artifact of the projections used to describe the multivariate data. The variables that were most important in distinguishing the four groups shifted during the course of the 63 day experiment. Even this simple microcosm exhibited a variety of dynamics, with implications for biomonitoring schemes and ecological risk assessments.

  14. Multivariate genetic determinants of EEG oscillations in schizophrenia and psychotic bipolar disorder from the BSNIP study

    PubMed Central

    Narayanan, B; Soh, P; Calhoun, V D; Ruaño, G; Kocherla, M; Windemuth, A; Clementz, B A; Tamminga, C A; Sweeney, J A; Keshavan, M S; Pearlson, G D

    2015-01-01

    Schizophrenia (SZ) and psychotic bipolar disorder (PBP) are disabling psychiatric illnesses with complex and unclear etiologies. Electroencephalogram (EEG) oscillatory abnormalities in SZ and PBP probands are heritable and expressed in their relatives, but the neurobiology and genetic factors mediating these abnormalities in the psychosis dimension of either disorder are less explored. We examined the polygenic architecture of eyes-open resting state EEG frequency activity (intrinsic frequency) from 64 channels in 105 SZ, 145 PBP probands and 56 healthy controls (HCs) from the multisite BSNIP (Bipolar-Schizophrenia Network on Intermediate Phenotypes) study. One million single-nucleotide polymorphisms (SNPs) were derived from DNA. We assessed eight data-driven EEG frequency activities derived from group-independent component analysis (ICA) in conjunction with a reduced subset of 10 422 SNPs through novel multivariate association using parallel ICA (para-ICA). Genes contributing to the association were examined collectively using pathway analysis tools. Para-ICA extracted five frequency and nine SNP components, of which theta and delta activities were significantly correlated with two different gene components, comprising genes participating extensively in brain development, neurogenesis and synaptogenesis. Delta and theta abnormality was present in both SZ and PBP, while theta differed between the two disorders. Theta abnormalities were also mediated by gene clusters involved in glutamic acid pathways, cadherin and synaptic contact-based cell adhesion processes. Our data suggest plausible multifactorial genetic networks, including novel and several previously identified (DISC1) candidate risk genes, mediating low frequency delta and theta abnormalities in psychoses. The gene clusters were enriched for biological properties affecting neural circuitry and involved in brain function and/or development. PMID:26101851

  15. On the potential for the Partial Triadic Analysis to grasp the spatio-temporal variability of groundwater hydrochemistry

    NASA Astrophysics Data System (ADS)

    Gourdol, L.; Hissler, C.; Pfister, L.

    2012-04-01

    The Luxembourg sandstone aquifer is of major relevance for the national supply of drinking water in Luxembourg. The city of Luxembourg (20% of the country's population) gets almost 2/3 of its drinking water from this aquifer. As a consequence, the study of the groundwater hydrochemistry and of its spatial and temporal variations is considered of the highest priority. Since 2005, a monitoring network has been implemented by the Water Department of Luxembourg City, with a view to a more sustainable management of this strategic water resource. The data collected to date form a large and complex dataset, describing spatial and temporal variations of many hydrochemical parameters. The data treatment issue is tightly connected to this kind of water monitoring program and to complex databases. Standard multivariate statistical techniques, such as principal components analysis and hierarchical cluster analysis, have been widely used as unbiased methods for extracting meaningful information from groundwater quality data and are now classically used in many hydrogeological studies, in particular to characterize temporal or spatial hydrochemical variations induced by natural and anthropogenic factors. But these classical multivariate methods deal with two-way matrices, usually parameters/sites or parameters/time, while often the dataset resulting from qualitative water monitoring programs should be seen as a datacube parameters/sites/time. Three-way matrices, such as the one we propose here, are difficult to handle and to analyse by classical multivariate statistical tools and thus should be treated with approaches dealing with three-way data structures. One possible analysis approach consists in the use of partial triadic analysis (PTA). The PTA was previously used with success in many ecological studies but never to date in the domain of hydrogeology. Applied to the dataset of the Luxembourg Sandstone aquifer, the PTA appears to be a promising new statistical instrument for hydrogeologists, in particular to characterize temporal and spatial hydrochemical variations induced by natural and anthropogenic factors. This new approach for groundwater management offers potential for 1) identifying a common multivariate spatial structure, 2) uncovering the different hydrochemical patterns and explaining their controlling factors and 3) analysing the temporal variability of this structure and grasping hydrochemical changes.

  16. Spatiotemporal Analysis of Corn Phenoregions in the Continental United States

    NASA Astrophysics Data System (ADS)

    Konduri, V. S.; Kumar, J.; Hoffman, F. M.; Ganguly, A. R.; Hargrove, W. W.

    2017-12-01

    The delineation of regions exhibiting similar crop performance has potential benefits for agricultural planning and management, policymaking and natural resource conservation. Studies of natural ecosystems have used multivariate clustering algorithms based on environmental characteristics to identify ecoregions for species range prediction and habitat conservation. However, few studies have used clustering to delineate regions based on crop phenology. The aim of this study was to perform a spatiotemporal analysis of phenologically self-similar clusters, or phenoregions, for the major corn growing areas in the Continental United States (CONUS) for the period 2008-2016. Annual trajectories of remotely sensed normalized difference vegetation index (NDVI), a useful proxy for land surface phenology, derived from Moderate Resolution Imaging Spectroradiometer (MODIS) instruments at 8-day intervals and 250 m resolution were used as the phenological metric. Because of the large data volumes involved, the phenoregion delineation was performed using a highly scalable, unsupervised clustering technique with the help of high performance computing. These phenoregions capture the spatial variability in the timing of important crop phenological stages (like emergence and maturity dates) and thus could be used to develop more accurate parameterizations for crop models applied at regional to global scales. Moreover, historical crop performance from phenoregions, in combination with climate and soils data, could be used to improve production forecasts. The temporal variability in NDVI at each location could also be used to develop an early warning system to identify locations where the crop deviates from its expected phenological behavior. Such deviations may indicate a need for irrigation or fertilization or suggest where pest outbreaks or other disturbances have occurred.

  17. Clusters of Healthy and Unhealthy Eating Behaviors Are Associated With Body Mass Index Among Adults.

    PubMed

    Heerman, William J; Jackson, Natalie; Hargreaves, Margaret; Mulvaney, Shelagh A; Schlundt, David; Wallston, Kenneth A; Rothman, Russell L

    2017-05-01

    To identify eating styles from 6 eating behaviors and test their association with body mass index (BMI) among adults. Cross-sectional analysis of self-report survey data. Twelve primary care and specialty clinics in 5 states. Of 11,776 adult patients who consented to participate, 9,977 completed survey questions. Frequency of eating healthy food, frequency of eating unhealthy food, breakfast frequency, frequency of snacking, overall diet quality, and problem eating behaviors. The primary dependent variable was BMI, calculated from self-reported height and weight data. k-Means cluster analysis of eating behaviors was used to determine eating styles. A categorical variable representing each eating style cluster was entered in a multivariate linear regression predicting BMI, controlling for covariates. Four eating styles were identified and defined by healthy vs unhealthy diet patterns and engagement in problem eating behaviors. Each group had significantly higher average BMI than the healthy eating style: healthy with problem eating behaviors (β = 1.9; P < .001), unhealthy (β = 2.5; P < .001), and unhealthy with problem eating behaviors (β = 5.1; P < .001). Future attempts to improve eating styles should address not only the consumption of healthy foods but also snacking behaviors and the emotional component of eating. Copyright © 2017 Society for Nutrition Education and Behavior. Published by Elsevier Inc. All rights reserved.
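
    The two-stage design described above (k-means on behavior scores, then a regression of BMI on cluster membership) can be sketched as follows. This is only an illustrative sketch: the data are invented toy values, the function names `kmeans` and `dummy_betas` are hypothetical, and the covariates the study controlled for are omitted. With no covariates, the least-squares coefficient on a cluster indicator equals that cluster's mean BMI minus the mean BMI of the reference cluster, which is what the second function computes.

```python
from statistics import mean

def kmeans(points, centers, iters=20):
    """Plain Lloyd's algorithm; `centers` holds the starting centroids."""
    labels = []
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(len(centers)),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centers = []
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new_centers.append(
                tuple(mean(col) for col in zip(*members)) if members else centers[j]
            )
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers

def dummy_betas(labels, y, reference=0):
    """Coefficients of cluster dummies in a regression with no covariates:
    each is the cluster mean of y minus the reference-cluster mean."""
    means = {lab: mean(v for v, l in zip(y, labels) if l == lab)
             for lab in set(labels)}
    return {lab: m - means[reference] for lab, m in means.items()
            if lab != reference}

# Invented toy data: (healthy-eating score, problem-eating score) and BMI.
behaviors = [(9, 1), (8, 2), (9, 2), (2, 8), (1, 9), (2, 9)]
bmi = [23.0, 24.0, 25.0, 29.0, 30.0, 31.0]

labels, centers = kmeans(behaviors, centers=[(9, 1), (1, 9)])
betas = dummy_betas(labels, bmi)  # cluster 0 is the reference style
print(labels, betas)
```

    In a real analysis the behavior scores would normally be standardized before clustering, and the regression would include the covariates, so the coefficients would no longer reduce to simple mean differences.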

  18. Molecular subtyping of bladder cancer using Kohonen self-organizing maps.

    PubMed

    Borkowska, Edyta M; Kruk, Andrzej; Jedrzejczyk, Adam; Rozniecki, Marek; Jablonowski, Zbigniew; Traczyk, Magdalena; Constantinou, Maria; Banaszkiewicz, Monika; Pietrusinski, Michal; Sosnowski, Marek; Hamdy, Freddie C; Peter, Stefan; Catto, James W F; Kaluzewski, Bogdan

    2014-10-01

    Kohonen self-organizing maps (SOMs) are unsupervised Artificial Neural Networks (ANNs) that are good for low-density data visualization. They easily deal with complex and nonlinear relationships between variables. We evaluated molecular events that characterize high- and low-grade bladder cancer (BC) pathways in the tumors from 104 patients. We compared the ability of statistical clustering with that of a SOM to stratify tumors according to the risk of progression to more advanced disease. In univariable analysis, tumor stage (log rank P = 0.006) and grade (P < 0.001), HPV DNA (P < 0.004), Chromosome 9 loss (P = 0.04) and the A148T polymorphism (rs3731249) in CDKN2A (P = 0.02) were associated with progression. Multivariable analysis of these parameters identified that tumor grade (Cox regression, P = 0.001, OR 2.9 (95% CI 1.6-5.2)) and the presence of HPV DNA (P = 0.017, OR 3.8 (95% CI 1.3-11.4)) were the only independent predictors of progression. Unsupervised hierarchical clustering grouped the tumors into discrete branches but did not stratify according to progression free survival (log rank P = 0.39). These genetic variables were presented to SOM input neurons. SOMs are suitable for complex data integration, allow easy visualization of outcomes, and may stratify BC progression more robustly than hierarchical clustering. © 2014 The Authors. Cancer Medicine published by John Wiley & Sons Ltd.

  19. Chemical indices and methods of multivariate statistics as a tool for odor classification.

    PubMed

    Mahlke, Ingo T; Thiesen, Peter H; Niemeyer, Bernd

    2007-04-01

    Industrial and agricultural off-gas streams are comprised of numerous volatile compounds, many of which have substantially different odorous properties. State-of-the-art waste-gas treatment includes the characterization of these molecules and is directed at, if possible, either the avoidance of such odorants during processing or the use of existing standardized air purification techniques like bioscrubbing or afterburning, which, however, often show low efficiency in ecological and economic terms. Selective odor separation from the off-gas streams could ease many of these disadvantages but is not yet widely applicable. Thus, the aim of this paper is to identify possible model substances in selective odor separation research from 155 volatile molecules mainly originating from livestock facilities, fat refineries, and cocoa and coffee production by knowledge-based methods. All compounds are examined with regard to their structure and information-content using topological and information-theoretical indices. Resulting data are fitted in an observation matrix, and similarities between the substances are computed. Principal component analysis and k-means cluster analysis are conducted showing that clustering of indices data can depict odor information correlating well to molecular composition and molecular shape. Quantitative molecule description along with the application of such statistical means therefore provides a good classification tool of malodorant structure properties with no thermodynamic data needed. The approximate look-alike shape of odorous compounds within the clusters suggests a fair choice of possible model molecules.

  20. Combining vibrational biomolecular spectroscopy with chemometric techniques for the study of response and sensitivity of molecular structures/functional groups mainly related to lipid biopolymer to various processing applications.

    PubMed

    Yu, Gloria Qingyu; Yu, Peiqiang

    2015-09-01

    The objectives of this project were to (1) combine vibrational spectroscopy with chemometric multivariate techniques to determine the effect of processing applications on molecular structural changes of lipid biopolymer that mainly related to functional groups in green- and yellow-type Crop Development Centre (CDC) pea varieties [CDC strike (green-type) vs. CDC meadow (yellow-type)] that occurred during various processing applications; (2) relatively quantify the effect of processing applications on the antisymmetric CH3 ("CH3as") and CH2 ("CH2as") (ca. 2960 and 2923 cm(-1), respectively), symmetric CH3 ("CH3s") and CH2 ("CH2s") (ca. 2873 and 2954 cm(-1), respectively) functional groups and carbonyl C=O ester (ca. 1745 cm(-1)) spectral intensities as well as their ratios of antisymmetric CH3 to antisymmetric CH2 (ratio of CH3as to CH2as), ratios of symmetric CH3 to symmetric CH2 (ratio of CH3s to CH2s), and ratios of carbonyl C=O ester peak area to total CH peak area (ratio of C=O ester to CH); and (3) illustrate non-invasive techniques to detect the sensitivity of individual molecular functional group to the various processing applications in the recently developed different types of pea varieties. The hypothesis of this research was that processing applications modified the molecular structure profiles in the processed products as opposed to original unprocessed pea seeds. The results showed that the different processing methods had different impacts on lipid molecular functional groups. Different lipid functional groups had different sensitivity to various heat processing applications. These changes were detected by advanced molecular spectroscopy with chemometric techniques which may be highly related to lipid utilization and availability. 
    The multivariate molecular spectral analyses, cluster analysis, and principal component analysis of original spectra (without spectral parameterization) are unable to fully distinguish the structural differences in the antisymmetric and symmetric CH3 and CH2 spectral region (ca. 3001-2799 cm(-1)) and carbonyl C=O ester band region (ca. 1771-1714 cm(-1)). This result indicated that the sensitivity to detect treatment differences by multivariate analysis of cluster analysis (CLA) and principal components analysis (PCA) might be lower compared with univariate molecular spectral analysis. In the future, other more sensitive techniques such as "discriminant analysis" could be considered for discriminating and classifying structural differences. Molecular spectroscopy can be used as a non-invasive technique to study processing-induced structural changes that are related to lipid compounds in legume seeds.

  1. FTIR microspectroscopy for rapid screening and monitoring of polyunsaturated fatty acid production in commercially valuable marine yeasts and protists.

    PubMed

    Vongsvivut, Jitraporn; Heraud, Philip; Gupta, Adarsha; Puri, Munish; McNaughton, Don; Barrow, Colin J

    2013-10-21

    The increase in polyunsaturated fatty acid (PUFA) consumption has prompted research into alternative resources other than fish oil. In this study, a new approach based on focal-plane-array Fourier transform infrared (FPA-FTIR) microspectroscopy and multivariate data analysis was developed for the characterisation of some marine microorganisms. Cell and lipid compositions in lipid-rich marine yeasts collected from the Australian coast were characterised in comparison to a commercially available PUFA-producing marine fungoid protist, thraustochytrid. Multivariate classification methods provided good discriminative accuracy evidenced from (i) separation of the yeasts from thraustochytrids and distinct spectral clusters among the yeasts that conformed well to their biological identities, and (ii) correct classification of yeasts from a totally independent set using cross-validation testing. The findings further indicated additional capability of the developed FPA-FTIR methodology, when combined with partial least squares regression (PLSR) analysis, for rapid monitoring of lipid production in one of the yeasts during the growth period, which was achieved at a high accuracy compared to the results obtained from the traditional lipid analysis based on gas chromatography. The developed FTIR-based approach when coupled to programmable withdrawal devices and a cytocentrifugation module would have strong potential as a novel online monitoring technology suited for bioprocessing applications and large-scale production.

  2. Integrated Application of Multivariate Statistical Methods to Source Apportionment of Watercourses in the Liao River Basin, Northeast China

    PubMed Central

    Chen, Jiabo; Li, Fayun; Fan, Zhiping; Wang, Yanjie

    2016-01-01

    Source apportionment of river water pollution is critical in water resource management and aquatic conservation. Comprehensive application of various GIS-based multivariate statistical methods was performed to analyze datasets (2009–2011) on water quality in the Liao River system (China). Cluster analysis (CA) classified the 12 months of the year into three groups (May–October, February–April and November–January) and the 66 sampling sites into three groups (groups A, B and C) based on similarities in water quality characteristics. Discriminant analysis (DA) determined that temperature, dissolved oxygen (DO), pH, chemical oxygen demand (CODMn), 5-day biochemical oxygen demand (BOD5), NH4+–N, total phosphorus (TP) and volatile phenols were significant variables affecting temporal variations, with 81.2% correct assignments. Principal component analysis (PCA) and positive matrix factorization (PMF) identified eight potential pollution factors for each part of the data structure, explaining more than 61% of the total variance. Oxygen-consuming organics from cropland and woodland runoff were the main latent pollution factor for group A. For group B, the main pollutants were oxygen-consuming organics, oil, nutrients and fecal matter. For group C, the evaluated pollutants primarily included oxygen-consuming organics, oil and toxic organics. PMID:27775679

  3. Anger Expression Types and Interpersonal Problems in Nurses.

    PubMed

    Han, Aekyung; Won, Jongsoon; Kim, Oksoo; Lee, Sang E

    2015-06-01

    The purpose of this study was to investigate the anger expression types in nurses and to analyze the differences between the anger expression types and interpersonal problems. The data were collected from 149 nurses working in general hospitals with 300 beds or more in Seoul or Gyeonggi province, Korea. For anger expression type, the anger expression scale from the Korean State-Trait Anger Expression Inventory was used. For interpersonal problems, the short form of the Korean Inventory of Interpersonal Problems Circumplex Scales was used. Data were analyzed using descriptive statistics, cluster analysis, multivariate analysis of variance, and Duncan's multiple comparisons test. Three anger expression types in nurses were found: low-anger expression, anger-in, and anger-in/control type. From the results of multivariate analysis of variance, there were significant differences between anger expression types and interpersonal problems (Wilks lambda F = 3.52, p < .001). Additionally, anger-in/control type was found to have the most difficulty with interpersonal problems by Duncan's post hoc test (p < .050). Based on this research, the development of an anger expression intervention program for nurses is recommended to establish the means of expressing the suppressed emotions, which would help the nurses experience fewer interpersonal problems. Copyright © 2015. Published by Elsevier B.V.

  4. Multivariate analysis, mass balance techniques, and statistical tests as tools in igneous petrology: application to the Sierra de las Cruces volcanic range (Mexican Volcanic Belt).

    PubMed

    Velasco-Tapia, Fernando

    2014-01-01

    Magmatic processes have usually been identified and evaluated using qualitative or semiquantitative geochemical or isotopic tools based on a restricted number of variables. However, a more complete and quantitative view could be reached by applying multivariate analysis, mass balance techniques, and statistical tests. As an example, in this work a statistical and quantitative scheme is applied to analyze the geochemical features of the Sierra de las Cruces (SC) volcanic range (Mexican Volcanic Belt). In this locality, the volcanic activity (3.7 to 0.5 Ma) was dominantly dacitic, but the presence of spheroidal andesitic enclaves and/or diverse disequilibrium features in the majority of lavas confirms the operation of magma mixing/mingling. New discriminant-function-based multidimensional diagrams were used to discriminate tectonic setting. Statistical tests of discordancy and significance were applied to evaluate the influence of the subducting Cocos plate, which seems to be rather negligible for the SC magmas in relation to several major and trace elements. A cluster analysis following Ward's linkage rule was carried out to classify the SC volcanic rocks into geochemical groups. Finally, two mass-balance schemes were applied for the quantitative evaluation of the proportion of the end-member components (dacitic and andesitic magmas) in the comingled lavas (binary mixtures).
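
    As a rough illustration of clustering under Ward's linkage rule, the sketch below repeatedly merges the two clusters whose fusion gives the smallest increase in within-cluster sum of squares, which for clusters A and B with centroids c_A and c_B is |A||B| / (|A| + |B|) times the squared distance between c_A and c_B. The sample values (two oxide percentages per rock) are invented for the example and are not taken from the study; the function names are hypothetical, and the naive search over all pairs is meant for small data only.

```python
def centroid(cluster):
    """Component-wise mean of the points in a cluster."""
    return [sum(col) / len(cluster) for col in zip(*cluster)]

def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    ca, cb = centroid(a), centroid(b)
    d2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * d2

def ward_cluster(points, k):
    """Agglomerate singleton clusters until only k clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        # Pick the pair of clusters that is cheapest to merge.
        i, j = min(
            ((i, j) for i in range(len(clusters))
             for j in range(i + 1, len(clusters))),
            key=lambda ij: ward_cost(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Invented (SiO2 wt%, MgO wt%) values: three dacite-like, three andesite-like.
samples = [(65.1, 1.8), (64.7, 2.0), (66.0, 1.6),
           (58.2, 4.1), (57.9, 4.4), (59.0, 3.9)]
groups = ward_cluster(samples, k=2)
print(groups)
```

    Variables measured on very different scales are usually standardized or otherwise transformed before this step; the sketch skips that for brevity.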

  5. Characterization of cytochrome c as marker for retinal cell degeneration by uv/vis spectroscopic imaging

    NASA Astrophysics Data System (ADS)

    Hollmach, Julia; Schweizer, Julia; Steiner, Gerald; Knels, Lilla; Funk, Richard H. W.; Thalheim, Silko; Koch, Edmund

    2011-07-01

    Retinal diseases like age-related macular degeneration have become an important cause of visual loss owing to increasing life expectancy and lifestyle habits. Because no satisfactory treatment exists, early diagnosis and prevention are the only possibilities to stop the degeneration. The protein cytochrome c (cyt c) is a suitable marker for degeneration processes and apoptosis because it is a part of the respiratory chain and involved in the apoptotic pathway. The determination of the local distribution and oxidative state of cyt c in living cells allows the characterization of cell degeneration processes. Since cyt c exhibits characteristic absorption bands between 400 and 650 nm wavelength, uv/vis in situ spectroscopic imaging was used for its characterization in retinal ganglion cells. The large amount of data, consisting of spatial and spectral information, was processed by multivariate data analysis. The challenge consists in the identification of the molecular information of cyt c. Baseline correction, principal component analysis (PCA) and cluster analysis (CA) were performed in order to identify cyt c within the spectral dataset. The combination of PCA and CA reveals cyt c and its oxidative state. The results demonstrate that uv/vis spectroscopic imaging in conjunction with sophisticated multivariate methods is a suitable tool to characterize cyt c under in situ conditions.

  6. The Statistical Consulting Center for Astronomy (SCCA)

    NASA Technical Reports Server (NTRS)

    Akritas, Michael

    2001-01-01

    The process by which raw astronomical data acquisition is transformed into scientifically meaningful results and interpretation typically involves many statistical steps. Traditional astronomy limits itself to a narrow range of old and familiar statistical methods: means and standard deviations; least-squares methods like chi-squared minimization; and simple nonparametric procedures such as the Kolmogorov-Smirnov tests. These tools are often inadequate for the complex problems and datasets under investigation, and recent years have witnessed an increased usage of maximum-likelihood, survival analysis, multivariate analysis, wavelet and advanced time-series methods. The Statistical Consulting Center for Astronomy (SCCA) assisted astronomers in using sophisticated tools and in matching these tools to specific problems. The SCCA operated with two professors of statistics and a professor of astronomy working together. Questions were received by e-mail, and were discussed in detail with the questioner. Summaries of those questions and answers leading to new approaches were posted on the Web (www.state.psu.edu/ mga/SCCA). In addition to serving individual astronomers, the SCCA established a Web site for general use that provides hypertext links to selected on-line public-domain statistical software and services. The StatCodes site (www.astro.psu.edu/statcodes) provides over 200 links in the areas of: Bayesian statistics; censored and truncated data; correlation and regression; density estimation and smoothing; general statistics packages and information; image analysis; interactive Web tools; multivariate analysis; multivariate clustering and classification; nonparametric analysis; software written by astronomers; spatial statistics; statistical distributions; time series analysis; and visualization tools. StatCodes has received a remarkably high and constant hit rate of 250 hits/week (over 10,000/year) since its inception in mid-1997. It is of interest to scientists both within and outside of astronomy. The most popular sections are multivariate techniques, image analysis, and time series analysis. Hundreds of copies of the ASURV, SLOPES and CENS-TAU codes developed by SCCA scientists were also downloaded from the StatCodes site. In addition to formal SCCA duties, SCCA scientists continued a variety of related activities in astrostatistics, including refereeing of statistically oriented papers submitted to the Astrophysical Journal, talks in meetings including Feigelson's talk to science journalists entitled "The reemergence of astrostatistics" at the American Association for the Advancement of Science meeting, and published papers of astrostatistical content.

  7. Multivariate Genetic Correlates of the Auditory Paired Stimuli-Based P2 Event-Related Potential in the Psychosis Dimension From the BSNIP Study.

    PubMed

    Mokhtari, Mohammadreza; Narayanan, Balaji; Hamm, Jordan P; Soh, Pauline; Calhoun, Vince D; Ruaño, Gualberto; Kocherla, Mohan; Windemuth, Andreas; Clementz, Brett A; Tamminga, Carol A; Sweeney, John A; Keshavan, Matcheri S; Pearlson, Godfrey D

    2016-05-01

    The complex molecular etiology of psychosis in schizophrenia (SZ) and psychotic bipolar disorder (PBP) is not well defined, presumably due to their multifactorial genetic architecture. Neurobiological correlates of psychosis can be identified through genetic associations of intermediate phenotypes such as event-related potential (ERP) from auditory paired stimulus processing (APSP). Various ERP components of APSP are heritable and aberrant in SZ, PBP and their relatives, but their multivariate genetic factors are less explored. We investigated the multivariate polygenic association of ERP from 64-sensor auditory paired stimulus data in 149 SZ, 209 PBP probands, and 99 healthy individuals from the multisite Bipolar-Schizophrenia Network on Intermediate Phenotypes study. Multivariate association of 64-channel APSP waveforms with a subset of 16 999 single nucleotide polymorphisms (SNPs) (reduced from 1 million SNP array) was examined using parallel independent component analysis (Para-ICA). Biological pathways associated with the genes were assessed using enrichment-based analysis tools. Para-ICA identified 2 ERP components, of which one was significantly correlated with a genetic network comprising multiple linearly coupled gene variants that explained ~4% of the ERP phenotype variance. Enrichment analysis revealed epidermal growth factor, endocannabinoid signaling, glutamatergic synapse and maltohexaose transport associated with P2 component of the N1-P2 ERP waveform. This ERP component also showed deficits in SZ and PBP. Aberrant P2 component in psychosis was associated with gene networks regulating several fundamental biologic functions, either general or specific to nervous system development. The pathways and processes underlying the gene clusters play a crucial role in brain function, plausibly implicated in psychosis. © The Author 2015. Published by Oxford University Press on behalf of the Maryland Psychiatric Research Center. All rights reserved. 

  8. Characteristics of Brazilian Offenders and Victims of Interpersonal Violence: An Exploratory Study.

    PubMed

    d'Avila, Sérgio; Campos, Ana Cristina; Bernardino, Ítalo de Macedo; Cavalcante, Gigliana Maria Sobral; Nóbrega, Lorena Marques da; Ferreira, Efigênia Ferreira E

    2016-10-01

    The aim of this study was to characterize the profile of Brazilian offenders and victims of interpersonal violence, following a medicolegal and forensic perspective. A cross-sectional and exploratory study was performed in a Center of Forensic Medicine and Dentistry. The sample was made up of 1,704 victims of nonlethal interpersonal violence with some type of trauma. The victims were subject to forensic examinations by a criminal investigative team that identified and recorded the extent of the injuries. For data collection, a specific form was designed consisting of four parts according to the information provided in the medicolegal and social records: sociodemographic data of the victims, offender's characteristics, aggression characteristics, and types of injuries. Descriptive and multivariate statistics using cluster analysis (CA) were performed. The two-step cluster method was used to characterize the profile of the victims and offenders. Most of the events occurred during the nighttime (50.9%) and on weekdays (66.3%). Soft tissue injuries were the most prevalent type (94.6%). Based on the CA results, two clusters for the victims and two for the offenders were identified. Victims: Cluster 1 was formed typically by women, aged 30 to 59 years, and married; Cluster 2 was composed of men, aged 20 to 29 years, and unmarried. Offenders: Cluster 1 was characterized by men, who perpetrated violence in a community environment. Cluster 2 was formed by men, who perpetrated violence in the familiar environment. These findings revealed different risk groups with distinct characteristics for both victims and offenders, allowing the planning of targeted measures of care, prevention, and health promotion. This study assesses the profile of violence through morbidity data and significantly contributes to building an integrated system of health surveillance in Brazil, as well as linking police stations, forensic services, and emergency hospitals.

  9. Characterizing backcountry camping impacts in Great Smoky Mountains National Park

    USGS Publications Warehouse

    Leung, Y.-F.; Marion, J.L.

    1999-01-01

    This study investigates resource impacts on backcountry campsites in the Great Smoky Mountains National Park, USA. Study objectives were to enhance our understanding of camping impacts and to improve campsite impact assessment procedures by means of multivariate techniques. Three hundred and eight campsites at designated backcountry campgrounds, and 69 additional unofficial campsites, were assessed. Factor analysis of 195 established campsites on eight impact indicator variables revealed three dimensions of campsite impact: area disturbance, soil and groundcover damage, and tree-related damage. Four distinctive backcountry campsite types were identified, three of which were derived from cluster analyses of factor scores. These four backcountry campsite types characterize the intensity and areal extent of resource impacts, and they vary in locational and environmental attributes. At an aggregate level, different campsite types contributed unequally to the cumulative level of impact. The dimensional structure and typology developed in this study demonstrate that campsite impacts can be viewed and examined holistically with the use of multivariate methods. Implications for assessment procedures, management and further research are discussed.

  10. Craters on Earth, Moon, and Mars: Multivariate classification and mode of origin

    USGS Publications Warehouse

    Pike, R.J.

    1974-01-01

    Testing extraterrestrial craters and candidate terrestrial analogs for morphologic similitude is treated as a problem in numerical taxonomy. According to a principal-components solution and a cluster analysis, 402 representative craters on the Earth, the Moon, and Mars divide into two major classes of contrasting shapes and modes of origin. Craters of net accumulation of material (cratered lunar domes, Martian "calderas," and all terrestrial volcanoes except maars and tuff rings) group apart from craters of excavation (terrestrial meteorite impact and experimental explosion craters, typical Martian craters, and all other lunar craters). Maars and tuff rings belong to neither group but are transitional. The classification criteria are four independent attributes of topographic geometry derived from seven descriptive variables by the principal-components transformation. Morphometric differences between crater bowl and raised rim constitute the strongest of the four components. Although single topographic variables cannot confidently predict the genesis of individual extraterrestrial craters, multivariate statistical models constructed from several variables can distinguish consistently between large impact craters and volcanoes.

  11. Chemical modeling of groundwater in the Banat Plain, southwestern Romania, with elevated As content and co-occurring species by combining diagrams and unsupervised multivariate statistical approaches.

    PubMed

    Butaciu, Sinziana; Senila, Marin; Sarbu, Costel; Ponta, Michaela; Tanaselia, Claudiu; Cadar, Oana; Roman, Marius; Radu, Emil; Sima, Mihaela; Frentiu, Tiberiu

    2017-04-01

    The study proposes a combined model based on diagrams (Gibbs, Piper, Stuyfzand Hydrogeochemical Classification System) and unsupervised statistical approaches (Cluster Analysis, Principal Component Analysis, Fuzzy Principal Component Analysis, Fuzzy Hierarchical Cross-Clustering) to describe natural enrichment of inorganic arsenic and co-occurring species in groundwater in the Banat Plain, southwestern Romania. Speciation of inorganic As (arsenite, arsenate) was performed, and ion concentrations (Na⁺, K⁺, Ca²⁺, Mg²⁺, HCO₃⁻, Cl⁻, F⁻, SO₄²⁻, PO₄³⁻, NO₃⁻), pH, redox potential, conductivity and total dissolved substances were determined. Classical diagrams provided the hydrochemical characterization, while statistical approaches helped to establish (i) the natural mechanism of occurrence of As and F⁻ species and the anthropogenic one for NO₃⁻, SO₄²⁻, PO₄³⁻ and K⁺ and (ii) a classification of groundwater based on the content of arsenic species. The HCO₃⁻ type of local groundwater and alkaline pH (8.31-8.49) were found to be responsible for the enrichment of arsenic species and the occurrence of F⁻, but by different paths. The PO₄³⁻-AsO₄³⁻ ion exchange and water-rock interaction (silicate hydrolysis and desorption from clay) were associated with arsenate enrichment in the oxidizing aquifer. Fuzzy Hierarchical Cross-Clustering was the strongest tool for the rapid simultaneous classification of groundwaters as a function of arsenic content and hydrogeochemical characteristics. The approach indicated the Na⁺-F⁻-pH cluster as a marker for groundwater with naturally elevated As and highlighted which parameters need to be monitored. A chemical conceptual model illustrating the natural and anthropogenic paths and enrichment of As and co-occurring species in the local groundwater, supported by mineralogical analysis of rocks, was established. Copyright © 2016 Elsevier Ltd. All rights reserved.

  12. A novel exploratory chemometric approach to environmental monitoring by combining block clustering with Partial Least Square (PLS) analysis

    PubMed Central

    2013-01-01

    Background: Given the serious threats posed to terrestrial ecosystems by industrial contamination, environmental monitoring is a standard procedure used for assessing the current status of an environment or trends in environmental parameters. Measurement of metal concentrations at different trophic levels, followed by their statistical analysis using exploratory multivariate methods, can provide meaningful information on the status of environmental quality. In this context, the present paper proposes a novel chemometric approach that combines block clustering with partial least squares (PLS) analysis to investigate the accumulation patterns of metals in anthropized terrestrial ecosystems. The present study focused on copper, zinc, manganese, iron, cobalt, cadmium, nickel, and lead transfer along a soil-plant-snail food chain, and the hepatopancreas of the Roman snail (Helix pomatia) was used as a biological end-point of metal accumulation. Results: Block clustering delineates between the areas exposed to industrial and vehicular contamination. The toxic metals have similar distributions in the nettle leaves and snail hepatopancreas. PLS analysis showed that (1) zinc and copper concentrations at the lower trophic levels are the most important latent factors that contribute to metal accumulation in land snails; (2) cadmium and lead are the main determinants of the pollution pattern in areas exposed to industrial contamination; (3) at the sites located near roads, lead is the most threatening metal for terrestrial ecosystems. Conclusion: Applying block clustering with PLS to the obtained data had three major benefits. First, it helped in grouping sites depending on the type of contamination. Second, it was valuable for identifying the latent factors that contribute the most to metal accumulation in land snails. Finally, it optimized the number and type of data that are best for monitoring the status of metallic contamination in terrestrial ecosystems exposed to different kinds of anthropic pollution. PMID:23987502

  13. Rapid differentiation of Ghana cocoa beans by FT-NIR spectroscopy coupled with multivariate classification

    NASA Astrophysics Data System (ADS)

    Teye, Ernest; Huang, Xingyi; Dai, Huang; Chen, Quansheng

    2013-10-01

    A quick, accurate and reliable technique for discriminating cocoa beans according to geographical origin is essential for quality control and traceability management. This study presents the application of near infrared (NIR) spectroscopy and multivariate classification to the differentiation of Ghana cocoa beans. A total of 194 cocoa bean samples from seven cocoa growing regions were used. Principal component analysis (PCA) was used to extract relevant information from the spectral data, and this gave visible cluster trends. The performance of four multivariate classification methods was compared: linear discriminant analysis (LDA), K-nearest neighbors (KNN), back propagation artificial neural network (BPANN) and support vector machine (SVM). The performance of the models was optimized by cross validation. The results revealed that the SVM model was superior to all the other methods, with a discrimination rate of 100% in both the training and prediction sets after preprocessing with mean centering (MC). BPANN had a discrimination rate of 99.23% for the training set and 96.88% for the prediction set, while the LDA model had 96.15% and 90.63% for the training and prediction sets, respectively. The KNN model had 75.01% for the training set and 72.31% for the prediction set. The non-linear classification methods used were superior to the linear ones. Generally, the results revealed that NIR spectroscopy coupled with the SVM model could be used successfully to discriminate cocoa beans according to their geographical origins for effective quality assurance.

  14. Detecting spatial regimes in ecosystems

    EPA Pesticide Factsheets

    Research on early warning indicators has generally focused on assessing temporal transitions with limited application of these methods to detecting spatial regimes. Traditional spatial boundary detection procedures that result in ecoregion maps are typically based on ecological potential (i.e. potential vegetation), and often fail to account for ongoing changes due to stressors such as land use change and climate change and their effects on plant and animal communities. We use Fisher information, an information theory based method, on both terrestrial and aquatic animal data (US Breeding Bird Survey and marine zooplankton) to identify ecological boundaries, and compare our results to traditional early warning indicators, conventional ecoregion maps, and multivariate analysis such as nMDS (non-metric Multidimensional Scaling) and cluster analysis. We successfully detect spatial regimes and transitions in both terrestrial and aquatic systems using Fisher information. Furthermore, Fisher information provided explicit spatial information about community change that is absent from other multivariate approaches. Our results suggest that defining spatial regimes based on animal communities may better reflect ecological reality than do traditional ecoregion maps, especially in our current era of rapid and unpredictable ecological change.

  15. [Violence and post-traumatic stress disorder in childhood].

    PubMed

    Ximenes, Liana Furtado; de Oliveira, Raquel de Vasconcelos Carvalhães; de Assis, Simone Gonçalves

    2009-01-01

    This study presents the prevalence of symptoms of Posttraumatic Stress Disorder (PTSD) in 500 schoolchildren (6-13 years old) in São Gonçalo, Rio de Janeiro. It also investigates the association between PTSD, violence and other adverse events in the lives of these children. The multi-stage cluster sampling strategy involved three selection stages. Parents were interviewed about their children's behavior. The instrument used to screen symptoms of PTSD was the Child Behavior Checklist-Posttraumatic Stress Disorder Scale (CBCL-PTSD). Conflict Tactics Scales (CTS) were applied to evaluate family violence, and other scales were used to investigate the socioeconomic profile, family relationships, characteristics and adverse events in the lives of the children. Multivariate analysis was performed using a hierarchical model with a significance level of 5%. The prevalence of clinical symptoms of PTSD was 6.5%. The multivariate analysis suggested an explanatory model of PTSD characterized by 18 variables, such as the child's characteristics; specific life events; family violence; and other family factors. The results reveal that it is necessary to work with children in particularly difficult moments of their lives in order to prevent or minimize the impact of adverse events on their mental and social functioning.

  16. Objective classification of ecological status in marine water bodies using ecotoxicological information and multivariate analysis.

    PubMed

    Beiras, Ricardo; Durán, Iria

    2014-12-01

    Some relevant shortcomings have been identified in the current approach for the classification of ecological status in marine water bodies, leading to delays in the fulfillment of the Water Framework Directive objectives. Natural variability makes it difficult to set fixed reference values and boundary values for the Ecological Quality Ratios (EQR) for the biological quality elements. Biological responses to environmental degradation are frequently nonmonotonic, hampering the EQR approach. Community structure traits respond only once ecological damage has already been done and do not provide early warning signals. An alternative methodology for the classification of ecological status is proposed that integrates chemical measurements, ecotoxicological bioassays and community structure traits (species richness and diversity) and uses multivariate analyses (multidimensional scaling and cluster analysis). This approach does not depend on the arbitrary definition of fixed reference values and EQR boundary values, and it is suitable for integrating nonlinear, sensitive signals of ecological degradation. As a disadvantage, this approach demands the inclusion of sampling sites representing the full range of ecological status in each monitoring campaign. National or international agencies in charge of coastal pollution monitoring have comprehensive data sets available to overcome this limitation.

  17. Merging metagenomics and geochemistry reveals environmental controls on biological diversity and evolution.

    PubMed

    Alsop, Eric B; Boyd, Eric S; Raymond, Jason

    2014-05-28

    The metabolic strategies employed by microbes inhabiting natural systems are, in large part, dictated by the physical and geochemical properties of the environment. This study sheds light on the complex relationship between biology and environmental geochemistry using forty-three metagenomes collected from geochemically diverse and globally distributed natural systems. It is widely hypothesized that many uncommonly measured geochemical parameters affect community dynamics, and this study leverages the development and application of multidimensional biogeochemical metrics to study correlations between geochemistry and microbial ecology. Analysis techniques such as a Markov cluster-based measure of the evolutionary distance between whole communities and a principal component analysis (PCA) of the geochemical gradients between environments allow for the determination of correlations between microbial community dynamics and environmental geochemistry and provide insight into which geochemical parameters most strongly influence microbial biodiversity. By progressively building from samples taken along well defined geochemical gradients to samples widely dispersed in geochemical space, this study reveals strong links between the extent of taxonomic and functional diversification of resident communities and environmental geochemistry and reveals temperature and pH as the primary factors that have shaped the evolution of these communities. Moreover, the inclusion of extensive geochemical data in analyses reveals new links between geochemical parameters (e.g. oxygen and trace element availability) and the distribution and taxonomic diversification of communities at the functional level. Further, an overall geochemical gradient (from multivariate analyses) between natural systems provides one of the most complete predictions of microbial taxonomic and functional composition. Clustering based on the frequency with which orthologous proteins occur among metagenomes facilitated accurate prediction of the ordering of community functional composition along geochemical gradients, despite a lack of geochemical input. The consistency in the results obtained from the application of Markov clustering and multivariate methods to distinct natural systems underscores their utility in predicting the functional potential of microbial communities within a natural system based on system geochemistry alone, allowing geochemical measurements to be used to predict purely biological metrics such as microbial community composition and metabolism.

  18. Assessment of the Eutrophication-Related Environmental Parameters in Two Mediterranean Lakes by Integrating Statistical Techniques and Self-Organizing Maps

    PubMed Central

    Stefanidis, Konstantinos; Papatheodorou, George

    2018-01-01

    During the last decades, Mediterranean freshwater ecosystems, especially lakes, have been under severe pressure due to increasing eutrophication and water quality deterioration. In this article, we compared the effectiveness of different data analysis methods by assessing the contribution of environmental parameters to eutrophication processes. For this purpose, principal components analysis (PCA), cluster analysis, and a self-organizing map (SOM) were applied, using water quality data from two transboundary lakes of North Greece. SOM is considered an advanced and powerful data analysis tool because of its ability to represent complex and nonlinear relationships among multivariate data sets. The results of PCA and cluster analysis agreed with the SOM results, although the latter provided more information because of its ability to visualize the relationships among parameters. Besides nutrients, which were found to be a key factor controlling chlorophyll-a (Chl-a), water temperature was related positively to algal production, while the Secchi disk depth parameter was found to be highly important and negatively related to eutrophic conditions. In general, the SOM results were more specific and allowed direct associations between the water quality variables. Our work showed that SOMs can be used effectively in limnological studies to produce robust and interpretable results, aiding scientists and managers to cope with environmental problems such as eutrophication. PMID:29562675

  19. Spatio-temporal variability of hydro-chemical characteristics of coastal waters of Gulf of Mannar Marine Biosphere Reserve (GoMMBR), South India

    NASA Astrophysics Data System (ADS)

    Kathiravan, K.; Natesan, Usha; Vishnunath, R.

    2017-03-01

    The intention of this study was to appraise the spatial and temporal variations in the physico-chemical parameters of coastal waters of Rameswaram Island, Gulf of Mannar Marine Biosphere Reserve, south India, using multivariate statistical techniques, such as cluster analysis, factor analysis and principal component analysis. Spatio-temporal variations among the physico-chemical parameters are observed in the coastal waters of Gulf of Mannar, especially during northeast and post monsoon seasons. It is inferred that the high loadings of pH, temperature, suspended particulate matter, salinity, dissolved oxygen, biochemical oxygen demand, chlorophyll a, nutrient species of nitrogen and phosphorus strongly determine the discrimination of coastal water quality. Results highlight the important role of monsoonal variations to determine the coastal water quality around Rameswaram Island.

  20. Lipophilicity of oils and fats estimated by TLC.

    PubMed

    Naşcu-Briciu, Rodica D; Sârbu, Costel

    2013-04-01

    A representative series of natural toxins belonging to alkaloids and mycotoxins classes was investigated by TLC on classical chemically bonded plates and also on oils- and fats-impregnated plates. Their lipophilicity indices are employed in the characterization and comparison of oils and fats. The retention results allowed an accurate indirect estimation of oils and fats lipophilicity. The investigated fats and oils near classical chemically bonded phases are classified and compared by means of multivariate exploratory techniques, such as cluster analysis, principal component analysis, or fuzzy-principal component analysis. Additionally, a concrete hierarchy of oils and fats derived from the observed lipophilic character is suggested. Human fat seems to be very similar to animal fats, but also possess RP-18, RP-18W, and RP-8. © 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  1. An Updated Review of Meat Authenticity Methods and Applications.

    PubMed

    Vlachos, Antonios; Arvanitoyannis, Ioannis S; Tserkezou, Persefoni

    2016-05-18

    Adulteration of foods is a serious economic problem concerning most foodstuffs, and in particular meat products. Since high-priced meats command premium prices, producers of meat-based products might be tempted to blend these products with lower cost meat. Moreover, the labeled meat contents may not be met. Both types of adulteration are difficult to detect and lead to deterioration of product quality. For the consumer, it is of utmost importance to guarantee both authenticity and compliance with product labeling. The purpose of this article is to review the state of the art of meat authenticity with analytical and immunochemical methods with the focus on the issue of geographic origin and sensory characteristics. This review is also intended to provide an overview of the various currently applied statistical analyses (multivariate analysis (MAV), such as principal component analysis, discriminant analysis, cluster analysis, etc.) and their effectiveness for meat authenticity.

  2. Social network type and morale in old age.

    PubMed

    Litwin, H

    2001-08-01

    The aim of this research was to derive network types among an elderly population and to examine the relationship of network type to morale. Secondary analysis of data compiled by the Israeli Central Bureau of Statistics (n = 2,079) was employed, and network types were derived through K-means cluster analysis. Respondents' morale scores were regressed on network types, controlling for background and health variables. Five network types were derived. Respondents in diverse or friends networks reported the highest morale; those in exclusively family or restricted networks had the lowest. Multivariate regression analysis underscored that certain network types were second among the study variables in predicting respondents' morale, preceded only by disability level (adjusted R² = .41). Classification of network types allows consideration of the interpersonal environments of older people in relation to outcomes of interest. The relative effects on morale of elective versus obligated social ties, evident in the current analysis, are a case in point.
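
    The K-means step named in the abstract above can be sketched as follows. This is a minimal illustration of Lloyd's algorithm in plain Python, not the authors' procedure: the function name `kmeans`, the toy two-variable records and the choice of two clusters are assumptions made only for the example (the study itself derived five network types from survey variables).

```python
import random


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm on a list of equal-length numeric tuples.

    Returns (labels, centroids). Illustrative sketch only.
    """
    rng = random.Random(seed)
    # Start from k distinct records chosen at random.
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each record joins its nearest centroid
        # (squared Euclidean distance).
        labels = [
            min(
                range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])),
            )
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                # Keep an empty cluster's centroid where it was.
                new_centroids.append(centroids[j])
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return labels, centroids


# Hypothetical records: (contacts with family, contacts with friends).
records = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = kmeans(records, k=2)
print(labels, centroids)
```

    In practice the variables would be standardized first, since K-means is sensitive to scale, and several random starts would be compared.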

  3. Adulteration and cultivation region identification of American ginseng using HPLC coupled with multivariate analysis

    PubMed Central

    Yu, Chunhao; Wang, Chong-Zhi; Zhou, Chun-Jie; Wang, Bin; Han, Lide; Zhang, Chun-Feng; Wu, Xiao-Hui; Yuan, Chun-Su

    2014-01-01

    American ginseng (Panax quinquefolius) is originally grown in North America. Due to price difference and supply shortage, American ginseng recently has been cultivated in northern China. Further, in the market, some Asian ginsengs are labeled as American ginseng. In this study, forty-three American ginseng samples cultivated in the USA, Canada or China were collected and 14 ginseng saponins were determined using HPLC. HPLC coupled with hierarchical cluster analysis and principal component analysis was developed to identify the species. Subsequently, an HPLC-linear discriminant analysis was established to discriminate cultivation regions of American ginseng. This method was successfully applied to identify the sources of 6 commercial American ginseng samples. Two of them were identified as Asian ginseng, while 4 others were identified as American ginseng, which were cultivated in the USA (3) and China (1). Our newly developed method can be used to identify American ginseng with different cultivation regions. PMID:25044150

  4. Evaluation of the environmental contamination at an abandoned mining site using multivariate statistical techniques--the Rodalquilar (Southern Spain) mining district.

    PubMed

    Bagur, M G; Morales, S; López-Chicano, M

    2009-11-15

    Unsupervised and supervised pattern recognition techniques such as hierarchical cluster analysis, principal component analysis, factor analysis and linear discriminant analysis have been applied to water samples collected in the Rodalquilar mining district (Southern Spain) in order to identify different sources of environmental pollution caused by the abandoned mining industry. The effect of the mining activity on waters was monitored by determining the concentration of eleven elements (Mn, Ba, Co, Cu, Zn, As, Cd, Sb, Hg, Au and Pb) by inductively coupled plasma mass spectrometry (ICP-MS). The Box-Cox transformation has been used to bring the data set closer to a normal distribution, since the geochemical data are not normally distributed. The environmental impact is affected mainly by the mining activity developed in the zone, the acid drainage and finally by the chemical treatment used for the benefit of gold.
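
    The Box-Cox transformation mentioned above has a simple closed form, y = (x^λ - 1)/λ for λ ≠ 0 and y = ln x for λ = 0, defined for strictly positive x. The sketch below applies it for a given λ; the function name `box_cox`, the sample concentrations and the chosen λ are assumptions for illustration, and the abstract does not say how λ was selected (it is commonly estimated by maximum likelihood).

```python
import math


def box_cox(values, lam):
    """Box-Cox power transform of strictly positive values for a given lambda."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive data")
    if lam == 0:
        # Limiting case of (x**lam - 1) / lam as lam -> 0.
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]


# Hypothetical right-skewed element concentrations (arbitrary units).
concentrations = [0.2, 0.5, 1.0, 4.0, 25.0]
print(box_cox(concentrations, 0))    # log transform
print(box_cox(concentrations, 0.5))  # square-root-like transform
```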

  5. M-DAS: System for multispectral data analysis. [in Saginaw Bay, Michigan]

    NASA Technical Reports Server (NTRS)

    Johnson, R. H.

    1975-01-01

    M-DAS is a ground data processing system designed for analysis of multispectral data. M-DAS operates on multispectral data from LANDSAT, S-192, M2S and other sources in CCT form. Interactive training by operator-investigators using a variable cursor on a color display was used to derive optimum processing coefficients and data on cluster separability. An advanced multivariate normal-maximum likelihood processing algorithm was used to produce output in various formats: color-coded film images, geometrically corrected map overlays, moving displays of scene sections, coverage tabulations and categorized CCTs. The analysis procedure for M-DAS involves three phases: (1) screening and training, (2) analysis of training data to compute performance predictions and processing coefficients, and (3) processing of multichannel input data into categorized results. Typical M-DAS applications involve iteration between each of these phases. A series of photographs of the M-DAS display are used to illustrate M-DAS operation.

  6. Prognostic Subcellular Notch2, Notch3 and Jagged1 Localization Patterns in Early Triple-negative Breast Cancer.

    PubMed

    Strati, Titika-Marina; Kotoula, Vassiliki; Kostopoulos, Ioannis; Manousou, Kyriaki; Papadimitriou, Christos; Lazaridis, Georgios; Lakis, Sotiris; Pentheroudakis, George; Pectasides, Dimitrios; Pazarli, Elissavet; Christodoulou, Christos; Razis, Evangelia; Pavlakis, Kitty; Magkou, Christina; Chrisafi, Sofia; Aravantinos, Gerasimos; Bafaloukos, Dimitrios; Papakostas, Pavlos; Gogas, Helen; Kalogeras, Konstantine T; Fountzilas, George

    2017-05-01

    The Notch pathway has been implicated in triple-negative breast cancer (TNBC). Herein, we studied the subcellular localization of the less investigated Notch2 and Notch3 and that of the Jagged1 (Jag1) ligand in patients with operable TNBC. We applied immunohistochemistry for Notch2, Notch3 and Jag1 in 333 tumors from TNBC patients treated with adjuvant anthracycline-based chemotherapy. We evaluated cytoplasmic (c), membranous (m) and nuclear (n) protein localization. c-Notch2 (35% positive tumors), c-Notch3 (63%), c-Jag1 (43%), m-Notch3 (23%) and n-Jag1 (17%) were analyzed individually and by using hierarchical clustering for prognostic evaluation. Upon multivariate analysis, compared to high m-Notch3 in the absence of n-Jag1 (cluster 4), all other marker combinations (clusters 1, 2, 3) conferred significantly higher risk for relapse (p<0.05). Specific Notch3 and Jag1 subcellular localization patterns may provide clues for the behavior of the tumors and potentially for Jag1 targeting in TNBC patients. Copyright© 2017, International Institute of Anticancer Research (Dr. George J. Delinasios), All rights reserved.

  7. Assessment of heavy metals contamination in sediments from three adjacent regions of the Yellow River using metal chemical fractions and multivariate analysis techniques.

    PubMed

    Ma, Xiaoling; Zuo, Hang; Tian, Mengjing; Zhang, Liyang; Meng, Jia; Zhou, Xuening; Min, Na; Chang, Xinyuan; Liu, Ying

    2016-02-01

    Metal chemical fractions obtained by an optimized BCR three-stage extraction procedure and multivariate analysis techniques were exploited for assessing 7 heavy metals (Cr, Pb, Cd, Co, Cu, Zn and Ni) in sediments from Gansu province, Ningxia and Inner Mongolia Autonomous Regions of the Yellow River in Northern China. The results indicated that the higher susceptibility and bioavailability of Cr and Cd, which have a strong anthropogenic source, were due to their higher availability in the exchangeable fraction. A portion of Pb, Cd, Co, Zn, and Ni was found in the reducible fraction, possibly because these metals can form stable complexes with Fe and Mn oxides. A substantial amount of Pb, Co, Ni and Cu was observed in the oxidizable fraction because of their strong affinity for organic matter, which allows them to complex with humic substances in sediments. The high geo-accumulation indexes (Igeo) for Cr and Cd showed their higher environmental risk to the aquatic biota. Principal component analysis (PCA) revealed that the highly toxic Cr and Cd in polluted sites (Cd in S10, S11 and Cr in S13) may be attributed to anthropogenic sources, which was consistent with the results of dual hierarchical clustering analysis (DHCA), which could give more details about contributing sources. Copyright © 2015 Elsevier Ltd. All rights reserved.
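
    The geo-accumulation index referred to above is conventionally computed with Müller's formula, Igeo = log2(Cn / (1.5 · Bn)), where Cn is the measured concentration of a metal in the sediment and Bn its geochemical background value; the factor 1.5 allows for natural variation in the background. The sketch below assumes that conventional form. The function name and the numeric values are hypothetical and are not taken from the study.

```python
import math


def geoaccumulation_index(concentration, background, factor=1.5):
    """Mueller's geo-accumulation index: log2(Cn / (factor * Bn))."""
    if concentration <= 0 or background <= 0:
        raise ValueError("concentration and background must be positive")
    return math.log2(concentration / (factor * background))


# Hypothetical values in mg/kg: measured Cd and an assumed background.
print(geoaccumulation_index(3.0, 1.0))  # 3.0 / 1.5 = 2, so the index is 1.0
```

    Values at or below zero are usually read as unpolluted sediment, with larger values indicating progressively stronger contamination.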

  8. Risk Factors of Porcine Cysticercosis in the Eastern Cape Province, South Africa

    PubMed Central

    Krecek, Rosina Claudia; Mohammed, Hamish; Michael, Lynne Margaret; Schantz, Peter Mullineaux; Ntanjana, Lulama; Morey, Liesl; Werre, Stephen Rakem; Willingham, Arve Lee

    2012-01-01

    There is a high prevalence of Taenia solium taeniosis/cysticercosis in humans and pigs in the Eastern Cape Province (ECP) of South Africa. The objective of this study was to identify risk factors of porcine cysticercosis in select districts of the ECP. Data were collected in 2003 by interviewing 217 pig producers from the area. Blood samples were collected from 261 of their pigs, which were tested using two enzyme-linked immunosorbent assays (ELISA) for the presence of antibodies to cysticercosis. Frequencies of both owner- and pig-level characteristics were determined. For pig-level analysis, all bivariable and multivariable associations were determined using the surveylogistic procedure of the SAS/STAT® software to accommodate for the intraclass correlation that exists for clusters of pigs within one owner and for clusters of owners within a district. All tests for significance were performed at the α = 0.05 level, and adjusted odds ratios (aOR) and 95% confidence intervals (CI) were determined. Among the respondents, 48% of their households lacked a latrine, 98% slaughtered pigs at home, and 99% indicated that meat inspection services were not available. On bivariable analysis, there was a significant association between porcine infection and district (p = 0.003), breed (p = 0.041) and the absence of a latrine (p = 0.006). On multivariable analysis, the absence of a latrine was the only variable significantly associated with porcine infection (aOR = 1.89; 95% CI = 1.07, 3.35) (p = 0.028). The increased odds of porcine infection with households lacking a latrine contributes to our understanding of the transmission of this parasite in the ECP. Determining and addressing the risk factors for T. solium infection can potentially lower the very high prevalence in humans and pigs in this endemic area. PMID:22655065
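
    To make the odds ratios and confidence intervals quoted above concrete, the sketch below computes a crude (unadjusted) odds ratio with a Wald 95% interval from a 2x2 table. It is a simplification and not the study's method: the authors report adjusted odds ratios from a survey logistic model that accounts for clustering of pigs within owners and owners within districts, which this example ignores. The function name and the counts are hypothetical.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Crude odds ratio and Wald confidence interval from a 2x2 table.

    a: exposed and infected      b: exposed and not infected
    c: unexposed and infected    d: unexposed and not infected
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts: exposure = household without a latrine.
estimate, lower, upper = odds_ratio_ci(20, 10, 10, 20)
print(estimate, lower, upper)
```

    An interval that excludes 1 corresponds to significance at roughly the α = 0.05 level for this crude comparison.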

  9. Copula based flexible modeling of associations between clustered event times.

    PubMed

    Geerdens, Candida; Claeskens, Gerda; Janssen, Paul

    2016-07-01

    Multivariate survival data are characterized by the presence of correlation between event times within the same cluster. First, we build multi-dimensional copulas with flexible and possibly symmetric dependence structures for such data. In particular, clustered right-censored survival data are modeled using mixtures of max-infinitely divisible bivariate copulas. Second, these copulas are fit by a likelihood approach where the vast amount of copula derivatives present in the likelihood is approximated by finite differences. Third, we formulate conditions for clustered right-censored survival data under which an information criterion for model selection is either weakly consistent or consistent. Several of the familiar selection criteria are included. A set of four-dimensional data on time-to-mastitis is used to demonstrate the developed methodology.
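
    The idea of approximating copula derivatives by finite differences, mentioned in the abstract above, can be illustrated on a much simpler case. The sketch below uses a bivariate Clayton copula, chosen here only because its density is known in closed form (the paper works with mixtures of max-infinitely divisible copulas), and compares a central mixed finite difference of the copula with the analytic density. All function names, the parameter θ = 2 and the step size are assumptions for illustration.

```python
def clayton_cdf(u, v, theta):
    """Bivariate Clayton copula C(u, v) for theta > 0."""
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


def clayton_density(u, v, theta):
    """Analytic density: the mixed second derivative of C with respect to u and v."""
    s = u ** -theta + v ** -theta - 1.0
    return (1.0 + theta) * (u * v) ** (-theta - 1.0) * s ** (-1.0 / theta - 2.0)


def fd_density(cdf, u, v, theta, h=1e-4):
    """Central finite-difference approximation of the mixed derivative."""
    return (
        cdf(u + h, v + h, theta)
        - cdf(u + h, v - h, theta)
        - cdf(u - h, v + h, theta)
        + cdf(u - h, v - h, theta)
    ) / (4.0 * h * h)


exact = clayton_density(0.4, 0.7, 2.0)
approx = fd_density(clayton_cdf, 0.4, 0.7, 2.0)
print(exact, approx)
```

    The step size trades truncation error against floating-point cancellation, which becomes more delicate as the dimension and the order of the derivative grow.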

  10. Prognosis of chronic lymphocytic leukemia from infrared spectra of lymphocytes

    NASA Astrophysics Data System (ADS)

    Schultz, Christian P.; Liu, Kan-Zhi; Johnston, James B.; Mantsch, Henry H.

    1997-06-01

    Peripheral mononuclear cells obtained from blood of normal individuals and from patients with chronic lymphocytic leukemia (CLL) were investigated by infrared spectroscopy and multivariate statistical analysis. Not only are the spectra of CLL cells different from those of normal cells, but hierarchical clustering also separated the CLL cells into a number of subclusters, based on their different DNA content, a fact which may provide a useful diagnostic tool for staging (progression of the disease) and multiple clone detection. Moreover, there is evidence for a correlation between the increased amount of DNA in the CLL cells and the in-vivo doubling time of the lymphocytes in a given patient.

  11. Arsenic distribution and valence state variation studied by fast hierarchical length-scale morphological, compositional, and speciation imaging at the Nanoscopium, Synchrotron Soleil

    NASA Astrophysics Data System (ADS)

    Somogyi, Andrea; Medjoubi, Kadda; Sancho-Tomas, Maria; Visscher, P. T.; Baranton, Gil; Philippot, Pascal

    2017-09-01

    The understanding of real complex geological, environmental and geo-biological processes depends increasingly on in-depth non-invasive study of chemical composition and morphology. In this paper we used scanning hard X-ray nanoprobe techniques in order to study the elemental composition, morphology and As speciation in complex highly heterogeneous geological samples. Multivariate statistical analytical techniques, such as principal component analysis and clustering, were used for data interpretation. These measurements revealed the quantitative and valence state inhomogeneity of As and its relation to the total compositional and morphological variation of the sample at sub-μm scales.

  12. Subgroups Among Opiate Addicts

    ERIC Educational Resources Information Center

    Berzins, Juris I.; And Others

    1974-01-01

    The principal objective of the present investigation was to delineate homogeneous MMPI profile subgroups (types) through multivariate clustering procedures and to compare the derived (replicable) types on measures of the components of "sociopathy" as well as on other psychometric devices. (Author)

  13. Genetic Variability among Lucerne Cultivars Based on Biochemical (SDS-PAGE) and Morphological Markers

    NASA Astrophysics Data System (ADS)

    Farshadfar, M.; Farshadfar, E.

    The present research was conducted to determine the genetic variability of 18 Lucerne cultivars, based on morphological and biochemical markers. The traits studied were plant height, tiller number, biomass, dry yield, dry yield/biomass, dry leaf/dry yield, macro and micro elements, crude protein, dry matter, crude fiber and ash percentage, and SDS-PAGE in seed and leaf samples. Field experiments included 18 plots of two-meter rows. Data based on morphological, chemical and SDS-PAGE markers were analyzed using SPSSWIN software and multivariate statistical procedures: cluster analysis (UPGMA) and principal component analysis. Analysis of variance and mean comparison for morphological traits reflected significant differences among genotypes. Genotypes 13 and 15 had the greatest values for most traits. The Phenotypic Coefficient of Variation (PCV) for different characters ranged from 12.49 to 26.58%, while the Genotypic Coefficient of Variation (GCV) ranged from 6.84 to 18.84%. The greatest value of heritability (Hb) was 0.94 for stem number. Lucerne genotypes could be classified, based on morphological traits, into four clusters, and 94% of the variance among the genotypes was explained by two principal components. Based on chemical traits they were classified into five groups, and 73.492% of variance was explained by four principal components; dry matter, protein, fiber, P, K, Na, Mg and Zn had higher variance. Based on the SDS-PAGE patterns, all genotypes were classified into three clusters. The greatest genetic distance was between cultivar 10 and the others; therefore cultivar 10 would be a suitable parent in a breeding program.

  14. ClustVis: a web tool for visualizing clustering of multivariate data using Principal Component Analysis and heatmap

    PubMed Central

    Metsalu, Tauno; Vilo, Jaak

    2015-01-01

    The Principal Component Analysis (PCA) is a widely used method of reducing the dimensionality of high-dimensional data, often followed by visualizing two of the components on the scatterplot. Although widely used, the method is lacking an easy-to-use web interface that scientists with little programming skills could use to make plots of their own data. The same applies to creating heatmaps: it is possible to add conditional formatting for Excel cells to show colored heatmaps, but for more advanced features such as clustering and experimental annotations, more sophisticated analysis tools have to be used. We present a web tool called ClustVis that aims to have an intuitive user interface. Users can upload data from a simple delimited text file that can be created in a spreadsheet program. It is possible to modify data processing methods and the final appearance of the PCA and heatmap plots by using drop-down menus, text boxes, sliders etc. Appropriate defaults are given to reduce the time needed by the user to specify input parameters. As an output, users can download PCA plot and heatmap in one of the preferred file formats. This web server is freely available at http://biit.cs.ut.ee/clustvis/. PMID:25969447
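    The workflow described above (reduce dimensionality with PCA, then plot two components) can be sketched in a few lines. This is a minimal illustration using a singular value decomposition in NumPy, not the ClustVis implementation; the function name pca_scores, the unit-variance scaling option and the random demo matrix are assumptions made for the example.

```python
import numpy as np

def pca_scores(X, n_components=2, scale=True):
    """Project the rows of X (samples) onto the leading principal components.

    Columns are centered, and optionally scaled to unit variance
    (this assumes no column is constant, which would divide by zero).
    Returns the scores and the fraction of variance each component explains.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    if scale:
        Xc = Xc / X.std(axis=0, ddof=1)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    explained = s**2 / np.sum(s**2)
    return scores, explained[:n_components]

# Hypothetical demo data: 30 samples, 5 measured variables
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))
scores, explained = pca_scores(X)
print(scores.shape, explained)
```

    The two columns of scores are the coordinates that would be drawn on the scatterplot, and explained gives the share of total variance typically printed on each axis label.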

  15. Individual and couple-level risk factors for hepatitis C infection among heterosexual drug users: a multilevel dyadic analysis.

    PubMed

    McMahon, James M; Pouget, Enrique R; Tortu, Stephanie

    2007-06-01

    Hepatitis C virus (HCV) is the most common bloodborne pathogen in the United States and is a leading cause of liver-related morbidity and mortality. Although it is known that HCV is most commonly transmitted among injection drug users, the role of sexual transmission in the spread of HCV remains controversial because of inconsistent findings across studies involving heterosexual couples. A novel multilevel modeling technique designed to overcome the limitations of previous research was performed to assess multiple risk factors for HCV while partitioning the source of risk at the individual and couple level. The analysis was performed on risk exposure and HCV screening data obtained from 265 drug-using couples in East Harlem, New York City. In multivariable analysis, significant individual risk factors for HCV included a history of injection drug use, tattooing, and older age. At the couple level, HCV infection tended to cluster within couples, and this interdependence was accounted for by couples' drug-injection behavior. Individual and couple-level sexual behavior was not associated with HCV infection. Our results are consistent with prior research indicating that sexual contact plays little role in HCV transmission. Rather, couples' injection behavior appears to account for the clustering of HCV within heterosexual dyads.

  16. An improved optimization algorithm and Bayes factor termination criterion for sequential projection pursuit

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Webb-Robertson, Bobbie-Jo M.; Jarman, Kristin H.; Harvey, Scott D.

    2005-05-28

    A fundamental problem in analysis of highly multivariate spectral or chromatographic data is reduction of dimensionality. Principal components analysis (PCA), concerned with explaining the variance-covariance structure of the data, is a commonly used approach to dimension reduction. Recently an attractive alternative to PCA, sequential projection pursuit (SPP), has been introduced. Designed to elicit clustering tendencies in the data, SPP may be more appropriate when performing clustering or classification analysis. However, the existing genetic algorithm (GA) implementation of SPP has two shortcomings: computation time and inability to determine the number of factors necessary to explain the majority of the structure in the data. We address both these shortcomings. First, we introduce a new SPP algorithm, a random scan sampling algorithm (RSSA), that significantly reduces computation time. We compare the computational burden of the RSSA and GA implementations for SPP on a dataset containing Raman spectra of twelve organic compounds. Second, we propose a Bayes factor criterion, BFC, as an effective measure for selecting the number of factors needed to explain the majority of the structure in the data. We compare SPP to PCA on two datasets varying in type, size, and difficulty; in both cases SPP achieves a higher accuracy with a lower number of latent variables.

  17. The relationship between leadership, teamworking, structure, burnout and attitude to patients on acute psychiatric wards

    PubMed Central

    Nijman, Henk; Simpson, Alan; Jones, Julia

    2010-01-01

    Background Conflict (aggression, substance use, absconding, etc.) and containment (coerced medication, manual restraint, etc.) threaten the safety of patients and staff on psychiatric wards. Previous work has suggested that staff variables may be significant in explaining differences between wards in their rates of these behaviours, and that structure (ward organisation, rules and daily routines) might be the most critical of these. This paper describes the exploration of a large dataset to assess the relationship between structure and other staff variables. Methods A multivariate cross-sectional design was utilised. Data were collected from staff on 136 acute psychiatric wards in 26 NHS Trusts in England, measuring leadership, teamwork, structure, burnout and attitudes towards difficult patients. Relationships between these variables were explored through principal components analysis (PCA), structural equation modelling and cluster analysis. Results Principal components analysis resulted in the identification of each questionnaire as a separate factor, indicating that the selected instruments assessed a number of non-overlapping items relevant for ward functioning. Structural equation modelling suggested a linear model in which leadership influenced teamwork; teamwork influenced structure; structure influenced burnout; and burnout influenced feelings about difficult patients. Finally, cluster analysis identified two significantly distinct groups of wards: the larger of which had particularly good leadership, teamwork, structure, attitudes towards patients and low burnout; and the second smaller proportion which was poor on all variables and high on burnout. The better functioning cluster of wards had significantly lower rates of containment events. Conclusion The overall performance of staff teams is associated with differing rates of containment on wards. Interventions to reduce rates of containment on wards may need to address staff issues at every level, from leadership through to staff attitudes. PMID:20082064

  18. Nationwide analysis on the impact of socioeconomic land use factors and incidence of urothelial carcinoma.

    PubMed

    Brandt, Maximilian P; Gust, Kilian M; Mani, Jens; Vallo, Stefan; Höfner, Thomas; Borgmann, Hendrik; Tsaur, Igor; Thomas, Christian; Haferkamp, Axel; Herrmann, Eva; Bartsch, Georg

    2018-02-01

    Incidence rates for urothelial carcinoma (UC) have been reported to differ between countries within the European Union (EU). Besides occupational exposure to chemicals, other substances such as tobacco and nitrite in groundwater have been identified as risk factors for UC. We investigated if regional differences in UC incidence rates are associated with agricultural, industrial and residential land use. Newly diagnosed cases of UC between 2003 and 2010 were included. Information within 364 administrative districts of Germany from 2004 for land use factors were obtained and calculated as a proportion of the total area of the respective administrative district and as a smoothed proportion. Furthermore, information on smoking habits was included in our analysis. Kulldorff spatial clustering was used to detect different clusters. A negative binomial model was used to test the spatial association between UC incidence as a ratio of observed versus expected incidence rates, land use and smoking habits. We identified 437,847,834 person years with 171,086 cases of UC. Cluster analysis revealed areas with higher incidence of UC than others (p=0.0002). Multivariate analysis including significant pairwise interactions showed that the environmental factors were independently associated with UC (p<0.001). The RR was 1.066 (95% CI 1.052-1.080), 1.066 (95% CI 1.042-1.089) and 1.067 (95% CI 1.045-1.093) for agricultural, industrial and residential areas, respectively, and 0.996 (95% CI 0.869-0.999) for the proportion of never smokers. This study displays regional differences in incidence of UC in Germany. Additionally, results suggest that socioeconomic factors based on agricultural, industrial and residential land use may be associated with UC incidence rates. Copyright © 2017 Elsevier Ltd. All rights reserved.

  19. RELATIONSHIP BETWEEN PARENTS' MOTIVATION FOR PHYSICAL ACTIVITY AND THEIR BELIEFS, AND SUPPORT OF THEIR CHILDREN'S PHYSICAL ACTIVITY: A CLUSTER ANALYSIS.

    PubMed

    Naisseh, Matilda; Martinent, Guillaume; Ferrand, Claude; Hautier, Christophe

    2015-08-01

    Previous studies have neglected the multivariate nature of motivation. The purpose of the current study was to first identify motivational profiles of parents' own physical activity. Second, the study examined if such profiles differ in the way in which parents perceive their children's competence in physical activity and the importance and support given to their children's physical activity. 711 physically active parents (57% mothers; M age = 39.7 yr.; children 6-11 years old) completed the Situational Motivation Scale, the Parents' Perceptions of Physical Activity Importance and their Children's Ability Questionnaire, and the Parental Support for Physical Activity Scale. Cluster analyses indicated four motivational profiles: Highly self-determined, Moderately self-determined, Non-self-determined, and Externally motivated profiles. Parents' beliefs and support toward their children's physical activity significantly differed across these profiles. It is the first study using Self-Determination Theory that provides evidence for the interpersonal outcomes of motivation.

  20. Factors Associated with the Emergence of Highly Pathogenic Avian Influenza A (H5N1) Poultry Outbreaks in China: Evidence from an Epidemiological Investigation in Ningxia, 2012.

    PubMed

    Liu, H; Zhou, X; Zhao, Y; Zheng, D; Wang, J; Wang, X; Castellan, D; Huang, B; Wang, Z; Soares Magalhães, R J

    2017-06-01

    In April 2012, highly pathogenic avian influenza virus of the H5N1 subtype (HPAIV H5N1) emerged in poultry layers in Ningxia. A retrospective case-control study was conducted to identify possible risk factors associated with the emergence of H5N1 infection and describe and quantify the spatial variation in H5N1 infection. A multivariable logistic regression model was used to identify risk factors significantly associated with the presence of infection; residual spatial variation in H5N1 risk unaccounted for by the factors included in the multivariable model was investigated using a semivariogram. Our results indicate that HPAIV H5N1-infected farms were three times more likely to improperly dispose of farm waste [adjusted OR = 0.37; 95% CI: 0.12-0.82] and five times more likely to have had visitors on their farm within the past month [adjusted OR = 5.47; 95% CI: 1.97-15.64] compared to H5N1-non-infected farms. The variables included in the final multivariable model accounted for only 20% of the spatial clustering of H5N1 infection. The average size of a H5N1 cluster was 660 m. Bio-exclusion practices should be strengthened on poultry farms to prevent further emergence of H5N1 infection. For future poultry depopulation, operations should consider H5N1 disease clusters to be as large as 700 m. © 2015 Blackwell Verlag GmbH.

  1. Incentives, Program Configuration, and Employee Uptake of Workplace Wellness Programs.

    PubMed

    Huang, Haijing; Mattke, Soeren; Batorsky, Benjamin; Miles, Jeremy; Liu, Hangsheng; Taylor, Erin

    2016-01-01

    The aim of this study was to determine the effect of wellness program configurations and financial incentives on employee participation rate. We analyze a nationally representative survey on workplace wellness programs from 407 employers using cluster analysis and multivariable regression analysis. Employers who offer incentives and provide a comprehensive set of program offerings have higher participation rates. The effect of incentives differs by program configuration, with the strongest effect found for comprehensive and prevention-focused programs. Among intervention-focused programs, incentives are not associated with higher participation. Wellness programs can be grouped into distinct configurations, which have different workplace health focuses. Although monetary incentives can be effective in improving employee participation, the magnitude and significance of the effect is greater for some program configurations than others.

  2. Multivariate image analysis of laser-induced photothermal imaging used for detection of caries tooth

    NASA Astrophysics Data System (ADS)

    El-Sherif, Ashraf F.; Abdel Aziz, Wessam M.; El-Sharkawy, Yasser H.

    2010-08-01

    Time-resolved photothermal imaging has been investigated to characterize tooth for the purpose of discriminating between normal and caries areas of the hard tissue using thermal camera. Ultrasonic thermoelastic waves were generated in hard tissue by the absorption of fiber-coupled Q-switched Nd:YAG laser pulses operating at 1064 nm in conjunction with a laser-induced photothermal technique used to detect the thermal radiation waves for diagnosis of human tooth. The concepts behind the use of photo-thermal techniques for off-line detection of caries tooth features were presented by our group in earlier work. This paper illustrates the application of multivariate image analysis (MIA) techniques to detect the presence of caries tooth. MIA is used to rapidly detect the presence and quantity of common caries tooth features as they are scanned by the high resolution color (RGB) thermal cameras. Multivariate principal component analysis is used to decompose the acquired three-channel tooth images into a two-dimensional principal components (PC) space. Masking score point clusters in the score space and highlighting corresponding pixels in the image space of the two dominant PCs enables isolation of caries defect pixels based on contrast and color information. The technique provides a qualitative result that can be used for early stage caries tooth detection. The proposed technique can potentially be used on-line or real-time resolved to prescreen the existence of caries through vision based systems like real-time thermal camera. Experimental results on the large number of extracted teeth as well as one of the thermal image panoramas of the human teeth volunteer are investigated and presented.

  3. Interactions and accumulation differences of metal(loid)s in three sea cucumber species collected from the Northern Mediterranean Sea.

    PubMed

    Tunca, Evren; Aydın, Mehmet; Şahin, ÜlküAlver

    2016-10-01

    This study was conducted on Holothuria polii, Holothuria tubulosa, and Holothuria mammata collected from five stations with different depths in the Northern Mediterranean Sea. The body walls and guts of these holothurians were examined in terms of interactions of 10 metals (iron (Fe), copper (Cu), manganese (Mn), zinc (Zn), chromium (Cr), cobalt (Co), vanadium (V), nickel (Ni), cadmium (Cd), and lead (Pb)) and one metalloid (arsenic (As)) using a multivariate analysis, and interspecies differences were determined. The multivariate analysis of variance (MANOVA) revealed significant differences between the species in terms of metal(loid) accumulations. The principal component analysis (PCA) showed a closer association between H. tubulosa and H. polii with regard to the accumulation. The cluster analysis (CA) located Pb concentrations of the guts to the farthest place from all elements regardless of the species. A correlation analysis displayed that the element concentrations of the guts were more closely related to each other compared with those of the walls. The most inconsistent element in terms of correlations was the gut Fe contents. Accordingly, while Fe concentrations of H. mammata and H. tubulosa were correlated with all elements (except Pb) in divalent metal transporter 1 (DMT1) (divalent cation transporter 1 (DCT1) or natural resistance-associated macrophage protein 2 (NRAMP2)) belonging to the NRAM protein family, this was not the case in H. polii. Consequently, significant relationships between accumulated metal(loid)s that changed by tissues and sea cucumber species were observed.

  4. Chemical discrimination of lubricant marketing types using direct analysis in real time time-of-flight mass spectrometry.

    PubMed

    Maric, Mark; Harvey, Lauren; Tomcsak, Maren; Solano, Angelique; Bridge, Candice

    2017-06-30

    In comparison to other violent crimes, sexual assaults suffer from very low prosecution and conviction rates especially in the absence of DNA evidence. As a result, the forensic community needs to utilize other forms of trace contact evidence, like lubricant evidence, in order to provide a link between the victim and the assailant. In this study, 90 personal bottled and condom lubricants from the three main marketing types, silicone-based, water-based and condoms, were characterized by direct analysis in real time time-of-flight mass spectrometry (DART-TOFMS). The instrumental data was analyzed by multivariate statistics including hierarchical cluster analysis, principal component analysis, and linear discriminant analysis. By interpreting the mass spectral data with multivariate statistics, 12 discrete groupings were identified, indicating inherent chemical diversity not only between but within the three main marketing groups. A number of unique chemical markers, both major and minor, were identified, other than the three main chemical components (i.e. PEG, PDMS and nonoxynol-9) currently used for lubricant classification. The data was validated by a stratified 20% withheld cross-validation which demonstrated that there was minimal overlap between the groupings. Based on the groupings identified and unique features of each group, a highly discriminating statistical model was then developed that aims to provide the foundation for the development of a forensic lubricant database that may eventually be applied to casework. Copyright © 2017 John Wiley & Sons, Ltd.

  5. Big-Data RHEED analysis for understanding epitaxial film growth processes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Vasudevan, Rama K; Tselev, Alexander; Baddorf, Arthur P

    Reflection high energy electron diffraction (RHEED) has by now become a standard tool for in-situ monitoring of film growth by pulsed laser deposition and molecular beam epitaxy. Yet despite the widespread adoption and wealth of information in RHEED images, most applications are limited to observing intensity oscillations of the specular spot, and much additional information on growth is discarded. With ease of data acquisition and increased computation speeds, statistical methods to rapidly mine the dataset are now feasible. Here, we develop such an approach to the analysis of the fundamental growth processes through multivariate statistical analysis of RHEED image sequences. This approach is illustrated for growth of LaxCa1-xMnO3 films grown on etched (001) SrTiO3 substrates, but is universal. The multivariate methods including principal component analysis and k-means clustering provide insight into the relevant behaviors, the timing and nature of a disordered to ordered growth change, and highlight statistically significant patterns. Fourier analysis yields the harmonic components of the signal and allows separation of the relevant components and baselines, isolating the asymmetric nature of the step density function and the transmission spots from the imperfect layer-by-layer (LBL) growth. These studies show the promise of big data approaches to obtaining more insight into film properties during and after epitaxial film growth. Furthermore, these studies open the pathway to use forward prediction methods to potentially allow significantly more control over growth process and hence final film quality.
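    The combination of principal component analysis and k-means clustering on an image sequence, as described above, can be illustrated with a small sketch. It assumes each frame is flattened to one row, the rows are projected onto two principal components, and a plain Lloyd's k-means groups the frames in that score space. The synthetic 8x8 "spot" and "streak" frames, the helper names pca_scores and kmeans, and all parameter values are hypothetical and are not taken from the study.

```python
import numpy as np

def pca_scores(frames, n_components=2):
    """Flatten each frame to a row, center, and project onto leading PCs."""
    X = frames.reshape(len(frames), -1).astype(float)
    X -= X.mean(axis=0)
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :n_components] * s[:n_components]

def assign(points, centers):
    """Index of the nearest center for every point (Euclidean distance)."""
    d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return d.argmin(axis=1)

def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm with random initial centers drawn from the data."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign(points, centers)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return assign(points, centers), centers

# Synthetic sequence: 20 frames with a single bright spot, then 20 with a streak
rng = np.random.default_rng(1)
spot = np.zeros((8, 8))
spot[4, 4] = 5.0
streak = np.zeros((8, 8))
streak[4, :] = 5.0
frames = np.stack([spot] * 20 + [streak] * 20) + rng.normal(scale=0.1, size=(40, 8, 8))

labels, _ = kmeans(pca_scores(frames), k=2)
print(labels)  # cluster label per frame, in acquisition order
```

    Reading the labels in acquisition order shows where the sequence switches from one regime to the other, which is the kind of timing information the abstract attributes to the clustering step.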

  6. The association between Internet addiction and personality disorders in a general population-based sample.

    PubMed

    Zadra, Sina; Bischof, Gallus; Besser, Bettina; Bischof, Anja; Meyer, Christian; John, Ulrich; Rumpf, Hans-Jürgen

    2016-12-01

    Background and aims Data on Internet addiction (IA) and its association with personality disorder are rare. Previous studies are largely restricted to clinical samples and insufficient measurement of IA. Methods Cross-sectional analysis data are based on a German sub-sample (n = 168; 86 males; 71 meeting criteria for IA) with increased levels of excessive Internet use derived from a general population sample (n = 15,023). IA was assessed with a comprehensive standardized interview using the structure of the Composite International Diagnostic Interview and the criteria of Internet Gaming Disorder as suggested in DSM-5. Impulsivity, attention deficit hyperactivity disorder, and self-esteem were assessed with the widely used questionnaires. Results Participants with IA showed higher frequencies of personality disorders (29.6%) compared to those without IA (9.3%; p < .001). In males with IA, Cluster C personality disorders were more prevalent than among non-addicted males. Compared to participants who had IA only, lower rates of remission of IA were found among participants with IA and additional cluster B personality disorder. Personality disorders were significantly associated with IA in multivariate analysis. Comorbidity of IA and personality disorders must be considered in prevention and treatment.

  7. Assessment of metal pollution based on multivariate statistical modeling of 'hot spot' sediments from the Black Sea.

    PubMed

    Simeonov, V; Massart, D L; Andreev, G; Tsakovski, S

    2000-11-01

    The paper deals with application of different statistical methods like cluster and principal components analysis (PCA), partial least squares (PLSs) modeling. These approaches are an efficient tool in achieving better understanding about the contamination of two gulf regions in Black Sea. As objects of the study, a collection of marine sediment samples from Varna and Bourgas "hot spots" gulf areas are used. In the present case the use of cluster and PCA make it possible to separate three zones of the marine environment with different levels of pollution by interpretation of the sediment analysis (Bourgas gulf, Varna gulf and lake buffer zone). Further, the extraction of four latent factors offers a specific interpretation of the possible pollution sources and separates natural from anthropogenic factors, the latter originating from contamination by chemical, oil refinery and steel-work enterprises. Finally, the PLSs modeling gives a better opportunity in predicting contaminant concentration on tracer (or tracers) element as compared to the one-dimensional approach of the baseline models. The results of the study are important not only in local aspect as they allow quick response in finding solutions and decision making but also in broader sense as a useful environmetrical methodology.

  8. Molecular Subgroup of Primary Prostate Cancer Presenting with Metastatic Biology.

    PubMed

    Walker, Steven M; Knight, Laura A; McCavigan, Andrena M; Logan, Gemma E; Berge, Viktor; Sherif, Amir; Pandha, Hardev; Warren, Anne Y; Davidson, Catherine; Uprichard, Adam; Blayney, Jaine K; Price, Bethanie; Jellema, Gera L; Steele, Christopher J; Svindland, Aud; McDade, Simon S; Eden, Christopher G; Foster, Chris; Mills, Ian G; Neal, David E; Mason, Malcolm D; Kay, Elaine W; Waugh, David J; Harkin, D Paul; Watson, R William; Clarke, Noel W; Kennedy, Richard D

    2017-10-01

    Approximately 4-25% of patients with early prostate cancer develop disease recurrence following radical prostatectomy. To identify a molecular subgroup of prostate cancers with metastatic potential at presentation resulting in a high risk of recurrence following radical prostatectomy. Unsupervised hierarchical clustering was performed using gene expression data from 70 primary resections, 31 metastatic lymph nodes, and 25 normal prostate samples. Independent assay validation was performed using 322 radical prostatectomy samples from four sites with a mean follow-up of 50.3 months. Molecular subgroups were identified using unsupervised hierarchical clustering. A partial least squares approach was used to generate a gene expression assay. Relationships with outcome (time to biochemical and metastatic recurrence) were analysed using multivariable Cox regression and log-rank analysis. A molecular subgroup of primary prostate cancer with biology similar to metastatic disease was identified. A 70-transcript signature (metastatic assay) was developed and independently validated in the radical prostatectomy samples. Metastatic assay positive patients had increased risk of biochemical recurrence (multivariable hazard ratio [HR] 1.62 [1.13-2.33]; p=0.0092) and metastatic recurrence (multivariable HR=3.20 [1.76-5.80]; p=0.0001). A combined model with Cancer of the Prostate Risk Assessment post surgical (CAPRA-S) identified patients at an increased risk of biochemical and metastatic recurrence superior to either model alone (HR=2.67 [1.90-3.75]; p<0.0001 and HR=7.53 [4.13-13.73]; p<0.0001, respectively). The retrospective nature of the study is acknowledged as a potential limitation. The metastatic assay may identify a molecular subgroup of primary prostate cancers with metastatic potential. The metastatic assay may improve the ability to detect patients at risk of metastatic recurrence following radical prostatectomy. The impact of adjuvant therapies should be assessed in this higher-risk population. Copyright © 2017 European Association of Urology. Published by Elsevier B.V. All rights reserved.

  9. Relationships of sedimentation and benthic macroinvertebrate assemblages in headwater streams using systematic longitudinal sampling at the reach scale.

    PubMed

    Longing, S D; Voshell, J R; Dolloff, C A; Roghair, C N

    2010-02-01

    Investigating relationships of benthic invertebrates and sedimentation is challenging because fine sediments act as both natural habitat and potential pollutant at excessive levels. Determining benthic invertebrate sensitivity to sedimentation in forested headwater streams comprised of extreme spatial heterogeneity is even more challenging, especially when associated with a background of historical and intense watershed disturbances that contributed unknown amounts of fine sediments to stream channels. This scenario exists in the Chattahoochee National Forest where such historical timber harvests and contemporary land-uses associated with recreation have potentially affected the biological integrity of headwater streams. In this study, we investigated relationships of sedimentation and the macroinvertebrate assemblages among 14 headwater streams in the forest by assigning 30, 100-m reaches to low, medium, or high sedimentation categories. Only one of 17 assemblage metrics (percent clingers) varied significantly across these categories. This finding has important implications for biological assessments by showing streams impaired physically by sedimentation may not be impaired biologically, at least using traditional approaches. A subsequent multivariate cluster analysis and indicator species analysis were used to further investigate biological patterns independent of sedimentation categories. Evaluating the distribution of sedimentation categories among biological reach clusters showed both within-stream variability in reach-scale sedimentation and sedimentation categories generally variable within clusters, reflecting the overall physical heterogeneity of these headwater environments. Furthermore, relationships of individual sedimentation variables and metrics across the biological cluster groups were weak, suggesting these measures of sedimentation are poor predictors of macroinvertebrate assemblage structure when using a systematic longitudinal sampling design. Further investigations of invertebrate sensitivity to sedimentation may benefit from assessments of sedimentation impacts at different spatial scales, determining compromised physical habitat integrity of specific taxa and developing alternative streambed measures for quantifying sedimentation.

  10. Periodontal Microorganisms and Cardiovascular Risk Markers in Youth With Type 1 Diabetes and Without Diabetes.

    PubMed

    Merchant, Anwar T; Nahhas, Georges J; Wadwa, R Paul; Zhang, Jiajia; Tang, Yifan; Johnson, Lonnie R; Maahs, David M; Bishop, Franziska; Teles, Ricardo; Morrato, Elaine H

    2016-04-01

    A subset of periodontal microorganisms has been associated with cardiovascular disease (CVD), which is the leading complication of type 1 diabetes (t1DM). The authors therefore evaluated the association between periodontal microorganism groups and early markers of CVD in youth with t1DM. A cross-sectional analysis was conducted among youth aged 12 to 19 years at enrollment; 105 had t1DM for ≥5 years and were seeking care at the Barbara Davis Center, University of Colorado, from 2009 to 2011, and 71 did not have diabetes. Subgingival plaque samples were assessed for counts of 41 periodontal microorganisms using DNA-DNA hybridization. Microorganisms were classified using cluster analysis into four groups named red-orange, orange-green, blue/other, and yellow/other, modified from Socransky's color scheme for periodontal microorganisms. Subsamples (54 with t1DM and 48 without diabetes) also received a periodontal examination at the University of Colorado School of Dental Medicine. Participants were ≈15 years old on average, and 74% were white. Mean periodontal probing depth was 2 mm (SE 0.02), and 17% had bleeding on probing. In multivariable analyses, glycated hemoglobin (HbA1c) was inversely associated with the yellow/other cluster (microorganisms that are not associated with periodontal disease) among youth with t1DM. Blood pressure, triglycerides, low-density lipoprotein, high-density lipoprotein, and total cholesterol were not associated with microorganism clusters in this group. HbA1c was not associated with periodontal microorganism clusters among youth without diabetes. Among youth with t1DM who had good oral health, periodontal microorganisms were not associated with CVD risk factors.

  11. Phytochemical, phylogenetic, and anti-inflammatory evaluation of 43 Urtica accessions (stinging nettle) based on UPLC-Q-TOF-MS metabolomic profiles.

    PubMed

    Farag, Mohamed A; Weigend, Maximilian; Luebert, Federico; Brokamp, Grischa; Wessjohann, Ludger A

    2013-12-01

    Several species of the genus Urtica (especially Urtica dioica, Urticaceae) are used medicinally to treat a variety of ailments. To better understand the chemical diversity of the genus and to compare different accessions and different taxa of Urtica, 63 leaf samples representing a broad geographical, taxonomical and morphological diversity were evaluated under controlled conditions. A molecular phylogeny for all taxa investigated was prepared to compare phytochemical similarity with phylogenetic relatedness. Metabolites were analyzed via UPLC-PDA-MS and multivariate data analyses. In total, 43 metabolites were identified, with phenolic compounds and hydroxy fatty acids as the dominant substance groups. Principal component analysis (PCA) and hierarchical clustering analysis (HCA) provide a first structured chemotaxonomy of the genus. The molecular data present a highly resolved phylogeny with well-supported clades and subclades. U. dioica is retrieved as both para- and polyphyletic. European members of the U. dioica group and the North American subspecies share a rather similar metabolite profile and were largely retrieved as one, nearly exclusive cluster by metabolite data. This latter cluster also includes the remotely related Urtica urens, which is pharmaceutically used in the same way as U. dioica. However, most highly supported phylogenetic clades were not retrieved in the metabolite cluster analyses. Overall, metabolite profiles indicate considerable phytochemical diversity in the genus, which largely falls into a group characterized by high contents of hydroxy fatty acids (e.g., most Andean-American taxa) and another group characterized by high contents of phenolic acids (especially the U. dioica-clade). Anti-inflammatory in vitro COX1 enzyme inhibition assays suggest that bioactivity may be predicted by gross metabolic profiling in Urtica. Copyright © 2013. Published by Elsevier Ltd.

  12. Quantifying long-term human impact in contrasting environments: Statistical analysis of modern and fossil pollen records

    NASA Astrophysics Data System (ADS)

    Broothaerts, Nils; López-Sáez, José Antonio; Verstraeten, Gert

    2017-04-01

    Reconstructing and quantifying human impact is an important step to understand human-environment interactions in the past. Quantitative measures of human impact on the landscape are needed to fully understand long-term influence of anthropogenic land cover changes on the global climate, ecosystems and geomorphic processes. Nevertheless, quantifying past human impact is not straightforward. Recently, multivariate statistical analysis of fossil pollen records have been proposed to characterize vegetation changes and to get insights in past human impact. Although statistical analysis of fossil pollen data can provide useful insights in anthropogenic driven vegetation changes, still it cannot be used as an absolute quantification of past human impact. To overcome this shortcoming, in this study fossil pollen records were included in a multivariate statistical analysis (cluster analysis and non-metric multidimensional scaling (NMDS)) together with modern pollen data and modern vegetation data. The information on the modern pollen and vegetation dataset can be used to get a better interpretation of the representativeness of the fossil pollen records, and can result in a full quantification of human impact in the past. This methodology was applied in two contrasting environments: SW Turkey and Central Spain. For each region, fossil pollen data from different study sites were integrated, together with modern pollen data and information on modern vegetation. In this way, arboreal cover, grazing pressure and agricultural activities in the past were reconstructed and quantified. The data from SW Turkey provides new integrated information on changing human impact through time in the Sagalassos territory, and shows that human impact was most intense during the Hellenistic and Roman Period (ca. 2200-1750 cal a BP) and decreased and changed in nature afterwards. 
The data from central Spain shows for several sites that arboreal cover decreases below 5% from the Feudal period onwards (ca. 850 cal a BP), related to increasing human impact on the landscape. At other study sites, arboreal cover remained above 25% despite significant human impact. Overall, the presented examples from two contrasting environments show how cluster analysis and NMDS of modern and fossil pollen data can help to provide quantitative insights into anthropogenic land cover changes. Our study extensively discusses and illustrates the possibilities and limitations of statistical analysis of pollen data for quantifying human-induced land use changes.

  13. Symptoms and subjective quality of life in post-traumatic stress disorder: a longitudinal study.

    PubMed

    Giacco, Domenico; Matanov, Aleksandra; Priebe, Stefan

    2013-01-01

    Evidence suggests that post-traumatic stress disorder (PTSD) is associated with substantially reduced subjective quality of life (SQOL). This study aimed to explore whether and how changes in the levels of PTSD symptom clusters of intrusion, avoidance and hyperarousal are associated with changes in SQOL. Two samples with PTSD following the war in former Yugoslavia were studied, i.e. a representative sample of 530 people in five Balkan countries and a non-representative sample of 215 refugees in three Western European countries. They were assessed on average eight years after the war and re-interviewed one year later. PTSD symptoms were assessed on the Impact of Event Scale - Revised and SQOL on the Manchester Short Assessment of Quality of Life. Linear regression and a two-wave cross lagged panel analysis were used to explore the association between PTSD symptom clusters and SQOL. The findings in the two samples were consistent. Symptom reduction over time was associated with improved SQOL. In multivariable analyses adjusted for the influence of all three clusters, gender and time since war exposure, only changes in hyperarousal symptoms were significantly associated with changes in SQOL. The two-wave cross-lagged panel analysis suggested that the link between hyperarousal symptoms and SQOL is bidirectional. Low SQOL of patients with war-related PTSD is particularly associated with hyperarousal symptoms. The findings suggest a bidirectional influence: a reduction in hyperarousal symptoms may result in improved SQOL, and improvements in SQOL may lead to reduced hyperarousal symptoms.

  14. Radiation- and Age-Associated Changes in Peripheral Blood Dendritic Cell Populations among Aging Atomic Bomb Survivors in Japan.

    PubMed

    Kajimura, Junko; Lynch, Heather E; Geyer, Susan; French, Benjamin; Yamaoka, Mika; Shterev, Ivo D; Sempowski, Gregory D; Kyoizumi, Seishi; Yoshida, Kengo; Misumi, Munechika; Ohishi, Waka; Hayashi, Tomonori; Nakachi, Kei; Kusunoki, Yoichiro

    2017-11-30

    Previous immunological studies in atomic bomb survivors have suggested that radiation exposure leads to long-lasting changes, similar to immunological aging observed in T-cell-adaptive immunity. However, to our knowledge, late effects of radiation on dendritic cells (DCs), the key coordinators for activation and differentiation of T cells, have not yet been investigated in humans. In the current study, we hypothesized that numerical and functional decreases would be observed in relationship to radiation dose in circulating conventional DCs (cDCs) and plasmacytoid DCs (pDCs) among 229 Japanese A-bomb survivors. Overall, the evidence did not support this hypothesis, with no overall changes in DCs or functional changes observed with radiation dose. Multivariable regression analysis for radiation dose, age and gender effects revealed that total DC counts as well as subpopulation counts decreased in relationship to increasing age. Further analyses revealed that in women, absolute numbers of pDCs showed significant decreases with radiation dose. A hierarchical clustering analysis of gene expression profiles in DCs after Toll-like receptor stimulation in vitro identified two clusters of participants that differed in age-associated expression levels of genes involved in antigen presentation and cytokine/chemokine production in cDCs. These results suggest that DC counts decrease and expression levels of gene clusters change with age. More than 60 years after radiation exposure, we also observed changes in pDC counts associated with radiation, but only among women.

  15. Radiation- and Age-Associated Changes in Peripheral Blood Dendritic Cell Populations among Aging Atomic Bomb Survivors in Japan.

    PubMed

    Kajimura, Junko; Lynch, Heather E; Geyer, Susan; French, Benjamin; Yamaoka, Mika; Shterev, Ivo D; Sempowski, Gregory D; Kyoizumi, Seishi; Yoshida, Kengo; Misumi, Munechika; Ohishi, Waka; Hayashi, Tomonori; Nakachi, Kei; Kusunoki, Yoichiro

    2018-01-01

    Previous immunological studies in atomic bomb survivors have suggested that radiation exposure leads to long-lasting changes, similar to immunological aging observed in T-cell-adaptive immunity. However, to our knowledge, late effects of radiation on dendritic cells (DCs), the key coordinators for activation and differentiation of T cells, have not yet been investigated in humans. In the current study, we hypothesized that numerical and functional decreases would be observed in relationship to radiation dose in circulating conventional DCs (cDCs) and plasmacytoid DCs (pDCs) among 229 Japanese A-bomb survivors. Overall, the evidence did not support this hypothesis, with no overall changes in DCs or functional changes observed with radiation dose. Multivariable regression analysis for radiation dose, age and gender effects revealed that total DC counts as well as subpopulation counts decreased in relationship to increasing age. Further analyses revealed that in women, absolute numbers of pDCs showed significant decreases with radiation dose. A hierarchical clustering analysis of gene expression profiles in DCs after Toll-like receptor stimulation in vitro identified two clusters of participants that differed in age-associated expression levels of genes involved in antigen presentation and cytokine/chemokine production in cDCs. These results suggest that DC counts decrease and expression levels of gene clusters change with age. More than 60 years after radiation exposure, we also observed changes in pDC counts associated with radiation, but only among women.

  16. Detection of compatibility between baclofen and excipients with aid of infrared spectroscopy and chemometry

    NASA Astrophysics Data System (ADS)

    Rojek, Barbara; Wesolowski, Marek; Suchacz, Bogdan

    2013-12-01

    In this paper, infrared (IR) spectroscopy and two multivariate exploration techniques, principal component analysis (PCA) and cluster analysis (CA), were applied as supportive methods for the detection of physicochemical incompatibilities between baclofen and excipients. In the course of the research, the most useful rotational strategy in PCA proved to be varimax normalized, while in CA Ward's hierarchical agglomeration with the Euclidean distance measure yielded the most interpretable results. Chemometric calculations confirmed the suitability of PCA and CA as auxiliary methods for interpreting infrared spectra in order to recognize whether compatibilities or incompatibilities between the active substance and excipients occur. On the basis of the IR spectra and the results of PCA and CA, it was possible to demonstrate that the presence of lactose, β-cyclodextrin and meglumine in binary mixtures produces interactions with baclofen. The results were verified using differential scanning calorimetry, differential thermal analysis, thermogravimetry/differential thermogravimetry and X-ray powder diffraction analyses.
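
    The Ward agglomeration with Euclidean distance named in this abstract can be illustrated with a minimal sketch. It is not the authors' software or data: the function names and the toy points are hypothetical, and real inputs would be spectral feature vectors. At each step it merges the two clusters whose union increases the within-cluster sum of squares the least.

```python
def centroid(members):
    """Component-wise mean of a list of equal-length points."""
    return [sum(col) / len(members) for col in zip(*members)]

def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    ca, cb = centroid(a), centroid(b)
    d2 = sum((x - y) ** 2 for x, y in zip(ca, cb))  # squared Euclidean distance
    return len(a) * len(b) / (len(a) + len(b)) * d2

def ward_agglomerate(points, k):
    """Greedily merge clusters by Ward's criterion until k clusters remain."""
    clusters = [[tuple(p)] for p in points]  # start with one cluster per point
    while len(clusters) > k:
        i, j = min(
            ((i, j) for i in range(len(clusters))
             for j in range(i + 1, len(clusters))),
            key=lambda ij: ward_cost(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Hypothetical toy data: two tight pairs and one isolated point.
toy_points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
toy_clusters = ward_agglomerate(toy_points, 3)
```

    This brute-force version recomputes every pairwise cost at each merge, which is adequate for a few dozen samples; a library implementation would be preferable for larger data sets.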

  17. The assessment of processes controlling the spatial distribution of hydrogeochemical groundwater types in Mali using multivariate statistics

    NASA Astrophysics Data System (ADS)

    Keita, Souleymane; Zhonghua, Tang

    2017-10-01

    Sustainable management of groundwater resources is a major issue for developing countries, especially in Mali. The multiple uses of groundwater have led countries to promote sound management policies for sustainable use of groundwater resources. For this reason, each country needs data enabling it to monitor and predict changes in the resource. Also, given the importance of groundwater quality changes, often marked by the recurrence of droughts, the potential impacts of the regional and geological setting on groundwater resources require careful study. Unfortunately, recent decades have seen a considerable reduction in national capacities to ensure hydrogeological monitoring and the production of quality data for decision making. The purpose of this work is to use the groundwater data and translate them into useful information that can improve water resources management capacity in Mali. In this paper, we used groundwater analytical data from accredited laboratories in Mali to carry out a national-scale assessment of the groundwater types and their distribution. We adapted multivariate statistical methods to classify 2035 groundwater samples into seven main groundwater types and built a national-scale map from the results. We used a two-level K-means clustering technique to examine the hydro-geochemical records as percentages of the total concentrations of major ions, namely sodium (Na), magnesium (Mg), calcium (Ca), chloride (Cl), bicarbonate (HCO3), and sulphate (SO4). The first step of clustering formed 20 groups, and these groups were then re-clustered to produce the final seven groundwater types. The results were verified and confirmed using Principal Component Analysis (PCA) and RockWare (Aq.QA) software. We found that HCO3 was the most dominant anion throughout the country and that Cl and SO4 were only important in some local zones. The dominant cations were Na and Mg. Also, major ion ratios changed with geographical location and with geological and climatic conditions.
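
    The two-level clustering described above (ion concentrations converted to percentages, a first K-means pass into many groups, then a second pass over the group centroids) can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the function names, the random initialization, and the choice to re-cluster the first-level centroids are hypothetical, and the study's values of 20 and 7 would be passed as `k1` and `k2`.

```python
import random

def to_percent(sample):
    """Express each ion concentration as a percentage of the sample total."""
    total = sum(sample)
    return [100.0 * v / total for v in sample]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    return [sum(col) / len(points) for col in zip(*points)]

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's algorithm; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumption: random data points as seeds
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                  for p in points]
        # Update step: an empty cluster keeps its previous centroid.
        new = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new.append(mean(members) if members else centroids[j])
        if new == centroids:
            break
        centroids = new
    return centroids, labels

def two_level_kmeans(samples, k1, k2, seed=0):
    """Cluster samples into k1 groups, then group the k1 centroids into k2 types."""
    pts = [to_percent(s) for s in samples]
    cents, first = kmeans(pts, k1, seed=seed)
    _, second = kmeans(cents, k2, seed=seed)
    return [second[lab] for lab in first]

# Hypothetical samples in the order Na, Mg, Ca, Cl, HCO3, SO4.
demo = [[60, 5, 5, 5, 20, 5], [62, 5, 5, 5, 20, 5],
        [5, 5, 60, 5, 20, 5], [5, 5, 62, 5, 20, 5]]
demo_types = two_level_kmeans(demo, k1=3, k2=2)
```

    Working on percentages rather than raw concentrations makes samples with the same ionic proportions but different total mineralization fall into the same type, which appears to be the intent of the normalization mentioned in the abstract.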

  18. Inflammatory Mediator Profiles Differ in Sepsis Patients With and Without Bacteremia.

    PubMed

    Mosevoll, Knut Anders; Skrede, Steinar; Markussen, Dagfinn Lunde; Fanebust, Hans Rune; Flaatten, Hans Kristian; Aßmus, Jörg; Reikvam, Håkon; Bruserud, Øystein

    2018-01-01

    Systemic levels of cytokines are altered during infection and sepsis. This prospective observational study aimed to investigate whether plasma levels of multiple inflammatory mediators differed between sepsis patients with and those without bacteremia during the initial phase of hospitalization. A total of 80 sepsis patients with proven bacterial infection and no immunosuppression were included in the study. Plasma samples were collected within 24 h of hospitalization, and Luminex® analysis was performed on 35 mediators: 16 cytokines, six growth factors, four adhesion molecules, and nine matrix metalloproteases (MMPs)/tissue inhibitors of metalloproteinases (TIMPs). Forty-two patients (52.5%) and 38 patients (47.5%) showed positive and negative blood cultures, respectively. There were significant differences in plasma levels of six soluble mediators between the "bacteremia" and "non-bacteremia" groups, using the Mann-Whitney U test (p < 0.0014): tumor necrosis factor alpha (TNFα), CCL4, E-selectin, vascular cell adhesion molecule-1 (VCAM-1), intercellular adhesion molecule-1 (ICAM-1), and TIMP-1. Ten soluble mediators also significantly differed in plasma levels between the two groups, with p-values ranging between 0.05 and 0.0014: interleukin (IL)-1ra, IL-10, CCL2, CCL5, CXCL8, CXCL11, hepatocyte growth factor, MMP-8, TIMP-2, and TIMP-4. VCAM-1 showed the most robust results using univariate and multivariate logistic regression. Using unsupervised hierarchical clustering, we found that TNFα, CCL4, E-selectin, VCAM-1, ICAM-1, and TIMP-1 could be used to discriminate between patients with and those without bacteremia. Patients with bacteremia were mainly clustered in two separate groups (two upper clusters, 41/42, 98%), with higher levels of the mediators. One (2%) patient with bacteremia was clustered in the lower cluster, which comprised most of the patients without bacteremia (23/38, 61%) (χ² test, p < 0.0001). 
Our study showed that analysis of the plasma inflammatory mediator profile could represent a potential strategy for early identification of patients with bacteremia.

  19. Application of unsupervised pattern recognition approaches for exploration of rare earth elements in Se-Chahun iron ore, central Iran

    NASA Astrophysics Data System (ADS)

    Sarparandeh, Mohammadali; Hezarkhani, Ardeshir

    2017-12-01

    The use of efficient methods for data processing has always been of interest to researchers in the field of earth sciences. Pattern recognition techniques are appropriate methods for high-dimensional data such as geochemical data. Evaluation of the geochemical distribution of rare earth elements (REEs) requires the use of such methods. In particular, the multivariate nature of REE data makes them a good target for numerical analysis. The main subject of this paper is application of unsupervised pattern recognition approaches in evaluating geochemical distribution of REEs in the Kiruna type magnetite-apatite deposit of Se-Chahun. For this purpose, 42 bulk lithology samples were collected from the Se-Chahun iron ore deposit. In this study, 14 rare earth elements were measured with inductively coupled plasma mass spectrometry (ICP-MS). Pattern recognition makes it possible to evaluate the relations between the samples based on all these 14 features, simultaneously. In addition to providing easy solutions, discovery of the hidden information and relations of data samples is the advantage of these methods. Therefore, four clustering methods (unsupervised pattern recognition) - including a modified basic sequential algorithmic scheme (MBSAS), hierarchical (agglomerative) clustering, k-means clustering and self-organizing map (SOM) - were applied and results were evaluated using the silhouette criterion. Samples were clustered in four types. Finally, the results of this study were validated with geological facts and analysis results from, for example, scanning electron microscopy (SEM), X-ray diffraction (XRD), ICP-MS and optical mineralogy. The results of the k-means clustering and SOM methods have the best matches with reality, with experimental studies of samples and with field surveys. Since only the rare earth elements are used in this division, a good agreement of the results with lithology is considerable. 
It is concluded that the combination of the proposed methods and geological studies leads to finding some hidden information, and this approach has the best results compared to using only one of them.
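
    The silhouette criterion used above to compare clustering results can be sketched in a few lines. This is a generic illustration with hypothetical function names and toy points, not the authors' code. For each sample, a is the mean distance to the other members of its own cluster, b is the smallest mean distance to any other cluster, and the silhouette width is (b - a) / max(a, b).

```python
import math

def euclid(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def silhouette_score(points, labels):
    """Mean silhouette width over all points; needs at least two clusters."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        # The distance from p to itself is 0, so divide by len(own) - 1.
        a = sum(euclid(p, q) for q in own) / (len(own) - 1)
        b = min(sum(euclid(p, q) for q in other) / len(other)
                for key, other in clusters.items() if key != lab)
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Hypothetical two-feature samples forming two obvious groups.
pts = [[0, 0], [0, 1], [10, 0], [10, 1]]
good = silhouette_score(pts, [0, 0, 1, 1])
bad = silhouette_score(pts, [0, 1, 0, 1])
```

    Values close to 1 indicate compact, well-separated clusters, while negative values suggest samples assigned to the wrong cluster, so the same function can rank the output of different clustering methods on the same data.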

  20. Pathways to Late-Life Suicidal Behavior: Cluster Analysis and Predictive Validation of Suicidal Behavior in a Sample of Older Adults With Major Depression.

    PubMed

    Szanto, Katalin; Galfalvy, Hanga; Vanyukov, Polina M; Keilp, John G; Dombrovski, Alexandre Y

    Clinical heterogeneity is a key challenge to understanding suicidal risk, as different pathways to suicidal behavior are likely to exist. We aimed to identify such pathways by uncovering latent classes of late-life depression cases and relating them to prior and future suicidal behavior. Data were collected from June 2010 to September 2015. In this longitudinal study we examined distinct associations of clinical and cognitive/decision-making factors with suicidal behavior in 194 older (50+ years) nondemented, depressed patients; 57 nonpsychiatric healthy controls provided benchmark data. The DSM-IV was used to establish diagnostic criteria. We identified multivariate patterns of risk factors, defining clusters based on personality traits, perceived social support, cognitive performance, and decision-making in an analysis blinded to participants' history of suicidal behavior. We validated these clusters using past and prospective suicidal ideation and behavior. Of 5 clusters identified, 3 were associated with high risk for suicidal behavior: (1) cognitive deficits, dysfunctional personality, low social support, high willingness to delay future rewards, and overrepresentation of high-lethality attempters; (2) high-personality pathology (ie, low self-esteem), minimal or no cognitive deficits, and overrepresentation of low-lethality attempters and ideators; (3) cognitive deficits, inability to delay future rewards, and similar distribution of high- and low-lethality attempters. There were significant between-cluster differences in number (P < .001) and lethality (P = .002) of past suicide attempts and in the likelihood of future suicide attempts (P = .010, 30 attempts by 22 patients, 2 fatal) and emergency psychiatric hospitalizations to prevent suicide (P = .005, 31 participants). 
Three pathways to suicidal behavior in older patients were found, marked by (1) very high levels of cognitive and dispositional risk factors suggesting a dementia prodrome, (2) dysfunctional personality traits, and (3) impulsive decision-making and cognitive deficits. © Copyright 2018 Physicians Postgraduate Press, Inc.

  1. Insights into an original pocket-ligand pair classification: a promising tool for ligand profile prediction.

    PubMed

    Pérot, Stéphanie; Regad, Leslie; Reynès, Christelle; Spérandio, Olivier; Miteva, Maria A; Villoutreix, Bruno O; Camproux, Anne-Claude

    2013-01-01

    Pockets are today at the cornerstones of modern drug discovery projects and at the crossroad of several research fields, from structural biology to mathematical modeling. Being able to predict if a small molecule could bind to one or more protein targets or if a protein could bind to some given ligands is very useful for drug discovery endeavors, anticipation of binding to off- and anti-targets. To date, several studies explore such questions from chemogenomic approach to reverse docking methods. Most of these studies have been performed either from the viewpoint of ligands or targets. However it seems valuable to use information from both ligands and target binding pockets. Hence, we present a multivariate approach relating ligand properties with protein pocket properties from the analysis of known ligand-protein interactions. We explored and optimized the pocket-ligand pair space by combining pocket and ligand descriptors using Principal Component Analysis and developed a classification engine on this paired space, revealing five main clusters of pocket-ligand pairs sharing specific and similar structural or physico-chemical properties. These pocket-ligand pair clusters highlight correspondences between pocket and ligand topological and physico-chemical properties and capture relevant information with respect to protein-ligand interactions. Based on these pocket-ligand correspondences, a protocol of prediction of clusters sharing similarity in terms of recognition characteristics is developed for a given pocket-ligand complex and gives high performances. It is then extended to cluster prediction for a given pocket in order to acquire knowledge about its expected ligand profile or to cluster prediction for a given ligand in order to acquire knowledge about its expected pocket profile. 
This prediction approach shows promising results and could contribute to predict some ligand properties critical for binding to a given pocket, and conversely, some key pocket properties for ligand binding.

  2. Insights into an Original Pocket-Ligand Pair Classification: A Promising Tool for Ligand Profile Prediction

    PubMed Central

    Reynès, Christelle; Spérandio, Olivier; Miteva, Maria A.; Villoutreix, Bruno O.; Camproux, Anne-Claude

    2013-01-01

    Pockets are today at the cornerstones of modern drug discovery projects and at the crossroad of several research fields, from structural biology to mathematical modeling. Being able to predict if a small molecule could bind to one or more protein targets or if a protein could bind to some given ligands is very useful for drug discovery endeavors, anticipation of binding to off- and anti-targets. To date, several studies explore such questions from chemogenomic approach to reverse docking methods. Most of these studies have been performed either from the viewpoint of ligands or targets. However it seems valuable to use information from both ligands and target binding pockets. Hence, we present a multivariate approach relating ligand properties with protein pocket properties from the analysis of known ligand-protein interactions. We explored and optimized the pocket-ligand pair space by combining pocket and ligand descriptors using Principal Component Analysis and developed a classification engine on this paired space, revealing five main clusters of pocket-ligand pairs sharing specific and similar structural or physico-chemical properties. These pocket-ligand pair clusters highlight correspondences between pocket and ligand topological and physico-chemical properties and capture relevant information with respect to protein-ligand interactions. Based on these pocket-ligand correspondences, a protocol of prediction of clusters sharing similarity in terms of recognition characteristics is developed for a given pocket-ligand complex and gives high performances. It is then extended to cluster prediction for a given pocket in order to acquire knowledge about its expected ligand profile or to cluster prediction for a given ligand in order to acquire knowledge about its expected pocket profile. 
This prediction approach shows promising results and could contribute to predict some ligand properties critical for binding to a given pocket, and conversely, some key pocket properties for ligand binding. PMID:23840299

  3. Functional Groups Based on Leaf Physiology: Are they Spatially and Temporally Robust?

    NASA Technical Reports Server (NTRS)

    Foster, Tammy E.; Brooks, J. Renee

    2004-01-01

    The functional grouping hypothesis, which suggests that complexity in ecosystem function can be simplified by grouping species with similar responses, was tested in the Florida scrub habitat. Functional groups were identified based on how species in fire maintained Florida scrub regulate exchange of carbon and water with the atmosphere as indicated by both instantaneous gas exchange measurements and integrated measures of function (%N, delta C-13, delta N-15, C-N ratio). Using cluster analysis, five distinct physiologically-based functional groups were identified in the fire maintained scrub. These functional groups were tested to determine if they were robust spatially, temporally, and with management regime. Analysis of Similarities (ANOSIM), a non-parametric multivariate analysis, indicated that these five physiologically-based groupings were not altered by plot differences (R = -0.115, p = 0.893) or by the three different management regimes; prescribed burn, mechanically treated and burn, and fire-suppressed (R = 0.018, p = 0.349). The physiological groupings also remained robust between the two climatically different years 1999 and 2000 (R = -0.027, p = 0.725). Easy-to-measure morphological characteristics indicating functional groups would be more practical for scaling and modeling ecosystem processes than detailed gas-exchange measurements, therefore we tested a variety of morphological characteristics as functional indicators. A combination of non-parametric multivariate techniques (Hierarchical cluster analysis, non-metric Multi-Dimensional Scaling, and ANOSIM) were used to compare the ability of life form, leaf thickness, and specific leaf area classifications to identify the physiologically-based functional groups. Life form classifications (ANOSIM; R = 0.629, p 0.001) were able to depict the physiological groupings more adequately than either specific leaf area (ANOSIM; R = 0.426, p = 0.001) or leaf thickness (ANOSIM; R 0.344, p 0.001). 
The ability of life forms to depict the physiological groupings was improved by separating the parasitic Ximenia americana from the shrub category (ANOSIM; R = 0.794, p = 0.001). Therefore, a life form classification including parasites was determined to be a good indicator of the physiological processes of scrub species, and would be a useful method of grouping for scaling physiological processes to the ecosystem level.
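
    The ANOSIM R values quoted in this abstract compare ranked dissimilarities between groups with those within groups. A minimal sketch of the statistic follows, with hypothetical function names and toy data; the permutation test that yields the p-values is not shown. R is the difference between the mean between-group rank and the mean within-group rank, divided by M/2, where M = n(n-1)/2 is the number of sample pairs.

```python
from itertools import combinations

def average_ranks(values):
    """Rank values from 1 to n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def anosim_r(dist, groups):
    """dist: square dissimilarity matrix; groups: one group label per sample."""
    pairs = list(combinations(range(len(groups)), 2))
    ranks = average_ranks([dist[i][j] for i, j in pairs])
    within = [r for r, (i, j) in zip(ranks, pairs) if groups[i] == groups[j]]
    between = [r for r, (i, j) in zip(ranks, pairs) if groups[i] != groups[j]]
    m = len(pairs)
    return (sum(between) / len(between) - sum(within) / len(within)) / (m / 2)

# Hypothetical samples placed on a line; dissimilarity is absolute difference.
positions = [0, 1, 10, 12]
toy_dist = [[abs(a - b) for b in positions] for a in positions]
toy_r = anosim_r(toy_dist, ["A", "A", "B", "B"])
```

    On this reading, values near 0 such as those reported for plot and management regime indicate that the grouping factor does not separate the samples, whereas values approaching 1, as for the life form classification, indicate that dissimilarities between groups consistently exceed those within groups.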

  4. Profiles of organic food consumers in a large sample of French adults: results from the Nutrinet-Santé cohort study.

    PubMed

    Kesse-Guyot, Emmanuelle; Péneau, Sandrine; Méjean, Caroline; Szabo de Edelenyi, Fabien; Galan, Pilar; Hercberg, Serge; Lairon, Denis

    2013-01-01

    Lifestyle, dietary patterns and nutritional status of organic food consumers have rarely been described, while interest for a sustainable diet is markedly increasing. Consumer attitude and frequency of use of 18 organic products were assessed in 54,311 adult participants in the Nutrinet-Santé cohort. Cluster analysis was performed to identify behaviors associated with organic product consumption. Socio-demographic characteristics, food consumption and nutrient intake across clusters are provided. Cross-sectional association with overweight/obesity was estimated using polytomous logistic regression. Five clusters were identified: 3 clusters of non-consumers whose reasons differed, occasional (OCOP, 51%) and regular (RCOP, 14%) organic product consumers. RCOP were more highly educated and physically active than other clusters. They also exhibited dietary patterns that included more plant foods and less sweet and alcoholic beverages, processed meat or milk. Their nutrient intake profiles (fatty acids, most minerals and vitamins, fibers) were healthier and they more closely adhered to dietary guidelines. In multivariate models (after accounting for confounders, including level of adherence to nutritional guidelines), compared to those not interested in organic products, RCOP participants showed a markedly lower probability of overweight (excluding obesity) (25 ≤ body mass index<30) and obesity (body mass index ≥ 30): -36% and -62% in men and -42% and -48% in women, respectively (P<0.0001). OCOP participants (%) generally showed intermediate figures. Regular consumers of organic products, a sizeable group in our sample, exhibit specific socio-demographic characteristics, and an overall healthy profile which should be accounted for in further studies analyzing organic food intake and health markers.

  5. Using parallel factor analysis modeling (PARAFAC) and self-organizing maps to track senescence-induced patterns in leaf litter leachate

    NASA Astrophysics Data System (ADS)

    Wheeler, K. I.; Levia, D. F., Jr.; Hudson, J. E.

    2017-12-01

    As trees undergo autumnal processes such as resorption, senescence, and leaf abscission, the dissolved organic matter (DOM) contribution of leaf litter leachate to streams changes. However, little research has investigated how the fluorescent DOM (FDOM) changes throughout the autumn and how this differs inter- and intraspecifically. Two of the major impacts of global climate change on forested ecosystems include altering phenology and causing forest community species and subspecies composition restructuring. We examined changes in FDOM in leachate from American beech (Fagus grandifolia Ehrh.) leaves in Maryland, Rhode Island, Vermont, and North Carolina and yellow poplar (Liriodendron tulipifera L.) leaves from Maryland throughout three different phenophases: green, senescing, and freshly abscised. Beech leaves from Maryland and Rhode Island have previously been identified as belonging to the same distinct genetic cluster, and beech trees from Vermont and the study site in North Carolina as belonging to the other. FDOM in samples was characterized using excitation-emission matrices (EEMs) and a six-component parallel factor analysis (PARAFAC) model was created to identify components. Self-organizing maps (SOMs) were used to visualize variation and patterns in the PARAFAC component proportions of the leachate samples. Phenophase and species had the greatest influence on determining where a sample mapped on the SOM when compared to genetic clusters and geographic origin. Throughout senescence, FDOM from all the trees transitioned from more protein-like components to more humic-like ones. Percent greenness of the sampled leaves and the proportion of the tyrosine-like component 1 were found to significantly differ between the two genetic beech clusters. This suggests possible differences in photosynthesis and resorption between the two genetic clusters of beech. The use of SOMs to visualize differences in patterns of senescence between the different species and genetic populations proved useful in ways that other multivariate analysis techniques are not.

  6. Heavy metals in edible seaweeds commercialised for human consumption

    NASA Astrophysics Data System (ADS)

    Besada, Victoria; Andrade, José Manuel; Schultze, Fernando; González, Juan José

    2009-01-01

    Though seaweed consumption is growing steadily across Europe, relatively few studies have reported on the quantities of heavy metals they contain and/or their potential effects on the population's health. This study focuses on the first topic and analyses the concentrations of six typical heavy metals (Cd, Pb, Hg, Cu, Zn, total As and inorganic As) in 52 samples from 11 algae-based products commercialised in Spain for direct human consumption (Gelidium spp.; Eisenia bicyclis; Himanthalia elongata; Hizikia fusiforme; Laminaria spp.; Ulva rigida; Chondrus crispus; Porphyra umbilicalis and Undaria pinnatifida). Samples were ground, homogenised and quantified by atomic absorption spectrometry (Cu and Zn by flame AAS; Cd, Pb and total As by electrothermal AAS; total mercury by the cold vapour technique; and inorganic As by flame-hydride generation). Accuracy was assessed by participation in periodic QUASIMEME (Quality Assurance of Information in Marine Environmental Monitoring in Europe) and IAEA (International Atomic Energy Agency) intercalibration exercises. To detect any objective differences existing between the seaweeds' metal concentrations, univariate and multivariate studies (principal component analysis, cluster analysis and linear discriminant analysis) were performed. It is concluded that the Hizikia fusiforme samples contained the highest values of total and inorganic As and that most Cd concentrations exceeded the limits set by French legislation. The two harvesting areas (Atlantic and Pacific oceans) were differentiated using both univariate studies (for Cu, total As, Hg and Zn) and a multivariate discriminant function (which includes Zn, Cu and Pb).

  7. Comparative assessment of essential and heavy metals in fruits from different geographical origins.

    PubMed

    Grembecka, Małgorzata; Szefer, Piotr

    2013-11-01

    The aim of this investigation was to estimate and compare essential and heavy metals contents in 98 commercially available fresh fruits from different geographic regions using multivariate techniques. The concentrations of 12 elements (calcium, magnesium, potassium, sodium, phosphorus, cobalt (Co), manganese, iron, chromium (Cr), nickel (Ni), zinc and copper) were determined using flame atomic absorption spectrometry with deuterium-background correction. Phosphorus was determined in the form of phosphomolybdate by a spectrophotometric method. Reliability of the procedure was checked by analysis of the certified reference materials tea (NCS DC 73351), cabbage (IAEA-359) and spinach leaves (NIST-1570). Recoveries of the elements analysed varied between 85.5 and 103%, and precisions for the reference materials were 0.13-6.08%. Based on recommended dietary allowance and adequate intake estimated for essential elements, it was concluded that accessory fruits such as pineapples, raspberries and strawberries supply the organism with the highest amounts of bioelements. Although accessory fruits were also found to be the greatest source of Ni among all the analysed fruits, in all the fruits Ni was more abundant than Cr and Co. Significant correlation coefficients (p < 0.001, p < 0.01 and p < 0.05) were found between concentrations of some metals in fresh fruits. Application of the Kruskal-Wallis ANOVA test and multivariate techniques such as factor analysis and cluster analysis enabled us to differentiate particular botanical families and types of fruits.

  8. GC-MS-based metabolite profiling of Cosmos caudatus leaves possessing alpha-glucosidase inhibitory activity.

    PubMed

    Javadi, Neda; Abas, Faridah; Abd Hamid, Azizah; Simoh, Sanimah; Shaari, Khozirah; Ismail, Intan Safinar; Mediani, Ahmed; Khatib, Alfi

    2014-06-01

    Cosmos caudatus, which is known as "Ulam Raja," is an herbal plant used in Malaysia to enhance vitality. This study focused on the evaluation of the α-glucosidase inhibitory activity of different ethanolic extracts of C. caudatus. Six series of samples extracted with water, 20%, 40%, 60%, 80%, and 100% ethanol (EtOH) were employed. Gas chromatography-mass spectrometry (GC-MS) and orthogonal partial least-squares (OPLS) analysis were used to correlate the bioactivity of different extracts with the different metabolite profiles of C. caudatus. The obtained OPLS scores indicated a distinct and remarkable separation into 6 clusters, which were indicative of the 6 different ethanol concentrations. GC-MS can be integrated with multivariate data analysis to identify compounds that inhibit α-glucosidase activity. In addition, catechin, α-linolenic acid, α-D-glucopyranoside, and vitamin E compounds were identified and indicate the potential α-glucosidase inhibitory activity of this herb. GC-MS and multivariate data analysis were applied to discriminate Cosmos caudatus samples extracted with water and different ratios of ethanol. The orthogonal partial least-squares (OPLS) model developed was used to determine the major metabolites that contributed to α-glucosidase inhibitory activity. This approach also has the ability to predict the bioactivity of a new set of extracts based on a developed validated regression model that is important for quality control of the herb preparation. © 2014 Institute of Food Technologists®

  9. Multivariate Analysis, Mass Balance Techniques, and Statistical Tests as Tools in Igneous Petrology: Application to the Sierra de las Cruces Volcanic Range (Mexican Volcanic Belt)

    PubMed Central

    Velasco-Tapia, Fernando

    2014-01-01

    Magmatic processes have usually been identified and evaluated using qualitative or semiquantitative geochemical or isotopic tools based on a restricted number of variables. However, a more complete and quantitative view could be reached by applying multivariate analysis, mass balance techniques, and statistical tests. As an example, in this work a statistical and quantitative scheme is applied to analyze the geochemical features of the Sierra de las Cruces (SC) volcanic range (Mexican Volcanic Belt). In this locality, the volcanic activity (3.7 to 0.5 Ma) was dominantly dacitic, but the presence of spheroidal andesitic enclaves and/or diverse disequilibrium features in the majority of lavas confirms the operation of magma mixing/mingling. New discriminant-function-based multidimensional diagrams were used to discriminate tectonic setting. Statistical tests of discordancy and significance were applied to evaluate the influence of the subducting Cocos plate, which seems to be rather negligible for the SC magmas in relation to several major and trace elements. A cluster analysis following Ward's linkage rule was carried out to classify the SC volcanic rocks into geochemical groups. Finally, two mass-balance schemes were applied for the quantitative evaluation of the proportion of the end-member components (dacitic and andesitic magmas) in the comingled lavas (binary mixtures). PMID:24737994

  10. A simple rapid approach using coupled multivariate statistical methods, GIS and trajectory models to delineate areas of common oil spill risk

    NASA Astrophysics Data System (ADS)

    Guillen, George; Rainey, Gail; Morin, Michelle

    2004-04-01

    Currently, the Minerals Management Service uses the Oil Spill Risk Analysis model (OSRAM) to predict the movement of potential oil spills greater than 1000 bbl originating from offshore oil and gas facilities. OSRAM generates oil spill trajectories using meteorological and hydrological data input from either actual physical measurements or estimates generated from other hydrological models. OSRAM and many other models produce output matrices of average, maximum and minimum contact probabilities to specific landfall or target segments (columns) from oil spills at specific points (rows). Analysts and managers are often interested in identifying geographic areas or groups of facilities that pose similar risks to specific targets or groups of targets if a spill occurred. Unfortunately, due to the potentially large matrix generated by many spill models, this question is difficult to answer without the use of data reduction and visualization methods. In our study we utilized a multivariate statistical method called cluster analysis to group areas of similar risk based on potential distribution of landfall target trajectory probabilities. We also utilized ArcView™ GIS to display spill launch point groupings. The combination of GIS and multivariate statistical techniques in the post-processing of trajectory model output is a powerful tool for identifying and delineating areas of similar risk from multiple spill sources. We strongly encourage modelers, statistical and GIS software programmers to closely collaborate to produce a more seamless integration of these technologies and approaches to analyzing data. They are complementary methods that strengthen the overall assessment of spill risks.

  11. Risk of hemorrhagic transformation after ischemic stroke in patients with antiphospholipid antibody syndrome.

    PubMed

    Mehta, Tapan; Hussain, Mohammed; Sheth, Khushboo; Ding, Yuchuan; McCullough, Louise D

    2017-06-01

    Several rheumatologic conditions including systemic lupus erythematosus, antiphospholipid antibody syndrome (APS), rheumatoid arthritis, and scleroderma are known risk factors for stroke. The risk of hemorrhagic transformation after an acute ischemic stroke (AIS) in these patients is not known. We queried the Nationwide Inpatient Sample (NIS) data between 2010 and 2012 with ICD-9 diagnostic codes for AIS. The primary outcome was the development of hemorrhagic transformation. Multivariate predictors for hemorrhagic transformation were identified with a logistic regression model. Survey procedures in SAS 9.2 were used to account for the hierarchical two-stage cluster design of the NIS. APS (OR 2.57, 95% CI 1.14-5.81, p = 0.0228) independently predicted risk of hemorrhagic transformation in multivariate regression analysis. Similarly, in multivariate regression models for the outcome variables of total charges of the hospitalization and length of stay (LOS), patients with APS had the highest charges ($56,286, p = 0.0228) and LOS (3.87 days, p = 0.0164) compared to other co-variates. Univariate analysis showed increased mortality in the APS compared to the non-APS group (11.68% vs. 7.16%, p = 0.0024). APS is an independent risk factor for hemorrhagic transformation in both thrombolytic and non-thrombolytic treated patients. APS is also associated with longer length and higher cost of hospital stay. Further research is warranted to identify the unique risk factors in these patients and to identify strategies to reduce the risk of hemorrhagic transformation in this subgroup of the population.

  12. Manual hierarchical clustering of regional geochemical data using a Bayesian finite mixture model

    USGS Publications Warehouse

    Ellefsen, Karl J.; Smith, David

    2016-01-01

    Interpretation of regional scale, multivariate geochemical data is aided by a statistical technique called “clustering.” We investigate a particular clustering procedure by applying it to geochemical data collected in the State of Colorado, United States of America. The clustering procedure partitions the field samples for the entire survey area into two clusters. The field samples in each cluster are partitioned again to create two subclusters, and so on. This manual procedure generates a hierarchy of clusters, and the different levels of the hierarchy show geochemical and geological processes occurring at different spatial scales. Although there are many different clustering methods, we use Bayesian finite mixture modeling with two probability distributions, which yields two clusters. The model parameters are estimated with Hamiltonian Monte Carlo sampling of the posterior probability density function, which usually has multiple modes. Each mode has its own set of model parameters; each set is checked to ensure that it is consistent both with the data and with independent geologic knowledge. The set of model parameters that is most consistent with the independent geologic knowledge is selected for detailed interpretation and partitioning of the field samples.

  13. Combining Mixture Components for Clustering*

    PubMed Central

    Baudry, Jean-Patrick; Raftery, Adrian E.; Celeux, Gilles; Lo, Kenneth; Gottardo, Raphaël

    2010-01-01

    Model-based clustering consists of fitting a mixture model to data and identifying each cluster with one of its components. Multivariate normal distributions are typically used. The number of clusters is usually determined from the data, often using BIC. In practice, however, individual clusters can be poorly fitted by Gaussian distributions, and in that case model-based clustering tends to represent one non-Gaussian cluster by a mixture of two or more Gaussian distributions. If the number of mixture components is interpreted as the number of clusters, this can lead to overestimation of the number of clusters. This is because BIC selects the number of mixture components needed to provide a good approximation to the density, rather than the number of clusters as such. We propose first selecting the total number of Gaussian mixture components, K, using BIC and then combining them hierarchically according to an entropy criterion. This yields a unique soft clustering for each number of clusters less than or equal to K. These clusterings can be compared on substantive grounds, and we also describe an automatic way of selecting the number of clusters via a piecewise linear regression fit to the rescaled entropy plot. We illustrate the method with simulated data and a flow cytometry dataset. Supplemental Materials are available on the journal Web site and described at the end of the paper. PMID:20953302
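
    The combining step described in this abstract can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: the posterior membership probabilities below are hypothetical, the function names are invented here, and the mixture fit and BIC selection of K are assumed to have been done already. At each step the sketch merges the pair of components whose union gives the lowest classification entropy.

```python
import math
from itertools import combinations


def entropy(post):
    """Classification entropy: -sum over samples i and components k of t_ik * log(t_ik)."""
    return -sum(t * math.log(t) for row in post for t in row if t > 0.0)


def merge_columns(post, a, b):
    """Return posteriors with components a and b combined into a single component."""
    merged = []
    for row in post:
        new_row = [t for k, t in enumerate(row) if k not in (a, b)]
        new_row.append(row[a] + row[b])  # merged component is appended last
        merged.append(new_row)
    return merged


def combine_hierarchically(post):
    """Greedily merge the pair of components that yields the lowest entropy."""
    steps = [(len(post[0]), entropy(post))]
    while len(post[0]) > 1:
        a, b = min(
            combinations(range(len(post[0])), 2),
            key=lambda pair: entropy(merge_columns(post, *pair)),
        )
        post = merge_columns(post, a, b)
        steps.append((len(post[0]), entropy(post)))
    return steps


# Hypothetical posteriors for 5 samples and K = 3 components (illustrative only).
# Components 0 and 1 overlap heavily, as when two Gaussians fit one cluster.
posteriors = [
    [0.50, 0.48, 0.02],
    [0.45, 0.53, 0.02],
    [0.55, 0.43, 0.02],
    [0.01, 0.02, 0.97],
    [0.02, 0.01, 0.97],
]

steps = combine_hierarchically(posteriors)
for n_clusters, ent in steps:
    print(n_clusters, round(ent, 4))
```

    Plotting the entropy against the number of clusters gives the kind of curve the abstract refers to; a sharp drop followed by a flat segment suggests where merging stops being useful.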

  14. A review on the multivariate statistical methods for dimensional reduction studies

    NASA Astrophysics Data System (ADS)

    Aik, Lim Eng; Kiang, Lam Chee; Mohamed, Zulkifley Bin; Hong, Tan Wei

    2017-05-01

    In this study we review the multivariate statistical methods for dimensionality reduction that have been used by various researchers. Reducing dimensionality is valuable for speeding up algorithms and may also improve the final classification or clustering accuracy. Large amounts of noisy or faulty input data often lead to poorer algorithm performance. Removing uninformative or misleading data columns may help the algorithm find more general classification regions and rules and achieve better overall performance on new data sets.

  15. Differences in Brain Glucose Metabolism During Preparation for 131I Ablation in Thyroid Cancer Patients: Thyroid Hormone Withdrawal Versus Recombinant Human Thyrotropin.

    PubMed

    Jeong, Hyeonseok S; Choi, Eun Kyoung; Song, In-Uk; Chung, Yong-An; Park, Jong-Sik; Oh, Jin Kyoung

    2017-01-01

    In preparation for 131I ablation, temporary withdrawal of thyroid hormone is commonly used in patients with thyroid cancer after total thyroidectomy. The current study aimed to investigate brain glucose metabolism and its relationships with mood or cognitive function in these patients using 18F-fluoro-2-deoxyglucose positron emission tomography (18F-FDG-PET). A total of 40 consecutive adult patients with thyroid carcinoma who had undergone total thyroidectomy were recruited for this cross-sectional study. At the time of assessment, 20 patients were hypothyroid after two weeks of thyroid hormone withdrawal, while 20 received thyroid hormone replacement therapy and were euthyroid. All participants underwent brain 18F-FDG-PET scans and completed mood questionnaires and cognitive tests. Multivariate spatial covariance analysis and univariate voxel-wise analysis were applied for the image data. The hypothyroid patients were more anxious and depressed than the euthyroid participants. The multivariate covariance analysis showed increases in glucose metabolism primarily in the bilateral insula and surrounding areas and concomitant decreases in the parieto-occipital regions in the hypothyroid group. The level of thyrotropin was positively associated with the individual expression of the covariance pattern. The decreased 18F-FDG uptake in the right cuneus cluster from the univariate analysis was correlated with the increased thyrotropin level and greater depressive symptoms in the hypothyroid group. These results suggest that temporary hypothyroidism, even for a short period, may induce impairment in glucose metabolism and related affective symptoms.

  16. Multivariate statistical analysis of radiological data of building materials used in Tiruvannamalai, Tamilnadu, India.

    PubMed

    Ravisankar, R; Vanasundari, K; Suganya, M; Raghu, Y; Rajalakshmi, A; Chandrasekaran, A; Sivakumar, S; Chandramohan, J; Vijayagopal, P; Venkatraman, B

    2014-02-01

    Using γ spectrometry, the concentration of the naturally occurring radionuclides (226)Ra, (232)Th and (40)K has been measured in soil, sand, cement, clay and bricks, which are used as building materials in Tiruvannamalai, Tamilnadu, India. The radium equivalent activity (Raeq), the criterion formula (CF), indoor gamma absorbed dose rate (DR), annual effective dose (HR), activity utilization index (AUI), alpha index (Iα), gamma index (Iγ), external radiation hazard index (Hex), internal radiation hazard index (Hin), representative level index (RLI), excess lifetime cancer risk (ELCR) and annual gonadal dose equivalent (AGDE) associated with the natural radionuclides are calculated to assess the radiation hazard of the natural radioactivity in the building materials. From the analysis, it is found that these materials used for the construction of dwellings are safe for the inhabitants. The radiological data were processed using multivariate statistical methods to determine the similarities and correlation among the various samples. The frequency distributions for all radionuclides were analyzed. The data set consisted of 15 measured variables. The Pearson correlation coefficient reveals that the (226)Ra distribution in building materials is controlled by the variation of the (40)K concentration. Principal component analysis (PCA) yields a two-component representation of the acquired data from the building materials in Tiruvannamalai, wherein 94.9% of the total variance is explained. The resulting dendrogram of hierarchical cluster analysis (HCA) classified the 30 building materials into four major groups using 15 variables. Copyright © 2013 Elsevier Ltd. All rights reserved.

  17. Classification of the medicinal plants of the genus Atractylodes using high-performance liquid chromatography with diode array and tandem mass spectrometry detection combined with multivariate statistical analysis.

    PubMed

    Cho, Hyun-Deok; Kim, Unyong; Suh, Joon Hyuk; Eom, Han Young; Kim, Junghyun; Lee, Seul Gi; Choi, Yong Seok; Han, Sang Beom

    2016-04-01

    Analytical methods using high-performance liquid chromatography with diode array and tandem mass spectrometry detection were developed for the discrimination of the rhizomes of four Atractylodes medicinal plants: A. japonica, A. macrocephala, A. chinensis, and A. lancea. A quantitative study was performed, selecting five bioactive components, including atractylenolide I, II, III, eudesma-4(14),7(11)-dien-8-one and atractylodin, on twenty-six Atractylodes samples of various origins. Sample extraction was optimized to sonication with 80% methanol for 40 min at room temperature. High-performance liquid chromatography with diode array detection was established using a C18 column with a water/acetonitrile gradient system at a flow rate of 1.0 mL/min, and the detection wavelength was set at 236 nm. Liquid chromatography with tandem mass spectrometry was applied to certify the reliability of the quantitative results. The developed methods were validated by ensuring specificity, linearity, limit of quantification, accuracy, precision, recovery, robustness, and stability. Results showed that cangzhu contained higher amounts of atractylenolide I and atractylodin than baizhu, with atractylodin content showing the greatest variation between baizhu and cangzhu. Multivariate statistical analyses, such as principal component analysis and hierarchical cluster analysis, were also employed for further classification of the Atractylodes plants. The established method was suitable for quality control of the Atractylodes plants. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  18. An introduction to mass cytometry: fundamentals and applications.

    PubMed

    Tanner, Scott D; Baranov, Vladimir I; Ornatsky, Olga I; Bandura, Dmitry R; George, Thaddeus C

    2013-05-01

    Mass cytometry addresses the analytical challenges of polychromatic flow cytometry by using metal atoms as tags rather than fluorophores and atomic mass spectrometry as the detector rather than photon optics. The many available enriched stable isotopes of the transition elements can provide up to 100 distinguishable reporting tags, which can be measured simultaneously because of the essential independence of detection provided by the mass spectrometer. We discuss the adaptation of traditional inductively coupled plasma mass spectrometry to cytometry applications. We focus on the generation of cytometry-compatible data and on approaches to unsupervised multivariate clustering analysis. Finally, we provide a high-level review of some recent benchmark reports that highlight the potential for massively multi-parameter mass cytometry.

  19. Using MetaboAnalyst 3.0 for Comprehensive Metabolomics Data Analysis.

    PubMed

    Xia, Jianguo; Wishart, David S

    2016-09-07

    MetaboAnalyst (http://www.metaboanalyst.ca) is a comprehensive Web application for metabolomic data analysis and interpretation. MetaboAnalyst handles most of the common metabolomic data types from most kinds of metabolomics platforms (MS and NMR) for most kinds of metabolomics experiments (targeted, untargeted, quantitative). In addition to providing a variety of data processing and normalization procedures, MetaboAnalyst also supports a number of data analysis and data visualization tasks using a range of univariate and multivariate methods such as PCA (principal component analysis), PLS-DA (partial least squares discriminant analysis), heatmap clustering and machine learning methods. MetaboAnalyst also offers a variety of tools for metabolomic data interpretation including MSEA (metabolite set enrichment analysis), MetPA (metabolite pathway analysis), and biomarker selection via ROC (receiver operating characteristic) curve analysis, as well as time series and power analysis. This unit provides an overview of the main functional modules and the general workflow of the latest version of MetaboAnalyst (MetaboAnalyst 3.0), followed by eight detailed protocols. © 2016 by John Wiley & Sons, Inc.

  20. Fourier-transform infrared spectroscopy as a novel approach to providing effect-based endpoints in duckweed toxicity testing.

    PubMed

    Hu, Li-Xin; Ying, Guang-Guo; Chen, Xiao-Wen; Huang, Guo-Yong; Liu, You-Sheng; Jiang, Yu-Xia; Pan, Chang-Gui; Tian, Fei; Martin, Francis L

    2017-02-01

    Traditional duckweed toxicity tests only measure plant growth inhibition as an endpoint, with limited effects-based data. The present study aimed to investigate whether Fourier-transform infrared (FTIR) spectroscopy could enhance the duckweed (Lemna minor L.) toxicity test. Four chemicals (Cu, Cd, atrazine, and acetochlor) and 4 metal-containing industrial wastewater samples were tested. After exposure of duckweed to the chemicals, standard toxicity endpoints (frond number and chlorophyll content) were determined; the fronds were also interrogated using FTIR spectroscopy under optimized test conditions. Biochemical alterations associated with each treatment were assessed and further analyzed by multivariate analysis. The results showed that comparable x% of effective concentration (ECx) values could be achieved based on FTIR spectroscopy in comparison with those based on traditional toxicity endpoints. Biochemical alterations associated with different doses of toxicant were mainly attributed to lipid, protein, nucleic acid, and carbohydrate structural changes, which helped to explain toxic mechanisms. With the help of multivariate analysis, separation of clusters related to different exposure doses could be achieved. The present study is the first to show successful application of FTIR spectroscopy in standard duckweed toxicity tests with biochemical alterations as new endpoints. Environ Toxicol Chem 2017;36:346-353. © 2016 SETAC.

  1. A multivariate geostatistical methodology to delineate areas of potential interest for future sedimentary gold exploration.

    PubMed

    Goovaerts, P; Albuquerque, Teresa; Antunes, Margarida

    2016-11-01

    This paper describes a multivariate geostatistical methodology to delineate areas of potential interest for future sedimentary gold exploration, with an application to an abandoned sedimentary gold mining region in Portugal. The main challenge was the existence of only a dozen gold measurements confined to the grounds of the old gold mines, which precluded the application of traditional interpolation techniques, such as cokriging. The analysis could, however, capitalize on 376 stream sediment samples that were analyzed for twenty-two elements. Gold (Au) was first predicted at all 376 locations using linear regression (R² = 0.798) and four metals (Fe, As, Sn and W), which are known to be mostly associated with the local gold's paragenesis. One hundred realizations of the spatial distribution of gold content were generated using sequential indicator simulation and a soft indicator coding of regression estimates, to supplement the hard indicator coding of gold measurements. Each simulated map then underwent a local cluster analysis to identify significant aggregates of low or high values. The one hundred classified maps were processed to derive the most likely classification of each simulated node and the associated probability of occurrence. Examining the distribution of the hot-spots and cold-spots reveals a clear enrichment in Au along the Erges River downstream from the old sedimentary mineralization.
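
    The final post-processing step, deriving the most likely classification of each node and its probability of occurrence from the set of classified maps, can be illustrated with a simple per-node tally. This is a minimal sketch and not the authors' code: the labels, the four tiny "maps" and the function name are hypothetical, and the simulation and local cluster analysis that produce the classified maps are assumed to have been run already.

```python
from collections import Counter


def most_likely_class(realizations):
    """For each node, return (modal label, fraction of realizations assigning it)."""
    n_real = len(realizations)
    summary = []
    for labels in zip(*realizations):  # labels of one node across all realizations
        label, count = Counter(labels).most_common(1)[0]
        summary.append((label, count / n_real))
    return summary


# Hypothetical classified maps: 4 realizations, 3 nodes each (illustrative only).
realizations = [
    ["hot", "none", "cold"],
    ["hot", "none", "none"],
    ["hot", "cold", "cold"],
    ["none", "none", "cold"],
]

result = most_likely_class(realizations)
print(result)
```

    With one hundred realizations the fraction is the empirical probability of the modal class at each node; ties, which this sketch resolves by first occurrence, would need an explicit rule in practice.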

  2. Seroepidemiology of HBV infection in South-East of Iran; a population based study.

    PubMed

    Salehi, M; Alavian, S M; Tabatabaei, S V; Izadi, Sh; Sanei Moghaddam, E; Amini Kafi-Abad, S; Gharehbaghian, A; Khosravi, S; Abolghasemi, H

    2012-05-01

    Hepatitis B virus (HBV) infection is a major risk factor of cirrhosis and hepatocellular carcinoma affecting billions of people globally. Since information on its prevalence in the general population is mandatory for formulating effective policies, this population based serological survey was conducted in Sistan and Baluchistan, where no previous epidemiological data were available. Using random cluster sampling, 3989 healthy subjects were selected from 9 districts of Sistan and Baluchistan Province in southeastern Iran. The subjects' ages ranged from 6 to 65 years. Serum samples were tested for HBcAb and HBsAg. Screening tests were carried out by third-generation ELISA. Various risk factors were recorded and multivariate analysis was performed. The prevalence of HBsAg and HBcAb in Sistan and Baluchistan was 3.38% (95% CI 2.85; 3.98) and 23.58% (95% CI 22.29; 24.93) respectively. We found 8 cases of positive anti-HDV antibody. Predictors of HBsAg or HBcAb in multivariate analysis were age, marital status and addiction. The rate of HBV infection in Sistan and Baluchistan was higher than in other parts of Iran. Approximately 25% of the general population in this province had previous exposure to HBV and 3% were HBsAg carriers. Intrafamilial transmission and addiction were major routes of HBV transmission in this province.
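
    As a rough plausibility check on the reported HBsAg prevalence interval, a Wilson score interval can be computed from the sample size. This is a sketch under explicit assumptions: the abstract does not say which interval method was used, the positive count is inferred here from the rounded percentage rather than reported, and a design-based analysis that accounts for the cluster sampling would generally give a different (usually wider) interval.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for about 95%)."""
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


n = 3989                       # sample size stated in the abstract
positives = round(0.0338 * n)  # inferred from the reported 3.38%, not a reported count
low, high = wilson_interval(positives, n)
print(positives, round(100 * low, 2), round(100 * high, 2))
```

    The result lands close to, but not exactly on, the published bounds, which is consistent with the authors having used a different interval method or an unrounded count.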

  3. Using multivariate analyses and GIS to identify pollutants and their spatial patterns in urban soils in Galway, Ireland.

    PubMed

    Zhang, Chaosheng

    2006-08-01

    Galway is a small but rapidly growing tourism city in western Ireland. To evaluate its environmental quality, a total of 166 surface soil samples (0-10 cm depth) were collected from parks and grasslands at a density of 1 sample per 0.25 km² at the end of 2004. All samples were analysed using ICP-AES for the near-total concentrations of 26 chemical elements. Multivariate statistics and GIS techniques were applied to classify the elements and to identify elements influenced by human activities. Cluster analysis (CA) and principal component analysis (PCA) classified the elements into two groups: the first group predominantly derived from natural sources, the second being influenced by human activities. GIS mapping is a powerful tool in identifying the possible sources of pollutants. Relatively high concentrations of Cu, Pb and Zn were found in the city centre, old residential areas, and along major traffic routes, showing significant effects of traffic pollution. The element As is enriched in soils of the old built-up areas, which can be attributed to coal and peat combustion for home heating. Such significant spatial patterns of pollutants displayed by urban soils may imply a potential health threat to residents of the contaminated areas of the city.

  4. Self-organizing map analysis using multivariate data from theophylline powders predicted by a thin-plate spline interpolation.

    PubMed

    Yasuda, Akihito; Onuki, Yoshinori; Kikuchi, Shingo; Takayama, Kozo

    2010-11-01

    The quality by design concept in pharmaceutical formulation development requires establishment of a science-based rationale and a design space. We integrated thin-plate spline (TPS) interpolation and Kohonen's self-organizing map (SOM) to visualize the latent structure underlying causal factors and pharmaceutical responses. As a model pharmaceutical product, theophylline powders were prepared based on the standard formulation. The angle of repose, compressibility, cohesion, and dispersibility were measured as the response variables. These responses were predicted quantitatively on the basis of a nonlinear TPS. A large amount of data on these powders was generated and classified into several clusters using an SOM. The experimental values of the responses were predicted with high accuracy, and the data generated for the powders could be classified into several distinctive clusters. The SOM feature map allowed us to analyze the global and local correlations between causal factors and powder characteristics. For instance, the quantities of microcrystalline cellulose (MCC) and magnesium stearate (Mg-St) were classified distinctly into each cluster, indicating that the quantities of MCC and Mg-St were crucial for determining the powder characteristics. This technique provides a better understanding of the relationships between causal factors and pharmaceutical responses in theophylline powder formulations. © 2010 Wiley-Liss, Inc. and the American Pharmacists Association

  5. TU-FG-201-05: Varian MPC as a Statistical Process Control Tool

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Carver, A; Rowbottom, C

    Purpose: Quality assurance in radiotherapy requires the measurement of various machine parameters to ensure they remain within permitted values over time. In Truebeam release 2.0 the Machine Performance Check (MPC) was released allowing beam output and machine axis movements to be assessed in a single test. We aim to evaluate the Varian Machine Performance Check (MPC) as a tool for Statistical Process Control (SPC). Methods: Varian’s MPC tool was used on three Truebeam and one EDGE linac for a period of approximately one year. MPC was commissioned against independent systems. After this period the data were reviewed to determine whether or not the MPC was useful as a process control tool. Individual tests were analysed using Shewhart control plots in Matlab. Principal component analysis was used to determine if a multivariate model was of any benefit in analysing the data. Results: Control charts were found to be useful to detect beam output changes, worn T-nuts and jaw calibration issues. Upper and lower control limits were defined at the 95% level. Multivariate SPC was performed using Principal Component Analysis. We found little evidence of clustering beyond that which might be naively expected such as beam uniformity and beam output. Whilst this makes multivariate analysis of little use it suggests that each test is giving independent information. Conclusion: The variety of independent parameters tested in MPC makes it a sensitive tool for routine machine QA. We have determined that using control charts in our QA programme would rapidly detect changes in machine performance. The use of control charts allows large quantities of tests to be performed on all linacs without visual inspection of all results. The use of control limits alerts users when data are inconsistent with previous measurements before they become out of specification. A. Carver has received a speaker’s honorarium from Varian.
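
    The control-chart idea described in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' Matlab analysis: it assumes approximately normal measurements, so that limits "at the 95% level" correspond to the baseline mean plus or minus 1.96 standard deviations, and the function names and beam-output numbers are hypothetical.

```python
from statistics import mean, stdev


def control_limits(baseline, z=1.96):
    """Return (lower, centre, upper) limits estimated from baseline data.

    z=1.96 is an assumption: it gives roughly 95% coverage for normally
    distributed measurements. Classic Shewhart charts often use z=3.
    """
    centre = mean(baseline)
    spread = stdev(baseline)  # sample standard deviation
    return centre - z * spread, centre, centre + z * spread


def out_of_control(values, lower, upper):
    """Return the indices of measurements outside the control limits."""
    return [i for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical daily beam-output deviations in percent (not study data).
baseline = [0.10, -0.05, 0.02, 0.08, -0.10, 0.00, 0.05, -0.03, 0.04, -0.06]
lower, centre, upper = control_limits(baseline)

new_values = [0.03, -0.02, 0.45, 0.01]
flagged = out_of_control(new_values, lower, upper)
print(f"limits: [{lower:.3f}, {upper:.3f}], flagged indices: {flagged}")
```

    Limits are estimated once from a stable baseline period and then applied to later measurements, which is what lets a drift be flagged before a parameter leaves its clinical tolerance.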

  6. Hydrochemical evolution and groundwater flow processes in the Galilee and Eromanga basins, Great Artesian Basin, Australia: a multivariate statistical approach.

    PubMed

    Moya, Claudio E; Raiber, Matthias; Taulis, Mauricio; Cox, Malcolm E

    2015-03-01

    The Galilee and Eromanga basins are sub-basins of the Great Artesian Basin (GAB). In this study, a multivariate statistical approach (hierarchical cluster analysis, principal component analysis and factor analysis) is carried out to identify hydrochemical patterns and assess the processes that control hydrochemical evolution within key aquifers of the GAB in these basins. The results of the hydrochemical assessment are integrated into a 3D geological model (previously developed) to support the analysis of spatial patterns of hydrochemistry, and to identify the hydrochemical and hydrological processes that control hydrochemical variability. In this area of the GAB, the hydrochemical evolution of groundwater is dominated by evapotranspiration near the recharge area resulting in a dominance of the Na-Cl water types. This is shown conceptually using two selected cross-sections which represent discrete groundwater flow paths from the recharge areas to the deeper parts of the basins. With increasing distance from the recharge area, a shift towards a dominance of carbonate (e.g. Na-HCO3 water type) has been observed. The assessment of hydrochemical changes along groundwater flow paths highlights how aquifers are separated in some areas, and how mixing between groundwater from different aquifers occurs elsewhere controlled by geological structures, including between GAB aquifers and coal bearing strata of the Galilee Basin. The results of this study suggest that distinct hydrochemical differences can be observed within the previously defined Early Cretaceous-Jurassic aquifer sequence of the GAB. A revision of the two previously recognised hydrochemical sequences is being proposed, resulting in three hydrochemical sequences based on systematic differences in hydrochemistry, salinity and dominant hydrochemical processes. 
    The integrated approach presented in this study, which combines different complementary multivariate statistical techniques with a detailed assessment of the geological framework of these sedimentary basins, can be adopted in other complex multi-aquifer systems to assess hydrochemical evolution and its geological controls. Copyright © 2014 Elsevier B.V. All rights reserved.

  7. Detecting spatial regimes in ecosystems

    USGS Publications Warehouse

    Sundstrom, Shana M.; Eason, Tarsha; Nelson, R. John; Angeler, David G.; Barichievy, Chris; Garmestani, Ahjond S.; Graham, Nicholas A.J.; Granholm, Dean; Gunderson, Lance; Knutson, Melinda; Nash, Kirsty L.; Spanbauer, Trisha; Stow, Craig A.; Allen, Craig R.

    2017-01-01

    Research on early warning indicators has generally focused on assessing temporal transitions with limited application of these methods to detecting spatial regimes. Traditional spatial boundary detection procedures that result in ecoregion maps are typically based on ecological potential (i.e. potential vegetation), and often fail to account for ongoing changes due to stressors such as land use change and climate change and their effects on plant and animal communities. We use Fisher information, an information theory-based method, on both terrestrial and aquatic animal data (U.S. Breeding Bird Survey and marine zooplankton) to identify ecological boundaries, and compare our results to traditional early warning indicators, conventional ecoregion maps and multivariate analyses such as nMDS and cluster analysis. We successfully detected spatial regimes and transitions in both terrestrial and aquatic systems using Fisher information. Furthermore, Fisher information provided explicit spatial information about community change that is absent from other multivariate approaches. Our results suggest that defining spatial regimes based on animal communities may better reflect ecological reality than do traditional ecoregion maps, especially in our current era of rapid and unpredictable ecological change.

  8. Conceptual and statistical problems associated with the use of diversity indices in ecology.

    PubMed

    Barrantes, Gilbert; Sandoval, Luis

    2009-09-01

    Diversity indices, particularly the Shannon-Wiener index, have been used extensively in analyzing patterns of diversity at different geographic and ecological scales. These indices have serious conceptual and statistical problems which make comparisons of species richness or species abundances across communities nearly impossible. There is often no single statistical method that retains all the information needed to answer even a simple question. However, multivariate analyses, such as cluster analyses or multiple regressions, could be used instead of diversity indices. More complex multivariate analyses, such as Canonical Correspondence Analysis, provide very valuable information on environmental variables associated with the presence and abundance of the species in a community. In addition, particular hypotheses associated with changes in species richness across localities, or changes in the abundance of one species or a group of species, can be tested using univariate, bivariate, and/or rarefaction statistical tests. The rarefaction method has proved to be robust for standardizing all samples to a common size. Even the simplest method, such as reporting the number of species per taxonomic category, possibly provides more information than a diversity index value.
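
    A small sketch may help show why a single index value is hard to compare across communities. It uses the standard Shannon formula H' = -sum(p_i ln p_i); the two communities and the helper names are hypothetical and are not taken from the paper.

```python
import math


def shannon_index(counts):
    """Shannon index H' = -sum(p_i * ln p_i) over species with count > 0."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def richness(counts):
    """Number of species actually observed."""
    return sum(1 for c in counts if c > 0)


# Hypothetical abundance counts per species (illustrative only).
community_a = [25, 25, 25, 25]       # 4 species, perfectly even
community_b = [90, 2, 2, 2, 2, 2]    # 6 species, one dominant

h_a = shannon_index(community_a)
h_b = shannon_index(community_b)
print(f"A: richness={richness(community_a)}, H'={h_a:.3f}")
print(f"B: richness={richness(community_b)}, H'={h_b:.3f}")
```

    Here the community with more species receives the lower index, because the index mixes richness and evenness into one number. That confounding is one instance of the comparison problem the abstract describes.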

  9. TripAdvisor^{N-D}: A Tourism-Inspired High-Dimensional Space Exploration Framework with Overview and Detail.

    PubMed

    Nam, Julia EunJu; Mueller, Klaus

    2013-02-01

    Gaining a true appreciation of high-dimensional space remains difficult since all of the existing high-dimensional space exploration techniques serialize the space travel in some way. This is not so foreign to us since we, when traveling, also experience the world in a serial fashion. But we typically have access to a map to help with positioning, orientation, navigation, and trip planning. Here, we propose a multivariate data exploration tool that compares high-dimensional space navigation with a sightseeing trip. It decomposes this activity into five major tasks: 1) Identify the sights: use a map to identify the sights of interest and their location; 2) Plan the trip: connect the sights of interest along a specifiable path; 3) Go on the trip: travel along the route; 4) Hop off the bus: experience the location, look around, zoom into detail; and 5) Orient and localize: regain bearings in the map. We describe intuitive and interactive tools for all of these tasks, both global navigation within the map and local exploration of the data distributions. For the latter, we describe a polygonal touchpad interface which enables users to smoothly tilt the projection plane in high-dimensional space to produce multivariate scatterplots that best convey the data relationships under investigation. Motion parallax and illustrative motion trails aid in the perception of these transient patterns. We describe the use of our system within two applications: 1) the exploratory discovery of data configurations that best fit a personal preference in the presence of tradeoffs and 2) interactive cluster analysis via cluster sculpting in N-D.

  10. Social network types among older Korean adults: Associations with subjective health.

    PubMed

    Sohn, Sung Yun; Joo, Won-Tak; Kim, Woo Jung; Kim, Se Joo; Youm, Yoosik; Kim, Hyeon Chang; Park, Yeong-Ran; Lee, Eun

    2017-01-01

    With population aging now a global phenomenon, the health of older adults is becoming an increasingly important issue. Because the Korean population is aging at an unprecedented rate, preparing for public health problems associated with old age is particularly salient in this country. As the physical and mental health of older adults is related to their social relationships, investigating the social networks of older adults and their relationship to health status is important for establishing public health policies. The aims of this study were to identify social network types among older adults in South Korea and to examine the relationship of these social network types with self-rated health and depression. Data from the Korean Social Life, Health, and Aging Project were analyzed. Model-based clustering using finite normal mixture modeling was conducted to identify the social network types based on ten criterion variables of social relationships and activities: marital status, number of children, number of close relatives, number of friends, frequency of attendance at religious services, attendance at organized group meetings, in-degree centrality, out-degree centrality, closeness centrality, and betweenness centrality. Multivariate regression analysis was conducted to examine associations between the identified social network types and self-rated health and depression. The model-based clustering analysis revealed that social networks clustered into five types: diverse, family, congregant, congregant-restricted, and restricted. Diverse or family social network types were significantly associated with more favorable subjective mental health, whereas the restricted network type was significantly associated with poorer ratings of mental and physical health. In addition, our analysis identified unique social network types related to religious activities. In summary, we developed a comprehensive social network typology for older Korean adults. 
Copyright © 2016 Elsevier Ltd. All rights reserved.

  11. Multivariate Analysis of Genotype-Phenotype Association.

    PubMed

    Mitteroecker, Philipp; Cheverud, James M; Pavlicev, Mihaela

    2016-04-01

    With the advent of modern imaging and measurement technology, complex phenotypes are increasingly represented by large numbers of measurements, which may not bear biological meaning one by one. For such multivariate phenotypes, studying the pairwise associations between all measurements and all alleles is highly inefficient and prevents insight into the genetic pattern underlying the observed phenotypes. We present a new method for identifying patterns of allelic variation (genetic latent variables) that are maximally associated, in terms of effect size, with patterns of phenotypic variation (phenotypic latent variables). This multivariate genotype-phenotype mapping (MGP) separates phenotypic features under strong genetic control from less genetically determined features and thus permits an analysis of the multivariate structure of genotype-phenotype association, including its dimensionality and the clustering of genetic and phenotypic variables within this association. Different variants of MGP maximize different measures of genotype-phenotype association: genetic effect, genetic variance, or heritability. In an application to a mouse sample, scored for 353 SNPs and 11 phenotypic traits, the first dimension of genetic and phenotypic latent variables accounted for >70% of genetic variation present in all 11 measurements; 43% of variation in this phenotypic pattern was explained by the corresponding genetic latent variable. The first three dimensions together sufficed to account for almost 90% of genetic variation in the measurements and for all the interpretable genotype-phenotype association. Each dimension can be tested as a whole against the hypothesis of no association, thereby reducing the number of statistical tests from 7766 to 3, the maximal number of meaningful independent tests. Important alleles can be selected based on their effect size (additive or nonadditive effect on the phenotypic latent variable). 
This low dimensionality of the genotype-phenotype map has important consequences for gene identification and may shed light on the evolvability of organisms. Copyright © 2016 by the Genetics Society of America.
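
    The count of 7766 pairwise tests quoted in the abstract above is consistent with one test per SNP, per trait, and per effect type. This reading is an assumption inferred from the figures in the abstract (353 SNPs, 11 traits, additive and nonadditive effects), not a statement from the paper:

```latex
\[
  \underbrace{353}_{\text{SNPs}}
  \times \underbrace{11}_{\text{traits}}
  \times \underbrace{2}_{\text{additive, nonadditive}}
  = 7766
  \quad\longrightarrow\quad
  3 \ \text{tests (one per latent dimension)}
\]
```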

  12. Local Spatial and Temporal Processes of Influenza in Pennsylvania, USA: 2003–2009

    PubMed Central

    Stark, James H.; Sharma, Ravi; Ostroff, Stephen; Cummings, Derek A. T.; Ermentrout, Bard; Stebbins, Samuel; Burke, Donald S.; Wisniewski, Stephen R.

    2012-01-01

    Background Influenza is a contagious respiratory disease responsible for annual seasonal epidemics in temperate climates. An understanding of how influenza spreads geographically and temporally within regions could result in improved public health prevention programs. The purpose of this study was to summarize the spatial and temporal spread of influenza using data obtained from the Pennsylvania Department of Health's influenza surveillance system. Methodology and Findings We evaluated the spatial and temporal patterns of laboratory-confirmed influenza cases in Pennsylvania, United States from six influenza seasons (2003–2009). Using a test of spatial autocorrelation, local clusters of elevated risk were identified in the South Central region of the state. Multivariable logistic regression indicated that lower monthly precipitation levels during the influenza season (OR = 0.52, 95% CI: 0.28, 0.94), fewer residents over age 64 (OR = 0.27, 95% CI: 0.10, 0.73) and fewer residents with more than a high school education (OR = 0.76, 95% CI: 0.61, 0.95) were significantly associated with membership in this cluster. In addition, time series analysis revealed a temporal lag in the peak timing of the influenza B epidemic compared to the influenza A epidemic. Conclusions These findings illustrate a distinct spatial cluster of cases in the South Central region of Pennsylvania. Further examination of the regional transmission dynamics within these clusters may be useful in planning public health influenza prevention programs. PMID:22470544

  13. How to engage occupational physicians in recruitment of research participants: a mixed-methods study of challenges and opportunities.

    PubMed

    Arends, Iris; Bültmann, Ute; Shaw, William S; van Rhenen, Willem; Roelen, Corné; Nielsen, Karina; van der Klink, Jac J L

    2014-03-01

    To investigate barriers and facilitators for research participant recruitment by occupational physicians (OPs). A mixed-methods approach was used. Focus groups and interviews were conducted with OPs to explore perceived barriers and facilitators for recruitment. Based on data of a cluster-randomised controlled trial (cluster-RCT), univariate and multivariate analyses were conducted to investigate associations between OPs' personal and work characteristics and the number of recruited participants for the cluster-RCT per OP. Perceived barriers and facilitators for recruitment were categorised into: study characteristics (e.g. concise inclusion criteria); study population characteristics; OP's attention; OP's workload; context (e.g. working at different locations); and OP's characteristics (e.g. motivated to help). Important facilitators were encouragement by colleagues and reminders by information technology tools. Multivariate analyses showed that the number of OPs within the clinical unit who recruited participants was positively associated with the number of recruited participants per OP [rate ratio of 1.43, 95 % confidence interval 1.24-1.64]. When mobilising OPs for participant recruitment, researchers need to engage entire clinical units rather than approach OPs on an individual basis. OPs consider regular communication, especially face-to-face contact and information technology tools serving as reminders, as helpful.

  14. Targeted and untargeted-metabolite profiling to track the compositional integrity of ginger during processing using digitally-enhanced HPTLC pattern recognition analysis.

    PubMed

    Ibrahim, Reham S; Fathy, Hoda

    2018-03-30

    Tracking the impact of commonly applied post-harvesting and industrial processing practices on the compositional integrity of ginger rhizome was implemented in this work. Untargeted metabolite profiling was performed using a digitally-enhanced HPTLC method where the chromatographic fingerprints were extracted using ImageJ software and then analysed with multivariate Principal Component Analysis (PCA) for pattern recognition. A targeted approach was applied using a new, validated, simple and fast HPTLC image analysis method for simultaneous quantification of the officially recognized markers 6-, 8-, 10-gingerol and 6-shogaol in conjunction with chemometric Hierarchical Clustering Analysis (HCA). The results of both targeted and untargeted metabolite profiling revealed that peeling, drying and storage employed during processing have a great influence on the ginger chemo-profile, so the different forms of processed ginger should not be used interchangeably. Moreover, it was deemed necessary to consider the holistic metabolic profile for comprehensive evaluation of ginger during processing. Copyright © 2018. Published by Elsevier B.V.

  15. Big-data reflection high energy electron diffraction analysis for understanding epitaxial film growth processes.

    PubMed

    Vasudevan, Rama K; Tselev, Alexander; Baddorf, Arthur P; Kalinin, Sergei V

    2014-10-28

    Reflection high energy electron diffraction (RHEED) has by now become a standard tool for in situ monitoring of film growth by pulsed laser deposition and molecular beam epitaxy. Yet despite the widespread adoption and wealth of information in RHEED images, most applications are limited to observing intensity oscillations of the specular spot, and much additional information on growth is discarded. With ease of data acquisition and increased computation speeds, statistical methods to rapidly mine the data set are now feasible. Here, we develop such an approach to the analysis of the fundamental growth processes through multivariate statistical analysis of a RHEED image sequence. This approach is illustrated for growth of La(x)Ca(1-x)MnO(3) films grown on etched (001) SrTiO(3) substrates, but is universal. The multivariate methods including principal component analysis and k-means clustering provide insight into the relevant behaviors, the timing and nature of a disordered to ordered growth change, and highlight statistically significant patterns. Fourier analysis yields the harmonic components of the signal and allows separation of the relevant components and baselines, isolating the asymmetric nature of the step density function and the transmission spots from the imperfect layer-by-layer (LBL) growth. These studies show the promise of big data approaches to obtaining more insight into film properties during and after epitaxial film growth. Furthermore, these studies open the pathway to use forward prediction methods to potentially allow significantly more control over growth process and hence final film quality.
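
    The k-means step mentioned in the abstract above can be sketched with plain Lloyd's algorithm. This is a generic illustration, not the authors' pipeline: it assumes each RHEED frame has already been reduced to a short feature vector (for example a pair of principal-component scores), and the points and starting centroids below are hypothetical.

```python
import math


def kmeans(points, centroids, max_iter=100):
    """Plain Lloyd's algorithm. Returns (centroids, labels)."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                updated.append(old)  # leave an empty cluster's centroid in place
        if updated == centroids:  # no centroid moved, so the run has converged
            break
        centroids = updated
    return centroids, labels


# Hypothetical 2-D feature vectors, one per frame (illustrative only).
points = [(0.1, 0.2), (0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
centroids, labels = kmeans(points, centroids=[points[0], points[3]])
print(labels)
```

    Passing the starting centroids explicitly keeps the run deterministic. In practice the result depends on initialisation, so several random starts are usually compared.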

  16. Geospatiotemporal Data Mining of Remotely Sensed Phenology for Unsupervised Forest Threat Detection

    NASA Astrophysics Data System (ADS)

    Mills, R. T.; Hoffman, F. M.; Kumar, J.; Vulli, S. S.; Hargrove, W. W.; Spruce, J.

    2010-12-01

    Hargrove and Hoffman have previously developed and applied a scalable geospatiotemporal data mining approach to define a set of categorical, multivariate classes or states for describing and tracking the behavior of ecosystem properties through time within a multi-dimensional phase or state space. The method employs a standard k-means cluster analysis with enhancements that reduce the number of required comparisons, dramatically accelerating iterative convergence. In support of efforts by the USDA Forest Service to develop a National Early Warning System for Forest Disturbances, we have applied this geospatiotemporal cluster analysis procedure to annual phenology patterns derived from Moderate Resolution Imaging Spectroradiometer (MODIS) Normalized Difference Vegetation Index (NDVI) for unsupervised change detection. We will present initial results from the analysis of seven years of 250-m MODIS NDVI data for the conterminous United States. While determining what constitutes a "normal" phenological pattern for any given location is challenging due to interannual climate variability, a spatially varying climate change trend, and the relatively short record of MODIS NDVI observations, these results demonstrate the utility of the method for detecting significant mortality events, like the progressive damage from mountain pine beetle, and suggest that the technique may be successfully implemented as a key component in an early warning system for identifying forest threats from natural and anthropogenic disturbances at a continental scale.

  17. [Performance of Slovak hospitals as related to Porter's generic strategies].

    PubMed

    Hlavacka, S; Bacharova, L; Rusnakova, V; Wagner, R

    2001-01-01

    Porter's generic strategies characterize organizations in terms of their competitiveness, and are related to the performance of the organization. The aim of this study was to analyze Porter's generic strategies and their effect on performance in the context of the Slovak hospital industry. Acute care hospitals with more than 30 beds were included in the study. National institutes providing specialized service were excluded from the study. Strategy and performance were evaluated on the basis of self-reported questionnaires, completed by chief administrators of hospitals (a total of 76 completed questionnaires were obtained out of 81 distributed, i.e. a 94% response rate). Cluster analysis was used for the identification of strategic orientation. Performance differences across strategic groups were tested using multivariate analysis of covariance (MANCOVA). The hierarchical cluster analysis uncovered a four-group taxonomy of hospitals: the group "Focused Cost Leadership" included 33% of hospitals, the group "Stuck-in-the middle" 49%, the group "Wait and See" 13% and the group "Cost leadership" 5%. Significant differences in performance were related to Porter's pure or hybrid strategies. In terms of industry evolution, the Slovak hospital industry could be characterized as fragmented, having a large number of small and medium size mainly state owned hospitals, with absence of market leaders, and with high exit barriers (mainly social and political) that hold back consolidation. (Tab. 1, Ref. 35.).

  18. Bayesian Nonparametric Ordination for the Analysis of Microbial Communities.

    PubMed

    Ren, Boyu; Bacallado, Sergio; Favaro, Stefano; Holmes, Susan; Trippa, Lorenzo

    2017-01-01

    Human microbiome studies use sequencing technologies to measure the abundance of bacterial species or Operational Taxonomic Units (OTUs) in samples of biological material. Typically the data are organized in contingency tables with OTU counts across heterogeneous biological samples. In the microbial ecology community, ordination methods are frequently used to investigate latent factors or clusters that capture and describe variations of OTU counts across biological samples. It remains important to evaluate how uncertainty in estimates of each biological sample's microbial distribution propagates to ordination analyses, including visualization of clusters and projections of biological samples on low dimensional spaces. We propose a Bayesian analysis for dependent distributions to endow frequently used ordinations with estimates of uncertainty. A Bayesian nonparametric prior for dependent normalized random measures is constructed, which is marginally equivalent to the normalized generalized Gamma process, a well-known prior for nonparametric analyses. In our prior, the dependence and similarity between microbial distributions is represented by latent factors that concentrate in a low dimensional space. We use a shrinkage prior to tune the dimensionality of the latent factors. The resulting posterior samples of model parameters can be used to evaluate uncertainty in analyses routinely applied in microbiome studies. Specifically, by combining them with multivariate data analysis techniques we can visualize credible regions in ecological ordination plots. The characteristics of the proposed model are illustrated through a simulation study and applications in two microbiome datasets.

  19. HIV Clustering in Mississippi: Spatial Epidemiological Study to Inform Implementation Science in the Deep South.

    PubMed

    Stopka, Thomas J; Brinkley-Rubinstein, Lauren; Johnson, Kendra; Chan, Philip A; Hutcheson, Marga; Crosby, Richard; Burke, Deirdre; Mena, Leandro; Nunn, Amy

    2018-04-03

    In recent years, more than half of new HIV infections in the United States occur among African Americans in the Southeastern United States. Spatial epidemiological analyses can inform public health responses in the Deep South by identifying HIV hotspots and community-level factors associated with clustering. The goal of this study was to identify and characterize HIV clusters in Mississippi through analysis of state-level HIV surveillance data. We used a combination of spatial epidemiology and statistical modeling to identify and characterize HIV hotspots in Mississippi census tracts (n=658) from 2008 to 2014. We conducted spatial analyses of all HIV infections, infections among men who have sex with men (MSM), and infections among African Americans. Multivariable logistic regression analyses identified community-level sociodemographic factors associated with HIV hotspots considering all cases. There were HIV hotspots for the entire population, MSM, and African American MSM identified in the Mississippi Delta region, Southern Mississippi, and in greater Jackson, including surrounding rural counties (P<.05). In multivariable models for all HIV cases, HIV hotspots were significantly more likely to include urban census tracts (adjusted odds ratio [AOR] 2.01, 95% CI 1.20-3.37) and census tracts that had a higher proportion of African Americans (AOR 3.85, 95% CI 2.23-6.65). The HIV hotspots were less likely to include census tracts with residents who had less than a high school education (AOR 0.95, 95% CI 0.92-0.98), census tracts with residents belonging to two or more racial/ethnic groups (AOR 0.46, 95% CI 0.30-0.70), and census tracts that had a higher percentage of the population living below the poverty level (AOR 0.51, 95% CI 0.28-0.92). We used spatial epidemiology and statistical modeling to identify and characterize HIV hotspots for the general population, MSM, and African Americans. HIV clusters concentrated in Jackson and the Mississippi Delta. 
African American race and urban location were positively associated with clusters, whereas having less than a high school education and having a higher percentage of the population living below the poverty level were negatively associated with clusters. Spatial epidemiological analyses can inform implementation science and public health response strategies, including improved HIV testing, targeted prevention and risk reduction education, and tailored preexposure prophylaxis to address HIV disparities in the South. ©Thomas J Stopka, Lauren Brinkley-Rubinstein, Kendra Johnson, Philip A Chan, Marga Hutcheson, Richard Crosby, Deirdre Burke, Leandro Mena, Amy Nunn. Originally published in JMIR Public Health and Surveillance (http://publichealth.jmir.org), 03.04.2018.

  20. Multi-criteria evaluation of CMIP5 GCMs for climate change impact analysis

    NASA Astrophysics Data System (ADS)

    Ahmadalipour, Ali; Rana, Arun; Moradkhani, Hamid; Sharma, Ashish

    2017-04-01

    Climate change is expected to have severe impacts on the global hydrological cycle along with the food-water-energy nexus. Currently, there are many climate models used in predicting important climatic variables. Though there have been advances in the field, there are still many problems to be resolved related to reliability, uncertainty, and computing needs, among many others. In the present work, we have analyzed the performance of 20 different global climate models (GCMs) from the Climate Model Intercomparison Project Phase 5 (CMIP5) dataset over the Columbia River Basin (CRB) in the Pacific Northwest USA. We demonstrate a statistical multicriteria approach, using univariate and multivariate techniques, for selecting suitable GCMs to be used for climate change impact analysis in the region. Univariate methods include mean, standard deviation, coefficient of variation, relative change (variability), Mann-Kendall test, and Kolmogorov-Smirnov test (KS-test), whereas the multivariate methods used were principal component analysis (PCA), singular value decomposition (SVD), canonical correlation analysis (CCA), and cluster analysis. The analysis is performed on raw GCM data, i.e., before bias correction, for precipitation and temperature climatic variables for all the 20 models to capture the reliability and nature of the particular model at regional scale. The analysis is based on spatially averaged datasets of GCMs and observation for the period of 1970 to 2000. Ranking is provided to each of the GCMs based on the performance evaluated against gridded observational data on various temporal scales (daily, monthly, and seasonal). Results provide insight into each of the methods employed in ranking GCMs and the statistical properties they address. Further, evaluation was also performed for raw GCM simulations against different sets of gridded observational data in the area.
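
    Of the univariate criteria listed in the abstract above, the Mann-Kendall trend test is simple enough to sketch directly. This is the textbook form without a tie correction, offered as an illustration rather than the authors' implementation. The temperature series is hypothetical.

```python
import math


def mann_kendall(series):
    """Return (S, Z) for the Mann-Kendall trend test, assuming no tied values."""
    n = len(series)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)  # sign of each later-minus-earlier pair
    var_s = n * (n - 1) * (2 * n + 5) / 18  # variance of S under no trend
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)  # continuity-corrected normal score
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    return s, z


# Hypothetical annual mean temperatures in degrees C (illustrative only).
temps = [10.1, 10.3, 10.2, 10.6, 10.8, 10.7, 11.0, 11.2]
s, z = mann_kendall(temps)
print(f"S={s}, Z={z:.2f}")
```

    A |Z| above 1.96 indicates a monotonic trend at the two-sided 5% level under the normal approximation. Comparing a model's S or Z with that of the observed series is one possible way to score it on trend reproduction.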
